How to... load a flat file?
A flat file is a type of file that stores data in a plain, simple format without structured relationships, often using text or binary encoding. The data is typically organized in a single, two-dimensional table, with rows representing records and columns representing fields.
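For example, a CSV file is a flat file: each line is one record, and the delimiter-separated values on that line are its fields. A minimal plain-Python sketch (the file content is made up for illustration):

```python
import csv
import io

# Hypothetical flat-file content: one record per line, comma-separated fields
flat_file = """id,name,city
1,Alice,Austin
2,Bob,Boston
"""

reader = csv.reader(io.StringIO(flat_file))
rows = list(reader)

# The first row holds the field names; the rest are data records
header, records = rows[0], rows[1:]
print(header)   # ['id', 'name', 'city']
print(records)  # [['1', 'Alice', 'Austin'], ['2', 'Bob', 'Boston']]
```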
The code examples below load data from a flat file into a DataFrame, then write the DataFrame to a SQL table.
Load a flat file from a mounted S3 location:
%python
file_location = "/Volumes/<catalog>/<schema>/<volume_name>/<S3_file_path>/<filename>.csv"
df = (spark.read
    .option("header", "true")    # first row is a header record: "true"/"false"
    .option("delimiter", ",")    # field delimiter
    .csv(file_location)
)

# Insert into an existing table (columns are matched by position, not by name)
df.write.insertInto("<catalog>.<schema>.<table>")
Load a file using S3 access keys:
%python
user_id = "<username>"
access_key = dbutils.secrets.get(user_id, "access_key")
# URL-encode the secret key: any "/" it contains would otherwise be parsed as a path separator in the s3a URI
secret_key = dbutils.secrets.get(user_id, "secret_key").replace("/", "%2F")
S3_PREFIX = f"s3a://{access_key}:{secret_key}@"
s3_path = S3_PREFIX + "<bucket_name>/<file_path>/<filename>.csv"

# Read the pipe-delimited file into a DataFrame
df = spark.read.option("delimiter", "|").csv(s3_path, header=True, inferSchema=True)

# Define the SQL table name
table_name = "<catalog>.<schema>.<table>"

# Save the DataFrame as a SQL table
df.write.mode("overwrite").saveAsTable(table_name)
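The `.replace("/", "%2F")` above is URL-encoding: because the secret key is embedded directly in the s3a URI, any `/` in it would be read as a path separator. The standard library's `urllib.parse.quote` handles this more generally (the key below is a made-up example):

```python
from urllib.parse import quote

# Hypothetical secret key containing characters that are unsafe inside a URI
secret_key = "abc/def+ghi"

# safe="" encodes every reserved character, including "/"
encoded = quote(secret_key, safe="")
print(encoded)  # abc%2Fdef%2Bghi
```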