How to... load a flat file?
A flat file is a type of file that stores data in a plain, simple format without structured relationships, often using text or binary encoding. The data is typically organized in a single, two-dimensional table, with rows representing records and columns representing fields.
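For example, a CSV file is a flat file: each line is one record, and the delimiter-separated values on that line are its fields. A minimal plain-Python sketch (the file content is made up for illustration):

```python
import csv
import io

# Hypothetical flat-file content: one record per line, comma-separated fields
flat_file = """id,name,city
1,Alice,Austin
2,Bob,Boston
"""

reader = csv.reader(io.StringIO(flat_file))
rows = list(reader)

# The first row holds the field names; the rest are data records
header, records = rows[0], rows[1:]
print(header)   # ['id', 'name', 'city']
print(records)  # [['1', 'Alice', 'Austin'], ['2', 'Bob', 'Boston']]
```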
The code examples below load data from a flat file into a DataFrame, then write the DataFrame to a SQL table.
Load a flat file from a mounted S3 location:
%python
file_location = "/Volumes/<catalog>/<schema>/<volume_name>/<S3_file_path>/<filename>.csv"
df = (spark.read
    .option("header", "true")    # first row is a header record: "true"/"false"
    .option("delimiter", ",")    # field delimiter
    .csv(file_location)
)

# Insert into an existing table (columns are matched by position, not by name)
df.write.insertInto("<catalog>.<schema>.<table>")
Load a file using S3 access keys:
%python
user_id = "<username>"
access_key = dbutils.secrets.get(user_id, "access_key")
# URL-encode the secret key: any "/" it contains would otherwise be parsed as a path separator in the s3a URI
secret_key = dbutils.secrets.get(user_id, "secret_key").replace("/", "%2F")
S3_PREFIX = f"s3a://{access_key}:{secret_key}@"
s3_path = S3_PREFIX + "<bucket_name>/<file_path>/<filename>.csv"

# Read the pipe-delimited file into a DataFrame
df = spark.read.option("delimiter", "|").csv(s3_path, header=True, inferSchema=True)

# Define the SQL table name
table_name = "<catalog>.<schema>.<table>"

# Save the DataFrame as a SQL table
df.write.mode("overwrite").saveAsTable(table_name)
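The `.replace("/", "%2F")` above is URL-encoding: because the secret key is embedded directly in the s3a URI, any `/` in it would be read as a path separator. The standard library's `urllib.parse.quote` handles this more generally (the key below is a made-up example):

```python
from urllib.parse import quote

# Hypothetical secret key containing characters that are unsafe inside a URI
secret_key = "abc/def+ghi"

# safe="" encodes every reserved character, including "/"
encoded = quote(secret_key, safe="")
print(encoded)  # abc%2Fdef%2Bghi
```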