Parquet is a columnar storage file format designed for efficiently storing large datasets. It is widely used in big data processing systems such as Hadoop, Apache Spark, and data lakes. Parquet provides efficient data compression, storage, and retrieval, which makes it well suited to analytics workloads that scan large datasets but often need only a subset of columns.
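As a minimal sketch of that column-pruning benefit, the snippet below selects only two columns from a Parquet file, so Spark reads just those column chunks rather than entire rows. The path and column names here are hypothetical placeholders, and a running SparkSession named spark is assumed, as in the snippet further below.
# Read a Parquet file and project only the columns needed;
# the path and column names are illustrative placeholders.
df = spark.read.parquet("/tmp/events.parquet")
df.select("user_id", "event_time").show()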
The following code snippet loads data from a Parquet file into a DataFrame and then writes it to a SQL table.
# Path to the source Parquet file (Unity Catalog volume placeholders).
file_location = "/Volumes/<catalog>/<schema>/<table>/<file_path>/<filename_foo>.parquet"

# Load the Parquet file into a DataFrame.
df = spark.read.parquet(file_location)

# Write the DataFrame to a table, replacing any existing table of that name.
df.write.mode("overwrite").saveAsTable("<catalog>.<schema>.<tablename_bar>")
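To confirm the write succeeded, one option is to read the table back and inspect it. This is a sketch reusing the same placeholder table name, not part of the original snippet.
# Read the saved table back; the placeholders must match the names used above.
result = spark.table("<catalog>.<schema>.<tablename_bar>")
result.printSchema()
print(result.count())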
See Also: