How to... export to .xlsx format?
Assumptions
This article assumes all of the following:
- That you have access to a Unity Catalog volume whose path is backed by an external S3 location
- That you have access to a Unity Catalog-enabled cluster from which to run this code
- That the Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is installed on the cluster and allowlisted for the catalog.
Writing to .xlsx format
The script below exports a table to an .xlsx spreadsheet. Note that this code creates only one sheet per Excel workbook.
# Read the source table into a DataFrame
query = "select * from <catalog>.<schema>.<table>"
df = spark.sql(query)

# Destination path on the Unity Catalog volume
file_location = "/Volumes/<catalog>/<schema>/<location>/<table_path>.xlsx"

# Write the DataFrame as a single-sheet Excel workbook
df.write.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .mode("overwrite") \
    .save(file_location)
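If you want to control the name of the worksheet rather than accept the default, spark-excel accepts a dataAddress option specifying the sheet and starting cell. A minimal sketch, assuming the df and file_location from the script above; the sheet name "Export" is an illustrative choice, not from this article:

```python
# Sketch: name the worksheet explicitly via spark-excel's dataAddress option.
# "Export" is a hypothetical sheet name chosen for illustration.
df.write.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", "'Export'!A1") \
    .mode("overwrite") \
    .save(file_location)
```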
Scheduled jobs
If you are setting up a scheduled job, you need to perform the following steps for the export to work:
- Once your scheduled job is created, click Tasks and open the task that runs the Excel extract
- Under Dependent libraries, select Maven, enter com.crealytics:spark-excel_2.12:3.5.0_0.20.3 in the Coordinates field, click Add, then save the task
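If you manage jobs through the Databricks Jobs API or infrastructure-as-code instead of the UI, the same dependency is declared on the task's libraries list. A sketch of the relevant task fragment; the task key and notebook path are illustrative placeholders, not from this article:

```json
{
  "task_key": "excel_extract",
  "notebook_task": {
    "notebook_path": "/Workspace/<path-to-export-notebook>"
  },
  "libraries": [
    {
      "maven": {
        "coordinates": "com.crealytics:spark-excel_2.12:3.5.0_0.20.3"
      }
    }
  ]
}
```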