How to... export to .xlsx format?
Assumptions
This article assumes all of the following:
- That you have access to a Unity Catalog volume whose path is backed by an external S3 location
- That you have access to a Unity Catalog-enabled cluster from which to run this code
- That the Maven library com.crealytics:spark-excel_2.12:3.5.0_0.20.3 is installed on the cluster and allowlisted for the catalog.
Writing to .xlsx format
The script below exports a table to an .xlsx spreadsheet. Note that this code creates only one sheet per Excel workbook.
# Read the source table into a DataFrame
query = "select * from <catalog>.<schema>.<table>"
df = spark.sql(query)

# Destination path on the Unity Catalog volume
file_location = "/Volumes/<catalog>/<schema>/<location>/<table_path>.xlsx"

# Write the DataFrame as a single-sheet Excel workbook
df.write.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .mode("overwrite") \
    .save(file_location)
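If you want to control the name of the worksheet rather than accept the default, spark-excel accepts a dataAddress option specifying the sheet and starting cell. A minimal sketch, assuming the df and file_location from the script above; the sheet name "Export" is an illustrative choice, not from this article:

```python
# Sketch: name the worksheet explicitly via spark-excel's dataAddress option.
# "Export" is a hypothetical sheet name chosen for illustration.
df.write.format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .option("dataAddress", "'Export'!A1") \
    .mode("overwrite") \
    .save(file_location)
```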
Scheduled jobs
If you are setting up a scheduled job, you need to perform the following steps for the export to work:
- Once your scheduled job is created, click Tasks and open the task that runs the Excel extract
- Under Dependent libraries, select Maven, enter com.crealytics:spark-excel_2.12:3.5.0_0.20.3 in the Coordinates field, click Add, then save the task
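If you manage jobs through the Databricks Jobs API or infrastructure-as-code instead of the UI, the same dependency is declared on the task's libraries list. A sketch of the relevant task fragment; the task key and notebook path are illustrative placeholders, not from this article:

```json
{
  "task_key": "excel_extract",
  "notebook_task": {
    "notebook_path": "/Workspace/<path-to-export-notebook>"
  },
  "libraries": [
    {
      "maven": {
        "coordinates": "com.crealytics:spark-excel_2.12:3.5.0_0.20.3"
      }
    }
  ]
}
```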