How to... access data from S3?
There are two suggested ways to access data stored in S3 from the Databricks environment.
- Using Managed Volumes Mounted onto a Catalog (Recommended): In Databricks, managed volumes are a way to securely and efficiently manage access to non-tabular data stored in Amazon S3. When you create a managed volume, it is associated with a specific catalog and uses a storage credential to authorize access to the S3 bucket.
- Using AWS Access Tokens and a URL to Request Data: Another way to access data stored in Amazon S3 is to use AWS access tokens (an access key ID, a secret access key, and a session token) together with a direct URL to the S3 object. This approach involves generating temporary credentials that grant access to the S3 bucket, then using them to authenticate and request data directly from the specified S3 URL. This method is more flexible and suits ad-hoc data access or sharing data with other users or systems, but the temporary credentials must be managed carefully to keep access secure. We therefore recommend it only when the first option is not possible.
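
The two options above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation. It assumes that volume files are reachable under the `/Volumes/<catalog>/<schema>/<volume>/...` path convention, that the "direct URL" has the form `s3://bucket/key`, and that the `boto3` library is available for the second option. All function names, and the catalog, schema, volume, and bucket names in the usage note, are hypothetical placeholders.

```python
from typing import Tuple
from urllib.parse import urlparse


def volume_path(catalog: str, schema: str, volume: str, relative_path: str) -> str:
    """Build the POSIX-style path of a file inside a volume.

    Assumes the /Volumes/<catalog>/<schema>/<volume>/<path> convention.
    """
    return f"/Volumes/{catalog}/{schema}/{volume}/{relative_path.lstrip('/')}"


def read_from_volume(path: str) -> bytes:
    """Option 1 (recommended): read a file through its volume path.

    No AWS credentials appear in the code; access is authorized by the
    storage credential behind the volume.
    """
    with open(path, "rb") as f:
        return f.read()


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/key URL, got: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_with_temporary_credentials(
    url: str,
    access_key_id: str,
    secret_access_key: str,
    session_token: str,
) -> bytes:
    """Option 2: request an S3 object directly using temporary credentials."""
    import boto3  # imported here so option 1 works without boto3 installed

    bucket, key = parse_s3_url(url)
    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        aws_session_token=session_token,
    )
    # get_object returns a dict whose "Body" is a streaming object.
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
```

For example, `read_from_volume(volume_path("my_catalog", "my_schema", "my_volume", "reports/2024.csv"))` would read a file through the volume, while `read_with_temporary_credentials("s3://my-bucket/reports/2024.csv", key_id, secret, token)` would fetch the same object directly. With the second option, avoid hard-coding the three credential values in notebooks or source files, since they grant access to the bucket until they expire.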
See also: