What is... a Cluster?
A Databricks cluster is a set of computation resources (virtual machines) that work together to run data processing, data engineering, machine learning, and analytics workloads within the Databricks platform. A cluster provides the necessary compute power to execute jobs, run notebooks, and perform queries on large datasets. Databricks clusters are highly scalable and can be dynamically configured based on the workload.
You can run these workloads from Notebooks or as automated Jobs.
There are, broadly speaking, three types of clusters:
1 - All-purpose compute
These clusters are set up and used on a project-by-project basis. They are mainly used to run queries from interactive, collaborative notebooks.
2 - Jobs clusters
These clusters are defined as part of a job: they are created when the automated workflow starts and terminated when it completes, which makes them a robust choice for scheduled, repeatable workloads.
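For illustration, here is a minimal sketch of a job that runs on its own jobs cluster, using the Databricks Jobs REST API (2.1) from Python. The workspace URL, token, notebook path, and instance type are placeholders; in environments where creation goes through the Service Desk, treat this as a view of what is configured on your behalf rather than something to run directly.

```python
import requests

# Placeholders: substitute your workspace URL and a personal access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

# A job whose task runs on a dedicated jobs cluster. The cluster is
# created when the job starts and terminated when the job finishes.
job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 4,
            },
        }
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```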
3 - SQL Warehouses
These clusters can run on either standard or serverless compute, but are typically the latter, meaning they are on-demand, elastic clusters used for SQL-specific workloads on large datasets. Because serverless compute is managed by Databricks, it is cheaper and faster to start than standard compute, and it is better supported, often receiving new features and upgrades long before standard compute.
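As a sketch of how a SQL warehouse is consumed once it exists, the open-source databricks-sql-connector package can run queries against the warehouse's HTTP path. The hostname, HTTP path, and token below are placeholders you would copy from the warehouse's connection details.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders: copy these from the warehouse's "Connection details" tab.
with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        # The warehouse (serverless or standard) executes the query;
        # no cluster management is needed on the client side.
        cursor.execute("SELECT current_date() AS today")
        print(cursor.fetchall())
```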
In all cases, clusters require a billing/TRS code before they can be created.
A cluster's configuration covers its Databricks Runtime version, worker instance type, driver type, and the ratio of spot nodes to on-demand nodes.
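For concreteness, here is a minimal sketch of how those settings map onto fields of the Clusters REST API, shown in Python with AWS attribute names. The workspace URL, token, instance types, and the trs_code tag are illustrative placeholders, not prescribed values.

```python
import requests

# Placeholders: substitute your workspace URL and a personal access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

cluster_spec = {
    "cluster_name": "team-analytics",
    "spark_version": "13.3.x-scala2.12",  # Databricks Runtime version
    "node_type_id": "i3.xlarge",          # worker instance type
    "driver_node_type_id": "i3.2xlarge",  # driver type
    "num_workers": 8,
    "autotermination_minutes": 60,
    # Spot/on-demand mix (AWS): keep the first node on-demand and run
    # the remaining workers as spot instances with on-demand fallback.
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
    },
    # Hypothetical tag used here to carry the billing/TRS code.
    "custom_tags": {"trs_code": "<billing-code>"},
}

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Cluster id:", resp.json()["cluster_id"])
```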
Creation can be requested via the Service Desk.
See Also: