Databricks setup
profiles.yml file is for CLI users only

If you're using dbt Cloud, you don't need to create a profiles.yml file. This file is only for CLI users. To connect your data platform to dbt Cloud, refer to About data platforms.
Overview of dbt-databricks
- Maintained by: Databricks
- Authors: some dbt loving Bricksters
- GitHub repo: databricks/dbt-databricks
- PyPI package: dbt-databricks
- Slack channel: #db-databricks-and-spark
- Supported dbt Core version: v0.18.0 and newer
- dbt Cloud support: Supported
- Minimum data platform version: Databricks SQL or DBR 12+
Installing dbt-databricks
pip is the easiest way to install the adapter:

```shell
pip install dbt-databricks
```
Installing dbt-databricks will also install dbt-core and any other dependencies.
Configuring dbt-databricks
For Databricks-specific configuration, refer to Databricks Configuration.
For further info, refer to the GitHub repository: databricks/dbt-databricks
dbt-databricks is the recommended adapter for Databricks. It includes features not available in dbt-spark, such as:
- Unity Catalog support
- No need to install additional drivers or dependencies for use on the CLI
- Use of Delta Lake for all models out of the box
- SQL macros that are optimized to run with Photon
Connecting to Databricks
To connect to a data platform with dbt Core, create the appropriate profile and target YAML keys/values in the profiles.yml
configuration file for your Databricks SQL Warehouse/cluster. This dbt YAML file lives in the .dbt/
directory of your user/home directory. For more info, refer to Connection profiles and profiles.yml.
dbt-databricks can connect to Databricks SQL Warehouses and all-purpose clusters. Databricks SQL Warehouses are the recommended way to get started with Databricks.
Refer to the Databricks docs for more info on how to obtain the credentials for configuring your profile.
Examples
You can use either token-based authentication or OAuth client-based authentication to connect to Databricks. Refer to the following examples for more info on how to configure your profile for each type of authentication.
- Token-based authentication
- OAuth client-based authentication
Token-based authentication:

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required
      http_path: [/sql/your/http/path] # Required
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX] # Personal Access Token (PAT). Required if using token-based authentication
      threads: [1 or more] # Optional, default 1
```
OAuth client-based authentication:

```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required
      http_path: [/sql/your/http/path] # Required
      auth_type: oauth # Required if using OAuth-based authentication
      client_id: [OAuth-Client-ID] # The ID of your OAuth application. Required if using OAuth-based authentication
      client_secret: [XXXXXXXXXXXXXXXXXXXXXXXXXXX] # OAuth client secret. Required if using OAuth-based authentication
      threads: [1 or more] # Optional, default 1
```
Host parameters
The following profile fields are always required.
| Field | Description | Example |
| --- | --- | --- |
| host | The hostname of your cluster. Don't include the http:// or https:// prefix. | yourorg.databrickshost.com |
| http_path | The HTTP path to your SQL Warehouse or all-purpose cluster. | /sql/your/http/path |
| schema | The name of a schema within your cluster's catalog. It's not recommended to use schema names that have uppercase or mixed-case letters. | my_schema |
Authentication parameters
The dbt-databricks adapter supports both token-based authentication and OAuth client-based authentication.
Refer to the following required parameters to configure your profile for each type of authentication:
| Field | Authentication type | Description | Example |
| --- | --- | --- | --- |
| token | Token-based | The Personal Access Token (PAT) to connect to Databricks. | dapiXXXXXXXXXXXXXXXXXXXXXXX |
| client_id | OAuth-based | The client ID for your Databricks OAuth application. | <oauth-client-id> |
| client_secret | OAuth-based | The client secret for your Databricks OAuth application. | XXXXXXXXXXXXXXXXXXXXXXXXXXX |
| auth_type | OAuth-based | The type of authorization needed to connect to Databricks. | oauth |
Additional parameters
The following profile fields are optional to set up. They help you configure how your cluster's session and dbt work for your connection.
| Profile field | Description | Example |
| --- | --- | --- |
| threads | The number of threads dbt should use (default is 1) | 8 |
| connect_retries | The number of times dbt should retry the connection to Databricks (default is 1) | 3 |
| connect_timeout | How many seconds before the connection to Databricks should time out (default behavior is no timeout) | 1000 |
| session_properties | Sets the Databricks session properties used in the connection. Execute SET -v to see available options. | ansi_mode: true |
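As a sketch of how these optional fields fit together, the fragment below extends the token-based profile from earlier; the profile name, host, and path are placeholders, not real values:

```yaml
# Hypothetical profile fragment showing the optional connection fields
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: my_schema
      host: yourorg.databrickshost.com
      http_path: /sql/your/http/path
      token: dapiXXXXXXXXXXXXXXXXXXXXXXX
      threads: 8            # run up to 8 models in parallel
      connect_retries: 3    # retry a failed connection up to 3 times
      connect_timeout: 1000 # give up on the connection after 1000 seconds
      session_properties:
        ansi_mode: true     # a Databricks session property, as listed by SET -v
```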
Supported Functionality
Delta Lake
Most dbt Core functionality is supported, but some features are only available on Delta Lake.
Delta-only features:
- Incremental model updates by unique_key instead of partition_by (see the merge strategy)
- Snapshots
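For example, the merge strategy with a unique_key can be set from dbt_project.yml; this is a minimal sketch, and the project and model names here are hypothetical:

```yaml
# dbt_project.yml fragment (hypothetical project and model names)
models:
  my_project:
    my_incremental_model:
      +materialized: incremental
      +incremental_strategy: merge # Delta-only: merge rows into the target table
      +unique_key: id              # rows matching on id are updated rather than duplicated
```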
Unity Catalog
The adapter dbt-databricks>=1.1.1 supports the three-level namespace of Unity Catalog (catalog / schema / relations) so you can organize and secure your data the way you like.
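With a catalog set in the profile, relations resolve against all three levels. A minimal sketch, where the catalog and schema names are placeholders:

```yaml
# Profile fragment: catalog adds the third namespace level (placeholder values)
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: my_catalog # Unity Catalog name; models build as my_catalog.my_schema.<model>
      schema: my_schema
      host: yourorg.databrickshost.com
      http_path: /sql/your/http/path
      token: dapiXXXXXXXXXXXXXXXXXXXXXXX
```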