How to set up Databricks

Atlan supports three authentication methods for fetching metadata from Databricks. Set up authentication for your Databricks instance using one of the following options:

💪 Did you know? Atlan currently only supports querying data and viewing sample data previews with the personal access token authentication method.

Personal access token authentication

🤓 Who can do this? Check that you have Admin and Databricks SQL access for the Databricks workspace. This is required for both authentication options described below. If you do not have this access, contact your Databricks administrator.

Set up personal access token authentication for your Databricks instance using one of the following two options:

  • Interactive cluster
  • SQL warehouse (formerly SQL endpoint)

Interactive cluster

Confirm cluster setup

To confirm an all-purpose interactive cluster is configured:

  1. From the left menu of any page of your Databricks instance, click Compute.
  2. Under the All-purpose clusters tab ensure you have a cluster defined.
  3. Click the link under the Name column of the table to open your cluster.
  4. Under the Configuration tab, ensure the Autopilot option Terminate after ... minutes is enabled.
  5. At the bottom of the Configuration tab, expand the Advanced options section.
    1. Under Advanced options, open the JDBC/ODBC tab.
    2. Confirm that all of the fields in this tab are populated, and copy them for use in crawling: Server Hostname, Port, and HTTP Path.
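
The three values copied in the last step combine into a standard Databricks JDBC URL. As a rough illustration only (the hostname, port, and path below are placeholders, and option names can vary by JDBC driver version):

```python
def jdbc_url(server_hostname: str, port: int, http_path: str) -> str:
    """Assemble a Databricks JDBC URL from the cluster's JDBC/ODBC fields.

    AuthMech=3 selects token-based authentication (user "token" plus a
    personal access token); check your driver's docs for exact options.
    """
    return (
        f"jdbc:databricks://{server_hostname}:{port};"
        f"transportMode=http;ssl=1;AuthMech=3;"
        f"httpPath={http_path}"
    )

# Placeholder values, not a real workspace:
url = jdbc_url(
    "dbc-a1b2c3d4-e5f6.cloud.databricks.com",
    443,
    "sql/protocolv1/o/0/0123-456789-abcde",
)
```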

Generate a personal access token

To generate a personal access token:

  1. From the left menu of any page of your Databricks instance, at the bottom, click Settings and then User Settings.
  2. Under the Access tokens tab, click the Generate new token button.
  3. In the Generate new token dialog:
    1. For Comment enter a description of the token's intended use (for example, Atlan crawler).
    2. For Lifetime (days), consider removing the number. This allows the token to be used indefinitely, so it will not need to be refreshed.
      🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate it and update Atlan's crawler configuration with the new token each time.
    3. At the bottom of the dialog click Generate.
  4. Copy and save the generated token somewhere, and then click Done.
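
One way to sanity-check the token before entering it in Atlan is to call any authenticated REST endpoint with it. This sketch only builds the request against the SCIM Me endpoint (the host and token values are placeholders; actually sending it requires network access to your workspace):

```python
def pat_request(workspace_host: str, token: str) -> tuple[str, dict]:
    """Build a GET request against the SCIM Me endpoint to verify a
    personal access token; a 200 response means the token works."""
    url = f"https://{workspace_host}/api/2.0/preview/scim/v2/Me"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

# Placeholder host and token:
url, headers = pat_request("dbc-a1b2c3d4-e5f6.cloud.databricks.com", "dapiXXXX")
# To actually send it:
# import requests; requests.get(url, headers=headers).raise_for_status()
```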

SQL warehouse (formerly SQL endpoint)

Confirm warehouse setup

To confirm a SQL warehouse is configured:

  1. From the left menu of any page of your Databricks instance, open the drop-down just below the Databricks logo and change to SQL.
  2. From the refreshed left menu, click SQL Warehouses.
  3. Click the link under the Name column of the table to open your SQL warehouse.
  4. Under the Connection details tab, confirm that all of the fields are populated and copy them for use in crawling: Server hostname, Port, and HTTP path.

Generate a personal access token

To generate a personal access token:

  1. In the lower-right corner of the Connection details tab of the SQL warehouse, click the link to Create a personal access token.
  2. In the resulting User Settings page, under the Personal access tokens tab, click the Generate new token button.
  3. In the Generate token dialog:
    1. For Comment enter a description of the token's intended use (for example, Atlan crawler).
    2. For Lifetime (days), consider removing the number. This allows the token to be used indefinitely, so it will not need to be refreshed.
      🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate it and update Atlan's crawler configuration with the new token each time.
    3. At the bottom of the dialog click Generate.
  4. Copy and save the generated token somewhere, and then click Done.

(Optional) Set permissions for Unity Catalog

If you're managing permissions selectively through Unity Catalog, then Atlan will require certain privileges:

  • To crawl metadata for tables and views: The caller must be a metastore admin or an owner of (or have the SELECT privilege on) the tables. For the latter case, the caller must also be the owner or have the USE_CATALOG privilege on the parent catalog and the USE_SCHEMA privilege on the parent schema.
  • To import tags from Unity Catalog: You can only import Databricks tags from Unity Catalog-enabled workspaces. Atlan requires the following privileges to import tags:
    • The caller must be the owner or have USE_CATALOG privilege on the SYSTEM catalog and USE_SCHEMA privilege on the system.information_schema schema.
    • The caller must be an owner or have SELECT privilege on the following tables:
      • system.information_schema.catalog_tags
      • system.information_schema.schema_tags
      • system.information_schema.table_tags
      • system.information_schema.column_tags
  • To enable reverse sync for imported Databricks tags: You can only push tag updates for imported Databricks tags to Unity Catalog-enabled workspaces. To push tags updated for assets in Atlan to Databricks, Atlan requires the following privileges:
    • The caller must have the APPLY TAG privilege on the object, the USE SCHEMA privilege on the object’s parent schema, and the USE CATALOG privilege on the object’s parent catalog.
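
The tag-import privileges above map to standard Unity Catalog GRANT statements. As a sketch, this assembles the statements for a hypothetical principal name; run the resulting SQL from a SQL warehouse as a metastore admin or the object owner:

```python
def tag_import_grants(principal: str) -> list[str]:
    """Unity Catalog GRANTs needed to import tags (principal is hypothetical)."""
    tag_tables = ["catalog_tags", "schema_tags", "table_tags", "column_tags"]
    stmts = [
        f"GRANT USE CATALOG ON CATALOG system TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA system.information_schema TO `{principal}`;",
    ]
    stmts += [
        f"GRANT SELECT ON TABLE system.information_schema.{t} TO `{principal}`;"
        for t in tag_tables
    ]
    return stmts

for stmt in tag_import_grants("atlan-crawler@example.com"):
    print(stmt)
```

For reverse sync, APPLY TAG on each object (plus USE SCHEMA and USE CATALOG on its parents) would be granted in the same way.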

AWS service principal authentication

🤓 Who can do this? You will need your AWS Databricks account admin to create a service principal and manage its OAuth credentials, and your AWS Databricks workspace admin to add the service principal to your AWS Databricks workspace; you may not have access yourself.

Atlan currently only supports the REST API extraction method when using AWS service principal authentication to crawl Databricks. Complete the following steps to authenticate the connection in Atlan.

Create a service principal

You can create a service principal directly in your Databricks account or from a Databricks workspace.

  • Identity federation enabled on your workspaces: Databricks recommends creating the service principal in the account and assigning it to workspaces.
  • Identity federation disabled on your workspaces: Databricks recommends that you create your service principal from a workspace.

Identity federation enabled

To create a service principal from your Databricks account, with identity federation enabled:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal and then click Add.
  6. Once the service principal has been created, you can assign it to your identity federated workspace. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  7. From the tabs along the top of your workspace page, click the Permissions tab.
  8. In the upper right of the Permissions page, click Add permissions.
  9. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown and select workspace User.

Identity federation disabled

To create a service principal from a Databricks workspace, with identity federation disabled:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, enter a name for the service principal and then click Add.

Create an OAuth secret for the service principal

You will need to create an OAuth secret to authenticate to Databricks REST APIs.

To create an OAuth secret for the service principal:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, select the service principal you created.
  5. On the service principal page, under OAuth secrets, click Generate secret.
  6. From the Generate secret dialog, copy the Secret and Client ID and store them in a secure location.
    🚨 Careful! Note that this secret will only be revealed once during creation. The client ID is the same as the application ID of the service principal.
  7. Once you've copied the client ID and secret, click Done.
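
With the client ID and secret, the service principal authenticates to Databricks REST APIs via an OAuth client-credentials (machine-to-machine) token exchange. This sketch only builds the request (the host is a placeholder; the /oidc/v1/token path and all-apis scope follow Databricks' documented M2M flow, but verify them against current docs):

```python
from urllib.parse import urlencode

def m2m_token_request(workspace_host: str, client_id: str, client_secret: str):
    """Build the OAuth client-credentials request for a Databricks
    service principal. The client ID and secret are sent via HTTP
    Basic auth (returned here as a tuple, not sent)."""
    url = f"https://{workspace_host}/oidc/v1/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",  # request access to all Databricks REST APIs
    })
    return url, body, (client_id, client_secret)

url, body, auth = m2m_token_request(
    "dbc-a1b2c3d4-e5f6.cloud.databricks.com", "client-id", "client-secret"
)
```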

Grant permissions to access catalogs

You will need to grant permissions to the newly created service principal to allow access to your Unity Catalog-enabled AWS Databricks workspace.

To import Databricks tags, you must have a Unity Catalog-enabled workspace and a SQL warehouse configured. No additional permissions are required to import tags other than the ones outlined below. To enable reverse sync for imported Databricks tags from Atlan to Databricks, you will need to grant additional permissions.

To grant permissions to the service principal:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select your workspace.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, enter the following details:
    1. Under Principals, click the dropdown and then select the newly created service principal.
    2. Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges — USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
    3. At the bottom of the dialog, click Grant.
  6. (Optional) To push tags updated for assets in Atlan to Databricks, you must assign the APPLY TAG privilege on the object, the USE SCHEMA privilege on the object’s parent schema, and the USE CATALOG privilege on the object’s parent catalog.

Grant permissions to mine query history

To mine query history, you will need to assign the CAN MANAGE permission on your SQL warehouses to the AWS service principal.

To grant permissions to mine query history:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click SQL Warehouses.
  3. On the Compute page, for each SQL warehouse for which you want to mine query history, click the 3-dot icon and then click Permissions.
  4. In the Manage permissions dialog, configure the following:
    1. In the Type to add multiple users or groups field, search for and select the AWS service principal you created.
    2. Expand the Can use permissions dropdown and then select Can manage. This permission allows the service principal to view all queries for the warehouse.
    3. Click Add to assign the CAN MANAGE permission to the service principal.
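
The same grant can be scripted through the Databricks Permissions API. This sketch only builds the PATCH request; the endpoint path and payload shape are assumptions based on the Permissions API reference, so verify them against current Databricks docs before use:

```python
import json

def warehouse_can_manage_patch(warehouse_id: str, sp_application_id: str):
    """Build a Permissions API PATCH granting CAN_MANAGE on a SQL
    warehouse to a service principal (identified by its application ID)."""
    path = f"/api/2.0/permissions/warehouses/{warehouse_id}"
    payload = {
        "access_control_list": [{
            "service_principal_name": sp_application_id,
            "permission_level": "CAN_MANAGE",
        }]
    }
    return path, json.dumps(payload)

path, body = warehouse_can_manage_patch("123ab4c5def67890", "app-client-id")
```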

Azure service principal authentication

🤓 Who can do this? You will need your Azure Databricks account admin to create a service principal and your Azure Databricks workspace admin to add the service principal to your Azure Databricks workspace — you may not have access yourself.

Atlan currently only supports the REST API extraction method when using Azure service principal authentication to crawl Databricks. Complete the following steps to authenticate the connection in Atlan.

Create a service principal

To use service principals on Azure Databricks, an admin user must create a new Microsoft Entra ID (formerly Azure Active Directory) application and then add it to the Azure Databricks workspace to use as a service principal.

To create a service principal:

  1. Sign in to the Azure portal.
  2. If you have access to multiple tenants, subscriptions, or directories, click the Directories + subscriptions (directory with filter) icon in the top menu to switch to the directory in which you want to create the service principal.
  3. In Search resources, services, and docs, search for and select Microsoft Entra ID.
  4. Click + Add and select App registration.
  5. For Name, enter a name for the application.
  6. In the Supported account types section, select Accounts in this organizational directory only (Single tenant) and then click Register.
  7. On the application page’s Overview page, in the Essentials section, copy and store the following values in a secure location:
    • Application (client) ID
    • Directory (tenant) ID
  8. To generate a client secret, within Manage, click Certificates & secrets.
  9. On the Client secrets tab, click New client secret.
  10. In the Add a client secret dialog, enter the following details:
    1. For Description, enter a description for the client secret.
    2. For Expires, select an expiry time period for the client secret and then click Add.
    3. Copy and store the client secret’s Value in a secure place.
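
The three values saved above (tenant ID, client ID, client secret) are what the service principal uses to request an Azure Databricks access token from Microsoft Entra ID. This sketch builds that client-credentials request without sending it; the resource application ID below is the well-known Azure Databricks first-party app ID, but confirm it against Microsoft's documentation:

```python
from urllib.parse import urlencode

# Well-known application ID of the AzureDatabricks first-party app:
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def entra_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Build the Microsoft Entra ID client-credentials request that
    exchanges the app registration's credentials for a Databricks token."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    })
    return url, body

# Placeholder tenant/credentials:
url, body = entra_token_request("my-tenant-id", "app-client-id", "s3cret")
```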

Add a service principal to your account

To add a service principal to your Azure Databricks account:

  1. Log in to your Azure Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal.
  6. Under UUID, paste the Application (client) ID for the service principal.
  7. Click Add.

Assign a service principal to a workspace

To add users to a workspace using the account console, the workspace must be enabled for identity federation. Workspace admins can also assign service principals to workspaces using the workspace admin settings page.

Identity federation enabled

To assign a service principal to your Azure Databricks account:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  3. From the tabs along the top of your workspace page, click the Permissions tab.
  4. In the upper right of the Permissions page, click Add permissions.
  5. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown to select workspace User.

Identity federation disabled

To assign a service principal to your Azure Databricks workspace:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, paste the Application (client) ID for the service principal, enter a display name, and then click Add.

Grant permissions to access catalogs

You will need to grant permissions to the newly created service principal to allow access to your Unity Catalog-enabled Azure Databricks workspace.

To import Databricks tags, you must have a Unity Catalog-enabled workspace and a SQL warehouse configured. No additional permissions are required to import tags other than the ones outlined below. To enable reverse sync for imported Databricks tags from Atlan to Databricks, you will need to grant additional permissions.

To grant permissions to the service principal:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select your workspace.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, enter the following details:
    1. Under Principals, click the dropdown and then select the newly created service principal.
    2. Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges — USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
    3. At the bottom of the dialog, click Grant.
  6. (Optional) To push tags updated for assets in Atlan to Databricks, you must assign the APPLY TAG privilege on the object, the USE SCHEMA privilege on the object’s parent schema, and the USE CATALOG privilege on the object’s parent catalog.

Grant permissions to mine query history

To mine query history, you will need to assign the CAN MANAGE permission on your SQL warehouses to the Azure service principal.

To grant permissions to mine query history:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click SQL Warehouses.
  3. On the Compute page, for each SQL warehouse for which you want to mine query history, click the 3-dot icon and then click Permissions.
  4. In the Manage permissions dialog, configure the following:
    1. In the Type to add multiple users or groups field, search for and select the Azure service principal you created.
    2. Expand the Can use permissions dropdown and then select Can manage. This permission allows the service principal to view all queries for the warehouse.
    3. Click Add to assign the CAN MANAGE permission to the service principal.

(Optional) Grant access to system tables for lineage extraction

Atlan supports extracting lineage for your Databricks assets using system tables. You must have a Unity Catalog-enabled workspace to use system tables.

Enable a system schema

You will need your account admin to enable the system.access schema using the SystemSchemas API. This will allow Atlan to extract lineage using system tables.

To enable a system schema, follow the steps in the Databricks documentation:

  • Replace {schema_name} with access.

To confirm that a system schema is enabled, follow the steps in the Databricks documentation. If enabled for any given schema, the state will be EnableCompleted.
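
The enablement call amounts to a single REST request per schema. This sketch builds that request without sending it (the host and metastore ID are placeholders, and the path is an assumption based on the Unity Catalog SystemSchemas API reference; verify against current Databricks docs):

```python
def enable_system_schema_request(workspace_host: str, metastore_id: str,
                                 schema_name: str = "access"):
    """Build the PUT request that enables a system schema via the
    Unity Catalog SystemSchemas API."""
    method = "PUT"
    url = (f"https://{workspace_host}/api/2.1/unity-catalog/metastores/"
           f"{metastore_id}/systemschemas/{schema_name}")
    return method, url

# Placeholder host and metastore ID:
method, url = enable_system_schema_request(
    "dbc-a1b2c3d4-e5f6.cloud.databricks.com", "aaaa-bbbb-cccc"
)
```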

Grant permissions

Atlan supports extracting Databricks lineage using system tables for all three authentication methods.

Once you have created a personal access token, an AWS service principal, or an Azure service principal, you will need to grant the following permissions:

  • CAN_USE on a SQL warehouse.
  • USE SCHEMA on system.access.
  • SELECT on system.access.table_lineage and system.access.column_lineage.
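
The schema and table privileges above correspond to the following GRANT statements, shown here for a hypothetical principal name (run the SQL from a SQL warehouse as a metastore admin):

```python
def lineage_grants(principal: str) -> list[str]:
    """GRANTs for lineage extraction from the system.access tables."""
    return [
        f"GRANT USE SCHEMA ON SCHEMA system.access TO `{principal}`;",
        f"GRANT SELECT ON TABLE system.access.table_lineage TO `{principal}`;",
        f"GRANT SELECT ON TABLE system.access.column_lineage TO `{principal}`;",
    ]

for stmt in lineage_grants("atlan-crawler@example.com"):
    print(stmt)
```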

You will need to create a Databricks connection in Atlan for each metastore. You can use the hostname of your Unity Catalog-enabled workspace as the Host for the connection.

Locate warehouse ID

To extract lineage using system tables, you will also need the warehouse ID of your SQL warehouse.

To locate the warehouse ID:

  1. Log in to your Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click SQL Warehouses.
  3. On the Compute page, select the warehouse you want to use.
  4. From the Overview tab of your warehouse page, next to the Name of your warehouse, copy the value for your SQL warehouse ID. For example, example-warehouse (ID: 123ab4c5def67890), copy the value 123ab4c5def67890 and store it in a secure location.
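
The warehouse ID is also the last segment of the warehouse's HTTP path from the Connection details tab, so you can cross-check the value you copied. A small sketch, using the example ID from the step above:

```python
def warehouse_id_from_http_path(http_path: str) -> str:
    """Extract the SQL warehouse ID from an HTTP path of the form
    /sql/1.0/warehouses/<id> (the Connection details value)."""
    return http_path.rstrip("/").rsplit("/", 1)[-1]

wid = warehouse_id_from_http_path("/sql/1.0/warehouses/123ab4c5def67890")
```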
