How to set up Databricks

Atlan supports three authentication methods for fetching metadata from Databricks. You can set up any of the following:

Personal access token authentication

🤓 Who can do this? Check that you have Admin and Databricks SQL access for the Databricks workspace. This is required for both cluster options described below. If you do not have this access, contact your Databricks administrator.

Grant user access to workspace

To grant workspace access to the user creating a personal access token:

  1. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the user.
  2. From the tabs along the top of your workspace page, click the Permissions tab.
  3. In the upper right of the Permissions page, click Add permissions.
  4. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the user to grant access.
    2. For Permission, click the dropdown and select workspace User.

Generate a personal access token

You can generate a personal access token in your Databricks workspace to authenticate the integration in Atlan.

To generate a personal access token:

  1. From the top right of your Databricks workspace, click your Databricks username, and then from the dropdown, click User Settings.
  2. Under the Settings menu, click Developer.
  3. On the Developer page, next to Access tokens, click Manage.
  4. On the Access tokens page, click the Generate new token button.
  5. In the Generate new token dialog:
    1. For Comment, enter a description of the token's intended use — for example, Atlan crawler.
    2. For Lifetime (days), consider removing the number. This will allow the token to be used indefinitely — it will not need to be refreshed.
      🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate it and update Atlan's crawler configuration with the new token each time.
    3. At the bottom of the dialog, click Generate.
  6. Copy and save the generated token in a secure location, and then click Done.
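Once you have the token, you can sanity-check it against the Databricks REST API before entering it in Atlan. The sketch below is a minimal, stdlib-only example; the workspace URL and token values are placeholders you substitute with your own.

```python
import json
import urllib.request

def pat_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build a request against the Databricks REST API, authenticated
    with a personal access token in the Authorization header."""
    return urllib.request.Request(
        url=f"{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

def whoami(host: str, token: str) -> str:
    """Return the username the token authenticates as, using the
    SCIM 'Me' endpoint. A 403 here means the token or workspace
    access is misconfigured."""
    req = pat_request(host, token, "/api/2.0/preview/scim/v2/Me")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["userName"]

# Example (placeholders -- substitute your workspace URL and token):
# whoami("https://<your-workspace>.cloud.databricks.com", "<your-token>")
```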

Select a cluster

💪 Did you know? Atlan recommends using serverless SQL warehouses for instant compute availability. To enable serverless SQL warehouses, refer to Databricks documentation for AWS Databricks workspaces or Microsoft documentation for Azure Databricks workspaces.

You can set up personal access token authentication for your Databricks instance using one of the following cluster options:

  • Interactive cluster
  • SQL warehouse (formerly SQL endpoint)

Interactive cluster

To confirm an all-purpose interactive cluster is configured:

  1. From the left menu of any page of your Databricks instance, click Compute.
  2. Under the All-purpose clusters tab, ensure you have a cluster defined.
  3. Click the link under the Name column of the table to open your cluster.
  4. Under the Configuration tab, ensure the Autopilot option Terminate after ... minutes is enabled.
  5. At the bottom of the Configuration tab, expand Advanced options.
    1. Under Advanced options, open the JDBC/ODBC tab.
    2. Confirm that all of the fields in this tab are populated, and copy them for use in crawling: Server Hostname, Port, and HTTP Path.
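The three JDBC/ODBC fields above combine into a single JDBC URL if you want to verify connectivity with a SQL client before crawling. The sketch below assumes the current Databricks JDBC driver's URL format (older drivers used a jdbc:spark:// prefix); the hostname and path values are placeholders.

```python
def jdbc_url(server_hostname: str, port: int, http_path: str) -> str:
    """Assemble a Databricks JDBC URL from the Server Hostname, Port,
    and HTTP Path fields copied from the JDBC/ODBC tab."""
    return (
        f"jdbc:databricks://{server_hostname}:{port}/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};"
        # AuthMech=3: authenticate with UID=token and the personal
        # access token supplied as the password.
        "AuthMech=3;UID=token"
    )

# Example with placeholder values:
# jdbc_url("<workspace>.cloud.databricks.com", 443, "sql/protocolv1/o/0/0101-xyz")
```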

SQL warehouse (formerly SQL endpoint)

To confirm a SQL warehouse is configured:

  1. From the left menu of any page of your Databricks instance, open the dropdown just below the Databricks logo and switch to SQL.
  2. From the refreshed left menu, click SQL Warehouses.
  3. Click the link under the Name column of the table to open your SQL warehouse.
  4. Under the Connection details tab, confirm that all of the fields are populated and copy them for use in crawling: Server hostname, Port, and HTTP path.

AWS service principal authentication

🤓 Who can do this? You will need your AWS Databricks account admin to create a service principal and manage OAuth credentials for the service principal and your AWS Databricks workspace admin to add the service principal to your AWS Databricks workspace — you may not have access yourself.

Atlan currently supports only the REST API extraction method when using AWS service principal authentication to crawl Databricks. To authenticate the connection in Atlan, you will need a service principal and an OAuth secret for it:

Create a service principal

You can create a service principal directly in your Databricks account or from a Databricks workspace.

  • Identity federation enabled on your workspaces: Databricks recommends creating the service principal in the account and assigning it to workspaces.
  • Identity federation disabled on your workspaces: Databricks recommends that you create your service principal from a workspace.

Identity federation enabled

To create a service principal from your Databricks account, with identity federation enabled:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal and then click Add.
  6. Once the service principal has been created, you can assign it to your identity federated workspace. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  7. From the tabs along the top of your workspace page, click the Permissions tab.
  8. In the upper right of the Permissions page, click Add permissions.
  9. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown and select workspace User.

Identity federation disabled

To create a service principal from a Databricks workspace, with identity federation disabled:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage. 
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, enter a name for the service principal and then click Add.

Create an OAuth secret for the service principal

You will need to create an OAuth secret to authenticate to Databricks REST APIs.

To create an OAuth secret for the service principal:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, select the service principal you created.
  5. On the service principal page, under OAuth secrets, click Generate secret.
  6. From the Generate secret dialog, copy the Secret and Client ID and store them in a secure location.
    🚨 Careful! Note that this secret will only be revealed once during creation. The client ID is the same as the application ID of the service principal.
  7. Once you've copied the client ID and secret, click Done.
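The client ID and secret are exchanged for a short-lived access token via Databricks' OAuth machine-to-machine (client credentials) flow. The sketch below builds that token request; the host, client ID, and secret values are placeholders.

```python
import base64
import urllib.parse
import urllib.request

def oauth_token_request(host: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build the client-credentials request that exchanges the service
    principal's OAuth secret for an access token, following Databricks'
    OAuth M2M flow (POST to the workspace /oidc/v1/token endpoint)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": "all-apis",  # request a token valid for Databricks REST APIs
    }).encode()
    # Client ID and secret are sent as HTTP Basic credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url=f"{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` returns a JSON body whose `access_token` field is the bearer token for subsequent API calls.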

Azure service principal authentication

🤓 Who can do this? You will need your Azure Databricks account admin to create a service principal and your Azure Databricks workspace admin to add the service principal to your Azure Databricks workspace — you may not have access yourself.

Atlan currently supports only the REST API extraction method when using Azure service principal authentication to crawl Databricks. To authenticate the connection in Atlan, you will need a Microsoft Entra ID application registered as a service principal, along with its client secret:

Create a service principal

To use service principals on Azure Databricks, an admin user must create a new Microsoft Entra ID (formerly Azure Active Directory) application and then add it to the Azure Databricks workspace to use as a service principal.

To create a service principal:

  1. Sign in to the Azure portal.
  2. If you have access to multiple tenants, subscriptions, or directories, click the Directories + subscriptions (directory with filter) icon in the top menu to switch to the directory in which you want to create the service principal.
  3. In Search resources, services, and docs, search for and select Microsoft Entra ID.
  4. Click + Add and select App registration.
  5. For Name, enter a name for the application.
  6. In the Supported account types section, select Accounts in this organizational directory only (Single tenant) and then click Register.
  7. On the application page’s Overview page, in the Essentials section, copy and store the following values in a secure location:
    • Application (client) ID
    • Directory (tenant) ID
  8. To generate a client secret, within Manage, click Certificates & secrets.
  9. On the Client secrets tab, click New client secret.
  10. In the Add a client secret dialog, enter the following details:
    1. For Description, enter a description for the client secret.
    2. For Expires, select an expiry time period for the client secret and then click Add.
    3. Copy and store the client secret’s Value in a secure place.
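The three values you stored (tenant ID, client ID, and client secret) are used in a Microsoft Entra ID client-credentials token request. The sketch below builds the endpoint URL and form body; the well-known resource ID for Azure Databricks is used as the OAuth scope, and the argument values are placeholders.

```python
import urllib.parse

# Well-known application ID of the Azure Databricks resource,
# used as the OAuth scope in the client-credentials flow.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

def entra_token_payload(tenant_id: str, client_id: str,
                        client_secret: str) -> tuple:
    """Return the Microsoft Entra ID token endpoint and the
    form-encoded body for the client-credentials flow."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    }).encode()
    return url, body
```

POSTing this body to the URL returns a JSON response whose `access_token` field authenticates calls to Azure Databricks APIs.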

Add a service principal to your account

To add a service principal to your Azure Databricks account:

  1. Log in to your Azure Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal.
  6. Under UUID, paste the Application (client) ID for the service principal.
  7. Click Add.

Assign a service principal to a workspace

To add users to a workspace using the account console, the workspace must be enabled for identity federation. Workspace admins can also assign service principals to workspaces using the workspace admin settings page.

Identity federation enabled

To assign a service principal to your Azure Databricks account:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  3. From the tabs along the top of your workspace page, click the Permissions tab.
  4. In the upper right of the Permissions page, click Add permissions.
  5. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown to select workspace User.

Identity federation disabled

To assign a service principal to your Azure Databricks workspace:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage. 
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, paste the Application (client) ID for the service principal, enter a display name, and then click Add.

Grant permissions to crawl metadata

You must have a Unity Catalog-enabled Databricks workspace to crawl metadata in Atlan.

To extract metadata, you can grant the BROWSE privilege, currently in public preview. The Data Reader preset, which granted the USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT privileges on objects in the catalog, is no longer required for crawling.

To grant permissions to a user or service principal:

  1. Log in to your Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select the catalog you want to crawl in Atlan.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, configure the following:
    1. Under Principals, click the dropdown and then select the user or service principal.
    2. Under Privileges, check the BROWSE privilege.
    3. At the bottom of the dialog, click Grant.
  6. (Optional) Repeat steps 3-5 for each catalog you want to crawl in Atlan.
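If you prefer SQL to the Catalog Explorer steps above, the same grant can be issued per catalog from a SQL editor as a workspace admin. The helper below generates those statements; the catalog and principal names are placeholders.

```python
def browse_grants(catalogs, principal):
    """Generate one BROWSE grant per catalog to crawl, equivalent to
    the Catalog Explorer steps above. Run each statement in a SQL
    editor attached to a warehouse, as a workspace admin."""
    return [f"GRANT BROWSE ON CATALOG {c} TO `{principal}`;" for c in catalogs]

# Example with placeholder names:
# browse_grants(["sales", "finance"], "atlan-service-principal")
```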

(Optional) Grant permissions to query and preview data

🚨 Careful! Atlan currently only supports querying data and viewing sample data preview for the personal access token authentication method.

To grant permissions to query data and preview sample data:

  1. Log in to your Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select the catalog you want to query and preview data from in Atlan.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, configure the following:
    1. Under Principals, click the dropdown and then select the user or service principal.
    2. Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges — USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
    3. At the bottom of the dialog, click Grant.
  6. (Optional) Repeat steps 3-5 for each catalog you want to query and preview data from in Atlan.

(Optional) Grant permissions to import and update tags

To import Databricks tags, you must have a Unity Catalog-enabled workspace and a SQL warehouse configured. Atlan supports importing Databricks tags using system tables for all three authentication methods.

Once you have created a personal access token, an AWS service principal, or an Azure service principal, you will need to grant the following privileges:

  • CAN_USE on a SQL warehouse
  • USE CATALOG on system catalog
  • USE SCHEMA on system.information_schema
  • SELECT on the following tables:
    • system.information_schema.catalog_tags
    • system.information_schema.schema_tags
    • system.information_schema.table_tags
    • system.information_schema.column_tags

To push tags updated for assets in Atlan to Databricks, you will need to grant the following privileges:

  • APPLY TAG on the object
  • USE CATALOG on the object’s parent catalog
  • USE SCHEMA on the object’s parent schema
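The tag-import privileges listed above can also be granted in SQL. The helper below generates the statements for the system catalog and tag tables; the principal name is a placeholder. Note that CAN_USE on a SQL warehouse is a workspace permission set in the warehouse's Permissions dialog, not a SQL grant, and the APPLY TAG grants for pushing tags follow the same GRANT ... ON ... TO pattern.

```python
TAG_TABLES = [
    "system.information_schema.catalog_tags",
    "system.information_schema.schema_tags",
    "system.information_schema.table_tags",
    "system.information_schema.column_tags",
]

def tag_import_grants(principal):
    """Generate the SQL grants for importing Databricks tags from
    system tables, matching the privilege list above."""
    stmts = [
        f"GRANT USE CATALOG ON CATALOG system TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA system.information_schema TO `{principal}`;",
    ]
    stmts += [f"GRANT SELECT ON TABLE {t} TO `{principal}`;" for t in TAG_TABLES]
    return stmts
```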

(Optional) Grant permissions to extract lineage and usage from system tables

You must have a Unity Catalog-enabled workspace to use system tables.

Atlan supports extracting lineage as well as usage and popularity metrics for your Databricks assets using system tables.

Enable system.access schema

You will need your account admin to enable the system.access schema using the SystemSchemas API. This will allow Atlan to extract lineage using system tables.

In Atlan, one Databricks connection corresponds to one metastore. Repeat the following process for each metastore in your Databricks environment for which you want to extract lineage.

To enable a system schema, refer to Databricks documentation:

  • Use the SystemSchemas API to enable the system.access schema for a given metastore.
  • Replace {schema_name} with access.

To ensure that system schemas are enabled for each schema, follow the steps in Databricks documentation.

  • List system schemas using the SystemSchemas API to check the status.
  • If enabled for any given schema, the state will be EnableCompleted. This confirms that the schema has been enabled for that specific metastore.
  • Atlan will only be able to extract lineage using system tables when the state is marked as EnableCompleted.
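The enable and status-check calls above can be sketched against the SystemSchemas API as follows. This is a minimal, stdlib-only example assuming the 2.1 Unity Catalog API path; the host, token, and metastore ID are placeholders, and the same calls with `query` as the schema name cover the optional usage schema below.

```python
import urllib.request

API_BASE = "/api/2.1/unity-catalog/metastores"

def enable_system_schema(host: str, token: str, metastore_id: str,
                         schema_name: str) -> urllib.request.Request:
    """Build the SystemSchemas API call that enables a system schema
    (for example 'access', or 'query') on a given metastore."""
    return urllib.request.Request(
        url=f"{host}{API_BASE}/{metastore_id}/systemschemas/{schema_name}",
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )

def list_system_schemas(host: str, token: str,
                        metastore_id: str) -> urllib.request.Request:
    """Build the call that lists system schemas so you can confirm
    the state is EnableCompleted."""
    return urllib.request.Request(
        url=f"{host}{API_BASE}/{metastore_id}/systemschemas",
        headers={"Authorization": f"Bearer {token}"},
    )
```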

(Optional) Enable system.query schema

This is only required if you also want to extract usage and popularity metrics from Databricks.

You will need your account admin to enable the system.query schema using the SystemSchemas API. This will allow Atlan to mine query history using system tables for usage and popularity metrics.

To enable a system schema, refer to Databricks documentation:

  • Replace {schema_name} with query.

To ensure that system schemas are enabled for each schema, follow the steps in Databricks documentation. If enabled for any given schema, the state will be EnableCompleted.

Grant permissions

Atlan supports extracting Databricks lineage and usage and popularity metrics using system tables for all three authentication methods.

Once you have created a personal access token, an AWS service principal, or an Azure service principal, you will need to grant the following permissions:

  • CAN_USE on a SQL warehouse
  • USE CATALOG on system catalog
  • USE SCHEMA on system.access schema
  • USE SCHEMA on system.query schema (to mine query history for usage and popularity metrics)
  • SELECT on the following tables:
    • system.query.history (to mine query history for usage and popularity metrics)
    • system.access.table_lineage
    • system.access.column_lineage
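The system-table privileges above can likewise be granted in SQL. The helper below generates the statements; the principal name is a placeholder, and CAN_USE on the SQL warehouse is granted separately in the warehouse's Permissions dialog rather than via SQL.

```python
def system_table_grants(principal, include_query_history=True):
    """Generate the SQL grants for lineage extraction from system
    tables, plus (optionally) the query-history grants used for
    usage and popularity metrics."""
    stmts = [
        f"GRANT USE CATALOG ON CATALOG system TO `{principal}`;",
        f"GRANT USE SCHEMA ON SCHEMA system.access TO `{principal}`;",
        f"GRANT SELECT ON TABLE system.access.table_lineage TO `{principal}`;",
        f"GRANT SELECT ON TABLE system.access.column_lineage TO `{principal}`;",
    ]
    if include_query_history:  # usage and popularity metrics
        stmts += [
            f"GRANT USE SCHEMA ON SCHEMA system.query TO `{principal}`;",
            f"GRANT SELECT ON TABLE system.query.history TO `{principal}`;",
        ]
    return stmts
```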

You will need to create a Databricks connection in Atlan for each metastore. You can use the hostname of your Unity Catalog-enabled workspace as the Host for the connection.

Locate warehouse ID

To extract lineage and usage and popularity metrics using system tables, you will also need the warehouse ID of your SQL warehouse.

To locate the warehouse ID:

  1. Log in to your Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click SQL Warehouses.
  3. On the Compute page, select the warehouse you want to use.
  4. From the Overview tab of your warehouse page, next to the Name of your warehouse, copy the value of your SQL warehouse ID and store it in a secure location. For example, for example-warehouse (ID: 123ab4c5def67890), copy 123ab4c5def67890.
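If you are scripting this step, the ID can be pulled out of the "name (ID: ...)" string shown on the Overview tab. A small sketch:

```python
import re

def warehouse_id(display: str) -> str:
    """Extract the warehouse ID from the 'name (ID: ...)' string
    shown next to the warehouse name on the Overview tab."""
    match = re.search(r"\(ID:\s*([0-9a-f]+)\)", display)
    if match is None:
        raise ValueError(f"no warehouse ID found in {display!r}")
    return match.group(1)
```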

(Optional) Grant permissions to mine query history

To mine query history using REST API, you will need to assign the CAN MANAGE permission on your SQL warehouses to the user or service principal.

To grant permissions to mine query history:

  1. Log in to your Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click SQL Warehouses.
  3. On the Compute page, for each SQL warehouse you want to mine query history, click the 3-dot icon and then click Permissions.
  4. In the Manage permissions dialog, configure the following:
    1. In the Type to add multiple users or groups field, search for and select a user or service principal.
    2. Expand the Can use permissions dropdown and then select Can manage. This permission allows the service principal to view all queries for the warehouse.
    3. Click Add to assign the CAN MANAGE permission to the service principal.
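Once CAN MANAGE is in place, query history for a warehouse is available through the Query History REST API. The sketch below builds that call, assuming the `filter_by.warehouse_ids` query parameter accepted by the API; the host, token, and warehouse ID are placeholders.

```python
import urllib.parse
import urllib.request

def query_history_request(host: str, token: str,
                          warehouse_id: str) -> urllib.request.Request:
    """Build the Query History API call that lists queries for a
    given SQL warehouse. CAN MANAGE on the warehouse is required
    to see queries from all users."""
    params = urllib.parse.urlencode({"filter_by.warehouse_ids": warehouse_id})
    return urllib.request.Request(
        url=f"{host}/api/2.0/sql/history/queries?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )
```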
