How to set up Databricks

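Atlan supports three authentication methods for fetching metadata from Databricks. Set up authentication for your Databricks instance using one of the following options:

  • Personal access token authentication
  • AWS service principal authentication
  • Azure service principal authentication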

💪 Did you know? Atlan currently supports querying data and viewing sample data previews only with the personal access token authentication method.

Personal access token authentication

🤓 Who can do this? Check that you have Admin and Databricks SQL access for the Databricks workspace. This is required for both authentication options described below. If you do not have this access, contact your Databricks administrator.

Set up personal access token authentication for your Databricks instance using one of the following two options:

  • Interactive cluster
  • SQL warehouse (formerly SQL endpoint)

Interactive cluster

Confirm cluster setup

To confirm an all-purpose interactive cluster is configured:

  1. From the left menu of any page of your Databricks instance, click Compute.
  2. Under the All-purpose clusters tab ensure you have a cluster defined.
  3. Click the link under the Name column of the table to open your cluster.
  4. Under the Configuration tab, ensure that the Autopilot option to Terminate after ... minutes is enabled.
  5. At the bottom of the Configuration tab, expand the Advanced options expandable.
    1. Under the Advanced options expandable, open the JDBC/ODBC tab.
    2. Confirm that all of the fields in this tab are populated, and copy them for use in crawling: Server Hostname, Port, and HTTP Path.
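
Before entering the copied values anywhere, it can help to sanity-check them. The following is a minimal Python sketch, not part of Atlan or Databricks: the ConnectionDetails class, its field names, and the placeholder values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionDetails:
    """Hypothetical holder for the three values copied from the JDBC/ODBC tab."""
    server_hostname: str
    port: int
    http_path: str

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means nothing obvious is wrong."""
        issues = []
        if not self.server_hostname or "://" in self.server_hostname:
            issues.append("server_hostname should be a bare host name, without a scheme")
        if not 1 <= self.port <= 65535:
            issues.append("port must be between 1 and 65535")
        if not self.http_path.strip("/"):
            issues.append("http_path must not be empty")
        return issues


# Placeholder values for illustration; substitute what your Databricks instance shows.
details = ConnectionDetails("example.cloud.databricks.com", 443, "sql/protocolv1/o/0/example")
print(details.problems())
```

The check is deliberately shallow: it only catches values that were left empty or pasted with a URL scheme, and it does not contact Databricks.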

Generate a personal access token

To generate a personal access token:

  1. From the left menu of any page of your Databricks instance, at the bottom, click Settings and then User Settings.
  2. Under the Access tokens tab, click the Generate new token button.
  3. In the Generate new token dialog:
    1. For Comment enter a description of the token's intended use (for example, Atlan crawler).
    2. For Lifetime (days) consider removing the number. This will allow the token to be used indefinitely; it will not need to be refreshed.
      🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate the token and update Atlan's crawler configuration with the new token each time.
    3. At the bottom of the dialog click Generate.
  4. Copy and save the generated token somewhere, and then click Done.
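
If you do set a lifetime, you may want to note when the token must be replaced. The sketch below is a hedged illustration in Python: the function names and the 7-day lead time are assumptions, not anything defined by Atlan or Databricks.

```python
from datetime import date, timedelta
from typing import Optional


def token_expiry(created: date, lifetime_days: Optional[int]) -> Optional[date]:
    """Return the date the token stops working, or None if no lifetime was entered."""
    if lifetime_days is None:
        return None
    return created + timedelta(days=lifetime_days)


def rotate_by(created: date, lifetime_days: Optional[int], lead_days: int = 7) -> Optional[date]:
    """Return the date by which to regenerate the token, lead_days before it expires."""
    expiry = token_expiry(created, lifetime_days)
    return None if expiry is None else expiry - timedelta(days=lead_days)


# Example: a token created on 1 January 2024 with a 90-day lifetime.
print(rotate_by(date(2024, 1, 1), 90))
```

Passing None for the lifetime models a token created with the Lifetime (days) field left empty, in which case there is no rotation date to track.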

SQL warehouse (formerly SQL endpoint)

Confirm warehouse setup

To confirm a SQL warehouse is configured:

  1. From the left menu of any page of your Databricks instance, open the drop-down just below the databricks logo and change to SQL.
  2. From the refreshed left menu, click SQL Warehouses.
  3. Click the link under the Name column of the table to open your SQL warehouse.
  4. Under the Connection details tab, confirm that all of the fields are populated and copy them for use in crawling: Server hostname, Port, and HTTP path.

Generate a personal access token

To generate a personal access token:

  1. In the lower-right corner of the Connection details tab of the SQL warehouse, click the link to Create a personal access token.
  2. In the resulting User Settings page, under the Personal access tokens tab, click the Generate new token button.
  3. In the Generate token dialog:
    1. For Comment enter a description of the token's intended use (for example, Atlan crawler).
    2. For Lifetime (days) consider removing the number. This will allow the token to be used indefinitely; it will not need to be refreshed.
      🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate the token and update Atlan's crawler configuration with the new token each time.
    3. At the bottom of the dialog click Generate.
  4. Copy and save the generated token somewhere, and then click Done.
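
One way to check that the server hostname and the new token work together is to call a Databricks REST endpoint with the token as a bearer credential. The Python sketch below only builds such a request for the SQL Warehouses list endpoint and does not send it. The helper name, host, and token are placeholders chosen for illustration, and this is not how Atlan itself validates a connection.

```python
import urllib.request


def build_warehouse_list_request(server_hostname: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists SQL warehouses."""
    url = f"https://{server_hostname}/api/2.0/sql/warehouses"
    # Personal access tokens are passed as a bearer token in the Authorization header.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values; never hard-code a real token in source files.
request = build_warehouse_list_request("example.cloud.databricks.com", "dapi-EXAMPLE-TOKEN")
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```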

(Optional) Set permissions for Unity Catalog

If you're managing permissions selectively through Unity Catalog, then Atlan will require certain privileges to crawl metadata for tables and views.

The caller must be a metastore admin or an owner of (or have the SELECT privilege on) the tables. For the latter case, the caller must also be the owner or have the USE_CATALOG privilege on the parent catalog and the USE_SCHEMA privilege on the parent schema.
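
The three privileges above can be granted with SQL, where they are written with spaces (USE CATALOG, USE SCHEMA) rather than underscores. As a rough sketch, the Python helper below generates the corresponding GRANT statements for one table. The function names and the example catalog, schema, table, and principal are hypothetical; review the output before running it in a Databricks SQL editor.

```python
def quote(name: str) -> str:
    """Wrap an identifier in backticks, doubling any backticks it contains."""
    return "`" + name.replace("`", "``") + "`"


def grants_for_table(principal: str, catalog: str, schema: str, table: str) -> list[str]:
    """Return GRANT statements covering the parent catalog, parent schema, and the table."""
    p, c, s, t = quote(principal), quote(catalog), quote(schema), quote(table)
    return [
        f"GRANT USE CATALOG ON CATALOG {c} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {c}.{s} TO {p};",
        f"GRANT SELECT ON TABLE {c}.{s}.{t} TO {p};",
    ]


# Hypothetical names, for illustration only.
for statement in grants_for_table("atlan-crawler@example.com", "main", "sales", "orders"):
    print(statement)
```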

AWS service principal authentication

🤓 Who can do this? You will need your AWS Databricks account admin to create a service principal and manage OAuth credentials for the service principal, and your AWS Databricks workspace admin to add the service principal to your AWS Databricks workspace. You may not have access yourself.

Atlan currently supports only the REST API extraction method when using AWS service principal authentication to crawl Databricks. You will need the following to authenticate the connection in Atlan:

  • Client ID
  • Client secret

Create a service principal

You can create a service principal directly in your Databricks account or from a Databricks workspace.

  • Identity federation enabled on your workspaces: Databricks recommends creating the service principal in the account and assigning it to workspaces.
  • Identity federation disabled on your workspaces: Databricks recommends that you create your service principal from a workspace.

Identity federation enabled

To create a service principal from your Databricks account, with identity federation enabled:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal and then click Add.
  6. Once the service principal has been created, you can assign it to your identity federated workspace. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  7. From the tabs along the top of your workspace page, click the Permissions tab.
  8. In the upper right of the Permissions page, click Add permissions.
  9. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown and select workspace User.

Identity federation disabled

To create a service principal from a Databricks workspace, with identity federation disabled:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, enter a name for the service principal and then click Add.

Create an OAuth secret for the service principal

You will need to create an OAuth secret to authenticate to Databricks REST APIs.

To create an OAuth secret for the service principal:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, select the service principal you created.
  5. On the service principal page, under OAuth secrets, click Generate secret.
  6. From the Generate secret dialog, copy the Secret and Client ID and store them in a secure location.
    🚨 Careful! Note that this secret will only be revealed once during creation. The client ID is the same as the application ID of the service principal.
  7. Once you've copied the client ID and secret, click Done.
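
To illustrate how the client ID and secret are used against Databricks REST APIs, the sketch below builds an OAuth client-credentials token request in Python. It assumes the workspace-level token endpoint at /oidc/v1/token and the all-apis scope. The helper name, host, and credential values are placeholders, and the request is only constructed, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(workspace_host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request for a service principal."""
    # The client ID and secret are sent as HTTP Basic credentials.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder values; load real credentials from a secret store, not from source code.
token_request = build_token_request("example.cloud.databricks.com", "my-client-id", "my-secret")
print(token_request.full_url)
# To actually send it: urllib.request.urlopen(token_request)
```

A successful response would carry a short-lived access token, which is then passed as a bearer token on later API calls.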

Grant permissions to access catalogs

You will need to grant permissions to the newly created service principal to allow access to your Unity Catalog-enabled AWS Databricks workspace.

To grant permissions to the service principal:

  1. Log in to your AWS Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select your workspace.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, enter the following details:
    1. Under Principals, click the dropdown and then select the newly created service principal.
    2. Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges — USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
    3. At the bottom of the dialog, click Grant.
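
After granting, you may want to confirm that nothing from the Data Reader preset is missing. The following Python sketch compares a list of granted privilege names against the five privileges named above. The function name and the sample input are assumptions for illustration; how you obtain the list of granted privileges is up to you.

```python
# The privileges that the Data Reader preset selects, as listed in the step above.
DATA_READER_PRIVILEGES = frozenset(
    {"USE CATALOG", "USE SCHEMA", "EXECUTE", "READ VOLUME", "SELECT"}
)


def missing_privileges(granted: list[str]) -> list[str]:
    """Return the Data Reader privileges absent from granted, sorted alphabetically."""
    # Accept both "USE_CATALOG" and "use catalog" spellings.
    normalized = {p.strip().upper().replace("_", " ") for p in granted}
    return sorted(DATA_READER_PRIVILEGES - normalized)


# Hypothetical input: three of the five privileges have been granted.
print(missing_privileges(["USE_CATALOG", "use schema", "SELECT"]))
```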

Azure service principal authentication

🤓 Who can do this? You will need your Azure Databricks account admin to create a service principal and your Azure Databricks workspace admin to add the service principal to your Azure Databricks workspace. You may not have access yourself.

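Atlan currently supports only the REST API extraction method when using Azure service principal authentication to crawl Databricks. You will need the following to authenticate the connection in Atlan:

  • Client ID (the Application (client) ID)
  • Client secret
  • Tenant ID (the Directory (tenant) ID)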

Create a service principal

To use service principals on Azure Databricks, an admin user must create a new Microsoft Entra ID (formerly Azure Active Directory) application and then add it to the Azure Databricks workspace to use as a service principal.

To create a service principal:

  1. Sign in to the Azure portal.
  2. If you have access to multiple tenants, subscriptions, or directories, click the Directories + subscriptions (directory with filter) icon in the top menu to switch to the directory in which you want to create the service principal.
  3. In Search resources, services, and docs, search for and select Microsoft Entra ID.
  4. Click + Add and select App registration.
  5. For Name, enter a name for the application.
  6. In the Supported account types section, select Accounts in this organizational directory only (Single tenant) and then click Register.
  7. On the application's Overview page, in the Essentials section, copy and store the following values in a secure location:
    • Application (client) ID
    • Directory (tenant) ID
  8. To generate a client secret, within Manage, click Certificates & secrets.
  9. On the Client secrets tab, click New client secret.
  10. In the Add a client secret dialog, enter the following details:
    1. For Description, enter a description for the client secret.
    2. For Expires, select an expiry time period for the client secret and then click Add.
    3. Copy and store the client secret’s Value in a secure place.
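
To show how the three stored values fit together, the sketch below builds a Microsoft Entra ID client-credentials token request in Python. It assumes the v2.0 token endpoint on login.microsoftonline.com and the well-known Azure Databricks resource ID 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d as the scope. The helper name and all credential values are placeholders, and the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request

# Resource (application) ID that identifies the Azure Databricks service; treated here as an assumption.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_entra_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request to Microsoft Entra ID."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Application (client) ID
            "client_secret": client_secret,  # the client secret's Value
            "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
        }
    ).encode()
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",  # Directory (tenant) ID
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; load real credentials from a secret store, not from source code.
entra_request = build_entra_token_request("my-tenant-id", "my-client-id", "my-secret")
print(entra_request.full_url)
# To actually send it: urllib.request.urlopen(entra_request)
```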

Add a service principal to your account

To add a service principal to your Azure Databricks account:

  1. Log in to your Azure Databricks account console as an account admin.
  2. From the left menu of the account console, click User management.
  3. From the tabs along the top of the User management page, click the Service principals tab.
  4. In the upper right of the Service principals page, click Add service principal.
  5. On the Add service principal page, enter a name for the service principal.
  6. Under UUID, paste the Application (client) ID for the service principal.
  7. Click Add.

Assign a service principal to a workspace

To add users to a workspace using the account console, the workspace must be enabled for identity federation. Workspace admins can also assign service principals to workspaces using the workspace admin settings page.

Identity federation enabled

To assign a service principal to your Azure Databricks workspace from the account console, with identity federation enabled:

  1. Log in to your Databricks account console as an account admin.
  2. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
  3. From the tabs along the top of your workspace page, click the Permissions tab.
  4. In the upper right of the Permissions page, click Add permissions.
  5. In the Add permissions dialog, enter the following details:
    1. For User, group, or service principal, select the service principal you created.
    2. For Permission, click the dropdown to select workspace User.

Identity federation disabled

To assign a service principal to your Azure Databricks workspace:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
  3. In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
  4. On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
  5. In the upper right of the Service principals page, click Add service principal.
  6. In the Add service principal dialog, click the Add new button.
  7. For New service principal display name, paste the Application (client) ID for the service principal, enter a display name, and then click Add.

Grant permissions to access catalogs

You will need to grant permissions to the newly created service principal to allow access to your Unity Catalog-enabled Azure Databricks workspace.

To grant permissions to the service principal:

  1. Log in to your Azure Databricks workspace as a workspace admin.
  2. From the left menu of your workspace, click Catalog.
  3. In the left menu of the Catalog Explorer page, select your workspace.
  4. From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
  5. In the Grant on (workspace name) dialog, enter the following details:
    1. Under Principals, click the dropdown and then select the newly created service principal.
    2. Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges — USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
    3. At the bottom of the dialog, click Grant.
