Atlan supports three authentication methods for fetching metadata from Databricks. You can set up any of the following authentication methods:
- Personal access token authentication
- AWS service principal authentication
- Azure service principal authentication
Personal access token authentication
Grant user access to workspace
To grant workspace access to the user creating a personal access token:
- From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the user.
- From the tabs along the top of your workspace page, click the Permissions tab.
- In the upper right of the Permissions page, click Add permissions.
- In the Add permissions dialog, enter the following details:
- For User, group, or service principal, select the user to grant access.
- For Permission, click the dropdown and select workspace User.
Generate a personal access token
You can generate a personal access token in your Databricks workspace to authenticate the integration in Atlan.
To generate a personal access token:
- From the top right of your Databricks workspace, click your Databricks username, and then from the dropdown, click User Settings.
- Under the Settings menu, click Developer.
- On the Developer page, next to Access tokens, click Manage.
- On the Access tokens page, click the Generate new token button.
- In the Generate new token dialog:
- For Comment, enter a description of the token's intended use, for example, Atlan crawler.
- For Lifetime (days), consider removing the number. This allows the token to be used indefinitely, so it will not need to be refreshed.
🚨 Careful! If you do enter a number, remember that you will need to periodically regenerate the token and update Atlan's crawler configuration with the new token each time.
- At the bottom of the dialog, click Generate.
- Copy and save the generated token in a secure location, and then click Done.
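Before entering the token in Atlan, it can help to confirm that the token authenticates against the workspace. The sketch below only builds the request and does not send it. It assumes the token is passed as a Bearer token and uses the Unity Catalog "list catalogs" endpoint as the probe; the hostname and token value are placeholders, and `build_pat_request` is a hypothetical helper name.

```python
from urllib.request import Request


def build_pat_request(workspace_host: str, token: str,
                      path: str = "/api/2.1/unity-catalog/catalogs") -> Request:
    """Build (but do not send) a GET request authenticated with a personal access token."""
    host = workspace_host.strip().rstrip("/")
    # Accept hostnames pasted with or without the scheme.
    if host.startswith("https://"):
        host = host[len("https://"):]
    return Request(
        url=f"https://{host}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values for illustration only.
request = build_pat_request("example.cloud.databricks.com", "dapi-example-token")
```

Passing `request` to `urllib.request.urlopen` would send it; a 200 response suggests the token is valid, while a 401 or 403 suggests the token or the workspace access needs to be checked.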
Select a cluster
You can set up personal access token authentication for your Databricks instance using one of the following cluster options:
- Interactive cluster
- SQL warehouse (formerly SQL endpoint)
Interactive cluster
To confirm an all-purpose interactive cluster is configured:
- From the left menu of any page of your Databricks instance, click Compute.
- Under the All-purpose clusters tab, ensure you have a cluster defined.
- Click the link under the Name column of the table to open your cluster.
- Under the Configuration tab, ensure that the Autopilot option to Terminate after ... minutes is enabled.
- At the bottom of the Configuration tab, expand the Advanced options expandable.
- Under the Advanced options expandable, open the JDBC/ODBC tab.
- Confirm that all of the fields in this tab are populated, and copy them for use in crawling: Server Hostname, Port, and HTTP Path.
SQL warehouse (formerly SQL endpoint)
To confirm a SQL warehouse is configured:
- From the left menu of any page of your Databricks instance, open the dropdown just below the Databricks logo and change to SQL.
- From the refreshed left menu, click SQL Warehouses.
- Click the link under the Name column of the table to open your SQL warehouse.
- Under the Connection details tab, confirm that all of the fields are populated and copy them for use in crawling: Server hostname, Port, and HTTP path.
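The three copied values are easy to paste with stray whitespace or a leading scheme. The following is a minimal sketch of normalizing them before use, not part of Atlan or Databricks tooling. The `ConnectionDetails` type and `parse_connection_details` function are hypothetical, and the check for `/warehouses/` in the HTTP path assumes the usual shape of SQL warehouse paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionDetails:
    server_hostname: str
    port: int
    http_path: str

    @property
    def uses_sql_warehouse(self) -> bool:
        # Assumption: SQL warehouse paths contain "/warehouses/",
        # while interactive cluster paths do not.
        return "/warehouses/" in self.http_path


def parse_connection_details(server_hostname: str, port, http_path: str) -> ConnectionDetails:
    """Normalize values copied from the JDBC/ODBC or Connection details tab."""
    hostname = server_hostname.strip()
    for prefix in ("https://", "http://"):
        if hostname.startswith(prefix):
            hostname = hostname[len(prefix):]
    hostname = hostname.rstrip("/")
    path = http_path.strip()
    if not hostname or not path:
        raise ValueError("Server hostname and HTTP path must both be populated")
    port_number = int(str(port).strip())
    if not 0 < port_number < 65536:
        raise ValueError(f"Port out of range: {port_number}")
    return ConnectionDetails(hostname, port_number, path)
```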
AWS service principal authentication
Atlan currently supports only the REST API extraction method when using AWS service principal authentication to crawl Databricks. You will need the following to authenticate the connection in Atlan:
- Client ID
- Client secret
Create a service principal
You can create a service principal directly in your Databricks account or from a Databricks workspace.
- Identity federation enabled on your workspaces: Databricks recommends creating the service principal in the account and assigning it to workspaces.
- Identity federation disabled on your workspaces: Databricks recommends that you create your service principal from a workspace.
Identity federation enabled
To create a service principal from your Databricks account, with identity federation enabled:
- Log in to your Databricks account console as an account admin.
- From the left menu of the account console, click User management.
- From the tabs along the top of the User management page, click the Service principals tab.
- In the upper right of the Service principals page, click Add service principal.
- On the Add service principal page, enter a name for the service principal and then click Add.
- Once the service principal has been created, you can assign it to your identity federated workspace. From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
- From the tabs along the top of your workspace page, click the Permissions tab.
- In the upper right of the Permissions page, click Add permissions.
- In the Add permissions dialog, enter the following details:
- For User, group, or service principal, select the service principal you created.
- For Permission, click the dropdown and select workspace User.
Identity federation disabled
To create a service principal from a Databricks workspace, with identity federation disabled:
- Log in to your AWS Databricks workspace as a workspace admin.
- From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
- In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
- On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
- In the upper right of the Service principals page, click Add service principal.
- In the Add service principal dialog, click the Add new button.
- For New service principal display name, enter a name for the service principal and then click Add.
Create an OAuth secret for the service principal
You will need to create an OAuth secret to authenticate to Databricks REST APIs.
To create an OAuth secret for the service principal:
- Log in to your Databricks account console as an account admin.
- From the left menu of the account console, click User management.
- From the tabs along the top of the User management page, click the Service principals tab.
- On the Service principals page, select the service principal you created.
- On the service principal page, under OAuth secrets, click Generate secret.
- From the Generate secret dialog, copy the Secret and Client ID and store them in a secure location.
🚨 Careful! Note that this secret will only be revealed once during creation. The client ID is the same as the application ID of the service principal.
- Once you've copied the client ID and secret, click Done.
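To illustrate how the client ID and secret are used, here is a hedged sketch of an OAuth client credentials request. It assumes the workspace exposes a token endpoint at `/oidc/v1/token` and accepts the `all-apis` scope with HTTP Basic authentication; check these against the Databricks OAuth documentation for your deployment. The function name and sample values are hypothetical, and the request is built but not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_oauth_token_request(workspace_host: str, client_id: str,
                              client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request."""
    # HTTP Basic auth: base64("client_id:client_secret").
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    body = urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

A successful response would typically contain a short-lived access token, which is then sent as a Bearer token on REST API calls.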
Azure service principal authentication
Atlan currently supports only the REST API extraction method when using Azure service principal authentication to crawl Databricks. You will need the following to authenticate the connection in Atlan:
- Client ID (application ID)
- Client secret
- Tenant ID (directory ID)
Create a service principal
To use service principals on Azure Databricks, an admin user must create a new Microsoft Entra ID (formerly Azure Active Directory) application and then add it to the Azure Databricks workspace to use as a service principal.
To create a service principal:
- Sign in to the Azure portal.
- If you have access to multiple tenants, subscriptions, or directories, click the Directories + subscriptions (directory with filter) icon in the top menu to switch to the directory in which you want to create the service principal.
- In Search resources, services, and docs, search for and select Microsoft Entra ID.
- Click + Add and select App registration.
- For Name, enter a name for the application.
- In the Supported account types section, select Accounts in this organizational directory only (Single tenant) and then click Register.
- On the Overview page of the application, in the Essentials section, copy and store the following values in a secure location:
- Application (client) ID
- Directory (tenant) ID
- To generate a client secret, within Manage, click Certificates & secrets.
- On the Client secrets tab, click New client secret.
- In the Add a client secret dialog, enter the following details:
- For Description, enter a description for the client secret.
- For Expires, select an expiry time period for the client secret and then click Add.
- Copy and store the client secret’s Value in a secure place.
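The three stored values (client ID, client secret, and tenant ID) map onto a Microsoft Entra ID client credentials request. The sketch below shows one way the form could be assembled. It assumes the v2.0 token endpoint on `login.microsoftonline.com` and the commonly documented Azure Databricks resource ID as the scope; verify both against Microsoft and Databricks documentation. `build_entra_token_request` is a hypothetical helper, and nothing is sent over the network.

```python
from typing import Tuple
from urllib.parse import urlencode

# Assumption: the programmatic resource ID commonly documented for Azure Databricks.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_entra_token_request(tenant_id: str, client_id: str,
                              client_secret: str) -> Tuple[str, bytes]:
    """Return the token URL and URL-encoded form body for a client-credentials grant."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # Application (client) ID
        "client_secret": client_secret,  # Value of the client secret
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    }).encode("ascii")
    return url, form
```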
Add a service principal to your account
To add a service principal to your Azure Databricks account:
- Log in to your Azure Databricks account console as an account admin.
- From the left menu of the account console, click User management.
- From the tabs along the top of the User management page, click the Service principals tab.
- In the upper right of the Service principals page, click Add service principal.
- On the Add service principal page, enter a name for the service principal.
- Under UUID, paste the Application (client) ID for the service principal.
- Click Add.
Assign a service principal to a workspace
To add users to a workspace using the account console, the workspace must be enabled for identity federation. Workspace admins can also assign service principals to workspaces using the workspace admin settings page.
Identity federation enabled
To assign a service principal to your Azure Databricks workspace from the account console:
- Log in to your Databricks account console as an account admin.
- From the left menu of the account console, click Workspaces and then select a workspace to which you want to add the service principal.
- From the tabs along the top of your workspace page, click the Permissions tab.
- In the upper right of the Permissions page, click Add permissions.
- In the Add permissions dialog, enter the following details:
- For User, group, or service principal, select the service principal you created.
- For Permission, click the dropdown to select workspace User.
Identity federation disabled
To assign a service principal to your Azure Databricks workspace:
- Log in to your Azure Databricks workspace as a workspace admin.
- From the top right of your workspace, click your username, and then from the dropdown, click Admin Settings.
- In the left menu of the Settings page, under the Workspace admin subheading, click Identity and access.
- On the Identity and access page, under Management and permissions, next to Service principals, click Manage.
- In the upper right of the Service principals page, click Add service principal.
- In the Add service principal dialog, click the Add new button.
- For New service principal display name, paste the Application (client) ID for the service principal, enter a display name, and then click Add.
Grant permissions to crawl metadata
You must have a Unity Catalog-enabled Databricks workspace to crawl metadata in Atlan.
To extract metadata, you can grant the BROWSE privilege, currently in public preview. You will no longer require the Data Reader preset that granted the following privileges on objects in the catalog: USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
To grant permissions to a user or service principal:
- Log in to your Databricks workspace as a workspace admin.
- From the left menu of your workspace, click Catalog.
- In the left menu of the Catalog Explorer page, select the catalog you want to crawl in Atlan.
- From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
- In the Grant on (workspace name) dialog, configure the following:
- Under Principals, click the dropdown and then select the user or service principal.
- Under Privileges, check the BROWSE privilege.
- At the bottom of the dialog, click Grant.
- (Optional) Repeat steps 3-5 for each catalog you want to crawl in Atlan.
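When many catalogs need the same grant, the UI steps above can be approximated in SQL. The following sketch generates one `GRANT BROWSE` statement per catalog for review before running them in a SQL editor. The helper names and sample values are hypothetical, and it assumes a service principal is referenced by its application ID when used as a grantee.

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def browse_grants(catalogs: Iterable[str], principal: str) -> List[str]:
    """Generate one GRANT BROWSE statement per catalog for the given principal."""
    grantee = quote_identifier(principal)
    return [
        f"GRANT BROWSE ON CATALOG {quote_identifier(catalog)} TO {grantee};"
        for catalog in catalogs
    ]


# Placeholder catalog and principal names for illustration only.
for statement in browse_grants(["sales", "finance"], "atlan-crawler"):
    print(statement)
```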
(Optional) Grant permissions to query and preview data
To grant permissions to query data and preview sample data:
- Log in to your Databricks workspace as a workspace admin.
- From the left menu of your workspace, click Catalog.
- In the left menu of the Catalog Explorer page, select the catalog you want to query and preview data from in Atlan.
- From the tabs along the top of your workspace page, click the Permissions tab and then click the Grant button.
- In the Grant on (workspace name) dialog, configure the following:
- Under Principals, click the dropdown and then select the user or service principal.
- Under Privilege presets, click the dropdown and then click Data Reader to allow read-only access to the catalog. Doing so will automatically select the following privileges: USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT.
- At the bottom of the dialog, click Grant.
- (Optional) Repeat steps 3-5 for each catalog you want to query and preview data from in Atlan.
(Optional) Grant permissions to import and update tags
To import Databricks tags, you must have a Unity Catalog-enabled workspace and a SQL warehouse configured. Atlan supports importing Databricks tags using system tables for all three authentication methods.
Once you have created a personal access token, an AWS service principal, or an Azure service principal, you will need to grant the following privileges:
- CAN_USE on a SQL warehouse
- USE CATALOG on system catalog
- USE SCHEMA on system.information_schema
- SELECT on the following tables:
- system.information_schema.catalog_tags
- system.information_schema.schema_tags
- system.information_schema.table_tags
- system.information_schema.column_tags
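As a rough illustration, the catalog, schema, and table privileges for importing tags could be expressed as SQL statements like those generated below. This is a sketch under the assumption that these objects accept standard Unity Catalog `GRANT` syntax in your workspace. `tag_import_grants` is a hypothetical helper, and the `CAN_USE` permission on the SQL warehouse is set separately because it is not a Unity Catalog privilege.

```python
from typing import List

# The four tag tables named in the list above.
TAG_TABLES = ("catalog_tags", "schema_tags", "table_tags", "column_tags")


def tag_import_grants(principal: str) -> List[str]:
    """Generate the catalog, schema, and table grants needed to read tag tables."""
    grantee = f"`{principal}`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG system TO {grantee};",
        f"GRANT USE SCHEMA ON SCHEMA system.information_schema TO {grantee};",
    ]
    statements += [
        f"GRANT SELECT ON TABLE system.information_schema.{table} TO {grantee};"
        for table in TAG_TABLES
    ]
    return statements
```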
To push tags updated for assets in Atlan to Databricks, you will need to grant the following privileges:
- APPLY TAG on the object
- USE CATALOG on the object’s parent catalog
- USE SCHEMA on the object’s parent schema
(Optional) Grant permissions to extract lineage and usage from system tables
You must have a Unity Catalog-enabled workspace to use system tables.
Atlan supports extracting the following for your Databricks assets using system tables:
- Lineage
- Usage and popularity metrics
Enable system.access schema
You will need your account admin to enable the system.access schema using the SystemSchemas API. This will allow Atlan to extract lineage using system tables.
In Atlan, one Databricks connection corresponds to one metastore. Repeat the following process for each metastore in your Databricks environment for which you want to extract lineage.
To enable a system schema, refer to Databricks documentation:
- Use the SystemSchemas API to enable the system.access schema for a given metastore.
- Replace {schema_name} with access.
To verify that the system schema is enabled for each metastore, follow the steps in Databricks documentation:
- List system schemas using the SystemSchemas API to check the status.
- If enabled for any given schema, the state will be EnableCompleted. This confirms that the schema has been enabled for that specific metastore.
- Atlan will only be able to extract lineage using system tables when the state is marked as EnableCompleted.
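The enable and verify steps can be sketched as follows. The endpoint path and the response shape (a `schemas` list with `schema` and `state` fields) are assumptions based on the SystemSchemas API and should be checked against Databricks documentation. The helper names, host, token, and metastore ID are placeholders, and the request is built but not sent.

```python
from urllib.request import Request

# Assumed base path of the SystemSchemas API.
API_ROOT = "/api/2.0/unity-catalog/metastores"


def enable_schema_request(host: str, token: str, metastore_id: str,
                          schema_name: str) -> Request:
    """Build (but do not send) the PUT request that enables a system schema."""
    return Request(
        url=f"https://{host}{API_ROOT}/{metastore_id}/systemschemas/{schema_name}",
        headers={"Authorization": f"Bearer {token}"},
        method="PUT",
    )


def is_schema_enabled(list_response: dict, schema_name: str) -> bool:
    """Check a parsed 'list system schemas' response for the EnableCompleted state."""
    for entry in list_response.get("schemas", []):
        if entry.get("schema") == schema_name:
            return entry.get("state") == "EnableCompleted"
    return False
```

For example, calling `is_schema_enabled(response, "access")` on the parsed JSON of the list call indicates whether lineage extraction from system tables is possible for that metastore.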
(Optional) Enable system.query schema
This is only required if you also want to extract usage and popularity metrics from Databricks.
You will need your account admin to enable the system.query schema using the SystemSchemas API. This will allow Atlan to mine query history using system tables for usage and popularity metrics.
To enable a system schema, refer to Databricks documentation:
- Replace {schema_name} with query.
To verify that the system schema is enabled, follow the steps in Databricks documentation. If enabled for any given schema, the state will be EnableCompleted.
Grant permissions
Atlan supports extracting Databricks lineage and usage and popularity metrics using system tables for all three authentication methods.
Once you have created a personal access token, an AWS service principal, or an Azure service principal, you will need to grant the following permissions:
- CAN_USE on a SQL warehouse
- USE CATALOG on system catalog
- USE SCHEMA on system.access schema
- USE SCHEMA on system.query schema (to mine query history for usage and popularity metrics)
- SELECT on the following tables:
- system.query.history (to mine query history for usage and popularity metrics)
- system.access.table_lineage
- system.access.column_lineage
You will need to create a Databricks connection in Atlan for each metastore. You can use the hostname of your Unity Catalog-enabled workspace as the Host for the connection.
Locate warehouse ID
To extract lineage and usage and popularity metrics using system tables, you will also need the warehouse ID of your SQL warehouse.
To locate the warehouse ID:
- Log in to your Databricks workspace as a workspace admin.
- From the left menu of your workspace, click SQL Warehouses.
- On the Compute page, select the warehouse you want to use.
- From the Overview tab of your warehouse page, next to the Name of your warehouse, copy the value for your SQL warehouse ID. For example, for example-warehouse (ID: 123ab4c5def67890), copy the value 123ab4c5def67890 and store it in a secure location.
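If the whole label is copied instead of just the ID, the ID can be pulled out with a small pattern match. This is a minimal sketch that assumes the label follows the "name (ID: value)" shape shown in the example; `extract_warehouse_id` is a hypothetical helper.

```python
import re

# Matches the "(ID: <value>)" suffix shown next to the warehouse name.
WAREHOUSE_ID_PATTERN = re.compile(r"\(ID:\s*([0-9A-Za-z]+)\)")


def extract_warehouse_id(label: str) -> str:
    """Return the warehouse ID from a label such as 'name (ID: abc123)'."""
    match = WAREHOUSE_ID_PATTERN.search(label)
    if match is None:
        raise ValueError(f"No warehouse ID found in: {label!r}")
    return match.group(1)


print(extract_warehouse_id("example-warehouse (ID: 123ab4c5def67890)"))
```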
(Optional) Grant permissions to mine query history
To mine query history using the REST API, you will need to assign the CAN MANAGE permission on your SQL warehouses to the user or service principal.
To grant permissions to mine query history:
- Log in to your Databricks workspace as a workspace admin.
- From the left menu of your workspace, click SQL Warehouses.
- On the Compute page, for each SQL warehouse for which you want to mine query history, click the 3-dot icon and then click Permissions.
- In the Manage permissions dialog, configure the following:
- In the Type to add multiple users or groups field, search for and select a user or service principal.
- Expand the Can use permissions dropdown and then select Can manage. This permission allows the service principal to view all queries for the warehouse.
- Click Add to assign the CAN MANAGE permission to the user or service principal.
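For workspaces with many SQL warehouses, the same assignment could in principle be scripted through the Databricks Permissions API. The sketch below only prepares the path and JSON body for a service principal. The path `/api/2.0/permissions/warehouses/{warehouse_id}`, the `service_principal_name` field, and the `CAN_MANAGE` level are assumptions to confirm in the Permissions API reference, and both helper names are hypothetical.

```python
import json


def permissions_path(warehouse_id: str) -> str:
    """Assumed Permissions API path for a single SQL warehouse."""
    return f"/api/2.0/permissions/warehouses/{warehouse_id}"


def can_manage_payload(application_id: str) -> str:
    """JSON body granting CAN_MANAGE to a service principal (by application ID)."""
    return json.dumps({
        "access_control_list": [
            {
                "service_principal_name": application_id,
                "permission_level": "CAN_MANAGE",
            }
        ]
    })
```

Sending this body with a PATCH request would add the permission without replacing existing entries, whereas a PUT would typically overwrite the full access control list.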