Once you have crawled assets from Databricks, you can retrieve lineage from Unity Catalog and usage and popularity metrics from query history.
To retrieve lineage and usage from Databricks, review the order of operations and then complete the following steps.
Select the extractor
To select the Databricks lineage and usage extractor:
- In the top right of any screen, navigate to New and then click New Workflow.
- From the filters along the top, click Miner.
- From the list of packages, select Databricks Lineage and click on Setup Workflow.
Configure the lineage extractor
To configure the Databricks lineage extractor:
- For Connection, select the connection to extract. (To select a connection, the crawler must have already run.)
- Click Next to proceed.
Configure the usage extractor
Atlan extracts usage and popularity metrics from query history. This feature is currently limited to queries on SQL warehouses; queries on interactive clusters are not supported. Additionally, expensive queries and compute costs for Databricks assets are currently unavailable due to limitations of the Databricks APIs.
To configure the Databricks usage and popularity extractor:
- (Optional) For Fetch Query History and Calculate Popularity, click Yes to retrieve usage and popularity metrics for your Databricks assets from query history.
- For Popularity Window (days), set the popularity window. The maximum is 30 days, and you can set a shorter window.
- For Start time, choose the earliest date from which to mine query history.
- For Excluded Users, type the names of users to be excluded while calculating usage metrics for Databricks assets. Press Enter after each name to add more names.
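To make the effect of these settings concrete, here is a minimal Python sketch of how a popularity window and an excluded-users list could shape a query count per asset. It is an illustration only, not Atlan's implementation: the function `popularity_counts`, the record keys (`user`, `asset`, `executed_at`), the sample data, and the lower bound of 1 day are all assumptions made for this example. Only the 30-day maximum comes from the steps above.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

MAX_POPULARITY_WINDOW_DAYS = 30  # maximum stated in the steps above


def popularity_counts(query_records, window_days, excluded_users, now):
    """Count queries per asset inside the popularity window.

    query_records: iterable of dicts with hypothetical keys
    'user', 'asset', and 'executed_at' (a timezone-aware datetime).
    """
    # The lower bound of 1 day is an assumption of this sketch.
    if not 1 <= window_days <= MAX_POPULARITY_WINDOW_DAYS:
        raise ValueError("window_days must be between 1 and 30")
    cutoff = now - timedelta(days=window_days)
    excluded = set(excluded_users)
    counts = Counter()
    for record in query_records:
        if record["user"] in excluded:
            continue  # skip queries run by excluded users
        if record["executed_at"] < cutoff:
            continue  # skip queries older than the window
        counts[record["asset"]] += 1
    return counts


# Hypothetical sample data for illustration.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
records = [
    {"user": "alice", "asset": "sales.orders", "executed_at": now - timedelta(days=2)},
    {"user": "svc_etl", "asset": "sales.orders", "executed_at": now - timedelta(days=1)},
    {"user": "bob", "asset": "sales.orders", "executed_at": now - timedelta(days=45)},
    {"user": "bob", "asset": "sales.customers", "executed_at": now - timedelta(days=10)},
]
result = popularity_counts(records, window_days=30, excluded_users=["svc_etl"], now=now)
```

In this sample, the query from the excluded service account and the query older than the window are both skipped. Excluding automated or service users in this way keeps scheduled jobs from inflating the usage metrics of the assets they touch.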
Run the extractor
To run the Databricks lineage and popularity extractor, after completing the steps above:
- To check for any permissions or other configuration issues before running the extractor, click Preflight checks.
- Then choose one of the following:
  - To run the extractor once immediately, at the bottom of the screen, click the Run button.
  - To schedule the extractor to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule Run button.
Once the extractor has completed running, you will see lineage for Databricks assets!