Troubleshooting usage and popularity metrics

Atlan currently supports usage and popularity metrics for the following connectors:

  • Amazon Redshift — tables, views, and columns. Expensive queries and compute costs for Amazon Redshift assets are currently unavailable due to limitations at the source.
  • Databricks — tables, views, and columns. Expensive queries and compute costs for Databricks assets are currently unavailable due to limitations of the Databricks APIs.
  • Google BigQuery — tables, views, and columns
  • Microsoft Power BI — reports and dashboards
  • Snowflake — tables, views, and columns

Can any user set this up?

No separate setup is required. Usage and popularity calculation is bundled with the Amazon Redshift, Databricks, Google BigQuery, Microsoft Power BI, and Snowflake miner packages. As long as the miner is set up, popularity will be calculated.

Which editions of Snowflake are supported?

The following Snowflake editions are supported:

  • Standard Edition
  • Enterprise Edition
  • Business Critical Edition
  • Virtual Private Snowflake (VPS)

Do account-level permissions need to be modified for setup?

No extra permissions are required.

  • For enterprise customers, Atlan will use the ACCESS_HISTORY view to determine which objects have been accessed, keeping the data source as the source of truth.
  • For other customers, Atlan will use its own internal logic to determine which objects have been accessed.

How can I set up popularity on my instance?

First-time setup:

  • Head over to the Atlan marketplace and set up the miner package for a supported connector with a daily frequency (recommended).
  • On the first run, popularity will be calculated from the start date of the miner with a default window of 14 days.
  • For all subsequent runs, the popularity window will be increased up to a maximum of 30 days.
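
How the window grows between runs is internal to Atlan; the Python sketch below only illustrates the capping behavior, under the assumption that the window widens with each daily run:

  # Illustrative sketch only: assumes the popularity window widens by one
  # day per daily miner run until it reaches the 30-day cap.
  DEFAULT_WINDOW_DAYS = 14
  MAX_WINDOW_DAYS = 30

  def popularity_window(days_since_first_run: int) -> int:
      return min(DEFAULT_WINDOW_DAYS + days_since_first_run, MAX_WINDOW_DAYS)

  print(popularity_window(0))   # 14 -- first run uses the default window
  print(popularity_window(20))  # 30 -- capped at the maximum limit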

For preconfigured miners:

  • The next run of the miner will mine the last 30 days' worth of query data and calculate popularity for 30 days (if available).
  • Subsequent runs will work as expected.

How is the popularity score calculated?

The popularity score is based on the number of read queries that accessed the data asset and the number of distinct users who executed those read queries. Values are collected over a period of 30 days.

Popularity score = number of distinct readers * log(total number of read queries)
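
As a rough Python sketch of this formula (the log base is not documented, so base 10 is an assumption here):

  import math

  def popularity_score(distinct_readers: int, total_read_queries: int) -> float:
      # Assumption: base-10 logarithm; the source does not specify the base.
      if total_read_queries <= 0:
          return 0.0
      return distinct_readers * math.log10(total_read_queries)

  # e.g. 5 distinct readers and 100 read queries over 30 days:
  print(popularity_score(5, 100))  # 5 * log10(100) = 10.0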

What asset types have popularity scores?

Popularity scores are currently available for all Amazon Redshift, Databricks, Google BigQuery, and Snowflake tables, views, and columns, and Microsoft Power BI reports and dashboards.

How is the compute cost estimated per asset?

Amazon Redshift

Compute cost for Amazon Redshift assets is currently unavailable due to limitations at the source.

Databricks

Compute cost for Databricks assets is currently unavailable due to limitations of the Databricks APIs.

Google BigQuery

If the Google BigQuery miner is configured to use:

  • On-demand pricing — the bytes field will indicate the total bytes billed for the query.
  • Flat-rate pricing — the slot-ms (slot-milliseconds) field will indicate the total slot utilization for the query. For example, if a 20-second query is continuously consuming 4 slots, then the query will have utilized 80,000 slot-ms.

In Atlan, the compute cost of a Google BigQuery asset is estimated by summing the cost of each individual query that ran on the asset. For example:

  • Query A ran on asset x = 10 slot-ms or bytes
  • Query B ran on asset x = 15 slot-ms or bytes
  • Estimated compute cost for asset x = 10 + 15 = 25 slot-ms or bytes
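
Illustratively, this aggregation amounts to a sum over per-query usage, as in the Python sketch below (the field and asset names are hypothetical, not Atlan's implementation):

  # Hypothetical sketch: summing per-query usage into a per-asset estimate.
  # "slot_ms_or_bytes" stands in for slot-ms (flat-rate) or bytes billed
  # (on-demand), depending on how the miner is configured.
  from collections import defaultdict

  queries = [
      {"asset": "x", "slot_ms_or_bytes": 10},  # Query A
      {"asset": "x", "slot_ms_or_bytes": 15},  # Query B
  ]

  cost_per_asset = defaultdict(int)
  for query in queries:
      cost_per_asset[query["asset"]] += query["slot_ms_or_bytes"]

  print(cost_per_asset["x"])  # 25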

Learn more about Google BigQuery pricing.

Snowflake

In Atlan, the compute cost of a Snowflake asset is estimated by summing the cost of each individual query that ran on the asset. For example, if warehouse X-Small costs 1 credit per hour:

  • Query A ran from 1 p.m. to 5 p.m. on warehouse X-small = 4 credits
  • Query B ran from 11 a.m. to 3 p.m. on warehouse X-small = 4 credits
  • Estimated compute cost: 4 credits + 4 credits = 8 credits
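
As a Python sketch of the arithmetic above (the credit rate and function names are illustrative only):

  # Illustrative sketch: credits = query duration in hours * warehouse rate.
  from datetime import datetime

  CREDITS_PER_HOUR = {"X-Small": 1}  # example rate from the scenario above

  def query_credits(start: datetime, end: datetime, warehouse: str) -> float:
      hours = (end - start).total_seconds() / 3600
      return hours * CREDITS_PER_HOUR[warehouse]

  query_a = query_credits(datetime(2024, 1, 23, 13), datetime(2024, 1, 23, 17), "X-Small")
  query_b = query_credits(datetime(2024, 1, 23, 11), datetime(2024, 1, 23, 15), "X-Small")
  print(query_a + query_b)  # 8.0 credits estimated for the asset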

Can we exclude queries run by ETL users?

Atlan supports excluding queries run by specific users. You can exclude users while configuring the miner behavior for any of the supported connectors: Amazon Redshift, Databricks, Google BigQuery, Microsoft Power BI, and Snowflake.
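
Conceptually, the exclusion acts as a filter over mined queries before metrics are computed, as in this sketch (the user names are examples, not a real configuration):

  # Hypothetical sketch: queries from excluded (e.g. ETL) users are dropped
  # before usage and popularity are calculated.
  EXCLUDED_USERS = {"etl_service", "airflow_runner"}  # example exclude list

  queries = [
      {"user": "alice", "sql": "SELECT * FROM orders"},
      {"user": "etl_service", "sql": "SELECT * FROM orders"},
  ]

  counted = [q for q in queries if q["user"] not in EXCLUDED_USERS]
  print(len(counted))  # 1 -- only alice's query contributes to the metrics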

Why are metrics missing even after running the miner?

Usage and popularity metrics are computed using both the number of queries and the number of users who have queried that asset in the last 30 days. If an asset has not been queried in the last 30 days, there will be no usage and popularity metrics to report for that asset.

If an asset has been queried and you've run the miner but metrics are still missing, this may be due to how the miner fetches query history.

When setting up the miner for the first time, you will need to provide a start date, which can range from two days up to two weeks in the past. If an asset has not been queried during the selected time period, usage and popularity metrics will be unavailable.

For subsequent runs, the miner will fetch query history based on the following logic:

START_TIME <= CURRENT_DATE - INTERVAL '1 DAY'

For example, on January 23 the miner will evaluate a query that started at 5 p.m. on January 22 as:

  • Jan 22 5 p.m. <= Jan 23 00:00 - 1 day
  • Jan 22 5 p.m. <= Jan 22 00:00

This condition is false, so the miner will not fetch the data for the previous day (January 22) on the current day (January 23). Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a query session.
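
A Python sketch of this cutoff check, with the dates hard-coded to match the example above:

  # Sketch: the miner only fetches queries that started on or before
  # CURRENT_DATE - 1 day (midnight), so yesterday's queries wait one more run.
  from datetime import date, datetime, timedelta

  current_date = date(2024, 1, 23)
  cutoff = datetime.combine(current_date, datetime.min.time()) - timedelta(days=1)
  # cutoff == Jan 22 00:00

  query_start = datetime(2024, 1, 22, 17)  # query started Jan 22, 5 p.m.
  print(query_start <= cutoff)  # False -- fetched on a later run instead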

What type of queries do you show for popular, slow, or expensive queries?

Only read queries or SELECT statements are shown for popular, slow, and expensive queries. DDL and DML statements are not supported.
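
As a rough illustration, filtering for read queries could look like the naive sketch below (real SQL classification handles CTEs, comments, and dialect quirks; this is only indicative):

  # Naive, illustrative classifier: keep only read (SELECT) statements.
  def is_read_query(sql: str) -> bool:
      return sql.lstrip().upper().startswith("SELECT")

  print(is_read_query("SELECT * FROM orders"))        # True
  print(is_read_query("INSERT INTO orders VALUES 1")) # False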
