Monitoring and improving data quality is critical to building trust in your data assets. Atlan solves for this with profiling playbooks!
Profiling playbooks help power data observability for your assets in Atlan. You can create profiling playbooks to scan your assets at scale, identify any issues or inconsistencies, and improve the data quality of your assets.
Supported sources
Atlan currently supports column profiling for the following connectors:
- Amazon Athena
- Amazon Redshift
- Databricks
- Google BigQuery
- Microsoft SQL Server
- MySQL
- PostgreSQL
- Snowflake
- Trino
Create a profiling playbook
To create a profiling playbook:
- In the left menu in Atlan, click Governance.
- Under the Governance heading of the Governance center, click Playbooks.
- To the right of the Create New button, click the downward arrow and then select Profiling Playbook.
- In the Create new profiling playbook dialog, enter the following details:
- For Name, enter a name for the task to be accomplished, for example, Tables scan. (Atlan recommends keeping playbook names to 46 characters or fewer.)
- For Connection, select a supported connection from the dropdown menu. In this example, we'll select a Google BigQuery connection named development.
- (Optional) For Description, enter a description for your playbook.
- (Optional) Select an icon for your playbook.
- Click Create to save your playbook.
Set up rules as filters
To set up rules as filters for your profiling playbook:
- In the Build Rules page of your profiling playbook, click Filters.
- For the name field, add a name to your filter, for example, Profiling action.
- To set a matching condition for the filters, select Match all or Match any. Match all will logically AND the criteria, while Match any will logically OR the criteria.
- For Attributes, select the relevant option. For this example, we'll select Name listed under Properties. (Optional) To further refine your asset selection:
- Click Connection to select a specific connection.
- Click All databases to filter by databases in a selected connection.
- Click All schemas to filter by schemas in a selected connection.
- Click Connector to filter assets by supported connectors.
- Click Asset type to filter by specific asset types, for example, tables, columns, queries, glossaries, and more.
- Click Certificate to filter assets by certification status.
- Click Owners to filter assets by asset owners.
- Click Tags to filter assets by your tags in Atlan, including imported Snowflake and dbt tags.
- (Optional) For Snowflake tags only, to the left of the checkbox, click Select value, and then from the Select tag value dialog, select any value(s) to filter assets by tag value.
- Click Glossary, terms, & categories to filter by a specific glossary or category (to bulk update all of its nested terms) or by multiple glossaries and categories.
- Click Linked terms to filter assets by linked terms.
- Click Schema qualified Name to filter assets by the qualified name of a given schema.
- Click Database qualified Name to filter assets by the qualified name of a given database.
- Click dbt to filter assets by dbt-specific filters and then select a dbt Cloud or dbt Core filter.
- Click Properties to filter assets by common asset properties.
- Click Usage to filter assets by usage metrics.
- Click Monte Carlo to filter assets by Monte Carlo-specific filters.
- Click Soda to filter assets by Soda-specific filters.
- Click Table/View to filter tables or views by row count, column count, or size.
- Click Column to filter columns by column-specific filters, including parent asset type or name, data type, or column keys.
- Click Process to filter lineage processes by the SQL query.
- Click Query to filter assets by associated visual queries.
- Click Measure to filter Microsoft Power BI measures using the external measures filter.
- For Operator, select Is one of for values to include or Is not for values to exclude. Depending on the selected attribute(s), you can also choose from additional operators:
- Select Equals (=) or Not Equals (!=) to include or exclude assets through exact match search.
- Select Starts With or Ends With to filter assets using the starting or ending sequence of values.
- Select Contains or Does not contain to find assets with or without specified values contained within the attribute.
- Select Pattern to filter assets using supported Elastic DSL regular expressions.
- Select Is empty to filter assets with null values.
- For Values, select the relevant values. The values will vary depending on the selected attributes.
- (Optional) To add more filters, click Add filter and select Filter to add individual filters or Filter Group to nest more filters in a group.
- (Optional) To view all the assets that match your rules, in the Filters card, click View all for a preview.
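The matching behavior described above can be sketched in a few lines of Python. This is only an illustration of how Match all (logical AND) and Match any (logical OR) combine individual filters, not Atlan's implementation. The Filter class, the OPERATORS table, the matches function, and the sample assets are all hypothetical names introduced for this sketch, and only a subset of the operators is covered. The pattern operator here uses Python's re module with a whole-value match as a stand-in; Elastic regular expression syntax differs from Python's.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical operator table: the keys mirror the UI labels, not an Atlan API.
OPERATORS: Dict[str, Callable[[Optional[str], str], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "not_equals": lambda actual, expected: actual != expected,
    "starts_with": lambda actual, expected: actual is not None and actual.startswith(expected),
    "ends_with": lambda actual, expected: actual is not None and actual.endswith(expected),
    "contains": lambda actual, expected: actual is not None and expected in actual,
    "does_not_contain": lambda actual, expected: actual is None or expected not in actual,
    # Assumption: the pattern must match the whole value (Python regex syntax).
    "pattern": lambda actual, expected: actual is not None and re.fullmatch(expected, actual) is not None,
    # Assumption: "empty" means the attribute is null (None) or absent.
    "is_empty": lambda actual, _expected: actual is None,
}


@dataclass
class Filter:
    attribute: str
    operator: str
    value: str = ""


def matches(asset: Dict[str, Optional[str]], filters: List[Filter], match_all: bool = True) -> bool:
    """Return True if the asset satisfies the filters under Match all or Match any."""
    results = (OPERATORS[f.operator](asset.get(f.attribute), f.value) for f in filters)
    return all(results) if match_all else any(results)


# Made-up sample assets for illustration only.
assets = [
    {"name": "orders", "certificate": "VERIFIED"},
    {"name": "orders_tmp", "certificate": None},
    {"name": "customers", "certificate": None},
]
rules = [Filter("name", "starts_with", "orders"), Filter("certificate", "is_empty")]

match_all_names = [a["name"] for a in assets if matches(a, rules, match_all=True)]
match_any_names = [a["name"] for a in assets if matches(a, rules, match_all=False)]
```

With these sample rules, Match all keeps only orders_tmp (both conditions hold), while Match any keeps all three assets, because each one satisfies at least one condition. This is why Match any typically selects more assets, which matters when profiling queries are billed by the volume of data scanned.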
Confirm profiling actions
To select the actions to be performed based on your rules:
- The default profiling actions to be performed include:
- Base metrics:
- Distinct count: number of rows that contain distinct values, relative to the column.
- Missing count: number of rows that do not contain specific values.
- Numeric metrics:
- Minimum and maximum values: smallest and greatest values in a numeric column.
- Average: calculated average of values in a numeric column.
- Standard deviation: calculated standard deviation of values in a numeric column.
- Variance: calculated variance of values in a numeric column.
- String metrics:
- Average length: average length of string values in a column.
- Minimum and maximum length: minimum and maximum length of string values in a column.
- Click Next to proceed to the next step.
- In the Optimize your Profiling query popup, the following message will appear: "This Profiling playbook will query x rows across y assets. To avoid significant computing costs, review your applied filters before proceeding." Click Review filters to review your existing filters or click Continue anyway to proceed.
Note that Atlan is working to support sampling functionality in the future.
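To make the default metrics more concrete, here is a minimal Python sketch of how each one could be computed for a single column held in memory. It is not how Atlan computes them (profiling runs as queries against the source). The function names and sample data are hypothetical, and a few interpretations are assumptions: missing values are represented as None, distinct count is taken as the number of distinct non-missing values, and standard deviation and variance use the population formulas (a source system may use the sample formulas instead).

```python
import statistics
from typing import Dict, List, Optional


def base_metrics(values: List[Optional[object]]) -> Dict[str, int]:
    """Distinct and missing counts; None is assumed to mean 'missing'."""
    present = [v for v in values if v is not None]
    return {
        "distinct_count": len(set(present)),
        "missing_count": len(values) - len(present),
    }


def numeric_metrics(values: List[Optional[float]]) -> Dict[str, float]:
    """Min, max, average, standard deviation, and variance of non-missing values."""
    nums = [float(v) for v in values if v is not None]
    if not nums:
        return {}
    return {
        "min": min(nums),
        "max": max(nums),
        "average": statistics.mean(nums),
        # Assumption: population formulas; sample formulas would differ slightly.
        "standard_deviation": statistics.pstdev(nums),
        "variance": statistics.pvariance(nums),
    }


def string_metrics(values: List[Optional[str]]) -> Dict[str, float]:
    """Average, minimum, and maximum length of non-missing string values."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return {}
    return {
        "average_length": sum(lengths) / len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


# Made-up sample columns for illustration only.
prices = [2, 4, 4, 4, None, 5, 5, 7, 9]
codes = ["ab", None, "abcd"]

price_base = base_metrics(prices)
price_numeric = numeric_metrics(prices)
code_strings = string_metrics(codes)
```

For the sample prices column, this yields 5 distinct values, 1 missing value, an average of 5.0, and a population standard deviation of 2.0. Every metric requires reading all values in the column, which is why the number of rows and assets selected by your filters drives the cost of a profiling run.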
Run the playbook
If you'd like to continue working on your playbook, you can save it as a draft. If your playbook is ready, you can proceed to running it.
To run the playbook:
- Choose how to run the playbook:
- To run the playbook once immediately, click Run once.
- To schedule the playbook to run hourly, daily, weekly, or monthly, click Schedule and choose the preferred frequency, timezone, and time.
- Click Complete to confirm your selections.
- In the resulting screen, click Go to profile to view your playbook profile.
Once your playbook run is completed, you will see the data profile updated for your assets!
View profiled assets
To view the profiled assets for your playbook:
- In the Overview page of your playbook, to the right of Profiling action, click the total count of profiled assets.
- In the sidebar to the right, profiled assets will be indicated with a bar graph icon. Click any profiled asset to proceed to viewing profiling data.
- From the table sidebar, click the Column tab to view column assets and then select any of the profiled columns.
- In the column sidebar to the right, click Profile to view profiling data for the selected column asset.