Atlan crawls datasets from Soda and filters out any datasets that have no checks. It then crawls the checks associated with each remaining dataset. These checks are cataloged in Atlan and linked to existing assets using the association information from each dataset.
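The crawl flow described above can be sketched in Python. This is a minimal, non-authoritative illustration, not Atlan's implementation. The `SodaDataset` class, its `qualified_name` and `check_ids` fields, and `build_check_links` are all hypothetical. Matching on a qualified name stands in for whatever association information Atlan actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SodaDataset:
    # Hypothetical shape of a Soda dataset record; field names are illustrative only.
    name: str
    qualified_name: str  # assumed association key that identifies an existing Atlan asset
    check_ids: list[str] = field(default_factory=list)


def build_check_links(datasets: list[SodaDataset], existing_assets: set[str]) -> list[tuple[str, str]]:
    """Return (check_id, asset_qualified_name) pairs to catalog."""
    links = []
    # Step 1: filter out datasets that have no checks.
    with_checks = [d for d in datasets if d.check_ids]
    for dataset in with_checks:
        # Step 2: use the dataset's association info to find the existing asset.
        # Assumption: checks whose dataset matches no existing asset are not linked.
        if dataset.qualified_name not in existing_assets:
            continue
        # Step 3: relate every check on this dataset to that asset.
        for check_id in dataset.check_ids:
            links.append((check_id, dataset.qualified_name))
    return links


datasets = [
    SodaDataset("orders", "db.sales.orders", ["c1", "c2"]),
    SodaDataset("tmp", "db.sales.tmp"),  # no checks, so it is filtered out
]
print(build_check_links(datasets, {"db.sales.orders"}))
# → [('c1', 'db.sales.orders'), ('c2', 'db.sales.orders')]
```

Filtering first keeps the later lookup step limited to datasets that can actually contribute checks.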
Once you have crawled Soda, you can use connector-specific filters for quick asset discovery. The following filters are currently supported for Soda assets:
- Check status — filter Soda checks by status
- Check owner — filter Soda checks by the email address of the check owner
- Last scanned at — filter Soda checks by the timestamp of their last scan in Soda
The following Soda filters are currently available for supported SQL assets:
- Data quality status — filter SQL assets by overall data quality status, including Pass, Warn, Fail, and Not evaluated
- Check count — filter SQL assets by the total number of associated Soda checks
- Scanned date — filter SQL assets by the timestamp of their last scan in Soda
- Last synced (in Atlan) — filter SQL assets by the timestamp when any associated check was last updated in Atlan
Atlan crawls and maps the following assets and properties from Soda.
Checks
Atlan maps checks from Soda to its SodaCheck asset type.
| Source property | Atlan property |
|---|---|
| name | name |
| description | description |
| id | sodaCheckId |
| evaluationStatus | sodaCheckEvaluationStatus |
| definition | sodaCheckDefinition |
| incidents | sodaCheckIncidentCount |
| lastCheckRunTime | sodaCheckLastScanAt |
| cloudUrl | sourceURL |
| lastUpdated | sourceUpdatedAt |
| owner.email | sourceOwners |
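The property mapping for checks can be expressed as a simple translation function. This is a hedged sketch, not Atlan's crawler code. The input is assumed to be a plain dict keyed by the Soda source property names in the mapping table. `PROPERTY_MAP` and `to_soda_check_attributes` are hypothetical names. Two further details are assumptions: `incidents` is treated as a list whose length becomes the count, and `sourceOwners` is stored as a single string.

```python
# Direct source-to-Atlan property renames, taken from the mapping table.
PROPERTY_MAP = {
    "name": "name",
    "description": "description",
    "id": "sodaCheckId",
    "evaluationStatus": "sodaCheckEvaluationStatus",
    "definition": "sodaCheckDefinition",
    "lastCheckRunTime": "sodaCheckLastScanAt",
    "cloudUrl": "sourceURL",
    "lastUpdated": "sourceUpdatedAt",
}


def to_soda_check_attributes(check: dict) -> dict:
    """Translate a Soda check (hypothetical dict shape) into SodaCheck attributes."""
    # Copy every directly mapped property that is present on the source check.
    attrs = {atlan: check[src] for src, atlan in PROPERTY_MAP.items() if src in check}
    # incidents -> sodaCheckIncidentCount (assumption: incidents is a list of records).
    if "incidents" in check:
        attrs["sodaCheckIncidentCount"] = len(check["incidents"])
    # owner.email -> sourceOwners (nested field; stored as a plain string here, an assumption).
    owner_email = (check.get("owner") or {}).get("email")
    if owner_email:
        attrs["sourceOwners"] = owner_email
    return attrs


example = {
    "id": "abc",
    "name": "row_count > 0",
    "evaluationStatus": "pass",
    "incidents": [{"id": 1}, {"id": 2}],
    "owner": {"email": "owner@example.com"},
}
print(to_soda_check_attributes(example)["sodaCheckIncidentCount"])  # → 2
```

Absent source properties are skipped rather than set to empty values, so a partially populated check still maps cleanly.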
Supported sources
If you have crawled supported data sources, you can view Soda checks on your existing assets in Atlan:
- Amazon Athena
- Amazon Redshift
- Databricks
- Google BigQuery
- Hive
- Microsoft Azure Synapse Analytics
- Microsoft SQL Server
- MySQL
- PostgreSQL
- Snowflake
- Trino
Soda checks can also be cataloged for some data sources that are not natively supported in Atlan. These data sources require additional configuration at the source. To ensure that the datasets are mapped to your assets in Atlan, set the value of the data_source_name field to <database>.<schema> when connecting to: