Do the following questions sound familiar to you and your data team?
- Is the data in this table actually up to date?
- When was the last time this data was updated?
- Can I use this data for my analysis?
For data consumers, the data pipeline can be a black box.
For example, data analysts might finish an entire analysis project only to find that the data was outdated. Or worse, an executive dashboard may have been shipped with the wrong data. On the other hand, data engineers waste a ton of time responding to Slack messages and emails from panicking data consumers.
To solve this once and for all, some amazing DataOps champions integrate metadata from their data pipelines into Atlan to create a single source of truth.
How data pipeline metadata can help data consumers
Data pipelines can be a rich source of metadata. When made easily accessible to the end data consumers, this additional context can significantly improve trust in your data team.
Here are a few examples of metadata that we see DataOps champions bring into Atlan's asset profiles from their data pipelines:
- Data freshness: for example, the last updated date and time
- Pipeline run status: for example, success or failure
- Links to the pipeline: for example, a link to the relevant Airflow DAG for troubleshooting
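The metadata above can be collected as a small payload at the end of each pipeline run. Here is a minimal sketch; the field names (`last_updated_at`, `pipeline_run_status`, `pipeline_link`) are illustrative placeholders, not a fixed Atlan schema:

```python
import json
from datetime import datetime, timezone

def build_pipeline_metadata(run_succeeded: bool, dag_url: str) -> dict:
    """Collect freshness, run status, and a troubleshooting link for one run.

    All field names here are hypothetical examples, not Atlan's actual schema.
    """
    return {
        # Data freshness: when this run finished updating the data
        "last_updated_at": datetime.now(timezone.utc).isoformat(),
        # Pipeline run status: success or failure
        "pipeline_run_status": "success" if run_succeeded else "failure",
        # Link to the pipeline, e.g. the relevant Airflow DAG
        "pipeline_link": dag_url,
    }

payload = build_pipeline_metadata(True, "https://airflow.example.com/dags/orders_daily")
print(json.dumps(payload, indent=2))
```

A payload like this maps one-to-one onto the custom metadata attributes you define in Atlan.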
Integrate your data pipeline metadata
You can integrate your data pipeline metadata into Atlan either as custom metadata or through the Atlan APIs.
Add pipeline metadata as custom metadata
You can add your data pipeline metadata as custom metadata in Atlan. Navigate to the Custom Metadata tab in the admin center to create a custom metadata structure, define the attributes you want to bring in from your data pipelines, and then display them on your asset profiles.
Additionally, you can highlight your most critical custom metadata with badges to help users quickly get the context they need for your assets.
Add pipeline metadata with Atlan APIs
To post metadata from your ETL tool into Atlan, follow the steps in the Atlan API documentation. You can add this API call at the end of each pipeline run, so that the metadata in Atlan is updated automatically whenever the pipeline runs.
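As a rough sketch of what that end-of-run hook might look like, the snippet below builds an authenticated POST request with Python's standard library. The endpoint URL, token, asset qualified name, and payload shape are all placeholder assumptions; check the Atlan API documentation for the real endpoint and schema:

```python
import json
import urllib.request

# Hypothetical values -- replace with your tenant URL and a real API token
ATLAN_API_URL = "https://your-tenant.atlan.com/api/entity"  # placeholder endpoint
API_TOKEN = "your-api-token"

def build_update_request(asset_qualified_name: str, metadata: dict) -> urllib.request.Request:
    """Build (but do not send) a request that pushes pipeline metadata for one asset.

    The URL and body shape are illustrative placeholders, not Atlan's actual API
    contract -- consult the Atlan API docs for the real request format.
    """
    body = json.dumps({
        "asset": asset_qualified_name,
        "customMetadata": metadata,
    }).encode("utf-8")
    return urllib.request.Request(
        ATLAN_API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )

# Call this at the end of each pipeline run, e.g. from an Airflow success callback
req = build_update_request(
    "default/snowflake/analytics/orders",  # hypothetical asset qualified name
    {"pipeline_run_status": "success"},
)
# urllib.request.urlopen(req)  # send once your endpoint and token are configured
```

In Airflow, for instance, a function like this could be wired into a DAG's `on_success_callback` and `on_failure_callback` so every run, successful or not, refreshes the context in Atlan.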