Atlan is a fully virtualized solution that does not involve moving data from existing storage layers. Atlan crawls metadata from upstream data sources and stores it in a secure VPC (virtual private cloud).
Atlan pushes any queries to existing processing layers, for example directly to your database, warehouse, or a processing layer such as Athena or Presto on top of blob storage. The data itself stays put: Atlan does not move or store it.
Not sure about the difference between data and metadata? Try our helpful primer.
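To make the distinction concrete, here is a minimal sketch of a metadata-only crawl. It uses Python's built-in sqlite3 module as a stand-in data source. The function name `crawl_table_metadata` and the `orders` table are hypothetical, and this is not Atlan's actual crawler. It only illustrates that column names and types can be collected without reading any rows.

```python
import sqlite3


def crawl_table_metadata(conn, table):
    """Return column names and declared types for a table without reading its rows."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [{"name": col[1], "type": col[2]} for col in columns]


# Stand-in data source (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'alice', 9.5)")

metadata = crawl_table_metadata(conn, "orders")
print(metadata)
```

The result describes the shape of the table (metadata). The row containing 'alice' (data) is never read and stays in the source.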
Data previews and queries
Atlan gives users the ability to see sample data previews for a data asset as well as the results for any queries run on Atlan.
In both cases, Atlan pushes the request upstream to the data source and shows Atlan users a 100-row sample of the result. Atlan does not cache any of this data, so each time a user previews or queries data, it is re-queried from the source.
Every time a user runs a query, Atlan streams the results in batches directly from your data source. Because the data is streamed in real time from the source, there is no need to persist query results in Atlan's cache or storage layer, and the data displayed always reflects the current state of the source.
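The push-down and streaming behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Atlan's implementation: sqlite3 stands in for the upstream source, `stream_preview` is a hypothetical helper, and the batch size of 25 is an arbitrary choice. Only the 100-row limit comes from the text above.

```python
import sqlite3

PREVIEW_LIMIT = 100  # sample size described above
BATCH_SIZE = 25      # assumed batch size, chosen for illustration


def stream_preview(conn, sql, limit=PREVIEW_LIMIT, batch_size=BATCH_SIZE):
    """Yield batches of rows straight from the source, stopping after `limit` rows.

    Nothing is cached: each call runs the query against the source again.
    """
    cursor = conn.execute(sql)
    sent = 0
    try:
        while sent < limit:
            # Never fetch more rows than the remaining preview budget.
            batch = cursor.fetchmany(min(batch_size, limit - sent))
            if not batch:
                break  # source has no more rows
            sent += len(batch)
            yield batch
    finally:
        cursor.close()


# Stand-in data source with 250 rows (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(250)])

batches = list(stream_preview(conn, "SELECT id FROM events ORDER BY id"))
rows = [row for batch in batches for row in batch]
print(len(batches), len(rows))
```

Because the generator holds no state between calls, a second preview of the same asset would issue the query again, which matches the no-cache behavior described above.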
Metadata storage
Atlan stores the metadata it collects and creates in applications and databases within the VPC. This includes:
- asset metadata
- user data
Asset metadata
Atlan stores asset metadata, including lineage, in:
- Apache Atlas, a graph database layer that stores entity relationships and attributes
- Elasticsearch, to optimize search on the product
- Cassandra, as the persistence back-end
User data
Atlan stores data on users, roles, and groups in its own PostgreSQL database. Keycloak uses this information for access and identity management.
Atlan hashes sensitive fields such as passwords before storing them. Any user data transmitted over the internet is encrypted in transit over HTTPS (SSL/TLS).
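As an illustration of what storing a hashed password means, here is a minimal sketch using Python's standard library. The text above does not name the hashing algorithm, so PBKDF2 with SHA-256, the 16-byte salt, and the iteration count are all assumptions made for this example. The helpers `hash_password` and `verify_password` are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor, for illustration only


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the plain password."""
    salt = secrets.token_bytes(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))
print(verify_password("wrong", salt, digest))
```

The stored digest cannot be reversed into the original password. Verification works by repeating the derivation with the same salt and comparing the results.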