In some cases you will not be able to expose your Kafka instance for Atlan to crawl and ingest metadata. For example, this may happen when security requirements restrict access to sensitive, mission-critical data.
In such cases you may want to decouple the extraction of metadata from its ingestion in Atlan. This approach gives you full control over your resources and metadata transfer to Atlan.
Prerequisites
To extract metadata from your Kafka instance, you will need to use Atlan's kafka-extractor tool.
Install Docker Compose
Docker Compose is a tool for defining and running applications composed of many Docker containers. (Any guesses where the name came from? 😉)
To install Docker Compose, follow the installation instructions in the official Docker documentation for your platform.
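For example, on a Debian or Ubuntu server that already has Docker's apt repository configured, you could install the Compose plugin like this (a sketch; package names and steps vary by platform):

sudo apt-get update
sudo apt-get install docker-compose-plugin
# Verify the installation
docker compose version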
Get the kafka-extractor tool
To get the kafka-extractor tool:
- Raise a support ticket to get the link to the latest version.
- Download the image using the link provided by support.
- Load the image on the server you'll use to crawl Kafka:
sudo docker load -i /path/to/kafka-extractor-master.tar
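You can confirm the image loaded successfully by listing the local images:

# The kafka-extractor image should appear in the list
sudo docker image ls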
Get the compose file
Atlan provides you with a Docker compose file for the kafka-extractor tool.
To get the compose file:
- Download the latest compose file.
- Save the file to an empty directory on the server you'll use to access your on-premises Kafka instance.
- The file is docker-compose.yaml.
Define Kafka connections
The structure of the compose file includes three main sections:
- x-templates contains configuration fragments. You should ignore this section; do not make any changes to it.
- services is where you will define your Kafka connections.
- volumes contains mount information. You should ignore this section as well; do not make any changes to it.
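Putting these together, the overall shape of the compose file looks something like this (a simplified sketch; the actual file provided by Atlan contains the full fragment definitions):

x-templates:
  # Configuration fragments provided by Atlan (do not edit)
services:
  # Your Kafka connection entries go here
volumes:
  # Mount information (do not edit)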
Define services for Apache Kafka
For each Apache Kafka instance, define an entry under services in the compose file.
Each entry will have the following structure:
# Example Apache Kafka connection
connection-name:
  <<: *extract
  environment:
    <<: *kafka-defaults
    # Kafka bootstrap servers (semicolon-separated)
    KAFKA_BOOTSTRAP_SERVERS: "localhost:9092"
    # Skip topics that are internal to Kafka (e.g. __consumer_offsets)
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    # You can change './output/connection-name' to any output location you want
    - ./output/connection-name:/output
- Replace connection-name with the name of your connection.
- <<: *extract tells the kafka-extractor tool to run.
- environment contains all parameters for the tool.
- volumes specifies where to store results. In this example, the extractor will store results in the ./output/connection-name folder on the local file system.
You can add as many Apache Kafka connections as you want.
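For example, to extract from two separate Apache Kafka clusters, you could define two entries side by side (the service names and broker addresses below are illustrative):

services:
  # First Apache Kafka cluster
  kafka-production:
    <<: *extract
    environment:
      <<: *kafka-defaults
      # Multiple brokers are semicolon-separated
      KAFKA_BOOTSTRAP_SERVERS: "kafka-prod-1:9092;kafka-prod-2:9092"
      SKIP_INTERNAL_TOPICS: "true"
    volumes:
      - ./output/kafka-production:/output
  # Second Apache Kafka cluster
  kafka-staging:
    <<: *extract
    environment:
      <<: *kafka-defaults
      KAFKA_BOOTSTRAP_SERVERS: "kafka-staging:9092"
      SKIP_INTERNAL_TOPICS: "true"
    volumes:
      - ./output/kafka-staging:/output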
Define services for Confluent Kafka
For each Confluent Kafka instance, define an entry under services in the compose file.
Each entry will have the following structure:
# Example Confluent Kafka connection
connection-name:
  <<: *extract
  environment:
    <<: *kafka-defaults
    # Kafka bootstrap servers (semicolon-separated)
    KAFKA_BOOTSTRAP_SERVERS: "localhost:9092"
    # Skip topics that are internal to Kafka (e.g. __consumer_offsets)
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    # You can change './output/connection-name' to any output location you want
    - ./output/connection-name:/output
- Replace connection-name with the name of your connection.
- <<: *extract tells the kafka-extractor tool to run.
- environment contains all parameters for the tool.
- volumes specifies where to store results. In this example, the extractor will store results in the ./output/connection-name folder on the local file system.
You can add as many Confluent Kafka connections as you want.
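For a Confluent Cloud cluster, the bootstrap endpoint typically has the form pkc-xxxxx.<region>.<provider>.confluent.cloud:9092. For example (the host below is a placeholder):

# Example Confluent Cloud connection (placeholder bootstrap host)
confluent-cloud-example:
  <<: *extract
  environment:
    <<: *kafka-defaults
    KAFKA_BOOTSTRAP_SERVERS: "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092"
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    - ./output/confluent-cloud-example:/output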
Define services for Aiven Kafka
For each Aiven Kafka instance, define an entry under services in the compose file.
Each entry will have the following structure:
# Example Aiven Kafka connection
connection-name:
  <<: *extract
  secrets:
    - kafka_client_config
    - kafka_ca_cert
    # Uncomment the following lines if you are using Aiven Kafka with client certificate authentication
    # - kafka_access_cert
    # - kafka_access_key
  environment:
    <<: *kafka-defaults
    # Kafka bootstrap servers (semicolon-separated)
    KAFKA_BOOTSTRAP_SERVERS: "localhost:9092"
    # Skip topics that are internal to Kafka (e.g. __consumer_offsets)
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    # You can change './output/connection-name' to any output location you want
    - ./output/connection-name:/output
- Replace connection-name with the name of your connection.
- <<: *extract tells the kafka-extractor tool to run.
- environment contains all parameters for the tool.
- volumes specifies where to store results. In this example, the extractor will store results in the ./output/connection-name folder on the local file system.
You can add as many Aiven Kafka connections as you want.
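If your Aiven Kafka instance uses client certificate authentication, uncomment the access certificate and key secrets. A completed entry might look like this (the service name and host are placeholders):

# Example Aiven Kafka connection with client certificate authentication
aiven-kafka-example:
  <<: *extract
  secrets:
    - kafka_client_config
    - kafka_ca_cert
    - kafka_access_cert
    - kafka_access_key
  environment:
    <<: *kafka-defaults
    KAFKA_BOOTSTRAP_SERVERS: "kafka-xxxxx.aivencloud.com:12345"
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    - ./output/aiven-kafka-example:/output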
Define services for Redpanda Kafka
For each Redpanda Kafka instance, define an entry under services in the compose file.
Each entry will have the following structure:
# Example Redpanda Kafka connection
connection-name:
  <<: *extract
  environment:
    <<: *kafka-defaults
    # Kafka bootstrap servers (semicolon-separated)
    KAFKA_BOOTSTRAP_SERVERS: "localhost:9092"
    # Skip topics that are internal to Kafka (e.g. __consumer_offsets)
    SKIP_INTERNAL_TOPICS: "true"
  volumes:
    # You can change './output/connection-name' to any output location you want
    - ./output/connection-name:/output
- Replace connection-name with the name of your connection.
- <<: *extract tells the kafka-extractor tool to run.
- environment contains all parameters for the tool.
- volumes specifies where to store results. In this example, the extractor will store results in the ./output/connection-name folder on the local file system.
You can add as many Redpanda Kafka connections as you want.
See the Docker Compose documentation to understand the services format in more detail.
Provide credentials
To define the credentials for your Kafka connections, you will need to provide a Kafka client configuration file. For managed Kafka instances such as Confluent Cloud and Aiven, this configuration can be copied directly from the console.
Here is an example that would be compatible with all Kafka variants: Apache Kafka, Confluent Cloud, and Aiven Kafka. This is just an example; your cluster configuration may vary:
# Required connection configs for Kafka producer, consumer, and admin
# If SSL is enabled, use SASL_SSL; otherwise use SASL_PLAINTEXT (when using basic auth)
security.protocol=SASL_SSL
# If basic auth is enabled
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ USERNAME or CLUSTER_API_KEY }}' password='{{ PASSWORD or CLUSTER_API_SECRET }}';
sasl.mechanism=PLAIN
# Best practice for higher availability in Apache Kafka clients prior to 3.0
session.timeout.ms=45000
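If your cluster does not use SSL, the equivalent basic-auth configuration with SASL_PLAINTEXT might look like this (a sketch; adjust to your cluster's setup):

security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ USERNAME }}' password='{{ PASSWORD }}';
session.timeout.ms=45000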
Redpanda Kafka only supports the SCRAM authentication method. Here is an example configuration:
sasl.mechanism=<SCRAM-SHA-256 or SCRAM-SHA-512 depending on your config>
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<username>" password="<password>";
# Best practice for higher availability in Apache Kafka clients prior to 3.0
session.timeout.ms=45000
Secure credentials
Using local files
To specify the local files in your compose file:
secrets:
  kafka_client_config:
    # Change it to the actual location of your kafka config file (MANDATORY)
    file: ./kafka.client.config
  kafka_ca_cert:
    # Change it to the actual location of your kafka CA cert file (OPTIONAL - only use if using custom CA)
    file: ./ca.pem
  kafka_access_cert:
    # Change it to the actual location of your kafka access cert file (OPTIONAL - only use if using Client Certificate auth)
    file: ./service.cert
  kafka_access_key:
    # Change it to the actual location of your kafka access key file (OPTIONAL - only use if using Client Certificate auth)
    file: ./service.key
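For example, if you only need the mandatory client configuration (no custom CA and no client certificates), the secrets section can be as small as:

secrets:
  kafka_client_config:
    file: ./kafka.client.config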
The secrets section is at the same top level as the services section described earlier. It is not a sub-section of the services section.
Using Docker secrets
To create and use Docker secrets:
- Store the Kafka configuration file:
sudo docker secret create kafka_client_config path/to/kafka.client.config
# Optional
sudo docker secret create kafka_ca_cert path/to/ca.pem
sudo docker secret create kafka_access_cert path/to/service.cert
sudo docker secret create kafka_access_key path/to/service.key
- At the top of your compose file, add a secrets element to access your secret:
secrets:
  kafka_client_config:
    external: true
    name: kafka_client_config
  kafka_ca_cert:
    external: true
    name: kafka_ca_cert
  kafka_access_cert:
    external: true
    name: kafka_access_cert
  kafka_access_key:
    external: true
    name: kafka_access_key
- The name should be the same one you used in the docker secret create command above.
- Once stored as a Docker secret, you can remove the local Kafka configuration file (you can verify the stored secrets as shown below).
- Within the service section of the compose file, add a new secrets element and specify the name of the secret to use within your service.
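To confirm the secrets were created, you can list them:

sudo docker secret ls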
Example
Let's walk through a complete example in detail:
secrets:
  kafka_client_config:
    external: true
    name: kafka_client_config

x-templates:
  # ...

services:
  # Example Apache Kafka connection
  apache-kafka-example:
    <<: *extract
    secrets:
      # Use the secret defined at the top of the file
      - kafka_client_config
    environment:
      <<: *kafka-defaults
      # Kafka bootstrap servers (semicolon-separated)
      KAFKA_BOOTSTRAP_SERVERS: "localhost:9092"
      # Skip topics that are internal to Kafka (e.g. __consumer_offsets)
      SKIP_INTERNAL_TOPICS: "true"
    volumes:
      # You can change './output/apache-kafka-example' to any output location you want
      - ./output/apache-kafka-example:/output
- In this example, we've defined the secrets at the top of the file (you could also define them at the bottom). The kafka_client_config secret refers to an external Docker secret created using the docker secret create command.
- The name of this service is apache-kafka-example. You can use any meaningful name you want.
- The <<: *kafka-defaults fragment sets the connection type to Kafka.
- KAFKA_BOOTSTRAP_SERVERS tells the extractor about the Kafka hosts or brokers.
- SKIP_INTERNAL_TOPICS tells the extractor whether to extract internal topics or skip them.
- The ./output/apache-kafka-example:/output line tells the extractor where to store results. In this example, the extractor will store results in the ./output/apache-kafka-example directory on the local file system. We recommend outputting the extracted metadata for different connections to separate directories.
- The secrets element within the service tells the extractor which secrets to use for this service. Each entry refers to the name of a secret listed at the beginning of the compose file.
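Once your connections and credentials are defined, you would typically run the extractor from the directory containing the compose file, for example (with Compose v2, use docker compose instead of docker-compose):

# Run every connection defined in the compose file
sudo docker-compose up
# Or run a single connection by its service name
sudo docker-compose run apache-kafka-example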