The metadata-extractor tool supports the following connection types. Each entry below describes the details required to set up on-premises database access for that source.
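The `<<: *…` merge keys used throughout refer to YAML anchors defined elsewhere in the compose file. Their exact contents depend on your deployment and are not shown in this reference; a purely hypothetical sketch of what such anchor definitions might look like (the `x-*` names, image name, and driver path are all illustrative, not the tool's actual values):

```yaml
# Hypothetical anchor definitions referenced by the examples below.
# Replace the image name, shared settings, and driver path with the
# values from your own deployment.
x-extract: &extract
  image: metadata-extractor:latest   # illustrative image name

x-redshift: &redshift
  CONNECTION_TYPE: redshift          # illustrative environment entry

x-shared-jdbc-drivers: &shared-jdbc-drivers ./jdbc-drivers:/jdbc-drivers  # illustrative host path
```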
## Amazon Redshift with basic authentication

Use `<<: *redshift` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 5439)
- `DATABASE` — database name (required)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-redshift-database:
    <<: *extract
    environment:
      <<: *redshift
      USERNAME: my-database-username
      PASSWORD: my-database-password
      HOST: redshift-host
      PORT: redshift-database-port
      DATABASE: my-database-name
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/redshift-example:/output
```
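The only difference between the two filter variants above is the dollar sign: `$$` is the standard escape for a literal `$` in files that undergo variable interpolation, so the regex anchors reach the extractor intact. Python's `string.Template` follows the same `$$` convention and can illustrate the collapse (this demonstrates the escaping rule only, not the extractor itself):

```python
from string import Template

# "$$" collapses to a single literal "$" during substitution,
# turning the Swarm-mode form into the plain regex form.
swarm_form = Template('{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}')
print(swarm_form.substitute())
# {"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}
```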
## Amazon Redshift with IAM user authentication

Use `<<: *redshift-iam-user` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 5439)
- `DATABASE` — database name (required)
- `DATABASE_USER` — database username of the IAM user (required)
- `AWS_ACCESS_KEY_ID` — AWS access key ID (required)
- `AWS_SECRET_ACCESS_KEY` — AWS secret access key (required)
- `CLUSTER_ID` — cluster identifier of your private Amazon Redshift cluster (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  redshift-iam-user-example:
    <<: *extract
    environment:
      <<: *redshift-iam-user
      AWS_ACCESS_KEY_ID: aws-access-key-id
      AWS_SECRET_ACCESS_KEY: aws-secret-access-key
      HOST: redshift-host
      DATABASE: my-database-name
      DATABASE_USER: my-database-user
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      CLUSTER_ID: private-cluster-id
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/redshift-iam-user-example:/output
```
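The filter templates map a database-name regex to a list of schema-name regexes. The extractor's exact matching semantics are not documented here, but a plausible reading can be sketched as follows (the `schema_included` helper and its include-then-exclude logic are assumptions for illustration, not the tool's implementation):

```python
import json
import re

def schema_included(include_template: str, exclude_template: str,
                    database: str, schema: str) -> bool:
    """Hypothetical reading of the filter templates: a schema is kept when
    it matches the include map (or the map is empty) and is not matched by
    the exclude map."""
    includes = json.loads(include_template or "{}")
    excludes = json.loads(exclude_template or "{}")

    def matches(template):
        # A template matches when some database pattern matches the
        # database name and one of its schema patterns matches the schema.
        return any(
            re.search(db_pattern, database)
            and any(re.search(p, schema) for p in schema_patterns)
            for db_pattern, schema_patterns in template.items()
        )

    if includes and not matches(includes):
        return False
    return not matches(excludes)

# Docker Compose form of the filters from the examples above:
include = '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
exclude = '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
print(schema_included(include, exclude, "DB1", "SCHEMA2"))  # True
print(schema_included(include, exclude, "DB1", "SCHEMA3"))  # False
```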
## Hive with basic authentication

Use `<<: *hive` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 10000)
- `DEFAULT_SCHEMA` — default schema name (optional, default is `default`)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `SCHEMA_EXCLUDE_REGEX` — regex to exclude schemas (optional)
- `SCHEMA_INCLUDE_REGEX` — regex to include schemas (optional)
- `TEMP_TABLE_REGEX` — regex to exclude tables (optional)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  hive-example:
    <<: *hive-extract
    environment:
      <<: *hive
      HOST: hive-host
      PORT: hive-port
      DEFAULT_SCHEMA: default
      USERNAME: my-hive-username
      PASSWORD: my-hive-password
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/hive-example:/output
```
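Unlike the other connectors, Hive also accepts plain regexes (`SCHEMA_INCLUDE_REGEX`, `SCHEMA_EXCLUDE_REGEX`, `TEMP_TABLE_REGEX`) alongside the filter templates. How the extractor applies them internally is not specified here; a minimal sketch of the usual include/exclude semantics, with an assumed `keep_schema` helper:

```python
import re

def keep_schema(name, include_regex=None, exclude_regex=None):
    # A name survives if it matches the include regex (when one is set)
    # and does not match the exclude regex (when one is set).
    if include_regex and not re.search(include_regex, name):
        return False
    if exclude_regex and re.search(exclude_regex, name):
        return False
    return True

print(keep_schema("sales", include_regex="^sales$"))    # True
print(keep_schema("tmp_sales", exclude_regex="^tmp_"))  # False
```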
## Microsoft SQL Server with basic authentication

Use `<<: *mssql` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `DATABASE` — database name (required)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-mssql-database:
    <<: *extract
    environment:
      <<: *mssql
      USERNAME: db-user
      PASSWORD: db-user-password
      HOST: mssql-database-host
      DATABASE: northwind
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/mssql-database:/output
```
## MySQL with basic authentication

Use `<<: *mysql` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 3306)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-mysql-database:
    <<: *extract
    environment:
      <<: *mysql
      USERNAME: db-user
      PASSWORD: db-user-password
      HOST: mysql-database-host.internal
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-mysql-database:/output
```
## MySQL with IAM authentication

Use `<<: *mysql-iam` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 3306)
- `USERNAME` — database username (required)
- `AWS_ACCESS_KEY_ID` — AWS access key ID (required)
- `AWS_SECRET_ACCESS_KEY` — AWS secret access key (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-mysql-database:
    <<: *extract
    environment:
      <<: *mysql-iam
      AWS_ACCESS_KEY_ID: my-access-key-id
      AWS_SECRET_ACCESS_KEY: my-secret-access-key
      USERNAME: db-user
      HOST: mysql-database-host.internal
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-mysql-database:/output
```
## Oracle with basic authentication

Use `<<: *oracledb` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 1521)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `SERVICE` — SID/service identifier (usually `ORCL`) or service name (required)
- `DEFAULT_DATABASE_NAME` — database name (usually the same as the SID) (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-oracledb-database:
    <<: *extract
    environment:
      <<: *oracledb
      HOST: oracle-host.internal
      USERNAME: db-user
      PASSWORD: db-user-password
      SERVICE: ORCL
      DEFAULT_DATABASE_NAME: ORCL
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-oracledb-database:/output
```
## PostgreSQL with basic authentication

Use `<<: *psql` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 5432)
- `DATABASE` — database name (required)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-psql-database:
    <<: *extract
    environment:
      <<: *psql
      USERNAME: db-user
      PASSWORD: db-user-password
      HOST: psql-database-host.internal
      DATABASE: inventory
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-psql-database:/output
```
## PostgreSQL with IAM authentication

Use `<<: *psql-iam` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 5432)
- `DATABASE` — database name (required)
- `USERNAME` — database username (required)
- `AWS_ACCESS_KEY_ID` — AWS access key ID (required)
- `AWS_SECRET_ACCESS_KEY` — AWS secret access key (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-psql-database:
    <<: *extract
    environment:
      <<: *psql-iam
      AWS_ACCESS_KEY_ID: my-access-key-id
      AWS_SECRET_ACCESS_KEY: my-secret-access-key
      USERNAME: db-user
      HOST: psql-database-host.internal
      DATABASE: inventory
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DB1$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DB2$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-psql-database:/output
```
## SAP HANA with basic authentication

Use `<<: *saphana` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 39017)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables schema-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-saphana-database:
    <<: *extract
    environment:
      <<: *saphana
      USERNAME: my-database-username
      PASSWORD: my-database-password
      HOST: my-database-host
      PORT: 39017
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^SCHEMA1$$", "^SCHEMA2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^SCHEMA1$", "^SCHEMA2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/saphana-example:/output
```
## Teradata with basic authentication

Use `<<: *teradata` under the `environment` section to use this connection type.
Available attributes:

- `HOST` — database host name or IP address (required)
- `PORT` — database port (optional, default is 1025)
- `USERNAME` — database username (required)
- `PASSWORD` — database user password (required)
- `USE_SOURCE_SCHEMA_FILTERING` — Boolean that enables database-level filtering when fetching schemas, tables, and columns (optional)
- `EXCLUDE_FILTER_TEMPLATE` — exclude filter pattern (optional)
- `INCLUDE_FILTER_TEMPLATE` — include filter pattern (optional)
- `USE_JDBC_INTERNAL_METHODS` — Boolean that enables the use of JDBC internal methods during extraction (optional)
### Example

```yaml
services:
  my-teradata-database:
    <<: *extract
    environment:
      <<: *teradata
      USERNAME: db-user
      PASSWORD: db-user-password
      HOST: teradata-database-host.internal
      # If using Docker Swarm mode for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^DB1$$", "^DB2$$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$$": ["^DB1$$", "^DB2$$"]}'
      # If using Docker Compose for offline extraction, format the filters as follows:
      INCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^DB1$", "^DB2$"]}'
      EXCLUDE_FILTER_TEMPLATE: '{"^DEFAULT$": ["^DB1$", "^DB2$"]}'
      USE_SOURCE_SCHEMA_FILTERING: "false"
      USE_JDBC_INTERNAL_METHODS: "false"
    volumes:
      - *shared-jdbc-drivers
      - ./output/my-teradata-database:/output
```