Airbyte Based Sources
Make sure Jitsu has access to Docker. If you've deployed Jitsu with Docker, you should mount two volumes: /var/run/docker.sock:/var/run/docker.sock and jitsu_workspace:/home/eventnative/data/airbyte.
Read more about volumes for the Joint Image (if you deploy @jitsucom/jitsu) or for Jitsu Server (if you deploy @jitsucom/server).
Airbyte Based Sources may not work properly in managed environments that limit access to Docker, e.g., Heroku.
If you deploy Jitsu to Kubernetes, please check How to run Airbyte sources in Kubernetes.
Airbyte is an open-source ETL framework. Jitsu supports Airbyte as one of its connector backends (the others being Singer and native connectors).
This page describes how to configure Jitsu Server if it runs in standalone mode (without Configurator). If you deployed Jitsu along with Configurator, you can configure Airbyte connectors directly in the UI.
Understanding Airbyte connectors
Each Airbyte connector is a standalone Docker image that implements the Airbyte protocol. The protocol is CLI-based: the connector executor (Jitsu) feeds the connector formatted JSON objects (see below) and parses its stdout, which is also a stream of JSON objects.
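To make the stdout side of the protocol concrete, here is a minimal Python sketch of how an executor could read a connector's output. It is not Jitsu's implementation. It assumes that each output line is one JSON message carrying a `type` field (such as `RECORD`, `STATE`, or `LOG`) and that record payloads sit under `record.stream` and `record.data`. The function names and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_airbyte_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield every well-formed message (a JSON object with a "type" key)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Assumption: non-JSON lines are noise and can be skipped.
            continue
        if isinstance(message, dict) and "type" in message:
            yield message


def collect_records(lines: Iterable[str]) -> dict:
    """Group the data of RECORD messages by stream name."""
    records: dict = {}
    for message in iter_airbyte_messages(lines):
        if message["type"] == "RECORD":
            record = message["record"]
            records.setdefault(record["stream"], []).append(record["data"])
    return records


# Hypothetical connector output, one message per line.
sample_stdout = [
    '{"type": "LOG", "log": {"level": "INFO", "message": "starting sync"}}',
    '{"type": "RECORD", "record": {"stream": "orders", "data": {"id": 1}, "emitted_at": 1600000000000}}',
    "not json",
    '{"type": "RECORD", "record": {"stream": "orders", "data": {"id": 2}, "emitted_at": 1600000000001}}',
    '{"type": "STATE", "state": {"data": {"orders": {"updated_at": "2020-10-10"}}}}',
]

print(collect_records(sample_stdout))
```

In a real executor the lines would come from the running container's stdout rather than a list, and `STATE` messages would be persisted so the next sync can resume from them.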
Configuration
Airbyte configuration consists of two JSON objects (see specification):
Name | Description |
---|---|
Config (required) | JSON object that contains the connector's configuration parameters. Each Airbyte connector has a configuration specification that describes these parameters. |
Catalog | JSON object that contains all streams (object types) and fields to download. If not provided, Jitsu runs discovery and saves a catalog with all available streams. The JSON structure is standardized, but stream and field names depend on the connector. |
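As an illustration of the standardized catalog structure, the following Python sketch builds a catalog object for a list of stream names and serializes it to a JSON string. The field names (`stream`, `json_schema`, `supported_sync_modes`, `sync_mode`, `destination_sync_mode`) follow the configured catalog format of the Airbyte protocol. The helper names, the empty schema, and the chosen sync modes are assumptions made for the example; a catalog saved by discovery for your connector is the more reliable starting point.

```python
import json
from typing import Iterable


def configured_stream(name: str, sync_mode: str = "full_refresh") -> dict:
    """Build one catalog entry for the stream called `name`."""
    return {
        "stream": {
            "name": name,
            # Assumption: an empty schema placeholder; real catalogs
            # carry the JSON schema reported by the connector.
            "json_schema": {},
            "supported_sync_modes": [sync_mode],
        },
        "sync_mode": sync_mode,
        "destination_sync_mode": "overwrite",
    }


def build_catalog(stream_names: Iterable[str]) -> dict:
    """Wrap the entries in the top-level "streams" list."""
    return {"streams": [configured_stream(name) for name in stream_names]}


catalog = build_catalog(["orders", "products"])
catalog_json = json.dumps(catalog)
print(catalog_json)
```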
The connectors' configuration should be placed under the sources section of the configuration file. The structure is an extension of the generic source configuration: type: airbyte indicates that the source should be handled by the Airbyte executor.
sources:
  <connector_id>:
    type: airbyte
    schedule: '...' # Cron expression or period notation (@daily or @hourly)
    destinations: [ ] # List of destinations
    config:
      config: {} # Airbyte config object (see below)
      docker_image: 'airbyte/image' # Airbyte connector image
      catalog: {} # Airbyte catalog object (see below).
                  # Optional. If not provided, all streams
                  # will be synchronized
Example:
sources:
  ...
  jitsu_airbyte_hubspot:
    type: airbyte
    destinations: [ "clickhouse_destination_id" ]
    schedule: '*/5 * * * *'
    config:
      config:
        credentials:
          api_key: '<HUBSPOT_API_KEY>'
        start_date: "2017-01-25T00:00:00Z"
      docker_image: source-hubspot
  airbyte_source_shopify:
    destinations: [ "clickhouse_destination_id" ]
    type: airbyte
    schedule: '@daily'
    config:
      config:
        api_password: '<SHOPIFY_API_PASSWORD>'
        shop: 'https://EXAMPLE.myshopify.com'
        start_date: "2020-10-10"
      docker_image: source-shopify
JSON configuration parameters such as config, catalog, and state can each be given as an object, raw JSON, a JSON string, or a path to a local JSON file.
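The accepted forms can be pictured with a small Python sketch. `resolve_json_param` is a hypothetical helper, not part of Jitsu. It assumes that a string starting with `{` or `[` is JSON text and that any other string is a path to a local JSON file.

```python
import json
import os
import tempfile
from typing import Any


def resolve_json_param(value: Any) -> Any:
    """Return the parsed JSON for an object, a JSON string, or a file path."""
    if isinstance(value, (dict, list)):
        # Already an object: nothing to parse.
        return value
    if isinstance(value, str):
        text = value.strip()
        if text.startswith(("{", "[")):
            return json.loads(text)
        # Assumption: any other string is a path to a local JSON file.
        with open(text, encoding="utf-8") as handle:
            return json.load(handle)
    raise TypeError(f"unsupported JSON parameter type: {type(value).__name__}")


# Usage: the same catalog supplied in three different forms.
catalog = {"streams": []}
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "catalog.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(catalog, handle)
    forms = [catalog, json.dumps(catalog), path]
    resolved = [resolve_json_param(form) for form in forms]

print(all(item == catalog for item in resolved))
```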
Table Names
Jitsu creates tables named $sourceID_$AirbyteStreamName by default. For instance, a table named airbyte_source_shopify_orders will be created from the following configuration:
sources:
  ...
  airbyte_source_shopify:
    destinations: [ "clickhouse_destination_id" ]
    type: airbyte
    schedule: '@daily'
    config:
      config:
        api_password: '<SHOPIFY_API_PASSWORD>'
        shop: 'https://EXAMPLE.myshopify.com'
        start_date: "2020-10-10"
      catalog: '{"streams":[{"stream": {"name":"orders", ...}}]}'
      docker_image: source-shopify
Table names can be overridden by adding the stream_table_names configuration parameter:
sources:
  ...
  airbyte_source_shopify:
    destinations: [ "clickhouse_destination_id" ]
    type: airbyte
    schedule: '@daily'
    config:
      config:
        api_password: '<SHOPIFY_API_PASSWORD>'
        shop: 'https://EXAMPLE.myshopify.com'
        start_date: "2020-10-10"
      catalog: '{"streams":[{"stream": {"name":"orders", ...}},{"stream": {"name":"products", ...}}]}'
      docker_image: source-shopify
      stream_table_names:
        orders: my_orders
        products: my_products
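The naming rule in this section can be summarized with a short Python sketch. `table_name` is a hypothetical function written for illustration. It assumes that an entry in stream_table_names replaces the whole default name and that no further normalization is applied, which the real implementation may do differently.

```python
from typing import Mapping, Optional


def table_name(
    source_id: str,
    stream_name: str,
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the override for the stream if present, else <source_id>_<stream_name>."""
    if overrides and stream_name in overrides:
        return overrides[stream_name]
    return f"{source_id}_{stream_name}"


stream_table_names = {"orders": "my_orders", "products": "my_products"}

# Default naming: source ID and stream name joined with an underscore.
print(table_name("airbyte_source_shopify", "orders"))
# An override from stream_table_names takes precedence.
print(table_name("airbyte_source_shopify", "orders", stream_table_names))
# Streams without an override keep the default name.
print(table_name("airbyte_source_shopify", "customers", stream_table_names))
```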