Retroactive User Recognition

Jitsu can store all events from anonymous users and, once a user is identified, update those events in the data warehouse (DWH) with the user ID. At present this functionality is supported only for Postgres, Redshift, Snowflake, MySQL and ClickHouse*.

* User Recognition support for ClickHouse is limited to the ReplacingMergeTree and ReplicatedReplacingMergeTree engines.
* ClickHouse handles data mutation differently. Please read ClickHouse specifics to avoid unexpected results of Retroactive User Recognition on ClickHouse data tables.

Example

event_id	anonymous_id	email
event1	1
event2	1
event3	1	a@b.com
event4	1	a@b.com

Right after event3 arrives, Jitsu amends event1 and event2 by adding email=a@b.com. As a result, the DWH will contain the following events:

event_id	anonymous_id	email
event1	1	a@b.com
event2	1	a@b.com
event3	1	a@b.com
event4	1	a@b.com

The anonymous_id and email fields are configurable. See anonymous_id_node and identification_nodes below.
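
The behavior in the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Jitsu's actual implementation: the `recognize` function, its parameters, and the in-memory dictionaries (standing in for Redis and the DWH) are all hypothetical.

```python
from collections import defaultdict


def recognize(events, anonymous_key="anonymous_id", id_keys=("email",)):
    """Return events as they would look after retroactive recognition.

    Hypothetical sketch: `stored` stands in for the DWH table and
    `pending` for the anonymous events buffered per anonymous_id.
    """
    stored = {}                  # event_id -> event row
    pending = defaultdict(list)  # anonymous_id -> ids of not yet identified events
    for event in events:
        event = dict(event)      # copy so the input is not mutated
        stored[event["event_id"]] = event
        anon = event[anonymous_key]
        identity = {k: event[k] for k in id_keys if event.get(k)}
        if not identity:
            # Still anonymous: remember the event for a later amendment.
            pending[anon].append(event["event_id"])
            continue
        # Identified: amend every buffered event with the same anonymous_id.
        for event_id in pending.pop(anon, []):
            stored[event_id].update(identity)
    return list(stored.values())


events = [
    {"event_id": "event1", "anonymous_id": "1"},
    {"event_id": "event2", "anonymous_id": "1"},
    {"event_id": "event3", "anonymous_id": "1", "email": "a@b.com"},
    {"event_id": "event4", "anonymous_id": "1", "email": "a@b.com"},
]
result = recognize(events)
for row in result:
    print(row["event_id"], row["anonymous_id"], row["email"])
```

With the four events from the table, every returned row carries email a@b.com, matching the second table above.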

Resources

[Figure: user recognition flow]

Retroactive User Recognition stores all incoming anonymous events in Redis, so RAM consumption can be high. A few measures reduce it: use a dedicated Redis instance, and configure eviction and compression. Read how to optimize Redis memory.

Configuration

To enable this feature, set users_recognition.enabled to true in the configuration file, or use the equivalent env variable USER_RECOGNITION_ENABLED=true.

This setting enables user recognition for all supported destinations: Postgres, Redshift, Snowflake, MySQL and ClickHouse. By default, /user/anonymous_id is the node from which anonymous_id is read, and /user/id and /user/email are the sources for the user identification fields.

These settings can be overridden at the global level of the config file:

users_recognition:
  enabled: true #Enabled by default.
  anonymous_id_node: /user/anonymous_id
  identification_nodes:
    - /user/id
    - /user/email

The anonymous_id_node and identification_nodes settings cannot be configured with env variables at the moment.
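
To make the node settings more concrete, here is a minimal Python sketch of how a slash-separated node such as /user/anonymous_id could be resolved against a nested event. The `get_node` and `extract_ids` helpers are hypothetical and only illustrate the idea. In particular, the source does not say whether all identification nodes must be present or any one is enough, so the sketch simply collects whichever are set.

```python
def get_node(event, path):
    """Resolve a slash-separated path like /user/anonymous_id in a nested dict.

    Returns None when any segment of the path is missing.
    """
    current = event
    for key in path.strip("/").split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_ids(event,
                anonymous_id_node="/user/anonymous_id",
                identification_nodes=("/user/id", "/user/email")):
    """Return (anonymous_id, identity), where identity maps node -> value."""
    anonymous_id = get_node(event, anonymous_id_node)
    identity = {}
    for node in identification_nodes:
        value = get_node(event, node)
        if value not in (None, ""):
            identity[node] = value
    return anonymous_id, identity


anonymous_event = {"user": {"anonymous_id": "1"}}
identified_event = {"user": {"anonymous_id": "1", "email": "a@b.com"}}

anonymous_result = extract_ids(anonymous_event)
identified_result = extract_ids(identified_event)
```

An event with an empty identity would be treated as anonymous in this sketch, while one with a non-empty identity would trigger the amendment of earlier events that share its anonymous_id.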

By default, a system-wide Redis instance will be used for storing the data (meta.storage.redis in config file or REDIS_URL env var).

You can use a dedicated Redis instance (separate from the Redis used for configuration and short-term caches) and apply memory optimization. Read more about the options here.

This feature requires:

users_recognition.redis or meta.storage.redis configuration
primary_key_fields configuration in Postgres, Redshift and MySQL destinations. Read more about these settings in General Configuration.
