Retroactive User Recognition
Jitsu can store all events from anonymous users and, once a user is identified, update those events in the data warehouse (DWH) with the user id. At present this functionality is supported only for Postgres, Redshift, Snowflake, MySQL and ClickHouse*.
*User Recognition support for ClickHouse is limited to the ReplacingMergeTree and ReplicatedReplacingMergeTree engines.
*ClickHouse handles data mutation differently. Please read ClickHouse specifics to avoid unexpected results of Retroactive User Recognition on ClickHouse data tables.
Example
event_id | anonymous_id | email |
---|---|---|
event1 | 1 | |
event2 | 1 | |
event3 | 1 | a@b.com |
event4 | 1 | a@b.com |
Right after event3 arrives, Jitsu amends event1 and event2 and adds email=a@b.com. As a result, the DWH will contain the following events:
event_id | anonymous_id | email |
---|---|---|
event1 | 1 | a@b.com |
event2 | 1 | a@b.com |
event3 | 1 | a@b.com |
event4 | 1 | a@b.com |
The anonymous_id and email fields are configurable. See identification_nodes below.
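The amendment shown in the tables above can be sketched in a few lines of Python. This is only an illustration of the idea, not Jitsu's implementation: the `recognize` function, the in-memory `pending` buffer (standing in for Redis) and the `warehouse` dict (standing in for the DWH table) are all hypothetical names.

```python
from collections import defaultdict


def recognize(events):
    """Replay events in arrival order and return the rows as they would
    end up in the warehouse (illustrative model only)."""
    warehouse = {}               # event_id -> row; stand-in for the DWH table
    pending = defaultdict(list)  # anonymous_id -> ids of events with no email yet
    for event in events:
        row = dict(event)
        warehouse[row["event_id"]] = row
        anonymous_id = row["anonymous_id"]
        email = row.get("email")
        if email is None:
            # Anonymous event: remember it so it can be amended later.
            pending[anonymous_id].append(row["event_id"])
        else:
            # Identified event: amend every buffered event of this visitor.
            for event_id in pending.pop(anonymous_id, []):
                warehouse[event_id]["email"] = email
    return list(warehouse.values())


events = [
    {"event_id": "event1", "anonymous_id": "1", "email": None},
    {"event_id": "event2", "anonymous_id": "1", "email": None},
    {"event_id": "event3", "anonymous_id": "1", "email": "a@b.com"},
    {"event_id": "event4", "anonymous_id": "1", "email": "a@b.com"},
]
rows = recognize(events)
```

After the replay, every row in `rows` carries the email that first appeared on event3, which matches the second table.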
Resources
Retroactive User Recognition stores all incoming anonymous events in Redis, so RAM consumption can be high. You can take a few measures to reduce it: use a dedicated Redis instance, and configure eviction and compression. Read how to optimize Redis memory.
Configuration
To enable this feature, set users_recognition.enabled to true in the configuration file, or use its env variable equivalent USER_RECOGNITION_ENABLED=true.
This setting enables user recognition for all supported destinations: Postgres, Redshift, Snowflake, MySQL and ClickHouse. By default, /user/anonymous_id is used as the node for getting anonymous_id, and /user/id and /user/email are used as the sources for the user identification field.
These settings can be redefined at the global level of the config file:
users_recognition:
  enabled: true #Enabled by default.
  anonymous_id_node: /user/anonymous_id
  identification_nodes:
    - /user/id
    - /user/email
Those settings cannot be configured with env variables at the moment :(
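To make the node paths concrete, here is a minimal sketch of how a slash-separated path such as /user/email could be resolved against an incoming event. It assumes the event is a nested JSON object. The helper names `get_node` and `extract_ids` are hypothetical, and how Jitsu actually combines several identification nodes is not specified here, so the sketch simply reports every node that is present.

```python
ANONYMOUS_ID_NODE = "/user/anonymous_id"
IDENTIFICATION_NODES = ["/user/id", "/user/email"]


def get_node(event, path):
    """Follow a slash-separated path through nested dicts.
    Returns None when any segment of the path is missing."""
    current = event
    for key in path.strip("/").split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_ids(event):
    """Return the anonymous id and the identification values found in the event."""
    anonymous_id = get_node(event, ANONYMOUS_ID_NODE)
    found = {node: get_node(event, node) for node in IDENTIFICATION_NODES}
    # Keep only the nodes that are actually present in this event.
    identifiers = {node: value for node, value in found.items() if value is not None}
    return anonymous_id, identifiers


anonymous_event = {"user": {"anonymous_id": "1"}}
identified_event = {"user": {"anonymous_id": "1", "email": "a@b.com"}}
```

In this model an event with an empty `identifiers` dict would be treated as anonymous, and one with at least one identifier as identified.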
By default, a system-wide Redis instance will be used for storing the data (meta.storage.redis in the config file or the REDIS_URL env var).
You can use a dedicated Redis instance (separate from the Redis used for configuration and short-time caches) and apply memory optimization. Read more about options here.
This feature requires:
- users_recognition.redis or meta.storage.redis configuration
- primary_key_fields configuration in Postgres, Redshift and MySQL destinations. Read more about those settings on General Configuration