Depending on your data volumes, you may need to adjust how often you run the Digital Unified Model, and how it is configured, to optimise its performance.
Configuration variables
Adjusting the configuration variables below may help with optimisation.
lookback_window_hours
The number of hours before the latest processed event that each run looks back over, to account for late-arriving data that comes in out of order. Read more in Late Loaded Events in our docs.
Default | Implications |
---|---|
6 | Partitioned on collector timestamp: if your data pipeline is stable, this can be decreased to improve performance, but doing so risks missing events that arrive late during load spikes. Partitioned on load timestamp: you can safely reduce it to 0 or 1, because late-loaded events simply carry their later load timestamp and are still picked up. |
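The Python sketch below illustrates the lookback window, assuming each run re-reads every event timestamped at or after the latest processed event minus lookback_window_hours. The helper names are hypothetical and this is not the package's SQL; it only shows why a late event with an older timestamp is caught by the default 6-hour window but missed by a 1-hour one.

```python
from datetime import datetime, timedelta


def rescan_start(latest_processed: datetime, lookback_window_hours: int = 6) -> datetime:
    """Earliest timestamp the next run would re-read (hypothetical helper)."""
    # Step back from the latest processed event so out-of-order events
    # that landed after the previous run are still picked up.
    return latest_processed - timedelta(hours=lookback_window_hours)


def is_picked_up(event_tstamp: datetime, latest_processed: datetime,
                 lookback_window_hours: int = 6) -> bool:
    """True if an event with this timestamp falls inside the next run's window."""
    return event_tstamp >= rescan_start(latest_processed, lookback_window_hours)


latest = datetime(2024, 5, 1, 12, 0)
late_event = datetime(2024, 5, 1, 9, 30)  # arrived after the last run, timestamped earlier
print(is_picked_up(late_event, latest))     # True with the default of 6 hours
print(is_picked_up(late_event, latest, 1))  # False: a 1-hour window misses it
```

When the table is partitioned on load timestamp, a late-loaded event's timestamp is its recent load time, so in this sketch it would fall inside even a very small window.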
max_session_days
The maximum allowed session length in days. For sessions that exceed this length, events after the limit are not processed. This exists to reduce the lengthy table scans that long sessions can cause; such sessions are usually the result of bots. Read more in Quarantine Table in our docs.
Default | Implications |
---|---|
3 | If sessions are generally shorter, reducing this value can optimize performance, but setting it too low may prematurely cut off valid sessions. |
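As a rough illustration, the sketch below assumes the limit is applied by dropping any event timestamped more than max_session_days after its session started. The function is hypothetical, not the model's implementation.

```python
from datetime import datetime, timedelta


def events_to_process(session_start: datetime, event_tstamps: list[datetime],
                      max_session_days: int = 3) -> list[datetime]:
    """Keep only events within max_session_days of the session start (hypothetical helper)."""
    cutoff = session_start + timedelta(days=max_session_days)
    # Events beyond the cutoff (typically from bot sessions) are not processed.
    return [t for t in event_tstamps if t <= cutoff]


start = datetime(2024, 5, 1)
events = [start, start + timedelta(days=2), start + timedelta(days=4)]
print(len(events_to_process(start, events)))     # 2: the day-4 event is dropped
print(len(events_to_process(start, events, 5)))  # 3
```

Lowering the limit shrinks the range each session can span, which is where the performance gain comes from, but any genuine session longer than the limit loses its later events.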
days_late_allowed
The maximum allowed number of days between an event being created and it being sent to the collector. This exists to reduce the lengthy table scans that late-arriving data can cause. Read more in Late Sent Events in our docs.
Default | Implications |
---|---|
3 | A sufficiently large value is necessary for apps that support offline event generation, since their events can be sent days after they are created. Reducing it may improve performance but risks dropping late-sent events. |
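The check described above can be sketched as a comparison between when an event was created on the device and when it was sent. The parameter names created and sent are illustrative assumptions; this is not the package's actual filter.

```python
from datetime import datetime, timedelta


def within_late_allowance(created: datetime, sent: datetime,
                          days_late_allowed: int = 3) -> bool:
    """True if the event was sent no more than days_late_allowed after creation."""
    return sent <= created + timedelta(days=days_late_allowed)


created = datetime(2024, 5, 1, 8, 0)
print(within_late_allowance(created, datetime(2024, 5, 3, 8, 0)))  # True: sent 2 days later
print(within_late_allowance(created, datetime(2024, 5, 6, 8, 0)))  # False: sent 5 days later
```

An app that queues events offline for longer than the allowance would, in this sketch, have those events rejected, which is why such apps need a larger value.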
upsert_lookback_days
Number of days to look back over the incremental derived tables during the upsert. Where performance is not a concern, this should be set as high as possible; too short a period can result in duplicates. Read more in Optimize Upserts in our docs.
Default | Implications |
---|---|
30 | Lowering it may improve performance but reduces data integrity, as late-arriving data can then produce duplicates. Disabling it will worsen performance, so this is not recommended. |
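To show how a short lookback can create duplicates, the sketch below assumes a delete-then-insert upsert that only considers target rows whose timestamp is within upsert_lookback_days of the earliest timestamp in the new batch. In-memory lists stand in for tables, and the example deliberately gives s1 an old target timestamp to expose the failure mode; none of this is the package's actual SQL.

```python
from datetime import datetime, timedelta

Row = tuple[str, datetime]  # (unique key, start timestamp)


def upsert(target: list[Row], batch: list[Row], upsert_lookback_days: int = 30) -> list[Row]:
    """Delete-then-insert upsert that only scans target rows inside the lookback window."""
    lower = min(ts for _, ts in batch) - timedelta(days=upsert_lookback_days)
    batch_keys = {key for key, _ in batch}
    # Target rows older than `lower` are never scanned, so they are never replaced.
    kept = [(k, ts) for k, ts in target if not (ts >= lower and k in batch_keys)]
    return kept + batch


target = [("s1", datetime(2024, 1, 1)), ("s2", datetime(2024, 4, 20)), ("s3", datetime(2024, 4, 25))]
batch = [("s1", datetime(2024, 5, 1)), ("s2", datetime(2024, 5, 2))]

result_30 = upsert(target, batch, upsert_lookback_days=30)
result_365 = upsert(target, batch, upsert_lookback_days=365)
print([k for k, _ in result_30].count("s1"))   # 2: the old s1 row sat outside the window
print([k for k, _ in result_365].count("s1"))  # 1
```

The wider window scans more of the target table, which costs performance, but it lets the old s1 row be matched and replaced instead of duplicated.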
session_lookback_days
Number of days to limit the scan on the snowplow_unified_base_sessions_lifecycle_manifest table. This exists to improve model performance when there are a lot of sessions, and it should be set to as large a number as practical.
Default | Implications |
---|---|
730 | n/a |
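A minimal sketch of the scan limit, assuming a run only reads manifest entries for sessions that started within session_lookback_days of the run's lower limit. The dictionary stands in for the manifest table and the helper is hypothetical.

```python
from datetime import datetime, timedelta


def manifest_rows_scanned(manifest: dict[str, datetime], run_lower_limit: datetime,
                          session_lookback_days: int = 730) -> dict[str, datetime]:
    """Return the manifest entries a run would scan (hypothetical helper)."""
    floor = run_lower_limit - timedelta(days=session_lookback_days)
    # Sessions that started before the floor are skipped entirely.
    return {sid: start for sid, start in manifest.items() if start >= floor}


manifest = {"a": datetime(2021, 1, 1), "b": datetime(2024, 3, 1)}
print(sorted(manifest_rows_scanned(manifest, datetime(2024, 5, 1))))  # ['b']
```

In this sketch, a new event for session a would not be matched against its existing manifest entry, which illustrates why the value should be as large as practical.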
backfill_limit_days
The maximum number of days of new data to be processed since the latest event processed. Please refer to the incremental logic section for more details. Read more in Sessionization in our docs.
Default | Implications |
---|---|
30 | Only affects backfill processes, not regular runs, so tuning this is useful for managing data loads in historical updates. |
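The sketch below shows why this limit only bites during backfills. It assumes, as the description states, that the cap is measured from the latest processed event, and that a run never reads past the newest event in the source; the helper name and the use of lookback_window_hours for the lower bound are illustrative assumptions.

```python
from datetime import datetime, timedelta


def run_window(latest_processed: datetime, latest_in_source: datetime,
               lookback_window_hours: int = 6,
               backfill_limit_days: int = 30) -> tuple[datetime, datetime]:
    """Return (lower, upper) bounds for one run (hypothetical helper)."""
    lower = latest_processed - timedelta(hours=lookback_window_hours)
    # Cap how much new data a single run may take on.
    upper = min(latest_processed + timedelta(days=backfill_limit_days), latest_in_source)
    return lower, upper


# Backfill: the source is months ahead, so the cap applies.
print(run_window(datetime(2024, 1, 1), datetime(2024, 5, 1))[1])  # 2024-01-31 00:00:00
# Regular run: the source is only hours ahead, so the cap has no effect.
print(run_window(datetime(2024, 5, 1), datetime(2024, 5, 1, 6))[1])  # 2024-05-01 06:00:00
```

In a backfill, repeated runs each advance by at most backfill_limit_days, so a smaller value means more, lighter runs.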