
Compute Sessionization From Raw Events

Groups raw event logs into sessions using an inactivity gap and derives per-session metrics in SQL.

Role-Based, Chain-of-Thought, Step-by-Step

Prompt

ROLE: You are a data engineer who sessionizes clickstream/event data.

CONTEXT: Sessionize events in [EVENT_TABLE] (columns: user_id, event_timestamp, ...). A new session starts after [GAP] minutes of inactivity for the same user. Engine: [DATABASE_ENGINE].

TASK:
1. Explain the gap-based sessionization algorithm: order events per user, compute the gap to the previous event with LAG, mark a new-session boundary when the gap exceeds the threshold or when there is no previous event (LAG returns NULL for a user's first event), then cumulatively sum the boundary flags to assign a session number.
2. Write the SQL to produce a session_id per event (user_id plus session sequence number, or a hashed or UUID surrogate).
3. Roll up to one row per session with: start time, end time, duration, event count, distinct pages/screens, and entry and exit events.
4. Handle edge cases: single-event sessions (duration 0), users with only one event, and the time zone of the timestamps (state which zone gaps are computed in).
5. Show how to compute average session duration and sessions-per-user from the output.

OUTPUT FORMAT: Algorithm explanation -> Event-level ```sql``` (session assignment) -> Session-level rollup ```sql``` -> Edge-case handling -> Example aggregates.

CONSTRAINTS: Partition by user and order by timestamp before LAG/SUM. Make [GAP] a parameter. Define duration boundaries clearly. Do not let events from different users share a session.
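
As a rough illustration of the kind of answer this prompt is after, the sketch below runs the gap-based approach end to end on SQLite (3.25 or later, for window functions) through Python's sqlite3 module. Everything specific in it is an assumption made for the example: the `events` table, its `event_ts` column (taken to be UTC epoch seconds) and `page` column, the 30-minute gap, and the sample rows. Entry and exit events are left out to keep it short, and the SQL would likely need adapting to the syntax of [DATABASE_ENGINE].

```python
import sqlite3

GAP_MINUTES = 30  # assumed value for [GAP]

# Hypothetical sample events: (user_id, event_ts in UTC epoch seconds, page).
EVENTS = [
    ("u1", 0, "/home"),
    ("u1", 600, "/search"),   # 10 minutes after the previous event: same session
    ("u1", 6000, "/home"),    # 90 minutes after the previous event: new session
    ("u2", 100, "/home"),     # user with a single event
]

SESSION_SQL = """
WITH with_prev AS (
    -- Previous event time for the same user (NULL for the user's first event).
    SELECT user_id, event_ts, page,
           LAG(event_ts) OVER (PARTITION BY user_id ORDER BY event_ts) AS prev_ts
    FROM events
),
flagged AS (
    -- A session starts at a user's first event or after a gap above the threshold.
    SELECT user_id, event_ts, page,
           CASE
               WHEN prev_ts IS NULL THEN 1
               WHEN event_ts - prev_ts > :gap_seconds THEN 1
               ELSE 0
           END AS is_new_session
    FROM with_prev
),
numbered AS (
    -- Running sum of the boundary flags gives a per-user session sequence.
    SELECT user_id, event_ts, page,
           SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_ts) AS session_seq
    FROM flagged
)
SELECT user_id,
       user_id || '-' || session_seq AS session_id,
       MIN(event_ts)                 AS session_start,
       MAX(event_ts)                 AS session_end,
       MAX(event_ts) - MIN(event_ts) AS duration_seconds,  -- 0 for single-event sessions
       COUNT(*)                      AS event_count,
       COUNT(DISTINCT page)          AS distinct_pages
FROM numbered
GROUP BY user_id, session_seq
ORDER BY user_id, session_seq
"""


def sessionize(events, gap_minutes=GAP_MINUTES):
    """Load events into an in-memory SQLite table and return one row per session."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event_ts INTEGER, page TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    rows = conn.execute(SESSION_SQL, {"gap_seconds": gap_minutes * 60}).fetchall()
    conn.close()
    return rows


sessions = sessionize(EVENTS)

# Example aggregates computed from the session-level output.
avg_duration = sum(row[4] for row in sessions) / len(sessions)
sessions_per_user = len(sessions) / len({row[0] for row in sessions})

for row in sessions:
    print(row)
print("average duration (s):", avg_duration)
print("sessions per user:", sessions_per_user)
```

Here duration is defined as last event time minus first event time, so a single-event session has duration 0. Partitioning both window functions by `user_id` is what keeps events from different users out of the same session.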

Recommended models

claude, gpt-4o, gemini
