Cohort Analysis¶
Cohort analysis groups users by a shared characteristic (usually when they "started") and tracks their behavior over time. ASQL's features make cohort queries simple and readable.
What is Cohort Analysis?¶
Cohort analysis tracks the same entities over time periods relative to when they started. Key components:
- Cohort: When each entity "started" (signup month, first purchase month, etc.)
- Period: Time elapsed since their cohort start (0 = start, 1 = first period after, etc.)
- Cohort size: How many entities in each cohort (for retention %)
The Simple Pattern¶
In ASQL, cohort analysis is just computed columns + GROUP BY:
from orders
join users
group by
month(users.signup_date) as cohort,
months_between(orders.created_at, users.signup_date) as period
(
count(distinct customer_id) as active
)
That's it. ASQL's alias reuse lets you reference cohort and period immediately after defining them.
Why This Works¶
Traditional SQL requires CTEs because you can't reference a computed column in the same SELECT:
-- This DOESN'T work in SQL:
SELECT
DATE_TRUNC('month', signup_date) AS cohort,
DATEDIFF(month, cohort, order_date) AS period -- ERROR!
FROM ...
ASQL's alias reuse handles this automatically - each alias becomes available to subsequent expressions.
Examples¶
User Retention by Signup Cohort¶
Track monthly active users by their signup month:
from events
join users
group by
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
order by cohort, period
Output columns:
- cohort: Month users signed up
- period: Months since signup (0 = signup month, 1 = first month after, etc.)
- active: Active users in that period
Revenue Cohort Analysis¶
Track revenue by first purchase cohort:
from orders
join customers
group by
month(customers.first_order_date) as cohort,
months_between(orders.order_date, customers.first_order_date) as period
(
sum(total) as revenue,
count(distinct customer_id) as customers
)
order by cohort, period
Weekly Cohorts¶
Track weekly active users with weekly periods:
from events
join users
group by
week(users.signup_date) as cohort,
weeks_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
order by cohort, period
Retention with Cohort Size¶
To calculate retention percentage, you need cohort size. Use a window function:
from events
join users
group by
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
select *,
first(active) over cohort as cohort_size,
round(active::numeric / cohort_size * 100, 1) as retention_pct
order by cohort, period
The first(active) over cohort gets the period 0 value (all users in cohort), which is the cohort size.
Cumulative Metrics (LTV)¶
Use running_sum() for lifetime value:
from orders
join customers
group by
month(customers.first_order_date) as cohort,
months_between(orders.order_date, customers.first_order_date) as period
(
sum(total) as revenue,
count(distinct customer_id) as customers
)
select *,
running_sum(revenue) over cohort as cumulative_revenue,
first(customers) over cohort as cohort_size,
running_sum(revenue) over cohort / cohort_size as ltv
order by cohort, period
Self-Cohort (First Activity as Cohort)¶
When the cohort date isn't in a separate table, use a subquery or window function to find each entity's first activity:
from (
from orders
select *,
month(first(created_at) over customer_id) as cohort
)
group by
cohort,
months_between(created_at, cohort) as period
(
count(distinct customer_id) as active
)
order by cohort, period
Segmented Cohorts¶
Add segment dimensions to compare cohorts across segments:
from events
join users
group by
users.channel,
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
order by channel, cohort, period
Period Calculation Functions¶
ASQL provides date difference functions for period calculation:
| Function | Returns |
|---|---|
months_between(date1, date2) |
Months between dates |
weeks_between(date1, date2) |
Weeks between dates |
days_between(date1, date2) |
Days between dates |
Or use explicit arithmetic:
datediff('month', signup_date, event_date) as period
Gap Filling with Spine¶
For complete cohort matrices (showing 0 for periods with no activity), use spine by:
from events
join users
spine by
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
order by cohort, period
The spine by ensures you get a row for every (cohort, period) combination.
Common Patterns¶
Retention Table¶
from events
join users
group by
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
select cohort, period, active,
first(active) over cohort as cohort_size,
round(active::numeric / cohort_size * 100, 1) as retention_pct
where period between 0 and 12
order by cohort, period
Churn Analysis¶
from events
join users
group by
month(users.signup_date) as cohort,
months_between(events.event_date, users.signup_date) as period
(
count(distinct user_id) as active
)
select cohort, period, active,
first(active) over cohort as cohort_size,
cohort_size - active as churned,
round((cohort_size - active)::numeric / cohort_size * 100, 1) as churn_pct
order by cohort, period
See Also¶
- Window Functions -
first(),running_sum(), and other functions useful for cohorts - Date Functions -
month(),week(), date arithmetic - Spine (Gap Filling) - Ensuring complete time series