Cohort Analysis¶
Cohort analysis groups users by a shared characteristic (usually when they "started") and tracks their behavior over time. ASQL's cohort by operator dramatically simplifies this complex pattern.
The Problem: Cohort Queries Are Complex¶
Traditional SQL requires 3-5 CTEs and 50+ lines for even basic cohort queries. A simple retention cohort requires:
- Finding each user's cohort (when they started)
- Finding each user's activity by period
- Calculating periods since cohort start
- Getting cohort sizes
- Joining everything together
- Calculating retention rates
SQL Example (50+ lines):
WITH user_cohorts AS (
SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month
FROM users
),
activity_months AS (
SELECT user_id, DATE_TRUNC('month', event_date) AS activity_month
FROM events
GROUP BY 1, 2
),
cohort_sizes AS (
SELECT cohort_month, COUNT(*) AS size
FROM user_cohorts GROUP BY 1
),
cohort_activity AS (
SELECT
uc.cohort_month,
EXTRACT(YEAR FROM AGE(am.activity_month, uc.cohort_month)) * 12 +
EXTRACT(MONTH FROM AGE(am.activity_month, uc.cohort_month)) AS period,
COUNT(DISTINCT uc.user_id) AS active
FROM user_cohorts uc
JOIN activity_months am ON uc.user_id = am.user_id
WHERE am.activity_month >= uc.cohort_month
GROUP BY 1, 2
)
SELECT
ca.cohort_month, cs.size, ca.period,
ca.active, ROUND(ca.active::numeric / cs.size * 100, 1) AS retention
FROM cohort_activity ca
JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.period;
ASQL (3 lines):
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
Reduction: 94% 🎉
Basic Syntax¶
The cohort by clause transforms any aggregation query into a cohort analysis:
from <activity_table>
group by <granularity>(<activity_date>) (<aggregations>)
cohort by <granularity>(<cohort_table>.<cohort_date>)
Components¶
- Activity table: The table containing events/activities to analyze (e.g.,
events,orders) - Activity date: The date column in the activity table (e.g.,
event_date,order_date) - Granularity: Time bucket function -
month(),week(), orday() - Cohort table: The table containing cohort assignment (e.g.,
users,customers) - Cohort date: The date that defines when users joined the cohort (e.g.,
signup_date,first_order_date)
Examples¶
User Retention by Signup Cohort¶
Track monthly active users by their signup month:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
Output columns:
- cohort_month: Month users signed up
- period: Months since signup (0 = signup month, 1 = first month after, etc.)
- active: Active users in that period
- cohort_size: Total users in the cohort
Revenue Cohort Analysis¶
Track revenue by first purchase cohort:
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)
Weekly Activity Cohorts¶
Track weekly active users:
from events
group by week(event_date) (count(distinct user_id) as active)
cohort by week(users.signup_date)
Multiple Metrics¶
Track multiple metrics simultaneously:
from events
group by month(event_date) (
count(distinct user_id) as active,
# as events,
sum(revenue) as revenue
)
cohort by month(users.signup_date)
Period Calculation¶
Period is automatically calculated based on the granularity:
cohort by month(...)→ period in months since cohort startcohort by week(...)→ period in weeks since cohort startcohort by day(...)→ period in days since cohort start
The period calculation uses:
- The activity date column from your group by clause
- The cohort date from the cohort by clause
- Computes the difference using appropriate date arithmetic
Period values:
- 0: The cohort start period (signup month/week/day)
- 1: First period after cohort start
- 2: Second period after cohort start
- And so on...
Retention Calculations¶
With cohort by, retention calculations become straightforward. The cohort_size column is automatically available:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
You can calculate retention rates in your BI tool or with window functions:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
select
cohort_month,
period,
active,
cohort_size,
round(active::numeric / cohort_size * 100, 1) as retention_pct
Period-over-Period Comparisons¶
Use window functions like prior() for period-over-period analysis:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
select
cohort_month,
period,
active,
prior(active) as prev_period_active,
active - prior(active) as change,
round((active - prior(active))::numeric / prior(active) * 100, 1) as pct_change
The cohort by clause automatically partitions window functions by cohort and orders by period, so prior() works correctly.
Cumulative Metrics (LTV)¶
Use running_sum() for cumulative metrics like Lifetime Value (LTV):
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)
select
cohort_month,
period,
revenue,
running_sum(revenue) as cumulative_revenue,
running_sum(revenue) / cohort_size as ltv
Segmented Cohorts¶
Add segment dimensions before the time function to create segmented cohorts:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by users.channel, month(users.signup_date)
This creates cohorts segmented by acquisition channel, allowing you to compare retention across different channels.
Output columns:
- channel: Acquisition channel
- cohort_month: Month users signed up
- period: Months since signup
- active: Active users
- cohort_size: Total users in that channel cohort
Join Key Inference¶
ASQL automatically infers join keys using foreign key naming conventions. For example:
events.user_id→ joins tousers.idorders.customer_id→ joins tocustomers.id
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)
If your schema doesn't follow these conventions, provide schema information or enable infer_join_keys mode for examples.
Common Use Cases¶
1. User Retention Analysis¶
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
2. Revenue Cohort with LTV¶
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)
select
cohort_month,
period,
revenue,
running_sum(revenue) as cumulative_revenue,
running_sum(revenue) / cohort_size as ltv
3. Feature Adoption Cohorts¶
from feature_events
where feature_name = "dashboard"
group by week(event_date) (count(distinct user_id) as feature_users)
cohort by week(users.first_dashboard_use_date)
4. Subscription Survival Analysis¶
from subscription_events
where status = "active"
group by month(event_date) (count(distinct subscription_id) as active_subs)
cohort by month(subscriptions.start_date)
5. Churn Analysis¶
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
select
cohort_month,
period,
active,
cohort_size,
cohort_size - active as churned,
round((cohort_size - active)::numeric / cohort_size * 100, 1) as churn_rate
How It Works¶
When you use cohort by, ASQL automatically:
- Creates cohort CTEs: Generates
cohort_base(cohort assignment) andcohort_sizes(cohort size calculation) - Infers joins: Automatically joins the activity table to the cohort table using foreign key conventions
- Calculates period: Computes periods since cohort start based on the granularity function
- Modifies GROUP BY: Adds
cohort_monthandperiodto the grouping - Adds columns: Includes
cohort_month,period, andcohort_sizein the SELECT - Orders results: Automatically orders by
cohort_month, period
Best Practices¶
- Match granularities: Use the same granularity in
group byandcohort by(e.g., bothmonth()) - Follow FK conventions: Use
{table}_idnaming (e.g.,user_id,customer_id) for automatic join inference - Filter activity: Apply filters before
cohort byto analyze specific event types - Combine with window functions: Use
prior(),running_sum(), etc. for advanced analysis
Limitations¶
- Join inference: Requires standard foreign key naming (e.g.,
user_id→users.id) - Granularity matching: Activity and cohort granularities should match or be compatible
- Single cohort dimension: Currently supports one cohort definition per query (segmented cohorts are supported)
See Also¶
- Window Functions -
prior(),running_sum(), and other functions useful for cohorts - Date Functions -
month(),week(),day()functions - Spec: Cohort Analysis - Full specification