Guaranteed Groups¶
One of ASQL's most powerful features is guaranteed groups: when you group data, ASQL ensures all expected dimension values appear in results—even if they have no data.
This feature embodies the Completeness Over Fast Queries design value.
The Problem¶
SQL doesn't guarantee your grouped results are complete. If a dimension value has no data, it simply won't appear:
SELECT month, SUM(amount) as revenue
FROM orders
GROUP BY month;
| month | revenue |
|---|---|
| Jan | 1000 |
| Feb | 1500 |
| Mar | 800 |
| Jun | 1200 |
April and May are missing. This isn't a bug—SQL was designed for transactional systems where you're asking "what happened?" and showing non-existent data would be wrong.
But analytics is different. When you build a time-series chart, missing data points cause real problems: - The line jumps unexpectedly - Month-over-month calculations use the wrong prior month - The dashboard looks broken
The Solution¶
ASQL automatically fills gaps:
from orders
group by month(order_date) (
sum(amount) ?? 0 as revenue
)
| month | revenue |
|---|---|
| Jan | 1000 |
| Feb | 1500 |
| Mar | 800 |
| Apr | 0 |
| May | 0 |
| Jun | 1200 |
No dimension tables. No extra CTEs. No post-processing.
How It Works¶
Date Truncations¶
When you group by a date truncation function (month(), year(), week(), etc.), ASQL:
- Infers the date range from your WHERE clause
- Generates all values in that range
- LEFT JOINs your data to the complete range
- Fills missing values with NULL (use
??for defaults)
from orders
where order_date >= @2024-01-01 and order_date < @2024-07-01
group by month(order_date) (
sum(amount) ?? 0 as revenue
)
All months from January to June will appear.
Non-Date Columns¶
For non-date columns, ASQL uses DISTINCT values from the filtered source data:
from orders
group by status (
# ?? 0 as count
)
If your data has orders with status "pending", "shipped", and "delivered", all three will appear even if one has zero orders in the filtered period.
How Filters Affect Categorical Spines¶
Important: WHERE filters are applied when generating the spine for categorical columns. The spine is built from:
SELECT DISTINCT column FROM source_table WHERE <your filters>
This means if you filter your data, only values matching that filter will appear in the spine:
from orders
where region = 'North America'
group by region (
sum(amount) ?? 0 as revenue
)
This will only show North America—not Europe, Asia, or other regions—because the spine is generated from the filtered data.
This is usually what you want! When you filter to North America, you typically want results only for North America. The spine ensures you see all North American sub-categories (if grouping by something like state), but it respects your top-level filter.
Want all regions to appear (even with filtered data)? Use guarantee():
from orders
where region = 'North America'
group by guarantee(region, ['North America', 'Europe', 'Asia', 'South America']) (
sum(amount) ?? 0 as revenue
)
Now all four regions appear—North America with actual data, the others with zero revenue.
| Scenario | Result |
|---|---|
group by region with where region = 'NA' |
Only NA appears |
group by region (no filter) |
All regions in data appear |
group by guarantee(region, [...]) |
All listed values appear |
Multiple GROUP BY Columns¶
When grouping by multiple columns, ASQL creates all combinations:
from orders
group by region, month(order_date) (
sum(amount) ?? 0 as revenue
)
Every region × month combination will appear.
Explicit Values with guarantee()¶
Specify exactly which values should appear:
from orders
group by guarantee(status, ['pending', 'shipped', 'delivered', 'cancelled']) (
# ?? 0 as order_count
)
This ensures all four statuses appear, even if some have zero orders.
Use Cases for guarantee()¶
- Fixed categories that should always appear
- Enum-like values from your data model
- Dashboard filters with known options
Default Values with ??¶
Use the nullish coalescing operator to provide defaults for missing values:
from orders
group by month(order_date) (
sum(amount) ?? 0 as revenue, -- Default to 0
# ?? 0 as orders, -- Default to 0
avg(amount) as avg_order -- Leave as NULL
)
Different columns can have different default behaviors.
Disabling Guaranteed Groups¶
Filter the Results¶
The most common approach—just filter out zeros:
from orders
group by month(order_date) (
sum(amount) as revenue
)
where revenue > 0
Disable for a Query¶
Use a SET statement:
SET auto_spine = false;
from orders
group by month(order_date) (
sum(amount) as revenue
)
Disable Globally¶
Configure in your ASQL settings or via API.
Understanding the Generated SQL¶
Transpilation Comments¶
To understand the generated SQL, enable transpilation comments:
SET include_transpilation_comments = true;
from orders
where order_date >= @2024-01-01 and order_date < @2024-07-01
group by month(order_date) (
sum(amount) ?? 0 as revenue
)
The output will include a helpful explanation:
/* ASQL auto-spine: Gap-filling CTEs were generated to ensure all
expected GROUP BY values appear (even with zero/null aggregates).
Disable with: SET auto_spine = false; */
WITH month_order_date_spine AS (...)
...
This is especially useful when first learning how auto-spine works or when debugging complex queries.
Technical Details¶
Generated SQL¶
ASQL generates a "spine" CTE with all expected values, then LEFT JOINs your data:
WITH date_spine AS (
SELECT DATE_TRUNC('month', d) AS month
FROM generate_series('2024-01-01'::date, '2024-06-01'::date, INTERVAL '1 month') AS d
),
aggregated AS (
SELECT DATE_TRUNC('month', order_date) AS month, SUM(amount) AS revenue
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2024-07-01'
GROUP BY 1
)
SELECT
date_spine.month,
COALESCE(aggregated.revenue, 0) AS revenue
FROM date_spine
LEFT JOIN aggregated ON date_spine.month = aggregated.month
Dialect Support¶
ASQL generates dialect-appropriate date generation:
| Dialect | Method |
|---|---|
| PostgreSQL | generate_series() |
| BigQuery | GENERATE_DATE_ARRAY() |
| Snowflake | GENERATOR() with DATEADD |
| DuckDB | generate_series() |
| Others | Recursive CTE or numbers table |
Spines and Performance¶
Auto-spine adds minimal overhead (< 5% query time) in most cases. Here's what to expect:
Date Spines: Negligible Cost¶
Date spines use in-memory generation (generate_series, GENERATE_DATE_ARRAY, etc.) which is extremely fast:
| Date Range | Spine Rows | Typical Cost |
|---|---|---|
| 1 year of months | 12 rows | < 1ms |
| 5 years of days | ~1,825 rows | < 5ms |
| 10 years of days | ~3,650 rows | < 10ms |
The spine generation itself is trivial. The only real cost is the LEFT JOIN with your aggregated data—but since your aggregated data is already grouped (small result set), this join is fast.
Categorical Spines: One Extra Query¶
For non-date columns, the spine requires:
SELECT DISTINCT column FROM source_table WHERE <your filters>
This hits your source table, so performance depends on:
| Factor | Impact |
|---|---|
| Column is indexed | Very fast (index-only scan) |
| Column has low cardinality | Fast (few distinct values) |
| Large table, no index | Can be slow |
| Already filtered by WHERE | Usually fast (smaller scan) |
Tip: If you're grouping by a categorical column frequently, ensure it's indexed. Most dimension columns (status, region, category) naturally have low cardinality and are fast regardless.
Multiple GROUP BY Columns: Cross-Join¶
When grouping by multiple columns, spines are cross-joined:
12 months × 5 regions = 60 spine rows
12 months × 50 states = 600 spine rows
365 days × 100 products = 36,500 spine rows
For typical analytics (monthly/quarterly × reasonable dimensions), this is fine. For high-cardinality combinations, consider whether you really need gap-filling.
When to Disable Auto-Spine¶
Consider SET auto_spine = false when:
- You're grouping by a high-cardinality column (user_id, order_id) where gap-filling doesn't make sense—you don't want a row for every user with zero orders. Disabling is equivalent to filtering
where count != 0afterwards, but saves 10-30% query overhead. - You specifically don't want gap-filling behavior
Multiple GROUP BY Columns¶
When grouping by many columns, the spines are cross-joined:
12 months × 5 regions = 60 rows
12 months × 50 states × 20 categories = 12,000 rows
This is still efficient—the cross-join happens on small dimension sets, and the LEFT JOIN uses hash joins. Even for large combinations (millions of rows), expect 20-40% overhead compared to the base query. For materialized views or batch jobs where you genuinely want complete dimension coverage, this is reasonable.
One benefit of auto-spine: you don't need to pre-materialize every dimension combination "just in case." Since downstream queries also get gap-filling automatically, you can let each query fill the gaps it needs rather than over-engineering your data models for completeness upfront.
The Bottom Line¶
For typical analytics queries (time-series by month/quarter, categorical breakdowns by region/status/category), auto-spine overhead is < 5% of query time and usually unnoticeable. The convenience and correctness benefits far outweigh the cost.
Column-to-Column Comparisons in WHERE¶
When your WHERE clause compares the grouped date column to another column, ASQL bounds the spine by your actual data:
from orders
where created_at > updated_at -- Data-dependent bound
group by month(created_at) (sum(amount) ?? 0 as revenue)
This is the correct behavior. When you write created_at > updated_at, you're saying "my bounds should be defined by this data relationship." The spine correctly represents all periods where your predicate could be satisfied—there's no "missing" data before the MIN because by definition, no data could exist there that satisfies your condition.
Want explicit bounds instead? Add them:
from orders
where created_at > updated_at
and created_at >= @2024-01-01 -- Explicit bound added
group by month(created_at) (sum(amount) ?? 0 as revenue)
Or filter the grouped results:
from orders
where created_at > updated_at
group by month(created_at) (sum(amount) ?? 0 as revenue)
where month_created_at >= @2024-01-01 -- Filter after grouping
Best Practices¶
- Always use
??for aggregates — Decide what missing means (0? NULL? N/A?) - Specify date bounds in WHERE — Helps ASQL generate the right spine
- Use
guarantee()for fixed categories — Don't rely on data having all values - Filter zeros when needed —
where revenue > 0after grouping
Comparison with dbt¶
dbt's date_spine macro and Kimball-style dimension tables solve the same problem, but require:
- Explicit dimension table creation
- Manual maintenance
- Extra modeling work
ASQL provides this automatically for common patterns.
Real-World Examples¶
Monthly Revenue with All Months¶
from orders
where order_date >= @2024-01-01 and order_date < @2025-01-01
group by month(order_date) (
sum(amount) ?? 0 as revenue,
# ?? 0 as orders
)
order by month
Status Dashboard¶
from tickets
group by guarantee(status, ['open', 'in_progress', 'resolved', 'closed']) (
# ?? 0 as ticket_count
)
Sales by Region and Quarter¶
from sales
where year(sale_date) = 2024
group by region, quarter(sale_date) (
sum(amount) ?? 0 as revenue
)
order by region, quarter
Next Steps¶
- Dates & Time — Date truncation and arithmetic
- Aggregations — GROUP BY syntax
- Convention Over Configuration — How ASQL infers defaults