Syntax Decisions¶
This document records the syntax choices made in ASQL, what alternatives were considered, and why we made the decisions we did. It's useful for understanding the language design philosophy and for discussions about potential changes.
Core Design Values¶
All syntax decisions are guided by ASQL's design values:
- Pipeline Order Over Projection-First — Written order = execution order
- Convention Over Configuration — Infer from naming patterns
- Familiarity Over Novelty — Keep SQL vocabulary
- Portable Over Proprietary — Transpile to any dialect
- Completeness Over Fast Queries — Guaranteed groups, gap-filling
Filtering: where¶
Decision: Use where for filtering rows.
| Alternative | Source | Why Not |
|---|---|---|
filter |
PRQL | Ambiguous: "filter out" vs "filter in"? where clearly means "keep rows where this is true" |
keep |
pandas | Less familiar to SQL users |
Rationale: where is universally understood by SQL users. While PRQL argues filter is cleaner, we found where is actually clearer — "where status = 'active'" unambiguously means "keep rows where this condition is true."
Bonus: In ASQL, where works anywhere in the pipeline. Use it before group by (SQL's WHERE) or after (SQL's HAVING). No need for a separate HAVING keyword.
from orders
where status = "active" -- pre-aggregation (WHERE)
group by region (sum(amount) as revenue)
where revenue > 1000 -- post-aggregation (HAVING)
Limiting: limit¶
Decision: Use limit to restrict row count.
| Alternative | Source | Why Not |
|---|---|---|
take |
PRQL | Marginally more "natural English" but not enough to justify the learning cost |
top |
SQL Server | Non-standard SQL |
first |
— | Conflicts with window function first() |
Rationale: LIMIT is the standard SQL keyword across most dialects. While take 10 might read slightly more naturally than limit 10, the difference is minimal. Familiarity wins.
Sorting: order by¶
Decision: Use order by for sorting.
| Alternative | Source | Why Not |
|---|---|---|
sort |
PRQL | One word is nicer, but familiarity matters more |
sort by |
— | Considered as an alias; may be added |
Rationale: ORDER BY is universal SQL. The two-word syntax is slightly more verbose, but order by -revenue reads naturally.
Descending sort: Use - prefix for descending order. This is concise and intuitive:
order by -revenue -- descending
order by created_at -- ascending (default)
order by -year, name -- year descending, then name ascending
Future consideration: We may add sort by as an alias since ASQL already supports filler words. Low-cost addition for users coming from other languages.
Adding Columns: extend¶
Decision: Use extend for adding computed columns (when merged).
| Alternative | Source | Why Not |
|---|---|---|
derive |
PRQL | Less SQL-friendly vocabulary |
select *, ... |
SQL | Verbose; requires listing * |
add |
— | Too generic |
with |
— | Conflicts with CTE syntax |
Rationale: Adding a column shouldn't require select *, expr as col. A dedicated verb is cleaner:
-- With extend (cleaner)
from users
extend age > 18 as is_adult
-- Without (verbose)
from users
select *, age > 18 as is_adult
Status: extend is implemented but in an unmerged branch. We chose extend over PRQL's derive to maintain a more SQL-familiar vocabulary.
Aggregation: group by ... ()¶
Decision: Combine grouping and aggregation in one clause with parentheses.
| Alternative | Source | Why Not |
|---|---|---|
Separate group and aggregate |
PRQL | More orthogonal but more verbose for the 95% case |
summarize |
Kusto | Less familiar than GROUP BY |
rollup by |
— | Considered as alias; GROUP BY is more familiar |
Rationale: 95% of the time you group AND aggregate together. Combining them is more concise:
from orders
group by region (
sum(amount) as revenue,
# as order_count
)
PRQL separates group and aggregate for orthogonality, enabling "group without aggregate" (partition operations). ASQL handles that use case with the per command instead.
Note: The parentheses syntax is inspired by PRQL's block structure, which we found readable and elegant.
Counting: #¶
Decision: Use # as shorthand for COUNT.
| Alternative | Source | Why Not |
|---|---|---|
count(*) only |
SQL | More verbose |
n or num |
— | Less intuitive |
Rationale: # universally means "number of." It's extremely concise while remaining readable:
from users
group by country (
# as total, -- COUNT(*)
#(email) as with_email, -- COUNT(email)
# users as unique_users -- COUNT(DISTINCT user_id)
)
The # table_name syntax (e.g., # users) automatically infers the primary key and performs a distinct count. This uses ASQL's convention-based approach.
Joins: &, &?, ?&, *¶
Decision: Use symbolic operators for join types.
| Operator | Join Type | Mnemonic |
|---|---|---|
& |
INNER JOIN | Both sides required |
&? |
LEFT JOIN | Right side optional (nullable) |
?& |
RIGHT JOIN | Left side optional (nullable) |
?&? |
FULL OUTER | Both sides optional |
* |
CROSS JOIN | Cartesian product |
| Alternative | Source | Why Not |
|---|---|---|
| SQL keywords | SQL | More verbose; LEFT JOIN is 9 characters vs 2 |
join ... (==col) |
PRQL | Less visually distinct |
Rationale: Joins are common in analytics. Compact symbols save space while being visually distinctive. The mnemonic "? marks the nullable side" makes them learnable quickly.
from orders
&? customers on orders.customer_id = customers.id
-- LEFT JOIN: orders always present, customers may be NULL
Tradeoff: Less familiar to newcomers. But learnable in minutes, and the visual distinction helps when scanning code.
Null Handling: ??¶
Decision: Use ?? for COALESCE (nullish coalescing).
| Alternative | Source | Why Not |
|---|---|---|
coalesce() only |
SQL | More verbose |
\|\| |
Some languages | \|\| is string concatenation in SQL |
or |
— | Confusing with logical OR |
Rationale: ?? is the nullish coalescing operator in JavaScript, TypeScript, C#, Swift, and PHP. It's concise and chains naturally:
revenue ?? 0 -- default to 0
name ?? nickname ?? "Unknown" -- multiple fallbacks
Both ?? and coalesce() work in ASQL.
Date Literals: @¶
Decision: Use @ prefix for date literals.
| Alternative | Source | Why Not |
|---|---|---|
DATE '2024-01-01' |
SQL | Verbose |
| String casting | — | Ambiguous; requires explicit cast |
Rationale: @2024-01-01 is much shorter than DATE '2024-01-01' and immediately distinguishes dates from strings:
where created_at >= @2024-01-01
where created_at between @2024-01-01 and @2024-12-31
Date Arithmetic: + 7 days¶
Decision: Use natural language date arithmetic.
| Alternative | Source | Why Not |
|---|---|---|
date_add(col, 7, 'day') |
SQL | Verbose and varies by dialect |
col + INTERVAL '7 days' |
PostgreSQL | Dialect-specific |
Rationale: Date arithmetic is common in analytics. Natural language syntax is readable and dialect-portable:
created_at + 7 days
created_at - 1 month
due_date + 2 weeks
ASQL handles the dialect translation automatically.
Relative Dates: ago and from now¶
Decision: Support N unit ago and N unit from now.
Rationale: "Last 30 days" is the most common filter in analytics. Make it readable:
where created_at >= 30 days ago
where created_at >= 1 month ago
where due_date <= 7 days from now
String Matching: contains, starts with, ends with¶
Decision: Use natural language string matching operators.
| Alternative | Source | Why Not |
|---|---|---|
LIKE only |
SQL | Requires remembering %pattern% syntax |
~ regex |
— | Most SQL dialects don't support regex well |
Rationale: String matching is common, and LIKE syntax is error-prone:
-- ASQL (clear)
where email contains "@gmail.com"
where name starts with "John"
where filename ends with ".pdf"
-- SQL LIKE (cryptic)
WHERE email LIKE '%@gmail.com%'
WHERE name LIKE 'John%'
WHERE filename LIKE '%.pdf'
Case-insensitive variants: icontains, istarts with, iends with.
Window Operations: per¶
Decision: Use per command for partition-based operations.
| Alternative | Source | Why Not |
|---|---|---|
| Window function syntax only | SQL | Verbose for common patterns |
partition by ... () block |
PRQL-like | More complex syntax |
deduplicate by |
dbt-style | Only covers one use case |
Rationale: Deduplication and row numbering are extremely common patterns. per makes them concise:
-- Keep most recent order per customer (deduplication)
from orders
per customer_id first by -order_date
-- Add row numbers per customer
from orders
per customer_id number by -order_date
-- Rank employees by salary within department
from employees
per department rank by -salary
The syntax reads naturally: "per customer_id, first by order_date descending."
Cohort Analysis: cohort by¶
Decision: Add first-class cohort analysis support.
| Alternative | Source | Why Not |
|---|---|---|
| Manual CTEs | SQL | 50+ lines of boilerplate |
| No special syntax | — | Cohort analysis is too common in analytics to ignore |
Rationale: Cohort retention analysis is one of the most common analytics patterns. It typically requires 3-5 CTEs and 50+ lines of SQL. ASQL reduces this to 3 lines:
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
This automatically: - Creates cohort assignment CTEs - Joins activity to cohort data - Calculates period (months since signup) - Adds cohort_size for retention calculations
Consideration: This is a domain-specific feature that not all users need. However, for analytics use cases, it's a massive productivity win. The feature is well-documented and optional.
Guaranteed Groups: Automatic Gap-Filling¶
Decision: GROUP BY automatically fills gaps with NULLs.
| Alternative | Source | Why Not |
|---|---|---|
| SQL behavior (no filling) | SQL | Causes dashboard bugs, incorrect calculations |
Require explicit guarantee() |
— | Too much boilerplate for the common case |
Rationale: Missing data in grouped results is a constant source of analytics bugs. When you group by month, you expect all months to appear:
from orders
where order_date >= @2024-01-01 and order_date < @2024-07-01
group by month(order_date) (
sum(amount) ?? 0 as revenue
)
All 6 months appear, even if some have no orders.
Consideration: This changes semantics from standard SQL. Users coming from SQL might be surprised. We mitigate this with:
1. Clear documentation
2. SET auto_spine = false to disable
3. Optional guarantee() for explicit control
Tradeoff: Small performance overhead (<5%) for the automatic spine generation.
CTEs: stash as¶
Decision: Use stash as for inline CTEs.
| Alternative | Source | Why Not |
|---|---|---|
WITH ... AS at top |
SQL | CTE defined far from usage |
let name = (...) |
PRQL | Different syntax; variable-like |
set name = (...) |
— | Reserved for compiler settings |
Rationale: CTEs should be defined close to where they're used:
from users
where is_active
stash as active_users
group by country (# as total)
order by -total
The name appears right before you reference it, making the flow clear.
No having Keyword¶
Decision: Eliminate HAVING; use where everywhere.
| Alternative | Source | Why Not |
|---|---|---|
Keep having |
SQL | Adds complexity; WHERE/HAVING distinction is confusing |
Rationale: Pipeline semantics eliminate the need for HAVING. The position of where in the pipeline determines when filtering occurs:
from orders
where status = "active" -- before aggregation (SQL's WHERE)
group by region (sum(amount) as revenue)
where revenue > 1000 -- after aggregation (SQL's HAVING)
One keyword, clear semantics.
Scalar Variables: Not Implemented¶
Decision: No let or scalar variable syntax (yet).
| Alternative | Source | Status |
|---|---|---|
let threshold = 1000 |
PRQL | Considered for future |
define threshold = 1000 |
— | Considered for future |
Current workaround: Use a single-row CTE:
from (select 1000 as threshold) stash as config
from orders
* config
where amount > config.threshold
Consideration: Scalar variables are useful for thresholds, date ranges, and configuration. However, they add language complexity. ASQL's philosophy is "convention over configuration," not "abstraction." We may add this feature if demand is high, but it's not a priority.
User-Defined Functions: Not Implemented¶
Decision: No func definitions (like PRQL).
Rationale: ASQL emphasizes convention-based convenience over abstraction. User-defined functions add significant language complexity. We let PRQL own this space.
Workaround: Use stash as for reusable CTEs, or define SQL functions in your database.
Case-Insensitive Identifiers¶
Decision: ASQL is case-safe.
Rationale: Database conventions (snake_case) conflict with frontend conventions (camelCase). ASQL eliminates this friction:
from Users -- matches 'users' table
select firstName, createdAt -- matches 'first_name', 'created_at'
This reduces errors when moving between different naming conventions.
Auto-Aliasing¶
Decision: Automatically generate column names for expressions without aliases.
Rationale: SQL produces unusable default names like SUM(amount) or f0_. ASQL generates meaningful names:
from orders
group by region (
sum(amount), -- → column: sum_amount
avg(price), -- → column: avg_price
count(*) -- → column: num
)
order by -sum_amount -- reference the auto-alias!
Pattern: {function}_{column} (configurable).
Summary Table¶
| Feature | ASQL Choice | Main Alternative | Winner Criteria |
|---|---|---|---|
| Filtering | where |
filter |
Clarity |
| Limiting | limit |
take |
Familiarity |
| Sorting | order by |
sort |
Familiarity |
| Adding columns | extend |
derive |
SQL-friendly |
| Aggregation | group by () |
group + aggregate |
Conciseness |
| Counting | # |
count(*) |
Brevity |
| Joins | &, &? |
SQL keywords | Compactness |
| Nulls | ?? |
coalesce() |
Brevity |
| Dates | @2024-01-01 |
DATE '...' |
Brevity |
| Date math | + 7 days |
date_add() |
Readability |
| Relative dates | ago, from now |
Manual | Readability |
| String matching | contains |
LIKE |
Readability |
| Window ops | per |
Window functions | Conciseness |
| Cohorts | cohort by |
Manual CTEs | Productivity |
| Gap-filling | Automatic | Manual spines | Correctness |
| CTEs | stash as |
WITH ... AS |
Proximity |
| Post-aggregation filter | where |
having |
Simplicity |
Open Questions¶
These are syntax decisions still under consideration:
-
sort byalias: Should we addsort byas an alias fororder by? Probably yes. -
Scalar variables: Should we add
let threshold = 1000? Under consideration. -
sortwithoutby: Should baresort -colwork? Probably, for consistency with ASQL's filler-word approach.
Changelog¶
| Date | Decision | Notes |
|---|---|---|
| 2024 | Core syntax established | where, limit, order by, group by () |
| 2024 | # count shorthand |
Novel but intuitive |
| 2024 | per for window operations |
Simplifies deduplication |
| 2024 | Guaranteed groups | Automatic gap-filling |
| 2024 | cohort by |
First-class cohort analysis |
| 2025-01 | extend implemented |
In branch, pending merge |
| 2026-01 | Documentation | This page created |