
ASQL: Analytic SQL — Language Specification

Version: 0.3
Status: Draft
Last Updated: December 2025


⚠️ This language specification and implementation are a heavy work-in-progress (WIP). This project is in active development and contains many known issues, bugs, and incomplete features. Much of the specification is just brainstorming and prototyping. Do not rely on this for anything yet.

1. Overview

ASQL (Analytic SQL, pronounced "Ask-el") is a modern, pipeline-based query language designed specifically for analytical workloads. It transpiles cleanly to standard SQL (via SQLGlot) and aims to be more human-readable, less verbose, and better suited for analytics than traditional SQL.

Key Philosophy: ASQL should feel like asking a natural question, not writing cryptic code. It reads like natural language while maintaining the power and precision of SQL.

⚠️ Note: This specification is an extensive list of brainstormed improvements to SQL. Not all features should necessarily be implemented. It's meant to be open-minded and aggressive to start, with features refined based on real-world usage and feedback.

Design Philosophy

  • Natural language feel: Queries read like questions you'd ask a colleague
  • Convention over configuration: ASQL assumes good modeling standards and makes smart inferences based on conventions. If you follow standards (like dbt's naming conventions), ASQL works seamlessly. Everything is configurable, but we have strong opinions.
  • Pipeline semantics: Every query is a sequence of tabular transformations
  • Familiarity over novelty: Keeps SQL's nouns and functions, simplifies verbs and ordering
  • Standards-based inference: Smart behaviors are convention-driven, not magical. We assume standardized naming (e.g., created_at for timestamps), proper foreign keys, and good modeling practices.
  • Readable and composable: Short, expressive, indentation-based pipelines
  • Case-safe: Works seamlessly with camelCase, snake_case, PascalCase, and any naming convention
  • Portable: Transpiles to ANSI SQL or specific dialects via SQLGlot
  • Inspectable: Every ASQL query can show its generated SQL, plan, and metrics

2. Core Syntax

2.1 Basic Query Structure

Every ASQL query starts with a data source. Transformations can be chained using indentation (preferred) or optional pipeline operators (|):

Indentation-based (preferred, cleaner):

from users
  where status = "active"
  group by country ( # as total_users )
  order by -total_users

Pipeline operator (optional, explicit):

from users
| where status = "active"
| group by country ( # as total_users )
| order by -total_users

Both styles are equivalent. Choose based on preference or context.

Pros of indentation-based:

  • ✅ Cleaner, more natural reading flow
  • ✅ Less visual clutter
  • ✅ Feels more like natural language
  • ✅ Easier to write (no need to type |)

Pros of pipeline operator (|):

  • ✅ Makes data flow explicit and visible
  • ✅ Familiar to users of PRQL, PowerShell, Unix pipes
  • ✅ Easier to parse visually in complex queries
  • ✅ Can help with debugging (see exactly where each transformation happens)

Recommendation: Support both. Indentation-based is the default and recommended style, but the pipe operator is available for those who prefer explicit flow markers or are coming from other pipeline-based languages.

2.2 Entry Point: from

The from clause specifies the starting table:

from users
from sales
from opportunities

Smart inference: When referencing columns, the table can be inferred:

  • Users.name → explicit table reference
  • User.name → smart plural handling (inferred as Users.name)


3. Pipeline Operators

Operators are applied in logical order, chained by indentation or the pipe (|) symbol:

| Operator | Meaning | SQL Equivalent | Example |
| --- | --- | --- | --- |
| where | Filter rows | WHERE | where status = "active" |
| group by | Group and aggregate | GROUP BY | group by country ( # as total_users ) |
| &, &?, ?&, * | Join datasets | JOIN | & owners on owner_id = owners.id |
| select / project | Choose final columns | SELECT | select country, users, avg_age |
| order by | Sort rows | ORDER BY | order by -users (descending) |
| limit | Limit rows | LIMIT | limit 10 |
| stash as | Define reusable CTE | WITH ... AS | stash as active_users |

4. Expressions & Operators

4.1 Comparison Operators

  • = - equals (also accepts ==)
  • != - not equals
  • <, >, <=, >= - comparison
  • is, is not - null checks
  • in, not in - membership

Note: Both = and == work for equality. = is preferred as it's standard SQL, but == is accepted for those coming from programming languages.

4.2 Logical Operators

  • and, or, not - logical operations
  • && - alternative syntax for and

Note: Use the or keyword for logical OR (not ||). The ?? operator is used for COALESCE (see Section 4.6).

4.3 Arithmetic Operators

  • +, -, *, / - standard arithmetic
  • % - modulo

4.4 String & Date Literals

  • Strings: 'active' (single quotes are standard SQL and recommended)
  • Dates: @2025-01-10, @2025-11-10
  • Numbers: 42, 3.14

Note: Use single quotes for string literals. Double quotes ("column") are reserved for quoted identifiers (column/table names with spaces or special characters), following SQL standard conventions.

4.5 String Matching Operators

ASQL provides intuitive string matching operators that are more readable than SQL's LIKE syntax.

Case-Sensitive Operators

Contains (substring match):

from users where email contains "@gmail.com"
from users where name contains "John"

Starts with (prefix match):

from users where email starts with "admin"
from users where domain starts with "https://"

Ends with (suffix match):

from users where email ends with ".com"
from users where filename ends with ".pdf"

Case-Insensitive Operators

For case-insensitive matching, use the i prefix:

Case-insensitive contains:

from users where email icontains "gmail"
from users where name icontains "john"

Case-insensitive starts with:

from users where domain istarts with "https://"
from users where name istarts with "john"

Case-insensitive ends with:

from users where filename iends with ".pdf"
from users where email iends with ".com"

Pattern Matching

The matches operator supports SQL LIKE syntax with % and _ wildcards:

from users where email matches "%@gmail.com"
from users where phone matches "555-___-____"

Note: matches uses LIKE syntax (with % and _ wildcards), not regex. Regex support may be added in the future, but regex capabilities vary widely across SQL dialects.

SQL Equivalents

| ASQL | SQL Equivalent | Notes |
| --- | --- | --- |
| contains "pattern" | LIKE '%pattern%' | More intuitive, no wildcards needed |
| icontains "pattern" | ILIKE '%pattern%' (PostgreSQL) or LOWER(column) LIKE LOWER('%pattern%') | Case-insensitive |
| starts with "pattern" | LIKE 'pattern%' | Clearer intent than LIKE |
| istarts with "pattern" | ILIKE 'pattern%' or LOWER(column) LIKE LOWER('pattern%') | Case-insensitive |
| ends with "pattern" | LIKE '%pattern' | Clearer intent than LIKE |
| iends with "pattern" | ILIKE '%pattern' or LOWER(column) LIKE LOWER('%pattern') | Case-insensitive |
| matches "%pattern%" | LIKE '%pattern%' | LIKE syntax, not regex |
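These mappings can be sketched as a small translation helper. This is a hypothetical illustration in Python (not part of ASQL's implementation); note that a real transpiler must escape LIKE wildcards in the literal, and escape handling (the ESCAPE clause) varies by dialect:

```python
def like_pattern(op: str, literal: str) -> str:
    """Translate an ASQL string-matching operator into a SQL LIKE clause."""
    # Escape LIKE wildcards so the user's literal matches itself.
    escaped = (literal.replace("\\", "\\\\")
                      .replace("%", "\\%")
                      .replace("_", "\\_"))
    templates = {
        "contains":    f"LIKE '%{escaped}%'",
        "starts with": f"LIKE '{escaped}%'",
        "ends with":   f"LIKE '%{escaped}'",
        "matches":     f"LIKE '{literal}'",   # wildcards pass through untouched
    }
    return templates[op]

print(like_pattern("contains", "@gmail.com"))   # LIKE '%@gmail.com%'
print(like_pattern("starts with", "admin"))     # LIKE 'admin%'
```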

Usage Examples

With logical operators:

from users 
  where email contains "@gmail.com" and status = "active"
  where name starts with "John" or name starts with "Jane"

With function calls:

from users where upper(name) contains "JOHN"
from users where (email ?? "") contains "@"

With dotted column names:

from users where users.email contains "@gmail.com"

Design Decision: ASQL uses icontains / istarts with / iends with rather than a trailing contains ... ignore case modifier; the i prefix is more compact, explicit, and readable.

4.6 COALESCE Operator (??)

ASQL uses the ?? operator for COALESCE (nullish coalescing), providing a cleaner syntax than the function call.

Syntax:

-- Operator form (preferred)
column ?? default_value

-- Chained (multiple fallbacks)
column ?? fallback1 ?? fallback2 ?? "default"

-- Function form (also accepted)
coalesce(column, default_value)

Examples:

-- Handle NULL values
from users select name ?? "Unknown" as display_name

-- Multiple fallbacks
from products select price ?? sale_price ?? 0 as final_price

-- In WHERE clauses
from users where not (is_deleted ?? false)

-- With boolean logic
from orders where (status ?? "pending") = "completed"

Why ?? for COALESCE?

  • JavaScript uses ?? for nullish coalescing (not ||)
  • || in SQL is string concatenation in most dialects - avoids confusion
  • More concise than coalesce() function calls
  • Chains naturally: a ?? b ?? c reads as "a, or if null b, or if null c"

Precedence: The ?? operator has higher precedence than logical operators (and, or, not) but lower than comparison operators (=, !=, etc.). Use parentheses for clarity in complex expressions.
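The chaining semantics can be modeled directly. A minimal Python sketch of the runtime behavior, using None to stand in for SQL NULL:

```python
def coalesce(*values):
    """Return the first non-null value, mirroring a ?? b ?? c."""
    for v in values:
        if v is not None:
            return v
    return None

print(coalesce(None, None, "default"))  # default
print(coalesce("a", "b"))               # a  (first non-null wins)
```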

4.7 Conditional Expressions

ASQL provides two syntaxes for conditional expressions: ternary (? :) for simple binary conditions, and when for multi-branch conditions. Both replace SQL's verbose CASE statement with cleaner, more natural syntax.

4.7.1 Ternary Conditional (? :)

For simple binary conditions, ASQL supports a concise ternary syntax:

Syntax:

condition ? true_value : false_value

Examples:

from orders
  select 
    amount > 1000 ? "high" : "low" as tier,
    status = "active" ? 1 : 0 as is_active

-- With expressions
from users
  select
    age >= 18 ? "adult" : "minor" as age_group,
    (score ?? 0) > 80 ? "pass" : "fail" as result

When to use ternary vs when:

  • Use ternary (? :) for simple binary conditions: condition ? value_if_true : value_if_false
  • Use when for multi-branch conditions or when readability benefits from the explicit structure

Compiles to: CASE WHEN condition THEN true_value ELSE false_value END

4.7.2 Multi-Branch Conditionals (when)

For multi-branch conditions, ASQL uses when expressions with comma-separated branches:

Basic syntax with is for equality:

from users
  select
    name,
    when status
      is "active" then "Active User",
      is "pending" then "Pending",
      otherwise "Unknown"
    as status_label

Inline form (single line):

when status is "active" then 1, is "pending" then 0, otherwise -1

Comparison operators:

when age
  < 4 then "infant",
  < 12 then "child",
  < 18 then "teen",
  otherwise "adult"

Inequality with is not:

when status is not "deleted" then 1, otherwise 0

Multiple values with in:

when status
  in ("active", "pending") then "open",
  in ("completed", "shipped") then "done",
  otherwise "unknown"

Searched when (complex conditions):

when
  age < 18 and country = "US" then "US Minor",
  age < 18 then "Minor",
  otherwise "Adult"

Complex business logic:

from opportunity
  select
    when
      is_won then "Won",
      not is_won and is_closed then "Lost",
      not is_closed and lower(forecast_category) in ("pipeline", "forecast", "bestcase") then "Pipeline",
      otherwise "Other"
    as status

In aggregations:

from orders
  group by customer_id
  select
    customer_id,
    sum(when status is "completed" then 1, otherwise 0) as completed_count,
    sum(when status is "returned" then amount, otherwise 0) as returned_value

Branch separators: Branches are separated by commas. The trailing comma before otherwise/else is optional.

Operators supported:

  • is / = - equality
  • is not / != - inequality
  • <, >, <=, >= - comparisons
  • in (values) - multiple value match

Default clause: Both else and otherwise are supported (they are aliases).

All forms compile to standard SQL CASE WHEN ... THEN ... ELSE ... END.
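The compilation of the simple `when <subject>` form can be sketched in a few lines. This Python helper is a hypothetical illustration of the shape of the output, not ASQL's actual compiler:

```python
def compile_when(subject, branches, default):
    """Compile a simple-form `when` expression into SQL CASE WHEN (sketch).

    branches: list of (condition, result) pairs; `subject` is prepended
    to each condition, as in `when status is "active" then 1`.
    """
    parts = ["CASE"]
    for cond, result in branches:
        parts.append(f"WHEN {subject} {cond} THEN {result}")
    parts.append(f"ELSE {default} END")
    return " ".join(parts)

sql = compile_when("status", [("= 'active'", "1"), ("= 'pending'", "0")], "-1")
print(sql)
# CASE WHEN status = 'active' THEN 1 WHEN status = 'pending' THEN 0 ELSE -1 END
```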

4.8 Type Casting (::)

ASQL uses PostgreSQL-style double colon (::) syntax for type casting, which is more concise and readable than SQL's CAST(... AS ...) syntax.

Syntax:

expression::type_name

Examples:

-- Cast to timestamp
from fields
  select _fivetran_synced::TIMESTAMP as _fivetran_synced

-- Cast to date
from events
  select created_at::DATE as date_day

-- Cast to integer
from products
  select price::INT as price_int

-- Cast to string
from users
  select id::VARCHAR as user_id_str

-- Cast in WHERE clauses
from orders
  where created_at::DATE = @2024-01-01

Precedence: The :: operator has high precedence (same as function calls), so it binds tightly:

  • amount::INT * 2 → (amount::INT) * 2 ✅
  • sum(amount)::FLOAT → (sum(amount))::FLOAT

Reverse Translation: When converting SQL to ASQL, CAST(... AS ...) expressions are automatically converted to :: syntax:

  • SQL: CAST(_fivetran_synced AS TIMESTAMP) → ASQL: _fivetran_synced::TIMESTAMP
  • SQL: CAST(created_at AS DATE) → ASQL: created_at::DATE

Supported Types: All standard SQL types are supported, including:

  • Numeric: INT, INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, NUMERIC
  • String: VARCHAR, CHAR, TEXT, STRING
  • Date/Time: DATE, TIMESTAMP, TIMESTAMP_NTZ, TIME
  • Boolean: BOOLEAN, BOOL
  • Dialect-specific types (e.g., Snowflake's NUMBER, BigQuery's INT64)
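The simple case of the reverse translation can be sketched with a regex. This is a deliberately naive illustration (real parsing must handle nested expressions and function calls), not ASQL's implementation:

```python
import re

def cast_to_colons(sql: str) -> str:
    """Rewrite CAST(expr AS TYPE) into expr::TYPE for simple identifiers."""
    return re.sub(r"CAST\((\w+) AS (\w+)\)", r"\1::\2", sql, flags=re.IGNORECASE)

print(cast_to_colons("SELECT CAST(created_at AS DATE) FROM events"))
# SELECT created_at::DATE FROM events
```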

4.9 Comments

ASQL uses SQL-standard comment syntax:

  • Single-line: -- this is a comment
  • Multi-line: /* this is a multi-line comment */

Note: We use -- instead of # because:

  • It's the SQL standard (familiar to SQL users)
  • # is reserved for count aggregation syntax (see Section 5.2)
  • Better compatibility with SQL tooling and editors

Comment Settings

ASQL provides two settings to control comment behavior:

passthrough_comments (default: true): Controls whether source ASQL comments are preserved in the generated SQL. Set to false to strip all source comments:

SET passthrough_comments = false;
-- This comment will not appear in output
from users limit 10

include_transpilation_comments (default: false): When enabled, ASQL adds explanatory comments about complex transformations like auto-spine:

SET include_transpilation_comments = true;
from orders
  where order_date >= @2024-01-01 and order_date < @2024-02-01
  group by month(order_date) ( sum(amount) as revenue )

Output includes helpful explanation:

/* ASQL auto-spine: Gap-filling CTEs were generated to ensure all 
   expected GROUP BY values appear (even with zero/null aggregates). 
   Disable with: SET auto_spine = false; */
WITH month_order_date_spine AS (...)
...

This is especially useful when learning how ASQL transformations work or when debugging generated SQL.

4.10 String Functions

ASQL provides clean string manipulation functions.

String Concatenation

Use concat() for joining strings:

from users
  select concat(first_name, " ", last_name) as full_name

For concatenating values across rows in a GROUP BY, use string_agg():

from orders
  group by customer_id (
    string_agg(product_name, ", ") as products
  )

Note: string_agg() is the standard name used in PostgreSQL, SQL Server, and DuckDB. ASQL compiles to the appropriate dialect (e.g., GROUP_CONCAT for MySQL, LISTAGG for Snowflake).

Substring with Slice Syntax

ASQL supports Python-style slice syntax for extracting substrings:

from users
  select 
    email[1:5] as email_prefix,        -- characters 1-5
    email[1:] as from_start,           -- from position 1 to end
    email[-5:] as last_five,           -- last 5 characters
    email[3:8] as middle_part          -- characters 3-8

Slice syntax:

  • string[start:end] - characters from start to end (1-indexed)
  • string[start:] - from start to end of string
  • string[:end] - from beginning to end position
  • string[-n:] - last n characters

The traditional substring() function is also supported:

substring(email, 1, 5) as email_prefix
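The 1-indexed, inclusive-end slice semantics map mechanically onto SUBSTRING. A hedged Python sketch of that translation (the negative-start expansion shown here is an assumed target, not confirmed by the spec):

```python
def slice_to_substring(col: str, start=None, end=None) -> str:
    """Translate ASQL slice syntax (1-indexed, inclusive end) to SUBSTRING."""
    if start is not None and start < 0:
        # col[-n:] → last n characters: start at LENGTH - n + 1
        return f"SUBSTRING({col}, LENGTH({col}) - {-start - 1}, {-start})"
    s = start if start is not None else 1
    if end is None:
        return f"SUBSTRING({col}, {s})"             # col[start:]
    return f"SUBSTRING({col}, {s}, {end - s + 1})"  # col[start:end]

print(slice_to_substring("email", 1, 5))  # SUBSTRING(email, 1, 5)
print(slice_to_substring("email", -5))    # SUBSTRING(email, LENGTH(email) - 4, 5)
```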

String Replacement

Use replace() for string substitution:

from products
  select replace(description, "old", "new") as updated_desc

String Functions Reference

| Function | Description | Example |
| --- | --- | --- |
| concat(a, b, ...) | Concatenate strings | concat(first, " ", last) |
| string_agg(col, sep) | Aggregate strings with separator | string_agg(name, ", ") |
| string[start:end] | Slice substring | email[1:5] |
| substring(str, pos, len) | Extract substring | substring(email, 1, 5) |
| replace(str, old, new) | Replace occurrences | replace(text, "a", "b") |
| lower(str) | Convert to lowercase | lower(name) |
| upper(str) | Convert to uppercase | upper(code) |
| trim(str) | Remove whitespace | trim(input) |
| length(str) | String length | length(name) |

4.11 Comparison Functions

Multi-Value Min/Max

Use min() and max() with multiple arguments to find the smallest or largest value:

from products
  select 
    max(price1, price2, price3) as highest_price,
    min(start_date, end_date) as earliest_date

SQL alternatives: greatest() and least() are also supported for familiarity with SQL:

from products
  select greatest(price1, price2, price3) as highest_price

Recommendation: Use max() and min() as they're more intuitive and consistent with aggregation functions.

Note: This is configurable - if you prefer SQL-style greatest()/least() as the default, this can be adjusted in ASQL settings.

4.12 NULL Handling

NULLIF Alternative

Instead of SQL's NULLIF() function, use ASQL's when expression (or SQL NULLIF() directly).

from transactions
  select 
    when
      amount = 0 then null
      otherwise amount
    as safe_amount

This is clearer than nullif(amount, 0) for some readers, but either is valid.

Note: You can also use SQL NULLIF() directly if you prefer:

SELECT NULLIF(amount, 0) AS safe_amount

4.13 Function Shorthand (Underscore/Space Principle)

ASQL supports shorthand/normalization rules that make queries read more naturally.

Implemented today

  • Natural-language aggregate calls (space + optional of) are normalized to function calls:
sum amount        -- → sum(amount)
sum of amount     -- → sum(amount)
avg price         -- → avg(price)
  • Certain multi-word functions can be written with spaces and are normalized to underscored function names:
day of week created_at     -- → day_of_week(created_at)
week of year created_at    -- → week_of_year(created_at)
string agg(name, ", ")     -- → string_agg(name, ", ")
  • Date “since/until” patterns are supported as special forms:
-- All three syntaxes work:
days_since(created_at)     -- Function call (recommended)
days since created_at      -- Space notation
days_since_created_at      -- Underscore alias

days_until(due_date)       -- → DATEDIFF('day', CURRENT_TIMESTAMP, due_date)
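The days_since/days_until special forms expand to DATEDIFF as shown above. A minimal Python sketch of that expansion (the days_since argument order is assumed by symmetry with the days_until target shown in the spec):

```python
def compile_days_fn(name: str, col: str) -> str:
    """Expand the days_since / days_until special forms to DATEDIFF (sketch)."""
    if name == "days_since":
        # days elapsed since col (assumed argument order)
        return f"DATEDIFF('day', {col}, CURRENT_TIMESTAMP)"
    if name == "days_until":
        return f"DATEDIFF('day', CURRENT_TIMESTAMP, {col})"
    raise ValueError(f"unknown function: {name}")

print(compile_days_fn("days_until", "due_date"))
# DATEDIFF('day', CURRENT_TIMESTAMP, due_date)
```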

Underscore shorthand

In function contexts, underscores are interchangeable with spaces. For example:

sum_amount        -- → sum(amount)
month_created_at  -- → month(created_at)
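The underscore expansion is essentially a lookup against known function names. A naive Python sketch (the function list is illustrative; a real resolver must also check column names so that a column literally named sum_amount is not misread as a call):

```python
KNOWN_FUNCS = {"sum", "avg", "min", "max", "count", "month", "year", "day"}

def expand_shorthand(token: str) -> str:
    """Expand func_col shorthand to func(col); other identifiers pass through."""
    head, _, rest = token.partition("_")
    if head in KNOWN_FUNCS and rest:
        return f"{head}({rest})"
    return token

print(expand_shorthand("sum_amount"))        # sum(amount)
print(expand_shorthand("month_created_at"))  # month(created_at)
print(expand_shorthand("user_id"))           # user_id (not a function)
```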

5. Aggregations

5.1 Standard Aggregates

Aggregates are used within group by blocks. ASQL uses as syntax (like SQL) for aliasing:

from sales
  group by region (
    sum(amount) as revenue,
    count(distinct customer_id) as customers,
    avg(amount) as avg_order,
    max(amount) as max_order,
    min(amount) as min_order
  )

Auto-Aliasing:

ASQL automatically generates meaningful column names when functions are used without explicit AS aliases. This solves SQL's problem of unusable default names like count, f0_, or SUM(amount).

| Aggregate Form | With as Alias | Auto-Generated Alias (No as) |
| --- | --- | --- |
| sum(amount) | sum(amount) as revenue → column: revenue | sum_amount |
| avg(price) | avg(price) as avg_price → column: avg_price | avg_price |
| count(*) / # | # as total → column: total | num |
| count(email) | count(email) as emails → column: emails | num_email |
| count(distinct user_id) | count(distinct user_id) as unique_users → column: unique_users | num_distinct_user_id |
| month(created_at) | month(created_at) as month → column: month | month_created_at |
| first(col order by ...) | first(order_id order by -date) as latest → column: latest | first_order_id |

Pattern rules:

  • Single-arg functions: func(col) → func_col
  • Multi-arg functions: func(a, b) → func_a_b
  • Special cases: count(*) → num, row_number() → row_num
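The pattern rules can be sketched as a small alias generator. A hypothetical Python illustration of the mapping (not ASQL's actual implementation, which is template-driven and configurable):

```python
def auto_alias(func: str, args: list) -> str:
    """Generate a default column alias per the pattern rules above (sketch)."""
    special = {("count", "*"): "num", ("row_number",): "row_num"}
    key = (func, *args)
    if key in special:
        return special[key]
    if func == "count":
        # count(email) → num_email, count(distinct user_id) → num_distinct_user_id
        return "num_" + "_".join(args).replace(" ", "_")
    return func + "_" + "_".join(a.replace(" ", "_") for a in args)

print(auto_alias("sum", ["amount"]))              # sum_amount
print(auto_alias("count", ["*"]))                 # num
print(auto_alias("count", ["distinct user_id"]))  # num_distinct_user_id
```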

Explicit aliases always win: If you provide an as alias, it overrides the auto-generated name.

Configuration: Auto-aliasing is configurable via asql.config.yaml or SET statements. See the Auto-Aliasing Reference for the complete mapping table, template system, and configuration options.

Default return behavior: If no select clause is specified, the query returns all grouping columns followed by all aggregations in the order they're listed. select * has the same behavior.

5.2 Count Aggregation (#)

The # symbol is a shortcut for counting. When followed by a table name, it infers the primary key and performs a distinct count.

-- Basic count syntaxes
#                    -- COUNT(*)
# *                  -- COUNT(*) (explicit row count)
#(col)               -- COUNT(col)
#(distinct col)      -- COUNT(DISTINCT col)

-- Table name → distinct count with inferred primary key
# users              -- COUNT(DISTINCT user_id)
# of users           -- COUNT(DISTINCT user_id)
# orders             -- COUNT(DISTINCT order_id)

-- In select statements
from Users
  select #, birthday
  -- Returns: COUNT(*) as #, birthday

Primary key inference: When a table name follows #, ASQL infers the primary key using the convention {singular_table_name}_id:

  • # users → COUNT(DISTINCT user_id) (users → user_id)
  • # orders → COUNT(DISTINCT order_id) (orders → order_id)
  • # activity → COUNT(DISTINCT activity_id) (already singular)

Explicit column syntax for when you don't want inference:

  • #(col) → COUNT(col)
  • #(distinct col) → COUNT(DISTINCT col)

Explicit row count: Use # * when you want COUNT(*) explicitly (not distinct count).
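The {singular_table_name}_id convention can be sketched with naive singularization. A hypothetical Python illustration (real-world pluralization needs more rules; the "ies" case here is my addition, not from the spec):

```python
def inferred_pk(table: str) -> str:
    """Infer {singular_table_name}_id with naive singularization (sketch)."""
    if table.endswith("ies"):
        singular = table[:-3] + "y"      # e.g. companies → company (assumed rule)
    elif table.endswith("s"):
        singular = table[:-1]            # users → user
    else:
        singular = table                 # activity → activity (already singular)
    return f"{singular}_id"

print(inferred_pk("users"))     # user_id
print(inferred_pk("orders"))    # order_id
print(inferred_pk("activity"))  # activity_id
```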

5.3 Sum & Total Aggregation

sum and total are interchangeable (both compile to SUM()):

-- Standard syntax
sum(amount) as revenue
total(amount) as revenue

-- Natural language syntax
Sum of amount as revenue
Total of amount as revenue
Sum amount as revenue
Total amount as revenue

-- In group by
from sales
  group by region (
    total amount as revenue
  )

5.4 Average Aggregation

Multiple natural language forms for averages:

-- Standard syntax
avg(Users.age) as avg_age
average(Users.age) as avg_age

-- Natural language syntax
Avg Users.age as avg_age
Average of Users.age as avg_age
Average Users.age as avg_age
Avg of Users.age as avg_age

-- With expressions
Avg(Users.age + 3) as adjusted_age
Average of Users.age + 3 as adjusted_age

5.5 Natural Language Philosophy

ASQL encourages natural language expressions. The of keyword can replace parentheses, making queries read like questions:

-- Instead of: count(*) from Users where country = 'US'
# of Users where country = "US"

-- Instead of: sum(amount) from sales
Sum of amount from sales

-- Instead of: avg(age) from users group by country
Average of Users.age by country

This makes ASQL queries feel like asking questions rather than writing code.

5.6 Contextual Aggregates

When grouping, aggregates are computed per group:

from users
  group by country (
    count() as total_users,
    avg(age) as avg_age
  )

5.7 Window Functions

ASQL provides intuitive syntax for common window function patterns, making operations like deduplication, ranking, and running totals much cleaner than SQL.

The per Command (Pipeline Window Operations)

The per command creates a window context for operations on partitions:

Syntax:

per <partition_cols> <operation> by <order_cols> [as <alias>]

Or without partition (whole table):

<operation> by <order_cols> [as <alias>]

Available Operations:

| Operation | What it does | Default alias | Row count |
| --- | --- | --- | --- |
| first | Keep first row per partition | (no column) | ↓ Reduces |
| last | Keep last row per partition | (no column) | ↓ Reduces |
| number | Add row number column | row_num | Same |
| rank | Add rank column | rank | Same |
| dense rank | Add dense rank column | dense_rank | Same |
Examples:

-- DEDUPLICATION: Keep most recent order per customer
from orders
  per customer_id first by -order_date

-- ADD ROW NUMBER: Number orders per customer (most recent = 1)
from orders
  per customer_id number by -order_date
-- Result: adds `row_num` column

-- ADD RANK: Rank employees by salary within department
from employees
  per department rank by -salary
-- Result: adds `rank` column

-- NO PARTITION: Number all rows
from events
  number by -timestamp
-- Result: adds `row_num` to all rows, ordered by timestamp desc

Reading the syntax naturally:

  • per customer_id first by -order_date → "Per customer, get the first by order date descending"
  • per department rank by -salary → "Per department, rank by salary descending"
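A deduplicating per ... first pipeline would typically compile to a ROW_NUMBER subquery. A hedged Python sketch of one plausible target shape (the SQL ASQL actually generates may differ, e.g. using QUALIFY on dialects that support it):

```python
def per_first_sql(table, partition_col, order_col, descending=True):
    """Sketch: compile `per <partition> first by <order>` to a dedup subquery."""
    direction = "DESC" if descending else "ASC"
    return (
        f"SELECT * FROM ("
        f"SELECT *, ROW_NUMBER() OVER ("
        f"PARTITION BY {partition_col} ORDER BY {order_col} {direction}"
        f") AS rn FROM {table}"
        f") ranked WHERE rn = 1"
    )

print(per_first_sql("orders", "customer_id", "order_date"))
```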

first() / last() in GROUP BY

Use first() and last() as aggregates within GROUP BY to extract values:

from orders
  group by customer_id (
    first(order_id order by -order_date) as latest_order,
    last(order_id order by order_date) as first_order,
    # as total_orders
  )

Comparison: per ... first vs first() in GROUP BY:

  • Use per ... first when you want the whole row (deduplication)
  • Use first() in GROUP BY when you want specific aggregated values

arg_max() / arg_min() (ClickHouse-inspired)

Get the value of one column where another column is max/min:

from orders
  group by customer_id (
    arg_max(order_id, order_date) as latest_order_id,
    arg_min(order_id, order_date) as earliest_order_id
  )

prior() / next() (Simplified LAG/LEAD)

from monthly_sales
  order by month
  select
    month,
    revenue,
    prior(revenue) as prior_revenue,       -- LAG(revenue, 1)
    prior(revenue, 3) as three_months_ago,
    next(revenue) as next_revenue          -- LEAD(revenue, 1)

Running Aggregates (Cumulative)

from transactions
  order by date
  select
    date,
    amount,
    running_sum(amount) as cumulative_amount,
    running_avg(amount) as avg_to_date,
    running_count(*) as transaction_number

Shorthand: running amount is equivalent to running_sum(amount).

Rolling Window Aggregates

Moving window with specified size:

from daily_sales
  order by date
  select
    date,
    revenue,
    rolling_avg(revenue, 7) as seven_day_avg,
    rolling_sum(revenue, 30) as monthly_total
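A rolling aggregate over the last n rows maps onto a standard window frame. A hedged Python sketch of the likely target SQL (row-based frames assumed; a range-based frame over actual dates is another possibility):

```python
def rolling_sql(func: str, col: str, n: int, order_col: str) -> str:
    """Sketch: rolling_<func>(col, n) as a window over the last n rows."""
    return (f"{func.upper()}({col}) OVER (ORDER BY {order_col} "
            f"ROWS BETWEEN {n - 1} PRECEDING AND CURRENT ROW)")

print(rolling_sql("avg", "revenue", 7, "date"))
# AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
```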

Window Functions Quick Reference

| Intent | ASQL Syntax |
| --- | --- |
| Most recent row per group | per group_col first by -date |
| Oldest row per group | per group_col first by date |
| Add row numbers per group | per group_col number by -date |
| Add rank per group | per group_col rank by -value |
| Add dense rank per group | per group_col dense rank by -value |
| Get column value at max | arg_max(col, sort_col) |
| Previous row value | prior(col) |
| Next row value | next(col) |
| Cumulative sum | running_sum(col) |
| Cumulative average | running_avg(col) |
| 7-day moving average | rolling_avg(col, 7) |
| First value in group | first(col order by ...) |

6. Grouping

6.1 Basic Grouping

from users
  group by country ( # as total_users )

Note: The parentheses syntax for grouping aggregates is inspired by PRQL, which uses a similar block structure for clarity and readability.

Alternative names: While group by is standard SQL and recommended, alternatives that may be more intuitive to newcomers include:

  • rollup by - emphasizes aggregation
  • bucket by - emphasizes grouping into buckets
  • aggregate by - explicit about aggregation

These are syntactic alternatives - all compile to SQL GROUP BY. group by remains the primary syntax for familiarity and SQL compatibility.

6.2 Multiple Grouping Columns

from sales
  group by region, month (
    sum(amount) as revenue,
    count() as orders
  )

6.3 Natural Language Grouping

For very simple queries, natural language syntax can be used:

# of Users by country
Sum of revenue by region, month
Avg Users.age by country

Note: These are syntactic shortcuts. For complex queries, the explicit group by syntax with parentheses (Section 6.1) is recommended for clarity and consistency.

6.4 Guaranteed Groups

By default, ASQL ensures all expected dimension values appear in grouped results—even if they have no data. This prevents the common analytics bug where missing data creates gaps in charts and incorrect calculations.

How It Works

  • Date truncations (month(), year(), week(), etc.): ASQL infers the date range from your WHERE clause and fills all periods
  • Non-date columns: Uses DISTINCT values from the source data
  • Cross-joins multiple columns: All combinations of dimension values are guaranteed
-- All months from Jan-Jun will appear, even with zero revenue
from orders
  where order_date >= @2024-01-01 and order_date < @2024-07-01
  group by month(order_date) (
    sum(amount) ?? 0 as revenue
  )
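The date-spine generation described above can be sketched in plain Python. A hypothetical illustration of the semantics (ASQL generates this as SQL CTEs, not application code):

```python
from datetime import date

def month_spine(start: date, end: date):
    """All month starts in [start, end) — the gap-filling spine for a
    month() grouping inferred from the WHERE-clause date range."""
    months, y, m = [], start.year, start.month
    while date(y, m, 1) < end:
        months.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return months

spine = month_spine(date(2024, 1, 1), date(2024, 7, 1))
print(len(spine))  # 6 months, Jan through Jun
```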

Explicit Values with guarantee()

Use guarantee() to specify exactly which values should appear:

from orders
  group by guarantee(status, ['pending', 'shipped', 'delivered', 'cancelled']) (
    # ?? 0 as order_count
  )

This ensures all four statuses appear in results, even if some have zero orders.

Disabling Guaranteed Groups

Filter the results (most common):

from orders
  group by month(order_date) ( sum(amount) as revenue )
  where revenue > 0

Disable for a query:

SET auto_spine = false;
from orders
  group by month(order_date) ( sum(amount) as revenue )

Disable globally via config file or API.


7. Joins & Relationships

ASQL uses symbolic operators for joins, making the join type visually clear. The & represents the join point, and ? marks optional (nullable) sides.

7.1 Join Operators

| Operator | Join Type | SQL Equivalent | Meaning |
| --- | --- | --- | --- |
| & | INNER JOIN | INNER JOIN | Both sides must match |
| &? | LEFT JOIN | LEFT JOIN | Right side is optional (can be NULL) |
| ?& | RIGHT JOIN | RIGHT JOIN | Left side is optional (can be NULL) |
| ?&? | FULL OUTER JOIN | FULL OUTER JOIN | Both sides are optional |
| * | CROSS JOIN | CROSS JOIN | Cartesian product |

Mnemonic: "The ? marks the side that might be NULL"

7.2 Basic Join Syntax

-- INNER JOIN: only matching rows
from opportunities & owners
  select opportunities.amount, owners.name

-- LEFT JOIN: all opportunities, owners may be NULL
from opportunities &? owners
  select opportunities.amount, owners.name

-- RIGHT JOIN: all owners, opportunities may be NULL  
from opportunities ?& owners
  select opportunities.amount, owners.name

-- FULL OUTER JOIN: all rows from both sides
from opportunities ?&? owners
  select opportunities.amount, owners.name

-- CROSS JOIN: every combination
from opportunities * owners
  select opportunities.amount, owners.name

7.3 Table Aliasing

Use as to alias joined tables:

from opportunities &? users as owner
  select opportunities.amount, owner.name, owner.email

7.4 Explicit Join Conditions

ASQL offers three levels of explicitness for join conditions:

Fully Inferred Joins

When tables follow FK naming conventions ({table}_id), ASQL can infer the join condition:

-- ASQL infers: orders.user_id = users.id
from orders & users

FK Column Shorthand

When a table has multiple FKs to the same target table, specify which FK column to use:

-- accounts has both owner_id and created_by_id pointing to users
from accounts &? users on owner_id
-- → FROM accounts LEFT JOIN users ON accounts.owner_id = users.id

-- Works in either direction
from users & accounts on owner_id
-- → FROM users JOIN accounts ON accounts.owner_id = users.id

-- With aliases
from accounts &? users as owner on owner_id
-- → FROM accounts LEFT JOIN users AS owner ON accounts.owner_id = owner.id

The FK column shorthand assumes:

  • The FK column is on the "from" table (left side)
  • The FK points to id on the target table

Fully Explicit Conditions

For non-standard relationships, specify the complete condition:

from opportunities &? owners on opportunities.owner_id = owners.id
  select opportunities.amount, owners.name

-- With alias
from opportunities &? users as owner on opportunities.owner_id = owner.id
  select opportunities.amount, owner.name

-- Complex conditions
from orders & users on orders.customer_id = users.id and users.active = true

7.5 Dot Notation for FK Traversal

This works WITHOUT a model file - ASQL recognizes FK naming conventions.

If a column follows the pattern {name}_id, you can traverse it using .{name}.:

-- opportunities has owner_id column (FK to users table)
from opportunities
  select 
    opportunities.amount,
    opportunities.owner.name,      -- Auto-joins via owner_id
    opportunities.owner.email      -- Same join, different column

Compiles to:

SELECT 
  opportunities.amount,
  owner_1.name,
  owner_1.email
FROM opportunities
LEFT JOIN users AS owner_1 ON opportunities.owner_id = owner_1.id

Key points:

  • The FK column owner_id enables .owner. traversal
  • ASQL finds the target table by checking: owners table, then users table (singularization)
  • Dot traversal defaults to LEFT JOIN (the FK might be NULL)
  • Multiple references to same FK reuse the same join (no duplicate joins)

7.6 Chained FK Traversal

Navigate through multiple relationships:

-- order_items.order_id → orders.user_id → users
from order_items
  select 
    order_items.quantity,
    order_items.order.total,           -- → orders
    order_items.order.user.name        -- → orders → users

7.7 Multiple FKs to Same Table

When a table has multiple FKs to the same table, use the <alias>_<table>_id convention:

-- accounts has owner_user_id, manager_user_id, support_rep_user_id all → users
from accounts
  select 
    accounts.owner.name as owner_name,           -- via owner_user_id
    accounts.manager.name as manager_name,       -- via manager_user_id  
    accounts.support_rep.name as support_name    -- via support_rep_user_id

The FK naming pattern <alias>_user_id enables .alias. dot traversal to the users table.

Or with explicit joins:

from accounts 
  &? users as owner on accounts.owner_user_id = owner.id
  &? users as manager on accounts.manager_user_id = manager.id
  select owner.name, manager.name

7.8 Convention-Based Inference

ASQL uses naming conventions to auto-detect joins:

FK Naming Patterns:

| FK Column | Alias | Target Table | Dot Traversal |
|-----------|-------|--------------|---------------|
| user_id | user | users | .user. |
| account_id | account | accounts | .account. |
| owner_user_id | owner | users | .owner. |
| manager_user_id | manager | users | .manager. |
| parent_account_id | parent_account | accounts | .parent_account. |

Pattern: <alias>_<table_name>_id → alias is <alias>, traverses to <table_name> table

Inference priority:

1. Explicit on clause - Always wins
2. Model metadata - If relationships are explicitly defined in model files
3. Naming convention inference - {name}_id enables .{name}. traversal
4. Single FK check - If only one FK exists between tables, auto-join
5. Error with suggestions - If ambiguous or no match found

7.9 Self-Joins (Hierarchies)

-- Employees and their managers (explicit)
from employees &? employees as manager on employees.manager_id = manager.id
  select employees.name, manager.name as manager_name

-- Or with dot notation (uses manager_id FK automatically)
from employees
  select 
    employees.name, 
    employees.manager.name as manager_name,
    employees.manager.manager.name as skip_level_manager

7.10 Optional Model Metadata

While not required, you can define relationships explicitly for non-standard FK names:

# asql_schema.yml
relationships:
  - from: opportunities.owner_id
    to: users.id
    alias: owner

  - from: accounts.primary_contact
    to: contacts.id  # Non-standard name, needs explicit mapping

dbt Compatibility: ASQL can read dbt's schema.yml files to infer relationships from relationships tests.

7.11 Join Quick Reference

| SQL | ASQL Operator | ASQL with ON clause |
|-----|---------------|---------------------|
| INNER JOIN | & | & users on ... |
| LEFT JOIN | &? | &? users on ... |
| RIGHT JOIN | ?& | ?& users on ... |
| FULL OUTER JOIN | ?&? | ?&? users on ... |
| CROSS JOIN | * | * users |

| Feature | Syntax | Example |
|---------|--------|---------|
| Aliasing | as | &? users as owner |
| Explicit condition | on | &? users on orders.user_id = users.id |
| FK traversal | .fk. | orders.user.name (via user_id) |

Key principles:

1. ? marks the optional/nullable side: Easy to remember
2. FK inference via naming convention: {name}_id enables .{name}. traversal
3. No model file required: Convention-based inference works out of the box
4. Explicit always works: Full on clause syntax never fails
5. Smart deduplication: Dot traversal reuses explicit joins when FK matches


8. Dates & Time

Dates are one of the most important features in analytics. ASQL provides a clean, intuitive, and portable date syntax that compiles to the right dialect-specific SQL.

8.1 Date Literals

ASQL uses the @ prefix for date literals:

from users
  where signup_date >= @2024-01-01

from orders
  where order_date between @2024-01-01 and @2024-12-31

The @ prefix distinguishes dates from strings and is clean and unambiguous.
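A minimal sketch of how a compiler might lower @-literals to standard SQL (the helper name is illustrative, not ASQL's actual API):

```python
import re

def compile_date_literal(token: str) -> str:
    """Lower an ASQL @YYYY-MM-DD literal to a standard SQL DATE literal."""
    m = re.fullmatch(r"@(\d{4}-\d{2}-\d{2})", token)
    if m is None:
        raise ValueError(f"not a date literal: {token!r}")
    return f"DATE '{m.group(1)}'"

# compile_date_literal("@2024-01-01")  -> "DATE '2024-01-01'"
```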

8.2 Time Truncation Functions

ASQL provides simple time truncation functions for grouping and time series:

year(created_at)      -- Truncate to year start: 2025-01-01
month(created_at)     -- Truncate to month start: 2025-01-01
week(created_at)      -- Truncate to week start: 2025-01-13
day(created_at)       -- Truncate to day: 2025-01-15
hour(created_at)      -- Truncate to hour: 2025-01-15 14:00:00
quarter(created_at)   -- Truncate to quarter start

Current note: Time truncation currently requires normal function-call syntax like year(created_at). (The broader underscore/space shorthands are not implemented.)
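The truncation semantics above can be sketched with Python's datetime (helper names are illustrative; a real implementation emits DATE_TRUNC-style SQL instead):

```python
from datetime import date, timedelta

def trunc_month(d: date) -> date:
    """month(): first day of d's month."""
    return d.replace(day=1)

def trunc_week(d: date) -> date:
    """week(): Monday of d's ISO week (weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())

# For d = 2025-01-15 (a Wednesday):
# trunc_month(d) -> 2025-01-01, trunc_week(d) -> 2025-01-13
```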

8.3 Date Part Extraction

Extracting date parts (day of week, week of year, etc.) uses natural language phrases:

day of week created_at        -- 1-7 (which day of the week)
day of month created_at       -- 1-31 (which day of the month)
day of year created_at        -- 1-366 (which day of the year)
week of year created_at       -- 1-53 (which week of the year)
month of year created_at      -- 1-12 (which month)
quarter of year created_at    -- 1-4 (which quarter)

Function equivalents (same behavior):

day_of_week(created_at)
week_of_year(created_at)
month_of_year(created_at)

Example usage:

-- Weekend orders
from orders
  where day of week order_date in (6, 7)

-- Sales by day of week
from sales
  group by day of week sale_date (
    sum(amount) as revenue
  )

Distinction from truncation:

- month(created_at) → 2025-01-01 (truncated date, for time series)
- month of year created_at → 1 (integer, for "all Januaries")

8.4 Date Arithmetic

Adding or subtracting from dates uses clean inline syntax:

order_date + 7 days
order_date - 1 month
order_date + 2 weeks
created_at + 24 hours
updated_at - 90 minutes

Singular/plural both work:

order_date + 1 day           -- singular
order_date + 7 days          -- plural

Compiles to (dialect-specific):

-- PostgreSQL
order_date + INTERVAL '7 days'

-- SQL Server  
DATEADD(day, 7, order_date)

-- MySQL
DATE_ADD(order_date, INTERVAL 7 DAY)

8.5 Date Difference

Getting days/months/years between dates:

days(end_date - start_date)      -- Returns integer days
months(end_date - start_date)    -- Returns integer months
years(end_date - start_date)     -- Returns integer years
hours(end_date - start_date)     -- Returns integer hours
weeks(end_date - start_date)     -- Returns integer weeks

Alternative syntax:

days_between(start_date, end_date)
months_between(start_date, end_date)

Example:

from orders
  select 
    days(shipped_date - order_date) as fulfillment_days,
    months(now() - customer_since) as customer_tenure_months

8.6 Relative Dates (ago and from now)

Past dates with ago:

from users
  where last_login >= 7 days ago

from orders
  where created_at >= 30 days ago
  where created_at >= 1 month ago
  where created_at >= 3 hours ago

Future dates with from now:

from orders
  where estimated_delivery <= 3 days from now

from reminders
  where remind_at <= 1 hour from now

Compiles to:

-- PostgreSQL
WHERE last_login >= CURRENT_DATE - INTERVAL '7 days'
WHERE estimated_delivery <= CURRENT_TIMESTAMP + INTERVAL '3 days'
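The lowering shown above can be sketched as a small translation function. This is illustrative only; it mirrors the compiled output in this section (ago anchored to CURRENT_DATE, from now to CURRENT_TIMESTAMP, configurable via relative_date_type):

```python
import re

def compile_relative(expr: str) -> str:
    """Lower 'N unit(s) ago' / 'N unit(s) from now' to PostgreSQL-style SQL."""
    m = re.fullmatch(r"(\d+)\s+(\w+?)s?\s+(ago|from now)", expr)
    if m is None:
        raise ValueError(f"not a relative date: {expr!r}")
    n, unit, direction = m.groups()
    interval = f"INTERVAL '{n} {unit}{'s' if int(n) != 1 else ''}'"
    if direction == "ago":
        return f"CURRENT_DATE - {interval}"
    return f"CURRENT_TIMESTAMP + {interval}"

# compile_relative("7 days ago") -> "CURRENT_DATE - INTERVAL '7 days'"
```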

8.7 Time Since/Until Patterns

ASQL provides three equivalent syntaxes for calculating time differences:

Syntax variants (all equivalent):

-- Function call syntax (recommended)
days_since(created_at)
months_until(due_date)

-- Space notation (natural language)
days since created_at
months until due_date

-- Underscore alias (compact)
days_since_created_at
months_until_due_date

Available functions:

| Since (time elapsed) | Until (time remaining) |
|----------------------|------------------------|
| days_since(col) | days_until(col) |
| weeks_since(col) | weeks_until(col) |
| months_since(col) | months_until(col) |
| years_since(col) | years_until(col) |
| hours_since(col) | hours_until(col) |
| minutes_since(col) | minutes_until(col) |
| seconds_since(col) | seconds_until(col) |

Example usage:

from users
  select
    name,
    days_since(last_login),           -- Function call
    months since signup_date,          -- Space notation
    years_since_birth_date as age      -- Underscore alias

from tasks
  where days_until(due_date) < 7
  -- or: where days until due_date < 7
  -- or: where days_until_due_date < 7

Compiles to (conceptually; the exact date-diff function varies by dialect):

SELECT name, 
       DATEDIFF('day', last_login, CURRENT_TIMESTAMP),
       DATEDIFF('month', signup_date, CURRENT_TIMESTAMP),
       DATEDIFF('year', birth_date, CURRENT_TIMESTAMP) AS age
FROM users

8.8 Week Start Configuration

Default: ISO 8601 standard (Monday = day 1)

week(created_at)              -- Default: ISO (Monday start)
week_monday(created_at)       -- Explicit Monday start
week_sunday(created_at)       -- US-style Sunday start

day of week created_at        -- Default: 1 = Monday, 7 = Sunday

Rationale: ISO 8601 is the international standard. Most analytics/business contexts expect Monday start.

8.9 Timezone Handling

Use cast-like syntax with :: for timezone conversion:

-- Short timezone codes
created_at::PST
created_at::UTC
created_at::EST

-- Full IANA timezone names (quoted)
created_at::"America/Los_Angeles"
created_at::"Europe/London"

-- Chained with other operations
month(created_at::PST)
created_at::UTC + 7 days

Compiles to (PostgreSQL):

created_at AT TIME ZONE 'PST'
created_at AT TIME ZONE 'America/Los_Angeles'

8.10 Time Bucketing (Grouping)

Time bucketing is simply grouping by a time function:

from users
  group by month(created_at) ( # as signups )

from sales
  group by week(sale_date) ( sum(amount) as revenue )

8.11 Date Functions Quick Reference

| Operation | ASQL Syntax | Example |
|-----------|-------------|---------|
| Date literal | @YYYY-MM-DD | @2025-01-15 |
| Truncation | unit(col) | month(created_at) |
| Extraction | unit of period col | day of week created_at |
| Arithmetic | date + N unit | order_date + 7 days |
| Difference | unit(date1 - date2) | days(end - start) |
| Relative past | N unit ago | 7 days ago |
| Relative future | N unit from now | 3 days from now |
| Time since | unit_since(col) or unit since col | days_since(created_at) |
| Time until | unit_until(col) or unit until col | days_until(due_date) |
| Timezone | col::TZ | created_at::PST |
| Week (Sunday) | week_sunday(col) | week_sunday(created_at) |

8.12 Default Time Fields & Conventions

ASQL follows a "convention over configuration" philosophy:

  1. Standard naming: Timestamp columns named created_at, updated_at enable automatic inference
  2. Smart defaults: If a table has created_at, it's assumed to be the primary time field
  3. Configurable: Override defaults in model metadata when needed
models:
  users:
    default_time: signup_date  # Override convention

Best practice: Follow dbt-style modeling standards, and ASQL's date features "just work."


9. Filtering

9.1 Basic Filters

ASQL uses where instead of filter because it's more intuitive - "where" clearly means "filter in" (keep rows that match), whereas "filter" is ambiguous (filter in or filter out?).

from users
  where status = "active"
  where age >= 18

9.2 Multiple Conditions

Multiple conditions can be written in several ways:

Separate where clauses:

from opportunities
  where status = "open"
  where owner.is_active
  where org_type != "Non Profit"

Using and on same line:

from opportunities
  where status = "open" and owner.is_active and org_type != "Non Profit"

Using & operator:

from opportunities
  where status = "open" & owner.is_active & org_type != "Non Profit"

Tabbed indentation (multi-line):

from opportunities
  where status = "open"
    and owner.is_active
    and org_type != "Non Profit"

Using or (with parentheses for grouping):

from opportunities
  where (status = "open" or status = "pending")
    and owner.is_active

All of these compile to SQL WHERE clauses. The pipeline approach makes complex conditions easier to read than nested SQL.

9.3 Alternative Syntax: if

from opportunities
  group by owner.name ( total as sum(amount) )
  if status = "open"
  if owner.is_active

Note: if is syntactic sugar for where and can be used interchangeably. It makes queries read more naturally: "total pipeline by owner name if status is open".


10. Ordering

10.1 Basic Ordering

The order by clause sorts rows by one or more columns:

from users order by name
from users order by -total_users

Descending order uses the - prefix:

- order by name → ascending (A-Z)
- order by -name → descending (Z-A)

10.2 Ordering by Function Calls

You can order by function calls using the - prefix for descending order:

from users order by month(created_at)
from users order by -updated_at
from events order by -year(created_at), name

Examples:

- order by updated_at → Sort by updated_at ascending
- order by -updated_at → Sort by updated_at descending (newest first)
- order by -year(created_at), name → Sort by year descending, then name ascending

This syntax makes it easy to order by computed values like date functions.

10.3 Multiple Order Columns

Multiple order columns are separated by commas:

from users order by -total_users, name
from sales order by -revenue, region, -date

The - prefix applies only to the column immediately following it.
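The ordering rules above reduce to a tiny parse step. A sketch in Python (function names are illustrative, not ASQL's actual internals):

```python
def parse_order_key(token: str) -> tuple[str, str]:
    """'-revenue' -> ('revenue', 'DESC'); 'name' -> ('name', 'ASC')."""
    if token.startswith("-"):
        return token[1:], "DESC"
    return token, "ASC"

def compile_order_by(tokens: list[str]) -> str:
    """Each key gets its own direction; '-' never carries over."""
    return "ORDER BY " + ", ".join(
        f"{col} {direction}" for col, direction in map(parse_order_key, tokens)
    )

# compile_order_by(["-revenue", "region", "-date"])
# -> "ORDER BY revenue DESC, region ASC, date DESC"
```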


11. Settings & CTEs

11.1 Compile settings (SET)

ASQL supports SQL-style SET statements for compile settings (not CTE variables). Supported settings:

  • auto_spine (boolean) — Enable/disable gap-filling for GROUP BY
  • dialect (string) — Target SQL dialect
  • week_start (string) — "monday" or "sunday"
  • relative_date_type (string) — "timestamp" or "date"
  • include_transpilation_comments (boolean) — Add explanatory comments about transformations
  • passthrough_comments (boolean) — Preserve source comments in output

Examples:

SET auto_spine = false;
SET dialect = 'postgres';

from orders
  where status = "active"
  limit 10
-- Enable explanatory comments for debugging/learning
SET include_transpilation_comments = true;
-- Strip source comments from output  
SET passthrough_comments = false;

from orders
  group by month(order_date) ( sum(amount) as revenue )

Note: set name = <query> (using SET to define a CTE variable) is not implemented.

11.2 CTEs via stash as

ASQL can create reusable CTEs using stash as. This keeps CTEs close to where they’re defined and used.

Basic usage (at the end):

from users
  where status = "active"
  group by country ( # as total_users )
  select country, total_users
  stash as revenue

from revenue
  order by -total_users

Using stash as in the middle of a pipeline:

from users
  where status = "active"
  stash as active_users
  group by country ( # as total_users )
  order by -total_users

When stash as appears in the middle, it stashes everything before it as a CTE, then continues with the pipeline.

Multiple queries reusing a stashed CTE:

from sales
  where year(date) = 2025
  group by region ( sum(amount) as revenue )
  stash as use_this_later
  order by -revenue
  limit 10;

from use_this_later
  where revenue > 1000
  select region, revenue

Benefits of stash as:

- ✅ Proximity: CTEs are defined where they're used, often right before they're referenced
- ✅ Clear chaining: You can see the data flow clearly at the end of pipelines
- ✅ Reusability: Multiple queries can reference the same stashed CTE
- ✅ Natural flow: Fits naturally into the pipeline syntax
- ✅ Better readability: You end with the name and use it right after, so your eyes don't have to jump around

Note: stash as is currently the only ASQL syntax that creates CTEs.


12. Functions

Not implemented yet: User-defined functions (func ... = ...) are not currently supported.

You can still use normal SQL/SQLGlot function calls in expressions (e.g. sum(amount), date_trunc(...), substring(...)) and ASQL’s implemented shorthands described elsewhere in this document.


13. Data Transformation Operators

ASQL provides built-in operators that replace common dbt macro patterns, making queries cleaner and more portable.

13.1 Column Set Operators

These operators manipulate column sets without needing to list every column.

except - Exclude Columns

Exclude specific columns from the result:

-- Exclude sensitive columns
from users
  except email, phone, ssn

-- After a join, exclude from specific tables
from users
  & orders on users.id = orders.user_id
  except users.password_hash, orders.internal_notes

Compiles to: Explicit SELECT with all columns except those listed (uses schema metadata).

rename - Rename Columns

Rename columns inline:

from users
  rename id as user_id, name as user_name

-- Rename with table prefix after join
from users
  & orders on users.id = orders.user_id
  rename users.id as user_id

Compiles to: SELECT * EXCEPT(id, name), id AS user_id, name AS user_name FROM users

replace - Replace Column Values

Replace column values with new expressions:

from users
  replace name with upper(name)

-- Chained replacements (comma-separated)
from users
  replace name with upper(name), email with lower(email), salary with round(salary, 2)

-- Or separate statements
from users
  replace name with upper(name)
  replace email with lower(email)

Compiles to: SELECT * EXCEPT(name, email, salary), upper(name) AS name, lower(email) AS email, round(salary, 2) AS salary FROM users
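A sketch of this lowering for the BigQuery-style SELECT * EXCEPT form (the helper name is illustrative; dialects without EXCEPT would need full column expansion from schema metadata):

```python
def compile_replace(table: str, replacements: dict[str, str]) -> str:
    """Lower 'replace col with expr, ...' to a SELECT * EXCEPT rewrite."""
    except_list = ", ".join(replacements)  # columns being replaced
    new_columns = ", ".join(f"{expr} AS {col}" for col, expr in replacements.items())
    return f"SELECT * EXCEPT({except_list}), {new_columns} FROM {table}"

# compile_replace("users", {"name": "upper(name)"})
# -> "SELECT * EXCEPT(name), upper(name) AS name FROM users"
```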

Combining Column Operators

from users
  & orders on users.id = orders.user_id
  except password_hash, internal_notes
  rename users.id as user_id
  replace name with upper(name)

13.2 Deduplicate

deduplicate by ... order by ... is syntax sugar for the existing window pattern per ... first by ....

from events
  deduplicate by user_id, event_type
  order by -created_at

Notes:

- order by is required (it defines which row is kept)
- This rewrites to: per user_id, event_type first by -created_at

13.3 Pivot / Unpivot / Explode

pivot - Rows to Columns

Transform row values into columns.

Static Pivot (Explicit Values)

When you know the pivot values at compile time, specify them explicitly:

-- Pivot with explicit values
from sales
  pivot sum(amount) by category values ("Electronics", "Clothing", "Food")

-- Non-aggregate pivot (uses MAX)
from sales
  pivot amount by category values ("A", "B", "C")

Note: Pivot currently requires an explicit values list. “Dynamic pivot” (values from a subquery) is not implemented.

unpivot - Columns to Rows

Transform columns into rows:

from monthly_metrics
  unpivot jan, feb, mar, apr into month, value

Compiles to: UNION ALL subquery that works across all dialects.
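A sketch of that UNION ALL lowering (illustrative only; a real implementation would also carry the table's identifier columns through each branch):

```python
def compile_unpivot(table: str, columns: list[str], key: str, value: str) -> str:
    """Lower 'unpivot a, b into k, v' to a dialect-portable UNION ALL."""
    branches = [
        f"SELECT '{col}' AS {key}, {col} AS {value} FROM {table}"
        for col in columns
    ]
    return "\nUNION ALL\n".join(branches)
```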

explode - Array to Rows

Expand array-typed columns into multiple rows:

-- Explode array column
from posts
  explode tags as tag

-- Split string and explode
from posts
  explode split(tags_csv, ',') as tag

Compiles to:

- Postgres/DuckDB: UNNEST(array) AS alias
- BigQuery: CROSS JOIN UNNEST(array) AS alias
- Snowflake: CROSS JOIN (SELECT value AS alias FROM TABLE(FLATTEN(...)))

This is the inverse of array_agg() / array agg.

13.4 Date Spine / Series

Not implemented yet: date_spine(...) and series(...) as table sources are not currently supported (see docs/spec_future.md).

13.5 Union with Schema Alignment

Not implemented yet: union(...) with schema alignment is not currently supported (see docs/spec_future.md).

13.6 Surrogate Keys

ASQL provides key(col1, col2, ...) to generate deterministic surrogate keys (inspired by dbt_utils).

from orders
  select key(user_id, order_id) as order_key

Semantics:

- Deterministic hashing with delimiter injection between values
- Consistent NULL handling (NULLs become empty strings before hashing)
- Type normalization (values are cast to strings before concatenation)

Compiles to (conceptually): MD5(CONCAT(COALESCE(CAST(col1 AS VARCHAR), ''), '||', COALESCE(CAST(col2 AS VARCHAR), ''), ...))
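The same semantics, sketched in Python for clarity (the function name is illustrative; ASQL emits the SQL expression above rather than computing hashes client-side):

```python
import hashlib

def surrogate_key(*values: object) -> str:
    """Mimic key(...): cast to string, NULL -> '', join with '||', MD5-hash."""
    parts = ["" if v is None else str(v) for v in values]
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()

# Deterministic: identical inputs always produce the same key, and the
# '||' delimiter keeps ('ab', 'c') distinct from ('a', 'bc').
```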

13.7 Operator Quick Reference

| Operator | Purpose | Example |
|----------|---------|---------|
| except | Exclude columns | except email, phone |
| rename | Rename columns | rename id as user_id |
| replace | Replace column values | replace name with upper(name) |
| pivot ... by ... values | Rows to columns | pivot sum(amount) by category values ('A', 'B') |
| unpivot ... into | Columns to rows | unpivot jan, feb into month, value |
| explode ... as | Array to rows | explode tags as tag |

14. Cohort Analysis

Cohort analysis groups users by a shared characteristic (usually when they "started") and tracks their behavior over time. Traditional SQL requires 3-5 CTEs and 50+ lines for even basic cohort queries. ASQL simplifies this dramatically with the cohort by operator.

14.1 Basic Cohort Syntax

The cohort by clause transforms any aggregation query into a cohort analysis:

from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)

This automatically:

- Creates cohort assignment CTEs (cohort_base, cohort_sizes)
- Joins the activity table to cohort tables
- Calculates period (months/weeks/days since cohort start)
- Modifies GROUP BY to include cohort_month and period
- Adds cohort columns to SELECT (cohort_month, period, cohort_size)
- Orders results by cohort_month, period

14.2 Cohort Assignment

Cohort assignment determines which group each user belongs to. The cohort by clause specifies:

  • Granularity: month(), week(), or day() function
  • Cohort column: The date column that defines the cohort (e.g., users.signup_date)
  • Join key: Optional explicit join key with on clause
-- Monthly cohorts by signup date
from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)

-- Weekly cohorts
from events
group by week(event_date) (count(distinct user_id) as active)
cohort by week(users.signup_date)

-- Explicit join key
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)

14.3 Period Calculation

Period is automatically calculated based on the granularity function:

  • cohort by month(...) → period in months since cohort start
  • cohort by week(...) → period in weeks since cohort start
  • cohort by day(...) → period in days since cohort start

The period calculation uses the activity date column from your group by clause and the cohort date to compute the difference.
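For month granularity, the period arithmetic is just whole months between two month-truncated dates, matching the EXTRACT(YEAR ...) * 12 + EXTRACT(MONTH ...) pattern in the generated SQL. A sketch (the helper name is illustrative):

```python
from datetime import date

def month_period(cohort_month: date, activity_month: date) -> int:
    """Whole months between two month-truncated dates (0 = cohort month)."""
    return ((activity_month.year - cohort_month.year) * 12
            + (activity_month.month - cohort_month.month))

# month_period(date(2025, 1, 1), date(2025, 3, 1)) -> 2
```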

14.4 Retention Calculations

With cohort by, retention calculations become straightforward. The cohort_size column is automatically available in your results:

from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)

This generates SQL that includes:

- cohort_month: The month users signed up
- period: Months since signup (0 = signup month, 1 = first month after, etc.)
- active: Active users in that period
- cohort_size: Total users in the cohort

You can then calculate retention rates using window functions or in your BI tool.

14.5 Period-over-Period Comparisons

Use window functions like prior() for period-over-period analysis:

from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)
select
  cohort_month,
  period,
  active,
  prior(active) as prev_period_active,
  active - prior(active) as change

The cohort by clause automatically partitions window functions by cohort and orders by period.

14.6 Cohort Rollup Syntax

The cohort by syntax works with any aggregation query. Simply add cohort by to transform it:

-- Revenue cohorts
from orders
group by month(order_date) (sum(total) as revenue)
cohort by month(customers.first_order_date)

-- Multiple metrics
from events
group by month(event_date) (
  count(distinct user_id) as active,
  # as events,
  sum(revenue) as revenue
)
cohort by month(users.signup_date)

14.7 Segmented Cohorts

Add segment dimensions before the time function:

from events
group by month(event_date) (count(distinct user_id) as active)
cohort by users.channel, month(users.signup_date)

This creates cohorts segmented by acquisition channel, allowing you to compare retention across different channels.

14.8 Example: SQL vs ASQL Comparison

SQL (50+ lines):

WITH user_cohorts AS (
    SELECT user_id, DATE_TRUNC('month', signup_date) AS cohort_month
    FROM users
),
activity_months AS (
    SELECT user_id, DATE_TRUNC('month', event_date) AS activity_month
    FROM events
    GROUP BY 1, 2
),
cohort_sizes AS (
    SELECT cohort_month, COUNT(*) AS size
    FROM user_cohorts GROUP BY 1
),
cohort_activity AS (
    SELECT 
        uc.cohort_month,
        EXTRACT(YEAR FROM AGE(am.activity_month, uc.cohort_month)) * 12 +
        EXTRACT(MONTH FROM AGE(am.activity_month, uc.cohort_month)) AS period,
        COUNT(DISTINCT uc.user_id) AS active
    FROM user_cohorts uc
    JOIN activity_months am ON uc.user_id = am.user_id
    WHERE am.activity_month >= uc.cohort_month
    GROUP BY 1, 2
)
SELECT 
    ca.cohort_month, cs.size, ca.period,
    ca.active, ROUND(ca.active::numeric / cs.size * 100, 1) AS retention
FROM cohort_activity ca
JOIN cohort_sizes cs ON ca.cohort_month = cs.cohort_month
ORDER BY ca.cohort_month, ca.period;

ASQL (3 lines):

from events
group by month(event_date) (count(distinct user_id) as active)
cohort by month(users.signup_date)

Reduction: 94% 🎉


15. Models (Optional Metadata)

Philosophy: Ideally, ASQL doesn't create its own model format. It should:

1. Use dbt's existing schema.yml files when available
2. Infer from database schema metadata and conventions
3. Only create custom models when neither of the above is available

If relationships are defined in dbt's schema.yml:

models:
  - name: opportunities
    columns:
      - name: owner_id
        tests:
          - relationships:
              to: ref('owners')
              field: id

Then ASQL can use those relationships automatically:

from opportunities
  group by owner.name ( sum(amount) as total_pipeline )

Custom model format (only if dbt/schema metadata unavailable):

model users:
  default_time: created_at
  links:
    orders: orders.user_id

model opportunities:
  links:
    owner: owners.id
  default_time: created_at

Usage:

from users
  group by country ( # as total_users, average of age as avg_age )

16. Nested Results (Optional)

Not implemented yet: Nested result shapes (EdgeQL-style select { ... }) are not currently supported.

from countries
| select {
    name,
    users = from users 
      | filter users.country = countries.code 
      | select name, age
  }

17. Indentation & Multi-line Queries

17.1 Indentation Rules

Every line must return a new table. For multi-line operations, indent:

from users
  where status = "active"
  where age >= 18
  group by country ( count() as count )

17.2 Nested Selects

from users
  -- Nested selects are not implemented yet; write as separate queries/joins for now.

18. Capitalization & Naming

18.1 Case-Safe Design

ASQL is case-safe by design. This means you can use capital letters in column and table names without wrapping them in quotes obsessively. However, table/column names must still match the actual database names (case-insensitively).

-- You can write queries using any case style
from Users
  select firstName, createdAt, user_id
  where status = "active"

-- ASQL resolves case-insensitively to actual database names
-- If database has: users table, first_name column, created_at column
-- ASQL matches them correctly without requiring exact case

Important clarification:

- from Users will query the users table (case-insensitive match)
- from USERS will NOT query a different table - it matches users case-insensitively
- You can use firstName in your query even if the database column is first_name
- Aliases can be in any case: select firstName as UserName works fine

Why this matters:

- Database best practices often recommend snake_case (created_at, user_id)
- Frontend development typically uses camelCase (createdAt, userId)
- APIs might return PascalCase (CreatedAt, UserId)
- Legacy databases might have inconsistent casing

ASQL eliminates the friction of matching exact case, allowing you to write queries using whatever naming style feels natural while still matching the correct database objects.

18.2 Case Handling Strategy

ASQL normalizes identifiers internally while preserving the original case for SQL generation:

  1. Parse: Accept any case variation
  2. Normalize: Convert to a canonical form for matching
  3. Resolve: Match against schema (case-insensitive)
  4. Generate: Use the database's preferred case (from schema metadata) or preserve original

Example:

-- You write:
from Users
  select firstName, createdAt

-- ASQL resolves (case-insensitive):
-- - Users → users (if that's the actual table name)
-- - firstName → first_name (if that's the actual column)
-- - createdAt → created_at (if that's the actual column)

-- Generated SQL uses actual database names:
SELECT first_name, created_at FROM users
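The normalize-and-match steps can be sketched as follows. The canonical form used here (lowercase, underscores stripped, so that firstName and first_name collide) is an assumption for illustration, not ASQL's documented algorithm:

```python
def resolve(identifier: str, schema_names: list[str]) -> str:
    """Match an identifier against schema names, ignoring case and _ separators."""
    def canon(name: str) -> str:
        return name.replace("_", "").lower()

    matches = [n for n in schema_names if canon(n) == canon(identifier)]
    if len(matches) != 1:
        # Zero matches is a typo; multiple matches is the rare ambiguity
        # (e.g., both UserId and user_id exist) flagged during resolution.
        raise LookupError(f"{identifier!r}: {len(matches)} matches")
    return matches[0]

# resolve("firstName", ["first_name", "created_at"]) -> "first_name"
```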

18.3 Column Name Conflicts

ASQL helps reduce ambiguity in joined queries by auto-qualifying unqualified column references when it can.

Before (Explicit Qualification Required)

In traditional SQL, when joining tables with conflicting column names, you must explicitly qualify every column:

-- SQL: Must explicitly qualify conflicting columns
SELECT users.id AS user_id, orders.id AS order_id, users.name, orders.amount
FROM users
JOIN orders ON users.id = orders.user_id

After (Automatic Resolution)

When a query uses SELECT * across a join, ASQL expands the star into explicit table.* lists (e.g., users.*, orders.*), and it qualifies ambiguous unqualified references (e.g., idusers.id or orders.id) in later clauses when it can.

Examples:

-- Simple join: SELECT * expands to users.*, orders.*
from users & orders on users.id = orders.user_id
-- → SELECT users.*, orders.* FROM users JOIN orders ON users.id = orders.user_id

-- Multiple joins: Expands to all table.* columns
from orders
  & customers on orders.customer_id = customers.id
  & order_items on orders.id = order_items.order_id
-- → SELECT orders.*, customers.*, order_items.* FROM ...

-- With aliases: Uses alias names
from users &? orders as o on users.id = o.user_id
-- → SELECT users.*, o.* FROM users LEFT JOIN orders AS o ON users.id = o.user_id

-- Explicit SELECT: No expansion (uses your explicit columns)
from users & orders on users.id = orders.user_id
select users.name, orders.amount
-- → SELECT users.name, orders.amount FROM users JOIN orders ON users.id = orders.user_id

When Expansion Happens:

- ✅ SELECT * with joins → Expanded to table.* for each table
- ❌ SELECT * without joins → Kept as SELECT *
- ❌ Explicit SELECT columns → No expansion (uses your columns)

Benefits:

- No ambiguous column errors
- Columns remain accessible with table qualification
- Works with all join types (INNER, LEFT, RIGHT, FULL OUTER, CROSS)
- Respects table aliases

Note: Without schema information, ASQL cannot automatically rename conflicting columns to users_id and orders_id. The expansion to table.* allows you to reference columns with table qualification (e.g., users.id, orders.id) to avoid conflicts. You can still add explicit aliases if you want renamed columns:

from users & orders on users.id = orders.user_id
select users.id as user_id, orders.id as order_id
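The expansion rule itself is simple. A sketch (the function name is illustrative, not ASQL's internals):

```python
def expand_star(tables: list[str]) -> str:
    """SELECT * becomes a table.* list only when a join is present."""
    if len(tables) <= 1:
        return "*"  # no join: SELECT * is left untouched
    return ", ".join(f"{t}.*" for t in tables)

# expand_star(["users", "orders"]) -> "users.*, orders.*"
```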

18.4 Why Case-Safe is Good

Pros:

- ✅ Eliminates a common source of errors
- ✅ Works seamlessly across different naming conventions
- ✅ Reduces cognitive load (don't worry about case)
- ✅ Better developer experience

Potential Concerns:

- ⚠️ Might hide typos (but schema validation catches these)
- ⚠️ Could be confusing if database has both UserId and user_id (but this is rare and would be caught during resolution)

Verdict: Case-safety is a significant quality-of-life improvement that outweighs the minor risks. The compiler can warn about ambiguous cases during schema resolution.


19. Examples

Example 1: Simple Analytic Query

from sales
  where year(date) = 2025
  group by region ( sum(amount) as revenue )
  order by -revenue

Generated SQL:

WITH step1 AS (
  SELECT * FROM sales WHERE EXTRACT(YEAR FROM date) = 2025
)
SELECT region, SUM(amount) AS revenue
FROM step1
GROUP BY region
ORDER BY revenue DESC;

Example 2: Joins & Conditions

from opportunities
  & owners
  where owners.is_active
  group by owners.name ( sum(amount) as total_pipeline )
  order by -total_pipeline

Example 3: Time Series

from sessions
  group by week(start_time) (
    #(distinct user_id) as active_users
  )

Example 4: Natural Language Aggregates

# of Users by country
Sum of revenue by region
Avg Users.age by country

Example 5: Variables and Reuse

from users
  where plan = "premium"
  stash as premium_users

from premium_users
  group by country ( # as total_users )

Example 6: Complex Pipeline

from opportunities
  where status = "open"
  & owners
  where owners.is_active
  where org_type != "Non Profit"
  group by owners.name ( sum(amount) as total_pipeline )
  order by -total_pipeline
  limit 10

Example 7: Date Grouping

from users
  group by month(created_at) ( # as signups )
  select month, signups

Example 8: User-Defined Function

-- User-defined functions are not implemented yet.
-- Use inline expressions or SQL functions directly for now.
from users
  group by country ( avg(years(now() - birthday)) as avg_age )

Example 9: Case-Safe Naming

-- Works regardless of database naming convention
from Users
  select firstName, createdAt, user_id
  where status = "active"

Example 10: Natural Language with "of"

from sales
  group by region (
    total amount as revenue,
    #(distinct customer_id) as customers,
    average amount as avg_order
  )

Note on "as" order: keep SQL's order, expression as alias. So total amount as revenue reads correctly: "total amount, aliased as revenue". Natural language function names like total amount can sound as if they were already aliases, but they are expressions that still need aliasing. In group by blocks, always use the expression as alias format: sum(amount) as revenue, # as total_users, etc.

Example with function:

-- User-defined functions are not implemented yet.
-- Inline expressions are the current workaround:
from users
  group by country ( avg(years(now() - birthday)) as avg_age )


20. Compilation & Transpilation

20.1 Compilation Process

  1. Parse: ASQL → AST (Abstract Syntax Tree)
  2. Resolve: AST → Resolved AST (with type info, relationships)
  3. Transform: Resolved AST → SQL AST (via SQLGlot)
  4. Generate: SQL AST → Target SQL dialect

20.2 Intermediate Representation

Each pipeline step becomes a CTE:

from users
  where status = "active"
  group by country ( count() as count )

Becomes:

WITH step1 AS (
  SELECT * FROM users WHERE status = 'active'
)
SELECT country, COUNT(*) AS count
FROM step1
GROUP BY country;
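The step-per-CTE lowering amounts to a simple fold over the pipeline. A toy string-level sketch (the real compiler works on SQLGlot ASTs; steps_to_sql is a hypothetical name):

```python
def steps_to_sql(table: str, steps: list[str],
                 final_select: str, final_tail: str = "") -> str:
    """Lower a pipeline into a WITH chain: each intermediate step becomes
    a CTE named step1, step2, ..., each reading from its predecessor."""
    ctes, source = [], table
    for i, clause in enumerate(steps, start=1):
        ctes.append(f"step{i} AS (\n  SELECT * FROM {source} {clause}\n)")
        source = f"step{i}"
    parts = []
    if ctes:
        parts.append("WITH " + ",\n".join(ctes))
    parts.append(final_select)
    parts.append(f"FROM {source}")
    if final_tail:
        parts.append(final_tail)
    return "\n".join(parts) + ";"

print(steps_to_sql(
    "users",
    ["WHERE status = 'active'"],
    "SELECT country, COUNT(*) AS count",
    "GROUP BY country",
))
```

Running this reproduces the generated SQL shown above, with each pipeline step reading from the previous CTE.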

20.3 Target Dialects

Via SQLGlot, ASQL can transpile to:

  • ANSI SQL
  • PostgreSQL
  • MySQL
  • SQLite
  • BigQuery
  • Snowflake
  • Redshift
  • And more...


21. Implementation Roadmap

| Stage | Milestone | Description |
| ----- | --------- | ----------- |
| v0.1 | Parser + SQLGlot transpiler + CLI | Basic syntax parsing and SQL generation |
| v0.2 | VSCode extension | Autocomplete, syntax highlighting, SQL preview |
| v0.3 | Model layer + relationships | YAML model files, FK inference |
| v0.4 | Functions, fragments, REPL | User-defined functions, query fragments, interactive REPL |
| v1.0 | Optimizer, dialects, adapters | Query optimization, full dialect support, DuckDB adapter |

22. Design Decisions & Rationale

22.1 Why Remove SELECT?

Traditional SQL requires SELECT at the start, but the columns you need often aren't known until the end of the query. ASQL's pipeline approach lets you build up the query naturally, with select/project appearing only when needed.

22.2 Why Indentation-Based Syntax?

Indentation-based syntax (with optional pipe operators) is cleaner and more natural than requiring explicit operators. It reads like a conversation: "from users, filter active ones, group by country, count them." The pipe operator (|) is available for those who prefer explicit flow markers.

22.3 Why Natural Language?

ASQL is pronounced "Ask-el": it should feel like asking a question. Natural language syntax (# of Users, Sum of amount, Average of age) makes queries readable to non-technical stakeholders while maintaining precision.

22.4 Why Case-Safe?

Database conventions (snake_case) conflict with frontend conventions (camelCase). ASQL eliminates this friction by being case-insensitive, allowing developers to write queries using whatever naming style feels natural.

22.5 Why Convention Over Configuration?

ASQL follows a "convention over configuration" philosophy (inspired by frameworks like Rails and dbt):

  • Assumes good modeling: If you follow standards (standardized column names like created_at, proper FK naming like user_id), ASQL infers relationships and defaults automatically
  • Encourages best practices: By making standard patterns easy and non-standard patterns explicit, ASQL encourages good modeling
  • Reduces boilerplate: Most queries don't need explicit joins or time field specifications when conventions are followed
  • Configurable when needed: Everything can be overridden, but defaults work for 80% of cases
  • dbt-friendly: Works seamlessly with dbt projects that follow dbt's modeling standards

Example: If you have Accounts.user_id and a Users table, ASQL automatically infers the FK relationship. If you have Accounts.ownerUserRef, you'll need to configure it explicitly (encouraging you to rename it to owner_id).

22.6 Why Not Replace SQL?

ASQL transpiles to SQL, ensuring compatibility with existing tools, databases, and knowledge. It's an evolution, not a revolution. Maybe one day different databases will adopt ASQL or move toward it, just as JavaScript absorbed CoffeeScript's ideas (arrow functions, classes, etc.).


23. Future Features

For features that are planned, under consideration, or marked as "maybe" for v1.0, see docs/spec_future.md.


24. Major Benefits of ASQL

24.1 Reduced Need for CTEs and Nested Queries

Traditional SQL often requires CTEs or nested subqueries to break down complex logic. ASQL's pipeline approach eliminates most of this need:

SQL (requires CTE):

WITH filtered_users AS (
  SELECT * FROM users WHERE is_active
),
grouped AS (
  SELECT country, COUNT(*) as total
  FROM filtered_users
  GROUP BY country
)
SELECT * FROM grouped ORDER BY total DESC;

ASQL (no CTE needed):

from users
  where is_active
  -- cleaned users by country
  group by country ( # as total_users )
  order by -total_users

The comment marks where you might have created a CTE in SQL, but the pipeline continues naturally.

24.2 More Readable Column Names

Natural language syntax makes column names more readable without needing explicit aliases:

SQL:

SELECT country, COUNT(*) as total_users, AVG(age) as avg_age

ASQL:

select country, # of Users as total_users, average of age

The natural language makes columns self-documenting: # of Users is clearer than count or even total_users.

24.3 Less Boilerplate

  • No need to write SELECT at the start when you don't know what columns you need yet
  • No need for explicit joins when FKs follow conventions
  • No need for verbose date extraction functions
  • No need to quote identifiers obsessively

24.4 Better for Analytics

  • Time functions that work consistently across databases
  • Natural language aggregations that read like questions
  • Pipeline flow that matches analytical thinking
  • Convention-based defaults that reduce configuration
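The "time functions that work consistently" point boils down to per-dialect lowering of functions like month(). A toy sketch (the names MONTH_RENDERERS and render_month are hypothetical; in practice this lowering is delegated to SQLGlot):

```python
# Illustrative per-dialect renderings for a month() bucket.
MONTH_RENDERERS = {
    "postgres": lambda col: f"DATE_TRUNC('month', {col})",
    "bigquery": lambda col: f"DATE_TRUNC({col}, MONTH)",
    "sqlite":   lambda col: f"STRFTIME('%Y-%m', {col})",
}

def render_month(col: str, dialect: str) -> str:
    """Render ASQL's month(col) for the given target dialect."""
    return MONTH_RENDERERS[dialect](col)

print(render_month("created_at", "bigquery"))  # DATE_TRUNC(created_at, MONTH)
```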

Appendix A: Grammar Sketch (Informal)

query := from_clause pipeline*

from_clause := 'from' table_name

pipeline := operator  -- indentation-based, or '|' operator (optional)

operator := filter_op
          | group_by_op
          | join_op
          | select_op
          | order_op
          | limit_op
          | set_op

filter_op := 'where' expression
           | 'if' expression

group_by_op := 'group' 'by' expression_list '(' aggregate_list ')'

join_op := 'join' table_name ('on' expression)?
         | '&' table_name ('on' expression)?

select_op := 'select' column_list

order_op := 'order' 'by' ('-'? (column_name | function_call))+

limit_op := 'limit' number

set_op := 'set' var_name '=' query
        | 'stash' 'as' var_name

aggregate := aggregate_func '(' expression ')' 'as' var_name
           | natural_language_aggregate
           | '#' | '#(' expression ')' | '# of' expression

aggregate_func := 'count' | 'sum' | 'avg' | 'min' | 'max'

Appendix B: Comparison with Other Languages

| Feature | SQL | PRQL | Malloy | ASQL |
| ------- | --- | ---- | ------ | ---- |
| Pipeline syntax | ❌ | ✅ | ✅ | ✅ |
| Model layer | ❌ | ❌ | ✅ | ✅ (optional) |
| Natural language | ❌ | ❌ | ❌ | ✅ |
| Smart joins | ❌ | ❌ | ✅ | ✅ |
| Time bucketing | Manual | Manual | ✅ | ✅ |
| Transpiles to SQL | N/A | ✅ | ✅ | ✅ |

References & Inspiration

  • PRQL: Pipeline structure, group by blocks, func (note: PRQL uses let but ASQL uses set for SQL familiarity)
  • KQL: Verb syntax (filter, project, summarize), pipeline operators
  • Malloy: Model layer, measures/dimensions, time bucketing concept (ASQL uses simpler month(), year() functions rather than Malloy's time_bucket() function), default_time concept
  • EdgeQL: Dot traversal for relationships, nested result shapes
  • FunSQL: Composable variables and reusable fragments
  • Google Pipe-SQL: Compatibility mindset, additive approach
  • SQLGlot: Transpilation engine and SQL parsing

End of Specification