Pipeline Semantics¶
ASQL uses a pipeline-based query model where data flows from top to bottom. This matches how you think about data transformations and makes complex queries easier to understand.
The Problem with SQL¶
SQL's syntax puts SELECT first, even though it executes last:
SELECT region, SUM(amount) AS revenue -- 5th: finally, what columns
FROM sales -- 1st: start here
WHERE year = 2024 -- 2nd: filter
GROUP BY region -- 3rd: aggregate
HAVING SUM(amount) > 1000 -- 4th: filter again
ORDER BY revenue DESC -- 6th: sort
LIMIT 10; -- 7th: limit
This inside-out structure makes complex queries hard to: - Read: You have to jump around to understand the flow - Write: You can't see what columns are available until you've written the FROM - Review: Code review is harder when logic isn't linear - Debug: Where's the bug? Hard to trace when order is jumbled
ASQL's Solution¶
ASQL puts operations in execution order:
from sales -- 1st: start here
where year(date) = 2024 -- 2nd: filter
group by region (sum(amount) as revenue) -- 3rd: aggregate
where revenue > 1000 -- 4th: filter (HAVING)
order by -revenue -- 5th: sort
limit 10 -- 6th: limit
Read top-to-bottom, execute top-to-bottom.
Key Benefits¶
1. Progressive Disclosure¶
Build queries step by step:
from orders -- See all orders
where status = "completed" -- Only completed
group by customer_id ( -- Aggregate per customer
sum(amount) as total
)
order by -total -- Sort by total
limit 100 -- Top 100
Each line transforms the result of the previous line.
2. Composability¶
Combine transformations naturally:
from orders
where status = "completed"
& customers on orders.customer_id = customers.id
where customers.country = "US"
group by customers.state (
sum(orders.amount) as revenue
)
3. Reduced Need for CTEs¶
SQL often uses CTEs to linearize logic:
WITH filtered AS (
SELECT * FROM orders WHERE status = 'completed'
),
joined AS (
SELECT o.*, c.country FROM filtered o
JOIN customers c ON o.customer_id = c.id
)
SELECT country, SUM(amount) FROM joined GROUP BY country;
ASQL handles this with pipelines:
from orders
where status = "completed"
& customers on orders.customer_id = customers.id
group by customers.country (sum(amount) as revenue)
4. Easier Debugging¶
When something's wrong, you can trace the flow:
from orders -- ✓ Orders look right
where status = "completed" -- ✓ Filter is correct
group by customer_id (...) -- ← Bug is here: wrong column
How Pipelines Work¶
Each Line Transforms a Table¶
Every pipeline step takes a table and produces a new table:
orders (raw)
→ where status = "completed" → orders (filtered)
→ group by customer_id (...) → customer_totals
→ order by -total → customer_totals (sorted)
Implicit SELECT¶
You don't need to specify SELECT upfront:
from orders -- Implicitly: SELECT * FROM orders
Add SELECT when you want specific columns:
from orders
select id, amount, created_at
Multiple WHERE Clauses¶
Multiple WHERE clauses combine with AND:
from orders
where status = "completed"
where amount > 100
where created_at >= @2024-01-01
Equivalent to:
from orders
where status = "completed"
and amount > 100
and created_at >= @2024-01-01
Generated SQL¶
ASQL compiles to standard SQL. Simple queries map directly:
from users
where is_active
order by -created_at
limit 10
SELECT * FROM users WHERE is_active ORDER BY created_at DESC LIMIT 10
Complex queries may use CTEs to maintain structure:
from orders
where status = "completed"
group by customer_id (sum(amount) as total)
where total > 1000
WITH grouped AS (
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
)
SELECT * FROM grouped WHERE total > 1000
Comparison with Other Languages¶
| Language | Approach |
|---|---|
| SQL | SELECT-first, inside-out execution |
| PRQL | Pipeline-based, similar to ASQL |
| Kusto/KQL | Pipeline-based with \| operator |
| pandas | Method chaining |
| dplyr (R) | Pipe operator (%>%) |
ASQL draws inspiration from all of these while maintaining SQL familiarity.
When SQL is Better¶
Pipelines aren't always better:
- Simple queries:
SELECT * FROM users WHERE id = 1is fine as-is - Set operations: UNION, INTERSECT, EXCEPT can be clearer in SQL
- Subqueries: Some correlated subqueries are cleaner in SQL
ASQL transpiles to SQL, so you can always use the generated SQL directly.
Next Steps¶
- Pipeline Basics — Syntax reference
- CTEs & Variables — When you do need CTEs
- Convention Over Configuration — How ASQL infers defaults