Sampling¶

ASQL provides intuitive syntax for random sampling, essential for exploratory data analysis and working with large datasets.

Fixed Sample Size¶

Get a random sample of N rows:

from orders
  sample 100

This returns 100 random rows from the table.

Generated SQL:

SELECT * FROM orders ORDER BY RANDOM() LIMIT 100

Get a random percentage of rows:

from orders
  sample 10%

This returns approximately 10% of the rows, randomly selected.

Generated SQL (using SQL standard TABLESAMPLE):

SELECT * FROM orders TABLESAMPLE BERNOULLI(10)

Dialect	SQL Output
PostgreSQL/Redshift	`TABLESAMPLE BERNOULLI(10)` or `TABLESAMPLE SYSTEM(10)`
BigQuery	`TABLESAMPLE SYSTEM(10 PERCENT)`
Snowflake	`SAMPLE (10)`
DuckDB	`USING SAMPLE 10%`

Get N random rows per group value (stratified sampling):

from orders
  sample 100 per category

This returns 100 random rows for each distinct value of category.

Generated SQL:

SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY RANDOM()) <= 100

Balanced training datasets — Ensure ML models see equal examples from each class
Representative samples — Get proportional data across categories
Quality checks — Sample from each segment for review

Sampling can be combined with filters and other operations:

from orders
  where status = "completed"
  sample 1000
  order by created_at

Note: The order by applies to the final sampled result, not to the sampling process itself.

-- Get a quick look at recent orders
from orders
  sample 50
  order by -created_at

-- Sample orders from each region
from orders
  sample 100 per region
  select region, customer_id, amount, created_at

-- Balanced sample across categories
from products
  sample 500 per category
  select *

-- Sample active users with purchases
from users
  where is_active
  where total_purchases > 0
  sample 1000

Method	Performance	Best For
`sample N`	Requires full scan	Small tables, any dialect
`sample N%`	Fast (block sampling)	Large tables with TABLESAMPLE support
`sample N per col`	Window function overhead	Stratified sampling