Composite Indexes: Column Order Matters
A composite index on (a, b, c) can satisfy queries on a, a, b, and a, b, c — but NOT on b alone or c alone. The leading column rule is the most commonly misunderstood fact about composite indexes.
-- this index supports:
-- WHERE status = 'pending' ✓
-- WHERE status = 'pending' AND created_at > ... ✓
-- WHERE status = 'pending' AND created_at > ... AND customer_id = 42 ✓
-- WHERE created_at > ... ✗ (can't skip status)
CREATE INDEX idx_orders_status_created_customer
ON orders (status, created_at, customer_id);
-- covering index — includes all columns the query needs, zero heap fetches
CREATE INDEX idx_orders_covering ON orders (customer_id, status)
INCLUDE (total, created_at);
-- query that uses the covering index with no heap access:
SELECT total, created_at
FROM orders
WHERE customer_id = 42 AND status = 'shipped';
Reading EXPLAIN ANALYZE Output
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.name, SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
WHERE oi.created_at BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY p.id, p.name
ORDER BY revenue DESC
LIMIT 20;
/*
Sample output sections to read:
-> Hash Join (cost=1234.56..5678.90 rows=10000 width=48)
(actual time=45.123..98.456 rows=8734 loops=1)
Hash Cond: (oi.product_id = p.id)
Buffers: shared hit=1023 read=234
-> Bitmap Heap Scan on order_items oi
(actual time=12.3..34.5 rows=92345 loops=1)
Recheck Cond: (created_at BETWEEN ...)
-> Bitmap Index Scan on idx_oi_created_at
(actual time=8.9..8.9 rows=92345 loops=1)
Planning Time: 2.1 ms
Execution Time: 102.3 ms
*/
Key metrics: actual time is the real wall time. rows discrepancy is your optimization signal. Buffers: read=N means disk I/O — large values indicate missing indexes or cold cache.
Eliminating N+1 Queries
The N+1 problem happens when you fetch a list of records, then run one query per record to fetch related data. In SQL, the fix is always a JOIN.
-- BAD: 1 query for orders + N queries for customers
SELECT * FROM orders WHERE status = 'pending';
-- then for each order:
SELECT * FROM customers WHERE id = ?;
-- GOOD: single query with JOIN
SELECT
o.id,
o.total,
o.created_at,
c.id AS customer_id,
c.email AS customer_email,
c.first_name,
c.last_name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending'
ORDER BY o.created_at DESC;
-- fetching nested aggregates without N+1
-- BAD: separate query per order for item count
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days';
-- then: SELECT COUNT(*) FROM order_items WHERE order_id = ?
-- GOOD: aggregate in JOIN
SELECT
o.id,
o.total,
COUNT(oi.id) AS item_count,
SUM(oi.quantity) AS total_units
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
GROUP BY o.id, o.total
ORDER BY o.created_at DESC;
CTEs vs Subqueries
-- subquery in FROM (derived table) — planner can inline and optimize
SELECT p.name, revenue_data.total_revenue
FROM products p
JOIN (
SELECT product_id, SUM(quantity * unit_price) AS total_revenue
FROM order_items
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY product_id
) revenue_data ON revenue_data.product_id = p.id
ORDER BY revenue_data.total_revenue DESC;
-- CTE — cleaner syntax, same performance in Postgres 12+
-- (CTEs are now inlined by default unless MATERIALIZED is specified)
WITH revenue_by_product AS (
SELECT
product_id,
SUM(quantity * unit_price) AS total_revenue,
COUNT(DISTINCT order_id) AS order_count
FROM order_items
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY product_id
),
top_products AS (
SELECT product_id, total_revenue, order_count
FROM revenue_by_product
WHERE total_revenue > 1000
)
SELECT p.name, tp.total_revenue, tp.order_count
FROM top_products tp
JOIN products p ON p.id = tp.product_id
ORDER BY tp.total_revenue DESC
LIMIT 50;
-- force materialization when the CTE result is expensive + reused
WITH MATERIALIZED expensive_aggregation AS (
SELECT customer_id, AVG(total) AS avg_order_value
FROM orders
WHERE created_at > NOW() - INTERVAL '1 year'
GROUP BY customer_id
)
SELECT c.email, ea.avg_order_value
FROM expensive_aggregation ea
JOIN customers c ON c.id = ea.customer_id
WHERE ea.avg_order_value > 200;
Window Functions
Window functions compute a value across a set of rows related to the current row — without collapsing them into a single GROUP BY result. They are one of the most powerful SQL features for analytics.
-- running total
SELECT
created_at::date AS date,
SUM(total) AS daily_revenue,
SUM(SUM(total)) OVER (
ORDER BY created_at::date
ROWS UNBOUNDED PRECEDING
) AS cumulative_revenue
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY created_at::date
ORDER BY date;
-- rank products by revenue within each category
SELECT
p.name,
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue,
RANK() OVER (
PARTITION BY p.category
ORDER BY SUM(oi.quantity * oi.unit_price) DESC
) AS rank_in_category
FROM products p
JOIN order_items oi ON oi.product_id = p.id
GROUP BY p.id, p.name, p.category
ORDER BY p.category, rank_in_category;
-- compare to previous period (LAG)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
2
) AS pct_change
FROM (
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue
FROM orders
GROUP BY 1
) monthly
ORDER BY month;
-- deduplicate: keep latest record per customer
SELECT DISTINCT ON (customer_id)
customer_id, id AS order_id, created_at, total
FROM orders
ORDER BY customer_id, created_at DESC;
Index Maintenance
-- find unused indexes (waste space, slow writes)
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- find missing indexes on foreign keys
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS references_table
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
AND NOT EXISTS (
SELECT 1 FROM pg_indexes
WHERE tablename = tc.table_name
AND indexdef LIKE '%' || kcu.column_name || '%'
);
-- rebuild bloated indexes concurrently (no table lock)
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
People Also Ask
When should I use a partial index instead of a full index?
Use a partial index when your queries consistently filter on a specific condition — like WHERE status = 'pending' or WHERE deleted_at IS NULL. A partial index only indexes rows matching the condition, so it is smaller (faster to build, cheaper to maintain) and has higher selectivity (more likely to be used by the planner). The tradeoff is that it only helps queries that include the matching WHERE clause.
Why does adding an index sometimes make queries slower?
For very low-selectivity queries (e.g., WHERE status IN ('a', 'b', 'c') matching 80% of rows), a sequential scan is actually faster than an index scan because the index forces random I/O to fetch each heap page individually. The planner knows this and will choose a seq scan. You can force it with hints in development, but in production you should trust the planner — and investigate its statistics if you think it is wrong.
What is the difference between EXPLAIN and EXPLAIN ANALYZE?
EXPLAIN shows the query plan the planner would use — it does not execute the query. EXPLAIN ANALYZE actually runs the query and shows both the estimated and actual row counts and timings. Always use EXPLAIN ANALYZE with BUFFERS (EXPLAIN (ANALYZE, BUFFERS)) when debugging performance issues so you can see disk I/O. Be careful: EXPLAIN ANALYZE on a DELETE or UPDATE will actually execute those statements, so wrap them in a transaction you roll back if needed.
Comments · 0
No comments yet. Be the first to share your thoughts.