Options Filldown SQL Optimization
A case study on how we optimized the options fill-down query from 8+ seconds (timeout) to 350ms.
The Problem
The fn_options_filldown() function was timing out on a 1M+ row pnl_daily_v2 table. We needed to:
- Find 45 options that "disappeared" (not in latest PnL date)
- Get each option's identity fields (last non-synthetic row)
- Generate 17,376 synthetic rows to fill the gaps
Initial Approach (8+ seconds, timeout)
-- SLOW: MAX() aggregation across 1M+ rows
SELECT
position_key,
MAX(pnl_date) AS last_seen_date
FROM sable.pnl_daily_v2
WHERE deleted_at IS NULL -- WRONG index!
AND asset_class = 'OPTIONS'
GROUP BY position_key
HAVING MAX(pnl_date) < v_reference_date;
Problems Identified
- Wrong filter condition: Using
deleted_at IS NULLbut indexes useis_active = true - Expensive aggregation:
MAX(pnl_date)per position_key requires scanning all rows - No index utilization: Sequential scan on 1M+ rows
Optimization #1: Match Index Predicates
The table has partial indexes:
CREATE INDEX idx_pnl_daily_v2_unique ON sable.pnl_daily_v2(position_key, pnl_date)
WHERE is_active = true;
Fix: Change WHERE deleted_at IS NULL to WHERE is_active = true:
-- BETTER: Matches partial index
WHERE is_active = true AND asset_class = 'OPTIONS'
This alone reduced query time significantly, but still too slow.
Optimization #2: NOT IN Pattern Instead of MAX
Instead of computing MAX(pnl_date) for every position, we:
- Get position_keys that ARE on the latest date
- Find position_keys that are NOT in that set
-- Fast: Uses index on (pnl_date, position_key)
WITH on_latest AS (
SELECT position_key
FROM sable.pnl_daily_v2
WHERE is_active = true
AND pnl_date = v_reference_date
AND asset_class = 'OPTIONS'
),
disappeared AS (
SELECT DISTINCT position_key
FROM sable.pnl_daily_v2
WHERE is_active = true
AND asset_class = 'OPTIONS'
AND position_key NOT IN (SELECT position_key FROM on_latest)
)
SELECT * FROM disappeared;
Result: 139ms to find 45 disappeared options (vs 8+ seconds with MAX)
Optimization #3: LATERAL Join for Final State
For each disappeared option, we need its last non-synthetic row. A regular JOIN would still be slow:
-- SLOW: Join scans full table for each match
SELECT d.position_key, p.*
FROM disappeared d
JOIN sable.pnl_daily_v2 p ON p.position_key = d.position_key
WHERE p.pnl_date = (SELECT MAX(pnl_date) FROM ...) -- Nested MAX = slow
Solution: Use CROSS JOIN LATERAL with LIMIT 1:
-- FAST: Uses index seek per position (45 seeks total)
SELECT d.position_key, fs.*
FROM disappeared d
CROSS JOIN LATERAL (
-- Only select identity fields (symbol, account, sector)
-- P&L values are NOT selected - synthetic rows use $0
SELECT symbol, account_code, economic_sector, pnl_date as last_date
FROM sable.pnl_daily_v2
WHERE position_key = d.position_key
AND is_active = true
AND (data_source IS NULL OR data_source NOT LIKE 'synthetic%')
ORDER BY pnl_date DESC
LIMIT 1
) fs;
Why it's fast:
- LATERAL executes the subquery once per row in
disappeared - With only 45 rows, that's 45 index seeks
- Each seek uses the index on
(position_key, pnl_date) LIMIT 1stops after finding the first (most recent) match
Final Query Structure
WITH on_latest AS (
SELECT position_key FROM sable.pnl_daily_v2
WHERE is_active = true AND pnl_date = v_reference_date AND asset_class = 'OPTIONS'
),
disappeared AS (
SELECT DISTINCT position_key FROM sable.pnl_daily_v2
WHERE is_active = true AND asset_class = 'OPTIONS'
AND position_key NOT IN (SELECT position_key FROM on_latest)
),
final_states AS (
SELECT d.position_key, fs.*
FROM disappeared d
CROSS JOIN LATERAL (
SELECT symbol, security_description, account_code, ...
FROM sable.pnl_daily_v2
WHERE position_key = d.position_key AND is_active = true
AND (data_source IS NULL OR data_source NOT LIKE 'synthetic%')
ORDER BY pnl_date DESC LIMIT 1
) fs
),
to_insert AS (
SELECT fs.*,
generate_series(fs.last_date + 1, v_reference_date, '1 day')::DATE AS fill_date
FROM final_states fs
)
INSERT INTO sable.pnl_daily_v2 (...)
SELECT ... FROM to_insert
ON CONFLICT (position_key, pnl_date) WHERE is_active = true
DO UPDATE SET ...;
Performance Comparison
| Approach | Time | Notes |
|---|---|---|
| Original (MAX aggregation) | 8+ sec | Timeout |
With is_active = true | ~4 sec | Better but still slow |
| NOT IN pattern | 139ms | Finding disappeared |
| + LATERAL join | 353ms | Total with final state |
Final result: 23x faster (8000ms → 350ms)
Indexes Used
-- Partial index on position_key, date (used by LATERAL)
CREATE INDEX idx_pnl_daily_v2_unique
ON sable.pnl_daily_v2(position_key, pnl_date)
WHERE is_active = true;
-- Index for OPTIONS queries (created for this feature)
CREATE INDEX idx_pnl_daily_v2_asset_class
ON sable.pnl_daily_v2(asset_class, position_key, pnl_date)
WHERE is_active = true;
-- Date index for latest date lookup
CREATE INDEX idx_pnl_daily_v2_date
ON sable.pnl_daily_v2(pnl_date)
WHERE is_active = true;
Key Lessons
-
Match your partial index predicates - Using
deleted_at IS NULLwhen indexes haveis_active = truebypasses all indexes -
Avoid MAX() for "latest row" queries - Use NOT IN or NOT EXISTS patterns with an explicit date lookup
-
LATERAL joins are powerful - For per-row lookups, LATERAL with LIMIT 1 converts a full table scan into N index seeks
-
Use EXPLAIN ANALYZE - The query plan showed sequential scans immediately, pointing to the index mismatch
Using EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM sable.fn_options_filldown(NULL, NULL, TRUE);
Look for:
Seq Scanon large tables → missing index or wrong predicate- High
rows=in nested loops → consider LATERAL Buffers: shared hit=→ how much data scanned
Data Integrity
Synthetic rows must have:
- All P&L values = $0 (P&L transferred to stock on assignment)
- All quantities = 0 (position closed)
This is enforced by a database trigger (trg_synthetic_pnl_zero) and validated by:
SELECT * FROM sable.qa_synthetic_rows;
-- Must return status = 'PASS'