Skip to main content

Options Filldown SQL Optimization

A case study on how we optimized the options fill-down query from 8+ seconds (timeout) to 350ms.

The Problem

The fn_options_filldown() function was timing out on a 1M+ row pnl_daily_v2 table. We needed to:

  1. Find 45 options that "disappeared" (not in latest PnL date)
  2. Get each option's identity fields (last non-synthetic row)
  3. Generate 17,376 synthetic rows to fill the gaps

Initial Approach (8+ seconds, timeout)

-- SLOW: MAX() aggregation across 1M+ rows
SELECT
position_key,
MAX(pnl_date) AS last_seen_date
FROM sable.pnl_daily_v2
WHERE deleted_at IS NULL -- WRONG index!
AND asset_class = 'OPTIONS'
GROUP BY position_key
HAVING MAX(pnl_date) < v_reference_date;

Problems Identified

  1. Wrong filter condition: Using deleted_at IS NULL but indexes use is_active = true
  2. Expensive aggregation: MAX(pnl_date) per position_key requires scanning all rows
  3. No index utilization: Sequential scan on 1M+ rows

Optimization #1: Match Index Predicates

The table has partial indexes:

CREATE INDEX idx_pnl_daily_v2_unique ON sable.pnl_daily_v2(position_key, pnl_date)
WHERE is_active = true;

Fix: Change WHERE deleted_at IS NULL to WHERE is_active = true:

-- BETTER: Matches partial index
WHERE is_active = true AND asset_class = 'OPTIONS'

This alone reduced query time significantly, but still too slow.

Optimization #2: NOT IN Pattern Instead of MAX

Instead of computing MAX(pnl_date) for every position, we:

  1. Get position_keys that ARE on the latest date
  2. Find position_keys that are NOT in that set
-- Fast: Uses index on (pnl_date, position_key)
WITH on_latest AS (
SELECT position_key
FROM sable.pnl_daily_v2
WHERE is_active = true
AND pnl_date = v_reference_date
AND asset_class = 'OPTIONS'
),
disappeared AS (
SELECT DISTINCT position_key
FROM sable.pnl_daily_v2
WHERE is_active = true
AND asset_class = 'OPTIONS'
AND position_key NOT IN (SELECT position_key FROM on_latest)
)
SELECT * FROM disappeared;

Result: 139ms to find 45 disappeared options (vs 8+ seconds with MAX)

Optimization #3: LATERAL Join for Final State

For each disappeared option, we need its last non-synthetic row. A regular JOIN would still be slow:

-- SLOW: Join scans full table for each match
SELECT d.position_key, p.*
FROM disappeared d
JOIN sable.pnl_daily_v2 p ON p.position_key = d.position_key
WHERE p.pnl_date = (SELECT MAX(pnl_date) FROM ...) -- Nested MAX = slow

Solution: Use CROSS JOIN LATERAL with LIMIT 1:

-- FAST: Uses index seek per position (45 seeks total)
SELECT d.position_key, fs.*
FROM disappeared d
CROSS JOIN LATERAL (
-- Only select identity fields (symbol, account, sector)
-- P&L values are NOT selected - synthetic rows use $0
SELECT symbol, account_code, economic_sector, pnl_date as last_date
FROM sable.pnl_daily_v2
WHERE position_key = d.position_key
AND is_active = true
AND (data_source IS NULL OR data_source NOT LIKE 'synthetic%')
ORDER BY pnl_date DESC
LIMIT 1
) fs;

Why it's fast:

  • LATERAL executes the subquery once per row in disappeared
  • With only 45 rows, that's 45 index seeks
  • Each seek uses the index on (position_key, pnl_date)
  • LIMIT 1 stops after finding the first (most recent) match

Final Query Structure

WITH on_latest AS (
SELECT position_key FROM sable.pnl_daily_v2
WHERE is_active = true AND pnl_date = v_reference_date AND asset_class = 'OPTIONS'
),
disappeared AS (
SELECT DISTINCT position_key FROM sable.pnl_daily_v2
WHERE is_active = true AND asset_class = 'OPTIONS'
AND position_key NOT IN (SELECT position_key FROM on_latest)
),
final_states AS (
SELECT d.position_key, fs.*
FROM disappeared d
CROSS JOIN LATERAL (
SELECT symbol, security_description, account_code, ...
FROM sable.pnl_daily_v2
WHERE position_key = d.position_key AND is_active = true
AND (data_source IS NULL OR data_source NOT LIKE 'synthetic%')
ORDER BY pnl_date DESC LIMIT 1
) fs
),
to_insert AS (
SELECT fs.*,
generate_series(fs.last_date + 1, v_reference_date, '1 day')::DATE AS fill_date
FROM final_states fs
)
INSERT INTO sable.pnl_daily_v2 (...)
SELECT ... FROM to_insert
ON CONFLICT (position_key, pnl_date) WHERE is_active = true
DO UPDATE SET ...;

Performance Comparison

ApproachTimeNotes
Original (MAX aggregation)8+ secTimeout
With is_active = true~4 secBetter but still slow
NOT IN pattern139msFinding disappeared
+ LATERAL join353msTotal with final state

Final result: 23x faster (8000ms → 350ms)

Indexes Used

-- Partial index on position_key, date (used by LATERAL)
CREATE INDEX idx_pnl_daily_v2_unique
ON sable.pnl_daily_v2(position_key, pnl_date)
WHERE is_active = true;

-- Index for OPTIONS queries (created for this feature)
CREATE INDEX idx_pnl_daily_v2_asset_class
ON sable.pnl_daily_v2(asset_class, position_key, pnl_date)
WHERE is_active = true;

-- Date index for latest date lookup
CREATE INDEX idx_pnl_daily_v2_date
ON sable.pnl_daily_v2(pnl_date)
WHERE is_active = true;

Key Lessons

  1. Match your partial index predicates - Using deleted_at IS NULL when indexes have is_active = true bypasses all indexes

  2. Avoid MAX() for "latest row" queries - Use NOT IN or NOT EXISTS patterns with an explicit date lookup

  3. LATERAL joins are powerful - For per-row lookups, LATERAL with LIMIT 1 converts a full table scan into N index seeks

  4. Use EXPLAIN ANALYZE - The query plan showed sequential scans immediately, pointing to the index mismatch

Using EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM sable.fn_options_filldown(NULL, NULL, TRUE);

Look for:

  • Seq Scan on large tables → missing index or wrong predicate
  • High rows= in nested loops → consider LATERAL
  • Buffers: shared hit= → how much data scanned

Data Integrity

Synthetic rows must have:

  • All P&L values = $0 (P&L transferred to stock on assignment)
  • All quantities = 0 (position closed)

This is enforced by a database trigger (trg_synthetic_pnl_zero) and validated by:

SELECT * FROM sable.qa_synthetic_rows;
-- Must return status = 'PASS'