Stop backtesting with yesterday's winners.
Download today's Nasdaq-100 list and backtest it, and you silently inflate returns by 208 bps/year. You're testing only on survivors, the companies that made it. YHOO, RIMM, and WFMI were in the index too. Their price data is gone from public feeds. We collected it before it disappeared.
# ❌ What most "quant" tutorials do, and why their returns lie
import yfinance as yf

tickers = get_current_ndx_100()  # placeholder helper: today's 100 survivors
data = yf.download(tickers, start="2010-01-01")  # +208 bps/yr illusion

# ✓ Point-in-time universe: what was ACTUALLY in the NDX on each date
import pandas as pd

pit = pd.read_parquet("ndx_pit_daily.parquet")  # dates × tickers, True = member
ohlcv = pd.read_parquet("ndx_ohlcv.parquet")
trading_days = pit.index

for date in trading_days:
    universe = pit.loc[date][pit.loc[date]].index  # PIT universe
    # make_features, strategy and train are defined in your own project
    features = make_features(ohlcv, pit, as_of_date=date)
    signals = strategy.get_signals(train, features)  # real alpha
Your backtest is measuring the wrong thing
Take today's Nasdaq-100 list. Download 15 years of data. Run signals. Report Sharpe.
The implicit assumption: these 100 companies always existed, always traded, and were always in the index.
The correct approach: on every date, use only the tickers that were actually in the NDX on that date, including the ones that later failed, got acquired, or were booted from the index.
The gap between those two universes is survivorship bias. It accumulates silently for every year you backtest on today's winners alone. At 208 bps/yr, a strategy that appears to beat the market by 2% a year has no real edge: the outperformance came from the data selection, not the signal.
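To make the mechanism concrete, here is a toy simulation. Every parameter in it (1,000 stocks, 15 years, 8% mean and 35% volatility per year, a "delisting" threshold at 30% of the starting price) is an illustrative assumption with no connection to the 208 bps figure. All stocks draw returns from the same distribution, so none has an edge, yet averaging only over the stocks that survive produces a higher return.

```python
import numpy as np


def survivorship_gap(n_stocks=1000, n_years=15, seed=0):
    """Compare the average annual return of all stocks with that of survivors.

    Toy model: every stock draws i.i.d. annual returns from the same normal
    distribution, so no stock has a real edge. A stock "survives" (stays in
    today's list) only if its price never falls below 30% of where it started.
    Returns (mean over all stocks, mean over survivors, survival rate).
    """
    rng = np.random.default_rng(seed)
    # Same distribution for every stock: 8% mean, 35% volatility per year.
    returns = rng.normal(loc=0.08, scale=0.35, size=(n_stocks, n_years))
    # Cap losses at -95% so prices stay positive.
    returns = np.clip(returns, -0.95, None)
    prices = np.cumprod(1.0 + returns, axis=1)  # price path, starting value 1.0
    survived = prices.min(axis=1) > 0.30

    all_mean = returns.mean()                 # every stock that ever existed
    survivor_mean = returns[survived].mean()  # only "today's list"
    return all_mean, survivor_mean, survived.mean()


all_mean, survivor_mean, survival_rate = survivorship_gap()
print(f"all stocks: {all_mean:.2%}/yr, survivors only: {survivor_mean:.2%}/yr, "
      f"survival rate: {survival_rate:.0%}")
```

The difference between the two averages is pure selection: the survivors were chosen by looking at their outcomes, which is exactly what a current constituent list does.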
Everything in the box
One ZIP. Drop the parquets into crucible, point Claude at the skill files, and you're running institutional-grade backtests in minutes.
This dataset was assembled by cross-referencing Nasdaq-100 component change history with historical price feeds collected over time, including for companies that were later acquired or delisted. That price history is no longer publicly available for most of these tickers, so you cannot replicate this dataset by running a script today. That is exactly why it exists.
ndx_ohlcv.parquet
31 MB of OHLCV history. MultiIndex (field, ticker) × date. 210 tickers, 4,900 days. Loads in <1 second with pandas or polars.
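Assuming the columns are the (field, ticker) MultiIndex and the rows are dates (the description above doesn't say which axis is which), pulling one field out as a plain date × ticker frame is a single `xs` call. The sketch below uses a tiny made-up frame; the field names `"open"` and `"close"` are assumptions, so check the real file's first column level before relying on them.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of ndx_ohlcv.parquet. On the real file you would use
# ohlcv = pd.read_parquet("ndx_ohlcv.parquet") instead. Field names "open" and
# "close" are assumptions: inspect ohlcv.columns.get_level_values(0).unique().
dates = pd.bdate_range("2020-01-01", periods=4)
columns = pd.MultiIndex.from_product(
    [["open", "close"], ["AAPL", "YHOO"]], names=["field", "ticker"]
)
ohlcv = pd.DataFrame(
    np.arange(16, dtype=float).reshape(4, 4) + 100.0, index=dates, columns=columns
)

# One field as a plain date x ticker frame.
close = ohlcv.xs("close", axis=1, level="field")

# Daily simple returns per ticker (first row is NaN: no prior close).
returns = close / close.shift(1) - 1.0
```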
ndx_pit_daily.parquet
Boolean membership matrix: pit.loc[date, ticker] is True only if that ticker was in the NDX on that date. 4,900 days × 265 tickers.
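A sketch of the two operations you'll do most with that matrix, shown on a made-up three-day frame rather than the real file: listing the members on a date, and masking a date × ticker frame so non-members become NaN and can't leak into signals. The helper name `universe_on` is hypothetical.

```python
import pandas as pd

# Hypothetical miniature of ndx_pit_daily.parquet: dates x tickers,
# True when the ticker was an index member on that date.
dates = pd.bdate_range("2020-01-01", periods=3)
pit = pd.DataFrame(
    {"AAPL": [True, True, True], "YHOO": [True, True, False], "TSLA": [False, True, True]},
    index=dates,
)


def universe_on(pit: pd.DataFrame, date) -> pd.Index:
    """Tickers that were index members on `date` (point in time)."""
    return pit.columns[pit.loc[date].to_numpy()]


# Any date x ticker frame aligned with pit: keep members, NaN everywhere else.
returns = pd.DataFrame(0.01, index=dates, columns=pit.columns)
pit_returns = returns.where(pit)
```

Masking once with `where` and letting NaNs propagate is usually harder to get wrong than re-filtering tickers inside every feature function.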
ndx_component_changes.csv
225 add/remove events with exact dates. Study inclusion effects, front-running patterns, or index arbitrage strategies.
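The membership matrix and the change log describe the same history, so one can be rebuilt from the other. A minimal sketch, assuming hypothetical CSV columns `date`, `ticker`, and `action` (with values `"add"` / `"remove"`) and that an event takes effect on its own date; check the real file's columns and convention first. The change log alone can't say who was in the index on day one, so the starting member list is a required input (for example, the first row of the membership matrix).

```python
import pandas as pd


def membership_from_changes(changes, initial, dates):
    """Rebuild a dates x tickers boolean membership matrix from add/remove events.

    changes: DataFrame with hypothetical columns "date", "ticker", "action"
             where action is "add" or "remove".
    initial: tickers that were members before the first date.
    Assumes an event takes effect on its own date.
    """
    events = changes.sort_values("date", kind="stable").to_dict("records")
    tickers = sorted(set(initial) | set(changes["ticker"]))
    members = set(initial)
    rows, i = [], 0
    for day in dates:
        # Apply every event dated on or before this trading day.
        while i < len(events) and events[i]["date"] <= day:
            if events[i]["action"] == "add":
                members.add(events[i]["ticker"])
            else:
                members.discard(events[i]["ticker"])
            i += 1
        rows.append([t in members for t in tickers])
    return pd.DataFrame(rows, index=dates, columns=tickers)


dates = pd.bdate_range("2020-01-01", periods=5)  # Jan 1, 2, 3, 6, 7
changes = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-03", "2020-01-03"]),
    "ticker": ["TSLA", "YHOO"],
    "action": ["add", "remove"],
})
pit = membership_from_changes(changes, initial=["AAPL", "YHOO"], dates=dates)
```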
ndx_pit_summary.csv
Per-ticker entry/exit dates and trading-day counts. Quick reference for understanding which stocks had short vs long NDX tenures.
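Those summary fields can also be derived directly from the membership matrix, which makes a quick consistency check. A sketch on a made-up frame; the output column names `entry`, `exit`, and `trading_days` are assumptions and may not match the CSV's headers.

```python
import pandas as pd


def pit_summary(pit):
    """Per-ticker first member date, last member date, and member-day count.

    If a ticker left and later rejoined, entry/exit span the gap while
    trading_days counts only the days it was actually a member.
    """
    rows = {}
    for ticker in pit.columns:
        member_days = pit.index[pit[ticker].to_numpy()]
        if len(member_days) == 0:
            continue  # never a member in this window
        rows[ticker] = {
            "entry": member_days[0],
            "exit": member_days[-1],
            "trading_days": len(member_days),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


dates = pd.bdate_range("2020-01-01", periods=3)
pit = pd.DataFrame(
    {"AAPL": [True, True, True], "YHOO": [True, True, False], "TSLA": [False, True, True]},
    index=dates,
)
summary = pit_summary(pit)
```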
6 × CLAUDE.md Skill Files
Markdown methodology guides written for Claude Code to read. Append them to your project's CLAUDE.md and Claude instantly understands:
- SKILL_pit_dataset.md: PIT filtering, universe construction
- SKILL_triple_barrier.md: López de Prado triple barrier labels
- SKILL_cpcv.md: Combinatorial Purged Cross-Validation
- SKILL_feature_engineering.md: Stationarity, CS z-scoring, frac diff
- SKILL_position_sizing.md: Kelly, half-Kelly, vol targeting
- SKILL_regime_detection.md: 200d MA, vol regime, HMM
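For readers new to triple barrier labeling, here is a deliberately simplified sketch, not the contents of SKILL_triple_barrier.md. Each bar is labeled by whichever barrier its forward price path touches first: an upper profit-taking barrier, a lower stop-loss barrier, or a vertical time barrier. Two simplifications are assumptions: the barriers are fixed percentages rather than scaled by rolling volatility as in López de Prado's formulation, and hitting only the time barrier gives label 0 (one common variant).

```python
import numpy as np


def triple_barrier_labels(close, horizon=10, upper=0.05, lower=0.05):
    """Label each bar by which barrier the forward path touches first.

    +1: price rises `upper` (a fraction) above entry before falling `lower` below it
    -1: price falls `lower` below entry first
     0: neither barrier is hit within `horizon` bars (vertical barrier)
    Bars near the end of the series see a truncated horizon; drop them in practice.
    """
    close = np.asarray(close, dtype=float)
    n = len(close)
    labels = np.zeros(n, dtype=int)
    for i in range(n):
        entry = close[i]
        for j in range(i + 1, min(i + horizon, n - 1) + 1):
            ret = close[j] / entry - 1.0
            if ret >= upper:
                labels[i] = 1
                break
            if ret <= -lower:
                labels[i] = -1
                break
    return labels


labels = triple_barrier_labels([100, 103, 106, 101, 94, 95], horizon=3)
# → [1, -1, -1, -1, 0, 0]
```

Because a label at bar i depends on prices up to bar i + horizon, labels overlap in time, which is why the validation step needs purging.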
The vibe trader workflow
The dataset is designed to plug directly into crucible, the open-source backtesting framework. Claude reads your CLAUDE.md skills, enforces the methodology, and calls you out on anti-patterns.
Buy + download
You get a ZIP with the parquets and 6 skill files. Drop the parquets into crucible/data/.
Install skills
Append the skill files to your CLAUDE.md. Claude Code reads CLAUDE.md automatically at the start of every session.
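If you'd rather script that step, here is a minimal sketch. It assumes the skill files sit together in one folder and match `SKILL_*.md`; both paths are yours to choose, and the function name `append_skills` is hypothetical. It skips any file whose text is already in CLAUDE.md, so re-running it doesn't duplicate content.

```python
import tempfile
from pathlib import Path


def append_skills(skill_dir, claude_md):
    """Append each SKILL_*.md in skill_dir to claude_md, skipping any already present.

    Returns the names of the files appended on this call, in sorted order.
    """
    claude_md = Path(claude_md)
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    added = []
    for path in sorted(Path(skill_dir).glob("SKILL_*.md")):
        text = path.read_text(encoding="utf-8")
        if text in existing:
            continue  # appended on an earlier run
        existing += "\n\n" + text
        added.append(path.name)
    claude_md.write_text(existing, encoding="utf-8")
    return added


# Demo in a throwaway directory (file contents here are stand-ins).
with tempfile.TemporaryDirectory() as tmp:
    skills = Path(tmp, "skills")
    skills.mkdir()
    Path(skills, "SKILL_cpcv.md").write_text("# CPCV\n", encoding="utf-8")
    Path(skills, "SKILL_pit_dataset.md").write_text("# PIT\n", encoding="utf-8")
    first_run = append_skills(skills, Path(tmp, "CLAUDE.md"))
    second_run = append_skills(skills, Path(tmp, "CLAUDE.md"))
```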
Build strategies
Edit strategy.py. Claude enforces PIT filtering, triple barrier labels, and CPCV validation, and flags your anti-patterns before you commit.
Trust the OOS Sharpe
CPCV with C(6,2) = 15 train/test combinations scores the strategy out of sample 15 times. You get a Sharpe distribution, not a single lucky number from one train/test split.
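A sketch of where the 15 comes from. The samples are cut into N = 6 contiguous groups, and every choice of k = 2 groups becomes a test set, giving C(6,2) = 15 splits. In López de Prado's formulation those splits recombine into k·C(N,k)/N = 5 complete backtest paths. The `embargo` below is a simplification and an assumption: it only drops training samples right after each test group, whereas full purging also needs each label's end time.

```python
from itertools import combinations
from math import comb

import numpy as np


def cpcv_splits(n_samples, n_groups=6, n_test_groups=2, embargo=0):
    """Yield (train_idx, test_idx) for every choice of test groups.

    Samples are split into n_groups contiguous blocks; each split tests on
    n_test_groups blocks and trains on the rest. `embargo` removes that many
    training samples immediately after each test block (simplified purging).
    """
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_groups in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[g] for g in test_groups])
        banned = set(test_idx.tolist())
        for g in test_groups:
            end = int(groups[g][-1])
            banned.update(range(end + 1, min(end + 1 + embargo, n_samples)))
        train_idx = np.array([i for i in range(n_samples) if i not in banned])
        yield train_idx, test_idx


n_groups, k = 6, 2
splits = list(cpcv_splits(600, n_groups, k, embargo=5))
# Each group is tested in C(N-1, k-1) splits, so the splits form this many paths.
n_paths = comb(n_groups, k) * k // n_groups
```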
Loading NDX PIT dataset...
  ohlcv : 4,900 days × 210 tickers (31.3 MB)
  pit   : 4,900 days × 265 tickers (165 KB)
Running CPCV C(6,2) = 15 paths...
  path 00  sharpe=0.81  sortino=1.24  dd=-17.3%
  path 01  sharpe=0.74  sortino=1.11  dd=-21.5%
  path 02  sharpe=0.88  sortino=1.38  dd=-14.9%
  ...
── OOS Summary ──────────────────────────────
  oos_sharpe     : 0.79
  oos_sharpe_std : 0.09   ← tight distribution = robust signal
  cpcv_paths     : 15
  folds_passed   : 13/15
  max_drawdown   : -21.5%
  elapsed        : 47.3s
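The core summary fields in that log can be approximated in a few lines of NumPy. This is a sketch under stated assumptions (one daily return series per path, 252 periods a year, zero risk-free rate), not crucible's actual implementation. The pass criterion behind `folds_passed` isn't described here, so it is left out, and the input below is synthetic noise, not real results.

```python
import numpy as np


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample std of daily returns, scaled by sqrt(periods); risk-free rate 0."""
    r = np.asarray(daily_returns, dtype=float)
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Worst peak-to-trough drop of the compounded equity curve, as a negative fraction."""
    growth = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    equity = np.concatenate([[1.0], growth])  # include the starting capital as a peak
    peaks = np.maximum.accumulate(equity)
    return float((equity / peaks - 1.0).min())


def summarize_paths(path_returns):
    """Mean and spread of per-path Sharpe ratios plus the worst drawdown across paths."""
    sharpes = np.array([annualized_sharpe(r) for r in path_returns])
    return {
        "oos_sharpe": float(sharpes.mean()),
        "oos_sharpe_std": float(sharpes.std(ddof=1)),
        "max_drawdown": min(max_drawdown(r) for r in path_returns),
    }


rng = np.random.default_rng(7)
# Synthetic stand-in for 15 OOS return series; not real results.
paths = [rng.normal(0.0004, 0.01, size=500) for _ in range(15)]
summary = summarize_paths(paths)
```

Reporting the standard deviation alongside the mean is what turns a single Sharpe number into a distribution you can judge.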
One price. Everything included.
No subscription. No API keys. No rate limits. Buy once, use forever.
- ✓ ndx_ohlcv.parquet: 4,900 days × 210 tickers OHLCV
- ✓ ndx_pit_daily.parquet: point-in-time membership matrix
- ✓ ndx_component_changes.csv: 225 add/remove events
- ✓ ndx_pit_summary.csv: per-ticker entry/exit dates
- ✓ 6 × CLAUDE.md skill files (PIT, triple barrier, CPCV, features, sizing, regime)
- ✓ Works with crucible open-source backtesting framework
- ✓ Apache Parquet: loads in <1s with pandas / polars
- ✓ Instant download after payment
For research and strategy development only. Not financial advice.
Ready to build strategies that actually work?
Join other Claude Code traders who stopped testing against phantom alpha.