Built for Claude Code Vibe Traders

Stop backtesting
with yesterday's winners.

Download today's Nasdaq-100 list and backtest it — you silently inflate returns by 208 bps/year. You're only testing on survivors — companies that made it. YHOO, RIMM, WFMI were in the index too. Their data is gone. We collected it before it was.

backtest.py
# ❌ Every "quant" tutorial does this, and gets a lie
import yfinance as yf
tickers = get_current_ndx_100()  # today's 100 survivors
data = yf.download(tickers, start="2010-01-01")   # +208 bps/yr illusion

# ✓ Point-in-time universe: what was ACTUALLY in NDX on each date
import pandas as pd
pit   = pd.read_parquet("ndx_pit_daily.parquet")   # dates × tickers, bool
ohlcv = pd.read_parquet("ndx_ohlcv.parquet")

# get_current_ndx_100, make_features, strategy, and train are your own code
trading_days = pit.index
for date in trading_days:
    members  = pit.loc[date]               # boolean row for this date
    universe = members[members].index      # PIT universe
    features = make_features(ohlcv, pit, as_of_date=date)
    signals  = strategy.get_signals(train, features)  # real alpha
208 bps/yr Survivorship bias eliminated
4,900 days Daily trading history
265 tickers Historical NDX members tracked
19+ years Jan 2007 → today
225 events Index rebalancing events
36 / 36 Validation checks passed

Your backtest is measuring the wrong thing

What most backtests do

Take today's Nasdaq-100 list. Download 15 years of data. Run signals. Report Sharpe.

The implicit assumption: these 100 companies always existed and always traded.

19.63% CAGR (survivor-only, 2010–2026)

What PIT data lets you do

On every date, use only the tickers that were actually in the NDX on that date — including the ones that later failed, got acquired, or were booted from the index.

17.54% CAGR (point-in-time, 2010–2026)
208 bps/yr

That gap is survivorship bias. It accumulates silently every year you backtest on only today's winners. At 208 bps/yr, a strategy that looks like it beats the market by 2% actually doesn't — the edge was in the data selection, not the signal.
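To see how that gap compounds, here is a minimal sketch that grows two hypothetical $10,000 accounts at the two CAGRs quoted above. The 16-year horizon (2010 through 2026) and the starting balance are assumptions for illustration. Note that the rounded CAGRs differ by 209 bps, one more than the quoted 208, presumably because the headline figure comes from unrounded returns.

```python
# Compound two hypothetical accounts at the CAGRs quoted on this page.
YEARS = 16          # assumption: 2010 through 2026 treated as 16 full years
START = 10_000.0    # assumption: illustrative starting balance

survivor_cagr = 0.1963  # survivor-only universe
pit_cagr = 0.1754       # point-in-time universe


def grow(start: float, cagr: float, years: int) -> float:
    """Terminal value of `start` compounded annually at `cagr`."""
    return start * (1.0 + cagr) ** years


survivor_end = grow(START, survivor_cagr, YEARS)
pit_end = grow(START, pit_cagr, YEARS)
gap_bps = (survivor_cagr - pit_cagr) * 10_000

print(f"survivor-only : ${survivor_end:,.0f}")
print(f"point-in-time : ${pit_end:,.0f}")
print(f"overstatement : {survivor_end / pit_end - 1:.1%}")
print(f"annual gap    : {gap_bps:.0f} bps (from the rounded CAGRs)")
```

A couple of percentage points per year turns into a terminal balance that is overstated by roughly a third over this horizon.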

Everything in the box

One ZIP. Drop the parquets into crucible, point Claude at the skills files, and you're running institutional-grade backtests in minutes.

This dataset was assembled by cross-referencing Nasdaq-100 component change history with historical price feeds collected over time — including for companies that were later acquired or delisted. That price history is no longer publicly available for most of these tickers. You cannot replicate this dataset by running a script today — which is exactly why it exists.

ndx_ohlcv.parquet

31 MB of OHLCV history. MultiIndex (field, ticker) × date. 210 tickers, 4,900 days. Loads in <1 second with pandas or polars.

31 MB · Apache Parquet · split-adjusted
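A minimal sketch of working with that layout, using a small synthetic frame rather than the real file. It assumes dates on the index and (field, ticker) pairs on the columns, with lowercase field names such as "close"; check the actual parquet before relying on either assumption.

```python
import numpy as np
import pandas as pd

dates = pd.bdate_range("2024-01-02", periods=4)
tickers = ["AAA", "BBB"]                              # hypothetical tickers
fields = ["open", "high", "low", "close", "volume"]   # assumed field names

# Synthetic stand-in for ndx_ohlcv.parquet: columns are (field, ticker) pairs.
columns = pd.MultiIndex.from_product([fields, tickers], names=["field", "ticker"])
rng = np.random.default_rng(0)
ohlcv = pd.DataFrame(
    rng.uniform(10, 20, size=(len(dates), len(columns))),
    index=dates,
    columns=columns,
)

# Selecting on the top column level yields a plain dates x tickers frame.
close = ohlcv["close"]
returns = close.pct_change()

print(close.round(2))
print(returns.round(4))
```

If the real file puts dates on the columns instead, transposing once after loading gets you to the same shape.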

ndx_pit_daily.parquet

Boolean membership matrix: pit.loc[date, ticker] is True only if that ticker was in the NDX on that date. 4,900 days × 265 tickers.

165 KB · bool dtype · instant slice
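A minimal sketch of the two operations this matrix is built for: listing the universe on one date, and blanking out prices for non-members. The tickers, dates, and prices are made up, and `universe_on` is a hypothetical helper, not part of the dataset.

```python
import pandas as pd

dates = pd.to_datetime(["2012-01-03", "2012-01-04", "2012-01-05"])

# Synthetic stand-in for ndx_pit_daily.parquet (dates x tickers, bool).
pit = pd.DataFrame(
    {
        "AAA": [True, True, True],
        "BBB": [True, True, False],   # leaves the index on the last day
        "CCC": [False, False, True],  # joins the index on the last day
    },
    index=dates,
)

# Synthetic closing prices with the same shape as the membership matrix.
close = pd.DataFrame(
    {"AAA": [10.0, 10.5, 10.2], "BBB": [20.0, 19.0, 18.5], "CCC": [5.0, 5.1, 5.3]},
    index=dates,
)


def universe_on(pit: pd.DataFrame, date) -> list:
    """Tickers that were index members on `date`."""
    row = pit.loc[pd.Timestamp(date)]
    return row[row].index.tolist()


# Keep a price only where the ticker was a member on that date; NaN elsewhere.
masked_close = close.where(pit)

print(universe_on(pit, "2012-01-04"))
print(universe_on(pit, "2012-01-05"))
print(masked_close)
```

Masking once up front means every downstream ranking or z-score automatically ignores tickers that were not in the index that day.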

ndx_component_changes.csv

225 add/remove events with exact dates. Study inclusion effects, front-running patterns, or index arbitrage strategies.

225 events · 2007-02-01 → 2026-06-22
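A minimal sketch of replaying add/remove events to recover membership on a given date. The column names (`date`, `ticker`, `action`) and the "add"/"remove" values are assumptions, as is the sample data; the real CSV may use different headers and will also need the starting membership from before the first event.

```python
import io

import pandas as pd

# Hypothetical sample in an assumed schema; not rows from the real file.
csv_text = """date,ticker,action
2012-01-03,AAA,add
2012-01-03,BBB,add
2013-06-10,BBB,remove
2013-06-10,CCC,add
"""
changes = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])


def members_as_of(changes: pd.DataFrame, as_of) -> set:
    """Replay every event dated on or before `as_of`, in date order."""
    past = changes[changes["date"] <= pd.Timestamp(as_of)]
    past = past.sort_values("date", kind="stable")
    members = set()
    for row in past.itertuples(index=False):
        if row.action == "add":
            members.add(row.ticker)
        else:
            members.discard(row.ticker)
    return members


print(sorted(members_as_of(changes, "2012-12-31")))
print(sorted(members_as_of(changes, "2013-06-10")))
```

The same event table is the natural starting point for inclusion-effect studies: each row gives you an event date to center a return window on.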

ndx_pit_summary.csv

Per-ticker entry/exit dates and trading-day counts. Quick reference for understanding which stocks had short vs long NDX tenures.

265 tickers · first_date, last_date, n_days
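A minimal sketch of how a summary with these three columns could be derived from the boolean membership matrix. This is an illustration on synthetic data, not necessarily how the shipped CSV was produced; `summarize` is a hypothetical helper.

```python
import pandas as pd

dates = pd.bdate_range("2012-01-02", periods=5)

# Synthetic stand-in for the membership matrix (dates x tickers, bool).
pit = pd.DataFrame(
    {
        "AAA": [True, True, True, True, True],
        "BBB": [True, True, False, False, False],
        "CCC": [False, False, False, True, True],
    },
    index=dates,
)


def summarize(pit: pd.DataFrame) -> pd.DataFrame:
    """Per-ticker first and last membership date plus member-day count."""
    rows = []
    for ticker in pit.columns:
        member_days = pit.index[pit[ticker].to_numpy()]
        if len(member_days) == 0:
            continue  # never a member in this window
        rows.append(
            {
                "ticker": ticker,
                "first_date": member_days[0],
                "last_date": member_days[-1],
                "n_days": len(member_days),
            }
        )
    return pd.DataFrame(rows).set_index("ticker")


summary = summarize(pit)
print(summary.sort_values("n_days"))
```

Sorting by `n_days` surfaces the short-tenure names, which are the ones a survivor-only universe is most likely to miss.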

6 × CLAUDE.md Skill Files

Markdown methodology guides written for Claude Code to read. Append them to your project's CLAUDE.md and Claude instantly understands:

  • SKILL_pit_dataset.md — PIT filtering, universe construction
  • SKILL_triple_barrier.md — López de Prado triple barrier labels
  • SKILL_cpcv.md — Combinatorial Purged Cross-Validation
  • SKILL_feature_engineering.md — Stationarity, CS z-scoring, frac diff
  • SKILL_position_sizing.md — Kelly, half-Kelly, vol targeting
  • SKILL_regime_detection.md — 200d MA, vol regime, HMM
6 files · drop into CLAUDE.md · instant methodology context
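To give a flavor of what the skill files cover, here is a minimal sketch of triple barrier labeling for a single entry. It is a simplified illustration, not the content of SKILL_triple_barrier.md: the barriers are fixed percentages (López de Prado's version typically scales them by recent volatility), a time-out is labeled 0, and the function name and parameters are hypothetical.

```python
import pandas as pd


def triple_barrier_label(
    prices: pd.Series, entry: int, up: float, down: float, max_hold: int
) -> int:
    """Label one trade entered at position `entry`.

    Returns +1 if the profit barrier (+up) is touched first, -1 if the
    stop barrier (-down) is touched first, and 0 if neither is touched
    within `max_hold` bars (the vertical barrier).
    """
    entry_price = prices.iloc[entry]
    window = prices.iloc[entry + 1 : entry + 1 + max_hold]
    for price in window:
        ret = price / entry_price - 1.0
        if ret >= up:
            return 1
        if ret <= -down:
            return -1
    return 0


winner = pd.Series([100.0, 101.0, 103.0, 99.0, 94.0])
loser = pd.Series([100.0, 99.0, 94.0, 110.0])
timeout = pd.Series([100.0, 101.0, 100.0, 99.0])

print(triple_barrier_label(winner, entry=0, up=0.025, down=0.05, max_hold=4))
print(triple_barrier_label(loser, entry=0, up=0.05, down=0.05, max_hold=3))
print(triple_barrier_label(timeout, entry=0, up=0.05, down=0.05, max_hold=3))
```

The label depends on which barrier is touched first, so the path matters, not just the return at the end of the holding period.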

The vibe trader workflow

The dataset is designed to plug directly into crucible, the open-source backtesting framework. Claude reads your CLAUDE.md skills, enforces the methodology, and calls you out on anti-patterns.

01

Buy + download

You get a ZIP with parquets + 6 skill files. Drop them into crucible/data/.

02

Install skills

Append the skill files to your CLAUDE.md. Claude Code reads them automatically on every session start.
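If you would rather script the append than paste by hand, a minimal sketch follows. The `append_skills` helper and the HTML-comment marker it writes are assumptions for illustration, not something shipped in the ZIP; the marker is only there so that running it twice does not duplicate a file.

```python
from pathlib import Path


def append_skills(skills_dir: Path, claude_md: Path) -> int:
    """Append every SKILL_*.md in `skills_dir` to `claude_md`.

    Returns the number of files appended. Files whose marker is already
    present in `claude_md` are skipped, so the call is safe to repeat.
    """
    existing = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    parts = [existing.rstrip("\n")] if existing.strip() else []
    appended = 0
    for path in sorted(skills_dir.glob("SKILL_*.md")):
        marker = f"<!-- {path.name} -->"  # hypothetical de-duplication marker
        if marker in existing:
            continue
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"{marker}\n{body}")
        appended += 1
    claude_md.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return appended
```

Assuming the skill files sit in crucible/data/ as in step 01, a call would look like `append_skills(Path("crucible/data"), Path("CLAUDE.md"))`.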

03

Build strategies

Edit strategy.py. Claude enforces PIT filtering, triple barrier labels, and CPCV validation — and flags your anti-patterns before you commit.

04

Trust the OOS Sharpe

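CPCV with 6 groups and 2 held out per split gives C(6,2) = 15 train/test combinations, reported as 15 OOS paths. They share data, so treat them as overlapping rather than independent. You get a Sharpe distribution, not a single lucky number from one train/test split.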

$ python backtest.py
Loading NDX PIT dataset...
  ohlcv  : 4,900 days × 210 tickers  (31.3 MB)
  pit    : 4,900 days × 265 tickers  (165 KB)

Running CPCV  C(6,2) = 15 paths...
  path 00  sharpe=0.81  sortino=1.24  dd=-17.3%
  path 01  sharpe=0.74  sortino=1.11  dd=-21.5%
  path 02  sharpe=0.88  sortino=1.38  dd=-14.9%
  ...

── OOS Summary ──────────────────────────────
  oos_sharpe     : 0.79
  oos_sharpe_std : 0.09   ← tight distribution = robust signal
  cpcv_paths     : 15
  folds_passed   : 13/15
  max_drawdown   : -21.5%
  elapsed        : 47.3s
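The C(6,2) = 15 count in the log above can be reproduced with a few lines of standard-library Python. This sketch only enumerates the group combinations; it omits the purging and embargo steps that CPCV applies around each test block. In López de Prado's formulation those 15 splits recombine into k/N · C(N,k) = 5 complete backtest paths, so the "paths" in the log appear to be counting splits.

```python
from itertools import combinations
from math import comb

N_GROUPS = 6  # contiguous time blocks the sample is cut into
N_TEST = 2    # blocks held out per split

# Every way of holding out N_TEST of the N_GROUPS blocks.
splits = [
    {"test": test, "train": tuple(g for g in range(N_GROUPS) if g not in test)}
    for test in combinations(range(N_GROUPS), N_TEST)
]
n_splits = len(splits)

# How many splits hold out each block; equal to the number of full paths.
appearances = {g: sum(g in s["test"] for s in splits) for g in range(N_GROUPS)}
n_paths = N_TEST * comb(N_GROUPS, N_TEST) // N_GROUPS

print(f"splits: {n_splits}")
print(f"times each block is tested: {appearances}")
print(f"full backtest paths: {n_paths}")
```

Because every block is held out in several different splits, the resulting Sharpe estimates overlap in data and are correlated rather than independent.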

One price. Everything included.

No subscription. No API keys. No rate limits. Buy once, use forever.

COMPLETE DATASET
$20 one-time
  • ndx_ohlcv.parquet — 4,900 days × 210 tickers OHLCV
  • ndx_pit_daily.parquet — point-in-time membership matrix
  • ndx_component_changes.csv — 225 rebalancing events
  • ndx_pit_summary.csv — per-ticker entry/exit dates
  • 6 × CLAUDE.md skill files (PIT, triple barrier, CPCV, features, sizing, regime)
  • Works with crucible open-source backtesting framework
  • Apache Parquet — loads in <1s with pandas / polars
  • Instant download after payment

For research and strategy development only. Not financial advice.

vs the alternatives

Bloomberg Terminal · $2,000+/mo · Overkill for algo dev
Refinitiv/LSEG · $5,000+/mo · Enterprise sales required
Download it yourself · Can't: data is gone · Delisted tickers no longer available
NDX PIT Dataset · $20 once · Built for Claude Code traders

Ready to build strategies that actually work?

Join other Claude Code traders who stopped testing against phantom alpha.