Data
Data is collected live at scoring time and is point-in-time safe: no feature uses information that was unavailable on the catalyst date.
| Source | Data |
|---|---|
| ClinicalTrials.gov | Trial phases, enrollment pace, primary endpoints, p-values |
| SEC EDGAR | XBRL financials, Form 4 insider trades, 8-K filings |
| openFDA | Designations, AdComm votes, CRL history, prior approvals |
| CTO Dataset | 14,700 Phase 3 trial outcome labels (Gao et al. 2024) |
| PubMed | Published Phase 3 trial results for endpoint verification |
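A minimal sketch of the point-in-time rule. It assumes each fetched fact carries the date it first became public. The `SourceRecord` shape and the `point_in_time_view` helper are hypothetical and are not ApprovalAlpha's code. Cutting off strictly before the catalyst date is a conservative reading of the rule and also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical shape of one fetched fact (illustrative, not a real API)."""
    source: str         # e.g. "SEC EDGAR", "openFDA"
    metric: str         # illustrative name, e.g. "cash_usd"
    value: float
    published_on: date  # first date the fact was publicly observable


def point_in_time_view(records, catalyst_date):
    """Latest value per (source, metric) among records published strictly
    before the catalyst date; anything published later is ignored."""
    view = {}
    # Sort by publication date so later filings overwrite earlier ones.
    for rec in sorted(records, key=lambda r: r.published_on):
        if rec.published_on < catalyst_date:
            view[(rec.source, rec.metric)] = rec
    return view
```

A backtest that scores the same drug at several historical dates can rebuild its feature view at each date. Because of this, a filing that lands after the scoring date never leaks into the earlier score.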
Model
Each drug starts from a historical phase × indication base rate (BIO/IQVIA). Factors from four signal categories shift that probability up or down in log-odds space. An isotonic calibration map, fit on a rolling window of events from 2018 onward, then corrects systematic overconfidence at the high end.
| Category | # Factors |
|---|---|
| Clinical | 12 |
| Regulatory | 11 |
| Financial | 3 |
| Non-linear interactions (SHAP) | 3 |
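A minimal sketch of the two steps above. It assumes each factor contributes a fixed additive log-odds shift, and it uses plain pool-adjacent-violators for the isotonic map, with no interpolation between knots. All function names are hypothetical, and this is not ApprovalAlpha's implementation.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def raw_probability(base_rate, factor_shifts):
    """Base rate -> log-odds, add each factor's shift, map back to [0, 1].

    Assumes every factor contributes a fixed additive log-odds shift
    (positive raises the probability, negative lowers it).
    """
    return sigmoid(logit(base_rate) + sum(factor_shifts))


def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit of a non-decreasing map from raw score
    to observed approval rate. Returns (upper_score_of_block, block_rate)."""
    # Pool tied scores first so equal inputs always get the same output.
    agg = {}
    for s, y in zip(scores, outcomes):
        total, n = agg.get(s, (0.0, 0))
        agg[s] = (total + y, n + 1)
    blocks = []  # each block: [sum_of_outcomes, count, highest_score]
    for s in sorted(agg):
        total, n = agg[s]
        blocks.append([total, n, s])
        # Merge backwards while an earlier block has a higher rate.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, n, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += n
            blocks[-1][2] = hi
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, knots, rates):
    """Step lookup: rate of the first block whose upper score >= score."""
    for knot, rate in zip(knots, rates):
        if score <= knot:
            return rate
    return rates[-1]
```

To mimic the rolling window, pass `fit_isotonic` only the raw scores and outcomes of events that fall inside that window.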
Validation
Trained on ~1,000 historical public NDA/BLA events through 2020 and evaluated out-of-time on ~500 events from 2021 to the present.
| Metric | Value |
|---|---|
| AUC-ROC (out-of-time, n=512) | 0.841 |
| AUC-ROC, small- and mid-cap subset (mega-pharma excluded) | 0.81 |
| Brier score | 0.117 |
| Expected Calibration Error | 2.1% |
| Accuracy (optimal threshold) | 83% |
The headline AUC is buoyed by mega-pharma events (PFE, JNJ, NVS) that are trivially predictable from sponsor track record. On the small- and mid-cap subset, where prediction is hard and where the use case lives, AUC is 0.81.
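The validation metrics can be reproduced on any labelled set of scores with the sketch below. It assumes binary outcomes (1 = approval) and ten equal-width bins for ECE. It also assumes the accuracy cutoff is chosen on the same sample it is scored on. The source specifies neither the bin count nor the threshold search, and the function names are illustrative.

```python
def auc_roc(y_true, y_score):
    """AUC as the probability that a random approval outscores a random
    rejection (Mann-Whitney form); ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width bins; each bin's |mean predicted - observed rate| is
    weighted by the share of events that fall into it."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for _, p in b) / len(b)
            rate = sum(y for y, _ in b) / len(b)
            ece += len(b) / len(y_true) * abs(mean_p - rate)
    return ece


def best_threshold_accuracy(y_true, y_prob):
    """Accuracy at the cutoff that maximises it on this same sample
    (predict approval when probability >= cutoff)."""
    best = 0.0
    for t in sorted(set(y_prob)) + [float("inf")]:
        hits = sum((p >= t) == (y == 1) for y, p in zip(y_true, y_prob))
        best = max(best, hits / len(y_true))
    return best
```

The small- and mid-cap figure is the same `auc_roc` call restricted to events whose sponsor is not a mega-pharma. An ECE of 2.1% corresponds to a return value of 0.021.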
Benchmarks
All models were trained on the same historical events and evaluated on the same out-of-time public test set.
| Model | AUC-ROC |
|---|---|
| ApprovalAlpha | 0.841 |
| Lo et al. 2019 method (reproduced) | 0.807 |
| Phase × indication base rate only | 0.667 |
Designations Are Not a Free Lift
A propensity-matched analysis of the training data shows that Priority Review is the only FDA designation with a positive causal effect on approval. The analysis controls for base rate, trial results, sponsor history, endpoint type, and mechanism of action. Breakthrough Therapy, Fast Track, and Orphan Drug are markers of clinical difficulty: FDA grants them to drugs whose path is inherently harder. The model uses the causally adjusted coefficients, not the naive correlations.
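A minimal sketch of the matching idea. It assumes a logistic propensity model on numeric (for example one-hot encoded) covariates. It pairs each designated drug with its nearest undesignated neighbour by propensity score, with replacement, which estimates the designation's effect on the drugs that received it. The source does not say which matching scheme ApprovalAlpha uses, and `fit_propensity` and `matched_effect` are hypothetical names.

```python
import numpy as np


def fit_propensity(X, treated, n_iter=25, ridge=1e-6):
    """Logistic regression of designation on covariates (Newton-Raphson).
    Returns each drug's estimated probability of receiving the designation."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (treated - p)
        # Small ridge term keeps the Newton system well conditioned.
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(w))
        w += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def matched_effect(X, treated, approved):
    """Average effect on designated drugs: each one is paired with the
    undesignated drug nearest in propensity score (1-NN, with replacement)."""
    ps = fit_propensity(X, treated)
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    gaps = [approved[i] - approved[c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]]
            for i in t_idx]
    return float(np.mean(gaps))
```

Consider a toy simulation in which a hidden difficulty score raises the odds of a designation and lowers the odds of approval, while the designation itself has no effect. There, the naive approval-rate gap between designated and undesignated drugs should come out clearly negative, and the matched estimate should sit near zero. This is the "marker of difficulty" pattern described above.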
Dossier Outputs
Beyond the headline probability, every scoring emits five structured outputs. All five are deterministic post-hoc transforms of the model output: no LLMs, no extra training.
| Output | What it is |
|---|---|
| Top drivers | Per-factor impact on this specific prediction, in percentage points (pp). |
| Comparables | Five most similar historical events (cosine similarity weighted by L1 coefficient magnitude) with realised outcomes. |
| Sensitivity | What the prediction becomes if any one factor flips toward the opposite class's typical value. |
| Subscores | Clinical / Regulatory / Sponsor decomposition. Shows which feature bucket pulls the probability down, not which CRL category will occur. |
| Probability history | Sparkline of prior scorings for the same drug as new data lands (8-Ks, AdComm decisions, dilution events). |
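A sketch of how four of these outputs could fall out of a linear log-odds model. It assumes the prediction is `sigmoid(intercept + sum(coef * feature))` and omits the isotonic step. Several choices here are assumptions, not the source's definitions:

- Top drivers use leave-one-out attribution.
- Sensitivity takes a caller-supplied class mean as the "typical value".
- Comparables use |coef| as the cosine weights.

All helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(intercept, coefs, x):
    return intercept + sum(b * v for b, v in zip(coefs, x))


def top_drivers(intercept, coefs, x):
    """(factor index, pp impact): how far the probability would move, in
    percentage points, if that factor's log-odds term were dropped.
    Sorted by absolute impact. Leave-one-out attribution is an assumption."""
    z = log_odds(intercept, coefs, x)
    p = sigmoid(z)
    impacts = [(k, 100.0 * (p - sigmoid(z - b * v)))
               for k, (b, v) in enumerate(zip(coefs, x))]
    return sorted(impacts, key=lambda kv: abs(kv[1]), reverse=True)


def sensitivity(intercept, coefs, x, typical_approved, typical_rejected):
    """Probability after moving one factor at a time to the typical value
    (here: a class mean supplied by the caller) of the opposite class."""
    z = log_odds(intercept, coefs, x)
    target = typical_rejected if sigmoid(z) >= 0.5 else typical_approved
    return [sigmoid(z + b * (t - v)) for b, v, t in zip(coefs, x, target)]


def subscores(coefs, x, buckets):
    """Summed log-odds contribution per feature bucket, e.g. 'clinical'."""
    out = {}
    for b, v, name in zip(coefs, x, buckets):
        out[name] = out.get(name, 0.0) + b * v
    return out


def weighted_cosine(a, c, w):
    dot = sum(wi * ai * ci for wi, ai, ci in zip(w, a, c))
    na = math.sqrt(sum(wi * ai * ai for wi, ai in zip(w, a)))
    nc = math.sqrt(sum(wi * ci * ci for wi, ci in zip(w, c)))
    return dot / (na * nc) if na and nc else 0.0


def comparables(x, history, coefs, k=5):
    """history: (event_id, features, outcome) tuples. Weights are |coef|."""
    w = [abs(b) for b in coefs]
    scored = [(weighted_cosine(x, f, w), eid, outcome)
              for eid, f, outcome in history]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]
```

Because every output is a pure function of the fitted coefficients and the drug's feature vector, re-scoring after a new 8-K or AdComm decision reproduces the same dossier for the same inputs.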
Disclosures
- Model outputs reflect public information only; non-public FDA review material is not observable.
- Historical accuracy does not guarantee future performance.
- Not investment advice. Research tool, not a buy/sell signal.
References
- Lo, A.W. et al. (2019). Machine learning with statistical imputation for predicting drug approvals. Harvard Data Science Review.
- Siah, K.W. et al. (2021). Predicting drug approvals: the Novartis data science and AI challenge. Patterns.
- Wong, C.H. et al. (2018). Estimation of clinical trial success rates and related parameters. Biostatistics.
- Gao, Z. et al. (2024). CTO: Clinical Trial Outcome prediction dataset. NeurIPS Datasets & Benchmarks.
- BIO/IQVIA/QLS Advisors (2021). Clinical Development Success Rates 2011–2020.
Built by Sean Koth, a finance student at Fordham University's Gabelli School of Business.