THE PROBLEM
Why This Exists
The planet is already past comfortable margins. Global temperatures are rising, extreme weather is intensifying, and ecosystems are degrading at a pace that makes every new fossil fuel dependency a compounding liability. The electricity grid is one of the largest levers we have, and right now, it is still heavily tied to carbon. Into that already-strained backdrop comes a new and accelerating demand: AI. Data center load is growing faster than clean firm supply can keep up with, and when that gap goes unmet by zero-carbon generation, it defaults to gas. Coal2Core exists at that intersection: the urgency of the climate crisis and the specific, solvable problem of where the next gigawatt of clean, reliable power comes from.
~100 TWh/yr: Residual clean gap between projected AI demand (~750 TWh/yr) and clean firm supply (~650 TWh/yr) by 2035
TRAINING THE MODEL
Model Selection & Performance
The model was trained exclusively on 155 verified coal plant records curated from the full operating fleet; no synthetic data or unverifiable plants were added. National projections were then applied across 374 coal plants in total.
Four model families were rigorously benchmarked using five-fold nested cross-validation, with the inner loop handling hyperparameter selection and the outer loop producing unbiased out-of-fold evaluations.
| Model | OOF R² | Stability | Notes |
|---|---|---|---|
| SVR (RBF kernel): Selected | 0.9652 | Very high (std dev = 0.021) | Best fit and most consistent rankings |
| ElasticNet | ~0.86 | High | Noticeably lower accuracy; linear assumptions insufficient for this siting problem |
| BayesianRidge | ~0.86 | High | Similar to ElasticNet; failed to capture non-linear patterns effectively |
| Polynomial Lasso (degree 2) | ~0.68 | Very low | Unstable rankings across folds; rejected due to high variability |
SVR with an RBF kernel was selected as the final model because it combined the highest out-of-fold R² (0.9652) with the most stable rankings across folds. The gap to the linear models (R² of about 0.86) suggests that coal-to-nuclear site suitability depends on non-linear interactions that ElasticNet and BayesianRidge cannot represent, while the degree-2 Polynomial Lasso was too unstable across folds to rank sites reliably.
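The nested cross-validation described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual pipeline: the data is a synthetic placeholder generated only so the example runs (the real model used 155 verified plant records), and the feature count, hyperparameter grid, scaling step, and random seeds are all assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Placeholder data with the same number of rows as the verified training set.
# The real features and labels are not reproduced here.
X, y = make_regression(n_samples=155, n_features=4, noise=5.0, random_state=0)

# Scaling inside the pipeline keeps each fold's statistics out of its held-out data.
pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))

# Hypothetical search grid; the source does not list the values that were tuned.
param_grid = {
    "svr__C": [1.0, 10.0, 100.0],
    "svr__gamma": ["scale", 0.1],
    "svr__epsilon": [0.01, 0.1],
}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # hyperparameter selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)  # out-of-fold evaluation

# The inner loop is the grid search; the outer loop refits it on each training
# split and predicts only the plants it has not seen.
search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="r2")
oof_pred = cross_val_predict(search, X, y, cv=outer_cv)
oof_r2 = r2_score(y, oof_pred)

print(f"Out-of-fold R^2: {oof_r2:.4f}")
```

Because hyperparameters are chosen only on the inner folds, each out-of-fold prediction comes from a model that never saw that plant during tuning or fitting.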
TESTING THE MODEL
Scoring Coal Plants for Nuclear Suitability
We divided the U.S. coal power plant dataset into a training set (with suitability labels) and a test set (with only raw data). The goal was to assign fair, realistic, and explainable suitability scores to the test set for nuclear conversion potential.
The scoring approach is based entirely on the U.S. Department of Energy's Coal-to-Nuclear Transition Report (2022), which emphasizes key viability factors: strong grid infrastructure, reliable cooling water access, rural space and safety buffers, and manageable environmental conditions. These were translated into four features: Capacity (capacity_normalized + large_capacity_bonus), Location (rural_score), Cooling (dedicated_cooling), and Environment (unlined_ash_penalty).
score = 0.45 + 0.17 × Capacity + 0.13 × (Location + Cooling + Environment)
The 0.45 baseline is the starting score for every qualifying plant. The capacity term (weight 0.17) rewards larger plants, which generally have stronger grid connections, and includes a bonus for sites at or above 800 MWe. The combined location, cooling, and environment term (weight 0.13) favors rural sites with dedicated cooling and penalizes issues such as unlined ash ponds.
In practice, raw plant data is used to compute the four features, the formula is applied, and the resulting score (0 to 1) is appended to each test plant's record. Higher scores indicate stronger candidates for nuclear conversion. The process is fully transparent, reproducible, and aligned with DOE engineering and policy criteria.
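As a minimal sketch, the scoring formula can be written as a single function. The weights and feature names come from the text above; the feature scales, the sign convention for `unlined_ash_penalty` (passed here as zero or a negative number), and the final clipping to the 0 to 1 range are assumptions made for illustration.

```python
def suitability_score(capacity_normalized, large_capacity_bonus,
                      rural_score, dedicated_cooling, unlined_ash_penalty):
    """Apply score = 0.45 + 0.17 * Capacity + 0.13 * (Location + Cooling + Environment)."""
    capacity = capacity_normalized + large_capacity_bonus
    location = rural_score
    cooling = dedicated_cooling
    environment = unlined_ash_penalty  # assumed to be <= 0 when a penalty applies

    raw = 0.45 + 0.17 * capacity + 0.13 * (location + cooling + environment)

    # Assumption: clip so the reported score always stays within 0 to 1.
    return min(1.0, max(0.0, raw))


# Hypothetical plant: mid-sized, rural, dedicated cooling, no ash penalty.
example_score = suitability_score(
    capacity_normalized=0.5,
    large_capacity_bonus=0.0,
    rural_score=1.0,
    dedicated_cooling=1.0,
    unlined_ash_penalty=0.0,
)
print(round(example_score, 3))  # 0.45 + 0.085 + 0.26 = 0.795
```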
ROBUSTNESS
Monte Carlo Stress Testing
A single deterministic rank is not enough. Coal2Core runs 1,000 perturbation simulations to test whether a site remains strong when inputs move within realistic bounds. Continuous features including capacity, water distance, transmission distance, population density, retirement year, and hazard metrics are independently perturbed with Gaussian noise scaled to the empirical standard deviation, then clipped to observed data bounds so the simulation does not generate physically implausible plants.
The primary robustness signal is top-decile probability: the share of simulations in which the site lands in the top 10%. A site that stays near the top under uncertainty is more valuable than one that spikes in a single clean run. Sites with a strong deterministic rank but weak top-decile frequency are treated as brittle and are not rewarded.
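The perturbation loop and the top-decile probability can be sketched as below, using only the Python standard library. The site data, the toy scoring function, and the `noise_scale` fraction of the empirical standard deviation are hypothetical; the source does not state how large the noise is relative to that standard deviation, and the real pipeline scores sites with the trained model rather than a toy formula.

```python
import random
import statistics


def top_decile_probability(sites, score_fn, n_sims=1000, noise_scale=0.1, seed=0):
    """Share of simulations in which each site ranks in the top 10%."""
    rng = random.Random(seed)
    feature_names = list(next(iter(sites.values())).keys())

    # Empirical standard deviation and observed bounds for each feature.
    stats = {}
    for name in feature_names:
        values = [features[name] for features in sites.values()]
        stats[name] = (statistics.pstdev(values), min(values), max(values))

    top_k = max(1, len(sites) // 10)
    hits = {site: 0 for site in sites}

    for _ in range(n_sims):
        scores = {}
        for site, features in sites.items():
            perturbed = {}
            for name, value in features.items():
                sd, low, high = stats[name]
                noisy = value + rng.gauss(0.0, noise_scale * sd)
                # Clip to observed bounds so no implausible plant is generated.
                perturbed[name] = min(high, max(low, noisy))
            scores[site] = score_fn(perturbed)
        ranked = sorted(scores, key=scores.get, reverse=True)
        for site in ranked[:top_k]:
            hits[site] += 1

    return {site: hits[site] / n_sims for site in sites}


# Hypothetical fleet of 20 sites with two continuous features.
demo_sites = {
    f"site_{i}": {"capacity_mw": 200.0 + 50.0 * i, "water_km": 1.0 + (i % 5)}
    for i in range(20)
}


def toy_score(features):
    # Stand-in for the trained model: larger and closer to water is better.
    return features["capacity_mw"] / 1000.0 - 0.02 * features["water_km"]


probs = top_decile_probability(demo_sites, toy_score)
for site, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print(site, p)
```

A site whose probability stays close to 1.0 holds its rank under input noise, while a site with a high deterministic rank but a low probability is the brittle case described above.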
1,000: Perturbation simulations per site; ranking persistence under uncertainty is the objective, not just accuracy on paper
ECONOMICS
Financial Impact
Economics are evaluated with one standardized SMR model over a 40-year horizon so site rankings are not distorted by custom assumptions. Fixed inputs: $6,000/kW overnight CapEx, $120/kW-year fixed O&M, $9/MWh variable O&M, 7% discount rate, 93% capacity factor, $90/MWh electricity price.
Three CapEx scenarios are tested: optimistic (0.8×), base case (1.0×), and pessimistic (1.3×). The top candidates remain attractive in the base case, while the pessimistic scenario identifies which sites are financially resilient rather than merely best-case winners.
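The standardized calculation can be sketched per kilowatt of capacity using the fixed inputs listed above. The cash-flow timing is an assumption: overnight CapEx is treated as a single outlay at year 0, net operating revenue arrives at the end of each of the 40 years, and construction time, taxes, and decommissioning are ignored. Any site-specific adjustments in the actual model are not represented.

```python
HOURS_PER_YEAR = 8760


def npv_per_kw(capex_multiplier=1.0, capex=6000.0, fixed_om=120.0,
               variable_om=9.0, discount_rate=0.07, capacity_factor=0.93,
               price=90.0, years=40):
    """Net present value in $ per kW of installed capacity."""
    # Annual energy produced by 1 kW, expressed in MWh.
    mwh_per_kw_year = HOURS_PER_YEAR * capacity_factor / 1000.0

    # Revenue minus variable O&M on each MWh, minus fixed O&M per kW-year.
    annual_net = mwh_per_kw_year * (price - variable_om) - fixed_om

    # Discount end-of-year cash flows for years 1..years.
    present_value = sum(
        annual_net / (1.0 + discount_rate) ** t for t in range(1, years + 1)
    )
    return present_value - capex * capex_multiplier


scenarios = {"optimistic": 0.8, "base": 1.0, "pessimistic": 1.3}
npv_by_scenario = {name: npv_per_kw(mult) for name, mult in scenarios.items()}

for name, value in npv_by_scenario.items():
    print(f"{name}: {value:,.0f} $/kW")
```

Holding every input except the CapEx multiplier fixed means differences between scenarios come only from capital cost, which is the point of using one standardized model.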
$6,000/kW: Overnight CapEx baseline for a standardized 40-year NPV at 7% discount rate and 93% capacity factor
CLIMATE IMPACT
Carbon & Energy Impact
Avoided emissions are estimated by replacing coal generation with SMR output that has near-zero operational emissions, at a 93% capacity factor: CO₂ avoided (tons/yr) = Capacity (MW) × 8,760 h/yr × 0.93 × 1.0 ton CO₂/MWh.
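A small worked example of the formula, for a single hypothetical 1,000 MW site. The 0.93 capacity factor and the 1.0 ton/MWh emission factor are taken from the text; the plant size is illustrative only.

```python
HOURS_PER_YEAR = 8760


def annual_generation_mwh(capacity_mw, capacity_factor=0.93):
    """Annual electricity output in MWh."""
    return capacity_mw * HOURS_PER_YEAR * capacity_factor


def co2_avoided_tons(capacity_mw, capacity_factor=0.93, emission_factor=1.0):
    """Annual CO2 avoided in tons, given a displaced-coal factor in tons per MWh."""
    return annual_generation_mwh(capacity_mw, capacity_factor) * emission_factor


# Hypothetical 1,000 MW site.
generation = annual_generation_mwh(1000.0)
avoided = co2_avoided_tons(1000.0)

print(f"{generation / 1e6:.2f} TWh/yr")   # 8.15 TWh/yr
print(f"{avoided / 1e6:.2f} Mt CO2/yr")   # 8.15 Mt CO2/yr
```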
The top 50 ranked sites represent approximately 83 GW of capacity, translating to roughly 671 TWh of clean generation per year, far exceeding the ~100 TWh/yr AI clean gap. Even a targeted subset of high-ranking coal sites could materially reduce the carbon cost of AI expansion while adding firm, dispatchable grid capacity.
~83 GW: Combined capacity of the top 50 ranked sites, or ~671 TWh/yr of clean generation, closing the AI gap several times over