Sealed Before the Event: A Publicly…
A one-operator forecasting record ran the scientific method in public for three years: predictions sealed cryptographically before events, graded against externally recorded outcomes, misses kept in, and every analysis recomputable by anyone. This paper documents the verification protocol, the 68-call graded record, the pre-specified analyses, the alternative explanations that failed, and the honest limits — with the generating method treated, deliberately, as a black box.
**Abstract.** Public forecasting suffers a verification gap: claims of foresight are typically asserted after the fact, graded by their author against shifting standards, or hosted on platforms whose question formats exclude high-specificity calls. This paper documents a protocol that closes the verifiable half of that gap, and the record produced by running it publicly since 2023: 68 graded forecasts — 23 in launch mission assurance across five competing providers, 45 in strategic and geopolitical warning — each sealed before its event (SHA-256 over the verbatim claim, timestamped by a public platform, anchored to the Bitcoin blockchain via OpenTimestamps), graded against externally recorded outcomes under a published rubric, with misses retained at full weight and the complete grading ledger frozen and independently recomputable. Pre-specified analyses are reported: a clustered significance test (36 of 44 independent events strict-successful; exact binomial p = 1.3×10⁻⁵ at a coin-flip floor per event; break-even luck-prior ≥ 0.695), a signal-detection decomposition (zero false all-clears, zero missed catastrophes; the only errors are two severity overcalls, both self-penalized), a 23-row dose-response table with its two gradient-breakers shown in place, provider-neutrality and natural-experiment results, and, on the warning desk, lead-time and specificity structure for which no official-channel comparator existed in most cases. Alternative explanations — fabrication, insider information, self-fulfilling prophecy, Barnum vagueness — are addressed by construction or by natural experiment; the strongest surviving mundane hypothesis is named, and the evidence bearing on it is stated together with what cannot discriminate it. The generating method is proprietary and deliberately undisclosed; the paper evaluates the record as a black box. The aggregate Brier score (0.0717) is conceded to be matched by a base-rate baseline on this high-confidence sample; the record's information lives in specificity, lead time, and the intensity structure documented here. All data, seals, and analyses are public and recomputable without trusting the author.
## 1. Introduction
Science's demand of any forecaster is procedural, not credential: state the prediction before the event, in falsifiable terms; let reality adjudicate; keep the failures. Feynman's formulation is the shortest: guess, compute the consequences, compare with experiment — "if it disagrees with experiment, it's wrong." Almost no public forecasting meets this demand. Retrospective claims dominate; graded records are rare; records graded by third parties against pre-registered, high-specificity claims are rarer still. The prediction platforms that do enforce pre-registration (prediction markets, forecasting tournaments) restrict themselves to consensus-resolvable question formats, which by construction exclude the class of forecast documented here: a named mechanism, a named place, a dated window, sealed months before any event existed to observe.