The Setup

Kalshi has run daily Bitcoin prediction markets since June 2024. Each market asks the same question: will BTC be higher or lower in 7 days? When the window closes, the market settles. Up or down. Binary outcome, publicly recorded, no ambiguity.

561 markets have settled. We took the BV-7X V5 model — four macro signals (trend, momentum, flow, value) with a 7-day horizon — and ran it against every Kalshi observation where we had overlapping data. 520 matched days, from June 2024 through January 2026. Ground truth is Kalshi's own settlement brackets, not ours.

This is the head-to-head.

What this comparison does

It compares BV-7X's signal on day T against the actual 7-day outcome — which is what Kalshi settles on. The Kalshi data provides the ground truth timeline, not the crowd's implied odds.

What it does not do: compare against Kalshi's implied probability at event open. The dataset contains only settlement outcomes, not crowd probabilities. The "48.5% base rate" is simply the percentage of 7-day windows that resolved UP — the accuracy of a naive "always predict UP" strategy, not a crowd consensus figure.

The framing: BV-7X 59.7% vs coin flip 50% vs naive-UP 48.5%, all measured against Kalshi's settled 7-day brackets as ground truth.
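As a rough illustration of that scoring rule, the sketch below counts a BUY as correct when the 7-day window settles UP, a SELL as correct when it settles DOWN, and leaves HOLD days out of the denominator. The function name and the toy data are hypothetical; the real pipeline reads Kalshi's settlement brackets rather than hand-typed labels.

```python
def score_signals(signals, outcomes):
    """Score daily signals against settled 7-day outcomes.

    signals:  list of "BUY" / "SELL" / "HOLD", one per day T
    outcomes: list of "UP" / "DOWN", the settled result for the window opened on day T
    """
    acted = correct = 0
    for sig, out in zip(signals, outcomes):
        if sig == "HOLD":
            continue  # HOLD days are not counted as bets
        acted += 1
        if (sig == "BUY" and out == "UP") or (sig == "SELL" and out == "DOWN"):
            correct += 1
    return {
        "acted": acted,
        "correct": correct,
        "accuracy": correct / acted if acted else None,
        # Accuracy of always predicting UP, measured over every day
        "naive_up": sum(o == "UP" for o in outcomes) / len(outcomes),
    }


# Toy data for illustration only, not taken from the dataset
signals = ["BUY", "SELL", "HOLD", "BUY", "SELL"]
outcomes = ["UP", "DOWN", "UP", "DOWN", "UP"]
result = score_signals(signals, outcomes)
print(result)
```

Note that the naive-UP figure is computed over all days, while model accuracy is computed only over days with an actionable signal.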


The Baseline

Coin flip: 50%. The base rate across all 561 settled Kalshi markets: 48.5% resolved UP. On our 520 overlapping days: 50.6% resolved UP. Bitcoin on a weekly horizon is effectively a coin toss. Any sustained accuracy above 50% is signal. Everything below is noise dressed up as conviction.


Results

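| Metric | BV-7X V5 | Coin Flip | Naive UP (Kalshi base rate) |
| --- | --- | --- | --- |
| Accuracy | 59.7% (273/457) | 50.0% | 48.5% (272/561) |
| BUY accuracy | 56.6% (146/258) | n/a | n/a |
| SELL accuracy | 63.8% (127/199) | n/a | n/a |
| Signal rate | 87.9% (457/520) | 100% | 100% |
| Edge vs flip | +9.7pp | n/a | -1.5pp |
| Kelly fraction | 19.5% | 0% | 0% |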

457 actionable signals out of 520 days. 63 HOLDs. The model chose not to play 12.1% of the time. When it played, it was right 59.7% of the time. A naive always-UP strategy played every day and landed below 50%.
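The headline figures follow directly from the raw counts. A minimal recomputation, using only the counts quoted in the Results table (variable names are ours):

```python
# Counts as reported in the Results table
days = 520           # matched days
signals = 457        # BUY or SELL signals
correct = 273        # signals that matched the settled outcome
up_markets, all_markets = 272, 561  # settled UP / all settled Kalshi markets

holds = days - signals
accuracy = correct / signals
signal_rate = signals / days
hold_rate = holds / days
edge_pp = (accuracy - 0.5) * 100          # percentage points over a coin flip
naive_up = up_markets / all_markets
naive_edge_pp = (naive_up - 0.5) * 100

print(f"HOLDs:        {holds}")
print(f"Accuracy:     {accuracy:.1%}")
print(f"Signal rate:  {signal_rate:.1%}")
print(f"HOLD rate:    {hold_rate:.1%}")
print(f"Edge vs flip: {edge_pp:+.1f}pp")
print(f"Naive UP:     {naive_up:.1%} ({naive_edge_pp:+.1f}pp)")
```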


The SELL Edge

SELL accuracy at 63.8% across 199 signals is the standout. The V5 strength-mode SELL gate plus a flow-confirmation filter requires institutional selling conviction before calling a decline. When trend turns bearish but ETF flows haven't confirmed, the model stands aside rather than forcing a weak sell. It doesn't short Bitcoin recklessly. It requires convergence from multiple independent indicators before calling a 7-day decline.

BUY at 56.6% across 258 signals. Both sides are well above the coin flip baseline. 457 out of 520 days produced an actionable signal — the model sits out 12% of the time, mostly during ambiguous regime transitions, and still maintains a positive edge on both sides of the trade.
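One way to put a number on "well above the coin flip baseline" is a one-sided exact binomial test against p = 0.5. The sketch below is our own check, not part of the BV-7X pipeline. It assumes each signal is an independent trial. Daily signals on overlapping 7-day windows are correlated, so the resulting p-values are optimistic and should be read as a lower bound.

```python
from math import comb


def binom_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p), computed exactly."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts from the Results table
p_all = binom_tail(273, 457)
p_buy = binom_tail(146, 258)
p_sell = binom_tail(127, 199)

for name, pval in [("ALL", p_all), ("BUY", p_buy), ("SELL", p_sell)]:
    print(f"{name:4s} one-sided p-value vs coin flip: {pval:.2e}")
```

Under the independence assumption, the SELL side is the stronger of the two results and the BUY side the weaker, which matches the ordering of the accuracies.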


Month by Month

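| Month | Signals | Correct | Accuracy | HOLDs | Base UP |
| --- | --- | --- | --- | --- | --- |
| Jun 2024 | 15 | 9 | 60.0% | 0 | 13.3% |
| Jul 2024 | 17 | 9 | 52.9% | 5 | 54.5% |
| Aug 2024 | 13 | 9 | 69.2% | 9 | 40.9% |
| Sep 2024 | 14 | 7 | 50.0% | 6 | 65.0% |
| Oct 2024 | 20 | 12 | 60.0% | 3 | 73.9% |
| Nov 2024 | 21 | 18 | 85.7% | 0 | 85.7% |
| Dec 2024 | 29 | 15 | 51.7% | 2 | 51.6% |
| Jan 2025 | 31 | 17 | 54.8% | 0 | 29.0% |
| Feb 2025 | 28 | 19 | 67.9% | 0 | 39.3% |
| Mar 2025 | 27 | 17 | 63.0% | 4 | 38.7% |
| Apr 2025 | 20 | 10 | 50.0% | 10 | 76.7% |
| May 2025 | 31 | 19 | 61.3% | 0 | 61.3% |
| Jun 2025 | 28 | 17 | 60.7% | 2 | 63.3% |
| Jul 2025 | 31 | 16 | 51.6% | 0 | 51.6% |
| Aug 2025 | 20 | 9 | 45.0% | 11 | 41.9% |
| Sep 2025 | 26 | 14 | 53.8% | 4 | 66.7% |
| Oct 2025 | 28 | 15 | 53.6% | 3 | 29.0% |
| Nov 2025 | 29 | 24 | 82.8% | 1 | 26.7% |
| Dec 2025 | 28 | 17 | 60.7% | 3 | 51.6% |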

The edge is distributed. No single month carries the aggregate. Best months: Nov 2024 (85.7%), Nov 2025 (82.8%), Aug 2024 (69.2%). Only one month below 50%: Aug 2025 (45.0%). Two months land exactly on 50.0%: Sep 2024 and Apr 2025. Months that previously struggled (Sep 2024, Dec 2024, Oct 2025) now sit at or above 50% thanks to the correction override and flow-confirmation gate. The model's remaining weak spot is choppy sideways action where no trend-following system has an edge.


Caveats

The 59.7% is an in-sample number. The V5 model's thresholds were optimized on historical data that overlaps this Kalshi period. That means the comparison flatters the model. We know this and we're saying it upfront.

The less biased estimate comes from walk-forward testing: 19 expanding-window folds, each with a mini grid search on the training set and a 180-day out-of-sample test window. That number is 61.2% (764/1,248). The out-of-sample accuracy actually exceeds the in-sample Kalshi number. That's unusual and worth pausing on.
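For readers unfamiliar with the procedure, here is a minimal sketch of expanding-window walk-forward testing. It is not the BV-7X code. The function names, the 365-day initial training length, the default of non-overlapping test windows, and the toy scoring function are all assumptions made for illustration. Only the fold count (19) and test length (180 days) come from the text above.

```python
def expanding_window_folds(initial_train, n_folds=19, test_len=180, step=None):
    """Return (train_days, test_days) index ranges for each fold.

    Training always starts at day 0 and grows with each fold. The test
    window is the `test_len` days immediately after the training set.
    """
    step = test_len if step is None else step  # assumption: windows do not overlap
    folds = []
    for i in range(n_folds):
        train_end = initial_train + i * step
        folds.append((range(0, train_end), range(train_end, train_end + test_len)))
    return folds


def walk_forward(folds, grid, score):
    """Pick the best grid value on each training set, then score it out of sample."""
    results = []
    for train, test in folds:
        best = max(grid, key=lambda g: score(g, train))  # mini grid search
        results.append((best, score(best, test)))        # untouched test window
    return results


def toy_score(threshold, days):
    """Hypothetical stand-in for 'accuracy of this threshold on these days'."""
    return 1.0 - abs(threshold - 0.3)


folds = expanding_window_folds(initial_train=365)  # 365 is an assumed value
results = walk_forward(folds, grid=[0.1, 0.3, 0.5], score=toy_score)
print(len(folds), folds[0], folds[-1])
```

The key property is that thresholds are chosen using only days before each test window, so the test-window accuracy is never contaminated by the tuning step.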

Overfitting is the cardinal sin of quantitative modelling. It means your model has memorized the past rather than learned from it: it performs beautifully on historical data and collapses on new data. The telltale sign is a large gap between in-sample accuracy and out-of-sample accuracy. BV-7X shows the opposite: 59.7% in-sample (273/457), 61.2% out-of-sample (764/1,248). The two figures come from different samples, so they are not a like-for-like pair, but the absence of an out-of-sample drop suggests the model isn't fitting to noise. It's capturing something structural about how Bitcoin moves on a weekly horizon: trend persistence, institutional flow confirmation, mean reversion at extremes. Four macro signals, simple thresholds, no neural nets, no hundred-parameter black boxes. Parsimony is the best defence against overfitting, and this model was built parsimonious from day one.


What This Means

+9.7 percentage points over a coin flip. At even-money odds, the Kelly criterion (2p - 1, with p = 59.7%) says 19.5% of bankroll per bet. On Kalshi's weekly BTC markets, that's a quantifiable, repeatable edge. The model doesn't predict the future; it tilts the odds. Over hundreds of bets, tilted odds compound. That's the thesis. 520 days of data say the tilt is real.
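The 19.5% figure is what the Kelly formula gives for an even-money bet, where a win pays 1:1. A short sketch of that arithmetic follows, plus the general form for a binary contract bought at a price other than 50 cents. The function names are ours, fees are ignored, and treating the 59.7% hit rate as the true win probability is an assumption.

```python
def kelly_even_money(p):
    """Kelly fraction for a 1:1 payoff: f = p - (1 - p) = 2p - 1."""
    return max(0.0, 2 * p - 1)


def kelly_binary_contract(p, price):
    """Kelly fraction for a contract that costs `price` and pays 1 if correct.

    Net odds are b = (1 - price) / price, so f = p - (1 - p) / b,
    which simplifies to (p - price) / (1 - price).
    """
    return max(0.0, (p - price) / (1 - price))


p = 273 / 457  # in-sample hit rate from the Results table

print(f"Even money:       {kelly_even_money(p):.1%}")   # 19.5%
print(f"Contract at 0.50: {kelly_binary_contract(p, 0.50):.1%}")
print(f"Contract at 0.55: {kelly_binary_contract(p, 0.55):.1%}")
print(f"Contract at 0.65: {kelly_binary_contract(p, 0.65):.1%}")
```

The fraction shrinks as the contract price approaches the model's hit rate and drops to zero once the price exceeds it, so the 19.5% figure applies only when the contract trades near 50 cents.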


See the Signal

~60% accuracy across 520 days vs Kalshi. Live signals updated daily.

Mischa0X
Building BV-7X — an autonomous AI oracle for Bitcoin macro signals.
Previously: derivatives infrastructure, quantitative research.