On February 3rd, we published a blog post titled "The 71.9% Edge." In it, we claimed that BV-7X's signal model achieved 71.9% directional accuracy and described this, with considerable enthusiasm, as "a printing press" for binary prediction markets.

That number was wrong. This post explains what happened, what the model actually achieves, and what we have done about it.


Where 71.9% Came From

The figure was not, as the blog post implied, a validated backtest result. It was the output of a logistic transformation—a mathematical function that converts a raw composite score into a probability estimate. The model scored its inputs, adjusted for the current market regime, passed the result through a sigmoid function, and produced 71.9%. It was the model's self-assessed confidence in its own signal.
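For readers who want the mechanics, here is a minimal sketch of that transformation; the score and regime adjustment are illustrative placeholders, not the production inputs:

```python
import math

def signal_confidence(composite_score: float, regime_adjustment: float) -> float:
    """Squash a raw composite score into a pseudo-probability via a sigmoid.

    The output is the model's self-assessed confidence in its signal,
    not a measured accuracy -- the distinction at the heart of this post.
    """
    z = composite_score + regime_adjustment
    return 1.0 / (1.0 + math.exp(-z))

# A raw score near 0.94 maps to roughly 0.719 -- a confidence reading,
# not a backtest result.
print(round(signal_confidence(0.94, 0.0), 3))  # 0.719
```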

This is the equivalent of asking a student how well they think they did on an exam and reporting their answer as the grade. It tells you something about the student's calibration. It tells you nothing about their actual performance.

The production validation—a rolling-window out-of-sample test using 40,000 Wild Bootstrap iterations—had already computed the actual number. It was 59.2%. This figure was available in the codebase at the time the blog post was written. It was not used.
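The shape of that validation, in a simplified sketch (names are hypothetical, and the 40,000-iteration Wild Bootstrap layer is omitted): the model is scored one step at a time, only on data it has never seen.

```python
from typing import Callable, Sequence

def rolling_oos_accuracy(
    outcomes: Sequence[int],        # 1 if price rose over the horizon, else 0
    predict: Callable[[int], int],  # prediction at index t, fit on data before t
    train_size: int = 500,          # initial in-sample window (illustrative)
) -> float:
    """Walk forward through history, scoring each prediction out-of-sample."""
    hits = total = 0
    for t in range(train_size, len(outcomes)):
        hits += int(predict(t) == outcomes[t])
        total += 1
    return hits / total if total else float("nan")
```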

Then the model went live, and the real examination began. Over twenty-one predictions under the v3 architecture, BV-7X achieved 33.3% accuracy—seven correct calls. Worse than a coin flip. The model's confidence in itself bore no relationship to its actual performance.


What Went Wrong with v3

The v3 model suffered from three compounding failures, each damaging on its own and fatal in combination:

  • Hardcoded indicators. The 200-day moving average was stored as a static value of $88,000. The actual value at the time was $103,000. The RSI was hardcoded at 50 when the true reading was 26.7. The model was not observing the market. It was observing a snapshot from weeks earlier.
  • No momentum signal. The model had three inputs—trend, flow, and value—but no mechanism for detecting that prices were falling rapidly in the short term. It could read long-term positioning. It was deaf to the crash happening in real time.
  • Misrepresented validation. The 71.9% figure was a self-assessed probability, not a measured outcome. The distinction is fundamental to quantitative research, and conflating the two was a methodological error.


The Rebuild: v4.0 Through v4.3

Between February 5th and February 10th, the model underwent four iterations. Each addressed specific, identified failures rather than adding complexity for its own sake.

v4.0 added a fourth signal—momentum—and replaced all hardcoded indicators with live calculations from CoinGecko daily price data. It added capitulation protection: when Fear & Greed drops below 20 and RSI falls under 35, the model blocks sell signals on the principle that extreme selling pressure is more likely to reverse than continue.
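The capitulation gate reduces to a few lines. A sketch, with illustrative signal names:

```python
def capitulation_block(signal: str, fear_greed: float, rsi: float) -> str:
    """v4.0 capitulation protection: in extreme fear (Fear & Greed < 20)
    with deeply oversold momentum (RSI < 35), selling pressure is more
    likely to reverse than continue, so sell signals are withheld."""
    if signal == "SELL" and fear_greed < 20 and rsi < 35:
        return "HOLD"
    return signal

print(capitulation_block("SELL", fear_greed=14, rsi=26.7))  # HOLD
```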

v4.2 introduced three research-driven refinements: a funding rate modifier drawn from CoinGlass perpetual swap data, a supply-in-profit filter based on feature selection work by Omole & Enke (2024), and a selectivity gate that withholds signals when the model's internal confidence falls below a threshold. The thesis was simple: if the model is uncertain, it should say nothing.
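The selectivity gate is the simplest of the three refinements. A sketch, with a placeholder threshold (the production value is not reproduced here):

```python
from typing import Optional

def selective_signal(signal: str, confidence: float,
                     threshold: float = 0.55) -> Optional[str]:
    """v4.2 selectivity gate: below the confidence threshold, say nothing."""
    return signal if confidence >= threshold else None
```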

v4.3 was informed by the most rigorous test we have conducted to date: one thousand bootstrap backtests across 4,446 days of historical data. The results exposed five specific failure modes, each of which was fixed with a targeted intervention rather than a wholesale redesign.


The 1,000 Backtest Audit

A bootstrap backtest works by resampling historical data with replacement—drawing random subsets of the full dataset and testing the model against each one. Run a thousand times, it produces a distribution of outcomes rather than a single point estimate. This is the difference between saying "the model is 55% accurate" and saying "the model is 55% accurate, and we are 95% confident the true figure lies between 52% and 59%." The latter is honest. The former is a number pretending to be more precise than it is.
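A simplified version of the procedure, resampling the per-prediction hit/miss record rather than re-running the model on resampled price windows, shows how the interval arises:

```python
import random

def bootstrap_accuracy_ci(hits: list[int], n_boot: int = 1000,
                          alpha: float = 0.05) -> tuple[float, float]:
    """Resample the hit/miss record with replacement and return the
    percentile confidence interval of accuracy across the runs."""
    n = len(hits)
    accs = sorted(sum(random.choices(hits, k=n)) / n for _ in range(n_boot))
    return accs[int(alpha / 2 * n_boot)], accs[int((1 - alpha / 2) * n_boot) - 1]

# A record with 556 hits in 1,000 predictions yields an interval
# in the neighbourhood of [52%, 59%].
print(bootstrap_accuracy_ci([1] * 556 + [0] * 444))
```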

The v4.2 results were sobering:

Metric                     v4.2 Result
Baseline accuracy          55.6%
95% confidence interval    [52.4%, 58.9%]
Bull market accuracy       45.5%
2023 accuracy              29.2%
Dead zones (0 signals)     5 windows
Max losing streak          16 consecutive

The model was performing below a coin flip during bull markets. It generated zero signals for the entirety of 2016 and 2017—two full years of silence because its buy rule required ETF flow data that did not exist before 2024. It sold into the 2023 recovery rally with 29.2% accuracy, getting it wrong seven times out of ten. Its confidence calibration was broken: at its highest confidence level of 0.65, actual accuracy was 52.5%.

The dominant failure mode was clear. Of the model's errors, 280 were wrong sell signals with an average forward return of +5.3%. The model's primary sin was selling before rallies.


Five Fixes

Each fix targeted a specific, empirically documented problem.

1. Buy signals no longer require ETF flows.

The original buy rule demanded both a bullish trend and bullish ETF flows. Since ETF data only exists from January 2024, the model could not produce a single buy signal for the first decade of Bitcoin's history. The fix allows a weaker buy when trend and momentum align, even without flow confirmation.
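In code, the change is one added branch. A sketch with illustrative names:

```python
from typing import Optional

def buy_rule_v43(trend_bullish: bool, momentum_bullish: bool,
                 flows_bullish: Optional[bool]) -> Optional[str]:
    """Fix 1: a full BUY still wants flow confirmation, but when ETF flow
    data is missing (pre-2024) or neutral, aligned trend and momentum
    are enough for a weaker buy."""
    if trend_bullish and flows_bullish:
        return "BUY"
    if trend_bullish and momentum_bullish:
        return "WEAK_BUY"  # new in v4.3: no flow confirmation required
    return None
```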

2. Anti-reversal filter.

If the seven-day rate of change is positive—meaning price is actually rising—the model no longer issues sell signals. The logic is elemental: do not bet against the direction price is already moving in the near term.
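A sketch of the filter, assuming a list of daily closing prices:

```python
def anti_reversal_block(signal: str, daily_closes: list) -> str:
    """Fix 2: if the 7-day rate of change is positive -- price is already
    rising -- suppress the sell rather than bet against near-term direction.
    Expects at least eight daily closes, most recent last."""
    roc_7d = (daily_closes[-1] - daily_closes[-8]) / daily_closes[-8]
    if signal == "SELL" and roc_7d > 0:
        return "HOLD"
    return signal
```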

3. RSI dead zone gate.

In the RSI 30–40 range, the model's error rate was 52.5%—effectively random. The fix requires higher confidence before signalling in this zone, filtering out the noise.
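A sketch of the gate; the 0.70 bar below is illustrative, not the production value:

```python
def rsi_dead_zone_gate(signal: str, rsi: float, confidence: float,
                       raised_bar: float = 0.70) -> str:
    """Fix 3: inside the RSI 30-40 band the model was effectively random,
    so a signal there must clear a higher confidence bar to be emitted."""
    if signal != "HOLD" and 30 <= rsi <= 40 and confidence < raised_bar:
        return "HOLD"
    return signal
```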

4. Confidence recalibration.

The model was systematically overconfident. A confidence reading of 0.65 corresponded to only 52.5% real accuracy. A scaling factor now compresses confidence estimates toward realistic levels.
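One way to express such a compression, with an illustrative scaling factor (the production factor is not reproduced here):

```python
def recalibrate(raw_confidence: float, scale: float = 0.5) -> float:
    """Fix 4: compress confidence toward 0.5 to counter systematic
    overconfidence. With scale = 0.5, a raw 0.65 becomes 0.575 --
    closer to the ~52.5% accuracy observed at that level."""
    return 0.5 + scale * (raw_confidence - 0.5)

print(recalibrate(0.65))  # 0.575
```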

5. Bull market sell suppression.

When price is more than 20% above the 200-day moving average—an unambiguous uptrend—the model now suppresses sell signals unless extreme conditions are present. This addressed the 45.5% bull market accuracy directly.
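The suppression rule in sketch form:

```python
def bull_sell_suppression(signal: str, price: float, ma_200: float,
                          extreme_conditions: bool = False) -> str:
    """Fix 5: more than 20% above the 200-day moving average is an
    unambiguous uptrend; suppress sells there unless the model has
    separately flagged extreme conditions."""
    if signal == "SELL" and price > 1.20 * ma_200 and not extreme_conditions:
        return "HOLD"
    return signal
```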


Where We Stand

After implementing these five corrections, we ran the full thousand-backtest suite again. The results:

Metric                     v4.2              v4.3
Baseline accuracy          55.6%             60.3%
95% confidence interval    [52.4%, 58.9%]    [57.7%, 63.2%]
Bull market accuracy       45.5%             57.1%
Bear market accuracy       62.1%             66.7%
2023 accuracy              29.2%             52.3%
Dead zones                 5 windows         0
Actionable signals         915               1,363
Max losing streak          16                15

The lower bound of the 95% confidence interval is now 57.7%. This means that even under pessimistic resampling, the model performs meaningfully above chance. The upper bound is 63.2%. The true accuracy, with high confidence, lies somewhere in this range.

This is not 71.9%. It was never going to be 71.9%. The published academic literature on Bitcoin direction prediction suggests that 60–65% on a seven-day horizon, using rule-based systems against real-time data, is approximately what careful methodology can deliver. Machine learning approaches occasionally report higher figures, but these typically use next-day horizons, in-sample evaluation, or both.


What 60% Actually Means

A 60.3% accuracy on binary directional predictions produces a mathematical edge of 10.3 percentage points over chance. Over one thousand trades, the probability of this being luck is vanishingly small. Applied with even rudimentary position sizing, it compounds.
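That claim is checkable. Under the null hypothesis of a fair coin, the exact binomial tail probability of 603 or more correct calls in 1,000 attempts works out to roughly one in a hundred billion:

```python
from math import comb

n, k = 1000, 603
# Exact binomial tail: the chance a fair coin gets 603+ of 1,000 right.
p_luck = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"{p_luck:.1e}")  # on the order of 1e-11
```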

But we are not going to call it a printing press. A 60% edge is a real but modest advantage that requires discipline, patience, and continuous monitoring to exploit. It will suffer losing streaks of up to fifteen consecutive wrong calls. It will underperform in choppy, sideways markets. It will, occasionally, look like it has stopped working entirely.

The correct analogy is not a printing press. It is a slightly weighted coin. Flip it once, you learn nothing. Flip it a thousand times, the edge emerges. Flip it ten thousand times, and the compounding becomes difficult to ignore.


What Remains Broken

Transparency requires acknowledging what the model still cannot do.

  • 2021 remains at 46.9% accuracy. The May crash, summer recovery, and November collapse created whiplash that no momentum-based system handles gracefully.
  • Recent performance is mediocre. The 2024–2025 period sits around 53–55%—barely above baseline. Choppy, range-bound markets are the model's weakest environment.
  • Confidence calibration is still imperfect. At the highest confidence bucket, actual accuracy drops to 46.9%. The model is most wrong precisely when it is most certain.
  • Fifteen consecutive wrong calls is still the worst-case losing streak. Any position sizing strategy must survive this.


The Standard We Are Setting

This post exists because the original claim was wrong and correcting it matters more than the embarrassment of admitting it.

In crypto, projects routinely publish inflated accuracy metrics, cherry-picked backtests, and self-assessed confidence scores as though they were validated results. The 71.9% blog post was, inadvertently, an instance of the same practice. Not fraudulent—the mistake was methodological, not intentional—but the effect on the reader is the same regardless of intent.

Going forward, every accuracy claim from BV-7X will be accompanied by three things: the sample size, the confidence interval, and the methodology used to compute it. A single number without context is worse than no number at all.

The model is v4.3.0. Its backtest accuracy is 60.3%, with a 95% confidence interval of [57.7%, 63.2%], computed across 1,000 bootstrap iterations on 4,446 days of historical data using the parsimonious four-signal architecture. Its live track record under this version is zero predictions resolved. That will change over the coming weeks, and the results—whatever they are—will be published on the scorecard without editorial commentary.

We owe our holders and users accurate information, not optimistic information. This is the accurate information.


Verify Everything

The scorecard, signal methodology, backtest results, and full bootstrap analysis are all public. Audit the data yourself.

View Live Signal →

Mischa0X
Building: BitVault, VaultCraft, BV-7X
Previously: Popcorn DAO, IKU Protocol, DrPepe.ai