Empirical validation of Fibonacci levels: anchor selection, microstructure interactions, and null models
Looking to design a robust test of whether Fibonacci retracement/extension levels add predictive value beyond standard market structure signals and round-number effects. Most backtests I see bake in subjective anchor selection and lax event definitions, inflating hit rates. Proposal and questions:
Proposed methodology
Anchors
- Compare three algorithmic anchor schemes on intraday and swing horizons:
1) ZigZag with an adaptive reversal threshold in ATR units (e.g., 1.5-3 ATR) and a fixed lookback (see the sketch after this list).
2) Bayesian change-point or structural break detection on log returns to define regime legs.
3) Volume-weighted swings: endpoints that minimize cumulative signed volume imbalance while maximizing directional delta.
- Anchor normalization: log-price distance; de-duplicate overlapping swings via interval scheduling to avoid double counting.
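To make scheme 1 concrete, here's a minimal sketch in Python/pandas. The simple rolling-mean ATR (rather than Wilder smoothing), the 2.0 default multiplier, and the column handling are all assumptions, not a reference implementation:

```python
import numpy as np
import pandas as pd

def atr(high, low, close, n=14):
    # Simple rolling-mean ATR over the true range (Wilder smoothing would also work).
    prev = close.shift(1)
    tr = pd.concat([high - low, (high - prev).abs(), (low - prev).abs()],
                   axis=1).max(axis=1)
    return tr.rolling(n).mean()

def zigzag_atr(high, low, close, mult=2.0, n=14):
    """Swing anchors as (bar index, price, +1 swing high / -1 swing low).
    A pivot is confirmed only once price reverses mult * ATR from the
    running extreme -- so for live use, an anchor is usable only from its
    confirmation bar onward, never from the pivot bar itself."""
    a = atr(high, low, close, n)
    pivots, direction = [], +1            # assume the first leg is up
    ext_i = close.index[n]
    ext_p = high.loc[ext_i]
    for i in close.index[n + 1:]:
        thr = mult * a.loc[i]
        if np.isnan(thr):
            continue
        if direction == +1:               # tracking a prospective swing high
            if high.loc[i] >= ext_p:
                ext_i, ext_p = i, high.loc[i]
            elif ext_p - low.loc[i] >= thr:
                pivots.append((ext_i, ext_p, +1))
                direction, ext_i, ext_p = -1, i, low.loc[i]
        else:                             # tracking a prospective swing low
            if low.loc[i] <= ext_p:
                ext_i, ext_p = i, low.loc[i]
            elif high.loc[i] - ext_p >= thr:
                pivots.append((ext_i, ext_p, -1))
                direction, ext_i, ext_p = +1, i, high.loc[i]
    return pivots
```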
Levels and event definition
- Retracements: 0.236, 0.382, 0.5, 0.618, 0.786; extensions: 1.272, 1.618, 2.0.
- Touch tolerance scaled by microstructure noise: the max of 2-3 ticks, 0.1-0.2 × the recent bid-ask spread, and 0.1 ATR.
- Reaction criterion: after first touch, adverse excursion stays below the distance to the next level while favorable excursion reaches ≥ k ATR (k = 0.5, 1.0) within T bars; alternatively, survival analysis on the reversal hazard as a function of distance to level (labeling sketch below).
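A sketch of the level builder and first-touch labeler under these definitions. It treats the level as support tested from above (the short side mirrors it); the long-side bias and every argument name are placeholder assumptions:

```python
import numpy as np

RETRACEMENTS = (0.236, 0.382, 0.5, 0.618, 0.786)
EXTENSIONS = (1.272, 1.618, 2.0)

def fib_levels(leg_start, leg_end, ratios=RETRACEMENTS):
    # Levels measured back from leg_end toward leg_start (use log prices
    # if following the anchor-normalization convention above).
    return {r: leg_end - r * (leg_end - leg_start) for r in ratios}

def label_first_touch(high, low, level, tol, k, atr_now, next_dist, T):
    """1 if the reaction qualifies, 0 if not, None if never touched.

    Touch   = any bar whose [low, high] range enters [level - tol, level + tol].
    Qualify = within T bars of the touch, favorable excursion >= k * ATR
              while adverse excursion stays below the distance to the next level.
    """
    touched = np.where((low <= level + tol) & (high >= level - tol))[0]
    if touched.size == 0:
        return None
    t = touched[0]
    end = min(t + T, len(high))
    fav = high[t:end].max() - level       # bounce off support (long side)
    adv = level - low[t:end].min()        # penetration through the level
    return int(fav >= k * atr_now and adv < next_dist)
```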
Null models and controls
- Equal-interval control grid: 0.25, 0.5, 0.75 (or quantiles of the prior leg's return); see the grid sketch after this list.
- ATR-multiple grid: k × 0.5 ATR; and round-number grid (price ending in 00/50 for FX; big figure for futures).
- Randomized anchor rotation within the same leg length distribution to control for data-snooping on anchor placement.
- Confluence controls: prior day high/low, session VWAP and anchored VWAP at swing endpoints, high-volume nodes (VPVR), options gamma exposure nodes (for index/large-cap).
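A sketch of the grid construction and the anchor-rotation null; the six ATR rungs and the 50-tick round-number step are illustrative assumptions (for FX with tick = one pip, 50 ticks gives the 00/50 figures):

```python
import numpy as np

def control_grids(leg_start, leg_end, atr_now, tick):
    """Fib levels alongside the three control grids for one leg."""
    leg = leg_end - leg_start
    grids = {
        "fib":   [leg_end - r * leg for r in (0.236, 0.382, 0.5, 0.618, 0.786)],
        "equal": [leg_end - r * leg for r in (0.25, 0.5, 0.75)],
        "atr":   [leg_end - k * 0.5 * atr_now for k in range(1, 7)],
    }
    # Round-number grid: multiples of 50 ticks falling inside the leg's range.
    lo_p, hi_p = sorted((leg_start, leg_end))
    step = 50 * tick
    grids["round"] = list(np.arange(np.ceil(lo_p / step) * step, hi_p, step))
    return grids

def rotated_anchor(leg_lengths, prices, rng):
    """Anchor-rotation null: a pseudo-leg whose length is bootstrapped from
    the empirical leg-length distribution, dropped at a random start."""
    L = int(rng.choice(leg_lengths))
    s = int(rng.integers(0, len(prices) - L))
    return prices[s], prices[s + L]
```

Here `rng` is a numpy `default_rng` instance; comparing hit statistics on `grids["fib"]` against the other three keys, plus many `rotated_anchor` draws, gives the null distributions.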
Microstructure layer (where L2 order book data is available)
- Measure passive liquidity density and imbalance within ±x ticks of each level; detect icebergs via replenishment patterns; track sweep frequency and fill-ratio changes pre/post touch (depth sketch below).
- Queue position and slippage analysis for limit entries placed one tick ahead vs at the level.
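For the depth measurement, a sketch over a single L2 snapshot; per-side price/size arrays are the assumed book representation, and iceberg/replenishment detection needs the order-by-order feed, so it's omitted here:

```python
import numpy as np

def depth_near_level(bid_px, bid_sz, ask_px, ask_sz, level, x, tick):
    """Resting depth and signed imbalance within +/- x ticks of `level`
    for one book snapshot (arrays of prices and displayed sizes per side)."""
    band = x * tick
    near_bid = bid_sz[np.abs(bid_px - level) <= band].sum()
    near_ask = ask_sz[np.abs(ask_px - level) <= band].sum()
    total = near_bid + near_ask
    imbalance = (near_bid - near_ask) / total if total > 0 else 0.0
    return near_bid, near_ask, imbalance
```

Averaging these over snapshots in windows before and after first touch, at Fib versus control levels, is the comparison that matters.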
Regime segmentation and robustness
- Partition by volatility regime (realized vol quantiles), trend strength (ADX, Hurst), and time-of-day buckets.
- Cross-asset: index futures, major FX, liquid commodities, single-name equities; multiple bar compressions (1-5m, 15-60m, daily).
- Out-of-sample evaluation with anchored walk-forward to prevent look-ahead bias from dynamic anchor selection.
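A sketch of the regime bucketing and the anchored walk-forward splitter; the expanding-quantile trick keeps the regime labels themselves free of look-ahead, and sklearn's TimeSeriesSplit gives similar expanding-window splits if a library version is preferred:

```python
import numpy as np
import pandas as pd

def vol_regime(returns, window=78, q=(1/3, 2/3)):
    """Label each bar low/mid/high vol using only past data:
    realized vol vs its own expanding quantiles (no look-ahead)."""
    rv = returns.rolling(window).std()
    lo = rv.expanding().quantile(q[0])
    hi = rv.expanding().quantile(q[1])
    return np.where(rv <= lo, "low", np.where(rv <= hi, "mid", "high"))

def anchored_walk_forward(n, n_splits=5, min_train=1000):
    """Expanding (anchored) train window; each test block follows its train,
    so dynamic anchor/parameter choices are fit only on preceding data."""
    test_len = (n - min_train) // n_splits
    for k in range(n_splits):
        end = min_train + k * test_len
        yield np.arange(end), np.arange(end, end + test_len)
```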
Evaluation metrics
- Incremental predictive power: logistic regression or gradient boosting with features {distance to nearest Fib level, distance to controls, confluence flags, regime dummies}; report AUC and Brier-score improvement over the control-only model (sketch after this list).
- Event study: mean conditional return and hit rate uplift at Fib vs control levels, with clustered standard errors by day and instrument.
- Execution-adjusted PnL with realistic costs and adverse selection, not just level hit rates.
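For the incremental-power test, a sketch comparing a control-only model against controls + Fib features on the same walk-forward split; the feature matrices are assumed to be built from the distances and flags listed above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

def incremental_power(X_ctrl, X_fib, y, train, test):
    """AUC and Brier score for controls-only vs controls+Fib on one split;
    the deltas are the incremental value attributable to Fib distance."""
    results = {}
    for name, X in (("controls", X_ctrl),
                    ("controls+fib", np.hstack([X_ctrl, X_fib]))):
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        p = model.predict_proba(X[test])[:, 1]
        results[name] = {"auc": roc_auc_score(y[test], p),
                         "brier": brier_score_loss(y[test], p)}
    return results
```

Clustered standard errors for the event study aren't in sklearn; statsmodels regression with a cluster-robust covariance (grouped by day and instrument) covers that part.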
Questions for the group
1) Anchor selection: Which of the above anchor algorithms has yielded the most stable, out-of-sample results for you? Any better alternatives for defining swing legs that avoid hindsight bias?
2) Tolerance bands: How do you parameterize “respecting” a level relative to spread and volatility without curve-fitting? Fixed ticks, percentage, or volatility-scaled?
3) Confluence: Does adding anchored VWAP from swing endpoints or high-volume nodes materially improve edge at 0.382/0.618? Any quantified uplift you can share?
4) Microstructure: For those with order book data, do you observe statistically significant absorption or replenishment around 0.618 versus control grids, or is liquidity clustering explained by round numbers and session extremes?
5) Regimes: In which regimes, if any, do extensions (1.272, 1.618) outperform simple swing-projection or ATR targets? Any evidence that 0.618 is uniquely informative versus a generic 0.65 retracement?
6) Multiple testing: Preferred protocols to limit data snooping when scanning multiple levels, anchors, and assets? Nested cross-validation, hierarchical testing, or empirical Bayes?
7) Options overlay: Has anyone tied Fib projections to options structures (e.g., verticals struck at the 1.272/1.618 targets) and measured risk-adjusted performance versus delta-only exits?
If there is interest, I can share a minimal, reproducible pipeline outline (data schema, anchor code, event labeling) so results across venues can be compared apples-to-apples.