Introduction
Candlestick patterns are everywhere in forex education. Hammer. Engulfing. Morning Star. Traders are told these formations signal reversals, predict price direction, and create edge. But do they? We decided to find out — rigorously, with real data, and without cherry-picking.
We ran 105 systematic backtests — every combination of 21 classic candlestick patterns across 5 major forex pairs — using 23 years of OANDA daily candle data (2002–2025). No hindsight, no manual interpretation. Pure mechanical signal detection with realistic trading costs. Here's what the data actually shows.
Methodology
Data: OANDA historical daily candles for EUR/USD, GBP/USD, USD/JPY, GBP/JPY, and AUD/USD from May 2002 to March 2025 — approximately 7,000 bars per pair.
Pattern detection: TA-Lib's battle-tested candlestick recognition library. Every pattern is detected mechanically using the same algorithm, applied consistently across all pairs and time periods. No human interpretation.
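To make "mechanical detection" concrete, here is a deliberately simplified sketch of a fixed-rule detector. It is not TA-Lib's algorithm: the body-to-range rule, the `max_body_ratio` threshold, and the function names are assumptions chosen for illustration, and TA-Lib's own definitions differ in detail.

```python
def is_doji(o, h, l, c, max_body_ratio=0.1):
    """Simplified rule: the candle body is tiny relative to its high-low range."""
    candle_range = h - l
    if candle_range <= 0:          # flat bar: no range to compare against
        return False
    return abs(c - o) <= max_body_ratio * candle_range


def detect_doji(bars):
    """Return indices of bars (open, high, low, close) that satisfy the rule."""
    return [i for i, (o, h, l, c) in enumerate(bars) if is_doji(o, h, l, c)]


# Hypothetical daily bars: (open, high, low, close)
sample_bars = [
    (1.1000, 1.1050, 1.0950, 1.1040),  # wide body, not a doji
    (1.1040, 1.1080, 1.1000, 1.1042),  # tiny body inside a wide range
    (1.1000, 1.1000, 1.1000, 1.1000),  # flat bar, ignored
]
doji_indices = detect_doji(sample_bars)
```

The point is that the same rule runs on every bar with no discretion, which is what lets results be compared across pairs and years.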
Walk-forward testing: To prevent overfitting, we used a strict walk-forward methodology — 24 months of training data followed by a 12-month out-of-sample test window, rolled forward through the full dataset. This produced 8 independent test periods per strategy, each tested on data the model had never seen.
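The windowing logic can be sketched as follows. This is a minimal illustration, not the production engine: the `step` between windows is not stated above, so it is left as a parameter, and months are represented as simple integer offsets.

```python
def walk_forward_windows(n_months, train=24, test=12, step=12):
    """Yield (train_start, train_end, test_start, test_end) as month offsets.

    End offsets are exclusive. The test window always starts where the
    training window ends, so test data is never part of training.
    """
    start = 0
    while start + train + test <= n_months:
        train_end = start + train
        yield (start, train_end, train_end, train_end + test)
        start += step              # assumption: roll forward by `step` months


# Toy example: 60 months of data, rolled forward 12 months at a time
windows = list(walk_forward_windows(60))
```

A larger `step` yields fewer, less overlapping windows; a smaller one yields more windows whose training data overlaps.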
Trade mechanics: Each pattern signal triggers a market entry on the next open. Exit rules: 0.5% stop-loss, 1.0% take-profit (2:1 reward-to-risk ratio), or a maximum 96-bar holding period. Position size: 10% of capital per trade. Commission: $2.50 per trade. Slippage: 5 basis points.
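As a rough sketch of these exit rules for a single long trade, consider the function below. Several details are assumptions not stated above: the $10,000 account size, slippage applied on both entry and exit, commission charged once per trade, and the stop being checked before the target when both fall inside the same bar.

```python
def simulate_long_trade(bars, signal_idx, capital=10_000.0,
                        stop_pct=0.005, target_pct=0.010, max_bars=96,
                        position_frac=0.10, commission=2.50, slippage_bps=5):
    """bars: list of (open, high, low, close). Returns (exit_reason, net_pnl)."""
    entry_idx = signal_idx + 1                    # enter on the next bar's open
    slip = slippage_bps / 10_000
    entry = bars[entry_idx][0] * (1 + slip)       # pay slippage on entry
    stop = entry * (1 - stop_pct)
    target = entry * (1 + target_pct)

    # Last bar the trade may be held through (max holding period or end of data)
    last_idx = min(entry_idx + max_bars, len(bars)) - 1
    exit_price, reason = bars[last_idx][3], "time"

    for _open, high, low, _close in bars[entry_idx:last_idx + 1]:
        if low <= stop:                           # assumption: stop checked first
            exit_price, reason = stop, "stop"
            break
        if high >= target:
            exit_price, reason = target, "target"
            break

    exit_price *= (1 - slip)                      # pay slippage on exit
    notional = capital * position_frac
    net_pnl = notional * (exit_price / entry - 1) - commission
    return reason, net_pnl


# Hypothetical bars: signal on bar 0, entry on bar 1, target reached on bar 2
demo_bars = [(1.0, 1.0, 1.0, 1.0), (1.0, 1.002, 0.999, 1.001), (1.001, 1.02, 1.0, 1.015)]
demo_reason, demo_pnl = simulate_long_trade(demo_bars, signal_idx=0)
```

With a hypothetical $10,000 account, the fixed $2.50 commission is 0.25% of each $1,000 position, which is large relative to a 0.5% stop.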
What we measured: Average return per test window, win rate, consistency ratio (% of test windows with profitable results), and total trade count across all 5 pairs.
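These metrics can be computed from per-window results along the following lines. The record fields (`return_pct`, `wins`, `trades`) are hypothetical names, and pooling wins across all trades is an assumption; averaging per-window win rates instead would give different numbers for rarely firing patterns.

```python
from statistics import mean


def summarize(window_results):
    """Aggregate per-window results into the four headline metrics."""
    total_trades = sum(w["trades"] for w in window_results)
    total_wins = sum(w["wins"] for w in window_results)
    profitable = sum(1 for w in window_results if w["return_pct"] > 0)
    return {
        "avg_return_pct": mean(w["return_pct"] for w in window_results),
        # Consistency: share of test windows that ended with a positive return
        "consistency_pct": 100 * profitable / len(window_results),
        # Assumption: win rate pooled over trades, undefined when nothing traded
        "win_rate_pct": 100 * total_wins / total_trades if total_trades else None,
        "total_trades": total_trades,
    }


# Hypothetical results for four test windows
example = summarize([
    {"return_pct": 0.01, "wins": 2, "trades": 5},
    {"return_pct": -0.02, "wins": 1, "trades": 4},
    {"return_pct": 0.0, "wins": 0, "trades": 0},   # no signals in this window
    {"return_pct": 0.03, "wins": 1, "trades": 1},
])
```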
Total runs: 105. Total trades executed: 1,813.
Complete Results: All 21 Patterns Ranked
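Results are ranked by average return across all 5 pairs, then by consistency ratio. Returns are shown rounded to three decimal places, so rows that look tied may differ at higher precision. Patterns with zero trades were not detectable on daily forex charts using TA-Lib's strict mechanical criteria.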
| Pattern | Total Trades | Avg Win Rate | Consistency | Avg Return |
|---|---|---|---|---|
| Doji Star | 77 | 33.5% | 42.5% | +0.002% |
| Gravestone Doji | 38 | 22.2% | 22.5% | +0.002% |
| Hammer | 32 | 15.8% | 15.0% | +0.002% |
| Belt Hold | 351 | 23.2% | 12.5% | 0.000% |
| Dragonfly Doji | 54 | 34.6% | 35.0% | 0.000% |
| Engulfing | 296 | 24.2% | 27.5% | 0.000% |
| Harami | 192 | 31.3% | 35.0% | 0.000% |
| Inverted Hammer | 26 | 24.6% | 25.0% | 0.000% |
| Hanging Man | 71 | 14.0% | 20.0% | 0.000% |
| Shooting Star | 25 | 8.1% | 7.5% | 0.000% |
| Tristar | 276 | 6.3% | 5.0% | 0.000% |
| 3 Black Crows | 2 | 0.0% | 0.0% | 0.000% |
| 3 White Soldiers | 1 | 2.5% | 2.5% | 0.000% |
| Harami Cross | 71 | 26.6% | 27.5% | -0.002% |
| Doji | 301 | 32.3% | 22.5% | -0.010% |
| Abandoned Baby | 0 | — | — | n/a |
| Breakaway | 0 | — | — | n/a |
| Dark Cloud Cover | 0 | — | — | n/a |
| Evening Star | 0 | — | — | n/a |
| Morning Star | 0 | — | — | n/a |
| Piercing | 0 | — | — | n/a |
Consistency = % of 12-month test windows with profitable results. Patterns listed with 0 trades produced no detections on daily charts across all 5 pairs and 23 years of data.
Top Performers: Doji Variants Lead on Reliability
The three patterns with the best average returns (Doji Star, Gravestone Doji, and Hammer) share a trait: each is defined by a small candle body relative to its range or wicks. These structures capture moments of market indecision or rejection, which may carry slightly more predictive value than body-based patterns on daily charts, although an average return of +0.002% is effectively break-even.
Doji Star (42.5% consistency) was the standout for reliability: it was profitable in more test windows than any other pattern. Across 77 trades over 5 pairs and 23 years, it held a 33.5% win rate. With a 1.0% take-profit against a 0.5% stop-loss, the break-even win rate is 1 in 3 (about 33.3%) before costs, so Doji Star sits right at that threshold.
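The break-even arithmetic can be checked with a short sketch. The function name and the per-trade `cost_pct` parameter are assumptions for illustration; costs are modeled simply as a fixed percentage subtracted from wins and added to losses.

```python
def breakeven_win_rate(stop_pct, target_pct, cost_pct=0.0):
    """Win rate p that solves p * net_win == (1 - p) * net_loss."""
    net_win = target_pct - cost_pct    # what a winning trade keeps after costs
    net_loss = stop_pct + cost_pct     # what a losing trade gives up after costs
    return net_loss / (net_win + net_loss)


no_cost = breakeven_win_rate(0.5, 1.0)                   # one third
with_cost = breakeven_win_rate(0.5, 1.0, cost_pct=0.1)   # hypothetical 0.1% round trip
```

Without costs the threshold is one third. A hypothetical 0.1% round-trip cost lifts it to about 40%, which shows why a win rate near 33% does not translate into profit once costs are included.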
Dragonfly Doji deserves special mention: it produced the highest win rate overall (34.6%) and a 35% consistency ratio, despite its average return rounding to zero. It also posted a 50% win rate on USD/JPY across 16 trades, which hints that USD/JPY may respond better to wick-based patterns, although 16 trades is too few to be confident.
Harami and Engulfing generated the highest trade counts among the better-performing group — 192 and 296 trades respectively across 5 pairs — with consistent mid-20s to low-30s win rates. As continuation/reversal signals they fire frequently, but their edge alone is thin.
Bottom Performers: Rarely Seen, Rarely Useful
Six patterns — Abandoned Baby, Breakaway, Dark Cloud Cover, Evening Star, Morning Star, and Piercing — generated exactly zero trades across 5 pairs and 23 years of daily data. This is not a backtest failure. It's a real finding: TA-Lib's strict mechanical definitions for these multi-candle patterns require very specific gap-and-body relationships that almost never occur in modern forex daily data.
Forex daily candles rarely show the clean gaps these patterns require. Stock markets close overnight; forex doesn't. The 'gap' component of Evening Star, Morning Star, and Dark Cloud Cover essentially never appears on forex daily charts the way it does in equity markets where these patterns were originally codified.
Tristar is the clearest loser among patterns that actually traded: 276 trades, 6.3% win rate, 5% consistency. It fired frequently (which seems promising) but was right only 1 in 16 times — well below the 33% needed to profit at 2:1 R/R.
Shooting Star had similar problems: 25 trades, 8.1% win rate. Counterintuitively, patterns that fire infrequently but correctly (Doji Star, Gravestone Doji) outperformed high-frequency patterns with poor win rates.
Surprise Findings
Inverted Hammer on GBP/USD: The single most striking number in our dataset — Inverted Hammer produced a 62.5% win rate on GBP/USD across 8 trades. At 2:1 R/R this would be highly profitable. But 8 trades over 23 years is too small a sample to conclude anything. This is exactly the kind of apparent anomaly that leads traders astray — always check sample size before trusting a number.
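One way to quantify how little 8 trades can tell us is a confidence interval on the win rate. The sketch below uses the Wilson score interval, a standard choice for small samples. It is an illustration added here, not part of the original backtest, and the 5-of-8 count is inferred from the 62.5% figure.

```python
from math import sqrt


def wilson_interval(wins, trades, z=1.96):
    """Approximate 95% Wilson score interval for a win rate (as fractions)."""
    p = wins / trades
    denom = 1 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half_width = z * sqrt(p * (1 - p) / trades + z * z / (4 * trades * trades)) / denom
    return center - half_width, center + half_width


# 62.5% win rate on 8 trades corresponds to 5 wins
low, high = wilson_interval(5, 8)
```

The interval runs from roughly 31% to 86%, so the lower end sits below the 33.3% break-even level: the result is compatible with no edge at all.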
Doji cost money despite a 32% win rate: The plain Doji, one of the most-watched forex signals, was the worst performer, losing 0.010% on average across 301 trades. It combines high frequency with a win rate slightly below break-even, and trading costs erode the rest. The Doji is so well known that any edge it once had may have been arbitraged away.
USD/JPY responded best to wick patterns: Dragonfly Doji (50% win rate, 16 trades), Gravestone Doji (+0.01% return), and Doji Star (+0.01% return) all performed better on USD/JPY than on European pairs. Yen pairs have unique volatility characteristics driven by BOJ intervention and carry trade dynamics — these may create more genuine reversal moments at wick extremes.
The Honest Verdict: Patterns Are Clues, Not Edge
The headline finding is clear: no candlestick pattern alone generated statistically significant, consistent profits on daily forex charts over 23 years. The best performers barely reached break-even. Most lost slightly after costs.
This doesn't mean candlestick patterns are useless. It means they're incomplete signals. A Doji Star at a major support level, with RSI oversold, on a pair showing declining momentum — that combination may have real predictive power. The same Doji Star in the middle of a trend, in isolation, does not.
The win rates tell the deeper story. Patterns like Dragonfly Doji (34.6% win rate) and Doji Star (33.5%) are right often enough to matter — they just need something more to tip the balance above costs. In our mechanical backtest, that 'something more' wasn't present. In a discretionary system with proper context, it might be.
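Three of the four most frequent patterns (Belt Hold, Engulfing, and Tristar) posted win rates well below the 33.3% break-even level, and the fourth, Doji, had the worst average return. Stricter, rarer patterns such as Doji Star and Dragonfly Doji misfired less often, which makes intuitive sense: common patterns reflect common market states, while rare, strict patterns identify more extreme and potentially more meaningful moments. Rarity alone is no guarantee, though: Shooting Star and Hammer fired rarely and still had low win rates.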
Combining Patterns with Indicators: Where Real Edge Emerges
Our backtest isolated candlestick patterns to measure their standalone contribution. Real trading systems don't work this way — and shouldn't. The appropriate use of a candlestick pattern is as a trigger, not a strategy.
Consider adding any of the following filters before acting on a pattern signal: RSI below 30 or above 70 (confirming the reversal reading), MACD crossover alignment (momentum confirmation), key support/resistance proximity (structural context), or volume spike on the pattern candle (conviction confirmation).
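Treating the pattern as a trigger gated by filters might look like the sketch below for a bullish setup. Everything here is illustrative: the function names, the `min_confirmations` rule, and the 0.25% support tolerance are assumptions, "MACD crossover alignment" is approximated as MACD sitting above its signal line, and the volume filter is omitted for brevity.

```python
def count_confirmations(rsi, macd, macd_signal, price, support,
                        support_tolerance_pct=0.25):
    """Count how many bullish-reversal filters agree with the pattern."""
    distance_pct = abs(price - support) / support * 100
    checks = [
        rsi < 30,                                # oversold reading
        macd > macd_signal,                      # momentum turning up (approximation)
        distance_pct <= support_tolerance_pct,   # price close to a support level
    ]
    return sum(checks)


def should_enter_long(pattern_bullish, min_confirmations=2, **filters):
    """The pattern is only a trigger; act when enough filters confirm it."""
    return pattern_bullish and count_confirmations(**filters) >= min_confirmations


# Hypothetical readings on the bar where a bullish pattern fired
confirmed = should_enter_long(True, rsi=27.0, macd=0.0004, macd_signal=0.0001,
                              price=1.0510, support=1.0500)
unconfirmed = should_enter_long(True, rsi=55.0, macd=-0.0002, macd_signal=0.0001,
                                price=1.0800, support=1.0500)
```

Requiring two of three confirmations is one arbitrary choice; a stricter or looser rule trades signal quality against trade count, and each variant needs its own out-of-sample test.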
Portfolio Signals continuously backtests exactly this kind of multi-factor combination, including candlestick patterns combined with RSI, SMA trend filters, and Stochastic RSI. When patterns appear in high-probability confluence zones, the numbers look meaningfully different from the standalone results you see here.
Methodology Caveats
Mechanical vs. discretionary: Human traders spot patterns with visual context — body size relative to recent candles, location in a trend, proximity to round numbers. TA-Lib applies fixed mathematical rules. Some patterns a human would trade, TA-Lib won't trigger, and vice versa. This test measures the signal itself, not skilled human interpretation.
Small samples for rare patterns: Several patterns produced fewer than 10 trades per pair over 23 years. Statistical conclusions from these are unreliable. Treat results for 3 Black Crows (2 trades total), Inverted Hammer GBP/USD (8 trades), and similar small-count results as directional hints, not evidence.
Fixed position sizing: We used 10% position size throughout. Real traders size positions based on volatility, account state, and conviction — this introduces another layer of skill that our backtest doesn't capture.
No regime filtering: The 2008 financial crisis, COVID crash, and post-2022 rate cycle all show up in this dataset. Pattern behavior may differ significantly across market regimes. The walk-forward methodology partially accounts for this by testing across different time periods, but a dedicated regime-filtered analysis would be more granular.
Data source: OANDA institutional feed, daily close-to-close candles. Results may differ on retail broker data due to spread differences and end-of-day pricing conventions.
Try It Yourself
The backtests in this article were generated using Portfolio Signals' production backtest engine — the same system that runs our live pattern detection across 40+ forex pairs.
If you want to explore how candlestick patterns perform when combined with technical indicators, trend filters, and multi-timeframe analysis, Portfolio Signals gives you the tools to test it systematically — without writing code or downloading data.
The data doesn't lie: patterns alone aren't enough. But they're a starting point worth understanding precisely because so many traders rely on them. Now you know what the numbers actually say.