You push your code. The CI pipeline runs. Tests pass. You merge with confidence. The next person pushes their code, same tests, no changes to the relevant code, and three tests fail. They re-run the pipeline. Everything passes. Nobody investigates. This happens every day in thousands of engineering teams, and it's slowly eroding the very thing test suites are supposed to provide: trust.
These are flaky tests: tests that pass and fail intermittently without any meaningful change to the code they're testing. They're more than an annoyance. They're a systemic problem that affects productivity, morale, and the quality of your software.
What Makes a Test Flaky
A test is flaky when its outcome isn't determined solely by the code it's testing. Something external (time, network conditions, database state, execution order, or system load) influences whether the test passes or fails. The code hasn't changed, but the result has.
The most common sources of flakiness are:
- **Timing dependencies:** Tests that rely on setTimeout, animations completing, or network responses arriving within a specific window. On a loaded CI server, those timing assumptions fall apart.
- **Shared state:** Tests that read from or write to a shared database, filesystem, or global variable without proper isolation. When test A runs before test B, everything works. When the order reverses, test B sees stale data and fails.
- **External dependencies:** Tests that hit real APIs, external services, or rely on specific network conditions. One slow response or rate-limited endpoint, and the test fails.
- **Race conditions:** Async operations completing in a different order than expected. The UI renders before the data arrives, or two concurrent writes collide.
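The shared-state variety is the easiest to demonstrate. Here's a minimal, self-contained sketch (the test names and the `shared_cache` variable are illustrative, not from any real suite): the same two tests both pass in one order and one of them fails in the other, with no code change in between.

```python
def make_suite():
    shared_cache = {}  # state shared across tests, never reset between them

    def test_writes_user():
        shared_cache["user"] = "alice"
        return shared_cache["user"] == "alice"

    def test_cache_starts_empty():
        # Only true if this test happens to run before test_writes_user.
        return "user" not in shared_cache

    return test_writes_user, test_cache_starts_empty

# Order 1: the "empty" check runs first -- both tests pass.
write, read = make_suite()
assert read() and write()

# Order 2: the write runs first -- the "empty" check now fails.
write, read = make_suite()
assert write() and not read()
```

Nothing about either test changed between the two runs; only the execution order did. That's the signature of flakiness.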
The Real Cost of Ignoring Flaky Tests
It's tempting to just hit “retry” and move on. But flaky tests compound: once a team starts ignoring test failures, even occasionally, the entire testing culture begins to shift. Here's how the erosion typically works:
Phase 1: “That test is just flaky”
Developers learn which tests are unreliable and start mentally discounting them. When they fail, nobody looks at the failure message. They just re-run.
Phase 2: Real failures get ignored
A legitimate failure happens in a test that's been flaky before. Everyone assumes it's just the usual flakiness. The bug ships to production.
Phase 3: New tests aren't prioritized
Why write tests if they're going to be flaky anyway? The test suite stops growing. Coverage stagnates. The team effectively goes back to manual testing for new features.
Google's engineering productivity team found that approximately 16% of their tests exhibited some degree of flakiness, and that developers spent significant engineering time dealing with flaky test failures. At scale, this isn't just an inconvenience; it's a meaningful drag on velocity.
Strategies That Actually Work
There's no single fix for flaky tests because there's no single cause. But there are proven patterns that dramatically reduce flakiness.
1. Isolate your tests completely
Every test should set up its own state and tear it down afterward. No test should depend on another test having run first. If you're testing against a database, use transactions that roll back. If you're testing against an API, use mocks or dedicated test fixtures. The goal: each test is a self-contained world.
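One way to get that self-contained world, sketched here with Python's built-in sqlite3 module (the schema and test names are made up for illustration): every test builds its own in-memory database, so no test can see another's writes.

```python
import sqlite3

def fresh_db():
    """Each test gets its own in-memory database: a self-contained world."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def test_insert_user():
    db = fresh_db()
    db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    db.close()
    return count == 1

def test_table_starts_empty():
    db = fresh_db()  # unaffected by any other test's writes
    count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    db.close()
    return count == 0

# Run them in either order: both pass, because neither can see the other.
assert test_insert_user() and test_table_starts_empty()
```

In a real suite you'd put `fresh_db` in a fixture so setup and teardown happen automatically, but the principle is the same: state lives and dies with the test.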
2. Replace timing with polling
Instead of sleep(2000) or fixed timeout waits, poll for the condition you're expecting. Wait until the element appears, until the API responds, until the state changes. Modern testing libraries provide “wait for” utilities specifically for this purpose. Use them. Your tests will be both faster and more reliable.
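If your framework doesn't ship a “wait for” utility, the pattern is small enough to sketch by hand. This is an illustrative helper, not any particular library's API: it checks the condition repeatedly, returning as soon as it's true instead of sleeping for a fixed worst-case duration.

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline

# Usage: wait for a background job to flip a flag, instead of sleep(2).
state = {"done": False}
start = time.monotonic()

def job_finished():
    if time.monotonic() - start > 0.1:  # simulated async work
        state["done"] = True
    return state["done"]

assert wait_for(job_finished, timeout=2.0)
```

The fast case finishes in milliseconds instead of a fixed two seconds, and the slow case gets the full timeout before failing. That's why polling is both faster and more reliable.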
3. Quarantine, don't delete
When you identify a flaky test, move it to a quarantine suite. It still runs, but it doesn't block merges. This keeps the main pipeline reliable while giving you visibility into what's flaky. Track quarantined tests, set an SLA for fixing them (e.g., no test stays quarantined more than two weeks), and treat the quarantine list like a bug backlog.
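The mechanics can be as simple as a partition. Most test runners support this natively (pytest markers, JUnit tags), but the idea fits in a few lines; the test names and `QUARANTINED` set below are purely illustrative.

```python
# Minimal quarantine sketch: quarantined tests still run and report,
# but only the main suite's result gates the merge.
QUARANTINED = {"test_websocket_reconnect"}  # tracked like a bug backlog

def run_suite(tests):
    main = {n: fn() for n, fn in tests.items() if n not in QUARANTINED}
    quarantine = {n: fn() for n, fn in tests.items() if n in QUARANTINED}
    merge_blocked = not all(main.values())
    return merge_blocked, quarantine

tests = {
    "test_login": lambda: True,
    "test_checkout": lambda: True,
    "test_websocket_reconnect": lambda: False,  # today's flaky failure
}
blocked, flaky_results = run_suite(tests)
assert not blocked  # pipeline stays green for the reliable tests
assert flaky_results == {"test_websocket_reconnect": False}  # still visible
```

The point of running the quarantine suite anyway is the visibility: the failure data keeps accumulating, which is exactly what you need to fix the test within your SLA.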
4. Record and replay
For tests that depend on external services, record the API responses during development and replay them in CI. Tools like VCR (Ruby), Polly (JavaScript), or WireMock (Java) make this straightforward. You get deterministic tests that still exercise your integration logic, without the network dependency.
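Under the hood, all of those tools do a version of the same thing. Here's a hand-rolled sketch of the idea (the `fetch_quote_live` function is a stand-in for whatever real HTTP call your code makes): hit the service once while recording, then serve the saved response forever after.

```python
import json
import os
import tempfile

def fetch_quote_live(symbol):
    # Stand-in for a real HTTP call to an external service.
    return {"symbol": symbol, "price": 187.5}

def fetch_quote(symbol, cassette_path, record=False):
    """Record the response once; replay it deterministically afterward."""
    if not record and os.path.exists(cassette_path):
        with open(cassette_path) as f:
            return json.load(f)[symbol]          # replay: no network involved
    response = fetch_quote_live(symbol)          # record: hit the service
    data = {}
    if os.path.exists(cassette_path):
        with open(cassette_path) as f:
            data = json.load(f)
    data[symbol] = response
    with open(cassette_path, "w") as f:
        json.dump(data, f)
    return response

cassette = os.path.join(tempfile.mkdtemp(), "quotes.json")
recorded = fetch_quote("ACME", cassette, record=True)   # during development
replayed = fetch_quote("ACME", cassette)                # in CI: reads the file
assert replayed == recorded
```

The real tools add request matching, header scrubbing, and cassette expiry on top, but the core trade is the same: you exchange a live dependency for a recorded one.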
The Detection Problem
One of the trickiest aspects of flaky tests is identifying them in the first place. A test that fails once in fifty runs might not raise any alarms. A test that fails in CI but passes locally might be written off as an environment issue.
Effective detection requires two things: running tests multiple times (some teams run new tests five or ten times before accepting them), and tracking test results over time. If you can see that test X has failed four times in the past month with four different commits, that's almost certainly a flaky test, not four separate bugs.
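That "same commit, different outcomes" signal is easy to compute once you log results. A sketch, assuming a hypothetical result log of (test, commit, passed) tuples: if a test both passed and failed on the same commit, the code didn't change but the outcome did, which is the definition of flaky.

```python
from collections import defaultdict

# Hypothetical result log: (test_name, commit, passed)
history = [
    ("test_checkout", "a1f", False), ("test_checkout", "a1f", True),  # re-run flip
    ("test_checkout", "b2e", False), ("test_checkout", "b2e", True),
    ("test_login",    "a1f", True),  ("test_login",    "b2e", True),
    ("test_payments", "c3d", False), ("test_payments", "c3d", False), # real failure
]

def flaky_tests(history):
    """Flag tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for name, commit, passed in history:
        outcomes[(name, commit)].add(passed)
    return sorted({name for (name, _), seen in outcomes.items()
                   if seen == {True, False}})

assert flaky_tests(history) == ["test_checkout"]
```

Note that `test_payments` is correctly excluded: it failed consistently on one commit, which looks like a real bug, not flakiness.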
Some CI systems now have built-in flaky test detection. If yours doesn't, even a simple spreadsheet tracking “tests that failed but passed on re-run” gives you valuable data. Pattern recognition is the first step toward fixing the problem.
Building a Flaky-Resistant Culture
Technical solutions are necessary but not sufficient. The teams that truly overcome flakiness do so by establishing norms:
- **Every flaky test failure is investigated at least once.** Even if it's “known flaky,” someone documents why and files a ticket to fix it.
- **New tests are reviewed for potential flakiness as part of code review.** Reviewers ask: “Could this test pass/fail depending on timing? Does it use shared state? Does it hit an external service?”
- **The team treats a green test suite as sacred.** If tests are green, you can merge with confidence. If that promise is broken by flakiness, fixing it takes priority.
The Path Forward
Flaky tests are a solvable problem. Not overnight, and not with a single tool or technique, but through consistent attention and systematic improvement. The investment pays for itself many times over, in developer productivity, in deployment confidence, and in the simple peace of mind that comes from a test suite you can actually believe in.
Start small. Pick the three flakiest tests in your codebase. Fix them. Watch what happens to your team's relationship with the test suite. That shift in trust? That's the real prize.