The Hidden Costs of LLMs in Production

Large Language Models have captured the imagination of the tech industry. They can write code, answer questions, and generate content that feels almost magical. But magic has a way of disappearing when you need reliability.

When Traditional Code Beats LLMs

There’s a pattern I’ve seen repeatedly: a team reaches for an LLM to solve a problem that would be better served by traditional code. The LLM works in demos, struggles in production, and eventually gets replaced by the straightforward solution that should have been built first.
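As a hypothetical illustration of the kind of task I mean: suppose a team asks an LLM to pull dollar amounts out of invoice lines. It works in the demo, but a few lines of deterministic code do the same job reliably, instantly, and for free (the task and names below are mine, not from any particular team):

```python
import re

# Hypothetical task: extract dollar amounts from invoice lines.
# Unlike an LLM call, this is deterministic, fast, and unit-testable.
AMOUNT = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")

def extract_amounts(line: str) -> list[float]:
    """Return every dollar amount in the line as a float."""
    return [float(m.replace(",", "")) for m in AMOUNT.findall(line)]

# extract_amounts("Subtotal $1,234.50 plus $9.99 fee") -> [1234.5, 9.99]
```

The point isn’t that regexes beat LLMs in general; it’s that when the input format is known and bounded, the boring solution has no hallucinations, no latency, and no API bill.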

The Case for End-to-End Tests

The testing pyramid tells us to write many unit tests, fewer integration tests, and even fewer end-to-end tests. This advice is often misinterpreted as “e2e tests are bad.” They’re not. They test something nothing else can: whether your application actually works for users.
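To make “works for users” concrete, here is a minimal sketch of the end-to-end idea: drive the application through the same interface a user or client would, rather than calling internal functions directly. The toy server and endpoint name are illustrative, standing in for a real application:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppHandler(BaseHTTPRequestHandler):
    """Stand-in for a real application's HTTP interface."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep test output quiet

def test_end_to_end():
    # Start the whole app, then exercise it over real HTTP,
    # exactly as a client would.
    server = HTTPServer(("127.0.0.1", 0), AppHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/health"
        with urllib.request.urlopen(url, timeout=5) as resp:
            assert resp.status == 200
            assert resp.read() == b"ok"
    finally:
        server.shutdown()

test_end_to_end()
```

A unit test could verify `do_GET` in isolation, but only this shape of test proves the server starts, binds a port, and answers real requests — the things that actually break in production.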

Writing E2E Tests That Don’t Break

E2E tests have a reputation for flakiness. Tests that pass locally fail in CI. Tests that passed yesterday fail today. Teams lose trust and eventually disable the tests entirely.

This doesn’t have to happen. Flaky tests are usually a symptom of poor test design, not an inherent property of E2E testing.
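One design flaw accounts for a large share of that flakiness: waiting a fixed amount of time for an asynchronous result. A sketch of the fix, polling up to a deadline instead of sleeping blindly (the helper name is mine; test frameworks like Playwright build this in):

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline

# Flaky:  time.sleep(2); assert job.done   -- breaks the day CI is slow.
# Stable: assert wait_until(lambda: job.done, timeout=10)
#         -- returns as soon as the job finishes, tolerates slow runs.
```

The stable version is also faster on average: it returns the moment the condition holds instead of always paying the full sleep.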

Why Refactoring Is Not Optional

“We don’t have time to refactor.” I’ve heard this countless times. And every time, I’ve watched teams spend more time fighting their codebase than building features. Refactoring isn’t a luxury. In a long-lived codebase, it’s survival.