Clean Code isn't Dead

AI has made it even more important

Sep 26, 2025

Over the course of my career, I’ve talked about clean code… a lot. Nestled in with my hot takes and old man rants on data science, data engineering, and MLOps, I have consistently talked about the importance of clean code. I’ve also given hundreds of tips and tricks on what it is, how to do it, and how to get started. This was important to me because, back when I first started posting, a large portion of data scientists didn’t write clean code, to their own detriment.

The problem was exacerbated as many machine learning delivery pipelines employed the old “chuck it over the wall” approach, which works fine if all you are sending is a .png file of a pretty chart to put into a presentation. It’s a completely different story when you pass along a Jupyter notebook full of dirty code, which is akin to throwing a grenade at your coworkers.

Soon I saw many others, tired of trying to make heads or tails of the cryptographic enigmas their peers were sharing with them, make similar posts. A movement was born. As many people picked up the banner exalting clean code, we started to see a shift in the industry. Things were actually improving. And then GenAI came along.

Clean Code comes for Free?

To a certain degree, when you ask GitHub Copilot, Claude Code, or Cursor to write a module for you, the result looks pretty by default. LLMs are syntax machines, and from the massive amount of data they were trained on, they automatically pick up on common linting expectations. Running flake8, black, and ruff is almost a joke now. Not really, but the vibe is there.

And it’s not just that the indentation is correct either. Variable names are usually decent and, barring that, they are at least no longer only X and Y. Tests are a prompt away. Docstrings appear without you asking. Comments are peppered everywhere. Even type hints are sprinkled in automatically. It’s a senior engineer’s guilty fantasy come true. The stuff that used to take junior engineers months of head-banging to learn over the course of thousands of code review comments just… kind of happens “out of the box” when an LLM writes the code.

For someone like me it’s like Christmas in July, but there’s a catch. There’s always a catch.

Good Syntax != Clean Code

In the past, good syntax generally correlated with clean code. Programmers who cared enough about their craft to get the little things right were the same people who knew enough to get the big things right too. That, or they’d been around long enough banging their head against the “code review wall” to pick up enough tricks and best practices to finally get things right. We call this experience.

But LLMs aren’t trained on experience. They don’t understand the pragmatics of programming. Structure, abstraction boundaries, error handling, testability, good naming (I mean really good naming): those don’t “fall out” of a model’s training corpus. An LLM can give you scaffolding that looks polished, but it won’t know whether it introduced a subtle dependency that will haunt you six months down the line. Heck, most of the time it solves failing tests by simply writing a try-catch block around the critical code, ultimately defeating the entire purpose. This isn’t clean.
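
To make the try-catch point concrete, here is a minimal Python sketch (where the construct is spelled try/except). Everything in it is hypothetical and invented for illustration: the function names, the order dictionary, and its "price" and "quantity" fields. It is one way the pattern can show up, not a transcript of any particular tool's output.

```python
def order_total_swallowed(order: dict) -> float:
    """Return the total price of an order.

    Looks polished (docstring, type hints), but the blanket except
    turns every bug into a silent 0.0.
    """
    try:
        return order["price"] * order["quantity"]
    except Exception:
        # The failing test stops crashing, but a malformed order now looks free.
        return 0.0


def order_total(order: dict) -> float:
    """Return the total price of an order, failing loudly on bad input."""
    # Hypothetical required fields, checked up front so the error names the cause.
    missing = [field for field in ("price", "quantity") if field not in order]
    if missing:
        raise ValueError(f"order is missing fields: {missing}")
    return order["price"] * order["quantity"]
```

Both versions would pass a linter. Only the second tells a reviewer, or a test, that something upstream sent a broken order.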

The converse was true too: ugly code correlated with dirty code. Code written by those who couldn’t care less about clean code was often terrible, written quickly with shortcuts to get the job done, and often done to pad a resume and get a promotion. You don’t need to write code once you become a manager, so what’s the point of getting good at it? Ultimately these people move on quickly, leaving someone else to pick up the tab on a pile of tech debt.

So while good syntax was always just a proxy for clean code, it was a very good proxy. If you could set a junior data scientist down the warpath of learning how to write clean code, they’d come out a battle-hardened beast you could trust with the most difficult projects.

If not this, What is Clean Code?

Throughout my career, when speaking about clean code, I often received responses like the following: “I just don’t have time to write comments,” “I love code with good comments,” or “Commenting code is a waste of time.” It was a bit annoying for people to boil Clean Code down to simply writing comments, especially because I always felt comments were unnecessary, often outdated, and a poor proxy for just writing good names. In fact, I’d go so far as to say comments are an antipattern of clean code, but I digress.
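
A small, hypothetical Python sketch of names doing the work of comments. Both functions are made up for illustration and behave identically; the only difference is where the explanation lives.

```python
def f(x, y):
    # x is the list of customer ages, y is the minimum age
    # returns the ages that are at least the minimum
    return [a for a in x if a >= y]


def ages_at_or_above(customer_ages: list[int], minimum_age: int) -> list[int]:
    # No explanatory comment needed: the names carry the meaning.
    return [age for age in customer_ages if age >= minimum_age]
```

The comment in the first version has to be kept in sync by hand and is invisible at every call site. The names in the second travel with the code wherever it is used.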

I feel like AI has exposed just how many of the current “best practices” were simply clean code theatre. With GenAI and coding agents, syntactically well-written code is no longer a proxy for clean code. Clean code has always been about making the code easier to read, the thinking being that you tend to read your code ten times more than you write it. And now? Well, we don’t write it at all, so I guess that means you read it infinitely more times than you write it.

And sure, getting the team to align on a style guide or simply using black [1] can make code easier to read. Everyone appreciates it when code is nice on the eyes, but that’s not what we are actually concerned with when we want to make code easier to read. We want it to be easier to understand. Can you clearly identify bugs and fix them in a code review before they hit production? Can you determine the overall architecture and its weaknesses? Can you effortlessly understand where to extend it and add new features? Clean code does this, and the problem it solves hasn’t gone anywhere.

I’ve noticed a trend where some engineers aren’t reading the code anymore. If that’s you, well, then you are no better than the vibe coders you make fun of. “Just run it and see if it works” is a good way to drop a database in prod. To be clear, it won’t be long before simply being able to read the code is what separates the engineers from everyone else.

For example, most product owners know enough lingo to ask an agent to use TDD, follow an architectural diagram, and solve an issue. Heck, many of them wrote the JIRA ticket that you just pointed Claude Code at before you went off to play Helldivers 2. If you aren’t even reading the code, how can you claim to understand it? If you don’t understand it, why does the product owner need you? He doesn’t understand it just fine all by himself.

If you enjoyed the post please like, share, or subscribe. It would mean a lot to me.

[1] “Any color the customer wants, as long as it’s black.” -Henry Ford

MLOps Club

Sep 26

This is Eric, it just looks like I'm MLOps Club lol.

Love this. If clean code is a checklist, then yeah, AI can do it for you. But if you give a checkbox to AI that says "make sure this code is understandable" ... it almost never does that.

You can write more and more markdown files. That's good. When product managers and junior SWEs go to contribute, those markdown files will help their generated code be closer to great sooner.

It takes strong SWEs to do that, though. Well, how about we make a startup that makes good markdown files? Um.... I don't think we'd get far. We'd find our markdown files can't capture the nuances specific to each business, or even each problem.
