AI can write decent code. The question is who still pays attention afterward

Michel Mix · Medellín, Colombia · May 7, 2026

[Illustration: whether to ship an AI-generated patch after the build and tests pass]

There is something seductive about AI-generated code that looks good right away.

You ask for a small change. The assistant thinks for a moment. Then a patch appears with tidy function names, reasonable tests, and a summary that sounds as if someone walked through the codebase with a clipboard.

Build green. Tests green. AI summary greenish in tone.

Then comes the moment when you should slow down.

Do I understand this code myself? Does it fit the rest of the system? Is it really tested, or mostly dressed up to feel reassuring? Can I explain this to a teammate six months from now without pointing at a chat window?

That, to me, is the core of AI-assisted software development. Not whether AI can write code. We know it can. The better question is: what remains of our engineering standards after AI has written something?

The problem is not that AI helps

AI coding assistants are no longer experimental. They live in editors, chat tools, agents, pull requests, and review workflows. They write boilerplate, tests, documentation, scripts, refactors, and sometimes things that make you think later: hold on, why did we suddenly get a second cache layer?

That sounds cynical, but I do not mean it that way.

AI can genuinely help. Especially on clearly bounded tasks. Setting up a test. Writing out an API call. Adapting an existing function to a visible pattern. Explaining an error message. Drafting an initial review checklist. Those are moments where AI feels like someone who starts putting up the scaffolding while you are still looking at the blueprint.

The literature shows this too. In large field experiments, AI tools raise productivity on average. Developers complete more tasks, make more commits, and move through routine work faster. Less experienced developers can benefit especially, because they spend less time on syntax, boilerplate, and searching (Cui et al., 2025).

But average is a dangerous word.

Average does not tell you what happens in an old codebase with implicit conventions. Average does not tell you how much time you spend checking. Average does not tell you whether the developer still understands the code. And average certainly does not tell you whether the team will be happy three months from now with what was accepted quickly today. Productivity is more than commits; it is also understanding, maintainability, and ownership (Chen et al., 2026).

Fast does not always feel faster

One of the most interesting findings in the recent literature is that experienced developers can sometimes be slower with AI (Becker et al., 2025).

Not because they cannot use AI. Not because they want to go back to the typewriter. But because, in Becker and colleagues’ experiment, they worked in their own familiar open-source projects, with conventions and implicit knowledge that do not all fit into a prompt.

In that environment, an AI suggestion is often almost right.

Almost right is treacherous.

Completely wrong is easy to spot. You throw the suggestion away and keep writing. But almost right demands attention. You read. You compare. You adjust. You think: this is neat enough, but it is not how we handle errors. Or: this test tests the mock, not the behavior. Or: this solution works for the example, but forgets the edge cases that production always seems to wait for.

That is the hidden bill.

AI saves writing time, but adds verification time. And that verification time is not free. Research on AI-assisted programming shows that developers spend a lot of time reading, evaluating, and repairing AI output (Mozannar et al., 2024). Other work suggests that verification load accumulates: the longer you keep checking AI output, the more likely your review is to become shallow (Fan et al., 2026).

I recognize that in practice. You check the first suggestion sharply. By the fifth, you think faster: yes, looks logical. That is exactly the moment when you need coffee, not a merge button.
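
To make "this test tests the mock, not the behavior" concrete, here is a minimal sketch in Python. The function apply_discount and the rate_provider object are hypothetical, invented only for this illustration. Both tests are green. Only the second one would notice if the function were broken.

```python
from unittest.mock import Mock


def apply_discount(price: float, rate_provider) -> float:
    """Hypothetical function under test: price after the provider's discount rate."""
    rate = rate_provider.get_rate()
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"discount rate out of range: {rate}")
    return round(price * (1.0 - rate), 2)


def test_only_checks_the_mock() -> None:
    # Green, but it only confirms that the mock returns what it was told to.
    provider = Mock()
    provider.get_rate.return_value = 0.2
    assert provider.get_rate() == 0.2  # apply_discount is never called


def test_checks_the_behavior() -> None:
    # Calls the real function and asserts on its result.
    provider = Mock()
    provider.get_rate.return_value = 0.2
    assert apply_discount(100.0, provider) == 80.0

    # Edge case: an out-of-range rate must be rejected, not silently applied.
    provider.get_rate.return_value = 1.5
    try:
        apply_discount(100.0, provider)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a rate above 1.0")
```

A quick review heuristic that follows from this: delete the body of the function under test in your head and ask which tests would still pass.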

The first question is: is this even an AI task?

A good AI workflow does not start with a prompt.

It starts with the question of whether this task should be given to AI at all.

There is a useful distinction between two modes. In the first mode, you know what you want. You use AI as an accelerator. The task is small, the direction is clear, and you can evaluate the output quickly. Think of a test variant, a refactor within a known pattern, a piece of documentation, or writing out necessary but boring code (Barke et al., 2023).

In the second mode, you do not yet know exactly what you want. You use AI to explore options. That can be useful, but it requires heavier checking. If you do not know the direction yet, AI can confidently walk in the wrong direction. With excellent variable names. That does not help.

There is another point too: AI does not automatically understand your world. Not the codebase, not the users, not the policy, not the release risk. In developer interviews, that boundary comes up clearly: AI can help with a task, but it often lacks the real-world context that gives the task its meaning (Vigh et al., 2026).

So a second question belongs next to the first: what is the risk level?

For boilerplate, AI is often fine. For security-sensitive code, architecture decisions, compliance logic, or release judgment, the situation is different. You can let AI think along, but the judgment remains human. Not ceremonially human. Actually human. Someone has to understand what is happening and take responsibility for what gets checked in.

That is not nostalgia for handcraft. That is professional ownership.

Specification is not paperwork

A lot of bad AI output starts with a task that is too easy to ask for.

“Finish this function.”

“Fix this bug.”

“Write tests.”

That sounds efficient. Usually it is just vague.

AI fills in the gaps. That is what it does. But the model does not fill those gaps with your domain knowledge, your release context, or the agreements that emerged in a pull request three years ago and were never written down afterward. It fills them with probability.

Sometimes that is good enough. Often it is almost good enough.

That is why specification is one of the most important steps in AI-assisted development. Requirements clarification can measurably improve the quality of generated code precisely because the model has to guess less (Mu et al., 2024). Not as a heavy document. As explicit preparation:

What should happen functionally?
Which constraints apply here?
Which existing code or interface is authoritative?
What is explicitly out of scope?
Which risks need to be tested or reviewed?

That may feel like slowing down, but it is the cheapest place to prevent mistakes. An unclear prompt produces an unclear patch. After that, you get to play archaeologist in your own diff. With a little bad luck, you find pottery too.

Good specification does not make AI magical. It makes AI bounded. And bounded is exactly what you want in software development. The goal is not to offload thinking, but to stay engaged while AI reduces friction (Gerlich, 2025; Sarkar et al., 2024).
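
One way to keep the five preparation questions from this section honest is to write them down as a small structure before prompting. The sketch below is an illustration, not a prescribed format: the TaskSpec class, its field names, and the example values are all assumptions made for this article.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the five preparation questions."""

    behavior: str  # What should happen functionally?
    constraints: list[str] = field(default_factory=list)    # Which constraints apply?
    authoritative: list[str] = field(default_factory=list)  # Which code or interface is authoritative?
    out_of_scope: list[str] = field(default_factory=list)   # What is explicitly out of scope?
    risks: list[str] = field(default_factory=list)          # Which risks need testing or review?

    def missing(self) -> list[str]:
        """Return the names of the questions that are still unanswered."""
        answers = {
            "behavior": self.behavior.strip(),
            "constraints": self.constraints,
            "authoritative": self.authoritative,
            "out_of_scope": self.out_of_scope,
            "risks": self.risks,
        }
        return [name for name, value in answers.items() if not value]

    def to_prompt(self) -> str:
        """Render the spec as plain text; refuse while questions are still open."""
        open_questions = self.missing()
        if open_questions:
            raise ValueError(f"unanswered: {', '.join(open_questions)}")

        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items)

        return (
            f"Task:\n{self.behavior}\n\n"
            f"Constraints:\n{bullets(self.constraints)}\n\n"
            f"Authoritative code or interfaces:\n{bullets(self.authoritative)}\n\n"
            f"Out of scope:\n{bullets(self.out_of_scope)}\n\n"
            f"Risks to test or review:\n{bullets(self.risks)}"
        )


# Example values are invented for illustration.
spec = TaskSpec(
    behavior="Return HTTP 429 when a client exceeds 100 requests per minute.",
    constraints=["No new dependencies"],
    authoritative=["The existing rate limiter interface (hypothetical)"],
    out_of_scope=["Changing the storage backend"],
    risks=["Clock skew between instances"],
)
print(spec.to_prompt())
```

The point of raising an error on open questions is not strictness for its own sake. It simply moves the moment of "what did I actually want?" to before the patch exists instead of after.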

Verification is where the real work lives

The most important workflow rule is simple:

AI output is a draft.

Not a decision. Not evidence. Not a tiny colleague who has already taken responsibility. A draft.

That is why verification has to be explicit. Not just scrolling through the diff. Not “the tests run, so fine.” But deliberately checking what the change does.

I look at four things.

First: do I understand the code myself? If I cannot explain a line, that is not a sign that AI is smart. It is a sign that I am not done.

Second: is the behavior correct? Not only for the example in the prompt, but for representative input and edge cases.

Third: does it fit the codebase? Naming, error handling, logging, performance, dependencies, test style. AI often writes generally neat code. A codebase needs specifically fitting code.

Fourth: what are the non-functional consequences? Maintainability, security, readability, observability. The boring words production runs on.

Tooling helps here. Tests, linters, type checks, static analysis, security scanners. Not because tooling sees everything. Because tooling consistently sees what humans like to skip on Friday afternoon.
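
As a sketch of what "consistently" can look like, the snippet below runs a set of check commands and reports which ones failed. It uses only the Python standard library. The tool choices in example_checks (pytest, ruff, mypy) are assumptions for illustration; substitute whatever the project already relies on.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every command, even after a failure, and collect the results."""
    results = []
    for name, command in checks.items():
        try:
            completed = subprocess.run(command, capture_output=True, text=True)
        except OSError as error:
            # The tool is not installed or cannot be started: count it as a failure.
            results.append(CheckResult(name, False, str(error)))
            continue
        results.append(
            CheckResult(name, completed.returncode == 0, completed.stdout + completed.stderr)
        )
    return results


def all_passed(results: list[CheckResult]) -> bool:
    return all(result.passed for result in results)


def example_checks() -> dict[str, list[str]]:
    """Assumed tool set, for illustration only."""
    return {
        "tests": [sys.executable, "-m", "pytest", "-q"],
        "lint": [sys.executable, "-m", "ruff", "check", "."],
        "types": [sys.executable, "-m", "mypy", "."],
    }
```

Calling run_checks(example_checks()) in a project where those tools are installed gives one result per check. A passing run says only that the automated checks passed, which is a narrower claim than "this change is fine".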

The literature is fairly clear on this: AI output introduces structural quality problems. In a large-scale study of more than 300,000 AI-generated commits, 22.7 percent of identified issues persisted into the latest repository version, with code smells by far the dominant issue type (Liu et al., 2026). Security is even more sensitive. Research on secure code generation shows that “secure” and “functional” do not automatically arrive together, especially under adversarial prompting (Tessa et al., 2026). A model saying code is secure has no authority. It has produced text. That is different.

Review becomes more important, not less

One misunderstanding about AI is that it makes review less necessary.

I think the opposite is true.

AI can help with the first layer. It can flag conventions, point out suspicious patterns, summarize a diff, or suggest questions for the reviewer. That is useful; at Google, AI has been used precisely as a first filter for coding practices, so human reviewers can spend more attention on logic and design (Vijayvergiya et al., 2024). Nobody becomes a better engineer from the twelfth comment about formatting.

But peer review is more than bug finding.

Review is also knowledge transfer. It is team memory. It is where someone asks: why are we doing this this way? It is where implicit standards become explicit. And it is a social moment of accountability: I submit work to people I will work with again tomorrow (Bacchelli & Bird, 2013).

An AI review cannot simply stand in for that. You can feel accountable to a teammate. That sense of accountability does not carry over to a model in the same way (Alami et al., 2025).

That does not mean AI should stay out of review. It means AI can be a filter, not the final gate. Let AI collect the easy signals. Let humans judge semantics, architecture, risk, and ownership.

The reviewer should not ultimately ask: “Did AI find anything here?”

The reviewer should ask: “Does the author understand this change, and does it fit what we as a team are willing to maintain?”

Ownership cannot be transferred

This may be the most practical point.

If I commit code, it is my code.

Even if AI suggested it. Even if an agent wrote it. Even if the suggestion looked better than my first version. The repository has no moral footnote saying: “Do not be mad, this came from a model.” If code is wrong, “AI wrote it” is not a defense (Vigh et al., 2026).

Ownership does not mean you have to type everything yourself. That is a romantic but impractical idea. Ownership means you understand what you accept. You can explain why this solution fits. You know which checks were run. You dare to name the risks. And you are willing to maintain the code later.

That is why I do not find the question “how much code did AI write?” very interesting. That question mostly tells us about origin.

The better question is: how much of the AI output was understood, adapted, tested, and consciously accepted?

In an analysis of pull requests that involve ChatGPT, developers often do not take AI patches wholesale. They select, modify, rewrite, and combine them (Ogenrwot & Businge, 2026). That is not a weakness of the tool. That is what professional use looks like.

AI may be a starting point. It may even be a very good starting point. But the endpoint remains human judgment.

A workflow that actually works

If you want to use AI seriously in development, you do not need a grand manifesto. You need a workflow boring enough to keep using.

For me, it comes down to seven steps:

1. Decide first whether the task is suitable for AI.
2. Make intent, context, and constraints explicit.
3. Let AI produce a small draft.
4. Check the output systematically.
5. Run tests, static analysis, and security checks where needed.
6. Use review as the human final gate.
7. Make AI use and verification visible in the handoff.

That sounds less spectacular than “agent builds feature independently.”

Good.

Spectacular is not the goal. Working software that people dare to change is the goal.
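
The seven steps can be kept visible with something as plain as a checklist attached to the change. The sketch below is one possible shape, assuming a hypothetical Handoff record. The step labels paraphrase the list above, and none of this is tied to a specific tool or platform.

```python
from dataclasses import dataclass, field

# Paraphrased labels for the seven workflow steps.
STEPS = (
    "task judged suitable for AI",
    "intent, context, and constraints written down",
    "AI produced a small draft",
    "output checked",
    "tests, static analysis, and security checks run where needed",
    "human review completed",
    "AI use and verification noted in the handoff",
)


@dataclass
class Handoff:
    """Hypothetical record of which workflow steps were done for one change."""

    change: str
    done: set[str] = field(default_factory=set)

    def complete(self, step: str) -> None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step}")
        self.done.add(step)

    def open_steps(self) -> list[str]:
        """Steps not yet completed, in workflow order."""
        return [step for step in STEPS if step not in self.done]

    def ready_to_merge(self) -> bool:
        return not self.open_steps()

    def summary(self) -> str:
        """Plain-text checklist, suitable for pasting into a change description."""
        lines = [f"Change: {self.change}"]
        for step in STEPS:
            mark = "x" if step in self.done else " "
            lines.append(f"[{mark}] {step}")
        return "\n".join(lines)
```

The checklist does not verify anything by itself. Its only job is to make skipped steps visible to the author and the reviewer, which is usually enough to prompt the right question.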

The best AI workflow therefore does not feel like magic. It feels like normal professional work, with a fast assistant nearby. The tool matters, but the workflow around it matters at least as much. That assistant can do a lot, but it does not know your system the way your team knows it. It does not feel production risk. It does not have to explain the code six months from now. It will not be in the retrospective when someone asks why complexity suddenly went up.

You will.

The question that stays

I am not against AI in software development. Quite the opposite. I like using it, precisely because it can make a lot of work faster, sharper, and easier to keep an overview of.

But I do not want speed to be confused with quality.

An AI patch that appears quickly is not a finished change. It is a proposal. Sometimes a good proposal. Sometimes a proposal with polished shoes and mud on the soles. Fast code can still leave debt behind if validation and review get weaker (Liu et al., 2026; Sun et al., 2026).

So the professional question is not: can we make AI write code?

The question is: can we use AI without handing over our own standards?

That is where the real work starts. Not at the prompt. In what you do afterward.

Sources

This article is based on the literature synthesis “AI as an accelerator, not a replacement.”

Alami, A., Jensen, V., & Ernst, N. (2025). Accountability in code review: The role of intrinsic drivers and the impact of LLMs. ACM Transactions on Software Engineering and Methodology, 34(8), 1-44. https://doi.org/10.1145/3721127

Bacchelli, A., & Bird, C. (2013). Expectations, outcomes, and challenges of modern code review. Proceedings of the 35th International Conference on Software Engineering, 712-721.

Barke, S., James, M. B., & Polikarpova, N. (2023). Grounded Copilot: How programmers interact with code-generating models. Proceedings of the ACM on Programming Languages, 7(OOPSLA1), 85-111.

Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. arXiv. https://doi.org/10.48550/arXiv.2507.09089

Chen, V., He, J., Williams, B., Valentino, J., & Talwalkar, A. (2026). Beyond the commit: Developer perspectives on productivity with AI coding assistants. arXiv. https://doi.org/10.48550/arXiv.2602.03593

Cui, Z., Demirer, M., Jaffe, S., Musolff, L., Peng, S., & Salz, T. (2025). The effects of generative AI on high-skilled work: Evidence from three field experiments with software developers. SSRN. https://doi.org/10.2139/ssrn.4945566

Fan, G., Liu, D., Pan, L., & Zhang, R. (2026). When help hurts: Verification load and fatigue with AI coding assistants. Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, 1-25. https://doi.org/10.1145/3772318.3791176

Gerlich, M. (2025). From offloading to engagement: An experimental study on structured prompting and critical reasoning with generative AI. Data, 10(11), 172. https://doi.org/10.3390/data10110172

Liu, Y., Widyasari, R., Zhao, Y., Irsan, I. C., Chen, J., & Lo, D. (2026). Debt behind the AI boom: A large-scale empirical study of AI-generated code in the wild. arXiv. https://doi.org/10.48550/arXiv.2603.28592

Mozannar, H., Bansal, G., Fourney, A., & Horvitz, E. (2024). Reading between the lines: Modeling user behavior and costs in AI-assisted programming. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1-16. https://doi.org/10.1145/3613904.3641936

Mu, F., Shi, L., Wang, S., Yu, Z., Zhang, B., Wang, C., Liu, S., & Wang, Q. (2024). ClarifyGPT: A framework for enhancing LLM-based code generation via requirements clarification. Proceedings of the ACM on Software Engineering, 1(FSE), 2332-2354. https://doi.org/10.1145/3660810

Ogenrwot, D., & Businge, J. (2026). PatchTrack: A comprehensive analysis of ChatGPT’s influence on pull request outcomes. Empirical Software Engineering, 31(5). https://doi.org/10.1007/s10664-026-10869-5

Sarkar, A., Xu, X., Toronto, N., Drosos, I., & Poelitz, C. (2024). When Copilot becomes Autopilot: Generative AI’s critical risk to knowledge work and a critical solution. arXiv. https://doi.org/10.48550/arXiv.2412.15030

Sun, X., Ståhl, D., Sandahl, K., & Kessler, C. (2026). Quality assurance of LLM-generated code: Addressing non-functional quality characteristics. Journal of Systems and Software, 238, 112885. https://doi.org/10.1016/j.jss.2026.112885

Tessa, M., Olatunji, I. E., War, A., Klein, J., & Bissyande, T. F. (2026). How secure is secure code generation? Adversarial prompts put LLM defenses to the test. arXiv. https://doi.org/10.48550/arXiv.2601.07084

Vigh, E., Sunesen, F., & Barkhuus, L. (2026). “AI does not understand the real world.”: AI augmented software development. Proceedings of the Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems, 1-5. https://doi.org/10.1145/3772363.3799079

Vijayvergiya, M., Salawa, M., Budiselić, I., Zheng, D., Lamblin, P., Ivanković, M., Carin, J., Lewko, M., Andonov, J., Petrović, G., Tarlow, D., Maniatis, P., & Just, R. (2024). AI-assisted assessment of coding practices in modern code review. Proceedings of the 1st ACM International Conference on AI-Powered Software, 85-93. https://doi.org/10.1145/3664646.3665664

AI Statement

AI was used to support turning the literature synthesis into a public-facing blog article, including structure, wording, and consistency checking. The content decisions, source interpretation, final editing, and responsibility for the final text remain with the author.