Mind the Gap

AI won't end software engineering, and it won't vibe-code a cure for cancer. But it is changing what counts as evidence that your code is safe.

securityengineering

Firefox 150 shipped last month with fixes for 271 vulnerabilities. Most of them were found by Anthropic's Mythos, a general-purpose model that, by Anthropic's own account, wasn't trained for security in particular. The capability just emerged. And Mozilla didn't panic about it. Their post, "The zero-days are numbered", is optimistic despite the title. The argument goes roughly like this: for a long time the attacker only had to find one way in, while the defender had to find all of them, and that asymmetry quietly favoured the attacker. Once we have a machine that finds bugs about as cheaply as a very good human does, the asymmetry starts to fade.

I think they're basically right. That's actually good news. It is also, despite a fair amount of noise to the contrary, not the end of anything.

We don't seem able to talk about AI without picking an edge to stand on. One edge is the end of software engineering as a career. The other is the weekend on which you vibe-code a cure for cancer. AI might genuinely help cure cancer, for what it's worth. It won't do it by vibe-coding, though, and probably not as fast as the loudest people expect. Both edges are popular for the same reason. Standing on an edge is easy. It costs nothing except a confident opinion. The work that matters is in the unglamorous middle, where it usually is.

Intent vs. Behavior

So what does the middle actually contain? Mostly this. When a human reviews code for security, the question in their head is something like "does this do what the spec intended?" A machine reviewing the same code asks a wider and less polite question: "what does this code permit, whoever wrote it and whatever they meant by it?" A lot of real vulnerabilities live in the gap between these two questions: intent vs. behavior. Take the endpoint that returns your own invoice, and also returns everyone else's the moment you change the id in the URL. Nobody intended that, and the code allowed it anyway. Nate has a good walkthrough of that gap if you want the longer version.
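To make the invoice example concrete, here is a minimal sketch of that gap. Everything in it is made up for illustration: the Invoice type, the in-memory INVOICES store, the two handler functions and the readable_by probe are assumptions, not code from any real service. The first handler is what got written, the second is what was meant, and the probe asks the machine's question: what does this code actually permit?

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner: str
    amount_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(1, "alice", 4200),
    2: Invoice(2, "bob", 9900),
}

Handler = Callable[[str, int], Optional[Invoice]]


def get_invoice_as_written(current_user: str, invoice_id: int) -> Optional[Invoice]:
    """Behavior: any logged-in user can read any invoice by changing the id."""
    # current_user is accepted but never consulted. That is the gap.
    return INVOICES.get(invoice_id)


def get_invoice_as_intended(current_user: str, invoice_id: int) -> Optional[Invoice]:
    """Intent: a user can only read invoices they own."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner != current_user:
        return None  # "not yours" is treated the same as "not found"
    return invoice


def readable_by(handler: Handler, user: str) -> Set[int]:
    """Ask what the code permits: try every known id as the given user."""
    return {i for i in INVOICES if handler(user, i) is not None}


if __name__ == "__main__":
    print(readable_by(get_invoice_as_written, "alice"))
    print(readable_by(get_invoice_as_intended, "alice"))
```

A reviewer who reads the first handler with the intent in mind can easily nod it through, because it does return your invoice. The probe has no idea what anyone meant. It tries every id and reports what came back, which is roughly the wider question described above, at toy scale.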

There is a problem with the word "spec", though. Most software hasn't really got one. The code is the spec. That is the actual issue, because code does precisely what it says, including all the things nobody asked it to do. You are not supposed to say this out loud. People take it personally. I'll say it anyway.

So the machine reads that de facto spec back to you and tells you what it really does, not what you were intended^H hoping it did. That quietly changes what we accept as evidence that a piece of code is safe. For decades the evidence has been, more or less, that competent people wrote it and other competent people reviewed it. That standard is starting to look thin. The replacement is less flattering to all of us. The code is safe because something picked it apart, hard, at a scale no review meeting will ever match. And btw: It can be both.

The Benefit

None of this is a tragedy for engineers, whatever the headlines say. It is a better tool, and it points at exactly one thing, which is security. It does not understand your customers. It has no opinion on whether the product is worth building. That part was always the job, and it still is. What the tool does is close a gap that we, if we're honest, were never really good at closing ourselves. Software gets safer, and we should be thankful for that.

We've been here before, more than once. "Machines will never beat a human at Go" aged about as well as most sentences that begin with "machines will never." The sane response to a sharper tool was never to grieve it, and never to claim it does everything either. It's pragmatism. Learn the craft properly, watch what the tool changes, and use it where it earns its place.