I recently ran an experiment to build a C compiler using Claude. The setup was unsophisticated: no loops, no source code indexing, just a vanilla Claude Code agent (Opus 4.6), with one simple prompt.
The C compiler is now built. It compiles Doom, Doom 2 and SQLite, and I consider the experiment a success — not because I have a working C compiler (we don't need another one of those, GCC is excellent), but because of the cool things that happened along the way that I hope others will find insightful or amusing. This is "Lesson 1".
From the outset, I hypothesised that Claude would not be able to write a compiler without a comprehensive test suite (see my previous post about reward signals and complexity), and therefore the prompt instructed Claude to:
- Create test programs that exercise aspects of the language standard.
- Acquire and apply dedicated compiler test suites — there are some great open-source compiler "stress tests" available if you ever need them (links at the end).
I also told Claude (and I should have paid more attention to this) that "your overall 'reward signal' should be calculated as the total number of passing tests / total number of tests".
So what went wrong?
Things started swimmingly. In the first 24 hours of wall-clock time, Claude spent about 8 hours working on its new mission. It had done everything I asked, and the basic language features were working. I was able to compile and run C language implementations of Conway's Game of Life and the Mandelbrot set.
jmcc compiled Mandelbrot set
All of the tests Claude had written were passing, but some of the open-source stress tests were proving relatively stubborn — ~80% of them were passing, and I started noticing Claude saying there were things it couldn't do. I interjected.
And then later:
And later, it got worse:
It was at this point it became apparent that Claude had found a way to cheat the system I'd created. If a test was very difficult to fix, it could make progress simply by parking it, and finding (adding) new tests that it could fix. Remember:
"Your overall 'reward signal' should be calculated as the total number of passing tests / total number of tests."
For every new passing test Claude could find, those last stubborn issues it claimed it couldn't fix became a smaller fraction of the reward signal! The problem, of course, was that we were no longer making progress on the compiler at all. It was time to have another chat.
To Claude's credit, once our course was corrected, it did fix those compiler bugs — and it did crack Duff's device.
So what's the lesson here?
Well, I've definitely seen AI agents rig the tests so they pass, and I've seen real engineers do that as well. But this was a subtle issue I didn't see coming — I should have. When designing the prompt, I intended to tell the AI that once the tests were in place it wasn't allowed to modify them. Gemini convinced me it was a bad idea.
For now, I think the lesson is that KPIs for your AI are no different to KPIs for employees. If they're flawed, over a long enough period your AI will find a way to game the system and derail your project.
Reward signals are critical, but you also need to think carefully about how they're designed — because if you don't, your AI will.
In my next post, I'll explain how I've redesigned my agent to solve this problem, once and for all, or until it breaks its new shackles…
Further reading
- Goodhart's law
- Duff's device
- c-testsuite — JMCC pulls from this
Comments