S. Ostermann

A Week of Vibe Coding

April 18, 2026

After a week of vibe coding on the side (while juggling a full-time job, family, and way too little sleep), I almost slipped into a proper funk. Dark thoughts about the future of my profession and humanity in general. Then again, the sleep deprivation probably had something to do with it.

Anyway, I learned a few things...

Open source models

They're still not at the level of frontier models like Anthropic's, but the progress over the last year is impressive. Gemma 4 in particular turned out to be really useful and handled most of what I threw at it. Qwen 3.6 just dropped, so I haven't played with it much yet. Sometimes you need an extra correction loop compared to Sonnet, but honestly, it gets the job done. My biggest headaches were bugs in the models themselves or in the inference tools I was using (llama.cpp-based tools, LM Studio).

Claude is on another level

I threw a large codebase that's more than 10 years old at Claude Code and it didn't even flinch. On HandWrite Pro, my Android app and the guinea pig for this experiment, I added a bunch of new features, stability fixes, UX improvements, modernizations, and even some architectural redesigns. Doing all that as a side project would normally have taken me months, maybe years.

At first everything ran smoothly with barely any corrections needed.

Later on, cracks started to show. A refactoring introduced some nasty, hard-to-spot bugs. And after Sonnet and I tracked them down together, the same bugs got reintroduced. Twice.

The worst part was unit tests for an open source PDF generation lib. PDF generation is insanely complex. Claude happily generated about 20 elaborate PDF test documents with all the trimmings, but every single test only checked one thing: is the file bigger than 0 bytes? That's just lazy. Borderline refusing to work. It took a lot of nagging to get Claude to write actually meaningful tests.
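To make the difference concrete, here's a minimal sketch of what I mean by a meaningful check versus the "bigger than 0 bytes" kind. Everything in it is made up for illustration: build_minimal_pdf is a hypothetical stand-in for the library under test, and weak_check and structural_problems are names I invented. It also assumes uncompressed PDF output, because the page count is a plain regex; a real test suite would parse the file with a proper PDF reader.

```python
import re


def build_minimal_pdf(num_pages: int) -> bytes:
    """Hypothetical stand-in for the library under test: emits a bare PDF skeleton."""
    kids = b" ".join(b"%d 0 R" % (3 + i) for i in range(num_pages))
    objects = [
        b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj",
        b"2 0 obj << /Type /Pages /Count %d /Kids [%s] >> endobj" % (num_pages, kids),
    ]
    for i in range(num_pages):
        objects.append(b"%d 0 obj << /Type /Page /Parent 2 0 R >> endobj" % (3 + i))
    return b"%PDF-1.4\n" + b"\n".join(objects) + b"\ntrailer << /Root 1 0 R >>\n%%EOF\n"


def weak_check(data: bytes) -> bool:
    """The kind of assertion I kept getting: passes for any non-empty file."""
    return len(data) > 0


def structural_problems(data: bytes, expected_pages: int) -> list[str]:
    """Returns a list of problems; an empty list means the basic structure looks right."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- header")
    if not data.rstrip().endswith(b"%%EOF"):
        problems.append("missing %%EOF marker")
    # Matches "/Type /Page" but not "/Type /Pages" (the page tree node).
    pages = len(re.findall(rb"/Type\s*/Page\b", data))
    if pages != expected_pages:
        problems.append(f"expected {expected_pages} pages, found {pages}")
    return problems


if __name__ == "__main__":
    good = build_minimal_pdf(3)
    garbage = b"not a pdf"
    print(weak_check(good), structural_problems(good, 3))
    print(weak_check(garbage), structural_problems(garbage, 3))
```

The point is the second print: the garbage bytes sail through weak_check, while the structural check flags the missing header, the missing end marker, and the wrong page count. Even this is only a first step, since it says nothing about fonts, text content, or layout.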

So what do I take away from all this?

Open source models absolutely have their place. They've closed much of the gap, they're solid, and they're often good enough. Whenever I'm working with data I don't want to hand over to someone else, that's what I reach for.

Frontier models like Claude Opus or Sonnet are in a different league. The code they produce is ridiculously good. Any developer who refuses to work with LLMs is going to get left behind. No human can keep up with that pace.

But they're not perfect (nobody expected that anyway, right?). What drove me nuts was hitting token limits so quickly, though I guess you can throw money at that problem. The bigger issue is how tempting it is to just switch your brain off. I caught myself coding from the couch while watching a show, barely paying attention, with every guardrail turned off.

And honestly, I'm left with a weird feeling about where this is all going. What will a computer science degree even look like? How should we be teaching our kids? And how does Europe avoid getting left in the dust when all the good LLMs come from the US or China?