Ah, we've reached this point, haven't we?
I've been avoiding AI as a topic, LLMs in particular. Partly because my first attempt at using an LLM for coding was a long time ago, and partly because I wasn't quite sure where I stood on it. At first it felt... amazing! For the first 15 minutes. Then disappointing for the next five days. I dropped it and did not touch it again for a long while. That was GitHub Copilot, about 1.5 years ago.
Now... there are some LLMs that can be helpful. Somewhat. Sometimes. Claude Opus 4.6, at the time of writing, can be quite clever.
What works when asking questions?
If the question is clear enough, not unknowable, not unanswerable, I have to admit that Claude Opus at least does a pretty swell job of answering. Especially about technical things, where it has a lot of knowledge. Asking about weird parts of plumbing in some specific European country... works too. It is shockingly useful.
However... asking about more specific things, especially ones that are not part of modern life, can still produce hallucinated or half-incorrect answers.
What works when writing code?
Small steps. Requests where I ask it to do something very specific. If there is already enough context to do something sensible, then even a somewhat more complex request might work. Maybe. What worked well without AI works well with AI.
AI works well with KISS. Independent modules that can be started up on their own, as described in Monolith of microservices, work perfectly. Cross-dependencies tend to confuse AI much faster.
Same applies to code style. Simple beats complex. Writing hyper-layered code confuses AI faster than writing simple, highly cohesive, decoupled code. Don't layer unless it is necessary for testing purposes. Forget the "MUST ALWAYS HAVE" layers that some architecture ast... khm... architects try to enforce on everything to this day. If it felt like a waste of time before, it is still a waste of time now. Be pragmatic, be practical.
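To make the contrast concrete, here is a hypothetical sketch (all names invented for illustration) of the same one-line lookup written directly versus buried under mandatory layers:

```python
from typing import Optional

# Simple and cohesive: one function, trivial for a human or an LLM to follow.
def get_user_email(users: dict, user_id: int) -> Optional[str]:
    user = users.get(user_id)
    return user["email"] if user else None

# Hyper-layered: a repository and a service wrapping the same lookup.
# Every extra hop is one more place for an LLM (or a human) to lose context.
class UserRepository:
    def __init__(self, users: dict):
        self._users = users

    def find_by_id(self, user_id: int):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo: UserRepository):
        self._repo = repo

    def email_for(self, user_id: int) -> Optional[str]:
        user = self._repo.find_by_id(user_id)
        return user["email"] if user else None
```

Both do exactly the same thing. The second version only earns its keep if you genuinely need to swap the storage out for testing; otherwise it is just more surface area for the AI to get lost in.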
AI and testing
Enforce 100% coverage. There is no excuse anymore to settle for a lower percentage. It's OK to let AI write the tests. Yes, AI is silly and will sometimes try to fake it with generic tests, so it must still be monitored. There are also more automatic methods to detect this, but they are beyond the scope of this article. I might write about them at a later date.
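One way to make this a hard gate, assuming a Python project using pytest with the pytest-cov plugin (swap in the equivalent for your own stack), is to let coverage.py fail the build below 100%:

```toml
# pyproject.toml (fragment) -- read by coverage.py
[tool.coverage.report]
fail_under = 100       # any run below 100% coverage fails the build
show_missing = true    # list the uncovered line numbers in the report
```

Then a plain `pytest --cov=yourpackage` (package name is a placeholder) refuses anything the AI, or you, left untested.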
With 100% test coverage, at the very least you'll see if AI is regressing in places. Even if you let it go ahead and "vibe" some tasks, you can have a look at your version control staging area (you are using version control, riiiiight???) and see whether it has been up to no good.
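A minimal review loop for that, using plain git and nothing exotic, might look like:

```shell
git add -N .       # register brand-new files so diffs include them too
git diff --stat    # quick overview: which files changed, and by how much
git diff           # full line-by-line review before anything is committed
```

The `--stat` pass is the fast smoke test: if the AI touched twelve files for a one-file task, you know where to look before reading a single diff hunk.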
What does the AI not do?
It is like the gremlins of old stories, who will do your bidding if you ask for something. It will try its best, and it is not stupid, but it is forgetful and eager to please. It knows a tremendous amount, but it has no will of its own. Everything it does must still be double-checked. Not because what it does is necessarily wrong, but because it has neither the same high-level context nor the same reasoning abilities.
At least not yet. As far as I understand LLM architectures, they will not truly have those without a significant algorithmic change. This, however, is again beyond the scope of this article.
- Heidi (Founder)