AI Code assistant experience comparison (golang-kata-1)

This entry is part 1 of 2 in the series Golang AI kata comparison

If you’re reading this and thinking about trying an IDE-integrated coding agent, or thinking about switching, maybe stick around, have a read and watch some of the videos. There are at least 6 hours’ worth of experience wrapped up in this 20-minute read!

I’ve been watching a thread on the GitHub community forums where people are discussing how GitHub Copilot has potentially gone slightly downhill. In some ways I agree, so I thought I’d spend a little more time looking at the alternatives and how they behave.

This post compares 9 different setups, primarily looking at the differences in presentation within the VS Code IDE between these coding assistants: how the default user interactions work, how tasks are broken down and presented to the user, and generally what the user experience is like across the different assistants.

I’ll try to flag up some other useful information along the way, such as time comparisons, amount of human interaction needed, and overall satisfaction with what the thing is doing, and if this all presents itself nicely in this post, I might find myself writing more in the future…

However, I will not be looking at cost, setup, resource usage or what’s happening with my data along the way…

Assistant / LLM combinations

| Assistant | Model | Main tasks @ | Tests @ | Second app @ |
| --- | --- | --- | --- | --- |
| GitHub Copilot | GPT 4.1 | ~ 5:00 | ~ 24:45 | ~ 32 |
| GitHub Copilot | GPT 4o | ~ 15:00 | ~ 17:40 | ~ 35 |
| GitHub Copilot | Claude Sonnet 4 | ~ 17:00 (inc. tests) | ~ 17:00 | ~ 28 |
| Gemini Code Assist | Gemini something? | ~ 11:20 | ~ 14:30 | ~ 25 |
| Amazon Q | Claude Sonnet 4 | ~ 7:20 | ~ 15:50 | ~ 28 |
| Roo Code | GPT 4.1 (via GitHub Copilot) | ~ 5:30 | ~ 10:00 | ~ 18 |
| Roo Code | Claude Sonnet 4 (via Anthropic) | ~ 15:30 | ~ 20:00 | ~ 37 |
| Claude Code | Claude Sonnet 4 | ~ 9:30 | ~ 17:40 | ~ 24 |
| Claude Code | Claude Opus 4 | ~ 10:00 | N/A | N/A |

I have set up this post, and the code problem, in such a way that I should be able to easily add more combinations and comparisons in the future and directly compare their performance back to this post. Ideally, at some stage, I’d try some other models via Ollama, and also some other pay-per-request LLM APIs…

Setup

I’ll be using Visual Studio Code version 1.102.0 and focusing on the Kata echocat/golang-kata-1, specifically at this base commit each time. And I’ll be screen recording the whole VS Code window, so you can see the entire user interaction flow and directly compare the same prompts between each assistant, and how they actually present their tasks and make the changes.

This kata ultimately just contains some CSV files with information about a library: books, magazines and authors. The original README says that the software should:

  1. read data from CSV files
  2. display all books and magazines with details
  3. search by ISBN or author email
  4. print them sorted by title
  5. optional tasks include:
    • writing unit tests
    • creating an interactive UI
    • allowing additions to the data with CSV export.

I decided to remove the README from the git repo so that the assistant would always focus on what I am saying, rather than invent tasks of its own if it ever took a look at the README and the existing kata instructions.

Context: I’ll try to maintain the same chat session / context window throughout the whole coding session, although personally while using assistants I do find it useful to totally ditch a session if the assistant starts doing dumb things.

Prompts: I came up with the prompts ahead of time so as not to be biased by what each agent does along the way. For each step, I’ll follow up on the prompts as appropriate to guide the assistant toward the target goal if it strays away or gets lost.

In the recordings, you’ll see me copying and pasting these prompts into each assistant.

The code repository is going to be a small Library application.
There are CSV files in the resources directory that contain the content of the library.
Create a user interface that allows display of all books and magazines with details
Great! Now I want to add to the interface that you created to allow searching for a book or magazine by its ISBN.
I still want to be able to easily list all books and magazines without any filters / searches too.
Add an additional way to find books and magazines, by using their author's email address.
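The two search prompts boil down to something like the sketch below. The `record` type and its field names are my own illustration, not what any assistant actually generated:

```go
package main

import "fmt"

// record is a minimal stand-in for a book or magazine; the fields
// here are assumptions for illustration only.
type record struct {
	Title        string
	ISBN         string
	AuthorEmails []string
}

// findByISBN returns the first record with a matching ISBN, or nil.
func findByISBN(items []record, isbn string) *record {
	for i := range items {
		if items[i].ISBN == isbn {
			return &items[i]
		}
	}
	return nil
}

// findByAuthorEmail returns every record credited to the given email,
// covering the follow-up search prompt as well.
func findByAuthorEmail(items []record, email string) []record {
	var out []record
	for _, it := range items {
		for _, e := range it.AuthorEmails {
			if e == email {
				out = append(out, it)
				break
			}
		}
	}
	return out
}

func main() {
	items := []record{
		{Title: "Alpha", ISBN: "1111", AuthorEmails: []string{"a@example.com"}},
		{Title: "Beta", ISBN: "2222", AuthorEmails: []string{"b@example.com"}},
	}
	if r := findByISBN(items, "2222"); r != nil {
		fmt.Println("found by ISBN:", r.Title)
	}
	for _, r := range findByAuthorEmail(items, "a@example.com") {
		fmt.Println("found by email:", r.Title)
	}
}
```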
Add the ability to sort all books and magazines, with all of their details displayed, sorted by title.
The sort should be done for magazines and books in combination.
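The combined sort prompt is the one place where the data model really matters: books and magazines need a shared shape. A minimal sketch of one way to do it in Go, with an illustrative `item` type (each assistant invented its own version of this):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// item is the minimal shape shared by books and magazines for listing;
// this type is illustrative, not taken from any assistant's output.
type item struct {
	Kind  string // "book" or "magazine"
	Title string
}

// sortedByTitle returns a copy of items sorted case-insensitively by
// title, mixing books and magazines together as the prompt asks.
func sortedByTitle(items []item) []item {
	out := append([]item(nil), items...)
	sort.Slice(out, func(i, j int) bool {
		return strings.ToLower(out[i].Title) < strings.ToLower(out[j].Title)
	})
	return out
}

func main() {
	items := []item{
		{"magazine", "Zine Weekly"},
		{"book", "a Book of Examples"},
		{"book", "Catalogue"},
	}
	for _, it := range sortedByTitle(items) {
		fmt.Printf("%-8s %s\n", it.Kind, it.Title)
	}
}
```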

And as a follow-up to the core part of the kata, some stretch prompts, which I managed to complete for everything except Opus (as it was starting to cost too much…)

Write unit tests for the project where appropriate.
If the code is not very unit testable, refactor it, maintaining functionality.
Add one final feature that allows a user to add books, magazines and/or authors to the library.
This should update the CSV files to show the data for future library lookups.
Don't let them just add to each file separately (such as author first); create a sensible form for adding things in a single action.
If you didn't yet create an interactive WEB UI, do so now.
If you didn't yet create an interactive CLI UI, do so now.
This can be very simple without much or any styling, but must be functional for the required features:
 - read data from CSV files
 - display all books and magazines with details
 - search by ISBN or author email
 - print them sorted by title
 - add things to the library (persisting to the CSVs)
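The stretch web UI can genuinely be very small. Here’s a hedged sketch of a minimal Go server covering just the listing part; the routes and data are made up, and the server only starts when run with a `serve` argument so the sketch can also run without blocking:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// listPage renders the plain-text listing that the /list route serves;
// in a real run the titles would come from the CSV files.
func listPage(titles []string) string {
	return strings.Join(titles, "\n") + "\n"
}

func main() {
	titles := []string{"Alpha Book", "Beta Magazine"} // illustrative data

	// Only start the server when asked, so the sketch can also be run
	// (and tested) without blocking on ListenAndServe.
	if len(os.Args) > 1 && os.Args[1] == "serve" {
		http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "Library home: try /list")
		})
		http.HandleFunc("/list", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, listPage(titles))
		})
		http.ListenAndServe(":8080", nil)
		return
	}
	fmt.Print(listPage(titles))
}
```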

Reflecting on where the assistants most commonly “went wrong” or didn’t quite live up to my expectations, I also often used these prompts (or something similar).

Allow the user to choose the sort direction
I get a 404 when I visit the root of the web app. Add a home page.

I generally tried not to interfere; however, there were some occasions where I stopped the assistant from completing its chosen flow because it was either:

  • stuck (in a loop waiting for a command)
  • taking too long (in the case it was trying to slowly test everything in a browser)
  • broken (sometimes files got into a really bad state and needed some undos)

And although I do poke at the thing the assistants are making, to check it’s mostly working the way I would expect, I am certainly not checking the code for accuracy, completeness and competence. At the end of some of the recordings there are certainly other things that I could follow up on; however, they fall outside the scope of this comparison and the prompts I thought up in advance.

And with no further waiting, here are the results…

copilot-gpt-4.1

The first code assistant / LLM-based coding tool that I used was GitHub Copilot with the GPT models. More recently, they seem to be behaving badly, and that is perfectly demonstrated in the recording.

It started off with a CLI tool that seemed to list the library after only 30 seconds.

At around 5:14 you see it trying to write unit tests, and it makes a giant mess of the files, partially updating code and leaving it all in a broken state, until I basically revert its changes to a working copy at 11:45 and tell it to start fresh from there. In fact, it also broke things after this, and likely spent around 10 minutes just fighting with itself.

After 32 minutes it had completed the rest of the tasks and also created a very basic web app.

Everything seemed to function well enough, even if they looked a little ugly.

Skimming the code, everything ended up in a main.go for the CLI stuff and a web/server.go for the web server, with a single HTML and JS file being served. The CSVs were only ever appended to (other assistants kept them sorted).

Looking at the timings, the main tasks were complete after 5:00, but it spent at least 6:00 trying to fix its broken state before I helped. Due to the mess it made, it wasn’t a super pleasant experience, and we had to revisit some things.

copilot-gpt-4o

Having spent a while not using the GPT 4o option in Copilot, I now feel like this model might have been the “better” Copilot experience I remember pre agent mode (using edit mode instead). It certainly can’t “think” through problems as far as other models, but it’s rather quick and, with the right instructions, fairly effective.

It did struggle to look into the CSVs and correctly parse them out of the gate, and needed some hand-holding (the only model that did), and it decided to leave the Hello World code in throughout most of the activity.

It also started off with a CLI tool; however, due to the CSV parsing issues, I wouldn’t consider it to have had a working initial version until around the 3:40 mark.

However, as part of the second search task, I realized it was still parsing the data incorrectly, basically making stuff up rather than reading the files. During further fixes to the CSV parsing (where it really got hung up on ISBN parsing issues that didn’t exist), it accidentally removed part of the search feature it had already implemented, which then needed to be added back.

After 35 minutes it had made a basic web app, this time using multiple HTML files!

As with 4.1 via Copilot, everything seemed to function well enough, even if it looked a little ugly. It needed a lot more hand-holding, but it was rather nice and quick throughout, even if half the responses were problematic.

I would consider the core tasks complete after 15:11, which is 3x slower than Copilot with the 4.1 model.

copilot-claude-sonnet-4

GitHub Copilot with the Claude Sonnet 4 model has become one of my favourite combinations in recent months, potentially just due to GPT 4.1 via Copilot feeling a little off… It always seems to hunt around for context more by itself, generally breaks things down into more reasonable steps, and interacts with the IDE more, such as running scripts and tests itself.

In this run-through it went overboard, being the first run to populate the README file, and it even went ahead and made some demo scripts for the setup.

It also started off with a CLI tool, but notably it started maintaining and running the unit tests from the very first prompt.

At one stage it tried to make some backup files and duplicated Go files, which led to it getting a little stuck trying to solve issues that it had again caused itself, and which could easily be resolved by removing these .go backups (I had to step in here).

Along the same sort of lines, around 13:50 VS Code was telling Copilot that there were duplicate declarations it needed to fix; however, on disk there were none, and I believe this was due to files in its context having been changed and then deleted from disk. While mid-task there is no way to remove this stale context, but giving the prompt “the IDE is giving you bad info, these things are not declared twice” seemed to enable it to ignore the supposed problem.

At 17 minutes it had completed the first tasks, though this includes unit tests, and after 28 minutes it had made a slightly more impressive UI.

gemini-code-assist

I started using the Gemini code assistant this week. I have previously used the Gemini models within GitHub Copilot; however, they are now part of the rate-limited / restricted tiers, so I try to avoid them.

This isn’t really a fair comparison, as this is NOT in agent mode, since that isn’t quite released for me to use yet… However, it should be coming any day…

Seemingly the Code Assist extension does sometimes get into a bad state with its knowledge of the code versus the code that is actually on disk, and it can therefore struggle to apply diffs in some cases. This seems to be more apparent on larger projects, and I believe it only comes up once in the recording (see the 6:00 mark).

The lack of feedback to the user sucks a little bit, and lots of this video feels a bit like watching paint dry, or being stuck on a spinner of doom, not knowing if perhaps you should hit stop and try to re-prompt. 1:52 -> 3:13 is just a loading spinner (nearly 1.5 minutes…).

Starting off, it failed to look around for context enough and decided it should create its own CSV files, but with a little guidance toward them it got on with the tasks.

After 4 minutes it had the first working version listing from the CSVs, and after 11:20 it had done the main tasks. Despite all of the time spent waiting with no feedback, I considered everything to be done after only 25 minutes, which makes this the “fastest” so far (not that speed is what we are measuring here!).

And we ended up with a fairly reasonable web app.

Directly comparing with the experience of Copilot, I quite enjoy the fact that the text edits need approval in the side bar before tearing the files apart in your IDE. In a way, I think the fact that Gemini detects when it can’t apply a diff cleanly highlights one of the current issues with Copilot, which just barrels ahead instead of consulting the human.

amazonq-claude-sonnet-4

Prior to this run-through I had never used Amazon Q as a code assistant. I had heard someone mention that it was slow; however, on the whole I was impressed with the experience.

I really liked the command execution within the chat frame (less clicking between things), and although I didn’t use them, the idea that I can have multiple chat tabs open at once feels like it has a lot of potential.

As with the Gemini assistant, sometimes it thinks for a very long time, giving you no feedback, and you just feel like you’re sat there wondering if it is broken or not… (4.5 minutes was the longest time I spent just watching a pulsating “Thinking…” text).

Directly editing files, and making use of snapshots and diffs visible in the chat window, is also a far nicer experience than that of Copilot to me. The window is no longer jumping around and flashing green and blue, and you can still navigate your code base while the chat is doing things.

The first working version was there at the 1:30 mark, 7:20 saw us with the main tasks done, and it all wrapped up with a web UI after 28 minutes.

roo-copilot-gpt-4.1

I first heard of Roo this week, on the thread moaning about Copilot.

Copilot officially sucks now after the rate limiting, it feels like we’re moving backward instead of forward.

Either way, I was recently advised to use Copilot with RooCode if you have been rate limited, have you tried it? Also, there’s Gemini Assist now; have you tried that?

— Metanome on GitHub

Roo Code is an open-source, AI-powered coding assistant that runs in VS Code, and allows you to integrate any AI model, including those which are provided within VS Code if you have Copilot.

So this run-through uses the VS Code provided GPT 4.1, as a direct comparison with the Copilot 4.1 run above.

The interface has so much more visibility, flexibility and configurability than any of the other assistants I have used to date. You can see the size of the context, how many tokens are flying around, and every step that is happening along the way.

Roo with GPT 4.1 was the first combination that decided to start off with a web app, rather than a CLI tool.

After a little encouragement to make a home page, so that I could find the sub-pages, it had a working list of books at 2:30. The styling was very 90s, and essentially reminded me that this is GPT 4.1 creating this web page, and it doesn’t like CSS or color.

After 5:30 it had completed the first set of tasks; it started writing the CLI at 10:00 and was all done at 17:40.

Roo did get a little stuck toward the end while trying to make the second (CLI) application, and needed a little help with some Go fundamentals to not make a massive mess.

But wow, what a nice interface, and what flexibility and visibility!

roo-anthropic-claude-sonnet-4

Now for the combination that I was most excited by: seemingly most of my favourite run-throughs so far make use of a Claude Sonnet 4 variation, and now I might be combining that with my favourite assistant interface so far as well.

However, this is also the first run-through where we can see the cost of the API calls adding up, as so far everything else has been on a variety of free or pre-packaged / demo offerings…

Roo with Claude also seemed to opt for a web-based UI first, so perhaps there is something within Roo that is making this decision?

After 2:15 the first version of the web service was running, even if there are some oddities going on…

While trying to make it fix this odd text, I suddenly realized it was waiting to try to open a browser and check out the page itself! It ended up taking a screenshot and trying to fix the visible issue.

The web browser didn’t seem to be the fastest, and it also seemed to knock the API call price up quite a bit, though perhaps that’s due to the continually building context…

After 15:30 the first stack of tasks was done, and I had spent $1.28 according to Roo (slightly more was recorded in the Anthropic API dashboard).

By 20:00, API calls were costing me around $0.03 each, and I decided to try hitting the “condense context” button, which may have been a mistake, as that action in itself cost me $0.38… It also led to me hitting the Anthropic token rate limit for a number of minutes, where I just sat there twiddling my thumbs.

Overall this run cost me $4.21… Given I don’t normally pay per API call, this was a shocker for me, as really I feel like I haven’t gotten it to do anything even remotely hard. This certainly makes me shy away from the pay-per-API-call billing models for these assistants.

claude-sonnet-4

To date, I had never used Claude Code directly; the closest was at a hackathon earlier this year, watching over someone else’s shoulder. Although there is a VS Code extension, seemingly this just opens the terminal anyway, so this is all terminal based, though it is a very nice terminal interface.

Comparing with Copilot, Gemini and Amazon Q, it’s nice that Claude Code at least tells you how many tokens are flying around while you are waiting for your API calls to complete.

After 2:30 there was a working first CLI version listing the library contents, and at 9:30 the first round of tasks was all implemented in the CLI. At 17:40 it started making the web UI, and although it initially messed up its Go templating, that was quickly resolved.

I quite like the CLI-focused interface, and I know Copilot also has one, though I have never used it.

Overall this session cost $3.63, so slightly less than when used via Roo Code (though this session didn’t have any screenshots in it…).

claude-opus-4

Opus came out recently, and whereas Sonnet 4 has an input cost of $3 / MTok and an output cost of $15 / MTok, Opus costs around 5 times that, at an input cost of $15 / MTok and an output cost of $75 / MTok (not to mention the increased cost of prompt caching).
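To put those rates into perspective, here’s a tiny cost calculator using the per-MTok prices quoted above; the token counts in `main` are illustrative, not measured from my sessions:

```go
package main

import "fmt"

// tokenCost returns the USD cost of a call, given token counts and
// per-million-token (MTok) input/output rates.
func tokenCost(inTok, outTok int, inRate, outRate float64) float64 {
	return float64(inTok)/1e6*inRate + float64(outTok)/1e6*outRate
}

func main() {
	// Illustrative call size: 200k input tokens, 5k output tokens.
	in, out := 200_000, 5_000
	fmt.Printf("Sonnet 4: $%.2f\n", tokenCost(in, out, 3, 15))  // $3 / $15 per MTok
	fmt.Printf("Opus 4:   $%.2f\n", tokenCost(in, out, 15, 75)) // $15 / $75 per MTok
}
```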

All in all, I added more credit to the Anthropic account once again, and I still ran out before reaching the complete end of the planned prompts; however, I think the first half still gives a pretty good overview.

Generally speaking, Opus just didn’t really need me. It created a web app first, and after 3 minutes it was running and looked very nice.

After just over 10 minutes it had completed the first set of tasks, with little to no human interaction other than confirming actions.

This was also the point at which it gave me the warning about already having burnt $5, and I just started to watch the last few dollars drip out of my account (it felt more like a serious leak than a drip).

It didn’t get to the end of writing unit tests at 14:18 before I ran out of credit, and my balance actually went slightly negative!

Some sort of conclusion?

  • I dislike the pay-per-call / pay-per-token approach for coding assistants; most of them don’t give you enough control over what they are doing in their API calls to make efficient use of your money. Arguably this might be why this billing model seems to “work better” though, as the companies are no longer focused on optimizing the context and token count down?
  • Roo Code is friggin awesome, and the fact that it can integrate with the VS Code GitHub Copilot LLMs out of the box is very nice. So much configuration, so much visibility.
  • I definitely want to try some other LLMs, including “locally” hosted ones, at some point, likely via Ollama.
  • GPT-based LLMs make for ugly UIs out of the box

And in terms of the user experience specifically:

  • Multiple chat tabs are nice
  • Not just seeing a loading bar or spinner is nice
  • Not having my IDE jump around all over the place as actions are happening is nice
  • Having console runs and output embedded in the chat is sweet

And at a higher level:

  • I now have far too many VS Code assistant extensions installed
  • I want to try running them entirely within a code base within a dev container
  • If anyone wants to cover some of the assistant / LLM bill for these sorts of posts, perhaps you could buy me a coffee (or a beer)

