Golang AI kata comparison

This entry is part 2 of 2 in the series Golang AI kata comparison

Back in July 2025 I did a little comparison of various AI code assistants with a small Golang kata (with some content stripped out), and I’m back for a second attempt using the same kata, but focusing on some of the newer Copilot models, as well as cloud agents, and a run through with Google Antigravity. All runs have been screen recorded, very generic time metrics extracted, and the code is also all up in branches in the code repo if you are curious…

The prompts used will also be the same as in my last blog post, starting with…

The code repository is going to be a small Library application.
There are CSV files in the resources directory that contain the content of the library.
Create a user interface that allows display of all books and magazines with details

I then continue to guide the agent through adding a couple of features such as searches, ordering, writing some basic tests, allowing adding data and having a companion application (either CLI or UI, depending on what it chose to do first). You can expand the section below to see them all…

Great! Now I want to add to the interface that you created to allow searching for a book or magazine by its ISBN.
I still want to be able to easily list all books and magazines without any filters / searches too.

Add an additional way to find books and magazines, by using their authors email address.

Add the ability to sort all books and magazines, with all of their details displayed, sorted by title.
The sort should be done for magazines and books in combination.

And as a follow-up …

Write unit tests for the project where appropriate.
If the code is not very unit testable, refactor it, maintaining functionality.

Add one final feature that allows a user to add books, magazine and or authors to the library.
This should update the CSV files to show the data for future library lookups.
Don't let them just add to each file separately (such as author first), create a sensible form for adding things in a single action.

And for the second application…

If you didn't yet create an interactive WEB UI, do so now.
If you didn't yet create an interactive CLI UI, do so now.
This can be very simple without much or any styling, but must be functional for the required features:
 - read data from CSV files
 - display all books and magazines with details
 - search by ISBN or author email
 - print them sorted by title
 - add things to the library (persisting to the CSVs)
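
To give a feel for the size of the task behind these prompts, here is a minimal sketch in Go of the read and search features. It is not taken from any of the recorded runs. The semicolon-separated `title;isbn;authors` layout, the `Item` type and the helper names are all assumptions made for illustration; the real CSV files in the repository may be laid out differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Item is a hypothetical combined record for a book or a magazine.
type Item struct {
	Kind    string   // "book" or "magazine"
	Title   string
	ISBN    string
	Authors []string // author email addresses
}

// parseItems reads rows of the assumed form title;isbn;authors
// and skips the header row. Author emails are comma-separated.
func parseItems(kind, data string) ([]Item, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = ';' // assumption: semicolon-delimited files
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	var items []Item
	for i, row := range rows {
		if i == 0 || len(row) < 3 {
			continue // header or malformed row
		}
		items = append(items, Item{
			Kind:    kind,
			Title:   row[0],
			ISBN:    row[1],
			Authors: strings.Split(row[2], ","),
		})
	}
	return items, nil
}

// findByISBN returns the first item with the given ISBN, if any.
func findByISBN(items []Item, isbn string) (Item, bool) {
	for _, it := range items {
		if it.ISBN == isbn {
			return it, true
		}
	}
	return Item{}, false
}

// findByAuthor returns all items that list the given author email.
func findByAuthor(items []Item, email string) []Item {
	var out []Item
	for _, it := range items {
		for _, a := range it.Authors {
			if strings.EqualFold(strings.TrimSpace(a), email) {
				out = append(out, it)
				break
			}
		}
	}
	return out
}

func main() {
	// Made-up sample data, not the content of the real library files.
	const books = "title;isbn;authors\n" +
		"Go Basics;111-1;a@example.org,b@example.org\n" +
		"CSV Tricks;222-2;b@example.org\n"
	items, err := parseItems("book", books)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if it, ok := findByISBN(items, "222-2"); ok {
		fmt.Println(it.Kind, it.Title)
	}
	fmt.Println(len(findByAuthor(items, "b@example.org")), "items by b@example.org")
}
```

Keeping the parsing and lookups in plain functions like this is also what makes the later "write unit tests" prompt easier for an agent to satisfy.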

You might also have seen me occasionally send this hint to try harder…

Allow the user to choose the sort direction
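
For reference, the combined sort with a selectable direction that this hint asks for can be sketched in a few lines of Go. This is my own illustration rather than output from any of the models, and the `Entry` type and `sortByTitle` name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Entry is a hypothetical minimal record; books and magazines share it
// so that they can be sorted in combination.
type Entry struct {
	Kind  string // "book" or "magazine"
	Title string
}

// sortByTitle returns a sorted copy of entries, ordered by title
// (case-insensitive), ascending unless descending is true.
func sortByTitle(entries []Entry, descending bool) []Entry {
	sorted := make([]Entry, len(entries))
	copy(sorted, entries)
	sort.SliceStable(sorted, func(i, j int) bool {
		a := strings.ToLower(sorted[i].Title)
		b := strings.ToLower(sorted[j].Title)
		if descending {
			return a > b
		}
		return a < b
	})
	return sorted
}

func main() {
	// Made-up sample data mixing both kinds of item.
	library := []Entry{
		{Kind: "magazine", Title: "Weekly Gopher"},
		{Kind: "book", Title: "csv tricks"},
		{Kind: "book", Title: "Go Basics"},
	}
	for _, e := range sortByTitle(library, false) {
		fmt.Println(e.Title, "("+e.Kind+")")
	}
}
```

Sorting a copy keeps the original load order intact, so the unfiltered "list everything" view does not change as a side effect of sorting.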

Before diving into the details: overall, the GPT 5.2 Codex model as part of Copilot seems to be my favourite, though to be honest, if you can structure your prompts and repositories to work well within the GitHub Copilot cloud agents setup, that looks very appealing too.

The numbers

These numbers all come with a pinch of salt, and you need to read the blurbs next to each video below to learn about that salt. Each number roughly summarizes how long that stage took according to the video replay. Some setups do more validation, which takes more time, some just hope they got it right, others get stuck on the same problems over and over again, and some one-shot it…

IDE & Assistant	Model	Main tasks	Tests	Editing	Second app	Total
VS Code & GitHub Copilot	Grok Code Fast 1	6:34	1:49	1:53	CLI 1:20	~ 12 minutes
VS Code & GitHub Copilot	Claude Haiku 4.5	11:57	3:20	7:27	CLI 3:41	~ 27 minutes
VS Code & GitHub Copilot	Claude Opus 4.5	2:45	4:04	1:19	CLI 7:21	~ 16 minutes
VS Code & GitHub Copilot	GPT 5.2 Codex	5:49	1:21	2:46	CLI 0:55	~ 10 minutes
Google Antigravity	Gemini 3 Pro	19:15	4:15	4:56	CLI 1:56	~ 32 minutes
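
If you want to sanity check a row yourself, the stage times can be added up with a small Go helper like the sketch below. The helper names are my own, and the totals in the table are rounded from the videos, so a stage sum will not always match the total column exactly. For the Grok Code Fast 1 row the stages add up to 11:36, which lines up with the ~ 12 minutes shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStage converts an "m:ss" string such as "6:34" into seconds.
func parseStage(s string) (int, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return 0, fmt.Errorf("expected m:ss, got %q", s)
	}
	minutes, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, err
	}
	seconds, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, err
	}
	return minutes*60 + seconds, nil
}

// totalSeconds sums any number of "m:ss" stage times.
func totalSeconds(stages ...string) (int, error) {
	total := 0
	for _, s := range stages {
		secs, err := parseStage(s)
		if err != nil {
			return 0, err
		}
		total += secs
	}
	return total, nil
}

// formatMinSec renders a number of seconds as "m:ss".
func formatMinSec(secs int) string {
	return fmt.Sprintf("%d:%02d", secs/60, secs%60)
}

func main() {
	// Stage times from the Grok Code Fast 1 row of the table.
	total, err := totalSeconds("6:34", "1:49", "1:53", "1:20")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(formatMinSec(total)) // prints 11:36
}
```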

It turns out that figuring out the timings of the individual steps for the cloud agents was rather hard, so I’ll only note their complete times…

IDE & Assistant	Model	Total
Cloud – GitHub Copilot	Auto?	17 minutes
Cloud – Google Jules	Gemini 3 Flash	31 minutes

The shortest overall time from my last post was around 18 minutes, which was Roocode in VS Code with GPT 4.1, so seeing that GPT 5.2 Codex in Copilot did it all in around half the time and came up with a “better application” hints at a great improvement in the last 6 months.

Assistant, LLM combinations

I have set up this post, and the code problem, in such a way that I should be able to easily add more combinations and comparisons in the future, and directly compare the performance back to this post. Ideally, at some stage I’d try some other models via Ollama, and also some other pay-per-request LLM APIs…

Assistant	Model	Main tasks @	Tests @	Second app @
GitHub Copilot	GPT 4o	~ 5:00	~ 24:45	~ 32
GitHub Copilot	GPT 4.1	~ 15:00	~ 17:40	~ 35
GitHub Copilot	Claude Sonnet 4	~ 17:00 (inc tests)	~ 17:00	~ 28
Gemini Code Assistant	Gemini Something ?	~ 11:20	~ 14:30	~ 25
AmazonQ	Claude Sonnet 4	~ 7:20	~ 15:50	~ 28
Roocode	GPT 4.1 (via GitHub Copilot)	~ 5:30	~ 10:00	~ 18
Roocode	Claude Sonnet 4 (via Anthropic)	~ 15:30	~ 20:00	~ 37
Claude Code	Claude Sonnet 4	~ 9:30	~ 17:40	~ 24
Claude Code	Claude Opus 4	~ 10:00	N/A	N/A
