Back in July 2025 I did a little comparison of various AI code assistants with a small Golang kata (with some content stripped out), and I’m back for a second attempt using the same kata, but focusing on some of the newer Copilot models, as well as cloud agents, and a run through with Google Antigravity. All runs have been screen recorded, very generic time metrics extracted, and the code is also all up in branches in the code repo if you are curious…
The prompts used will also be the same as in my last blog post, starting with…
The code repository is going to be a small Library application.
There are CSV files in the resources directory that contain the content of the library.
Create a user interface that allows display of all books and magazines with details
And continuing to guide the agent through adding a couple of features such as searches, ordering, writing some basic tests, allowing adding data, and having a companion application (either CLI or UI, depending on what it chose to do first). You can expand the section below to see them all…
Great! Now I want to add to the interface that you created to allow searching for a book or magazine by its ISBN.
I still want to be able to easily list all books and magazines without any filters / searches too.
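For the ISBN search, one natural approach in Go is to index the parsed records in a map. This is only a sketch: the `Book` type and its fields are my own assumption, not the kata's actual schema.

```go
package main

import "fmt"

// Book is a hypothetical minimal record type; the kata's real CSV
// columns may well differ.
type Book struct {
	Title string
	ISBN  string
}

// indexByISBN builds a map keyed on ISBN so repeated lookups are O(1)
// instead of scanning the slice every time.
func indexByISBN(books []Book) map[string]Book {
	idx := make(map[string]Book, len(books))
	for _, b := range books {
		idx[b.ISBN] = b
	}
	return idx
}

func main() {
	idx := indexByISBN([]Book{
		{Title: "Alpha Guide", ISBN: "1111"},
		{Title: "Zebra Tales", ISBN: "2222"},
	})
	if b, ok := idx["2222"]; ok {
		fmt.Println("found:", b.Title)
	}
}
```

The same pattern works for the author-email search in the next prompt, just keyed on email instead of ISBN.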
Add an additional way to find books and magazines, by using their authors email address.
Add the ability to sort all books and magazines, with all of their details displayed, sorted by title.
The sort should be done for magazines and books in combination.
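Sorting books and magazines "in combination" is the one prompt with a small design wrinkle, since Go has no inheritance: a common approach is to merge both kinds into one slice of a shared shape and sort that. A minimal sketch, where the types are my assumptions rather than the kata's real ones:

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical minimal types; the kata's real CSV schema may differ.
type Book struct {
	Title string
}

type Magazine struct {
	Title string
}

// libraryItem is a shared shape that lets books and magazines be
// sorted together in one slice.
type libraryItem struct {
	Kind  string
	Title string
}

// sortedByTitle merges books and magazines, then sorts the combined
// slice by title.
func sortedByTitle(books []Book, mags []Magazine) []libraryItem {
	items := make([]libraryItem, 0, len(books)+len(mags))
	for _, b := range books {
		items = append(items, libraryItem{"book", b.Title})
	}
	for _, m := range mags {
		items = append(items, libraryItem{"magazine", m.Title})
	}
	sort.Slice(items, func(i, j int) bool {
		return items[i].Title < items[j].Title
	})
	return items
}

func main() {
	items := sortedByTitle(
		[]Book{{Title: "Zebra Tales"}, {Title: "Alpha Guide"}},
		[]Magazine{{Title: "Monthly Gopher"}},
	)
	for _, it := range items {
		fmt.Println(it.Kind, "-", it.Title)
	}
}
```

Supporting the later "choose the sort direction" hint is then just a matter of flipping the comparison in the `sort.Slice` closure.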
And as a follow-up …
Write unit tests for the project where appropriate.
If the code is not very unit testable, refactor it, maintaining functionality.
Add one final feature that allows a user to add books, magazine and or authors to the library.
This should update the CSV files to show the data for future library lookups.
Don't let them just add to each file separately (such as author first), create a sensible form for adding things in a single action.
And for the second application…
If you didn't yet create an interactive WEB UI, do so now.
If you didn't yet create an interactive CLI UI, do so now.
This can be very simple without much or any styling, but must be functional for the required features:
- read data from CSV files
- display all books and magazines with details
- search by ISBN or author email
- print them sorted by title
- add things to the library (persisting to the CSVs)
You might also have seen me occasionally send this hint to try harder…
Allow the user to choose the sort direction

Before diving into the details: overall, the GPT 5.2 Codex model within Copilot seems to be my favourite, though to be honest, if you can structure your prompts and repositories to work well within the GitHub Copilot cloud agents setup, that route looks very appealing.
The numbers
These numbers all come with a pinch of salt, and you need to read the blurbs next to each video below to learn about that salt. Each number roughly summarizes how long that stage took according to the video replay. Some setups do more validation which takes more time, some just hope they got it right, others get stuck in the same problems over and over again, and some one shot it…
| IDE & Assistant | Model | Main tasks | Tests | Editing | Second app | Total |
| --- | --- | --- | --- | --- | --- | --- |
| VS Code & GitHub Copilot | Grok Code Fast 1 | 6:34 | 1:49 | 1:53 | CLI 1:20 | ~ 12 minutes |
| VS Code & GitHub Copilot | Claude Haiku 4.5 | 11:57 | 3:20 | 7:27 | CLI 3:41 | ~27 minutes |
| VS Code & GitHub Copilot | Claude Opus 4.5 | 2:45 | 4:04 | 1:19 | CLI 7:21 | ~16 minutes |
| VS Code & GitHub Copilot | GPT 5.2 Codex | 5:49 | 1:21 | 2:46 | CLI 0:55 | ~ 10 minutes |
| Google Antigravity | Gemini 3 Pro | 19:15 | 4:15 | 4:56 | CLI 1:56 | ~ 32 minutes |
It turns out that figuring out the timings for the individual steps of the cloud agents was rather hard, so I'll only note their complete times…
| IDE & Assistant | Model | Total |
| --- | --- | --- |
| Cloud – GitHub Copilot | Auto? | 17 minutes |
| Cloud – Google Jules | Gemini 3 Flash | 31 minutes |
The shortest overall time from my last post was around 18 minutes, which was Roocode in VS Code with GPT 4.1, so seeing GPT 5.2 Codex in Copilot do it all in around half that time, and come up with a "better application", hints at great improvement over the last 6 months.
VS Code & GitHub Copilot – Grok Code Fast 1
Grok Code Fast 1 was my first run through for this post, and the main things I noticed along the way were…
- Mainly unstyled web application (fine, but arguably less “complete”)
- Kept tripping over itself with terminals and existing running servers and port conflicts, never cleaning up after itself
- Everything is in a single main.go file, including all the HTML etc embedded within
- For the web UI, almost everything is on a single page
To be honest, this all makes sense, as it is one of the lighter models, probably primarily meant for auto-completion and smaller task completion, where it has a small, accurate context or doesn't need to invent too many structures itself.
It completed everything in one of the fastest times, at around 12 minutes; however, it's easy to argue that other setups that only took slightly longer created something much better, and got there in a less messy way.
VS Code & GitHub Copilot – Claude Haiku 4.5
Currently one of my favourite cheaper models for use within GitHub Copilot, as it's billed at 0.33x request usage and seems to perform comparably to the Sonnet models on appropriately sized tasks.
During this session Haiku decided to use the playwright MCP to try and do some validation against the application it was making, which was a neat idea, but slowed it down when directly comparing against some of the other runs.
It seemed to make quite a few fundamental Golang mistakes, which was frustrating to watch.
I don’t know if this Claude colour palette is coded into the initial Claude prompts, but it does seem to appear in many of my Claude-related tests.
The screenshots that were taken and left in the chat context seemingly broke the underlying API, leading to a 413 Request Entity Too Large error, which led to me ditching the context and starting the next task fresh.
This is something I do sometimes see within GitHub Copilot on other models too, and I’m unsure if it is a Copilot issue, or the LLM APIs below.
VS Code & GitHub Copilot – Claude Opus 4.5
In my previous test, I tried to use Claude Opus 4 via Claude Code, however I quickly ran out of credit on my account even for this simple kata, and didn’t finish getting through the prompts… Using these models through GitHub Copilot always seems far more efficient in terms of spend, as all billing is done in terms of the number of requests you feed into the system (roughly comparable to the number of user prompts), rather than actual LLM API requests or token usage. As a result, a single GitHub Copilot request can do work that might already have cost you multiple dollars using an API token directly.
If you use Claude Code directly, that’s probably why you want to have a look at the plans, rather than just use an API key like I did last time…
Going into this, Opus seemed like it might be one of the more OP models, and it did complete some stages rather quickly; however, overall it fell behind, and with its current 3x multiplier in the GitHub Copilot rates, I might opt for alternative models (such as GPT Codex) right now.
Opus ran into a context window limit, and it had to “summarize” the conversation, which took around 40s, and was the only model I saw do this through this experiment.
According to what VS Code currently tells me, it is limited to a 200k context window, 128k prompt and 16k output. GPT 5.2 has a 264k window, and many GPT 5 and 5 Codex models 400k. So this is also likely a big differentiating factor depending on the setup you use…
VS Code & GitHub Copilot – GPT 5.2 Codex
This was one of my first times trying out the GPT 5.2 Codex model with GitHub Copilot. Over the past month or so, many GPT and Codex model variants have appeared in the list of options.
- GPT 5
- GPT 5 Codex (Preview)
- GPT 5.1
- GPT 5.1 Codex
- GPT 5.1 Codex MAX
- GPT 5.1 Codex Mini (Preview)
- GPT 5.2
- GPT 5.2 Codex
Overall, within VS Code, the 5.2 Codex run-through of this test impressed me the most. Maybe if Opus hadn’t hit its context window limit it would have come close, but the 3x cost via Copilot again turns me away.
The file structure left something to be desired, but the application functioned well, it was styled, and it was pretty damn fast at getting there.
Google Antigravity – Gemini 3 Pro
This was only my second time having a look at Google Antigravity, as you can see by my stumbling around mid-recording trying to find the built-in browser…
On the whole, it was a rather different experience to Copilot, perhaps in part because I was using the planning mode rather than the “Fast” agent mode, which I likely should have been using for a more accurate comparison. The planning mode leads to it spending more time thinking and writing planning documents rather than jumping straight to the code, and these plans also require additional input from the user. This is more similar to the patterns I have seen with Google Jules before, and perhaps some of that learning is being folded into Antigravity.
The application worked in the end; the styling left a little to be desired, but it was there. It got a little tangled in how many web servers it was running at one point, and ports got confused, but it got there.
The main reason I haven’t tried it out more to date is that on my first attempt on release, the WSL integration was not working. (EDIT: turns out WSL integration does work, it’s just a bit confusing…)
Considering I don’t have a paid AI plan with Google, I was quite happy to be able to play around with Antigravity without needing to jump through any paid hoops. I’m sure some folks will get great utility out of this.
GitHub Copilot Cloud
No video for this one, but you can have a look at the pull request; apparently there is no way to share the actual agent session, even for a public repo?
I kicked off the agent at 19:29:46 with it wrapping up the commits at 19:46, so roughly 17 minutes total execution in the cloud.
The pull request came with documentation around how to use the thing, screenshots of what it looks like, and I can imagine this would be very neat and easy to iterate on with a repository that is set up for pull request previews etc.
Checking it out locally, everything seemed to work well, and skimming the agent logs it didn’t seem to horribly break anything along the way or get stuck.
File structure wise, it now has a few Go files, plus HTML templates (not embedded in code, woo!)
Google Jules (Cloud)
With a self-claimed time of 31 minutes, Jules gives you less to look at as part of the pull request it creates (no screenshots and not much documentation). Also, despite completing the tasks in multiple distinct steps, and despite my request for commits along the way, it just made a single big commit at the end…
You can also see the pull request for this, but again there is no way to share the agent logs from while it was working…
Overall
Things keep improving: the options are diversifying, it’s getting easier to fold your own models into these setups, and they are becoming more capable in terms of the tools they can use. From a user perspective, the main red flags remain (as they were before):
- Getting caught in loops, or making the same mistake over and over again
- Making fundamental language mistakes
- Always trying to put everything in a single file…
- Hitting context windows, and needing to “summarize”
The cloud agents seem super promising in the right environment.
Appendix: GitHub Copilot current models & context windows
I couldn’t make VS Code give me the table that I see in https://github.com/microsoft/vscode/issues/248860, however, here is a formatted table from the raw API calls.
| Model Name | Vendor | Context Window | Max Output | Category | Premium |
| --- | --- | --- | --- | --- | --- |
| GPT-5.2-Codex | OpenAI | 400,000 | 128,000 | Powerful | Yes (1x) |
| GPT-5.2 | OpenAI | 264,000 | 64,000 | Versatile | Yes (1x) |
| GPT-5.1-Codex-Max | OpenAI | 400,000 | 128,000 | Powerful | Yes (1x) |
| GPT-5.1-Codex-Mini | OpenAI | 400,000 | 128,000 | Powerful | Yes (0.33x) |
| GPT-5.1-Codex | OpenAI | 400,000 | 128,000 | Powerful | Yes (1x) |
| GPT-5.1 | OpenAI | 264,000 | 64,000 | Versatile | Yes (1x) |
| GPT-5-Codex (Preview) | OpenAI | 400,000 | 128,000 | Powerful | Yes (1x) |
| GPT-5 | Azure OpenAI | 400,000 | 128,000 | Versatile | Yes (1x) |
| GPT-5 mini | Azure OpenAI | 264,000 | 64,000 | Lightweight | No |
| GPT-4.1 (Default) | Azure OpenAI | 128,000 | 16,384 | Versatile | No |
| GPT-4o (2024-11-20) | Azure OpenAI | 128,000 | 16,384 | Versatile | No |
| Claude Opus 4.5 | Anthropic | 200,000 | 16,000 | Powerful | Yes (3x) |
| Claude Sonnet 4.5 | Anthropic | 200,000 | 16,000 | Versatile | Yes (1x) |
| Claude Sonnet 4 | Anthropic | 216,000 | 16,000 | Versatile | Yes (1x) |
| Claude Haiku 4.5 | Anthropic | 200,000 | 16,000 | Versatile | Yes (0.33x) |
| Gemini 3 Pro (PV) | Google | 128,000 | 64,000 | Powerful | Yes (1x) |
| Gemini 3 Flash (PV) | Google | 128,000 | 64,000 | Lightweight | Yes (0.33x) |
| Gemini 2.5 Pro | Google | 128,000 | 64,000 | Powerful | Yes (1x) |
| Grok Code Fast 1 | xAI | 128,000 | 64,000 | Lightweight | No (Promo) |
Appendix: Screenshots



