Profiling Wikibase APIs and import speed

There has been some recent chat, once again, in the Wikibase Telegram groups about importing, and the best approach for loading a large amount of data into a Wikibase instance. Two years ago I started a small GitHub project aimed at profiling load speed using the action API across various settings, DB versions, and so on, as well as trying out a bulk-load API. I have just taken the opportunity to look at it again and to visualize some of the comparisons, given the changes over the last two years.

In case you don’t want to read and follow everything below, the key takeaways are:

  • EPS (edits per second) of around 150 is achievable on a single laptop
  • When testing imports, you really need to test at least 50k items to get representative figures
  • The 2 ID-generation-related settings are VERY IMPORTANT if you want to minimise import times
  • Make async requests, but not too many, likely tuned to the number of CPUs you have serving web requests. You want near 100% utilization
  • A batch API, such as FrozenMink/batchingestionextension, would dramatically reduce import times
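As a minimal sketch of the async-with-bounded-concurrency point above (the `send_edit` coroutine is a placeholder for a real `wbeditentity` POST, not actual Wikibase client code, and `CONCURRENCY` is an assumption you would tune to your web-serving CPUs):

```python
import asyncio
import os

# Bound in-flight edit requests to roughly the number of CPUs serving
# web requests, rather than firing everything at once.
CONCURRENCY = os.cpu_count() or 4

async def send_edit(item: dict) -> str:
    # Placeholder for an actual POST to /w/api.php?action=wbeditentity.
    await asyncio.sleep(0.01)  # simulate network/server latency
    return f"created {item['label']}"

async def import_items(items: list) -> list:
    sem = asyncio.Semaphore(CONCURRENCY)

    async def bounded(item: dict) -> str:
        async with sem:  # at most CONCURRENCY edits in flight
            return await send_edit(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(i) for i in items))

results = asyncio.run(import_items([{"label": f"Q{i}"} for i in range(20)]))
```

The semaphore is what keeps the event loop from flooding the server: requests beyond the limit simply wait, which tends to keep CPU utilization high without queueing storms.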

Some napkin-math estimates for smallish items, I would hope:

  • 1 million items: ~2 hours (validated)
  • 10 million items: ~1 day
  • Wikidata scale (116 million items): 14+ days
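The napkin math follows directly from a sustained edits-per-second rate; a quick sketch (the 150 EPS figure comes from the laptop numbers above, while the more conservative 100 EPS for a Wikidata-sized run is an assumption, since long imports rarely hold peak rate):

```python
# Napkin math: hours to import a given number of items at a sustained EPS rate.
def import_hours(items: int, eps: float) -> float:
    return items / eps / 3600

print(round(import_hours(1_000_000, 150), 1))         # about 1.9 hours
print(round(import_hours(10_000_000, 150), 1))        # about 18.5 hours
# At a more conservative sustained ~100 EPS for a Wikidata-sized load:
print(round(import_hours(116_000_000, 100) / 24, 1))  # about 13.4 days
```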


The day Google (almost) lost my timeline data…

On the morning of 22nd March 2025 I received and read an email from Google giving me an “update” on my “Google Maps Timeline”. Little did I know it was actually telling me they had just lost some of my data…

The email read…

We briefly experienced a technical issue that caused the deletion of Timeline data for some people. We’re reaching out as your account may have been impacted.
If you have encrypted backups enabled, you may be able to restore your data. Make sure that you have the latest version of Google Maps, then go to your Timeline. Tap the cloud icon near the top of your screen, and choose a backup to import your data. If you did not have backups turned on, unfortunately you will not be able to recover lost data.
We understand that this can be frustrating if you use Timeline to remember places that you’ve visited, and we are taking steps to improve our systems for the future.

I have heard of Google losing data before (Drive files or photos disappearing and such), or making it inaccessible for people. So far I have been glad not to be affected, and I had never really dived into these cases to see what had happened.

However, it was easy to see in a matter of minutes that ~10 years of location data was indeed gone from my phone, with data only showing from the 6th or 7th of March.


Splitting a Terraform / Spacelift stack in 2

A year or so ago, I imported a bunch of existing AWS resources into a Spacelift stack using Terraform. Part of this stack involved provisioning GitHub Actions secrets from AWS into GitHub itself. Due to the way the GitHub provider and GitHub API work, I was starting to hit rate limits because of my ever-increasing number of secrets.

Rather than do anything fancy with additional authentication against the GitHub API, higher limits, or refactoring within the stack, I opted to split the stack into more manageable and focused stacks, something I had already started with my latest deployment, which had a stack all to itself.

Unfortunately, there is no “super easy” way to do this. I was dreaming of clicking a button and being able to drag and drop configuration and/or state between the various stacks. Instead, I had to code up some simple scripts to help me migrate the state locally.
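The sort of local state migration involved might look something like this sketch, using Terraform's `state pull`, `state mv`, and `state push` commands against locally-pulled state files (the `github_actions_secret.example` address is a hypothetical placeholder, not one of my actual resources):

```shell
# In the source stack's working directory: pull its state to a local file.
terraform state pull > source.tfstate

# In the target stack's working directory: pull its state too.
terraform state pull > target.tfstate

# Move a resource between the two local state files.
terraform state mv \
  -state=source.tfstate \
  -state-out=target.tfstate \
  'github_actions_secret.example' \
  'github_actions_secret.example'

# Push each modified state file back to its respective stack.
terraform state push source.tfstate   # from the source stack's directory
terraform state push target.tfstate   # from the target stack's directory
```

Moving state this way avoids destroying and re-creating the resources; the target stack's Terraform configuration still needs matching resource blocks before its next plan.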

High level process

First:
