There has been some recent chat once again on the Wikibase Telegram groups around importing, and the best approach to importing a large amount of data into a Wikibase instance. Two years ago I started a small GitHub project aimed at profiling load speed using the action API with various settings, DB versions and so on, as well as trying out a bulk load API. I have just taken the opportunity to take another look at it and visualize some of the comparisons, given the changes over the last two years.
In case you don’t want to read and follow everything below, the key takeaways are:
- EPS (edits per second) of around 150 is achievable on a single laptop
- When testing imports, you really need to test at least 50k items to get meaningful figures
- The 2 ID generation related settings are VERY IMPORTANT if you want to maximise import speed
- Make async requests, but not too many, likely tuned to the number of CPUs you have serving web requests. You want near 100% utilization (see the sketch after this list)
- A batch API, such as FrozenMink/batchingestionextension, would dramatically reduce import times
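To illustrate the async point, here is a minimal sketch of concurrency-limited item creation against the action API, using Python and aiohttp. The endpoint URL, the `CONCURRENCY` value and the token handling are assumptions for the example; in practice you need a logged-in session and a valid CSRF token before `wbeditentity` will accept edits.

```python
# Minimal sketch: create items concurrently via the action API, with a cap on
# in-flight requests so the web servers stay near (but not over) saturation.
import asyncio
import json
import aiohttp

WIKIBASE_API = "https://my-wikibase.example/w/api.php"  # assumption: your API endpoint
CONCURRENCY = 8  # assumption: roughly the number of CPUs serving web requests

async def create_item(session, semaphore, csrf_token, item_data):
    """POST a single wbeditentity request, bounded by the semaphore."""
    async with semaphore:
        async with session.post(WIKIBASE_API, data={
            "action": "wbeditentity",
            "new": "item",
            "data": json.dumps(item_data),
            "token": csrf_token,  # CSRF token from a logged-in session (not shown)
            "format": "json",
            "bot": "1",
        }) as resp:
            return await resp.json()

async def import_items(items, csrf_token):
    """Fire off all item creations, limited to CONCURRENCY at a time."""
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        tasks = [create_item(session, semaphore, csrf_token, item) for item in items]
        return await asyncio.gather(*tasks)

# Example usage (login and token acquisition omitted for brevity):
# results = asyncio.run(import_items(list_of_item_json, my_csrf_token))
```

The semaphore is the important part: it keeps enough requests in flight to drive CPU utilization up without queueing so many that response times blow out.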
As some napkin-math benchmarks for smallish items, I would hope for:
- 1 million items, 2 hours (validated)
- 10 million items, 1 day
- Wikidata scale (116 million items), 14+ days
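
For what it's worth, here is how those figures roughly fall out of a sustained ~150 edits per second. The longer estimates above leave some headroom, presumably for slowdown as the database grows and for larger items; the rate itself is an assumption carried over from the takeaways.

```python
# Napkin math: time to import N items at a sustained edit rate.
EDITS_PER_SECOND = 150  # assumption: the ~150 EPS figure from above

def import_duration_hours(item_count, eps=EDITS_PER_SECOND):
    """Hours needed to create item_count items at eps edits per second."""
    return item_count / eps / 3600

for label, count in [
    ("1 million", 1_000_000),
    ("10 million", 10_000_000),
    ("Wikidata (~116 million)", 116_000_000),
]:
    print(f"{label}: {import_duration_hours(count):.1f} hours")

# 1 million: ~1.9 hours; 10 million: ~18.5 hours; 116 million: ~215 hours (~9 days)
```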
