Toward the end of 2020 I spent some time blackbox testing data load times for WDQS and Blazegraph to try and find out which possible setting tweaks might make things faster.
I didn’t come to any major conclusions as part of this effort but will write up the approach and data nonetheless incase it is useful for others.
I expect the next step toward trying to make this go faster would be via some whitebox testing, consulting with some of the original developers or with people that have taken a deep dive into the code (which I started but didn’t complete).
The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF dump that comes directly from Wikidata.
This post walks through using the new Hadoop based munge step with the latest Wikidata TTL dump on Google cloudsDataproc service. This cuts the munge time down from 1-2 days to just 2 hours using an 8 worker cluster. Even faster times can be expected with more workers, all the way down to ~20 minutes.
Many users of Wikibase find themselves in a position where they need to change the concept URI of an existing Wikibase for one or more reasons, such as a domain name update or desire to have https concept URIs instead of HTTP.
Below I walk through a minimal example of how this can be done using a small amount of data and the Wikibase Docker images. If you are not using the Docker images the steps should still work, but you do not need to worry about copying files into and out of containers or running commands inside containers.
This is a belated post about the Wikibase docker images that I recently created for the Wikidata 5th birthday. You can find the various images on docker hub and matching Dockerfiles on github. These images combined allow you to quickly create docker containers for Wikibase backed by MySQL and with a SPARQL query service running alongside updating live from the Wikibase install.