Many users of Wikibase find themselves needing to change the concept URI of an existing Wikibase for one or more reasons, such as a domain name change or a desire to have HTTPS concept URIs instead of HTTP.

Below I walk through a minimal example of how this can be done using a small amount of data and the Wikibase Docker images. If you are not using the Docker images the steps should still work; you just do not need to worry about copying files into and out of containers or running commands inside them.

Creating some test data

First I need some test data, and for that data to exist in both Wikibase and the query service. I'll go ahead with one property and one item, with some labels, descriptions and a statement.

Once the updater has run (by default it sleeps for 10 seconds between checks for changes), the triples can be seen in Blazegraph using the following SPARQL query.
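Assuming the first item created was Q1, a query like this returns every triple about it:

```sparql
SELECT * WHERE { <http://wikibase.svc/entity/Q1> ?b ?c . }
```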

Query results showing the concept URI as http://wikibase.svc/entity

The concept URI is clearly visible in the triples as the default ‘wikibase.svc’ provided by the docker-compose example for wikibase.

Running a new query service

You could choose to load the triples with the new concept URI into the same query service and namespace. However, to simplify things (specifically, the cleanup of old triples), a clean and empty query service is a better choice.

In my docker-compose file, I will specify a new wdqs service with a new name and altered set of environment variables, with the WIKIBASE_HOST environment variable changed to the new URI.
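As a sketch, modelled on the wdqs service from the wikibase-docker example (the service name wdqs-new, the alias wdqs-new.svc and the volume name are my own choices; your image tag and ports may differ):

```yaml
  wdqs-new:
    image: wikibase/wdqs:0.3.2
    restart: unless-stopped
    command: /runBlazegraph.sh
    volumes:
      - query-service-data-new:/wdqs/data
    networks:
      default:
        aliases:
          - wdqs-new.svc
    environment:
      - WIKIBASE_HOST=somFancyNewLocation.foo
      - WDQS_HOST=wdqs-new.svc
      - WDQS_PORT=9999
    expose:
      - 9999
```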

This query service makes use of a new docker volume that I also need to define in my docker-compose.
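At the top level of the docker-compose file, alongside any existing volume definitions (the name must match whatever the new service mounts):

```yaml
volumes:
  query-service-data-new:
```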

As this URI is actually fake, and in order to keep my updater requests within the local network, I also need to add a new network alias to the existing wikibase service. After doing so, my wikibase network section will look like this.
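Something like the following, assuming the service already had the wikibase.svc alias from the docker-compose example:

```yaml
    networks:
      default:
        aliases:
          - wikibase.svc
          - somFancyNewLocation.foo
```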

To apply the changes I'll restart the wikibase service and start the new query service using the following commands.
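With docker-compose, `up -d` recreates services whose definitions have changed and starts new ones; the service names here match the compose file sketched above:

```shell
docker-compose up -d wikibase wdqs-new
```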

Now two Blazegraph query services will be running, both controlled by docker-compose.

The published endpoint, via the wdqs-proxy, is still pointing at the old wdqs service, as is the updater that is currently running.

Dumping RDF from Wikibase

The dumpRdf.php maintenance script in the Wikibase repo extension allows all items and properties to be dumped as RDF for use in external services, such as the query service.

The default concept URI for Wikibase is determined from the wgServer MediaWiki global setting [code]. Before MediaWiki 1.34 wgServer was auto-detected [docs] in PHP.

Thus, when running a maintenance script on those older versions, wgServer is unknown and will default to the hostname the wikibase container can see, for example “b3a2e9156cc1”.

In order to avoid dumping data with this garbage concept URI one of the following must be done:

  • The Wikibase repo conceptBaseUri setting must be set (to the new concept URI)
  • The MediaWiki wgServer setting must be set (to the new concept URI)
  • --server <newConceptUriServerBase> must be passed to the dumpRdf.php script

So, in order to generate a new RDF dump with the new concept URI and store the RDF in a file, run the following command.
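A sketch using the --server option; the extension path is where the Wikibase docker image keeps the code, and /tmp/rdfOutput is simply my choice of output location:

```shell
docker-compose exec wikibase php ./extensions/Wikibase/repo/maintenance/dumpRdf.php \
    --server http://somFancyNewLocation.foo \
    --output /tmp/rdfOutput
```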

The generated file can then be copied from the wikibase container to the local filesystem using the docker cp command and the name of the wikibase container for your setup, which you can find using docker ps.
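For example (the container name placeholder below needs replacing with whatever docker ps reports for your setup):

```shell
docker ps                     # find the name of the wikibase container
docker cp <wikibase-container>:/tmp/rdfOutput ./rdfOutput
```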

Munging the dump

In order to munge the dump, first I’ll copy it into the new wdqs service with the following command.
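Again using docker cp, this time in the other direction (the container name placeholder is whatever docker ps reports for the new query service):

```shell
docker cp ./rdfOutput <wdqs-new-container>:/tmp/rdfOutput
```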

And then run the munge script over the dump, specifying the concept URI.
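A sketch of that step; the /tmp paths are my own choices, and the exact way the concept URI is supplied to the munge tooling varies between wdqs versions (here I assume a --conceptUri option passed through after --; check your version's munge.sh usage):

```shell
docker exec <wdqs-new-container> \
    ./munge.sh -f /tmp/rdfOutput -d /tmp/mungeOut -- --conceptUri http://somFancyNewLocation.foo
```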

The munge step batches the data into a set of chunks of a configured size, altering some of the triples along the way. The changes are documented here. If you have more data you may end up with more chunks.

Loading the new query service

Using the munged data and the loadData.sh script, the data can now be loaded directly into the query service.
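Roughly as follows, run inside the new query service container; wdq is the namespace the wdqs docker image uses by default (check yours if you have changed it):

```shell
docker exec <wdqs-new-container> ./loadData.sh -n wdq -d /tmp/mungeOut
```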

A second updater

Currently I have two query services running: the old one, which is public and still being updated by an updater, and the new one, which is freshly loaded and slowly becoming out of date.

To create a second updater that will run alongside the old updater I define the following new service in my docker-compose file, which points to the new wikibase hostname and query service backend.
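A sketch of that service, mirroring the updater from the wikibase-docker example but pointed at the new hostname and backend (the service name is my own choice):

```yaml
  wdqs-updater-new:
    image: wikibase/wdqs:0.3.2
    restart: unless-stopped
    command: /runUpdate.sh
    depends_on:
      - wdqs-new
      - wikibase
    environment:
      - WIKIBASE_HOST=somFancyNewLocation.foo
      - WDQS_HOST=wdqs-new.svc
      - WDQS_PORT=9999
```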

Starting it with a command I have used a few times in this blog post.
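That is, assuming the service name from my compose file:

```shell
docker-compose up -d wdqs-updater-new
```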

I can confirm using ‘docker ps’ and also by looking at the container logs that the new updater is running.

Using the new query service

You might want to check your query service before switching live traffic to it to make sure everything is OK, but I will skip that step.

In order to direct traffic to the newly loaded (and now updating) query service, all that is needed is to reload the wdqs-proxy with the new backend host. Using the wdqs-proxy Docker image, this can be done with the PROXY_PASS_HOST environment variable.
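Only the environment section of the proxy service needs to change; the rest of its definition stays as in the example (the backend host and port here match the new service sketched earlier):

```yaml
  wdqs-proxy:
    image: wikibase/wdqs-proxy
    environment:
      - PROXY_PASS_HOST=wdqs-new.svc:9999
```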

And the service can be restarted with that same old command.
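Namely:

```shell
docker-compose up -d wdqs-proxy
```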

Running the same query in the UI will now return results with the new concept URIs.

And if I make a new item (Q2) I can also see this appear in the new query service with the correct concept URI.

Result of: SELECT * WHERE { <http://somFancyNewLocation.foo/entity/Q2> ?b ?c. }

Cleanup

I left some things lying around that are no longer needed and should be cleaned up. These include Docker containers, Docker volumes and files.

First the containers.
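Assuming the old services are named wdqs and wdqs-updater, as in the wikibase-docker example:

```shell
docker-compose stop wdqs wdqs-updater
docker-compose rm wdqs wdqs-updater
```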

Then the volume. Note, this is a permanent removal of any data stored in the volume.
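The actual volume name is prefixed with your compose project name, so find it with docker volume ls first:

```shell
docker volume ls
docker volume rm <projectname>_query-service-data
```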

And other files, also being permanently removed.
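That is, the dump and munged data left in the containers and the copy on the local filesystem (paths as chosen earlier; container name placeholders come from docker ps):

```shell
docker exec <wikibase-container> rm /tmp/rdfOutput
docker exec <wdqs-new-container> rm -rf /tmp/rdfOutput /tmp/mungeOut
rm ./rdfOutput
```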

I then also removed these services and volumes from the docker-compose yml file.

Things to consider

  • This process will take longer on larger wikibases.
  • If not using docker, you will have to run each query service on a different port.
  • This post was using wdqs 0.3.2. Future versions will likely work in the same way, but past versions may not.
