Wikibase of Wikibases

April 30, 2018 By addshore

The Wikibase registry was one of the outcomes of the first in a series of Federated Wikibase workshops organised in partnership with the European Research Council.

The aim of the registry is to act as a central point for details of public Wikibase installs hosted around the web. Data held about the installs currently includes the URL for the home page, Query frontend URL and SPARQL API endpoint URL (if a query service exists).

During the workshop an initial data set was added, and this can be easily seen using the timeline view of the query service and a query that is explained within this post.

Setting up the Wikibase install

The registry is running on the WMF Cloud infrastructure using the wikibase and query service docker images on a single m1.medium VPS with 2 CPUs, 4GB RAM and 40GB disk.

The first step was to request the creation of a project for the install. The current process for this is to create a Phabricator ticket, and that ticket can be seen here.

Once the project was created I could head to Horizon (the OpenStack management interface) and create a VPS to host the install.

I chose the m1.medium flavour for the 4GB memory allowance. As currently documented in the wikibase-docker docker-compose example README, the setup can fail with less than 3GB of memory because of the initial spike in memory usage when the collection of docker services is first set up.
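A rough way to check a machine against that 3GB figure is to read MemTotal from /proc/meminfo. The sketch below is only an illustration: the helper names and the sample text are made up, and on a real machine the text would come from reading /proc/meminfo.

```python
# 3GB expressed in kB, the unit /proc/meminfo reports.
REQUIRED_KB = 3 * 1024 * 1024


def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def has_enough_memory(meminfo_text, required_kb=REQUIRED_KB):
    """True if the reported total memory meets the required amount."""
    return mem_total_kb(meminfo_text) >= required_kb


# Made-up sample text, for illustration only.
sample = "MemTotal:        4039428 kB\nMemFree:          123456 kB\n"
print(has_enough_memory(sample))
```

On the VPS itself the same function could be called with the contents of /proc/meminfo instead of the sample string.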

Once the machine was up and running I could install docker and docker-compose by following the docker docs for Debian (the OS I chose during the machine creation step).

With docker and docker-compose installed it was time to craft my own docker-compose.yml file based on the example currently present in the wikibase-docker repo.

The key environment variables to change were:

  • For the wikibase service:
    • MW_ADMIN_NAME: <Some user to be created>
    • MW_ADMIN_PASS: <The password for the above user>
    • MW_SITE_NAME: “Wikibase Registry”
    • DB_PASS: <Password matching the MySQL service>
    • DB_USER: <Username matching the MySQL service>
    • DB_NAME: <DB name matching the MySQL service>
  • For the MySQL service:
    • MYSQL_DATABASE: <Some SQL database to be auto created>
    • MYSQL_USER: <Some username for DB access>
    • MYSQL_PASSWORD: <Some password for DB access>
  • For the query service frontend service:
    • BRAND_TITLE: <Name to be displayed for the UI>
  • For the query service and updater services:
    • WIKIBASE_HOST: wikibase-registry.wmflabs.org
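
The DB_* values given to the wikibase service have to line up with the MYSQL_* values given to the MySQL service. As a rough sketch (the check_db_settings helper and the placeholder values are hypothetical, not part of wikibase-docker), a check like this can catch a mismatch before the services are started:

```python
# Hypothetical helper: not part of wikibase-docker, just a sanity check
# on values copied out of a docker-compose.yml file.

# Which wikibase variable must equal which MySQL variable.
PAIRS = {
    "DB_NAME": "MYSQL_DATABASE",
    "DB_USER": "MYSQL_USER",
    "DB_PASS": "MYSQL_PASSWORD",
}


def check_db_settings(wikibase_env, mysql_env):
    """Return the wikibase variable names whose values are missing or
    do not match the corresponding MySQL variable."""
    mismatched = []
    for wb_key, mysql_key in PAIRS.items():
        wb_value = wikibase_env.get(wb_key)
        if wb_value is None or wb_value != mysql_env.get(mysql_key):
            mismatched.append(wb_key)
    return mismatched


# Placeholder values, for illustration only.
wikibase_env = {"DB_NAME": "my_wiki", "DB_USER": "wikiuser", "DB_PASS": "change-me"}
mysql_env = {"MYSQL_DATABASE": "my_wiki", "MYSQL_USER": "wikiuser", "MYSQL_PASSWORD": "other"}

print(check_db_settings(wikibase_env, mysql_env))  # ['DB_PASS']
```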

The docs for the environment variables are in the README for each image used for a service. For example, the ‘wikibase’ image docs can be found in this README.

Once the file was created it was time to start the services using the following command:

user@wbregistry-01:~/wikibase-registry# docker-compose up -d
Creating network "wikibase-registry_default" with the default driver
Creating volume "wikibase-registry_mediawiki-mysql-data" with default driver
Creating volume "wikibase-registry_mediawiki-images-data" with default driver
Creating volume "wikibase-registry_query-service-data" with default driver
Creating wikibase-registry_mysql_1 ... done
Creating wikibase-registry_wdqs_1     ... done
Creating wikibase-registry_wikibase_1   ... done
Creating wikibase-registry_wdqs-proxy_1   ... done
Creating wikibase-registry_wdqs-updater_1  ... done
Creating wikibase-registry_wdqs-frontend_1 ... done

The output of the command indicated that everything had started correctly, and I double checked using the following:

user@wbregistry-01:~/wikibase-registry# docker-compose ps
              Name                             Command               State          Ports
-------------------------------------------------------------------------------------------------
wikibase-registry_mysql_1           docker-entrypoint.sh mysqld      Up      3306/tcp
wikibase-registry_wdqs-frontend_1   /entrypoint.sh nginx -g da ...   Up      0.0.0.0:8282->80/tcp
wikibase-registry_wdqs-proxy_1      /bin/sh -c "/entrypoint.sh"      Up      0.0.0.0:8989->80/tcp
wikibase-registry_wdqs-updater_1    /entrypoint.sh /runUpdate.sh     Up      9999/tcp
wikibase-registry_wdqs_1            /entrypoint.sh /runBlazegr ...   Up      9999/tcp
wikibase-registry_wikibase_1        /bin/sh /entrypoint.sh           Up      0.0.0.0:8181->80/tcp
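
The same check can be scripted. The sketch below assumes the column layout shown above (a header line, a dashed separator, then one row per container with columns separated by two or more spaces). The services_not_up helper is hypothetical, and the sample reuses two rows from the output above.

```python
import re


def services_not_up(ps_output):
    """Return names of containers whose State column does not start with 'Up'."""
    not_up = []
    for line in ps_output.splitlines()[2:]:  # skip header and separator
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) < 3:
            continue  # blank or unexpected line
        name, state = columns[0], columns[2]
        if not state.startswith("Up"):
            not_up.append(name)
    return not_up


sample = """\
              Name                             Command               State          Ports
-------------------------------------------------------------------------------------------------
wikibase-registry_mysql_1           docker-entrypoint.sh mysqld      Up      3306/tcp
wikibase-registry_wikibase_1        /bin/sh /entrypoint.sh           Up      0.0.0.0:8181->80/tcp
"""

print(services_not_up(sample))  # an empty list means every container is Up
```

In practice the text could come from running docker-compose ps through subprocess.run with captured output. The parsing is tied to this table layout and would need adjusting if the layout changes.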

Wikibase and the query service UI were exposed on ports 8181 and 8282 on the machine respectively, but the openstack firewall rules would block any access from outside the project by default, so I created 2 new rules allowing ingress from within the labs network (range 10.0.0.0/8).
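
To illustrate what a rule for the range 10.0.0.0/8 lets through, here is a small sketch using Python's ipaddress module. The two addresses are arbitrary examples, not real hosts from the project.

```python
import ipaddress

# The range used for the new ingress rules.
LABS_RANGE = ipaddress.ip_network("10.0.0.0/8")


def allowed_by_rule(address):
    """True if the source address falls inside the allowed range."""
    return ipaddress.ip_address(address) in LABS_RANGE


print(allowed_by_rule("10.68.17.5"))    # True: inside 10.0.0.0/8
print(allowed_by_rule("203.0.113.10"))  # False: outside the range
```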

I could then set up a web proxy in Horizon to map some domains to the exposed ports on the machine.

With the proxies created the 2 services were then accessible to the outside world:

Adding some initial data

The first version of this repository was planned to just hold Items for Wikibase installs, so the initial list of properties could be pretty straightforward. A link to the homepage of the wiki is of course useful, and enables navigating to the site. Sites may not expose a query service in a uniform way, so a property would also be needed for this. The SPARQL endpoint used by the query service could also differ, so another property would be needed. And finally, to be able to display the initial data on a timeline, an initial creation date would be needed. I added a property for install logo to make the timeline a little prettier.

The properties initially created to describe Wikibase installs (with example data values for wikidata.org) can be seen below:

Some other properties were also created:

  • License (P6) – Probably can’t be used yet. It would make the most sense to refer to Items from wikidata.org here for licences, but that is not yet possible. We could create local copies of the licences needed for this install, but that would also require some sort of “instance of” property to identify which Items are installs and which are licences.
  • Reference URL (P7) – To be used for quick initial referencing of data added.

I then added all other Wikibase instances run by the WMF, which included the test and beta Wikidata sites. Wikiba.se also contains a list of Wikibase installs (although it is out of date). I also managed to find some new installs on WikiApiary by looking at usage of the Wikibase Repo extension. And of course some of the people in the room had instances to add to the list.

I based the creation date on the rough creation of the first item, or an official inception date. All of the creation date statements should probably have references.

The timeline query

The SPARQL queries below build up, step by step, a federated timeline query that spans both the local Wikibase query service (for the registry) and the wikidata.org query service.

1) Select all Items with our date property (P5):

SELECT ?item ?date
WHERE {
    ?item wdt:P5 ?date .
}

2) Use the label service to select the Item Labels instead of IDs:

SELECT ?itemLabel ?date
WHERE {
    ?item wdt:P5 ?date .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}

3) Also select the logo (P8) if it exists:

SELECT ?itemLabel ?date ?logo
WHERE {
    ?item wdt:P5 ?date .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
    OPTIONAL { ?item wdt:P8 ?logo }
}

4) Display the results on a timeline by default:

#defaultView:Timeline
SELECT ?itemLabel ?date ?logo
WHERE {
    ?item wdt:P5 ?date .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
    OPTIONAL { ?item wdt:P8 ?logo }
}

5) Also include some results from the wikidata.org query service (using federated queries) to show the WikidataCon events:

In this query new prefixes are needed for wikidata.org as the default “wd” and “wdt” prefixes point to the local wikibase install.
Q37807168 on wikidata.org is “WikidataCon” and P31 is “instance of”.

#defaultView:Timeline

PREFIX wd-wd: <http://www.wikidata.org/entity/>
PREFIX wd-wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?itemLabel ?date (SAMPLE(?logo) AS ?image)
WHERE
{
  {
   ?item wdt:P5 ?date .
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
   OPTIONAL { ?item wdt:P8 ?logo }
  }
 UNION
  {
   SERVICE <https://query.wikidata.org/sparql> {
    ?item wd-wdt:P31 wd-wd:Q37807168 .
    ?item wd-wdt:P580 ?date .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
    OPTIONAL { ?item wd-wdt:P154 ?logo }
   } 
  }
}
GROUP BY ?itemLabel ?date

This generates the timeline that you see at the top of the post.
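
The results of a query like this can also be handled outside the query UI. The sketch below covers only the response handling, assuming the endpoint returns the standard SPARQL 1.1 JSON results format. The bindings_to_rows helper is hypothetical and the sample response is made up for illustration, not real registry data.

```python
import json


def bindings_to_rows(response_text):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts.

    Variables that are unbound in a row (for example an OPTIONAL logo)
    are simply absent from that row's dict.
    """
    data = json.loads(response_text)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Made-up sample in the SPARQL 1.1 JSON results shape, for illustration only.
sample = json.dumps({
    "head": {"vars": ["itemLabel", "date", "image"]},
    "results": {"bindings": [
        {
            "itemLabel": {"type": "literal", "value": "Example install"},
            "date": {
                "type": "literal",
                "datatype": "http://www.w3.org/2001/XMLSchema#dateTime",
                "value": "2018-01-01T00:00:00Z",
            },
        },
    ]},
})

# Print the rows in date order, as the timeline view would show them.
for row in sorted(bindings_to_rows(sample), key=lambda r: r["date"]):
    print(row["date"], row["itemLabel"])
```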

Other issues noticed during setup

Some of the issues were known before this blog post, but others were new. Either way, if you are following along, these issues and tickets may be of help: