Wikibase of Wikibases
The Wikibase registry was one of the outcomes of the first in a series of Federated Wikibase workshops organised in partnership with the European Research Council.
The aim of the registry is to act as a central point for details of public Wikibase installs hosted around the web. Data held about the installs currently includes the home page URL, the query frontend URL and the SPARQL API endpoint URL (if a query service exists).
During the workshop an initial data set was added, and this can easily be seen using the query service's timeline view together with a query that is explained later in this post.
Setting up the Wikibase install
The registry is running on the WMF Cloud infrastructure using the Wikibase and query service Docker images on a single m1.medium VPS with 2 CPUs, 4GB RAM and 40GB disk.
The first step was to request the creation of a project for the install. The current process for this is to create a Phabricator ticket, and that ticket can be seen here.
Once the project was created, I could head to Horizon (the OpenStack management interface) and create a VPS to host the install.
I chose the m1.medium flavour for its 4GB memory allowance. As currently documented in the README for the wikibase-docker docker-compose example, the setup can fail with less than 3GB of memory due to the initial spike in memory usage when the collection of Docker services is first set up.
Once the machine was up and running, I could install Docker and docker-compose by following the Docker docs for Debian (the OS I chose during the machine creation step).
With docker and docker-compose installed it was time to craft my own docker-compose.yml file based on the example currently present in the wikibase-docker repo.
The key environment variables to change were:
- For the wikibase service:
  - MW_ADMIN_NAME: <Some user to be created>
  - MW_ADMIN_PASS: <The password for the above user>
  - MW_SITE_NAME: "Wikibase Registry"
  - DB_PASS: <Password matching the MySQL service>
  - DB_USER: <Username matching the MySQL service>
  - DB_NAME: <DB name matching the MySQL service>
- For the MySQL service:
  - MYSQL_DATABASE: <Some database to be auto-created>
  - MYSQL_USER: <Some username for DB access>
  - MYSQL_PASSWORD: <Some password for DB access>
- For the query service frontend service (wdqs-frontend):
  - BRAND_TITLE: <Name to be displayed for the UI>
- For the query service and updater services (wdqs and wdqs-updater):
  - WIKIBASE_HOST: wikibase-registry.wmflabs.org
The docs for the environment variables can be found in the README for each image used by the services. For example, the docs for the ‘wikibase’ image can be found in this README.
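Because the wikibase service's DB_* values have to match the MySQL service's MYSQL_* values, a mismatch between the two is an easy mistake to make. The following is a minimal Python sketch, not part of wikibase-docker, that checks the pairing. It assumes the environment blocks of the `wikibase` and `mysql` services are available as plain dictionaries; the helper name `db_env_mismatches` and all placeholder values are hypothetical. Reading the real docker-compose.yml would need a YAML parser such as PyYAML, which is left out to keep the example self-contained.

```python
# Hypothetical sketch: the dictionaries below stand in for the environment
# blocks of the wikibase and mysql services; values are placeholders.

# Pairs of (wikibase variable, mysql variable) that must hold the same value.
DB_PAIRS = [
    ("DB_NAME", "MYSQL_DATABASE"),
    ("DB_USER", "MYSQL_USER"),
    ("DB_PASS", "MYSQL_PASSWORD"),
]


def db_env_mismatches(services):
    """Return the (wikibase, mysql) variable pairs that are missing or differ."""
    wikibase = services.get("wikibase", {})
    mysql = services.get("mysql", {})
    problems = []
    for wb_key, my_key in DB_PAIRS:
        wb_val = wikibase.get(wb_key)
        # A missing wikibase value is always a problem, even if mysql's is too.
        if wb_val is None or wb_val != mysql.get(my_key):
            problems.append((wb_key, my_key))
    return problems


if __name__ == "__main__":
    services = {
        "wikibase": {
            "MW_SITE_NAME": "Wikibase Registry",
            "DB_NAME": "my_wiki",
            "DB_USER": "wikiuser",
            "DB_PASS": "change-me",
        },
        "mysql": {
            "MYSQL_DATABASE": "my_wiki",
            "MYSQL_USER": "wikiuser",
            "MYSQL_PASSWORD": "something-else",
        },
    }
    print(db_env_mismatches(services))  # [('DB_PASS', 'MYSQL_PASSWORD')]
```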
Once the docker-compose.yml file was created, it was time to start the services using the following command:
user@wbregistry-01:~/wikibase-registry# docker-compose up -d
Creating network "wikibase-registry_default" with the default driver
Creating volume "wikibase-registry_mediawiki-mysql-data" with default driver
Creating volume "wikibase-registry_mediawiki-images-data" with default driver
Creating volume "wikibase-registry_query-service-data" with default driver
Creating wikibase-registry_mysql_1 ... done
Creating wikibase-registry_wdqs_1 ... done
Creating wikibase-registry_wikibase_1 ... done
Creating wikibase-registry_wdqs-proxy_1 ... done
Creating wikibase-registry_wdqs-updater_1 ... done
Creating wikibase-registry_wdqs-frontend_1 ... done
The output of the command showed that everything had started correctly, and I double-checked using the following:
user@wbregistry-01:~/wikibase-registry# docker-compose ps
Name                                Command                          State   Ports
-------------------------------------------------------------------------------------------------
wikibase-registry_mysql_1           docker-entrypoint.sh mysqld      Up      3306/tcp
wikibase-registry_wdqs-frontend_1   /entrypoint.sh nginx -g da ...   Up      0.0.0.0:8282->80/tcp
wikibase-registry_wdqs-proxy_1      /bin/sh -c "/entrypoint.sh"      Up      0.0.0.0:8989->80/tcp
wikibase-registry_wdqs-updater_1    /entrypoint.sh /runUpdate.sh     Up      9999/tcp
wikibase-registry_wdqs_1            /entrypoint.sh /runBlazegr ...   Up      9999/tcp
wikibase-registry_wikibase_1        /bin/sh /entrypoint.sh           Up      0.0.0.0:8181->80/tcp
Wikibase and the query service UI were exposed on ports 8181 and 8282 of the machine respectively, but by default the OpenStack firewall rules block any access from outside the project, so I created 2 new rules allowing ingress from within the labs network (range 10.0.0.0/8).
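As a quick illustration of what the 10.0.0.0/8 ingress range admits, Python's standard `ipaddress` module can test candidate source addresses. This is only a local sketch of the rule's logic, not the OpenStack security-group API; the helper `ingress_allowed` and the example addresses are hypothetical.

```python
import ipaddress

# The ingress range used by the two firewall rules described above.
LABS_NETWORK = ipaddress.ip_network("10.0.0.0/8")

# Host ports published by docker-compose for the two public-facing services.
EXPOSED_PORTS = {"wikibase": 8181, "wdqs-frontend": 8282}


def ingress_allowed(source_ip):
    """Return True if a connection from source_ip falls inside LABS_NETWORK."""
    return ipaddress.ip_address(source_ip) in LABS_NETWORK


if __name__ == "__main__":
    for service, port in EXPOSED_PORTS.items():
        print(f"{service}: allow tcp/{port} from {LABS_NETWORK}")
    print(LABS_NETWORK.num_addresses)      # 16777216
    print(ingress_allowed("10.1.2.3"))     # True
    print(ingress_allowed("203.0.113.7"))  # False
```

Anything outside 10.0.0.0/8, including ordinary internet clients, is still rejected by these rules, which is presumably why a web proxy is needed for public access.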
I could then set up a web proxy in Horizon to map some domains to the exposed ports on the machine.
With the proxies created, the 2 services were then accessible to the outside world.
Adding some initial data
The first version of this repository was intended to hold only Items for Wikibase installs, so the initial list of properties could be fairly straightforward. A link to the home page of each wiki is of course useful, as it enables navigating to the site. Sites may not expose a query service in a uniform way, so a property would also be needed for the query service URL. The SPARQL endpoint used by the query service could also differ, so another property would be needed for that. Finally, to be able to display the initial data on a timeline, an initial creation date would be needed. I also added a property for the install logo to make the timeline a little prettier.
The properties initially created to describe Wikibase installs (with example values for wikidata.org) are listed below:
- Main Page (P2) – https://www.wikidata.org/wiki/Wikidata:Main_Page
- Query Service UI (P3) – https://query.wikidata.org/
- SPARQL endpoint (P4) – https://query.wikidata.org/sparql
- Creation date (P5) – 26 October 2012
- Commons Logo (P8) – https://commons.wikimedia.org/wiki/File:Wikidata-logo-en.svg
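To make the shape of a registry entry concrete, here is a hypothetical Python dataclass mirroring properties P2 to P5 and P8, filled in with the wikidata.org example values above. It is only an illustrative local model, not the Wikibase data model or any Wikibase API; the optional fields reflect that not every install exposes a query service.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WikibaseInstall:
    """Hypothetical local model of one registry Item (not a Wikibase API)."""

    label: str
    main_page: str                          # Main Page (P2)
    creation_date: date                     # Creation date (P5)
    query_service_ui: Optional[str] = None  # Query Service UI (P3), if any
    sparql_endpoint: Optional[str] = None   # SPARQL endpoint (P4), if any
    commons_logo: Optional[str] = None      # Commons Logo (P8), if any

    def has_query_service(self) -> bool:
        """Treat an install as queryable only if it has a SPARQL endpoint."""
        return self.sparql_endpoint is not None


wikidata = WikibaseInstall(
    label="Wikidata",
    main_page="https://www.wikidata.org/wiki/Wikidata:Main_Page",
    creation_date=date(2012, 10, 26),
    query_service_ui="https://query.wikidata.org/",
    sparql_endpoint="https://query.wikidata.org/sparql",
    commons_logo="https://commons.wikimedia.org/wiki/File:Wikidata-logo-en.svg",
)

print(wikidata.has_query_service())        # True
print(wikidata.creation_date.isoformat())  # 2012-10-26
```

Keeping the query service UI and the SPARQL endpoint as separate optional fields mirrors the reasoning above: the two URLs may not follow a common pattern, and some installs have neither.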
Some other properties were also created:
- License (P6) – Probably can’t be used yet. It would make the most sense to refer to Items on wikidata.org for licences here, but that is not yet possible. We could instead create local copies of the licences needed for this install, but this would also require some sort of “instance of” property to identify which items are installs and which are licences.
- Reference URL (P7) – To be used for quick initial referencing of data added.
I then added all the other Wikibase instances run by the WMF, including the test and beta Wikidata sites. Wikiba.se also contains a list of Wikibase installs (although it is out of date). I also found some new installs on WikiApiary by looking at usage of the Wikibase Repo extension. And of course some of the people in the room had instances to add to the list.
I based each creation date on the rough creation date of the first Item, or on an official inception date. All of the creation date statements should probably have references.
The timeline query
The SPARQL queries below build up, step by step, a federated timeline query that spans both the registry's local Wikibase query service and the wikidata.org query service.
1) Select all Items with our date property (P5):
SELECT ?item ?date WHERE {
  ?item wdt:P5 ?date .
}
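Queries like this can also be sent to a query service over plain HTTP. The following is a minimal sketch using only the Python standard library and the SPARQL 1.1 Protocol (a GET request with a `query` parameter, asking for `application/sparql-results+json`). The registry's public SPARQL endpoint URL is not given in this post, so the script reads it from a hypothetical `SPARQL_ENDPOINT` environment variable and does nothing if it is unset; the function names and the User-Agent string are also hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

# Step 1 query from above: all Items with a creation date (P5).
QUERY = "SELECT ?item ?date WHERE { ?item wdt:P5 ?date . }"


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks API clients to send a descriptive User-Agent;
            # this value is a hypothetical placeholder.
            "User-Agent": "wikibase-registry-example/0.1",
        },
    )


def parse_bindings(results):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened result rows."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return parse_bindings(json.load(resp))


if __name__ == "__main__":
    endpoint = os.environ.get("SPARQL_ENDPOINT")  # hypothetical variable name
    if endpoint:
        for row in run_query(endpoint, QUERY):
            print(row["item"], row["date"])
```

Variables left unbound by an OPTIONAL clause are simply absent from a row in the JSON results, so code reading later query steps (such as `?logo`) should use `row.get(...)`.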
2) Use the label service to select the Item Labels instead of IDs:
SELECT ?itemLabel ?date WHERE {
  ?item wdt:P5 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
3) Also select the logo (P8) if it exists:
SELECT ?itemLabel ?date ?logo WHERE {
  ?item wdt:P5 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  OPTIONAL { ?item wdt:P8 ?logo }
}
4) Display the results on a timeline by default:
#defaultView:Timeline
SELECT ?itemLabel ?date ?logo WHERE {
  ?item wdt:P5 ?date .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
  OPTIONAL { ?item wdt:P8 ?logo }
}
5) Also include some results from the wikidata.org query service (using federated queries) to show the WikidataCon events:
In this query, new prefixes are needed for wikidata.org because the default “wd” and “wdt” prefixes point to the local Wikibase install.
Q37807168 on wikidata.org is “WikidataCon”, P31 is “instance of”, P580 is “start time” and P154 is “logo image”.
#defaultView:Timeline
PREFIX wd-wd: <http://www.wikidata.org/entity/>
PREFIX wd-wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?itemLabel ?date (SAMPLE(?logo) AS ?image) WHERE {
  {
    ?item wdt:P5 ?date .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
    OPTIONAL { ?item wdt:P8 ?logo }
  } UNION {
    SERVICE <https://query.wikidata.org/sparql> {
      ?item wd-wdt:P31 wd-wd:Q37807168 .
      ?item wd-wdt:P580 ?date .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
      OPTIONAL { ?item wd-wdt:P154 ?logo }
    }
  }
}
GROUP BY ?itemLabel ?date
The final federated query generates the timeline that you see at the top of the post.
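The GROUP BY ?itemLabel ?date and SAMPLE(?logo) in the federated query matter because an item with several logo statements would otherwise appear once per logo on the timeline. The following Python sketch mimics that aggregation on rows that have already been fetched. `sample_logo_per_event` is a hypothetical helper, and the example rows are illustrative only. SPARQL's SAMPLE may return any bound value from the group, whereas this version deterministically keeps the first logo it sees.

```python
def sample_logo_per_event(rows):
    """Collapse rows sharing (itemLabel, date) into one row with one image.

    Mimics GROUP BY ?itemLabel ?date with (SAMPLE(?logo) AS ?image). SPARQL's
    SAMPLE may return any bound value in the group; this sketch keeps the
    first logo it sees, and None when no row in the group has a logo.
    """
    grouped = {}  # dicts keep insertion order, so output follows input order
    for row in rows:
        key = (row["itemLabel"], row["date"])
        logo = row.get("logo")  # absent when the OPTIONAL did not match
        if grouped.get(key) is None:
            grouped[key] = logo
    return [
        {"itemLabel": label, "date": when, "image": image}
        for (label, when), image in grouped.items()
    ]


if __name__ == "__main__":
    rows = [
        {"itemLabel": "Example A", "date": "2017-01-01T00:00:00Z", "logo": "a1.svg"},
        {"itemLabel": "Example A", "date": "2017-01-01T00:00:00Z", "logo": "a2.svg"},
        {"itemLabel": "Example B", "date": "2018-01-01T00:00:00Z"},
    ]
    for row in sample_logo_per_event(rows):
        print(row)
```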
Other issues noticed during setup
Some of these issues were known before this blog post, but others were new. Either way, if you are following along, the following issues and tickets may be of help:
- The example queries displayed as part of the query service UI are hardcoded to wikidata.org
- The RecentChanges page of the install displays a very strange date range, which makes looking at recent changes nearly impossible
- The query service UI links to entity URIs / URLs that are not correctly redirected by the current Wikibase docker image
- The “CommonsMediaType” is not currently reusable for other media repositories
- Properties and Items from wikidata.org cannot currently be used on other Wikibase installs
- The BRAND_TITLE env var did not appear to change the name of the query service UI
- The default MediaWiki admin user, which should be created by install.php as part of the install step, could not be logged into