Wikidata query service updater evolution

The Wikidata Query Service (WDQS) sits in front of Wikidata and provides query access to its data via a SPARQL API. The query service itself is built on top of Blazegraph, but in most regards it is very similar to any other triple store that provides a SPARQL API.
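As a concrete illustration of that API, here is a minimal sketch of querying the public WDQS endpoint from Python. The endpoint URL and the SPARQL syntax are real; the User-Agent string is just a placeholder, and error handling is omitted.

```python
# Minimal sketch: querying the WDQS SPARQL endpoint over HTTP.
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query into a GET request URL for WDQS."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"

def run_query(sparql: str) -> dict:
    """Execute the query and return the parsed SPARQL JSON results."""
    req = urllib.request.Request(
        build_query_url(sparql),
        # WDQS expects clients to identify themselves; placeholder UA.
        headers={"User-Agent": "wdqs-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (performs a live network request):
# five items that are instances of (P31) house cat (Q146)
# results = run_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5")
# for row in results["results"]["bindings"]:
#     print(row["item"]["value"])
```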

In the early days of the query service (circa 2015), the service was only run by Wikidata, hence the name. However, as interest in and usage of Wikibase continued to grow, more people started running a query service of their own, for the data in their own Wikibase. You'll notice, though, that most people still refer to it as WDQS today.

Whereas most core Wikibase functionality is developed by Wikimedia Deutschland, the query service is developed by the search platform team at the Wikimedia Foundation, with a focus on wikidata.org, but also a goal of keeping it useable outside of Wikimedia infrastructure.

The query service itself is currently a whole application rather than just a database. Under the surface, it can roughly be split into 2 key parts:

  • Backend Blazegraph database that stores and indexes data
  • Updater process that takes data from a Wikibase and puts it in the database

This actually means that you can run your own query service without running a Wikibase at all. For example, you can load the whole of Wikidata into a query service that you operate, and have it stay up to date with current events. In practice, though, this is quite some work and expense in storage and indexing, so I expect not many folks do it.

Over time the updater element of the query service has iterated through some changes. The updater packaged with Wikibase, as used by most folks outside of the Wikimedia infrastructure, is now 2 steps behind the updater used for Wikidata itself.

The updater generations look something like this:

  • HTTP API Recent Changes polling updater (used by most Wikibases)
  • Kafka based Recent Changes polling updater
  • Streaming updater (used on Wikidata)
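At its core, the first-generation updater is a loop that polls the MediaWiki Recent Changes API and pushes the RDF for each changed entity into Blazegraph. A rough sketch of the polling side — the API module and its parameters are real MediaWiki ones, while the loop framing is illustrative:

```python
# Sketch of the Recent Changes polling that the first-generation
# updater is built around.
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def build_rc_url(since: str) -> str:
    """Build a Recent Changes query URL for changes newer than `since`."""
    params = urllib.parse.urlencode({
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,      # ISO 8601 timestamp to start from
        "rcdir": "newer",      # walk forward in time
        "rcprop": "title|timestamp",
        "format": "json",
    })
    return f"{API}?{params}"

def fetch_recent_changes(since: str) -> list:
    """Fetch one batch of recent changes (performs a network request)."""
    with urllib.request.urlopen(build_rc_url(since)) as resp:
        return json.load(resp)["query"]["recentchanges"]

# An updater would loop: fetch changes, rebuild the RDF for each changed
# entity, write it to the triple store, then advance `since`:
# for change in fetch_recent_changes("2021-01-01T00:00:00Z"):
#     print(change["title"])
```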

Let’s take a look at a high-level overview of these updaters, what has changed and why. I’ll also be applying some pretty arbitrary, gut-feeling scores in 4 categories for each updater.

Read more

Wikidata maxlag, via the ApiMaxLagInfo hook

Wikidata builds on the concept of maxlag, which has existed in MediaWiki for some years, in order to slow automated editing at times of lag in various systems.

Here you will find a little introduction to MediaWiki maxlag, and the ways that Wikidata hooks into the value, altering it for its needs.

Screenshot of the “Wikidata Edits” grafana dashboard showing increased maxlag and decreased edits

As you can see above, high maxlag can cause automated editing to slow or stop on wikidata.org
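For context, a well-behaved bot sends a maxlag=&lt;seconds&gt; parameter with its write requests and backs off when the API reports that replication lag exceeds that value. A minimal sketch of that client-side behaviour — the error shape matches MediaWiki's documented maxlag error, while the retry framing around it is illustrative:

```python
# Sketch of how a bot honours maxlag: send maxlag=<seconds> with each
# write, and back off when the API answers with a maxlag error.
import time

MAXLAG_SECONDS = 5  # the conventional default for bots

def is_lagged(api_response: dict) -> bool:
    """True if the API rejected the request because servers are lagged."""
    return api_response.get("error", {}).get("code") == "maxlag"

def call_with_backoff(do_request, retries: int = 3, wait: float = 5.0):
    """Call do_request() (which should include maxlag=...) and retry on lag."""
    for _attempt in range(retries):
        response = do_request()
        if not is_lagged(response):
            return response
        time.sleep(wait)  # the real API also sends a Retry-After header
    raise RuntimeError("servers still lagged after retries")
```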

Read more

Small commits

There are many blog posts and articles out there about making small git commits. I’m sure most people (including me) bring up the same few topics around why small commits are good and why we should all probably be making smaller commits.

In this post, I’ll look at some of the key topics from my perspective, and try to tie these topics to concrete examples from repositories that I have worked on. The topics are in no particular order, so be sure to give them all a read.

One thing to note is that “small” doesn’t necessarily mean small in terms of lines of code; small here is relative. Small commits can benefit you in many different places, but to stand the test of time they must end up in your main branch.

Git features during development

Git only takes full responsibility for your data when you commit

Commit Often, Perfect Later, Publish Once: Git Best Practices

Read more

Wikibase a history

I have had the pleasure of being part of the Wikibase journey one way or another since 2013, when I first joined Wikimedia Germany to work on Wikidata. That long-running relationship with the project should put me in a fairly good position to give a high-level overview of the history, from both a technical and higher-level perspective. So here it goes.

For those that don’t know, Wikibase is the software that powers wikidata.org and a growing number of other sites. If you want to know more, read about it on Wikipedia, or the Wikibase website.

For this reason, a lot of the early timeline is quite heavy on the Wikidata side. There are certainly some key points missing; if you think they are worth mentioning, leave a comment or reach out!

Read more

Profiling a Wikibase item creation on test.wikidata.org

Today I was in a Wikibase Stakeholder group call, and one of the discussions was around Wikibase importing speed, data loading, and the APIs. My previous blog post covering what happens when you make a new Wikibase item was raised, and we also got onto the topic of profiling.

So here comes another post looking at some of the internals of Wikibase, through the lens of profiling on test.wikidata.org.

The tools used to write this blog post for Wikimedia infrastructure are both open source and public. You can do similar profiling on your own Wikibase, or for requests that you suspect are slow on Wikimedia sites such as Wikidata.

Wikimedia Profiling

Profiling of Wikimedia sites is managed and maintained by the Wikimedia performance team. They have a blog, and one of the most recent posts actually covered profiling PHP at scale in production, so if you want to know the details of how this is achieved, give it a read.

Throughout this post I will be looking at data collected from a production Wikimedia request, by setting the X-Wikimedia-Debug header in my request. This header has a few options, and you can find the docs on wikitech.wikimedia.org. There are also browser extensions available to easily set this header on your requests.
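As a sketch, setting that header from Python looks like this. The header name is real; treat the option string as illustrative ("forceprofile" is one of the documented options, but check wikitech.wikimedia.org for the current set), and the User-Agent is a placeholder.

```python
# Illustrative request with the X-Wikimedia-Debug header set so that
# the request is profiled.
import urllib.request

req = urllib.request.Request(
    "https://test.wikidata.org/wiki/Special:Version",
    headers={
        # "forceprofile" asks for this request to be profiled; see the
        # wikitech docs for the full set of header options.
        "X-Wikimedia-Debug": "forceprofile",
        "User-Agent": "profiling-example/0.1 (placeholder)",
    },
)
# Performing the request (commented out to avoid a live call here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```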

I will be using the Wikimedia hosted XHGui to visualize the profile data. Wikimedia specific documentation for this interface also exists on wikitech.wikimedia.org. This interface contains a random set of profiled requests, as well as any requests that were specifically requested to be profiled.

Profiling PHP & MediaWiki

If you want to profile your own MediaWiki or Wikibase install, or PHP in general, then you should take a look at the mediawiki.org documentation page on the topic. You’ll likely want to use either Tideways or XDebug, and will probably want to avoid having to set up any extra UI to visualize the data.

This profiling only covers the main PHP application (MediaWiki and the Wikibase extension). Other services, such as the query service, would require separate profiling.

Read more

WBStack close and migration

This entry is part 11 of 12 in the series WBStack

The time is approaching for the end of life of the WBStack alpha platform (don’t worry, it’s still some months away, and there is a migration path to a new platform etc :)).

In this post you’ll find an update on the current state of WBStack, another introduction to Wikibase.Cloud, some rough dates and connections to other communications. If you don’t know what WBStack is then you can start with this introduction.

Wikibase.Cloud

Following the pre-launch announcement of Wikibase.Cloud at WikidataCon 2021, the WMDE team has been working on getting the new platform set up and ready to replace wbstack.com. This includes updating components such as MediaWiki and Wikibase, reworking other components, modifying code bases to be more easily maintained by a team, and generally getting to grips with the platform.

This new platform uses the same codebases and architecture as wbstack.com currently does, but it is maintained by a team at Wikimedia Deutschland rather than by me, an individual.

You can read more on the launch from the WMDE perspective in the mailing list post that will be sent at the same time as this blog post.

Initially, Wikibase.cloud will launch as a closed beta for WBStack users who registered before February 2nd, 2022 with a waiting list for later expansion. To join the waiting list, please fill out this form https://lime.wikimedia.de/index.php/717538. Please note that the waitlist will be considered after WBStack migration has completed.

For current users of WBStack, you do not need to join this waitlist. You will receive an email with details on how you can opt-in to a migration to wikibase.cloud in March.

wikibase-cloud mailing list February 2022

As migration time approaches, I will be reaching out to the current users of WBStack about the options and approach to migration. And if you are a current user, it’s worth reading the rest of this post.

Read more

Pre-launch Announcement of Wikibase.Cloud [WikidataCon Writeup]

This entry is part 10 of 12 in the series WBStack

WikidataCon 2021 was in October 2021, and one of the sessions that I spoke in was a “Pre-launch Announcement and Preview of Wikibase.Cloud”.

The recording is now up on YouTube, and below you’ll find a write-up summary of what was said.

So what is wikibase.cloud?

It’s a new platform, yet to be launched, that is based on the WBStack code, but that will be managed and maintained by Wikimedia Deutschland (Wikimedia Germany).

This is a Wikibase-as-a-service platform that exists to offer open knowledge projects a new way to create their own Wikibase very quickly and easily.

Read more

Most liked Wikibase tweets

Wikidata is 9, and thus Wikibase, the software that powers it, is also about 9! Twitter has been around for the entire Wikibase lifespan, so let’s take a look back through time at some of the most liked Wikibase tweets (according to Twitter’s free search) since its creation.

Want this list but for Wikidata? Check out my Wikidata focused post!

2021, @annechardo 113 💕s

My thesis on “Managing Archival Authority Data in the Data Web” is online! The first part (from #maintenance to #MetadataDebt via #RiC ) is completed by a case study using #Wikibase :

@annechardo, Twitter Translate
https://twitter.com/annechardo/status/1348650172268617731

Read more

WBStack in 2021 and the future

This entry is part 9 of 12 in the series WBStack

2021 is nearly over, WBStack is over 2 years old (initially announced back in 2019), and has continued to grow. The future is bright with wikibase.cloud looking to be launched by Wikimedia Deutschland in the new year (announced at WikidataCon 2021), and as a result, the code under the surface has had the most eyes on it since its inception.

Let’s take a closer look at some of the developments this year, and the progress that WBStack has made.

Current Usage

WBStack now has 148 individual user accounts registered on the platform with wiki creation enabled. These accounts have created 510 wikis with Wikibase installed since the platform was first put online, and 335 of those wikis are still published (the other 175 have been deleted).

|                   | Nov 2019 | April 2020 | May 2020 | Nov 2021    | Dec 2021    |
|-------------------|----------|------------|----------|-------------|-------------|
| Platform Users    | 38       | 70         | 76       | 139         | 148         |
| Non-deleted Wikis |          |            | 145      | 306         | 335         |
| All Wikis         | 65       | 178        | 226      | 476         | 510         |
| Pages             |          |            |          | 1.4 million | 1.9 million |
| Edits             |          | 200,000    | 295,000  | 4.1 million | 4.6 million |

Read more

Tech Lead Digest – Q3/4 2021

This entry is part 5 of 5 in the series Tech Lead Digest (wmde)

It’s time for the 5th instalment of my tech lead digest posts. I switched to monthly for 2 months, but decided to go back to quarterly-ish. You can find the other digests by checking out the series.

🧑‍🤝‍🧑Wikidata & Wikibase

The biggest event of note in the past months was WikidataCon 2021, which took place toward the end of October 2021. Spread over 3 days, the event celebrated Wikidata’s 9th birthday. We are still awaiting the report from the event to know how many folks participated, and recordings of talks will likely not be available until early 2022, at which point I’ll try to write another blog post.

Just before WikidataCon the updated strategy for Linked Open Data was published by Wikimedia Deutschland which includes sub-strategies for Wikidata and the Wikibase Ecosystem. This strategy is much easier to digest than the strategy papers published in 2019 and I highly recommend the read. Part of the Wikidata strategy talks about “sharing workload” which reminds me of some thoughts I recently had comparing Wikipedia and Wikidata editing. Wikibase has a focus on Ecosystem enablement, which I am looking forward to working on.

The Wikibase stakeholder group continues to grow and organize. A Twitter account (@wbstakeholders) now exists, tweeting relevant updates. With over 14 organizational members and 15 individual members, the group’s budget is now public and it is working on getting some desired features implemented. If you are an organization or individual working in the Wikibase space, be sure to check them out! The group recently published a prioritized list of institutional requirements, and I’m happy to say that some parts of the “Automatic maintenance processes and updating cascades should work out of the box” area, which scored 4, have already been tackled by the Wikidata / Wikibase teams.

Read more