Wikibase: a history

February 15, 2022 By addshore

I have had the pleasure of being part of the Wikibase journey one way or another since 2013 when I first joined Wikimedia Germany to work on Wikidata. That long-running relation to the project should put me in a fairly good position to give a high-level overview of the history, from both a technical and higher-level perspective. So here it goes.

For those that don’t know, Wikibase is the code that powers wikidata.org and a growing number of other sites. If you want to know more, read about it on Wikipedia or the Wikibase website.

For this reason, a lot of the early timeline is quite heavy on the Wikidata side. There are certainly some key points missing. If you think they are worthy of mentioning, then leave a comment or reach out!

2005

At Wikimania 2005 there was a series of talks on the “Semantic web”. One of these was Wikipedia and the Semantic Web – The Missing Links, and this Wikimania led to the creation of Semantic MediaWiki.

The WikiProject “Semantic MediaWiki” provides a common platform for discussing extensions of the MediaWiki software that allow for simple, machine-based processing of Wiki-content. This usually requires some form of “semantic annotation,” but the special Wiki environment and the multitude of envisaged applications impose a number of additional requirements.

Semantic MediaWiki at 22:29, 2 January 2006

The initial version of Semantic MediaWiki was released in late 2005 (version 0.1), with 4.0.0 being released at the start of 2022.

If you read through the Wikimania and connected resources carefully, you’ll find a reference to Wikidata already, though at this point Wikidata is only a project proposal.

Wikidata is a proposed wiki-like database for various types of content. This project as proposed here requires significant changes to the software (or possibly a completely new software) but has the potential to centrally store and manage data from all Wikimedia projects, and to radically expand the range of content that can be built using wiki principles.

Wikidata/Archive/Wikidata/historical at 19:16, 30 November 2005

And ultimately the Wikidata project led to the creation of the Wikibase software.

2012

There was certainly some work behind the scenes between 2005 and 2012 but most of this seemingly doesn’t have a super public record. There certainly will have been work done on the project proposal, and ongoing discussions with the Wikimedia Foundation about the project.

In March 2012, the Wikimedia Foundation and Wikimedia Germany jointly announced “The Wikipedia data revolution”.

Wikimedia Deutschland, the German chapter of the Wikimedia movement, and the Wikimedia Foundation are proud to announce Wikidata, a collaboratively edited database of the world’s knowledge and the first new Wikimedia project since 2006.

The Wikipedia data revolution (Wikimedia Foundation)

A team of 12 was hired and announced in April to work on Wikidata, with the team being complete at the end of March with the first office hour.

If you want a video introduction from 2012, take a look at this video from SMWCon Fall 2012, from a session called “Wikidata: Semantic Wikipedia”.

If you want to know the original goals of the Wikidata project, and thus the Wikibase software, take a look here. (Maybe I should write some of this up soon)…

Also in April 2012, Jeroen De Dauw created the initial content of the first Wikibase extension page on mediawiki.org. And thus Wikibase was born.

At the end of 2012 you can start to see the structure of Wikibase by looking back at this extension page and the code at the time. We have:

  • client, repo, lib: The three sub-extensions that have been a part of the Wikibase git repository since the early days. One for Wikidata.org, one for Wikipedias, and one containing shared code.
  • terms (labels, descriptions, aliases): So that concepts can be identified in language
  • sitelinks: Connections from Wikibase to other MediaWiki sites such as Wikipedia
  • Namespaces for “data”, now “items”, properties, and queries.
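To make terms and sitelinks a little more concrete, here is a minimal sketch of how a single item could be represented. The nesting loosely follows the shape of Wikibase's JSON serialization, but the description and alias values and the helper functions `label` and `sitelink_title` are made up for illustration and are not part of Wikibase.

```python
# Hypothetical, simplified item. The values below are illustrative only.
item = {
    "id": "Q2013",
    "type": "item",
    # Terms: human-readable names for the concept, keyed by language code.
    "labels": {
        "en": {"language": "en", "value": "Wikidata"},
        "de": {"language": "de", "value": "Wikidata"},
    },
    "descriptions": {
        "en": {"language": "en", "value": "free knowledge base"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "WD"}],
    },
    # Sitelinks: connections from the item to pages on other MediaWiki sites.
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Wikidata"},
    },
}


def label(item, language, fallback="en"):
    """Return the label in `language`, falling back to `fallback` if missing."""
    labels = item.get("labels", {})
    term = labels.get(language) or labels.get(fallback)
    return term["value"] if term else None


def sitelink_title(item, site):
    """Return the linked page title on `site`, or None if there is no sitelink."""
    link = item.get("sitelinks", {}).get(site)
    return link["title"] if link else None


print(label(item, "fr"))               # no French label, so the English one is used
print(sitelink_title(item, "enwiki"))
```

The point of the split is that the item ID stays the same regardless of language, while terms carry the per-language names and sitelinks carry the per-site page titles.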

2013

In 2013 I joined the team 🎉🎉🎉. And there are some things that I distinctly remember:

  • There were continued discussions around how to get started with a query service
  • Multiple libraries were split out to be reusable outside of the main Wikibase codebase, such as DataValues, DataTypes and DataModel (some of these were created as MediaWiki extensions before later being turned into libraries)
  • We were still doing a phased rollout of Wikidata to various Wikimedia projects (phase 1 being sitelinks)
  • I personally remember working on the Wikibase Action API somewhat, adding item merge functionality.

The main hidden gem that is worth pointing out about 2013 developments is that some portion of time was spent developing WikibaseQuery, WikibaseQueryEngine and WikibaseDatabase, none of which ever saw the light of day. These were primarily built to meet the first use case of “Query by one property and one value”.
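As a rough illustration of what “query by one property and one value” means, here is a small in-memory sketch. It is not how WikibaseQueryEngine worked; the class `PropertyValueIndex`, its methods, and the IDs used are all hypothetical.

```python
from collections import defaultdict


class PropertyValueIndex:
    """Toy index answering: which items have this value for this property?"""

    def __init__(self):
        # Maps (property_id, value) -> set of item IDs with that statement.
        self._index = defaultdict(set)

    def add_statement(self, item_id, property_id, value):
        self._index[(property_id, value)].add(item_id)

    def query(self, property_id, value):
        # .get avoids creating empty entries for lookups that miss.
        return sorted(self._index.get((property_id, value), set()))


# Hypothetical IDs: P1 stands for some property, Q100 for some value item.
index = PropertyValueIndex()
index.add_statement("Q1", "P1", "Q100")
index.add_statement("Q2", "P1", "Q100")
index.add_statement("Q3", "P1", "Q200")

print(index.query("P1", "Q100"))
```

Anything beyond this single lookup, such as combining several properties or following links between items, quickly calls for a real query language, which is where the later SPARQL-based query service comes in.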

2014

It may seem insignificant, but 2014 saw the first version of the wikiba.se website.

Wikibase is a collection of applications and libraries for creating, managing and sharing structured data. It is an open source project, and everyone is welcome to join in development.

wikiba.se in 2014

JSON dumps of Wikidata were created for the first time this year.

The various query related extensions developed were archived, as the Wikimedia Foundation had a need for both simple and complex queries for a project called WikiGrok. Work kicked off at the foundation looking into Wikibase indexing needs and goals.

2015

The news of the year was certainly that the Wikidata Query Service was launched by the Discovery team at the Wikimedia Foundation. This was the SPARQL and Blazegraph implementation that we have now been using next to Wikibase for the past 7 years.

A side note here is that Titan was originally evaluated, but it looks like it was dropped because it, along with its team, was bought by DataStax to build a new graph database (ironically, this happened with Blazegraph a few years later).

The SPARQL endpoint also saw the completion of the RDF mapping for Wikibase, so now we have stable RDF output.

Generally speaking, the Wikibase extension itself looks very similar to the early years, but extensions such as WikibaseQualityConstraints were developed and deployed to Wikidata.

2016

Wikibase code docs are now built to doc.wikimedia.org (patch).

I’m sure other things happened this year, but things really start to pick up in 2017! ;)

2017

Wikibase Docker images saw the light of day, to try to make Wikibase easier to get started with.

I feel that this really was a springboard enabling many more folks to try out Wikibase for their own projects locally, and also to run production instances.

Code-wise, the “data-access” component appeared for the first time in Wikibase.git.

2018

On 23-25 April 2018, a “Workshop on harnessing open data for Monitoring and Evaluation” is taking place in Antwerp (Q12892), focused on using Wikibase (Q16354758) instances federated with Wikidata (Q2013) in the context of research assessment (Q51844619).

Wikidata:WikiProject Wikidata for research/Meetups/2018-04-23-25-Antwerpen

This round of workshops showed a real increase in momentum and interest around Wikibase. At this point, although there were technical developments ongoing on the Wikibase software, these were still all primarily driven from a Wikidata perspective.

A Wikibase of Wikibases (Wikibase registry) was created as one of the outcomes of these workshops, making use of the Docker images released in the previous year.

2019

WBStack, the first Wikibase as a service, was launched.

A first Wikibase Ecosystem strategy paper was published. At a high level this said “Wikibase powers a thriving linked open data web that is the backbone of free and open knowledge”, looking at some key areas:

  • Focus on enabling connections between data and people
  • Partner with the main players in their field, utilize network effects and branch out
  • Leverage mandates to open up data
  • Maximize the competitive advantage gained via Wikidata

2020

Things start getting a little easier here, as Envel Le Hir has started collecting yearly summaries of Wikibase, such as “Wikibase Yearly Summary 2020”. I highly recommend reading these for a full overview, but I’ll extract some key points here.

Code-wise, the introduction of “packages” in Wikibase.git happened!

2021

Wikibase Yearly Summary 2021 by Envel Le Hir

This year Wikibase got its own all-important Twitter account. More and more workshops and projects around Wikibase were created, including a series of working hours around WBStack. Great projects exposing user needs were created, such as RaiseWikibase. There were also federated properties, blog posts, WikidataCon 2021 and more.

The Wikibase stakeholder group is thriving with 17 organizational members, and 26 individual members. Institutional requirements have been collected and presented, and the group even has a budget to work with, and also a Twitter account!

Most importantly, the new Linked Open Data strategy was published by Wikimedia Germany. The highlight of this for Wikibase is the clear and distinct strategy for the Wikibase Ecosystem.

  • Empower knowledge curators to share their data: Increase the number and diversity of Wikibases that can eventually be connected to the LOD web.
  • Ecosystem enablement: Enable an ecosystem of extensions as well as tools and custom interfaces based on WB APIs to emerge around Wikibase, extending the functionality of the software for more use cases.
  • Connect data across technological & institutional barriers: Ensure Wikibases can connect more deeply with each other and Wikidata to form an LOD web

Code-wise, some of the libraries that were split out of Wikibase.git back in 2013 were moved back into the code base to be managed as a monorepo.

2022

Wikibase Yearly Summary 2022 by Envel Le Hir

It’s only February, and the next thing on the cards for Wikibase is the Wikibase.cloud offering by Wikimedia Deutschland to replace wbstack.com.

Lots still to happen here, as I am writing this in February :)