2 years of wikibase.cloud by WMDE
It’s been somewhere between 2 and 3 years since WMDE took over WBStack and turned it into wikibase.cloud. During this time, my techy focus has slowly shifted away from the world of Wikibase, though I still enjoy following along and working on other Wikimedia areas.
Here I will ramble on about what I saw in terms of potential for wikibase.cloud within the Wikibase ecosystem, as well as what developments have happened within the past years.
The initial problems, goals and dreams
In “An introduction to WBStack”, I said:
The idea behind the project is to provide Wikibase and surrounding services, such as a blazegraph query service, query service ui, quick statements, and others on a shared platform where installs, upgrades and maintenance are handled centrally.
Now, this is fairly obvious, and clearly something that wikibase.cloud still offers today; however, I didn’t write why! And this is potentially something that has gotten lost over the years, through multiple PMs, multiple engineers, multiple project names, etc.
Reducing time and money spent on maintenance across the ecosystem
Many individuals and organizations were spending lots of time maintaining Wikibase and MediaWiki. Some of this was also spilling over into Wikimedia Germany, helping support various institutions with their Wikibase installs, even running some prototype instances for folks.
Whether you are maintaining 1, 10 or 100 installations of Wikibase, you can get large economies of scale by managing them as a single system (or platform) rather than on an individual basis, in terms of both time and money.
I had a call with some folks over at Automattic back in 2021, and some key takeaways, all in line with reducing overheads across an ecosystem (in their case WordPress), could be summarized as:
- Take the famous 5-minute installation and turn it into a 5-second installation
- Don’t worry about hosting or downloading; just go to the sign-up page and a site is created for you
- The platform hasn’t changed in terms of architecture or infrastructure, except that it has grown bigger (200 million WordPress sites)
- A fractions-of-a-penny-per-site model
Ultimately, for a large ecosystem to thrive in a world where most users are non-technical, arguably cash-strapped, and part of slow-moving institutions or organizations, solving these overheads would be a great win.
I’m sure I could be quoted saying something along the lines of: I would rather people spend time and money on the data and knowledge being created and maintained as part of open data sets than on upgrading MediaWiki and the query service for the 9th time.
Internal team knowledge
Wikimedia Germany itself, up until this point, didn’t have much experience running Wikibase and its components at any sort of scale, as the Wikimedia Foundation mostly does this for Wikidata, and now Wikimedia Commons. Most folks developing on the Wikibase code base, for example, wouldn’t run with all components active, and wouldn’t get a feel for the whole stack for years. The Foundation even develops some components that are critical to the setup entirely on its own, such as the query service and now also the Elasticsearch integration, further distancing any Wikibase team from getting a holistic feel for the technical area.
And as the teams grew, split and turned over, this knowledge continued to dilute, and there have been, and still are, many areas of the Wikibase stack where the team lacks internal knowledge. (Not to mention that from the “outside” it’s incredibly hard to actually see what and/or who the team is, and what they are working on and prioritising.)
Dogfooding one or more Wikibases seemed like an easy win, and in dream land would lead to improved experiences across the board for anyone deploying, working on, or working in the Wikibase area: documentation, guides, improved workflows, and generally making life better across the ecosystem in all Wikibase-related product areas.
Connection to users & turnaround time
For years, users of Wikibase had been very distantly connected to the developments happening within Wikibase, with most product movement actually being driven from the Wikidata side of things.
Many users and use cases in the wider potential ecosystem are drastically different from the concerns of Wikidata, but finding a meaningful connection to Wikibase users was hard, and developments that happened within Wikibase often wouldn’t make their way to end users for years (still a problem today, even with wikibase.cloud, it turns out).
Wikibase.cloud had, and has, the potential to close this feedback loop, allowing much tighter iterations on core Wikibase changes and features, as well as helping to align a large portion of the experience of interacting with Wikibase installations. A new requirement could emerge, code could be written and pushed out to a large portion of Wikibase users in a short period of time, iterated on, and finalized before being added to a “real” Wikibase release for those folks who run on-prem LTS versions etc.
Movement and organizational alignment
Here are some choice quotes:
Wikimedia is a global movement whose mission is to bring free educational content to the world. — Wikimedia
We liberate knowledge and make it accessible to everyone! — WMDE
Empower and engage people around the world to collect and develop educational content under a free licence or in the public domain, and to disseminate it effectively and globally. — WMF
Really, it seems to fit quite well (with a few conditions):
- collect and develop free educational content
- free licence or in the public domain
- disseminate it effectively and globally
- accessible to everyone
- liberate knowledge
- empower and engage
Back during the discussions of WBStack and its future, either with or without WMDE and Wikimedia support, I had a strong feeling that “life” within the walls of WMDE would lead to longevity for the resulting collected data, and would serve the open ecosystem more.
A quote from early discussions:
It’s clear it makes sense for WMDE goals, but presenting the clear mission alignment might help.
Looking one level down, through the Linked Open Data strategy presented by WMDE, this is again clear, with some focus on the following 3 points:
- Empower knowledge curators to share their data
- Ecosystem enablement
- Connect data across technological & institutional barriers
A 2+ year summary (timeline)
Collected from various announcements, my own knowledge, commits and Phabricator tickets, this roughly describes the life of wikibase.cloud over the last 2 years (and briefly the years before).
Naturally, more things have happened than are included in this list, but I’m trying to summarize the more notable, outward-facing or progressive points, in terms of what I said above.
- 2019: WBStack launch
- 2021
- 2022
  - April
  - June
  - August
  - November
    - wbaas-deploy is officially open-source!
- 2023
  - January
    - Development plan says wikibase.cloud has an objective of expanding “its user base with an improved experience for new joiners and continuous learning to suit their needs while increasing infrastructure reliability.”
  - February
  - March
    - Elasticsearch has been rolled out to all Wikibases on Cloud
    - Search issues started?
  - April
    - MediaWiki 1.38
  - May
    - MediaWiki 1.39
  - October
    - We’re finally in Open Beta!
  - November
    - Query service performance improvements
    - Wrapped up Elasticsearch version updates
  - December
    - New homepage was released
    - Evelien leaves as PM (roughly)
  - January
- 2024
  - January
    - QuestyCaptcha deployed
  - May
  - July
  - September
    - Note: These come from https://meta.wikimedia.org/wiki/Wikibase/Wikibase.cloud, but there were no updates there in March, April, May, June, July, August, so timings might be off…
    - First iteration of the feature to import entities from Wikidata implemented.
    - Users can now illustrate their entities with images from Wikimedia Commons.
    - We optimized how the search engine (Elasticsearch) is shared across instances
  - January
Some things that have happened that I won’t tie to a specific time include:
- Keeping software and services up to date
- Putting guard rails in place to stop the service from being abused in any way (rate limits, size limits, etc.)
- UI improvements and fixes at the wikibase.cloud platform level (mostly not touching Wikibase itself)
- UI research
- Metrics and statistics collection around the use of wikibase.cloud
- Empty wiki notifications (for wiki owners)
- Setting initial Main Page content
Size and growth
Taking a look at the regular status updates that are published on wiki, we can see that growth of the platform in terms of the number of sites looks something like this…
Taking a look at all of the Wikibase installs recorded on wikibase.world, we have some sort of size and growth to compare this to, where we really see things pick up in 2018. That said, only 13 wikibase.cloud wikis are recorded on wikibase.world at the time of writing this post.
So, inferring some things and making wild assumptions, there are currently 72 Wikibases out there in the wild (as recorded on wikibase.world) and around 1000 Wikibases on wikibase.cloud.
I’d love to dive into the details of these a little more at some point and try to figure out exactly what is running where, by who and on what versions etc…
My thoughts
On the whole, the team has had a big learning curve when it comes to managing and maintaining Wikibase and surrounding services, which, as mentioned above, was mainly done by the Wikimedia Foundation in the past (for better or for worse).
WMDE historically doesn’t run many services, and those that it does run are mostly simple websites or smaller applications without as many complex dependencies. WBStack was built around scaling on Kubernetes and deployed to Google Cloud, both of which were mostly new tech for many team members.
Many developments have happened at both a high and a low level across the platform, be that upgrading core infrastructure and services such as Elasticsearch, making many changes and improvements to the UI control plane that wikibase.cloud offers on top of Wikibase, or tweaking MediaWiki and Wikibase configuration to provide or disable already developed functionality such as InstantCommons.
Ultimately, development of Wikibase itself, and of the components required by Wikibase, probably hasn’t happened much within the team, at least not in a way that furthers Wikibase for users that are not on cloud. For example, default Main Page content changes happened, and sidebar changes happened, but these only exist on wikibase.cloud, not within Wikibase. Really, it would have been nice if the team could have pushed forward on the small Wikibase features that surfaced over the years, which make a large impact for cloud users and would have an equal impact for other Wikibase users.
There have been 3 PMs covering wikibase.cloud in the past 3 years, and a single year isn’t really much time to catch up on the seemingly complex (and slow) nature of any Wikimedia-based product, especially one that spans so many areas and concerns. I can see why there has been so much focus on the top layer of what cloud has to offer, rather than on the depths of Wikibase functionality and what could perhaps change there.
Testimonials
I didn’t want this post to end too dry and boring, or read almost like me moaning that some things might not have happened, so I poked around in the Telegram group for a few testimonials for cloud.
I’ll start with myself, and say that wikibase.cloud is the perfect place for me to go and prototype data within Wikibase, and to provide concrete examples of how other people’s data might be structured within Wikibase, and how it can thus be machine-accessible and human-curatable, in a common interface, with powerful tools working alongside it.
By running the VHP4Safety Compound Wiki (https://compoundcloud.wikibase.cloud/) on wikibase.cloud, we get a free hosted Wikibase instance to make toxicology-related data available as Linked Open Data. A custom Wikibase has the advantage over Wikidata that we can define external IDs for rare, domain-specific resources important and specific to our research.
USGS is building the Geoscience Knowledgebase (https://geokb.wikibase.cloud/) as a way of organizing our entire geoscience portfolio adjacent to the ‘Global Knowledge Commons.’ By establishing same as relationships with Wikidata properties and entities as well as linkages with other knowledge organization systems, we are working to share our institutional knowledge in a way that others can build from.
As a Wikidata contributor, I was thrilled to learn about wikibase.cloud. It was a natural choice for a knowledge base I am building on symbiotic interactions involving microorganisms (https://ppsdb.wikibase.cloud/), as it is easy to use and integrate with the existing Wikidata ecosystem.
These 3 Wikibases alone have over 250k entities, curated over around 1.5 million edits.
Wikibase.cloud seems to have around 4.3 million entities, curated over around 24.6 million edits.
Wikidata has around 114 million entities, so in terms of entities, wikibase.cloud is roughly 3.8% of the way there?
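For a rough sense of scale, the approximate figures quoted above can be turned into a few ratios. This is a back-of-the-envelope sketch in Python; the only inputs are the rounded numbers from the text (with “over 250k” treated as exactly 250k), so the outputs are ballpark values, not measurements.

```python
# Rough scale comparisons using the approximate, rounded figures quoted above.
# These are not measured values; treat every output as a ballpark number.

WIKIDATA_ENTITIES = 114_000_000
CLOUD_ENTITIES = 4_300_000
CLOUD_EDITS = 24_600_000
TESTIMONIAL_ENTITIES = 250_000  # "over 250k" across the 3 testimonial Wikibases
TESTIMONIAL_EDITS = 1_500_000


def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


def edits_per_entity(edits: int, entities: int) -> float:
    """Average number of edits per entity."""
    return edits / entities


if __name__ == "__main__":
    print(f"cloud vs Wikidata entities: {share_percent(CLOUD_ENTITIES, WIKIDATA_ENTITIES):.1f}%")
    print(f"edits per entity, all of cloud: {edits_per_entity(CLOUD_EDITS, CLOUD_ENTITIES):.1f}")
    print(f"edits per entity, 3 testimonial wikis: {edits_per_entity(TESTIMONIAL_EDITS, TESTIMONIAL_ENTITIES):.1f}")
    print(f"3 testimonial wikis as share of cloud entities: {share_percent(TESTIMONIAL_ENTITIES, CLOUD_ENTITIES):.1f}%")
```

Run as-is, this gives about 3.8% of Wikidata’s entity count, roughly 5.7 edits per entity across all of cloud, 6.0 edits per entity for the three testimonial wikis, and those three wikis holding about 5.8% of cloud’s entities.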
It would be nice to know about the “connectedness” of these Wikibase instances, or more generally, to read a post about how you see the future of federation and the challenges that instances still have in incorporating another instance’s knowledge into their own. Is it easy enough for them to configure and point to another instance that they want to index, so that the external instance’s items/properties show up when searching within their instance? Federation?
Yes, I’d love to dive down into the data in cloud a little more:
a) Looking at the number of data model elements (terms, statements) across time.
b) Seeing what datasets these things connect to (including Wikidata).
And also compare growth rates of entities, and then also statement counts, between Wikidata and all of cloud.
This is something that I might look at trying to put into wikibase.world, perhaps?
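One way a growth-rate comparison like this might start: take periodic snapshots of entity or statement counts and compute a compound average monthly growth rate for each series. This is a minimal sketch under stated assumptions; the `Snapshot` type and `monthly_growth_rate` function are hypothetical names, where the counts would come from is left open, and treating a month as 30 days is a simplification.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time count (e.g. entities or statements on a wiki)."""
    taken: date
    count: int


def monthly_growth_rate(snapshots: Sequence[Snapshot]) -> float:
    """Compound average growth per ~30-day month between the earliest and latest snapshot.

    Only the two endpoints are used; anything in between is ignored.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    ordered = sorted(snapshots, key=lambda s: s.taken)
    first, last = ordered[0], ordered[-1]
    if first.count <= 0:
        raise ValueError("starting count must be positive")
    months = (last.taken - first.taken).days / 30.0  # simplifying assumption
    if months <= 0:
        raise ValueError("snapshots must span a positive time range")
    # Solve first.count * (1 + r) ** months == last.count for r.
    return (last.count / first.count) ** (1 / months) - 1
```

Computing this once for all of cloud and once for Wikidata (or once for entities and once for statements) and comparing the two rates would give the comparison described above; using only the endpoints keeps it simple, at the cost of ignoring bumps in between.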