Visualizing Wikibase ecosystem, using wikibase.world

This entry is part 2 of 3 in the series Wikibase ecosystem

In October last year, I wrote a post that started to visualize the connections between Wikibases in the ecosystem, as found and collected on wikibase.world by a bot that I occasionally run. That post made use of the query service visualizations; in this post I’ll take the visualizations a step further, making use of IPython notebooks and plotly.

Previously I reported the total number of Wikibases tracked in wikibase.world as around 784, with around 755 being active (however, I didn’t write down exactly how I determined this). So I’m going to take another stab at that, with some code backing up the determinations rather than just my late night data ramblings.

All of the data shown in this post was generated on 16 Feb 2025 from the IPython notebook available on GitHub, based on the data on wikibase.world, which is maintained as a best effort system.

General numbers

Metric | Value
Wikibases with properties | 777
Wikibases with properties, and more than 10 pages | 600
Wikibases with properties, and more than 10 pages, and 1 or more active users | 264
Wikibases with properties, and more than 10 pages, and 2 or more active users | 129
Wikibases that link to other wikibases | 194
Wikibases that only link to non Wikimedia Foundation wikibases | 5
Wikibases that link to other wikibases, excluding Wikimedia Foundation | 35

A few things of note:

  • “with properties” is used as a clear indicator that Wikibase is not only installed, but also used in at least a very basic way (i.e. it has at least one created Wikibase property). Ideally I would use the number of items as the measure here, however as far as I can tell, this is hard to figure out.
  • “with more than 10 pages” is my baseline measure of the site having some content, however this applies across all namespaces, so can also be wikitext pages…
  • “active users” (users who have performed an action in the last 30 days) are taken from MediaWiki statistics, and apply across all namespaces. These numbers also rely on MediaWiki being correctly maintained and the statistics actually being updated.
  • “link to other wikibases” are links extracted from sites by Addbot either via external links or specific properties that state they are links to other wikibases. (The code is not pretty, but gives us an initial view)
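The nested filters behind the first four rows of the table can be sketched in a few lines of Python. This is a minimal sketch and not the notebook's actual code: the `Wikibase` record, its field names, and the sample rows are all hypothetical stand-ins for whatever wikibase.world actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Wikibase:
    # Hypothetical record: field names are assumptions, not the wikibase.world schema.
    name: str
    properties: int
    pages: int
    active_users: int


# Made-up sample rows, for illustration only.
sample = [
    Wikibase("alpha", properties=12, pages=340, active_users=3),
    Wikibase("beta", properties=1, pages=8, active_users=0),
    Wikibase("gamma", properties=0, pages=50, active_users=1),
    Wikibase("delta", properties=5, pages=25, active_users=1),
]


def funnel(wikibases):
    """Apply each filter on top of the previous one and count what is left."""
    with_props = [w for w in wikibases if w.properties > 0]
    with_content = [w for w in with_props if w.pages > 10]
    active_1 = [w for w in with_content if w.active_users >= 1]
    active_2 = [w for w in with_content if w.active_users >= 2]
    return {
        "with properties": len(with_props),
        "and more than 10 pages": len(with_content),
        "and 1 or more active users": len(active_1),
        "and 2 or more active users": len(active_2),
    }


counts = funnel(sample)
for label, count in counts.items():
    print(f"{label}: {count}")
```

Each step narrows the previous one, which is why the numbers in the table can only go down from row to row.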

And summarized in words:

  • 264 Wikibases with some content that have been edited in the past 30 days
  • 194 Wikibases link in some way to other Wikibases
    • Excluding links to Wikidata and Commons, this number comes down to 35 (So Wikidata is very much the centre)

And of course, take all of this with a pinch of salt: these numbers are an initial stab at trying to have an overview of the ecosystem.

An updated web

My October post included some basic visualizations from the query service of wikibase.world.

However, it’s time to get a little more fancy and interactive. (As well as showing all wikibases, not just the linked ones)


mwcli (a MediaWiki focused command line tool targeting developers) over the years

mwcli includes the third or so generation of “developer environments” that I have made for MediaWiki over the years. You can see the backstory in this earlier post.

Since the early days of 2022, there has been optional metric collection included within the mwcli tool.

This metric collection simply records which command you run and when you run it (without any parameters or inputs), so that the data can be aggregated and the usage of the various commands can be graphed.

Each command run includes something like:

Command | docker mediawiki exec
DateTime | 2025-01-07T12:45:18.213Z
Version | 0.25.1

I used to have live (ish) graphs in the Wikimedia Superset installation, however, the queries there appear to fail now. So I took some time to export the dataset as a CSV, and shove it around a bit in a Python notebook.
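As a rough idea of what “shoving it around” can look like, here is a minimal sketch that counts command runs per month using only the standard library. It assumes the CSV export has `Command`, `DateTime` and `Version` columns matching the record shown above; the real export may use different headers, and the sample rows (including the placeholder command name) are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up rows in the shape of the record above; headers are an assumption.
SAMPLE_CSV = """Command,DateTime,Version
docker mediawiki exec,2025-01-07T12:45:18.213Z,0.25.1
docker mediawiki exec,2025-01-09T08:01:02.000Z,0.25.1
placeholder other command,2025-02-01T10:00:00.000Z,0.25.1
"""


def usage_by_month(csv_text):
    """Count runs per (YYYY-MM, command) pair."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Swap the trailing "Z" for an explicit offset so that
        # fromisoformat also accepts it on Python versions before 3.11.
        when = datetime.fromisoformat(row["DateTime"].replace("Z", "+00:00"))
        counts[(when.strftime("%Y-%m"), row["Command"])] += 1
    return counts


monthly = usage_by_month(SAMPLE_CSV)
for (month, command), runs in sorted(monthly.items()):
    print(month, command, runs)
```

For a real export, the same function could be fed the file contents read from disk instead of `SAMPLE_CSV`.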


Wikidata user and project talk page connection graph

Talk pages are a pretty key part of how wikis have worked over the years. Realtime chat apps and services are probably changing this dynamic somewhat, but talk pages are still used, and most of their history is still recorded.

I started up an IPython Notebook to try and take a look at some of the connections between different users on Wikidata over the years. Below you’ll find a few representations of these connections, as well as notable things I spotted along the way, the generating code, SQL query and more!

The data

MediaWiki maintains links tables for all pages, so getting all of the current links out of Wikidata is very easy. I made use of the Wikimedia Cloud Quarry service to run this query and host a CSV of the results.

SELECT
  SUBSTRING_INDEX(page_title, '/', 1) AS t1,
  pl_from_namespace AS t1ns,
  SUBSTRING_INDEX(pl_title, '/', 1) AS t2,
  pl_namespace AS t2ns
FROM pagelinks, page
WHERE pl_namespace IN (3,5) AND pl_from_namespace IN (3,5)
AND page_id = pl_from AND page_title != pl_title
GROUP BY t1, t2

I then loaded this data directly into an IPython Notebook and did some cleaning, such as removing all IP addresses. I then spent quite some time applying more filtering and twiddling knobs to try and get some graphics out that are easy to look at. The first attempts looked like solid blobs as you can see in this tweet.
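The cleaning step can be sketched roughly as follows. This is not the code from the Notebook: the sample rows are made up, and counting links per page is only one plausible starting point for the later filtering, not necessarily the filter that was used. Rows follow the column order of the query above (`t1`, `t1ns`, `t2`, `t2ns`).

```python
import ipaddress
from collections import Counter


def is_ip(title):
    """True if a page title parses as an IPv4 or IPv6 address (an anonymous user)."""
    try:
        ipaddress.ip_address(title)
        return True
    except ValueError:
        return False


# Made-up rows in the query's column order: (t1, t1ns, t2, t2ns).
# Namespace 3 is user talk, namespace 5 is project talk.
rows = [
    ("Alice", 3, "Bob", 3),
    ("Bob", 3, "Alice", 3),
    ("192.0.2.17", 3, "Alice", 3),
    ("Alice", 3, "2001:DB8:0:0:0:0:0:1", 3),
    ("Project_chat", 5, "Bob", 3),
]

# Drop every link that starts or ends on an IP address talk page.
edges = [r for r in rows if not is_ip(r[0]) and not is_ip(r[2])]

# Count links per page, keyed by (namespace, title) so that a user and a
# project page sharing a title are not merged into one node.
degree = Counter()
for t1, t1ns, t2, t2ns in edges:
    degree[(t1ns, t1)] += 1
    degree[(t2ns, t2)] += 1

print(len(edges), degree.most_common(3))
```

A degree count like this gives one knob to turn when a graph looks like a solid blob, for example by dropping pages below some minimum number of links.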

You can find a copy of the Notebook on notebooksharing.space.
