Visualizing Wikibase connections, using wikibase.world

October 13, 2024 By addshore

Over the past week I have spent some time writing code to run a little bot on the wikibase.world project, aimed at expanding the number of Wikibases collected there, and at automating the collection of some data that can easily be automated.

In that time, the bot has imported 650 Wikibase installs, bringing the total to 784, of which 755 are marked as active.

I mainly wanted to do this to try and visualize “federation”, or rather the links that currently exist between Wikibases, hence creating P55 (links to Wikibase) and P56 (linked from Wikibase).

251 Wikibases seem to link to each other, and Wikidata is very clearly at the centre of that web.

Many Wikibases only link to Wikidata, but there are a few other notable clusters, including Wikimedia Commons (but see the improvements section below, as some of these may be false positives).

I’m not sure why the label for Q2 didn’t render, but Q2 is Wikimedia Commons in the image below.

Others, such as LexBib, the MaRDI portal, PersonalData.io, Librarybase, R74n and more, also seem to have multiple connections.

Here is a fairly nice SPARQL query that gets you these links, in their current state, as a table…

PREFIX wwdt: <https://wikibase.world/prop/direct/>
PREFIX wwd: <https://wikibase.world/entity/>

SELECT ?wikibase ?wikibaseLabel ?linksTo ?linksToLabel
WHERE {
    ?wikibase wwdt:P3 wwd:Q10.
    ?wikibase wwdt:P13 wwd:Q54.
    ?wikibase wwdt:P55 ?linksTo.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Runnable here: https://tinyurl.com/28dor4qe
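If you’d rather consume those results programmatically, the same query can be sent to the query service using the standard SPARQL protocol. Here is a minimal sketch in JavaScript (Node 18+ for the global fetch); the /query/sparql path is an assumption based on the usual layout of wikibase.cloud-hosted wikis:

// Fetch the links table as JSON from the wikibase.world query service.
// The /query/sparql path is an assumption based on wikibase.cloud defaults.
const endpoint = 'https://wikibase.world/query/sparql';

const query = `
PREFIX wwdt: <https://wikibase.world/prop/direct/>
PREFIX wwd: <https://wikibase.world/entity/>
SELECT ?wikibase ?wikibaseLabel ?linksTo ?linksToLabel
WHERE {
    ?wikibase wwdt:P3 wwd:Q10.
    ?wikibase wwdt:P13 wwd:Q54.
    ?wikibase wwdt:P55 ?linksTo.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}`;

const response = await fetch(endpoint + '?query=' + encodeURIComponent(query), {
    headers: { Accept: 'application/sparql-results+json' },
});
const { results } = await response.json();
for (const row of results.bindings) {
    console.log(row.wikibaseLabel.value, '->', row.linksToLabel.value);
}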

The scripts

Very briefly, there is a collection of scripts that import Wikibases found via a variety of methods (I’m open to new ideas if you have them).

  • wikibase.cloud: which exposes an API of all currently active installations (see the sketch after this list)
  • wikibase-metadata.toolforge.org: which has some data collected about usage of “Wikibase Suite” installs elsewhere
  • google: with some painfully long, crafted search terms that match the few things identifying a Wikibase that might get indexed.
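As a rough illustration of the first of these, the sketch below pages through a wikibase.cloud listing of wikis. To be clear, the /api/wiki endpoint and the response shape used here are assumptions for illustration, not a documented contract:

// Hypothetical sketch: page through a wikibase.cloud listing of wikis.
// The /api/wiki endpoint and its response shape are assumptions.
async function fetchCloudWikis() {
    const domains = [];
    let page = 1;
    while (true) {
        const res = await fetch('https://www.wikibase.cloud/api/wiki?page=' + page);
        const body = await res.json();
        domains.push(...body.data.map((wiki) => wiki.domain));
        if (!body.links?.next) break; // no more pages to fetch
        page += 1;
    }
    return domains;
}

console.log(await fetchCloudWikis());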

These scripts import a very bare-bones version of an Item, such as [1], [2], [3]…

Once the data is in wikibase.world, a separate process loads all currently active Wikibases, and tries to add and refine information.

  • Load the site and see if it responds with a 200
  • Try to normalize the URLs a bit if possible
  • Try to detect and record the host
  • Add an inception date, based on the first logged action by MediaWiki (see the sketch after this list)
  • Add entity types and tools used (sometimes)… (extensions to come soon?)
  • Add links to and from other Wikibases based on some External Identifiers, and all URL properties.
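The inception date step, for instance, falls out of a standard MediaWiki action API call: ask the log for its oldest entry and take its timestamp. A minimal sketch (the logevents parameters are standard MediaWiki; treating the oldest log entry as the inception date is my own heuristic, and the API URL below is just illustrative):

// Ask a MediaWiki action API for its oldest public log entry; its timestamp
// is a reasonable proxy for when the wiki (and so the Wikibase) was created.
async function fetchInceptionDate(actionApiUrl) {
    const params = new URLSearchParams({
        action: 'query',
        list: 'logevents',
        ledir: 'newer', // oldest entries first
        lelimit: '1',
        format: 'json',
    });
    const res = await fetch(actionApiUrl + '?' + params);
    const body = await res.json();
    return body.query.logevents[0]?.timestamp; // e.g. "2019-07-16T09:45:00Z"
}

// Illustrative URL; any MediaWiki action API endpoint works here.
console.log(await fetchInceptionDate('https://wikibase.world/w/api.php'));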

The code makes use of wikibase-edit and wikibase-sdk, written by maxlath. They were a pleasure to use, and they really simplify the Wikibase APIs down to the basics, which is all I needed here.
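For flavour, writing one of the P55 statements with wikibase-edit comes down to a couple of lines. A sketch with placeholder credentials and IDs (Q123 and Q456 are made up):

import wbEditFactory from 'wikibase-edit';

const wbEdit = wbEditFactory({
    instance: 'https://wikibase.world',
    credentials: { username: 'ExampleBot', password: 'example-password' }, // placeholders
});

// Add a “links to Wikibase” (P55) statement; Q123 and Q456 are placeholder IDs.
await wbEdit.claim.create({ id: 'Q123', property: 'P55', value: 'Q456' });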

Improvements

There are many other pieces of data that could be added, and that it would be nice to be able to filter by across all Wikibases, such as the number of entities, the number of users, the date of the first Wikibase edit, and so on. I plan on slowly trying to tackle these moving forward.

There are also possibly a few issues with the current process:

  • Not all External Identifier properties are currently inspected: only those that have a formatter URL property defined, and that also have that formatter URL exposed via WikibaseManifest (so the WikibaseManifest extension is a requirement too)
  • All URLs are inspected for known domains, and these may link to non-Wikibase and non-entity pages. A URL that just links to https://commons.wikimedia.org, for example, would currently appear as a link… (see the sketch below)
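A cheap guard against that second issue would be to only count URLs that actually resolve to an entity page, rather than just a known domain. A rough sketch of the kind of check I mean (the URL patterns are assumptions; concept URI and wiki-page shapes vary between installs):

// Heuristic: treat a URL as a Wikibase entity link only if it matches a
// known entity-page shape, not merely a known domain. Patterns are assumptions.
function looksLikeEntityLink(url) {
    const entityPatterns = [
        /\/entity\/[QPL]\d+/,              // concept URIs, e.g. /entity/Q42
        /\/wiki\/(Item|Property):[QP]\d+/, // installs using Item:/Property: namespaces
        /\/wiki\/Q\d+/,                    // Wikidata-style main-namespace items
    ];
    return entityPatterns.some((pattern) => pattern.test(url));
}

console.log(looksLikeEntityLink('https://commons.wikimedia.org'));     // false
console.log(looksLikeEntityLink('https://www.wikidata.org/wiki/Q42')); // true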

Currently I have just been running the scripts locally, but I’ll aim to set them up on GitHub Actions so that they run on a schedule, perhaps weekly.

And let’s pretend that I wrote the code in a nice tidy way, haha, naaah

That will come (if this all still seems like a good idea).