Wikimedia Hackathon Northwestern Europe 2026

Historically I’m terrible at post-hackathon write-ups, though a few do exist… (#hackathon posts). For the past few days I have been attending the Wikimedia Hackathon Northwestern Europe 2026 in Arnhem, NL, with around 70 other people. Around 42 projects were shown at the showcase, and I want to briefly look at some of those, and also document some of the other things that were going on in my vicinity.

On the whole, this was a great hackathon: larger than the last NL-organized hackathon, with a beautiful venue, good organization, good food, good people, lots of conversation, and, for me at least, everything was very convenient.

Goings on

SavannahHQ

Ahead of the Hackathon, Siebrand had the idea of people being able to monitor the impact of events, specifically hackathons, on growth and retention in the various technical spaces that Wikimedia has.

This reminded Ollie and me of a talk we heard at OggCamp a few years earlier about a product called SavannahHQ, an open source project (with a paid SaaS service too) for giving “you the insights you need to better understand, nurture, and grow your community.”

This sounded like it roughly aligned, and we spent some hours trying it out and importing some basic RSS and GitHub data, including making a bunch of patches on a fork with some fixes and improvements to make a free and open / non-billable product easier to use.

This product might be neat for smaller communities with fewer possible data sources, but we eventually decided we might take a different approach to try and figure out some more numbers (closer to the raw data).

In the images you can see the “activity” that was imported from various sources, along with the actors that were detected. You could then map actors in different places together, add events, and track the change in community activity and size around those events.

A good experiment, and some things learnt, but I’ll be trying a different approach next…

Wikibase query prefixes

Firstly… What are query prefixes…

Prefixes are shorthand aliases for long resource URIs that allow you to write more concise and readable SPARQL queries.

Instead of http://www.wikidata.org/entity/Q12165555

You might be able to just write… wd:Q12165555

These are user-definable as part of your query, but SPARQL services sometimes also have defaults provided.
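To make that concrete, here is a minimal sketch of the same lookup written with and without prefix declarations. It happens to use Python and the SPARQLWrapper library against the public Wikidata Query Service purely as an example setup; the label predicate and LIMIT are just illustrative.

```python
# Minimal sketch: the same entity lookup with and without PREFIX declarations.
# Endpoint, library choice and the exact query shape are just for illustration.
from SPARQLWrapper import SPARQLWrapper, JSON

# With prefix declarations, the long resource URIs shrink to wd:Q12165555 etc.
query_with_prefixes = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  wd:Q12165555 rdfs:label ?label .
} LIMIT 5
"""

# Without them, every resource has to be written out in full.
query_without_prefixes = """
SELECT ?label WHERE {
  <http://www.wikidata.org/entity/Q12165555>
    <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""

sparql = SPARQLWrapper(
    "https://query.wikidata.org/sparql",
    agent="prefix-example/0.1 (blog post sketch)",
)
sparql.setReturnFormat(JSON)

for query in (query_with_prefixes, query_without_prefixes):
    sparql.setQuery(query)
    results = sparql.query().convert()
    print([b["label"]["value"] for b in results["results"]["bindings"]])
```

Both queries return the same results; the prefixed one is just far easier to read and write.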

At the start of the Friday, a few of us spent some time looking through and thinking about the current state of Wikidata and Wikibase query service prefixes (T419953 Discuss and Document how to handle SPARQL Prefixes across Wikibases), with hopefully a few decisions made, and one already written up as an actionable task (T419994 Add a `wb` SPARQL prefix-prefix automatically to refer to the current wikibase).

Right now, many prefixes like wdt: and p: are hardcoded for Wikidata, or perhaps have been overridden to point to a local installation, which makes it hard for people using Wikibase Cloud or other private installs because their tools and autocomplete often point to the wrong data. The inconsistencies also lead to a pain point for tool developers, where prefixes can’t be used reliably (even when full URIs could be).

We proposed a standard “wb” prefix-prefix that always refers to the local Wikibase. This should make it much easier for users to write queries without needing to type out full, long URIs, and it will keep things consistent across all the different Wikibase sites. wd:-style prefixes would be reserved for Wikidata URIs, and again the goal would be to make this consistent on all installs. On top of that, an additional set of fully customizable prefixes could be allowed if a particular installation wanted to set them up.

A few other observations were made:

  • The WikibaseManifest extension exposes something that looks like prefixes but is not: its keys are a standard way of looking up the URI parts, but the keys are not themselves prefixes.
  • wd: as a prefix is already used to mean multiple things in this space, but we all agreed it should really be reserved for Wikidata.
  • Other conversations throughout the Hackathon came back around to highlight the importance of discoverable URIs or known prefixes to enable tool developers to make tools work for all wikibases more easily.

The work here is not done yet, but the path is clear, and I hope WMDE will try to action it in the not-too-distant future.

It was suggested that I try to link things like this to current Wikibase / Wikidata / WMDE goals in order to increase the likelihood they will get done. It doesn’t look like this fits within any of the Q1 plans, though looking at the plan, it’s likely covered by these parts…

  • “The distributed Wikibase ecosystem is more sustainable because of […] increased feature parity, and interoperability”
  • “The ability to federate knowledge across instances has improved”
  • “We have made Wikibase self-hosting operations more accessible, robust and easier to manage”

Integraality

Related to the above, I spent some time talking with Jean-Fred about Integraality and specifically “T294892 integraality for Wikibases?”, which again touches on things such as the default query prefixes above.

However, one of the main things that has actually been documented now (rather than just discussed) is T420096 Universal proxy authentication for any tool to edit any Wikibase, which could make life easier, but could also be a terrible idea… (think Widar from Magnus’ tools, but for all tools and all Wikibases?)…

N tools interacting with M independent Wikibases results in N × M manual configurations, OR each Wikibase needing to deploy its own version of each tool, leading to less control for tool authors and more work for Wikibase creators…

There would be a lot of downsides to something like this… loss of auditability in terms of which “tool” or consumer actually caused an action; the proxy would be a large and growing pile of security-related data, and also a single point of failure.

Probably something like https://www.rfc-editor.org/rfc/rfc7591 (OAuth 2.0 Dynamic Client Registration Protocol) would be a better idea in this space. Independent Wikibases can be configured to trust a central Identity Provider (IdP). This could be Wikimedia’s CentralAuth, GitHub, ORCID, or some other known provider. A user goes to a new Wikibase -> clicks “Log In” -> is redirected to the central IdP -> logs in -> is redirected back. The Wikibase automatically provisions a local user account mapped to that global identity. The user never creates a new password. This is essentially T383142 Enable Wikimedia login on Wikibase.cloud sites, but for all Wikibases. It would likely make use of PluggableAuth and perhaps WSOAuth, which can be configured to authenticate users with a Wikimedia login.
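For a sense of what RFC 7591 dynamic client registration looks like on the wire, here is a rough Python sketch. The registration endpoint URL, tool name, redirect URI and scope values are all hypothetical, since as far as I know no Wikibase exposes such an endpoint today.

```python
# Rough sketch of RFC 7591 dynamic client registration: a tool registers itself
# with a Wikibase's (hypothetical) registration endpoint at runtime, instead of
# every tool author manually configuring OAuth consumers on every Wikibase.
# The endpoint URL and all metadata values below are illustrative only.
import requests

REGISTRATION_ENDPOINT = "https://some.wikibase.example/oauth2/register"  # assumption

client_metadata = {
    "client_name": "My Wikibase editing tool",
    "redirect_uris": ["https://my-tool.toolforge.org/oauth/callback"],
    "grant_types": ["authorization_code"],
    "response_types": ["code"],
    "token_endpoint_auth_method": "client_secret_basic",
    "scope": "editpage createeditmovepage",  # scope names would vary per install
}

resp = requests.post(REGISTRATION_ENDPOINT, json=client_metadata, timeout=30)
resp.raise_for_status()
registration = resp.json()

# The response gives the tool its credentials for the normal OAuth 2.0
# authorization-code flow against this particular Wikibase.
print(registration["client_id"])
print(registration.get("client_secret"))
```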

Wikibase community telegram groups

There were a fair few discussions going on about Wikibase at the hackathon, and at some point during all of those the vast array of Telegram groups came up.

I believe one of the oldest groups is the Wikibase community Telegram group, currently with 371 members. There are then separate channels for things such as wikibase.cloud (226 members), plus broadcast channels and such, and finally the newest channel on the block, Wikibase Suite, with 47 members.

The Wikibase Suite channel has some nice structure to it, with sub-channels for various topics such as configuration; however, the Wikibase community channel is much larger and has much wider reach. Personally, I still think there’s confusion around why this “Wikibase Suite” term has been segregated from just Wikibase. And generally we felt that the community would benefit from a single channel for discussing Wikibase installations, with the structure of the Suite channel but the involvement and spread of the main community channel.

It looks like this is primarily something for WMDE to consider, as they have most of the owner and admin rights across both of these channels.

Developer activity and retention

After deciding that SavannahHQ was probably not the tool I wanted to use for the job, we just started scraping some data, primarily from Phabricator via https://wikimedia.biterg.io/, and also constructing a git log of all Wikimedia-related repositories that I could find across Gerrit, GitLab and GitHub… The initial scrape and clone took some time, and eventually we started to get towards having data from each source that included:

  • Source: Where has the data / activity event come from [e.g. phabricator]
  • Type: What was the event type [e.g. task-create]
  • Actor: Who or what triggered the event, such as a username [e.g. Addshore]
  • Identity: The unique identity, which when combined with the source and type could be used for deduplication, and for looking up the thing again [e.g. T12345]
  • Timestamp: The time the event occurred

So for the Phabricator scrape via Bitergia, the entry might look something like {"source":"phabricator","timestamp":"2020-04-28T10:20:25+00:00","type":"task/create","actor":"Addshore","identity":"T251244"}

And for a gitlab repository, perhaps something like {"source":"git/gitlab/addshore/backstage","timestamp":"2021-10-18T09:56:38-04:00","type":"commit","actor":{"name":"Addshore","email":"addshore@example.org"},"identity":"c3699d5bd3141bc8dbe688419169f352d1502c9f","metadata":{"subject":"foo bar test"}}
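As a rough illustration of how the git side of this is shaped (a sketch of the idea, not the exact script used at the hackathon), one event record per commit can be pulled out of a cloned repository like so:

```python
# Sketch: turn a cloned repository's history into normalised event records of
# the shape described above. The "source" value follows the example entries.
import json
import subprocess

def git_events(repo_path, source):
    """Yield one event record per commit in the repository at repo_path."""
    # %H = hash, %aI = author date (ISO 8601), %an/%ae = author name/email, %s = subject
    fmt = "%H%x1f%aI%x1f%an%x1f%ae%x1f%s"
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{fmt}"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in log.splitlines():
        sha, timestamp, name, email, subject = line.split("\x1f")
        yield {
            "source": source,
            "timestamp": timestamp,
            "type": "commit",
            "actor": {"name": name, "email": email},
            "identity": sha,
            "metadata": {"subject": subject},
        }

for event in git_events(".", "git/gitlab/addshore/backstage"):
    print(json.dumps(event))
```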

I didn’t manage to collect all of this data during the hackathon, but spat out some graphs to present during the showcase anyway with an indication of the sort of insights this might be able to show you…

So NOTE: the below graphs are NOT COMPLETE, so really you should totally ignore them until I get to look at them for another round…

I think this is certainly an area worth continuing to explore, but in order to get to any meaningful point, more raw data is needed and more refining needs to happen:

  • More sources (try not to be scared):
    • RSS / blogs
    • Chat logs and activity (IRC, telegram)
    • On wiki edits of JS, CSS, Templates and Modules and the relevant talk pages / docs
    • A more complete list of git repositories
    • More phabricator activity (comments would likely be the next most relevant)
    • GitLab MRs & Issues
    • Gerrit code review
    • SAL (Server admin log)
    • Mailing list posts

And I am sure people could come up with more.
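As one example of how a further source could slot into the same event format, a blog’s RSS feed could be turned into records along these lines (the feed URL and the "blog/post" type value are just my own placeholders):

```python
# Sketch: fold an RSS/blog feed into the same event format as the git and
# Phabricator records above, using the feedparser library.
from datetime import datetime, timezone
import json
import feedparser

def rss_events(feed_url, source):
    """Yield one event record per post in the feed."""
    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        parsed = entry.get("published_parsed")
        if parsed is None:
            continue  # skip entries without a usable timestamp
        when = datetime(*parsed[:6], tzinfo=timezone.utc)
        yield {
            "source": source,
            "timestamp": when.isoformat(),
            "type": "blog/post",
            "actor": entry.get("author", "unknown"),
            "identity": entry.get("id") or entry.get("link", ""),
            "metadata": {"title": entry.get("title", "")},
        }

for event in rss_events("https://addshore.com/feed/", "rss/addshore.com"):
    print(json.dumps(event))
```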

Already there are additional things worth considering:

  • Ignoring some repos, such as “kubernetes”, which is forked into the WMF Gerrit
  • Flagging bots and automation actors throughout the above
  • Some attempt at deduplicating / connecting the same actors across the various places where possible (a rough sketch of one approach follows below)
  • Automation, automation, automation…
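
On the actor deduplication point, a very rough sketch of one approach: a hand-maintained mapping from (source kind, raw actor) pairs to a canonical name. The mapping entries here are made up purely for illustration.

```python
# Sketch: connect the same actor across sources via a hand-maintained mapping
# from (source kind, raw actor) to one canonical name. Entries are made up.
CANONICAL = {
    ("phabricator", "Addshore"): "addshore",
    ("git", "addshore@example.org"): "addshore",  # git identities keyed by email
    ("gerrit", "addshore"): "addshore",
}

def canonical_actor(event):
    """Return the canonical actor for an event record, or the raw value if unmapped."""
    if isinstance(event["actor"], dict):              # git-style actor objects
        key = ("git", event["actor"]["email"].lower())
        fallback = event["actor"]["name"]
    else:                                             # plain username strings
        key = (event["source"].split("/")[0], event["actor"])
        fallback = event["actor"]
    return CANONICAL.get(key, fallback)

events = [
    {"source": "phabricator", "actor": "Addshore"},
    {"source": "git/gitlab/addshore/backstage",
     "actor": {"name": "Addshore", "email": "addshore@example.org"}},
]
print({canonical_actor(e) for e in events})  # -> {'addshore'}
```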

Noteworthy in this space is https://techcontibs.toolforge.org, a very cool tool for visualizing your individual technical contributions across multiple platforms.

There is also https://wikimedia.biterg.io, which contains high-level activity trend data for some platforms, to some level, without too much further analysis.

And https://strategy.wikimedia.org/wiki/Editor_Trends_Study/Results, which exists for Wikimedia editor trends, but not for the technical community.

I hope to be looking back at this soon…

Showcase

You can find the full showcase listing on wiki, but here are a few bits that I particularly want to remember.

Telegram commons uploader

Siebrand and Maarten managed to whip together a Wikimedia Commons Telegram uploader bot (which you can already use).

This really lowers the barrier to entry for image uploads if you are out and about. You no longer have to use a dedicated app such as the Commons Android app, or the web-based browser upload flows; you can just send your image (as a file) to the bot, answer some questions, and it will appear on Commons!

You can read more about it on wiki, find the bot on Telegram, and see all images that have been uploaded by the bot so far in the dedicated category.

Wikimedia Developer Starter Kit

Very much a starting page at https://meta.wikimedia.org/wiki/User:Eugene233/NewDevKit, but also a lovely idea.

Maybe at some point we will have a single point of reference that we all agree on and like to link newcomers to across the board.

Wiki as Git!

Ever wanted to know who to blame for a specific part of an article? Now you can.

This app takes your request and drags the history into git, hosted on GitHub, for you to visualize and run a blame on.

https://wiki-as-git.netlify.app/en.wikipedia.org/Brazil%20at%20the%202026%20Winter%20Paralympics

wiki-as-git has existed for many years; the concept of seeing the Git history just by browsing a URL is what was hacked on at this hackathon.
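For a sense of how the underlying idea works (my own simplified sketch, not the actual wiki-as-git code), the revision history can be pulled from the MediaWiki API and replayed as git commits, after which `git blame` points at wiki editors:

```python
# Sketch: replay a wiki page's revision history as git commits so that
# `git blame` attributes each line to the wiki editor who wrote it.
import subprocess
import requests

API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki API endpoint should work
TITLE = "Example"                           # hypothetical page title

def fetch_revisions(title):
    """Yield (timestamp, user, comment, wikitext) oldest-first."""
    params = {
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "timestamp|user|comment|content", "rvslots": "main",
        "rvlimit": "max", "rvdir": "newer", "format": "json", "formatversion": "2",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for rev in data["query"]["pages"][0].get("revisions", []):
            yield (rev["timestamp"], rev["user"], rev.get("comment", ""),
                   rev["slots"]["main"]["content"])
        if "continue" not in data:
            break
        params.update(data["continue"])

subprocess.run(["git", "init", "repo"], check=True)
for timestamp, user, comment, text in fetch_revisions(TITLE):
    with open("repo/article.wiki", "w", encoding="utf-8") as f:
        f.write(text)
    subprocess.run(["git", "-C", "repo", "add", "article.wiki"], check=True)
    subprocess.run(
        ["git", "-c", "user.name=wiki-as-git-sketch", "-c", "user.email=sketch@example.org",
         "-C", "repo", "commit", "--allow-empty", "-m", comment or "(no edit summary)",
         "--date", timestamp, "--author", f"{user} <{user}@wiki.invalid>"],
        check=True,
    )
# Then: git -C repo blame article.wiki
```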

Solving the hackathon logo puzzle 😎

The logo included at the top of this post, and throughout the other pages relating to the Hackathon, had a secret code in it. And one evening a bunch of people got together and figured it out (spoiler: it was a rick roll).

The most complete working can be found at https://gist.github.com/Krinkle/ace3f2023a250ff387d432bdb5c22c83, with a link right at the end showing you the result (https://people.wikimedia.org/~krinkle/wmhack2026-puzzle/13-workspace.html).

And if you want an interactive tool to try and figure it out yourself (with a fair bit of help already in there), see https://simon04.github.io/Wikimedia-Hackathon-Northwestern-Europe-2026/
