Profiling a Wikibase item creation on test.wikidata.org

Today I was in a Wikibase Stakeholder group call, and one of the discussions was around Wikibase importing speed, data loading, and the APIs. My previous blog post covering what happens when you make a new Wikibase item was raised, and we also got onto the topic of profiling.

So here comes another post looking at some of the internals of Wikibase, through the lens of profiling on test.wikidata.org.

The tools used to write this blog post for Wikimedia infrastructure are both open source, and also public. You can do similar profiling on both your own Wikibase, or for your requests that you suspect are slow on Wikimedia sites such as Wikidata.

Wikimedia Profiling

Profiling of Wikimedia sites is managed and maintained by the Wikimedia performance team. They have a blog, and one of the most recent posts was actually covering profiling PHP at scale in production, so if you want to know the details of how this is achieved give it a read.

Throughout this post I will be looking at data collected from a production Wikimedia request, by setting the X-Wikimedia-Debug header in my request. This header has a few options, and you can find the docs on wikitech.wikimedia.org. There are also browser extensions available to easily set this header on your requests.

I will be using the Wikimedia hosted XHGui to visualize the profile data. Wikimedia specific documentation for this interface also exists on wikitech.wikimedia.org. This interface contains a random set of profiled requests, as well as any requests that were specifically requested to be profiled.

Profiling PHP & MediaWiki

If you want to profile your own MediaWiki or Wikibase install, or PHP in general, then you should take a look at the mediawiki.org documentation page for this. You’ll likely want to use either Tideways or XDebug, but probably want to avoid having to setup any extra UI to visualize the data.

This profiling only covered the main PHP application (MediaWiki & Wikibase extension). Other services such as the query service would require separate profiling.

Read more

Laravel 8 with Wikimedia OAuth login

I recently wrote a little app called wikicrowd (blog post to follow) using Laravel and MediaWiki / Wikimedia authentication. It certainly wasn’t entirely out of the box, and the existing docs still need some tweaking.

This post reflects the steps I went through to set this app up, and it should only take a few minutes.

You can find a tag of the code at the end of this walkthrough on Github for PHP 8. (There is also a tag for PHP 7.4)

Shout out to the developers that worked on the Wikidata Mismatch Finder which is also a Laravel app with MediaWiki OAuth and was used as inspiration when writing this post, along with the documentation for the package used by Taavi.

Setup Laravel

First off I need a Laravel installation. Currently 8.x is the stream of the latest versions, and the installation docs say to run the below command.

curl -s https://laravel.build/demo-laravel-mediawiki-auth | bashCode language: JavaScript (javascript)

I’m not a fan of running random code on the internet on my machine, but this is what the docs say. It creates a directory at the location you specify at the end of the URL, in my case demo-laravel-mediawiki-auth , creates a laravel/laravel project, and does a composer install.

Read more

SMWCon 2021, Development environments using containers

SMWCon 2021 is happening as I write this post. I was invited to give a short talk as part of a MediaWiki and Docker workshop organized by Cindy Cicalese on day 2. As I am writing a month of blog posts I’m going to turn my slides into a more digestible and searchable online blog post.

The original slides can still be found on Google Slides, and when the conference recording is up you should find it on the associated event page.

Disclaimer

Read more

Tech Lead Digest – Q3/4 2021

This entry is part 5 of 5 in the series Tech Lead Digest (wmde)

It’s time for the 5th instalment of my tech lead digest posts. I switched to monthly for 2 months, but decided to back down to quarterlyish. You can find the other digests by checking out the series.

🧑‍🤝‍🧑Wikidata & Wikibase

The biggest event of note in the past months was WikidataCon 2021 which took place toward the end of October 2021. Spread over 3 days the event celebrated Wikidatas 9th birthday. We are still awaiting the report from the event to know how many folks participated, and recordings of talks will likely not be available until early 2022. At which point I’ll try to write another blog post.

Just before WikidataCon the updated strategy for Linked Open Data was published by Wikimedia Deutschland which includes sub-strategies for Wikidata and the Wikibase Ecosystem. This strategy is much easier to digest than the strategy papers published in 2019 and I highly recommend the read. Part of the Wikidata strategy talks about “sharing workload” which reminds me of some thoughts I recently had comparing Wikipedia and Wikidata editing. Wikibase has a focus on Ecosystem enablement, which I am looking forward to working on.

The Wikibase stakeholder group continues to grow and organize. A Twitter account (@wbstakeholders) now exists tweeting relevant updates. Now with over 14 organizational members and 15 individual members, the budget is now public and the group is working on getting some desired features implemented. If you are an organization or individual working in the Wikibase space, be sure to check them out! The group recently published a prioritized list of institutional requirements, and I’m happy to say that some parts of the “Automatic maintenance processes and updating cascades should work out of the box” area that scored 4 have already been tackled by the Wikidata / Wikibase teams.

Read more

mediawiki-docker-dev in mwcli

The original mediawiki-docker-dev environment was created by accident and without much design back in 2017.

In 2020 I started working on a new branch with some intentional design and quite liked the direction.

And now finally, the all of the mediawiki-docker-dev functionality exists in a new home, with more intentional design, tests, stability, releases and more.

I’ve already written a brief history of the tool in a previous post so now I’ll focus on what mediawiki-docker-dev looks like in the mwcli environment for the current version 0.8.0.

Read more

addwiki php libraries 3.0.0

Back in 2014 I wrote a small collection of PHP libraries, releasing 2.0.0 of the base library in 2015 for interacting with MediaWiki and Wikibase. My goal back then was to create a stable base that PHP bot frameworks could be built on, while also experimenting with some framework like features in surrounding libraries.

And now, version 3.0.0 has been released, with a couple of new features, such as OAuth authentication, and lots of refactoring to make the libraries easier to work with and contribute to.

Read more

What happens in Wikibase when you make a new Item?

A recent Wikibase email list post on the topic of Wikibase and bulk imports caused me to write up a mostly human readable version of what happens, in what order, and when, for Wikibase action API edits, for the specific case of item creation.

There are a fair few areas that could be improved and optimized for a bulk import use case in the existing APIs and code. Some of which are actively being worked on today (T285987). Some of which are on the roadmap, such as the new REST APIs for Wikibase. And others which are out there, waiting to be considered.

This post is is written looking at Wikibase and MediaWiki 1.36 with links to Github for code references. Same areas may be glossed over or even slightly inaccurate, so take everything here with a pinch of salt.

Reach out to me on Twitter if you have questions or fancy another deep dive.

Read more

WBStack Infrastructure

This entry is part 7 of 12 in the series WBStack

WBStack is a platform allowing shared scalable hosting of Wikibase and surrounding services.

A year ago I made an initial post covering the state of WBStack infrastructure. Since then some things have changed, and I have also had more time to create a clear diagram. So it is time for the 2021 edition.

WBStack currently runs on a single Google Cloud Kubernetes Engine cluster, now made up of 3 virtual machines, one e2-standard-2 and two e2-medium. This results in roughly 4 vCPUs and 12GB of allocatable memory.

Read more

mediawiki-docker-dev v1 rewrite

Back in 2017 at the Wikimedia Hackathon, I played around with Docker and docker-compose in relation to MediaWiki and testing with multiple setups at once while developing, meaning multiple PHP versions, web servers and databases. My original slides can still be found here.

Since then mediawiki-docker-dev evolved into less of a testing system and more of a development environment, allowing the use of a master replica DB setup, easily swappable PHP versions, debugging and more. The project on GitHub currently has 40 stars, 38 forks and has seen 17 people contributing back.

Over the past couple of years, developer productivity and development environments have been a big discussion area. The Wikimedia technical conference in 2019 had the main topic of Developer Productivity. There have also been a few efforts in a few directions trying to figure out what is best for the majority of people. These include local-charts (Kubernetes based environment) and MediaWiki-Docker (simple docker-compose based environment).

Read more

WBStack 2020 Update 1

This entry is part 4 of 12 in the series WBStack

WBStack has now been up and running for 6 months. During that time it has helped 70 people create 178 MediaWiki installs running Wikibase, a SPARQL query service and quickstatements, all at the click of a button, with a total of around 200,000 edits across all sites.

The most active site is currently virus-taxonomy.wiki.opencura.com which was developed during the Virtual Biohackathon on COVID-19 as a staging environment for “improving the taxonomy of viruses on Wikidata”. It currently stands at 20,000 edits, around 7000 Items.

Screenshot of the virus-taxonomy Wikibase Main Page, 19 April 2020

Thanks again to Rhizome, who run their very own Wikibase, for their support paying the Google Cloud bill in the early stages of this project.

Read more