Faster munging for the Wikidata Query Service using Hadoop
The Wikidata query service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF…
How can I get data on all the dams in the world? Use Wikidata
During my first week at Newspeak House, while I was explaining Wikidata and Wikibase to some folks on the terrace, the topic of dams came up as we discussed an old project that someone had worked on. Back in the day, collecting information about dams would have been quite an effort, compiling a bunch of different data from…
Creating new Wikidata items with OpenRefine and Quickstatements
Following on from my blog post on using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the tors on Dartmoor. This post assumes you already have some knowledge of Wikidata and Quickstatements, and have OpenRefine set up. Note: If you are having problems with the reconciliation service it might be worth…
Using OpenRefine with Wikidata for the first time
I have long known about OpenRefine (previously Google Refine), a tool for working with, manipulating and cleaning data. As of version 3.0 (May 2018), OpenRefine has included a Wikidata extension, allowing for extra reconciliation and also editing of Wikidata directly (as far as I understand it). You can find some documentation on this…
Minecraft Java mod using Bukkit / Spigot
I have owned Minecraft Java for several years, but despite being a software developer, I have never looked into creating a mod, until now! This is certainly a different topic compared with my regular blog posts, but as always, I hope it will help someone somewhere. I stumbled upon a video by one of the…
mediawiki-docker-dev v1 rewrite
Back in 2017 at the Wikimedia Hackathon, I played around with Docker and docker-compose in relation to MediaWiki and testing with multiple setups at once while developing, meaning multiple PHP versions, web servers and databases. My original slides can still be found here. Since then, mediawiki-docker-dev has evolved into less of a testing system and more…
Adding git bash to Windows terminal
I just saw a tweet saying that Windows Terminal is now generally available, so I had to give it a try. After downloading it from the store and booting it up, I realized that only PowerShell, cmd and WSL are listed by default (and also Azure, which I don’t really care about). Clicking around the UI a…
Reducing Java JVM memory usage in Containers and on Kubernetes
For a while I have been running a Wikibase query service update script for WBStack; the script is a Java application running on a Kubernetes cluster. Part of that journey has included the updater using all available memory, hitting the Kubernetes memory limit and being OOM killed. The title of the post is a little verbose,…
WBStack 2020 Update 2 (May)
WBStack is now in its 7th month, with 76 user accounts who have created 226 MediaWiki sites running Wikibase, of which 145 are currently online (81 deleted sites). 295,000 edits have now been made in total, an increase of 95,000 in the last month, which roughly equates to 2 edits a minute for…
Wikidata Map May – November 2019
It’s time for another blog post in my Wikidata map series, this time comparing the item maps that were generated on the 13th May 2019 and 11th November 2019 (roughly 6 months). I’ll again be using Resemble.js to generate a difference image highlighting changed areas in pink, and break down the areas that have had the…