Faster munging for the Wikidata Query Service using Hadoop
The Wikidata Query Service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post, I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step. This performs some pre-processing on the RDF…
How can I get data on all the dams in the world? Use Wikidata
During my first week at Newspeak House, while I was explaining Wikidata and Wikibase to some folks on the terrace, the topic of dams came up in a discussion of an old project that someone had worked on. Back in the day, collecting information about dams would have been quite an effort, compiling a bunch of different data from…
Creating new Wikidata items with OpenRefine and Quickstatements
Following on from my blog post on using OpenRefine for the first time, I continued my journey to fill Wikidata with all of the Tors on Dartmoor. This post assumes you already have some knowledge of Wikidata and Quickstatements, and have OpenRefine set up. Note: If you are having problems with the reconciliation service, it might be worth…
Using OpenRefine with Wikidata for the first time
I have long known about OpenRefine (previously Google Refine), a tool for working with, manipulating, and cleaning data. As of version 3.0 (May 2018), OpenRefine has included a Wikidata extension, allowing for extra reconciliation and also editing of Wikidata directly (as far as I understand it). You can find some documentation on this…
Wikidata Map May – November 2019
It’s time for another blog post in my Wikidata map series, this time comparing the item maps that were generated on the 13th May 2019 and 11th November 2019 (roughly 6 months apart). I’ll again be using Resemble.js to generate a difference image highlighting changed areas in pink, and break down the areas that have had the…
Covid-19 Wikipedia pageviews, a first look
World events often have a dramatic impact on online services. A past example would be the death of Michael Jackson, which, according to the BBC, brought down Twitter and Wikipedia and made Google believe that they were under attack. Events like the COVID-19 (Coronavirus) pandemic have a less instantaneous effect, but trends can still be seen…
Your own Wikidata Query Service, with no limits
The Wikidata Query Service allows anyone to use SPARQL to query the continuously evolving data contained within the Wikidata project, currently standing at nearly 65 million data items (concepts) and over 7000 properties, which translates to roughly 8.4 billion triples. You can find a great write-up introducing SPARQL, Wikidata, the query service and what…
Wikidata Map July 2019
It’s been another 9 months since my last blog post covering the Wikidata-generated geolocation maps that I have been tending to for a few years now. Writing this from a hammock, let’s see what has noticeably changed in the last 9 months using a visual diff and my pretty reasonable eyes.
Wikidata Architecture Overview (diagrams)
Over the years, diagrams have appeared in a variety of forms covering various areas of the architecture of Wikidata. Now, as the current tech lead for Wikidata, it is my turn. Wikidata has slowly become a more and more complex system, including multiple extensions, services and storage backends. Those of us that work with it…
Wikidata is 6
It was Wikidata’s 6th birthday on the 30th of October 2018. WMUK celebrated this with a meetup on the 7th of November. They also made this great post-event video.