I searched around for a while, looking at various lists of tors on Dartmoor. Slowly, from a variety of sources, I compiled a list in a Google Sheet that seemed to be quite complete. This list included some initial names and rough OS grid references (P613).
In order to load the data into OpenRefine I exported the sheet as a CSV and dragged it into OpenRefine using the same process as detailed in my previous post.
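Before importing, it can help to check the exported CSV for rows with missing values. The following is a minimal sketch, not part of the original workflow: the column names "name" and "grid_ref" and the sample rows are made-up placeholders, and the real sheet's headers may differ.

```python
import csv
import io
from typing import List, Sequence


def rows_missing_values(csv_text: str, required: Sequence[str]) -> List[int]:
    """Return the CSV line numbers of rows where a required column is empty.

    Line numbers count the header as line 1 and assume no field
    contains an embedded newline.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for line_no, row in enumerate(reader, start=2):
        # row.get() may return None for short rows, so fall back to "".
        if any(not (row.get(col) or "").strip() for col in required):
            missing.append(line_no)
    return missing


# Placeholder data only; these are not real tors or grid references.
sample = "name,grid_ref\nExample Tor A,SX000000\nExample Tor B,\n"
print(rows_missing_values(sample, ["name", "grid_ref"]))
```

Rows flagged this way can be fixed in the sheet before export, or left for cleaning inside OpenRefine.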
I have long known about OpenRefine (previously Google Refine), which is a tool for working with data, manipulating and cleaning it. Since version 3.0 (May 2018), OpenRefine has included a Wikidata extension, allowing for extra reconciliation and also editing of Wikidata directly (as far as I understand it). You can find some documentation on this topic on Wikidata itself.
This post serves as a summary of my initial experiences with OpenRefine, including some very basic reconciliation from a Wikidata Query Service SPARQL query, and making edits on Wikidata.
In order to follow along you should already know a little about what Wikidata is.
I tried out OpenRefine in two different setups, on my actual machine and in a VM, both of which were easy to set up following the installation docs. For the VM I also had to use the -i option to make the service listen on a different IP: refine -i 172.23.111.140
It’s time for another blog post in my Wikidata map series, this time comparing the item maps that were generated on the 13th May 2019 and 11th November 2019 (roughly 6 months apart). I’ll again be using Resemble.js to generate a difference image highlighting changed areas in pink, and break down the areas that have had the greatest change throughout the 6 month period. The full comparison image can be found here.
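The idea behind such a difference image can be sketched in a few lines of Python. This is not Resemble.js or its algorithm, just a naive per-pixel comparison under stated assumptions: images are nested lists of RGB tuples of equal size, any difference at all counts as a change, and the exact pink value used for highlighting is my choice.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Assumed highlight colour; the exact shade of pink is illustrative.
PINK: Pixel = (255, 0, 255)


def diff_image(old: Image, new: Image, highlight: Pixel = PINK) -> Tuple[Image, float]:
    """Return (difference image, percentage of pixels that changed).

    Unchanged pixels are copied from the new image; changed pixels
    are replaced with the highlight colour.
    """
    if len(old) != len(new) or any(len(a) != len(b) for a, b in zip(old, new)):
        raise ValueError("images must have the same dimensions")
    out: Image = []
    changed = 0
    total = 0
    for old_row, new_row in zip(old, new):
        out_row = []
        for old_px, new_px in zip(old_row, new_row):
            total += 1
            if old_px != new_px:
                changed += 1
                out_row.append(highlight)
            else:
                out_row.append(new_px)
        out.append(out_row)
    percent = 100.0 * changed / total if total else 0.0
    return out, percent
```

For two 2x2 images that differ in a single pixel, this marks that pixel pink and reports 25% changed. A real tool would typically also tolerate small colour differences rather than requiring exact equality.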
If you don’t know what Wikidata is, or what items are, then give this page a read. This map shows all items that have a “coordinate location” as a light pixel on a black canvas. The more items with coordinates in a single pixel, the brighter that pixel. This map is generated using code that can be found here.
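The pixel-per-item idea described above can be sketched roughly as follows. This is not the actual map code: it assumes a simple equirectangular projection (longitude to x, latitude to y) and a linear brightness step per item, both of which are illustrative choices of mine.

```python
from typing import Iterable, List, Tuple


def density_grid(coords: Iterable[Tuple[float, float]],
                 width: int, height: int) -> List[List[int]]:
    """Count how many (lat, lon) pairs fall into each pixel of a canvas."""
    grid = [[0] * width for _ in range(height)]
    for lat, lon in coords:
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # skip coordinates outside the valid range
        # Map longitude -180..180 to x 0..width, latitude 90..-90 to y 0..height.
        # min() keeps lon == 180 and lat == -90 inside the canvas.
        x = min(int((lon + 180.0) / 360.0 * width), width - 1)
        y = min(int((90.0 - lat) / 180.0 * height), height - 1)
        grid[y][x] += 1
    return grid


def brightness(count: int, step: int = 32) -> int:
    """Turn an item count into a 0-255 grey level (assumed linear scaling)."""
    return min(255, count * step)
```

Each cell of the grid can then be rendered as a grey pixel, so that empty areas stay black and densely populated pixels saturate to white.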
World events often have a dramatic impact on online services. A past example would be the death of Michael Jackson which brought down Twitter and Wikipedia and made Google believe that they were under attack according to the BBC.
Events like the COVID-19 (Coronavirus) pandemic have a less instantaneous effect, but trends can still be seen to change. Cloudflare recently posted about some of the internet-wide traffic changes due to the pandemic and various government announcements, quarantines and lockdowns.
It’s been another 9 months since my last blog post covering the Wikidata generated geo location maps that I have been tending to for a few years now. Writing this from a hammock, let’s see what has noticeably changed in the last 9 months using a visual diff and my pretty reasonable eyes.
Over the years, diagrams have appeared in a variety of forms covering various areas of the architecture of Wikidata. Now, as the current tech lead for Wikidata, it is my turn.
Wikidata has slowly become a more and more complex system, including multiple extensions, services and storage backends. Those of us that work with it on a day-to-day basis have a pretty good idea of the full system, but it can be challenging for others to get up to speed. Hence, diagrams!
All diagrams can currently be found on Wikimedia Commons using this search, and are released under CC-BY-SA 4.0. The layout of the diagrams with extra whitespace is intended to allow easy comparison of diagrams that feature the same elements.
Wikidata is accessed through a Varnish caching and load balancing layer provided by the WMF. Users, tools and any 3rd parties interact with Wikidata through this layer.
Off to the right are various other external services provided by the WMF. Hadoop, Hive, Oozie and Spark make up part of the WMF analytics cluster for creating pageview datasets. Graphite and Grafana provide live monitoring. There are many other general WMF services that are not listed in the diagram.
Finally we have our semi-persistent and persistent storage, which is used directly by MediaWiki and Wikibase. This includes Memcached and Redis for caching, SQL (MariaDB) for primary metadata, Blazegraph for triples, Swift for files and Elasticsearch for search indexing.
It has been another 6 months since my last post in the Wikidata Map series. In that time Wikidata has gained 4 million items, 1 property with the globe-coordinate data type (coordinates of geographic centre) and 1 million items with coordinates. Each Wikidata item with a coordinate is represented on the map with a single dim pixel. Below you can see the areas of change between this new map and the one generated in March. To see the equivalent change in the previous 4 months take a look at the previous post.
Wikidata.org runs on MediaWiki with the Wikibase extension. But there is more to it than just that. The Wikibase extension itself is split into 3 different sections: Lib, Repo and Client. There are also 6 other extensions, all providing extra functionality to the site and its sisters. The extensions are also loaded on different combinations of the Clients (such as Wikipedia) and the Repo itself (wikidata.org).