Wikidata Map July 2017

It’s been 9 months since my last Wikidata map update and once again many new noticeable areas have appeared, including Norway, South Africa, Peru and New Zealand, to name but a few. As with the last map generation post, I created a diff image so that the areas of change are easily identifiable, comparing the data from July 2017 with that from my last post in October 2016.

Wikidata Map October 2016

It has been another 5 months since my last post about the Wikidata maps, and again some areas of the world have lit up. Since my last post at least 9 noticeable areas have appeared with many new items containing coordinate locations. These include Afghanistan, Angola, Bosnia & Herzegovina, Burundi, Lebanon, Lithuania, Macedonia, South Sudan and Syria.

The difference map below was generated using Resemble.js. The pink areas show areas of difference between the two maps from April and October 2016.
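
The idea behind such a difference map can be sketched in a few lines. This is not Resemble.js itself, only a minimal illustration in Python, assuming both maps are the same size and are held as rows of (r, g, b) tuples; the names `diff_image`, `changed_fraction` and `tolerance` are hypothetical.

```python
# Hypothetical sketch of a pixel-level diff, not the Resemble.js implementation.
PINK = (255, 0, 255)  # highlight colour, chosen to match the pink described above


def _pixel_changed(p_before, p_after, tolerance):
    """True if any colour channel differs by more than the tolerance."""
    return any(abs(x - y) > tolerance for x, y in zip(p_before, p_after))


def diff_image(before, after, tolerance=0):
    """Return a copy of `after` with every changed pixel painted pink."""
    if len(before) != len(after) or any(
        len(a) != len(b) for a, b in zip(before, after)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [
            PINK if _pixel_changed(pb, pa, tolerance) else pa
            for pb, pa in zip(row_before, row_after)
        ]
        for row_before, row_after in zip(before, after)
    ]


def changed_fraction(before, after, tolerance=0):
    """Fraction of pixels that differ between the two images."""
    total = sum(len(row) for row in before)
    changed = sum(
        _pixel_changed(pb, pa, tolerance)
        for row_before, row_after in zip(before, after)
        for pb, pa in zip(row_before, row_after)
    )
    return changed / total if total else 0.0
```

A small non-zero `tolerance` would ignore faint rendering differences, so that only genuinely new dots on the map are highlighted.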

Wikidata Map May 2016 (Belarus & Uganda)

I originally posted about the Wikidata maps back in early 2015 and have followed up with a few posts since looking at interesting developments. This is another one of those posts covering the changes since the last post, so late 2015, to now, May 2016.

The new maps look very similar to the naked eye and the new ‘big’ map can be seen below.

So while at the 2016 Wikimedia Hackathon in Jerusalem I teamed up with @valhallasw to generate some diffs of these maps, in a slightly more programmatic way than in my posts following up the 2015 Wikimania!

Myanmar coordinates on Wikidata by Lockal & Widar

In a recent blog post I showed the amazing apparent effect that Wikimania’s location had on the coordinate location data in Mexico on Wikidata. A comment on the post by Finn Årup Nielsen pointed out a massive increase in data in Myanmar (Burma). I had previously spotted this increase but chose not to mention it in the post. But now, after a quick look at some items and edit histories, I have found who we have to thank!

The increase in geo coordinate information around the region can clearly be seen in the image above. As with the Mexico comparison this shows the difference between June and October 2015.

Impact of Wikimania Mexico 2015 on Wikidata

Recently Wikidata celebrated its third birthday. For the occasion I once again ran the map generation script that I have talked about before, to see what had changed in the geo coordinate landscape of Wikidata!

I found, well, Mexico blossomed!

The image to the left is from June 2015, the right October 2015 and Wikimania was in July 2015!

I will be keeping an eye out for what happens on the map around Esino Lario in 2016 to see what impact the event has on Wikidata again.

Full maps

Wikimedia Grafana graphs of Wikidata profiling information

I recently discovered the Wikimedia Grafana instance. After poking it for a little while here are some slightly interesting graphs that I managed to extract.

Barack Obama GeneaWiki, 1 year later

GeneaWiki is a tool created by Magnus Manske to visualize the family of a person using data pulled from Wikidata.

I used the GeneaWiki tool as an example use of Wikidata in a presentation a year ago (2014) and below you can see the screenshot I took from it. It shows 10 people in Barack Obama’s family tree / web.

GeneaWiki Q76 2014


When creating a new presentation this year (2015) I went back to GeneaWiki to take another screenshot and this is what I found!

GeneaWiki Q76 2015

Around 30 people now! :)

Yay, more data!

Wikidata Map – 19 months on

The last Wikidata map generation, as last discussed here and as originally created by Denny Vrandečić, was on the 7th of November 2013. Recently I have started rewriting the code that generates the maps, stored on GitHub, and boom, a new map!

The old code

The old version of the wikidata-analysis repo, which generated the maps (along with other things), was terribly inefficient. The whole task of analysing the dump and generating data for various visualisations was tied together using a bash script which ran multiple Python scripts in turn.

  • The script took somewhere between 6 and 12 hours to run.
  • At some points this script needed over 6GB of memory to run. And this was when Wikidata was much smaller, so it probably wouldn’t even run any more.
  • All of the code was hard to read, follow and understand.
  • The code was not maintained and thus didn’t actually run any more.

The Rewrite

The initial code that generated the map can mainly be found in the following two repositories which were included as sub-modules into the main repo:

The code worked on the MediaWiki page dumps for Wikidata and relied on the internal representation of Wikidata items, so when this representation changed everything broke.

The wda repository pointed toward the Wikidata-Toolkit which is written in Java and is actively maintained, and thus the rewrite began! The rewrite is much faster, easily understood and easily expandable (maybe I will make another post about it once it is done)!

The change to the map in 19 months

Unfortunately, according to the current settings of my blog, I cannot upload the 2 versions of the map, so I will instead link to the Twitter post announcing the new map as well as the images used there (not full size).

The tweet can be found here.

Wikidata map 7 Nov 2013

Wikidata map 3 June 2015

As you can see, the bottom map contains MORE DOTS! Yay!

Still to do

  • Stop the rewrite of the dump analyser using somewhere between 1 and 2GB of RAM.
    • Problem: Currently the rewrite takes the data it wants and collects it in a Java JSON object, writing to disk only once the entire dump has been read. Because of this, lots of data ends up in this JSON object and thus in memory, and as we analyse more things this problem is only going to get worse.
    • Solution: Write all data we want directly to disk. After the dump has fully been analysed read all of these output files individually and put them in the format we want (probably JSON).
  • Make all of the analysis run whenever a new JSON dump is available!
  • Keep all of the old data that is generated! This will mean we will be able to look at past maps. Previously the maps were overwritten every day.
  • Fix the interactive map!
    • Problem: Due to the large amount of data that is now loaded (compared with when the interactive map last worked 19 months ago) the interactive map crashes all browsers that try to load it.
    • Solution: Optimise the JS code for the interactive map!
  • Add more data to the interactive map! (of course once the task above is done)
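
The memory fix proposed in the list above (write each result to disk as soon as it is found, then assemble the final format in a second pass) could look roughly like the sketch below. The actual rewrite is in Java on top of Wikidata-Toolkit; this is only an illustration in Python, and the entity shape (`id`, `coords`) and the function names are assumptions made for the example.

```python
import json


def extract_coordinates(entities, lines_path):
    """First pass: write one JSON line per coordinate as soon as it is seen.

    `entities` can be any iterable (for example a generator reading a dump),
    so only one entity needs to be held in memory at a time.
    The dict layout used here is hypothetical, not the real dump format.
    """
    count = 0
    with open(lines_path, "w", encoding="utf-8") as out:
        for entity in entities:
            for lat, lon in entity.get("coords", []):
                record = {"id": entity["id"], "lat": lat, "lon": lon}
                out.write(json.dumps(record) + "\n")
                count += 1
    return count


def assemble(lines_path, json_path):
    """Second pass: stream the intermediate file into a single JSON array."""
    with open(lines_path, encoding="utf-8") as src, open(
        json_path, "w", encoding="utf-8"
    ) as dst:
        dst.write("[")
        for index, line in enumerate(src):
            if index:
                dst.write(",")
            dst.write(line.strip())
        dst.write("]")
```

Because both passes work line by line, memory use stays roughly constant however many coordinates the dump contains.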

Maps Maps Maps!

Wikidata map visualizations

In 2013 and 2014 I made a few presentations to various groups of people talking about Wikidata.

When creating those presentations I used as many graphical representations of the data in Wikidata as possible to try and give people a clearer picture of what is already stored.

One of the best visualisations at the time was the Wikidata map created by Denny Vrandečić which came after the introduction of coordinate locations to Wikidata.

Below you can see a GIF showing the additions of the coordinate location property to Wikidata items over roughly the first 40 days of enabling the coordinates data type.

By Denny Vrandecic and Lydia Pintscher (Own work) [CC0], via Wikimedia Commons

Below are some of the images that I extracted from the full map for use in my presentations. Although they are now quite outdated, they are still great to look at!

