Addshore

It's a blog

Tag: Wikidata (page 2 of 3)

Language usage on Wikidata

the Wikidata LogoWikidata is a multilingual project, but due to the size of the project it is hard to get a view on the usage of languages.

For some time now the Wikidata dashboards have existed on the Wikimedia grafana install. These dashboards contain data about the language content of the data model by looking at terms (labels, descriptions and aliases) as well as data about the language distribution of the active community.

For reference the dashboard used are:

All data below was retrieved on 1 February 2016

Continue reading

The break in Wikidata edits on 28 Jan 2016

On the 28th of January 2016 all Wikimedia MediaWiki APIs had 2 short outages. The outage is documented on Wikitech here.

The outage didn’t have much of an impact on most projects hosted by Wikimedia. However due to most Wikidata editing happening through the API, even when using the UI, the project basically stopped for roughly 30 minutes.

Interestingly there is an unusual increase in the edit rate 30 minutes after recovery.

I wonder if this is everything that would have happened in the gap?

Building applications around Wikidata (a beer example)

Wikidata provides free and open access to entities representing real world concepts. Of course Wikidata is not meant to contain every kind of data, for example beer reviews or product reviews would probably never make it into Wikidata items. However creating an app that is powered by Wikidata & Wikibase to contain beer reviews should be rather easy.

Continue reading

Wikidata references from Microdata

Recently some articles appeared on the English Wikipedia Signpost about Wikidata (1, 2, 3). Reading these articles, especially the second and third, pushed me to try to make a dent in the ‘problem’ of references on Wikidata. It turns out that this is actually not that hard!

Continue reading

Myanmar coordinates on Wikidata by Lockal & Widar

In a recent blog post I showed the amazing apparent effect that Wikimania’s location had on the coordinate location data in Mexico on Wikidata. A comment on the post by Finn Årup Nielsen pointed out a massive increase in data in the Myanmar (Burma). I had previously spotted this increase but chosen not to mention it in the post. But now after a quick look at some items and edit histories I have found who we have to thank!

The increase in geo coordinate information around the region can clearly be seen in the image above. As with the Mexico comparison this shows the difference between June and October 2015.

Continue reading

Impact of Wikimania Mexico 2015 on Wikidata

Recently Wikidata celebrated its third birthday. For the occasion I ran the map generation script that I have talked about before again to see what had changed in the geo coordinate landscape of Wikidata!

I found, well, Mexico blossomed!

The image to the left is from June 2015, the right October 2015 and Wikimania was in July 2015!

I will be keeping an eye out for what happens on the map around Esino Lario in 2016 to see what impact the event has on Wikidata again.

Full maps

Un-deleting 500,000 Wikidata items

Since some time in January of this year I have been on a mission to un-delete all Wikidata items that were merged into other items before the redirect functionality of Wikidata existed. Finally I am done (well nearly). This is the short story…

Reasoning

Earlier this year I pointed out the importance of redirects on Wikidata in a blog post. At the time I was amazed at how the community nearly said that they were not going to create redirects for merged items…. but thank the higher powers that the discussion just swung in favour of redirects.

Redirects are needed to maintain the persistent identifiers that Wikidata has. When two items relate to the same concept, they are merged and one of the identifiers must then be left pointing to the identifier now holding the data of the concept.

Listing approach

Since Wikidata began there have been around 1,000,000 log entries deleting pages, which equates to roughly the same number of items deleted, although some deleted items may also have been restored. This was a great starting point. The basic query to get this result was can be found below.

I removed quite a few items from this initial list by looking at at items that had already been restored and were already redirects. To do this I had to find all of the redirects!

At this stage I could have probably tried and remove more items depending on if they currently exist, but there was very little point. In fact it turned out that there was very little point in the above query as prior to my run very few items were un-deleted in order to create redirects.

The next step was to determine which of the logged deletions were actually due to the item being merged into another item. This is fairly easy as most cases of merges used the merge gadget on Wikidata.org. So if the summary matched the following regular expression! I would therefore assume it was deleted due to being merged / a duplicate of another item.

And of course in order to create a redirect I would have to be able to identify a target, so, match Q id links.

I then had a fairly  nice list, although it was still large, but it was time to actually start trying to create these redirects!

Editing approach

So firstly I should point out that such a task is only possible while using an Admin account, as you need to be able to see deleted revisions / un-delete items. Secondly it is not possible to create a redirect over a deleted item and also not possible to restore an item when that would create a conflict on the site, for example due to duplicate site links on items or duplicate joined labels and descriptions.

I split the list up into 104 different sections, each containing exactly 10,000 item IDs. I could then fire up multiple processes to try and create these redirects to make the task go as quickly as possible.

The process of touching a single ID was:

  1. Make sure that the target of the merge exists. If it does not then log to a file, if it does, continue.
  2. Try to un-delete the item. If the deletion fails log to a file, if it is successful continue.
  3. Try to clear the item (as you can only create redirects over empty items). This either results in an edit or no edit, it doesn’t really matter.
  4. Try to create the redirect, this should never fail! If it does log to a fail file that I can clean up after.

The approach on the whole worked very well. As far as I know there were no incorrect un-deletions and nothing failing in the middle.

The first of 2 snags that I hit was the rate at which I was trying to edit was causing the dispatch lag on wikidata to increase. There was no real solution to this other than to keep an eye on the lag and if it ever increased above a certain level to stop editing.

The second snag was causing multiple database locks during the final day of running, although again this was not really a snag as all the transactions recovered. The deadlocks can be seen in the graph below:

The result

  • 500,000 more item IDs now point to the correct locations.
  • We have an accurate idea of how many items have actually been deleted due to not being notable / being test items.
  • The reasoning for redirects has been reinforced in the community.

Final note

One of the steps in the editing approach was to attempt to un-elete an item and if un-deleting were to fail to log the item ID to a log file.

As a result I have now identified a list of roughly 6000 items that should be redirects but and not currently be un-deleted in order to be created.

See https://phabricator.wikimedia.org/T71166

It looks like there is still a bit of work to be done!

Again, sorry for the lack of images :/

Wikimedia Grafana graphs of Wikidata profiling information

I recently discovered the Wikimedia Grafana instance. After poking it for a little while here are some slightly interesting graphs that I managed to extract.

Continue reading

Barack Obama GeneaWiki, 1 year later

GeneaWiki is a tool created by Magnus Manske to visualize the family of a person using data pulled from Wikidata.

I used the GeneaWiki tool as an example use of Wikidata in a presentation a year ago (2014) and below you can see the screenshot I took from it. It shows 10 people in Barack Obamas family tree / web.

GenaWiki Q76 2014

 

When creating a new presentation this year (2015) I went back to GeneaWiki to take another screenshot and this is what I found!

GenaWiki Q76 2015

Around 30 people now! :)

Yay, more data!

https://tools.wmflabs.org/magnus-toolserver/ts2/geneawiki/?q=Q76

Review of the big Interwiki link migration

Wikidata was launched on 30 October 2012 and was the first new project of the Wikimedia Foundation since 2006. The first phase enabled items to be created and filled with basic information: a label – a name or title, aliases – alternative terms for the label, a description, and links to articles about the topic in all the various language editions of Wikipedia.

On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was slowly enabled on more sites until it was enabled on all Wikipedias on the 6th March.

The side bar that these interlanguage links are used to generate can be seen to the right. Continue reading

Older posts Newer posts

© 2018 Addshore

Theme by Anders NorenUp ↑