Addshore

It's a blog

Tag: Wikimedia

The RevisionSlider

The RevisionSlider is an extension for MediaWiki that has just been deployed on all Wikipedias and other Wikimedia websites as a beta feature. The extension was developed by Wikimedia Germany as part of their focus on technical wishes of the German-speaking Wikimedia community. This post will look at the RevisionSlider's design, development and use so far.


Un-deleting 500,000 Wikidata items

Since some time in January of this year I have been on a mission to un-delete all Wikidata items that were merged into other items before the redirect functionality of Wikidata existed. Finally I am done (well nearly). This is the short story…

Reasoning

Earlier this year I pointed out the importance of redirects on Wikidata in a blog post. At the time I was amazed that the community nearly decided not to create redirects for merged items, but thank the higher powers, the discussion just swung in favour of redirects.

Redirects are needed to maintain the persistent identifiers that Wikidata has. When two items relate to the same concept, they are merged and one of the identifiers must then be left pointing to the identifier now holding the data of the concept.

Listing approach

Since Wikidata began there have been around 1,000,000 log entries deleting pages, which equates to roughly the same number of items deleted, although some deleted items may also have been restored. This was a great starting point. The basic query to get this result can be found below.
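A rough sketch of what such a query could look like, run here against a tiny in-memory stand-in for MediaWiki's logging table. The column names follow the MediaWiki schema as it was at the time; the sample rows, and the assumption that items live in namespace 0, are for illustration only.

```python
import sqlite3

# Assumption: a cut-down stand-in for MediaWiki's `logging` table, holding
# only the columns this sketch needs. The rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging ("
    "log_type TEXT, log_action TEXT, log_namespace INTEGER, "
    "log_title TEXT, log_comment TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?, ?, ?, ?)",
    [
        ("delete", "delete", 0, "Q100", "Merged into [[Q200]]"),
        ("delete", "delete", 0, "Q101", "Test item"),
        ("delete", "restore", 0, "Q100", "Restoring to create redirect"),
        ("move", "move", 0, "Q102", ""),
        ("delete", "delete", 4, "Some_project_page", "Housekeeping"),
    ],
)

# Deletion log entries for pages in the main (item) namespace.
DELETIONS_QUERY = """
    SELECT log_title, log_comment
    FROM logging
    WHERE log_type = 'delete'
      AND log_action = 'delete'
      AND log_namespace = 0
    ORDER BY log_title
"""

deleted = conn.execute(DELETIONS_QUERY).fetchall()
for title, comment in deleted:
    print(title, "-", comment)
```

Restore entries share the same log type but a different action, which is why the sketch filters on both columns.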

I removed quite a few items from this initial list by looking at items that had already been restored and were already redirects. To do this I had to find all of the redirects!

At this stage I could probably have tried to remove more items depending on whether they currently exist, but there was very little point. In fact it turned out that there was very little point in the above query, as prior to my run very few items were un-deleted in order to create redirects.

The next step was to determine which of the logged deletions were actually due to the item being merged into another item. This is fairly easy as most cases of merges used the merge gadget on Wikidata.org. So if the summary matched the following regular expression, I would assume the item was deleted due to being merged into, or being a duplicate of, another item.

And of course, in order to create a redirect I would have to be able to identify a target, so the summary also had to match Q id links.
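A minimal sketch of this kind of matching. Both patterns are assumptions made up for illustration; the real merge gadget summaries may be worded differently, and `merge_target` is a hypothetical helper name.

```python
import re

# Assumption: illustrative patterns only, not the expressions actually used.
MERGE_SUMMARY = re.compile(r"\b(merge[d]?|duplicate)\b", re.IGNORECASE)
# A wiki link to an item, with an optional piped label: [[Q42]] or [[Q42|label]].
QID_LINK = re.compile(r"\[\[\s*(Q[1-9]\d*)\s*(?:\|[^\]]*)?\]\]")


def merge_target(summary):
    """Return the first linked Q id if the summary looks like a merge, else None."""
    if not MERGE_SUMMARY.search(summary):
        return None
    match = QID_LINK.search(summary)
    return match.group(1) if match else None


for text in [
    "Merged into [[Q200]]",
    "Duplicate of [[Q42|Douglas Adams]]",
    "Test item",
    "Merged with another item",
]:
    print(repr(text), "->", merge_target(text))
```

A summary that mentions a merge but links no Q id yields no target, so no redirect could be created for it.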

I then had a fairly nice list, although it was still large, but it was time to actually start trying to create these redirects!

Editing approach

So firstly I should point out that such a task is only possible while using an Admin account, as you need to be able to see deleted revisions / un-delete items. Secondly it is not possible to create a redirect over a deleted item and also not possible to restore an item when that would create a conflict on the site, for example due to duplicate site links on items or duplicate joined labels and descriptions.

I split the list up into 104 different sections, each containing exactly 10,000 item IDs. I could then fire up multiple processes to try and create these redirects to make the task go as quickly as possible.
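Splitting a list like this can be sketched in a few lines. The function name and the sample IDs are made up for illustration; each resulting section could then be handed to its own process.

```python
def split_into_sections(item_ids, size=10_000):
    """Split a list of item IDs into consecutive sections of at most `size`."""
    return [item_ids[i:i + size] for i in range(0, len(item_ids), size)]


# Made-up IDs for illustration; the real list came from the deletion log.
ids = [f"Q{n}" for n in range(1, 25_001)]
sections = split_into_sections(ids)
print(len(sections), [len(s) for s in sections])
```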

The process of touching a single ID was:

  1. Make sure that the target of the merge exists. If it does not, log to a file; if it does, continue.
  2. Try to un-delete the item. If the un-deletion fails, log to a file; if it is successful, continue.
  3. Try to clear the item (as you can only create redirects over empty items). This either results in an edit or no edit; it doesn’t really matter.
  4. Try to create the redirect. This should never fail! If it does, log to a fail file that I can clean up after.
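The four steps above can be sketched as a single function. The `wiki` object here is a hypothetical client, not a real library API: it is assumed to offer `exists()`, `undelete()`, `clear()` and `create_redirect()`, and `FakeWiki` exists only to exercise the sketch.

```python
def process_item(item_id, target_id, wiki, log):
    """Run the four steps for one deleted item; return True if redirected.

    `wiki` is a hypothetical client object (an assumption, see FakeWiki).
    `log` is any callable taking (kind, item_id).
    """
    # 1. The merge target must exist.
    if not wiki.exists(target_id):
        log("missing-target", item_id)
        return False
    # 2. Un-delete the item.
    if not wiki.undelete(item_id):
        log("undelete-failed", item_id)
        return False
    # 3. Clear the item; whether this results in an edit does not matter.
    wiki.clear(item_id)
    # 4. Create the redirect; failures are logged for later clean-up.
    if not wiki.create_redirect(item_id, target_id):
        log("redirect-failed", item_id)
        return False
    return True


class FakeWiki:
    """In-memory stand-in used only to exercise the sketch."""

    def __init__(self, existing, restorable):
        self.existing = set(existing)
        self.restorable = set(restorable)
        self.redirects = {}

    def exists(self, item_id):
        return item_id in self.existing

    def undelete(self, item_id):
        if item_id not in self.restorable:
            return False
        self.existing.add(item_id)
        return True

    def clear(self, item_id):
        pass  # Nothing to empty in this fake.

    def create_redirect(self, item_id, target_id):
        self.redirects[item_id] = target_id
        return True


failures = []
wiki = FakeWiki(existing={"Q200"}, restorable={"Q100"})
for item, target in [("Q100", "Q200"), ("Q101", "Q999"), ("Q102", "Q200")]:
    process_item(item, target, wiki, lambda kind, qid: failures.append((kind, qid)))

print(wiki.redirects)
print(failures)
```

Returning early after each failed step means an item is never cleared or redirected unless it was un-deleted first.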

The approach on the whole worked very well. As far as I know there were no incorrect un-deletions and nothing failing in the middle.

The first of 2 snags that I hit was that the rate at which I was trying to edit was causing the dispatch lag on Wikidata to increase. There was no real solution to this other than to keep an eye on the lag and, if it ever increased above a certain level, to stop editing.
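One way to automate that kind of lag check is sketched below. It pauses rather than stops outright, which is a variation on what is described above. `get_lag` is a hypothetical callable returning the current dispatch lag in seconds (how it is measured is left open), and the thresholds are arbitrary.

```python
import time


def edit_with_lag_check(edits, get_lag, max_lag=60.0, pause=30.0, sleep=time.sleep):
    """Apply each edit, pausing while the reported lag is above `max_lag`.

    `edits` is an iterable of zero-argument callables. `get_lag` is a
    hypothetical callable (an assumption) returning the lag in seconds.
    """
    done = 0
    for edit in edits:
        while get_lag() > max_lag:
            sleep(pause)  # Wait for the lag to drop before editing again.
        edit()
        done += 1
    return done


# Made-up lag readings and no-op edits, with sleep replaced by a recorder
# so the example runs instantly.
lags = iter([10.0, 120.0, 90.0, 20.0, 5.0])
naps = []
applied = []
count = edit_with_lag_check(
    [lambda: applied.append("a"), lambda: applied.append("b"), lambda: applied.append("c")],
    get_lag=lambda: next(lags),
    sleep=naps.append,
)
print(count, applied, naps)
```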

The second snag was causing multiple database locks during the final day of running, although again this was not really a snag as all the transactions recovered. The deadlocks can be seen in the graph below:

The result

  • 500,000 more item IDs now point to the correct locations.
  • We have an accurate idea of how many items have actually been deleted due to not being notable / being test items.
  • The reasoning for redirects has been reinforced in the community.

Final note

One of the steps in the editing approach was to attempt to un-delete an item and, if un-deleting failed, to log the item ID to a log file.

As a result I have now identified a list of roughly 6000 items that should be redirects but cannot currently be un-deleted in order for the redirects to be created.

See https://phabricator.wikimedia.org/T71166

It looks like there is still a bit of work to be done!

Again, sorry for the lack of images :/

Wikimedia Grafana graphs of Wikidata profiling information

I recently discovered the Wikimedia Grafana instance. After poking it for a little while here are some slightly interesting graphs that I managed to extract.


Review of the big Interwiki link migration

Wikidata was launched on 30 October 2012 and was the first new project of the Wikimedia Foundation since 2006. The first phase enabled items to be created and filled with basic information: a label – a name or title, aliases – alternative terms for the label, a description, and links to articles about the topic in all the various language editions of Wikipedia.

On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata. This functionality was slowly enabled on more sites until it was enabled on all Wikipedias on the 6th of March.

The side bar that these interlanguage links are used to generate can be seen to the right.

Wikimedia Hackathon 2015 (Lyon)

By Jean-Philippe Kmiec & Sylvain Boissel (Own work) [CC BY-SA 4.0], via Wikimedia Commons

This year's Wikimedia Hackathon was located in Lyon, France at Valpré-Lyon between the 23rd and 25th of May.

The hotel (Valpré-Lyon) was absolutely beautiful, with large grass areas, great architecture and a place for you whether you wanted to have a large or small discussion, sit quietly or sit outside. Pétanque and table tennis were also available, as well as plenty of people to meet!

Valpré Castel

Some of the hackathon grounds. By Alex Cella (Own work) [CC BY-SA 4.0], via Wikimedia Commons

I planned on primarily hacking on my MassAction extension along with one or two others, but as at any hackathon I got massively distracted talking to people and working on other projects.

Right now

A Stickman

So a quick summary of everything I am working on right now:

  • MassAction is a MediaWiki extension that I have been working on for the past half a year or so. It allows users to perform mass actions on targets on a MediaWiki site through a static page using MediaWiki's inbuilt job queue. I look forward to releasing it to the open source world soon!
  • addwiki is a collection of MediaWiki-related PHP libraries (including one for Wikibase). Previously I developed various PHP scripts and bots for Wikipedia using other libraries and always found that they were quite badly coded and prone to doing unexpected things. Addwiki is the start of my attempt to fix that for PHP.
  • Orain (github) is a community-driven, not-for-profit wiki network that I help to keep running.
  • I am also still an active contributor to all kinds of Wikimedia projects including the wonderful Wikidata.
  • I should at some point be trying to create a very rudimentary backup script / system for Sharepoint Lists…
  • I am also currently working on a redesign of the Joomla 1 components used to power http://studentwindsurfing.co.uk/ written for Joomla 3 which should be ready in the next 6 months.

I am also still involved with:

  • Huggle (github), an antivandalism tool for use on Wikipedia and similar projects.

I could also point toward a few other things:

I also have a backlog of posts that I might try to write…

But for now let's end this post here.

Wikimania

Wikimania 2014 was a 2000+ person conference, festival, meetup, workshop, hackathon, and celebration, spread over five days in August 2014, preceded and followed by fringe events. Wikimania is the official annual event of the Wikimedia movement, where one can discover all kinds of projects that people are making with wikis and open content, as well as meet the community that produced the most famous wiki of all, Wikipedia!

The core event was held in and around The Barbican Centre in London, UK.

Watch the videos on YouTube, Commons or LiveStream.

Read about it by following one of the following links:

Also you can find a blog post looking back at Wikimania from the Barbican Centre here.

Post photo from https://wikimania2014.wikimedia.org/wiki/File:Wikimania_2014_group_photo.jpeg

At future events these posts will likely be much better, as I’ll be writing them while I am there! This post was actually written in April 2015 :/

Zürich Wikimedia Hackathon

This year the Wikimedia Hackathon was held in Zürich, Switzerland from the 9th to 11th May 2014. The organization of the event was great: from lanyards and badges that included a USB memory stick to a city map and a ticket for public transport, Wikimedia Switzerland had prepared a fantastic hackathon.

More than 150 developers, engineers, sysadmins, and technology enthusiasts gathered from more than 30 countries, aiming to share knowledge about new and existing technologies, fix bugs, come up with new ideas and work together on tools and systems relating to the Wikimedia movement.

As the name suggests, a lot of time at a hackathon is spent ‘hacking’ (coding and such), but there are also workshops available on all days. This year these workshops and talks included multiple sessions on ‘Vagrant’ working toward a production-like development system, ‘Open data’ looking at Wikidata and government open data, as well as sessions on ‘Phabricator’ and ‘Jenkins’.

Hackathons are not just a place to hack; they also provide a crucial time for people with different specialisms and interests to meet each other in person, put faces to names and names to pseudonyms, to build relationships and in turn build the movement.

Until next time!

Image Credits:

  • Logo: By Original: Trevor Parscal Modification: Lokal_Profil [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
  • Photo: By Christian Meixner (Own work) [CC BY 3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons

Spaghetti Open Data

Spaghetti Open Data is made up of a group of Italian citizens interested in the release of public data in open formats, in order to make it easy to access and reuse.

Starting in 2010 they have had yearly conferences.

I was lucky enough to be invited to their conference this year (SOD14) as the keynote speaker talking about Wikidata, paid for by Wikimedia Italy.

An overview of the conference can be found at http://www.spaghettiopendata.org/page/conferenza-sod14


Photo by: Homer Project. Harmonising Open Data

 

My talk went well and during the day many people came to discuss various things with me.

My presentation can be found on Google Docs.

As for my participation in the rest of the conference, well, I don’t speak Italian, so……

I can however say that Italy is a great place!


Wikidata training @ Wikimedia UK

I recently led an event at Wikimedia UK on 28 September 2013 entitled Wikidata Training as part of the Wikidata development team.

The event page on the Wikimedia UK wiki can be found at https://wikimedia.org.uk/wiki/Wikidata_training.

I want to share the LONG presentation that I gave on that day using Prezi.

You can find it below; sorry if it drags on, but it gets a lot across.


© 2017 Addshore
