Addshore


Language usage on Wikidata

Wikidata is a multilingual project, but due to its size it is hard to get an overview of how its languages are used.

For some time now the Wikidata dashboards have existed on the Wikimedia Grafana installation. These dashboards contain data about the language content of the data model, based on terms (labels, descriptions and aliases), as well as data about the language distribution of the active community.

For reference, the dashboards used are:

All data below was retrieved on 1 February 2016

Active user language coverage

Active users are defined here as users who have made at least 1 edit in the last 30 days.

A single user can have multiple languages if they use a babel box. If the user does not have a babel box, their user interface language is used instead.
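The rule above can be sketched in a few lines of Python. This is only an illustration of the counting logic as described here, not the code behind the dashboards; the function names and the sample data are made up for the example.

```python
from collections import Counter


def languages_for_user(babel_languages, interface_language):
    """Return the set of languages attributed to one active user."""
    # Languages from a babel box take precedence; without one,
    # fall back to the user interface language.
    if babel_languages:
        return set(babel_languages)
    return {interface_language}


def language_coverage(users):
    """Count how many active users cover each language."""
    coverage = Counter()
    for babel_languages, interface_language in users:
        coverage.update(languages_for_user(babel_languages, interface_language))
    return coverage


# Hypothetical sample: (babel box languages, interface language) per user.
sample_users = [
    (["en", "de", "fr"], "en"),  # has a babel box
    ([], "en"),                  # no babel box, default interface language
    ([], "es"),                  # no babel box, Spanish interface
]

coverage = language_coverage(sample_users)
print(len(sample_users), "users,", len(coverage), "languages,",
      sum(coverage.values()), "times covered")
```

Counting this way means one user can add to several languages at once, which is why the number of times languages are covered can be larger than the number of users.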

The data below represents 18190 users, with 317 languages covered a total of 27660 times.

English is shown as the primary active user language. This is likely because the default user interface language is English and only 2905 users have babel boxes.

On average, a user with a babel box has 3.3 languages defined in it.

Term language coverage

Across all Wikidata entities 410 languages are used (including variants).

This leaves a gap of roughly 93 languages between those used in terms and those covered by currently active editors.

The distributions per term type can be seen below.
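The gap is simply the difference between the two counts (410 - 317 = 93). As a rough sketch, the same comparison on actual sets of language codes could look like the following; the function name and the codes are invented for the example. Subtracting the counts only matches the set difference when every language covered by active users also appears in terms, which is an assumption here.

```python
def language_gap(term_languages, active_user_languages):
    """Return languages that appear in terms but are covered by no active user."""
    return set(term_languages) - set(active_user_languages)


# The counts quoted in this post.
count_gap = 410 - 317

# Hypothetical language codes, for illustration only.
term_languages = {"en", "de", "fr", "ceb", "war"}
active_user_languages = {"en", "de", "fr"}

gap = language_gap(term_languages, active_user_languages)
print(count_gap, sorted(gap))
```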

Of course, all of the numbers above are constantly changing, and the dashboards should be consulted for up-to-date data.

1 Comment

  1. You should have excluded taxons.

    As https://www.wikidata.org/wiki/Wikidata:Statistics/Wikipedia shows, Cebuano and Waray-Waray have 95% bot articles, whose labels are basically Latin.


© 2017 Addshore
