Over the years diagrams have appeared in a variety of forms covering various areas of the architecture of Wikidata. Now, as the current tech lead for Wikidata it is my turn.

Wikidata has slowly become a more and more complex system, including multiple extensions, services and storage backends. Those of us that work with it on a day to day basis have a pretty good idea of the full system, but it can be challenging for others to get up to speed. Hence, diagrams!

All diagrams can currently be found on Wikimedia Commons using this search, and are released under CC-BY-SA 4.0. The layout of the diagrams with extra whitespace is intended to allow easy comparison of diagrams that feature the same elements.

High level overview

High level overview of the Wikidata architecture

This overview shows the Wikidata website, running Mediawiki with the Wikibase extension in the left blue box. Various other extensions are also run such as WikibaseLexeme, WikibaseQualityConstraints, and PropertySuggester.

Wikidata is accessed through a Varnish caching and load balancing layer provided by the WMF. Users, tools and any 3rd parties interact with Wikidata through this layer.

Off to the right are various other external services provided by the WMF. Hadoop, Hive, Ooozie and Spark make up part of the WMF analytics cluster for creating pageview datasets. Graphite and Grafana provide live monitoring. There are many other general WMF services that are not listed in the diagram.

Finally we have our semi persistent and persistent storages which are used directly by Mediawiki and Wikibase. These include Memcached and Redis for caching, SQL(mariadb) for primary meta data, Blazegraph for triples, Swift for files and ElasticSearch for search indexing.

Continue reading