It's a blog

Tag: architechture

Tech Lead Digest – August 2021

Welcome to the 4th instalment of my Wikidata & Wikibase Tech lead Digest for August 2021. For previous instalments see Q1, Q2 & July.

🧑‍🤝‍🧑Wikidata & Wikibase

The Wikidata Query Builder has been deployed. The Wikidata Query Builder provides a visual interface for building a simple Wikidata query. It is ideal for users with little or no experience in SPARQL.

The Wikibase fall release, which will be compatible with MediaWiki 1.36 will be made in the next month or so. At some point in the next 3-6 months we will likely also make a Wikibase 1.37 release. Keep an eye out on the mailing list for these.

Work is about to wrap up on the next iteration of Wikibase Federated Properties which will enable the use of properties from multiple sources at once, such as Wikidata and also the local Wikibase.

Work continues on the Wikidata Mismatch Finder. You can read more about this future tool on wikidata.org.

The campsite worked on many other things. Most notably Ladsgroup spotted that SpamBlacklist was rendering content on Wikidata twice (phabricator). This fix resulted in a rather significant improvement in save times for Wikidata, and users of Wikibase and SpamBlacklist in combination.

Continue reading

Wikidata Architecture Overview (diagrams)

Over the years diagrams have appeared in a variety of forms covering various areas of the architecture of Wikidata. Now, as the current tech lead for Wikidata it is my turn.

Wikidata has slowly become a more and more complex system, including multiple extensions, services and storage backends. Those of us that work with it on a day to day basis have a pretty good idea of the full system, but it can be challenging for others to get up to speed. Hence, diagrams!

All diagrams can currently be found on Wikimedia Commons using this search, and are released under CC-BY-SA 4.0. The layout of the diagrams with extra whitespace is intended to allow easy comparison of diagrams that feature the same elements.

High level overview

High level overview of the Wikidata architecture

This overview shows the Wikidata website, running Mediawiki with the Wikibase extension in the left blue box. Various other extensions are also run such as WikibaseLexeme, WikibaseQualityConstraints, and PropertySuggester.

Wikidata is accessed through a Varnish caching and load balancing layer provided by the WMF. Users, tools and any 3rd parties interact with Wikidata through this layer.

Off to the right are various other external services provided by the WMF. Hadoop, Hive, Ooozie and Spark make up part of the WMF analytics cluster for creating pageview datasets. Graphite and Grafana provide live monitoring. There are many other general WMF services that are not listed in the diagram.

Finally we have our semi persistent and persistent storages which are used directly by Mediawiki and Wikibase. These include Memcached and Redis for caching, SQL(mariadb) for primary meta data, Blazegraph for triples, Swift for files and ElasticSearch for search indexing.

Continue reading

© 2021 Addshore

Theme by Anders NorénUp ↑