Faster munging for the Wikidata Query Service using Hadoop
The Wikidata Query Service is a public SPARQL endpoint for querying all of the data contained within Wikidata. In a previous blog post, I walked through how to set up a complete copy of this query service. One of the steps in this process is the munge step, which performs some pre-processing on the RDF…
Using Hue & Hive to quickly determine Wikidata API maxlag usage
Hue, or Hadoop User Experience, is described by its documentation pages as “a Web application that enables you to easily interact with an Hadoop cluster”. The Wikimedia Foundation has a Hue frontend for their Hadoop cluster, which contains various datasets including web requests, API usage and the MediaWiki edit history for all hosted sites. The install…
WMDE: Metrics & Data Gatherings
Below you will find an internal WMDE presentation from 2016 covering the general area of WMDE Metrics & Data Gatherings. This presentation follows on from the initial introduction to engineering analytics activities. The presentation skims through: WMDE Grafana dashboards; the Wikimedia Analytics landscape; Grafana & Graphite; Hadoop, Kafka, Hive & Oozie; EventLogging, MySQL replicas &…