Addshore

It's a blog

Tag: hive

Using Hue & Hive to quickly determine Wikidata API maxlag usage

Hue, or Hadoop User Experience, is described by its documentation pages as “a Web application that enables you to easily interact with an Hadoop cluster”.

The Wikimedia Foundation has a Hue frontend for their Hadoop cluster, which contains various datasets including web requests, API usage and the MediaWiki edit history for all hosted sites. The install can be accessed at https://hue.wikimedia.org/ using Wikimedia LDAP for authentication.

Once logged in, Hue can be used to write Hive queries with syntax highlighting, auto suggestions and formatting. It also lets users save queries with names and descriptions, run queries from the browser and watch the execution state of the resulting Hadoop jobs.

The Wikidata & maxlag bit

MediaWiki has a maxlag API parameter that clients can pass alongside API requests. When the DB servers are lagging behind the master by more than the given number of seconds, the request fails with an error instead of being carried out, which stops writes from happening while the servers catch up. Within MediaWiki the reported lag can also be raised when the JobQueue is very full. Recently Wikibase introduced the ability to raise this lag when the dispatching of changes to client projects is lagging behind too. To see how effective this will be, we can take a look at how previous API calls have used maxlag.
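To make the client side of this concrete, here is a minimal sketch in Python of how a well-behaved client might use maxlag. The function names `with_maxlag` and `maxlag_retry_delay` are made up for illustration. The response shape (an `error` object whose `code` is `maxlag`, plus a `Retry-After` header) is how I understand MediaWiki reports this, so treat it as an assumption rather than a specification. Nothing here talks to the network.

```python
def with_maxlag(params, maxlag_seconds=5):
    """Return a copy of the API parameters with maxlag added.

    Hypothetical helper: 5 seconds is a commonly suggested value,
    not something this post prescribes.
    """
    merged = dict(params)
    merged["maxlag"] = str(maxlag_seconds)
    merged.setdefault("format", "json")
    return merged


def maxlag_retry_delay(body, headers, default_delay=5.0):
    """Return how many seconds to wait if the response is a maxlag error.

    Returns None for any other response. Assumes `body` is the decoded
    JSON response and `headers` is a plain dict keyed exactly as
    "Retry-After" (real HTTP libraries are usually case-insensitive).
    """
    error = body.get("error")
    if not isinstance(error, dict) or error.get("code") != "maxlag":
        return None
    try:
        return float(headers.get("Retry-After", default_delay))
    except (TypeError, ValueError):
        # Header missing or not a plain number: fall back to the default.
        return default_delay
```

A client would call `with_maxlag` before every write, and if `maxlag_retry_delay` returns a number it would sleep that long and retry the same request. Clients that never send maxlag never get this signal, which is why it is worth checking how many requests include it.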


WMDE: Metrics & Data Gatherings

Below you will find an internal WMDE presentation from 2016 covering the general area of WMDE Metrics & Data Gatherings.

This presentation follows on from the initial introduction to engineering analytics activities.

The presentation skims through:

  • WMDE Grafana dashboards
  • The Wikimedia Analytics landscape
  • Grafana & Graphite
  • Hadoop, Kafka, Hive & Oozie
  • EventLogging, MySQL replicas & MediaWiki logs
  • Our Analytics scripts
  • How to get access

© 2018 Addshore
