The Wikimedia Server Admin Logs

June 27, 2018 By addshore

The Wikimedia Server Admin Log, or SAL for short, is a timestamped log of actions performed on the Wikimedia cluster by users such as roots and deployers. The log is stored on the Wikitech Wikimedia project and can be found at the following URL: https://wikitech.wikimedia.org/wiki/Server_Admin_Log

An example entry in the log could be:

09:04 addshore: addshore@terbium:~$ for i in {1..2500}; do echo Lexeme:L$i; done | mwscript purgePage.php --wiki wikidatawiki

As well as the main cluster SAL, there are also logs for release engineering (Jenkins, Zuul, and other CI things) and individual logs for each project that uses Wikimedia Cloud VPS.

A tool has been created for easy SAL navigation, which can be found at https://tools.wmflabs.org/sal

Each SAL can be selected at the top of the tool, with ‘Other’ providing you with a list of all Cloud VPS SALs.

The search and date filters can then be used to find entries throughout history.

Usages of the SAL

The timestamps of the SAL can be used to determine the causes of cluster or project issues. The entries can be used alongside other timestamped data in incident reports, for example https://wikitech.wikimedia.org/wiki/Incident_documentation/20180229-Train-1.31.0-wmf.27#Timeline
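As a rough illustration of lining SAL timestamps up against an issue, the sketch below keeps only the entries logged shortly before a problem was noticed. The three entries are borrowed from the tmp1 search results quoted later in this post; the issue time, the one hour lookback, and the helper name `entries_before` are all hypothetical and only there to show the idea.

```python
from datetime import datetime, timedelta

# (timestamp, nick, message) tuples, copied from SAL entries quoted in this post.
ENTRIES = [
    (datetime(2018, 5, 26, 14, 25), "marostegui",
     "Add tmp1 index back on db1101:3318 - T194273"),
    (datetime(2018, 5, 26, 9, 56), "marostegui",
     "Add tmp1 index back on db1087 (sanitarium master), "
     "this will generate lag on labsdb hosts - T194273"),
    (datetime(2018, 5, 26, 5, 21), "marostegui",
     "Add tmp1 index back on db1099:3318 - T194273"),
]


def entries_before(entries, issue_start, lookback):
    """Return entries logged within `lookback` before `issue_start`, oldest first."""
    window_start = issue_start - lookback
    hits = [e for e in entries if window_start <= e[0] <= issue_start]
    return sorted(hits, key=lambda e: e[0])


# Hypothetical scenario: lag on a labsdb host is first noticed at 10:30.
issue_start = datetime(2018, 5, 26, 10, 30)
candidates = entries_before(ENTRIES, issue_start, timedelta(hours=1))

for when, nick, message in candidates:
    print(f"{when:%Y-%m-%d %H:%M} <{nick}> {message}")
```

With this made-up issue time, only the 09:56 entry falls inside the window, which is the kind of candidate cause one would then add to an incident timeline.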

The SAL messages can sometimes be used to find changes that have otherwise gone entirely undocumented (not in Puppet or any Git repo).

An example of tracking down such an undocumented change follows, using the tmp1 index that was recently found on the wb_terms table of Wikidata.

Finding the wikidata tmp1 index

Searching for ‘tmp1’ only finds log entries from 2018:

2018-05-26
14:25   <marostegui>  Add tmp1 index back on db1101:3318 - T194273
09:56   <marostegui>  Add tmp1 index back on db1087 (sanitarium master), this will generate lag on labsdb hosts - T194273
05:21   <marostegui>  Add tmp1 index back on db1099:3318 - T194273
2018-05-25
08:36   <marostegui>  Add tmp1 back on db1092 - https://phabricator.wikimedia.org/T194273
05:57   <marostegui>  Add tmp1 index back on dbstore1002 - T194273
05:05   <marostegui>  Add tmp1 index back to db1109 - T194273
2018-05-24
20:58   <marostegui>  Add tmp1 index back on db1104
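Search results like the ones above can also be processed after copying them out of the tool. The following is a minimal sketch, assuming the plain-text layout shown here (a date line followed by `HH:MM <nick> message` lines); the function name `parse_listing` and the regular expressions are my own and not part of the SAL tool. It pulls out any Phabricator task IDs and flags entries that do not reference one.

```python
import re
from datetime import datetime

# Listing copied from the search results above.
LISTING = """\
2018-05-26
14:25   <marostegui>  Add tmp1 index back on db1101:3318 - T194273
09:56   <marostegui>  Add tmp1 index back on db1087 (sanitarium master), this will generate lag on labsdb hosts - T194273
05:21   <marostegui>  Add tmp1 index back on db1099:3318 - T194273
2018-05-25
08:36   <marostegui>  Add tmp1 back on db1092 - https://phabricator.wikimedia.org/T194273
05:57   <marostegui>  Add tmp1 index back on dbstore1002 - T194273
05:05   <marostegui>  Add tmp1 index back to db1109 - T194273
2018-05-24
20:58   <marostegui>  Add tmp1 index back on db1104
"""

DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})$")
ENTRY_RE = re.compile(r"^(\d{2}:\d{2})\s+<([^>]+)>\s+(.*)$")
TASK_RE = re.compile(r"\bT\d+\b")  # matches bare IDs and IDs at the end of a URL


def parse_listing(text):
    """Turn a copied listing into a list of dicts, one per log entry."""
    entries = []
    current_date = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        date_match = DATE_RE.match(line)
        if date_match:
            current_date = date_match.group(1)
            continue
        entry_match = ENTRY_RE.match(line)
        if entry_match and current_date:
            time_part, nick, message = entry_match.groups()
            entries.append({
                "when": datetime.strptime(f"{current_date} {time_part}", "%Y-%m-%d %H:%M"),
                "nick": nick,
                "message": message,
                "tasks": TASK_RE.findall(message),
            })
    return entries


entries = parse_listing(LISTING)
untracked = [e for e in entries if not e["tasks"]]

for e in untracked:
    print(e["when"], e["nick"], e["message"])
```

In this listing the only entry without a task reference is the earliest one, from 2018-05-24.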

Changing the search a little to look for the table name instead of the index name reveals some different log entries:

2014-01-04
04:13   <springle>    schema change, ad-hoc, additional indexes on recentchanges & wb_terms for recent slow queries
2013-12-13
02:50   <springle>    ongoing schema changes on slaves, indexing only, logging gerrit 85508, wb_terms gerrit 99660
2013-07-18
19:52   <springle>    delaying slave db45 for wikidatawiki wb_terms OSC duration
18:13   <Reedy>   Added term weight to wb_terms on testwikidatawiki

The key entry here is the one mentioning Gerrit change 99660, which in turn links to Bugzilla bug 45529, now recorded as Phabricator ticket T47529.

Looking through the Phabricator comments, we can see a first mention of tmp1: https://phabricator.wikimedia.org/T47529#518889
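The two numbers in that chain differ by exactly 2000, which suggests that old Bugzilla bug numbers map to Phabricator task numbers by a fixed offset. The sketch below encodes that as an assumption inferred from this single pair; the constant and function name are hypothetical, so check the resulting task before relying on it.

```python
# Assumption: Phabricator task number = Bugzilla bug number + 2000,
# inferred from the pair in this post (bug 45529 -> T47529).
BUGZILLA_TASK_OFFSET = 2000


def bugzilla_to_phabricator(bug_id: int) -> str:
    """Return the Phabricator task ID assumed to correspond to a Bugzilla bug."""
    if bug_id <= 0:
        raise ValueError("bug_id must be a positive integer")
    return f"T{bug_id + BUGZILLA_TASK_OFFSET}"


print(bugzilla_to_phabricator(45529))  # prints T47529
```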

Writing to SAL

The SAL can be written to directly on the various wiki pages; however, this is not advised, as it can lead to edit conflicts.

Instead, take a look at Stashbot: https://wikitech.wikimedia.org/wiki/Tool:Stashbot