Recently I have been spending lots of time looking at the Wikimedia graphite set-up due to working on Grafana dashboards. In exchange for what some people had been doing for me I decided to take a quick look down the list of open Graphite tickets and found T116031. Sometimes it is great when such a small fix can have such a big impact!

After digging through all of the code I eventually discovered the method which sends Mediawiki metrics to Statsd is SamplingStatsdClient::send. This method is an overridden version of StatsdClient::send which is provided by liuggio/statsd-php-client. However a bug has existed in the sampling client ever since its creation!

The fix for the bug can be found on gerrit and only a +10 -4 line change (only 2 of those lines were actually code).

// Before
$data = $this->sampleData( $data );
$messages = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $messages );

//After
$data = $this->sampleData( $data );
$data = array_map( 'strval', $data );
$data = $this->reduceCount( $data );
$this->send( $data );

The result of deploying this fix on the Wikimedia cluster can be seen below.

Decrease in packets when deploying fixed Mediawiki Statsd client

You can see a reduction from roughly 85kpps to 25kpps at the point of deployment. This is over a 50% decrease!

Decrease in bytes in after Mediawiki Statsd client fix deployment

A decrease in bytes received can also be seen, even though the same number of metrics are being sent. This is due to the reduction in packet overhead, a drop of roughly 1MBps at deployment.

The little things really are great. Now to see if we can reduce that packet count even more!