Recently I have been spending a lot of time looking at the Wikimedia Graphite setup while working on Grafana dashboards. To give something back to the people who had been helping me, I decided to take a quick look down the list of open Graphite tickets and found T116031. It is great when such a small fix can have such a big impact!

After digging through the code, I eventually discovered that the method which sends MediaWiki metrics to StatsD is SamplingStatsdClient::send. This method overrides StatsdClient::send, which is provided by liuggio/statsd-php-client. However, a bug has existed in the sampling client ever since its creation!

The fix for the bug can be found on Gerrit and is only a +10 -4 line change (only 2 of those lines were actually code).

The result of deploying this fix on the Wikimedia cluster can be seen below.

Decrease in packets after deploying the fixed MediaWiki StatsD client

You can see a reduction from roughly 85kpps to 25kpps at the point of deployment. That is a decrease of roughly 70%!

Decrease in bytes received after deploying the MediaWiki StatsD client fix

A decrease in bytes received can also be seen, even though the same number of metrics is being sent. This drop of roughly 1MBps at deployment is due to the reduction in packet overhead.
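To illustrate why fewer packets means fewer bytes for the same metrics, here is a minimal Python sketch. It is not the MediaWiki code (which is PHP) and not the actual fix. It assumes the metrics are packed several to a UDP datagram, separated by newlines as the StatsD line protocol allows. The metric names, the 512-byte payload limit, and the helper names `batch_metrics` and `wire_bytes` are all made up for the example, and only IPv4 and UDP headers are counted as overhead.

```python
# Per-packet overhead counted here: IPv4 header (20 bytes, no options)
# plus UDP header (8 bytes). Link-layer framing is ignored.
IP_UDP_HEADER_BYTES = 20 + 8


def batch_metrics(metrics, max_payload=512):
    """Pack StatsD lines into newline-separated payloads.

    max_payload is an assumed limit chosen for illustration only.
    A single metric longer than the limit is sent on its own.
    """
    payloads = []
    current = ""
    for metric in metrics:
        candidate = metric if not current else current + "\n" + metric
        if current and len(candidate.encode()) > max_payload:
            # Adding this metric would overflow the payload: flush and restart.
            payloads.append(current)
            current = metric
        else:
            current = candidate
    if current:
        payloads.append(current)
    return payloads


def wire_bytes(payloads):
    """Total bytes at the IP level: payload plus headers for every packet."""
    return sum(len(p.encode()) + IP_UDP_HEADER_BYTES for p in payloads)


# Hypothetical counter metrics, one StatsD line each.
metrics = [f"MediaWiki.example.metric{i}:1|c" for i in range(100)]

unbatched = list(metrics)          # one metric per packet
batched = batch_metrics(metrics)   # many metrics per packet

print(len(unbatched), "packets,", wire_bytes(unbatched), "bytes unbatched")
print(len(batched), "packets,", wire_bytes(batched), "bytes batched")
```

Both variants carry exactly the same metric lines. The batched variant pays the header cost once per packet instead of once per metric, which is the kind of saving visible in the graph above.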

The little things really are great. Now to see if we can reduce that packet count even more!