MediaWiki CRAP – The worst of it

November 21, 2015 By addshore

I don’t mean MediaWiki is crap! The Change Risk Anti-Patterns (CRAP) Index is calculated from the cyclomatic complexity and code coverage of a unit of code. Complex code and untested code will have a higher CRAP index than simple, well-tested code. Over the last 2 years I have been tracking the CRAP index of some of MediaWiki’s more complex classes, as reported by the automatic coverage reports, and this post is a simple summary of what has been happening.
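To make the relationship between complexity, coverage and CRAP concrete, here is a minimal Python sketch of the commonly cited CRAP formula, CRAP(m) = comp(m)² × (1 − cov(m)/100)³ + comp(m), which PHPUnit’s coverage reports are based on. The function name `crap_index` is hypothetical, and the real reports may round or special-case values differently; this is an illustration, not the tool’s implementation.

```python
def crap_index(complexity: int, coverage_percent: float) -> float:
    """Hypothetical helper: CRAP = comp^2 * (1 - cov/100)^3 + comp.

    complexity: cyclomatic complexity of the unit (at least 1).
    coverage_percent: code coverage of the unit, from 0 to 100.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage_percent <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    # Fraction of the unit that is NOT covered by tests.
    uncovered = 1.0 - coverage_percent / 100.0
    # Untested complexity is penalised quadratically; full coverage
    # leaves only the complexity itself.
    return complexity ** 2 * uncovered ** 3 + complexity


if __name__ == "__main__":
    print(crap_index(10, 0))    # untested: 10^2 + 10
    print(crap_index(10, 50))   # half covered: 100 * 0.125 + 10
    print(crap_index(10, 100))  # fully covered: just the complexity
```

Because complexity is squared when coverage is low, a single large, untested class can easily reach a CRAP index in the hundreds of thousands, while adding tests pulls the score back towards the complexity alone.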

Just over 2 years ago I went through all of the MediaWiki unit tests and added @covers tags to improve the coverage reports for the source. This brought the line coverage to roughly 4% toward the end of 2013. Since then coverage has grown steadily and now stands at an amazing 9%. Note that I am only counting the includes directory here; if maintenance scripts and language definitions are counted too, the 9% is actually 7%.

You can see the sharp increase in coverage at the very start of the graph below.

Over the past 2 years there has also been a push towards librarization, which has moved a lot of code out of the core repository and into separate libraries that are now required using Composer. Such libraries include:

  • mediawiki/at-ease – A safe alternative to PHP’s “@” error control operator
  • wikimedia/assert – Alternative to PHP’s assert()
  • wikimedia/base-convert – Improved base_convert for PHP
  • wikimedia/ip-set – PHP library to match IPs against CIDR specs
  • wikimedia/relpath – Compute a relative path between two paths
  • wikimedia/utfnormal – Unicode normalization functions
  • etc.

All of the above has helped to generally reduce the CRAP across the code base, even in some of the places with the largest CRAP scores.

The graph shows the CRAP index of the top 10 CRAP classes in MediaWiki core at any one time. The data is taken from 12 snapshots of the CRAP index across the 2 year period. At the very left of the graph you can see a sharp decrease in the CRAP index, as unit test coverage was taken into account from that point onward (as in the coverage graph). Over the 2 year period some classes fall out of the top 10 and are replaced by other classes with a higher CRAP index.

So coverage is generally trending up and CRAP is generally trending down. That’s good, right? The combined CRAP index of the top 10 CRAP classes has actually decreased from 2.5 million to 2.2 million! Which of course means that the average CRAP index of those 10 classes has decreased from 250,000 to 220,000!
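The top 10 figures above come from picking the 10 highest-scoring classes in each snapshot, then summing and averaging their scores. A minimal Python sketch of that calculation follows, assuming each snapshot is available as a mapping of class name to CRAP index. The helper names `top_n_crap` and `summarize_snapshot`, and that data format, are hypothetical; the real numbers come from MediaWiki’s coverage reports.

```python
import heapq


def top_n_crap(snapshot: dict[str, float], n: int = 10) -> list[tuple[str, float]]:
    """Hypothetical helper: the n classes with the highest CRAP index."""
    return heapq.nlargest(n, snapshot.items(), key=lambda item: item[1])


def summarize_snapshot(snapshot: dict[str, float], n: int = 10) -> tuple[float, float]:
    """Hypothetical helper: (total, average) CRAP of the top n classes."""
    top = top_n_crap(snapshot, n)
    if not top:
        raise ValueError("snapshot contains no classes")
    total = sum(score for _, score in top)
    # The average is simply the total divided by the number of classes,
    # e.g. 2.5 million across 10 classes is 250,000 per class.
    return total, total / len(top)


if __name__ == "__main__":
    # Illustrative placeholder data, not real MediaWiki scores.
    example = {"ClassA": 5.0, "ClassB": 1.0, "ClassC": 3.0}
    print(top_n_crap(example, n=2))
    print(summarize_snapshot(example, n=2))
```

Because the top 10 is recomputed for every snapshot, the set of classes being summed can change over time, which matches classes dropping in and out of the graph.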

There is still a long way to go, but it will be interesting to see what this looks like in another year.