Refactoring around WatchedItem in MediaWiki

May 3, 2016 0 By addshore

The refactoring started as part of [RFC] Expiring watch list entries. After an initial draft patch was made touching all of the necessary areas it was decided refactoring first would be a good idea as the change initially spanned many files. It is always good to do things properly ® instead of pushing forward in a hacky way increasing technical debt.

The idea of a WatchedItemStore was created that would remove lots of logic from the WatchedItem class as well as other watchlist database related code that was dotted around the code base such as in API modules and special pages.

The main patches can be seen here.

Moving the logic

Firstly logic was removed from WatchedItem with backward compatible fallbacks left behind wrapping the logic in WatchedItemStore. This essentially turned WatchedItem into a value object.

During this stage it was discovered that most of the logic from the class did not actually need access to a full Title object (the kitchen sink of MediaWiki). Recently a TitleValue object had been introduced and this much smaller class provided everything that was needed, but although the two classes have several methods in common, they did not implement a shared interface, thus LinkTarget was introduced.

Secondly queries and logic were extracted from other classes and brought into WatchedItemStore to be shared, but to start with this only included code that exclusively dealt with the watchlist. Code that combines the watchlist and recent changes for example will likely end up living in a different class.

Testing

Testing is important and of course was one of the targets of the refactoring. Prior to the refactoring there was essentially 0% test coverage for watchlist related code. After the refactoring line coverage was roughly 95% with a combination of unit and integration phpunit tests that have been clearly separated.

https://integration.wikimedia.org/cover/mediawiki-core/master/php/includes/WatchedItem.php.html

https://integration.wikimedia.org/cover/mediawiki-core/master/php/includes/WatchedItemStore.php.html

 

This test coverage has been achieved by injecting all services through the constructor of the WatchedItemStore which allows them to be mocked during tests.

Injecting a LoadBalancer instance allows mock Database services to be used and thus during unit tests all DB calls can be asserted while not calling a real DB. This is new and has not really been done in mediawiki core before and a test strategy such as this only really works if integration tests are also in place.

As can be seen in the image above 2 callbacks are also defined in the constructor. These are callbacks to static methods which are hard to mock. For these two static methods the production code calls the callback defined in the class. Phpunit tests can call a method to override this callback with a custom function allowing testing.

Using the ServiceLocator

Recently a ServiceLocator was introduced in MediaWiki. WatchedItemStore will be one of the first classes to use this locator once services can be overridden within tests. This will allow the removal of the singleton pattern from within WatchedItemStore.

Caching review

Basic in-process caching was added to WatchedItemStore as some of the logic extracted from the User class included caching. As a result the importance of this caching is now being investigated and the actuall use of the current caching can be seen at https://grafana.wikimedia.org/dashboard/db/mediawiki-watcheditemstore

Currently roughly 15% to 20% of calls to getWatchedItem result in a cached item being retrieved with the other ~80% causing db hits.

The low number of cache hits is likely due to the fact the cache is currently only a per request cache. More advanced caching would be needed to use a longer term caching allowing tagging of cache items / keys to enable purging.

Findings

The review process worked well throughout all related patches. Generally 2 people created the patches and then a mixture of people put the patches through a few rounds of review before later being merged by one of 3 or 4 people. The review was likely so smooth as the changes generally were just refactoring.

One issue that was run into a few times was TitleValue doing a strict check to make sure that the namespace ID that it is constructed with is an int. The MediaWiki DB abstraction layer will return int columns as strings, thus this caused exceptions in initial versions before int casts were added.

Also while trying to add more advanced caching MediaWiki’s lack of common cache interfaces caused a bit of pain and as a result and RFC has been started about a common interface or potentially using PSR-6. https://phabricator.wikimedia.org/T130528

A similar method of refactoring could probably be applied for much of MediaWiki, particular storage areas.