Small commits

February 24, 2022 0 By addshore

There are many blog posts and articles out there about making small git commits. I’m sure most people (including me) bring up the same few topics around why small commits are good and why we should all probably be making smaller commits.

In this post, I’ll look at some of the key topics from my perspective, and try to tie these topics to concrete examples from repositories that I have worked on. The topics are in no particular order, so be sure to give them all a read.

One thing to note is that “small” doesn’t necessarily mean small in terms of lines of code. Small here is also relative. Also, small commits can benefit you in many different places, but to stand the test of time they must end up in your main branch.

Git features during development

Git only takes full responsibility for your data when you commit

Commit Often, Perfect Later, Publish Once: Git Best Practices

You can’t take advantage of any of the features that git has to offer unless you make commits. So commit early, and commit often, even if you later squash your changes either to submit them to a single commit based review system such as Gerrit or aim to squash them when merging a pull request.

Some IDEs do retain a history of individual file changes that can help you in some regards here, such as Local History which is part of IntelliJ. But git of course offers similar functionality IF you make commits.

Git commits will enable you to track named versions of your changes, as well as go back and follow your own thought process. You can even share this thought process (the commits) with others. Multiple commits will also allow you to take advantage of things such as revert and diff for parts of your change.

Take the example below, which includes 8 commits, but ultimately only ends up adding ~120 lines of code. The set of commits is a good example and communication tool showing how I personally tackled the problem that I was looking at, even if the resulting value is the ~120 lines of code.

Separation of concerns

Separation of concerns is primarily discussed as a design principle in software development, where generally speaking code should do one thing, and do it well.

All of the benefits that can be seen with code implementations can also be seen with git commits when those commits are small, targetted and deliberate, atomic if you like, just trying to do a single thing. The points below were based on a separation of concerns post by nalexn on Github.

  • More clarity: It’s much easier to understand what is going on in commits, when each commit just does one thing.
  • Better reusability: When if work in situations where you may need to apply a change to multiple branches, or package it up as a patch file, it’s easier to extract what you want with small distinct commits.
  • Better testability: It’s easier to see what behaviours small changes have had by looking at small test changes. Or even be able to see that a particular change is untested, or results in no behaviour change. Tests and continious integration workflows can also be run only for the small commit, rather than many things together.
  • Faster evolution: It’s much easier to get small changes merged into the code base and thus propell the code base forwards. If you tie changes together they may get blocked things that are totally unrelated, and thus not make it into the code base quickly, meaning the code base can’t benefit from the change.
  • It is easier to have simultaneous development by multiple engineers. (There are many sub topics here)

Splitting a change which does many things into smaller changes which each do only one thing can decrease the total complexity associated with accomplishing the same goal.

Writing Reviewable Code

An example of clarity and faster evolution from my work today would be this 5 line change altering some code that is used during the current Wikibase release process. I found that I wanted to make this change while rewriting documentation in another pull request which adds 100 lines and removes 41, however, I created a separate small commit, publishing it for review separately.

At the time of writing this, the small fix is already merged, and the documentation change is still pending. If I had submitted them together neither would have made their way into the codebase yet.

Reusability of commits, testability and clarity really shines through when it comes to maintaining multiple releases of software. For Wikibase we maintain multiple release branches alongside our main branch and often want to backport fixes and small improvements that have been made by developers in the main branch.

An example would be this change that alters how a single SQL query is formed (a small +3 -1 line change). This change appears to have been made alongside a larger change to the code, which is also generally doing other things. As these are separate commits we can see what is happening, the impact that it should have, the fact that the tests all pass in the commits standalone form and we can easily cherry-pick the change onto the release branch, which was done.

Easier review

Your reviewers’ number one challenge is to understand what you’re doing and why, and the way to help someone understand is to manage their cognitive load.

Cracking the code review (Part 2): Make them seem small

Keeping the cognitive load low during code review is important to increase the quality of the review, and also get through the review process faster, thus merging the code faster. You want to avoid reviewers’ minds wandering, and you can do this by keeping things small, and focused, particularly keeping refactorings and behavioural changes separately. The post I quoted above comes in 2 parts (part 1 & part 2) and I highly recommend reading both.

The example of small commits making easy review that I bring is this change of 9 commits that were reviewed and merged into Wikibase in 2020. You can see the relation chain in the screenshot below in the top right.

The first 8 changes here touched a total of 1,600 lines of code. The commits were concise and stated clearly what they were doing, and each commit only did that one thing. For example, remove class X or deprecate thing Y.

The final change still touched around 4,000 lines of code, but everything was much easier for the reviewers to digest, due to small/atomic focused commits. Having the commits combined would have given reviewers more things to think about at once, prolonged review and prevented merging early.

The example may be thousands of lines of code, but the same principles apply with 10 lines of code.

Conflicts, Rebasing & Reverts

The bigger a patch is, the longer review takes, and the more people that are working on any given codebase, the more likely you are to have merge conflicts. Merge conflicts in turn prolong the time before code can be merged, introduce a new round of review and as the size of a change increases the time to resolve conflicts probably increases non-linearly.

So avoid making changes that could be split out into smaller commits together. If you notice some unrelated documentation that has a typo, or some indenting that is wrong, a small bug that could be fixed, or some optimization that could be made, do it in a separate commit!

Conflicts don’t only happen during the review cycle, but also if reverts are needed for any reason, such as a bug in a change that is only found after merging to main has already happened. If a commit is small it will likely be easy to revert, but if it is doing many things as time progresses the revert will become harder, need more manual intervention and likely lead to more issues.

This is also a reason to do cleanup and refactorings around a change before and not after making behaviour changes. If you need to revert a behaviour change that has already been refactored, you’ll likely also need to revert the refactoring. If refactoring came first you can simply revert the behaviour change.

The example I will use while looking at reverts will be a rather large change to MediaWiki core back in 2015 introducing a new feature called catwatch.

The original patch “Enable users to watch category membership changes” (+614 -57) needed to be reverted 6 days later. The revert had a few conflicts that needed to be manually resolved (an opportunity to introduce more errors) before it could be merged.

This patch was later re-applied in a few smaller commits, aiding in review, and also, if needed in the future aiding in future reverts. The patches were:

Though not a perfect example, in the case of a future revert, the behavioural change was kept to the third commit, and only this one commit would need to be reverted if there was an issue.

Teamwork & Future you

The commits that end up being merged into a repository are only kept around to help the group of people working on a codebase, and that includes future you. A lot of other git principles come into play here, such as writing meaningful commit messages.

Other folks that were not involved in any code review process, or in writing code that has been merged start with a disadvantage in terms of understanding what has happened and why. This should be apparent from the commit and the commit message, and things should not be hidden beneath the surface. Commits are guaranteed to stand the course of time in your repository, code review and external discussions not so much.

Even if you wrote the code, if you have a reason to look back through weeks, months or even years of code, then the footprints that you leave in the form of commits end up being the main useful signposts. You’ll primarily be looking at the first line of commit messages, and hoping that this message represents what the commit is actually doing.

I can’t find a concrete example link for this, but imagine that you write a commit called “implement feature X”, but this commit also fixes bug Y. If you are looking at commits, you won’t see this hidden change.