Julien Danjou recently posted a thought-provoking rant about GitHub’s pull request workflow implementation. His main point is essentially that Gerrit provides a more sophisticated review system. Of course I’m not going to disagree with that 😉
I am generally a big fan of Julien’s work and I’m very excited for the future of OpenStack Ceilometer of which he is the current PTL. However, in this case I think his views are understandably biased towards review workflows on very large projects like OpenStack, and I found the length of the rant slightly disproportionate to the actual substance of the points made within it. Somewhat ironically, AFAICS his blog’s own review process could use some improvement due to comments currently being disabled 😉 So here are my thoughts.
Disclaimer: currently I use GitHub for reviews more regularly than Gerrit, so my views are likely to be biased at least as much as Julien’s, but in the opposite direction 😉
Barrier to entry too low?
The pull-request system looks like an incredible easy way to contribute to any project hosted on Github. You’re a click away to send your contribution to any software. But the problem is that any worthy contribution isn’t an effort of a single click.
Doing any proper and useful contribution to a software is never done right the first time.
That’s an unhelpful over-generalization. There are plenty of cases where the low barrier to entry of a single click means the difference between a worthy contribution being merged and it never being sent. Fixing typos is the classic example, but there are others – small tweaks to comments and docs, one-line bug fixes etc. I work on several small projects for which setting up Gerrit would be complete overkill, and GitHub’s PR mechanism is a godsend in this regard.
So I’d suggest that one size does not fit all, and the larger the project, the higher the barrier to entry should be.
Preserving rewritten history
The problem is that if your contributor does this [history rewriting via
git rebase --interactive] and then repush the branch composing your pull-request to Github, you will both lose the previous review done, each time. There’s no history on the different versions of the branch that has been pushed.
This is not quite correct. Julien is right to point out that GitHub doesn’t provide history of the commit SHA1s via the web UI or via the API. This is indeed a crucial flaw 🙁 I have not seen any evidence that GitHub exposes anything akin to reflogs, and this flaw aligns with that, so it may be a deliberate design trade-off they decided in order to save on disk space (although that doesn’t sound like a big saving to me).
However, let’s give credit where it’s due: GitHub does provide history of the review discussion, and in particular has quite a nice feature for handling comments on outdated diff hunks. If someone submitted a pull request with (say) two diff hunks, you provided feedback on both, and then they force-pushed new history which only changes one of those hunks, then the pull request page will now only show one of the hunks as outdated, so both the contributor and the reviewer can see that one hunk needs a fresh review, and the other still contains unresolved issues. Here’s an example pull request I just made to illustrate it. GitHub presents a chronological history of the review:
- the first version of the patch is lamentably missing from near the top,
- the final version of the patch is near the bottom, and
- discussion over now-outdated diff hunks are initially collapsed for brevity.
You can click to expand the discussion over points which are outdated. For comparison, here’s the JSON returned by the API for the same view.
Gate testing is very easy to set up on GitHub projects using Travis CI, and integration between the two systems is very nice.
Again, I agree with Julien that the weakness of this approach is that (AFAIK) Travis only runs the tests against the tip of each pull request branch, rather than for each commit within it, and this leaves the project more vulnerable to pull requests which contain broken commits. I am every bit as passionate about history cleanliness as Julien, as you can see from my previous blog on the subject. However, this assertion is inaccurate:
With such a restriction, it’s impossible to have “fixup commits” merged in your project and pollute the history and the testability of the project. […] This implies that no broken patchset can ever sneak in, […]
Even if the tests pass for every commit, that does not prove that the pull request contains no fixup commits. For example, a fixup commit might fix a typo in a comment introduced by a previous commit in the same series. The typo won’t break any tests, but it will pollute the history. So testing every commit is very good (if you can afford it), but it’s not a panacea.
I am an occasional contributor to
git itself, and I find it interesting to note that those folks are absolutely religious about reviewing patches via the mailing list, using
git format-patch and
git send-email, rather than via an external tool such as GitHub or Gerrit. Although the maintainer does an incredibly good job, there are occasions where patches can get accidentally dropped or the wrong version taken, and my personal preference would be to use a more structured system like Gerrit to defend against this. However every system has its strengths and weaknesses, and I’m fairly sure they have good reasons for deliberately choosing not to use Gerrit.
Additionally, Linus absolutely hates GitHub pull requests 😉
Yes, GitHub has some crucial deficiencies in its pull request review workflow. But nevertheless, for small to medium-size projects, what it offers for free can be incredibly useful, and I for one am very grateful.
In the future I hope to use Gerrit more often than GitHub, but I thought it was worth partially defending the underdog here 😉