Squash-merging and other problems with GitHub

By , August 16, 2017 6:45 pm

(Thanks to Ben North, Colleen Murphy, and Nicolas Bock for reviewing earlier drafts of this post.)

In April 2016, GitHub announced a new feature supporting squashing of multiple commits in a PR at merge-time (announced on April 1st, but it was actually bona-fide 😉 ).

I appreciate that there was high demand for this feature (and similarly on GitLab), and apparently that many projects have a “squash before submitting a PR” policy, but I’d like to contend that this is a poor-man’s workaround for the lack of a real solution to the underlying problems.

Why squash-merge?

So what are the underlying problems which made this such a frequently requested feature? From reading the various links above, it seems that by far the biggest motivator is that people frequently submit pull requests (or merge requests, in GitLab-speak) which contain multiple commits, and these commits are seen as too “noisy” / fine-grained. In other words there is a desire to not pollute the target/trunk branch (e.g. master) with these fine-grained commits, and instead only have larger, less fine-grained commits merged.

But where does this desire come from? Well, if the fine-grained commits which accumulate on a PR branch are frequently amendments to earlier commits in the same PR (like “oops, fix typo I just made” or “oops, fix bug I just introduced”) then this desire is entirely understandable, because noone wants to see that kind of mess on master. However the real problem here is that that kind of mess should have never made it onto GitHub in the first place – not even onto a PR branch! It should have instead been fixed in the developer’s local repository. That is why there is a whole section in the “Pro Git” book dedicated to explaining how to rewrite local history, and why git-commit(1) and git-rebase(1) have native support for creating and squashing “fixup” commits into commits which they fix.

Use the force-push, Luke

If an existing PR needs to be amended, make the change and then rewrite local history so that it’s clean. The new version of the branch can then be force-pushed to GitHub via git push -f, which is an operation GitHub understands and in many situations handles reasonably gracefully. I have previously blogged about why this way is better, but one way of quickly summarising it is: don’t wash your dirty linen in public any more than you have to.

Continue reading 'Squash-merging and other problems with GitHub'»


Cloud rearrangement for fun and profit

By , May 17, 2015 4:42 am

In a populated compute cloud, there are several scenarios in which it’s beneficial to be able to rearrange VM guest instances into a different placement across the hypervisor hosts via migration (live or otherwise). These use cases typically fall into three categories:

  1. Rebalancing – spread the VMs evenly across as many physical VM host machines as possible (conceptually similar to vSphere DRS). Example use cases:
  2. Consolidation – condense VMs onto fewer physical VM host machines (conceptually similar to vSphere DPM). Typically involves some degree of defragmentation. Example use cases:
  3. Evacuation – free up physical servers:

Whilst one-shot manual or semi-automatic rearrangement can bring immediate benefits, the biggest wins often come when continual rearrangement is automated. The approaches can also be combined, e.g. first evacuate and/or consolidate, then rebalance on the remaining physical servers.

Other custom rearrangements may be required according to other IT- or business-driven policies, e.g. only rearrange VM instances relating to a specific workload, in order to increase locality of reference, reduce latency, respect availability zones, or facilitate other out-of-band workflows or policies (such as data privacy or other legalities).

In the rest of this post I will expand this topic in the context of OpenStack, talk about the computer science behind it, propose a possible way forward, and offer a working prototype in Python.

If you’re in Vancouver for the OpenStack summit which starts this Monday and you find this post interesting, ping me for a face-to-face chat!

Continue reading 'Cloud rearrangement for fun and profit'»


Tories to limit use of mathematics in amendment to anti-terrorism bill

By , May 9, 2015 3:45 am

Following on from the Conservative Party’s plans to take immediate advantage of their new majority in the House of Commons by pushing through surveillance powers known as the Snoopers’ Charter, the party has announced an amendment to the bill which will make it illegal for anyone to use any form of mathematics not on a government-approved whitelist.

In yesterday’s announcement, Theresa May, who as home secretary led the original legislation, said: “We were disappointed to receive feedback on the original Communications Data Bill from technology experts and civil liberties campaigners who considered it more important for citizens to be able to continue using encryption for non-essential activities like secure online shopping / banking, than for the police to be able to monitor the communications of anyone who could be a terrorist. The country was extremely healthy under John Major’s government in the 1990s before online services such as e-commerce and e-banking even existed, so it is a trivial and easily justifiable sacrifice to replace the freedom to use those services securely with laws creating a powerful deterrent for terrorists, who would face stiff fines and potentially even jail-time if found guilty of using encrypted communications.”

“However, during consultations with the financial sector in the City, we have been advised that banning use of all encryption software would prevent large UK corporations from trading on global markets.”

She continued, “We also discovered that communication can be encrypted non-electronically, for example using simple mathematical techniques on pen and paper, and we cannot in good conscience allow potential terrorists to use these techniques without fear of being arrested and detained for an arbitrary amount of questioning.”

“Therefore the only logical course of action is to amend the bill to ban use of all types of mathematics for which permission has not been explicitly granted by the government. A whitelist will be drafted for the upcoming debate on the bill. In order to avoid any impact on the economy, a special security exception will be made to allow financial institutions to continue using mathematics as before. For ordinary citizens, basic arithmetic will of course be allowed, although in financial contexts some restrictions will be imposed; for example, in the interests of national security, it will be forbidden for the general public to perform calculations relating to any personal expenditure of MPs or peers in the House of Lords.”

David Cameron issued a separate statement reinforcing the Home Secretary’s announcement and also rejecting an opposing argument which highlighted that whilst every year in the UK around 2,000 people die from traffic accidents and 65,000 from heart disease, in the past 5 years there have only been 2 people killed through terrorism. “Terrorism is a rising global threat, and must be countered at any cost, even at the expense of civil liberties and personal privacy”, the newly re-elected Prime Minster said. “If you have nothing to hide, why would you need privacy anyway? Everybody already shares everything on Facebook anyway.”


Why and how to correctly amend GitHub pull requests

By , March 24, 2015 3:00 pm

Like many F/OSS developers, I’m a heavy user of GitHub, collaborating on many projects which use the typical “fork & pull” workflow based on pull requests. The GitHub documentation on pull requests covers this workflow fairly comprehensively, but there seems to be one area which is significantly lacking in detail: why and how to amend existing pull requests. The article simply says:

After your pull request is sent, any new commits pushed to your branch will automatically be added to the pull request. This is especially useful if you need to make more changes.

The problem is that this completely ignores the fact that there are often very good reasons for amending existing commits within the pull request, not just for adding new commits it.

Why amend an existing pull request?

A peer review cycle can potentially reveal many issues which make the pull request unready for merging, e.g.

  • typos
  • bugs in the proposed code changes
  • missing features in the proposed code changes
  • incomplete test coverage
  • incomplete documentation changes
  • style inconsistencies (including whitespace issues)
  • incorrect or incomplete commit messages
  • the commits violate the rule of one logical change per commit
  • some changes are outside the scope of the pull request

This is of course what makes peer review of pull requests so valuable: the problems can be addressed even before they hit the master branch, which helps maintain high quality in the development trunk. But then how do we address the issues?

Continue reading 'Why and how to correctly amend GitHub pull requests'»


Announcing git-deps: commit dependency analysis / visualization tool

By , January 19, 2015 12:15 am

I’m happy to announce a new tool called git-deps which performs automatic analysis and visualization of dependencies between commits in a git repository. Here’s a screencast demonstration!

Back in 2013 I blogged about some tools I wrote which harness the notes feature of git to help with the process of porting commits from one branch to another. These are mostly useful in the cases where porting is more complex than just cherry-picking a small number of commits.

However, even in the case where there are a small number of desired commits, sometimes those commits have hidden dependencies on other commits which you didn’t particularly want, but need to pull in anyway, e.g. in order to avoid conflicts during cherry-picking. Of course those secondary commits may in turn require other commits, and before you know it, you’re in dependency hell, which is only supposed to happen if you’re trying to install Linux packages and it’s still 1998 … but in fact that’s exactly what happened to me at SUSEcon 2013, when I attempted to help a colleague backport a bugfix in OpenStack Nova from the master branch to a stable release branch.

At first sight it looked like it would only require a trivial git cherry-pick, but that immediately revealed conflicts due to related code having changed in master since the release was made. I manually found the underlying commit which the bugfix required by using git blame, and tried another cherry-pick. The same thing happened again. Very soon I found myself in a quagmire of dependencies between commits, with no idea whether the end was in sight.

So wouldn’t it be nice if you could see the dependency tree ahead of time, rather than spending a whole bunch of time resolving unexpected conflicts due to missing dependencies, only to realise that the tree’s way deeper than you expected, and that actually a totally different approach is needed? Well, I thought it would, and so git-deps was born!

In coffee breaks during the ensuing openSUSE conference at the same venue, I feverishly hacked together a prototype and it seemed to work. Then normal life intervened, and no progress was made for another year.

However thanks to SUSE’s generous Hack Week policy, I have had the luxury of being able to spending some of early January 2015 working to bring this tool to the next level. I submitted a Hack Week project page, announced my intentions on the git mailing list, started hacking, missed quite a bit of sleep, and finally recorded the above screencast.

The tool is available here: https://github.com/aspiers/git-deps

Please give it a go and let me know what you think! I’m particularly interested in hearing ideas for use cases I didn’t think of yet, and proposals for integration with other git web front-ends.


Panorama Theme by Themocracy