You’ve must heard about “git merge recursive strategy” which is the default algorithm that git uses when merging two branches.
How does it work and why is it good?
EDIT: If you're interested on merge recursive you will probably enjoy this one too.
EDIT: 2013/10/02 - screencast available at http://youtu.be/Lg1igaCtAck
EDIT: 2014/03/35 - if you arrived here from the mercurial mailing list, then it is worth you check my replies to Matt. He was quite rude with the comments and I replied and wrote a follow up blog post where we not only explain why we do know quite a bit about merging, but also why our merge system is better than Mercurial's.
Basics – elements on a mergeYou’ve got two branches you want to merge. The basic “players” to consider are:
When is “merge recursive” needed?But, sometimes the scenario is not that simple. What if we find “two common ancestors”? The picture below shows how in this example we have two possible “common ancestors”.
While it won’t happen on a daily basis, it is really likely to happen with long lived branches or complex branch topologies (the case depicted above is the shortest one driving to the “multiple ancestor” problem, but it can happen too with several changesets and branches in between the “crossed” merges).
One solution is to “select” one of the ancestors as the valid one for the merge (which is the option Mercurial takes) but as we will see below, it has many drawbacks.
How “merge recursive” works?When more than one valid ancestor is found, the “recursive merge” strategy will create a new unique “virtual ancestor” merging the ones initially found. The following image depicts the algorithm:
A new “ancestor 2” will be used as “ancestor” to merge the “src” and “dst”.
The “merge recursive strategy” is able to find a better solution than just “selecting one of the two” as I’ll describe below.
Why merge recursive is better – a step by step exampleLet me use the following “notation” in thenext example: a file “foo.c” with three lines like these:
b c d
It will be described as: /foo.c = bcd.
Of course, for the sake of simplicity I’ll be using “stupid lines” like “abc” but assume the example is valid for real code too.
Let’s take a look at the following case in the diagram below:
I’ll try to describe it changeset by changeset:
If we now merge “task002” on “main” (cset “7” and cset “6”) what should we get?
We should get the “addition” on “task002” (a new line “a” at the beginning) on top of “7”, isn’t it?
The expected result is: foo.c=abcdE.
We shouldn’t get the line “C” in uppercase as it is on “task002” because we fixed it afterwards on “main”.
As you can see on the diagram, I highlighted the changesets “4” and “2” because they’re the two possible common ancestors from “6” and “7”.
Which one should be choose?
Mercurial will choose “4” because its algorithm chooses the “deepest” ancestor in case that more than one is found.
What happens if we choose “4”?
We will merge 4=bcdE, 6=abCdE and 7=abcdE and the automatic result (by any 3-way merge tool will be):
So, we automatically get foo=abCdE which is WRONG!!
We took “C” instead of “c” due to the wrong ancestor selection.
How recursive merge fixes the mess?As I described above, the first thing “recursive merge” is going to do is to calculate a new “virtual ancestor” merging “4” and “2” as the following picture shows:
The result “cset X” is foo=bCdE.
Later, “cset X” is used as the “ancestor” of “6” and “7” and then we get:
The calculated result takes into account the “fix” done in changeset “5” and hence the result is correct!
Why it is so good?If you have to deal with branching and merging and you don’t have a good merge algorithm, you can end up with broken files without notice!
In short: Git will do it correctly, Hg will break the result, and SVN and others will simply mess up the whole thing.
In Plastic SCM 4.0 we’ve also implemented “merge recursive” algorithm, so we are able to produce the same result (in fact, our algorithm is even more powerful, handling correctly cases that even Git is unable to deal with successfully).