More on recursive merge strategy

January 20, 2012

[Edit - 2013/10/02 - screencast available at http://youtu.be/Lg1igaCtAck]

A few days ago I had the chance to read a comment on my previous blog post about merge recursive pointing to the Mercurial mailing list where Matt Mackall (of Mercurial fame) strongly criticized my explanation of the recursive merge strategy.

Edit: I replied to Matt on the mailing list and things were clarified. Plastic is not a 'git wrapper' as he pointed but a full feature scm, and implementing quite strong merge :P

Well, some were simply about the “rendering style” of plastic’s branch explorer (which is what I used to drive the examples), and others about the explanation itself.

Mercurial’s programmer admits that recursive merge is better although argues that it is only worth in some cases and that its costs to implement is bigger than its benefits. Implementing a recursive merge isn’t cheap, as we known down here at Codice, but you know, the big difference between a great product and an exceptional one normally relies on the upper 2% of features that makes it rock.

Today we'll be covering (I'm writing this with the help of Borja, our merge master) "advantages of recursive merge - Case 1: Only one ancestor is the right option".

Back to the drawing board

Recursive merge is not only good for “criss-cross” merge situations but also for more “regular” ones where merge history simply gets complicated.

I’d like to go back to the original example and explain it now in bigger detail:

We’re going to merge from changeset 5 to changeset 6. In this diagram (another explanation using code will come later) we simply say the “content” of the file is A or B or C, but it applies to lines of code (containing important changes, bug fixes and so on).

DAG format

Just in case anyone has trouble understanding that the green lines are merges and they’re rendered from source to destination, here’s an alternative diagram:

Common ancestors

In order to run a merge, you need to identify the “source”, the “destination” and the nearest common ancestor.

In this case the “src” and “dst” are clear but the problem is the “common ancestor” because there’s more than one candidate.

If you look carefully you’ll see that both changesets “2” and “3” can be taken as common ancestors.

A non-recursive merge algorithm will only use one of them, and depending on the “choice”, the result can be quite different.

Selecting “3” as common ancestor

If the algorithm chooses “3” (and as we explained in our previous post, this is the choice taken by Hg, and also would be the one taken by former plastic’s algorithm previous to “merge recursive” implementation) then the following situation will happen:
It will lead to a manual merge, where the user will be forced to choose.

Selecting “2” as common ancestor – the lucky shot

If the algorithm uses “2” as common ancestor, then the merge will be between content “B”, “B” and “C•”, so, since two contributors are the same, the result will be “C”.
It is the expected result as we will see later, and it is a pretty valid choice, the problem is that there’s no good consistent way to choose “2”.

This time, the result matches the expected result, but sometimes it simply leads to an incorrect automatic choice, which is the real problem and the real reason why “recursive” is implemented by git and plastic.

The recursive option

Again, here’s how the “recursive works”: since there’re two ancestors at the same distance, it will “merge them” creating a new virtual ancestor that will be used as the “base” for the final merge between the merge candidates “5” and “6”:
The following image depicts the first step, the calculation of the “virt” common ancestor (the intermediate one):
And then the second step, where the result is calculated automatically, this time without user intervention:

Where is the huge difference?

Comparing recursive and no-recursive algorithms in this example, you might only see that the recursive is able to avoid a manual conflict because it is able to find out how it was solved before. Is it such a big difference? Just avoiding one manual merge?

Real world has the answer: try a "real" merge involving several hundreds of files and then the answer is clear: usable if you don't have to solve the conflicts vs not usable.

Wrap up

We still have a ton of posts to write to share the particular cases where having recursive merge saves the day, so we’ll try to move forward one by one in the coming weeks.

10 comments :

CarpeDiemEVive said...

Conversation was posted on Reddit today:

http://www.reddit.com/r/programming/comments/ovviu/discussion_on_recursive_merge_hg_vs_git/

Andreas Krey said...

Your first diagram is missing an arrow in comparison to the second. (In the first, the base for a 3-way-merge is pretty obvious.)

pablo said...

@Andreas: which one is missing an arrow? http://3.bp.blogspot.com/-EvhJDUNxc0k/TxiiWdrT-gI/AAAAAAAABQA/kDmQJ_clNAQ/s1600/image00.png Are you sure?

Andreas Krey said...

Well, yes. In the second image (where time goes down), Node 6 is the result of a merge of 2 and 3, but in the first image (where time goes right) there is no arrow from 3 to 6.

pablo said...

@Andreas: should be fixed now.

Maksim Sakałoŭski said...

And what about the situation when there are more than 2 common ancestors? How will be they merged? In what order?

Thanks

The Swan Group said...

Once I understand how a 3-way merge is "smarter" than a 2-way merge, I can understand how a recursive merge can result in a "smarter" "virtual merge" with fewer conflicts; recursive is great! Thanks for your explanations.

redpeas said...

Once I understand "3-way merge" from another SCM blog; I can appreciate how a "recursive merge" results in a "smarter" virtual merge from common ancestors. Great explanation , thanks!

Jorge said...

Hi Pablo,

sorry for resurrecting this old topic, but I think you failed to address Mercurial-guy's only interesting point: What to do when you can't automatically create the virtual ancestor because the merge to create it has conflicts.

Do you have any thoughts on that?

Thanks,
Jorge

pablo said...

Well, I actually addressed some of the mercurial's post mistakes, starting with the assumption that Plastic was a layer on top of Git :-)

The thing is: if there is a merge in the intermediate virtual ancestor Plastic will go and ask you to merge it. And it will do a better job than Git dealing with intermediate file merges, because it won't write the resulting file with several >> marks saying what was modified, but will launch the 3-way merge tool on each intermediate merge.

Taking the wrong ancestor is worse than having to fix some conflicts :-)

Real Time Web Analytics