Who we are

We are the developers of Plastic SCM, a full version control stack (not a Git variant). We work on the strongest branching and merging you can find, and a core that doesn't cringe with huge binaries and repos. We also develop the GUIs, mergetools and everything needed to give you the full version control stack.

If you want to give it a try, download it from here.

We also code SemanticMerge, and the gmaster Git client.

More on recursive merge strategy

Friday, January 20, 2012 Pablo Santos 10 Comments

NB: This article has been updated through out the years. It's the continuation to this initial post explaining how recursive merge works. Last update was done on December 2018.

Example: Only one ancestor is the right option

Recursive merge is not only good for “criss-cross” merge situations but also for more “regular” ones where merge history simply gets complicated. I’d like to go back to the original example and explain it now in bigger detail:
We’re going to merge from changeset 5 to changeset 6. In this diagram (another explanation using code will come later) we simply say the “content” of the file is A or B or C, but it applies to lines of code (containing important changes, bug fixes and so on).

DAG format

Just in case anyone has trouble understanding that the green lines are merges and they’re rendered from source to destination, here’s an alternative diagram:

Common ancestors

In order to run a merge, you need to identify the “source”, the “destination” and the nearest common ancestor. In this case the “src” and “dst” are clear but the problem is the “common ancestor” because there’s more than one candidate.
If you look carefully you’ll see that both changesets “2” and “3” can be taken as common ancestors.
A non-recursive merge algorithm will only use one of them, and depending on the “choice”, the result can be quite different.

Selecting “3” as common ancestor

If the algorithm chooses “3” (and as we explained in our previous post, this is the choice taken by Hg, and also would be the one taken by former plastic’s algorithm previous to “merge recursive” implementation) then the following situation will happen:
It will lead to a manual merge, where the user will be forced to choose.

Selecting “2” as common ancestor – the lucky shot

If the algorithm uses “2” as common ancestor, then the merge will be between content “B”, “B” and “C•”, so, since two contributors are the same, the result will be “C”.
It is the expected result as we will see later, and it is a pretty valid choice, the problem is that there’s no good consistent way to choose “2”. This time, the result matches the expected result, but sometimes it simply leads to an incorrect automatic choice, which is the real problem and the real reason why “recursive” is implemented by git and plastic.

The recursive option

Again, here’s how the “recursive works”: since there’re two ancestors at the same distance, it will “merge them” creating a new virtual ancestor that will be used as the “base” for the final merge between the merge candidates “5” and “6”:
The following image depicts the first step, the calculation of the “virt” common ancestor (the intermediate one):
And then the second step, where the result is calculated automatically, this time without user intervention:

Where is the huge difference?

Comparing recursive and no-recursive algorithms in this example, you might only see that the recursive is able to avoid a manual conflict because it is able to find out how it was solved before. Is it such a big difference? Just avoiding one manual merge? Real world has the answer: try a "real" merge involving several hundreds of files and then the answer is clear: usable if you don't have to solve the conflicts vs not usable.

Wrap up

We still have a ton of posts to write to share the particular cases where having recursive merge saves the day, so we’ll try to move forward one by one in the coming weeks.






Pablo Santos
I'm the CTO and Founder at Códice.
I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
I had the opportunity to see teams from many different industries at work while I helped them improving their version control practices.
I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
And I love simple code. You can reach me at @psluaces.

10 comments:

  1. Conversation was posted on Reddit today:

    http://www.reddit.com/r/programming/comments/ovviu/discussion_on_recursive_merge_hg_vs_git/

    ReplyDelete
  2. Your first diagram is missing an arrow in comparison to the second. (In the first, the base for a 3-way-merge is pretty obvious.)

    ReplyDelete
  3. @Andreas: which one is missing an arrow? http://3.bp.blogspot.com/-EvhJDUNxc0k/TxiiWdrT-gI/AAAAAAAABQA/kDmQJ_clNAQ/s1600/image00.png Are you sure?

    ReplyDelete
  4. Well, yes. In the second image (where time goes down), Node 6 is the result of a merge of 2 and 3, but in the first image (where time goes right) there is no arrow from 3 to 6.

    ReplyDelete
  5. And what about the situation when there are more than 2 common ancestors? How will be they merged? In what order?

    Thanks

    ReplyDelete
  6. Once I understand how a 3-way merge is "smarter" than a 2-way merge, I can understand how a recursive merge can result in a "smarter" "virtual merge" with fewer conflicts; recursive is great! Thanks for your explanations.

    ReplyDelete
  7. Once I understand "3-way merge" from another SCM blog; I can appreciate how a "recursive merge" results in a "smarter" virtual merge from common ancestors. Great explanation , thanks!

    ReplyDelete
  8. Hi Pablo,

    sorry for resurrecting this old topic, but I think you failed to address Mercurial-guy's only interesting point: What to do when you can't automatically create the virtual ancestor because the merge to create it has conflicts.

    Do you have any thoughts on that?

    Thanks,
    Jorge

    ReplyDelete
  9. Well, I actually addressed some of the mercurial's post mistakes, starting with the assumption that Plastic was a layer on top of Git :-)

    The thing is: if there is a merge in the intermediate virtual ancestor Plastic will go and ask you to merge it. And it will do a better job than Git dealing with intermediate file merges, because it won't write the resulting file with several >> marks saying what was modified, but will launch the 3-way merge tool on each intermediate merge.

    Taking the wrong ancestor is worse than having to fix some conflicts :-)

    ReplyDelete