The Plastic SCM blog

Merge recursive strategy

You’ve must heard about “git merge recursive strategy” which is the default algorithm that git uses when merging two branches. How does it work and why is it good?

Basics – elements on a merge

You’ve got two branches you want to merge. The basic “players” to consider are:
  • The “source”: the changeset you’re merging from. Changeset 16 in our sample below.
  • The “destination”: the changeset you’re merging to. 15 in our case.
  • The “ancestor”: the changeset (or commit) which is the nearest parent of the “source” and the “destination”, which is “10” in our case.
    So, we will merge 16 and 15 using 10 as ancestor. We DO need an ancestor to be used in “three-way” merges (more about it here).

    When is “merge recursive” needed?

    But, sometimes the scenario is not that simple. What if we find “two common ancestors”? The picture below shows how in this example we have to possible “common ancestors”.
    Please note: the example is a little bit forced since there’s not a good reason (initially) for the developer merging from changeset 11 into 16 instead of merging from changeset 15 (the latest from the branch “main” at the point of the merge). But let’s assume he has to do it for a reason (like cset:11 was stable and 13 and 15 weren’t at the time, for instance). The point is: between 15 and 16 there’s not a single unique ancestors but two ancestors at the same “distance”: 12 and 11.

    While it won’t happen on a daily basis, it is really likely to happen with long lived branches or complex branch topologies (the case depicted above is the shortest one driving to the “multiple ancestor” problem, but it can happen too with several changesets and branches in between the “crossed” merges).

    One solution is to “select” one of the ancestors as the valid one for the merge (which is the option Mercurial takes) but as we will see below, it has many drawbacks.

    How “merge recursive” works?

    When more than one valid ancestor is found, the “recursive merge” strategy will create a new unique “virtual ancestor” merging the ones initially found. The following image depicts the algorithm:

    A new “ancestor 2” will be used as “ancestor” to merge the “src” and “dst”.

    The “merge recursive strategy” is able to find a better solution than just “selecting one of the two” as I’ll describe below.

    Why merge recursive is better – a step by step example

    Let me use the following “notation” in thenext example: a file “foo.c” with three lines like these:
    b
    c
    d
    

    It will be described as: /foo.c = bcd.

    Of course, for the sake of simplicity I’ll be using “stupid lines” like “abc” but assume the example is valid for real code too.

    Let’s take a look at the following case in the diagram below:

    I’ll try to describe it changeset by changeset:

  • 0 -> we had a file foo.c with content foo=bcd (three lines, first is “b”, second is “c” and third is “d”)
  • 1 -> we edit foo.c on a branch and add a new line so it ends up being foo=bcde
  • 2 -> we modify the second line of the file on “main”. Now the file is foo=bCd
  • 3 -> we create a new branch from cset “2” and add a new line at the beginning so it is now foo.c=abCd
  • 4 -> going back to “task001” we modify the line we just added: foo.c=bcdE
  • 5 -> we “undo” the change we just did on “main”: foo.c=bcd
  • 6 -> we MERGE “4” and “3” and create “6” as foo.c=abCdE (we combine the changes we made on “task002” (adding a new line at the beginning) with the ones in “task001” (adding “E” at the end) and also the change coming from “2” on /main.
  • 7 -> we now MERGED “4” and “5” introducing the change from “4” (last line added) into “main”: foo.c=bcdE

    If we now merge “task002” on “main” (cset “7” and cset “6”) what should we get?

    We should get the “addition” on “task002” (a new line “a” at the beginning) on top of “7”, isn’t it?

    The expected result is: foo.c=abcdE.

    We shouldn’t get the line “C” in uppercase as it is on “task002” because we fixed it afterwards on “main”.

    As you can see on the diagram, I highlighted the changesets “4” and “2” because they’re the two possible common ancestors from “6” and “7”.

    Which one should be choose?

    Mercurial will choose “4” because its algorithm chooses the “deepest” ancestor in case that more than one is found.

    What happens if we choose “4”?

    We will merge 4=bcdE, 6=abCdE and 7=abcdE and the automatic result (by any 3-way merge tool will be):

  • First line -> a (it’s there on 6 and 7)
  • Second line -> b (it’s there on the three contributors)
  • Third line -> C (changed on “6” but unchanged on “4” and “7”)
  • Fourth line -> d (unchanged)
  • Fifth line -> E (unchanged)

    So, we automatically get foo=abCdE which is WRONG!!

    We took “C” instead of “c” due to the wrong ancestor selection.

    How recursive merge fixes the mess?

    As I described above, the first thing “recursive merge” is going to do is to calculate a new “virtual ancestor” merging “4” and “2” as the following picture shows:

    The result “cset X” is foo=bCdE.

    Later, “cset X” is used as the “ancestor” of “6” and “7” and then we get:

  • Ancestor: foo=bCdeE
  • Source: foo=abCdE
  • Destination: foo=bcdE
  • Result: foo=abcdE -> which is what we were looking for!

    The calculated result takes into account the “fix” done in changeset “5” and hence the result is correct!

    Why it is so good?

    If you have to deal with branching and merging and you don’t have a good merge algorithm, you can end up with broken files without notice!

    In short: Git will do it correctly, Hg will break the result, and SVN and others will simply mess up the whole thing.

    In Plastic SCM 4.0 we’ve also implemented “merge recursive” algorithm, so we are able to produce the same result (in fact, our algorithm is even more powerful, handling correctly cases that even Git is unable to deal with successfully).

    More cases

    I focused on “file conflicts” today but it is easy to come up with examples affecting directory structures too. I’ll cover it in a follow up blog post.

    Wrapping up

    Branching and merging are the two weapons you must have in your developer’s toolset… but make sure you have the best possible ones, the ones that really do the job.
  • Plastic SCM release 3.0.30

    It has been a while announcing a blogpost for new 3.0.xx release, however, we have been releasing a bunch of them with bug fixes and usability improvements, and this one will highlight the most important bullets in the current release Plastic SCM 3.0.30.

    Full release notes: http://www.plasticscm.com/download/releasenotes.aspx

    Download page: http://www.plasticscm.com/download/login.aspx

    Third party integration matrix list: http://www.plasticscm.com/infocenter/third-party-compatibility.aspx

    New features:

    Atlassian Bamboo integration:

    Plastic SCM now integrates and includes a plugin to integrate with Atlassian Bamboo Continues Integration (http://www.atlassian.com/software/bamboo/)

    Zutubi Pulse:

    Plastic SCM plugin is updated to support Zutubi Pulse 2.3

    Eclipse plugin:

    A checkbox has been added to include changed items on the sync view and the commit dialog.




    Mergetool standalone:

    A request from our Plastic SCM community to add Drag&Drop into our mergetool.exe when executed as a standalone diff / merge tool.



    Filtering dialog Cloacked / ignored is sizable:

    Now the dialog is sizable, so users can see long rules.



    Support for big files > 2 Gb

    When a user checked in a file bigger than 2GB, the update failed. It did not download the entire file. There was an overflow with the size of the file.


    Improvements and bug fixes (only most important mentioned below):

    Visual Studio integration:

    The Plastic SCM plugin for visual studio has been reviewed a couple of time since we discovered that the support for solution folders added in release 3.0.19 needed to be revisited.

    Duplicate project entries in Pending changes view.

    Duplicate solution files .sln in Pending changes view

    Fixed an invalid cast exception when trying to select items in the Solution Explorer of Visual Studio 2010. This bug happened occasionally.

    When having a Windows Presentation Foundation project some checked-out files were not shown in the Pending changes view. The offline checkouts were not restored when Visual Studio was restarted.

    SQL server database backend:

    If during a label application a SQL command timeout fires, the rollback operation was not correctly performed, It could cause inconsistency in the databases between the tables marker realizations and object table.

    Installer for MAC:

    There was a problem when trying to start up the server in some configurations, related to the setup of the /Library/Startup directory.

    Installer for Plastic SCM Proxy:

    Proxy installer: problem solved related to a library not included in the installer.

    Plastic SCM Client / server version check:

    Compatibility check between client and server was not correctly handled
    when the authentication mode was LDAP. It has been fixed and improved.

    Text / binary filetypes recognition:

    When a text based file "an example: makefile" was read-only, the detection was binary, this has been fixed.

    cm help:

    An error in the cm help label fixed: to apply a label recursively from the current directory the command is: "cm lb lb:LB001 . -R".


    And many other bug fixes and improvements, review release notes for full details.

    We hope you enjoy this release.

    Codice Software team

    Plastic SCM new forum - online!

    We are pleased to announce that we migrated our old forum which was based on "YAF - Yet Another Forum" to a much better one (IP.BOARD) from www.invisionpower.com .

    The old link should be the same, since we redirect to the new forum, but just in case your link is still redirecting you to the old - non existent forum, try inserting http://www.plasticscm.net and update your bookmark.

    Users + password have been migrated, but with the new system, generating "requesting" a new password is a 1 minute job, fully automated.

    We hope you enjoy the new forum as we do, and feel free to share your ideas there with the Plastic SCM growing community.

    regards,
    miller

    Real Time Web Analytics