The Plastic SCM blog

Active Directory - Work with huge AD trees

Recently we ran into an issue when setting the system authentication mode to Active Directory: the Active Directory tree contained more than 3000 entries.

By default, Active Directory allows fetching only 1000 entries for each search request to the Active Directory domain. The main reasons for this limit are security and performance.

If a search query to Active Directory returns more than 1000 results, Active Directory throws an exception (sizelimitexception, LDAP error code 4).
Plastic SCM catches the exception and shows the following warning message:




At this point, there are three possible solutions to get a result:

  1. Close the warning message and specify a filter in the "Filter" textbox of the User Selection dialog.
    Plastic SCM will then apply the filter to the query sent to the Active Directory domain, which will retrieve fewer results than before.

    Constraints:
    - The specified filter has to return fewer than 1000 results.

  2. Specify a subdomain in the Plastic SCM server configuration instead of the entire Active Directory domain.
    The Plastic SCM server will then query only for users and groups from the specified subdomain.

    Example:
    If your Plastic SCM server's Active Directory domain is currently set to:
    "mycompany.com" (or "dc=mycompany,dc=com")
    Change it to:
    "developers.mycompany.com" (or "dc=developers,dc=mycompany,dc=com")
    You can make this change through the server configuration wizard.




    Constraints:
    - All the Plastic SCM users must be contained in that subdomain.
    - The list of users and groups in the subdomain must contain fewer than 1000 entries.

  3. Change the Active Directory limit. You can do that by following this guide from the Microsoft Knowledge Base: http://support.microsoft.com/kb/315071
    (Sections: "Starting Ntdsutil.exe", "Viewing current policy settings" and "Modifying policy settings").

    In short, the steps are the following:
    • Run "Ntdsutil.exe" on the Active Directory machine.
    • At the "Ntdsutil.exe" command prompt, type "LDAP policies"
    • At the "LDAP policy" command prompt, type "connections"
    • At the "server connection" command prompt, type "connect to server MYHOST.mydomain.com"
      Examples:
      "connect to server localhost"
      "connect to server ldapserver.archgroup.com"
    • At the "server connection" command prompt, type "q"
    • At the "LDAP policy" command prompt, type "Set MaxPageSize to NEW_VALUE"
      Example: "Set MaxPageSize to 3000"
    • At the "LDAP policy" command prompt, type "Commit Changes"
    • At the "LDAP policy" command prompt, type "q"
    • At the "Ntdsutil.exe" command prompt, type "q"
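
Options 2 and 3 above can be sketched in a few lines of Python. This is only an illustration: the helper names `domain_to_base_dn` and `ntdsutil_commands` are made up for this post and are not part of Plastic SCM or of Ntdsutil.exe. The first one turns a DNS-style domain into the "dc=..." form shown in the example, and the second one lists, in order, the lines you would type during the Ntdsutil.exe session.

```python
from typing import List


def domain_to_base_dn(domain: str) -> str:
    """Turn a DNS-style domain into its "dc=...,dc=..." form (hypothetical helper)."""
    labels = [label for label in domain.strip().split(".") if label]
    return ",".join(f"dc={label}" for label in labels)


def ntdsutil_commands(server: str, max_page_size: int) -> List[str]:
    """Return the lines typed, in order, during the Ntdsutil.exe session.

    Hypothetical helper: it only builds the text of the steps listed above,
    it does not run Ntdsutil.exe.
    """
    if max_page_size <= 0:
        raise ValueError("max_page_size must be positive")
    return [
        "LDAP policies",                          # Ntdsutil.exe prompt
        "connections",                            # LDAP policy prompt
        f"connect to server {server}",            # server connection prompt
        "q",                                      # back to the LDAP policy prompt
        f"Set MaxPageSize to {max_page_size}",
        "Commit Changes",
        "q",                                      # leave the LDAP policy prompt
        "q",                                      # leave Ntdsutil.exe
    ]


if __name__ == "__main__":
    print(domain_to_base_dn("developers.mycompany.com"))
    for line in ntdsutil_commands("localhost", 3000):
        print(line)
```

Keeping the sequence in one place makes it easy to review the new limit with your domain administrator before anybody types it on the Active Directory machine.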


Move detection – advanced bits

Move detection is one of the big features in 4.0, as I’m sure you’re already aware.

It has been implemented on top of the same underlying technology we use for Xdiff and Xmerge.

The point is: you just move a file in your workspace without issuing a “cm mv” operation (command line, GUI or through a plugin), and later Plastic is able to “detect” that the move happened.

How it works

The principles of move detection are quite simple: Plastic has a list of the files that are “controlled” in the workspace (stored in the .plastic/plastic.wktree file in the workspace). Then you decide to look for changes:
  • If a file on the workspace is not on the list: then it is proposed as an “added” candidate
  • If a file is on the list but not on the workspace: then it is proposed as a “deleted” candidate

    How are the “moves” detected? The “added candidates” are matched against the “deleted candidates”, and if they’re “similar enough” they’re proposed as “moved”.
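
    The matching idea can be sketched roughly like this. It is not Plastic's actual implementation (which sits on top of the Xdiff/Xmerge technology): Python's `difflib.SequenceMatcher` is used here as a stand-in similarity measure, the 0.9 default threshold is only an illustrative value, and it assumes the content of each deleted candidate is still available from its last controlled revision.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Similarity ratio between 0.0 and 1.0 (stand-in for the real algorithm)."""
    return SequenceMatcher(None, old_text, new_text, autojunk=False).ratio()


def detect_moves(deleted, added, threshold=0.9):
    """Pair deleted candidates with added candidates that are similar enough.

    deleted, added: dicts mapping a path to the file content.
    Returns (moves, still_deleted, still_added).
    """
    moves = []
    still_deleted = []
    remaining_added = dict(added)
    for old_path, old_text in deleted.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in remaining_added.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            # Similar enough: propose it as a move instead of delete + add.
            moves.append((old_path, best_path, best_score))
            del remaining_added[best_path]
        else:
            still_deleted.append(old_path)
    return moves, still_deleted, sorted(remaining_added)


if __name__ == "__main__":
    source = "int main() { return 0; }"
    print(detect_moves({"foo.c": source}, {"bar.c": source, "notes.txt": "todo"}))
```

    Lowering the `threshold` argument plays the same role as relaxing the “similar enough” rule: more added/deleted pairs get proposed as moves.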

    Let’s make it more complicated

    What if foo.c is something like the following?

    And we rename it to bar.c and modify it this way:

    The file was so small that this little change makes the two versions less than 90% similar, so the “pending changes” view will look like this:

    As you can see, Plastic detects a “potential add” and a “potential delete”.

    Matching manually

    Right click on the “potentially added” file and select “search matches”:

    And then the “matching” dialog will show up. You can slide the similarity bar until the candidate appears:

    And once you “accept the selected match”, the “pending changes view” will reflect the move:
  • plastic 700 sec – git 1200 sec – a c# development story

    The short story: take a code tree of 192,818 files in 33,877 directories (overall size: 5.75 GB) and check it in to your favorite version control tool. Plastic SCM needs 713 secs, Git needs 1287 secs. Yes, we’re faster than Git!!! And yes: a C# program can outperform a well-written C program by 44%!!! (OK, we’d be running circles around “gitty” had we chosen C++ :P)
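
    For the curious, here is the arithmetic behind the 44% figure as a tiny Python snippet. The only inputs are the two timings quoted above; the variable names are just for illustration. The figure corresponds to the share of Git's time that Plastic saves (574 of 1287 seconds).

```python
plastic_secs = 713
git_secs = 1287

# Share of Git's total time that Plastic saves on the same checkin.
time_saved = (git_secs - plastic_secs) / git_secs

# How many times longer Git takes compared to Plastic.
slowdown = git_secs / plastic_secs

print(f"Plastic needs {time_saved:.1%} less time than Git")
print(f"Git takes {slowdown:.2f}x as long as Plastic")
```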

    The story – the beginning

    We started plastic scm back in 2005 and we decided to go for C# because it was much faster to develop with than C/C++. I missed C++ for a while but “.net remoting” (I was a DCOM fan) changed my mind.

    We only chose C# because Mono existed. A true SCM must be multiplatform. Mono was there, so we went for C#. The first time we added a source tree to plastic was in September 2005 or so. It was the quake source code: 1200 files (about 30 MB). It took 11 hours to complete. (Yes, you read correctly, eleven hours!!!)

    Then we took NHibernate out of the picture and developed the “sql” datalayer we’re still using today, with a ton of improvements (look for the datalayersql assembly when you download plastic), and things started to speed up.

    We released Plastic SCM 1.0 back in November 2006 at TechEd Developers in Barcelona. It wasn’t the fastest thing on earth but it already had some of the best branching and merging in town. (Check this for some historical plastic scm photos)

    You have to pay for your mistakes

    And we did. The first thing was the design of the “communication layer” between the client and the server. We came up with a neat and supposedly well designed set of interfaces. Thanks to Mono.Remoting it was all like “invoking local methods”. Isn’t it good?

    NO.

    Interface Oriented Design greatly explains why. Initially (for newbies) it can sound much better to have something like this:

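    // One call, and so one network round trip, per file.
    interface CheckinInterface
    {
        void Checkin(File file);
    }

    Than this:
    // One call for the whole set of files.
    interface CheckinInterface
    {
        void Checkin(File[] files);
    }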
    

    But it is simply wrong. Over the network, the fewer round trips, the better.
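
    To make the difference concrete, here is a minimal Python sketch that just counts round trips. `FakeServer` is a hypothetical in-process stand-in (no real network, no real Plastic API), and the block size of 1000 is an arbitrary value chosen for the example.

```python
class FakeServer:
    """Hypothetical stand-in for a remote server; every call is one round trip."""

    def __init__(self):
        self.round_trips = 0
        self.stored = []

    def checkin_one(self, path):
        self.round_trips += 1
        self.stored.append(path)

    def checkin_many(self, paths):
        self.round_trips += 1
        self.stored.extend(paths)


def checkin_individually(server, paths):
    """One remote call per file."""
    for path in paths:
        server.checkin_one(path)


def checkin_in_blocks(server, paths, block_size=1000):
    """One remote call per block of files."""
    for start in range(0, len(paths), block_size):
        server.checkin_many(paths[start:start + block_size])


if __name__ == "__main__":
    files = [f"src/file{i}.c" for i in range(2500)]
    one_by_one, blocks = FakeServer(), FakeServer()
    checkin_individually(one_by_one, files)
    checkin_in_blocks(blocks, files)
    print(one_by_one.round_trips, blocks.round_trips)
```

    Both servers end up with the same files; what changes is how many times the client has to wait for the network.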

    Of course it wasn’t as simple as that: it meant really redesigning most operations to work “block based” instead of individually, something we finally set in stone back in Nov 2010 when we started Plastic SCM 4.0.

    Now every data transfer is minimized (and there’s still work to do) and ready to work with “bulks”. So, the bigger the checkin op is, the faster we are compared to other systems out there.

    Being faster than Git

    At the beginning we wanted to be faster than Perforce and so we did: our “update” (downloading the code to a workspace, “checkout” in git/svn jargon) was faster than competitors long ago.

    But the folks at Perforce dropped their “fast scm motto” once Git became mainstream (now they’re at a different party), and beating Perforce wasn’t fun anymore.

    We focused on scalability due to business requirements for a while (and we still do!), but beating all competitors in a single “speed” test was sort of a goal.

    Changes on the database backend

    Plastic stores all data and metadata in a database: it can be MySQL, Firebird, SQL Server, SQL Server CE, Oracle and now also PostgreSQL.

    In order to speed up we had to dramatically reduce the number of data transfers to the database. In SQL Server we did that using “bulk copy” selectively when possible (a huge checkin will activate it), so 8 months ago we were consistently beating git with the same data tree using SQL Server…

    Yes!! It is possible to insert a tree of 200k items into SQL Server (using the network stack and everything) faster than putting it in git’s hidden directory (ok, database :P).
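
    The general idea of cutting down the number of data transfers to the database can be sketched with Python's built-in `sqlite3` module. This is not the SQL Server “bulk copy” path Plastic uses; it is a stand-in that contrasts one statement and one commit per row with a single `executemany` call inside one transaction. The `items` table and its columns are made up for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical 'items' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (path TEXT PRIMARY KEY, size INTEGER)")
    return conn


def insert_row_by_row(conn, rows):
    """One statement and one commit per row: many small transfers."""
    for row in rows:
        conn.execute("INSERT INTO items (path, size) VALUES (?, ?)", row)
        conn.commit()


def insert_in_bulk(conn, rows):
    """All rows in a single executemany call inside one transaction."""
    with conn:
        conn.executemany("INSERT INTO items (path, size) VALUES (?, ?)", rows)


def count_items(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


if __name__ == "__main__":
    rows = [(f"dir/file{i}.c", i) for i in range(5000)]
    slow, fast = make_db(), make_db()
    insert_row_by_row(slow, rows)
    insert_in_bulk(fast, rows)
    print(count_items(slow), count_items(fast))
```

    Both databases end up with the same rows; the bulk version simply talks to the database far fewer times.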

    The current test

    The current test I’m writing about today was performed using:
  • Plastic 4.0.237.2 (sqlite backend)
  • git -> 1.7.8.msysgit.0

    We’re running on Windows 7 on a DELL XPS 13 laptop (2-3 years old) with 8GB RAM.

    We’re using the sqlite backend, which is very good for distributed usage (I’ve been using it for more than 2 years now on my laptop) but doesn’t work well with concurrency.

    Future steps

    We have always written our data and metadata to SQL databases. They work simply great, even faster than the file system (http://codicesoftware.blogspot.com/2008/09/firebird-is-faster-than-filesystem-for.html) under certain circumstances.

    But we’re considering a file system based backend (custom, closed, sort of what git/hg do today) to be used by distributed developers (the main server can still be on SQL Server or MySQL) and speed up some operations even more… (if possible! :P)

  • How to link repositories using Xlinks

    Xlinks are one of the coolest new features in Plastic SCM 4.0.

    They're basically a way to link different "trees" together so that a "project" repository can have "versioned links" to other component repositories and then just switching to a branch on the "project" repo will set up all the code for you.

    For 3.0 users: instead of dealing with complex multi-repository selectors, Xlinks give you a way to just link to repos and download them easily (developers on your team won't have to copy/paste complex selectors, just switch to the right branches).

    For Git users: Xlinks are "submodules" done right!

    I’m going to show you how easy it is to work with multiple repositories at the same time using Xlinks.

    Scenario

    We have several repositories, one for the source code, one for marketing stuff and the last one for third party tools.
    Core

    ThirdParty

    Marketing


    Motivation

    It’s a good idea to keep your information divided into isolated repositories, since they are independent sub-projects. But they are still part of the same project, so they will often need to evolve together and share their content. For example, the Core repository needs the ThirdParty repository to achieve a successful build.

    How can we mix these repositories and work with all of them at the same time? Xlinks.

    Solution

    We have to create a new repository that will manage the Xlinks to the external repositories. This will be our “Project” repository.

    Create a new workspace working against the new repository and make three new directories; this will be the skeleton for the final structure.




    Now we create the Xlinks to our external repositories. You can find more info about how to create them by issuing “cm xlink --help”, but basically you just need to type ‘--w’ if you want to create a writable Xlink, then the path where the external repository content is going to be placed, the path of the external repository that is going to be mounted and, finally, the starting changeset of the external repository.
    Sub-root paths will be supported in the future, but for now you can only use the root path “/”.
    Now check ‘em all in!!! You can use the pending changes view to do it.
    Once everything is inside the repository, your Items view will look like the image below. Notice the small arrows over the icons: they mean that the items are Xlinks to external repositories.
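
    As a rough sketch, the argument order described above can be written down as a small Python helper. `build_xlink_command` is a hypothetical name, the flag spelling is copied from this post as-is, and the changeset value is a placeholder because the exact changeset spec format is not shown here. Treat the output as an assumption and check “cm xlink --help” before running anything.

```python
def build_xlink_command(mount_path, changeset, writable=True, external_path="/"):
    """Assemble 'cm xlink' arguments in the order described in the post.

    Hypothetical helper: it only builds the argument list, it does not run cm.
    """
    if external_path != "/":
        # Only the root path of the external repository can be mounted for now.
        raise ValueError("only the root path '/' is supported for now")
    args = ["cm", "xlink"]
    if writable:
        args.append("--w")  # flag spelling as written in this post (assumption)
    args += [mount_path, external_path, changeset]
    return args


if __name__ == "__main__":
    for name in ("Core", "ThirdParty", "Marketing"):
        # The last argument is a placeholder, not a real changeset spec.
        print(" ".join(build_xlink_command(name, f"<starting changeset of {name}>")))
```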


    The last step to finally get the external content is to perform an update operation; you can do it by right-clicking on the workspace root item and selecting Update.

    One repository to rule them all, One repository to find them.
    One repository to bring them all and in the darkness bind them.

    Extra bonus

    Do you remember that we created the Xlinks with “--w”? That means they are writable! You can modify items under an Xlink: remove, move, merge, create branches, basically EVERYTHING.
    With writable Xlinks you can evolve the 4 repositories at the same time. Try this: create a new branch on your Project repository and switch to it. Check out a file inside an external repository, modify it a little bit and finally check it in. Plastic SCM will automatically create a new branch on the remote repository to keep the change! And you don’t have to worry about anything: it simply works!