The Plastic SCM blog

sgen is getting ready for production

Last week I was running some tests together with Mark Probst on our test cluster using Plastic SCM and the new sgen garbage collector.

The load test consists of the following:

  • Each client downloads the entire project (runs an update from version control)
  • Creates a branch
  • Modifies 5 files
  • Checks in all changes
  • Goes back to step 1 (up to 5 times)

    So, we ran 85 clients on 85 different machines against a single Plastic SCM server running MySQL on Linux and using Mono + sgen.

    The result: right now sgen is only 15% slower than Boehm at best (568 vs. 494 seconds, with the largest nursery size we tested).


    As the table below shows, time improves as the nursery size grows, but the biggest gain is in overall memory usage: peak virtual memory is much lower, and when we checked at the end of the test, resident (RES) memory was also much, much lower. Memory consumption is lower during the test too, and you can see sgen freeing virtual memory (something you'll never see with Boehm, which ends up being a big problem).









    version (85 concurrent clients)      time (sec)    Peak VM (Gb)    RES and GC (final)
    boehm gc                             494           2.4             600
    MONO_GC_PARAMS=nursery-size=4m       1007          0.7             200
    MONO_GC_PARAMS=nursery-size=16m      640           0.8             200
    MONO_GC_PARAMS=nursery-size=32m      589           0.9             200
    MONO_GC_PARAMS=nursery-size=256m     568           1.1             200
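
    To make the "15% slower" figure easy to check, here is a small Python sketch that computes how much slower each sgen run was than the Boehm run. The times are read from the results table above, assuming its second column is the time in seconds; the function and variable names are made up for this illustration.

```python
# Times in seconds, read from the results table (85 concurrent clients).
BOEHM_SECONDS = 494
SGEN_SECONDS_BY_NURSERY = {"4m": 1007, "16m": 640, "32m": 589, "256m": 568}


def slowdown_percent(seconds, baseline_seconds):
    """Return how much slower `seconds` is than the baseline, in percent."""
    return (seconds / baseline_seconds - 1.0) * 100.0


def slowdown_report():
    """Map each nursery size to its slowdown versus Boehm, rounded to 0.1%."""
    return {
        nursery: round(slowdown_percent(seconds, BOEHM_SECONDS), 1)
        for nursery, seconds in SGEN_SECONDS_BY_NURSERY.items()
    }


if __name__ == "__main__":
    for nursery, percent in slowdown_report().items():
        print(f"nursery-size={nursery}: {percent}% slower than Boehm")
```

    With these numbers the 4m nursery takes about twice as long as Boehm, while the 256m nursery comes in at roughly 15% slower.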

    Plastic SCM proxy server explained

    One of the new features we introduced with the 2.9 release is the Proxy Server. As you know Plastic is all about flexibility, so it can behave as a DVCS or as a centralized system.

    When you run Plastic in centralized mode, especially on wide area networks or across VPNs, you’ll be hit by network issues: latency, slowdowns, connection problems… You then have two options: you can use the distributed system to avoid being hit by the network (setting up a local server at your office that communicates with the central one, thus avoiding a huge number of roundtrips), or you can set up a proxy server to greatly reduce network traffic and improve performance.

    Depending on your own circumstances, preferences, network resources and so on, you can go for one or the other. At the end of the day, what we try to offer is a good set of options so you can choose.

    How the proxy server works


    The proxy server works in a pretty straightforward way: it simply caches revision data (file data actually) to make it available to clients so that they don’t have to go and query the central server. It greatly reduces network usage since normally data transfers (more than metadata) generate most of the daily traffic.

    To use the proxy server, the clients need to be specifically configured (a detailed explanation follows later). Every time they need data, they ask the proxy server, which makes the call on their behalf, handles concurrent requests for the same revision so the data is retrieved only once (reducing data traffic), and stores the data locally (in a pre-configured cache directory) before returning it to the client.
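
    The "retrieve only once" behaviour can be sketched in a few lines of Python. This is a toy in-memory model, not Plastic SCM code: the class name, the `fetch_from_server` callable and the in-memory dictionary (standing in for the cache directory) are all assumptions made for the illustration.

```python
import threading


class CachingProxy:
    """Toy model of a caching proxy (hypothetical, not the real implementation)."""

    def __init__(self, fetch_from_server):
        # fetch_from_server(server, repository, revision_id) -> bytes
        self._fetch = fetch_from_server
        self._cache = {}       # key -> revision data already retrieved
        self._in_flight = {}   # key -> Event set when the pending fetch ends
        self._lock = threading.Lock()

    def get(self, server, repository, revision_id):
        key = (server, repository, revision_id)
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]          # cache hit
                event = self._in_flight.get(key)
                is_fetcher = event is None
                if is_fetcher:
                    # First requester for this revision: it does the fetch.
                    event = threading.Event()
                    self._in_flight[key] = event
            if is_fetcher:
                try:
                    data = self._fetch(server, repository, revision_id)
                    with self._lock:
                        self._cache[key] = data
                    return data
                finally:
                    # Wake the waiters whether the fetch worked or failed.
                    with self._lock:
                        del self._in_flight[key]
                    event.set()
            # Someone else is fetching: wait, then loop to read the cache
            # (or to retry the fetch if the first attempt failed).
            event.wait()
```

    If eight clients ask for the same revision at the same moment, only the first one triggers a call to the central server; the other seven wait and are served from the cache.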

    The proxy doesn’t need any configuration since:
  • It doesn’t know about servers in advance; it just receives requests from the clients and connects to the specific servers on their behalf, using the same credentials the client does.
  • There’s no specific preload operation: to trigger a preload, simply run an “update forced” on an existing workspace or a regular update on a new one (forcing the data to be downloaded).
  • Currently there’s no limit on the maximum cache size, but all data is stored in a single directory, so it’s straightforward to remove data if it grows too large.

    Data is stored by server and repository (a different directory for each server and then a directory for each repository).

    The following figure shows how the basic communication flow works and how data is arranged inside the proxy server data location.


    The next graphic explains how the individual calls requesting data for revisions are handled by the proxy server which will cache the received data after calling the repository server.


    And the same principle will apply when scenarios get more complicated and instead of a single server and repository there are several servers and repositories involved.


    What happens if the proxy server goes down?


    Currently the mechanism we’ve implemented is also pretty transparent: if the proxy server goes down (or you shut it down), the client will detect it (the network connection will fail) and will contact the real repository server directly. It will log the event for diagnostic purposes. Once a client detects that the proxy is down, it won’t use the proxy server again until the client itself is restarted.
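
    A minimal Python sketch of that fallback logic, assuming the two fetch operations are passed in as plain callables and that a failed proxy connection surfaces as a `ConnectionError`. The class and parameter names are invented for the example and do not come from the Plastic SCM client.

```python
class RevisionClient:
    """Toy client that prefers the proxy and falls back to the real server."""

    def __init__(self, fetch_via_proxy, fetch_direct, log=print):
        self._fetch_via_proxy = fetch_via_proxy   # callable(revision_id) -> bytes
        self._fetch_direct = fetch_direct         # callable(revision_id) -> bytes
        self._log = log
        # Lives as long as this object: the proxy is only tried again
        # after the client is restarted (a new instance is created).
        self._proxy_down = False

    def get_revision(self, revision_id):
        if not self._proxy_down:
            try:
                return self._fetch_via_proxy(revision_id)
            except ConnectionError as error:
                self._proxy_down = True
                self._log(f"proxy unreachable, contacting server directly: {error}")
        return self._fetch_direct(revision_id)
```

    After the first failure the client stops paying the cost of a failed connection attempt on every request, at the price of not noticing when the proxy comes back until the next restart.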

    Installing a proxy server


    Installing a proxy server is pretty straightforward on Windows, Linux and Mac OS X. You just have to get the installer and follow the steps. In fact, it will only ask you for a directory in which to store the cached data, and that’s all.
    The configuration will be saved in a plasticcached.conf file with a single entry for the directory mentioned above.


    Configuring a proxy server on the client side


    There’s only one simple change to make on the clients: run the configuration wizard (from the GUI preferences option or by running plastic --configure) and set the right proxy server.


    What a typical proxy server setup looks like


    The initial situation before you set up a proxy server will be something like the following.


    The network traffic (in red) is too high and clients are slowed down. To solve it you can set up a couple of proxy servers, one on each LAN.



    Now the data traffic will be local and performance will get much better.

    Performance benchmark


    OK, so far I’ve been saying that performance gets better on centralized setups when you introduce a proxy server, but I haven’t shared any data about how much better it actually gets.

    We run load tests on a cluster to check and improve Plastic SCM performance, and this time we focused on finding out how to reduce network traffic by using proxy servers.

    We use the following configuration: 4 different networks where computers are connected through a gigabit connection, and one central server connected to the different sub-networks through a 100Mbps connection (which is the actual limiting factor). In total we use 71 concurrent clients.

    We use a very simple repository where a single copy consists of 25k files and about 3k directories, with a total size of 300 MB.

    The test itself is very simple:
  • Every client will create a workspace and download the latest copy of the main branch (trunk in SVN jargon)
  • Then the client will create a branch, switch to it and modify a total of 10 files on it.
  • The client will repeat the process (going back to step 2) 5 times.

    The following figure depicts the network layout and the machines at each lab (CPU, total bogomips of each node and RAM).



    Then we run the test with and without proxy servers and compare the results.

    server os                       time (min)    Gb Sent    Gb Recv
    Linux 64bits + proxy servers    10.73         2.10       0.26
    Linux 64bits                    30.17         16.85      0.30



    As you can see, in this very simple example we roughly tripled overall performance by introducing proxy servers (the total time drops from about 30 minutes to under 11). The actual number of proxy servers and their configuration will depend on your network topology; we tested with 4 proxy servers because we’re using 4 networks.
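
    For reference, the gains can be computed directly from the benchmark table. The short Python sketch below uses the figures reported above (written with a decimal point); the dictionary keys and function name are made up for the example.

```python
# Figures from the benchmark table (71 concurrent clients).
WITH_PROXY = {"minutes": 10.73, "gb_sent": 2.10, "gb_recv": 0.26}
WITHOUT_PROXY = {"minutes": 30.17, "gb_sent": 16.85, "gb_recv": 0.30}


def proxy_gains():
    """Return the without-proxy / with-proxy ratio for each metric."""
    return {
        metric: round(WITHOUT_PROXY[metric] / WITH_PROXY[metric], 2)
        for metric in WITH_PROXY
    }


if __name__ == "__main__":
    for metric, factor in proxy_gains().items():
        print(f"{metric}: {factor}x lower with proxy servers")
```

    The time ratio works out to about 2.8, and the data sent by the central server drops by a factor of about 8, which is where most of the improvement comes from given the 100Mbps link.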

    New style for Plastic SCM 3.0

    The launch date for Plastic SCM 3.0 is getting closer and together with a full pack of new features and improvements (not yet public) we're going to apply some changes to the main GUI design.

    David just sent me a few screenshots with several alternatives.

    The first one makes a subtle change on the "workspace information area" to include details about the current task (pulled from the issue tracking system you're using, you know: Bugzilla, Mantis, Jira, OnTime, Rally and so on...).



    The second one keeps the previous concept but resolves one request we've often heard from users: reduce the space above the 'view area'.



    It also highlights the idea of 'tabs'.

    I personally prefer the second one but I'd like to see the 'top area' even more reduced so the 'views' can get more screen space.