Distributed and proxy servers

October 5, 2012

Plastic SCM supports distributed development through distributed servers (it is a DVCS, a distributed version control system), and it also supports proxy servers.

Distributed servers and proxy servers are not the same, and they serve very different purposes. Users sometimes confuse the two, though, so I’m going to try to explain what we had in mind when we designed our proxy and distributed servers.

A proxy server story

Proxy servers in version control jargon (including Plastic’s) are just “cache servers” able to cache data and optionally metadata to reduce the number of requests to the central server.

In the old days (present for some products out there :P) of central servers, proxy servers were the only way to speed things up for developers working at a remote location.

Proxy servers are read-only, which means you can read from them, but if you need to execute a “write” operation you need to connect to the central server.

Now, what happens if the network connection goes down (or it is extremely slow)? Then the developers can’t work with the version control.
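The read-only behaviour described above, and what it means when the link goes down, can be sketched in a few lines of Python. This is only a toy model, not Plastic SCM code: the `CentralServer` and `ProxyServer` classes, their methods and the `online` flag are all hypothetical names made up for the illustration.

```python
class CentralServer:
    """Hypothetical stand-in for a central version control server."""

    def __init__(self):
        self.revisions = {}     # revision id -> file contents
        self.online = True      # simulates the network link to this server
        self.read_requests = 0  # how many reads actually reached the server

    def _check_link(self):
        if not self.online:
            raise ConnectionError("central server unreachable")

    def read(self, rev_id):
        self._check_link()
        self.read_requests += 1
        return self.revisions[rev_id]

    def write(self, rev_id, data):
        self._check_link()
        self.revisions[rev_id] = data


class ProxyServer:
    """Hypothetical read-only cache sitting in front of the central server."""

    def __init__(self, central):
        self.central = central
        self.cache = {}

    def read(self, rev_id):
        # Only a cache miss travels to the central server.
        if rev_id not in self.cache:
            self.cache[rev_id] = self.central.read(rev_id)
        return self.cache[rev_id]

    def write(self, rev_id, data):
        # The proxy never stores writes; they always go to the central server.
        self.central.write(rev_id, data)


central = CentralServer()
proxy = ProxyServer(central)

proxy.write("rev1", b"hello")
proxy.read("rev1")
proxy.read("rev1")           # second read is served from the cache

central.online = False       # the network connection goes down
cached = proxy.read("rev1")  # cached data is still readable

try:
    proxy.write("rev2", b"new work")
    checkin_worked = True
except ConnectionError:
    checkin_worked = False   # no check-in is possible without the central server
```

In this sketch the two reads cost the central server a single request, which is the whole point of the cache, but the check-in fails as soon as the link is down.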

That’s why “proxy servers” are not very good for distributed development. Even when the connection is reliable, there’s a real risk it won’t be fast enough for high-demand users like software developers.

Many vendors, though, have been presenting proxy servers as a “distributed solution” when they don’t have a real DVCS offering.

Distributed means no cable

This is what a distributed setup (a multi-site setup, to be more precise) looks like: two servers pulling and pushing changes over a network, potentially the internet.

In a pure distributed scenario each developer will be able to host his own distributed repository so there won’t be just two “servers”.

In a “multi-site” setup like the one described above, one server will be at each side of the “cable”.

Plastic SCM is actually able to support the two scenarios: multi-site and fully distributed.

Now, what if the network goes down?

As the following diagram shows, each server is able to work independently of its peers. It won’t be able to exchange data with the disconnected server, but it will be able to continue serving the developers, so that broken connections or slow networks won’t force them to halt.
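The same idea can be sketched as a toy model in Python. Again, this is not Plastic SCM code: the `DistributedServer` and `Link` classes, the site names and the changeset ids are hypothetical, and a real DVCS also has to merge concurrent changes, which this sketch leaves out entirely.

```python
class Link:
    """Hypothetical network link between two sites."""

    def __init__(self):
        self.up = True


class DistributedServer:
    """Hypothetical server that owns a full copy of the repository."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}  # changeset id -> contents

    def checkin(self, cs_id, data):
        # Check-ins are purely local, so they never depend on the network.
        self.changesets[cs_id] = data

    def pull(self, remote, link):
        # Copy the changesets the remote server has and this one lacks.
        if not link.up:
            raise ConnectionError("remote server unreachable")
        missing = {cs_id: data for cs_id, data in remote.changesets.items()
                   if cs_id not in self.changesets}
        self.changesets.update(missing)
        return sorted(missing)

    def push(self, remote, link):
        # Pushing to a remote is the remote pulling from this server.
        return remote.pull(self, link)


madrid = DistributedServer("madrid")
boston = DistributedServer("boston")
link = Link()

link.up = False                          # the network goes down
madrid.checkin("cs-madrid-1", b"fix")    # both sites keep working
boston.checkin("cs-boston-1", b"feature")

try:
    madrid.pull(boston, link)
    synced_while_down = True
except ConnectionError:
    synced_while_down = False            # only the exchange is blocked

link.up = True                           # the network comes back
pulled = madrid.pull(boston, link)
pushed = madrid.push(boston, link)
```

While the link is down only the exchange fails; once it is back, a pull and a push leave both servers holding the same two changesets.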

The ability to run fully distributed servers is what actually enables true distributed development and real DVCS.

It is not an easy task and that’s what makes Plastic SCM so unique: the only commercial DVCS (BitKeeper still there?) and the only one able to work both in central and distributed modes.

Use proxy servers at home – world upside down

Every Plastic SCM server is a distributed server: it is able to push changes to and pull changes from remote servers.

So, why did we implement a proxy server if we already have a distributed one?

The first reason is flexibility: we love to give our users all the possible options, so they can choose the best one for them. They can host repositories on each developer’s machine (pure Git style) or they can have huge central servers using SQL Server or Oracle backends. That’s why we have both: some teams will prefer to go fully distributed, others will stay on a multi-site approach with a single server at each location and clients connecting directly to it without local repos, and others will prefer to have proxy servers at different locations.

But, to be honest, the main reason why we support proxy servers is to enable high performance setups behind the firewall: organizations with huge numbers of developers (several hundred or even thousands) working at the same physical location, but with separate teams on different network segments. To speed things up, we like to install a proxy server on each segment to reduce data transfer bottlenecks on the main server.

Conclusion and disclaimer

I hope I wrote a better explanation of proxies and DVCS, although to be really honest I just learnt to draw the beautiful blue cloud combining circles in Visio and I wanted to share it with the world somehow :)

Enjoy.


