Who we are

We are the developers of Plastic SCM, a full version control stack (not a Git variant). We work on the strongest branching and merging you can find, and a core that doesn't cringe with huge binaries and repos. We also develop the GUIs, mergetools and everything needed to give you the full version control stack.

If you want to give it a try, download it from here.

We also code SemanticMerge, and the gmaster Git client.

Firebird is faster than filesystem for blob storage!

Monday, September 01, 2008 Pablo Santos 17 Comments

To blob or not to blob... good question!

I guess if you develop data intensive applications dealing with large files at some point, you've thought about it at least once: should I put my data files inside the database or should I write them directly to disk?

There are a number of interesting links to consider:

  • Use FILESTREAM to store blobs in SQL Server 2008
  • Storing blobs in database vs filesystem
  • Storing BLOBs in database or filesystem?
  • Should I store images in the database or the filesystem?

    Just google it and you'll get a big number of entries.

    Ok, so here's my point:

    storing files in Firebird is faster than writing the same files on the filesystem!


    Which is pretty good for all the structured storage true believers, I think.

    Ok, let's take a look at the figures, here's my program which is able to recursively take the files in a directory, split them in 4Mb chunks and write them to another directory or a firebird database.

    For my test I used a tree with:
  • 3659 files
  • 468 directories
  • 343 Mb in total

    And here's the output, first writing to Firebird and then to a filesystem.

    >writedir d:\testcode d:\repodata fb
    Total time 27669 ms

    >writedir d:\testcode d:\repodata
    Total time 40378 ms

    A hash is calculated for each file chunk which also takes some time.

    What I believe is pretty interesting is how writing to a Firebird database (which stores small files saving disk space since it only uses one single file!) is actually faster than dumping the same content to disk!!

    Of course, there are a number of situations where high-perf software (including SCM) can benefit from using a FS instead of a database, but blob storage (with Firebird :-P) is not one of them...

    You can find the full source code here.


    Enjoy!

    Side notes

  • I'm using Firebird embedded on windows
  • Don't try with SQL Server... it can't handle blob storage at the same speed... at least not without the latest 2008 improvements
    Pablo Santos
    I'm the CTO and Founder at Códice.
    I've been leading Plastic SCM since 2005. My passion is helping teams work better through version control.
    I had the opportunity to see teams from many different industries at work while I helped them improving their version control practices.
    I really enjoy teaching (I've been a University professor for 6+ years) and sharing my experience in talks and articles.
    And I love simple code. You can reach me at @psluaces.
  • 17 comments:

    1. Of course, you'll be relying on whatever Firebird is doing internally for things like fragmentation, etc...

      ReplyDelete
    2. It will be interesting to test it on raw devices (linux side)

      and later to use at as real filesystem

      we should write an fuse firebird driver (derived from the mysql one)

      http://www.linux.com/feature/127055?theme=print

      ps: and my dream is to keep my thunderbird mails in firebird

      ReplyDelete
    3. Hi mariuz,

      As your link points, MySql is much slower dealing with blobs than Firebird. I also have some tests here:

      http://codicesoftware.blogspot.com/2006/08/blob-performance.html

      I'm very interesting in writing a Fuse FS as well. I'd love to have a FS front-end for plastic, which could be used to store your thunderbird mails (and mine :-P) and even provide versioning.

      One concern I have is that Firebird embedded is faster to store blobs than Fb Server, and AFAIK it is not available anymore on Linux.

      ReplyDelete
    4. > One concern I have is that Firebird embedded is faster to store blobs than Fb Server,

      Certainly. Network protocol (which is eliminated in embedded case) adds overhead

      > and AFAIK it is not available anymore on Linux.
      Embedded always been available on Linux and there is no plans to kill it ;)

      ReplyDelete
    5. Hi hvlad,

      Yes, the network adds some overhead, it happens to our plastic server as well: we were running performance tests against different version control systems, and (using Firebird! :-P) we were able to beat all of them in checkin... except GIT. And the main reason was the network overhead (about 18 seconds in our test machine to send the data through the loopback). We'll implement a shared mem protocol for users running servers locally (i.e. distributed development).

      About the embedded: sorry if this is an obvious question but: where can I find info about how to set up the embedded Linux server? Is it possible using mono?

      ReplyDelete
    6. > where can I find info about how to set up the embedded Linux server?
      Just use libfbembed.so

      > Is it possible using mono?
      Why not ? :) If you use Firebird .Net Provider with mono you need to use its ability to work via libfbembed.so (not via its own protocol implementation). AFAIR, its possible on Windows (setting something in connection string) so i think its possible on Linux too

      ReplyDelete
    7. I wonder if a simple tar archive (fuse or plain code) would be as fast?

      ReplyDelete
    8. For the benchmark to be of any significance, you'll have to test repeatedly with data much larger than your RAM, and a disk subsystem much faster than a single drive.

      ReplyDelete
    9. Well, there are some things to consider, like the fact that Firebird probably just accepts the payload into a buffer and says "OK!" -- the actual data is probably not written to disk until the flusher comes by (I don't know how Firebird does this.)

      I don't know, just a thought.

      ReplyDelete
    10. Jespern may be right. It would be interesting to see the different results with forced writes on and off.

      ReplyDelete
    11. FB acts just as a buffer. No magic here.

      ReplyDelete
    12. Sure, no magic involved! But it does pretty good disk management.

      Besides, storing small files on a single firebird file (the database) is better to optimize disk usage.

      ReplyDelete
    13. hvlad,

      One more question:

      Is libfbembed.so included by default on a Fb distro or we have to build it ourselves?

      ReplyDelete
    14. ~300MB of data? Are you serious?
      Please come back when you've tested with at least 3TB, and a few orders of magnitude more files.

      ReplyDelete
    15. Of course the database is faster if the file system code isn't optimized... Try running the example code as is, and then try it with:

      file.Flush();

      issued before:

      file.Close();


      After this change the file system starts to look a lot more compelling...

      ReplyDelete
    16. Did you do this test with Forced Writes ON or OFF?

      ReplyDelete