Scaling Masterbranch with Siena

Pablo Villalba, September 17, 2009

I’d like to share with you some of the amazing technology the guys at Masterbranch are putting together. They were kind enough to give our team an interview.

Masterbranch is a new web application that builds an online CV, in an easy and graphical way, from your activity as a programmer on the Internet. Open-source contributors can show and prove all the work they have done on their projects.

As a programmer, what matters is not what you say about yourself, but what you actually did. The facts prove your experience.

In open-source projects the work is public, so each programmer’s contributions can be identified exactly. Masterbranch tracks the activity of open-source contributors, as well as activity in Q&A sites, blogs, mailing lists, etc. From all this, Masterbranch builds an online CV that is continuously updated with your real experience.

It’s also possible to follow the activity of interesting people or projects, in a Twitter-like way.

But then the scaling problem arises. From day one, Masterbranch has been tracking 130,000 projects. Updates are gathered in real time, and there is a large amount of data to process and display. To avoid the fail whale when dealing with big datasets, Masterbranch developed their own persistence API. They needed good performance even with big datasets and, as at any startup, development time is critical.

That’s how Siena, their own ORM, was born. Scaling and ease of use have been the main goals for it.

What is a persistence API, and how is it used?

A persistence API is a piece of code that helps programmers develop against a database. It abstracts the database away, letting you write the same code no matter which kind of database you are using. It also hides the complexity of the database, so you make fewer errors and develop faster.
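To make the idea concrete, here is a minimal sketch in Python. It is not Siena's API: the `Store` interface, the two backends and the `register_project` function are hypothetical names made up for illustration. It only shows how application code can stay the same while the database behind it changes.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class Store(ABC):
    """Hypothetical persistence interface: the application only sees these methods."""

    @abstractmethod
    def save(self, project_id: str, name: str) -> None: ...

    @abstractmethod
    def find(self, project_id: str) -> Optional[str]: ...


class MemoryStore(Store):
    """Backend that keeps everything in a plain dictionary."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, project_id: str, name: str) -> None:
        self._rows[project_id] = name

    def find(self, project_id: str) -> Optional[str]:
        return self._rows.get(project_id)


class SqliteStore(Store):
    """Backend that stores the same data in a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS projects (id TEXT PRIMARY KEY, name TEXT)"
        )

    def save(self, project_id: str, name: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO projects (id, name) VALUES (?, ?)",
                (project_id, name),
            )

    def find(self, project_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        ).fetchone()
        return row[0] if row else None


def register_project(store: Store, project_id: str, name: str) -> Optional[str]:
    """Application code: identical no matter which backend is plugged in."""
    store.save(project_id, name)
    return store.find(project_id)
```

Calling `register_project(MemoryStore(), ...)` and `register_project(SqliteStore(), ...)` runs exactly the same application code against two different storage backends, which is the kind of abstraction described above.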

In which ways is Siena different to other persistence APIs?

Siena has been designed with ease of use in mind. It has fewer features than other persistence APIs, but it is far easier to use. Less code means fewer errors and faster development.

Siena purposely avoids “features” that are considered bad practice for big databases, as they’d decrease performance and make scaling impossible.

Why did you develop your own persistence API?

Other persistence APIs make you write boilerplate code, so developing with them is error-prone and even the simplest tasks take a lot of time.

We realized that we needed a better persistence API. We wanted a simpler persistence API to develop faster and without errors, and we also wanted a persistence API that let us do only those things that perform well with big datasets.

Which kind of projects would Siena be best suited for?

Siena can be used in any kind of project that needs a database to store data, but it is best suited for web applications that need to handle a lot of data. If your application is going to handle a small dataset, you are better off with another persistence technology.

