Sunday, May 13, 2012

AMQP Story. Part 1

Introduction

 

A few months ago, a friend of mine suggested that we do a startup together. I will not tell you about the project we decided on just yet, but I can say that it is a mobile-oriented service. I have decided to describe the project flow step by step, so there will be a series of articles, and I will present the project in the very last one. So here we go.

Architecture design


I have some previous experience developing back-ends for iPhone apps, and the main challenge I faced was a sharp rise in load. In other words, everybody suddenly started to purchase the app and the back-end failed to handle that many connections. To make things easier to understand, I’ll say a few words about the back-end I used. It was a Zend Framework based application running on the Apache2 web server with a MySQL database, and it provided a REST API for the iPhone app. The iPhone application was just an image gallery whose images were served by the back-end.
Continuing with the back-end, I would like to say a few words about scalability. As you can see from the article, there are two types of scaling: vertical and horizontal. Vertical scaling means upgrading a server’s hardware, while horizontal scaling adds more nodes (servers) to the system. Both types have their advantages and disadvantages, but it is usually better to scale out (horizontally) than up (vertically). That old back-end could not be scaled out, so the only way to make it work properly was to upgrade the server hardware, which was an expensive solution. It is also impossible to scale vertically forever, and that is the main disadvantage of this approach.
Based on that experience, I could define the first and main requirement for the architecture of the project I mentioned at the beginning of this post: it should be easy to scale horizontally.
There may also be tasks that take a long time (e.g. video processing, sending emails when the mail server is overloaded, bulk updates of a large number of rows in a DB). It must be possible to run these tasks asynchronously instead of waiting for a response until a timeout happens.
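To illustrate the asynchronous requirement, here is a minimal in-process sketch in Python. It is not the project's code: the `slow_task` and `worker` names are hypothetical, and a standard-library queue with one worker thread stands in for a real message broker. The point is only that the caller hands a job over and moves on without waiting for it to finish.

```python
import queue
import threading
import time


def slow_task(name):
    """Hypothetical stand-in for a slow job (video processing, sending an email, ...)."""
    time.sleep(0.05)
    return f"{name} done"


def worker(tasks, results):
    """Take jobs from the queue until a None sentinel arrives."""
    while True:
        name = tasks.get()
        if name is None:
            break
        results.append(slow_task(name))


tasks = queue.Queue()
results = []
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()

for name in ("video-1", "email-1", "bulk-update-1"):
    tasks.put(name)  # returns immediately; the caller does not wait for the job

tasks.put(None)      # ask the worker to stop once the queued jobs are processed
thread.join()
print(results)
```

In a real deployment the queue would live in a broker and the worker would be a separate process, possibly on another server.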
To meet these requirements I designed the following architecture (Figure 1). I have omitted some parts of the system, such as the static content storage node, the cache node and the full-text search node, and kept only the parts related to scalability in this figure. As you can see, there are 3 layers that can be scaled:
1. Httpd layer. It is supposed to be a PHP application running on the Apache web server behind a balancer, so it is possible to add more servers to this node. I was thinking of using Zend Framework here, but with a model layer that interacts with the API (see below) rather than with a database.
2. API layer. It is supposed to be a PHP application too. As with the httpd layer, I was going to run it on Apache behind a balancer. This application was expected to interact with the database layer and provide a REST API for the httpd layer app.
3. Database layer. I was going to use MySQL here. It’s true that scaling MySQL can be painful (see this), but that was fine for me.

Figure 1. First version of architecture design

At first glance everything is fine with this architecture. It has 3 layers that can be scaled separately, the API PHP application could be replaced with a C/C++ or Java app for a performance boost if needed, and it is possible to make asynchronous API calls for operations that take a long time. But I was not fully satisfied with this solution because it has a drawback: the API layer is not flexible enough. Let’s imagine that the API has 2 functions, functionA and functionB, and runs on N physical servers. functionA accounts for 20% of calls and functionB for 80%, so it would be great to handle functionA calls on 0.2N servers and functionB calls on 0.8N servers. This approach would use computing resources more rationally, but this architecture does not allow it.
I was about to accept this drawback, but then I did some research and found the http://highscalability.com/ web site, which is well worth reading. It contains real-world examples and descriptions of scalability strategies, and it really helped me to clarify things about scalability. I would like to mention this awesome article in particular, as it contains a lot of useful advice. As a result of my research I got acquainted with AMQP and one of its implementations, RabbitMQ. I chose RabbitMQ because it is written in Erlang and is very fast, it scales well and easily, it offers high availability and it is open source. Figure 2 shows the new version of the architecture.
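The functionA/functionB split described above is simple proportional arithmetic. Here is a small sketch of it in Python. The `split_servers` helper is hypothetical, and the largest-remainder rounding is my own assumption for cases where 0.2N is not a whole number.

```python
def split_servers(total_servers, call_percentages):
    """Split servers across functions in proportion to their share of calls.

    call_percentages maps a function name to an integer percentage (summing to 100).
    Servers left over after rounding down go to the largest remainders.
    """
    allocation = {}
    remainders = {}
    for name, percent in call_percentages.items():
        allocation[name], remainders[name] = divmod(total_servers * percent, 100)

    leftover = total_servers - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


shares = {"functionA": 20, "functionB": 80}
print(split_servers(10, shares))  # N = 10: 0.2N and 0.8N are whole numbers
print(split_servers(7, shares))   # N = 7: one leftover server goes to functionB
```

With N = 10 this gives 2 servers for functionA and 8 for functionB, matching the 0.2N / 0.8N split in the text.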

Figure 2. Architecture design

As you can see, this architecture is quite similar to the previous one. I left the httpd layer and the DB layer unchanged, but I got rid of the API layer and replaced it with RabbitMQ and a bunch of AMQP consumers. My models in the httpd layer will use the consumers as a data source and will interact with them via RabbitMQ; in other words, my models will be AMQP clients that publish requests to the consumers. The basics of creating a messaging application can be found at http://www.rabbitmq.com/getstarted.html, where you can learn the main concepts of AMQP such as producer, consumer, exchange, queue and the types of exchange. I will use Routing for asynchronous calls in my system and RPC for synchronous ones. So why is this architecture better? Do you remember the case with functionA and functionB? Now we can run the consumer that performs functionA on 0.2N servers and the consumer that performs functionB on 0.8N servers. This architecture was also faster in most of the benchmarks below.
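The two call styles mentioned above can be sketched without a broker. The Python below is a toy in-process simulation, not RabbitMQ's API: `DirectExchange`, `consumer` and `rpc_call` are hypothetical names, and standard-library queues and threads stand in for broker queues and consumer processes. It shows routing by key (one queue per function) and an RPC-style call that waits on a private reply queue and checks a correlation id.

```python
import queue
import threading
import uuid


class DirectExchange:
    """Toy stand-in for a direct exchange: routing key -> bound queues."""

    def __init__(self):
        self.bindings = {}

    def bind(self, routing_key, target_queue):
        self.bindings.setdefault(routing_key, []).append(target_queue)

    def publish(self, routing_key, message):
        for target_queue in self.bindings.get(routing_key, []):
            target_queue.put(message)


def consumer(inbox, handler):
    """Handle messages until a None sentinel; reply only if a reply queue is given."""
    while True:
        message = inbox.get()
        if message is None:
            break
        result = handler(message["body"])
        reply_to = message.get("reply_to")
        if reply_to is not None:
            reply_to.put({"correlation_id": message["correlation_id"], "body": result})


def rpc_call(exchange, routing_key, body, timeout=1.0):
    """Synchronous call: publish a request, then block on a private reply queue."""
    reply_queue = queue.Queue()
    correlation_id = str(uuid.uuid4())
    exchange.publish(routing_key, {"body": body, "reply_to": reply_queue,
                                   "correlation_id": correlation_id})
    reply = reply_queue.get(timeout=timeout)
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("reply does not match the request")
    return reply["body"]


exchange = DirectExchange()
queue_a, queue_b = queue.Queue(), queue.Queue()
exchange.bind("functionA", queue_a)  # one queue per function, as in the text
exchange.bind("functionB", queue_b)

logged = []
workers = [
    threading.Thread(target=consumer, args=(queue_a, lambda body: body.upper())),
    threading.Thread(target=consumer, args=(queue_b, logged.append)),
]
for w in workers:
    w.start()

rpc_result = rpc_call(exchange, "functionA", "hello")  # synchronous: waits for the reply
exchange.publish("functionB", {"body": "row-1"})       # asynchronous: fire and forget

for q in (queue_a, queue_b):
    q.put(None)  # stop the consumers
for w in workers:
    w.join()
print(rpc_result, logged)
```

Because each function has its own queue, the number of consumers attached to each queue can be chosen independently, which is what makes the 0.2N / 0.8N split possible.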

   

Benchmarking  

    Let me show you some benchmarks. All benchmarks were run with the ab (ApacheBench) utility on my old HP laptop with 2.5 GB of RAM and a Core 2 Duo processor.

    1. The API layer returns a ‘Hello World’ string. There is not much to comment on here: the AMQP architecture is faster.

    2. Insert into DB. The client calls an API layer function which inserts a row into a DB table.
As you can see from the charts, the REST architecture is faster in this case. But the following charts show that the AMQP architecture scales very easily: I ran 2 AMQP consumers and repeated the benchmarks.

    3. Everything is the same as in the previous case, but the API calls are asynchronous. The AMQP architecture wins again. Also, please note that the asynchronous call is 2 times faster than the synchronous one.


 

Conclusion


    So I have decided to choose the AMQP architecture for our project. Of course, the architecture scheme presented in this article is not complete, and some parts, such as the logging system, the cache module and the search module, are not mentioned here; I will describe them in later articles. I will cover not only architectural solutions but also the various PHP tools and libraries, both custom and 3rd-party, that will be used in the project. In the next article I will also talk about the AMQP library I am using.
