Tuesday, October 21, 2014

Making Socket.io 1.1.0 work with multiple nodes

Socket.io is the most popular websocket implementation library available to the application developer. However, for a long time it was full of bugs, had a lot of architectural issues and was barely maintained. Then socket.io 1.0.0 came out and changed all that. At the time of writing, the latest stable version is 1.1.0, and it is much improved.

If you are excited about websocket technology and socket.io helped you explore it, you'd be delighted to hear about this rebirth. But trying to migrate from the old version to the new one is where you'd lose most of that delight, especially if you have multiple nodes running on your server for load balancing.

When multiple nodes are running on the server side, they are collectively responsible for handling the socket.io clients. Clients have no idea which server node they are dealing with. This means the server nodes need some common ground on which to share information about the clients, so that any one of them can handle any client. In socket.io 0.9.x this common ground is called a store. A store can be implemented on top of any storage technology by following the store interface; the redis store was the most widely used.

There are fundamental problems with this architecture. One of the main ones is that the store contains every single detail about every single client that connects, which drastically limits horizontal scaling. It works well for a few nodes with a limited number of subscribed clients, but when the number of clients reaches the millions it causes a lot of problems. Another is that it is not possible to add new nodes to the cluster without taking the whole cluster down, because new nodes are not brought up to date with the data held by the already running nodes and therefore cannot handle requests from existing clients.

So they have removed stores from the new socket.io version, and rightly so.

The successor of the redis store is the redis adapter. Here is how my diff looked after substituting the redis adapter for the redis store.

     var sio = require('socket.io');
     io = sio.listen(server);
 
-    var subscriber = redis.createClient(
-                         config.redisPort, config.redisHost, config.redisOptions);
-    var publisher = redis.createClient(
-                         config.redisPort, config.redisHost, config.redisOptions);
 
-    var RedisStore = require("socket.io-redis");
 
-    io.set('store', new RedisStore(
-             {pubClient:publisher, subClient:subscriber, host:config.redisHost,port:config.redisPort}));


+    var redisadapter = require('socket.io-redis');
+    io.adapter(redisadapter({ host: config.redisHost, port: config.redisPort }));
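
With the adapter in place, a broadcast made on one node reaches the clients connected to every other node, because the adapter relays it over redis pub/sub. Here is a minimal self-contained sketch of that; the host, port and event name are placeholders, not values from my setup.

var http = require('http');
var sio = require('socket.io');
var redisadapter = require('socket.io-redis');

var server = http.createServer();
var io = sio.listen(server);

// every node points at the same redis instance
io.adapter(redisadapter({ host: 'localhost', port: 6379 }));

io.on('connection', function (socket) {
  // relayed over redis pub/sub, so it also reaches sockets
  // connected to the other nodes
  io.emit('user joined', socket.id);
});

server.listen(3000);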

But the migration does not end here. The new socket.io requires the nodes to have sticky sessions in order to operate.

Sticky sessions ensure that subsequent requests from a client are forwarded to the same node that handled that client's previous requests. IP-address-based sticky sessions, for example, make sure that all requests from a particular IP address are sent to the same node.
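
To make that concrete, here is a rough sketch in plain JavaScript of how a balancer can pin an IP address to a worker; it illustrates the idea only and is not the exact algorithm any particular load balancer uses.

// hash the client's IP address to a fixed worker index, so the same
// address always lands on the same worker
function workerIndexFor(ip, workerCount) {
  var hash = 0;
  for (var i = 0; i < ip.length; i++) {
    hash = (hash * 31 + ip.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % workerCount;
}

console.log(workerIndexFor('192.168.1.20', 4)); // same index every time for this IP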

How you implement sticky sessions depends on the technology you use for the load balancer. If you are using Nginx, it can be configured in the setup. If you are using pm2, you are not that lucky (yet).
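
For example, with Nginx the ip_hash directive on an upstream block gives IP-based stickiness. A rough sketch of such a configuration (the upstream name and ports are placeholders) could be:

upstream socketio_nodes {
    ip_hash;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
}

server {
    listen 80;

    location / {
        proxy_pass http://socketio_nodes;
        # needed for the websocket upgrade
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}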

Or you may be using the Node cluster module for load balancing. In that case the 'sticky-session' node module should give you a hand. It is still not very mature and could use many more features, but it works.

Wrapping the server instance in the sticky function should do it.

+    var sticky = require('sticky-session');

-    var server = http.createServer(handler);
+    var server = sticky(http.createServer(handler));
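
Putting it all together, a rough sketch of a complete setup could look like the following. It follows the sticky-session usage shown in this post; the module's API has changed between versions, so treat it as illustrative, and the handler, redis address and port are placeholders.

var http = require('http');
var sticky = require('sticky-session');
var sio = require('socket.io');
var redisadapter = require('socket.io-redis');

// sticky-session forks the workers and keeps each client IP on the
// same worker, which is what socket.io 1.x needs
var server = sticky(http.createServer(function (req, res) {
  res.end('handled by one of the workers');
}));

var io = sio.listen(server);
io.adapter(redisadapter({ host: 'localhost', port: 6379 }));

server.listen(3000);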

And now socket.io 1.1.0 starts working! It's really not that difficult, but there is not much help around the internet for the migrator. Once the many stackoverflow questions around this are answered and new tutorials are put up, socket.io will be great to work with.

Tuesday, July 22, 2014

Uploading files to a mongodb database without using express


Building functionality to upload a file to a Node.js server using express is a piece of cake. But for various reasons, sometimes we do not want to use express. I had to implement such functionality for a system that uses only pure Node.js. Here is my experience.

HTTP multipart request

HTTP is a text-based protocol; it is intended to transfer text. If we transfer files, which may contain binary patterns not found in plain text, network components that are only meant to handle text may misbehave. The data in the HTTP packet could contain a byte pattern that is used as a control signal in the protocol, for example the end-of-transmission (EOT) character. Some components may reject bytes that are not valid text; some may alter them. Either can corrupt the file.

To avoid such pitfalls, the HTTP multipart request standard is used. A multipart request body differs a little in format from its regular counterpart. Most notably, the value of the Content-Type header field is 'multipart/form-data'. The body of the request can contain multiple files separated by a boundary. Network components are designed to interpret multipart requests differently from regular ones: data between boundaries is treated as binary, and they do not care what it means.

So when we upload a file to a server over the internet, what we actually do is no different from submitting a form with an HTTP POST request, except that the request is encoded in a different way.
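
For example, a multipart request carrying a text field and a file looks roughly like this on the wire (the boundary value here is just illustrative):

POST /upload HTTP/1.1
Host: localhost:3000
Content-Type: multipart/form-data; boundary=----ExampleBoundary1234

------ExampleBoundary1234
Content-Disposition: form-data; name="title"

My image
------ExampleBoundary1234
Content-Disposition: form-data; name="upload"; filename="my_image.png"
Content-Type: image/png

...binary bytes of the file...
------ExampleBoundary1234--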

However, the application programmer does not need to know any of this, because the user agent she is writing the program for should know how to put together an HTTP multipart request. For example, the browser (a user agent) would submit a multipart request on submission of the following HTML form.


    <form action="/upload" enctype="multipart/form-data" method="post">
    <input type="text" name="title"><br>
    <input type="file" name="upload" multiple="multiple"><br>
    <input type="submit" value="Upload">
    </form>

Or on the Linux terminal

curl -v -include --form file=@my_image.png http://localhost:3000/upload

Server side

Just as the HTTP client the application programmer is using encodes an HTTP multipart request, the server-side framework should decode one for her. As mentioned earlier, express does this without a hassle. But if express is not an option for you, if you are on pure Node.js, then you might be a little confused. I was too, until I got to know about multiparty. This npm package takes in the request instance and gives you references to the files included in the request, saved to disk in the temp directory, just as express would have.


var http = require('http');
var multiparty = require('multiparty');

http.createServer(function(req, res) {

  if (req.url === '/upload' && req.method === 'POST') {
    // parse a file upload
    var form = new multiparty.Form();

    form.parse(req, function(err, fields, files) {
      // 'files' contains the uploaded files, saved to the temp directory
      res.writeHead(200, {'content-type': 'text/plain'});
      res.end("File uploaded successfully!");
    });

    return;
  }

  // any other request gets a simple 404
  res.writeHead(404);
  res.end();

}).listen(8080);

In the callback of the form.parse method it is possible to read the file in and save it to a database, rename (move) it, or do any other processing.
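
For instance, here is a small sketch of moving the first uploaded file out of the temp directory inside that callback. It reuses form, req and res from the server example above; the field name 'upload' and the target path are placeholders.

var fs = require('fs');

form.parse(req, function(err, fields, files) {
  if (err) {
    res.writeHead(500);
    return res.end('Upload failed.');
  }

  // multiparty groups the files by their form field name
  var uploaded = files.upload[0];

  // move the temp file to a permanent location (placeholder path)
  fs.rename(uploaded.path, '/var/uploads/' + uploaded.originalFilename, function(renameErr) {
    res.writeHead(200, {'content-type': 'text/plain'});
    res.end(renameErr ? 'Could not move the file.' : 'File uploaded successfully!');
  });
});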

Processing the request

But if we are going to save the file in the mongodb database, why save it to disk at all? Turns out we don't have to.

The form instance created by multiparty's Form constructor has 'part' and 'close' events to which handlers can be hooked. The 'part' event is emitted once for each part (file or field) included in the multipart request; 'close' is emitted once all the parts have been read.

The handler of the 'part' event is passed an instance of a Node.js ReadableStream, just like the request instance a Node.js http server receives. So it has 'data' and 'end' events (among others) that can be used to read the file in, chunk by chunk.


form.on('part', function(part) {
    console.log('got file named ' + part.name);
    var data = '';
    part.setEncoding('binary'); //read as binary
    part.on('data', function(d){ data = data + d; });
    part.on('end', function(){
      //data variable has the file now. It can be saved in the mongodb database.
    });
  });

The handler of the 'close' event can be used to respond to the client.


  form.on('close', function() {
    res.writeHead(200, {'content-type': 'text/plain'});
    res.end("File uploaded successfully!");
  });

The complete code would look like this.


  var multiparty = require('multiparty');
  var form = new multiparty.Form();

  var attachments = [];

  form.on('part', function(part) {

    if (!part.filename) { // not a file but a field
      console.log('got field named ' + part.name);
      part.resume(); // discard the field's data
      return;
    }

    console.log('got file named ' + part.name);
    var data = '';
    part.setEncoding('binary'); // read as binary
    part.on('data', function(d){ data = data + d; });
    part.on('end', function(){
      // data variable has the file now. It can be saved in the mongodb database.
      attachments.push({ filename: part.filename, data: data });
    });
  });

  form.on('close', function() {
    res.writeHead(200, {'content-type': 'text/plain'});
    res.end("File uploaded successfully!");
  });

  form.parse(req);

Multiparty saves the files to disk only if the form.parse method is given a callback, so in the above case it does not. It is expected that the processing of the files is handled using the event handlers on the form instance.

Saving on MongoDB

Saving the data to the mongodb database can be done using GridStore. That part is not included in this post since it is straightforward; it is also the same whether we use express or not, and I want this post to be specific to the case of pure Node.js.

Thanks for checking this out!

Saturday, June 28, 2014

Faking Redis in Nodejs with Fakeredis

You probably know what redis is. Just in case: it is a data structure server that usually works as a key-value store. The data is stored in memory by default instead of on disk, so it is really, really fast, with some sacrifice of reliability.

Faking


At work, while writing tests for some code that uses redis in nodejs, I wanted to fake the redis server. Why would I do such a thing? Because it isolates my tests better: they don't depend on some other process (the redis server).

Now, how to fake the behavior of redis for tests? By using the fantastic fakeredis. fakeredis is an npm library that makes faking redis behavior in a nodejs application very easy. It also has a ruby gem, so it can be used in a ruby application as well.
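
As a quick illustration, a fakeredis client is created and used just like a normal redis client, with no redis server running; a minimal sketch:

var fakeredis = require('fakeredis');

var client = fakeredis.createClient();

client.set('greeting', 'hello', function() {
  client.get('greeting', function(err, value) {
    console.log(value); // prints 'hello', served entirely from memory
  });
});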

The code under test


This is the code I wanted to test.

var redis = require('redis');

var client = redis.createClient();

function isOnline(username, callback){
  client.exists(username + 'SocketId', function(err, exists){
    callback(err, !!exists);
  });
}

exports.isOnline = isOnline;

isOnline is a simple function that checks whether a socket id for a given user name exists in the redis store.

Tests Without Faking


My usual test for this function, if I didn't know fakeredis existed, would be,

var redis = require('redis');
var assert = require('assert');

var users = require('../lib/users.js')

var client = redis.createClient();

describe('isOnline', function(){
  it('should return true if user has a socket id', function(done){
    //add a mock key value pair
    client.set('onlineUserSocketId', 'some_socket_id', function(){ 

      users.isOnline('onlineUser', function(err, online){
        assert(online);
        done();
      });

    });
  });

  it('should return false if user does not have a socket id', function(done){
    users.isOnline('offlineUser', function(err, online){
      assert(!online);
      done();
    });
  });
});

As is obvious from the code itself, I am testing whether the function returns true to the callback when a socket id for a particular user name is in the redis store, and false otherwise.

The biggest problem here is that anyone who runs these tests must make sure a redis-server process is running; the tests won't pass otherwise. Also, running these tests on the production server is probably a bad idea, since they would tamper with production data. Furthermore, after each test there should be a clean-up step that clears out the data added to the database during the test. A flushdb call, which flushes out the data, would be ideal for this task: as every test would then start on a clean database, the tests would be better isolated from each other. But such a call is hazardous, because someone could run the tests on a production server and flush out all the production data!

Tests With Faking


Here's how fakeredis is used to avoid the above shortcomings.

I used sinon to stub the redis createClient function. Here is the before hook that does it.

var redis = require('redis');
var fakeredis = require('fakeredis');
var sinon = require('sinon');
var assert = require('assert');

var users;

var client;

describe('isOnline', function(){

  before(function(){
    sinon.stub(redis, 'createClient', fakeredis.createClient);
    users = require('../lib/users.js');
    client = redis.createClient();
  });

});

The block of code in the before hook runs before any test and stubs the regular redis.createClient function with fakeredis.createClient. Stubbing is replacing a resource that your system uses with something that imitates that resource's behavior. So after stubbing, whenever redis.createClient is called, what actually gets called is fakeredis.createClient.

Now any method call on the redis client instance returned by the createClient function will not tamper with anything on the redis server. fakeredis imitates the behavior of redis, and the code under test does not know a thing about the faking! Now the tests look like this.

var redis = require('redis');
var fakeredis = require('fakeredis');
var sinon = require('sinon');
var assert = require('assert');

var users;

var client;

describe('isOnline', function(){

  before(function(){
    sinon.stub(redis, 'createClient', fakeredis.createClient);
    users = require('../lib/users.js');
    client = redis.createClient();
  })

  after(function(){
    redis.createClient.restore();
  });

  afterEach(function(done){
    client.flushdb(function(err){
      done();
    });
  })

  it('should return true if user has a socket id.', function(done){
    client.set('onlineUserSocketId', 'some_socket_id', function(){
      users.isOnline('onlineUser', function(err, online){
        assert(online);
        done();
      });
    });
  });

  it('should return false if user does not have a socket id.', function(done){
    users.isOnline('offlineUser', function(err, online){
      assert(!online);
      done();
    });
  });
});

As you can see, I have added an after and an afterEach block. What the after block does should be obvious: it restores the redis.createClient function to its original state.

The afterEach block is more interesting. It adds the clean-up step I mentioned earlier. It calls client.flushdb without hesitation because the client object only works with fakeredis data. As the name implies, the afterEach hook is called after each test, making sure the next test runs on a fresh store.