Tuesday, July 22, 2014

Uploading files to a mongodb database without using express


Building functionality to upload a file to a Node.js server using express is a piece of cake. But for various reasons we sometimes do not want to use express. I had to implement file upload for a system that uses only pure Node.js. Here is what I learned along the way.

HTTP multipart request

HTTP grew up around text. Its headers are plain text, and the classic way of submitting a form (application/x-www-form-urlencoded) encodes every field as text too. Files, however, may contain arbitrary byte patterns that never occur in a simple text file. If such bytes are pushed through something that expects text, things can go wrong. A byte may coincide with a control character, for example end of transmission (EOT). A component may reject bytes that are not valid text, or it may silently alter them. Any of these can corrupt the file.
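A quick way to see this kind of damage is to push a few non-text bytes through a text decoder and back. This is only an illustrative sketch: the byte values are made up for the example, and the variable names are mine.

```javascript
// Four bytes that are not valid UTF-8 text (values chosen for illustration).
var original = Buffer.from([0xff, 0xd8, 0xff, 0x04]);

// Treat the bytes as text, then turn the text back into bytes.
var asText = original.toString('utf8');
var roundTripped = Buffer.from(asText, 'utf8');

// Bytes that are not valid text get replaced while decoding,
// so the round-tripped data no longer matches the original.
var survived = original.equals(roundTripped);
console.log('original bytes:     ', original);
console.log('after text handling:', roundTripped);
console.log('survived intact?    ', survived);
```

The point is not that HTTP itself does this, only that any step which assumes text can quietly change binary data.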

To avoid such pitfalls, files are sent in an HTTP multipart request. The body of a multipart request is formatted a little differently from that of a regular one. Most notably, the value of the Content-Type header is 'multipart/form-data', and it carries a boundary string. The body can contain multiple parts (form fields and files) separated by that boundary. Each part has its own small set of headers, and whatever lies between two boundaries is carried as raw bytes; the receiver does not try to interpret it as text.
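To make the format concrete, here is a rough sketch that assembles a multipart body by hand. `buildMultipartBody` is a hypothetical helper written for this post, not a library function, and the boundary value and field contents are made up. Real clients generate the boundary for you.

```javascript
// Build a multipart/form-data body by hand (illustrative helper, not a library API).
var CRLF = '\r\n';

function buildMultipartBody(boundary, parts) {
  var chunks = [];
  parts.forEach(function (part) {
    // Each part starts with a boundary line, followed by its own headers.
    var header = '--' + boundary + CRLF +
      'Content-Disposition: form-data; name="' + part.name + '"';
    if (part.filename) {
      header += '; filename="' + part.filename + '"' + CRLF +
        'Content-Type: ' + (part.contentType || 'application/octet-stream');
    }
    // A blank line separates the part headers from the part content.
    chunks.push(Buffer.from(header + CRLF + CRLF, 'utf8'));
    // The content itself is appended as raw bytes, untouched.
    chunks.push(Buffer.isBuffer(part.data) ? part.data : Buffer.from(String(part.data), 'utf8'));
    chunks.push(Buffer.from(CRLF, 'utf8'));
  });
  // The closing boundary carries two extra dashes at the end.
  chunks.push(Buffer.from('--' + boundary + '--' + CRLF, 'utf8'));
  return Buffer.concat(chunks);
}

var boundary = '----ExampleBoundary12345'; // made-up value for the example
var body = buildMultipartBody(boundary, [
  { name: 'title', data: 'Holiday photo' },
  { name: 'upload', filename: 'my_image.png', contentType: 'image/png',
    data: Buffer.from([0x89, 0x50, 0x4e, 0x47]) }
]);

// The boundary is announced to the receiver in the Content-Type header.
var contentTypeHeader = 'multipart/form-data; boundary=' + boundary;
console.log(contentTypeHeader);
console.log(body.toString('latin1'));
```

Printing the body as latin1 is only for inspection; the Buffer itself is what would go on the wire.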

So when we upload a file to a server over the internet, what we do is no different from submitting a form with an HTTP POST request, except that the request body is encoded in a different way.

However, the application programmer does not need to know the above, because the user agent she is programming against should know how to put together an HTTP multipart request. For example, the browser (a user agent) sends a multipart request when the following HTML form is submitted.


    <form action="/upload" enctype="multipart/form-data" method="post">
    <input type="text" name="title"><br>
    <input type="file" name="upload" multiple="multiple"><br>
    <input type="submit" value="Upload">
    </form>

Or on the Linux terminal

curl -v --include --form file=@my_image.png http://localhost:8080/upload

Server side

Just as the HTTP client encodes a multipart request for the application programmer, the server side framework should decode it for her. As mentioned earlier, express does this without a hassle. But if express is not an option and you are on pure Node.js, you might be a little confused. I was too, until I got to know about multiparty. This npm package takes in the request instance, saves the files included in the request to the temp directory on disk, and gives you references to them, just as express would have.


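var http = require('http');
var multiparty = require('multiparty');

http.createServer(function(req, res) {
  if (req.url === '/upload' && req.method === 'POST') {
    // parse a file upload
    var form = new multiparty.Form();

    form.parse(req, function(err, fields, files) {
      if (err) {
        // the request could not be parsed as multipart/form-data
        res.writeHead(400, {'content-type': 'text/plain'});
        res.end("Upload failed: " + err.message);
        return;
      }
      // 'files' maps each field name to an array of files saved in the temp directory
      res.writeHead(200, {'content-type': 'text/plain'});
      res.end("File uploaded successfully!");
    });

    return;
  }

  // anything else is not handled by this example
  res.writeHead(404, {'content-type': 'text/plain'});
  res.end("Not found");
}).listen(8080);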

In the callback of the form.parse method it is possible to read the file in and save it to a database, rename it (move it) or do any other processing.
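As one example of such processing, the sketch below moves an uploaded file out of the temp directory. `moveUpload` is a hypothetical helper of mine. It assumes each file object carries `path` (the temp location) and `originalFilename`, which is how multiparty describes saved files as far as I know.

```javascript
var fs = require('fs');
var path = require('path');

// Move one uploaded file out of the temp directory (hypothetical helper).
// `file` is assumed to carry `path` (temp location) and `originalFilename`.
function moveUpload(file, destDir, callback) {
  // Keep only the base name so directory parts of the client-supplied name are dropped.
  var safeName = path.basename(file.originalFilename);
  var target = path.join(destDir, safeName);
  fs.rename(file.path, target, function (err) {
    if (err) return callback(err);
    callback(null, target);
  });
}
```

Inside the form.parse callback this could be called as moveUpload(files.upload[0], someDirectory, callback), where 'upload' is the field name from the HTML form above and someDirectory is wherever you want the file to live. Note that fs.rename fails if the temp directory and the destination are on different file systems; in that case the file has to be copied and then deleted.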

Processing the request

But if we are going to save the file in the MongoDB database, why save it to disk first? Turns out we don't have to.

The form instance created by multiparty's Form constructor emits 'part' and 'close' events to which handlers can be attached. The 'part' event is triggered once for each part (a file or a plain field) included in the multipart request. 'close' is triggered once all the parts have been read.

The handler of the 'part' event is passed a Node.js ReadableStream, just like the request instance passed to a Node.js http server. So it has 'data' and 'end' events (among others) that can be used to read in the file, chunk by chunk.


form.on('part', function(part) {
  console.log('got file named ' + part.filename);
  var chunks = [];
  // each chunk arrives as a Buffer, so binary data is kept intact
  part.on('data', function(d){ chunks.push(d); });
  part.on('end', function(){
    var data = Buffer.concat(chunks);
    //data variable has the file now. It can be saved in the mongodb database.
  });
});
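The chunk-by-chunk reading done in the handler above can be tried in isolation, without multiparty. In this sketch a PassThrough stream stands in for a 'part', and `collectStream` is a hypothetical helper name of mine. It gathers the chunks as Buffers and joins them with Buffer.concat, which keeps binary data byte for byte.

```javascript
var stream = require('stream');

// Collect everything a readable stream emits into a single Buffer.
// `collectStream` is a hypothetical helper, not part of Node.js or multiparty.
function collectStream(readable, callback) {
  var chunks = [];
  readable.on('data', function (chunk) { chunks.push(chunk); });
  readable.on('error', callback);
  readable.on('end', function () {
    callback(null, Buffer.concat(chunks));
  });
}

// Stand-in for a 'part': a stream fed with two binary chunks.
var fakePart = new stream.PassThrough();
collectStream(fakePart, function (err, data) {
  if (err) throw err;
  console.log('received ' + data.length + ' bytes');
});
fakePart.write(Buffer.from([0x89, 0x50]));
fakePart.end(Buffer.from([0x4e, 0x47]));
```

Holding a whole file in memory like this is fine for small uploads; for large ones it would be better to pipe the stream onwards instead of collecting it.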

The handler of the 'close' event can be used to respond to the client.


  form.on('close', function() {
    res.writeHead(200, {'content-type': 'text/plain'});
    res.end("File uploaded successfully!");
  });

The complete code would look like this.


  var http = require('http');
  var multiparty = require('multiparty');

  http.createServer(function(req, res) {
    if (req.url !== '/upload' || req.method !== 'POST') {
      res.writeHead(404);
      res.end("Not found");
      return;
    }

    var form = new multiparty.Form();
    var attachments = [];

    form.on('part', function(part) {
      if (!part.filename) { //not a file but a field
        console.log('got field named ' + part.name);
        part.resume(); //discard the field's data so parsing can continue
        return;
      }

      console.log('got file named ' + part.filename);
      var bufs = [];
      part.on('data', function(d){ bufs.push(d); });
      part.on('end', function(){
        //the file is complete now. It can be saved in the mongodb database.
        attachments.push({ filename: part.filename, data: Buffer.concat(bufs) });
      });
    });

    form.on('error', function(err) {
      res.writeHead(400);
      res.end("Upload failed: " + err.message);
    });

    form.on('close', function() {
      res.writeHead(200);
      res.end("File uploaded successfully!");
    });

    form.parse(req);
  }).listen(8080);

multiparty saves the files to disk only if the form.parse method is given a callback, so in the above case it does not. Instead, the file is expected to be processed in the event handlers of the form instance.

Saving to MongoDB

Saving the data in the MongoDB database can be done using GridStore (newer versions of the MongoDB Node.js driver offer GridFSBucket for the same job). This part is not included in this post since it is straightforward. Further, this step is the same whether we use express or not, and I want this post to be specific to the case of pure Node.js.

Thanks for checking it out!

Comments:

  1. Such an awesome write-up, man; you just made a Node novice's day :) I have been working as a web developer for quite some time now and never paid attention to the details of multipart/form-data. Thanks for throwing light on that dark side. Keep writing!

    Replies
    1. So glad my writing helped someone! Thanks for taking time to post a comment :)

    2. I have posted this question on Stack Overflow (http://stackoverflow.com/questions/40099016), explaining my use case. I used the code sample you gave to read an XLS file and convert it to bytes. I just wanted to check whether there is a way to convert those bytes to JSON objects. I found a few npm modules like "excel-as-json", but they ask for the XLS file path.

    3. Hey Suman, I couldn't find anything with a quick Google search.

      If you cannot find a way, I think it's fine to just let multiparty save the file temporarily and read it with that module. Worry about converting directly only if this gives you problems, like performance issues.

      Or you can look into the code of that module and hack it a bit to create one of your own. I also found this: https://github.com/DataGarage/node-xls-json.
      But this might be challenging, so do the easy thing and save the XLS file using multiparty.
