Monday, October 17, 2016

A new module system for Node.js

Node.js does not need a new module system. Its existing implementation of a CommonJS module system works great. Even Facebook apparently gave up developing their internal module system, Haste. So the module system I am building has no production value; it's just a fun weekend project.

How it will work

I name this new module system node-get because get is the global used to load modules with it. There's an executable named node-get that can be installed using npm install -g node-get-modules. You can run it just like the node executable.

$ node-get hello.js

Where hello.js is a JavaScript file that uses the node-get module system. Here's an example.

// hello.js
const capitalize = get('capitalize.js');
const hello = capitalize('hello world!');
console.log(hello) // Prints Hello World!

But in this post, I'll use node directly to run it, because that requires no setup and works anywhere.

$ node node-get.js hello.js

I am using Node.js version 6 to build node-get so the code here uses ES6 syntax. Everything should work in node version 4 as well.

vm module

First, I want to introduce you to the vm module from Node.js. vm's responsibility is executing JavaScript. Every single JavaScript file you write in your Node.js app goes through this module at some point to get executed.

vm provides two methods to facilitate this.

  • vm.runInNewContext(someJSCode, theNewContext)
  • vm.runInThisContext(someJSCode)

Context in these methods refers to the global state. Both methods execute the JavaScript code stored in the string variable someJSCode. They differ only in the global variables they allow someJSCode to use.

runInNewContext makes a brand new set of variables and functions using the entries in theNewContext and makes them available to someJSCode as globals.

runInThisContext makes all the globals that are available to the script calling it available to someJSCode as well.
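Here's a tiny illustration of the difference; this snippet is just a demonstration and not part of node-get.

const vm = require('vm');

global.answer = 42;

// The new context only knows about what we put in it, so answer is not there
vm.runInNewContext('console.log(typeof answer)', { console }); // prints 'undefined'

// This context is the calling script's own, so answer is visible
vm.runInThisContext('console.log(typeof answer)'); // prints 'number'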

I will get to these methods the moment I start building the new module system.

The job of a module system

It's the job of a module system to read the contents of JavaScript files and run them using the vm module. It should also help these files communicate, by passing the results of the callee back to the caller.

The Node.js module system, with require and exports, does just that and so will my new module system.

I start coding

I'll start to code; feel free to follow along if you like.

First I'll create two files.

node-get.js will contain the actual code of node-get, my new module system. hello.js will contain the JavaScript code that I will run using node-get; it will demonstrate the features of node-get.

I'll put some code in hello.js.

// hello.js
console.log('hello world!');

I'll start node-get.js with the following code. It uses runInNewContext from the vm module.

const vm = require('vm');
const fs = require('fs');

// Read the module
const moduleJS = fs.readFileSync('./hello.js');

// Create an empty context
const context = {};

// Execute JavaScript from hello.js
vm.runInNewContext(moduleJS, context);

I am extracting the content of hello.js into a variable named moduleJS and executing it using the vm.runInNewContext method I introduced above. Since context is just an empty object, the JavaScript in moduleJS does not have access to any global variables.

I'll run the program to see how it goes.

$ node node-get.js hello.js

Aaaand error!

evalmachine.<anonymous>:3
console.log('hello world!');
^
ReferenceError: console is not defined

Enlightenment: console is not JavaScript

When I'm writing JavaScript, whether it's for the browser or for Node.js, I use console.log statements a lot. And they work every time. So naturally I thought they would work inside vm too. I guess I subconsciously assumed that console is a part of JavaScript. But as it turns out, it's just a global provided by the environment.

Above I used runInNewContext, so in this new context there is no console defined. One way to fix that is to add console to the context.

const context = {console}; // Now context has a console
vm.runInNewContext(moduleJS, context);

This works for now, but console is not the only global we may use in our modules. There is a whole list of them in the Node.js documentation: process, Buffer, setTimeout, to name a few.

So if I wanted to pass in all the globals, I'd have to do something like,

vm.runInNewContext(moduleJS, {...globals})

But remembering that I have another method from vm at my disposal, I will use it instead.

const vm = require('vm');
const fs = require('fs');

const moduleJS = fs.readFileSync('./hello.js');
vm.runInThisContext(moduleJS);

hello.js now has access to every global available to node-get.js. It works now!

$ node node-get.js hello.js
hello world!

The get

I will now add the get global so hello.js can load JavaScript from other files as well.

I will define this function inside node-get.js, but I intend to use it inside hello.js and inside any other module that hello.js might load (get).

Remember that any global available to node-get.js is available to JavaScript code that goes through runInThisContext. So we need to define get as a global inside node-get.js.

global.get = filename => {
  const loadedJS = fs.readFileSync(filename)
  vm.runInThisContext(loadedJS);
}

With that, my node-get.js looks like this.

// node-get.js
const vm = require('vm');
const fs = require('fs');

global.get = filename => {
  const loadedJS = fs.readFileSync(filename)
  vm.runInThisContext(loadedJS);
}

global.get(process.argv[2])

Note that I am using process.argv[2] to get the entry point to the app instead of a hardcoded hello.js.
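For reference, when node-get.js is run as node node-get.js hello.js, process.argv holds something like the following (the exact paths depend on your machine):

[ '/usr/local/bin/node',   // the node executable
  '/home/me/node-get.js',  // the script node is running
  'hello.js' ]             // process.argv[2], the entry point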

This entry point module has access to get as a global, and any module that's loaded using get('any JS file') will too. So, recursively, any module in the example project can use get.

To demonstrate these capabilities of node-get, from hello.js I will get a file named cat.js, and from within cat.js I will get another file named mouse.js. All three files contain some dumb console.log statement.

// hello.js
console.log('hello world!');
get('./cat.js')
// cat.js
console.log('hello, I am a cat.')
get('./mouse.js')
// mouse.js
console.log('hello, I am a mouse.')

Run node node-get.js hello.js; Aaaand...

$ node node-get.js hello.js
hello world!
hello, I am a cat.
hello, I am a mouse.

Success!

Module scope

So, for now, everything seems to work fine. Let's add more JavaScript to our modules. I'll start with variables. I will define a variable named name in each of the cat.js and mouse.js modules.

// cat.js
const name = 'Tom'
console.log(`hello, I am a cat named ${name}`);
// mouse.js
const name = 'Jerry'
console.log(`hello, I am a mouse named ${name}`);

This time, I will get both modules in hello.js

// hello.js
console.log('hello world!');
get('./cat.js')
get('./mouse.js')

Aaaand run it.

$ node node-get.js hello.js
hello world!
hello, I am a cat named Tom
evalmachine.<anonymous>:1
const name = 'Jerry'
^

TypeError: Identifier 'name' has already been declared.

Variables defined in a Node.js (CommonJS) module are local to that module. Unless we export them using exports, we can't access them outside the module. But the code in cat.js and mouse.js apparently runs in the same scope.

Just because they live in two separate files does not mean they run in two separate scopes. This problem can be traced to this line in node-get.js.

vm.runInThisContext(loadedJS);

Every single module loaded using our module system goes through this line. So every single module runs in the context of node-get.js, and in the scope of the get function.

The problem of scope in JavaScript has been discussed for many years. Before ES6, functions were the only construct in JavaScript that had a scope of their own (ES6 added block scoping with let and const). So to give these modules their own scope, I'll have to stick each of them inside a function.
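Here's a quick standalone example of a function giving a variable its own scope; the same trick, applied per module, is what I need.

const name = 'Tom';

(() => {
  const name = 'Jerry'; // its own scope, so no clash with the outer name
  console.log(name);    // Jerry
})();

console.log(name);      // Tom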

I'll write a function called wrap which returns the source code of a JavaScript function wrapping the JavaScript code from the module.

const wrap = moduleJS => (
  `(() => {${moduleJS}})()` // wrapping moduleJS in a self calling arrow function
)

global.get = filename => {
  const loadedJS = fs.readFileSync(filename);
  const wrappedJS = wrap(loadedJS)
  vm.runInThisContext(wrappedJS);
}

Now the contents of the loaded module are put inside a function, and this function calls itself.
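So for cat.js, the string handed to runInThisContext now looks roughly like this:

(() => {const name = 'Tom'
console.log(`hello, I am a cat named ${name}`);})()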

This fixes node-get's scope problem so I get the desired output.

$ node node-get.js hello.js
hello world!
hello, I am a cat named Tom
hello, I am a mouse named Jerry

get relative paths

So now my module system is working pretty well. Currently, my Hello-Tom&Jerry example project's files and node-get's files are all in the same directory. I'll tidy things up a bit by moving the example project's files into a directory aptly named example.

├── example
│   ├── cat.js
│   ├── mouse.js
│   └── hello.js
└── node-get.js

I shouldn't need to change anything inside hello.js since I used relative paths to get the other modules, and the relative paths stay the same in this directory structure as well.

Let's see how that works out.

$ node node-get.js example/hello.js

hello world!
fs.js:640
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^

Error: ENOENT: no such file or directory, open './cat.js'

Because I was used to requiring files with paths relative to the module I call require in, I thought get would work the same way. But it turns out I need to do a bit of work to make it behave like that.

Let's first understand why it did not work this way.

The fs module is what node-get.js uses to read the contents of loaded modules.

The fs module resolves relative paths against the current working directory of the process. Say I run node-get from a directory I'll call cwd. Then when get('./cat.js') is called inside hello.js (or anywhere else), fs looks for cwd/cat.js. It's not going to find a cat.js there, because I just moved it into a directory named example; it now lives at cwd/example/cat.js.
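A quick check in the node REPL shows the mismatch (I'm assuming the project sits at /home/me/project; your path will differ):

$ node
> const path = require('path')
> process.cwd()
'/home/me/project'
> path.resolve('./cat.js')             // what fs ends up opening
'/home/me/project/cat.js'
> path.resolve('example', './cat.js')  // where the file actually is
'/home/me/project/example/cat.js'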

I'd like get to resolve relative paths the same way require does. So I want the get global in each of my modules to resolve relative paths against that module's own location. In other words, get in each module should work in a way that is specific to that module. The best way I could achieve this is by providing each module with its own instance of get.

So first I'll change the wrapped function to take a get parameter.

const wrap = moduleJS => (
  `(get => {${moduleJS}})`
)

Note that the wrapped function is not self-calling anymore (I have taken out the () at the end). Instead, it's returned to the place where runInThisContext is called, so it can be called from there.

Now I'll change the get function.

I have already decided that I need a specific get function for each new module. So instead of one single global get function, I will create a get factory function named createGet, so I can create any number of gets from it. Each created get is different from the others because each has a caller specific to that particular get.

Here is the createGet function, with each line preceded by a comment describing it.

const createGet = caller => {
  return filename => {
    // Get the directory the caller is in
    const callersDirectory = path.dirname(caller);

    // resolve relative path relative to the caller's directory
    const filepath = path.resolve(callersDirectory, filename);

    // Read the content in loaded file
    const loadedJS = fs.readFileSync(filepath);

    // wrap it inside the wrapper function. It's not immediately called now
    const wrappedJS = wrap(loadedJS)

    // Run the content through vm. This returns the wrapped function so we can call it later
    const newModule = vm.runInThisContext(wrappedJS);

    // Create a new get to be used in this new module, using createGet itself. Bit of a recursion :)
    const newGet = createGet(filepath);

    // Call the newModule (wrappedFunction) with the created `get`
    newModule(newGet);
  }
}

When a get is passed a relative file path, it resolves it relative to the location of the module that called it.
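To make that concrete, here is how the get given to hello.js resolves get('./cat.js'), assuming the project sits at /home/me/project:

// hello.js was loaded from '/home/me/project/example/hello.js', so its get
// was created as createGet('/home/me/project/example/hello.js')
path.dirname('/home/me/project/example/hello.js');
// => '/home/me/project/example'
path.resolve('/home/me/project/example', './cat.js');
// => '/home/me/project/example/cat.js'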

Here's the latest node-get.js.

// node-get.js
const vm = require('vm');
const fs = require('fs');
const path = require('path');

const wrap = moduleJS => (
  `(get => {${moduleJS}})`
)

const createGet = caller => {
  return filename => {
    const callersDirectory = path.dirname(caller);
    const filepath = path.resolve(callersDirectory, filename); // Paths resolved relative to caller's directory
    const loadedJS = fs.readFileSync(filepath);
    const wrappedJS = wrap(loadedJS)
    const newModule = vm.runInThisContext(wrappedJS);

    const newGet = createGet(filepath);

    newModule(newGet);
  }
}

// The entry point to the app does not have a caller. So we create an artificial one.
const rootCaller = path.join(process.cwd(), '__main__');
const rootGet = createGet(rootCaller);
rootGet(process.argv[2])

Now relative paths work the way we are familiar with and I get the expected output.

$ node node-get.js example/hello.js
hello world!
hello, I am a cat named Tom
hello, I am a mouse named Jerry

give

Currently, when get is used to load another JavaScript file, the contents of that file are executed. But with Node.js modules we can also return the results of this execution to the caller to be used later (using exports).

const fs = require('fs');
fs.readFileSync('somefile') // Like this.

Now I'll implement the same functionality in node-get.

I'll provide each module with a give function to complement the get it already has. give can be used in the following way.

// capitalize.js
const capitalize = () => { /*function logic*/ }
give(capitalize)

First, I'll change the wrapper function to accept another parameter, give.

const wrap = moduleJS => (
  `((get, give) => {${moduleJS}})`
)

I'll implement give in the createGet function itself.

const createGet = caller => {
  return filename => {
    const callersDirectory = path.dirname(caller);
    const filepath = path.resolve(callersDirectory, filename); // Paths resolved relative to caller's directory
    const loadedJS = fs.readFileSync(filepath);
    const wrappedJS = wrap(loadedJS)
    const newModule = vm.runInThisContext(wrappedJS);

    const newGet = createGet(filepath);

    let givenValue;
    const newGive = value => { givenValue = value }

    newModule(newGet, newGive); // Pass new give along side new get.

    return givenValue;
  }
}

It's very simple to implement give. It takes the value passed to it and assigns it to givenValue, which is returned from the outer get function. This means that only the last give call from a module takes effect.
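As a small illustration, if a hypothetical module gives twice, get returns only the second value:

// twice.js (a made-up module, just to show the behaviour)
give('first');
give('second'); // get('./twice.js') returns 'second'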

This completes my new module system and I feel quite clever!

Here are the files from my example project updated to demonstrate the latest features of node-get.

// utils/capitalize.js
// Lifted from stackoverflow: http://stackoverflow.com/a/7592235/1150725
const capitalize = text => {
    return text.replace(/(?:^|\s)\S/g, function(a) { return a.toUpperCase(); });
}
give(capitalize)
// cat.js
const capitalize = get('./utils/capitalize.js')
const name = 'Tom'
give(capitalize(`hello, I am a cat named ${name}`));
// mouse.js
const capitalize = get('./utils/capitalize.js')
const name = 'Jerry'
give(capitalize(`hello, I am a mouse named ${name}`));
// hello.js
console.log('hello world!');

const catText = get('./cat.js');
console.log(catText);

const mouseText = get('./mouse.js');
console.log(mouseText);

Here is the completed node-get.js.

// node-get.js
const vm = require('vm');
const fs = require('fs');
const path = require('path');

const wrap = moduleJS => (
  `((get, give) => {${moduleJS}})`
)

const createGet = parent => {
  return filename => {
    const parentsDirectory = path.dirname(parent);
    const filepath = path.resolve(parentsDirectory, filename); // Paths resolved relative to parent's directory
    const loadedJS = fs.readFileSync(filepath);
    const wrappedJS = wrap(loadedJS)
    const newModule = vm.runInThisContext(wrappedJS);

    const newGet = createGet(filepath);

    let givenValue;
    const newGive = value => { givenValue = value }

    newModule(newGet, newGive);

    return givenValue;
  }
}

// The entry point to the app does not have a parent. So we create an artificial one.
const rootParent = path.join(process.cwd(), '__main__');
const rootGet = createGet(rootParent);
rootGet(process.argv[2])

I'll run node-get one last time.

$ node node-get.js example/hello.js
hello world!
Hello, I Am A Cat Named Tom
Hello, I Am A Mouse Named Jerry

Comparison with Node.js module system

The Node.js module system works very similarly to the module system I just built.

  • Node.js module system reads a new module using fs.readFileSync and executes its JavaScript using vm.runInThisContext just the way node-get does.

  • It also wraps JavaScript files inside a wrapper function to give them a local scope. In fact, this wrapper can be inspected using the module module. Let me show you.

$ node
> const m = require('module')
> m.wrap("somejs")
'(function (exports, require, module, __filename, __dirname) { somejs\n});'

See that its signature is quite similar to node-get's wrapper function's.

  • It also has a require function specific to each module and uses this fact to resolve relative paths relative to the module's location.

These similarities exist because node-get is built using the understanding of the Node.js module system I gained by going through its source.

And of course, Node.js module system has many additional features as well.

  • When a module is required, it is cached, so later requires of the same module are faster. This also means that modules act as singletons. (A module is executed only once.)
  • It has node_modules. When require is called with a module name rather than a relative or absolute path, it looks in several locations, including the node_modules directories of the calling module's directory and its parents.
  • You can require JSON files with it.

These features are not that complex. I bet you could think of ways to implement them in node-get if needed.
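For instance, caching could be a small change to createGet. Here is a rough sketch, building on the final node-get.js above and keyed on the resolved file path (it doesn't handle things like circular gets):

const cache = {};

const createGet = parent => {
  return filename => {
    const parentsDirectory = path.dirname(parent);
    const filepath = path.resolve(parentsDirectory, filename);

    // If this file was already loaded, hand back the value it gave the first time
    if (filepath in cache) return cache[filepath];

    const loadedJS = fs.readFileSync(filepath);
    const newModule = vm.runInThisContext(wrap(loadedJS));

    let givenValue;
    newModule(createGet(filepath), value => { givenValue = value });

    cache[filepath] = givenValue;
    return givenValue;
  };
};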

This exercise helped me gain some subtle understanding of Node.js. I hope you enjoyed reading about it.

Despite what is commonly said, I really think that JavaScript is alright. I love Node.js for allowing me to do a great many things with it.

I plan to hack deeper into Node.js, and write about my experiments with it. Stay tuned!

Tuesday, September 27, 2016

Magic of require extensions

Magic of babel register


Some development tools are like magic. The Babel require hook, at least to me, is such a tool. Not only does it convert flashy ES6 and ES7 to plain old ES5, which is itself quite awesome, it does so on the fly. You make one require call to babel-register and then require any JS file with ES6/ES7 code, and it just works!

I needed to do something like what it does at work. So to learn how it works I peeked into babel-register's source. It's very small. The transpiling itself is done by babel core, so that was outside the source. I was after the part that does the require magic, and within minutes I found what I was looking for.

Require extensions.


Require extensions allow you to define a function describing what to do when a file with a given extension is required. Can't get more flexible than that!

Here is a way to extend require to text files.

const fs = require('fs');

// Now you can require '.txt' files
require.extensions['.txt'] = (m, filename) => {
  m.exports = fs.readFileSync(filename, 'utf8');
}

const text = require('./test.txt')
console.log(text); // Cool!

This is cool! But..

It's deprecated.


In fact it's been deprecated for a while. Node versions as early as 0.10 document it as deprecated. But it has survived close to 3 years and many versions to appear in Node version 6 as well. Even the documentation admits that it's unlikely to go away.
Since the Module system is locked, this feature will probably never go away.
But it also says,
However, it may have subtle bugs and complexities that are best left untouched.
I don't know about its internal bugs. But complexities might occur because it may tempt developers to publish non-JavaScript packages for JavaScript projects. For example, someone could write the entire source of their package in TypeScript and, only in the entry point to the package, register a require extension for .ts. This handler could use a transform function to compile TypeScript to JavaScript on the fly.
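A rough sketch of what such a package might do is below. I'm assuming TypeScript's transpileModule API and the internal module._compile hook that babel-register itself relies on; treat it as an illustration of the practice, not something to publish.

// Only here to illustrate the practice; don't do this in a published package
const fs = require('fs');
const ts = require('typescript');

require.extensions['.ts'] = (m, filename) => {
  const source = fs.readFileSync(filename, 'utf8');
  // A simple per-file TypeScript to JavaScript transform
  const js = ts.transpileModule(source, {}).outputText;
  // Run the transformed code as if it were the module's own source
  m._compile(js, filename);
};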

I can see two reasons why this is bad.
  • Require extensions are global.
The .ts extension handler set by this package could be overwritten by another package that uses TypeScript.
The second package could be using a compiler for a different version of TypeScript. Now that compiler will try to compile the first package's TypeScript sources and will break it!
  • Compilation unnecessarily takes time.
When an application developer installs the package written in TypeScript, they will be compiling it every time their app runs. But the package's source won't change, so they will be compiling the same source files over and over again.
If the source is precompiled and published, these problems do not occur.

It's not hard to see that using require extensions this way should be avoided. But what about using them for development? Development is where we use babel-register. Development is where I needed them too.

My Use-case


Many projects run tests for front-end code in node. Many projects use webpack to compile front-end code from JSX and ES6/ES7 down to plain JS. With webpack we can use special require calls that invoke webpack loaders. For this reason, test tools like jest, enzyme and others need some special configuration to work with webpack. In kadirahq's storyshots project we provide an easy way for these projects to run snapshot tests. For more info, read its introduction on Kadira voice. In storyshots we have a simple setup that uses babel-register, and we needed it to work with webpack loaders.

So if we want to run the front-end code on node, we have to run webpack on node. Tests are run often, so they should run fast. Running webpack, saving a file and then requiring it again takes time; it's common for a webpack build to take around 5 seconds. That is what we tried to avoid with require extensions.

We made a substitute for webpack loaders in a few lines using require extensions.

const loader = loaders[ext];

require.extensions[`.${ext}`] = (m, filepath) => {
  m.exports = loader(filepath);
};

The loader function mimics some loader for the file extension ext.

For example, the following mimics the url-loader for JPGs.

loaders['jpg'] = filepath => filepath;

If we consider that CSS content is not important for our tests, because, say, we only need to test whether we add the correct CSS classes in the correct places, we can ignore CSS with a loader function like the following.

loaders['css'] = () => null;
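Putting it together, the whole substitute is roughly a loop over a map of such loader functions. This is a sketch of the idea rather than storyshots' exact code:

const loaders = {
  jpg: filepath => filepath, // mimic url-loader: the "module" is just the file path
  css: () => null,           // css content is ignored entirely
};

Object.keys(loaders).forEach(ext => {
  const loader = loaders[ext];
  require.extensions[`.${ext}`] = (m, filepath) => {
    m.exports = loader(filepath);
  };
});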


Useful enough to take a measured risk


Transpilers are put to heavy use these days, with lots of React and ES6 development going on. If you dare to use them, require extensions can help you build more magic-like tools such as babel-register.

Thursday, May 7, 2015

Displaying Sinhala characters on the web

The Sinhala language is spoken only by the Sinhalese people of the small island of Sri Lanka, who make up about 60% of its total population of 20 million. So it's one of the least spoken languages in the world, which makes seeing Sinhala characters on the web a delightful experience for a Sinhalese. At least it used to be. Nowadays, with the breakthrough of Unicode, it's become so commonplace that there's nothing special about it. There are many Sinhala websites, half of the posts on my Facebook wall are in Sinhala, and though it's still somewhat dodgy, Google Translate includes Sinhala.

Before Unicode, ASCII was popular. It still is here, but Unicode is the norm. An 8-bit character set has 256 code points, and the lower 128 of them are the ASCII characters that cover English text. As a standard, ASCII code 065 is used for capital A in English fonts. In fonts that have glyphs for languages other than English, it stands for some standard letter of that language's alphabet. And in the Wijesekara layout for Sinhala fonts it stands for "Hal kireema". The problem with this standard is that the same sequence of ASCII codes could display different glyphs depending on the font used. Text written in the Sinhala language using a Sinhala ASCII font, if viewed using a different font, could display nothing but gibberish. Or worse, if two languages contain similar letters, it could read as something meaningful yet entirely different. So it's obvious that ASCII text is very difficult to use universally. Hence Unicode.

There were about 65,000 Unicode code points in the beginning, and now there are 17 times as many, which allows every letter of every language in the world to have its own Unicode code point. There is still an excess of code points, which are taken up by glyphs like ♥, ♫, ☯, ☺. With Unicode, the font used should not matter in deciding which letter of which language is displayed; it should only matter for the visual properties of the glyphs. Font makers are guided on which Unicode code point should display which character. The 128 code points from U+0D80 through U+0DFF are reserved for Sinhala characters.

Obviously a font cannot contain glyphs for every Unicode code point. If the selected font does not contain glyphs for certain Unicode characters, those characters are displayed in a font that does. Applications, including web browsers, select the fallback font depending on how the system is configured. On my Ubuntu 14.04 machine, Sinhala characters are displayed using the font LKLUG. This can be changed through the fontconfig configuration.

Now to displaying characters on the web. Earlier, the content of websites was displayed entirely in fonts installed on the viewer's system. Websites could optionally specify a font family, a font, or a chain of fonts for fallback. But a webpage could end up being displayed entirely differently from the way the developer expected because a certain font wasn't installed. Though this is still the case with many websites, the introduction of webfonts could change all that.

Developers can specify a font to use and the place to get that font from, using the @font-face notation, so the clients (web browsers) will do everything they can to display text using that font. Usually they only fall back if they could not download the webfont from the specified location.

Early Sinhala websites would include ASCII text. And as none of the Sinhala fonts they could use could be considered web safe, they asked users to download and install whatever font they were using. Notices appeared saying "Do you see Sinhala characters? If not, download and install this font" while gibberish appeared in whatever English ASCII font the web browser decided to fall back to. Only once the font was installed would the text look meaningful.

When Unicode came through, many Sinhala websites changed from ASCII to Unicode. The upside is that most systems include a Unicode font that covers the Sinhala Unicode characters. This got rid of the step of downloading and installing fonts. Unfortunately, this is also the downside. Most systems... Some systems do not include a Sinhala Unicode font. For example, Android devices running KitKat and earlier. And without rooting, it's very difficult to install a new font there. The Lollipop standard font includes glyphs for Sinhala characters, but some manufacturers like Sony removed them for reasons known only to them. Maybe they thought a few extra kilobytes were not worth an entire nation reading and writing in their native tongue. Sinhala websites like bbc.lk and lankadeepa.lk contain Unicode text, so they are readable from most PCs but not from most Android handhelds.

Then webfonts came along, which allow developers to include a Sinhala Unicode font with the rest of the content of the website. So it's readable from most browsers, including the ones in Android devices. gossip.hirufm.lk does this. Many other Sinhala websites do not seem to.

gossiplankanews.lk, another gossip site!, uses webfonts but sticks to ASCII. If text from their site is copied and pasted somewhere else, you can see the gibberish it truly is. But at least, since they use webfonts, the content should be readable from systems without Sinhala Unicode fonts, including Android devices.

If the developers of Sinhala websites used webfonts with Unicode content, they could increase their audience. fontsquirrel is a good place, among others, to generate a webfont kit. The hodipotha font from icta.lk is released under a Creative Commons license, so it can be used to generate the webfont kit.

In fontsquirrel it is important to choose the expert option and pick no subsetting. Otherwise it will generate webfonts with characters only in the range of Western characters, omitting the Sinhala characters.

The following text uses webfonts (hodipotha) and hence should be visible in many browsers, including the ones in Android mobiles, in the (not very beautiful) glyphs of the hodipotha font.

සිංහල යුනිකෝඩ් (unicode) වෙබ් ෆොන්ට්ස් හරහා

The following is not. And hence it will show up in whatever font your system decides (is configured) to use.

සිංහල යුනිකෝඩ් (unicode)


Monday, March 16, 2015

Pike, A Hidden Beauty

Over the last few days I have been getting myself familiar with an interesting project named sTeam. One of the things that makes it interesting is the programming language it's written in: Pike.

Don't worry if you haven't heard of Pike. Many people have not. It's a relatively undiscovered language. An average Google search along the lines of "Pike tutorials" will not get you much other than this official beginner tutorial. The Stack Overflow tag 'pike' has a mere 8 subscribers. There is only the single official interpreter, and not many IDEs except for an Emacs mode and an Eclipse plugin. The text editor I use, gedit, does not know by default how to highlight the syntax of a .pike file.

But installing Pike and trying it out makes you wonder... why? Why don't we use this more?

Pike is very attractive, just like an unseen hidden forest flower.

In my short career so far I have been lucky enough to use many languages. Most of my assignments at the university had to be done in C or Java. My final year project was done with Diaspora, which is Ruby. I coded Python for Melange in Google Summer of Code. During an internship with IroneOne I worked mostly on an iOS app, which is Objective-C. And my first task in my first job was building a Node.js application, which is JavaScript. While going through the tutorial, I felt Pike is made of the good parts of many of those languages.

Duck Typing


There are both pros and cons to duck typing. While writing Python, Ruby and JavaScript code I felt great and privileged to be able to use dynamic typing. But it wasn't lost on me that some of the code taking advantage of duck typing could lead to problems. Especially if it is not documented or the documentation is not consulted. Sometimes it took me a while to understand poorly documented JavaScript APIs.

On the other hand, sometimes, in my early days as an undergraduate, I felt frustrated at not being able to return a couple of integers and a string in a single array from a Java function.

Pike has a simple yet effective solution. In addition to its three basic types, int, float and string, it has another queer little type, mixed. You can declare a variable as mixed and store any value in it, irrespective of whether it is an int, float, string or even a complex type.

Pike v7.8 release 866 running Hilfe v3.5 (Incremental Pike Frontend)

> int a;
> a = 7;
(1) Result: 7
> a = "aruna"
>> ;
Compiler Error: 3: Bad type in assignment.
Compiler Error: 3: Expected: int.
Compiler Error: 3: Got     : string(0..255).

> mixed b;  // using mixed type
> b = 21;
(2) Result: 21
> b = "herath";
(3) Result: "herath"


Arrays declared to hold mixed types, or not declared to hold any type, can contain values of mixed types. A function declared to return mixed type can return anything.

Syntax


I feel the syntax in Pike manages, again, to take the best, most elegant bits from the languages I am familiar with. It uses C-style braces to delimit the scopes of functions, conditionals and loop statements. In my view, that's much better than Python's awkward tabs/spaces.

Outshining C, it has a simple 'dictionary' data structure known as mappings, with elegant syntax.

mapping(string:string) batsmen = (["Sangakkara": "Sri Lanka", "Maxwell":"Australia", "Kohli": "India"])

Arrays,

array(string) cricketers = ({"McCullum", "De Villiars", "Malinga", "O'Brien"})

In my view pike syntax is pleasing to the eye...

Interpreter


You have probably guessed from what's mentioned so far that Pike is interpreted. Yes, Pike comes with an official interpreter named Hilfe.
An interactive interpreter is something I find very useful when learning a language. It is of great value when developing as well. You can just enter a few lines in the interpreter and see if you got the syntax or the logic correct. It's much easier and quicker than writing a program, compiling and testing it, or looking through documentation.

Further, Pike does not have many external libraries, but it's distributed with many modules, so you can get most of your work done with Pike itself.

In Top Gear consumer advice style: Pike would not treat you like a kid and hold you back from doing things freely, nor would it consider you a saint and over-trust you with everything.

It probably is not a good idea to use Pike in your next big project, given the very little support you'd get, but it is definitely worth knowing about and trying out in one of those pet projects.

Wednesday, January 21, 2015

GCI mentoring with FOSSASIA

For the last few weeks I got the opportunity to be involved in the Google Code-in 2014 program as a mentor for FOSSASIA (thanks Andun Sameera!). It was more challenging than I thought, especially while doing a full-time job. But it was a great experience and I learned things myself along with the students.

The program is almost over, with only the results yet to come out.

FOSSASIA's co-admin Mario Behling initiated an interesting project at the start of the program to give students an opportunity to experience the open source development culture. The project was to create a small website to hold FOSSASIA's students' and mentors' details. It came out to be a great success, with a cute little website being created and, more importantly, a nice little community of students forming around it.



Usually there is a barrier you need to get past as a novice contributor to get your first commit merged into an open source project. The administrators want you to follow annoying coding conventions, to "combine your 5 commits solving a simple small bug into one big commit" or to "rebase your pull request on top of master". Until you continue contributing for some time, realize the importance of those things and start to appreciate them, they are just an annoyance you have to deal with on the way to getting your work integrated.

For this project we initially made this barrier much less challenging. We would merge pull requests if they did the job. This was so that young student contributors wouldn't feel discouraged, and only until they got themselves started.

But having been well mentored at Google Summer of Code 2013, I wanted some niceties in our git commits. So I made learning them into a task.

The task was to learn how to make your local commits look nice before you push them to the repo. To make it organized, easy to evaluate and hopefully fun, I built up a small set of commits with an interesting bit of commit history: a story.

I added the set of commits to a GitHub repo; it includes a badly worded commit message and two commits that would look better squashed into one bigger commit. Students are asked to clone the repo and then, using git interactive rebase, make the commit history look better. The story of the commits and a set of instructions are given. Then they have to blog about their experience.

They came up with some great write-ups! Some focused on the technical aspects and took a tutorial point of view. Some explained the personal experience the writers themselves had, in a lighter, less technical language. But all of them were great!

I think I got a few students to learn something that will be valuable in their future careers, and also one student to start blogging!

When I saw a set of commits that could be better organized in a pull request to any of FOSSASIA's repositories, from a student who had completed this task, I asked them to improve it. Thanks to the above task they knew the terminology, and communication was easier. When I said squash these commits and reword the commit message to something like this, they knew what I was saying and how to do it, and were happy to oblige.

We gradually made it harder and more challenging, bringing the barrier back to the usual level for students who hung around to complete more tasks.

This hopefully resulted in not only the finished product but also the path towards it being in great shape.

Students managed to complete a lot more valuable work for FOSSASIA.

It was fun working with them and I wish them an exciting and a fruitful future!

Tuesday, October 21, 2014

Making Socket.io 1.1.0 work with multiple nodes

Socket.io is the most popular websocket implementation library available to the application developer. However, for a long time it was full of bugs, had a lot of architectural issues and was not maintained. Changing all that, socket.io version 1.0.0 came out. At this time the latest stable socket.io version is 1.1.0, and it is much improved.

If you are excited about the websockets technology and socket.io helped you explore it, you will be delighted to hear about this all-new rebirth of it. But trying to migrate from the old version to the new one is where you would lose most of that delight. Especially if you have multiple nodes running on your server for load balancing.

When multiple nodes are running on the server side, they are collectively responsible for handling the socket.io clients. Clients do not have much idea of which server node they are dealing with. This means that the server nodes need some common ground to share information about the clients on, so that any one of them can handle any client. In socket.io 0.9.* this ground is given the name store. A store can be implemented using any storage technology according to a store interface. The redis-store was the most used.

There are many fundamental problems with this architecture. One of the main ones is that the store will contain every single detail about every single client that connects. This drastically decreases the possibility of horizontal scaling. It works great for a few nodes with a limited number of subscribed clients, but when the number of clients touches millions this gives a lot of problems. Another is that it is not possible to add new nodes to the cluster without taking the whole cluster down. This is because new nodes are not updated with the data held by the already running nodes and so are unable to handle requests from existing clients.

So they have removed 'stores' from the new socket.io version and rightly so.

The successor of the redis-store is the redis-adapter. Here is how my diff looked after substituting the redis-adapter for the redis-store.

     var sio = require('socket.io');
     io = sio.listen(server);
 
-    var subscriber = redis.createClient(
-                         config.redisPort, config.redisHost, config.redisOptions);
-    var publisher = redis.createClient(
-                         config.redisPort, config.redisHost, config.redisOptions);
 
-    var RedisStore = require("socket.io-redis");
 
-    io.set('store', new RedisStore(
-             {pubClient:publisher, subClient:subscriber, host:config.redisHost,port:config.redisPort}));


+    var redisadapter = require('socket.io-redis');
+    io.adapter(redisadapter({ host: config.redisHost, port: config.redisPort }));

But the migration does not end here. The new socket.io requires the nodes to have sticky sessions in order to operate.

Sticky sessions ensure that a subsequent request is forwarded to the same node that handled the previous requests from the same client. So IP-address-based sticky sessions make sure that all the requests from a particular IP address are sent to the same node.

How you should implement sticky sessions depends on the technology you use in the load balancer. If you are using Nginx it can be configured in the setup. Or if you are using pm2 you are not that lucky (yet).

Or it is possible that you use the node cluster module for load balancing. In that case the 'sticky-session' node module should give you a hand. It is still not very mature and could have many more features, but anyway, it works.

Wrapping the server instance in sticky function should do it.

+    var sticky = require('sticky-session');

-    var server = http.createServer(handler);
+    var server = sticky(http.createServer(handler));

And now socket.io 1.1.0 starts working! It is really not that difficult, but there is not much help around the internet for the migrator. Once the many Stack Overflow questions around it are answered and some new tutorials are put up, socket.io will be great to work with.

Tuesday, July 22, 2014

Uploading files to a mongodb database without using express


Building functionality to upload a file to a Node.js server using express is a piece of cake. But for various reasons, sometimes we do not want to use express. I had to implement such functionality for a system which only uses pure Node.js. Here is my experience while at it.

HTTP multipart request

HTTP is a text-based protocol. It is intended to transfer text. If we transfer files, which may contain binary patterns not found in simple text files, the network components, as they are only intended to handle text, may misbehave. The data in the HTTP packet could contain a byte with a pattern that is used as a control signal in the HTTP protocol, for example the end of transmission (EOT) character. Some components may reject bytes that are not valid text. Some may edit them. Either may corrupt the file.

To avoid such pitfalls, the HTTP multipart request standard is used. An HTTP multipart request body is a little different in format from its regular counterpart. Most notably, the value of the Content-Type header field is 'multipart/form-data'. The body of the HTTP request can contain multiple files separated by a boundary. Network components are designed to interpret multipart requests differently from regular ones. Data between boundaries is treated as binary, and they do not care what it means.

So when we upload a file to a server through the internet, what we actually do is no different from what we do when we submit a form with an HTTP POST request, except that the request is encoded in a different way.

However, the application programmer does not need to know any of the above, because the user agent she is writing the program for should know how to put together an HTTP multipart request. For example, the browser (a user agent) will submit a multipart request when the following HTML form is submitted.


    <form action="/upload" enctype="multipart/form-data" method="post">
    <input type="text" name="title"><br>
    <input type="file" name="upload" multiple="multiple"><br>
    <input type="submit" value="Upload">
    </form>

Or on the Linux terminal

curl -v -include --form file=@my_image.png http://localhost:3000/upload

Server side

Just as the HTTP client the application programmer is using encodes an HTTP multipart request, the server-side framework should decode one for her. As mentioned earlier, express does this without a hassle. But if express is not an option for you, if you are on pure Node.js, then you might be a little confused. I was too, until I got to know about multiparty. This npm package takes in the request instance and gives you references to the files that were included in the request, saved to the temp directory on your disk. Just as express would have.


http.createServer(function(req, res) {
  var multiparty = require('multiparty');

  if (req.url === '/upload' && req.method === 'POST') {
    // parse a file upload
    var form = new multiparty.Form();

    form.parse(req, function(err, fields, files) {
      res.writeHead(200, {'content-type': 'text/plain'});
      response.end("File uploaded successfully!");
      // 'files' array would contain the files in the request
    });

    return;
  }

}).listen(8080);

In the callback of the form.parse method it is possible to read the file in and save it to a database, rename it (move it) or do any other processing.

Processing the request

But if we are going to save the file in the MongoDB database, why save it on the disk? Turns out we don't have to.

The form instance created by multiparty's Form constructor has 'part' and 'close' events to which handlers can be hooked. The 'part' event is triggered once for each file (part) included in the multipart request. 'close' is triggered once all the files have been read.

The handler of the 'part' event is passed an instance of a Node.js ReadableStream, just like the request instance passed to a Node.js HTTP server. So it has 'data' and 'end' events (among others) that can be used to read in the file, chunk by chunk.


form.on('part', function(part) {
    console.log('got file named ' + part.name);
    var data = '';
    part.setEncoding('binary'); //read as binary
    part.on('data', function(d){ data = data + d; });
    part.on('end', function(){
      //data variable has the file now. It can be saved in the mongodb database.
    });
  });

The handler of the 'close' event can be used to respond to the client.


  form.on('close', function() {
    response.writeHead(200, {'content-type': 'text/plain'});
    response.end("File uploaded successfully!");
  });

The complete code would look like this.


  var multiparty = require('multiparty');
  var form = new multiparty.Form();

  var attachments = []

  form.on('part', function(part) {
    var bufs = [];

    if (!part.filename) { //not a file but a field
      console.log('got field named ' + part.name);
      part.resume();
    }

    if (part.filename) {
      console.log('got file named ' + part.name);
      var data = "";
      part.setEncoding('binary'); //read as binary
      part.on('data', function(d){ data = data + d; });
      part.on('end', function(){
        //data variable has the file now. It can be saved in the mongodb database.
      });
    }
  });

  form.on('close', function() {
    response.writeHead(200);
    response.end("File uploaded successfully!");
  });

  form.parse(request);

Multiparty saves the files to disk only if the form.parse method is given a callback. So in the above case it does not do so. It is expected that processing of the file is handled using the event handlers of the form instance.

Saving on MongoDb

Saving the data in the MongoDB database can be done using GridStore. That part is not included in this post since it is straightforward. Further, this step is the same whether we use express or not, and I want this post to be specific to the case of pure Node.js.

Thanks for checking out!