Monthly Archives: November 2013

audioconvert.js – fast audio transcoding in browser

Recently I have been experimenting with emscripten and FFmpeg in order to learn possibilities of C++/C to Javascript porting. Firefox can run such JS executables with about half of a native speed when using its asm.js engine. Chrome has its own PNaCl project for running C++/C applications with almost native speed (they compile to special intermediate representation instead of Javascript). One can imagine that having such a powerful tool can lead to a whole new class of web applications and fat clients, where a difference between a native program and a browser client is even more blurred. You could execute expensive computations in a browser itself without sending any (sometimes confidential) data to a server. There are examples of QT apps ported to Javascript, which don’t fell slow when run in a browser. Imagine having a web page with a word processor, which would have most of features of a desktop application without compromising performance and security. Such an app wouldn’t need to make continuous AJAX requests to a back-end server in order to operate. It could also store documents locally, so that all secrets would be safe. All of this could be achieved without installing and updating desktop program on multiple PCs. How convenient is that?


In my example pet project: quick audio converter (checkout github too) I have ported FFmpeg (with lame, ogg, vorbis and fdk-aac libraries) to Javascript in order to create a high-quality online audio transcoding service. Contrary to existing services, using my application a user can convert audio clips without uploading any files to a server. Obviously, this is a great advantage since computations can be distributed among each user’s computer. There is a great number of services which could make use of that. For instance, could encode user’s video files before uploading them to a server and save tons of processing time this way.

There are, however, some serious limitations in current browsers that reduce practical usage. The biggest issue is a lack of serious threads in Javascript. WebWorkers do provide threading mechanism for JS, but they can only communicate though messaging which is not equivalent of memory sharing native threads. In case of FFmpeg it is easy to turn off pthreads during configuration, but other packages might not have such an option.

Another issue is storage capabilities. Emscripten provides file system abstraction and you can even attach your own driver for reading and writing operations. However, there is currently no way to store files on a hard drive, which is a serious limitation for programs that have to process large amount of data (e.g. video transcoding). In the future Filesystem & FileWriter API might be used for this purpose, but currently those APIs are only supported by Webkit.

In case of GUI applications a single threaded environment also requires a special care for handling input events. Input events are only processed when a Javascript program is idle. This means that a GUI Javascript app has to yield control to a browser and inner event loops are prohibited (because they would block a browser from processing the events). The best examples are QT ports. A normal QT application has a main event loop which runs until the program quits. However, QT Emscripten ports do not enter main loop after initial setup and relieve control to a browser, which then handles input events.

The last issue is a size of compiled JS programs. Those can grow to tens of MBs when porting large software suites. In case of a FFmpeg port for audio transcoding the compiled Javascript file is around 8.5MB, but with video support the file would be much larger. There is not a simple solution to this problem. However, for a FFmpeg port it should be possible to have multiple JS compilations, each supporting a different media format. Appropriate ffmpeg.js scripts would be loaded for input decoder and output encoder and then pipelined together in order to transcode a media file.

I think that emscripten, asm.js and PNaCl have a huge potential and we will start seeing innovative applications of these technologies. Even though currently they might seem a bit immature, they will definitely change the way we use our browsers and eventually bring us closer to a web based operating systems.

A top notch mobile e-books reader… better than Google Books

One of my tasks at work was to prototype and then create a quality e-books reader for all mobile platforms. I must say that the end result is even better than I thought. Generally, a great challenge in an e-books reader, which is not obvious at first glance, is to provide the smoothest experience possible. There are many functionalities, which seem initially trivial, but are definitely complex if we want to keep the smooth experience. Such functionalities are pages browsing (especially on a chapter edge), changing the font size or changing the device orientation. It is the best if a user doesn’t even notice them. It is definitely bad if a progress bar appears for many seconds when a user increases the font size or accidentally switches the device orientation. It is even worse if a user looses a context within the book (e.g. the first sentence of a page disappears or moves to the bottom of a page) by changing orientation or font. We can group readers by how well they preserve a context during such operations.

Most reader apps are based on an internal mobile device browser for rendering chapters and pages (e.g. UIWebView for Apple and WebView for Android). The least precise readers (e.g. Monocle and probably Google Books) would move the first sentence of a page around. Those readers are usually based on CSS3 multi-column layout. Although it is the easiest way, it introduces many problems such as the delay (even few seconds) imposed by columnizing all content within a chapter whenever the orientation or the font changes.

The more advanced readers (I think Kindle app does that) would most likely render an entire chapter as a one big vertical page and then scroll to a y position of the beginning of a current page. They would also clip a content on the bottom of a page that does not fit entirely on the page. This approach should be faster since CSS3 multi-column is quite slow. Those readers have a single line accuracy which means that the first word of a page should still be visible in the first line of the page after the orientation or the font changes. The drawback of this approach is that special algorithms which compute where the page begins or ends must be implemented.

The most advanced solution (e.g. our solution and also Kindle on some devices) would always keep the first word of a page preserved whenever the font or the orientation changes. This way the user doesn’t get confused and has the best experience. However this requires implementing additional algorithms for accurate pages displaying. The big advantage of such fine grain control over pages rendering is that the performance of such reader is stunning. I think that we have managed to create one of the best and fastest readers in the market. You can see screencast from the app below (with comparison to Google Books):

or you can try using it yourself by installing Audioteka app.