Author Archives: Karol Sobczak

About Karol Sobczak

A programmer and inventor that looks for solutions to interesting technological problems.

audioconvert.js – fast audio transcoding in browser

Recently I have been experimenting with emscripten and FFmpeg in order to learn possibilities of C++/C to Javascript porting. Firefox can run such JS executables with about half of a native speed when using its asm.js engine. Chrome has its own PNaCl project for running C++/C applications with almost native speed (they compile to special intermediate representation instead of Javascript). One can imagine that having such a powerful tool can lead to a whole new class of web applications and fat clients, where a difference between a native program and a browser client is even more blurred. You could execute expensive computations in a browser itself without sending any (sometimes confidential) data to a server. There are examples of QT apps ported to Javascript, which don’t fell slow when run in a browser. Imagine having a web page with a word processor, which would have most of features of a desktop application without compromising performance and security. Such an app wouldn’t need to make continuous AJAX requests to a back-end server in order to operate. It could also store documents locally, so that all secrets would be safe. All of this could be achieved without installing and updating desktop program on multiple PCs. How convenient is that?

In my example pet project: quick audio converter (checkout github too) I have ported FFmpeg (with lame, ogg, vorbis and fdk-aac libraries) to Javascript in order to create a high-quality online audio transcoding service. Contrary to existing services, using my application a user can convert audio clips without uploading any files to a server. Obviously, this is a great advantage since computations can be distributed among each user’s computer. There is a great number of services which could make use of that. For instance, youtube.com could encode user’s video files before uploading them to a server and save tons of processing time this way.

There are, however, some serious limitations in current browsers that reduce practical usage. The biggest issue is a lack of serious threads in Javascript. WebWorkers do provide threading mechanism for JS, but they can only communicate though messaging which is not equivalent of memory sharing native threads. In case of FFmpeg it is easy to turn off pthreads during configuration, but other packages might not have such an option.

Another issue is storage capabilities. Emscripten provides file system abstraction and you can even attach your own driver for reading and writing operations. However, there is currently no way to store files on a hard drive, which is a serious limitation for programs that have to process large amount of data (e.g. video transcoding). In the future Filesystem & FileWriter API might be used for this purpose, but currently those APIs are only supported by Webkit.

In case of GUI applications a single threaded environment also requires a special care for handling input events. Input events are only processed when a Javascript program is idle. This means that a GUI Javascript app has to yield control to a browser and inner event loops are prohibited (because they would block a browser from processing the events). The best examples are QT ports. A normal QT application has a main event loop which runs until the program quits. However, QT Emscripten ports do not enter main loop after initial setup and relieve control to a browser, which then handles input events.

The last issue is a size of compiled JS programs. Those can grow to tens of MBs when porting large software suites. In case of a FFmpeg port for audio transcoding the compiled Javascript file is around 8.5MB, but with video support the file would be much larger. There is not a simple solution to this problem. However, for a FFmpeg port it should be possible to have multiple JS compilations, each supporting a different media format. Appropriate ffmpeg.js scripts would be loaded for input decoder and output encoder and then pipelined together in order to transcode a media file.

I think that emscripten, asm.js and PNaCl have a huge potential and we will start seeing innovative applications of these technologies. Even though currently they might seem a bit immature, they will definitely change the way we use our browsers and eventually bring us closer to a web based operating systems.

A top notch mobile e-books reader… better than Google Books

One of my tasks at work was to prototype and then create a quality e-books reader for all mobile platforms. I must say that the end result is even better than I thought. Generally, a great challenge in an e-books reader, which is not obvious at first glance, is to provide the smoothest experience possible. There are many functionalities, which seem initially trivial, but are definitely complex if we want to keep the smooth experience. Such functionalities are pages browsing (especially on a chapter edge), changing the font size or changing the device orientation. It is the best if a user doesn’t even notice them. It is definitely bad if a progress bar appears for many seconds when a user increases the font size or accidentally switches the device orientation. It is even worse if a user looses a context within the book (e.g. the first sentence of a page disappears or moves to the bottom of a page) by changing orientation or font. We can group readers by how well they preserve a context during such operations.

Most reader apps are based on an internal mobile device browser for rendering chapters and pages (e.g. UIWebView for Apple and WebView for Android). The least precise readers (e.g. Monocle and probably Google Books) would move the first sentence of a page around. Those readers are usually based on CSS3 multi-column layout. Although it is the easiest way, it introduces many problems such as the delay (even few seconds) imposed by columnizing all content within a chapter whenever the orientation or the font changes.

The more advanced readers (I think Kindle app does that) would most likely render an entire chapter as a one big vertical page and then scroll to a y position of the beginning of a current page. They would also clip a content on the bottom of a page that does not fit entirely on the page. This approach should be faster since CSS3 multi-column is quite slow. Those readers have a single line accuracy which means that the first word of a page should still be visible in the first line of the page after the orientation or the font changes. The drawback of this approach is that special algorithms which compute where the page begins or ends must be implemented.

The most advanced solution (e.g. our solution and also Kindle on some devices) would always keep the first word of a page preserved whenever the font or the orientation changes. This way the user doesn’t get confused and has the best experience. However this requires implementing additional algorithms for accurate pages displaying. The big advantage of such fine grain control over pages rendering is that the performance of such reader is stunning. I think that we have managed to create one of the best and fastest readers in the market. You can see screencast from the app below (with comparison to Google Books):

or you can try using it yourself by installing Audioteka app.

Simple extensible red-black tree and interval tree implementation

For my warm up post I would like to present a small and simple (hopefully) red-black tree implementation. In my other project: AdblockTV, which I will also describe in future posts, I required an extensible balanced tree implementation that would support data structure enrichment. I checked open source libraries like guava, but I found it to be an overkill to include an entire library for just a single functionality in a relatively compact project (also guava didn’t had RangeMap class by that time). Additionally, I wanted to have a swiss knife tree implementation that I could easily enrich whenever required and writing one would be fun too. Therefore I ended up creating RedBackTree and IntervalTreeMultiSet classes. Sources can be downloaded from here.

The idea of RedBlackTree class is that it provides basic tree methods:

protected void add (Node z);
protected Node minimum (Node x);
protected Node successor (Node x);
protected void remove (Node z);
protected Node search (K key);

which operate on a tree composed of Node objects. Classes providing tree functionality must derive from the RedBlackTree class and implement own access methods. For example add(K elem) method in RedBlackTreeSet is coded below:

    public void add(K e) {
        Node node = search(e);
        if (node == getNilNode()) {
            add(new Node(e, null));
        }
    }

One can see that in this function a special method getNilNode() is called. In fact, there are two special methods in the RedBlackTree class:

protected Node getNilNode();
protected Node getRootNode();

They return Nil and Root nodes of the tree respectively.

Now, if one wants to enrich a standard red-black tree with additional data, one must derive from the RedBlackTree class. In addition, one has to define an embedded class (e.g EnrichedNode class), which derives from the RedBlackTree.Node class. Having those, one can write own access methods, which would take advantage of a balanced tree structure just like it has been done in the IntervalTreeMultiSet class. However, there are few RedBlackTree.Node methods that may need to be implemented in the example EnrichedNode class:

protected void updatePath();
protected void updateSingle();
protected void rotateUpdate();

Those methods should update enriched node data based on data from child nodes. Additionally, updatePath() should also call updateSingle() on parent nodes up to a root. Those methods are called whenever a tree structure changes and do nothing by default since we don’t want to waste CPU cycles. The reason why there are three update methods is that in a red-black tree it is not always necessary to update an entire path up to a tree root. Depending on enriched node data and operation type (e.g. nodes rotation) it might be sufficient to update just a single node and not the entire path upwards. For example, when in an IntervalTreeMultiSet instance a node is rotated with its parent, it is enough to update just that node min and max values (and not top nodes), because total min and max values in a subtree do not change. For more references, I recommend checking the IntervalTreeMultiSet itself (it is well documented). A nice feature of that class is that it supports both inclusive and exclusive intervals.

Hello world!

Hi everyone, I have finally started my own tech blog. I have decided that the things that I have done in the past as well as the things that I am doing currently are, in fact, very interesting. In my opinion it would be a waste to just have this experience and not share it with anyone. Additionally, I will feel more comfortable knowing that I can always peek here to check out challenges and ideas that I have been dealing with in the past. Hope that readers will enjoy it too!

Karol's Tech Blog

Interesting ideas and challenging problems that I have encountered.