HTML5 Audio/Video Capture Status

Here at CrowdEmotion, one important tech topic is capturing video in the most efficient way: we must be very careful about the quality of the capture process, because the accuracy of our analysis depends in part on it.

On desktop web platforms the obvious choice is between Adobe Flash and, more recently, HTML5.

Why bother looking for something different from Flash? I won’t add anything to what someone far more authoritative than me has written on this topic; the conclusion is simply that HTML5 is the way to go for a number of very good reasons.

Currently, we use a 3rd-party Flash-based solution and we are fairly satisfied with it, but we need to do more and be more flexible. We have already started experimenting with Flash development (using Haxe, to try to work with open tools) and with RTMP video servers (Wowza, and the open source Red5), but in any case you’re limited by the closed Flash platform.

One simple question before going further into HTML5 capture: why does Google Hangouts not yet run on HTML5, relying instead on a native plugin?

Quick answer: because the HTML5 real-time communication stack, called WebRTC, even if feature-complete, still seems too young to be adopted for a mainstream, strategic product like Hangouts.

WebRTC is a technology born in 2011 thanks to Google; the rest of the industry then followed, Mozilla among others. Probably the best thing about WebRTC right now is that it brings order and standards to the world of real-time communication, where until now closed-source proprietary solutions (e.g. Skype) reigned almost alone.

As of today, the W3C specs are still in draft form for the most part and some components are not yet implemented, even if P2P communication works nicely in almost all recent browsers and you can find plenty of fun experiments and good products out there.

WebRTC was conceived as a solution for browser-native peer-to-peer audio/video communication, which means direct communication between browsers (not always possible because of network configurations). Everything in WebRTC is currently geared only to this scenario, and server-side recording, i.e. client-to-server media streaming, can only be obtained as a side effect or by faking client behavior on the server.

This is also why I’m talking about WebRTC here: the incomplete implementation of the browsers’ audio/video capture stack forces developers to fall back on the real-time communication components, since they are currently the only way to capture audio/video in a browser.

With a trick it’s possible to record and save the entire audio and video streams (separately) in the browser and post them to a server at the end, but this has drawbacks such as limits on the length of the recording and the upload time once the recording is over.
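
To get a feel for those drawbacks, here is a minimal back-of-the-envelope sketch. The function name `estimateRecording` and every number passed to it are illustrative assumptions, not measurements from our platform; it only multiplies duration by bitrate to show how much data sits in the browser and how long the final POST could take.

```typescript
// Rough estimate for a "record locally, POST at the end" capture.
// All inputs are illustrative assumptions, not measured values.

interface RecordingEstimate {
  sizeBytes: number;     // payload held in the browser until recording stops
  uploadSeconds: number; // time needed to POST it once recording stops
}

function estimateRecording(
  durationSeconds: number,
  mediaKbps: number,  // combined audio + video bitrate, kilobits per second
  uplinkKbps: number  // user's upload bandwidth, kilobits per second
): RecordingEstimate {
  if (durationSeconds < 0 || mediaKbps < 0 || uplinkKbps <= 0) {
    throw new RangeError("duration and bitrate must be >= 0, uplink must be > 0");
  }
  const totalKilobits = durationSeconds * mediaKbps;
  return {
    sizeBytes: (totalKilobits * 1000) / 8,     // kilobits -> bits -> bytes
    uploadSeconds: totalKilobits / uplinkKbps, // wait after recording ends
  };
}

// Hypothetical case: 5 minutes at 1000 kbps, uploaded over a 500 kbps uplink.
const fiveMinutes = estimateRecording(300, 1000, 500);
console.log(fiveMinutes.sizeBytes);     // 37500000 (about 37.5 MB in memory)
console.log(fiveMinutes.uploadSeconds); // 600 (ten minutes of waiting)
```

Under these assumed figures the user waits twice as long for the upload as for the recording itself, which is the kind of experience a streaming approach avoids.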

From an architectural point of view, that’s not a clean solution.

A better solution would be to have the WebRTC stack on a server that acts as a peer to the other peers but lets you tap in to do typical server-side work, like saving the stream to a video file or taking each frame and doing something with it in real time.
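
As a purely conceptual sketch of that "tap in" idea: the names `Frame`, `FrameTap` and `ServerPeerTap` below are hypothetical and no real WebRTC library is involved. The code only shows the fan-out shape, where one incoming stream feeds both a recorder and a real-time analyser; the actual WebRTC receive path is assumed to exist elsewhere.

```typescript
// Hypothetical sketch: a server-side peer that hands every received frame
// to a list of taps. The WebRTC negotiation and decoding are NOT shown.

interface Frame {
  timestampMs: number;
  data: Uint8Array;
}

type FrameTap = (frame: Frame) => void;

class ServerPeerTap {
  private taps: FrameTap[] = [];

  // Register server-side work to run on each frame.
  addTap(tap: FrameTap): void {
    this.taps.push(tap);
  }

  // Would be called by the (assumed) WebRTC receive path for each frame.
  receive(frame: Frame): void {
    for (const tap of this.taps) {
      tap(frame);
    }
  }
}

const peer = new ServerPeerTap();

// Tap 1: stand-in for "save the stream to a video file".
const recorded: Frame[] = [];
peer.addTap((frame) => {
  recorded.push(frame);
});

// Tap 2: stand-in for "do something with each frame in real time".
let analysedBytes = 0;
peer.addTap((frame) => {
  analysedBytes += frame.data.length;
});

// Simulate two incoming frames.
peer.receive({ timestampMs: 0, data: new Uint8Array([1, 2, 3]) });
peer.receive({ timestampMs: 40, data: new Uint8Array([4, 5]) });
console.log(recorded.length, analysedBytes); // 2 5
```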

There are some implementations of this role out there, and a few seem really interesting and stable. I think this will be the direction of our efforts and experiments until the proper solution is implemented in browsers.

The other problem is how widespread WebRTC-capable browsers are.

Our interest is to reach as many people as possible, so we must be compatible with whatever setup users have on their computers, which means taking into account every OS/browser/Flash version combination (in one of our latest studies, 350 test sessions produced 110 different user agents, not counting Flash versions!).

WebRTC support is reportedly about 49% globally. This figure is too low for us to ditch Flash, but adding HTML5 alongside it could still increase our success rate.
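
One way to picture "Flash first, HTML5 as an extra chance" is a simple fallback order. This is only a sketch under assumptions: `BrowserCapabilities` and `chooseCaptureBackend` are hypothetical names, and the flags would have to be filled in by real feature detection in the page (for example checking for getUserMedia and RTCPeerConnection), which is not shown here.

```typescript
// Hypothetical capability flags; in a real page they would come from
// feature detection, which is deliberately left out of this sketch.
interface BrowserCapabilities {
  hasFlash: boolean;
  hasGetUserMedia: boolean;
  hasPeerConnection: boolean;
}

type CaptureBackend = "flash" | "html5" | "none";

// Keep Flash as the primary path and use HTML5 only where Flash is missing
// but the WebRTC pieces needed for capture are present.
function chooseCaptureBackend(caps: BrowserCapabilities): CaptureBackend {
  if (caps.hasFlash) {
    return "flash";
  }
  if (caps.hasGetUserMedia && caps.hasPeerConnection) {
    return "html5";
  }
  return "none";
}

const withFlash = chooseCaptureBackend({
  hasFlash: true, hasGetUserMedia: true, hasPeerConnection: true,
});
const modernWithoutFlash = chooseCaptureBackend({
  hasFlash: false, hasGetUserMedia: true, hasPeerConnection: true,
});
const legacyWithoutFlash = chooseCaptureBackend({
  hasFlash: false, hasGetUserMedia: false, hasPeerConnection: false,
});
console.log(withFlash, modernWithoutFlash, legacyWithoutFlash); // flash html5 none
```

The middle case is where the extra success rate would come from: sessions that today fail for lack of Flash but could be captured through HTML5.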

My conclusion is that we’ll continue with our HTML5 research efforts but stay with Flash until implementations stabilize and adoption across browsers and devices reaches a higher level.

Since this is an evolving matter, updates and suggestions are definitely welcome.


6 thoughts on “HTML5 Audio/Video Capture Status”

  1. Hi Laurent,

    since we’re still in the early days of evaluating a couple of options (including developing our own solution), I’d like to talk about them once we have more information to assess their reliability.

    Have you had the chance to find a solution, too?

      1. Yes, we tried RecordRTC, but I see it as more of a clever trick than a proper architectural solution (but never say never again… I’m thinking of web push implementations like the Comet protocol, i.e. all those “solutions” that use things in a way they weren’t meant for).

        What do you think about this, Laurent? Could it be considered stable enough (at least) to be adopted?

        I think in this case the best solution (until MediaRecorder is available) is having a server-side implementation of [part of] WebRTC: this way you can leverage all its features without having to deal with the consequences of a trick.

        We’re currently moving in this direction.
