GStreamer Echo Canceller

For a long time I believed that echo cancellers had no place inside GStreamer. The theory was that GStreamer was too high level and would never be able to provide accurate enough delay information for any canceller to work. With a fairly simple test, I could quickly confirm that the reported latency is often off by a period (generally 10ms). This isn’t strictly GStreamer’s fault and is not in any way catastrophic for the general playback experience.

With the appearance of WebRTC in browsers, it most likely became apparent that, to be cross-platform, browsers needed to ship their own canceller. That’s exactly what happened in libWebRTC (formerly libjingle, used in both Firefox and Chrome to implement WebRTC). They implemented an echo canceller that accepts an approximate delay, and this changes everything for GStreamer.

At Collabora, I recently had the opportunity to implement this WebRTC Audio Processing based echo canceller. The main motivation was that the canceller on the hardware DSP we had didn’t work due to a hardware bug. A lot of those boards had been produced and no rework was possible. To save these boards, we decided to try a software echo canceller. Even though it used a fair amount of CPU, the experiment was a success. I have since cleaned up the code and the new elements are now available in GStreamer Plugins Bad.

How does it work?

Echo

The first step is to understand what echo is. In a phone call over a loudspeaker, your microphone records both your voice and the far end voices. The side effect is that you are sending the far end listeners your voice along with a bad copy of their own voices from a moment earlier (the echo). To avoid this echo, you need to monitor the far end stream that you are playing back and “subtract” it from the recorded stream. In practice it’s much more complex, since the signal is deteriorated by the speaker and the microphone. You also need to figure out the delays and hint the canceller, otherwise you may end up with a terrible startup time, or it may simply not work.
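
To make the idea concrete, here is that naive “subtraction” written out in C. This is purely illustrative (the function is mine, not part of any library); the deterioration and delays just mentioned are exactly why a real canceller uses an adaptive filter instead of this.

  #include <glib.h>

  /* Purely illustrative: subtract the far end samples from the microphone
   * samples. A real canceller cannot do this directly, because the far end
   * signal reaching the microphone has been distorted and delayed by the
   * speaker, the room and the microphone itself. */
  static void
  naive_echo_subtract (const gint16 *mic, const gint16 *far_end,
                       gint16 *out, gsize n_samples)
  {
    for (gsize i = 0; i < n_samples; i++)
      out[i] = CLAMP ((gint) mic[i] - far_end[i], G_MININT16, G_MAXINT16);
  }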

The implementation was greatly inspired by an experiment Olivier Crête did in 2008 using Speex DSP. I must admit I never really understood his way of synchronizing the streams and literally ignored pretty much all the code that wasn’t GStreamer specific. The design works this way: a DSP element (webrtcdsp) processes the recorded stream, and a probe element (webrtcechoprobe) analyses the far end stream before it is played back. Due to a limitation of the WebRTC library, those two elements transform the input buffers into chunks of 10ms, with the help of GstAdapter. On the probe side, we push buffers into the adapter with their timestamps transformed to running time. This time, plus the pipeline latency, gives us the moment in running time when the buffer should be heard by the microphone. We then synchronize the far end data against the recorded data and let the WebRTC Audio Processing library do its magic. A simple way of testing the element is an echo loop:

  gst-launch-1.0 pulsesrc ! webrtcdsp ! webrtcechoprobe ! pulsesink

Without the canceller, this pipeline would create a lot of echo, and probably end in loud feedback if your microphone volume is high enough. With the canceller, you should instead hear only one echo. It behaves a bit like a sound monitor, but with far too much latency and the side effect of fading monotonic frequencies in and out. After all, this is not what the algorithm has been designed for. Try it in a real audio call application; that’s where you will most likely get the best results.
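
For the curious, the far end bookkeeping described above boils down to something like the following. This is a minimal sketch, not the actual webrtcechoprobe code: the function name is illustrative, and it assumes 10ms chunks of S16LE mono audio at 48kHz.

  #include <gst/gst.h>
  #include <gst/base/gstadapter.h>

  /* 10ms of audio, assuming S16LE mono at 48kHz (480 samples x 2 bytes). */
  #define CHUNK_10MS_BYTES 960

  static void
  probe_push_far_end (GstAdapter *adapter, const GstSegment *segment,
                      GstBuffer *buf, GstClockTime pipeline_latency)
  {
    /* Transform the buffer timestamp into running time... */
    GstClockTime running_time =
        gst_segment_to_running_time (segment, GST_FORMAT_TIME,
                                     GST_BUFFER_PTS (buf));

    /* ...adding the pipeline latency gives the moment at which this far
     * end data should be heard by the microphone. */
    GstClockTime heard_time = running_time + pipeline_latency;

    buf = gst_buffer_make_writable (buf);
    GST_BUFFER_PTS (buf) = heard_time;
    gst_adapter_push (adapter, buf);

    /* The WebRTC library only accepts fixed 10ms chunks. */
    while (gst_adapter_available (adapter) >= CHUNK_10MS_BYTES) {
      GstBuffer *chunk = gst_adapter_take_buffer (adapter, CHUNK_10MS_BYTES);
      /* ...here the chunk would be handed to the far end analysis... */
      gst_buffer_unref (chunk);
    }
  }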

Before I conclude, there is a good reason why I called the element DSP rather than AEC. WebRTC Audio Processing is much more than just an echo canceller. In fact, it implements a wide variety of filters: noise suppression, voice activity detection, etc. Currently we enable only a subset of them, but I’m definitely looking forward to enabling more (if not all) features from this library. I also encourage contributions. This work was only possible because of the great effort Arun Raghavan put into extracting the echo canceller from the WebRTC project and creating a standalone library usable by all. If you are interested in what cool features could be added in the future, have a look at Arun’s blog about beamforming. And lastly, thanks to my colleagues who had to suffer through a few weeks of me speaking to my computer and listening to my own echo.
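
Coming back to the extra filters mentioned above, here is roughly what enabling a few of them looks like from application code. Treat the property names as assumptions from memory; gst-inspect-1.0 webrtcdsp will give you the authoritative list for your version.

  #include <gst/gst.h>

  int
  main (int argc, char *argv[])
  {
    gst_init (&argc, &argv);

    GstElement *dsp = gst_element_factory_make ("webrtcdsp", NULL);
    if (dsp == NULL)
      return 1; /* webrtcdsp from GStreamer Plugins Bad is not installed */

    /* Property names assumed from memory; check gst-inspect-1.0. */
    g_object_set (dsp,
        "echo-cancel", TRUE,       /* the canceller itself */
        "noise-suppression", TRUE, /* one of the extra filters */
        "voice-detection", TRUE,   /* voice activity detection */
        NULL);

    gst_object_unref (dsp);
    return 0;
  }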

Sharing my first 3D printing experience

For a while I had wanted to try this new world of 3D printing, but I didn’t have any useful project in mind and my motivation faded away. Then last weekend, while I was setting up the umbrella on the balcony, I realized that my installation would be a lot better if I had the right part. Let me explain the setup. Instead of buying a large and heavy base, I simply attach the pole to the railing. As the railing is a bit higher, I need to create a certain distance between the pole and the railing in order to be able to plug and unplug the umbrella. Always leaving the umbrella in place is not an option due to strong winds. So the setup is simply a U-shaped bolt with a metal plate, two nuts and something between the railing and the pole. Last year, that something was a simple piece of wood, which ended up being very shaky.

Like any other project, I started googling. Obviously I wanted to do this work entirely from Linux, using open source software. I stumbled across a really simple way of creating the part I needed. The idea is simple: you draw the 2D shape of the part and then give it a depth. Now all you need is the tools. To draw the 2D shape, I found that Inkscape was quite nice. It works well in millimeters and you can do shape subtraction. The resulting shape is a rectangle minus the pole (a circle of 38 mm) and the bar (a square of 10 mm). To make sure it fits, the circle was enlarged to 39 mm and the bar to 11 mm, with the exception that the bar must not be entirely inside the part, otherwise it would be loose. You can find the SVG here.

The 2D part

The next step is to make a 3D model from this part. This is really easy using an Inkscape extension that produces an OpenSCAD file. Then, from OpenSCAD, you simply export to the STL format. This format is commonly supported by slicer applications. Now, this isn’t the last step. In order to print the part, you need to convert the STL into GCode, a format that describes the paths the printer will follow while printing. That GCode is often specifically configured for your printer. As I don’t own a printer, I headed to échoFab, a FabLab near my work office in Montréal, and the first in Canada. At the lab, they already have the slicer Cura configured for their printer to produce the GCode. The rest was very straightforward with the help of the people at the lab.

After two hours of printing, I finally got the parts and could install the setup. Enjoy!

Close view
The parts
The final result

GStreamer gains V4L2 Mem-2-Mem support

Two years since my last post; that seems like a long time, but I was busy becoming a GStreamer developer. This story started late in 2009. A team at Samsung (and many core Linux contributors) started adding a new type of driver to the Linux Media Infrastructure API (also known as Video4Linux 2). They introduced video decoding, encoding and video post-processing support through a class of drivers called memory-to-memory.

At the end of 2012, my employer, Collabora, was chosen to implement a proof of concept, enabling hardware decoding support on the Cotton Candy, a USB-stick-sized computer based on the Samsung Exynos 4412 and built by FXI. The new element was developed by Sebastian Dröge and was called mfcdec. All this being demonstration code, it never got close to being useful in production.

At the end of 2013, we got contracted again, to bring the demonstration code toward production code. At this point, we decided that we were no longer going to build an Exynos-specific decoder, but would instead re-use the existing GStreamer V4L2 support and do it the “right” way.

It took nearly three months, but with the help of my colleague Julien Isorce, we managed to upstream and ship hardware decoding support for the Cotton Candy. The new element is called v4l2videoNdec, where videoN is the name of the driver node (to allow having multiple decoders at the same time). The element was well suited for static pipelines and embedded applications, but not as flexible as software decoders on the desktop.
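
As an illustration, here is a minimal sketch of using it from an application. The element name v4l2video0dec and the sample file are assumptions: the actual name depends on which device node the decoder driver registered, so check gst-inspect-1.0 on your board.

  #include <gst/gst.h>

  int
  main (int argc, char *argv[])
  {
    GError *error = NULL;

    gst_init (&argc, &argv);

    /* v4l2video0dec is an assumption: the element name depends on the
     * device node the decoder driver registered. */
    GstElement *pipeline = gst_parse_launch (
        "filesrc location=sample.mp4 ! qtdemux ! h264parse ! "
        "v4l2video0dec ! videoconvert ! autovideosink", &error);
    if (pipeline == NULL) {
      g_printerr ("Failed to build pipeline: %s\n", error->message);
      g_clear_error (&error);
      return 1;
    }

    gst_element_set_state (pipeline, GST_STATE_PLAYING);

    /* Run until an error or end-of-stream. */
    GstBus *bus = gst_element_get_bus (pipeline);
    GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
        GST_MESSAGE_ERROR | GST_MESSAGE_EOS);

    gst_message_unref (msg);
    gst_object_unref (bus);
    gst_element_set_state (pipeline, GST_STATE_NULL);
    gst_object_unref (pipeline);
    return 0;
  }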

At the beginning of 2014, we started a new project with Endless Mobile. This time, the goal was to do hardware accelerated decoding, again on an Exynos 4412 platform, but in a desktop environment based on GNOME Shell. Two main issues had to be addressed: the buffer pool in GstV4l2 did not track its memory, and the color format produced by this decoder could not be color converted using GLES2 shaders (not enough coordinate precision). We had to implement a custom memory allocator and rewrite most of the v4l2 buffer pool code. To handle the color format, we had to implement an element that wraps the hardware video converter in order to obtain video frames in a format that can be uploaded to GLES2.

As of today, all this effort has landed in GStreamer and is part of the GStreamer 1.4 release. Some of my colleagues went even further by demonstrating, during SIGGRAPH, the benefits of using the V4L2 decoder combined with DMABUF and Wayland. Other teams, including Pengutronix with the Freescale CODA driver and STE, have started testing against this new promising decoder, which finally brings a standard, low-level way of decoding media on Linux.

As easy as Seed


It has been around for a while, but only today I decided to try SeedKit. I told myself, let’s write something, and this is what came up.

Toward an improved GLib network stack

I recently started looking into the GLib TLS implementation written by Dan Winship. As a reflex, I tried to merge it into my proxy branch. Things did not go so well, not because the branches are wrong, but because they both need similar features inside GIO. This is a good lesson for me; I should have paid more attention to what Dan was doing.

So I decided to restart my analysis with both features together (proxy and TLS). I’m far from being done, but I’d like to share my findings so far. Let’s start with a simplified representation of a “normal” network connection.

GLib Network Connection Timeline

The goal of that little timeline was to prove to myself that both features fit well together. Some may notice that this is exactly what would happen if you connected through SOCKSv5 to an XMPP server doing TLS auto-negotiation. This use case is important to me, since I would like to remove the TLS code from Wocky (the Telepathy Gabble XMPP stack) and use GLib in the future.

That’s all very interesting, but where’s the problem? Well, the thing is that both the proxy and the TLS handshakes require information about the original destination address. For the proxy handshake, we have to send the original hostname and port to the proxy server. For a proxy protocol like HTTP, you need to know the destination protocol in order to decide whether to leave the connection as-is (e.g. HTTP, Gopher and FTP) or to use the HTTP CONNECT method. For the TLS handshake, you need to check the server certificate against the original destination address (not against the proxy server address). You also need to take the scheme into consideration if a URI was used to connect.

On the proxy side, the implementation uses GProxyAddress (a subclass of GSocketAddress) to memorize the destination hostname and port. On the TLS side, a get_name() method was added to the GSocketConnectable interface. The first thing I asked myself when I started writing this was: why can’t a program just remember that information? Well, the answer is that in both cases this information might be acquired dynamically during address enumeration. As an example, if you use the GNetworkService class, you will never know what hostname was really used, since the returned GSocketAddress does not contain it.
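
As a small sketch of the proxy side, this is the kind of information a GProxy implementation can read back from a GProxyAddress during the handshake (accessor names as they appear in GIO; at the time of writing this code still lived in a branch):

  #include <gio/gio.h>

  static void
  print_proxy_destination (GProxyAddress *proxy_addr)
  {
    /* The original destination, preserved through address enumeration. */
    const gchar *host = g_proxy_address_get_destination_hostname (proxy_addr);
    guint16 port = g_proxy_address_get_destination_port (proxy_addr);
    /* The proxy protocol to speak, e.g. "socks5" or "http". */
    const gchar *protocol = g_proxy_address_get_protocol (proxy_addr);

    g_print ("Destination %s:%u, to reach through a %s proxy\n",
        host, port, protocol);
  }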

Basically, this is what I need to work on to pursue my way through a more complete network stack in GLib (and get proxy/TLS support from GLib into Telepathy Gabble ;)).

Transparent proxy for GLib applications

The biggest problem with socket-based proxies is their simplicity. It’s a bigger pain to implement them once for all applications than to implement them in a single application. But that is only true if you don’t consider system-wide and automatic configuration. So far, the result on Linux is that most desktop environments have global proxy settings, yet except for the browsers, none of the applications use them.

Mail Notification in Meego

With MeeGo for Netbooks 1.0, users of Google and MSN can now monitor and open their online e-mail accounts with a single mouse click in the MyZone panel. Thanks to Intel for hiring Collabora to implement this feature. The GUI shows the number of unread messages for each of your accounts in real time. Clicking the button will open the online mailbox in your favourite browser. Have a look at the bottom left of the following screen:

MeeGo for Netbooks 1.0 presenting Google Mail Notification

The feature has been made possible by the Telepathy draft API for mail notification. MeeGo provides the first integration of this API in a user interface. It fills a long-standing gap in the Telepathy framework.

Got a blog

First, let me introduce myself. I’m Nicolas Dufresne, also known as stormer on IRC. I’m currently occupied with adding proxy support to the GLib network IO API. From now on, I will be posting about my current and previous work, or simply sharing my ideas about open source software in general.

Enjoy!