
Writing Interactive Audio Applications with Low Latency

Session Summary: 
Writing an interactive audio application, where the device becomes a musical instrument, is a totally different game from writing a standard "media" application. How do you write a multi-touch DJ app, drum machine, or synthesizer that will be... awesome? The answer is to build your app for low-latency response. When someone presses start or stop on a media player like Banshee, a short delay goes unnoticed. But if a drum machine takes 100 ms from touch to "BOOM", it becomes instantly un-cool. This session will cover some of the basics of PCM audio, present good design practices for audio apps, discuss hardware challenges, and compare the API options in MeeGo. Real-world audio applications from the Indamixx 2 and the Linux Audio community will be used as examples. If time permits, the adaptation of MeeGo to the Indamixx 2 will also be discussed.
Session Abstract: 

Interactive audio applications are a different breed. They literally become musical instruments, and the connection between the musician and the instrument must be a tight feedback loop. A typical "media" application isn't concerned with latency or the details of the audio stream, but for an interactive audio application low latency is the whole point. This calls for a different design from the black-box approach of the "media" app.

When the user provides some kind of input (e.g. touches the screen), the time that passes before the user hears the resulting audio is called latency. This delay includes everything in the pipeline: touchscreen hardware response time, kernel response time, application response time, kernel audio output response time, and audio hardware output response time. Musicians are able to perceive latencies as low as 5-10 ms, and most other listeners will typically notice anything above 25 ms. A latency of 40 ms or higher may make your application unsuitable for live (interactive) performance. For most computer systems, and especially power-conserving mobile systems, this is a tight requirement.
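
To give a feel for how tight this budget is, the share of latency contributed by output buffering alone can be estimated from the buffer size and the sample rate. The sketch below uses Python only for readability. The function name, the 48 kHz sample rate, and the buffer sizes are illustrative assumptions, not measurements from MeeGo or the Indamixx 2.

```python
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Time, in milliseconds, that audio spends queued in the output buffer."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz


# Hypothetical configurations (assumed values, for illustration only).
small = buffer_latency_ms(128, 2, 48000)   # 256 frames queued
large = buffer_latency_ms(1024, 2, 48000)  # 2048 frames queued

print(f"small buffer: {small:.2f} ms")
print(f"large buffer: {large:.2f} ms")
```

Under these assumed settings the larger buffer already costs more than 40 ms before any touchscreen, kernel, or application time is counted, which is why buffer sizing tends to be the first thing to look at.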

To design for low latency, this session will present techniques to separate the UI and audio layers. Interrupt-driven audio programming will be introduced, as well as the rules of writing real-time audio code. Real-world examples from the Indamixx 2 and Linux Audio communities will be presented. Realistic performance measurements within MeeGo will also be discussed.
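
One common way to keep the UI and audio layers apart is to let the UI thread post events into a fixed-size single-producer, single-consumer queue that the audio callback drains without ever blocking. The sketch below is a minimal illustration of that idea, not the design presented in the session. The names `SpscRing`, `DrumVoice`, `on_touch`, and `process` are hypothetical, and Python is used only for readability; a real audio callback would normally be written in C or C++ with atomic index updates.

```python
class SpscRing:
    """Fixed-capacity queue for one producer (UI) and one consumer (audio)."""

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)  # one slot always stays empty
        self._head = 0  # advanced only by the consumer
        self._tail = 0  # advanced only by the producer

    def push(self, item):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full: drop the event rather than wait
        self._slots[self._tail] = item
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty
        item = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return item


class DrumVoice:
    """Toy voice: each touch restarts a decaying level (a stand-in for sound)."""

    def __init__(self):
        self.events = SpscRing(64)
        self.level = 0.0

    def on_touch(self, velocity):
        # Called from the UI thread: only enqueue, never touch audio state.
        return self.events.push(velocity)

    def process(self, out):
        # Called from the audio callback. Commonly cited real-time rules:
        # no locks, no memory allocation, no blocking I/O in here.
        while True:
            velocity = self.events.pop()
            if velocity is None:
                break
            self.level = velocity
        for i in range(len(out)):  # fill the caller's preallocated buffer
            out[i] = self.level
            self.level *= 0.5


voice = DrumVoice()
voice.on_touch(1.0)
block = [0.0] * 4
voice.process(block)
print(block)
```

The key design choice is that the callback never waits on the UI: if the queue is full the event is dropped, and if it is empty the callback simply keeps rendering.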

Within MeeGo, the audio APIs available to you are the MeeGo API (QtMultimedia), PulseAudio, ALSA, and GStreamer. Each of these APIs will be discussed, along with its pros and cons. StretchPlayer will be used as an example of porting to each API.

MeeGo's different hardware platforms also require consideration. Applications in this class are typically written with floating-point math, which is fine on the desktop. But many of MeeGo's target processors (e.g. ARM) do not have enough floating-point power for this kind of application. The basics of PCM audio and the different audio formats will be discussed.
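
As a small illustration of the integer alternative, the sketch below converts a floating-point sample to signed 16-bit PCM and then applies a gain using only integer arithmetic in Q15 fixed point (a gain of 1.0 is represented as 32768). The function names are hypothetical, and this is only one of several possible fixed-point conventions, not necessarily the one covered in the session.

```python
def float_to_s16(x):
    """Convert a float sample in [-1.0, 1.0] to a signed 16-bit PCM value."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input instead of wrapping
    return int(round(x * 32767))


def q15_gain(sample, gain_q15):
    """Scale a 16-bit sample by a Q15 gain using integer math only."""
    return (sample * gain_q15) >> 15  # the shift divides by 32768


HALF_Q15 = 1 << 14  # 0.5 expressed in Q15

loud = float_to_s16(1.0)
quiet = q15_gain(20000, HALF_Q15)
print(loud, quiet)
```

The multiply and shift in `q15_gain` map onto plain integer instructions, which is the appeal on processors with weak floating-point support. The trade-off is that the programmer has to manage headroom and rounding explicitly.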

(Time Permitting) The stock MeeGo components are optimized for power-saving mobile devices. While that is good in general, it works against low-latency interactive audio applications. The Indamixx 2 is a MeeGo-based OS for musicians, DJs, and pro-audio users. Building it required an adaptation kernel, the addition of the JACK Audio Connection Kit, and various other tweaks. One strength of MeeGo is that you can make an adaptation like this and still be "MeeGo." The changes made and the reasons for them will be presented.