
Subject: Re: [boost] Fwd: [asio]Extension for audio device service
From: adrien courdavault (adrien.courdavault_at_[hidden])
Date: 2013-04-24 17:54:29


Just an update: I am trying to clarify some basic points of the project
(the scope especially).
The ideas on the design are still really preliminary; I am building
a comparison of existing APIs on various systems to see what would be
a good portable interface for several actions. The difficulty is
clearly the variety of existing systems. But I think the API does not
directly depend on that; it just has to be common. I guess the API
will also provide native handle getters so users can access the
underlying implementation when needed.

Nonetheless I can say that seeing this as a Boost.ASIO-like extension
seems to be right.
There will probably be different kinds of I/O objects: some to list the
devices and retrieve static information, others to open a device and set
up the sockets/streaming ...
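
To make that split concrete, here is a very rough sketch. Every name
below is hypothetical (borrowed from the draft attached at the end of
this mail) and only illustrates the two kinds of objects:

boost::asio::io_service service;

// first kind of I/O object: enumerate endpoints and query their
// static information (direction, supported formats, address)
audio::resolver resolver(service);            // hypothetical type
auto endpoints = resolver.resolve(audio::direction::output);

// second kind of I/O object: open one endpoint and drive the
// stream (connect a callback, start/stop)
audio::socket socket(service);                // hypothetical type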

On 24 April 2013 14:35, adrien courdavault <adrien.courdavault_at_[hidden]> wrote:
> Hi
>
> From all that I have read (which is, by the way, really interesting), I
> would like to be clearer, and I have to put this in the document:
> I want to do something simple.
> 1. audio streaming output only, with the native format supported by the
> device and no mixer or DSP processing.
> 2. add input (only asynchronous)
> 3. push/shared modes based on the existing mixer backends.
>
> I don't want to do something that does a lot of things:
> conversions of formats and mixing are well done in existing mixers.
> For this reason, resampling is not part of the problem either.
>
> Then it will be possible to write layers over that to make it higher level.
>
> But really I just want something that can:
> 1. list and address audio devices and endpoints (given a certain audio
> direction and channel count - this is related to the problem of the bus
> kind: 1.0, 2.0, 2.1, 5.1)
> 2. connect a streaming callback to an audio device, using the native
> format supported by the driver
> a. output
> b. input
> [3. later] connect to an existing mixer backend that allows more
> formats but that is not streaming
>
> Thank you a lot for your feedback.
>
> I put my comments inside your post.
>
> On 24 April 2013 12:37, Brendon Costa <brendon.j.costa_at_[hidden]> wrote:
>> Hi Adrien,
>>
>> I am interested in helping out with this. I have also been thinking about
>> developing a Boost library for use of audio hardware for about a year now,
>> but have not had the time and motivation to get around to it.
> I think it will take time indeed.
>>
>> The general idea of using boost::asio to provide an interface that is
>> consistent with other IO in boost is a very good one if we can achieve the
>> realtime requirements through it (which it sounds like we can from what you
>> have said).
> As I said, it looks possible from what I know (to fit the Boost.ASIO
> design). I may be wrong, but worst case we have something that works
> and that does not follow the Boost.ASIO design.
> But I would prefer the former, and I really think we can do this like
> other Boost services.
>
>>
>> I would suggest that you need to support multiple audio "backends" per OS
>> when it comes to device selection. For example, on linux there is a
>> plethora of choices and no one choice is "correct". The user should be
>> given the option to choose which backend(s) to use/support. You will notice
>> that some apps like to provide all options, and others work with only one
>> backend.
> Clearly I want to support all streaming solutions: Steinberg ASIO and
> WASAPI on Windows (minimum), ALSA on Linux, and CoreAudio
> streaming.
> Then, as a second step, work on the non-streaming modes (shared modes).
>
>>
>> I was thinking of proposing three related audio API's for boost a while
>> ago. I do believe that to do this properly is a *LOT* of work (which is why
>> I never really started).
> I think this might be big, but that is also why we should stay focused
> and clear on what we would like to do.
> This is why I would like, in the first steps, to reduce the scope to
> streaming modes;
> also, the problem of cross-compiling is complex.
> At first I would prefer to only support MSVC on Windows, and
> gcc and clang on Linux and Mac (not MinGW on Windows, because of the
> complexity of building against the Win32 APIs).
>
>>
>> 1) Audio DSP Module
>> This is for DSP related tasks and takes care of common things like audio
>> format conversion, resampling etc.
>> It is possible to write all of this I think in a platform agnostic way,
>> however some platforms provide helpers for some of these things like MacOSX
>> and maybe we should consider how we could use them or at least integrate
>> with them.
>> Possibly define both a compile time and runtime interface as both have
>> advantages
>> This would be used in the main interface for the play/record API for the
>> pull/push callback
>> Additionally, this may expand in the future with all sorts of algorithms
>> defined for audio processing. Though at first I would suggest it be the
>> minimum of what is required to support the other two modules described
> Yes, all you say is right, but at first I would prefer to
> support only the native formats offered by the device.
> I don't want to do a processing layer, like JACK for instance; I
> would really like to stay focused on the problem of opening the devices.
> This means that the device may answer pretty often that its format is
> 16-bit PCM int, and that it does not support floating point.
> This is what the different driver families do.
> Then, in a further step, add the push/shared mode (which will obviously
> offer way more formats).
>
>
>>
>> 2) Audio HW Play/Record Module
>> This is for accessing the play/record of audio on audio hardware.
>> I would suggest, as you mentioned, both push and pull modes. You can
>> "emulate" one with the other, albeit with a lot of work (and some
>> latency), and some backends support both. So we should provide the interface
>> for all backends but export a "capabilities" of the backend indicating what
>> it supports natively (same as we should do for audio format).
>>
> Yes, again, I don't want to do emulation or processing; I first want
> to provide easy access to the native formats supported by the devices
> or the OS mixer.
>
>> 3) Audio Mixer Module
>> This is to control mixer settings including things like default device
>> selections, volume controls, mutes, supported formats and device
>> enumeration.
>> Also, it should receive events indicating external changes to these things.
>>
>> I have designed and implemented an API at work that did a lot of this
>> (though the separation was not so well defined) and supported a number of
>> backends including DirectSound, WASAPI, CoreAudio, portaudio, PulseAudio,
>> OpenSLES.
>>
>> I might spend some time next week to writeup a proposal for a basic outline
>> of what I think an interface could look like. Maybe we can compare ideas
>> and notes?
>>
> Yes, that would be a good idea.
> Again, I will first focus on the portability of native streaming modes,
> which is the main issue I want to address.
>>
>>
>>
>> --- snip ----
>> errcode audio_port::open(audio_format &,const audio_direction,
>> audio_device_mode ).
>> --- snip ----
>>
>> Could you define what you mean by an audio_port?
> Yes, sorry: the audio port would be the I/O object created by the
> service; it would target an audio endpoint (not a device).
> The name is probably not good.
> The service, when creating an audio I/O object, would have information
> to know which endpoint to open.
> Probably the direction should not be in the open API, but given when
> requesting the I/O object from the service.
>
>>
>> Do you consider an audio_port to be an audio hardware device or a port on
>> that device?
> The endpoint, not the device.
> I have to make that clearer in the document.
>>
>> For example: I consider a hardware device as representing a single physical
>> device having one or more ports. Where a port is one of:
>> * Recording port : Normal wave device input records from mic/line in etc
>> * Playback port : Normal wave device output plays out to line out/speakers
>> etc
>> * Monitor port : Not always available but is similar to a Recording port,
>> except it returns exactly what is going out of the device speakers (for
>> example may include CD noise or MIDI or WAVE output from other processes).
>> This is sometimes used for doing echo cancellation on input audio that
>> cancels system sounds generated from other applications and not just your
>> application.
>>
>> I also think some professional sound cards may have additional "ports" of
>> type recording/playback. I think the MOTU is like that for example.
> Yes, I have the same.
>>
>> If you take that definition of a port, then the direction may become
>> unnecessary (at least I haven't seen a case where it would allow
>> bi-directional use).
> I think we have the same point of view: we have to find a way to
> address endpoints on a device.
> Then the audio_port, the I/O object, is created by the audio_service from
> the addressing information you give.
> The problem of addressing endpoints is a big issue; someone suggested
> something like IP addressing.
> I mean, it is not the same, but clearly you would like to say something
> like
> auto my_audio_port = service.create("motu.output.default_stereo");
> or something else.
> Then the open function would only have the mode and format as arguments.
>>
>>
>> As for enumeration of "ports" it makes sense to have a simple function as
>> you described that actually wraps something that is possibly more complex
>> from the "Audio Mixer Module". I guess it *may* look something like:
>> std::list<boost::audiohw::port>
>> boost::audiohw::get_ports(boost::audiohw::recording_port|boost::audiohw::playback_port|boost::audiohw::monitor_port);
>>
> This is related to the problem of addressing: we should be able to get
> a list of devices, and a list of audio endpoints on each device.
>
>> What do you propose to be part of the audio_format struct?
>> I can think of:
>> * Channel Count : Integer >= 1
>> * Data format : float32, int16, ...
>> * Interleaving : yes/no
>> * Sample rate: 48000
>> * Channel mapping : (have a default defined mapping, but allow user
>> overrides)
> Yes, a bit like that.
> But I don't know if channel mapping should be in there; that depends
> more on the addressing of the endpoints, I think.
>>
>>
>> There should also be the ability to define some form of latency parameters
>> I think. Possibly even a place holder for extensible backend specific
>> configuration (latency details could be part of that).
>>
> Yes, even if that is something I have to study more.
>>
>> --- snip ----
>> errcode audio_port::connect(audio_callback &);
>> --- snip ----
>>
>> This makes sense, would you allow multiple callbacks to be connected to the
>> single open port?
> No, not in streaming mode,
> because connecting several callbacks suggests that you have a mixer.
>
>> For recording ports, I guess this is simple in that it just calls all
>> connected callbacks with the same data
> Same question; it should probably be exclusive.
>
>> For playback ports, would you call multiple and then mix the audio into a
>> block given to the device?
> For the push/shared mode, which implies a mixer, I would like to use
> the backend provided by the OS in this kind of mode.
>
>> If so what are the restrictions on the audio produced by the callbacks?
>> Should they provide exactly as much data as requested?
> Yes: when you write a realtime plugin or application using streaming,
> you receive a buffer to fill and its size.
> If you want to produce more audio data, then you have your own local
> buffer for buffering.
>
> This could be added in a further evolution, but I really want to keep
> it simple.
>
>> If so, how much data does the callback request in one call? Is it fixed,
>> variable, or configurable (the latency parameters I mentioned in the
>> config)?
>> If they could return less than requested, then the mixing becomes more
>> complex.
> No, the device asks for the data it needs, and this depends on how the
> device and the driver are made.
> Latency is fixed, and so is the sample rate.
>
>> Can you connect callbacks after you have started the device?
> No: you connect, then start/stop.
>
>> I would assume that because the audio format has been defined already, then
>> the callback can be verified based on the callback type.
>> I.e. If defining a float32 format, then this could verify the callback data
>> type is a float. Or you could go the C way and use void* but there are
>> advantages to the type safety
>>
>> I guess a proposal for the callback could be:
>> result on_audio_cb(float* data_out, size_t data_out_size, const
>> audiohw::format& audio_format);
> Not exactly; this depends on the format and the channel number.
>>
>> How do you plan to handle synchronous input/output audio?
>> What I mean by this is that the consumption and production of audio is
>> synchronized so when you get 20msec of audio from the input you must also
>> output 20msec.
> We have to see that, but I think this is two separate callbacks, one
> sending the data, the other pulling, and I think both may have
> different buffer sizes.
>
>> This has benefits to various audio algorithms, and can be achieved across
>> different devices with special clock monitoring and resampling techniques.
>> It is generally the case for input/output on the same physical audio
>> hardware that they are synchronized, but this does not hold across
>> different hardware, as the audio card clocks are often unsynchronized.
>> This decision can affect how to structure the open() call. For example, you
>> may open both input/output at the same time if you want to support this.
> This is a good question; in the first steps, no.
>>
>> There are a few options; the first is my preferred one, though it is more
>> difficult to use:
>> * You can define, for synchronous handling, that the record/playback/monitor
>> callbacks will always be called in order
>> * You could do what portaudio does and define the callback to carry both
>> buffers (but that doesn't work well with the previous design):
>> result on_audio_cb(float* data_out, const float* data_in, size_t data_size,
>> const audiohw::format& audio_out_format, const audiohw::format&
>> audio_in_format);
>>
>> How do you plan to handle asynchronous errors in the audio device?
>> Maybe an error code passed to the on_audio_cb() or a separate callback that
>> can be registered for errors?
> Probably separate.
>>
>>
>>
>>
>> On Wednesday, 24 April 2013, adrien courdavault wrote:
>>
>>> Hello.
>>>
>>> I make this new thread to be clearer.
>>> There is currently no way to manage audio endpoint connections easily.
>>> It looks like some people might find this useful (as I do), and I have
>>> been advised on the Boost dev list to try to detail this, as an
>>> extension to Boost.ASIO.
>>>
>>> For this reason I create this thread.
>>>
>>> I'm trying to make a very basic first draft of the concepts and see if
>>> this may be a good idea.
>>>
>>> I attached here the first things I've written. This is very short and
>>> general.
>>>
>>> I would like to know:
>>> * do you think I'm going in the right direction by seeing this as a
>>> Boost.ASIO extension?
>>> * do you have suggestions?
>>> * would someone like to participate?
>>>
>>> Thank you
>>>
>>


Boost.ASIO extension with an audio device service

   Author: Adrien Courdavault
   Date: 04/23/2013

  Introduction

   The goal is to provide a simple way to open an audio device and connect it
   to an audio callback function.

   The scope of this proposal is intentionally very limited. We want to:
     * define a service to address audio devices and endpoints
     * define how to open the addressed endpoints in streaming mode, for
       output or input direction

   In the long term we also want to:
     * define how to open an existing backend audio mixer such as WASAPI,
       DirectSound, or PulseAudio in shared / push mode, i.e. non-streaming
     * add a generic compatibility layer for 16-bit PCM to floating point
       (a minimal sketch of that conversion follows this list)
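
   As a taste of what that compatibility layer would do, here is a minimal,
   self-contained sketch of the usual 16-bit PCM <-> float mapping. The
   1/32768 scale is one common convention, not a fixed part of this
   proposal:

#include <cstdint>

// One common convention: map int16 [-32768, 32767] into roughly [-1, 1].
inline float int16_to_float(std::int16_t s)
{
    return static_cast<float>(s) / 32768.0f;
}

inline std::int16_t float_to_int16(float f)
{
    if (f > 1.0f)  f = 1.0f;   // clamp to the representable range
    if (f < -1.0f) f = -1.0f;
    return static_cast<std::int16_t>(f * 32767.0f);
}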

   We do not currently plan to implement:
     * a mixer
     * a format conversion layer
     * any processing algorithm
     * a re-buffering layer (from variable buffer length to fixed length,
       for instance)
     * any kind of higher-level function

  Motivation

   There is currently no way to connect to an audio device using the C++
   standard library or Boost.

   There are, however, several libraries that have this as a feature, but
   they do not have a Boost-compatible license, or do much more than audio
   device connection, or are not designed for generic C++ usage, or are
   hard to integrate into a C++ project.

  Integration in Boost.ASIO

   The current Boost.ASIO library seems to be a good place to start this
   implementation, which in concept is a bit like the serial port service
   of that library (see the sketch below).

   However, considering the differences from the other existing services of
   Boost.ASIO, it seems better to make a separate service for this use.
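
   For reference, this is roughly what the existing serial port service
   looks like (the device name is just an example); the audio service
   would follow the same I/O-object-plus-service shape:

#include <boost/asio.hpp>
#include <cstddef>

int main()
{
    boost::asio::io_service io;

    // an I/O object bound to the io_service, like the proposed audio socket
    boost::asio::serial_port port(io, "/dev/ttyUSB0");
    port.set_option(boost::asio::serial_port_base::baud_rate(115200));

    char buf[64];
    std::size_t n = port.read_some(boost::asio::buffer(buf));
    (void)n;
}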

  State of the Art

    Foreword

   Very often there are two ways to play audio (a sketch contrasting the
   two follows this list):
     * the program implements a callback which will be called by the OS when
       audio is needed. This is a streaming API.
     * the program pushes audio data to the audio system, which plays it
       when needed.
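
   A minimal sketch of the two models, with hypothetical stand-ins for
   whatever a given OS provides (mixer_write below does not exist; it
   stands for e.g. an OSS-style blocking write):

#include <cstddef>

// Streaming (pull) model: the OS calls this when it needs audio, and
// we must fill exactly 'frames' samples.
void on_audio_needed(float* out, std::size_t frames)
{
    for (std::size_t i = 0; i < frames; ++i)
        out[i] = 0.0f; // render the signal here (silence for the sketch)
}

// Push model: we write blocks whenever we have them; the audio system
// buffers, mixes and plays them when needed.
void push_loop()
{
    float block[512] = {}; // one rendered block
    // mixer_write(block, 512); // hypothetical blocking push call
}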

    Various driver families

   The problem of opening audio devices and connecting programs to these
   devices differs between the families of OS.
     * On MacOS, the audio devices are managed by the [1]CoreAudio API only.
       CoreAudio supports a streaming interface with very low latency.
     * On Windows, there are several driver families:
          + WASAPI: the audio system since Vista; it has an exclusive and a
            shared mode.
               o The exclusive mode is like the kernel streaming mode but in
                 the user layer, required for low-latency professional audio.
               o The shared mode is like DirectSound, a push mode.
          + DirectSound and the Wavexxx APIs receive the audio buffers of all
            applications, then mix them and send the result to the audio
            device driver. This is a push mode.
          + Steinberg ASIO: a format for drivers to implement a streaming
            API on Windows. It exists mainly because there was no low-latency
            streaming API before WASAPI.
     * On Linux:
          + ALSA (Advanced Linux Sound Architecture) supports up to 8 audio
            devices at the same time. It is used as a streaming API and is
            available on Linux >= 2.5.
          + PulseAudio: a POSIX-compliant network audio system which uses
            ALSA or OSS to open the audio device.
          + OSS: a legacy audio system with read/write functions (like a
            file); this is a push mode.

  Design

    Overview

      The endpoint addressing

   This service can be used to create I/O objects to list audio devices and
   endpoints, and to get their addresses. The question is: what could an
   audio endpoint address look like?
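
   One option raised in the discussion is a hierarchical string address such
   as "motu.output.default_stereo". As a self-contained illustration (the
   scheme itself is pure speculation), such an address could be split into
   device / direction / endpoint parts:

#include <sstream>
#include <string>
#include <vector>

// Split a speculative "<device>.<direction>.<endpoint>" address into
// its components, e.g. "motu.output.default_stereo"
//   -> {"motu", "output", "default_stereo"}.
std::vector<std::string> split_address(const std::string& addr)
{
    std::vector<std::string> parts;
    std::istringstream in(addr);
    std::string part;
    while (std::getline(in, part, '.'))
        parts.push_back(part);
    return parts;
}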

      The endpoint access

   Once you have addressed an endpoint, you need to get an I/O object that
   can make the request to open the endpoint stream, and start and stop it.

    Other issues to address

      Compiler portability

   There are a lot of possible combinations here; one of the main issues is
   the specificity of the platforms and the compatibility with various
   compilers.

   For instance, on Windows, several headers needed to build the WASAPI
   implementation require the Visual Studio SDK, and they do not build with
   MinGW without several workarounds.

   In the first steps there is no need for such workarounds, but this might
   become an issue in the long term.

  Proposed API

   This part is really changing a lot currently and is just a vague idea.

boost::asio::io_service service;
auto resolver = audio::resolver(service);

// container of a list of endpoints
auto collection = resolver.resolve(
    audio::direction::output);

// select the first; I assume there is at least one
auto endpoint = collection[0];

// id or address of the endpoint
auto id = endpoint.get_id();

audio::socket socket(service);
audio::format desired_format(
    audio::data_type::int16,
    44100., // sample rate
    ...);
audio::format nearest_format;
auto errcode = socket.open(desired_format, nearest_format);

// connect your callback
errcode = socket.connect(my_audio_callback);
socket.start();
//...

   All of this is currently speculative.
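
   The shape of my_audio_callback above is also still open. Going by the
   pull model discussed in the thread (the service asks for a fixed number
   of frames, in the format negotiated at open()), it could look roughly
   like this; every name here is speculative:

// Speculative callback shape: fill exactly 'frames' frames of audio in
// the format negotiated at open(); the buffer's concrete type depends on
// fmt (data type, channel count, interleaving).
void my_audio_callback(void* data_out,
                       std::size_t frames,
                       const audio::format& fmt);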

References

   1. http://developer.apple.com/library/ios/#documentation/MusicAudio/Conceptual/CoreAudioOverview/CoreAudioEssentials/CoreAudioEssentials.html

