Monday, February 4, 2013

Simple streaming audio mixer for Android with OpenSL ES - part 2




Previous parts
 This is the second part of a small tutorial on how to build a simple streaming audio mixer for Android using OpenSL ES. In the first part we initialized the OpenSL engine and created an AudioPlayer. We also set a callback routine that gets called when a buffer of sound data has finished playing. What remains is to implement the prepareSoundBuffer() and swapSoundBuffers() routines, as well as a routine (playSound) that starts new sounds.

The Logic


 As you may remember, we cleared and sent two sound buffers when initializing the audio output:
 // prepare mixer and enqueue 2 buffers
 // clear buffers
 memset(mAudioOutSoundData1, 0, sizeof(s16) * SBC_AUDIO_OUT_BUFFER_SIZE);
 memset(mAudioOutSoundData2, 0, sizeof(s16) * SBC_AUDIO_OUT_BUFFER_SIZE);
 // point to first one
 mActiveAudioOutSoundBuffer = mAudioOutSoundData1;

 // send two buffers
 sendSoundBuffer();
 sendSoundBuffer();

The logic is as follows:
 1) at the beginning we send two buffers,
 2) because we registered a callback, we are informed when the first buffer has finished playing,
 3) when the callback is called, we fill the finished buffer with new data while the second buffer is being played,
 4) we enqueue the buffer with the new data and the whole process repeats from 2).
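
 The cycle above can be tried out without any audio device. The following is only a sketch: FakeQueue and PingPong are hypothetical stand-ins (they are not OpenSL ES types), and "filling" a buffer is reduced to logging its index. It shows that the buffer refilled in the callback is always the one that has just finished playing.

```cpp
#include <deque>
#include <vector>

// Hypothetical stand-in for the OpenSL ES buffer queue: it only remembers
// which buffer index was enqueued, in order.
struct FakeQueue
{
    std::deque<int> pending;

    void enqueue(int bufferIndex) { pending.push_back(bufferIndex); }

    // Simulates the device finishing the buffer at the head of the queue.
    int finishOne()
    {
        int finished = pending.front();
        pending.pop_front();
        return finished;
    }
};

// Hypothetical model of the two-buffer logic described above.
struct PingPong
{
    FakeQueue queue;
    int active = 0;            // index of the buffer that will be filled next
    std::vector<int> fillLog;  // order in which buffers were filled

    void sendSoundBuffer()
    {
        fillLog.push_back(active);  // "prepare": fill the active buffer
        queue.enqueue(active);      // hand it over to the queue
        active = 1 - active;        // switch to the other buffer
    }

    // Step 1: send two buffers at the beginning.
    void start()
    {
        sendSoundBuffer();
        sendSoundBuffer();
    }

    // Steps 2-4: a buffer finished, refill it and enqueue it again.
    void onBufferFinished()
    {
        queue.finishOne();
        sendSoundBuffer();
    }
};
```

 After start() and any number of onBufferFinished() calls, two buffers are always waiting in the queue and the fill order keeps alternating between buffer 0 and buffer 1.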


Mixer


 As the name says, the mixer mixes something: sounds, in our case. How many sounds it can mix together depends on the number of channels we create. The channel data structure looks like this:
typedef struct sSoundInfo
{
 bool mUsed;
 u16 mSoundID;
 s16* mData;
 u32 mLength;
 u32 mPosition;
 u32 mStarted;
} SoundInfo;

 The meaning of the variables is as follows:
mUsed - the channel is in use (not free for playing a sound),
mSoundID - any ID you find useful for your game. It helps you identify which channel is playing which sound and do something with it (like stopping the storm and flash sounds when the storm is over ...),
mData - pointer to raw PCM data,
mLength - length of the sound data (counted in 16-bit samples, as the mixing code indexes mData directly),
mPosition - current position within the sound data (offset from the start),
mStarted - time when the sound started, in milliseconds.

 The mixer structure then looks like this:
 // mixer
 s32 mTmpSoundBuffer[SBC_AUDIO_OUT_BUFFER_SIZE * 4];
 s16 mAudioOutSoundData1[SBC_AUDIO_OUT_BUFFER_SIZE * 2];
 s16 mAudioOutSoundData2[SBC_AUDIO_OUT_BUFFER_SIZE * 2];
 s16* mActiveAudioOutSoundBuffer;
 // mixer channels
 SoundInfo mSounds[SBC_AUDIO_OUT_CHANNELS];

 Here you can see the two buffers we were speaking about: mAudioOutSoundData1 and mAudioOutSoundData2. There is also one temporary buffer, mTmpSoundBuffer, whose samples are 32 bits wide, twice the width needed for 16-bit sound. Its purpose will become clear shortly. Next, there is a pointer to the currently active one of the two buffers. Data is always written into the buffer this pointer points to; once the data is written and the buffer enqueued, the pointer switches to the other one. The last line says how many sound channels our mixer has.
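
 The pointer switch itself is not listed in this article, so here is one way it might be written. This is a sketch only: MixerBuffers and kBufferSize are hypothetical names used to keep the example self-contained, and the real swapSoundBuffers() is a member of SoundService.

```cpp
#include <cstdint>

typedef int16_t s16;

// Stands in for SBC_AUDIO_OUT_BUFFER_SIZE (assumption for this sketch).
const int kBufferSize = 256;

// Hypothetical holder for the two output buffers and the active pointer.
struct MixerBuffers
{
    s16 mAudioOutSoundData1[kBufferSize * 2];
    s16 mAudioOutSoundData2[kBufferSize * 2];
    s16* mActiveAudioOutSoundBuffer;

    // Start with the first buffer active, as in the initialization code.
    MixerBuffers() : mActiveAudioOutSoundBuffer(mAudioOutSoundData1) {}

    // Point to whichever of the two buffers is not active right now.
    void swapSoundBuffers()
    {
        if (mActiveAudioOutSoundBuffer == mAudioOutSoundData1)
            mActiveAudioOutSoundBuffer = mAudioOutSoundData2;
        else
            mActiveAudioOutSoundBuffer = mAudioOutSoundData1;
    }
};
```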

 The sendSoundBuffer routine is where the empty buffer is first filled with data and then enqueued for playback. I will repeat the code from the last article here:
void SoundService::sendSoundBuffer()
{
 SLuint32 result;

 prepareSoundBuffer();
 result = (*mSoundQueue)->Enqueue(mSoundQueue, mActiveAudioOutSoundBuffer,
   sizeof(s16) * SBC_AUDIO_OUT_BUFFER_SIZE);
 if (result != SL_RESULT_SUCCESS)
  LOGE("enqueue method of sound buffer failed");
 swapSoundBuffers();
}

 As you can see, the sound buffer is prepared in prepareSoundBuffer(), so let's take a closer look at it. First, we clear the temporary buffer:
void SoundService::prepareSoundBuffer()
{
 s32* tmp = mTmpSoundBuffer;

 // clear tmp buffer
 memset(mTmpSoundBuffer, 0, sizeof(s32) * SBC_AUDIO_OUT_BUFFER_SIZE);

 Next we go through our channels. First we check whether the channel is active. Then we calculate which is shorter: the sound buffer or the remaining sound data in the sample for our channel. We adjust the position for next time and add the sound data to whatever is already in the temporary buffer. As we are playing 16-bit PCM data, it may happen that two or more large amplitudes meet at the same position. If we summed them directly in the 16-bit output buffer, the result could overflow its upper or lower limit. With a temporary buffer twice as wide we are safe.
 // fill tmp buffer
 for (s32 i = 0; i < SBC_AUDIO_OUT_CHANNELS; i++)
 {
  SoundInfo& sound = mSounds[i];
  if (sound.mUsed == true)
  {
   s32 addLength = Math::min(SBC_AUDIO_OUT_BUFFER_SIZE, sound.mLength - sound.mPosition);
   s16* addData = sound.mData + sound.mPosition;
   if (sound.mPosition + addLength >= sound.mLength)
    sound.mUsed = false;
   else
    sound.mPosition += addLength;

   for (s32 j = 0; j < addLength; j++)
    tmp[j] += addData[j];
  }
 }

 Finally, we need to clip the data in the temporary 32-bit buffer to fit into the final 16-bit buffer. This is done with a simple loop that checks the range.
 // finalize (clip) output buffer
 s16* dataOut = mActiveAudioOutSoundBuffer;
 for (s32 i = 0; i < SBC_AUDIO_OUT_BUFFER_SIZE; i++)
 {
  if (tmp[i] > SHRT_MAX)
   dataOut[i] = SHRT_MAX;
  else if (tmp[i] < SHRT_MIN)
   dataOut[i] = SHRT_MIN;
  else
   dataOut[i] = tmp[i];
 }
}
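
 To see why the wider temporary buffer matters, the same sum-then-clip idea can be isolated into a small standalone function. This is only an illustration: mixAndClip is a hypothetical helper, not part of SoundService, and it uses std::vector instead of the fixed arrays above.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mixes any number of 16-bit tracks into one output of the given length.
// Samples are summed in 32 bits and only then clipped to the 16-bit range.
std::vector<int16_t> mixAndClip(const std::vector<std::vector<int16_t> >& tracks,
                                std::size_t length)
{
    std::vector<int32_t> tmp(length, 0);

    // Add every track to the wide temporary buffer.
    for (std::size_t t = 0; t < tracks.size(); t++)
    {
        const std::vector<int16_t>& track = tracks[t];
        std::size_t addLength = track.size() < length ? track.size() : length;
        for (std::size_t j = 0; j < addLength; j++)
            tmp[j] += track[j];
    }

    // Clip the sums so they fit into 16 bits.
    std::vector<int16_t> out(length);
    for (std::size_t i = 0; i < length; i++)
    {
        if (tmp[i] > SHRT_MAX)
            out[i] = SHRT_MAX;
        else if (tmp[i] < SHRT_MIN)
            out[i] = SHRT_MIN;
        else
            out[i] = static_cast<int16_t>(tmp[i]);
    }
    return out;
}
```

 Mixing two tracks that both hold 30000 at the same position gives a sum of 60000, which does not fit into 16 bits. The 32-bit buffer holds it without trouble and the clip step then stores 32767.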


Playing the sound(s)


 So far we have implemented everything needed to set up OpenSL ES and to mix sounds. Below is my routine that starts playing a sound. The class ISoundInfoProvider is an interface class with only one pure virtual method:
virtual void getSoundInfo(SoundInfo& aSoundInfo) = 0;

 My sound asset classes are derived from a common Asset class, like other assets (textures, ...). Additionally, they have to implement the ISoundInfoProvider interface. As the method parameter shows, the channel structure is handed to it. The implementation should fill in all the information needed to play the sound: where its data is stored, where it starts, how long it is, etc. For example, one of my sound asset types is the .WAV file. Its implementation skips the .WAV header and reports where the sound data is stored in memory and how much of it there is.
 With this approach you can use different file formats as long as you fill in the requested fields of the SoundInfo structure. Additionally, the "System" part of your engine (see Simple cross-platform game engine - Introduction) does not need to know anything specific about the currently developed game or about the "Engine" part of the engine.
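
 As an illustration, a .WAV provider might look roughly like the sketch below. It is not my actual asset class: WavSound and kWavHeaderSize are hypothetical, the file is assumed to be already loaded into memory, and the header is assumed to be the common 44-byte PCM header with 16-bit samples. Real .WAV files can carry extra chunks, so a robust loader would parse the chunk list instead of skipping a fixed size.

```cpp
#include <cstdint>

typedef int16_t  s16;
typedef uint16_t u16;
typedef uint32_t u32;

typedef struct sSoundInfo
{
    bool mUsed;
    u16 mSoundID;
    s16* mData;
    u32 mLength;
    u32 mPosition;
    u32 mStarted;
} SoundInfo;

class ISoundInfoProvider
{
public:
    virtual ~ISoundInfoProvider() {}
    virtual void getSoundInfo(SoundInfo& aSoundInfo) = 0;
};

// Assumption: plain 44-byte PCM header, no additional chunks.
const u32 kWavHeaderSize = 44;

// Hypothetical sound asset wrapping a .WAV file held in memory.
class WavSound : public ISoundInfoProvider
{
public:
    WavSound(u16 aSoundID, uint8_t* aFileData, u32 aFileSize)
        : mSoundID(aSoundID), mFileData(aFileData), mFileSize(aFileSize) {}

    virtual void getSoundInfo(SoundInfo& aSoundInfo)
    {
        aSoundInfo.mSoundID = mSoundID;
        // the PCM data starts right after the header
        aSoundInfo.mData = reinterpret_cast<s16*>(mFileData + kWavHeaderSize);
        // length in 16-bit samples; zero if the file is shorter than a header
        aSoundInfo.mLength = (mFileSize > kWavHeaderSize)
            ? (mFileSize - kWavHeaderSize) / sizeof(s16) : 0;
        aSoundInfo.mPosition = 0;
    }

private:
    u16 mSoundID;
    uint8_t* mFileData;
    u32 mFileSize;
};
```

 The mUsed and mStarted fields are deliberately left alone here, because playSound sets them itself.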

 The playSound routine first looks for a free channel. If no channel is free and the priority is zero, no sound is played. If the priority is higher than zero, the channel that has been playing the longest is selected and reused. Much better handling of priorities is possible here, but 8 channels have been enough for me so far, so I have had no problems. I used the priority only when I wanted to be 100% sure that a sound would play. At the end I ask the ISoundInfoProvider implementation to fill in the initial channel data, and the time when playback started is stored.
bool SoundService::playSound(ISoundInfoProvider* aSoundInfoProvider, s32 aPriority)
{
 // channel slot that will receive the new sound (filled by the sound info provider below)
 SoundInfo* soundInfo = NULL;

 // find free sound slot
 for (s32 i = 0; i < SBC_AUDIO_OUT_CHANNELS; i++)
 {
  if (!mSounds[i].mUsed)
  {
   soundInfo = &mSounds[i];
   break;
  }
 }

 // not any free slot?
 if (soundInfo == NULL)
 {
  if (aPriority == 0)
   return false;
  else
  {
   // find oldest sound
   u32 started = 0x7FFFFFFF;
   s32 slot = -1;
   for (s32 i = 0; i < SBC_AUDIO_OUT_CHANNELS; i++)
   {
    if (mSounds[i].mStarted < started)
    {
     started = mSounds[i].mStarted;
     slot = i;
    }
   }

   if (slot == -1)
    return false;

   soundInfo = &mSounds[slot];
  }
 }

 // load sound info into free slot
 aSoundInfoProvider->getSoundInfo(*soundInfo);
 soundInfo->mUsed = true;
 soundInfo->mStarted = TimeService::getTickCount();

 return true;
}
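
 The mSoundID field stored in each channel can be used the other way round as well, for example to stop every channel playing a given sound. The helper below is a hypothetical sketch, not a routine from my SoundService; it takes the channel array as a parameter only to stay self-contained. It also assumes it is not called at the same moment the mixer callback walks the channels, so real code may need some synchronization.

```cpp
#include <cstdint>

typedef int16_t  s16;
typedef uint16_t u16;
typedef uint32_t u32;

typedef struct sSoundInfo
{
    bool mUsed;
    u16 mSoundID;
    s16* mData;
    u32 mLength;
    u32 mPosition;
    u32 mStarted;
} SoundInfo;

// Hypothetical helper: frees every used channel that plays aSoundID.
// Returns the number of channels that were stopped.
int stopSoundsWithID(SoundInfo* aSounds, int aChannelCount, u16 aSoundID)
{
    int stopped = 0;
    for (int i = 0; i < aChannelCount; i++)
    {
        if (aSounds[i].mUsed && aSounds[i].mSoundID == aSoundID)
        {
            // a channel marked as unused is skipped by the mixing loop
            aSounds[i].mUsed = false;
            stopped++;
        }
    }
    return stopped;
}
```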


Problems


 As you have probably guessed, a new sound starts playing only when the buffer it was mixed into starts playing. So there is some delay between the moment you request playback and the moment the sound is heard, because the buffer has to wait in the two-buffer queue. You may try to make the buffers smaller, but this can lead to another problem: if the buffers are too short, you cannot fill them with data fast enough and the sound becomes choppy. So you have to balance the size of the buffer.

 Here are the values that work well for me:
#define SBC_AUDIO_OUT_BUFFER_SIZE 256
#define SBC_AUDIO_OUT_CHANNELS 8
#define SBC_AUDIO_OUT_SAMPLE_RATE 11025


Conclusion


 If you followed both parts of this article, you should now have enough information to set up OpenSL ES and to mix sounds in a simple way. We used this mixer for playing sounds in our games Deadly Abyss 2 and Mahjong Tris.