Friday, April 13, 2007

IAudioEndpointVolume

As I mentioned in my last post, synchronizing Windows' mixer with the hardware is one of the known issues with Vista compatibility. I've been working on this for the last few days and I now have code that should work. This post covers my understanding of the system and how I got it to work. While I don't think many people are facing exactly the challenges we are, I suspect my experiences may shed some light on the system as a whole, which you might find useful.

The new audio system in Windows Vista (WASAPI) is made up of a cornucopia of streams, sessions, endpoints, audio processing objects, and so on. Going into detail on all of these is way beyond the scope of this post.

One piece you do need to understand is sessions. Every process that works with audio has one session that is created implicitly by default. The new WASAPI interfaces let you manipulate this session and create more. If you are still using the old multimedia system, you don't have access to the session manipulation functions and you're stuck with the default one. This isn't a problem for most purposes. It is a problem for us because all mixer adjustments made through the old multimedia API apply only to this default session object, which is local to your process and does not affect the rest of the system. Muting the microphone just for the PerSonoCall process and leaving the rest of the system alone isn't very useful. The point of the integration is to keep the hardware mute in sync with the system-wide mute.

This leads us to the next bit of WASAPI that needs to be understood: endpoints. An endpoint is something capable of rendering audio (e.g. a speaker) or capturing audio (e.g. a microphone). An endpoint is the lowest level of access to audio properties before everything disappears into kernel-space drivers and the hardware itself. If you are looking at the mixer control panel in Vista, the master volume slider on the left represents the endpoint's setting, while the subordinate mixers to the right represent the settings for all the currently active sessions.



Changing settings at the endpoint is exactly what we need to do in the Plantronics SDK. To accomplish this, we first need the very latest Platform SDK. Once that is in place we need to include Mmdeviceapi.h and Endpointvolume.h from the SDK's /include directory. The last thing to note is that all the new WASAPI stuff is done through COM interfaces, meaning you can add the code to your project and compile without introducing any Vista dependencies on your user's machine. Just make sure you're on Vista before you try to instantiate any WASAPI objects. Also, since this is Vista code, we need to make sure WINVER, _WIN32_WINNT, _WIN32_WINDOWS, and _WIN32_IE are defined to the proper value of 0x0600.
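As a sketch of that setup, a shared header might pin the version macros before pulling in the SDK headers. The values are the ones given above. The #ifndef guards and the include order are my assumptions, following the usual stdafx.h pattern, so adjust them to fit your own project.

```cpp
// Target Vista (0x0600) unless the build settings already define these.
#ifndef WINVER
#define WINVER 0x0600
#endif
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600
#endif
#ifndef _WIN32_WINDOWS
#define _WIN32_WINDOWS 0x0600
#endif
#ifndef _WIN32_IE
#define _WIN32_IE 0x0600
#endif

#include <windows.h>
#include <atlbase.h>          // ATL::CComPtr
#include <Mmdeviceapi.h>      // IMMDeviceEnumerator, IMMDeviceCollection, IMMDevice
#include <Endpointvolume.h>   // IAudioEndpointVolume
```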

The place we actually start code-wise is with IMMDeviceEnumerator, which will get us a list of endpoints in an IMMDeviceCollection. That collection gives us access to a series of IMMDevice entries that represent devices meeting the specifications we pass in to IMMDeviceEnumerator::EnumAudioEndpoints(). In this case we'll be asking for active capture devices.

//Obtain a device enumerator, the root of all this Vista multimedia business
ATL::CComPtr<IMMDeviceEnumerator> deviceEnumeratorPtr;
deviceEnumeratorPtr.CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL);

//use the enumerator to obtain a collection of devices
ATL::CComPtr<IMMDeviceCollection> devCollectionPtr;
deviceEnumeratorPtr->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &devCollectionPtr);

Next we'll work our way through the collection checking one device at a time until we find the one we want. In this case, a CS50/CS60-USB. Getting a specific device is a little bit complicated. If you aren't looking for a specific device, using IMMDeviceEnumerator::GetDefaultAudioEndpoint() is easier. The reason it is complicated is that we'll need to use the property store system, another new thing with Vista. We'll need propvarutil.h from the platform SDK's /include directory as well as propsys.lib from the /lib directory. Note that adding this library file introduces a dependency on the presence of propsys.dll, which won't be around on any system other than Vista. Marking propsys.dll as a delayed load solves this problem, but has some other consequences beyond the scope of this post.

A property store is a bag of key/value pairs. The key has its own specific PROPERTYKEY struct that consists of a GUID identifying the property's family and a DWORD identifying the specific property. The value accessed by the key is a PROPVARIANT, which isn't much different from the run-of-the-mill VARIANT you probably already know. This property system was developed for the Windows shell and for working with files. In WASAPI's case it is being used with vaguely file-like objects such as endpoints. You can see that a lot of the property keys used by the shell are already defined in propkey.h in the SDK's /include directory. There are a few defined for the audio system in mmdeviceapi.h. Unfortunately the ones we need aren't there. Through experiment, I've determined that {B3F8FA53-0004-438E-9003-51A46E139BFC}.6 appears to be the correct key. It works, but don't take it as gospel or anything. Also note that opening the property store obtains a lock on the object, so open it, read it, and close it quickly.
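To make the {GUID}.id notation concrete, here is a small sketch that parses that text form into the two parts of a key. SketchGuid, SketchPropertyKey, and ParsePropertyKey are hypothetical names of mine, not SDK types. The structs only mirror the field layout of GUID and PROPERTYKEY so the sketch builds without the Platform SDK.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins that mirror the field layout of GUID and PROPERTYKEY.
struct SketchGuid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

struct SketchPropertyKey {
    SketchGuid    fmtid;  // identifies the property family
    std::uint32_t pid;    // identifies the specific property in that family
};

// Parses text of the form "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}.N".
// Returns false if the text does not match that shape.
bool ParsePropertyKey(const char* text, SketchPropertyKey* out)
{
    unsigned int d1 = 0, d2 = 0, d3 = 0, pid = 0;
    unsigned int b[8] = { 0 };
    int consumed = 0;

    int fields = std::sscanf(text,
        "{%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x}.%u%n",
        &d1, &d2, &d3,
        &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &b[6], &b[7],
        &pid, &consumed);

    // All 12 values must be read and nothing may trail the property id.
    if (fields != 12 || text[consumed] != '\0')
        return false;

    out->fmtid.data1 = d1;
    out->fmtid.data2 = static_cast<std::uint16_t>(d2);
    out->fmtid.data3 = static_cast<std::uint16_t>(d3);
    for (int i = 0; i < 8; ++i)
        out->fmtid.data4[i] = static_cast<std::uint8_t>(b[i]);
    out->pid = pid;
    return true;
}
```

Fed the key from this post, the parser yields the same numbers that appear in the GUID initializer in the loop that follows: 0xB3F8FA53, 0x0004, 0x438E, the eight bytes 90 03 51 A4 6E 13 9B FC, and a property ID of 6.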

//Find the device in the collection that represents the CS50/CS60-USB by going through the collection one device at a time
UINT devCount = 0;
devCollectionPtr->GetCount(&devCount);
for(UINT i = 0; i < devCount; i++)
{
    //Obtain a device pointer
    ATL::CComPtr<IMMDevice> devicePtr;
    devCollectionPtr->Item(i, &devicePtr);

    //open the device's property store
    ATL::CComPtr<IPropertyStore> propertyStorePtr;
    devicePtr->OpenPropertyStore(STGM_READ, &propertyStorePtr);

    //check the property store for the device name
    GUID MixerNamePropertyGUID = { 0xB3F8FA53, 0x0004, 0x438E, { 0x90, 0x03, 0x51, 0xA4, 0x6E, 0x13, 0x9B, 0xFC } };
    DWORD MixerNamePropertyID = 6;
    PROPERTYKEY mixerNamePropKey = {MixerNamePropertyGUID, MixerNamePropertyID};
    PROPVARIANT mixerNamePropVar;
    PropVariantInit(&mixerNamePropVar);
    WCHAR* mixerNamePropVarStr = NULL;
    propertyStorePtr->GetValue(mixerNamePropKey, &mixerNamePropVar);
    PropVariantToStringAlloc(mixerNamePropVar, &mixerNamePropVarStr);

    //the string is a separate copy of the name, so the PROPVARIANT can be released now
    PropVariantClear(&mixerNamePropVar);

    //if the device's property store says it is the CS50/CS60-USB, toggle its mute setting
    //(the NULL check covers a device whose name could not be read)
    if(mixerNamePropVarStr != NULL && 0 == wcscmp(mixerNamePropVarStr, L"CS50/CS60-USB Headset"))
    {
        //use the device pointer to get an audio endpoint volume pointer
        ATL::CComPtr<IAudioEndpointVolume> audioEndpointVolumePtr;
        devicePtr->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, NULL, (void**)&audioEndpointVolumePtr);

        //get the current mute setting and invert it.
        BOOL muteSetting = FALSE;
        audioEndpointVolumePtr->GetMute(&muteSetting);
        muteSetting = !muteSetting;
        audioEndpointVolumePtr->SetMute(muteSetting, NULL);
    }

    //free the storage that contains the device name that was allocated by PropVariantToStringAlloc
    CoTaskMemFree(mixerNamePropVarStr);
}

Starting from the top, there are a few things to note. First we get an IMMDevice with IMMDeviceCollection::Item(). We then get an IPropertyStore with IMMDevice::OpenPropertyStore(). We then build a property key, which we pass into IPropertyStore::GetValue(). We turn the resulting value into a string we can work with using ::PropVariantToStringAlloc(). It is important to note that this allocates memory that needs to be cleaned up with CoTaskMemFree(). Once we find a matching name string we get an IAudioEndpointVolume with IMMDevice::Activate(). We finish up with IAudioEndpointVolume::GetMute() and IAudioEndpointVolume::SetMute().

That's a lot to take in all at once. The quick summary of the basic flow is IMMDeviceEnumerator to IMMDeviceCollection to IMMDevice to IAudioEndpointVolume, with a brief digression into IPropertyStore to make sure we have the right device.
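For comparison, here is a rough sketch of the simpler route mentioned earlier, where any default capture device will do and IMMDeviceEnumerator::GetDefaultAudioEndpoint() replaces the whole property store digression. ToggleMute, FakeEndpointVolume, and ToggleDefaultCaptureMute are hypothetical names of mine. The toggle step is a template so it can be tried against a stand-in object on a machine with no headset. The Windows part sits behind #ifdef _WIN32 and assumes COM is already initialized, the OS is Vista or later, and the eConsole role is the one you want.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#include <atlbase.h>
#include <Mmdeviceapi.h>
#include <Endpointvolume.h>
#endif

// Reads the mute state and writes back its inverse, checking both calls.
// Works with IAudioEndpointVolume or anything exposing the same two methods.
// Returns a COM-style status code: a negative value means failure.
template <typename VolumeT>
long ToggleMute(VolumeT* volume)
{
    int muted = 0;                         // BOOL is a typedef for int
    long hr = volume->GetMute(&muted);     // HRESULT is a typedef for long
    if (hr < 0)
        return hr;
    return volume->SetMute(!muted, NULL);  // NULL: no event-context GUID
}

// Hypothetical stand-in for a headset, used only to exercise ToggleMute.
struct FakeEndpointVolume {
    int muted;
    long GetMute(int* out) { *out = muted; return 0; }
    long SetMute(int value, const void* /*eventContext*/) { muted = value; return 0; }
};

#ifdef _WIN32
// Toggles mute on the default capture endpoint.
// Assumes COM is already initialized on the calling thread.
HRESULT ToggleDefaultCaptureMute()
{
    ATL::CComPtr<IMMDeviceEnumerator> enumerator;
    HRESULT hr = enumerator.CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL);
    if (FAILED(hr))
        return hr;

    // The role is an assumption; eCommunications may suit a headset better.
    ATL::CComPtr<IMMDevice> device;
    hr = enumerator->GetDefaultAudioEndpoint(eCapture, eConsole, &device);
    if (FAILED(hr))
        return hr;

    ATL::CComPtr<IAudioEndpointVolume> volume;
    hr = device->Activate(__uuidof(IAudioEndpointVolume), CLSCTX_ALL, NULL, (void**)&volume);
    if (FAILED(hr))
        return hr;

    return ToggleMute(volume.p);
}
#endif
```

Unlike the snippets above, this version checks every return code, which also covers pre-Vista systems where creating the enumerator simply fails.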