Microsoft is backing away from its public support for some AI-powered features, including facial recognition, and acknowledging the discrimination and accuracy problems these offerings create. But the company had years to fix the issues and didn't. That's akin to a car manufacturer recalling a vehicle instead of repairing it.
Despite concerns that facial recognition technology may be discriminatory, the real problem is that the results are inaccurate. (The discriminatory argument comes into play, however, because of the assumptions Microsoft developers made when creating these apps.)
Let's start with what Microsoft did and said. Sarah Bird, principal group product manager for Azure AI at Microsoft, summed up the changes in a company blog post last month:
"Effective today (June 21), new customers must apply for access to use facial recognition operations in Azure Face API, Computer Vision, and Video Indexer. Existing customers have one year to apply and receive approval for continued access to the facial recognition services based on their provided use cases. By introducing Limited Access, we are adding an additional layer of scrutiny to the use and deployment of facial recognition to ensure that use of these services aligns with Microsoft's Responsible AI Standard and contributes to high-value end-user and societal benefit. This includes introducing use case and customer eligibility requirements to gain access to these services.
“Face detection capabilities — including detecting blur, exposure, glasses, head pose, landmarks, noise, occlusion, and facial bounding box — will remain generally available and require no application.”
Look at the sentence where Bird highlights the extra hoop customers must now jump through "to ensure that use of these services aligns with Microsoft's Responsible AI Standard and contributes to high-value end-user and societal benefit."
That certainly sounds nice, but is that really what this change does? Or is Microsoft simply relying on it as a way to keep people from using the technology in the situations where its inaccuracies are greatest?
One of the situations Microsoft examined involves speech recognition. A study found that "speech-to-text technology across the tech sector produced error rates for members of some Black and African American communities that were nearly double those for white users," said Natasha Crampton, Microsoft's Chief Responsible AI Officer. "We stepped back, considered the study's findings, and learned that our pre-release testing had not satisfactorily accounted for the rich diversity of speech across people with different backgrounds and from different regions."
Another problem Microsoft identified is that people of all backgrounds tend to speak differently in formal versus informal settings. Really? Didn't the developers know that before? I'll bet they did, but didn't think through the implications of doing nothing about it.
One way to approach this is to re-examine the data collection process. By nature, people being recorded for voice analysis will be a little nervous and are likely to speak stiffly and formally. One fix is to hold much longer recording sessions in as relaxed an environment as possible. After a few hours, some people may forget they are being recorded and settle into their casual speaking patterns.
I've seen this play out in how people interact with speech recognition. At first, they speak slowly and tend to over-enunciate. Over time, they gradually slip into what I call "Star Trek" mode and speak as they would to another person.
A similar problem was discovered in emotion detection attempts.
More from Bird: "In another change, we will retire facial analysis capabilities that purport to infer emotional states and identity attributes such as gender, age, smile, facial hair, hair, and makeup. We collaborated with internal and external researchers to understand the limitations and potential benefits of this technology and navigate the tradeoffs. In the case of emotion classification specifically, these efforts raised important questions about privacy, the lack of consensus on a definition of emotions, and the inability to generalize the linkage between facial expression and emotional state across use cases, regions, and demographics. API access to capabilities that predict sensitive attributes also opens up a wide range of ways they can be misused, including subjecting people to stereotyping, discrimination, or unfair denial of services. To mitigate these risks, we have opted to not support a general-purpose system in the Face API that purports to infer emotional states, gender, age, smile, facial hair, hair, and makeup. Detection of these attributes will no longer be available to new customers beginning June 21, 2022, and existing customers have until June 30, 2023 to discontinue use of these attributes before they are retired."
When it comes to emotion detection, facial analysis has historically proven far less accurate than even simple voice analysis. Voice-based emotion recognition has proven quite effective in call center applications, where a customer who sounds very angry can be immediately routed to a senior supervisor.
To some extent, that supports Microsoft's point that what matters is how the data is used. In that call center scenario, if the software is wrong and the customer is not in fact angry, no harm is done; the supervisor simply completes the call normally. (Note: the most common voice emotion detection error I've seen is when the customer is angry at the phone tree and its inability to understand simple sentences, and the software concludes the customer is mad at the company. A reasonable mistake.)
But again, if the software is wrong, no harm is done.
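To make that concrete, here is a minimal sketch of the kind of low-stakes routing decision described above. Everything in it (the score_anger() scoring function, the 0.8 threshold, and the queue names) is hypothetical and purely illustrative, not any vendor's actual API:

```python
# Hypothetical sketch: routing a call based on a voice-emotion score.
# score_anger() and the threshold are placeholders for illustration only.

def route_call(call_audio, score_anger, escalation_threshold=0.8):
    """Send the call to a senior supervisor if the caller sounds angry."""
    anger_score = score_anger(call_audio)  # e.g., 0.0 (calm) to 1.0 (furious)
    if anger_score >= escalation_threshold:
        return "senior_supervisor"   # false positive: supervisor just handles a normal call
    return "standard_queue"          # false negative: the call proceeds as usual
```

Either way the model errs, the worst case is a routine call landing on the wrong desk, which is exactly why this is a low-risk use of an imperfect classifier.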
Bird made a good point that some use cases can still rely on these AI functions responsibly. "Azure Cognitive Services customers can now take advantage of the open-source Fairlearn package and Microsoft's Fairness Dashboard to measure the fairness of Microsoft's facial verification algorithms on their own data, allowing them to identify and address potential fairness issues that could affect different demographic groups before they deploy their technology."
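Fairlearn is indeed open source, and the kind of disaggregated evaluation Bird describes can be sketched in a few lines. The labels and group names below are made up for illustration, and this is a generic Fairlearn example, not Microsoft's Fairness Dashboard itself:

```python
# Minimal sketch: compare verification error rates across demographic groups
# with the open-source Fairlearn package. Data here is fabricated for illustration.
from fairlearn.metrics import MetricFrame, false_negative_rate
from sklearn.metrics import accuracy_score

y_true = [1, 1, 0, 1, 0, 1, 1, 0]                  # ground truth: same person or not
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]                  # what the verification model said
groups = ["A", "A", "A", "B", "B", "B", "B", "A"]  # demographic group for each sample

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "false_negative_rate": false_negative_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=groups,
)
print(mf.by_group)      # metric values broken out per group
print(mf.difference())  # largest gap between groups, per metric
```

The point of this kind of report is simply to surface gaps between groups before deployment; it does not, by itself, fix them.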
Bird also said technical issues contributed to some of the inaccuracies. "In working with customers using our Face service, we also realized some errors that were originally attributed to fairness issues were caused by poor image quality. If the image someone submits is too dark or blurry, the model may not be able to match it correctly. We acknowledge that this poor image quality can be unfairly concentrated among demographic groups."
Among demographic groups? Isn't that everyone, since everyone belongs to some demographic group? It sounds like a coy way of saying that matches for non-white people can be less accurate. This is why law enforcement's use of these tools is so problematic. An important question for IT to ask: what are the consequences if the software is wrong? Is the software one of 50 tools being used, or is it the only thing being relied on?
Microsoft said it is working to fix that problem with a new tool. "That is why Microsoft is offering customers a new Recognition Quality API that flags problems with lighting, blur, occlusions, or head angle in images submitted for facial verification," Bird said. "Microsoft also offers a reference app that provides real-time suggestions to help users capture higher-quality images that are more likely to yield accurate results."
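Microsoft's actual API aside, the idea of screening out dark or blurry images before submission can be approximated client-side. Here is a rough sketch using OpenCV; the brightness and sharpness thresholds are arbitrary placeholders rather than values from any Microsoft documentation, and submit_for_verification() is a hypothetical downstream call:

```python
# Rough client-side approximation of a pre-submission quality check:
# flag images that look too dark or too blurry before sending them off.
import cv2

def image_quality_issues(path, min_brightness=60, min_sharpness=100.0):
    """Return a list of quality problems found in the image at `path`."""
    issues = []
    img = cv2.imread(path)
    if img is None:
        return ["could not read image"]
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if gray.mean() < min_brightness:            # low average pixel intensity: too dark
        issues.append("too dark")
    if cv2.Laplacian(gray, cv2.CV_64F).var() < min_sharpness:  # low edge variance: blurry
        issues.append("too blurry")
    return issues

# Example usage: only submit the image for verification if no issues are found.
# problems = image_quality_issues("face.jpg")
# if not problems:
#     submit_for_verification("face.jpg")       # hypothetical downstream call
```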
In a New York Times interview, Crampton pointed to another problem: the system's gender classification was binary, "and that's not consistent with our values."
In short, the system thinks only in terms of male and female, so it can't readily label people who identify in other ways. In this case, Microsoft simply chose to stop trying to guess gender, which is probably the right call.
Copyright © 2022 IDG Communications, Inc.