Professors working to address inclusivity in automatic speech recognition systems

French & Italian professor Zsuzsanna Fagyal and electrical & computer engineering professor Mark Hasegawa-Johnson, both affiliate faculty in the Department of Linguistics, have been awarded an $800,000 grant to help make automatic speech recognition systems more inclusive.

Fagyal and Hasegawa-Johnson received the NSF Program on Fairness in Artificial Intelligence in Collaboration with Amazon grant from the National Science Foundation and Amazon.

In collaboration with a team of three scientists from Johns Hopkins University, they will work on developing a new model for the evaluation and training of inclusive automatic speech recognition systems. The goal is to better recognize regional and ethnic varieties of spoken American English, as well as the speech of second-language learners and people with disabilities.

One of the reasons Fagyal became interested in this research project is that it opens new horizons for sociophoneticians like her, who study the sources of socially meaningful variation in speech.

“There is a long tradition in sociophonetics of studying the ways in which our pronunciation can convey information about identities, among them age, gender, dialect, race, and ethnicity, but we have only recently started testing some of our hypotheses in very large corpora using cutting-edge computational tools,” Fagyal said. “One of the corpora used in this project, for instance, will be based on a 100,000-podcast spoken English database released by Spotify for academic research; this is definitely the largest sample size that I have ever worked with.”

The most important motivation for her involvement, though, is the opportunity to contribute to inclusivity and fairness in her field.

“Type into a search engine the keywords ‘technology’ and ‘discrimination,’ and you don’t have to scroll down too far before you see articles on how emerging technologies run the risk of perpetuating intolerance and social injustice,” Fagyal said. “I feel that it is time to take these challenges seriously. Linguists and social scientists should team up with colleagues in AI to help design and train automatic speech recognition systems that will have much lower error rates than they have today in recognizing the speech of black speakers, women, non-native speakers, and many other groups.”

The results of this work will be disseminated via publications and presentations and incorporated into communicative, learning, and assistive technologies.

“In addition to improving practical applications in speech technologies and, hopefully, seeing fewer court cases on bias in access to the best jobs and schools, I hope that projects like this can also contribute to changing our own mentalities over time,” Fagyal said. “Just like our intelligent machines, we need to get rid of our stereotypes of ‘good English,’ ‘good French,’ good… any languages that are nothing but harmful stereotypes.”

Dania De La Hoya Rojas