Earlier this month, MIT’s Center for Brains, Minds and Machines announced it had compiled its first database of written English composed entirely by non-native speakers. The aim was to create a richer context for machine learning, given that a huge proportion of Internet users who speak English do so as a second language.
But it’s not just written English that poses an obstacle for artificial intelligence; virtual assistants like Siri still struggle with a wide range of spoken accents.
Irish and Scottish accents have long been a source of bafflement to virtual assistants, a fact that is well documented online, from the “15 times Siri hated Irish people” to “Siri vs Scottish accent”. Meanwhile, in the States, Texan writer Julia Reed has commented that “a smart person could make a lot of money by inventing a Siri for Southerners.”
The New Queen’s English
The speech recognition market was valued at $3.73 billion in 2015, and is expected to grow to $9.97 billion by 2022, according to a Markets & Markets report published in June this year.
“Speech technologies have proven so useful and successful at powering intelligent applications,” says Marsal Gavalda, Chief of Machine Learning at messaging app Yik Yak. “At the same time, we need to be cognisant they don’t work so well for everyone… We need to prevent a ‘speech divide’: a class of people for whom speech technologies work well and another for whom they don’t. You’re putting those people at a disadvantage.”
This speech divide has led to a bizarre social phenomenon, wherein smartphone users moderate their own voices, or even attempt to imitate an American accent, in order to get their virtual assistant to grasp what they are saying.
“Most people have what we would call a telephone voice… They also have a machine voice,” says Alan Black, a Professor at Carnegie Mellon’s Language Technologies Institute (and Scotsman). “People speak to machines differently than how they speak to people,” he adds. “They move into a different register. If you’re standing next to somebody in an airport or at a bus stop or something, you can typically tell when they’re talking to a machine rather than talking to a person.”
In short: users are adapting to their devices, rather than the other way around.
But could this lead to the slow death of regional accents, as the English language’s rich global tapestry shifts and morphs into a single, generic, robot-friendly pronunciation? When accent and dialect are so intrinsically tied to class, heritage, and countless other factors which make up our personal identities, is the “machine voice” a seemingly innocuous first step towards the homogenisation of society and culture?
Tech companies are working hard to ensure that this doesn’t happen. Last year, the iOS 8.3 upgrade equipped Siri with a greater ability to understand users with Indian accents. Earlier this year, Google announced that it “speaks ‘Strayan,” and there are now rumours that the company is seeking people with strong Scottish accents to help develop its speech recognition technologies, most likely in response to years of customer frustration.
Artificial Intelligence Can Be Biased
Siri isn’t a snob. There is a very simple reason why virtual assistants and other forms of artificial intelligence seem hard of hearing when it comes to accent and intonation.
The more audio data an AI has access to, the better it gets at comprehending and responding to requests. But when its training data doesn’t include a diverse range of speakers, it is hardly surprising that it struggles to interact as effectively with different groups of users.
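This kind of gap only shows up if you look for it. A minimal sketch of the idea, using entirely invented toy transcripts: score a recogniser’s word error rate (WER) separately for each speaker group rather than in aggregate, and a system that looks fine on average can turn out to fail badly for one group.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Invented results: (speaker group, what was said, what the system heard).
results = [
    ("group_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("group_a", "call my brother", "call my brother"),
    ("group_b", "set a timer for ten minutes", "set a time for tin minutes"),
    ("group_b", "call my brother", "call my bother"),
]

by_group = {}
for group, ref, hyp in results:
    by_group.setdefault(group, []).append(wer(ref, hyp))

for group, scores in sorted(by_group.items()):
    print(group, round(sum(scores) / len(scores), 2))
# group_a hears a perfect 0.0; group_b sees a third of its words garbled.
```

An aggregate score over all four utterances would hide that split entirely, which is exactly the trap the researchers quoted below describe.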
“Automated technology is developed by humans, so our human biases can seep into the software and tools we are creating,” writes The Daily Dot’s Selena Larson. “But when systems fail to account for human bias, the results can be unfair and potentially harmful to groups underrepresented in the field in which these systems are built.”
One recent, pertinent example is the alleged gender bias of Google’s voice recognition software. According to linguistic researcher Rachael Tatman, queries from male voices were more consistently understood than those from women.
“Generally, the people who are doing the training aren’t the people whose voices are in the dataset,” says Tatman. “I think the people who don’t have socio-linguistic knowledge haven’t thought that the demographic of people speaking would have an effect.”
In order to overcome some of the existing hurdles in speech recognition, Tatman recommends that technology companies take greater care to assemble training sets which are diverse in race, gender and class.
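One simple way to act on that recommendation, sketched here with invented clip IDs and accent labels, is to rebalance a skewed training corpus so that every speaker group contributes the same number of clips before training begins:

```python
import random
from collections import defaultdict

def balanced_sample(corpus, per_group, seed=0):
    """Draw the same number of clips from every group,
    so no single accent dominates the training set."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for clip_id, group in corpus:
        groups[group].append(clip_id)
    sample = []
    for group, clips in sorted(groups.items()):
        rng.shuffle(clips)
        sample.extend((c, group) for c in clips[:per_group])
    return sample

# Skewed toy corpus: five times more clips of one accent than the other.
corpus = ([(f"a{i}", "accent_a") for i in range(200)]
          + [(f"b{i}", "accent_b") for i in range(40)])

balanced = balanced_sample(corpus, per_group=40)
counts = defaultdict(int)
for _, group in balanced:
    counts[group] += 1
print(dict(counts))  # each accent now contributes 40 clips
```

Balancing by undersampling is the bluntest instrument available; in practice teams would sooner collect more recordings from underrepresented speakers than throw data away, but the principle, that the mix of voices in the training set is a deliberate choice, is the same.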
Siri, I hope you’re listening!