Developing for Amazon’s Alexa means thinking differently about how you approach some common engineering challenges. In this post, The Voyage Team’s Nik Reynolds looks at lessons learned from the recent development of a new Amazon Alexa Skill.
It can be easy to forget just how long Amazon Alexa’s been with us, particularly for those of us based in the UK. While Amazon didn’t release the Echo in Britain until September 2016, Alexa will actually celebrate her fourth birthday in November this year, having launched in the US all the way back in 2014.
During that time, the world at large has started to become a lot more comfortable with the concept of voice UI. While there will always be people who can’t get past the idea that a computer is listening to them talk, plenty more have discovered that the ability to do everything from playing your favourite songs to checking the weather without pressing a button is pretty handy.
As with all software development though, creating apps for a voice UI has its challenges. In this case, it’s the fact that human language can be difficult to understand, particularly when you ask a machine to interpret it. As people, we rely on things like tone, nuance and gesticulation to get our point across, all things that the vast majority of computers can’t grasp (not yet, at least).
That’s something we had to take into account when we developed a new Amazon Alexa Skill for Heathrow airport recently. Creating a Skill that allows users to check the status of any flight departing from or arriving at Heathrow, we quickly found that we needed to focus very specifically on three things: characters, numbers, and the incredible number of airlines that use Heathrow as a destination.
Two of these – characters and numbers – present a specific issue because the current version of Alexa is designed primarily to listen for user instructions in the form of full sentences: “Alexa, play the top forty songs from 1975”, for instance, or “Alexa, tell me what the Premier League results were last Saturday”.
When you ask something like “Alexa, tell me about flight BA105”, the “BA105” element can quite easily be misheard. Because the device isn’t expecting to be asked about a series of characters and numbers, it will sometimes try to interpret it as a word instead. When you add accents into the mix, the problem gets even bigger.
To address that, we had to look at the many potential ways that Amazon Alexa was interpreting each character or number. Helpfully, the Alexa development portal provides a readout of what the device hears when you ask it a question. That allowed us to look at what a user was asking and what the device was hearing and – as a result – create a much closer correlation between the two. If a user says “one” and Alexa hears “won”, for instance, we taught it that in the context of flight information, it needed to use “one” instead.
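To make the idea concrete, here is a minimal sketch in Python of that kind of correction. It is not the code behind the Heathrow Skill: the lookup tables, the function name `normalise_flight_code` and the sample inputs are all assumptions made for illustration, and a real Skill would build its tables from what the development portal shows Alexa actually hearing.

```python
# Hypothetical lookup tables (assumptions for illustration only).
# Each key is something the device might "hear"; each value is what
# the user most likely meant in the context of a flight number.
DIGIT_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4",
    "five": "5",
    "six": "6",
    "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
}
LETTER_WORDS = {"bee": "B", "be": "B", "ay": "A", "see": "C", "sea": "C"}


def normalise_flight_code(heard: str) -> str:
    """Turn a heard phrase such as 'b. a. won oh five' into 'BA105'."""
    parts = []
    # Lower-case the text, treat full stops as spaces, then split into tokens.
    for token in heard.lower().replace(".", " ").split():
        if token in DIGIT_WORDS:
            parts.append(DIGIT_WORDS[token])
        elif token in LETTER_WORDS:
            parts.append(LETTER_WORDS[token])
        elif token.isalnum():
            # Already a plain letter or number ("b", "ba", "105"): keep it.
            parts.append(token.upper())
        # Anything else (stray punctuation) is dropped.
    return "".join(parts)


print(normalise_flight_code("b. a. won oh five"))
```

The important design choice is that the substitution only happens once we know the user is asking about a flight. Replacing every “won” with “one” across all requests would break ordinary sentences, so a mapping like this belongs inside the flight-number handling rather than in front of everything the Skill hears.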
That helped to solve two of the issues we experienced, but for the third, airline names, we simply had to teach Amazon Alexa what they were. That meant creating a database of every airline that flies from Heathrow, including the variations we could expect from speakers whose first language isn’t English, and then feeding that data into the Skill. Much of that revolved around providing the device with phonetic breakdowns to ensure that no matter how an airline name was pronounced, the device had a strong chance of recognising it.
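As a rough illustration of the airline lookup, the sketch below maps several heard variants onto one canonical airline name and falls back to fuzzy matching from Python’s standard `difflib` module when there is no exact match. The variant spellings, the `resolve_airline` name and the 0.75 cutoff are assumptions for the example, not the data or approach used in the Skill itself, and plain string matching is a stand-in here for the phonetic breakdowns described above.

```python
import difflib

# Hypothetical sample data: canonical airline name -> variants the device
# might report hearing. A real table would cover every airline at the airport.
AIRLINE_VARIANTS = {
    "British Airways": ["british airways", "b a", "british air ways"],
    "Lufthansa": ["lufthansa", "looft hansa", "luft hanser"],
    "Aer Lingus": ["aer lingus", "air lingus", "air lingers"],
}

# Invert the table so each variant points straight at its canonical name.
_VARIANT_TO_AIRLINE = {
    variant: name
    for name, variants in AIRLINE_VARIANTS.items()
    for variant in variants
}


def resolve_airline(heard, cutoff=0.75):
    """Return the canonical airline name for a heard phrase, or None."""
    # Normalise case and collapse repeated whitespace.
    key = " ".join(heard.lower().split())
    if key in _VARIANT_TO_AIRLINE:
        return _VARIANT_TO_AIRLINE[key]
    # No exact match: take the closest known variant, if it is close enough.
    close = difflib.get_close_matches(
        key, list(_VARIANT_TO_AIRLINE), n=1, cutoff=cutoff
    )
    return _VARIANT_TO_AIRLINE[close[0]] if close else None


print(resolve_airline("Air Lingus"))
print(resolve_airline("lufthanza"))
```

Returning `None` when nothing is close enough matters as much as the matching itself: it gives the Skill the chance to ask the user to repeat the airline rather than confidently reporting on the wrong one.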
All of this would have been for naught if we hadn’t also built a good amount of “real life” testing into the development. If you want to create a great Amazon Alexa Skill, you need to be sure that it works with real human voices with many different accents. While it might be faster and simpler to test it with an English or robotic voice alone, there’s only one way to be sure that it’s going to work when it hits the Amazon storefront – and that’s by testing, testing and testing with users from around the world.