07/02/2012

Does the speech-enabled iPod shuffle hint at a voice-enabled next iPhone?

Apple has released a new iPod shuffle. Apart from its design, its most remarkable feature is called VoiceOver. I don’t own one yet, but I assume it’s a built-in text-to-speech engine that reads out artist names, song titles and the names of your playlists. Controlling the iPod shuffle still comes down to pressing tiny buttons, so it’s a one-way speech interface that merely provides feedback to the user.

I wonder whether this move indicates that we might see a “fully” voice-enabled iPhone in the next release this summer.

In any case, I’d like to get your opinion on the strategic role of speech as a first-class UI technology for smartphones.

The role of speech in ultra-smart smartphones?

I’d like to hear what you think, so please do use the comment feature of my blog to respond!

Nobody doubts that the landing of the iPhone on our planet has been something of a game changer, at least for the smartphone business. A multitude of devices are already following the path Apple has paved and, at least according to my analysis, are putting the mass-market consumer at the centre of all design considerations. This will eventually lead to real, large-scale behavioral change and give birth to a mobile application era with real-time services at its centre.

I’ve recently talked to folks at Nuance. Nuance is the unrivaled market leader in the speech recognition industry. They’ve literally acquired each and every competitor and, as far as I know, hold a market share of more than 80%. In my experience, their technology is rock solid and by far the best speech recognition and natural language understanding one can get. Nuance also owns T9 and SNAPin, acquired technologies that put Nuance software pre-installed on many, many handsets across the globe.

One of the fundamental beliefs of Steve Chambers (President of their Mobile & Enterprise Division) is that speech will be the mobile UI of the future. Of course, speech is the most natural way for human beings to communicate, at least with other human beings. But it has not really gained momentum when it comes to controlling computers. I do understand the use cases addressed by tools like Dragon NaturallySpeaking, but these are isolated solutions rather than proof that speech is the ultimate user interface of the future.

I also understand use cases like voice-based dialing: when you’re driving your car, you don’t want to look up contacts on your device. I also like the idea of keyword-based, voice-enabled search. These use cases already exist. (You can find more of them here.)

But what role will speech recognition play in the upcoming generation of ultra-smart smartphones?

Will consumers prefer the visual/gesture/touch-based approach that gave the iPhone such a tremendous breakthrough? Or will consumers ultimately want to control their handsets via natural-language voice commands?

I’m trying to get an idea of the strategic role speech might play as a core mobile UI technology of the future.

What do you think?