07/02/2012

The role of speech on ultra-smart smartphones?

I’d like to get your opinion, so please do use the comment feature of my blog to respond!

Nobody doubts that the landing of the iPhone on our planet serves as kind of a game changer for at least the smartphone business. A multitude of devices are already following the way Apple has paved and – at least that’s my analysis – are putting the mass market consumer in the centre of all design considerations. This will eventually lead to real behavioral change (large scale) and give birth to a mobile application era with real-time services at its centre.

I’ve recently talked to folks at Nuance. Nuance is the unrivaled market leader in the speech recognition industry. They’ve literally acquired each and every competitor and, as far as I know, hold an 80%+ market share. In my experience their technology is rock solid and by far the best speech recognition and natural language understanding one can get. Besides this, Nuance owns T9 and SNAPin, acquired technologies that put Nuance software pre-installed on many, many handsets across the globe.

One of the fundamental beliefs Steve Chambers (their Mobile & Enterprise Division President) holds is that speech will be the future mobile UI. Of course speech is the most natural way for human beings to communicate – to other human beings. It has not really gained momentum when it comes to controlling computers. Again, I do understand the use cases addressed by tools like Dragon NaturallySpeaking, but these are isolated solutions and not so much a proof point for speech as the ultimate future user interface.

I also understand use cases like voice-based dialing: when you’re driving your car, you don’t want to look up contacts on your device. I also like the idea of keyword-based, voice-enabled search. These use cases already exist. (You find more of those here.)

But: What role will speech recognition play with respect to the upcoming generation of ultra-smart smartphones?

Will consumers prefer the visual/gesture/touch based approach which gave the iPhone such a tremendous breakthrough? Or will consumers ultimately want to control their handsets via natural language voice commands?

I’m trying to get an idea of the strategic role speech might play as a future, core mobile UI technology.

What do you think?

Comments

  1. Juan Mateu says:

    Ralf, if the internet trend is making devices simpler and less capable, relying on common interfaces (the browser), avoiding applications, and relying on the cloud, I think the mobile internet and mobile services may follow the same path.

    We are working on using mobile enablers combined (voice call or VoIP call + Browser) to do the same:
    http://www.youtube.com/view_play_list?p=B08A200903FCD027

    Obviously, on-handset ASR could be OK for handset functions (address book, menus, etc.), but for full interaction with browsing, which needs almost “natural language speech recognition”, I think it is better to use the full ASR capability remotely.

  2. Tom Millar says:

    Hi Ralf,

    A bit light-hearted, but maybe we are looking at the Star Trek effect here. But more seriously, I have just bought a new Nokia E71, which has a voice recognition application: you say the name of someone in your contact list and the phone automatically calls their number. I think it is a great feature of the phone and will be very useful when driving and operating ‘hands free’.

    For the future, I guess that a well developed voice recognition UI for high end phones would be great and of high value to users.

    Kind Regards

    Tom

  3. Dion Lisle says:

    I agree with Tom – voice recognition for looking up and dialing would be great. I have downloaded a couple for the iPhone, but so far they are disappointing. I am sure there are other applications as well. I would love to say an address into the phone and have a map come up with directions that are read out loud as turn-by-turn instructions. Maybe also the ability to dictate text into the phone and have it sent as SMS messages. Any apps for the iPhone that do this yet?

  4. Bernd Wiegmann says:

    Voice recognition is already there in many phones, and it actually worked for 70% of the contacts I tried, so if it really did work perfectly it would be very useful in the car. There is only the slight side effect that you have to press a certain button to start voice recognition.
    If you think of a crowded train with a lot of people yelling at their phones trying to dictate an SMS, you might reconsider the value of this technology. If it really worked like in Star Trek it would be great, but I’ve seen promises for years now that a good speech recognition solution is just around the corner, and so far I’m not convinced.

  5. So far, so good. Seems as if use cases like voice activated dialing and text-to-speech based listening to text messages make sense – at least in situations where users don’t want to focus (visually) on their devices (driving etc.).

    But how about the idea of having speech as the central future UI for smartphones? I’m asking because of some discussions I’ve had with people from Nuance. They tend to believe that speech is the ideal human-to-machine communication means. (They might be biased given that Nuance claims 80%+ of the ASR market share.)

    So what do you think? Besides these obvious use cases that already exist and work, will a full blown “speech UI” be what users want?

    Or will smartphone UIs evolve around other themes, e.g. touch and gestures?

  6. Paul Golding says:

    Speech recognition in The Cloud is far more important and I think there is a disruptive potential here for someone like Google to launch a speech services API, not just for search, but for any voice mash-up. I believe that this is Google’s intention with their various speech projects over and under the radar.

    http://googleblog.blogspot.com/2009/03/here-comes-google-voice.html

  7. Mark Tognetti says:

    Until recently I ran the applied R&D function for a fleet management company, where we did pretty extensive research into the “next generation” of mobile UIs and talked with our clients. I firmly believe that a voice UI (VUI) will be as important as the touch/gesture UIs. A VUI provides for hands-free usage – an important safety aspect when driving (or even walking through an airport). That said, a strong touch/gesture UI provides a degree of privacy and/or courtesy that most folks want when sitting on a plane, on a train or in a meeting.

    We looked at a number of products that incorporate speech recognition (TellMe, MS Auto, Jott Networks, SpinVox, etc.) and their underlying engines, including Nuance. Although some were impressive, none of them is truly good enough (yet). The most common areas requiring improvement were: 1) weak (or lacking) natural language processing and 2) poor recognition accuracy in noisy environments. Many of the vendors providing online services (including Nuance) use a mix of speech recognition software and human transcriptionists. With enough processing power and a repository of existing spoken words (perhaps in The Cloud), coupled with the right algorithms for phonetic and contextual matching, I know the technology will get there.

    Admittedly not a mobile application… I just saw a demo of Adobe Soundbooth which includes an audio transcription feature based on technology from Autonomy (I think) and it was pretty impressive.

  8. AJ says:

    I believe Mark hits the nail on the head – voice is crucial when driving or perhaps operating machinery of some kind but privacy and courtesy are often overlooked. Personally I would never use a VUI if anybody could hear me.

  9. Ilieva Ageenko, PhD says:

    Speech recognition in mobile phones would be very helpful. So far the quality of the commercially available speech recognition features I have seen doesn’t meet consumer expectations. I agree with Mark: the most common areas requiring improvement are 1) weak (or lacking) natural language processing and 2) poor recognition accuracy in noisy environments.

  10. Steve Howard says:

    This is interesting. The conversation seems to be progressing as if speech functionality is somehow new in phones. We’ve had this functionality available to us for a long time (10 years? More?) but largely it has been used by only a small percentage of people. Granted the potential quality of speech recognition and TTS engines has imporved as devices have become more powerful, but I really don;t think we care to talk to computers i nthe way Star Trek had us believe we might.

    I think the reality is that, aside from hands-free dialing, most people don;t have much interesting in speaking to their phones – and that’s ignoring the barriers like noisy bar/restaurant/commute, the need for privacy, the self-consciousness to talking to your pone when alone, and indeed in public … :-)

    To be honest, I think we’ll skip the step of talking to our phones en mass and jump to mind-control as soon as it is reliable and accessible to all.

  11. Partha Srinivasan says:

    Mind control system... I like that idea, probably the cleanest way of getting things done (provided it ignores the hallucination part) :)

    While it’s true that voice recognition has existed in mobile phones for quite some time now, it hasn’t caught on with mainstream users, due to recognition quality (and the pain of corrections). Besides the current use cases of dialing and texting, I believe web search and speech-enabled navigation will soon catch on, as Google has already started providing voice search in its search application for mobiles. I am really impressed with their recognition quality, as it seems to work well even for non-native English speakers like me!

  12. Steve Howard says:

    Wow – my tpying (!) is usually ropey, but that was extra bad …

    I noticed a friend of mine is using speech recognition to send tweets to Twitter with his Windows Mobile phone.

    Also, Microsoft recently released a ‘prerelease’ of an app called Microsoft Recite http://recite.microsoft.com/Pages/index.aspx This cool app lets you record voice messages for yourself, then lets you search them on keywords using spoken search terms – e.g. every time your wife mentions something that you think she might like for her birthday, fire up Recite and speak something like:

    Birthday, flowers
    Birthday, diamonds
    Birthday, Bahamas

    Then, when it is time to buy a present, search your notes by speaking “Birthday”, and Recite will return all notes that contain the word Birthday. No need for training.

    I’ve tested this over the last few weeks and it is pretty effective. This isn’t actually text to speech, but I think it demonstrates one possible powerful use of mobile devices combined with speech.

    Here’s a review:- http://arstechnica.com/microsoft/news/2009/02/microsoft-recite-for-windows-mobile-previewed.ars
