Tutorial – Using the Android Speech API

In this blog we will teach you how to use voice input and output on Android devices.

There are two parts to this tutorial.

Speech Recognition and Text to Speech

The software in this tutorial makes use of the native speech recognition and text to speech capabilities of your handset, using the android.speech package.

  • Speech recognition (android.speech) requires API Level 3 or above
  • Text to speech (android.speech.tts) requires API Level 4 or above

You can find the Javadocs at http://developer.android.com/reference/android/speech/package-summary.html. Compared to reading all of the docs, this tutorial should greatly reduce the time it takes to build your first speech recognition application.

Create your application

This tutorial assumes that you have built an Android application before. In this tutorial we are going to create an application that performs speech recognition and then plays back the recognised words to the user using text to speech.

The first step in this tutorial is to create an application, in which the main activity is called MainActivity. Then create your layout so that it includes a Text View and a Button. If you have done this correctly you should have the following layout in the file res/layout/activity_main.xml.

Speech Recognition

Google's speech recognition is performed on the network, so if you are using the Google recognizer it will only work while the device has a network connection.

Speech recognition is achieved using the android.speech.RecognizerIntent class. When we want to invoke the speech recogniser we simply fire an Intent with the action RecognizerIntent.ACTION_RECOGNIZE_SPEECH, as shown in the following code.

private static final int REQUEST_CODE = 1234;

private void startVoiceRecognition() {
    talker.stop();
    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak Now...");
    startActivityForResult(intent, REQUEST_CODE);
}

Then we override the onActivityResult method to handle the result from the speech recognizer as follows.

@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    String default_phrase = "Default Phrase";
    // Fall back to the default phrase unless a result is found below.
    String heard = default_phrase;
    if (requestCode == REQUEST_CODE && resultCode == RESULT_OK && data != null) {
        // Extract the most likely spoken phrase.
        ArrayList<String> matches = data.getStringArrayListExtra(
                RecognizerIntent.EXTRA_RESULTS);
        if (matches != null && matches.size() > 0) {
            heard = matches.get(0);
        }
    }
    HandleVoiceInput(heard);
}

In the example above we extract the most likely spoken phrase and pass it to the method

HandleVoiceInput(String heard);
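The step of choosing a phrase from the recognizer's results can also be pulled out into a plain Java helper, which makes it easy to test without a device. The following is only a sketch: the class name BestMatch and the method pick() are hypothetical and not part of the Android SDK, and skipping blank candidates is an assumption rather than something the recognizer requires.

```java
import java.util.List;

// Hypothetical helper, not part of the Android SDK.
final class BestMatch {
    private BestMatch() {}

    /**
     * Returns the first non-blank candidate (trimmed), or the fallback
     * when the list is null, empty, or contains only blank entries.
     */
    static String pick(List<String> matches, String fallback) {
        if (matches == null) {
            return fallback;
        }
        for (String candidate : matches) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        return fallback;
    }
}
```

Inside onActivityResult this could be used as heard = BestMatch.pick(matches, default_phrase); which keeps the null and empty-list handling in one place.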

Text to Speech

Text to speech is even simpler, because the only callback involved is the onInit() method described below.

To add text to speech you simply need an instance of the class android.speech.tts.TextToSpeech in your main activity class.

Add the following lines to the header of your main activity class.

import android.speech.tts.TextToSpeech;
import android.speech.tts.TextToSpeech.OnInitListener;

Within the class definition create a member of this class, i.e.

TextToSpeech talker;

Your main activity class needs to implement the OnInitListener interface. You will also need to override the method:

public void onInit(int status)

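This method can be empty, although speak() only produces speech once onInit() has been called with the status TextToSpeech.SUCCESS. You will also need an instance of the class HashMap:

HashMap<String, String> ttsparams = new HashMap<String, String>();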

Override the onCreate() method and in it create the object

talker = new TextToSpeech(this, this);

Then, in the same method, set up the parameter hash map as follows:

ttsparams.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "stringId");

Now all that is required for text to speech is to call the speak() method as follows:

talker.speak(text2say, TextToSpeech.QUEUE_FLUSH, ttsparams);

Let’s put this inside a method as follows:

public void say(String text2say) {
    talker.speak(text2say, TextToSpeech.QUEUE_FLUSH, ttsparams);
}
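The TextToSpeech engine initialises asynchronously, so a phrase passed to say() before onInit() has reported success will not be spoken. One possible way around this, sketched below in plain Java, is to hold phrases back until the engine is ready. The Speaker interface and the PendingSpeech class are hypothetical names introduced only for this illustration; in the activity, the Speaker would simply wrap the talker.speak() call.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the engine; in the activity this would
// wrap talker.speak(text, TextToSpeech.QUEUE_FLUSH, ttsparams).
interface Speaker {
    void speak(String text);
}

// Hypothetical helper: buffers phrases until the engine is ready.
final class PendingSpeech {
    private final Speaker speaker;
    private final Queue<String> waiting = new ArrayDeque<String>();
    private boolean ready = false;

    PendingSpeech(Speaker speaker) {
        this.speaker = speaker;
    }

    /** Speaks immediately if the engine is ready, otherwise buffers. */
    void say(String text) {
        if (ready) {
            speaker.speak(text);
        } else {
            waiting.add(text);
        }
    }

    /** Call from onInit() when status == TextToSpeech.SUCCESS. */
    void onReady() {
        ready = true;
        // Replay buffered phrases in the order they were requested.
        while (!waiting.isEmpty()) {
            speaker.speak(waiting.remove());
        }
    }
}
```

Note that with TextToSpeech.QUEUE_FLUSH each replayed phrase replaces the previous one, so only the last buffered phrase would actually be heard; TextToSpeech.QUEUE_ADD would be the mode to use if every buffered phrase should be spoken.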

It is also a good idea to stop the text to speech when speech recognition begins, using the stop() method as follows:

talker.stop();

The last thing that you need to do is to shut down the TTS when the Activity is destroyed, as follows:

@Override
public void onDestroy() {
    // Shut down the TTS
    if (talker != null) {
        talker.stop();
        talker.shutdown();
    }
    super.onDestroy();
}

Putting it all together

If we put it all together we end up with something like the following code, which performs speech recognition and then plays back the recognised words to the user using text to speech.

import android.app.Activity;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.tts.TextToSpeech;
import android.speech.tts.TextToSpeech.OnInitListener;
import android.view.View;
import android.widget.Button;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class MainActivity extends Activity implements OnInitListener {

    private static final int REQUEST_CODE = 1234;

    TextToSpeech talker;
    HashMap<String, String> ttsparams = new HashMap<String, String>();

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        Button speakButton = (Button) findViewById(R.id.speakButton);

        // Disable the speech rec button if speech recognition
        // is not available
        PackageManager pm = getPackageManager();
        List<ResolveInfo> activities = pm.queryIntentActivities(
                new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
        if (activities.size() == 0) {
            speakButton.setEnabled(false);
            speakButton.setText("Recognizer not present");
        }

        // The opening prompt is spoken from onInit(), because the
        // engine cannot speak until it has finished initialising.
        talker = new TextToSpeech(this, this);
        ttsparams.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "stringId");
    }

    public void say(String text2say) {
        talker.speak(text2say, TextToSpeech.QUEUE_FLUSH, ttsparams);
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            say("Press the button below and speak.");
        }
    }

    /**
     * The method that is called when the "Click Me" button is pressed.
     */
    public void speakButtonClicked(View v) {
        talker.stop();
        startVoiceRecognition();
    }

    /**
     * Fire an intent to start the voice recognition activity.
     */
    private void startVoiceRecognition() {
        talker.stop();
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak Now...");
        startActivityForResult(intent, REQUEST_CODE);
    }

    /**
     * Handle the results from the voice recognition activity.
     */
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        String default_phrase = "Default Phrase";
        // Fall back to the default phrase unless a result is found below.
        String heard = default_phrase;
        if (requestCode == REQUEST_CODE && resultCode == RESULT_OK && data != null) {
            // Extract the most likely spoken phrase.
            ArrayList<String> matches = data.getStringArrayListExtra(
                    RecognizerIntent.EXTRA_RESULTS);
            if (matches != null && matches.size() > 0) {
                heard = matches.get(0);
            }
        }
        HandleVoiceInput(heard);
    }

    @Override
    public void onDestroy() {
        // Shut down the TTS
        if (talker != null) {
            talker.stop();
            talker.shutdown();
        }
        super.onDestroy();
    }

    public void HandleVoiceInput(String heard) {
        say("I thought you said, " + heard + ". Press the button below and speak.");
    }
}
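When nothing is recognised, the code above ends up saying "I thought you said, Default Phrase", which may not be what you want. As one possible refinement, the reply text could be built by a small plain Java helper that treats the default phrase as "nothing heard". This is only a sketch: the class name ReplyBuilder, the method replyFor() and the apology wording are assumptions made for illustration.

```java
// Hypothetical helper; the names and the apology wording are illustrative only.
final class ReplyBuilder {
    static final String PROMPT = "Press the button below and speak.";

    private ReplyBuilder() {}

    /**
     * Builds the sentence to speak back to the user. A null, blank or
     * default phrase is treated as "nothing was recognised".
     */
    static String replyFor(String heard, String defaultPhrase) {
        if (heard == null || heard.trim().isEmpty() || heard.equals(defaultPhrase)) {
            return "Sorry, I did not catch that. " + PROMPT;
        }
        return "I thought you said, " + heard.trim() + ". " + PROMPT;
    }
}
```

HandleVoiceInput could then call say(ReplyBuilder.replyFor(heard, "Default Phrase")); instead of building the string inline.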

Gazunti Animation

Herewith our latest animated video talking about the Gazunti technology and our business model. Enjoy!


Gazunti for Agriculture

There are amazing opportunities for the use of Gazunti in the agriculture sector that is looking to leverage technology to maximize production and decrease costs.

When working with the team from UCLA we found that the economic benefits to farmers of technology solutions that improve access to information are clear. Speaking with the management team of just one of the largest, most technology-rich operating farms in the state of California revealed just how big an impact technology can make on the agriculture industry. This large farm of over 25,000 acres saw an opportunity to build an in-house technology platform to better manage its substantial acreage, crop diversity and dynamic nutrient needs. The first implementation of this platform immediately resulted in an efficiency gain of 20%, even at the very basic level of operation. With every new implementation, the farm saved more labor time and other costs, improving the bottom line year after year. Currently, the operation is 90% more efficient than it was without the new uses of technology. When asked what size a farm would need to be to start seeing efficiency benefits from the technology, the response was an immediate…

“…one acre. Every farm at any size can implement technology and see a positive impact on efficiency, and they should.”

The use of technology can mean reducing fertilizer and pest control costs through better targeting and monitoring. It could mean saving precious irrigation water when a rare storm is predicted in the next few days, or saving gallons of expensive fuel by using more precise guidance systems. Large multinational companies have perhaps demonstrated to the greatest extent the potential of farm technology to create efficiencies. Their vertical integration models allow them to guide production from the farm, to the food processor, to the distributor. Armed with the data and technology to reduce costs at every stage of the process, they are able to increase profits, pass savings to customers, and drive competitiveness in certain retail markets.

What is Gazunti

Gazunti is Knowledge Navigation…we provide answers fast!

At Gazunti we build and enable conversational software and solutions. Solutions built using Gazunti allow end users to ask questions of very large volumes of data and get specific answers immediately.

Developers can use the Gazunti Development Framework (GazBuilder) in the cloud to develop their own solutions and the Gazunti team provides training, accreditation and support to the Gazunti developer community.
Applications built using GazBuilder are known as GazApps, and they are typically “knowledge navigator” or “virtual employee” solutions that enable end users to access enterprise or government information and services. A GazApp can be thought of as being like SIRI[1] for the enterprise.

Your wish is our command….

Out of the box, end users can engage with Gazunti solutions using multiple channels including voice, text, browser and smart device.

[1]SIRI is a trademark of Apple Inc.

The AI Personal Assistant

Personal assistant solutions improve customer service

(The content here has been derived from various articles and publications written by Dr. Bill Meisel, a world leader in the area of speech and mobility)

Dr. Bill Meisel, president of technology analysis firm TMA Associates, claimed that “the explosion of smartphones and tablet computers is indicative of a desire by individuals to always have with us the power of computers to provide us information and entertainment and to keep us connected to others”. “The maturing of speech recognition and natural language understanding technologies, symbolized by personal assistant technology, compounds this development by making what is now available easily usable. This combination is fundamental, and offers new business opportunities to entertain, inform, and reach customers that didn’t exist before.”

Gazunti delivers! Gazunti brings together advanced knowledge navigation technology (which allows users to interact in a flexible, natural-language manner with large volumes of data to collect information) and speech recognition (which allows that interaction to be conducted using voice – as well as text, browser, and smart device interfaces). Gazunti is the ideal platform on which to develop, deploy and run personal assistant applications for today’s market.

As Bill sees it, “this personal assistant model is a paradigm shift. It’s not about trying to inject human intelligence into a service. It’s about human-like services that harness computer intelligence to do our tasks for us — better and faster — by tapping into the strengths computing technology has (and always will) in memory and the ability to process a lot of information and do terrific searches.”

What’s most interesting about this development is the potential impact on enterprises as consumers expect a new standard of personal service.
Meisel also said in 2012 that, “companies should have a branded personal assistant app to engage their customers (http://www.meisel-on-mobile.com/2011/11/14/mobile-phone-marketing-and-customer-service/). A mobile phone app will eventually be as necessary for a company as a web site. Making that app voice-interactive will match the trend toward reducing the need to type on small devices, and take advantage of the “personal assistant” model exemplified by Apple’s SIRI. Such apps can serve multiple purposes in providing information, service, entertainment, marketing, and closing sales.”

A personal assistant application built using Gazunti has natural language capabilities that allow for a flexible interaction (simply ask for what you want with a clarifying interaction if necessary).
Bill says that the continuing explosion in the use of speech enabled personal assistants is the result of three key factors:
(1) Enthusiasm for a user interface innovation that is particularly effective on small devices such as a mobile phone or when hands-free use is safer and more convenient (e.g., when driving);
(2) The over-burdening of the Graphical User Interface by growing web and app variety (particularly evident on small devices); and
(3) The simplicity, efficiency, and generality of the personal assistant model (“just say or type what you want, and get directly to the answer”).

It is hard to identify when it became imperative for a company to have a web site. Universities launched the original World Wide Web so they could share research; the growth exceeded any early expectations. Technology moves more quickly than ever today. How soon will it be that a company must have a personal assistant app? The earliest and most creative examples will certainly get the attention innovators deserve.
Gazunti brings together years of experience in the fields of knowledge navigation, speech recognition and cloud services, the result of which is the ability to deliver personal assistant solutions in the cloud to the enterprise. Below is a high level depiction of how Gazunti works to deliver personal assistant solutions.

For more information about personal assistant applications or any other potential Gazunti solution please contact us.