Simple Text to Speech using Xamarin for Android #1

One of the most important parts of our caller-name-announcing app is letting the system read out the caller’s name. Luckily, Android has everything we need built in, so let’s have a look at how to use it:

The foundation for outputting spoken text is Android’s TextToSpeech class, which is also available via the Xamarin platform.

As always, let’s start by creating a class that can then be used from anywhere within the project:

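using Android.Content;
using Android.Speech.Tts;

// In Xamarin, a class that implements a Java interface must derive from Java.Lang.Object.
public class MySpeaker : Java.Lang.Object, TextToSpeech.IOnInitListener
{
	private readonly TextToSpeech _speaker;

	public MySpeaker(Context context)
	{
		// "this" is the init listener; the third argument is the engine's package name.
		_speaker = new TextToSpeech(context, this, "com.google.android.tts");
	}
}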

A few things to notice in this first snippet:

  • Our MySpeaker class needs to implement Android’s IOnInitListener interface, because we want to react to the text-to-speech engine’s start-up events. I’ll come back to that topic a little later.
  • Our speaker class holds an instance of the TextToSpeech class I mentioned already. This instance does all the actual speaking; for now, the only important thing is to ensure that it is available throughout the lifetime of our speaker…
  • …and, of course, to initialize it! This is done in the constructor. Creating a new TextToSpeech instance takes three things: the Android context (which is passed in from e.g. the calling Activity), the IOnInitListener implementation to use (in this case, our speaker class itself, as discussed above), and the package name of the text-to-speech engine to use (com.google.android.tts is Google’s engine, which is the default on most Android devices, so let’s just reference that).

In a second step, let’s see what we gain from implementing the IOnInitListener interface as mentioned above: it requires us to add an OnInit method to the speaker class, which lets us react to the text-to-speech engine starting up, either successfully or with an error:

public void OnInit(OperationResult status)
{
	if (status == OperationResult.Success)
	{
		// TODO
	}
	else
	{
		// TODO
	}
}

After the TextToSpeech instance is created in the constructor, this method is called automatically once the engine has finished starting up, telling us whether start-up was successful. For now, you’ll want to set breakpoints in both branches; proper error handling for the else case can follow later on.

And now, last but not least, the most interesting part: Let’s instruct the speaker class to actually speak! This one’s surprisingly easy:

public void Speak(string message)
{
	_speaker.Speak(message, QueueMode.Flush, null, "testUtterance");
}

We wrap this part in a separate method so that it can be called from outside. The actual magic happens in a single line of code: we simply invoke the Speak method on the TextToSpeech instance held in our _speaker field. This method expects four parameters:

  • The most important one: The text to be output (we pass that in from the outside, to be flexible)
  • The queue mode: This one is relevant when using the speaker frequently. Flush removes all items scheduled for speech output from the queue and instantly starts speaking out the current message, while Add would append the current message to any existing ones, to be output one after the other.
  • Additional parameters to be passed to the text-to-speech engine. We don’t need these for the moment.
  • A unique identifier for the utterance. For now, let’s hardcode some name to be able to quickly test it. If you use the speaker regularly throughout an app and want to be able to identify single utterances and react to them or modify them, you’ll need to pass a unique and recognizable identifier here.
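To move beyond the hardcoded "testUtterance", one option is a small helper that hands out a fresh identifier per call. The following is only a sketch: UtteranceIds and its Next method are hypothetical names made up for this example and are not part of Android or Xamarin.

```csharp
using System.Threading;

// Hypothetical helper (not an Android/Xamarin API): builds identifiers
// of the form "<prefix>-<number>", with the number increasing per call.
public static class UtteranceIds
{
    private static int _counter;

    public static string Next(string prefix)
    {
        // Interlocked.Increment keeps the counter consistent across threads.
        int n = Interlocked.Increment(ref _counter);
        return prefix + "-" + n;
    }
}
```

Inside Speak, the last argument could then become UtteranceIds.Next("caller") instead of the fixed string. The prefix keeps the identifiers recognizable, while the counter keeps them unique within the running process.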

That’s it! We should be able to instantiate and use this class already. For example, from a simple test app’s main Activity, we could invoke it with a snippet similar to the following:

public class MainActivity : AppCompatActivity
{
    // ...
 
    protected override void OnCreate(Bundle savedInstanceState)
    {
        // ...
         
        var speaker = new MySpeaker(this);
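        // Note: the engine starts up asynchronously, so this call may fail
        // if it runs before OnInit has reported success.
        speaker.Speak("This is a first test message");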
    }
}
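One caveat with the snippet above: TextToSpeech finishes its start-up asynchronously, so a Speak call made right after the constructor can arrive before OnInit has reported success and then fail. A possible way around this is to hold messages back until the engine is ready. The sketch below is plain C# without any Android dependency; DeferredSpeech, MarkReady and the speak delegate are hypothetical names chosen for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Android/Xamarin API): buffers messages until
// MarkReady is called, then forwards them in their original order.
public sealed class DeferredSpeech
{
    private readonly Action<string> _speak;   // assumed to wrap _speaker.Speak(...)
    private readonly Queue<string> _pending = new Queue<string>();
    private readonly object _gate = new object();
    private bool _ready;

    public DeferredSpeech(Action<string> speak)
    {
        _speak = speak ?? throw new ArgumentNullException(nameof(speak));
    }

    // Intended to be called from OnInit when status == OperationResult.Success.
    public void MarkReady()
    {
        lock (_gate)
        {
            _ready = true;
            // Forward everything that was requested before the engine was ready.
            while (_pending.Count > 0)
            {
                _speak(_pending.Dequeue());
            }
        }
    }

    public void Speak(string message)
    {
        lock (_gate)
        {
            if (!_ready)
            {
                _pending.Enqueue(message);   // too early: keep for later
                return;
            }
            _speak(message);                 // engine ready: forward directly
        }
    }
}
```

In MySpeaker, the delegate would wrap the existing _speaker.Speak call, the public Speak method would forward to the helper, and the success branch of OnInit would call MarkReady. Keep in mind that with QueueMode.Flush each forwarded message interrupts the previous one, so QueueMode.Add may be the better fit if several buffered messages should all be heard.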

Additional details about identifying the state of single utterances will be covered in the following blog post!