In this section, you will learn how to use Jovo to craft a response to your users.

Introduction to Output Types

What do users expect from a voice assistant? Usually, it's either direct or indirect output in the form of speech, audio, or visual information. In this section, you will learn about basic output types like tell and ask, as well as how to use SSML or the Jovo speechBuilder to create more advanced output elements.

Basic Output

Jovo's basic output options offer simple methods for interacting with users through text-to-speech. If you're interested in more, take a look at Advanced Output.


tell

The tell method is used to have Alexa or Google Assistant say something to your users. You can use plain text, SSML (Speech Synthesis Markup Language), or a speechBuilder object (this.$speech).
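As a minimal sketch of a tell call inside a handler: the handler below follows the Jovo pattern of calling methods on `this`. The `fakeJovo` object is a hypothetical stand-in, not part of Jovo. It only records what was passed to tell so the snippet can run without the framework installed.

```javascript
// Handler written the way a Jovo app defines it; `this` is the Jovo context.
const handlers = {
  LAUNCH() {
    // Speak once, then the session ends.
    this.tell('Hello World!');
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
// The real framework builds a platform response instead of storing a record.
const fakeJovo = {
  output: null,
  tell(speech) {
    this.output = { speech, sessionEnded: true };
  },
};

handlers.LAUNCH.call(fakeJovo);
console.log(fakeJovo.output);
```

In a real project, the handler object would be registered with the app rather than called by hand.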

Important: The session ends after a tell call. This means the mic is off and there is no more interaction between the user and your app until the user invokes it again.

Learn more about sessions here.


ask

Whenever you want to make the experience more interactive and get some user input, the ask method is the way to go.

This method keeps the mic open and waits for a response. The speech element is used first to ask the user for input. If there is no response, the reprompt is used to ask again.

You can also use SSML or speechBuilder objects (this.$speech and this.$reprompt) for your speech and reprompt elements.
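The speech and reprompt pair can be sketched as follows. The example texts are made up, and `fakeJovo` is a hypothetical stand-in that only records the arguments so the snippet runs on its own.

```javascript
const handlers = {
  LAUNCH() {
    const speech = 'What is your name?';
    const reprompt = 'Please tell me your name.';
    // The first argument is spoken right away; the second is only used
    // if the user does not respond.
    this.ask(speech, reprompt);
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
const fakeJovo = {
  output: null,
  ask(speech, reprompt) {
    // Unlike tell, the session stays open and waits for user input.
    this.output = { speech, reprompt, sessionEnded: false };
  },
};

handlers.LAUNCH.call(fakeJovo);
console.log(fakeJovo.output);
```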

Multiple Reprompts

Google Assistant supports multiple reprompts.

You can find more detail about this feature here: Platforms > Google Assistant > Multiple Reprompts.


repeat

We recommend adding a RepeatIntent (e.g., the AMAZON.RepeatIntent) that allows users to ask your app to repeat the previous output if they missed it.

This feature makes use of the Jovo User Context. To be able to use it, please make sure that you have a database integration set up and the Jovo User Context enabled.
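A RepeatIntent handler might look like the sketch below. The call `this.repeat()` is an assumption about the helper's name, so check it against your Jovo version. The `fakeJovo` object is hypothetical: its `previous` field stands in for what the Jovo User Context would persist in a database.

```javascript
const handlers = {
  JokeIntent() {
    this.ask('Do you want to hear a joke?', 'Say yes or no.');
  },
  RepeatIntent() {
    // Assumed helper name: replays the previous speech and reprompt.
    this.repeat();
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
const fakeJovo = {
  previous: null, // placeholder for data kept in the Jovo User Context
  output: null,
  ask(speech, reprompt) {
    this.output = { speech, reprompt };
    this.previous = this.output;
  },
  repeat() {
    this.output = { ...this.previous };
  },
};

handlers.JokeIntent.call(fakeJovo);   // first request
fakeJovo.output = null;               // a new request starts without output
handlers.RepeatIntent.call(fakeJovo); // user asks to hear it again
console.log(fakeJovo.output);
```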

Advanced Output

Voice platforms offer a lot more than just converting a sentence or paragraph to speech output. In the following sections, you will learn more about advanced output elements.


SSML

SSML is short for "Speech Synthesis Markup Language." You can use it to add elements such as pronunciations, breaks, or audio files. For more information, see the SSML references by Amazon and by Google. Here's another valuable resource for cross-platform SSML.

Here is an example of what SSML-enriched output could look like:
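A minimal sketch, assuming a handler that passes a hand-written SSML string to tell. The wording and the audio URL are placeholders, and `fakeJovo` is a hypothetical stand-in that records the string so the snippet runs without the framework.

```javascript
const handlers = {
  LAUNCH() {
    // SSML assembled by hand: spelled-out word, a pause, and an audio file.
    const speech =
      '<speak>Welcome to this Pizza Skill. ' +
      'Don\'t we all want some <say-as interpret-as="spell-out">pizza</say-as> ' +
      'in our life? <break time="1s"/> ' +
      '<audio src="https://www.example.com/audio.mp3"/></speak>'; // placeholder URL
    this.tell(speech);
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
const fakeJovo = {
  output: null,
  tell(speech) {
    this.output = speech;
  },
};

handlers.LAUNCH.call(fakeJovo);
console.log(fakeJovo.output);
```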

But isn't that a little inconvenient? Let's take a look at the Jovo speechBuilder.


speechBuilder

With the speechBuilder, you can assemble a speech element by adding different types of input:
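A minimal sketch of chained speechBuilder calls, assuming the methods `addText`, `addBreak`, and `addAudio`. The `FakeSpeechBuilder` class and `fakeJovo` object are hypothetical stand-ins that only imitate the chaining so the snippet runs on its own. The markup the real speechBuilder produces may differ.

```javascript
// Hypothetical stand-in for the Jovo speechBuilder (an assumption for this sketch).
class FakeSpeechBuilder {
  constructor() {
    this.parts = [];
  }
  addText(text) {
    this.parts.push(text);
    return this; // returning `this` is what allows chaining
  }
  addBreak(time) {
    this.parts.push(`<break time="${time}"/>`);
    return this;
  }
  addAudio(url) {
    this.parts.push(`<audio src="${url}"/>`);
    return this;
  }
  toString() {
    return this.parts.join(' ');
  }
}

const handlers = {
  LAUNCH() {
    this.$speech
      .addText('Welcome to this Pizza Skill.')
      .addBreak('300ms')
      .addAudio('https://www.example.com/audio.mp3') // placeholder URL
      .addText('Enjoy your meal.');
    this.tell(this.$speech);
  },
};

// Hypothetical stand-in for the Jovo context.
const fakeJovo = {
  $speech: new FakeSpeechBuilder(),
  output: null,
  tell(speech) {
    this.output = String(speech);
  },
};

handlers.LAUNCH.call(fakeJovo);
console.log(fakeJovo.output);
```

Compared with concatenating SSML strings by hand, the chained calls keep tags balanced and make each output element a separate, readable step.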

You can find everything about the SpeechBuilder here.


i18n

Jovo uses a package called i18next to support multilingual voice apps.
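The idea can be sketched as follows, assuming translations are looked up by key through a helper such as `this.t(key)`. That helper name is an assumption, and the keys and texts are made up. The resource object follows the i18next shape of locale, then the default `translation` namespace, then key. The `createFakeJovo` function is a hypothetical stand-in that does a plain lookup instead of calling i18next.

```javascript
// Locale -> namespace ("translation" is the i18next default) -> key.
const languageResources = {
  'en-US': { translation: { WELCOME: 'Welcome to the pizza app.' } },
  'de-DE': { translation: { WELCOME: 'Willkommen bei der Pizza-App.' } },
};

const handlers = {
  LAUNCH() {
    // The handler only knows the key; the text depends on the request locale.
    this.tell(this.t('WELCOME'));
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
function createFakeJovo(locale) {
  return {
    output: null,
    t(key) {
      return languageResources[locale].translation[key];
    },
    tell(speech) {
      this.output = speech;
    },
  };
}

const german = createFakeJovo('de-DE');
handlers.LAUNCH.call(german);
console.log(german.output);
```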

Learn more about i18n responses here.

Raw JSON Responses

If you prefer to return some specific responses in a raw JSON format, you can do this with the platform-specific functions this.<platform-name>.setResponseObject(obj).
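As a minimal sketch for Alexa, assuming the platform object is exposed as `this.$alexaSkill` (the exact property name depends on your Jovo version). The response object uses the standard Alexa response fields. The `fakeJovo` object is a hypothetical stand-in that only stores what it receives.

```javascript
const handlers = {
  LAUNCH() {
    // Raw Alexa response: SSML speech, then end the session.
    const alexaResponse = {
      version: '1.0',
      response: {
        outputSpeech: {
          type: 'SSML',
          ssml: '<speak>Hello World!</speak>',
        },
        shouldEndSession: true,
      },
    };
    this.$alexaSkill.setResponseObject(alexaResponse);
  },
};

// Hypothetical stand-in for the Jovo context (an assumption for this sketch).
const fakeJovo = {
  $alexaSkill: {
    response: null,
    setResponseObject(obj) {
      this.response = obj;
    },
  },
};

handlers.LAUNCH.call(fakeJovo);
console.log(JSON.stringify(fakeJovo.$alexaSkill.response, null, 2));
```

A raw response is tied to one platform, so a handler that serves several platforms would need one object per platform.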

Learn more about platform-specific features and responses here: Platforms.

Visual Output

Besides sound and voice output, the Jovo framework also supports visual output.

Learn more about visual output here.

No Speech Output

Sometimes, you might want to end a session without speech output. In that case, simply don't add any kind of output in your handler function.

