Scale your design

Now that you have a solid conversation on Google Home, it’s time to scale your design to help users wherever they are. Since the Google Assistant helps users across devices, your Actions should too. To do that, you’ll adapt your spoken conversation into a multimodal conversation.

Multimodal design

Watch this video to learn how to leverage the strengths of each modality to create a compelling experience that scales across devices.

Saba Zaidi and Ulas Kirazci, on designing multimodal Actions at Google I/O 2018

Anatomy of a response

Your Action's response to the user is made up of components.

Conversational components

Conversational components are combined to compose the content in the spoken prompts, display prompts, and chips.

Conversational components (prompts and chips) should be designed for every dialog turn.

Spoken prompt

The content your Action speaks to the user, via TTS or pre-recorded audio

Display prompt

The content your Action writes to the user, via printed text on the screen

Chips

Suggestions for how the user can continue or pivot the conversation

Visual components

Visual components include cards, carousels, and other visual assets.

Perfect for scanning and comparing options, visual components are useful if you're presenting detailed information, but they aren't required for every dialog turn.

Basic card

Use basic cards to display an image and text to users

Browsing carousel

Browsing carousels are optimized for allowing users to select one of many items, when those items are content from the web

Carousel

Carousels are optimized for allowing users to select one of many items, when those items are most easily differentiated by an image

List

Lists are optimized for allowing users to select one of many items, when those items are most easily differentiated by their title

Media response

Used to play and control the playback of audio content like music or other media

Persona: Ibento (fake ticket-seller). Spoken prompt: Here are the best available seats for two people seated together. Do you want to get these? Display prompt: Here’re the best available seats. Do you want them? Visual: Basic card showing seating chart with seat location and pricing. Chips: Buy tickets, VIP, pricing chart.
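To make the anatomy concrete, here is a minimal sketch of one dialog turn as a data structure. The `Response` class and its field names are hypothetical, chosen only to mirror the components described above; they are not part of any Actions on Google library. The values come from the Ibento example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Response:
    """One dialog turn. Hypothetical structure, for illustration only."""
    # Conversational components: design these for every dialog turn.
    spoken_prompt: str                    # spoken via TTS or pre-recorded audio
    display_prompt: Optional[str] = None  # printed text on the screen
    chips: List[str] = field(default_factory=list)  # ways to continue or pivot
    # Visual component: optional, not required for every turn.
    visual: Optional[str] = None          # here just a description of the card


seats = Response(
    spoken_prompt=(
        "Here are the best available seats for two people seated together. "
        "Do you want to get these?"
    ),
    display_prompt="Here're the best available seats. Do you want them?",
    chips=["Buy tickets", "VIP", "pricing chart"],
    visual="Basic card showing seating chart with seat location and pricing",
)
```

Only `spoken_prompt` is mandatory in this sketch, which reflects the idea that a screenless device can still carry the whole turn.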

Group devices by the components used for the response:


For conversations on smart speakers or headphones, the spoken prompts carry the whole conversation and convey the core message.


For conversations in the car or on a smart display, the screen may not always be available to the user. Therefore, the spoken prompts have to carry most of the conversation and convey the core message. The screen can be used for supplementary visual information as well as suggestions to continue or pivot the conversation.


Conversations on a TV, laptop, phone or watch are equally suited for audio input/output and screen-based interactions. The user can choose to continue the conversation in either the spoken or visual modality. Therefore, all the components work together to carry the conversation and convey the core message.
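The three device groups above can be sketched as a small lookup table. The group names (`voice_only`, `voice_forward`, `multimodal`) and the `DEVICE_GROUPS` layout are assumptions made for this illustration; only the device lists and the role of each component come from the text.

```python
# Hypothetical grouping of devices by which components carry the conversation.
DEVICE_GROUPS = {
    "voice_only": {
        "surfaces": {"smart speaker", "headphones"},
        # Spoken prompts carry the whole conversation.
        "carries": ["spoken prompt"],
        "supplements": [],
    },
    "voice_forward": {
        "surfaces": {"car", "smart display"},
        # The screen may not always be available, so speech still leads.
        "carries": ["spoken prompt"],
        "supplements": ["visuals", "chips"],
    },
    "multimodal": {
        "surfaces": {"TV", "laptop", "phone", "watch"},
        # All components work together.
        "carries": ["spoken prompt", "display prompt", "chips", "visuals"],
        "supplements": [],
    },
}


def group_for(surface: str) -> str:
    """Return the name of the device group that contains `surface`."""
    for name, group in DEVICE_GROUPS.items():
        if surface in group["surfaces"]:
            return name
    raise ValueError(f"unknown surface: {surface}")
```

For example, `group_for("smart display")` returns `"voice_forward"`, so the spoken prompt must still convey the core message on its own.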

Go from spoken to multimodal

When you wrote your sample dialogs, we recommended that you start with the spoken conversation, that is, designing for screenless devices like smart speakers and headphones. Now that you’re ready to scale your designs to other devices, pieces will move out of the spoken prompts and into the display prompts, chips, and visuals.

Here are a couple of examples from the Google I/O 18 Action:

Persona: Google I/O 18. User input: Yeah, I’m attending. Spoken prompt: Congrats! As the Keeper of I/O-specific knowledge, consider me your guide. I can manage your schedule, help you find things to do, or give you directions. So, which do you need?

Start with the original spoken prompt from the example sample dialog.

Persona: Google I/O 18. User input: Yeah, I’m attending. Spoken prompt: Congrats! As the Keeper of I/O-specific knowledge, consider me your guide. I can manage your schedule, help you find things to do, or give you directions. So, which do you need? Chips: Manage my schedule, Find things to do.

Most of the time, you can simply re-use the same spoken prompt on devices like smart displays, since the need to convey the core of the conversation remains the same.

At this point in the conversation, there isn’t any content that would be appropriate in a visual component like a card or carousel, so none is included.

Be sure to add chips. At a minimum, these should include any options offered in the prompts so the user can quickly tap them to respond.

Persona: Google I/O 18. User input: Yeah, I’m attending. Spoken prompt: Congrats! As the Keeper of I/O-specific knowledge, consider me your guide. I can manage your schedule, help you find things to do, or give you directions. So, which do you need? Display prompt: As the Keeper of I/O-specific knowledge, consider me your guide. What can I help you with? Chips: Manage my schedule, Find things to do.

Since there isn’t any content that would be appropriate in a visual component, there’s no content that can be moved out of the spoken prompt. Therefore, it’s okay to re-use the original.

The display prompt should be a condensed version of the spoken prompt, optimized for scannability. Move any response options to the chips, but be sure to always include the question.

Re-use the same chips you just created.

Persona: Google I/O 18. User input: Browse sessions. Spoken prompt: Here’re some of the topics left to cover today: machine learning and artificial intelligence, identity, Nest, Android and Play, open source, and Assistant. Do any of those sound good?

Start with the original spoken prompt from the example sample dialog.

Note that the spoken list is limited to 6 items (of 17 total) in order to reduce cognitive load. The topics are randomized to not favor one topic over another.

Persona: Google I/O 18. User input: Browse sessions. Spoken prompt: Here’re some of the topics left to cover today: machine learning and artificial intelligence, identity, Nest, Android and Play, open source, and Assistant. Do any of those sound good? Visual: Paginated list card showing topics. Chips: None of those.

Once again, it’s okay to re-use the same spoken prompt, since we can’t assume the user is looking at the screen.

Including a visual list of all the topics helps the user to browse and select. Note that the visual list of all 17 items (paginated) is shown in alphabetical order, which makes it easiest for users to find the topic they want.

Because the list already enumerates the topics that can be chosen, there is no need to include them as chips. Instead, include other options like “None of those” to offer the user a way out.
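The list handling in this example can be sketched as follows: read out a small random subset, show the full list alphabetically, and keep only chips that the list doesn't already offer. All function names here are hypothetical, and the topic names are placeholders, since the source names only 6 of the 17 topics.

```python
import random
from typing import List

MAX_SPOKEN_ITEMS = 6  # the example reads out 6 of 17 topics


def spoken_topics(topics: List[str], rng: random.Random) -> List[str]:
    """Pick a random subset so no topic is favored in the spoken prompt."""
    return rng.sample(topics, min(MAX_SPOKEN_ITEMS, len(topics)))


def join_spoken(items: List[str]) -> str:
    """Join items the way you'd say them aloud: 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def visual_list(topics: List[str]) -> List[str]:
    """Show every topic, alphabetically, so users can find theirs."""
    return sorted(topics, key=str.casefold)


def chips_for_list(candidates: List[str], list_items: List[str]) -> List[str]:
    """Drop chips that merely repeat an item already shown in the list."""
    shown = {item.casefold() for item in list_items}
    return [chip for chip in candidates if chip.casefold() not in shown]


# Placeholder topic names (the real 17 topics are not all listed here).
topics = [f"Topic {i:02d}" for i in range(1, 18)]
spoken = spoken_topics(topics, random.Random())
prompt = f"Here're some of the topics left to cover today: {join_spoken(spoken)}."
chips = chips_for_list(["Topic 01", "None of those"], visual_list(topics))
```

With these inputs, `chips` keeps only "None of those", because "Topic 01" is already selectable in the list.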

Persona: Google I/O 18. User input: Browse sessions. Spoken prompt: There’re talks on 17 different topics. Which most interests you? Display prompt: Which topic are you interested in? Visual: Paginated list card showing topics. Chips: None of those.

Here, we can assume that the user has equal access to the audio and the screen. Since the visual modality is better suited to lists, leverage this strength by directing the user to the screen to pick a topic. This allows us to shorten the spoken prompt to a simple list overview and question.

Only the question needs to be maintained in the display prompt.

Re-use the same chip you just created.
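Putting the three steps of the "Browse sessions" example side by side, a sketch of the per-device variants might look like this. The `Response` class, the `browse_sessions` function, and the group names are hypothetical; the prompt, visual, and chip text is taken from the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Response:
    """Hypothetical container for one dialog turn."""
    spoken_prompt: str
    display_prompt: Optional[str] = None
    visual: Optional[str] = None
    chips: List[str] = field(default_factory=list)


FULL_SPOKEN = (
    "Here're some of the topics left to cover today: machine learning and "
    "artificial intelligence, identity, Nest, Android and Play, open source, "
    "and Assistant. Do any of those sound good?"
)
TOPIC_LIST = "Paginated list card showing topics"


def browse_sessions(group: str) -> Response:
    """Return the 'Browse sessions' response for a device group."""
    if group == "voice_only":
        # Smart speakers, headphones: the spoken prompt carries everything.
        return Response(spoken_prompt=FULL_SPOKEN)
    if group == "voice_forward":
        # Smart displays, cars: same spoken prompt, plus a list and a chip.
        return Response(FULL_SPOKEN, visual=TOPIC_LIST, chips=["None of those"])
    if group == "multimodal":
        # Phones and similar: shorten the speech and point to the screen.
        return Response(
            spoken_prompt="There're talks on 17 different topics. Which most interests you?",
            display_prompt="Which topic are you interested in?",
            visual=TOPIC_LIST,
            chips=["None of those"],
        )
    raise ValueError(f"unknown device group: {group}")
```

Note that the question survives in every variant, even as the list itself moves from speech to the screen.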

Relationship between prompts

In general, spoken prompts are optimized for and follow the conventions of spoken conversations. Display prompts are optimized for and follow the conventions of written conversations. Although slightly different, they should still convey the same core message.

Design prompts for both the ear and the eye. It’s easiest to start with the spoken prompt, imagining what you might say in a human-to-human conversation. Then, condense it to create the display prompt.

Say, essentially, the same thing

Ibento (fake ticket-seller). User input: What events are happening this weekend? Spoken prompt: Here’re some events this weekend that still have tickets available. Which ones do you want tickets for? Display prompt: Here are some events this weekend. Which one do you want? Visual: Paginated list card showing different events in San Francisco. Chips: Only sporting events, Under $100, Next weekend.

Do.

Keep the same narrative from spoken prompt to display prompt.

Ibento (fake ticket-seller). User input: What events are happening this weekend? Spoken prompt: Here’re some events this weekend that still have tickets available. Which ones do you want tickets for? Display prompt: Here are the hottest events happening in the city. Which one are you hoping to score tickets to? Visual: Paginated list card showing different events in San Francisco. Chips: Only sporting events, Under $100, Next weekend.

Don’t.

Don't lead the user to a different topic or branching experience.

Display prompts should be condensed versions of their spoken counterparts

Persona: Google I/O 18. User input: When’s I/O happening? Spoken prompt: This year’s developer festival will be held May 8th through 10th at the Shoreline Amphitheatre. That’s in Mountain View, California, next to Google’s main campus. Now, you can ask about the keynotes or sessions, or anything else you want to know about I/O. Display prompt: This year’s developer festival will be held May 8-10 at the Shoreline Amphitheatre in Mountain View, CA. Now, you can ask anything else you want to know about I/O.

Do.

Use condensed display prompts.

Persona: Google I/O 18. User input: When’s I/O happening? Spoken prompt: This year’s developer festival will be held May 8th through 10th at the Shoreline Amphitheatre. That’s in Mountain View, California, next to Google’s main campus. Now, you can ask about the keynotes or sessions, or anything else you want to know about I/O. Display prompt: This year’s developer festival will be held May 8 through 10 at the Shoreline Amphitheatre. That’s in Mountain View, California, next to Google’s main campus. Now, you can ask about the keynotes or sessions, or anything else you want to know about I/O.

Don’t.

Don't simply duplicate spoken prompts.

Keep the voice and tone consistent

Persona: Geek num (fake game). Spoken prompt: Howdy! I can tell you facts and trivia about almost any number, like 42. What number would you like to know about? Display prompt: Howdy! I can tell you fun facts about almost any number. What number do you have in mind?

Do.

Stay in persona.

Persona: Geek num (fake game). Spoken prompt: Howdy! I can tell you facts and trivia about almost any number, like 42. What number would you like to know about? Display prompt: Greetings! I’m programmed to give you fun facts for any number. What number do you want?

Don’t.

Avoid designing prompts that feel like they’re coming from different personas.

Design spoken and display prompts so they can be understood independently

Persona: Geek num (fake game). Spoken prompt: Howdy! I can tell you facts and trivia about almost any number, like 42. What number would you like to know about? Display prompt: Howdy! I can tell you fun facts about almost any number. What number do you have in mind?

Do.

If you’re asking a question, make sure it appears in both prompts, so the user knows what to do next.

Persona: Geek num (fake game). Spoken prompt: Howdy! I can tell you facts and trivia about almost any number, like 42. What number would you like to know about? Display prompt: Howdy! I can tell you fun facts about almost any number.

Don’t.

Don't rely on spoken prompts alone to carry the conversation. This can backfire when the user can't hear them. Here, if the user has their device muted, they won’t hear the question.
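Some of these guidelines can be checked mechanically. The sketch below is a rough, hypothetical lint, not an official tool: it flags a question that appears in only one prompt, and a display prompt that is barely shorter than its spoken counterpart. The 0.9 length ratio is an arbitrary assumption, and the check cannot judge narrative or persona consistency, which still need a human review.

```python
from typing import List

# Assumed threshold: a display prompt longer than 90% of the spoken prompt
# is treated as "not condensed". Tune this for your own content.
MAX_DISPLAY_RATIO = 0.9


def lint_prompts(spoken: str, display: str) -> List[str]:
    """Return a list of guideline problems found in a prompt pair."""
    problems = []
    # Each prompt must be understandable on its own, question included.
    if "?" in spoken and "?" not in display:
        problems.append("display prompt lacks the question")
    if "?" in display and "?" not in spoken:
        problems.append("spoken prompt lacks the question")
    # Display prompts should be condensed, not duplicated.
    if spoken and len(display) > MAX_DISPLAY_RATIO * len(spoken):
        problems.append("display prompt is not condensed")
    return problems


spoken = (
    "Howdy! I can tell you facts and trivia about almost any number, like 42. "
    "What number would you like to know about?"
)
good = "Howdy! I can tell you fun facts about almost any number. What number do you have in mind?"
bad = "Howdy! I can tell you fun facts about almost any number."
```

Here `lint_prompts(spoken, good)` reports no problems, while `lint_prompts(spoken, bad)` reports that the display prompt lacks the question.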

Relationship between components

Remember that all the components are meant to provide a single unified response.

It’s often easiest to start by writing prompts for a screenless experience, again imagining what you might say in a human-to-human conversation. Then, imagine how the conversation would change if one of the participants were holding a touchscreen. What details can now be omitted from the conversational components? Typically, the display prompt is significantly reduced since the user can just as easily comprehend the information in the visual as they can in the display prompt. Group the information in such a way that the user doesn’t have to look back and forth between the display prompt and visual repeatedly.

Always include the question in the prompts

Persona: Ibento (fake ticket-seller). User input: I want SportsTeam tickets. Spoken prompt: The SportsTeam has a few home games coming up. Which one do you want tickets for? Display prompt: For which game? Visual: Paginated list card showing three games with dates and times. Chips: Chicago, July, Playoffs.

Do.

Make the call to action clear by asking a question.

Persona: Ibento (fake ticket-seller). User input: I want SportsTeam tickets. Spoken prompt: Here are some upcoming SportsTeam games. Display prompt: SportsTeam games. Visual: Paginated list card showing three games with dates and times. Chips: Chicago, July, Playoffs.

Don’t.

When presented with this design, many users did not take their turn.

Avoid redundancy

Persona: Sekai (fake shoe store). User input: Where are my shoes? Spoken prompt: Your order for men’s running shoes is scheduled to be delivered tomorrow. Is there anything else I can help you with? Display prompt: Your order is scheduled for delivery tomorrow. Is there anything else I can help you with? Visual: A basic card shows an image of the shoes with their name, size, description, price, and shipping details. Chips: Orders from last month, New order.

Do.

Spread information across the display prompt and visual component.

Persona: Sekai (fake shoe store). User input: Where are my shoes? Spoken prompt: You placed an order on June 11th for men’s running shoes, size 10, in black and blue patchwork. It was shipped on June 15th and is scheduled to be delivered tomorrow. Is there anything else I can help you with? Display prompt: You placed an order on June 11 for men’s running shoes, size 10, in black and blue patchwork. It was shipped on June 15 and is scheduled to be delivered tomorrow. Is there anything else I can help you with? Visual: A basic card shows an image of the shoes with their name, size, description, price, and shipping details. Chips: Orders from last month, New order.

Don’t.

Don't cram everything from the visual component into the prompts. Focus on just the key information.

Give the short answer in the prompts, and the details in the visuals

Persona: Google I/O 18. User input: When’s my next session? Spoken prompt: Your next session is at 11:30 AM. Do you need anything else? Display prompt: It’s at 11:30 AM. Anything else? Visual: Basic card with name, time, date, location, and description of upcoming session. Chips: Get directions, Next event, Open I/O app.

Do.

Use the spoken and display prompts to give the specific answer to the user’s directed question (11:30 AM in this example). Use the visuals for related details.

Persona: Google I/O 18. User input: When’s my next session? Spoken prompt: Your next session is at 11:30 AM. It’s called Design Actions for the Google Assistant: beyond smart speakers, to phones and smart displays. Display prompt: It’s starting at 11:30 AM at Stage 2. It’s called Design Actions for the Google Assistant: beyond smart speakers, to phones and smart displays. Visual: Basic card with name, time, date, location, and description of upcoming session. Chips: Get directions, Next event, Open I/O app.

Don’t.

Avoid redundancy between the spoken prompt, display prompt, and visuals.

Even when the visuals provide the best answer, make sure the prompts still carry the core of the message

Persona: Miso flowers (fake flower shop). User input: How do I care for my bouquet? Spoken prompt: Keep your bouquet fresh longer by changing the water when it gets cloudy, following these steps. Any other questions? Display prompt: Change the water when it gets cloudy, following these steps. Any other questions? Visual: List card showing numbered steps for flower care. Chips: Do flowers last long, Pet-safe flowers.

Do.

Use the prompts to give an overview. Use the visuals to provide additional detail.

Persona: Miso flowers (fake flower shop). User input: How do I care for my bouquet? Spoken prompt: Follow these steps. Any other questions? Display prompt: Follow these steps. Any other questions? Visual: List card showing numbered steps for flower care. Chips: Do flowers last long, Pet-safe flowers.

Don’t.

Don’t force the user to scan and read. Your persona should reduce the work the user needs to do, which includes the effort of scanning through detailed information.

Encourage users to select from lists or carousels, but allow them to continue with their voice

Persona: Ibento (fake ticket-seller). User input: I want orchestra tickets. Spoken prompt: Ok. Your local orchestra has a few concerts coming up. The next one is April 13th. Which one do you want tickets for? Display prompt: Here are the upcoming concerts for your local orchestra. Which do you want tickets for? Visual: Paginated list card showing times, dates, and locations for different concerts. Chips: None of those.

Do.

Encourage the user to look at the list.

Persona: Ibento (fake ticket-seller). User input: I want orchestra tickets. Spoken prompt: Ok. Your local orchestra has a few concerts coming up. The next one is April 13th at 6 PM. After that, there’s one on April 14th at 8 PM. Then, there’s one on April 15th at 7 PM. Do you want tickets for one of those or do you want to hear more? Display prompt: Your local orchestra has a few concerts coming up. Which do you want tickets for? Visual: Paginated list card showing times, dates, and locations for different concerts. Chips: None of those.

Don’t.

Don't rattle off the full list.
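As a minimal sketch of this guideline, a spoken prompt for a list can mention only the next item and then ask the question, leaving the rest to the visual. The function name and signature are hypothetical, and the wording assumes there are at least two items to choose from.

```python
from typing import List


def spoken_list_prompt(subject: str, dates: List[str]) -> str:
    """Summarize a list for speech: name the next item, then ask."""
    if len(dates) < 2:
        # "a few" and "which one" only make sense with several options.
        raise ValueError("expected at least two dates")
    return (
        f"Ok. {subject} has a few concerts coming up. "
        f"The next one is {dates[0]}. "
        "Which one do you want tickets for?"
    )


prompt = spoken_list_prompt(
    "Your local orchestra", ["April 13th", "April 14th", "April 15th"]
)
```

The remaining dates stay in the paginated list, where the user can scan them or simply say which one they want.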