Learn about conversation

New to conversation design? Here are a few high-level principles and concepts to get you started before you write your first sample dialog.

The Cooperative Principle

Watch this video for a crash course on how to write dialog that’s comfortable, frictionless, and user-centric by applying the Cooperative Principle.

James Giangola, on applications of the Cooperative Principle to Voice User Interfaces, at Google I/O 2017

According to the Cooperative Principle, efficient communication relies on the assumption that there’s an undercurrent of cooperation between conversational participants.

The Cooperative Principle can be understood in terms of four rules, called Grice’s Maxims.

We instinctively cooperate in terms of:

  • the truth of what we say (Maxim of Quality)
  • the quantity of information that we provide (Maxim of Quantity)
  • the relevance of what we contribute (Maxim of Relevance)
  • the way we strive to communicate clearly, without obscurity or ambiguity (Maxim of Manner)

Research has shown that people respond to technology as they would to another human. This means users rely on their existing model of human-to-human conversation and follow the Cooperative Principle even when interacting with the persona of a conversational user interface, and they expect your persona to follow it, too.

Expect users to be informative.

Because users are cooperative, they often offer more information than is literally required of them.

Persona: Sekai (fake shoe store). Spoken prompt: What kind of shoes can I help you find today? New turn. User input: I’m looking for patchwork sneakers in size 10. Spoken prompt: Got it. Patchwork sneakers in a size 10. Do you have a specific color in mind?

Do.

Not only did this user answer the question about the type of shoe, they also specified the size, in an attempt to find what they want quickly. Expect this kind of behavior from repeat users who know what other questions will be asked.

Persona: Sekai (fake shoe store). Spoken prompt: What kind of shoes can I help you find today? New turn. User input: I’m looking for patchwork sneakers in size 10. Spoken prompt: Got it. Patchwork sneakers. In what size?

Don’t.

Here, the persona expected only an answer about the type of shoe. Cooperative users will be frustrated at having to repeat the shoe size.
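One way to get this right is to treat each question as a slot and, after every turn, ask only about the slots that are still empty. The Python sketch below illustrates the idea; the slot names, prompts, and the `next_question` helper are hypothetical and not part of Dialogflow or any other framework.

```python
from __future__ import annotations

# Hypothetical slots for the Sekai (fake shoe store) example, in asking order.
SLOT_ORDER = ["shoe_type", "size", "color"]

PROMPTS = {
    "shoe_type": "What kind of shoes can I help you find today?",
    "size": "In what size?",
    "color": "Do you have a specific color in mind?",
}


def next_question(filled: dict[str, str]) -> str | None:
    """Return the prompt for the first slot the user hasn't filled yet."""
    for slot in SLOT_ORDER:
        if slot not in filled:
            return PROMPTS[slot]
    return None  # Every slot is filled, so move on to showing results.


# "I'm looking for patchwork sneakers in size 10." fills two slots at once,
# so the persona skips the size question and asks about color.
filled = {"shoe_type": "patchwork sneakers", "size": "10"}
print(next_question(filled))  # Do you have a specific color in mind?
```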

Get the dialog back on track.

Your persona won’t always be able to handle cooperative responses. In these cases, rely on lightweight and conversational error handling to get the dialog back on track in a way that doesn’t draw attention to the error.

Persona: Ibento (fake ticket-seller). Spoken prompt: You got it. And how many people are going? New turn. User input: Uh, it’s just my wife and me. Spoken prompt: Sorry, how many was that? New turn. User input: 2 people.

Do.

If your persona is expecting a numerical answer, then cooperative, informative responses like this one may not be understood. Handle the resulting No Match error with a rapid reprompt.

Persona: Ibento (fake ticket-seller). Spoken prompt: You got it. And how many people are going? New turn. User input: Uh, it’s just me and my wife. Spoken prompt: I’m sorry, I didn’t understand. Please speak the number of people in your party. You may say, for example, “I want 2 tickets”. Go ahead and speak your response now. New turn. User input: 2 people.

Don’t.

This reprompt is verbose, unnatural, and overly formal, and it doesn’t follow best practices for error handling.
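A minimal sketch of the rapid-reprompt pattern, assuming a hypothetical grammar that accepts only digits and a few number words. Anything it can't parse gets the short reprompt from the "Do" example rather than a block of instructions. `parse_party_size` and `respond` are illustrative names, not a real API.

```python
from __future__ import annotations

import re

# Hypothetical grammar: digits plus a few number words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
RAPID_REPROMPT = "Sorry, how many was that?"


def parse_party_size(utterance: str) -> int | None:
    """Return the first number in the utterance, or None if there isn't one."""
    for token in re.findall(r"[a-z0-9]+", utterance.lower()):
        if token.isdigit():
            return int(token)
        if token in NUMBER_WORDS:
            return NUMBER_WORDS[token]
    return None


def respond(utterance: str) -> str:
    size = parse_party_size(utterance)
    if size is None:
        # No Match: reprompt briefly instead of reading out instructions.
        return RAPID_REPROMPT
    noun = "ticket" if size == 1 else "tickets"
    return f"Got it, {size} {noun}."


print(respond("Uh, it's just my wife and me."))  # Sorry, how many was that?
print(respond("2 people"))                       # Got it, 2 tickets.
```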

Move the conversation forward.

Like your users, your persona should also be cooperative and informative, offering as much information as is necessary to advance the conversation.

Persona: Miso flowers (fake flower shop). User input: Delivery. Spoken prompt: Sure. That arrangement is available for next-day delivery, or standard delivery within 3 business days. Which would you like? New turn. User input: Neither of those. Spoken prompt: In that case, do you want to look for an arrangement that can be delivered today?

Do.

This persona makes an effort to find alternative delivery options that will satisfy the user’s intent—a cooperative gesture to move the conversation forward.

Persona: Miso flowers (fake flower shop). User input: Delivery. Spoken prompt: Sure. That arrangement is available for next-day delivery, or standard delivery within 3 business days. Which would you like? New turn. User input: Neither of those. Spoken prompt: Sorry, that’s all the delivery options available. Come back again next time you need flowers.

Don’t.

This persona makes no attempt to satisfy the user’s intent and, after presenting some bad news, simply ends the interaction.

Optimize for relevance.

In conversation, saying too much is as uncooperative as saying too little. Facilitate comprehension by keeping turns brief and optimally relevant from the user’s point of view.

Persona: Ibento (fake ticket-seller). Spoken prompt: Ok, I found 4 sections with available spots for 2 people sitting together: sections 211, 212, 215, and 218. Do you want to hear more about one of those? New turn. User input: Yeah, the one in section 211. Spoken prompt: Ok, in section 211, row 14, seats 7 and 8 are available. You can purchase them for $130 each. Do you want to get these?

Do.

This design is optimized for relevance. The persona doesn’t include details that aren’t relevant to the current decision.

Persona: Ibento (fake ticket-seller). Spoken prompt: Ok, I found 4 sections with open seats. Section 211, row 14, has open spots: seats 7 and 8. You can purchase these for $130 each. Section 212, row 7, has 2 open spots: seats 15 and 16. You can purchase these for $130 each. Section 215, row 2, has 2 open spots: seats 9 and 10. You can purchase these for $160 each. Section 218, row 5, has 2 open spots: seats 4 and 5. You can purchase these for $150 each. Which would you like?

Don’t.

From the user’s point of view, most of this information is irrelevant and repetitive. All this detail imposes a heavy burden on the user’s short-term memory as they patiently await their turn.

Persona: Miso flowers (fake flower shop). Spoken prompt: Now, in case we need to reach the person you’re sending this to, what’s the best number? New turn. User input: 1 2 3 4 5 6 7. Spoken prompt: And the area code? New turn. User input: 0 1 0.

Do.

This design is optimized for relevance. Only the very few users who don’t say the area code will be asked for it explicitly.

Persona: Miso flowers (fake flower shop). Spoken prompt: Now, in case we need to reach the person you’re sending this to, tell me the ten-digit phone number, starting with the area code. New turn. User input: 0 1 0 1 2 3 4 5 6 7.

Don’t.

This design forces every user to listen to instructions on how to say a phone number in a way that’s easiest for the persona, but not for the user. It also incorrectly presumes that the user would be unable to do this without instruction, making it feel like an error prompt.
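The "Do" design can be expressed as a small branching rule: accept a full number silently, and ask for the area code only when the user gave a 7-digit local number. This sketch assumes the ten-digit, area-code-first format from the example; the function name and the fallback reprompt wording are hypothetical.

```python
from __future__ import annotations


def phone_followup(spoken: str) -> str | None:
    """Return the next prompt, or None when a complete number was given.

    Assumes ten-digit numbers with a three-digit area code, as in the example.
    """
    digits = "".join(ch for ch in spoken if ch.isdigit())
    if len(digits) == 10:
        return None  # Area code included: no need to ask anything else.
    if len(digits) == 7:
        return "And the area code?"  # Only these users hear the extra question.
    return "Sorry, what was that number?"  # Hypothetical reprompt wording.


print(phone_followup("1 2 3 4 5 6 7"))        # And the area code?
print(phone_followup("0 1 0 1 2 3 4 5 6 7"))  # None
```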

Listen between the lines.

Knowing what someone said is not the same as knowing what they meant. People often suggest things rather than state them explicitly. Our ability to “listen between the lines” is known as “conversational implicature”.

Implicature vs. implication. By conversational implicature, “Last night I saw John at a restaurant with a woman” suggests that John was with a woman other than his wife, because if the woman had been his wife, the speaker would have said so. However, by logical implication, the woman could have been John’s wife, since all wives are women.

Persona: Google I/O ’18. User input: When’s I/O? Spoken prompt: This year’s developer festival will be held May 8th through 10th at Shoreline Amphitheatre. That’s in Mountain View, California, next to Google’s main campus. Now, you can search for talks, manage your viewing schedule, or ask me anything else you want to know about I/O. Display prompt: This year’s developer festival will be held May 8-10 at the Shoreline Amphitheatre in Mountain View, CA. Now, you can ask me anything else you want to know about I/O. Chips: Search for talks. Manage my schedule. New turn. User input: That’s it. Spoken prompt: Come back if there’s anything else I/O you’d like to know. Display prompt: Come back if there’s anything else I/O you’d like to know.

Do.

Be sure to add handling for phrases like “That’s it,” “That’s all,” “Thanks,” or “I’m done,” for which the implied meaning is usually “I’ve gotten everything I need out of this conversation and am done talking. Goodbye.”

Persona: Google I/O ’18. User input: When’s I/O? Spoken prompt: This year’s developer festival will be held May 8th through 10th at Shoreline Amphitheatre. That’s in Mountain View, California, next to Google’s main campus. Now, you can search for talks, manage your viewing schedule, or ask me anything else you want to know about I/O. Display prompt: This year’s developer festival will be held May 8-10 at the Shoreline Amphitheatre in Mountain View, CA. Now, you can ask me anything else you want to know about I/O. Chips: Search for talks. Manage my schedule. New turn. User input: That’s it. Spoken prompt: I’m getting a 404 error. What did you want to know about I/O? Display prompt: That’s a 404. What did you want to know about I/O?

Don’t.

Here, “That’s it” was not added to the grammar, so it triggers a No Match error.
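A rough sketch of checking for closing phrases before falling back to No Match. The phrase list comes from the guidance above; the normalization step, the intent names `end_conversation` and `no_match`, and the routing function are hypothetical. In practice these phrases would more likely be added as training phrases for an intent in your NLU tool than matched as strings by hand.

```python
import re

# Phrases taken from the guidance above (hypothetical, incomplete list).
CLOSING_PHRASES = {"that's it", "that's all", "thanks", "i'm done"}


def normalize(utterance: str) -> str:
    """Lowercase, straighten curly apostrophes, and drop other punctuation."""
    text = utterance.lower().replace("\u2019", "'")
    return re.sub(r"[^\w' ]", "", text).strip()


def route(utterance: str) -> str:
    """Return a hypothetical intent name for the utterance."""
    if normalize(utterance) in CLOSING_PHRASES:
        return "end_conversation"
    # Other intents would be checked here before giving up.
    return "no_match"


print(route("That\u2019s it."))  # end_conversation
```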

Speak clearly, in plain English.

People naturally avoid ambiguity and obscurity of expression in a conversation. Using familiar words and phrases helps reduce cognitive load. When it comes to word choice, if you wouldn’t say it, neither should your persona.

When you can’t decide between a few similar terms, use Google Trends to find out which term people search for most and Google Books Ngram Viewer to find out which is more commonly published.

Persona: Miso flowers (fake flower shop). User input: Delivery. Spoken prompt: Sure. What’s the zip code for delivery? Display prompt: What’s the zip code for delivery? New turn. User input: 940423. Spoken prompt: That was longer than I expected. What’s the 5-digit zip code?

Do.

The user may have made a typo or quickly corrected themselves when speaking. So, the persona explains the issue, in plain English, and asks again.

Persona: Miso flowers (fake flower shop). User input: Delivery. Spoken prompt: Sure. What’s the zip code for delivery? Display prompt: What’s the zip code for delivery? New turn. User input: 940423. Spoken prompt: That zip code is invalid. I need a valid zip code for delivery. Try again.

Don’t.

“Invalid” is overly technical and doesn’t help get the user back on track.
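The zip code check from the "Do" example might look like the following sketch. It assumes 5-digit US zip codes; `zip_reprompt` is a hypothetical name, and the "shorter than I expected" branch is a hypothetical counterpart that isn't in the original dialog.

```python
from __future__ import annotations


def zip_reprompt(spoken: str) -> str | None:
    """Return a plain-English reprompt, or None for a 5-digit zip code."""
    digits = "".join(ch for ch in spoken if ch.isdigit())
    if len(digits) == 5:
        return None
    if len(digits) > 5:
        return "That was longer than I expected. What's the 5-digit zip code?"
    # Hypothetical counterpart for too few digits.
    return "That was shorter than I expected. What's the 5-digit zip code?"


print(zip_reprompt("940423"))  # That was longer than I expected. What's the 5-digit zip code?
```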

Persona: Ibento (fake ticket-seller). Spoken prompt: Ok, you’re all set. Anything else? Display prompt: Ok, you’re all set.

Do.

Keep confirmations simple. “Done!” is also a good option.

Persona: Ibento (fake ticket-seller). Spoken prompt: The transaction you have requested is now complete. Do you require further assistance? Display prompt: The transaction you have requested is now complete.

Don’t.

“Transaction,” “requested,” and “complete” are formal, not conversational. Also, this message fails to be relevant: There is no value in reminding the user that they’ve just requested a transaction.

Persona: Ibento (fake ticket-seller). User input: Are there any concerts tonight? Spoken prompt: Here are some concerts in San Francisco tonight. Do any of these sound good? Display prompt: Here are the concerts in San Francisco tonight. Which are you interested in? Visual: A paginated list card displaying 3 concerts in San Francisco. Chips: Tomorrow. Oakland.

Do.

Confirm that the user’s request was understood, and point to the results.

Persona: Ibento (fake ticket-seller). User input: Are there any concerts tonight? Spoken prompt: Okay, here are some matching events. Display prompt: Showing matching events. Visual: A paginated list card displaying 3 concerts in San Francisco. Chips: Tomorrow. Oakland.

Don’t.

“Matching events” is a technical expression. The “match” is the correlation between what the user said and the concerts available. Everyday users don’t know, let alone care about, the challenge of matching queries to results. In addition, “matching events” is ambiguous—it can also mean events that match each other, like “matching socks.”

Context

Advances in automatic speech recognition (ASR) mean that we almost always know exactly what users said. However, determining what users meant is still a challenge.

Utterances often can’t be understood in isolation; they can only be understood in context.

Pronouns or generic references

Your persona needs to keep track of context in order to understand the user’s utterances.

If you’re using Dialogflow, see the Dialogflow documentation on contexts for details on adding context.

Persona: Ibento (fake ticket-seller). User input: I want NotARealDJ concert tickets. Spoken prompt: NotARealDJ has a few concerts coming up. Which one do you want tickets for? Display prompt: Which one do you want tickets for? Visual: A paginated list card with 3 concert dates, times, and locations. Chips: May, Los Angeles, Las Vegas. New turn. User Input: When’s he coming to the city? Spoken prompt: He’s performing in San Francisco on May 13th. Do you want me to see if there are tickets available? Display prompt: On May 13. Should I see if there are tickets available? Visual: Basic card with image, artist name, and May 13th concert details. Chips: Yes, No, Bay Area concerts, Other locations.

Do.

Knowledge of the previous turn is required to know that “he” refers to NotARealDJ. And knowledge of the user’s geographical location is required to know that “the city” refers to “San Francisco”.

Persona: Ibento (fake ticket-seller). User input: I want NotARealDJ concert tickets. Spoken prompt: NotARealDJ has a few concerts coming up. Which one do you want tickets for? Display prompt: Which one do you want tickets for? Visual: A paginated list card with 3 concert dates, times, and locations. Chips: May, Los Angeles, Las Vegas. New turn. User Input: When’s he coming to the city? Spoken prompt: Sorry, which concert do you want tickets for?

Don’t.

Here, the user’s question isn’t understood, and a No Match error occurs.
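To make the "Do" behavior concrete, here is a hedged sketch of the two pieces of context involved: the last artist mentioned and the user's city. `ConversationContext` and `resolve_references` are hypothetical, treating "the city" as the user's own city is a simplifying assumption, and the sketch does not use Dialogflow's context feature.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

PRONOUNS = {"he", "him", "she", "her", "they", "them"}


@dataclass
class ConversationContext:
    last_artist: str | None = None  # Updated whenever an artist is mentioned.
    user_city: str | None = None    # Hypothetical: from the user's location.


def resolve_references(utterance: str, ctx: ConversationContext) -> dict[str, str]:
    """Fill slots that the utterance refers to only indirectly."""
    text = utterance.lower()
    words = set(re.findall(r"[a-z]+", text))
    slots: dict[str, str] = {}
    if words & PRONOUNS and ctx.last_artist:
        slots["artist"] = ctx.last_artist
    if "the city" in text and ctx.user_city:
        slots["city"] = ctx.user_city
    return slots


ctx = ConversationContext(last_artist="NotARealDJ", user_city="San Francisco")
print(resolve_references("When's he coming to the city?", ctx))
# {'artist': 'NotARealDJ', 'city': 'San Francisco'}
```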

Follow-up intents

Your persona needs to keep track of context in order to understand follow-up intents.

Unless the user changes the subject, we can assume that the thread of conversation continues. Therefore, it’s likely that ambiguities in the current utterance can be resolved by referring to previous utterances.

If you’re using Dialogflow, read the section on follow-up intents for details.

Persona: Miso flowers (fake flower shop). User Input: How much for a dozen roses? Spoken prompt: A dozen of our premium long-stem red roses cost 74 dollars and 99 cents. Should I add them to your cart? New turn. User input: What about a half dozen? Spoken prompt: 6 premium long-stem red roses cost 37 dollars and 50 cents. Do you want to get these?

Do.

Using follow-up intents, the persona is able to understand that “What about a half dozen?” is a follow-on to the user’s previous utterance, and interpret it as “How much does a bouquet of 6 roses cost?”

Persona: Miso flowers (fake flower shop). User Input: How much for a dozen roses? Spoken prompt: A dozen of our premium long-stem red roses cost 74 dollars and 99 cents. Should I add them to your cart? New turn. User input: What about a half dozen? Spoken prompt: Sorry, did you want me to add the roses to your cart?

Don’t.

If your Action fails to interpret utterances in their larger conversational context, it will either misinterpret the user’s query or wind up in an error—in this case, a No Match error.
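A minimal sketch of the follow-up behavior, assuming the previous turn stored the product in a conversation context. The prices are the ones from the example dialog, stored in cents to avoid floating-point rounding; the price table, `parse_quantity`, and `handle_followup` are hypothetical.

```python
from __future__ import annotations

# Hypothetical price list in cents, using the prices from the example dialog.
PRICES_CENTS = {
    ("premium long-stem red roses", 12): 7499,
    ("premium long-stem red roses", 6): 3750,
}


def parse_quantity(utterance: str) -> int | None:
    text = utterance.lower()
    if "half dozen" in text:  # Check the longer phrase first.
        return 6
    if "dozen" in text:
        return 12
    return None


def handle_followup(utterance: str, context: dict) -> str | None:
    """Answer a price follow-up that names a quantity but no product."""
    quantity = parse_quantity(utterance)
    if quantity is None or "product" not in context:
        return None  # Not a follow-up we understand; use No Match handling.
    product = context["product"]
    cents = PRICES_CENTS.get((product, quantity))
    if cents is None:
        return None
    context["quantity"] = quantity  # Keep the context current for the next turn.
    return f"{quantity} {product} cost {cents // 100} dollars and {cents % 100} cents."


# The previous turn ("How much for a dozen roses?") left this context behind.
context = {"product": "premium long-stem red roses", "quantity": 12}
print(handle_followup("What about a half dozen?", context))
# 6 premium long-stem red roses cost 37 dollars and 50 cents.
```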

References to what’s on the screen

For multimodal interactions, everything on the screen is part of the context of the conversation. If users can see it, they’ll assume they can refer to it.

Persona: Sekai (fake shoe store). Spoken prompt: Ok, here are 5 pairs you might like. Do you want more details on any of these? Display prompt: Here are 5 you might like. Which do you want more details on? Visual: Carousel card showing different boots. Chips: Under $100, Dark brown, Find other shoes. New turn. User input: The first pair. Spoken prompt: Alright, do you want to get these or keep looking? Display prompt: Alright, do you want to get these or keep looking? Visual: Basic card showing boots with image, size, description, and price. Chips: Add to cart, Back to list, New search.

Do.

Do anticipate references to where an item is located on the screen, e.g., “the first one”, or what it looks like, e.g., “the red one”.

Persona: Sekai (fake shoe store). Spoken prompt: Ok, here are 5 pairs you might like. Do you want more details on any of these? Display prompt: Here are 5 you might like. Which do you want more details on? Visual: Carousel card showing different boots. Chips: Under $100, Dark brown, Find other shoes. New turn. User input: The first pair. Spoken prompt: Sorry, which shoes? Display prompt: Sorry, which shoes? Visual: Carousel card showing different boots. Chips: Under $100, Dark brown, Find other shoes.

Don’t.

It’s problematic when your persona doesn’t appear to have any awareness of what is showing on the screen.
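A sketch of resolving on-screen references, assuming the Action keeps a list of the items it is currently displaying. Position words ("first," "last") select by index, and a single matching color selects by appearance; anything ambiguous returns `None` so the persona can ask a clarifying question. The `Item` type, the word lists, and the sample boots are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Position words mapped to list indexes (-1 means the last item).
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4, "last": -1}


@dataclass
class Item:
    name: str
    color: str


def resolve_on_screen(utterance: str, items: list[Item]) -> Item | None:
    """Map 'the first pair' or 'the red one' to a displayed item, if unambiguous."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for word in words:
        if word in ORDINALS and items:
            index = ORDINALS[word]
            if index < len(items):
                return items[index]
    matches = [item for item in items if item.color in words]
    if len(matches) == 1:
        return matches[0]
    return None  # Ambiguous or unknown: ask which item the user means.


carousel = [Item("Trail boots", "brown"), Item("Chelsea boots", "black"), Item("Rain boots", "red")]
print(resolve_on_screen("The first pair.", carousel).name)  # Trail boots
print(resolve_on_screen("the red one", carousel).name)      # Rain boots
```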

Variation

Variety is the spice of life. Users pay more attention when there’s more of it. Variety can also keep the interaction from feeling monotonous or robotic.

So randomize. For any given prompt, there are usually a few conversational alternatives that’ll work. Focus your efforts on prompts that users hear frequently, so these phrases don’t become tiresome.

If you’re using Dialogflow, you’ll be able to easily add multiple response variations.

Consider all the different ways to answer the question “what time is it?”

“It’s three forty-eight,” “I’ve got eleven thirty,” “It’s five o’clock on the dot,” “A quarter to five,” and so on.

If your Action told users the time, you’d want to add all of the above variations and randomly choose among the ones that apply to the current time. “On the dot,” for example, only works on the hour.
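The sketch below shows one way to randomize while respecting conditions: build only the variations that fit the current time, then pick one at random, skipping the one used last when possible. It uses digits instead of spelled-out numbers for brevity, and both function names are hypothetical; a response-variation feature like the one mentioned above may already handle the random choice for you.

```python
from __future__ import annotations

import random


def time_variations(hour: int, minute: int) -> list[str]:
    """Build only the phrasings that fit this time (24-hour clock input)."""
    h12 = hour % 12 or 12
    next_h12 = (hour + 1) % 12 or 12
    options = [f"It's {h12}:{minute:02d}.", f"I've got {h12}:{minute:02d}."]
    if minute == 0:
        options.append(f"It's {h12} o'clock on the dot.")
    elif minute == 45:
        options.append(f"A quarter to {next_h12}.")
    return options


def pick_variation(options: list[str], last: str | None = None) -> str:
    """Choose at random, avoiding the previous response when there's a choice."""
    candidates = [o for o in options if o != last] or options
    return random.choice(candidates)


print(pick_variation(time_variations(16, 45)))  # Prints one of the three phrasings.
```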

Turn-taking

By taking turns, we avoid interrupting each other and keep the conversation in sync. Turn-taking is about who “has the mic”: taking the mic, holding the mic, and handing it over to another speaker. To manage this complex process, we rely on a rich inventory of cues embedded in sentence structure, intonation, eye gaze, and body language. Although your Action will be limited in expressing and detecting these cues, you can still write prompts in a way that helps the user know when to take their turn.

Ask questions

Your persona should give clear signals when it’s the user’s turn.

Persona: Ibento (fake ticket-seller). User input: I want SportsTeam tickets. Spoken prompt: The SportsTeam has a few home games coming up. Which one do you want tickets for? Display prompt: For which game? Visual: Paginated list card showing three games with dates and times. Chips: Chicago, July, Playoffs.

Do.

Make the call to action clear by asking a question.

Persona: Ibento (fake ticket-seller). User input: I want SportsTeam tickets. Spoken prompt: Here are some upcoming SportsTeam games. Display prompt: SportsTeam games. Visual: Paginated list card showing three games with dates and times. Chips: Chicago, July, Playoffs.

Don’t.

When presented with this design, many users will not take their turn.

Don’t monopolize

Your persona should not monopolize the conversation or try to present all options/questions in a single turn.

Persona: Ibento (fake ticket-seller). User input: Find events near me. Spoken prompt: Sure. Do you have a specific type of event in mind?

Do.

Prompt the user with only a single question at a time.

Persona: Ibento (fake ticket-seller). User input: Find events near me. Spoken prompt: Sure. Some upcoming events include a concert by NotARealRapper and a SportsTeam home game. Do either of these sound good? Or do you want to start by picking a type of event? The ballet is popular in your area.

Don’t.

Don’t keep speaking after asking a question. Don’t overwhelm the user with options and questions.
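The two guidelines in this section can be turned into a rough lint for prompt text: flag a prompt with no question, with more than one question, or with speech after the question. This is a hypothetical heuristic, not a Google tool, and splitting on sentence punctuation will misfire on abbreviations such as "e.g."

```python
from __future__ import annotations

import re


def turn_taking_issues(prompt: str) -> list[str]:
    """Flag prompts that don't hand the mic back cleanly (a rough heuristic)."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", prompt.strip()) if s]
    questions = [s for s in sentences if s.endswith("?")]
    issues = []
    if not questions:
        issues.append("no_question")  # The user may not realize it's their turn.
    elif len(questions) > 1:
        issues.append("multiple_questions")
    if questions and not sentences[-1].endswith("?"):
        issues.append("speaks_after_question")
    return issues


print(turn_taking_issues("Sure. Do you have a specific type of event in mind?"))  # []
print(turn_taking_issues("Here are some upcoming SportsTeam games."))  # ['no_question']
```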

Additional resources

  • Give your VUI a personality
  • Move the conversation forward
  • Be brief, be relevant
  • Leverage context
  • Direct the user’s focus through word order and stress
  • Don’t teach “commands”—speaking is intuitive

Method Podcast, Episode 8, Margaret Urban, Senior Interaction Designer @ Google, on the science of speech and designing voice user interfaces for the Google Assistant

A few highlights:
On the magic of language (3:13):
“I've been fascinated by language all my life. I think it's magical. It's like telepathy—that simply by the vibration of a sound wave, I can put an idea in your head.”
On making computers sound human (2:37):
“How language is used in the social context is really important in creating something that sounds natural. Because we want the computers to talk like people. We don't want to force the people to talk like computers.”

“The Incredible Thing We Do During Conversations,” by Ed Yong. The Atlantic, January 4, 2016

“When we talk we take turns, where the “right” to speak flips back and forth between partners. This conversational pitter-patter is so familiar and seemingly unremarkable that we rarely remark on it. But consider the timing: On average, each turn lasts for around 2 seconds, and the typical gap between them is just 200 milliseconds—barely enough time to utter a syllable. That figure is nigh-universal. It exists across cultures, with only slight variations. It’s even there in sign-language conversations.”

“When one person asks another a question, it takes an average of 200 milliseconds for them to respond. This is so fast that we can’t even hear the pause. In fact, it’s faster than our brains actually work. It takes the brain about half a second to retrieve the words to say something, which means that in conversation, one person is gearing up to speak before the other is even finished. By listening to the tone, grammar, and content of another’s speech, we can predict when they’ll be done.”

“The importance of pauses in conversation,” by Johnson. The Economist, December 14, 2017

“Conversation, it turns out, is a finely tuned machine, as Nick Enfield, a linguist at the University of Sydney, suggests in “How We Talk”. Humans mostly follow a rule called “no gap, no overlap”, reacting to the end of a conversational turn by beginning their own in about 200 milliseconds—about the time it takes a sprinter to respond to the starting gun. This is all the more remarkable given that it takes about 600 milliseconds for someone to work out what they are going to say by mentally retrieving the words and organising how they are to be expressed.”