The Secret Ingredient for Successful AI? Politeness!

Don't let your AI turn into a Rudebot. Here's how to ensure polite customer interactions.


Politeness remains a cornerstone of positive interactions. Contact centers have long monitored their agents for politeness, understanding its critical role in customer satisfaction and retention. As conversational AI becomes more prevalent, ensuring that these systems maintain a high standard of politeness will continue to be vital for sustained business success in the digital age. 

Understanding Politeness in AI

Politeness is a universal social norm between humans that guides appropriate behavior in various contexts and situations. According to research published in the Artificial Intelligence Review, politeness helps people maintain positive social value and trust in interactions. Moreover, if computers exhibit polite behavior in the correct situation, this response may contribute to their trustworthiness. Even though this concept seems straightforward, it’s multifaceted and culturally dependent. After all, what is considered polite behavior can vary across cultures and situations, adding layers of complexity to defining and measuring politeness in AI. For example, a phrase that is polite in one culture might be perceived as overly formal — or even rude — in another culture.

For businesses using AI in customer interactions, this variability means that a one-size-fits-all approach to politeness may not be effective. AI systems must be designed to recognize and adapt to these nuances. Additionally, language and social norms change over time, requiring AI systems to adapt to current standards. Thus, creating AI that can navigate these subtleties is crucial for building consumer trust and loyalty. 

The Need for Frequent, Automated Testing

The non-deterministic nature of generative AI means that the same prompt can yield different responses. This variability creates dynamic, engaging conversations, but it also introduces unpredictability at best and instability at worst. Unlike traditional intent-based chatbots, whose responses are pre-written and approved, a generative AI system composes responses in real time, so they cannot be pre-approved by a branding team.

In this situation, the process of generating a response from a Large Language Model (LLM) generally involves a system prompt, which guides the tone and behavior, and the user prompt, which is a (potentially sanitized and modified) version of the user’s input. Even small, seemingly insignificant changes in the system prompt can unintentionally affect other aspects of the AI’s behavior, such as politeness. This sensitivity is why frequent, automated testing is crucial. As a developer iterates on a prompt to improve accuracy, regularly repeating these tests ensures that any changes to the system prompt do not negatively impact the politeness of AI responses.
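A politeness regression check of this kind can be sketched in a few lines. This is a toy illustration, not a production harness: `generate_response` stands in for a real LLM call, and the keyword-based scorer is a deliberately simple placeholder for a proper classifier. All names here are hypothetical.

```python
# Toy sketch of an automated politeness regression check, run after
# every system-prompt change. Because generative output varies between
# calls, each test prompt is sampled multiple times.

SYSTEM_PROMPT = "You are a courteous customer-support assistant."

def generate_response(system_prompt: str, user_prompt: str) -> str:
    # Placeholder for a real (non-deterministic) LLM call.
    return "Thanks for reaching out! I'd be happy to help with your order."

IMPOLITE_MARKERS = {"shut up", "whatever", "not my problem"}

def is_polite(response: str) -> bool:
    """Crude stand-in for a real politeness classifier."""
    lowered = response.lower()
    return not any(marker in lowered for marker in IMPOLITE_MARKERS)

def test_politeness_regression() -> None:
    test_prompts = ["Where is my package?", "This product is broken!"]
    for prompt in test_prompts:
        for _ in range(3):  # sample several completions per prompt
            response = generate_response(SYSTEM_PROMPT, prompt)
            assert is_polite(response), f"Impolite reply to: {prompt!r}"

test_politeness_regression()
```

In practice the assertion would feed each sampled response to one of the classifiers discussed below, rather than a keyword list.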

So where does automated testing begin? It starts with the ability to define and measure the target attribute. Through my company’s research, we developed a framework for evaluating conversational AI for politeness, utilizing various classifiers. These include embedding classifiers, few-shot prompt classifiers, and fine-tuned language models. Our research has shown that initially, deploying high-parameter large language models (LLMs) with few-shot prompts can be the most effective at evaluating politeness, especially when domain-specific data is scarce. (Note that this approach can become costly at scale.)
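The few-shot prompt classifier mentioned above amounts to assembling a handful of labeled examples into a prompt and asking a high-parameter LLM to label the candidate reply. The sketch below shows only the prompt-construction step; the example texts and template are illustrative assumptions, and the actual LLM call is omitted.

```python
# Toy sketch of building a few-shot prompt for an LLM-based
# politeness judge. Examples and wording are illustrative only.

FEW_SHOT_EXAMPLES = [
    ("I'm sorry for the trouble, let me fix that right away.", "polite"),
    ("That's not my problem. Read the manual.", "impolite"),
    ("Thank you for waiting; here is your tracking number.", "polite"),
]

def build_politeness_prompt(candidate: str) -> str:
    """Assemble a few-shot classification prompt to send to an LLM."""
    lines = ["Label each reply as 'polite' or 'impolite'.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Reply: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Reply: {candidate}")
    lines.append("Label:")
    return "\n".join(lines)

prompt = build_politeness_prompt("We appreciate your patience!")
```

The completed prompt would then be sent to the LLM, whose single-token continuation ("polite" or "impolite") serves as the classification.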

Leveraging Data and User Feedback

As AI systems interact with more customers, accumulating interaction data allows for fine-tuning smaller models, or even developing traditional classifier models. This iterative process balances cost and performance, ensuring that the AI maintains a polite demeanor while becoming more efficient.
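Turning accumulated interactions into fine-tuning data can be as simple as pairing each logged reply with its user-feedback signal. The sketch below assumes a thumbs-up/down feedback field and hypothetical record names; it is a minimal illustration of the labeling step, not a full pipeline.

```python
# Toy sketch: convert logged interactions plus explicit user feedback
# into labeled examples for fine-tuning a smaller politeness model.
# Field names and the thumbs-up/down signal are assumptions.

logged_interactions = [
    {"reply": "Happy to help, let me check your order.", "feedback": "up"},
    {"reply": "I already told you the answer.", "feedback": "down"},
    {"reply": "Thanks for your patience!", "feedback": None},  # unlabeled
]

def to_training_examples(interactions):
    """Keep only replies with explicit feedback and map it to a label."""
    label_map = {"up": "polite", "down": "impolite"}
    return [
        (item["reply"], label_map[item["feedback"]])
        for item in interactions
        if item["feedback"] in label_map
    ]

examples = to_training_examples(logged_interactions)
```

Unlabeled interactions are dropped here; in a real pipeline they might instead be labeled by the few-shot LLM judge before being used for training.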

User feedback is crucial in this refinement process. Gathering insights from real interactions helps validate the AI’s performance and informs further adjustments. For example, a customer service chatbot using a high-parameter LLM may start with broadly polite interactions. Over time, as it collects data, the system can be fine-tuned to better reflect specific politeness norms valued by its user base — leading to improved customer satisfaction.

Balancing Politeness and Practicality

While politeness is generally seen as positive, it is full of nuance. Overly polite AI responses can be perceived as insincere or dismissive, particularly in sensitive contexts like health support. An excessively polite demeanor can also come at the cost of specificity, personalization, or urgency, qualities that matter for effective assistance. Striking the right balance is essential and often requires context-specific adjustments: politeness must be weighed against clarity so that the AI provides appropriate and effective help.

Recent incidents highlight the importance of monitoring AI for politeness. For instance, the BBC reported a case in which a chatbot used by a parcel delivery firm unexpectedly swore at customers due to an update error. Similarly, an AI chatbot trained on billions of conversations between young couples was removed from a popular social media platform after making discriminatory remarks about minorities. These examples underscore the reputational risks that unmonitored AI interactions pose to brands — risks that rigorous testing and evaluation can minimize.

Why Politeness Matters for AI Success

Evaluating conversational AI for politeness is not just about maintaining good manners; it’s about building trust, enhancing user experience, and driving customer loyalty. By leveraging automated testing, user feedback, and continuous refinement, businesses can better ensure their AI interactions are consistently polite and effective. As we advance these methods, our goal is to create AI systems that mirror the best of human interactions, delivering courteous, engaging customer service at scale while mitigating misunderstandings, miscommunications, and inappropriate deference to users’ inputs.

Michelle Avery

With 17 years of experience across engineering, AI, FinTech, and Robotics, Michelle Avery is an innovative technology executive, currently the Group Vice President of AI at WillowTree, a Telus International Company. At WillowTree, a leading digital product agency, she spearheads initiatives in generative AI, pushing the boundaries of AI and data-intensive applications for Fortune 100 clients. Michelle’s journey in tech began without a formal degree in computer science, proving that dedication, continuous learning, and a passion for innovation can drive success in the tech industry.

