feedback systems to drive AI model improvement

UX Designer & Researcher | Honeywell | Atlanta, GA

Nov 2024 - Jan 2025

How do we get users to provide feedback on the data that AI models generate?

background

In 2024, my team at Honeywell developed an AI-powered chatbot to help maintenance engineers and technicians quickly find solutions to problems without having to sift through extensive documentation. The chatbot uses generative AI technology to provide responses based on the available documentation, allowing users to simply type in their issues and receive relevant solutions.

However, the team was struggling to get meaningful feedback from users to improve the AI model's accuracy, as users were not accustomed to providing feedback the way they do with other AI tools. The challenge was to design an effective feedback mechanism that would encourage users to provide input on the chatbot's responses, without frustrating or annoying them, so the AI model could be continuously improved.

This case study highlights my approach to designing more effective feedback loops within AI products to gather user feedback on the responses generated by AI models.

problem statement

After the product was deployed at customer sites, the team returned with a new challenge: users were not providing feedback on the GenAI model's responses.

The engineering team had initially mandated feedback, which led to user complaints and high churn rates, as users found it time-consuming and frustrating, especially when dealing with urgent issues. The team was struggling to find the right balance between getting enough user feedback to improve the model and not annoying the users.

should feedback be mandated?

Feedback loops are essential in AI systems to learn from the user and to drive continuous improvement to the models. Essentially, if the user believes the answer generated by the model is wrong, they can provide this feedback to help the system do better next time. This in turn helps improve response quality, build user trust, and enhance the overall experience.
In this case, without this crucial input, the system’s ability to refine its outputs and adapt to evolving troubleshooting scenarios was at risk.

If feedback is mandated, users cannot ask their next question without first 'rating' the previous response from the chat agent.

To add to this, if a user rated a response as good, they did not have to provide detailed feedback. However, if a user rated it as bad, they were required to provide detailed feedback.

This goes against the principles of a good user experience.

Users should have the flexibility to navigate and interact with the system at their own pace. Blocking further action until feedback is provided takes control away from the user and limits their autonomy.

Interruptions that break the conversational flow reduce efficiency. Given the nature of the product, users need to focus on problem-solving rather than tasks like providing ratings which do not benefit them right away.

Forcing users to stop and rate each response adds unnecessary steps and disrupts their workflow, increasing frustration.

Imagine if ChatGPT (or Gemini or Copilot) required you to rate the generated responses after every single question you asked. Would you continue using the tool? How would you go about giving feedback?

Suddenly, but not surprisingly, we had frustrated users and were gathering feedback that was unhelpful for training the model. Along with creating a better experience for users and ensuring we gathered useful training data, I had the added challenge of demonstrating through research that mandating feedback after each query was not the best approach.

how feedback models work with ai
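
At a high level, every rating a user gives becomes a labeled data point tied to a specific prompt and response; those data points are aggregated and used to retrain or refine the model so that future answers improve. As a minimal sketch of what a single feedback record might look like, assuming a simple client-side logging setup (the FeedbackEvent shape and the /api/feedback endpoint are illustrative, not the product's actual schema):

```typescript
// Illustrative only: a minimal shape for one feedback event.
// Field names are assumptions, not the product's actual schema.
interface FeedbackEvent {
  conversationId: string;
  responseId: string; // which AI response is being rated
  rating: "thumbs_up" | "thumbs_down";
  comment?: string; // optional detail, e.g. why the answer was wrong
  timestamp: string; // ISO 8601
}

// Send the event to a hypothetical collection endpoint. Downstream,
// events like this would be aggregated into the dataset used to
// retrain or re-rank the model.
async function recordFeedback(event: FeedbackEvent): Promise<void> {
  await fetch("/api/feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event),
  });
}
```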

proposed concepts

After reviewing examples of feedback systems that minimize user disruption across various products, I explored solutions to balance the need for collecting meaningful feedback with maintaining a seamless user experience. Concepts 2 and 3 were design explorations, while Concept 1 was created specifically for testing purposes, to understand users' perceptions of the current mandatory feedback experience.

Below are the three concepts that were evaluated:

~ concept 1: mandatory feedback with in-line interactions

What it is:
Users are required to provide feedback after each response. They cannot proceed with their next question until they have rated or responded to the feedback prompt.

How it works:
A rating scale or simple thumbs up/thumbs down button is embedded in the chat below each AI response. Users must interact with it before continuing the conversation.

~ concept 2: optional feedback with in-line interactions

What it is: Users can provide feedback on responses via an embedded feedback option within the chat interface, but it is entirely optional. The idea was that users would provide feedback if they found issues with responses.

How it works: A subtle thumbs up/thumbs down or rating option appears below each AI response. Users can choose to provide feedback without any interruption to their workflow.

~ concept 3: optional feedback with popup-style modal

What it is: Feedback is optional but presented in a separate popup modal, which appears at specific milestones (e.g., at the end of a conversation or after resolving a task) in addition to the option of providing feedback on each response.

How it works: Instead of embedding feedback prompts in the chat interface, users are presented with a small popup at natural stopping points, asking for feedback on their overall experience.
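
To make the behavioral difference between the three concepts concrete, here is a minimal sketch that models each one as a feedback "policy"; the names and logic are assumptions made for illustration, not production code:

```typescript
// Each concept expressed as a feedback policy.
type FeedbackPolicy =
  | { kind: "mandatory_inline" }                     // Concept 1
  | { kind: "optional_inline" }                      // Concept 2
  | { kind: "optional_milestone"; everyN: number };  // Concept 3

interface ChatState {
  lastResponseRated: boolean;
  responsesSinceLastPrompt: number;
}

// Only Concept 1 blocks the next question until the previous
// response has been rated; Concepts 2 and 3 never block input.
function canAskNextQuestion(policy: FeedbackPolicy, state: ChatState): boolean {
  return policy.kind !== "mandatory_inline" || state.lastResponseRated;
}

// Concept 3 additionally surfaces a popup at natural milestones,
// e.g. after every N responses or at the end of a task.
function shouldShowMilestonePrompt(policy: FeedbackPolicy, state: ChatState): boolean {
  return (
    policy.kind === "optional_milestone" &&
    state.responsesSinceLastPrompt >= policy.everyN
  );
}
```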

These concepts were designed to address user pain points while meeting the need for actionable feedback. Based on initial testing and user interviews, Concept 2 emerged as the most user-friendly approach, offering a balance of user autonomy and efficient data collection. Next steps involve prototyping and usability testing to validate this concept and refine its implementation.

user validation & key insights

Along with another UX researcher on my team, I conducted concept testing sessions to evaluate feedback optimization in Maintenance Assist.
During this study we also collected information on feedback-gathering models to understand how users perceive giving feedback to AI chat agents on system-generated responses.

My role involved creating the testing script, co-facilitating the sessions, analyzing and synthesizing the data, and writing a read-out report to share with the product team.

Methodology

7 moderated concept testing sessions evaluating 3 concepts

Participants

7 total: 5 Maintenance Engineers and 2 Maintenance Analysts

~ insights on the feedback model

concept comparison
Participants were asked to choose among the three concepts, and the majority preferred Concept 2. Users want a smooth, uninterrupted experience that allows them to continue interacting with the chat agent and quickly access the information they need.

"But that would be my kind of critique here is if I'm on the right track and I want to get to a better answer quickly that locking me out kind of stops my train of thought.”
-Participant 7

ongoing chat and follow-ups
All participants indicated that they would need to send follow-up queries to get more details or clarifications, and they expressed frustration when required to give feedback before they could do so. Users want to be able to ask follow-up questions without having to prioritize giving feedback.

"I am less likely to give feedback if I need to ask a follow up question because I'm wanting to move on and get a follow up question in as fast as I can.”
- Participant 6

frustration with mandated feedback
Participants said they would be using this tool to address urgent maintenance issues, and having to prioritize giving feedback in such cases would hinder their ability to resolve critical issues quickly. Users want to focus solely on solving their problems efficiently, without the added task of mandatory feedback in high-stakes situations.

"If I was in a hurry, say this compressor was costing the company a million dollars a day for lost production - I wouldn't want to go through all this just to give feedback. I may have another inquiry so that the requirement for feedback is something that I would not want or expect.”
- Participant 2

motivation to provide feedback
Participants stated that they would be more likely to give feedback if prompted by a supervisor or team lead. While they might not be motivated to provide feedback on their own, they feel more compelled to do so when it is part of a formal process, as it aligns with their professional responsibilities.

"Say this was being administered by somebody, an engineer, somewhere up in my up in my chain of command and they were asking for it, then I would definitely do it."
-Participant 2

feedback on inaccurate responses
Participants reported that they are more likely to provide feedback if they received inaccurate responses from the chat agent. Users feel the need to correct the agent's responses to prevent future inaccuracies.
They want to feel confident that their feedback will improve the model's accuracy and ensure better responses.

"If I'm going to interact with that program in the future, it will affect all the answers in the future.If I don't interact, it will keep giving me incorrect answers."
-Participant 6

expectations with feedback model
Participants expect the feedback they provide to be reviewed and to contribute to improving the agent's responses. Users are more likely to engage with the feedback process if they know their input is actively helping the system learn and improve. Tangible evidence that their feedback is being used would motivate them to continue providing it.

"Assuming that I know that my response is helping either the AI or helping somebody actually make this system learn and learn better then that, that's a driver for me to want to give it the correct feedback."
- Participant 7

~ views on ai

concerns with accuracy and references
Participants expect to receive trustworthy and reliable information from the agent and want to verify the data it provides.

They expressed concerns that outdated manuals could be referenced, leading to errors in critical decision-making. Citations and links would help them quickly verify the accuracy of the data.

"Some engineers could be aware that older revision (of the document) is right. New revision of the document is not right, but the model would populate the newer revision of the information.”
- Participant 5

caution with critical assets
Participants discussed how they work with expensive and critical assets which makes them highly cautious about taking actions based on the model's responses.

They want to be confident that the responses are reliable and that they can minimize risks, particularly when inexperienced users might overlook mistakes or misinterpret suggestions.

"It's very important that our machines keep running and then if the AI gives the wrong recommendation or wrong suggestion and then we try it and it's actually damaged the machine, that will be very costly."
- Participant 4

familiarity with generative ai tools
Most participants have experience using consumer-facing generative AI tools, such as ChatGPT, Gemini, and Copilot.

Because they are familiar with these platforms, they expect similar interaction patterns, which can make their transition smoother and the experience more intuitive.

ux recommendations

1. Given the time-critical nature of our target users' work and keeping their goals in mind, do not mandate feedback.

2. Collect feedback at strategic, low-disruption points rather than after every response.

3. Encourage supervisors and team leads to promote feedback, highlighting its importance in enhancing the overall experience.

4. Update messaging to communicate that the feedback users provide helps improve the responses.

5. Train the model more frequently on the collected feedback.

6. Add an optional feedback loop at timely intervals.

final prototype

The final prototype incorporated all of these changes and added an optional feedback loop after every 'X' questions* to check that conversations were going as expected.
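
As a rough sketch of that cadence, assuming a count-based trigger ('X' was left configurable, so it appears here as a parameter rather than a hard-coded value):

```typescript
// Decide whether to offer the optional check-in prompt.
// The prompt is dismissible and never blocks the next question.
function shouldOfferCheckIn(questionsAsked: number, x: number): boolean {
  return x > 0 && questionsAsked > 0 && questionsAsked % x === 0;
}

// Example: with x = 5, the prompt would appear after the user's
// 5th, 10th, 15th... question.
```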

reflection

The key challenge was finding the right balance between getting necessary user feedback and not frustrating or annoying the users, who were dealing with urgent issues.

The design process, grounded in user research and concept testing, was crucial in identifying the most effective approach, one that aligned with user expectations and behaviors. By focusing on optional feedback and clear communication, the team was able to strike that balance and improve the model's accuracy without compromising the user experience.