Company
Einplus
Role
UX design | Product management
Time
Mar 2019 - Aug 2019
Overview
This project explored a new market with a stock trading app targeting individual investors. Through user research, we identified user pain points and product opportunities, and tailored the product design and data services for young investors. The app provides concise information presentation and interpretation to help users make better investment decisions.
Background
How does the chatbot system work?
In the era before ChatGPT or any large language model (LLM), the language model was trained on a corpus built from clients' frequently asked questions (FAQs) and customer conversation logs. We helped clients build their own FAQ knowledge bases. When a user asks a question, it goes into the Q&A intent to find the best-matching corpus entry.
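A minimal sketch of this matching step, assuming a simple TF-IDF similarity search in Python (the data, threshold, and function names here are illustrative assumptions, not the production implementation):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical FAQ knowledge base: canonical questions mapped to answers.
faq = {
    "How do I reset my password?": "Go to Settings > Account > Reset password.",
    "What are your business hours?": "We are open 9am-6pm on weekdays.",
}
questions = list(faq)
vectorizer = TfidfVectorizer().fit(questions)
question_vectors = vectorizer.transform(questions)

def answer(user_query, threshold=0.3):
    """Route a user query to the best-matching FAQ entry."""
    scores = cosine_similarity(vectorizer.transform([user_query]),
                               question_vectors)[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return None  # no matching corpus: hand off to a human agent
    return faq[questions[best]]

print(answer("how to reset my password"))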
Why do we need an annotation system?
Regularly annotating the AI's conversational text helps maintain the quality of the chatbot service and evaluate its performance: how many questions the AI can cover, and what percentage it answers correctly.
Continuous correction and updating also support fine-tuning. When the AI answers incorrectly, annotators can teach it the right answer by pointing to the matching corpus entry. If the AI cannot answer a question because no matching entry exists, annotators can fill the gap in time.
Design challenge
1. Train the model with validated corpus data
The requirements for the training data (see the sketch below):
   -  avoid conflicting and confusing corpus entries in model training;
   -  reduce overly similar questions.
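A minimal sketch of how overly similar questions could be flagged before training, assuming a plain string-similarity check (the threshold and names are illustrative):

from difflib import SequenceMatcher

def flag_similar_pairs(questions, threshold=0.9):
    """Flag near-duplicate training questions: overly similar variants
    add noise instead of diversity during model training."""
    flagged = []
    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            ratio = SequenceMatcher(None, questions[i], questions[j]).ratio()
            if ratio >= threshold:
                flagged.append((questions[i], questions[j], round(ratio, 2)))
    return flagged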
2. Define the key indicators to measure AI performance
Accuracy, recall, and precision are metrics commonly used in machine learning. However, as a SaaS provider, we need to make clear how the AI performs in detail: what can be improved on the AI model side, and where the customer service team should put more effort.
3. Streamline the product workflow for efficiency
Annotation involves a lot of repetitive work, so the workflow should be designed concisely, enabling annotators to handle a high volume of tasks in a short time. What's more, the system design should be ethical, keeping annotators working with a sense of accomplishment and respecting their work.
Design Process
1. Requirements of corpus knowledge base
In the AI Q&A system, the quality of both the model and the corpus determines the performance of its answers. The more the corpus knowledge base covers customers' frequent questions, the more questions the AI can answer. Meanwhile, each corpus entry contains multiple questions that might be asked in different ways. The more diverse these questions are, the more accurately the model can predict.
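One way to represent such an entry (a hypothetical sketch; the real schema may differ):

from dataclasses import dataclass, field

@dataclass
class CorpusEntry:
    """One entry in the FAQ knowledge base: a canonical question, its
    answer, and the diverse phrasings users might actually type."""
    canonical_question: str
    answer: str
    question_variants: list[str] = field(default_factory=list)

entry = CorpusEntry(
    canonical_question="How do I reset my password?",
    answer="Go to Settings > Account > Reset password.",
    question_variants=[
        "I forgot my password",
        "password reset not working",
        "how to change login password",
    ],
)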
2. Stakeholders in the system
Normally, our clients' customer service team manages the chatbot and is responsible for gathering FAQ corpus and annotation work. In the annotation system, the Admin is in charge of annotation task management and supervising annotation quality. The Annotators are assigned to complete these tasks.
3. Annotation system modules
The annotation flow contains 3 main stages:
Create & Assign task: The admin creates an annotation task, then the system selects samples from the conversation log and assigns them to annotators.
Annotation: Each annotator takes their own tasks. The admin reviews them to make sure the data is annotated correctly.
Analyze & Fine-tune: The system calculates the indicators and fine-tunes the model based on the annotation results.
4. Annotation flow
The annotation flow first examines the intent classification; if it is a Q&A intent, it then examines whether the question was matched to the right entry in the corpus base.
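A sketch of that two-step decision, with hypothetical field and label names:

def annotate(record):
    """Two-step annotation flow (sketch): first check the predicted
    intent, then, for Q&A intents, check whether the matched corpus
    entry is the right one."""
    if record["predicted_intent"] != record["true_intent"]:
        return "wrong_intent"
    if record["true_intent"] != "qa":
        return "correct_non_qa"
    if record["correct_entry"] is None:
        return "missing_corpus"  # the right entry does not exist yet
    if record["matched_entry"] == record["correct_entry"]:
        return "correct_match"
    return "wrong_match"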
5. The key indicators of Q&A System
To evaluate the performance of the Q&A system, we defined indicators to analyze these aspects:
 -  how accurately the AI model performs;
 -  how much it can be improved by supplementing the corpus base.

The AI applies a classification model to predict users' intents. Accuracy, precision, and recall help evaluate the quality of classification models in machine learning.

Intent Accuracy
shows how often a classification ML model is correct overall.

Intent Recall
shows whether an ML model can find all objects of the target class.

Question Accuracy shows how often the model matches the correct corpus entry. To exclude the effect of missing corpus, questions with missing corpus are subtracted from the calculation*.

(* Missing corpus: questions that do not exist in the corpus knowledge base.)
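A sketch of how these indicators could be computed from annotation labels, reusing the hypothetical label names from the flow sketch above (intent recall would additionally need per-intent counts, omitted here for brevity):

def compute_indicators(labels):
    """Compute Q&A indicators from a non-empty list of annotation labels
    produced by the annotate() sketch above."""
    total = len(labels)
    qa = [l for l in labels
          if l in ("correct_match", "wrong_match", "missing_corpus")]
    # Intent accuracy: how often the intent classifier is correct overall.
    intent_accuracy = sum(l != "wrong_intent" for l in labels) / total
    # Question accuracy: correct matches among Q&A questions, excluding
    # missing-corpus questions the model could never have answered.
    answerable = [l for l in qa if l != "missing_corpus"]
    question_accuracy = (answerable.count("correct_match") / len(answerable)
                         if answerable else 0.0)
    # Missing-corpus rate: how much could be gained by supplementing the base.
    missing_rate = qa.count("missing_corpus") / len(qa) if qa else 0.0
    return intent_accuracy, question_accuracy, missing_rate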
Iteration

After launching the first version, we collected valuable feedback from clients. By analyzing their feedback and behavior data, we made an iteration.

1. Reduce repetitive work

The system selected samples from the conversation log randomly, leading to plenty of similar questions in a task. It's not productive for annotators to deal with these similar questions many times, so I reduced the repetition by clustering questions with high similarity and selecting only a few from each cluster.
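A sketch of this sampling step, assuming TF-IDF vectors and agglomerative clustering (the thresholds and names are illustrative, not the production pipeline):

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer

def sample_diverse_questions(questions, per_cluster=2, distance_threshold=0.6):
    """Cluster near-duplicate questions and keep only a few per cluster,
    so annotators don't label the same phrasing many times."""
    vectors = TfidfVectorizer().fit_transform(questions).toarray()
    clustering = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        metric="cosine",
        linkage="average",
    ).fit(vectors)
    sampled, seen = [], {}
    for question, label in zip(questions, clustering.labels_):
        if seen.get(label, 0) < per_cluster:
            sampled.append(question)
            seen[label] = seen.get(label, 0) + 1
    return sampled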


2. Simplify annotation flow

If the questions are not randomly distributed, the evaluation cannot be accurate. So I separated annotation into two types of tasks: improvement and evaluation.


3. Add missing corpus with efficiency

Cluster the similar questions and add them at once through an Excel sheet. This saves plenty of time compared with editing while annotating, and it also reduces repetitive work.
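A sketch of such a bulk import, assuming a hypothetical sheet layout with canonical_question, answer, and ';'-separated variants columns:

import pandas as pd

def import_missing_corpus(xlsx_path):
    """Bulk-import missing corpus entries collected in an Excel sheet.
    The column names here are assumptions for illustration."""
    df = pd.read_excel(xlsx_path)
    entries = []
    for _, row in df.iterrows():
        entries.append({
            "canonical_question": row["canonical_question"],
            "answer": row["answer"],
            "question_variants": [
                v.strip() for v in str(row["variants"]).split(";") if v.strip()
            ],
        })
    return entries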


4. Refer to context of the conversation

Sometimes annotators need to know the context of a question, so we provide access to the whole conversation of that session.

The final design work
Result

The annotation system helped our clients successfully improve AI performance and raised the productivity of annotation.