Where do LLMs Encode the Knowledge to Assess the Ambiguity?
Hancheol Park, Ph.D.
AI Research Engineer, Nota AI
Geonmin Kim, Ph.D.
AI Research Engineer, Nota AI
Summary
In this study, we present a method for detecting ambiguous samples in natural language understanding (NLU) tasks using large language models (LLMs).
As a novel approach, we propose a classifier that uses as input the representation from an intermediate layer of the LLM.
Our paper on this study has been accepted to the 31st International Conference on Computational Linguistics (COLING 2025).
Key Messages of the Paper
Recently, large language models (LLMs) have shown remarkable performance across various natural language processing tasks, thanks to their vast amount of knowledge. Nevertheless, they often generate unreliable responses. A common example is providing a single biased answer to an ambiguous question that could have multiple correct answers. To address this issue, in this study, we discuss methods to detect such ambiguous samples. Specifically, we focus on ambiguous samples in natural language understanding (NLU) tasks. Typical examples of NLU tasks include recognizing textual entailment and analyzing emotions in text, where samples are generally accompanied by predefined labels as the correct answers.
More specifically, we propose a classifier that uses a representation from an intermediate layer of the LLM as input. This approach is based on our observation that representations of ambiguous samples in intermediate layers are closer to those of relevant labeled samples in the embedding space, whereas this is not necessarily the case in higher layers. The experimental results demonstrate that using representations from intermediate layers detects ambiguous input prompts more effectively than using representations from the final layer. Furthermore, in this study, we propose a method to train such classifiers without ambiguity labels, as most datasets lack labels regarding the ambiguity of samples, and evaluate its effectiveness.
Significance/Importance of the Paper
LLMs often generate unreliable responses to users' inputs. Because of these reliability issues, service providers may be unable to deploy LLMs that required significant investment to develop, which is a serious setback. Given the recent growth in the market for LLM-based applications, this problem needs to be addressed.
The best-known cause of unreliable responses is hallucination, in which LLMs present incorrect information with a tone of high confidence. This issue has been addressed extensively by numerous researchers. Another cause is the tendency of LLMs to provide a single biased answer to an ambiguous question that could have multiple correct answers, as shown in Table 1.
Ideally, LLMs should indicate when such questions are ambiguous and encourage alternatives, such as using multi-label classification models or consulting experts, to help users make unbiased decisions. Since NLU tasks are well known to contain many ambiguous samples, it is crucial to determine whether a given sample is ambiguous. Nevertheless, research on detecting ambiguous input prompts remains relatively underexplored, and only a few, largely impractical, methods have been proposed.
Summary of Methodology
In this study, we verify whether the representations (i.e., hidden states of the last input tokens) from intermediate layers contain knowledge that can be used to judge the ambiguity of input prompts. To do this, we first automatically construct annotated datasets indicating whether each input prompt is ambiguous or not across various NLU tasks. Then, we train classifiers that use representations of input prompts from instruction-following LLMs as inputs.
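As a concrete illustration, the sketch below shows one way to extract such a representation with the Hugging Face transformers library. The checkpoint name, the example prompt, and the layer index are placeholders rather than the exact setup used in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the paper uses instruction-following LLMs such as
# OPT-IML and LLaMA 2, but any causal LM with hidden-state outputs works here.
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

@torch.no_grad()
def last_token_representation(prompt: str, layer: int) -> torch.Tensor:
    """Return the hidden state of the last input token at the given layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model(**inputs)
    # hidden_states[0] is the embedding layer; hidden_states[layer] is the
    # output of the `layer`-th transformer block.
    hidden = outputs.hidden_states[layer]   # shape: (1, seq_len, hidden_size)
    return hidden[0, -1]                    # shape: (hidden_size,)

# Illustrative prompt; actual templates differ (see Table 2).
rep = last_token_representation(
    "Premise: A man is playing a guitar on stage. "
    "Hypothesis: A person is performing music. "
    "Answer with entailment, neutral, or contradiction.",
    layer=16,
)
```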
Datasets for Detecting Ambiguity
We first create datasets where each sample is annotated to indicate whether it is ambiguous or not. To automatically construct these datasets, we use existing datasets that are designed for multi-label classification or that contain multiple annotations per sample. Specifically, we use three datasets covering sentiment analysis and natural language inference (NLI). For sentiment analysis, we employ the GoEmotions dataset, a multi-label emotion and sentiment analysis dataset. For NLI, we use the SNLI and MNLI development and test sets, which contain multiple annotations (5 or 100) per sample.
For multi-label datasets, samples annotated with multiple labels are considered ambiguous. For the NLI datasets, if all five annotators provide the same label, the sample is considered non-ambiguous. If two labels receive at least two votes each (e.g., 3/2/0 or 2/2/1), the sample is considered ambiguous. Additionally, a subset of samples from SNLI and MNLI is annotated by 100 annotators in the ChaosNLI dataset. We use this information to annotate each sample: samples where the majority label receives more than 80 votes out of 100 are considered non-ambiguous, while samples where the majority label receives fewer than 60 votes are considered ambiguous.
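A minimal sketch of these annotation rules, assuming the per-sample label votes have already been collected (the function names and the handling of cases not covered by the rules are our own):

```python
from collections import Counter

def annotate_multilabel(labels: set[str]) -> str:
    """A GoEmotions-style sample is ambiguous if it carries more than one label."""
    return "ambiguous" if len(labels) > 1 else "non-ambiguous"

def annotate_nli_5(votes: list[str]) -> str | None:
    """Annotate an SNLI/MNLI sample with five annotator votes."""
    counts = Counter(votes).most_common()
    if counts[0][1] == 5:                      # unanimous agreement
        return "non-ambiguous"
    if len(counts) > 1 and counts[1][1] >= 2:  # two labels with >= 2 votes (e.g., 3/2/0, 2/2/1)
        return "ambiguous"
    return None                                # cases not covered by the rules above

def annotate_chaosnli_100(votes: list[str]) -> str | None:
    """Annotate a ChaosNLI sample (100 annotations) by the size of the majority."""
    majority = Counter(votes).most_common(1)[0][1]
    if majority > 80:
        return "non-ambiguous"
    if majority < 60:
        return "ambiguous"
    return None                                # samples in between are not used here
```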
Finally, the texts in the entire dataset are converted into the input prompt format for LLMs. To simulate scenarios where actual users employ instruction-following LLMs, the prompts used during training are constructed differently from those used during the evaluation stage. Examples of the prompt templates we used are illustrated in Table 2.
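As a rough illustration (not necessarily the exact template shown in Table 2), an NLI sample might be converted into a prompt as follows:

```python
# Hypothetical prompt template for an NLI sample; the actual templates are in Table 2.
NLI_TEMPLATE = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Question: Does the premise entail the hypothesis? "
    "Answer with entailment, neutral, or contradiction."
)

prompt = NLI_TEMPLATE.format(
    premise="A man is playing a guitar on stage.",
    hypothesis="A person is performing music.",
)
```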
Classifier for Detecting Ambiguous Samples
We train a classifier that uses a representation from a layer of an LLM as input to determine whether input samples are ambiguous or not. This representation corresponds to the hidden state of the last token in the input prompt. If our hypothesis holds true, representations from intermediate layers should effectively distinguish between ambiguous and unambiguous samples, leading to high classification accuracy. In this work, we employ a three-layer multi-layer perceptron (MLP) as the classifier, with ReLU activation functions applied to each layer.
Experimental Results
As instruction-following LLMs, we use instruction-tuned OPT-IML-1.3B, LLaMA 2-7B, and LLaMA 2-13B. These models have 24, 32, and 40 layers and 2,048, 4,096, and 5,120 hidden units, respectively. We use three-layer MLP classifiers to detect ambiguous samples, with hidden-unit configurations of 2,048-512-128-2 for OPT-IML-1.3B, 4,096-1,024-256-2 for LLaMA 2-7B, and 5,120-1,024-256-2 for LLaMA 2-13B.
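For instance, a three-layer MLP matching the LLaMA 2-7B configuration (4,096-1,024-256-2) could be defined as below. This is a sketch based on the dimensions reported above, not the authors' released code, and it assumes ReLU is applied to the two hidden layers.

```python
import torch.nn as nn

class AmbiguityClassifier(nn.Module):
    """Three-layer MLP over a single LLM hidden state (illustrative sketch)."""

    def __init__(self, hidden_size: int = 4096):  # 4,096 for LLaMA 2-7B
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, 1024),
            nn.ReLU(),
            nn.Linear(1024, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # ambiguous vs. non-ambiguous
        )

    def forward(self, representation):
        # representation: (batch_size, hidden_size) hidden state of the last input token
        return self.net(representation)
```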
As shown in Table 4, using representations from the intermediate layers is more effective for determining the ambiguity of samples than using representations from the final layers. We also confirmed that detecting ambiguous samples is more challenging in more subjective tasks, such as sentiment analysis, than in NLI tasks. Furthermore, since the optimal intermediate layer varies across tasks and models, identifying such a layer for each task appears to be a challenge for future work.
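One straightforward, if costly, way to choose a layer empirically is to train and evaluate a classifier for every layer and keep the one with the best validation accuracy; the callable below is a hypothetical stand-in for that training-and-evaluation step.

```python
from typing import Callable

def select_best_layer(
    num_layers: int,
    train_and_eval: Callable[[int], float],  # maps a layer index to validation accuracy
) -> tuple[int, float]:
    """Brute-force layer selection: evaluate every layer and keep the best one."""
    best_layer, best_acc = 1, float("-inf")
    for layer in range(1, num_layers + 1):
        acc = train_and_eval(layer)
        if acc > best_acc:
            best_layer, best_acc = layer, acc
    return best_layer, best_acc
```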
In this study, we also propose a method to train such classifiers without ambiguity labels, as most datasets lack labels indicating the ambiguity of samples. To achieve this, we introduce a loss function based on well-known training dynamics, where \(x\) is the input prompt, \(p_{gt}\) is the predicted probability for the ground-truth label of the original task (e.g., for an NLI task, the probability for one of the labels: entailment, neutral, or contradiction), and \(p_{amb}\) is the probability that a given sample is ambiguous.
Both \(p_{gt}\) and \(p_{amb}\) are obtained by applying a softmax layer to the output logits of a classifier that takes the LLM representation as input. To this end, the number of output neurons in the classifier's final layer is set to the number of labels for each task plus one (the extra label indicating that a given sample is ambiguous).
It is known that deep learning models start by learning easy samples in the early stages of training and progress to harder samples later on. Therefore, we assume that if the \(p_{gt}\) value is low in the early stages of training, the sample is ambiguous and difficult to judge with a specific label. The hyperparameter \(\lambda\) is tuned using a small set of labeled validation samples that indicate whether a sample is ambiguous or not.
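The exact equation is given in the paper; as a purely illustrative sketch of the idea (our own formulation, not the authors'), a loss of the following shape fits the original task while encouraging a higher \(p_{amb}\) for samples whose \(p_{gt}\) remains low.

```python
import torch
import torch.nn.functional as F

def ambiguity_aware_loss(logits: torch.Tensor, gt_label: int, lam: float = 0.5) -> torch.Tensor:
    """Illustrative loss in the spirit of the description above (not the paper's
    exact equation). The classifier has num_labels + 1 outputs, where the extra
    output corresponds to the 'ambiguous' class.

    logits:   (num_labels + 1,) raw outputs of the classifier
    gt_label: index of the ground-truth label of the original task
    lam:      the hyperparameter lambda balancing the two terms
    """
    probs = F.softmax(logits, dim=-1)
    p_gt = probs[gt_label]   # probability of the original task's gold label
    p_amb = probs[-1]        # probability assigned to the 'ambiguous' class
    # Fit the original task, and treat samples that remain hard (low p_gt)
    # as evidence of ambiguity by encouraging a higher p_amb for them.
    task_term = -torch.log(p_gt + 1e-12)
    ambiguity_term = -(1.0 - p_gt).detach() * torch.log(p_amb + 1e-12)
    return task_term + lam * ambiguity_term

# Example usage with dummy logits for a 3-label NLI task (+1 ambiguity output):
loss = ambiguity_aware_loss(torch.randn(4), gt_label=1, lam=0.5)
```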
As shown in Table 7, the proposed loss function makes it possible to train classifiers that determine ambiguity even without ambiguity labels.
Conclusion
In this study, we found that using representations from intermediate layers allows for a more accurate assessment of the ambiguity in input prompts. This enables LLMs to evaluate the ambiguity of inputs before generating responses for tasks that require such judgment.
In future work, we will explore methods for automatically annotating the ambiguity of samples in NLU datasets, particularly when evaluation results from multiple annotators per sample are unavailable.
Furthermore, we will investigate techniques for automatically selecting the optimal intermediate layer that most effectively supports the assessment of input prompt ambiguity.
If you have any further inquiries about this research, please feel free to reach out to us at the following 📧 email address: contact@nota.ai.
Furthermore, if you are interested in AI optimization technologies, please visit our website at 🔗 netspress.ai.