Chain of Natural Language Inference (CoNLI) Prompting
Part of a prompt engineering series on techniques that elicit the reasoning capabilities of LLMs
Original Paper: Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations
Abstract
This paper presents a framework for detecting and mitigating hallucinations in text generated by LLMs, that is, cases where the model states something that is not supported by, or contradicts, the source information it was given.
The authors propose a method that checks the generated text in two stages, first at the sentence level and then at the entity level, improving grounding and overall quality without any fine-tuning of the underlying model.
Practical Implications
This paper introduces a method that helps LLM-based applications write more accurately by catching and correcting passages where the model fabricates details or misstates the information it was given. Businesses and developers can use it to produce better quality text, such as summaries or answers to questions, without manually checking and correcting every output.
By using this new approach, the quality of automatically generated texts can be significantly improved, making them more reliable for users. This is especially important in areas like customer service, content creation, and educational tools, where accurate information is crucial.
The method is designed to be easy to add to existing systems, which means companies can improve their text generation without a lot of extra work or needing to understand complex programming.
Methodology
The paper introduces a hierarchical framework that first uses sentence-level detection to identify potential errors in text generated by large language models, followed by a more detailed entity-level detection to further scrutinize the text for inaccuracies.
It employs a Chain of Natural Language Inference (CoNLI) approach, which involves checking if the generated text logically follows from the given information, to detect and reduce hallucinations in the text without needing to fine-tune the language models or use specific prompts.
For the mitigation of detected hallucinations, the paper proposes a mitigation agent that uses the detection results to guide the rewriting of the generated text, aiming to preserve its fluency and coherence while correcting inaccuracies.
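To make this pipeline concrete, below is a minimal Python sketch of the detect-then-rewrite loop described above. It is an illustration under assumptions, not the paper's implementation: the `call_llm` function is a placeholder for whatever chat-completion client you use, and the prompts are simplified paraphrases rather than the paper's exact templates.

```python
# Minimal sketch of a CoNLI-style detect-then-rewrite loop.
# `call_llm` is a placeholder for any chat-completion API; the prompt texts
# are simplified paraphrases, not the paper's exact templates.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def sentence_level_check(source: str, sentence: str) -> str:
    """Stage 1: ask whether the generated sentence is entailed by the source."""
    prompt = (
        "Premise (source documents):\n" + source + "\n\n"
        "Hypothesis (generated sentence):\n" + sentence + "\n\n"
        "Does the premise entail the hypothesis? Answer ENTAILED or NOT_ENTAILED."
    )
    return call_llm(prompt).strip()

def entity_level_check(source: str, sentence: str) -> list[str]:
    """Stage 2: list entities in the sentence that the source does not support."""
    prompt = (
        "Source documents:\n" + source + "\n\n"
        "Sentence:\n" + sentence + "\n\n"
        "List any named entities, numbers, or dates in the sentence that are not "
        "grounded in the source. Return one per line, or NONE."
    )
    answer = call_llm(prompt).strip()
    return [] if answer == "NONE" else answer.splitlines()

def conli_detect_and_mitigate(source: str, response: str) -> str:
    # Crude sentence split; a real system would use a proper sentence tokenizer.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    findings = []
    for sent in sentences:
        if sentence_level_check(source, sent) == "NOT_ENTAILED":
            findings.append((sent, entity_level_check(source, sent)))
    if not findings:
        return response  # nothing ungrounded was detected

    # Mitigation agent: rewrite the response guided by the detection results.
    report = "\n".join(
        f"- '{s}' (ungrounded: {', '.join(e) or 'whole claim'})" for s, e in findings
    )
    prompt = (
        "Source documents:\n" + source + "\n\n"
        "Draft response:\n" + response + "\n\n"
        "The following sentences contain ungrounded claims:\n" + report + "\n\n"
        "Rewrite the response so that every claim is supported by the source, "
        "keeping it fluent and coherent."
    )
    return call_llm(prompt)
```

The key design choice is that detection and mitigation are separate steps: the rewrite prompt receives the detection report, so the model corrects only the flagged sentences instead of regenerating the whole answer from scratch.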
Limitations
The method relies heavily on the performance of the underlying language models for natural language inference, which means its effectiveness might be limited by the current capabilities and biases of these models.
While the framework is designed to be plug-and-play, integrating it into existing systems might still require technical knowledge and adjustments, potentially limiting its accessibility to non-technical users.
The approach primarily focuses on text-to-text generation tasks, which may not fully address the nuances and requirements of other types of language generation applications, such as conversational agents or creative writing tools.
Conclusion
The paper presents a new way to detect and correct hallucinations in LLM-generated text, showing that the method performs well without fine-tuning or task-specific prompt engineering.
It introduces a framework that checks whether each generated claim is entailed by the source material, helping to keep the information accurate and trustworthy.
This research also highlights the importance of keeping the quality of the text high while correcting errors, ensuring that the final text reads well and makes sense.
How Does Chain of Natural Language Inference (CoNLI) Prompting Differ from Chain-of-Thought Prompting?
CoNLI specifically targets the issue of "hallucinations" or incorrect information in text generated by AI, by checking if the text matches the given documents, ensuring the information is accurate and grounded in reality.
Unlike Chain-of-Thought (CoT), which guides the AI in a step-by-step reasoning process to reach a conclusion, CoNLI uses a hierarchical approach to first detect and then correct errors at both the sentence and entity level, making sure the final text is both accurate and relevant.
CoNLI achieves this without needing any special adjustments or specific instructions tailored to different tasks, making it a more versatile and straightforward application compared to CoT, which may require more detailed prompt engineering to guide reasoning.
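To make the contrast concrete, here is a rough side-by-side of the two prompting styles. These templates are illustrative paraphrases, not the exact prompts from either paper.

```python
# Illustrative prompt templates (paraphrased, not the original papers' wording).

# Chain-of-Thought: guide the model to reason step by step toward an answer.
cot_prompt = (
    "Question: {question}\n"
    "Let's think step by step."
)

# CoNLI: after an answer is generated, verify each sentence against the source
# with a natural language inference check, then rewrite anything ungrounded.
conli_check_prompt = (
    "Premise (source documents): {source}\n"
    "Hypothesis (generated sentence): {sentence}\n"
    "Does the premise entail the hypothesis? Answer ENTAILED or NOT_ENTAILED."
)
```

In short, CoT shapes how the answer is produced, while CoNLI verifies and repairs the answer after it has been produced.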
What are the different types of hallucination mentioned in this paper?
Context-related hallucination: This type involves generating responses that contradict common sense, meaning the information doesn't make sense based on what most people know to be true.
Self-conflicting hallucination: Here, the generated responses or sentences conflict with each other, like when a story or explanation doesn't add up because different parts disagree with each other.
Ungrounded hallucination: This type occurs when the generated sentences are not supported by the source text, essentially making up information that was neither provided nor implied by the original documents.