What is ConveyorAI's Hallucination Rate?

All LLMs are known to "hallucinate": because of how they generate text, they occasionally produce statements that are not supported by the source material they were given.

The off-the-shelf hallucination rate of an LLM like GPT-4 performing retrieval-augmented generation is approximately 3%. In other words, if you asked GPT-4 1,000 questions about a corpus of information, roughly 30 of its answers might contain a fabricated fact (i.e., a "hallucination"). Older or cheaper models (e.g., GPT-3.5) exhibit even higher hallucination rates.

The technology behind ConveyorAI includes multiple steps that prevent, detect, and eliminate hallucinations in its answers. As a result, ConveyorAI's hallucination rate is significantly lower than the roughly 3% rate of off-the-shelf LLMs.

Specifically, when ConveyorAI processes an information security questionnaire and has been seeded with a strong library, its hallucination rate is 0.1%. That means for every 1,000 answers ConveyorAI generates, only about 1 may contain a hallucination.
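The arithmetic behind these figures is straightforward; the short sketch below simply restates the rates quoted above as expected counts over a batch of 1,000 answers (illustrative only, not part of Conveyor's product):

```python
# Illustrative arithmetic only: expected number of answers containing a
# hallucination, at the rates cited above, over 1,000 generated answers.
def expected_hallucinations(num_answers: int, rate: float) -> float:
    """Expected count of answers containing at least one hallucination."""
    return num_answers * rate

baseline = expected_hallucinations(1_000, 0.03)   # off-the-shelf LLM, ~3%
conveyor = expected_hallucinations(1_000, 0.001)  # ConveyorAI, ~0.1%

print(baseline)  # ≈ 30
print(conveyor)  # ≈ 1
```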

Conveyor is able to achieve such a low hallucination rate by:

  • Utilizing post-processing techniques to ensure ConveyorAI answers are grounded in customer data;
  • Relying on the most advanced LLM models and frameworks available (rather than smaller, cheaper, faster models that tend to hallucinate more);
  • Providing prescriptive guidance about what content to add to your library to generate the highest-confidence answers;
  • Maintaining robust observability, evaluation, and experimentation pipelines that let us understand loss cases across thousands of questionnaires and iterate quickly on our technology.
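To make the first bullet concrete: "grounding" checks verify that each sentence of a generated answer is supported by the retrieved source passages. Conveyor's actual pipeline is not public; the sketch below is a generic illustration that uses a naive word-overlap heuristic where a production system would typically use an entailment (NLI) model. All function names and the threshold are assumptions for illustration.

```python
import re

def overlap_score(sentence: str, passage: str) -> float:
    """Fraction of the sentence's words that also appear in the passage."""
    words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
    if not words:
        return 0.0
    passage_words = set(re.findall(r"[a-z0-9']+", passage.lower()))
    return len(words & passage_words) / len(words)

def flag_ungrounded(answer: str, passages: list[str], threshold: float = 0.6) -> list[str]:
    """Return sentences whose best overlap with any source passage is below threshold.

    A stand-in for a real grounding check: flagged sentences would be
    revised, removed, or escalated for human review.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [
        s for s in sentences
        if max((overlap_score(s, p) for p in passages), default=0.0) < threshold
    ]

# Hypothetical example: the second sentence has no support in the passage.
passages = ["We encrypt all customer data at rest using AES-256."]
answer = "Customer data is encrypted at rest using AES-256. We are FedRAMP certified."
print(flag_ungrounded(answer, passages))  # ["We are FedRAMP certified."]
```

In practice, swapping the overlap heuristic for a trained entailment model is what makes this kind of post-processing robust enough to catch subtle fabrications rather than only off-topic sentences.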