Everything AI
ConveyorAI accuracy, nuance, and complexity in plain language
ConveyorAI’s job is to pull the best information you already have (policies, past answers, wikis, etc.) and draft security-questionnaire or RFP answers for you. Most of the time it’s 90%+ correct, but mistakes can still happen. Here is the simple version of why that happens and what we do about it:
| Why an answer can be wrong | How ConveyorAI reduces the risk |
| --- | --- |
| Out-of-date or conflicting content in your knowledge base. | • Sync directly with “source-of-truth” systems (Confluence, Google Drive, websites) so you edit in one place. • Automatic reminders to verify any Q&A you store in Conveyor. • “AI Librarian” spots duplicates or conflicts and retires or flags them. • If you manually correct an answer, that correction becomes the new source going forward. |
| The question was misread or ambiguous. | Ongoing improvements to question extraction and context capture. If extraction is wrong you can correct the answer, and that correction is remembered indefinitely by default. |
| LLM hallucinations (the model makes something up). | Retrieval-Augmented Generation (RAG) forces the model to ground answers only in the cited sources, plus automatic checks that the draft actually matches those sources. Occurs < 0.1% of the time in internal tests. |
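To make the RAG idea in the last row concrete, here is a minimal, purely illustrative sketch of what “grounding” a draft in retrieved sources can look like. The function name, prompt wording, and example snippets are assumptions for illustration, not Conveyor’s actual implementation:

```python
# Hypothetical sketch: grounding a draft answer in retrieved snippets (RAG).
# The prompt wording and helper name are illustrative assumptions.

def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using ONLY the sources below. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )

snippets = [
    "Customer data is encrypted at rest with AES-256.",
    "Encryption keys are rotated every 90 days.",
]
print(build_grounded_prompt("How is data encrypted at rest?", snippets))
```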
What are confidence scores and how do they affect whether and when ConveyorAI answers?
When ConveyorAI drafts an answer it tags it with a traffic-light confidence color:
Green – exact match from a past, already-approved answer (safe to auto-send).
Blue – high-confidence AI answer grounded in good sources (often safe to auto-send).
Yellow – lower confidence (partial match, mixed sources, or uncertain context). These are routed to a human for review before anyone external sees them.
Admins can set a policy such as “only send Green or Blue automatically; hold Yellow for review,” so the confidence score directly gates whether the answer is released immediately, held for escalation, or never shown.
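In practice, that gating policy amounts to a small lookup from confidence color to action. The sketch below is a hypothetical illustration (the policy sets, labels, and fallback behavior are assumptions), not Conveyor’s actual routing logic:

```python
# Hypothetical sketch of how a confidence label could gate the answer flow.
# The labels mirror the article; the policy values are illustrative assumptions.

AUTO_SEND = {"green", "blue"}   # example admin policy: auto-send Green and Blue
HOLD_FOR_REVIEW = {"yellow"}    # Yellow always waits for a human reviewer

def route_answer(confidence: str) -> str:
    """Decide what happens to a drafted answer based on its confidence color."""
    label = confidence.lower()
    if label in AUTO_SEND:
        return "send automatically"
    if label in HOLD_FOR_REVIEW:
        return "hold for human review"
    return "do not show"  # anything unrecognized is never released

for color in ("Green", "Blue", "Yellow"):
    print(color, "->", route_answer(color))
```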
How do we protect customers from hallucinations?
Grounding in your data (RAG). The model is instructed to answer only with the snippets it just retrieved.
Source-consistency guardrail. If the AI’s draft strays from those snippets, the system deletes or flags the answer instead of sending it.
Modern, higher-accuracy models. Switching to state-of-the-art LLMs further lowers the base hallucination rate.
Visibility of evidence. Every answer cites its sources, so reviewers (or customers, if you allow it) can quickly verify the claim.
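Here is a rough, hypothetical sketch of what a source-consistency guardrail can look like: every sentence of the draft must overlap strongly with at least one retrieved snippet, otherwise it is flagged for review. The word-overlap heuristic and the 0.5 threshold are stand-ins chosen for illustration, not Conveyor’s actual check:

```python
# Minimal sketch of a source-consistency check: every sentence in the draft must
# share enough vocabulary with at least one retrieved snippet, or it is flagged.
# The overlap heuristic and threshold are assumptions for illustration only.

def is_supported(sentence: str, snippets: list[str], threshold: float = 0.5) -> bool:
    words = {w.lower().strip(".,") for w in sentence.split()}
    for snippet in snippets:
        snippet_words = {w.lower().strip(".,") for w in snippet.split()}
        if words and len(words & snippet_words) / len(words) >= threshold:
            return True
    return False

def check_draft(draft: str, snippets: list[str]) -> list[str]:
    """Return the draft sentences that are not grounded in any source snippet."""
    sentences = [s.strip() for s in draft.split(".") if s.strip()]
    return [s for s in sentences if not is_supported(s, snippets)]

snippets = ["Backups are taken daily and stored for 30 days."]
draft = "Backups are taken daily. Backups are replicated to three regions."
print(check_draft(draft, snippets))  # -> flags the unsupported second sentence
```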
What is our AI grading mechanism and why does it matter?
Conveyor runs internal evaluations (“evals”) on both lab test sets and real production traffic:
Each new model, retrieval tweak, or prompt change is graded against a gold-standard answer set before release.
Live answers are continuously sampled and scored so the team can catch quality dips early.
These scores feed the product roadmap—areas with lower grades get prioritized for improvement.
For a user, this means accuracy isn’t left to chance; it is measured, trended, and used to drive ongoing improvements.
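As a rough illustration of what an eval run involves, the sketch below grades candidate answers against a gold-standard set and reports an average score. The naive token-overlap grader and the example data are assumptions for readability; real graders are typically far more sophisticated:

```python
# Illustrative sketch of an offline "eval": grade candidate answers against a
# gold-standard set before a model or prompt change ships. The token-overlap
# scorer is a simple stand-in, not Conveyor's actual grading method.

def score(candidate: str, gold: str) -> float:
    cand, ref = set(candidate.lower().split()), set(gold.lower().split())
    return len(cand & ref) / len(ref) if ref else 0.0

def run_eval(predictions: dict[str, str], gold_set: dict[str, str]) -> float:
    """Average score across all questions in the gold-standard set."""
    scores = [score(predictions.get(q, ""), a) for q, a in gold_set.items()]
    return sum(scores) / len(scores)

gold_set = {"Is data encrypted at rest?": "Yes, data is encrypted at rest with AES-256."}
predictions = {"Is data encrypted at rest?": "Yes, all data is encrypted at rest using AES-256."}
print(f"eval score: {run_eval(predictions, gold_set):.2f}")
```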
Hallucinations vs. inaccuracies—what’s the difference and how are they handled?
| Term | What it means | Typical cause | How ConveyorAI mitigates |
| --- | --- | --- | --- |
| Hallucination | The AI invents facts that do not appear in any source. | LLM reasoning quirks. | RAG grounding + source-consistency checks + modern models keep the rate under 0.1%. |
| Inaccuracy | The AI faithfully uses the source, but the source itself is wrong, outdated, conflicting, or not the right context. | Content-maintenance issues; retrieval misses; question misunderstood. | Source-of-truth sync, verification reminders, AI Librarian conflict detection, product-line scoping, confidence scoring, and human review for edge cases. |
Take-away for users
Confidence scores tell you when it is safe to let ConveyorAI respond automatically. Rigorous guardrails and grading make sure both hallucinations and simple inaccuracies stay rare and visible. As your team updates or corrects answers, ConveyorAI learns, pushing overall accuracy even higher over time.