Building AI Chatbots with LLM APIs

The Problem

Building a reliable, production-grade AI chatbot is significantly more complex than calling an LLM API and displaying the response. Conversation context management, safety guardrails, cost control, hallucination handling, and graceful degradation on out-of-scope questions all require careful engineering that is not obvious from the API documentation alone.

How AI Helps

  1. Generates system prompt templates that define the chatbot's persona, domain scope, and fallback behaviour, giving developers a production-ready starting point rather than a blank prompt.
  2. Writes conversation state management logic for multi-turn conversations: tracking context, summarising old turns, and maintaining user preferences across sessions.
  3. Designs intent routing: a lightweight classifier that routes simple queries to a fast, cheap model and complex queries to a more capable one, optimising cost and latency simultaneously.
  4. Generates test cases for safety and quality evaluation: adversarial inputs, out-of-scope questions, and jailbreak attempts that should be handled gracefully.
  5. Writes the conversation evaluation pipeline — using an LLM judge to score response quality, helpfulness, and safety across a test set — enabling systematic quality monitoring.
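Point 3's intent routing can be sketched without any API calls. The heuristics below (query length and a few complexity markers) stand in for a real classifier, and the model names are the ones from the Recommended Models list; treat both as placeholders for your own routing logic.

```python
# Model names taken from the Recommended Models list; swap in your own.
CHEAP_MODEL = "gpt-4o-mini"
CAPABLE_MODEL = "gpt-4o"

# Crude heuristics standing in for a trained classifier: long or
# multi-part questions are sent to the more capable model.
COMPLEX_MARKERS = ("compare", "explain why", "step by step", "analyse", "analyze")

def route(query: str) -> str:
    """Return the model name to use for this query."""
    q = query.lower()
    if len(q.split()) > 40 or any(marker in q for marker in COMPLEX_MARKERS):
        return CAPABLE_MODEL
    return CHEAP_MODEL
```

In production the heuristics are usually replaced by a call to the cheap model itself ("classify this query as simple or complex"), which keeps routing accuracy high while still paying the capable model's price only when needed.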

Recommended Models

- gpt-4o
- claude-3-5-sonnet-20241022
- gpt-4o-mini

FAQ

Should I use the Assistants API or build a custom conversation loop?
The Assistants API (OpenAI) provides persistent threads and built-in tool support, which simplifies development for standard use cases. For full control over context management, cost optimisation, and multi-model architectures, build a custom loop with the chat completions API.
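A custom loop is less code than it sounds. The sketch below keeps the message history, truncates old turns to bound context size, and takes the completion call as a pluggable function (`complete`) so the loop itself stays testable without an API key; the summarisation of dropped turns mentioned above is omitted for brevity.

```python
def chat_loop(complete, system_prompt, user_messages, max_history=20):
    """Minimal custom conversation loop.

    `complete(messages) -> str` is a wrapper around your chat completions
    call; injecting it keeps this loop provider-agnostic and testable.
    """
    history = [{"role": "system", "content": system_prompt}]
    replies = []
    for msg in user_messages:
        history.append({"role": "user", "content": msg})
        # Keep the system prompt plus only the most recent turns.
        if len(history) > max_history:
            history = [history[0]] + history[-(max_history - 1):]
        reply = complete(history)
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```

Because `complete` is injected, the same loop works for multi-model setups: pass a wrapper that picks the model per message.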
How do I prevent the chatbot from making up information?
Use RAG to ground responses in verified sources, include "If you don't know the answer from the provided context, say so" in the system prompt, and implement a citation requirement so every claim is backed by a source reference that users can verify.
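The citation requirement can be enforced mechanically. This sketch assumes a hypothetical `[doc:N]` citation format baked into the system prompt; any unambiguous tag your retrieval layer emits works the same way.

```python
import re

# Assumed system prompt; the citation format [doc:N] is an arbitrary choice.
GROUNDED_SYSTEM_PROMPT = (
    "Answer only from the provided context. "
    "If you don't know the answer from the provided context, say so. "
    "Cite every claim with a source tag like [doc:3]."
)

CITATION = re.compile(r"\[doc:\d+\]")

def has_citation(answer: str) -> bool:
    """Gate check: reject answers that carry no source reference."""
    return bool(CITATION.search(answer))
```

Answers that fail the check can be retried with a stricter reminder or replaced with an "I don't know" fallback, rather than shown to the user unverified.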
How much does a production chatbot cost to run?
Costs vary enormously by traffic and model. A chatbot serving 10,000 daily active users with 10 messages each, averaging 500 input tokens and 200 output tokens per message, using GPT-4o mini costs approximately $67/day. Using GPT-4o for the same traffic costs $1,125/day. Model selection is the single largest cost variable.
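The arithmetic behind estimates like these is simple enough to keep as a helper. Per-token prices change frequently, so they are parameters here rather than constants; always plug in your provider's current per-million-token rates.

```python
def daily_cost(users, msgs_per_user, in_tokens, out_tokens,
               price_in, price_out):
    """Estimated daily API cost in dollars.

    price_in / price_out are dollars per 1M tokens (check current pricing).
    in_tokens / out_tokens are averages per message.
    """
    messages = users * msgs_per_user
    return messages * (in_tokens * price_in + out_tokens * price_out) / 1_000_000
```

Running it with the traffic profile above and two candidate price points makes the "model selection is the single largest cost variable" point concrete: the traffic term is identical, so cost scales linearly with price.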
