AI Guides
Practical, in-depth guides for developers working with large language models. Covers prompt engineering, tokenisation, context window management, cost optimisation, and AI security.
Token Counting for LLMs: The Complete Guide
Token counting is the foundation of working efficiently with large language models. Every input and output you send to a model is measured in tokens — not words, not characters — and understanding the difference directly affects your costs, context usage, and response quality. This guide covers how tokenisation works, why it varies between models, and practical strategies for managing token budgets in production applications.
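Exact counts require the model's own tokeniser (for OpenAI models, the tiktoken library), but a rough rule of thumb is that English text averages about four characters per token. A minimal stdlib sketch of that heuristic, with illustrative function names:

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English text. Use the model's own tokeniser when
    exact counts matter for billing or context limits."""
    return math.ceil(len(text) / chars_per_token)

def fits_in_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """Check whether the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window
```

The heuristic drifts badly for code, non-English text, and dense punctuation, so treat it as a budgeting aid only.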
Prompt Engineering Basics: A Practical Guide
Prompt engineering is the practice of crafting inputs to language models that reliably produce useful outputs. While LLMs are remarkably flexible, the way you phrase a request dramatically affects the quality, format, and accuracy of the response. This guide covers the foundational techniques used by AI engineers — from basic instructions to few-shot prompting and chain-of-thought reasoning — with concrete examples for each.
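As one concrete illustration of few-shot prompting, each example input/output pair can be encoded as a prior user/assistant turn before the real query. A sketch using the chat-message shape both major APIs accept (the helper name is illustrative):

```python
def few_shot_messages(system: str, examples: list, query: str) -> list:
    """Build a chat message list that teaches the format by example:
    each (input, output) pair becomes a user/assistant turn that
    precedes the real query."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages
```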
Managing Context Windows in LLM Applications
Context window management is one of the most important engineering challenges in production LLM applications. As your application grows — adding conversation history, documents, and system instructions — you will inevitably approach the context limit. This guide explains the strategies engineers use to stay within limits while preserving the information the model needs to produce accurate responses.
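The simplest such strategy is sliding-window truncation: keep the system message, drop the oldest turns first. A minimal sketch, assuming a crude character-based token estimate (swap in a real tokeniser for production):

```python
def trim_history(messages: list, budget: int,
                 count_tokens=lambda m: len(m["content"]) // 4) -> list:
    """Drop the oldest non-system messages until the estimated token
    total fits the budget. Keeps the system message and the most
    recent turns, which usually matter most for the next response."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(map(count_tokens, system + rest)) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```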
Getting Structured Output from LLMs
Getting an LLM to reliably return structured data like JSON is one of the most important skills in production AI engineering. Without enforcement, models occasionally produce malformed JSON, add explanatory text around the JSON, or invent fields. This guide explains the main mechanisms for enforcing structured output and how to choose between them.
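When native enforcement is unavailable, a defensive parsing layer is the usual fallback. A minimal stdlib sketch that extracts the first JSON object from a response and checks for expected keys (the function name is illustrative; it assumes a single top-level object):

```python
import json

def extract_json(raw: str, required_keys=()):
    """Pull the first JSON object out of a model response that may be
    wrapped in explanatory text, then verify the expected keys exist.
    Returns None if nothing parseable is found."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not all(k in data for k in required_keys):
        return None
    return data
```

On a None result, the usual move is to retry the request with the parse error appended to the prompt.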
Prompt Injection: Risks and Prevention
Prompt injection is the most significant security vulnerability in LLM-powered applications. Attackers embed instructions in user inputs or retrieved content that override your system prompt and cause the model to perform unintended actions — leaking system prompts, bypassing safety filters, or executing privileged operations. Understanding and mitigating prompt injection is essential before deploying any AI application that handles untrusted input.
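One common mitigation layer is to fence untrusted content in explicit delimiters (so the system prompt can instruct the model to treat it as data, not instructions) and flag blatant override phrases. A sketch with illustrative names; pattern matching cannot catch every injection, so privileged actions must still be gated outside the model:

```python
import re

# Naive pattern for blatant override attempts; real attacks are far
# more varied, so treat this as one layer, never the whole defence.
OVERRIDE = re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I)

def prepare_untrusted(text: str):
    """Return (is_suspicious, fenced_text): flag obvious injection
    phrases and wrap the content in delimiters that mark it as data."""
    fenced = f"<untrusted>\n{text}\n</untrusted>"
    return bool(OVERRIDE.search(text)), fenced
```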
LLM Cost Optimisation: A Practical Guide
LLM API costs can grow rapidly as your application scales. A chatbot that costs $50 per day in development can cost $5,000 per day at production scale. This guide covers the full range of cost optimisation strategies — from choosing the right model to implementing caching and batching — with realistic estimates of the savings each approach provides.
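The underlying arithmetic is simple: providers price input and output tokens separately, per million tokens. A sketch of the calculation; the rates in the test below are placeholders, not current prices, so always read them from your provider's price sheet:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD of one request, given per-million-token rates.
    Output tokens are typically priced several times higher than
    input tokens, which is why verbose responses dominate bills."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```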
OpenAI vs. Claude: Choosing the Right LLM for Your Use Case
Choosing between OpenAI and Anthropic models is one of the first decisions AI engineers face. Both GPT-4o and Claude 3.5 Sonnet are state-of-the-art frontier models, but they have distinct strengths, pricing, and API capabilities. This guide compares them across the dimensions that matter most for production applications.
Function Calling and Tool Use in LLMs
Function calling (also called tool use) is the mechanism that transforms an LLM from a text generator into an agent that can interact with external systems. By declaring a set of available functions, you allow the model to request a call with specific arguments; your application executes the call and returns the result to the model. This guide covers the full function calling lifecycle, advanced patterns, and common pitfalls.
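The execution side of that lifecycle reduces to a dispatch table. A sketch using the JSON-Schema parameter style both major APIs share (the exact wrapper keys differ per provider, and the tool here is hypothetical):

```python
import json

# A hypothetical tool declaration in the JSON-Schema style used by
# both OpenAI and Anthropic tool definitions (wrapper keys vary).
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real API call

def dispatch(tool_call: dict) -> str:
    """Execute the function the model requested and return the result
    string that is sent back to the model on the next turn."""
    handlers = {"get_weather": get_weather}
    fn = handlers[tool_call["name"]]
    args = json.loads(tool_call["arguments"])  # models return args as JSON text
    return fn(**args)
```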
Writing Effective System Prompts for Claude
Claude's system prompt is your primary lever for controlling the model's behaviour across an entire conversation. Unlike the user message, the system prompt persists for the lifetime of the conversation and establishes the model's role, constraints, output format, and personality. Claude's training gives system prompts strong authority over behaviour, but there are specific techniques that reliably produce consistent, high-quality results.
Document Chunking Strategies for RAG Applications
How you split documents into chunks is one of the most important decisions in building a RAG (Retrieval-Augmented Generation) application. Poor chunking causes relevant information to be split across chunk boundaries and retrieved incompletely, leading to low-quality answers. This guide covers the main chunking strategies, when to use each, and how to evaluate chunk quality.
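The baseline strategy is fixed-size chunks with overlap, so text cut at a boundary still appears whole in at least one chunk. A minimal character-based sketch (production chunkers usually count tokens and respect sentence or paragraph boundaries):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size character chunks that overlap by
    `overlap` characters, the simplest baseline chunking strategy."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```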
Batch Processing with LLM APIs
Batch processing transforms LLMs from interactive tools into data processing pipelines. When you need to process thousands of documents, generate content at scale, or run evaluations over a large dataset, the batch APIs offered by OpenAI and Anthropic provide a 50% cost discount while handling the throughput management for you. This guide covers when to use batch processing, how to implement it, and how to handle errors.
Handling PII in LLM Applications
Sending personally identifiable information (PII) to LLM APIs is one of the most common compliance risks in enterprise AI deployments. Names, email addresses, phone numbers, health information, and financial data in user inputs or documents may be inadvertently sent to third-party API providers, violating GDPR, HIPAA, and other regulations. This guide covers how to detect, redact, and handle PII safely in production LLM applications.
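For structured PII formats, regex-based redaction before the request leaves your infrastructure is the common first layer. A sketch covering two formats only; real systems add NER-based detectors for names and free-text PII, and these patterns are illustrative:

```python
import re

# Minimal patterns for two structured PII types. Regexes catch the
# well-formatted cases; they miss names, addresses, and variants.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before the text
    is sent to a third-party API."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```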
JSONL Datasets for LLM Fine-tuning and Evaluation
JSONL (JSON Lines) is the standard format for LLM training datasets, fine-tuning files, and batch evaluation sets. Each line in a JSONL file is a self-contained JSON object representing one training example or request. This guide covers how to create, validate, and optimise JSONL datasets for OpenAI fine-tuning, Anthropic Message Batches, and custom evaluation pipelines.
Testing and Evaluating AI Prompts
Prompt engineering without evaluation is guesswork. A prompt that works well on the ten examples you tested during development may fail on 20% of production inputs in ways you never anticipated. This guide covers the methods used by AI engineers to systematically evaluate prompts — from simple manual review to automated evaluation pipelines — so you can ship prompt changes with confidence.
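At its core, an automated evaluation pipeline is a loop over labelled cases with a pass/fail check per output. A minimal sketch in which `prompt_fn` stands in for a real model call and the names are illustrative:

```python
def run_eval(prompt_fn, cases):
    """Score a prompt against a labelled test set. Each case is
    (input, check), where `check` inspects the model output and
    returns True on pass. Returns (pass_rate, failing_inputs)."""
    results = [(inp, check(prompt_fn(inp))) for inp, check in cases]
    passed = sum(ok for _, ok in results)
    return passed / len(results), [inp for inp, ok in results if not ok]
```

Tracking the pass rate across prompt versions is what turns prompt changes from guesswork into regression-tested releases.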