dataLlama

SQL Query Generation Prompt (LLaMA / Ollama)

Generating SQL without schema context produces queries that reference non-existent columns or miss join conditions. This prompt requires the schema upfront and asks for comments inside the query, which makes the generated SQL much easier to understand and debug. The performance notes section often surfaces missing index suggestions that would otherwise require a separate query. This variant is formatted for LLaMA / Ollama: Optimised for LLaMA 3, Mistral, and Ollama local models. Uses [INST] / <<SYS>> instruction format.

Prompt Template
[INST] <<SYS>>
You are a helpful, accurate, and detailed AI assistant. Follow the instructions carefully.
<</SYS>>

You are a database engineer expert in {{dialect}} SQL.

Generate a SQL query that answers this question: {{question}}

Database schema:
```sql
{{schema}}
```

Additional requirements:
- {{requirements}}
- Use table aliases for readability
- Add a comment above each major clause explaining its purpose
- If the query requires a CTE or subquery, explain the logic in a comment before the query

After the query, provide:
1. **Plain English explanation** of what the query does
2. **Performance notes** — any indexes that would help, or steps that might be slow on large tables
3. **Sample output** — show what 2-3 example rows would look like [/INST]

Variables

{{dialect}}SQL dialect: PostgreSQL, MySQL, SQLite, BigQuery, Snowflake, T-SQL
{{question}}The business question you want the query to answer
{{schema}}CREATE TABLE statements or a description of relevant tables and columns
{{requirements}}Extra constraints, e.g., "filter to the last 30 days", "exclude test accounts", or "None"

Example

Input
dialect: PostgreSQL
question: Which customers made more than 3 orders in the last 90 days and what is their average order value?
schema:
CREATE TABLE customers (id INT, name TEXT, email TEXT, created_at TIMESTAMPTZ);
CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, created_at TIMESTAMPTZ);
requirements: Exclude customers whose email contains 'test'
Output
```sql
-- CTE: filter orders to the last 90 days for efficiency
WITH recent_orders AS (
  SELECT customer_id, total
  FROM orders
  WHERE created_at >= NOW() - INTERVAL '90 days'
)
-- Main query: aggregate per customer and apply thresholds
SELECT
  c.name,
  c.email,
  COUNT(ro.customer_id) AS order_count,
  ROUND(AVG(ro.total), 2) AS avg_order_value
FROM customers c
JOIN recent_orders ro ON c.id = ro.customer_id
WHERE c.email NOT LIKE '%test%'
GROUP BY c.id, c.name, c.email
HAVING COUNT(ro.customer_id) > 3
ORDER BY avg_order_value DESC;
```

Related Tools

FAQ

Can this generate queries for NoSQL databases?
This prompt is optimised for SQL. For MongoDB, change the dialect to "MongoDB aggregation pipeline" and replace the schema with your document structure. The model will generate a JavaScript aggregation pipeline instead.
How do I handle queries across 10+ tables?
Include only the relevant tables and foreign key relationships in the schema, not the full database. Too much schema context confuses the model and produces join errors.
Will the AI always use the most efficient query structure?
Not always. The generated query is logically correct more often than it is optimally efficient. Always check the EXPLAIN output in your database and review the performance notes section for indexing suggestions.

Related Prompts