CSV Data Processing Prompt (LLaMA / Ollama)
CSV processing tasks involve numerous small decisions about null handling, column types, and deduplication logic. This prompt makes those decisions explicit before code generation, which prevents the AI from making silent assumptions. The validation section (row count before/after) is essential for catching bugs where transformations silently drop more rows than expected. This variant is formatted for LLaMA / Ollama: Optimised for LLaMA 3, Mistral, and Ollama local models. Uses [INST] / <<SYS>> instruction format.
[INST] <<SYS>>
You are a helpful, accurate, and detailed AI assistant. Follow the instructions carefully.
<</SYS>>
You are a data engineer. Write {{language}} code to process the following CSV data.
Task: {{task}}
Output format: {{output_format}}
CSV sample (first 5 rows):
```
{{csv_sample}}
```
Full column list: {{columns}}
Expected output columns: {{output_columns}}
Requirements:
- Handle missing values by {{missing_value_strategy}}
- {{additional_requirements}}
Provide:
1. Complete, runnable {{language}} code
2. Comments explaining each transformation step
3. A short validation section that prints row counts before and after to confirm no unexpected data loss [/INST]Variables
{{language}}Code language: Python (pandas), JavaScript, R, or SQL{{task}}What to do with the data, e.g., "deduplicate by email, normalise phone numbers, pivot sales by month"{{output_format}}Output format: CSV, JSON, pandas DataFrame, SQL INSERT statements{{csv_sample}}The first 5 rows of your CSV including headers{{columns}}All column names and their data types{{output_columns}}The columns expected in the output{{missing_value_strategy}}How to handle nulls, e.g., "filling with 0", "dropping the row", "keeping as NaN"{{additional_requirements}}Any other constraints, or "None"Example
language: Python (pandas) task: Deduplicate by email (keep most recent row), normalise phone numbers to E.164 format output_format: CSV csv_sample: name,email,phone,signup_date Alice,[email protected],(555) 123-4567,2024-01-15 Bob,[email protected],5559876543,2024-02-20 Alice,[email protected],555-123-4567,2024-03-01
```python
import pandas as pd
import re
df = pd.read_csv('input.csv')
print(f"Rows before: {len(df)}")
# Normalise phone numbers to E.164 (+1XXXXXXXXXX for US)
def normalise_phone(phone):
digits = re.sub(r'\D', '', str(phone))
return f"+1{digits}" if len(digits) == 10 else None
df['phone'] = df['phone'].apply(normalise_phone)
# Deduplicate by email, keeping the most recent signup_date
df['signup_date'] = pd.to_datetime(df['signup_date'])
df = df.sort_values('signup_date').drop_duplicates('email', keep='last')
print(f"Rows after: {len(df)}")
df.to_csv('output.csv', index=False)
```Related Tools
FAQ
- Can this process files that are too large to paste?
- Paste only the header row and first 5 data rows as the csv_sample, then list all column names in the columns field. The AI generates code based on the structure, which then runs against your full file.
- Can I use this to convert CSV to JSON or SQL?
- Yes. Set output_format to "JSON array" or "SQL INSERT statements" and specify the target table name in additional_requirements.
- What if my CSV has inconsistent encoding or BOM characters?
- Add to additional_requirements: "Read the file with encoding='utf-8-sig' to handle BOM characters and use errors='replace' for encoding errors." The AI will add the appropriate read_csv parameters.
Related Prompts
CSV processing tasks involve numerous small decisions about null handling, column types, a...
CSV Data Processing Prompt (ChatGPT)CSV processing tasks involve numerous small decisions about null handling, column types, a...
CSV Data Processing Prompt (Claude)CSV processing tasks involve numerous small decisions about null handling, column types, a...
CSV Data Processing Prompt (Gemini)CSV processing tasks involve numerous small decisions about null handling, column types, a...
Data Analysis PromptMost AI data analysis prompts produce vague observations like "sales increased in Q2". Thi...
SQL Query Generation PromptGenerating SQL without schema context produces queries that reference non-existent columns...