How to Fix LLM Output Control Issues: JSON Format, Length, and Structure
The Problem: LLM Output Unpredictability
You're building an application using GPT-4, Claude, or Llama. The LLM works in the playground but breaks in production because:
- Outputs are too short or too verbose
- JSON responses are malformed
- The model ignores your formatting instructions
- Structured output fails to parse
In most cases this is not a model problem. It is a prompt engineering problem.
Issue #1: Output Too Short
Symptom
Your Prompt:
Explain how Docker works.
Model Output:
Docker is containerization technology.
Expected: A detailed explanation with examples.
Root Cause
- Ambiguous instruction - "Explain" can mean 1 sentence or 10 paragraphs
- No length constraint - Model defaults to concise responses
- Missing context - Model doesn't know your audience level
Wrong Approach
Explain how Docker works in detail.
Problem: "In detail" is still vague.
Correct Approach
Technique 1: Specify Exact Length
Explain how Docker works. Your response must be at least 300 words and include:
1. What Docker is
2. How containers differ from VMs
3. Basic Docker commands
4. A simple use case example
Write in a tutorial style for beginners.
Technique 2: Use Word/Paragraph Count
Write a 3-paragraph explanation of Docker:
- Paragraph 1: Definition and core concept
- Paragraph 2: Technical architecture (images, containers, daemon)
- Paragraph 3: Practical example with commands
Technique 3: Step-by-Step Outline (Chain-of-Thought Style)
Explain Docker step-by-step:
1. First, define what a container is
2. Then explain how Docker manages containers
3. Compare Docker to virtual machines
4. Finally, show a simple docker run example
Think through each step carefully before writing.
Why This Works
- Explicit constraints give the model clear targets
- Structured outline forces comprehensive coverage
- Audience specification calibrates detail level
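Models follow length targets only approximately, so it can help to measure the response instead of trusting the prompt. The sketch below is one way to do that; `countWords`, `checkCoverage`, and the keyword list are hypothetical names chosen for illustration, and a keyword match is only a rough proxy for "this outline item was covered".

```javascript
// Count whitespace-separated words in a model response.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Compare a response against a minimum word count and a list of
// keywords (one per outline item). Matching is case-insensitive.
function checkCoverage(text, minWords, requiredTerms) {
  const lower = text.toLowerCase();
  const missing = requiredTerms.filter(
    (term) => !lower.includes(term.toLowerCase())
  );
  const words = countWords(text);
  return { words, longEnough: words >= minWords, missing };
}

// The too-short answer from the Symptom section fails both checks.
const report = checkCoverage(
  "Docker is containerization technology.",
  300,
  ["container", "VM", "docker run"]
);
console.log(report);
```

If `longEnough` is false or `missing` is non-empty, the caller can re-prompt and name the missing items explicitly.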
Issue #2: Output Too Verbose
Symptom
Your Prompt:
Extract the email address from this text: "Contact John at john.doe@example.com"
Model Output:
Certainly! I'd be happy to help you extract the email address from the provided text.
The email address found in the text "Contact John at john.doe@example.com" is:
john.doe@example.com
This email appears to belong to someone named John Doe, based on the context provided. Email addresses typically follow the format of username@domain.extension...
[continues for 3 more paragraphs]
Expected: Just john.doe@example.com
Root Cause
- Conversational training - Chat-tuned models are trained to be helpful, which tends to produce long, explanatory answers
- No output constraint - Model assumes you want explanation
- Politeness bias - Models add pleasantries by default
Wrong Approach
Just give me the email, don't explain.
Problem: Still allows preamble and politeness.
Correct Approach
Technique 1: Direct Output Format
Extract email from text. Output ONLY the email address, nothing else.
Text: "Contact John at john.doe@example.com"
Technique 2: Template Enforcement
Extract the email address.
Input: "Contact John at john.doe@example.com"
Output: [email only, no explanation]
Technique 3: System Message (API)
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{
role: "system",
content: "You are a data extraction tool. Output only the requested data with no preamble, explanation, or politeness. No markdown formatting."
},
{
role: "user",
content: "Extract email from: 'Contact John at john.doe@example.com'"
}
]
});
Technique 4: Few-Shot Examples
Extract email addresses from text. Output only the email, nothing else.
Example 1:
Input: "Reach out to jane@company.com for details"
Output: jane@company.com
Example 2:
Input: "Email bob.smith@tech.org if interested"
Output: bob.smith@tech.org
Now do this:
Input: "Contact John at john.doe@example.com"
Output:
Why This Works
- System message sets behavior up front; models generally weight it more heavily than user text, though it is not a hard guarantee
- "Output ONLY" is explicit constraint
- Few-shot examples show exact format without ambiguity
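The few-shot prompt from Technique 4 can be assembled from data rather than written by hand, which keeps the examples and the final request in exactly the same shape. This is a minimal sketch; `buildFewShotPrompt` and `isBareEmail` are hypothetical helpers, and the email pattern is deliberately loose (it checks shape only, not validity).

```javascript
// Build a few-shot prompt in the same layout as the example above.
function buildFewShotPrompt(instruction, examples, input) {
  const shots = examples.map(
    (ex, i) => `Example ${i + 1}:\nInput: "${ex.input}"\nOutput: ${ex.output}`
  );
  return [
    instruction,
    ...shots,
    "Now do this:",
    `Input: "${input}"`,
    "Output:",
  ].join("\n");
}

// True only when the reply is a single email-shaped token with no
// preamble, explanation, or surrounding whitespace-separated text.
function isBareEmail(reply) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(reply.trim());
}

const prompt = buildFewShotPrompt(
  "Extract email addresses from text. Output only the email, nothing else.",
  [
    { input: "Reach out to jane@company.com for details", output: "jane@company.com" },
    { input: "Email bob.smith@tech.org if interested", output: "bob.smith@tech.org" },
  ],
  "Contact John at john.doe@example.com"
);
console.log(prompt);
```

Running `isBareEmail` on the model's reply gives a cheap signal that the "output only" constraint was respected before the value is used downstream.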
Issue #3: Invalid JSON Output
Symptom
Your Prompt:
Extract name and email as JSON from: "John Doe, john@example.com"
Model Output:
Sure! Here's the JSON:
```json
{
"name": "John Doe",
"email": "john@example.com"
}
```
This JSON contains...
Your Code:
const data = JSON.parse(response);
// Error: SyntaxError: Unexpected token S in JSON at position 0
Problem: Response has markdown code blocks and explanation text.
Root Cause
- Markdown formatting - Model wraps JSON in code blocks
- Extra text - Preambles and explanations
- No schema enforcement - Model guesses structure
- Inconsistent keys - Model might use "full_name" vs "name"
Wrong Approach
Return JSON without markdown.
Problem: Model might still add explanations. Not specific enough about schema.
Correct Approach
Technique 1: Explicit JSON-Only Instruction
Extract data as JSON. Output ONLY valid JSON, no markdown formatting, no explanations.
Schema:
{
"name": "string",
"email": "string"
}
Input: "John Doe, john@example.com"
Output:
Technique 2: Function Calling (OpenAI)
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "user", content: "Extract name and email from: 'John Doe, john@example.com'" }
],
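// Legacy parameters: newer API versions replace "functions"/"function_call" with "tools"/"tool_choice"
functions: [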
{
name: "extract_contact",
description: "Extract contact information",
parameters: {
type: "object",
properties: {
name: { type: "string", description: "Person's full name" },
email: { type: "string", description: "Email address" }
},
required: ["name", "email"]
}
}
],
function_call: { name: "extract_contact" }
});
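// function_call.arguments is a JSON string; it is usually valid, but the model can emit malformed JSON, so guard the parse
const args = JSON.parse(response.choices[0].message.function_call.arguments);
// Expected: { name: "John Doe", email: "john@example.com" }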
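Because the arguments arrive as a string, a guarded parse is safer than calling `JSON.parse` directly. The sketch below accepts either the legacy `function_call` field or the newer `tool_calls` array, both of which appear in this article; `parseCallArguments` is a hypothetical helper and the message objects are hand-written stand-ins for an API response.

```javascript
// Return parsed call arguments without throwing.
// Looks at tool_calls first, then falls back to the legacy function_call.
function parseCallArguments(message) {
  const raw =
    message.tool_calls?.[0]?.function?.arguments ??
    message.function_call?.arguments;
  if (typeof raw !== "string") {
    return { ok: false, error: "No function call in message" };
  }
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: `Arguments are not valid JSON: ${err.message}` };
  }
}

// Stand-in for response.choices[0].message
const legacyMessage = {
  function_call: {
    name: "extract_contact",
    arguments: '{"name": "John Doe", "email": "john@example.com"}',
  },
};
console.log(parseCallArguments(legacyMessage));
```

Returning a result object instead of throwing lets the caller decide whether to retry, log, or fall back.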
Technique 3: Structured Output (OpenAI)
const completion = await openai.beta.chat.completions.parse({
model: "gpt-4o-2024-08-06",
messages: [
{ role: "user", content: "Extract name and email from: 'John Doe, john@example.com'" }
],
response_format: {
type: "json_schema",
json_schema: {
name: "contact_schema",
strict: true,
schema: {
type: "object",
properties: {
name: { type: "string" },
email: { type: "string" }
},
required: ["name", "email"],
additionalProperties: false
}
}
}
});
// Matches the schema unless the model refuses or the output is cut off by max_tokens
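Even with a strict schema, two cases are worth handling before the content is parsed: a refusal and a truncated response. The sketch below assumes the Chat Completions response exposes `finish_reason` on the choice and `refusal` on the message; treat both field names as assumptions to verify against the current API reference. `readStructuredChoice` is a hypothetical helper and the choice objects are hand-written stand-ins.

```javascript
// Inspect one choice from a structured-output response.
// Assumed fields: choice.finish_reason, choice.message.refusal,
// choice.message.content (a JSON string).
function readStructuredChoice(choice) {
  if (choice.finish_reason === "length") {
    // Output hit the token limit, so the JSON is probably incomplete.
    return { ok: false, reason: "truncated" };
  }
  if (choice.message.refusal) {
    return { ok: false, reason: "refusal", detail: choice.message.refusal };
  }
  return { ok: true, value: JSON.parse(choice.message.content) };
}

const okChoice = {
  finish_reason: "stop",
  message: { content: '{"name": "John Doe", "email": "john@example.com"}', refusal: null },
};
console.log(readStructuredChoice(okChoice));
```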
Technique 4: Post-Processing (Fallback)
function extractJSON(text) {
// Remove markdown code blocks
text = text.replace(/```json\n?/g, '').replace(/```\n?/g, '');
// Find JSON object
const match = text.match(/\{[\s\S]*\}/);
if (!match) throw new Error('No JSON found');
return JSON.parse(match[0]);
}
const cleaned = extractJSON(response.choices[0].message.content);
Why This Works
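- Function Calling passes a schema at the API level, which steers the model far more reliably than prompt text (arguments can still be malformed unless strict mode is enabled)
- Structured Output uses constrained decoding, so the response conforms to the schema (barring refusals or truncation)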
- Explicit schema in prompt reduces ambiguity
- Post-processing handles legacy models or edge cases
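A greedy pattern such as `/\{[\s\S]*\}/` runs from the first `{` to the last `}`, so it breaks when the trailing explanation also contains braces. One alternative, sketched here under the assumption that the first `{` in the text starts the JSON object, is to scan for the matching closing brace while skipping braces inside strings. `extractFirstJsonObject` is a hypothetical name.

```javascript
// Extract and parse the first balanced {...} object in a string.
// Braces inside JSON string literals are ignored.
function extractFirstJsonObject(text) {
  const start = text.indexOf("{");
  if (start === -1) throw new Error("No JSON object found");
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;          // character after a backslash
      else if (ch === "\\") escaped = true;  // start of an escape sequence
      else if (ch === '"') inString = false; // end of the string literal
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(text.slice(start, i + 1));
    }
  }
  throw new Error("Unbalanced braces: JSON object never closes");
}

const messy =
  'Sure! {"name": "John Doe", "note": "a } inside a string"} This JSON contains {curly} text.';
console.log(extractFirstJsonObject(messy));
```

The limitation is the stated assumption: if the preamble itself contains a `{`, the scan starts in the wrong place and `JSON.parse` throws, which is still a clean failure the caller can retry on.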
Issue #4: Model Not Following Format
Symptom
Your Prompt:
List 3 programming languages. Format:
1. [Language] - [Use case]
Model Output:
Here are three popular programming languages:
• Python is great for data science
• JavaScript for web development
• Go is used for backend systems
Expected:
1. Python - Data science and machine learning
2. JavaScript - Frontend and backend web development
3. Go - High-performance backend systems
Root Cause
- Competing instructions - Model prioritizes "helpfulness" over format
- Vague format specification - "Format:" is not imperative enough
- No enforcement mechanism - Model can ignore format without penalty
Wrong Approach
Follow the format exactly!
Problem: Still not specific about what "exactly" means.
Correct Approach
Technique 1: Template with Placeholders
List 3 programming languages using this EXACT template:
1. [LANGUAGE] - [USE_CASE]
2. [LANGUAGE] - [USE_CASE]
3. [LANGUAGE] - [USE_CASE]
Do not add any text before or after the list. Do not use bullet points. Numbers and dashes must be exactly as shown.
Technique 2: Output Example First
You must format your response exactly like this example:
Example:
1. Python - Data science
2. JavaScript - Web development
3. Go - Backend systems
Now list 3 database systems in the same format:
Technique 3: Regex Validation Threat
List 3 programming languages. Output must match this regex pattern:
^\d+\. \w+ - .+$
Format:
1. [Language] - [Use case]
Your output will be validated against the regex. Any deviation will fail.
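If the prompt says the output will be validated, it is worth actually running that check. The sketch below applies the same pattern line by line; `validateNumberedList` is a hypothetical helper. Note that `\w+` only matches single-word names made of letters, digits, and underscores, so a name like "C++" would be rejected by this pattern.

```javascript
// Same pattern as in the prompt, applied to one line at a time.
const LINE_PATTERN = /^\d+\. \w+ - .+$/;

// Check that every line matches the pattern and that the number of
// lines equals the number of items requested.
function validateNumberedList(output, expectedCount) {
  const lines = output.trim().split("\n");
  const badLines = lines.filter((line) => !LINE_PATTERN.test(line));
  return {
    valid: badLines.length === 0 && lines.length === expectedCount,
    badLines,
    count: lines.length,
  };
}

const bulleted = [
  "Here are three popular programming languages:",
  "• Python is great for data science",
  "• JavaScript for web development",
  "• Go is used for backend systems",
].join("\n");

console.log(validateNumberedList(bulleted, 3).valid);
```

The `badLines` array doubles as feedback for a retry prompt, since it names exactly which lines broke the format.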
Technique 4: System Message + User Constraint
{
role: "system",
content: "You are an API that outputs data in exact formats. Never add preamble, explanations, or deviate from specified format."
},
{
role: "user",
content: `Output 3 languages in format:
1. [Language] - [Use case]
2. [Language] - [Use case]
3. [Language] - [Use case]`
}
Why This Works
- Exact template removes interpretation space
- Examples are clearer than descriptions
- Stating that output will be validated signals importance; actually running the check is what catches failures
- System message sets behavior mode upfront
Issue #5: Forcing Structured Output (Production)
Problem
You need guaranteed structured output for:
- API responses
- Database inserts
- UI rendering
- Downstream processing
Prompts alone are not reliable enough: even a success rate of roughly 80-95% leaves a steady stream of malformed outputs at production volume.
Production Solutions
Solution 1: OpenAI Function Calling
Well supported on GPT-4 and GPT-3.5 models
const tools = [
{
type: "function",
function: {
name: "save_user_data",
description: "Save extracted user information",
parameters: {
type: "object",
properties: {
name: { type: "string" },
age: { type: "integer" },
email: { type: "string", format: "email" },
interests: {
type: "array",
items: { type: "string" }
}
},
required: ["name", "email"]
}
}
}
];
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{ role: "user", content: "Extract: 'John, 30, john@test.com, loves coding and gaming'" }
],
tools: tools,
tool_choice: { type: "function", function: { name: "save_user_data" } }
});
const data = JSON.parse(response.choices[0].message.tool_calls[0].function.arguments);
Solution 2: JSON Mode (OpenAI)
const response = await openai.chat.completions.create({
model: "gpt-4-turbo",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "Extract user data as JSON with keys: name, age, email, interests (array)"
},
{
role: "user",
content: "John, 30, john@test.com, loves coding and gaming"
}
]
});
const data = JSON.parse(response.choices[0].message.content);
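Important: You MUST mention "JSON" in the prompt when using json_object mode. JSON mode guarantees syntactically valid JSON, not that the keys or types match what you asked for, so validate the parsed object.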
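Since JSON mode constrains syntax but not shape, the parsed object still needs a schema check. The sketch below is a hand-written validator for the four keys used in this example; `validateUser` is a hypothetical helper, the email pattern is intentionally loose, and in a real project a library such as Zod would likely replace it.

```javascript
// Return a list of problems; an empty list means the object is usable.
function validateUser(data) {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["root must be an object"];
  }
  const errors = [];
  if (typeof data.name !== "string" || data.name === "") {
    errors.push("name must be a non-empty string");
  }
  // age is optional, but must be an integer when present
  if (data.age !== undefined && !Number.isInteger(data.age)) {
    errors.push("age must be an integer");
  }
  if (typeof data.email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(data.email)) {
    errors.push("email must look like an email address");
  }
  if (!Array.isArray(data.interests) || !data.interests.every((x) => typeof x === "string")) {
    errors.push("interests must be an array of strings");
  }
  return errors;
}

// Valid JSON syntax, wrong shape: renamed key, string age, string interests.
const drifted = JSON.parse(
  '{"full_name": "John", "age": "30", "email": "john@test.com", "interests": "coding"}'
);
console.log(validateUser(drifted));
```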
Solution 3: Structured Output (OpenAI Strict)
Newest and most reliable: with strict mode, the output is constrained to the schema
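import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment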
const UserSchema = z.object({
name: z.string(),
age: z.number().int().positive(),
email: z.string().email(),
interests: z.array(z.string())
});
const completion = await openai.beta.chat.completions.parse({
model: "gpt-4o-2024-08-06",
messages: [
{ role: "system", content: "Extract user data" },
{ role: "user", content: "John, 30, john@test.com, loves coding and gaming" }
],
response_format: zodResponseFormat(UserSchema, "user_data")
});
const data = completion.choices[0].message.parsed;
// Typed by UserSchema; check message.refusal before relying on the parsed value
Solution 4: Pydantic with Instructor (Python)
from pydantic import BaseModel, EmailStr  # EmailStr requires the email-validator package
from typing import List
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())  # Instructor 1.x; older releases used instructor.patch(OpenAI())
class UserData(BaseModel):
name: str
age: int
email: EmailStr
interests: List[str]
data = client.chat.completions.create(
model="gpt-4",
response_model=UserData,
messages=[
{"role": "user", "content": "John, 30, john@test.com, loves coding and gaming"}
]
)
print(data.model_dump())
# {'name': 'John', 'age': 30, 'email': 'john@test.com', 'interests': ['coding', 'gaming']}
Solution 5: Validation + Retry Pattern
async function extractWithRetry(text, maxRetries = 3) {
for (let i = 0; i < maxRetries; i++) {
const response = await openai.chat.completions.create({
model: "gpt-4",
messages: [
{
role: "system",
content: "Extract as JSON: {name: string, age: number, email: string, interests: string[]}"
},
{ role: "user", content: text }
]
});
try {
const data = JSON.parse(response.choices[0].message.content);
// Validate schema
if (!data.name || !data.email) {
throw new Error('Missing required fields');
}
return data;
} catch (error) {
if (i === maxRetries - 1) throw error;
// Note: the next attempt reuses the same prompt; appending the error to the messages would give the model a chance to self-correct
console.log(`Attempt ${i + 1} failed, retrying...`);
}
}
}
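A variant of the retry pattern that feeds the failure back to the model can be written so the model call is injected, which also makes it testable without network access. This is a sketch under stated assumptions: `generateWithRetry` is a hypothetical helper, `generate(feedback)` stands for any function that calls the model (including the feedback text in the prompt when it is not null) and resolves to the raw reply, and `validate(data)` returns a list of error strings.

```javascript
// Call `generate`, parse and validate the reply, and retry with the
// previous error passed back in as feedback.
async function generateWithRetry(generate, validate, maxRetries = 3) {
  let feedback = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const raw = await generate(feedback);
    try {
      const data = JSON.parse(raw);
      const errors = validate(data);
      if (errors.length === 0) return { data, attempts: attempt };
      feedback = `Validation failed: ${errors.join("; ")}`;
    } catch (err) {
      feedback = `Invalid JSON: ${err.message}`;
    }
  }
  throw new Error(`Gave up after ${maxRetries} attempts. Last problem: ${feedback}`);
}

// Fake model for demonstration: fails twice, then succeeds.
const cannedReplies = [
  'Sure! {"name": "John"}',
  '{"name": "John"}',
  '{"name": "John", "email": "john@test.com"}',
];
const fakeModel = async () => cannedReplies.shift();
const requireFields = (d) =>
  ["name", "email"].filter((k) => !d[k]).map((k) => `missing ${k}`);

generateWithRetry(fakeModel, requireFields).then((result) => console.log(result));
```

In a real integration, `generate` would append the feedback string as an extra user message before calling the API.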
Common Mistakes
Mistake #1: Assuming Models Read Like Humans
Wrong:
List the items in alphabetical order.
Problem: Model might alphabetize by first letter only, or ignore case.
Correct:
List items in strict alphabetical order (case-insensitive, full string comparison).
Example:
- Apple
- Banana
- Cherry
Not:
- Banana
- Apple
- cherry
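Ordering is also easy to verify, or simply enforce, in code rather than leaving it to the model. The sketch below compares lowercased strings, which matches the "case-insensitive, full string comparison" rule for plain ASCII; `isAlphabetical` and `sortCaseInsensitive` are hypothetical helpers, and locale-aware ordering (accents, non-Latin scripts) is out of scope here.

```javascript
// True when items are in case-insensitive alphabetical order.
function isAlphabetical(items) {
  for (let i = 1; i < items.length; i++) {
    if (items[i - 1].toLowerCase() > items[i].toLowerCase()) return false;
  }
  return true;
}

// Return a sorted copy without mutating the input.
function sortCaseInsensitive(items) {
  return [...items].sort((a, b) => {
    const x = a.toLowerCase();
    const y = b.toLowerCase();
    return x < y ? -1 : x > y ? 1 : 0;
  });
}

console.log(isAlphabetical(["Apple", "Banana", "Cherry"]));
console.log(sortCaseInsensitive(["Banana", "Apple", "cherry"]));
```

For a requirement this mechanical, sorting the model's items after the fact is usually more dependable than any wording in the prompt.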
Mistake #2: Using Vague Constraints
Wrong:
Keep response brief.
Correct:
Response must be at most 50 words.
Mistake #3: Ignoring System Messages
Wrong:
// All instructions in user message
messages: [
{ role: "user", content: "You are a JSON API. Extract name from 'John Doe'" }
]
Correct:
messages: [
{ role: "system", content: "You are a JSON API. Output only valid JSON, no text." },
{ role: "user", content: "Extract name from 'John Doe'" }
]
Models generally give system messages more weight than user messages, so behavioral rules placed there are followed more consistently.
Mistake #4: Not Using Model-Specific Features
Each model family has strengths:
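- GPT-4/3.5: Function calling, JSON mode, structured outputs
- Claude: Long context, tool use with JSON schemas, markdown formatting
- Llama: Fast when self-hosted, good for constrained generation (for example, grammar-based decoding in llama.cpp)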
Use the right tool for the job.
Mistake #5: Over-Reliance on Model Behavior
Never assume the model will follow your format 100% of the time without enforcement.
Always:
- Use schema enforcement (function calling, Pydantic)
- Validate output
- Have fallback/retry logic
Best Practices Checklist
For Output Length
- [ ] Specify exact word/paragraph count
- [ ] Provide structural outline
- [ ] Define audience and detail level
- [ ] Use few-shot examples for expected length
For Output Format
- [ ] Use system message to set behavior mode
- [ ] Provide exact template with placeholders
- [ ] Show format examples, not just descriptions
- [ ] Specify what NOT to include (e.g., "no markdown")
For JSON/Structured Output
- [ ] Use function calling or structured output API features
- [ ] Define complete schema upfront
- [ ] Validate output with Zod/Pydantic
- [ ] Implement retry logic with error feedback
- [ ] Strip markdown/extra text in post-processing
For Production Reliability
- [ ] Use explicit system messages
- [ ] Prefer API-level constraints over prompt instructions
- [ ] Test with edge cases (empty input, special characters)
- [ ] Log failures for prompt refinement
- [ ] Set temperature=0 for the most consistent output (it is not strictly deterministic)
- [ ] Monitor success rate and iterate
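The last two checklist items, logging failures and monitoring the success rate, can start as a very small in-memory tracker. The sketch below is one possible shape; `OutputTracker` and its method names are hypothetical, and a production system would persist the records rather than hold them in memory.

```javascript
// Track validation outcomes for model outputs.
class OutputTracker {
  constructor() {
    this.total = 0;
    this.failures = [];
  }

  // Record one output together with its validation errors (empty = success).
  record(raw, errors) {
    this.total++;
    if (errors.length > 0) this.failures.push({ raw, errors });
  }

  // Fraction of recorded outputs that passed validation.
  successRate() {
    return this.total === 0 ? 1 : (this.total - this.failures.length) / this.total;
  }

  // Error messages with their counts, most frequent first.
  topErrors() {
    const counts = new Map();
    for (const failure of this.failures) {
      for (const error of failure.errors) {
        counts.set(error, (counts.get(error) ?? 0) + 1);
      }
    }
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
  }
}

const tracker = new OutputTracker();
tracker.record('{"name": "John"}', []);
tracker.record("Sure! Here is the JSON", ["invalid_json"]);
console.log(tracker.successRate(), tracker.topErrors());
```

The `topErrors` output points at which failure mode to address first when refining the prompt or schema.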
Quick Reference: Output Control Parameters
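// Reasonable starting settings for structured output with GPT-4
{
model: "gpt-4-turbo",
temperature: 0, // Most consistent output (not strictly deterministic)
max_tokens: 500, // Limit length; too low a cap truncates the JSON mid-object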
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "Output only valid JSON. No explanations."
},
// ...
]
}
Temperature:
- 0.0 - Near-deterministic, consistent (use for structured output)
- 0.3-0.7 - Balanced (use for creative but controlled)
- 1.0+ - Creative, unpredictable (avoid for production APIs)
Max Tokens:
- Set to reasonable limit to prevent verbosity
- 500 for structured data extraction
- 1000 for explanations
- 2000+ for long-form content
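These guidelines can be captured as named presets so that each call site picks a task type instead of hand-tuning numbers. The sketch below is illustrative only: `PRESETS` and `requestSettings` are hypothetical names, the token limits come from the list above, and the specific temperatures chosen for explanations (0.3) and long-form content (0.7) are assumptions picked from within the "balanced" range.

```javascript
// Starting points per task type; tune against your own failure logs.
const PRESETS = {
  extraction: { temperature: 0, max_tokens: 500 },
  explanation: { temperature: 0.3, max_tokens: 1000 }, // temperature is an assumption
  longForm: { temperature: 0.7, max_tokens: 2000 },    // temperature is an assumption
};

// Return a fresh copy so callers can override fields safely.
function requestSettings(task) {
  const preset = PRESETS[task];
  if (!preset) throw new Error(`Unknown task type: ${task}`);
  return { ...preset };
}

console.log(requestSettings("extraction"));
```

The returned object can be spread into the request options alongside `model` and `messages`.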
Debugging Workflow
When output is wrong:
1. Check system message - Is behavior mode set?
2. Review prompt clarity - Remove ambiguity
3. Add few-shot examples - Show exact format
4. Enable JSON mode - If using JSON
5. Lower temperature - Reduce randomness
6. Use function calling - For guaranteed structure
7. Log failures - Analyze patterns
8. Iterate prompt - Based on failure modes
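Analyzing failure patterns is easier when each bad output is tagged with a rough category at logging time. The sketch below is a heuristic classifier for JSON outputs; `classifyOutput` and its category names are hypothetical, and the rules are deliberately simple, so expect some misclassification on unusual outputs.

```javascript
// Assign a coarse failure category to a raw model output that was
// expected to be a JSON object.
function classifyOutput(raw) {
  const text = raw.trim();
  try {
    JSON.parse(text);
    return "ok";
  } catch {
    // Not parseable as-is; work out why below.
  }
  const fence = "`".repeat(3); // markdown code fence marker
  if (text.includes(fence)) return "markdown_fence";
  const first = text.indexOf("{");
  const last = text.lastIndexOf("}");
  if (first === -1) return "no_json";
  if (first > 0) return "preamble";
  if (last !== -1 && last < text.length - 1) return "trailing_text";
  return "malformed_json";
}

console.log(classifyOutput('Sure! {"name": "John Doe"}'));
```

Counting these categories over a day of traffic shows whether to tighten the system message (preamble, trailing text), add post-processing (markdown fences), or raise the token limit (malformed, often truncated, JSON).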
Conclusion
LLM output control is engineering, not magic:
- Short output → Specify length explicitly
- Verbose output → Use system message + constraints
- Invalid JSON → Function calling or structured output
- Format ignored → Templates + examples + validation
The hierarchy of reliability (the percentages are rough rules of thumb, not measured benchmarks):
1. Structured Output API (schema adherence enforced by constrained decoding)
2. Function Calling (roughly 99% reliability)
3. JSON Mode + validation (roughly 95% reliability)
4. Prompt engineering alone (roughly 80-90% reliability)
For production, always use #1 or #2. Prompts are documentation, not enforcement.
Pro Tip: Start with the strictest method (Structured Output) and only fall back to prompts if your model doesn't support it. Don't try to "fix" a prompt that should be a schema.