Prompt Instruction Counter

How many instructions can an LLM reliably follow? Research shows that instruction-following accuracy degrades as the number of instructions in a prompt grows. This tool helps you measure and optimize your prompt's instruction density.

Why Instruction Count Matters

Every bullet point, rule, and constraint in your system prompt is an instruction the model must track and follow. Research on instruction following (such as IFEval, IFScale, and related work) suggests that:

Frontier models (GPT-4, Claude 3.5) handle ~150–250 instructions near-perfectly
Large general models show linear degradation past ~100 instructions
Smaller models experience exponential decay past ~50 instructions

What This Tool Measures

The Instruction Counter analyzes your prompt and identifies:

Instruction Types

Imperative – Direct commands like "Use X", "Create Y", "Avoid Z"
Modal – Obligation words: "must", "should", "required"
Negation – Prohibitions: "don't", "never", "avoid"
Conditional – Logic branches: "if...then", "when the user..."
Constraint – Restrictions: "always", "only", "unless"
Format – Output specifications: "respond in JSON", "format as..."
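
As a rough illustration of how the six types above could be detected, here is a minimal Python sketch. The tool's real patterns are not shown on this page, so `SIGNAL_PATTERNS` and `detect_signals` are hypothetical names, and the regexes cover only the example keywords listed above (plus a few extra imperative verbs added as an assumption).

```python
import re

# Hypothetical patterns, one per instruction type listed above.
# These are illustrative assumptions, not the tool's actual regexes.
SIGNAL_PATTERNS = {
    # A command verb at the start of the unit, optionally after a bullet marker.
    "imperative": re.compile(
        r"^\s*(?:[-*]\s+)?(use|create|avoid|write|return|include|ensure|keep)\b",
        re.IGNORECASE,
    ),
    "modal": re.compile(r"\b(must|should|required)\b", re.IGNORECASE),
    "negation": re.compile(r"\b(don't|do not|never|avoid)\b", re.IGNORECASE),
    "conditional": re.compile(r"\b(if|when)\b", re.IGNORECASE),
    "constraint": re.compile(r"\b(always|only|unless)\b", re.IGNORECASE),
    "format": re.compile(r"\b(respond in|format as)\b", re.IGNORECASE),
}


def detect_signals(unit: str) -> list[str]:
    """Return the instruction types whose pattern matches the given unit."""
    return [name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(unit)]


print(detect_signals("Never respond in XML unless asked."))
# ['negation', 'constraint', 'format']
```

A single unit can carry several signals at once, which is why the function returns a list rather than one label.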

Risk Levels

Level	Instructions	Guidance
Low	<100	Generally safe; smaller models may start to degrade past ~50
Medium	100–149	Smaller models may struggle
High	150–199	Approaching frontier model limits
Critical	≥200	Expect instruction-following degradation
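
The thresholds in the table can be expressed as a small lookup. This is a sketch of the mapping only; the function name `risk_level` is an assumption and not part of the tool's documented interface.

```python
def risk_level(instruction_count: int) -> str:
    """Map an instruction count to the risk level from the table above."""
    if instruction_count < 100:
        return "Low"
    if instruction_count < 150:
        return "Medium"
    if instruction_count < 200:
        return "High"
    return "Critical"


print(risk_level(120))  # Medium
```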

How to Use the Results

High instruction count? Consolidate related rules into fewer, broader guidelines
High-density units? Split complex bullet points that contain multiple directives
Lots of negations? Reframe as positive instructions where possible
Many conditionals? Consider whether every edge case is necessary

Methodology

The analyzer uses heuristic pattern matching to identify instruction signals. It's deterministic (no AI calls) and runs entirely in your browser. The approach:

Strips front matter and identifies sections
Excludes example sections from counting
Extracts atomic units (bullets and sentences)
Detects instruction signals using validated regex patterns
Calculates weighted scores based on signal density
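
The steps above can be sketched end to end as follows. This is an illustrative approximation written in Python for readability, not the tool's browser code. It assumes the prompt is Markdown with `---` front matter and `#` headings, treats any section whose heading contains "example" as an example section, and uses a single simplified `SIGNAL` pattern. The scoring (total signals divided by units) is also an assumption, since the actual weighting is not documented here.

```python
import re

# All names and patterns below are hypothetical stand-ins for the tool's internals.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
HEADING = re.compile(r"^#{1,6}\s+(.*)$")
BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(.*)$")
SIGNAL = re.compile(
    r"\b(must|should|never|always|only|unless|if|when|avoid|use)\b", re.IGNORECASE
)


def strip_front_matter(text: str) -> str:
    """Step 1: drop a leading '---' delimited front matter block, if present."""
    return FRONT_MATTER.sub("", text, count=1)


def extract_units(text: str) -> list[str]:
    """Steps 2-3: skip example sections, then split into bullets and sentences."""
    units: list[str] = []
    in_example = False
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            in_example = "example" in heading.group(1).lower()
            continue
        if in_example or not line.strip():
            continue
        bullet = BULLET.match(line)
        if bullet:
            units.append(bullet.group(1).strip())
        else:
            # Plain prose: split on sentence-ending punctuation.
            parts = re.split(r"(?<=[.!?])\s+", line)
            units.extend(p.strip() for p in parts if p.strip())
    return units


def analyze(prompt: str) -> dict:
    """Steps 4-5: count signals per unit and derive a simple density score."""
    units = extract_units(strip_front_matter(prompt))
    per_unit = [len(SIGNAL.findall(u)) for u in units]
    total = sum(per_unit)
    return {
        "units": len(units),
        "instruction_units": sum(1 for n in per_unit if n > 0),
        "total_signals": total,
        "density": total / len(units) if units else 0.0,
    }
```

Because every step is plain string and regex processing, the same input always produces the same counts, which is the deterministic behavior described above.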

Related Tools

For deeper prompt analysis including structure visualization, issue detection, and semantic duplicate finding, check out the Prompt Analyzer.

References

IFEval: Instruction Following Evaluation – Foundational research on instruction following capabilities
IFScale: Scaling Instruction Following – Research on model capacity limits