Prompt Instruction Counter
Count instructions in your LLM prompts and assess complexity risk based on research thresholds
How many instructions can an LLM reliably follow? Research shows that model performance degrades as instruction count increases. This tool helps you measure and optimize your prompt's instruction density.
Why Instruction Count Matters
Every bullet point, rule, and constraint in your system prompt is an instruction the model must track and follow. Research on instruction following (like IFEval and related work) shows that:
- Frontier reasoning models (GPT-4, Claude 3.5) handle ~150–250 instructions near-perfectly
- Large general models show linear degradation past ~100 instructions
- Smaller models experience exponential decay past ~50 instructions
What This Tool Measures
The Instruction Counter analyzes your prompt and identifies:
Instruction Types
- Imperative – Direct commands like "Use X", "Create Y", "Avoid Z"
- Modal – Obligation words: "must", "should", "required"
- Negation – Prohibitions: "don't", "never", "avoid"
- Conditional – Logic branches: "if...then", "when the user..."
- Constraint – Restrictions: "always", "only", "unless"
- Format – Output specifications: "respond in JSON", "format as..."
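The sketch below shows how detection of these signal types could work in the browser. The TypeScript and the regular expressions here are illustrative simplifications, not the tool's actual pattern set.

```typescript
// Illustrative signal detection — the patterns below are examples,
// not the tool's real (more extensive) rule set.
type SignalType =
  | "imperative"
  | "modal"
  | "negation"
  | "conditional"
  | "constraint"
  | "format";

const SIGNAL_PATTERNS: Record<SignalType, RegExp> = {
  imperative: /^\s*[-*]?\s*(use|create|avoid|write|include|return)\b/i,
  modal: /\b(must|should|shall|required|need to)\b/i,
  negation: /\b(do not|don'?t|never|avoid|no)\b/i,
  conditional: /\b(if|when|whenever|unless)\b/i,
  constraint: /\b(always|only|at most|at least|exactly|unless)\b/i,
  format: /\b(respond in|format as|output as|in json|as markdown)\b/i,
};

// Return every signal type that fires on a single atomic unit (a bullet or sentence).
function detectSignals(unit: string): SignalType[] {
  return (Object.keys(SIGNAL_PATTERNS) as SignalType[]).filter((type) =>
    SIGNAL_PATTERNS[type].test(unit)
  );
}
```

Note that one unit can carry several signals at once ("Never respond in JSON unless asked" is a negation, a format spec, and a constraint), which is what drives the density weighting described under Methodology.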
Risk Levels
| Level | Instructions | Guidance |
|---|---|---|
| Low | <100 | Safe for all model sizes |
| Medium | 100–149 | Smaller models may struggle |
| High | 150–199 | Approaching frontier model limits |
| Critical | ≥200 | Expect instruction-following degradation |
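These thresholds reduce to a simple lookup. A minimal sketch (the `riskLevel` function name is assumed for illustration):

```typescript
// Risk thresholds taken directly from the table above.
type RiskLevel = "Low" | "Medium" | "High" | "Critical";

function riskLevel(instructionCount: number): RiskLevel {
  if (instructionCount >= 200) return "Critical"; // expect instruction-following degradation
  if (instructionCount >= 150) return "High";     // approaching frontier model limits
  if (instructionCount >= 100) return "Medium";   // smaller models may struggle
  return "Low";                                   // safe for all model sizes
}

// Example: riskLevel(137) === "Medium"
```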
How to Use the Results
- High instruction count? Consolidate related rules into fewer, broader guidelines
- High-density units? Split complex bullet points that contain multiple directives
- Lots of negations? Reframe as positive instructions where possible
- Many conditionals? Consider if all edge cases are necessary
Methodology
The analyzer uses heuristic pattern matching to identify instruction signals. It's deterministic (no AI calls) and runs entirely in your browser. The approach:
- Strips front matter and identifies sections
- Excludes example sections from counting
- Extracts atomic units (bullets and sentences)
- Detects instruction signals using validated regex patterns
- Calculates weighted scores based on signal density
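A simplified sketch of that pipeline is shown below. The helper logic, heading detection, and weighting scheme are assumptions for illustration, not the tool's exact implementation.

```typescript
// Simplified sketch of the analysis pipeline described above.
interface AnalysisResult {
  instructionCount: number;
  weightedScore: number;
}

// Combined signal pattern; see the per-type patterns in the earlier sketch.
const SIGNAL =
  /\b(must|should|never|always|only|avoid|do not|don'?t|if|when|unless|respond in|format as)\b/gi;

function analyzePrompt(promptText: string): AnalysisResult {
  // 1. Strip YAML front matter, if present.
  const body = promptText.replace(/^---\n[\s\S]*?\n---\n/, "");

  // 2. Split into sections on markdown headings and drop example sections
  //    (heading-based detection is an assumption here).
  const sections = body
    .split(/\n(?=#+\s)/)
    .filter((section) => !/^#+\s*examples?\b/i.test(section));

  // 3. Extract atomic units: bullet items and sentences.
  const units = sections.flatMap((section) =>
    section
      .split(/\n\s*[-*]\s+|(?<=[.!?])\s+/)
      .map((unit) => unit.trim())
      .filter((unit) => unit.length > 0)
  );

  // 4–5. Detect signals per unit and accumulate a density-weighted score.
  let instructionCount = 0;
  let weightedScore = 0;
  for (const unit of units) {
    const matches = unit.match(SIGNAL) ?? [];
    if (matches.length > 0) {
      instructionCount += 1;
      weightedScore += matches.length; // more signals in one unit ⇒ higher density
    }
  }
  return { instructionCount, weightedScore };
}
```

Because everything is regex-based and deterministic, the same prompt always produces the same counts, and nothing leaves the browser.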
Related Tools
For deeper prompt analysis including structure visualization, issue detection, and semantic duplicate finding, check out the Prompt Analyzer.
References
- IFEval: Instruction Following Evaluation – Foundational research on instruction following capabilities
- IFScale: Scaling Instruction Following – Research on model capacity limits