Prompt Instruction Counter

Count instructions in your LLM prompts and assess complexity risk based on research thresholds

How many instructions can an LLM reliably follow? Research shows that model performance degrades as instruction count increases. This tool helps you measure and optimize your prompt's instruction density.

Why Instruction Count Matters

Every bullet point, rule, and constraint in your system prompt is an instruction the model must track and follow. Research on instruction following (like IFEval and related work) shows that:

  • Frontier reasoning models (GPT-4, Claude 3.5) handle ~150-250 instructions near-perfectly
  • Large general models show linear degradation past ~100 instructions
  • Smaller models experience exponential decay past ~50 instructions

What This Tool Measures

The Instruction Counter analyzes your prompt and identifies:

Instruction Types

  • Imperative – Direct commands like "Use X", "Create Y", "Avoid Z"
  • Modal – Obligation words: "must", "should", "required"
  • Negation – Prohibitions: "don't", "never", "avoid"
  • Conditional – Logic branches: "if...then", "when the user..."
  • Constraint – Restrictions: "always", "only", "unless"
  • Format – Output specifications: "respond in JSON", "format as..."
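
As a rough illustration of how these categories could be detected, here is a minimal Python sketch. The tool itself runs in the browser and its real patterns are not shown on this page, so the SIGNAL_PATTERNS table and the detect_signals helper are hypothetical names, and the word lists are taken only from the examples above plus a few assumed imperative verbs.

```python
import re
from typing import List

# Hypothetical patterns, one per instruction type listed above.
# Illustrative only: the tool's actual regexes are not published here.
SIGNAL_PATTERNS = {
    # Assumed verb list; only "use", "create", "avoid" come from the text above.
    "imperative": re.compile(r"^\s*(use|create|avoid|write|return|include|keep)\b", re.IGNORECASE),
    "modal": re.compile(r"\b(must|should|required)\b", re.IGNORECASE),
    "negation": re.compile(r"\b(don't|do not|never|avoid)\b", re.IGNORECASE),
    "conditional": re.compile(r"\b(if|when)\b", re.IGNORECASE),
    "constraint": re.compile(r"\b(always|only|unless)\b", re.IGNORECASE),
    "format": re.compile(r"\b(respond in|format as)\b", re.IGNORECASE),
}


def detect_signals(unit: str) -> List[str]:
    """Return the names of every instruction type whose pattern matches the unit."""
    return [name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(unit)]


if __name__ == "__main__":
    for text in ["Use British spelling.", "If the user asks for data, respond in JSON."]:
        print(text, "->", detect_signals(text))
```

A single unit can carry several signals at once: in this sketch "Avoid jargon." counts as both imperative and negation, which matches the overlap between the first and third categories above.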

Risk Levels

Level      Instructions   Guidance
Low        <100           Safe for all model sizes
Medium     100–149        Smaller models may struggle
High       150–199        Approaching frontier model limits
Critical   ≥200           Expect instruction-following degradation
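
The thresholds above translate directly into a small lookup. The following Python sketch assumes the level is derived from the plain instruction count; the function name risk_level is hypothetical and not part of the tool.

```python
def risk_level(instruction_count: int) -> str:
    """Map an instruction count to the risk level from the table above."""
    if instruction_count < 100:
        return "Low"
    if instruction_count < 150:
        return "Medium"
    if instruction_count < 200:
        return "High"
    return "Critical"


if __name__ == "__main__":
    for count in (42, 120, 175, 230):
        print(count, risk_level(count))
```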

How to Use the Results

  1. High instruction count? Consolidate related rules into fewer, broader guidelines
  2. High-density units? Split complex bullet points that contain multiple directives
  3. Lots of negations? Reframe as positive instructions where possible
  4. Many conditionals? Consider whether every edge case is necessary
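
To make the second tip concrete, here is a hedged Python sketch of how high-density units might be flagged for splitting. The DIRECTIVE pattern, the dense_units function, and the default cutoff of two signals are all assumptions chosen for illustration; the tool's own density rule is not documented on this page.

```python
import re

# Hypothetical directive markers; the tool's real signal list is not given here.
DIRECTIVE = re.compile(
    r"\b(must|should|never|always|only|unless|don't|avoid)\b", re.IGNORECASE
)


def dense_units(units, max_signals=2):
    """Return (unit, signal_count) pairs whose signal count exceeds max_signals."""
    flagged = []
    for unit in units:
        count = len(DIRECTIVE.findall(unit))
        if count > max_signals:
            flagged.append((unit, count))
    return flagged


if __name__ == "__main__":
    rules = [
        "Always cite sources.",
        "You must always use metric units, never round values, and only answer in English.",
    ]
    for unit, count in dense_units(rules):
        print(f"{count} signals, consider splitting: {unit}")
```

In this example the second rule carries four directive markers and would be a candidate for splitting into separate bullets.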

Methodology

The analyzer uses heuristic pattern matching to identify instruction signals. It's deterministic (no AI calls) and runs entirely in your browser. The approach:

  1. Strips front matter and identifies sections
  2. Excludes example sections from counting
  3. Extracts atomic units (bullets and sentences)
  4. Detects instruction signals using validated regex patterns
  5. Calculates weighted scores based on signal density
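
The five steps above can be sketched end to end in a few lines of Python. This is a simplified illustration under stated assumptions, not the tool's implementation: it assumes YAML-style front matter delimited by "---", Markdown-style "#" headings as section markers, a single combined signal regex, and a made-up weighting (1 per instruction unit plus 0.5 for each additional signal). All names (extract_units, analyze, SIGNAL) are hypothetical.

```python
import re
from typing import Dict, List

FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)   # assumed YAML-style block
HEADING = re.compile(r"^#+\s*(.+)$")                         # assumed Markdown headings
BULLET = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+(.*)$")
# Assumed signal words; the tool's validated patterns are not listed on this page.
SIGNAL = re.compile(
    r"\b(must|should|never|always|only|unless|if|when|avoid|use)\b", re.IGNORECASE
)


def extract_units(prompt: str) -> List[str]:
    """Steps 1-3: strip front matter, skip example sections, split into units."""
    text = FRONT_MATTER.sub("", prompt, count=1)
    units: List[str] = []
    in_example = False
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            # A section counts as an example section if its title mentions "example".
            in_example = "example" in heading.group(1).lower()
            continue
        if in_example or not line.strip():
            continue
        bullet = BULLET.match(line)
        if bullet:
            units.append(bullet.group(1).strip())
        else:
            # Prose lines are split into sentences at ., ! or ? followed by whitespace.
            units.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", line) if s.strip())
    return units


def analyze(prompt: str) -> Dict[str, float]:
    """Steps 4-5: count signals per unit and compute a weighted score."""
    units = extract_units(prompt)
    signal_counts = [len(SIGNAL.findall(unit)) for unit in units]
    instructions = sum(1 for n in signal_counts if n > 0)
    # Hypothetical weighting: 1 per instruction unit, plus 0.5 per extra signal.
    weighted = sum(1 + 0.5 * (n - 1) for n in signal_counts if n > 0)
    return {"units": len(units), "instructions": instructions, "weighted_score": weighted}


if __name__ == "__main__":
    sample = (
        "---\ntitle: demo\n---\n"
        "# Rules\n"
        "- Always answer in English.\n"
        "- Never guess; if unsure, say so.\n"
        "You are a helpful assistant. Use short sentences.\n"
        "# Examples\n"
        "- You must not count this line.\n"
    )
    print(analyze(sample))
```

Because every step is plain string and regex processing, the same input always produces the same counts, which is the property the deterministic, no-AI-calls design relies on.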

Related Tools

For deeper prompt analysis including structure visualization, issue detection, and semantic duplicate finding, check out the Prompt Analyzer.

References