Quell reads your docstrings, Pydantic models, and type annotations, extracts every testable requirement, finds which ones have no test, generates pytest tests via a rule engine, verifies each test through a 5-gate pipeline, and writes only proven tests to disk.

Does Quell require an LLM API key?

No. Quell's deterministic rule engine handles ~75% of edge cases with no LLM, no network call, and no API key. The LLM fallback is opt-in and only used for complex cases.

What is the 5-gate pipeline?

Every generated test must pass: Gate 1 (AST valid Python), Gate 2 (not already in a test file), Gate 3 (no shell calls or file writes), Gate 4 (passes against original code), Gate 5 (fails when the requirement is violated). Only gate-5-verified tests are written to disk.

What is the Production Readiness Score (PRS)?

PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100. Tiers: 80-100 Production Ready, 60-79 Review Needed, 0-59 Needs Work.

v2.0.0 · three-bucket output · PRS

Untested edge cases
will bite you
in production.
Quell finds them first.

Run quell find src/ and get three buckets back: tests written to disk, stubs to finish, and gaps with a one-line reason. Every WRITTEN test passed five gates — including proving it actually catches the bug.

Start for free · Star on GitHub

$ pip install quelltest

no LLM key required · runs offline · MIT licensed

$ quell find src/payments/ --fix

v2.0.0

13 requirements · 12 functions · 2 Pydantic models

WRITTEN
✓ test_payment_rejects_zero_amount  94%
✓ test_payment_rejects_negative  91%
✓ test_user_email_must_not_be_empty  88%

SCAFFOLDED
~ test_refund_idempotency  stub

FLAGGED
✗ src/billing.py:142 - external API

PRS: 84 / 100 · Production Ready

code never left your machine

5 verification gates

every WRITTEN test passes all five

3 output buckets

WRITTEN · SCAFFOLDED · FLAGGED

~75% of cases handled by rule engine

no LLM, no network, deterministic

local by default

no source code leaves your machine

THE OUTPUT

Three buckets.
Nothing dropped silently.

[Flow diagram: Edge Cases Found (13 requirements extracted) split into three buckets: 5 / 5 gates, Partial gates, Cannot test]

✓ WRITTEN: Tests written to disk

~ SCAFFOLDED: Stubs for you to complete

✗ FLAGGED: Cannot auto-test, reason shown

✓

WRITTEN

5 / 5 gates passed

Test written and proven

Fully generated, verified to pass on correct code and fail when the guard is removed. Written to your test file via libcst — no string pasting.

# tests/test_payments.py
import pytest
# process_payment is imported from the module under test

def test_payment_rejects_zero_amount():
    with pytest.raises(ValueError):
        process_payment(amount=0, currency="USD")

# ✓ WRITTEN: all 5 gates passed

Source file restored immediately after gate 5 verification.

~

SCAFFOLDED

Partial: gates 1-3 passed

Stub ready for you to finish

Gates 1-3 passed, but gate 4 or 5 could not be verified automatically. Quell writes a stub with a clear comment on what's needed; you fill in the rest.

# tests/test_payments.py
def test_refund_idempotency():
    # TODO: verify idempotent refund behavior
    # Quell: external state makes gate 5 unprovable
    pass  # complete me
 
# ~ SCAFFOLDED — gates 1-3 passed

Stubs are valid Python — they run (and fail) immediately.

✗

FLAGGED

Cannot auto-test

Gap documented, reason given

The requirement exists and is documented, but no automatable test path exists. Quell explains why — side effects, non-determinism, external service — so you can decide.

# FLAGGED requirements:
 
✗ test_external_payment_gateway
  Reason: calls stripe.Charge.create()
  Side effect detected — cannot inject violation
 
# -5 PRS per unflagged requirement

Each flagged item documents exactly why it cannot be auto-tested.

PRODUCTION READINESS SCORE

One number.
How production-ready
are your edge cases?

PRS (0–100) aggregates how many of your requirements have verified tests. It rewards WRITTEN tests, penalizes uncovered gaps, and gives partial credit for SCAFFOLDED stubs. One number in CI.

Score Tiers

80 – 100: Production Ready

60 – 79: Review Needed

0 – 59: Needs Work

+5 for each documented FLAGGED requirement (you know what can't be tested)

-10 for each skipped high-confidence test (coverage gap you chose to ignore)
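The published base formula, PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100, can be sketched in a few lines. This is a minimal illustration, not Quell's implementation: the function names are hypothetical, the bucket counts in the example are invented, the tier cut-offs for fractional scores are an assumption, and the +5 / -10 adjustments are left out because the page does not say how they combine with the base score.

```python
def production_readiness_score(written: int, scaffolded: int, total_requirements: int) -> float:
    """Base PRS: full credit for WRITTEN tests, half credit for SCAFFOLDED stubs."""
    if total_requirements <= 0:
        raise ValueError("total_requirements must be positive")
    return (written * 1.0 + scaffolded * 0.5) / total_requirements * 100


def prs_tier(score: float) -> str:
    """Map a score to the three published tiers (fractional cut-offs assumed)."""
    if score >= 80:
        return "Production Ready"
    if score >= 60:
        return "Review Needed"
    return "Needs Work"


# Hypothetical run: 13 requirements, 10 WRITTEN, 2 SCAFFOLDED, 1 FLAGGED.
score = production_readiness_score(written=10, scaffolded=2, total_requirements=13)
print(f"PRS: {score:.1f} / 100  {prs_tier(score)}")
```

FLAGGED requirements earn no base credit here, which is why documenting them is scored separately.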

[Chart: WRITTEN · SCAFFOLDED · FLAGGED breakdown] avg confidence 91% · 13 edge cases

CI gate config

# pyproject.toml
[tool.quell]
prs_threshold = 80
fail_on_below_threshold = true
 
# CI gate — fails if PRS < 80
$ quell score --gate
  PRS: 84/100  Production Ready  ✓

HOW IT WORKS

From spec to verified test
in a few seconds.

Input Readers: Docstrings · Pydantic · PySpark StructType

→ Requirements: list[Requirement]

→ Test Synthesizer: Rule engine · LLM fallback

→ Verification: 5-gate pipeline

→ Writer: libcst injection

→ ✓ WRITTEN · ~ SCAFFOLDED · ✗ FLAGGED


Read existing specs

Quell AST-scans your source files. No annotations required. It reads Python docstrings (numpy/google/plain), Pydantic model field validators and constraints, and PySpark StructType schemas. Each reader returns [] on any error — it never crashes.

# quell reads what's already there
from pydantic import BaseModel, Field
 
class PaymentRequest(BaseModel):
    amount: float = Field(gt=0, description="Must be positive")
    currency: str = Field(min_length=3, max_length=3)
 
# Extracted: MUST_RAISE, BOUNDARY, ENUM_VALID
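To make the "returns [] on any error" contract concrete, here is a rough sketch of what a Pydantic reader could look like using only the standard-library `ast` module. Everything in it is illustrative: the `Requirement` dataclass, `read_field_bounds`, and the choice to look only at `gt`/`ge`/`lt`/`le` keywords are assumptions, not Quell's actual reader.

```python
import ast
from dataclasses import dataclass

BOUND_KEYWORDS = {"gt", "ge", "lt", "le"}


@dataclass(frozen=True)
class Requirement:  # hypothetical shape, not Quell's real class
    kind: str
    model: str
    field: str
    op: str
    bound: float


def read_field_bounds(source: str) -> list[Requirement]:
    """Collect numeric Field(...) bounds from class bodies; [] on any error."""
    try:
        found = []
        for cls in ast.walk(ast.parse(source)):
            if not isinstance(cls, ast.ClassDef):
                continue
            for stmt in cls.body:
                is_field_call = (
                    isinstance(stmt, ast.AnnAssign)
                    and isinstance(stmt.target, ast.Name)
                    and isinstance(stmt.value, ast.Call)
                    and isinstance(stmt.value.func, ast.Name)
                    and stmt.value.func.id == "Field"
                )
                if not is_field_call:
                    continue
                for kw in stmt.value.keywords:
                    if kw.arg in BOUND_KEYWORDS:
                        bound = ast.literal_eval(kw.value)
                        found.append(Requirement("BOUNDARY", cls.name, stmt.target.id, kw.arg, bound))
        return found
    except Exception:
        return []  # a reader never crashes the run


SOURCE = '''
class PaymentRequest(BaseModel):
    amount: float = Field(gt=0, description="Must be positive")
    currency: str = Field(min_length=3, max_length=3)
'''
print(read_field_bounds(SOURCE))
print(read_field_bounds("class (:"))  # unparsable input yields []
```

Because the source is parsed rather than imported, nothing in the scanned file is executed.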

Rule engine generates candidates

~75% of cases are handled by the deterministic rule engine — no LLM, no network. Rules handle MUST_RAISE, MUST_RETURN, BOUNDARY, ENUM_VALID, NOT_NULL, and TYPE_CHECK constraints. LLM is only called as a fallback for complex cases.

# Rule engine: BOUNDARY constraint
 
# From: amount: float = Field(gt=0)
# Generates:
import pytest
from pydantic import ValidationError

def test_payment_rejects_zero_amount():
    with pytest.raises(ValidationError):
        PaymentRequest(amount=0, currency='USD')

5-gate verification pipeline

Every candidate test runs the 5-gate pipeline. Gates 1-3 are static (AST valid, not duplicate, no side effects). Gate 4 runs the test on the original code — it must pass. Gate 5 injects a violation and runs again — it must fail. Gate 5 is the moat.

Gate 1: AST Valid        ✓ parses
Gate 2: Original         ✓ not duplicate
Gate 3: Secure           ✓ no side effects
Gate 4: Passes correct   ✓ test passes
Gate 5: Fails violated   ✓ violation caught
 
→ WRITTEN  (5/5 gates)

Written to disk with libcst

Tests that pass all 5 gates are injected into your test file using libcst — Concrete Syntax Tree safe injection. No string concatenation, no overwriting. Quell backs up the file before writing, validates the CST, and restores on any failure. An audit log entry is appended.

# libcst injection — CST-safe
 
$ quell find src/
  → tests/test_payments.py (+8 tests)
  → tests/test_users.py (+3 tests)
 
  Audit log: .quell/audit.jsonl
  Backup: .quell/backups/

THE MOAT

Every WRITTEN test passes five gates.
Most tools run one.

~18% of generated tests that look correct actually fail gate 5 — they don't catch the bug they're supposed to catch.

Gate 1 · AST Valid: Parses to valid Python AST before any execution

Gate 2 · Original: Test not already present in any test file

Gate 3 · Secure: No shell calls, no file system writes, no network

Gate 4 · Passes Correct: Runs against original code (✓ MUST PASS)

Gate 5 · Fails Violated: Runs against code with injected violation (✗ MUST FAIL)

THE MOAT — Only Quell verifies both



Gate 4: Passes correct code

# Gate 4: test on ORIGINAL code
 
def process_payment(amount: float, currency: str):
    if amount <= 0:  # guard intact
        raise ValueError('amount must be positive')
 
$ pytest -k test_payment_rejects_zero_amount
  PASSED  ✓ (gate 4 passed)

Gate 5: Fails violated code (THE MOAT)

# Gate 5: test on VIOLATED code
 
def process_payment(amount: float, currency: str):
    # if amount <= 0:  <- guard removed
    #     raise ValueError  <- violation
    pass  # nothing raised
 
$ pytest -k test_payment_rejects_zero_amount
  FAILED  ✗ (gate 5 passed: bug caught)
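The two dynamic gates boil down to running the same test twice and demanding opposite outcomes. The sketch below shows that logic without pytest, using a simplified single-argument `process_payment`; the helper names are hypothetical and the violated function is written by hand rather than injected.

```python
def process_payment(amount: float) -> None:
    if amount <= 0:  # guard intact
        raise ValueError("amount must be positive")


def process_payment_violated(amount: float) -> None:
    pass  # guard removed: nothing raised


def candidate_test(impl) -> bool:
    """Return True if the test passes against impl, False if it fails."""
    try:
        impl(0)
    except ValueError:
        return True
    return False


gate_4 = candidate_test(process_payment)                # must PASS on original code
gate_5 = not candidate_test(process_payment_violated)   # must FAIL on violated code
print("WRITTEN" if gate_4 and gate_5 else "not proven")
```

A test that passes gate 4 but also passes on the violated code proves nothing, which is why gate 5 is required before anything is written.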

COVERAGE VS PRS

High coverage. Low PRS.
Both can be true simultaneously.

Coverage tells you which lines executed. It says nothing about whether those lines have any checks at the edge cases that matter. A 91% coverage score can coexist with a 52 PRS — same codebase, same tests, different measures.

Line Coverage (coverage.py): 91%

Production Readiness (PRS, from quell score): 52 / 100

Same codebase. Both numbers correct. Coverage measures which lines ran. PRS measures whether tests actually catch bugs.
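A small, hypothetical example of how the two measures diverge: the weak test below executes every line of `process_payment`, so line coverage is complete, yet it keeps passing after the guard is removed. The strict test asserts on the edge case and therefore fails on the violated code. All names here are illustrative.

```python
def process_payment(amount: float) -> str:
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "ok"


def process_payment_violated(amount: float) -> str:
    return "ok"  # guard removed


def weak_test(impl) -> bool:
    """Runs both branches of the original but checks nothing about the edge case."""
    impl(10)
    try:
        impl(0)
    except ValueError:
        pass
    return True


def strict_test(impl) -> bool:
    """Passes only if a zero amount is rejected."""
    try:
        impl(0)
    except ValueError:
        return True
    return False


print(weak_test(process_payment), weak_test(process_payment_violated))
print(strict_test(process_payment), strict_test(process_payment_violated))
```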

Constraint kinds & violation injections

Constraint	What it checks	Violation injection
MUST_RAISE	Expected exception	Remove raise statement
MUST_RETURN	Expected return value	Change return to wrong value
BOUNDARY	Numeric boundary check	Negate comparison operator
ENUM_VALID	Allowed set membership	Remove enum validation
NOT_NULL	None rejection	Remove None check
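Two of the injections in the table, "Negate comparison operator" and "Remove raise statement", can be illustrated with `ast.NodeTransformer` from the standard library. This is a sketch under stated assumptions: the class names are hypothetical, the transformers rewrite every matching node rather than one targeted guard, and Quell's own injection mechanism is not described on this page.

```python
import ast

NEGATED = {ast.LtE: ast.Gt, ast.Lt: ast.GtE, ast.GtE: ast.Lt, ast.Gt: ast.LtE,
           ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


class NegateComparisons(ast.NodeTransformer):
    """BOUNDARY: swap each comparison operator for its logical negation."""

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [NEGATED.get(type(op), type(op))() for op in node.ops]
        return node


class RemoveRaises(ast.NodeTransformer):
    """MUST_RAISE: replace every raise statement with pass."""

    def visit_Raise(self, node: ast.Raise) -> ast.Pass:
        return ast.copy_location(ast.Pass(), node)


def inject(source: str, transformer: ast.NodeTransformer) -> str:
    """Return source with the violation applied."""
    tree = transformer.visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


ORIGINAL = (
    "def process_payment(amount):\n"
    "    if amount <= 0:\n"
    "        raise ValueError('amount must be positive')\n"
)
print(inject(ORIGINAL, NegateComparisons()))
print(inject(ORIGINAL, RemoveRaises()))
```

Either rewritten version no longer rejects a zero amount, so a test that still passes against it has not verified the guard.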

SPEC SOURCES

No annotations needed.
Quell reads what's already there.

Your docstrings, type annotations, and schema definitions already contain testable requirements. Quell extracts them without any changes to your source code.

Python Docstrings

Numpy, Google, plain, reStructuredText

def process_payment(amount: float):
    """Process a payment.
 
    Args:
        amount: Must be > 0. Raises ValueError
                if zero or negative."""
# → MUST_RAISE (ValueError, amount <= 0)

MUST_RAISE · BOUNDARY · NOT_NULL
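As a rough idea of how a docstring reader could pull a MUST_RAISE requirement out of text like the above, the sketch below scans function docstrings for the phrase "Raises X". The regular expression and function name are assumptions; a real reader would need to handle the Numpy, Google, and reStructuredText layouts separately.

```python
import ast
import re

RAISES = re.compile(r"Raises\s+(\w+)")


def read_docstring_raises(source: str) -> list[tuple[str, str]]:
    """Return (function, exception) pairs for 'Raises X' phrases; [] on any error."""
    try:
        pairs = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                doc = ast.get_docstring(node) or ""
                pairs += [(node.name, exc) for exc in RAISES.findall(doc)]
        return pairs
    except Exception:
        return []


SOURCE = '''
def process_payment(amount: float):
    """Process a payment.

    Args:
        amount: Must be > 0. Raises ValueError
                if zero or negative."""
'''
print(read_docstring_raises(SOURCE))
```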

Pydantic Models

v1 and v2, Field validators, model validators

from typing import Literal
from pydantic import BaseModel, Field

class OrderRequest(BaseModel):
    quantity: int = Field(ge=1, le=999)
    sku: str = Field(min_length=6, max_length=12)
    status: Literal['new','paid','shipped']
 
# → BOUNDARY, ENUM_VALID, TYPE_CHECK

BOUNDARY · ENUM_VALID · TYPE_CHECK

PySpark Schemas

StructType, StructField, nullable=False constraints

from pyspark.sql.types import (
    DoubleType, LongType, StringType, StructField, StructType,
)

schema = StructType([
    StructField('user_id', LongType(), nullable=False),
    StructField('amount', DoubleType(), nullable=False),
    StructField('currency', StringType(), nullable=True),
])
# → NOT_NULL (user_id, amount)

NOT_NULL · TYPE_CHECK
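Schema definitions can be read the same way, without importing PySpark at all. The sketch below walks the syntax tree for `StructField(...)` calls declared with `nullable=False`; the function name is hypothetical and it only handles the literal keyword form shown in the example.

```python
import ast


def read_non_nullable_fields(source: str) -> list[str]:
    """Names of StructField(...) entries declared with nullable=False; [] on any error."""
    try:
        names = []
        for node in ast.walk(ast.parse(source)):
            is_struct_field = (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "StructField"
                and node.args
                and isinstance(node.args[0], ast.Constant)
            )
            if not is_struct_field:
                continue
            for kw in node.keywords:
                if kw.arg == "nullable" and isinstance(kw.value, ast.Constant) and kw.value.value is False:
                    names.append(node.args[0].value)
        return names
    except Exception:
        return []


SCHEMA_SOURCE = """
schema = StructType([
    StructField('user_id', LongType(), nullable=False),
    StructField('amount', DoubleType(), nullable=False),
    StructField('currency', StringType(), nullable=True),
])
"""
print(read_non_nullable_fields(SCHEMA_SOURCE))
```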

OpenAPI, TypeScript types, and mutation results are on the roadmap.

VERSUS

Most tools run one gate.
We run five.

Feature	Quell (quelltest)	GitHub Copilot	Qodo (CodiumAI)	Hypothesis
Reads existing specs (no annotation)	✓	✗	✗	✗
Deterministic rule engine (no LLM needed)	✓	✗	✗	✓
Gate 4: test passes on correct code	✓	partial	partial	✗
Gate 5: test fails on violated code	✓	✗	✗	✗
Works offline (no network required)	✓	✗	✗	✓
Writes verified tests to disk (libcst)	✓	✗	✗	✗
Three-bucket output (WRITTEN / SCAFFOLDED / FLAGGED)	✓	✗	✗	✗
PRS production readiness score	✓	✗	✗	✗
No LLM API key required	✓	✗	✗	✓
Source file restore guarantee (finally block)	✓	✗	✗	✗
Supports Pydantic + PySpark schemas	✓	✗	partial	✗
MIT licensed, runs in CI	✓	✗	✗	partial (MPL-2.0)

Comparison as of May 2026. Information sourced from public documentation.

PRICING

Free to start.
Scale when you need to.

Hobby

Free

For individuals exploring Quell on personal projects.

✓ 500 verifications / month
✓ Python docstrings + Pydantic
✓ 3-bucket output
✓ CLI access
✓ Community support
✓ MIT licensed

Get started free

Pro

Questions, answered.

Does Quell require an LLM API key?

No. The rule engine handles ~75% of cases offline and deterministically. MUST_RAISE, MUST_RETURN, BOUNDARY, ENUM_VALID, NOT_NULL, and TYPE_CHECK are all purely rule-based. The LLM fallback is optional and only activates on complex cases you opt into.

Stop shipping
untested edge cases.

Quell reads your existing specs, generates verified tests, and writes them to disk. No LLM key. No internet. No guessing whether your tests actually catch bugs — they do.

Start for free · Read the docs

$ pip install quelltest

MIT licensed · Python 3.11+ · runs offline · no LLM key required
