Open-Source, AI-Native Test Automation for Python/pytest
PraisonAI TestGen is an open-source, AI-powered test generation and maintenance platform for Python projects. Unlike traditional code generation tools that produce tests once and leave maintenance to developers, PraisonAI TestGen autonomously generates, validates, and maintains unit tests as your codebase evolves.
PraisonAI TestGen leverages existing battle-tested packages rather than reinventing the wheel:
| Package | Role in TestGen | What We Reuse |
|---|---|---|
| praisonaiagents | Multi-agent orchestration | Agent, Agents, Task, @tool, hooks, memory |
| testagent | Test validation engine | test(), accuracy(), judges, caching, CLI |
| praisonai-testgen (NEW) | Test generation & maintenance | Code analysis, generation, maintenance loop |
┌─────────────────────────────────────────────────────────────────────────────┐
│ Package Architecture │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ praisonai-testgen (NEW) │ │
│ │ Test Generation • Maintenance • IDE/CI Integration │ │
│ └───────────────────────────────┬─────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────┴───────────────────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────────────────┐ ┌───────────────────────┐ │
│ │ praisonaiagents │ │ testagent │ │
│ │ ──────────────────── │ │ ──────────────────── │ │
│ │ • Agent orchestration│ │ • Test validation │ │
│ │ • Task management │ │ • LLM-as-judge │ │
│ │ • Tool framework │ │ • Caching (100x) │ │
│ │ • Memory & hooks │ │ • Assertions │ │
│ │ • Judge base class │ │ • CLI tools │ │
│ └───────────────────────┘ └───────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
| Problem | Solution |
|---|---|
| Writing tests is time-consuming | AI generates tests in seconds |
| Tests become stale as code evolves | Continuous maintenance on PRs/MRs |
| Code coverage is inconsistent | Autonomous coverage targeting |
| Legacy code is hard to test | Refactoring suggestions + testability analysis |
| Test selection is slow in CI | Patch-based smart test selection |
“Make every Python developer’s codebase thoroughly tested with zero manual effort.”
Guiding principles: reuse existing packages (praisonaiagents, testagent) rather than duplicating functionality, and define agents with only the name, instructions, and tools required. These principles are MANDATORY for all development work on TestGen.
RED → GREEN → REFACTOR
| Step | Action | Rule |
|---|---|---|
| RED | Write failing test first | NO production code without a test |
| GREEN | Write minimal code to pass | Only enough to make the test pass |
| REFACTOR | Clean up code | Keep tests green |
TDD Rules:
- Keep tests in a `tests/` directory matching the source structure
- Run `pytest -x` after every change (a minimal RED → GREEN sketch follows this list)
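To make the cycle concrete, here is a minimal sketch (the `add` function and its test are hypothetical examples, not part of TestGen):

```python
# RED: write the failing test first (tests/test_calculator.py)
def test_add():
    assert add(2, 3) == 5  # fails: add() does not exist yet

# GREEN: write only enough production code to pass (src/calculator.py)
def add(a, b):
    return a + b

# REFACTOR: clean up while `pytest -x` stays green
```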
REUSE → EXTEND → CREATE (last resort)
| Priority | Action | When |
|---|---|---|
| 1st | Reuse | Use existing code from praisonaiagents or testagent |
| 2nd | Extend | Subclass or wrap existing functionality |
| 3rd | Create | Only if nothing exists to reuse |
DRY Rules:
- Check `praisonaiagents` before writing any agent code
- Check `testagent` before writing any validation code

MINI AGENT PATTERN: name + instructions + tools
| Component | Keep | Avoid |
|---|---|---|
| `name` | ✅ Required | - |
| `instructions` | ✅ Required | role, goal, backstory |
| `tools` | ✅ Optional | Inline logic in agents |
| Defaults | ✅ Use them | Over-configuration |
Agent-Centric Rules:
- Wrap reusable logic in `@tool` functions
- Use `Agents` for multi-agent workflows

80/20 RULE: 20% effort → 80% value
| Principle | Do | Don’t |
|---|---|---|
| Focus | Core functionality first | Edge cases upfront |
| Iterate | Small, working increments | Big bang releases |
| Validate | Test with real usage | Over-engineer |
| Ship | Working > Perfect | Wait for perfection |
Minimal Work Rules:
```python
# ✅ GOOD: Clear, simple, tested
@tool
def parse_function(code: str) -> dict:
    """Parse Python function and extract metadata."""
    tree = ast.parse(code)
    return {"name": ..., "args": ...}

# ❌ BAD: Complex, untested, unclear
def do_stuff(x):
    # 100 lines of nested logic
    pass
```
Quality Rules:
“Don’t Repeat Yourself”: every capability in TestGen should reuse existing code from `praisonaiagents` or `testagent` wherever possible. New code is only written for genuinely new functionality.
praisonaiagents (Multi-Agent Framework)

| Component | Location | How TestGen Uses It |
|---|---|---|
| `Agent` | `praisonaiagents/agent/agent.py` | Base class for AnalyzerAgent, GeneratorAgent, etc. |
| `Agents` | `praisonaiagents/agents/agents.py` | Orchestrates the multi-agent generation workflow |
| `Task` | `praisonaiagents/task/task.py` | Defines generation/validation tasks |
| `@tool` | `praisonaiagents/tools/decorator.py` | Creates AST parser and pytest runner tools |
| `Judge` | `praisonaiagents/eval/judge.py` | Base for test quality evaluation |
| Hooks | `praisonaiagents/hooks/` | Event system for progress tracking |
| Memory | `praisonaiagents/memory/` | Caches code analysis results |
| Guardrails | `praisonaiagents/guardrails/` | Validates generated test quality |
testagent (Test Validation Framework)

| Component | Location | How TestGen Uses It |
|---|---|---|
| `test()` | `testagent/core.py` | Validates generated test assertions |
| `accuracy()` | `testagent/core.py` | Compares actual vs. expected test output |
| `criteria()` | `testagent/core.py` | Evaluates test quality criteria |
| `TestAgentCache` | `testagent/cache.py` | Caches validation results (100x speedup) |
| `CodeJudge` | `testagent/judges/code.py` | Evaluates generated code quality |
| `approx`, `raises` | `testagent/assertions.py` | Assertion helpers for tests |
| `Collector` | `testagent/collector.py` | Test discovery for validation |
| CLI | `testagent/cli.py` | Extended with testgen commands |
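For orientation, a short sketch of how TestGen calls two of these primitives. The keyword arguments mirror the validator example later in this document; treat the exact signatures as assumptions:

```python
from testagent import accuracy, criteria

# Did the generated test produce the expected result?
result = accuracy(output="4", expected="4")
print(result.passed)  # True

# Is the generated test meaningful, not just `assert True`?
quality = criteria(
    output="def test_add():\n    assert add(2, 2) == 4",
    criteria="contains meaningful assertions, not just 'assert True'",
)
print(quality.passed)
```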
| Component | Purpose | Why Not Reusable |
|---|---|---|
| Code Analyzer | AST parsing for Python | Specific to test generation |
| Test Synthesizer | Generate pytest code | Core novel functionality |
| Maintenance Tracker | Source-test mapping | Novel maintenance loop |
| Dependency Graph | Code relationship mapping | Specific to optimization |
| IDE Plugin | VS Code integration | Platform-specific |
| CI Pipeline | GitHub/GitLab actions | Platform-specific |
PraisonAI Agents support a “mini agent” pattern where you only need to provide:
- `name` - Agent identifier
- `instructions` - What the agent does
- `tools` - (optional) Functions the agent can call

No need for `role`, `goal`, `backstory`, or complex configs. Everything else has sensible defaults!
```python
from praisonaiagents import Agent, Agents, Task, tool

# ============================================================================
# TOOLS - Simple functions with @tool decorator
# ============================================================================

@tool
def parse_python_ast(file_path: str) -> dict:
    """Parse Python file and extract functions/classes for testing."""
    import ast
    with open(file_path) as f:
        tree = ast.parse(f.read())
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "lineno": node.lineno,
            })
    return {"file": file_path, "functions": functions}

@tool
def infer_types(code: str) -> dict:
    """Infer types for function parameters and returns."""
    # Uses mypy or simple heuristics
    return {"types": {}}

@tool
def generate_test_code(function_info: dict) -> str:
    """Generate pytest test code for a function."""
    # LLM fills in the actual test logic
    return f"def test_{function_info['name']}():\n    pass"

@tool
def create_fixtures(dependencies: list) -> str:
    """Create pytest fixtures for test dependencies."""
    return "@pytest.fixture\ndef sample_data():\n    return {}"

@tool
def run_pytest(test_file: str) -> dict:
    """Execute pytest and return pass/fail results."""
    import subprocess
    result = subprocess.run(
        ["pytest", test_file, "-v", "--tb=short"],
        capture_output=True, text=True,
    )
    return {
        "passed": result.returncode == 0,
        "output": result.stdout,
        "errors": result.stderr,
    }

@tool
def validate_test_quality(test_code: str) -> dict:
    """Validate test quality using testagent."""
    from testagent import CodeJudge
    judge = CodeJudge()
    result = judge.judge(test_code, criteria="well-structured pytest test")
    return {
        "score": result.score,
        "passed": result.passed,
        "feedback": result.reasoning,
    }

# ============================================================================
# AGENTS - Mini pattern: just name + instructions + tools
# ============================================================================

analyzer = Agent(
    name="Analyzer",
    instructions="Parse Python code and identify all testable functions and classes. Extract signatures, dependencies, and docstrings.",
    tools=[parse_python_ast, infer_types],
)

generator = Agent(
    name="Generator",
    instructions="Create comprehensive pytest tests for the analyzed code. Include happy path, edge cases, and error handling tests.",
    tools=[generate_test_code, create_fixtures],
)

validator = Agent(
    name="Validator",
    instructions="Validate that generated tests compile, pass execution, and meet quality standards. Provide feedback for improvements.",
    tools=[run_pytest, validate_test_quality],
)

# ============================================================================
# WORKFLOW - Simple task chain
# ============================================================================

def generate_tests(source_file: str):
    """Generate tests for a Python file."""
    analyze_task = Task(
        description=f"Analyze {source_file} and identify all testable units",
        agent=analyzer,
    )
    generate_task = Task(
        description="Generate pytest tests for the analyzed code",
        agent=generator,
        context=[analyze_task],
    )
    validate_task = Task(
        description="Validate tests compile and pass",
        agent=validator,
        context=[generate_task],
    )
    workflow = Agents(
        agents=[analyzer, generator, validator],
        tasks=[analyze_task, generate_task, validate_task],
    )
    return workflow.start()
```
```python
# ❌ VERBOSE (old pattern) - 8 lines per agent
analyzer_agent = Agent(
    name="Analyzer",
    role="Code Analyzer",
    goal="Parse and understand Python code structure",
    backstory="You are an expert Python developer who understands code architecture...",
    tools=[parse_python_ast, infer_types],
    llm="gpt-4o-mini",
    verbose=False,
    allow_delegation=False,
)

# ✅ MINI PATTERN - 4 lines per agent
analyzer = Agent(
    name="Analyzer",
    instructions="Parse Python code and identify testable functions",
    tools=[parse_python_ast, infer_types],
)
```
Benefits of Mini Pattern:
- `instructions` is more natural than separate role/goal/backstory

┌─────────────────────────────────────────────────────────────────────────────┐
│ TestGen Feature → Existing Component │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ TESTGEN ENGINE │
│ ├── Analyzer ──────────────────→ Agent(instructions=..., tools=[...]) │
│ ├── Generator ─────────────────→ Agent(instructions=..., tools=[...]) │
│ ├── Validator ─────────────────→ testagent.test() + testagent.CodeJudge │
│ ├── Workflow Orchestration ────→ praisonaiagents.Agents (sequential) │
│ └── Quality Guardrails ────────→ praisonaiagents.guardrails │
│ │
│ VALIDATION & JUDGING │
│ ├── Test Execution Check ──────→ testagent.accuracy() │
│ ├── Code Quality Check ────────→ testagent.CodeJudge │
│ ├── Assertion Validation ──────→ testagent.approx, testagent.raises │
│ └── Result Caching ────────────→ testagent.TestAgentCache │
│ │
│ REPORTS & OPTIMIZATION │
│ ├── Coverage Collection ───────→ pytest-cov (external) │
│ ├── Quality Scoring ───────────→ praisonaiagents.eval.Judge │
│ └── Test Selection ────────────→ NEW (TestGen-specific) │
│ │
│ CLI & INTEGRATIONS │
│ ├── CLI Framework ─────────────→ testagent.cli (extend with Typer) │
│ ├── MCP Server ────────────────→ praisonaiagents.mcp │
│ └── IDE/CI Plugins ────────────→ NEW (TestGen-specific) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```python
from testagent import test, accuracy, criteria, CodeJudge, TestAgentCache

class TestGenValidator:
    """Validates generated tests using testagent."""

    def __init__(self):
        self.cache = TestAgentCache()  # 100x speedup
        self.code_judge = CodeJudge()

    def validate_passes(self, test_code: str) -> bool:
        """Check if test actually passes."""
        result = self._run_test(test_code)
        return accuracy(output=str(result.passed), expected="True").passed

    def validate_quality(self, test_code: str) -> dict:
        """Check test quality."""
        result = self.code_judge.judge(
            test_code,
            criteria="pytest test with clear assertions",
        )
        return {"score": result.score, "passed": result.passed}

    def validate_assertions(self, test_code: str) -> dict:
        """Check that assertions are meaningful."""
        result = criteria(
            output=test_code,
            criteria="contains meaningful assertions, not just 'assert True'",
        )
        return {"meaningful": result.passed}
```
The foundational test generation and maintenance engine, built on praisonaiagents.
```python
from praisonai_testgen import TestGen

# One-liner
result = TestGen().generate("src/calculator.py")

# With options
testgen = TestGen(coverage_target=80)
result = testgen.generate("src/", include_edge_cases=True)
```
┌──────────┐ ┌───────────┐ ┌────────────────────────────────┐
│ Generate │ ──▶ │ Execute │ ──▶ │ Validate with testagent │
│ Tests │ │ Tests │ │ ├─ test() - basic validation │
└──────────┘ └───────────┘ │ ├─ CodeJudge - quality check │
▲ │ └─ accuracy() - correctness │
│ └────────────────┬───────────────┘
│ │
│ ┌───────────────────────┤
│ │ │
│ ┌──────▼──────┐ ┌─────▼─────┐
└─── Retry ◀────│ Failed │ │ Passed │ ──▶ Output
└─────────────┘ └───────────┘
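In code, the loop above is generate → execute → validate, feeding failures back into the next attempt. A minimal sketch; the helper functions are hypothetical stand-ins for the real agent and testagent calls, not the shipped API:

```python
# Hypothetical stand-ins (assumptions, not the shipped API)
def generate_once(source_file: str, feedback: str | None) -> str:
    return f"def test_placeholder():\n    assert True  # generated for {source_file}"

def execute_tests(test_code: str) -> dict:
    return {"passed": True, "errors": ""}

def generate_with_retry(source_file: str, max_retries: int = 3) -> str:
    """Generate -> execute -> validate, feeding failures back into the retry."""
    feedback = None
    for _ in range(max_retries):
        test_code = generate_once(source_file, feedback)
        result = execute_tests(test_code)
        if result["passed"]:
            return test_code          # Passed -> output
        feedback = result["errors"]   # Failed -> retry with error context
    raise RuntimeError(f"Could not generate passing tests for {source_file}")
```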
Extends testagent CLI with generation commands.
```bash
# Test generation
testgen init                      # Initialize TestGen in project
testgen generate src/             # Generate tests for directory
testgen generate src/calc.py      # Generate tests for file
testgen update                    # Update tests for changed code

# Validation (reuses testagent)
testgen validate tests/           # Validate all generated tests

# Reporting
testgen report                    # Coverage report
testgen report --risk             # Risk analysis

# Direct testagent access
testgen test "output" --criteria "is correct"
testgen accuracy "4" --expected "4"
```
┌──────────────────────────────────────────────────────────────────────┐
│ VS Code │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ calculator.py │ Tests Panel │
│ ───────────── │ ─────────── │
│ def add(a, b): │ ✓ test_add_positive (generated) │
│ return a + b [Gen Tests] │ ✓ test_add_negative (generated) │
│ │ ⚠ test_add_overflow (needs fix) │
│ def subtract(a, b): │ │
│ return a - b [Gen Tests] │ Coverage: 87% │
│ │ │
│ [Coverage: 45%] │ [Regenerate] [Validate All] │
│ │
└──────────────────────────────────────────────────────────────────────┘
```yaml
# .github/workflows/testgen.yaml
name: PraisonAI TestGen

on:
  pull_request:
    branches: [main]

jobs:
  testgen:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: praisonai/testgen-action@v1
        with:
          mode: generate-and-commit
          coverage-target: 80
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```
```python
from praisonai_testgen import testgen

@testgen.skip  # Never generate tests
def internal_helper():
    pass

@testgen.mock("external_api.call")  # Always mock this
def fetch_data():
    return external_api.call()

@testgen.priority(high=True)  # Generate tests first
def critical_payment_logic():
    pass

@testgen.edge_cases(amount=[0, 100, -1, float("inf")])
def calculate_tax(amount: float):
    pass
```
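Under the hood, annotations like these only need to attach metadata that the generator reads later. One plausible implementation sketch (an assumption, not the actual `annotations/decorators.py` source):

```python
def skip(func):
    """Mark a function so TestGen never generates tests for it."""
    func.__testgen__ = {"skip": True}
    return func

def edge_cases(**params):
    """Record extra edge-case inputs for the generator to exercise."""
    def decorator(func):
        meta = getattr(func, "__testgen__", {})
        meta.setdefault("edge_cases", {}).update(params)
        func.__testgen__ = meta
        return func
    return decorator
```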
| Layer | Technology | Source |
|---|---|---|
| Agent Framework | `praisonaiagents` | REUSE |
| Test Validation | `testagent` | REUSE |
| LLM Integration | LiteLLM via praisonaiagents | REUSE |
| Code Analysis | `ast`, LibCST | NEW |
| Test Framework | pytest, pytest-cov | EXTERNAL |
| CLI | Typer (extend testagent) | EXTEND |
| IDE Plugin | VS Code Extension API | NEW |
| CI Integration | GitHub Actions | NEW |
```
praisonai-testgen
├── praisonaiagents (core dependency)
│   ├── Agent (mini pattern: name + instructions + tools)
│   ├── Agents, Task
│   ├── @tool decorator
│   ├── Judge (evaluation)
│   └── hooks, memory, guardrails
├── testagent (validation dependency)
│   ├── test(), accuracy(), criteria()
│   ├── CodeJudge
│   ├── TestAgentCache
│   └── CLI framework
├── libcst (code analysis)
├── pytest (test execution)
└── typer (CLI extension)
```
```
praisonai-testgen/
├── src/
│   └── praisonai_testgen/
│       ├── __init__.py
│       ├── testgen.py              # Main TestGen class
│       │
│       ├── agents/                 # Mini agents (name + instructions + tools)
│       │   ├── __init__.py
│       │   ├── analyzer.py         # Analyzer agent
│       │   ├── generator.py        # Generator agent
│       │   └── validator.py        # Validator agent (uses testagent)
│       │
│       ├── tools/                  # @tool decorated functions
│       │   ├── __init__.py
│       │   ├── ast_parser.py       # parse_python_ast()
│       │   ├── type_inferencer.py
│       │   ├── test_synthesizer.py
│       │   ├── fixture_builder.py
│       │   └── pytest_runner.py
│       │
│       ├── validation/             # testagent integration
│       │   ├── __init__.py
│       │   └── validator.py        # Uses testagent.test(), CodeJudge
│       │
│       ├── maintenance/            # Source-test mapping (NEW)
│       │   ├── __init__.py
│       │   ├── tracker.py
│       │   └── updater.py
│       │
│       ├── cli/                    # Extends testagent CLI
│       │   ├── __init__.py
│       │   └── commands.py
│       │
│       └── annotations/            # Decorators
│           ├── __init__.py
│           └── decorators.py
│
├── vscode-extension/
├── github-action/
├── docs/
├── tests/
├── pyproject.toml
└── README.md
```
Goal: Core engine with mini agents
| Task | Approach | Effort |
|---|---|---|
| Analyzer agent | Mini pattern + AST tools | 1 week |
| Generator agent | Mini pattern + synthesis tools | 1.5 weeks |
| Validator agent | Mini pattern + testagent | 0.5 weeks |
| Workflow | `Agents` orchestration | 0.5 weeks |
| Basic CLI | `testgen generate` | 0.5 weeks |
Deliverables:
- `testgen generate` command

Goal: Test maintenance loop
| Task | Effort |
|---|---|
| Source-test mapping | 1 week |
| Update detection | 0.5 weeks |
| Update generation | 1 week |
| `testgen update` | 0.5 weeks |
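The source-test mapping is the heart of this phase: TestGen must know which tests cover which source files so it regenerates only what changed. A minimal sketch of one way to keep that mapping, using content hashes (the class and file format are illustrative assumptions, not the shipped `tracker.py`):

```python
import hashlib
import json
from pathlib import Path

class SourceTestTracker:
    """Maps source files to generated tests, keyed by content hash."""

    def __init__(self, path: str = ".testgen/mapping.json"):
        self.path = Path(path)
        self.mapping = json.loads(self.path.read_text()) if self.path.exists() else {}

    def record(self, source_file: str, test_file: str) -> None:
        digest = hashlib.sha256(Path(source_file).read_bytes()).hexdigest()
        self.mapping[source_file] = {"test": test_file, "hash": digest}
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.mapping, indent=2))

    def is_stale(self, source_file: str) -> bool:
        """True if the source changed since its tests were generated."""
        entry = self.mapping.get(source_file)
        if entry is None:
            return True  # never generated
        digest = hashlib.sha256(Path(source_file).read_bytes()).hexdigest()
        return digest != entry["hash"]
```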
Goal: Reports and optimization

| Task | Effort |
|---|---|
| Coverage reports | 1 week |
| Quality scoring | 0.5 weeks |
| Test selection | 1.5 weeks |
Goal: IDE and CI integrations

| Task | Effort |
|---|---|
| VS Code extension | 3 weeks |
| GitHub Action | 0.5 weeks |
| GitLab CI | 0.5 weeks |
Goal: Annotations, MCP, and polish

| Task | Effort |
|---|---|
| Annotations | 0.5 weeks |
| MCP integration | 0.5 weeks |
| Polish & docs | 1 week |
Total: ~16 weeks to v1.0.0
```python
from praisonai_testgen import TestGen

testgen = TestGen()

# Generate tests
result = testgen.generate("src/calculator.py")
result = testgen.generate("src/calculator.py::add")  # Specific function

# Update tests
result = testgen.update(since="HEAD~1")

# Reports
report = testgen.report()
report = testgen.report(include_risk=True)

# Optimization
tests = testgen.affected_tests(since="main")
```
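`affected_tests()` is the patch-based smart selection from the problem table: diff the patch, map changed source files to their tests, and run only those. A rough sketch, assuming the hypothetical `SourceTestTracker` mapping sketched in the maintenance phase:

```python
import subprocess

def affected_tests(since: str, tracker: "SourceTestTracker") -> list[str]:
    """Return only the test files whose source changed since the given ref."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", since],
        capture_output=True, text=True, check=True,
    )
    changed = [f for f in diff.stdout.splitlines() if f.endswith(".py")]
    return [
        entry["test"]
        for source, entry in tracker.mapping.items()
        if source in changed
    ]
```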
```python
from praisonai_testgen import TestGen, TestGenConfig

config = TestGenConfig(
    test_dir="tests",
    coverage_target=80,
    validation_threshold=7.0,  # Passed to testagent
)
testgen = TestGen(config=config)
```
```yaml
# .testgen/config.yaml
version: 1

project:
  test_dir: tests
  coverage_target: 80

llm:
  model: gpt-4o-mini  # Default, uses OPENAI_MODEL_NAME env

validation:
  threshold: 7.0      # testagent score threshold
  use_cache: true     # testagent cache (100x speedup)

generation:
  max_tests_per_function: 10
  include_edge_cases: true

maintenance:
  auto_update: true
```
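Loading this file into the `TestGenConfig` shown above is straightforward; a hedged sketch (the field names mirror the YAML, but the loader itself is illustrative, not the shipped API):

```python
import yaml  # pip install pyyaml

from praisonai_testgen import TestGen, TestGenConfig

def load_config(path: str = ".testgen/config.yaml") -> TestGenConfig:
    """Build a TestGenConfig from the project's YAML file."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    return TestGenConfig(
        test_dir=raw["project"]["test_dir"],
        coverage_target=raw["project"]["coverage_target"],
        validation_threshold=raw["validation"]["threshold"],
    )

testgen = TestGen(config=load_config())
```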
| Feature | Implementation | Source |
|---|---|---|
| Agent definitions | `Agent(name=..., instructions=..., tools=[...])` | praisonaiagents |
| Agent orchestration | `Agents`, `Task` | praisonaiagents |
| Tool creation | `@tool` decorator | praisonaiagents |
| Test validation | `testagent.test()`, `accuracy()` | testagent |
| Quality judging | `testagent.CodeJudge` | testagent |
| Caching | `testagent.TestAgentCache` | testagent |
| CLI | Extend testagent CLI | testagent |
| Code analysis | AST + libcst (NEW) | praisonai-testgen |
| Test synthesis | NEW | praisonai-testgen |
| Maintenance | NEW | praisonai-testgen |
Result: ~60% reuse from existing packages. Mini pattern reduces agent code by 50%. Only genuinely new code for code analysis, synthesis, and maintenance.
Document Version: 3.0.0
Product Name: PraisonAI TestGen
Last Updated: 2025-01-28
Architecture: DRY + Mini Agent Pattern