Deep Dive

How TraceMint Works

A technical deep-dive into our proof-first analysis pipeline: deterministic taint tracking, CFG-based guard verification, and a proof-gated AI-assisted layer. Discover why we find vulnerabilities that other scanners miss.

01

The Analysis Pipeline

Unlike traditional scanners that rely on regex patterns, TraceMint uses a multi-stage pipeline that combines static analysis with semantic understanding. Each stage progressively refines candidates from thousands of pattern matches down to verified vulnerabilities.


Parse: AST extraction via tree-sitter across 30+ languages
  source → AST → symbol table

Generate: Pattern, taint, and route generators produce candidates
  1800+ patterns × taint flows

Verify: Category-specific verifiers check guards and sanitizers
  guard dominance + binding proof

Verdict: Evidence-backed findings with proof chains
  VULN | NEEDS_REVIEW | SAFE

02

Language-Agnostic AST Parsing

We use tree-sitter for high-fidelity AST extraction, enabling precise semantic analysis across 30+ programming languages. Each language has a dedicated taint engine that understands framework-specific idioms.

AST Node Extraction & Taint Propagation
# Source Code (PHP - Laravel)
public function show(Request $request) {
    $orderId = $request->input('order_id');
    $order = Order::find($orderId);
    return response()->json($order);
}

# Extracted AST with Taint Labels
FunctionDecl: show
  └─ Parameter: $request [TAINT_SOURCE: HTTP_REQUEST]

Assignment: $orderId
  └─ MethodCall: $request->input('order_id')
      └─ [USER_CONTROLLED: query_param]

MethodCall: Order::find($orderId)
  └─ Argument: $orderId [TAINT_SINK: DB_LOOKUP]
  └─ [POTENTIAL_IDOR: No ownership check before sink]

Return: response()->json($order)
  └─ [DATA_EXPOSURE: Full object returned]
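The same propagation can be approximated by a single forward pass over straight-line code. The sketch below is illustrative only and is not TraceMint's engine: `Stmt`, `SOURCES`, `SINKS`, and `find_tainted_sinks` are hypothetical names, and each statement of the Laravel example is hand-translated into a simplified (target, call, arguments) form instead of being parsed with tree-sitter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stmt:
    """One statement in a hypothetical, simplified IR."""
    target: Optional[str]        # variable being assigned, or None
    call: str                    # callee name, e.g. "request.input"
    args: tuple[str, ...] = ()   # variables passed as arguments


SOURCES = {"request.input"}  # hypothetical: results are user-controlled HTTP input
SINKS = {"Order::find"}      # hypothetical: resource lookups by ID


def find_tainted_sinks(stmts: list[Stmt]) -> list[str]:
    """Forward pass: a variable becomes tainted when it is assigned from a
    source or from a call that receives a tainted argument; sink calls that
    receive a tainted argument are reported."""
    tainted: set[str] = set()
    hits: list[str] = []
    for s in stmts:
        arg_tainted = any(a in tainted for a in s.args)
        if s.call in SINKS and arg_tainted:
            hits.append(f"{s.call}({', '.join(s.args)})")
        if s.target is not None and (s.call in SOURCES or arg_tainted):
            tainted.add(s.target)
    return hits


# The body of show(), hand-translated from the Laravel example above.
show_body = [
    Stmt("orderId", "request.input"),            # $request->input('order_id')
    Stmt("order", "Order::find", ("orderId",)),  # sink receives the tainted ID
    Stmt(None, "response.json", ("order",)),     # tainted object is returned
]

print(find_tainted_sinks(show_body))  # ['Order::find(orderId)']
```

Propagating taint through every call that receives a tainted argument is deliberately coarse; it also marks `order` as tainted, which is the fact a data-exposure check on the return statement would need.
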

Codebase Indexing Output

Before analysis begins, we build a complete map of your application's structure.

🗺️ Route Map
GET    /api/orders/{id}    → OrderController@show
POST   /api/orders         → OrderController@store
GET    /api/users/{id}     → UserController@show
PUT    /api/users/{id}     → UserController@update
DELETE /api/admin/users    → AdminController@delete

🔗 Call Graph
OrderController@show
  ├─→ OrderService::getOrder()
  │     └─→ Order::find()  [SINK]
  └─→ response()->json()

UserController@update
  ├─→ $this->authorize()  [GUARD]
  └─→ User::update()

📊 Symbol Index
Classes:         127
Functions:       843
Routes:           47
Middlewares:      12
Models:           23
────────────────────
Taint Sources:    89
Potential Sinks: 156

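Route-to-sink reachability over an index like this is a plain graph search. A minimal sketch, assuming the index is available as two hypothetical dictionaries (`ROUTES`, `CALL_GRAPH`) populated with the sample data above; treating `User::update()` as a sink is our assumption, since the sample output only tags `Order::find()`.

```python
from __future__ import annotations

from collections import deque

# Hypothetical in-memory form of the sample index above.
ROUTES = {
    "GET /api/orders/{id}": "OrderController@show",
    "PUT /api/users/{id}": "UserController@update",
}
CALL_GRAPH = {
    "OrderController@show": ["OrderService::getOrder()", "response()->json()"],
    "OrderService::getOrder()": ["Order::find()"],
    "UserController@update": ["$this->authorize()", "User::update()"],
}
# Order::find() is tagged [SINK] in the sample; User::update() is assumed to be
# a sink as well because it mutates state.
SINKS = {"Order::find()", "User::update()"}


def reachable_sinks(entry: str) -> set[str]:
    """Breadth-first search from a route handler, collecting reachable sinks."""
    seen = {entry}
    queue = deque([entry])
    found: set[str] = set()
    while queue:
        fn = queue.popleft()
        if fn in SINKS:
            found.add(fn)
        for callee in CALL_GRAPH.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return found


for route, handler in ROUTES.items():
    print(f"{route:24} -> {sorted(reachable_sinks(handler))}")
```

Reachability only says that a sink can be hit from a route; whether the `[GUARD]` on the update path actually protects its sink is the dominance question addressed in section 04.
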
03

Cross-File Interprocedural Taint Tracking

Our interprocedural analysis follows data flow across function boundaries, classes, and files to find vulnerabilities that traditional scanners miss. By default, we track taint through up to 3 levels of function calls; the depth is configurable.

routes/api.php (Entry Point)
Route::get('/orders/{id}', [OrderController::class, 'show']);
→ $id propagates to the controller

Controllers/OrderController.php (Controller)
public function show($id) {
    return $this->orderService->getOrder($id);
}
→ Crosses the class boundary into the service

Services/OrderService.php (Sink Location)
public function getOrder($orderId) {
    return Order::find($orderId);  // IDOR: no ownership check!
}
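Function summaries are what keep this tractable: each function is analyzed once per remaining depth budget, and callers reuse the result. The sketch below is a simplified model, not TraceMint's implementation. It assumes a hypothetical call-site table (`CALLS`) in which each call forwards caller parameters by index, treats the first argument of a sink as the dangerous one, and reads the default of 3 levels as the number of call edges followed.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical call-site table for the chain above:
# caller -> list of (callee, tuple of caller parameter indices passed as args).
CALLS = {
    "OrderController.show": [("OrderService.getOrder", (0,))],
    "OrderService.getOrder": [("Order::find", (0,))],
}
SINKS = {"Order::find"}  # assumption: argument 0 of a sink is the dangerous one
MAX_DEPTH = 3            # default interprocedural depth


@lru_cache(maxsize=None)
def summary(fn: str, budget: int = MAX_DEPTH) -> frozenset[int]:
    """Indices of fn's parameters that can reach a sink within `budget` calls.

    Results are memoized per (fn, budget), so a callee's summary is computed
    once and reused by every caller that reaches it with the same budget."""
    if fn in SINKS:
        return frozenset({0})
    if budget == 0:
        return frozenset()  # depth exhausted: no flow proven within the budget
    reaching: set[int] = set()
    for callee, forwarded in CALLS.get(fn, []):
        callee_summary = summary(callee, budget - 1)
        for arg_pos, param_idx in enumerate(forwarded):
            if arg_pos in callee_summary:
                reaching.add(param_idx)
    return frozenset(reaching)


# The route binds {id} to parameter 0 of show(), so the route is tainted to the sink.
print(summary("OrderController.show"))     # frozenset({0})
print(summary("OrderController.show", 1))  # frozenset(): the sink is 2 calls away
```

A real engine would likely turn an exhausted budget into an explicit "unresolved" state rather than an empty summary, so that the finding can be routed to NEEDS_REVIEW instead of silently dropped.
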

Taint Flow Visualization

How tainted data flows through the application architecture:

Route → Controller → Service → Repository → DB Sink
$id → getOrder() → find() → TAINTED!
🔗

Interprocedural Analysis

Tracks taint through function calls to a configurable depth (default: 3 levels) and builds function summaries that are reused across call sites.

📁

Cross-File Resolution

Resolves imports, inheritance, and namespaces across the entire codebase. Handles dependency injection patterns.

🏷️

Field-Sensitive Tracking

Tracks taint in object fields independently, so $user->id and $user->name carry separate taint states.
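Field sensitivity comes down to tracking taint per access path (`user.name`) rather than per variable (`user`). A minimal sketch, assuming a hypothetical `FieldTaint` class that uses dot-separated paths in place of PHP's `->`; a field-insensitive tracker would report both fields of `$user` as tainted.

```python
class FieldTaint:
    """Hypothetical taint state keyed by access path, e.g. "user.name"."""

    def __init__(self) -> None:
        self.tainted: set[str] = set()

    def taint(self, path: str) -> None:
        self.tainted.add(path)

    def is_tainted(self, path: str) -> bool:
        # A path is tainted if it, or any prefix of it, was tainted as a whole:
        # tainting "user" taints "user.id", but tainting "user.name" does not.
        parts = path.split(".")
        return any(".".join(parts[:i]) in self.tainted for i in range(1, len(parts) + 1))

    def assign(self, dst: str, src: str) -> None:
        """Model `dst = src`, copying field-level taint along with the value."""
        whole = self.is_tainted(src)
        fields = [p[len(src) + 1:] for p in self.tainted if p.startswith(src + ".")]
        # Strong update: drop whatever dst and its fields held before.
        self.tainted = {p for p in self.tainted if p != dst and not p.startswith(dst + ".")}
        if whole:
            self.tainted.add(dst)
        for field in fields:
            self.tainted.add(f"{dst}.{field}")


state = FieldTaint()
state.taint("user.name")      # $user->name = $request->input('name')
print(state.is_tainted("user.name"), state.is_tainted("user.id"))  # True False
state.assign("copy", "user")  # $copy = $user
print(state.is_tainted("copy.name"), state.is_tainted("copy.id"))  # True False
```
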

04

CFG-Based Guard Verification

Finding a guard isn't enough. We verify that the guard actually protects the vulnerable sink using Control Flow Graph (CFG) dominance analysis: a guard is effective only if it dominates the sink, meaning every path from the entry point to the sink passes through it.

🔐

AUTH_CHECK

User authentication

Auth::check(), isLoggedIn()
🛡️

AUTHZ_CHECK

Object-level authorization

$this->authorize(), Gate::allows()
👤

OWNERSHIP

Resource ownership binding

$order->user_id == Auth::id()

VALIDATION

Input sanitization

filter_var(), is_numeric(), htmlspecialchars()

Guard Dominance Analysis

A guard must dominate the sink in the control flow graph. We also verify that guards have no bypass paths.

❌ Guard doesn't dominate the sink - VULNERABLE
if ($isAdmin) {
    Log::info("admin access");
}
// Guard in a different branch!
// The sink is NOT protected
$order = Order::find($id);
return $order;

✓ Guard dominates the sink - SAFE
$order = Order::find($id);
if ($order->user_id != Auth::id()) {
    abort(403);  // Throws an exception; execution stops here
}
// Guard dominates the return
return $order;
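Dominance itself is standard compiler machinery. The sketch below uses the classic iterative data-flow formulation, dom(entry) = {entry} and dom(n) = {n} ∪ ⋂ dom(p) over the predecessors p of n, on hand-built CFGs for the two snippets above. The node names (`admin_branch`, `owner_ok`, and so on) are hypothetical; each guard is modeled as the node reached only when its check passes, which is the node that must dominate the sink.

```python
def dominators(cfg: dict[str, list[str]], entry: str) -> dict[str, set[str]]:
    """Iterate dom(n) = {n} | intersection of dom(p) over predecessors p
    until nothing changes."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds: dict[str, set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Vulnerable snippet: the admin-only branch rejoins before the sink.
vulnerable = {
    "entry": ["check_admin"],
    "check_admin": ["admin_branch", "find"],  # if ($isAdmin)
    "admin_branch": ["find"],                 # logging only, then falls through
    "find": ["return"],                       # Order::find($id)
}
# Safe snippet: the failing check exits via abort(403).
safe = {
    "entry": ["find"],
    "find": ["check_owner"],
    "check_owner": ["abort", "owner_ok"],     # user_id != Auth::id() ?
    "abort": [],                              # abort(403) leaves the function
    "owner_ok": ["return"],
}


def guard_dominates(cfg: dict[str, list[str]], guard: str, sink: str) -> bool:
    return guard in dominators(cfg, "entry")[sink]


print(guard_dominates(vulnerable, "admin_branch", "find"))  # False
print(guard_dominates(safe, "owner_ok", "return"))          # True
```

The fixed-point loop is quadratic in the worst case; it is shown for clarity rather than speed.
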

Proof Obligations: How We Verify

We don't reduce false positives with filters. We reduce them with proof obligations. Every finding must satisfy a formal obligation checklist before it receives a verdict.

🔍
ACCESS

Resource primitive detected (fetch/update/delete via ORM, raw query, file op)

Required
🔗
BINDING

ID parameter bound to authorization context (user_id, tenant_id, session)

Check
🛡️
DOMINANCE

Guard dominates sink in CFG (no bypass path exists)

Check
EFFECT

Security-relevant impact (data exposure, state mutation, privilege escalation)

Required
VULN = ACCESS ∧ ¬BINDING ∧ ¬DOMINANCE ∧ EFFECT

All obligations must be proven or disproven. Uncertainty → NEEDS_REVIEW
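One way to evaluate the formula while some obligations are still unresolved is Kleene's three-valued logic, in which "unknown" propagates unless another term already decides the result. A minimal sketch under that assumption; `Tri`, `k_and`, `k_not`, `evaluate`, and the `NOT_VULN` label are hypothetical. Reaching SAFE follows the stricter criteria listed under Verdict Outcomes, so this sketch only reports that the formula is false.

```python
from typing import Optional

# True = proven, False = disproven, None = not yet resolved.
Tri = Optional[bool]


def k_not(a: Tri) -> Tri:
    return None if a is None else not a


def k_and(*terms: Tri) -> Tri:
    """Kleene conjunction: any False decides False; otherwise any unknown
    makes the result unknown; otherwise True."""
    if any(t is False for t in terms):
        return False
    if any(t is None for t in terms):
        return None
    return True


def evaluate(access: Tri, binding: Tri, dominance: Tri, effect: Tri) -> str:
    vuln = k_and(access, k_not(binding), k_not(dominance), effect)
    if vuln is None:
        return "NEEDS_REVIEW"  # uncertainty is surfaced, never silently dropped
    return "VULN" if vuln else "NOT_VULN"


# The OrderController finding in section 07: ACCESS and EFFECT proven,
# BINDING and DOMINANCE disproven.
print(evaluate(True, False, False, True))  # VULN
print(evaluate(True, None, False, True))   # NEEDS_REVIEW
print(evaluate(True, True, None, True))    # NOT_VULN
```
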

Verdict Outcomes

VULN
All conditions must be true:
  • Taint source reaches sink (verified path)
  • No sanitizer on the path
  • No guard dominates the sink in CFG
  • Effect is security-relevant (data exposure, state change)
→ Report with full evidence chain
NEEDS_REVIEW
Uncertainty in analysis:
  • Guard exists but can't verify effectiveness
  • Custom sanitizer detected but not in our knowledge base
  • Cross-file resolution incomplete
  • Dynamic dispatch blocks static analysis
→ Flag for manual review with context
SAFE
Protection verified:
  • Guard dominates sink AND binds to user identity
  • Sanitizer verified for this vulnerability class
  • Input constrained (enum, hardcoded, internal-only)
  • Framework provides implicit protection
→ Suppress with a documented reason

05

5-Stage False Positive Reduction

Our multi-stage filtering system eliminates false positives while preserving real vulnerabilities. Each stage applies progressively more sophisticated analysis to reduce noise.

1

Static Context Filtering

  • Test file detection
  • Comment/string literal exclusion
  • Dead code path removal
  • Example/demo file detection
~20% filtered
2

Framework-Aware Analysis

  • Framework-safe patterns (ORM parameterization)
  • Built-in validation detection
  • Internal/admin-only routes
  • Config file exclusion
~15% filtered
3

AST Sanitizer Detection

  • Category-specific sanitizers
  • Custom validator recognition
  • Type coercion analysis
  • Encoding function detection
~25% filtered
4

Taint-Aware Reachability

  • Cross-file taint verification
  • Function summary utilization
  • Alias & field tracking
  • Conditional taint flow
~20% filtered
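The staged design can be pictured as an ordered chain of keep/drop predicates, each applied only to what earlier stages let through. A minimal sketch of the four stages listed above, assuming a hypothetical `Candidate` record whose boolean fields stand in for the results of each stage's real analysis; no reduction percentages are implied.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Candidate:
    """Hypothetical raw finding; the booleans stand in for real per-stage analyses."""
    file: str
    in_comment_or_string: bool = False
    orm_parameterized: bool = False
    sanitizer_on_path: bool = False
    taint_reaches_sink: bool = True


def is_test_file(path: str) -> bool:
    p = PurePosixPath(path)
    return "tests" in p.parts or p.name.endswith(("Test.php", "_test.py"))


# Ordered (stage name, keep-predicate) pairs.
STAGES = [
    ("static_context", lambda c: not is_test_file(c.file) and not c.in_comment_or_string),
    ("framework_aware", lambda c: not c.orm_parameterized),
    ("ast_sanitizer", lambda c: not c.sanitizer_on_path),
    ("taint_reachability", lambda c: c.taint_reaches_sink),
]


def run_filters(cands: list[Candidate]) -> tuple[list[Candidate], list[tuple[str, int]]]:
    """Apply each stage in order; return survivors and per-stage drop counts."""
    drops = []
    for name, keep in STAGES:
        kept = [c for c in cands if keep(c)]
        drops.append((name, len(cands) - len(kept)))
        cands = kept
    return cands, drops


cands = [
    Candidate("tests/OrderTest.php"),
    Candidate("app/Models/User.php", orm_parameterized=True),
    Candidate("app/Services/Search.php", sanitizer_on_path=True),
    Candidate("app/Legacy/Report.php", taint_reaches_sink=False),
    Candidate("app/Http/Controllers/OrderController.php"),
]
survivors, drops = run_filters(cands)
print([c.file for c in survivors])  # ['app/Http/Controllers/OrderController.php']
print(drops)
```
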

Verdict Levels — Not Percentages

We don't claim arbitrary FP reduction numbers. Instead, every finding gets a clear verdict level based on proof obligations:

VERIFIED: Docker PoC executed successfully; exploit confirmed
PROOF-BACKED: ACCESS + BINDING + DOMINANCE + EFFECT chain complete
NEEDS_REVIEW: Strong signal but incomplete proof; human review recommended

06

AI-Assisted Semantic Layer (Proof-Gated)

Our local 32B-parameter model accelerates analysis but never makes final decisions alone. Every AI suggestion must pass through our deterministic proof kernel before it can become a verdict. The AI assists; the proof engine decides.

⚖️

Trust Model: What's Deterministic vs AI?

Always On
Deterministic Core
  • AST / CFG / DFG analysis
  • Cross-file taint tracking
  • Guard dominance verification
  • Proof obligation checks
  • Verdict engine logic
Optional
AI-Assisted
  • Candidate expansion
  • Finding ranking
  • Patch suggestions
  • Human-readable explanations
  • Business logic hints
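The split can be expressed as a single rule: the model may propose and order candidates, but only the deterministic kernel writes a verdict. A minimal sketch, assuming hypothetical `kernel` and `ai_rank` callables and made-up candidate names; in practice the kernel would be the proof-obligation checks described earlier, not a lookup table.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    candidate: str
    verdict: str     # written only by the deterministic kernel
    ai_score: float  # advisory; affects ordering, never the verdict


def proof_gate(
    candidates: list[str],
    kernel: Callable[[str], str],
    ai_rank: Callable[[str], float],
) -> list[Finding]:
    """Run every candidate (including AI-expanded ones) through the kernel,
    drop those it proves SAFE, then order the rest by the model's score."""
    findings = [Finding(c, kernel(c), ai_rank(c)) for c in candidates]
    reported = [f for f in findings if f.verdict != "SAFE"]
    return sorted(reported, key=lambda f: f.ai_score, reverse=True)


# Hypothetical inputs: the model likes users.update most, but the kernel proved it SAFE.
kernel_verdicts = {"orders.show": "VULN", "users.update": "SAFE", "export.csv": "NEEDS_REVIEW"}
ai_scores = {"orders.show": 0.62, "users.update": 0.97, "export.csv": 0.81}

for f in proof_gate(list(kernel_verdicts), kernel_verdicts.__getitem__, ai_scores.__getitem__):
    print(f.candidate, f.verdict, f.ai_score)
# export.csv NEEDS_REVIEW 0.81
# orders.show VULN 0.62
```
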

Private, In-House Fine-Tuned Model

Fine-tuned in-house on curated vulnerability data for localization and ranking. No external LLM API calls.

🔒 30K+ Training Examples
📝 25+ Vuln Categories
🎯 80.4% Strict CVE Recall
🔄 Active Development

Semantic Code Understanding

Understands what code does, not just what it looks like. Recognizes custom validators, business logic guards, and framework idioms that pattern matching cannot identify.

Context-Aware Reasoning

Analyzes surrounding code context to determine if a pattern is actually vulnerable or if there's implicit protection. Understands auth middleware, role checks, and ownership patterns.

Continuous Learning

The model is continuously updated with new vulnerability patterns from our ongoing security research. Every CVE we discover improves detection for the next scan.

07

Auto-Generated Proof of Concept

TraceMint generates a PoC automatically and, if a Docker setup is available, replays it in a local lab. No more spending hours crafting exploit payloads: PoCs are built from the detected vulnerability pattern and your application's API structure.

OrderController.php (CRITICAL)
$orderId = $request->input('order_id');
$order = Order::find($orderId);

// Missing: ownership check! Should be:
// if ($order->user_id != Auth::id()) { abort(403); }

return response()->json($order);

Proof Obligations

ACCESS: Proven ✓

Resource primitive detected: Order::find() performs database lookup via Eloquent ORM.

BINDING: Failed ✗

No binding found between $orderId and authenticated user context (Auth::id()).

DOMINANCE: Failed ✗

No guard dominates the sink. Auth middleware exists but doesn't check resource ownership.

EFFECT: Proven ✓

DATA_EXPOSURE: Full order object returned including PII fields.

Verdict: VULN

One-Click Simulation

Copy-paste-ready PoC commands let you test vulnerabilities immediately, without crafting payloads by hand.

🎯

Context-Aware Payloads

PoCs are generated based on your app's actual routes, parameters, and authentication mechanisms.

📋

Report-Ready Evidence

Every finding includes full taint chain, guard analysis, and reproduction steps for security reports.

🐳

Local Docker Replay

If the repo contains a docker-compose file or a Dockerfile, the PoC runs automatically in an isolated lab, and a successful run marks the finding as Verified.
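For an IDOR, the core of such a PoC is replaying the victim's resource URL with the attacker's own credentials. A minimal sketch, assuming bearer-token authentication and a hypothetical `idor_poc` helper; `can_replay_locally` is a hypothetical stand-in for the docker-compose/Dockerfile check described above, using only the standard library.

```python
import shlex
from pathlib import Path


def idor_poc(method: str, route: str, victim_id: int, base_url: str, attacker_token: str) -> str:
    """Build a curl command that requests another user's resource with the
    attacker's own credentials (assumes bearer-token auth)."""
    url = base_url.rstrip("/") + route.replace("{id}", str(victim_id))
    return " ".join([
        "curl", "-i", "-X", method,
        shlex.quote(url),
        "-H", shlex.quote(f"Authorization: Bearer {attacker_token}"),
    ])


def can_replay_locally(repo: Path) -> bool:
    """True if the repo ships a Compose file or a Dockerfile at its root."""
    names = ("docker-compose.yml", "docker-compose.yaml", "Dockerfile")
    return any((repo / name).is_file() for name in names)


print(idor_poc("GET", "/api/orders/{id}", 1002, "http://localhost:8000/", "attacker-token"))
# curl -i -X GET http://localhost:8000/api/orders/1002 -H 'Authorization: Bearer attacker-token'
```

A replay would then plausibly compare the response against the expected denial (for example, a 200 carrying the victim's order where a 403 was due).
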

"We don't just flag. We prove — and if Docker is available, we reproduce locally."

08

Real-World Results

TraceMint has been battle-tested against 30+ open-source projects, discovering and responsibly disclosing critical vulnerabilities. These are real CVEs, not synthetic benchmarks.

30+ OSS Projects Audited
50+ Vulnerabilities Reported
High Recall on Known CVEs
15+ Critical Severity
EXTENSIBILITY

Built for Customization

TraceMint's modular architecture lets you add new languages, frameworks, and detection rules without modifying core analysis logic.

Core IR + Proof Kernel
Language-agnostic intermediate representation. Proof obligations, CFG analysis, verdict engine.
Stable API
Language Front-ends
Python, PHP, JavaScript, Java, Go, Ruby, +10 more
Framework Adapters
Django, Laravel, Express, Spring, FastAPI, Rails, +20 more
Plugin System
Vuln Category Rules
IDOR, SQLi, SSRF, XSS, RCE, Custom...
Output Formats
SARIF, JSON, HTML, Markdown, Custom...
YAML Config

Extension Points

🔧
Custom Rules

Define new vulnerability patterns in YAML. Specify sources, sinks, sanitizers, and proof requirements.

rules/custom/my_pattern.yaml
🌐
New Frameworks

Add framework adapters that teach TraceMint about routes, middleware, and built-in protections.

adapters/my_framework.py
🗣️
New Languages

Implement a tree-sitter-based parser and taint engine. The core analysis remains unchanged.

engines/my_lang_taint.py
📤
Custom Reporters

Export findings in any format. SARIF is supported out of the box, and reporters are easy to extend to Jira, Slack, and other targets.

reporters/my_output.py
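A custom reporter is ultimately a function from findings to a document. As an illustration, here is a minimal sketch that emits a SARIF 2.1.0 log using only the standard library; the finding dictionary's keys, the example values, and the VULN-to-`error` level mapping are assumptions, not TraceMint's reporter interface.

```python
from __future__ import annotations

import json


def to_sarif(findings: list[dict]) -> str:
    """Serialize findings as a minimal SARIF 2.1.0 log (one run, one tool)."""
    results = []
    for f in findings:
        results.append({
            "ruleId": f["rule"],
            # Assumed mapping: proven vulns are errors, everything else a warning.
            "level": "error" if f["verdict"] == "VULN" else "warning",
            "message": {"text": f["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": f["file"]},
                    "region": {"startLine": f["line"]},
                }
            }],
        })
    log = {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "TraceMint"}}, "results": results}],
    }
    return json.dumps(log, indent=2)


# Hypothetical finding, shaped after the OrderController example.
finding = {
    "rule": "IDOR",
    "verdict": "VULN",
    "message": "Order::find() reachable with user-controlled ID and no ownership check",
    "file": "app/Http/Controllers/OrderController.php",
    "line": 47,
}
print(to_sarif([finding]))
```
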
ADVANTAGE

Four Pillars That Set Us Apart

Competitors promise "AI agents" and "zero false positives." We deliver something more defensible: a system you can verify, trust, and deploy on your terms — self-hosted or managed SaaS.

01
🔒

Data-Control First

Your code, your deployment choice

  • Self-hosted or managed SaaS options
  • No third-party LLM API calls
  • Air-gapped deployment supported
  • SOC 2, FedRAMP, and GDPR compatible
vs. Third-Party LLMs: They send code to external APIs. We keep it in your control.
02
📋

Proof-First

Every finding ships with evidence

  • 4 proof obligations: ACCESS, BINDING, DOMINANCE, EFFECT
  • Complete source→sink taint chain
  • Guard analysis with CFG dominance
  • Exportable evidence for compliance
vs. Pattern Matchers: They say "possible IDOR." We prove why it's exploitable.
03
🔬

Verification-Ready

We eliminate FPs—not you

  • 5-stage FP reduction pipeline
  • Auto-generated PoC for each finding
  • Semantic guard verification
  • Patch-pair testing prevents regressions
vs. Alert Fatigue Tools: They dump 500 alerts. We surface 7 verified vulns.
04

Mode-Based

Fast for CI, Deep for audits

  • FAST: Pattern + AST for PR checks
  • BALANCED: Proof kernel + taint tracking
  • DEEP: Full LLM verification + PoC
  • Same engine, configurable depth
vs. One-Size-Fits-All: They run the same scan everywhere. We adapt to context.
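Because all three modes share one engine, a mode can be reduced to the set of stages it enables. A minimal sketch; the stage names, the assumption that each deeper mode is a superset of the one before it, and the `mode_for` trigger mapping are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    DEEP = "deep"


# Hypothetical stage names; each deeper mode adds to the previous one.
_FAST = ("pattern", "ast")
_BALANCED = _FAST + ("taint", "proof_kernel")
_DEEP = _BALANCED + ("llm_verification", "poc")

STAGES = {Mode.FAST: _FAST, Mode.BALANCED: _BALANCED, Mode.DEEP: _DEEP}


def mode_for(trigger: str) -> Mode:
    """Map a scan trigger to a mode; unknown triggers default to BALANCED."""
    return {"pull_request": Mode.FAST, "audit": Mode.DEEP}.get(trigger, Mode.BALANCED)


print(STAGES[mode_for("pull_request")])  # ('pattern', 'ast')
print(STAGES[mode_for("audit")])
```
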

The Difference in Action

Pattern Matcher

"Found User::find($id) - possible IDOR"

You investigate. You write the PoC. You decide if it's real.
TraceMint

IDOR: User::find() at line 47
✗ BINDING: No ownership check against Auth::id()
✗ DOMINANCE: Guard at line 12 doesn't protect sink
→ EFFECT: DATA_EXPOSURE (email, address, phone)

Proof chain complete. PoC generated. Ready to fix or report.

Ready to find vulnerabilities in your codebase?

Start scanning with TraceMint today. See the difference semantic analysis makes.