Vulnerability Scanner – CrewAI

I built VulnHawk, a self-hosted AI pipeline that clones a Java or Kotlin repository, resolves the full transitive dependency tree, checks every dependency against the OSV and NVD vulnerability databases, and generates an actionable security report with BOM-aware upgrade recommendations and the exact build file lines to change. It runs as a Docker container with a built-in web UI and scan history. The code is open source.

GitHub: github.com/crewwithravi/scanner

What is VulnHawk?

VulnHawk is a personal research project I built to explore multi-agent AI architectures applied to software supply chain security. The idea is straightforward:

You submit — a GitHub repository URL or a list of Maven/Gradle dependency coordinates (via the API or web UI)
VulnHawk resolves — the full transitive dependency tree using mvn dependency:tree or gradle dependencies, then batch-queries the OSV and NVD vulnerability databases
AI interprets — a CrewAI agent pipeline reads the vulnerability data, checks whether each dep is BOM-managed, reviews changelogs, and searches the source code for affected APIs
You receive — a structured 10-section Markdown report with CVE IDs, severity ratings, and the exact version bump to make in your pom.xml or build.gradle

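The key design principle: the vulnerability data is always real. CVE IDs and version numbers come from the OSV, NVD, and Maven Central lookups. The AI only interprets and recommends, and its report is checked against that raw data after generation, so it is never the source of a CVE ID or version number.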

Why I Built It

As someone working in enterprise application development — and currently pursuing a Master's in Data Science — I wanted a project that sits at the intersection of production engineering and applied AI. Dependency auditing was the right target: it is tedious, repetitive, and full of decisions that look simple but are surprisingly subtle.

The subtlest problem is the BOM trap. In a typical Spring Boot project, libraries like embedded Tomcat, Jackson, and Netty are not declared directly in pom.xml; their versions are managed by the Spring Boot BOM. If you override one of them directly, it drifts out of step with the set of versions the BOM manages together. The correct fix is usually to bump the Spring Boot parent version, which brings in the safe library version automatically. A scanner that ignores BOM management will give you either the wrong fix or no fix at all.

VulnHawk detects this automatically and tells you exactly which parent version to change.

Key Features

Feature	Description
Full transitive resolution	Runs `mvn dependency:tree` or `gradle dependencies` to capture every transitive dependency — not just the ones declared in the build file. Falls back to BFS POM expansion via Maven Central if no build tool is installed.
OSV + NVD dual-source	Batch-queries the OSV API for all dependencies. Automatically falls back to NVD for Apache Tomcat, Struts, and Log4j — packages that OSV only tracks via Git commit ranges rather than Maven coordinates.
BOM-aware upgrades	Detects Spring Boot BOM-managed libraries and recommends bumping the parent version instead of the dependency directly. Covers 8 Spring Boot releases from 2.6.x through 3.5.x.
Changelog + code review	For each upgrade, fetches release notes from GitHub, searches the project source for affected API usage, and produces a confidence score on whether the upgrade is safe.
Scan history & OSV cache	Every scan is saved to a local SQLite database. OSV results are cached for 24 hours — re-scanning the same project within that window skips the API calls entirely.
Fast / deterministic mode	Skips the LLM agents and generates the report programmatically from raw tool output. Enabled automatically with Ollama backends or by setting `VULNHAWK_FAST=1`.
Web UI + REST API	Browser UI with live progress, rendered report tables, severity badges, and a scan history panel. Full REST API with Swagger docs at `/docs`.
Multi-LLM support	Switch between Google Gemini (default, no GPU needed) and Ollama (self-hosted, NVIDIA GPU required) with one environment variable.
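
To make the "batch-queries the OSV API" row concrete, here is a minimal sketch of how a batch request body can be assembled for OSV's `/v1/querybatch` endpoint. The function name `build_osv_batch` and the sample dependency are illustrative assumptions, not VulnHawk's actual code; the payload shape (a `queries` list with `package.ecosystem`, `package.name`, and `version`) follows the public OSV API.

```python
import json

# Public OSV batch endpoint; one POST can carry many package queries.
OSV_BATCH_URL = "https://api.osv.dev/v1/querybatch"


def build_osv_batch(deps):
    """Build the JSON body for one OSV querybatch request.

    `deps` is a list of dicts with group_id, artifact_id and version keys
    (the same field names Agent 1 is described as emitting).
    """
    queries = []
    for dep in deps:
        queries.append({
            "package": {
                "ecosystem": "Maven",
                # OSV names Maven packages as "groupId:artifactId".
                "name": f"{dep['group_id']}:{dep['artifact_id']}",
            },
            "version": dep["version"],
        })
    return {"queries": queries}


# Illustrative input, not taken from a real scan.
sample_deps = [
    {"group_id": "org.apache.logging.log4j", "artifact_id": "log4j-core", "version": "2.14.1"},
]
payload = build_osv_batch(sample_deps)
print(json.dumps(payload, indent=2))
```

Sending the payload is then a single POST to `OSV_BATCH_URL`; the sketch stops short of the network call so it can run offline.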

Architecture: 4-Agent Pipeline

VulnHawk uses a sequential CrewAI pipeline — four specialised agents that hand structured data to the next in line. All data is fetched and computed from real sources before the AI layer ever sees it.

Agent 1  Repo Scanner          → detect Maven/Gradle, run dependency:tree
                                  output: [{group_id, artifact_id, version, scope, depth}]

Agent 2  Vulnerability Analyst → batch OSV query + NVD fallback per dep
                                  output: {vuln_count, vulnerabilities: [{id, severity, fix_version}]}

Agent 3  Upgrade Strategist    → BOM check → version lookup → changelog → code search
                                  output: upgrade plan with confidence score per dep

Agent 4  Report Generator      → 10-section Markdown report with exact build file changes
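
As an illustration of Agent 1's output shape, the sketch below parses the text output of `mvn dependency:tree` into the `{group_id, artifact_id, version, scope, depth}` records listed above. It is a simplified assumption of how such a parser could look, not the project's real implementation: the function name, the regex, and the sample tree are all made up for this example.

```python
import re

# Each dependency line looks like "[INFO] |  +- group:artifact:jar:1.0:compile".
# The indent is built from 3-character units ("|  " or "   "), one per tree level.
LINE_RE = re.compile(r"^\[INFO\] (?P<indent>(?:[| ] {2})*)[+\\]- (?P<coords>\S+)")


def parse_dependency_tree(text):
    """Parse `mvn dependency:tree` text output into flat dependency records."""
    deps = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # root project line, banners, blank lines
        parts = match.group("coords").split(":")
        if len(parts) == 5:
            group_id, artifact_id, _type, version, scope = parts
        elif len(parts) == 6:  # coordinates that include a classifier
            group_id, artifact_id, _type, _classifier, version, scope = parts
        else:
            continue
        deps.append({
            "group_id": group_id,
            "artifact_id": artifact_id,
            "version": version,
            "scope": scope,
            # Direct dependencies are depth 1, their children depth 2, and so on.
            "depth": len(match.group("indent")) // 3 + 1,
        })
    return deps


# Illustrative sample, not output from a real project.
SAMPLE = r"""[INFO] com.example:demo:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.boot:spring-boot-starter-web:jar:3.2.5:compile
[INFO] |  \- org.apache.tomcat.embed:tomcat-embed-core:jar:10.1.20:compile
[INFO] \- org.junit.jupiter:junit-jupiter:jar:5.10.2:test
"""
deps = parse_dependency_tree(SAMPLE)
for dep in deps:
    print(dep)
```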

Every API response separates verified data (CVE IDs, CVSS scores, affected version ranges from OSV/NVD) from AI output (prose summaries, compatibility assessment, confidence scores). The LLM report is validated against the raw vulnerability data after generation — if it fails validation, VulnHawk silently falls back to a deterministic report builder.
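
One way to picture that validation step: extract every CVE ID the LLM report mentions and reject the report if any ID is missing from the verified scan data. The sketch below assumes this particular check and uses hypothetical function names; the real validator may check more (for example version numbers or GHSA IDs).

```python
import re

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")


def unverified_ids(report_md, verified_ids):
    """Return CVE IDs mentioned in the report that are absent from the scan data."""
    return sorted(set(CVE_RE.findall(report_md)) - set(verified_ids))


def choose_report(llm_report, verified_ids, deterministic_builder):
    """Keep the LLM report only if every CVE ID it cites was actually found."""
    if unverified_ids(llm_report, verified_ids):
        return deterministic_builder()  # fall back to the programmatic report
    return llm_report


verified = {"CVE-2021-44228"}
good = "log4j-core is affected by CVE-2021-44228."
# CVE-2099-0001 is a made-up ID standing in for a hallucinated one.
bad = "log4j-core is affected by CVE-2021-44228 and CVE-2099-0001."

print(choose_report(good, verified, lambda: "deterministic report"))
print(choose_report(bad, verified, lambda: "deterministic report"))
```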

Tech Stack

Component	Technology	Purpose
Agent framework	CrewAI	Sequential multi-agent pipeline with tool use
LLM (cloud)	Google Gemini 2.0 Flash	Default backend — fast, free tier available
LLM (self-hosted)	Ollama	Local option, NVIDIA GPU required, auto fast mode
API server	FastAPI + Uvicorn	REST endpoints, lifespan management, static file serving
Vulnerability data	OSV API + NVD API	Primary CVE source with NVD fallback for blind spots
Version lookup	Maven Central Search API	Find the smallest safe upgrade version on Maven Central
Persistence	SQLite (WAL mode)	Scan history and 24h OSV result cache
Container	Docker + Docker Compose	Non-root image with Java 17 + Maven; optional Ollama GPU service
Web UI	HTML + Tailwind CSS + JS	Built-in dark-themed UI served directly by FastAPI — no build step
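
The persistence row can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration of a 24-hour cache on SQLite with WAL enabled; the table name, column names, and helper functions are assumptions for the example rather than VulnHawk's real schema.

```python
import json
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # cached OSV results expire after 24 hours


def open_cache(path):
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active (file-backed databases only;
    # an in-memory database simply ignores the request).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS osv_cache ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def cache_put(conn, key, vulns, now=None):
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO osv_cache VALUES (?, ?, ?)",
        (key, json.dumps(vulns), now),
    )
    conn.commit()


def cache_get(conn, key, now=None):
    """Return the cached result, or None if it is missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, fetched_at FROM osv_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > TTL_SECONDS:
        return None
    return json.loads(row[0])


conn = open_cache(":memory:")  # use a file path to persist between runs
key = "org.apache.logging.log4j:log4j-core:2.14.1"
cache_put(conn, key, [{"id": "CVE-2021-44228"}], now=0)
print(cache_get(conn, key, now=3600))             # fresh entry: served from cache
print(cache_get(conn, key, now=TTL_SECONDS + 1))  # stale entry: treated as a miss
```

Passing `now` explicitly is only there to make the expiry behaviour easy to demonstrate; in normal use both helpers fall back to the current time.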

The Web UI

VulnHawk ships with a built-in web interface — no separate frontend build required. It includes:

Dark theme with glass-morphism design matching the API aesthetic
GitHub URL tab and dependency list tab — paste coordinates directly to scan without cloning
Live progress bar showing which agent is currently running
Rendered report with colour-coded severity badges (CRITICAL / HIGH / MEDIUM / LOW), sortable dependency table, and upgrade plan
History panel — clock icon top-right — browse, reload, or delete any past scan
Health indicator showing LLM backend connectivity

The frontend is pure HTML, JavaScript, and CSS served directly by FastAPI. Zero build step, zero npm.
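
For reference, the four badge levels line up with the standard CVSS v3.x qualitative bands. The sketch below shows that bucketing in Python under the assumption that severity is derived from a CVSS base score; the function name is hypothetical, and the real UI renders badges in JavaScript and may instead use the severity label supplied by the vulnerability database.

```python
def severity_from_cvss(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative severity band."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    if score > 0.0:
        return "LOW"
    return "NONE"


for score in (9.8, 7.5, 5.0, 3.9, 0.0):
    print(score, severity_from_cvss(score))
```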

API Endpoints

Method	Endpoint	Description
`GET`	`/health`	Health check — LLM backend connectivity and provider info
`POST`	`/scan`	Run a full vulnerability scan — accepts `github_url` or raw `input` dependency coordinates
`GET`	`/history`	List past scan summaries ordered most-recent first (supports `?limit=N`)
`GET`	`/history/{id}`	Retrieve the full Markdown report for a specific past scan
`DELETE`	`/history/{id}`	Remove a scan record from the history database

Interactive API documentation is auto-generated at /docs (Swagger UI).
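
A small client-side sketch of calling `POST /scan` with only the Python standard library. The `github_url` and `input` field names come from the endpoint table above; everything else is an assumption for illustration, in particular that `input` accepts newline-separated `group:artifact:version` lines and that the body is JSON. Check the Swagger UI at `/docs` for the actual schema.

```python
import json
import urllib.request


def build_scan_request(base_url, github_url=None, dependencies=None):
    """Build (but do not send) a POST /scan request for one of the two input modes."""
    if (github_url is None) == (dependencies is None):
        raise ValueError("pass exactly one of github_url or dependencies")
    if github_url is not None:
        body = {"github_url": github_url}
    else:
        # Assumption: `input` takes newline-separated group:artifact:version lines.
        body = {"input": "\n".join(dependencies)}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/scan",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_scan_request(
    "http://localhost:8000",
    dependencies=["org.apache.logging.log4j:log4j-core:2.14.1"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running instance:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```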

Running It Yourself

The quickest way to try VulnHawk locally:

git clone https://github.com/crewwithravi/scanner.git
cd scanner
cp .env.example .env
# Add your GEMINI_API_KEY to .env

./deploy.sh

The deploy script handles everything — creates the .env interactively if it doesn't exist, builds the Docker image, starts the container, and waits for the health check. Then open http://localhost:8000 in your browser.

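To run the API server without Docker (a local JDK with Maven or Gradle is needed for full transitive resolution; without one, VulnHawk falls back to POM expansion via Maven Central):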

pip install -r requirements.txt

LLM_VENDOR=google \
GEMINI_API_KEY=your-api-key \
GOOGLE_MODEL=gemini-2.0-flash \
uvicorn app.main:app --host 0.0.0.0 --port 8000

For full deployment options, scan history persistence, and Docker volume configuration, see the documentation on GitHub.
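
The environment variables above also decide whether the LLM agents run at all. A rough sketch of that decision, assuming `LLM_VENDOR=ollama` is the value that selects the Ollama backend and that Gemini is the default when the variable is unset; the function name is hypothetical and the project's real configuration code may differ:

```python
import os


def use_fast_mode(env=None):
    """Decide whether to skip the LLM agents and build the report deterministically."""
    env = os.environ if env is None else env
    # Explicit opt-in always wins.
    if env.get("VULNHAWK_FAST") == "1":
        return True
    # Assumption: "ollama" is the vendor value for the self-hosted backend,
    # which is described as switching to fast mode automatically.
    return env.get("LLM_VENDOR", "google").lower() == "ollama"


print(use_fast_mode({"LLM_VENDOR": "google"}))
print(use_fast_mode({"LLM_VENDOR": "google", "VULNHAWK_FAST": "1"}))
print(use_fast_mode({"LLM_VENDOR": "ollama"}))
```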

BOM-Aware Upgrades: How It Works

This is the part of VulnHawk I'm most proud of. For every vulnerable dependency, Agent 3 first checks whether it is managed by the Spring Boot BOM before recommending any version change. Here is what that looks like in the report:

⚠  DO NOT bump tomcat-embed-core directly — it is BOM-managed.

FIX: Upgrade spring-boot 3.2.5 → 3.3.10
     This automatically brings tomcat-embed-core 10.1.30 (≥ safe version).

pom.xml:       <spring-boot.version>3.3.10</spring-boot.version>
build.gradle:  id 'org.springframework.boot' version '3.3.10'

Libraries covered by the BOM resolver:

Library	Maven Coordinates	Common Vulnerability
Embedded Tomcat	`org.apache.tomcat.embed:tomcat-embed-core`	Session fixation, request smuggling CVEs
Jackson Databind	`com.fasterxml.jackson.core:jackson-databind`	Deserialization gadget chain CVEs
Netty	`io.netty:netty-all`	HTTP request smuggling CVEs
Log4j	`org.apache.logging.log4j:log4j-core`	Log4Shell (CVE-2021-44228) and follow-on CVEs
Spring Framework	`org.springframework:spring-core`	Spring4Shell and expression injection CVEs
Logback	`ch.qos.logback:logback-classic`	JNDI injection CVEs
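
The BOM check itself boils down to a lookup plus a version comparison. Here is a simplified sketch under stated assumptions: the lookup table is hypothetical (only the 3.3.10 to 10.1.30 pair is taken from the sample report above, the other entry is a placeholder and not real BOM data), the function names are invented, and versions are assumed to be purely numeric.

```python
def parse_version(version):
    """Turn '10.1.30' into (10, 1, 30). Assumes numeric, dot-separated versions."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical table: library -> {Spring Boot version: managed library version}.
# Placeholder values for illustration; a real resolver would read the published BOMs.
BOM_TABLE = {
    "org.apache.tomcat.embed:tomcat-embed-core": {
        "3.2.5": "10.1.20",
        "3.3.10": "10.1.30",
    },
}


def recommend_fix(coord, current_boot, safe_version):
    """Recommend a parent bump for BOM-managed libraries, a direct bump otherwise."""
    managed = BOM_TABLE.get(coord)
    if managed is None:
        return {"action": "bump_dependency", "to": safe_version}
    # Spring Boot versions at or above the current one whose managed
    # library version already includes the fix.
    candidates = [
        boot for boot, lib in managed.items()
        if parse_version(boot) >= parse_version(current_boot)
        and parse_version(lib) >= parse_version(safe_version)
    ]
    if not candidates:
        return {"action": "manual_review", "to": None}
    # Prefer the smallest jump that is still safe.
    return {"action": "bump_parent", "to": min(candidates, key=parse_version)}


fix = recommend_fix("org.apache.tomcat.embed:tomcat-embed-core", "3.2.5", "10.1.30")
print(fix)
other = recommend_fix("com.example:not-managed", "3.2.5", "1.2.3")
print(other)
```

Picking the smallest qualifying parent version keeps the upgrade as small as possible, which matches the "smallest safe upgrade" goal stated in the tech stack table.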

What's Next

Support for Gradle multi-project builds — scan each subproject separately
GitHub Actions integration — post scan results as PR comments
Private repository support via GitHub token
Scheduled background re-scans with email / Slack alerts when new CVEs are published for tracked repos
SBOM export in CycloneDX format

Disclaimer

VulnHawk is an educational and experimental project built for learning and demonstration purposes. It is not production security software. Scan results are a useful starting point but may be incomplete — the OSV and NVD databases have known gaps, and AI-generated upgrade assessments should always be verified independently before being applied to production systems. Always consult a qualified security professional for critical decisions. Use at your own risk.

Full source code, deployment guide, and API docs:
github.com/crewwithravi/scanner

Licensed under MIT · Built with CrewAI, FastAPI, and OSV