How Artemis Found Hidden Bugs in NVIDIA GPU Libraries

What happens when you run Artemis on production-grade CUDA libraries?
You don’t just optimize code—you uncover bugs that test suites miss.
We applied Artemis to cuCollections and NCCL, two core GPU-accelerated libraries from NVIDIA. Both are widely used for high-performance data structures and multi-GPU communication. Artemis flagged real-world bugs, cleaned up edge-case logic, and made critical paths safer and more maintainable.
📄 Read the white paper: Managing Technical Debt with Artemis
What is Artemis?
Artemis is an evolutionary AI platform for code optimization and validation. It helps developers:
- Analyze code to find hidden bugs, bottlenecks, and design flaws—fast.
- Optimize code with evolutionary algorithms—beyond what prompt-based LLMs alone can do.
- Validate that code changes compile and pass unit tests, and benchmark their performance.
How Artemis Fixes Bugs
Artemis uses a three-phase process to analyse, optimize and validate issues in real-world code.
0. Setup: Custom Scoring Prompts
Artemis evaluates code quality using LLM-generated scores. Each score is driven by prompt instructions, which are applied at different phases of the optimization process, both to the original codebase and to code suggestions produced by other LLMs. The setup defines three prompts:

- BUG: for scoring the original code for bugs
- FIX: for scoring the code suggestions for how well it fixes the bug
- Code Analysis and Bug Fixing Process: for generating code change recommendations
This setup allows Artemis to systematically reason about large volumes of code and target areas most in need of attention.
1. Analyse: Locate Hidden Bugs in Source Code
Artemis begins by extracting functions (or classes/files) and running automatic review tasks on each one.
We used this prompt to detect bugs:
“Analyze the code for potential bugs. Give the highest score if there is definitely a bug. Give the lowest score if there is no bug.”
Artemis scores every function, allowing you to pinpoint suspect logic paths likely to fail or behave incorrectly, even if no test currently exposes the issue. It removes the need for test-based bug discovery and enables bug triage at scale.
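As a rough illustration of triage by score, the sketch below keeps the functions whose bug score meets a threshold and orders them highest first. The `ScoredFunction` type, the `triage` helper, and the numeric scale are hypothetical; the article does not describe Artemis's data model or score range.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: one extracted function plus the score an LLM
// assigned to it under the BUG prompt. The numeric scale is an assumption.
struct ScoredFunction {
    std::string name;
    double bug_score;
};

// Keep the functions whose score meets the threshold, highest score first.
std::vector<ScoredFunction> triage(std::vector<ScoredFunction> functions,
                                   double threshold) {
    // Drop everything below the threshold.
    functions.erase(
        std::remove_if(functions.begin(), functions.end(),
                       [threshold](const ScoredFunction& f) {
                           return f.bug_score < threshold;
                       }),
        functions.end());
    // Order the survivors so the most suspicious functions come first.
    std::stable_sort(functions.begin(), functions.end(),
                     [](const ScoredFunction& a, const ScoredFunction& b) {
                         return a.bug_score > b.bug_score;
                     });
    return functions;
}
```

Calling `triage(functions, 7.0)` on a scored list would return only the entries scoring 7.0 or higher, which are the ones passed to the fix phase in this sketch.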

2. Optimize: Auto-Generate Patches with LLMs
For high-scoring functions, Artemis uses LLMs to attempt fixes, with this prompt:
“Analyze the code for bugs. If there are bugs, edit the code to fix them. Only modify the code if you are certain it is a bug.”
Artemis generates multiple fix candidates, filtered by the FIX prompt to prioritize high-confidence changes. False positives are ignored; Artemis does not alter code unless it's confident a real issue exists.
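One way to picture the filtering step: score each candidate patch under the FIX prompt, keep the best one only if it clears a confidence bar, and otherwise leave the code unchanged. This is a sketch under assumptions; `FixCandidate`, `select_fix`, and the threshold are hypothetical names, not Artemis's API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate patch with the score assigned under the FIX prompt.
struct FixCandidate {
    std::string patch;
    double fix_score;
};

// Return the index of the highest-scoring candidate that meets min_score,
// or -1 when none qualifies, meaning the original code is left untouched.
int select_fix(const std::vector<FixCandidate>& candidates, double min_score) {
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (candidates[i].fix_score < min_score) continue;  // low confidence
        if (best < 0 || candidates[i].fix_score > candidates[best].fix_score) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Returning "no change" as an explicit outcome mirrors the rule that code is only modified when a real issue is likely.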

3. Validate: Score and Explain the Fix
Finally, Artemis runs comparative validation between the original and modified code using another prompt:
“Compare the reference and variant code specifically for bugs. Score how well the variant fixes the bug, or if it introduces new ones.”
It provides both:
- A score indicating how effective the fix is.
- An explanation for the change—great for PR review or auditing.

Results: Optimizing NVIDIA CUDA Libraries with Artemis
cuCollections: Catching Hidden Bugs in GPU Data Structures
Even in a high-quality library, Artemis caught subtle issues that could fail silently:
- Unused Loop Index
A loop was declared with `i = rank` but used `rank` inside its body, making the logic misleading and wrong under edge cases. Artemis corrected the reference to use `i`.
- Broken Termination Condition
A loop was written as `while (expr)` instead of `while (i < expr)`. Under certain inputs, this caused infinite loops. Artemis updated the condition to prevent runaway execution.
These bugs were undetected by existing tests. Artemis found them via static and semantic analysis.
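The two patterns above can be sketched in plain C++. These functions are illustrative stand-ins, not the cuCollections code (see the pull request for the actual change); `strided_sum` and `count_steps` are hypothetical names, and each shows the corrected form with the buggy variant noted in a comment.

```cpp
#include <cstddef>
#include <vector>

// Pattern 1: the loop variable i starts at rank, so the body must index
// with i. Indexing with rank would read the same element on every pass.
// Assumes stride > 0.
int strided_sum(const std::vector<int>& data, std::size_t rank,
                std::size_t stride) {
    int sum = 0;
    for (std::size_t i = rank; i < data.size(); i += stride) {
        sum += data[i];  // buggy variant: data[rank]
    }
    return sum;
}

// Pattern 2: the condition must compare the counter against the bound.
// A bare `while (limit)` never becomes false for a nonzero limit,
// because limit itself is not modified inside the loop.
std::size_t count_steps(std::size_t limit) {
    std::size_t i = 0;
    while (i < limit) {  // buggy variant: while (limit)
        ++i;
    }
    return i;
}
```

Both bugs compile cleanly and can pass tests that only exercise the common case, such as a single rank or a loop that exits by another route.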
📂 View Pull request: https://github.com/NVIDIA/cuCollections/pull/675
NCCL: Improving Communication and Memory Safety
In NCCL, Artemis uncovered four issues in critical communication paths:
- Dead Return Statement
A `return` was unreachable due to earlier loop exits. Artemis removed the dead code to clean up logic.
- Loop Restart Bug
The loop index reset to `s = 0` but resumed at `s++`, skipping an element. Artemis corrected it to `s = -1` with a clear inline comment.
- Memory Leak
A function allocated memory but didn’t free it on all return paths. Artemis injected proper `free()` calls to eliminate the leak.
- Quote Matching Failure
An XML parser only matched double quotes, failing on single-quoted attributes. Artemis fixed it to handle both quote types consistently.
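Two of the four patterns above, the loop restart and the quote matching, can be sketched as follows. This is not the NCCL source; `remove_zeros` and `read_quoted` are hypothetical functions written only to show the shape of each fix.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Loop restart: remove every zero, rescanning from the start after each
// removal. Resetting to s = 0 would skip index 0, because the loop's own
// s++ runs before the next iteration.
void remove_zeros(std::vector<int>& v) {
    for (int s = 0; s < static_cast<int>(v.size()); s++) {
        if (v[s] == 0) {
            v.erase(v.begin() + s);
            s = -1;  // s++ brings this back to 0, so no element is skipped
        }
    }
}

// Quote matching: read a quoted attribute value starting at text[pos].
// Either quote type is accepted, and the closing quote must match the
// opening one. Returns false if text[pos] is not a quote or the value
// is unterminated.
bool read_quoted(const std::string& text, std::size_t pos,
                 std::string& value) {
    if (pos >= text.size()) return false;
    const char quote = text[pos];
    if (quote != '"' && quote != '\'') return false;
    const std::size_t end = text.find(quote, pos + 1);
    if (end == std::string::npos) return false;
    value = text.substr(pos + 1, end - pos - 1);
    return true;
}
```

With the restart written as `s = 0`, the input `{0, 0, 1}` would come back as `{0, 1}`; the `s = -1` form returns `{1}`.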
📂 View Pull request: https://github.com/NVIDIA/nccl/pull/1635
Why This Matters
✅ Static Tools Aren’t Enough
Artemis finds bugs traditional tools miss—especially in low-level, parallel code.
✅ Reliability at Scale
Preventing subtle indexing or memory errors avoids costly downstream failures in HPC systems.
✅ Production-Ready Fixes
Artemis doesn’t just suggest edits—it applies and validates production-safe changes.
