How Artemis Found Hidden Bugs in NVIDIA GPU Libraries

Multiple Authors

March 10, 2025

What happens when you run Artemis on production-grade CUDA libraries?

You don’t just optimize code—you uncover bugs that test suites miss.

We applied Artemis to cuCollections and NCCL, two core GPU-accelerated libraries from NVIDIA. They are widely used for high-performance data structures and multi-GPU communication, respectively. Artemis flagged real-world bugs, cleaned up edge-case logic, and made critical paths safer and more maintainable.

📄 Read the white paper: Managing Technical Debt with Artemis

What is Artemis?

Artemis is an evolutionary AI platform for code optimization and validation. It helps developers:

Analyze code to find hidden bugs, bottlenecks, and design flaws, fast.
Optimize code with evolutionary algorithms, going beyond what prompt-based LLMs alone can do.
Validate code changes: confirm that they compile and pass unit tests, and benchmark their performance.

How Artemis Fixes Bugs

Artemis uses a three-phase process to analyze real-world code, optimize it, and validate the resulting fixes.

0. Setup: Custom Scoring Prompts

Artemis evaluates code quality using LLM-generated scores driven by prompt instructions. These prompts are applied throughout the optimization process, both to the original codebase and to code suggestions produced by other LLMs.

BUG: for scoring the original code for bugs
FIX: for scoring how well each code suggestion fixes the bug
Code Analysis and Bug Fixing Process: for generating code change recommendations

This setup allows Artemis to systematically reason about large volumes of code and target areas most in need of attention.
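
One way to picture this setup is a small lookup table that maps each prompt role to its instruction text. The sketch below is illustrative only: `PromptRole`, `prompt_for`, and the table layout are hypothetical names chosen for this example, not Artemis's API. The instruction strings are the prompts quoted in the phases that follow, and pairing the FIX role with the comparison prompt is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical roles mirroring the three prompts listed above.
enum class PromptRole { Bug, Fix, Recommend };

// Returns the instruction text for a role (throws std::out_of_range if absent).
// Assumption: the FIX role uses the reference-vs-variant comparison prompt.
inline const std::string& prompt_for(PromptRole role) {
    static const std::map<PromptRole, std::string> prompts = {
        {PromptRole::Bug,
         "Analyze the code for potential bugs. Give the highest score if there "
         "is definitely a bug. Give the lowest score if there is no bug."},
        {PromptRole::Fix,
         "Compare the reference and variant code specifically for bugs. Score "
         "how well the variant fixes the bug, or if it introduces new ones."},
        {PromptRole::Recommend,
         "Analyze the code for bugs. If there are bugs, edit the code to fix "
         "them. Only modify the code if you are certain it is a bug."},
    };
    return prompts.at(role);
}
```

Keeping the prompts in one place like this makes it easy to see which instruction is applied to original code and which to suggested changes.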

1. Analyze: Locate Hidden Bugs in Source Code

Artemis begins by extracting functions (or classes/files) and running automatic review tasks on each one.

We used this prompt to detect bugs:

“Analyze the code for potential bugs. Give the highest score if there is definitely a bug. Give the lowest score if there is no bug.”

Artemis scores every function, allowing you to pinpoint suspect logic paths likely to fail or behave incorrectly, even if no test currently exposes the issue. It removes the need for test-based bug discovery and enables bug triage at scale.
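
The triage step described above can be sketched as a filter and sort over per-function scores. This is a minimal sketch under stated assumptions: `ScoredFunction`, `triage`, and the threshold parameter are hypothetical, the scores are assumed to come from an LLM call that is not shown, and the score scale is whatever the scoring prompt produces.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: one extracted function and the score it received
// under the BUG prompt (assumed: higher means more likely to contain a bug).
struct ScoredFunction {
    std::string name;
    double bug_score;
};

// Keeps functions scoring at or above the threshold, most suspicious first.
std::vector<ScoredFunction> triage(std::vector<ScoredFunction> scored, double threshold) {
    // Drop everything below the threshold.
    scored.erase(std::remove_if(scored.begin(), scored.end(),
                                [threshold](const ScoredFunction& f) {
                                    return f.bug_score < threshold;
                                }),
                 scored.end());
    // Order the remainder by descending score; ties keep their input order.
    std::stable_sort(scored.begin(), scored.end(),
                     [](const ScoredFunction& a, const ScoredFunction& b) {
                         return a.bug_score > b.bug_score;
                     });
    return scored;
}
```

For example, with scores of 2.0, 9.0, and 7.5 and a threshold of 7.0, `triage` returns the 9.0 and 7.5 entries in that order.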

2. Optimize: Auto-Generate Patches with LLMs

For high-scoring functions, Artemis uses LLMs to attempt fixes, with this prompt:

“Analyze the code for bugs. If there are bugs, edit the code to fix them. Only modify the code if you are certain it is a bug.”

Artemis generates multiple fix candidates, filtered by the FIX prompt to prioritize high-confidence changes. False positives are ignored; Artemis does not alter code unless it's confident a real issue exists.
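
The filtering behavior described above, where code is left alone unless a candidate is convincing, can be sketched with an optional return value. The names `FixCandidate`, `select_fix`, and `min_score` are hypothetical, and the assumption that a higher FIX score means a better fix is ours; how Artemis actually ranks candidates is not detailed here.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical record: one generated patch and the score it received
// under the FIX prompt (assumed: higher means a more convincing fix).
struct FixCandidate {
    std::string patched_code;
    double fix_score;
};

// Returns the highest-scoring candidate that reaches min_score, or
// std::nullopt to signal that the original code should stay untouched.
std::optional<FixCandidate> select_fix(const std::vector<FixCandidate>& candidates,
                                       double min_score) {
    std::optional<FixCandidate> best;
    for (const auto& candidate : candidates) {
        if (candidate.fix_score < min_score) {
            continue;  // not confident enough to consider
        }
        if (!best || candidate.fix_score > best->fix_score) {
            best = candidate;
        }
    }
    return best;
}
```

Returning an empty optional when nothing clears the bar mirrors the rule that false positives are ignored rather than patched.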

[Figure: results filtered by score, with recommendations generated using the custom prompt]

3. Validate: Score and Explain the Fix

Finally, Artemis runs comparative validation between the original and modified code using another prompt:

“Compare the reference and variant code specifically for bugs. Score how well the variant fixes the bug, or if it introduces new ones.”

It provides both:

A score indicating how effective the fix is.
An explanation for the change, useful for PR review or auditing.

Results: Optimizing NVIDIA CUDA Libraries with Artemis

cuCollections: Catching Hidden Bugs in GPU Data Structures

Even in a high-quality library, Artemis caught subtle issues that could fail silently:

Unused Loop Index
A loop was declared with i = rank but used rank inside its body, making the logic misleading and wrong under edge cases. Artemis corrected the reference to use i.

Broken Termination Condition
A loop was written as while (expr) instead of while (i < expr). Under certain inputs, this caused infinite loops. Artemis updated the condition to prevent runaway execution.

These bugs were undetected by existing tests. Artemis found them via static and semantic analysis.
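
Both patterns can be reproduced in a few lines of host-side C++. The functions below are hypothetical illustrations of the bug shapes described above, not the cuCollections code; the pull request shows the actual changes.

```cpp
#include <cstddef>
#include <vector>

// Bug shape 1: the loop advances i, but the body indexes with rank,
// so every iteration reads the same element. Requires stride > 0.
int strided_sum_buggy(const std::vector<int>& data, std::size_t rank, std::size_t stride) {
    int sum = 0;
    for (std::size_t i = rank; i < data.size(); i += stride) {
        sum += data[rank];  // bug: ignores the loop index
    }
    return sum;
}

// Fix: index with the loop variable. Requires stride > 0.
int strided_sum_fixed(const std::vector<int>& data, std::size_t rank, std::size_t stride) {
    int sum = 0;
    for (std::size_t i = rank; i < data.size(); i += stride) {
        sum += data[i];
    }
    return sum;
}

// Bug shape 2, shown only as a comment because it would not terminate:
//     while (limit) { ++i; }   // true forever for any non-zero limit
// Fix: compare the index against the bound.
std::size_t count_steps(std::size_t limit) {
    std::size_t i = 0;
    while (i < limit) {
        ++i;
    }
    return i;
}
```

With `data = {1, 2, 3, 4, 5, 6}`, `rank = 1`, and `stride = 2`, the buggy version returns 6 (the element at index 1 added three times) while the fixed version returns 12. A test that only checks uniform data would not tell the two apart, which is one way such a bug can slip past a test suite.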

📂 View Pull request: https://github.com/NVIDIA/cuCollections/pull/675

NCCL: Improving Communication and Memory Safety

In NCCL, Artemis uncovered four issues in critical communication paths:

Dead Return Statement
A return was unreachable due to earlier loop exits. Artemis removed the dead code to clean up logic.

Loop Restart Bug
A loop restart reset the index to s = 0, but the loop's s++ then ran before the next iteration, so the scan resumed at index 1 and skipped the first element. Artemis corrected it to s = -1 with a clear inline comment.

Memory Leak
A function allocated memory but didn’t free it on all return paths. Artemis injected proper free() calls to eliminate the leak.

Quote Matching Failure
An XML parser only matched double quotes, failing on single-quoted attributes. Artemis fixed it to handle both quote types consistently.
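
Two of these fixes are easy to show in isolation. The sketch below is a hypothetical reconstruction of the loop-restart and quote-matching patterns, not NCCL source; `erase_all_restart` and `read_quoted` are names invented for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Restart-the-scan loop: after erasing an element, scanning must begin
// again at index 0. Resetting to s = 0 would be wrong, because the loop's
// s++ runs before the next iteration and index 0 would be skipped.
void erase_all_restart(std::vector<int>& items, int unwanted) {
    for (int s = 0; s < static_cast<int>(items.size()); s++) {
        if (items[s] == unwanted) {
            items.erase(items.begin() + s);
            s = -1;  // becomes 0 after the loop's s++
        }
    }
}

// Reads an attribute value that starts at text[pos] with either ' or ".
// The closing quote must match the opening one. Returns false if there is
// no opening quote or no matching closing quote; out is left unchanged.
bool read_quoted(const std::string& text, std::size_t pos, std::string& out) {
    if (pos >= text.size()) {
        return false;
    }
    const char quote = text[pos];
    if (quote != '"' && quote != '\'') {
        return false;
    }
    const std::size_t end = text.find(quote, pos + 1);
    if (end == std::string::npos) {
        return false;
    }
    out = text.substr(pos + 1, end - pos - 1);
    return true;
}
```

Calling `erase_all_restart` on `{7, 7, 1}` with `unwanted = 7` leaves `{1}`; with the `s = 0` variant, the second 7 would shift into index 0 and be skipped. `read_quoted` accepts both `'eth0'` and `"eth0"` but rejects a value whose opening and closing quotes differ.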