Nvidia Let AI Agents Train Robots Overnight — And They Beat the Human Researchers
Nvidia's ENPIRE framework gave AI coding agents a robotics lab, a token budget, and one job: teach robots to insert GPUs and cut zip ties. They hit 99% success. Sometimes faster than humans.
Imagine locking up your robotics lab at 6 PM. You've got robotic arms, a pile of components, and a task that's been stumping your team for weeks. But instead of coming in early tomorrow to grind through more manual training sessions, you let an AI handle it. And the AI isn't playing the robot. It's playing the researcher.
You give a team of AI coding agents access to the lab, a generous token budget, and one instruction: figure out how to teach the robot to insert a GPU into a motherboard socket. You go home. You sleep. You come back in the morning, pour coffee, and read the report.
The robot can now do it. 99% success rate.
That's not a thought experiment. Nvidia's GEAR lab reported exactly this last week, and the resulting paper might be the most important robotics paper of the year.
What Nvidia Actually Built
The system is called ENPIRE, and it comes from Nvidia's GEAR (Generalist Embodied Agent Research) lab, in collaboration with Carnegie Mellon and UC Berkeley. The name is an acronym for its four modules: Environment, Policy Improvement, Rollout, and Evolution.
But forget the acronym for a second. Here's what it actually does.
ENPIRE is a harness — software that wraps around AI coding agents and gives them everything they need to run a robotics experiment from start to finish. It handles the boring infrastructure: resetting the scene after each trial, verifying whether the robot succeeded, logging the results, and feeding that data back to the agents so they can iterate.
The agents don't just run the experiments. They design them. They propose algorithmic approaches, write the training code, launch experiments on physical robots, analyze the failure logs, read research papers for inspiration, and then improve their code for the next round. The whole loop runs autonomously.
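To make that loop concrete, here is a minimal sketch in Python. It is not ENPIRE's code: the robot is simulated with a random number generator, and every name in it (`SimulatedRobot`, `run_round`, `agent_refine`, the `skill` number standing in for policy quality) is a hypothetical placeholder. It only shows the shape of the cycle: reset, execute, verify, log, refine.

```python
import random
from dataclasses import dataclass


@dataclass
class TrialLog:
    """One logged attempt: which policy version ran and whether it passed."""
    policy_version: int
    success: bool


@dataclass
class SimulatedRobot:
    """Hypothetical stand-in for a real arm; success odds depend on policy skill."""
    rng: random.Random

    def reset_scene(self) -> None:
        # A real harness would physically reset the workspace here.
        pass

    def run_policy(self, skill: float) -> bool:
        # Execute one attempt and verify it: True means the task succeeded.
        return self.rng.random() < skill


def run_round(robot: SimulatedRobot, skill: float, version: int, trials: int) -> list:
    """Reset, execute, verify, and log a batch of trials for one policy version."""
    logs = []
    for _ in range(trials):
        robot.reset_scene()
        logs.append(TrialLog(version, robot.run_policy(skill)))
    return logs


def success_rate(logs: list) -> float:
    return sum(log.success for log in logs) / len(logs)


def agent_refine(skill: float, logs: list) -> float:
    """Placeholder for the agent's 'read the logs, rewrite the code' step."""
    failure_fraction = sum(not log.success for log in logs) / len(logs)
    # Assumption for illustration only: failures give the agent something to fix.
    return min(1.0, skill + 0.05 * failure_fraction + 0.02)


def overnight_loop(rounds: int = 10, trials: int = 20, seed: int = 0) -> list:
    """Run the full cycle and return the success rate measured in each round."""
    robot = SimulatedRobot(random.Random(seed))
    skill, history = 0.3, []
    for version in range(rounds):
        logs = run_round(robot, skill, version, trials)
        history.append(success_rate(logs))
        skill = agent_refine(skill, logs)
    return history
```

Calling `overnight_loop()` returns one measured success rate per round. In the real system, the refine step is where the coding agents write new training code, and the verify step is a check on a physical scene rather than a coin flip.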
Jim Fan, director of AI at Nvidia, put it bluntly on LinkedIn:
"A part of our NVIDIA GEAR lab now self-improves tirelessly overnight. We just read the reports in the morning."
He joked that the goal is for everyone to take a holiday and "Jensen wouldn't even notice." Classic Jim Fan.
The Tasks: Push-T, Pins, Zip Ties, and GPUs
ENPIRE was tested on four real-world manipulation tasks. These aren't toy problems — they require genuine dexterity:
- Push-T: Move a T-shaped block into a target position on a table. It sounds simple, but it's a widely used benchmark for manipulation policies and a good stress test.
- Pin insertion: Organize pins into a pin box. Requires sub-millimeter precision.
- Zip-tie tying and cutting: The robot has to thread a zip tie, tighten it, and then cut the excess with a tool.
- GPU insertion: Place a graphics card into a motherboard socket, then unplug it to reset for the next trial.
The results? 99% success rate across the board.
The most jaw-dropping result was the pin insertion task. The AI coding agents reached nearly 100% success faster than a frontier human-in-the-loop method developed by many of the same human researchers. Let that sink in. The AI agents beat the human researchers at the humans' own job: training robots.
Three Agents Walked Into a Lab
Nvidia tested three different AI coding agents to see who was best at the job:
- OpenAI Codex powered by GPT-5.5
- Anthropic Claude Code powered by Opus 4.7
- Moonshot AI's Kimi Code powered by Kimi K2.6
Each agent team independently developed different algorithmic strategies — behavior cloning, offline and online reinforcement learning, heuristic approaches. They ran real-world experiments, kept what worked, and discarded what didn't. Think of it as evolution, but the mutations are code commits and the selection pressure is whether a robot arm can stick a pin in a hole.
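The "keep what worked, discard what didn't" part can be sketched as a simple selection loop. This is an illustration under stated assumptions, not the selection rule from the paper: `evaluate` fakes a batch of real-world trials, and the `true_rate` values are made-up inputs rather than reported results.

```python
import random


def evaluate(idea: dict, rng: random.Random, trials: int = 50) -> float:
    """Hypothetical evaluation: fraction of simulated trials that succeed."""
    return sum(rng.random() < idea["true_rate"] for _ in range(trials)) / trials


def select_ideas(ideas: list, seed: int = 0):
    """Keep an idea only if it beats the best success rate measured so far."""
    rng = random.Random(seed)
    best_name, best_score = None, -1.0
    kept, discarded = [], []
    for idea in ideas:
        score = evaluate(idea, rng)
        if score > best_score:
            best_name, best_score = idea["name"], score
            kept.append(idea["name"])
        else:
            discarded.append(idea["name"])
    return best_name, best_score, kept, discarded


# Strategy names come from the article; the rates are invented for the demo.
candidates = [
    {"name": "heuristic", "true_rate": 0.2},
    {"name": "behavior cloning", "true_rate": 0.6},
    {"name": "offline RL", "true_rate": 0.5},
    {"name": "online RL", "true_rate": 0.8},
]
best, score, kept, discarded = select_ideas(candidates)
print(f"best: {best} at {score:.0%}, kept {kept}, discarded {discarded}")
```

Because each measurement is noisy, a weak idea can occasionally win a round by luck. That is one reason a real system needs enough trials per idea before committing to it.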
The idea tree from the experiments is genuinely beautiful in a nerdy way. They traced 80+ ideas across multiple agent branches, plotted on a wall-clock timeline alongside the success rate curve. You can see exactly which ideas moved the needle: "Online RL mix Demo" added 3.8 percentage points. "BC regularization" added 10.8 points. Each green dot on the tree is an idea that worked.
More Agents = Faster Results (Mostly)
Here's where it gets interesting from a practical standpoint. Nvidia tested teams of different sizes:
- 8 agents: Hit 99% on Push-T in 2 hours
- 4 agents: Hit 99% in 3 hours
- Single agent: Hit 99% in nearly 5 hours
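More agents working in parallel means faster convergence, though the speedup is far from linear: eight agents were roughly 2.5 times faster than one, not eight times. That tracks with what we see in software too. More parallel exploration of the solution space finds good answers quicker, with diminishing returns.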
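A quick back-of-the-envelope calculation shows how far from linear the scaling is. The sketch below uses only the three Push-T timings quoted above and rounds "nearly 5 hours" to 5.0, so treat the single-agent numbers as approximate. The `scaling_table` helper is just for illustration.

```python
# Wall-clock hours to reach 99% on Push-T, as quoted in the article.
# "Nearly 5 hours" is rounded to 5.0 here, which is an approximation.
hours_to_99 = {1: 5.0, 4: 3.0, 8: 2.0}


def scaling_table(hours: dict) -> list:
    """Return (agents, speedup over one agent, speedup per agent) rows."""
    baseline = hours[1]
    rows = []
    for agents, h in sorted(hours.items()):
        speedup = baseline / h
        efficiency = speedup / agents  # 1.0 would mean perfectly linear scaling
        rows.append((agents, speedup, efficiency))
    return rows


for agents, speedup, efficiency in scaling_table(hours_to_99):
    print(f"{agents} agent(s): {speedup:.2f}x speedup, {efficiency:.0%} parallel efficiency")
```

On these numbers, four agents deliver about a 1.67x speedup and eight deliver 2.5x, which works out to roughly 42% and 31% of what perfect scaling would give.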
But there's a catch, and it's an important one.
The Robots Got Bored
The human researchers found some significant limitations when they let the agents loose.
The robots sat idle a lot. While the coding agents were busy reading logs, writing code, debugging, or waiting for their language model backbone to respond, the physical robots just... sat there. Unused. If you're thinking about hardware utilization, that's wasteful.
Bigger teams talked more, worked less. Larger teams of coding agents spent more time summarizing each other's ideas and less time actually using the robots. There's a coordination overhead — the same one that bloats human meetings. Turns out AI agents suffer from the same "too many cooks" problem.
Compute wasn't always maxed out. The agents sometimes failed to make full use of available resources when launching parallel training sessions. They'd leave compute on the table.
Token costs scale with speed. Reaching the target success rate faster came at the cost of higher token consumption. Which, in a week when Anthropic was simultaneously trying to jack up token-based billing for its Claude Agent SDK, feels like a pointed reminder. Autonomous research is expensive when you're burning tokens around the clock.
These limitations aren't dealbreakers. They're the kind of first-version friction you expect from a breakthrough. But they're real, and the Nvidia team deserves credit for publishing them honestly instead of hiding them behind a glossy demo video.
Why This Matters Beyond Nvidia
Here's the bigger picture. We've spent the last two years watching AI coding agents get good at writing software. Cursor, Claude Code, Copilot — they're all eating the coding workflow. But ENPIRE shows that the same paradigm — give an AI agent a feedback loop and let it iterate — works in the physical world too.
The missing piece was always the bridge between digital and physical. A coding agent can compile and run code in milliseconds. A robot takes seconds to move. The feedback loop is slower, messier, and more expensive. ENPIRE's contribution is making that loop automatable — reset, execute, verify, refine — without a human in the middle.
And Nvidia isn't keeping this to itself. Fan said the team plans to open-source everything. The vision: anyone can host their own "self-running robot lab at home." Now, your home probably doesn't have a fleet of robotic arms. But university labs, startups, and maker spaces do. If ENPIRE works as advertised, it could democratize robotics research in a way we haven't seen before.
Meanwhile, Nvidia is pushing hard on physical AI across the board. They partnered with Unitree to build a "Reference Humanoid Robot." Jensen Huang just met with Hyundai (which owns Boston Dynamics) to talk about mass-manufacturing AI robots. The company that sells the picks and shovels for the AI gold rush is now building the miners too.
The Honest Take
Is this the moment robots start training themselves and we're all out of a job? No. Let's be real.
The tasks are impressive but narrow. Inserting a GPU into a socket is a long way from assembling a car or folding laundry. The 99% success rate uses "pass@8" — meaning the agent gets up to 8 in-context retries per subtask, each conditioned on previous failures. That's not the robot nailing it on the first try every time. It's the robot recovering from mistakes within a rolling window. Still impressive, but worth understanding the nuance.
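To get a feel for how much retries can inflate a headline number, here is a small sketch using the textbook pass@k relationship. The big assumption is that attempts are independent with the same success chance each time. ENPIRE's retries are conditioned on earlier failures, so that assumption does not hold for the real system. Read this as intuition, not as an estimate of the robot's actual first-try rate.

```python
def pass_at_k(p_single: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** k


def single_attempt_rate(pass_rate: float, k: int) -> float:
    """Invert pass_at_k: the per-attempt rate that would yield this pass@k."""
    return 1.0 - (1.0 - pass_rate) ** (1.0 / k)


# Independence is an assumption made for illustration; see the note above.
p = single_attempt_rate(0.99, 8)
print(f"per-attempt rate implied by 99% pass@8: {p:.1%}")
print(f"check: pass@8 at that rate = {pass_at_k(p, 8):.1%}")
```

Under that simplification, a per-attempt success rate of only about 44% is enough to reach 99% within eight attempts. The real per-attempt figure could be quite different, but it shows why pass@8 and first-try success should not be read as the same thing.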
The idle robot problem is also a sign that we're early. The system works, but it's not optimized. A lot of the "autonomy" is currently the agents spending time thinking (and burning tokens) while expensive hardware collects dust.
But here's what I'd bet on: the gap between "impressive demo" and "production system" is closing fast. ENPIRE is a framework, not a one-off trick. Every limitation the researchers identified looks like an engineering problem rather than a fundamental barrier. Better scheduling should cut the robots' idle time. Smarter agent coordination should trim the communication overhead. The trajectory is clear.
Six months ago, the idea of an AI agent autonomously running a robotics lab overnight was science fiction. Last week, Nvidia did it and published the paper. Six months from now, someone will have improved on it.
The real lesson isn't about robots. It's about agents. The pattern that works — give an AI a goal, a feedback loop, and tools, then let it iterate — is domain-agnostic. It works for code. It works for robots. It'll work for whatever you're trying to automate next.
The question isn't whether AI agents will be doing your job. It's whether you'll be the one directing them or competing with them.