RL & SimulationSelected work2026 – present

Reinforcement-Learned Locomotion & Evaluation in Isaac Sim

PPO locomotion policies and benchmark-style evaluation in Isaac Sim / Isaac Lab, with sim-to-real transfer as the point, not an afterthought.

Role — RL training infrastructure, benchmark/evaluation setup, sim-to-real analysis

Isaac SimIsaac LabPPOPythonPyTorch

Pipeline diagram

Problem: Legged locomotion policies are easy to overfit to a simulator. The question is always the same: what in the reward, randomization, and evaluation setup actually predicts real-world behavior?
System type: Simulation training + evaluation pipeline
Why it matters: Sim-to-real is the tax every embodied-AI system pays. Understanding it at the training-and-evaluation layer is what makes the humanoid deployment predictable instead of lucky.
Team context: Part of the Booster K1 research program under Dr. Yiyan Li; training infrastructure built as the deployment substrate.

The Booster K1 model in an Isaac Sim training environment — K1 in the Isaac Sim training environment.

Overview

The training side of the Booster K1 research program: building reinforcement-learning infrastructure in NVIDIA Isaac Lab / Isaac Sim for the K1's velocity-tracking locomotion policy — the 50 Hz tier that executes what the vision-language planner decides. The work covers PPO training, benchmark and evaluation tooling, and the practical study of which simulated behaviors survive contact with hardware.

System architecture

Standard modern RL loop: parallelized simulation environments feed a PPO learner; checkpoints flow into an evaluation harness that scores policies on defined tasks; results inform reward and randomization iteration — and the surviving policies deploy under the VLA planner on the real robot.

Isaac Sim / Isaac Lab environments

PPO training loop

Policy checkpoints — 50 Hz control

Benchmark & evaluation harness

Deploy to K1 under VLA planner

Isaac Sim / Isaac Lab environments

PPO training loop

Policy checkpoints — 50 Hz control

Benchmark & evaluation harness

Deploy to K1 under VLA planner

PPO training and evaluation pipeline in Isaac Sim and Isaac Lab — Training → checkpoint → evaluation loop.

Contributions

Building PPO training infrastructure in Isaac Lab / Isaac Sim for the K1's velocity-tracking locomotion policy (50 Hz control).
Running benchmark/evaluation passes over trained policies.
[Specify as the work matures: environments, reward shaping decisions, randomization strategy, and which evaluations were designed vs. reused.]to fill

Evidence & evaluation

Evidence

Pipeline diagram

attached

Training → checkpoint → evaluation loop in the gallery.

Training curves

pending

Attach reward/episode-length curves with config details.

Evaluation protocol

pending

Document tasks, seeds, episode counts, and success criteria.

Sim-to-real comparison

pending

Side-by-side of simulated vs. real behavior for the same policy class.

Metrics

Control rate

50 Hz

Benchmark scores

Not yet measured

Report only with the exact eval config attached.

Training scale

Not yet measured

Env count, steps, wall-clock — measured, not recalled.

Limitations

[State clearly which parts of the pipeline were built vs. configured — Isaac Lab ships strong defaults, and readers know it.]to fill

Lessons & tradeoffs

Evaluation design is where RL work becomes science; without a fixed protocol, every policy looks fine in its own demo.
[Add a concrete reward-design or randomization lesson from your runs.]to fill

Artifacts

Evaluation writeupnot yet published
Training / eval codenot yet published
Pipeline diagram