Jangara Bliss
RL & Simulation · Selected work · 2026 – present

Reinforcement-Learned Locomotion & Evaluation in Isaac Sim

PPO locomotion policies and benchmark-style evaluation in Isaac Sim / Isaac Lab, with sim-to-real transfer as the point, not an afterthought.

Role — RL training infrastructure, benchmark/evaluation setup, sim-to-real analysis

Isaac Sim · Isaac Lab · PPO · Python · PyTorch
Problem
Legged locomotion policies are easy to overfit to a simulator. The question is always the same: what in the reward, randomization, and evaluation setup actually predicts real-world behavior?
System type
Simulation training + evaluation pipeline
Why it matters
Sim-to-real is the tax every embodied-AI system pays. Understanding it at the training-and-evaluation layer is what makes the humanoid deployment predictable instead of lucky.
Team context
Part of the Booster K1 research program under Dr. Yiyan Li; training infrastructure built as the deployment substrate.
K1 in the Isaac Sim training environment.

01

Overview

The training side of the Booster K1 research program: building reinforcement-learning infrastructure in NVIDIA Isaac Lab / Isaac Sim for the K1's velocity-tracking locomotion policy — the 50 Hz tier that executes what the vision-language planner decides. The work covers PPO training, benchmark and evaluation tooling, and the practical study of which simulated behaviors survive contact with hardware.
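
For readers new to velocity tracking, here is a minimal sketch of the kind of reward term such a policy is commonly trained against. It assumes an exponential kernel on the squared error between commanded and measured planar base velocity. The function name, the `sigma` default, and the command layout are illustrative assumptions, not the K1 configuration; only the 50 Hz rate comes from the text above.

```python
import math

CONTROL_HZ = 50                 # policy rate stated in the overview
CONTROL_DT = 1.0 / CONTROL_HZ   # 0.02 s between policy actions


def velocity_tracking_reward(cmd_xy, vel_xy, sigma=0.25):
    """Exponential-kernel reward on planar velocity error (illustrative).

    cmd_xy, vel_xy: (vx, vy) commanded and measured base velocity in m/s.
    sigma: hypothetical kernel width; smaller values punish error harder.
    """
    err_sq = (cmd_xy[0] - vel_xy[0]) ** 2 + (cmd_xy[1] - vel_xy[1]) ** 2
    # Reward is 1.0 at perfect tracking and decays smoothly toward 0.
    return math.exp(-err_sq / sigma)
```

A term like this is bounded in (0, 1], which keeps it easy to weigh against other reward terms; the right kernel width depends on the command range and is something to tune, not assume.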

System architecture

The pipeline is a standard modern RL loop. Parallelized simulation environments feed a PPO learner; checkpoints flow into an evaluation harness that scores policies on defined tasks; the results drive iteration on rewards and randomization. Policies that survive evaluation deploy under the VLA planner on the real robot.

  1. Isaac Sim / Isaac Lab environments
  2. PPO training loop
  3. Policy checkpoints — 50 Hz control
  4. Benchmark & evaluation harness
  5. Deploy to K1 under VLA planner
Training → checkpoint → evaluation loop.
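
The learner stage of the loop above is PPO. As a reminder of what that stage optimizes, here is a minimal sketch of the clipped surrogate policy objective in plain Python. A real learner would compute this over batched tensors in PyTorch and add value and entropy terms. The function name and the `clip_eps=0.2` default are assumptions for illustration, not values taken from this project.

```python
import math


def ppo_clipped_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean negative clipped surrogate over a batch (policy term only).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Keep the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negated because optimizers minimize while PPO maximizes the surrogate.
    return -total / len(advantages)
```

The clip removes the incentive to push the probability ratio outside `[1 - clip_eps, 1 + clip_eps]` when doing so would improve the objective, which is what lets the learner reuse each batch for several gradient steps.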

02

Contributions

  • Building PPO training infrastructure in Isaac Lab / Isaac Sim for the K1's velocity-tracking locomotion policy (50 Hz control).
  • Running benchmark/evaluation passes over trained policies.
  • [To fill as the work matures: environments, reward-shaping decisions, randomization strategy, and which evaluations were designed vs. reused.]

03

Evidence & evaluation

Evidence

Pipeline diagram

attached

Training → checkpoint → evaluation loop in the gallery.

Training curves

pending

Attach reward/episode-length curves with config details.

Evaluation protocol

pending

Document tasks, seeds, episode counts, and success criteria.

Sim-to-real comparison

pending

Side-by-side of simulated vs. real behavior for the same policy class.
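
One way to pin down the pending evaluation protocol is a frozen record of tasks, seeds, episode counts, and success criteria, plus a short hash that travels with every reported score. The sketch below is an assumption about how that could look; `EvalProtocol`, its field names, and the example values are hypothetical placeholders, not the project's protocol.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvalProtocol:
    task: str
    seeds: tuple
    episodes_per_seed: int
    success_criterion: str

    def total_episodes(self):
        return len(self.seeds) * self.episodes_per_seed

    def fingerprint(self):
        """Short stable hash of the protocol, to attach to every score."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Placeholder values only, chosen for illustration.
protocol = EvalProtocol(
    task="flat_velocity_tracking",
    seeds=(0, 1, 2),
    episodes_per_seed=10,
    success_criterion="no fall for the full episode",
)
```

Because the record is frozen and the hash covers every field, two scores with different fingerprints were measured under different protocols and should not be compared directly.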

Metrics

Control rate

50 Hz

Benchmark scores

Not yet measured

Report only with the exact eval config attached.

Training scale

Not yet measured

Env count, steps, wall-clock — measured, not recalled.
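
When the training-scale numbers are measured, the derived figures follow from simple arithmetic. This is a minimal sketch, assuming each environment step corresponds to one 50 Hz policy step; the function and parameter names are hypothetical, and no values here are measurements.

```python
def training_scale(num_envs, steps_per_env_per_iter, iterations,
                   wall_clock_s, control_hz=50):
    """Derive reportable scale figures from measured run parameters."""
    total_steps = num_envs * steps_per_env_per_iter * iterations
    return {
        "total_env_steps": total_steps,
        # Simulated experience, assuming one env step per control tick.
        "simulated_hours": total_steps / control_hz / 3600.0,
        # Throughput over the measured wall-clock duration.
        "steps_per_second": total_steps / wall_clock_s,
    }
```

Reporting simulated hours alongside wall-clock time makes runs comparable across hardware and environment counts.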

04

Limitations

  • [To fill: state clearly which parts of the pipeline were built vs. configured. Isaac Lab ships strong defaults, and readers know it.]

05

Lessons & tradeoffs

  • Evaluation design is where RL work becomes science; without a fixed protocol, every policy looks fine in its own demo.
  • [To fill: add a concrete reward-design or randomization lesson from your runs.]

06

Artifacts

  • Evaluation writeup (not yet published)
  • Training / eval code (not yet published)
  • Pipeline diagram