Building NavBot-PPO for Safer Mobile Robots
I set out to answer a simple question for my thesis: what does it take for a mobile robot to plan safe trajectories in unstructured labs without overfitting to simulation? That question became NavBot-PPO, a re-imagined PPO baseline built on PyTorch, ROS, and Gazebo.
Key ingredients:
- Curriculum scheduling. I scripted a difficulty ramp that injects denser obstacle layouts only after the policy crosses a safety threshold (first sketch below). It kept training stable even when on-policy rollout buffers were noisy.
- Risk-aware critics. Additional critics estimate collision likelihood so the actor can down-weight risky turns without collapsing into the freezing-robot behavior of stopping outright (second sketch below).
- Sensing augmentation. LiDAR dropout, Gaussian range noise, and random resets improved sim-to-real transfer when testing in the lab (third sketch below).
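The curriculum gate is easiest to show in code. This is a minimal sketch of the idea rather than the exact thesis implementation: the class name `CurriculumScheduler`, the stage densities, and the 90% collision-free threshold over a 100-episode window are illustrative placeholders.

```python
from collections import deque

class CurriculumScheduler:
    """Gate obstacle density on a rolling collision-free rate."""

    def __init__(self, densities=(4, 8, 16, 24), safety_threshold=0.9, window=100):
        self.densities = densities            # obstacles per episode, per stage
        self.safety_threshold = safety_threshold
        self.outcomes = deque(maxlen=window)  # 1 = collision-free episode
        self.stage = 0

    def record_episode(self, collided: bool) -> None:
        self.outcomes.append(0 if collided else 1)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        if (window_full
                and sum(self.outcomes) / len(self.outcomes) >= self.safety_threshold
                and self.stage < len(self.densities) - 1):
            self.stage += 1
            self.outcomes.clear()             # re-earn the threshold at the new stage

    @property
    def obstacle_count(self) -> int:
        return self.densities[self.stage]
```

In the training loop, `record_episode()` runs once per episode and the Gazebo world spawner reads `obstacle_count` before each reset; clearing the window on promotion forces the policy to re-earn the safety threshold at every new density.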
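For the risk-aware critics, the pattern is an auxiliary head next to the usual value head whose output shrinks the advantage on transitions flagged as risky. The sketch below shows that pattern under stated assumptions, not the thesis network: the torso architecture, the `beta` coefficient, and the 0.9 clamp are placeholders I chose for illustration.

```python
import torch
import torch.nn as nn

class RiskAwareActorCritic(nn.Module):
    """Shared torso with an actor head, a value head, and a risk head."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.actor = nn.Linear(hidden, act_dim)  # action means (continuous control)
        self.value = nn.Linear(hidden, 1)        # standard PPO value estimate
        self.risk = nn.Linear(hidden, 1)         # logit of collision probability

    def forward(self, obs: torch.Tensor):
        h = self.torso(obs)
        return self.actor(h), self.value(h), torch.sigmoid(self.risk(h))

def risk_weighted_advantage(advantage, risk, beta=2.0):
    # Shrink the advantage on transitions the risk critic flags so the actor
    # reinforces risky turns less, but never zero it out entirely: the clamp
    # keeps some gradient alive, which is what avoids the freezing behavior.
    scale = 1.0 - torch.clamp(beta * risk, max=0.9)
    return advantage * scale.detach()
```

The risk head itself can be trained with `nn.BCELoss` against per-step collision labels (e.g., from Gazebo contact sensors); during the PPO update the weighted advantage stands in for the raw one.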
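The sensing augmentations are simple per-scan transforms applied before an observation reaches the policy. A minimal NumPy sketch follows, with the dropout probability, noise scale, and max range chosen arbitrarily for illustration; random resets happen at the environment level and are omitted here.

```python
import numpy as np

def augment_scan(ranges, rng, dropout_p=0.05, noise_std=0.02, max_range=10.0):
    """Domain-randomize one LiDAR scan: additive range noise plus beam dropout."""
    ranges = np.asarray(ranges, dtype=np.float32).copy()
    # Gaussian noise on every beam, std in meters.
    ranges += rng.normal(0.0, noise_std, size=ranges.shape).astype(np.float32)
    # Push random beams to max range, mimicking the missed returns on dark or
    # specular surfaces that the simulator renders too cleanly.
    dropped = rng.random(ranges.shape) < dropout_p
    ranges[dropped] = max_range
    return np.clip(ranges, 0.0, max_range)

# Example: augment a synthetic 360-beam scan reading 3 m everywhere.
rng = np.random.default_rng(seed=0)
noisy = augment_scan(np.full(360, 3.0), rng)
```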
The result: 38% fewer collisions and roughly 25% faster convergence than the standard PPO baseline trained on the same GPU. Next, I am exploring how these techniques compose with diffusion-based planners.