Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation
Abstract
This study trains a deep reinforcement learning agent that lets a wheeled mobile robot navigate autonomously, without collisions, toward a target in complex, obstacle-rich environments. Using only raw LiDAR observations and the goal pose, a deep neural network maps perception directly to continuous velocity commands. The work compares DDPG and PPO and introduces an enhanced PPO network architecture together with a tailored reward function, validated in Gazebo environments both with and without obstacles.
This paper sits at the core of my research direction: learning-based control for safe autonomous navigation. Rather than relying on a hand-built map and classical planner, the agent learns an end-to-end policy that reads sparse LiDAR returns plus the relative goal and outputs continuous linear and angular velocity.
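To make the observation side of that pipeline concrete, here is a minimal sketch of how such an input vector could be assembled. It is not taken from the paper: the function name `build_observation`, the beam count of 10, and the 3.5 m maximum range are illustrative assumptions, and the paper's actual preprocessing may differ.

```python
import math
import numpy as np

def build_observation(ranges, robot_pose, goal_xy, n_beams=10, max_range=3.5):
    """Hypothetical observation builder: sparse LiDAR beams + relative goal.

    ranges     : full LiDAR scan in metres (may contain inf/NaN for no echo)
    robot_pose : (x, y, yaw) of the robot in the world frame
    goal_xy    : (x, y) of the target in the world frame
    n_beams, max_range are assumed values, not the paper's settings.
    """
    ranges = np.asarray(ranges, dtype=np.float64)
    # Treat missing echoes as "nothing within sensor range".
    ranges = np.nan_to_num(ranges, nan=max_range, posinf=max_range, neginf=0.0)
    ranges = np.clip(ranges, 0.0, max_range)

    # Keep n_beams evenly spaced beams and normalise them to [0, 1].
    idx = np.linspace(0, len(ranges), n_beams, endpoint=False).astype(int)
    beams = ranges[idx] / max_range

    # Goal expressed relative to the robot: distance and heading error.
    x, y, yaw = robot_pose
    dx, dy = goal_xy[0] - x, goal_xy[1] - y
    distance = math.hypot(dx, dy)
    heading = math.atan2(dy, dx) - yaw
    heading = math.atan2(math.sin(heading), math.cos(heading))  # wrap to [-pi, pi]

    return np.concatenate([beams, [distance, heading]])
```

Calling `build_observation(scan, (x, y, yaw), (gx, gy))` with a 360-beam scan yields a 12-element vector under these assumptions; the policy network would then map that vector to a linear and an angular velocity.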
Approach
- Observation → action, end-to-end. A deep network consumes LiDAR beams and the target pose and directly emits continuous control — no SLAM, no occupancy grid.
- DDPG vs PPO. Both continuous-control algorithms are evaluated; PPO proves more stable for this task.
- Enhanced PPO. A modified network architecture and a reward function tailored to collision avoidance and goal-reaching improve navigation performance.
- Validation. Tested in Gazebo environments both with and without obstacles.
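Two items from the list above lend themselves to a short sketch. The first function is the standard PPO clipped surrogate objective, which is the usual explanation for PPO's stability; it is the textbook form, not the enhanced variant from the paper. The second is a generic goal-reaching and collision-avoidance reward of the kind described. Its structure, name, and every constant (`goal_radius`, `collision_dist`, the gains and bonuses) are assumptions for illustration, not the paper's tailored reward.

```python
import numpy as np

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate (to be maximised), averaged over a batch."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    adv = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Taking the minimum removes the incentive to move the policy far
    # from the one that collected the data.
    return float(np.mean(np.minimum(unclipped, clipped)))

def navigation_reward(prev_dist, dist, min_scan,
                      goal_radius=0.2, collision_dist=0.15,
                      progress_gain=10.0, goal_bonus=100.0,
                      collision_penalty=-100.0, step_cost=-0.01):
    """Hypothetical shaped reward; returns (reward, episode_done).

    prev_dist, dist : distance to goal before and after the step (m)
    min_scan        : smallest LiDAR range in the current scan (m)
    All thresholds and gains are assumed values.
    """
    if min_scan < collision_dist:   # too close to an obstacle: treat as a crash
        return collision_penalty, True
    if dist < goal_radius:          # inside the goal region: success
        return goal_bonus, True
    # Otherwise reward progress toward the goal and charge a small time cost.
    return progress_gain * (prev_dist - dist) + step_cost, False
```

The progress term gives a dense learning signal between the sparse terminal events, which is a common design choice for mapless navigation; how the paper balances these terms is not stated here.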
The companion implementation is open-sourced as navbot_ppo, a mapless motion planner for a TurtleBot in a Dockerized Gazebo setup.