Sensori Robotics
Overview
Senior capstone project developing PPO-based autonomous navigation in Nvidia IsaacSim for a mobile robotics platform.
Senior capstone project at The University of Texas at Dallas (Fall 2025). The project developed an autonomous navigation system for a mobile robotics platform, using reinforcement learning for path planning and Nvidia IsaacSim/IsaacLab as the simulation environment.
Role
Developer on the capstone team, focused on the reinforcement-learning training loops and environment creation — porting real-world environments into Nvidia IsaacSim and defining the IsaacLab training setup (observation and action spaces, reward shaping) used to train the PPO navigation policy.
Problem
Training a navigation policy on physical hardware is slow and expensive — every failed trajectory risks damaging the robot or its environment, and resetting between episodes takes real time. Dynamic obstacles compound this because the agent needs thousands of interactions to learn reactive avoidance behavior, which is impractical to collect in the real world. The project needed a way to iterate on the navigation policy rapidly without physical deployment.
Approach
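The navigation policy was trained using Proximal Policy Optimization (PPO), an on-policy reinforcement learning algorithm well suited to continuous control tasks. PPO was chosen over alternatives like SAC or DQN because its clipped surrogate objective limits how far a single update can move the policy, which in practice tends to keep training stable. PPO does not carry a formal monotonic improvement guarantee (that result belongs to TRPO, which PPO approximates), but its empirical stability mattered given the limited capstone timeline: an unstable training process would have been difficult to debug within a single semester.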
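The stability argument rests on PPO's clipped surrogate objective. The following is a minimal sketch in plain Python, not the project's training code (which is under NDA); the function name and the `clip_eps=0.2` default are illustrative assumptions.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    new_logp / old_logp: per-sample log-probabilities of the taken actions
    under the current and the data-collecting policy.
    advantages: per-sample advantage estimates.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a probability ratio of 1.5 is credited only as if it were 1.2, so a single batch cannot pull the policy arbitrarily far from the one that collected the data.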
Real-world environments were ported into Nvidia IsaacSim, which provided a physics-based simulation for the robot to train in. IsaacLab sat on top of IsaacSim as the framework for defining the RL training loop: observation space, action space, reward shaping, and episode management. This allowed the team to run many simulated environments in parallel on the GPU and collect thousands of training episodes, compressing what would have been weeks of physical testing into hours of simulation time.
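To make the pieces of such a training setup concrete, here is a framework-agnostic sketch of an observation function and a shaped reward for goal-directed navigation. It deliberately uses no IsaacLab API. Every name, weight, and threshold below (`RewardWeights`, `observation`, `shaped_reward`, the 0.25 m goal tolerance) is a hypothetical placeholder, not the project's actual configuration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardWeights:
    # All values are illustrative assumptions, not tuned project settings.
    progress: float = 1.0     # reward per metre of progress toward the goal
    time: float = -0.01       # small per-step cost to discourage stalling
    collision: float = -10.0  # penalty when the robot hits an obstacle
    goal: float = 20.0        # bonus when the goal is reached

def observation(robot_xy, robot_yaw, goal_xy):
    """Goal in the robot frame: (distance, heading error in [-pi, pi])."""
    dx = goal_xy[0] - robot_xy[0]
    dy = goal_xy[1] - robot_xy[1]
    err = math.atan2(dy, dx) - robot_yaw
    err = math.atan2(math.sin(err), math.cos(err))  # wrap the angle
    return math.hypot(dx, dy), err

def shaped_reward(prev_dist, dist, collided, goal_tol=0.25, w=RewardWeights()):
    """Return (reward, done) for one step of one environment."""
    reward = w.progress * (prev_dist - dist) + w.time
    if collided:
        return reward + w.collision, True   # episode ends on collision
    if dist < goal_tol:
        return reward + w.goal, True        # episode ends on success
    return reward, False
```

The dense progress term gives the agent a learning signal on every step, while the sparse collision and goal terms define what ends an episode. In a GPU-parallel setup the same logic would be evaluated as batched tensor operations across all environments at once.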
At the hardware-software boundary, control systems principles were applied to translate the learned policy’s continuous action outputs into motor commands the physical robot platform could execute, a necessary step toward closing the sim-to-real gap and validating the policy on hardware.
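As an illustration of that translation step, the sketch below maps a normalized two-dimensional action to wheel speeds. It assumes a differential-drive base, which the source does not specify. The function name and every parameter (velocity limits, wheel radius, track width, wheel speed limit) are hypothetical.

```python
def action_to_wheel_speeds(action, max_lin=1.0, max_ang=2.0,
                           wheel_radius=0.05, track_width=0.30,
                           max_wheel_speed=25.0):
    """Map a policy action in [-1, 1]^2 to (left, right) wheel speeds in rad/s.

    Assumes a differential-drive robot; all default values are placeholders.
    """
    # Clamp first: a policy's raw outputs are not guaranteed to stay in range.
    a_lin = max(-1.0, min(1.0, action[0]))
    a_ang = max(-1.0, min(1.0, action[1]))
    v = a_lin * max_lin   # forward velocity, m/s
    w = a_ang * max_ang   # yaw rate, rad/s

    # Differential-drive inverse kinematics.
    left = (v - w * track_width / 2.0) / wheel_radius
    right = (v + w * track_width / 2.0) / wheel_radius

    # If either wheel saturates, scale both so the commanded curvature
    # is preserved rather than distorted by per-wheel clipping.
    peak = max(abs(left), abs(right))
    if peak > max_wheel_speed:
        scale = max_wheel_speed / peak
        left *= scale
        right *= scale
    return left, right
```

Scaling both wheels together, instead of clipping each independently, keeps the robot on the arc the policy asked for, just at a lower speed.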
Availability
Source code and detailed results are covered under NDA and are not publicly available.
Stack
Timeline
Aug 2025 — Dec 2025