Policy Training
⚠️ This documentation is under construction and incomplete. Please sign up here for K-Scale updates and check back later for our progress.

Minimal PPO Implementation

GitHub

A minimal implementation of Proximal Policy Optimization (PPO) in JAX, in just three files. Users can import their own custom MuJoCo environments, define their own rewards, and train their own agents with ease.
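As a rough illustration of what a custom reward might look like in a JAX pipeline, here is a hypothetical shaped reward for a walking task. The function name, arguments, and coefficients are assumptions for illustration, not the repository's actual API:

```python
import jax.numpy as jnp

# Hypothetical example: the actual environment/reward interface in the repo may differ.
def walking_reward(forward_velocity, torso_height, ctrl, target_height=1.2):
    """Shaped reward: move forward, stay upright, avoid large actuator commands."""
    velocity_reward = forward_velocity                    # encourage forward motion
    height_penalty = jnp.square(torso_height - target_height)  # penalize falling
    ctrl_cost = 0.1 * jnp.sum(jnp.square(ctrl))           # penalize control effort
    return velocity_reward - height_penalty - ctrl_cost
```

Because the function is pure and written with `jax.numpy`, it can be `jit`-compiled and vectorized over a batch of environments with `jax.vmap`.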

With this pipeline, we can train agents to perform basic tasks while fully understanding the underlying training loop, rewards, and physics. Compared to Isaac Gym, this pipeline is far easier to understand, more lightweight, and therefore more hackable in research settings.
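The core of any PPO training loop is the clipped surrogate objective. A minimal JAX sketch of that loss is shown below; this follows the standard PPO formulation rather than the exact code in the repository:

```python
import jax.numpy as jnp

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    log_probs:     log pi(a|s) under the current policy
    old_log_probs: log pi(a|s) under the policy that collected the data
    advantages:    advantage estimates (e.g. from GAE)
    """
    # Importance ratio between the new and old policies.
    ratio = jnp.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; negate the mean to get a loss.
    return -jnp.mean(jnp.minimum(unclipped, clipped))
```

In practice this loss is differentiated with `jax.grad` and stepped with an optimizer; the clipping keeps each policy update close to the data-collecting policy.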

Here's a video with some basic walking/standing with a humanoid robot: