Reward Function
We use a velocity-conditioned reward to train the agent. The total reward at each environment step is

$$r = \sum_i c_i \, r_i$$

where each $c_i$ is a scaling constant and $r_i$ is an individual reward term.
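As a concrete illustration, here is a minimal Python sketch of this weighted sum. The function name and the example term value are hypothetical; only the velocity scale of 20.0 comes from the table further down.

```python
def total_reward(reward_terms: dict[str, float], scales: dict[str, float]) -> float:
    """Compute r = sum_i c_i * r_i over the individual reward terms."""
    return sum(scales[name] * value for name, value in reward_terms.items())


# Hypothetical example: a single velocity term worth 0.35 m of forward
# progress this step, scaled by the constant c_velocity = 20.0.
scales = {"velocity": 20.0}
reward_terms = {"velocity": 0.35}
r = total_reward(reward_terms, scales)  # 7.0
```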
Let's view our list of waypoints as a closed, circular path. Each time the environment steps, it retrieves the location of the vehicle in 3D space as $p_t$ and projects it onto the closest point on the circular path as $\hat{p}_t$. With this projection we can also retrieve how far (both as a distance and as a percentage of the total path length) we have traversed along the path from the first waypoint.
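A sketch of this projection step is shown below, assuming the waypoints are stored as an (N, 3) NumPy array with distinct consecutive points; the function name and return values are illustrative, not the environment's actual API.

```python
import numpy as np


def project_onto_path(waypoints: np.ndarray, position: np.ndarray):
    """Project a 3D position onto the closed polyline through `waypoints`.

    Returns the projected point, the arc length traversed from the first
    waypoint, and that arc length as a fraction of the total path length.
    """
    n = len(waypoints)
    best = (np.inf, None, 0.0)  # (squared distance, projection, arc length at projection)
    arc = 0.0                   # arc length accumulated up to the current segment start
    for i in range(n):
        a, b = waypoints[i], waypoints[(i + 1) % n]  # wrap around: circular path
        ab = b - a
        seg_len = np.linalg.norm(ab)
        # Clamp the projection parameter so the projection stays on the segment.
        t = np.clip(np.dot(position - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        proj = a + t * ab
        d2 = np.dot(position - proj, position - proj)
        if d2 < best[0]:
            best = (d2, proj, arc + t * seg_len)
        arc += seg_len
    total_len = arc
    _, proj, s = best
    return proj, s, s / total_len
```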
Then $r_{\text{velocity}}$ is simply the forward distance (along the circular path constructed by the waypoints) that we have traversed since the last environment step; a code sketch follows the table below.
| Reward term | Scaling constant $c_i$ |
| --- | --- |
| velocity | 20.0 |
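And a minimal sketch of the velocity term itself, built from two consecutive path projections. The wrap-around handling (for the step in which the vehicle passes the first waypoint again) and the function name are assumptions on top of what is stated above.

```python
def velocity_reward(prev_arc: float, arc: float, path_length: float) -> float:
    """Forward distance traversed along the circular path since the last step."""
    forward = (arc - prev_arc) % path_length
    # A jump of more than half a lap in one step is interpreted as the vehicle
    # moving backwards past the first waypoint, not as a huge forward jump.
    if forward > path_length / 2:
        forward -= path_length
    return forward


# In the total reward this term is scaled by c_velocity = 20.0 (see the table above).
```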