ā”Reward Function
We use a velocity-conditioned reward to train the agent. The reward term in each environment step is
Where each is a scaling constant and is an individual reward term.
Velocity Reward Term
Let's view our list of way points as a circular path. Each time our environment steps, it retrieves the location of the vehicle in the 3D space as and projects it onto the closest point on the circular path as . With this projection we can also retrieve information about how far (in terms of distances and percentage) we have traversed along this path from the first waypoint.
Then simply resembles the forward distance (along the circular path constructed by waypoints) that we have traversed since the last environment step
Scaling Constants
Reward Term
Scaling Constant
velocity
20.0
Last updated