Reward Function
Here we describe the reward function provided to the agent, assuming that waypoints and location tracking are available. These are typically available only inside a simulator.
In the `roar_py_rl` package there is a `RoarRLSimEnv` base environment that implements this reward function. The `RoarRLSimEnv` requires several parameters (a location sensor, a list of waypoints, etc.) to be passed in.
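As a rough sketch of how these pieces fit together (the class and parameter names below are hypothetical stand-ins, not the actual `RoarRLSimEnv` interface; consult the `roar_py_rl` source for the real signature):

```python
# Hypothetical sketch: the parameter and attribute names are illustrative,
# not the actual RoarRLSimEnv constructor signature.
import gymnasium as gym


class WaypointSimEnv(gym.Env):
    """Minimal stand-in for a RoarRLSimEnv-style environment."""

    def __init__(self, location_sensor, waypoints):
        self.location_sensor = location_sensor  # reports the vehicle's 3D position
        self.waypoints = waypoints              # list of waypoints forming the track
        self._last_arc = 0.0                    # distance traversed at the last step

    def step(self, action):
        # Apply `action` in the simulator, then compute the reward from the
        # terms described below.
        raise NotImplementedError
```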
We use a velocity-conditioned reward to train the agent. The reward in each environment step is

$$r = \sum_i c_i r_i$$

where each $c_i$ is a scaling constant and each $r_i$ is an individual reward term.
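For concreteness, here is a minimal sketch of that weighted sum; the term names and constants are illustrative assumptions, not the values actually used in the package:

```python
# Illustrative only: reward term names and scaling constants are assumptions.
reward_terms = {
    "progress": 1.2,    # r_i values computed during this environment step
    "velocity": 0.8,
}
scaling_constants = {
    "progress": 1.0,    # c_i values; e.g. the velocity term is scaled by 20.0
    "velocity": 20.0,
}

# r = sum_i c_i * r_i
reward = sum(scaling_constants[name] * value
             for name, value in reward_terms.items())
```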
Let's view our list of waypoints as a circular path. Each time our environment steps, it retrieves the location of the vehicle in 3D space as $p$ and projects it onto the closest point on the circular path as $p'$. With this projection we can also retrieve how far (both as a distance and as a percentage) we have traversed along this path from the first waypoint.
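Here is a minimal sketch of such a projection, assuming straight segments between distinct waypoints and numpy arrays for positions; this is not the actual `RoarPyWaypointsTracer` implementation:

```python
import numpy as np


def project_onto_path(waypoints: np.ndarray, p: np.ndarray):
    """Project `p` onto the closest point of the closed polyline `waypoints`.

    Returns (projected_point, arc_length): the closest point on the path and
    the forward distance from the first waypoint to that point. The path is
    circular, so the last waypoint connects back to the first.
    """
    best_proj, best_arc, best_d2 = None, 0.0, np.inf
    arc = 0.0  # distance along the path up to the current segment
    n = len(waypoints)
    for i in range(n):
        a, b = waypoints[i], waypoints[(i + 1) % n]
        ab = b - a
        seg_len = np.linalg.norm(ab)
        # Clamped parameter of the orthogonal projection of p onto segment [a, b]
        t = np.clip(np.dot(p - a, ab) / (seg_len ** 2), 0.0, 1.0)
        proj = a + t * ab
        d2 = float(np.dot(p - proj, p - proj))
        if d2 < best_d2:
            best_proj, best_arc, best_d2 = proj, arc + t * seg_len, d2
        arc += seg_len
    return best_proj, best_arc
```

The percentage traversed is then the returned arc length divided by the total path length.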
Then $r_{\text{progress}}$ is simply the forward distance (along the circular path constructed from the waypoints) that we have traversed since the last environment step.
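Continuing the sketch above, this per-step term could be computed as follows; the wrap-around logic assumes the vehicle covers less than half the track between consecutive steps:

```python
def path_length(waypoints: np.ndarray) -> float:
    """Total length of the closed path through `waypoints`."""
    n = len(waypoints)
    return float(sum(np.linalg.norm(waypoints[(i + 1) % n] - waypoints[i])
                     for i in range(n)))


def progress_since_last_step(waypoints, prev_arc, p):
    """Forward distance traversed along the circular path since the last step."""
    total = path_length(waypoints)
    _, arc = project_onto_path(waypoints, p)  # helper sketched above
    delta = (arc - prev_arc) % total  # non-negative wrap-around difference
    if delta > total / 2:             # a huge forward jump really means backward motion
        delta -= total
    return delta, arc                 # reward term and the new arc-length bookmark
```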
The details of how we implemented this traversal are not important here, but if you want to learn how we perform this projection quickly, feel free to read the source code of `RoarPyWaypointsTracer`.
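One common way to make such nearest-point queries fast, which may or may not match the actual `RoarPyWaypointsTracer` implementation, is to densely resample the path once and index it with a KD-tree:

```python
import numpy as np
from scipy.spatial import cKDTree


def build_tracer(waypoints: np.ndarray, samples_per_segment: int = 20) -> cKDTree:
    """Densely resample the closed path once and index it with a KD-tree."""
    n = len(waypoints)
    dense = np.concatenate([
        np.linspace(waypoints[i], waypoints[(i + 1) % n],
                    samples_per_segment, endpoint=False)
        for i in range(n)
    ])
    return cKDTree(dense)

# Each query is then O(log N) instead of a linear scan over every segment:
# _, idx = build_tracer(waypoints).query(p)
```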
For example, the scaling constant for the velocity term is 20.0.