Reward Function
Here we describe the reward function provided to the agent, assuming that waypoints and location tracking are available. This is typically only the case inside a simulator.
In the roar_py_rl package there is a RoarRLSimEnv base environment that implements this reward function. RoarRLSimEnv requires several parameters (a location sensor, a list of waypoints, etc.) to be passed in.
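As a rough illustration, wiring up the environment might look like the sketch below. The argument names are assumptions made for this example, not the exact roar_py_rl constructor signature; consult the package source for the real interface.

```python
# Hypothetical sketch of constructing a RoarRLSimEnv-style environment.
# Argument names are illustrative assumptions; see the roar_py_rl source
# for the actual constructor signature.
from roar_py_rl import RoarRLSimEnv  # assumed import path

def make_rl_env(vehicle, location_sensor, waypoints):
    return RoarRLSimEnv(
        vehicle=vehicle,                   # the simulated vehicle being controlled
        location_sensor=location_sensor,   # reports the vehicle's 3D position each step
        maneuverable_waypoints=waypoints,  # ordered waypoints forming the circular path
    )
```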
We use a velocity-conditioned reward to train the agent. The reward at each environment step is
$$r_t = \sum_i c_i r_t^i$$
where each $c_i$ is a scaling constant and $r_t^i$ is an individual reward term.
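A minimal sketch of this weighted sum, assuming the per-step reward terms have already been computed and collected in a dict keyed by name (the names here are illustrative, not the package's internals):

```python
def total_reward(reward_terms: dict[str, float], scaling_constants: dict[str, float]) -> float:
    """r_t = sum_i c_i * r_t^i, summed over the reward terms active this step."""
    return sum(scaling_constants[name] * value for name, value in reward_terms.items())

# e.g. with only the velocity term active:
# total_reward({"velocity": delta_d_hat}, {"velocity": 20.0})
```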
Velocity Reward Term
$$r_t^v = \delta \hat{d}_t$$
Let's view our list of waypoints as a circular path. Each time our environment steps, it retrieves the location of the vehicle in 3D space as $\vec{p}_v$ and projects it onto the closest point on the circular path as $\hat{\vec{p}}_v$. From this projection we can also retrieve how far (as a distance $\hat{d}_v$ and as a percentage) we have traversed along this path from the first waypoint.
Then $\delta \hat{d}_t = \hat{d}_t - \hat{d}_{t-1}$ is simply the forward distance (along the circular path constructed by the waypoints) that we have traversed since the last environment step.
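The sketch below shows one way such a projection and forward-distance delta could be computed; it is an illustration under our own assumptions (the array names and the wrap-around handling are ours), not the RoarPyWaypointsTracer implementation.

```python
# Illustrative sketch: project the vehicle position onto a closed waypoint
# polyline and measure progress along it. Not the actual RoarPyWaypointsTracer code.
import numpy as np

def project_progress(path_xyz: np.ndarray, p_v: np.ndarray) -> float:
    """Return the arc-length distance (\hat{d}) of the closest point on the
    closed polyline `path_xyz` to the vehicle position `p_v`, measured from
    the first waypoint."""
    # Append the first waypoint so the last segment wraps back to the start.
    pts = np.vstack([path_xyz, path_xyz[:1]])
    seg_start, seg_end = pts[:-1], pts[1:]
    seg_vec = seg_end - seg_start
    seg_len = np.linalg.norm(seg_vec, axis=1)
    # Parameter t in [0, 1] of the closest point on each segment.
    t = np.einsum("ij,ij->i", p_v - seg_start, seg_vec) / np.maximum(seg_len**2, 1e-9)
    t = np.clip(t, 0.0, 1.0)
    closest = seg_start + t[:, None] * seg_vec
    dists = np.linalg.norm(closest - p_v, axis=1)
    i = int(np.argmin(dists))                     # segment containing the projection
    cumulative = np.concatenate([[0.0], np.cumsum(seg_len)])
    return cumulative[i] + t[i] * seg_len[i]

def forward_delta(d_hat_t: float, d_hat_prev: float, total_len: float) -> float:
    """delta d_hat_t = d_hat_t - d_hat_{t-1}, unwrapped so that crossing the
    start of the circular path does not produce a spurious large jump."""
    delta = d_hat_t - d_hat_prev
    if delta < -0.5 * total_len:    # wrapped past the start while moving forward
        delta += total_len
    elif delta > 0.5 * total_len:   # wrapped the other way (moving backward)
        delta -= total_len
    return delta
```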
The details of how we implement this traversal are not important here, but if you want to learn how this projection is done efficiently, feel free to read the source code of RoarPyWaypointsTracer.
Scaling Constants

| Reward term | Scaling constant |
| --- | --- |
| velocity $r_t^v$ | 20.0 |
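As a quick sanity check of the scaling (the numbers here are made up for illustration): if the vehicle advances 0.5 m along the waypoint path in one step, then $\delta \hat{d}_t = 0.5$ and the scaled velocity reward contributes $c_v \, r_t^v = 20.0 \times 0.5 = 10.0$ to $r_t$.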