⚔ Reward Function

Here we describe the reward function provided to the agent, assuming that waypoints and location tracking are available. These are typically available only inside a simulator.

The roar_py_rl package defines a RoarRLSimEnv base environment that implements this reward function. RoarRLSimEnv requires several parameters (a location sensor, a list of waypoints, etc.) to be passed in.
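
As a rough illustration, constructing the environment might look like the sketch below. The parameter names here are assumptions made for illustration only, not the actual RoarRLSimEnv signature; consult the roar_py_rl source for the real constructor.

```python
# Hypothetical sketch only: parameter names are illustrative, not the actual
# RoarRLSimEnv signature. See the roar_py_rl source for the real constructor.
from roar_py_rl import RoarRLSimEnv

env = RoarRLSimEnv(
    location_sensor=vehicle_location_sensor,  # provides the vehicle's 3D position each step
    waypoints=track_waypoints,                # ordered waypoints forming the circular path
)
```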

We use a velocity-conditioned reward to train the agent. The total reward at each environment step is

$$r_t = \sum_i c_i r^i_t$$

where each $c_i$ is a scaling constant and $r^i_t$ is an individual reward term.
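
For concreteness, here is a minimal sketch of how this weighted sum could be computed. The helper function and the example values are purely illustrative assumptions, not the roar_py_rl implementation.

```python
# Illustrative sketch: combine individual reward terms r_t^i into the total
# step reward r_t = sum_i c_i * r_t^i. Names and values are hypothetical.
def total_reward(reward_terms: dict, scaling_constants: dict) -> float:
    return sum(scaling_constants[name] * value for name, value in reward_terms.items())

# Only the velocity term is described in this document; it is scaled by 20.0
# (see the scaling constants table below).
r_t = total_reward({"velocity": 0.35}, {"velocity": 20.0})
```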

Velocity Reward Term

$$r^v_t = \delta \hat{d}_t$$

Let's view our list of waypoints as a circular path. Each time the environment steps, it retrieves the location of the vehicle in 3D space as $\vec{p}_v$ and projects it onto the closest point on the circular path, $\hat{\vec{p}}_v$. From this projection we can also retrieve how far (in terms of distance $\hat{d}_t$ and percentage) we have traversed along this path from the first waypoint.

Then $\delta \hat{d}_t = \hat{d}_t - \hat{d}_{t-1}$ is simply the forward distance (along the circular path constructed from the waypoints) that the vehicle has traversed since the last environment step.

The details of how we implemented this traversal are not important here, but if you want to learn more about how we perform this projection efficiently, feel free to read the source code of RoarPyWaypointsTracer.
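
Below is a simple, unoptimized sketch of what this projection and forward-distance computation look like conceptually, assuming the waypoints are given as an (N, 3) NumPy array. It is not the actual RoarPyWaypointsTracer code, and the example track data is made up for illustration.

```python
import numpy as np

def traversed_distance(waypoints: np.ndarray, p_v: np.ndarray) -> float:
    """Project p_v onto the closed polyline formed by `waypoints` (shape (N, 3))
    and return the arc length from the first waypoint to that projection.
    A naive O(N) sketch, not the optimized RoarPyWaypointsTracer."""
    best_dist_sq = np.inf
    best_arc = 0.0
    arc = 0.0
    n = len(waypoints)
    for i in range(n):
        a, b = waypoints[i], waypoints[(i + 1) % n]  # wrap around: the path is circular
        seg = b - a
        seg_len = np.linalg.norm(seg)
        # Projection parameter of p_v onto segment [a, b], clamped to the segment
        t = np.clip(np.dot(p_v - a, seg) / (seg_len ** 2 + 1e-12), 0.0, 1.0)
        proj = a + t * seg
        dist_sq = float(np.sum((p_v - proj) ** 2))
        if dist_sq < best_dist_sq:
            best_dist_sq = dist_sq
            best_arc = arc + t * seg_len
        arc += seg_len
    return best_arc

# Example usage with a toy square track (hypothetical data):
track_waypoints = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]], dtype=float)
d_prev = traversed_distance(track_waypoints, np.array([3.0, 0.5, 0.0]))
d_curr = traversed_distance(track_waypoints, np.array([4.0, 0.4, 0.0]))
r_v = d_curr - d_prev  # velocity reward term r_t^v = d_hat_t - d_hat_{t-1}
```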

Scaling Constants

| Reward Term | Scaling Constant |
| --- | --- |
| velocity $r^v_t$ | 20.0 |
