Reward Function
We use a velocity-conditioned reward to train the agent. The total reward at each environment step is

$$r = \sum_i c_i \, r_i$$

where each $c_i$ is a scaling constant and $r_i$ is an individual reward term.
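As a concrete illustration, here is a minimal Python sketch of this weighted sum. The function name and the example term value are hypothetical; only the velocity scale of 20.0 comes from the table further down.

```python
def total_reward(reward_terms: dict[str, float], scales: dict[str, float]) -> float:
    """Compute r = sum_i c_i * r_i over the individual reward terms."""
    return sum(scales[name] * value for name, value in reward_terms.items())


# Hypothetical example: a single velocity term worth 0.35 m of forward
# progress this step, scaled by the constant c_velocity = 20.0.
scales = {"velocity": 20.0}
reward_terms = {"velocity": 0.35}
r = total_reward(reward_terms, scales)  # 7.0
```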
Let's view our list of waypoints as a closed, circular path. Each time the environment steps, it retrieves the location of the vehicle in 3D space as $p_t$ and projects it onto the closest point on the circular path as $\hat{p}_t$. With this projection we can also retrieve how far (both as a distance and as a percentage of the total path length) we have traversed along the path from the first waypoint.
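A sketch of this projection step is shown below, assuming the waypoints are stored as an (N, 3) NumPy array with distinct consecutive points; the function name and return values are illustrative, not the environment's actual API.

```python
import numpy as np


def project_onto_path(waypoints: np.ndarray, position: np.ndarray):
    """Project a 3D position onto the closed polyline through `waypoints`.

    Returns the projected point, the arc length traversed from the first
    waypoint, and that arc length as a fraction of the total path length.
    """
    n = len(waypoints)
    best = (np.inf, None, 0.0)  # (squared distance, projection, arc length at projection)
    arc = 0.0                   # arc length accumulated up to the current segment start
    for i in range(n):
        a, b = waypoints[i], waypoints[(i + 1) % n]  # wrap around: circular path
        ab = b - a
        seg_len = np.linalg.norm(ab)
        # Clamp the projection parameter so the projection stays on the segment.
        t = np.clip(np.dot(position - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        proj = a + t * ab
        d2 = np.dot(position - proj, position - proj)
        if d2 < best[0]:
            best = (d2, proj, arc + t * seg_len)
        arc += seg_len
    total_len = arc
    _, proj, s = best
    return proj, s, s / total_len
```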
Then $r_{\text{velocity}}$ is simply the forward distance (along the circular path constructed by the waypoints) that we have traversed since the last environment step; a code sketch follows the table below.
| Reward term | Scaling constant $c_i$ |
| --- | --- |
| velocity | 20.0 |
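And a minimal sketch of the velocity term itself, built from two consecutive path projections. The wrap-around handling (for the step in which the vehicle passes the first waypoint again) and the function name are assumptions on top of what is stated above.

```python
def velocity_reward(prev_arc: float, arc: float, path_length: float) -> float:
    """Forward distance traversed along the circular path since the last step."""
    forward = (arc - prev_arc) % path_length
    # A jump of more than half a lap in one step is interpreted as the vehicle
    # moving backwards past the first waypoint, not as a huge forward jump.
    if forward > path_length / 2:
        forward -= path_length
    return forward


# In the total reward this term is scaled by c_velocity = 20.0 (see the table above).
```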