CorePolicy
Abstract base class for all implemented policies.
Do not use this abstract base class directly; use one of the concrete policies instead.
To implement your own policy, you have to implement the following methods:
decay
use_network
Methods:
.reset
.reset()
Reset the policy.
.decay
.decay()
Decay the epsilon / sigma value of the policy.
.use_network
.use_network()
Indicate whether the neural network should be used.
Returns
use (bool): Whether to use the neural network.
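The interface described above can be sketched as follows. This is a minimal illustration, not the library's code: `CorePolicySketch` is a hypothetical stand-in for CorePolicy (the real import path is not given here), and `FixedExplorationPolicy`, `exploration_rate`, and `seed` are names invented for the example.

```python
import random
from abc import ABC, abstractmethod


class CorePolicySketch(ABC):
    """Hypothetical stand-in for CorePolicy; not the library's class."""

    def reset(self):
        """Restore the initial state of the policy (no-op by default)."""

    @abstractmethod
    def decay(self):
        """Decay the exploration value of the policy."""

    @abstractmethod
    def use_network(self):
        """Return True if the network should pick the action."""


class FixedExplorationPolicy(CorePolicySketch):
    """Explores with a constant probability, so there is nothing to decay."""

    def __init__(self, exploration_rate=0.1, seed=None):
        if not 0.0 <= exploration_rate <= 1.0:
            raise ValueError("exploration_rate must be in [0, 1]")
        self.exploration_rate = exploration_rate
        self._rng = random.Random(seed)

    def decay(self):
        # The exploration rate is constant, so no update is needed.
        pass

    def use_network(self):
        # Use the network unless this step is an exploration step.
        return self._rng.random() >= self.exploration_rate
```

Because `decay` and `use_network` are marked abstract in the stand-in, instantiating the base class directly raises a `TypeError`, which mirrors the advice not to use the abstract base class itself.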
GreedyQPolicy
Methods:
.reset
.reset()
Reset the policy. Takes no arguments.
.decay
.decay()
Implements the decay step required by CorePolicy. Takes no arguments.
.use_network
.use_network()
Indicate whether the neural network should be used. Takes no arguments.
Returns
use (bool): Whether to use the neural network.
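As an illustration of how the boolean returned by use_network might be consumed, the sketch below picks an action from a list of Q-values. The function `select_action` and its parameters are hypothetical and not part of the library; the example only assumes that a greedy choice means taking the action with the largest Q-value and that the alternative is a uniformly random action.

```python
import random


def select_action(q_values, use_network, rng=random):
    """Pick an action index from a non-empty list of Q-values."""
    if use_network:
        # Greedy choice: index of the largest Q-value (first one on ties).
        return max(range(len(q_values)), key=lambda i: q_values[i])
    # Exploration: uniformly random action index.
    return rng.randrange(len(q_values))
```

Under these assumptions, a purely greedy policy corresponds to always passing `use_network=True`, so the random branch is never taken.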
EpsilonGreedyPolicy
EpsilonGreedyPolicy(
max_value = 1.0, min_value = 0.0, decay_steps = 1
)
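Epsilon-greedy policy.
Arguments
max_value (float): Maximum epsilon value. Defaults to 1.0.
min_value (float): Minimum epsilon value. Defaults to 0.0.
decay_steps (int): Number of decay steps. Defaults to 1.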
Methods:
.reset
.reset()
Reset the policy. Takes no arguments.
.decay
.decay()
Decay the epsilon value of the policy. Takes no arguments.
.use_network
.use_network()
Indicate whether the neural network should be used. Takes no arguments.
Returns
use (bool): Whether to use the neural network.
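The following sketch shows one way the three constructor arguments and the three methods could fit together. It is not the library's implementation: the class name `LinearEpsilonSketch`, the `seed` parameter, the linear schedule, and the choice that reset restores epsilon to max_value are all assumptions made for illustration.

```python
import random


class LinearEpsilonSketch:
    """Hypothetical epsilon-greedy schedule; not the library's class."""

    def __init__(self, max_value=1.0, min_value=0.0, decay_steps=1, seed=None):
        if decay_steps < 1:
            raise ValueError("decay_steps must be at least 1")
        self.max_value = max_value
        self.min_value = min_value
        # Assumption: epsilon shrinks by a fixed amount on every decay() call.
        self.step_size = (max_value - min_value) / decay_steps
        self._rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Assumption: resetting restores epsilon to its maximum.
        self.epsilon = self.max_value

    def decay(self):
        # Lower epsilon by one step, but never below min_value.
        self.epsilon = max(self.min_value, self.epsilon - self.step_size)

    def use_network(self):
        # Explore with probability epsilon, otherwise use the network.
        return self._rng.random() >= self.epsilon
```

With this schedule, epsilon reaches min_value after decay_steps calls to `decay` and stays there. An exponential schedule would be an equally plausible reading of the same arguments.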