CorePolicy



Abstract base class for all implemented policies.

Do not use this abstract base class directly; use one of the concrete policy implementations instead.

To implement your own policy, you have to implement the following methods:

  • decay
  • use_network
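As a rough illustration of the two required methods, the sketch below subclasses a stand-in base class. The `CorePolicy` defined here is a hypothetical minimal replacement so the example runs on its own; the real base class should be imported from the library instead. `EveryNthRandomPolicy` and its `n` parameter are invented for this example only.

```python
from abc import ABC, abstractmethod


class CorePolicy(ABC):
    """Hypothetical stand-in for the library's abstract base class."""

    def reset(self):
        """Reset the policy (no-op by default in this sketch)."""

    @abstractmethod
    def decay(self):
        """Decay the exploration value of the policy."""

    @abstractmethod
    def use_network(self):
        """Return True if the network should be used."""


class EveryNthRandomPolicy(CorePolicy):
    """Illustrative policy: skips the network on every n-th call."""

    def __init__(self, n=4):
        self.n = n
        self.calls = 0

    def reset(self):
        # Start counting calls from zero again.
        self.calls = 0

    def decay(self):
        # Nothing to decay: this policy has no epsilon / sigma value.
        pass

    def use_network(self):
        self.calls += 1
        return self.calls % self.n != 0


policy = EveryNthRandomPolicy(n=4)
decisions = [policy.use_network() for _ in range(4)]
```

In this sketch the first three calls return `True` and the fourth returns `False`, after which the pattern repeats.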

Methods:

.reset

.reset()

Resets the policy.

.decay

.decay()

Decays the epsilon / sigma value of the policy.

.use_network

.use_network()

Decides whether the neural network should be used.

Returns

use (bool): Whether to use the neural network.
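One plausible way a caller could act on the returned boolean is sketched below. The `select_action` helper, the `q_values` list, and the `AlwaysNetwork` class are hypothetical names for this example; how the library's agents actually consume `use_network()` is not stated here, and the random fallback is an assumption.

```python
import random


def select_action(policy, q_values, rng=random):
    """Pick the highest-valued action if the policy says to use the network."""
    if policy.use_network():
        # Use the network output: index of the largest Q-value.
        return max(range(len(q_values)), key=lambda a: q_values[a])
    # Assumed fallback: a uniformly random action index.
    return rng.randrange(len(q_values))


class AlwaysNetwork:
    """Hypothetical policy object exposing only use_network()."""

    def use_network(self):
        return True


action = select_action(AlwaysNetwork(), [0.1, 0.7, 0.2])
```

Here `action` is the index of the largest value in the list, because the stand-in policy always returns `True`.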


GreedyQPolicy


Methods:

.reset

.reset()

Resets the policy.

.decay

.decay()

Decays the epsilon / sigma value of the policy.

.use_network

.use_network()

Decides whether the neural network should be used.

Returns

use (bool): Whether to use the neural network.


EpsilonGreedyPolicy

EpsilonGreedyPolicy(
   max_value = 1.0, min_value = 0.0, decay_steps = 1
)

Epsilon-greedy policy: with probability epsilon a random action is taken; otherwise the network is used.

Arguments

max_value (float): Maximum value of epsilon.

min_value (float): Minimum value of epsilon.

decay_steps (int): Number of decay steps.
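To make the three constructor arguments concrete, here is a minimal sketch of how they might interact. It assumes a linear schedule in which each `decay()` call lowers epsilon by `(max_value - min_value) / decay_steps` until `min_value` is reached; the library's actual schedule may differ. `EpsilonGreedySketch` and its `rng` parameter are hypothetical names, not part of the library.

```python
import random


class EpsilonGreedySketch:
    """Illustrative epsilon-greedy policy; not the library's implementation."""

    def __init__(self, max_value=1.0, min_value=0.0, decay_steps=1, rng=None):
        self.max_value = max_value
        self.min_value = min_value
        # Assumed: epsilon shrinks by a fixed amount on every decay() call.
        self.step_size = (max_value - min_value) / decay_steps
        self.rng = rng or random.Random()
        self.reset()

    def reset(self):
        # Assumed: resetting restores epsilon to its maximum value.
        self.epsilon = self.max_value

    def decay(self):
        # Never drop below the configured minimum.
        self.epsilon = max(self.min_value, self.epsilon - self.step_size)

    def use_network(self):
        # Explore with probability epsilon, otherwise use the network.
        return self.rng.random() >= self.epsilon


policy = EpsilonGreedySketch(max_value=1.0, min_value=0.0, decay_steps=4)
epsilons = []
for _ in range(5):
    policy.decay()
    epsilons.append(policy.epsilon)
```

Under these assumptions epsilon falls from 1.0 to 0.0 in four equal steps and then stays at 0.0, at which point `use_network()` always returns `True`.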

Methods:

.reset

.reset()

Resets the policy.

.decay

.decay()

Decays the epsilon value of the policy.

.use_network

.use_network()

Decides whether the neural network should be used.

Returns

use (bool): Whether to use the neural network.