DQNTrainer
DQNTrainer(
environment, memory, processor, model, callbacks, test_policy, train_policy,
her = False, action_num = 4
)
Trainer for a Deep Q-Network (DQN) agent.
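Arguments
environment: Environment the agent interacts with.
memory: Replay memory that stores transitions.
processor: Processor applied to the data exchanged with the environment.
model: Model to train.
callbacks: Callbacks invoked during training.
test_policy: Policy used to select actions during evaluation.
train_policy: Policy used to select actions during training.
her (bool): Presumably enables hindsight experience replay (HER). Defaults to False.
action_num (int): Number of discrete actions. Defaults to 4.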
Methods:
.goal
.goal()
Takes no arguments. Judging by the name, it supplies the goal used for goal-conditioned training when her = True.
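The her flag in the constructor and the goal_state argument of .get_action suggest goal-conditioned training with hindsight experience replay. The following is a minimal sketch of the "final" relabeling strategy, assuming that is what her refers to. The transition layout, sparse_reward, and relabel_with_final_state are hypothetical names introduced for illustration and are not part of this API.

```python
from typing import List, Tuple

# (state, action, reward, next_state, goal, terminal); this layout is an assumption.
Transition = Tuple[int, int, float, int, int, bool]


def sparse_reward(next_state: int, goal: int) -> float:
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return hindsight copies of an episode, using its last reached state as the goal."""
    achieved_goal = episode[-1][3]  # next_state of the final transition
    relabeled = []
    for state, action, _, next_state, _, _ in episode:
        reward = sparse_reward(next_state, achieved_goal)
        terminal = next_state == achieved_goal
        relabeled.append((state, action, reward, next_state, achieved_goal, terminal))
    return relabeled


# An episode that aimed for state 9 but only reached state 3.
episode = [
    (0, 1, -1.0, 1, 9, False),
    (1, 1, -1.0, 2, 9, False),
    (2, 1, -1.0, 3, 9, False),
]
hindsight = relabel_with_final_state(episode)
```

Storing both the original and the relabeled transitions gives the agent successful examples even when the intended goal was never reached.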
.get_step
.get_step(
action, mode = 'q_learning', action_number = 4
)
Judging by the signature, maps an action to the step taken in the environment.
Arguments
action: Action to map.
mode (str): Mode used for the mapping. Defaults to 'q_learning'.
action_number (int): Number of discrete actions. Defaults to 4.
.get_action
.get_action(
state, goal_state, policy
)
Select an action for the given state.
Arguments
state: Current state.
goal_state: Goal state, used for goal-conditioned action selection.
policy: Policy used to select the action.
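.get_action takes a policy, and the constructor distinguishes train_policy from test_policy. A common pairing for DQN is epsilon-greedy exploration during training and greedy selection during testing. The sketch below shows that idea in isolation; epsilon_greedy and its parameters are hypothetical and not taken from this library.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q_values = [0.1, 0.7, 0.2, 0.0]  # one value per action (4 actions, as in action_num = 4)

# epsilon = 0.0 never explores, which is how a test policy typically behaves.
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)

# epsilon = 1.0 always explores.
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)
```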
.train
.train(
batch_size = 32, max_action = 200, max_episode = 12000, warmup = 120000
)
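Train the agent.
Arguments
batch_size (int): Number of transitions sampled per update. Defaults to 32.
max_action (int): Maximum number of actions per episode. Defaults to 200.
max_episode (int): Maximum number of episodes. Defaults to 12000.
warmup (int): Number of warm-up steps before training updates begin. Defaults to 120000.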
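To make the interaction between batch_size, max_action, max_episode and warmup concrete, here is a toy loop that only counts steps and updates. It assumes warmup is measured in environment steps and that an update is attempted on every step afterwards; the actual trainer may differ. run_training and the toy dynamics are hypothetical.

```python
import random
from collections import deque


def run_training(batch_size=32, max_action=200, max_episode=12000, warmup=120000, seed=0):
    """Count environment steps and learning updates for a given schedule."""
    rng = random.Random(seed)
    memory = deque(maxlen=100_000)  # stand-in for the replay memory
    total_steps = 0
    updates = 0
    for _ in range(max_episode):
        state = 0
        for _ in range(max_action):
            action = rng.randrange(4)    # placeholder for action selection
            next_state = state + 1       # placeholder environment dynamics
            terminal = next_state >= 10  # toy episode ends at state 10
            memory.append((state, action, 0.0, next_state, terminal))
            total_steps += 1
            # Learn only after the warm-up phase and once a full batch exists.
            if total_steps > warmup and len(memory) >= batch_size:
                batch = rng.sample(list(memory), batch_size)
                updates += 1             # a real trainer would fit the model on `batch`
            state = next_state
            if terminal:
                break
    return total_steps, updates


# Small numbers keep the example fast: 6 episodes of at most 5 actions each.
steps, updates = run_training(batch_size=4, max_action=5, max_episode=6, warmup=10)
```

With these settings every episode is cut off by max_action, so the run takes 30 steps, and the 20 steps after warm-up each trigger one update.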
.evaluate
.evaluate(
max_action = 50, max_episode = 12
)
Evaluate the agent.
Arguments
max_action (int): Maximum number of actions per episode. Defaults to 50.
max_episode (int): Number of evaluation episodes. Defaults to 12.
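An evaluation run with these two limits can be pictured as a fixed number of episodes, each capped at max_action steps, with no learning in between. The sketch below uses a toy chain environment and returns the mean undiscounted return; the function body, the dynamics, and the returned metric are assumptions, not the library's behavior.

```python
def evaluate(policy, max_action=50, max_episode=12):
    """Run episodes on a toy chain and return the mean undiscounted return."""
    returns = []
    for _ in range(max_episode):
        state, episode_return = 0, 0.0
        for _ in range(max_action):
            action = policy(state)
            state += action              # toy dynamics: the action is the step size
            if state >= 5:               # reaching the end of the chain pays 1 and ends the episode
                episode_return += 1.0
                break
        returns.append(episode_return)
    return sum(returns) / len(returns)


always_forward = evaluate(lambda state: 1)                           # reaches the end every time
never_moves = evaluate(lambda state: 0, max_action=3, max_episode=2)  # cut off by max_action
```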
DDPGTrainer
DDPGTrainer(
environment, random_process, processor, memory, model, callbacks, her = False
)
Trainer for a Deep Deterministic Policy Gradient (DDPG) agent.
Arguments
environment: Environment the agent interacts with.
random_process: Random process that generates exploration noise for the actions.
processor: Processor applied to the data exchanged with the environment.
memory: Replay memory that stores transitions.
model: Model to train, presumably bundling the actor and critic networks.
callbacks: Callbacks invoked during training.
her (bool): Presumably enables hindsight experience replay (HER). Defaults to False.
Methods:
.goal
.goal()
Takes no arguments. Judging by the name, it supplies the goal used for goal-conditioned training when her = True.
.get_action
.get_action(
state, goal_state
)
Select an action for the given state.
Arguments
state: Current state.
goal_state: Goal state, used for goal-conditioned action selection.
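DDPG's actor is deterministic, so exploration usually comes from noise added to its output, which is presumably the role of the random_process constructor argument. A minimal sketch with an Ornstein-Uhlenbeck process follows; the class, its parameter names, and the clipping bounds are hypothetical and not this library's implementation.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise: dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1e-2, seed=0):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Restart the process at its mean, typically at the start of an episode."""
        self.x = self.mu

    def sample(self):
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def noisy_action(actor_output, noise, low=-1.0, high=1.0):
    """Add exploration noise to a deterministic action and clip it to the valid range."""
    return min(high, max(low, actor_output + noise.sample()))


silent = OrnsteinUhlenbeckNoise(sigma=0.0)  # sigma = 0 disables the noise
unchanged = noisy_action(0.5, silent)       # action passes through untouched
clipped = noisy_action(5.0, silent)         # out-of-range action is clipped to `high`
explored = noisy_action(0.5, OrnsteinUhlenbeckNoise())
```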
.train
.train(
batch_size = 32, max_action = 50, max_episode = 120, warmup = 0, replay_interval = 4,
update_interval = 1, test_interval = 1000
)
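Train the agent.
Arguments
batch_size (int): Number of transitions sampled per update. Defaults to 32.
max_action (int): Maximum number of actions per episode. Defaults to 50.
max_episode (int): Maximum number of episodes. Defaults to 120.
warmup (int): Number of warm-up steps before training updates begin. Defaults to 0.
replay_interval (int): Interval between updates from replayed transitions. Defaults to 4.
update_interval (int): Interval between model updates. Defaults to 1.
test_interval (int): Interval between evaluation runs. Defaults to 1000.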
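The three interval arguments of .train control how often different parts of the loop fire. The sketch below counts events under one plausible reading: a replay update every replay_interval steps, a target update every update_interval replay updates, and an evaluation every test_interval steps. These semantics are an assumption made for illustration, and schedule is a hypothetical helper.

```python
def schedule(max_action=50, max_episode=120, warmup=0,
             replay_interval=4, update_interval=1, test_interval=1000):
    """Count how often each event would fire under the assumed interval semantics."""
    counts = {"steps": 0, "replays": 0, "target_updates": 0, "tests": 0}
    for _ in range(max_episode):
        for _ in range(max_action):
            counts["steps"] += 1
            step = counts["steps"]
            if step <= warmup:
                continue  # still collecting experience, nothing else happens
            if step % replay_interval == 0:
                counts["replays"] += 1
                if counts["replays"] % update_interval == 0:
                    counts["target_updates"] += 1
            if step % test_interval == 0:
                counts["tests"] += 1
    return counts


counts = schedule()  # uses the defaults from the .train signature
```

Under this reading the defaults give 6000 steps, 1500 replay updates, 1500 target updates, and 6 evaluation runs.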
.evaluate
.evaluate(
max_action = 50, max_episode = 12
)
Evaluate the agent.
Arguments
max_action (int): Maximum number of actions per episode. Defaults to 50.
max_episode (int): Number of evaluation episodes. Defaults to 12.
TD3Trainer
TD3Trainer(
environment, random_process, processor, memory, model, callbacks, her = False
)
Trainer for a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent.
Arguments
environment: Environment the agent interacts with.
random_process: Random process that generates exploration noise for the actions.
processor: Processor applied to the data exchanged with the environment.
memory: Replay memory that stores transitions.
model: Model to train, presumably bundling the actor and critic networks.
callbacks: Callbacks invoked during training.
her (bool): Presumably enables hindsight experience replay (HER). Defaults to False.
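TD3 shares its constructor and method signatures with DDPG here; the algorithms differ mainly in how the critic target is formed. TD3 smooths the target action with clipped noise and takes the minimum of two target critics. The sketch below shows that target for a scalar action. td3_target, the noise parameters, and the callable critics are hypothetical stand-ins, not this library's API.

```python
import random


def td3_target(reward, terminal, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=None):
    """Clipped double-Q target with target policy smoothing.

    next_action is the target actor's output for the next state;
    q1_target and q2_target map an action to a Q-value for that next state.
    """
    rng = rng or random.Random(0)
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(action_low, min(action_high, next_action + noise))
    # Clipped double-Q: use the more pessimistic of the two target critics.
    min_q = min(q1_target(smoothed), q2_target(smoothed))
    # No bootstrapping from terminal states.
    return reward + gamma * (0.0 if terminal else 1.0) * min_q


q1 = lambda action: 2.0  # constant critics keep the arithmetic easy to follow
q2 = lambda action: 1.0
target = td3_target(1.0, False, 0.3, q1, q2, gamma=0.5, noise_std=0.0)
terminal_target = td3_target(1.0, True, 0.3, q1, q2, gamma=0.5, noise_std=0.0)
```

The third TD3 ingredient, updating the actor and target networks less often than the critics, is a scheduling matter and is not shown in this sketch.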
Methods:
.goal
.goal()
Takes no arguments. Judging by the name, it supplies the goal used for goal-conditioned training when her = True.
.get_action
.get_action(
state, goal_state
)
Select an action for the given state.
Arguments
state: Current state.
goal_state: Goal state, used for goal-conditioned action selection.
.train
.train(
batch_size = 32, max_action = 50, max_episode = 120, warmup = 0, replay_interval = 4,
update_interval = 1, test_interval = 1000
)
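Train the agent.
Arguments
batch_size (int): Number of transitions sampled per update. Defaults to 32.
max_action (int): Maximum number of actions per episode. Defaults to 50.
max_episode (int): Maximum number of episodes. Defaults to 120.
warmup (int): Number of warm-up steps before training updates begin. Defaults to 0.
replay_interval (int): Interval between updates from replayed transitions. Defaults to 4.
update_interval (int): Interval between model updates. Defaults to 1.
test_interval (int): Interval between evaluation runs. Defaults to 1000.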
.evaluate
.evaluate(
max_action = 50, max_episode = 12
)
Evaluate the agent.
Arguments
max_action (int): Maximum number of actions per episode. Defaults to 50.
max_episode (int): Number of evaluation episodes. Defaults to 12.