Now let's define the model that will learn to play Catch via Q-Learning. We use Keras as a frontend to TensorFlow. Our baseline model is a simple three-layer dense network, and it performs well on the simple version of Catch. You can find the full implementation on GitHub.
You can also try more complex models and test whether they achieve better performance.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import sgd

num_actions = 3    # [move_left, stay, move_right]
hidden_size = 100  # size of the hidden layers
grid_size = 10     # size of the playing field

def baseline_model(grid_size, num_actions, hidden_size):
    # Setting up the model with Keras
    model = Sequential()
    model.add(Dense(hidden_size, input_shape=(grid_size**2,), activation='relu'))
    model.add(Dense(hidden_size, activation='relu'))
    model.add(Dense(num_actions))
    model.compile(sgd(lr=.1), "mse")
    return model
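To sanity-check the architecture, you can instantiate and inspect the model directly (a quick usage sketch, not part of the original listing):

model = baseline_model(grid_size, num_actions, hidden_size)
model.summary()  # two hidden layers of 100 units each, then 3 output Q-values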
Exploration
The final ingredient of Q-Learning is exploration. Everyday experience tells us that sometimes you have to do something strange or random to discover whether there is anything better than your usual routine.
The same applies to Q-Learning. Always choosing the best-known option means you might miss paths that have never been explored. To avoid this, the learner will sometimes take a random action instead of the best one (here with probability epsilon). We can define the training method as follows:
def train(model, epochs):
    # Train
    # Resetting the win counter
    win_cnt = 0
    # We want to keep track of the progress of the AI over time,
    # so we save its win count history
    win_hist = []
    # Epochs is the number of games we play
    for e in range(epochs):
        loss = 0.
        # Resetting the game
        env.reset()
        game_over = False
        # get initial input
        input_t = env.observe()
        while not game_over:
            # The learner is acting on the last observed game screen
            # input_t is a vector representing the game screen
            input_tm1 = input_t
            # Take a random action with probability epsilon
            if np.random.rand() <= epsilon:
                # Eat something random from the menu
                action = np.random.randint(0, num_actions, size=1)
            else:
                # Choose yourself
                # q contains the expected rewards for the actions
                q = model.predict(input_tm1)
                # We pick the action with the highest expected reward
                action = np.argmax(q[0])
            # apply action, get rewards and new state
            input_t, reward, game_over = env.act(action)
            # If we managed to catch the fruit we add 1 to our win counter
            if reward == 1:
                win_cnt += 1
            # Uncomment this to render the game here
            # display_screen(action, 3000, inputs[0])
            """
            The experiences < s, a, r, s' > we make during gameplay
            are our training data. Here we first save the last experience,
            and then load a batch of experiences to train our model.
            """
            # store experience
            exp_replay.remember([input_tm1, action, reward, input_t], game_over)
            # Load batch of experiences
            inputs, targets = exp_replay.get_batch(model, batch_size=batch_size)
            # train model on experiences
            batch_loss = model.train_on_batch(inputs, targets)
            # sum up loss over all batches in an epoch
            loss += batch_loss
        win_hist.append(win_cnt)
    return win_hist
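train() refers to several names defined elsewhere in the full GitHub implementation: the environment env, the exploration rate epsilon, the batch size batch_size, and the replay buffer exp_replay with its remember and get_batch methods. None of these appear in this excerpt, so here is a minimal sketch under stated assumptions: the ExperienceReplay class (including its max_memory and discount defaults) and the setup values below are modeled on common Keras Catch implementations, not taken verbatim from the source. The key idea is that get_batch builds Q-Learning targets of the form r + discount * max Q(s', a'):

import numpy as np

epsilon = .1     # probability of a random action (assumed value)
batch_size = 50  # number of experiences per training batch (assumed value)

class ExperienceReplay(object):
    # Stores the experiences < s, a, r, s' > made during gameplay and
    # turns random samples of them into (inputs, targets) training batches
    def __init__(self, max_memory=500, discount=.9):
        self.max_memory = max_memory  # maximum number of stored experiences
        self.memory = []              # entries of the form [[s, a, r, s'], game_over]
        self.discount = discount      # gamma, the discount factor on future rewards

    def remember(self, states, game_over):
        # Save an experience, dropping the oldest one when memory is full
        self.memory.append([states, game_over])
        if len(self.memory) > self.max_memory:
            del self.memory[0]

    def get_batch(self, model, batch_size=10):
        len_memory = len(self.memory)
        num_actions = model.output_shape[-1]
        env_dim = self.memory[0][0][0].shape[1]
        inputs = np.zeros((min(len_memory, batch_size), env_dim))
        targets = np.zeros((inputs.shape[0], num_actions))
        for i, idx in enumerate(np.random.randint(0, len_memory, size=inputs.shape[0])):
            state_t, action_t, reward_t, state_tp1 = self.memory[idx][0]
            game_over = self.memory[idx][1]
            inputs[i:i+1] = state_t
            # Start from the current predictions so that only the Q-value
            # of the action actually taken is changed by the update
            targets[i] = model.predict(state_t)[0]
            Q_sa = np.max(model.predict(state_tp1)[0])
            if game_over:
                # The game ended, so there are no future rewards
                targets[i, action_t] = reward_t
            else:
                # Q-Learning target: r + gamma * max_a' Q(s', a')
                targets[i, action_t] = reward_t + self.discount * Q_sa
        return inputs, targets

exp_replay = ExperienceReplay(max_memory=500)
env = Catch(grid_size)  # assumed constructor of the game environment from the GitHub repo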
I trained this game bot for 5,000 epochs, and it performed quite well!
The Catch bot in action
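To reproduce a run like this and track learning progress, you can call train() and plot the win rate over time. The plotting below is my own sketch (the window size is arbitrary), not from the original code: since win_hist stores the cumulative win count, differencing it yields a 0/1 win indicator per game, and a moving average turns that into a win rate.

import numpy as np
import matplotlib.pyplot as plt

epochs = 5000
win_hist = train(model, epochs)

# Differencing the cumulative counter gives a 0/1 outcome per game;
# a moving average over the last `window` games smooths it into a win rate
window = 100
wins_per_game = np.diff(win_hist)
win_rate = np.convolve(wins_per_game, np.ones(window) / window, mode='valid')

plt.plot(win_rate)
plt.xlabel('game')
plt.ylabel('win rate (moving average)')
plt.show()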