Step 3

Training the DQN Agent to Control the Level

Overview

In Step 3, a reinforcement learning technique called Deep Q-Learning (DQN) is used to train an agent to control the level in Tank T-1 at the set point suggested by the optimization. The agent uses the digital twin created in Step 2 as the environment it explores. Other reinforcement learning methods could also be used to control the tank's level; DQN is simply the method chosen for this example.
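The sketch below shows the general shape of such a training loop. It is a simplified, self-contained illustration rather than the repo's code: ToyTankEnv is a toy stand-in for the digital-twin environment, and the network, reward, and hyperparameters are placeholder choices. In the repo, the environment wraps the LIC101.AiPV twin from Step 2 and the agent is defined in step3.py.

# Minimal DQN training-loop sketch (hypothetical names and values).
import random
from collections import deque

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

class ToyTankEnv:
    # Toy tank level that drifts with the valve action; reward is earned near the set point.
    def __init__(self, setpoint=0.5):
        self.setpoint = setpoint
    def reset(self):
        self.level = np.random.uniform(0.2, 0.8)
        self.t = 0
        return np.array([self.level, self.setpoint], dtype=np.float32)
    def step(self, action):
        # action 0 = close the valve a little, 1 = hold, 2 = open a little
        self.level += (action - 1) * 0.02 + np.random.normal(0.0, 0.005)
        self.t += 1
        reward = 1.0 if abs(self.level - self.setpoint) < 0.05 else 0.0
        done = self.t >= 100 or not (0.0 < self.level < 1.0)
        return np.array([self.level, self.setpoint], dtype=np.float32), reward, done

n_inputs, n_actions = 2, 3
q_net = tf.keras.Sequential([
    tf.keras.Input(shape=(n_inputs,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_actions, activation="linear"),   # one Q-value per action
])
q_net.compile(optimizer="adam", loss="mse")

env = ToyTankEnv()
memory = deque(maxlen=10_000)                       # replay buffer of past transitions
gamma, epsilon, eps_min, eps_decay = 0.99, 1.0, 0.01, 0.995
batch_size = 64

for episode in range(50):
    state, done, score = env.reset(), False, 0.0
    while not done:
        # Epsilon-greedy exploration: act randomly with probability epsilon.
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = int(np.argmax(q_net.predict(state[None], verbose=0)[0]))
        next_state, reward, done = env.step(action)
        memory.append((state, action, reward, next_state, done))
        state, score = next_state, score + reward
        # Learn from a random minibatch (target network omitted for brevity).
        if len(memory) >= batch_size:
            s, a, r, s2, d = map(np.array, zip(*random.sample(memory, batch_size)))
            q_next = q_net.predict(s2, verbose=0).max(axis=1)
            target = q_net.predict(s, verbose=0)
            target[np.arange(batch_size), a] = r + gamma * q_next * (1 - d)
            q_net.fit(s, target, verbose=0)
    epsilon = max(eps_min, epsilon * eps_decay)      # decay exploration over episodes
    print("episode_", episode, "score_", score, "epsilon_", round(epsilon, 3))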

Assumptions

It is assumed that you have properly executed Step 2 and have generated an acceptable digital twin, LIC101.AiPV. The code samples shown in this section are all from VS Code. This step will also make use of your GPU if your PC or Mac has one; please make sure all drivers are up to date so training runs as quickly as possible.
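As an optional sanity check (not part of the repo), you can confirm that TensorFlow can see your GPU before starting the run, so training does not silently fall back to the CPU:

# Optional check: list the GPUs visible to TensorFlow.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
for gpu in gpus:
    print(gpu)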

Settings in Step 3

You can use the default settings supplied in the repo, or you can edit them. Each setting is briefly described in the code comments. The most important constraint to understand is that the lookback of the agent must not exceed the lookback of the digital twin.
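That constraint can be expressed as a simple check. The names and values below are hypothetical examples, not settings taken from the repo; use whatever lookback you trained the twin with in Step 2.

# Hypothetical illustration of the lookback constraint.
TWIN_LOOKBACK = 30    # timesteps of history the Step 2 digital twin was trained with (example)
AGENT_LOOKBACK = 20   # timesteps of history fed to the DQN agent in Step 3 (example)

assert AGENT_LOOKBACK <= TWIN_LOOKBACK, (
    "Agent lookback exceeds the digital twin lookback; "
    "reduce the agent lookback or retrain the twin with a longer window."
)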

Executing Step 3

It is key to understand that the DQN method used here is not a convex optimization method: there is no way to guarantee the model has converged to a global optimum, and the sampling is random. Even after a model appears to converge, it can diverge again and get worse.

// Execute Step 3 from the terminal
python ./step3.py
// When the code executes, the terminal will start to scroll.
LIC0101.AiMV/
INFO: Created TensorFlow Lite delegate for select TF ops.
INFO: TfLiteFlexDelegate delegate: 6 nodes delegated out of 24 nodes with 4 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
episode_ 0 score_ 65.0  average_ 0  epsilon_ 0.999
episode_ 1 score_ 22.0  average_ 0  epsilon_ 0.999
episode_ 2 score_ 86.0  average_ 0  epsilon_ 0.998
episode_ 3 score_ 18.0  average_ 0  epsilon_ 0.998
episode_ 4 score_ 18.0  average_ 0  epsilon_ 0.997
episode_ 5 score_ 57.0  average_ 0  epsilon_ 0.996
episode_ 6 score_ 17.0  average_ 0  epsilon_ 0.996
episode_ 7 score_ 49.0  average_ 0  epsilon_ 0.995
episode_ 8 score_ 18.0  average_ 0  epsilon_ 0.995
episode_ 9 score_ 66.0  average_ 0  epsilon_ 0.994
episode_ 10 score_ 18.0  average_ 0  epsilon_ 0.993
episode_ 11 score_ 40.0  average_ 0  epsilon_ 0.993
episode_ 12 score_ 82.0  average_ 0  epsilon_ 0.992
episode_ 13 score_ 19.0  average_ 0  epsilon_ 0.992
episode_ 14 score_ 57.0  average_ 0  epsilon_ 0.991
episode_ 15 score_ 19.0  average_ 0  epsilon_ 0.99
episode_ 16 score_ 19.0  average_ 0  epsilon_ 0.99
episode_ 17 score_ 18.0  average_ 0  epsilon_ 0.989
episode_ 18 score_ 24.0  average_ 0  epsilon_ 0.989
episode_ 19 score_ 17.0  average_ 0  epsilon_ 0.988
episode_ 20 score_ 74.0  average_ 0  epsilon_ 0.987
saved best model
episode_ 21 score_ 48.0  average_ 36.9  epsilon_ 0.987
saved best model
episode_ 22 score_ 17.0  average_ 38.2  epsilon_ 0.986
episode_ 23 score_ 49.0  average_ 34.8  epsilon_ 0.986
episode_ 24 score_ 19.0  average_ 36.3  epsilon_ 0.985
episode_ 25 score_ 70.0  average_ 36.3  epsilon_ 0.985
episode_ 26 score_ 85.0  average_ 36.9  epsilon_ 0.984
saved best model
episode_ 27 score_ 18.0  average_ 40.3  epsilon_ 0.983
episode_ 28 score_ 59.0  average_ 38.8  epsilon_ 0.983
saved best model
episode_ 29 score_ 76.0  average_ 40.9  epsilon_ 0.982
saved best model
episode_ 30 score_ 85.0  average_ 41.4  epsilon_ 0.982
saved best model
episode_ 31 score_ 84.0  average_ 44.7  epsilon_ 0.981
saved best model
episode_ 32 score_ 74.0  average_ 46.9  epsilon_ 0.98
episode_ 33 score_ 60.0  average_ 46.5  epsilon_ 0.98
saved best model
episode_ 34 score_ 69.0  average_ 48.5  epsilon_ 0.979
saved best model
episode_ 35 score_ 48.0  average_ 49.1  epsilon_ 0.979
saved best model
episode_ 36 score_ 45.0  average_ 50.6  epsilon_ 0.978
saved best model
episode_ 37 score_ 18.0  average_ 51.9  epsilon_ 0.977
episode_ 38 score_ 43.0  average_ 51.9  epsilon_ 0.977
saved best model
episode_ 39 score_ 78.0  average_ 52.8  epsilon_ 0.976
saved best model
episode_ 40 score_ 25.0  average_ 55.9  epsilon_ 0.976
episode_ 41 score_ 48.0  average_ 53.4  epsilon_ 0.975
episode_ 42 score_ 68.0  average_ 53.4  epsilon_ 0.975
saved best model
episode_ 43 score_ 79.0  average_ 55.9  epsilon_ 0.974
saved best model
episode_ 44 score_ 19.0  average_ 57.4  epsilon_ 0.973
episode_ 45 score_ 75.0  average_ 57.4  epsilon_ 0.973
saved best model
episode_ 46 score_ 89.0  average_ 57.7  epsilon_ 0.972
saved best model
episode_ 47 score_ 84.0  average_ 57.9  epsilon_ 0.972
saved best model
episode_ 48 score_ 19.0  average_ 61.2  epsilon_ 0.971
episode_ 49 score_ 87.0  average_ 59.1  epsilon_ 0.97
episode_ 50 score_ 19.0  average_ 59.7  epsilon_ 0.97
episode_ 51 score_ 67.0  average_ 56.4  epsilon_ 0.969
episode_ 52 score_ 18.0  average_ 55.5  epsilon_ 0.969
episode_ 53 score_ 69.0  average_ 52.7  epsilon_ 0.968
episode_ 54 score_ 73.0  average_ 53.2  epsilon_ 0.968
episode_ 55 score_ 18.0  average_ 53.4  epsilon_ 0.967
episode_ 56 score_ 85.0  average_ 51.9  epsilon_ 0.966

Understanding Scores

Each training episode is scored, and the rolling average is also printed to the terminal at the end of each episode. A model can be said to converge when the average score keeps improving toward the maximum possible score, although it is common to see this average drift up and down. We have incorporated a custom 'saved best model' function to help make sure that only the best-performing agent is stored.
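The snippet below, with hypothetical names (maybe_save_best, best_agent.h5), shows a common pattern consistent with the log above: the agent is only persisted when the rolling average beats the best average seen so far. The repo's actual implementation may differ in detail.

# Save-best-model pattern (hypothetical sketch).
best_average = float("-inf")

def maybe_save_best(average, model, path="best_agent.h5"):
    # Only persist the agent when the rolling average improves on the best so far.
    global best_average
    if average > best_average:
        best_average = average
        model.save(path)          # Keras save; the repo may use a different format
        print("saved best model")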

Score

The overall episode score is the sum of each step's score within the episode. You can view each step's score by opening replaybuffer.csv, located in the agents directory; a step's score is written at the end of each step.
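As a tiny illustration (the step scores here are made up, not taken from replaybuffer.csv), the episode score is simply the sum of the per-step scores:

# Hypothetical per-step scores for one episode.
step_scores = [1.0, 0.0, 1.0, 1.0, 0.0]
episode_score = sum(step_scores)
print(episode_score)   # 3.0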

Average

The average is a rolling average of the previous episode scores. It is not calculated until the first 20 episodes have completed.
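For example, the first average printed in the log above (36.9 at episode 21) matches the mean of the 20 preceding episode scores, consistent with a 20-episode rolling window:

import numpy as np

# Scores of episodes 1 through 20, copied from the log above.
previous_scores = [22, 86, 18, 18, 57, 17, 49, 18, 66, 18,
                   40, 82, 19, 57, 19, 19, 18, 24, 17, 74]
print(round(float(np.mean(previous_scores)), 1))   # 36.9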

Finished Directory Files

When the agent has completed training, the Agent Directory will look like the following:
