In Step 3, a reinforcement learning technique called Deep Q-Learning (DQN) will be used to train an agent to control the level in Tank T-1 at the desired set point per the suggested optimization. The agent will use the digital twin created in Step 2 as its environment to explore. There are multiple reinforcement learning methods that could also control the tank's level; however, DQN is the method used in this example.
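Conceptually, each training episode is a loop in which the agent observes the tank level, picks an action, and receives a reward from the digital twin. The sketch below illustrates that loop only; the environment and agent objects, method names, and reward shaping are hypothetical, and the actual classes live in the repo's step3.py.

# Conceptual sketch of one training episode against the digital twin.
# All object and method names here are hypothetical placeholders.
def run_episode(env, agent, max_steps=200):
    state = env.reset()                          # initial tank-level observation
    episode_score = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                # e.g. epsilon-greedy valve move
        next_state, reward, done = env.step(action)
        agent.remember(state, action, reward, next_state, done)
        agent.learn()                            # sample replay memory, update the Q-network
        episode_score += reward                  # reward reflects closeness to the set point
        state = next_state
        if done:
            break
    return episode_score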
Assumptions
It is assumed that you have properly executed Step 2 and have generated an acceptable digital twin, LIC101.AiPV. The code samples shown in this section are all from VS Code. This step will likely also make use of your GPU if your PC or Mac has one. Please be sure that all drivers are up to date so that training runs as quickly as possible.
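If you want to confirm that your GPU is visible before starting a long training run, a quick check along these lines can help. This sketch assumes a TensorFlow backend; if the repo uses PyTorch instead, the equivalent check is torch.cuda.is_available().

# Quick GPU visibility check before training (assumes a TensorFlow backend).
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print(f"GPUs detected: {len(gpus)}")
for gpu in gpus:
    print(gpu)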
Settings in Step 3
You can use the default settings supplied in the repo, or you can edit them. Each setting is briefly described in the comments of the code. The most important setting to understand is the agent's lookback, which must not exceed the lookback of the digital twin.
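As a minimal illustration of that constraint (the setting names below are hypothetical; check the comments in step3.py for the real ones):

# Hypothetical setting names, used only to illustrate the lookback rule.
AGENT_LOOKBACK = 4      # how many past samples the agent observes
TWIN_LOOKBACK = 6       # lookback the digital twin was trained with in Step 2

# The agent cannot be given more history than the digital twin can supply.
assert AGENT_LOOKBACK <= TWIN_LOOKBACK, \
    "Agent lookback must not exceed the digital twin lookback"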
Executing Step 3
It is key to understand that the DQN method used here is not a convex optimization method; there is no way to guarantee that the model has converged to a global optimum. The sampling of experience is also random. Even if and when a model converges, it can later diverge and get worse.
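That randomness comes largely from how DQN draws its training data. Below is a minimal sketch of uniform replay-buffer sampling, included only to show where the randomness enters; the repo's actual buffer implementation may differ.

import random
from collections import deque

# Transitions are stored as (state, action, reward, next_state, done) tuples.
replay_buffer = deque(maxlen=10_000)

def sample_batch(batch_size=32):
    """Draw a uniformly random minibatch of stored transitions."""
    return random.sample(list(replay_buffer), min(batch_size, len(replay_buffer)))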
# Execute Step 3 from the terminal
python ./step3.py
Each training episode is scored, and the rolling average is also printed to the terminal at the end of each episode. A model can be said to be converging when the average score continues to improve towards its maximum possible score. It is common to see this average float up and down. We have incorporated a custom 'Saved Best Model' function to help ensure that only the models from the best training episodes are stored off.
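A sketch of what such a check could look like is shown below; the function name, save call, and output path are assumptions for illustration, not the repo's actual code.

# Hypothetical 'save best model' helper; step3.py's actual version may differ.
best_average = float('-inf')

def maybe_save_best(agent, average_score, path='best_model.h5'):
    """Save the agent's network only when the rolling average sets a new best."""
    global best_average
    if average_score is not None and average_score > best_average:
        best_average = average_score
        agent.model.save(path)        # assumes a Keras-style model attribute
        print(f"New best average {average_score:.2f}; model saved to {path}")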
Score
The overall episode score is the sum of each step's score in the episode. You can view each step's score by opening replaybuffer.csv, located in the agent directory. This score is printed at the end of each step.
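For example, the per-step scores can be loaded and totaled with pandas. This is a rough sketch only: the relative path and the 'reward' column name are assumptions, so inspect the actual CSV for its layout.

import pandas as pd

# Rough sketch: total up per-step scores from the replay buffer dump.
# Adjust the path to wherever the agent directory lives in your checkout.
steps = pd.read_csv('replaybuffer.csv')
episode_score = steps['reward'].sum()
print(f"Summed step scores: {episode_score:.2f}")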
Average
The average is a rolling average of the previous episode scores. This average is not calculated until the first 20 episodes have completed.
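A minimal sketch of how such a rolling average could be computed is shown below, assuming a 20-episode window; check step3.py for the actual window size used.

import numpy as np

def rolling_average(scores, window=20):
    """Mean of the last `window` episode scores; None until enough episodes exist."""
    if len(scores) < window:
        return None          # matches the note above: no average before 20 episodes
    return float(np.mean(scores[-window:]))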
Finished Directory Files
When the agent has completed training, the Agent Directory will look like the following: