Step 2

Creating the Digital Environment

Overview

In Step 1, we generated the training and validation data sets needed to train a machine learning model that replicates our Tank T-1 process as a digital environment (digital twin). In this step, we will complete the generation of the digital twin and validate the model. Preparing a valid digital environment is necessary when optimizing industrial processes because testing optimization scenarios on the live process can create hazardous conditions harmful to personnel and equipment. To make machine learning an effective tool in an industrial setting, the digitally replicated environment must be as accurate as possible so that optimization techniques can be applied to it before ever being implemented in the physical environment. Applying machine learning in this way poses no harm or risk to the physical environment during development and training, while maximizing the number of scenarios and events that can be tested.

Assumptions

It is assumed that you have properly executed Step 1 and have generated your traindata.csv and valdata.csv files. The code samples shown in this section are all from VS Code. This step will also make use of your GPU, if your PC or Mac has one. Please be sure that all drivers are up to date for the best performance and most efficient use of your time.

Settings in Step 2

You can use the default settings supplied in the repo, or you can make edits to the settings. Below is an explanation of those settings:

Basic Settings

Dependent Variable (dependantVars)

The dependent variable is the variable that the digital twin will predict. In this example, it is indexed as '0' in traindata.csv and valdata.csv, and is the Process Value (PV) of the Tank T-1 level meter, LIC-101.PV. Keep at the default.

Independent Variables (independantVars)

The independent variables are a list of variables that will be used to learn and predict the dependent variable. In this example, the list contains the indexes identifying the column(s) holding the data we want to use as independent variables. You can adjust these as you desire to get different training results. A detailed description of each index is provided in the comments of the Step 2 code.
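How the index settings map onto the CSV columns can be illustrated with a small sketch. The sample rows and column layout below are hypothetical; only the index values come from the repo defaults:

```python
import csv
import io

# Hypothetical sample mimicking a few rows of traindata.csv:
# column 0 holds the dependent variable (LIC-101.PV), and columns
# 2 through 5 hold the independent variables used for training.
sample = io.StringIO(
    "52.1,0,41.3,0.8,12.5,33.0\n"
    "52.4,1,41.1,0.8,12.6,33.2\n"
    "52.2,2,41.2,0.9,12.4,33.1\n"
)

dependantVar = 0
independantVars = [2, 3, 4, 5]

rows = [list(map(float, r)) for r in csv.reader(sample)]
y = [r[dependantVar] for r in rows]                   # target column
X = [[r[i] for i in independantVars] for r in rows]   # input columns
```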

Digital Twin Lookback (dt_lookback)

The lookback is an integer value that tells the digital twin algorithm how far back into the past to retrieve information at each step. In this example, we will be using a Gated Recurrent Unit (GRU), a type of Recurrent Neural Network (RNN). In the GRU methodology, a lookback can be used to offset the problem of vanishing gradients, where a standard RNN may lose the learned effect of earlier inputs to the model. Other methods exist for improving RNNs, but in this example GRU is used.

Scan Rate (scanrate)

The scan rate is the sample rate used for the system. A scan rate of 1 applies no downsampling to the dataset and is the suggested and default setting. In the default settings, dt_lookback = 3 and scanrate = 1 retrieve the previous 3 timestamps from the dataset each time training executes another step. However, dt_lookback = 3 and scanrate = 10 retrieve the previous 30 timestamps and downsample them to 3. Together, scanrate and dt_lookback can be used to account for long-term time effects.

dependantVar = 0
independantVars = [2, 3, 4, 5]
dt_lookback = 3
scanrate = 1
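The interaction between dt_lookback and scanrate described above can be sketched as a simple windowing function. This is a hypothetical helper for illustration only, not the repo's actual implementation:

```python
def lookback_window(series, t, dt_lookback, scanrate):
    """Return dt_lookback samples of history before index t,
    striding through the previous dt_lookback * scanrate timestamps.

    Hypothetical sketch of the windowing behaviour; the repo's
    internal indexing may differ.
    """
    start = t - dt_lookback * scanrate
    if start < 0:
        raise ValueError("not enough history before index t")
    return series[start:t:scanrate]

data = list(range(100))            # stand-in for one column of traindata.csv
w1 = lookback_window(data, 50, 3, 1)   # previous 3 timestamps -> [47, 48, 49]
w2 = lookback_window(data, 50, 3, 10)  # previous 30, sampled to 3 -> [20, 30, 40]
```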

GRU Training Settings

As mentioned in the previous section, for this example the GRU type of RNN was used to train the digital twin.

GRU1 (gru1_dims)

The number of units in the first GRU layer. The default of 32 corresponds to the output dimension of the first gru layer in the model summary printed during training.

GRU2 (gru2_dims)

The number of units in the second GRU layer; the default is also 32.

Learning Rate (lr)

The learning rate used by the optimizer during training. The default is 0.001.

Episodes (ep)

The number of epochs/episodes to train for. The default is 20.

Batch Size (batch_size)

The number of samples processed per gradient update. The default is 5000.

dt.trainDt(
    gru1_dims=32,
    gru2_dims=32,
    lr=0.001,
    ep=20,
    batch_size=5000)

GRU Validation Settings

Maximum Validation Length (max_len)

The maximum validation data set length, expressed in units of scanrate. In this example, the scan rate is 1 and the maximum length for validation is 500. If the data points were sampled at 1-second intervals, the validation set would be capped at 500 seconds, or 500 data samples.

Number of Validations (num_val)

The number of validations to execute on the Digital Twin. In the default example, we will run six (6) digital validations and chart their results and error(s).

dt.validate(
    max_len=500,
    num_val=6
    )
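The two error figures printed during validation, mean absolute error % and max error %, can be understood with a small sketch. The repo's exact normalization is not shown here, so the percent-of-range form below is an assumption:

```python
def validation_errors(actual, predicted):
    """Percent errors between a validation trace and the twin's prediction.

    Hypothetical sketch: each error is expressed as a percentage of the
    actual signal's range; the repo may normalize differently.
    """
    span = max(actual) - min(actual)
    pct = [abs(a - p) / span * 100 for a, p in zip(actual, predicted)]
    mean_abs = sum(pct) / len(pct)
    return round(mean_abs, 2), round(max(pct), 2)

# Toy validation trace vs. digital twin prediction
actual    = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]
predicted = [50.5, 51.0, 55.0, 56.0, 57.0, 59.5]
mae_pct, max_pct = validation_errors(actual, predicted)  # -> (6.67, 10.0)
```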

Executing Step 2

// Execute Step 2 from the terminal
python ./step2.py

By running Step 2, two (2) sequential functions will execute. The first function trains the model, and the second validates it. Using the default settings supplied with the code example, the terminal output will look similar to the following:

// Example Output of Step2 Execution for Training
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru (GRU)                    (None, 3, 32)             3552      
_________________________________________________________________
activation (Activation)      (None, 3, 32)             0
_________________________________________________________________
gru_1 (GRU)                  (None, 32)                6240
_________________________________________________________________
activation_1 (Activation)    (None, 32)                0
_________________________________________________________________
dense (Dense)                (None, 1)                 33
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0
=================================================================
Total params: 9,825
Trainable params: 9,825
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/20
2023-06-20 14:30:43.306501: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
160/160 [==============================] - 11s 47ms/step - loss: 0.3547 - val_loss: 0.2131
Epoch 2/20
160/160 [==============================] - 7s 47ms/step - loss: 0.1966 - val_loss: 0.1882
Epoch 3/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1875 - val_loss: 0.1843
Epoch 4/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1851 - val_loss: 0.1832
Epoch 5/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1842 - val_loss: 0.1827
Epoch 6/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1842 - val_loss: 0.1823
Epoch 7/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1836 - val_loss: 0.1823
Epoch 8/20
160/160 [==============================] - 7s 46ms/step - loss: 0.1836 - val_loss: 0.1819
Epoch 9/20
160/160 [==============================] - 7s 46ms/step - loss: 0.1831 - val_loss: 0.1819
Epoch 10/20
160/160 [==============================] - 7s 47ms/step - loss: 0.1832 - val_loss: 0.1816
Epoch 11/20
160/160 [==============================] - 7s 46ms/step - loss: 0.1830 - val_loss: 0.1816
Epoch 12/20
160/160 [==============================] - 7s 47ms/step - loss: 0.1828 - val_loss: 0.1831
Epoch 13/20
160/160 [==============================] - 7s 47ms/step - loss: 0.1827 - val_loss: 0.1814
Epoch 14/20
160/160 [==============================] - 8s 48ms/step - loss: 0.1825 - val_loss: 0.1812
Epoch 15/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1822 - val_loss: 0.1809
Epoch 16/20
160/160 [==============================] - 8s 48ms/step - loss: 0.1824 - val_loss: 0.1808
Epoch 17/20
160/160 [==============================] - 8s 48ms/step - loss: 0.1825 - val_loss: 0.1807
Epoch 18/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1820 - val_loss: 0.1805
Epoch 19/20
160/160 [==============================] - 7s 47ms/step - loss: 0.1820 - val_loss: 0.1806
Epoch 20/20
160/160 [==============================] - 8s 47ms/step - loss: 0.1818 - val_loss: 0.1802
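As a sanity check, the parameter counts in the summary above can be reproduced by hand. The formulas below assume a standard Keras GRU with reset_after=False, the four independent variable inputs from the default settings, and 32-unit layers:

```python
def gru_params(input_dim, units):
    # Keras GRU (assuming reset_after=False): three gates, each with an
    # input kernel, a recurrent kernel, and a bias vector.
    return 3 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # Fully connected layer: weight matrix plus bias vector.
    return input_dim * units + units

gru1  = gru_params(4, 32)    # -> 3552, matches "gru (GRU)"
gru2  = gru_params(32, 32)   # -> 6240, matches "gru_1 (GRU)"
out   = dense_params(32, 1)  # -> 33,   matches "dense (Dense)"
total = gru1 + gru2 + out    # -> 9825, matches "Total params"
```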

At the end of the epoch series (the default settings run through 20 epochs/episodes), the validation will execute automatically, and the terminal output will continue similar to the following:

// Example of terminal output for Step 2 Execution of Validation
Model Saved LIC101.AiPV/
Validating Environment...
INFO: Created TensorFlow Lite delegate for select TF ops.
2023-06-20 14:33:20.588580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
INFO: TfLiteFlexDelegate delegate: 6 nodes delegated out of 24 nodes with 4 partitions.

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Environment Mean Absolute Error % 11.26
Environment Max Error % 25.1
2023-06-20 14:33:20.998669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Environment Mean Absolute Error % 12.37
Environment Max Error % 46.32
2023-06-20 14:33:21.203933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Environment Mean Absolute Error % 24.12
Environment Max Error % 47.07
2023-06-20 14:33:21.406470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Environment Mean Absolute Error % 16.9
Environment Max Error % 38.34
2023-06-20 14:33:21.607501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Environment Mean Absolute Error % 12.91
Environment Max Error % 29.52
2023-06-20 14:33:21.806206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1661 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3050 Ti Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Environment Mean Absolute Error % 5.7
Environment Max Error % 12.68

The model and validation jpg(s) are saved to the main directory of the project. The digital twin model folder, LIC101.AiPV, will hold the .h5 and .tflite model files, as well as the proof-of-validation charts saved as jpgs.

Bump Test

A bump test of the digital twin will also execute automatically after the digital twin validations. In a bump test, the model is put through a series of step changes to see how the system responds to the agent's output. In this example, and in Step 3, we will train a DQN agent to control the level via the LIC101.MV control valve output. Prior to initiating Step 3, it is necessary to confirm that the model actually responds to the agent's handle, in this case LIC101.MV.

In the figure above, it can be seen that the digital twin bump test does show the model (AiPV) responding to the agent's handle, LIC101.MV (AiMV).
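A bump test amounts to driving the handle, here LIC101.MV, through a series of step changes and recording the model's response. The helper below is a hypothetical illustration of building such a step-change input sequence, not the repo's code:

```python
def bump_sequence(levels, hold):
    """Build a step-change (bump) input: hold each MV level for
    `hold` consecutive samples before stepping to the next level.

    Hypothetical helper for illustration; the repo's bump test may
    choose levels and hold times differently.
    """
    seq = []
    for level in levels:
        seq.extend([level] * hold)
    return seq

# Step the MV handle from 20% to 60% to 40% output, holding each
# level for 4 samples, then feed the sequence to the digital twin.
mv = bump_sequence([20.0, 60.0, 40.0], hold=4)
```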