An AI agent selects the best action for each batch of mangoes based on the estimated shelf life or on the first imported mango. The program begins by defining the Deep Q-learning parameters and then defines three classes and one function. The classes define the environment, the neural network, and the Deep Q-learning agent, respectively, while the function runs the main program. The Deep Q-learning parameters are: epsilon 0.9, gamma 0.9, a learning rate of 0.01 for the Adam optimizer, a memory capacity of 3,000, 100 Q-network iterations, a batch size of 32, and 1,000 episodes. The environment class draws a random integer between 0 and 1,200 for the shelf-life state, builds an array of length 480 for the future shelf-life projection, and creates a storage state running from 1 to 480 that is used to determine the reward. After 480 steps, the environment is reset to its original parameters and returns an array composed of the shelf-life state, the future projection, and the storage state.
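The parameter list and the layout of the environment state described above can be illustrated with a minimal Python sketch. It is not the authors' program. The names `DQNConfig`, `MangoStorageEnv`, `HORIZON`, and `MAX_SHELF_LIFE` are hypothetical, and the linear shelf-life decay and the +1/-1 reward are placeholder assumptions, since the text gives neither the projection model nor the reward rule. The neural network and the learning loop are left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Hyperparameters as listed in the text (field names are hypothetical)."""
    epsilon: float = 0.9             # text does not say whether this is the greedy or the exploration probability
    gamma: float = 0.9               # discount factor
    learning_rate: float = 0.01      # learning rate passed to the Adam optimizer
    memory_capacity: int = 3000      # replay memory size
    q_network_iterations: int = 100  # assumed here to be the target-network sync interval
    batch_size: int = 32
    episodes: int = 1000


HORIZON = 480          # length of the projection and number of steps per episode
MAX_SHELF_LIFE = 1200  # upper bound of the random initial shelf life


class MangoStorageEnv:
    """Toy environment mirroring the state layout described in the text."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Random integer shelf life between 0 and 1,200 (both inclusive).
        self.shelf_life = self.rng.randint(0, MAX_SHELF_LIFE)
        # ASSUMPTION: shelf life falls by one unit per step and is floored at 0.
        self.projection = [max(self.shelf_life - t, 0) for t in range(1, HORIZON + 1)]
        self.storage_state = 1  # runs from 1 to 480
        return self.observation()

    def observation(self):
        # Shelf-life state, then the 480-value projection, then the storage state.
        return [self.shelf_life, *self.projection, self.storage_state]

    def step(self, action):
        # ASSUMPTION: placeholder reward; the action does not affect the transition here.
        remaining = self.projection[self.storage_state - 1]
        reward = 1.0 if remaining > 0 else -1.0
        done = self.storage_state == HORIZON  # True on the 480th step
        if not done:
            self.storage_state += 1
        return self.observation(), reward, done
```

Under these assumptions, `MangoStorageEnv(seed=0).reset()` returns a list of 482 values (1 + 480 + 1), and an episode ends on the 480th call to `step`, after which the caller is expected to call `reset()` again.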