Optuna Tuning Results PPO Reinforcement Learning Hyperparameters Performance

Published: 16 October 2024 | Version 1 | DOI: 10.17632/sjp82gkxgz.1
Contributors:
Abdelkader Messlem

Description

Systematic hyperparameter tuning with Optuna was expected to improve the performance of a PPO model in a multi-microgrid environment. We hypothesized that optimizing hyperparameters such as the learning rate and network architecture would enhance model performance, reflected in a higher mean reward and greater training stability.

Data Overview:
Dataset: Results from hyperparameter tuning of a PPO model in a multi-microgrid environment.
Contents: Hyperparameter settings and performance metrics.

Data Collection Process:
Sampling: Hyperparameters were sampled by Optuna and evaluated by training the PPO model for 500,000 timesteps.
Visualization: Use the command tensorboard --logdir=./Logs/PPO_1 to view the training logs in TensorBoard.
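For a quick look at the tuning results without TensorBoard, the CSV file can be loaded with pandas. The sketch below is illustrative only: the file name and the "value" column (Optuna's default objective column when trials are exported with trials_dataframe) are assumptions about how the results were saved, not guaranteed to match the dataset.

    import pandas as pd

    # Load the exported tuning results (file name is an illustrative assumption).
    trials = pd.read_csv("optuna_ppo_trials.csv")

    # Rank trials by the objective value ("value" is Optuna's default column
    # name when a study is exported via study.trials_dataframe()).
    print(trials.sort_values("value", ascending=False).head())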

Files

Steps to reproduce

Objective: The primary goal was to optimize the hyperparameters of a Proximal Policy Optimization (PPO) model for reinforcement learning within a multi-microgrid environment. The aim was to identify optimal settings that enhance the model's performance in controlling battery energy storage systems across multiple microgrids.

Experimental Design:
Optimization Tool: Optuna
Algorithm: Proximal Policy Optimization (PPO)
Environment: Custom multi-microgrid simulation

Setup and Tools:
Optuna: hyperparameter tuning
Stable-Baselines3: PPO implementation
Python: scripting and data management
Custom Environment: simulates battery energy storage systems in multiple microgrids

Protocols and Methods:
Hyperparameter Sampling: Optuna's TPESampler proposes candidate settings.
Training: the PPO model is trained for 500,000 timesteps per trial.
Evaluation: the mean reward is recorded, and trials are pruned based on early performance indicators.
Data Storage: results are written to a CSV file for analysis and logged to TensorBoard for visualization.
Use the command tensorboard --logdir=./Logs/PPO_1 to visualize the training logs with TensorBoard. A minimal tuning sketch is given below.
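The workflow above can be reproduced with a short Optuna study wrapped around Stable-Baselines3's PPO. The following is a minimal sketch, not the authors' script: the environment id (Pendulum-v1 stands in for the custom multi-microgrid simulation), the search space, the chunked training used for pruning, the number of trials, and the output file name are all assumptions made for illustration.

    import gymnasium as gym
    import optuna
    from optuna.pruners import MedianPruner
    from optuna.samplers import TPESampler
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Stand-in environment; the study used a custom multi-microgrid simulation.
    ENV_ID = "Pendulum-v1"
    TOTAL_TIMESTEPS = 500_000


    def objective(trial: optuna.Trial) -> float:
        # Example search space; the actual ranges used in the study may differ.
        learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
        gamma = trial.suggest_float("gamma", 0.95, 0.9999)
        n_steps = trial.suggest_categorical("n_steps", [512, 1024, 2048])
        net_width = trial.suggest_categorical("net_width", [64, 128, 256])

        env = gym.make(ENV_ID)
        model = PPO(
            "MlpPolicy",
            env,
            learning_rate=learning_rate,
            gamma=gamma,
            n_steps=n_steps,
            policy_kwargs={"net_arch": [net_width, net_width]},
            tensorboard_log="./Logs",
            verbose=0,
        )

        # Train in chunks so early performance can be reported for pruning.
        mean_reward = float("-inf")
        for step in range(5):
            model.learn(total_timesteps=TOTAL_TIMESTEPS // 5, reset_num_timesteps=False)
            mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=5)
            trial.report(mean_reward, step)
            if trial.should_prune():
                raise optuna.TrialPruned()

        # The objective value is the mean reward after the full training budget.
        return mean_reward


    if __name__ == "__main__":
        study = optuna.create_study(
            direction="maximize",
            sampler=TPESampler(),
            pruner=MedianPruner(),
        )
        study.optimize(objective, n_trials=50)
        # Store all trials as a CSV file for later analysis.
        study.trials_dataframe().to_csv("optuna_ppo_trials.csv", index=False)

The TensorBoard logs written under ./Logs can then be inspected with the command given above.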

Institutions

Universite Ibn Khaldoun Tiaret

Categories

Machine Learning, Multi-Objective Optimization

Licence