Maximizing Throughput of Aerial Base Stations via Resources-based Multi-Agent Proximal Policy Optimization: A Deep Reinforcement Learning Approach

Yu Min Park; Sheikh Salman Hassan; Choong Seon Hong


Asia-Pacific Network Operations and Management Symposium

2022

Session Number: PS2

Number: PS2-07


Publication Date: 2022/09/28

Online ISSN: 2188-5079

DOI: 10.34385/proc.70.PS2-07


Summary:

Fifth-generation (5G) networks use millimeter-wave (mmWave) technology to deliver high-speed, high-capacity data services. However, mmWave links suffer losses due to inherent limitations such as poor penetration, rain attenuation, and short coverage range. Consequently, many base stations (BSs) are needed to support stable wireless communications and to cover long distances in rural and suburban areas. A new wireless communication platform that provides communication services from the air is therefore required. Such an aerial platform enables line-of-sight (LoS) rather than non-LoS (NLoS) communications, which helps overcome ground-level losses. Thus, an unmanned aerial vehicle (UAV) or unmanned aerial platform (UAP) that can be rapidly and dynamically deployed at a point of interest is considered. Despite these benefits, UAV-BSs (also known as aerial BSs) still pose optimization problems, namely resource allocation and trajectory optimization. This study therefore considers resource-based multi-agent deep reinforcement learning (MADRL) to solve the resource allocation and trajectory optimization problems of UAV-BSs jointly. Because the formulated optimization problem is non-convex, we propose an algorithm based on multi-agent proximal policy optimization (MAPPO) DRL. The proposed algorithm treats each agent as a resource variable to perform the optimization more effectively. As a result, the proposed algorithm achieves faster convergence and higher rewards than the baselines.
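To illustrate why an aerial BS benefits from LoS links, the sketch below computes the downlink throughput of one ground user using the widely used air-to-ground model of Al-Hourani et al., where the LoS probability grows with the elevation angle. This is not necessarily the channel model used in the paper. The urban constants (a = 9.61, b = 0.16, excess losses of 1 dB for LoS and 20 dB for NLoS), the 28 GHz carrier, the 100 MHz bandwidth, the 30 dBm transmit power, and the 7 dB noise figure are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math

C = 3e8  # speed of light (m/s)


def los_probability(h, r, a=9.61, b=0.16):
    """LoS probability for a UAV at altitude h (m) and horizontal distance r (m).
    a and b are assumed urban-environment constants."""
    theta_deg = math.degrees(math.atan2(h, r))  # elevation angle in degrees
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def mean_path_loss_db(h, r, f_hz, eta_los_db=1.0, eta_nlos_db=20.0):
    """Free-space path loss plus the LoS/NLoS excess loss, weighted by P(LoS)."""
    d = math.hypot(h, r)  # 3D UAV-to-user distance (m); assumes h > 0
    fspl_db = 20 * math.log10(4 * math.pi * f_hz * d / C)
    p_los = los_probability(h, r)
    return fspl_db + p_los * eta_los_db + (1 - p_los) * eta_nlos_db


def throughput_bps(h, r, f_hz=28e9, bandwidth_hz=100e6,
                   p_tx_dbm=30.0, noise_figure_db=7.0):
    """Shannon rate B * log2(1 + SNR) for one user (all defaults assumed)."""
    # Thermal noise: -174 dBm/Hz integrated over the bandwidth, plus noise figure.
    noise_dbm = -174 + 10 * math.log10(bandwidth_hz) + noise_figure_db
    snr_db = p_tx_dbm - mean_path_loss_db(h, r, f_hz) - noise_dbm
    return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))


# A user directly below a 100 m-high UAV versus one 500 m away horizontally.
near = throughput_bps(100, 0)
far = throughput_bps(100, 500)
print(f"near: {near / 1e6:.1f} Mbps, far: {far / 1e6:.1f} Mbps")
```

Under these assumptions the user beneath the UAV is almost surely in LoS and sees a much higher rate than the distant user. This dependence of throughput on the UAV position is what makes trajectory optimization matter.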
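MAPPO builds on the clipped surrogate objective of PPO. The sketch below shows that standard objective and one simple way to average it over several agents, as is common in MAPPO with parameter sharing. It assumes the advantages have already been computed (for example, from a centralized critic) and that each agent is keyed by a hypothetical resource variable name such as `"trajectory"` or `"power"`, loosely mirroring the idea of one agent per resource variable. It is not the authors' implementation, and the function names and the clipping value of 0.2 are assumptions.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize.
    Inputs are per-sample log-probabilities under the new and old policies
    and the corresponding advantage estimates."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # negate: maximizing objective = minimizing loss


def mappo_actor_loss(per_agent_batches, clip_eps=0.2):
    """Average the clipped loss over agents.
    per_agent_batches maps a (hypothetical) agent name to a tuple
    (new_logp, old_logp, advantages)."""
    losses = [ppo_clip_loss(*batch, clip_eps=clip_eps)
              for batch in per_agent_batches.values()]
    return sum(losses) / len(losses)


# Hypothetical agents, one per resource variable, with toy batch data.
batches = {
    "trajectory": ([-0.9, -1.2], [-1.0, -1.0], [0.5, -0.3]),
    "power": ([-0.4, -0.7], [-0.5, -0.6], [1.0, 0.2]),
}
print(mappo_actor_loss(batches))
```

Clipping the probability ratio keeps each agent's policy update close to the policy that collected the data, which is one reason PPO-based methods tend to train stably. Averaging the agents' losses is only one design choice; separate per-agent optimizers are equally possible.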