Authors - Jawad Mahmood, Muhammad Adil Raja, John Loane, Fergal McCaffery

Abstract - This research focuses on developing a testbed for Artificial Intelligence (AI) enabled Unmanned Aerial Vehicles (UAVs), with particular attention to the challenges posed by delayed rewards in Reinforcement Learning (RL). The multi-UAV testbed integrates a realistic flight simulator with a Flight Dynamics Model (FDM), creating a versatile environment for testing and training RL algorithms. Two primary models were implemented: an Advantage Actor-Critic (A2C) model controls the target UAV, while an Asynchronous Advantage Actor-Critic (A3C) model governs the tracking UAV, leveraging asynchronous updates for efficient exploration and faster learning. A significant obstacle in reinforcement learning is delayed rewards, where feedback on an agent's actions is not immediately available, potentially leading to unstable learning and reduced performance. This research addresses the challenge by integrating an Intrinsic Curiosity Module (ICM) with the A3C model. The ICM generates intrinsic rewards that encourage the A3C-controlled tracking UAV to explore new states even in the absence of external rewards. This approach mitigates the effects of delayed rewards, allowing the tracking UAV to maintain effective pursuit of the target UAV under dynamic conditions.
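
To illustrate how an ICM-style intrinsic reward can supplement a delayed extrinsic tracking reward, the sketch below shows a minimal curiosity module in PyTorch. It is only an illustrative sketch, not the paper's implementation: the network sizes, feature dimension, scaling factor `eta`, and the placeholder UAV observation/action dimensions are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class ICM(nn.Module):
    """Minimal Intrinsic Curiosity Module sketch (after Pathak et al., 2017).

    The intrinsic reward is the forward-model prediction error: how poorly the
    agent predicts the next state's features from the current features and the
    action taken. Novel states are harder to predict, so they yield larger
    intrinsic rewards and encourage exploration even while the extrinsic
    tracking reward is delayed.
    """

    def __init__(self, obs_dim, act_dim, feat_dim=64, eta=0.5):
        super().__init__()
        self.eta = eta
        # Feature encoder phi(s): maps raw observations to a compact embedding.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        # Forward model: predicts phi(s_{t+1}) from phi(s_t) and a_t.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        # Inverse model: predicts a_t from phi(s_t) and phi(s_{t+1}); its loss
        # keeps the encoder focused on controllable aspects of the state.
        self.inverse_model = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def intrinsic_reward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        phi_next_pred = self.forward_model(torch.cat([phi, action], dim=-1))
        # r_intrinsic = eta/2 * || phi_hat(s_{t+1}) - phi(s_{t+1}) ||^2
        return 0.5 * self.eta * (phi_next_pred - phi_next.detach()).pow(2).sum(dim=-1)


# Usage sketch: augment the (often zero, delayed) extrinsic tracking reward
# with the curiosity bonus before it is fed to the A3C learner.
if __name__ == "__main__":
    icm = ICM(obs_dim=12, act_dim=4)      # placeholder UAV state/action sizes
    obs, next_obs = torch.randn(1, 12), torch.randn(1, 12)
    action = torch.randn(1, 4)            # e.g. aileron/elevator/rudder/throttle
    r_extrinsic = torch.tensor([0.0])     # delayed: frequently zero mid-episode
    r_total = r_extrinsic + icm.intrinsic_reward(obs, next_obs, action)
    print(r_total.item())
```

In this formulation the tracking agent's training signal is the sum of the extrinsic and intrinsic rewards, so the policy still receives a dense learning signal during the stretches where the external tracking reward has not yet arrived.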