Authors - Adarsh Varshney, Karthick Seshadri, Viswa Kiran Andraju

Abstract - The Knowledge-Infused Policy Gradient with Upper Confidence Bound (KIPGUCB) strategy addresses contextual multi-armed bandit problems by balancing exploration and exploitation. This study evaluates the performance of a KIPGUCB-based agent in the partially observable environment of StarCraft II. Unlike traditional deep reinforcement learning models that rely on low-level atomic actions, our approach enhances decision-making by employing higher-level tactical strategies. A tactic manager dynamically selects optimal tactics based on the game state and reward signals, improving resource management and structured tasks such as unit training. The agent's performance is compared with that of a StarCraft II Grandmaster, a novice human player, and DeepMind's baseline RL agent across five mini-games. Experimental results show that the KIPGUCB-based agent outperforms the baseline model in resource-focused and structured tasks but struggles in combat-oriented scenarios that require adaptive responses.
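
To make the exploration-exploitation balance concrete, the following is a minimal sketch of a classic UCB1-style tactic selector of the kind the tactic manager could build on. It is illustrative only: KIPGUCB's actual knowledge-infused, context-conditioned policy-gradient update is more involved than this, and all class and method names here are hypothetical, not taken from the paper.

```python
import math


class UCBTacticSelector:
    """Illustrative UCB1-style selection over high-level tactics.

    A simplified, context-free sketch of the exploration/exploitation
    trade-off described in the abstract; it is not the paper's KIPGUCB
    algorithm, which additionally infuses domain knowledge and conditions
    on the game state.
    """

    def __init__(self, tactics, c=1.4):
        self.tactics = list(tactics)  # e.g. ["train_units", "gather_minerals"]
        self.c = c                    # exploration coefficient
        self.counts = {t: 0 for t in self.tactics}   # times each tactic was tried
        self.values = {t: 0.0 for t in self.tactics} # running mean reward
        self.total = 0                # total selection steps

    def select(self):
        self.total += 1
        # Try each tactic once before applying the UCB score.
        for t in self.tactics:
            if self.counts[t] == 0:
                return t
        # UCB1: mean reward plus a confidence bonus that favors
        # rarely tried tactics, shrinking as evidence accumulates.
        return max(
            self.tactics,
            key=lambda t: self.values[t]
            + self.c * math.sqrt(math.log(self.total) / self.counts[t]),
        )

    def update(self, tactic, reward):
        # Incremental running-mean update of the tactic's estimated value.
        self.counts[tactic] += 1
        n = self.counts[tactic]
        self.values[tactic] += (reward - self.values[tactic]) / n
```

In this simplified form, the confidence bonus sqrt(ln t / N_a) is what drives exploration: a tactic with few trials keeps a large bonus and is revisited until its reward estimate is trustworthy, which mirrors how a UCB-guided tactic manager would avoid committing prematurely to one strategy.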