Authors - Ezhilmathi Krishnasamy, Pascal Bouvry

Abstract - The ongoing development of architectures and programming models, particularly for GPUs, significantly influences the landscape of high-performance computing. Nearly all supercomputers worldwide are now equipped with GPU compute nodes, driving architectural advances that emphasize heterogeneity. In parallel, programming models continue to evolve to harness the potential of this diverse hardware for scientific applications. Notable architectures reflecting ongoing trends, such as the Nvidia Grace Hopper and the AMD MI300 series, alongside programming paradigms like SYCL and OpenCL and library-based models such as Kokkos and StarPU, illustrate this development. The primary motivation is to identify an appropriate programming model for modern architectures, which is essential for maximizing computational efficiency in specific applications. A multitude of programming models exist, each offering the potential to exploit advanced architectural capabilities. However, a critical question arises regarding the ease of adoption and user-friendliness of these models for existing large-scale scientific codes. This study focuses on OpenMP Offloading, examining its use on Nvidia GPUs to maintain a single source code compatible with various GPU architectures. Determining whether OpenMP Offloading can serve this role, and how to use it effectively, is of paramount importance. This paper presents a comparative analysis of the OpenMP Offloading programming model against CUDA on Nvidia GPUs, enabling a comprehensive performance evaluation. The analysis employs key BLAS operations to assess the performance characteristics of OpenMP Offloading relative to CUDA, thereby elucidating the advantages and limitations of leveraging OpenMP Offloading.
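
To make the comparison concrete, the following is a minimal sketch (not taken from the paper) of the kind of OpenMP Offloading kernel that can be contrasted with a hand-written CUDA equivalent: a DAXPY-style BLAS level-1 operation offloaded to a GPU via target directives. The function and variable names are illustrative assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* DAXPY: y = a*x + y, offloaded to the GPU with OpenMP target directives.
   The same source also compiles and runs on the host if no device is present. */
void daxpy_offload(int n, double a, const double *x, double *y)
{
    /* Copy x to the device, copy y to and from the device, and distribute
       the loop iterations across GPU teams and threads. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    double *x = malloc(n * sizeof(double));
    double *y = malloc(n * sizeof(double));
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    daxpy_offload(n, 3.0, x, y);

    printf("y[0] = %f (expected 5.0)\n", y[0]);
    free(x);
    free(y);
    return 0;
}
```

On Nvidia GPUs such a kernel can typically be built with an offloading-capable compiler, for example `nvc -mp=gpu` or `clang -fopenmp -fopenmp-targets=nvptx64` (exact flags depend on the toolchain); the CUDA counterpart would express the same loop as an explicit `__global__` kernel with a manually chosen grid and block configuration.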