I recently published an implementation of the forward pass of a memory-efficient attention algorithm, FlashAttention-2 (Dao, 2023), using Vulkan compute and the VK_KHR_cooperative_matrix extension, which exposes Tensor Cores (or equivalent hardware) to accelerate matrix-matrix multiplications. In this post I will go into the details.

Background

The goal of this project is to…
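To make the later discussion concrete, here is a minimal sketch of what the cooperative-matrix API looks like on the GLSL side. This is illustrative only, not the post's actual kernel: the buffer names, binding layout, the 16x16x16 tile shape, and the assumption of one 32-wide subgroup per workgroup are all choices made for this example.

```glsl
#version 450
#extension GL_KHR_cooperative_matrix : require
#extension GL_KHR_memory_scope_semantics : require
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require
#extension GL_EXT_shader_16bit_storage : require

// One subgroup cooperatively computes a single 16x16 tile: D = A * B + 0.
// Subgroup size of 32 is an assumption (it varies by vendor).
layout(local_size_x = 32, local_size_y = 1, local_size_z = 1) in;

// Hypothetical bindings for this sketch.
layout(set = 0, binding = 0) readonly buffer BufA { float16_t a[]; };
layout(set = 0, binding = 1) readonly buffer BufB { float16_t b[]; };
layout(set = 0, binding = 2) buffer BufD { float d[]; };

void main() {
    // fp16 inputs, fp32 accumulator; uses A/B/Accumulator matrix roles.
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseA> matA;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseB> matB;
    coopmat<float, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator> acc =
        coopmat<float, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator>(0.0);

    // Loads and stores are collective across the subgroup; the stride is
    // given in array elements.
    coopMatLoad(matA, a, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    coopMatLoad(matB, b, 0, 16, gl_CooperativeMatrixLayoutRowMajor);

    // The multiply-accumulate that maps onto Tensor Cores or equivalent.
    acc = coopMatMulAdd(matA, matB, acc);

    coopMatStore(acc, d, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
}
```

In a FlashAttention-style kernel, tile multiplies of this shape are the building block for both the Q·Kᵀ score computation and the P·V output accumulation; everything else in the algorithm is about how those tiles are streamed through on-chip memory.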