ai
-
I recently uploaded an implementation of the forward pass of a memory-efficient attention algorithm (FlashAttention-2 (Dao, 2023)) using Vulkan compute and the VK_KHR_cooperative_matrix extension, in order to use Tensor Cores (or equivalent hardware) to accelerate matrix-matrix multiplications. In this post I will go into the details.

Background

The goal of this project is to…
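The cooperative matrix extension is what lets a compute shader hand tile-sized matrix multiplies to the matrix-multiply units. As a minimal sketch of what that looks like in GLSL (this is not the shader from the implementation; the 16x16 tile size, fp16 inputs with fp32 accumulation, bindings, and subgroup scope are illustrative assumptions):

```glsl
#version 450
#extension GL_KHR_cooperative_matrix : require
#extension GL_KHR_memory_scope_semantics : require
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

// One subgroup cooperatively multiplies a 16x16 fp16 tile of A by a
// 16x16 fp16 tile of B and accumulates into a 16x16 fp32 tile of C.
layout(local_size_x = 32) in; // assume a subgroup size of 32 for this sketch

layout(set = 0, binding = 0) readonly  buffer BufA { float16_t a[]; };
layout(set = 0, binding = 1) readonly  buffer BufB { float16_t b[]; };
layout(set = 0, binding = 2) writeonly buffer BufC { float     c[]; };

void main() {
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseA> tileA;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseB> tileB;
    coopmat<float, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator> acc =
        coopmat<float, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator>(0.0);

    // Row-major loads with a stride of 16 elements (assumes tightly packed tiles).
    coopMatLoad(tileA, a, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    coopMatLoad(tileB, b, 0, 16, gl_CooperativeMatrixLayoutRowMajor);

    // The hardware-accelerated matrix-multiply-accumulate: acc = A * B + acc.
    acc = coopMatMulAdd(tileA, tileB, acc);

    coopMatStore(acc, c, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
}
```

In an attention kernel this primitive is applied repeatedly to the Q·Kᵀ and P·V tile products, with the online-softmax bookkeeping done in regular shader code around it.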
-
In this post, the gradient of the attention op will be derived from a single rule used to implement reverse-mode automatic differentiation. The attention mechanism is the foundational building block of the transformer architecture, which underlies today's most successful language models. It was shown that it can replace recurrent blocks in neural…
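Assuming that single rule is the usual vector-Jacobian product, $\bar{x} = (\partial y / \partial x)^{\top} \bar{y}$ for $y = f(x)$, a rough sketch of where the derivation lands for scaled dot-product attention $O = \operatorname{softmax}(QK^{\top}/\sqrt{d})\,V$ (with row-wise softmax and $\bar{X}$ denoting the gradient of the loss with respect to $X$) is:

$$
\begin{aligned}
S &= \frac{QK^{\top}}{\sqrt{d}}, \qquad P = \operatorname{softmax}(S), \qquad O = PV,\\
\bar{V} &= P^{\top}\bar{O}, \qquad \bar{P} = \bar{O}V^{\top},\\
\bar{S} &= P \circ \bigl(\bar{P} - \operatorname{rowsum}(P \circ \bar{P})\bigr),\\
\bar{Q} &= \frac{\bar{S}K}{\sqrt{d}}, \qquad \bar{K} = \frac{\bar{S}^{\top}Q}{\sqrt{d}}.
\end{aligned}
$$

Here $\circ$ is elementwise multiplication and $\operatorname{rowsum}(\cdot)$ sums each row and broadcasts the result back across that row; each line is one application of the same pull-back rule to one primitive op (the matmuls and the row-wise softmax).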