In this post, the gradient of the attention op will be derived from a single rule used to implement reverse-mode automatic differentiation. The attention mechanism is the foundational building block of the transformer architecture, which underlies today's most successful language models; it was shown that attention can replace recurrent blocks in neural networks.
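
Before the derivation, here is a minimal sketch of what that gradient computation looks like when left to an autodiff framework; it can serve as a numerical reference for the hand-derived result. Everything in it is an assumption made for illustration, not the post's final derivation: a single-head, unmasked scaled dot-product attention `softmax(QKᵀ/√d)V`, small random inputs, and an all-ones upstream gradient.

```python
import jax
import jax.numpy as jnp

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q @ k.T / sqrt(d)) @ v."""
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)             # (n, n) attention logits
    weights = jax.nn.softmax(scores, axis=-1)  # row-wise softmax
    return weights @ v                         # (n, d) output

# Small random inputs (shapes chosen for illustration only).
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
n, d = 4, 8
q = jax.random.normal(kq, (n, d))
k = jax.random.normal(kk, (n, d))
v = jax.random.normal(kv, (n, d))

# Reverse-mode AD: jax.vjp returns the forward output together with a
# function mapping an output cotangent to input cotangents, i.e. the
# vector-Jacobian product (dL/dq, dL/dk, dL/dv).
out, vjp_fn = jax.vjp(attention, q, k, v)
g = jnp.ones_like(out)       # stand-in for the upstream gradient dL/dout
dq, dk, dv = vjp_fn(g)
print(dq.shape, dk.shape, dv.shape)  # each matches its input's shape
```

The same vector-Jacobian-product rule applied here by `jax.vjp` is the single rule the derivation below works out by hand for the attention op.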