Ervin Tasnadi’s blog

GPU programming & deep learning

Nanobenchmarking: cycle accurate benchmarking of CUDA kernels

Dec 3
Proton profile location

Jan 28
Memory efficient Scaled Dot Product Attention (SDPA) with Tensor Cores acceleration implemented in Vulkan

Jan 19
Gradient of the attention op

Oct 9
