Ervin Tasnadi’s blog
GPU programming & deep learning
Nanobenchmarking: cycle-accurate benchmarking of CUDA kernels
Dec 3
Proton profile location
Jan 28
Memory-efficient Scaled Dot Product Attention (SDPA) with Tensor Cores acceleration implemented in Vulkan
Jan 19
Gradient of the attention op
Oct 9