Aditya Bang's Portfolio Website

Blogs

GPU · Triton · Performance Optimization

GraphCUDA: Fusing Sparse-Dense and Dense-Dense Matrix Multiplication (Part 1)

Optimizing a fused sparse-dense and dense-dense matrix multiplication kernel in Triton.

2026-04-29 · 8 min read

GPU · CUDA · Performance Optimization

GraphCUDA: Fusing Sparse-Dense and Dense-Dense Matrix Multiplication (Part 2)

Continuing the fused SpMM-GEMM optimization series with lower-level CUDA implementation details.

2026-04-29 · Coming soon

GPU · CuTe · Performance Optimization

GraphCUDA: Fusing Sparse-Dense and Dense-Dense Matrix Multiplication (Part 3)

Continuing the fused SpMM-GEMM optimization series with CuTe and newer GPU architectures.

2026-04-29 · Coming soon