This paper presents efficient sparse matrix-vector multiplication (SpMV) kernels for throughput-oriented processors such as the GPU. The authors explore several sparse matrix formats (DIA, ELL, CSR, COO, and HYB) and evaluate their performance on a GeForce GTX 285 GPU. The kernels, implemented in CUDA, achieve high memory bandwidth utilization and strong throughput: double-precision performance reaches 16 GFLOP/s on structured-grid matrices and 10 GFLOP/s on unstructured meshes, significantly exceeding prior results on Cell BE and quad-core Intel systems.

The study highlights the importance of fine-grained parallelism and memory access patterns in achieving high performance on throughput-oriented architectures, and it discusses the trade-offs among the formats and their suitability for different sparsity patterns. The results show that the HYB format achieves the highest performance on unstructured matrices, while the DIA format performs best on structured matrices. The authors conclude that the proposed kernels are efficient and effective across a wide range of sparse matrix applications.
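
To make the memory-access point concrete, below is a minimal CUDA sketch contrasting two of the kernel styles the paper describes: a scalar CSR kernel (one thread per row, scattered loads) and an ELL kernel (column-major padded storage, coalesced loads). This is a sketch in the spirit of the paper's kernels, not its actual code; all function and parameter names here are illustrative.

```cuda
// Scalar CSR SpMV: one thread per row. row_ptr, col_idx, and vals hold
// the CSR structure; x is the dense input vector, y the output.
__global__ void csr_spmv_scalar(int num_rows,
                                const int *row_ptr, const int *col_idx,
                                const double *vals,
                                const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double dot = 0.0;
        // Each thread walks the nonzeros of its own row, so neighboring
        // threads read from scattered addresses and loads do not coalesce.
        // This is the weakness that motivates the paper's vector
        // (one-warp-per-row) CSR variant.
        for (int jj = row_ptr[row]; jj < row_ptr[row + 1]; ++jj)
            dot += vals[jj] * x[col_idx[jj]];
        y[row] = dot;
    }
}

// ELL SpMV: rows are padded to a fixed width and stored column-major,
// so at slot k thread i reads data[num_rows * k + i]. Consecutive
// threads touch consecutive addresses and loads coalesce.
__global__ void ell_spmv(int num_rows, int max_cols_per_row,
                         const int *indices, const double *data,
                         const double *x, double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        double dot = 0.0;
        for (int k = 0; k < max_cols_per_row; ++k) {
            int idx = num_rows * k + row;
            double val = data[idx];
            if (val != 0.0)  // padded slots hold zeros and are skipped
                dot += val * x[indices[idx]];
        }
        y[row] = dot;
    }
}
```

The contrast also suggests why HYB helps on unstructured matrices: it stores the typical per-row nonzeros in ELL (coalesced, regular) and spills the few exceptionally long rows into COO, avoiding the padding blow-up a pure ELL layout would suffer.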