Performance evaluation of unfolded sparse matrix-vector multiplication

Akgün, İbrahim Ümit

Date

2015

Abstract

Sparse matrix-vector multiplication (spMV) is a kernel operation in scientific computation. There exist problems where the same matrix is repeatedly multiplied by many different vectors. For such problems, specializing the spMV code to the matrix has the potential to produce significantly faster code. This, in fact, has been one of the motivating examples of program generation. Using program generation, spMV code can be fully unfolded to eliminate loop overheads and to enable high-impact optimizations. In this work we focus on specializing spMV by unfolding the code according to a given matrix. We provide an experimental evaluation of performance using 70 sparse matrices collected from real-world scientific computation domains. We present optimizations with which high-performance machine code can be generated rapidly, without having to generate source-level code and go through all the phases of a general-purpose compiler. Finally, we show how one of the optimizations we studied can be implemented as a code-transforming pass.
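As a rough illustration of what unfolding spMV for a given matrix means, the sketch below contrasts a generic kernel with a generated, loop-free one. It is not the thesis's implementation: the thesis emits machine code directly, whereas this sketch emits Python source for readability. The CSR layout, the function names `spmv_csr` and `generate_unfolded`, and the 3x3 example matrix are all assumptions made for illustration.

```python
def spmv_csr(vals, cols, rows, v):
    """Generic CSR kernel: w = A @ v, looping over rows and nonzeros."""
    w = [0.0] * (len(rows) - 1)
    for i in range(len(rows) - 1):
        acc = 0.0
        for k in range(rows[i], rows[i + 1]):
            acc += vals[k] * v[cols[k]]
        w[i] = acc
    return w


def generate_unfolded(vals, cols, rows, name="spmv_unfolded"):
    """Emit source for a loop-free kernel specialized to one matrix.

    Hypothetical generator: values and column indices are baked into
    the emitted code as constants, so no index arrays are read at run time.
    """
    lines = [f"def {name}(v, w):"]
    for i in range(len(rows) - 1):
        terms = [f"{vals[k]!r} * v[{cols[k]}]"
                 for k in range(rows[i], rows[i + 1])]
        lines.append(f"    w[{i}] = " + (" + ".join(terms) if terms else "0.0"))
    lines.append("    return w")
    return "\n".join(lines)


# Assumed example matrix in CSR form: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
cols = [0, 2, 1, 0, 2]
rows = [0, 2, 3, 5]

# Generate the specialized kernel once, then reuse it for many vectors.
source = generate_unfolded(vals, cols, rows)
namespace = {}
exec(source, namespace)
spmv_unfolded = namespace["spmv_unfolded"]

v = [1.0, 2.0, 3.0]
w_generic = spmv_csr(vals, cols, rows, v)
w_unfolded = spmv_unfolded(v, [0.0] * 3)
```

The generated kernel contains one straight-line statement per matrix row, so the loop counters and index-array loads of the generic version disappear. The trade-off is code size that grows with the number of nonzeros, which is why generation speed and follow-up optimizations matter.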

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/103616

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess