On the analysis and evaluation of sparse hybrid linear solvers

Farea, Afrah Najib Abdullah

dc.contributor.advisor	Çelebi, Mustafa Serdar
dc.contributor.author	Farea, Afrah Najib Abdullah
dc.date.accessioned	2020-12-07T10:00:45Z
dc.date.available	2020-12-07T10:00:45Z
dc.date.submitted	2018
dc.date.issued	2018-11-28
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/127979
dc.description.abstract	Birçok girdisinin sıfır olması durumunda bir matris seyrek olarak adlandırılır. Bu türmatrisler, sayısal simülasyonlar, Akışkanlar Dinamiği, Grafik Teorisi, OptimizasyonProblemleri, Sinyal İşleme, Finans, Endüstri, Doğrusal Programlama,Elektromanyetik, 2D / 3D ve diğer pek çok gerçek uygulama gibi birçok alanınayrıklaştırma problemlerinden kaynaklanır. Genelde, bellek depolama ve hesaplamaaçısından öğelerinin seyrekliginden yararlanabilirsek ona seyrek matris diyoruz.Büyük bellek ve hesaplama tasarruflarına yol açan farklı seyrek matris formatlarıönerilmiştir.Benzer şekilde, büyük sparse lineer sistemlerin çözümü, bu tür bilimsel uygulamalariçin bir çekirdek görevinde olduğundan giderek önem kazanmaktadır. Dahası, sonyıllarda modern mikroişlemci mimarisinde ve genel olarak donanımda büyük vekarmaşık gelişmeler yaşandı. Ayrıca, paralel hesaplama yöntemleri ve dilleri, gerçekdünyadaki problemleri çözmede, özellikle Mesaj Geçiş Arayüzü (MPI)geliştirilmesinde bir çeşit gelişme göstermiştir. Sonuç olarak birçok algoritma veyazılım paketi donanımda yeni gelişmelere kapı açtı. Örnek verilecek olunursa, bellekve hesaplama düğümlerinin hiyerarşisindeki gelişme, modern mimarinin küçükönbelleği barındıran ve floating point performansını arttıran yeni engellemealgoritmalarının geliştirilmesini sağlar.Genel olarak, doğrusal sistemleri matematiksel olarak çözmek için iki ana kategorivardır: doğrudan ve yinelemeli yöntemler. Doğrudan yöntemler yüksek doğruluktasonuçlar verebilir (10−30) ve ihtiyaç duydukları düşük hafıza tüketimi nedeniyle (sonyıllarda doğrudan yöntemler birkaç milyon odf denklemini çözebilir) problemler içindaha uygundur (örneğin, Büyük Sparse 3D problemlerini çözemezler). Bunun yanındaparalel boyutlandırmayı sınırlar. Öte yandan Preconditioned iterasyon yöntemleri dahasağlıklıdır, daha az bellek gerektirir ve paralelleştirilmesi daha kolaydır. Ancak, bunlarproblem bağımlıdırlar ve iyi preconditioning metodlarla daha hızlı yakınsanabilir. Butezdeki odak noktamız, büyük Sparse lineer sistemlerini çözmek için kullanılan paralelyazılım paketleridir. Spars lineer sistemlerin çözümü için çeşitli yöntemler ve teknikleraraştırdık ve modern bilgisayarların hiyerarşik yapısı daha esnek ve uygun olduklarıiçin açık kaynaklı hibrit yaklaşımlara yoğunlaştık.Hibrit kelimesinin farklı anlamları vardır. Programlamada hibrit, OpenMP ve MPIdillerini birleşimi anlamına gelmektedir çünkü paylaşımlı ve distrübe edilmiş hafızayıçalıştırabilirler. Donanımda ise hibrit, CPU ve GPU'nun birlikte çalışmasıdır. Hibritalgoritmada doğrudan ve iteratif yöntemleri birleştirmek anlamına gelir. Odaknoktamız olan algoritma hibriti; daha verimli bir yöntem oluşturmak için doğrudan veiteratif yöntemleri birleştirir. Bu nedenle, bu tez boyunca hibrit sözcüğündenbahsettiğimizde, doğrudan ve yinelemeli yöntemleri birlikte kastetmiş olacağız.Çoğu seyrek hibrit çözümleyicileri `Schur complement framework` yönteminikullanır. Buradaki amaç doğrudan ve iteratif metodları birleştirmektir. Bu çerçevede,orijinal matris 2 × 2 blok matrisine ayrılmıştır. Grafik teorisi algoritmaları matrisindüzenlenmesi ve bu blok yapısının elde edilmesine yarar. İlk blok büyük bir blok köşegen kare matrisidir. Her diagonal blok iç alt domain olarak adlandırılır ve direkmetodlar kullanılarak çarpanlarına ayrılırlar. İkinci ve üçüncü bloklar`interface`lerdir. Eğer matris simetrik ise bu interface bloklar birbirlerinintranspozesidir. Dördüncü blok Schur komplement matris olarak adlandırılan kare birmatristir. Bu matris asıl matristen daha küçüktür ve kullanmak için daha uygundur.Eğer asıl matris SPD matris ise Schur komplement matris de SPD matristir. Aynızamanda Schur matris kullanmak asıl matrisi kullanmaktan daha kolaydır. Matrisidaha önceden preconditioning etmek daha hızlı bir yakınsama için gereklidir.Çarpanlarına ayrılmış iç alt domainler Schur complement matris yaklaşımı veprecondition yapmak için kullanılır.Büyük seyrek lineer sistemlerini çözmek sekansiyel olarak çalıştırıldığında, algoritmikhibrit yaklaşımlarla bile günler alabilir. Schur tamamlayıcı çerçeve paralelprogramlama için daha fazla tercih edilir. Donanım ve süper bilgisayarlarıngelişmesiyle birlikte, bu sistemler birkaç saniye içinde verimli bir şekildeçözülebilir. Ancak, var olan algoritmalar ölçeklenebilirlik ve yük dengesi gibi paralelçevresel kısıtlamalara uyum sağlayabilmek için modifiye edilmelidir.PDSLin ve Maphys, mevcut en iyi hibrit çözücü tabanlı açık kod Schurcomplement'dir.Bu çözücüler üzerinde derinlemesine inceleme yaptık, bunları farklımatris türlerinde test ettik ve sonuçlarını son teknoloji Superlu-dist doğrudan çözücüsüile karşılaştırdık.Sonuç olarak, seyrek hibrit çözücüler daha esneklerdir çünkü farklı bileşenleresahiplerdir ve her bileşen mevcut probleme uyarlanabilecek şekilde diğerinin yerinegeçirilebilir. İki seviyeli paralellik kullanmak, hibrit çözücülerin verimli olmasını vebin adet işlemci sayısına kadar ölçeklenebilir olmasını sağlayabilir. Hibrit çözücüler,doğrudan veya iteratif yöntemlerden çok daha az miktarda bellek kullanılır.Büyük seyrek lineer sistemlerini çözmek sekansiyel olarak çalıştırıldığında, algoritmikhibrit yaklaşımlarla bile günler alabilir. Schur tamamlayıcı çerçeve paralelprogramlama için daha fazla tercih edilir. Donanım ve süper bilgisayarlarıngelişmesiyle birlikte, bu sistemler birkaç saniye içinde verimli bir şekildeçözülebilir. Ancak, var olan algoritmalar ölçeklenebilirlik ve yük dengesi gibi paralelçevresel kısıtlamalara uyum sağlayabilmek için modifiye edilmelidir.PDSLin ve Maphys, mevcut en iyi hibrit çözücü tabanlı açık kod Schurcomplement'dir.Bu çözücüler üzerinde derinlemesine inceleme yaptık, bunları farklımatris türlerinde test ettik ve sonuçlarını son teknoloji Superlu-dist doğrudan çözücüsüile karşılaştırdık.Sonuç olarak, seyrek hibrit çözücüler daha esneklerdir çünkü farklı bileşenleresahiplerdir ve her bileşen mevcut probleme uyarlanabilecek şekilde diğerinin yerinegeçirilebilir. İki seviyeli paralellik kullanmak, hibrit çözücülerin verimli olmasını vebin adet işlemci sayısına kadar ölçeklenebilir olmasını sağlayabilir. Hibrit çözücüler,doğrudan veya iteratif yöntemlerden çok daha az miktarda bellek kullanılırBüyük seyrek lineer sistemlerini çözmek sekansiyel olarak çalıştırıldığında, algoritmikhibrit yaklaşımlarla bile günler alabilir. Schur tamamlayıcı çerçeve paralelprogramlama için daha fazla tercih edilir. Donanım ve süper bilgisayarlarıngelişmesiyle birlikte, bu sistemler birkaç saniye içinde verimli bir şekildeçözülebilir. Ancak, var olan algoritmalar ölçeklenebilirlik ve yük dengesi gibi paralelçevresel kısıtlamalara uyum sağlayabilmek için modifiye edilmelidir.PDSLin ve Maphys, mevcut en iyi hibrit çözücü tabanlı açık kod Schurcomplement'dir.Bu çözücüler üzerinde derinlemesine inceleme yaptık, bunları farklımatris türlerinde test ettik ve sonuçlarını son teknoloji Superlu-dist doğrudan çözücüsüile karşılaştırdık.Sonuç olarak, seyrek hibrit çözücüler daha esneklerdir çünkü farklı bileşenleresahiplerdir ve her bileşen mevcut probleme uyarlanabilecek şekilde diğerinin yerinegeçirilebilir. İki seviyeli paralellik kullanmak, hibrit çözücülerin verimli olmasını vebin adet işlemci sayısına kadar ölçeklenebilir olmasını sağlayabilir. Hibrit çözücüler,doğrudan veya iteratif yöntemlerden çok daha az miktarda bellek kullanılır.
dc.description.abstract	A matrix is called sparse if many of its entries are zero. Such types of matrices aregenerated from discretization problems of many fields like numerical simulations,Fluid Dynamics, Graph theory, Optimization problems, Signal processing, Finance,industry, Linear Programming, Electromagnetics, 2D/3D and many other real-worldapplications. In general, we call a matrix sparse if we could exploit the sparsity of itselements in terms of memory storage and computation. Different sparse matrix formatsare proposed which lead to huge memory and computation savings.Similarly, solving large sparse linear systems becomes increasingly important as akernel task for such scientific applications. Furthermore, recent years witnessed hugeand complex development in modern microprocessor architecture and hardware ingeneral. Besides, the parallel computing methods and languages has shown a kindmaturation in solving real-world problems especially, the development of MessagePassing Interface (MPI). Consequently, many algorithms and software packages haveemerged to exploit the new developments in hardware and software alike. For instance,the development in the hierarchy of memory and computation nodes leads todeveloping new blocking algorithms which accommodate the small cache memory ofmodern architecture and increase the floating-point performance.In general, there are two broad categories for solving linear systems mathematically:direct and iterative methods. Direct methods can give high accurate results (10−30)and are more suitable for small problems (current direct methods can solve up to acouple of millions of equations) because of the memory consumption they require (forexample, they can not solve large sparse 3D problems which may generate hundredsof millions of unknowns). Besides, they have limited parallel scalability.Preconditioned iterative methods, on the other hand, are more robust, require lessmemory and easier to parallelize. However, they are problem dependent and canconverge faster with good preconditioning methods. These methods are the methodsof choice when an approximate solution of the problem is sought. Our focus in thisthesis is the parallel software packages used for solving large sparse linear systems.We investigated the various methods and techniques for solving sparse linear systemsand concentrate on the open source hybrid approaches as they are more flexible andwell-suited the hierarchal structure of modern computers.The word hybrid has different meanings. Hybrid in programming means combiningOpenMP and MPI so they work for shared and distributed memory. Hybrid inhardware means working on CPU and GPU. Hybrid in algorithms means combiningdirect and iterative methods. So, our focus is on the third meaning; combining directand iterative algorithms in order to get more efficient methods for solving sparsematrices. Therefore, throughout this dissertation, when we mention the word hybrid,we mean direct/ iterative methods.Hybrid solvers are the latest research in developing robust and scalable methods forsolving sparse linear systems. These methods combine direct and iterative approachesin certain ways, mostly using Schur complement framework, with the aim of gettingthe desired features of both direct and iterative methods especially the speed and lowmemory consumption of the iterative methods and the robustness of the directmethods.Most sparse hybrid solvers use the so called 'Schur complement framework' tocombine direct and iterative methods. In this framework, the original matrix is dividedinto 2×2 block matrices. This approach is also known as sub-structuring domaindecomposition method. Graph theory algorithms are responsible for ordering theglobal matrix and getting such block structure. The first block is a large block diagonalsquare matrix. Each diagonal block, called internal subdomain, is factorized usingdirect methods. The second and third block are the interfaces. If the matrix issymmetric, these interface blocks are transpose of each other. The fourth block is asquare matrix called Schur complement matrix. This matrix has many desired featureswell-suited for iterative methods. It has smaller size than the original matrix and it isbetter conditioned than the original matrix. If the original matrix is symmetric positivedefinite, this Schur matrix will inherit this property and thus Conjugate Gradient canbe used. Although Schur matrix is better conditioned than the original matrix,preconditioning the matrix before applying iterative method is necessary for fasterconvergence. The factorized internal subdomains are used to get an approximation ofthe Schur complement matrix and used as a preconditioner.Solving large sparse linear systems can take days even with algorithmic hybridapproaches if performed in sequential. Schur complement framework is moreappealing for parallel programming. With increasing development of hardware andsupercomputers, these systems can be solved efficiently within few seconds. However,existing algorithms should be modified to accommodate the parallel environmentconstrains such as scalability and load balancing.PDSLin and Maphys are the best existing public domain Schur-complement basedhybrid solvers. They are based on different preconditioning methods which is a crucialingredient in any iterative solver. PDSLin uses an approximation of the Schurcomplement as a precontitioner which gives a global view of the domain problem.Maphys uses an approximation of the local assembled Schur matrix which makes thesolver more scalable. In our experiments, we thoroughly examined these solvers, testthem on different matrix types and compare their results with the state-of-art Superludistdirect solver. We also investigated the effect of tuning preconditioning inputparameters on PDSLin and Maphys with increasing number of processors.Developed at Lawrence Berkeley National Laboratory (LBNL) by two distinguishingresearchers in parallel linear algebra Ichitaro Yamazaki and X. Sherry Li; PDSLinsolver is very robust and scales very well with increasing number of processors.However, it is very sensitive to the input parameters especially sparsifying toleranceswhich is not good. PDSLin is a powerful solver but has a lot of bugs and still needs alot of work. Our results show that Maphys solver is more stable with the input valuesthan PDSLin and gives better results. PDSLin results are unpredictable and sometimesfails with the slightest change of the input values. Maphys performance is better thanboth PDSLin and Superlu-dist in terms of time consumption and memory. Besides, thetwo-level parallelism of PDSLin is more robust than multithreading of Maphys. Theserial partitioning time of Maphys is significant in many cases of our experiments and thus it is better to change into a parallel graph partitioner in Maphys than using asequential partitioner. Sometimes the partitioning time is larger than summation of theother solution steps of Maphys and this is clearly shown in Audik and Freescale cases.Maphys solver also scales very well with increasing number of processors.Our conclusion is that sparse hybrid solvers are more flexible because they havedifferent components and each component can be substituted to accommodate theproblem at hand. The developers of the solvers we considered have already aware ofthis feature and this is clearly seen through the different methods they integrated withintheir solvers. For example, PDSLin uses either MUMPS or Superlu-dist as a directmethod which are different approches of factorizing matrices. Similarly, Maphys useseither MUMPS or PASTIX which also are also different.Using two level parallelism can make hybrid solvers work efficiently and scalable upto thousand number of processors. In this level, Processors are distributed into levelsand work concurrently and independently. This method alleviates the problem ofincreasing Schur complement size. However, load balancing is a challenging problemin this method. Hybrid solvers consume much less amount of memory than either puredirect or iterative methods.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	On the analysis and evaluation of sparse hybrid linear solvers
dc.title.alternative	Sparse hibrit doğrusal çözücülerinin analizi ve değerlendirilmesi
dc.type	masterThesis
dc.date.updated	2018-11-28
dc.contributor.department	Hesaplamalı Bilimler ve Mühendislik Anabilim Dalı
dc.subject.ytm	Linear algebra
dc.identifier.yokid	10200472
dc.publisher.institute	Bilişim Enstitüsü
dc.publisher.university	İSTANBUL TEKNİK ÜNİVERSİTESİ
dc.identifier.thesisid	520102
dc.description.pages	117
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10200472.pdf
Size:: 2.105Mb
Format:: PDF
Description:: File_10200472

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess