Lossless data compression with order-preserving pattern matching based context modeling
Abstract
With digital communication becoming far more widespread and the digitized data domain expanding, improvements to data compression techniques have grown in importance. In many fields, historical data is stored and analyzed statistically and used for forecasts and predictions about the future, so improvements in data compression ratios make a valuable contribution in this context, and the need for such improvements is likely to keep growing. Starting from this point, the Order-Preserving Pattern Matching method, currently used in fields such as stock analysis and music melody matching, was applied to dynamic compression algorithms in an attempt to obtain certain gains. First, how the dynamic compression algorithms used here work and how they are modeled was explained and detailed with examples. The Order-Preserving Pattern Matching method observes the matching of the ordering relations of the data rather than the matching of their values. Dynamic data compression is likewise performed with a context modeling approach. A context model is a probabilistic model that determines how many bits an input symbol will be coded with, using the context in which that symbol occurs. Which context model an incoming symbol will use can be determined by looking at a fixed number k of characters preceding it. In this approach, called k-order context modeling, increasing the number of models improves the compression ratio but raises resource usage considerably; the context width used affects both resource usage and the results. With the proposed method, instead of looking at the values of the k characters, the context value was determined from the ordering relation of those k characters. Applying the method produced significant improvements on some file patterns, while on others the results were similar to the standard approach but with savings in resource usage.
On some file patterns, however, no positive improvement was obtained. To obtain better results on these, additional extended approaches were tried, and some positive results were achieved; the new approaches yielded noteworthy improvements.

With digital communication becoming more common and the digitized data domain expanding, improvements in data compression techniques have become more important. In many areas, past data is stored and analyzed statistically and used for future estimates and predictions, so improvements to data compression rates make a good contribution in this context. As the amount of recorded digital data continues to increase exponentially, the need to improve data compression algorithms keeps growing. Against this background, the Order-Preserving Pattern Matching method was used to determine the context in the context modeling approach employed by dynamic compression algorithms. Which context model an input character will use is determined by looking at the k characters preceding it. Increasing k, that is, determining the context over a wider span of historical data, can improve the compression ratio, but resource usage also rises because the number of models grows exponentially. The aim of this study is to develop new approaches that achieve better compression with fewer resources. From this point of view, the Order-Preserving Pattern Matching method, used in different fields such as stock analysis and music melody matching, was applied to dynamic compression algorithms in an attempt to obtain certain gains. First, how dynamic compression algorithms are used and modeled is explained; then the context modeling approach is explained in detail.
Finally, the Order-Preserving Pattern Matching implementation is explained. Data compression is a reduction in the number of bits required to represent data. Compressing data can save storage capacity, speed up file transfers, and reduce storage hardware and network bandwidth costs. Compression is performed by a program that uses a formula or algorithm to determine how to shrink the data. For example, an algorithm may represent a bit sequence with a smaller sequence of 0s and 1s by using a dictionary to convert between them, or it may insert a reference or pointer to a string of 0s and 1s that the program has already seen. Lossless data compression can be done in two ways: static and dynamic. In static data compression, a data model is built by first passing over all of the data and recording symbol usage; all of the data is then coded according to this model. Decompression mirrors this: the data model used for coding is read first, and all of the data is then decoded according to it. In some areas, however, such as telecommunications, it is not possible to pass over all of the data in advance because it has not yet all arrived; these settings use dynamic data compression algorithms. In dynamic data compression, an initial data model is fixed and incoming data is encoded with it; the model is then updated with the information of the symbol just encoded. During decompression, the same initial data model is assumed: incoming data is decoded according to the model, and the model is then updated in the same way. The Order-Preserving Pattern Matching Based Context Modeling approach used in this study was applied to dynamic Huffman coding and adaptive arithmetic coding, which are dynamic lossless data compression methods. In this study, the Order-Preserving Pattern Matching method is applied to dynamic compression algorithms, and the results are analyzed over different file patterns. Additional improvements were made by increasing the number of contexts used.
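The encoder/decoder symmetry described above can be illustrated with a minimal sketch (not the thesis implementation): both sides start from the same agreed initial counts and update them after every symbol, so no model has to be transmitted with the data. The class name and the uniform initial counts are illustrative assumptions; a real coder would feed these probabilities to a Huffman or arithmetic coder, which is omitted here.

```python
from collections import Counter

class AdaptiveModel:
    """Toy dynamic (adaptive) model: encoder and decoder keep identical
    symbol statistics by applying the same update after every symbol."""

    def __init__(self):
        # One initial count per byte value, so every symbol is codable.
        self.counts = Counter({s: 1 for s in range(256)})
        self.total = 256

    def probability(self, symbol: int) -> float:
        return self.counts[symbol] / self.total

    def update(self, symbol: int) -> None:
        self.counts[symbol] += 1
        self.total += 1

encoder, decoder = AdaptiveModel(), AdaptiveModel()
for sym in b"abracadabra":
    p = encoder.probability(sym)  # encoder codes `sym` with about -log2(p) bits
    encoder.update(sym)           # ...then folds it into its model
    decoder.update(sym)           # decoder decodes, then updates identically
assert encoder.counts == decoder.counts  # the two models never diverge
```

Because the decoder always updates with the symbol it has just decoded, the two models stay in lockstep without any side channel.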
The results were analyzed in detail and several evaluations were made. How the method works, its development steps, how the compression algorithms are implemented, and the results are all explained in detail. The Order-Preserving Pattern Matching method observes the matching of the sort orders of the data instead of the matching of their values. Dynamic data compression is likewise performed with the context modeling approach. A context model is a probabilistic model that determines how many bits a symbol is encoded with, using the context in which the input symbol occurs. Context modeling uses previously seen characters to determine the encoding of the current character: which context model the incoming symbol will use can be determined by looking at a fixed number of characters preceding it. In this approach, called k-order context modeling, increasing the number of models improves the compression ratio but also increases resource usage significantly; the width of the context used affects both resource usage and results. With the proposed method, instead of looking at the values of the k preceding characters, the context value is determined from their sort relationship. Increasing the number of models in context modeling allows the input character to be represented with fewer bits; with a single model, every input uses the same statistics, so the bit length needed to represent each character may increase. Increasing the number of models, in turn, raises the problem of which context model a given input should use. Here, an approach that uses the previous k characters is employed to predict which context the input character will use.
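A k-order context model of the kind described above can be sketched as follows. This is an illustrative estimate of code length under an adaptive model with Laplace smoothing over a 256-symbol alphabet, not the thesis code; the function name and smoothing choice are assumptions for the example.

```python
from collections import defaultdict
import math

def korder_code_lengths(data: bytes, k: int = 2) -> float:
    """Total ideal code length (bits) of `data` under an adaptive k-order
    context model: each symbol is predicted from the values of the k
    preceding symbols, and per-context counts are updated after coding."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    total_bits = 0.0
    for i, sym in enumerate(data):
        ctx = bytes(data[max(0, i - k):i])  # value-based context: previous k symbols
        seen = counts[ctx]
        p = (seen[sym] + 1) / (sum(seen.values()) + 256)  # smoothed estimate
        total_bits += -math.log2(p)  # ideal code length for this symbol
        seen[sym] += 1
    return total_bits

sample = b"ab" * 200
# On this repetitive input, predicting each symbol from its two
# predecessors costs fewer total bits than the order-0 model.
print(korder_code_lengths(sample, k=2), korder_code_lengths(sample, k=0))
```

Note the resource cost the abstract refers to: the `counts` dictionary can grow toward one entry per distinct k-character context, which is why widening k quickly inflates memory usage.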
The Order-Preserving Pattern Matching Based Context Modeling approach tries to determine the context to be used from pattern similarity, taking the ordering of the k data values preceding the symbol to be encoded rather than their contents. Thus, although a wider window is used, the number of contexts is reduced, and so is resource usage. The method was applied to dynamic Huffman coding and adaptive arithmetic coding, and the results were compared with those of the standard adaptive algorithms. Then, to obtain further improvements, another parameter was added to increase the context length: an element of the sequence array is appended to the end of the order sequence, and which element to append is chosen by comparing the values obtained from adding each element individually. Files from various data corpora were used as input. Improvements were observed on a number of file patterns; on others, the same results were obtained as with the standard methods, but with fewer contexts, saving resource usage; on some file patterns, no gain was obtained. As a result of applying the method, a number of file patterns showed significant improvement, while some gave results similar to the standard approach, though still with resource savings. For the file patterns where no positive improvement was obtained, additional extended approaches were tried to achieve better results, and a number of positive results were obtained. The new approaches yielded significant improvements.
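The core idea of deriving the context from the ordering rather than the values can be sketched as a rank-pattern function. This is only an illustration of the principle; the exact encoding of order patterns and the handling of ties in the thesis may differ (stable ordering of equal values is an assumption here).

```python
def order_pattern(window):
    """Map a window of symbols to its order (rank) pattern: windows with
    the same relative ordering of values share a context, regardless of
    the actual values. Ties are broken by position (stable)."""
    sorted_idx = sorted(range(len(window)), key=lambda i: (window[i], i))
    ranks = [0] * len(window)
    for r, i in enumerate(sorted_idx):
        ranks[i] = r
    return tuple(ranks)

# Different byte values, same relative order -> same context:
print(order_pattern([10, 30, 20]))  # (0, 2, 1)
print(order_pattern([5, 90, 7]))    # (0, 2, 1)
```

This also shows where the resource saving comes from: a window of k distinct values has at most k! order patterns (6 for k = 3), whereas a value-based context over bytes has up to 256^k possibilities.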