Data compression and reconstruction in process engineering applications

Önol, Ceyda

dc.contributor.advisor	Akman, Uğur
dc.contributor.author	Önol, Ceyda
dc.date.accessioned	2020-12-04T10:40:28Z
dc.date.available	2020-12-04T10:40:28Z
dc.date.submitted	2012
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/74745
dc.description.abstract	Recent advances in sensor technology make it possible to collect large amounts of process data. This, however, has increased the need for data compression to ease archiving. As a result, techniques for compressing and reconstructing process data have gained importance in many areas such as process monitoring, system identification, and fault detection, in order to reduce the storage space the data occupy and to speed up transmission between the nodes that collect and process the data. The main aim of this thesis is to achieve high compression ratios while preserving the essential features of the original data sets, and additionally to remove noise from the data. To this end, the efficiency of piecewise aggregate approximation, the one- and two-dimensional discrete cosine transform, and the one- and two-dimensional discrete wavelet transform was evaluated by adjusting the threshold level in the filtering step. The PortSimHigh, PortSimLow, SELDI-TOF MS and TEP data sets, which have differing characteristics, were used. The compression techniques were compared at various threshold levels on the basis of compression ratio, reconstruction error norm, % relative global error and % relative maximum error. It was concluded that high compression ratios cannot be obtained with the discrete cosine and wavelet transform methods at threshold levels below 90%, whereas at high threshold levels the quality of reconstruction deteriorates in return for better compression ratios. It was also found that the efficiency of the compression techniques depends largely on the characteristics of the data sets used. The discrete cosine transform is preferred for smooth data sets with random trends, while the discrete wavelet transform is more suitable for noisy data sets with many peaks. Moreover, for multivariate data sets whose columns are correlated, the two-dimensional discrete cosine and wavelet transform methods are more advantageous.
dc.description.abstract	Recent improvements in sensor technology have resulted in huge amounts of measured process data, along with an increasing need for compression prior to storage. Hence, efficient process data compression and reconstruction techniques gain importance in various tasks such as process monitoring, system identification, and fault detection, both to save storage space and to facilitate data transmission between a data collecting node and a data processing node. The main purpose of this thesis is to achieve the highest degree of compression and de-noising while preserving the key features of the original data upon retrieval and decompression. With this aim, the most appropriate dimensionality reduction technique is sought among Piecewise Aggregate Approximation (PAA), the One-Dimensional and Two-Dimensional Discrete Cosine Transform (1D-DCT and 2D-DCT), and the One-Dimensional and Two-Dimensional Discrete Wavelet Transform (1D-DWT and 2D-DWT) by adjusting the threshold parameter in filtering. The data sets used are PortSimHigh, PortSimLow, SELDI-TOF MS and TEP. These techniques are evaluated in terms of compression ratio, reconstruction error norm, % relative global error and % relative maximum error for different percentile thresholding levels. It is concluded that high compression levels cannot be achieved with thresholding percentile values below 90% in either the DCT or the DWT method, whereas the quality of reconstruction deteriorates at higher threshold levels in return for better compression. Furthermore, it is revealed that the efficacy of the compression methods depends strongly on the data characteristics. DCT is suitable for smooth data sets with random trends, whereas DWT is preferred for noisy data sets with high peak content. 2D-DCT and 2D-DWT are favored for multivariable data sets with highly correlated columns.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Kimya Mühendisliği	tr_TR
dc.subject	Chemical Engineering	en_US
dc.title	Data compression and reconstruction in process engineering applications
dc.title.alternative	Proses mühendisliği uygulamalı veri sıkıştırma ve yeniden oluşturma
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Kimya Mühendisliği Anabilim Dalı
dc.identifier.yokid	440501
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	BOĞAZİÇİ ÜNİVERSİTESİ
dc.identifier.thesisid	325595
dc.description.pages	256
dc.publisher.discipline	Other
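
The abstract describes compressing a signal by thresholding transform coefficients at a chosen percentile and then scoring the reconstruction. The sketch below illustrates that idea for PAA and the 1D-DCT only, in plain Python. It is not the thesis code: the function names, the test signal, and the exact metric formulas (compression ratio as samples per retained coefficient, relative errors normalized by the norm and the peak magnitude of the original signal) are assumptions made for illustration, and the thesis may define them differently.

```python
import math


def paa(x, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(x) % n_segments != 0:
        raise ValueError("len(x) must be divisible by n_segments in this sketch")
    w = len(x) // n_segments
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(n_segments)]


def dct(x):
    """Orthonormal DCT-II, computed directly in O(n^2)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def threshold_by_percentile(coeffs, pct):
    """Zero the pct percent of coefficients with the smallest magnitude."""
    n_drop = int(len(coeffs) * pct / 100.0)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]))
    dropped = set(order[:n_drop])
    return [0.0 if k in dropped else c for k, c in enumerate(coeffs)]


def metrics(x, x_rec, kept_coeffs):
    """Assumed metric definitions; the thesis may normalize differently."""
    err = [a - b for a, b in zip(x, x_rec)]
    err_norm = math.sqrt(sum(e * e for e in err))
    x_norm = math.sqrt(sum(a * a for a in x))
    n_kept = sum(1 for c in kept_coeffs if c != 0.0)
    return {
        "compression_ratio": len(x) / n_kept if n_kept else math.inf,
        "error_norm": err_norm,
        "rel_global_error_pct": 100.0 * err_norm / x_norm,
        "rel_max_error_pct": 100.0 * max(abs(e) for e in err) / max(abs(a) for a in x),
    }


if __name__ == "__main__":
    # Hypothetical smooth test signal: one sine period plus a slow ramp.
    signal = [math.sin(2 * math.pi * t / 64) + 0.5 * t / 64 for t in range(64)]
    for pct in (50, 90, 95):
        kept = threshold_by_percentile(dct(signal), pct)
        print(pct, metrics(signal, idct(kept), kept))
    print("PAA with 8 segments:", paa(signal, 8))
```

Raising `pct` discards more coefficients, so the compression ratio rises while the error measures can only stay the same or grow, which is the trade-off the abstract reports. A DWT variant would follow the same threshold-then-invert pattern with a wavelet transform in place of `dct` and `idct`.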

Files in this item

Name: yokAcikBilim_440501.pdf
Size: 18.62Mb
Format: PDF
Description: File_440501

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess