The effect of multidimensional test design structure and calibration strategies in multidimensional computer adaptive testing

Özberk, Eren Halil

dc.contributor.advisor	Gelbal, Selahattin
dc.contributor.author	Özberk, Eren Halil
dc.date.accessioned	2020-12-29T13:52:57Z
dc.date.available	2020-12-29T13:52:57Z
dc.date.submitted	2016
dc.date.issued	2019-12-26
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/435110
dc.description.abstract	Tests can be developed for different purposes, sometimes to rank individuals and sometimes to make diagnostic evaluations about them. Recently, however, problems have arisen in reporting subscores when tests measure unwanted dimensions or are developed with multidimensional structures. Multidimensional computerized adaptive testing (MCAT) applications can measure many dimensions efficiently by using the assumptions of multidimensional item response theory (MIRT). The literature contains many studies of MCAT item selection methods aimed at increasing the precision of domain (subscore) and overall ability estimates. Studies have compared item selection methods under different conditions, but no comparison with respect to test structure or item bank calibration was found. Unlike previous studies, this study compared unidimensional and multidimensional calibration methods and multidimensional test structures (simple and complex) under conditions designed to resemble real testing as closely as possible. The purpose of this study is to compare the error, bias, and correlation values of the multidimensional item selection methods used to estimate domain and overall ability scores in MCAT, by test design, calibration method, and number of items per dimension. Four conditions were manipulated: test design, number of items per dimension, calibration method, and item selection method. For the simple, low-complexity, and high-complexity test structures, 1000x3 and 1000x45 matrices of true ability parameters were randomly generated from a multivariate normal distribution. Using the generated item and ability parameters, dichotomously scored item response sets were generated according to the compensatory multidimensional three-parameter logistic model and the specified correlation values.
A three-dimensional item bank was simulated for the simple and complex test designs. Correlations between dimensions were set at 0.2, 0.5, and 0.8. Three calibration methods were used: separate unidimensional calibration and two multidimensional calibrations (the Bock and Aitkin EM (BAEM) and Metropolis-Hastings Robbins-Monro algorithms). The item selection methods included minimum angle, Kullback-Leibler (KL), and minimum error variance. Domain and overall ability scores were compared using the absolute bias (ABSBIAS), root mean square error (RMSE), and correlation between true scores and scores estimated after MCAT. The results showed that calibration method, test design, and number of items per dimension had a significant effect on the performance of item selection methods in estimating domain and overall ability scores. As the multidimensional test model became more complex, absolute bias decreased significantly for both domain and overall ability estimates. Different item selection methods performed better under different test designs. In the simple-structure design, for long tests (N=45) with a moderate correlation between dimensions (0.5), the minimum error variance (V1) item selection method yielded the lowest absolute bias in estimating overall scores under both unidimensional and BAEM calibration. In the low-complexity design, for long tests (N=45) with a high correlation between dimensions (0.8), the Vol item selection method yielded the lowest absolute bias in estimating overall scores under unidimensional calibration. Under BAEM calibration, the lowest absolute bias was observed for the Vol item selection method in long tests (N=45) with a low correlation between dimensions (0.2).
In the high-complexity design, for long tests (N=45) with a high correlation between dimensions (0.8), the V1 item selection method yielded the lowest absolute bias in estimating overall scores under unidimensional calibration. Under BAEM calibration, the lowest absolute bias was observed for the KL item selection method in long tests (N=45) with a low correlation between dimensions (0.2).
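The data-generation step described in the abstract (correlated true abilities drawn from a multivariate normal distribution, then dichotomous responses from the compensatory M3PL model) can be sketched in Python. This is a minimal illustration, not the thesis's simulation code: it assumes the slope-intercept form P = c + (1 - c) / (1 + exp(-(a·θ + d))) without a 1.7 scaling constant, and the helper names, seed, slope range, intercepts, and guessing value 0.2 are all hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix (L L^T = matrix)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def draw_abilities(n_examinees, n_dims, rho, rng):
    """Draw true theta vectors from N(0, Sigma), with 1 on Sigma's diagonal and rho elsewhere."""
    sigma = [[1.0 if i == j else rho for j in range(n_dims)] for i in range(n_dims)]
    L = cholesky(sigma)
    thetas = []
    for _ in range(n_examinees):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]  # independent standard normals
        thetas.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n_dims)])
    return thetas


def m3pl_prob(theta, a, d, c):
    """Compensatory M3PL (assumed slope-intercept form): c + (1 - c) / (1 + exp(-(a . theta + d)))."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return c + (1.0 - c) / (1.0 + math.exp(-z))


def simulate_responses(thetas, items, rng):
    """0/1 response matrix: one row per examinee, one column per (a, d, c) item."""
    return [[1 if rng.random() < m3pl_prob(t, a, d, c) else 0 for (a, d, c) in items]
            for t in thetas]


# Hypothetical simple-structure bank: 15 items per dimension, each loading on one dimension only.
# A complex structure would instead give some items nonzero slopes on several dimensions.
rng = random.Random(2016)
thetas = draw_abilities(1000, 3, 0.5, rng)
items = []
for dim in range(3):
    for _ in range(15):
        a = [0.0, 0.0, 0.0]
        a[dim] = rng.uniform(0.8, 2.0)               # hypothetical slope range
        items.append((a, rng.gauss(0.0, 1.0), 0.2))  # hypothetical intercept and guessing value
responses = simulate_responses(thetas, items, rng)
```

Because the model is compensatory, a high ability on one dimension can offset a low ability on another for items that load on both; in a simple-structure bank each item has a single nonzero slope, so that trade-off never occurs within an item.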
dc.description.abstract	A test can be designed for many purposes, including ranking people along a continuum or providing diagnostic information about examinees. However, a common problem is reporting diagnostic subscores when items measure unwanted dimensions or are designed for multidimensional purposes. Multidimensional computerized adaptive testing (MCAT) can measure multiple dimensions efficiently by using multidimensional item response theory (MIRT). Several studies have examined MCAT item selection methods aimed at improving the accuracy of domain and overall ability estimates. The literature review showed that most studies compared item selection methods under many conditions, but not with respect to test design structure or multidimensional calibration strategy. In contrast with previous studies, this study employed unidimensional and multidimensional calibration approaches and various test designs (simple and complex), allowing domain and overall ability estimates to be evaluated across multiple realistic test conditions. The purpose of this study is to compare MCAT item selection methods for estimating domain and overall ability scores in terms of test design, number of items per dimension, and calibration approach. Four factors were manipulated: test design, number of items per dimension, calibration strategy, and item selection method. For each of the simple structure (SS), complex low structure (CLS), and complex high structure (CHS) designs, 1000x3 and 1000x45 matrices of true ability parameters were randomly generated from a multivariate normal distribution. Using the generated item and ability parameters, dichotomous item responses were generated with the compensatory multidimensional three-parameter logistic (M3PL) model and the specified correlations. A three-dimensional item bank was simulated with simple and complex structures. Dimensions correlated at ρ = 0.2, 0.5, and 0.8.
Three calibration strategies were examined: separate unidimensional (SU) calibration and two multidimensional calibrations, using Bock and Aitkin's EM (BAEM) and the Metropolis-Hastings Robbins-Monro algorithm. The MCAT item selection procedures examined included minimum angle, minimizing the error variance of the composite score with optimized weights (V1), and Kullback–Leibler (KL) information. The accuracy of MCAT domain and composite ability scores was evaluated using the absolute bias (ABSBIAS), correlation, and root mean square error (RMSE) between true and estimated ability scores. The results suggest that calibration approach, multidimensional test structure, and number of items per dimension have a significant effect on the performance of item selection methods for both domain and overall score estimation. As the model became more complex, absolute bias decreased significantly for both domain and overall scores. Different item selection methods performed best under different test designs. For the SS design, V1 item selection yielded the lowest absolute bias in overall score estimates under both SU and BAEM calibration when the correlation between dimensions was moderate (0.5) and the test was long (N=45). For the CLS design, Vol item selection yielded the lowest absolute bias in overall score estimates under SU calibration when the correlation between dimensions was high (0.8) and the test was long (N=45). Under BAEM calibration, Vol item selection yielded the lowest absolute bias in overall score estimates when the correlation between dimensions was low (0.2) and the test was long (N=45). For the CHS design, V1 item selection yielded the lowest absolute bias in overall score estimates under SU calibration when the correlation between dimensions was high (0.8) and the test was long (N=45).
Under BAEM calibration, KL item selection yielded the lowest absolute bias in overall score estimates when the correlation between dimensions was low (0.2) and the test was long (N=45).	en_US
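The evaluation criteria named in the abstract (ABSBIAS, RMSE, and the correlation between true and estimated scores) can be computed per dimension and for the overall score as sketched below. The abstract does not give the formulas, so this sketch assumes ABSBIAS is the mean absolute difference between estimated and true scores (some studies instead report the absolute value of the mean signed bias) and treats the overall score as a weighted sum of domain scores with hypothetical weights; all function names are illustrative.

```python
import math
from statistics import fmean


def abs_bias(true, est):
    """Assumed ABSBIAS definition: mean of |estimated - true| across examinees."""
    return fmean(abs(e - t) for t, e in zip(true, est))


def rmse(true, est):
    """Root mean square error between estimated and true scores."""
    return math.sqrt(fmean((e - t) ** 2 for t, e in zip(true, est)))


def pearson(true, est):
    """Pearson correlation between true and estimated scores."""
    mt, me = fmean(true), fmean(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    var_t = sum((t - mt) ** 2 for t in true)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_t * var_e)


def overall_score(domain_scores, weights):
    """Overall (composite) score as a weighted sum of domain scores; weights are hypothetical."""
    return sum(w * s for w, s in zip(weights, domain_scores))


def evaluate(true_thetas, est_thetas, weights):
    """Return {label: (ABSBIAS, RMSE, r)} for each dimension and for the overall score."""
    results = {}
    for k in range(len(true_thetas[0])):
        t = [row[k] for row in true_thetas]
        e = [row[k] for row in est_thetas]
        results[f"dim{k + 1}"] = (abs_bias(t, e), rmse(t, e), pearson(t, e))
    t = [overall_score(row, weights) for row in true_thetas]
    e = [overall_score(row, weights) for row in est_thetas]
    results["overall"] = (abs_bias(t, e), rmse(t, e), pearson(t, e))
    return results
```

Scoring each dimension separately and then the composite mirrors the abstract's distinction between domain and overall scores; the optimized composite weights used by the V1 method are not reproduced here.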
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Eğitim ve Öğretim	tr_TR
dc.subject	Education and Training	en_US
dc.title	Çok boyutlu test deseninin ve kalibrasyon yöntemlerinin çok boyutlu bireyselleştirilmiş bilgisayar uygulamalarına etkisi
dc.title.alternative	The effect of multidimensional test design structure and calibration strategies in multidimensional computer adaptive testing
dc.type	doctoralThesis
dc.date.updated	2019-12-26
dc.contributor.department	Eğitim Bilimleri Anabilim Dalı
dc.subject.ytm	Calibration
dc.subject.ytm	Tests
dc.subject.ytm	Multidimensional item response theory
dc.subject.ytm	Computerized adaptive testing
dc.identifier.yokid	10132880
dc.publisher.institute	Eğitim Bilimleri Enstitüsü
dc.publisher.university	HACETTEPE ÜNİVERSİTESİ
dc.identifier.thesisid	446893
dc.description.pages	135
dc.publisher.discipline	Eğitimde Ölçme ve Değerlendirme Bilim Dalı

Files in this item

Name:: yokAcikBilim_10132880.pdf
Size:: 9.144Mb
Format:: PDF
Description:: File_10132880

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess