Hibrit film öneri sistemi

Uluyağmur, Mahiye

Date

2012

Abstract

In cinema/television and music, the number and variety of products available for viewing, and the number of viewers who can watch them, have grown considerably. Recommendation systems, which recommend a product to the audience most likely to be interested in it, have therefore gained importance. Content-based recommendation systems use the content information of the products a user has watched so far and depend on the type of content. Collaborative filtering recommendation systems, on the other hand, use the ratings users give to products and are independent of the content type. Both content-based and collaborative filtering systems have strengths and weaknesses. Hybrid recommendation systems aim to produce better recommendations by using both content and rating information.

In this work, movies are used as the products. To build a content-based recommendation system, the movie content consists of information such as actor, director and genre, together with vectors produced by document processing techniques from the summary document of each movie, and the ratings users gave to the movies. A collaborative recommendation system that uses only the ratings users gave to movies was also studied. By combining these two systems with a linear model, a hybrid recommendation system was developed that uses both movie content and user ratings to make personalized movie recommendations.

In the content-based recommendation system, the features occurring in the movies a user watched are weighted by the ratings the user gave to those movies, producing a weight for every feature and every user. This reveals which features a user weights heavily and which lightly. The rating of a movie to be recommended is obtained by dividing the sum of the user's weights by the total number of movies that user watched in the training set. Recommendation consists of recommending to users the movies that contain features with high weights. The system generates ratings from four different feature types. In feature extraction, the features of all movies watched by all users in the training set are extracted first, which yields a movie-feature matrix. This is done separately for the actor, director, genre and keyword feature sets. The weight of a feature within a feature set is obtained by summing the ratings the user gave to all movies containing that feature and dividing by the total number of movies the same user watched in the training set. These feature weights are then used to generate predicted ratings for movies.

When a movie is to be recommended to user u in the test set, its features are examined first. If actor is chosen as the feature type, the system checks which actors the movie contains and whether any of them is an actor to which user u assigned a weight in the training set. If user u previously watched and rated a movie with that actor, the rating of the candidate movie is set to user u's weight for that actor. The sum of user u's training-set weights over the actors of the candidate movie that have such a weight represents the rating for the candidate movie. When ratings are generated from the actor feature, weights usually have to be summed because movies have more than one actor. When ratings are generated from the director feature, movies usually have a single director, so the weight is assigned directly as the rating. In the experiments, recommending by the director feature was found to be more successful than the other feature types. The results for the genre feature also show good performance.

Collaborative recommendation systems generally use explicit-feedback recommendation methods, in which users explicitly rate the products they like and dislike. In many domains such as TV program recommendation, however, it is hard to ask the user to rate every program. It is more appropriate to collect information on which products users watched and for how long, and to use implicit-feedback recommendation methods. In this work, preference values produced by normalizing users' TV program viewing durations are used as ratings.

This work uses rating matrices made up of the ratings users gave to movies. The rating matrix, which is quite sparse, was factorized using matrix factorization methods. As the recommendation method, implicit-feedback recommendation methods were used together with regularized matrix factorization. Results were also obtained with the explicit-feedback method, but because ratings cannot be collected directly from users in our system, the implicit-feedback method was taken as the basis. While learning the matrix factors, an adaptive learning rate was used to speed up learning, and user- and item-adaptive regularization methods were used as well.

In the collaborative recommendation system, a method that exploits similarities between users is proposed. When recommending a movie to user u, the users who watched the candidate movie i in the training set are found and their ratings for movie i are taken. The method then checks whether those users share any watched movies with user u in the training set. If user u watched this movie in the training set, the movie is recommended to user u directly. For the users who both watched movie i and watched the same movies as user u, the number of movies each shares with user u is multiplied by that user's rating for movie i, and summing these products yields the rating of movie i for user u. The user with whom user u shares the most watched movies can be interpreted as being very similar to user u.

The hybrid recommendation system was built by combining collaborative and content-based recommendations in two different ways. In the first system, the collaborative rating is formed in proportion to the number of movies that the users who watched the candidate movie share with the target user. The second hybrid system uses collaborative ratings produced from the matrix factorization results. Linearly combining the first system with the content-based system gave the best results.

The thesis also used various performance measures that can be used to evaluate TV movie recommendations, and proposed new ones. The performance of all methods was evaluated on a 13-month data set.

The number and variety of available content, and the number of users who can view it, have increased tremendously in both the movie/television and music domains. Recommendation systems that can accurately and quickly recommend to a given user the set of products he or she is most likely to be interested in have therefore become important. Content-based recommendation systems use features of the products a user has viewed so far and are domain dependent, whereas collaborative filtering systems are domain independent and use only the ratings given to each product by a number of users. Both collaborative and content-based recommendation systems have shortcomings. The cold-start problem is one of the most important problems of collaborative filtering systems: if a movie is not watched in the training set, it cannot be recommended to any user. A content-based system can solve this problem. Moreover, if a user is new to the system, namely if he or she has not watched any movies, a collaborative filtering system cannot recommend any movies to this user either. User demographics can be used to solve the new-user problem, but they tend to be unreliable in many domains. In our system we first observe the watching behavior of a user for a number of movies and then make recommendations. Content-based recommendation systems rely on content features, which need to be extracted. Rating matrices are generally sparse and high dimensional, so they are costly to work with. In a collaborative filtering system, matrix factorization methods can generate low-dimensional user and item factors to address the sparsity problem. Content-based recommendation systems rely on content data gathered for a specific user, and if overly complex models are chosen they may suffer from overspecialization.
Different hybrid recommendation systems that integrate content-based and collaborative recommendation systems have been proposed in the literature.

In this thesis, content-based, collaborative and hybrid TV movie recommendation methods are proposed and evaluated. In the content-based recommendation system, the content of a movie consists of information such as actor, director and genre, as well as words obtained from the movie summaries. In addition to these fields, the computed (implicit) ratings that users give to the movies are used in the content-based system. Another recommendation method used in this work is collaborative filtering, which uses only users' ratings for movies. We also propose a hybrid movie recommendation system that uses a linear combination of the recommendations produced by the content-based and collaborative filtering methods.

Recommendation systems need user ratings. However, for the TV recommendation problem, we do not have explicit ratings from the users. In this thesis, we use implicit ratings, generated as the percentage of the movie watched by the user over all presentations of the movie. Therefore, if a user watched a movie multiple times, or watched different parts in different sessions, the implicit rating reflects that.

Another contribution of the thesis is the use of different performance evaluation criteria for TV movie recommendation. We evaluate the performance of the movie recommendation system using four evaluation measures. Two of them are the well-known information retrieval measures precision and recall. In our system, precision is the number of movies watched by the user among the top 10 recommendations, divided by 10. High precision means the system hits many correct movies in the top 10 recommendations. If a user has watched a lot of content, his or her precision is naturally high.
Recall addresses this problem, since it divides the number of top 10 hits by the number of movies user u watched in the test set. In addition, two other performance measures are developed in this thesis: normalized precision and rating weighted normalized precision. Precision gets higher as the number of movies a user watched in the test set increases, and also as the number of movies in the test set decreases. Normalized precision takes the number of movies in the test set into account. The ratio of the number of movies watched by a user to the number of movies in the test set can be used as a normalization term for each user, and normalized precision is precision normalized by this ratio. The result is proportional to how much better a recommendation is than that of a uniform random recommendation system serving a user who watches movies uniformly at random. A recommendation system that recommends movies the user watched with high ratings is preferable to one that recommends the same number of watched movies with low ratings. The rating weighted normalized precision (RWNP) measure therefore takes the user's ratings for the test movies into account. It is computed as the sum of the ratings of the watched movies in the top 10 recommendations, divided by the ratio of the number of movies watched by the user in the test set to the total number of movies in the test set.

The content-based recommendation system uses the actor, genre, director and keyword features of the movies watched by a user. In the feature extraction phase, a movie-feature matrix containing the features of all movies in the training set is created first. For a particular user, each feature present in a watched movie is scaled by the implicit rating for that movie, and the sum of these scaled values for a feature, divided by the number of movies the user watched in the training set, gives the weight of that feature for the user.
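The four evaluation measures described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `evaluate_user`, its argument names and the dictionary layout are assumptions, and "normalized by this ratio" is read here as division by the ratio, in line with the wording given for RWNP.

```python
def evaluate_user(ranked, test_ratings, n_test_movies, k=10):
    """Per-user evaluation measures (hypothetical helper).

    ranked        -- movie ids ordered from best to worst recommendation
    test_ratings  -- {movie_id: implicit rating} for the movies the user
                     actually watched in the test set
    n_test_movies -- total number of movies in the test set
    """
    if not test_ratings:
        raise ValueError("user watched no test-set movies")
    hits = [m for m in ranked[:k] if m in test_ratings]
    n_watched = len(test_ratings)

    precision = len(hits) / k
    recall = len(hits) / n_watched
    # Assumed normalization term: share of the test set the user watched.
    ratio = n_watched / n_test_movies
    normalized_precision = precision / ratio
    # RWNP as worded in the abstract: sum of the hit ratings over the ratio.
    rwnp = sum(test_ratings[m] for m in hits) / ratio
    return {
        "precision": precision,
        "recall": recall,
        "normalized_precision": normalized_precision,
        "rwnp": rwnp,
    }
```

For example, a user who watched 4 of 40 test-set movies and has 2 of them in the top 10 gets precision 0.2 and recall 0.5 under this sketch.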
The feature weights computed on the training set serve as the reference for recommending test-set movies. If one feature weight for a user is greater than the others, the user gives more importance to that feature. The feature weight computation is done separately for four feature sets: actor, genre, director and keyword. When movie i in the test set is to be recommended to user u, the features of movie i are extracted first. Assuming the actor feature set is chosen, the system checks which actor features movie i contains and whether user u watched a movie containing one of these actors. If user u watched a movie in the training set that contains actors of movie i, then user u's rating for movie i is the sum of user u's weights for those actor features. When ratings are generated from the actor feature set, all available feature weights are summed, since movies usually have more than one actor. With the director feature set, on the other hand, there is generally one director per movie, so the user's weight for the director feature is used directly as the rating. Ratings generated using the director feature set are observed to be more successful than the others, while the genre feature set is also quite successful.

Content-based recommendations for each feature set are also combined using three different strategies. Before combination, all generated ratings are normalized to the 0-1 range using min-max normalization. In the first combination scheme, the ratings from the different feature sets are summed directly to generate a new rating of a movie for a user. The second combination scheme takes a weighted sum of the ratings for each feature set. The weight of a feature set is the exponential of the negative mean absolute training-set error between the actual and predicted ratings for that feature set. The weighted sum combination gives better results than the plain sum.
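The content-based steps described in this abstract (per-user feature weights, scoring a candidate movie by summing weights, min-max normalization and the exp(-MAE) weighted sum) can be sketched as below. All function names and data layouts are assumptions made for illustration, and mapping a constant score list to 0.0 during normalization is a choice of this sketch, not something the abstract specifies.

```python
import math
from collections import defaultdict


def feature_weights(watched, movie_features):
    """Weights of one user for one feature set (e.g. actors).

    watched        -- {movie_id: implicit rating} from the training set
    movie_features -- {movie_id: set of features}
    """
    if not watched:
        return {}
    totals = defaultdict(float)
    for movie, rating in watched.items():
        for feat in movie_features.get(movie, ()):
            totals[feat] += rating  # feature scaled by the movie's rating
    return {feat: total / len(watched) for feat, total in totals.items()}


def content_score(candidate_features, weights):
    """Predicted rating: sum of the user's weights for the movie's features."""
    return sum(weights.get(feat, 0.0) for feat in candidate_features)


def min_max(scores):
    """Rescale {movie: score} to the 0-1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {movie: 0.0 for movie in scores}  # assumption for flat input
    return {movie: (s - lo) / (hi - lo) for movie, s in scores.items()}


def combine_weighted(scores_by_set, train_mae_by_set):
    """Weighted sum over feature sets, with weight exp(-training MAE)."""
    combined = defaultdict(float)
    for name, scores in scores_by_set.items():
        weight = math.exp(-train_mae_by_set[name])
        for movie, s in min_max(scores).items():
            combined[movie] += weight * s
    return dict(combined)
```

With a single director per movie, `content_score` reduces to that director's weight, which matches the director case described in the text.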
The third combination strategy aims to use the feature set that is likely to be the most successful for a particular user. The feature set with the minimum mean absolute training error for the user is chosen as the feature set to be used for test recommendations.

In collaborative filtering, explicit-feedback recommendation methods are generally used, in which users rate movies explicitly, for example with likes and dislikes or with scores. However, in the TV program recommendation problem, as in many other areas, it is difficult to request explicit ratings for programs from the user. Instead of ratings, there is information on how long the user watched an item. For such problems, implicit methods should be used instead of explicit recommendation methods. In this work, we process the durations for which users watch programs to obtain implicit ratings and, as in prior work, use these ratings for implicit recommendation. The user-movie matrix, which contains the users' ratings for movies, can be used to assess similarities between users and movies, so that, for example, movies liked by users similar to the current user can be recommended. However, the user-movie matrix is very sparse, and most user-user and item-item similarities may turn out to be zero. Matrix factorization techniques are used to represent each movie and user in a small number of reduced dimensions in which user-item similarities are as close as possible to the ratings given in the training set. We first treat the computed implicit ratings as if they were explicit ratings and use explicit matrix factorization methods. While learning the matrix factors, we introduce an adaptive learning rate to speed up learning, and we also introduce user/item adaptive regularization.
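As a rough illustration of explicit matrix factorization on a sparse rating list, the sketch below uses plain stochastic gradient descent with a fixed learning rate and a single L2 regularization constant. It does not reproduce the adaptive learning rate or the user/item adaptive regularization mentioned in the text, since the abstract does not specify them. The names `factorize` and `predict` and all hyperparameter values are assumptions.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples.

    Only observed ratings are visited, so the sparse matrix is never
    materialized. Fixed lr and reg are simplifications of this sketch.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))
```

Calling `factorize` with integer user and item indices and ratings in the 0-1 range returns two factor lists that `predict` can score for any user-item pair, including pairs that were never observed.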
We also use implicit matrix factorization and compare it with the other recommendation methods.

Since matrix factorization is a costly procedure that involves many parameters, we also used count based collaborative filtering to measure user-user similarities. In this method, when movie i is to be recommended to user u, the set of users who watched movie i in the training set is obtained first. For each of these users, the count of movies liked by both user u and that user is used as the similarity between the two users. Count based collaborative filtering predicts ratings as the similarity-weighted ratings for movie i of the users similar to user u.

In this thesis, we propose two hybrid movie recommendation systems that combine content-based and collaborative recommendation system ratings linearly. The first hybrid system, HybridCommonMovie, is obtained by combining the content-based system with the count based collaborative filtering system. The second one, HybridMF, is obtained by combining the content-based system with the matrix factorization based collaborative filtering system. A weight parameter is used to adjust the contribution of each method in the linear combination.

Experiments were performed to assess the performance of the recommendation algorithms on thirteen months of data. Among all the methods tried, the best results are obtained with the HybridCommonMovie system. For this recommendation system, the precision, recall, normalized precision and rating weighted normalized precision results, averaged over all users, are better than those of the other recommendation systems. HybridCommonMovie is also the method with the smallest number of parameters that need to be adjusted for different datasets, and is therefore the preferred recommendation method for the TV recommendation dataset used in this thesis.
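The count based collaborative score and the linear hybrid combination can be sketched as below. The helper names, the nested-dictionary layout and the weight name `alpha` are assumptions. The score is left as an unnormalized sum of (common movie count × rating), and the hybrid step assumes both inputs have already been brought to a comparable scale.

```python
def count_based_score(u, i, watched):
    """Collaborative score of movie i for user u.

    watched -- {user: {movie: implicit rating}} from the training set.
    Each other user who watched i contributes their rating for i,
    weighted by the number of movies they share with u.
    """
    mine = set(watched.get(u, {}))
    score = 0.0
    for v, movies in watched.items():
        if v == u or i not in movies:
            continue
        common = len(mine & set(movies))  # similarity = shared movie count
        score += common * movies[i]
    return score


def hybrid_score(content, collaborative, alpha=0.5):
    """Linear combination; alpha is the (assumed) content-based weight."""
    return alpha * content + (1 - alpha) * collaborative
```

Users who share no movies with u contribute nothing to the score, so a movie watched only by dissimilar users gets a collaborative score of zero and the hybrid falls back on the content-based term.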

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/129434

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess