ALL Metrics
-
Views
-
Downloads
Get PDF
Get XML
Cite
Export
Track
Research Article

A hybrid recommender system based on data enrichment on the ontology modelling

[version 1; peer review: 2 approved, 1 not approved]
PUBLISHED 17 Sep 2021
Author details Author details
OPEN PEER REVIEW
REVIEWER STATUS

This article is included in the Research Synergy Foundation gateway.

Abstract

Background: A recommender system captures the user preferences and behaviour to provide a relevant recommendation to the user. In a hybrid model-based recommender system, it requires a pre-trained data model to generate recommendations for a user. Ontology helps to represent the semantic information and relationships to model the expressivity and linkage among the data.
Methods: We enhanced the matrix factorization model accuracy by utilizing ontology to enrich the information of the user-item matrix by integrating the item-based and user-based collaborative filtering techniques. In particular, the combination of enriched data, which consists of semantic similarity together with rating pattern, will help to reduce the cold start problem in the model-based recommender system. When the new user or item first coming into the system, we have the user demographic or item profile that linked to our ontology. Thus, semantic similarity can be calculated during the item-based and user-based collaborating filtering process. The item-based and user-based filtering process are used to predict the unknown rating of the original matrix.
Results: Experimental evaluations have been carried out on the MovieLens 100k dataset to demonstrate the accuracy rate of our proposed approach as compared to the baseline method using (i) Singular Value Decomposition (SVD) and (ii) combination of item-based collaborative filtering technique with SVD. Experimental results demonstrated that our proposed method has reduced the data sparsity from 0.9542% to 0.8435%. In addition, it also indicated that our proposed method has achieved better accuracy with Root Mean Square Error (RMSE) of 0.9298, as compared to the baseline method (RMSE: 0.9642) and the existing method (RMSE: 0.9492).
Conclusions: Our proposed method enhanced the dataset information by integrating user-based and item-based collaborative filtering techniques. The experiment results shows that our system has reduced the data sparsity and has better accuracy as compared to baseline method and existing method.

Keywords

Information Retrieval, Ontology, Recommender System, Collaborative Filtering, Content-based System, Hybrid Recommender System

Introduction

A Recommender System (RS) is a system that can provide item recommendation to a user based on their personalized interest. The attention for RS has increased dramatically over the past decade in various industries and domains such as e-commerce and online video streaming. There is a crucial need for having a system that can filter the numerous data around us as we are living in the area of the Internet with humungous data transactions and exchanges daily. With a properly implemented RS, the user will get a personalized recommendation based on the preferences, interest, rating, search results, the similarity between other users and so on. There are various successful use cases where RS helps in increasing the revenue of industrial, especially on online businesses. E-commerce companies such as eBay1 and Amazon2 have made use of RS to promote their products to the targeted customer. On the other hand, online video streaming company such as Netflix3 and YouTube4 have also implemented multiple types of RS in their system.

Generally, there are two types of RS: (1) content-based filtering (CB), and (2) collaborative filtering (CF). The CB RS provides recommendations to a user by using user preferences or history while the CF RS generates the recommendations based on the relationship between the user and item. These two methods have their advantages and shortcomings. As such, to combine the advantages and eliminate the shortcoming of each specific method, a new group of RS named hybrid RS has emerged. According to a recent survey in 20205, most of the recently proposed RS techniques fall under this group. Besides that, the most proposed hybrid RS combine at least one CF method in their system. CF method can be further classified as memory-based and model-based CF. A memory-based CF suggest item based on the similarity between user or item while the model-based CF builds the model by learning the interaction between user and item. There are a few researchers who focus on enhancing the model performance by fine-tuning the parameter and method in the model development process. However, the accuracy of the model built depends on the quality of the data6.

On the other hand, ontology helps to structure the data in a way that the entities are connected within the database7. Thus, the relationship between each entity is preserved. Semantic similarity can be easily calculated by various method that the ordinary method may not be able to discover. Ontology has been proven to help in increasing the accuracy of the RS and decrease the cold start issues8,9. With ontology, Manuela et al.10 reduced the fake neighbours’ problem cause by the CF method. Tarus et al.11 proposed an E-learning RS based on RS and the accuracy is better than using only CF without ontology. Shaikh et al.12 proposed an ontology-based RS in an e-commerce website. User behaviour on the website has been captured as implicit feedback to the RS. Gohari and Tarokh13 proposed a hybrid method that using ontology to structure the data. User-based (UB) CF and item-based (IB) CF were used to generate the recommendation. Bagherifard et al.14 proposed a hybrid approach that utilizing ontology in CB and CF hybrid RS. In their approach, the user has been clustered before calculating pass to CB and CF. This reduces the compute time of the CF RS. Celyan et al.15 proposed SEMCBCF, which is an ontology hybrid RS that extended from CBCF16, which is a CF RS without ontology integrated. The semantic similarity between items was calculated in their proposed system. The weighted average algorithm was used to combine the different similarity value. Nilashi et al.17 proposed an ontology hybrid recommendation that using IB and UB CF together with the clustering method to reduce overgeneralization. On another separate research, Liu and Li18 proposed an ontology CF RS based on Singular Value Decomposition (SVD). By employing the ontology as the data representation, the data sparsity has been decreased and the empty value of the user-item matrix was filled up based on IB CF. Inspired by their work, we proposed to address on enriching the data representation by means of ontology enrichment to give a more accurate recommendation. The summary of the recent publications has been done in Table 1.

Table 1. Recommendation system type and advantages of each publication.

PublicationRS TypeAdvantages
10CFReduce fake neighborhoods’ problem.
11,12CFAble to capture implicit feedback.
14Hybrid
(CB and CF)
User has been clustered in ontology to reduce compute time.
13,15,17Hybrid
(IB and UB CF)
13 User demographic is used.
15 Unknown rating predicted by CB before CF process.
17 Clustering item and user to reduce overgeneralization.
18Hybrid
(IB and model-based CF)
Enriching the matrix by IB CF before the model-based CF process.

In our proposed method, we focus on how to enrich the data information with ontology in order to increase the accuracy of the model-based RS. We proposed a method to enrich the user-item rating matrix by using the semantic similarity calculated from ontology. We added a UB RS to the item-based RS to generate the predicted rating that used to fill the user-item matrix to improve on the accuracy. In addition, our proposed approach will also reduce the main problem that usually faced in the model training, which is the data sparsity issue. With the predicted rating filled in the original user-item matrix, it can fill up the unknown value thus reduce the sparsity and increase the model training result. The experiment evaluations demonstrated that we have achieved higher accuracy and decrease the data sparsity problem of the original matrix.

Methods

Insipred by the data enrichment method proposed by Liu and Li18, we extended the work and proposed a hybrid method that used ontology to model the data. The semantic similarity between each attribute will be calculated by using the ontology structure. The semantic similarity will be used in the rating prediction in IB CF and UB CF. The flow diagram of our proposed method is illustrated in Figure 1. The proposed method consists of four parts:

1. Crawling extra movie information from IMDB and construct the ontology

2. Unknown Rating prediction by IB and UB CF

3. Combine predicted ratings and forms a filled user-item rating matrix

4. Model-based CF.

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure1.gif

Figure 1. Flow diagram of the proposed method.

We have selected the MovieLens 100K dataset as this is the standard dataset used for benchmarking purpose. This dataset contains 100K rating records with 1682 movie data and 943 user profile details. However, the movie information of the MovieLens dataset is limited. To have more details for the movie, we crawled the extra information from the IMDB website such as movie country, classified, director, actors, and so on. After all the data had been crawled, we constructed the ontology representation for the dataset (see Figure 2). In the ontology representation, all the attributes nodes were connected with each other via the relationship edges. The two main nodes were User and Movie connected through their related profile node.

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure2.gif

Figure 2. Ontology constructed based on the MovieLens dataset.

The semantic similarity of the dataset can be easily counted from the ontology constructed above. We used the IB CF and UB CF together to predict the unknown value from the original user-item matrix. The IB CF calculate the semantic similarity by considering the relationships between items. We used the Jaccard similarity index in calculating semantic similarity. Jaccard similarity measures the similarity by taking the percentage of the intersection of two sets of data. The formula is depicted in Equation (1).

J(A,B)=|AB||AB|=|AB||A|+|B||AB|(1)

Where J(A, B): the Jaccard similarity index between data A and data B.

From the process above, we got the movie-movie similarity by each feature of the movie (see Table 2). We then combined all the movie-movie similarity by a weighted average algorithm, where the weight variables were decided by experiment evaluation to get the best combination.

Table 2. An example of a movie-movie similarity matrix.

Movie1Movie2Movie3Movie4
Movie110.350.860.5
Movie210.20.6
Movie310.88
Movie41
1

After completing the IB similarity calculation, we were able to predict the unknown rating values in the user-item matrix. The theory of the prediction is finding the related movie ratings rated by the specific user. The formula used is shown in Equation (2).

Predictedu,m=FinalSimi×au,iFinalSimi(2)

Where i: the movie rated by the user, a: rating, u: user, m: movie

The algorithm first took all the movie rated by the specific user and compare to the similarity calculated. It then summed up the predicted value by using the weighted average method where the weight is the similarity of the movie to that specific movie. The predicted value put in a temporary matrix which was later combined with UB RS.

In the UB CF, we applied similar methods from the IB CF above. First, we calculated the similarity of each user features then combined it with a weighted average algorithm. With the user-user similarity calculated, we then predicted the empty movie rating by finding similar users. The similar users’ rating to that specific movie was combined by the weight algorithm.

Once the two IB CF and UB CF methods were completed, the two predicted rating were then combined by using the weighted average algorithm to get the final predicted rating for filling the empty original user-item matrix. After the filling process was completed, it was then passed to the model-based CF to construct the model. The CF model used in this paper was SVD. SVD decomposed the matrix into two lower dimensionality matrix and extracted the latent features. It is a famous method used in the model-based CF.

Results

In the evaluation, we have compared the result from the baseline model that based on SVD method alone to predict rating and an existing method that uses the IB CF to enrich the original user-item matrix.

The proposed system was developed using Jupyter Notebook 6.4.0 in Python 3.6 and Linux environment (Ubuntu 18.04). The Neo4j database has been used to store the data as the it is a graph database that our ontology representation will maintain in the data model. We applied the Root Mean square error (RMSE) algorithm to determine the accuracy of the system. It is a common approach to determine the predictive accuracy of the model19. It gives a relatively high weight to large errors. The smaller the RMSE value, the more accurate the model is.

Several experiments have been done to decide the weight variable used in combining the IB CF and UB CF. Various weightage variables ranging from 0.3 to 0.7 have been tested. Figure 3 shows that the best accuracy is achieved with a weightage of 0.5.

A similarity threshold was applied in the system to prevent destroying the original information of the original matrix when filling the empty value. Figure 4 shows that the accuracy of the model was affected by the similarity threshold. Overall, our proposed method had the lowest RMSE value across the similarity threshold testing (see Figure 5).

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure3.gif

Figure 3. RMSE of Various Ratios of User-based to Item-based Collaborative Filtering.

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure4.gif

Figure 4. RMSE comparison with different similarity threshold and methods.

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure5.gif

Figure 5. Lowest RMSE value comparison between various methods.

The experiment evaluations indicated that our proposed approach had the lowest RMSE value. With the unknown rating filled by IB and UB CF before passing to the model-based CF, the data sparsity also decreased from 0.9542% to 0.8435%.

Discussion

From the experimental results in the earlier section, we observed that adding the IB CF method to enrich the original data helped to increase the accuracy of the model-based CF RS. It helped to boost the information of the original matrix while not destroying the original information. The added user-item CF method allowed the system to get more accurate similar user and items. However, we still want to know if our proposed method works in other model-based CF RS. Hence, we change the SVD model to the SVD++ model as the enhanced proposed method and re-run the experiment. SVD++ is an extended work from SVD, which achieve better accuracy by optimizing the algorithm to consider implicit feedback20. In our experiments, the results in Figure 6 below show that the enhanced ptoposed method with SVD++ outperforms any other method we used above with the enriched data. This helps to verify that our method can be applied to not only SVD, but any other model-based CF RS.

e6a1ecd5-1398-4704-940e-1cc46adca1cd_figure6.gif

Figure 6. Lowest RMSE value comparison between various methods.

From all the results above, it shows that our proposed method can increase the accuracy of the model-based CF RS. By adding the UB CF method to the existing method proposed by Liu and Li18 that employed only the IB CF method, we can achieve better accuracy than the existing method. This is due to the added UB CF method which allows the system to find the related item by user demography, whereas the IB CF method is not able to do it.

Conclusions

In this paper, we reviewed the current ontology based RS and proposed a data enrichment method which uses ontology in a hybrid RS. The proposed method increases the model-based CF RS input data quality by adding the UB CF to the existing IB CF method. Both methods use the structure of ontology to calculate the semantic similarity and, subsequently, fill the unknown rating values of the original user rating matrix. Experiment results indicated that the data sparsity problem has been minimized and the accuracy of the RS system has been increased.

Several improvements can be conducted in future including algorithm optimization. The current offline model building algorithm takes time to process and can be optimized as parallel processing to improve the processing time. Besides, the semantic similarity calculation can be changed to the level-based calculation to fully utilise the benefits of having ontology in the system.

Data and Source Code Availability

Underlying data

Zenodo: chewljie/dataset-enrichment-RS: V1.0 Initial Release, https://doi.org/10.5281/zenodo.5418122

This project contains the following underlying data:

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 17 Sep 2021
Comment
Author details Author details
Competing interests
Grant information
Copyright
Download
 
Export To
metrics
Views Downloads
F1000Research - -
PubMed Central
Data from PMC are received and updated monthly.
- -
Citations
CITE
how to cite this article
Chew LJ, Haw SC and Subramaniam S. A hybrid recommender system based on data enrichment on the ontology modelling [version 1; peer review: 2 approved, 1 not approved]. F1000Research 2021, 10:937 (https://doi.org/10.12688/f1000research.73060.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
track
receive updates on this article
Track an article to receive email alerts on any updates to this article.

Open Peer Review

Current Reviewer Status: ?
Key to Reviewer Statuses VIEW
ApprovedThe paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approvedFundamental flaws in the paper seriously undermine the findings and conclusions
Version 1
VERSION 1
PUBLISHED 17 Sep 2021
Views
13
Cite
Reviewer Report 22 Nov 2021
Heru Agus Santoso, Faculty of Computer Science, Dian Nuswantoro University, Semarang, Indonesia 
Approved
VIEWS 13
The article has a good chance of being accepted, but  requires explanation in detail:
  1. Please check new references, there are other types of recommended systems such as knowledge-based, hybrid etc.
     
  2. The
... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Santoso HA. Reviewer Report For: A hybrid recommender system based on data enrichment on the ontology modelling [version 1; peer review: 2 approved, 1 not approved]. F1000Research 2021, 10:937 (https://doi.org/10.5256/f1000research.76683.r97017)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
13
Cite
Reviewer Report 21 Oct 2021
Dana Sulistyo Kusumo, School of Computing, Telkom University, Bandung, Indonesia 
Approved
VIEWS 13
This article proposes data enhancement of hybrid recommender system using ontology to model user and item relationship. This can address the cold start problem of the recommender system because it can provide information when a new user or item first comes to the recommender ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Kusumo DS. Reviewer Report For: A hybrid recommender system based on data enrichment on the ontology modelling [version 1; peer review: 2 approved, 1 not approved]. F1000Research 2021, 10:937 (https://doi.org/10.5256/f1000research.76683.r94868)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Views
25
Cite
Reviewer Report 04 Oct 2021
Rajkumar Kannan, Bishop Heber College, Tiruchirappalli, Tamil Nadu, India 
Not Approved
VIEWS 25
Author Conclusion

In this paper, the authors reviewed the current ontology based RS and proposed a data enrichment method which uses ontology in a hybrid RS. The proposed method claims to increase the model-based CF RS input ... Continue reading
CITE
CITE
HOW TO CITE THIS REPORT
Kannan R. Reviewer Report For: A hybrid recommender system based on data enrichment on the ontology modelling [version 1; peer review: 2 approved, 1 not approved]. F1000Research 2021, 10:937 (https://doi.org/10.5256/f1000research.76683.r94869)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Comments on this article Comments (0)

Version 1
VERSION 1 PUBLISHED 17 Sep 2021
Comment
Alongside their report, reviewers assign a status to the article:
Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the papers academic merit.
Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions
Sign In
If you've forgotten your password, please enter your email address below and we'll send you instructions on how to reset your password.

The email address should be the one you originally registered with F1000.

Email address not valid, please try again

You registered with F1000 via Google, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Google account password, please click here.

You registered with F1000 via Facebook, so we cannot reset your password.

To sign in, please click here.

If you still need help with your Facebook account password, please click here.

Code not correct, please try again
Email us for further assistance.
Server error, please try again.