Recommender Systems: A Data-Driven Framework for Personalized Decision Intelligence

Vinodhkumar Gunasekaran; Ilamathi Elango

doi:10.12688/f1000research.174439.1


Research Article


[version 1; peer review: 1 approved with reservations, 1 not approved]

Vinodhkumar Gunasekaran ¹, Ilamathi Elango²

PUBLISHED 18 Dec 2025


¹ Global Analytics & Solutions, Circana, Chicago, Illinois, 60089, USA
² Sales Compensation, Medline Industries, Northfield, Illinois, 60093, USA

Vinodhkumar Gunasekaran
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Project Administration, Resources, Software, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Ilamathi Elango
Roles: Project Administration, Visualization, Writing – Review & Editing


This article is included in the Software and Hardware Engineering gateway.

Abstract

Background

Recommender systems have become integral to personalizing digital experiences across domains such as e-commerce, media, and entertainment. These systems use user-item interaction data (how a user reacts to an item) to identify patterns that predict preferences and rank content. Collaborative filtering is one of the most widely used approaches, relying on similarity between users or items to generate recommendations.

Methods

This study examines collaborative filtering using similarity metrics applied to a curated IMDB movie dataset. Data were preprocessed by merging metadata and ratings, encoding categorical fields, and constructing feature vectors for each movie. Cosine similarity was the primary metric used to compute pairwise similarity between items. An item-item recommendation engine was then implemented, and its output was evaluated using an example movie, Saw (2004).

Results

The system produced coherent recommendations aligned with the genre and thematic characteristics of the input movie used, Saw (2004). The top-ranked films exhibited high cosine similarity scores, indicating strong vector space proximity and consistent user engagement patterns. Visual exploration of the data confirmed that the similarity-based approach captured meaningful behavioral relationships.

Conclusions

The findings show that a simple similarity-based collaborative filtering model can effectively identify related movies without complex model architectures. Even with lightweight feature engineering, the system generated relevant recommendations that mirror typical user preferences. This demonstrates the practicality of similarity-based methods for scalable and interpretable recommendation tasks, and highlights opportunities for future extensions using hybrid or embedding-based models.

Keywords

Recommender Systems; Collaborative Filtering; Similarity Metrics; Predictive Modeling; Behavioral Analytics; Personalization; Decision Intelligence

Corresponding author: Vinodhkumar Gunasekaran

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2025 Gunasekaran V and Elango I. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Gunasekaran V and Elango I. Recommender Systems: A Data-Driven Framework for Personalized Decision Intelligence [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2025, 14:1409 (https://doi.org/10.12688/f1000research.174439.1) First published: 18 Dec 2025, 14:1409 (https://doi.org/10.12688/f1000research.174439.1) Latest published: 18 Dec 2025, 14:1409 (https://doi.org/10.12688/f1000research.174439.1)

Introduction

Recommender systems have evolved into fundamental components of modern decision making, driving personalization and consumer engagement across multiple industries. From Amazon to Netflix, leading global enterprises rely on statistically driven recommendation algorithms to interpret customer behavior and tailor digital experiences accordingly.¹ By leveraging user-item interaction data, these systems employ advanced techniques to predict future user preferences and optimize product exposure.

Empirical evidence underscores their economic significance. A McKinsey study estimates that nearly 35% of Amazon’s total sales are directly attributable to its recommendation engine.² Amazon’s multilayered deployment of recommender algorithms, integrated across browsing, search, and checkout environments, has redefined the e-commerce experience and raised competitive barriers for market entrants.³ Firms that systematically apply data-driven recommender methodologies exhibit higher conversion efficiency, greater customer lifetime value, and stronger market differentiation.

Methods

Applications of recommender systems

Recommender systems are embedded across diverse digital ecosystems and serve as core engines for personalization and decision intelligence. They influence consumption patterns by filtering information and tailoring content according to user preferences. Key applications include:

• E-commerce: Personalized product ranking, bundling, and cross-selling on retail platforms such as Amazon.³
• News and Publishing: Dynamic content curation based on reading frequency, dwell time, and topical affinity.
• Music and Podcasts: Platforms such as Spotify employ collaborative filtering and similarity models to recommend audio tracks and playlists aligned with listener preferences.
• Video Streaming: Netflix and YouTube apply collaborative filtering to predict viewing patterns, optimize watch next queues, and enhance user retention.⁴
• Social Media: Systems on platforms like Instagram and Facebook infer interest clusters, enabling targeted recommendations and advertising.
• Travel and Hospitality: TripAdvisor and related services recommend destinations and accommodations based on spatial, behavioral, and preference proximity.

The economic impact of recommender engines is significant. For instance, Netflix offered a $1 million prize for a model achieving a 10% reduction in root-mean-squared error (RMSE) relative to its production algorithm.⁵ Such incentives underscore the analytical rigor and commercial value associated with advancing recommender methodologies.

Conceptual framework

Recommender systems are analytical engines that identify and suggest products or services aligned with user preferences. By analyzing interaction patterns and behavioral signals, these systems infer latent interests and generate personalized recommendations tailored to individual consumption profiles. For example, a user who consistently engages with the horror genre in film platforms may receive suggestions for additional horror titles, thereby enhancing engagement and increasing platform retention.

Operational basis

Recommender systems learn from observed interactions by modeling the relationships between users and items. These relational structures form the foundation of preference inference and prediction. Three primary relationship types drive these systems:

User-Item Relationship

User-item preference data forms the core of recommendation models. For example, a user who frequently purchases books on Amazon will receive suggestions for similar or complementary books. Likewise, a user repeatedly purchasing beauty products will be recommended related cosmetic items according to their purchase profile.

Item-Item Relationship

Item similarity is derived from co-engagement patterns. Consider a viewer who watches Superman. The system may recommend Aquaman due to shared characteristics within the DC Comics universe. This mechanism is particularly effective in cold-start situations for new items, where similarity to known items accelerates exposure.

User-User Relationship

Users with similar historical patterns can guide recommendations for one another. For instance, if two readers have both engaged deeply with the Harry Potter series, and one has also read The Lord of the Rings, the system can recommend the latter title to the other user. This process is valuable in early-stage engagement when a user is exploring a new category. The conceptual structure of similarity signals in collaborative filtering is illustrated in Figure 1.

Figure 1. Conceptual relationship structure of a recommender system.

This diagram illustrates the foundational similarity relationships used in collaborative filtering, including user–user similarity, item–item similarity, and user–item interactions.

Collaborative filtering

Collaborative filtering (CF) operates on the principle that users who have exhibited similar preferences in the past are likely to share comparable interests in the future.⁶ CF models leverage observed interactions (such as ratings, clicks, or purchase histories) to infer latent preference structures without requiring explicit content features.

Two primary CF paradigms exist: user-user filtering and item-item filtering. In user-user CF, recommendations are derived by identifying users with similar historical engagement patterns and estimating the target user’s interest based on their neighbors’ preferences. Conversely, item-item CF examines correlations among items; a user who interacts with an item is recommended other items that exhibit high similarity to it.⁷

To generate meaningful recommendations, collaborative filtering relies on the computation of pairwise similarity scores. These metrics quantify how strongly two users or two items align based on observed data structures. Common similarity measures include:

• Jaccard Similarity: Measures the ratio of shared items or interactions to the union of items across users, capturing the degree of overlap.
• Euclidean Distance: Computes geometric distance between rating vectors, reflecting dissimilarity based on absolute deviations.
• Cosine Similarity: Evaluates the cosine of the angle between two high-dimensional vectors, emphasizing directional alignment rather than magnitude differences. This metric is particularly effective in sparse rating matrices.

By applying these metrics, CF systems estimate preference scores and rank items to deliver personalized recommendations. The interaction between users and items within a collaborative filtering model is illustrated in Figure 2.

Figure 2. Conceptual illustration of user–user and item–item relationships in a collaborative filtering framework.

This diagram depicts how similarity is computed from user–user interactions and item–item rating patterns, forming the basis of collaborative filtering prediction.

The user-based collaborative filtering prediction is formally defined in Equation 1, which models the expected rating as a similarity-weighted adjustment of a user’s baseline preference.

$$
{\hat{r}}_{u, i} = {\bar{r}}_{u} + \frac{\sum_{v \in N (u)} sim (u, v) \, (r_{v, i} - {\bar{r}}_{v})}{\sum_{v \in N (u)} | sim (u, v) |} \tag{1}
$$

Notation summary

• ${\hat{r}}_{u, i}$ - Predicted rating given by user $u$ to item $i$
• ${\bar{r}}_{u}$ - Average rating of user $u$
• $N (u)$ - Set of neighboring users similar to $u$
• $sim (u, v)$ - Similarity between users $u$ and $v$
• $r_{v, i}$ - Rating given by user $v$ to item $i$
• ${\bar{r}}_{v}$ - Average rating of user $v$

Interpretation

This formulation estimates the unknown rating ${\hat{r}}_{u, i}$ as the user’s baseline preference ( ${\bar{r}}_{u}$ ) plus a weighted average of rating deviations from similar users, where the weights correspond to pairwise similarity scores. In other words, users with greater similarity to $u$ exert stronger influence on the prediction. This represents the foundational model for user-based collaborative filtering. The same logic extends to item-based collaborative filtering by interchanging the user and item indices, allowing the system to infer preferences by examining relationships among items rather than users.
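Equation 1 can be traced by hand on a toy example. The sketch below is illustrative only: the `ratings` dictionary, the user names, and the similarity values passed in `sims` are hypothetical and are not taken from the IMDB data or the authors' code. Neighbors who have not rated the target item are skipped, which is one common convention rather than something the article specifies.

```python
# Hypothetical toy ratings: user -> {item: rating}. Not from the IMDB dataset.
ratings = {
    "u": {"A": 4.0, "B": 2.0},
    "v1": {"A": 5.0, "B": 2.0, "C": 5.0},
    "v2": {"A": 1.0, "B": 4.0, "C": 1.0},
}


def mean_rating(user):
    """Average rating of a user (the r-bar term in Equation 1)."""
    values = ratings[user].values()
    return sum(values) / len(values)


def predict_user_based(user, item, sims):
    """Equation 1: user baseline plus similarity-weighted neighbor deviations.

    sims maps each neighbor v in N(u) to an assumed sim(u, v).
    """
    numerator = 0.0
    denominator = 0.0
    for neighbor, sim in sims.items():
        if item not in ratings[neighbor]:
            continue  # neighbor has no rating for this item
        numerator += sim * (ratings[neighbor][item] - mean_rating(neighbor))
        denominator += abs(sim)
    if denominator == 0:
        return mean_rating(user)  # fall back to the user's baseline
    return mean_rating(user) + numerator / denominator


# Assumed similarities: v1 is close to u, v2 only weakly similar.
prediction = predict_user_based("u", "C", {"v1": 0.8, "v2": 0.2})
print(round(prediction, 2))
```

Here user `u` has a baseline of 3.0, neighbor `v1` rated item C one point above their own mean, and `v2` rated it one point below, so the more similar neighbor pulls the prediction above the baseline.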

Item-Based Collaborative Filtering Rating Prediction

The corresponding item-based prediction model is expressed in Equation 2, where similarity among items guides the recommendation process.

$$
{\hat{r}}_{u, i} = \frac{\sum_{j \in N (i)} sim (i, j) \cdot r_{u, j}}{\sum_{j \in N (i)} | sim (i, j) |} \tag{2}
$$

Notation summary

• ${\hat{r}}_{u, i}$ - Predicted rating/user preference score for user $u$ on item $i$
• $N (i)$ - Set of items most similar to item $i$
• $sim (i, j)$ - Similarity between item $i$ and item $j$
• $r_{u, j}$ - Rating/interaction score of user $u$ for item $j$

Interpretation

In this formulation, recommendations are generated based on the similarity among items that the user has already interacted with. Unlike the user-based method, which compares users to one another, item-based collaborative filtering compares items using shared patterns of user engagement. This approach is computationally efficient and widely adopted in large-scale systems, such as Amazon’s item-to-item recommendation engine.
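The item-based prediction in Equation 2 reduces to a weighted average of the user's own ratings. The following minimal sketch uses hypothetical item names, ratings, and similarity values. Skipping neighbor items the user has not rated is an assumption made for this illustration.

```python
def predict_item_based(user_ratings, item_sims):
    """Equation 2: similarity-weighted average of the user's ratings.

    user_ratings: {item j: r_uj} for items the user has rated.
    item_sims: {item j: sim(i, j)} for the neighbors N(i) of target item i.
    Returns None when no neighbor item has been rated by the user.
    """
    numerator = 0.0
    denominator = 0.0
    for item, sim in item_sims.items():
        if item not in user_ratings:
            continue  # the user has not rated this neighbor item
        numerator += sim * user_ratings[item]
        denominator += abs(sim)
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical data: the user rated items B and C; item D is unrated.
user_ratings = {"B": 4.0, "C": 2.0}
sims_to_target = {"B": 0.9, "C": 0.3, "D": 0.8}
prediction = predict_item_based(user_ratings, sims_to_target)
print(round(prediction, 2))
```

Because item B is far more similar to the target than item C, the predicted score sits closer to the rating of 4.0 than to the rating of 2.0.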

Similarity metrics

Jaccard similarity

Jaccard Similarity measures the degree of overlap between two users or two items based on shared interactions. It is defined as the ratio of the intersection of item sets to their union, producing values between 0 and 1. A higher score indicates greater similarity. The relationship between intersection and union in Jaccard similarity is illustrated in Figure 3.

Figure 3. Illustration of Jaccard similarity showing intersection versus union of item sets.

This diagram visualizes how Jaccard similarity is computed by comparing the overlap between two sets (intersection) relative to their combined unique elements (union). It demonstrates how shared items between Movie A and Movie B contribute to their similarity score.

The Jaccard similarity formulation is shown in Equation 3.

$$
Jaccard (A, B) = \frac{| A \cap B |}{| A \cup B |} \tag{3}
$$

where:

• $A$ - Set of items associated with user or item $A$
• $B$ - Set of items associated with user or item $B$
• $| A \cap B |$ - Number of items common to both sets
• $| A \cup B |$ - Total number of unique items across both sets
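As a small illustration of Equation 3, the sketch below computes Jaccard similarity for two hypothetical sets of viewers. The set contents and the choice to return 0.0 for two empty sets are assumptions made for this example.

```python
def jaccard(a, b):
    """Equation 3: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)


# Hypothetical viewers of Movie A and Movie B.
viewers_a = {"u1", "u2", "u3", "u4"}
viewers_b = {"u3", "u4", "u5"}

score = jaccard(viewers_a, viewers_b)  # 2 shared viewers out of 5 unique
print(score)
```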

Euclidean distance

Euclidean distance measures the geometric distance between two users or items in a multidimensional rating space. Unlike correlation-based similarity measures, Euclidean distance represents dissimilarity, where a smaller value indicates stronger similarity between two profiles. As shown in Figure 4, Euclidean distance captures the dissimilarity between items based on squared rating differences.

Figure 4. Euclidean distance representation between two items based on user interaction patterns.

This diagram illustrates how Euclidean distance quantifies dissimilarity between items by measuring squared rating or interaction differences across shared users. A larger distance indicates more divergent user engagement patterns between Movie A and Movie B.

For two items $A$ and $B$, the distance is computed using Equation 4.

$$
dist (A, B) = \sqrt{\sum_{u = 1}^{n} {(r_{uA} - r_{uB})}^{2}} \tag{4}
$$

where:

• $r_{uA}$ - Rating or interaction value of user $u$ for item $A$
• $r_{uB}$ - Rating or interaction value of user $u$ for item $B$
• $n$ - Number of users who interacted with both items (the shared users over which the sum is taken)

Because taking the square root does not change the ordering of distances, practitioners sometimes use the squared distance form (Equation 5):

$$
{dist}^{2} (A, B) = \sum_{u = 1}^{n} {(r_{uA} - r_{uB})}^{2} \tag{5}
$$

Euclidean distance performs effectively in low-dimensional or moderately sized datasets, particularly when only a limited number of overlapping users or items exist.
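Equations 4 and 5 can be sketched as follows, assuming the sum runs over users who rated both items, as the description of Figure 4 suggests. The rating dictionaries are hypothetical.

```python
from math import sqrt


def squared_distance(ratings_a, ratings_b):
    """Equation 5: sum of squared rating differences over shared users."""
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return None  # no overlapping users, distance undefined here
    return sum((ratings_a[u] - ratings_b[u]) ** 2 for u in shared)


def euclidean_distance(ratings_a, ratings_b):
    """Equation 4: square root of the squared distance."""
    squared = squared_distance(ratings_a, ratings_b)
    return None if squared is None else sqrt(squared)


# Hypothetical ratings: user -> rating for Movie A and Movie B.
movie_a = {"u1": 5.0, "u2": 3.0, "u3": 4.0}
movie_b = {"u1": 1.0, "u2": 3.0, "u3": 1.0, "u4": 5.0}

distance = euclidean_distance(movie_a, movie_b)
print(distance)
```

User `u4` rated only Movie B and is therefore ignored. The remaining differences are 4, 0, and 3, which give a squared distance of 25.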

Cosine similarity

Cosine similarity measures the cosine of the angle between two vectors representing user or item profiles in a multidimensional space. It captures how closely aligned two entities are in direction, irrespective of magnitude, making it highly suitable for sparse, high-dimensional datasets such as movie ratings or user-item interaction logs. A vector-space interpretation of cosine similarity is illustrated in Figure 5, showing how the angle between two item vectors determines their similarity.

Figure 5. Cosine similarity representation showing the angle $θ$ between two movie vectors.

This diagram visualizes cosine similarity in a vector space, illustrating how the angle between item vectors (Movie A and Movie B) determines similarity. A smaller angle indicates stronger alignment between rating patterns, while a larger angle indicates divergence.

For two items $A$ and $B$, cosine similarity is defined in Equation 6.

$$
sim (A, B) = \cos (θ) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{6}
$$

where $A$ and $B$ represent item (or user) vectors, $A \cdot B$ denotes their dot product, and $\lVert A \rVert$ and $\lVert B \rVert$ denote their vector magnitudes.

The cosine similarity score ranges between -1 and 1:

• $θ = 0^{\circ}$ (same direction) ⇒ similarity $= 1$
• $θ = 90^{\circ}$ (orthogonal) ⇒ similarity $= 0$
• $θ = 180^{\circ}$ (opposite direction) ⇒ similarity $= - 1$

The directional interpretation of cosine angles is summarized in Table 1.

Table 1. Cosine similarity interpretation.

$θ$	Direction	$cos (θ)$
$0^{\circ}$	Same	1
$90^{\circ}$	Orthogonal	0
$180^{\circ}$	Opposite	-1
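The three cases in Table 1 can be reproduced with a short sketch of Equation 6. The vectors are hypothetical two-dimensional examples, and returning 0.0 for a zero-length vector is an assumed convention, since the formula is undefined in that case.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Equation 6: dot product divided by the product of vector magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2], [2, 4])    # angle of 0 degrees
orthogonal = cosine_similarity([1, 0], [0, 3])        # angle of 90 degrees
opposite = cosine_similarity([1, 2], [-1, -2])        # angle of 180 degrees

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Note that `[1, 2]` and `[2, 4]` differ in magnitude but not in direction, so their similarity is 1 up to floating-point rounding.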

Building a recommender system (IMDB dataset)

Data source

The recommender framework was developed using the IMDB Extensive Dataset available on Kaggle.⁸ This dataset provides comprehensive metadata such as movie titles, genres, release information, production studios, and user-generated ratings, making it suitable for collaborative filtering research.

The complete implementation, including preprocessing scripts and model code, is publicly available in the author’s Zenodo repository.¹³

Data preparation

Two primary data files, one containing movie attributes and another containing user ratings, were merged using the movie identifier as the key field. Missing or inconsistent observations were removed to reduce noise and minimize sparsity in the user-item rating matrix.

Categorical attributes (e.g., language, genre) were encoded using binary indicator variables. For instance, English-language films were encoded as 1, and non-English films as 0; similarly, each genre category was assigned an individual binary flag. After data cleaning and dimensionality reduction, the final working dataset consisted of approximately 65,000 observations and 80 predictor variables. Trends in movie ratings and review volume over time are summarized in Figure 6, which illustrates how viewer engagement has evolved across decades. Demographic differences in genre preferences are summarized in Figure 7, which compares average movie ratings across four major age groups.

Figure 6. Average rating and number of reviews per year in the IMDB dataset.

This scatter plot shows how movie popularity (measured by number of reviews) and average ratings evolve over time. Darker points represent higher average ratings, highlighting trends in viewer engagement across different decades.

Figure 7. Average movie rating per genre across age demographics.

This figure compares how different age groups (0–18, 18–30, 30–45, and 45+) rate movies across various genres. Each panel represents one demographic segment, showing variations in genre preferences and average rating patterns across age groups.
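The merging and encoding steps described under Data preparation can be sketched with pandas. Everything in this example is hypothetical: the column names (`movie_id`, `language`, `genre`, `avg_vote`), the three-row tables, and the single-genre assumption are illustrative and do not reflect the actual schema of the Kaggle files or the authors' preprocessing script.

```python
import pandas as pd

# Hypothetical miniature tables standing in for the metadata and ratings files.
movies = pd.DataFrame({
    "movie_id": ["m1", "m2", "m3"],
    "title": ["Movie One", "Movie Two", "Movie Three"],
    "language": ["English", "French", "English"],
    "genre": ["Horror", "Drama", None],  # m3 has a missing genre
})
ratings = pd.DataFrame({
    "movie_id": ["m1", "m2", "m3"],
    "avg_vote": [6.2, 7.1, 5.5],
})

# Merge on the movie identifier, then drop rows with missing values.
merged = movies.merge(ratings, on="movie_id", how="inner")
merged = merged.dropna().reset_index(drop=True)

# Binary language flag: 1 for English, 0 otherwise.
merged["is_english"] = (merged["language"] == "English").astype(int)

# One binary indicator column per genre category.
genre_flags = pd.get_dummies(merged["genre"], prefix="genre").astype(int)

features = pd.concat(
    [merged[["movie_id", "avg_vote", "is_english"]], genre_flags], axis=1
)
print(features)
```

The row with the missing genre is removed before encoding, so the resulting feature table holds two movies with one language flag and two genre flags.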

Model framework

The recommender model is built on collaborative filtering, using cosine similarity as the primary similarity metric. Each movie is represented as a vector in a multidimensional feature space derived from metadata and user-rating attributes.

Model workflow

1. Data Integration: Merge movie metadata and user-rating tables.
2. Pre-processing: Remove missing values, encode categorical variables, normalize numeric features.
3. Feature Engineering: Retain key predictors such as release year, user vote counts, genre indicators, and language attributes.
4. Similarity Computation: Compute cosine similarity across the item-item matrix.
5. Recommendation Generation: Rank all movies by similarity and return Top-$N$ recommendations.

System architecture

The end-to-end workflow of the recommender system is illustrated in Figure 8, showing the stages of data collection, preprocessing, model computation, and recommendation generation.

Figure 8. Recommender system workflow from data ingestion to recommendation generation.

This diagram outlines the end-to-end pipeline used in the recommender system implementation, including data collection and merging, preprocessing, construction of item and user feature matrices, cosine similarity computation, and generation of Top-N recommended items.

Implementation summary

The Python 3.10 implementation uses pandas, numpy, and scikit-learn. Full code is available in the GitHub repository.⁹

Execution steps

1. Construct the movie-feature matrix.
2. Compute the cosine similarity matrix.
3. Develop a scoring function to retrieve Top-$N$ matches.
4. Validate recommendations using IMDB metadata.
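The first three execution steps can be sketched with numpy alone. This is a minimal illustration under stated assumptions, not the repository code: the titles, the three-column feature matrix, and the helper names `cosine_similarity_matrix` and `top_n` are hypothetical. In practice, `sklearn.metrics.pairwise.cosine_similarity` from scikit-learn could replace the hand-written matrix function.

```python
import numpy as np

# Step 1: hypothetical movie-feature matrix (one row per movie).
titles = ["Seed", "Close match", "Partial match", "Unrelated"]
features = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.9],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])


def cosine_similarity_matrix(x):
    """Step 2: pairwise cosine similarity between all rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    unit = x / norms
    return unit @ unit.T


def top_n(seed_index, sim, n):
    """Step 3: return the n most similar movies, excluding the seed itself."""
    scores = sim[seed_index].copy()
    scores[seed_index] = -np.inf  # never recommend the seed movie
    order = np.argsort(scores)[::-1][:n]
    return [(titles[i], float(scores[i])) for i in order]


sim = cosine_similarity_matrix(features)
recommendations = top_n(0, sim, n=2)
for title, score in recommendations:
    print(f"{title}: {score:.3f}")
```

Masking the seed's own score matters because every movie has a similarity of 1 with itself and would otherwise always rank first.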

Illustrative output

To demonstrate system behavior, Saw (2004) was selected as the seed movie. The model returned strongly aligned horror titles such as The Silent Scream and Catacombs, indicating good thematic coherence.

Results

The recommender system was evaluated on a curated IMDB movie dataset using cosine-similarity-based collaborative filtering. The evaluation focuses on whether suggested movies align with behavioral and thematic tendencies observed in the user’s previously rated or viewed content.

Recommendation output

Based on the input movie Saw (2004), the system retrieved the top five movies exhibiting the highest cosine-similarity values. The recommendation outcomes are shown in Table 2.

Table 2. Recommendation output for seed movie “Saw (2004)”.

Seed movie	Recommended title	Cosine similarity
Saw (2004)	The Silent Scream (2005)	0.987
	Catacombs (2007)	0.984
	House of 9 (2005)	0.973
	The Human Centipede (2009)	0.968
	Hostel (2005)	0.962

Interpretation of findings

The recommender system successfully proposed closely related horror and psychological-thriller films, such as The Silent Scream and Catacombs, for a viewer who watched Saw. This demonstrates strong genre coherence and relevance in the generated suggestions.

The recommendations exhibit high internal consistency, indicating that cosine similarity effectively captures vector-space proximity between movies with comparable thematic and stylistic characteristics. The similarity scores, each approaching 1.0, reflect minimal angular separation between vectors, implying substantial shared audience engagement patterns. This alignment between the watched movie and the recommended titles is illustrated in Figure 9.

Figure 9. Watched versus recommended movies generated by the recommender system for a sample user.

This diagram contrasts a user’s previously watched movie (“Saw”, 2004) with the system-generated recommendations. The suggested movies share strong thematic and stylistic similarities with the watched title, illustrating how cosine similarity captures genre alignment and audience-engagement patterns in the model’s output.
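To make the phrase "minimal angular separation" concrete, the scores reported in Table 2 can be converted back to angles by inverting Equation 6. The sketch below only re-expresses the published similarity values and introduces no new results.

```python
from math import acos, degrees

# Cosine similarity scores as reported in Table 2.
table2_scores = {
    "The Silent Scream (2005)": 0.987,
    "Catacombs (2007)": 0.984,
    "House of 9 (2005)": 0.973,
    "The Human Centipede (2009)": 0.968,
    "Hostel (2005)": 0.962,
}

# Invert cos(theta) to recover the angle between each title and the seed.
angles = {title: degrees(acos(score)) for title, score in table2_scores.items()}

for title, angle in angles.items():
    print(f"{title}: {angle:.1f} degrees")
```

The resulting angles fall between roughly 9 and 16 degrees, compared with 90 degrees for orthogonal vectors.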

Practical implications

The results indicate that a collaborative-filtering approach, supported by structured metadata and a lightweight similarity metric, can approximate human perception of content relatedness.

The framework is scalable and applicable to multiple domains beyond movies, including music, e-commerce, streaming services, and digital platforms. Item metadata (such as brand, category, or stylistic attributes) can seamlessly replace movie features in domain-specific deployments.

These findings reinforce the viability of similarity-based collaborative filtering as a practical and high-interpretability recommendation strategy for industrial systems.

Discussion

The evaluation results demonstrate that cosine-similarity-based collaborative filtering can reproduce genre-consistent and thematically aligned recommendations using a relatively simple feature representation. The close alignment between the recommended titles and the seed movie Saw (2004) indicates that vector-space similarity captures meaningful behavioral patterns that extend beyond explicit metadata. This suggests that item-item proximity in rating space can implicitly encode narrative style, pacing characteristics, and audience affinity, even when these attributes are not explicitly modeled.

These findings are consistent with prior work that has shown the effectiveness of item-item collaborative filtering in sparse environments.³ Similar to observations by Bobadilla et al.,⁶ the model benefits from the fact that cosine similarity emphasizes directional alignment rather than absolute magnitude, making it well-suited for datasets such as IMDB where users interact with only a small fraction of available items. The high internal consistency of similarity scores supports prior evidence that neighborhood-based methods can be competitive benchmarks against more complex latent factor models when interpretability and computational efficiency are required.

The results also highlight the practical utility of similarity-based recommenders as scalable and domain-agnostic tools. Because the model relies on structural patterns in user-item data, it can be deployed in applications such as e-commerce, music streaming, news personalization, or digital media platforms with minimal architectural modifications. The workflow demonstrated here serves as a transparent baseline system that can be implemented rapidly while still offering actionable personalization insights.

Nevertheless, the study also exposes constraints associated with neighborhood-based collaborative filtering. The observed performance is influenced by sparsity in user rating behavior, and the model does not explicitly correct for individual rating bias, which can skew similarity computations. Additionally, similarity-based recommenders inherently struggle with the cold-start problem for new items or users lacking historical data. These limitations motivate further research into hybrid systems that integrate metadata-driven signals or latent embedding methods to enhance robustness.

Overall, the results reaffirm that lightweight similarity-based approaches remain powerful tools for recommendation tasks, especially when transparency and operational simplicity are prioritized. The system presented here provides a strong foundation upon which more advanced or domain-specific enhancements can be developed.

Conclusions

This study presented a comprehensive overview of recommender systems, their foundational principles, and their role in modern digital ecosystems. We outlined the conceptual framework of user-item, item-item, and user-user relationships that underpin recommendation algorithms, followed by collaborative filtering fundamentals and similarity measures such as Jaccard, Euclidean, and Cosine metrics.

Using publicly available IMDB data, a case study demonstrated the practical implementation of these concepts. The resulting system successfully identified and ranked movies similar to a user-provided title, confirming the ability of a similarity-based model to replicate genre associations through vector-space analysis. The findings reinforce that even a simple similarity-driven framework can effectively model user preferences and generate contextually relevant recommendations.

Future extensions could incorporate hybrid architectures that leverage both collaborative and content-based signals, as well as temporal and contextual features to improve personalization accuracy and robustness.

Limitations and future work

While the proposed framework effectively demonstrates similarity-based collaborative filtering, several methodological limitations exist. First, the use of linear similarity assumptions may oversimplify complex nonlinear preference patterns observed in real-world behavior.¹⁰ Second, reliance on pairwise similarity metrics restricts the model’s ability to learn latent representations of users and items.¹¹ Third, the absence of normalization for individual bias and variance may introduce skewness in affinity scoring, particularly for users with extreme rating tendencies or highly popular items. Additionally, the framework does not address the “cold start” problem associated with new users or items that lack historical interaction data.¹² Future research could address these limitations by employing matrix factorization or neural embedding techniques,¹⁰ probabilistic models to capture uncertainty and behavior variability,¹² and hybrid recommenders that fuse collaborative filtering with contextual and content-aware learning to improve scalability and generalizability.

Software availability

Source code available from: https://github.com/vinoalles/Recommender_System

Archived source code available from: https://doi.org/10.5281/zenodo.17822412

License: MIT License (OSI-approved)

Ethics and consent

No human subjects, private data, or biological specimens were involved.

Data availability

All processed datasets used in this study are openly available on Zenodo:

Gunasekaran, V. (2025). Similarity-Based Metadata Recommender System – Processed Feature Dataset.

Zenodo. https://doi.org/10.5281/zenodo.17822412.¹³

This deposit includes:

• imdb_mapping.csv – minimal ID/title/year table
• imdb_processed_features_fixed.csv – engineered feature matrix
• X_scaled.npy – standardized matrix used by the KNN model

No proprietary IMDb ratings, votes, cast/crew, or synopsis data are included.

All data are author-generated derivatives of non-copyrighted fields.

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Reporting guidelines

This study does not involve clinical trials, human participants, animals, or qualitative research, and therefore does not require CONSORT, STROBE, ARRIVE, or COREQ/SRQR reporting checklists. The article follows the general reproducibility and transparency standards recommended by F1000Research for computational research.

References

1. Resnick P, Varian HR: Recommender systems. Commun. ACM. 1997; 40(3): 56–58.
2. McKinsey & Company: The power of personalization. 2013.
3. Linden G, Smith B, York J: Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 2003; 7(1): 76–80.
4. Davidson J, Liebald B, Liu J, et al.: The YouTube video recommendation system. Proceedings of the 4th ACM Conference on Recommender Systems (RecSys). ACM; 2010; pages 293–296.
5. Bennett J, Lanning S: The Netflix prize. Proceedings of KDD Cup and Workshop. 2007.
6. Bobadilla J, Ortega F, Hernando A, et al.: Recommender systems survey. Knowl.-Based Syst. 2013; 46: 109–132.
7. Said A, Berkovsky S, De Luca EW: A comparative study on recommender system algorithms. Proceedings of the 2012 ACM Conference on Recommender Systems. ACM; 2012; pages 445–446.
8. simhyunsu: IMDb extensive dataset. 2024.
9. Gunasekaran V: Recommender system implementation repository. https://github.com/vinoalles/Recommender_System
10. Koren Y, Bell R, Volinsky C: Matrix factorization techniques for recommender systems. Computer. 2009; 42(8): 30–37.
11. Hofmann T: Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS). 2004; 22(1): 89–115.
12. Salakhutdinov R, Mnih A: Probabilistic matrix factorization. Advances in Neural Information Processing Systems (NIPS). 2008; pages 1257–1264.
13. Gunasekaran V: Similarity-Based Metadata Recommender System – Processed Feature Dataset. Zenodo. 2025. https://doi.org/10.5281/zenodo.17822412

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Published: 18 Dec 2025, 14:1409

https://doi.org/10.12688/f1000research.174439.1

Copyright

© 2025 Gunasekaran V and Elango I. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Gunasekaran V and Elango I. Recommender Systems: A Data-Driven Framework for Personalized Decision Intelligence [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2025, 14:1409 (https://doi.org/10.12688/f1000research.174439.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 05 Jan 2026

Yuri Ariyanto, Politeknik Negeri Malang, Malang, Indonesia

Approved with Reservations

https://doi.org/10.5256/f1000research.192341.r445608

1. The research focus is not clear; the study should be reframed as an applied case study with a comprehensive quantitative evaluation.
2. The research baseline cannot be found; it could be stated in a table. Include baseline methods, standard evaluation metrics, and statistical analysis so that the research is sufficiently detailed.
3. Clearly articulate the limitations and trade-offs compared to modern approaches, for example hybrid collaborative filtering.
4. Strengthen the discussion and conclusion sections following a rigorous scientific structure. The discussion of cold-start and scalability limitations is not clear; rewrite the conclusions to reflect actual findings and future directions.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Recommender Systems, Hybrid Recommendation, Similarity Metrics, Robustness, and Reliability Data in the Recommender Systems

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 29 Dec 2025

Kulvinder Singh, Kurukshetra University, Kurukshetra, Haryana, India

Not Approved

https://doi.org/10.5256/f1000research.192341.r443430

The authors must specify the new contributions, whether methodological, analytical, or empirical. Without such an explanation, the manuscript can be read as a tutorial or replication study rather than a piece of original research. It is therefore important to show how the proposed structure advances the current state of the art, as opposed to restating already proven recommender system pipelines.
How do the authors justify the claim that the framework delivers intelligent/optimised recommendations?
No standard recommender-system measures (Precision@K, Recall@K, MAP, NDCG, RMSE) are reported in the manuscript. Thus, there is no objective evidence that the given framework performs competitively, or even adequately, in comparison with the most basic baselines.
How does the model handle user rating bias, scale differences, or normalisation? These factors are well known to significantly affect similarity computation.
The IMDB data has both strong popularity bias and long-tail sparsity, yet the manuscript treats the data as neutral. The authors do not analyse the effect of these biases on similarity scores or recommendation outcomes, nor do they suggest any mitigation measures. There is therefore a need to elaborate on how the framework will counter the inherent popularity bias and long-tail sparsity of the IMDB data.
The cold-start issue is acknowledged only briefly, and the framework offers no resolution. It is necessary to explain how the system supports new users or new items, since cold-start is a major real-life problem.
The computation of item-item similarity scales quadratically with the catalogue size, rendering the strategy impractical for large catalogues unless approximation is used. Leaving out approximate methods (e.g. locality-sensitive hashing, ANN search) is a serious technical omission.
Which baseline procedures were considered, and why are there no comparative experiments?
To a great extent, the manuscript avoids engaging with modern recommender paradigms. What criteria could help readers decide when this approach is more appropriate than contemporary ones?
Several figures are descriptive rather than analytical and could therefore be removed without loss of technical content.
The discussion section repeats previously stated ideas and does not critically interpret the results.
Rewrite the Conclusion: restate your hypothesis or research question, restate your major findings, and explain the relevance and added value of your work. Highlight any limitations of your study, and describe future directions for research and recommendations.
The reference list is poor.
The results of the work should be compared with the results of other investigators and/or other methodologies, experimental results or even simulated results. This is needed to place this work in perspective with other work in the field and provide more credibility for the present results.

Is the work clearly and accurately presented and does it cite the current literature?
No

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Social Networking, Recommendation System

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
