Keywords
Basic motor skills, fundamental movements, machine learning, motion detection, prediction techniques
In basic motor skills evaluation, two observers may rate the same child’s performance differently. When systematic, such discrepancies introduce serious noise into the assessment. New motion-sensing and tracking technologies offer more precise measures of these children’s capabilities. We aimed to review the current development, validation and use of artificial intelligence-related technologies that assess basic motor skills in children aged 3 to 6 years.
We performed a scoping review in Medline, EBSCO, IEEE and Web of Science databases. PRISMA Extension recommendations for scoping reviews were applied for the full review, whereas the COSMIN criteria for diagnostic instruments helped to evaluate the validation of the artificial intelligence (AI)-related measurements.
We found 672 studies, of which 12 were finally selected: 7 related to development and validation and 5 related to use. From the 7 technology development studies, we examined their citation networks using Google Scholar and identified 10 subsequent peer-reviewed publications that either enhanced the original technologies or applied them in new research contexts. Studies on AI-related technologies have prioritized development and technological features. The validation of these algorithms was based on engineering standards, focusing on their accuracy and technical performance, but without integrating medical and psychological knowledge about children’s motor development. They also did not consider the technical characteristics that are typically assessed in psychometric instruments designed to assess motor skills in children (e.g., the Consensus-based Standards for the Selection of Health Measurement Instruments “COSMIN”). Therefore, the use of these AI-related technologies in scientific research is still limited.
Clinical measurement standards have not been integrated into the development of AI-related technologies for measuring basic motor skills in children. This compromises the validity, reliability and practical utility of these tools, so future improvement in this type of research is needed.
In this second version of the manuscript, we implemented minor adjustments as requested during editorial review and by the external reviewers.
The development of basic motor skills (BMS) in children aged 3 to 6 years is critical, as this is a period of rapid motor growth, where children acquire physical skills that allow them to participate in a variety of activities.1,2 At this age, children experience significant improvements in gross motor control, allowing them to perform movements such as running, jumping, and manipulating objects with greater precision.3,4 The acquisition of these motor skills is essential for physical, cognitive and emotional development, as BMS are strongly linked to general well-being, self-esteem and social integration.5 For example, children whose BMS are stimulated tend to participate more in physical activities (e.g., school games and sports), suggesting socioemotional and health benefits such as early prevention of obesity.6 Likewise, some studies have designed, implemented, and recommended early interventions to promote healthy BMS development in preschool children.7 To evaluate the efficacy of these interventions and to monitor the optimal development of BMS in children, valid and reliable measurement tools are needed. Typically, BMS assessment relies on trained professionals who observe, record, and score children’s performance on specific motor tasks.8,9 However, a major challenge in this approach is observer bias. Even when raters receive standardized training, small differences in scoring can introduce variability in BMS measurements. This variability reduces the accuracy of the assessment and can lead to misinterpretations. For example, two children with similar motor skills may receive different scores depending on the assessor, leading to inconsistent outcomes.
When these inconsistencies follow a systematic pattern, they contribute to observer bias, a well-documented source of measurement error.10,11 In fact, one review reported that of 960 behavioral studies, only 3.2% reported measures of interobserver reliability, and only 1.9% met rigorous criteria for minimizing bias.12 Similarly, another review on child development found that the quality of reporting on the use of assessors in these studies was poor and that variability in assessor performance may obscure the true developmental status of children, compromising complex and costly clinical decisions.13
AI-related technologies (i.e., computational systems that use artificial intelligence to analyze, learn from, and interpret data) offer a promising alternative to minimize observer bias in BMS assessment.14 For example, for motion capture and analysis, computer vision tools such as OpenPose, MediaPipe and DeepLabCut enable pose estimation and tracking of key points of the human body with high accuracy.15 In addition, deep learning techniques, such as Convolutional Neural Networks (CNN) and vision-specialized Transformer Models (ViT), have proven to be effective in classifying motion sequences in videos.16 In that sense, these AI-related technologies for recognizing and classifying human motion patterns consist of several steps ( Figure 1).17 First, sensor or video devices capture data on human movement. Then, these data undergo pre-processing to reduce noise and enhance relevant features. This step often involves filtering techniques, such as Fast Fourier Transformation (which helps separate important movement signals from background noise) or wavelet transforms. Additionally, to simplify complex data and highlight key movement patterns, methods like principal components analysis (which reduces data dimensions while preserving essential information) or linear discriminant analysis (which enhances the distinction between movement categories) are applied.17 Next, feature selection methods come into play, determining a subset of features from the initial set that is highly suitable for subsequent classification while adhering to various optimisation criteria. Among the efficient methods for feature selection are Sequential Forward Selection, which starts with an empty set and iteratively adds the feature that best meets the optimisation criterion, and Backward Selection, which involves removing features from the set in a repetitive manner. 
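The pre-processing and dimensionality-reduction steps described above can be sketched in a few lines of Python. This is an illustrative toy example (a synthetic joint-angle signal and a hypothetical feature matrix), not code from any of the reviewed studies:

```python
import numpy as np

def lowpass_fft(signal, cutoff_bins):
    """Suppress high-frequency noise by zeroing FFT bins above a cutoff."""
    spectrum = np.fft.rfft(signal)
    spectrum[cutoff_bins:] = 0          # keep only low-frequency components
    return np.fft.irfft(spectrum, n=len(signal))

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data yields the principal axes (rows of vt)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Simulated joint-angle signal: a slow movement plus sensor noise
t = np.linspace(0, 2, 200)
raw = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(200)
smooth = lowpass_fft(raw, cutoff_bins=10)

# Reduce 50 hypothetical 12-dimensional feature vectors to 3 dimensions
features = np.random.randn(50, 12)
reduced = pca_reduce(features, n_components=3)
print(smooth.shape, reduced.shape)
```

Real pipelines would of course tune the cutoff to the sampling rate and choose the number of components from the explained variance, but the structure (filter, then reduce, then select) is the one outlined in Figure 1.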
Finally, AI or machine learning classifiers are required to identify the corresponding class of motion; in our case, a class that reflects the BMS development of a child (e.g., delayed, normal or advanced for their age group). Machine learning tools include binary classification trees, decision engines, Bayes classifiers, k-Nearest Neighbour, rule-based approaches, linear discriminant classifiers and Support Vector Machines. More sophisticated deep learning tools, such as neural networks, are also used. From here onwards, we use the expression ‘AI-related technology’ interchangeably to refer to the full process described in Figure 1 or just the classification tools.
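As an illustration of this classification step, a minimal k-Nearest Neighbour classifier can be written directly in NumPy. The features, labels and class meanings below are hypothetical and for illustration only, not taken from any reviewed study:

```python
import numpy as np

def knn_classify(train_X, train_y, query, k=3):
    """Assign the majority class among the k nearest training samples."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical 2-D features (e.g., jump height, run cadence) with labels
# 0 = delayed and 1 = typical motor development
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

label = knn_classify(X, y, np.array([2.8, 3.1]))
print(label)  # the query lies in the "typical" cluster, so class 1
```

The production systems reviewed here use richer feature sets and more sophisticated classifiers, but the decision logic (map a movement feature vector to a developmental class) is the same.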
The application of AI-related technology in physical performance assessment is rapidly increasing.18 For example, machine learning techniques have been used to assess physical activity intensity in adults.19,20 A recent review identified at least 53 studies on motion detection using deep learning or machine learning, with 75% of these studies published since 2015.21 AI has also been applied to detect gait abnormalities22 and diagnose health conditions related to walking patterns,23,24 as well as to identify early motor skill impairments linked to neurodevelopmental disorders.25 Other AI-based algorithms have been implemented to evaluate psychomotor learning performance in students.26,27 However, despite the increasing use of AI in motor performance assessment, there is no comprehensive review examining its specific application in the assessment of BMS in preschool children, even though this is a crucial stage for early detection and intervention. Moreover, the scope, limitations and validity of AI-based technologies in this context have not yet been clearly systematized. Therefore, it is necessary to synthesize existing knowledge and guide the development of more accurate and accessible assessment tools.
In this study, we aimed to perform a scoping review on studies related to the development and use of AI-related technologies to assess BMS in children. Our objectives were to: 1) determine the general characteristics of the studies; 2) describe the engineering of the AI technologies designed to assess BMS in preschoolers; 3) determine the substantive validation performed on the AI technologies identified, and 4) describe the current use of these AI technologies in applied research.
The protocol for this review is available here.28 The PRISMA Extension recommendations for scoping reviews were applied for the full review, whereas the COSMIN guidelines were applied for objective 2.29,30 The checklists of these guidelines can be found here.31
We were interested in published studies focused on engineering, substantive validation, or the use of AI-related technologies developed to evaluate BMS in children. A study was focused on engineering when it was strictly dedicated to developing algorithms for motor skills recognition and classification. A study was focused on substantive validation when the validity and reliability of the AI-related technology were evaluated following international psychometric standards.32 A study only used AI-related technology when it included neither engineering nor validation; in other words, it simply applied technology developed by others.
We also defined the following criteria for the search: 1) studies in preschool-aged children (3 to 6 years), 2) studies in which the motor ability (motor or play skills) of the child was assessed using AI-related technologies for motion detection, and 3) studies in which at least one of the basic motor skills described in the literature (running, jumping, kicking, throwing, or catching a ball) was measured. In addition, we excluded 1) studies that did not clearly describe the AI-related technology used or developed, 2) opinion articles, editorials, or narrative reviews without empirical data and 3) grey literature (e.g. theses, dissertations, or non-peer-reviewed reports).
We searched for studies published before January 30, 2023 in the target publications in Medline (SCR_002185), Web of Science (SCR_022706), IEEE (SCR_008314), and EBSCO (SCR_022707). These databases were selected because they specialize in biomedical, engineering, and multidisciplinary research, ensuring that we captured relevant studies in health sciences, AI applications, and motion analysis.
Search terms included keyword combinations such as “child,” “preschool,” “basic motor skills,” “artificial intelligence,” “motion sensing,” and “calibration,” along with related terms and synonyms identified through a preliminary literature review (keywords) and controlled vocabulary (MeSH terms). The full search strategy and complete list of search terms are available here.33
The search formulas were applied to the databases and all the files were exported in RIS format. Then, to ensure an objective selection process, these identified files were uploaded to the Rayyan platform which facilitated blind selection by the reviewers and expedited the identification of duplicates.
The selection process consisted of two phases. In the first phase, titles and abstracts were reviewed by two independent groups (each consisting of two previously trained medical students). To minimize selection bias, the Rayyan blinding function was used, which prevented reviewers from identifying the decisions of the other reviewers until the final selection phase. In addition, allocation of studies to reviewers was randomized within each group to further reduce potential bias. In case of disagreement, a consensus discussion was held among the reviewers. If consensus could not be reached, the principal investigator made the final inclusion decision.
In the second phase, a full-text review was performed following the same procedure, ensuring consistency and methodological rigor. The final set of studies was determined after resolving all discrepancies through consensus discussions and the intervention of the principal investigator.
Additionally, we mapped those studies that updated or used the AI-related technology identified as engineered and validated in the previous step, by exploring the citations/references reported in the latter.
Data extraction was performed in a structured manner using a pre-designed form.34 To reduce errors and improve the accuracy of the extracted data, one peer reviewer performed the initial extraction and a second peer independently verified the information. Any discrepancies in the extraction were reviewed jointly and/or resolved with the intervention of the principal investigator. Cross-checks were implemented to ensure the consistency of the information collected. The form included data about the general characteristics of the studies, the engineering of the AI-related technologies, the substantive validation of these technologies, and their current use for BMS assessment in children.
1. General information: First author of the study, country of the study, year of publication, number and sex of participants, health condition (e.g., children with a medical condition that could influence their motor skills).
2. Engineering: Motion capture interface type, basic composition of technologies, system used for motion capture, type of programming language used for system development or modelling, and technology accessibility.
3. Substantive validation: Type of technology developed and validated, validation method, data collection methods, data for COSMIN (see next section), feasibility and usability of the technology.
4. Use: Type of technology used, training of the evaluation team, reported technology reliability, limitations during the technology use, advantages of the technology application, complementary tools, reference to a publication on the technology used.
All data collected were summarised as categorical variables, organised and presented in tables, using descriptive statistics such as simple frequencies and percentages. Since this was a scoping review, a narrative synthesis was used to summarize the findings of the studies, focusing on the characteristics and psychometric properties evaluated according to COSMIN standards.
The COSMIN standards were applied to assess the technical quality of the substantive validation of the AI-related technologies for BMS evaluation.27 In practice, these technologies (e.g., algorithms) work like psychometric tests (e.g., producing similar BMS measurements); thus, the former can be ‘substantively validated’ as the latter usually are. COSMIN is an international standard for reviewing the technical quality of validation studies of psychometric tools (e.g., tests for measuring BMS).
To perform the COSMIN assessment, two investigators independently assessed and scored eight psychometric properties or indicators (content validity, internal consistency, structural validity, reliability, measurement error, criterion validity, construct validity, and responsiveness). Each indicator was evaluated according to the checklist proposed by Mokkink et al.35 For this study, we scored as follows: 1 = N.A., 2 = inadequate, 3 = doubtful, 4 = adequate and 5 = very good. A total score was calculated for each indicator, keeping the same levels for interpretation (very good, adequate, doubtful, inadequate, N.A.). All results from the COSMIN assessment are presented in a table.
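As a sketch of how such indicator-level scores might be aggregated, the snippet below applies one conservative rule, letting the lower of the two reviewers’ scores prevail, in the spirit of COSMIN’s “worst score counts” principle. The ratings are invented for illustration, and the actual consensus procedure of this review may differ:

```python
# Scoring scheme described above:
# 1 = N.A., 2 = inadequate, 3 = doubtful, 4 = adequate, 5 = very good
SCALE = {1: "N.A.", 2: "inadequate", 3: "doubtful", 4: "adequate", 5: "very good"}

ratings = {  # indicator -> (reviewer 1 score, reviewer 2 score); illustrative
    "content validity": (4, 5),
    "reliability": (3, 3),
    "measurement error": (1, 2),
}

for indicator, (r1, r2) in ratings.items():
    # Conservative consensus: the lower of the two scores prevails
    consensus = min(r1, r2)
    print(f"{indicator}: {SCALE[consensus]}")
```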
We identified 672 studies in the first search step, of which 12 were finally selected. Among these studies, five focused on AI-related technology use, while seven focused on AI-related technology engineering and/or validation ( Figure 2).
During the last decade, most studies were performed in Asian and European countries (n=9/12, 75%) ( Table 1). Almost all studies were carried out in children of both sexes (n=9/12, 75%), and only one focused on children with some type of motor problem.
To capture the child’s movement, researchers mostly used simple devices such as digital video cameras (n=5/7, 71.4%) ( Table 2). More sophisticated devices were less common, such as sensors attached to the body (n=2/7, 28.6%) or multimedia devices connected to personal computers (n=2/7, 28.6%). The software used differed across studies. The most common type of AI-related technology was machine learning tools for movement pattern recognition (n=4/7, 57.1%), while deep learning algorithms were rarely used (n=1/7, 14.3%). Only a few of these tools are freely accessible (n=2/7, 28.6%). Most code was implemented in Python (SCR_008394) and supported by libraries such as OpenGL (which produces 2D and 3D graphics)36–38 and NumPy (SCR_008633) (which provides vectors, matrices, and mathematical functions), which help to process images captured in real time and obtain an accurate representation of the movement.
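To illustrate the kind of frame-level processing these camera-based pipelines perform, the sketch below computes a crude motion-energy measure by frame differencing in NumPy. The frames are synthetic stand-ins for camera input, and no reviewed system is implied to use exactly this method:

```python
import numpy as np

def motion_energy(prev_frame, frame, threshold=25):
    """Fraction of pixels whose intensity changed by more than `threshold`
    between consecutive frames -- a crude proxy for body movement."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float((diff > threshold).mean())

# Two synthetic 8-bit grayscale frames standing in for camera input
rng = np.random.default_rng(1)
frame_a = rng.integers(0, 256, (120, 160), dtype=np.uint8)
frame_b = frame_a.copy()
frame_b[40:80, 50:110] = 255          # a region that "moved" between frames

print(round(motion_energy(frame_a, frame_b), 3))
```

In a real system this statistic would feed the pre-processing and feature-extraction stages described earlier, rather than being used directly for classification.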
For the COSMIN evaluation, we considered the seven studies that performed a substantive validation of AI technologies ( Table 3). More than half of the studies reported an adequate evaluation of content validity (n=4/7, 57.1%), and single studies reported adequate evaluations of reliability (n=1/7, 14.3%) and construct validity (n=1/7, 14.3%). However, other measurement properties, such as structural validity, measurement error and responsiveness (5 of the 8 COSMIN properties, 62.5%), were inadequately evaluated or not evaluated at all in the studies, according to COSMIN standards. It was not unusual for a study to declare a formal evaluation of a psychometric property (e.g., reliability) and then report no final results for it.
In studies using AI-related technology, the children’s movements were captured by trained personnel (n=2/5, 40%) using digital cameras or camcorders (n=4/5, 80%) ( Table 4). In addition, supporting technologies that provide high-quality video motion capture, such as the “Quintic Biomechanics software”, were also reported. Users reported some advantages of these technologies, such as the short evaluation time required and the precise, consistent measures that allow a detailed analysis of motor skills. However, the lack of formal generalization of the conclusions to larger populations was reported as a technical limitation.
We identified 10 studies that updated and/or applied the exact AI-related technology reported in Tables 2 and 3 (Table III, supplemental material). Among those studies, 7 (70%) used the technology for the assessment of motor skills, and 3 (30%) updated it before use (i.e., produced a new version of the technology).
We performed a scoping review of AI-related technologies developed and used to assess motor skills in children. Engineering work and technological features have been prioritized in these studies; for example, the use of advanced systems for motion capture or the training of sophisticated machine learning algorithms for movement pattern recognition. More importantly, the validation of these algorithms was strictly based on engineering criteria; that is, no substantive knowledge of the medical or psychological aspects of motor skills was integrated into the validation process. Technical features typically evaluated in psychometric instruments designed for assessing motor skills in children were also ignored (e.g., COSMIN criteria). The use of these AI-related technologies in scientific research is still limited.
Most studies on AI technology engineering ignored the standard psychometric validation process (i.e., COSMIN standards). Although many of them complied with good practices in the development of image-processing-oriented software, none of them integrated a substantive validation. AI-related technology is well suited to identifying movement patterns that are rare in children, or patterns that children of a certain age should show but do not. This capacity has enormous value for clinical and educational purposes. However, for these AI measures to be integrated into a formal clinical evaluation, some technical features must be confirmed. For example, the measurement error estimate is essential for evaluating individuals from the target population, allowing the definition of critical ranges (i.e., minimum and maximum values) against which individual measures can be contrasted to conclude advanced, normal or sub-normal motor skill development. Another important psychometric characteristic is responsiveness, which reveals whether any change seen between within-individual AI measurements performed before and after an intervention corresponds to true changes in motor skills (smallest detectable changes), which is linked to investigating when these changes are clinically relevant (minimal important changes).
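The measurement-error quantities mentioned here can be estimated from test-retest data with standard formulas (SEM = SD of the score differences / √2; SDC95 = 1.96 × √2 × SEM). The scores below are invented purely for illustration:

```python
import numpy as np

# Hypothetical test-retest BMS scores from the same children (illustrative)
test = np.array([12.0, 15.0, 9.0, 14.0, 11.0, 13.0, 10.0, 16.0])
retest = np.array([13.0, 14.0, 10.0, 15.0, 10.0, 13.0, 11.0, 15.0])

diff = test - retest
sd_diff = diff.std(ddof=1)

sem = sd_diff / np.sqrt(2)          # standard error of measurement
sdc95 = 1.96 * np.sqrt(2) * sem     # smallest detectable change (95% level)

# An observed individual change smaller than sdc95 cannot be
# distinguished from measurement error.
print(f"SEM = {sem:.2f}, SDC95 = {sdc95:.2f}")
```

Reporting SEM and SDC alongside accuracy metrics is exactly the kind of evidence the reviewed engineering studies omitted.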
A previous review of AI technologies for evaluating motor skills in paediatric populations warns that the validation of these tools is limited.39 Consistent with our findings, they concluded that this limitation has practical implications for assessment precision and applicability in clinical contexts. Without a standard psychometric validation process, AI developers do not collect correct and sufficient evidence to ensure the minimal validity and reliability required for this kind of measurement. For example, one of our reviewed studies reported that the AI algorithm was reliable and valid because it was based on a test previously declared reliable by its original author.40 Differences between the population for which the original test was created and the sample used to develop the AI version can seriously compromise the reliability of the measures and their clinical interpretation criteria due to cultural/ethnic, linguistic, social, economic and age differences.10 In practice, clinical interpretation is an essential component of measurement validity and usually requires evidence beyond the standard qualification norm. For example, a recent study reported a new video-based technology based on a classical motor skill test (i.e., one requiring paper, pencil and the evaluator’s judgement), showing concurrent validity against another measure of motor skills.41 Contrasting AI measurements against external independent criteria is essential, not only to confirm that the algorithm measures what we intend it to, but also to connect these measurements with other clinically relevant signs and symptoms. In this way, AI measurements become more informative and useful for a full evaluation of children’s healthy development.42
Several factors explain the limited production of AI-related technologies for evaluating motor skills in children. Priority has been given to using AI to assess other health problems in this and other populations: during the last two decades, most AI for health has been developed for the diagnosis and follow-up of physical problems such as cancer, cardiovascular diseases, or neurodegenerative disorders in adult subjects.18,43 High costs also slow the production of these AI-related technologies,44,45 especially in low- and middle-income countries. High-income countries invest significant amounts of money in developing new cutting-edge technology,46 although for a wide range of purposes. In low- and middle-income countries, AI development suffers from additional limitations, such as insufficient economic and human resources, limited data, non-transparent AI algorithm sharing, and scarce collaboration between technological institutions.47
The use of AI-related technologies in scientific research is also limited, and this is linked to other factors. As expected, developers focused on engineering rather than on research that would facilitate the use of their technologies. For example, only one of our reviewed studies performed a usability and feasibility analysis,48 which is important to make the technology friendlier and more accessible to future users.40 This can be explained, in part, by the fact that most of these technologies are still developed within academia, and not yet in the private sector for commercial purposes. However, considering how they could improve the speed and precision of BMS evaluation for doctors and teachers, these AI-related technologies have great commercial potential in educational and clinical contexts.
This is the first scoping review emphasising the substantive validation processes of AI-related technologies produced to assess motor skills in preschool children. The databases consulted during the identification and selection of studies were specialised and extensive; thus, the loss of relevant information was limited. Also, although this review was based on COSMIN standards to assess the psychometric quality of AI-related technologies, due to the heterogeneity observed in the included studies, no specific adjustments were made to control for possible confounding variables. Therefore, the conclusions need to be interpreted with caution, and future research should address these factors and use control methods to provide more generalizable conclusions. Furthermore, feasibility and usability were extracted only if the reviewed studies explicitly reported having analysed them for their AI-related technologies; further studies should evaluate these aspects using a standardized framework. This review did not aim to analyze associations between variables; however, variability in sample sizes, age ranges, and types of AI-based technologies used across studies may affect the comparability and generalizability of the findings. These differences should be considered when interpreting the results and highlight the need for more standardized approaches in future research.
To facilitate use, developers could conduct studies that evaluate the acceptance, ease of use, cost-effectiveness, and accessibility of these technologies. For example, most technologies rely on sensors and monitors that, while accurate, can be costly, require specialized training, and can be difficult to implement in real-world settings for physicians, teachers, therapists, or practitioners unfamiliar with these tools. In addition, disparities in access to advanced technologies may limit their widespread adoption, particularly in low-resource settings.
Also, these technologies could be brought closer to more universal and cost-effective devices, such as video cameras, smartphones, and tablets, that can assess and report motor skills in real time. However, addressing these challenges requires a collaborative and interdisciplinary approach. Future validation studies should involve experts from multiple fields, including engineers, healthcare professionals, educators and policy makers, to ensure that these technologies are not only accurate, but also practical, scalable and accessible to diverse populations.
New validation studies of these technologies should include validation standards for BMS tests, prioritizing key psychometric properties such as construct validity, criterion validity, reliability, measurement error, among others. To make this possible, engineering teams could incorporate specialists in psychometrics, developmental therapy and medicine to work collaboratively. This multidisciplinary approach will facilitate the integration of medical knowledge and psychometric standards into future software releases, improving both measurement accuracy and practical usability. Finally, developers should consider providing open source code or detailed methodological documentation, which will allow for further refinement, replication, and clinical adaptation of these technologies in future research and real-world applications.
Engineering work and technological features have been prioritized in the studies about AI-related technologies. The validation of these algorithms was strictly based on engineering criteria; that is, no substantive knowledge of the medical or psychological aspects of motor skills was integrated into the validation process. Technical features typically evaluated in psychometric instruments designed for assessing motor skills in children were also ignored (e.g., COSMIN criteria). The use of these AI-related technologies in scientific research is still limited.
Zenodo: Development, validation and use of artificial-intelligence-related technologies to assess basic motor skills in children: a scoping review, https://doi.org/10.5281/zenodo.805674233
This project contains the following extended data:
Also in Zenodo: Figueroa-Quiñones, Joel. (2023). date extension. Zenodo. https://doi.org/10.5281/zenodo.8190823.34
Finally, in Zenodo: Joel. (2023). data extension [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8253201.31
This project contains the following extended data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
This report is independent research supported by the National Institute for Health and Care Research ARC North Thames. The views expressed in this publication are those of the author(s) and not necessarily those of the National Institute for Health and Care Research or the Department of Health and Social Care. We thank Miguel Moscoso for his help in the initial stage of this project.
Version 2 (revision): 02 Sep 2025
Version 1: 18 Dec 2023