Keywords
Autism, Screening Tool, Modified Checklist for Autism in Toddlers-Revised, Translation, Adaptation
This article is included in the Developmental Psychology and Cognition gateway.
In recent decades, autism spectrum disorder (ASD), a neurodevelopmental disorder that affects a child’s social communication and is characterised by repetitive and restricted behaviours (Treffert, 1970; Maenner et al., 2020), has been regarded as a global concern. The striking increase in the prevalence of ASD in children over the years (Elsabbagh et al., 2012; Redfield et al., 2014; Christensen et al., 2016; Kassim and Mohamed, 2019; Maenner et al., 2020) has highlighted the importance of the early detection of children at high risk of ASD, because early detection allows intervention to begin as early as possible (Bryson et al., 2003; Kasari et al., 2006; Schreibman et al., 2015; Zwaigenbaum et al., 2015). The American Academy of Pediatrics (AAP) recommends that all children be screened at least twice, at 18 months and 24 months (Johnson and Myers, 2007; Myers et al., 2007). With such recommendations spreading worldwide, many screening tools for ASD have been developed. According to some of the most extensive systematic reviews of screening tools for ASD (Rey et al., 2019; Petrocchi et al., 2020), the Modified Checklist for Autism in Toddlers-Revised (M-CHAT-R) is one of the most researched due to its good psychometric properties.
However, most of these screening tools, including the M-CHAT-R, were initially developed in English or for Western countries, which poses difficulties for non-English-speaking populations (Guillemin et al., 1993; Al Maskari et al., 2018). Consequently, translated versions of the original screening tools are frequently used to cater to the needs of diverse cultural and linguistic populations around the globe (Soto et al., 2015; Al Maskari et al., 2018). Although it is crucial to be equipped with a screening tool that is available in the local language, the literature has shown an increase in positive outcomes when the cultural context is considered in the adaptation of an instrument for different populations (Sousa and Rojjanasrirat, 2011; Rodríguez and Bernal, 2012; Grinker et al., 2015). In addition, the cultural adaptation of screening tools has become more common, as it is a quick and cost-effective method to provide valid screening results, especially for countries with limited resources (Guillemin et al., 1993; Ware et al., 1995). In recent years, the M-CHAT-R has been translated into more than 40 languages, and the translations are available for free download on the author’s website (https://mchatscreen.com/mchat-rf/translations; retrieved 3 October 2022). However, for most of the listed translations, validation studies that document the translation and cultural adaptation process and the psychometric properties of the newly translated versions have not been published. In addition, a recent systematic review by Bevan et al. (2020) highlighted that translated screening instruments may function differently than the original versions due to language and cultural differences, suggesting that the adapted instruments may not be psychometrically valid compared to the originals.
The process of cultural adaptation is complicated, as it goes beyond the surface level of language translation and involves adaptation of constructs and technical changes before reaching functional equivalence between the adapted instrument and the original version (DuBay and Watson, 2019). A growing body of literature has suggested several guidelines for the translation and cultural adaptation process that are needed to produce high-quality adapted versions (Beaton et al., 2000; Sousa and Rojjanasrirat, 2011; Soto et al., 2015). Generally, the cultural adaptation process involves reproducing the selected instrument in a new target language, pretesting the new instrument with a small number of individuals from the target population, and analysing the psychometric properties of the new instrument. However, producing new versions of a selected instrument with functional equivalence requires a lengthy reproduction and pretesting process with many quality assurance measures (Ware et al., 1995; Sousa and Rojjanasrirat, 2011). While most studies report that simple forward-backward translation with optional pretesting might be sufficient to determine the validity of a translated version (Perneger et al., 1999; da Mota Falcão et al., 2003), a newer study found that research that undertook more quality assurance measures resulted in higher-feasibility and higher-quality instruments (Acquadro et al., 2008).
In addition, an analysis of the psychometric properties of the translated version should be performed before concluding the validity and reliability of the instrument being used in the new target population (Rodríguez and Bernal, 2012; Al Maskari et al., 2018). Many studies have documented differences in psychometric properties between translated versions and the originals despite the extensive steps taken in the reproduction process (Fourie and Feinauer, 2005; Kondolot et al., 2016). Hence, the synthesis of the equivalence dimensions of a translated instrument cannot be assumed simply by changing the linguistic code (McKenna and Doward, 2005). Necessities in planning the complex reproduction processes should be researched, and psychometric testing should be analysed before determining the functional equivalence of the translated instrument (Beaton et al., 2000). Several guidelines on translation and adaptation processes have been published (Beaton et al., 2000; Soto et al., 2015; Tsang et al., 2017) noting the importance of forward and backward translation, alongside the qualification of translators and sample size suggestions for pilot testing. However, little cross-cultural adaptation research has documented adherence to such guidelines, nor is there any literature that has examined the evaluation of such processes (Gudmundsson, 2009).
Studies and findings from cross-cultural research have always been valued for their significant clinical implications among healthcare professionals who provide care for diverse populations worldwide (Sousa and Rojjanasrirat, 2011). For example, a recent systematic literature review by Bevan et al. (2020) found that most Hispanic or Spanish developmental and behavioural screening instruments were not validated despite being widely used across the United States. Although the systematic review by Bevan et al. (2020) focused only on Spanish-language screening instruments, it undoubtedly offered valuable insights into the importance of the validation and norming of translated and culturally adapted instruments for a target population that differs from that of the original. The results of their review suggested that language might be an incomplete proxy for culture, whereas geographic region of origin or heritage may be a more culturally relevant factor in providing health care. However, more literature is needed to provide supportive data for such a notion. Unfortunately, in the area of ASD, most of the currently available systematic reviews have mainly examined the available ASD screening tools and their psychometric properties (Soto et al., 2015; Levante et al., 2019; Petrocchi et al., 2020), while few have examined the feasibility of the translated versions and the quality assurance measures taken in the translation and cultural adaptation process. Empirically, traditional simple forward-backward translation methods are insufficient to produce a quality instrument (Gjersing et al., 2010). The cultural values that shape how we view the world often determine what is considered appropriate parenting and child developmental growth (van Kleeck, 1994).
Discounting the importance of cultural adaptation and the translation process in developing a translated screening instrument often results in substantial differences in identification rates and the development of poor instruments (Soto et al., 2015). Specifically, the risk level of impairments in related developmental areas may be over- or underestimated (van Kleeck, 1994). Some guidelines have been published to address the complications of cultural adaptation and translation processes (Guillemin et al., 1993; Mokkink et al., 2010; Sousa and Rojjanasrirat, 2011; DuBay and Watson, 2019). Similarities have been noted among published guidelines in several areas, including forward and backward translation, pretesting, and psychometric analysis. However, there are also notable distinctions in the specific criteria used to assess the quality of the translated instruments. For example, the requirements for translators and the synthesis process can differ from one guideline to another (Beaton et al., 2000; Sousa and Rojjanasrirat, 2011). While the synthesis process proposed by Sousa and Rojjanasrirat (2011) recommends a third independent translator, the guidelines by Beaton et al. (2000) suggest that the translators involved in the initial forward translation discuss and resolve discrepancies until they reach a consensus, rather than involving another independent translator.
Perhaps the incongruities in guidelines highlight the possibility that there is more than one way to achieve the cultural validity of a translated instrument (Al Maskari et al., 2018), while methodologies for the translation and adaptation of instruments have yet to be established (Gudmundsson, 2009). Hence, examining the steps and levels of adaptation processes used by some of the published studies may extend the knowledge of determining the feasibility of the existing guidelines and offer insights into future work on cultural adaptation and validation (Bird et al., 2014). However, the instrument used for evaluating measurement properties should always be of high quality to provide a sound conclusion. Although there is no single accepted guideline or checklist for evaluating the translation and cultural adaptation process, the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist (Mokkink et al., 2010), which was developed with the involvement of an international multidisciplinary team, seems to be a promising evaluation tool. Moreover, the COSMIN checklist has the advantage of being used as a modular tool where completion of the whole checklist is not necessary. According to the manual (Mokkink et al., 2010), in the evaluation of cross-cultural validity, only Box G should be completed. Similar to the core steps listed in most translation and cultural adaptation guidelines (Beaton et al., 2000; Soto et al., 2015; Tsang et al., 2017), the COSMIN Box G checklist evaluates the quality of the translation procedures and methodological measures involved. Hence, the COSMIN checklist would seem to be justifiable as an instrument to evaluate translation and cultural adaptation studies.
In summary, despite the growing number of studies on the translation and cultural adaptation of screening instruments, there seems to be limited literature that reviews the quality of translation and cultural adaptation processes. Therefore, this systematic review aims to methodologically review published translation and cultural adaptation studies of the M-CHAT-R. As the aim of this systematic review is not to evaluate the effectiveness of an autism screening instrument but to evaluate the level and quality of the cultural adaptation process, comparing different translations of the same instrument seems appropriate in helping to recognise what is feasible and what is not in the adaptation process. Hence, the M-CHAT-R, which has been documented as one of the most translated instruments (Al Maskari et al., 2018; Petrocchi et al., 2020), undoubtedly offers more informative data for evaluation than other screening instruments. Ultimately, this research aims to (1) identify translations and cultural adaptations of the M-CHAT-R for different populations, (2) report on the methodology and psychometric properties of the M-CHAT-R translations, (3) evaluate the quality of translation and cultural adaptation processes and their adherence to the recommended COSMIN checklist, and (4) critically discuss the implications of these findings to improve future research and practice.
A published protocol (Levante et al., 2019) was used as the methodological structure for this systematic review; the protocol was written specifically for studies of early ASD detection in toddlers. In line with this protocol, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were employed as the primary search strategy and reporting system for all selected studies. In addition, for quality appraisal, the COSMIN checklist (Mokkink et al., 2010) was used to evaluate the psychometric properties and cultural validity of the selected studies. The COSMIN checklist has also been used as an appraisal tool in previous systematic reviews involving ASD measures for adults (Baghdadli et al., 2017).
All studies involving children aged 48 months and below who had completed the M-CHAT-R in languages other than the original English version were included.
According to the research questions of this systematic review, both predefined inclusion and exclusion criteria were specified. The predefined inclusion criteria were (1) validation studies of translation of the M-CHAT-R, (2) published papers in peer-reviewed journals, (3) papers written in English, and (4) papers published from 1 January 2009 to 31 December 2022. The predefined exclusion criteria were as follows: (1) epidemiological studies and guidelines, (2) retrospective studies and systematic reviews, (3) dissertation theses or conference papers, (4) studies that used the M-CHAT-R to detect developmental disorders other than ASD, (5) studies that compared the M-CHAT-R with other instruments, and (6) papers without the specific aim of evaluating the psychometric properties and validation properties of the M-CHAT-R.
An electronic search was conducted through various databases, such as EBSCOhost, Google Scholar, PubMed, Elsevier and other resources, and a secondary manual search was performed to include papers identified through references and citations as of 25 July 2023. Keywords such as ‘M-CHAT-R’, ‘Modified Checklist for Autism in Toddlers Revised’, ‘validation’, and ‘translation’ were applied to all research databases during the search. Papers were filtered by publication date from 1 January 2009 to 31 December 2022. A total of 281 articles were identified at the initial stage. After the removal of 120 duplicates, 161 articles were screened based on the predefined inclusion and exclusion criteria. Most articles did not meet the inclusion criterion for validation studies, as they were mostly epidemiological studies or systematic reviews. Hence, only 13 studies were included in this systematic review. Figure 1 shows the PRISMA flow diagram, which portrays the steps performed in identifying literature from the databases (Han, 2023).
The final 13 eligible articles were organised, and the basic information with descriptive data was extracted and presented in table format. Descriptive data included the country, sample size, sample source, cut-off point for positive screening, screening age range, and mean age of the toddlers recruited as samples. The table also included the studies' psychometric properties in terms of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
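As a reminder of how these four accuracy measures relate to one another, the following minimal sketch computes them from a 2×2 screening table. The class and function names are hypothetical, and the counts are invented for illustration only; they are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Cells of a 2x2 table: screening result versus confirmed diagnosis."""
    true_pos: int   # screened positive, ASD confirmed
    false_pos: int  # screened positive, ASD not confirmed
    false_neg: int  # screened negative, ASD confirmed
    true_neg: int   # screened negative, ASD not confirmed


def accuracy_measures(c: ScreeningCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts, for illustration only.
example = ScreeningCounts(true_pos=18, false_pos=12, false_neg=2, true_neg=168)
measures = accuracy_measures(example)
```

Sensitivity and specificity are computed within the diagnosed and non-diagnosed groups, whereas the PPV and NPV are computed within the screen-positive and screen-negative groups, which is why the latter two depend on how many children in the sample actually have ASD.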
As this systematic review emphasised the methodological evaluations and characteristics of the selected papers, the COSMIN checklist was used as the basis for the evaluations, as it is consistent with the aim of this study (Mokkink et al., 2010). Although guidelines for cultural adaptation and the translation process are available, they were not used in this review study, as they serve as a template and form of advice (Beaton et al., 2000) but not as a validated evaluation checklist as the COSMIN checklist does. The COSMIN checklist is a modular tool that evaluates the measures reported in a study, including its psychometric properties and validity. The COSMIN checklist has nine boxes that identify the main measurement properties. Each box contains various items and can be used individually depending on the target assessed. In this systematic review paper, the evaluation of measures and data obtained were mainly for translation and cross-cultural validity purposes; hence, Box G of the COSMIN checklist, which explicitly evaluates cross-cultural validity, was selected.
The Box G cross-cultural validity checklist is composed of a total of 15 items. The first three items focus on the management of missing items and the adequacy of the sample size in the studies being reviewed. Items 4 to 9 serve as a checklist for the procedures involved in the translation of the M-CHAT-R items. Items 10 and 11 address the pre-test process conducted in the studies being reviewed to ensure that the translated versions were understandable and well adapted to the respective culture. Items 12 and 13 examine the characteristics of the samples in the studies reviewed and the quality of the methods and study designs. Our systematic review concentrated exclusively on Items 1 to 13 to evaluate cultural adaptation and the quality of the translation process. In other words, this review did not focus on the statistical methods used in previous studies; hence, Items 14 and 15 were not scored (Mokkink et al., 2010).
Since there was no consensus on the translation and cultural adaptation process, items were not rated based on numbers or scores but on the availability of specific information related to the items. When the item description was available in the particular study, the study was scored with a ‘+’, and when the study did not fulfil the item description, it was scored with a ‘-’. If a study did not provide enough detail to fulfil the item description or no information was provided, it was scored as ‘NA’ (Mokkink et al., 2010). After each study was scored, the obtained data with evaluation markers were extracted into Excel (version 2303) and presented in a table. A simple total sum of ‘+’ scores, which signified the availability of information for each item, was calculated for each paper. Table 1 shows the item descriptions of COSMIN checklist Box G for cross-cultural validity.
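The tallying step described above can be sketched as follows. This is an illustrative reconstruction and not the authors' spreadsheet; the function name `summarise_ratings` is hypothetical, and the example ratings are transcribed from the Brennan et al. (2016) row of Table 3.

```python
from collections import Counter

ALLOWED_MARKS = ("+", "-", "NA")

# Ratings for Box G items G1 to G13, transcribed from the
# Brennan et al. (2016) row of Table 3.
BRENNAN_2016 = ["+", "-", "+", "+", "+", "NA", "+", "-", "+", "-", "-", "+", "-"]


def summarise_ratings(ratings):
    """Count how often each mark occurs among one study's item ratings."""
    unknown = set(ratings) - set(ALLOWED_MARKS)
    if unknown:
        raise ValueError(f"unexpected rating marks: {sorted(unknown)}")
    counts = Counter(ratings)
    return {mark: counts.get(mark, 0) for mark in ALLOWED_MARKS}


summary = summarise_ratings(BRENNAN_2016)
plus_total = summary["+"]  # the per-paper sum of '+' scores
```

For this row the sum of ‘+’ marks is 7, which matches the value listed in the last column of Table 3.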
A total of 281 articles were successfully identified through the electronic search. During the initial review, 120 articles were removed due to duplication. Based on the predefined exclusion criteria mentioned above, 148 articles were removed. Finally, only 13 papers were included in this systematic review.
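The selection counts reported above can be checked with simple arithmetic. The sketch below is only a consistency check on the reported numbers; the function name `prisma_flow` is hypothetical and is not part of any PRISMA tooling.

```python
def prisma_flow(identified: int, duplicates: int, excluded: int) -> dict:
    """Derive the screened and included counts from the reported totals."""
    screened = identified - duplicates
    included = screened - excluded
    if screened < 0 or included < 0:
        raise ValueError("removed records exceed the records available")
    return {"identified": identified, "screened": screened, "included": included}


# Counts as reported in this review.
flow = prisma_flow(identified=281, duplicates=120, excluded=148)
```

With the reported figures, 161 records remain after de-duplication and 13 remain after exclusion, in agreement with the text.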
Once the 13 eligible papers were identified for this systematic review, basic descriptive measures of the literature were extracted; these are presented in Table 2. All 13 studies reported the results of the validation of the M-CHAT-R for 12 languages across 13 countries. The targeted samples for all 13 studies were children aged 14 months to 48 months to whom the M-CHAT-R was administered. Table 2 shows that the highest mean age among the thirteen studies was 31.66 months (Jonsdottir et al., 2021), while the lowest mean age was 18.22 months (Magán-Maganto et al., 2020).
Study | Country | N | Sample source | Prevalence rate | Cut-off point for positive screening | Screening age range | Mean age | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|---|---|---|---|---|---|---|
Brennan et al. (2016) | Albania (The Republic of Albania) | 2594 | Paediatrician site | 1.31% | ≧3 | 16-30 months | 24 months | Not evaluated due to high drop-out rate | Not evaluated due to high drop-out rate | 89.5% | - |
Carakovac et al. (2016) | Serbia | 148 | Three primary health care centres and one tertiary psychiatric centre | 14.7 over 1000 children | ≧3 | 16-30 months | 22.25 months | Not evaluated due to limited sample size | Not evaluated due to limited sample size | - | - |
Windiani et al. (2016) | Indonesia | 110 | Child growth and development outpatient clinic, Sanglah General Hospital | - | ≧2 | 18-48 months | 30.6 months | 88.9% | 94.6% | 76.2% | 97.8% |
McClure et al. (2018) | Nepal (Bhutan) | 13 | DeKalb County Refugee Paediatric Clinic (DCRPC) patient database | - | - | 16-30 months | 21.9 months | Information not provided; study with a small sample size to translate and adapt the M-CHAT-R/F for use as a culturally adaptive screening tool | - | - | - |
Coelho-Medeiros et al. (2019) | Chile | 120 | The UC Christus Health Network | - | ≧3 | 16-30 months | 22.47 months | 100.0% | 83.3% | - | - |
Guo et al. (2019) | China | 7928 | Six provinces, seven tertiary hospitals | - | ≧3 | 16-30 months | 22.69 months | 96.3% | 86.5% | 91.0% | 100.0% |
Sangare et al. (2019) | Mali | 1067 | District and community health centres in Bamako and a psychiatry department | - | ≧3 | 16-30 months | 24 months | 50.0% | 100.0% | 100.0% | 87.0% |
Tsai et al. (2019) | Taiwan | 317 | Community and clinical setting | Estimation of 2.8–29.5 per 10,000 for the ethnic Chinese populations in Hong Kong, Taiwan, and China | ≧3 | 16-30 months | 24.3 months | 86.0% | 96.0% | 59.0% | 99.0% |
Jonsdottir et al. (2020) | Iceland | 1585 | Public Primary Health Centre (PHC) | 1.22 of 2531 children | ≧2 | 30 months | 31.66 months | 95.0% | 84.0% | 72.0% | 99.0% |
Magán-Maganto et al. (2020) | Spain | 6625 | Three provinces in Northwest Spain, namely, Salamanca, Zamora and Valladolid | 1 of 318 children | ≧3 | 14-22 months and 23-36 months | 18.22 months and 24.47 months | 79.0% | 99.0% | 39.0% | 99.0% |
Oner and Munir (2020) | Turkey | 6712 | Family Healthcare Centers (FHCs) | 0.8% | ≧3 | 16-36 months | 26.75 months | 100.0% | 91.0% | 8.6% | 100.0% |
Divya et al. (2020) | India | 450 | Kindergartens and day care centres | - | ≧3 | 16-30 months | - | Information not provided; the study only mentioned that it yielded reliability and validity estimates similar to the values of the original M-CHAT validity study | - | - | - |
Vorster et al. (2021) | South Africa | 21 | Community church and a day care centre | - | ≧3 | 18-48 months | 29 months | Not reported; the aim was to collect pilot data to determine the preliminary reliability and feasibility of the two tests | - | - | - |
Out of the 13 studies reviewed, 11 studies recruited their sample population from specific healthcare settings such as hospitals, paediatric centres, or community healthcare centres. According to the basic demographic and descriptive data extracted, the number of samples recruited in each study varied widely. The lowest number of recruited samples was 13 (McClure et al., 2018), and no psychometric properties could be analysed due to the limited sample size. The largest number of recruited samples was 7,928 in China (Guo et al., 2019). Although the M-CHAT-R has been suggested for screening toddlers aged 16 to 30 months (Robins et al., 2014), not all 13 studies that were reviewed adhered to the suggested age range. A study from Spain (Magán-Maganto et al., 2020) recruited subjects as young as 14 months, while two other studies from Indonesia (Windiani et al., 2016) and South Africa (Vorster et al., 2021) recruited samples aged up to 48 months.
The psychometric properties of the 13 studies were extracted. The screening accuracy properties of the 13 studies, namely, sensitivity, specificity, PPV, and NPV, are presented in Table 2. Only seven studies reported all screening accuracy properties. Although the PPV and NPV were reported in several studies, the prevalence rate of ASD in the study population was not reported in every paper. The prevalence rate was specified in only six studies (Brennan et al., 2016; Carakovac et al., 2016; Tsai et al., 2019; Magán-Maganto et al., 2020; Oner and Munir, 2020; Jonsdottir et al., 2021). A Chilean study (Coelho-Medeiros et al., 2019) reported sensitivity and specificity values but did not specify the PPV and NPV. Among all the studies that reported sensitivity and specificity values, only the French-language study conducted in Mali (Sangare et al., 2019) reported a low sensitivity value of 0.50, while the others reported good sensitivity and specificity values. Except for the three studies with large sample sizes, namely, the Chinese study (Guo et al., 2019) with 7,928 samples, the Spanish study (Magán-Maganto et al., 2020) with 6,625 samples and the Turkish study (Oner and Munir, 2020) with 6,712 samples, four studies reported high PPVs and NPVs, similar to those of the original study (Robins et al., 2001). Interestingly, the three studies that reported low PPVs (<0.40) were the studies that recruited subject samples from the general population across different provinces or districts of a country. Although the PPV varied among studies, the NPVs reported by all studies remained consistently in the high range of 87%-100%.
Five studies did not report sensitivity and specificity values. An Albanian study (Brennan et al., 2016) reported only the PPV and did not report the NPV, sensitivity or specificity values due to a high drop-out rate, while the Nepali (McClure et al., 2018) and Serbian (Carakovac et al., 2016) studies could not provide sensitivity and specificity values due to small sample sizes. An Indian study (Divya et al., 2020), despite having an adequate sample size of 450 individuals, did not provide the exact values, as the authors adopted a different validation approach. These authors reported excellent reliability, with a Cronbach’s alpha score of 0.894, and validated the Tamil version of the M-CHAT-R (T-M-CHAT-R) through divergent validity. The T-M-CHAT-R was statistically compared to the Indian Scale for the Assessment of Autism (ISAA), a standardised assessment tool that serves to diagnose and measure the severity of ASD specifically for the Indian population, and yielded a high positive correlation of r=0.01. As such, the authors suggested that the results obtained in their study were similar to the psychometric properties of the original M-CHAT validity study (Robins et al., 2001).
In alignment with the research objectives, COSMIN Box G on cross-cultural validity was used as a quality appraisal tool for all 13 studies, particularly for rating the translation and adaptation process, research methodology, and data handling. Table 3 presents the COSMIN checklist results of all 13 studies. Although 13 items were evaluated, the highest number of ‘+’ items, which showed the availability of information presented in each study, was only eight. None of the studies reviewed provided all the information needed for each item. The studies that provided the most information were Jonsdottir et al. (2020), McClure et al. (2018), Tsai et al. (2019), and Vorster et al. (2021), while the studies by Coelho-Medeiros et al. (2019), Sangare et al. (2019), and Windiani et al. (2016) provided the least information, with only two items that could be scored ‘+’. Our results showed that the percentage of missing items and the methods used to handle them were rarely reported: of the thirteen studies, only the Albanian study (Brennan et al., 2016) disclosed the percentage of missing items.
Study | Language | G1 | G2 | G3 | G4 | G5 | G6 | G7 | G8 | G9 | G10 | G11 | G12 | G13 | Sum of “+” |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Brennan et al. (2016) | Albanian | + | - | + | + | + | NA | + | - | + | - | - | + | - | 7 |
Carakovac et al. (2016) | Serbian | - | - | - | + | - | + | + | - | - | - | - | + | - | 4 |
Windiani et al. (2016) | Indonesian | - | - | + | - | - | NA | NA | - | - | - | - | + | - | 2 |
McClure et al. (2018) | Nepali | - | - | - | + | + | NA | + | + | + | + | + | + | - | 8 |
Coelho-Medeiros et al. (2019) | Spanish (Chile) | - | - | + | - | - | NA | NA | - | - | - | - | + | - | 2 |
Guo et al. (2019) | Chinese | - | - | + | + | - | NA | + | + | + | - | - | + | - | 6 |
Sangare et al. (2019) | French | - | - | + | - | - | NA | NA | - | - | - | - | + | - | 2 |
Tsai et al. (2019) | Chinese | - | - | + | + | + | NA | + | + | - | + | + | + | - | 8 |
Jonsdottir et al. (2020) | Icelandic | - | - | + | + | + | NA | + | + | - | + | + | + | - | 8 |
Magán-Maganto et al. (2020) | Spanish | - | - | + | + | - | NA | + | + | + | - | - | + | - | 6 |
Oner and Munir (2020) | Turkish | - | - | + | - | - | NA | + | - | + | + | - | + | - | 5 |
Divya et al. (2020) | Tamil | - | - | + | + | + | NA | + | - | - | - | - | + | - | 5 |
Vorster et al. (2021) | Northern Sotho | - | - | + | + | + | NA | + | - | + | + | + | + | - | 8 |
On the other hand, studies conducted in Serbia (Carakovac et al., 2016) and Nepal (McClure et al., 2018) were unable to recruit an adequate sample size, resulting in a failure to report psychometric properties. According to the COSMIN checklist, only nine studies described the translation process from the original English M-CHAT-R to other languages. All nine studies adopted the forward and backward translation approach. However, only six studies highlighted the involvement of experts in the relevant subject matters (Brennan et al., 2016; McClure et al., 2018; Tsai et al., 2019; Divya et al., 2020; Jonsdottir et al., 2020; Vorster et al., 2021).
In addition, among these six studies, only five stated how differences between certain items in the translated version compared to the original version were resolved. Furthermore, only four studies explicitly stated that a committee reviewed the translated versions. Only the Nepali study (McClure et al., 2018) satisfied all the checklist items regarding the translation and adaptation process; in this study, a certified Nepali translator first produced the Nepali version of the M-CHAT-R, expert panels then reviewed and modified it in two rounds, and finally, back-translation was conducted by a bilingual native Nepali speaker. The Chinese (Tsai et al., 2019) and Icelandic (Jonsdottir et al., 2020) versions of the M-CHAT-R fulfilled the same checklist items as the Nepali version, except for committee review. In contrast, the Northern Sotho version of the M-CHAT-R (Vorster et al., 2021) was reviewed by the research committee, but the study did not mention how differences between items were resolved.
COSMIN Box G includes four checklist items regarding pretesting, sample similarities, and methodological flaws. Although none of the studies seemed to have any major methodological flaws, only five studies performed a pre-test, and four studies specifically described the samples used for the pretesting process (McClure et al., 2018; Tsai et al., 2019; Jonsdottir et al., 2020; Oner and Munir, 2020; Vorster et al., 2021).
The main objective of a screening tool is to identify subjects who are at risk of a specific disorder to facilitate further evaluation, leading to earlier identification for necessary treatment (Soto et al., 2015). As the development of autism-specific screening tools for a particular culture or population in a specific language requires extensive effort and resources, there has been an increased preference in recent years to translate and adapt validated screening tools with excellent psychometric properties, such as the M-CHAT-R, which is regarded as a more cost-effective method than developing a new tool (Brennan et al., 2016). However, the extent to which the translated versions are functionally equivalent to the original remains ambiguous. In this systematic review, thirteen studies of the translation and adaptation of the M-CHAT-R in different populations were reviewed.
Initially, the M-CHAT-R was chosen as the screening tool to be studied for this systematic review due to its good psychometric properties (Robins et al., 2014) compared to those of other autism screening tools available for young toddlers (Levante et al., 2019; Petrocchi et al., 2020). In addition, the original authors published a list of more than 40 translations of the M-CHAT-R on their website. However, when a systematic database search was performed in accordance with the first aim of this study, namely to identify published articles on the validation of the translated versions, only 13 studies fulfilled the predefined inclusion and exclusion criteria. Most of the translations posted were either not validated in a written publication or were published in a language other than English. Compared to the original publication (Robins et al., 2014), for which a large population-based sample of 16,071 toddlers was recruited, most of the adapted versions were validated by recruiting samples mainly from healthcare settings. In fact, the bias in recruiting samples from healthcare settings that consist of toddlers with a higher risk of ASD than the rest of the population (Matson and Kozlowski, 2011) might have affected the screening accuracy of the M-CHAT-R during validation and made the accuracy appear unstable across studies (Camp, 2006). A lower PPV is often expected in a screening tool that targets the assessment of disorders with low prevalence, such as ASD, when used in the general population (Kleinman et al., 2008; Soto et al., 2015). However, when the recruited samples include a higher proportion of children at risk of ASD, the PPV would be higher, as the PPV is highly dependent on the prevalence rate (Hedley et al., 2010).
Evidently, such a claim is supported in this systematic review, whereby the studies from Spain (Magán-Maganto et al., 2020) and Turkey (Oner and Munir, 2020), in which large samples from the general population were recruited, reported much lower PPVs than studies in which samples were recruited from healthcare settings only. Hence, perhaps when the validation of an instrument is emphasised, the PPV is meaningful only when the samples recruited are a representation of the general population, which reflects the true prevalence of the disorder.
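The dependence of the PPV on prevalence can be illustrated with a minimal sketch based on Bayes' rule. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from any of the reviewed studies, and the function name `ppv` is our own.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    PPV = (se * p) / (se * p + (1 - sp) * (1 - p))
    """
    true_pos = sensitivity * prevalence                   # P(test+, disorder)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test+, no disorder)
    return true_pos / (true_pos + false_pos)


# Hypothetical screener: sensitivity 0.85, specificity 0.95 (assumed values).
# Only the prevalence changes between the three rows.
for prevalence in (0.01, 0.05, 0.20):
    value = ppv(0.85, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={value:.3f}")
```

With sensitivity and specificity held fixed, the PPV rises steeply as the proportion of affected children in the sample increases, which is the pattern expected when a general-population sample is compared with a sample recruited from a clinical setting.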
The second aim of this systematic review was to report on the methodology and psychometric properties of the M-CHAT-R translations. All studies except the Mali study (Sangare et al., 2019) reported high sensitivity and specificity values (>0.80), indicating that the M-CHAT-R is a practicable instrument with high screening accuracy. Interestingly, the age range of the children recruited differed across the 13 studies reviewed. The original M-CHAT-R (Robins et al., 2009) was published as an ASD screening tool to assess children between the ages of 16 and 30 months. However, among the 13 studies reviewed, three studies did not adhere to the recommended age range. In the study of the Spanish adaptation of the M-CHAT-R (Magán-Maganto et al., 2020), children as young as 14 months and up to 36 months were recruited, but the study still showed good psychometric properties of the instrument, with high sensitivity (79%), specificity (99%), PPV (39%) and NPV (99%). The lower PPV may have been because 6,625 children were recruited from the general population; thus, while the prevalence rate was lower, the sample offered a more realistic reflection of the total population with ASD. In addition to the study of the Spanish M-CHAT-R, studies from Turkey (Oner and Munir, 2020), Indonesia (Windiani et al., 2016), and South Africa (Vorster et al., 2021) also expanded the age range of the samples recruited. In the study from Turkey (Oner and Munir, 2020), children aged up to 36 months were recruited, while in the other two studies, children aged up to 48 months were recruited. Despite the extended age range, the studies by Oner and Munir (2020) and Windiani et al. (2016) both showed high sensitivity (100% and 88.9%, respectively) and specificity (91% and 94.6%, respectively) values. The PPV, however, varied considerably between these two studies: Windiani et al. (2016) reported a PPV as high as 76.2%, whereas Oner and Munir (2020) reported a relatively low PPV of 8.6%. This difference could be due to the populations recruited. Windiani et al. (2016) recruited samples only from a single at-risk setting, namely, Sanglah General Hospital, while Oner and Munir (2020) recruited a very large sample of 6,712 subjects from the general population.
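As a rough consistency check on this explanation, Bayes' rule can be inverted to ask what proportion of affected children would be needed in a sample to produce a given PPV. The sketch below is illustrative only: it assumes that the reported sensitivity, specificity, and PPV of each study are related by Bayes' rule on a single sample, which may not hold exactly (for example, when only screen-positive children receive a diagnostic evaluation). The function name `implied_prevalence` is our own.

```python
def implied_prevalence(sensitivity: float, specificity: float, ppv: float) -> float:
    """Proportion of affected children implied by se, sp, and PPV.

    Obtained by solving PPV = se*p / (se*p + (1 - sp)*(1 - p)) for p:
        p = PPV*(1 - sp) / (se*(1 - PPV) + PPV*(1 - sp))
    """
    false_pos_rate = 1.0 - specificity
    numerator = ppv * false_pos_rate
    denominator = sensitivity * (1.0 - ppv) + ppv * false_pos_rate
    return numerator / denominator


# Figures as quoted in the text above; the derived values are approximations.
turkey = implied_prevalence(sensitivity=1.00, specificity=0.91, ppv=0.086)
indonesia = implied_prevalence(sensitivity=0.889, specificity=0.946, ppv=0.762)

print(f"Oner and Munir (2020), general population: {turkey:.1%}")
print(f"Windiani et al. (2016), hospital sample:   {indonesia:.1%}")
```

Under these assumptions, the quoted figures imply that fewer than 1 in 100 children in the general-population sample were affected, compared with roughly 1 in 6 in the hospital sample. A gap of this size would be enough to account for the difference in PPV even though the two studies reported similar sensitivity and specificity.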
Although the evidence is limited, studies of the original M-CHAT (Robins et al., 2001) and its translations (Koh et al., 2014; Windiani et al., 2016; Vorster et al., 2021) in which the instrument was administered to children aged up to 48 months yielded similar results. As stated by a few researchers (Landa et al., 2013; Estes et al., 2015), this may be explained by the similar expression of ASD symptoms across the 16-48-month age range. Hence, the feasibility of using the M-CHAT-R for a wider age range could be considered in future research, which could significantly increase the practicability of the M-CHAT-R. This systematic review also supports the flexibility of a researcher-defined age range for the samples recruited in M-CHAT-R translation and adaptation studies but further highlights the importance of pilot testing and validating the instrument in the newly selected population.
Third, this systematic review evaluated the quality of the translation and cultural adaptation process in the 13 articles against the recommended COSMIN checklist. The results show wide variation in the translation and adaptation process. Many studies did not report the details of the methods or approaches used, which complicated evaluation using the COSMIN checklist. Ten of the 13 studies described the translation process, but none of the studies reviewed fully adhered to the COSMIN Box G cross-cultural validity checklist. Although the majority of the studies adopted a forward and backward translation approach as suggested by the original authors on their website, little detail was given on the number of translators, the expertise of the translators, or the use of review committees, nor was the resolution of any discrepancies described. Only six studies documented the expertise of the translators involved in the translation process, who were generally subject-matter experts in child development (Brennan et al., 2016; McClure et al., 2018; Tsai et al., 2019) or experienced linguists (Divya et al., 2020; Jonsdottir et al., 2021; Vorster et al., 2021).
Notably, the published literature indicates that a minimum of two bilingual translators who have relevant cultural backgrounds and adequate proficiency in both languages are needed for both the forward and backward translation processes to minimise the risk of bias in linguistic and cultural understanding (Beaton et al., 2000; Musa et al., 2007). In this systematic review, a few published studies explicitly mentioned items and phrases that were reworded. For example, the authors of the Chinese version of the M-CHAT-R adapted Item 3 by changing the examples from “vacuum the rug” and “mow the lawn” to “wipe the table” (Guo et al., 2019), which they perceived to be more culturally appropriate for the population in China. However, despite working in the same Chinese language, the researchers who translated the Taiwanese version of the M-CHAT-R (Tsai et al., 2019) added examples of play that involves hiding and suddenly reappearing, such as hide-and-seek or making funny faces, instead of using the words “peek-a-boo”, as they stated that Taiwanese parents do not say those words during play. In the Spanish version of the M-CHAT-R, the example description of Item 20, “bouncing on one’s knees”, was replaced with the name of the common children’s game “the little horse” in Spanish (Jonsdottir et al., 2020). The Icelandic M-CHAT-R validation study by Jonsdottir et al. (2020) also specified the “action figure” for Item 3 as a “Playmobil or Lego figure”, as children in Iceland often use such toys in their play. Although other studies reported the adaptation of words or phrases, the details of the process or methods used remain unclear. Nevertheless, changes in words or phrases appear to be mainly associated with differences in cultural upbringing among populations.
Surprisingly, the systematic review results showed that only four studies ran a pre-test of the translated M-CHAT-R. A preliminary pilot test is important because it allows the researcher to check that the translated M-CHAT-R maintains its language and concept equivalence in the targeted culture or community (Beaton et al., 2000; Tsang et al., 2017). It is equally crucial to review how the targeted respondents perceive the translated version. Among the four studies identified, two were published as pilot studies; in the Nepali M-CHAT-R study (McClure et al., 2018), 13 participants were recruited, and in the Northern Sotho M-CHAT-R study (Vorster et al., 2021), 21 participants were recruited. In both studies, the authors explicitly mentioned conducting interviews with their respondents to determine which items raised confusion, and they subsequently amended those items. In the Taiwanese M-CHAT-R study (Tsai et al., 2019) and the Icelandic M-CHAT-R study (Jonsdottir et al., 2020), pre-tests with 25 respondents and 10 respondents, respectively, were conducted to finalise the translated version used for the main study. This review therefore suggests that the general guideline of a pre-test sample of between 10 and 40 respondents is sufficient for similar studies, although interviews or clarification of understanding should be sought from the targeted respondents during the pretesting stage (Beaton et al., 2000; Tsang et al., 2017).
One of the major limitations of this systematic review stems from its inclusion and exclusion criteria, as the evaluation of the translation and adaptation process depended solely on the information written in the published articles. It is possible that some components of the process were conducted but not reported by the researchers, such as the background of the translators or how discrepancies between translations were resolved. Second, only papers published in English were included in this systematic review. The very large discrepancy between the number of search results and the number of translations listed on the author’s website suggests that some researchers might have published papers on the translation and validation of the instrument in languages other than English. Hence, the small number of identified papers might limit the ability to draw any solid conclusion.
Nonetheless, this systematic review paper is the first to explore the feasibility of a single instrument, the M-CHAT-R, by assessing the translation, adaptation, and testing processes. The rigorous evaluation of the translated and adapted versions of the same screening tool not only offers better insights into the feasibility of the tool but also enables a more profound understanding of the essential steps needed for cross-cultural adaptation. Therefore, the strength of this paper lies in the ability of the above findings to serve as a baseline for future research. The outcomes of this review showed that the majority of the studies published demonstrated that the cultural adaptations of the M-CHAT-R are excellent screening tools, and interestingly, offered insights that the M-CHAT-R, as a valid screening instrument to identify children at risk of ASD, might be extended to include children aged up to 48 months. In addition, this paper further highlighted that the sensitivity and specificity values, which are not dependent on the prevalence rate, seem to be more appropriate measures than the PPV and NPV to reflect the validity of the M-CHAT-R, as the results showed considerable discrepancies in PPVs among studies due to the difference in the proportion of clinical and population-based samples.
In addition, this paper revealed that the adaptation process, especially the translation, is largely under-reported, raising concerns that the screening accuracy of translated tools could be compromised. Future cross-cultural validation studies are advised to detail the translation and adaptation process, specifically by stating whether any phrases or items were changed or omitted for the targeted populations. In summary, it should be noted that some of the most critical steps in adapting an instrument for a different cultural population are often overlooked (Sperber, 2004). Some established methodology guidelines (Guillemin et al., 1993; Mokkink et al., 2010; Sousa and Rojjanasrirat, 2011; DuBay and Watson, 2019) could be referenced for translating and validating instruments for cross-cultural populations. In conclusion, future research in the area of cultural adaptation should account for culturally relevant indicators of ASD, the expertise of the translators, possible changes in language due to cultural norms, and the possible alteration of scoring or administration methods to suit the targeted populations or respondents.
No data are associated with this article.
Repository: PRISMA and PRISMA for abstracts checklists and flow chart for ‘Systematic review of translation and cultural adaptations of autism spectrum disorder’s screening tool: The Modified Checklist for Autism in Toddlers, Revised (M-CHAT-R)’. https://doi.org/10.6084/m9.figshare.22674682. (Han, 2023).
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).