Identifying the core components and items to measure health workers' cultural competence in the Ethiopian context [version 1; peer review: awaiting peer review]

Background: Cultural competence (CC) is a crucial attribute for attaining quality healthcare outcomes, reducing inappropriate practices, and improving patient satisfaction. Previous studies suggest that a comprehensive assessment of health workers' CC in the Ethiopian context first requires an appraisal of existing CC tools. The objectives of this study were to select existing CC tools, identify their sub-constructs, pinpoint relevant demographic characteristics, and evaluate items. Methods: Twenty CC tools, with 20 to 83 items, 1–5 sub-constructs, and 4–10 Likert-type rating options, were identified and rated by eight experts in three groups. The COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) and the test construction literature were used to develop rating codes for computing inter-rater reliability (IRR). The first group of three experts rated the inclusion of CC tools, factors, and demographic information. The second group of three experts selected six CC tools and 65 items. Two experts in the third group further evaluated the selected items. Results: Reliability for the inclusion of CC tools, factors, and demographic variables was 75%–87%, 50%–93%, and 50%–86%, respectively. Thirteen items that violated test construction principles, such as absoluteness, universal endorsement, proneness to multiple interpretations, ambiguity, and double-barreled wording, were excluded. Cultural skill, cultural knowledge, and cultural awareness were the three most common sub-constructs.


Introduction
In recent decades, cultural competence (CC) has increasingly become a subject of scholarly attention globally, with growing commitments to address the needs of culturally diverse clients. 1 The provision of quality health care depends largely on the ability to handle cultural diversity effectively, including patients' variations in gender, ethnicity, education, geography, language, color, etc. 2 Health is a key service sector that demands culturally competent providers and workers in order to guarantee better patient satisfaction. [3][4][5][6] Countries should work to equip health professionals with the CC knowledge, skills, and attitudes required to address the needs of culturally diverse patients. 7,8 Exploring the current cultural situation and assessing CC can help health researchers and policymakers understand major constraints and reduce health workers' bias. 9,10 CC is critical for providing appropriate health services in countries with culturally diverse populations, such as Ethiopia. However, understanding the CC status of health workers and providers is hampered by a lack of clear conceptualization and of consistent measuring tools. 11 The conceptual gaps are evident in the different terms used to denote CC and in the lack of a single agreed-upon definition. 12 The concept of transcultural competence, or cultural competence, was first coined by Pratt in 1957 and has been used interchangeably with terms such as cultural sensitivity, culture-bound health care needs, culture-specific care needs, transcultural, multiculturalism, and cross-cultural competence. 13 CC refers to a trait that enables a person to coordinate, work, or interact with people of different cultures and social backgrounds. 14,15 CC has become an unquestioned and ubiquitous aspect of health professional development and is linked to efforts to eliminate culture-based health disparities through training that fosters sympathetic health workers. 
16 A growing body of literature [17][18][19][20][21][22][23][24][25] recognizes that assessing the CC of the health workforce has become a priority of health research, education, and practice in countries where health workers encounter culturally diverse patients, with the aim of improving the quality of healthcare outcomes. To realize the benefits of CC assessment, scholars have developed, adapted, and validated CC measuring tools for nursing students 26 ; practicing nurses 10,27 ; rehabilitation practitioners 28 ; professionals living and working in multicultural settings 29 ; and healthcare providers. 4 The rationales scholars give for needing CC tools include the growing number of foreign patients visiting the largely homogeneous society of South Korea, 23,30 the growing number of expatriate nurses in Iran, 31 and the established ethnic diversity of Vietnam. 32 However, common limitations of CC research and assessment among health professionals worldwide include a lack of clarity on CC concepts; a scarcity of outcome-based research on the effectiveness of CC strategies; a scant number of standardized CC instruments suited to countries' health and cultural contexts; and violations of one or more test construction principles. 11,33 Beyond these gaps, Ethiopian health policy gives limited attention to CC, and health providers have overlooked the role of CC in health practice. [34][35][36][37] CC assessment therefore urgently requires either developing or adapting a valid and reliable CC tool for use in Ethiopian health services. Before doing so, it is imperative to conduct a study such as this one to identify the major factors to be considered in CC tools. 
This study was conducted as the first phase of developing the Ethiopian CC instrument, with the following objectives: (a) to identify CC measuring tools and demographic variables; (b) to detect the sub-constructs of CC tools; (c) to select items to be included in CC tools; and (d) to evaluate the items' adherence to test construction principles.
The remainder of this paper is organized as follows. Section two presents the context of health service provision in Ethiopia. Section three presents the conceptualization and theoretical frameworks. Sections four, five, six, and seven present the methodology, findings, discussion, and conclusion, respectively.

Health service provision in the Ethiopian context
Ethiopia has a short history of modern health services, with roots in the imperial era and the establishment of the Ministry of Health in 1948. 38 The imperial regime and subsequent Ethiopian governments have shared preventive, curative, and rehabilitative health goals. Ethiopia has a high population growth rate, with 120 million people in 2022 and an estimated 250 million in 2050; 39 its population is 21% urban and 79% rural; 40 it is home to more than 80 ethnic groups; 41 and its people communicate in 90 languages of the Semitic, Cushitic, Omotic, and Nilo-Saharan families. 42 These factors, combined with others, make Ethiopia a country of exceptional cultural diversity, which calls for CC assessment in crucial service sectors.
Conflict, violence, displacement, and war have persistently afflicted Ethiopia, both recently and in the past. This might be due in part to inadequate attention to CC in policy and practice. These situations lend support to Huntington's 1993 assertion that human conflict and violence in the new millennium would arise primarily not on economic or ideological grounds but rather from failures to design culturally responsive systems in crucial service sectors. 43 The health sector, as a key service sector, has not only been limited in incorporating CC; its policies and practices have overlooked it. 44 Meanwhile, past and current governments of Ethiopia have pursued massive expansion of health facilities and increases in the number of health workers to meet the growing health demands of a rapidly growing population. [45][46][47] However, Ethiopia still has a very low health worker density of 0.96 per 1000 population, far below the African average of 2.2 per 1000 and nearly five times lower than the World Health Organization's (WHO) minimum threshold of 4.45 per 1000 needed to meet the Sustainable Development Goal (SDG) health targets. 37,48 According to Haileamlak, Ethiopia has fewer than 100,000 health workers, and the health workforce in 2022 is estimated at fewer than 120,000. 37 On this basis, Ethiopia would require more than 200,000 health workers to reach the African workforce density and 500,000 to fulfill the minimum requirements of Universal Health Care (UHC). This means that, to achieve UHC by 2030, Ethiopia is expected to train more than 65,000 health workers (medical doctors, health officers, nurses, and midwives) per year for the next eight years.
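The workforce figures above follow from multiplying a population by a density expressed per 1000 people. The short Python sketch below illustrates that arithmetic only; the function name and the dictionary labels are hypothetical, and it assumes the 2022 population of 120 million quoted earlier in this section. It does not model attrition or training capacity, so it is not a reconstruction of the cited annual training estimate.

```python
def workers_needed(population: int, density_per_1000: float) -> int:
    """Head count implied by a workforce density given per 1000 population."""
    return round(population * density_per_1000 / 1000)


# Assumption: the 2022 population figure quoted in the text.
POPULATION_2022 = 120_000_000

# Densities (health workers per 1000 population) quoted in the text.
DENSITIES = {
    "Ethiopia (current)": 0.96,
    "African average": 2.2,
    "WHO SDG threshold": 4.45,
}

for label, density in DENSITIES.items():
    count = workers_needed(POPULATION_2022, density)
    print(f"{label}: {count:,} health workers")
```

Under these assumptions the implied head counts are in line with the orders of magnitude reported in the text: somewhat under 120,000 at the current density, more than 200,000 at the African average, and more than 500,000 at the WHO threshold.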
However, efforts to expand the Ethiopian health workforce have increased variation in cultural factors such as language, ethnicity, food, clothing, and religion, both among colleagues and between health workers and patients. 49 To minimize health gaps, scholars have suggested that the Ethiopian government revise its health education curricula, training modalities, trainee recruitment, and teacher quality in line with international CC standards, in parallel with health workers' increasing tasks. 37 Meanwhile, increasing diversity, combined with limited numbers of health workers, the unavailability of CC tools to assess health practices, and limited attention to CC in health policy and education curricula, has hampered the delivery of quality healthcare, obstructed the closing of health disparities, and prevented the achievement of national and global health goals. 50 By contrast, international health education curricula widely incorporate CC and treat comprehensive assessment of health workers as a priority of health research, policy, education, and practice. 51 This is because improving in-service training to equip health workers with CC knowledge, attitudes, and skills helps deliver quality healthcare by reducing societal dissatisfaction. However, CC as a strategy for mitigating Ethiopian health challenges, particularly gaps resulting from increasing variation in cultural factors, has gained little attention from past and current Ethiopian governments.
To achieve high-quality health outcomes that satisfy the health demands of culturally diverse patients, it has been suggested that the Ethiopian health system design strategies to enhance the CC of health workers and providers and conduct comprehensive CC assessments to understand existing health gaps, as essential components of health practice. 34,52 According to studies, Ethiopia should emphasize CC education, on-the-job training, interpreter services, and comprehensive assessment of health workers to deliver culturally appropriate care. 53 However, beyond the CC limitations evident worldwide, the Ethiopian health system shows additional gaps. For instance, nurses reported that they had not heard the term cultural competence in either their schooling or their practice; rather, they were accustomed to the term clinical competence. 53 Another study found that a considerable number of Ethiopian people believed that modern health services involved unacceptable practices. 54 A growing number of studies from other countries recommend CC assessment of health workers to address patients' health demands. [4][5][6]18,55 Not surprisingly, improving the CC of health professionals by formulating CC policy and conducting comprehensive CC assessment has become the most frequently recurring theme in Ethiopian health. According to Bonsa (2018), it is critical to conduct nationally representative CC surveys of patients, providers, and professionals. The cultural differences between Western countries, characterized as low-context cultures, and non-Western countries, characterized as high-context cultures, have limited the universal and consistent use of CC instruments to measure the CC of health workers. 56,57 For instance, a belief about health or illness that is acceptable in one culture may not be acceptable in another. 
[58][59][60][61][62] A study of Ethiopians' health and illness beliefs found that they regard spiritual healing as an important treatment for illness, prefer injections to tablets, and prefer translators from their own ethnic group. 63 Additionally, Ethiopia's high number of cultural factors, the limited attention to CC in health practice, the ambivalent and complex nature of CC, and inconsistencies in the rationales for CC tool construction elsewhere might make it unsound to use any single existing CC tool to assess the CC of Ethiopian health workers. 34,64 Therefore, a CC tool for use in Ethiopia should be comprehensive: beyond other cultural factors, it should address the marginalization that results from prejudice or stereotypes within the same culture. For example, some individuals are marginalized by those perceived as dominant within the same ethnic group: bodily contact with them is not allowed, and touching them is regarded as defiling, which heightens patients' dissatisfaction with healthcare services.
The health disparities in Ethiopia that result from these and other cultural factors make it imperative to conduct a study of this type before developing, adapting, and validating CC tools. This study therefore serves not only to achieve the stated objectives but also as a stepping stone toward the development of an Ethiopian health workers' CC scale. The next section discusses the conceptualization and theoretical frameworks of CC.

Conceptualization and theoretical frameworks of CC
There is no single agreed-upon definition of CC, and the concept remains complex, ambivalent, and ambiguous. 44 Scholars denote it interchangeably using various terms: 'culturally congruent', 'interculturality', 13,65 'transculturality', 66,67 and 'cross-culturality'. 22 The major defining features and concepts deduced from these terms do not appear to contradict each other. 68 For example, the terms share an emphasis on addressing the needs of clients who differ from their service providers in one or more cultural factors. 68 In this study, these terms are used to represent the constructs and concepts addressed in CC.
Despite the lack of consensus, CC has been conceptualized as a set of congruent behaviors, attitudes, and policies that come together in a system, or among consumer providers and professionals, and enable them to work effectively in cross-cultural situations. 65 Papadopoulos et al. define CC as the study of cultural differences and similarities in health and illness, together with their underpinning societal and organizational structures, in order to understand current health care practice and contribute to its future development in a culturally responsive way. 67 The ambiguity of the concept is evident in another definition, which frames CC as attention to patients' prior healthcare experiences in their own country and abroad and to their possible migration backgrounds and experiences. 68 Such differences in defining features may make it contradictory to apply a single model or theory across different health and cultural contexts. It is therefore necessary to identify the most widely used definitions; Campinha-Bacote's definition of CC and its sub-constructs were found appropriate as the operational definitions for this study. 9 Beyond their differences, the definitions commonly strive toward three core components of global health goals: health as a human right, attention to the perspectives of the entire world, and the use of an interdisciplinary approach. 69 Despite the continued ambiguity, scholars have continued to develop, adapt, and evaluate CC measuring tools in health contexts using theories, models, and conceptualizations.
Among the classic theories of culture and culturally appropriate health care, the most notable is Leininger's. While working as a clinical nurse specialist with worried children in the USA, she noticed recurrent behavioral variations among the children. 66,70 This motivated her to hypothesize that such variations might have a cultural basis. 59,71,72 To test this, she conducted intensive research to understand the nature of culture outside the Western context. 66 She travelled to Papua New Guinea, studied the culture there, and later synthesized her grand theory, the Cultural Care Theory (CCT).
CCT aims to equip health professionals with the capacity to care for culturally diverse people as a crucial element of health care. In her work, Leininger emphasized nurses' transcultural capability to deliver culturally appropriate care to clients. She devoted considerable time not only to developing the CCT but also to making it usable in most cultural contexts. The philosophical roots of Leininger's CCT include extensive and diverse nursing experience; anthropological insights; life experiences and values; and reflection on spiritual insights and beliefs. 58, 66 Leininger represents her theory with the Sunrise Model, in which the rising sun symbolizes care. Her CCT is depicted in Figure 1. 73 The figure depicts the four foci of the Sunrise Enabler as a cognitive map of the culture care theory; the upper portion of the model contains the components of social structure and worldview factors that influence care and health. The figure shows how health services made available to diverse clients should consider the cultural variations of individuals, families, groups, and communities when providing services, delivering health education, and designing curricula. 58,62,74 Leininger's theory primarily emphasizes culturally based factors in the second focus: technological, religious, and philosophical factors; kinship and social factors; and cultural values, beliefs, and lifeways. 58,59 Leininger argues that culturally congruent health care does not result from any single factor but from a complex interaction of social, organizational, internal, and external dimensions and factors of health providers and professionals. 75 Another model of CC is Giger and Davidhizar's Model of Transcultural Assessment and Intervention (MTAI), first published in 1988. 
76 The MTAI explores variations in caregivers' responses and perspectives relative to the cultural diversity present in a specific country. It can be used to examine the evaluation, decision-making, and implementation approaches of health professionals regarding CC in health. 76 Purnell's model of CC, in its 2000 and 2005 versions, brought some improvement by addressing the limitations of the MTAI and clarifying the CCT's three activities and decisions. The factors in Leininger's theory were identified as 12 domains in Purnell's model, which include overview/heritage, communication, family roles and organization, workforce issues, bio-cultural ecology, high-risk behaviors, nutrition, pregnancy and childbearing practices, spirituality, healthcare practices, and healthcare practitioners. 77 The most widely used CC model was developed by Campinha-Bacote, who made many improvements to her initial model in an effort to capture the ambiguous nature of CC. Her model is called the Process of CC in the Delivery of Health Care Services. 10,17,78 Her 1991 model incorporated a limited number of constructs, namely cultural awareness (CA), cultural knowledge (CK), cultural skill (CSK), and cultural encounters (CE), without expressing their interdependence. She then revised her model, adding a fifth construct called cultural sensitivity (CS) and changing the pictorial representation from linear to a set of circles. 18, 78 Campinha-Bacote's model is depicted in Figure 2. 73 The five CC constructs in the model share many common features. The assumptions of Campinha-Bacote's model include: CC is a process, not an event; there is more variation within ethnic groups than across ethnic groups (intra-ethnic variation); the level of competence of health workers is related to their ability to provide culturally responsive health care services; and CC consists of the aforementioned five sub-constructs or factors. 
9 For this study, the concept of CC and its sub-constructs as defined in Campinha-Bacote's 2003 work were used as operational definitions. 69 This is because the validation, adaptation, and development of most CC instruments have been strongly informed by the sub-constructs identified in Campinha-Bacote's model. 30,79,80 Cultural competence (CC) is health workers' ability to deliver culturally appropriate care to culturally diverse patients, or the ability to synthesize and apply previously acquired cultural awareness (CA), cultural knowledge (CK), cultural skill (CSK), cultural encounter (CE), and cultural sensitivity (CS) to provide culturally appropriate patient care.
Cultural encounter (CE) is an ongoing process of interacting with patients from diverse cultural backgrounds. It aims to validate, refine, or modify existing values, beliefs, and practices of a cultural group.
Cultural desire (CD) is the motivation of the healthcare professional to 'want to' engage in the process of becoming culturally competent, not to 'have to'.
Cultural awareness (CA) is deliberate self-examination and in-depth exploration of one's biases, stereotypes, prejudices, assumptions, and '-isms' that one holds about individuals and groups who are different from them.
Cultural knowledge (CK) is the process of acquiring a solid educational foundation about culturally and ethnically diverse groups.
Cultural skill (CSK) is the ability to collect culturally relevant data regarding the patient's presenting problem as well as accurately perform a culturally-based physical assessment in a culturally sensitive manner.

Ethical statement
The Research Ethics Review Committee (RERC) of the College of Natural and Computational Sciences, Hawassa University, approved the implementation of the research on September 11, 2022, with reference number RERC 10010/22 (Letter of Research Ethics Review Committee). The Committee confirmed that the research met the criteria, which included minimizing risk to subjects, selecting subjects equitably, ensuring respect for persons through informed consent, documenting consent appropriately, and maintaining privacy and confidentiality. Written informed consent was obtained from all participants.

Sources of information and search strategy
We searched the Google Scholar database for the term 'review of CC measuring tools in health settings' for publications from November 2021 up to August 2022. The search was then narrowed by searching for the statement 'health workers' Cultural Competence Measuring Instruments: a systematic review' within the first set of results. Next, 347 articles or documents published in English were downloaded for further evaluation to select relevant materials. After screening against the COSMIN checklist manual, 81 three articles, namely "measures of cultural competence in nurses: an integrative review", 51 "psychometric properties of instruments used to measure the cultural competence of nurses: a systematic review", 24 and "cultural competence: analyzing the construct", 25 were chosen for intensive review to further narrow down the selection of articles.
Based on the insights gained, full-text screening of studies related to CC in health settings and factor analysis of subconstructs, models, and theories with target groups of health professionals, health workers, nurses, nursing students, physicians, and health providers were conducted. Articles on CC assessment, scale, tools, instruments, and questionnaires were also considered for inclusion in this study.
Finally, 103 articles or documents of two types, published between 1972 and 2022 and available through Google Scholar from Academic Medicine, Sage Pub, Academia, Course Hero, Science Direct, Transcultural Care Net, Quizlet, Semantic Scholar, Simply Psychology, Springer, and Research Gate, were identified and reviewed.
The first type comprised papers from outside Ethiopia, including methodological materials; reviews of culture and CC; studies developing, adapting, and validating CC tools; and CC concepts, theories, and frameworks. The second type comprised papers based in Ethiopia or on Ethiopian topics, including sociolinguistics, population, and human resources; health policy, strategy, goals, diversity for outcome disparities, and standards; and sociocultural health beliefs, languages, and ethnicity.

Procedure
Eight carefully selected experts participated in this study. The experts were selected for their relevant contributions to the issue under study, through in-depth discussions with college deans, department heads, and researchers. The first group had three experts: a physician with an MD degree who had taught health-related courses to health students at Dilla University and delivered health services at the university's referral hospital for more than five years; a sociologist (PhD) with more than ten years of teaching experience at the same university; and a clinical nurse who had worked at the university's referral hospital for more than a decade. The two health workers in this group were selected on the recommendation of the dean of the health college of Dilla University. The sociologist was chosen by the authors after careful evaluation of the profiles of teachers in the departments of social anthropology and sociology. The experts in this group served as judges in the selection and inclusion of CC tools, sub-constructs, and respondents' personal information. These experts also independently nominated health workers to serve as experts in the second group, which screened items from the agreed-upon CC tools. On the recommendation of the first group, the second group comprised three experts: a physician working at Black Lion Hospital; a clinical nurse with a master's degree working at Jemmo Health Post in Addis Ababa; and a lecturer in public health at Jimma University. These experts had more than seven years of experience in Ethiopian health settings and screened and selected items, for further evaluation, from the CC instruments that the first group had decided to include.
Following this, the third group was set up for the final screening. After in-depth discussion, the authors reached a consensus to select these experts from the psychology department, because courses on item development, adaptation, and validation are delivered to undergraduate and postgraduate students there. The group consisted of two experts with MA degrees in psychology, specializing in measurement and evaluation, a field that aims to equip learners with the knowledge, skills, and values needed to construct, appraise, and administer valid and reliable assessment instruments. Both had experience in teaching test construction and related courses, and both were lecturers at the Department of Psychology, College of Education, Dilla University.
To obtain their willingness to participate, experts in the first and third groups were contacted both face to face and by telephone. Data were gathered from these two groups at Dilla University, Odaya Campus, in the lecture hall of the College of Education and Behavioral Sciences, at different time points. Experts in the second group were contacted by telephone, and data were gathered at the Office of Excellence Consultancy and Training Center (ECTC) in Addis Ababa. The willingness expressed during these contacts was taken as initial consent to participate. Beyond the discussions held at different time points, the experts in all three groups worked independently on the tasks assigned to their respective groups. Finally, written consent was obtained from the experts in all three groups during the face-to-face data gathering. Data were collected in Dilla and Addis Ababa from September 2022 to November 2022. Figure 3 depicts the procedure followed in this study.
To guide the subsequent stages, rating criteria were designed based on a review of prominent scholarly works on the principles of scale development and item construction [82][83][84][85] ; a review of articles on CC instruments 24,25,51 ; and the COSMIN checklist manual. 81 The rating criteria were synthesized following the work of McAlister et al. 86 on qualitative coding to assess inter-rater reliability (IRR) through iterative expert discussion. The criteria were prepared by reviewing the technical manuals of CC instruments and relevant articles.
The rating criteria used to select CC instruments for inclusion in the study were the adequacy of the number of sub-constructs; the use of a clear framework; the appropriateness of the target group; an adequate sample size; methodological flows; originality; and reports of validity and internal consistency. The personal characteristics were those that raters identified from CC measuring tools, literature reviews, CC frameworks, and practical experience as personal, professional, and job-related cultural factors believed to be significant in delivering culturally congruent care to culturally diverse Ethiopian patients. The sub-constructs were decided from the factors identified in the selected CC tools, the literature review, CC frameworks, and the factor analysis results of related studies. The rating criteria prepared to guide this study are indicated in Table 1. 73 The experts at each stage prepared codes to use as rating criteria, guided by the work of McAlister et al. 86 on qualitative coding to assess inter-rater reliability (IRR) through iterative expert discussion. The preparation of rating codes was also supported by test construction materials, three systematic review articles on CC instruments, and the COSMIN checklist manual. 81 The three experts in the first group designed rating codes drawn mainly from the COSMIN checklist and test construction materials. According to the checklist, health measuring instruments should be included in such studies if they report internal consistency or reliability, content validity, structural validity, and cross-cultural validity. The experts further enriched their rating codes with test construction materials and the three reviewed articles. 
As rating criteria to guide instrument selection, they agreed on the number of sub-constructs in the CC instrument, the appropriateness of the target group, comprehensiveness in addressing prominent cultural diversity factors, and the use of a clear study framework during instrument development, validation, or adaptation.
Out of the selected CC tools, the first group of experts decided on 20 relevant measuring tools to pass to the next stages of this study. Next, the IRR rating codes were further enriched, and the experts agreed to base their decisions on the adequacy of the number of sub-constructs, clear theoretical frameworks, the appropriateness of the target group, sample size adequacy, reports of internal consistency for the full scale and sub-scales, reports of relevant validity, methodological flows, and originality, rating "1" to agree with the inclusion of a CC instrument or "0" to disagree. The prepared rating criteria to guide this study are indicated in Table 1.
In general, the three experts in the first group decided which CC tools to include, identified sub-constructs, and determined the relevant personal information for the subsequent parts of this study, i.e., pilot testing, validation, and factor analysis. They also decided the number of CC instruments to pass to the subsequent item selection and evaluation stages.
To select items from the agreed CC instruments, the three experts in the second group designed further IRR codes based on the work of McAlister et al. They prepared the codes after receiving an orientation on good personality item writing and completing independent readings, and after synthesizing the item selection principles from COSMIN 81 and test construction materials. [82][83][84] The experts of this group agreed on three rating codes for their inclusion decisions: whether an item comes from a relevant sub-construct, its appropriateness to the target population, and its adherence to the principles of item construction. The codes that guided the experts of the second and third groups in item selection and evaluation, respectively, drew on the test construction materials of Kline (2015), DeVellis (2012), and Nunnally and Bernstein (1994). [82][83][84] According to DeVellis, the criteria for selecting or evaluating the utility of personality items include: do not write items in the past tense; write items that contain only one idea; avoid double negatives; prefer items with a simple and direct sentence structure; avoid absolute words such as only, just, always, and none; avoid items with many interpretations; avoid items that are likely to be endorsed by everyone; use straightforward language; and keep items to a maximum of 20 words. 82 In addition, Kline's rating criteria, namely the acquiescence response set, the social desirability response set, a preference for responding in the center or 'uncertain' category, and a respondent's preference for the extreme response categories, were considered as rating codes in both the selection and evaluation of items. 84 Finally, the item construction principles suggested by Nunnally and Bernstein were also taken into consideration during both item selection and evaluation. 83 Their suggestions include: items must be clear; use straightforward wording; all respondents must be able to offer information reliably when asked an item; each respondent must be able to provide an answer to each question; and write specific, short items. 83 Therefore, the experts in the second and third groups used the aforementioned item-writing suggestions to guide their ratings and as coding criteria when selecting and evaluating items. Table 1 depicts the coding criteria used for inclusion and exclusion ratings. 73 According to McAlister et al., 86 the preparation of criteria is an initial step towards establishing IRR between raters within a team of experts. IRR was computed to determine the percentage of agreement between raters. The IRR computation did not require each rater to include or exclude the same number of items, tools, sub-constructs, or demographic variables.
The IRR was calculated as the number of agreed ratings divided by the total number of ratings on the issue under study. The computed IRR, which gives the percentage of rated materials on which the raters agree, should be compared to the desired 80%-90% agreement. The overall IRR was calculated as the number of ratings that all three raters agreed on divided by the total number of ratings. In addition, the IRR between each individual pair of raters was computed as the number of ratings that both raters agreed on divided by the total number of rated materials or sections. Each pair of raters had two IRR values: (1) the number of times rater one agreed with rater two divided by the total number of ratings used by rater one, and (2) the number of times rater two agreed with rater one divided by the total number of ratings used by rater two.
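The percent-agreement computations described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the script used in the study: the function names and the rating data are hypothetical, the ratings use the "1" (include) and "0" (exclude) codes described earlier, and all raters are assumed to have rated the same set of objects, in which case the two directional values for a pair of raters coincide.

```python
from itertools import combinations


def overall_agreement(ratings):
    """Fraction of rated objects on which all raters gave the same code."""
    columns = list(zip(*ratings.values()))  # one tuple of codes per rated object
    agreed = sum(1 for codes in columns if len(set(codes)) == 1)
    return agreed / len(columns)


def pairwise_agreement(ratings):
    """Fraction of matching codes for every pair of raters."""
    result = {}
    for a, b in combinations(ratings, 2):
        matches = sum(x == y for x, y in zip(ratings[a], ratings[b]))
        result[(a, b)] = matches / len(ratings[a])
    return result


# Hypothetical inclusion codes for eight tools (1 = include, 0 = exclude).
ratings = {
    "rater1": [1, 1, 1, 0, 1, 0, 1, 1],
    "rater2": [1, 1, 0, 0, 1, 0, 1, 1],
    "rater3": [1, 1, 1, 0, 1, 1, 1, 0],
}

print(f"overall: {overall_agreement(ratings):.1%}")
for pair, value in pairwise_agreement(ratings).items():
    print(pair, f"{value:.1%}")
```

Each printed percentage can then be compared with the desired 80%-90% agreement.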
Cohen's Kappa was computed to determine the IRR of the two raters who evaluated the quality of the final items before inclusion. Cohen's Kappa, symbolized by the lowercase Greek letter κ, is a robust statistic useful for either inter-rater or intra-rater reliability testing. Similar to correlation coefficients, it can range from -1 to +1, where 0 represents the amount of agreement that can be expected from random chance, and 1 represents perfect agreement between the raters. 87 The reliabilities computed for the inclusion of instruments, CC sub-constructs, item selection, item evaluation, and demographic variable identification were interpreted using IRR coefficients 86 and inter- and intra-rater reliability testing standards. 87 For the computed IRR, the percentage of rated materials on which the raters agree should be compared to the desired 80%-90% agreement.
Two raters screened and evaluated the items in this study, and their IRR was computed using Cohen's Kappa. The proportions of items that both raters in the third group agreed to include (a) and to exclude (b) were calculated, and Cohen's Kappa was then computed for the items. The IRR result was to be evaluated against the indicated reliability index. The two experts then discussed contextualizing further items for inclusion, after which an increase in the computed Cohen's Kappa was expected. The final set of screened items was given to the two experts for further evaluation.

27 inventory for assessing the process of cultural competency (IAPCC), 17 transcultural self-efficacy tool (TSET), 98 cultural competence assessment instrument (CCAI-UIC), 28 cultural competence assessment (CCA), 99 Korean cultural competence scale for clinical nurses (K-CCS-CNs), 23 critical cultural competence scale (CCCS), 79 and cross-cultural evaluation tool (CCET). 100 The first group of experts selected the sub-constructs and personal information relevant to capturing cultural competence in the Ethiopian context from the instruments that at least one expert in the first group agreed to include. The experts in the second and third groups selected and evaluated items, respectively, from the CC tools that all three experts in the first group agreed to include.
However, the first group of experts and the researchers agreed to keep two CC tools, the IAPCC 17 and the TSET, 98 as a guiding framework for the selection and evaluation of items. This is because most existing CC instrument development, validation, and adaptation studies are informed either by these two instruments or by the frameworks used to develop them. 30,67,80,94,101 The following sections present the review of CC measuring tools, the selection of CC tools, the identification of sub-constructs and socio-demographic variables, item screening and evaluation, and the item wording used to contextualize items to Ethiopia's cultural setting.

Review of CC measuring tools
Before the findings of the study are presented, the results of the review of CC measuring tools are presented in Table 2. Two of the 20 CC tools, with 83 items, were developed for use in the USA with students and teachers as the target group: the CCCET of Jeffreys and Dogan 94 and the TSET of Jeffreys and Smodlaka. 98 The sections below address the conceptual frameworks, response options, and reliability of the CC instruments. The subsequent sections present the study findings on IRR for the inclusion and exclusion of CC measuring tools, socio-demographic characteristics, CC sub-constructs, item screening, and item evaluation.

IRR of study constructs
The 20 carefully selected CC tools were further evaluated to select the final instruments, determine the sub-constructs incorporated, identify the major socio-demographic variables, select relevant items, and evaluate the items. Table 3 depicts these issues and the IRR of each theme of the study. 73

Selection of CC measuring tools
For the selection of CC measuring tools, the IRR of the three experts' ratings was computed. The results show that the IRR was between 75% and 87.5%. In other words, the raters' disagreement ranged between 12.5% and 25%, owing to the raters' differing interpretations when evaluating the tools, which might have originated from a lack of prior experience in executing similar tasks.
Eight CC tools, namely NCCS, IAPCC-R, TSET, CCAI-UIC, CCA, K-CCS-CNs, CCCS, and CCET, gained 100% agreement on inclusion from the three raters, whereas four instruments, namely CCI, ECSAI, CCET, and INCCQ, gained 100% disagreement on inclusion from the three raters. Six instruments, namely CSES, CCCET, CCAT, CAS, CS, and CCINC, attained 66.7% agreement, with two raters agreeing and one disagreeing. Finally, two instruments, the CKS and CDQNE, gained 33.3% agreement, with one rater agreeing and two disagreeing.

Identifying sub-constructs
Here, the IRR was computed for the six major factors incorporated in the remaining 16 CC measuring tools, that is, the tools that one or more raters agreed to include. The IRR is 38% for CS, 44% for CD, 50% for CE, 75% for both CK and CA, and 81% for CSK.  Agreed for the inclusion of 24 items from 41 items of NCCS, 28 17 items from 24 items of CCAI-UIC, 29 16 items from 26 items of CCCS, 96 14 items from 20 items of CCE, 85  Beyond raters' agreement, the IRR computed with the exclusion of CAS and CCET, owing to unrelated sub-scales and a limited number of sub-constructs, respectively, reached approximately 69%. Likewise, the percentage of tools incorporating each sub-construct ranged approximately from 43% for CS to 93% for CSK. The CA and CK sub-constructs were each incorporated in 86% of the aforementioned instruments, while CS and CE were incorporated in 50% and 57% of the reviewed instruments, respectively.

Socio-demographic variables
Regarding socio-demographic variables, the IRR was found to be 50%-86%. On this basis, gender, age, religion, type of organization, department, language, professional category, education, language use, language skill, professional category experience, cultural diversity encounter, cultural training, level of cultural competence, ethnic self-identification, and performance evaluation result were selected to measure the socio-demographic characteristics of health workers in the cultural setting of Ethiopia.

Item selection
The IRR for the selection of items was computed similarly and found to be 59% for rater three, 62% for rater one, and 67% for rater two. The first three experts agreed to keep the TSET and IAPCC from the selected instruments as the framework of the subsequent study. Then, six CC measuring tools, namely the 41 items of NCCS, the 24 items of CCAI-UIC, the 26 items of CCCS (excluding the 17 items of critical empowerment), the 20 items of CCE, the 25 items of CCA, and the 33 items of K-CCS-CNs, were included for the purpose of item selection. The raters were therefore expected to select relevant items from the total of 169 items compiled from these six CC instruments. On this basis, the raters agreed on and selected 65 items to pass to the next stage of item screening and evaluation (see extended data, Appendix A). 73

Item screening and evaluation: Rationale for exclusion and inclusion of items
The two raters both agreed on the inclusion of 46 items and the exclusion of 13 items. The IRR computed by Cohen's Kappa was found to be 0.75, which is interpreted as moderate reliability. The two experts then discussed contextualizing items for inclusion, and both agreed to incorporate 10 additional items. Following this, the 65 screened items were further evaluated to decide the final numbers of included and excluded items.
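As a plausibility check on the reported value, Cohen's Kappa can be recomputed from the counts given above. The sketch below assumes that both raters rated all 65 screened items, so that 65 - 46 - 13 = 6 items were rated differently by the two raters. How those six items split between the raters is not reported, so the hypothetical function `cohen_kappa` is evaluated for every possible split.

```python
def cohen_kappa(both_in, both_out, only_r1_in, only_r2_in):
    """Cohen's Kappa for two raters making include/exclude decisions."""
    n = both_in + both_out + only_r1_in + only_r2_in
    p_observed = (both_in + both_out) / n
    r1_in = both_in + only_r1_in  # items rater one included
    r2_in = both_in + only_r2_in  # items rater two included
    # Chance agreement from each rater's marginal include/exclude rates.
    p_expected = (r1_in * r2_in + (n - r1_in) * (n - r2_in)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# 46 joint inclusions and 13 joint exclusions are reported; the split of the
# six remaining items is an assumption, so every split is tried.
kappas = [cohen_kappa(46, 13, k, 6 - k) for k in range(7)]
for k, kappa in enumerate(kappas):
    print(f"split {k}/{6 - k}: kappa = {kappa:.3f}")
```

Under these assumptions, every split gives a value that rounds to 0.75, in line with the reported coefficient.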
The IRR was computed as the quotient of the number of items each rater identified as violating test construction principles and the number of such items already identified in Table 2 of this paper (n=12). On this basis, the IRR of rater one was 50% and that of rater two was 67%. Using the agreed appraisal criteria indicated in Table 2, both raters reached a consensus to discard 13 of the 65 items. The 65 items selected from the six CC tools, the items evaluated by the experts, the final 56 included items, and the 11 excluded items are indicated in Appendices A, B, C, and D, respectively. 73 Items such as 'health care knowledge and the patient's comprehension of interpretation of health/illness are usually different systems', 'even if a patient's use or adoption of a treatment method differs from my professional knowledge', 'I usually don't prohibit it', and 'knowledge of cultural differences about the decision maker in the family' were automatically discarded because they violate one or more item construction principles. 73 Items such as 'I can explain the possible relationships between the health/illness beliefs and culture of the patients', 'I do not feel that I have the skills to provide services to ethnic minority clients', and 'there are no cultural variations within a cultural group of people' were rejected because they are likely to be endorsed by every respondent. Items such as 'I openly discuss with others' issues I may have in developing multicultural awareness' and 'Awareness of culture's impact on perception on health and illness' were rejected because they are prone to multiple interpretations.
The item 'cultural and linguistic differences could compromise healthcare provider's well-being' was rejected for violating both the double-barreled and brevity principles. The item 'it is not important to assess a patient's preferences in terms of healthcare services if I am knowledgeable about their culture' was excluded for violating the item-writing principles that an item should be short, comprehensible, and brief.
The items 'thinking what it would be like if I were a foreign patient', 'trying to say simple phrases in the foreign patient's language when I care for them', and 'knowledge of cultural differences about sensitivity to pain' were rejected owing to lack of clarity, inappropriateness for the target group, and lack of both a self-referent and clarity, respectively.
Both experts of the third group also agreed to accept 11 items with minor modifications suited to the nature of each item. The items that both experts agreed to include after modification, mainly by incorporating self-referent phrases, include: 'I understand a patient's cultural background can promote the quality of Health care', 'I can compare the health or illness beliefs among patients with diverse cultural background', 'I have the Knowledge of cultural/religious belief that limits caring depending on gender', 'I use a variety of sources to learn about the cultural heritage of other people', and 'I have knowledge of cultural differences about touch or Space'.
Therefore, 13 items were excluded, 11 items were modified for inclusion, and the remaining 52 items passed for further evaluation.

Item wording to contextualize to the cultural setting of Ethiopia
The words or phrases in the items selected from the six CC tools were replaced by contextual words or phrases to make the items more appropriate for use in the health cultural context of Ethiopia. These words or phrases, including 'goals', 'most people's', 'it is difficult', 'race', 'a service provider', 'provider's', 'compromise', 'people', 'all', 'members', 'AND', 'identify', 'school', 'relate', and 'foreign patients (FP)', were replaced or changed by 'activities', 'culturally diverse patients', 'I face difficulties', 'health professional', 'health professional's', 'hinder', 'patients', 'different', 'patients', 'OR', 'encounter', 'place', and 'treat culturally diverse patients or patients whose ethnic is different from mine', respectively. The noun 'nursing' was replaced by 'clinical' or 'health care', depending on context. The word 'client' was replaced by 'culturally diverse patient' or 'patients'.
The experts' reasons for replacing or removing words and phrases were: to address the lack of a target; to avoid items likely to be endorsed by everyone; to avoid items with multiple interpretations; to improve the clarity of items; to make items as independent as possible; to avoid jargon; to articulate precisely the factor each item is intended to measure; to restate items in a proper arrangement; and to improve the manifestation of patterns of associations. Words like 'only', 'usually', and 'completely', which denote absoluteness, were also removed. Furthermore, items that violated one or more principles of item construction were rewritten to improve their quality. These violations included being double-barreled, lacking a target group, being open to endorsement by everyone, being prone to multiple interpretations, violating brevity, including absolute words, being stated as long sentences or with more than 20 words, lacking simplicity, being stated too directly or with idiomatic expressions, being written too short, and lacking comprehensibility.
The double-barreled items were either rewritten as two independent statements or restated as a single item, depending on the significance of the item. For instance, the item 'Can have appropriate verbal and nonverbal communications with foreign patients' was rewritten, after the necessary modifications by the experts, as two statements: 'I can have appropriate verbal communications with patients whose ethnicity or culture is different from mine' and 'I can have appropriate nonverbal communications with patients whose ethnicity or culture is different from mine'. Another double-barreled item, 'I find ways to adapt my services to individual and group cultural preferences', was restated as the single item 'I find ways to adapt my services to individual cultural preferences.' Lastly, the phrase 'ethnic minority clients' was replaced by phrases such as 'my culturally diverse patients' and 'patients whose ethnicity is different from mine'. On the other hand, the words or phrases 'behavior', 'ways', 'usually', 'completely', 'will', 'disability', 'being flexible, empathetic, and non-judgmental,' and 'group' were removed by agreement.
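Two of the item-writing rules applied above, the 20-word limit and the avoidance of absolute words, are mechanical enough to be pre-screened automatically before expert review. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `screen_item`, the tokenizing rule, and the exact word list are illustrative choices, with the word list combining the absolute words named in the text. The remaining principles, such as double-barreled wording or proneness to multiple interpretations, still require expert judgment.

```python
import re

# Absolute words named in the text; treating this set as exhaustive is an assumption.
ABSOLUTE_WORDS = {"only", "just", "always", "none", "usually", "completely"}
MAX_WORDS = 20  # word limit stated in the item-writing criteria


def screen_item(item):
    """Return the mechanical rule violations found in one item."""
    words = re.findall(r"[a-z']+", item.lower())
    flags = []
    if len(words) > MAX_WORDS:
        flags.append(f"too long ({len(words)} words)")
    used = sorted(ABSOLUTE_WORDS.intersection(words))
    if used:
        flags.append("absolute wording: " + ", ".join(used))
    return flags


items = [
    "I usually don't prohibit it",
    "it is not important to assess a patient's preferences in terms of "
    "healthcare services if I am knowledgeable about their culture",
    "I find ways to adapt my services to individual cultural preferences",
]
for item in items:
    print(screen_item(item) or "no mechanical flags", "|", item)
```

An empty list only means that neither mechanical rule was triggered; it does not mean the item is acceptable.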
Finally, 56 items were selected for further validation before inclusion in the final version of the newly developed CC Scale for use in Ethiopian health care settings.

Discussion
The  103 According to Bloom, the TSET's affective sub-construct is intended to guide how health professionals respond, organize values, and internalize the proven cultural variations that help them deliver culturally congruent care to culturally diverse patients, and the practical sub-construct measures how effectively and precisely a health professional handles the expected health services. Therefore, the 83 items of the TSET are believed to address six of the most commonly identified sub-constructs in comprehensive, reliable, and valid ways.
Generally, most CC tools have incorporated three sub-constructs, namely CSK, CA, and CK. CE and CS are the next most frequently incorporated sub-constructs after these three. CD is the least incorporated sub-construct in CC tools.
Specifically, the IAPCCs of Campinha-Bacote 9,17,102 and the CDQNE of Sealey 80 directly incorporated five of the sub-constructs: CK, CA, CSK, CD, and CE. The CCINC of Cai et al. 97 also incorporated five of these sub-constructs, but with some ambiguity: the factor denoted 'respect' could be categorized as either desire or encounter, and the factor denoted 'self-understanding' overlapped with CK. The experts later agreed to categorize 'respect' as CS and 'self-understanding' as CE. Of the six sub-constructs used to measure CC, four (CK, CA, CS, and CSK) were directly incorporated into the NCCS 27 and K-CCS-CNs. 23 The sub-constructs CA, CK, and CS were directly incorporated into the CCAT, 19 whereas the fourth sub-construct, CSK, was represented indirectly by the tool's 'cultural practices' factor. Similarly, CA, CK, and CSK were directly incorporated as factors of the CCAI-UIC, 28 while CE was represented by the factor of this tool designated as 'practice'. However, the four sub-constructs incorporated in the CC measuring tool CS 96 are represented only indirectly by its factors: interaction or interaction alternatives, interaction confidence, engagement, and respect for cultural difference. The experts agreed that these factors measure the CC sub-constructs CK, CSK, CD, and CS, respectively.
CC tools addressing only three of the sub-constructs of CC, namely the CCCS 79 and the CSES, 94 were also identified. The CCCS incorporated the sub-constructs CA, CK, and CSK, which were indicated directly and precisely. In the CSES, the sub-constructs CK and CSK were identified directly, while the CES sub-construct was assigned as a substitute for the factor denoted as 'knowledge of cultural patterns for specific ethnicity'. By contrast, in the CC tool CSK, the only directly indicated sub-construct is CK, while the CSK and CE sub-constructs are denoted by the other two factors identified in the scale.
Concerning the conceptual frameworks of the CC tools, the most commonly used frameworks were Campinha-Bacote's process of CC model and Papadopoulos, Tilki, and Taylor's model. Leininger's cultural care theory, Albert Bandura's social cognitive theory, Jeffreys' CC and confidence model, the Purnell model of CC, the critical CC model, and literature reviews were also used by professional experts in developing the tools. However, a significant number of the aforementioned tools did not report the models or theories used to guide their construction.
Therefore, for the subsequent study, Leininger's cultural care theory and Campinha-Bacote's process of CC model have been recommended and agreed upon for synthesizing the conceptual framework of the Ethiopian CC Scale for Health Workers to be developed. This was made necessary by the lack of consensus on the definitions of culture and CC. 104 The term 'CC' has appeared regularly in the health care literature over the past six decades, but there is no single, commonly accepted definition, and the term remains ambiguous. 44 The experts' engagement is therefore required to mitigate the lack of agreement on what exactly constitutes culture and CC in the health setting by grasping the major tenets of culture and CC through the models, theories, and literature. In addition, these limitations, together with a lack of attention to culture and CC issues in the health policy and practice of Ethiopia, are further reasons to explore more theories, models, tools, and literature that can help in adopting a synthesized conceptual framework appropriate to the rich cultural diversity of Ethiopia.
Leininger's CCT created new insight into the nature of culture, care, and the transcultural health profession by presenting ways to describe, interpret, discover, and understand the diverse meanings and patterns of health and illness, care and healing, survival, and ways of facing and understanding death or disability among culturally diverse people. Scholars have recommended CC studies in countries with multiple languages, proven ethnic variability, a high number of foreign patients, internal displacement, and increasing numbers of immigrants, citing these as common reasons for using CCT, and believe that it properly guides similar studies measuring health professionals' CC. 61 Most CC theories and models have been constructed or adapted by blending Leininger's pioneering ideas into their own work. 17,77,79 Based on the scholars' recommendation, the experts of this study agreed that the selection of item content to measure health professionals' level of CC should be guided by the CCT factors, which encompass: worldview; beliefs, values, and practices; technology; education; economics; communication; ethno-history; kinship, family, and social; religious, philosophical, and spiritual; political and legal; and generic folk care. 58,59,70,72,74 Although CCT identifies the major components or sub-constructs to be incorporated in the adaptation, development, and validation of measures of health workers' CC, it lacks clarity in conceptualizing their use as sub-constructs. 77 Therefore, the experts agreed to use Campinha-Bacote's process of CC model, which is believed to operationalize the CCT action and decision modes, as the source of sub-constructs for measuring health professionals' CC. This model has been recommended as the proper model for studies of this type because it turns Leininger's three actions and decisions into workable CC sub-constructs.
According to Campinha-Bacote, the three CCT actions and decisions were further categorized into five sub-constructs, CK, CA, CSK, CE, and CS, to make it plausible to conceptualize and operationalize the underlying manifestations of CC. 18 Most CC measuring tools have used Campinha-Bacote's process of CC model and have mainly incorporated CA, CK, CS, CSK, and CE as the major sub-constructs for measuring the CC of health providers and professionals. In addition, the model of Papadopoulos et al. 67 identified CA, CK, CSK, and CS as four levels or stages, and categorized CC as the final stage, in which the previous stages are integrated and applied to skills, including recognizing and challenging all forms of discrimination.
Most of the reviewed CC tools assess the CC of health workers in a self-administered manner and are based on individuals' perceptions. 21 Moreover, the most commonly used of the instruments developed to measure CC is Campinha-Bacote's IAPCC. 17,55,78 The tools included CA, CK, CSK, CE, and CS as the major sub-constructs of CC.
Generally, eight instruments were selected by the experts as sources from which relevant items would be selected. The group of three experts selected items that were relevant and appropriate in the context of Ethiopia's health setting, initially selecting a total of 65 items.
The numbers of items selected are 13 from NCCS, 12 from CCAI-UIC, eight from CCCS, six from CCE, 10 from CCS, and 17 from K-CCS-CNs. The items were then evaluated for inclusion in the next stage. Based on the experts' judgment, 13 items were automatically excluded. For instance, items like 'healthcare knowledge and the patient's comprehension of interpretation of health/illness are usually different systems, even if a patient's use or adoption of a treatment method differs from my professional knowledge' and 'I usually don't prohibit it, and knowledge of cultural differences about the decision-maker in the family' were automatically discarded because they violate item construction principles and are similar to other items that were included in the initial version of the tool. While 11 items were accepted with minor modifications, the remaining items were modified and contextualized to make them appropriate to the cultural context of Ethiopia. This was done by replacing words or phrases in the existing instruments with other words or phrases that make the items more relevant and appropriate in the health cultural setting of Ethiopia.

Conclusion
According to the reviewed CC instruments and the experts' judgment, Campinha-Bacote's model of the process of cultural competence in the delivery of healthcare services and cultural care theory were the most widely used model and theory. The experts thus recommended adopting Campinha-Bacote's model, along with the theory that informs it, in the subsequent stages of developing the Ethiopian CC measure. More specifically, the experts suggested that, beyond the 56 items selected from the six existing instruments, additional items should be generated to address the rich cultural diversity of Ethiopia. The TSET and IAPCC, the most prominent of the reviewed tools, are recommended to guide the item development phase in order to tap the rich cultural diversity in Ethiopian health service delivery settings.
Most cultural competence measuring instruments included sub-constructs of cultural competence such as cultural skill, cultural awareness, and cultural knowledge. Cultural sensitivity, cultural desire, and cultural encounter were incorporated into 50% of the cultural competence instruments. This research suggested that the commonality that exists in the cultural encounter, cultural desire, and cultural sensitivity might have originated from the broad categorization of behavior such as cognitive, affective, and practical. These three factors were categorized under the affective behavioral domain that encompasses awareness, receiving, responding, valuing, organizing, and characterization.
Therefore, during cultural competence instrument development, authors might think that they can address the three sub-constructs when they construct items for the cultural awareness factor. The overlap among these three sub-constructs might make it preferable to develop cultural competence tools by revisiting Bloom's educational taxonomy and its three major domains: cognitive, affective, and psychomotor.
In addition, some of the 65 selected items overlooked major and minor rules of test construction. For instance, double-barreled ideas, wording likely to be endorsed by everyone, long sentences, proneness to multiple interpretations, violations of brevity, and ambiguous words were among the observed limitations of the items in the reviewed cultural competence instruments. This might be mitigated through the participation of experts in the field of psychometrics. Most instruments were developed by health professionals, and most of them overlooked the ground rules of personality test construction that are highly recommended by prominent thinkers in the field.
Finally, the generation of additional cultural competence items, the translation and adaptation of these 56 items, content validation, and factor analysis should be conducted to obtain valid and reliable tools that measure Ethiopian health workers' cultural competence and to make the items more appropriate to the cultural health setting of Ethiopia. The content validation of the selected items and the generated items will be carried out together to refine the final items to be administered for pilot testing and for conducting CFA and EFA.

Data availability
Underlying data
Zenodo: Identifying the core components and items to measure health workers' cultural competence in the Ethiopian context. https://doi.org/10.5281/zenodo.7497083. 73
The project contains the following underlying data:
• Figure 1.pptx portrays the cultural care universality and diversity theory.
• Figure 2.pptx indicates Campinha-Bacote's process of cultural competence in the delivery of health care services.
• Figure 3.pptx demonstrates the procedure of the study.
• Table 1.xlsx depicts sources, short summaries, study constructs, operational definitions, rating codes for inclusion criteria, and rating codes for exclusion.
• Table 3.xlsx includes issues related to the use of three raters' independent ratings to compute the IRR of instruments, sub-constructs, demographic variables, item screening, and item evaluation.

Extended data
Zenodo: Identifying the core components and items to measure health workers' cultural competence in the Ethiopian context. https://doi.org/10.5281/zenodo.7497083. 73
The project contains the following extended data:
• Appendix A.xlsx depicts the number of CC items selected from six instruments for further evaluation.
• Appendix B.xlsx incorporates the evaluation of items selected from CC instruments.
• Appendix C.xlsx reveals the selected and modified items in the study.
• Appendix D.xlsx shows the number of excluded items in the study.
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).