Toward a paradigm shift from deficit-based to proactive speech and language treatment: Randomized pilot trial of the Babble Boot Camp in infants with classic galactosemia

Background: Speech and language therapy is typically initiated reactively after a child shows delays. Infants with classic galactosemia (CG), a metabolic disease with a known high risk for both speech and language disorders, hold the keys towards evaluating whether preventive treatment is effective when the risks are known at birth. We present pilot data from a randomized parallel trial of an innovative proactive speech and language intervention program, the Babble Boot Camp (BBC). Method: Five children with CG, otherwise healthy, participated in the study from approximately 2 to 24 months of age. One of these was randomly selected as control receiving conventional management, which typically starts at age 2-3 years. A pediatric speech-language pathologist met weekly via telepractice with the parents in the treatment cohort. Parents implemented the prespeech, speech, and language stimulation and expansion activities according to the intervention protocol. The control child was still too young for conventional treatment. Primary outcome measures were speech sound production complexity in babble and speech and expressive vocabulary size. Secondary outcome measures were vocalization rates and developmental milestones in communication, motor, and cognition. The trial is ongoing. Results: All four treated children had higher speech sound skills in babble, three had higher speech sound skills in meaningful speech, two had higher expressive vocabularies, three had higher global developmental scores, and two had higher vocalization rates, compared to the control child with CG. Discussion: Given the high risk for speech and language delays in children with CG, finding on-schedule abilities in two or more of the treated children but not the untreated child is unexpected under random conditions. 
The trends toward beneficial effects of the BBC on speech sound production, expressive language, and communication milestones warrant appropriately powered larger clinical trials with full randomization. Trial registration: ClinicalTrials.gov NCT03838016 (12th February 2019).


Amendments from Version 4
We greatly appreciate the reviewer's suggestions for the present manuscript and also for the larger clinical trial regarding barriers to implementation fidelity and reporting standards. We have addressed the concerns as follows:
- Added an appendix that lists and describes all intervention components of the study. The descriptions of the activities and routines were excerpted from the program overview that parents receive at the start of the intervention.
- Contextualized our transcription reliability by converting the accuracy measure from differences to agreement and by showing that our agreement rates were nearly identical to those in the original MBL paper.
- Outlined the implementation and reporting procedures in place for the clinical trial that followed the pilot study described in the present manuscript.
- Described implementation and fidelity measures planned for future expansions of this project, not only in terms of additional research but, eventually, in terms of clinical practice.

Introduction
Difficulties with speech and language are common among young children. In the US, 11% of children aged 3 to 6 years have a communication disorder (Black et al., 2015). Many parents who are concerned about their child's ability to talk ask the child's doctor, who, in turn, may refer them to a speech-language pathologist. By the time a referral is made, the child may already be two, three, or even four years old and has passed critical stages in the process of speech and language development (Cates et al., 2012). There is strong evidence that early interventions for children with known risks or first signs of a variety of disorders are highly effective, for instance interventions for children with autism spectrum disorder as young as 12 months (Dawson et al., 2010; Guralnick, 2011; Rogers et al., 2014). However, very early speech and language services are not yet available, in part because speech and language are later-developing skills and disorders in these areas cannot be reliably diagnosed on behavioral grounds until an age when deficits become evident, which is greater than 24 months for speech (Goldman & Fristoe, 2015) and greater than 36 months for language (Semel et al., 2004).
Previous studies have investigated the effect of training parents to engage in activities designed to foster language growth in their children, both regarding typically developing children (Roberts et al., 2014; Tomasello & Farrar, 1986; Tomasello & Kruger, 1992) and children with autism spectrum disorder, Down syndrome, intellectual disability, and other conditions that are known to affect language development (Roberts & Kaiser, 2011; Wetherby et al., 2014). In terms of caregiver training strategies, one intervention study via parent training demonstrated that a four-step approach consisting of teaching, modeling, coaching, and reviewing was effective toward increasing children's expressive language skills at ages 24 to 42 months (Roberts et al., 2014). Another approach was based on a six-step strategy consisting of (a) initiation and joint planning, (b) observation, (c) action (modeling and practice), (d) feedback, (e) reflection, and (f) evaluation, described for use with a child aged 26 months (Akamoglu & Dinnebeil, 2017). However, no study targeting prelinguistic skills included children younger than 12 months. Whether treatment focusing on the earliest signals of communication has a beneficial effect on later speech and language development is unknown because such treatments have not been developed and validated. Going even further back in the developmental trajectory, some children are born with genetic or other risk factors for speech and language disorders; this risk is known long before prespeech behaviors such as coo and babble and actual speech and language emerge. The question is whether proactive, preventive treatment, if it existed, could reduce the deleterious effects associated with the risk factors in these cases and thereby improve outcomes.
Infants with classic galactosemia (CG) are an ideal population to investigate whether proactive interventions during the first two years of life, long before traditional assessment and intervention are available, can significantly improve speech and language outcomes. CG is a recessively inherited inborn error of metabolism diagnosed via newborn screening, with incidence rates in the US ranging from 1/30,000 to 1/60,000. Worldwide, incidence rates are highest among Caucasians, especially individuals of Irish descent (Coss et al., 2013; Jumbo-Lucioni et al., 2012). Newborn diagnosis can be life-saving because of the deleterious effects of galactose buildup in the child's blood that can occur if dietary restrictions are not implemented immediately. Despite rigorous dietary management, however, children with this disease have a substantially higher risk for developmental difficulties, compared to the typically developing (TD) population. These difficulties include motor and learning disabilities (Antshel et al., 2004; Karadag et al., 2013; Potter et al., 2013) but also, importantly, severe speech and language disorders. Speech disorders were reported in 77% of children with CG (Hughes et al., 2009), compared to 3.8% among children generally (Shriberg et al., 1999), and language impairment in roughly 56% to 71% of children with CG (73% to 92% of children with CG who also had speech disorders) (Waggoner et al., 1990; Potter et al., 2008), compared to 7.4% among children generally (Tomblin et al., 1997). This elevated risk, coupled with the early identification, makes children with CG an ideal population in which to examine the efficacy of prospective intervention therapy. If proactive intervention is shown to be more effective than conventional management, this has the potential to change the management model from deficit-based to preventive services for these infants. 
It will also motivate similar studies in infants with other types of risk for communication disorders, for instance very low birth weight and 7q11.23 duplication syndrome. That is, children known to be at risk may benefit from early, prospective intervention, thus improving outcomes.
The Babble Boot Camp (BBC)© is a program of activities and routines designed for infants and toddlers during the pre-speech and very early speech and language stages. It contains components intended to shape dyadic interactions across modalities, stimulate earliest vocalizations (coo, babble), support emergence of first words and sentences, and foster vocabulary and syntax growth. The active phase of intervention covers ages 2 to 24 months, with plans for follow-up testing using a professional evaluation of speech, language, and cognitive abilities at ages 30, 42, and 54 months. This program is unique in that it leverages knowledge of genetic risks for speech and language difficulties that are identified at birth via a diagnosis of CG, begins with infants as young as age 2 months, and addresses signals of communication starting with eye contact and pre-babble vocalizations, then progresses through all other prelinguistic and linguistic stages until age 24 months.
Here, we report pilot results of the BBC. This Phase 0 exploratory study demonstrates, with clear clinical application, a viable proactive early intervention approach for minimizing speech and language disorders in a vulnerable population of infants with a known genetic risk for these disorders. The purpose of this pilot study was to demonstrate feasibility and to conduct initial comparisons between children with CG participating in the BBC treatment and the control child with CG regarding developmental areas of interest. The primary focus was speech and language development measured with standardized assessments, with secondary attention to cognitive and motor development.

Methods
This study was conducted with approval of the Institutional Review Boards at Arizona State University (IRB ID # STUDY00004969) and Washington State University (IRB ID # 13099). The design is a randomized parallel trial. The study began on January 31, 2017 and is ongoing. Parents learned about the study through online research announcements and referrals from physicians and other service providers, then contacted the research team. Once eligibility for participation was established and parents made the decision to participate, they gave written permission for their infants' participation and written consent for their own participation. The study is listed on ClinicalTrials.gov under NCT03838016. The Babble Boot Camp is copyrighted and listed at Arizona State University under Technology ID M19-186L.

Participants
The current participants are 25 children with CG and their parents. Here, we report on a subset of the children, the five oldest children with the longest participation record, for whom a nearly complete dataset is available through age 24 months. This pilot treatment cohort consists of four children, two girls (codes CG1, CG2) and two boys (CG4, CG5). Note that CG1 only participated in the study up to age 18 months, due to personal circumstances. One additional boy with CG was randomly selected to serve as a control who did not participate in the BBC treatment program. All families participated in the close monitoring components of the study, described further below. For purposes of comparison to typical children, archival data from test norms and publications, described below, were used. In the near future, the parallel design will be built out by recruiting more children with CG into the treatment and control cohorts and also by creating a control cohort consisting of children developing typically. During that next phase of the project, randomization and blinding will be used, the latter of which can only be applied to research team members who analyze the data.
Inclusionary and exclusionary criteria are identical for the treatment and control children with CG and are designed to evaluate the effects of the treatment while keeping all other factors the same. Inclusionary criteria are the following: Age at entry into the study is approximately 2 to 4 months. All infants are required to have a newborn diagnosis of CG. Boys and girls of any racial/ethnic background are equally eligible to participate, but given the highest CG prevalence rate among Caucasians, proportionally high numbers of Caucasians are expected. At least one parent or caregiver must enroll as a participant because we also collect quality of life information from the adults. The adults who enroll can be biological parents, adoptive parents, foster parents, or regular caregivers who are not related to the child as long as they provide care to the child on a regular basis. Primary language in the home must be English and at least one parent or caregiver must have at least an 8th grade education. Because the intervention is implemented via telepractice software, any family whose primary language is English can participate, regardless of country of residence. Exclusionary criteria are the following: Galactosemia types other than CG; medical or sensory diagnosis that could introduce confounding, e.g., Trisomy 21 or deafness. Note that one child with CG, enrolled into the BBC as an infant, met the original criteria but later developed chronic ear infections requiring surgical insertion of pressure equalizing tubes at age 17 months; at the same time, she also underwent a frenectomy to address a tongue tie. Due to concerns regarding her hearing acuity and restricted tongue movement prior to these surgeries, her data are excluded from this report. Additionally, as noted in the consent form, families would be removed from the study if they missed more than half of their weekly meetings with the SLP over a two-month period and did not respond to re-engagement efforts. 
This was not the case for any of the children described in this study.
All five children live at home with both biological parents and are cared for during the day by their mothers while their fathers are at work. For each of these five families, parental levels of education, an estimate of socio-economic status, include at least some college (for details, see Table 1 below). Consistent with the high CG prevalence rates among Caucasians worldwide, all five children are Caucasian, and all reside in the US. Although additional interventions beyond the BBC, where available, are allowed, none of the children described here were receiving any additional services.

Materials and procedures
The BBC is implemented via parent training by a speech-language pathologist (SLP) with expertise in early childhood development and earliest signals of communication. This SLP implements the program in all families in the treatment cohort. She uses a HIPAA-compliant telepractice computer interface to connect with the families. Parents learn about the typical milestones of prespeech, speech, and language development and potential red flags for delays, and, importantly, they are introduced to specific activities and routines that are designed to support typical development for all stages of the program and development beyond.
The program is built around 17 activities and routines designed to foster relevant communication skills for children ages 2 to 24 months. A description of these activities and routines is available at the Open Science Framework entry for the Babble Boot Camp (https://osf.io/yzht4/). Activities are components of the program that take place for at least 5 minutes per day, whereas routines are designed to become daily habits in the parent-child interactions. Examples of activities are stimulating and reinforcing babble by showing the child videos of babbling babies and enriching the child's linguistic environment with joint book reading (Storkel et al., 2017). Examples of routines are saying the names of objects that the child points to (Dimitrova et al., 2016), and expanding child utterances to provide slightly more complex model sentences (Hassink & Leonard, 2010). Activities and routines are modeled on typical developmental milestones for ages 2 to 24 months reported in the scientific literature (Chapman, 2000; Gordon-Brannan & Weiss, 2007; MacLeod, 2013; Paul & Norbury, 2012; Stoel-Gammon, 1983; Stoel-Gammon, 2011). Note that there is considerable variability regarding these milestones. An example of such variability is onset of canonical babble among typically developing children (Oller et al., 1999) as well as children with various types of challenges such as hearing impairment with early amplification (von Hapsburg & Davis, 2006). Therefore, while the developmental milestones are the same for all children, children in the BBC treatment cohort progress through these milestones not based on their chronological age but on their present levels of ability. Also, activities may be individualized based on a child's present levels; for instance, one child may need to learn to babble "baba" via shaping the [b] from a raspberry whereas another child might acquire that sound on her/his own or by imitating a model.
Because in young infants, communication skills do not develop in isolation but, rather, in the context of fine and gross motor, cognitive, social-emotional, and adaptive skills, the SLP discusses developmental milestones of the baby as a whole with the parents. An example is a play activity for an 11-month-old child where parent and child take turns putting balls into a bucket. Each time, the parent says, "Ball," and the child imitates, producing an approximation of the word. This activity builds social-cognitive skills (following directions, imitating, turn-taking), speech and language abilities, and fine and gross motor skills. Learning to greet by waving bye-bye is another example of growth involving social communication and gross motor skill. In the context of the BBC, the SLP makes observations and suggestions regarding those skills most relevant to the communication process, including balance/equilibrium when seated or standing in front of the parent, reaching for objects, and grasping and giving objects.
Following an orientation to the program, the SLP meets with each family once per week for approximately 15 minutes to train and consult on the relevant activities given the child's current skill status. The SLP obtains information to discuss in each meeting in three ways: (1) Parents send the SLP 1-3 home videos, each up to 2 minutes in length, and the SLP reviews these prior to the meeting; (2) During the meeting, parents report on progress with a given skill that was the focus during that week and on their amount and level of engagement with the child on the skill of interest; and (3) The SLP makes direct observations of the baby and the parent-child interactions during the meeting. In this way, the SLP's training model includes the following steps: (1) describing the activity/routine, (2) modeling it, (3) giving feedback as parents practice it during the meeting, (4) providing feedback on the home video, (5) evaluating it in the following week's meeting with the parents by discussing the child's present skills, and (6) discussing ways to build on these skills towards the next target with the parents. Thus, the SLP follows the teach-model-coach-review approach (Roberts et al., 2014). Each week, the SLP makes recommendations for no more than two activities/routines to implement that support the child's current skills and build toward the next developmental step. If parents have questions during the week, they may email the SLP and receive a written explanation. Altogether, the SLP spends 20 to 30 minutes per child per week reviewing the home video, meeting online with the child and parent(s), charting present levels and next goals, and billing.
One key principle underlying all activities is the zone of proximal development or scaffolding in which parents provide speech and language models that bridge what the child can already do and what is slightly beyond the child's skill set: the model is in the zone of skills that the child can do with help (Vygotsky, 1979). One key skill targeted throughout the program is imitation.
Fidelity of treatment delivery and implementation is addressed as follows: The same SLP implements the parent training in all families, ensuring consistency and fidelity of this component of the study. All treatment sessions are recorded, and the SLP takes notes on each component of the session (video review, current skills, next steps and recommendations, and discussion of parent questions). She checks to make sure all intervention components of the teach-model-coach-review strategy (Roberts et al., 2014) are implemented. The SLP provides direct feedback on parent implementation of the activities and routines that were agreed upon in the previous session based on the home videos and live demonstration during the online meetings; this ensures correct implementation of the activities and routines. The SLP further assesses parent fidelity in carrying out the activities and routines by asking the parents how frequently they engaged in the activities/routines during the preceding week. If parents implemented a task or routine with less frequency than agreed upon, the SLP reminds them of the importance of sticking to the plan of action and tells them to contact her right away if they encountered any difficulties. To date, no parent contacted the SLP saying they were unable to implement the activity or routine. Other fidelity measures of parental implementation of the program are percent SLP sessions attended, percent weekly home videos submitted, and percent monthly audio recordings provided. Table 1 summarizes demographic data and fidelity measures of parent follow-through.

Outcome measures
Because the BBC is designed to beneficially influence speech and language development, the primary outcome variables are speech sound production in babble and speech and expressive language ability. Secondary areas of interest are cognitive and motor development and quality of life, but these are only partially addressed in this pilot and feasibility report. During the active BBC phase, we monitor all of these areas as well as a range of other enrichment variables including volubility of child vocalizations, environmental influences, and demographic factors. Quantitative measures of most of these enrichment variables will be analyzed and described in future reports.
To assess speech sound development and language growth specifically, we use a combination of several standardized tests and established clinical procedures. In the present pilot report, we quantify speech and language growth using several metrics. First, we focus on the complexity of the speech sounds produced during babble and early speech. Once per month per child starting at age 6 months, we compute the Mean Babbling Level (MBL) (Stoel-Gammon, 1989), a clinical measure of speech sound complexity for babbling. To compute the MBL, a set of at least 50 utterances is compiled and transcribed into the International Phonetic Alphabet using broad transcription. Nonspeechlike vocalizations are excluded from the transcriptions. An expert rater assigns a score of 1, 2, or 3 to each utterance and computes the average; thus, MBL scores range from 1 to 3 for each child at any given time point. A score of 1 is assigned to simple utterances consisting of a vowel, a syllabic consonant, or a consonant-vowel (CV) or vowel-consonant (VC) sequence where the consonant is either a glide ("w", "y"), a glottal fricative ("h"), or a glottal stop, defined as a brief silence interrupting a vowel. Note that glottal fricatives, glottal stops, and glides are not considered to be true consonants. Examples of Level 1 utterances are "m" and "wawa". A score of 2 is assigned to utterances containing at least one CV or VC sequence with a true consonant; if there are two or more syllables, the consonants may be the same ones or differ only in whether they are voiced or not. Examples are "bapa" and "dida". A score of 3 is assigned to utterances containing at least two true consonants produced in different parts of the mouth and/or with different airflow characteristics. Examples are "gaba" and "adap". The three scores thus express the progression from motorically and linguistically simple to more complex skills. 
MBL scores in the BBC treatment and control cohort were compared to MBL scores in TD children at equivalent ages as reported in the literature (Morris, 2010). Whereas the MBL is applied to babble, which does not convey lexical meaning, an equivalent measure called Syllable Structure Level (SSL) exists for meaningful speech (Paul & Jennings, 1992), and as for the MBL, we used this measure in tandem with typical norms as reported in the literature (Morris, 2010).
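The three-level scoring rule can be illustrated with a minimal sketch. It is not the scoring procedure used in the study, where an expert rater scores IPA transcriptions; here, utterances are assumed to be simplified lowercase ASCII strings, and the consonant table, the function names `utterance_level` and `mean_babbling_level`, and the `min_utterances` parameter are hypothetical choices made for illustration only.

```python
from statistics import mean

VOWELS = set("aeiou")

# Hypothetical, deliberately small table of true consonants, described by
# place and manner of articulation. Voicing is omitted on purpose so that
# pairs such as "p"/"b" count as the same consonant class.
# Glides ("w", "y"), "h", and glottal stops are absent from the table
# because they are not treated as true consonants.
TRUE_CONSONANTS = {
    "p": ("bilabial", "stop"), "b": ("bilabial", "stop"),
    "t": ("alveolar", "stop"), "d": ("alveolar", "stop"),
    "k": ("velar", "stop"), "g": ("velar", "stop"),
    "m": ("bilabial", "nasal"), "n": ("alveolar", "nasal"),
    "f": ("labiodental", "fricative"), "v": ("labiodental", "fricative"),
    "s": ("alveolar", "fricative"), "z": ("alveolar", "fricative"),
}


def utterance_level(utterance: str) -> int:
    """Return a level of 1, 2, or 3 for one simplified transcription."""
    classes = {TRUE_CONSONANTS[ch] for ch in utterance if ch in TRUE_CONSONANTS}
    # Level 3: two or more true consonants differing in place and/or manner.
    if len(classes) >= 2:
        return 3
    # Level 2: at least one CV or VC sequence containing a true consonant.
    for i, ch in enumerate(utterance):
        if ch in TRUE_CONSONANTS:
            vowel_before = i > 0 and utterance[i - 1] in VOWELS
            vowel_after = i + 1 < len(utterance) and utterance[i + 1] in VOWELS
            if vowel_before or vowel_after:
                return 2
    # Level 1: vowels, syllabic consonants, or sequences with glides/glottals.
    return 1


def mean_babbling_level(utterances, min_utterances=50):
    """Average the utterance levels; the sample size floor follows the text."""
    if len(utterances) < min_utterances:
        raise ValueError("not enough utterances for a stable estimate")
    return mean(utterance_level(u) for u in utterances)
```

Applied to the examples in the text, "m" and "wawa" fall into Level 1, "bapa" and "dida" into Level 2, and "gaba" and "adap" into Level 3.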
The source materials for the MBL and SSL are daylong naturalistic audio recordings captured with a passive, wearable audio recorder (LENA Research Foundation, Boulder, CO). The recordings are obtained in the natural environment of the families and children. The recorder is returned to the research labs and processed offline to obtain the raw, daylong audio file, which provided the source for the present MBL and SSL scores. Of the other possible measures derived from the recordings, number of child vocalizations per hour is reported here as a potential metric of direct treatment effects and fidelity of parental program implementation, as this metric reflects a behavior that is targeted in treatment. Other metrics derived from the LENA recordings capturing variables known to be important for language and communication development in children will be reported in future publications reporting on larger numbers of participants. All measures collected in this way are objective and algorithm-driven. For purposes of calculating the MBL and SSL, we extracted multiple 5-minute audio segments with the highest occurrence of child utterances and broadly transcribed these utterances into the International Phonetic Alphabet. These segments were selected using a built-in LENA algorithm, independently of who else is in the acoustic recording environment, for instance one or both parents, siblings, or others. We acknowledge that this sample selection procedure differs from that described in the literature (Morris, 2010) in that we excerpted segments with the highest volubility of child utterances from daylong recordings, whereas other studies were based on more naturalistic recordings of 30 or 60 minutes' duration. It is possible that our MBL and SSL scores are higher than they would have been if a more naturalistic 30- or 60-minute sample had been used as the basis for the MBL and SSL scores. 
In future phases of the project, we will collect data from typically developing children using the same procedures as those described here to create a more equitable basis of comparison.
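The idea of choosing the 5-minute segments with the highest child volubility can be sketched as follows. This is not the built-in LENA algorithm, whose internals are not described here; it only assumes that a count of child vocalizations is already available for each consecutive 5-minute segment. The function names `top_segments` and `vocalizations_per_hour` and the tie-breaking rule (earlier segment wins) are hypothetical.

```python
def top_segments(counts, k):
    """Return the indices of the k segments with the most child vocalizations.

    counts: one vocalization count per consecutive 5-minute segment.
    Ties are broken in favor of the earlier segment (an assumption).
    The result is returned in chronological order.
    """
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:k])


def vocalizations_per_hour(total_vocalizations, recording_hours):
    """Hourly child vocalization rate over a whole recording."""
    return total_vocalizations / recording_hours
```

For instance, `top_segments([3, 40, 12, 40, 7], 2)` selects the second and fourth segments. Because such a selection favors the child's most voluble moments, scores derived from it may run higher than scores from an unselected 30- or 60-minute sample, which is the caveat raised above.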
Second, we used the MacArthur-Bates Communicative Development Inventories 2 (MBCDI-2) (Fenson et al., 2007) to capture early expressive and receptive vocabulary sizes as reported by the parents who completed the MBCDI-2 protocol forms. MBCDI-2 questionnaires were collected from each family at regular intervals (ages 12, 15, 18, 21, and 24 months), and computed percentile scores were compared between the treated and untreated participants. Percentiles were based on the publisher's norming sample, separately for boys and girls. The percentiles thus allowed us to compare the study participants to a representative sample of children of equivalent age and gender. Here, we focus on receptive vocabulary, available up to age 15 months, and expressive vocabulary, available for all reported ages.
Third, the Ages and Stages Questionnaires-3 (ASQ3) parent questionnaires (Squires & Bricker, 2009) capture communication abilities and personal-social development of young children. The evaluation of communication abilities differs from the MBCDI-2 expressive and receptive vocabulary measures in that it samples a broader range of communication abilities, as relevant for child age, such as comprehending phrases, using language to achieve a goal, and producing multi-word sentences. The personal-social development component queries skills in activities of daily living such as feeding and dressing oneself and social interactions, both verbal and nonverbal, such as giving/receiving objects, asking for help, and role-play, as relevant for age. The ASQ3 is considered valid and reliable for purposes of tracking typical development in these areas and three additional ones (gross motor, fine motor, and problem solving), where the problem solving component is an estimate of cognitive ability (Schonhaut et al., 2013). All five questionnaire topics are applicable to children with CG because of the known risks for deficits in each of these areas (Antshel et al., 2004; Karadag et al., 2013; Potter et al., 2013). Each component is scored by summing the raw scores, then assigning one of three categories, based on the provided norms: Above cutoff ("On schedule"), close to cutoff ("Provide learning activities and monitor"), or below cutoff ("Assess further"). The ASQ3 questionnaires are available in 21 separate, age-appropriate and age-normed sets from ages 2 to 60 months. We administer this tool in 6-month intervals and report here on ages 12 and 18 months.
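The sum-then-categorize scoring step can be sketched as below. The threshold values are supplied by the caller because the real cutoffs are specific to each age set and developmental area and come from the published norms; the function name `asq_category`, the parameter names, and the handling of scores that fall exactly on a threshold are assumptions made for illustration.

```python
def asq_category(item_scores, cutoff, monitoring_upper):
    """Sum the raw item scores for one area and assign a category.

    cutoff: area-specific threshold below which further assessment is
        advised (placeholder; take the real value from the norms).
    monitoring_upper: upper bound of the "close to cutoff" zone
        (placeholder as well).
    Boundary handling (strictly less than) is an assumption.
    """
    total = sum(item_scores)
    if total < cutoff:
        return "Assess further"
    if total < monitoring_upper:
        return "Provide learning activities and monitor"
    return "On schedule"
```

With made-up thresholds of 20 and 30, a summed score of 50 would be categorized as "On schedule", 25 as "Provide learning activities and monitor", and 15 as "Assess further".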

Data analysis
Results and trends are presented descriptively, supported with graphs. Because of the small sample size of this pilot study, statistical tests for group differences and other inferential statistical procedures cannot be reported here, but will be used in future reports of the study.

Quality control
Teams of at least two trained research assistants entered questionnaire data from the MBCDI-2 and ASQ3 into the database. Three trained research assistants transcribed the infant utterances and computed MBL and SSL scores. To obtain inter-rater reliability, 10% of the sound files were transcribed again by a different transcriber who also computed MBL and SSL values for the reliability transcription. For these re-transcriptions, samples were selected from all participants representing six different ages ranging from 6 to 22 months. Agreement between the first and second sets of scores was calculated as a percent score where the average difference between the first and second scores, expressed as an absolute value, was subtracted from 100%, as follows:
$$\%\ \text{Agreement} = 100 - \left[\frac{|\text{Score}_1 - \text{Score}_2|}{(\text{Score}_1 + \text{Score}_2)/2} \times 100\right]$$
The average percent agreement in the reliability transcriptions was 91.5%, which is in line with the reliability agreement of at least 91% reported in the original MBL study (Stoel-Gammon, 1989) and, hence, consistent with acceptable transcription accuracy. Whereas the participants in the Stoel-Gammon (1989) study were of similar age to those in the present study and one of the metrics was identical, the recordings in the Stoel-Gammon study were high-quality analog recordings while we used digital LENA recordings. To our knowledge, there are no comparable MBL studies using LENA recordings. The close similarity in transcriber agreements in the two studies speaks to the acceptable quality of the recordings. The high level of transcriber agreement likely reflects the fact that the transcriptions were broad, not narrow, and that the MBL and SSL scores are not based on segmental identity of phonemes but rather, on phoneme classes (Morris, 2010). As an additional reliability check, all original MBL and SSL scores were re-calculated by a different team member. 
Any computational errors detected by a score difference were corrected and the validated values were entered into the final database.
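The agreement formula above can be sketched in a few lines of Python. This is a minimal illustration, not the study's own scoring script: the function names `percent_agreement` and `average_percent_agreement` are hypothetical, and the handling of two identical scores is an assumption added so that the division is always defined.

```python
from statistics import mean


def percent_agreement(score_1: float, score_2: float) -> float:
    """Percent agreement between two scores of the same sample.

    Implements: 100 - [ |s1 - s2| / ((s1 + s2) / 2) * 100 ].
    """
    if score_1 == score_2:
        # Assumption: identical scores count as full agreement,
        # which also avoids dividing by zero when both are 0.
        return 100.0
    pair_mean = (score_1 + score_2) / 2
    if pair_mean <= 0:
        raise ValueError("scores must have a positive mean")
    return 100.0 - abs(score_1 - score_2) / pair_mean * 100.0


def average_percent_agreement(pairs):
    """Mean percent agreement over (first score, second score) pairs."""
    return mean(percent_agreement(a, b) for a, b in pairs)
```

With this definition, an average percent difference of 8.5% between paired scores corresponds to an average agreement of 91.5%.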

Results
Mean Babbling Level (MBL)
MBL scores of the children with CG in the treatment cohort consistently exceeded those of the control child with CG and typical control children without CG reported in the literature (Morris, 2010) (Figure 1). For the ages for which data were available for nearly all five children with CG (7 through 24 months), the average difference between the treatment cohort and the control child with CG was 0.5, indicating that the children with CG in the treatment cohort obtained higher MBL scores than the control child with CG. For the ages for which data were available for the control child with CG and the typical literature-based control children without CG (12, 15, 18, and 20 months, as listed in the literature; Morris, 2010), the average difference was 0, indicating that the control child with CG obtained MBL scores equivalent to the typically developing children without CG, but note the declining trend for the control child with CG at the most recent ages. As a reminder, the data from the typically developing children reported by Morris (2010) were based on longer, naturalistically obtained recordings, not on 5-minute segments with the highest vocalization rates as in our study. The difference between the children in the treatment cohort and the typical children was -0.34, indicating that the children in the treatment cohort outpaced the typical children on average. Figure 1 shows MBL scores for all children at all available ages.

Syllable Structure Level (SSL)
The SSL has similar scoring criteria as the MBL, but unlike the MBL, it is based on meaningful speech and required a minimum of 10 utterances. For one child in the treatment cohort, CG1, not enough meaningful utterances could be identified in any of the recording sessions up to 18 months, when the child left the study, to compute the SSL. Of the remaining children and for the available ages, the children in the treatment cohort with CG outperformed the control child with CG by 1.0. For ages 16, 20, and 23 months, the only ages for which data were available for the treatment cohort and the typical children without CG, the children in the treatment group outperformed the typical children by 0.6. For the available datapoints at 20 and 23 months, the control child with CG obtained slightly lower SSL scores than the typical children without CG, by 0.2 points. Note that, with only one exception, the highest scores were obtained by CG2 and CG5. Figure 2 summarizes the SSL scores, where missing datapoints for CG1 and CG4 indicate insufficient numbers of meaningful utterances to calculate the SSL scores.

Expressive and receptive vocabulary
Regarding expressive vocabulary size, the control child obtained one of the lowest rankings for the ages for which data were available (Figure 3). Two children in the treatment cohort, CG2 and CG5, obtained the highest percentile rankings, which, at age 21 months, were 73 and 72, respectively. Two children, CG1 and CG4, obtained low expressive vocabulary scores, with CG1's percentile score being 1.7 at 18 months and CG4's being 4.0 at 21 months. The control child, similar to CG1 and CG4 in the treatment cohort, showed declining expressive vocabulary scores, below the 8th percentile at 21 months.
Note that not having any words at all at age 12 months, which was the case for CG1 and the control child, corresponds to the 25th percentile for boys and the 20th percentile for girls, but the ranking drops rapidly with age if the expressive vocabulary does not increase substantially. Figure 3 summarizes the expressive vocabulary percentiles.
Regarding receptive vocabulary, the control child obtained the second-lowest percentile ranking (Figure 4), but note that all children including the control child show typical receptive vocabulary scores at the two ages represented in the questionnaires. The lowest percentile score, 10.5, is considered low average; it was obtained by CG1 at age 12 months. All other scores are solidly within normal limits. Figure 4 summarizes the receptive vocabulary percentiles.

Automatized metrics
All children except CG1 showed growth over time in their vocalization rates per hour. Figure 5 shows highest vocalization rates for CG2 and CG5 and low vocalization rates for CG4 and the control child with CG, especially towards age 24 months. For those ages for which data were available for at least two of the children in the treatment cohort and the control child with CG, the children in the treatment cohort averaged 46 vocalizations per hour more than the control child with CG.
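The inclusion rule behind the average difference in vocalization rates can be sketched as follows. This is a hedged illustration only: the function name `mean_rate_difference`, the data layout, and the choice to average the treated children within each age before subtracting the control value are assumptions, since the text does not spell out the order of averaging. No study data are reproduced here.

```python
from statistics import mean


def mean_rate_difference(treated, control, min_treated=2):
    """Average (treated mean - control) over ages that meet the rule.

    treated: dict mapping child ID -> {age in months: vocalizations/hour}
    control: dict mapping age in months -> vocalizations/hour
    An age is included only if the control child and at least
    `min_treated` treated children have data at that age.
    """
    differences = []
    for age, control_rate in control.items():
        rates = [by_age[age] for by_age in treated.values() if age in by_age]
        if len(rates) >= min_treated:
            # Assumption: average the treated children first, then subtract.
            differences.append(mean(rates) - control_rate)
    if not differences:
        raise ValueError("no age meets the inclusion rule")
    return mean(differences)
```

A positive return value means that, across the eligible ages, the treated children vocalized more per hour than the control child.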

Ages and Stages Questionnaires-3 (ASQ3)
The ASQ3 questionnaires at 12 and 18 months indicated that three of the four children in the treatment cohort had communication and personal-social abilities as expected for age.
The communication performance of one child in this cohort was below cutoff for age at 18 months, and the personal-social performance was close to the cutoff for age expectations.
The control child scored close to the cutoff for age expectation at 18 months for communication, and at 12 and 18 months for personal-social abilities.
Regarding problem solving performance, of the children in the treatment cohort, CG1 and CG4 had scores that were not consistently in the range of scores expected for age, both at age 18 months, whereas the control child with CG was close to the cutoff at both time points. Regarding fine and gross motor skills, CG1 was the only child in the treatment cohort with scores that were below (fine motor) or close to (gross motor) the cutoff, whereas the control child had scores close to the cutoff in both areas at one time point and, for gross motor, at both time points. Across all areas, mainly one child in the treatment cohort (CG1) and the control child did not show developmental skills on schedule. Table 2 summarizes the ASQ3 scores for all children in the five assessed areas.

Discussion
In a small sample of very young children with a known and highly predictive risk for speech and language disorders due to CG, we provide preliminary evidence that a program of proactive activities and routines, the BBC, may have beneficial effects in several important regards.
First, very early attention to vocalizations and speech sound production may increase children's ability to produce more complex speech sounds in babble and speech. The children with CG in the treatment cohort obtained greater MBL scores during babble than the control child with CG as well as typical control children without CG. A similar pattern was seen for the SSL during meaningful speech. Note that the MBL  and SSL scores in our study were calculated based on 5-minute segments with the highest volubility, whereas the typical MBL and SSL scores reported in the literature were based on longer samples regardless of volubility. Therefore, the higher scores in the treatment cohort, compared to typically developing children described in the literature, may well be an artifact of this difference in methods. The differences between these metrics in the treatment cohort, compared to the control child with CG, however, are meaningful because the same sample selection was applied to all of these participants. Together, the MBL and SSL findings suggest that children with CG may benefit from very early attention to vocal and speech sound production in terms of increased complexity in babble and speech. It is possible that MBL and SSL are adequate measures for areas of crucial weakness in children with CG in general, and that targeting speech sound skills may give them a clear boost that may have beneficial effects on later speech and language development. Our fully powered clinical trial, launched two years after the pilot phase began and currently in progress, is likely to provide conclusive evidence in this regard. This larger clinical trial includes three cohorts, one with children who have CG and receive both the BBC treatment and the close monitoring via questionnaires, monthly day-long audio recordings, and standardized testing, one with children who have CG and receive the close monitoring only, and one with typical controls who also only receive the close monitoring.
Second, the expressive vocabulary, and possibly to a lesser degree the receptive vocabulary, of children with CG may improve as a result of the BBC. Expressive vocabulary scores as measured with the MBCDI-2 were completely within normal limits for two children in the treatment group, CG2 and CG5, whereas the control child with CG and two children in the treatment cohort, CG1 and CG4, scored below expectation. Given that over half to two thirds of children with CG exhibit language impairment, low language skills in the control child would be expected. Finding two of four children in the treatment cohort with typical language skills may or may not show major beneficial effects of the BBC program; more extensive language measures at subsequent ages will provide a more complete picture. The fact that receptive vocabulary scores were within normal limits for all children including the control child with CG may indicate that CG affects expressive language skills more than receptive language skills.
Third, there may be gains in competence in using language and nonverbal means to interact with others. Three of the children in the treatment cohort had communication skills and personal-social skills as expected for age, as measured with the ASQ3. These children may be gaining age-appropriate competence in activities of daily living as well.
Comparing the results from all measures, the two children with the overall highest scores were CG2 and CG5. They showed no evidence of deficits in any of the evaluated areas; in fact, in the areas of speech sound complexity in meaningful speech and expressive vocabulary size, they scored at the top of the pilot sample. By contrast, CG1 did not produce enough meaningful speech for SSL scores, had low expressive vocabulary scores, and scored near or below the cutoff for expectation in all five developmental domains assessed with the ASQ3. CG4 had intermediate MBL and SSL scores but low expressive vocabulary scores and one low score in the ASQ3, namely in problem solving. The control child with CG obtained some of the lowest MBL, SSL, expressive vocabulary, and ASQ3 scores. These patterns are consistent with cross-domain associations among the evaluated skills (speech, language, social interaction, cognitive ability, motor ability). Whether these patterns reflect global levels of disease severity, global benefits from all of the components of the treatment, or a combination of these cannot be ascertained based on the present data.
Some evidence for the efficacy of the treatment is found in the child vocalization rates per hour. Generally, vocalization rates were higher in the treatment cohort, compared to the control child with CG, and they patterned with outcome measures of speech sound complexity and expressive vocabulary. This growth in child vocalization rates in the treatment cohort may be a result of parental fidelity in carrying out the activities and routines, and it may also represent a direct effect on child behaviors, as increases in child vocalization rates are targeted in the treatment activities and routines. Of the other measures that potentially reflect fidelity of parental follow-through, percent attendance of weekly SLP sessions seemed to predict outcomes best, although CG4 attended nearly all sessions but patterned with the control child across outcome measures. Whether an induced growth in vocalization rates is just one of many effects of the treatment, or whether it has a facilitatory contribution on more advanced speech and language outcomes remains to be explored once a larger dataset is available.

Limitations and future directions
The goals of this study were to obtain preliminary results and to document feasibility of the methods. Both of these goals were accomplished. Because of the small sample size in this pilot study, generalizations to other children with CG are not possible. Whereas the MBCDI-2 percentiles were based on sex-adjusted norms, the other measures were not, but note that the two children with the overall highest performance were a girl and a boy. It is not possible to identify which components of the BBC had the greatest impact, if any, on speech, language, cognitive, and motor development. Many other variables in the study, for instance quantity of child-directed speech, remain to be analyzed, not only in this pilot cohort but also in the full set of families in this study. Longer-term outcomes in the primary and secondary outcome variables will be evaluated as we follow the children until age 4 years and investigate a more complete spectrum of outcome variables including speech, language, cognitive and motor development, and quality of life. Most importantly, the trends toward beneficial effects of the BBC on the primary outcome variables of speech sound production and expressive language warrant appropriately powered larger clinical trials.
Currently, a fully powered clinical trial is in progress (R01 HD098253-02). For this larger study, we have created a process-based fidelity check for parents to ensure that they implement the activities and routines as intended, and we continue to use the process-based fidelity checks to ensure that the SLP regularly completes all components of the intervention. We are making efforts to minimize barriers to implementation fidelity by closely monitoring for factors that can diminish implementation fidelity (Cane et al., 2012; Justice et al., 2015). Results will be published following standardized reporting procedures using the CONSORT checklists and templates (Hoffmann et al., 2014; Ludemann et al., 2017). As the project expands in terms of research purview and additional SLPs are onboarded, and also as the Babble Boot Camp approaches widespread use among clinicians, a detailed instructional project manual and a videorecorded tutorial are planned, along with coaching and fidelity checks to safeguard that SLPs implement the program accurately and consistently.

Data availability
The project contains the following underlying data files:
- UnderlyingData_MBL_SSL_MBCDI2.xlsx. This spreadsheet contains the MBL, SSL, and MBCDI-2 data for the participants.
The underlying data also include audio files recorded in the participants' homes and hand-written questionnaires with identifiable information; therefore, these cannot be openly [...]

[...] have are the need for more information concerning (1) the treatment procedures executed by parents and the speech-language pathologist (SLP) and (2) the fidelity with which they apply those procedures during child-parent interaction and parent training, respectively. I also will address a less substantial concern that was raised in a previous review. My recommendations on these issues include some steps that I believe should be taken as part of an additional revision as well as others that should be incorporated in the larger RCT.
Treatment Procedure Description. More detailed information about treatment procedures is needed in part because of the complex way in which the intervention is delivered: indirectly, through parent training and ongoing coaching by the speech-language pathologist and directly, through interactions with the child by the parents. Currently, in their description of treatment procedures undertaken by parents, the authors provide explicit descriptions for only one activity and one routine, although 17 activities and routines comprise the bulk of the procedures they will enact. In addition, in the description of training, coaching and procedures undertaken by the SLP, it is hard to tell which procedures are considered key to achieving outcomes.
To address these issues in the current report, I recommend that the authors provide readers with access to the intervention protocol or manual specifying each activity and routine being taught to the parents and further specification of the procedures incorporated by the SLP as part of parent training and coaching. Additionally, for the larger RCT, I recommend that the authors consult and make use of not only the CONSORT checklist related to pilot and feasibility trials, but also TIDieR: Template for Intervention Description and Replication (Hoffmann et al., 2014). Recent concerns about the transparency and replicability of interventions along with subsequent developments in the use of reporting guidelines suggest the importance, for both research and applied purposes, of the thorough description of interventions (Ludemann et al., 2016), particularly those that are complex in nature.
Fidelity. With regard to future fidelity procedures, I join the first reviewer of this version of the manuscript in recommending the addition of a process-based approach to the measurement of fidelity for treatment procedures undertaken by parents. In addition, I would recommend a similar approach be adopted for addressing the fidelity of the SLP training and coaching activities. In the current version of the study report, readers are assured that "The same SLP implements the parent training in all families, ensuring consistency and fidelity of this component of the intervention" (p. 6). Since one might expect more SLPs to become involved in parent training as the number of participants increases over the course of the RCT, the need to evaluate fidelity is likely to become substantially more pressing.
Still, even were a single SLP able to provide training for all families, it would be valuable to verify that training methods do not vary substantially across children not because of attention to addressing each child's specific appropriate needs but because the clinician's attention, memory or other resources important to fidelity are taxed by factors that may be related to characteristics of the child, the parent, or other factors (Cane et al., 2012; Justice et al., 2015). Further, because a process-based approach will focus on possible key ingredients of the training, increased attention to this aspect of the methods may also promote the development of valuable supports to fidelity (e.g., checklists) with eventual benefits to Babble Boot Camp's widespread adoption and use with fidelity by non-research SLPs.
Response to earlier concerns regarding IPA transcriptions. A somewhat less substantive concern relates to the reporting of agreement data for transcriptions in this version of the study report. An earlier review raised the concern that IPA transcription from LENA recordings was not supported by the manufacturers of that device and could seriously undermine the reliability/agreement of the outcome measures based on them. In this version of the manuscript, additional information about agreement methods is provided and the average percentage difference in pairs of transcriptions is reported as 8.5%. This finding was described as "consistent with acceptable transcription accuracy" (p. 5). I'd like to suggest that a citation is needed to support that characterization, ideally one based on transcriptions performed under conditions similar to those reported here (i.e., with data obtained using LENA) as well as under more typical transcription conditions. Finally, I believe that it would be more appropriate to use the term "agreement" rather than either "reliability" or "accuracy" in discussion of this subject (Stolarova et al., 2014).
These comments are offered in hopes of supporting the ongoing success of this ambitious and important line of inquiry.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Developmental speech sound disorders, methods related to the description of behavioral interventions in developmental communication disorders.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 27 Jun 2020
Beate Peter, Arizona State University, Tempe, USA
Many thanks for this very thorough and helpful review. We greatly appreciate the suggestions for the present manuscript and also for the larger clinical trial regarding barriers to implementation fidelity and reporting standards. We have addressed the concerns as follows:
- Added an appendix that lists and describes all intervention components of the study. The descriptions of the activities and routines were excerpted from the program overview that parents receive at the start of the intervention.
- Contextualized our transcription reliability by converting the accuracy measure from differences to agreement and by showing that our agreement rates were nearly identical to those in the original MBL paper.
- Outlined the implementation and reporting procedures in place for the clinical trial that followed the pilot study described in the present manuscript.
- Described implementation and fidelity measures planned for future expansions of this project, not only in terms of additional research but, eventually, in terms of clinical practice.
We made a few additional minor changes, e.g., corrected some typographical errors that had previously escaped notice and updated some of the information.
No competing interests were disclosed.

My major concerns in my earlier reviews were first, that there were no measures of parent training fidelity and second, the status of the typically developing cohort was imprecise.
Relative to the issue of parent training, the authors' rationale for their present focus on measuring child variables was much stronger. However, because this large study is poised to be of great value to researchers and clinicians focused on the earliest stages of pre-linguistic development in high-risk populations, I would like to point out an issue that could make this study a stronger contributor to the parent training literature. In the Materials and Procedures section (final paragraph), the authors expand their description of "parent fidelity in carrying out routines" by detailing a number of legitimate frequency-based metrics of evaluation of parent behaviors (i.e., "how frequently they engaged in the activities/routines during the preceding week", ... "percent SLP sessions attended, percent weekly home videos submitted, and percent monthly audio recordings provided"). These process-based metrics are excellent, but I feel that the authors, in the main body of their study, could push further into the quantification of the parent-child interactions to collect content-based data on the actual interactions themselves between the parents and their children, not only the frequency metrics (noted above). These kinds of content-based quantifications could be invaluable to researchers and clinicians in this area. By content, I am indicating counting dyadically-based behaviors such as the following: increase in maturity of vocalizations following modeling by the parent; see Stoel-Gammon or Oller measures of vocalization maturity; parental use of parallel talk or expansions; dyadic balance of turn taking; dyadic balance in maintenance of eye contact. These content-based behaviors are intended as examples, not an exhaustive list, but all are based on available research into earliest periods of typical infant development of pre-linguistic vocal capacities. They enable the researcher to specify how the parents accomplished their dyadic interactions, not only the frequency of compliance with study design requirements.
In short, I am urging the research team to expand their quantification from frequency-based metrics into content-based metrics, potentially to identify more closely potential aspects of parent behaviors that might map onto successful or less successful outcomes in their larger child cohort followed in the larger study.
Relative to the issue of the typically developing child cohort, the authors did an adequate job of outlining caveats to their comparison cohort relative to the length (i.e. 5 minutes versus 30-60 minutes) and types of data (i.e. highest volubility samples versus functional use samples) selected for typical development comparison samples. I would like to see an ongoing attention to this issue in the larger study.
Overall, the revisions completed are adequate to index the pilot data for an ongoing larger research study. My present comments are intended to support use of this pilot data to consider expansions into parent-child dyadic interactions in this critical formative period.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Therefore, an appropriate reliability procedure would be to calculate the MBL or SSL for a selected percentage of children at a sample of ages based on two different transcriptions, to see if the same values would result from the two transcripts. The results of this procedure should be reported *in the article*.
2. The samples used in this study were "multiple 5-minute audio segments with the *highest occurrence* of child utterances" (p. 7). On the other hand, the samples reported in the literature to which the treatment children are compared were *not* selected to be samples during which the typical children were especially voluble. They were 30-minute (or sometimes 60-minute) samples that were used as is, regardless of the child's volubility at the time. Therefore, it is *not* appropriate for the authors to claim that the children in the present study performed better on these measures than the children reported in previous literature (e.g., on p. 7: "the children in the treatment cohort outpaced the typical children"). It is a comparison of the best moments of the treatment children versus typical moments of the children from the literature. We cannot know what the MBL and SSL levels of the typical children would have been if 5-minute selections of their most voluble moments of a day had been selected out for analysis.
There are still a few typos, etc. in the manuscript even though the authors did correct most of the ones that we (not Reviewer 2) pointed out before.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Infant, toddler, and child speech development and disorders, including prelinguistic vocalizations and early words, and motor speech development and disorders.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
Author Response 17 Mar 2020
Beate Peter, Arizona State University, Tempe, USA
We appreciate Reviewer 1's third review. In the previous revision, we had addressed Reviewer 2's concerns and addressed Reviewer 1's new concerns as follows:
Inter-rater reliability: To obtain inter-rater reliability, 10% of the sound files were transcribed by a different transcriber who also computed MBL and SSL values for the reliability transcription. Differences between the first and second scores were calculated as an absolute value and expressed as a percent score relative to the average of the first and second scores. The average difference amounted to 8.5%, which is acceptable. The second reliability procedure, a complete recalculation of all MBL and SSL scores by a different team member, was based on the original transcriptions. Any computational errors that were identified with a discrepancy between the first and second score were corrected and the validated values were entered into the final database.
MBL and SSL sample selection: Reviewer 1 pointed out that studies of typical children are based on naturalistic half-or whole-hour recordings whereas we selected 5-minute segments with the highest volubility from a day-long recording. This is an excellent point regarding our study design that ideally would have been made in the first round of review. We added explanatory text to the Methods and revised the interpretation of our results, which make a lot more sense now.
Writing style: We had implemented 11 of the 16 suggestions for improving the writing style. Of five cases flagged as run-on or long sentences, we have now broken down two into shorter components but in the remaining three, we preferred our own wording. In this revision, we made a few additional improvements of spelling and punctuation. Since no details regarding style were mentioned in this review, we trust that our writing style is acceptable now.
Competing Interests: No competing interests were disclosed.

thoroughly and objectively.
The primary remaining weaknesses still lie in some of the descriptions of the methods used. Of most concern is the assessment of parent fidelity. The authors have now disclosed that "The SLP assesses parent fidelity in carrying out the activities/routines that had been agreed upon in the previous session, by asking the parents how frequently they engaged in the activities/routines during the preceding week." (p. 5; italics added). However, assessment of the quality of the parent treatment does not appear to be measured. It is possible that this is done through the review of the "..up to three home videos up to 2 minutes in length" that the parents submit (weekly?). However, the content of these videos is not currently described in the article. It is also important to know how many videos, of what lengths, were actually received for the participants described in this report and what they revealed about these specific parents' fidelity to the content/quality of the treatment protocol.
Another aspect of the methods that should be clarified relates to the MBL and SSL calculations. The authors report that 15% of the infants' "scores" were "double-scored" by other team members (p. 7) with differences over 10% resolved by consensus (if needed). However, it's not clear what this means. Were the transcripts re-transcribed, recoded for MBL and SSL, and the scores (means) then recalculated? Or was the "double-scoring" only recoding of each vocalization with respect to MBL or SSL and then recalculating the means of each? If only the calculation of the means was redone (no re-transcribing or recoding), that is not sufficient. At least the MBL and SSL coding should be redone for reliability's sake.
The MBL and SSL comparisons to TD children should also be explained in more detail. For example, how many TD children were in the comparison samples? What were the characteristics of those children? In addition, the MBL and SSL scores for the participants in this study were based on "multiple 5-minute audio segments " (p. 6; italics added). Was this selection with the highest occurrence of child utterances process of segments with many utterances also used in the TD comparison samples? If not, this difference could at least partially account for the apparently superior performance of some of the children with CG in comparison to the TD samples. In addition, it is stated earlier on the same page that "a set of at least 50 utterances is compiled" for each participant. Do the authors really mean 50 utterances or do they mean 50 vocalizations? It would be helpful to specify how many minutes were required for each of the participants reported in this study to amass 50 utterances and how many vocalizations they produced per utterance.
We find it mysterious that [d] could be shaped from a raspberry (page 5, column 1, 2nd paragraph in Materials and Procedures, last sentence). Please correct or clarify.
The figures are much easier to interpret now, but we wonder if boxplots (comparing treated CG participants vs. control participant vs. TDs) might be more revelatory (either instead of the current figures or as well as the current figures), since they would show the distributions of scores better.
Other questions also remain to be answered. For example, it would be helpful if the review of the literature included information about the ages at which deficits in speech, language, fine motor, and gross motor skills typically appear in children with CG. Are the children in this study younger than the ages of emergence of those deficits or are the problems typically identified very early on?
We have reviewed the article in detail and have quite a few specific suggestions for improving the writing style and clarity. These are listed below. Many of these relate to long, wordy, complex sentences that would be much more reader-friendly if they were shortened, re-organized, or broken into multiple sentences. Only the most egregious cases are listed here; the authors are encouraged to read through the ms. to identify other sentences that could be clarified in these ways. Only the first several words of the sentence are provided in the list below to indicate sentences that should be simplified.
Pg. 9, column 2, 2nd paragraph of Discussion, line 1: Change "speech sound production" to "vocal production" since this includes pre-babble?
Is the work clearly and accurately presented and does it cite the current literature?

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Infant, toddler, and child speech development and disorders, including prelinguistic vocalizations and early words, and motor speech development and disorders.
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.

Introduction
The Introduction section is much improved and the focus on child outcomes is made clear. The rationale for the study is also well done in this section of the manuscript. However, although the authors review literature on parent training with this age group, the pilot study's Methods and/or Results sections present no data to evaluate fidelity of treatment by the parents, a critical issue in evaluating child outcomes in a parent-training-based study. This is a structural flaw in the pilot study as currently presented for publication, and it precludes valid evaluation of the child outcome variables. As such, it seems premature to publish this early result of child behaviors with no in-depth treatment of the parent training variable.

Methods/Results
The four participants in the intervention study and the one control child are well described on relevant variables for inclusion and exclusion in the study. However, there is no description at all for the typical developmental comparison cohort. The only mentions I find are as follows: on page 7, paragraph 2 in the Results section under MBL ("… typical control children without CG (12, 15, 16, 18, and 20 months, listed in Morris, 2010") and under SSL, paragraph 1 ("For ages 15 and 19 months, the only ages for which data were available for the treatment cohort and the typical children without CG."). As a result, we have no comparable inclusionary and exclusionary information presented for the typically developing child group that is one of the three main comparison groups in the authors' descriptive results graphs. As noted previously, there are no descriptions of data on parent training fidelity, which could potentially have a great impact on the child outcomes the authors report here. An unknown amount of the variability in results of this pilot study could be related to parent input issues related to the fidelity of delivery of intervention dimensions. Again, reporting the pilot results on child outcome variables seems premature without this important dimension of the overall study of intervention delivered via parent interactions with the child. The Limitations and future directions section gives a good road map for issues I would like to see available to a reader in this pilot study.
Overall, this is a needed type of research in our field and has a fairly well developed basis in studies of parent training regimes that are available for comparison. However, no review of that dimension is present in this study. In addition, no description of the typically developing cohort used for comparison is available. I have given more detail in the original review related to these issues. I feel like the pilot study report would be better served as a picture of the larger study if all the dimensions of importance are equally well represented for the reader.
Status: Major issues are missing to give the reader a full picture of the study contemplated and to fully evaluate the data results presented in this pilot study.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Early periods of child speech acquisition in typically developing and at-risk

This manuscript focuses on an important and relevant issue: determining whether proactive intervention strategies are beneficial for infants at risk for later speech or language deficits associated with a specific condition (Classic Galactosemia, henceforth "CG"). This is an ambitious longitudinal project. In this manuscript, the authors report on preliminary findings and directions for research.

Is the work clearly and accurately presented and does it cite the current literature?
The primary limitation of the work in its current form is a lack of specificity. First, the literature review fails to give the reader a full sense of the related research or of the related clinical services that have gone before. For example, the authors state that, "very early speech and language services are not yet available" (p. 3). They add that, "The question is whether proactive, preventive treatment, if it existed, could reduce the deleterious effects associated with the disease [that results in risk for speech-language delay] …and thereby improve outcomes" (p. 3). These statements seriously downplay decades of research and of speech-language services in early intervention. As long ago as 1983, Tomasello and Todd showed that levels of parental joint attention impacted their infants' vocabularies between 12-18 months. Roberts and Kaiser's systematic review documents the research demonstrating that training parents to interact with their children in certain ways can impact linguistic outcomes. For example, more recently, Roberts et al. trained caregivers to use enhanced milieu teaching using a "teach-model-coach-review" approach. This included not only telling and showing the caregivers what to do, but also coaching them as they tried out the strategies and reviewing their success afterwards. Treatment fidelity on the part of the trainers and on the part of the caregivers were both measured. These authors demonstrated that the frequency and accuracy with which the caregivers carried out the strategies that they were taught had an impact on the children's communication progress. These are but a few of many studies documenting the value of SLPs training parents to interact with their infants and toddlers in a manner that will facilitate communication development, with the result that SLPs involved in early intervention typically do train parents to use these strategies.
For example, ASHA's practice portal document on early intervention states that, "Services that include opportunities for families and caregivers to directly participate in intervention are essential to strengthen existing knowledge and skills and to promote the development of new abilities that enhance child and family outcomes" and that, "SLPs often coach families, caregivers, and other team members in how to implement functional, language-enhancing strategies during daily activities (Brown, 2016; Brown & Woods, 2015, 2016; Roberts & Kaiser, 2011; Roberts, Kaiser, Wolfe, Bryant, & Spidalieri, 2014; Searcy, 2011; Shelden & Rush, 2010)." Therefore, the authors need to specify how their study is similar to and different from what's been done before. Unique and important characteristics of their study might include the focus on stimulating prelinguistic vocalizations specifically (versus communication/cognition more generally), beginning at such a young age, and intervention for children with this specific syndrome. However, the very brief descriptions that they provide of their caregiver trainings make it impossible to judge exactly what they are training the parents to do, how they are training them to do it, and whether or not the treatment fidelity of the trainers or of the parents is actually measured. In this sense, the work is not clearly presented. See below for further discussion of the current lack of specificity in these areas.

Is the study design appropriate and is the work technically sound?
The authors are gathering specific data systematically from several children, selected according to specific criteria, including two control participants who are not receiving treatment (one of whom has CG and one of whom does not), across time. However, the authors need to explicitly state the study's design at the beginning of the Methods section of this manuscript. Further, the work, its context within its content area, and the methods used are not presented specifically enough to judge the soundness of the approach or the appropriateness of the conclusions that are drawn from the results.

Are sufficient details of methods and analysis provided to allow replication by others?
The authors do not provide adequate details about the participants and methods used in this study, specifically:

Inclusion and exclusion criteria should be organized separately on page 4 for clarity.

In addition to participant criteria, authors should report participant characteristics (i.e., hearing/vision status, whether they are receiving other interventions concurrently, family composition/status, whether participants stay at home or are enrolled in daycare, etc.). This information is important to include for later analyses, as these variables may be confounders, mediators and moderators.

In addition to participant information, characteristics need to be collected and provided about legal guardian(s), specifically (a) socioeconomic status, (b) level of education, and (c) estimated amount of time spent with participants (an important consideration for intervention implementation/feasibility).

The authors mention on page 4 that enrichment variables are monitored, but preliminary findings of how these variables may be influencing outcomes are important to report, even qualitatively. Again, this information is important to include for later analyses, as these variables may be confounders, mediators and moderators.

Frequency of parent training and consultation is reported as "once per week for approximately 10 minutes" (p. 4). There is no information on the fidelity of parent implementation of BBC strategies, nor is there information on what types of training are provided. For example, were parents just given information or did trainers observe them using strategies and offer feedback? Roberts et al. recommend a four-step caregiver training process; Akamoglu and Dinnebeil recommend six steps. Were recommendations such as these followed? Were the parents' goals for their children taken into account?
Brown reports that, "Intervention studies using...a transactional [as opposed to uni-directional] coaching approach have shown to have positive effects on parent and child outcomes while aligning with the intent of family capacity-building (Brown & Woods, 2015, 2016; Roberts et al., 2014; Wetherby et al., 2014)" (p. 144). If fidelity measures were performed, how was fidelity measured and what are the results? This is important for replication purposes and also for judging factors that may impact the results of the study. Roberts et al. report that, despite the intensive attention that was paid in their study to telling, showing, coaching, and reviewing the intervention strategies that they taught to parents, "generalization of strategy use to the home varied by caregiver and by strategy" (p. 1864) and also by activity type (play, book-reading, etc.). The fact that only approximately 10 minutes of training was provided weekly is problematic for clinical translation purposes, as this amount and frequency of SLP intervention is not billable or practical. The authors report that 50 utterances are compiled and transcribed using the International Phonetic Alphabet; however, no information is provided about how this information is acquired or under what circumstances. What level of IPA transcription is used: broad or narrow? Is IPA used

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.