Keywords
Prosody development, Fundamental Frequency, Multi-level analysis, Babbling, Infants
The acquisition and development of prosodic aspects of vocal intonation are of special interest within the larger context of language acquisition.
The present study explored the developmental trajectories of infant prosodic abilities from 4 to 16 months of life through intensive, closely spaced assessments. Several aspects were considered: an acoustic analysis of infant vocal productions with dedicated software, the analysis of all the prosodic variables associated with the fundamental frequency (F0 mean, F0 range, and F0 final contours), the individual variability, and the complexity of the infants’ vocal productions.
The multi-level analysis revealed specific prosodic developmental trajectories that differ across the different kinds of vocal productions from the first months of life.
The findings suggest that in the second half of the first year of life infants show an intonational repertoire that may help manage interactions with their caregiver and that individual variability has to be taken into consideration when assessing infants’ prosody.
Infants’ early vocal productions form the basis for language development and are important social signals that stimulate caregiver response, which in turn facilitates the development of phonology and speech (Goldstein et al., 2009; Spinelli et al., 2017). Research on pre-linguistic vocalizations of typically developing infants has focused mostly on the segmental and syllabic aspects of spontaneous vocal productions. Fewer scholars have focused on supra-segmental aspects of infant vocalizations, including variables that measure prosodic aspects of vocal intonation such as those related to pitch, usually measured through the fundamental frequency (F0). These prosodic features of language convey grammatical and pragmatic meaning as well as emotional or affective intent, and highlight particular elements of the utterance (Crystal, 1978; Gleitman et al., 1988; Wells et al., 2004). The development of prosody is of special interest within the larger context of language acquisition, because it is among the earliest aspects of speech to be acquired (Esteve-Gibert & Prieto, 2018) and is strictly linked to the maturity of the vocal tract (Kahane & Kahn, 1984). It is well known that infants are sensitive to prosodic information from a very early age due to their prenatal exposure to speech (Gervain, 2015) and are therefore able to understand the prosodic features of parental speech from the first months of life (De Carvalho et al., 2019; Fisher & Tokura, 1996; Soderstrom et al., 2008) and to imitate the prosody of parental speech, for instance, by adapting their F0 features to match those of the parents (Gratier & Devouche, 2011; Ko et al., 2016; McRoberts & Best, 1997; Papoušek & Papoušek, 1989).
At birth, the infant’s vocal tract is not completely developed. Consequently, infant vocal expressions have different sounds and different melodies than adult ones (Astruc et al., 2013). Some authors reported a much narrower pitch range in children than in adults, and that children acquire falling tones first and final rising tones much later (Snow, 2004; Wells et al., 2004). The differences between infants and adults observed in the first stages of phonetic development have been linked to difficulties in controlling the vocal tract. Specifically, Lieberman (1967) assumed that infants do not precisely control the tension of their laryngeal muscles once phonation starts. The child merely maintains the tension of the laryngeal muscles at or near the level they had when phonation started, so that the fundamental frequency rises gradually from the start of phonation to either a level or a slightly falling “plateau”. For this reason, the majority of breath-groups terminate with a falling fundamental frequency contour. However, cross-linguistic differences have been reported, with the presence, for example, of rising contours in infancy in languages other than English (Esteve-Gibert & Prieto, 2013; Prieto et al., 2012; Whalen et al., 1991).
To our knowledge, only limited data are available on the prosodic characteristics of infants’ first vocal productions. The first of these studies was conducted by D’Odorico (1984), who acoustically analysed the cry and non-cry vocalizations produced by four Italian infants. The results of this study showed that cry vocalizations produced in different contexts are acoustically different and that there are similarities in the acoustic properties of cry and non-cry vocalizations produced in the same context. These results were later confirmed by D’Odorico and Franco (1991), who investigated the suprasegmental features of the vocalizations produced during social interaction by five Italian children from 4 to 11 months of age. Indeed, different patterns of non-segmental features were found in sounds produced in different contexts.
Considering the scarce data on Italian and the importance of cross-linguistic comparison in language development, the main aim of this study is to explore how Italian infants develop intonational abilities in the first 16 months of life, focusing on a detailed analysis of the infants’ pitch, in terms of F0 mean, F0 range and F0 final contours. Moreover, unlike other studies on this topic, which compare infants’ prosodic development to a mature adult model, the present study aims to give a broad picture of the longitudinal development of the intonational repertoire of children’s spontaneous productions, from 4 to 16 months of age.
A summary of the main studies available on intonation development during the first two years of life is reported in Table 1, available in the Open Science Framework repository as extended data; doi: https://doi.org/10.17605/OSF.IO/738W2. The main focus is on the examination of F0-related aspects (F0 mean, F0 range and F0 final contours) of non-distress spontaneous productions, excluding studies that examined distress productions such as cry and whining (Mampe et al., 2009; Rothgänger, 2003; Zanchi et al., 2016b).
Among the F0-related prosodic variables, the most widely studied is the fundamental frequency mean value (F0 mean), which represents the rate of vibration of the vocal cords within the larynx and reflects pitch variations of the voice. One of the first reviews of the developmental changes in the F0 mean from birth to adulthood, by Kent (1976), showed that F0 mean is higher at birth and gradually decreases until it reaches adult levels. However, the studies included in the review were very few. Some of the studies reported in Table 1 (refer to extended data in the Open Science Framework repository) showed the same descending pattern from birth to the second year of life for F0 mean (Amano et al., 2006; Robb & Saxman, 1985; Rothgänger, 2003). Flax, Lahey, Harris, and Boothroyd (1991) found that 2 out of 3 children showed a reduction in F0 mean between the onset of words and the 50-word period. This decrease is explained as reflecting the gradual maturation of the vocal tract and the development of the infant’s ability to control the voice.
Nonetheless, other scholars failed to find significant changes in F0 mean over time (Iyer & Oller, 2008; Robb et al., 1989). Laufer and Horii (1977) measured infants’ productions every two weeks during the first six months of life and found that F0 mean values slightly fluctuated. Amano et al. (2006) argued that the main reason why some studies did not find a decrease in the F0 mean values was that speech samples were collected over periods that were too short to catch this effect. Lastly, there is only one author to our knowledge who found the opposite pattern. Fairbanks (1942) reported an increase during the first five months of infant life, followed by stabilization.
Even more contrasting findings have been presented about developmental trajectories of F0 range, namely the difference between the maximum and the minimum pitches produced within the utterance. F0 range represents the ability of infants to vary the intonation and to make the production more communicative and attractive for the listener (Amano et al., 2006). Amano et al. (2006) found an increase of F0 range after the onset of two-word utterances and hypothesized that, as infants grow up, they acquire the ability to vary the fundamental frequency of the voice within a production as their communicative abilities increase. Similarly, Snow (2004) found that 4-year-old children showed a wider F0 range than 1-year-old infants. Nonetheless, Snow and Ertmer (2012) reported that, in their sample, 10 out of 12 typically developing infants showed a decrease in F0 range between 3 and 9 months. In a first study, Robb and Saxman (1985) found a decrease of the between-utterances F0 range between 11 and 25 months, while, in a second study (Robb et al., 1989), they failed to find changes over time. Furthermore, Laufer and Horii (1977) found a slight decrease in within-utterance F0 range during the first four months of life and a minor increase after the 4th month. To sum up, the developmental trajectory of the within-production F0 range is unclear.
Another prosodic variable that plays an important role in prosodic development is the F0 contour, intended as the shape of pitch (F0) variations within the utterance, which conveys a specific melody to the vocal production. F0 contour may increase from the beginning to the end of the production (rising F0 contour), decrease (falling F0 contour), or not significantly vary (level or flat F0 contour). The direction of these contours, especially of the F0 final contours, is considered fundamental in providing the pragmatic meaning of the vocal production. In most languages, falling final contours are typical of statements and labelling, while rising final contours are especially used with interrogative utterances (for Italian see for example Sorianello, 2021). Many scholars agree that infants start to use F0 final contours early to express intentions. For example, Prieto et al. (2012), Prieto & Vanrell (2007) and Esteve-Gibert and Prieto (2013) showed evidence that from the first year of life Catalan and Spanish infants, similarly to adults, are able to vary F0 final contours in order to signal pragmatic meanings, even before they can produce words. Several studies, mostly conducted in English-speaking countries, agree that the majority of the productions of infants have falling contours (Fox, 1990), while rising contours are rare (see for example Kent & Murray, 1982, and Robb et al., 1989) and more frequent in adults (Cruttenden, 1997) and preschoolers (Snow, 2004). As stated above, according to Lieberman (1967), the falling contour is considered more natural and simpler than the rising contour without implying the infant’s intentionality. According to Lieberman’s hypothesis, rising patterns, by being contingent on language experience rather than physiological constraints, stabilize at a later stage of language acquisition. More recently, Snow (2006) found that the development of falling and rising patterns is not linear and follows a U-shaped trajectory.
Both falling and rising contours are frequent and well expressed until nine months, when their quality decreases. After a period of stabilization, both these contours reach the same quality observed before the regression period at around 18 months. The author explained these results by pointing out that intonation is controlled by physiology in the earliest stage (before nine months of age), but later the tones come under linguistic control. So, the regression could be due to a linguistic reorganization of speech, and the U shape shows a shift of intonation from a pre-intentional to an intentional stage. Despite these two main theoretical explanations, empirical studies that examined the frequency of F0 final contours over time found inconsistent results. Many of them (see, for example, Flax et al., 1991 and Murry et al., 1983) failed to find variations in the percentages of rising and falling contours over time. Fox (1990) found, consistently between the 3rd and the 9th month, a prevalence of 82% of falling final contours. On the contrary, Robb et al. (1989) found that the less frequently occurring F0 final contours were falling-rising and rising contours (comprising 6% of all the vocalizations), and this percentage was constant, independently of lexicon size, throughout the first two years.
To sum up, the studies on the development of F0 features over the first years of life failed to give a clear and homogeneous picture, mainly because comparing these studies is difficult. First, the languages studied differ, which could lead to different intonation features and developmental trajectories. Second, there are wide differences in the age ranges considered, in the number of longitudinal sessions, and in many other methodological aspects: for example, the types of infants’ vocal production included in the analyses (whether cry-like or squeal-like vocalizations were included or only syllabic-like vocalizations), the tools used for the analyses of speech spectrograms (older studies used visual inspection of the spectrograms to measure F0 values and are therefore less reliable than modern analyses run with ad hoc programs), and the study design (with longitudinal and cross-sectional analyses leading to different findings due to the variability among the participants).
Moreover, an important issue raised by some of the reported studies is the presence of variability among infants, which affects both F0 values and age-related changes that occur in infants’ voices. Laufer and Horii (1977) described the F0 mean fluctuating from month to month with each infant presenting a specific pattern of change. Flax et al. (1991) found that three infants out of three showed different F0 range change patterns, with one child not varying at all in this period. Kent and Murray (1982) found a prevalence of F0 falling contours across the whole age range considered (3, 6, and 9 months), but pointed out high intra- and inter-individual variability in the production of rising contours at one point in time as well as over time. Other authors agreed that differences among infants are present (Amano et al., 2006), but the restricted number of participants or number of sessions made it difficult to check this hypothesis. This has implications for the design of the study, since a greater number of participants needs to be studied, and for the analyses, since between-participant variance has to be considered.
Moreover, the reported studies mainly considered all the productions or only one specific type of production, without making comparisons among them (see Table 1 in the extended data in the Open Science Framework repository). Robb et al. (1989) recorded on 12 successive occasions the utterances of seven infants in the 8–26-month age period, investigating the F0 mean of monosyllabic and bi-syllabic utterances. They found similar F0 mean values between the two productions but a tendency for monosyllables to have a greater F0 range than bi-syllables for all the participants except one child. Moreover, this tendency remained stable across the first two years of life. On the contrary, Snow (2004) did not find an effect of the number of syllables of the production on F0 range, thus showing that the range is independent of the length of the production. But these studies concerned the number of syllables of utterances, not developmentally different productions. Rothgänger (2003) explored the prosodic development of babbling comparing it with the development of cry instead of other non-distress productions, so information about possible differences among the productions is lacking. Other studies confirmed the necessity to consider the different productions separately, showing differences in the developmental trajectories of the prosodic features of vocalizations and syllabic utterances (Hsu et al., 2000) and different multiword combinations (Behrens & Gut, 2005). These results are vague and confirm that a more detailed description of the phenomena is needed.
Some authors have hypothesized the existence of linguistic trade-offs during development, so that increased demands in one component of language, such as syntax, may cause decreased performance in a second component, such as phonology (Crystal, 1978). Furthermore, the interrelationships among different components of language may vary depending on how recently a specific linguistic structure has been learned (Crystal, 1978; Masterson & Kamhi, 1992; Zanchi et al., 2016a). We believe that at early ages this effect may also manifest in relation to the different pre-lexical productions produced by infants. Therefore, we hypothesize that the pre-lexical productions acquired earlier, and consequently more practiced at the oral-motor level (Oller et al., 1976; Stoel-Gammon, 2011), may have prosodic features different from the later acquired productions, in which the infant is not yet fluent.
To assess this hypothesis, the main aim of the present study is to explore the development of F0 related prosodic features of each type of pre-lexical productions observed. To our knowledge, this is the main factor not dealt with so far. As reported above, only differences among the prosodic development of cry and other non-distress productions (Murry et al., 1983) or among productions with a different number of syllables (Robb et al., 1989) have been analyzed.
The second aim of the present study is to explore individual variability in prosodic features of speech; therefore, all the analyses were run using multilevel models with the children as the second level.
We expected to find:
Fifteen infants (3 females) participated in the study. The sample was not gender-balanced. However, previous studies showed that gender differences in the prosodic aspects of language are present only from late puberty (Bennett, 1983; Fox, 1990; Lee et al., 1999).
All infants were healthy and born at full term. Families were monolingual Italian-speaking, and mothers’ mean age was about 35 years (range: 28 – 42). Sixty percent of them had completed high school, and 40% had a university degree.
Mothers were contacted after infant birth, and the first meeting was arranged at the beginning of the 4th month (M age = 4 months, 2 days; SD = 0:05). This age was chosen because around 4-6 months significant changes in the anatomical-physiological structure of the infant’s vocal tract occur, strongly increasing the control of speech articulation with enhanced production of speech-like sounds (Kent, 1976).
Infants were followed from the 4th to the 16th month of age. From the 4th to the 14th month a researcher visited the infant and the mother at home every 15 days (twice a month); after the 14th month, visits were monthly. See Table 2 for a summary of the sessions recorded for each participant. Some infants missed part of the sessions, and not all infants were followed up to the 16th month, since some mothers interrupted their participation in the study for personal reasons. Mother and infant were audio-video-recorded during free-play face-to-face interaction without toys. The mothers were asked to play as they normally did. In total, 295 sessions (M per subject = 20, SD = 5) of about 10 minutes each (M = 10.06 minutes, SD = 1.99) were collected.
The audio of all the recorded sessions was extracted using the program Audacity (Audacity Team; available for download at https://www.audacityteam.org/), and all the audible pre-lexical infant productions that did not overlap with other sounds and were noise-free were coded. In line with previous studies, vegetative sounds (such as wheezes, sneezes, coughs, hiccups, and clicking sounds), stress vocalizations (such as whimpering, fusses, and cries), laughs, words and onomatopoeias were not considered (Fasolo et al., 2010).
Infants’ vocal productions (Stark et al., 1993) were coded as:
• Communicative grunt (g): vocalization constituted by a consonant-like sound (e.g., [m]);
• Vocalic sound (v): vocalization constituted by vowel-like sounds (e.g., [a]);
• Simple babbling (cv): vocalization containing at least one full vowel-like element and one consonant-like element with rapid transition between consonant and vowel (e.g., [ba]);
• Reduplicated babbling (cvcv): vocalization containing rapid repetition of the same sequence of one full vowel-like element and one consonant-like element (e.g., [baba], [tata]);
• Variegated babbling (c1vc2v or cv1cv2): vocalization containing rapid repetition of different sequences; vocalizations comprising at least one full vowel-like element and at least two different consonant-like elements (e.g., [bata]), or two different full vowel-like elements and one consonant-like element (e.g., [beba]).
In total 9737 productions were coded (M per session = 33, SD = 23.38; M for infant = 645, SD = 329.40).
In all the tables and figures of the present paper the pre-linguistic productions will be indicated as follows: Grunt = Communicative grunts, Voc = Vocalizations, Babb1 = Simple babbling, Babb2 = Reduplicated babbling, Babb3 = Variegated babbling. Table 2 summarizes the total productions for each subject at each age.
The PRAAT speech analysis software package (Paul Boersma and David Weenink, Institute of Phonetic Sciences, University of Amsterdam, The Netherlands; Boersma & Weenink, 2005) was used to obtain the prosodic characteristics of each vocal production using the visual inspection of the sound wave represented in the spectrogram to identify the beginning and the ending of the production (D’Odorico et al., 2009). The following measures were calculated on every single production:
- Fundamental frequency mean (F0 mean): calculated automatically in Hz by the PRAAT program.
- Maximum and minimum pitch: the highest and lowest F0 values in the vocal production (Cruttenden, 1997), calculated in Hz.
- Fundamental frequency range (F0 range): the span of F0 changes over the entire pre-lexical production (in semitones). According to the definition of Snow and Balog (2002), it was calculated as the logarithmic difference between the highest and the lowest F0 values in a production, measured in semitones: [12/log(2)]*[log(maximum F0) - log(minimum F0)].
- F0 final contour: the last movement of the production intonation profile. Each change of F0 values within the production was classified as having either a rising (F0 final rising contour) or falling (F0 final falling contour) contour if the pitch changed (difference between the minimum and the maximum F0 value) by at least two semitones. If the F0 range of the whole production was less than two semitones, the contour was classified as an F0 level contour.
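The two semitone-based measures above can be sketched in a few lines. This is a minimal illustration, not the authors' PRAAT procedure: the function names are hypothetical, and the contour classifier assumes that the F0 values at the start and end of the final movement have already been extracted.

```python
import math


def f0_range_semitones(f0_max_hz: float, f0_min_hz: float) -> float:
    """Logarithmic difference between highest and lowest F0, in semitones."""
    if f0_min_hz <= 0 or f0_max_hz < f0_min_hz:
        raise ValueError("expected 0 < f0_min_hz <= f0_max_hz")
    return 12.0 / math.log(2.0) * (math.log(f0_max_hz) - math.log(f0_min_hz))


def classify_final_contour(f0_start_hz: float, f0_end_hz: float,
                           threshold_st: float = 2.0) -> str:
    """Label a final F0 movement using the two-semitone criterion.

    Assumption: the movement is summarised by its start and end F0 values.
    """
    change_st = 12.0 * math.log2(f0_end_hz / f0_start_hz)
    if change_st >= threshold_st:
        return "rising"
    if change_st <= -threshold_st:
        return "falling"
    return "level"
```

An octave (for example 200 to 400 Hz) corresponds to 12 semitones, which gives a quick sanity check of the range function.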
The inter-coder reliability between two trained coders was assessed on 20% of the observation sessions randomly selected from each age point. Cohen’s kappa (K) coefficient was calculated to assess the accuracy of vocal productions coding; the value was .93, which is amply sufficient. Concerning the prosodic variables, there were strong correlations between the coders (Pearson’s r) on the calculations of F0 mean (r = .94), highest pitch (r = .87), and lowest pitch (r = .75). The Cohen’s K coefficient on the classification of edge F0 final contours was .83.
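For readers unfamiliar with the agreement statistic used here, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The sketch below is a generic implementation; the function name and the example labels are illustrative and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the marginal label frequencies of each coder.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both coders used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six productions (g = grunt, v = vocalic, cv = babbling).
example_a = ["g", "v", "v", "cv", "v", "g"]
example_b = ["g", "v", "cv", "cv", "v", "g"]
example_kappa = cohens_kappa(example_a, example_b)
```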
The fit lines presented in the graph of Figure 1 were computed using a Kernel smoothing method. Specifically, we employed a non-parametric regression technique where the kernel function weights the observations within a neighborhood around each point of interest. The bandwidth was chosen to include 50% of the data points, ensuring a balanced trade-off between bias and variance. This approach, as stated by Wand and Jones (1995), and by Wasserman (2006), allowed for flexible modeling of the underlying relationships between variables without assuming a specific parametric form.
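A kernel smoother of this kind can be sketched as a locally weighted average whose bandwidth at each evaluation point is the distance to the nearest 50% of observations. The text does not state which kernel function was used, so the Gaussian kernel below is an assumption, as are the function and parameter names.

```python
import math


def kernel_smooth(x, y, x_eval, span=0.5):
    """Nadaraya-Watson smoother with a nearest-neighbour bandwidth.

    Assumption: Gaussian kernel; `span` is the fraction of observations
    whose distance defines the local bandwidth.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    k = max(1, math.ceil(span * len(x)))
    fitted = []
    for x0 in x_eval:
        distances = sorted(abs(xi - x0) for xi in x)
        # Local bandwidth: distance to the k-th nearest observation.
        bandwidth = max(distances[k - 1], 1e-12)
        weighted_sum = 0.0
        weight_total = 0.0
        for xi, yi in zip(x, y):
            u = (xi - x0) / bandwidth
            w = math.exp(-0.5 * u * u)
            weighted_sum += w * yi
            weight_total += w
        fitted.append(weighted_sum / weight_total)
    return fitted
```

A larger `span` yields a smoother curve with more bias, while a smaller one follows the data more closely at the cost of higher variance.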
The total frequency of each type of vocal production and its percentage on all the productions is given in Table 3.
Type of production | Frequency | % of total |
---|---|---|
Grunt | 2523 | 26.1 |
Vocalization | 4955 | 51.2 |
Simple Babbling | 1451 | 15.0 |
Duplicate Babbling | 592 | 6.1 |
Variate Babbling | 155 | 1.6 |
Total | 9676 | 100 |
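The percentage column of Table 3 follows directly from the frequency column. A short check, using only the counts printed in the table (the variable and function names are illustrative):

```python
# Frequencies as printed in Table 3.
production_counts = {
    "Grunt": 2523,
    "Vocalization": 4955,
    "Simple Babbling": 1451,
    "Duplicate Babbling": 592,
    "Variate Babbling": 155,
}


def percentages_of_total(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


table3_total = sum(production_counts.values())
table3_percentages = percentages_of_total(production_counts)
```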
Figure 1 reports the mean relative frequencies of each type of production aggregated within participants at different ages. The graph shows that vocalizations were the most frequent productions throughout, with a consistent decrease over the first ten months of life. Grunts were very common during the first seven months of life, but their frequency decreased over time until their use became very sporadic. All three types of babbling started to be present between the 5th and the 6th month, and their use increased over time, with a predominance of canonical babbling over the whole age period considered.
To explore the presence of between-subjects variability, a linear regression was conducted with age, age squared, and infants (indicators) as predictors on the dependent variable F0 mean. The results were statistically significant, R2 = .102, indicating that infants and age have an effect on the F0 mean. Coefficients reported in Table 4 show that infants had significantly different coefficients. This confirms the presence of differences among the infants and that these differences should be taken into consideration.
There are indeed some strong arguments to use multilevel techniques in the analysis of pitch. In this view the infants form a random factor (level 2) and each set of observations (level 1) is nested within each child. All multilevel models were tested with MLwiN 2.33 (Rasbash et al., 2005).
Different multilevel models were investigated. The basic model, the unconditional model with no predictors included in the equation (M0 in Table 5), indicated a significant inter-subject variability in the F0 mean (see Table 5); this represents a reason to carry out all the subsequent analyses with the multilevel software. The F0 mean value across subjects and across time was 366 Hz (Table 5), and it is in line with previous studies (see, for example, Amano et al., 2006).
We also tested further models (M1, M2 and M3) including the effects of age, linear and squared, as fixed predictors (M1), and the fixed effects of type of production variables (M2) and finally a model including also the fixed effects of interactions between age (linear) and each type of production (M3).
We also tested the models with age as a random factor. Because the -2LL did not decrease significantly, there was no reason to treat age as a random factor, and we kept age and age squared as fixed factors for all the following analyses. The type of production was treated as a fixed factor with a fixed coefficient and vocalization as the reference category. All the models for F0 mean are therefore random intercept models, because no variable other than the children has a random effect.
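Comparing nested models through the change in -2LL is usually done with a likelihood-ratio test, in which the drop in deviance is referred to a chi-square distribution with as many degrees of freedom as added parameters. The sketch below shows that generic computation using only the standard library. It is not the MLwiN procedure itself, the function names are hypothetical, and no deviance values from the study are used.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1 or x < 0:
        raise ValueError("expected df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def likelihood_ratio_test(deviance_simple, deviance_complex, extra_params):
    """Return the deviance drop and its p-value for two nested models."""
    drop = deviance_simple - deviance_complex
    return drop, chi2_sf(max(drop, 0.0), extra_params)
```

A non-significant p-value means the extra parameters do not improve the fit enough to justify the more complex model.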
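The equation representing the final M3 was the following:
$$
\begin{aligned}
\text{F0 mean}_{ij} ={}& \beta_{0j}\,\text{cost} + \beta_{1}\,\text{age}_{ij} + \beta_{2}\,\text{age}_{ij}^{2} + \beta_{3}\,\text{Grunt}_{ij} + \beta_{4}\,\text{Babb1}_{ij} + \beta_{5}\,\text{Babb2}_{ij} + \beta_{6}\,\text{Babb3}_{ij} \\
&+ \beta_{7}\,(\text{age} \times \text{Grunt})_{ij} + \beta_{8}\,(\text{age} \times \text{Babb1})_{ij} + \beta_{9}\,(\text{age} \times \text{Babb2})_{ij} + \beta_{10}\,(\text{age} \times \text{Babb3})_{ij} + e_{ij}
\end{aligned}
$$
where $\beta_{0j} = \beta_{0} + u_{0j}$ and cost = 1, with $i$ indexing productions (level 1) and $j$ indexing infants (level 2).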
The same analyses were run for F0 mean, F0 range, and F0 final contours, using the same schema and the same hierarchical models.
The first analyses explored the changes in the F0 mean values over time and among the different type of productions. As reported above the basic model (M0) showed there was variation among infants (random part); consequently, all the models included this variation among infants.
We added age, linear and squared, as fixed predictors (M1). As can be seen in Table 5, only the linear fixed effect of age was significant: F0 mean values tended to increase over time.
Model 2 (M2) added to M1 the type of production as a fixed predictor with vocalization as the reference category. Results confirmed the significant effect of age and showed that grunts have the lowest mean F0 values and variegated babblings the highest, while vocalizations, simple and reduplicated babblings have similar values and are situated at an intermediate level. The lack of differences between mono- and bi-syllables (simple and reduplicated babblings) was also found by Robb et al. (1989).
Model 3 (M3) added the interaction term between age and type of productions to M2. The effect of age, age squared, and the interaction between reduplicated babblings and age were significant. Nonetheless, the LL did not improve significantly, so Model 3 cannot be considered the best representation of F0 mean development and Model 2 was chosen and represented in Figure 2. Figure 2 shows the distances between the curves, that is, the differences between the intercepts of each production. The shape of the curves is due to the inclusion of age and age squared in the model.
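In a model without interactions, production type only shifts the intercept, so the predicted age curves for the different productions are parallel. The sketch below illustrates this property. All coefficients are hypothetical placeholders chosen for illustration and are not the estimates reported in Table 5.

```python
# Hypothetical fixed-effect values, for illustration only (not from Table 5).
INTERCEPT_HZ = 350.0
AGE_COEF = 3.0
AGE_SQUARED_COEF = -0.1
TYPE_SHIFT_HZ = {  # vocalization is the reference category (shift = 0)
    "Voc": 0.0,
    "Grunt": -40.0,
    "Babb1": 2.0,
    "Babb2": 1.0,
    "Babb3": 25.0,
}


def predict_f0_mean(age_months: float, production: str) -> float:
    """Fixed-part prediction of a random-intercept model without interactions."""
    return (INTERCEPT_HZ
            + AGE_COEF * age_months
            + AGE_SQUARED_COEF * age_months ** 2
            + TYPE_SHIFT_HZ[production])


# The gap between two production types is the same at every age.
gaps = [predict_f0_mean(age, "Babb3") - predict_f0_mean(age, "Grunt")
        for age in range(4, 17)]
```

Adding age-by-type interaction terms, as in Model 3, would let these gaps change with age.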
The second set of analyses dealt with the changes of F0 range, measured in semitones, over time and among the types of productions (see Table 6). The basic model (M0) gives 4.47 semitones as the mean value of the F0 range and shows significant variation among subjects and over time.
In Model 1 (M1), the significant effect of age showed that F0 range increased over time. The addition of the type of production, again with vocalization as the reference category, in Model 2 (M2), showed a significant effect of age, age squared and of each type of production except reduplicate babbling. Generally, the effect of age squared showed that F0 range very slightly increased during the first months but decreased later. More complex productions such as reduplicate and variate babblings tend to be produced on average with a wider F0 range than vocalizations. Grunts are the productions with the smallest F0 range. Also, simple babblings showed a narrower F0 range than vocalizations. The difference between simple and reduplicate babblings is in contrast with Robb et al. (1989), who reported monosyllables with a slightly higher F0 range than bi-syllables.
The inclusion of the interaction term in Model 3 (M3) revealed a significant effect of the interaction between age and variate babblings, but the improvement in LL was very little, so Model 2 was chosen as the best representation of F0 range development. Model 2 is represented in Figure 3.
To explore the developmental trajectories of children’s ability to give F0 changes a specific direction, multilevel logistic analyses were run using as dependent variable the presence or absence of level, rising and falling F0 final contours in the production. Analyses were done on 9597 productions, excluding the productions for which the coding of the F0 final contour was unclear because the spectrogram was too noisy due to interference or artifacts.
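In logistic models, coefficients are expressed on the log-odds scale, and probabilities are obtained through the inverse logit transformation. The helper functions below are a generic sketch of that conversion. Their names are illustrative, and the intercept in the example is a hypothetical value rather than an estimate from the study.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert log-odds to a probability, in a numerically stable way."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    exp_value = math.exp(log_odds)
    return exp_value / (1.0 + exp_value)


def logit(probability: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


# Hypothetical intercept on the log-odds scale, for illustration only.
hypothetical_intercept = 0.5
average_probability = inv_logit(hypothetical_intercept)
```

An intercept of zero corresponds to a probability of one half. Positive intercepts give probabilities above one half and negative intercepts give probabilities below it.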
Firstly, Model 0 showed significant variability between subjects in the production of level contours, with an average probability of 79% of producing level contours (see Table 7). The addition of the fixed effects of age and age squared in Model 1 did not show significant effects. Model 2 showed significant effects of reduplicated and variegated babblings, which are produced to a greater extent with level F0 contours. The probability of producing level contours did not vary over time. The interaction between age and type of production in Model 3 was not significant. Model 2 was chosen as the best representation of the presence of level F0 contours, and it is represented in Figure 4.
Logistic Binomial Multilevel results.
Concerning rising F0 final contours, Model 0 indicated significant variation among subjects and a probability of 59% of producing rising contours within the production (see Table 8). Adding age and age squared in Model 1 did not reveal any significant effects. Model 2 showed a significant effect of grunts, reduplicated and variegated babblings. Children use rising final contours less often when producing grunts and more often when producing these more complex babblings. The probability of producing rising F0 final contours did not change over time. Model 3 did not reveal any significant effect of interactions. Hence Model 2 was chosen and represented in Figure 5.
Logistic Binomial Multilevel results.
The same analyses were run for the dependent variable presence or absence of a falling F0 final contour (see Table 9). The basic model showed significant variability among subjects and a general probability of 62% of producing falling final contours within the vocal production. Model 1 showed a significant effect of age squared, although the presence of falling final contours is quite stable over the whole period considered. In Model 2, the effects of grunts and of reduplicate and variate babbling were significant (see Figure 6). Reduplicate and variate babblings were more likely to be produced with falling F0 final contours. Model 3 did not reveal any significant interaction between age and type of production on the presence or absence of a falling F0 final contour.
Logistic Binomial Multilevel results.
To contribute to the conflicting literature on the topic and to address several limitations of previous studies, the present study aimed to describe the development of F0-related prosodic features of the pre-lexical productions of Italian infants from the 4th to the 16th month of life.
One of the main findings is the significant variability among children. From the first months of life children differ in their use of the voice, not only in mean fundamental frequency but in all the variables examined, and these differences can be attributed to physiological differences in the shape and development of the larynx. This finding highlights the need to pay greater attention to individual variability in studies of infants and children. For example, the F0 mean of children across ages varies between 309 and 392 Hz, a difference of more than four semitones among children. This large individual variability may explain the different values found by previous studies, which were conducted on small groups of participants. Moreover, the tendency not to consider and control for this variability in the analysis may explain previous confusing or non-significant findings. The variability among our 15 children is wider than the range found by Laufer and Horii (1977) in their four children (317 to 342 Hz), and it contrasts with Flax et al. (1991), who did not find significant differences among their three children. We may suppose that a larger sample gives a better picture of this variability and increases the chance of detecting it and therefore of controlling for it. A further longitudinal investigation of these differences at older ages would help clarify whether this variability is present only at this stage of physiological organization of the vocal tract, whether it persists later, or whether it increases with age to produce the well-known differences in voice pitch between individuals.
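The semitone figures in this comparison follow from the standard conversion between a frequency ratio and a musical interval, 12 times the base-2 logarithm of the ratio. A minimal sketch of the arithmetic is given below; the function name is introduced here for illustration only and does not come from the analysis software used in the study.

```python
import math

def semitone_interval(f_low_hz, f_high_hz):
    """Interval in semitones between two frequencies given in Hz."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)

# Range of F0 means across the 15 children in the present study.
between_children = semitone_interval(309.0, 392.0)

# Range reported by Laufer and Horii (1977) for their four children.
laufer_horii = semitone_interval(317.0, 342.0)

print(f"Present study: {between_children:.2f} semitones")
print(f"Laufer and Horii (1977): {laufer_horii:.2f} semitones")
```

The first interval comes out slightly above four semitones and the second at a little over one, which is the sense in which the variability in the present sample is wider.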
Controlling for this variability, we were able to describe more accurately the developmental trajectories of all the variables examined. F0 mean tends to increase over time, by about 3.2 Hz per month. This slight increase contrasts with some previous studies but may confirm the slight increase found in studies with fewer subjects, such as Laufer and Horii (1977), who found a similar, but not statistically significant, variation. Other authors reported an increase in F0 mean values, but only from the 5th (Fairbanks, 1942) or the 10th month (Murry et al., 1983).
In the first two years of life, infants improve their interactive abilities, as can be seen in the longer duration of gazes and the greater number of gestures directed to the mother (Papaeliou & Trevarthen, 2006; Trevarthen, 1977). Accordingly, we may hypothesize that the voice is increasingly used to establish contact with the caregiver and to respond to the stimulation received. This greater participation in the interaction may correspond to an increase in activation state and, therefore, to an increase in F0 mean values. Moreover, previous authors showed that communicative, investigative and emotional pre-lexical productions have different prosodic features. Specifically, a reduced F0 mean is present in non-communicative productions uttered, for example, while exploring an object. In contrast, productions directed to the adult during face-to-face interactions, as well as productions associated with imperative gestures, are uttered with a higher F0 mean (Aureli et al., 2017; Papaeliou et al., 2002; Papaeliou & Trevarthen, 2006). Since we recorded mother-infant face-to-face interactions without objects, we may suppose that the coded infant vocal productions were mostly uttered with the aim of communicating with the mother and, therefore, with a higher F0 mean.
A different result was found concerning the developmental trajectories of F0 range. Infants tend to slightly increase the amplitude of the variations within their productions during the first months, while after the 7th month the F0 range of all productions decreases by about one semitone every two months. This may mean that the productions of very young infants are less controlled and characterized by very wide variations; in the second half of the first year of life, infants tend to control their voice better and to use prosodic variations that are more similar to adult vocal productions. Previous studies have found that infants tend to imitate the variations of their mothers' vocal productions (see Gratier & Devouche, 2011, and Ko et al., 2016), and we may suppose that this ability becomes more efficient after the 7th month. The wider F0 range during the first months of life is consistent with the findings of Amano et al. (2006) on the prosodic development of three Japanese infants. However, these authors reported a decrease in F0 range only after the two-word period. In contrast, Snow (2004) reported an increase in F0 range from the first to the 4th year of age. The different age ranges considered make it very difficult to compare the findings. We may suppose that our analyses, focused on a shorter age range and with more time points, were more effective in revealing variation not reported by other authors.
Concerning F0 contours, most of the vocal productions of infants can be considered flat. This finding differs from previous studies and may be linked to the fact that during face-to-face interaction infants do not need to produce vocalizations with a wider F0 range to, for example, attract the mother’s attention or ask for an object (see Aureli et al., 2017; D’Odorico & Franco, 1991 and Esteve-Gibert & Prieto, 2013). Coding the communicative function of the productions would have given more information in this regard. Similarly, the percentages of rising and falling final contours do not vary with age. Our findings are consistent neither with Snow (2006) nor with the hypothesis of Lieberman (1967). The significant effect of type of production at every age may contradict the idea that, during the first months of life, prosody is predominantly physiologically controlled. Our results indicate that from the beginning of pre-lexical speech infants can use their prosodic competences, in line with previous studies showing that infants can differentiate accents to signal communicative intent (Prieto et al., 2012).
Our data collection did not last long enough to test Snow’s hypothesis (2006). The probabilities of producing rising and falling final contours both decrease slightly over time, but we did not find the U-shaped effect found by Snow. If this is simply due to the age range explored in our study, we may suppose that the probability of producing these final contours increases again later, after the 16th month of life.
The main finding of the present study is the significant effect of the type of vocal production on all the prosodic variables examined. The effect indicates that the different types of production considered have different prosodic characteristics, and that these differences last across the whole age range considered.
Grunts are confirmed to be the least communicative productions: they are produced with the lowest F0 mean and F0 range values compared with the other productions. Moreover, it is when producing grunts and vocalizations that infants seem least able to use rising and falling final contours. These productions are the most frequent across the whole age period considered, but they showed fewer communicative intonation features. High F0 mean, wide F0 range and rising and falling contours may represent an index of activation and excitement, and our results show that these prosodic features are more frequent in the productions that are used with a more specific communicative meaning, such as simple and variate babblings (Fasolo et al., 2008).
Some limitations of the present study have to be addressed. First, we measured F0 range in semitones, while many previous studies used the standard deviation of F0; this makes the results less comparable. Moreover, we did not consider the different communicative intentions of vocal productions (e.g., whether they were in response to maternal speech, or were produced to engage maternal attention, to complain about a need, etc.). This is another important variable that needs to be taken into consideration.
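The comparability problem raised by the first limitation can be illustrated with a small sketch. It is not the procedure used in the study: the two F0 tracks below are hypothetical values chosen so that one is exactly an octave above the other, and the function names are introduced here for illustration only.

```python
import math
import statistics

def f0_range_semitones(f0_hz):
    """F0 range of a production, as the max/min ratio expressed in semitones."""
    return 12.0 * math.log2(max(f0_hz) / min(f0_hz))

def f0_sd_hz(f0_hz):
    """Sample standard deviation of F0 in Hz."""
    return statistics.stdev(f0_hz)

# Hypothetical F0 tracks (Hz): the same contour shape at two pitch levels.
low_track = [200.0, 220.0, 240.0, 220.0, 200.0]
high_track = [400.0, 440.0, 480.0, 440.0, 400.0]

for name, track in (("low", low_track), ("high", high_track)):
    print(name, round(f0_range_semitones(track), 2), "st,",
          round(f0_sd_hz(track), 2), "Hz SD")
```

The semitone range is identical for the two tracks, whereas the standard deviation in Hz doubles with the pitch level. A measure in Hz therefore partly reflects how high the voice is, which is one reason why results expressed in the two units cannot be compared directly.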
The present study illustrates the growing need to take individual variability into consideration when assessing the prosody of infants and children. These findings suggest that in the second half of the first year of life, infants show an intonational repertoire that may contribute to regulating interaction with their partner, which is considered one of the major prerequisites for language acquisition (Papaeliou et al., 2002). The use of multilevel models was very useful to this end. Applying statistical analyses that are able to explore and control for individual variability becomes even more essential when studying child development. Further longitudinal studies may explore whether this variability among children is stable over time or whether it decreases or increases. This would make it possible to know at what ages it most needs to be controlled for.
Another implication concerns the need to explore linguistic and prosodic development even at the pre-linguistic stage. The prosody of speech has to be studied taking into account the type of production analyzed. This, together with the coding of the communicative intention of the production, will make a valuable contribution to the study of infants’ prosodic development.
The study was approved by the ethical committee of the Department of Neuroscience Imaging and Clinical Science of the University of Chieti-Pescara (Ethical approval number: DNISC2962, 06.11.2019) and was conducted according to the American Psychological Association guidelines in accordance with the 1964 Helsinki Declaration.
Written informed consent (approved by the ethical committee) for participation in the study was obtained from the mothers.
Open Science Framework (OSF): The developing of prosody in infants: a longitudinal study over the first 16 months of life. https://doi.org/10.17605/OSF.IO/7PBN6 (D’Aloia, 2024).
The project contains the following underlying data:
• Number of prelexical production by child and session.xlsx. Data file which includes the children’s data.
• Read.me.pdf. A checklist adhering to STROBE guidelines for reporting observational studies.
Data are available under the terms of Creative Commons Attribution 4.0 International license (CC-BY 4.0)
The extended data for this study is available in the Open Science Framework repository. This extended data is a component of the main project titled “The developing of prosody in infants: a longitudinal study over the 16 months of life.”
Open Science Framework (OSF). Extended data for “The developing of prosody in infants: a longitudinal study over the 16 months of life”. DOI: https://doi.org/10.17605/OSF.IO/738W2
The component contains the following extended data:
• Table 1. Extended data. Data file which includes a literature review on F0 related (F0 mean, F0 range, F0 final contours) prosodic development.
Data are available under the terms of Creative Commons Attribution 4.0 International license (CC-BY 4.0)
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Yes
References
1. Sardelli E, Marotta G: Prosodic parameters for the detection of regional varieties in Italian. https://www.researchgate.net/publication/228809041_Prosodic_parameters_for_the_detection_of_regional_varieties_in_Italian. 2007.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Acoustic phonetics, corpus phonetics, inter- and intra-speaker variation in voice and speech
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
No
References
1. Vihman M, Macken M, Miller R, Simmons H, et al.: From Babbling to Speech: A Re-Assessment of the Continuity Issue. Language. 1985; 61 (2).
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Phonological development
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Partly
If applicable, is the statistical analysis and its interpretation appropriate?
Partly
Are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions drawn adequately supported by the results?
Partly
References
1. Frota S, Cruz M, Matos N, Vigário M: Early Prosodic Development. 6: 295-324.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Experimental linguistics and psycholinguistics; prosody; early language development
Version 1 (06 Aug 24): reports from three invited reviewers.