Research Article

The developing of prosody in infants: a longitudinal study over the first 16 months of infant life

[version 1; peer review: 2 approved with reservations, 1 not approved]
PUBLISHED 06 Aug 2024

Abstract

Background

The acquisition and development of prosodic aspects of vocal intonation are of special interest within the larger context of language acquisition.

Method

The present study explored the developmental trajectories of infant prosodic abilities from 4 to 16 months of life through frequent, closely spaced assessments. Several aspects were considered: an acoustic analysis of infant vocal productions with dedicated software, the analysis of the prosodic variables associated with the fundamental frequency (F0 mean, F0 range, and F0 final contours), the individual variability, and the complexity of the infants’ vocal productions.

Results

The multi-level analysis revealed specific prosodic developmental trajectories that differ across the different kinds of vocal productions from the first months of life.

Conclusions

The findings suggest that in the second half of the first year of life infants show an intonational repertoire that may help manage interactions with their caregiver and that individual variability has to be taken into consideration when assessing infants’ prosody.

Keywords

Prosody development, Fundamental Frequency, Multi-level analysis, Babbling, Infants 

Introduction

Infants’ early vocal productions form the basis for language development and are important social signals that stimulate caregiver response, which in turn facilitates the development of phonology and speech (Goldstein et al., 2009; Spinelli et al., 2017). Research on pre-linguistic vocalizations of typically developing infants has focused mostly on the segmental and syllabic aspects of spontaneous vocal productions. Fewer scholars have focused on supra-segmental aspects of infant vocalizations, including variables that measure prosodic aspects of vocal intonation such as those related to pitch, usually measured through the fundamental frequency (F0). These prosodic features of language convey grammatical and pragmatic meaning as well as emotional or affective intent, and highlight particular elements of the utterance (Crystal, 1978; Gleitman et al., 1988; Wells et al., 2004). The development of prosody is of special interest within the larger context of language acquisition, because it is among the earliest aspects of speech to be acquired (Esteve-Gibert & Prieto, 2018) and is strictly linked to the maturity of the vocal tract (Kahane & Kahn, 1984). It is well known that infants are sensitive to prosodic information from a very early age due to their prenatal exposure to speech (Gervain, 2015). They are therefore able to understand the prosodic features of parental speech from the first months of life (De Carvalho et al., 2019; Fisher & Tokura, 1996; Soderstrom et al., 2008) and can imitate the prosody of parental speech, for instance, by adapting their F0 features to match those of the parents (Gratier & Devouche, 2011; Ko et al., 2016; McRoberts & Best, 1997; Papoušek & Papoušek, 1989).

At birth, the infant’s vocal tract is not completely developed. Consequently, infant vocal expressions have different sounds and different melodies than adult ones (Astruc et al., 2013). Some authors reported a much narrower pitch range in children than in adults, and that children acquire falling tones first and final rising tones much later (Snow, 2004; Wells et al., 2004). The differences between infants and adults observed in the first stages of phonetic development have been linked to the difficulties in controlling the vocal tract. Specifically, Lieberman (1967) assumed that infants do not precisely control the tension of their laryngeal muscles once phonation starts. The child merely maintains the tension of the laryngeal muscles at or near the tension that they had as phonation started, and the fundamental frequency gradually rises from the start of phonation to either a level or a slightly falling “plateau”. For this reason, the majority of the breath-groups terminate with a falling fundamental frequency contour. However, cross-linguistic differences have been reported, with the presence, for example, of rising contours observed in infancy in languages other than English (Esteve-Gibert & Prieto, 2013; Prieto et al., 2012; Whalen et al., 1991).

To our knowledge, only a few studies have examined the prosodic characteristics of infants’ first vocal productions. The first of these was conducted by D’Odorico (1984), who acoustically analysed the cry and non-cry vocalizations produced by four Italian infants. The results of this study showed that cry vocalizations produced in different contexts are acoustically different and that there are similarities in the acoustic properties of cry and non-cry vocalizations produced in the same context. These results were later confirmed by D’Odorico and Franco (1991), who investigated the suprasegmental features of the vocalizations produced during social interaction by five Italian children from 4 to 11 months of age. Indeed, different patterns of non-segmental features were found in sounds produced in different contexts.

Considering the scarce data on Italian and the importance of cross-linguistic comparison in language development, the main aim of this study is to explore how Italian infants develop intonational abilities in the first 16 months of life, focusing on a detailed analysis of the infants’ pitch, in terms of F0 mean, F0 range and F0 final contours. Moreover, unlike other studies on this topic, which compare infants’ prosodic development to a mature adult model, the present study aims to give a broad picture of the longitudinal development of the intonational repertoire of children’s spontaneous productions, from 4 to 16 months of age.

Prosodic development over infancy and toddlerhood: a contrasting picture

A summary of the main studies available on intonation development during the first two years of life is reported in Table 1, available in the Open Science Framework repository as extended data; doi: https://doi.org/10.17605/OSF.IO/738W2. The main focus is on the examination of F0-related aspects (F0 mean, F0 range and F0 final contours) of non-distress spontaneous productions, excluding studies that examined distress productions such as cry and whining (Mampe et al., 2009; Rothgänger, 2003; Zanchi et al., 2016b).

Among the F0-related prosodic variables, the most widely studied is the fundamental frequency mean value (F0 mean), which represents the rate of vibration of the vocal cords within the larynx and reflects pitch variations of the voice. One of the first reviews of the developmental changes in F0 mean from birth to adulthood, by Kent (1976), showed that F0 mean is higher at birth and gradually decreases until it reaches adult levels. However, the studies included in the review were very few. Some of the studies reported in Table 1 (refer to extended data in the Open Science Framework repository) showed the same descending pattern from birth to the second year of life for F0 mean (Amano et al., 2006; Robb & Saxman, 1985; Rothgänger, 2003). Flax, Lahey, Harris, and Boothroyd (1991) found that 2 out of 3 children showed a reduction in F0 mean between the onset of words and the 50-word period. This decrease is explained as reflecting the gradual maturation of the vocal tract and the development of the infant’s ability to control the voice.

Nonetheless, other scholars failed to find significant changes in F0 mean over time (Iyer & Oller, 2008; Robb et al., 1989). Laufer and Horii (1977) measured infants’ productions every two weeks during the first six months of life and found that F0 mean values slightly fluctuated. Amano et al. (2006) argued that the main reason why some studies did not find a decrease in the F0 mean values was that speech samples were collected over periods that were too short to catch this effect. Lastly, there is only one author to our knowledge who found the opposite pattern. Fairbanks (1942) reported an increase during the first five months of infant life, followed by stabilization.

Even more contrasting findings have been presented about the developmental trajectories of F0 range, namely the difference between the maximum and the minimum pitches produced within the utterance. F0 range represents the ability of infants to vary the intonation and to make the production more communicative and attractive for the listener (Amano et al., 2006). Amano et al. (2006) found an increase of F0 range after the onset of two-word utterances and hypothesized that, as infants grow up, they acquire the ability to vary the fundamental frequency of the voice within a production in line with the increase of their communicative abilities. Similarly, Snow (2004) found that 4-year-old children showed a wider F0 range than 1-year-old infants. Nonetheless, Snow and Ertmer (2012) reported that, in their sample, 10 out of 12 typically developing infants showed a decrease in F0 range between 3 and 9 months. In a first study, Robb and Saxman (1985) found a decrease of the between-utterances F0 range between 11 and 25 months, while, in a second study (Robb et al., 1989), they failed to find changes over time. Furthermore, Laufer and Horii (1977) found a slight decrease in within-utterance F0 range during the first four months of life and a minor increase after the 4th month. To sum up, the developmental trajectory of the within-production F0 range is unclear.

Another prosodic variable that plays an important role in prosodic development is the F0 contour, intended as the shape of pitch (F0) variations within the utterance, which conveys a specific melody to the vocal production. F0 contour may increase from the beginning to the end of the production (rising F0 contour), decrease (falling F0 contour), or not significantly vary (level or flat F0 contour). The direction of these contours, especially of the F0 final contours, is considered fundamental in providing the pragmatic meaning of the vocal production. In most languages, falling final contours are typical of statements and labelling, while rising final contours are especially used with interrogative utterances (for Italian see for example Sorianello, 2021). Many scholars agree that infants start to use F0 final contours early to express intentions. For example, Prieto et al. (2012), Prieto & Vanrell (2007) and Esteve-Gibert and Prieto (2013) showed evidence that from the first year of life Catalan and Spanish infants, similarly to adults, are able to vary F0 final contours in order to signal pragmatic meanings, even before they can produce words. Several studies, mostly conducted in English-speaking countries, agree that the majority of the productions of infants have falling contours (Fox, 1990), while rising contours are rare (see for example Kent & Murray, 1982, and Robb et al., 1989) and more frequent in adults (Cruttenden, 1997) and preschoolers (Snow, 2004). As stated above, according to Lieberman (1967), the falling contour is considered more natural and simpler than the rising contour without implying the infant’s intentionality. According to Lieberman’s hypothesis, rising patterns, being contingent on language experience rather than physiological constraints, stabilize at a later stage of language acquisition. More recently, Snow (2006) found that the development of falling and rising patterns is not linear and follows a U-shaped trajectory. Both falling and rising contours are frequent and well expressed until nine months, when their quality decreases. After a period of stabilization, both these contours reach the same quality observed before the regression period at around 18 months. The author explained these results by pointing out that intonation is controlled by physiology in the earliest stage (before nine months of age), but later the tones come under linguistic control. So, the regression could be due to a linguistic reorganization of speech, and the U shape shows a shift of intonation from a pre-intentional to an intentional stage. Despite these two main theoretical explanations, empirical studies that examined the frequency of F0 final contours over time found inconsistent results. Many of them (see, for example, Flax et al., 1991 and Murry et al., 1983) failed to find variations in the percentages of rising and falling contours over time. Fox (1990) found, consistently between the 3rd and the 9th month, a prevalence of 82% of falling final contours. On the contrary, Robb et al. (1989) found that the less frequently occurring F0 final contours were falling-rising and rising contours (comprising 6% of all the vocalizations), and this percentage was constant, independently of lexicon size, throughout the first two years.

Suggestions for a better comprehension of the phenomena

To sum up, the studies on the development of F0 features over the first years of life failed to give a clear and homogeneous picture, mainly because comparing these studies is difficult. First, languages differ, which could lead to different intonation features and developmental trajectories. Second, there are wide differences in the age ranges considered, in the number of longitudinal sessions, and in many other methodological aspects. These include the types of infants’ vocal production included in the analyses (whether cry-like or squeal-like vocalizations were included or only syllabic-like vocalizations), the tools used for the analyses of speech spectrograms (old studies used visual inspections of the spectrograms to measure values of F0 and are therefore less reliable than modern analyses run with ad hoc programs), and the study design (with longitudinal and cross-sectional analyses leading to different findings due to the variability among the participants).

Moreover, an important issue raised by some of the reported studies is the presence of variability among infants, which affects both F0 values and age-related changes that occur in infants’ voices. Laufer and Horii (1977) described the F0 mean fluctuating from month to month with each infant presenting a specific pattern of change. Flax et al. (1991) found that three infants out of three showed different F0 range change patterns, with one child not varying at all in this period. Kent and Murray (1982) found a prevalence of F0 falling contours over the whole age range considered (3, 6 and 9 months), but pointed out high intra- and inter-individual variability in the production of rising contours at one point in time as well as over time. Other authors agreed that differences among infants are present (Amano et al., 2006), but the restricted number of participants or number of sessions made it difficult to check this hypothesis. This has implications for the design of the study, since a greater number of participants needs to be studied, and for the analyses, since between-participant variance has to be considered.

Moreover, the reported studies mainly considered all the productions or only one specific type of production, without making comparisons among them (see Table 1 in the extended data in the Open Science Framework repository). Robb et al. (1989) recorded on 12 successive occasions the utterances of seven infants in the 8–26-month age period, investigating the F0 mean of monosyllabic and bi-syllabic utterances. They found similar F0 mean values between the two productions but a tendency for monosyllables to have a greater F0 range than bi-syllables for all the participants except one child. Moreover, this tendency remained stable across the first two years of life. On the contrary, Snow (2004) did not find an effect of the number of syllables of the production on F0 range, thus showing that the range is independent of the length of the production. But these studies concerned the number of syllables of utterances, not developmentally different productions. Rothgänger (2003) explored the prosodic development of babbling by comparing it with the development of cry instead of other non-distress productions, so information about possible differences among the productions is lacking. Other studies confirmed the necessity to consider the different productions separately, showing differences in the developmental trajectories of the prosodic features of vocalizations and syllabic utterances (Hsu et al., 2000) and different multiword combinations (Behrens & Gut, 2005). These results remain inconclusive and confirm that a more detailed description of the phenomena is needed.

The present study

Some authors have hypothesized the existence of linguistic trade-offs during development, so that increased demands in one component of language, such as syntax, may cause decreased performance in a second component, such as phonology (Crystal, 1978). Furthermore, the interrelationships among different components of language may vary depending on how recently a specific linguistic structure has been learned (Crystal, 1978; Masterson & Kamhi, 1992; Zanchi et al., 2016a). We believe that at early ages this effect may also manifest in relation to the different pre-lexical productions produced by infants. Therefore, we hypothesize that the pre-lexical productions acquired earlier, and consequently more practiced at the oral-motor level (Oller et al., 1976; Stoel-Gammon, 2011), may have prosodic features different from the later acquired productions, in which the infant is not yet fluent.

To assess this hypothesis, the main aim of the present study is to explore the development of F0 related prosodic features of each type of pre-lexical productions observed. To our knowledge, this is the main factor not dealt with so far. As reported above, only differences among the prosodic development of cry and other non-distress productions (Murry et al., 1983) or among productions with a different number of syllables (Robb et al., 1989) have been analyzed.

The second aim of the present study is to explore individual variability in prosodic features of speech; therefore, all the analyses were run as multi-level models with the children as the second level.

We expected to find:

  • Different trajectories of the prosodic features of each kind of production considered, with the productions earlier acquired showing different prosodic features than productions later acquired.

  • Significant individual variability among children.

Methods

Participants

Fifteen infants (3 females) participated in the study. The sample was not gender-balanced. However, previous studies showed that gender differences in the prosodic aspects of language are present only from late puberty (Bennett, 1983; Fox, 1990; Lee et al., 1999).

All infants were healthy and born at full term. Families were monolingual Italian-speaking, and mothers’ mean age was about 35 years (range: 28 – 42). Sixty percent of the mothers had completed high school education, and 40% had a university degree.

Procedure

Mothers were contacted after infant birth, and the first meeting was arranged at the beginning of the 4th month (M age = 4 months, 2 days; SD = 0:05). This age was chosen because around 4-6 months significant changes in the anatomical-physiological structure of the infant’s vocal tract occur, strongly increasing the control of speech articulation with enhanced production of speech-like sounds (Kent, 1976).

Infants were followed from the 4th to the 16th month of age. From the 4th to the 14th month a researcher visited the infant and the mother at home every 15 days (twice a month); after the 14th month, visits were monthly. See Table 2 for a summary of the sessions recorded for each participant. Some infants missed part of the sessions, and not all infants were followed up to the 16th month, since some mothers interrupted their participation in the study for personal reasons. Mother and infant were audio-video-recorded during free-play face-to-face interaction without toys. The mothers were asked to play as they normally did. In total, 295 sessions (M per subject = 20, SD = 5) of about 10 minutes each (M = 10.06 minutes, SD = 1.99) were collected.

Table 2. Number of pre-lexical productions by child and session.

Children
Monthn123456789101112131415Total
144571731mm4617241421mmmm271
4b25728117264714442965426m82530
3641221293872472321141632mm19408
5b481462954654223639123076216124609
5271251503132512344628773m86584
6b6112766294160328437412726m22521
748267409133209129373769434503
7b86187191025140mm166197205243523
98110310411843172919142830574158589
8b1077297863143m141862161203522470
11685318663542m616198327841102584
9b12366325382836mm24926521015100462
10a13495919312237m188116322416102434
10b1462583336272377m29132321141431461
11a157245738484921m19172850m2254470
11b163388937244747m405324mm11368
12a1728572034535819m1224144219m8388
12b183232555216320m2022616mmm292
13a1917307391527mm14m22848mm227
13b20291116661921mmmm914mmm185
14a2117391420421412mm1843m34mm253
14b22282713552034mm12m1m21mm211
1523262810402129mmm27441634mm275
1624mmm17122520m1411119mmm119
Total1149117932411068179414671774732705395955113917989737
Total m1110119133303714865

Coding: pre-lexical productions

The audio of all the recorded sessions was obtained using the program Audacity (Audacity Team; available for download at https://www.audacityteam.org/), and all the audible pre-lexical infant productions that did not overlap with other sounds and were noise-free were coded. In line with previous studies, vegetative sounds (such as wheezes, sneezes, coughs, hiccups, and clicking sounds), stress vocalizations (such as whimpering, fusses, and cries), laughs, words and onomatopoeias were not considered (Fasolo et al., 2010).

Infants’ vocal productions (Stark et al., 1993) were coded as:

  • Communicative grunt (g): vocalization constituted by a consonant-like sound (e.g., [m]);

  • Vocalic sound (v): vocalization constituted by vowel-like sounds (e.g., [a]);

  • Simple babbling (cv): vocalizations containing at least one full vowel-like element and one consonant-like element with rapid transition between consonant and vowel (e.g., [ba]);

  • Reduplicated babbling (cvcv): vocalization containing rapid repetition of the same sequence of one full vowel-like element and one consonant-like element (e.g., [baba], [tata]);

  • Variegated babbling (c1vc2v or cv1cv2): vocalization containing rapid repetition of different sequences; vocalizations comprising at least one full vowel-like element and at least two different consonant-like elements (e.g., [bata]), or two different full vowel-like elements and one consonant-like element (e.g., [beba]).

In total 9737 productions were coded (M per session = 33, SD = 23.38; M for infant = 645, SD = 329.40).

In all the tables and figures of the present paper the pre-linguistic productions will be indicated as follows: Grunt = Communicative grunts, Voc = Vocalizations, Babb1 = Simple babbling, Babb2 = Reduplicated babbling, Babb3 = Variegated babbling. Table 2 summarizes the total productions for each subject at each age.

Coding: prosody

The PRAAT speech analysis software package (Paul Boersma and David Weenink, Institute of Phonetic Sciences, University of Amsterdam, The Netherlands; Boersma & Weenink, 2005) was used to obtain the prosodic characteristics of each vocal production using the visual inspection of the sound wave represented in the spectrogram to identify the beginning and the ending of the production (D’Odorico et al., 2009). The following measures were calculated on every single production:

  • Fundamental frequency mean (F0 mean): calculated automatically in Hz by the PRAAT program.

  • Maximum and minimum pitch: the highest and lowest F0 values in the vocal production (Cruttenden, 1997), calculated in Hz.

  • Fundamental frequency range (F0 range): the span of F0 changes over the entire pre-lexical production (in semitones). According to the definition of Snow and Balog (2002), it was calculated as the logarithmic difference between the highest and the lowest F0 values in a production, measured in semitones: [12/log(2)]*[log(maximum F0) - log(minimum F0)].

  • F0 final contour: the last movement of the production intonation profile. Each change of F0 values within the production was classified as having either a rising (F0 final rising contour) or falling (F0 final falling contour) contour if the pitch changed (difference between the minimum and the maximum F0 value) by at least two semitones. If the F0 range of the whole production was less than two semitones, the contour was classified as F0 level contour.
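The semitone conversion and the two-semitone rule described above can be sketched as follows. This is only an illustration under stated assumptions: it reads the "logarithmic difference" as log(maximum F0) minus log(minimum F0), which equals 12 times the base-2 logarithm of their ratio, and the function names `semitone_range` and `classify_movement` are hypothetical. How the start of the final movement is located is left to the coder, as in the study.

```python
import math


def semitone_range(f0_max_hz, f0_min_hz):
    """Span between two F0 values in semitones: (12 / log 2) * (log max - log min)."""
    if f0_min_hz <= 0 or f0_max_hz < f0_min_hz:
        raise ValueError("expected 0 < f0_min_hz <= f0_max_hz")
    return (12.0 / math.log(2.0)) * (math.log(f0_max_hz) - math.log(f0_min_hz))


def classify_movement(f0_start_hz, f0_end_hz, threshold_st=2.0):
    """Label one pitch movement as rising, falling, or level (two-semitone rule)."""
    span = semitone_range(max(f0_start_hz, f0_end_hz), min(f0_start_hz, f0_end_hz))
    if span < threshold_st:
        return "level"  # change too small to count as a movement
    return "rising" if f0_end_hz > f0_start_hz else "falling"


# Hypothetical values: an octave spans 12 semitones.
print(round(semitone_range(400.0, 200.0), 2))
print(classify_movement(300.0, 400.0))
```

Working in semitones rather than Hz keeps the threshold comparable across infants whose F0 mean differs.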

Reliability

The inter-coder reliability between two trained coders was assessed on 20% of the observation sessions, randomly selected from each age point. Cohen’s kappa (K) coefficient was calculated to assess agreement on the coding of vocal productions; the value was .93, which is more than sufficient. Concerning the prosodic variables, there were strong correlations between the coders (Pearson’s r) on the calculations of F0 mean (r = .94), highest pitch (r = .87), and lowest pitch (r = .75). The Cohen’s K coefficient on the classification of edge F0 final contours was .83.
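As a reminder of what the kappa coefficient measures, a minimal sketch of Cohen's kappa for two coders is given below. The label sequences are invented for illustration and are not data from the study; the function name `cohens_kappa` is likewise hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance from each coder's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented example: six productions coded as grunt (g), vocalic sound (v) or babbling (cv).
coder_1 = ["g", "v", "v", "cv", "v", "g"]
coder_2 = ["g", "v", "cv", "cv", "v", "g"]
print(cohens_kappa(coder_1, coder_2))
```

Unlike raw percentage agreement, kappa discounts the agreement two coders would reach by chance given how often each uses every category.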

Fit lines computation

The fit lines presented in the graph of Figure 1 were computed using a Kernel smoothing method. Specifically, we employed a non-parametric regression technique where the kernel function weights the observations within a neighborhood around each point of interest. The bandwidth was chosen to include 50% of the data points, ensuring a balanced trade-off between bias and variance. This approach, as stated by Wand and Jones (1995), and by Wasserman (2006), allowed for flexible modeling of the underlying relationships between variables without assuming a specific parametric form.
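A minimal sketch of this kind of smoother is shown below. It is not the authors' implementation: the Epanechnikov kernel, the nearest-neighbour definition of the bandwidth, and the function name `kernel_smooth` are assumptions made for illustration, and the example values are invented.

```python
import math


def kernel_smooth(xs, ys, x0, span=0.5):
    """Kernel-weighted mean of ys at x0; the bandwidth reaches the nearest `span` share of points."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    k = max(1, math.ceil(span * len(xs)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to the k-th nearest point
    num = den = 0.0
    if h > 0:
        for x, y in zip(xs, ys):
            u = (x - x0) / h
            if abs(u) < 1.0:
                w = 0.75 * (1.0 - u * u)  # Epanechnikov weight (assumed kernel)
                num += w * y
                den += w
    if den == 0.0:
        # Degenerate neighbourhood: fall back to the plain mean of the nearest points.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)
    return num / den


# Invented relative frequencies at ages 4..16 months, smoothed at each age.
ages = list(range(4, 17))
freqs = [0.60, 0.58, 0.55, 0.50, 0.48, 0.47, 0.45, 0.46, 0.44, 0.45, 0.43, 0.44, 0.42]
fit_line = [kernel_smooth(ages, freqs, age) for age in ages]
```

A wider span gives a smoother line with more bias; a narrower one follows the data more closely at the cost of more variance.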


Figure 1. Mean relative frequency of the type of productions from 4 to 16 months of age.

Results

Descriptive analysis

The total frequency of each type of vocal production and its percentage on all the productions is given in Table 3.

Table 3. Frequencies and percentages of each vocal production.

| Type of production | Frequency | % of total |
| --- | --- | --- |
| Grunt | 2523 | 26.1 |
| Vocalization | 4955 | 51.2 |
| Simple Babbling | 1451 | 15.0 |
| Duplicate Babbling | 592 | 6.1 |
| Variate Babbling | 155 | 1.6 |
| Total | 9676 | 100 |

Figure 1 reports the mean relative frequencies of each type of production, aggregated within participants, at different ages. The graph shows that vocalizations were the most frequent productions throughout, with a consistent decrease over the first ten months of life. Grunts were very common during the first seven months of life, but their frequency decreased over time until their use became very sporadic. All three types of babbling began to appear between the 5th and the 6th month, and their use increased over time, with a predominance of canonical babbling over the whole age period considered.

Between-subjects variability exploration

To explore the presence of between-subjects variability, a linear regression was conducted with age, age squared, and infants (dummy-coded indicators) as predictors and F0 mean as the dependent variable. The results were statistically significant, R² = .102, indicating that infants and age have an effect on the F0 mean. Coefficients reported in Table 4 show that infants had significantly different coefficients. This confirms the presence of differences among the infants and that these differences should be taken into consideration.

Table 4. Linear regression on F0 mean by children (dummies) and age.

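| Predictor | b | S.E. | t | p |
| --- | --- | --- | --- | --- |
| Age | 1.3 | .48 | 2.6 | <.01 |
| Age² | -0.7 | .52 | -1.4 | .15 |
| Intercept | 345.7 | 3.2 | 104.1 | <.01 |
| Child 2 | 46.9 | 3.2 | 14.6 | <.01 |
| Child 3 | -.1 | 4.9 | .0 | .98 |
| Child 4 | 2.1 | 3.2 | .6 | .52 |
| Child 5 | 3.6 | 3.5 | 1.0 | .30 |
| Child 6 | 23.3 | 3.4 | 6.8 | <.01 |
| Child 7 | -1.4 | 4.2 | -.3 | .75 |
| Child 8 | 1.4 | 6.3 | .2 | .82 |
| Child 9 | 46.2 | 4.2 | 10.8 | <.01 |
| Child 10 | 37.8 | 5.2 | 7.2 | <.01 |
| Child 11 | -36.7 | 4.0 | -9.0 | <.01 |
| Child 12 | 41.4 | 3.9 | 10.4 | <.01 |
| Child 13 | 5.7 | 4.1 | 1.4 | .17 |
| Child 14 | 1.9 | 4.6 | .3 | .67 |
| Child 15 | -35.1 | 3.6 | -10.1 | <.01 |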

There are indeed some strong arguments to use multilevel techniques in the analysis of pitch. In this view the infants form a random factor (level 2) and each set of observations (level 1) is nested within each child. All multilevel models were tested with MLwiN 2.33 (Rasbash et al., 2005).

Different multilevel models were investigated. The basic model, the unconditional model with no predictors included in the equation (M0 in Table 5), indicated a significant inter-subject variability in the F0 mean (see Table 5); this represents a reason to carry out all the subsequent analyses with the multilevel software. The F0 mean value across subjects and across time was 366 Hz (Table 5), and it is in line with previous studies (see, for example, Amano et al., 2006).
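One way to read the variance components of the unconditional model is through the intraclass correlation, the share of total variance that lies between infants. The paper does not report this quantity; the sketch below simply computes it from the level-2 (u) and level-1 (e) variances listed for M0 in Table 5, under the assumption that those two entries are the between-child and within-child variances. The function name is hypothetical.

```python
def intraclass_correlation(var_between, var_within):
    """Share of total variance attributable to differences between groups (here, children)."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return var_between / total


# M0 variance components for F0 mean as listed in Table 5 (u = level 2, e = level 1).
icc_f0_mean = intraclass_correlation(2081.83, 4573.41)
print(f"{icc_f0_mean:.3f}")  # about 0.313
```

On this reading, roughly a third of the variance in F0 mean would be attributable to differences among infants, which is in line with the decision to model children as a random factor.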

Table 5. Predictors of F0 mean, multilevel results.

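| | M0 b | M0 s.e. | M1 b | M1 s.e. | M2 b | M2 s.e. | M3 b | M3 s.e. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed Effects | | | | | | | | |
| Constant | 366.32 | 0.89 | 354.59 | 2.63 | 361.90 | 2.76 | 360.58 | 3.13 |
| Age | - | - | 1.60 | 0.49 | 1.40 | 0.49 | 1.62 | 0.56 |
| Age² | - | - | -0.93 | 0.51 | -0.96 | 0.51 | -1.13 | 0.58 |
| Productions; base vocalization | | | | | | | | |
| Grunt | - | - | - | - | -16.48 | 2.04 | -13.21 | 3.85 |
| Babb-1 | - | - | - | - | -5.27 | 2.52 | -0.36 | 6.83 |
| Babb-2 | - | - | - | - | -0.77 | 3.59 | -19.41 | 9.71 |
| Babb-3 | - | - | - | - | 14.52 | 6.70 | 24.84 | 23.20 |
| Age*Grunt | - | - | - | - | - | - | -0.38 | 0.37 |
| Age*Babb-1 | - | - | - | - | - | - | -0.35 | 0.45 |
| Age*Babb-2 | - | - | - | - | - | - | 1.31 | 0.65 |
| Age*Babb-3 | - | - | - | - | - | - | -0.69 | 1.43 |
| Random effects (level 2) | | | | | | | | |
| u | 2081.83 | 137.62 | 2069.24 | 137.62 | 2007.62 | 136.07 | 2005.86 | 136.07 |
| Random effects (level 1) | | | | | | | | |
| e | 4573.41 | 130.36 | 4561.20 | 130.15 | 4564.56 | 129.50 | 4561.65 | 129.55 |
| Model Fit statistics | | | | | | | | |
| LL | - | - | -34.8 | 2 df | -74.19 | 6 df | -6.77 | 10 df |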

We also tested further models (M1, M2 and M3) including the effects of age, linear and squared, as fixed predictors (M1), and the fixed effects of type of production variables (M2) and finally a model including also the fixed effects of interactions between age (linear) and each type of production (M3).

We also tested the models with age as a random factor. The -2LL did not decrease significantly, so there was no reason to treat age as a random factor, and we kept age and age squared as fixed factors for all the following analyses. The type of production was treated as a fixed factor with a fixed coefficient and vocalization as the reference category. All the models for F0 mean are therefore random intercept models, because no variable other than the children has a random effect.

The equation representing the final M3 was the following:

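$$
\begin{aligned}
y_{ij} = {} & \beta_{0ij}\,\mathrm{cost} + \beta_{1}\,\mathrm{Age}_{ij} + \beta_{2}\,\mathrm{AgeSquared}_{ij} + \beta_{3}\,\mathrm{Grunt}_{ij} + \beta_{4}\,\mathrm{Bab1}_{ij} + \beta_{5}\,\mathrm{Bab2}_{ij} + \beta_{6}\,\mathrm{Bab3}_{ij} \\
& + \beta_{7}\,(\mathrm{Age}\times\mathrm{Grunt})_{ij} + \beta_{8}\,(\mathrm{Age}\times\mathrm{Bab1})_{ij} + \beta_{9}\,(\mathrm{Age}\times\mathrm{Bab2})_{ij} + \beta_{10}\,(\mathrm{Age}\times\mathrm{Bab3})_{ij}
\end{aligned}
$$

where $\beta_{0ij} = \beta_{0} + u_{0j} + e_{0ij}$ and $\mathrm{cost} = 1$.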
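The fixed part of this equation can be sketched as a small function. The coefficients below are the M2 estimates as read from Table 5 (all interaction terms set to zero), and the names `M2_FIXED` and `predict_f0_mean` are hypothetical. The sketch assumes that age is expressed in whatever units and centring the model used, which the text does not specify, and that AgeSquared is simply the square of that value. The random parts (u0j and e0ij) are left out.

```python
# M2 fixed-effect estimates for F0 mean as read from Table 5; vocalization is the reference.
M2_FIXED = {
    "constant": 361.90,
    "age": 1.40,
    "age_squared": -0.96,
    "Grunt": -16.48,
    "Babb1": -5.27,
    "Babb2": -0.77,
    "Babb3": 14.52,
}


def predict_f0_mean(age, production, coef=M2_FIXED, interactions=None):
    """Fixed part of the model: intercept, age terms, production offset, optional age slopes."""
    interactions = interactions or {}  # per-production age slopes (M3); empty for M2
    y = coef["constant"] + coef["age"] * age + coef["age_squared"] * age ** 2
    if production != "Voc":  # the reference category adds nothing to the intercept
        y += coef[production] + interactions.get(production, 0.0) * age
    return y


# At age 0 (in model units) the predictions differ only by the production coefficients.
for kind in ("Voc", "Grunt", "Babb1", "Babb2", "Babb3"):
    print(kind, round(predict_f0_mean(0.0, kind), 2))
```

Without interaction terms the curves for the five productions are parallel, so the gap between any two of them is the difference between their coefficients at every age.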

The same analyses were carried out for F0 mean, F0 range and F0 final contours, using the same scheme and the same hierarchical models.

F0 mean trajectories over time

The first analyses explored the changes in the F0 mean values over time and among the different types of production. As reported above, the basic model (M0) showed that there was variation among infants (random part); consequently, all the models included this variation among infants.

We added age, linear and squared, as fixed predictors (M1). As can be seen in Table 5, only the linear fixed effect of age was significant. F0 mean values tended to increase over time.

Model 2 (M2) added to M1 the type of production as a fixed predictor with vocalization as the reference category. Results confirmed the significant effect of age and showed that grunts have the lowest mean F0 values and variate babblings the highest, while vocalizations, simple and reduplicate babblings have similar values and are situated at an intermediate level. The lack of differences between mono- and bi-syllables (simple and reduplicated babblings) was also found by Robb et al. (1989).

Model 3 (M3) added the interaction term between age and type of production to M2. The effects of age, age squared, and the interaction between reduplicated babblings and age were significant. Nonetheless, the LL did not improve significantly, so Model 3 cannot be considered the best representation of F0 mean development, and Model 2 was chosen and represented in Figure 2. Figure 2 shows the distances between the curves, that is, the differences between the intercepts of each production. The form of the curve is due to the introduction of age and age squared in the model.
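The comparison between nested models can be illustrated with a likelihood-ratio check. The sketch below assumes that the LL row of Table 5 reports the change in deviance (-2 log-likelihood) between successive models, which the text does not state explicitly. Because it is unclear whether the relevant degrees of freedom are 4 (the four interaction terms added in M3) or the 10 printed in the table, both are evaluated. The function name is hypothetical, and the closed form used here holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Change in deviance listed for M3 in Table 5 (absolute value), under both candidate dfs.
p_df4 = chi2_sf_even_df(6.77, 4)    # about 0.15
p_df10 = chi2_sf_even_df(6.77, 10)  # about 0.75
print(round(p_df4, 3), round(p_df10, 3))
```

Under either reading the p-value is well above .05, which agrees with the statement that adding the interactions did not significantly improve the fit.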


Figure 2. F0 mean values (in Hz) of each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 5).

F0 range trajectories over time

The second set of analyses dealt with the changes in F0 range, measured in semitones, over time and among the types of production (see Table 6). The basic model (M0) gives 4.47 semitones as the mean value of the F0 range and shows significant variation among subjects and over time.

Table 6. Predictors of F0 range, multilevel results.

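| | M0 b | M0 s.e. | M1 b | M1 s.e. | M2 b | M2 s.e. | M3 b | M3 s.e. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed effects | | | | | | | | |
| Constant | 4.47 | 0.04 | 4.70 | 0.08 | 4.88 | 0.09 | 4.89 | 0.10 |
| Age | -- | -- | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.02 |
| Age² | -- | -- | -0.06 | 0.02 | -0.06 | 0.02 | -0.05 | 0.02 |
| Productions (base: vocalization) | | | | | | | | |
| Grunt | -- | -- | -- | -- | -0.45 | 0.07 | -0.47 | 0.12 |
| Babb-1 | -- | -- | -- | -- | -0.19 | 0.08 | 0.29 | 0.23 |
| Babb-2 | -- | -- | -- | -- | 0.77 | 0.12 | 1.01 | 0.33 |
| Babb-3 | -- | -- | -- | -- | 0.89 | 0.22 | -0.94 | 0.83 |
| Age*Grunt | -- | -- | -- | -- | -- | -- | 0.00 | 0.01 |
| Age*Babb-1 | -- | -- | -- | -- | -- | -- | -0.03 | 0.02 |
| Age*Babb-2 | -- | -- | -- | -- | -- | -- | -0.02 | 0.02 |
| Age*Babb-3 | -- | -- | -- | -- | -- | -- | 0.12 | 0.05 |
| Random effects (level 2) | | | | | | | | |
| u | 12.96 | 0.23 | 12.72 | 0.23 | 12.32 | 0.22 | 12.31 | 0.22 |
| Random effects (level 1) | | | | | | | | |
| e | 1.36 | 0.04 | 1.37 | 0.04 | 1.42 | 0.04 | 1.42 | 0.04 |
| Model fit statistics | | | | | | | | |
| LL | -- | -- | -96.6 | 2 df | -117.73 | 6 df | -11.75 | 10 df |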

In Model 1 (M1), the significant effect of age showed that F0 range increased over time. The addition of the type of production in Model 2 (M2), again with vocalization as the reference category, showed a significant effect of age, of age squared and of each type of production except reduplicate babbling. In general, the effect of age squared showed that F0 range increased very slightly during the first months but decreased later. More complex productions such as reduplicate and variate babblings tend to be produced, on average, with a wider F0 range than vocalizations. Grunts are the productions with the smallest F0 range, and simple babblings also showed a narrower F0 range than vocalizations. The difference between simple and reduplicate babblings contrasts with Robb et al. (1989), who reported a slightly higher F0 range for monosyllables than for bisyllables.

The inclusion of the interaction term in Model 3 (M3) revealed a significant effect of the interaction between age and variate babblings, but the improvement in LL was very small, so Model 2 was chosen as the best representation of F0 range development. Model 2 is represented in Figure 3.
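The comparison of nested models by the change in -2LL can be sketched as a likelihood-ratio test against a chi-square distribution. The snippet below is a minimal sketch, not the authors' procedure: it reads the LL entries of Table 6 as the decrease in -2LL and takes the listed degrees of freedom at face value (how those df were counted is an assumption), and it uses a closed-form expression for the chi-square upper tail that is valid only for even df. The function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Decrease in -2LL and df as listed for M2 and M3 in Table 6 (F0 range).
p_m2 = chi2_sf_even_df(117.73, 6)
p_m3 = chi2_sf_even_df(11.75, 10)

print(f"M2: p = {p_m2:.3g}")
print(f"M3: p = {p_m3:.3g}")
```

With these inputs the p-value for M2 is far below .05, whereas the p-value for M3 is well above it, which is in line with the choice of Model 2.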


Figure 3. F0 range values (in semitones) of each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 6).

F0 final contours trajectories over time

To explore the developmental trajectories of children’s ability to give F0 changes a specific direction, multilevel logistic analyses were run using as dependent variable the presence or absence of level, rising and falling F0 final contours in the production. Analyses were run on 9597 productions, excluding those for which the coding of the F0 final contour was unclear because the spectrogram was too noisy owing to interference or artifacts.

First, Model 0 showed significant variability between subjects in the production of level contours, with an average probability of 79% of producing a level contour (see Table 7). The addition of the fixed effects of age and age squared in Model 1 did not show significant effects. Model 2 showed significant effects of reduplicate and variate babblings, which are produced to a greater extent with level F0 contours. The probability of producing level contours did not vary over time. The interaction between age and type of production in Model 3 was not significant. Model 2 was chosen as the best representation of the presence of level F0 contours, and it is represented in Figure 4.
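The constants of the logistic models are on the logit scale, so the average probabilities quoted in the text follow from the inverse logit of the M0 constant. The sketch below applies this conversion to the M0 constants listed in Tables 7 to 9; the function and variable names are illustrative, and because the constants are rounded to two decimals the results can differ from the reported percentages in the last digit.

```python
import math


def inv_logit(eta):
    """Convert a value on the logit (log-odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# M0 constants as listed in Tables 7, 8 and 9 (rounded to two decimals).
m0_constants = {"level": 1.36, "rising": 0.35, "falling": 0.51}

probabilities = {name: inv_logit(c) for name, c in m0_constants.items()}

for name, p in probabilities.items():
    print(f"{name}: {p:.3f}")
```

The same conversion applies to any sum of coefficients on the logit scale, for example the M2 constant plus the coefficient of a given production type, provided the age terms are evaluated in the coding used by the model.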

Table 7. Predictors of presence of level F0 contour productions (binomial level vs no level).

Logistic Binomial Multilevel results.

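| | M0 b | M0 s.e. | M1 b | M1 s.e. | M2 b | M2 s.e. | M3 b | M3 s.e. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed effects | | | | | | | | |
| Constant | 1.36 | 0.05 | 1.58 | 0.16 | 1.72 | 0.17 | 1.74 | 0.19 |
| Age | -- | -- | 0.00 | 0.03 | -0.03 | 0.03 | -0.04 | 0.03 |
| Age² | -- | -- | -0.03 | 0.03 | -0.02 | 0.03 | 0.00 | 0.03 |
| Productions (base: vocalization) | | | | | | | | |
| Grunt | -- | -- | -- | -- | -0.21 | 0.12 | -0.23 | 0.23 |
| Babb-1 | -- | -- | -- | -- | 0.03 | 0.13 | 0.41 | 0.35 |
| Babb-2 | -- | -- | -- | -- | 1.23 | 0.21 | 0.93 | 0.58 |
| Babb-3 | -- | -- | -- | -- | 2.23 | 0.51 | 3.34 | 1.90 |
| Age*Grunt | -- | -- | -- | -- | -- | -- | 0.00 | 0.02 |
| Age*Babb-1 | -- | -- | -- | -- | -- | -- | -0.03 | 0.02 |
| Age*Babb-2 | -- | -- | -- | -- | -- | -- | 0.02 | 0.04 |
| Age*Babb-3 | -- | -- | -- | -- | -- | -- | -0.07 | 0.11 |
| Random effects (level 2) | | | | | | | | |
| u | 3.94 | 0.17 | 4.02 | 0.17 | 3.88 | 0.17 | 3.88 | 0.17 |
| Random effects (level 1) | | | | | | | | |
| e | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 |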

Figure 4. Presence of level F0 contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 7).

Concerning rising F0 final contours, Model 0 indicated significant variation among subjects and a probability of 59% of producing a rising contour within the production (see Table 8). Adding age and age squared in Model 1 did not reveal any significant effects. Model 2 showed a significant effect of grunts, reduplicated babblings and variate babblings. Children use rising final contours less often when producing grunts and more often when producing these more complex babblings. The probability of producing rising F0 final contours did not change over time. Model 3 did not reveal any significant effect of the interactions. Hence Model 2 was chosen and is represented in Figure 5.

Table 8. Predictors of the presence of rising F0 final contours in the production (binomial rising vs no rising).

Logistic Binomial Multilevel results.

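| | M0 b | M0 s.e. | M1 b | M1 s.e. | M2 b | M2 s.e. | M3 b | M3 s.e. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed effects | | | | | | | | |
| Constant | 0.35 | 0.04 | 0.53 | 0.13 | 0.68 | 0.13 | 0.68 | 0.15 |
| Age | -- | -- | 0.01 | 0.02 | -0.01 | 0.02 | -0.02 | 0.03 |
| Age² | -- | -- | -0.04 | 0.02 | -0.04 | 0.02 | -0.02 | 0.03 |
| Productions (base: vocalization) | | | | | | | | |
| Grunt | -- | -- | -- | -- | -0.28 | 0.09 | -0.28 | 0.19 |
| Babb-1 | -- | -- | -- | -- | -0.06 | 0.11 | 0.32 | 0.28 |
| Babb-2 | -- | -- | -- | -- | 0.77 | 0.15 | 1.01 | 0.41 |
| Babb-3 | -- | -- | -- | -- | 1.17 | 0.27 | 0.34 | 0.94 |
| Age*Grunt | -- | -- | -- | -- | -- | -- | 0.00 | 0.02 |
| Age*Babb-1 | -- | -- | -- | -- | -- | -- | -0.03 | 0.02 |
| Age*Babb-2 | -- | -- | -- | -- | -- | -- | -0.02 | 0.03 |
| Age*Babb-3 | -- | -- | -- | -- | -- | -- | 0.05 | 0.06 |
| Random effects (level 2) | | | | | | | | |
| u | 2.43 | 0.11 | 2.47 | 0.11 | 2.47 | 0.11 | 2.47 | 0.11 |
| Random effects (level 1) | | | | | | | | |
| e | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 |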

Figure 5. Presence of rising F0 final contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 8).

The same analyses were run for the dependent variable presence or absence of a falling F0 final contour (see Table 9). The basic model showed significant variability among subjects and a general probability of 62% of producing a falling final contour within the vocal production. Model 1 showed a significant effect of age squared. The presence of falling final contours is quite stable throughout the period considered. In Model 2, the effects of grunts, reduplicate babbling and variate babbling were significant (see Figure 6). Reduplicate and variate babblings were more likely to be produced with falling F0 final contours. Model 3 did not reveal any significant effect of the interactions between age and type of production on the presence or absence of a falling F0 final contour.

Table 9. Predictors of the presence of falling F0 final contours in the production (binomial falling vs no falling).

Logistic Binomial Multilevel results.

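| | M0 b | M0 s.e. | M1 b | M1 s.e. | M2 b | M2 s.e. | M3 b | M3 s.e. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fixed effects | | | | | | | | |
| Constant | 0.51 | 0.04 | 0.67 | 0.13 | 0.78 | 0.13 | 0.80 | 0.15 |
| Age | -- | -- | 0.02 | 0.02 | -0.01 | 0.02 | -0.01 | 0.03 |
| Age² | -- | -- | -0.06 | 0.02 | -0.04 | 0.02 | -0.04 | 0.03 |
| Productions (base: vocalization) | | | | | | | | |
| Grunt | -- | -- | -- | -- | -0.13 | 0.10 | -0.18 | 0.19 |
| Babb-1 | -- | -- | -- | -- | 0.11 | 0.11 | 0.09 | 0.28 |
| Babb-2 | -- | -- | -- | -- | 1.04 | 0.16 | 0.94 | 0.43 |
| Babb-3 | -- | -- | -- | -- | 1.8 | 0.32 | 2.19 | 1.13 |
| Age*Grunt | -- | -- | -- | -- | -- | -- | 0.01 | 0.02 |
| Age*Babb-1 | -- | -- | -- | -- | -- | -- | 0.00 | 0.02 |
| Age*Babb-2 | -- | -- | -- | -- | -- | -- | 0.01 | 0.03 |
| Age*Babb-3 | -- | -- | -- | -- | -- | -- | -0.02 | 0.07 |
| Random effects (level 2) | | | | | | | | |
| u | 2.53 | 0.11 | 2.58 | 0.11 | 2.57 | 0.12 | 2.57 | 0.12 |
| Random effects (level 1) | | | | | | | | |
| e | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 |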

Figure 6. Presence of falling F0 final contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 9).

Discussion

To contribute to the contrasting literature on the topic and to address several limitations of previous studies, the present study aimed to describe the development of F0-related prosodic features of the pre-lexical productions of Italian infants from the 4th to the 16th month of life.

One of the main findings is the presence of significant variability among children. From the first months of life, children differ in their use of the voice, not only in mean fundamental frequency but in all the variables examined, and these differences can be attributed to physiological differences in the shape and development of the larynx. This finding underlines the need to pay greater attention to individual variability when designing studies on infants and children. For example, the F0 mean of children across ages varies between 309 and 392 Hz, a difference of more than four semitones among children. This large individual variability may explain the different values found by previous studies conducted on small groups of participants. Moreover, the tendency not to consider and control this variability in the analysis may explain previous confusing or non-significant findings. The variability among our 15 children is wider than the range found by Laufer and Horii (1977) among their four children (317 to 342 Hz), and it contrasts with Flax et al. (1991), who did not find significant differences among their three children. We may suppose that a bigger sample gives a better picture of this variability and increases the chance of detecting it and, therefore, of controlling for it. A further longitudinal investigation of these differences at older ages would help to understand whether this variability is present only at this stage of physiological organization of the vocal tract, whether it persists later, or whether it increases with age to produce the well-known differences in tone of voice between individuals.
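The "more than four semitones" figure follows from the standard conversion of a frequency ratio to semitones, 12 × log2(f_high / f_low). A minimal sketch of that arithmetic, using the two Hz ranges quoted above (the function and variable names are illustrative):

```python
import math


def semitone_interval(f_low_hz, f_high_hz):
    """Interval between two frequencies in semitones: 12 * log2(f_high / f_low)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Range of F0 means across the 15 children of the present study.
present_study = semitone_interval(309.0, 392.0)

# Range reported for the four children of Laufer and Horii (1977).
laufer_horii = semitone_interval(317.0, 342.0)

print(f"present study: {present_study:.2f} semitones")
print(f"Laufer and Horii (1977): {laufer_horii:.2f} semitones")
```

Expressing both ranges on the semitone scale makes them directly comparable, since equal Hz differences correspond to different perceptual intervals at different base frequencies.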

Controlling for this variability, we were able to describe more accurately the developmental trajectories of all the variables examined. F0 mean tends to increase over time, by about 3.2 Hz per month. This slight increase contrasts with other previous studies but may confirm the slight increase found in studies with fewer subjects, such as Laufer and Horii (1977), who found a similar, but not statistically significant, variation. Other authors reported an increase in F0 mean values, but only from the 5th (Fairbanks, 1942) or the 10th month (Murry et al., 1983).

In the first two years of life, infants improve their interactive abilities, as shown by the longer duration of gazes and the greater number of gestures directed to the mother (Papaeliou & Trevarthen, 2006; Trevarthen, 1977). Accordingly, we may hypothesize that the voice is increasingly used to establish contact with the caregiver and to respond to the stimulation received. This greater participation in the interaction may correspond to an increase in activation state and, therefore, to an increase in F0 mean values. Moreover, previous authors showed that communicative, investigative and emotional pre-lexical productions have different prosodic features. Specifically, a reduced F0 mean is found in non-communicative productions uttered, for example, while exploring an object. In contrast, productions directed to the adult during face-to-face interactions, as well as productions associated with imperative gestures, are uttered with a higher F0 mean (Aureli et al., 2017; Papaeliou et al., 2002; Papaeliou & Trevarthen, 2006). Since we recorded mother-infant face-to-face interactions without objects, we may suppose that the coded infant vocal productions were mostly uttered with the aim of communicating with the mother and, therefore, with a higher F0 mean.

A different result was found for the developmental trajectories of F0 range. Infants tend to slightly increase the amplitude of the variations within their productions during the first months, whereas after the 7th month the F0 range of all the productions decreases by about one semitone every two months. This may mean that very young infants’ productions are less controlled and characterized by very large variations, whereas in the second half of the first year of life infants tend to control their voice better and to use prosodic variations that are more similar to adult vocal productions. Previous studies have found that infants tend to imitate the variations in their mothers’ vocal productions (see Gratier & Devouche, 2011, and Ko et al., 2016), and we may suppose that this ability becomes more efficient after the 7th month. The wider F0 range during the first months of life is consistent with the findings of Amano et al. (2006) on the prosodic development of three Japanese infants. However, these authors reported the decrease in F0 range only after the two-word period. In contrast, Snow (2004) reported an increase in F0 range from the first to the 4th year of age. The different age ranges considered make the findings very difficult to compare. We may suppose that our analyses, which focused on a shorter age range with more time points, were more effective in revealing variation not reported by other authors.

Concerning F0 contours, most of the vocal productions of infants can be considered flat. This finding differs from previous studies and may be linked to the fact that during face-to-face interaction infants are not required to produce vocalizations with a higher F0 range in order to, for example, attract the mother’s attention or ask for an object (see Aureli et al., 2017; D’Odorico & Franco, 1991 and Esteve-Gibert & Prieto, 2013). Coding the communicative function of each production would have provided more information in this regard. Similarly, the percentages of rising and falling final contours do not vary with age. Our findings are consistent neither with Snow (2006) nor with the hypothesis of Lieberman (1967). The significant effect of production type at every age may challenge the idea that, during the first months of life, prosody is predominantly physiologically controlled. Our results indicate that from the beginning of pre-lexical speech infants can use their prosodic competences, in line with previous studies showing that infants can differentiate accents to signal communicative intent (Prieto et al., 2012).

Our data collection did not last long enough to test Snow’s hypothesis (2006). The probabilities of producing rising and falling final contours both decrease slightly over time, but we did not find the U-shaped effect reported by Snow. If this is simply due to the age range explored in our study, we may suppose that the probability of producing these final contours increases again later, after the 16th month of life.

The main finding of the present study is the significant effect of the type of vocal production across all the prosodic variables examined. The effect indicates that the production types considered have different prosodic characteristics and that these differences last throughout the age range considered.

Grunts are confirmed to be the least communicative productions: they are uttered with the lowest F0 mean and F0 range values compared with the other productions. Moreover, it is when producing grunts and vocalizations that infants seem least able to use rising and falling final contours. These productions are the most frequent throughout the age period considered but turned out to have fewer communicative intonation features. High F0 mean, wide F0 range, and rising and falling contours may represent an index of activation and excitement, and our results show that these prosodic features are more frequent in the productions that are used with a more specific communicative meaning, such as simple and variate babblings (Fasolo et al., 2008).

Some limitations of the present study have to be addressed. First, we measured F0 range in semitones, whereas many previous studies used the SD of the F0 mean; this makes the results less comparable. Moreover, we did not consider the different communicative intentions of the vocal productions (e.g., whether they were produced in response to maternal speech, to engage maternal attention, to complain about a need, etc.). This is another important variable that needs to be taken into consideration.

Implications for future studies

The present study illustrates the growing need to take individual variability into consideration when assessing infant and child prosody. The findings suggest that in the second half of the first year of life infants show an intonational repertoire that may contribute to regulating interaction with their partner, which is considered one of the major prerequisites for language acquisition (Papaeliou et al., 2002). The use of multilevel models was very useful to this end. Applying statistical analyses that are able to explore and control individual variability becomes even more essential when studying child development. Further longitudinal studies may explore whether this variability among children is stable over time or whether it decreases or increases. This would make it possible to know at which ages it needs to be controlled.

Another implication concerns the need to explore linguistic and prosodic development even at the pre-linguistic stage. The prosody of speech has to be studied taking into account the type of production analyzed. This, together with the coding of the communicative intention of the production, would make a valuable contribution to the study of infants’ prosodic development.

Ethics and consent

The study was approved by the ethical committee of the Department of Neuroscience Imaging and Clinical Science of the University of Chieti-Pescara (Ethical approval number: DNISC2962, 06.11.2019) and was conducted according to the American Psychological Association guidelines in accordance with the 1964 Helsinki Declaration.

Written informed consent (approved by the ethical committee) for participation in the study was obtained from the mothers.

how to cite this article
D'Aloia V, Zanchi P, Logrieco MGM et al. The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.12688/f1000research.154114.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 23 Aug 2025
Nicolas Audibert, Université Sorbonne Nouvelle and CNRS, Paris, France 
Approved with Reservations
The study presented in this article is based on longitudinal data collected from Italian infants between the ages of 4 and 16 months, as well as the segmentation and categorization of this data into types of productions. The authors propose ...
HOW TO CITE THIS REPORT
Audibert N. Reviewer Report For: The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.5256/f1000research.169103.r400506)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 23 Aug 2025
Marilyn Vihman, University of California, Berkeley, Berkeley, USA;  Language and Linguistic Science, University of York, York, UK 
Not Approved
This study follows 15 (monolingual Italian) infants over 12 months (from 4 to 16 months); such a relatively large longitudinal study should, in principle, constitute an important contribution to the literature on prosodic development. Unfortunately, the study is limited in ...
HOW TO CITE THIS REPORT
Vihman M. Reviewer Report For: The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.5256/f1000research.169103.r400502)
Reviewer Report 01 Oct 2024
Sonia Frota, Center of Linguistics of the University of Lisbon, Lisbon, Portugal 
Approved with Reservations
This study examined the development of F0-related variables in Italian-learning infants from 4 to 16 months of age. It certainly addresses a relevant topic, offering additional evidence and a contribution to the scarce literature on early prosodic development. However, the ...
HOW TO CITE THIS REPORT
Frota S. Reviewer Report For: The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.5256/f1000research.169103.r318803)