The developing of prosody in infants: a longitudinal study over the first 16 months of infant life

Valeria D'Aloia; Paola Zanchi; Maria Grazia Mada Logrieco; Ilenia Passaquindici; Riccardo Palumbo; Francesca Lionetti; Maria Spinelli; Mirco Fasolo

doi:10.12688/f1000research.154114.1

Research Article

[version 1; peer review: 2 approved with reservations, 1 not approved]

Valeria D'Aloia¹, Paola Zanchi², Maria Grazia Mada Logrieco³, Ilenia Passaquindici¹, Riccardo Palumbo¹, Francesca Lionetti¹, Maria Spinelli¹, Mirco Fasolo¹

PUBLISHED 06 Aug 2024

¹ Gabriele d'Annunzio University of Chieti and Pescara Department of Neuroscience and Imaging and Clinical Sciences, Chieti, Abruzzo, Italy
² Catholic University of the Sacred Heart Department of Psychology, Milan, Lombardy, Italy
³ University of Foggia Department of Humanities Arts Cultural Heritage Education Sciences, Foggia, Apulia, Italy

Valeria D'Aloia
Roles: Conceptualization, Writing – Original Draft Preparation

Paola Zanchi
Roles: Conceptualization, Formal Analysis, Writing – Original Draft Preparation

Maria Grazia Mada Logrieco
Roles: Methodology, Resources

Ilenia Passaquindici
Roles: Data Curation, Validation

Riccardo Palumbo
Roles: Resources, Writing – Review & Editing

Francesca Lionetti
Roles: Methodology, Writing – Review & Editing

Maria Spinelli
Roles: Conceptualization, Project Administration, Supervision

Mirco Fasolo
Roles: Conceptualization, Project Administration, Supervision

Abstract

Background

The acquisition and development of prosodic aspects of vocal intonation are of special interest within the larger context of language acquisition.

Method

The present study explored the developmental trajectories of infant prosodic abilities from 4 to 16 months of life through closely spaced assessments. Several aspects were considered: an acoustic analysis of infant vocal productions with dedicated software, the analysis of the prosodic variables associated with the fundamental frequency (F0 mean, F0 range, and F0 final contours), individual variability, and the complexity of the infants’ vocal productions.

Results

The multi-level analysis revealed specific prosodic developmental trajectories that differ across the different kinds of vocal production from the first months of life.

Conclusions

The findings suggest that in the second half of the first year of life infants show an intonational repertoire that may help manage interactions with their caregiver, and that individual variability has to be taken into consideration when assessing infants’ prosody.

Keywords

Prosody development, Fundamental Frequency, Multi-level analysis, Babbling, Infants

Corresponding author: Valeria D'Aloia

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2024 D'Aloia V et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: D'Aloia V, Zanchi P, Logrieco MGM et al. The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.12688/f1000research.154114.1) First published: 06 Aug 2024, 13:896 (https://doi.org/10.12688/f1000research.154114.1) Latest published: 06 Aug 2024, 13:896 (https://doi.org/10.12688/f1000research.154114.1)

Introduction

Infants’ early vocal productions form the basis for language development and are important social signals that stimulate caregiver response, which in turn facilitates the development of phonology and speech (Goldstein et al., 2009; Spinelli et al., 2017). Research on pre-linguistic vocalizations of typically developing infants has focused mostly on the segmental and syllabic aspects of spontaneous vocal productions. Fewer scholars have focused on supra-segmental aspects of infant vocalizations, including variables that measure prosodic aspects of vocal intonation such as those related to pitch, usually measured through the fundamental frequency (F0). These prosodic features of language convey grammatical and pragmatic meaning as well as emotional or affective intent, and highlight particular elements of the utterance (Crystal, 1978; Gleitman et al., 1988; Wells et al., 2004). The development of prosody is of special interest within the larger context of language acquisition because it is among the earliest aspects of speech to be acquired (Esteve-Gibert & Prieto, 2018) and is closely linked to the maturity of the vocal tract (Kahane & Kahn, 1984). It is well known that infants are sensitive to prosodic information from a very early age due to their prenatal exposure to speech (Gervain, 2015). They are therefore able to understand the prosodic features of parental speech from the first months of life (De Carvalho et al., 2019; Fisher & Tokura, 1996; Soderstrom et al., 2008) and can imitate the prosody of parental speech, for instance by adapting their F0 features to match those of the parents (Gratier & Devouche, 2011; Ko et al., 2016; McRoberts & Best, 1997; Papoušek & Papoušek, 1989).

At birth, the infant’s vocal tract is not completely developed. Consequently, infant vocal expressions have different sounds and different melodies than adult ones (Astruc et al., 2013). Some authors reported a much narrower pitch range in children than in adults, and that children acquire falling tones first and final rising tones much later (Snow, 2004; Wells et al., 2004). The differences between infants and adults observed in the first stages of phonetic development have been linked to difficulties in controlling the vocal tract. Specifically, Lieberman (1967) assumed that infants do not precisely control the tension of their laryngeal muscles once phonation starts. The child merely maintains the tension of the laryngeal muscles at or near the level it had when phonation started, so that F0 gradually rises from the start of phonation to either a level or a slightly falling “plateau”. For this reason, the majority of breath-groups terminate with a falling fundamental frequency contour. However, cross-linguistic differences have been reported, for example the presence of rising contours in infancy in languages other than English (Esteve-Gibert & Prieto, 2013; Prieto et al., 2012; Whalen et al., 1991).

To our knowledge, only a few studies have examined the prosodic characteristics of infants’ first vocal productions. The first was conducted by D’Odorico (1984), who acoustically analysed the cry and non-cry vocalizations produced by four Italian infants. The results showed that cry vocalizations produced in different contexts are acoustically different and that there are similarities in the acoustic properties of cry and non-cry vocalizations produced in the same context. These results were later confirmed by D’Odorico and Franco (1991), who investigated the suprasegmental features of the vocalizations produced during social interaction by five Italian children from 4 to 11 months of age: different patterns of non-segmental features were found in sounds produced in different contexts.

Considering the scarce data on Italian and the importance of cross-linguistic comparison in language development, the main aim of this study is to explore how Italian infants develop intonational abilities in the first 16 months of life, focusing on a detailed analysis of the infants’ pitch in terms of F0 mean, F0 range and F0 final contours. Moreover, unlike other studies on this topic, which compared infants’ prosodic development to a mature adult model, the present study aims to give a broad picture of the longitudinal development of the intonational repertoire of children’s spontaneous productions from 4 to 16 months of age.

Prosodic development over infancy and toddlerhood: a contrasting picture

A summary of the main studies available on intonation development during the first two years of life is reported in Table 1, available in the Open Science Framework repository as extended data; doi: https://doi.org/10.17605/OSF.IO/738W2. The main focus is on F0-related aspects (F0 mean, F0 range and F0 final contours) of non-distress spontaneous productions, excluding studies that examined distress productions such as cry and whining (Mampe et al., 2009; Rothgänger, 2003; Zanchi et al., 2016b).

Among the F0-related prosodic variables, the most widely studied is the mean fundamental frequency (F0 mean), which represents the rate of vibration of the vocal cords within the larynx and reflects pitch variations of the voice. One of the first reviews of developmental changes in F0 mean from birth to adulthood, by Kent (1976), showed that F0 mean is highest at birth and gradually decreases until it reaches adult levels. However, the studies included in the review were very few. Some of the studies reported in Table 1 (refer to extended data in the Open Science Framework repository) showed the same descending pattern from birth to the second year of life for F0 mean (Amano et al., 2006; Robb & Saxman, 1985; Rothgänger, 2003). Flax, Lahey, Harris, and Boothroyd (1991) found that 2 out of 3 children showed a reduction in F0 mean between the onset of words and the 50-word period. This decrease is explained as reflecting the gradual maturation of the vocal tract and the development of the infant’s ability to control the voice.

Nonetheless, other scholars failed to find significant changes in F0 mean over time (Iyer & Oller, 2008; Robb et al., 1989). Laufer and Horii (1977) measured infants’ productions every two weeks during the first six months of life and found that F0 mean values slightly fluctuated. Amano et al. (2006) argued that the main reason why some studies did not find a decrease in the F0 mean values was that speech samples were collected over periods that were too short to catch this effect. Lastly, there is only one author to our knowledge who found the opposite pattern. Fairbanks (1942) reported an increase during the first five months of infant life, followed by stabilization.

Even more contrasting findings have been presented about the developmental trajectories of F0 range, namely the difference between the maximum and the minimum pitch produced within the utterance. F0 range represents the ability of infants to vary their intonation and to make the production more communicative and attractive for the listener (Amano et al., 2006). Amano et al. (2006) found an increase in F0 range after the onset of two-word utterances and hypothesized that, as infants grow up, they acquire the ability to vary the fundamental frequency of the voice within a production in line with the increase in their communicative abilities. Similarly, Snow (2004) found that 4-year-old children showed a wider F0 range than 1-year-old infants. Nonetheless, Snow and Ertmer (2012) reported that, in their sample, 10 out of 12 typically developing infants showed a decrease in F0 range between 3 and 9 months. In a first study, Robb and Saxman (1985) found a decrease in the between-utterance F0 range between 11 and 25 months, while in a second study (Robb et al., 1989) they failed to find changes over time. Furthermore, Laufer and Horii (1977) found a slight decrease in within-utterance F0 range during the first four months of life and a minor increase after the 4th month. To sum up, the developmental trajectory of the within-production F0 range is unclear.

Another prosodic variable that plays an important role in prosodic development is the F0 contour, understood as the shape of pitch (F0) variations within the utterance, which gives a specific melody to the vocal production. The F0 contour may increase from the beginning to the end of the production (rising F0 contour), decrease (falling F0 contour), or not vary significantly (level or flat F0 contour). The direction of these contours, especially of the F0 final contours, is considered fundamental in conveying the pragmatic meaning of the vocal production. In most languages, falling final contours are typical of statements and labelling, while rising final contours are used especially with interrogative utterances (for Italian see, for example, Sorianello, 2021). Many scholars agree that infants start to use F0 final contours early to express intentions. For example, Prieto et al. (2012), Prieto and Vanrell (2007) and Esteve-Gibert and Prieto (2013) showed that from the first year of life Catalan and Spanish infants, similarly to adults, are able to vary F0 final contours in order to signal pragmatic meanings, even before they can produce words. Several studies, mostly conducted in English-speaking countries, agree that the majority of infants’ productions have falling contours (Fox, 1990), while rising contours are rare (see, for example, Kent & Murray, 1982, and Robb et al., 1989) and more frequent in adults (Cruttenden, 1997) and preschoolers (Snow, 2004). As stated above, according to Lieberman (1967), the falling contour is considered more natural and simpler than the rising contour and does not imply intentionality on the infant’s part. According to Lieberman’s hypothesis, rising patterns, being contingent on language experience rather than on physiological constraints, stabilize at a later stage of language acquisition. More recently, Snow (2006) found that the development of falling and rising patterns is not linear and follows a U-shaped trajectory. Both falling and rising contours are frequent and well expressed until nine months, when their quality decreases. After a period of stabilization, at around 18 months both contours regain the quality observed before the regression period. The author explained this result by pointing out that intonation is controlled by physiology in the earliest stage (before nine months of age), whereas later the tones come under linguistic control. The regression could therefore be due to a linguistic reorganization of speech, and the U shape reflects a shift of intonation from a pre-intentional to an intentional stage. Despite these two main theoretical explanations, empirical studies that examined the frequency of F0 final contours over time found inconsistent results. Many of them (see, for example, Flax et al., 1991 and Murry et al., 1983) failed to find variations in the percentages of rising and falling contours over time. Fox (1990) found that falling final contours consistently accounted for 82% of productions between the 3rd and the 9th month. Robb et al. (1989), in turn, found that the least frequent F0 final contours were falling-rising and rising contours (comprising 6% of all vocalizations), and that this percentage was constant, independently of lexicon size, throughout the first two years.

Suggestions for a better comprehension of the phenomena

To sum up, the studies on the development of F0 features over the first years of life failed to give a clear and homogeneous picture, mainly because comparing these studies is difficult. First, the languages studied differ, which could lead to different intonation features and developmental trajectories. Second, there are wide differences in the age ranges considered, in the number of longitudinal sessions, and in many other methodological aspects: for example, the types of infant vocal production included in the analyses (whether cry-like or squeal-like vocalizations were included or only syllabic-like vocalizations), the tools used for the analysis of speech spectrograms (older studies measured F0 values by visual inspection of the spectrograms and are therefore less reliable than modern analyses run with dedicated programs), and the study design (with longitudinal and cross-sectional analyses leading to different findings due to the variability among participants).

Moreover, an important issue raised by some of the reported studies is the presence of variability among infants, which affects both F0 values and the age-related changes that occur in infants’ voices. Laufer and Horii (1977) described the F0 mean as fluctuating from month to month, with each infant presenting a specific pattern of change. Flax et al. (1991) found that each of the three infants studied showed a different pattern of change in F0 range, with one child not varying at all in this period. Kent and Murray (1982) found a prevalence of F0 falling contours across the ages considered (3, 6 and 9 months), but pointed out high intra- and inter-individual variability in the production of rising contours at one point in time as well as over time. Other authors agreed that differences among infants are present (Amano et al., 2006), but the restricted number of participants or sessions made it difficult to test this hypothesis. This has implications for the design of the study, since a greater number of participants needs to be studied, and for the analyses, since between-participant variance has to be taken into account.

Moreover, the reported studies mainly considered either all the productions together or only one specific type of production, without making comparisons among them (see Table 1 in the extended data in the Open Science Framework repository). Robb et al. (1989) recorded on 12 successive occasions the utterances of seven infants in the 8–26-month age period, investigating the F0 mean of monosyllabic and bi-syllabic utterances. They found similar F0 mean values for the two types of production but a tendency for monosyllables to have a greater F0 range than bi-syllables for all the participants except one child. Moreover, this tendency remained stable across the first two years of life. In contrast, Snow (2004) did not find an effect of the number of syllables of the production on F0 range, thus showing that the range is independent of the length of the production. However, these studies concerned the number of syllables of utterances, not developmentally different productions. Rothgänger (2003) explored the prosodic development of babbling by comparing it with the development of cry rather than with other non-distress productions, so information about possible differences among the productions is lacking. Other studies confirmed the need to consider the different productions separately, showing differences in the developmental trajectories of the prosodic features of vocalizations and syllabic utterances (Hsu et al., 2000) and of different multiword combinations (Behrens & Gut, 2005). These results are vague and confirm that a more detailed description of the phenomena is needed.

The present study

Some authors have hypothesized the existence of linguistic trade-offs during development, so that increased demands in one component of language, such as syntax, may cause decreased performance in a second component, such as phonology (Crystal, 1978). Furthermore, the interrelationships among different components of language may vary depending on how recently a specific linguistic structure has been learned (Crystal, 1978; Masterson & Kamhi, 1992; Zanchi et al., 2016a). We believe that at early ages this effect may also manifest in relation to the different pre-lexical productions of infants. Therefore, we hypothesize that the pre-lexical productions acquired earlier, and consequently more practiced at the oral-motor level (Oller et al., 1976; Stoel-Gammon, 2011), may have prosodic features different from those of later acquired productions, in which the infant is not yet fluent.

To assess this hypothesis, the main aim of the present study is to explore the development of F0 related prosodic features of each type of pre-lexical productions observed. To our knowledge, this is the main factor not dealt with so far. As reported above, only differences among the prosodic development of cry and other non-distress productions (Murry et al., 1983) or among productions with a different number of syllables (Robb et al., 1989) have been analyzed.

The second aim of the present study is to explore individual variability in the prosodic features of speech; accordingly, all analyses were run as multi-level analyses with children as the second level.

We expected to find:

• Different trajectories of the prosodic features of each kind of production considered, with the productions earlier acquired showing different prosodic features than productions later acquired.
• Significant individual variability among children.

Methods

Participants

Fifteen infants (3 females) participated in the study. The sample was not gender-balanced. However, previous studies showed that gender differences in the prosodic aspects of language are present only from late puberty (Bennett, 1983; Fox, 1990; Lee et al., 1999).

All infants were healthy and born at full term. Families were monolingual Italian-speaking, and mothers’ mean age was about 35 years (range: 28 – 42). Sixty percent of the mothers had completed high school education, and 40% had a university degree.

Procedure

Mothers were contacted after the infant’s birth, and the first meeting was arranged at the beginning of the 4th month (M age = 4 months, 2 days; SD = 0:05). This age was chosen because around 4-6 months significant changes occur in the anatomical-physiological structure of the infant’s vocal tract, strongly increasing the control of speech articulation and enhancing the production of speech-like sounds (Kent, 1976).

Infants were followed from the 4th to the 16th month of age. From the 4th to the 14th month a researcher visited the infant and the mother at home every 15 days (twice a month); after the 14th month, visits were monthly. See Table 2 for a summary of the sessions recorded for each participant. Some infants missed some of the sessions, and not all infants were followed up to the 16th month, since some mothers interrupted their participation in the study for personal reasons. Mother and infant were audio- and video-recorded during free-play face-to-face interaction without toys. The mothers were asked to play as they normally did. In total, 295 sessions (M per subject = 20, SD = 5) of about 10 minutes each (M = 10.06 minutes, SD = 1.99) were collected.

Table 2. Number of pre-lexical productions by child and session (m = missing session).

		Children
Month	n	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	Total
4a	1	44	57	17	31	m	m	46	17	24	14	21	m	m	m	m	271
4b	2	57	28	11	72	64	71	44	4	29	6	54	2	6	m	82	530
5a	3	64	12	21	29	38	72	47	23	21	14	16	32	m	m	19	408
5b	4	81	46	29	54	65	42	23	6	39	12	30	76	21	61	24	609
6a	5	27	125	1	50	31	32	51	23	44	6	28	7	73	m	86	584
6b	6	112	76	6	29	41	60	3	28	43	7	41	27	26	m	22	521
7a	7	48	26	7	40	91	33	20	9	12	9	37	37	6	94	34	503
7b	8	61	87	19	102	51	40	m	m	16	6	19	7	20	52	43	523
8a	9	81	103	10	41	18	43	17	29	19	14	28	30	57	41	58	589
8b	10	77	29	7	86	31	43	m	14	18	6	21	61	20	35	22	470
9a	11	68	53	18	66	35	42	m	6	16	19	8	32	78	41	102	584
9b	12	36	63	25	38	28	36	m	m	24	9	26	52	10	15	100	462
10a	13	49	59	19	31	22	37	m	18	8	11	6	32	24	16	102	434
10b	14	62	58	33	36	27	23	77	m	29	13	23	21	14	14	31	461
11a	15	72	45	7	38	48	49	21	m	19	17	28	50	m	22	54	470
11b	16	33	88	9	37	24	47	47	m	40	5	3	24	m	m	11	368
12a	17	28	57	20	34	53	58	19	m	12	24	14	42	19	m	8	388
12b	18	32	32	5	55	21	63	20	m	20	22	6	16	m	m	m	292
13a	19	17	30	7	39	15	27	m	m	14	m	22	8	48	m	m	227
13b	20	29	11	16	66	19	21	m	m	m	m	9	14	m	m	m	185
14a	21	17	39	14	20	42	14	12	m	m	18	43	m	34	m	m	253
14b	22	28	27	13	55	20	34	m	m	12	m	1	m	21	m	m	211
15	23	26	28	10	40	21	29	m	m	m	27	44	16	34	m	m	275
16	24	m	m	m	17	12	25	20	m	14	11	11	9	m	m	m	119
Total		1149	1179	324	1106	817	941	467	177	473	270	539	595	511	391	798	9737
Total m		1	1	1	0	1	1	9	13	3	3	0	3	7	14	8	65

Coding: pre-lexical productions

The audio of all the recorded sessions was extracted using Audacity (Audacity Team; available for download at https://www.audacityteam.org/), and all the audible pre-lexical infant productions that were noise-free and did not overlap with other sounds were coded. In line with previous studies, vegetative sounds (such as wheezes, sneezes, coughs, hiccups, and clicking sounds), distress vocalizations (such as whimpering, fusses, and cries), laughs, words and onomatopoeias were not considered (Fasolo et al., 2010).

Infants’ vocal productions (Stark et al., 1993) were coded as:

• Communicative grunt (g): vocalization constituted by a consonant-like sound (e.g., [m]);
• Vocalic sound (v): vocalization constituted by vowel-like sounds (e.g., [a]);
• Simple babbling (cv): vocalization containing at least one full vowel-like element and one consonant-like element with rapid transition between consonant and vowel (e.g., [ba]);
• Reduplicated babbling (cvcv): vocalization containing rapid repetition of the same sequence of one full vowel-like element and one consonant-like element (e.g., [baba], [tata]);
• Variegated babbling (c1vc2v or cv1cv2): vocalization containing rapid repetition of different sequences; vocalizations comprising at least one full vowel-like element and at least two different consonant-like elements (e.g., [bata]), or two different full vowel-like elements and one consonant-like element (e.g., [beba]).

In total 9737 productions were coded (M per session = 33, SD = 23.38; M for infant = 645, SD = 329.40).

In all the tables and figures of the present paper the pre-linguistic productions are indicated as follows: Grunt = Communicative grunts, Voc = Vocalizations, Babb1 = Simple babbling, Babb2 = Reduplicated babbling, Babb3 = Variegated babbling. Table 2 summarizes the total productions for each subject at each age.

Coding: prosody

The PRAAT speech analysis software package (Paul Boersma and David Weenink, Institute of Phonetic Sciences, University of Amsterdam, The Netherlands; Boersma & Weenink, 2005) was used to obtain the prosodic characteristics of each vocal production using the visual inspection of the sound wave represented in the spectrogram to identify the beginning and the ending of the production (D’Odorico et al., 2009). The following measures were calculated on every single production:

- Fundamental frequency mean (F0 mean): calculated automatically in Hz by the PRAAT program.
- Maximum and minimum pitch: the highest and lowest F0 values in the vocal production (Cruttenden, 1997), calculated in Hz.
- Fundamental frequency range (F0 range): the span of F0 changes over the entire pre-lexical production. Following the definition of Snow and Balog (2002), it was calculated as the logarithmic difference between the highest and the lowest F0 values in a production, measured in semitones: [12/log(2)] × [log(maximum F0) − log(minimum F0)], which is equivalent to 12 × log2(maximum F0 / minimum F0).
- F0 final contour: the last movement of the intonation profile of the production. Each change of F0 values within the production was classified as either rising (F0 final rising contour) or falling (F0 final falling contour) if the pitch changed (difference between the minimum and the maximum F0 value) by at least two semitones. If the F0 range of the whole production was less than two semitones, the contour was classified as an F0 level contour.
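The semitone conversion and the two-semitone threshold described in the list above can be illustrated with a minimal Python sketch. The function names, and the idea of summarizing the last movement by its starting and ending F0 values, are assumptions made for this example; they do not reproduce the PRAAT procedure used in the study.

```python
import math

SEMITONE_THRESHOLD = 2.0  # minimum change, in semitones, for a rising or falling contour


def f0_range_semitones(f0_max_hz, f0_min_hz):
    """Distance between two F0 values in semitones: 12 * log2(max / min)."""
    if f0_max_hz <= 0 or f0_min_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


def classify_final_contour(start_hz, end_hz):
    """Label the last F0 movement (hypothetical helper).

    Assumes the movement is summarized by its starting and ending F0.
    """
    # Signed change: positive when F0 goes up, negative when it goes down.
    change = f0_range_semitones(end_hz, start_hz)
    if change >= SEMITONE_THRESHOLD:
        return "rising"
    if change <= -SEMITONE_THRESHOLD:
        return "falling"
    return "level"
```

For instance, a production whose F0 spans 200 to 400 Hz covers one octave, that is, 12 semitones.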

Reliability

Inter-coder reliability between two trained coders was assessed on 20% of the observation sessions, randomly selected from each age point. Cohen’s kappa (K) coefficient was calculated to assess the accuracy of the coding of vocal productions; the resulting value was .93, which is more than adequate. Concerning the prosodic variables, there were strong correlations between the coders (Pearson’s r) on the calculations of F0 mean (r = .94), highest pitch (r = .87), and lowest pitch (r = .75). Cohen’s K for the classification of edge F0 final contours was .83.
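As a reminder of how Cohen's kappa corrects raw agreement for chance, a small Python sketch follows. The toy labels and the function name are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions per category.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both coders used one single category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy data (hypothetical): four productions coded with the labels g, v and cv.
coder_1 = ["g", "v", "v", "cv"]
coder_2 = ["g", "v", "cv", "cv"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy case the coders agree on three of four items (observed agreement .75), but kappa is lower because part of that agreement is expected by chance.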

Fit lines computation

The fit lines presented in the graph of Figure 1 were computed using a Kernel smoothing method. Specifically, we employed a non-parametric regression technique where the kernel function weights the observations within a neighborhood around each point of interest. The bandwidth was chosen to include 50% of the data points, ensuring a balanced trade-off between bias and variance. This approach, as stated by Wand and Jones (1995), and by Wasserman (2006), allowed for flexible modeling of the underlying relationships between variables without assuming a specific parametric form.
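A minimal sketch of this kind of smoother is given below in Python. It is a Nadaraya-Watson estimator with a nearest-neighbour bandwidth covering 50% of the points. The Epanechnikov kernel and the function name are assumptions, since the text does not name the kernel function or the software used for the fit lines.

```python
import math


def kernel_smooth(x, y, span=0.5):
    """Nadaraya-Watson smoother with a nearest-neighbour bandwidth.

    For each x[i], the bandwidth is the distance to the k-th nearest
    observation, where k covers `span` of the data (0.5 = 50%).
    An Epanechnikov kernel is assumed here.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    k = min(n, max(2, math.ceil(span * n)))
    fitted = []
    for x0 in x:
        distances = [abs(xi - x0) for xi in x]
        h = sorted(distances)[k - 1]  # local bandwidth
        if h == 0:
            # All k nearest points coincide with x0: average those points.
            weights = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            weights = [max(0.0, 1.0 - (d / h) ** 2) for d in distances]
        total = sum(weights)  # always > 0 because x0 itself has weight 1
        fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
    return fitted
```

A wider span gives a smoother line with more bias, while a narrower span follows the data more closely at the cost of higher variance.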

Figure 1. Mean relative frequency of the type of productions from 4 to 16 months of age.

Results

Descriptive analysis

The total frequency of each type of vocal production and its percentage on all the productions is given in Table 3.

Table 3. Frequencies and percentages of each vocal production.

Type of production	Frequency	% of total
Grunt	2523	26.1
Vocalization	4955	51.2
Simple Babbling	1451	15.0
Reduplicated Babbling	592	6.1
Variegated Babbling	155	1.6
Total	9676	100

Figure 1 reports the mean relative frequencies of each type of production, aggregated within participants, at different ages. The graph shows that vocalizations were the most frequent productions throughout the period, with a consistent decrease over the first ten months of life. Grunts were very common during the first seven months of life, but their frequency decreased over time until their use became very sporadic. All three types of babbling started to appear between the 5th and the 6th month, and their use increased over time, with a predominance of canonical babbling throughout the age period considered.

Between subjects variability exploration

To explore the presence of between-subjects variability, a linear regression was conducted with age, age squared, and infants (indicators) as predictors on the dependent variable F0 mean. The results were statistically significant, R² = .102, indicating that infants and age have an effect on the F0 mean. Coefficients reported in Table 4 show that infants had significantly different coefficients. This confirms the presence of differences among the infants and that these differences should be taken into consideration.

Table 4. Linear regression on F0 mean by children (dummies) and age.

	b	S.E	t	p
Age	1.3	.48	2.6	<.01
Age²	-0.7	.52	-1.4	.15
Intercept	345.7	3.2	104.1	<.01
Child 2	46.9	3.2	14.6	<.01
Child 3	-.1	4.9	.0	.98
Child 4	2.1	3.2	.6	.52
Child 5	3.6	3.5	1.0	.30
Child 6	23.3	3.4	6.8	<.01
Child 7	-1.4	4.2	-.3	.75
Child 8	1.4	6.3	.2	.82
Child 9	46.2	4.2	10.8	<.01
Child 10	37.8	5.2	7.2	<.01
Child 11	-36.7	4.0	-9.0	<.01
Child 12	41.4	3.9	10.4	<.01
Child 13	5.7	4.1	1.4	.17
Child 14	1.9	4.6	.3	.67
Child 15	-35.1	3.6	-10.1	<.01

There are indeed some strong arguments to use multilevel techniques in the analysis of pitch. In this view the infants form a random factor (level 2) and each set of observations (level 1) is nested within each child. All multilevel models were tested with MLwiN 2.33 (Rasbash et al., 2005).

Different multilevel models were investigated. The basic model, the unconditional model with no predictors included in the equation (M0 in Table 5), indicated a significant inter-subject variability in the F0 mean (see Table 5); this represents a reason to carry out all the subsequent analyses with the multilevel software. The F0 mean value across subjects and across time was 366 Hz (Table 5), and it is in line with previous studies (see, for example, Amano et al., 2006).

Table 5. Predictors of F0 mean, multilevel results.

	M0		M1		M2		M3
	b	s.e.	b	s.e.	b	s.e.	b	s.e.
Fixed Effects
Constant	366.32	0.89	354.59	2.63	361.90	2.76	360.58	3.13
Age	--	--	1.60	0.49	1.40	0.49	1.62	0.56
Age²	--	--	-0.93	0.51	-0.96	0.51	-1.13	0.58
Productions; base vocalization
Grunt	--	--	--	--	-16.48	2.04	-13.21	3.85
Babb-1	--	--	--	--	-5.27	2.52	-0.36	6.83
Babb-2	--	--	--	--	-0.77	3.59	-19.41	9.71
Babb-3	--	--	--	--	14.52	6.70	24.84	23.20
Age*Grunt	--	--	--	--	--	--	-0.38	0.37
Age*Babb-1	--	--	--	--	--	--	-0.35	0.45
Age*Babb-2	--	--	--	--	--	--	1.31	0.65
Age*Babb-3	--	--	--	--	--	--	-0.69	1.43
Random effects (level 2)
u	2081.83	137.62	2069.24	137.62	2007.62	136.07	2005.86	136.07
Random effects (level 1)
e	4573.41	130.36	4561.20	130.15	4564.56	129.50	4561.65	129.55
Model Fit statistics
LL	--	--	-34.8	2 df	-74.19	6 df	-6.77	10 df
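
One way to quantify the inter-subject variability in the unconditional model is the intraclass correlation, the share of total variance located at the child level. The sketch below is not part of the original analysis; it assumes that the `u` and `e` entries of M0 in Table 5 are variance estimates (not standard deviations), and the function name is hypothetical.

```python
def intraclass_correlation(var_between, var_within):
    """Share of the total variance that lies between children (level 2)."""
    return var_between / (var_between + var_within)


# M0 in Table 5: level-2 variance u = 2081.83, level-1 variance e = 4573.41
icc_m0 = intraclass_correlation(2081.83, 4573.41)
print(f"{icc_m0:.3f}")
```

Under this assumption, roughly 31% of the variance in F0 mean would be attributable to differences between children.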

We also tested further models (M1, M2 and M3) including the effects of age, linear and squared, as fixed predictors (M1), and the fixed effects of type of production variables (M2) and finally a model including also the fixed effects of interactions between age (linear) and each type of production (M3).

We also tested the models with age as a random factor. Because the -2LL did not decrease significantly, there was no reason to treat age as random, and we kept age and age squared as fixed factors in all the following analyses. The type of production was treated as a fixed factor with a fixed coefficient, with vocalization as the reference category. All the models for F0 mean are therefore random intercept models, because no variable other than child has a random effect.

The equation representing the final M3 was the following:

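$$y_{ij} = \beta_{0ij}\,\mathrm{cost} + \beta_{1}\,\mathrm{Age}_{ij} + \beta_{2}\,\mathrm{AgeSquared}_{ij} + \beta_{3}\,\mathrm{Grunt}_{ij} + \beta_{4}\,\mathrm{Bab1}_{ij} + \beta_{5}\,\mathrm{Bab2}_{ij} + \beta_{6}\,\mathrm{Bab3}_{ij} + \beta_{7}\,(\mathrm{Age} \times \mathrm{Grunt})_{ij} + \beta_{8}\,(\mathrm{Age} \times \mathrm{Bab1})_{ij} + \beta_{9}\,(\mathrm{Age} \times \mathrm{Bab2})_{ij} + \beta_{10}\,(\mathrm{Age} \times \mathrm{Bab3})_{ij}$$

where $\beta_{0ij} = \beta_{0} + u_{0j} + e_{0ij}$ and $\mathrm{cost} = 1$.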
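
To make the structure of this equation concrete, the following sketch evaluates only its fixed part using the M3 estimates from Table 5. It is an illustration, not the authors' code: the random terms u and e are omitted, the dictionary and function names are hypothetical, and `age` and `age_sq` must be supplied on whatever scale the model used, since the coding of the age variables is not restated here.

```python
# Fixed-effect estimates of M3 copied from Table 5 (Hz).
# Vocalization is the reference category, so its coefficients are 0.
M3_COEFS = {
    "const": 360.58,
    "age": 1.62,
    "age_sq": -1.13,
    "main": {"vocalization": 0.0, "grunt": -13.21, "babb1": -0.36,
             "babb2": -19.41, "babb3": 24.84},
    "by_age": {"vocalization": 0.0, "grunt": -0.38, "babb1": -0.35,
               "babb2": 1.31, "babb3": -0.69},
}


def fixed_part(age, age_sq, production, coefs=M3_COEFS):
    """Fixed part of the M3 equation (random terms u and e are left out).

    `age` and `age_sq` are passed separately because their coding
    (centring, scaling) is an assumption of the caller.
    """
    return (coefs["const"]
            + coefs["age"] * age
            + coefs["age_sq"] * age_sq
            + coefs["main"][production]
            + coefs["by_age"][production] * age)


# Where both age terms are zero, only the constant and the
# production main effect remain.
print(round(fixed_part(0, 0, "vocalization"), 2))
print(round(fixed_part(0, 0, "grunt"), 2))
```

When both age terms are zero, the difference between two production types reduces to the difference between their main-effect coefficients.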

The same procedure was followed for F0 mean, F0 range and F0 final contours, using the same scheme and the same hierarchical models.

F0 mean trajectories over time

The first analyses explored the changes in F0 mean values over time and across the different types of production. As reported above, the basic model (M0) showed variation among infants (random part); consequently, all the models included this variation among infants.

We added age¹, linear and squared, as fixed predictors (M1). As can be seen in Table 5, only the linear effect of age was significant: F0 mean values tended to increase over time.
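
The significance pattern in M1 can be checked informally from Table 5 by dividing each estimate by its standard error. The sketch below assumes a Wald-type criterion (absolute ratio above 1.96 for the 5% level); the test actually used by the authors is not stated in this section, and the function name is hypothetical.

```python
def wald_ratio(estimate, std_error):
    """Ratio of an estimate to its standard error (Wald z statistic)."""
    return estimate / std_error


# M1 in Table 5: Age b = 1.60 (s.e. 0.49), Age squared b = -0.93 (s.e. 0.51)
z_age = wald_ratio(1.60, 0.49)
z_age_sq = wald_ratio(-0.93, 0.51)

CRITICAL = 1.96  # two-sided 5% threshold of the standard normal distribution
print(round(z_age, 2), abs(z_age) > CRITICAL)
print(round(z_age_sq, 2), abs(z_age_sq) > CRITICAL)
```

With this criterion the linear age term exceeds the threshold and the squared term does not, in line with the description of M1.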

Model 2 (M2) added the type of production to M1 as a fixed predictor, with vocalization as the reference category. The results confirmed the significant effect of age and showed that grunts have the lowest F0 mean values and variate babblings the highest, while vocalizations, simple babblings and reduplicated babblings have similar values at an intermediate level. The lack of difference between monosyllables and bisyllables (simple and reduplicated babblings) was also found by Robb et al. (1989).

Model 3 (M3) added the interaction terms between age and type of production to M2. The effects of age, age squared, and the interaction between reduplicated babblings and age were significant. Nonetheless, the LL did not improve significantly, so Model 3 cannot be considered the best representation of F0 mean development; Model 2 was chosen and is represented in Figure 2. The distances between the curves in Figure 2 reflect the differences between the intercepts of each production, while the shape of the curves is due to the inclusion of age and age squared in the model.
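
The comparison between M2 and M3 can be illustrated with a likelihood-ratio test. The sketch below rests on two assumptions that should be checked against the original output: that the LL row of Table 5 reports the decrease in -2LL relative to the previous model, and that the test uses the number of parameters added at each step (two age terms for M1, four interaction terms for M3). The helper `chi2_sf_even_df` is hypothetical and uses the closed-form upper tail of the chi-square distribution, which is valid for even degrees of freedom only.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only).

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# M1 vs M0: decrease of 34.8 with 2 added parameters (age, age squared)
p_m1 = chi2_sf_even_df(34.8, 2)
# M3 vs M2: decrease of 6.77 with 4 added parameters (interactions)
p_m3 = chi2_sf_even_df(6.77, 4)
print(p_m1, p_m3)
```

Under these assumptions the step to M1 is clearly significant, whereas the step from M2 to M3 gives a p-value of roughly 0.15, consistent with retaining Model 2.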

Figure 2. F0 mean values (in Hz) of each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 5).

F0 range trajectories over time

The second set of analyses dealt with changes in F0 range, measured in semitones, over time and across types of production (see Table 6). The basic model (M0) gives 4.47 semitones as the mean value of the F0 range and shows significant variation among subjects and over time.

Table 6. Predictors of F0 range, multilevel results.

	M0		M1		M2		M3
	b	s.e.	b	s.e.	b	s.e.	b	s.e.
Fixed Effects
Constant	4.47	0.04	4.70	0.08	4.88	0.09	4.89	0.10
Age	--	--	0.01	0.01	0.00	0.01	0.00	0.02
Age²	--	--	-0.06	0.02	-0.06	0.02	-0.05	0.02
Productions; base vocalization
Grunt	--	--	--	--	-0.45	0.07	-0.47	0.12
Babb-1	--	--	--	--	-0.19	0.08	0.29	0.23
Babb-2	--	--	--	--	0.77	0.12	1.01	0.33
Babb-3	--	--	--	--	0.89	0.22	-0.94	0.83
Age*Grunt	--	--	--	--	--	--	0.00	0.01
Age*Babb-1	--	--	--	--	--	--	-0.03	0.02
Age*Babb-2	--	--	--	--	--	--	-0.02	0.02
Age*Babb-3	--	--	--	--	--	--	0.12	0.05
Random effects (level 2)
u	12.96	0.23	12.72	0.23	12.32	0.22	12.31	0.22
Random effects (level 1)
e	1.36	0.04	1.37	0.04	1.42	0.04	1.42	0.04
Model Fit statistics
LL	--	--	-96.6	2 df	-117.73	6 df	-11.75	10 df

In Model 1 (M1), the significant effect of age showed that F0 range increased over time. The addition of the type of production, again with vocalization as the reference category, in Model 2 (M2), showed a significant effect of age, age squared and of each type of production except reduplicate babbling. Generally, the effect of age squared showed that F0 range very slightly increased during the first months but decreased later. More complex productions such as reduplicate and variate babblings tend to be produced on average with a wider F0 range than vocalizations. Grunts are the productions with the smallest F0 range. Also, simple babblings showed a narrower F0 range than vocalizations. The difference between simple and reduplicate babblings is in contrast with Robb et al. (1989), who reported monosyllables with a slightly higher F0 range than bi-syllables.

The inclusion of the interaction terms in Model 3 (M3) revealed a significant effect of the interaction between age and variate babblings, but the improvement in LL was very small, so Model 2 was chosen as the best representation of F0 range development. Model 2 is represented in Figure 3.

Figure 3. F0 range values (in semitones) of each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 6).

F0 final contours trajectories over time

To explore the developmental trajectories of children’s ability to give F0 changes a specific direction, multilevel logistic analyses were run, using as the dependent variable the presence or absence of level, rising and falling F0 final contours in the production. The analyses were carried out on 9597 productions, excluding those for which the coding of the F0 final contour was unclear because interference or artifacts made the spectrogram too noisy.

First, Model 0 showed significant variability between subjects in the production of level contours, with an average probability of 79% of producing level contours (see Table 7). Adding the fixed effects of age and age squared in Model 1 did not reveal significant effects. Model 2 showed significant effects of reduplicated and variate babblings, which are produced with level F0 contours to a greater extent. The probability of producing level contours did not vary over time. The interaction between age and type of production in Model 3 was not significant. Model 2 was chosen as the best representation of the presence of level F0 contours, and it is represented in Figure 4.

Table 7. Predictors of presence of level F0 contour productions (binomial level vs no level).

Logistic Binomial Multilevel results.

	M0		M1		M2		M3
	b	s.e.	b	s.e.	b	s.e.	b	s.e.
Fixed Effects
Constant	1.36	0.05	1.58	0.16	1.72	0.17	1.74	0.19
Age	--	--	0.00	0.03	-0.03	0.03	-0.04	0.03
Age²	--	--	-0.03	0.03	-0.02	0.03	0.00	0.03
Productions; base vocalization
Grunt	--	--	--	--	-0.21	0.12	-0.23	0.23
Babb-1	--	--	--	--	0.03	0.13	0.41	0.35
Babb-2	--	--	--	--	1.23	0.21	0.93	0.58
Babb-3	--	--	--	--	2.23	0.51	3.34	1.90
Age*Grunt	--	--	--	--	--	--	0.00	0.02
Age*Babb-1	--	--	--	--	--	--	-0.03	0.02
Age*Babb-2	--	--	--	--	--	--	0.02	0.04
Age*Babb-3	--	--	--	--	--	--	-0.07	0.11
Random effects (level 2)
u	3.94	0.17	4.02	0.17	3.88	0.17	3.88	0.17
Random effects (level 1)
e	1.00	0.00	1.00	0.00	1.00	0.00	1.00	0.00

Figure 4. Presence of level F0 contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 7).
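
Because the coefficients in Table 7 are on the logit scale, the production effects in M2 can be made easier to read by exponentiating them into odds ratios relative to vocalizations. This is an illustrative sketch rather than part of the reported analysis; in a multilevel logistic model such odds ratios are conditional on the child-level random effect.

```python
import math

# M2 in Table 7: logit coefficients relative to vocalizations
B_BABB2 = 1.23  # reduplicated babbling
B_BABB3 = 2.23  # variate babbling

# Exponentiating a logit coefficient gives an odds ratio
or_babb2 = math.exp(B_BABB2)
or_babb3 = math.exp(B_BABB3)
print(round(or_babb2, 2), round(or_babb3, 2))
```

Read this way, the odds of a level contour are about 3.4 times higher for reduplicated babblings and about 9.3 times higher for variate babblings than for vocalizations, for the same child and age.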

Concerning rising F0 final contours, Model 0 indicated significant variation among subjects and a probability of 59% of producing rising contours within the production (see Table 8). Adding age and age squared in Model 1 did not reveal any significant effects. Model 2 showed a significant effect of grunts, reduplicated babblings and variate babblings: children use rising final contours less often when producing grunts and more often when producing these more complex babblings. The probability of producing rising F0 final contours did not change over time. Model 3 did not reveal any significant interaction effects. Hence Model 2 was chosen and is represented in Figure 5.

Table 8. Predictors of the presence of rising F0 final contours in the production (binomial rising vs no rising).

Logistic Binomial Multilevel results.

	M0		M1		M2		M3
	b	s.e.	b	s.e.	b	s.e.	b	s.e.
Fixed Effects
Constant	0.35	0.04	0.53	0.13	0.68	0.13	0.68	0.15
Age	--	--	0.01	0.02	-0.01	0.02	-0.02	0.03
Age²	--	--	-0.04	0.02	-0.04	0.02	-0.02	0.03
Productions; base vocalization
Grunt	--	--	--	--	-0.28	0.09	-0.28	0.19
Babb-1	--	--	--	--	-0.06	0.11	0.32	0.28
Babb-2	--	--	--	--	0.77	0.15	1.01	0.41
Babb-3	--	--	--	--	1.17	0.27	0.34	0.94
Age*Grunt	--	--	--	--	--	--	0.00	0.02
Age*Babb-1	--	--	--	--	--	--	-0.03	0.02
Age*Babb-2	--	--	--	--	--	--	-0.02	0.03
Age*Babb-3	--	--	--	--	--	--	0.05	0.06
Random effects (level 2)
u	2.43	0.11	2.47	0.11	2.47	0.11	2.47	0.11
Random effects (level 1)
e	1.00	0.00	1.00	0.00	1.00	0.00	1.00	0.00

Figure 5. Presence of rising F0 final contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 8).

The same analyses were run with the presence or absence of a falling F0 final contour as the dependent variable (see Table 9). The basic model showed significant variability among subjects and an overall probability of 62% of producing falling final contours within the vocal production. Model 1 showed a significant effect of age squared. The presence of falling final contours is quite stable over the whole period considered. In Model 2, the effects of grunts, reduplicated and variate babbling were significant (see Figure 6). Reduplicated and variate babblings were more likely to be produced with falling F0 final contours. Model 3 did not reveal any significant effect of the interactions between age and type of production on the presence or absence of a falling F0 final contour.

Table 9. Predictors of the presence of falling F0 final contours in the production (binomial falling vs no falling).

Logistic Binomial Multilevel results.

	M0		M1		M2		M3
	b	s.e.	b	s.e.	b	s.e.	b	s.e.
Fixed Effects
Constant	0.51	0.04	0.67	0.13	0.78	0.13	0.80	0.15
Age	--	--	0.02	0.02	-0.01	0.02	-0.01	0.03
Age²	--	--	-0.06	0.02	-0.04	0.02	-0.04	0.03
Productions; base vocalization
Grunt	--	--	--	--	-0.13	0.10	-0.18	0.19
Babb-1	--	--	--	--	0.11	0.11	0.09	0.28
Babb-2	--	--	--	--	1.04	0.16	0.94	0.43
Babb-3	--	--	--	--	1.80	0.32	2.19	1.13
Age*Grunt	--	--	--	--	--	--	0.01	0.02
Age*Babb-1	--	--	--	--	--	--	0.00	0.02
Age*Babb-2	--	--	--	--	--	--	0.01	0.03
Age*Babb-3	--	--	--	--	--	--	-0.02	0.07
Random effects (level 2)
u	2.53	0.11	2.58	0.11	2.57	0.12	2.57	0.12
Random effects (level 1)
e	1.00	0.00	1.00	0.00	1.00	0.00	1.00	0.00

Figure 6. Presence of falling F0 final contours in each type of production from 4 to 16 months of age predicted by Model 2 (M2 in Table 9).
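
The average probabilities quoted for the three contour types follow from the M0 constants in Tables 7, 8 and 9 through the inverse logit transformation. The sketch below shows the conversion; it assumes the constants are log-odds for a child whose random effect is zero, and the names used are hypothetical.

```python
import math


def inv_logit(log_odds):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# M0 constants from Tables 7 (level), 8 (rising) and 9 (falling)
M0_CONSTANTS = {"level": 1.36, "rising": 0.35, "falling": 0.51}

probabilities = {name: inv_logit(b) for name, b in M0_CONSTANTS.items()}
for name, p in probabilities.items():
    print(f"{name}: {p:.3f}")
```

The resulting values are close to the 79%, 59% and 62% reported in the text for level, rising and falling contours respectively.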

Discussion

To contribute to the conflicting literature on the topic and to address several limitations of previous studies, the present study aimed to describe the development of F0-related prosodic features of the pre-lexical productions of Italian infants from the 4th to the 16th month of life.

One of the main findings is the presence of significant variability among children. In the first months of life children differ in their use of the voice, not only in mean fundamental frequency but in all the variables examined, and these differences can be attributed to physiological differences in the shape and development of the larynx. This finding highlights the need to pay greater attention to individual variability in studies of infants and children. For example, the F0 mean of children across ages varies between 309 and 392 Hz, a difference of more than four semitones. This large individual variability may explain the different values found by previous studies, which were conducted on small groups of participants. Moreover, the tendency not to consider and control this variability in the analysis may explain previous confusing or non-significant findings. The variability among our 15 children is wider than the range found by Laufer and Horii (1977) in their four children (317 to 342 Hz), and it contrasts with Flax et al. (1991), who did not find significant differences among their three children. We may suppose that a larger sample gives a better picture of this variability, increasing the chance of detecting it and therefore of controlling for it. A further longitudinal investigation of these differences at older ages would help us understand whether this variability is present only at this stage of physiological organization of the vocal tract, whether it persists later, or whether it increases with age to produce the well-known differences in tone of voice between individuals.
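
The size of this inter-individual range in semitones can be verified with the standard conversion between a frequency ratio and a musical interval (12 semitones per octave, that is, per doubling of frequency). The function name below is hypothetical.

```python
import math


def semitone_distance(f_low_hz, f_high_hz):
    """Interval between two frequencies in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Lowest and highest individual F0 means reported in the text
span = semitone_distance(309.0, 392.0)
print(round(span, 2))
```

The result is slightly above four semitones, in agreement with the figure given in the text.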

Controlling for this variability, we were able to describe more accurately the developmental trajectories of all the variables examined. F0 mean tends to increase over time, by about 3.2 Hz per month. This slight increase contrasts with other previous studies but may confirm the slight increase found in studies with fewer subjects, such as Laufer and Horii (1977), who found a similar, but not statistically significant, variation. Other authors reported an increase in F0 mean values, but only from the 5th (Fairbanks, 1942) or the 10th month (Murry et al., 1983).

In the first two years of life, infants improve their interactive abilities, as shown by the longer duration of gazes and the greater number of gestures directed to the mother (Papaeliou & Trevarthen, 2006; Trevarthen, 1977). Accordingly, we may hypothesize that the voice is increasingly used to establish contact with the caregiver and to respond to the stimulation received. This greater participation in the interaction may correspond to an increase in activation state and, therefore, to an increase in F0 mean values. Moreover, previous authors showed that communicative, investigative or emotional pre-lexical productions have different prosodic features. Specifically, a reduced F0 mean is found in non-communicative productions uttered, for example, while exploring an object. In contrast, productions directed to the adult during face-to-face interactions, and productions associated with imperative gestures, are uttered with a higher F0 mean (Aureli et al., 2017; Papaeliou et al., 2002; Papaeliou & Trevarthen, 2006). Since we recorded mother-infant face-to-face interactions without objects, we may suppose that the coded infant vocal productions were mostly uttered with the aim of communicating with the mother and, therefore, with a higher F0 mean.

A different result was found for the developmental trajectory of F0 range. Infants tend to slightly increase the amplitude of the variations within their productions during the first months, while after the 7th month the F0 range of all productions decreases by about one semitone every two months. This may mean that very young infants’ productions are less controlled and characterized by very large variations, whereas after the second half of the first year of life infants control their voice better and use prosodic variations that are more similar to adult vocal productions. Previous studies have found that infants tend to imitate the variations in their mothers’ vocal productions (see Gratier & Devouche, 2011, and Ko et al., 2016), and we may suppose that this ability becomes more efficient after the 7th month. The wider F0 range during the first months of life is consistent with the findings of Amano et al. (2006) on the prosodic development of three Japanese infants. However, these authors reported the decrease in F0 range only after the two-word period. In contrast, Snow (2004) reported an increase in F0 range from the first to the 4th year of age. The different age ranges considered make the findings very difficult to compare. We may suppose that our analyses, focused on a shorter age range and with more time points, were more effective in revealing a variation not reported by other authors.

Concerning F0 contours, most of the infants’ vocal productions can be considered flat. This finding differs from previous studies, and it may be linked to the fact that during face-to-face interaction infants do not need to produce utterances with a higher F0 range in order to, for example, attract the mother’s attention or ask for an object (see Aureli et al., 2017; D’Odorico & Franco, 1991 and Esteve-Gibert & Prieto, 2013). Coding the communicative function of each production would have provided more information in this regard. Similarly, the percentages of rising and falling final contours do not vary with age. Our findings are consistent neither with Snow (2006) nor with the hypothesis of Lieberman (1967). The significant effect of type of production at every age may challenge the idea that, during the first months of life, prosody is predominantly physiologically controlled. It is clear from our results that from the beginning of pre-lexical speech infants can use their prosodic competences, in line with previous studies showing that infants can differentiate accents to signal communicative intent (Prieto et al., 2012).

Our data collection did not last long enough to test Snow’s hypothesis (2006). The probabilities of producing rising and falling final contours both decrease slightly over time, but we did not find the U-shaped effect found by Snow. If this is simply due to the age range explored in our study, we may suppose that the probability of producing these final contours increases again later, after the 16th month of life.

The main finding of the present study is the significant effect of the type of vocal production on all the prosodic variables examined. This effect indicates that the productions considered have different prosodic characteristics, and that these differences persist over the whole age range considered.

Grunts are confirmed to be the least communicative productions: they are produced with the lowest F0 mean and F0 range values of all productions. Moreover, it is when producing grunts and vocalizations that infants seem least able to use rising and falling final contours. These productions are the most frequent over the whole age period considered but turned out to have fewer communicative intonation features. A high F0 mean, a wide F0 range and rising and falling contours may represent an index of activation and excitement, and our results show that these prosodic features are more frequent in the productions used with a more specific communicative meaning, such as simple and variate babblings (Fasolo et al., 2008).

Some limitations of the present study have to be addressed. First, we measured F0 range in semitones, while many previous studies use the SD of the F0 mean; this makes the results less comparable. Moreover, we did not consider the different communicative intentions of vocal productions (e.g., whether they were produced in response to maternal speech, to engage maternal attention, to complain about a need, etc.). This is another important variable that should be taken into consideration.

Implications for future studies

The present study illustrates the growing need to take individual variability into consideration when assessing infant and child prosody. These findings suggest that in the second half of the first year of life, infants show an intonational repertoire that may contribute to regulating interaction with their partner, which is considered one of the major prerequisites for language acquisition (Papaeliou et al., 2002). The use of multilevel models was very useful to this end. Applying statistical analyses that can explore and control individual variability becomes even more essential when studying child development. Further longitudinal studies may explore whether this variability among children is stable over time or whether it decreases or increases. This would show at what ages it is most important to control for it.

Another implication concerns the need to explore linguistic and prosodic development even at the pre-linguistic stage. The prosody of speech has to be studied taking into account the type of production analyzed. This, together with the coding of the communicative intention of each production, will make a valuable contribution to the study of infants’ prosodic development.

Ethics and consent

The study was approved by the ethical committee of the Department of Neuroscience Imaging and Clinical Science of the University of Chieti-Pescara (Ethical approval number: DNISC2962, 06.11.2019) and was conducted according to the American Psychological Association guidelines in accordance with the 1964 Helsinki Declaration.

Written informed consent (approved by the ethical committee) for participation in the study was obtained from the mothers.

Data availability statement

Open Science Framework (OSF): The developing of prosody in infants: a longitudinal study over the first 16 months of life. https://doi.org/10.17605/OSF.IO/7PBN6 (D’Aloia, 2024).

The project contains the following underlying data:

• Number of prelexical production by child and session.xlsx. Data file which includes the children’s data.
• Read.me.pdf. A checklist adhering to STROBE guidelines for reporting observational studies.

Data are available under the terms of Creative Commons Attribution 4.0 International license (CC-BY 4.0)

Extended data

The extended data for this study is available in the Open Science Framework repository. This extended data is a component of the main project titled “The developing of prosody in infants: a longitudinal study over the 16 months of life.”

Open Science Framework (OSF). Extended data for “The developing of prosody in infants: a longitudinal study over the 16 months of life.”. Doi: https://doi.org/10.17605/OSF.IO/738W2

The component contains the following extended data:

• Table 1. Extended data. Data file which includes a literature review on F0 related (F0 mean, F0 range, F0 final contours) prosodic development.

Data are available under the terms of Creative Commons Attribution 4.0 International license (CC-BY 4.0)

References

Amano S, Nakatani T, Kondo T: Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. J. Acoust. Soc. Am. 2006; 119(3): 1636–1647.
Astruc L, Payne E, Post B, et al.: Tonal targets in early child English, Spanish, and Catalan. Lang. Speech. 2013; 56(2): 229–253.
Aureli T, Spinelli M, Fasolo M, et al.: The pointing–vocal coupling progression in the first half of the second year of life. Infancy. 2017; 22(6): 801–818.
Behrens H, Gut H: The relationship between prosodic and syntactic organization in early multiword speech. J. Child Lang. 2005; 32(1): 1–34.
Bennett S: A 3-year longitudinal study of school-aged children’s fundamental frequencies. J. Speech Hear. Res. 1983; 26(1): 137–141.
Boersma P, Weenink D: Praat. Doing phonetics by computer (Version 5.1). 2005.
Cruttenden A: Intonation. Cambridge University Press; 1997.
Crystal D: The Analysis of Intonation in Young Children. Edward Arnold; 1978.
D’Aloia V: The developing of prosody in infants: a longitudinal study over the first 16 months of life. [Dataset]. Open Science Framework. 2024.
D’Odorico L: Non-segmental features in prelinguistic communications: an analysis of some types of infant cry and non-cry vocalizations. J. Child Lang. 1984; 11(1): 17–27.
D’Odorico L, Franco F: Selective production of vocalization types in different communication contexts. J. Child Lang. 1991; 18(3): 475–499.
D’Odorico L, Fasolo M, Marchione D: The prosody of early multi-word speech: word order and its intonational realization in the speech of Italian children. Enfance. 2009; 3: 317–327.
De Carvalho A, He AX, Lidz J, et al.: Prosody and function words cue the acquisition of word meanings in 18-month-old infants. Psychol. Sci. 2019; 30(3): 319–332.
Esteve-Gibert N, Prieto P: Prosody signals the emergence of intentional communication in the first year of life: Evidence from Catalan-babbling infants. J. Child Lang. 2013; 40(5): 919–944.
Esteve-Gibert N, Prieto P: The Development of Prosody in First Language Acquisition. Cambridge University Press; 2018.
Fairbanks G: An acoustical study of the pitch of infant hunger wails. Child Dev. 1942; 13: 227–232.
Fasolo M, D’Odorico L, Costantini A, et al.: The influence of biological, social, and developmental factors on language acquisition in pre-term born children. Int. J. Speech Lang. Pathol. 2010; 12(6): 461–471.
Fasolo M, Majorano M, D’Odorico L: Babbling and first words in children with slow expressive development. Clin. Linguist. Phon. 2008; 22(2): 83–94.
Fisher C, Tokura H: Prosody in speech to infants: Direct and indirect acoustic cues to syntactic structure. Signal to syntax: Bootstrapping from speech to grammar in early acquisition. 1996; 343–363.
Flax J, Lahey M, Harris K, et al.: Relations between prosodic variables and communicative functions. J. Child Lang. 1991; 18(01): 3–19.
Fox DB: An analysis of the pitch characteristics of infant vocalizations. Psychomusicology. 1990; 9(1): 21–30.
Gervain J: Plasticity in early language acquisition: the effects on prenatal and early childhood experience. Curr. Opin. Neurobiol. 2015; 35: 13–20.
Gleitman L, Gleitman H, Landau B, et al.: Linguistics: the Cambridge survey: Vol. 3. Language: psychological and biological aspects. 1988.
Goldstein MH, Schwade JA, Bornstein MH: The value of vocalizing: Five-month-old infants associate their own noncry vocalizations with responses from caregivers. Child Dev. 2009; 80(3): 636–644.
Gratier M, Devouche E: Imitation and repetition of prosodic contour in vocal interaction at 3 months. Dev. Psychol. 2011; 47(1): 67–76.
Hsu H-C, Fogel A, Cooper RB: Infant vocal development during the first 6 months: speech quality and melodic complexity. Infant Child Dev. 2000; 9(1): 1–16.
Iyer SN, Oller DK: Fundamental frequency development in typically developing infants and infants with severe-to-profound hearing loss. Clin. Linguist. Phon. 2008; 22(12): 917–936.
Kahane JC, Kahn AR: Weight measurements of infant and adult intrinsic laryngeal muscles. Folia Phoniatr. Logop. 1984; 36(3): 129–133.
Kent RD: Anatomical and Neuromuscular Maturation of the Speech Mechanism: Evidence from Acoustic Studies. J. Speech Lang. Hear. Res. 1976; 19(3): 421–447.
Kent RD, Murray AD: Acoustic features of infant vocalic utterances at 3, 6, and 9 months. J. Acoust. Soc. Am. 1982; 72(2): 353–365.
Ko E-S, Seidl A, Cristia A, et al.: Entrainment of prosody in the interaction of mothers with their young children. J. Child Lang. 2016; 43(2): 284–309.
Laufer MZ, Horii Y: Fundamental frequency characteristics of infant non-distress vocalization during the first twenty-four weeks. J. Child Lang. 1977; 4(02): 171–184.
Lee S, Potamianos A, Narayanan S: Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 1999; 105(3): 1455–1468.
Lieberman P: Intonation, perception, and language. MIT Research Monograph; 1967.
Mampe B, Friederici AD, Christophe A, et al.: Newborns’ cry melody is shaped by their native language. Curr. Biol. 2009; 19(23): 1994–1997.
Masterson JJ, Kamhi AG: Linguistic trade-offs in school-age children with and without language disorders. J. Speech Lang. Hear. Res. 1992; 35(5): 1064–1075.
McRoberts GW, Best CT: Accommodation in mean f0 during mother–infant and father–infant vocal interactions: a longitudinal case study. J. Child Lang. 1997; 24(03): 719–736.
Murry T, Hoit-Dalgaard J, Gracco VL: Infant vocalization: A longitudinal study of acoustic and temporal parameters. Folia Phoniatr. 1983; 35(5): 245–253.
Oller DK, Wieman LA, Doyle WJ, et al.: Infant babbling and speech. J. Child Lang. 1976; 3(1): 1–11.
Papaeliou C, Minadakis G, Cavouras D: Acoustic patterns of infant vocalizations expressing emotions and communicative functions. J. Speech Lang. Hear. Res. 2002; 45: 311–317.
Papaeliou C, Trevarthen C: Prelinguistic pitch patterns expressing ‘communication’ and ‘apprehension’. J. Child Lang. 2006; 33(1): 163–178.
Papoušek M, Papoušek H: Forms and functions of vocal matching in interactions between mothers and their precanonical infants. First Lang. 1989; 9(6): 137–157.
Prieto P, Estrella A, Thorson J, et al.: Is prosodic development correlated with grammatical and lexical development? Evidence from emerging intonation in Catalan and Spanish. J. Child Lang. 2012; 39(2): 221–257.
Prieto P, Vanrell M d M: Early intonational development in Catalan. Paper presented at the Proceedings of the XVIth International Congress of Phonetic Sciences. 2007.
Rasbash J, Steele F, Browne W, et al.: A user’s guide to MLwiN Version 2.0. Bristol: Centre for multilevel modelling, University of Bristol; 2005.
Robb MP, Saxman JH: Developmental Trends in Vocal Fundamental Frequency of Young Children. J. Speech Lang. Hear. Res. 1985; 28(3): 421–427.
Robb MP, Saxman JH, Grant AA: Vocal fundamental frequency characteristics during the first two years of life. J. Acoust. Soc. Am. 1989; 85(4): 1708–1717.
Rothgänger H: Analysis of the sounds of the child in the first year of age and a comparison to the language. Early Hum. Dev. 2003; 75(1–2): 55–69.
Snow D: Falling intonation in the one- and two-syllable utterances of infants and preschoolers. J. Phon. 2004; 32(3): 373–393.
Snow D: Regression and Reorganization of Intonation Between 6 and 23 Months. Child Dev. 2006; 77(2): 281–296.
Snow D, Balog HL: Do children produce the melody before the words? A review of developmental intonation research. Lingua. 2002; 112(12): 1025–1058.
Snow D, Ertmer DJ: Children’s development of intonation during the first year of cochlear implant experience. Clin. Linguist. Phon. 2012; 26(1): 51–70.
Sorianello P: Prosodia: Modelli e ricerca empirica. Carocci; 2021.
Soderstrom M, Blossom M, Foygel R, et al.: Acoustical cues and grammatical units in speech to two preverbal infants. J. Child Lang. 2008; 35(4): 869–902. PubMed Abstract | Publisher Full Text
Spinelli M, Fasolo M, Mesman J: Does prosody make the difference? A meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes. Dev. Rev. 2017; 44: 1–18. Publisher Full Text
Stark RE, Bernstein LE, Demorest ME: Vocal communication in the first 18 months of life. J. Speech Lang. Hear. Res. 1993; 36(3): 548–558. PubMed Abstract | Publisher Full Text
Stoel-Gammon C: Relationships between lexical and phonological development in young children. J. Child Lang. 2011; 38(1): 1–34. PubMed Abstract | Publisher Full Text
Trevarthen C: Descriptive analyses of infant communicative behaviour. Shaffer HR, editor. Studies in mother-infant interaction: The Loch Lomond Symposium. London: Academic Press; 1977; pp. 227–270.
Wand MP, Jones MC: Kernel smoothing. Chapman and Hall/CRC; 1995.
Wasserman L: All of nonparametric statistics. Springer; 2006.
Wells B, Peppé S, Goulandris N: Intonation development from five to thirteen. J. Child Lang. 2004; 31(04): 749–778. PubMed Abstract | Publisher Full Text
Whalen DH, Levitt AG, Wang Q: Intonational differences between the reduplicative babbling of French- and English-learning infants. J. Child Lang. 1991; 18(03): 501–516. PubMed Abstract | Publisher Full Text
Zanchi P, Fasolo M, Spinelli M, et al.: The role of acoustic and contextual features in the recognition of crying causes. Psicol. Clin. Svilupp. 2016a; 20(1): 103–123.
Zanchi P, Zampini L, Fasolo M, et al.: Syntax and prosody in narratives: A study of preschool children. First Lang. 2016b; 36(2): 124–139.

Footnotes

1 Age squared was divided by 26 (the number of sessions) to make values comparable.

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Published: 06 Aug 2024, 13:896

https://doi.org/10.12688/f1000research.154114.1

Copyright

© 2024 D'Aloia V et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

D'Aloia V, Zanchi P, Logrieco MGM et al. The developing of prosody in infants: a longitudinal study over the first 16 months of infant life [version 1; peer review: 2 approved with reservations, 1 not approved]. F1000Research 2024, 13:896 (https://doi.org/10.12688/f1000research.154114.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Reviewer Report 23 Aug 2025

Nicolas Audibert, Université Sorbonne Nouvelle and CNRS, Paris, France

Approved with Reservations

https://doi.org/10.5256/f1000research.169103.r400506

The study presented in this article is based on longitudinal data collected from Italian infants between the ages of 4 and 16 months, as well as the segmentation and categorization of this data into types of productions. The authors propose an analysis of different F0 descriptors in order to relate them to other languages.
This study has the advantage of including productions from Italian infants, who are poorly documented in the literature, and the contribution of longitudinal data is particularly valuable.
However, as it stands, the article has a number of weaknesses, particularly with regard to the extraction and parameterization of F0 and the statistical analysis of the results, and cannot be indexed without significant revision.

I agree with reviewer 1 on the missing information on the results in the abstract and the need for a thorough proofreading of the English.

In addition to reviewer 1's comments on the need for clarification regarding information on families and the context of interaction, it would be useful to specify the variety of Italian spoken by the parents of the infants studied and their possible use of dialects. This would make the study more replicable and also refine the interpretation of some of the inter-individual differences observed. Indeed, if we consider that the prosodic forms produced by infants are influenced by input, the prosodic differences between varieties or dialects observed in adults must be taken into account. Previous studies on prosodic differences between varieties of Italian indicate differences in timing, but also in the amplitude of prosodic movements (see Sardelli & Marotta, 2007, or Crocco et al., 2022), which may have an impact on the descriptors measured in this study.
Another point that should be better documented concerns the experimental setup used for data collection. The authors only mention the use of Audacity software but do not specify what types of microphone, sound card, and computer were used, nor how far away from the child the equipment was placed and how it was calibrated.

I also question the adequacy of the data made available in light of F1000R's open science policy. The data on OSF is limited to a summary table of the number of productions per child and session across all categories. While the audio data itself cannot be made public for confidentiality reasons, there is nothing to prevent the raw F0 measurements analyzed in the article from being published in anonymized form with the corresponding categories.

Below, I elaborate on the two main points on which most of my reservations focus.

Detection of F0 and related descriptors

My main reservations concern the method used to detect F0 and the metrics chosen to quantify F0 movements, with values that are analyzed and interpreted without prior verification. Although widely used in the literature, the Praat detection algorithm used with default settings (assuming this is what was done, as no information on the settings is provided) is prone to detection errors, particularly in the context of infants' voices. See, for example, Nakatani et al. (2008) on the limitations of conventional methods of F0 estimation for application to infants' voices recorded in noisy environments.
Furthermore, I understand the choice of using F0 range as a statistic to assess F0 variability for the purpose of comparability with existing studies. However, this statistic is known to be less robust to detection errors, see for example Portnova et al. (2025) in a clinical context. F0 range is particularly sensitive to extreme values resulting from detection errors, especially octave jumps, which cause F0 to be detected as half or double the actual value. Other measures of variability used in the literature, such as the interquartile range (IQR), are less sensitive to such errors. It would be useful to assess the extent to which estimating variability using the IQR instead of the range would change the results.
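The sensitivity of the range to a single tracking error can be illustrated with a minimal Python sketch. The F0 values below are invented for illustration and are not taken from the article's data; the function names are likewise hypothetical, and values are kept in Hz for simplicity.

```python
import statistics

def f0_range(track_hz):
    """Range (max minus min) of an F0 track, in Hz."""
    return max(track_hz) - min(track_hz)

def f0_iqr(track_hz):
    """Interquartile range (Q3 minus Q1) of an F0 track, in Hz."""
    q1, _, q3 = statistics.quantiles(track_hz, n=4)
    return q3 - q1

# Hypothetical F0 track (Hz) for one vocalization, one value per frame
clean = [380, 390, 400, 410, 420, 415, 405, 395, 385, 400]

# Same track with one frame halved, simulating an octave-jump detection error
with_jump = list(clean)
with_jump[4] = clean[4] / 2  # 420 Hz wrongly detected as 210 Hz

for name, track in (("clean", clean), ("with octave jump", with_jump)):
    print(f"{name}: range = {f0_range(track):.1f} Hz, IQR = {f0_iqr(track):.1f} Hz")
```

With these made-up values, a single halved frame raises the range from 40 Hz to 205 Hz, while the IQR stays at 22.5 Hz.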

With regard more specifically to F0 detection, one of the sentences in the section “Suggestions for a better comprehension of the phenomena” seems particularly problematic to me, given the current state of knowledge on automatic F0 detection. The authors write, “old studies used visual inspections of the spectrograms to measure values of F0 and are therefore less reliable than modern analyses run with ad hoc programs.” On the contrary, it is debatable whether an algorithm designed for adult voices (which is not error-free even on this type of production) but applied to infants' voices should necessarily be considered more reliable than a method based on spectrographic inspection. Automatic detection on infants' voices is all the more problematic as it is difficult to collect vocal productions of infants without including environmental noise. In the absence of a reliable objective reference obtained by electroglottography, which seems difficult to apply to young children, visual inspection of the acoustic signal and/or spectrograms can thus be used as a reference for assessing the validity of automatic detection (see, for example, Vaysse et al. 2022). A possible alternative to visual identification of periods is the analysis of narrowband spectrograms, obtained using a longer analysis window (typically 20 ms), which allow harmonics (whose frequency is an integer multiple of F0) to be identified visually.
Systematic evaluation of the validity of automatic detection on a subset of the data by comparing it with visual verification would be particularly useful. Alternatively, inspection of the individual distributions of the detected values for a given type of production could identify the outliers most likely to result from detection errors.

Statistical modeling

For the most part, and although the choice of model M2 over M3 is debatable, the approach taken for statistical analysis using mixed-effects linear regression models appears consistent based on the information provided in the text. However, the significant discrepancy between the descriptive data presented in Figure 1 and the predictions of the corresponding model (Table 5 and Figure 2) raises questions about the validity of the modeling, particularly for the interactions between age and type of vocalization. In this respect, it is particularly surprising, given the descriptive data presented, to obtain such low values in Table 5 for the beta coefficients associated with these interactions (and also for the difference in log-likelihood between M2 and M3). Without the possibility of replicating the analysis, it is unclear whether this is due to an error in the data processing or to the choice of random structure modeling.
The choice to consider age as a fixed factor is relevant given the data collection methodology adopted, as each child was recorded at the same age, with the exception of a few missing data points. However, I question the interpretation of the following sentence: “All the models for F0 mean are therefore random intercept models because no other variable than the children have random effect.” According to the explanations provided by the authors, there is no indication that the possibility of including random slopes in the model was considered. However, it seems likely that the infants recorded show interindividual differences in their developmental trajectory between 4 and 16 months, which could have been accounted for by a random slope for children by age. Taking interindividual differences into account only as a random intercept amounts to considering that the differences observed between infants at 4 months remain the same between 4 and 16 months, which is probably not the case. This point should be reconsidered in the revised version. It is also regrettable that, in addition to the results of the regression models, the authors do not include (at least as supplementary material to avoid overloading the article) descriptive figures showing the average measurements for each child * age * type of production. Only Figure 1 presents descriptive data for the average F0 values, but these are averaged between children.

Furthermore, the choice to introduce age squared as a predictor in addition to age is not explained. Is this choice inspired by existing studies on development in order to represent a non-linear developmental trajectory in a linear model? Keeping both age and age squared as predictors complicates the interpretation of the modeling results without justification. This point should therefore be reconsidered in the revised version of the article, either by retaining only one of these two predictors (which seems to me to be the most appropriate choice) or by providing solid arguments in favor of using both.
Finally, the presentation of these analyses is relatively far removed from current best practice, particularly with regard to the definition of the models used. A presentation inspired by the syntax used in commonly used R packages (notably lme4) would enable most readers to understand the modeling choices made more directly. See, for example, Sonderegger (2023) for an overview of the use of regression models applied to linguistic data and their implementation with R.
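To make the contrast between the two random structures discussed above concrete, the models can be written out as follows. This is only an illustrative sketch: the symbols, the single term for type of production, and the variable names are assumptions introduced here, not the authors' specification, and the models in the article include further terms (for example age squared). The index i denotes an observation and j a child.

```latex
\begin{align}
% Random-intercept model: children differ only in their overall F0 level
\mathrm{F0}_{ij} &= \beta_0 + \beta_1\,\mathrm{age}_{ij} + \beta_2\,\mathrm{type}_{ij}
                   + u_{0j} + \varepsilon_{ij} \\
% Random intercept plus random slope for age: children may also differ in trajectory
\mathrm{F0}_{ij} &= \beta_0 + (\beta_1 + u_{1j})\,\mathrm{age}_{ij} + \beta_2\,\mathrm{type}_{ij}
                   + u_{0j} + \varepsilon_{ij}
\end{align}
```

In lme4-style syntax, with hypothetical column names, these would correspond to `f0 ~ age + type + (1 | child)` and `f0 ~ age + type + (1 + age | child)` respectively. In the first model every child's trajectory is parallel to the population trajectory; only the second allows the age effect itself to vary between children.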

To a lesser extent, as the caption provides little information, it is unclear from reading the article what exactly each point in Figure 1 corresponds to. Based on the following figures, which show the same categories and number of points, we can assume that this is the average by type of production * age for the 15 children, but this should be clarified. An overview of the inter-individual variability for each point or at the level of the regression curves would be useful in addition to the individual points.
If the objective of this figure is to represent developmental trajectories and compare them, why not opt for GAMM modeling rather than moving on to linear models? Such modeling, now common in speech sciences (see, for example, Tavakoli et al. 2024), would allow for statistical comparison between types of production at different ages, while statistically controlling for inter-individual variability. In my opinion, adding such modeling to the revised version is not essential, but it would be preferable to mention in the discussion the limitations inherent in choosing to rely exclusively on linear models.

Minor comments

While the use of a semitone scale to account for F0 variability is relevant, it is surprising to cite Snow & Balog (2002) as the original source for the use of such a scale. The use of semitone scales for the study of F0 is common practice and dates back to at least the 1950s (see Hirst & De Looze 2021 for a broader discussion of pitch measures and associated scales).
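For readers less familiar with the scale, the standard conversion from Hz to semitones is 12 times the base-2 logarithm of a frequency ratio. The short Python sketch below uses hypothetical frequency values and a hypothetical function name; it shows why the scale suits comparisons across speakers with very different F0 levels: equal frequency ratios give equal semitone distances, whatever the absolute values in Hz.

```python
import math

def hz_to_semitones(f0_hz, ref_hz):
    """Distance in semitones between f0_hz and a reference frequency ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)

# One octave is 12 semitones at any register, although the Hz differences differ
print(hz_to_semitones(400.0, 200.0))  # 200 Hz apart
print(hz_to_semitones(800.0, 400.0))  # 400 Hz apart
```

Both calls return 12.0, although the second interval is twice as wide in Hz.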

In the presentation of the multilevel analysis results, I assume that LL stands for log-likelihood, but this is not specified anywhere in the text.
In addition, the text refers to -2LL as a criterion for comparing models. Is this a typo, or a comparison criterion derived from log-likelihood (in which case it should be explained)? It should also be clarified what is presented in the LL row of the tables: are these differences in log-likelihood compared to the previous model?
In the presentation of the random structure, the authors write “each set of observations (level 1) is nested within each child.” It should be clarified what a “set of observations” corresponds to. Does this refer to integrating a random intercept per observation, as the note “within variance” suggests?
The note below the tables “Roughly if |2 x SE|≤|parameter| than p < .05” (I assume that “then” should be read instead of “than”) needs to be clarified and would benefit from a brief explanation in the text to clarify this criterion and its interpretation.
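The two criteria raised in these comments can be sketched in a few lines of Python. The sketch assumes, without confirmation from the article, that -2LL denotes the deviance (minus twice the log-likelihood) compared between nested models with a likelihood-ratio test, and that the 2 x SE note is the usual Wald approximation. All numerical values and function names are hypothetical.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: parameter = 0, using a normal approximation."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))

def lrt_p_value(ll_simple, ll_complex, extra_params):
    """Likelihood-ratio test between nested models.

    Returns the difference in -2LL and its chi-square p-value. Closed forms
    are implemented only for 1 or 2 additional parameters.
    """
    deviance_diff = (-2.0 * ll_simple) - (-2.0 * ll_complex)
    if deviance_diff < 0:
        raise ValueError("the more complex model should not fit worse")
    if extra_params == 1:
        p = math.erfc(math.sqrt(deviance_diff / 2.0))
    elif extra_params == 2:
        p = math.exp(-deviance_diff / 2.0)
    else:
        raise ValueError("closed form implemented only for 1 or 2 extra parameters")
    return deviance_diff, p

# A coefficient exactly twice its standard error (hypothetical values)
print(f"Wald p = {wald_p_value(2.0, 1.0):.4f}")

# Hypothetical log-likelihoods of two nested models differing by 2 parameters
diff, p = lrt_p_value(-1250.0, -1247.0, extra_params=2)
print(f"-2LL difference = {diff:.1f}, p = {p:.4f}")
```

Under the normal approximation, a coefficient equal to twice its standard error gives a two-sided p of about 0.046, which is what the rule of thumb in the table note relies on.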

The figures use a category coding that is not fully consistent with that used in the text, and the images are pixelated (particularly Figure 1). It would be preferable to use a vector format, or failing that, a PNG format with a higher resolution. In addition, provided that this is compatible with the instructions given to authors, the use of a different color for each type of production would make the figures more readable.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Yes

References

1. Sardelli E, Marotta G: Prosodic parameters for the detection of regional varieties in Italian. https://www.researchgate.net/publication/228809041_Prosodic_parameters_for_the_detection_of_regional_varieties_in_Italian. 2007.
2. Crocco C, Gili Fivela B, D'Imperio M: Comparing prosody of Italian varieties and dialects: data from Neapolitan. https://www.researchgate.net/publication/360794420_Comparing_prosody_of_Italian_varieties_and_dialects_data_from_Neapolitan. 2022.
3. Nakatani T, Amano S, Irino T, Ishizuka K, et al.: A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments. Speech Communication. 2008; 50 (3): 203-214 Publisher Full Text
4. Portnova A, Fletcher A, Wisler A, Borrie S: Assessing Fundamental Frequency Variation in Speakers With Parkinson's Disease: Effects of Tracking Errors. Journal of Speech, Language, and Hearing Research. 2025; 68 (7S): 3568-3582 Publisher Full Text
5. Vaysse R, Astésano C, Farinas J: Performance analysis of various fundamental frequency estimation algorithms in the context of pathological speech. The Journal of the Acoustical Society of America. 2022; 152 (5): 3091-3101 Publisher Full Text
6. Sonderegger M: Regression Modeling for Linguistic Data. https://books.google.co.in/books/about/Regression_Modeling_for_Linguistic_Data.html?id. 2023.
7. Tavakoli S, Matteo B, Pigoli D, Chodroff E, et al.: Statistics in Phonetics. Annual Review of Statistics and Its Application. 2025; 12 (1): 133-156 Publisher Full Text
8. Hirst DJ, De Looze C: Measuring Speech. Fundamental frequency and pitch. https://www.researchgate.net/publication/359067765_Hirst_Daniel_Celine_De_Looze_2021_Measuring_Speech_Fundamental_frequency_and_pitch_In_Rachael-Anne_Knight_and_Jane_Setter_eds_The_Cambridge_Handbook_of_Phonetics_Chapter_13_336-361. 2021.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Acoustic phonetics, corpus phonetics, inter- and intra-speaker variation in voice and speech

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 23 Aug 2025

Marilyn Vihman, University of California, Berkeley, Berkeley, USA; Language and Linguistic Science, University of York, York, UK

Not Approved

https://doi.org/10.5256/f1000research.169103.r400502

This study follows 15 (monolingual Italian) infants over 12 months (from 4 to 16 months); such a relatively large longitudinal study should, in principle, constitute an important contribution to the literature on prosodic development. Unfortunately, the study is limited in several ways and is poorly written. At the very least, serious editorial attention is needed, not least to the English, before the paper is made generally available. (What, for example, are ‘prosodic aspects of intonation’? Intonation is prosody. By ‘early infants’ vocalizations’: what are ‘early infants’, or do the authors mean ‘infants’ early vocalizations’? What does it mean to say that the ‘quality’ of falling and rising contours ‘decreases’? Why, after the first mention, is variegation re-termed ‘variate’? And so on.)

The authors do not mention in the abstract or in the opening pages that they analysed only ‘pre-lexical vocalizations’, and they do not say what motivated this choice, which renders much of the introduction irrelevant or misleading. But how could they ‘give a broad picture of the longitudinal development of intonational repertory of children’s spontaneous productions’ while omitting the word attempts that are typical in the age range covered by the last few months of the study (Vihman et al., 1985; Vihman & McCune, 1994) (Ref 1 and 2)? It is unclear why prosodic change over developmental time – increase or decrease in pitch, final syllable contour – is to be expected in vocalizations that co-occur, for most infants over the age of 12 months, with increasingly dominant attempts at word production. That is, the role of ‘non-lexical vocalizations’ in the infant’s life can be assumed to change (and diminish): Once words are being used and taken as communicative efforts by caregivers, babble is surely restricted in any communicative value it may once have had.

In fact, the figures given here – when the reader is finally told just what the data include, in Table 2 – show that 37 of the 65 ‘missing sessions’ occurred in the last 3 months of the study (altogether, in those months, there were 5 to 7 missing sessions out of the 15 planned). Does this mean that, from about 12 months on, many of the children failed to produce a single ‘non-lexical vocalization’ in the 10-minute sample, whereas before that point most children produced roughly 30 such vocalizations per session? We don’t know, as what the authors mean by ‘missing data’ is not explained. In fact, the key change over this period is the fading out of babble, as has been amply shown before. Thus it is unclear why the rise or fall in pitch in such vocalizations in this late period of occurrence should be of interest, if the parallel changes in word attempts are excluded from analysis. (See Vihman, 1996 (Ref 2), for an overview of the literature on prosodic development to that date, which is considerably more extensive than the list given in the table [accessible only on-line]).

More critically, we learn in the Methods that the study is based on mother-child face-to-face interactions, without toys. This method is not a problem for the younger infants – up to 7 or 8 months, perhaps – but it is no longer a natural situation for infants of 9 months or more, when shared attention grows to be a major element in any interaction. It is unclear what the mothers made of the instruction to ‘play normally’ under these conditions. Where there is no object to share, the ‘communicative situation’ is unnaturally impoverished and unlikely to reveal anything about any aspect of language development.
Finally, the authors give as one of their findings the existence of extensive individual differences. But this has been very well established for at least 40 years (cf. Vihman et al., 1986). (Ref 1)

The authors include what they refer to as ‘communicative grunt’ among the non-lexical vocalizations they analyse. This is not uncontroversial: Oller and his colleagues, for example, consider grunts to be vegetative, while McCune and her colleagues view ‘communicative grunts’ as the third of three successive developmental grunt types – physical, attentional and communicative, the last type appearing only at about 13 months. Here, the authors provide no further account or definition; we do not know how grunts were identified. But the point is important, as these vocalizations make up fully 26% of the vocalizations counted, and the authors report differences in typical pitch use for grunts compared with the other vocalizations of interest (they ‘have the lower mean F0 values’ and the ‘smallest range’).

In fact, just what the authors take to be a ‘communicative grunt’ is not at all clear, as they indicate (under Descriptive analysis) that they found grunts to be ‘very common during the first seven months of life, but their frequency decreased over time since their use becomes very sporadic’ (i.e., when their use becomes…?). This is in sharp contrast with the many accounts of grunt production by McCune and her colleagues: At and before 7 months only physiological and possibly attentional grunts have been identified, according to those authors. Here, in the Discussion, the authors conclude that ‘Grunts are confirmed to be the less communicative productions’: What could this mean? The only grunts counted were labeled ‘communicative’ – so if they were ‘the less communicative’, on what basis were they coded at all? This is a muddle.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
No

References

1. Vihman M, Macken M, Miller R, Simmons H, et al.: From Babbling to Speech: A Re-Assessment of the Continuity Issue. Language. 1985; 61 (2). Publisher Full Text
2. Wells B: M. M. Vihman, Phonological development: the origins of language in the child. Oxford: Basil Blackwell, 1996. Pp. xiv+312. Journal of Child Language. 1997; 24 (3): 781-788 Publisher Full Text
3. Vihman M, McCune L: When is a word a word?. Journal of Child Language. 1994; 21 (3): 517-542 Publisher Full Text
4. McCune L, Zlatev J: Dynamic systems in semiotic development: The transition to reference. Cognitive Development. 2015; 36: 161-170 Publisher Full Text
5. McCune L, Vihman M, Roug-Hellichius L, Delery D, et al.: Grunt communication in human infants (Homo sapiens). Journal of Comparative Psychology. 1996; 110 (1): 27-37 Publisher Full Text

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Phonological development

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 01 Oct 2024

Sonia Frota, Center of Linguistics of the University of Lisbon, Lisbon, Portugal

Approved with Reservations

https://doi.org/10.5256/f1000research.169103.r318803

This study examined the development of F0-related variables in Italian-learning infants from 4 to 16 months of age. It certainly addresses a relevant topic, offering additional evidence and a contribution to the scarce literature on early prosodic development. However, the paper requires major revisions before it can be indexed. There are also issues related to the use of the English language that need to be amended. I encourage the authors to address all the major and minor issues detailed below and in the annotated pdf. I hope that the detailed comments provided will be helpful to improve the paper.

Abstract
The presentation of the results is too vague.

Introduction

2nd paragraph: You might want to add European Portuguese to the set of languages studied. The study below includes a comparison with the results reported by Whalen et al. (1991):

Frota & Vigário (1995) The intonation of one European Portuguese Infant: a first approach. In I. H. Faria & M. J. Freitas (eds) Studies on the Acquisition of Portuguese. Lisboa: Colibri, pp. 17-34.

3rd paragraph: I believe that you mean in Italian-learning infants. If not, there certainly are studies on infants learning other languages. For example, there is an old study by Shepard and Lane (https://doi.org/10.1044/jshr.1101.94), the study by Mampe et al. (2009), and other work is mentioned in Frota et al. (2016) (such as by Halle et al., or Snow).

p.4 parag. 2: Frota et al. (2016) report development in F0 scaling (which is related to F0 range), which only becomes adult-like around 20-22 months of age.
p. 4 parag. 3: I believe that what is relevant is the language infants are exposed to and not the country where the study is conducted. I suggest rephrasing.

Beyond English-learning infants, Frota & Vigario (1995) found that falling contours predominate in Portuguese-learning infants, who in this respect are more similar to English-learning than to French-learning infants. I would suggest reporting here the finding by Whalen et al. (1991), who report that rising contours predominate in French-learning infants.

The present study

paragraph 1: Other studies have suggested that the prosodic component of language shows better performance earlier in development than, for example, other aspects of phonology, or the syntactic component (Prieto et al., 2012; Frota et al., 2016).

How are 'early' and 'later' defined? It would be good to give the reader a better understanding of the hypothesis put forward.

paragraph 2: Please note that there are studies that differentiated babbling from other types of utterances.

paragraph 4: You expect to find 'significant variability among children'. Why? It would be nice to motivate this expectation.

Methods
Participants
If the aspects at stake were not yet analyzed in previous studies, one should not dismiss the possibility of gender differences. Several studies have shown that boys show a slower maturational course of speech motor development. It is thus possible that this might affect prosodic development.
Please provide details on how being 'healthy' was assessed.

Procedure
Please provide information on how the families/babies were recruited. Kindly, also refer here to the ethics procedures followed (e.g., submission to ethics committee, prior informed consent, etc.).

Did all the mothers normally play without toys with their infants?

Coding: pre-lexical productions
Although the goal is to focus on pre-lexical production, it would be highly informative to know per child how many 'words' were produced and from what age. If a given child at a given point in development is producing a fair amount of words, then perhaps the so-called pre-lexical productions from the same child should be discarded, or at least treated differently from those pre-lexical productions that precede the production of words. Of course, a definition of what counts as a 'word' is needed.

Reliability
I'm afraid a clarification is needed. Was mean F0 not automatically calculated as stated above? How could inter-coder reliability be computed for this variable?
A note on the availability of the data: The raw data is not available, I believe. This would be important not only to replicate the current analyses, but also to explore the data with new analyses.

Results
Descriptive analysis
Does canonical babbling refer to simple babbling? Please keep the terminology the same throughout the paper.

Between subjects variability exploration
I believe there should instead be a section on the statistical analyses.
Maybe I'm missing something. Shouldn't age be factored out to investigate between-subject variability? Also, what is the impact of the different types of productions on such variability? I suggest reorganizing this section so that these issues are immediately put forward, instead of looking at variability per se (as this is naturally a given).
Could you provide the measures of comparison among models? For example, did you use the Akaike Information Criterion? If not, why not and what were the measures used?
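
To illustrate the kind of comparison measures meant here, the following is a minimal sketch, not the authors' analysis. It assumes two nested models fitted by maximum likelihood that differ by exactly one parameter; the log-likelihood values and parameter counts are hypothetical placeholders, and the function names are introduced only for this example.

```python
import math

def deviance(log_lik):
    """Return -2LL for a fitted model's log-likelihood."""
    return -2.0 * log_lik

def lr_test_df1(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models differing by ONE parameter.

    Returns (statistic, p_value). With 1 degree of freedom the chi-square
    upper tail probability equals erfc(sqrt(x / 2)).
    """
    stat = deviance(ll_reduced) - deviance(ll_full)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def aic(log_lik, n_params):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_lik

# Hypothetical log-likelihoods and parameter counts, not values from the article.
ll_reduced, ll_full = -1520.4, -1517.1
stat, p_value = lr_test_df1(ll_reduced, ll_full)
print(f"-2LL difference = {stat:.2f}, p = {p_value:.4f}")
print(f"AIC reduced = {aic(ll_reduced, 6):.1f}, AIC full = {aic(ll_full, 7):.1f}")
```

If the models differ by more than one parameter, the reference distribution has correspondingly more degrees of freedom and the one-parameter shortcut above no longer applies.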

Discussion
Please rephrase the first sentence, as the use of English needs to be corrected.
p. 16, parag.2, lines 6-7: The previous sentence focused on variability across ages. Variability across ages is not individual variability, but variability due to age. One would have individual variability if, for example, different individuals would not show the same trend in variability across age, or the same trend in variability across type of production. The use of 'individual variability' and all the discussion around it, requires clarification and amendment.
p. 17, parag.1: To the extent that F0 range is related to scaling of F0 contours, the use of different types of contours would have a tremendous impact on a rough measure such as mean F0 range. Frota et al. (2016), analyzing the F0 contours produced by children, found that F0 scaling is nearly adult-like between 18 and 24 months, depending on the type of F0 contour (with falling contours for statements getting adult-like earlier than calling contours, for example).
p. 17, parag.2: In your data, most of the productions were flat. Could this not be related to the fact that only pre-lexical productions were considered? For example, Prieto et al. (2012) and Frota et al. (2016) considered meaningful utterances according to the criteria proposed in Snow (2006) and found very different results, with falling nuclear contours predominating initially and an increasing diversity of nuclear contours emerging over time (that is already seen between 12 and 16 months, although it becomes most evident from 16 months onwards).
p. 17 parag.2: Kindly rephrase the last two sentences.
p. 17, parag.5: 'Grunts are confirmed to be the less communicative productions'. Please add a reference here.
p. 17, parag.6: Together with the communicative intention, an important limitation is the consideration of general acoustic measures of the productions (along the lines of contour-based approaches) without a look into the structural properties of the melodies produced (such as type of nuclear contour, i.e., pitch accent and boundary tone).

Please rephrase the text within parentheses.
Implications for future studies
It is difficult to see how the findings from the present study show the presence of an intonational repertoire. What are the structural entities of that repertoire and what do they convey? Please also see the comments above.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Partly

References

1. Frota S, Cruz M, Matos N, Vigário M: Early Prosodic Development. 2016; 6: 295-324.
2. Prieto P, Estrella A, Thorson J, Vanrell Mdel M: Is prosodic development correlated with grammatical and lexical development? Evidence from emerging intonation in Catalan and Spanish. J Child Lang. 2012; 39 (2): 221-57.
3. Mampe B, Friederici AD, Christophe A, Wermke K: Newborns' cry melody is shaped by their native language. Curr Biol. 2009; 19 (23): 1994-7.
4. Sheppard WC, Lane HL: Development of the prosodic features of infant vocalizing. J Speech Hear Res. 1968; 11 (1): 94-108.
5. Snow D: Regression and reorganization of intonation between 6 and 23 months. Child Dev. 2006; 77 (2): 281-96.
6. Whalen DH, Levitt AG, Wang Q: Intonational differences between the reduplicative babbling of French- and English-learning infants. J Child Lang. 1991; 18 (3): 501-16.
7. Frota S, Vigário M: The intonation of one European Portuguese infant: a first approach. In: Faria IH, Freitas MJ (eds), Studies on the Acquisition of Portuguese. Lisboa: Colibri; 1995: 17-34.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Experimental linguistics and psycholinguistics; prosody; early language development

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Sonia Frota, Center of Linguistics of the University of Lisbon, Lisbon, Portugal
Marilyn Vihman, University of California, Berkeley, Berkeley, USA; University of York, York, UK
Nicolas Audibert, Université Sorbonne Nouvelle and CNRS, Paris, France


Reviewer Report

23 Aug 2025 | for Version 1

Nicolas Audibert, Université Sorbonne Nouvelle and CNRS, Paris, France

Approved With Reservations

The study presented in this article is based on longitudinal data collected from Italian infants between the ages of 4 and 16 months, as well as the segmentation and categorization of this data into types of productions. The authors propose an analysis of different F0 descriptors in order to relate them to other languages.
This study has the advantage of including productions from Italian infants, who are poorly documented in the literature, and the contribution of longitudinal data is particularly valuable.
However, as it stands, the article has a number of weaknesses, particularly with regard to the extraction and parameterization of F0 and the statistical analysis of the results, and cannot be indexed without significant revision.

I agree with reviewer 1 on the missing information on the results in the abstract and the need for a thorough proofreading of the English.

In addition to reviewer 1's comments on the need for clarification regarding information on families and the context of interaction, it would be useful to specify the variety of Italian spoken by the parents of the infants studied and their possible use of dialects. This would make the study more replicable and also refine the interpretation of some of the inter-individual differences observed. Indeed, if we consider that the prosodic forms produced by infants are influenced by input, the prosodic differences between varieties or dialects observed in adults must be taken into account. Previous studies on prosodic differences between varieties of Italian indicate differences in timing, but also in the amplitude of prosodic movements (see Sardelli & Marotta, 2007, or Crocco et al., 2022), which may have an impact on the descriptors measured in this study.
Another point that should be better documented concerns the experimental setup used for data collection. The authors only mention the use of Audacity software but do not specify what types of microphone, sound card, and computer were used, nor how far away from the child the equipment was placed and how it was calibrated.

I also question the adequacy of the data made available in light of F1000R's open science policy. The data on OSF is limited to a summary table of the number of productions per child and session across all categories. While the audio data itself cannot be made public for confidentiality reasons, there is nothing to prevent the raw F0 measurements analyzed in the article from being published in anonymized form with the corresponding categories.

Below, I elaborate on the two main points on which most of my reservations focus.

Detection of F0 and related descriptors

My main reservations concern the method used to detect F0 and the metrics chosen to quantify F0 movements, with values that are analyzed and interpreted without prior verification. Although widely used in the literature, the Praat detection algorithm used with default settings (assuming this is what was done, as no information on the settings is provided) is prone to detection errors, particularly in the context of infant voices. See, for example, Nakatani et al. (2008) on the limitations of conventional methods of F0 estimation for application to infant voices recorded in noisy environments.
Furthermore, I understand the choice of using F0 range as a statistic to assess F0 variability for the purpose of comparability with existing studies. However, this statistic is known to be less robust to detection errors, see for example Portnova et al. (2025) in a clinical context. F0 range is particularly sensitive to extreme values resulting from detection errors, especially octave jumps, which cause F0 to be detected as half or double the actual value. Other measures of variability used in the literature, such as the interquartile range (IQR), are less sensitive to such errors. It would be useful to assess the extent to which estimating variability using the IQR instead of the range would change the results.
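
The sensitivity of the range to a single octave error can be illustrated with a small sketch. The contour values below are hypothetical, not data from the study, and the function names are introduced only for this example; the code simply compares the range and the IQR, both expressed in semitones, on a clean contour and on the same contour with one frame tracked an octave too low.

```python
import math
import statistics

def hz_to_semitones(f0_hz, ref_hz):
    """Convert an F0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def f0_range_st(f0_hz):
    """Span from the lowest to the highest F0 value, in semitones."""
    return hz_to_semitones(max(f0_hz), min(f0_hz))

def f0_iqr_st(f0_hz):
    """Interquartile range of the contour, in semitones relative to its median."""
    ref = statistics.median(f0_hz)
    st = [hz_to_semitones(f, ref) for f in f0_hz]
    q1, _, q3 = statistics.quantiles(st, n=4, method="inclusive")
    return q3 - q1

# Hypothetical contour in Hz (illustrative values only).
clean = [380, 390, 400, 410, 420, 410, 400, 390]
# Same contour with one frame halved by an octave error (410 -> 205).
corrupted = [380, 390, 400, 205, 420, 410, 400, 390]

for label, contour in (("clean", clean), ("corrupted", corrupted)):
    print(f"{label}: range = {f0_range_st(contour):.2f} st, "
          f"IQR = {f0_iqr_st(contour):.2f} st")
```

In this toy case one erroneous frame inflates the range by more than ten semitones, while the IQR changes by a fraction of a semitone.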

With regard more specifically to F0 detection, one of the sentences in the section “Suggestions for a better comprehension of the phenomena” seems particularly problematic to me, given the current state of knowledge on automatic F0 detection. The authors write, “old studies used visual inspections of the spectrograms to measure values of F0 and are therefore less reliable than modern analyses run with ad hoc programs.” On the contrary, it is debatable whether an algorithm designed for adult voices (which is not error-free even on this type of production) but applied to infant voices should necessarily be considered more reliable than a method based on spectrographic inspection. Automatic detection on infant voices is all the more problematic as it is difficult to collect vocal productions of infants without including environmental noise. In the absence of a reliable objective reference obtained by electroglottography, which seems difficult to apply to young children, visual inspection of the acoustic signal and/or spectrograms can thus be used as a reference for assessing the validity of automatic detection (see, for example, Vaysse et al. 2022). A possible alternative to visual identification of periods is the analysis of narrowband spectrograms, obtained using a longer analysis window (typically 20 ms), which allow harmonics (whose frequency is an integer multiple of F0) to be identified visually.
Systematic evaluation of the validity of automatic detection on a subset of the data by comparing it with visual verification would be particularly useful. Alternatively, inspection of the individual distributions of the detected values for a given type of production could identify the outliers most likely to result from detection errors.
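
One simple way to carry out such an inspection is sketched below. This is an illustration only: the 9-semitone threshold and the F0 values are hypothetical choices, and the function name is introduced for this example. The idea is to flag values lying far from the median of the distribution, since an octave error shifts a value by about 12 semitones.

```python
import math
import statistics

def flag_suspect_values(f0_hz, max_dev_st=9.0):
    """Return indices of F0 values more than max_dev_st semitones from the median.

    max_dev_st is a hypothetical threshold chosen to sit below the
    12-semitone shift produced by an octave error.
    """
    median_hz = statistics.median(f0_hz)
    return [
        i for i, f in enumerate(f0_hz)
        if abs(12.0 * math.log2(f / median_hz)) > max_dev_st
    ]

# Hypothetical F0 values in Hz for one child and one type of production.
values = [395, 400, 410, 805, 405, 398, 200, 402]
print(flag_suspect_values(values))  # indices of the doubled and halved values
```

Flagged values would still need to be checked against the signal, since genuinely extreme F0 values do occur in infant vocalizations.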

Statistical modeling

For the most part, and although the choice of model M2 over M3 is debatable, the approach taken for statistical analysis using mixed-effects linear regression models appears consistent based on the information provided in the text. However, the significant discrepancy between the descriptive data presented in Figure 1 and the predictions of the corresponding model (Table 5 and Figure 2) raises questions about the validity of the modeling, particularly for the interactions between age and type of vocalization. In this respect, it is particularly surprising, given the descriptive data presented, to obtain such low values in Table 5 for the beta coefficients associated with these interactions (and also for the difference in log-likelihood between M2 and M3). Without the possibility of replicating the analysis, it is unclear whether this is due to an error in the data processing or to the choice of random structure modeling.
The choice to consider age as a fixed factor is relevant given the data collection methodology adopted, as each child was recorded at the same age, with the exception of a few missing data points. However, I question the interpretation of the following sentence: “All the models for F0 mean are therefore random intercept models because no other variable than the children have random effect.” According to the explanations provided by the authors, there is no indication that the possibility of including random slopes in the model was considered. However, it seems likely that the infants recorded show interindividual differences in their developmental trajectory between 4 and 16 months, which could have been accounted for by a random slope for children by age. Taking interindividual differences into account only as a random intercept amounts to considering that the differences observed between infants at 4 months remain the same between 4 and 16 months, which is probably not the case. This point should be reconsidered in the revised version. It is also regrettable that, in addition to the results of the regression models, the authors do not include (at least as supplementary material to avoid overloading the article) descriptive figures showing the average measurements for each child * age * type of production. Only Figure 1 presents descriptive data for the average F0 values, but these are averaged between children.

Furthermore, the choice to introduce age squared as a predictor in addition to age is not explained. Is this choice inspired by existing studies on development in order to represent a non-linear developmental trajectory in a linear model? Keeping both age and age squared as predictors complicates the interpretation of the modeling results without justification. This point should therefore be reconsidered in the revised version of the article, either by retaining only one of these two predictors (which seems to me to be the most appropriate choice) or by providing solid arguments in favor of using both.
Finally, the presentation of these analyses is relatively far removed from current best practice, particularly with regard to the definition of the models used. A presentation inspired by the syntax used in commonly used R packages (notably lme4) would enable most readers to understand the modeling choices made more directly. See, for example, Sonderegger (2023) for an overview of the use of regression models applied to linguistic data and their implementation with R.
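
To make the difference between the two random structures discussed above concrete, here is a sketch under stated assumptions: $y_{ij}$ stands for the F0 measure of observation $i$ from child $j$, only age is shown as a predictor for brevity, and the variable names `f0`, `age`, and `child` are hypothetical. In lme4 notation, the random-intercept model would read `f0 ~ age + (1 | child)` and the model with a by-child random slope for age `f0 ~ age + (1 + age | child)`.

```latex
% Random intercept only: children differ by a constant offset u_{0j},
% so all individual trajectories are parallel.
y_{ij} = \beta_0 + \beta_1\,\mathrm{age}_{ij} + u_{0j} + \varepsilon_{ij},
\qquad u_{0j} \sim \mathcal{N}(0, \sigma_{u_0}^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{\varepsilon}^2)

% Random intercept and random slope for age: each child has its own
% starting level u_{0j} and its own rate of change u_{1j}.
y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\mathrm{age}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},\,
\begin{pmatrix}
\sigma_{u_0}^2 & \rho\,\sigma_{u_0}\sigma_{u_1} \\
\rho\,\sigma_{u_0}\sigma_{u_1} & \sigma_{u_1}^2
\end{pmatrix}\right)
```

Under the first specification, the difference between two children is the same at every age; the second lets it grow or shrink over development.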

To a lesser extent, as the caption provides little information, it is unclear from reading the article what exactly each point in Figure 1 corresponds to. Based on the following figures, which show the same categories and number of points, we can assume that this is the average by type of production * age for the 15 children, but this should be clarified. An overview of the inter-individual variability for each point or at the level of the regression curves would be useful in addition to the individual points.
If the objective of this figure is to represent developmental trajectories and compare them, why not opt for GAMM modeling rather than moving on to linear models? Such modeling, now common in speech sciences (see, for example, Tavakoli et al. 2024), would allow for statistical comparison between types of production at different ages, while statistically controlling for inter-individual variability. In my opinion, adding such modeling to the revised version is not essential, but it would be preferable to mention in the discussion the limitations inherent in choosing to rely exclusively on linear models.

Minor comments

While the use of a semitone scale to account for F0 variability is relevant, it is surprising to cite Snow & Balog (2002) as the original source for the use of such a scale. The use of semitone scales for the study of F0 is common practice and dates back to at least the 1950s (see Hirst & De Looze 2021 for a broader discussion of pitch measures and associated scales).
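
For reference, the standard conversion between two frequencies $f_1$ and $f_2$ (in Hz) and their distance in semitones is given below, followed by a worked example with illustrative values showing why the scale suits comparisons across registers.

```latex
d_{\mathrm{st}}(f_1, f_2) = 12 \log_2\!\left(\frac{f_2}{f_1}\right)

% Worked example: an octave is 12 semitones regardless of register,
% although the span in Hz doubles.
d_{\mathrm{st}}(200, 400) = 12 \log_2 2 = 12, \qquad
d_{\mathrm{st}}(400, 800) = 12 \log_2 2 = 12
```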

In the presentation of the multilevel analysis results, I assume that LL stands for log-likelihood, but this is not specified anywhere in the text.
In addition, the text refers to -2LL as a criterion for comparing models. Is this a typo, or a comparison criterion derived from log-likelihood (in which case it should be explained)? It should also be clarified what is presented in the LL row of the tables: are these differences in log-likelihood compared to the previous model?
In the presentation of the random structure, the authors write “each set of observations (level 1) is nested within each child.” It should be clarified what a “set of observations” corresponds to. Does this refer to integrating a random intercept per observation, as the note “within variance” suggests?
The note below the tables “Roughly if |2 x SE|≤|parameter| than p < .05” (I assume that “then” should be read instead of “than”) needs to be clarified and would benefit from a brief explanation in the text to clarify this criterion and its interpretation.
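
My reading of this note is that it is the usual Wald approximation: a coefficient whose absolute value is at least twice its standard error has a two-sided p-value just under .05 under a standard normal reference. The sketch below illustrates that reading; it is an assumption about what the authors intended, and the function names and numeric inputs are hypothetical.

```python
import math

def wald_p_value(estimate, se):
    """Two-sided p-value for z = estimate / se under a standard normal reference."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))

def passes_two_se_rule(estimate, se):
    """True when |estimate| is at least twice its standard error."""
    return abs(estimate) >= 2.0 * abs(se)

# Hypothetical coefficient and standard error, not values from the article.
print(passes_two_se_rule(-3.0, 1.2), round(wald_p_value(-3.0, 1.2), 4))
print(passes_two_se_rule(1.0, 0.6), round(wald_p_value(1.0, 0.6), 4))
```

The exact normal cutoff for p = .05 is 1.96 standard errors, so the factor of 2 is a slightly conservative rounding.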

The figures use a category coding that is not fully consistent with that used in the text, and the images are pixelated (particularly Figure 1). It would be preferable to use a vector format, or failing that, a PNG format with a higher resolution. In addition, provided that this is compatible with the instructions given to authors, the use of a different color for each type of production would make the figures more readable.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Partly
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

Yes

References

1. Sardelli E, Marotta G: Prosodic parameters for the detection of regional varieties in Italian. https://www.researchgate.net/publication/228809041_Prosodic_parameters_for_the_detection_of_regional_varieties_in_Italian. 2007.
2. Crocco C, Gili Fivela B, D'Imperio M: Comparing prosody of Italian varieties and dialects: data from Neapolitan. https://www.researchgate.net/publication/360794420_Comparing_prosody_of_Italian_varieties_and_dialects_data_from_Neapolitan. 2022.
3. Nakatani T, Amano S, Irino T, Ishizuka K, et al.: A method for fundamental frequency estimation and voicing decision: Application to infant utterances recorded in real acoustical environments. Speech Communication. 2008; 50 (3): 203-214.
4. Portnova A, Fletcher A, Wisler A, Borrie S: Assessing Fundamental Frequency Variation in Speakers With Parkinson's Disease: Effects of Tracking Errors. Journal of Speech, Language, and Hearing Research. 2025; 68 (7S): 3568-3582.
5. Vaysse R, Astésano C, Farinas J: Performance analysis of various fundamental frequency estimation algorithms in the context of pathological speech. The Journal of the Acoustical Society of America. 2022; 152 (5): 3091-3101.
6. Sonderegger M: Regression Modeling for Linguistic Data. https://books.google.co.in/books/about/Regression_Modeling_for_Linguistic_Data.html?id. 2023.
7. Tavakoli S, Matteo B, Pigoli D, Chodroff E, et al.: Statistics in Phonetics. Annual Review of Statistics and Its Application. 2025; 12 (1): 133-156.
8. Hirst DJ, De Looze C: Measuring Speech. Fundamental frequency and pitch. https://www.researchgate.net/publication/359067765_Hirst_Daniel_Celine_De_Looze_2021_Measuring_Speech_Fundamental_frequency_and_pitch_In_Rachael-Anne_Knight_and_Jane_Setter_eds_The_Cambridge_Handbook_of_Phonetics_Chapter_13_336-361. 2021.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Acoustic phonetics, corpus phonetics, inter- and intra-speaker variation in voice and speech

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Reviewer Report

23 Aug 2025 | for Version 1

Marilyn Vihman, University of California, Berkeley, Berkeley, USA; Language and Linguistic Science, University of York, York, UK

Not Approved

This study follows 15 (monolingual Italian) infants over 12 months (from 4 to 16 months); such a relatively large longitudinal study should, in principle, constitute an important contribution to the literature on prosodic development. Unfortunately, the study is limited in several ways and is poorly written. At the very least, serious editorial attention is needed, not least to the English, before the paper is made generally available. (What, for example, are ‘prosodic aspects of intonation’? Intonation is prosody. In ‘early infants’ vocalizations’, what are ‘early infants’, or do the authors mean infants’ early vocalizations? What does it mean to say that the ‘quality’ of falling and rising contours ‘decreases’? Why, after the first mention, is variegation re-termed ‘variate’? And so on.)

The authors do not mention in the abstract or in the opening pages that they analysed only ‘pre-lexical vocalizations’, and they do not say what motivated this choice, which renders much of the introduction irrelevant or misleading. But how could they ‘give a broad picture of the longitudinal development of intonational repertory of children’s spontaneous productions’ while omitting the word attempts that are typical in the age range covered by the last few months of the study (Vihman et al., 1985; Vihman & McCune, 1994) (Refs 1 and 3)? It is unclear why prosodic change over developmental time – increase or decrease in pitch, final syllable contour – is to be expected in vocalizations that co-occur, for most infants over the age of 12 months, with increasingly dominant attempts at word production. That is, the role of ‘non-lexical vocalizations’ in the infant’s life can be assumed to change (and diminish): Once words are being used and taken as communicative efforts by caregivers, babble is surely restricted in any communicative value it may once have had.

In fact, the figures given here – when the reader is finally told just what the data include, in Table 2 – show that 37 of the 65 ‘missing sessions’ occurred in the last 3 months of the study (altogether, in those months, there were 5 to 7 missing sessions out of the 15 planned). Does this mean that, from about 12 months on, many of the children failed to produce a single ‘non-lexical vocalization’ in the 10-minute sample, whereas before that point most children produced roughly 30 such vocalizations per session? We don’t know, as what the authors mean by ‘missing data’ is not explained. In fact, the key change over this period is the fading out of babble, as has been amply shown before. Thus it is unclear why the rise or fall in pitch in such vocalizations in this late period of occurrence should be of interest, if the parallel changes in word attempts are excluded from analysis. (See Vihman, 1996 (Ref 2), for an overview of the literature on prosodic development to that date, which is considerably more extensive than the list given in the table [accessible only on-line]).

More critically, we learn in the Methods that the study is based on mother-child face-to-face interactions, without toys. This method is not a problem for the younger infants – up to 7 or 8 months, perhaps – but it is no longer a natural situation for infants of 9 months or more, when shared attention grows to be a major element in any interaction. It is unclear what the mothers made of the instruction to ‘play normally’ under these conditions. Where there is no object to share, the ‘communicative situation’ is unnaturally impoverished and unlikely to reveal anything about any aspect of language development.
Finally, the authors give as one of their findings the existence of extensive individual differences. But this has been very well established for at least 40 years (cf. Vihman et al., 1986). (Ref 1)

The authors include what they refer to as ‘communicative grunt’ among the non-lexical vocalizations they analyse. This is not uncontroversial: Oller and his colleagues, for example, consider grunts to be vegetative, while McCune and her colleagues view ‘communicative grunts’ as the third of three successive developmental grunt types – physical, attentional and communicative, the last type appearing only at about 13 months. Here, the authors provide no further account or definition; we do not know how grunts were identified. But the point is important, as these vocalizations make up fully 26% of the vocalizations counted, and the authors report differences in typical pitch use for grunts compared with the other vocalizations of interest (they ‘have the lower mean F0 values’ and the ‘smallest range’).

In fact, just what the authors take to be a ‘communicative grunt’ is not at all clear, as they indicate (under Descriptive analysis) that they found grunts to be ‘very common during the first seven months of life, but their frequency decreased over time since their use becomes very sporadic’ (i.e., when their use becomes…?). This is in sharp contrast with the many accounts of grunt production by McCune and her colleagues: At and before 7 months only physiological and possibly attentional grunts have been identified, according to those authors. Here, in the Discussion, the authors conclude that ‘Grunts are confirmed to be the less communicative productions’: What could this mean? The only grunts counted were labeled ‘communicative’ – so if they were ‘the less communicative’, on what basis were they coded at all? This is a muddle.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Partly
Are the conclusions drawn adequately supported by the results?

No

References

1. Vihman M, Macken M, Miller R, Simmons H, et al.: From Babbling to Speech: A Re-Assessment of the Continuity Issue. Language. 1985; 61 (2).
2. Wells B: M. M. Vihman, Phonological development: the origins of language in the child. Oxford: Basil Blackwell, 1996. Pp. xiv+312. Journal of Child Language. 1997; 24 (3): 781-788.
3. Vihman M, McCune L: When is a word a word? Journal of Child Language. 1994; 21 (3): 517-542.
4. McCune L, Zlatev J: Dynamic systems in semiotic development: The transition to reference. Cognitive Development. 2015; 36: 161-170.
5. McCune L, Vihman M, Roug-Hellichius L, Delery D, et al.: Grunt communication in human infants (Homo sapiens). Journal of Comparative Psychology. 1996; 110 (1): 27-37.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Phonological development

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.


Reviewer Report

01 Oct 2024 | for Version 1

Sonia Frota, Center of Linguistics of the University of Lisbon, Lisbon, Portugal

Approved With Reservations

This study examined the development of F0-related variables in Italian-learning infants from 4 to 16 months of age. It certainly addresses a relevant topic, offering additional evidence and a contribution to the scarce literature on early prosodic development. However, the paper requires major revisions before it can be indexed. There are also issues related to the use of the English language that need to be amended. I encourage the authors to address all the major and minor issues detailed below and in the annotated pdf. I hope that the detailed comments provided will be helpful to improve the paper.

Abstract
The presentation of the results is too vague.

Introduction

2nd paragraph: You might want to add European Portuguese to the set of languages studied. The study below includes a comparison with the results reported by Whalen et al. (1991):

Frota & Vigário (1995) The intonation of one European Portuguese Infant: a first approach. In I. H. Faria & M. J. Freitas (eds) Studies on the Acquisition of Portuguese. Lisboa: Colibri, pp. 17-34.

3rd paragraph: I believe that you mean in Italian-learning infants. If not, there certainly are studies on infants learning other languages. For example, there is an old study by Sheppard and Lane (https://doi.org/10.1044/jshr.1101.94), the study by Mampe et al. (2009), and other work mentioned in Frota et al. (2016) (such as by Halle et al. or Snow).

p.4 parag. 2: Frota et al. (2016) report development in F0 scaling (which is related to F0 range), which only becomes adult-like around 20-22 months of age.
p. 4 parag. 3: I believe that what is relevant is the language infants are exposed to and not the country where the study is conducted. I suggest rephrasing.

Beyond English-learning infants, Frota & Vigário (1995) found that falling contours predominate in Portuguese-learning infants, who in this respect are more similar to English-learning than to French-learning infants. I would suggest reporting here the finding by Whalen et al. (1991), who report that rising contours predominate in French-learning infants.
The present study

paragraph 1: Other studies have suggested that the prosodic component of language shows better performance earlier in development than, for example, other aspects of phonology, or the syntactic component (Prieto et al., 2012; Frota et al., 2016).

How are 'early' and 'later' defined? It would be good to give the reader a better understanding of the hypothesis put forward.

paragraph 2: Please note that there are studies that differentiated babbling from other types of utterances.

paragraph 4: You expect to find 'significant variability among children'. Why? It would be nice to motivate this expectation.

Methods
Participants
If the aspects at stake were not yet analyzed in previous studies, one should not dismiss the possibility of gender differences. Several studies have shown that boys show a slower maturational course of speech motor development. It is thus possible that this might affect prosodic development.
Please provide details on how being 'healthy' was assessed.

Procedure
Please provide information on how the families/babies were recruited. Kindly also refer here to the ethics procedures followed (e.g., submission to an ethics committee, prior informed consent, etc.).

Did all the mothers normally play without toys with their infants?

Coding: pre-lexical productions
Although the goal is to focus on pre-lexical production, it would be highly informative to know per child how many 'words' were produced and from what age. If a given child at a given point in development is producing a fair amount of words, then perhaps the so-called pre-lexical productions from the same child should be discarded, or at least treated differently from those pre-lexical productions that precede the production of words. Of course, a definition of what counts as a 'word' is needed.

Reliability
I'm afraid a clarification is needed. Was mean F0 not automatically calculated as stated above? How could inter-coder reliability be computed for this variable?
A note on the availability of the data: The raw data is not available, I believe. This would be important not only to replicate the current analyses, but also to explore the data with new analyses.

Results
Descriptive analysis
Does canonical babbling refer to simple babbling? Please keep the terminology the same throughout the paper.

Between subjects variability exploration
I believe there should instead be a section on the statistical analyses.
Maybe I'm missing something. Shouldn't age be factored out to investigate between-subject variability? Also, what is the impact of the different types of productions on such variability? I suggest reorganizing this section so that these issues are immediately put forward, instead of looking at variability per se (as this is naturally a given).
Could you provide the measures of comparison among models? For example, did you use Akaike Information Criteria? If not, why not and what were the measures used?
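
To illustrate the kind of comparison meant here, the following is a minimal sketch, not the authors' analysis. It assumes ordinary least-squares fits with Gaussian errors rather than the multilevel models reported in the paper. The function name `gaussian_aic`, the simulated F0 values and the model labels are all hypothetical.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit, assuming Gaussian errors."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximised log-likelihood using the ML variance estimate rss / n.
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return 2 * k - 2 * log_lik

# Hypothetical data (not from the paper): F0 mean in Hz, declining with age.
rng = np.random.default_rng(0)
age = np.repeat(np.arange(4, 17), 20).astype(float)  # months 4..16
f0 = 400.0 - 5.0 * age + rng.normal(0.0, 20.0, size=age.size)

intercept_only = np.ones((age.size, 1))                 # baseline model
with_age = np.column_stack([np.ones(age.size), age])    # adds age as predictor

aic_m0 = gaussian_aic(f0, intercept_only)
aic_m1 = gaussian_aic(f0, with_age)
print(f"AIC intercept-only: {aic_m0:.1f}")
print(f"AIC with age:       {aic_m1:.1f}")
```

The model with the lower AIC is preferred. Reporting such values (or an equivalent criterion) for each model in the tables would let readers judge whether adding a predictor is justified.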

Discussion
Please rephrase the first sentence, as the use of English needs to be corrected.
p. 16, parag.2, lines 6-7: The previous sentence focused on variability across ages. Variability across ages is not individual variability, but variability due to age. One would have individual variability if, for example, different individuals did not show the same trend in variability across age, or the same trend in variability across type of production. The use of 'individual variability', and all the discussion around it, requires clarification and amendment.
p. 17, parag.1: To the extent that F0 range is related to scaling of F0 contours, the use of different types of contours would have a tremendous impact on a rough measure such as mean F0 range. Frota et al. (2016), analyzing the F0 contours produced by children, found that F0 scaling is nearly adult-like between 18 and 24 months, depending on the type of F0 contour (with falling contours for statements becoming adult-like earlier than calling contours, for example).
p. 17, parag.2: In your data, most of the productions were flat. Could this not be related to the fact that only pre-lexical productions were considered? For example, Prieto et al. (2012) and Frota et al. (2016) considered meaningful utterances according to the criteria proposed in Snow (2006) and found very different results, with falling nuclear contours predominating initially and an increasing diversity of nuclear contours emerging over time (which is already seen between 12 and 16 months, although it becomes most evident from 16 months onwards).
p. 17 parag.2: Kindly rephrase the last two sentences.
p. 17, parag.5: 'Grunts are confirmed to be the less communicative productions'. Please add a reference here.
p. 17, parag.6: Together with the communicative intention, an important limitation is the consideration of general acoustic measures of the productions (along the lines of contour-based approaches) without a look into the structural properties of the melodies produced (such as type of nuclear contour, i.e., pitch accent and boundary tone).

Please rephrase the text within parentheses.

Implications for future studies
It is difficult to see how the findings from the present study show the presence of an intonational repertoire. What are the structural entities of that repertoire and what do they convey? Please also see the comments above.

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Partly

References

1. Frota S, Cruz M, Matos N, Vigário M: Early Prosodic Development. 6: 295-324 Publisher Full Text
2. Prieto P, Estrella A, Thorson J, Vanrell Mdel M: Is prosodic development correlated with grammatical and lexical development? Evidence from emerging intonation in Catalan and Spanish. J Child Lang. 2012; 39 (2): 221-57 PubMed Abstract | Publisher Full Text
3. Mampe B, Friederici AD, Christophe A, Wermke K: Newborns' cry melody is shaped by their native language. Curr Biol. 2009; 19 (23): 1994-7 PubMed Abstract | Publisher Full Text
4. Sheppard WC, Lane HL: Development of the prosodic features of infant vocalizing. J Speech Hear Res. 1968; 11 (1): 94-108 PubMed Abstract | Publisher Full Text
5. Snow D: Regression and reorganization of intonation between 6 and 23 months. Child Dev. 2006; 77 (2): 281-96 PubMed Abstract | Publisher Full Text
6. Whalen DH, Levitt AG, Wang Q: Intonational differences between the reduplicative babbling of French- and English-learning infants. J Child Lang. 1991; 18 (3): 501-16 PubMed Abstract | Publisher Full Text
7. Frota S, Vigário M: The intonation of one European Portuguese Infant: a first approach. In: Faria IH, Freitas MJ (eds). Studies on the Acquisition of Portuguese. Lisboa: Colibri; 1995; 17-34.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Experimental linguistics and psycholinguistics; prosody; early language development

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


[1] Amano S, Nakatani T, Kondo T: Fundamental frequency of infants’ and parents’ utterances in longitudinal recordings. J. Acoust. Soc. Am. 2006; 119(3): 1636–1647. PubMed Abstract | Publisher Full Text

[2] Astruc L, Payne E, Post B, et al.: Tonal targets in early child English, Spanish, and Catalan. Lang. Speech. 2013; 56(2): 229–253. PubMed Abstract | Publisher Full Text

[3] Aureli T, Spinelli M, Fasolo M, et al.: The pointing–vocal coupling progression in the first half of the second year of life. Infancy. 2017; 22(6): 801–818. Publisher Full Text

[4] Behrens H, Gut H: The relationship between prosodic and syntactic organization in early multiword speech. J. Child Lang. 2005; 32(1): 1–34. PubMed Abstract | Publisher Full Text

[5] Bennett S: A 3-year longitudinal study of school-aged children’s fundamental frequencies. J. Speech Hear. Res. 1983; 26(1): 137–141. PubMed Abstract | Publisher Full Text

[6] Boersma P, Weenink D: Praat. Doing phonetics by computer (Version 5.1). 2005.

[7] Cruttenden A: Intonation. Cambridge University Press; 1997.

[8] Crystal D: The Analysis of Intonation in Young Children. Edward Arnold; 1978.

[9] D’Aloia V: The developing of prosody in infants: a longitudinal study over the first 16 months of life. [Dataset]. Open Science Framework. 2024. Publisher Full Text

[10] D’Odorico L: Non-segmental features in prelinguistic communications: an analysis of some types of infant cry and non-cry vocalizations. J. Child Lang. 1984; 11(1): 17–27.

[11] D’Odorico L, Franco F: Selective production of vocalization types in different communication contexts. J. Child Lang. 1991; 18(3): 475–499. PubMed Abstract | Publisher Full Text

[12] D’Odorico L, Fasolo M, Marchione D: The prosody of early multi-word speech: word order and its intonational realization in the speech of Italian children. Enfance. 2009; 3: 317–327.

[13] De Carvalho A, He AX, Lidz J, et al.: Prosody and function words cue the acquisition of word meanings in 18-month-old infants. Psychol. Sci. 2019; 30(3): 319–332.

[14] Esteve-Gibert N, Prieto P: Prosody signals the emergence of intentional communication in the first year of life: Evidence from Catalan-babbling infants. J. Child Lang. 2013; 40(5): 919–944. PubMed Abstract | Publisher Full Text

[15] Esteve-Gibert N, Prieto P: The Development of Prosody in First Language Acquisition. Cambridge University Press; 2018.

[16] Fairbanks G: An acoustical study of the pitch of infant hunger wails. Child Dev. 1942; 13: 227–232. Publisher Full Text

[17] Fasolo M, D’Odorico L, Costantini A, et al.: The influence of biological, social, and developmental factors on language acquisition in pre-term born children. Int. J. Speech Lang. Pathol. 2010; 12(6): 461–471. PubMed Abstract | Publisher Full Text

[18] Fasolo M, Majorano M, D’Odorico L: Babbling and first words in children with slow expressive development. Clin. Linguist. Phon. 2008; 22(2): 83–94. PubMed Abstract | Publisher Full Text

[19] Fisher C, Tokura H: Prosody in speech to infants: Direct and indirect acoustic cues to syntactic structure. Signal to syntax: Bootstrapping from speech to grammar in early acquisition. 1996; 343–363.

[20] Flax J, Lahey M, Harris K, et al.: Relations between prosodic variables and communicative functions. J. Child Lang. 1991; 18(01): 3–19. Publisher Full Text

[21] Fox DB: An analysis of the pitch characteristics of infant vocalizations. Psychomusicology. 1990; 9(1): 21–30. Publisher Full Text

[22] Gervain J: Plasticity in early language acquisition: the effects on prenatal and early childhood experience. Curr. Opin. Neurobiol. 2015; 35: 13–20. PubMed Abstract | Publisher Full Text

[23] Gleitman L, Gleitman H, Landau B, et al.: Linguistics: the Cambridge survey: Vol. 3. Language: psychological and biological aspects. 1988.

[24] Goldstein MH, Schwade JA, Bornstein MH: The value of vocalizing: Five-month-old infants associate their own noncry vocalizations with responses from caregivers. Child Dev. 2009; 80(3): 636–644. PubMed Abstract | Publisher Full Text | Free Full Text

[25] Gratier M, Devouche E: Imitation and repetition of prosodic contour in vocal interaction at 3 months. Dev. Psychol. 2011; 47(1): 67–76. PubMed Abstract | Publisher Full Text

[26] Hsu H-C, Fogel A, Cooper RB: Infant vocal development during the first 6 months: speech quality and melodic complexity. Infant Child Dev. 2000; 9(1): 1–16. Publisher Full Text

[27] Iyer SN, Oller DK: Fundamental frequency development in typically developing infants and infants with severe-to-profound hearing loss. Clin. Linguist. Phon. 2008; 22(12): 917–936. PubMed Abstract | Publisher Full Text | Free Full Text

[28] Kahane JC, Kahn AR: Weight measurements of infant and adult intrinsic laryngeal muscles. Folia Phoniatr. Logop. 1984; 36(3): 129–133. PubMed Abstract | Publisher Full Text

[29] Kent RD: Anatomical and Neuromuscular Maturation of the Speech Mechanism: Evidence from Acoustic Studies. J. Speech Lang. Hear. Res. 1976; 19(3): 421–447. PubMed Abstract | Publisher Full Text

[30] Kent RD, Murray AD: Acoustic features of infant vocalic utterances at 3, 6, and 9 months. J. Acoust. Soc. Am. 1982; 72(2): 353–365. PubMed Abstract | Publisher Full Text

[31] Ko E-S, Seidl A, Cristia A, et al.: Entrainment of prosody in the interaction of mothers with their young children. J. Child Lang. 2016; 43(2): 284–309. PubMed Abstract | Publisher Full Text

[32] Laufer MZ, Horii Y: Fundamental frequency characteristics of infant non-distress vocalization during the first twenty-four weeks. J. Child Lang. 1977; 4(02): 171–184. Publisher Full Text

[33] Lee S, Potamianos A, Narayanan S: Acoustics of children’s speech: Developmental changes of temporal and spectral parameters. J. Acoust. Soc. Am. 1999; 105(3): 1455–1468. PubMed Abstract | Publisher Full Text

[34] Lieberman P: Intonation, perception, and language. MIT Research Monograph; 1967.

[35] Mampe B, Friederici AD, Christophe A, et al.: Newborns’ cry melody is shaped by their native language. Curr. Biol. 2009; 19(23): 1994–1997. PubMed Abstract | Publisher Full Text

[36] Masterson JJ, Kamhi AG: Linguistic trade-offs in school-age children with and without language disorders. J. Speech Lang. Hear. Res. 1992; 35(5): 1064–1075. PubMed Abstract | Publisher Full Text

[37] McRoberts GW, Best CT: Accommodation in mean f0 during mother–infant and father–infant vocal interactions: a longitudinal case study. J. Child Lang. 1997; 24(03): 719–736. PubMed Abstract | Publisher Full Text

[38] Murry T, Hoit-Dalgaard J, Gracco VL: Infant vocalization: A longitudinal study of acoustic and temporal parameters. Folia Phoniatr. 1983; 35(5): 245–253. PubMed Abstract | Publisher Full Text

[39] Oller DK, Wieman LA, Doyle WJ, et al.: Infant babbling and speech. J. Child Lang. 1976; 3(1): 1–11. Publisher Full Text

[40] Papaeliou C, Minadakis G, Cavouras D: Acoustic patterns of infant vocalizations expressing emotions and communicative functions. J. Speech Lang. Hear. Res. 2002; 45: 311–317. PubMed Abstract | Publisher Full Text

[41] Papaeliou C, Trevarthen C: Prelinguistic pitch patterns expressing ‘communication’ and ‘apprehension’. J. Child Lang. 2006; 33(1): 163–178. PubMed Abstract | Publisher Full Text

[42] Papoušek M, Papoušek H: Forms and functions of vocal matching in interactions between mothers and their precanonical infants. First Lang. 1989; 9(6): 137–157. Publisher Full Text

[43] Prieto P, Estrella A, Thorson J, et al.: Is prosodic development correlated with grammatical and lexical development? Evidence from emerging intonation in Catalan and Spanish. J. Child Lang. 2012; 39(2): 221–257. PubMed Abstract | Publisher Full Text

[44] Prieto P, Vanrell M d M: Early intonational development in Catalan. Paper presented at the Proceedings of the XVIth International Congress of Phonetic Sciences. 2007.

[45] Rasbash J, Steele F, Browne W, et al.: A user’s guide to MLwiN Version 2.0. Bristol: Centre for multilevel modelling, University of Bristol; 2005.

[46] Robb MP, Saxman JH: Developmental Trends in Vocal Fundamental Frequency of Young Children. J. Speech Lang. Hear. Res. 1985; 28(3): 421–427. PubMed Abstract | Publisher Full Text

[47] Robb MP, Saxman JH, Grant AA: Vocal fundamental frequency characteristics during the first two years of life. J. Acoust. Soc. Am. 1989; 85(4): 1708–1717. PubMed Abstract | Publisher Full Text

[48] Rothgänger H: Analysis of the sounds of the child in the first year of age and a comparison to the language. Early Hum. Dev. 2003; 75(1–2): 55–69. PubMed Abstract | Publisher Full Text

[49] Snow D: Falling intonation in the one- and two-syllable utterances of infants and preschoolers. J. Phon. 2004; 32(3): 373–393. Publisher Full Text

[50] Snow D: Regression and Reorganization of Intonation Between 6 and 23 Months. Child Dev. 2006; 77(2): 281–296. PubMed Abstract | Publisher Full Text

[51] Snow D, Balog HL: Do children produce the melody before the words? A review of developmental intonation research. Lingua. 2002; 112(12): 1025–1058. Publisher Full Text

[52] Snow D, Ertmer DJ: Children’s development of intonation during the first year of cochlear implant experience. Clin. Linguist. Phon. 2012; 26(1): 51–70. PubMed Abstract | Publisher Full Text | Free Full Text

[53] Sorianello P: Prosodia: Modelli e ricerca empirica. Carocci; 2021.

[54] Soderstrom M, Blossom M, Foygel R, et al.: Acoustical cues and grammatical units in speech to two preverbal infants. J. Child Lang. 2008; 35(4): 869–902. PubMed Abstract | Publisher Full Text

[55] Spinelli M, Fasolo M, Mesman J: Does prosody make the difference? A meta-analysis on relations between prosodic aspects of infant-directed speech and infant outcomes. Dev. Rev. 2017; 44: 1–18. Publisher Full Text

[56] Stark RE, Bernstein LE, Demorest ME: Vocal communication in the first 18 months of life. J. Speech Lang. Hear. Res. 1993; 36(3): 548–558. PubMed Abstract | Publisher Full Text

[57] Stoel-Gammon C: Relationships between lexical and phonological development in young children. J. Child Lang. 2011; 38(1): 1–34. PubMed Abstract | Publisher Full Text

[58] Trevarthen C: Descriptive analyses of infant communicative behaviour. Shaffer HR, editor. Studies in mother-infant interaction: The Loch Lomond Symposium. London: Academic Press; 1977; pp. 227–270.

[59] Wand MP, Jones MC: Kernel smoothing. Chapman and Hall/CRC; 1995.

[60] Wasserman L: All of nonparametric statistics. Springer; 2006.

[61] Wells B, Peppé S, Goulandris N: Intonation development from five to thirteen. J. Child Lang. 2004; 31(04): 749–778. PubMed Abstract | Publisher Full Text

[62] Whalen DH, Levitt AG, Wang Q: Intonational differences between the reduplicative babbling of French- and English-learning infants. J. Child Lang. 1991; 18(03): 501–516. PubMed Abstract | Publisher Full Text

[63] Zanchi P, Fasolo M, Spinelli M, et al.: The role of acoustic and contextual features in the recognition of crying causes. Psicol. Clin. Svilupp. 2016a; 20(1): 103–123.

[64] Zanchi P, Zampini L, Fasolo M, et al.: Syntax and prosody in narratives: A study of preschool children. First Lang. 2016b; 36(2): 124–139.

