Keywords
newspaper, article, reliability, validity, search
Previous research provided data on the number of yearly articles in three national newspapers in Japan. The validity of this dataset and its generation method was confirmed. However, its reliability was unclear. Not only validity but also reliability should be confirmed. Therefore, the present article investigated the reliability of the data and its method.
I performed a series of searches again and provided a new dataset on the number of articles in the three national newspapers. I followed the same procedure one year after the prior search.
I found very strong correlations in article counts between the previous and current datasets. Although there were some years when the number of articles differed between the two datasets, the differences were small.
Therefore, the reliability of the data and its generation method was confirmed to be high. Taken together, both the validity and reliability of the data and its generation method were confirmed.
Newspapers have been analyzed in research across diverse academic disciplines, such as the humanities (e.g., Kawashima, 2017; Yawata, 2020), social sciences (e.g., Miyazawa, 2018; Ogihara, 2023), and natural sciences (e.g., Fujibe & Matsumoto, 2022; Okuhara et al., 2019). Newspapers reflect the interests and attention of people in general. Thus, researchers can investigate humans, society, and nature through newspapers. In addition, newspapers are a product that reflects cultural elements (e.g., Brescoll & LaFrance, 2004; Markus et al., 2006). Researchers can examine cultures by analyzing newspapers. Furthermore, newspapers are a product that remains over time (for reviews, see Morling, 2016; Morling & Lamoreaux, 2008). Thus, newspapers enable researchers to examine historical changes empirically (e.g., Carlquist et al., 2017; Nafstad et al., 2007).
To analyze newspapers appropriately, it is important to know the total number of articles for a given period. Specifically, to calculate the relative frequencies of articles, the total number of articles is necessary. A prior study provided useful datasets that include the numbers of yearly articles in the three national newspapers in Japan over the 150 years between 1872 and 2021 (Ogihara, 2024). The validity of the dataset and its generation method was confirmed by asking each national newspaper company whether the method introduced in the study indeed provides the number of articles for a given year. The method was a search without entering any words in the search box. Usually, words or phrases are entered to search for articles, but here, the search box was intentionally left empty.
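As a minimal sketch of why the yearly total is needed, the relative frequency of a topic can be obtained by dividing the number of matching articles by the total number of articles in the same year. The function name and all counts below are hypothetical placeholders for illustration and are not taken from any of the datasets.

```python
def relative_frequency(matching: int, total: int) -> float:
    """Return the percentage of a year's articles that match a query."""
    if total <= 0:
        raise ValueError("total number of articles must be positive")
    return 100.0 * matching / total


# Hypothetical counts for illustration only (not real data).
yearly_counts = {
    2003: {"matching": 12, "total": 300_000},
    2018: {"matching": 45, "total": 250_000},
}

# Relative frequency (in percent) for each year.
rates = {
    year: relative_frequency(c["matching"], c["total"])
    for year, c in yearly_counts.items()
}
```

Without the denominator, a rise in matching articles could simply reflect a rise in the total number of articles published or archived in that year.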
These datasets and their generation method are useful. In fact, previous research has used them. For example, a prior study has shown that the rates of articles referring to unique hybrid dogs (mixed-breed dogs created by crossing purebred dogs) increased between 2003 and 2018, indicating that people in Japan came to seek more uniqueness and Japanese culture became more individualistic (Ogihara & Uchida, 2024). Moreover, a previous study has demonstrated that the rates of articles mentioning kirakira names (unique/uncommon names) increased in Japan between 2011 and 2022, showing a rise in individualism at the macro level (Ogihara, 2025a). These changes in the emphasis on uniqueness are consistent with previous research on names (e.g., Ogihara et al., 2015; Ogihara & Ito, 2022; for a review, see Ogihara, 2025c) and individualism (Hamamura, 2012; Ogihara, 2018; Taras et al., 2012; for a review, see Ogihara, 2017).
However, it is unclear to what extent the data and its method are reliable. It is possible that each search on article counts might be inaccurate and that the numbers might differ by search (trial). If the reliability is low, the data and its method are difficult to use in empirical research. Not only validity but also reliability should be confirmed for the data and its method.
It should be noted that some minor differences can arise because the number of articles in the databases themselves can change over time, regardless of the generation method (Ogihara, 2024). In particular, due to database updates, newspaper companies sometimes add old articles that were not included in their databases. This increases the number of articles. In contrast, companies sometimes remove previous articles from their databases for various reasons (e.g., infringing copyrights, protecting personal information), which decreases the number of articles.
Therefore, this study examined the reliability of the data and its method. Although the values in the datasets might change to some extent because of the updates of the databases, it was predicted that the values in the datasets would not differ too much, showing high reliability. If the values did differ, it would be necessary to know how they differ (extent and direction). For this purpose, I conducted a series of searches again following the previous study (Ogihara, 2024) and analyzed the data by comparing it with the previous data.
I followed the procedures of the previous study (Ogihara, 2024) and performed a series of searches in the three national newspapers in Japan over the 150 years between 1872 and 2021. For each year, I conducted a search without entering any words in the search box, which validly provided the number of articles for that year.
I conducted these searches in December 2023, one year after the original searches were conducted in December 2022 (Ogihara, 2024).
The databases used were Yomidas Rekishikan (ヨミダス歴史館; the database of the Yomiuri Shimbun; for details, see Ogihara, 2024; “Shimbun” means newspaper in Japanese), Kikuzo II Visual (聞蔵IIビジュアル; the database of the Asahi Shimbun), and Maisaku (毎索; the database of the Mainichi Shimbun). These newspapers have been the most popular national newspapers in Japan (the big three newspapers). Each of these databases consists of two parts: scanned image and text. Older newspapers are archived as images, while newer newspapers are archived as text.
First, I calculated the correlations in article counts between the previous dataset and the current dataset. If the reliability was high, the correlations should be high. Then, I counted the number of years when the numbers of articles differed between the two datasets. Finally, if there were differences between the previous and current values, I calculated the exact differences.
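The three analysis steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used in the study: the function names are hypothetical, the correlation is assumed to be a Pearson correlation (the text does not name the coefficient), and the yearly counts are made-up placeholders.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of counts."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_counts(previous, current):
    """Compare two {year: article_count} mappings over their shared years."""
    years = sorted(previous.keys() & current.keys())
    prev = [previous[y] for y in years]
    curr = [current[y] for y in years]
    # Step 3: exact difference (current minus previous) for years that differ.
    differences = {
        y: current[y] - previous[y] for y in years if current[y] != previous[y]
    }
    return {
        "r": pearson_r(prev, curr),                              # step 1
        "n_differing": len(differences),                         # step 2
        "rate_differing": 100.0 * len(differences) / len(years),
        "differences": differences,
    }


# Hypothetical yearly counts for illustration only (not real data).
previous = {2000: 1000, 2001: 1200, 2002: 1100, 2003: 1300}
current = {2000: 1000, 2001: 1201, 2002: 1100, 2003: 1298}
result = compare_counts(previous, current)
```

In this toy example half of the years differ, yet the correlation still rounds to 1.000 because the differences are tiny relative to the counts. That is why the number of differing years and the size of the differences are examined in addition to the correlations.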
Raw data is available on the Open Science Framework (OSF) platform (Table S1; Ogihara, 2025b).
The correlations between the article counts in the previous and current datasets are shown in Table 1. All the correlations were 1.000, demonstrating that the data and its generation method were reliable.
The numbers of years in which the article counts differed between the previous and current datasets are shown in Table 1 (raw data is available on the OSF: Table S2; Ogihara, 2025b). Overall, the numbers of such years were small (min: 0, max: 31), and their rates were low (min: 0%, max: 28.57%).
In two out of six datasets (text data in Yomiuri and scanned image data in Mainichi), the number of differences was 0, showing that in all years the numbers of articles were identical. In two out of six datasets (scanned image data and text data in Asahi), the number of differences was close to 0, indicating that in almost all years the numbers of articles were identical. In two out of six datasets (scanned image data in Yomiuri and text data in Mainichi), there were some years when the numbers of articles differed. However, the rates were small (lower than 30%).
The absolute and relative numbers of differences are presented in Figures 1 to 3. The absolute differences were calculated by subtracting the numbers in the past dataset from those in the current dataset. Thus, positive values mean that the number of articles increased, and negative values mean that the number of articles decreased over one year. The directions (positive or negative) were also examined and are presented in Table 1.
Figure 1 (Yomiuri). (A) Differences in absolute number of articles. (B) Differences in relative number of articles.
Note: The absolute differences were calculated by subtracting the numbers in the past dataset from those in the current dataset. Thus, positive values mean that the number of articles increased, and negative values mean that the number of articles decreased over one year.
The relative numbers of differences were calculated by dividing the absolute numbers of differences by the number of yearly articles in the current datasets. When the average of these relative numbers was calculated, to account for the existence of negative values (so that they would not cancel each other out), the absolute values of relative numbers were used (e.g., -0.001% was transformed into 0.001%).
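A minimal sketch of this calculation is shown below. The function names and counts are hypothetical, and the sketch assumes that the average is taken over the years in which a difference occurred; the text does not state whether years with no difference were included in the average.

```python
def relative_differences(previous, current):
    """Signed relative difference, in percent of the current count,
    for each shared year in which the two counts differ."""
    result = {}
    for year in sorted(previous.keys() & current.keys()):
        diff = current[year] - previous[year]  # absolute difference
        if diff != 0:
            result[year] = 100.0 * diff / current[year]
    return result


def mean_absolute(values):
    """Average of absolute values, so that signs do not cancel out."""
    values = list(values)
    return sum(abs(v) for v in values) / len(values)


def share_positive(values):
    """Percentage of values that are positive (count increased)."""
    values = list(values)
    return 100.0 * sum(v > 0 for v in values) / len(values)


# Hypothetical yearly counts for illustration only (not real data).
previous = {1990: 20_000, 1991: 25_000, 1992: 40_000}
current = {1990: 20_002, 1991: 24_999, 1992: 40_000}

rel = relative_differences(previous, current)
average = mean_absolute(rel.values())
positive_rate = share_positive(rel.values())
```

Taking absolute values before averaging matters here: a plain mean of the signed values would let the increase in 1990 and the decrease in 1991 partly offset each other and understate the typical size of a difference.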
Yomiuri. Differences were found only in the scanned image data (1874–1989), not in the text data (1986–2021). Overall, the differences were minor (the average was 0.003%), and most were positive (90.32%; Table 1 and Figure 1). Only in 1923 was the difference relatively large (65 articles; 0.183%). When this outlying value for 1923 was excluded from the aggregation, the average changed from 0.003% to 0.001%.
Asahi. Differences were found in the scanned image data (1879–1999) and the text data (1984–2021). The trends in these two datasets were similar. The differences were minor (the averages were 0.0001% and 0.00003%), and most were positive (100% and 66.67%; Table 1 and Figure 2).
Figure 2 (Asahi). (A) Differences in absolute number of articles. (B) Differences in relative number of articles.
Note: The absolute differences were calculated by subtracting the numbers in the past dataset from those in the current dataset. Thus, positive values mean that the number of articles increased, and negative values mean that the number of articles decreased over one year. Black bars indicate values in the scanned image dataset (1879–1999), and blue bars indicate values in the text dataset (1984–2021).
Mainichi. Differences were found only in the text data (1987–2021), not in the scanned image data (1872–1986). The differences were minor (the average was 0.001%), and all were negative (100%; Table 1 and Figure 3).
Figure 3 (Mainichi). (A) Differences in absolute number of articles. (B) Differences in relative number of articles.
Note: The absolute differences were calculated by subtracting the numbers in the past dataset from those in the current dataset. Thus, positive values mean that the number of articles increased, and negative values mean that the number of articles decreased over one year.
Previous research provided data on the number of yearly articles in the three national newspapers in Japan (Ogihara, 2024). The validity of this dataset and its generation method was confirmed. However, its reliability was unclear. Not only validity but also reliability should be confirmed. Therefore, the present article investigated the reliability of the data and its method.
I performed a series of searches once again and provided the new dataset of the number of articles in the three national newspapers in Japan. I followed the same procedure one year after the previous search was conducted (Ogihara, 2024).
I found very strong correlations in article counts between the previous and current datasets. Although there were some years when the numbers of articles differed between the two datasets, the differences were small. Therefore, the reliability of the data and its method was confirmed to be high. Taken together, both the validity and reliability of the data and its method were confirmed.
Moreover, this study archives the article counts as they stood at the time of the search and updates the previous dataset. As explained earlier and shown in this study, the numbers of articles in the databases can change over time. It is therefore important to record the information in the databases and to update it continuously, which this study does.
The author confirms being the sole contributor of this work and has approved it for publication.
This article does not contain any studies with human participants performed by any of the authors.
OSF: Newspaper article counts and their generation method are reliable, DOI: https://doi.org/10.17605/OSF.IO/GHU7M (Ogihara, 2025b).
This project contains the following underlying data:
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I am anchored in journalism studies, using quantitative research methods.
Invited reviewers: 1. Version 1: 13 Mar 25.