Measuring linguistics of the wokototen chart made inductively by deciphering kunten materials

In this paper, we focus on wokototen markings, which are a system of kunten annotations used to facilitate the reading of classical Chinese documents by Japanese readers. Using digitized data, we performed basic measurements of wokototen by using a chart that summarizes the wokototen markings of actual kunten materials described by Hiroshi Tsukishima, and we quantitatively clarified their characteristics. Kunten materials are classical Chinese books with annotations, called kunten, on the Chinese text. The wokototen is a type of kunten. In ancient East Asian countries, kunten systems were developed as a way of directly annotating Chinese documents so that they could be read and understood by non-native readers. For this reason, kunten materials and kunten are treated as historical sources for linguistic and historical research. The shape and position of a wokototen marking determine what kind of reading it indicates. The results of our basic survey quantitatively show that almost all the wokototen charts in actual kunten materials contain particles represented by “te”, “ni”, and “wo”, that the most common shapes of wokototen are dots and shapes that can be written with a single stroke, such as ｜, ─, and ＼, and that the most common places to find these markings are to the right of characters in the horizontal direction and below characters in the vertical direction.


Introduction
Kunten is a Japanese system of text markings used to clarify the syntax and meaning of Chinese texts for Japanese readers. Previous research [1][2][3] investigated how these marks can be handled on a computer. In this paper, we focus on a kunten system called wokototen that is used to annotate classical Chinese texts (kanbun), and we perform basic measurements by digitizing wokototen charts (wokototen-zu) that inductively summarize the markings added to actual documents.
The texts we are studying consist of classical Chinese texts annotated with kunten markings. These texts were once widely used, from around the Heian period to the Edo period in Japan, to promulgate social, cultural, and academic ideas in East Asian countries, where kunten systems were developed as a way of directly annotating these documents so that they could be read and understood by non-native readers.
Wokototen is one such system that was developed in Japan. It consists of marks placed inside or around the Chinese characters to indicate features such as grammatical particles, auxiliary verbs, and the readings of kanji characters. There are various flavors of wokototen associated with different schools and different annotators, and wokototen charts are used as a means of inductively compiling the annotation marks used by different schools of wokototen or in different kunten documents. Documents annotated in this way have existed from the Heian period to the present day, and wokototen is mostly found in Chinese classics, Buddhist scriptures, and Japanese books of the Heian and Kamakura periods. Since the materials that are the subject of our research contain complex information, this information must be preserved intact when they are digitized for analysis purposes. We therefore digitized the data by using a dedicated structured description method. For this paper, particularly with regard to this digitized data, we performed basic measurements of wokototen by using a chart that summarizes the wokototen markings of actual kunten materials described by Hiroshi Tsukishima [4], and we quantitatively clarified their characteristics.

Target of research
Overview of wokototen
Wokototen markings are often found in kunten documents from the Heian and Kamakura periods, where they are used to indicate features such as grammatical particles, auxiliary verbs, and conjugated endings by means of variously shaped symbols such as dots (・), lines (｜), and hooks (└). These symbols can be placed at the four corners, inside, or around the strokes of a Chinese character, and their readings differ depending on their position and shape. For example, a dot to the upper right of a kanji has a different meaning from a line in the same position or a dot to the lower right of the character. To uniquely identify the meaning of a wokototen marking, we need to know both its position and its shape.
There are various types of wokototen associated with different eras and different schools. For example, in one school, a dot at the upper right corner of a Chinese character is read as wo, but in another school, it is read as koto.
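Because a reading is fixed only once the school, the shape, and the position are all known, a lookup keyed on all three is one natural way to model it. The following Python sketch is illustrative only: the school labels `school_A` and `school_B`, the position label, and the function name are hypothetical, and each table holds just the upper-right dot example mentioned above.

```python
from typing import Dict, Optional, Tuple

# (shape, position) -> reading, for one school.
Chart = Dict[Tuple[str, str], str]

# Hypothetical school labels; each chart holds only the example from the text.
CHARTS: Dict[str, Chart] = {
    "school_A": {("・", "upper right"): "wo"},
    "school_B": {("・", "upper right"): "koto"},
}


def lookup_reading(school: str, shape: str, position: str) -> Optional[str]:
    """Return the reading of a marking, or None if the chart has no entry."""
    return CHARTS.get(school, {}).get((shape, position))


print(lookup_reading("school_A", "・", "upper right"))  # wo
print(lookup_reading("school_B", "・", "upper right"))  # koto
```

The same dot in the same place yields different readings, which is why a chart has to be identified before a marking can be deciphered.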
Even today, researchers specializing in wokototen are working to decipher kunten materials and study their historical background. Their general research method involves visually deciphering the wokototen and kana annotations in kunten materials to create Japanese transcriptions from which the contents of the materials can be understood. As a by-product of this process, a wokototen chart is generated as a key to the markings used in the source material. A wokototen chart not only summarizes the wokototen markings in the actual kunten materials, but also includes information on how the researchers understood these materials.
As shown in Figure 1, a wokototen chart indicates the reading of each marking according to its shape and position in a square frame (called a tsubo) corresponding to the location of a Chinese character. A wokototen chart typically consists of multiple tsubo, with each tsubo containing multiple wokototen. A collection of these wokototen charts is called a tenzushu (点図集).
Wokototen charts can be classified into two types according to the process by which they were created. One is a comprehensive chart, which is a collection of the wokototen markings used by each school. The other is an inductive chart, which is a collection of the wokototen markings used in a particular body of kunten material. The measurements in this study were made using the latter type of chart. For this paper, we performed basic measurements on 199 types of wokototen charts contained in actual kunten materials as summarized by Hiroshi Tsukishima [4]. By way of comparison, we also present the results of measurements made using the former type of chart as described by Tomoaki Tsutsumi [1].

Digitization of wokototen chart data
The data used in this study were based on the wokototen chart data that Hiroshi Tsukishima created from kunten materials, which were registered as of June 2022 in the Wokototen charts Database provided by the National Institute for Japanese Language and Linguistics (NINJAL). Further information on the database can be found on the NINJAL website and in reference [3].
As mentioned above, wokototen markings are used to annotate Chinese characters. Each marking is characterized by its position, shape, and reading. In the NINJAL data, the 'reading' of each wokototen has been entered as Japanese text. The 'shape' of each wokototen has been entered by substituting a similar Unicode character. The 124 characters substituted in this database are shown in Table 1. The position of a wokototen is represented using a 7×7 square grid of cells with its origin at the center of a tsubo and the upper left and lower right corners at the coordinates (−3, −3) and (3, 3). Since wokototen markings are sometimes slightly separate from the kanji character, the center 5×5 square corresponds to the area occupied by the character, and coordinates in this region correspond to markings that overlap with the character. The outermost cells are used for positions that are separate from the kanji character. This method of digitizing the readings, positions, and shapes of wokototen is described in detail in reference [1].
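The coordinate convention described above can be sketched in a few lines of Python. This is a minimal illustration under the stated grid definition, not the encoding used by the database itself; the function name and the labels `"overlapping"` and `"outside"` are assumptions introduced here.

```python
def classify_position(x: int, y: int) -> str:
    """Classify a cell of the 7x7 tsubo grid.

    x grows to the right and y grows downward, so (-3, -3) is the upper
    left cell and (3, 3) is the lower right cell.
    """
    if not (-3 <= x <= 3 and -3 <= y <= 3):
        raise ValueError(f"({x}, {y}) is outside the 7x7 grid")
    # The outermost ring holds markings that are separate from the character.
    if max(abs(x), abs(y)) == 3:
        return "outside"
    # The central 5x5 square is the area occupied by the character.
    return "overlapping"


print(classify_position(2, 2))   # overlapping
print(classify_position(3, -3))  # outside
```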
This method of expressing the position of kunten in coordinates has also been used in Korean studies of gugyeol, [14][15][16][17][18] where a coordinate system of 5×5 squares is defined around each character. The methods used in the Korean studies and the current one share the same concept, but their data are not compatible.

Basic measurements for wokototen charts
In this study, we performed measurements on 199 types of wokototen charts (contained in actual kunten materials as summarized by Hiroshi Tsukishima [4]), relating to the reading, position, and shape of wokototen. A program written in C# was used for the measurements [19]. Excel 365 and Visual Studio Code (version 1.67) were used to view the data from the database, compare it with reference [4], and correct it. Visual Studio Community (version 17) was used to create the program.
This program reads comma-delimited data on the wokototen to be measured, one record per line. It is a simple program that outputs the counts of the "reading," "position," and "shape" of the wokototen. Each input line should be in the following format.
Material Title,Tsubo No,Reading,Shape,Position of X,Position of Y
We created this form of data by comparing data taken from the Wokototenzu database with the previous research [4]. In some cases, multiple readings were given for a single symbol in the wokototen chart. An example is the 'カ/ナ' in the top left-hand corner (-2, -2) of the fifth tsubo in Figure 1. In this case, we divided the symbol into two separate items, one for the wokototen shape 'L' reading 'カ' and the other for the wokototen shape 'L' reading 'ナ'. In the data created with this additional processing, the total number of target wokototen was 6411.
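The processing described above can be sketched as follows. This is a Python illustration, not the authors' C# program; the field names, function names, and the two sample lines are made up for the example, and it assumes that alternative readings are separated by "/" and that titles contain no commas.

```python
import csv
from collections import Counter
from io import StringIO
from typing import Dict, List

# Hypothetical field names following the six-column line format.
FIELDS = ["title", "tsubo", "reading", "shape", "x", "y"]


def read_records(text: str) -> List[Dict[str, str]]:
    """Parse comma-delimited lines, emitting one record per reading."""
    records = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = dict(zip(FIELDS, (field.strip() for field in row)))
        # A cell such as "カ/ナ" lists alternative readings for one symbol;
        # each reading becomes a separate item with the same shape.
        for reading in values["reading"].split("/"):
            records.append({**values, "reading": reading})
    return records


# Made-up sample lines, for illustration only.
SAMPLE = "Sample material,5,カ/ナ,L,-2,-2\nSample material,1,テ,・,-2,2\n"
records = read_records(SAMPLE)

reading_counts = Counter(r["reading"] for r in records)
shape_counts = Counter(r["shape"] for r in records)
position_counts = Counter((int(r["x"]), int(r["y"])) for r in records)

print(len(records))              # 3
print(shape_counts["L"])         # 2
print(position_counts[(-2, -2)])  # 2
```

Splitting before counting means that a symbol with two readings contributes to two reading totals, which matches the way the total of target wokototen is described above.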

Results for reading of wokototen
The readings of wokototen, such as "wo" and "koto," were examined to determine how many were found in the target wokototen charts. We found 303 types of readings. Next, the occurrences of each of these 303 readings in the wokototen charts were counted, and the top 10 types are shown in Table 2. This does not include 885 points for which no reading was noted.
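A tally of this kind can be sketched with a frequency counter. The Python below is an assumed illustration (the function name and the toy input are hypothetical, not data from the study): it counts the distinct readings, sets aside markings with no reading, and ranks the rest.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_readings(
    readings: Iterable[str], top_n: int = 10
) -> Tuple[int, int, List[Tuple[str, int]]]:
    """Return (number of reading types, markings without a reading, top list)."""
    readings = list(readings)
    counts = Counter(r for r in readings if r)       # ignore empty readings
    missing = sum(1 for r in readings if not r)      # markings with no reading
    return len(counts), missing, counts.most_common(top_n)


# Toy input, for illustration only.
types, missing, top = summarize_readings(["te", "te", "te", "ni", "ni", "wo", ""])
print(types)    # 3
print(missing)  # 1
print(top)      # [('te', 3), ('ni', 2), ('wo', 1)]
```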
Table 1. List of shapes used for digitisation.

Results for shapes of wokototen
We examined the shapes of wokototen markings such as ・ and ｜ to determine how many of each there were in the target wokototen charts. As a result, we found 124 different shapes. We then counted the occurrences of each of these shapes in the wokototen charts, and the top 10 shapes are shown in Table 3. Since the wokototen charts used in this study were hand-drawn, there were several shapes that were partially similar but with slight differences. In such cases, we counted the shapes as being of a single type. For example, the shapes 人 and 入 were grouped together as 人 and counted as the same shape.
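The grouping of near-identical hand-drawn shapes can be sketched as a normalization step applied before counting. In the Python below, only the 人/入 pair comes from the text; the mapping name, the function names, and the toy input are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable

# Maps a variant shape to the representative shape it is counted under.
# Only the 入 -> 人 entry is taken from the text; other pairs would be added here.
SHAPE_VARIANTS = {"入": "人"}


def normalize_shape(shape: str) -> str:
    """Return the representative shape for a possibly variant shape."""
    return SHAPE_VARIANTS.get(shape, shape)


def count_shapes(shapes: Iterable[str]) -> Counter:
    """Count shapes after merging variants into their representative."""
    return Counter(normalize_shape(s) for s in shapes)


# Toy input, for illustration only.
counts = count_shapes(["・", "・", "・", "人", "入", "｜"])
print(counts["人"])  # 2
print(counts["入"])  # 0
```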
Next, we examined the shapes used to indicate the three most frequent readings, which were "te", "ni", and "wo".As shown in Table 4, we found that these readings were represented by seven shapes: ・, \, /, |, ─, ◡, and :.The ・ shape was the most common.
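A cross-tabulation of readings against shapes, of the kind summarized in Table 4, can be sketched as below. The function name and the toy pairs are hypothetical; the sketch only assumes that each marking is available as a (reading, shape) pair.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def shapes_by_reading(
    pairs: Iterable[Tuple[str, str]],
    readings: Sequence[str] = ("te", "ni", "wo"),
) -> Dict[str, Counter]:
    """For each selected reading, count the shapes used to write it."""
    table = {reading: Counter() for reading in readings}
    for reading, shape in pairs:
        if reading in table:
            table[reading][shape] += 1
    return table


# Toy (reading, shape) pairs, for illustration only.
pairs = [("te", "・"), ("te", "・"), ("wo", "◡"), ("ni", "・"), ("ha", "｜")]
table = shapes_by_reading(pairs)
print(table["te"]["・"])  # 2
print(table["wo"]["◡"])  # 1
```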

Results for positions of wokototen
The results of measuring the locations of wokototen markings are shown in Table 5 and are depicted graphically in Figure 2. In Table 5, the area with the red background shows the position of the square cells representing the location of the kanji character.
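A position table of this kind can be built by accumulating counts into a 7×7 array. The Python sketch below assumes that rows run from y = −3 at the top to y = 3 at the bottom and columns from x = −3 on the left to x = 3 on the right; the function names and the toy coordinates are illustrative and are not taken from Table 5.

```python
from typing import Iterable, List, Tuple


def position_grid(positions: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Accumulate (x, y) coordinates into a 7x7 grid indexed as grid[row][col]."""
    grid = [[0] * 7 for _ in range(7)]
    for x, y in positions:
        if not (-3 <= x <= 3 and -3 <= y <= 3):
            raise ValueError(f"({x}, {y}) is outside the 7x7 grid")
        grid[y + 3][x + 3] += 1  # shift so that (-3, -3) maps to grid[0][0]
    return grid


def format_grid(grid: List[List[int]]) -> str:
    """Render the grid as right-aligned text, one row per line."""
    return "\n".join(" ".join(f"{n:3d}" for n in row) for row in grid)


# Toy coordinates, for illustration only.
grid = position_grid([(2, 2), (2, 2), (3, -3)])
print(format_grid(grid))
```

Cells with an index of 1 to 5 in both directions correspond to the area occupied by the character, and the outer ring corresponds to positions outside it.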

Reading
As Table 2 shows, there are many particles and auxiliary verbs such as "te", "ni", "wo", and "ha". The most common reading was "te", which had more appearances than the number of wokototen charts (199) because it sometimes appeared more than once in the same chart, and because there were only eight charts in which this reading was not mentioned. These were charts with an extremely small number of wokototen points. Examples include Scroll 1 of the Biography of Yang Xiong Zhuan (漢書楊雄傳; yellow markings, 5th level), which has only one wokototen marking; Scroll 3 of the Mohe Zhiguan (魔訶止觀; green markings), which has eight; and Scroll 1 of Huiguo Heshang Zhi Bei (惠果和上之碑文; black markings), which has two. In cases where the markings appear in different colors, such as red/vermillion (shuten), black (bokuten), and white (hakuten), the symbol "te" was marked in a different color. The same is true for "wo" and "ni", which explains why "te", "wo", and "ni" in particular are found in almost all wokototen charts.
A comparison with the results of a survey [1] of 26 major wokototen charts compiled by Nakata [8] and Tsukishima [4] for each school shows that there are differences in the types of readings that occur frequently. The most common readings in the major wokototen charts are "su", "naru", "nari", and "tari", but these occur less frequently in wokototen charts based on kunten materials. Of these readings, the most common was "su" with 166 occurrences, followed by "nari" with 149. The other readings "tari" and "naru" were the 22nd and 24th most frequent, with 80 and 77 occurrences, respectively.

Shapes
Table 3 shows that the overwhelming majority of wokototen markings are dots (・). There were six wokototen charts that did not contain dots. These were charts containing almost no wokototen, as in the Biography of Yang Xiong Zhuan, and charts where the \ shape was used for the first tsubo instead of a dot, as in Scroll 1 of Tōdaiji Fujumonkō (東大寺諷誦文稿) and Scroll 8 of Myōhō Rengekyō (妙法蓮華経; the Lotus Sutra). Including the dot marking (・), which appeared most frequently, the wokototen shapes ・, |, ─, and \ that appeared more than 500 times are shapes that can be written with a single stroke. More complex shapes appeared less often, and the same trend was observed in the measurement results of the 26 main wokototen charts.
The results in Table 4 also show that readings with the highest number of appearances are often denoted by a single dot.
Although other shapes are sometimes used, it is safe to say that these readings are almost always denoted by a dot in all schools. This is thought to be because particles that are important in reading Chinese texts as Japanese are added so often that they are expressed in the simplest way, which is a single dot. In some cases, "wo" is represented using the ◡ shape, and these were the wokototen charts belonging to the fourth group in the classification according to Tsukishima [4].

Position
From Table 5, it can be seen that wokototen are often drawn at the four corners and the center of a character. The most common location was the lower right corner (2, 2), where 709 wokototen were found. In the four corners of characters, where wokototen markings are often placed, markings were more commonly found in the lower corners in the vertical direction, and on the right side in the horizontal direction.
Next, we examined the wokototen markings placed around the outside of the characters. More markings were found on the right side than on the left side. For example, there were 129 markings in the upper right external position (3, −3), but only one in the upper left external position (−3, −3). The most common positions of wokototen markings around the outside of a character were the top, middle and bottom on the outer right side, with the largest number (169) appearing in the middle at coordinates (3, 0). These results show that wokototen, like furigana, tend to be written to the right side of each character.
The above trends suggested by our measurements differ from the results obtained from the 26 main wokototen charts, where there was no difference in the placement of markings between the left, right, upper and lower positions. In addition, measurements of the 26 main wokototen charts showed that few of these markings are placed outside the characters, while in the actual materials it can be seen that many wokototen markings are placed to the right of each character. These differences may represent a trend caused by the actual addition of markings to kunten materials.

Conclusion
We have conducted basic measurements of wokototen markings as described by Hiroshi Tsukishima [4], who compiled a chart summarizing the wokototen markings applied to actual kunten materials. As a result, we have quantitatively demonstrated that almost all wokototen charts in actual kunten materials contain particles represented by "te", "ni", and "wo". Our results also show that the most common shape used for wokototen is ・, and that markings with shapes of greater complexity are used in fewer documents. We found that wokototen are most often placed at the bottom and right of characters, and that when they are placed outside the characters, they appear mostly to the right side of them, showing the same tendency as that of Japanese annotations such as furigana.
In the present analysis of the wokototen charts, it has been possible to identify an overall trend. However, it cannot be determined whether all individual kunten materials show a similar trend. Further research into the content of individual kunten materials will be required in the future.
In the future, we plan to use the information revealed by these basic measurements to develop a system that will help researchers to guess and present the locations of wokototen markings when deciphering kunten materials, and to automatically generate written transcriptions to help readers understand these materials. Another future task will be to extract the characteristics of wokototen by using large amounts of data, which have been difficult to compare and study using conventional manual research methods.

Kristina Hmeljak Sangawa
University of Ljubljana, Ljubljana, Slovenia
The article describes a system for digitising vernacular reading glosses of the wokototen type in classical Chinese texts, presents the results of quantitative measurements of the readings, shapes and positions of such glosses in summaries (compiled by H. Tsukishima) of wokototen markings used in actual kunten materials, and compares these findings with data from 26 wokototen 'comprehensive' charts of different schools compiled by Nakata and Tsukishima.
Not surprisingly, the results show that simple shapes (dots or lines that can be written with one stroke) are the most common markings, with dots ('ten' in 'wokototen' literally means 'dot') being by far the most common; that the most frequent readings are marked by dots, and that most markings appear at the centre or at the four corners overlapping the character they mark. It was also found that markings not overlapping characters are most commonly found on the right of the character, as is still the case for phonetic guides (furigana) in contemporary Japanese. Some differences were found between the summaries of wokototen glosses in kunten materials and the 26 wokototen charts, with regard to the most frequent readings and the most frequent position of markings.
The quantitative findings are not surprising, but do empirically confirm common observations. As the authors note in their conclusion, the most important contribution of this research is the coding and processing method used to digitise wokototen markings, described in detail and complete with code that can be used in the future to extend the digitisation and semi-automatic analysis to other material.
One further important aspect of this article is that it is written in English. The majority of published research on Japanese vernacular readings of classical Chinese texts is written in Japanese, and is therefore not accessible to scholars from other areas of research such as the history of writing, language contact etc., who may not be versed in Japanese. This is also reflected in the list of references of this article, which only includes sources in Japanese (the first three in the list, previous papers by the same authors, are quoted with their English titles, but are actually papers published in Japanese with only an English title and abstract). Given the usefulness of the proposed method for digitisation and processing of text with complex markings and glosses, the fact that it is written in English makes it accessible to a wider audience and thus more likely to be used and built upon.
The terminology used follows the tradition of Japanese research on this topic and mostly consists of Japanese terms in romanised form. This is somewhat surprising, considering that two of the authors are also co-authors of a thoroughly explained proposal for English terminology to be used in research on vernacular readings of classical Chinese texts (Whitman et al. 2010, http://conf.ling.cornell.edu/whitman/WhitmanAlberizziTsukimotoKosukegawa2010Toward.pdf), where they propose using self-explanatory English translations instead of romanised renditions of Japanese terms, such as "vernacular reading" instead of kundoku or "gloss" instead of kunten. All terms in this article are, however, clearly explained and thus accessible also to readers who may not be familiar with the Japanese terminology.
I have a few minor formal suggestions for a possible revised version of the article.
The reference list would be more reader-friendly in a more consistent and complete format, e.g. following the Generic style rules for linguistics [www.eva.mpg.de/linguistics/past-researchresources/resources/generic-style-rules/]. Specifically, author names would be more easily searchable and less ambiguous if given in both romanized form and Japanese script, and titles more accessible if quoted consistently with both the romanised form and the original Japanese script, followed by an English translation. In the present version of the article, works with English alternative titles (1, 2, 3) are cited only in English, some (6, 8, 10, 11, 12, 15, which originally have no English title) are cited in Japanese script and with an English translation of the title probably provided by the authors, without a romanised transcription of the Japanese title, while other titles (4, 5, 7, 13, 14, 16, 17, 18) are given in romanised form and Japanese script, but without an English translation of the title. For example, the following citation:
Notwithstanding these suggestions for very minor corrections, the article is an important contribution to the study of Japanese vernacular readings of classic Chinese texts, and a welcome introduction of the proposed digitisation method to an international audience.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: History of Japanese writing
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
International Institute for Digital Humanities, The University of Tokyo, Tokyo, Japan
I acknowledge the potential significance of the research presented in this paper, which offers an insightful foundation for discussing matters related to pre-modern international exchange and translation. The findings are compelling, contingent upon further corroboration through subsequent research.
However, having perused the manuscript in detail, I wish to raise two points for your kind consideration and, if possible, rectification.
Firstly, our attention was drawn to the sequence of "te", "wo", and "ni".As per the frequency order established in your study, they should be listed as "te", "wo", and "ni".However, we have observed instances where the sequence appears as "te", "ni", and "wo".Unless there is a specific reason for this inconsistency, we recommend standardizing the order for improved coherence.
Secondly, I noticed that the analysis carried out in this study could arguably be conducted without the use of digital technology. In order to enhance the position of your paper within the landscape of existing research and amplify its overall value, it would be beneficial to offer a clearer context. Specifically, it would be helpful if you could clarify whether no such analysis existed prior to the advent of digital technology, or if there was any similar analysis previously. If the latter is not the case, kindly offer a rationale for the absence of such analyses in the past. This contextual detail will further enrich the relevance of your work within the wider field of study.
I request your thoughtful consideration of these points.

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Pádraic Moran
Department of Classics (School of Languages, Literatures and Cultures), University of Galway, Galway, County Galway, Ireland
This article is a valuable study of wokototen annotations, part of the Japanese kunten system of annotating Chinese/Sinitic texts. Its value lies in its computational approach. By counting the large number of complex wokototen systems collected in Tsukishima 1986, the authors demonstrate very clear distribution patterns, which present new insights into Japanese annotation practices. The information is clearly and carefully presented, and the implications for future research (presented in the Conclusion section) are very significant. This article will be of interest to specialists in kunten. It should also interest researchers on glossing more generally, since the computational approach adopted here could be a model for specialists in other traditions to follow.
1) I saw just one significant error:
p. 5: 'Examples are shown in the 'カ/ナ' in the top left-hand corner (-2, -2) of the fifth tsubo in Figure 1.' This appears to be the third tsubo (middle left), not the fifth. (Also 'カ/ナ' in the text is confusing, since the figure has the reverse: ナ/カ. Recommend to change the text accordingly.)
2) I have two recommendations:
p. 5, table 1: Several shapes are followed by a sign ↑ or ↓. These seem to require explanation.
p. 7, figure 2: Although the figure is extremely helpful, it is a little hard to read. The Y-axis runs from +3 on top to -3 on the bottom. However, the corresponding table 5 has -3 on top and +3 on the bottom. This is more intuitive, since -3 represents the top of a character. Recommend to reverse the Y-axis in figure 2.
3) The following minor errors should be corrected:

Brian Steininger
Department of East Asian Studies, Princeton University, Princeton, New Jersey, USA
The article presents a format for digitizing data on vernacular reading glosses in the okototen format, applied to classical Chinese texts in premodern Japan. Using this methodology, the authors compare data from "comprehensive charts" (prescriptive instructions for producing glosses kept by a particular school) and "inductive charts" (diagrams of actual glosses found in primary sources made by contemporary researchers; in this case, the work of Tsukishima Hiroshi). This is an important distinction. The statistical results of the survey contain few major surprises, though there is a somewhat interesting discrepancy in regard to the use of the space to the right of the character. As indicated in the conclusion, the major importance of this research is as foundational work building towards essential new tools for digitization and semi-automated analysis of kunten materials.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Reception of Chinese literature in early and medieval Japan
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Example of a wokototen chart for Kobunshōsho (古文尚書), an important cultural property held at the Toyo Bunko Museum in Tokyo, made by Teiji Kosukegawa.

Table 4. Number of shapes used to represent the readings "te", "ni", and "wo".

Table 5. Number of wokototen by location.

Open Peer Review Current Peer Review Status: Version 1
https://doi.org/10.5256/f1000research.144061.r186342
© 2023 Hmeljak Sangawa K. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

An example is shown'
'into two separate totals. The one of wokototen of the shape 'L' reading 'カ' and the other of wokototen of the shape 'L' reading 'ナ.' > 'into two separate items, one for the wokototen shape 'L' reading 'カ' and the other for the wokototen shape 'L' reading 'ナ'.'
Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Comparative glossing (annotation) practices generally, specialising in the European Latin tradition, but with an interest in Japanese glossing. Digital Humanities and computational analysis of texts.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.