Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
A Written Child Corpus with Editing History Tags
Ryo Nagata, Ayako Kawai, Koji Suda, Junichi Kakegawa, Koichiro Morihiro

2010 Volume 17 Issue 2 Pages 2_51-2_65

Abstract
Corpora have played a crucial role in natural language processing and linguistics. However, very few corpora consist of children's writing, because of difficulties peculiar to creating child corpora. In this paper, we propose a method for avoiding these difficulties and creating a child corpus efficiently. To demonstrate its effectiveness, we used the proposed method to create a child corpus. The resulting corpus, called Kodomo Corpus, contains 39,269 morphemes and is the largest written child corpus. A distinguishing feature of Kodomo Corpus is that editing histories, such as additions and deletions, are traceable through its data tags.
© 2010 The Association for Natural Language Processing