Kana-to-Kanji Conversion Method Using a Markov Chain Model of Kana Words

加藤 省三; 荒木 睦大; 小越 康宏; 谷口 秀次; 森 幹男

doi:10.1541/ieejeiss.130.1054

Abstract

Kana-to-kanji conversion involves two kinds of processing: first, detecting word boundaries in an unsegmented kana string, and second, selecting candidate kanji-kana words. From the standpoint of these two steps, kana-to-kanji conversion methods fall mainly into two types. One performs both steps simultaneously (Method-A). The other performs them sequentially (Method-B): it first detects the boundaries of kana words using a Markov chain model of kana words, then converts the kana words into kanji-kana words and selects the most likely candidates using a Markov chain model of kanji-kana words. This paper evaluates both methods using second-order Markov chain models of words. In experiments based on statistical data from a Japanese daily newspaper, Method-A and Method-B are compared in terms of conversion accuracy, conversion processing time, and memory requirements. The results show that Method-B is superior to Method-A in processing time and memory requirements, and that it is effective for kana-to-kanji conversion of bunsetsu (Japanese phrasal units).
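The sequential pipeline of Method-B can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny lexicon, candidate lists and trigram probabilities are hypothetical toy values, unseen trigrams receive a fixed floor probability in place of a real smoothing scheme, and the helper names (`segment`, `convert`, `logp`) are illustrative. Both stages score a word sequence with a second-order Markov chain, i.e. each word w_i is scored by P(w_i | w_{i-2}, w_{i-1}).

```python
import math
from functools import lru_cache

BOS = "<s>"    # sentence-start padding for the 2nd-order context
FLOOR = 1e-6   # hypothetical probability for unseen trigrams (stand-in for real smoothing)

# Toy, hand-made models for illustration only (not data from the paper).
KANA_LEXICON = frozenset({"かん", "かんじ", "じ", "に", "へん", "へんかん"})
KANA_TRIGRAM = {  # P(kana word | two previous kana words)
    (BOS, BOS, "かんじ"): 0.3,
    (BOS, "かんじ", "に"): 0.4,
    ("かんじ", "に", "へんかん"): 0.3,
}
CANDIDATES = {  # kana word -> candidate kanji-kana words
    "かんじ": ["漢字", "感じ", "幹事"],
    "に": ["に"],
    "へんかん": ["変換", "返還"],
}
KANJI_TRIGRAM = {  # P(kanji-kana word | two previous kanji-kana words)
    (BOS, BOS, "漢字"): 0.2,
    (BOS, BOS, "感じ"): 0.3,
    (BOS, "漢字", "に"): 0.3,
    (BOS, "感じ", "に"): 0.1,
    ("漢字", "に", "変換"): 0.5,
}


def logp(trigram, u, v, w):
    """Log of P(w | u, v), falling back to FLOOR for unseen trigrams."""
    return math.log(trigram.get((u, v, w), FLOOR))


def segment(kana, lexicon, trigram):
    """Stage 1: most likely split of an unsegmented kana string into kana words."""
    n = len(kana)
    max_len = max(map(len, lexicon))

    @lru_cache(maxsize=None)
    def best(pos, u, v):
        # Best (log score, words) for kana[pos:] given the two previous words u, v.
        if pos == n:
            return 0.0, ()
        result = (-math.inf, ())
        for end in range(pos + 1, min(n, pos + max_len) + 1):
            w = kana[pos:end]
            if w not in lexicon:
                continue
            tail_score, tail = best(end, v, w)
            score = logp(trigram, u, v, w) + tail_score
            if score > result[0]:
                result = (score, (w,) + tail)
        return result

    return best(0, BOS, BOS)


def convert(kana_words, candidates, trigram):
    """Stage 2: Viterbi search for the most likely kanji-kana word sequence."""
    # A trigram model only looks back two words, so (prev2, prev1) is a sufficient state.
    states = {(BOS, BOS): (0.0, ())}
    for k in kana_words:
        new_states = {}
        for (u, v), (score, seq) in states.items():
            for w in candidates.get(k, [k]):  # unknown kana words pass through unchanged
                s = score + logp(trigram, u, v, w)
                if (v, w) not in new_states or s > new_states[(v, w)][0]:
                    new_states[(v, w)] = (s, seq + (w,))
        states = new_states
    return max(states.values())


if __name__ == "__main__":
    _, words = segment("かんじにへんかん", KANA_LEXICON, KANA_TRIGRAM)
    print(words)            # ('かんじ', 'に', 'へんかん')
    _, output = convert(words, CANDIDATES, KANJI_TRIGRAM)
    print("".join(output))  # 漢字に変換
```

In this sketch the kanji-kana model is consulted only for the candidates of a single, already fixed segmentation, which is one plausible reason a sequential design can save time and memory; the abstract does not describe the authors' actual search or smoothing procedure.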
