IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Information Processing, Software>
Sample-based Collection and Adjustment Algorithm of Rules for Metadata Extraction on Business Documents
Toshiko Matsumoto, Mitsuharu Oba, Takashi Onoyama, Masanori Akiyoshi
JOURNAL FREE ACCESS

2011 Volume 131 Issue 8 Pages 1502-1511

Abstract
To ease the introduction of metadata-based document management systems, we propose an algorithm that takes sample documents and their manually specified metadata as training data and generates metadata-extraction rules. The algorithm enumerates candidate keywords and layout characteristics specific to the metadata, based on where the metadata occurs in the training data. It then examines whether each candidate is specific to only one kind of metadata. In an experiment on Japanese business documents and weekly reports, the automatically generated rules achieved metadata extraction as accurate as that of manually adjusted rules.
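The two steps named in the abstract (enumerating candidates, then keeping only those specific to one kind of metadata) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the rule format, so the sketch assumes that a candidate keyword is simply the token immediately preceding an annotated value, and it ignores layout characteristics entirely. The names `generate_rules` and `extract`, the token-list input format, and the sample invoices are all hypothetical.

```python
from collections import defaultdict


def generate_rules(samples):
    """Derive keyword rules from annotated sample documents.

    samples: list of (tokens, annotations) pairs, where annotations maps
    a token index to the kind of metadata found at that index.
    (Hypothetical input format; the paper's format is not given here.)
    """
    # Step 1: enumerate candidates. Assumption: the token just before
    # each annotated value is a candidate keyword for that kind.
    kinds_by_keyword = defaultdict(set)
    for tokens, annotations in samples:
        for index, kind in annotations.items():
            if index > 0:
                kinds_by_keyword[tokens[index - 1]].add(kind)

    # Step 2: keep only candidates that co-occur with exactly one kind.
    return {
        keyword: next(iter(kinds))
        for keyword, kinds in kinds_by_keyword.items()
        if len(kinds) == 1
    }


def extract(tokens, rules):
    """Apply the rules: the token after a known keyword is the value."""
    found = {}
    for i in range(1, len(tokens)):
        kind = rules.get(tokens[i - 1])
        if kind is not None and kind not in found:
            found[kind] = tokens[i]
    return found


# Two made-up training documents. "Date:" precedes two different kinds
# of metadata, so it is rejected as a rule in step 2.
samples = [
    (["Invoice", "No.", "A-17", "Date:", "2011-04-01", "Due:", "2011-04-30"],
     {2: "invoice_number", 4: "issue_date", 6: "due_date"}),
    (["No.", "B-02", "Date:", "2011-05-31", "Date:", "2011-05-01"],
     {1: "invoice_number", 3: "due_date", 5: "issue_date"}),
]
rules = generate_rules(samples)
result = extract(["No.", "C-99", "Date:", "2011-06-01", "Due:", "2011-06-30"], rules)
print(rules)
print(result)
```

In this toy run the ambiguous keyword "Date:" is discarded, so the new document yields an invoice number and a due date but no issue date. A fuller treatment would presumably also reject keywords that appear before non-metadata text and would add the layout-based candidates the abstract mentions.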
© 2011 by the Institute of Electrical Engineers of Japan