IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Information Processing, Software>
Identification and Classification of Research Data Cited in Scholarly Papers
Masaya Tsunokake, Shigeki Matsubara

2020 Volume 140 Issue 12 Pages 1357-1364

Abstract

This paper proposes a method for identifying and classifying the research data cited in scholarly papers, with the aim of automatically generating the metadata stored in data repositories. The study focuses on URL citations in scholarly papers: the goals are to identify the URLs that refer to research data and to classify them as tool or data. The method is formulated as multi-class classification (tool/data/others). It acquires distributed representations of the URLs from the context around them and uses these representations as input features. The advantage is that the meaning of a URL can be derived from its surrounding words. The study computes the meaning of an entire URL from the meanings of its components. To evaluate the proposed method, experiments on URL classification were conducted, using scholarly papers from the proceedings of an international conference as experimental data. The experimental results show that the proposed method is effective for identifying and classifying URLs that refer to research data.
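
As a rough illustration of composing the meaning of a whole URL from its components, the following Python sketch splits a URL into tokens, averages per-token vectors, and assigns one of the three labels. It is not the authors' implementation: the tokenization rule, the averaging step, the prototype-based scoring, and every numeric value (`TOY_VECTORS`, `PROTOTYPES`) are assumptions made up for this example. In the paper the representations are acquired from the context around each URL, which this sketch does not attempt to reproduce.

```python
import re
from urllib.parse import urlparse

# Hypothetical 2-d component vectors (axis 0: tool-like, axis 1: data-like).
# These values are invented for illustration, not learned from any corpus.
TOY_VECTORS = {
    "github": [0.9, 0.1],
    "toolkit": [0.8, 0.2],
    "corpus": [0.1, 0.9],
    "dataset": [0.2, 0.8],
}

# Hypothetical class prototypes; "others" is the fallback label.
PROTOTYPES = {"tool": [1.0, 0.0], "data": [0.0, 1.0]}


def url_components(url):
    """Split a URL into lowercase host and path tokens (scheme is dropped)."""
    parsed = urlparse(url)
    raw_parts = parsed.netloc.split(".") + parsed.path.split("/")
    tokens = []
    for part in raw_parts:
        # Break each part on any run of non-alphanumeric characters.
        tokens.extend(t for t in re.split(r"[^A-Za-z0-9]+", part) if t)
    return [t.lower() for t in tokens]


def compose(tokens, vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def classify(url):
    """Label a URL as tool, data, or others by dot product with the prototypes."""
    vec = compose(url_components(url), TOY_VECTORS, dim=2)
    if not any(vec):
        return "others"  # no known component carries any signal
    scores = {
        label: sum(a * b for a, b in zip(vec, proto))
        for label, proto in PROTOTYPES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for url in (
        "https://github.com/example/toolkit",
        "http://example.org/corpus/dataset.zip",
        "http://example.org/about",
    ):
        print(url, classify(url))
```

Averaging is only the simplest possible composition function, and the dot-product scoring stands in for whatever trained classifier consumes the URL representation.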

© 2020 by the Institute of Electrical Engineers of Japan