To help authors efficiently choose where to submit a new paper, this study developed a model that predicts the most suitable journal from content inferred from each article's title. First, ChatGPT was used to infer the content of articles published in four different journals from their titles. Visualizing the results revealed four distinct clusters of articles: journals A, B, and D each predominantly published papers from a single cluster, whereas journal C published papers from every cluster in a more balanced way. The authors then applied machine learning to build a predictive model for identifying the optimal journal for submission, using each paper's inferred content as the explanatory variable and the journal in which it appeared as the dependent variable. The model achieved high predictive accuracy, with an AUC (area under the curve) of approximately 0.88.
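The setup described above (text content as the explanatory variable, journal choice as the dependent variable, evaluated by multiclass AUC) can be sketched as follows. This is a minimal illustration, not the authors' code: the journal names match the abstract, but the "inferred content" strings are invented stand-ins, and TF-IDF with logistic regression is one plausible modeling choice among many.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical topic vocabularies per journal; each generated string stands in
# for the content ChatGPT would infer from one article's title.
topics = {
    "A": ["deep learning image recognition", "neural network model training"],
    "B": ["survey questionnaire higher education", "classroom outcomes assessment"],
    "C": ["text mining corpus analysis", "machine translation quality evaluation"],
    "D": ["library catalog metadata records", "digital archive preservation policy"],
}
texts, labels = [], []
for journal, phrases in topics.items():
    vocab = phrases[0].split() + phrases[1].split()
    for _ in range(40):
        texts.append(" ".join(rng.choice(vocab, size=5)))
        labels.append(journal)

# Explanatory variable: TF-IDF of the inferred content; target: the journal.
X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)
# One-vs-rest macro AUC, the multiclass analogue of the abstract's metric.
auc = roc_auc_score(y_te, proba, multi_class="ovr")
print(f"multiclass AUC: {auc:.2f}")
```

On this toy data the classes are nearly separable, so the AUC is high; the reported 0.88 refers to the authors' real data, not this sketch.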
Bidirectional Encoder Representations from Transformers (BERT) is a general-purpose language model designed to be pre-trained on a large corpus and then fine-tuned for tasks in individual fields. Japanese BERT models have been released that are trained on relatively easy-to-obtain corpora: Wikipedia, Aozora Bunko, and Japanese business news articles. In this study, we compared the performance of multiple BERT models built from different pre-training corpora on an author attribution task and analyzed the impact of the pre-training data on individual tasks. We also studied ensemble learning with multiple BERT models as a way to improve the accuracy of author attribution. We found that a BERT model pre-trained on the Aozora Bunko corpus performed well at attributing authors within Aozora Bunko, which clearly shows that the pre-training data affects model performance on individual tasks. We also found that an ensemble learning architecture comprising multiple BERT models outperformed any single model.
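The ensemble idea can be illustrated with soft voting, one common way to combine classifiers (the abstract does not specify the authors' exact combination scheme). Each fine-tuned BERT model outputs class probabilities over candidate authors; the ensemble averages them and takes the argmax. The probability arrays below are invented stand-ins for the real models' softmax outputs.

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-model class probabilities and pick the argmax author."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg, avg.argmax(axis=1)

# Hypothetical outputs of three BERT models (pre-trained on Wikipedia,
# Aozora Bunko, and news text) for 2 texts over 3 candidate authors.
p_wiki   = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_aozora = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])
p_news   = np.array([[0.4, 0.4, 0.2], [0.3, 0.4, 0.3]])

avg, pred = soft_vote([p_wiki, p_aozora, p_news])
print(pred)  # → [0 1]: predicted author index per text
```

Soft voting lets a model that is confident and correct (here the Aozora-trained one) pull the ensemble toward the right author even when another model is uncertain.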
Conventional story analysis targets a specific genre, and little research has quantitatively clarified the differences between genres. In this study, to enable cross-genre analysis of story structure, we collected more than 1,500 episodes across five genres that frequently appear in contemporary Japanese entertainment works (adventure, combat, love, detective, and ghost story) and constructed a dataset in which all genres are structurally analyzed within a common framework, making them directly comparable. Based on this dataset, factor analysis identified plot factors common to all genres as well as factors unique to each, and cluster analysis extracted the structural characteristics of sub-genres. Because the characteristics of each genre can now be compared by the same criteria, we expect this work to support the future analysis and automatic generation of stories that combine multiple genres.
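The two analysis steps, factor analysis to extract latent plot factors followed by clustering in factor space to surface sub-genre structure, can be sketched as below. This is not the authors' pipeline: the feature matrix is random stand-in data, whereas the real input would be the structural plot annotations from the dataset, and the choice of 5 factors and 5 clusters is an assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in matrix: 1500 episodes x 20 plot-structure features.
X = rng.normal(size=(1500, 20))

# Step 1: factor analysis reduces the features to latent story factors.
fa = FactorAnalysis(n_components=5, random_state=0)
scores = fa.fit_transform(X)          # per-episode factor scores

# Step 2: cluster episodes in factor space to look for sub-genres.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores)
print(scores.shape, np.unique(km.labels_).size)
```

On real annotations, the factor loadings (`fa.components_`) would indicate which plot features define each shared or genre-specific factor, and the cluster assignments would group episodes into candidate sub-genres.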