To help authors efficiently choose where to submit a new paper, this study developed a model that predicts the most suitable journal from content inferred from each article's title. First, ChatGPT was used to infer the content of articles published in four different journals from their titles. Visualizing the results revealed four distinct clusters of articles: journals A, B, and D each predominantly published papers from a single cluster, whereas journal C published papers from every cluster in a more balanced way. The authors then applied machine learning to build a predictive model for identifying the optimal journal for submission, using each paper's inferred content as the explanatory variable and the journal in which it appeared as the dependent variable. The model achieved high predictive accuracy, with an AUC (area under the curve) of approximately 0.88.
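The setup described above (text content as the explanatory variable, journal choice as the dependent variable, evaluated by multiclass AUC) can be sketched as follows. This is a minimal illustration, not the authors' code: the journal names match the abstract, but the "inferred content" strings are invented stand-ins, and TF-IDF with logistic regression is one plausible modeling choice among many.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical topic vocabularies per journal; each generated string stands in
# for the content ChatGPT would infer from one article's title.
topics = {
    "A": ["deep learning image recognition", "neural network model training"],
    "B": ["survey questionnaire higher education", "classroom outcomes assessment"],
    "C": ["text mining corpus analysis", "machine translation quality evaluation"],
    "D": ["library catalog metadata records", "digital archive preservation policy"],
}
texts, labels = [], []
for journal, phrases in topics.items():
    vocab = phrases[0].split() + phrases[1].split()
    for _ in range(40):
        texts.append(" ".join(rng.choice(vocab, size=5)))
        labels.append(journal)

# Explanatory variable: TF-IDF of the inferred content; target: the journal.
X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, random_state=0, stratify=labels
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)
# One-vs-rest macro AUC, the multiclass analogue of the abstract's metric.
auc = roc_auc_score(y_te, proba, multi_class="ovr")
print(f"multiclass AUC: {auc:.2f}")
```

On this toy data the classes are nearly separable, so the AUC is high; the reported 0.88 refers to the authors' real data, not this sketch.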
Bidirectional Encoder Representations from Transformers (BERT) is a general-purpose language model designed to be pre-trained on a large corpus and then fine-tuned for tasks in individual fields. Japanese BERT models have been released that are trained on relatively easy-to-obtain corpora: Wikipedia, Aozora Bunko, and Japanese business news articles. In this study, we compared the performance of multiple BERT models built from different pre-training corpora on an author attribution task and analyzed the impact of the pre-training data on individual tasks. We also studied ensemble learning with multiple BERT models as a way to improve the accuracy of author attribution. We found that a BERT model pre-trained on the Aozora Bunko corpus performed well at attributing authors within Aozora Bunko, which clearly shows that the pre-training data affects model performance on individual tasks. We also found that an ensemble learning architecture comprising multiple BERT models outperformed any single model.
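The ensemble idea can be illustrated with soft voting, one common way to combine classifiers (the abstract does not specify the authors' exact combination scheme). Each fine-tuned BERT model outputs class probabilities over candidate authors; the ensemble averages them and takes the argmax. The probability arrays below are invented stand-ins for the real models' softmax outputs.

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-model class probabilities and pick the argmax author."""
    avg = np.mean(np.stack(prob_list), axis=0)
    return avg, avg.argmax(axis=1)

# Hypothetical outputs of three BERT models (pre-trained on Wikipedia,
# Aozora Bunko, and news text) for 2 texts over 3 candidate authors.
p_wiki   = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_aozora = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])
p_news   = np.array([[0.4, 0.4, 0.2], [0.3, 0.4, 0.3]])

avg, pred = soft_vote([p_wiki, p_aozora, p_news])
print(pred)  # → [0 1]: predicted author index per text
```

Soft voting lets a model that is confident and correct (here the Aozora-trained one) pull the ensemble toward the right author even when another model is uncertain.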
Conventional story analysis targets a specific genre, and little research has quantitatively clarified the differences between genres. In this study, to enable cross-genre analysis of story structure, we collected more than 1,500 episodes across five genres that frequently appear in contemporary Japanese entertainment works (adventure, combat, love, detective, and ghost story) and constructed a dataset in which all genres are structurally analyzed within a common framework, making them directly comparable. Based on this dataset, factor analysis identified plot factors common to all genres as well as factors unique to each, and cluster analysis extracted the structural characteristics of sub-genres. Because the characteristics of each genre can now be compared by the same criteria, we expect this work to support the future analysis and automatic generation of stories that combine multiple genres.
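The two analysis steps, factor analysis to extract latent plot factors followed by clustering in factor space to surface sub-genre structure, can be sketched as below. This is not the authors' pipeline: the feature matrix is random stand-in data, whereas the real input would be the structural plot annotations from the dataset, and the choice of 5 factors and 5 clusters is an assumption for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in matrix: 1500 episodes x 20 plot-structure features.
X = rng.normal(size=(1500, 20))

# Step 1: factor analysis reduces the features to latent story factors.
fa = FactorAnalysis(n_components=5, random_state=0)
scores = fa.fit_transform(X)          # per-episode factor scores

# Step 2: cluster episodes in factor space to look for sub-genres.
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores)
print(scores.shape, np.unique(km.labels_).size)
```

On real annotations, the factor loadings (`fa.components_`) would indicate which plot features define each shared or genre-specific factor, and the cluster assignments would group episodes into candidate sub-genres.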