2024 Volume 28 Issue 6 Pages 1263-1272
Serendipity-oriented recommender systems aim to counteract the overspecialization of recommendations to a user's existing preferences. However, evaluating whether a user finds a recommended item serendipitous is challenging because that response is largely emotional. In this study, we address this issue by leveraging the rich knowledge of large language models (LLMs), which can perform a wide variety of tasks. We first explore how closely the serendipity assessments made by LLMs align with those made by humans. To this end, the LLMs were given a binary classification task: predicting whether a user would find a recommended item serendipitous. The predictive performance of three LLMs was measured on a benchmark dataset in which humans provided the ground-truth serendipity labels. The experimental results revealed that the LLM-based assessment methods did not achieve a very high agreement rate with human assessments; however, they performed as well as or better than the baseline methods. Further validation indicated that the number of items from a user's rating history included in the LLM prompt should be chosen carefully to avoid both insufficient and excessive input, and that the outputs of the LLMs with high classification performance are difficult to interpret.
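The evaluation setup summarized above can be sketched as follows. This is a minimal illustration, not the authors' code: the prompt template, the `Rating` record, the history-length parameter `k`, the yes/no parsing rule, and the use of agreement rate, precision, recall, and F1 as metrics are all assumptions, and the LLM call itself is replaced by canned reply strings.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """One entry of a user's rating history (hypothetical schema)."""
    title: str
    score: float


def build_prompt(history: list[Rating], candidate: str, k: int) -> str:
    """Build a yes/no prompt from the user's k most recent ratings.

    The template is a hypothetical example; k controls how much of the
    rating history is exposed to the LLM (k <= 0 exposes none).
    """
    recent = history[-k:] if k > 0 else []
    lines = "\n".join(f"- {r.title}: {r.score:g}/5" for r in recent)
    return (
        "A user has rated the following items:\n"
        f"{lines}\n"
        f"Would this user find the recommended item '{candidate}' serendipitous? "
        "Answer Yes or No."
    )


def parse_answer(text: str) -> int:
    """Map a free-text LLM reply to a binary label (1 = serendipitous)."""
    return 1 if text.strip().lower().startswith("yes") else 0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compare binary predictions with human ground-truth labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    agreement = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"agreement": agreement, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    history = [Rating("Movie A", 4.0), Rating("Movie B", 5.0), Rating("Movie C", 2.0)]
    print(build_prompt(history, "Movie D", k=2))

    # Canned replies stand in for real LLM output (hypothetical data).
    replies = ["Yes.", "No", "no, unlikely", "YES", "Yes, probably"]
    human_labels = [1, 0, 1, 1, 0]
    predictions = [parse_answer(r) for r in replies]
    print(binary_metrics(human_labels, predictions))
```

Sweeping `k` over several values and recomputing the metrics is one simple way to probe the trade-off between too little and too much rating history described above.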