Circulation Reports
Online ISSN: 2434-0790

Performance Evaluation of Large Language Models With Retrieval-Augmented Generation in Cardiology Specialist Examinations in Japan
Hiromasa Hayama, Tu Hao Tran, Jin Kirigaya, Yosuke Katayama, Tomoko Negishi, Koya Ozawa, Kazuaki Negishi
Advance online publication

Article ID: CR-25-0094

Abstract

Background: Large language models (LLMs) have shown potential in medical education, but their application to cardiology specialist examinations remains underexplored. We compared the performance of a retrieval-augmented generation LLM (RAG-LLM), ‘CardioCanon’, with that of general-purpose LLMs.
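For readers unfamiliar with the technique, the sketch below illustrates the general RAG pattern of retrieving reference text and prepending it to the model's prompt. It is a minimal, hypothetical Python example: the corpus snippets and the names `retrieve` and `build_prompt` are illustrative assumptions only and do not describe the actual CardioCanon implementation, which is not given at code level in this abstract.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) prompt builder.
# All names and corpus contents are hypothetical, not from CardioCanon.

from collections import Counter

guideline_corpus = [
    "Class I indication: ICD for secondary prevention after resuscitated VF.",
    "Severe aortic stenosis: valve area <1.0 cm2, mean gradient >40 mmHg.",
    "Acute STEMI: primary PCI preferred within 120 min of first medical contact.",
]

def retrieve(question: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by token overlap with the question (stand-in for a vector index)."""
    q_tokens = Counter(question.lower().split())
    scored = [(sum(q_tokens[t] for t in doc.lower().split()), doc) for doc in corpus]
    return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

def build_prompt(question: str, options: list[str]) -> str:
    """Prepend retrieved guideline text so the model answers with domain context."""
    context = "\n".join(retrieve(question, guideline_corpus))
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return f"Context:\n{context}\n\nQuestion: {question}\nOptions:\n{numbered}\nAnswer:"

if __name__ == "__main__":
    print(build_prompt(
        "Which finding defines severe aortic stenosis?",
        ["Valve area <1.0 cm2", "Mean gradient >20 mmHg", "Valve area <2.0 cm2"],
    ))
```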

Methods and Results: A total of 96 publicly available, text-based, open-source multiple-choice questions from the Japanese Cardiology Specialist Examination (1997–2022) were used. CardioCanon showed option-level accuracy similar to that of ChatGPT-4o and Gemini 2.0 Flash (81.0%, 76.0%, and 77.2%, respectively), but higher case-based accuracy than ChatGPT-4o (57.3% vs. 29.2%, P<0.001).
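The distinction between the two metrics above can be made concrete with a short, hypothetical scoring sketch: option-level accuracy grades each answer option independently, whereas case-based accuracy credits a question only when every option is judged correctly. The grading data structure below is assumed for illustration; the study's actual scoring code is not published in this abstract.

```python
# Illustrative scoring of option-level vs. case-based accuracy (hypothetical data).

def option_level_accuracy(grades: list[list[bool]]) -> float:
    """Fraction of individual option judgments that are correct."""
    flat = [g for case in grades for g in case]
    return sum(flat) / len(flat)

def case_based_accuracy(grades: list[list[bool]]) -> float:
    """Fraction of questions for which every option judgment is correct."""
    return sum(all(case) for case in grades) / len(grades)

# Example: 3 questions with 5 options each, graded True/False per option.
grades = [
    [True, True, True, True, True],    # fully correct case
    [True, False, True, True, True],   # one wrong option -> case counts as incorrect
    [True, True, True, False, False],
]
print(option_level_accuracy(grades))  # 0.8
print(case_based_accuracy(grades))    # ~0.33
```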

Conclusions: RAG techniques can enhance AI-assisted examination performance by improving case-level reasoning and decision-making.

© 2025, THE JAPANESE CIRCULATION SOCIETY

This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
https://creativecommons.org/licenses/by-nc-nd/4.0/