A Comparative Study of the Application of ChatGPT o1 and Claude 3.5 Sonnet in the Clinical Diagnosis and Treatment of Unicentric Castleman Disease of the Neck
DOI:
Author:
Affiliation:

Lanzhou University First Clinical Medical College

Author Biography:

Corresponding Author:

Fund Project:

National Natural Science Foundation of China (82260475); Major Science and Technology Innovation Project of the Gansu Provincial Health Industry (CSVSZD2024-02); Key Research and Development Project of the Gansu Provincial Science and Technology Plan (25YFWA028)


Abstract:

Objective: To compare how ChatGPT o1 and Claude 3.5 Sonnet answer common questions related to unicentric Castleman disease of the neck. Method: Thirty-six common questions about unicentric Castleman disease of the neck were designed and entered into ChatGPT o1 and Claude 3.5 Sonnet. A professor of otorhinolaryngology–head and neck surgery independently evaluated the responses generated by each model, assessing their readability, accuracy, content quality, understandability, and actionability. Result: Regarding readability, Claude 3.5 Sonnet generated significantly shorter responses across all categories (189.36 ± 69.09 vs. 381.56 ± 153.28 characters, P = 1.9 × 10⁻⁷), with lower reading scores (1.68 ± 5.64 vs. 11.20 ± 11.16, P = 1.4 × 10⁻⁷) and higher grade-level scores (54.93 ± 35.81 vs. 16.70 ± 2.03, P = 5.5 × 10⁻¹³). In terms of understandability and actionability, as measured by the Patient Education Materials Assessment Tool for Print Materials (PEMAT-P), Claude 3.5 Sonnet demonstrated higher overall understandability (0.38 ± 0.17 vs. 0.06 ± 0.05, P = 2.3 × 10⁻⁵) and actionability (0.25 ± 0.22 vs. 0.08 ± 0.09, P = 1.5 × 10⁻²). Nevertheless, ChatGPT o1 achieved a higher overall accuracy score (4.88 ± 0.28 vs. 4.58 ± 0.37, P = 2.2 × 10⁻³) and a superior quality score under the modified Evidence-based Quality of Information in Patient Education (EQIP) criteria (7.47 ± 1.28 vs. 5.75 ± 1.20, P = 2.2 × 10⁻⁹). Conclusion: Claude 3.5 Sonnet is superior in conciseness, understandability, and actionability, whereas ChatGPT o1 is stronger in accuracy, overall quality, and readability.
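
To make the reported metrics concrete, the sketch below shows how response length, readability scores, and between-model differences of the kind summarized above could be computed in Python. It is an illustrative sketch only, assuming the textstat and scipy packages and two hypothetical response lists; the paper does not specify which readability formulas or statistical test were actually used.

# A minimal sketch, NOT the authors' actual pipeline: score two sets of model
# responses for length and readability, then compare the groups with an
# independent two-sample t-test. The Flesch metrics (via the third-party
# `textstat` package), `scipy.stats.ttest_ind`, and the placeholder responses
# below are assumptions for illustration only.
from statistics import mean, stdev

import textstat
from scipy import stats

# Hypothetical placeholder answers; in the study each model answered 36 questions.
chatgpt_o1_responses = [
    "Unicentric Castleman disease of the neck is a rare lymphoproliferative disorder.",
    "Complete surgical excision of the involved lymph node is usually curative.",
]
claude_responses = [
    "It is a rare, benign lymph node disorder. Surgery usually cures it.",
    "Doctors remove the enlarged node. Most patients recover fully.",
]

def readability_metrics(responses):
    """Per-response character count, Flesch Reading Ease, and Flesch-Kincaid grade level."""
    return {
        "characters": [len(r) for r in responses],
        "reading_ease": [textstat.flesch_reading_ease(r) for r in responses],
        "grade_level": [textstat.flesch_kincaid_grade(r) for r in responses],
    }

gpt_scores = readability_metrics(chatgpt_o1_responses)
claude_scores = readability_metrics(claude_responses)

for metric in gpt_scores:
    a, b = gpt_scores[metric], claude_scores[metric]
    t_stat, p_value = stats.ttest_ind(a, b)  # compare the two models on this metric
    print(f"{metric}: ChatGPT o1 {mean(a):.2f} ± {stdev(a):.2f} vs "
          f"Claude {mean(b):.2f} ± {stdev(b):.2f} (P = {p_value:.2g})")

The PEMAT-P and EQIP instruments, by contrast, are scored manually by human raters, so those results cannot be reproduced from the response text alone.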

Article History
  • Received: 2024-12-25
  • Revised: 2025-03-01
  • Accepted: 2025-03-03
  • Published online: