
Major NHB study: who is more creative, humans or large language models?

2025-12-29 12:15:44 Source: PsyBrain 腦心前沿

PsyBrain 腦心前沿 | WeChat Official Account PSY-Brain_Frontier

Sharing frontier literature in cognitive neuroscience

Basic Information

Title: A large-scale comparison of divergent creativity in humans and large language models

Published: 2025.12.23

Journal: Nature Human Behaviour

Impact factor: 16.0

Full text: add the assistant account PSY-Brain-Frontier to obtain the PDF.

Background

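From Einstein reconceiving space and time to an infant recombining familiar objects, creativity has long been the crown jewel of human intelligence and a fundamental driver of progress in science, business and the arts. With the rapid development of artificial intelligence (AI) and natural language processing (NLP), and especially the arrival of large language models (LLMs) such as generative pre-trained transformers (GPT), we seem to stand at a new historical turning point: do machines already possess creativity that rivals, or even surpasses, that of humans?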

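Early studies suggest that LLMs can match, or even slightly outperform, humans on some creativity tests, such as the Alternative Uses Task (AUT). However, these conclusions often rest on the subjective judgments of human raters and are hard to replicate on large samples. More importantly, it remains unclear whether LLM "creativity" stems from a deep understanding of meaning or merely from probabilistically stringing words together.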

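A central unresolved controversy in the field is whether the "novelty" LLMs produce reflects genuine distributional diversity. To address this question, the study moved beyond the traditional paradigm of small samples and subjective ratings. Using the Divergent Association Task (DAT), it ran an unprecedentedly large comparison between nearly 10,000 human participants and 9 mainstream LLMs, including GPT-4, Claude 3 and DeepSeek-R1 (more than 210,000 observations in total). This is not only a contest between humans and machines but also an in-depth cognitive-science inquiry into the nature of creativity.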
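For readers unfamiliar with the task: in the DAT, participants name 10 words that are as unrelated to one another as possible, and a response is conventionally scored as the average pairwise semantic (cosine) distance between the word embeddings of the first 7 valid words, multiplied by 100. The sketch below illustrates that scoring rule under stated assumptions: the tiny 3-dimensional `toy_vectors` table is hypothetical and stands in for real pretrained embeddings (the original DAT used GloVe vectors), and the validity check is reduced to "the word has a vector and is not repeated". It is not the study's own scoring pipeline.

```python
import itertools
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(words, embeddings, n_words=7):
    """Mean pairwise cosine distance of the first n_words valid words, times 100.

    Returns None when fewer than n_words valid words are available.
    """
    valid = []
    for word in words:
        w = word.strip().lower()
        # Simplified validity check (an assumption): known to the table, not repeated.
        if w in embeddings and w not in valid:
            valid.append(w)
    valid = valid[:n_words]
    if len(valid) < n_words:
        return None
    pairs = list(itertools.combinations(valid, 2))
    total = sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs)
    return 100.0 * total / len(pairs)


# Hypothetical 3-d vectors standing in for real word embeddings.
toy_vectors = {
    "apple": [1.0, 0.0, 0.0],
    "pear": [0.9, 0.1, 0.0],
    "cloud": [0.0, 1.0, 0.0],
    "chair": [0.0, 0.0, 1.0],
}

# n_words=3 only because the toy table is tiny; the DAT convention is 7.
print(dat_score(["apple", "cloud", "chair"], toy_vectors, n_words=3))
print(dat_score(["apple", "pear", "cloud"], toy_vectors, n_words=3))
```

Semantically close pairs such as "apple" and "pear" pull the average distance down, which is why a list of mutually unrelated words scores higher.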

Summary of Core Findings

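Published in Nature Human Behaviour on 23 December 2025, the study uses rigorous computational modeling and large-scale data analysis to reveal fundamental differences between humans and LLMs in divergent creativity.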

Fig. 1 | Comparison of the divergent creativity scores between humans and LLMs.

Core finding 1: Similar means, but humans win decisively at the top end

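On overall mean score, humans (mean = 78.26) scored slightly higher than LLMs (mean = 77.90); the difference was statistically significant, but the effect size was small. The real difference lies in the shape of the distributions (second-order statistics): human scores showed significantly higher variance than LLM scores. LLM outputs are highly convergent, showing a kind of "mediocre stability", whereas the human distribution has a very long right-hand tail. The top human participants (top 10%) scored significantly higher than every leading model, including GPT-4 Turbo. This result strongly counters the claim that "AI has comprehensively surpassed human creativity" and indicates that on demanding, high-level creative tasks, the human cognitive advantage remains intact.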
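The point about second-order statistics can be made concrete with a small example: two groups can share exactly the same mean while differing sharply in variance and in how high their best scores reach. The scores below are invented for illustration and are not the study's data; `summarize` is a hypothetical helper that reports the mean, the sample variance and the mean of the top 10% of scores.

```python
import statistics


def summarize(scores, top_fraction=0.10):
    """Mean, sample variance and mean of the top `top_fraction` of scores."""
    ordered = sorted(scores, reverse=True)
    k = max(1, int(len(ordered) * top_fraction))  # size of the top group
    return {
        "mean": statistics.mean(scores),
        "variance": statistics.variance(scores),
        "top_mean": statistics.mean(ordered[:k]),
    }


# Invented scores: identical means, very different spread.
humans = [60, 70, 75, 78, 80, 82, 85, 88, 90, 95]
llm = [78, 79, 79, 80, 80, 80, 81, 81, 82, 83]

for name, scores in (("humans", humans), ("llm", llm)):
    print(name, summarize(scores))
```

A comparison of means alone would call these two groups equivalent; only the variance and the top-group mean reveal the longer right-hand tail in the first one.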

Fig. 2 | Comparison of divergent creativity scores across different temperature values for LLMs.

Core finding 2: Semantic homogeneity and lexical recycling in LLMs

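Using a bag-of-words analysis, the researchers found that although LLMs produced more valid words, their proportion of unique words was significantly lower than that of humans. LLMs tended to reuse the same word combinations across separate conversations (for example, repeatedly producing different permutations of "apple, cloud, chair"), indicating a lack of genuine lexical diversity. Humans, by contrast, draw on rich life and emotional experience to tap a broader semantic network and produce more heterogeneous associations.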
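One simple way to quantify this kind of lexical recycling is to pool all responses and compare the number of distinct words with the total number of words produced. This is a sketch under an assumption: the article does not specify how the "unique words" proportion was computed, so the hypothetical `unique_word_stats` helper uses distinct words divided by total words (a type-token ratio), and both response sets are made up.

```python
from collections import Counter


def unique_word_stats(responses):
    """Return (total words, distinct words, distinct / total) over all responses."""
    counts = Counter(word.strip().lower() for response in responses for word in response)
    total = sum(counts.values())
    distinct = len(counts)
    return total, distinct, distinct / total


# Hypothetical DAT-style responses, three words each for brevity.
llm_responses = [
    ["apple", "cloud", "chair"],
    ["chair", "apple", "cloud"],  # same words, different order
    ["cloud", "apple", "river"],
]
human_responses = [
    ["apple", "volcano", "sonnet"],
    ["glacier", "tax", "violin"],
    ["moss", "engine", "grief"],
]

print(unique_word_stats(llm_responses))
print(unique_word_stats(human_responses))
```

Both sets contain nine words, but reshuffling the same few items leaves the first set with far fewer distinct words, which is the pattern the bag-of-words analysis describes.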

Fig. 3 | Comparison of divergent creativity scores across different perspective prompts for LLMs.

Core finding 3: The limits and counterintuitive effects of prompt engineering

The study further explored the boundary conditions for improving LLM performance:

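Temperature: Raising the model's randomness parameter (temperature > 0.5) can increase DAT scores, but output quality then drops sharply, with many meaningless strings or nonexistent words (garbled responses). This suggests that the apparent "high creativity" of LLMs at extreme parameter settings is often statistical noise bought at the expense of semantic coherence.
Role-play failure: When LLMs were asked to play highly creative historical figures (e.g., "think like Einstein"), their performance actually fell below baseline.
Demographic simulation bias: When LLMs were asked to simulate people of different ages or genders, they failed to reproduce the real demographic differences observed in humans and in some cases showed the opposite trend.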
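The temperature effect follows from how sampling temperature is usually applied: the model's next-token logits are divided by the temperature T before the softmax, so T > 1 flattens the distribution and shifts probability toward low-ranked tokens, while T < 1 sharpens it. The sketch below shows only this standard scaling step with hypothetical logits; it does not model any particular LLM or API, which may combine temperature with other settings such as top-p sampling.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Probabilities from logits scaled by 1 / temperature (temperature > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for four candidate tokens, most to least likely.
logits = [4.0, 2.0, 0.0, -2.0]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs], round(entropy(probs), 3))
```

A flatter distribution makes unusual words more likely, which can raise distance-based scores, but the same extra probability mass also reaches malformed or nonsensical tokens.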

Fig. 4 | Comparison of divergent creativity scores across different celebrity prompts for LLMs.

Significance and Theoretical Contributions

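The study not only establishes a distinctive human advantage in top-tier (expert-level) creativity but also clarifies the best role for LLMs as assistive tools: LLMs can effectively raise the baseline level of creativity (a "floor-raiser") and suit routine divergent tasks, but in domains that demand deep semantic understanding and breakthrough thinking, human intuition and experience remain irreplaceable. Future cognitive neuroscience research should examine how cognitive load is allocated in this kind of human-machine collaboration, that is, how the systematic exploration capabilities of LLMs can be used to strengthen intuitive human creativity.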

Fig. 5 | Comparison of divergent creativity scores across different demographic prompts for LLMs.

Abstract

Human–machine partnerships are increasingly used to address grand societal challenges, yet knowledge of the comparative strengths of humans and machines to innovate is nascent. Here we compare the ability of humans (N = 9,198) and large language models (LLMs, N = 215,542 observations) to generate novel ideas in an established creativity task. We present three key results. First, human creativity on average is slightly higher than that of LLMs. Second, creativity differences are pronounced at the extremes of the distribution, with humans exhibiting greater variability and higher levels of creativity in the right-hand tail of the distribution. Third, attempts to increase the creativity of LLMs through instructing LLMs to take on genius personas or different demographic roles lifted performance up to a threshold beyond which the output became opposite real-life patterns, whereas strategic prompt-engineering efforts yielded mixed to negative results. We discuss the implications of our findings for human–machine collaboration and problem solving.

For the key figures, methodological details, statistical results and discussion, see the original paper and its extended data.

Shared by: 飯哥

Reviewed by: PsyBrain 腦心前沿 editorial team

Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.