
      Revisiting the classic "The Bitter Lesson": it foresaw everything from GPT to o1/R1 to Manus, and more...



      This is essential reading for AI practitioners.

      This weekend, let's revisit "The Bitter Lesson", a classic published in 2019 whose predictions have all come true. Its author, Rich Sutton, is the father of modern reinforcement learning.

      Sutton's essay makes essentially one claim: the two general-purpose methods, search and learning, combined with ever-growing compute, will ultimately crush all clever hand-engineered design.

      At the time, the mainstream view was still "raw compute alone won't cut it; you have to build in human knowledge." Then GPT-3 arrived, the scaling laws were validated, the NLP pipelines linguists had spent decades designing were replaced end-to-end by a single Transformer, and ChatGPT took off. Every prediction was vindicated.
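      The scaling-law regularity mentioned above can be sketched numerically. Below is a toy power-law loss curve; the constants are invented for illustration and are not fitted values from any published paper:

```python
# Toy illustration of the power-law form reported in scaling-laws
# work: test loss falls as a power of training compute, down to an
# irreducible floor. All constants here are made up.

def loss(compute, a=10.0, alpha=0.05, floor=1.5):
    """Hypothetical test loss as a power law in training compute."""
    return floor + a * compute ** -alpha

# Each 10x of compute shaves a constant fraction off the reducible
# part of the loss -- the regularity that made "just scale it up"
# a defensible bet.
for c in (1e18, 1e19, 1e20, 1e21):
    print(f"compute={c:.0e}  loss={loss(c):.3f}")
```

      The point of the example is the shape, not the numbers: on a log-log plot this curve is a straight line, which is what the scaling-laws papers reported empirically.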

      That prediction is now being borne out again in the agent space.

      Reasoning models have internalized search: o1 and DeepSeek-R1 need no externally designed chain of thought; the model searches for reasoning paths in token space on its own.

      Agents like Manus go a step further (in setting their direction, they reused Sutton's conclusion: hand it to the model): the model itself decides which tools to use, how to decompose a task, and how to execute it. No hand-crafted workflow orchestration is needed.

      This is exactly the judgment Sutton made six years ago: stop fussing over clever designs; general methods plus scaled compute win in the end.



      The Bitter Lesson

      The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.

      In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search. At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess. When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers. They said that "brute force" search may have won this time, but it was not a general strategy, and anyway it was not how people played chess. These researchers wanted methods based on human input to win and were disappointed when they did not.
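      The "massive, deep search" behind the 1997 result can be illustrated with a minimal alpha-beta minimax sketch. This is a toy over an explicit game tree, not Deep Blue's actual implementation (which added iterative deepening, search extensions, and a hand-tuned evaluation function running on special-purpose hardware):

```python
# Toy alpha-beta minimax over an explicit game tree: internal nodes
# are lists of children, leaves are payoffs for the maximizing player.
# Pruning skips branches that provably cannot change the root value,
# which is what lets a fixed compute budget reach greater depth.

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):          # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # opponent will avoid this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:               # we would avoid this line
                break
        return value

# Classic three-branch textbook tree: the minimax value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # -> 3
```

      Deep Blue's search examined on the order of 200 million positions per second; the lesson is that this brute machinery, not chess knowledge, was the decisive ingredient.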

      A similar pattern of research progress was seen in computer Go, only delayed by a further 20 years. Enormous initial efforts went into avoiding search by taking advantage of human knowledge, or of the special features of the game, but all those efforts proved irrelevant, or worse, once search was applied effectively at scale. Also important was the use of learning by self play to learn a value function (as it was in many other games and even in chess, although learning did not play a big role in the 1997 program that first beat a world champion). Learning by self play, and learning in general, is like search in that it enables massive computation to be brought to bear. Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research. In computer Go, as in computer chess, researchers' initial effort was directed towards utilizing human understanding (so that less search was needed) and only much later was much greater success had by embracing search and learning.
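      "Learning a value function by self play" can be shown in miniature with tabular TD(0) on the game of Nim: players alternately remove one or two stones, and whoever takes the last stone wins. This is a sketch of the idea only; AlphaGo's actual system combined deep networks with Monte Carlo tree search:

```python
import random

# Self-play value learning, tabular TD(0), on a tiny Nim variant.
# V[s] estimates the probability that the player *to move* with s
# stones remaining will win. Both players share V because the game
# is symmetric. The theoretically losing states are multiples of 3.

def train(n_stones=10, episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(n_stones + 1)}
    V[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        s = n_stones
        while s > 0:
            moves = [m for m in (1, 2) if m <= s]
            if rng.random() < epsilon:       # explore
                m = rng.choice(moves)
            else:                            # a bad state for the opponent
                m = min(moves, key=lambda k: V[s - k])  # is good for us
            s_next = s - m
            # TD(0) backup: our value is 1 - opponent's value one ply on.
            V[s] += alpha * ((1.0 - V[s_next]) - V[s])
            s = s_next
    return V
```

      After training, the learned values separate cleanly: states with a multiple of three stones (lost for the mover under perfect play) sit near 0, the rest near 1 — the value function was discovered by play, not programmed in.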

      In speech recognition, there was an early competition, sponsored by DARPA, in the 1970s. Entrants included a host of special methods that took advantage of human knowledge---knowledge of words, of phonemes, of the human vocal tract, etc. On the other side were newer methods that were more statistical in nature and did much more computation, based on hidden Markov models (HMMs). Again, the statistical methods won out over the human-knowledge-based methods. This led to a major change in all of natural language processing, gradually over decades, where statistics and computation came to dominate the field. The recent rise of deep learning in speech recognition is the most recent step in this consistent direction. Deep learning methods rely even less on human knowledge, and use even more computation, together with learning on huge training sets, to produce dramatically better speech recognition systems. As in the games, researchers always tried to make systems that worked the way the researchers thought their own minds worked---they tried to put that knowledge in their systems---but it proved ultimately counterproductive, and a colossal waste of researchers' time, when, through Moore's law, massive computation became available and a means was found to put it to good use.
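      The HMM machinery referred to here can be made concrete with Viterbi decoding on a textbook toy model — two hidden weather states and three observable activities, nothing like a real acoustic model, but the same dynamic program that speech decoders ran at scale:

```python
import math

# Viterbi decoding for a discrete hidden Markov model: recover the
# most likely hidden-state sequence given the observations.
# Probabilities are kept in log space to avoid underflow.

def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s] * emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        step = {}
        for s in states:
            # pick the predecessor maximizing the path probability into s
            prev = max(states, key=lambda p: best[p][0] + math.log(trans_p[p][s]))
            lp = best[prev][0] + math.log(trans_p[prev][s] * emit_p[s][o])
            step[s] = (lp, best[prev][1] + [s])
        best = step
    return max(best.values(), key=lambda v: v[0])[1]

# Textbook toy: hidden weather, observed activities.
states = ("Rainy", "Sunny")
start  = {"Rainy": 0.6, "Sunny": 0.4}
trans  = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
          "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit   = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

print(viterbi(["walk", "shop", "clean"], states, start, trans, emit))
# -> ['Sunny', 'Rainy', 'Rainy']
```

      Note what is absent: no phonetics, no vocal-tract model — just transition and emission statistics estimated from data, plus computation.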

      In computer vision, there has been a similar pattern. Early methods conceived of vision as searching for edges, or generalized cylinders, or in terms of SIFT features. But today all this is discarded. Modern deep-learning neural networks use only the notions of convolution and certain kinds of invariances, and perform much better.
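      The "notion of convolution" that survived can be written out in a few lines. A minimal valid-mode 2-D convolution (strictly, cross-correlation, as in the deep-learning convention) in plain Python:

```python
# Minimal valid-mode 2-D "convolution": slide a kernel over the image
# and take dot products at each position. This local, translation-
# equivariant prior is roughly all the structure modern vision
# networks build in; everything else is learned.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

print(conv2d([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]],
             [[1, 0],
              [0, 1]]))
# -> [[6, 8], [12, 14]]
```

      Edges, cylinders, and SIFT encoded *what* to look for; convolution only encodes *where* the same computation should be reused, and the kernels themselves are learned from data.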

      This is a big lesson. As a field, we still have not thoroughly learned it, as we are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

      One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

      The second general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. Essential to these methods is that they can find good approximations, but the search for them should be by our methods, not by us. We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

      Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.

      賽博禪心

      科技要聞

      發春節紅包的大廠都被約談了

      頭條要聞

      大學生寒假為媽媽店鋪當中老年服裝模特 撞臉明星

      頭條要聞

      大學生寒假為媽媽店鋪當中老年服裝模特 撞臉明星

      體育要聞

      NBA三分大賽:利拉德帶傷第三次奪冠

      娛樂要聞

      2026央視春晚最新劇透 重量級嘉賓登場

      財經要聞

      誰在掌控你的胃?起底百億"飄香劑"江湖

      汽車要聞

      奔馳中國換帥:段建軍離任,李德思接棒

      態度原創

      房產
      時尚
      旅游
      游戲
      本地

      房產要聞

      三亞新機場,又傳出新消息!

      多巴胺失寵了?過年這樣穿彩色時髦又減齡

      旅游要聞

      開放機關事業單位床位給游客,“寵客”還要善始善終

      LPL第一賽段還未結束,亞運會已有3隊退出LOL比賽,包括東道主

      本地新聞

      春花齊放2026:《駿馬奔騰迎新歲》

      無障礙瀏覽 進入關懷版