Link to the video breakdown:
The speaker is Karina Nguyen, a researcher at OpenAI (previously at Anthropic), who worked on Claude 1, ChatGPT Canvas, and Tasks.
Below is the full talk, which "親愛的數據" has arranged as a sentence-by-sentence English-Chinese parallel transcript:
I worked at Anthropic for about two years, working on Claude.
我在 Anthropic 工作了大約兩年,專注于 Claude 相關工作。
Today, I would love to chat more about the scaling paradigms that have happened in the past two to four years in AI research, and how these paradigms unlocked new frontier product research.
今天,我想聊聊過去兩到四年在人工智能研究中出現的擴展范式(Scaling Paradigms),以及這些范式如何開啟了全新的前沿產品研究。
I’m also going to share some of the lessons learned by developing Claude and ChatGPT products, some design challenges and lessons, and how I think about the future of agents as they evolve from collaborators to co-innovators.
我也會分享開發 Claude 和 ChatGPT 產品過程中獲得的一些經驗教訓、設計挑戰,以及我如何看待智能體(Agents)從“合作者”演變為“共同創新者”(Co-innovators)的未來。
In the future, I would also love to invite you to engage in the conversation, and I’d be more than happy to answer some questions at the end.
之后也希望你們加入討論,我很樂意在最后回答你們的問題。
I think there are two scaling paradigms that happened in AI research over the past few years.
我認為過去幾年在 AI 研究中出現了兩種擴展范式。
The first paradigm is next-token prediction, also called pre-training.
第一種范式是下一個詞(token)的預測,也稱為“預訓練”(Pre-training)。
What’s amazing about next-token prediction is that it's essentially a world-building machine.
下一個詞預測之所以令人驚嘆,是因為它本質上是一個“世界構建機”(World-building Machine)。
The model learns to understand the world by predicting the next word, fundamentally because certain sequences are caused by initial actions which are irreversible, so the model learns some of the physics of the world.
模型通過預測下一個詞來理解世界,本質上是因為某些序列(Sequence)是由初始動作(Initial Actions)引起的,這種因果關系是不可逆的,所以模型能夠學到世界的一些物理規律。
Tokens can be anything—strings, words, pixels—so the model must understand how the world works to predict what's next.
詞(Token)可以是任意的東西,比如字符串(Strings)、單詞(Words)、像素(Pixels)等,因此模型必須理解世界的運行方式才能預測接下來會發生什么。
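To make the next-token idea concrete, here is a minimal sketch (not the speaker's code): a bigram model that "learns" which token tends to follow each token simply by counting. Real pretraining uses a neural network over trillions of tokens; the corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count, for each token, which token
# follows it most often in the (invented) training corpus.
corpus = "the model predicts the next token and the next token again".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the continuation seen most often after `token`."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # → "next" ("the next" occurs twice, "the model" once)
```

The same counting idea, scaled to a neural network and a web-sized corpus, forces the model to absorb facts about how the world works, because that is what best predicts continuations.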
Next-token prediction is essentially massive multitask learning.
下一個詞的預測本質上是大規模的多任務學習(Massive Multitask Learning)。
During pre-training, some tasks are easy, such as translation, while others, like physics, problem-solving generation, logical expressions, and spatial reasoning, are very hard.
在預訓練期間,有些任務很容易,例如翻譯;而另一些任務,如物理知識、問題求解生成、邏輯表達(Logical Expressions)和空間推理(Spatial Reasoning),則非常困難。
Tasks involving computation, like math, require a "Chain of Thought" or extra computational resources during inference.
涉及數學這類計算任務,需要在推理時使用“思維鏈”(Chain of Thought)或額外的計算資源。
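One way to picture "extra compute at inference" is self-consistency voting: sample several reasoning chains and take the majority final answer. Everything below (the arithmetic question and the sampled chains) is an invented toy standing in for model outputs, not the speaker's method.

```python
from collections import Counter

# Invented stand-ins for reasoning chains a model might sample
# for the question "17 + 25 = ?". One chain is deliberately faulty.
sampled_chains = [
    ("17 + 25: 17 + 20 = 37, 37 + 5 = 42", 42),
    ("17 + 25: 10 + 20 = 30, 7 + 5 = 12, 30 + 12 = 42", 42),
    ("17 + 25: 17 + 25 = 32 (dropped a carry)", 32),   # faulty chain
    ("17 + 25: 20 + 25 = 45, 45 - 3 = 42", 42),
]

def majority_vote(chains):
    """Pick the most common final answer across sampled chains."""
    return Counter(answer for _, answer in chains).most_common(1)[0][0]

print(majority_vote(sampled_chains))  # → 42
```

Sampling more chains costs more inference compute but makes the vote more reliable, which is exactly the trade-off behind letting models "think longer".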
Creative writing is particularly challenging because it involves world-building, storytelling, and maintaining plot coherence, making it easy for the model to lose coherence with just a small mistake.
創造性寫作(Creative Writing)尤其困難,因為它涉及構建世界(World-building)、講故事(Storytelling)以及保持情節連貫性(Plot Coherence),模型很容易因為細微的錯誤而導致情節完全失去連貫性。
Evaluating creative writing is also difficult, making it one of the hardest open-ended AI research problems today.
創造性寫作的評估也很困難,因此它是當今最難的開放式(Open-ended)AI 研究問題之一。
From 2020 to 2021, the first major product based on scaling pre-training was GitHub Copilot, which used billions of code tokens from open-source projects.
從 2020 到 2021 年,基于擴展預訓練的首個主要產品是 GitHub Copilot,它使用了開源項目中的數十億代碼 token。
Researchers improved its usability through Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF).
研究人員通過人類反饋強化學習(Reinforcement Learning from Human Feedback,RLHF)和 AI 反饋強化學習(Reinforcement Learning from AI Feedback,RLAIF)提升了它的實用性。
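The core of RLHF reward modeling can be sketched with a Bradley-Terry-style preference loss: the reward model should score the human-preferred response above the rejected one. This is a minimal sketch with invented numbers, not OpenAI's or Anthropic's actual training code.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected); small when the model
    already ranks the human-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, 0.0))  # small loss: agrees with the human label
print(preference_loss(0.0, 2.0))  # large loss: disagrees with the label
```

Minimizing this loss over many labeled comparison pairs yields a reward model, which is then used to fine-tune the policy; RLAIF replaces the human labels with AI-generated preferences.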
This introduced a "post-training" phase focused on completing functions, generating multi-line completions, and predicting diffs.
這引入了一個“后訓練”(Post-training)階段,專注于補全函數、生成多行代碼以及預測代碼差異(Diffs)。
The next major paradigm shift, published by OpenAI last year, is scaling reinforcement learning using "Chain of Thought" (CoT), enabling models to tackle highly complex reasoning tasks.
去年 OpenAI 提出了另一個重要范式轉變,即利用“思維鏈”(Chain of Thought,CoT)擴展強化學習(Scaling Reinforcement Learning),使模型能夠處理高度復雜的推理任務。
In CoT, models spend significantly more computational time reasoning step-by-step through problems.
在思維鏈中,模型會花費更多的計算時間逐步推理,解決問題。
A major design challenge is how to present the model's complex thought processes to users without making them wait too long.
一個主要的設計挑戰是如何將模型復雜的思考過程呈現給用戶,同時避免用戶等待時間過長。
This year is considered the "year of agents," characterized by complex reasoning, tool use, and long-context interactions.
今年被稱為“智能體之年”(Year of Agents),其特點是復雜推理(Complex Reasoning)、工具使用(Tool Use)和長上下文(Long-context)互動。
The next stage will be agents evolving into co-innovators through creativity enabled by human-AI collaboration.
下一個階段,智能體將通過人類與 AI 的協作實現創造力,演變為共同創新者(Co-innovators)。
Future product research will involve rapidly iterating between highly complex models and smaller, distilled models.
未來的產品研究將涉及高復雜模型與更小、更快速的蒸餾模型(Distilled Models)之間的快速迭代。
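Distillation, mentioned above, usually means training a small "student" to match the large "teacher's" soft output probabilities, typically by minimizing KL divergence. The sketch below illustrates the objective on a single next-token distribution; all the probability values are invented.

```python
import math

def kl_divergence(teacher: list[float], student: list[float]) -> float:
    """KL(teacher || student) over one next-token distribution;
    zero when the student exactly matches the teacher."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)

teacher_probs = [0.7, 0.2, 0.1]     # teacher's soft labels
good_student  = [0.65, 0.25, 0.10]  # close to the teacher
bad_student   = [0.10, 0.10, 0.80]  # far from the teacher

print(kl_divergence(teacher_probs, good_student))  # near zero
print(kl_divergence(teacher_probs, bad_student))   # much larger
```

Averaging this loss over many positions transfers much of the large model's behavior into a model that is cheaper and faster to serve, which is what makes the rapid iteration between frontier and distilled models practical.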
Design challenges include making unfamiliar capabilities feel familiar (e.g., using file uploads), and enabling modular product features that scale easily.
設計挑戰包括使陌生的功能顯得熟悉(例如通過文件上傳)以及設計能夠輕松擴展的模塊化產品功能。
Trust remains a key bottleneck; solutions include better collaborative interfaces allowing real-time user feedback and verification.
信任仍然是關鍵瓶頸;解決方案包括開發更好的協作界面,使用戶能實時反饋和驗證。
Innovative tools like Claude's Slack integration, ChatGPT tasks, and Canvas illustrate the potential of collaborative and multimodal AI interfaces.
創新工具,如 Claude 的 Slack 整合、ChatGPT 的任務功能和 Canvas,展示了協作與多模態(Multimodal)AI 接口的潛力。
Ultimately, the future involves "invisible software creation," allowing anyone, even without coding experience, to create and deploy tools through AI.
最終的未來愿景是“無形的軟件創造”,即使沒有編程經驗的人也能通過 AI 創建和部署工具。
AI interfaces will evolve into highly personalized, multimodal, and interactive canvases, fundamentally changing how we interact with technology and the internet.
AI 界面將發展成高度個性化、多模態、互動的“畫布”(Canvas),從根本上改變我們與技術和互聯網的交互方式。
“My prediction is that you will click less and less on internet links, and the way you will access the internet will be via model lenses, which will be much cleaner and in a much more personalized way.”
我預測,未來你在互聯網上的點擊量會越來越少;你訪問網絡的方式將會通過“模型之鏡”完成,不僅界面更簡潔,也更高度個性化。
“And you can imagine having very personalized multimodal outputs: let’s say if I say I want to learn more about the solar system, instead of it giving me a text output, it should give you a 3D interactive visualization of the solar system, and you can have highly rich interactive features to learn more.”
你可以想象這樣一種個性化的多模態體驗:比如我想深入了解太陽系,與其給我一段文字,不如呈現一個可交互的 3D 太陽系可視化界面,并配備豐富的交互功能,幫助我更直觀、更深入地學習。
“I think there will be this sort of cool future of generative entertainment for people to learn and share new games with other people.”
我設想這樣一個很酷的未來:以“生成式娛樂”為媒介,人們不僅可以學習,還能隨時與他人一起創造并分享全新的游戲體驗。
“I think the way I’m thinking about it is the kind of interface to AI is a blank canvas that kind of molds to your intent. So for example you come to work today and your intention is to just write code, then the canvas becomes more of an IDE—like Cursor or a coding IDE, although future programming might change.”
在我看來,與 AI 的交互界面就像一塊“空白畫布”,會根據你的意圖自動變形。
比如你今天上班的目標只是寫代碼,這塊畫布就會變成一個類似 IDE 的開發環境,比如 Cursor 之類的編程 IDE(當然,未來的編程方式也許會改變)。
“Or if you’re a writer and you decided to write a novel together, the model can start creating tools on the fly for you such that it will be much easier for you to brainstorm or edit the writing or create character plots and visualize the structure of the plot itself.”
又或者你是一名作家,想和 AI 一起創作小說,模型便會即時為你生成寫作輔助工具,讓你更輕松地進行頭腦風暴、修改文稿、構思角色線索,并可視化展示故事結構。
“Finally, I think the co-innovation is actually going to happen with co-direction creative collaboration with the models themselves, and it’s through collaboration with highly reasoning agent systems that will be extremely capable of superhuman tasks to create new novels, films, games, and essentially new science, new knowledge creation.”
最后,我相信“共同創新”將真正實現于人與模型的“共創共導”——通過與高度推理的智能體系統協作,這些系統將具備超越人類的能力,共同創作小說、電影、游戲,乃至推動全新的科學發現與知識創造。
“Cool. Um, thank you so much.”
太酷了。嗯,非常感謝大家!