PART V / AI-NATIVE 学习 AI-NATIVE LEARNING · 认知主权 THE COGNITIVE SOVEREIGNTY VOLUME
AI Native 学习方法论
AI Native Learning Methodology
这一卷是整个系列的批判性良知。下游三卷讲"把执行交出去";这一卷讲有些内化恰恰不能交出去。当 AI 把"知道"压到近乎免费,稀缺的不再是获取信息,而是体验、反思、迁移——以及守住那些一旦外包就会萎缩、且正是下游所有人本判断赖以运转的认知能力。没有不被萎缩的判断力,组织卷"人回归于意义"就是一句空话。立场不是"AI 已经造成认知能力下降"(证据不足,会反噬),而是有据的警告 + 一个可证伪的赌注。
This volume is the critical conscience of the whole series. The three downstream volumes say "hand execution over"; this one says some internalization is exactly what must not be handed over. Once AI drives the cost of "knowing" toward zero, the scarce thing is no longer acquiring information but experience, reflection, transfer – and guarding the cognitive capacities that atrophy the moment they are outsourced, the very capacities every downstream human judgment runs on. Without un-atrophied judgment, the org volume's "people return to meaning" is an empty phrase. The stance is not "AI has already degraded cognitive ability" (the evidence is insufficient, and the claim would backfire) but an evidence-grounded warning plus a falsifiable bet.
This volume's kernel specialization (one-line recap): ① execution (lookup, answers, explanations) becomes abundant; ② judgment does not merely retreat to a new bottleneck but forks along "abundance-ability" (whether AI can abundantly do a piece of cognition for you) and gains a layer of defensive judgment (what to deliberately keep un-handed); ③ a personal cognitive scaffold becomes infrastructure; ④ the person returns as a cognitive sovereign. You can start here without having read the org volume.
AI-ENABLED LEARNING → AI-NATIVE LEARNING
答案
Answers
更快拿到解释和总结 → 先自答,再用 AI 校准缺口
Get explanations and summaries faster → Self-answer first, then use AI to calibrate gaps
困难
Difficulty
把阻力全部消掉 → 保留合意困难、反思库和迁移测试
Remove all friction → Keep desirable difficulty, reflection logs, and transfer tests
能力
Capability
知道答案 → 形成离开 AI 后仍能迁移的能力
Know the answer → Build ability that transfers after AI is removed
AI-Native learning is not faster explanations; it deliberately separates what can be outsourced from what must be experienced by the learner. It is defensive judgment, protecting downstream taste, value perception, and judgment from hollowing out.
Cognitive Artifacts
学习系统的产物不是笔记,而是可迁移能力。
The output of a learning system is not notes, but transferable capacity.
错题与反思库:记录偏差,而不是收藏答案。
Error and reflection log: record deviation, not answers.
先自答字段:问 AI 前先生成自己的假设。
Self-answer field: produce your own hypothesis before asking AI.
迁移测试:离开 AI 后能否在新情境里做对。
Transfer test: can you perform in a new context without AI?
合意困难:间隔、提取、交错练习被刻意保留。
Desirable difficulty: spacing, retrieval, and interleaving retained on purpose.
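The four artifacts above can be pictured as fields of a single log record. The sketch below is only an illustration under assumptions: the class name `ReflectionEntry`, its field names, and the `is_complete` rule are hypothetical and not part of the methodology itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ReflectionEntry:
    """One row of a hypothetical error-and-reflection log."""
    topic: str
    self_answer: str                        # written BEFORE consulting AI
    ai_answer: str = ""                     # filled in afterwards, for calibration only
    deviation: str = ""                     # where and why the self-answer was off
    transfer_passed: Optional[bool] = None  # result of a later no-AI transfer test
    created: date = field(default_factory=date.today)

    def is_complete(self) -> bool:
        # Assumed rule: an entry counts only if it records both a self-answer
        # and a deviation, not merely a collected answer.
        return bool(self.self_answer.strip()) and bool(self.deviation.strip())


entry = ReflectionEntry(
    topic="testing effect",
    self_answer="Rereading is the best way to retain material.",
    ai_answer="Retrieval practice beats rereading for long-term retention.",
    deviation="I mistook short-term fluency for long-term retention.",
)
print(entry.is_complete())
```

The design choice worth noting is that `deviation` is a required part of a complete entry: the record stores the gap between your answer and the calibrated one, not the answer alone.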
AI Stop-line
能查到,不等于该让 AI 代劳。
Just because it can be looked up does not mean AI should do it for you.
Fact lookup, examples, and explanations can go to AI; question quality, challenge hit-rate, value judgment, intuition, and deep thinking must be deliberately retained. If convenience eats these, the whole AI-Native system loses its human foundation.
First Move
从一项能力做外包体检。
Run an outsourcing audit on one capacity.
列出这项能力中 AI 可代劳的步骤与你必须内化的步骤。先保留一个“先自答”动作,再用 AI 校验。小到一周三条反思记录即可开始。
List which steps AI can do and which you must internalize. Keep one self-answer move before AI checks it. Three reflection entries a week is enough to start.
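A minimal sketch of such an audit follows. The `Step` fields and the `audit` function are hypothetical names chosen for illustration, and the split rule is an assumption: a step is kept whenever doing it yourself builds judgment, even if AI could do it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    ai_can_do: bool      # could AI perform this step abundantly?
    constitutive: bool   # does doing it yourself build judgment?


def audit(steps):
    """Split steps into 'hand to AI' and 'keep' (the AI stop-line)."""
    hand_off = [s.name for s in steps if s.ai_can_do and not s.constitutive]
    keep = [s.name for s in steps if s.constitutive]
    return hand_off, keep


steps = [
    Step("look up definitions", ai_can_do=True, constitutive=False),
    Step("draft my own hypothesis", ai_can_do=True, constitutive=True),
    Step("judge whether the answer is right", ai_can_do=False, constitutive=True),
]
hand_off, keep = audit(steps)
print(hand_off)  # ['look up definitions']
print(keep)      # ['draft my own hypothesis', 'judge whether the answer is right']
```

The second step is the interesting case: AI can do it, yet it lands in `keep`. That is the self-answer move retained on purpose.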
承重命题:当信息与答案随取随到,稀缺的从来不是获取,而是体验、反思、迁移与认知结构的重建——而重建不能被充裕化。AI-Native Learning ≠ "用 AI 当家教 / 让它总结给我听 / 秒查答案"。这是种类之别,不是程度之别:不是"学得更快",是"学的目标变了"。
Load-bearing claim: once information and answers are available on demand, the scarce thing was never acquisition but experience, reflection, transfer and the rebuilding of cognitive structure – and that rebuilding cannot be made abundant. AI-Native Learning is not "use AI as a tutor / have it summarize for me / look up answers instantly." It is a difference of kind, not degree: not "learn faster," but "what learning is for has changed."
把 AI 嫁接到旧学习流程上——更快搜到、更快总结、更快被讲解——只是给"获取信息"这条早已不稀缺的环节加速。这就像在马车前面装喷气引擎:你优化的是一个已经被解掉的瓶颈。学习的真正杠杆,从"内化知识"移到了别处。区分两层很关键:"知道"(knowing that)——陈述性知识,AI 已压到近零成本;"能做"(knowing how)——程序性能力,仍要靠犯错-纠正的循环一遍遍长出来,无法被代劳。信息充裕不等于能力充裕,这是全卷的力学起点。
Grafting AI onto the old learning pipeline (search faster, summarize faster, be explained to faster) merely accelerates "acquiring information," a step that stopped being scarce long ago. It is like bolting a jet engine onto a horse cart: you are optimizing a bottleneck that is already solved. Learning's real leverage has moved elsewhere. The distinction that matters is two layers: "knowing that" – declarative knowledge, which AI has driven to near-zero cost; and "knowing how" – procedural capacity, which still has to grow through the error-and-correction loop, run over and over, and cannot be done on your behalf. Abundant information is not abundant capability – and that is the mechanical starting point of this whole volume.
所以这一卷不教"如何用 AI 学得更快"。它问一个更难、更不确定的问题:当 AI 随时作外脑供答案,人创造与内化知识的过程本身,是否正在被悄悄重写?这不是"瓶颈搬到哪"(那是下游卷的语气),而是"认知能力是否在悄悄萎缩"——一个证据尚未定案、却必须现在就严肃对待的问题。
So this volume does not teach "how to learn faster with AI." It asks a harder, less settled question: once AI stands by as an external brain handing over answers, is the very process by which a person creates and internalizes knowledge being quietly rewritten? This is not "where does the bottleneck move" (the tone of the downstream volumes) but "are cognitive capacities quietly atrophying" – a question the evidence has not settled, yet one that must be taken seriously now.
这卷为什么必须唱反调,而不是补全拼图
Why this volume must dissent, not just complete the puzzle
系列里其余几卷有一个共同的语气:识别瓶颈搬到哪、然后顺着把杠杆做大——它们都在优化便利。如果学习卷也照这个语气写,它会变成"如何用 AI 学得更快"的工具指南,和市面上千篇一律的提示词技巧没有区别,也辜负了它在系列里的位置。本卷必须唱反调,原因不是为了与众不同,而是因为认知是唯一一个"被优化的便利会反噬被优化者"的领域。在工程、组织、设计里,把执行交出去、把流程做顺,损失的至多是某次产出的质量,可由校验兜住;在学习里,把思考交出去、把过程做顺,损失的可能是那个还能思考的人本身——而这个损失没有外部校验能兜住,因为检测器和被检测物是同一个东西。所以学习卷在系列里扮演批判性良知:当其余几卷都在踩油门,它是第一个、也是唯一一个有资格踩刹车的——不是反对 AI,是提醒整个系列,便利不是无代价的,有一块地基必须在加速中被刻意守住。
The rest of the series shares one register: identify where the bottleneck moves, then enlarge the leverage along it – all optimizing convenience. If the learning volume were written in that register, it would become a tool guide to "how to learn faster with AI," indistinguishable from the cookie-cutter prompt tips everywhere, and would betray its place in the series. This volume must dissent, not for the sake of being different, but because cognition is the one domain where the optimized convenience backfires on the one optimized. In engineering, org, and design, handing off execution and smoothing the process risks at most the quality of some output, caught by verification; in learning, handing off thinking and smoothing the process may cost the very person who can still think – a loss no external verification can catch, because the detector and the detected are the same thing. So the learning volume plays the series' critical conscience: while the others step on the gas, it is the first and only one qualified to tap the brakes – not opposing AI, but reminding the whole series that convenience is not costless, that one foundation must be deliberately guarded amid the acceleration.
体验、反思、迁移:三件无法被代劳的事
Experience, reflection, transfer: the three that can't be done for you
承重命题点名了三样在信息充裕后才显出稀缺的东西,值得各自说清它们为什么结构上无法被外包,而不只是"AI 暂时还做不好"。体验是第一人称的:你对一个概念的"感觉"——它在什么情境下成立、在什么边缘失效——只能由你亲历那些情境长出来。AI 能描述边界条件,但描述边界条件不等于撞过边界;撞过的人有一种描述者没有的警觉。反思是对自己思维的二阶操作:发现"我刚才那个推理为什么错",这要求被反思的那个推理是你自己产出的——你无法对一段 AI 替你生成的推理做真正的反思,因为它从不曾是你心智模型的一部分,你只是在事后给一个外来物贴标签。迁移是把一处学到的结构搬到一个表面不同的新情境——它检验的恰恰是"你内化的是抽象结构还是表面套路",而这只有在结构真的长进你脑子里时才可能。三件事的共同点是:它们的产物都是你这个认知主体的改变,而主体的改变无法外包给另一个主体。
The load-bearing claim named three things that turn scarce only after information becomes abundant; each deserves a clear statement of why it is structurally un-outsourceable, not merely "AI can't do it well yet." Experience is first-person: your "feel" for a concept – where it holds, where it fails at the margin – can only grow from your living through those situations. AI can describe boundary conditions, but describing a boundary is not hitting it; the one who hit it carries an alertness the describer lacks. Reflection is a second-order operation on your own thinking: spotting "why that reasoning of mine was wrong" requires that the reasoning being reflected on was yours to produce – you cannot truly reflect on a chain of reasoning AI generated for you, because it was never part of your mental model; you are only labeling a foreign object after the fact. Transfer carries a structure learned in one place into a surface-different new situation – testing precisely "whether what you internalized is the abstract structure or the surface routine," possible only when the structure actually grew into your head. What the three share: their product is a change in you as a cognitive subject, and the change of a subject cannot be outsourced to another subject.
"Learn faster" and "what learning is for has changed" sound like the same thing but are two forks, and telling them apart is this volume's entry point. The difference of degree assumes learning's goal is constant (move knowledge into the head) and only the means improve – faster search, smoother explanation, better-fitted exercises. On this path AI is a better tool, and a methodology is merely "how to use the tool to the hilt." The difference of kind concedes instead: once the cost of "moving knowledge into the head" collapses toward zero, that stops being a goal worth spending effort to guard – the goal itself must be redrawn. It is like photography to painting: the camera made "accurately reproducing appearance" cheap, so painting's goal migrated from "looking like the thing" to "having a view of it." The painter did not get faster; "what painting is for" changed. Learning is undergoing an isomorphic migration.
FIG. L.1 / 嫁接的谬误 THE GRAFTING FALLACY
看懂:加速早已不稀缺的那一步,杠杆为零
Read: accelerating the already-cheap step yields zero leverage
从图里读出:旧流水线里,AI 的全部火力都打在第一格"获取信息",而那一格的成本早在搜索引擎时代就已塌掉。把工具嫁接到旧流程上,优化的是一个解掉的瓶颈;真正的杠杆在后两格,偏偏那两格要的是无法被加速的结构性时间。这就是为什么本卷不教"用 AI 学得更快"。
What the figure says: in the old pipeline, all of AI's firepower lands on the first box, "acquire info" – whose cost already collapsed back in the search-engine era. Grafting the tool onto the old process optimizes a solved bottleneck; the real leverage is in the last two boxes, which happen to need structural time that cannot be accelerated. This is why the volume does not teach "learn faster with AI."
这一卷在系列里的位置
Where this volume sits
组织卷要"人回归于意义",前提是人还具备不被萎缩的判断力;本卷正是那份判断力的养成与守护机制。它与系列上游的研究卷也咬合:研究卷问"什么值得知道/做",本卷问"知道之后怎样真正长进认知里",一个定方向,一个守地基。
The org volume's "people return to meaning" presupposes that people still possess un-atrophied judgment; this volume is the mechanism that grows and guards it. It also meshes with the upstream research volume: research asks "what is worth knowing / doing," this volume asks "once known, how does it actually grow into cognition" – one sets direction, the other guards the foundation.
LEARN
01
KERNEL · 内核特化
KERNEL
机理 · 内核母版
Mechanism · Kernel master
充裕的是输入,稀缺的是内化
What's Abundant Is the Input; What's Scarce Is Internalization
Load-bearing claim: the same kernel master block, on the surface of cognition. But the ② step on this surface is not a simple "judgment retreats to the new bottleneck" – it forks along "abundance-ability" into two branches, and adds a layer the other volumes lack: defensive judgment – judging not only what to hand to AI, but what to deliberately keep un-handed.
① 充裕 ABUNDANCE
信息 / 答案 / 讲解 / 示范
Information / answers / explanations / demonstrations
"知道"近乎免费——查得到、问得到、个性化生成得到。
"Knowing that" is near-free – lookup-able, ask-able, personally generatable.
② 判断 JUDGMENT
退守内化 + 元能力,并划 AI 止步线
Retreat to internalization + meta-skills, draw the AI stop-line
新瓶颈是内化与元认知,不是信息获取;且需主动判断哪些刻意不外包。
The new bottleneck is internalization and metacognition, not acquisition; and you must decide what to deliberately not outsource.
③ 上下文 CONTEXT
个人认知脚手架 + 刻意制造的难度
A personal cognitive scaffold + deliberately built difficulty
错题反思库、人机同源可 diff 的知识库,外加有意保留的"合意困难"。
An error-and-reflection log, a same-source diffable knowledge base, plus deliberately retained "desirable difficulty."
④ 人 MEANING
守护不可外包的认知
Guard the cognition that can't be outsourced
判断力 · 价值感知 · 直觉 · 深度思考 · 品味——学习者回归为认知主权者与更好的提问者。
Judgment · value perception · intuition · deep thinking · taste – the learner returns as a cognitive sovereign and a better question-asker.
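The "diffable knowledge base" named under ③ can be illustrated with a plain text diff between what you wrote first and what AI returned. This is a sketch under assumptions: the source does not specify a tool, the two answer lists are invented sample data, and Python's standard `difflib` is used only as one convenient way to surface the gap.

```python
import difflib

# Hypothetical sample data: your self-answer versus the AI answer, line by line.
self_answer = [
    "Rereading is the most reliable way to retain material.",
    "Errors are signal, not waste.",
]
ai_answer = [
    "Retrieval practice beats rereading for long-term retention.",
    "Errors are signal, not waste.",
]

# unified_diff marks removed lines with '-' and added lines with '+'.
diff = list(difflib.unified_diff(
    self_answer, ai_answer,
    fromfile="self_answer", tofile="ai_answer", lineterm="",
))
for line in diff:
    print(line)
```

Lines prefixed with `-` and `+` are candidates for the reflection log, since they mark where your own model deviated. Unchanged lines carry no new signal.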
②步的分叉是本卷与下游最深的差异,必须画清楚。沿"这件认知是否可被 AI 充裕地代劳"分两支:
The fork at step ② is the deepest difference between this volume and the downstream ones, and must be drawn clearly. Split along "can this piece of cognition be abundantly done for you by AI":
The abundance-able branch → folds into ①. "Knowing that something is true," "finding a fact," "generating an explanation" – these are no longer the goal of learning; they become one more kind of automated execution. Spending effort here is optimizing a solved bottleneck.
The constitutive branch → sinks to ④. Experience, reflection, transfer, the error-and-correction loop, and the defensive judgment of "which capacities to deliberately not outsource." It is not a capacity that something "more accurate" can supersede; it is the rebuilding of cognitive structure itself – it can only grow inside the learner, and cannot be done on their behalf.
旧 Before
学习 = 把外部知识搬进脑子。稀缺资源是"接触信息"(书、老师、课程),瓶颈在获取端。
Learning = moving external knowledge into the head. The scarce resource is "access to information" (books, teachers, courses); the bottleneck is on the acquisition side.
新 · 原理 After · principle
获取归零,瓶颈搬到内化端。学习 = 在体验-反思-迁移的闭环里重建认知结构,并主动划出 AI 止步线。充裕的是输入,重建仍靠你自己。
Acquisition goes to zero; the bottleneck moves to the internalization side. Learning = rebuilding cognitive structure inside an experience-reflection-transfer loop, and actively drawing the AI stop-line. The input is abundant; the rebuilding is still on you.
FIG. L.3 / ②步的分叉 THE FORK AT STEP ②
看懂:判断沿"可充裕性"分两支,并多一层防御
Read: judgment forks along "abundance-ability," plus a defensive layer
这张图说的是:下游卷的②步是"判断退守到新瓶颈",是一条单线。学习面的②多了一道分叉和一层防御:先沿"可充裕性"把认知分成上下两支(可充裕的并回①当执行,构成性的下沉到④由人长出),再叠加一层别卷没有的防御性判断:不仅判断该交什么给 AI,更要判断该刻意保留什么不交。这一层就是全卷反调的根。
What you are looking at: the downstream volumes' step ② is "judgment retreats to the new bottleneck" – a single line. The learning ② adds a fork and a defensive layer: first split cognition along "abundance-ability" into two branches (the abundance-able folds back to ① as execution; the constitutive sinks to ④ to be grown by a person), then overlay a layer of defensive judgment the others lack – judging not only what to hand to AI, but what to deliberately keep un-handed. This layer is the root of the whole volume's contrarian stance.
为什么"可充裕支"必须并回执行,而不是另立一类学习
Why the abundance branch folds back into execution, not into a new kind of learning
②步分叉的上支——可充裕支——很容易被误读成"一种更轻松的学习方式",好像查事实、读 AI 讲解也是在学习,只是变快了。本卷要明确把这一支移出学习的目标范畴,并入①执行。理由是:当一件事的获取成本塌向零,它就不再具备"值得花认知精力去守护"的属性,它变成了又一种被自动化的吞吐——和让 AI 写一段样板代码、生成一份会议纪要没有本质区别。把它仍叫"学习",会让人把精力错配到一个已解掉的瓶颈上,正是 FIG L.1 那个"给马车装喷气引擎"的谬误。这个划分有一个尖锐的推论:"我今天用 AI 查了很多资料、读了很多讲解"——这句话描述的不是学习量,是执行量。它和"我今天学了很多"是两件事;混淆它们,正是便利陷阱在自我感知层的入口。真正的学习量,只能用构成性支的产出来计:你今天亲手跑了几次犯错-纠正循环、做了几次撤除演练、往反思库回流了几条——这些才是认知结构被重建的证据。把可充裕支干脆利落地归入执行,是为了让"学习"这个词只指向那件真正稀缺、真正在你身上发生的事。
The upper branch of the step-② fork – the abundance-able branch – is easily misread as "a more relaxed way of learning," as if looking up facts and reading AI explanations were also learning, just faster. The volume must explicitly move this branch out of learning's goal category and fold it into ① execution. The reason: when a thing's acquisition cost collapses toward zero, it loses the property of "worth spending cognitive effort to guard"; it becomes one more automated throughput – no different in kind from having AI write boilerplate or generate meeting minutes. Still calling it "learning" misallocates effort to a solved bottleneck, exactly the "jet engine on a horse cart" fallacy of FIG L.1. This division has a sharp corollary: "today I looked up a lot of material with AI and read a lot of explanations" – that sentence describes not an amount of learning but an amount of execution. It is a different thing from "today I learned a lot"; conflating them is the convenience trap's entry point at the self-perception layer. Real learning volume can only be counted by the constitutive branch's output: how many error-correction loops you ran by hand today, how many removal drills you did, how many entries flowed back into the reflection log – these are the evidence that cognitive structure was rebuilt. Folding the abundance branch cleanly into execution is so that the word "learning" points only at the thing that is truly scarce and truly happening in you.
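The corollary above, that lookups count as execution and only loops, drills, and log entries count as learning, can be sketched as a simple daily tally. The activity labels and the two sets are hypothetical; they only mirror the two branches of the fork.

```python
from collections import Counter

# Hypothetical activity labels; the split mirrors the two branches of the fork.
EXECUTION = {"lookup", "read_ai_explanation", "ai_summary"}
LEARNING = {"error_correction_loop", "removal_drill", "reflection_entry"}


def tally(day_log):
    """Count a day's activities as execution volume versus learning volume."""
    counts = Counter(day_log)
    execution = sum(n for kind, n in counts.items() if kind in EXECUTION)
    learning = sum(n for kind, n in counts.items() if kind in LEARNING)
    return {"execution": execution, "learning": learning}


day_log = ["lookup", "lookup", "read_ai_explanation", "ai_summary",
           "error_correction_loop", "reflection_entry"]
print(tally(day_log))  # {'execution': 4, 'learning': 2}
```

A day that feels productive can still show a learning count near zero. That mismatch is the self-perception trap the paragraph describes.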
内化有两个对象,混淆它们就会误读本卷
Internalization has two objects; conflating them misreads the volume
The volume is most easily misread toward one of two extremes, both rooted in missing that the object of internalization has changed. The first extreme is the nostalgist: "AI makes people lazy-minded, so return to rote memorization and do everything yourself." This mistakes "internalizing discrete knowledge" for a goal still worth guarding – yet that branch is exactly the abundance-able one to let go of (FIG L.3's first cell). The second extreme is the abolitionist: "answers are on demand, so internalize nothing; just be able to look up and ask." This abolishes "internalization" wholesale, overlooking that its object merely moves up and does not vanish: the judgment structure of asking/challenging/integrating is equally expensive and can likewise only grow through the error-correction loop (SHEET 03's load). The volume's exact place is between the extremes: let go of what should be let go (discrete facts), hold fast to what should be held (judgment structure and constitutive cognition). "The object of internalization changed; internalization was not abolished" – this sentence is the key to reading the whole volume, and the implementation-level restatement of the step-② fork (FIG L.3). Anyone who reads the volume as "anti-AI" or as "embrace AI mindlessly" has turned this key the wrong way.
同一条内核,五个面,本卷是唯一会"反向"的那个
One kernel, five faces; this is the only one that runs in reverse
把五卷的②步并排看,能看清本卷在系列里的特殊位置。组织卷的②:判断退守到"该让谁/什么来做决策"。工程卷的②:判断退守到 trust-but-verify 的校验设计。设计卷的②:判断退守到"什么是好品味、为不为人"。研究卷的②:判断退守到"什么问题值得问、什么算真结果"。它们有一个共同语法——把执行交出去,人退到更高的判断节点,方向都是"放手"。学习卷的②打破这个语法:它的判断里有一支是反向的——明知 AI 能代劳,却判断"这件必须自己做",因为做这件事的过程就是在维持那个能做判断的认知主体。其余四卷优化的是产出,学习卷优化(守护)的是产出者。这就是为什么 EXPANSION-SPECS 说"别卷优化便利,本卷有时要抵抗便利"——不是态度上的逆反,是同一条内核作用在"认知"这个唯一会被代劳反噬的面上时,必然长出的反向分支。
Lay the five volumes' step ② side by side and this volume's special place in the series comes clear. The org ②: judgment retreats to "who/what should make the decision." The engineering ②: judgment retreats to trust-but-verify verification design. The design ②: judgment retreats to "what is good taste, for-people or not." The research ②: judgment retreats to "which question is worth asking, what counts as a real result." They share one grammar – hand off execution, retreat to a higher judgment node – all in the direction of "letting go." The learning ② breaks that grammar: one branch of its judgment runs in reverse – knowing AI can do it, yet judging "this one I must do myself," because the process of doing it is what sustains the cognitive subject able to judge. The other four optimize the output; the learning volume optimizes (guards) the producer. This is why EXPANSION-SPECS says "the others optimize convenience; this volume must sometimes resist it" – not contrarianism as attitude, but the reverse branch the same kernel inevitably grows when it acts on cognition, the one face where being-done-for backfires on the one it is done for.
防御性判断:第一次出现的"反向"内核动作
Defensive judgment: the kernel's first "reverse" move
值得停下来看清这层的特殊性。在组织、工程、设计三卷里,②步的判断都是进攻性的:把执行尽量交出去,人退守到最稀缺的判断节点,目标是让杠杆最大化。学习面第一次出现一个反向的判断动作——有些能被充裕代劳的事,恰恰不该交出去,因为交出去会侵蚀那个让你有资格做判断的认知底座。同一条内核母版,在别卷是"放手",在本卷多了一句"但有些要刻意攥住"。这不是给内核打补丁,而是内核在认知这个特殊面上的必然显形:因为认知是唯一一个"被代劳就会改变代劳者本身"的执行领域——代码被 AI 写不会让人退化,思考被 AI 替才会。
It is worth pausing on what makes this layer special. In the org, engineering, and design volumes, the step-② judgment is offensive: hand off execution as much as possible, retreat to the scarcest judgment node, maximize leverage. The learning surface introduces, for the first time, a reverse judgment move: some things AI can abundantly do are precisely the ones you should not hand over, because handing them over erodes the cognitive bedrock that qualifies you to judge at all. The same kernel master block reads "let go" in the other volumes; here it gains a clause: "but hold on to some things on purpose." This is not a patch on the kernel but its inevitable form on the particular surface of cognition, because cognition is the one execution domain where "having it done for you changes the one it is done for": code written by AI does not degrade a person; thinking done by AI does.
所以"内化"在这一卷里有两个对象,必须分清(接 SHEET 03):内化离散知识(这件事 AI 已接管,不必再守),和内化提问/质疑/整合的判断结构(这件事是新瓶颈,且只能靠自己长)。本卷的②步同时管这两支的分配,外加那道防御线——三件事压在一个步骤里,这是它比下游任何一卷的②都更密的原因。
So "internalization" in this volume has two objects that must be told apart (continuing into SHEET 03): internalizing discrete knowledge (AI has taken this over; no need to guard it) and internalizing the judgment structure of asking/challenging/integrating (this is the new bottleneck, and can only be grown by oneself). This volume's step ② governs the allocation of both branches plus the defensive line – three things packed into one step, which is why it is denser than the ② of any downstream volume.
LEARN
02
MECHANISM · 机理
MECHANISM
机理 · 受力分析
Mechanism · Force analysis
知道几乎免费,能做依旧昂贵
Knowing Is Nearly Free; Doing Stays Expensive
承重命题(推至极限):陈述性知识被压到近零成本,程序性能力却仍要靠犯错-纠正的循环长出,无法被充裕化。两层之间张开一道成本剪刀差:信息↑ 而内化能力不自动↑。真问题随之而来——当 AI 随时供答案,人是否还需要把知识内化为"能做"?
Load-bearing claim (pushed to the limit): declarative knowledge is driven to near-zero cost, yet procedural capacity still has to grow through the error-and-correction loop and cannot be made abundant. A cost scissors opens between the two layers: information rises while internalized capacity does not automatically follow. The real question follows – once AI supplies answers on demand, does a person still need to internalize knowledge into "knowing how"?
The two layers are mechanically different. "Knowing" is the transport of facts: one query, one generation, marginal cost toward zero, copyable and instantly available. "Doing" is the growth of skill: it requires a structure – make an attempt, get feedback, notice the gap, correct, repeat. In this loop "error" is not waste but signal; "correction" is not a patch but the very moment learning happens. AI can produce the right answer for you, but it cannot run the loop for you – because what the loop changes is your cognitive structure, not the artifact.
这道剪刀差有数十年、不依赖 AI 的硬证据撑着(进证据账):
This scissors is backed by decades of hard, AI-independent evidence (the evidence ledger):
The testing effect: active retrieval (quizzing yourself) beats rereading for long-term retention – yet rereading looks better in the short term, so people systematically misjudge (Roediger & Karpicke 2006, II, replicable). AI maximizes the ease of the "reread" mode, landing squarely in that misjudgment trap [R1].
The load-bearing part of deliberate practice is the structured error-and-correction loop, not the hour count. Ericsson 1993's claim that practice volume dominates was substantially shrunk by Macnamara et al. 2014's meta-analysis (about 12% of performance variance overall; education 4%, professions <1%, n.s.), so cite it with care: what bears weight is the loop's structure, not "10,000 hours" (an overstated pop-claim) [R3].
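The retrieval-over-rereading finding can be turned into a small scheduling rule, sketched below. The expanding gaps in `GAPS` and the reset-on-failure rule are assumptions for illustration; the source prescribes spacing and retrieval but no specific schedule.

```python
from datetime import date, timedelta

# Assumed expanding gaps in days; not taken from the source.
GAPS = (1, 3, 7, 14)


def next_review(last_review, streak, recalled):
    """Return (next_date, new_streak) after one closed-book retrieval attempt."""
    if not recalled:
        streak = 0  # failed recall: restart at the shortest gap
    else:
        streak = min(streak + 1, len(GAPS) - 1)
    return last_review + timedelta(days=GAPS[streak]), streak


d, s = next_review(date(2025, 1, 1), streak=0, recalled=True)
print(d, s)  # 2025-01-04 1
```

The input is whether you recalled the material with the source closed, not whether it looked familiar on rereading. This keeps the rule on the retrieval side of the testing effect.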
FIG. L.0 / 成本剪刀差 THE COST SCISSORS
看懂:两条成本曲线随 AI 能力反向张开
Read: two cost curves fan apart as AI improves
图里的两条线:红线是"知道"的单位成本,随模型变强塌向零;蓝线是"能做"的单位成本,几乎不动,因为它要的是犯错-纠正循环这种结构性时间,AI 压缩不掉。两线张开的缺口,就是认知错觉的温床:信息越廉价,人越容易把"查得到"误当成"我会了"。
The two curves: the red curve is the unit cost of "knowing that," collapsing toward zero as models improve; the blue curve is the unit cost of "knowing how," nearly flat – because it requires the structural time of an error-and-correction loop, which AI cannot compress. The widening gap between them is the breeding ground of a cognitive illusion: the cheaper information gets, the more readily people mistake "I can look it up" for "I have learned it."
"是否还需内化"——把问题推到极限
"Do we still need to internalize" – pushed to the limit
把本张的机理推到极限,会撞上一个不能回避的问题:既然 AI 随时作外脑供答案,人是否还需要把知识内化为"能做"?诚实地回答它,而不是反射式地说"当然需要"。先承认对手最强的版本:对于纯粹的"知道"层——记住某个 API 的参数、某个史实的年份、某段代码的样板——答案是不需要,且早就不需要了;强行内化这些,是在和一个已解掉的瓶颈较劲。问题的全部分量落在"能做"层和它上面的判断结构。这里答案是仍然需要,但理由必须精确,否则站不住。理由不是"以防 AI 哪天不在"(那是脆弱的实用主义),而是 SHEET 03/06 那条更深的链:"能做"的根基是你质疑 AI、做出判断、保有品味的前提——没有它,你连"AI 这次对不对"都判断不了,于是从一个能驾驭工具的人,降格成一个只能全盘接受工具输出的人。所以"是否还需内化"的精确答案是:内化离散知识——不需要;内化让你保有判断主权的那套能力——比任何时候都更需要。这个一刀切不开的回答,正是②步那道分叉(FIG L.3)存在的理由。
Push this sheet's mechanics to the limit and you hit an unavoidable question: since AI stands by as an external brain handing over answers, does a person still need to internalize knowledge into "knowing how"? Answer it honestly, not with a reflexive "of course." First grant the opponent's strongest version: for the pure "knowing" layer – memorizing an API's parameters, a historical date, a code boilerplate – the answer is no, and has been for some time; forcibly internalizing these is wrestling a solved bottleneck. The whole weight of the question falls on the "doing" layer and the judgment structure above it. Here the answer is still yes, but the reason must be precise or it will not hold. The reason is not "in case AI is unavailable someday" (fragile pragmatism) but the deeper chain of SHEET 03/06: the foundation of "doing" is the precondition for your challenging AI, judging, and holding taste – without it you cannot even judge "is AI right this time," and so you drop from someone who commands the tool to someone who can only accept its output wholesale. So the precise answer to "do we still need to internalize" is: internalize discrete knowledge – no; internalize the capacities that keep you sovereign in judgment – more than ever. That this answer cannot be delivered in one clean cut is precisely why the step-② fork (FIG L.3) exists.
识别与重现:同一份内容,两个完全不同的成本
Recognition vs reproduction: one content, two utterly different costs
把"知道"与"能做"的成本差拆到认知操作这一层,会落在一对经典区分上:识别(recognition)与重现(recall/production)。识别是"看到正确答案能认出它对"——它廉价、快速、且 AI 把它推到了极致:任何讲解你读完都会觉得"对,我懂了"。重现是"在没有提示的情况下从头把它生成出来"——它昂贵、滞后,且无法被任何外部工具代偿,因为重现要求的是你脑中已有那条可被主动激活的路径。这对区分解释了 AI 学习里最普遍的自欺:你跟着 AI 的推导读一遍,每一步都"识别"得很顺,于是判定自己学会了;但真正的检验是合上 AI 从头"重现",这时大多数当时觉得懂了的东西会塌掉。识别的流畅,被大脑误当成重现的能力——这正是 FIG L.0 那条"知道-能做"缺口在主观体验里的样子。本卷所有"先自答、撤除演练、迁移测试"的处方,本质上都是在强制把检验从识别切换到重现,因为只有重现这一关,能把真实的"能做"和虚假的"看着会"分开。
Push the cost gap between "knowing" and "doing" down to the layer of cognitive operations and it lands on a classic distinction: recognition versus reproduction (recall/production). Recognition is "seeing the right answer and recognizing it as right" – cheap, fast, and AI has pushed it to the extreme: read any explanation and you will feel "yes, I get it." Reproduction is "generating it from scratch with no prompt" – expensive, lagged, and uncompensable by any external tool, because reproduction requires a path already in your head that you can actively activate. This distinction explains the most universal self-deception in AI learning: you read along with AI's derivation, "recognize" each step smoothly, and conclude you have learned it; but the real test is to close AI and "reproduce" from scratch, at which point most of what felt understood collapses. The fluency of recognition is misread by the brain as the capacity for reproduction – exactly what the FIG L.0 "knowing-doing" gap looks like in subjective experience. All the volume's prescriptions of "self-answer first, removal drills, transfer tests" are, at root, forcing the test to switch from recognition to reproduction, because only the reproduction gate separates real "doing" from the false "looks like I can."
为什么"能做"压不下来:循环改变的是结构,不是产物
Why "doing" won't fall: the loop changes the structure, not the artifact
Spell out why the blue line will not budge. "Doing" is expensive not because a smarter algorithm is still missing, but because its product is the change in the learner's own cognitive structure – and structural change can only be lived through by the owner of that structure. A proof, a refactor, a clinical judgment: AI can deliver the correct final artifact in a second, but in that second your neural representation has not shifted at all. Skill psychology calls this proceduralization: declarative "I know the rule" compiles, slowly and only through heavy feedback-laden execution, into procedural "I do it right without thinking." That compilation has its own time constant, unrelated to parameter count. The stronger the AI, the more steps it skips for you – and the skipped steps are exactly where compilation would have happened.
这也解释了一个反复出现的错觉:跟着 AI 的讲解点头如捣蒜、当场全懂,三天后却复现不出。点头时动用的是"知道"层的识别(recognition),廉价且即时;复现要的是"能做"层的提取与生成(recall & production),昂贵且滞后。两层在当下感受里几乎无法区分——这正是合意困难家族最危险的元认知陷阱:表面流畅度(fluency)被大脑误读成已掌握,而真实的掌握往往伴随当下的吃力(见 SHEET 14 的速度公理)。AI 把"知道"层的流畅度推到极致,于是这个错觉也被推到极致。
This also explains a recurring illusion: you nod along to AI's explanation, feel you fully understand in the moment, then cannot reproduce it three days later. Nodding engages the "knowing" layer's recognition – cheap and immediate; reproduction demands the "doing" layer's recall and production – expensive and lagged. The two layers are nearly indistinguishable in the felt present – which is the most dangerous metacognitive trap of the desirable-difficulty family: surface fluency is misread by the brain as mastery, while real mastery usually comes with present-tense effort (see the speed axiom, SHEET 14). AI pushes the "knowing" layer's fluency to the extreme, so this illusion, too, is pushed to the extreme.
成本剪刀差是全卷的力学脊柱
The cost scissors is the volume's mechanical spine
Collapse this sheet's mechanics into one portable sentence: information rises, but internalized capacity does not automatically follow. This scissors is not an isolated finding of this sheet but the mechanical starting point of every later SHEET, and it is worth naming here how it runs downward through the whole volume. Because "knowing" collapses toward zero while "doing" stays put, there is SHEET 03's goal shift – learning's load moves upstream from holding answers to asking/challenging/integrating. Because the gap between the two layers is the breeding ground of cognitive illusion, there is SHEET 04's fracture concern – people systematically mistake "I can look it up" for "I have learned it," idling the deep-thinking muscle unawares. Because "doing" grows only in a feedback-laden loop and needs physical time to consolidate, there are SHEET 05's scaffold and SHEET 14's speed axiom – engineering the loop structure and the time constant into the toolflow. Because the scissors makes certain cognition "seem outsourceable yet atrophy upon outsourcing," there is SHEET 06's stop-line. In other words, the whole volume's dissent, prescriptions, and dashboard are all corollaries grown from this one cost asymmetry of "information cheap, internalization expensive." Grasp the scissors and you hold the master key to every later sheet; ignore it and every later claim looks like an isolated attitude rather than the inevitable result of one mechanics.
那个必须诚实处理的反例:刻意练习被高估的量级
The counterexample we must handle honestly: deliberate practice is overstated
Resting "doing is expensive" on "deliberate practice" alone invites a real counterexample to turn the tables, so handle it first. Ericsson 1993 attributed expert performance mainly to practice volume, spawning the popular "10,000 hours" narrative. But Macnamara et al. 2014's meta-analysis substantially shrank that magnitude: deliberate practice explains only about 12% of performance variance overall, with huge between-domain spread – games 26%, music 21%, sports 18%, education just 4%, professions <1% and non-significant. So practice hours are far from the whole of skill; talent, starting point, and task structure carry much of it. The volume's load-bearing anchor is therefore not "practice long enough" but the narrower, sturdier part: the structure of a feedback-laden error-correction loop is an indispensable link in skill growth. The loop structure is a necessary, not sufficient, condition; hours are a noisy proxy, cited with care. Only at this precision does "doing is expensive" hold, rather than being swept away by "10,000 hours is a myth."
检验信号
Test signal
迁移测试通过率,而非答案召回率——若一个人能在没有 AI 在场的新情境里把一项技能用出来,"能做"就长出来了;若只有 AI 在场时才表现好,长出来的是依赖,不是能力。补一条更狠的:合上 AI,间隔三天,能否凭记忆重新生成(不是再认)那条推导——能,才算编译完成。
Transfer-test pass rate, not answer-recall rate – if a person can deploy a skill in a new situation with no AI present, "knowing how" has grown; if they perform well only when AI is present, what grew is dependence, not capability. A harsher test: with AI closed and after a three-day gap, can you regenerate (not merely recognize) that derivation from memory? Only if you can is compilation complete.
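One way to make this test signal concrete is to log attempts and compare pass rates with and without AI present. The sketch below is a minimal illustration, not a validated instrument: the `Attempt` record, its field names, and the `dependence_gap` helper are hypothetical names introduced here, and it assumes you can honestly label each attempt as a new situation or a repeat.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    skill: str
    novel_context: bool  # True if the task was a new situation, not a repeat
    ai_present: bool     # True if AI help was available during the attempt
    passed: bool


def pass_rate(attempts, *, ai_present):
    """Pass rate over novel-context attempts under one AI condition."""
    pool = [a for a in attempts
            if a.novel_context and a.ai_present == ai_present]
    if not pool:
        return None  # no evidence either way
    return sum(a.passed for a in pool) / len(pool)


def dependence_gap(attempts):
    """Pass rate with AI minus pass rate without AI, on novel tasks only.

    A hypothetical summary number: the source names the signal
    (transfer without AI) but does not define a formula or a threshold.
    """
    with_ai = pass_rate(attempts, ai_present=True)
    without_ai = pass_rate(attempts, ai_present=False)
    if with_ai is None or without_ai is None:
        return None
    return with_ai - without_ai


log = [
    Attempt("derivation", novel_context=True, ai_present=True, passed=True),
    Attempt("derivation", novel_context=True, ai_present=True, passed=True),
    Attempt("derivation", novel_context=True, ai_present=False, passed=True),
    Attempt("derivation", novel_context=True, ai_present=False, passed=False),
]
transfer_rate = pass_rate(log, ai_present=False)
gap = dependence_gap(log)
```

In this sheet's terms, `transfer_rate` is the number that matters; a large positive `gap` would be read as dependence rather than capability. How large counts as "large" is left open, since the text gives no threshold.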
LEARN
03
REDRAW · 重画
REDRAW
重画 · 目标转移
Redraw · Goal shift
从内化知识,到成为更好的提问者
From Internalizing Knowledge to Becoming a Better Question-Asker
承重命题(推至极限的目标转移):若答案随取,"记住答案"贬值,学习目标移向在与 AI 协作中保持主导权的元能力——提问、质疑、整合。目标不是成为更好的答案持有者,是成为更好的问题提出者。但有陷阱:元能力本身仍要"能做"层的练习,不能空谈。
Load-bearing claim (a goal shift pushed to the limit): if answers are on demand, "remembering the answer" devalues, and the goal moves to the meta-skills that keep you in command while collaborating with AI – asking, challenging, integrating. The goal is not to be a better answer-holder but a better question-poser. But there is a trap: the meta-skills themselves still need the "knowing how" layer of practice, not empty talk.
When an accurate answer is one prompt away, value moves upstream from "holding the answer" to "posing the question worth asking, spotting the wrong answer, integrating many sources into a judgment" – that is, the three of asking, challenging, integrating. Only those who can ask well discover real problems; this is exactly where it feeds the upstream innovation volume's "value discovery" (a better question-asker = the input to innovation).
但别把它写成空中楼阁。"提问/质疑/整合"听起来像可以脱离具体知识独立训练的纯元能力——这是危险的误读。你无法质疑一个你毫无"能做"层根基的领域里的 AI 输出:质疑的命中,靠的是你亲手趟过的犯错-纠正循环给你的那种"这里不对劲"的直觉。元能力不是替代"能做",它长在"能做"之上(回接 SHEET 02)。所以这一卷的目标转移不是"从此不用内化了",而是"内化的对象变了"——从内化离散事实,变成内化提问/质疑/整合的判断结构。
But do not write it as a castle in the air. "Asking / challenging / integrating" sounds like pure meta-skills trainable in isolation from concrete knowledge – a dangerous misreading. You cannot challenge an AI output in a domain where you have no "knowing how" foundation: the hit-rate of a challenge rides on the "something's off here" intuition that only the error-and-correction loop you ran yourself can give you. Meta-skills do not replace "knowing how"; they grow on top of it (back to SHEET 02). So this volume's goal shift is not "you no longer need to internalize anything" but "the object of internalization has changed" – from internalizing discrete facts to internalizing the judgment structure of asking, challenging, and integrating.
FIG. L.4 / 犯错-纠正循环 / THE ERROR-CORRECTION LOOP
看懂:能做只在这个闭环里生长,AI 只能进校验环
Read: doing grows only inside this loop; AI enters only at verify
顺着环读:能做之所以贵,是因为它只在这个闭环里生长——尝试、犯错、先自纠、再校验、重做。Ericsson 1993 的承重处不是"一万小时",是这个带反馈的纠错环(Macnamara 2014 元分析显示练习量只解释约 14% 方差,所以慎引时数)。AI 的合法位置是④校验,且必须在你自纠之后进;一旦把它挪到①前面替你出第一稿,整个环就被绕过,产物出来了而你的结构没动。
Follow the loop: doing is expensive because it grows only inside this loop – attempt, error, self-correct, verify, redo. Ericsson 1993's load-bearing part is not "10,000 hours" but this feedback-laden corrective loop (Macnamara 2014's meta-analysis shows practice volume explains only ~14% of variance, so cite hours with care). AI's legitimate seat is ④ verify, and it must enter after your self-correction; move it ahead of ① to draft for you and the whole loop is bypassed – the artifact ships while your structure stays still.
FIG. L.10 / 元认知监测环 / THE METACOGNITIVE MONITORING LOOP
看懂:监测自己学得准不准——而 AI 专门腐蚀这个环的第一格
Read: the loop that keeps your self-read honest – and AI corrupts its first node
顺着环看:能不能停在 FIG L.9 的脚手架一侧,取决于这个三步环转得准不准。①自评"我真的会了吗";②把自感与一次真实测验(而非重读的流畅感)对齐——这一步就是校准;③针对差距做合意困难的练习;再回到①。环的命门在第一格:AI 持续往里注入合成自信——[R19] METR 那条"自感更快、实测更慢"正是它的刻度。可操作的校准量来自 [R18]:你多常反驳 AI、且反驳得对;这个数下降,意味着①已被腐蚀,环在空转。练习一格之所以要的是 [R8] 合意困难而非轻松复述,也是为了让②的实测有真实信号可校准。
Around the loop: whether you can stay on FIG L.9's scaffold side depends on this three-node loop running true. ① self-assess "do I really know it?"; ② align that feeling against one real test (not the fluency of rereading) – this step is calibration; ③ run desirable-difficulty practice on the gap; back to ①. The loop's weak point is the first node: AI keeps injecting synthetic confidence – [R19] METR's "felt faster, measured slower" is its gauge. The operable calibration quantity comes from [R18]: how often you push back on AI, and correctly; when that number falls, node ① is corrupted and the loop spins empty. Practice must be [R8] desirable difficulty rather than easy restatement precisely so node ②'s test has a real signal to calibrate against.
这卷的反调,最终落在一个正向的人像上
The dissent finally lands on a positive portrait of a person
容易把这一卷读成纯粹的防守——划止步线、抵抗便利、防萎缩,仿佛它全部的内容都是"别让 AI 拿走什么"。但它真正的落点是正向的:一个在 AI 时代更强的人是什么样子。那个人不是记得更多事实的人(那一层已经廉价),不是打字更快的人(执行已经充裕),而是一个更好的提问者——他能在一片被 AI 拉平的信息里嗅出哪个问题真正值得问,能在 AI 笃定的输出里一眼看出哪里不对劲并反驳得准,能把彼此冲突的多个来源在自己脑中熔成一个自洽的判断。这三件能力(提问、质疑、整合)共同勾勒出的,不是一个抗拒工具的怀旧者,而是一个驾驭工具、且始终知道自己要去哪的认知主体。本卷所有"反调"的纪律——先想后问、保留循环、设合意困难、划止步线——存在的唯一目的,是让这个人像能在便利的默认引力下不被磨平。所以这一卷的底色其实是建设性的:它不是要你少用 AI,是要你用 AI 用成那个更强的人,而不是那个把判断悄悄让渡出去、连自己为什么同意都说不清的人。守住认知主权,是为了让你有资格、也有能力,去做那个回归于意义的判断者——这正是组织卷"人回归于意义"在个体认知层的前提条件。
It is easy to read this volume as pure defense – draw the stop-line, resist convenience, prevent atrophy, as if its whole content were "don't let AI take something away." But its real landing point is positive: what a stronger person looks like in the AI era. That person is not one who remembers more facts (that layer is already cheap), nor one who types faster (execution is already abundant), but a better question-asker – one who can sniff out, in a field of information flattened by AI, which question is truly worth asking; who can see at a glance where a confidently-toned AI output is wrong and push back accurately; who can fuse several conflicting sources in their own head into one self-consistent judgment. The three capacities (asking, challenging, integrating) together sketch not a nostalgist resisting tools but a cognitive subject who commands the tool and always knows where they are going. The sole purpose of all the volume's "dissenting" disciplines – think before you ask, keep the loop, build desirable difficulty, draw the stop-line – is to keep this portrait from being worn flat under convenience's default gravity. So the volume's true ground tone is constructive: it does not ask you to use AI less but to use AI in a way that makes you that stronger person, rather than the one who quietly cedes judgment and cannot even say why they agreed. The point of guarding cognitive sovereignty is that you become both entitled and able to be the judge who returns to meaning – precisely the individual-cognition precondition of the org volume's "people return to meaning."
整合:为什么它是三件里最难、也最易退化的
Integration: the hardest of the three, and the easiest to degrade
提问、质疑、整合三件元能力里,整合最常被低估,因为它看起来只是"把几个来源放一起"。但整合的内核动作其实很苛刻:把多个可能彼此冲突的来源,放进同一张心智模型里,求一个自洽的判断。这要求你脑中先有一张可以被冲突的模型——没有这张模型,所谓整合就退化成两种更廉价的赝品:要么是拼贴(把各来源的结论并排抄下,不做调和,看起来全面其实没判断),要么是随大流(直接采信出现频率最高或语气最自信的那个,把众数当真理)。AI 的便利恰好同时助长这两种赝品:它能瞬间汇总十个来源(诱你拼贴),也能给出一个语气笃定的综合答案(诱你随大流)。真正的整合反而要求你慢下来,亲自去碰那些来源之间的矛盾,并用自己的模型去裁决——这又是一个"价值在于慢"的具体例子(接 SHEET 14)。所以整合能力的退化往往最隐蔽:你以为自己在"综合信息",其实只是在转述 AI 替你做好的综合,而你那张本该被反复冲突、反复修正的模型,从未上场。
Of the three meta-skills – asking, challenging, integrating – integration is most often underrated, because it looks like merely "putting several sources together." But integration's core move is demanding: taking multiple, possibly conflicting sources and placing them into one mental model to reach a self-consistent judgment. This requires you to already hold a model that can be contradicted – without it, "integration" degrades into two cheaper counterfeits: either collage (copying each source's conclusion side by side without reconciliation, looking comprehensive but judging nothing) or going with the crowd (simply trusting the most frequent or most confidently-toned one, mistaking the mode for the truth). AI's convenience fosters both counterfeits at once: it can instantly summarize ten sources (tempting collage) and give one confidently-toned synthesized answer (tempting crowd-following). Real integration instead requires you to slow down, personally touch the contradictions between sources, and adjudicate with your own model – another concrete case of "value lies in slowness" (continuing SHEET 14). So the degradation of integration is often most hidden: you think you are "synthesizing information" when you are merely paraphrasing the synthesis AI did for you, while your model – the one that should be repeatedly contradicted and corrected – never took the field.
质疑命中率:为什么它无法靠"批判性思维课"凭空补
Challenge hit-rate: why a "critical-thinking course" can't conjure it
有一种流行的补救幻想:既然 AI 让人少思考,那就单独开一门"批判性思维"或"提问技巧"课把元能力补回来。本卷要明确否定这条捷径,因为它误解了质疑能力的来源。质疑 AI 输出的命中率,本质是领域内的错误侦测灵敏度——你能不能一眼看出"这个论断不对/这段代码有 bug/这个临床结论可疑"。这种灵敏度是高度领域特异的:一个资深医生能瞬间嗅出可疑的诊断,但在陌生的法律文书前同样会被 AI 的流畅蒙住。它来自该领域里大量亲历过的"这里不对劲"的瞬间,是 SHEET 02"能做"层的直接沉淀,无法靠一门通用的思维课迁移过去。所以"成为更好的提问者"不是去上课,是在你真正在乎的那个领域里保持亲手做的密度——保住那个让你有资格质疑的根基。这也反过来收紧了本卷的反调:抵抗便利不是泛泛地"多动脑",而是精确地守住那几个你需要保有错误侦测灵敏度的领域。
There is a popular remedial fantasy: since AI makes people think less, just open a standalone "critical thinking" or "questioning skills" course to top the meta-skills back up. The volume must explicitly reject this shortcut, because it misunderstands where the challenging capacity comes from. The hit-rate of challenging AI output is, at root, in-domain error-detection sensitivity – whether you can see at a glance that "this claim is wrong / this code has a bug / this clinical conclusion is suspect." That sensitivity is highly domain-specific: a senior physician can instantly smell a dubious diagnosis but, faced with an unfamiliar legal document, is just as easily fooled by AI's fluency. It comes from the many lived "something's off here" moments inside that domain, a direct deposit of the SHEET 02 "doing" layer, and cannot be transferred over by a generic thinking course. So "become a better question-asker" is not taking a class but keeping up the density of hands-on doing in the domain you actually care about – guarding the foundation that qualifies you to challenge. This in turn tightens the volume's contrarian stance: resisting convenience is not vaguely "think more" but precisely guarding the few domains where you need to retain error-detection sensitivity.
"更好的提问者"为什么是上游创新的输入
Why "a better question-asker" is the input to upstream innovation
Setting the goal at "a better question-asker" rather than "a better answer-holder" is not only an internal redraw of the learning volume; it has a precise interface in the series: only those who can ask well discover real problems, and real problems are the raw material of the innovation volume's "value discovery." This interface deserves clarity, or "question-asker" sounds like a vague compliment. A question's quality equals the degree to which it hits a real need times the degree to which it translates the vague into an attackable structure. AI is superb at searching for answers within a given question, yet it cannot judge for you whether the question itself is worth asking – the latter demands a sense of real-world gaps that can only grow from your lived experience in the domain (experience, SHEET 00). So "question quality" is not rhetoric but a structured capacity, and its ceiling is set by the foundation of your "doing" layer (the defense SHEET 03 keeps stressing).
这也给本卷的反调一个建设性的出口:抵抗便利不是为了守旧,而是为了守住那个"能发现真问题"的认知主体。如果一个人把所有提问都退化成"帮我查一下",久而久之他失去的不只是某项技能,而是识别什么值得问的那种品味——而那恰恰是 AI 时代最稀缺、最不可外包、且整个下游创新链都依赖的东西。学习卷守住它,创新卷才有上游的水源。
This also gives the volume's contrarian stance a constructive exit: resisting convenience is not for the sake of the old but to guard the cognitive subject who "can discover real problems." If a person degrades every question into "look this up for me," over time they lose not just some skill but the taste for recognizing what is worth asking – exactly the thing that is scarcest, most un-outsourceable in the AI era, and on which the whole downstream innovation chain depends. The learning volume guards it so the innovation volume has an upstream water source.
提问、质疑、整合:可训练的结构,不是天赋
Asking, challenging, integrating: a trainable structure, not a gift
Pull the three meta-skills apart and you can see which known mechanism each rests on, hence how to train them rather than chant "learn to ask questions." The core of asking is problem representation – translating a blob of vague discomfort into a structured, attackable question; it is trained by repeatedly doing problem decomposition by hand (SHEET 07, right column, first item), not by watching how others ask. The core of challenging is error detection – and error detection is a recognition capacity whose sensitivity is directly proportional to how many times you have personally lived "something's off here" in that domain; this is why challenge hit-rate is a downstream product of the SHEET 02 "doing" layer, and why an ungrounded challenge is mere contrarianism. The core of integrating is forcing several sources into one mental model and resolving for consistency – it requires that you already hold a model that can be contradicted, otherwise "integration" degrades into collage. All three grow on top of doing, which loops straight back to SHEET 02's warning that meta-skills cannot be talked into being.
这给"目标转移"一个不被误读的精确表述:目标不是从"内化"转向"不内化",而是内化的对象从离散事实上移到了这三套判断结构。它们同样昂贵、同样只能靠犯错-纠正循环长出来——区别只在于,循环的素材从"题目"变成了"我对 AI 的每一次提问与质疑"。一个健康的 AI-Native 学习者,把每一次与 AI 的交互都当成一次练这三件的机会,而不是一次省事。
This gives the "goal shift" a precise statement immune to misreading: the goal is not to move from "internalize" to "don't internalize," but that the object of internalization moves up from discrete facts to these three judgment structures. They are equally expensive and equally can only grow through the error-correction loop – the only difference is that the loop's material changes from "problems" to "each of my questions to and challenges of AI." A healthy AI-Native learner treats every interaction with AI as a rep on these three, not as a shortcut.
检验信号
Test signal
质疑 AI 输出的频率与命中率、提问质量、整合多源的能力——而非答案召回率。一个健康的 AI-Native 学习者,会越来越多地反驳 AI、并且反驳对。
The frequency and hit-rate of challenging AI outputs, the quality of questions, the ability to integrate multiple sources – not answer-recall rate. A healthy AI-Native learner increasingly pushes back on AI, and pushes back correctly.
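The first two quantities in this test signal, challenge frequency and challenge hit-rate, can be tallied from a simple interaction log. The following is a minimal sketch under stated assumptions: the tuple layout and the function name `challenge_stats` are hypothetical, and it assumes each challenge was later checked against some ground truth so that "correct" has a definite value.

```python
def challenge_stats(interactions):
    """Return (challenge frequency, challenge hit-rate).

    `interactions` is a list of (challenged, challenge_was_correct) pairs,
    one per AI interaction. `challenge_was_correct` is ignored when
    `challenged` is False. This layout is an illustrative assumption.
    """
    total = len(interactions)
    outcomes = [bool(correct) for challenged, correct in interactions
                if challenged]
    frequency = len(outcomes) / total if total else None
    hit_rate = sum(outcomes) / len(outcomes) if outcomes else None
    return frequency, hit_rate


# One hypothetical week: four interactions, two challenges, one of them right.
week = [(True, True), (True, False), (False, None), (False, None)]
frequency, hit_rate = challenge_stats(week)
```

Tracking both numbers matters because they can move apart: frequency rising while hit-rate falls is the ungrounded contrarianism the text warns about, whereas both rising together matches "pushes back on AI, and pushes back correctly."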
Load-bearing claim (the fracture point · the most counter-intuitive sheet in the volume, on the exploration ledger): if you outsource deep thinking to AI for the long run, does deep thinking itself atrophy? This is the open question – the evidence reaches only correlation/short-term and cannot yet be settled. If the answer is yes, the methodology's core task flips from "efficient internalization" to "actively resisting convenience." This is the first time the series explicitly taps the brakes on "embrace AI wholesale."
立场先说清:B——有据警告 + 可证伪赌注,不是"AI 已经造成认知能力下降"。把萎缩写成已发生的事实,既越过了证据,也会反噬这卷的可信度。诚实的表述是:有一类机理上合理、且有先例的担忧,但它在 AI 这个对象上尚未被证实。两边证据都要摆上桌。
State the stance up front: B – an evidence-grounded warning plus a falsifiable bet, not "AI makes you stupid." Writing atrophy as an established fact both overshoots the evidence and backfires on this volume's credibility. The honest statement is: there is a class of concern that is mechanistically plausible and has precedent, but on the object of AI it is not yet proven. Both sides of the evidence go on the table.
担忧侧 · 待坐实
Concern side · to be substantiated
相关 / 短期信号
Correlational / short-term signals
Gerlich 2025[R4](Societies 15(1):6,N=666,横断面相关):AI 使用与批判性思维显著负相关,由认知卸载中介(总效应 b=-0.42)——作者自承不能证因果、无纵向数据。MIT/Kosmyna 2025[R5](arXiv:2506.08872,preprint,Ⅲ,N=54→第四轮仅 18):EEG 显示"认知负债"累积——样本极小、未经评审。机理先例:Sparrow 等 2011 "Google 效应"(Science 333:776)[R6]——预期可再获取则记得信息更少、记得"去哪找"更多。
Gerlich 2025[R4] (Societies 15(1):6, N=666, cross-sectional correlational): AI use correlates significantly and negatively with critical thinking, mediated by cognitive offloading (total effect b=-0.42) – the author concedes no causation, no longitudinal data. MIT/Kosmyna 2025[R5] (arXiv:2506.08872, preprint, III, N=54 → only 18 by the 4th session): EEG shows accumulating "cognitive debt" – tiny sample, un-reviewed. Mechanistic precedent: Sparrow et al. 2011 "the Google effect" (Science 333:776)[R6] – when re-access is expected, people recall the info less and recall where to find it more.
反证侧 · 必须摆上
Counter side · must be shown
最强因果反指向上
The strongest causal evidence points up
最强的因果研究(脚手架式 AI 辅导)给出 +0.73 ~ 1.3 SD 的正效应——用对了,AI 显著提升学习。"认知再分配,而非衰退"(aiXiv 260215,2026 综述)[R7]:现有研究多为小样本、无纵向、相关设计,且把"在被卸载任务上少花力气"误读成"泛化损伤";AI 可能在把认知资源从可卸载任务重新分配到评估/综合/元认知。且 Sparrow 2011 在重复实验中未能稳定复现。
The strongest causal studies (scaffolded AI tutoring) yield positive effects of +0.73 to 1.3 SD – used well, AI lifts learning significantly. "Cognitive redistribution, not decline" (aiXiv 260215, 2026 review)[R7]: existing studies are mostly small-sample, non-longitudinal, correlational, and conflate "less effort on offloaded tasks" with "generalized impairment"; AI may be reallocating cognitive resources from offloadable tasks toward evaluation / synthesis / metacognition. And Sparrow 2011 has failed to replicate robustly.
分水岭 · "怎么用"
The hinge · "how you use it"
影响不固定,取决于用法
The effect is not fixed; it depends on use
结构化提示 RCT(MDPI Data 2025, 10(11):172,n=150):无引导地用 AI 助长卸载、不提升推理;结构化地用则显著降低卸载、提升批判推理与反思。结论:认知影响不是固定的,取决于你怎么用——这正给了"主动抵抗便利"一个可操作的支点。
A structured-prompting RCT (MDPI Data 2025, 10(11):172, n=150): unguided AI use fosters offloading without improving reasoning; structured use significantly reduces offloading and improves critical reasoning and reflection. Conclusion: the cognitive effect is not fixed; it depends on how you use it – which is exactly the operable foothold for "actively resisting convenience."
所以这一卷不下"萎缩已发生"的判决,它下一个赌注:在证据落定之前,把"主动抵抗便利"当成低成本的保险来买。抵抗不是拒用 AI,是几个可操作的姿态——刻意先想后问(先自己产出假设,再让 AI 校验)、刻意手算/手写一遍、刻意延迟求助(给犯错-纠正循环留出发生的时间)。注意一个边界:合意困难只对有基础能成功响应的学习者有益,否则只是"不合意的困难"(Bjork 的告诫)——抵抗要有度,不是自虐。
So this volume does not deliver a verdict of "atrophy has happened"; it places a bet: before the evidence settles, buy "actively resisting convenience" as low-cost insurance. Resistance is not refusing AI; it is a few operable postures – deliberately think before you ask (produce your own hypothesis first, then let AI check it), deliberately compute/write it out by hand once, deliberately delay help (leave the error-and-correction loop time to happen). Note a boundary: desirable difficulty benefits only learners with enough background to respond successfully, otherwise it is merely "undesirable difficulty" (Bjork's caveat) – resistance should be measured, not self-flagellation.
证伪条件
Falsification condition
这个赌注为假的条件:若一个纵向、随机、对长期可迁移能力的研究显示——重度 AI 协作者在无 AI 在场的迁移任务上不退步(甚至因再分配而进步),则"抵抗便利"的保险不必买,本卷②步的防御性主张随之削弱。先行指标:出现首批多年期纵向数据(目前为零);结构化用法与无引导用法的长期差距收敛或扩大。
This bet is false if: a longitudinal, randomized study of long-term transferable capacity shows that heavy AI-collaborators do not regress on transfer tasks with no AI present (or even improve via redistribution); then the "resistance" insurance need not be bought, and this volume's defensive ② claim weakens accordingly. Leading indicators: the first multi-year longitudinal datasets appear (currently zero); the long-term gap between structured and unguided use converges or widens.
FIG. L.2 / 合意困难曲线 / THE DESIRABLE-DIFFICULTY CURVE
看懂:长期留存对难度的倒 U,AI 把你推向左端
Read: long-term retention is an inverted U over difficulty, and AI pushes you toward the left end
沿曲线看:横轴是学习当下的难度/努力,纵轴是长期留存与迁移——两者是倒 U,不是单调。左端(太容易)流畅却不留痕;右端(太难、无基础)只剩挫败,是 Bjork 所说的"不合意的困难";峰值落在中间的合意困难带。AI 的便利默认持续把学习者推向左端——抵抗便利,本质就是主动把自己拉回到绿色带里。注意边界:把无基础者强行推到右端不是抵抗,是自虐。
Along the curve: the x-axis is in-the-moment difficulty/effort; the y-axis is long-term retention and transfer – an inverted U, not monotone. The left end (too easy) is fluent but leaves no trace; the right end (too hard, no base) is pure frustration, Bjork's "undesirable difficulty"; the peak sits in the desirable-difficulty band in the middle. AI's convenience default keeps pushing the learner leftward – so resisting convenience is, at root, actively pulling yourself back into the green band. Note the boundary: forcing a no-base learner to the right end is not resistance but self-flagellation.
FIG. L.11 / 两个对立假说 / TWO RIVAL HYPOTHESES
看懂:同一批数据,两条相反的解释,至今无人能裁决
Read: one body of data, two opposite readings, no one can yet adjudicate
从图里读出:左右两块用的是同一批观测——AI 重度使用者在某些任务上表现更弱。萎缩假说把它读成能力净退化(红,证据为 [R4][R5][R10] 这类相关/短期信号);再分配假说把它读成认知资源上移、被卸载的只是低阶任务(蓝,延伸心智 [R12] 与过渡相论 [R7] 同向)。关键在最底那条黑带:唯一能分开二者的实验——无 AI 在场时独立产出的纵向轨迹——至今没有数据。诚实的做法不是站队,而是把强命题降格为赌注,并去做那个实验。
What the figure says: the two panels read the same observations – heavy AI users do worse on certain tasks. The atrophy hypothesis reads it as net capacity decline (orange; its evidence is correlational/short-term signals like [R4][R5][R10]); the redistribution hypothesis reads it as cognitive resources moving up while only low-level tasks are offloaded (blue; the extended mind [R12] and a transition-phase view [R7] point the same way). The crux is the black bar at the bottom: the only experiment that could separate them – the longitudinal trajectory of independent output with no AI present – has no data yet. The honest move is not to take a side but to downgrade the strong claim to a bet and go run that experiment.
为什么"抵抗便利"是一份理性的保险,而非道德主张
Why "resist convenience" is rational insurance, not a moral claim
很多对 AI 与认知的讨论滑向道德语气——"应该自律""不该偷懒"。本卷刻意不走这条路,因为道德主张既无证据支撑、也无法操作。它把抵抗便利建构成一个纯粹的决策论问题:在萎缩假说与再分配假说尚未分出胜负的窗口期,"主动抵抗"这个动作的期望成本极低(改次序、留挣扎窗口,每天几分钟),而它对冲掉的潜在损失极大(如果萎缩假说成真,失去的是判断主权这种无法事后补救的能力)。一个低成本、对最坏情形高度保护、且在最好情形里也不亏(再分配假说下它恰好就是引导资源上移的动作)的选择,在任何理性决策框架里都是该买的保险。这就是为什么本卷反复说它下的是"赌注"而非"判决":判决需要证据落定,保险只需要不确定性足够大、且下行风险足够严重。当前的证据状态——强相关信号、强反证、零纵向数据——恰好就是"该买保险"的标准情形。把抵抗便利说成美德会让它显得可选、可被意志力豁免;把它说成保险,它就成了一个无论你性格是否自律都该执行的理性动作。
Much discussion of AI and cognition slides into a moral register – "should be disciplined," "shouldn't be lazy." The volume deliberately avoids that path, because a moral claim is neither evidence-backed nor operable. It frames resisting convenience as a pure decision-theory problem: in the window where the atrophy and redistribution hypotheses have not yet been decided, the expected cost of "actively resisting" is very low (reorder, keep a struggle window, a few minutes a day), while the potential loss it hedges is very large (if atrophy proves true, what is lost is judgment sovereignty, a capacity not recoverable after the fact). A choice that is low-cost, highly protective against the worst case, and not even losing in the best case (under redistribution it is precisely the move that steers resources upward) is, in any rational decision framework, insurance worth buying. This is why the volume keeps saying it places "a bet," not "a verdict": a verdict needs the evidence settled; insurance only needs the uncertainty large enough and the downside severe enough. The current evidence state – strong correlational signals, strong counter-evidence, zero longitudinal data – is exactly the textbook case for "buy the insurance." Calling resisting convenience a virtue makes it look optional, exemptible by willpower; calling it insurance makes it a rational move you should execute whether or not your temperament is disciplined.
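The decision-theory framing above can be written out as a two-line expected-loss comparison. The sketch below is illustrative only: the function names and all numbers are hypothetical, and it makes two simplifying assumptions that are ours rather than the volume's, namely that resisting fully averts the atrophy loss and that not resisting costs nothing if redistribution turns out to be true.

```python
def expected_losses(p_atrophy, cost_of_resisting, loss_if_atrophy):
    """Return (expected loss if you resist, expected loss if you do not).

    Simplifying assumptions (illustrative, not from the source):
    - resisting fully averts the atrophy loss;
    - under redistribution, not resisting costs nothing.
    """
    resist = cost_of_resisting
    do_not_resist = p_atrophy * loss_if_atrophy
    return resist, do_not_resist


def break_even_probability(cost_of_resisting, loss_if_atrophy):
    """Smallest p_atrophy at which resisting is no worse than not resisting."""
    return cost_of_resisting / loss_if_atrophy


# Hypothetical units: resisting costs 1; losing judgment sovereignty costs 100.
resist, do_not_resist = expected_losses(0.25, 1.0, 100.0)
threshold = break_even_probability(1.0, 100.0)
```

With these made-up numbers the insurance is worth buying whenever you put the chance of atrophy above 1 in 100. The shape of the argument is that the cost-to-loss ratio is small enough that the exact probability hardly matters, which is why the case needs only large uncertainty and a severe downside rather than settled evidence.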
抵抗的可操作姿态:不是少用,是改次序
The operable postures of resistance: not use less, but reorder
"主动抵抗便利"最容易被误解成"少用 AI 甚至不用 AI",那是把它读成了苦行。它的精确含义是改变与 AI 协作的次序与节奏,让犯错-纠正循环仍有发生的空间。给三个具体、低成本、可立刻照做的姿态。其一,刻意先想后问:任何要交给 AI 的问题,先用两分钟写下你自己的假设或第一版答案,再把它喂给 AI——这一步保证你的认知先上场,AI 退回校验位(接 SHEET 07 流向)。其二,刻意手算/手写一遍:对一项你想真正内化的技能,定期不用 AI、不用补全,从头手做一遍——哪怕慢、哪怕错,这是唯一能产生"能做"层留痕的动作。其三,刻意延迟求助:遇到卡点,先给自己一个固定的挣扎窗口(比如十五分钟)再求助 AI——这个窗口就是合意困难,是让记忆痕迹被加固的地方。三个姿态的共同点:成本极低、不需要意志硬扛、且都是在调次序而非减用量。
"Actively resisting convenience" is most easily mistaken for "use AI less or not at all," which reads it as asceticism. Its precise meaning is changing the order and rhythm of collaborating with AI so the error-correction loop still has room to happen. Here are three concrete, low-cost, immediately actionable postures. First, deliberately think before you ask: for any question you will hand to AI, spend two minutes writing your own hypothesis or first answer, then feed it to AI – this guarantees your cognition takes the field first and AI falls back to the verify seat (continuing SHEET 07's flow). Second, deliberately compute/write it out by hand once: for a skill you want to truly internalize, periodically do it from scratch with no AI and no autocomplete – slow, even wrong, this is the only action that leaves a "doing"-layer trace. Third, deliberately delay help: at a sticking point, give yourself a fixed struggle window (say fifteen minutes) before consulting AI – that window is the desirable difficulty, where the memory trace gets hardened. What the three share: very low cost, no white-knuckling required, and all reorder rather than reduce usage.
GPS 与计算器:先例为什么只能当机理提示,不能当判决
GPS and calculators: why precedents are only mechanistic hints, not verdicts
The two precedents the volume often invokes – GPS weakening spatial cognition, calculators weakening mental arithmetic – deserve an honest account of their evidence boundaries, or they become a soft spot used against it. GPS is the most cited: the London taxi-driver studies (Maguire 2000) found long-term spatial navigation correlating with a larger posterior hippocampus, and habitual GPS use correlating with less hippocampal grey matter (Dahmani & Bohbot 2020). But there is an unignorable direction problem here: does GPS cause hippocampal atrophy, or do people with a naturally weaker hippocampus tend to rely on GPS? Cross-sectional correlation cannot answer. The calculator-arithmetic line is weaker still – a widely repeated claim that lacks a strong first-hand empirical anchor, with most citations speaking in generalities, so the volume deliberately does not use it as hard evidence. Place the two precedents in their proper SHEET 04 position: they provide mechanistic plausibility (outsourcing some cognition truly could change inner capacity), not causal proof about AI (atrophy on the object of AI remains an open empirical question). Honestly distinguishing "mechanistic precedent" from "causal verdict" is the key step by which the volume avoids sliding into scaremongering – precedents let the concern stand, but only let it rest at the grade of "a bet," not reach "a verdict."
两个对立假说,都要放在桌上
Two rival hypotheses, both kept on the table
诚实地处理这个悬案,意味着不偷偷预设答案。眼前有两个机理上都站得住、且都有部分证据的对立假说,它们对同一批观测给出相反的解读,必须并列。萎缩假说:长期把深度思考外包给 AI,被卸载的内在能力像久不用的肌肉一样退化;它的证据是一批相关/短期信号(Gerlich、MIT、Sailer),机理先例是 GPS-海马与 Google 效应。再分配假说(aiXiv 260215,2026):AI 并不让认知能力净退化,而是把认知资源从可卸载的低阶任务重新分配到评估、综合、元认知等高阶功能;现有研究测到的"在被卸载任务上变弱",可能只是资源转移的过渡相,不是永久衰退;延伸心智(Clark & Chalmers)与之同向——卸载是认知边界外移,不是能力流失。两个假说的关键分歧在一个目前无人有数据回答的问题上:在无 AI 在场的迁移任务上,重度协作者的独立产出,随时间是下降还是上升?
Handling this open question honestly means not quietly presupposing the answer. There are two rival hypotheses, both mechanistically defensible and both with partial evidence, giving opposite readings of the same observations; they must stand side by side. The atrophy hypothesis: outsourcing deep thinking to AI for the long run lets the offloaded inner capacity decay like a long-unused muscle; its evidence is a batch of correlational/short-term signals (Gerlich, MIT, Sailer), with mechanistic precedent in GPS-hippocampus and the Google effect. The redistribution hypothesis (aiXiv 260215, 2026): AI does not net-degrade cognitive capacity but reallocates cognitive resources from offloadable low-order tasks toward higher-order functions – evaluation, synthesis, metacognition; what existing studies measure as "weaker on offloaded tasks" may be only the transition phase of that shift, not permanent decline; the extended mind (Clark & Chalmers) points the same way – offloading is the cognitive boundary moving outward, not capacity draining away. The hypotheses' key disagreement turns on a question no one currently has data to answer: on transfer tasks with no AI present, does a heavy collaborator's independent output fall or rise over time?
本卷不假装知道哪个假说为真——它的全部主张只是:在这个分歧未决的窗口期,"主动抵抗便利"是一份低成本、两头都对冲的保险。如果萎缩假说为真,抵抗就避免了真实的能力流失;如果再分配假说为真,抵抗(先自答、保留循环、设合意困难)恰好就是把资源往高阶功能引导的那套动作——它在两种世界里都不亏。这种"无论哪个假说成立都该做"的处方,是面对未决悬案时唯一稳健的下注方式。把它写成"AI 已经造成认知能力下降"会同时背叛两个假说的复杂性,也越过了证据——所以本卷反复把立场钉在 B 档:有据的警告,加一个写明证伪条件的赌注。
This volume does not pretend to know which hypothesis is true – its entire claim is only this: during this window of unsettled disagreement, "actively resisting convenience" is low-cost insurance that hedges both ways. If atrophy is true, resistance averts a real loss of capacity; if redistribution is true, resistance (self-answer first, keep the loop, build desirable difficulty) is exactly the set of moves that steers resources toward higher-order functions – it loses in neither world. This "do it whichever hypothesis holds" prescription is the only robust way to bet in the face of an open question. Writing it as "AI makes you stupid" would betray both hypotheses' complexity and overshoot the evidence – which is why the volume repeatedly pins its stance at grade B: an evidence-grounded warning plus a bet with its falsification condition spelled out.
Cognitive offloading is itself neutral – the extended-mind thesis (Clark & Chalmers 1998) even argues it is the normal state of human cognition; paper, maps, notebooks are all external brains. What actually decides whether it is a scaffold or a crutch is not "AI or no AI" but where it falls on the capability-growth curve and which way the context flows. The defining feature of a scaffold is that it is removable: in the Vygotskian tradition the word (coined by Wood, Bruner & Ross in 1976) means support gradually withdrawn as ability grows, until the learner stands alone. A crutch is the opposite: the longer it is used, the more the supported muscle atrophies, the deeper the dependence, the further off the day of removal. The same AI is a scaffold to someone who derives first and asks it to check; a crutch to someone who asks for the answer outright and never reproduces it. The difference is not in the tool but in the order and the removal path – exactly what SHEET 07's "human-first" direction rule is built to lock in.
所以"断裂点"的精确位置不是某个使用频率的阈值,而是撤除测试失败的那一刻:当你已经无法回答"撤掉这个支撑我还站得住吗",脚手架就已经在你不知情时变成了拐杖。这给了"抵抗"一个可操作的扳手——不是少用 AI,是定期做撤除演练:每隔一段,刻意在无 AI 条件下重做一遍核心任务,把撤除测试当成例行体检(这正是 SHEET 08 仪表盘"迁移"那条信号在测的东西)。
So the precise location of the "break-point" is not a usage-frequency threshold but the moment the removal test fails: when you can no longer answer "can I still stand if this support is taken away," the scaffold has already become a crutch without your noticing. This hands "resistance" an operable wrench – not using AI less, but running periodic removal drills: every so often, deliberately redo the core task with no AI, treating the removal test as a routine check-up (this is exactly what the SHEET 08 dashboard's "transfer" signal measures).
系列接缝
Series seam
本张与工程卷的 trust-but-verify 同构:工程对 AI 产物不信任、刻意保留人的校验环(防代码错误);学习对自身依赖不信任、刻意保留人的认知环(防认知萎缩)。一个守外部正确,一个守内部能力——同一个"刻意保留人类环节"的纪律,作用在两个面。
This sheet is isomorphic to the engineering volume's trust-but-verify: engineering distrusts AI artifacts and deliberately keeps a human verification loop (guarding against code errors); learning distrusts its own dependence and deliberately keeps a human cognitive loop (guarding against cognitive atrophy). One guards external correctness, the other internal capability – the same discipline of "deliberately retaining a human step," on two surfaces.
Load-bearing claim: make step ③ concrete on the learning surface – a personal cognitive scaffold becomes infrastructure: an error-and-reflection log, a same-source diffable personal knowledge base, and deliberately built "desirable difficulty." The key inversion: in a toolchain where convenience is the default, the learner must actively add friction to themselves – the scaffold is not only a guardrail but retained-on-purpose difficulty.
The engineering volume's ③ is "the codebase is queryable, same-source for people and machines." The learning ③ is same-source but goes further: the scaffold is not an answer warehouse but the construction site for rebuilding cognitive structure. State the principle, do not list tools – the real value of a personal knowledge base (whether it is called Markdown or anything else) is not "software features" but three underlying principles:
人机同源:你和 AI 读同一份纯文本——它可被你重读、被 AI 查询,认知留痕不锁进私有黑箱。
Same-source for people and machines: you and AI read the same plain text – re-readable by you, queryable by AI, cognitive traces not locked in a proprietary black box.
Diffable / versioned: you can see how your understanding changed. This is exactly the value of an error-and-reflection log – it records not the answer but where you were once wrong, and why.
刻意保留的难度:脚手架里要内建"合意困难"——间隔重复、交错练习、用提取(自测)代替呈现(重读)。Bjork(1994;2011;2020 JARMAC 9(4):475):那些减慢表面学习的难度,反而提升长期保持与迁移;那些加速表面学习的,常常损害留存。这是承重锚——数十年可复现、与 AI 无关。
Difficulty retained on purpose: the scaffold builds in "desirable difficulty" – spaced repetition, interleaving, retrieval (self-testing) in place of presentation (rereading). Bjork (1994; 2011; 2020 JARMAC 9(4):475): the difficulties that slow apparent learning improve long-term retention and transfer; those that speed apparent learning often harm retention. This is a load-bearing anchor – replicated for decades and independent of AI.
这就是抵抗便利的工程化落地:不靠意志力硬扛,而是把阻力做进工具流。还有一条"速度"公理在底下撑着——某些过程的价值正在于慢:记忆巩固需要时间与睡眠(间隔效应、睡眠期回放,Stickgold/Diekelmann & Born 等,Ⅱ),是 AI 压缩不掉的物理时间。脚手架要顺着这条时间常数设计,而不是对抗它。
This is the engineering instantiation of resisting convenience: not white-knuckling on willpower, but building the friction into the toolflow. A "speed" axiom underwrites this from below – the value of certain processes lies precisely in being slow: memory consolidation needs time and sleep (the spacing effect, sleep-stage replay; Stickgold / Diekelmann & Born et al., grade II evidence), physical time that AI cannot compress. The scaffold should be designed along this time constant, not against it.
FIG. L.5 / 认知脚手架即基设 SCAFFOLD AS INFRASTRUCTURE
看懂:三根承重柱托起一个可拆的认知工地
Read: three load-bearing pillars hold up a removable construction site
三根柱子怎么读:个人认知脚手架不是一个软件,是三根承重柱托起的一块工地:错题反思库(焊测试效应)、人机同源可 diff 的知识库(让理解的变化看得见)、以及第三根最反直觉的柱子——刻意制造的难度。前两根别卷也有近亲,第三根是学习卷独有:当工具栈默认抹平一切阻力,护栏就不够了,你得自己往里加合意困难。三根柱子都坐在一条压不掉的时间常数上。
Reading the three pillars: a personal cognitive scaffold is not a piece of software but a site held up by three pillars: an error-reflection log (welding in the testing effect), a same-source diffable knowledge base (making changes in understanding visible), and the third, most counter-intuitive pillar – deliberately inserted friction. The first two have cousins in the other volumes; the third is unique to learning: when the stack smooths away all friction by default, guardrails are not enough – you must add desirable difficulty yourself. All three rest on a time constant that cannot be compressed.
FIG. L.9 / 卸载的临界点 THE OFFLOADING BREAK-POINT
看懂:同一个 AI,过了临界点就从脚手架翻成拐杖
Read: the same AI flips from scaffold to crutch past one tipping point
沿曲线看:同一个工具,净作用不是恒正的。依赖浅时它把你撑高——脚手架;越过临界点后曲线穿过零线转负——它开始替你做你本该自己长出来的那部分,成了拐杖。难点在于临界点不写在工具上,要靠一个动作去测:可撤除性(沿用 [R17] 维果茨基对脚手架的定义——支持必须可逐步撤回)。今天就把 AI 撤掉,产出只是略降,你还在脚手架一侧;若直接塌掉,你已过点。这也正是 [R19] METR 那条"自感更快、实测更慢"的合成自信之所以危险——它让你以为还在左侧,其实已滑过临界点。
Along the curve: the same tool's net effect is not constantly positive. While reliance is shallow it lifts you – a scaffold; past the break-point the curve crosses zero and turns negative – it starts doing for you the very part you were supposed to grow yourself, becoming a crutch. The hard part is that the break-point is not printed on the tool; you locate it with one act: removability (following [R17] Vygotsky's definition of a scaffold – support that must be withdrawable in steps). Pull the AI today; if output merely dips, you are still on the scaffold side; if it collapses, you have crossed. This is exactly why [R19] METR's "felt faster, measured slower" synthetic confidence is dangerous – it lets you believe you are on the left while you have already slipped past the point.
把抵抗从意志层搬到流程层
Move resistance from the willpower layer to the process layer
The one sentence to remember from this sheet: do not resist convenience by self-discipline; resist it by process. The reason is pragmatic – human willpower is a scarce, depletable resource, while convenience's default gravity is present around the clock, tireless. In a long contest, willpower loses. So the volume repeatedly lands the prescription on "build the scaffold" rather than "be more disciplined," which is at root a structural relocation: move resistance from the willpower layer, where you must refight the battle daily, to the process layer, designed once and then running automatically. The fixed "next review date" in the error-reflection log turns "I must remember to review," which leans on memory, into a process that reminds you on schedule; the interaction template's "self-answer first, demand reasons, leave a reflection" turns "I must resist asking straight for the answer," which leans on restraint, into a process where you ask that way by default. Once the scaffold is built, think-before-asking, spaced review, and trace-return become the path of least resistance, and you no longer need willpower to choose them daily – you just follow the track you laid down. This is why the volume writes no blank check of "try harder," only the engineering prescription of "freeze the effort once into the toolflow": sustainable resistance was never about gritting teeth, it is about design.
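The fixed "next review date" can be computed rather than remembered. The sketch below assumes an expanding-interval schedule; the interval values and the function names are hypothetical choices for illustration, not a schedule the volume prescribes.

```python
from datetime import date, timedelta

# Expanding review intervals in days. These values are an illustrative
# assumption, not a schedule taken from the source.
INTERVALS_DAYS = [1, 3, 7, 14, 30]


def next_review_date(last_review: date, successful_reviews: int) -> date:
    """Return the next fixed review date for a reflection-log entry.

    successful_reviews counts how often the entry has been recalled
    correctly so far; the interval grows with it and is capped at the
    last value in INTERVALS_DAYS.
    """
    if successful_reviews < 0:
        raise ValueError("successful_reviews must be non-negative")
    index = min(successful_reviews, len(INTERVALS_DAYS) - 1)
    return last_review + timedelta(days=INTERVALS_DAYS[index])


def due_entries(entries: dict[str, date], today: date) -> list[str]:
    """List the titles whose review date has arrived, oldest date first."""
    due = [(when, title) for title, when in entries.items() if when <= today]
    return [title for when, title in sorted(due)]
```

Running `due_entries` at the start of a session is what moves "I must remember to review" out of memory and into the process.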
复利:为什么脚手架越用越值钱
Compounding: why the scaffold grows more valuable with use
The compounding nature of the scaffold "as infrastructure" deserves a concrete scene, or "compounding" stays a nice word. Picture your error-reflection log running for a year. In the first month it is just a scatter of errors, value nearly linear – one logged, one gained. But as entries accumulate, a new source of compounding appears: you start noticing certain errors recur in different guises (transfer hooks linking them into patterns), so a new entry is no longer just "plus one" – it activates a reunderstanding of a whole class of error, lifting the value of old entries too. Further on, the whole log becomes a mirror: you can diff your mental model from three or six months ago, seeing some blind spots gone and some stubborn biases still there – and this capacity to "see your own cognition change" is something no single entry can give; it is structural, emergent from the whole. This is exactly where infrastructure differs from a tool: a tool's value is fixed, infrastructure's value grows non-linearly with use. It also explains why the volume keeps stressing "start small but keep going" – compounding's precondition is time, and a grandly built log abandoned in three days never reaches the compounding moment.
合意困难是脚手架里最反直觉的一根柱子
Desirable difficulty is the scaffold's most counter-intuitive pillar
The first two pillars (the reflection log, the same-source knowledge base) have cousins in other methodologies; only the third – deliberately adding difficulty into the toolflow – is unique to the learning volume and the most counter to product intuition. Almost every learning tool's default evolution is "smoother, more effortless, more automatic"; this volume argues the reverse: in a convenience-default stack, guardrails (error-prevention, reminders) are not enough – you must actively design desirable difficulty into the process. This is not an attitude but lands on three concrete substitutions: replace presentation (rereading, being explained to) with retrieval (self-testing, closed-book redo); replace massing (learning it all at once) with spacing (spread-out review); replace blocking (grinding one type) with interleaving (mixing problem types). All three substitutions carry grade-II evidence (Bjork's desirable-difficulty family, Roediger & Karpicke's testing effect) and share one trait: each makes the present harder, slower, less comfortable, yet makes long-term retention and transfer better. Raising this pillar means accepting a design rule against product intuition – a good learning scaffold should deliberately create friction in the right places, not eliminate all friction.
人机同源:为什么是纯文本,而不是更聪明的笔记 App
Same-source: why plain text, not a smarter notes app
"人机同源"这条原理常被误读成"用某个支持 AI 的笔记软件",这把原理降级成了选品。它的真正含义比工具深:你和 AI 应该读写同一份可被你直接重读、且不被任何私有格式锁住的载体。为什么这条是承重的,可以从反面看——如果你的认知留痕沉淀在一个把内容锁进私有黑箱、只能通过它的 AI 接口访问的系统里,那么你与"自己过去的思考"之间就多了一个不受你控制的中介。同源原理要排除的正是这个中介:纯文本(Markdown 之类)的价值不在它"简单",而在它把你和你的认知轨迹之间的距离压到零——没有中介、没有锁、可被任意工具读、可被 diff、可被版本控制。这与工程卷"代码库人机同源"是同一条原理的两个实例:让人和机器在同一份不被锁的事实上协作,而不是各持一份会漂移的副本。
"Same-source for people and machines" is often misread as "use some AI-capable notes app," which demotes a principle into product selection. Its real meaning runs deeper than tools: you and AI should read and write one carrier that you can reread directly and that is not locked behind any proprietary format. Why this is load-bearing shows in the contrary case – if your cognitive traces settle into a system that locks content in a private black box accessible only through its own AI interface, then between you and "your own past thinking" stands a mediator you do not control. The same-source principle exists to exclude exactly that mediator: the value of plain text (Markdown and the like) is not that it is "simple" but that it collapses the distance between you and your cognitive trajectory to zero – no mediator, no lock, readable by any tool, diffable, versionable. This is the same principle as the engineering volume's "same-source codebase," in two instances: let people and machines collaborate on one unlocked source of truth, rather than each holding a copy that drifts.
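"Diffable" is meant literally: two versions of a plain-text note can be compared with nothing beyond a standard library. A minimal sketch using Python's `difflib`; the helper name `diff_understanding` and the sample note are hypothetical, and in practice a version-control tool would do the same job.

```python
import difflib


def diff_understanding(old_note: str, new_note: str, label: str = "note.md") -> str:
    """Return a unified diff between two versions of a plain-text note."""
    lines = difflib.unified_diff(
        old_note.splitlines(),
        new_note.splitlines(),
        fromfile=f"{label} (before)",
        tofile=f"{label} (after)",
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical example: a misconception corrected between two review sessions.
before = "Binary search works on any list.\nIt halves the range each step."
after = "Binary search works only on sorted lists.\nIt halves the range each step."
print(diff_understanding(before, after))
```

The removed line is the record of where the understanding was once wrong, which is the part an answer-only notebook would lose.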
Calling the scaffold "infrastructure" is not rhetoric. Infrastructure has three defining traits, and the cognitive scaffold matches each – which is where the word bears weight. First, it runs in the background without consuming attention: a mature reflection log is like running water, there when opened, not forcing a daily decision when not – which is exactly what "build friction into the process rather than white-knuckle it" means. Second, it compounds: each added error, each added trace lifts the whole base's value non-linearly, because new entries link to old ones as transfer hooks (SHEET 09); this is the same compounding logic by which the org volume makes context a "queryable infrastructure," only here the asset is your own cognitive trajectory. Third, it defines what the default path is: infrastructure's greatest power is changing default behavior rather than relying on willpower – once "self-answer before asking AI" is hardened by the scaffold into the default route, resisting convenience stops being a daily uphill fight and becomes going with the current.
This third point answers a hard problem SHEET 04 left open: if resisting convenience runs purely on willpower, it is doomed to lose to a convenience-default stack – human self-discipline is scarce and depletes. The real claim of scaffold-as-infrastructure is: use engineering to move resistance from the willpower layer to the process layer. You do not decide daily whether to think-before-asking; you build an environment where "think before you ask" is the path of least resistance, then let the environment keep the discipline for you. This is why the volume's prescription lands on "build the scaffold," not "be more disciplined."
检验信号
Test signal
反思库的回流使用率(你真的回去重读自己的错题吗)、迁移测试通过率、以及"主动设阻力"的习惯化——阻力从靠意志变成靠流程。
The return-use rate of the reflection log (do you actually go back and reread your own errors), transfer-test pass rate, and the habituation of "adding friction on purpose" – friction shifting from willpower to process.
LEARN
06
REDRAW · 重画
REDRAW
重画 · 反转收束 + 适用边界
Redraw · Inversion + Scope
知道什么不该让 AI 做
Knowing What Not to Let AI Do
承重命题(反转收束):学习的终极重画不是"学会用 AI",而是"知道什么不该让 AI 做"。划出 AI 止步线:判断力、价值感知、直觉、深度思考、品味——这些一旦外包就会萎缩,且正是下游所有人本判断赖以运转的人类底座。本张也承担适用边界:本卷只谈个人认知的"能力内化",不是组织培训、不是教育体系改革。
Load-bearing claim (the inversion that closes the arc): learning's ultimate redraw is not "learn to use AI" but "know what not to let AI do." Draw the AI stop-line: judgment, value perception, intuition, deep thinking, taste – these atrophy the moment they are outsourced, and they are the human bedrock every downstream human judgment runs on. This sheet also carries the scope boundary: this volume is about individual cognitive "capability internalization," not corporate training, not education-system reform.
Collapse the whole volume into one act: draw the line. On one side, AI does the work and the saved effort is genuinely saved. On the other side is a set of cognition kept un-outsourced on purpose – sharing one trait: outsourcing it atrophies it, and it happens to be the very source of human irreplaceability in the other volumes. This is where this volume seams directly to the downstream:
交给 AI · 可充裕
Hand to AI · abundance-able
查事实、检索、汇总
Looking up facts, retrieval, summarizing
生成初稿、样例、讲解
Generating drafts, examples, explanations
范式内的、可机检的推导
In-paradigm, machine-checkable derivation
重复性、低价值负载的执行
Repetitive, low-value-load execution
留给人 · AI 止步线内
Keep with humans · inside the stop-line
品味——设计卷取用它,本卷养成并守护它
Taste – the design volume consumes it; this volume grows and guards it
价值感知 / 判断力——组织卷人本主线的认知前提
Value perception / judgment – the cognitive precondition of the org volume's human through-line
直觉 / 深度思考——质疑命中率的来源(SHEET 03)
Intuition / deep thinking – the source of challenge hit-rate (SHEET 03)
"什么值得知道 / 做"的构成性价值判断
The constitutive value judgment of "what is worth knowing / doing"
系列接缝 · 品味的两侧
Series seam · the two sides of taste
设计卷讲"品味是稀缺判断、生成默认滑向 slop"——那是品味的应用侧。本卷讲"品味/直觉是不可外包、需主动培养的认知"——这是品味的养成侧。两卷在"品味"一词上对接:学习生产并守护品味,设计应用品味。同理,组织卷要人回归于意义,本卷是那份意义所需判断力的上游保障。
The design volume says "taste is scarce judgment; generation defaults toward slop" – that is the application side of taste. This volume says "taste/intuition is un-outsourceable cognition that must be actively cultivated" – the cultivation side. The two volumes meet on the word "taste": learning produces and guards taste; design applies it. Likewise, the org volume wants people to return to meaning; this volume is the upstream guarantee of the judgment that meaning requires.
适用边界 · 在域
Scope · in-domain
谁适用:把一项技能/认知能力内化进自己的个人——开发者、研究者、设计者、任何要在 AI 协作里保持主导权的人。绿地与改造同适用:绿地从零建脚手架与止步线;改造则先体检已被悄悄外包、却本不该外包的能力,再重建其犯错-纠正循环与合意困难。
Who it fits: individuals internalizing a skill or cognitive capacity – developers, researchers, designers, anyone who must stay in command while collaborating with AI. Both greenfield and transformation apply: greenfield builds the scaffold and stop-line from zero; transformation first audits the capacities that were quietly outsourced but should not have been, then rebuilds their error-correction loop and desirable difficulty.
Who is out of range: organization-scale training systems, school-education reform, assessment and certification design – they have their own constraints (scale, equity, accountability) and cannot simply borrow individual-cognition conclusions. This volume seams to them in one stroke only: if an organization wants "people to return to meaning," it must keep its members' judgment un-atrophied – but how to do that at organizational scale is another book. No faking coverage to look universal.
FIG. L.6 / 止步线判据 THE STOP-LINE CRITERION
看懂:两维定四格,只有一格是真正的危险区
Read: two dimensions, four cells, one true danger zone
定位你的格子:两维——横轴可外包性(AI 能不能稳定代劳)、纵轴构成性(它萎缩会不会伤及你做判断的根基)。四格里只有右上"高可外包 × 高构成"是真正的危险区:诱人(AI 做得了)又致命(外包就萎缩,且萎缩反噬判断)。止步线就是这一格的边界。它会随模型变强向左推(越来越多事进入"AI 做得了"),但判据本身不过时——这就是为什么本卷给判据而不给清单。
Locate your cell: two dimensions – x outsourceability (can AI do it stably), y constitutiveness (does its atrophy harm the root of your judgment). Of the four cells only the top-right, high-outsourceability × high-constitutiveness, is the true danger zone: tempting (AI can) and lethal (outsourcing atrophies it, and the atrophy backfires on judgment). The stop-line is that cell's border. It drifts left as models improve (ever more enters "AI can do it"), but the criterion does not date – which is why the volume gives a criterion, not a list.
把判据跑一遍:三个真实的判断例
Run the criterion: three real worked judgments
判据只有被跑过才算落地,举三个不同结论的例子,演示两维怎么共同定位。例一 · 让 AI 生成项目的样板代码(boilerplate)。可外包性高(AI 稳定做得了),构成性低(样板本身不是你判断力的根基,记不住也无损)——落在"放心外包"格,放手。例二 · 让 AI 替你做一个领域问题的初步分解。可外包性高(AI 能给出像样的分解),但构成性高——问题分解正是 SHEET 03 提问能力的内核,长期外包它,你识别真问题的能力会空。这落在右上"便利陷阱"格:诱人且致命,止步线内。处置:你先自己分解,再让 AI 补你漏掉的角度(流向规约)。例三 · 让 AI 帮你算一道你已经熟练的算术。可外包性高,构成性低(你早已内化、撤除测试稳过),落在"放心外包",且无需愧疚——这正是 Bjork 边界的提醒:对已牢固的能力强行设阻,是不合意的困难。三个例子的差别全在构成性那一维:可外包性几乎都高(这正是 AI 时代的特征),真正决定交不交的,是"它萎缩了会不会伤及你的判断根基"。把任何一项你正在考虑外包的能力丢进这两维,答案就出来了——这也是 INSTRUMENT 11/12 在做的事。
A criterion only lands once it has been run, so here are three examples with different verdicts, showing how the two dimensions jointly locate a task. Case 1 · Having AI generate the project's boilerplate. High outsourceability (AI does it stably), low constitutiveness (boilerplate is not the root of your judgment; not remembering it costs nothing) – lands in "outsource freely," let go. Case 2 · Having AI do a first-pass decomposition of a domain problem. High outsourceability (AI gives a decent decomposition), but high constitutiveness – problem decomposition is the core of the SHEET 03 asking capacity; outsource it long-term and your ability to spot real problems hollows out. This lands in the top-right "convenience trap" cell: tempting and lethal, inside the stop-line. Handling: decompose it yourself first, then have AI add the angles you missed (the flow rule). Case 3 · Having AI do arithmetic you are already fluent in. High outsourceability, low constitutiveness (long internalized, removal test passes easily) – lands in "outsource freely," and without guilt; this is Bjork's boundary reminding you that forcing friction on an already-solid capacity is undesirable difficulty. The difference across the three lies entirely in the constitutiveness dimension: outsourceability is high in almost all (that is the AI era's signature); what actually decides the hand-off is "would its atrophy harm the root of your judgment." Drop any capacity you are considering outsourcing into these two dimensions and the answer falls out – which is exactly what INSTRUMENT 11/12 do.
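The three worked cases can be run through the two dimensions mechanically. This is a sketch under assumptions: the text names only the "outsource freely" and "convenience trap" cells, so the two low-outsourceability labels below are hypothetical, and reducing each dimension to a boolean is a simplification of what is really a judgment call.

```python
from enum import Enum


class Cell(Enum):
    OUTSOURCE_FREELY = "outsource freely"    # named in the text
    CONVENIENCE_TRAP = "convenience trap"    # named in the text
    # The two labels below are hypothetical: the text does not name
    # the low-outsourceability cells.
    KEEP_BY_DEFAULT = "keep by default"
    NOT_YET_DELEGABLE = "not yet delegable"


def locate(outsourceable: bool, constitutive: bool) -> Cell:
    """Place a task in the four-cell grid from its two dimensions."""
    if outsourceable and constitutive:
        return Cell.CONVENIENCE_TRAP     # tempting and lethal: inside the stop-line
    if outsourceable:
        return Cell.OUTSOURCE_FREELY     # AI can do it, and nothing of yours atrophies
    if constitutive:
        return Cell.KEEP_BY_DEFAULT      # AI cannot do it stably, so it stays with you
    return Cell.NOT_YET_DELEGABLE


# The three cases above as (outsourceable, constitutive) pairs.
cases = {
    "generate project boilerplate": (True, False),
    "first-pass problem decomposition": (True, True),
    "arithmetic you are already fluent in": (True, False),
}
for task, dims in cases.items():
    print(f"{task}: {locate(*dims).value}")
```

All three cases share the same first argument; only the second changes the verdict, which is the point the paragraph makes about constitutiveness.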
品味的两侧:学习生产它,设计应用它
Two sides of taste: learning produces it, design applies it
"品味"这个词同时出现在设计卷和本卷,但站在它的两侧,值得把接缝说精确,否则两卷像在重复。设计卷处理的是品味的应用侧:在生成富、slop 默认的环境里,品味是那把稀缺的判断尺,决定从一堆 AI 产出里挑哪个、改哪里、退回哪个——它假设品味已经存在于那个做判断的人身上。本卷处理的是品味的养成侧:品味从哪来?它不是天生的,也不能被外包给 AI 习得——它是大量亲历的"好与不好"判断在某个领域沉淀出的直觉,属于 SHEET 06 止步线内那组"外包就萎缩"的认知。把两卷接起来读,得到一条完整的因果链:学习卷守护并养成品味(不让它被便利侵蚀)→ 设计卷调用品味去对抗 slop。没有前者守住源头,后者就无尺可用——这正是为什么本系列把学习卷定位成"批判性良知":它守的不只是个人能力,是整个下游判断链赖以运转的那个人类底座。
"Taste" appears in both the design volume and this one, but on its two sides; the seam is worth stating precisely, or the volumes look repetitive. The design volume handles taste's application side: in a generation-rich, slop-default environment, taste is the scarce ruler of judgment that decides which of a heap of AI outputs to pick, where to fix, which to reject – it assumes taste already exists in the person judging. This volume handles taste's cultivation side: where does taste come from? It is neither innate nor acquirable by outsourcing to AI – it is the intuition that many lived "good vs not-good" judgments deposit in a domain, one of the SHEET 06 inside-the-stop-line capacities that "atrophy if outsourced." Read the two volumes together and you get a full causal chain: the learning volume guards and cultivates taste (keeping it from being eroded by convenience) → the design volume invokes taste to fight slop. Without the former guarding the source, the latter has no ruler to use – which is why the series positions the learning volume as the "critical conscience": it guards not only individual capability but the human bedrock the entire downstream judgment chain runs on.
止步线不是一条静态名单,是一个判据
The stop-line is not a static list but a criterion
把"什么不该外包"写成一份固定清单,会很快过时——模型一升级,清单就要重划。更耐用的是给出判据,让你自己在任何新能力上现场判断。判据有两维,缺一不可。第一维:可外包性——这件认知 AI 能不能稳定、可机检地代劳?纯属可充裕(查事实、范式内推导)的,外包无损,放手。第二维:构成性——这件认知是不是你做出后续判断的前提?换句话说,它萎缩了,你会不会连"判断 AI 对不对"都失去资格?只有同时高可外包 × 高构成性的那一格,才是止步线要守的危险区——它诱人(AI 做得了)又致命(外包就萎缩、且萎缩了伤及判断根基)。这正是 INSTRUMENT 11 那张四象限图的"便利陷阱"格在 SHEET 06 的理论根。
Writing "what not to outsource" as a fixed list ages fast – one model upgrade and the list must be redrawn. More durable is to give a criterion that lets you judge any new capability on the spot. The criterion has two dimensions, neither dispensable. Dimension one: outsourceability – can AI do this cognition stably and machine-checkably? Purely abundance-able items (looking up facts, in-paradigm derivation) are lossless to outsource; let go. Dimension two: constitutiveness – is this cognition a precondition for your later judgments? Put differently: if it atrophies, do you lose even the standing to judge whether AI is right? Only the cell that is simultaneously high-outsourceability × high-constitutiveness is the danger zone the stop-line guards – tempting (AI can do it) and lethal (outsourcing atrophies it, and the atrophy damages the very root of judgment). This is the SHEET 06 theoretical root of the "convenience trap" cell in INSTRUMENT 11's four-quadrant map.
为什么"构成性"这一维如此关键,而别卷不需要它?因为别卷处理的是外部产物的正确性,学习卷处理的是判断者本身的完整性。一段被 AI 写错的代码可以由测试兜住、由人复核,错误不传染人;但一个被长期外包掉的判断能力,会让那个人连"察觉自己判断错了"的能力一起失去——这是一种会侵蚀检测器的损伤。止步线之所以必须存在,正因为认知是唯一一个外包会反噬外包者元能力的领域。这也解释了为什么本卷的题眼不是"学会用 AI",而是反过来的"知道什么不该让 AI 做":前者优化外部产出,后者守护那个还能做判断的人。
Why is "constitutiveness" so pivotal, and why do the other volumes not need it? Because the others handle the correctness of external artifacts, while the learning volume handles the integrity of the judge themselves. A line of code AI got wrong can be caught by tests and reviewed by a human; the error does not infect the person. But a judgment capacity outsourced for the long run makes that person lose, along with it, the ability to notice that their judgment is wrong – a kind of damage that erodes the detector. The stop-line must exist precisely because cognition is the one domain where outsourcing backfires on the outsourcer's meta-capacity. This also explains why the volume's keystone is not "learn to use AI" but its inverse, "know what not to let AI do": the former optimizes external output, the latter guards the person who can still judge.
LEARN
07
DECISION · 决策
DECISION
决策 · 交出与保留
Decision · Hand off & keep
哪一步交出去,哪一步不交
Which Step to Hand Off, Which to Keep
承重命题:认知影响不是工具的固有属性,取决于你怎么用——结构化用法降卸载、提推理;无引导用法助卸载、不提推理(结构化提示 RCT,MDPI Data 2025)。所以"交不交给 AI"不是一刀切的开关,是一道随任务走的判断。这一张把它做成可照做的分诊:按"这步会不会动到不可外包的认知"决定交还是留,并规定上下文怎么流。
Load-bearing claim: the cognitive effect is not an intrinsic property of the tool but depends on how you use it – structured use reduces offloading and lifts reasoning; unguided use fosters offloading and does not lift reasoning (a structured-prompting RCT, MDPI Data 2025). So "hand it to AI or not" is not a one-size switch but a per-task judgment. This sheet turns it into a copyable triage: decide by "does this step touch un-outsourceable cognition," then specify how context flows.
同一个工具,两种用法,结果相反。这是全卷处方的支点:本卷开的不是"禁用 AI",是"设计用法"。把一项学习任务拆成步骤,每一步过同一道判断——这步动的是"知道"还是"能做"?动了的话,它在不在 AI 止步线内(SHEET 06)?下表是可照做的分诊:
Same tool, two modes of use, opposite results. This is the hinge of the whole volume's prescription: what this volume prescribes is not "ban AI" but "design the use." Decompose a learning task into steps, and run each step through one judgment – does this step touch "knowing that" or "knowing how"? If it touches "knowing how," is it inside the AI stop-line (SHEET 06)? The table below is a copyable triage:
这步交给 AI(替代式安全)
Hand this step to AI (substitution is safe)
取材:检索、汇总、找反例、列出我没想到的角度
Sourcing: retrieval, summarizing, finding counterexamples, listing angles I missed
脚手架:把我的草稿讲解给我听、生成练习题、出测验
Scaffolding: explaining my draft back to me, generating exercises, writing quizzes
校验:在我先产出后,核对我的推导、指出漏洞
Checking: after I produce first, verifying my derivation, flagging gaps
收尾:格式化、改错字、把已成形的判断整理成稿
Finishing: formatting, fixing typos, organizing a settled judgment into prose
这步留给自己(替代式有害)
Keep this step (substitution harms)
第一稿的问题分解——亲手把混沌拆成可攻的子问题
The first-pass problem decomposition – breaking chaos into attackable subproblems by hand
犯错-纠正循环的那个错:错了先自纠,再看 AI(SHEET 02)
The error in the error-correction loop: when wrong, self-correct before consulting AI (SHEET 02)
质疑命中:判断 AI 哪里不对——这靠你亲手趟过的直觉(SHEET 03)
The challenge hit: judging where AI is wrong – riding on the intuition only you ran the loop to earn (SHEET 03)
"什么值得做/知道"的价值判断,与最终的品味定夺(SHEET 06)
The value judgment of "what is worth doing / knowing," and the final call of taste (SHEET 06)
关键不在表本身,在上下文怎么在两栏之间流动——流向反了,安全的用法就翻成替代式。规定一个方向:
The point is not the table but how context flows between the two columns – reverse the flow and a safe use flips into substitution. Specify one direction:
先人后机,不是先机后人。你先产出假设/草稿/分解,再把它喂给 AI 校验补全。次序一旦颠倒(先让 AI 出第一稿,你再改),犯错-纠正循环就被绕过了——MIT 预印本里"拥有感最低、78% 无法复述自己刚写的句子"正是这个次序的产物(Ⅲ 级、样本小,但机理方向一致)。
Human first, machine second – not the reverse. You produce the hypothesis / draft / decomposition first, then feed it to AI to verify and complete. Once the order flips (AI writes the first draft, you edit), the error-correction loop is bypassed – the MIT preprint's "lowest sense of ownership, 78% unable to quote a sentence they had just written" is exactly the product of that order (grade III, small sample, but the mechanistic direction agrees).
校验回流,不是答案回流。从 AI 流回你的,应是"哪里不对/还能怎么想",不是"标准答案"。让它当陪练,不当代笔。结构化提示的可干预杠杆就在这一条:把"先自答、要理由、留反思"写进你和 AI 的交互模板。
What flows back is checking, not the answer. What returns from AI should be "where it is wrong / what else to consider," not "the model answer." Let it spar, not ghostwrite. The intervenable lever of structured prompting lives here: write "answer first, demand reasons, leave a reflection" into your interaction template with AI.
留痕回流到脚手架。每一轮的"我错在哪、AI 指出什么、我改了什么"沉淀进反思库(SHEET 05/09)——上下文不只在你和 AI 之间流,还要流进一个可 diff 的、你日后能回看的载体。
Traces flow back into the scaffold. Each round's "where I was wrong, what AI flagged, what I changed" settles into the reflection log (SHEET 05/09) – context flows not only between you and AI but also into a diffable carrier you can revisit later.
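The trace-return rule can be sketched as a plain-text log entry. The three fields come straight from the rule above ("where I was wrong, what AI flagged, what I changed"); the function names, the Markdown layout, and the log path are hypothetical choices for illustration.

```python
from datetime import date


def format_trace(day: date, task: str, where_wrong: str,
                 ai_flagged: str, what_changed: str) -> str:
    """Render one round of correction as a plain-text, diffable log entry."""
    return (
        f"## {day.isoformat()} {task}\n"
        f"- Where I was wrong: {where_wrong}\n"
        f"- What AI flagged: {ai_flagged}\n"
        f"- What I changed: {what_changed}\n"
    )


def append_trace(log_path: str, entry: str) -> None:
    """Append an entry to the reflection log; the file stays plain text."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(entry + "\n")


entry = format_trace(
    date(2025, 3, 1),
    "binary search variant",
    "assumed the input was already sorted",
    "the precondition is never checked",
    "added a sortedness check before searching",
)
print(entry)
```

Because the entry is appended to one plain-text file, the same log is readable by you, queryable by AI, and diffable under version control.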
把规约跑一遍:同一道题,两种次序,两种结果
Run the rule: one problem, two orders, two outcomes
用一个具体场景把三条流向规约串起来跑一遍,比抽象规则更有说服力。设想你在学一个陌生的算法。违反次序的做法:直接问 AI"这题怎么解",它给出一份完整、漂亮的解法,你读懂了(识别很顺),复制进作业,过了。三天后再遇到变体,你发现自己一片空白——因为你的认知从未上场,AI 的解法只是路过你的眼睛。这就是 MIT 预印本"拥有感最低、复述不出"的临床现场。遵守次序的做法:先给自己十五分钟(延迟求助),写下你的第一版思路,哪怕卡在第三步;然后问 AI"我卡在这里,请只告诉我我的思路哪一步错了,别给完整答案"(校验回流,不是答案回流);拿到反馈后自己改完,再把"我原来错在哪、为什么、这类错还会出现在哪"记进反思库(留痕回流)。同一道题,第二种次序里你的犯错-纠正循环完整跑了一遍,"能做"长出来了一点;第一种次序里只发生了一次信息搬运。三条规约不是教条,是这个差别的可操作版本——它们存在的唯一目的,是确保 AI 进场时循环还在转。
Running the three flow rules through one concrete scene is more convincing than abstract rules. Picture learning an unfamiliar algorithm. The order-violating way: ask AI straight "how do I solve this," it gives a complete, elegant solution, you understand it (recognition is smooth), paste it into the assignment, pass. Three days later a variant appears and you go blank – because your cognition never took the field; AI's solution merely passed your eyes. This is the live scene of the MIT preprint's "lowest ownership, cannot restate." The order-keeping way: give yourself fifteen minutes first (delay help), write your first-pass approach even if stuck at step three; then ask AI "I'm stuck here, tell me only which step of my approach is wrong, don't give the full answer" (verify-return, not answer-return); after the feedback, fix it yourself, then log "where I was wrong, why, where else this class of error shows up" into the reflection log (trace-return). Same problem; in the second order your error-correction loop ran fully and "doing" grew a little; in the first only one information transport happened. The three rules are not dogma but the operable version of this difference – their only purpose is to ensure the loop still turns when AI enters.
分诊表会过时,流向规约不会
The triage table dates; the flow rule does not
本卷刻意把承重压在"流向规约"上,而不是那张"哪步交、哪步留"的分诊表,这个选择背后有方法论上的考量,值得点明。分诊表是当下能力分布的快照:今天该留给自己的"问题分解",明天可能因为模型更强而安全地交出去;表里每一格的归属都会随 AI 能力左移。如果把方法论钉在一张会过时的表上,它的保质期就和某代模型绑死了。流向规约(先人后机、校验回流、留痕回流)则不同——它约束的不是"哪件事",而是"无论哪件事,你的认知都要先动、AI 才进场"这个次序。次序是与具体能力边界正交的:边界怎么移,"先动起来的必须是你"这条都成立,因为它锁的是犯错-纠正循环能否发生的结构条件,而那个结构不随模型变强而改变。这与 FIG L.6 给"判据而非清单"是同一个设计哲学:在一个快速变化的领域里,把方法论钉在不变的结构上,而不是会漂移的边界上。
The volume deliberately rests its load on the "flow rule" rather than the "which step to hand off, which to keep" triage table, and the methodological reason is worth naming. The triage table is a snapshot of the current capability distribution: the "problem decomposition" you should keep today may be safely handed off tomorrow as models strengthen; every cell's assignment drifts left with AI capability. Pin a methodology to a table that dates and its shelf life is bound to one model generation. The flow rule (human-first, verify-return, trace-return) is different – it constrains not "which thing" but "whatever the thing, your cognition must move first, then AI enters," an order. Order is orthogonal to the specific capability boundary: however the boundary moves, "the one to move first must be you" still holds, because it locks the structural condition for whether the error-correction loop can happen, and that structure does not change as models strengthen. This is the same design philosophy as FIG L.6 giving "a criterion, not a list": in a fast-changing domain, pin the methodology to the invariant structure, not the drifting boundary.
三条流向规约,逐条对应一个失败的反面
Three flow rules, each the inverse of a failure
把那三条流向规约(先人后机、校验回流、留痕回流)放在一起,会发现它们不是并列的好习惯,而是各自堵住一个具体的漏。先人后机堵的是"循环被绕过"——一旦 AI 先出第一稿,你的提取与生成就没机会上场,MIT 预印本里"拥有感最低、78% 无法复述自己刚写的句子"正是这个漏的临床表现(Ⅲ 级、样本小,但它精确地操作化了"绕过循环"长什么样)。校验回流堵的是"答案替代反馈"——从 AI 流回的若是标准答案而非"哪里不对",那它就把你本该自己完成的纠正步骤一并代劳了,循环的后半段也丢了。留痕回流堵的是"学习不沉淀"——每一轮的纠错若不流进可 diff 的反思库(SHEET 09),它就只发生在工作记忆里,过几天连同情境一起蒸发,下次重头再来。三条规约合起来,是把一个完整的犯错-纠正循环(FIG L.4)在人机协作的形态下重新拼回完整——缺任何一条,循环就在某处断开。
Put the three flow rules together (human-first, verify-return, trace-return) and you find they are not parallel good habits but each plug a specific leak. Human-first plugs "the loop is bypassed" – once AI drafts first, your retrieval and generation never take the field; the MIT preprint's "lowest ownership, 78% unable to restate a sentence they just wrote" is the clinical sign of this leak (grade III, small sample, but it operationalizes precisely what "bypassing the loop" looks like). Verify-return plugs "the answer replacing feedback" – if what flows back from AI is the model answer rather than "where it is wrong," it has done for you the correction step you were supposed to complete, and the loop's second half is lost too. Trace-return plugs "learning that does not settle" – if each round's correction does not flow into a diffable reflection log (SHEET 09), it happens only in working memory and, days later, evaporates along with its context, so next time you start over. Together the three rules reassemble a complete error-correction loop (FIG L.4) in its human-AI-collaboration form – drop any one and the loop breaks somewhere.
同一个工具,凭什么用法相反、结果相反
Same tool, why use and result diverge
"取决于怎么用"听起来像回避判断,但它有精确的机制支撑,值得拆开。结构化提示 RCT(MDPI Data 2025, n=150)里,两组用的是同一个模型,差别只在交互的形状:无引导组直接索取答案,结构化组被要求先自答、说明理由、写一句反思。结果是认知投入与批判推理的显著分化。机制不神秘——它就是 SHEET 02 的犯错-纠正循环有没有被触发:先自答,你的认知结构先做了一次提取与生成(哪怕错),AI 的反馈才有结构可改;直接索取,你的结构从未上场,AI 的答案只是路过你的眼睛,没有落点。所以"用法"这个词的精确含义是:你的认知有没有在 AI 介入之前先动起来。动了,AI 是校验器,强化循环;没动,AI 是替代器,绕过循环。
"Depends on how you use it" sounds like fence-sitting, but it has a precise mechanism worth unpacking. In the structured-prompting RCT (MDPI Data 2025, n=150), both groups used the same model; the only difference was the shape of the interaction: the unguided group asked straight for answers, the structured group was required to self-answer, state reasons, and write a line of reflection. The result was a significant divergence in cognitive engagement and critical reasoning. The mechanism is no mystery – it is simply whether the SHEET 02 error-correction loop fired: self-answer first and your cognitive structure has already done a retrieval and a generation (even if wrong), so AI's feedback has a structure to revise; ask straight and your structure never took the field, AI's answer merely passes your eyes with nowhere to land. So the precise meaning of "use" is: did your cognition move before AI intervened. If it did, AI is a verifier, reinforcing the loop; if it did not, AI is a substitute, bypassing it.
这把"先人后机"从一句口号变成一个可执行的工程约束:它要锁的不是态度,是上下文的流向。把交互模板固化成"我先给出 X,请你只对 X 做校验/反驳/补漏,不要直接给我标准答案",就是把这条流向写进了和 AI 的契约。这与工程卷把校验做成流程(不是靠人记得复核)、组织卷把上下文做成可查询基设(不是靠人记得同步)是同一个手法:把纪律工程化,不依赖自律。三卷在此处共享同一条工程哲学——能写进流程的,就别留给意志。
This turns "human first, machine second" from a slogan into an executable engineering constraint: what it locks is not attitude but the direction of context flow. Hardening the interaction template into "I give X first; please only verify/refute/patch X, do not hand me the model answer" writes that direction into your contract with AI. This is the same move as the engineering volume making verification a process (not relying on people to remember to review) and the org volume making context a queryable infrastructure (not relying on people to remember to sync): engineer the discipline, do not depend on willpower. The three volumes share one engineering philosophy here – whatever can be written into the process should not be left to the will.
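The hardened template can be sketched in a few lines. This is a minimal illustration, not a prescribed format: the helper name `build_verify_prompt` and the exact wording of the template are assumptions. The one design point it tries to capture is that no prompt can be built until your own attempt exists.

```python
def build_verify_prompt(question: str, my_attempt: str, my_reasons: str) -> str:
    """Build a human-first prompt: the learner's attempt goes in before AI is asked.

    Hypothetical helper; the template wording is an assumption for illustration.
    """
    if not my_attempt.strip():
        # Enforce the flow direction: no self-answer, no prompt.
        raise ValueError("Write your own attempt first; AI only verifies it.")
    return (
        f"Question: {question}\n"
        f"My attempt: {my_attempt.strip()}\n"
        f"My reasons: {my_reasons.strip()}\n"
        "Only verify, refute, or patch my attempt. "
        "Point out where it is wrong and why. "
        "Do not give me the model answer."
    )
```

Because the check lives in the function rather than in your memory, skipping the self-answer fails loudly instead of silently turning AI into a substitute.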
The triage table itself will drift left as models improve – a step that should be kept today may be safely handed off tomorrow. But the flow rule does not age: however the boundary moves, the rule that your cognition must move first still holds. This is why the volume puts the load on the flow direction, not on the table that will date.
怎么知道分诊有效
How to know the triage is right
有效的标志:左栏的步骤越来越自动、省心;右栏的步骤你仍亲手做、且做得动。失效的标志:右栏的东西悄悄漂到左栏,你开始让 AI 替你做分解、替你下价值判断,而你说不清自己为什么同意它。后者就是 SHEET 04 的便利陷阱在单个任务上的显形。
Done right: the left-column steps grow automatic and effortless; the right-column steps you still do by hand, and still can. Done wrong: right-column items quietly drift left – you start letting AI do the decomposition and the value calls, and you cannot say why you agreed with it. The latter is the SHEET 04 convenience trap showing up on a single task.
Load-bearing claim: since "whether it atrophies long-term" has no longitudinal data to settle it (SHEET 04), the only responsible move is to hang leading indicators and monitor yourself – turning the open question into a dashboard you can read. Every indicator here is a quantity already used as a measurement point in existing research (override rate, ownership, transfer, structured-use share), not a metric I invented. A decline is an early warning, not a verdict.
The dashboard has two banks. Leading indicators rising = cognitive command strengthening; counter-indicators rising = dependence quietly growing. Read the two together – any single line will deceive you (this is the metacognitive trap of desirable difficulty: good current performance often means worse learning).
先行 · override rate
Leading · override rate
推翻/修改 AI 输出的频率
Frequency of overriding AI output
你多常反驳 AI、且反驳对。APA 2025(N=1,923)用过这个量:推翻越少 → 自报独立推理信心越低(r=-.61)。下降是头号预警,它意味着你正从"质疑者"滑成"接受者"。
How often you push back on AI, and push back correctly. APA 2025 (N=1,923) used this quantity: less overriding → lower self-reported confidence in independent reasoning (r=-.61). A decline is the prime warning – it means you are sliding from challenger to acceptor.
先行 · 拥有感
Leading · ownership
能否复述自己刚产出的东西
Can you restate what you just produced
合上 AI,你能否凭记忆复述刚"和 AI 一起"得出的结论与理由。MIT 预印本把它操作化:LLM 组拥有感最低、78% 无法引用自己刚写的句子(Ⅲ,样本小)。复述不出,就是没内化、只是搬运。
With AI closed, can you restate from memory the conclusion and reasoning you just reached "with AI." The MIT preprint operationalized this: the LLM group had the lowest ownership, 78% unable to quote a sentence they had just written (III, small sample). Can't restate it = didn't internalize it, only transported it.
先行 · 迁移
Leading · transfer
无 AI 在场的新情境通过率
Pass rate in a new, AI-free situation
能否把"和 AI 协作中学到的"用到一个新情境、且 AI 不在场。这是理解的检验标准(迁移),也是"能做"是否真长出来的唯一硬测(SHEET 02)。只在 AI 在场时表现好 = 长出来的是依赖。
Can you deploy "what you learned collaborating with AI" in a new situation with AI absent. This is the test standard of understanding (transfer) and the only hard test of whether "knowing how" actually grew (SHEET 02). Good performance only when AI is present = what grew is dependence.
先行 · 结构化占比:你的 AI 使用里,"脚手架式"(先自答、要理由、留反思)对"替代式"(直接要答案)的比例。MDPI Data 2025 把这条立为可干预杠杆——它是你唯一能直接拧的旋钮,拧对了能逆转卸载。升 = 好。
Leading · structured-use share: within your AI use, the ratio of "scaffolded" (answer first, demand reasons, leave a reflection) to "substitutive" (ask straight for the answer). MDPI Data 2025 set this as the intervenable lever – it is the one knob you can turn directly, and turned right it can reverse offloading. Rising = good.
Counter · answer-recall share: within your learning time, the share where "look up the answer instantly" crowds out "run the loop." Rising = bad – it is swapping your learning for transport.
反指标 · "不用 AI 我还会吗"答不上:当你对一项已外包的能力答不出"三个月不用 AI 我还做得来吗",这条就触发了。它直接喂给第二仪器(SHEET 15 的"该不该让 AI 做"判定器)。
Counter · failing the "could I still do this without AI" check: when, for an outsourced capacity, you cannot answer "could I still do this after three months without AI," this line trips. It feeds directly into the second instrument (the "should AI do this" test in SHEET 15).
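As a rough sketch of how three of these indicators could be read off a personal log: the `Interaction` record below is a hypothetical schema, not something the cited studies define, and counting per interaction (rather than per unit of learning time) is a simplifying assumption. Transfer is deliberately left out because it needs a separate AI-free test, not a usage log.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged AI interaction (hypothetical schema for self-monitoring)."""
    self_answered_first: bool  # scaffolded use: answered before asking AI
    overrode_ai: bool          # pushed back on or modified the AI output
    restated_later: bool       # could restate the result with AI closed


def share(flags: list[bool]) -> float:
    """Fraction of True values; 0.0 for an empty log."""
    return sum(flags) / len(flags) if flags else 0.0


def dashboard(log: list[Interaction]) -> dict[str, float]:
    """Compute three leading indicators from a log of interactions."""
    return {
        "override_rate": share([i.overrode_ai for i in log]),
        "ownership": share([i.restated_later for i in log]),
        "structured_use_share": share([i.self_answered_first for i in log]),
    }
```

Logged once per period, the three numbers form a series you can compare against earlier periods; a single reading says little on its own.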
烟雾报警器,不是判决书:为什么这个区分救命
A smoke alarm, not a verdict: why this distinction is load-bearing
仪表盘最容易被误用成两个极端,而这两个误用都来自混淆"先行指标"和"因果判决"。第一个误用是过度恐慌:看到 override rate 跌了一点,就断定"我被 AI 弄废了",陷入焦虑甚至全面禁用 AI——这越过了证据(单条信号下降不证明能力萎缩),也掉进了 SHEET 04 那个"把萎缩当已证"的立场陷阱。第二个误用是完全无视:因为"反正没有因果证据",就不监测任何东西,任由依赖悄悄长——这放弃了你作为唯一纵向样本能做的唯一负责任的事。正确的姿态在两者之间,由"烟雾报警器"这个比喻精确锚定:报警器响了,你去查厨房有没有着火,而不是立刻报警拆房子,也不是把报警器拔了图清静。仪表盘的某条信号下行,意味着"该回头对这项能力做一次撤除演练了"——是一个触发调查的提示,不是一个定罪的结论。把这个区分守住,仪表盘才既不会把你吓瘫,也不会被你当噪音忽略。这正是 B 档立场(有据警告,不写成已证)在个人日常监测层最具体的样子。
The dashboard is most easily misused to two extremes, both from conflating "leading indicator" with "causal verdict." The first misuse is over-panic: seeing override rate dip a little and concluding "AI ruined me," spiraling into anxiety or even banning AI wholesale – this overshoots the evidence (one declining signal does not prove atrophy) and falls into the SHEET 04 "treating atrophy as proven" stance trap. The second misuse is total disregard: because "there's no causal evidence anyway," monitoring nothing and letting dependence quietly grow – abandoning the one responsible thing you, as the only longitudinal sample, can do. The right posture is between the two, precisely anchored by the "smoke alarm" metaphor: when the alarm sounds you check whether the kitchen is on fire, you neither immediately tear the house down nor unplug the alarm for peace. A declining signal means "time to run a removal drill on this capacity" – a prompt to investigate, not a verdict to convict. Hold this distinction and the dashboard neither paralyzes you with fear nor gets ignored as noise. This is the most concrete form, at the daily personal-monitoring layer, of the grade-B stance (an evidence-grounded warning, not written as proven).
一个具体读法:两组指标怎样互相校正
A concrete reading: how the two banks correct each other
抽象的"两组一起读"容易空转,给一个具体场景。设想一位开发者过去一个季度大量用 AI 写代码,自我感觉效率飙升、产出更多。只看即时感受,结论是"AI 让我更强了"。现在叠上滞后指标:迁移信号——上周面试白板题(无 AI),他卡在一个三个月前还顺手的算法上;拥有感——他无法向同事口头讲清自己上个月合并的那个 PR 的核心逻辑;override rate——他已经想不起上一次反驳 AI 的建议是什么时候。三条滞后信号一致下行,而即时感受却在上行——这个背离本身就是最强的预警,比任何单条都可靠。它说的不是"他变笨了"(证据不支持这种判决),而是"他的能力正越来越依赖 AI 在场,撤除测试在悄悄失败"。这时正确的动作不是恐慌,是回到 SHEET 06/07 重划那项能力的止步线、并对它做一次撤除演练。
"Read the two banks together" spins idle in the abstract, so here is a concrete scene. Picture a developer who spent the past quarter writing a lot of code with AI, feeling efficiency soaring and output up. By immediate feeling alone, the conclusion is "AI made me stronger." Now overlay the lagged indicators: transfer signal – last week's whiteboard interview (no AI), he stuck on an algorithm that was second nature three months ago; ownership – he could not verbally walk a colleague through the core logic of a PR he merged last month; override rate – he cannot recall the last time he pushed back on AI's suggestion. Three lagged signals declining in unison while the immediate feeling rises – that divergence itself is the strongest warning, more reliable than any single line. It says not "he got dumber" (the evidence does not warrant that verdict) but "his capability is increasingly leaning on AI's presence; the removal test is quietly failing." The right move then is not panic but to return to SHEET 06/07, redraw that capacity's stop-line, and run a removal drill on it.
为什么必须自己挂仪表盘:你是唯一的纵向样本
Why you must hang the dashboard yourself: you are the only longitudinal sample
SHEET 04 repeats that "there is no multi-year longitudinal data" – this is not only an academic gap; it has a direct corollary for the individual: since no one is running a long-term track on you, you are the only longitudinal sample about yourself. Hanging the leading indicators and logging them periodically is, in essence, running an N=1 longitudinal study on yourself. This is not a fallback but the only responsible posture in this evidence state: the macro evidence is too weak for a causal verdict, but strong enough to tell you which quantities to monitor – and those quantities all happen to be readable by you at the individual scale. The dashboard reduces the unanswerable big question "is humanity atrophying" to the small question "is this capacity of mine up or down this quarter," which you can check on a regular schedule.
但读仪表盘本身有一个绕不开的悖论,必须明说,否则会读反。SHEET 14 的速度公理推出一个反直觉推论:当下表现好,常常意味着学得差(合意困难的元认知陷阱)。这意味着任何"当堂感觉"类的指标都会系统性骗你——你今天用 AI 做得飞快、感觉良好,恰恰可能是 override rate 在跌、拥有感在空。所以仪表盘的设计有意偏向滞后、无 AI 在场的量:迁移测试要隔几天、要撤掉 AI 才测;拥有感要合上 AI 凭记忆复述才算。两组指标必须一起读,正是为了让滞后的真信号去校正即时的假流畅——单看任何一条,尤其单看"感觉",都会把你导向便利陷阱。
But reading the dashboard has an unavoidable paradox that must be stated, or it will be read backwards. The SHEET 14 speed axiom yields a counter-intuitive corollary: good current performance often means worse learning (the metacognitive trap of desirable difficulty). This means any "in-the-moment feeling" indicator will systematically deceive you – flying through a task with AI today and feeling great may be exactly when override rate is dropping and ownership is hollowing. So the dashboard deliberately leans on lagged, AI-absent quantities: transfer tests measured days later with AI removed; ownership counted only by restating from memory with AI closed. The two banks must be read together precisely so the lagged true signal corrects the immediate false fluency – reading any single line, especially "feeling," alone steers you into the convenience trap.
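The "read the two banks together" rule can be written down as a small check. This is a sketch under stated assumptions: the function names are hypothetical, each signal is a list of periodic readings, and "trend" is crudely taken as last reading minus first. It flags the divergence described above, where the immediate feeling rises while every lagged, AI-absent signal falls.

```python
def trend(series: list[float]) -> float:
    """Change from first to last reading; positive means rising."""
    return series[-1] - series[0] if len(series) >= 2 else 0.0


def divergence_warning(felt_productivity: list[float],
                       lagged: dict[str, list[float]]) -> bool:
    """True when the immediate feeling rises while every lagged signal falls.

    A True result is a prompt to investigate (run a removal drill),
    not a verdict about ability.
    """
    if not lagged:
        return False
    feeling_up = trend(felt_productivity) > 0
    all_lagged_down = all(trend(s) < 0 for s in lagged.values())
    return feeling_up and all_lagged_down
```

Requiring all lagged signals to fall is a conservative choice; a looser rule (for example, a majority) would trip more often and is equally defensible.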
口径诚实 · 别把仪表盘当判决书
Honest caveat · the dashboard is not a verdict
这些都是先行指标,不是因果证明。它们大多来自横断/自报/小样本研究(SHEET 10 给完整等级),单条下降不证明"能力已经下降",只提示"该回头看一眼这项能力了"。把它当烟雾报警器:响了去查,不是响了就定罪。这正是 B 档立场(有据警告,不写成已证)落到个人监测层的样子。
These are all leading indicators, not causal proof. Most come from cross-sectional / self-report / small-sample studies (SHEET 10 gives the full grading); a single line declining does not prove "you are getting dumber" – it only prompts "time to look back at this capacity." Treat it as a smoke alarm: when it sounds, investigate, do not convict. This is the B-stance (an evidence-grounded warning, not written as proven) brought down to the personal-monitoring layer.
Load-bearing claim: SHEET 05 says "the cognitive scaffold is infrastructure"; this sheet makes it a copyable artifact – a minimal spec for an error-and-reflection log. It is not an answer warehouse but a way to harden decades of replicable cognitive science (desirable difficulty, the testing effect, spacing) into a format you use daily. State the principle, do not teach the tool: plain text, same-source for people and machines, diffable; what software it is called does not matter.
Each entry records one "where I was once wrong." The fields are deliberately minimal – sticking with it matters more than completeness. Every field rests on a principle, not on whim:
The error (how I was thinking then) – record your wrong reasoning, not the right answer. The whole value is here: the log records "where you were wrong and why," letting you diff how your understanding changed. Copying only the answer = one more answer warehouse, zero learning value.
Self-answer before correcting – before consulting AI/the answer, write your own correction attempt. This step welds the testing effect into the flow: active retrieval (even if wrong) beats passive rereading for long-term retention (Roediger & Karpicke 2006, II). Without it, the log degrades into copying answers.
Next review date (spacing) – give each entry a review day, spaced apart (next day → next week → next month). The spacing effect: spaced review retains longer than massed, lasting up to about a year (multi-day-design studies, II). This designs "slow" into the scaffold rather than leaning on memory.
Transfer hook (where else this error appears) – write one line: "this class of error also shows up in what other situations." Interleaved/varied practice improves transfer (the desirable-difficulty family, Bjork, II); this field forces a near-transfer association, linking isolated errors into a pattern.
反模式 · 答案仓库
Anti-pattern · answer warehouse
把 AI 的正确答案剪贴进笔记,越攒越多,从不回看。看着像在学习,实则是在用一个可搜索的外脑替换内化——下次还得问。这正是"知道"的搬运,不是"能做"的生长。
Paste AI's correct answers into notes, pile them ever higher, never revisit. It looks like learning but it is replacing internalization with a searchable external brain – next time you still have to ask. This is the transport of "knowing that," not the growth of "knowing how."
Record the wrong reasoning + self-answer + spaced review + transfer hook. The four fields each weld in a principle (testing effect / spacing / transfer / desirable difficulty). It forces return-use (the SHEET 08 "reflection-log return-use rate" signal measures exactly this), turning each error into a structural rebuild rather than a read-only note.
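One way the four-field entry might look as a data structure, offered as a sketch rather than a required format: the class name, field names, and the `key: value` text layout are all assumptions. The only constraints taken from the spec are the four fields themselves and that the rendered form stays plain text and diffable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    wrong_reasoning: str  # how I was thinking then, not the right answer
    self_answer: str      # my correction attempt before consulting AI
    next_review: date     # spaced review date
    transfer_hook: str    # where else this class of error shows up

    def to_text(self) -> str:
        """Render as plain text so the log stays diffable and tool-agnostic."""
        return (
            f"error: {self.wrong_reasoning}\n"
            f"self-answer: {self.self_answer}\n"
            f"next-review: {self.next_review.isoformat()}\n"
            f"transfer: {self.transfer_hook}\n"
        )
```

Note that there is no field for the correct answer; leaving it out of the schema is what keeps the log from sliding into an answer warehouse.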
最小可行的反思库:今天就能跑起来的版本
The minimal viable log: a version you can run today
The spec's biggest enemy is not incompleteness but never getting started. Many, on hearing "build a cognitive scaffold," begin researching which software to use, how many fields to design, how to organize tags – and then, in the pleasure of building the system, defer "actually logging one error" indefinitely. This is itself a procrastination disguised as diligence (a cousin of the SHEET 15 failure modes). So the volume gives a deliberately crude minimal version you can run today: one plain-text file, three entries a week, each with only two lines – "how I was thinking (the wrong reasoning)" and "my correction attempt before seeing the answer (self-answer first)." That is all. No tags, no categories, no fancy review algorithm. This minimal version already runs the testing effect (self-answer = active retrieval) and trace-keeping (the wrong reasoning logged and diffable), and these two are the most load-bearing in the whole spec. Once the habit is stable – say, four unbroken weeks – consider adding "next review date" and "transfer hook." A scaffold's entire value is in being used continuously, and a crude text file run for a year beats a beautifully designed system opened once and never again. Starting small enough that failure is impossible is this volume's only answer to "building the log as procrastination."
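The minimal version is small enough to fit in one function. The sketch below assumes a hypothetical helper `append_entry` and an arbitrary two-line layout; any text editor does the same job, and the code is only meant to show how little is needed.

```python
from pathlib import Path


def append_entry(log_path: Path, wrong_reasoning: str, self_answer: str) -> None:
    """Append one two-line entry to a plain-text log, creating the file if needed."""
    if not wrong_reasoning.strip() or not self_answer.strip():
        # Both lines are the whole point of the minimal version.
        raise ValueError("Both lines are required: the wrong reasoning and your own attempt.")
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"error: {wrong_reasoning.strip()}\n")
        f.write(f"self-answer: {self_answer.strip()}\n\n")
```

Appending rather than rewriting keeps earlier entries untouched, so the file's history doubles as the diffable trace.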
间隔复看那一栏,凭什么比"多记几条"更重要
Why the spaced-review field beats "log more entries"
Many turn the reflection log into an ever-growing favorites folder they never revisit – losing exactly its most load-bearing field. The spec's "next review date" looks trivial but actually welds two of the sturdiest cognitive-science findings into the process. First, the spacing effect: spaced review (next day → next week → next month) retains longer than massed, with effects lasting about a year (multi-day-design studies, II). Second, the testing effect: on review, do not reread the error entry but cover it and redo it from memory, making each review itself a retrieval rep. Together they decide that the log's value lies not in entry count but in return frequency – one error retrieved on schedule five times beats fifty entries logged and never reopened. This is why SHEET 08 makes "reflection-log return-use rate" rather than "log entry count" a test signal: the former measures whether the loop actually turns, the latter only measures a collecting habit. A common failure is turning log-building into hoarding, and hoarding brings the illusion of "studying hard" (another variant of the convenience trap) while never triggering a single reconsolidation.
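The "next day → next week → next month" schedule can be sketched as a lookup. The specific gaps of 1, 7, and 30 days are an assumed reading of that phrase, and `next_review` is a hypothetical helper; the research cited supports spacing in general, not these exact numbers.

```python
from datetime import date, timedelta

# Assumed gaps for "next day -> next week -> next month"; tune to taste.
REVIEW_GAPS_DAYS = [1, 7, 30]


def next_review(last_review: date, reviews_done: int) -> date:
    """Return the next review date given how many reviews are already done.

    After the schedule runs out, the last (longest) gap keeps repeating.
    """
    index = min(max(reviews_done, 0), len(REVIEW_GAPS_DAYS) - 1)
    return last_review + timedelta(days=REVIEW_GAPS_DAYS[index])
```

On each review date, cover the entry and redo it from memory before reading it, so the review itself counts as a retrieval rep.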
The first field (record the wrong reasoning, not the right answer) is the most counter-intuitive and most load-bearing line in the whole spec, and worth spelling out on its own. Intuition pushes you to record the correct answer – it is clean, reusable, and looks like a learning result. But recording the correct answer is exactly the answer-warehouse anti-pattern: it replaces your cognitive trace with a searchable external copy, and next time you still have to look it up. Recording the wrong reasoning does the opposite: it preserves a snapshot of "where my mental model went off." The value has three layers. First, diagnosis: wrong reasoning exposes your model's defect, of which the correct answer knows nothing – ten people can get the same problem wrong in ten different places, and only logging where you specifically erred makes review targeted. Second, the diff: revisiting the same class of error three months later, you can see directly whether your model has changed – the only visible evidence of cognitive-structure change, and the artifact-layer form of SHEET 05's "diffable" pillar. Third, metacognitive calibration: repeatedly facing your own error patterns gradually corrects the systematic gap between "I thought I knew it" and "I actually know it" – the very overestimation that is the desirable-difficulty family's most dangerous trap.
把四个字段连起来读,它其实是把一个完整的犯错-纠正循环(SHEET 02)冻结成可回看的工件:错的内容(②犯错)+ 先自答(③自纠,焊测试效应)+ 间隔复看(顺速度公理,焊间隔效应)+ 迁移钩子(把孤立的错连成模式,焊交错/迁移)。它不是笔记格式,是把循环结构外化成基设——这样每个错都被强制走完一遍循环,而不是被一个 AI 答案当场抹平。
Read the four fields together and it is really a complete error-correction loop (SHEET 02) frozen into a revisitable artifact: the error (② erring) + self-answer (③ self-correction, welding in the testing effect) + spaced review (along the speed axiom, welding in the spacing effect) + transfer hook (linking isolated errors into a pattern, welding in interleaving/transfer). It is not a note format but the externalization of the loop structure into infrastructure – so each error is forced through one full loop rather than being smoothed over on the spot by an AI answer.
起步小到不会失败
Start small enough not to fail
别一上来建宏大系统,那是另一种拖延。一个纯文本文件、一周三条、只填"错的内容 + 先自答"两个字段,就已经在跑测试效应了。其余字段等习惯长稳再加。脚手架的价值在被用,不在被设计得漂亮。
Do not start by building a grand system – that is another form of procrastination. One plain-text file, three entries a week, only the "error" and "self-answer" fields filled, already runs the testing effect. Add the rest once the habit is stable. A scaffold's value is in being used, not in being designed beautifully.
Load-bearing claim: this volume's most counter-intuitive claim (cognition may atrophy) rests on a body of evidence, and not one piece of it proves "long-term causal atrophy." For the stance to be credible, each piece's grade, caveat, and what it cannot support must go on the table – this sheet is the volume's evidence ledger, keeping SHEET 04 from being read as subjective assertion. Where a hard conclusion is warranted, draw it; where it is only a bet, say so.
先把能下硬结论的放上来——这些是数十年可复现、且不依赖任何 AI 研究的认知科学,是全卷最稳的地基:
First the pieces that warrant a hard conclusion – decades-replicable cognitive science that depends on no AI study, the volume's firmest ground:
合意困难(Ⅱ,证据账):R. & E. Bjork(1994;2011;2020 JARMAC 9(4):475)——加速表观学习的条件常损害长期留存与迁移;放慢的困难(间隔、交错、变式、用测试代替呈现)反而提升。能下硬结论:AI 抹平困难 = 抹平合意困难。边界:困难只对有基础能成功响应者"合意",否则只是绊脚。
Desirable difficulty (II, evidence ledger): R. & E. Bjork (1994; 2011; 2020 JARMAC 9(4):475) – conditions that speed apparent learning often harm long-term retention and transfer; slowing difficulties (spacing, interleaving, variation, testing in place of presentation) improve them. A hard conclusion holds: AI smoothing away difficulty = smoothing away desirable difficulty. Boundary: difficulty is "desirable" only for those with enough background to respond successfully, otherwise it is just an obstacle.
The testing effect (II, evidence ledger): Roediger & Karpicke 2006 – using testing as a learning event beats rereading for long-term recall; rereading looks better short-term (another metacognitive trap). AI maximizes the ease of the "reread" mode, landing on that trap.
使用方式决定影响(Ⅱ 实验,证据账):结构化提示 RCT(MDPI Data 2025, 10(11):172,n=150)逆转卸载;脚手架式 AI 辅导 RCT 给 +0.73~1.3 SD 正效应。最强的因果证据反而是正向的——所以处方是"设计合意困难",不是"禁用 AI"。
Use determines the effect (II experiment, evidence ledger): a structured-prompting RCT (MDPI Data 2025, 10(11):172, n=150) reverses offloading; scaffolded AI-tutoring RCTs give +0.73 to 1.3 SD positive effects. The strongest causal evidence is positive – so the prescription is "design desirable difficulty," not "ban AI."
接着是只够当赌注的——相关/自报/单次/小样本,必须带口径,不能当因果用(探索账):
Then the pieces that warrant only a bet – correlational / self-report / single-shot / small-sample, which must carry caveats and cannot be used as causal (the exploration ledger):
理论锚 · Ⅱ
Theory anchor · II
认知卸载综述
The offloading review
Risko & Gilbert 2016(Trends Cogn Sci 20(9):676):把记忆/计算/导航外包给外部工具,概念真实牢固。但它界定的是行为,不证"卸载 → 内在萎缩"。对立框架必须并列:延伸心智(Clark & Chalmers 1998),卸载可能是认知边界外移,不是衰退。
Risko & Gilbert 2016 (Trends Cogn Sci 20(9):676): outsourcing memory/computation/navigation to external tools – the concept is real and solid. But it defines a behavior, not proof of "offloading → inner atrophy." The rival frame must stand alongside: the extended mind (Clark & Chalmers 1998) – offloading may be the cognitive boundary moving outward, not decline.
先例 · 相关非因果
Precedent · correlational
GPS / 搜索引擎
GPS / search engines
GPS-海马:Maguire 2000、Dahmani & Bohbot 2020(Sci Rep):习惯性 GPS 与海马灰质更少相关,方向未定(是 GPS 致萎缩,还是海马强者更爱空间策略?)。Google 效应:Sparrow 2011(Science 333:776):预期可再取则记信息少、记"去哪找"多,但后续复制未稳健复现。计算器→心算属常见论断、缺一手实证强锚,不宜泛引。
GPS-hippocampus: Maguire 2000, Dahmani & Bohbot 2020 (Sci Rep) – habitual GPS correlates with less hippocampal grey matter, direction undetermined (does GPS cause atrophy, or do strong-hippocampus people prefer spatial strategies?). Google effect: Sparrow 2011 (Science 333:776) – expecting re-access, people recall info less and "where to find it" more, but it has not replicated robustly. Calculator → mental arithmetic is a common claim lacking a strong first-hand empirical anchor; do not over-cite.
AI 实证 · Ⅲ / 自报
AI empirics · III / self-report
2024–2026 的一批
The 2024–2026 batch
Gerlich 2025(横断相关,反向因果未排除);Kosmyna/MIT 2025(arXiv:2506.08872,Ⅲ 预印本,N→18,作者明确反对使用贬义化表述);Lee/Microsoft 2025(CHI,测自报努力非能力,批判性思维转移到核验);Sailer 2024(少见的因果实验:LLM 组认知负荷更低、但论证质量更差:"省力≠学得好");APA 2025 白纸黑字"描述性、不支持因果"。共同口径:无一能证长期因果萎缩。
Gerlich 2025 (cross-sectional correlational, reverse causation unexcluded); Kosmyna/MIT 2025 (arXiv:2506.08872, III preprint, N→18, authors asked that "dumber" not be used); Lee/Microsoft 2025 (CHI, measures self-reported effort not ability; critical thinking shifts to verification); Sailer 2024 (a rare causal experiment: the LLM group had lower cognitive load but worse argument quality – "less effort ≠ learned better"); APA 2025 states in black and white "descriptive, does not support causal inference." Shared caveat: none proves long-term causal atrophy.
延伸心智:必须正面对待的对立框架
The extended mind: the rival frame to face head-on
A book arguing "offloading may harm" must face, head-on, the rival frame of the extended mind (Clark & Chalmers 1998). This frame holds that external tools (paper, notebooks, even AI) can be seen as part of the cognitive system, not its opposite; when you use a notebook to remember something, your "memory" boundary has merely moved outward, not "atrophied." If this frame holds, then outsourcing memory/computation to AI is not capacity loss but a natural expansion of the cognitive boundary – and the volume's dissent loses its footing. The honest move is not to skirt it but to say clearly why the volume's claim still stands before it: the extended mind describes the coupled person-tool system at steady state; it does not deny one thing – when that external component is removed, whether the person who lost it can still operate independently. What the volume guards is precisely this "can still operate independently after removal" capacity (the removal test and transfer signal both measure it). Put differently: the extended mind says "using tools is no shame, it is the cognitive norm," and the volume agrees; the volume only adds one clause – "but you must ensure you can still stand with the tool removed, or the outward-moved boundary becomes an unrecoverable dependence." The two frames do not actually conflict – the extended mind describes how capacity is distributed; the volume cares whether that distribution is reversible. Stating the rival frame at this precision is how the dissent, rather than ignoring the opposing side, becomes the narrower, sturdier claim that still holds after absorbing it.
五级证据等级:为什么本卷处处标注 Ⅰ–Ⅴ
Five evidence grades: why the volume tags I–V everywhere
The I–V tags the volume places beside each citation are not academic decoration but a railing against argumentative slippage, and the scheme is worth spelling out. I = peer-reviewed and independently replicated, the hardest; II = controlled experiment or measured design (RCTs, multi-day spacing designs), supporting causal or quasi-causal conclusions; III = structured case or single-shot experiment, correlational study, strongly directional but not causal; IV = first-hand practitioner observation, real but subjective; V = argument or projection, a product of logic not empirical evidence. The grading's use is to pin down each piece's "standing": desirable difficulty and the testing effect are II, so the volume dares to draw hard conclusions from them; the atrophy-side AI empirics are mostly III or even III↓ (cross-sectional/self-report/small-sample/un-reviewed), so they can only hang leading indicators and place bets, not deliver verdicts. Tagging the grade is itself a form of honesty – it lets the reader see at any moment how hard the ground under a claim is, and stops the author from quietly using a grade-III correlation as grade-I causation. SHEET 04's contrarian stance is credible not because its evidence is hard (its core claim's evidence is precisely soft) but because it honestly marks how soft it is and accordingly downgrades the claim to "a bet" rather than "a verdict."
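The grading scheme can be made mechanical with a small enum. This is a sketch: the mapping for grades I to III follows the paragraph above, while the wording returned for grades IV and V is an assumption added for completeness, and `allowed_claim` is a hypothetical name.

```python
from enum import IntEnum


class Grade(IntEnum):
    I = 1    # peer-reviewed and independently replicated
    II = 2   # controlled experiment or measured design
    III = 3  # single-shot experiment, structured case, or correlational study
    IV = 4   # first-hand practitioner observation
    V = 5    # argument or projection


def allowed_claim(grade: Grade) -> str:
    """Map a grade to the strongest kind of claim it is allowed to carry."""
    if grade <= Grade.II:
        return "hard conclusion"
    if grade == Grade.III:
        return "leading indicator / bet"
    # Assumed wording for IV and V; the text only says they are softer still.
    return "observation or argument only"
```

Checking a draft sentence against `allowed_claim` before it is written is one way to keep a grade-III correlation from being phrased as a grade-I causal result.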
FIG. L.8 / 证据账本 THE EVIDENCE LEDGER
看懂:把本卷主要主张按证据级排开,硬的归硬、软的认软
Read: the volume's main claims sorted by evidence grade – hard ones hard, soft ones owned as soft
账本怎么看:每一行是本卷一条承重主张,圆点落在它真实的证据级上。注意分布的形状:被当作机理引用的那些(间隔、测试效应、睡眠巩固、练习量被高估)都坐在 Ⅰ–Ⅱ 的硬端,可复现、可被独立检验;越往下、越靠工程处方(脚手架、止步线),越是 Ⅲ–Ⅳ 的一手实践,硬度递减。而全卷标题级的命题"认知会萎缩"被诚实地放在最右的 Ⅴ:现有证据多是相关性的(卸载与某些测量的能力下降同时出现),尚未排净"本就更弱的人更爱卸载"这类反向因果。所以本卷把它降格为"一个值得对冲的赌注",而不是判决。这张图本身就是 SHEET 15 的论证:诚实不是把软证据说硬,是把它标软、并据此调低主张的语气。
Reading the ledger: each row is one load-bearing claim; the dot sits at its true evidence grade. Mind the shape of the distribution: the findings cited as mechanism (spacing, testing effect, sleep consolidation, overstated practice-volume) all sit at the hard Ⅰ–Ⅱ end – replicated and independently checkable; the further down toward engineering prescriptions (scaffold, stop-line), the more it is Ⅲ–Ⅳ first-hand practice, with hardness tapering. The volume's headline claim, "cognition atrophies," is placed honestly at the far-right Ⅴ: today's evidence is mostly correlational (offloading co-occurs with declines on some measures) and has not ruled out reverse causation such as "already-weaker people offload more." So the volume downgrades it to "a bet worth hedging," not a verdict. The figure is itself SHEET 15's argument: honesty is not calling soft evidence hard, it is marking it soft and lowering the claim's tone to match.
为什么一个唱反调的命题,反而要把反证摆到最显眼处
Why a contrarian claim must put its counter-evidence most visibly
直觉上,一个要论证"认知可能萎缩"的章节,应该把支持的证据堆满、把反证轻描淡写。本卷反着做——把最强的反证(脚手架式 AI 辅导 +0.73~1.3 SD 的正效应、再分配假说、Sparrow 复制失败)放在和担忧侧同等显眼的位置。这不是谦虚,是论证策略上的硬要求。原因有二。其一,本卷的主张本来就是 B 档而非 A 档:它要立的从来不是"萎缩已发生",而是"在未决窗口期买一份对冲保险"——这个主张的力量恰恰来自承认反证的存在;藏起反证,主张就从"诚实的赌注"滑成"选择性叙事",反而更弱。其二,读者的信任是这卷唯一的承重墙:一个唱反调的方法论,最容易被指控为"危言耸听";唯一的免疫方式是把自己最不利的证据先摆出来、并说明它为什么仍不足以推翻赌注。把反证摆到最显眼处,是这卷取信于人的方式,也是它和那种"AI 已经造成认知能力下降"的廉价警示划清界限的地方。
Intuitively, a chapter arguing "cognition may atrophy" should pile up supporting evidence and downplay the counter-evidence. This volume does the reverse – it places the strongest counter-evidence (scaffolded AI-tutoring's +0.73 to 1.3 SD positive effect, the redistribution hypothesis, Sparrow's failed replication) as visibly as the concern side. This is not modesty but a hard requirement of argument strategy, for two reasons. First, the volume's claim is grade B, not grade A from the start: what it asserts was never "atrophy has happened" but "buy hedging insurance during the unsettled window" – and that claim's force comes precisely from acknowledging the counter-evidence; hide it and the claim slides from "an honest bet" into "selective narrative," which is weaker. Second, the reader's trust is this volume's only load-bearing wall: a contrarian methodology is most easily charged with "scaremongering"; the only immunity is to lay out its most unfavorable evidence first and explain why it still falls short of overturning the bet. Putting the counter-evidence most visibly is how this volume earns trust, and where it draws the line against the cheap doom-saying of "AI makes you stupid."
That principle lands on one ledger: arrange every study the volume cites line by line, tagging four things per row – grade (I–V), caveat, the conclusion it supports, and the most-easily-abused conclusion it cannot support. That last column is the table's real purpose: it pins each piece's "range" so it cannot be carried out of range and used as causal.
证据双账本 · 能下硬结论的 vs 只够当赌注的DUAL EVIDENCE LEDGER · hard conclusions vs bets only
研究 / 锚
等级 · 口径
它说了什么
它不能说什么
Study / anchor
Grade · caveat
What it says
What it cannot say
Bjork & Bjork1994 / 2011 / 2020 JARMAC 9(4):475
Ⅱ证据账 · 数十年可复现ledger · decades-replicable
加速表观学习的条件常损害长期留存与迁移;放慢的合意困难反而提升。AI 抹平困难 = 抹平合意困难。Conditions that speed apparent learning often harm long-term retention/transfer; slowing desirable difficulties improve them. AI smoothing difficulty = smoothing desirable difficulty.
不能推出"任何难度都好"——困难只对有基础能成功响应者合意。Cannot imply "all difficulty is good" – desirable only for those able to respond successfully.
Roediger & Karpicke2006 · testing effect
Ⅱ证据账ledger
用测试当学习事件,长期回忆胜过重读;重读短期看着更好(元认知陷阱)。AI 把"重读式"轻松最大化。Testing as a learning event beats rereading for long-term recall; rereading looks better short-term (a metacognitive trap). AI maximizes the "reread" ease.
不能推出"测试越多越好"或可替代理解;只说提取>呈现。Cannot imply "more testing is always better" or that it replaces understanding; only retrieval > presentation.
脚手架式 AI 辅导 RCTScaffolded AI-tutoring RCT2024–25 · +0.73–1.3 SD
Ⅱ实验 · 最强因果experiment · strongest causal
结构化用法下 AI 显著提升学习——本卷最强的因果证据反指向上。处方=设计合意困难,非禁用。Under structured use AI significantly lifts learning – the volume's strongest causal evidence points up. Prescription = design desirable difficulty, not ban.
不能推出"无引导用 AI 也好";增益依赖脚手架设计。Cannot imply "unguided AI is fine too"; the gain depends on scaffold design.
Risko & Gilbert2016 · TiCS 20(9):676
Ⅱ理论锚 · 综述theory anchor · review
界定认知卸载:把记忆/计算/导航外包给外部工具——概念真实牢固。Defines cognitive offloading: outsourcing memory/computation/navigation to external tools – concept real and solid.
界定的是行为,不证"卸载→内在萎缩"。对立框架:延伸心智(Clark & Chalmers 1998)。Defines a behavior, not proof of "offloading → inner atrophy." Rival: extended mind (Clark & Chalmers 1998).
Sparrow et al.2011 · Science 333:776
Ⅲ↓探索账 · 复制存疑exploration · replication doubtful
"Google 效应":预期可再取用,则记信息少、记"去哪找"多。"The Google effect": expecting re-access, people recall info less and "where to find it" more.
后续未稳健复现,等级应下调;不能当作卸载致损的硬证。Has not replicated robustly; grade should be lowered; not hard proof of offloading harm.
习惯性 GPS 与海马灰质更少相关;空间策略使用者灰质更多。Habitual GPS correlates with less hippocampal grey matter; spatial-strategy users have more.
方向未定:GPS 致萎缩,还是海马强者更爱空间策略?相关≠因果。Direction undetermined: does GPS cause atrophy, or do strong-hippocampus people prefer spatial strategy? Correlation ≠ causation.
Gerlich 2025 · Societies 15(1):6 · N=666
Ⅲ 探索账 · 横断相关 / exploration · cross-sectional
频繁用 AI 与批判性思维显著负相关,由卸载中介(总效应 b=−0.42)。Frequent AI use negatively correlates with critical thinking, mediated by offloading (total effect b=−0.42).
作者自承不能证因果、无纵向数据、反向因果未排除。Author concedes no causation, no longitudinal data, reverse causation unexcluded.
Kosmyna / MIT 2025 · arXiv:2506.08872 · N→18
Ⅲ↓ 探索账 · 预印本未评审 / exploration · preprint, un-reviewed
EEG 示"认知负债"累积;LLM 组拥有感最低、78% 无法引用自己刚写的句子。EEG shows accruing "cognitive debt"; the LLM group had lowest ownership, 78% unable to quote a sentence they just wrote.
样本极小、未评审;作者明确反对使用贬义化表述。仅作机理方向,非结论。Tiny sample, un-reviewed; authors asked that "dumber" not be used. Direction only, not a conclusion.
APA 2025 · N=1,923
Ⅲ 探索账 · 描述性自报 / exploration · descriptive self-report
提示依赖越高、推翻 AI 越少 → 自报独立推理信心越低(r=−.61)。Higher prompt-dependence, less overriding → lower self-reported confidence in independent reasoning (r=−.61).
作者白纸黑字:"描述性,不支持因果……不暗示认知损害或神经改变。"Authors state in writing: "descriptive, does not support causal inference … no implied cognitive damage or neural change."
There is one iron rule for reading this table: you have not finished a row until you read its last column. Any paraphrase that cites only the first three columns and cuts "what it cannot say" is smuggling a correlational finding into a causal assertion – which is exactly the failure mode this volume itself must most guard against (SHEET 15, last item).
读这张的方式
How to read this sheet
上半"能做"层的证据是地基,敢断言;下半"萎缩"侧的证据是赌注,只挂先行指标(SHEET 08)。把两半混为一谈是本卷最危险的失误——要么把扎实的认知科学拖下水,要么把未决悬案抬成定论。证据等级(Ⅰ–Ⅴ)就是用来防这件事的栏杆。
The top half (the "doing" layer) is the foundation and can be asserted; the bottom half (the "atrophy" side) is a bet and carries only leading indicators (SHEET 08). Conflating the two halves is this volume's most dangerous error – it either drags solid cognitive science into the mud or elevates an open question into a settled one. The evidence grades (I–V) are the railing built to prevent exactly that.
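One way to make the reading rule mechanical is to store each row so that it cannot be cited without its final column. The sketch below is illustrative only: the `EvidenceRow` type, its field names, and the `cite` method are assumptions introduced here, not part of this volume's instruments. The sample row restates the Gerlich 2025 entry from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRow:
    """One row of the evidence table (hypothetical structure)."""
    source: str      # column 1: who / when
    grade: str       # column 2: evidence grade, e.g. "III"
    says: str        # column 3: what the study reports
    cannot_say: str  # column 4: what it must not be stretched to mean

    def __post_init__(self) -> None:
        # Reject rows with an empty caveat: a row without its limit is
        # exactly the paraphrase the reading rule forbids.
        if not self.cannot_say.strip():
            raise ValueError(f"{self.source}: 'cannot_say' must not be empty")

    def cite(self) -> str:
        # The only rendering offered always carries the caveat.
        return (f"{self.source} [grade {self.grade}]: {self.says} "
                f"Limit: {self.cannot_say}")


gerlich = EvidenceRow(
    source="Gerlich 2025",
    grade="III",
    says="Frequent AI use correlates negatively with critical thinking.",
    cannot_say="Cross-sectional; no causal claim, reverse causation unexcluded.",
)
print(gerlich.cite())
```

The design choice is that there is no accessor that returns the finding alone, so the caveat travels with every quotation.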
LEARN 11
CRITIQUE · 批判
批判 · 旧结构的失效
Critique · where the old structures fail
讲授-考试这套结构,本来就漏,AI 只是把漏点照亮
The lecture-and-test machine already leaked; AI just lit the leaks up
承重命题:传统教育的几个支柱结构——讲授-考试、覆盖优先、"知识传递"的银行存储模型、刷绩点/追文凭、标准化考试、考前突击——在 AI 之前就被认知科学判过"次优",只是低获取成本把代价藏住了。AI 把获取成本砍到零,等于撤掉了遮羞布:每一根支柱原有的漏洞,现在都从"低效"恶化成"空心化"。逐根点名,给机理,不靠情绪。
Load-bearing claim: several pillar structures of traditional education – lecture-and-test, coverage-over-mastery, the "knowledge transfer" banking model, grade-and-credential chasing, standardized testing, exam cramming – were already judged sub-optimal by cognitive science long before AI; cheap acquisition merely hid the cost. AI cuts acquisition cost to zero, pulling the cover off: every pillar's pre-existing flaw now degrades from "inefficient" into "hollowed-out." Named one by one, with mechanism, not vibes.
先承认对手最强的版本:这些结构当年解的是真瓶颈
First grant the opponent's strongest version: these structures once solved real bottlenecks
批判前先公正。讲授制不是愚蠢的设计——在一本书要手抄、一位专家一生只能面授几百人的年代,把一位懂行的人放在台上、让上百人同时听,是当时信息分发问题的最优解。标准化考试也不是凭空作恶——在需要给成千上万陌生人一个可比较、抗裙带关系的能力凭证时,统一题面、统一评分是一项真实的公平性发明。覆盖优先(一学期讲完整本教材)回应的是"课时稀缺、内容必须排进有限窗口"的约束。换句话说,这些结构都是某个真实约束下的合理工程。本卷的批判不是说它们当年错了,而是说:它们赖以成立的那个约束(信息稀缺、分发昂贵、评估必须规模化)正在被 AI 抽走,而结构本身没跟着变。当承重的约束消失,原本被它正当化的代价就裸露出来——下面逐根拆。
Be fair before you critique. The lecture was not a stupid design – in an age when a book had to be hand-copied and one expert could teach at most a few hundred people face-to-face in a lifetime, putting one knowledgeable person on a stage to be heard by a hundred at once was the optimal solution to the information-distribution problem of its day. Standardized testing was not gratuitous cruelty either – when thousands of strangers needed a comparable, nepotism-resistant credential of ability, a uniform paper and uniform scoring was a real fairness invention. Coverage-over-mastery (finishing the whole textbook in a term) answered the constraint that "class time is scarce and content must be packed into a finite window." In other words, each structure was reasonable engineering under some real constraint. This volume's critique is not that they were wrong then, but that the constraint they rested on – information scarcity, expensive distribution, evaluation that had to scale – is being drained away by AI, while the structures themselves have not changed. When the load-bearing constraint vanishes, the cost it once justified is laid bare. Pulled apart below, one pillar at a time.
FIG. L.12 / 旧结构的两轴诊断 / TWO-AXIS DIAGNOSIS OF THE OLD STRUCTURES
看懂:横轴=当年就有多漏,纵轴=AI 充裕把漏点放大多少
Read: x = how leaky it already was, y = how far AI abundance amplifies the leak
怎么读这张:越靠右,这结构在 AI 之前就越被认知科学判为次优;越靠上,AI 把信息获取砍到零后,它的漏点被放大得越狠。右上角那簇红点(覆盖优先、考前突击、银行存储模型)是双重受灾区——本来就漏,AI 还把代价从"低效"推成"空心化"。它们不是被 AI 创造出来的坏结构,而是被 AI 照亮的旧坏结构。
How to read it: the further right, the more cognitive science had already judged the structure sub-optimal before AI; the further up, the harder its leak is amplified once AI cuts the cost of information acquisition to zero. The red cluster top-right (coverage, cramming, the banking model) is the double-hit zone – already leaky, and AI pushes the cost from "inefficient" to "hollowed-out." These are not bad structures AI created, but old bad structures AI has lit up.
讲授-考试:把"听懂"误当成"学会"的流水线
Lecture-and-test: an assembly line that mistakes "followed along" for "learned"
讲授制的承重假设是:把信息从讲者口中传到听者脑中,学习就发生了。这个假设在认知科学里早就站不住——被动接收的信息留存率极低,真正长记的是主动提取(测试效应,Roediger & Karpicke〔R1〕)。讲授制把全部认知负荷压在"听懂"那一刻,而"听懂"恰恰是识别(recognition)层的廉价流畅,最容易被误当成会了(见 SHEET 02 的识别/重现之分)。考试本该补这一刀——逼出重现——但传统考试往往考完即弃,提取只发生一次、还高度可预测,于是退化成"考前把识别临时拉到重现"的突击,长期留存几乎为零。AI 怎么把它推向空心化:当 AI 能秒生一份讲解、并把任何概念讲到"听上去都懂",讲授制最弱的那一环——制造识别层的虚假流畅——被无限放大。学生现在能在任何时刻获得"听懂了"的感觉,却比任何时候都更少被逼着重现。讲授-考试本来只是低效;AI 在场时,它高效地批量生产"自以为学会"的人。
The lecture's load-bearing assumption is: transmit information from the speaker's mouth into the listener's head and learning has happened. That assumption has long been untenable in cognitive science – passively received information has very low retention, and what endures is active retrieval (the testing effect, Roediger & Karpicke〔R1〕). The lecture loads all cognitive effort onto the moment of "following along," and "following along" is precisely the cheap fluency of the recognition layer, the easiest thing to mistake for mastery (see the recognition/reproduction split, SHEET 02). The test was meant to fix this – to force reproduction – but a traditional exam is often used once and discarded; retrieval happens a single time and is highly predictable, degrading into "cramming that temporarily yanks recognition up to reproduction," with near-zero long-term retention. How AI pushes it toward hollowing: when AI can generate an explanation in a second and explain any concept until it "all sounds clear," the lecture's weakest link – manufacturing false fluency at the recognition layer – is amplified without limit. Students can now obtain the feeling of "I followed that" at any moment, while being forced to reproduce less than ever. Lecture-and-test was merely inefficient; with AI present it efficiently mass-produces people who believe they have learned.
覆盖优先与考前突击:和间隔效应正面对撞的两种设计
Coverage and cramming: two designs that collide head-on with the spacing effect
这两根支柱可以一起拆,因为它们撞的是同一条最硬的证据:间隔效应(Cepeda et al. 2006 元分析,跨 254 项研究,Ⅰ 级〔R14〕)——分散练习显著优于集中练习,是合意困难里证据最扎实的一支。覆盖优先(一学期赶完整本教材)逼着每个主题只被触碰一次、且彼此挤压,根本没有回访与间隔的空间;它优化的是"讲过",不是"学会"。考前突击则是集中练习的极端形态:把本该分散数周的提取,压进考前一夜。它能短期拉高考试分数(这正是它存活至今的原因——它对当下的考有效),但对长期留存几乎无贡献,考完即忘是它的设计后果,不是意外。AI 怎么把它推向空心化:这两者本就和间隔效应对撞,AI 让对撞更彻底。当一份完整的、结构化的、可即时生成的笔记/题解随时可得,"赶覆盖"的边际成本趋近于零——学生可以在考前一夜让 AI 把整学期内容压缩成一份"看上去全懂"的速成包,把本就违反间隔原则的突击,做到了物理极限。AI 不创造突击文化,但它把突击的获取成本砍到零,于是移除了过去唯一逼人提前分散学习的摩擦:手工整理太慢。摩擦没了,违反间隔效应的默认路径就成了阻力最小的路径。
These two pillars can be pulled apart together, because they collide with the same hardest piece of evidence: the spacing effect (Cepeda et al. 2006 meta-analysis across 254 studies, grade I〔R14〕) – distributed practice significantly beats massed practice, the most solidly evidenced branch of desirable difficulty. Coverage-over-mastery (racing through the whole textbook in a term) forces every topic to be touched once and squeezed against the next, with no room for revisiting or spacing; it optimizes "was taught," not "was learned." Cramming is the extreme form of massed practice: it compresses retrieval that should be spread over weeks into the night before. It can raise an exam score short-term (which is exactly why it survives – it works for the test in front of you) but contributes almost nothing to long-term retention; forgetting right after the exam is a designed consequence, not an accident. How AI pushes it toward hollowing: both already collide with spacing, and AI makes the collision total. When a complete, structured, instantly generated set of notes or solutions is always available, the marginal cost of "racing for coverage" approaches zero – a student can have AI compress a whole term into a "looks fully understood" crash pack the night before, taking the already-anti-spacing cram to its physical limit. AI does not create cramming culture, but it cuts cramming's acquisition cost to zero, removing the one friction that used to force earlier, distributed study: manual organizing was too slow. With the friction gone, the spacing-violating default becomes the path of least resistance.
"知识传递"的银行存储模型:把人当容器,而 AI 是更好的容器
The "knowledge transfer" banking model: treating people as containers – and AI is a better container
这根支柱是隐喻层面的,却最致命。"知识传递"四个字暗含一个模型——Freire〔R21〕称之为银行存储式教育(banking model):知识是一笔可被转账的存款,老师往学生这个空账户里存,学生的任务是接收、保管、到考试时取出。这个模型把学习者当成容器,而非建构者。它在认知科学里早被建构主义判过死刑:理解不是被灌进去的,是学习者在已有结构上主动重建出来的(Vygotsky 的发展观、皮亚杰的同化-顺应都在反这个隐喻)。AI 怎么把它推向空心化——这是全卷最该看清的一点:如果学习真的只是"把知识从一处转移到另一处存起来",那么 AI 就是一个比任何人脑都更大、更快、更准的容器。在银行存储模型的框架里,人完全没有理由再去内化任何东西——直接把存款放进 AI 这个超级账户就好。也就是说,银行存储模型一旦遇上 AI,会自己推出"人不必再学"的结论,而且这个结论在它自己的前提下是逻辑自洽的。这正是为什么本卷必须从根上换掉这个隐喻:只要还用"传递/存储"来想象学习,就守不住任何东西——因为在那个隐喻里,人本来就该被更好的容器取代。学习的承重从来不是存储,而是建构与判断,那才是 AI 这个容器替代不了的部分。
This pillar lives at the level of metaphor, yet it is the most lethal. The phrase "knowledge transfer" smuggles in a model – what Freire〔R21〕called the banking model of education: knowledge is a deposit that can be transferred, the teacher deposits it into the empty account that is the student, and the student's job is to receive, store, and withdraw it at exam time. This model treats the learner as a container, not a constructor. Cognitive science condemned it long ago through constructivism: understanding is not poured in but actively rebuilt by the learner on top of existing structure (Vygotsky's developmental view and Piaget's assimilation-accommodation both push against this metaphor). How AI pushes it toward hollowing – the single thing this volume most needs you to see: if learning really were just "moving knowledge from one place to be stored in another," then AI is a container larger, faster, and more accurate than any human brain. Within the banking model's frame, a person has no reason left to internalize anything – just put the deposit into the super-account that is AI. That is, the banking model, once it meets AI, derives the conclusion "people need not learn anymore" on its own, and that conclusion is internally consistent under its own premises. This is exactly why the volume must replace the metaphor at the root: as long as you imagine learning as "transfer / storage," you can guard nothing – because in that metaphor a person was always meant to be replaced by a better container. Learning's load was never storage but construction and judgment, which is the part the container called AI cannot replace.
追绩点、文凭与标准化考试:信号机制在 AI 下集体失真
Grade-chasing, credentials and standardized tests: the signaling machinery distorts under AI
把这三者放一起,因为它们共享同一个机制——它们都不是学习本身,而是学习的信号(signaling):绩点、文凭、标准化分数都是用来向外界(雇主、下一级学校)压缩传递"这人有没有能力"的代理指标。代理指标的通病是古德哈特定律、用在考试上即 Campbell 定律〔R22〕——一旦某个度量成了目标,它就不再是好的度量。学生追绩点而非追掌握、刷分而非求理解,这个扭曲在 AI 之前就存在;它之所以被容忍,是因为过去伪造这些信号的成本足够高:你没法不学就写出一篇及格论文、解出一套难题。AI 怎么把它推向空心化:AI 把伪造这些信号的成本砍到接近零。一篇能拿高分的论文、一套能解对的作业、一份漂亮的项目报告,现在都可以在不经过任何内化的情况下生成。这意味着信号与它本该代表的能力之间的连接被切断了——分数还在涨,能力可以原地不动甚至萎缩。标准化考试受冲击最直接:它的全部价值建立在"可比较、难作弊"上,而当 AI 能在远程、开卷、甚至闭卷的边缘场景里大幅介入,"分数代表能力"这个等式的两端就脱钩了。这里要诚实:标准化考试在公平分发稀缺机会上仍有难以替代的制度价值,本卷不主张废除它;但作为学习的信号,它在 AI 下的失真是结构性的,不能再被当作"学会了"的可靠证据。真正抗 AI 的信号只剩一类:在无 AI 在场、新情境下的现场重现与迁移(SHEET 08 的迁移测试),因为那一关考的恰好是 AI 替不了的"能做"。
Group these three because they share one mechanism: none is learning itself; each is a signal of learning. Grades, credentials, and standardized scores are proxy indicators that compress and transmit "does this person have ability" to the outside world (employers, the next school). The chronic disease of proxy indicators is Goodhart's law, or, in its testing-specific form, Campbell's law〔R22〕: once a measure becomes a target, it ceases to be a good measure. Students chasing grades rather than mastery, gaming scores rather than seeking understanding – that distortion existed before AI. It was tolerated because the cost of faking these signals used to be high enough: you could not write a passing essay or solve a hard problem set without learning. How AI pushes it toward hollowing: AI cuts the cost of faking these signals to near zero. A high-scoring essay, a correctly solved assignment, a polished project report can now be generated with no internalization whatsoever. This severs the link between the signal and the ability it was supposed to represent – the score keeps rising while ability can stay put or even atrophy. Standardized testing takes the most direct hit: its entire value rests on being "comparable, hard to cheat," and once AI can intervene heavily in remote, open-book, and even the edges of closed-book settings, the two ends of the equation "score represents ability" come apart. Be honest here: standardized testing still has hard-to-replace institutional value in fairly distributing scarce opportunity, and the volume does not argue to abolish it; but as a signal of learning, its distortion under AI is structural, and a score can no longer be treated as reliable evidence of "has learned." Only one kind of AI-resistant signal remains: live reproduction and transfer in a new situation with no AI present (the transfer test, SHEET 08), because that gate tests precisely the "doing" AI cannot stand in for.
把六根支柱收成一句:旧结构优化的是"获取",而获取已经免费
The six pillars in one sentence: the old structures optimized "acquisition," and acquisition is now free
To close this sheet: the six pillars look different but share one mismatch. All were born in a world where information acquisition was expensive, so all optimized the structure at the acquisition end – the lecture optimizes distribution, coverage optimizes throughput, the exam optimizes spot-checking, the signaling machinery optimizes filtering. And SHEET 02's cost scissors already showed: acquisition ("knowing") is collapsing toward zero, the bottleneck has moved wholesale to internalization ("doing"). So these structures, which press all their engineering effort onto the acquisition end, are optimizing a bottleneck already solved, while doing almost nothing for the real new bottleneck – internalization, judgment, asking – and even actively obstructing it by continuously manufacturing the illusion that "acquisition equals learned." This is not a call to overthrow them all: the lecture is still an efficient trigger (igniting a question), the exam is still a useful retrieval prompt (so long as it is used frequently, low-stakes, across situations), the standardized test still has institutional value in fair distribution. What needs replacing is not these tools but the acquisition-end worldview behind them – "learning = moving information into the head and back out at exam time." The AI-Native learning methodology is exactly a redraw that migrates the engineering center of gravity from the acquisition end to the internalization end – the scaffold, stop-line, reflection log, and transfer test that follow are all new tools built for that real bottleneck.
检验信号
Test signal
一个简单的自检能区分你身处哪一端:问"如果把这门课/这套训练里的获取部分(讲、读、查)全交给 AI,还剩下什么算作学习?"——如果答案是"几乎不剩",那它优化的就是已被解掉的获取瓶颈,正在空心化;如果剩下的是大量的自产-自纠-迁移-反思,那它优化的是内化端,AI 充裕反而让它更纯粹。A simple self-check tells which end you sit at: ask, "if I hand the acquisition parts of this course/training (lecturing, reading, looking up) entirely to AI, what is left that counts as learning?" – if the answer is "almost nothing," it was optimizing the already-solved acquisition bottleneck and is hollowing out; if what remains is plenty of self-production, self-correction, transfer, and reflection, it was optimizing the internalization end, and AI abundance only makes it purer.
LEARN 12
CASES · 案例
案例 · 把内核走一遍
Cases · walking the kernel through
把方法论走到一个真人身上:四个具体案例
Walking the methodology onto a real person: four concrete cases
Load-bearing claim: if the preceding mechanics, instruments, and stop-line stay abstract, they have not yet been tested. This sheet walks them onto four concrete scenes – a real cognitive-offloading audit, a desirable-difficulty course redesign, a stop-line landed on a specific task, and an atrophy caught by the dashboard before it hardened. Each case gives before/after, the judgment kept in human hands, and which step of the methodology it instantiates.
案例一 · 一名后端工程师对"读懂陌生代码库"做的卸载体检
Case 1 · A backend engineer's offloading audit on "reading an unfamiliar codebase"
情境具体化:一位有六年经验的后端工程师,过去一年里高度依赖 AI 来"读懂"陌生代码——接手一个旧服务时,习惯把整个文件贴给 AI,要它"解释这段在干嘛"。他用 INSTRUMENT 11(外包 vs 内化体检)给这项能力切两轴,结果落在最危险的那格:高可充裕(AI 确实能逐行解释)× 高不可外包(读懂系统是他做架构判断、code review、定位线上故障的底座)——便利陷阱。这逼他做了一次更细的拆分:把"读懂代码库"这一团能力拆成三件子能力,分别体检,而不是整团交出或整团留下。
Make the scene concrete: a backend engineer with six years' experience leaned heavily on AI over the past year to "read" unfamiliar code – on inheriting an old service, the habit was to paste an entire file to AI and ask it to "explain what this does." He ran this capacity through INSTRUMENT 11 (the offload-vs-internalize audit) on two axes, and it landed in the most dangerous cell: high abundance (AI genuinely can explain line by line) × high un-outsourceability (understanding the system is the bedrock of his architecture judgment, code review, and production-incident triage) – the convenience trap. This forced a finer split: break the lump capacity "reading a codebase" into three sub-capacities, audited separately, rather than handing over or keeping the whole lump.
子能力 A
Sub-capacity A
语法/API 含义
Syntax / API meaning
"这个库函数的参数是什么意思"——纯"知道"层,可充裕、外包无损。判定:交给 AI。强行记忆是在和已解瓶颈较劲。"What do this library function's parameters mean" – pure "knowing" layer, abundant, safe to outsource. Verdict: hand to AI. Memorizing it is wrestling a solved bottleneck.
子能力 B
Sub-capacity B
控制流/数据流追踪
Control-/data-flow tracing
"一个请求进来后在系统里怎么流动"——混合区。让 AI 当陪练(先自己画一遍流程,再让 AI 找漏),保留自产-自纠环。"How a request flows through the system once it arrives" – the mixed zone. Let AI spar (sketch the flow yourself first, then have AI find gaps), keeping the self-produce/self-correct loop.
子能力 C
Sub-capacity C
设计意图/隐性约束
Design intent / implicit constraints
"当初为什么这样设计、哪些是不能动的隐形约束"——止步线内。刻意不外包:这正是他作为资深工程师不可替代的判断根。"Why it was designed this way, which invisible constraints must not be touched" – inside the stop-line. Deliberately not outsourced: this is the irreplaceable root of his judgment as a senior engineer.
之前
Before
He pasted the whole file to AI and read its summary. Three months later, on an adjacent module, he still had to re-paste and re-ask – his mental model of the system had not grown at all; every time he depended on AI from zero. In one production incident, AI's explanation led him to the wrong module, and with no independent system map to push back with, he lost two extra hours.
之后 · 按子能力分治
After · split by sub-capacity
A 全交 AI;B 先自己画控制流再让 AI 查漏;C 关掉 AI 自己重建设计意图、记进反思库。两个月后,他对这个服务有了一张自己脑中的系统图——下一次故障,他能用这张图反驳 AI 的错误猜测。省力没减多少(语法层仍全交),但留存与判断力实打实长了。
A is handed fully to AI; for B he sketches the control flow himself first, then has AI find the gaps; for C he closes AI, rebuilds the design intent himself, and records it in the reflection log. Two months on, he held a system map of this service in his own head – at the next incident he could use it to overrule AI's wrong guess. The effort he saved barely shrank (the syntax layer is still fully handed over), but retention and judgment grew for real.
这个案例的承重点不是"少用 AI",而是把一团模糊的能力拆细到可以分别判定的颗粒度。整团交出会萎缩判断根;整团留下是在和已解瓶颈较劲。真正的功夫在那把"分治的刀"——而挥这把刀的判断(哪一格是止步线内)恰恰是 AI 替不了、必须留在人手里的那一步。这正是 SHEET 07 交出/保留决策走到一个真人身上的样子。
The load-bearing point of this case is not "use AI less" but splitting a fuzzy lump of capacity down to a granularity where each piece can be judged separately. Handing the whole lump over atrophies the judgment root; keeping the whole lump wrestles a solved bottleneck. The real craft is in that "knife of decomposition" – and the judgment that wields it (which cell is inside the stop-line) is precisely the step AI cannot replace and must stay in human hands. This is SHEET 07's keep/hand-over decision walked onto a real person.
FIG. L.13 / 交出 vs 保留 · 决策树 / HAND-OFF VS KEEP · DECISION TREE
看懂:三个问题把一项能力分流到四种处置
Read: three questions route a capacity into four dispositions
沿树走:Q1 把纯"知道"层筛掉(交给 AI);Q2 是主权问题,把"萎缩了会动摇判断根"的能力拦下;Q3 是 Bjork 边界——只有已具基础的人,保留难度才"合意",否则先补基础或让 AI 陪练。落在最左和最右两支的能力放心处置;落在底部那支的,才是真正要划进止步线、刻意设阻力的。这棵树的每个判断点都需要人来答——它本身就是 AI 替不掉的那层。Walking the tree: Q1 screens out the pure "knowing" layer (hand to AI); Q2 is the sovereignty question, intercepting capacities whose atrophy would shake the judgment root; Q3 is Bjork's boundary – only for those with a base is retained difficulty "desirable," otherwise build the base first or let AI spar. Capacities landing on the far-left and far-right branches are disposed of safely; only those on the bottom branch are truly to be drawn inside the stop-line with deliberate friction. Every decision node on this tree must be answered by a human – it is itself the layer AI cannot displace.
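The routing described for FIG. L.13 can be sketched as a small function. This is an illustrative reading, not the figure itself: the names `Disposition` and `route_capacity` are hypothetical, and the label given to the Q2 "no" branch (AI as sparring partner, borrowed from the mixed zone in Case 1) is an assumption, since the text does not name that branch. The three inputs are human judgments; the code only records where they lead.

```python
from enum import Enum


class Disposition(Enum):
    HAND_TO_AI = "hand to AI"
    AI_AS_SPARRING = "use AI as a sparring partner"  # assumed label for Q2 'no'
    BUILD_BASE_FIRST = "build the base first (AI as scaffold)"
    INSIDE_STOP_LINE = "keep inside the stop-line, with deliberate friction"


def route_capacity(pure_knowing: bool,
                   atrophy_shakes_judgment: bool,
                   has_base: bool) -> Disposition:
    """Route one capacity through Q1-Q3.

    Each argument is an answer a person has to supply; nothing here
    decides those answers for them.
    """
    if pure_knowing:                 # Q1: is this the pure "knowing" layer?
        return Disposition.HAND_TO_AI
    if not atrophy_shakes_judgment:  # Q2: the sovereignty question
        return Disposition.AI_AS_SPARRING
    if not has_base:                 # Q3: Bjork's boundary
        return Disposition.BUILD_BASE_FIRST
    return Disposition.INSIDE_STOP_LINE


# Case 1, sub-capacity A (syntax / API meaning) and C (design intent).
print(route_capacity(True, False, True).value)
print(route_capacity(False, True, True).value)
```

Keeping the answers as plain arguments is deliberate: the function is a record of a judgment, not a substitute for it.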
案例二 · 一门数据结构课把"合意困难"重新设计回去
Case 2 · A data-structures course redesigns desirable difficulty back in
情境:一位大学讲师发现,自从学生普遍用上 AI 编程助手,数据结构课的作业分数全面上升,但期中闭卷一考——分数明显下降。学生交上来的链表、树、图作业近乎完美,可一旦撤掉 AI、换一道结构相似但题面陌生的题,大多数人写不出来。这是 SHEET 02 识别/重现错觉的教科书级现场:作业制造了识别层的虚假流畅,掩盖了重现层的空洞。她没有禁用 AI(那是脆弱的、也执行不了的),而是按合意困难三类,把刻意的难度重新设计回课程里,且每一处都设计成"AI 在场也绕不开"。
Scene: a university lecturer noticed that ever since students broadly adopted AI coding assistants, data-structures assignment scores had risen across the board – but scores on the closed-book midterm dropped markedly. Submitted linked-list, tree, and graph assignments were near-perfect, yet remove AI and swap in a structurally similar but unfamiliar problem, and most students could not produce a solution. This is a textbook scene of SHEET 02's recognition/reproduction illusion: assignments manufactured recognition-layer fluency that masked a hollow reproduction layer. She did not ban AI (a ban is fragile and unenforceable) but, along the three families of desirable difficulty, redesigned deliberate difficulty back into the course – each piece designed so that "even with AI present you cannot bypass it."
间隔 Spacing
回访式小测
Revisiting quizzes
每周三分钟闭卷小测,必考三周前学过的旧主题——逼分散提取,对撞"学完就忘"。基于间隔效应(Cepeda 元分析 Ⅰ 级〔R14〕)。A weekly three-minute closed-book quiz always tests a topic from three weeks ago – forcing distributed retrieval, colliding with "learn-then-forget." Grounded in the spacing-effect meta-analysis (grade I〔R14〕).
提取 Retrieval
先白板再键盘
Whiteboard before keyboard
作业要求先交一张手写/口头讲解视频(无 AI),再交代码。提取走在生成前,把重现这一关补回(测试效应 Ⅱ 级〔R1〕)。Assignments require a handwritten explanation or a spoken-explanation video first (no AI), then the code. Retrieval precedes generation, restoring the reproduction gate (testing effect, grade II〔R1〕).
交错 Interleaving
混合题型作业
Mixed problem sets
不再"本周只做树",而是把树/图/哈希混在同一份作业里——逼学生先判断"该用哪种结构",而判断恰是 AI 在场也得自己做的那一步。No more "trees only this week"; trees/graphs/hashing are mixed in one set – forcing students to first judge "which structure applies," and judging is the step they must do themselves even with AI present.
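The revisiting-quiz rule (each weekly quiz tests the topic from three weeks earlier) is simple enough to write down. A minimal sketch, assuming one topic per teaching week; the function name `quiz_topic`, the `lag` parameter, and the topic order in the example are illustrative and not taken from the course itself.

```python
from typing import List, Optional


def quiz_topic(week: int, topics: List[str], lag: int = 3) -> Optional[str]:
    """Return the topic quizzed in `week` (1-based).

    The quiz always reaches back `lag` weeks; it returns None when
    nothing was taught that far back yet, or the syllabus has run out.
    """
    taught_week = week - lag
    if taught_week < 1 or taught_week > len(topics):
        return None
    return topics[taught_week - 1]


# Assumed syllabus order, one topic per week.
syllabus = ["linked lists", "trees", "graphs", "hashing"]
for week in range(1, 8):
    print(week, quiz_topic(week, syllabus))
```

With a three-week lag the first quiz falls in week 4, so the opening weeks carry no quiz under this sketch; a real course might shorten the lag early on.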
之前
Before
单一主题、可一次性用 AI 完成、考前突击的作业流。作业均分 92,期中闭卷均分 58,两者背离却无人警觉——直到期末挂科率翻倍。即时反馈(高作业分)系统性误导了师生双方对掌握程度的判断。
A single-topic, AI-completable-in-one-pass, cram-before-the-exam assignment flow. Assignment average 92, closed-book midterm average 58 – the divergence went unnoticed until the final's failure rate doubled. Immediate feedback (high assignment scores) systematically misled both teacher and students about the degree of mastery.
之后 · 三类困难设计回去
After · three difficulties designed back in
Assignment average fell to 81 (more painful in the moment, as expected), but the closed-book midterm average rose to 74, sharply narrowing the gap – the fingerprint of desirable difficulty: sacrifice present fluency for long-term, transferable retention. Students complained "it got harder," yet end-of-term transfer-test pass rates were markedly higher than the prior cohort's.
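The before/after numbers reduce to one quantity: the gap between AI-assisted assignment scores and closed-book scores. A minimal sketch using the averages reported in this case; the name `fluency_gap` is illustrative, not a term from the course.

```python
def fluency_gap(assignment_avg: float, closed_book_avg: float) -> float:
    """Gap between what assignments suggest and what survives without AI."""
    return assignment_avg - closed_book_avg


before = fluency_gap(92, 58)  # 34 points
after = fluency_gap(81, 74)   # 7 points
print(before, after)
```

A shrinking gap alongside a lower assignment average is the pattern the case describes: less apparent fluency, more reproduction.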
这个案例的承重点:合意困难不是"让课更难"这么粗。它是精确地把难度加在重现与判断这两个 AI 绕不开的环节上,同时继续把纯获取层交给 AI。讲师没有和工具对抗,她重设的是评估与练习的结构,让结构自己把学习逼回内化端。注意 Bjork 的边界她也守住了:这些难度只对已经听过课、有基础的学生"合意",对完全没基础的旁听者就只是挫败——所以小测和白板讲解都建立在课程已铺好的基础之上(R8 的告诫)。
The load-bearing point: desirable difficulty is not as crude as "make the course harder." It is placing difficulty precisely on the two links AI cannot bypass – reproduction and judgment – while continuing to hand the pure acquisition layer to AI. The lecturer did not fight the tool; she redesigned the structure of assessment and practice so the structure itself forces learning back to the internalization end. Note she also kept Bjork's boundary: these difficulties are "desirable" only for students who attended and have a base; for a baseless auditor they are mere frustration – so the quizzes and whiteboard explanations all build on the base the course has already laid (R8's caveat).
案例三 · 一位医学生给"鉴别诊断推理"划止步线
Case 3 · A medical student draws a stop-line around "differential-diagnosis reasoning"
情境:一位临床阶段的医学生,用 AI 做症状到诊断的推理非常顺畅——输入一组症状,AI 秒给一份排好序的鉴别诊断列表,附带每条的支持/反对证据。她差点把整个推理过程外包出去。但她对这项能力跑了一遍 SHEET 06 的止步线决策,结论是:这是绝不能退的那条边界。理由不是"AI 不准"(它常常很准),而是更结构性的一点——临床推理是她未来独立行医时为后果负责的能力根基。一个不能独立做出鉴别诊断的医生,无法判断 AI 给出的列表"这次靠不靠谱",也无法在 AI 漏掉一个罕见但致命的诊断时把它捞回来。她外包的可以是文献检索、剂量计算、指南查询,但不能外包"从症状推到诊断"这条主推理链。
Scene: a clinical-phase medical student found AI extremely handy for symptom-to-diagnosis reasoning – feed in a cluster of symptoms and AI instantly returns a ranked differential list with supporting/opposing evidence for each. She nearly outsourced the entire reasoning process. But she ran this capacity through SHEET 06's stop-line decision and concluded: this is the boundary that must never be ceded. The reason is not "AI is inaccurate" (it is often very accurate) but something more structural – clinical reasoning is the capacity-root by which she will be accountable for consequences in independent practice. A physician who cannot independently form a differential cannot judge whether AI's list "holds up this time," nor catch a rare-but-lethal diagnosis when AI misses it. What she may outsource is literature search, dose calculation, guideline lookup; what she may not outsource is the main reasoning chain from symptom to diagnosis.
她落地的方式不是"不用 AI",而是设了一道明确的顺序规约:先自己产出完整的鉴别诊断列表并写下推理(自答在先),然后才打开 AI 对照——AI 此时的角色是"找出我漏掉了什么、我哪条推理错了",而不是"替我想"。每一次她和 AI 的诊断列表不一致,无论谁对,都进她的错题反思库(SHEET 09),记的不是"正确答案",而是"我当时为什么会这样推、漏在哪个环节"。三个月下来,她和 AI 不一致、且她对的比例从 9% 升到 23%——这正是 SHEET 08 那条"质疑 AI 的命中率"先行指标在上升,说明她的推理能力没有被 AI 替代,反而在与 AI 的对抗中被磨利了。
Her landing was not "don't use AI" but a clear ordering rule: first produce the full differential and write the reasoning herself (self-answer first), then open AI to compare – AI's role here is "find what I missed, where my reasoning went wrong," not "think for me." Every time her and AI's differentials disagreed, regardless of who was right, it went into her error-reflection log (SHEET 09), recording not "the correct answer" but "why I reasoned that way, which link I missed." Over three months, the share of cases where she and AI disagreed and she was right rose from 9% to 23% – exactly SHEET 08's "AI-challenge hit-rate" leading indicator rising, showing her reasoning was not replaced by AI but sharpened in sparring against it.
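The indicator she tracked can be computed from the reflection log itself. The sketch below assumes the rate is taken over all logged cases (the text does not say whether the denominator is all cases or only disagreements); `CaseRecord` and `challenge_hit_rate` are hypothetical names, and the sample records are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseRecord:
    agreed_with_ai: bool   # did her differential match the AI's?
    learner_correct: bool  # was her own differential the correct one?


def challenge_hit_rate(records: List[CaseRecord]) -> float:
    """Share of all logged cases where she disagreed with AI and was right.

    Assumption: the denominator is every logged case, not only the
    disagreements.
    """
    if not records:
        return 0.0
    hits = sum(1 for r in records
               if not r.agreed_with_ai and r.learner_correct)
    return hits / len(records)


# Invented sample log: four cases, one disagreement that she won.
log = [
    CaseRecord(agreed_with_ai=True, learner_correct=True),
    CaseRecord(agreed_with_ai=False, learner_correct=False),
    CaseRecord(agreed_with_ai=False, learner_correct=True),
    CaseRecord(agreed_with_ai=True, learner_correct=True),
]
print(challenge_hit_rate(log))
```

Because the rate is derived from entries she already writes for reflection, it stays a by-product of the practice rather than a score to chase.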
止步线的判据
The stop-line criterion
这个案例给出止步线最干净的判据:一项能力,如果你萎缩了它就再也无法判断 AI 在这件事上对不对、也无法为后果负责,它就在止步线内。不是因为 AI 做不好,恰恰是因为 AI 常常做得好——做得好才更危险,因为它让外包看起来零成本,而代价(你失去验证与兜底的能力)要等到一次 AI 出错、而你已无力发现时才结算。This case gives the cleanest stop-line criterion: a capacity is inside the stop-line if, once it atrophies, you can no longer judge whether AI is right about this thing nor be accountable for the consequences. Not because AI does it badly, but precisely because AI often does it well – doing it well is the more dangerous case, because it makes outsourcing look costless, while the cost (you lose the ability to verify and backstop) is only settled when AI errs and you are no longer able to notice.
案例四 · 一个自学者在萎缩硬化前用仪表盘抓住了它
Case 4 · A self-learner catches atrophy on the dashboard before it hardened
情境:一名转行学数据分析的自学者,半年里进步飞快——靠的是 AI 全程陪写代码、解释报错、给思路。表面信号全是绿的:项目越做越复杂、产出越来越快。但他按 SHEET 08 给自己挂了一组刻意偏向滞后、无 AI 在场的先行指标,每两周自测一次。第三个月,一条指标先动了:"不用 AI 我还会吗"的失败频次在升——他发现自己越来越难独立写出一段哪怕简单的数据清洗逻辑,必须先打开 AI 才有"手感"。紧接着第二条指标确认了它:迁移测试通过率在降——给他一个结构相似但 AI 没见过上下文的新任务,他卡住的概率比两个月前更高。
Scene: a career-changer self-studying data analysis progressed fast over six months – on the back of AI co-writing code, explaining errors, and supplying ideas throughout. The surface signals were all green: projects grew more complex, output grew faster. But following SHEET 08 he set up a group of leading indicators deliberately weighted toward delayed, AI-absent conditions, and tested himself every two weeks. In month three, one indicator moved first: the failure frequency of "could I still do this without AI" rose – he found it increasingly hard to write even a simple data-cleaning routine on his own, needing to open AI first to get "the feel." Then a second indicator confirmed it: the transfer-test pass rate fell – given a structurally similar task whose context AI had not seen, he was more likely to get stuck than two months earlier.
关键在于:这两条指标都领先于任何外部失败。他的项目还在正常推进,没有任何老板或客户察觉问题——如果他只看产出,会一直绿灯到某天 AI 用不了、或遇到一个 AI 解不了的真问题时才暴雷,而那时萎缩已经硬化、补救成本极高。仪表盘的全部价值就在这个提前量:它在能力损失转化为可见后果之前就让损失可见。他的补救也很直接:把数据清洗这项能力从"全交"档拨回"合意带"(INSTRUMENT 13),恢复"先自己写、卡住超过 15 分钟才问 AI"的延迟提示规约,并把每次卡点记进反思库。两个月后,那两条指标掉头——他没有放弃 AI,只是把一项悄悄滑出止步线的能力,重新拽了回来。
The crux: both indicators lead any external failure. His projects were still progressing normally, and no boss or client noticed anything – had he watched only output, the light would have stayed green until the day AI was unavailable or a real problem AI could not solve appeared, and by then the atrophy would have hardened and remediation would have been very costly. The dashboard's entire value is this lead time: it makes capacity loss visible before that loss turns into a visible consequence. His remedy was direct too: dial the data-cleaning capacity back from the "hand over fully" setting to the "desirable band" (INSTRUMENT 13), restore a delayed-hint rule of "write it yourself first, and ask AI only after being stuck for more than 15 minutes," and record each sticking point in the reflection log. Two months on, the two indicators turned around – he did not abandon AI; he merely pulled back a capacity that had quietly slid outside the stop-line.
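The alarm logic in this case (one indicator moves, a second confirms) and the delayed-hint rule can both be sketched in a few lines. Everything named here is an assumption for illustration: the functions `is_worsening`, `alarm`, and `may_ask_ai`, the three-reading window, and the sample readings are not taken from SHEET 08 or INSTRUMENT 13.

```python
from typing import List


def is_worsening(readings: List[float], higher_is_worse: bool,
                 window: int = 3) -> bool:
    """True if the last `window` readings move strictly in the bad direction."""
    if len(readings) < window:
        return False
    recent = readings[-window:]
    pairs = list(zip(recent, recent[1:]))
    if higher_is_worse:
        return all(later > earlier for earlier, later in pairs)
    return all(later < earlier for earlier, later in pairs)


def alarm(no_ai_failure_rate: List[float],
          transfer_pass_rate: List[float]) -> bool:
    """Sound only when both leading indicators worsen together."""
    return (is_worsening(no_ai_failure_rate, higher_is_worse=True)
            and is_worsening(transfer_pass_rate, higher_is_worse=False))


def may_ask_ai(minutes_stuck: float, threshold: float = 15.0) -> bool:
    """Delayed-hint rule: ask AI only after being stuck longer than the threshold."""
    return minutes_stuck > threshold


# Invented biweekly self-test readings.
failures = [0.20, 0.30, 0.45]
transfer = [0.80, 0.70, 0.55]
print(alarm(failures, transfer), may_ask_ai(10), may_ask_ai(16))
```

Requiring both indicators to move keeps the alarm from firing on a single noisy self-test; when it does fire, the response is to investigate, not to tune the numbers.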
反过来:这套方法论自己会在哪里失败
In reverse: where this methodology fails on its own terms
诚实地走完四个成功案例,还要补一个会失败的案例——否则就违反了本卷自己的证据纪律(别把处方说得比证据硬)。这套方法论有三种现实的失败模式,都值得点名。失败一:止步线划得太宽,退回低效。一个把几乎所有认知都划进止步线、拒绝外包的人,不是认知主权,是和已解瓶颈较劲——他在纯"知道"层(语法、样板、查得到的事实)上浪费的合意困难,本可以省下来投到真正的判断根上。止步线的价值恰恰在于它窄:守住少数几条真正承重的边界,其余尽量交出去。划得太宽,方法论就退化成它批判过的"覆盖优先"的镜像。失败二:基础不足时强加难度,制造的是挫败而非学习。这是 Bjork 边界(R8)最常被忽略的一面——合意困难只对"有基础能成功响应"的学习者合意。对一个连基本概念都没建立的初学者强行闭卷、延迟提示,难度不会变"合意",只会让他停滞、放弃。对这类学习者,正确的做法恰恰相反:先让 AI 当密集的脚手架把基础铺起来,等他能成功响应了,再逐步撤除、调高阻力。把顺序反过来,方法论就成了劝退器。失败三:仪表盘指标被当成新的 KPI 来表演。如果有人把"质疑 AI 的命中率"当成要冲高的分数,他会开始为了指标表现而无意义地反驳 AI——这正好掉进 Campbell 定律〔R22〕:指标一旦成了目标就失真。仪表盘是烟雾报警器,不是计分板;它响了你去查原因,而不是想办法让它显示好看的数字。
Having honestly walked four successful cases, one must add a failing case too – otherwise we violate the volume's own evidence discipline (do not state a prescription harder than its evidence). This methodology has three realistic failure modes, all worth naming. Failure 1: the stop-line drawn too wide, regressing into inefficiency. A person who draws almost all cognition inside the stop-line and refuses to outsource is not exercising cognitive sovereignty but wrestling a solved bottleneck – the desirable difficulty he wastes on the pure "knowing" layer (syntax, boilerplate, look-up-able facts) could have been saved and invested in the real judgment root. The stop-line's value lies precisely in being narrow: guard the few truly load-bearing boundaries and hand the rest out. Drawn too wide, the methodology degrades into a mirror of the "coverage-over-mastery" it critiqued. Failure 2: imposing difficulty without a base produces frustration, not learning. This is the most-ignored face of Bjork's boundary (R8) – difficulty is desirable only for a learner with enough base to respond successfully. Forcing closed-book work and delayed hints on a beginner who has not even built the basic concepts will not make the difficulty "desirable"; it will only jam and discourage them. For such a learner the right move is the opposite: let AI be a dense scaffold to lay the base first, and once they can respond successfully, gradually withdraw it and raise the friction. Get the order backwards and the methodology becomes a quit-trigger. Failure 3: the dashboard indicators gamed as a new KPI. If someone treats "AI-challenge hit-rate" as a score to maximize, they will start pushing back on AI meaninglessly just to game the metric – falling straight into Campbell's law〔R22〕: an indicator distorts once it becomes a target. The dashboard is a smoke alarm, not a scoreboard; when it sounds you investigate the cause, you do not engineer it to display a flattering number.
These three failure modes share one antidote, which returns us to the kernel: judgment. How wide to draw the stop-line, how much difficulty to add, how to read the indicators – none can be handed to a fixed formula; all require a person to judge by the specific situation and keep adjusting as capacity grows. This is exactly what proves the volume's core stance is not the crude posture of "use AI less" but "keep the judgment step in human hands" – even this methodology itself must be calibrated by its user's judgment rather than applied mechanically. A methodology that requires judgment to use correctly is itself a demonstration of the proposition that judgment cannot be outsourced.
四个案例横跨工程、教学、临床、自学,但走的是同一条内核路径:先用机理看清"知道 vs 能做"的成本差,再用仪器把一团能力拆到可判定的颗粒,对落在止步线内的那部分设合意困难、建反思库、挂先行指标。它们共同演示的不是"少用 AI",而是把判断这一步牢牢留在人手里,把执行尽量交出去,并用工程化的反馈回路守住这条边界不被便利的默认引力悄悄推移。
The four cases span engineering, teaching, the clinic, and self-study, yet walk the same kernel path: use the mechanics to see the "knowing vs doing" cost gap, use the instruments to split a lump of capacity to a judgeable grain, and for the part landing inside the stop-line add desirable difficulty, build a reflection log, hang leading indicators. What they jointly demonstrate is not "use AI less" but keeping the judgment step firmly in human hands, handing execution out as far as possible, and using engineered feedback loops to keep this boundary from being quietly pushed by convenience's default gravity.
LEARN
13
TOOLKIT · 工具包
TOOLKIT
工具包 · 可直接照做
Toolkit · do-this artifacts
五件可直接照做的内化端工具
Five internalization-end tools you can run as-is
承重命题:方法论若不落成"明天就能用"的具体规约,就还停在态度层。这一张给五件内化端工具——错题反思库规约、先自答协议、迁移测试设计、提问者训练操、AI 止步线决策程序。每件都是可拷贝的 do-this 工件:有输入、有步骤、有判据,不是口号。配一件新的交互仪器(INSTRUMENT 14),把"交出 vs 保留 + 阻力调到刚好"两步合成一次可现场操作的判定。
Load-bearing claim: a methodology that does not land in concrete "usable tomorrow" rules stays at the level of attitude. This sheet gives five internalization-end tools – the error-reflection-log spec, the self-answer-first protocol, transfer-test design, questioner-development drills, and the AI stop-line decision procedure. Each is a copyable do-this artifact with inputs, steps, and criteria, not a slogan. The sheet is paired with a new interactive instrument (INSTRUMENT 14) that fuses the two steps "hand off vs keep" and "dial the friction just right" into one verdict you can run live.
工具一 · 错题反思库规约:记的是推理路径,不是答案
Tool 1 · The error-reflection-log spec: log the reasoning path, not the answer
大多数人记的"错题本"记错了对象——抄下正确答案,下次照抄。那本质上还是在记"知道",对内化毫无帮助。AI-Native 的反思库(SHEET 09)记的是你当时的推理路径与它在哪里断裂,因为可迁移的能力长在路径上,不在答案上。每条记录是固定五栏,刻意做到人机同源、可 diff(你和 AI 各填一份,差异本身就是学习信号):
Most people's "error notebook" logs the wrong object – copy down the correct answer, copy it again next time. That is still logging "knowing," useless for internalization. The AI-Native reflection log (SHEET 09) records your reasoning path at the time and where it fractured, because transferable capacity grows on the path, not the answer. Each entry is a fixed five-field record, deliberately same-source (you and AI fill the same template) and therefore diffable, so that the difference between the two filled copies is itself a learning signal:
栏 1 · 触发
Field 1 · Trigger
遇到的具体问题,以及"我当时以为这是个什么问题"——很多错从误判问题类型开始。
The specific problem, and "what kind of problem I thought it was" – many errors start with misclassifying the problem type.
栏 2 · 我的路径
Field 2 · My path
撤掉 AI、凭自己走的完整推理(哪怕错的),逐步写下——这是重现而非识别,最吃力也最值钱。
The full reasoning you walked yourself with AI removed (even if wrong), written step by step – this is reproduction not recognition, the hardest and most valuable.
栏 3 · 断裂点
Field 3 · Fracture point
和 AI/正解对照后,定位推理在哪一步第一次偏离,以及偏离的根因(不是"算错了",是"我为什么会这样想")。
After comparing with AI/the solution, locate at which step the reasoning first diverged, and the root cause (not "miscalculated" but "why I thought that way").
栏 4 · 可迁移的修正
Field 4 · Transferable fix
把断裂点抽象成一条下次能用在别的题上的规则,而非只修这一题。迁移性是它和普通错题本的分水岭。
Abstract the fracture into a rule usable on other problems next time, not just patching this one. Transferability is the watershed from an ordinary error notebook.
栏 5 · 回访日
Field 5 · Revisit date
按间隔效应排一个未来日期,到期闭卷重做这条——不回访的反思库只是仓库(R14)。
Schedule a future date by the spacing effect; on it, redo this entry closed-book – a reflection log never revisited is just a warehouse (R14).
为什么坚持人机同源、可 diff?因为你和 AI 各自独立填栏 2(推理路径)后,两份的差异就是最高密度的学习信号:AI 想到而你没想到的,是你的盲区;你想到而 AI 没想到的,是你尚存的、值得守护的独立判断。把这个差异本身当成学习对象,比单看正确答案信息量大一个量级。这条规约不依赖任何特定软件,一个表格、一个 Markdown 文件、一张纸都能跑——它的承重不在工具,在"记路径不记答案"这条纪律。
Why insist on same-source and diffable? Because once you and AI each independently fill field 2 (the reasoning path), the difference between the two is the highest-density learning signal: what AI thought of and you did not is your blind spot; what you thought of and AI did not is your surviving, worth-guarding independent judgment. Treating that difference itself as the learning object carries an order of magnitude more information than reading the correct answer alone. This spec depends on no particular software – a spreadsheet, a Markdown file, a sheet of paper all run it – its load is not in the tool but in the discipline of "log the path, not the answer."
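The five-field record and the field-2 diff can be sketched in a few lines. This is only an illustration, not a prescribed tool: the class name `ReflectionEntry`, its attribute names, and the helper `path_diff` are assumptions invented for this example, and exact string matching is a crude stand-in for reading the two paths side by side. The spec runs just as well on paper.

```python
from dataclasses import dataclass
from datetime import date
import difflib


@dataclass
class ReflectionEntry:
    """One five-field reflection-log record (all names are illustrative)."""
    trigger: str           # Field 1: the problem, and what kind of problem I took it for
    my_path: list[str]     # Field 2: my own step-by-step reasoning, written with AI removed
    fracture_point: str    # Field 3: first divergent step and its root cause
    transferable_fix: str  # Field 4: a rule usable on other problems
    revisit_date: date     # Field 5: when to redo this entry closed-book


def path_diff(my_path: list[str], ai_path: list[str]) -> dict[str, list[str]]:
    """Compare two independently written field-2 paths, step by step.

    Steps only AI wrote are candidate blind spots; steps only I wrote are
    candidate independent judgment.
    """
    blind_spots, independent = [], []
    for line in difflib.ndiff(my_path, ai_path):
        if line.startswith("+ "):    # present only in the AI path
            blind_spots.append(line[2:])
        elif line.startswith("- "):  # present only in my path
            independent.append(line[2:])
    return {"blind_spots": blind_spots, "independent_judgment": independent}


# Hypothetical example paths, for illustration only.
mine = ["read it as a rate problem", "set up the ratio", "sanity-check the magnitude"]
ai = ["read it as a rate problem", "check the units", "set up the ratio"]
print(path_diff(mine, ai))
```

The diff is only a pointer to where the two paths part; deciding what the gap means (field 3) and what rule to extract from it (field 4) stays with the learner.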
工具二 · 先自答协议:永远不空手向 AI 求助
Tool 2 · The self-answer-first protocol: never ask AI empty-handed
这是全卷处方里最小、也最高杠杆的一条,可以浓缩成一句操作律:在向 AI 提问之前,先写下你自己的答案、猜测或推理草稿,哪怕它很可能是错的。它之所以是杠杆点,是因为它一次性守住了三样东西。其一,它强制激活重现(你得先从脑中生成,而非等着识别 AI 的输出),把识别/重现错觉关在门外。其二,它把 AI 从"答案来源"降级为"对照与校验源"——你带着一个待检验的假设去,而不是带着一个空洞去,于是你读 AI 输出时是在验证而非接收,验证这个动作本身就在练判断。其三,它产出了反思库栏 2 所需的原始材料——你的独立路径。落地形态可以极轻:在每个 AI 对话框上方放一行自我提示"我先猜:______",或养成"先在便签上写三句再回车"的习惯。
This is the smallest and highest-leverage prescription in the volume, compressible to one operating law: before asking AI, write down your own answer, guess, or draft reasoning first, even if it is likely wrong. It is a leverage point because it guards three things at once. First, it forces reproduction (you must generate from your head rather than wait to recognize AI's output), shutting the recognition/reproduction illusion out. Second, it demotes AI from "answer source" to "comparison and verification source" – you arrive with a hypothesis to test rather than a void, so when you read AI's output you are verifying, not receiving, and the act of verifying is itself practicing judgment. Third, it produces the raw material field 2 of the reflection log needs – your independent path. The landing form can be extremely light: a self-prompt line above each AI chat box, "my guess first: ______," or a habit of "write three sentences on a sticky note before pressing enter."
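For readers who reach AI through a script rather than a chat box, the same rule can be enforced as a small gate. This is a sketch under stated assumptions: `ask_with_guess` is a hypothetical helper, and `ask_ai` stands for whatever function you already use to query a model; no particular API is implied.

```python
from typing import Callable


def ask_with_guess(question: str, my_guess: str,
                   ask_ai: Callable[[str], str]) -> dict[str, str]:
    """Forward a question to AI only after a non-empty guess is written down."""
    if not my_guess.strip():
        raise ValueError("Write your own guess first, even if it is likely wrong.")
    ai_answer = ask_ai(question)
    # Return both side by side, so the output is read as a check on the
    # guess rather than received as the answer.
    return {"question": question, "my_guess": my_guess, "ai_answer": ai_answer}


# Usage with a stand-in for a real model call (hypothetical stub).
record = ask_with_guess(
    "Why does spaced review beat cramming?",
    "My guess: each gap forces a harder retrieval.",
    ask_ai=lambda q: "(model output would appear here)",
)
print(record["my_guess"])
```

The returned `my_guess` is also the raw material for field 2 of the reflection log.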
工具三 · 迁移测试设计:撤掉 AI、换情境、看还能不能做出来
Tool 3 · Transfer-test design: remove AI, change the context, see if you can still do it
SHEET 08 说迁移测试通过率是最该看的信号;这里给出怎么设计一道合格的迁移测试。一道好的迁移测试要同时满足三个条件,少一个就会退化成自欺。条件一,无 AI 在场——只要 AI 可触达,测的就是"你+AI"的联合能力,而不是你的能力,这是最常被偷掉的条件。条件二,情境变化——题面、数据、上下文必须与你练过的不同,否则你测的可能只是对特定题型的模式记忆,而非可迁移的理解;迁移的全部意义在于"换个壳还认得出内核"。条件三,要求重现而非识别——必须让你从头生成(写出、做出、讲出),而不是从给定选项里认出对的。判据很干脆:通过 = 在以上三条都满足时还能独立做出来。设计节奏建议跟睡眠/间隔走(R14/R15)——隔天测、隔周测、隔月测,三道都过,才算真的编译进了"能做"。一个轻量版本:每学完一个主题,让 AI 帮你生成一道"结构相同但换皮"的新题,然后立刻关掉 AI,闭卷做——这里 AI 是出题者,不是答题者,这个角色分配本身就是方法论。
SHEET 08 says transfer-test pass rate is the signal most worth watching; here is how to design a valid transfer test. A good one must satisfy three conditions at once; drop any and it degrades into self-deception. Condition one, no AI present – as long as AI is reachable, you are testing the joint "you + AI" capacity, not yours; this is the most frequently stolen condition. Condition two, changed context – the wording, data, and context must differ from what you practiced, or you may be testing pattern-memory for a specific problem type rather than transferable understanding; transfer's whole meaning is "recognize the kernel through a new shell." Condition three, require reproduction, not recognition – you must generate from scratch (write, do, explain), not pick the right one from given options. The criterion is blunt: pass = you can still do it independently when all three hold. Pace the design with sleep/spacing (R14/R15) – test next-day, next-week, next-month; pass all three and it is truly compiled into "doing." A lightweight version: after finishing a topic, have AI generate a "same-structure, reskinned" new problem, then immediately close AI and do it closed-book – here AI is the question-setter, not the answerer, and that role assignment is itself the methodology.
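The three validity conditions and the next-day / next-week / next-month pacing can be written down as a checklist. This is a minimal sketch: `TransferTest`, `is_valid`, `passed`, and `schedule` are hypothetical names, and treating "next month" as 30 days is an assumption of the example, not a figure from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TransferTest:
    ai_reachable: bool         # condition 1 is violated if AI can be reached
    context_changed: bool      # condition 2: wording, data, context differ from practice
    requires_generation: bool  # condition 3: produce from scratch, not pick an option


def is_valid(test: TransferTest) -> bool:
    """A test counts only when all three conditions hold at once."""
    return (not test.ai_reachable) and test.context_changed and test.requires_generation


def passed(test: TransferTest, solved_independently: bool) -> bool:
    """Pass = solved independently while all three conditions hold."""
    return is_valid(test) and solved_independently


def schedule(learned_on: date) -> list[date]:
    """Next-day, next-week, next-month test dates (30 days assumed for a month)."""
    return [learned_on + timedelta(days=d) for d in (1, 7, 30)]


print(schedule(date(2025, 1, 1)))
```

A result from a test that fails `is_valid` is not a weak pass; under the criterion above it is not a measurement of your own capacity at all.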
FIG. L.14 / 三类合意困难的机理
THE THREE DESIRABLE DIFFICULTIES
看懂:间隔/提取/交错各自怎么把"当下变难"换成"长期变牢"
Read: how spacing/retrieval/interleaving each trade "harder now" for "sturdier later"
三栏一起读:间隔(左)把练习拉开,对撞集中突击;提取(中)让信息从脑中输出而非再输入,对撞重读式的虚假流畅;交错(右)把不同题型混排,逼出"该用哪招"的判断。三者机理不同,但指纹相同——都让你当下更吃力(红),换来长期更牢、且能迁移到新情境的留存(蓝)。这正是为什么 AI 提供的"轻松"是个陷阱:轻松感恰恰是合意困难被绕过的信号。
Read the three panels together: spacing (left) pulls practice apart, colliding with massed cramming; retrieval (center) sends information out of the brain rather than back in, colliding with reread fluency; interleaving (right) mixes problem types, forcing the judgment of "which move applies." Different mechanisms, same fingerprint – each makes you more effortful now (red) for sturdier, transfer-capable retention later (blue). This is exactly why the "ease" AI offers is a trap: the feeling of ease is precisely the signal that desirable difficulty has been bypassed.
工具四 · 提问者训练操:把"会答"练成"会问"
Tool 4 · Questioner-development drills: training from "can answer" into "can ask"
SHEET 03 主张学习目标上游到提问、质疑、整合这套元能力。但"成为更好的提问者"常常停在口号——这里给四个可练的具体操,每个都对准一种被 AI 充裕悄悄削弱的提问肌肉。它们不需要任何特殊场景,把日常和 AI 的协作改造一下就能练。
SHEET 03 argues the learning goal moves upstream to the meta-skills of asking, challenging, integrating. But "become a better question-asker" often stalls at a slogan – here are four practicable drills, each aimed at a question-muscle quietly weakened by AI abundance. They need no special setting; reshape your everyday AI collaboration and you train them.
操 1 · 反驳得对
Drill 1 · Push back, correctly
每次 AI 给完答案,强制自己找出至少一处可质疑点并验证它对不对。练的是 SHEET 08 的"质疑命中率",这是头号先行指标(R18)。
After every AI answer, force yourself to find at least one challengeable point and verify whether it holds. Trains SHEET 08's "challenge hit-rate," the top leading indicator (R18).
操 2 · 问题升格
Drill 2 · Upgrade the question
把你刚问 AI 的问题,重写成一个更切中真问题的版本——练的是"识别真问题",这是 AI 替不了的判断起点。
Rewrite the question you just asked AI into a version that hits the real problem more squarely – trains "spotting the real problem," the judgment starting point AI cannot stand in for.
操 3 · 追到底层
Drill 3 · Chase to the bottom
对 AI 的每个答案连问三层"为什么",直到触到第一性原理或触到它的知识边界——练的是把识别层的"懂了"推进到机理层。
Ask "why" three layers deep on every AI answer, until you hit a first principle or its knowledge boundary – trains pushing recognition-layer "got it" down to the mechanism layer.
操 4 · 先建假设
Drill 4 · Hypothesis first
面对一个新领域,先自己列出"我猜关键问题是哪些",再让 AI 补全——练的是带着结构去探索,而非让 AI 替你定义问题空间。
Facing a new field, first list "what I guess the key questions are," then let AI complete it – trains exploring with structure rather than letting AI define the problem space for you.
The four drills share one underlying logic: what AI abundance weakens first is not the capacity to "answer" (that was already outsourced) but the capacity to "ask" – because when answers are at hand, a person gradually loses the habit of generating questions in the head first and judging which question is worth asking. The asking-muscle atrophies once idled, and it is precisely the front end of judgment and taste. These four drills are the deliberate load placed on that muscle.
工具五 · AI 止步线决策程序:把"该不该外包"变成可重复的判定
Tool 5 · The AI stop-line decision procedure: making "outsource or not" a repeatable verdict
前四件是练习,这一件是判定程序——把 FIG L.13 决策树固化成一段可重复执行的流程,让"该不该把这件事交给 AI"不再凭感觉,而有一套每次都能跑的步骤。它有五步,顺序不能乱:① 拆颗粒——先把要判定的能力拆到"可单独判定"的最小颗粒(案例一的教训:整团判定必然出错)。② 问可充裕——AI 能否充裕代劳且有可机检判据?否→暂缓,照常自学;是→进③。③ 问主权——若这项能力萎缩,你是否就再也无法判断 AI 在此事上对不对、也无法为后果负责?否→放心外包,把省下的精力还给止步线内的事;是→进④。④ 问基础——你是否已有足够基础能"成功响应"刻意保留的难度(Bjork 边界 R8)?否→先补基础或让 AI 当陪练;是→进⑤。⑤ 落处置——划入止步线:让 AI 当陪练而非代办,设合意困难(用 INSTRUMENT 14 把阻力调到合意带),建反思库,挂先行指标。这套程序的价值在于它可重复、可复盘:每次判定都留下"为什么这样判"的痕迹,三个月后你能回看自己的止步线是否需要调整——能力在长,止步线也该随之移动。
The first four are drills; this one is the verdict procedure – freezing the FIG L.13 decision tree into a repeatable run so that "should I hand this to AI" is no longer by feel but a set of steps you can run every time. It has five steps, order fixed: ① Split the grain – break the capacity under judgment to the smallest "separately judgeable" grain (Case 1's lesson: judging the whole lump necessarily errs). ② Ask abundance – can AI abundantly do it with a machine-checkable criterion? No → park, keep learning yourself; yes → go to ③. ③ Ask sovereignty – if this capacity atrophies, would you no longer be able to judge whether AI is right about it nor be accountable for the consequences? No → hand off freely, return the saved effort to what is inside the stop-line; yes → go to ④. ④ Ask base – do you already have enough base to "respond successfully" to retained difficulty (Bjork's boundary, R8)? No → build the base first or let AI spar; yes → go to ⑤. ⑤ Land the disposition – draw it inside the stop-line: let AI spar rather than do-for-you, add desirable difficulty (use INSTRUMENT 14 to dial friction to the band), build a reflection log, hang leading indicators. The procedure's value is that it is repeatable and reviewable: each verdict leaves a trace of "why I judged this way," so three months on you can revisit whether your stop-line needs adjusting – as capacity grows, the stop-line should move with it.
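Steps ② to ⑤ of the procedure reduce to three ordered yes/no questions, which makes them easy to sketch as a function that also records why each verdict was reached. The names `Disposition`, `Verdict`, and `decide` are illustrative assumptions; step ① (splitting the grain) is treated as already done, so the input is a single grain.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    PARK = "park: keep learning it yourself"
    HAND_OFF = "hand off freely"
    BUILD_BASE = "build the base first, or let AI spar"
    STOP_LINE = "inside the stop-line: AI spars, add desirable difficulty"


@dataclass
class Verdict:
    grain: str                # the smallest separately judgeable capacity (step 1)
    disposition: Disposition
    reason: str               # the trace that makes the verdict reviewable later


def decide(grain: str, ai_abundant_and_checkable: bool,
           atrophy_costs_sovereignty: bool, has_base: bool) -> Verdict:
    """Run steps 2-5 in fixed order; the first 'no' ends the procedure."""
    if not ai_abundant_and_checkable:   # step 2: ask abundance
        return Verdict(grain, Disposition.PARK, "step 2: no abundant, checkable AI")
    if not atrophy_costs_sovereignty:   # step 3: ask sovereignty
        return Verdict(grain, Disposition.HAND_OFF, "step 3: atrophy costs no sovereignty")
    if not has_base:                    # step 4: ask base (Bjork's boundary)
        return Verdict(grain, Disposition.BUILD_BASE, "step 4: base not yet sufficient")
    return Verdict(grain, Disposition.STOP_LINE, "step 5: all three answers are yes")


# Hypothetical grain, for illustration only.
print(decide("reviewing a query plan", True, True, False))
```

Keeping the returned `Verdict` objects in a dated list gives the three-month review something concrete to re-run as capacity grows.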
INSTRUMENT 14 · 交出还是保留 · 现场判定器 HAND-OFF OR KEEP · LIVE DECIDER
INSTRUMENT 14 turns Tool 5's decision procedure into one live pass: answer three questions in order and the instrument returns a disposition (hand off freely / park and self-learn / draw inside the stop-line); when the result is "inside the stop-line," it lets you dial the friction to the band right there, a step that embeds INSTRUMENT 13's dial. It fuses "hand off vs keep" and "dial friction just right" into a single verdict, mirroring the FIG L.13 tree.
Q1 · AI 能充裕代劳、且有可机检判据吗?
Q1 · Can AI abundantly do it, with a machine-checkable criterion?
Q2 · 萎缩它,你会失去判断/兜底主权吗?
Q2 · If it atrophies, do you lose sovereignty to judge / backstop?
Q3 · 你已有足够基础响应难度吗?(Bjork 边界)
Q3 · Do you have enough base to respond to difficulty? (Bjork's boundary)
划入止步线了——把阻力调到合意带:
Inside the stop-line – dial the friction to the band:
Load-bearing claim: AI's core promise is to accelerate everything. But one class of learning process is valuable precisely because it takes time – memory consolidation runs on spacing and sleep, physical time that AI cannot compress. This is not a philosophical slogan but a hard core with both neural and behavioral evidence (the evidence ledger). Take it as an axiom: scaffolds should be designed along this time constant, not against it.
"加速一切"在这里撞上一堵物理墙。记忆从短期变成长期的巩固过程需要时间,且需要睡眠:间隔效应里 24 小时间隔胜过 15 分钟间隔[R14];睡眠期(SWS/REM)大脑主动回放、重激活记忆,单是一段睡眠就能在零额外练习下提升技能表现(Stickgold 2006;Diekelmann & Born 2010;Walker et al. 2003,Ⅱ)。睡眠对记忆的泛化与迁移尤其关键——而迁移正是"能做"的检验标准(SHEET 02)。
"Accelerate everything" hits a wall of physics here. The consolidation by which memory turns from short- to long-term takes time, and needs sleep: in the spacing effect a 24-hour gap beats a 15-minute gap [R14]; during sleep (SWS/REM) the brain actively replays and reactivates memory, and a single bout of sleep can improve skill performance with zero extra practice (Stickgold 2006; Diekelmann & Born 2010; Walker et al. 2003; evidence grade II). Sleep is especially crucial for the generalization and transfer of memory – and transfer is the test standard of "doing" (SHEET 02).
旧误读 · 学习即传输
Old misreading · learning as transmission
既然信息能瞬时传输,学习的"慢"被当成纯摩擦——等待、重复、睡一觉,全是待优化的延迟。于是 AI 时代的诱惑是:把一切压成一次性灌输,今晚学完今晚就会。
Since information transmits instantly, learning's "slowness" gets treated as pure friction – waiting, repetition, sleeping on it, all delays to be optimized away. So the temptation of the AI era is: compress everything into a one-shot infusion, learn it tonight and know it tonight.
新 · 原理
New · principle
巩固所需的脑状态(间隔、睡眠)是物理时间,不可被 AI 压缩。"慢"不是摩擦,是一部分价值来源。处方随之改变:脚手架内建间隔与睡眠节律(SHEET 09 的"下次复看日期"就是这条的工程化),把时间当盟友,不当敌人。
The brain states consolidation needs (spacing, sleep) are physical time and cannot be compressed by AI. "Slow" is not friction but a source of value. The prescription shifts: scaffolds build in spacing and sleep rhythm (SHEET 09's "next review date" is the engineering of this), treating time as ally, not enemy.
There is also a "speed backfire" corroboration, a reminder not to take "fast" as the default good. METR 2025's RCT: 16 senior open-source developers, 246 real issues; allowing AI made them 19% slower – while they predicted 24% faster and still believed they were 20% faster afterward, subjective diverging from objective, with the slowdown sharper on their most familiar tasks. Caveat: small N, a possible learning curve, cannot be extrapolated to "AI always slows experts" (grade III corroboration, not a settled finding). Its use is to puncture the default assumption of "accelerate everything," not to prove the inverse "AI is always slower."
速度公理怎样反过来约束整卷的处方
How the speed axiom constrains the whole volume's prescription
速度公理不是一张孤立的 SHEET,它给整卷的处方加了一条约束:凡是声称"让学习更快"的方案,都要先过一关——它压缩掉的,是不是巩固所必需的物理时间?这条约束让本卷与市面上绝大多数"AI 高效学习"方案分道扬镳。那些方案的卖点几乎都是压缩时间:一晚速通一门课、十分钟掌握一个概念、把一周的内容塞进一次会话。速度公理直接判定这类承诺在"能做"层是结构性地不可能的——你可以瞬时获得信息(知道),但你无法瞬时完成巩固(能做),因为后者要走一段你睡着时才发生、且 AI 进不去的离线加工。于是本卷的处方在每个落点都反向设计:脚手架内建间隔而非集中(SHEET 09 的复看日期),检验信号取隔期表现而非当堂表现(SHEET 02/08),合意困难刻意让当下变慢(SHEET 05)。这些设计放在一起,共同服从一条母约束:顺着时间常数,而不是对抗它。认清这一点,就明白为什么本卷敢说"有些过程的价值正在于慢"——慢不是本卷的审美偏好,是巩固这件事的物理参数,任何无视它的加速方案,都在拿长期留存换当下的流畅感。
The speed axiom is not an isolated SHEET; it adds a constraint to the whole volume's prescription: any scheme claiming to "make learning faster" must first pass one gate – does what it compresses include the physical time consolidation requires? This constraint parts the volume from the vast majority of "efficient AI learning" schemes on the market. Their selling point is almost always compressing time: cram a course in one night, master a concept in ten minutes, stuff a week's content into one session. The speed axiom directly rules such promises structurally impossible at the "doing" layer – you can obtain information instantly (knowing), but you cannot complete consolidation instantly (doing), because the latter runs through offline processing that happens while you sleep and that AI cannot enter. So the volume's prescription is designed in reverse at every landing point: scaffolds build in spacing not massing (SHEET 09's review dates), test signals take lagged not in-session performance (SHEET 02/08), desirable difficulty deliberately slows the present (SHEET 05). Put together, these designs all obey one master constraint: go with the time constant, not against it. Grasp this and you see why the volume dares to say "some processes are valuable precisely because they are slow" – slow is not an aesthetic preference but a physical parameter of consolidation, and any acceleration scheme that ignores it is trading long-term retention for present-tense fluency.
睡眠不是停机,是离线的主动重训
Sleep is not downtime but active offline retraining
"慢有价值"里最容易被略过、却最硬的一块,是睡眠的角色。直觉把睡眠当成学习的间歇——什么都没发生的停机时间。神经科学给出相反的图景:睡眠是记忆主动加工的高峰期,不是空档。慢波睡眠期间,海马把白天获得的记忆痕迹反复重激活、回放,逐步转交给新皮层做长期存储;这个过程不是被动衰减,是有方向的重训(Stickgold 2006;Diekelmann & Born 2010,Ⅱ)。最惊人的证据是:在一项技能任务里,单纯睡一觉,零额外练习,表现就能提升——大脑在你睡着时替你把白天没练顺的东西又练了一遍。这对学习方法论有一个直接而反直觉的推论:把学习压缩进一次性的熬夜速通,不只是累,是主动切断了记忆从脆弱转向稳定的那条必经通道。AI 能把信息瞬时送达,但它送不来那一夜的离线巩固[R15]。所以本卷的处方里,"睡够、把学习摊到多天"不是养生建议,是和间隔效应同源的硬约束——脚手架(SHEET 09 的间隔复看)正是顺着这条睡眠-巩固的时间常数设计的。
The hardest, most-skipped piece of "slow has value" is the role of sleep. Intuition treats sleep as a gap in learning – downtime where nothing happens. Neuroscience gives the opposite picture: sleep is a peak of active memory processing, not a void. During slow-wave sleep the hippocampus repeatedly reactivates and replays the day's memory traces, gradually handing them to the neocortex for long-term storage; this is not passive decay but directed retraining (Stickgold 2006; Diekelmann & Born 2010; evidence grade II). The most striking evidence: on a skill task, merely sleeping, with zero extra practice, improves performance – while you slept, the brain reran what you had not yet smoothed out by day. This yields a direct, counter-intuitive corollary for the methodology: cramming learning into a one-shot all-nighter is not just tiring but actively severs the obligatory passage by which memory turns from fragile to stable. AI can deliver information instantly, but it cannot deliver that night's offline consolidation [R15]. So in the volume's prescription, "sleep enough, spread learning over days" is not wellness advice but a hard constraint cognate with the spacing effect – the scaffold (SHEET 09's spaced review) is designed precisely along this sleep-consolidation time constant.
主观与客观的背离:为什么"感觉更快"最不可信
The subjective-objective gap: why "felt faster" is least trustworthy
METR 2025 那个 RCT 里最值得反复讨论的,不是"慢了 19%"这个数,而是主观与客观的方向相反:16 名资深开发者预测用 AI 会快 24%,实际慢了 19%,做完之后仍然以为自己快了 20%。三个数排在一起,画出一条令人不安的背离——人对"自己是否变快/变好"的内省,在 AI 介入后系统性地失准。这条背离和合意困难的元认知陷阱是同一个现象的两个场景:那里是"当下流畅被误读为已掌握",这里是"用 AI 的顺手被误读为更高效"。共同的根是:大脑用'省力感'来代理'效果',而 AI 恰好把省力感和效果解耦了——它能让过程非常省力,却不保证(甚至反向于)真实产出或真实留存。口径必须固定:METR 是 Ⅲ 级辅证,N 小、可能含学习曲线、且是编程任务,不能外推成"AI 永远拖慢专家";它的唯一用途是校正"省力 = 高效"这个默认等式。一旦这个等式被校正,"凭感觉判断学习效果"这件事就彻底不可靠了——这正是 SHEET 08 仪表盘必须偏向滞后、无 AI 在场指标的根本原因。
What most deserves repeated discussion in METR 2025's RCT is not the "19% slower" number but that subjective and objective point opposite ways: 16 senior developers predicted 24% faster with AI, were actually 19% slower, and still believed afterward they had been 20% faster. The three numbers together draw an unsettling divergence – people's introspection on "am I faster/better" goes systematically wrong once AI enters. This divergence and the metacognitive trap of desirable difficulty are two scenes of one phenomenon: there, "present fluency misread as mastery"; here, "the smoothness of using AI misread as efficiency." The shared root: the brain proxies 'effect' by 'felt ease,' and AI happens to decouple felt ease from effect – it can make the process extremely effortless without guaranteeing (or even while inverting) real output or real retention. The caveat must be pinned down: METR is grade III corroboration, small N, possibly a learning curve, and a coding task; it cannot be extrapolated to "AI always slows experts." Its only use is to puncture the default equation "effortless = efficient." Once that equation is punctured, "judging learning by feel" becomes wholly unreliable – the root reason the SHEET 08 dashboard must lean on lagged, AI-absent indicators.
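The size of the divergence is easy to make explicit. The sketch below uses only the three figures quoted above; reading "X% faster / slower" as a signed change in completion time, in percentage points, is an assumption of this example, and `perception_gap` is a hypothetical helper name.

```python
def perception_gap(measured: int, reported: int) -> dict[str, object]:
    """Gap between a measured and a self-reported change, in percentage points.

    Convention (an assumption of this sketch): negative = faster, positive = slower.
    """
    return {
        "gap_points": measured - reported,
        "opposite_direction": measured * reported < 0,  # signs disagree
    }


predicted = -24       # forecast before the tasks: 24% faster
measured = 19         # observed: 19% slower
believed_after = -20  # belief after the tasks: 20% faster

print(perception_gap(measured, predicted))
print(perception_gap(measured, believed_after))
```

The gap barely narrows after the experience itself (43 points before, 39 after), which is the reason the text gives for not trusting "felt faster" as a signal.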
巩固是生理过程,不是效率偏好
Consolidation is a physiological process, not an efficiency preference
"慢有价值"很容易被听成一句怀旧的价值观,必须把它钉回生理学,它才站得住。记忆从海马依赖的脆弱痕迹,转为新皮层稳定表征的系统巩固,是一个需要离线时间、且高度依赖睡眠的物理过程:慢波睡眠(SWS)期间,白天的记忆痕迹被反复重激活与回放,是这个转移的承重机制(Stickgold 2006;Diekelmann & Born 2010;Rasch & Born 2013,Ⅱ)。这不是"最好留点时间消化"的软建议,而是没有那段离线时间,长期表征就不形成的硬约束。间隔效应是同一枚硬币的另一面:24 小时间隔胜过 15 分钟间隔,因为拉开的复习给了每一次提取一个"几乎遗忘、再被找回"的窗口,而正是这个再巩固的窗口在加固痕迹。AI 能把信息的传输压到瞬时,但传输不是巩固——它压不掉那段必须发生在你大脑里、且大半发生在你睡着时的离线加工。
"Slow has value" is easily heard as a nostalgic value judgment; it only holds once nailed back to physiology. The turn of memory from fragile hippocampus-dependent traces into stable neocortical representations – systems consolidation – is a physical process that needs offline time and depends heavily on sleep: during slow-wave sleep (SWS), the day's memory traces are repeatedly reactivated and replayed, the load-bearing mechanism of this transfer (Stickgold 2006; Diekelmann & Born 2010; Rasch & Born 2013; evidence grade II). This is not the soft advice "best to leave time to digest" but the hard constraint that without that offline time, the long-term representation does not form. The spacing effect is the other side of the same coin: a 24-hour gap beats a 15-minute one because spaced review gives each retrieval an "almost-forgotten, then recovered" window, and it is that reconsolidation window that hardens the trace. AI can compress the transmission of information to instant, but transmission is not consolidation – it cannot compress the offline processing that must happen inside your brain, and mostly while you sleep.
这把"慢"从修辞升级为方法论公理:脚手架不该对抗这条时间常数,而要顺着它设计。SHEET 09 反思库的"下次复看日期"就是这条公理的工程化——它不是任务管理,是把间隔效应硬编码进流程。同样,一个把整门课压成一晚速通的学习计划,不管 AI 讲得多顺,都在和生理学对着干:它优化了当晚的流畅感,牺牲了三周后的留存。本卷的处方因此多了一句:把时间当盟友编进流程,而不是当延迟优化掉。
This upgrades "slow" from rhetoric to a methodological axiom: the scaffold should not fight this time constant but be designed along it. SHEET 09's reflection-log "next review date" is the engineering of this axiom – not task management but the spacing effect hard-coded into the process. Likewise, a study plan that crams a whole course into one all-nighter, however smoothly AI explains it, is working against physiology: it optimizes that night's fluency and sacrifices retention three weeks out. The volume's prescription therefore gains a clause: build time into the process as an ally, not optimize it away as latency.
检验信号
Test signal
隔一夜/隔几天再测的留存与迁移,而非当堂表现——若你只看"刚学完会不会",你会系统性地误判(合意困难的元认知陷阱)。一个顺着时间常数设计的学习流,当下感觉更费劲,隔期表现更好;反过来就是在用速度偷换留存。
Retention and transfer tested after a night or a few days, not in-session performance – if you only check "can I do it right after learning," you will systematically misjudge (the metacognitive trap of desirable difficulty). A learning flow designed along the time constant feels harder now and performs better later; the reverse is trading retention for speed.
LEARN
15
FAILURE · 失败模式
FAILURE
失败 · 误用方式 + 第二仪器
Failure · Anti-patterns + 2nd instrument
AI-Native 学习最常见的误用方式
How AI-Native Learning Most Often Goes Wrong
承重命题:这一卷有一组反复出现的误用方式——它们不是"没用 AI",恰恰是"用了 AI、却用反了方向"。逐一点名,配自查,再落到第二件仪器:一道单项快速判定,回答"这件事到底该不该让 AI 做"。失败模式不是为了制造恐惧,而是把前面各张的边界反过来说一遍,让你认得出自己正在滑向哪条。
Load-bearing claim: this volume has a recurring set of failure modes – not "didn't use AI" but precisely "used AI, in the wrong direction." Name each, pair it with a self-check, then land on the second instrument: a quick single-item test answering "should AI do this at all." Failure modes are not scare tactics but the boundaries of the earlier sheets said in reverse, so you can recognize which one you are sliding toward.
本卷作者最该防的误用方式:把萎缩写成已证
The author's own failure to guard against: writing atrophy as proven
最后一种误用方式,矛头是对内的——它针对的是写这一卷的人,以及任何被它说服、进而想替它辩护的人。一个唱反调的命题天然带着一种诱惑:为了让警示更有力,把"认知可能萎缩"悄悄写成"认知已经萎缩"、把"有据的担忧"夸成"AI 已经造成认知能力下降"。这条线一旦越过,整卷的可信度就会崩塌,原因有两层。其一,证据根本不支持这种判决:本卷反复摆明,萎缩侧的证据全是相关、自报、短期、小样本,最强的因果证据反而指向正面,且零纵向数据(SHEET 10 的双账本就是为固定这一点而建)。把软证据当硬结论用,是学术上的失实。其二,更微妙的是修辞上的反噬:危言耸听会让真正该被认真对待的担忧,被读者连同那份夸张一起丢掉——你越是把话说满,越没人信你那条本来站得住的核心提醒。所以本卷把立场固定在 B 档:有据的警告,加一个写明证伪条件的赌注,一步都不越过证据。这一条之所以放在失败模式的最后,是因为它是元层级的——前五种是读者会犯的,这一种是作者必须先在自己身上防住的。一卷讲认知诚实的书,第一个要诚实对待的,是它自己证据的软硬。
The last failure mode points inward – at whoever writes this volume, and at anyone persuaded by it who then wants to defend it. A contrarian claim carries a natural temptation: to make the warning stronger, quietly write "cognition may atrophy" as "cognition has atrophied," and inflate "an evidence-grounded concern" into "AI makes you stupid." Cross that line and the whole volume's credibility collapses, for two reasons. First, the evidence simply does not support such a verdict: the volume repeatedly shows that the atrophy-side evidence is all correlational, self-report, short-term, small-sample, with the strongest causal evidence pointing positive and zero longitudinal data (the SHEET 10 dual ledger exists to pin exactly this). Using soft evidence as a hard conclusion is academic misrepresentation. Second, more subtly, the rhetorical backfire: scaremongering makes readers discard the concern that genuinely deserves seriousness along with the exaggeration – the more you overstate, the fewer believe the core reminder that was actually defensible. So the volume pins its stance firmly at grade B: an evidence-grounded warning plus a bet with its falsification condition spelled out, never stepping past the evidence. This item sits last among the failure modes because it is meta-level – the first five are ones the reader commits; this one the author must first guard against in themselves. A book about cognitive honesty must, first of all, be honest about the hardness of its own evidence.
两个最容易被忽视的误用方式:指标表演与过度抵抗
Two most-overlooked failures: gaming the metrics and white-knuckling
六种误用方式里,有两种特别隐蔽,因为它们都伪装成"在认真执行方法论"。第一种是指标的 pointsification:把 SHEET 08 的仪表盘做成自我考核的分数或排行,于是开始为指标表演——多反驳几次 AI 来抬高 override rate、多记几条错题来充数。这恰好掉进设计卷反复警告的 Pointsification 反模式:用分数替换了被测量的真实行为,指标一旦成了目标就不再是好指标(古德哈特定律)。仪表盘的正确用法是烟雾报警器——响了去查,不是攒积分。自查:我是在用信号校准自己,还是在做指标表演?第二种是不合意的困难:把"抵抗便利"做成自虐,在毫无基础的领域强行不用 AI,结果只是低效的挫败。这违反 Bjork 的明确边界——困难只对有基础能成功响应的学习者才"合意",否则它就只是绊脚石(FIG L.2 的右端)。抵抗是有度的工程,不是道德苦行:在你已有根基、撤除测试还撑得住的地方加阻力;在你还没根基的地方,老实用 AI 当脚手架,等根基长起来再撤。自查:这点阻力让我长出能力,还是只让我停滞?
Of the six failure modes, two are especially insidious because both disguise themselves as "diligently executing the methodology." The first is pointsification of the metrics: turning the SHEET 08 dashboard into a self-grading score or leaderboard, then performing for the metric – overriding AI a few extra times to lift override rate, logging filler errors to pad the count. This falls squarely into the Pointsification anti-pattern the design volume keeps warning about: replacing the measured real behavior with a score, and once an indicator becomes the target it stops being a good indicator (Goodhart's law). The dashboard's correct use is a smoke alarm – investigate when it sounds, do not farm points. Self-check: am I using the signal to calibrate myself, or performing for the metric? The second is undesirable difficulty: turning "resist convenience" into self-flagellation, refusing AI in a domain with no foundation, yielding only inefficient frustration. This violates Bjork's explicit boundary – difficulty is "desirable" only for learners with enough base to respond successfully, otherwise it is just an obstacle (FIG L.2's right end). Resistance is measured engineering, not moral asceticism: add friction where you already have grounding and the removal test still holds; where you have none, honestly use AI as a scaffold and withdraw it once the base has grown. Self-check: does this friction grow capability, or just stall me?
包装化练习:为什么"学得有趣"会反噬
Chocolate-covered broccoli: why "make it fun" backfires
这个教育游戏化比喻值得拆开,因为它对应 AI 学习产品最常见、也最难自觉的误区。原说法指的是:把枯燥练习包上一层游戏化包装(积分、动画、拟人讲解),以为这样就让学习变好了。失败在于——包装没有改变核心活动本身,只是用外部奖励暂时遮住它;包装消退后,真正需要完成的练习仍然没有发生。AI 时代的变体更隐蔽:把"让 AI 把知识讲得过度顺滑、强共鸣、低负担"误当成把学习变好了。但 SHEET 02 已经证明,学习的承重在犯错-纠正循环,不在讲解的顺滑度。顺滑的讲解作用在"知道"层(识别、被动接收),它确实让那一层更舒服——可那一层 AI 早已充裕化,舒不舒服都不再是瓶颈。真正的练习(亲手试错、遇到阻力、自纠)不仅没有被加强,反而被顺滑的讲解挤掉了:越觉得顺滑,越没在练那个贵的层。所以这个误用方式的危险不在"无效",在"它制造了强烈的'我在学'的错觉",而错觉正是最难自查的。
This metaphor is worth unpacking, because it is the pit AI learning products most often fall into and least notice. The original refers to a failure mode in educational gamification: coating a dull drill (the broccoli) in a layer of game-y sugar (points, animation, anthropomorphic narration), assuming this makes learning fun. The failure: the coating never changed the core activity's dullness, only masked it with an external reward; once the sugar dissolves, the broccoli still goes uneaten. The AI-era variant is subtler: mistaking "have AI explain the knowledge ultra-smoothly, with great resonance, effortlessly" for having made learning better. But SHEET 02 already proved learning's load-bearing part is the error-correction loop, not the smoothness of explanation. Smooth explanation acts on the "knowing" layer (recognition, passive reception) and does make that layer more comfortable – but AI made that layer abundant long ago; comfortable or not, it is no longer the bottleneck. The real broccoli (trying and erring by hand, getting stuck, self-correcting) is not coated but crowded out by the smooth explanation: the more pleasant it feels, the less you practiced the expensive layer. So the danger of this failure mode is not "ineffective" but that "it manufactures a strong illusion of 'I am learning'" – and the illusion is the hardest thing to self-check.
六种误用方式,是前面六张承重的镜像
Six ways it goes wrong, mirroring six earlier claims
这一张不是新内容,是把前面各张的承重命题反过来说一遍——每一种失败模式都精确对应一张 SHEET 的边界被越过。这样排列有个用处:当你怀疑自己正在误用方法论时,可以顺着失败模式倒查回那张 SHEET 的处方。它们也不是彼此孤立的六个问题,而是共享一个根:把"AI 做得了"误读成"该让 AI 做"——便利陷阱是这个根的总名,其余五种是它在不同环节的变体。包装化练习是它发生在"核心活动"环节,pointsification 是它发生在"监测"环节,空中楼阁是它发生在"元能力"环节,不合意的困难是抵抗用力过猛的反向失手,把萎缩当已证则是本卷作者自己最该防的立场失手。逐一点名,是为了让你认得出自己正滑向哪一种。
This sheet is not new content but the earlier load-bearing claims said in reverse – each failure mode maps precisely to one SHEET's boundary being crossed. Arranging them this way has a use: when you suspect you are going wrong, you can trace a failure mode back to that SHEET's prescription. Nor are they six isolated pits; they share one root: misreading "AI can do it" as "AI should do it." The convenience trap is the umbrella name for that root, and the other five are its variants at different stages. Chocolate-covered broccoli is the root surfacing at the "core activity" stage, pointsification at the "monitoring" stage, and the castle in the air at the "meta-skill" stage; undesirable difficulty is resistance overshooting in the opposite direction; and treating atrophy as proven is the stance misstep the volume's own author must most guard against. Each is named so you can recognize which one you are sliding toward.
便利陷阱(头号)——把"AI 做得了"误当成"该让 AI 做",在高可充裕 × 高不可外包那一格悄悄交出判断/品味/深度思考(SHEET 04/07 的危险区)。自查:你还记得不用 AI 时怎么做这件事吗?记不清就触线了。
The convenience trap (the prime one) – mistaking "AI can do it" for "AI should do it," quietly handing over judgment/taste/deep thinking in the high-abundance × high-un-outsourceability cell (the danger zone of SHEET 04/07). Self-check: do you still remember how to do this without AI? If not, you have crossed the line.
包装化练习的反向使用——以为"让 AI 把知识讲得更有趣/更省力"就是学习。可学习的承重在犯错-纠正循环,不在讲解的顺滑。把核心活动(亲手试错)外包掉、只留包装(被讲解得很顺),等于留下了学习的表层体验,却丢掉了真正的练习。自查:这一小时里我产出了多少,还是只接收了很多?
Chocolate-covered broccoli, eaten upside down – believing that "AI makes the knowledge more fun / more effortless" is learning. But learning's load-bearing part is the error-correction loop, not the smoothness of explanation. Outsourcing the core activity (trying and erring by hand) and keeping only the wrapper (being explained to pleasantly) is throwing out the broccoli and eating the coating. Self-check: in this hour, how much did I produce versus merely receive?
指标的 pointsification——把 SHEET 08 的仪表盘做成自我考核的分数/排行,于是开始为指标表演(多推翻几次 AI 显得"有主导权"),而非真在监测。指标是烟雾报警器,不是 KPI。自查:我是在用信号校准,还是在刷信号?
Pointsification of the metrics – turning the SHEET 08 dashboard into a self-grading score/leaderboard, then performing for the metric (overriding AI a few extra times to look "in command") instead of genuinely monitoring. The indicators are a smoke alarm, not a KPI. Self-check: am I calibrating with the signal, or gaming it?
Undesirable difficulty – turning "resist convenience" into self-flagellation: refusing AI in a domain where you have no foundation, yielding only inefficient frustration. Bjork's boundary: difficulty is "desirable" only for those with enough background to respond successfully. Resistance must be measured (SHEET 04). Self-check: does this friction grow capability, or just jam me?
空中楼阁的元能力——以为"提问/质疑/整合"能脱离具体"能做"独立训练。你质疑不了一个你毫无根基领域里的 AI 输出(SHEET 03)。自查:我的质疑命中率在升还是在降?降,多半是根基空了。
Castle-in-the-air meta-skills – believing "asking/challenging/integrating" can be trained in isolation from concrete "knowing how." You cannot challenge an AI output in a domain where you have no grounding (SHEET 03). Self-check: is my challenge hit-rate rising or falling? Falling usually means the grounding has emptied out.
Treating atrophy as proven (a stance misstep) – the failure mode this volume itself must most guard against: for warning force, writing "may atrophy" as "already dumber." The evidence does not support it (SHEET 10) and it backfires on credibility. Self-check: am I stating "an evidence-grounded warning plus a falsifiable bet," or have I overshot the evidence into a verdict?
INSTRUMENT 12 · 该不该让 AI 做 · 单项判定 THE DON'T-OUTSOURCE TEST
The SHEET 07 cognitive audit is a coordinate-style global scan; this one is a quick single-item test – for one capacity you are considering outsourcing, tick each line and get an "outsourcing-safety score" plus a one-line verdict. Every question maps to an earlier sheet's load-bearing claim: the more you tick, the more it belongs inside the stop-line.
LEARN
16
SPECULATION · 推演幕
SPECULATION
推论 · 外推,非事实
Inference · Extrapolation, not fact
2026→2032:认知主权的可能性空间
2026 to 2032: The Possibility Space of Cognitive Sovereignty
Load-bearing claim (speculation · on the exploration ledger): this volume's open question – does offloading let deep thinking atrophy – will not be settled within five years. So this sheet draws no single curve but opens a possibility space: two high-impact, high-uncertainty axes cross into four 2030 worlds, each carrying leading indicators and falsification conditions. A fictional artifact from one such future makes the speculation tangible, and the bet that argues most strongly against this volume is recorded honestly. Only speculation that can be falsified is worth speculating.
Nature of this chapter · Inference What follows is extrapolation from the public trajectory of 2024–2026, not a statement of fact. It rests on the grade-III↓ evidence of SHEET 04/10 (correlational, self-report, short-term, zero longitudinal), so the whole chapter places bets, never verdicts. The moment the first multi-year longitudinal datasets appear and point a direction, this chapter should be the first rewritten.
Zoom the time scale out: writing (c. 3200 BCE), the printing press (1440), the search engine (1998) each cut the cost of obtaining "knowing" by an order of magnitude, and each came with the same anxiety – Socrates in the Phaedrus feared writing would atrophy memory [R16]. He was not wholly wrong: rote memory did give way. But each time, the freed cognition moved up to higher-order work. AI is the steepest drop on this curve, but it adds an unprecedented feature – it outsources not just storage (like writing) but reasoning and generation themselves. Socrates's bet lost the previous five times (atrophy never overwhelmed redistribution); the whole tension of this volume is that this time what is outsourced sits closer to "thinking" itself, so the extrapolation weight of the historical induction must take its own discount.
三条正在汇流的力量,每条都给出证伪信号
Three converging forces, each with its falsification signal
Speculation is not daydreaming. Three observable forces are pushing learning toward the same crossroads at once – not forecasts but curves already in motion, each carrying "what observation would kill it." Writing the falsification condition beside each is what lets this speculation be falsified by the future, rather than be an essay that is always right and therefore carries no information.
力量一 · 摩擦归零的工具默认
Force 1 · The friction-to-zero tool default
2026→2032
每一代学习工具(IDE 补全、答题、写作、辅导 agent)的默认进化方向都是更顺、更省力、更自动。到 2030,"先自己想一遍"在工具流里会需要刻意绕路才做得到。
Every generation of learning tool (IDE completion, answer-bots, writing, tutoring agents) defaults to smoother, easier, more automatic. By 2030, "think it through yourself first" will require a deliberate detour inside the toolflow.
已在动 · In motion
2024–2026 一手可观测,无需推演。
First-hand observable in 2024–2026, no speculation needed.
证伪 · Falsified if
若主流工具开始把"合意困难"作为默认卖点(如默认开启延迟提示、闭卷模式),摩擦归零的单向趋势即被推翻。
If mainstream tools start shipping "desirable difficulty" as a default selling point (delay-prompts, closed-book mode on by default), the one-way slide to zero friction is overturned.
力量二 · 纵向数据终于到场
Force 2 · The longitudinal data finally arrives
2026→2032
今天关于卸载的证据全是横断/短期(SHEET 10)。第一批"重度 AI 协作者三到五年后在无 AI 迁移任务上的表现"的纵向研究,大概率在 2028–2030 出结果——它会把头号悬案从赌注推向某个方向。
Today's offloading evidence is all cross-sectional/short-term (SHEET 10). The first longitudinal studies of "how heavy AI-collaborators perform on AI-absent transfer tasks after three to five years" will most likely report in 2028–2030 – moving the open question from a bet toward a direction.
推论态 · Inferred
研究在跑,结论未出——本曲线推演权重中等。
Studies are running, conclusions are not in – medium speculative weight.
证伪 · Falsified if
若到 2032 仍无任何方法学过关的多年期纵向研究发表,则"数据将裁决"这条本身落空,本卷只能继续下赌注。
If by 2032 no methodologically sound multi-year longitudinal study has been published, "the data will adjudicate" itself fails, and this volume can only keep betting.
力量三 · 教育机构开始定价"无 AI 能力"
Force 3 · Institutions start pricing "AI-absent capability"
2026→2032
当任何人借 AI 都能产出合格作业,"在 AI 缺席下还能做"的能力变成稀缺的可定价信号。预期 2027 起回潮的闭卷/口试/现场演示评估,本质是机构在给"撤除后仍能独立运转"的能力重新标价。
Once anyone with AI can produce passable work, "can still do it with AI absent" becomes a scarce, priceable signal. The closed-book / oral / live-demo assessments expected to resurge from 2027 are, at root, institutions re-pricing the "still operates independently after removal" capacity.
早期信号 · Early signal
2024–2025 已有大学回退手写考试,零星但方向一致。
In 2024–2025 some universities already reverted to handwritten exams – sparse but directionally consistent.
证伪 · Falsified if
若主流认证转向"人机协作产出"为唯一评估口径、且不再单独考核无 AI 能力,则这条力量反向,"撤除测试"失去制度支撑。
If mainstream credentialing shifts to "human-AI joint output" as the sole assessment and stops testing AI-absent capability separately, this force reverses and the "removal test" loses institutional support.
The three forces mark the boundaries, but which world 2030 lands in turns on two high-impact and still highly uncertain axes. The horizontal: how the open question is adjudicated – whether offloading proves closer to atrophy or closer to redistribution (SHEET 04's two rival hypotheses, which no one yet has longitudinal data to settle). The vertical: how individuals and institutions respond to convenience – whether sovereign learners who design friction and guard the removal capacity prevail, or dependent learners who slide down the zero-friction default prevail. The axes cross into four worlds, each tagged with the leading indicators sliding toward it and the observation that would falsify it. This is the GBN two-axis scenario method [R20] – it predicts no single outcome; it maps the whole possibility space so you can recognize which cell you are sliding toward.
四个未来怎么读:横轴是头号悬案最终怎么裁决(左=萎缩被坐实,右=只是再分配),纵轴是学习者整体的姿态(上=主动抵抗的主权学习者占上风,下=顺默认下滑的依赖学习者占上风)。四象限里,无声的空心化(左下)最坏:能力真在退、人却因元认知错觉浑然不觉;升级的常态(右上)最好。注意一个不对称——本卷主张的"抵抗便利"把你推向上半区,而上半区在两种裁决下都不亏(萎缩世界里它救命,再分配世界里它正好是把资源往高阶引导的动作)。这正是 SHEET 04 那个"保险"论证的图形版:在裁决未出的窗口期,往上半区站是无悔的下注。
The four futures: the x-axis is how the open question is finally adjudicated (left = atrophy confirmed, right = mere redistribution); the y-axis is the learner population's posture (up = sovereign learners who actively resist prevail, down = dependent learners who slide down the default prevail). Among the four, the Quiet Hollowing (bottom-left) is worst: capability really decays yet people, via the metacognitive illusion, do not notice; the Upgraded Default (top-right) is best. Note an asymmetry – the "resist convenience" this volume prescribes pushes you into the upper half, and the upper half loses nothing under either verdict (in the atrophy world it saves you; in the redistribution world it is exactly the move that steers resources toward higher-order work). This is the graphical form of SHEET 04's "insurance" argument: during the window before the verdict, standing in the upper half is a no-regret bet.
一件来自 2031 的虚构文物,让推演可触
A fictional 2031 artifact, to make the speculation tangible
只由断言构成的推演显得抽象。下面这件是设计虚构:一份明确虚构的未来文物,把"机构开始给无 AI 能力定价"这条力量投射到 2031 的具体一页上。它不是预测,是把本卷的赌注做成一个你能拿在手里的东西。
Speculation made only of assertions feels abstract. The piece below is design fiction: an explicitly fictional future artifact projecting the "institutions price AI-absent capability" force onto one concrete 2031 page. It is not a prediction; it makes the volume's bet into something you can hold.
CS 247: Algorithm Design (the AI-absent core block)
课程结构
本课分两段。协作段(70%):鼓励全程使用 AI——这是你毕业后真实的工作方式。无 AI 内核段(30%):在监考、断网、纸笔条件下完成;这一段不是怀旧,是给你的"撤除后仍能独立运转"的能力建立可被雇主与认证机构信任的记录。
Course structure
This course splits in two. Collaboration block (70%): AI use is encouraged throughout – this is how you will actually work after graduation. AI-absent core block (30%): completed proctored, offline, with pen and paper; this block is not nostalgia but builds a record of your "still operates independently after removal" capacity that employers and credentialers can trust.
合意困难,明码标价
内核段刻意保留三类难度:闭卷推导、延迟提示(卡住 15 分钟才解锁参考)、交错复习。课程评估中明确告知:这些摩擦会让你当下更难受,但它们正是这门课不可被 AI 代劳的部分——也是你学位里唯一无法被外包者复制的信号。
Desirable difficulty, openly priced
The core block deliberately keeps three kinds of difficulty: closed-book derivation, delayed hints (references unlock only after 15 minutes stuck), and interleaved review. The rubric states plainly: this friction will make the present harder, but it is exactly the part of this course that cannot be done by AI – and the only signal in your degree that an outsourcer cannot copy.
证书标注
成绩单分列两个分数:协作能力与无 AI 能力。用人方可分别查询。(2031 起本州 12 所大学采用同一双分制。)
Transcript notation
The transcript lists two scores: collaboration capability and AI-absent capability, separately queryable by employers. (From 2031, 12 universities in this state adopted the same dual-score scheme.)
「我们不再假装 AI 不存在,也不再假装它无关紧要。我们把课程拆成两半,分别给两种能力定价——因为 2031 年的就业市场已经这样定价了。」——课程说明页
"We no longer pretend AI does not exist, nor that it does not matter. We split the course in two and price the two capacities separately – because the 2031 job market already prices them that way." – course description page
反赌注:最该反对本卷的那个论证
The counter-bet: the argument that most opposes this volume
一卷诚实的推演必须记下反对自己的最强赌注,否则它只是在自我确认。本卷的核心命题是"长期卸载深度思考可能侵蚀深度思考本身,所以要买抵抗这份保险"。最有力的反驳不是"萎缩不会发生",而是一个更釜底抽薪的论证——本卷可能把"被外包的能力"和"值得保留的能力"错误地等同了。延伸心智(Clark & Chalmers 1998)的强版本主张:认知一直是人与工具的耦合系统,没有哪一代人保有过"纯粹的、未被外包的"思考;我们今天珍视的"独立推理",本身就是被书写、印刷、计算器、搜索塑造的产物。如果这成立,那么本卷守护的"撤除后仍能独立运转"可能是个伪目标——就像要求一个现代数学家在没有符号记法的条件下证明定理,那不是更纯粹的能力,只是更低效的折磨。更尖锐的版本来自再分配假说的乐观读法:如果 AI 把人的认知系统性地从可卸载任务释放到判断/品味/提问,那么花力气"抵抗便利、保留低阶手工能力"反而是逆历史潮流的内卷,把本该上移的认知资源浪费在了机器已经做得更好的层面。本卷不认为这个反赌注已经赢——但它有可能赢,且赢的条件是清晰的:见下方证伪。把它写在这里,是因为一卷讲认知诚实的书,必须对自己的核心命题也执行同一套诚实。
An honest speculation must record the strongest bet against itself, or it is merely self-confirming. The volume's core claim is "outsourcing deep thinking for the long run may erode deep thinking itself, so buy resistance as insurance." The most powerful rebuttal is not "atrophy won't happen" but a more foundational argument – the volume may be wrongly equating "the capacity that gets outsourced" with "the capacity worth keeping." The strong version of the extended mind (Clark & Chalmers 1998) holds that cognition has always been a coupled person-tool system; no generation ever held "pure, un-outsourced" thinking; the "independent reasoning" we prize today is itself a product shaped by writing, print, calculators, search. If that holds, then the "still operates independently after removal" the volume guards may be a false target – like demanding a modern mathematician prove a theorem with no symbolic notation: not a purer capacity, just a less efficient torment. A sharper version comes from the optimistic reading of the redistribution hypothesis: if AI systematically frees human cognition from offloadable tasks toward judgment / taste / questioning, then spending effort to "resist convenience and preserve low-order manual skills" is counter-historical busywork, wasting on a layer machines already do better the cognitive resources that should have moved up. The volume does not think this counter-bet has already won – but it could win, and the conditions under which it wins are clear: see the falsification below. It is written here because a book about cognitive honesty must apply the same honesty to its own core claim.
反赌注的证伪 / 兑现条件
When the counter-bet wins or loses
反赌注赢(本卷该退让)的条件:一项纵向、随机研究显示,重度 AI 协作者在无 AI 迁移任务上不退步、甚至因再分配而进步——则"抵抗"是逆潮流内卷,本卷②步的防御性主张应被削弱。反赌注输(本卷站得住)的条件:同类研究显示无 AI 能力随依赖时长系统性下降,且该下降不被高阶能力的提升抵消。关键在于:这两个世界今天无人有数据分辨——正因如此,在裁决前买一份两头都不亏的保险,仍是稳健的下注。本卷与反赌注的真正分歧不在价值观,在一个经验问题:撤除 AI 后,独立产出随时间是降还是升。
The counter-bet wins (the volume should yield) if: a longitudinal, randomized study shows heavy AI-collaborators do not regress on AI-absent transfer tasks – or even improve via redistribution – then "resistance" is counter-historical busywork and this volume's defensive ② claim should weaken. The counter-bet loses (the volume holds) if: such studies show AI-absent capability falls systematically with dependence duration, and that fall is not offset by gains in higher-order capacity. The crux: today no one has the data to tell these two worlds apart – which is exactly why, before the verdict, buying insurance that loses in neither remains the robust bet. The volume's real disagreement with the counter-bet is not about values but about one empirical question: with AI removed, does independent output fall or rise over time.
LEARN
17
LANDING · 落地
LANDING
落地 · 收束与起步
Landing · Close & start
抵抗便利的学习者操作系统
The Learner's Operating System for Resisting Convenience
Load-bearing claim (the close): collapse the whole volume into operable principles + signals + a starting path, landing on the INSTRUMENT. The last layer gives no static answer – it uses a dynamic three-part split: what's invariant, what's shifting, what's still on the frontier and unsettled.
这个收尾本身,就是方法论的一次演示
This close is itself a demonstration of the methodology
It is worth naming the last layer's self-referential structure: the volume closes with a "dynamic three-part split" rather than a definite checklist, and this form itself demonstrates its content. What the volume teaches throughout is – facing a domain of uneven evidence states, the right move is to grade by evidence and act on each cell with force matched to its hardness, not to flatten all claims to equal certainty. The close does exactly that: the invariant cell (desirable difficulty, the testing effect, sleep consolidation) is grade-II evidence, so it gives a definite prescription to act on now; the frontier cell (does atrophy really happen) is grade-III↓, so it gives only a monitoring posture, no conclusion. If the volume suddenly handed over a "just follow this" universal checklist at the end, it would violate every discipline it just built across sixteen sheets – disguising an open question as solved. So this seemingly "unsatisfying" close is the final redemption of the volume's honesty: a book about cognitive sovereignty cannot, on its last page, make the reader's judgment for them. It hands judgment back to you, along with all the coordinates needed to read it – which is exactly "become a better question-asker" landed on the act of reading itself.
四条原理,配六个能自己读的信号
Four principles, paired with six signals you can read yourself
操作系统不只给原则,还要给反馈回路——否则你不知道原则有没有在起作用。把全卷的检验信号收成一组,每条都设计成个人尺度可读、且偏向滞后/无 AI 在场(避开即时流畅的元认知陷阱)。升 = 好的四条:提问质量(你提的问题越来越切中真问题);质疑 AI 的命中率(你反驳 AI 且反驳对的频率在升);迁移测试通过率(撤掉 AI、换新情境你还做得动);反思库回流使用率(你真的回去重做自己的错题,而非只攒着)。降 = 好的两条反指标:答案召回在学习时间里的占比("秒查"挤掉"走循环"的比例在降);以及"不用 AI 我还会吗"答不上来的频次(在降)。这六条不是 KPI,是仪表盘——它们一起读才有意义,单看任何一条都会被即时感受带偏(SHEET 08 的悖论)。一个健康的学习者,会看到前四条缓慢爬升、后两条缓慢回落,而不是任何一条短期飙高。
An operating system gives not only principles but feedback loops – otherwise you cannot tell whether the principles are working. Gather the volume's test signals into one set, each designed to be readable at the individual scale and biased toward lagged / AI-absent (dodging the metacognitive trap of immediate fluency). Four where rising = good: question quality (your questions increasingly hit real problems); AI-challenge hit-rate (the frequency of pushing back on AI, correctly, rises); transfer-test pass rate (with AI removed and the situation new, you can still do it); reflection-log return-use rate (you actually go back and redo your own errors rather than just hoarding). Two counter-indicators where falling = good: the share of answer-recall in your learning time (the proportion where "instant lookup" crowds out "running the loop" falls); and the frequency of failing the "could I still do this without AI" check (falls). These six are not KPIs but a dashboard – they only mean something read together, and any single one alone is skewed by immediate feeling (the SHEET 08 paradox). A healthy learner sees the first four slowly climb and the last two slowly recede, not any single one spike in the short term.
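One way to make the "read them together" rule concrete is a small script that tracks the six signals and raises alarms instead of computing a score. The sketch below is illustrative only: the signal keys, the assumption that each signal is logged as a value between 0 and 1, the half-versus-half drift measure, and the 0.3 spike threshold are all hypothetical choices made for this example, not part of the volume.

```python
from statistics import mean

# Direction each signal should drift for a healthy learner, per the text:
# four rise, two fall. The snake_case keys are hypothetical names.
HEALTHY_DIRECTION = {
    "question_quality": +1,
    "challenge_hit_rate": +1,
    "transfer_pass_rate": +1,
    "reflection_return_use": +1,
    "answer_recall_share": -1,
    "cant_do_without_ai_freq": -1,
}


def drift(series):
    """Later-half mean minus earlier-half mean (an assumed, crude trend measure).

    Expects at least two readings.
    """
    half = len(series) // 2
    return mean(series[half:]) - mean(series[:half])


def read_dashboard(history, spike_jump=0.3):
    """Return a list of alarms to investigate, never a score to maximize."""
    alarms = []
    for name, direction in HEALTHY_DIRECTION.items():
        change = drift(history[name])
        if change * direction < 0:
            # Signal is moving against its healthy direction.
            alarms.append(f"{name}: drifting the wrong way")
        elif abs(change) > spike_jump:
            # A sudden jump in the "good" direction is suspect: healthy
            # signals move slowly, so this may be metric farming.
            alarms.append(f"{name}: spiked, check for metric farming")
    return alarms


if __name__ == "__main__":
    example = {
        "question_quality": [0.40, 0.42, 0.45, 0.47],
        "challenge_hit_rate": [0.30, 0.30, 0.90, 0.90],
        "transfer_pass_rate": [0.60, 0.60, 0.50, 0.40],
        "reflection_return_use": [0.20, 0.20, 0.25, 0.25],
        "answer_recall_share": [0.50, 0.50, 0.45, 0.40],
        "cant_do_without_ai_freq": [0.30, 0.30, 0.30, 0.35],
    }
    for alarm in read_dashboard(example):
        print(alarm)
```

The function returns things to investigate rather than a single number, in keeping with the smoke-alarm framing: a fast-rising signal is flagged just as a falling one is.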
认知主权:这一卷到底在守什么
Cognitive sovereignty: what this volume actually guards
把全卷收成一个词,是认知主权——指一个人在与 AI 协作中,仍然保有"自己做判断、自己识别真问题、自己裁决好坏"的能力与位置。这个词把本卷的反调从一种情绪("担心 AI 让人变懒")提升成一个有结构的目标。主权不是排斥工具:一个有主权的学习者大量用 AI,但他始终是发号施令、并能验证命令是否被正确执行的那一方;他外包执行,不外包判断。主权也不是一劳永逸:它需要被持续守护,因为侵蚀它的力量(便利的默认引力)每天都在场。前面十六张 SHEET 是守护它的具体工程——机理告诉你它为什么会被侵蚀(成本剪刀差),断裂点告诉你最危险的侵蚀在哪(深度思考的外包),脚手架与流向规约告诉你怎么用工程而非意志去守,仪表盘告诉你怎么在侵蚀发生前看见它,止步线告诉你哪条边界绝不能退。这一卷之所以在系列里不可省,是因为其余几卷描绘的那个"人回归于意义、人做判断与品味"的未来,预设了人还具备一个未被萎缩的认知主体——而守护那个主体,就是学习方法论存在的全部理由。当执行变得充裕,最稀缺、最该被刻意守护的,是那个还能判断"这一切是为了什么"的人。
Collapse the whole volume into one word and it is cognitive sovereignty – a person retaining, in collaboration with AI, the capacity and the seat to "make their own judgments, spot real problems themselves, adjudicate good from bad themselves." This word lifts the volume's dissent from an emotion ("worried AI makes people lazy") into a structured goal. Sovereignty is not rejecting tools: a sovereign learner uses AI heavily, but is always the one giving orders and able to verify the orders were carried out correctly; they outsource execution, not judgment. Nor is sovereignty achieved once and for all: it must be continuously guarded, because the force that erodes it (convenience's default gravity) is present every day. The preceding sixteen sheets are the concrete engineering of guarding it – the mechanics tell you why it gets eroded (the cost scissors), the fracture point tells you where the most dangerous erosion is (outsourcing deep thinking), the scaffold and flow rules tell you how to guard it with engineering rather than willpower, the dashboard tells you how to see erosion before it happens, the stop-line tells you which boundary must never be ceded. This volume is indispensable in the series because the future the others depict – "people return to meaning, people do judgment and taste" – presupposes that people still possess an un-atrophied cognitive subject, and guarding that subject is the entire reason a learning methodology exists. When execution becomes abundant, the scarcest thing, the thing most deserving deliberate guarding, is the person who can still judge "what all this is for."
起步只有一步:先做一次撤除演练
The starting path is one step: run a removal drill first
收尾最忌给一长串待办,让人无从下手。这一卷的起步路径刻意压成一步,且这一步本身就是全卷方法论的缩影:挑一项你已经高度依赖 AI 的认知任务,做一次撤除演练——合上 AI,在无它在场的条件下从头做一遍,记下你卡在哪。这一步同时干了三件事。它运行了一次 INSTRUMENT 11/12 的判定(这项能力到底落在哪个象限、该不该留在止步线内);它产出了你 N=1 纵向研究的第一个数据点(SHEET 08 的迁移信号,基线就此建立);它生成了你错题反思库的第一条记录(SHEET 09,记的不是答案,是"撤掉 AI 后我卡在哪、为什么")。一步落地,三层脚手架同时起步。卡得越狠的地方,越是被悄悄外包、却本不该外包的能力——那里就是你该重建犯错-纠正循环与合意困难的第一处工地。
A close should never hand over a long to-do list that leaves you with no foothold. This volume compresses the starting path to one step, and that step is itself a miniature of the whole methodology: pick one cognitive task you already lean on AI for heavily, and run a removal drill – close the AI tool, do the task from scratch with AI absent, and note where you get stuck. This one step does three things at once. It runs an INSTRUMENT 11/12 verdict (which quadrant this capacity actually sits in, whether it belongs inside the stop-line); it produces the first data point of your N=1 longitudinal study (the SHEET 08 transfer signal, establishing a baseline); and it generates the first entry of your error-reflection log (SHEET 09 – recording not the answer but "where I got stuck with AI removed, and why"). One landed step, three scaffold layers started at once. Where you get stuck worst is most likely a capacity that was quietly outsourced but should not have been; that is the first construction site for rebuilding its error-correction loop and desirable difficulty.
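The drill's outputs can be captured in one small record. This is a sketch under stated assumptions: the class and field names (`RemovalDrill`, `StuckPoint`, `completed_unaided`) are hypothetical, and the volume does not prescribe any storage format. It covers the transfer data point and the reflection-log entries; the quadrant verdict is left to the instruments.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class StuckPoint:
    where: str  # the step at which you stalled with AI removed
    why: str    # your own diagnosis of the stall, not the answer


@dataclass
class RemovalDrill:
    task: str
    run_on: date
    completed_unaided: bool
    stuck_points: List[StuckPoint] = field(default_factory=list)

    def transfer_data_point(self):
        """One baseline reading for the AI-absent transfer signal."""
        return {
            "date": self.run_on.isoformat(),
            "task": self.task,
            "passed": self.completed_unaided,
        }

    def reflection_entries(self):
        """One log line per stuck point: where and why, never the answer."""
        return [
            f"{self.task}: stuck at {p.where} because {p.why}"
            for p in self.stuck_points
        ]


if __name__ == "__main__":
    drill = RemovalDrill(
        task="write a binary search",
        run_on=date(2026, 1, 5),
        completed_unaided=False,
        stuck_points=[StuckPoint("loop bounds", "never derived the invariant myself")],
    )
    print(drill.transfer_data_point())
    print(drill.reflection_entries())
```

Repeating the drill on later dates and appending each `transfer_data_point()` to a list gives the N=1 series a place to accumulate.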
为什么收尾不给静态答案
Why the close refuses a static answer
大多数方法论在收尾处给一份确定的清单——照做即可。本卷不能,而且这个"不能"本身是它最诚实的立场。原因在 SHEET 04 的头号悬案:认知是否会因长期外包而萎缩,证据尚未定案。一份假装确定的收尾清单会犯本卷自己点名的最后一种误用方式——把未决悬案抬成定论。所以这一卷用动态三分收束,而不是静态答案:把全部命题按证据状态分进三个会随时间移动的格子。不变那一格(合意困难、测试效应、睡眠巩固)是数十年可复现、不依赖任何 AI 研究的硬地基,模型再强也不会动它——这是你可以现在就照做、且不会过时的部分。在变那一格("知道"的获取成本)正在持续下移,学习目标随之上游,你要做的是跟着它调整,而非锚定在某个旧目标上。前沿那一格(萎缩是否真发生)是一个会被未来数据改写的赌注,你对它的正确姿态是挂先行指标、持续监测,而不是现在就站队——这一格的完整推演(2030 的四个可能世界、各自的证伪条件、以及最该反对本卷的那个反赌注)就是 SHEET 16 推演幕。
Most methodologies end with a definite checklist – just follow it. This volume cannot, and that "cannot" is itself its most honest stance. The reason is SHEET 04's open question: whether cognition atrophies under long-term outsourcing is not yet settled. A closing checklist faking certainty would commit the last failure mode the volume itself names – elevating an open question into a settled one. So this volume closes with a dynamic three-part split rather than a static answer: sorting every claim by evidence state into three cells that move over time. The invariant cell (desirable difficulty, the testing effect, sleep consolidation) is a decades-replicable hard floor depending on no AI study; no stronger model moves it – this is the part you can act on now and that will not date. The shifting cell (the cost of obtaining "knowing") keeps moving down, the learning goal moving upstream with it; your job is to adjust along with it, not anchor to an old goal. The frontier cell (does atrophy really happen) is a bet future data will rewrite; the right posture toward it is to hang leading indicators and keep monitoring, not to take sides now – the full speculation of this cell (the four possible 2030 worlds, each falsification condition, and the counter-bet that most argues against this volume) is SHEET 16, the Speculation Act.
This closing structure is itself a demonstration of the volume's methodology: facing a domain where evidence is unsettled, the responsible move is not to fake an answer but to separate the known, the shifting, and the open, and to act on each cell with force matched to its evidence state. This is also why it can join the rest of the series without overreaching: it does not supply answers at the org, education, or policy scale; it only gets right the individual-cognition matter of "what to guard, how to monitor the open risk," then explicitly hands the boundary back.
原理一 · Principle 1
先想后问
Think before you ask
先自己产出假设,再让 AI 校验/补全,不空手求助——保住提问/质疑的元能力(SHEET 03)。
Produce your own hypothesis first, then let AI verify/complete it; never ask empty-handed – preserving the asking/challenging meta-skills (SHEET 03).
原理二 · Principle 2
保留犯错-纠正循环
Keep the error-correction loop
错了先自纠再看 AI——承重的是循环结构,不是答案(SHEET 02)。
When wrong, self-correct before consulting AI – what bears weight is the loop, not the answer (SHEET 02).
原理三 · Principle 3
建认知脚手架
Build the cognitive scaffold
反思库 / 知识库人机同源、可 diff,内建合意困难(SHEET 05)。
A same-source, diffable reflection/knowledge base with desirable difficulty built in (SHEET 05).
第四条原理单独点出,因为它是全卷的题眼:划 AI 止步线——明确哪些能力刻意不外包(SHEET 06)。配套信号:提问质量↑ / 质疑 AI 的命中率↑ / 迁移测试通过率↑ / 反思库回流使用率↑ / "答案召回"在学习时间中的占比↓ / 主动设阻力的习惯化。起步路径只有一步:先做一次"外包 vs 内化"认知体检(下方 INSTRUMENT),标出已被悄悄外包、却本不该外包的能力,对其重建犯错-纠正循环与合意困难。
The fourth principle is called out on its own because it is the volume's keystone: draw the AI stop-line – make explicit which capacities you deliberately do not outsource (SHEET 06). Companion signals: question quality up / hit-rate of challenging AI up / transfer-test pass rate up / reflection-log return-use up / share of "answer recall" in learning time down / habituation of adding friction on purpose. The starting path is one step: run an "offload vs internalize" cognitive audit (the INSTRUMENT below), mark the capacities quietly outsourced but that should not have been, and rebuild their error-correction loop and desirable difficulty.
INSTRUMENT 11 · 外包 vs 内化 · 认知体检 OFFLOAD-VS-INTERNALIZE AUDIT
为一项学习/认知任务打两轴:X · 可充裕度(AI 能多大程度替你做到)× Y · 不可外包度(这项能力萎缩了,你会不会损失认知主导权 / 它是不是下游赖以运转的人类底座)。两轴张成四象限——其中一格是本卷最反直觉的便利陷阱。切两轴看你落在哪,以及该怎么处置。
Score a learning/cognitive task on two axes: X · abundance-ability (how far AI can do it for you) × Y · un-outsourceability (if this capacity atrophies, do you lose cognitive command / is it the human bedrock the downstream runs on). The two axes span four quadrants – one of which is this volume's most counter-intuitive cell, the convenience trap. Toggle both axes to see where you land and what to do.
X · 可充裕度
X · Abundance-ability
Y · 不可外包度
Y · Un-outsourceability
核心内化区
Core Internalization
低可充裕 × 高不可外包
Low abundance × high un-outsourceability
⚠ 便利陷阱
⚠ The Convenience Trap
高可充裕 × 高不可外包
High abundance × high un-outsourceability
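The two-axis audit reduces to a small classification rule. The following is a minimal sketch, assuming both axes are scored from 0 to 1 and split at 0.5 (both are assumptions of this sketch, as is the function name). Only the two quadrant names given in the text are used; the remaining two cells are labeled by their axis positions rather than given invented names.

```python
def audit_quadrant(abundance, un_outsourceability, cut=0.5):
    """Place one cognitive task on the offload-vs-internalize grid.

    abundance: how far AI can do the task for you (0..1, assumed scale).
    un_outsourceability: how much cognitive command you lose if the
        capacity atrophies (0..1, assumed scale).
    """
    high_x = abundance >= cut
    high_y = un_outsourceability >= cut
    if high_x and high_y:
        # AI can do it, yet handing it over hollows out judgment.
        return "convenience trap"
    if high_y:
        return "core internalization"
    if high_x:
        return "high abundance x low un-outsourceability"
    return "low abundance x low un-outsourceability"


if __name__ == "__main__":
    # Hypothetical scores for three example tasks.
    print(audit_quadrant(0.9, 0.8))
    print(audit_quadrant(0.2, 0.9))
    print(audit_quadrant(0.9, 0.1))
```

A hard cut at 0.5 is the crudest possible choice; scores near the boundary deserve a second look rather than a mechanical verdict.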
The first two instruments decide "outsource or not"; this one answers "how hard to set resistance." Desirable difficulty is an inverted U (FIG L.2) – too loose leaves no trace, too tight only frustrates, the peak in the middle band. Turn the dial to see the scaffold posture each notch implies, and how much present effort saved it trades for long-term retention. Mind Bjork's boundary: difficulty is desirable only for those with enough base to respond successfully.
当下省力 · present ease
长期留存 · long-term retention
THE LAST LAYER · 最后一层,不给静态答案
The Last Layer – No Static Answer
不变 · 硬
Invariant · hard
理解 ≠ 信息;慢有其价值
Understanding ≠ information; slow has value
合意困难、测试效应(Bjork;Roediger & Karpicke),睡眠/间隔巩固——数十年可复现、不依赖 AI。这是全卷最硬的地基,不会因模型更强而变。
Desirable difficulty, the testing effect (Bjork; Roediger & Karpicke), sleep/spacing consolidation – decades-replicable, AI-independent. The volume's hardest floor; it does not move because models get stronger.
在变
Shifting
"知道"无成本获取
"Knowing" is costless to obtain
陈述性知识、讲解、示范随取随到且可个性化。学习目标随之上游到提问/质疑/整合的元能力——这一层正在移动,且还会继续移。
Declarative knowledge, explanations, demonstrations on demand and personalizable. The goal moves upstream to the asking/challenging/integrating meta-skills – this layer is moving, and will keep moving.
前沿 · 未决
Frontier · open
认知萎缩是否真发生
Does atrophy really happen
头号悬案。证据仅相关/短期,最强因果反而正向(+0.73~1.3 SD),无任何多年期纵向数据。本卷下的是赌注不是判决——证伪条件见 SHEET 04。
The open question. Evidence is only correlational/short-term, the strongest causal evidence is positive (+0.73 to 1.3 SD), and there is no multi-year longitudinal data. This volume places a bet, not a verdict – falsification condition in SHEET 04.
The first seventeen sheets argue why abundant answers are not learning and what to guard; this piece runs the guarding for you. It does not "design a school": it builds a runnable learning protocol for one person internalizing one capacity. Give it a capacity you want to learn and it first runs a scope gate with three outcomes: greenfield (from zero), transformation (of a quietly hollowed-out skill), or out-of-scope (org-scale training, judged honestly as "another book" rather than faked as universal coverage). It then produces four real artifacts: a scaffold with desirable difficulty built into the toolflow, an explicit offload-boundary (what the learner does by hand vs. what the agent does), a reflection store with real fields and a write-back edge, and a dashboard that measures capability, not throughput. The counter-intuitive heart of this surface: abundant lookup, answers, and explanations are exactly the temptation it must guard against – on the surface of cognition, "having it done for you" changes the one it is done for.
# 在 Claude Code 里调用 · invoke inside Claude Code
$ /skill ai-native-learning
> "帮我用 AI 学会 X,又不被它替我思考:……"
> "help me learn X with AI without it doing the thinking for me: ..."
→ 范围闸 · 绿地 / 转化 / 出域
→ scope gate · greenfield / transformation / out-of-scope
→ 交与不交边界 + 脚手架 + 反思库 + 仪表盘
→ offload-boundary + scaffold + reflection store + dashboard
→ 一份 AI-Native 学习协议
→ one AI-Native Learning Protocol
What this is · the learning executable companion
The architecture-layer architect designs the organization; the six companion pieces are one per surface, one kernel, mutually coupled, with no fixed reading entry – this is the piece that makes the learning surface runnable. Its sharpest judgment node is also the hardest stop-line in the whole system: which difficulty is desirable and must stay with the human. If the agent does the struggle for you, the human does not learn – offloading the desirable difficulty destroys the very capability the activity exists to build.
SPEC.V / AI NATIVE METHODOLOGY / OWL METHODOLOGY SERIES
SCOPE / One methodology · the full organizational spectrum N=1 → N=many (from the one-person company to the agent network, on a single set of first principles)
SERIES / Six volumes, one kernel · this volume is one surface; the full wiring is above under "The Series."
APPENDIX · SOURCES / Evidence and citation registry. Grading key: Ⅰ peer-reviewed meta-analysis or multi-lab replication (hardest) · Ⅱ peer-reviewed controlled / imaging study · Ⅲ small-sample / un-reviewed / cross-sectional correlation (citations must read "correlates with," never "causes") · Ⅳ review article or practitioner first-hand account · Ⅴ philosophical text or extrapolation (an argument, not a fact). One discipline: the volume's core claim, "cognition atrophies," rests mostly on grade-Ⅲ correlational evidence today, so it is downgraded to a falsifiable bet, not a verdict.
The testing effect: active retrieval (self-testing) beats rereading for long-term retention, yet rereading looks better in the short term, so people systematically misjudge which one works; shown in controlled experiments and replicated for decades. AI maximizes the ease of the "reread" mode, landing squarely in that misjudgment trap.
R2
Ⅱ
Ericsson, Krampe & Tesch-Römer《The Role of Deliberate Practice in the Acquisition of Expert Performance》Psychological Review 100(3) 1993:363-406 · doi.org/10.1037/0033-295X.100.3.363
The load-bearing part of deliberate practice is the structured error-and-correction loop, not the hour count. Its "practice volume dominates" claim was later substantially shrunk by the R3 meta-analysis, so what a citation should carry is the loop structure, not "10,000 hours" (an overstated pop claim).
R3
Ⅰ
Macnamara, Hambrick & Oswald《Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions: A Meta-Analysis》Psychological Science 25(8) 2014:1608-1618 · doi.org/10.1177/0956797614535810
A pooled meta-analysis: deliberate practice explains only about 14% of performance variance overall (education 4%; professions <1%, non-significant). This is the counter-example the volume honestly includes, used to bound R2 against over-extrapolation.
R4
Ⅲ
Gerlich《AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking》Societies 15(1):6 · 2025 · doi.org/10.3390/soc15010006 (N=666, cross-sectional correlational)
AI use correlates significantly and negatively with critical thinking, mediated by cognitive offloading (total effect b≈-0.42); the author concedes that the design cannot show causation and that there is no longitudinal data. Correlational evidence for the atrophy hypothesis, explicitly flagged "correlation, not causation."
R5
Ⅲ
Kosmyna et al. (MIT Media Lab)《Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing》arXiv:2506.08872 · 2025 · arxiv.org/abs/2506.08872 (preprint, N=54 → only 18 by the 4th session, un-reviewed)
EEG shows "cognitive debt" accumulating with LLM-assisted writing; a tiny, un-peer-reviewed sample, graded Ⅲ. Cited as a short-term physiological signal for the atrophy hypothesis, not as a long-term causal conclusion.
R6
Ⅱ
Sparrow, Liu & Wegner《Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips》Science 333(6043) 2011:776-778 · doi.org/10.1126/science.1207745
The "Google effect": when information is expected to be re-accessible, people remember less of the content and more of "where to find it." Offloading changes the memory strategy; a mechanistic precedent for the atrophy-vs-reallocation debate.
R7
Ⅲ
Redistribution-hypothesis working paper · aiXiv:260215 · 2026 (preprint / working paper; citations must read "the model/hypothesis predicts")
The redistribution hypothesis: AI does not net-degrade cognitive capacity but reallocates cognitive resources from offloadable low-level tasks to higher-order work. It is the opposite pole to the atrophy hypothesis and equally lacks longitudinal data to adjudicate.
R8
Ⅱ
R. A. Bjork & E. L. Bjork《Desirable Difficulties in Theory and Practice》J. Applied Research in Memory and Cognition (JARMAC) 9(4) 2020:475-479 · doi.org/10.1016/j.jarmac.2020.09.003 (the "desirable difficulties" frame first appears in R. Bjork 1994, chapter in Memory)
Desirable difficulty: deliberately retained difficulties (spacing, retrieval, interleaving) improve long-term retention and transfer. The boundary (Bjork's caveat) is that they help only learners with enough base to respond successfully; otherwise they are merely "undesirable difficulty." Replicated for decades.
R9
Ⅱ
Maguire et al.《Navigation-Related Structural Change in the Hippocampi of Taxi Drivers》PNAS 97(8) 2000:4398-4403 · doi.org/10.1073/pnas.070039597
London taxi drivers' long-term spatial navigation correlates with a larger posterior hippocampus. The finding is often invoked to support "offloading causes atrophy," but this volume takes only its "use-it" side and pairs it with R10 to mark the direction as undetermined.
R10
Ⅲ
Dahmani & Bohbot《Habitual Use of GPS Negatively Impacts Spatial Memory During Self-Guided Navigation》Scientific Reports 10:6310 · 2020 · doi.org/10.1038/s41598-020-62877-0
Habitual GPS use correlates with less hippocampal grey matter; the direction is undetermined (does GPS cause atrophy, or do people with weaker hippocampi rely on GPS more?). Cross-sectional and correlational; on this basis the volume declines any causal "offloading causes atrophy" claim.
R11
Ⅲ
Sailer et al. · 2024 (a rare causal experiment: the LLM-assisted group shows lower cognitive load but worse argument quality)
Causal-grade evidence that "less effort ≠ better learning": subjective ease diverges from objective quality, supporting the volume's "deliberately retain difficulty" stance, while honestly noting sample and extrapolation bounds alongside R3/R10.
The extended mind: pen, paper, maps, and notebooks are already external brains, so cognitive offloading is in itself neutral. It argues that offloading is the cognitive boundary moving outward, not capacity draining away, which makes it a philosophical ally of the redistribution hypothesis (a philosophical argument, graded Ⅲ).
The authoritative review of cognitive offloading: it defines the concept of "outsourcing memory/computation/navigation to external tools," which is itself solid. The volume borrows its frame, while the long-term consequences of offloading remain governed by the individual evidence grades of R4-R11.
R14
Ⅰ
Cepeda, Pashler, Vul, Wixted & Rohrer《Distributed Practice in Verbal Recall Tasks: A Review and Quantitative Synthesis》Psychological Bulletin 132(3) 2006:354-380 · doi.org/10.1037/0033-2909.132.3.354
The spacing-effect meta-analysis: pooled across 254 studies, distributed practice significantly beats massed practice. Spacing is the hardest-evidenced branch of "desirable difficulty," and it underpins the speed axiom that consolidation needs physical time.
Memory consolidation depends on time and sleep (sleep-stage replay); controlled/imaging evidence, replicated across labs. This is a physical time constant AI cannot compress, so scaffolds must be designed along it, not against it.
R16
Ⅴ
Plato《Phaedrus》(Socrates on writing and memory, c. 370 BCE)
Each time the cost of acquiring "knowing that" is cut, the same anxiety recurs: Socrates feared writing would atrophy memory. A philosophical text (Ⅴ), cited as the historical origin of this anxiety curve, making no empirical claim.
R17
Ⅴ
Vygotsky《Mind in Society: The Development of Higher Psychological Processes》Harvard University Press 1978 (the theoretical source of the Zone of Proximal Development and scaffolding; compiled from 1930s Russian manuscripts)
The defining feature of a scaffold is that it is removable: support is gradually withdrawn as ability grows, until the learner stands alone. On this basis the volume judges whether AI is scaffold or crutch by whether it is designed to be removable (a classic theoretical frame, graded Ⅲ).
The less one "pushes back on AI, and correctly," the lower the self-reported confidence in independent reasoning (r≈-.61); a decline is the top warning sign. The survey describes itself as "descriptive, not supporting causation," hence graded Ⅲ and cited as a practical gauge for metacognitive monitoring.
R19
Ⅲ
METR《Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity》arXiv:2507.09089 · 2025-07 · arxiv.org/abs/2507.09089 · metr.org (strong RCT design, 16 senior maintainers, 246 real issues; arXiv preprint plus institutional report, un-reviewed, graded Ⅲ)
Allowing AI made developers 19% slower, yet they predicted they would be 24% faster and still felt faster afterward. Subjective and objective point in opposite directions: a gauge of "synthetic confidence" and a warning against taking "fast / less effort" as the default good.
R20
Ⅳ
Scenario planning (two-axis 2×2 / GBN): Pierre Wack《Scenarios: Uncharted Waters Ahead》HBR 1985-09 · hbr.org/1985/09; Peter Schwartz《The Art of the Long View》Doubleday/Currency 1991 (ISBN 978-0-385-26732-8; later co-founded Global Business Network)
The methodological footnote for the "four worlds" of the projection act: take the two most critical and most uncertain driving forces as the axes, spanning four quadrants and four scenarios. The aim is to widen perception, not to predict a single future (a classic methodology, Ⅱ, directly usable; the specific four scenarios it generates remain Ⅴ-grade extrapolation, since the method's reliability does not carry over to the scenario content).
R21
Ⅴ
Paulo Freire《Pedagogy of the Oppressed》Continuum 1970/2000 (ISBN 978-0-8264-1276-8; the "banking model of education" concept is from Chapter 2)
The "banking model of education," which treats knowledge as a deposit to be stored and the student as an empty account that receives it, is the naming source for the "knowledge transfer" metaphor this volume critiques. A philosophy-of-education text (Ⅴ), cited as the conceptual frame for the metaphor critique, making no empirical claim. Its sting is amplified under AI: if learning is mere storage, AI is the better container.
R22
Ⅳ
Donald T. Campbell《Assessing the Impact of Planned Social Change》Evaluation and Program Planning 2(1) 1979:67-90 · doi.org/10.1016/0149-7189(79)90048-X ("Campbell's law"; same lineage as Goodhart's law)
Campbell's law: the more a quantitative social indicator is used for decision-making, the more it is subject to corruption pressure and the more it distorts the social process it was meant to monitor. On this basis the volume argues that grades/credentials/standardized tests, as proxy signals of learning, distort structurally once AI cuts the cost of faking them to near zero (a classic social-science proposition, Ⅳ).
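The grading key at the top of this registry includes one mechanical rule (grade Ⅲ citations must read "correlates with," never "causes") and one framing rule (grade Ⅴ is an argument, not a fact). A minimal sketch of how such a rule could be checked automatically follows. The `Grade` enum, the `lint_claim` function, and both verb lists are assumptions made for illustration; the registry itself defines no tooling.

```python
import re
from enum import IntEnum
from typing import List


class Grade(IntEnum):
    I = 1    # peer-reviewed meta-analysis or multi-lab replication
    II = 2   # peer-reviewed controlled / imaging study
    III = 3  # small-sample / un-reviewed / cross-sectional correlation
    IV = 4   # review article or practitioner first-hand account
    V = 5    # philosophical text or extrapolation


# Hypothetical verb lists; a real checker would need a broader vocabulary.
CAUSAL = re.compile(r"\b(causes?|caused|leads? to|results? in)\b", re.IGNORECASE)
FACTUAL = re.compile(r"\b(shows?|proves?|demonstrates?)\b", re.IGNORECASE)


def lint_claim(sentence: str, grade: Grade) -> List[str]:
    """Return wording problems for a sentence citing a source of this grade."""
    problems = []
    if grade is Grade.III and CAUSAL.search(sentence):
        problems.append('grade III: write "correlates with", not a causal verb')
    if grade is Grade.V and FACTUAL.search(sentence):
        problems.append("grade V: an argument, not a fact")
    return problems


if __name__ == "__main__":
    print(lint_claim("AI use causes weaker critical thinking", Grade.III))
    print(lint_claim("AI use correlates with weaker critical thinking", Grade.III))
```

A keyword check like this only catches the crudest slips; it cannot tell whether a causal sentence is attributing the claim to someone else, so it is better read as a reminder of the discipline than as an enforcement tool.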
REV
DATE
DESCRIPTION
L.0
2026-06
First edition of the AI-Native Learning Methodology: the cognitive-sovereignty thesis · 12 argument diagrams · its own 22-source evidence registry (R1–R22), each load-bearing claim graded Ⅰ–Ⅴ; the atrophy claim honestly marked as correlational evidence and downgraded to a falsifiable bet