PART V / AI-NATIVE 学习 · 认知主权

PART V / AI-NATIVE LEARNING · THE COGNITIVE SOVEREIGNTY VOLUME

AI Native 学习方法论

AI Native Learning Methodology

其他几卷都在说"把执行交给 AI"；这一卷唱反调：有一部分内化，恰恰不能交出去。AI 把"知道"这件事的成本压到几乎为零之后，稀缺的东西换成了经历、反思、把学到的东西搬到新场景里的能力，不再是拿到信息本身。这些能力有一个共同的脾气：外包出去就萎缩，而它们又恰好是下游所有人本判断赖以运转的底子。组织卷讲"人回归于意义"，前提是这个人手里还有没被掏空的判断力；没有这条，那句话就是空的。我们不主张"AI 已经让人变笨了"——证据还撑不起这句话，说了也会反噬信任——立场是一个有据的警告，加一个愿意被证伪的赌注。

Every other volume in this series says to hand execution to AI; this one argues the opposite for a slice of the work: some internalization is exactly what must not be handed over. Once AI drives the cost of knowing toward zero, what’s scarce shifts from access to information toward experience, reflection, and the ability to carry what you learned into a new situation. Those capacities share one trait: outsource them and they atrophy, and they happen to be the ground every downstream human judgment stands on. The org volume’s promise that people return to meaning holds only if those people still have undamaged judgment to bring; without that, the promise is empty. Our claim is not that AI is already making people dumber; the evidence doesn’t support that yet, and overclaiming would cost us trust. It is a grounded warning plus a bet we’re willing to have proven wrong.

拿系列内核搬到认知这个面上，会多长出一层别的卷都没有的东西：防御性判断——不止判断该交什么，还要判断该刻意攥住什么不交。完整的四步推导在第 1 节；没读过组织卷，从这里读起也一样成立。

Specialize the series kernel onto cognition and it grows a layer none of the other volumes need: defensive judgment, deciding what to hand off and, just as deliberately, what to keep for yourself. The full four-step version is in Section 1; you don’t need the org volume to start here.

守护一旦外包就萎缩、却正是所有下游人本判断赖以运转的认知底座。

Guarding the cognitive bedrock that atrophies once outsourced yet on which every downstream human judgment runs.

AI-ENABLED LEARNING→AI-NATIVE LEARNING

答案

Answers

更快拿到解释和总结 → 先自答，再用 AI 校准缺口

Get explanations and summaries faster → Self-answer first, then use AI to calibrate gaps

困难

Difficulty

把阻力全部消掉 → 保留合意困难、反思库和迁移测试

Remove all friction → Keep desirable difficulty, reflection logs, and transfer tests

能力

Capability

知道答案 → 形成离开 AI 后仍能迁移的能力

Know the answer → Build ability that transfers after AI is removed

AI-NATIVE DOCUMENT PACK · PART V

学习文档包：守住不该外包的认知能力

Learning Pack: guarding cognition that must not be outsourced

这份文档包把学习从“更快获取答案”重新定向到内化、反思、迁移和 AI 止步线。

This pack redirects learning away from faster answers and toward internalization, reflection, transfer, and the AI stop-line.

Thesis

答案变充裕之后，学习稀缺的是内化与认知主权（谁在替你想）。

Once answers are abundant, what’s scarce in learning is internalization and cognitive sovereignty (who is doing your thinking).

AI-Native 学习不是更快拿到解释，是主动分清哪些能外包、哪些必须自己经历一遍。这是一种防御性判断：守住下游的品味、价值感和判断力，不让它们被悄悄掏空。

AI-Native learning deliberately separates what can be outsourced from what you must experience yourself, rather than promising faster explanations. It is defensive judgment, guarding downstream taste, value perception, and judgment against being hollowed out.

LEARN

CONCEPT · 概念

CONCEPT

定义 · 先划界

Definition

学习不是更快地获取信息

Learning Is Not Acquiring Information Faster

上周让 AI 写的那段代码，现在不看着它，你还复述得出它怎么工作吗？信息当时拿到了，能力却没长进来——这一章先把"拿到"和"学会"切成两件事。

That snippet you had AI write last week – without looking at it, can you still say how it works? You got the information at the time; the capability never grew. This chapter first cuts “getting it” from “learning it.”

一句话 · In one line

信息更容易得到，并不自动说明能力长进了。学习的关键问题是：哪些过程必须由学习者完成，哪些可以外包，以及怎样用迁移与撤除来检查答案。

Easier access to information does not automatically show that capability has grown. The key learning question is which processes a learner must complete, which can be outsourced, and how transfer and removal can check the answer.

把 AI 接进学习流程，检索、总结、讲解都会变快，这本身没问题。问题出在把"拿到解释"当成了"已经学会"。这里其实是两层东西："知道"是陈述性知识，获取成本正在往下掉；"能做"是能在新情境里判断和行动的能力，通常要靠犯错、反馈、迁移才能长出来。但具体到哪个任务、哪种辅助方式，这条边界并没有被简单划定死。

Bring AI into the learning loop and search, summaries, and explanations get faster; that is not the problem. The problem is treating “got the explanation” as “have learned it.” There are really two layers here: “knowing that” is declarative knowledge, and its cost of acquisition keeps falling; “knowing how” is the capacity to judge and act in a new situation, usually built through error, feedback, and transfer. But exactly where that line falls, for which task and with which kind of assistance, is not settled.

所以这一卷不打算教"怎么用 AI 学得更快"。它问一个更难、也更没有定论的问题：当 AI 随时能当外脑给答案，人创造和内化知识的过程本身，是不是正在被悄悄改写？这问的是"哪些认知能力正在被养成、被外包、还是被削弱"，不是下游几卷常问的"瓶颈搬到哪儿去了"——证据还没有定论，但这问题现在就得认真对待。

So this volume is not going to teach you how to learn faster with AI. It asks a harder, less settled question: once AI stands by as an external brain handing over answers, is the process by which a person creates and internalizes knowledge itself being quietly rewritten? That is a different question from the one the downstream volumes ask, which is where the bottleneck moved. This one asks which cognitive capacities are being built, which are being outsourced, and which are weakening. The evidence has not settled it, which is exactly why it needs to be taken seriously now.

学习边界 · Learning Boundary

“必须亲手做”是假设，不是道德命令。用撤除演练、迁移任务和延迟复测来检查：拿掉 AI 后，能力是否仍在；换一个情境后，能否迁移；有脚手架时，是否比无脚手架时学得更稳。若辅助能稳定提高这些结果，它应进入学习流程，而不是被浪漫化地排除。

“Must be done by hand” is not a moral command. It is a hypothesis. Check it with removal drills, transfer tasks, and delayed retests: does the capability remain without AI, transfer to a new setting, and grow more reliably with a scaffold than without one? If assistance repeatedly improves those outcomes, it belongs in the learning process rather than being romantically excluded.
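The three checks in the learning-boundary note can be sketched as a small decision rule. This is only an illustration under stated assumptions, not a validated instrument: the `DrillResult` fields, the 0-to-1 score scale, and the `margin` threshold are all hypothetical names and values introduced here, and a single comparison like this does not capture the "repeatedly" in the note.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DrillResult:
    """Scores in [0, 1] for one learner, all measured with AI removed (assumed scale)."""
    removal: float         # does the capability remain without AI?
    transfer: float        # does it carry over to a new setting?
    delayed_retest: float  # is it still there after a delay?


CHECKS = ("removal", "transfer", "delayed_retest")


def assistance_belongs(trained_with, trained_without, margin=0.05):
    """Return True only if the scaffolded group beats the unscaffolded group
    by at least `margin` on every one of the three checks."""
    for check in CHECKS:
        with_avg = mean(getattr(r, check) for r in trained_with)
        without_avg = mean(getattr(r, check) for r in trained_without)
        if with_avg < without_avg + margin:
            return False
    return True


scaffolded = [DrillResult(removal=0.8, transfer=0.7, delayed_retest=0.75)]
unscaffolded = [DrillResult(removal=0.6, transfer=0.5, delayed_retest=0.6)]
print(assistance_belongs(scaffolded, unscaffolded))
```

Requiring a win on all three checks, rather than on their average, is a deliberate design choice in this sketch: a scaffold that lifts immediate performance but fails the transfer check would not pass.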

这卷为什么必须唱反调，而不是补全拼图

Why this volume must dissent, not just complete the puzzle

系列里其他几卷有一个共同的调子：找到瓶颈挪到哪儿了，再把杠杆做大——都是在优化便利。学习卷要是也用这个调子写，就会变成一本"怎么用 AI 学得更快"的工具书，和市面上千篇一律的提示词技巧没什么两样，也对不起它在系列里的位置。

The rest of the series shares one register: find where the bottleneck moved, then enlarge the leverage there, all in the name of optimizing convenience. If the learning volume used that same register, it would collapse into a tool guide for learning faster with AI, indistinguishable from the prompt tricks everywhere, and it would waste its place in the series.

唱反调不是为了标新立异，是因为认知这个领域有一个别处没有的特点：被优化的便利，会反过来咬优化它的人。工程、组织、设计里把执行交出去，顶多是某次产出的质量打折，校验能兜住；学习里把思考交出去，赔进去的可能是那个还能思考的人本身——这个损失没有外部校验能兜，因为检测器和被检测的东西是同一个人。所以学习卷在系列里扮演的角色是刹车：别的卷都在踩油门，它是唯一有资格喊"等等"的那个。这不是反对 AI，是提醒整个系列：便利有代价，总得有一块地基，在加速的时候被人刻意守住。

Dissenting is not about being different for its own sake. Cognition has a feature none of the other domains do: optimized convenience bites back at the person it optimized. In engineering, org, and design, handing off execution costs you, at worst, the quality of some output, and verification catches that. In learning, handing off thinking can cost you the very person who could still think, and no external check catches that loss, because the detector and the thing being detected are the same person. So the learning volume plays the brake in this series: while the others are flooring the gas, it is the one entitled to say wait. None of this is a stance against AI. It is a reminder to the whole series that convenience has a cost, and that one foundation has to be deliberately held onto while everything else accelerates.

体验、反思、迁移：三件无法被代劳的事

Experience, reflection, transfer: the three that can’t be done for you

承重的判断里点了三样东西，信息一旦充裕就会显出稀缺：体验、反思、迁移。我们目前的判断是：它们结构上没法外包，不是“AI 现在还做不好、换个更强的模型就行”。这判断本身是一注、可被改判——它最强的反方（所谓“独立”认知从来就是被书写、工具一路塑造出来的）第 16 节正面接住；先把凭什么这么押说清楚。体验是第一人称的：你对一个概念的"手感"——它在什么情境下成立、到哪个边界就失效——只能靠你自己撞过那些情境才能长出来。AI 能描述边界条件，但描述边界不等于撞过边界；撞过的人会多一分描述者没有的警觉。

The load-bearing claim named three things that turn scarce once information is abundant: experience, reflection, and transfer. Our current judgment is that they are structurally un-outsourceable, not merely something “AI can’t do well yet, wait for a stronger model.” That judgment is itself a revisable bet. Its strongest counter, that “independent” cognition was always shaped by writing and tools, is met head-on in Section 16; first it is worth saying plainly what the bet rests on. Experience is first-person: your feel for a concept, where it holds and where it breaks at the edge, only grows from living through those situations yourself. AI can describe the boundary conditions; describing a boundary is not the same as hitting it, and whoever hit it carries an alertness the describer does not have.

反思是对自己思维的二阶操作：发现"我刚才那步推理为什么错了"，前提是那步推理本来就是你自己产出的。你没法真正反思一段 AI 替你生成的推理，因为它从没进过你的心智模型，你只是在事后给一个外来物贴标签。迁移则是把一处学到的结构搬到表面看起来不一样的新情境里去用——它检验的正是"你内化的是抽象结构还是表面套路"，只有结构真的长进脑子才可能。三件事有一个共同点：它们的产物都是你这个认知主体自身的改变，而主体的改变没法外包给另一个主体去替你完成。

Reflection is a second-order operation on your own thinking: catching why that piece of reasoning was wrong presupposes the reasoning was yours to begin with. You cannot really reflect on a chain of reasoning AI generated for you, because it never entered your mental model; you are just labeling a foreign object after the fact. Transfer is carrying a structure you learned in one place into a situation that looks different on the surface; that is exactly what tests whether you internalized the abstract structure or just the surface routine, and it only works if the structure actually grew into your head. What the three share is this: what they produce is a change in you, the cognitive subject, and a subject’s change cannot be outsourced to another subject.

种类之别，不是程度之别

A difference of kind, not of degree

"学得更快"和"学习的目标变了"听起来像一回事，其实是两条岔路，分清它们是这一卷的入口。学习继承的旧工作流很具体：课堂讲授、教材、按学期统一进度的测验，一整套围着"信息本身稀缺、老师和书才是瓶颈"搭出来的流水线。程度之别假设那套流水线的目标没变——还是把知识搬进脑子——只是手段升级了：搜索更快、讲解更顺、习题更贴合，AI 不过是给这条旧流水线换了个更猛的引擎。种类之别承认的是另一件事：当"把知识搬进脑子"这件事的成本塌向零，旧流水线原本要守的那个目标就不值得再守了，目标本身得重新画：不是把旧流水线跑得更快，是拆了它重画。

“Learn faster” and “what learning is for has changed” sound like the same claim, but they are two forks, and telling them apart is where this volume starts. What learning inherited is a specific old workflow: lectures, textbooks, tests paced to a fixed term, a whole pipeline built around one constraint, that information itself was scarce and teachers and books were the bottleneck. The difference-of-degree reading assumes that pipeline’s goal hasn’t moved (still get the knowledge into your head), only the means got better: faster search, smoother explanations, better-fitted exercises, with AI grafted on as a stronger engine for the same old pipeline. The difference-of-kind reading says something else: once the cost of getting knowledge into your head collapses toward zero, the goal that old pipeline existed to guard stops being worth guarding, and the goal itself has to be redrawn, which means tearing the pipeline down, not running it faster.

这有点像摄影之于绘画。相机把"精确复现外观"变得几乎免费，绘画的目标就从"画得像"挪到了"画得有看法"——不是画家变快了，是"画画为了什么"变了。学习正在经历同一种迁移。

It is a bit like photography and painting. The camera made “accurately reproduce appearance” nearly free, so painting’s goal shifted from looking like the thing to having a view of it: the painter did not get faster, what painting was for changed. Learning is going through the same kind of shift.

FIG. L.1 / 嫁接的谬误 · THE GRAFTING FALLACY

看懂：加速早已不稀缺的那一步，杠杆为零

Read: accelerating the already-cheap step yields zero leverage

从图里读出：旧流水线里，AI 的全部火力都打在第一格"获取信息"，而那一格的成本早在搜索引擎时代就已塌掉。把工具嫁接到旧流程上，优化的是一个解掉的瓶颈；杠杆在后两格，偏偏那两格要的是无法被加速的结构性时间。这就是为什么本卷不教"用 AI 学得更快"。这里画的是当下的分布（获取已被加速，内化与迁移尚未）；内化与迁移是否永远只能由人走完，是上文刚下的一注，那道分叉画在 FIG L.3。

What the figure says: in the old pipeline, all of AI’s firepower lands on the first box, “acquire info” – whose cost already collapsed back in the search-engine era. Grafting the tool onto the old process optimizes a solved bottleneck; the real leverage is in the last two boxes, which happen to need structural time that cannot be accelerated. This is why the volume does not teach “learn faster with AI.” What’s drawn here is the current distribution (acquisition already accelerated, internalization and transfer not yet); whether internalization and transfer can only ever be walked through by a person is the bet just placed above – that fork is drawn in FIG L.3.

这一卷在系列里的位置 · Where this volume sits

组织卷要人回归于意义，前提是判断力没被萎缩；本卷正是养护这份判断力的机制。研究卷定方向，本卷守地基。

The org volume’s “return to meaning” presupposes judgment that hasn’t atrophied; this volume grows and guards it. Research sets direction; this volume guards the foundation.

LEARN

KERNEL · 内核特化

KERNEL

机理 · 内核母版

Mechanism · Kernel master

充裕的是输入，稀缺的是内化

What’s Abundant Is the Input; What’s Scarce Is Internalization

"我今天用 AI 查了很多资料"——这句话量的是执行，不是学习。②步就卡在这道分叉上：执行交给 AI 之后，人剩下的活是判断，判断又裂成两半——说得清标准的那半，迟早也要交出去（组织卷管它叫"判断退守"，下游几卷叫"沿可验证性梯度分叉"，其实是同一件事）；说不清、只能自己长出来的那半，才是本卷要死守的一层防御，别的卷都没有。

“I looked up a lot of material with AI today” – that sentence measures execution, not learning. Step ② turns on exactly this fork: once execution goes to AI, what’s left for the human is judgment, and it splits in two – the half whose standard can be spelled out gets handed off too, eventually (the org volume calls it “judgment retreats,” the downstream volumes “forking along the verifiability gradient” – same thing, different name); the half that can’t be spelled out, that can only be grown by living it, is the defensive layer this volume guards and the others don’t have.

① 充裕 · ABUNDANCE

信息 / 答案 / 讲解 / 示范

Information / answers / explanations / demonstrations

"知道"近乎免费——查得到、问得到、个性化生成得到。

“Knowing that” is near-free – lookup-able, ask-able, personally generatable.

② 判断 · JUDGMENT

退守内化 + 元能力，并划 AI 止步线

Retreat to internalization + meta-skills, draw the AI stop-line

新瓶颈是内化与元认知，不是信息获取；且需主动判断哪些刻意不外包。

The new bottleneck is internalization and metacognition, not acquisition; and you must decide what to deliberately not outsource.

③ 上下文 · CONTEXT

个人认知脚手架 + 刻意制造的难度

A personal cognitive scaffold + deliberately built difficulty

错题反思库、人机同源可 diff 的知识库，外加有意保留的"合意困难"。

An error-and-reflection log, a same-source diffable knowledge base, plus deliberately retained “desirable difficulty.”

④ 人 · MEANING

守护不可外包的认知

Guard the cognition that can’t be outsourced

判断力 · 价值感知 · 直觉 · 深度思考 · 品味——学习者回归为认知主权者与更好的提问者。

Judgment · value perception · intuition · deep thinking · taste – the learner returns as a cognitive sovereign and a better question-asker.

一句话 · In one line

别卷都在把执行交出去；本卷多一句相反的：有些事 AI 能代劳，却该自己做，因为做的过程，养着那个能判断的你。

The other volumes hand execution off; this one adds the opposite clause: some things AI could do, you should do yourself, because doing them keeps alive the you who can judge.

②步的分叉是本卷和下游最深的差异，得画清楚。沿"这件认知是否可被 AI 充裕地代劳"分两支：

The fork at step ② is the deepest difference between this volume and the downstream ones, and it needs to be drawn clearly. Split along “can this piece of cognition be abundantly done for you by AI”:

可充裕支 → 并入 ①。"知道某事是真的""查到某个事实""生成一份讲解"——这些已不再是学习的目标，它们变成又一种被自动化的执行。把精力守在这里，就是在优化已解掉的瓶颈。
The abundance-able branch → folds into ①. “Knowing that something is true,” “finding a fact,” “generating an explanation” – these are no longer the goal of learning; they become one more kind of automated execution. Spending effort here is optimizing a solved bottleneck.
构成性支（构成你判断力本身的那部分认知） → 下沉 ④。体验、反思、迁移、犯错-纠正的循环，以及"哪些能力刻意不外包"这一防御性判断。它是认知结构本身的重建，不是一个可被"更准"超越的能力——只能靠学习者自身长出来，无法代劳。
The constitutive branch → sinks to ④. Experience, reflection, transfer, the error-and-correction loop, and the defensive judgment of “which capacities to deliberately not outsource.” It is the rebuilding of cognitive structure itself, not a capacity that something “more accurate” could supersede – it can only grow inside the learner, and cannot be done on their behalf.

旧

Before

学习 = 把外部知识搬进脑子。稀缺资源是"接触信息"（书、老师、课程），瓶颈在获取端。

Learning = moving external knowledge into the head. The scarce resource is “access to information” (books, teachers, courses); the bottleneck is on the acquisition side.

新 · 原理

After · principle

获取归零，瓶颈搬到内化端。学习 = 在体验-反思-迁移的闭环里重建认知结构，并主动划出 AI 止步线。充裕的是输入，重建仍靠你自己。

Acquisition goes to zero; the bottleneck moves to the internalization side. Learning = rebuilding cognitive structure inside an experience-reflection-transfer loop, and actively drawing the AI stop-line. The input is abundant; the rebuilding is still on you.

FIG. L.3 / ②步的分叉 · THE FORK AT STEP ②

看懂：判断沿"可充裕性"分两支，并多一层防御 · 全页图例：实线＝观察到的事实或无争议机制，虚线＝本卷当前押注（可改判），点线＝并置的竞争解释

Read: judgment forks along “abundance-ability,” plus a defensive layer · Page legend: solid = observed fact or undisputed mechanism, dashed = the volume’s current bet (revisable), dotted = a competing explanation shown alongside it

这张图说的是：下游卷的②步是"判断退守到新瓶颈"，一条单线。学习面的②多了一道分叉：可充裕支（实线，无争议）并回①当执行；构成性支（虚线）下沉到④由人长出。这一支画成虚线，是因为"体验/反思/迁移结构上不可外包"是本卷的押注，不是已证的事实，点线标出它目前最强的反方：若"独立"认知本就是被书写、工具一路塑造出来的（延伸心智，第 16 节正面接住），这一支未必独立成立。叠加其上的防御性判断（同为虚线，别卷没有的一层），即不仅判断该交什么给 AI，更要判断该刻意保留什么不交，是全卷反调的根，而它站不站得住，取决于下面那条虚线能不能撑住。

What you are looking at: the downstream volumes’ step ② is “judgment retreats to the new bottleneck” – a single line. The learning ② adds a fork: the abundance-able branch (solid, undisputed) folds back to ① as execution; the constitutive branch (dashed) sinks to ④ to be grown by a person. It is drawn dashed because “experience/reflection/transfer are structurally un-outsourceable” is this volume’s bet, not a proven fact, and the dotted line marks its strongest current counter: if “independent” cognition was always shaped by writing and tools (the extended mind, met head-on in Section 16), this branch may not stand alone. The defensive judgment layered on top (also dashed, a layer the others lack), which means judging not only what to hand to AI but what to deliberately keep un-handed, is the root of the whole volume’s contrarian stance, and whether it holds depends on whether that dashed line below it holds.

为什么"可充裕支"必须并回执行，而不是另立一类学习

Why the abundance branch folds back into execution, not into a new kind of learning

②步分叉的上支（可充裕支）很容易被误读成"一种更轻松的学习方式"——好像查事实、读 AI 讲解也是在学习，只是变快了。这一支得明确挪出学习的目标范畴，并回①执行。理由很直接：一件事的获取成本一旦塌向零，它就不再值得花认知精力去守——它变成又一种被自动化的吞吐，和让 AI 写样板代码、生成会议纪要没什么两样。还把它叫"学习"，就会把精力错配到一个已经解掉的瓶颈上，正是 FIG L.1 里"给马车装喷气引擎"那个谬误。

The upper branch of the step-② fork (abundance-able) is easy to misread as a more relaxed way of learning, as if looking up facts and reading AI explanations were learning too, just faster. That branch needs to move out of learning’s goal category and fold back into ① execution. The reason is direct: once a thing’s acquisition cost collapses toward zero, it stops being worth spending cognitive effort to guard. It becomes one more automated throughput, no different from having AI write boilerplate or draft meeting minutes. Keep calling it “learning” and you misallocate effort to a bottleneck that’s already solved – the same “jet engine on a cart” fallacy from FIG L.1.

这个划分有一个很扎手的推论："我今天用 AI 查了很多资料、读了很多讲解"，这句话量的是执行，不是学习。它和"我今天学了很多"是两件事，混淆它们正是便利陷阱在自我感知层的入口。学习量，只能用你自己长出来的那一支的产出来算：今天亲手跑了几次犯错-纠正循环、做了几次撤除演练、往反思库回流了几条，这些才是认知结构真的被重建的证据。把可充裕支干脆归入执行，是为了让"学习"这个词只指那件真正稀缺、真正发生在你身上的事。

That division has a sharp corollary: “I looked up a lot of material with AI today and read a lot of explanations” measures execution, not learning – a different claim from “I learned a lot today,” and conflating the two is exactly where the convenience trap enters self-perception. Real learning can only be counted from the constitutive branch’s output: how many error-correction loops you ran by hand today, how many removal drills, how many entries flowed back into the reflection log. That’s the evidence cognitive structure actually got rebuilt. Folding the abundance branch cleanly into execution keeps the word “learning” pointed at the one thing that’s actually scarce and actually happening in you.
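The corollary above can be made concrete as a tally that refuses to count lookups as learning. This is a minimal sketch under assumptions: the activity labels and the `learning_tally` helper are hypothetical names invented for illustration, and the only thing taken from the text is the split itself (error-correction loops, removal drills, and reflection-log entries count as learning; everything else counts as execution).

```python
from collections import Counter

# Hypothetical labels for the constitutive branch; any other label is treated as execution.
CONSTITUTIVE = {"error_correction_loop", "removal_drill", "reflection_entry"}


def learning_tally(day_log):
    """Split one day's activity labels into learning evidence and execution volume."""
    counts = Counter(day_log)
    learning = {label: counts[label] for label in sorted(CONSTITUTIVE) if counts[label]}
    execution_events = sum(n for label, n in counts.items() if label not in CONSTITUTIVE)
    return {"learning": learning, "execution_events": execution_events}


day = [
    "ai_lookup", "ai_lookup", "read_ai_explanation",
    "removal_drill", "reflection_entry", "reflection_entry",
]
print(learning_tally(day))
```

A day with many lookups and no drills would report a large `execution_events` count and an empty `learning` dictionary, which is the distinction the paragraph is drawing.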

内化有两个对象，混淆它们就会误读本卷

Internalization has two objects; conflating them misreads the volume

本卷最容易被读歪成两个极端，两个极端都出在没分清"内化的对象变了"这件事上。一端是怀旧派："AI 让人不动脑，所以要回到死记硬背，什么都自己来。"这是把"内化离散知识"当成了还要守的目标——可那一支恰恰是已经被充裕化、该放手的（FIG L.3 第一格）。另一端是取消派："答案随取即得，不用再内化任何东西了，会查会问就行。"这是把"内化"整个取消，却没看见内化的对象只是挪了位置、没有消失：提问、质疑、整合这套判断结构，一样昂贵，一样只能靠犯错-纠正循环长出来（第 3 节的承重命题）。

The volume gets misread into two extremes most often, and both come from not separating out that the object of internalization changed. One is the nostalgist: “AI makes people stop thinking, so go back to rote memorization and do everything yourself.” That treats internalizing discrete knowledge as still worth guarding, when that branch is exactly the abundance-able one to let go of (FIG L.3’s first cell). The other is the abolitionist: “answers are a query away, so stop internalizing anything; just look up and ask.” That abolishes internalization wholesale, missing that its object moved, it did not vanish: the judgment structure behind asking, challenging, and integrating is just as expensive, and just as dependent on the error-correction loop to grow (Section 3’s load-bearing claim).

本卷站的位置在两个极端中间：该放的放（离散事实），该守的死守（判断结构，以及只能你自己长出来的那部分认知）。"内化的对象变了，不是内化被取消了"——这句话就是②步那道分叉（FIG L.3）落到地面的另一种说法，也是读懂整卷的钥匙。谁把本卷读成"反 AI"或"无脑拥抱 AI"，都是在这把钥匙上拧错了方向。

The volume’s actual position sits between the two: let go of what should go (discrete facts), hold onto what should stay (judgment structure, the part of cognition only you can grow). “The object of internalization changed, internalization itself was not abolished” is just the ground-level version of the step-② fork (FIG L.3), and it is the key to reading the whole volume correctly. Read it as anti-AI or as embrace-AI-uncritically and you have turned that key the wrong way.

同一条内核，五个面，本卷是唯一会"反向"的那个

One kernel, five faces; this is the only one that runs in reverse

把五卷的②步摆在一起看，才能看清学习卷在系列里的特殊位置。组织卷的②：判断退守到"该让谁/什么来做决策"。工程卷的②：判断退守到 trust-but-verify 的校验设计。设计卷的②：判断退守到"什么是好品味、为不为人"。研究卷的②：判断退守到"什么问题值得问、什么算真结果"。它们共用一套语法——把执行交出去，人退到更高一级的判断节点，方向都是放手。

Lay the five volumes’ step ② side by side and this volume’s odd position comes into focus. Org’s ②: judgment retreats to who or what should make the decision. Engineering’s ②: judgment retreats to trust-but-verify design. Design’s ②: judgment retreats to what counts as good taste, for people or not. Research’s ②: judgment retreats to which question is worth asking, what counts as a real result. They share one grammar – hand off execution, retreat to a higher judgment node – all pointed toward letting go.

学习卷的②打破了这套语法：它的判断里有一支是反向的——明知 AI 能代劳，却判断"这件必须自己做"，因为做这件事的过程本身就是在维持那个能做判断的认知主体。其余四卷优化的是产出，学习卷守的是产出者。别的卷说"交出去"，这一卷补一句"但有些事，得刻意攥住"——这不是态度上使的性子，是同一条内核作用在"认知"这个唯一会被代劳反噬的面上时，必然长出的那条反向分支。

Learning’s ② breaks that grammar. One branch of its judgment runs in reverse: it knows AI can do the task, and judges “I must do this myself anyway,” because doing it is what keeps alive the cognitive subject able to judge at all. The other four optimize the output; learning guards the producer. The others say hand it off – this one adds but some things, hold on purpose. That is not a mood so much as what the same kernel inevitably grows when it lands on the one surface where being done for you backfires on you: cognition.

防御性判断：第一次出现的"反向"内核动作

Defensive judgment: the kernel’s first “reverse” move

这一层的特殊之处值得停下来看清楚。组织、工程、设计三卷里，②步的判断都是进攻性的：把执行尽量交出去，人退到最稀缺的判断节点，目标是把杠杆做到最大。学习面第一次出现一个反向的判断动作：有些能被充裕代劳的事，恰恰不该交出去，因为交出去会侵蚀那个让你有资格做判断的认知底座。同一条内核母版，在别的卷读作"放手"，在这一卷多出一句"但有些事，得刻意攥住"。这不是给内核打补丁，是内核落到认知这个特殊面上必然长出的样子——前提是本卷押得最重的那句经验判断成立：认知可能是唯一一个“被代劳就改变代劳者本身”的执行领域，代码被 AI 写、写的人未必退化，思考被 AI 替、才可能退化。这是一注，不是已证的事实；裁决它的是同一份还没到场的数据——无 AI 在场时，重度协作者的独立判断随时间往哪走（第 4 节）。

It is worth pausing on what makes this layer different. In org, engineering, and design, the step-② judgment is offensive: hand off execution as far as possible, retreat to the scarcest judgment node, push leverage to its max. Learning is where a reverse judgment move shows up for the first time: some things AI can abundantly do for you are exactly the things you should not hand over, because handing them over erodes the cognitive ground that qualifies you to judge anything at all. It is the same kernel master block: the other volumes read it as “let go”; this one adds “but some things, grip on purpose.” That is not a patch bolted onto the kernel. It is the shape the kernel inevitably takes when it lands on cognition, provided the volume’s heaviest empirical bet holds: that cognition may be the one execution domain where having the work done for you changes the person it is done for. Code written by AI need not degrade the person who would otherwise have written it; thinking done by AI might degrade the thinker. That is a bet, not a proven fact, and the data that would settle it has not arrived yet: with AI absent, which way a heavy collaborator’s independent judgment moves over time (Section 4).

所以"内化"在这一卷里其实有两个对象，得分清（接第 3 节）：内化离散知识（这件事 AI 已经接管，不用再守），和内化提问、质疑、整合这套判断结构（这是新瓶颈，而且只能靠自己长）。这一卷的②步同时要管这两支怎么分配，外加那道防御线——三件事压在一步里，这就是它比下游任何一卷的②都更密的原因。

So “internalization” in this volume actually has two objects, and they need to stay separate (carried into Section 3): internalizing discrete knowledge (AI has that now, no need to guard it), and internalizing the judgment structure behind asking, challenging, and integrating (that is the new bottleneck, and it only grows by doing it yourself). This volume’s step ② has to manage both allocations plus the defensive line at once – three things stacked into one step, which is why it runs denser than the ② in any downstream volume.

LEARN

MECHANISM · 机理

MECHANISM

机理 · 成本剪刀差

Mechanism · The cost scissors

知道几乎免费，能做依旧昂贵

Knowing Is Nearly Free; Doing Stays Expensive

一句话 · In one line

查得到近乎免费，做得出来仍然昂贵：后者只能靠自己犯错、纠正长出来，于是人越来越容易把"查得到"当成"会了"。

Looking it up is nearly free; doing it stays expensive: the latter grows only through your own erring and correcting, so it gets ever easier to mistake “I can look it up” for “I’ve learned it.”

两层的力学完全不同。"知道"是事实的搬运：一次查询、一次生成，边际成本趋零，还能复制、随时拿到。"能做"是技能的生长，它要一个结构：做出尝试、收到反馈、发现偏差、修正、再做一遍。这个循环里，犯错不是浪费，是信号；纠正不是补救，是学习本身发生的那一刻。AI 能替你产出正确答案，但替不了你走完这个循环——循环改变的是你的认知结构，不是那份产物。

The two layers are mechanically different. “Knowing” is the transport of facts: one query, one generation, marginal cost toward zero, copyable and instantly available. “Doing” is the growth of skill, and it needs a structure: make an attempt, get feedback, notice the gap, correct, repeat. In this loop, error is not waste but signal; correction is not a patch but the very moment learning happens. AI can produce the right answer for you, but it cannot run the loop for you – the loop changes your cognitive structure, not the artifact.

这道剪刀差本身有数十年、不依赖 AI 的硬证据撑着；但下面用来支撑它的案例——包括本卷自己举的那几个——观察的都是这道剪刀差摆在旧的个人-课程-考试容器里，人怎么在旧容器内对抗便利，还没人观察过一个按这道剪刀差从零重新设计出来的学习容器长什么样。价值是真的，样本却都是过渡态（进证据清单）：

The cost scissors itself is backed by decades of hard, AI-independent evidence. But the cases used to ground it below, including the ones this volume reaches for, all observe the scissors inside the old container of person, course, and exam: they watch a person fight convenience within that container. No one has yet observed what a learning container designed from scratch around the scissors would look like. The value is real; the samples are all transitional (entered in the evidence ledger):

测试效应：主动提取（考自己）比重读更利于长期保持——而重读在短期看起来更好，于是人系统性地误判（Roediger & Karpicke 2006，Ⅱ，可复现）。AI 把"重读式"的轻松最大化，恰好踩中这个误判陷阱[R1]。
The testing effect: active retrieval (quizzing yourself) beats rereading for long-term retention – yet rereading looks better in the short term, so people systematically misjudge (Roediger & Karpicke 2006, II, replicable). AI maximizes the ease of the “reread” mode, landing squarely in that misjudgment trap [R1].
刻意练习的承重部分：是那个结构化的犯错-纠正循环，不是小时数。Ericsson 1993 的"练习量主导"被 Macnamara 2014 元分析显著压缩（整体仅解释约 14% 方差；教育 4%、职业 <1% 不显著）——所以请慎引：承重的是循环结构，不是"一万小时"（这是个被夸大的流行说法）[R3]。
The load-bearing part of deliberate practice is the structured error-and-correction loop, not the hour count. Ericsson 1993’s “practice volume dominates” was substantially shrunk by Macnamara et al. 2014’s meta-analysis (~14% of variance overall; education 4%, professions <1%, n.s.) – so cite it with care: what bears weight is the loop’s structure, not “10,000 hours” (an overstated pop-claim) [R3].

FIG. L.0 / 成本剪刀差 · THE COST SCISSORS

看懂：两条成本曲线随 AI 能力反向张开

Read: two cost curves fan apart as AI improves

图里的两条线：红线是"知道"的单位成本，随模型变强塌向零；蓝线是"能做"的单位成本，几乎不动，因为它要的是犯错-纠正循环这种结构性时间，AI 压缩不掉。两线张开的缺口，就是认知错觉的温床：信息越廉价，人越容易把"查得到"误当成"我会了"。

The two curves: the red curve is the unit cost of “knowing that,” collapsing toward zero as models improve; the blue curve is the unit cost of “knowing how,” nearly flat – because it requires the structural time of an error-and-correction loop, which AI cannot compress. The widening gap between them is the breeding ground of a cognitive illusion: the cheaper information gets, the more readily people mistake “I can look it up” for “I have learned it.”

"是否还需内化"——把问题推到极限

“Do we still need to internalize” – pushed to the limit

把这一节的机理推到极限，会撞上一个躲不开的问题：既然 AI 随时能当外脑给答案，人是不是还需要把知识内化成"能做"？这得诚实回答，不能反射式地说一句"当然需要"了事。先承认对手最强的版本：对于纯粹"知道"这一层——记住某个 API 的参数、某个史实的年份、某段代码的样板——不需要，而且早就不需要了；硬要内化这些，只是在和一个已经解掉的瓶颈较劲。

Push this section’s mechanics to the limit and you hit a question you cannot dodge: since AI stands by as an external brain handing over answers, does a person still need to internalize knowledge into “knowing how”? Answer it honestly, not with a reflexive “of course.” First grant the opponent’s strongest version: for the pure knowing layer (memorizing an API’s parameters, a historical date, a code boilerplate), the answer is no, and has been for a while; forcing internalization here is wrestling a bottleneck that is already solved.

问题的全部分量，落在"能做"这一层和它上面那套判断结构。这里仍然需要，但理由得说精确，不然站不住。理由不是"万一哪天 AI 不在了"——那是脆弱的实用主义——而是第 3 节/06 那条更深的链条：能做的根基，是你能质疑 AI、做出判断、保有品味的前提。没有它，你连"AI 这次对没对"都判断不了，于是从一个能驾驭工具的人，降格成只能全盘接受工具输出的人。所以说得精确一点：内化离散知识，不必；内化那套让你还能自己判断的能力，比任何时候都更需要。这个一刀切不开的答案，正是②步那道分叉（FIG L.3）存在的理由。

The whole weight of the question falls on the “doing” layer and the judgment structure sitting above it. Here the answer is still yes, but the reason has to be precise or it will not hold. The reason is not “in case AI is unavailable someday,” which is fragile pragmatism; it is the deeper chain from Section 3/06: the foundation of doing is the precondition for you to challenge AI, judge, and hold taste at all. Without it you cannot even judge whether AI got it right this time, and you slide from someone who commands the tool to someone who can only accept its output wholesale. So the precise answer is: internalizing discrete knowledge, no; internalizing the capacities that keep you able to judge for yourself, more than ever. That the answer refuses to fall cleanly on one side is exactly why the step-② fork (FIG L.3) exists.

识别与重现：同一份内容，两个完全不同的成本

Recognition vs reproduction: one content, two utterly different costs

把"知道"和"能做"的成本差拆到认知操作这一层，会落到一对经典区分上：识别（recognition）和重现（recall/production）。识别是"看到正确答案能认出它对"，廉价、快，AI 把它推到了极致——任何讲解你读完都会觉得"对，我懂了"。重现是"没有提示，从头把它生成出来"，昂贵、滞后，也没有任何外部工具能替你补上，因为重现要的是你脑子里已经有一条能被主动激活的路径。

Push the cost gap between knowing and doing down to the layer of cognitive operations and it lands on a classic distinction: recognition versus reproduction. Recognition is seeing the right answer and recognizing it as right: cheap, fast, and AI has pushed it to the extreme, so reading any explanation makes you feel “yes, I get it.” Reproduction is generating it from scratch with no prompt: expensive, lagged, and something no external tool can make up for, because reproduction needs a path already in your head that you can actively fire.

这对区分解释了 AI 学习里最常见的自欺：跟着 AI 的推导读一遍，每一步都识别得很顺，于是断定自己学会了；但检验是合上 AI 从头重现，这时大多数当时觉得懂了的东西都会塌掉。识别的流畅，被大脑误当成了重现的能力——这正是 FIG L.0 那条"知道-能做"缺口在主观体验里的样子。本卷所有"先自答、撤除演练、迁移测试"这些处方，都是在把检验强行从识别切换到重现，因为只有重现这一关，能把真的"能做"和假的"看着会"分开。

This distinction explains the most common self-deception in AI-assisted learning: you read along with AI’s derivation, recognize each step smoothly, and conclude you have learned it. The real test is closing AI and reproducing from scratch, at which point most of what felt understood collapses. The fluency of recognition gets misread by the brain as the capacity for reproduction: that is what the FIG L.0 gap looks like in subjective experience. Every prescription in this volume (self-answer first, removal drills, transfer tests) is, at bottom, forcing the test to switch from recognition to reproduction, because only reproduction separates real doing from the illusion of looking like you can.
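The switch from recognition to reproduction described above can be sketched as a simple comparison of two scores per topic. Everything here is an assumption made for illustration: the `flag_fluency_illusions` helper, the 0-to-1 self-ratings, and the `threshold` value are hypothetical, and the sketch says nothing about how the scores should actually be collected.

```python
def flag_fluency_illusions(scores, threshold=0.3):
    """Return topics where felt understanding outruns closed-book reproduction.

    `scores` maps a topic to (recognition, reproduction), both in [0, 1]:
    recognition  = how well the explanation seemed to make sense while reading it
    reproduction = how much could be regenerated from scratch with AI closed
    """
    return [
        topic
        for topic, (recognition, reproduction) in scores.items()
        if recognition - reproduction >= threshold
    ]


scores = {
    "binary search": (0.9, 0.4),  # read smoothly, collapsed when reproduced
    "quicksort": (0.8, 0.7),      # felt understanding roughly matches ability
}
print(flag_fluency_illusions(scores))
```

Topics flagged this way are the candidates for the prescriptions named in the text: self-answer first, removal drills, and transfer tests.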

为什么"能做"压不下来：循环改变的是结构，不是产物

Why “doing” won’t fall: the loop changes the structure, not the artifact

把这条蓝线为什么压不动的根因说透：能做之所以昂贵，不是算法不够聪明，是它的产物本身就是学习者自身认知结构的改变——而结构改变只能由那个结构的拥有者亲历。一段证明、一次重构、一道临床判断，AI 能在一秒内交付正确的最终物，但那一秒里你的神经表征没有发生任何位移。技能心理学管这个叫程序化（proceduralization）：陈述性的"我知道规则"要经过大量带反馈的执行，才慢慢编译成程序性的"我不假思索就做对"。这条编译链路有自己的时间常数，跟模型参数量没关系。AI 越强，替你跳过的步骤越多，而被跳过的恰恰是编译发生的地方。

Here is why the blue line will not budge: doing is expensive not because a smarter algorithm is still missing, but because its product is the change in the learner’s own cognitive structure, and structural change can only be lived through by whoever owns that structure. A proof, a refactor, a clinical judgment: AI can deliver the correct final artifact in a second, but in that second your neural representation has not shifted at all. Skill psychology calls this proceduralization: declarative “I know the rule” compiles, slowly and only through heavy feedback-laden execution, into procedural “I do it right without thinking.” That compilation has its own time constant, unrelated to parameter count. The stronger the AI, the more steps it skips for you, and the skipped steps are exactly where compilation would have happened.

这也解释了一个反复出现的错觉：跟着 AI 的讲解点头如捣蒜、当场全懂，三天后却复现不出来。点头时用的是"知道"层的识别，廉价且即时；复现要的是"能做"层的提取和生成，昂贵且滞后。这两层在当下的感受里几乎分不清——这正是合意困难家族里最危险的元认知陷阱：表面的流畅被大脑误读成已经掌握，而掌握往往伴着当下的吃力（见第 14 节的速度公理）。AI 把"知道"层的流畅推到了极致，这个错觉也就被推到了极致。

This also explains a recurring illusion: you nod along to AI’s explanation, feel you fully understand it in the moment, then cannot reproduce it three days later. Nodding engages the knowing layer’s recognition (cheap and immediate); reproduction demands the doing layer’s recall and production (expensive and lagged). The two are nearly indistinguishable in the felt present, which is the most dangerous metacognitive trap in the desirable-difficulty family: surface fluency gets misread by the brain as mastery, while real mastery usually comes with present-tense effort (see the speed axiom, Section 14). AI pushes the knowing layer’s fluency to the extreme, so this illusion gets pushed to the extreme too.

成本剪刀差贯穿全卷机理

The cost scissors runs through the whole volume’s mechanics

把这一节的机理收成一句能随身带走的话：信息在涨，内化能力不会自动跟着涨。这道剪刀差不是本节一个孤立的发现，是后面每一节的力学起点——它怎么向下贯穿全卷，值得在这里点一下：

Collapse this section’s mechanics into one sentence worth carrying around: information keeps rising; internalized capacity does not automatically follow. This scissors is the mechanical starting point for every section after it, not an isolated finding, and it is worth naming here how it runs through the rest of the volume:

因为"知道"塌向零而"能做"不动，才有了第 3 节的目标转移：学习的承重从持有答案上游到提问/质疑/整合。
Because “knowing” collapses toward zero while “doing” stays put, there is Section 3’s goal shift: learning’s load moves upstream from holding answers to asking/challenging/integrating.
因为两层张开的缺口正是认知错觉的温床，才有了第 4 节的断裂点担忧：人会系统性地把"查得到"误当成"我会了"，在不知情中让深度思考的肌肉闲置。
Because the gap between the two layers is the breeding ground of cognitive illusion, there is Section 4’s fracture concern: people systematically mistake “I can look it up” for “I have learned it,” idling the deep-thinking muscle unawares.
因为"能做"只在带反馈的循环里生长、且需物理时间巩固，才有了第 5 节的脚手架与第 14 节的速度公理：把循环结构和时间常数工程化进工具流。
Because “doing” grows only in a feedback-laden loop and needs physical time to consolidate, there are Section 5’s scaffold and Section 14’s speed axiom: engineering the loop structure and time constant into the toolflow.
因为剪刀差让某些认知"看似可被代劳、实则代劳即萎缩"，才有了第 6 节的止步线。
Because the scissors makes certain cognition “seem outsourceable yet atrophy upon outsourcing,” there is Section 6’s stop-line.

整卷的反调、处方、仪表盘，全都是从这一道"信息便宜、内化贵"的成本不对称里长出来的推论。认清它，就拿到了读懂后面每一节的总钥匙；忽略它，后面每一条主张都会显得像一个孤立的态度，而不是同一条力学逼出来的结果。

In other words, the whole volume’s dissent, prescriptions, and dashboard are all corollaries grown from this one cost asymmetry: information cheap, internalization expensive. Grasp it and you hold the key to every later section; ignore it and every later claim reads like an isolated attitude rather than what one piece of mechanics forces.

那个必须诚实处理的反例：刻意练习被高估的量级

The counterexample we must handle honestly: deliberate practice is overstated

如果只拿"刻意练习"来撑"能做很贵"这句话，会被一个真实的反例反将一军，得先把它处理掉。Ericsson 1993 把专家表现主要归给练习量，催生了流行的"一万小时"说法。但 Macnamara 等人 2014 年的元分析大幅压缩了这个量级：刻意练习总体只解释了大约 14% 的表现方差，而且分领域差得很——游戏 26%、音乐 21%、运动 18%、教育只有 4%、职业 <1% 且不显著。练习时数远不是技能的全部，天赋、起点、任务结构占了大头。

Rest “doing is expensive” on “deliberate practice” alone, and a real counterexample turns the tables, so handle it first. Ericsson 1993 attributed expert performance mainly to practice volume, spawning the popular “10,000 hours” story. But Macnamara et al. 2014’s meta-analysis substantially shrank that number: deliberate practice explains only about 14% of performance variance overall, and the spread across domains is huge: games 26%, music 21%, sports 18%, education just 4%, professions <1% and not significant. Practice hours are far from the whole of skill; talent, starting point, and task structure carry most of the weight.

所以本卷站得住的锚，不是"练得够久就行"，是更窄也更稳的那一部分：带反馈的犯错-纠正循环，这个结构本身，是技能生长省不掉的一环。循环结构是必要条件，不是充分条件；时数是噪声很大的代理变量，引用时得小心。把话说到这个精度，"能做很贵"才站得住脚，不会被"一万小时是神话"这一句话带跑偏。

So the anchor this volume actually rests on is not “practice long enough” but the narrower, sturdier claim: the structure of a feedback-laden error-correction loop is an indispensable link in growing skill. Loop structure is necessary, not sufficient; hours are a noisy proxy, and should be cited carefully. Only at this precision does “doing is expensive” hold up, instead of getting swept away by “10,000 hours is a myth.”

从零设计的学习容器：方向感、最强反方，和一个还没人做的实验

A container designed from scratch: a direction, the strongest counter, and an experiment nobody has run

把这道剪刀差往前推一步，会问出一个更大的问题：如果真从零设计一个学习容器，不是把 AI 嫁接到旧的个人-课程-考试骨架上，它会长什么样？我们的方向感是——它大概率不再以"课程"为基本单位。课程这副骨架（固定课时、教师定进度、学期末一次性考核）本来就是为"信息稀缺"这个已经不在场的约束搭的。该围着转的，是每个学习者自己犯错-纠正循环的节奏：考核不再是学期末那一道关卡，而是嵌进循环内部、随时触发的检查点；内容顺序不再由固定大纲决定，而是由"你具体在哪个环节复现失败"实时倒推；AI 的角色，从讲解者整体切换成摩擦的设计者——决定什么时候不把答案给你，比把答案讲清楚更重要，因为讲清楚这件事已经免费了。

Push the scissors one step further and a bigger question opens up: what would a learning container look like if it were designed from scratch, instead of AI grafted onto the old person-course-exam skeleton? Our directional guess: it would probably stop using “course” as its basic unit. That skeleton (fixed class hours, a teacher pacing the syllabus, one terminal exam) was built for a constraint that is no longer present, information scarcity. What it should organize around instead is the rhythm of each learner’s own error-correction loop: assessment would stop being the one gate at term’s end and become a checkpoint embedded inside the loop, triggered on demand; content order would stop following a fixed syllabus and instead get inferred, in real time, from exactly where a learner’s reproduction attempt fails; and AI’s job would shift wholesale from explaining to designing friction: deciding when not to hand over the answer matters more than explaining it well, because explaining it well is free now.

最强的反方不是"这想法太理想"，是更狠的一句：整套"从零设计容器"的说法，可能本身就是又一层容器思维——能压缩、能加速犯错-纠正循环的空间，可能根本不存在。如果这个循环的时间常数是神经层面的、跟组织形式无关的生物常量，那"重新设计容器"能省下来的只是排课、考勤这类行政摩擦，压根碰不到那条昂贵的"能做"曲线。顺着这条反方推到底，整段"从零设计"的许诺不过是把旧瓶颈换了个说法，杠杆是零。

The strongest counter is sharper than “that’s too idealistic”: the whole idea of designing the container from scratch might itself be one more layer of container-thinking, because room to compress or speed up the error-correction loop may simply not exist. If that loop’s time constant is a neurological, organization-independent constant, everything a redesigned container could save is administrative friction (scheduling, attendance), and it never touches the genuinely expensive doing-curve. Follow that counter all the way and the whole “design it from scratch” promise is the old bottleneck wearing a new name, with zero real leverage.

能分开这两种可能的不是论证，是一个具体能做的对照：把总的犯错-纠正循环小时数锁定相等，一组人待在重画过的容器里（检查点嵌进循环、内容按个人失败点排序、AI 专职设计摩擦），另一组人待在旧容器里补满同样的循环小时数——比时间到能力的转化率，比迁移测试的通过率。重画过的容器若在同样的循环小时下学得更快、迁移更好，方向感站得住；两组若毫无差别，容器设计就从来不是杠杆，该反方赢。这个对照现在没人做过——这正是为什么前面那句"还没人观察过"不是一句谦辞，是这一卷最深的一处空白。

What would separate the two is not an argument but a concrete comparison: hold total error-correction-loop hours fixed, put one group in a genuinely redesigned container (checkpoints embedded in the loop, content ordered by each learner’s own failure point, AI dedicated to designing friction) and a matched group in the old container logging the same loop hours; then compare the conversion rate from time to capability, and the pass rate on transfer tests. If the redesigned container learns faster and transfers better at equal loop hours, the direction holds. If the two groups come out identical, container design was never the lever, and the counter wins. Nobody has run that comparison yet, which is exactly why calling it unobserved earlier is not a hedge. It is the deepest blank spot in this volume.
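The comparison described above can be sketched as a small analysis routine. This is a minimal sketch under stated assumptions: the record fields, the group labels "redesigned" and "legacy", and the matching tolerance are all hypothetical names chosen for illustration, and no such dataset exists yet.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Learner:
    group: str              # "redesigned" or "legacy" (hypothetical labels)
    loop_hours: float       # error-correction-loop hours logged
    capability_gain: float  # post-test minus pre-test score, measured with AI removed
    passed_transfer: bool   # passed a transfer test in a new situation, AI removed


def summarize(learners, group):
    """Aggregate one group's loop hours, time-to-capability conversion, and transfer pass rate."""
    rows = [l for l in learners if l.group == group]
    total_hours = sum(l.loop_hours for l in rows)
    return {
        "mean_loop_hours": mean(l.loop_hours for l in rows),
        "gain_per_hour": sum(l.capability_gain for l in rows) / total_hours,
        "transfer_pass_rate": sum(l.passed_transfer for l in rows) / len(rows),
    }


def compare(learners, hours_tolerance=0.5):
    """Compare the two containers, refusing to do so if loop hours are not matched."""
    new = summarize(learners, "redesigned")
    old = summarize(learners, "legacy")
    # The design only works if loop hours are held equal across groups.
    if abs(new["mean_loop_hours"] - old["mean_loop_hours"]) > hours_tolerance:
        raise ValueError("loop hours are not matched; the comparison is confounded")
    return {
        "gain_per_hour_diff": new["gain_per_hour"] - old["gain_per_hour"],
        "transfer_pass_rate_diff": new["transfer_pass_rate"] - old["transfer_pass_rate"],
    }
```

Differences near zero on both measures would favor the counter (container design is not the lever); clearly positive differences at matched hours would favor the directional guess. A real study would also need randomized assignment and a significance test, which this sketch leaves out.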

检验信号Test signal

看迁移测试通过率，不看答案召回率：撤掉 AI、换个新情境还做得出来，"能做"才算长出；只有 AI 在场才行的，长的是依赖。Watch transfer-test pass rate, not answer recall: if you can still do it in a new situation with AI removed, “doing” has grown; what works only with AI present is dependence.

LEARN

REDRAW · 重画

REDRAW

重画 · 目标转移

Redraw · Goal shift

从内化知识，到成为更好的提问者

From Internalizing Knowledge to Becoming a Better Question-Asker

本章把目标上移到提问、质疑、整合，并盯住它们的陷阱：仍要长在"能做"之上。

This chapter moves the goal up to asking, challenging, integrating – and watches their trap: they still grow on top of “doing.”

一句话In one line

答案随手可得后，值钱的是会提对问题、看出 AI 哪里错、把多源熔成判断；但这三件仍要长在你亲手做过的领域上。Once answers are at hand, what’s valuable is asking the right question, spotting where AI is wrong, and fusing many sources into a judgment – but these three still grow on domains you’ve worked by hand.

当一个准确答案离你只有一次提问那么远，价值就从"持有答案"往上游挪到了"提出值得问的问题、识别哪个答案错了、把多个来源整合成一个判断"——也就是提问、质疑、整合这三件。会提问的人才发现得了真问题，这正是它直接喂给上游创新卷"价值发现"的地方：更好的提问者，就是创新的原料。

When an accurate answer is one prompt away, value moves upstream from holding the answer to posing the question worth asking, spotting the wrong answer, integrating many sources into a judgment: asking, challenging, integrating. Only people who can ask well discover real problems; that is exactly where this feeds the innovation volume’s value discovery upstream. A better question-asker is the input innovation runs on.

但别把这三件写成空中楼阁。"提问/质疑/整合"听起来像是能脱离具体知识、单独训练出来的纯元能力——这是个危险的误读。你没法质疑一个你在"能做"层毫无根基的领域里的 AI 输出：质疑能不能命中，靠的是你亲手趟过犯错-纠正循环之后才有的那种"这里不对劲"的直觉。元能力不是替代"能做"，是长在"能做"之上（回接第 2 节）。所以这一卷的目标转移是内化的对象变了：从内化离散事实，变成内化提问、质疑、整合这套判断结构，不是"从此不用内化了"。

But do not build this into a castle in the air. Asking, challenging, integrating sound like pure meta-skills you could train in isolation from concrete knowledge, which is a dangerous misreading. You cannot challenge an AI output in a domain where you have no foundation in doing: whether a challenge lands relies on the “something’s off here” intuition that only your own error-correction loop can give you. Meta-skills do not replace doing; they grow on top of it (back to Section 2). So this volume’s goal shift is that the object of internalization changed, from discrete facts to the judgment structure behind asking, challenging, and integrating. It is not “stop internalizing.”

FIG. L.4 / 犯错-纠正循环THE ERROR-CORRECTION LOOP看懂：能做只在这个闭环里生长，AI 只能进校验环Read: doing grows only inside this loop; AI enters only at verify

顺着环读：能做之所以贵，是因为它只在这个闭环里生长——尝试、犯错、先自纠、再校验、重做。Ericsson 1993 的承重处不是"一万小时"，是这个带反馈的纠错环（Macnamara 2014 元分析显示练习量只解释约 14% 方差，所以慎引时数）。AI 的合法位置是④校验，且必须在你自纠之后进；一旦把它挪到①前面替你出第一稿，整个环就被绕过，产物出来了而你的结构没动。Follow the loop: doing is expensive because it grows only inside this loop – attempt, error, self-correct, verify, redo. Ericsson 1993’s load-bearing part is not “10,000 hours” but this feedback-laden corrective loop (Macnamara 2014’s meta-analysis shows practice volume explains only ~14% of variance, so cite hours with care). AI’s legitimate seat is ④ verify, and it must enter after your self-correction; move it ahead of ① to draft for you and the whole loop is bypassed – the artifact ships while your structure stays still.

FIG. L.10 / 元认知监测环THE METACOGNITIVE MONITORING LOOP看懂：监测自己学得准不准——而 AI 专门腐蚀这个环的第一格Read: the loop that keeps your self-read honest – and AI corrupts its first node

顺着环看：能不能停在 FIG L.9 的脚手架一侧，取决于这个三步环转得准不准。①自评"我真的会了吗"；②把自感与一次真实测验（而非重读的流畅感）对齐——这一步就是校准；③针对差距做合意困难的练习；再回到①。环的命门在第一格：AI 持续往里注入合成自信——[R19] METR 那条"自感更快、实测更慢"正是它的刻度。可操作的校准量来自 [R18]：你多常反驳 AI、且反驳得对；这个数下降，意味着①已被腐蚀，环在空转。练习一格之所以要的是 [R8] 合意困难而非轻松复述，也是为了让②的实测有真实信号可校准。Around the loop: whether you can stay on FIG L.9’s scaffold side depends on this three-node loop running true. ① self-assess “do I really know it?”; ② align that feeling against one real test (not the fluency of rereading) – this step is calibration; ③ run desirable-difficulty practice on the gap; back to ①. The loop’s weak point is the first node: AI keeps injecting synthetic confidence – [R19] METR’s “felt faster, measured slower” is its gauge. The operable calibration quantity comes from [R18]: how often you push back on AI, and correctly; when that number falls, node ① is corrupted and the loop spins empty. Practice must be [R8] desirable difficulty rather than easy restatement precisely so node ②’s test has a real signal to calibrate against.
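Nodes ① to ③ of the monitoring loop can be sketched as a small calculation. This is an illustrative sketch only: the 0-to-1 self-rating scale, the function names, and the ordering rule for the practice queue are assumptions, not something the volume specifies.

```python
def calibration_gap(self_ratings, test_results):
    """Felt mastery minus measured mastery; a positive value means overconfidence.

    self_ratings: per-item self-assessment in [0, 1] (node 1).
    test_results: per-item pass/fail from a real closed-book test (node 2).
    """
    if not self_ratings or len(self_ratings) != len(test_results):
        raise ValueError("need one self-rating per tested item")
    felt = sum(self_ratings) / len(self_ratings)
    measured = sum(test_results) / len(test_results)
    return felt - measured


def practice_queue(self_ratings, test_results):
    """Node 3: indices of failed items, most overconfident first."""
    failed = [i for i, ok in enumerate(test_results) if not ok]
    return sorted(failed, key=lambda i: self_ratings[i], reverse=True)
```

Items that felt mastered but failed the real test land at the front of the queue, since that is where the gap between fluency and reproduction is widest.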

这卷的反调，最终落在一个正向的人像上

The dissent finally lands on a positive portrait of a person

这一卷很容易被读成纯粹的防守——划止步线、抵抗便利、防萎缩，好像全部内容就是"别让 AI 拿走什么"。但它的落点是正向的：一个在 AI 时代更强的人长什么样子。那个人的强，不在记得更多事实（那层已经廉价），也不在打字更快（执行已经充裕），而在成为一个更好的提问者：能在一片被 AI 拉平的信息里嗅出哪个问题真正值得问，能在 AI 笃定的输出里一眼看出哪里不对劲并反驳得准，能把彼此冲突的多个来源在自己脑子里熔成一个自洽的判断。

It is easy to read this volume as pure defense: draw the stop-line, resist convenience, prevent atrophy, as if the whole point were “don’t let AI take something away.” But its real landing point is positive: what a stronger person looks like in the AI era. That strength is not remembering more facts (already cheap) or typing faster (execution is already abundant); it is becoming a better question-asker, someone who can sniff out which question is truly worth asking in a field flattened by AI, spot where a confidently-toned AI output is wrong and push back accurately, and fuse conflicting sources in their own head into one coherent judgment.

提问、质疑、整合这三件能力，勾勒出的是一个驾驭工具、始终知道自己要去哪的认知主体，不是一个抗拒工具的怀旧者。这一卷所有"反调"的纪律——先想后问、保留循环、设合意困难、划止步线——存在的唯一目的，是让这个人像不在便利的默认引力下被磨平。所以这一卷的底色其实是建设性的：它不要你少用 AI，只要你把 AI 用成让自己变强的方式，而不是悄悄把判断让渡出去、连自己为什么同意都说不清的那种用法。守住认知主权，是为了让你有资格、也有能力去做那个回归于意义的判断者。这正是组织卷"人回归于意义"在个体认知层的前提。

Asking, challenging, integrating together sketch a cognitive subject who commands the tool and always knows where they are going, not a nostalgist resisting it. Every “dissenting” discipline in this volume (think before you ask, keep the loop, build desirable difficulty, draw the stop-line) exists for one purpose: keep that portrait from being worn flat under convenience’s default gravity. So the volume’s real tone is constructive. It does not ask you to use AI less, only to use it in a way that makes you stronger, not in the way that quietly cedes judgment until you cannot even say why you agreed. Guarding cognitive sovereignty is what makes you entitled, and able, to be the judge who returns to meaning: the individual-cognition precondition behind the org volume’s own claim.

整合：为什么它是三件里最难、也最易退化的

Integration: the hardest of the three, and the easiest to degrade

提问、质疑、整合三件元能力里，整合最常被低估，因为它看起来只是"把几个来源放一起"。但整合的内核动作其实很苛刻：把多个可能彼此冲突的来源，放进同一张心智模型里，求一个自洽的判断。这要求你脑子里先有一张能被冲突的模型；没有这张模型，所谓整合就退化成两种更廉价的赝品：要么是拼贴——把各来源的结论并排抄下来，不做调和，看着全面其实没有判断；要么是随大流——直接采信出现频率最高、语气最自信的那个，把众数当真理。

Of the three meta-skills, integration is most often underrated, because it looks like merely putting several sources together. But its core move is demanding: taking multiple, possibly conflicting sources and placing them into one mental model to reach a self-consistent judgment. That requires you already hold a model that can be contradicted. Without it, integration degrades into two cheaper counterfeits: collage, copying each source’s conclusion side by side with no reconciliation, looking comprehensive while judging nothing; or going with the crowd, trusting whichever answer is most frequent or most confidently toned, mistaking the mode for the truth.

AI 的便利恰好同时助长这两种赝品：它能瞬间汇总十个来源，诱你拼贴；也能给出一个语气笃定的综合答案，诱你随大流。整合反而要求你慢下来，亲自去碰那些来源之间的矛盾，用自己的模型去裁决——这又是"价值在于慢"的一个具体例子（接第 14 节）。所以整合能力的退化往往最隐蔽：你以为自己在综合信息，其实只是在转述 AI 替你做好的综合，而你那张本该被反复冲突、反复修正的模型，从没真正上过场。

AI’s convenience fosters both counterfeits at once: it can summarize ten sources instantly, tempting collage, and hand you one confidently-toned synthesis, tempting crowd-following. Real integration instead demands you slow down, touch the contradictions between sources yourself, and adjudicate with your own model, another concrete case of “value lies in slowness” (continuing into Section 14). So integration’s decay is often the hardest to notice: you think you are synthesizing information when you are only paraphrasing AI’s synthesis, while the model that should have been contradicted and corrected repeatedly never actually took the field.

质疑命中率：为什么它无法靠"批判性思维课"凭空补

Challenge hit-rate: why a “critical-thinking course” can’t conjure it

有一种流行的补救幻想：既然 AI 让人少动脑，那就单独开一门"批判性思维"或"提问技巧"课，把元能力补回来。这一卷得明确否定这条捷径，因为它误解了质疑能力从哪儿来。质疑 AI 输出的命中率，本质是领域内的错误侦测灵敏度：你能不能一眼看出"这个论断不对/这段代码有 bug/这个临床结论可疑"。这种灵敏度是高度领域特异的——一个资深医生能瞬间嗅出可疑的诊断，但换到一份陌生的法律文书前，同样会被 AI 的流畅蒙住。它来自你在这个领域里亲历过的大量"这里不对劲"的瞬间，是第 2 节"能做"层直接沉淀下来的东西，没法靠一门通用思维课迁移过去。

There is a popular remedial fantasy: since AI makes people think less, just open a standalone critical-thinking or questioning-skills course and top the meta-skills back up. This volume rejects that shortcut, because it misunderstands where the challenging capacity comes from. The hit-rate of challenging AI output is, at root, in-domain error-detection sensitivity: can you see at a glance that this claim is wrong, this code has a bug, this clinical conclusion is suspect. That sensitivity is highly domain-specific: a senior physician can instantly smell a dubious diagnosis, then get fooled by AI’s fluency just as easily in front of an unfamiliar legal document. It comes from the many lived “something’s off here” moments inside that domain, a direct deposit of Section 2’s doing layer, and it cannot be transferred in by a generic thinking course.

所以"成为更好的提问者"不靠上课，靠在你真正在乎的那个领域里保持亲手做的密度，守住那个让你有资格质疑的根基。这也反过来收紧了这一卷的反调：抵抗便利不是泛泛地"多动脑"，是精确地守住那几个你需要保有错误侦测灵敏度的领域。工程卷把这条灵敏度做成了组织机制（在环练习、常规产出抽检），见工程卷·验证分诊；组织侧对应的三道缓解与其证据（自动化偏差，Parasuraman-Riley 1997），见组织卷·别把人移出环——这三处其实是同一本账。

Becoming a better question-asker is not something you learn in a class; it comes from keeping up the density of hands-on doing in the domain you actually care about, guarding the foundation that qualifies you to challenge at all. That tightens the volume’s contrarian stance too: resisting convenience means precisely guarding the few domains where you need to keep error-detection sensitivity alive, not vaguely “thinking more.” The engineering volume turns that sensitivity into an organizational mechanism (in-the-loop practice, routine-output spot checks); see Engineering · verification triage. The org side’s matching mitigations and evidence (automation bias, Parasuraman-Riley 1997) are in Organization · don’t take the human out of the loop. All three are the same ledger.

"更好的提问者"为什么是上游创新的输入

Why “a better question-asker” is the input to upstream innovation

把目标定在"更好的提问者"而不是"更好的答案持有者"，不只是学习卷内部的一次重画，它在系列里有个很具体的接口：会提问的人才发现得了真问题，而真问题正是创新卷"价值发现"的原料。这条接口值得说清楚，不然"提问者"听起来像句空泛的好话。一个问题的质量，等于它切中真实需求的程度，乘以它把模糊转译成可攻结构的程度。AI 很擅长在一个给定的问题里搜答案，却没法替你判断这个问题本身值不值得问——后者要的是对真实世界缺口的感知，而那种感知只能从你在这个领域的亲历里长出来（体验，第 0 节）。所以"提问质量"不是句修辞，是一个有结构的能力，它的上限由你"能做"层的根基定死（第 3 节反复强调的那条防线）。

Setting the goal at “a better question-asker” rather than “a better answer-holder” is not only an internal redraw for the learning volume; it has a precise interface in the series: only people who ask well discover real problems, and real problems are the raw material of the innovation volume’s value discovery. That interface is worth spelling out, or “question-asker” sounds like a vague compliment. A question’s quality equals how much it hits a real need, times how much it translates the vague into an attackable structure. AI is very good at searching for an answer inside a given question, but it cannot judge for you whether the question itself is worth asking: that demands a sense of real-world gaps that can only grow from your own lived experience in the domain (experience, Section 0). Question quality is a structured capacity, not rhetoric, and its ceiling is set by the foundation of your doing layer (the defense Section 3 keeps stressing).
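The multiplicative claim above can be written out in one line. The symbols Q, N, S and the scaling to [0, 1] are illustrative assumptions, not notation used elsewhere in this volume.

```latex
Q = N \cdot S, \qquad N, S \in [0, 1]
```

Here N stands for how squarely the question hits a real need and S for how far it translates the vague into an attackable structure. Because the terms multiply rather than add, Q = 0 whenever either term is zero: a well-structured question about a non-need and a real need left as a vague complaint both score nothing.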

这也给这一卷的反调一个建设性的出口：抵抗便利不是为了守旧，是为了守住那个"能发现真问题"的认知主体。一个人如果把每次提问都退化成"帮我查一下"，久而久之丢掉的不只是某项技能，而是识别什么值得问的那种品味——而这恰恰是 AI 时代最稀缺、最不可外包、整个下游创新链都靠它的东西。学习卷守住它，创新卷才有上游的水源。

This also gives the volume’s contrarian stance a constructive exit: resisting convenience is about guarding the cognitive subject who can still discover real problems, not about clinging to the old. If someone degrades every question into “just look this up for me,” over time what they lose is not one skill but the taste for recognizing what is worth asking, exactly the thing scarcest and least outsourceable in the AI era, the thing the whole downstream innovation chain runs on. The learning volume guards it so the innovation volume has water upstream.

提问、质疑、整合：可训练的结构，不是天赋

Asking, challenging, integrating: a trainable structure, not a gift

把这三件元能力拆开，能看出各自压在哪条已知机制上，从而知道怎么练，而不是空喊一句"要会提问"。提问的内核是问题表征：把一团模糊的不适感转译成一个有结构、可攻的问句；练的方式是反复亲手做问题分解（第 7 节右栏第一条），不是看别人怎么问。质疑的内核是错误侦测——一种识别能力，它的灵敏度直接正比于你在这个领域亲历过多少次"这里不对劲"；这就是为什么质疑命中率是第 2 节"能做"层的下游产物，没根基的质疑只是抬杠。

Pull the three meta-skills apart and you can see which known mechanism each rests on, and so how to train them, rather than chanting “learn to ask questions.” The core of asking is problem representation: translating a blob of vague discomfort into a structured, attackable question; it is trained by repeatedly doing problem decomposition by hand (Section 7, right column, first item), not by watching how others ask. The core of challenging is error detection: a recognition capacity whose sensitivity is directly proportional to how many times you have personally lived “something’s off here” in that domain. That is why challenge hit-rate is downstream of Section 2’s doing layer, and why a challenge with no foundation is just contrarianism.

整合的内核是把多个来源放进同一张心智模型里求一致：它要你脑子里已经有一张能被冲突的模型，否则"整合"就退化成拼贴。三者都长在能做之上，这就回到了第 2 节那条警告——元能力谈不出来，只能长出来。

The core of integrating is forcing several sources into one mental model and resolving them into one consistent read: it requires you already hold a model that can be contradicted, or integration degrades into collage. All three grow on top of doing, which loops straight back to Section 2’s warning that meta-skills cannot be talked into being.

这给"目标转移"一个不容易被误读的说法：目标不是从"内化"转到"不内化"，是内化的对象从离散事实挪到了这三套判断结构上。它们同样昂贵，同样只能靠犯错-纠正循环长出来——区别只在于，循环的素材从"题目"变成了"我对 AI 的每一次提问和质疑"。一个健康的 AI-Native 学习者，会把每一次和 AI 的交互都当成练这三件事的机会，而不是一次省事。

That gives the goal shift a precise statement that resists misreading: the goal did not move from internalizing to not internalizing; the object of internalization moved up, from discrete facts to these three judgment structures. They are equally expensive, and they equally grow only through the error-correction loop; the only difference is that the loop’s material shifts from problem sets to every question you put to AI and every time you push back on it. A healthy AI-Native learner treats every interaction with AI as a rep on these three, not a shortcut.

检验信号Test signal

看你质疑 AI 的频率与命中率、提问与整合的质量，而非答案召回率；好的学习者越来越多反驳 AI，且反驳对。Watch how often and how accurately you challenge AI, plus the quality of your asking and integrating – not answer recall; a good learner pushes back on AI more and more, and correctly.
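The signal above reduces to two numbers you can track from a simple interaction log. A minimal sketch, assuming a hypothetical log format in which each entry records whether you pushed back and whether that pushback later proved right:

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    challenged: bool         # did you push back on the AI output?
    challenge_correct: bool  # did the pushback hold up on later check? (ignored if not challenged)


def challenge_stats(log):
    """Return (challenge frequency, challenge hit-rate) for a list of interactions.

    Hit-rate is None when there were no challenges, since it is undefined in that case.
    """
    challenges = [i for i in log if i.challenged]
    frequency = len(challenges) / len(log) if log else 0.0
    hit_rate = (
        sum(i.challenge_correct for i in challenges) / len(challenges)
        if challenges
        else None
    )
    return frequency, hit_rate
```

Tracked over time, a falling frequency or a falling hit-rate are different warnings: the first suggests you have stopped checking, the second that your challenges are no longer grounded in the domain.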

LEARN

FRACTURE · 断裂点

FRACTURE

前沿 · 头号悬案

Frontier · The open question

认知卸载，与主动抵抗

Cognitive Offloading, and Active Resistance

记、算、查都交给 AI 之后，深度思考的肌肉会不会跟着松掉？这是这卷埋得最深的一个问题，我们没有答案，只有一个立场。

Once remembering, computing, and looking things up are all handed to AI, does the muscle for deep thinking go slack? This is the deepest open question in the volume. We don’t have an answer, only a stance.

一句话In one line

会不会萎缩，现有证据答不了这个问题——只够到相关和短期。我们不下判决，只下一个赌注：把"主动抵抗便利"当低成本保险买下来。Whether it atrophies, the evidence can’t yet say; it only reaches correlation and the short term. We’re not issuing a verdict, just placing a bet: buy “actively resisting convenience” as cheap insurance.

先把立场钉死：这是一份有据的警告加一个写明了证伪条件的赌注，不是"AI 已经让人变笨"。把还没发生的事写成既成事实，既是对证据撒谎，也会把这一卷的可信度赔进去。诚实的说法是——有一类担忧机理上说得通、历史上有先例，但它套在 AI 头上这件事还没人证实。两边的证据我们都摆出来，你自己看。

Let’s pin the stance down first: this is an evidence-grounded warning plus a bet with its falsification condition spelled out, not “AI is already making people stupid.” Writing something unproven as settled fact both lies about the evidence and spends down the volume’s credibility. The honest version: there’s a class of worry that is mechanistically plausible and has historical precedent, but nobody has yet proven it about this particular object, AI. We’re putting both sides of the evidence on the table.

担忧侧 · 待坐实Concern side · to be substantiated

相关 / 短期信号Correlational / short-term signals

Gerlich 2025[R4]（Societies 15(1):6，N=666，横断面相关）：AI 使用与批判性思维显著负相关，由认知卸载中介（总效应 b=-0.42），作者自承不能证因果、无纵向数据。MIT/Kosmyna 2025[R5]（arXiv:2506.08872，preprint，Ⅲ，N=54→第四轮仅 18）：EEG 显示"认知负债"累积，样本极小、未经评审。机理先例：Sparrow 等 2011 "Google 效应"（Science 333:776）[R6]——预期可再获取则记得信息更少、记得"去哪找"更多。Gerlich 2025 (Societies 15(1):6, N=666, cross-sectional correlational): AI use negatively correlates with critical thinking, mediated by cognitive offloading (total effect b=-0.42) – the author concedes no causation, no longitudinal data. MIT/Kosmyna 2025 (arXiv:2506.08872, preprint, III, N=54 → only 18 by the 4th session): EEG shows accumulating “cognitive debt” – tiny sample, un-reviewed. Mechanistic precedent: Sparrow et al. 2011 “the Google effect” (Science 333:776)[R6] – when re-access is expected, people recall the info less and recall where to find it more.

反证侧 · 必须摆上Counter side · must be shown

最强因果反指向上The strongest causal evidence points up

最强的因果研究（脚手架式 AI 辅导）在方向上给出正效应——用对了，AI 显著提升学习；但精确量级尚无可核的元分析支撑，本卷只作方向性判断，可核读数以 Bastani 对照（R24）为准。"认知再分配，而非衰退"（再分配假说，理论立场·非实证）[R7]：现有研究多为小样本、无纵向、相关设计，且把"在被卸载任务上少花力气"误读成"泛化损伤"；AI 可能在把认知资源从可卸载任务重新分配到评估/综合/元认知。且 Sparrow 2011 在重复实验中未能稳定复现。The strongest causal studies (scaffolded AI tutoring) point positive in direction – used well, AI lifts learning significantly; but the precise magnitude has no checkable meta-analysis behind it, so this volume makes only a directional judgment, deferring the checkable reading to the Bastani control (R24). “Cognitive redistribution, not decline” (the redistribution hypothesis, a theoretical position · not empirical): existing studies are mostly small-sample, non-longitudinal, correlational, and conflate “less effort on offloaded tasks” with “generalized impairment”; AI may be reallocating cognitive resources from offloadable tasks toward evaluation / synthesis / metacognition. And Sparrow 2011 has failed to replicate robustly.

分水岭 · "怎么用"The hinge · “how you use it”

影响不固定，取决于用法The effect is not fixed; it depends on use

结构化提示 RCT（MDPI Data 2025, 10(11):172，n=150）：无引导地用 AI 助长卸载、不提升推理；结构化地用则显著降低卸载、提升批判推理与反思。结论：认知影响不是固定的，取决于你怎么用——这正给了"主动抵抗便利"一个可操作的支点。A structured-prompting RCT (MDPI Data 2025, 10(11):172, n=150): unguided AI use fosters offloading without improving reasoning; structured use significantly reduces offloading and improves critical reasoning and reflection. Conclusion: the cognitive effect is not fixed; it depends on how you use it – which is exactly the operable foothold for “actively resisting convenience.”

抵抗不是拒用 AI，是几个具体姿态——先想后问（自己先出一个假设，再让 AI 校验）、偶尔手算手写一遍、刻意晚一步求助（给犯错-纠正这个循环留出发生的时间）。有一条边界必须说清：合意困难只对有基础、能自己撑过去的学习者有用，没基础硬撑就是 Bjork 说的"不合意的困难"——抵抗要有分寸，不是自虐。

Resisting doesn’t mean refusing AI. It means a few concrete postures: think before you ask (form your own hypothesis first, then have AI check it), occasionally work it out by hand, and deliberately delay asking for help so the error-correction loop has time to happen. One boundary matters here: desirable difficulty only helps learners who already have enough footing to push through; without that footing it’s just Bjork’s “undesirable difficulty.” Resistance needs proportion, not self-punishment.

证伪条件Falsification condition

什么会让我们改判：一个纵向、随机分组的研究，显示重度 AI 协作者撤掉 AI 后不退步、甚至更强。先行指标是第一批跨年纵向数据出现——截至本版（2026-07）还是零。What would change our mind: a longitudinal, randomized study showing heavy AI-collaborators don’t regress once AI is removed, or even improve. The leading indicator is the first multi-year longitudinal data to appear; as of this edition (2026-07) that’s zero.

KEY FIG. L.2 / 合意困难曲线THE DESIRABLE-DIFFICULTY CURVE看懂：长期留存对难度的倒 U，AI 把你推向左端Read: long-term retention is an inverted-U over difficulty, and AI pushes you toward the left end

沿曲线看：横轴是学习当下的难度/努力，纵轴是长期留存与迁移——两者是倒 U，不是单调。左端（太容易）流畅却不留痕；右端（太难、无基础）只剩挫败，是 Bjork 所说的"不合意的困难"；峰值落在中间的合意困难带。AI 的便利默认持续把学习者推向左端——抵抗便利，本质就是主动把自己拉回到绿色带里。注意边界：把无基础者强行推到右端不是抵抗，是自虐。Along the curve: the x-axis is in-the-moment difficulty/effort; the y-axis is long-term retention and transfer – an inverted U, not monotone. The left end (too easy) is fluent but leaves no trace; the right end (too hard, no base) is pure frustration, Bjork’s “undesirable difficulty”; the peak sits in the productive-struggle band in the middle. AI’s convenience default keeps pushing the learner leftward – so resisting convenience is, at root, actively pulling yourself back into the green band. Note the boundary: forcing a no-base learner to the right end is not resistance but self-harm.

FIG. L.11 / 两个对立假说TWO RIVAL HYPOTHESES看懂：同一批数据，两条相反的解释，至今无人能裁决Read: one body of data, two opposite readings, no one can yet adjudicate

从图里读出：左右两块用的是同一批观测：AI 重度使用者在某些任务上表现更弱。萎缩假说把它读成能力净退化（证据为 [R4][R5][R10] 这类相关/短期信号）；再分配假说把它读成认知资源上移、被卸载的只是低阶任务（延伸心智 [R12] 与过渡相论 [R7] 同向）。两条主曲线都按本页图例画成点线：不是谁比谁更可信，是两个都只是竞争解释，谁都没被证实。关键在最底那条黑带：唯一能分开二者的实验——无 AI 在场时独立产出的纵向轨迹——至今没有数据。诚实的做法不是站队，而是把强命题降格为赌注，并去做那个实验。What the figure says: the two panels read the same observations – heavy AI users do worse on certain tasks. Atrophy reads it as net capacity decline (its evidence is correlational/short-term signals like [R4][R5][R10]); reallocation reads it as cognitive resources moving up while only low-level tasks are offloaded (the extended mind [R12] and a transition-phase view [R7] point the same way). Both main curves are drawn dotted per the page legend: not because one is less credible than the other, but because both are competing explanations, and neither is confirmed. The crux is the black bar at the bottom: the only experiment that could separate them – the longitudinal trajectory of independent output with no AI present – has no data yet. The honest move is not to take a side but to downgrade the strong claim to a bet and go run that experiment.

为什么这是保险，不是道德说教

Why this is insurance, not a moral lecture

关于 AI 与认知，很多讨论一开口就是道德腔："要自律""别偷懒"。我们不想走这条路——道德主张既没有证据撑腰，也没法落地成动作。这里其实是一道决策论的题：萎缩假说和再分配假说谁对，眼下分不出胜负；而"主动抵抗"这件事的成本低到几乎可以忽略（改改次序、留一段挣扎的时间，一天几分钟），它对冲的潜在损失却很大——如果萎缩假说是真的，赔进去的是判断力这种事后补不回来的东西。一个成本极低、最坏情形下能护住你、最好情形下也不吃亏（再分配假说成立时，它本身就是把资源导向高阶功能的动作）的选择，放在任何一套理性决策的框架里，都是该买的保险。

A lot of talk about AI and cognition slides straight into a moral register: be disciplined, don’t get lazy. We’d rather not go there, because a moral claim has no evidence behind it and gives you nothing to actually do. What’s really on the table is a decision-theory question. Nobody can yet say whether atrophy or redistribution is correct, and in that window, actively resisting costs almost nothing (reorder a step, keep a struggle window, a few minutes a day), while what it hedges against is large: if atrophy turns out true, what’s lost is judgment itself, a capacity you can’t buy back later. A move that’s cheap, protects you in the worst case, and costs you nothing in the best case (under redistribution it’s literally the move that steers resources upward) is, by any rational accounting, insurance worth buying.

这也是为什么我们坚持说"赌注"而不说"判决"：判决要等证据落定，保险只需要不确定性够大、下行够严重就该买。眼下的证据状态——强相关信号、强反证、零纵向数据——正是教科书式的"该买保险"局面。把抵抗便利叫成美德，会让它显得可选，意志力弱的人就可以豁免；叫它保险，它就是一件不管你自不自律都该做的理性的事。

That’s also why we keep calling it a bet, not a verdict: a verdict needs the evidence settled, while insurance only needs the uncertainty to be large enough and the downside serious enough. The current evidence (strong correlational signal, strong counter-evidence, zero longitudinal data) is a textbook case for buying it. Call resistance a virtue and it looks optional, something people short on willpower are excused from. Call it insurance and it’s a rational move you make regardless of how disciplined you are.
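The insurance argument can be sketched as a tiny decision table. This is a minimal illustration, not a measurement: the loss numbers are assumptions chosen only to encode "resisting is cheap, atrophy is expensive," and the names `LOSSES`, `max_regret`, and `pick_action` are hypothetical. To be conservative, resistance is charged a small cost even in the redistribution world.

```python
# Illustrative decision table. All numbers are assumed, not measured.
# Keys: action -> {world: loss}. Lower is better.
LOSSES = {
    "resist": {"atrophy": 1, "redistribution": 1},        # small cost in either world
    "do_nothing": {"atrophy": 100, "redistribution": 0},  # large loss only if atrophy is real
}


def max_regret(action: str) -> int:
    """Worst-case regret: how much worse this action is than the best action, per world."""
    return max(
        LOSSES[action][world] - min(LOSSES[other][world] for other in LOSSES)
        for world in LOSSES[action]
    )


def pick_action() -> str:
    """Pick the action whose worst-case regret is smallest (minimax regret)."""
    return min(LOSSES, key=max_regret)


if __name__ == "__main__":
    for action in LOSSES:
        print(action, "worst-case regret:", max_regret(action))
    print("choice:", pick_action())
```

Under these assumed numbers the choice does not depend on knowing which hypothesis is true, which is the point of calling it insurance. If the downside under atrophy were small, or resistance were expensive, the same table would give a different answer.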

抵抗的具体做法：不是少用，是改次序

What resistance actually looks like: reorder, don’t reduce

"主动抵抗便利"最容易被听成"少用 AI，甚至不用"——那是把它读成了苦行。它的意思是改变你和 AI 协作的次序与节奏，给犯错-纠正这个循环留出发生的空间。三个具体的、成本很低、现在就能做的动作：

“Actively resisting convenience” is easily heard as “use AI less, or not at all,” which reads it as asceticism. What it actually means is changing the order and rhythm of working with AI so the error-and-correction loop still has room to happen. Three concrete, cheap, do-it-today moves:

刻意先想后问：任何要交给 AI 的问题，先用两分钟写下你自己的假设或第一版答案，再喂给 AI，这一步保证你的认知先上场、AI 退回校验位（接第 7 节流向）。
Deliberately think before you ask: for any question you will hand to AI, spend two minutes writing your own hypothesis or first answer, then feed it to AI – this guarantees your cognition takes the field first and AI falls back to the verify seat (continuing Section 7’s flow).
刻意手算/手写一遍：对一项你想真正内化的技能，定期不用 AI、不用补全，从头手做一遍，哪怕慢、哪怕错，这是唯一能产生"能做"层留痕的动作。
Deliberately compute/write it out by hand once: for a skill you want to truly internalize, periodically do it from scratch with no AI and no autocomplete, even if it is slow, even if you get it wrong. This is the only action that leaves a “doing”-layer trace.
刻意延迟求助：遇到卡点，先给自己一个固定的挣扎窗口（比如十五分钟）再求助 AI，这个窗口就是合意困难，是让记忆痕迹被加固的地方。
Deliberately delay help: at a sticking point, give yourself a fixed struggle window (say fifteen minutes) before consulting AI – that window is the desirable difficulty, where the memory trace gets hardened.

三个动作的共同点：成本很低、不需要硬扛意志力，而且全都是在调次序，不是在砍用量。

What the three share: they’re cheap, they don’t require white-knuckling, and they all reorder rather than cut usage.

GPS 和计算器：先例只能证明机理，证明不了因果

GPS and calculators: precedent for mechanism, not for causation

我们常拿两个先例说事——GPS 弱化空间认知、计算器弱化心算——它们的证据边界也得老实交代，不然反过来会成为这卷的软肋。GPS 这条最常被引用：伦敦出租车司机的研究（Maguire 2000）[R9]发现，长期做空间导航的人后海马更大；另一项研究发现，习惯用 GPS 的人海马灰质更少（Dahmani & Bohbot 2020）[R10]。但这里有个绕不过去的方向问题：是 GPS 用多了导致海马萎缩，还是天生海马偏弱的人本来就更倾向依赖 GPS？横断面相关分不清这两种因果方向。计算器这条更弱：是个流传很广的说法，却缺一手的实证支撑，大多数引用都是泛泛而谈，我们干脆不把它当硬证据用。

We often reach for two precedents (GPS weakening spatial cognition, calculators weakening mental arithmetic), and their evidence boundaries deserve an honest accounting too, or they become a soft spot used against us. GPS is the more cited: the London taxi-driver studies (Maguire 2000) [R9] found long-term spatial navigators have a larger posterior hippocampus; a separate study found habitual GPS users have less hippocampal grey matter (Dahmani & Bohbot 2020) [R10]. But there’s a direction problem we can’t wave away: does GPS use cause hippocampal shrinkage, or do people with a naturally weaker hippocampus simply tend to lean on GPS more? Cross-sectional correlation can’t tell those apart. The calculator claim is weaker still: a widely repeated line that lacks a strong first-hand empirical anchor; most citations of it are hand-wavy, so we deliberately don’t treat it as hard evidence.

这两个先例能做的事，是给担忧提供机理上的合理性：把某种认知外包出去，确实可能改变内在能力，这个机理说得通。它们做不到的事，是给 AI 提供因果证明：AI 会不会导致萎缩，仍然是个开放的经验问题。老实分清"机理先例"和"因果判决"这两件事，正是这卷不滑向危言耸听的关键——先例能让担忧站得住脚，但只能撑到"赌注"这一档，撑不到"判决"。

What these two precedents actually do is give the worry mechanistic plausibility: outsourcing some form of cognition could plausibly change inner capacity. What they don’t do is offer causal proof about AI: whether AI causes atrophy remains an open empirical question. Honestly separating “mechanistic precedent” from “causal verdict” is the key move that keeps this from sliding into scaremongering: precedent lets the concern stand, but only as far as “a bet,” never as far as “a verdict.”

两个对立的假说，都摆在桌上

Two rival hypotheses, both on the table

诚实处理一桩悬案，意味着不能偷偷先预设答案。这里有两个机理上都站得住、也都有些证据的假说，对着同一批观测给出相反的解释——它们得并排放着。萎缩假说：长期把深度思考外包给 AI，被卸载的那部分能力像久不用的肌肉一样退化。支持它的是一堆相关/短期信号——Gerlich 调查了 666 人，发现频繁用 AI 和批判性思维显著负相关，中介变量正是认知卸载（总效应 b=−0.42）；MIT 那组用 AI 写作的人，EEG 上的"认知负债"在累积，78% 的人连自己刚写的句子都引用不出来；再加上 GPS-海马和 Google 效应这两个机理先例。

Handling an open question honestly means not quietly presupposing the answer. There are two hypotheses here, both mechanistically defensible, both with some evidence, giving opposite readings of the same observations; they have to stand side by side. The atrophy hypothesis: outsource deep thinking to AI long enough and the offloaded capacity decays like an unused muscle. What backs it is a pile of correlational, short-term signals: Gerlich surveyed 666 people and found frequent AI use significantly negatively correlated with critical thinking, with cognitive offloading as the mediator (total effect b=−0.42); the MIT group who wrote with AI showed accumulating “cognitive debt” on EEG, with 78% unable to quote a sentence they’d just written; add the GPS-hippocampus and Google-effect precedents on top.

再分配假说（这是个理论立场，不是实证结论）走另一条路：AI 把资源从可以外包的低阶任务转移到评估、综合、元认知这些更高阶的功能上，不是让认知能力净减少；现有研究测到的"在被卸载任务上变弱"，可能只是这次转移的过渡阶段，不是永久衰退。延伸心智（Clark & Chalmers）站在同一边：卸载是认知边界往外扩，不是能力往下掉。两个假说分歧的地方，落在一个目前谁都没有数据回答的问题上——没有 AI 在场时的迁移任务，重度协作者的独立表现，随时间是往下走还是往上走？

The redistribution hypothesis (a theoretical position, not an empirical finding) goes the other way: AI doesn’t net-degrade cognitive capacity, it shifts resources from offloadable low-order tasks toward higher-order functions: evaluation, synthesis, metacognition. What existing studies measure as “weaker on offloaded tasks” may just be the transition phase of that shift, not permanent decline. The extended mind (Clark & Chalmers) sits on the same side: offloading pushes the cognitive boundary outward, it doesn’t drain capacity. Where the two hypotheses actually disagree comes down to a question nobody has data for yet: on a transfer task with no AI present, does a heavy collaborator’s independent output fall or rise over time?

我们不假装知道哪个假说是对的。能说的只有一件事：在这个悬而未决的窗口里，主动抵抗便利是一份两头都对冲的低成本保险。萎缩假说要是真的，抵抗就避免了真实的能力流失；再分配假说要是真的，抵抗（先自答、留住循环、制造合意困难）恰好就是那套把资源往高阶功能引导的动作。这不是空谈：一项结构化提示的随机对照实验（n=150）用的是同一个模型，只是把交互从"直接要答案"换成"先自答、再要理由"，就把卸载的趋势扭转了过来。抵抗在两种世界里都不吃亏。面对一桩悬而未决的案子，"不管哪个假说成立都该做"正是唯一稳健的下注方式。

We don’t pretend to know which hypothesis is correct. All we can say is this: in this unsettled window, actively resisting convenience is cheap insurance that hedges both ways. If atrophy is true, resistance averts a real loss of capacity. If redistribution is true, resistance (answer first, keep the loop, build in desirable difficulty) is exactly the set of moves that steers resources toward higher-order work. This isn’t hand-waving: a structured-prompting RCT (n=150) used the same model and merely switched the interaction from “ask straight for the answer” to “answer first, then ask for the reasoning,” and that alone reversed the offloading trend. Resistance loses in neither world. Facing an open case, “do it regardless of which hypothesis wins” is the only sturdy way to place a bet.

脚手架和拐杖之间，只隔一个"次序"

Between scaffold and crutch there’s only “order”

认知卸载本身没有好坏之分：延伸心智论（Clark & Chalmers 1998）[R12]甚至说它是人类认知的常态，纸笔、地图、笔记本都是外挂的脑子。真正决定它是脚手架还是拐杖的，是它落在能力成长曲线的哪一段、上下文往哪个方向流，不是"用不用 AI"。脚手架的定义性特征是可以拆：维果茨基说这个词的时候，意思是支撑随能力增长逐步撤走，最后学习者自己站住。拐杖正相反：用得越久，被支撑的那块肌肉越弱，依赖越深，撤走的那天越来越远。同一个 AI，给先自己推导、再拿去校验的人用，是脚手架；给张口就要答案、从来不自己复现一遍的人用，就是拐杖。差别在次序和撤除的路径，不在工具本身——这正是第 7 节"先人后机"那条规则要锁住的东西。

Cognitive offloading is neutral in itself: the extended-mind thesis (Clark & Chalmers 1998) [R12] goes so far as to call it the normal state of human cognition; paper, maps, notebooks are all external brains. What actually decides scaffold versus crutch isn’t “AI or no AI” but where it sits on the capability-growth curve and which way the context is flowing. A scaffold’s defining trait is that it’s removable: when Vygotsky used the word, he meant support withdrawn gradually as ability grows, until the learner stands alone. A crutch is the opposite: the longer you use it, the weaker the supported muscle gets, the deeper the dependence, the further off the day of removal. The same AI is a scaffold for someone who derives first and asks it to check, and a crutch for someone who asks for the answer outright and never reproduces it. The difference is the order and the removal path, not the tool itself: exactly what Section 7’s “human-first” rule locks in.

所以"断裂点"的位置是撤除测试失败的那一刻，不是某个使用频率的门槛：当你答不出"拿掉这个支撑我还站得住吗"，脚手架已经在你没注意的时候变成了拐杖。这给了"抵抗"一个具体能用的扳手：定期做撤除演练，不是少用 AI——隔一段时间，刻意在没有 AI 的情况下把核心任务重做一遍，把这个测试当成例行体检（第 8 节仪表盘里"迁移"那条信号，测的就是这个）。

So the “break-point” is the moment the removal test fails, not a usage-frequency threshold: once you can’t answer “can I still stand if this support is pulled,” the scaffold has already turned into a crutch without your noticing. That gives “resistance” a concrete tool: run removal drills on a schedule rather than use AI less. Every so often, redo the core task with no AI at all, and treat that as a routine check-up (this is exactly what the “transfer” signal in Section 8’s dashboard measures).

系列接缝Series seam

这和工程卷的 trust-but-verify 是同一个结构：工程留一个校验环防代码出错，学习留一个认知环防能力萎缩：同一条"人不能被彻底移出循环"的纪律，作用在两个不同的面上。Same structure as engineering’s trust-but-verify: engineering keeps a verification loop to catch bad code, learning keeps a cognitive loop to catch atrophy. It is the same discipline, “don’t fully remove the human from the loop,” working on two different surfaces.

LEARN

REDRAW · 重画

REDRAW

机理 · 内核③特化

Mechanism · Kernel ③

认知脚手架即基础设施

The Cognitive Scaffold Is the Infrastructure

同一道题，AI 一秒能给你答案，你也可以先卡自己十五分钟再问——这一章讲的是个人脚手架，而它最反直觉的一根柱子就是后者：那点你亲手给自己设的阻力。

Same problem, two paths: AI answers in a second, or you make yourself sit with it for fifteen minutes first. This chapter is about the personal scaffold, and its most counter-intuitive pillar is the second path: friction you build in for yourself.

一句话In one line

工具栈默认把一切磨顺，光靠护栏挡不住这个趋势——你得把合意困难设计回流程里，靠流程守住阻力，而不是每天靠意志力硬撑。The stack defaults to smoothing everything out, and guardrails alone can’t hold that back; you have to design desirable difficulty back into the process. Hold the friction with process, not by white-knuckling willpower every day.

工程卷的③讲的是代码库可查询、人机同源。学习这一面的③同一个原理，但走得更远：脚手架不是个答案仓库，是重建认知结构的工地。这里我们讲原理，不推荐软件——一个个人知识库，不管你叫它 Markdown 还是别的什么，它的价值从来不在"功能"，而在三条底层原理：

The engineering volume’s ③ is “the codebase is queryable, same-source for people and machines.” The learning-side ③ carries the same principle further: the scaffold is a construction site for rebuilding cognitive structure, not an answer warehouse. We’ll state the principles here, not recommend software: a personal knowledge base, whatever you call it, gets its real value not from features but from three underlying principles:

人机同源：你和 AI 读同一份纯文本——它可被你重读、被 AI 查询，认知留痕不锁进私有黑箱。
Same-source for people and machines: you and AI read the same plain text – re-readable by you, queryable by AI, cognitive traces not locked in a proprietary black box.
可 diff / 可版本：你能看见自己的理解怎么变的。错题反思库的价值正在这条——它记录的不是答案，是你曾经错在哪、以及为什么。
Diffable / versioned: you can see how your understanding changed. This is exactly the value of an error-and-reflection log – it records not the answer but where you were once wrong, and why.
刻意保留的难度：脚手架里要内建"合意困难"——间隔重复、交错练习、用提取（自测）代替呈现（重读）。Bjork（1994；2011；2020 JARMAC 9(4):475）：那些减慢表面学习的难度，反而提升长期保持与迁移；那些加速表面学习的，常常损害留存。这是承重锚——数十年可复现、与 AI 无关。
Difficulty retained on purpose: the scaffold builds in “desirable difficulty” – spaced repetition, interleaving, retrieval (self-testing) in place of presentation (rereading). Bjork (1994; 2011; 2020 JARMAC 9(4):475): the difficulties that slow apparent learning improve long-term retention and transfer; those that speed apparent learning often harm retention. This is a load-bearing anchor – decades-replicable and AI-independent.

这就是抵抗便利落到工程上的样子：不靠意志力硬扛，而是把阻力焊进工具流。底下还压着一条更硬的道理——有些过程的价值恰恰来自慢。记忆巩固需要时间和睡眠（间隔效应、睡眠期回放，Stickgold、Diekelmann & Born 等人的工作，证据分级 Ⅱ），这是 AI 压缩不了的物理时间。脚手架该顺着这条时间常数去设计，不是跟它对着干。

This is what resisting convenience looks like once it’s built into engineering: not gritting your teeth, but welding the friction into the toolflow. Underneath sits a harder truth: some processes get their value precisely from being slow. Memory consolidation needs time and sleep (the spacing effect and replay during sleep; work by Stickgold, Diekelmann & Born and others; evidence grade II), physical time AI cannot compress. The scaffold should be designed along that time constant, not against it.

FIG. L.5 / 认知脚手架即基设SCAFFOLD AS INFRASTRUCTURE看懂：三根承重柱托起一个可拆的认知工地Read: three load-bearing pillars hold up a removable construction site

三根柱子怎么读：个人认知脚手架不是一个软件，是三根承重柱托起的一块工地：错题反思库（焊测试效应）、人机同源可 diff 的知识库（让理解的变化看得见）、以及第三根最反直觉的柱子——刻意制造的难度。前两根别卷也有近亲，第三根是学习卷独有：当工具栈默认抹平一切阻力，护栏就不够了，你得自己往里加合意困难。三根柱子都坐在一条压不掉的时间常数上。Reading the three pillars: a personal cognitive scaffold is not a piece of software but a site held up by three pillars: an error-reflection log (welding in the testing effect), a same-source diffable knowledge base (making changes in understanding visible), and the third, most counter-intuitive pillar – deliberately inserted friction. The first two have cousins in the other volumes; the third is unique to learning: when the stack smooths away all friction by default, guardrails are not enough – you must add desirable difficulty yourself. All three rest on a time constant that cannot be compressed.

FIG. L.9 / 卸载的临界点THE OFFLOADING BREAK-POINT看懂：同一个 AI，过了临界点就从脚手架翻成拐杖Read: the same AI flips from scaffold to crutch past one tipping point

沿曲线看：同一个工具，净作用不是恒正的。依赖浅时它把你撑高，是脚手架；越过临界点后曲线穿过零线转负——它开始替你做你本该自己长出来的那部分，成了拐杖。难点在于临界点不写在工具上，要靠一个动作去测：可撤除性（沿用 [R17] 维果茨基对脚手架的定义，支持必须可逐步撤回）。今天就把 AI 撤掉，产出只是略降，你还在脚手架一侧；若直接塌掉，你已过点。这也正是 [R19] METR 那条"自感更快、实测更慢"的合成自信之所以危险——它让你以为还在左侧，其实已滑过临界点。Along the curve: the same tool’s net effect is not constantly positive. While reliance is shallow it lifts you – a scaffold; past the break-point the curve crosses zero and turns negative – it starts doing for you the very part you were supposed to grow yourself, becoming a crutch. The hard part is that the break-point is not printed on the tool; you locate it with one act: removability (following [R17] Vygotsky’s definition of a scaffold – support that must be withdrawable in steps). Pull the AI today; if output merely dips, you are still on the scaffold side; if it collapses, you have crossed. This is exactly why [R19] METR’s “felt faster, measured slower” synthetic confidence is dangerous – it lets you believe you are on the left while you have already slipped past the point.

把抵抗从意志层搬到流程层

Move resistance from willpower to process

这一节最该记住的一句话是：抵抗便利别靠自律，靠流程。理由很实际——人的意志力是有限的、会耗尽的资源，而便利那股默认的引力二十四小时不知疲倦地在场。两者长期拉锯，意志力必输。

The one line to remember here: don’t resist convenience with self-discipline, resist it with process. The reason is practical: willpower is a scarce, depletable resource, while convenience’s default pull is there around the clock, never tiring. In a long fight, willpower loses.

这也是为什么我们的处方总落在"建脚手架"而不是"再自律一点"——这是一次搬家：把抵抗从每天都要重打一仗的意志层，搬到一次性设计好、之后自己运转的流程层。错题反思库里那个固定的"下次复看日期"，把"我得记得复习"这种靠记性的事，变成一个到期就提醒你的流程；交互模板里那句"先自答、要理由、留反思"，把"我得忍住别直接要答案"这种靠克制的事，变成默认就这么问的流程。脚手架一旦搭好，先想后问、间隔复看、留痕回流就成了阻力最小的那条路——你不用每天动用意志去选它，顺着轨道走就是了。所以我们不开"要更努力"的空头支票，只开"把努力一次性焊进工具流"这一张工程处方：能长久的抵抗，从来不是靠咬牙，是靠设计。

That’s also why the prescription keeps landing on “build the scaffold” rather than “be more disciplined”: it’s really a relocation. It moves resistance from the willpower layer, where you refight the same battle daily, to the process layer, designed once and running on its own after that. The fixed “next review date” in the error-reflection log turns “I have to remember to review,” which leans on memory, into a process that reminds you on schedule. The interaction template’s “answer first, ask for reasons, leave a reflection” turns “I have to resist asking straight for the answer,” which leans on restraint, into the default way you ask. Once the scaffold is built, thinking before asking, spaced review, and letting traces flow back become the path of least resistance; you’re not choosing it daily by willpower, you’re just following the track you laid. So we don’t write the blank check of “try harder.” We write the one prescription that’s actually engineering: weld the effort into the toolflow once. Resistance that lasts was never about gritting your teeth. It’s about design.
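The fixed “next review date” can be sketched in a few lines. This is one possible shape under stated assumptions, not a prescribed tool: the `Entry` fields, the `INTERVALS` schedule (expanding gaps in days), and the one-line text format are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed expanding review gaps, in days. Pick your own schedule.
INTERVALS = [1, 3, 7, 14, 30]


@dataclass
class Entry:
    mistake: str       # where you were wrong
    why: str           # why you were wrong
    reviews_done: int  # number of reviews completed so far
    next_review: date  # the fixed "next review date"


def schedule_next(entry: Entry, today: date) -> Entry:
    """After a review done on `today`, push the next review further out."""
    step = min(entry.reviews_done, len(INTERVALS) - 1)
    return Entry(entry.mistake, entry.why, entry.reviews_done + 1,
                 today + timedelta(days=INTERVALS[step]))


def due(entries: list[Entry], today: date) -> list[Entry]:
    """Entries whose review date has arrived: the process reminds you, not your memory."""
    return [e for e in entries if e.next_review <= today]


def to_line(entry: Entry) -> str:
    """Serialize as one plain-text line so the log stays readable and diffable."""
    return f"{entry.next_review.isoformat()} | {entry.mistake} | {entry.why}"


if __name__ == "__main__":
    first = Entry("stalled at the second JOIN", "never wrote one unaided", 0, date(2024, 1, 1))
    reviewed = schedule_next(first, date(2024, 1, 1))
    print(to_line(reviewed))
```

The design choice that matters is that the date lives in the entry itself: deciding when to review happens once, at write time, and the daily question shrinks to "what is due today?"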

复利：为什么脚手架越用越值钱

Compounding: why the scaffold gets more valuable the longer you use it

"脚手架即基设"的复利性质，不举个具体场景就只是句好听话。想象你的错题反思库跑了一年。第一个月它只是一堆零散的错题，价值差不多是线性的——记一条多一条。可条目一多，新的复利来源就冒出来了：你会发现某些错误反复以不同面目出现（迁移钩子把它们串成了一个模式），于是一条新错题不再只是"加一"，它会激活你对一整类错误的重新认识，让旧条目跟着一起升值。

The compounding side of “scaffold as infrastructure” is just a nice phrase unless you picture it concretely. Say your error-reflection log has run a year. In month one it’s just scattered errors, value roughly linear: log one, gain one. But once entries pile up, a new source of compounding shows up: you start noticing certain mistakes recurring in different disguises (transfer hooks stringing them into a pattern), so a new entry isn’t just “plus one” anymore. It reactivates your understanding of an entire class of error, and the old entries gain value along with it.

再往后，整个库变成一面镜子：你能 diff 出自己三个月前、半年前的心智模型，看见有些当时的盲区已经消失，有些顽固的偏差还在——这种"看得见自己怎么变的"能力，任何一条单独的记录都给不了，它是整体涌现出来的。这正是基础设施和工具的分界：工具的价值是固定的，基设的价值随使用非线性上涨。这也是为什么我们反复强调"起步要小，但要一直做"——复利的前提是时间，一个建得很宏大却三天就扔下的库，永远等不到复利发生的那一刻。

Further along, the whole log becomes a mirror: you can diff your own mental model from three or six months back, see which blind spots have closed and which stubborn ones haven’t. That capacity to see your own cognitive change is something no single entry can give you; it’s emergent from the whole. That’s exactly where infrastructure differs from a tool: a tool’s value is fixed, infrastructure’s value grows non-linearly with use. It’s also why we keep saying “start small, but keep going”: compounding needs time, and a grand log abandoned after three days never reaches the moment it would compound.

合意困难：最反直觉的一根柱子

Desirable difficulty: the most counter-intuitive pillar

前两根柱子（反思库、同源知识库）在别的方法论里都有近亲，唯独第三根——故意往工具流里加难度——是学习这一面独有的，也最不符合产品直觉。几乎所有学习工具的默认进化方向都是"更顺、更省力、更自动"；我们反过来主张：在一个默认省力的工具栈里，护栏（防错、提醒）不够，你还得主动把合意困难设计进流程。这不是一句态度，落到实处是三组具体替换：用提取（自测、闭卷重做）换掉呈现（重读、被讲解）；用间隔（拉开复习）换掉集中（一次学完）；用交错（混着不同类型练）换掉分块（同类扎堆刷）。

The first two pillars (the reflection log, the same-source knowledge base) have cousins elsewhere. Only the third, deliberately adding difficulty into the toolflow, is unique to this volume, and it cuts most against product instinct. Nearly every learning tool defaults toward smoother, easier, more automatic; we argue the opposite: in a stack that defaults to easy, guardrails aren’t enough, you have to actively design desirable difficulty into the process. Not as an attitude, but as three concrete swaps: retrieval (self-testing, closed-book redo) instead of presentation (rereading, being explained to); spacing (spread-out review) instead of massing (learning it all at once); interleaving (mixed problem types) instead of blocking (grinding one type).

这三组替换都有 Ⅱ 级证据撑腰（Bjork 那一套合意困难的研究、Roediger & Karpicke 的测试效应），共同点是：都让当下更难、更慢、更不舒服，却让长期的留存和迁移更好。立起这根柱子，意味着接受一条反产品直觉的设计准则：好的学习脚手架，该在对的地方故意制造摩擦，而不是把所有摩擦都磨平。

All three carry grade-II evidence (Bjork’s body of work on desirable difficulty, Roediger & Karpicke’s testing effect) and share one trait: each makes the present harder, slower, less comfortable, and makes long-term retention and transfer better. Raising this pillar means accepting a design rule that cuts against product instinct: a good learning scaffold should deliberately create friction in the right places, not sand every bit of it away.

人机同源：为什么是纯文本，不是更聪明的笔记 App

Same-source: why plain text, not a smarter notes app

"人机同源"这条原理常被听成"去用某个支持 AI 的笔记软件"，那是把一条原理降级成了选购品。它的意思比工具深得多：你和 AI 该读写同一份东西——你能直接重读、也不被任何私有格式锁住的载体。为什么这条是承重的，从反面看最清楚：如果你的认知留痕沉淀在一个把内容锁进私有黑箱、只能通过它自己的 AI 接口访问的系统里，那么你和"过去的自己怎么想的"之间，就多出一个不受你控制的中间商。

“Same-source for people and machines” is often heard as “go use some AI-capable notes app,” which demotes a principle into a shopping decision. What it actually means runs deeper than tools: you and AI should read and write the same carrier, one you can reread directly and that no proprietary format locks away. Why this bears weight shows up clearest from the opposite case. If your cognitive traces settle into a system that locks your content in a private black box, reachable only through its own AI interface, then between you and your own past thinking now sits a middleman you don’t control.

同源原理要排除的正是这个中间商：纯文本（Markdown 之类）的价值在于它把你和你自己认知轨迹之间的距离压到了零，不在于"简单"——没有中间商、没有锁，任何工具都能读，能 diff，能上版本控制。这和工程卷"代码库人机同源"是同一条原理在两个地方的应用：让人和机器在同一份不被锁住的事实上协作，别各拿一份会慢慢漂移走样的副本。

That’s exactly the middleman the same-source principle exists to cut out: the value of plain text (Markdown and the like) is not “simplicity” but that it collapses the distance between you and your own cognitive trajectory to zero. No middleman, no lock, readable by any tool, diffable, versionable. This is the same principle as the engineering volume’s “same-source codebase,” applied in a second place: let people and machines work from one unlocked set of facts, instead of each holding a copy that slowly drifts.
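“Diffable” is concrete enough to show. Here is a minimal sketch using Python’s standard `difflib` to compare two versions of a plain-text note; the helper name `note_diff` and the sample note contents are made up for illustration, and in practice a version-control system would do the same job.

```python
import difflib


def note_diff(old: str, new: str,
              old_label: str = "earlier", new_label: str = "now") -> str:
    """Return a unified diff between two versions of a plain-text note."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical note contents, three months apart.
    three_months_ago = "Retention\nRereading my notes is enough to remember them."
    today = "Retention\nSelf-testing holds better than rereading."
    print(note_diff(three_months_ago, today, "3 months ago", "today"))
```

Nothing here depends on a particular app: any tool that can read text can produce this view of how an understanding changed, which is the property the principle protects.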

"基础设施"在这里到底指什么

What “infrastructure” precisely means here

说脚手架"即基础设施"不是修辞。基础设施有三个定义性特征，认知脚手架一条条都对得上——这才是这个词真正承重的地方：

Calling the scaffold “infrastructure” isn’t rhetoric. Infrastructure has three defining traits, and the cognitive scaffold matches every one; that’s where the word actually carries weight:

在背景中运转，不占注意力：成熟的反思库像自来水，需要时打开就有，不需要时不会每天逼你做决策，这正是"把阻力做进流程、而非靠意志硬扛"的意思。
It runs in the background without consuming attention: a mature reflection log is like running water, there when opened, not forcing a daily decision when not – exactly what “build friction into the process rather than white-knuckle it” means.
它有复利：每多一条错题、每多一次留痕，整个库的价值非线性上升，因为新条目会和旧条目连成迁移钩子（第 9 节）；这与组织卷把上下文做成"可查询基设"是同一条复利逻辑，只是这里的资产是你自己的认知轨迹。
It compounds: each added error, each added trace lifts the whole base’s value non-linearly, because new entries link to old ones as transfer hooks (Section 9); this is the same compounding logic by which the org volume makes context a “queryable infrastructure,” only here the asset is your own cognitive trajectory.
它定义了什么是默认路径：基础设施最大的力量是改变默认行为而非依赖自律——一旦"先自答再问 AI"被脚手架固化成默认动线，抵抗便利就顺流而下，不再是每天要打的硬仗。
It defines what the default path is: infrastructure’s greatest power is changing default behavior rather than relying on willpower – once “self-answer before asking AI” is hardened by the scaffold into the default route, resisting convenience stops being a daily uphill fight and becomes going with the current.

这第三点回答了第 4 节留下的一个难题：如果抵抗便利全靠意志力，它注定输给一个默认省力的工具栈——人的自律是稀缺且会耗尽的资源。脚手架即基设，真正主张的是：用工程把抵抗从意志层挪到流程层。你不是每天决定要不要先想后问，你是建好一个让"先想后问"成为阻力最小路径的环境，让环境替你把这份坚持维持下去。这就是为什么处方落在"建脚手架"，不是"更自律"。

This third point answers a hard problem Section 4 left open: if resisting convenience runs purely on willpower, it’s doomed to lose to a stack that defaults to easy: human self-discipline is scarce and it depletes. What scaffold-as-infrastructure actually claims is: use engineering to move resistance from willpower to process. You’re not deciding every day whether to think before you ask; you build an environment where thinking first is the path of least resistance, and let the environment hold that discipline for you. That’s why the prescription lands on “build the scaffold,” not “be more disciplined.”

检验信号Test signal

反思库的回流使用率（你真回去重读错题吗）、迁移测试通过率，以及"主动设阻力"的习惯化：阻力从靠意志变成靠流程。The return-use rate of the reflection log (do you actually reread your errors), transfer-test pass rate, and the habituation of “adding friction on purpose”: friction shifting from willpower to process.

LEARN

REDRAW · 重画

REDRAW

重画 · 反转收束 + 适用边界

Redraw · Inversion + Scope

知道什么不该让 AI 做

Knowing What Not to Let AI Do

别的卷都在教你怎么把执行交出去。这一章反着问：哪些认知，碰都不该碰交出去这个念头？

The other volumes teach you how to hand execution off. This one asks the reverse: which cognition should you never even consider handing over?

一句话In one line

整卷收成一个动作：划一条线。线外放心交给 AI；线内是判断力、价值感、直觉、品味——一外包就萎缩，所以刻意攥在自己手里。The whole volume comes down to one act: draw a line. Outside it, hand freely to AI. Inside it sit judgment, value sense, intuition, taste: outsource them and they atrophy, so you hold onto them on purpose.

把整卷收成一句话，就是划线。线的一侧，放手交给 AI，省下的力气是真省下了。线的另一侧，是一组我们刻意不外包的认知，它们有一个共同的特征：外包它，它就会萎缩——而这几样恰好就是别的卷里"人不可替代"的那个来源。这就是本卷跟下游几卷正面接上的地方。

Collapse the whole volume into one line: draw the line. On one side, AI does the work, and the effort you save is genuinely saved. On the other side sits a set of cognition we keep deliberately un-outsourced, and they all share one trait: outsource it and it atrophies, and these happen to be exactly the source of human irreplaceability the other volumes lean on. This is where this volume connects directly to what’s downstream.

交给 AI · 可充裕Hand to AI · abundance-able

查事实、检索、汇总
Looking up facts, retrieval, summarizing
生成初稿、样例、讲解
Generating drafts, examples, explanations
范式内的、可机检的推导
In-paradigm, machine-checkable derivation
重复性、低价值负载的执行
Repetitive, low-value-load execution

留给人 · AI 止步线内Keep with humans · inside the stop-line

品味——设计卷取用它，本卷养成并守护它
Taste – the design volume consumes it; this volume grows and guards it
价值感知 / 判断力——组织卷人本主线的认知前提
Value perception / judgment – the cognitive precondition of the org volume’s human through-line
直觉 / 深度思考——质疑命中率的来源（第 3 节）
Intuition / deep thinking – the source of challenge hit-rate (Section 3)
"什么值得知道 / 做"的价值判断（只能人来定）
The constitutive value judgment of “what is worth knowing / doing” (only a human can make it)

系列接缝 · 品味的两侧Series seam · the two sides of taste

品味有两侧：设计卷用它对抗 slop（应用侧），本卷养成并守护它（养成侧）；学习生产品味，设计应用品味。Taste has two sides: the design volume applies it against slop (application), this volume grows and guards it (cultivation); learning produces taste, design applies it.

适用边界 · 在域Scope · in-domain

谁适用：要把一项技能或认知能力内化进自己的个人——开发者、研究者、设计者，任何在和 AI 协作时还想握着主导权的人。绿地和改造都适用：绿地是从零搭脚手架、划止步线；改造是先体检哪些能力已经被悄悄外包出去、其实不该外包，再重建它的犯错-纠正循环和合意困难。

Who it fits: individuals trying to internalize a skill or cognitive capacity: developers, researchers, designers, anyone who wants to stay in the driver’s seat while collaborating with AI. Both greenfield and transformation apply: greenfield builds the scaffold and stop-line from zero; transformation first audits which capacities have quietly been outsourced when they shouldn’t have been, then rebuilds their error-correction loop and desirable difficulty.

适用边界 · 出域Scope · out-of-domain

谁不在射程内：组织级培训体系、学校教育制度改革、考核与认证设计——这些有它们自己的约束（规模、公平、问责），不能直接照搬个人认知层的结论。本卷只往那边接一句：一个组织若真想让人"回归意义"，就得保证成员的判断力不萎缩——但怎么在组织尺度上做到这件事，是另一本书要写的。我们不为了显得普适去假装覆盖了它。

Who’s out of range: organization-wide training systems, school reform, assessment and certification design. These carry their own constraints (scale, fairness, accountability) and can’t just borrow conclusions from individual cognition. This volume connects to them in exactly one sentence: an organization that wants people to “return to meaning” has to keep its members’ judgment from atrophying, but how to do that at organizational scale is a different book. We won’t fake coverage just to look universal.

FIG. L.6 / 止步线判据THE STOP-LINE CRITERION看懂：两维定四格，只有一格是危险区Read: two dimensions, four cells, one true danger zone

定位你的格子。两个维度：横轴是可外包性（AI 能不能稳定代劳），纵轴是“它萎缩会不会伤及你做判断的根基”。四格里只有右上“高可外包 × 又伤判断根基”是危险区：诱人（AI 做得了）又致命（外包就萎缩，且萎缩反噬判断）。止步线就是这一格的边界。它会随模型变强向左推（越来越多事进入“AI 做得了”），但判据本身不过时，这就是为什么本卷给判据而不给清单。纵轴本身没有免于质疑：Q-LRN-01（下文）问的正是“萎缩会不会伤判断根基”这把尺子，在 AI 永不离场的条件下是否还该这样量。Locate your cell: two dimensions – x is outsourceability (can AI do it stably), y is constitutiveness (does its atrophy harm the root of your judgment). Of the four cells only the top-right, high-outsourceability × high-constitutiveness, is the true danger zone: tempting (AI can) and lethal (outsourcing atrophies it, and the atrophy backfires on judgment). The stop-line is that cell’s border. It drifts left as models improve (ever more enters “AI can do it”), but the criterion does not go out of date – which is why the volume gives a criterion, not a list. The y-axis itself isn’t exempt from challenge: Q-LRN-01 below asks whether this very ruler, atrophy harming the root of judgment, should still measure the same way once AI never leaves the room.

把判据跑一遍：三个真实的判断

Running the criterion: three real judgments

判据不跑一遍就是空的。举三个结论不同的例子，看两个维度怎么一起把答案定下来：

A criterion means nothing until you run it. Three examples with different verdicts, showing how the two dimensions pin down the answer together:

让 AI 生成项目的样板代码。可外包性高（AI 稳定做得了），也不伤判断根基（样板不是你判断力的根基，记不住也没损失）——落在"放心外包"格，放手。
Having AI generate the project’s boilerplate. High outsourceability (AI does it reliably), low constitutiveness (boilerplate isn’t the root of your judgment, forgetting it costs nothing); lands in “outsource freely,” let go.
让 AI 替你把一个领域问题先做初步分解。可外包性一样高，但这次伤判断根基：问题分解正是第 3 节"提问能力"的内核，长期外包它，你识别真问题的能力就会被掏空。这落进右上"便利陷阱"格，止步线内。处置办法：自己先分解，再让 AI 补你漏掉的角度。
Having AI do a first-pass breakdown of a domain problem. Outsourceability is just as high, but this time it’s constitutive: problem decomposition is the core of the asking capacity from Section 3; outsource it long enough and your ability to spot real problems hollows out. This lands in the top-right “convenience trap” cell, inside the stop-line. What to do: break it down yourself first, then let AI fill in the angles you missed.
让 AI 帮你算一道你早就烂熟的算术。可外包性高，不伤判断根基（早就内化了，撤除测试稳过），落在"放心外包"，而且不用愧疚——这正是 Bjork 的边界：对一项已经牢固的能力硬加阻力，是不合意的困难，不是美德。
Having AI do arithmetic you’re already fluent in. High outsourceability, low constitutiveness (long since internalized, sails through the removal test); lands in “outsource freely,” guilt-free. This is Bjork’s boundary: forcing friction onto an already-solid capacity is undesirable difficulty, not virtue.

三个例子的差别全落在"伤不伤判断根基"这一维上——可外包性几乎都高，这正是 AI 时代的特征。真正决定交不交的，从来是那一件事萎缩了会不会伤到你的判断根基。把任何一项你正考虑外包的能力丢进这两个维度，答案就出来了——这也正是 INSTRUMENT 11/12 在做的事。

All three differ on exactly one axis: whether atrophy damages the root of your judgment. Outsourceability is high across the board; that’s the AI era’s signature. What actually decides whether to hand something off is whether losing it would cost you your judgment. Drop any capacity you’re weighing into these two dimensions and the answer falls out, which is exactly what INSTRUMENT 11/12 do.
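The two-dimension criterion can be written down as a small function. This is a sketch under assumptions: the function name `classify` and the boolean inputs are illustrative simplifications (the text treats both axes as matters of degree), and the labels for the two cells the examples never land in are hypothetical placeholders, since only “outsource freely” and “convenience trap” are named above.

```python
def classify(outsourceable: bool, constitutive: bool) -> str:
    """Place a capacity in one of the four cells of the stop-line criterion.

    outsourceable: can AI do it reliably?
    constitutive:  would its atrophy harm the root of your judgment?
    """
    if outsourceable and constitutive:
        return "convenience trap"    # inside the stop-line: do it yourself first
    if outsourceable:
        return "outsource freely"
    if constitutive:
        return "keep (placeholder)"  # hypothetical label: AI cannot do it reliably anyway
    return "low stakes (placeholder)"  # hypothetical label


if __name__ == "__main__":
    # The three worked examples from the text, with judgments as stated there.
    examples = {
        "generate project boilerplate": (True, False),
        "first-pass decomposition of a domain problem": (True, True),
        "arithmetic you are already fluent in": (True, False),
    }
    for task, (outsourceable, constitutive) in examples.items():
        print(f"{task}: {classify(outsourceable, constitutive)}")
```

The function is trivial on purpose: the hard work is the honest answer to the second argument, which no code can supply.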

Q-LRN-01 · 如果 AI 从此不再离场，“拿掉 AI 还做得出来”还算不算学会的标准？一个后端工程师三年没手写过一行 SQL——需求丢给 AI，它写查询、建索引、连执行计划都附上，跑得又快又对。按本节的判据，他早该在撤除测试上亮红灯：合上 AI 从头写，他卡在第二个 JOIN。可他有一句很难驳的反问——我这辈子的活，AI 都在手边，你非在一个永不发生的情境里考我，考的是能力，还是考古？

Q-LRN-01 · If AI never leaves the room again, does “can you still do it with AI removed” still count as the standard for having learned? A backend engineer hasn’t hand-written a line of SQL in three years: he hands the requirement to AI, which writes the query, builds the index, even attaches the execution plan, fast and correct. By this section’s criterion he should have failed the removal test long ago: close the AI, ask him to write it from scratch, and he stalls at the second JOIN. But he has a retort that is hard to rebut: for the rest of my working life AI will be right there; if you insist on testing me in a situation that will never occur, are you measuring capability or doing archaeology?

标准 A（本卷一路在用的）：能力 = 撤掉 AI 后仍能独立做出来。它最强的地方不在怀旧，在它锁住了别的标准锁不住的东西——你判断 AI 对不对的资格。第二个 JOIN 你自己写不出，你就看不出 AI 那条查询会不会在千万行表上全表扫描；能做，是质疑命中的前提（第 3 节）。它的代价一样实在：把“独立”抬成默认的好，可能误伤一种更高阶的能力——有人早不写 SQL，却能一眼看出 AI 的执行计划烂在哪、该往哪逼它重写；撤除测试给他也判红灯，而他也许才是驾驭工具的那个。

Standard A (the one this volume has been using): capability = you can still produce it independently once AI is removed. Its strength isn’t nostalgia; it locks down what no other standard can – your standing to judge whether AI is right. If you can’t write the second JOIN yourself, you can’t see whether AI’s query full-scans a ten-million-row table; being able to do it is the precondition for a challenge that lands (Section 3). Its cost is just as real: making “independent” the default good can wrongly flag a higher-order capacity, such as someone who long ago stopped writing SQL yet spots at a glance where AI’s execution plan is rotten and how to push it to rewrite. The removal test marks him red too, and he may be the one who actually commands the tool.

标准 B（最强的反方）：能力 = 在 AI 始终在场这个真实条件下，把活干到多好、多远。它最强的地方是不回避现实：既然工作现场 AI 永不缺席，用一个撤除后的反事实去度量，就像非要现代数学家丢掉符号记法去证定理——那不是更纯的能力，是更低效的折磨（这正是第 16 节反赌注的锋刃）。它的代价也很硬：一旦“和 AI 一起能做到”成了唯一标准，你就丢了那把分辨“我在驾驭它”和“我在被它牵着走”的尺——两者产出一模一样，只在撤掉 AI 的一瞬才现形。取消撤除，你连自己有没有被空心化都测不了。

Standard B (the strongest counter): capability = how well and how far you get the work done under the real condition that AI is always present. Its strength is refusing to dodge reality: since AI is never absent from the actual worksite, measuring by a post-removal counterfactual is like demanding a modern mathematician prove a theorem with symbolic notation stripped away – not a purer capacity but a less efficient torment (exactly the edge of Section 16’s counter-bet). Its cost is hard too: make “can do it with AI” the only standard and you lose the ruler that tells “I command it” from “it leads me by the nose”: identical in output, the difference surfacing only the instant AI is pulled. Abolish removal and you can’t even measure whether you’ve been hollowed out.

两个标准撞出的第一个硬问题，谁都绕不过：那点让你在开口问 AI 前多撑十五分钟的困难，是必要的，还是被浪漫化的低效？标准 A 会滑向自虐——把每一次不用 AI 都当美德；标准 B 会滑向偷懒——把每一次省下的挣扎都记成进步。合意困难那条倒 U（FIG L.2）给了一条界：只有落在“撤除后你仍能把它重建起来”射程内的困难才必要，出了这个射程，坚持手做就只是折磨。

The first hard question the collision forces, which neither side escapes: is the difficulty that makes you sit fifteen more minutes before asking AI necessary, or romanticized inefficiency? Standard A slides toward self-punishment: every AI-free rep counted as virtue; Standard B slides toward sloth: every skipped struggle counted as progress. The inverted-U of desirable difficulty (FIG L.2) draws one line: only difficulty within reach of “you could still rebuild it after removal” is necessary; past that reach, insisting on doing it by hand is just torment.

暂定回答 · Q-LRN-01

Working answer · Q-LRN-01

我们此刻仍偏向标准 A，但把它收窄了：撤除测试测的不是“你能不能不靠 AI 干活”，是“AI 交来的这一版，你还有没有资格判它对不对”。所以欠一次撤除演练的，只有那些你需要保住质疑权的领域，不是每一个领域。什么会让我们改判：若纵向数据显示重度协作者撤掉 AI 后并不退步、质疑命中率还在涨，那“独立做出来”就确实是个假目标，该问的也就不再是“拿掉 AI 你还做得出来吗”，而是“哪些认知即便 AI 能稳定代劳，也值得留一条撤得回来的路”。

We still lean toward Standard A for now, but we’ve narrowed it: the removal test measures not “can you work without AI” but “do you still have standing to judge whether the version AI handed you is right.” So the only domains that owe you a removal drill are the ones where you need to keep the right to challenge – not every domain. What would change our mind: if longitudinal data shows heavy collaborators don’t regress once AI is removed and their challenge hit-rate keeps climbing, then “produce it independently” really is a false target, and the real question stops being “can you still do it with AI gone” and becomes “which cognition is worth keeping a path back to, even when AI can do it reliably.”

品味的两侧：学习生产它，设计用它

Two sides of taste: learning grows it, design uses it

"品味"这个词在设计卷和本卷都出现，但站的是它的两个不同侧面，这条接缝值得说精确，不然两卷读起来像在重复彼此。设计卷处理的是品味的应用侧：在一个生成随手可得、slop 是默认结果的环境里，品味是那把稀缺的尺子，决定从一堆 AI 产出里挑哪个、改哪里、退回哪个——它假设品味已经在那个做判断的人身上了。本卷处理的是品味的养成侧：品味从哪儿来？它不是天生的，也没法外包给 AI 去学，它是大量亲身经历过的"好/不好"判断，在某个领域里沉淀出来的直觉——正属于第 6 节止步线之内那组"一外包就萎缩"的认知。

“Taste” shows up in both the design volume and this one, but each stands on a different side of it. The seam is worth stating precisely, or the two volumes read as if they repeat each other. The design volume handles taste’s application side: in an environment where generation is cheap and slop is the default output, taste is the scarce ruler that decides which of a pile of AI outputs to pick, what to fix, what to reject. It assumes the taste is already there in the person judging. This volume handles taste’s cultivation side: where does taste come from? It isn’t innate, and it isn’t something you can outsource to AI to acquire; it’s the intuition that a large number of lived “good/not good” calls deposit in a domain, and it sits squarely among the Section 6 capacities that atrophy if outsourced.

把两卷接起来读，就得到一条完整的因果链：学习卷养成并守护品味，别让它被便利侵蚀；设计卷再调用这份品味去对抗 slop。前者要是没守住源头，后者手里就没有尺子可用——这也是为什么这套系列把学习卷放在"批判性良知"的位置：它守的不只是一个人的能力，是整条下游判断链能不能运转所依赖的那个人类底座。

Read the two volumes together and you get a full causal chain: the learning volume grows and guards taste, keeping it from being eroded by convenience; the design volume then calls on that taste to fight slop. If the former doesn’t guard the source, the latter has no ruler to use, which is why this series positions the learning volume as the “critical conscience”: it guards not just one person’s capability, but the human bedrock the entire downstream judgment chain depends on.

止步线不是一份静态名单，是一个判据

The Stop-Line Is a Criterion, Not a Static List

把"什么不该外包"写成一份固定清单，会很快过时，模型一升级，清单就得重划。更耐用的做法是给一个判据，让你自己在任何新出现的能力上当场判断。判据有两个维度，缺一不可。第一维：可外包性——这件认知 AI 能不能稳定、可核验地代劳？第二维：构成性：这件认知是不是你做后续判断的前提？它萎缩了，你会不会连"判断 AI 对不对"的资格都一起丢了？只有同时落在高可外包、高构成性这一格的，才是止步线真正要守的危险区：它诱人（AI 做得了），也致命（外包就萎缩，萎缩又直接伤及判断的根）。这正是 INSTRUMENT 11 那张四象限图里"便利陷阱"格在这一节的理论根。

Writing “what not to outsource” as a fixed list ages fast; one model upgrade and you have to redraw it. What lasts longer is a criterion you can apply on the spot to any new capability. It has two dimensions, and neither is optional. Dimension one, outsourceability: can AI do this reliably and checkably? Dimension two, constitutiveness: is this cognition a precondition for judgments you’ll make later? Put another way: if it atrophies, do you lose even the standing to judge whether AI got it right? Only the cell that is both high-outsourceability and high-constitutiveness is the danger zone the stop-line guards. It is tempting because AI can do it, and lethal because outsourcing it atrophies it and that atrophy strikes at the root of judgment itself. This is the theoretical basis for the “convenience trap” cell in INSTRUMENT 11’s four-quadrant map.
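The two-dimension test above can be sketched as a small classifier. This is an illustration only, not INSTRUMENT 11 itself: the `Capacity` type, the 0-to-1 scores, the 0.5 cutoff, and every cell label other than "convenience trap" are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    """A capacity you are weighing (hypothetical type, self-assigned scores)."""
    name: str
    outsourceability: float  # 0..1: can AI do this reliably and checkably?
    constitutiveness: float  # 0..1: is it a precondition for later judgment?


def stop_line_cell(c: Capacity, cutoff: float = 0.5) -> str:
    """Place a capacity in one of four cells; the cutoff is an assumed value."""
    high_out = c.outsourceability >= cutoff
    high_con = c.constitutiveness >= cutoff
    if high_out and high_con:
        return "convenience trap"   # tempting and lethal: the stop-line guards this
    if high_out:
        return "safe to hand off"   # hypothetical label
    if high_con:
        return "keep by necessity"  # hypothetical label
    return "low stakes"             # hypothetical label


print(stop_line_cell(Capacity("first-pass problem decomposition", 0.8, 0.9)))
print(stop_line_cell(Capacity("formatting a finished draft", 0.9, 0.1)))
```

The scores are your own estimates, so the function only makes the reasoning explicit; it does not replace the judgment of how constitutive a capacity is for you.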

为什么"伤不伤判断根基"这一维这么关键，别的卷却不需要它？因为别的卷处理的是外部产物的对错，学习卷处理的是判断者本身还完不完整。一段 AI 写错的代码，可以被测试兜住、被人复核，错误不会传染给写代码的人；但一项被长期外包掉的判断能力，会连带让那个人失去"察觉自己判断错了"的能力：这是一种会把检测器本身腐蚀掉的损伤。止步线之所以非有不可，正是因为认知是唯一一个外包会反过来伤害外包者本身的领域。这也是为什么本卷的题眼是倒过来的"知道什么不该让 AI 做"，不是"学会用 AI"："学会用 AI"优化的是外部的产出，"知道什么不该让 AI 做"守的是那个还有能力做判断的人。

Why does “constitutiveness” matter so much here, when the other volumes don’t need it? Because they deal with the correctness of external artifacts, while the learning volume deals with the integrity of the judge. A piece of bad AI-written code can be caught by tests or reviewed by a person; the error doesn’t infect the person who wrote it. But a judgment capacity outsourced for years takes with it the very ability to notice that your own judgment is wrong, damage that corrodes the detector itself. The stop-line has to exist because cognition is the one domain where outsourcing turns around and harms the outsourcer. It’s also why this volume’s keystone is the inverted one, “know what not to let AI do,” rather than “learn to use AI”: “learn to use AI” optimizes what comes out, while “know what not to let AI do” guards the person still capable of judging it.

LEARN

DECISION · 决策

DECISION

决策 · 交出与保留

Decision · Hand off & keep

哪一步交出去，哪一步不交

Which Step to Hand Off, Which to Keep

这一章把"交不交给 AI"变成一道跟着任务走的分诊题，并把上下文该往哪流写清楚。

This chapter turns “hand it to AI or not” into a triage question that follows the task, and spells out which way the context should flow.

一句话

In one line

同一个 AI，用对了帮你学，用反了替你学。诀窍不是少用，是次序：自己先产出，AI 再校验补漏，流回来的是"哪里不对"，不是答案。

The same AI helps you learn when used in the right order, and learns in your place when the order is reversed. The trick is order, not using it less: produce yourself first, let AI verify and patch afterward, and what flows back is “where it’s wrong,” not the answer.

同一个工具，两种用法，结果正相反。这就是整卷处方的支点：我们开的不是"禁用 AI"这张方子，是"设计怎么用"这张。把一项学习任务拆成步骤，每一步都过同一道问题——这一步动的是"知道"还是"会做"？如果是会做，它在不在第 6 节说的止步线之内？下面这张表可以直接照着用：

Same tool, two ways of using it, opposite outcomes. That’s the hinge of the whole volume’s prescription: what we’re prescribing is “design how you use it,” not “ban AI.” Break a learning task into steps and run each one through the same question: does this step touch “knowing that” or “knowing how”? If it’s the latter, is it inside the stop-line from Section 6? The table below is one you can use directly:

这步交给 AI（替代式安全）
Hand this step to AI (substitution is safe)

取材：检索、汇总、找反例、列出我没想到的角度
Sourcing: retrieval, summarizing, finding counterexamples, listing angles I missed
脚手架：把我的草稿讲解给我听、生成练习题、出测验
Scaffolding: explaining my draft back to me, generating exercises, writing quizzes
校验：在我先产出后，核对我的推导、指出漏洞
Checking: after I produce first, verifying my derivation, flagging gaps
收尾：格式化、改错字、把已成形的判断整理成稿
Finishing: formatting, fixing typos, organizing a settled judgment into prose

这步留给自己（替代式有害）
Keep this step (substitution harms)

第一稿的问题分解——亲手把混沌拆成可攻的子问题
The first-pass problem decomposition – breaking chaos into attackable subproblems by hand
犯错-纠正循环的那个错：错了先自纠，再看 AI（第 2 节）
The error in the error-correction loop: when wrong, self-correct before consulting AI (Section 2)
质疑命中：判断 AI 哪里不对——这靠你亲手趟过的直觉（第 3 节）
The challenge hit: judging where AI is wrong – this rests on intuition you earned by running the loop yourself (Section 3)
"什么值得做/知道"的价值判断，与最终的品味定夺（第 6 节）
The value judgment of “what is worth doing / knowing,” and the final call of taste (Section 6)

表本身不是重点，重点是上下文在这两栏之间怎么流——流向反了，本来安全的用法就翻成了替代。这里定一个方向：

The table isn’t the point; how context flows between these two columns is. Reverse the flow and a safe use flips into substitution. Here’s the one direction that matters:

先人后机，不是先机后人。你先产出假设/草稿/分解，再把它喂给 AI 校验补全。次序一旦颠倒（先让 AI 出第一稿，你再改），犯错-纠正循环就被绕过了——MIT 预印本里"拥有感最低、78% 无法复述自己刚写的句子"正是这个次序的产物（Ⅲ 级、样本小，但机理方向一致）。
Human first, machine second – not the reverse. You produce the hypothesis / draft / decomposition first, then feed it to AI to verify and complete. Once the order flips (AI writes the first draft, you edit), the error-correction loop is bypassed – the MIT preprint’s “lowest sense of ownership, 78% unable to quote a sentence they had just written” is exactly the product of that order (grade III, small sample, but the mechanistic direction agrees).
校验回流，不是答案回流。从 AI 流回你的，应是"哪里不对/还能怎么想"，不是"标准答案"。让它当陪练，不当代笔。结构化提示的可干预杠杆就在这一条：把"先自答、要理由、留反思"写进你和 AI 的交互模板。
What flows back is checking, not the answer. What returns from AI should be “where it is wrong / what else to consider,” not “the model answer.” Let it spar, not ghostwrite. The intervenable lever of structured prompting lives here: write “answer first, demand reasons, leave a reflection” into your interaction template with AI.
留痕回流到脚手架。每一轮的"我错在哪、AI 指出什么、我改了什么"沉淀进反思库（第 5 节/09）——上下文不只在你和 AI 之间流，还要流进一个可 diff 的、你日后能回看的载体。
Traces flow back into the scaffold. Each round’s “where I was wrong, what AI flagged, what I changed” settles into the reflection log (Section 5/09) – context flows not only between you and AI but also into a diffable carrier you can revisit later.

把规则跑一遍：同一道题，两种次序，两个结果

Running the rule: same problem, two orders, two outcomes

拿一个具体场景把三条流向规则串起来跑一遍，比讲抽象规则更能服人。设想你在学一个陌生的算法。违反次序的做法：直接问 AI"这题怎么解"，它给出一份完整漂亮的解法，你读懂了（读起来很顺），抄进作业，过了。三天后碰到一道变体，你脑子一片空白——因为你的认知从没上场，AI 的解法只是路过你的眼睛。这正是 MIT 预印本里"拥有感最低、复述不出自己刚写的句子"的临床现场。

Running the three flow rules through one concrete scene beats stating abstract rules. Picture learning an unfamiliar algorithm. The order-violating way: ask AI straight “how do I solve this,” get back a complete, elegant solution, understand it (it reads smoothly), paste it into the assignment, pass. Three days later a variant shows up and you go blank: your cognition never took the field; AI’s solution just passed your eyes. This is the live version of the MIT preprint’s “lowest sense of ownership, can’t restate what they wrote.”

守住次序的做法：先给自己十五分钟（延迟求助），写下第一版思路，哪怕卡在第三步；然后问 AI"我卡在这，请只告诉我哪一步错了，别给完整答案"（流回来的是校验，不是答案）；拿到反馈自己改完，再把"我原来错在哪、为什么、这类错还会出现在哪"记进反思库（留痕回流）。同一道题，第二种次序里犯错-纠正循环完整跑了一遍，"会做"长出来一点；第一种次序里只发生了一次信息搬运。三条规则不是教条，是这个差别的可操作版本——它们唯一的用处，是保证 AI 进场时循环还在转。

The order-keeping way: give yourself fifteen minutes first (delayed help-seeking) and write your first-pass approach, even if you stall at step three. Then ask AI, “I’m stuck here: tell me only which step is wrong, not the full answer” (what returns is verification, not the answer). Fix it yourself with that feedback, then log “where I was wrong, why, where else this error shows up” in the reflection log (trace-return). Same problem, but the second order runs the error-correction loop to completion and “knowing how” grows a little; the first order only moves information around. The three rules are the operable version of that difference, not dogma, and their sole job is making sure the loop is still turning when AI steps in.

分诊表会过时，流向规则不会

The triage table ages; the flow rule doesn’t

我们刻意把重量压在"流向规则"上，而不是压在那张"哪步交、哪步留"的分诊表上，这个选择背后有个方法论上的理由，值得说清楚。分诊表是当下能力分布的一张快照：今天该留给自己的"问题分解"，明天可能就因为模型更强而能安全交出去；表里每一格的归属都会随 AI 能力往左移。把方法论钉死在一张会过时的表上，它的保质期就和某一代模型绑在了一起。

We deliberately put the weight on the “flow rule” instead of on the “which step to hand off” triage table, and the methodological reason is worth naming. The triage table is a snapshot of today’s capability distribution: the “problem decomposition” you should keep to yourself today might safely go to AI tomorrow once models improve; every cell’s assignment drifts as AI gets stronger. Pin a methodology to a table that ages and its shelf life gets tied to one model generation.

流向规则——先人后机、校验回流、留痕回流——不一样。它约束的不是"哪件事"，而是"不管哪件事，你的认知都得先动，AI 才能进场"这个次序。次序跟具体能力边界是正交的：边界怎么移，"先动的必须是你"这条都成立，因为它锁住的是犯错-纠正循环能不能发生的结构性条件，而这个结构不会随模型变强而变。这和第 6 节给判据而不给清单是同一种设计哲学：在一个变化很快的领域里，把方法论钉在不变的结构上，别钉在会漂移的边界上。

The flow rule (human first, verify-return, trace-return) is different. It constrains not “which thing” but an order: whatever the thing, your cognition has to move before AI enters. Order is orthogonal to the specific capability boundary: however the boundary moves, “you go first” still holds, because it locks the structural condition for whether the error-correction loop can happen at all, and that structure doesn’t change as models get stronger. This is the same design philosophy as giving a criterion instead of a list in Section 6: in a fast-moving field, pin the methodology to the structure that doesn’t move, not the boundary that does.

三条流向规则，各堵一个具体的漏

Three flow rules, each plugging a specific leak

把三条流向规则放在一起看，会发现它们各自堵住一个具体的漏洞，不是三条并列的好习惯：

Put the three flow rules side by side and they turn out not to be three parallel good habits, but three patches for three specific leaks:

先人后机堵的是"循环被绕过"：AI 一旦先出第一稿，你的提取和生成就没机会上场了，MIT 预印本里"拥有感最低、78% 无法复述自己刚写的句子"，正是这个漏洞的临床表现（证据分级 Ⅲ、样本小，但它精确刻画了"循环被绕过"长什么样）。
Human-first plugs “the loop gets bypassed”: once AI drafts first, your retrieval and generation never take the field. The MIT preprint’s “lowest sense of ownership, 78% unable to restate a sentence they’d just written” is the clinical picture of exactly this leak (grade III, small sample, but it precisely operationalizes what a bypassed loop looks like).
校验回流堵的是"答案顶替反馈"：如果 AI 流回来的是标准答案而不是"哪里不对"，它就替你把该自己走的纠正步骤也做了，循环的后半段照样丢了。
Verify-return plugs “the answer standing in for feedback”: if what flows back is the model answer instead of “where it’s wrong,” AI has done the correction step you were supposed to do yourself, and the loop’s back half is lost anyway.
留痕回流堵的是"学的东西不沉淀"：每一轮的纠错要是没流进一个可 diff 的反思库，它就只活在工作记忆里，过几天连情境一起蒸发，下次还得从头来。
Trace-return plugs “nothing settles”: if each round’s correction never flows into a diffable reflection log, it lives only in working memory and evaporates along with its context within days; next time you start from scratch again.

三条规则合在一起，其实是把一个完整的犯错-纠正循环，在人机协作的形态下重新拼回完整——少哪条，循环就在那个地方断掉。

Together, the three rules reassemble one complete error-correction loop in its human-AI form; drop any one and the loop breaks right there.

同一个工具，为什么用法一反、结果就反

Same Tool: Why the Result Flips When the Use Flips

"取决于怎么用"听着像在回避判断，但它背后有个精确的机制，值得拆开看。结构化提示的随机对照实验（MDPI Data 2025，n=150）里，两组用的是同一个模型，唯一的差别在交互的形状：无引导组直接要答案，结构化组被要求先自答、给理由、写一句反思。结果是认知投入和批判推理出现了显著分化。机制并不神秘——它就是第 2 节说的那个犯错-纠正循环有没有被触发。先自答，你的认知已经做过一次提取和生成（哪怕是错的），AI 的反馈才有东西可改；直接要答案，你的认知从没上场，AI 的答案路过你眼睛，没地方落。

“Depends how you use it” sounds like dodging judgment, but there’s a precise mechanism underneath worth unpacking. In the structured-prompting RCT (MDPI Data 2025, n=150), both groups used the same model; the only difference was the shape of the interaction. The unguided group asked straight for answers; the structured group had to answer first, give reasons, write a line of reflection. The result: a significant split in cognitive engagement and critical reasoning. The mechanism is straightforward: whether the Section 2 error-correction loop fires. Answer first, and your cognition has already done a round of retrieval and generation, even if wrong, so AI’s feedback has something to revise. Ask straight, and your cognition never took the field; AI’s answer just passes your eyes with nowhere to land.

所以"用法"这个词精确的意思是：你的认知有没有在 AI 介入之前先动起来。动了，AI 是校验器，循环被强化；没动，AI 是替代者，循环被绕开。

So the precise meaning of “use” is whether your cognition moved before AI stepped in. If it did, AI is a verifier and the loop strengthens. If it didn’t, AI is a substitute and the loop gets bypassed.

这把"先人后机"从一句口号变成了一个能落地的工程约束：它锁的不是态度，是上下文的流向。把交互模板固化成"我先给你 X，你只对 X 做校验、反驳、补漏，别直接给我标准答案"，就是把这个方向写进了你和 AI 的契约。这跟工程卷把校验做成流程（不靠人记得去复核）、组织卷把上下文做成可查询基设（不靠人记得去同步）是同一个手法——把纪律焊进工程，不指望自律。三卷在这里共享一条工程哲学：能写进流程的事，就别留给意志力。

This turns “human first, machine second” from a slogan into an executable engineering constraint: what it locks down is the direction context flows, not attitude. Hardening the interaction template into “I give you X first, you only verify/rebut/patch X, don’t hand me the model answer” writes that direction into your contract with AI. Same move as the engineering volume turning verification into a process instead of relying on people to remember to review, or the org volume turning context into queryable infrastructure instead of relying on people to remember to sync. All three volumes share one engineering philosophy here: whatever can be written into the process shouldn’t be left to willpower.
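One way to harden that template is to make the prompt impossible to build without your own attempt. The sketch below is a hypothetical helper, not a prescribed tool: the function name `build_check_prompt`, its parameters, and the exact wording of the instructions are all assumptions.

```python
def build_check_prompt(task: str, my_attempt: str, stuck_at: str = "") -> str:
    """Wrap a self-produced attempt in a verify-only request (illustrative)."""
    if not my_attempt.strip():
        # Human first: refuse to build a prompt until an attempt exists.
        raise ValueError("write your own attempt before asking AI")
    lines = [
        f"Task: {task}",
        "My attempt (written before asking you):",
        my_attempt.strip(),
    ]
    if stuck_at:
        lines.append(f"I am stuck at: {stuck_at}")
    lines += [
        # Verify-return: ask for checking, not for the answer.
        "Only verify, rebut, or point out gaps in my attempt.",
        "Tell me which step is wrong and why.",
        "Do not give me the full or model answer.",
    ]
    return "\n".join(lines)


print(build_check_prompt(
    task="find the shortest path in a weighted graph",
    my_attempt="1) start from the source 2) always expand the nearest node",
    stuck_at="step 2, when two nodes are equally near",
))
```

The empty-attempt check is the part that carries the rule: the order is enforced by the helper rather than by remembering to follow it.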

分诊表本身会随模型能力往左漂——今天该留给自己的步骤，明天可能就能安全交出去。但流向规则不会过时：不管边界怎么移，你的认知得先动这条始终成立。这就是为什么我们把重量压在流向上，而不是压在那张迟早要过时的分诊表上。

The triage table itself will drift as models improve: a step you should keep today might safely be handed off tomorrow. But the flow rule doesn’t age: however the boundary moves, “your cognition moves first” still holds. That’s why the weight sits on the flow direction, not on a table that is bound to go out of date.

怎么知道分诊有效

How to know the triage is right

有效：左栏越来越自动，右栏你仍亲手做得动；失效：右栏悄悄漂到左栏，你让 AI 替你分解、下价值判断，却说不清为什么同意。

Working: the left column grows automatic, and the right column you can still do by hand. Failing: right-column items drift left – you let AI decompose and make the value calls, and can’t say why you agreed.

LEARN

SIGNAL · 信号

SIGNAL

信号 · 先行指标

Signal · Leading indicators

在萎缩发生前，看见它

See Atrophy Before It Happens

MIT 那组用 AI 写作的人里，78% 连自己刚敲下的句子都引用不出来——你到底在悄悄变强还是变弱，能量出来的只有你自己，问题是你真的在量吗？

In the MIT group that wrote with AI, 78% couldn’t quote a sentence they’d just typed. Whether you’re quietly getting stronger or weaker, you’re the only one who can measure it, but are you actually doing so?

一句话

In one line

没人替你做长期追踪，你就是关于自己的唯一纵向样本。挂几条先行指标定期自测，它们是烟雾报警器：响了就去查，别当成判决。

Nobody else is tracking you long-term, so you’re your own only longitudinal sample. Hang a few leading indicators and check them regularly. They’re a smoke alarm: when it sounds, go investigate; don’t treat it as a verdict.

仪表盘分两组：先行指标升高，说明认知主导权在变强；反指标升高，说明依赖在悄悄变深。两组必须一起读——单看任何一条都会骗你，这就是合意困难的元认知陷阱：当下表现越好，常常意味着学得越差。

The dashboard has two banks: leading indicators rising means your cognitive command is getting stronger; counter-indicators rising means dependence is quietly deepening. Read the two together; any single line will fool you. That’s the metacognitive trap of desirable difficulty: doing well right now often means you’re learning worse.

先行 · override rate
Leading · override rate

推翻/修改 AI 输出的频率
Frequency of overriding AI output

你多常反驳 AI、且反驳对。APA 2025（N=1,923）用过这个量：推翻越少 → 自报独立推理信心越低（r=-.61）。下降是头号预警：它意味着你正从"质疑者"滑成"接受者"。
How often you push back on AI, and push back correctly. APA 2025 (N=1,923) used this quantity: less overriding → lower self-reported confidence in independent reasoning (r=-.61). A decline is the prime warning – it means you are sliding from challenger to acceptor.

先行 · 拥有感
Leading · ownership

能否复述自己刚产出的东西
Can you restate what you just produced

合上 AI，你能否凭记忆复述刚"和 AI 一起"得出的结论与理由。MIT 预印本把它操作化：LLM 组拥有感最低、78% 无法引用自己刚写的句子（Ⅲ，样本小）。复述不出，就是没内化、只是搬运。
With AI closed, can you restate from memory the conclusion and reasoning you just reached “with AI.” The MIT preprint operationalized this: the LLM group had the lowest ownership, 78% unable to quote a sentence they had just written (III, small sample). Can’t restate it = didn’t internalize it, only transported it.

先行 · 迁移
Leading · transfer

无 AI 在场的新情境通过率
Pass rate in a new, AI-free situation

能否把"和 AI 协作中学到的"用到一个新情境、且 AI 不在场。这是理解的检验标准（迁移），也是"能做"是否真长出来的唯一硬测（第 2 节）。只在 AI 在场时表现好 = 长出来的是依赖。
Can you deploy “what you learned collaborating with AI” in a new situation with AI absent. This is the test standard of understanding (transfer) and the only hard test of whether “knowing how” actually grew (Section 2). Good performance only when AI is present = what grew is dependence.

先行 · 结构化占比：你的 AI 使用里，"脚手架式"（先自答、要理由、留反思）对"替代式"（直接要答案）的比例。MDPI Data 2025 把这条立为可干预杠杆——它是你唯一能直接拧的旋钮，拧对了能逆转卸载。升 = 好。
Leading · structured-use share: within your AI use, the ratio of “scaffolded” (answer first, demand reasons, leave a reflection) to “substitutive” (ask straight for the answer). MDPI Data 2025 set this as the intervenable lever – it is the one knob you can turn directly, and turned right it can reverse offloading. Rising = good.
反指标 · 答案召回占比：你的学习时间里，"秒查答案"挤掉"走完循环"的比例。升 = 坏——它在把你的学习偷换成搬运。
Counter · answer-recall share: within your learning time, the share where “look up the answer instantly” crowds out “run the loop.” Rising = bad – it is swapping your learning for transport.
反指标 · "不用 AI 我还会吗"答不上：当你对一项已外包的能力答不出"三个月不用 AI 我还做得来吗"，这条就触发了。它直接喂给第二仪器（第 15 节的"该不该让 AI 做"判定器）。
Counter · failing the “could I still do this without AI” check: when, for an outsourced capacity, you cannot answer “could I still do this after three months without AI,” this line trips. It feeds directly into the second instrument (the “should AI do this” test in Section 15).
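Two of these indicators can be computed from a simple personal log. The sketch below is one possible bookkeeping, not a validated instrument: the `Interaction` record and its fields are hypothetical, and "share" is taken here to mean scaffolded exchanges divided by all exchanges.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Interaction:
    """One logged exchange with AI (all fields are hypothetical)."""
    self_answered_first: bool         # scaffolded: you answered before asking
    overrode_ai: bool = False         # you rejected or modified the AI output
    override_was_right: bool = False  # the pushback later proved correct


def override_rate(log: Sequence[Interaction]) -> float:
    """Fraction of exchanges where you pushed back on AI and were right."""
    if not log:
        return 0.0
    hits = sum(1 for x in log if x.overrode_ai and x.override_was_right)
    return hits / len(log)


def structured_share(log: Sequence[Interaction]) -> float:
    """Fraction of exchanges that were scaffolded rather than substitutive."""
    if not log:
        return 0.0
    return sum(1 for x in log if x.self_answered_first) / len(log)


week = [
    Interaction(self_answered_first=True, overrode_ai=True, override_was_right=True),
    Interaction(self_answered_first=True),
    Interaction(self_answered_first=False),
    Interaction(self_answered_first=False, overrode_ai=True, override_was_right=False),
]
print(f"override rate: {override_rate(week):.2f}")
print(f"structured-use share: {structured_share(week):.2f}")
```

The absolute numbers mean little on their own; what the text asks you to watch is the direction they move from one period to the next.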

烟雾报警器，不是判决书：为什么这个区分要紧

A smoke alarm, not a verdict: why the distinction matters

仪表盘最容易被用到两个极端去，而这两种误用都来自把"先行指标"和"因果判决"混成了一件事。一种是过度恐慌：override rate 掉了一点，就断定"我被 AI 弄废了"，焦虑起来甚至全面禁用 AI——这既越过了证据（一条信号下降证明不了能力萎缩），也掉进了第 4 节说过的那个陷阱：把萎缩当已经证明的事。另一种是完全不管：反正没有因果证据，索性什么都不监测，任由依赖悄悄变深——这等于放弃了你作为唯一纵向样本能做的那件负责任的事。

The dashboard gets misused at two extremes, and both come from conflating “leading indicator” with “causal verdict.” One is over-panic: override rate dips a little and you conclude “AI ruined me,” spiraling into anxiety or banning AI outright, which overshoots the evidence (one declining signal doesn’t prove atrophy) and falls into the same trap Section 4 warned about, treating atrophy as already proven. The other is total disregard: since there’s no causal evidence anyway, you monitor nothing and let dependence deepen quietly, giving up the one responsible thing you, as the only longitudinal sample, can actually do.

正确的姿态在两者中间，"烟雾报警器"这个比喻能精确锚住它：报警响了，你去查厨房是不是着火了，不会立刻拆房子，也不会为了清静把报警器拔掉。仪表盘某条信号往下走，意思是"该回头对这项能力做一次撤除演练了"——它是个提醒你去查的信号，不是一份定罪的判决。守住这个区分，仪表盘才既不会把你吓瘫，也不会被你当噪音无视。

The right posture sits between the two, and “smoke alarm” pins it down precisely: when the alarm sounds, you check whether the kitchen’s on fire; you don’t tear the house down, and you don’t unplug the alarm for peace of mind. A signal trending down means “time to run a removal drill on this capacity”: a prompt to go check, not a verdict of guilt. Hold that line and the dashboard neither paralyzes you nor gets waved off as noise.

一个具体读法：两组指标怎样互相校正

A concrete reading: how the two banks correct each other

"两组一起读"说起来抽象，给个具体场景。设想一位开发者这一季度大量用 AI 写代码，自己感觉效率飙升、产出更多。只看即时感受，结论就是"AI 让我更强了"。现在叠上滞后的指标：迁移信号：上周面试白板题不让用 AI，他卡在一个三个月前还很顺手的算法上；拥有感：他没法向同事口头讲清上个月自己合并的那个 PR 核心逻辑是什么；override rate：他已经想不起上次反驳 AI 的建议是什么时候了。

“Read the two banks together” stays abstract without a scene. Picture a developer who spent this quarter writing a lot of code with AI, feeling efficiency soar and output rise. Going only on that feeling, the verdict is “AI made me stronger.” Now overlay the lagged indicators. Transfer: in last week’s no-AI whiteboard interview, he got stuck on an algorithm that had come easily to him three months earlier. Ownership: he can’t walk a colleague through the core logic of the PR he merged last month. Override rate: he can’t recall the last time he pushed back on an AI suggestion.

三条滞后信号一起往下走，即时感受却在往上走——这个背离本身就是最强的警报，比任何单条都可靠。它说的不是"他变笨了"（证据撑不住这个判决），而是"他的能力越来越依赖 AI 在场，撤除测试正在悄悄失败"。这时该做的，是回到第 6、7 节，重新划一次这项能力的止步线，再对它做一次撤除演练。

Three lagged signals falling together while the immediate feeling rises: that divergence itself is the strongest alarm, more reliable than any single line. It doesn’t say “he got dumber” (the evidence can’t carry that verdict); it says his capability is leaning more and more on AI being present, and the removal test is quietly starting to fail. The right move then is to go back to Sections 6 and 7, redraw where this capacity’s stop-line sits, and run a removal drill on it.
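The divergence reading can be written down as a rule of thumb. This is a rough sketch under stated assumptions: the function name `smoke_alarm`, the signed "change since last check" inputs, and the three return messages are all invented for illustration, and the output is a prompt to investigate, never a verdict.

```python
from typing import Mapping


def smoke_alarm(felt_change: float, lagged_changes: Mapping[str, float]) -> str:
    """Compare the in-the-moment feeling with lagged, AI-absent signals.

    Positive numbers mean "went up since the last check"; the scale is arbitrary.
    """
    falling = sorted(name for name, change in lagged_changes.items() if change < 0)
    if not falling:
        return "quiet"
    if felt_change > 0 and len(falling) == len(lagged_changes):
        # Feeling rises while every lagged signal falls: the strongest alarm.
        return "divergence (" + ", ".join(falling) + "): run a removal drill"
    return "check: " + ", ".join(falling)


quarter = {"transfer": -0.3, "ownership": -0.2, "override_rate": -0.4}
print(smoke_alarm(felt_change=0.5, lagged_changes=quarter))
```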

为什么得自己挂仪表盘：你是唯一的纵向样本

Why you have to hang the dashboard yourself: you’re the only longitudinal sample

第 4 节反复说"没有多年期纵向数据"，这不只是学术上的空白，对个人来说还有一个直接推论：既然没人替你做长期追踪，你就是关于你自己的唯一纵向样本。把先行指标挂起来定期记一笔，就是在为自己跑一个 N=1 的纵向研究。这不是退而求其次，是这种证据状态下唯一负责任的姿态——宏观证据不够格下因果判决，但够格告诉你该盯哪几个量，而这几个量恰好都能在个人尺度上被你自己读出来。仪表盘把"全人类会不会萎缩"这个答不了的大问题，降维成"我这项能力这一季是升是降"这个你每天都能自己验证的小问题。

Section 4 keeps saying “there’s no multi-year longitudinal data.” That’s not just an academic gap; it carries a direct consequence for you: since nobody’s running a long-term track on you, you’re the only longitudinal sample about yourself. Hanging the leading indicators and logging them periodically is, in essence, running an N=1 longitudinal study on yourself. That’s not a consolation prize; it’s the only responsible posture given the evidence we have. The macro evidence can’t carry a causal verdict, but it’s enough to tell you which quantities to watch, and those quantities all happen to be readable by you at your own scale. The dashboard turns the unanswerable big question, is humanity atrophying, into the small question you can check every day: is this particular capacity of mine up or down this quarter.

但读仪表盘本身有一个绕不开的悖论，必须明说，否则会读反。第 14 节的速度公理推出一个反直觉推论：当下表现好，常常意味着学得差（合意困难的元认知陷阱）。任何"当堂感觉"类的指标都会系统性骗你——你今天用 AI 做得飞快、感觉良好，恰恰可能是 override rate 在跌、拥有感在空。

But reading the dashboard has an unavoidable paradox that must be stated, or it will be read backwards. The Section 14 speed axiom yields a counter-intuitive corollary: good current performance often means worse learning (the metacognitive trap of desirable difficulty). This means any “in-the-moment feeling” indicator will systematically deceive you – flying through a task with AI today and feeling great may be exactly when override rate is dropping and ownership is hollowing.

所以仪表盘的设计有意偏向滞后、无 AI 在场的量：迁移测试要隔几天、要撤掉 AI 才测；拥有感要合上 AI 凭记忆复述才算。两组指标必须一起读，正是为了让滞后的真信号去校正即时的假流畅——单看任何一条，尤其单看"感觉"，都会把你导向便利陷阱。

So the dashboard deliberately leans on lagged, AI-absent quantities: transfer tests measured days later with AI removed; ownership counted only by restating from memory with AI closed. The two banks must be read together precisely so the lagged true signal corrects the immediate false fluency – reading any single line, especially “feeling,” alone steers you into the convenience trap.

口径诚实 · 别把仪表盘当判决书

Honest caveat · the dashboard is not a verdict

这些是先行指标，不是因果证明；单条下降只提示"该回头看这项能力了"，不证明能力已降，这是 B 档立场落到个人监测层的样子。

These are leading indicators, not causal proof. A single decline only prompts “look back at this capacity”; it does not prove the capacity has declined. This is what the B-stance looks like at the personal-monitoring layer.

LEARN

ARTIFACT · 工件

ARTIFACT

工件 · 可拷贝规约

Artifact · Copyable spec

错题反思库的规约

A Spec for the Error-and-Reflection Log

这一章把认知脚手架做成一件能直接拷走用的工件：错题反思库的最小规约。

This chapter turns the cognitive scaffold into one artifact you can copy and use directly: a minimal spec for the error-reflection log.

一句话

In one line

反思库记的是"你当时怎么错的"，抄正确答案没有学习价值；它最大的敌人不是字段不全，是从来没真跑起来。

The log records “how you got it wrong,” not the right answer; copying the answer has no learning value. Its biggest enemy is never actually starting, not incomplete fields.

每条记录一件"我曾经错在哪"。字段刻意做得很少——能坚持用，比字段齐全重要得多。但每个字段背后都压着一条原理，不是随手定的：

Each entry records one “where I once got it wrong.” The fields are deliberately few: sticking with it matters more than completeness. But every field carries a principle behind it, none picked at random:

错的内容（我当时怎么想的）——记你的错误推理，不是正确答案。价值全在这：反思库记的是"你曾经错在哪、为什么"，可 diff 地看见自己理解怎么变的。只抄答案 = 又一个答案仓库，零学习价值。
The error (how I was thinking then) – record your wrong reasoning, not the right answer. The whole value is here: the log records “where you were wrong and why,” letting you diff how your understanding changed. Copying only the answer = one more answer warehouse, zero learning value.
纠正前先自答——看 AI/答案之前，先写下你自己的修正尝试。这一步把测试效应焊进流程：主动提取（哪怕错）比被动重读更利于长期保持（Roediger & Karpicke 2006，Ⅱ）。没有这一步，反思库就退化成抄答案。
Self-answer before correcting – before consulting AI/the answer, write your own correction attempt. This step welds the testing effect into the flow: active retrieval (even if wrong) beats passive rereading for long-term retention (Roediger & Karpicke 2006, II). Without it, the log degrades into copying answers.
下次复看日期（间隔）——给每条记一个复习日，按间隔拉开（隔天 → 隔周 → 隔月）。间隔效应：拉开的复习比集中复习留存更久，可达约一年（多日设计研究，Ⅱ）。这是把"慢"设计进脚手架，不是靠记性硬撑。
Next review date (spacing) – give each entry a review day, spaced apart (next day → next week → next month). The spacing effect: spaced review retains longer than massed, lasting up to about a year (multi-day-design studies, II). This designs “slow” into the scaffold rather than leaning on memory.
迁移钩子（这错还会出现在哪）——写一句"这类错在别的什么情境也会犯"。交错/变式练习提升迁移（合意困难家族，Bjork，Ⅱ）；这个字段逼你做近迁移联想，把孤立的错连成模式。
Transfer hook (where else this error appears) – write one line naming the other situations where this class of error would show up. Interleaved/varied practice improves transfer (the desirable-difficulty family, Bjork, II); this field forces a near-transfer association, linking isolated errors into a pattern.
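The four fields above map naturally onto a small record type. The sketch below is one possible encoding, not part of the spec: the `LogEntry` class and its method names are hypothetical, and "next month" is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Next day -> next week -> next month (30 days is an assumed stand-in for a month).
REVIEW_GAPS_DAYS = (1, 7, 30)


@dataclass
class LogEntry:
    wrong_reasoning: str        # how I was thinking then, not the right answer
    self_answer: str            # my own correction attempt, written before checking
    logged_on: date
    transfer_hook: str = ""     # where else this class of error shows up
    reviews_done: int = 0
    last_reviewed: Optional[date] = None

    def next_review(self) -> date:
        """Next review date; after the last gap, keep using the longest one."""
        step = min(self.reviews_done, len(REVIEW_GAPS_DAYS) - 1)
        base = self.last_reviewed or self.logged_on
        return base + timedelta(days=REVIEW_GAPS_DAYS[step])

    def mark_reviewed(self, on: date) -> None:
        """Record a review that was done from memory, entry covered."""
        self.last_reviewed = on
        self.reviews_done += 1


entry = LogEntry(
    wrong_reasoning="assumed the loop ended one step earlier than it did",
    self_answer="recount the boundary case by hand",
    logged_on=date(2025, 1, 1),
    transfer_hook="any place I compute a range end without testing the edge",
)
print(entry.next_review())
entry.mark_reviewed(date(2025, 1, 2))
print(entry.next_review())
```

Keeping the schedule inside the entry means the spacing is carried by the scaffold rather than by remembering when to look again.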

反模式 · 答案仓库

Anti-pattern · answer warehouse

把 AI 的正确答案剪贴进笔记，越攒越多，从不回看。看着像在学习，实则是在用一个可搜索的外脑替换内化——下次还得问。这正是"知道"的搬运，不是"能做"的生长。

Paste AI’s correct answers into notes, pile them ever higher, never revisit. It looks like learning but it is replacing internalization with a searchable external brain – next time you still have to ask. This is the transport of “knowing that,” not the growth of “knowing how.”

规约 · 重建工地

Spec · construction site

记错误推理 + 先自答 + 间隔复看 + 迁移钩子。四个字段各焊一条原理（测试效应/间隔/迁移/合意困难）。它逼你回流（第 8 节的"反思库回流使用率"信号正测这个），把每个错变成一次结构重建，而非一条只读笔记。

Record the wrong reasoning + self-answer + spaced review + transfer hook. The four fields each weld in a principle (testing effect / spacing / transfer / desirable difficulty). It forces return-use (the Section 8 “reflection-log return-use rate” signal measures exactly this), turning each error into a structural rebuild rather than a read-only note.

最小可行的反思库：今天就能跑起来的版本

The minimal viable log: a version you can run today

规约最大的敌人是从来没跑起来，不是设计不全。很多人一听"建认知脚手架"就开始研究该用哪个软件、设计多少字段、怎么分类标签，然后在搭建系统的快感里把"真正记一条错"无限期推迟。这本身就是一种伪装成认真的拖延（也是第 15 节失败模式的近亲）。

The spec’s biggest enemy is not incompleteness but never getting started. Many people, on hearing “build a cognitive scaffold,” begin researching which software to use, how many fields to design, and how to organize tags, and then, in the pleasure of building the system, defer “actually logging one error” indefinitely. This is itself procrastination disguised as diligence (a cousin of the Section 15 failure modes).

所以本卷给一个刻意简陋、今天就能跑的最小版本：一个纯文本文件，每周三条，每条只填两行——"我当时怎么想的（错的推理）"和"看答案前我的修正尝试（先自答）"。没有标签、没有分类、没有花哨的复习算法。这个最小版本已经在跑测试效应（先自答 = 主动提取）和留痕（错误推理被记下、可 diff），而这两条是整份规约里最承重的。等这个习惯长稳了（比如连续四周没断）再考虑加"下次复看日期"和"迁移钩子"。脚手架的全部价值在被持续使用：一个跑了一年的简陋文本文件，胜过一个设计精美却建完就再没打开的系统。

So the volume gives a deliberately crude minimal version you can run today: one plain-text file, three entries a week, each with only two lines – “how I was thinking (the wrong reasoning)” and “my correction attempt before seeing the answer (self-answer first).” No tags, no categories, no fancy review algorithm. This minimal version already runs the testing effect (self-answer = active retrieval) and trace-keeping (the wrong reasoning logged and diffable), and these two are the most load-bearing in the whole spec. Once the habit is stable (say, four unbroken weeks) consider adding “next review date” and “transfer hook.” A scaffold’s entire value is in being used continuously: a crude text file run for a year beats a beautifully designed system opened once and never again.
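The two-line version fits in a few lines of code, if you would rather append entries from a script than by hand. This is a minimal sketch: the function names, the `## date` header, and the field labels are assumptions, and any plain-text layout you will actually keep using is just as valid.

```python
from datetime import date
from pathlib import Path


def format_entry(wrong_reasoning: str, self_answer: str, on: date) -> str:
    """Render one two-line entry, dated so later versions can be diffed."""
    return (
        f"## {on.isoformat()}\n"
        f"wrong reasoning: {wrong_reasoning.strip()}\n"
        f"my fix before checking: {self_answer.strip()}\n\n"
    )


def append_entry(path: Path, wrong_reasoning: str, self_answer: str, on: date) -> None:
    """Append one entry; refuse entries that skip the self-answer step."""
    if not self_answer.strip():
        raise ValueError("write your own correction attempt before logging")
    with path.open("a", encoding="utf-8") as f:
        f.write(format_entry(wrong_reasoning, self_answer, on))
```

A call such as `append_entry(Path("reflection-log.txt"), "...", "...", date.today())` adds one entry to a file of your choosing; the file name here is only an example. Appending plain text keeps the log readable in any editor and diffable under version control.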

间隔复看那一栏，凭什么比"多记几条"更重要

Why the spaced-review field beats “log more entries”

很多人把反思库做成"越攒越多"的收藏夹，却从不回看——这恰好丢掉了它最承重的那一栏。规约里"下次复看日期"看似琐碎，实则把两条最稳的认知科学结论焊进了流程。其一是间隔效应：拉开的复习（隔天→隔周→隔月）比集中复习留存更久，效应可延续约一年（多日设计研究，Ⅱ）。其二是测试效应：复看时不重读那条错误记录，而是先盖住、凭记忆重做一遍，让每次复看本身成为一次提取练习。

Many turn the reflection log into an ever-growing favorites folder they never revisit – losing exactly its most load-bearing field. The spec’s “next review date” looks trivial but welds two of the sturdiest cognitive-science findings into the process. First, the spacing effect: spaced review (next day → next week → next month) produces longer retention than massed review, with effects lasting about a year (multi-day-design studies, II). Second, the testing effect: on review, do not reread the error entry; cover it and redo it from memory, so that each review is itself a retrieval rep.

两条合起来，决定了反思库的价值不在条目数量，在回流频率——一条被按间隔提取过五次的错题，胜过五十条记完就再没打开过的收藏。这也是第 8 节为什么把"反思库回流使用率"而不是"反思库条目数"立为检验信号：前者测的是循环有没有真的在转，后者只测了收集癖。一个常见的失败是把建库做成囤积，囤积带来"我在认真学"的错觉（又一个便利陷阱的变体），却从未触发任何一次再巩固。

Together they decide that the log’s value lies not in entry count but in return frequency – one error retrieved on schedule five times beats fifty entries logged and never reopened. This is why Section 8 makes “reflection-log return-use rate” rather than “log entry count” a test signal: the former measures whether the loop actually turns, the latter only measures a collecting habit. A common failure is turning log-building into hoarding, and hoarding brings the illusion of “studying hard” (another variant of the convenience trap) while never triggering a single reconsolidation.
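The schedule and the signal can be sketched in a few lines of Python. The day gaps follow the text (next day, next week, next month); holding at 30 days after the third review, and defining return-use rate as the share of entries retrieved at least once, are assumptions of this sketch rather than definitions from the volume.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

# Gaps in days: next day -> next week -> next month.
# Staying at 30 days after the third review is an assumption of this sketch.
INTERVALS_DAYS = (1, 7, 30)


@dataclass
class Entry:
    logged_on: date
    reviews: List[date] = field(default_factory=list)  # dates of retrieval-style reviews

    def next_review(self) -> date:
        """Next review date, counted from the last review (or from logging)."""
        step = min(len(self.reviews), len(INTERVALS_DAYS) - 1)
        last = self.reviews[-1] if self.reviews else self.logged_on
        return last + timedelta(days=INTERVALS_DAYS[step])


def return_use_rate(entries: List[Entry]) -> float:
    """Share of entries retrieved at least once (a hypothetical definition)."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if e.reviews) / len(entries)
```

Under this definition, fifty entries that were never reopened score 0.0 however large the log grows, which is the contrast with entry count that the paragraph draws.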

为什么记"错的推理"，而不是记"对的答案"

Why log the wrong reasoning, not the right answer

这是整份规约里最违反直觉、也最承重的一条，值得单独讲透。直觉会让你记正确答案——它干净、可复用、看着像学习的成果。但记正确答案恰恰是答案仓库反模式：它把你的认知留痕替换成一个可搜索的外部副本，下次你还是得查。记错误推理则相反，它保存的是一张"我当时的心智模型在哪里偏了"的快照。价值有三层：

This is the most counter-intuitive and most load-bearing line in the whole spec, and worth spelling out on its own. Intuition pushes you to record the correct answer – it is clean, reusable, and looks like a learning result. But recording the correct answer is exactly the answer-warehouse anti-pattern: it replaces your cognitive trace with a searchable external copy, and next time you still have to look it up. Recording the wrong reasoning does the opposite: it preserves a snapshot of “where my mental model went off.” The value has three layers:

诊断：错误推理暴露的是你的模型缺陷，而正确答案对此一无所知——同一道题，十个人可以错在十个不同的地方，只有记下你自己错在哪，复看才有针对性。
Diagnosis: wrong reasoning exposes your model’s defect, of which the correct answer knows nothing – ten people can get the same problem wrong in ten different places, and only logging where you specifically erred makes review targeted.
diff：当你三个月后回看同一类错，能直接看出自己的模型变了没有——这是认知结构改变的唯一可见证据，也是第 5 节"可 diff"那条柱子落到工件层的样子。
The diff: revisiting the same class of error three months later, you can see directly whether your model has changed – the only visible evidence of cognitive-structure change, and the artifact-layer form of Section 5’s “diffable” pillar.
元认知校准：反复直面自己的错误模式，会逐步修正"我以为我会了"和"我真的会了"之间的系统性高估——而这个高估正是合意困难家族最危险的陷阱。
Metacognitive calibration: repeatedly facing your own error patterns gradually corrects the systematic overestimation between “I thought I knew it” and “I actually know it” – the very overestimation that is the desirable-difficulty family’s most dangerous trap.

把四个字段连起来读，它其实是把一个完整的犯错-纠正循环（第 2 节）冻结成可回看的工件：错的内容（②犯错）+ 先自答（③自纠，焊测试效应）+ 间隔复看（顺速度公理，焊间隔效应）+ 迁移钩子（把孤立的错连成模式，焊交错/迁移）。它把循环结构外化成基设，不是笔记格式——这样每个错都被强制走完一遍循环，而不是被一个 AI 答案当场抹平。

Read the four fields together and it is really a complete error-correction loop (Section 2) frozen into a revisitable artifact. The loop: the error (② erring) + self-answer (③ self-correction, welding in the testing effect) + spaced review (along the speed axiom, welding in the spacing effect) + transfer hook (linking isolated errors into a pattern, welding in interleaving/transfer). It is not a note format but the externalization of the loop structure into infrastructure – so each error is forced through one full loop rather than being smoothed over on the spot by an AI answer.
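One way to picture the four fields as a loop rather than a note format is a record that can report which parts of the loop an entry has not yet been through. This is a sketch only; the class and field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class LogEntry:
    wrong_reasoning: str                  # the error as it was actually reasoned
    self_answer: str                      # correction attempt before seeing the answer
    next_review: Optional[str] = None     # spaced-review date, e.g. "2025-03-01"
    transfer_hook: Optional[str] = None   # link to a similar error elsewhere

    def missing_fields(self) -> List[str]:
        """Names of the fields still empty, in the order the loop runs."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An entry created with only the first two fields reports `next_review` and `transfer_hook` as missing, which matches the minimal version described earlier.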

起步小到不会失败

Start small enough not to fail

别一上来建宏大系统，那是另一种拖延。一个纯文本文件、一周三条、只填错的内容和先自答，就已经在跑测试效应了。脚手架的价值在被用。

Don’t start by building a grand system – that’s another form of procrastination. One plain-text file, three entries a week, just “the error” and “self-answer,” already runs the testing effect. A scaffold’s value is in being used.

EVIDENCE · 证据

证据 · 分级两份清单

Evidence · Graded dual-ledger

卸载证据究竟说了什么

What the Offloading Evidence Actually Says

这一章把全卷用到的证据摊成两份分开记的清单——一份记已经站得住的，一份记还在探索的；每条都标着等级，硬的敢下硬结论，软的只当赌注。

This chapter lays out the volume’s evidence in two separate ledgers: one for what’s already solid, one for what’s still exploratory. Every piece is graded: the hard ones get a hard conclusion, the soft ones stay a bet.

一句话

In one line

证据分两摞：合意困难、测试效应这类几十年可复现的，敢下硬结论；"认知会萎缩"那一侧全是相关、小样本，只能当赌注，不能当判决。

The evidence splits into two piles: decades-replicable findings like desirable difficulty and the testing effect draw a hard conclusion; the “cognition atrophies” side is all correlational, small-sample, a bet rather than a verdict.

先摆能下硬结论的那一堆——这些是几十年可复现、也不依赖任何 AI 研究的认知科学，是全卷最稳的地基：

First, the pile that earns a hard conclusion: decades-replicable cognitive science that doesn’t depend on any AI study, the firmest ground in the whole volume.

合意困难（Ⅱ，证据清单）：R. & E. Bjork（1994；2011；2020 JARMAC 9(4):475）——加速表观学习的条件常损害长期留存与迁移；放慢的困难（间隔、交错、变式、用测试代替呈现）反而提升。能下硬结论：AI 抹平困难 = 抹平合意困难。边界：困难只对有基础能成功响应者"合意"，否则只是绊脚。
Desirable difficulty (II, evidence ledger): R. & E. Bjork (1994; 2011; 2020 JARMAC 9(4):475) – conditions that speed apparent learning often harm long-term retention and transfer; slowing difficulties (spacing, interleaving, variation, testing in place of presentation) improve them. A hard conclusion holds: AI smoothing away difficulty = smoothing away desirable difficulty. Boundary: difficulty is “desirable” only for those with enough background to respond successfully, otherwise it is just an obstacle.
测试效应（Ⅱ，证据清单）：Roediger & Karpicke 2006——用测试当学习事件，长期回忆胜过重读；而重读短期看着更好（又一个元认知陷阱）。AI 把"重读式"轻松最大化，正踩这个陷阱。
The testing effect (II, evidence ledger): Roediger & Karpicke 2006 – using testing as a learning event beats rereading for long-term recall; rereading looks better short-term (another metacognitive trap). AI maximizes the ease of the “reread” mode, landing on that trap.
使用方式决定影响（Ⅱ 实验，证据清单）：结构化提示 RCT（MDPI Data 2025, 10(11):172，n=150）逆转卸载；脚手架式 AI 辅导 RCT 在方向上给出正效应（精确量级无可核元分析支撑，仅作方向性判断，可核读数见 Bastani 对照 R24）。最强的因果证据反而是正向的——所以处方是"设计合意困难"，不是"禁用 AI"。
Use determines the effect (II experiment, evidence ledger): a structured-prompting RCT (MDPI Data 2025, 10(11):172, n=150) reverses offloading; scaffolded AI-tutoring RCTs point positive in direction (the precise magnitude has no checkable meta-analysis behind it; a directional judgment only, with the checkable reading in the Bastani control R24). The strongest causal evidence is positive – so the prescription is “design desirable difficulty,” not “ban AI.”

接着是只够当赌注的——相关/自报/单次/小样本，必须带口径，不能当因果用（探索清单）：

Then the pieces that warrant only a bet – correlational / self-report / single-shot / small-sample, which must carry caveats and cannot be used as causal (the exploration ledger):

理论锚 · Ⅱ

Theory anchor · II

认知卸载综述

The offloading review

Risko & Gilbert 2016（Trends Cogn Sci 20(9):676）：把记忆/计算/导航外包给外部工具——概念真实牢固。但它界定的是行为，不证"卸载 → 内在萎缩"。对立框架必须并列：延伸心智（Clark & Chalmers 1998）——卸载可能是认知边界外移，不是衰退。

Risko & Gilbert 2016 (Trends Cogn Sci 20(9):676): outsourcing memory/computation/navigation to external tools – the concept is real and solid. But it defines a behavior, not proof of “offloading → inner atrophy.” The rival frame must stand alongside: the extended mind (Clark & Chalmers 1998) – offloading may be the cognitive boundary moving outward, not decline.

先例 · 相关非因果

Precedent · correlational

GPS / 搜索引擎

GPS / search engines

GPS-海马：Maguire 2000、Dahmani & Bohbot 2020（Sci Rep）——习惯性 GPS 与海马灰质更少相关，方向未定（是 GPS 致萎缩，还是海马强者更爱空间策略？）。Google 效应：Sparrow 2011（Science 333:776）——预期可再取则记信息少、记"去哪找"多，但后续复制未稳健复现。计算器→心算属常见论断、缺一手实证强锚，不宜泛引。

GPS-hippocampus: Maguire 2000, Dahmani & Bohbot 2020 (Sci Rep) – habitual GPS correlates with less hippocampal grey matter, direction undetermined (does GPS cause atrophy, or do strong-hippocampus people prefer spatial strategies?). Google effect: Sparrow 2011 (Science 333:776) – expecting re-access, people recall info less and “where to find it” more, but it has not replicated robustly. Calculator → mental arithmetic is a common claim lacking a strong first-hand empirical anchor; do not over-cite.

AI 实证 · Ⅲ / 自报

AI empirics · III / self-report

2024–2026 的一批

The 2024–2026 batch

Gerlich 2025（横断相关，反向因果未排除）；Kosmyna/MIT 2025（arXiv:2506.08872，Ⅲ 预印本，N→18，作者明确反对使用贬义化表述）；Lee/Microsoft 2025（CHI，测自报努力非能力，批判性思维转移到核验）；Stadler, Bannert & Sailer 2024（少见的因果实验：LLM 组认知负荷更低、但论证质量更差——"省力≠学得好"）；APA 2025 白纸黑字"描述性、不支持因果"。共同口径：无一能证长期因果萎缩。

Gerlich 2025 (cross-sectional correlational, reverse causation unexcluded); Kosmyna/MIT 2025 (arXiv:2506.08872, III preprint, N→18, authors asked that “dumber” not be used); Lee/Microsoft 2025 (CHI, measures self-reported effort not ability; critical thinking shifts to verification); Stadler, Bannert & Sailer 2024 (a rare causal experiment: the LLM group had lower cognitive load but worse argument quality – “less effort ≠ learned better”); APA 2025 states in black and white “descriptive, does not support causal inference.” Shared caveat: none proves long-term causal atrophy.

延伸心智：必须正面碰的对立框架

The extended mind: the rival frame we have to face head-on

要论证"卸载可能有害"，最该正面碰的对立框架是延伸心智（Clark & Chalmers 1998）。这个框架主张外部工具——纸笔、笔记本，也包括 AI——可以算认知系统的一部分，而不是认知的对立面；你用笔记本记住一件事，只是"记忆"的边界往外挪了，谈不上"萎缩"。这个框架要是成立，把记忆和计算外包给 AI 就不算能力流失，是认知边界的自然扩张，我们这卷唱的反调就没了地基。

Arguing “offloading may be harmful” means we have to face head-on the rival frame of the extended mind (Clark & Chalmers 1998). It holds that external tools (paper, notebooks, and yes, AI) can count as part of the cognitive system, not its opposite; use a notebook to remember something, and your “memory” boundary has simply moved outward, not atrophied. If that frame holds, outsourcing memory and computation to AI is a natural expansion of the cognitive boundary, not a loss of capacity, and our dissenting note here loses its footing.

诚实的做法不是绕开它，是说清楚我们的主张在它面前为什么还站得住。延伸心智描述的是稳态下人和工具耦合成的那个系统，它没有否认一件事：这个外部组件被撤走的时候，失去它的那个人还能不能独立运转。我们守的恰恰是这条"撤掉之后还能独立运转"的能力——撤除测试和迁移信号测的都是它。延伸心智说"用工具不丢人，是认知的常态"，我们同意，只想再加一句：你得保证撤掉工具自己还站得住，不然外移的那个边界就成了收不回来的依赖。两个框架其实不冲突：延伸心智描述能力怎么分布，我们关心的是这个分布可不可逆。

The honest move isn’t to sidestep it, but to say plainly why our claim still stands next to it. The extended mind describes the person-tool system at steady state; it never denies the question of whether, once that external piece is removed, the person who lost it can still operate on their own. That’s exactly the capacity we’re guarding: the removal test and the transfer signal both measure it. The extended mind says using tools is no shame, it’s the cognitive norm, and we agree; we’d just add one clause: you have to be able to still stand once the tool is pulled, or that outward-moved boundary turns into a dependence you can’t take back. The two frames aren’t really in conflict: the extended mind describes how capacity is distributed; we care whether that distribution can be reversed.

五级证据：为什么每条引用都标着 Ⅰ–Ⅴ

Five grades: why every citation carries a I–V tag

每条引用旁边标的 Ⅰ–Ⅴ 不是学术摆设，是一道防止论证往下滑的栏杆，这套口径值得说清楚：

The I–V tag beside every citation is a railing against the argument sliding downhill, not academic decoration; worth spelling out:

Ⅰ = 同行评审且被独立复现的结论，最硬；
I = peer-reviewed and independently replicated, the hardest;
Ⅱ = 受控实验或可测量的设计（如 RCT、多日间隔设计），能下因果或准因果结论；
II = controlled experiment or measured design (RCTs, multi-day spacing designs), supporting causal or quasi-causal conclusions;
Ⅲ = 结构化案例或单次实验、相关研究，方向性强但不能当因果；
III = structured case or single-shot experiment, correlational study, strongly directional but not causal;
Ⅳ = 一手从业者观察，真实但主观；
IV = first-hand practitioner observation, real but subjective;
Ⅴ = 论证或推演，逻辑产物而非经验证据。
V = argument or projection, a product of logic not empirical evidence.

分级的用处，是给每条证据固定住它的"发言权"：合意困难、测试效应是 Ⅱ 级，所以我们敢拿它们下硬结论；萎缩那一侧的 AI 实证大多是 Ⅲ 级，甚至更低（横断、自报、小样本、未评审），只能拿来挂先行指标、下赌注，不能拿来下判决。把等级标出来，本身就是一种诚实——它让读者随时能看见一个主张站在多硬的地上，也让我们没法偷偷把一条 Ⅲ 级相关证据当 Ⅰ 级因果证据来用。第 4 节那个反调之所以可信，不是因为它的证据有多硬（它的核心命题恰恰证据软），是因为它老实标出了自己有多软，并因此把主张降到"赌注"，不是"判决"。

What the grading actually does is fix each piece of evidence’s standing: desirable difficulty and the testing effect are grade II, so we’re willing to draw hard conclusions from them. The atrophy-side AI evidence is mostly grade III or worse (cross-sectional, self-reported, small-sample, unreviewed), so it can only hang leading indicators and place bets, never deliver a verdict. Tagging the grade is itself a form of honesty: it lets a reader see at a glance how solid the ground under any claim is, and it stops us from quietly passing off grade-III correlation as grade-I causation. Section 4’s contrarian stance is credible not because its evidence is hard (its core claim’s evidence is precisely soft) but because it honestly says how soft it is and downgrades accordingly, to a bet rather than a verdict.
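The rule that a grade fixes a claim’s standing can be written down as a small lookup. This is a sketch: the names `Grade` and `standing` are hypothetical, and the cut between grades II and III follows the paragraph above.

```python
from enum import IntEnum


class Grade(IntEnum):
    I = 1    # peer-reviewed and independently replicated
    II = 2   # controlled experiment or measured design
    III = 3  # structured case, single-shot experiment, or correlational study
    IV = 4   # first-hand practitioner observation
    V = 5    # argument or projection


def standing(grade: Grade) -> str:
    """Strongest claim a piece of evidence may carry under this volume's rule."""
    return "hard conclusion" if grade <= Grade.II else "bet"
```

For example, `standing(Grade.II)` gives `"hard conclusion"` and `standing(Grade.III)` gives `"bet"`, so a correlational study cannot be promoted by rewording alone.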

真实世界的行为证据：去技能化只在"撤走"那一刻现形

Real-world behavioral evidence: de-skilling only shows at the moment of removal

五级之外还有一类证据，起初我们手里没有，却分量最重：真实世界的行为数据——不靠自报、不靠横断问卷，直接量人在真实任务里的独立产出。一项多中心内镜研究给出目前最硬的一例：医生常规用 AI 辅助结肠镜一段时间后，把 AI 撤掉，同一批医生独立操作时的腺瘤检出率从 28.4% 跌到 22.4%（绝对下降 6 个百分点）[R23]。要害在于去技能化平时看不见：带着 AI 的时候医生表现正常，只有撤除的那一刻，被悄悄侵蚀的独立能力才露出来——这正是这卷"撤除演练"在现实里的版本：能力的真实水位，只有断开支撑才量得出来，而且这次是发生在专家身上。

Beyond the five grades sits a kind of evidence we didn’t have at first, yet it weighs the most: real-world behavioral data, not self-report, not cross-sectional surveys, but a direct measure of independent output on a real task. A multi-center endoscopy study gives the hardest example so far: after clinicians routinely used AI-assisted colonoscopy for a while, pulling the AI dropped the same clinicians’ unaided adenoma detection rate from 28.4% to 22.4%, an absolute drop of six percentage points[R23]. The point is that de-skilling stays invisible day to day: with AI present, clinicians perform normally, and only at the moment of removal does the quietly eroded independent capacity show. That is the real-world version of this volume’s “removal drill.” The true water level of a skill only shows once you disconnect the support, and here it happened to experts.
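As a worked check of those two figures: the absolute drop is the six points quoted above, while the relative drop is derived here from the same two numbers and is not a figure taken from the source.

```latex
\begin{align*}
\Delta_{\text{abs}} &= 28.4\% - 22.4\% = 6.0 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{28.4 - 22.4}{28.4} \approx 0.211 \approx 21\%
\end{align*}
```

Read this way, roughly one in five of the adenomas these clinicians previously detected unaided went undetected once the support was pulled.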

机理上的旁证来自 Bastani 等人的对照实验[R24]：把 AI 当拐杖用（直接要答案）的学习者，撤掉 AI 后比从没用过 AI 的对照组还差；把 AI 当脚手架用（只给提示不给答案）的，撤掉后没有受损。这条可核实的证据，正是第 10 节那个"脚手架有正效应"的说法（方向对，精确量级没有可核的来源）唯一站得住脚的理由——起作用的是"可撤除"这个设计属性，不是 AI 本身。

The mechanism finds a corroborating witness in Bastani et al.’s controlled experiment[R24]: learners who used AI as a crutch (asking straight for the answer) did worse after removal than a never-used-AI control group; those who used it as a scaffold (hints, not answers) suffered no harm at all. This checkable evidence is the only reason Section 10’s claim about scaffolding’s positive effect (directional, with no checkable source for the exact magnitude) can stand at all: what’s doing the work is the “removable” design property, not AI itself.

这也解释了为什么"摩擦"这件事不能默认托付给厂商。2025 年好几家厂商上线了主打"少给答案、多留过程"的学习模式；2026 年 4 月，OpenAI 悄悄把自己的 Study Mode 下线了[R25]（厂商动向，未经独立核实）。不管下线的理由是什么，方向很清楚：合意困难一旦只活在厂商的某个开关里，它的存亡就取决于商业上的取舍，可以一夜之间消失。所以我们把所有"主动设阻力"的处方都落在自己掌握的流程层——反思库、先自答的协议、撤除演练——而不是某个产品的"学习模式"。你能自己撤除的支撑，才是你真正拥有的那份能力的反面写照。

This also explains why “friction” can’t be handed off to vendors by default. In 2025 several vendors shipped a study mode built around “fewer answers, more process”; in April 2026 OpenAI quietly retired its own Study Mode[R25] (a vendor move, not independently verified). Whatever the reason for retiring it, the direction is clear: once desirable difficulty lives only inside a vendor’s toggle, its survival hinges on a business trade-off and it can vanish overnight. So we put every “add friction on purpose” prescription in a process layer you own yourself (the reflection log, the self-answer-first protocol, the removal drill), not in some product’s “study mode.” A support you can pull yourself is the mirror image of a capability you genuinely own.

FIG. L.8 / 证据清单 THE EVIDENCE LEDGER

看懂：把本卷主要主张按证据级排开，硬的归硬、软的认软

Read: the volume’s main claims sorted by evidence grade, hard ones hard, soft ones owned as soft

账本怎么看：每一行是本卷一条承重主张，圆点落在它真实的证据级上。注意分布的形状：被当作机理引用的那些（间隔、测试效应、睡眠巩固、练习量被高估）都坐在 Ⅰ–Ⅱ 的硬端，可复现、可被独立检验；越往下、越靠工程处方（脚手架、止步线），越是 Ⅲ–Ⅳ 的一手实践，硬度递减。而全卷标题级的命题——"认知会萎缩"——被诚实地放在最右的 Ⅴ：现有证据多是相关性的（卸载与某些测量的能力下降同时出现），尚未排净"本就更弱的人更爱卸载"这类反向因果。所以本卷把它降格为"一个值得对冲的赌注"，而不是判决。这张图本身就是第 15 节的论证：诚实不是把软证据说硬，是把它标软、并据此调低主张的语气。

Reading the ledger: each row is one load-bearing claim; the dot sits at its true evidence grade. Mind the shape of the distribution: the findings cited as mechanism (spacing, testing effect, sleep consolidation, overstated practice-volume) all sit at the hard Ⅰ–Ⅱ end, replicated and independently checkable; the further down toward engineering prescriptions (scaffold, stop-line), the more it is Ⅲ–Ⅳ first-hand practice, with hardness tapering. The volume’s headline claim, “cognition atrophies,” is placed honestly at the far-right Ⅴ: today’s evidence is mostly correlational (offloading co-occurs with declines on some measures) and has not ruled out reverse causation such as “already-weaker people offload more.” So the volume downgrades it to “a bet worth hedging,” not a verdict. The figure is itself Section 15’s argument: honesty means marking soft evidence soft and lowering the claim’s tone to match, not calling it hard.

为什么一个唱反调的命题，反而要把反证摆到最显眼处

Why a contrarian claim must put its counter-evidence most visibly

直觉上，一个要论证"认知可能萎缩"的章节，应该把支持的证据堆满、把反证轻描淡写。本卷反着做：把最强的反证（脚手架式 AI 辅导在方向上的正效应：精确量级无可核来源，仅作方向性判断，实证锚点为 Bastani 对照 R24；再分配假说、Sparrow 复制失败）放在和担忧侧同等显眼的位置。这是论证策略上的硬要求，并非谦虚。其一，本卷的主张本来就是 B 档而非 A 档：它要立的从来不在"萎缩已发生"，而在"在未决窗口期买一份对冲保险"：这个主张的力量恰恰来自承认反证的存在；藏起反证，主张就从"诚实的赌注"滑成"选择性叙事"，反而更弱。

Intuitively, a chapter arguing “cognition may atrophy” should pile up supporting evidence and downplay the counter-evidence. This volume does the reverse – it places the strongest counter-evidence as visibly as the concern side. That counter-evidence includes scaffolded AI-tutoring’s directional positive effect (its precise magnitude has no checkable source, a directional judgment whose empirical anchor is the Bastani control R24), the redistribution hypothesis, and Sparrow’s failed replication. This is not modesty but a hard requirement of argument strategy. First, the volume’s claim is grade B, not grade A from the start: what it asserts was never “atrophy has happened” but “buy hedging insurance during the unsettled window.” That claim’s force comes precisely from acknowledging the counter-evidence. Hide it and the claim slides from “an honest bet” into “selective narrative,” which is weaker.

其二，读者的信任是这卷唯一的承重墙：一个唱反调的方法论，最容易被指控为"危言耸听"；唯一的免疫方式是把自己最不利的证据先摆出来、并说明它为什么仍不足以推翻赌注。把反证摆到最显眼处，是这卷取信于人的方式，也是它和那种"AI 已经造成认知能力下降"的廉价警示划清界限的地方。

Second, the reader’s trust is this volume’s only load-bearing wall: a contrarian methodology is most easily charged with “scaremongering”; the only immunity is to lay out its most unfavorable evidence first and explain why it still falls short of overturning the bet. Putting the counter-evidence most visibly is how this volume earns trust, and where it draws the line against the cheap doom-saying of “AI makes you stupid.”

这一原则落到一张账本上：把全卷引用的研究逐条排开，每条标四件事——等级（Ⅰ–Ⅴ）、口径、它支持的结论、以及最容易被滥用的它不能支持的结论。最后一栏才是这张表的真正用途：它把每条证据的"射程"固定下来，防止它被抬出射程当因果用。

That principle lands on one ledger: arrange every study the volume cites line by line, tagging four things per row – grade (I–V), caveat, the conclusion it supports, and the most-easily-abused conclusion it cannot support. That last column is the table’s real purpose: it pins each piece’s “range” so it cannot be carried out of range and used as causal.
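The row structure described here can be sketched as a record whose last column is mandatory. The class name `LedgerRow` and its `cite` method are hypothetical and serve only to illustrate the rule that a limit travels with its claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerRow:
    study: str
    grade: str        # "I" to "V"
    caveat: str
    supports: str     # the conclusion the study supports
    cannot_say: str   # the conclusion it must not be stretched to

    def __post_init__(self) -> None:
        # A row without its last column is the misuse the ledger guards against.
        if not self.cannot_say.strip():
            raise ValueError(f"{self.study}: 'cannot_say' must not be empty")

    def cite(self) -> str:
        """Render the row for quotation, always including its limit."""
        return (f"{self.study} (grade {self.grade}, {self.caveat}): "
                f"{self.supports} Limit: {self.cannot_say}")
```

Because `cite` is the only rendering and always appends the limit, quoting a row without “what it cannot say” takes a deliberate extra step rather than happening by omission.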

两份证据清单 · 能下硬结论的 vs 只够当赌注的

DUAL EVIDENCE LEDGER · hard conclusions vs bets only

| 研究 / 锚 (Study / anchor) | 等级 · 口径 (Grade · caveat) | 它说了什么 (What it says) | 它不能说什么 (What it cannot say) |
| --- | --- | --- | --- |
| Bjork & Bjork · 1994 / 2011 / 2020 JARMAC 9(4):475 | Ⅱ 证据清单 · 数十年可复现 (ledger · decades-replicable) | 加速表观学习的条件常损害长期留存与迁移；放慢的合意困难反而提升。AI 抹平困难 = 抹平合意困难。 Conditions that speed apparent learning often harm long-term retention/transfer; slowing desirable difficulties improve them. AI smoothing difficulty = smoothing desirable difficulty. | 不能推出"任何难度都好"——困难只对有基础能成功响应者合意。 Cannot imply “all difficulty is good” – desirable only for those able to respond successfully. |
| Roediger & Karpicke 2006 · testing effect | Ⅱ 证据清单 (ledger) | 用测试当学习事件，长期回忆胜过重读；重读短期看着更好（元认知陷阱）。AI 把"重读式"轻松最大化。 Testing as a learning event beats rereading for long-term recall; rereading looks better short-term (a metacognitive trap). AI maximizes the “reread” ease. | 不能推出"测试越多越好"或可替代理解；只说提取>呈现。 Cannot imply “more testing is always better” or that it replaces understanding; only retrieval > presentation. |
| 脚手架式 AI 辅导 RCT (Scaffolded AI-tutoring RCT) · 2024–25 · 方向性正效应 (directional +) | Ⅱ 实验 · 最强因果 (experiment · strongest causal) | 结构化用法下 AI 显著提升学习——本卷最强的因果证据反指向上。处方=设计合意困难，非禁用。 Under structured use AI significantly lifts learning – the volume’s strongest causal evidence points up. Prescription = design desirable difficulty, not ban. | 不能推出"无引导用 AI 也好"；增益依赖脚手架设计。其正效应只作方向性判断，精确量级无可核来源。 Cannot imply “unguided AI is fine too”; the gain depends on scaffold design. The positive effect is a directional judgment only; its precise magnitude has no checkable source. |
| Risko & Gilbert 2016 · TiCS 20(9):676 | Ⅱ 理论锚 · 综述 (theory anchor · review) | 界定认知卸载：把记忆/计算/导航外包给外部工具——概念真实牢固。 Defines cognitive offloading: outsourcing memory/computation/navigation to external tools – concept real and solid. | 界定的是行为，不证"卸载→内在萎缩"。对立框架：延伸心智（Clark & Chalmers 1998）。 Defines a behavior, not proof of “offloading → inner atrophy.” Rival: extended mind (Clark & Chalmers 1998). |
| Sparrow et al. 2011 · Science 333:776 | Ⅲ↓ 探索清单 · 复制存疑 (exploration · replication doubtful) | "Google 效应"：预期可再取用，则记信息少、记"去哪找"多。 “The Google effect”: expecting re-access, people recall info less and “where to find it” more. | 后续未稳健复现，等级应下调；不能当作卸载致损的硬证。 Has not replicated robustly; grade should be lowered; not hard proof of offloading harm. |
| Maguire 2000 / Bohbot · GPS-海马 (GPS-hippocampus) · Sci Rep 2020 | Ⅲ 探索清单 · 相关 (exploration · correlational) | 习惯性 GPS 与海马灰质更少相关；空间策略使用者灰质更多。 Habitual GPS correlates with less hippocampal grey matter; spatial-strategy users have more. | 方向未定：GPS 致萎缩，还是海马强者更爱空间策略？相关≠因果。 Direction undetermined: does GPS cause atrophy, or do strong-hippocampus people prefer spatial strategy? Correlation ≠ causation. |
| Gerlich 2025 · Societies 15(1):6 · N=666 | Ⅲ 探索清单 · 横断相关 (exploration · cross-sectional) | 频繁用 AI 与批判性思维显著负相关，由卸载中介（总效应 b=−0.42）。 Frequent AI use negatively correlates with critical thinking, mediated by offloading (total effect b=−0.42). | 作者自承不能证因果、无纵向数据、反向因果未排除。 Author concedes no causation, no longitudinal data, reverse causation unexcluded. |
| Kosmyna / MIT 2025 · arXiv:2506.08872 · N→18 | Ⅲ↓ 探索清单 · 预印本未评审 (exploration · preprint, un-reviewed) | EEG 示"认知负债"累积；LLM 组拥有感最低、78% 无法引用自己刚写的句子。 EEG shows accruing “cognitive debt”; the LLM group had lowest ownership, 78% unable to quote a sentence they just wrote. | 样本极小、未评审；作者明确反对使用贬义化表述。仅作机理方向，非结论。 Tiny sample, un-reviewed; authors asked that “dumber” not be used. Direction only, not a conclusion. |
| APA 2025 · N=1,923 | Ⅲ 探索清单 · 描述性自报 (exploration · descriptive self-report) | 提示依赖越高、推翻 AI 越少 → 自报独立推理信心越低（r=−.61）。 Higher prompt-dependence, less overriding → lower self-reported confidence in independent reasoning (r=−.61). | 作者白纸黑字："描述性，不支持因果……不暗示认知损害或神经改变。" Authors state in writing: “descriptive, does not support causal inference … no implied cognitive damage or neural change.” |

读这张表只有一条铁律：横着读到最后一栏才算读完。只引前三栏、砍掉"它不能说什么"的转述，都是在把一条相关证据偷渡成因果断言——这正是这卷自己最该防的那种误用（第 15 节末条）。

There’s one iron rule for reading this table: a row isn’t finished until you’ve read its last column. Quoting only the first three columns and dropping “what it can’t say” smuggles a correlational finding into a causal claim: exactly the misuse this volume most needs to guard against in itself (Section 15’s last item).

读这一节的方式

How to read this section

上半"能做"的证据是地基，敢断言；下半"萎缩"的证据是赌注，只挂先行指标。把这两半混在一起，是这卷最危险的失误，证据等级就是防它的那道栏杆。

The top half, the evidence on what works, is the foundation and can be asserted. The bottom half, the atrophy evidence, is a bet that carries only leading indicators. Conflating the two is this volume’s most dangerous failure; the grades are the railing against it.

CRITIQUE · 批判

批判 · 旧结构的失效

Critique · where the old structures fail

讲授-考试这套结构，本来就漏，AI 只是把漏点照亮

The lecture-and-test machine already leaked; AI just lit the leaks up

这一章逐根点名传统教育的几根支柱——它们本来就漏，AI 只是把漏点照亮了。

This chapter names traditional education’s pillars one by one: they were already leaking, AI just lit the leaks up.

一句话

In one line

讲授-考试、覆盖优先、刷绩点这些老结构，AI 之前就被判过"次优"；AI 把获取的成本砍到零，漏点从"低效"恶化成了"空心化"。

Lecture-and-test, coverage-first, grade-chasing: these old structures were already judged sub-optimal before AI. AI cuts acquisition cost to zero, and their leaks degrade from inefficient to hollowed-out.

先承认对手最强的版本：这些结构当年解的是真瓶颈

First, grant the opponent its strongest case: these structures solved real bottlenecks

批判之前先讲公道话。讲授制不是个愚蠢的设计——在一本书要手抄、一位专家一辈子最多面授几百人的年代，把一个懂行的人放上讲台、让上百人一起听，是当时信息分发问题的最优解。标准化考试也不是凭空作恶——要给成千上万素不相识的人一份可比较、抗裙带关系的能力凭证，统一题面、统一评分是一项真实的公平发明。覆盖优先（一学期讲完整本教材）回应的是"课时有限、内容必须塞进一个窗口"这个真实约束。

Be fair before you criticize. The lecture wasn’t a stupid design. In an age when a book had to be hand-copied and one expert could teach at most a few hundred people face to face in a lifetime, putting a knowledgeable person on a stage for a hundred listeners at once was the best available answer to a real information-distribution problem. Standardized testing wasn’t gratuitous cruelty either: giving thousands of strangers a comparable, nepotism-resistant credential of ability, with one paper and one scoring rule, was a genuine fairness invention. Coverage-first (racing through the whole textbook in a term) answers a real constraint too: class time is scarce and content has to fit inside a finite window.

这些结构都是在某个真实约束下的合理工程。这里的批判不是说它们当年就错了，是说：它们赖以成立的那个约束——信息稀缺、分发昂贵、评估必须规模化——正被 AI 一点点抽走，而结构本身没跟着变。承重的约束一旦消失，原本被它正当化的代价就露出来了。下面逐根拆开看。

In other words, each structure was reasonable engineering under a real constraint. The critique here is that the constraint they rested on (information scarcity, expensive distribution, evaluation that had to scale) is being drained away by AI, while the structures themselves haven’t changed, not that they were wrong back then. Once the load-bearing constraint disappears, the cost it used to justify is exposed. Let’s take them apart one at a time.

FIG. L.12 / 旧结构的两轴诊断 TWO-AXIS DIAGNOSIS OF THE OLD STRUCTURES

看懂：横轴=当年就有多漏，纵轴=AI 充裕把漏点放大多少

Read: x = how leaky it already was, y = how far AI abundance amplifies the leak

怎么读这张：越靠右，这结构在 AI 之前就越被认知科学判为次优；越靠上，AI 把信息获取砍到零后，它的漏点被放大得越狠。右上角那簇红点（覆盖优先、考前突击、银行存储模型）是双重受灾区——本来就漏，AI 还把代价从"低效"推成"空心化"。它们是被 AI 照亮的旧坏结构，不是被 AI 创造出来的坏结构。

How to read it: the further right, the more cognitive science had already judged the structure sub-optimal before AI; the further up, the harder its leak is amplified once AI cuts information acquisition to zero. The red cluster top-right (coverage, cramming, the banking model) is the double-hit zone – already leaky, and AI pushes the cost from “inefficient” to “hollowed-out.” These are not bad structures AI created, but old bad structures AI has lit up.

讲授-考试：把"听懂"错当"学会"的流水线

Lecture-and-test: an assembly line that mistakes “followed along” for “learned”

讲授制承重的假设是：把信息从讲者的嘴传到听者的脑子里，学习就发生了。这个假设在认知科学里早就站不住——被动接收的信息留存率很低，真正留下来的是主动提取（测试效应，Roediger & Karpicke〔R1〕）。讲授制把全部认知负荷都压在"听懂"那一刻，而"听懂"恰恰是识别层的廉价流畅，最容易被误当成会了（第 2 节讲过识别和重现的区别）。考试本该补上这一刀，逼出重现，但传统考试往往考完就扔，提取只发生一次、还高度可预测，于是退化成"考前把识别硬拉到重现"的突击，长期留存几乎为零。

The lecture’s load-bearing assumption is: transmit information from the speaker’s mouth to the listener’s head, and learning has happened. Cognitive science has long made that assumption untenable: passively received information retains poorly, and what actually sticks is active retrieval (the testing effect, Roediger & Karpicke〔R1〕). The lecture piles all cognitive effort onto the moment of “following along,” and that’s precisely the cheap fluency of the recognition layer, the easiest thing to mistake for mastery (Section 2 covered the recognition/reproduction split). The exam was meant to fix this by forcing reproduction, but a traditional test is usually taken once and thrown away; retrieval happens a single time and is highly predictable, degrading into cramming that temporarily forces recognition up to reproduction, with near-zero long-term retention.

AI 怎么把它推向空心化：当 AI 能秒生一份讲解、把任何概念讲到"听起来都懂"，讲授制最弱的那一环——制造识别层的虚假流畅——被无限放大了。学生现在随时都能获得"我听懂了"的感觉，却比任何时候都更少被逼着重现。讲授-考试本来只是低效；AI 在场时，它高效地批量生产出一群"以为自己学会了"的人。

How AI pushes it toward hollowing: once AI can generate an explanation in a second and make any concept “sound clear,” the lecture’s weakest link (manufacturing false fluency at the recognition layer) gets amplified without limit. Students can now get the feeling of having followed along at any moment, while being forced to reproduce less than ever. Lecture-and-test used to be merely inefficient. With AI in the room, it efficiently mass-produces people who believe they’ve learned.

覆盖优先与考前突击：正面撞上间隔效应的两种设计

Coverage and cramming: two designs colliding head-on with the spacing effect

这两根支柱可以一起拆，因为它们撞的是同一条最硬的证据，间隔效应（Cepeda 等人 2006 年的元分析，317 项实验、184 篇文献，证据分级 Ⅰ〔R14〕）：分散练习明显好过集中练习，是合意困难这个家族里证据最扎实的一支。覆盖优先（一学期赶完整本教材）逼着每个主题只被碰一次、彼此挤在一起，根本没有回访和间隔的空间。它优化的是"讲过"，不是"学会"。考前突击是集中练习的极端版本：把本该分散好几周的提取，压进考前一晚。它能短期拉高分数（这正是它活到今天的原因——它对眼前这场考有效），但对长期留存几乎没有贡献，考完就忘是这个设计的必然后果，不是意外。

These two pillars can be taken apart together, because they collide with the same hardest evidence: the spacing effect (Cepeda et al. 2006, a meta-analysis of 317 experiments across 184 papers, grade I〔R14〕). Distributed practice clearly beats massed practice, and it is the best-evidenced branch of the desirable-difficulty family. Coverage-first (racing through the whole textbook in a term) forces every topic to be touched once and crammed against the next, with no room to revisit or space anything out; it optimizes “was covered,” not “was learned.” Cramming is the extreme version of massed practice: it compresses what should be weeks of retrieval into the night before. It raises scores in the short term, which is exactly why it survives (it works for the exam right in front of you), but it contributes almost nothing to long-term retention. Forgetting right after the test is the design’s inevitable consequence, not an accident.

AI 怎么把它推向空心化：这两者本来就和间隔效应对撞，AI 让这场碰撞变得更彻底。一份完整、结构化、随手就能生成的笔记或题解随时都在，"赶覆盖"的边际成本几乎归零——学生可以在考前一晚让 AI 把整个学期压成一份"看上去全懂"的速成包，把本就违反间隔的突击做到了物理极限。AI 没有创造突击这种文化，但它把突击的成本砍到接近零，也就拿掉了过去唯一逼着人提前分散学习的那点摩擦——手工整理太慢。摩擦一没，违反间隔效应的那条路就成了阻力最小的路。

How AI pushes it toward hollowing: these two already collide with spacing, and AI makes the collision total. When a complete, structured, instantly generated set of notes or solutions is always at hand, the marginal cost of racing for coverage approaches zero: a student can have AI compress a whole term into a “looks fully understood” crash pack the night before, taking an already anti-spacing habit to its physical limit. AI didn’t invent cramming culture, but it cut its cost to near nothing, removing the one friction that used to force people to space their studying out earlier: manual organizing was too slow. Once that friction is gone, the path that violates spacing becomes the path of least resistance.

"知识传递"的银行存储模型：把人当容器，AI 是更好的容器

The “knowledge transfer” banking model: treating people as containers, and AI is a better one

这根支柱活在隐喻层面，却最致命。"知识传递"这四个字暗含一个模型——Freire〔R21〕把它叫银行存储式教育：知识是一笔可以转账的存款，老师往学生这个空账户里存，学生的任务是接收、保管，到考试时取出来。这个模型把学习者当成容器，不是建构者。认知科学早就用建构主义判了它的死刑：理解不是被灌进去的，是学习者在已有的结构上主动重建出来的——Vygotsky 的发展观、皮亚杰的同化-顺应，讲的都是对这个隐喻的反驳。

This pillar lives at the level of metaphor, yet it’s the most lethal. The phrase “knowledge transfer” smuggles in a model: Freire〔R21〕called it the banking model of education. Knowledge is a deposit that can be transferred: the teacher deposits it into the empty account that is the student, and the student’s job is to receive, store, and withdraw it come exam time. This model treats the learner as a container, not a builder. Constructivism condemned it in cognitive science long ago: understanding is actively rebuilt by the learner on top of what’s already there, not poured in. Vygotsky’s developmental view and Piaget’s assimilation-accommodation are both arguments against exactly this metaphor.

AI 怎么把它推向空心化——这是全章最该看清的一点：如果学习真的只是"把知识从一处转移到另一处存起来"，那 AI 就是一个比任何人脑都更大、更快、更准的容器。在银行存储模型的框架里，人完全没理由再去内化任何东西，直接把存款存进 AI 这个超级账户就好了。也就是说，这个模型一旦碰上 AI，会自己推出"人不必再学"这个结论，而且在它自己的前提下，这个结论是逻辑自洽的。这正是为什么这个隐喻必须从根上换掉：只要还用"传递/存储"去想象学习，就什么都守不住，因为在那个隐喻里，人本来就该被更好的容器取代。学习承重的地方从来不是存储，是建构和判断——那才是 AI 这个容器替代不了的部分。

How AI pushes it toward hollowing (the single thing this section most needs you to see): if learning really were just moving knowledge from one place to be stored in another, then AI would be a container bigger, faster, and more accurate than any human brain. Inside the banking model’s own frame, a person has no reason left to internalize anything; just deposit it into the super-account that is AI. In other words, the moment the banking model meets AI, it derives on its own the conclusion that people don’t need to learn anymore, and that conclusion is internally consistent given its premises. That’s exactly why this metaphor has to be replaced at the root: as long as you picture learning as transfer or storage, there’s nothing left to guard, because in that picture a person was always meant to be replaced by a better container. The load-bearing part of learning was never storage; it’s construction and judgment, the part the container called AI can’t replace.

追绩点、文凭与标准化考试：信号机制在 AI 下集体失真

Grade-chasing, credentials, standardized tests: the signaling machinery distorts under AI

把这三样放一起讲，因为它们共享同一个机制：都不是学习本身，是学习的信号——绩点、文凭、标准化分数，都是把"这人有没有能力"压缩传给外界（雇主、下一级学校）的代理指标。代理指标的通病是古德哈特定律，用在考试上就是坎贝尔定律〔R22〕：一个度量一旦成了目标，它就不再是个好度量。学生追绩点而不是追掌握、刷分而不是求理解，这个扭曲在 AI 之前就存在，它之所以能被容忍，是因为过去伪造这些信号的成本足够高——你没法不学就写出一篇及格的论文，或解出一道难题。

Group these three because they share one mechanism: none is learning itself; each is a signal of it. Grades, credentials, and standardized scores are all proxy indicators that compress “does this person have ability” and hand it to the outside world (employers, or the next school). The chronic disease of any proxy indicator is Goodhart’s law, and in testing specifically, Campbell’s law〔R22〕: once a measure becomes a target, it stops being a good measure. Students chasing grades instead of mastery, gaming scores instead of seeking understanding: that distortion predates AI. It was tolerable because faking those signals used to cost enough: you couldn’t write a passing essay, or solve a hard problem set, without having actually learned something.

AI 怎么把它推向空心化：AI 把伪造这些信号的成本砍到接近零。一篇能拿高分的论文、一套能解对的作业、一份漂亮的项目报告，现在都能在不经过任何内化的情况下直接生成。这意味着信号和它本该代表的能力之间的连接断了——分数还在涨，能力可以原地不动，甚至萎缩。标准化考试受的冲击最直接：它的全部价值建立在"可比较、难作弊"上，而当 AI 能在远程、开卷、甚至闭卷的边缘场景里大量介入，"分数代表能力"这个等式的两头就脱钩了。这里要说句公道话：标准化考试在公平分发稀缺机会上仍有难以替代的制度价值，我们不主张废掉它；但作为学习的信号，它在 AI 之下的失真是结构性的，不能再当成"学会了"的可靠证据。真正抗得住 AI 的信号只剩一类：没有 AI 在场、换一个新情境下的现场重现和迁移（第 8 节的迁移测试），因为那一关考的正是 AI 替不了的"会不会做"。

How AI pushes it toward hollowing: AI cuts the cost of faking these signals to near zero. A high-scoring essay, a correctly solved assignment, a polished project report can now be generated without any internalization at all. That severs the link between the signal and the ability it was supposed to represent: the score keeps rising while the underlying ability can stay flat, or even atrophy. Standardized testing takes the most direct hit: its whole value rests on being comparable and hard to cheat, and once AI can intervene heavily in remote, open-book, even closed-book edge cases, the two ends of “score represents ability” come apart. To be fair here: standardized testing still carries real, hard-to-replace institutional value in fairly distributing scarce opportunity, and we’re not arguing to abolish it. But as a signal of learning, its distortion under AI is structural, and it can’t be treated as reliable proof of “has learned” anymore. The one signal that still holds against AI is live reproduction and transfer in a new situation with no AI present (the transfer test from Section 8), because that’s the gate that tests exactly the “can do” that AI can’t stand in for.

六根支柱收成一句话：旧结构优化的是"获取"，而获取已经免费

Six pillars, one sentence: the old structures optimized “acquisition,” and acquisition is now free

收一下这一节：六根支柱看着各不相同，底下是同一个错配。它们全都诞生于信息获取很贵的那个世界，所以全都把结构优化在获取这一端：讲授优化分发，覆盖优化吞吐，考试优化抽查，信号机制优化筛选。而第 2 节那道成本剪刀差已经说明白了：获取（"知道"）正塌向零，瓶颈已经整体搬到内化（"会做"）这一端。于是这些把全部工程精力都压在获取端的结构，正在优化一个已经被解决的瓶颈，对真正的新瓶颈——内化、判断、提问——几乎无所作为，甚至因为不断制造"获取就是学会"的错觉而在拖后腿。

To sum up this section: the six pillars look different, but underneath is the same mismatch. All were born in a world where acquiring information was expensive, so all optimized the structure at the acquisition end: the lecture optimizes distribution, coverage optimizes throughput, the exam optimizes spot-checking, the signaling machinery optimizes filtering. And Section 2’s cost scissors already made the point: acquisition (“knowing that”) is collapsing toward zero, and the bottleneck has moved wholesale to internalization (“knowing how”). So these structures, pouring all their engineering into the acquisition end, are optimizing a bottleneck that’s already solved, doing almost nothing for the real new one (internalization, judgment, asking) and actively getting in its way by continuously manufacturing the illusion that acquisition equals learning.

这不是说要把它们全推翻。讲授仍然是高效的触发器（点燃一个问题）；考试仍然是有用的提取触发器（只要用得高频、低风险、跨情境）；标准化测试仍有公平分发的制度价值。要换的不是这些工具，是它们背后那套"学习等于把信息搬进脑子、考试时再搬出来"的获取端世界观。AI-Native 的学习方法论，就是把工程重心从获取端整体搬到内化端的一次重画——后面讲的脚手架、止步线、反思库、迁移测试，都是为内化端这个真瓶颈造的新工具。

That’s not a call to tear them all down. The lecture is still an efficient trigger that ignites a question; the exam is still a useful retrieval prompt, so long as it’s used often, low-stakes, and across situations; standardized testing still carries institutional value for fair distribution. What actually needs replacing is the acquisition-end worldview behind them, not these tools: learning as moving information into the head and back out at exam time. The AI-Native learning methodology is exactly a redraw that moves the engineering center of gravity from the acquisition end to the internalization end. The scaffold, the stop-line, the reflection log, the transfer test that follow are all new tools built for that real bottleneck.

检验信号 · Test signal

自检：把这门课的获取部分（讲、读、查）都交给 AI，还剩什么算学习？几乎不剩就在空心化；剩下自产-自纠-迁移-反思，才在优化内化端。

A self-check: hand this course’s acquisition parts (lectures, reading, lookup) entirely to AI. What’s left that counts as learning? If almost nothing, the course is hollowing out; if what remains is self-producing, self-correcting, transferring, and reflecting, it is optimizing the internalization end.

LEARN

CASES · 案例

CASES

案例 · 把内核走一遍

Cases · walking the kernel through

把方法论走到一个真人身上：具体案例

Walking the methodology onto a real person: concrete cases

机理、仪器、止步线要是只停在抽象层，等于还没被验证过。

Mechanism, instruments, a stop-line: left abstract, they’re as good as unproven.

一句话 · In one line

把方法论走到四个真人身上：关键动作从来不是"少用 AI"，是先把一团模糊的能力拆细到能逐块判定，再对止步线内那部分设难度、建库、挂指标。

Walking the methodology onto four real people: the key move is splitting a fuzzy capacity fine enough to judge piece by piece, then adding difficulty, a log, and indicators to whatever falls inside the stop-line. It is never “use AI less.”

案例一 · 一名后端工程师对"读懂陌生代码库"做的卸载体检

Case 1 · A backend engineer’s offloading audit on “reading an unfamiliar codebase”

情境具体化：一位有六年经验的后端工程师，过去一年里高度依赖 AI 来"读懂"陌生代码：接手一个旧服务时，习惯把整个文件贴给 AI，要它"解释这段在干嘛"。他用 INSTRUMENT 11（外包 vs 内化体检）给这项能力切两轴，结果落在最危险的那格：高可充裕（AI 确实能逐行解释）× 高不可外包（读懂系统是他做架构判断、code review、定位线上故障的底座），便利陷阱。这逼他做了一次更细的拆分：把"读懂代码库"这一团能力拆成三件子能力，分别体检，而不是整团交出或整团留下。

Make the scene concrete: a backend engineer with six years’ experience leaned heavily on AI over the past year to “read” unfamiliar code. On inheriting an old service, the habit was to paste an entire file to AI and ask it to “explain what this does.” He ran this capacity through INSTRUMENT 11 (the offload-vs-internalize audit) on two axes, and it landed in the most dangerous cell, the convenience trap. On one axis, high abundance: AI genuinely can explain line by line. On the other, high un-outsourceability: understanding the system is the bedrock of his architecture judgment, code review, and production-incident triage. This forced a finer split: break the lump capacity “reading a codebase” into three sub-capacities, audited separately, rather than handing over or keeping the whole lump.

子能力 A · Sub-capacity A

语法/API 含义 · Syntax / API meaning

"这个库函数的参数是什么意思"——纯"知道"层，可充裕、外包无损。判定：交给 AI。强行记忆是在和已解瓶颈较劲。

“What do this library function’s parameters mean”: the pure “knowing” layer, abundant and safe to outsource. Verdict: hand to AI. Memorizing it is wrestling a solved bottleneck.

子能力 B · Sub-capacity B

控制流/数据流追踪 · Control-/data-flow tracing

"一个请求进来后在系统里怎么流动"——混合区。让 AI 当陪练（先自己画一遍流程，再让 AI 找漏），保留自产-自纠环。

“How a request flows through the system once it arrives”: the mixed zone. Let AI spar (sketch the flow yourself first, then have AI find gaps), keeping the self-produce/self-correct loop.

子能力 C · Sub-capacity C

设计意图/隐性约束 · Design intent / implicit constraints

"当初为什么这样设计、哪些是不能动的隐形约束"——止步线内。刻意不外包：这正是他作为资深工程师不可替代的判断根。

“Why it was designed this way, which invisible constraints must not be touched”: inside the stop-line. Deliberately not outsourced: this is the irreplaceable root of his judgment as a senior engineer.

之前 · Before

整个文件贴给 AI，读它给的总结。三个月后换个相邻模块，依然要重新贴、重新问——他对系统的心智模型没有任何增长，每次都是从零依赖。一次线上故障，AI 的解释把他引向了错误的模块，他没有独立的系统图可以反驳它，多花了两小时。

Paste the whole file to AI, read its summary. Three months later, on an adjacent module, he still had to re-paste and re-ask – his mental model of the system had grown not at all; every time was dependence from zero. In one production incident, AI’s explanation led him to the wrong module, and with no independent system map to push back with, he lost two extra hours.

之后 · 按子能力分治

After · split by sub-capacity

A 全交 AI；B 先自己画控制流再让 AI 查漏；C 关掉 AI 自己重建设计意图、记进反思库。两个月后，他对这个服务有了一张自己脑中的系统图——下一次故障，他能用这张图反驳 AI 的错误猜测。省力没减多少（语法层仍全交），但留存与判断力实打实长了。

A is handed fully to AI; for B he sketches the control flow himself, then has AI find gaps; for C he closes AI, rebuilds the design intent himself, and logs it in the reflection log. Two months on, he held a system map of this service in his own head, and at the next incident he could use it to overrule AI’s wrong guess. The effort AI saved him barely shrank (the syntax layer is still fully handed over), but retention and judgment grew for real.

这个案例的承重点，是把一团模糊的能力拆细到可以分别判定的颗粒度——而非笼统地少用 AI。整团交出会萎缩判断根；整团留下是在和已解瓶颈较劲。真正的功夫在那把"分治的刀"——而挥这把刀的判断（哪一格是止步线内）恰恰是 AI 替不了、必须留在人手里的那一步。这正是第 7 节交出/保留决策走到一个真人身上的样子。

The load-bearing point of this case is not “use AI less” but splitting a fuzzy lump of capacity down to a granularity where each piece can be judged separately. Handing the whole lump over atrophies the judgment root; keeping the whole lump wrestles a solved bottleneck. The real craft is in that “knife of decomposition” – and the judgment that wields it (which cell is inside the stop-line) is precisely the step AI cannot replace and must stay in human hands. This is Section 7’s keep/hand-over decision walked onto a real person.

FIG. L.13 / 交出 vs 保留 · 决策树 · HAND-OFF VS KEEP · DECISION TREE

看懂：三个问题把一项能力分流到四种处置

Read: three questions route a capacity into four dispositions

沿树走：Q1 把纯"知道"层筛掉（交给 AI）；Q2 是主权问题，把"萎缩了会动摇判断根"的能力拦下；Q3 是 Bjork 边界——只有已具基础的人，保留难度才"合意"，否则先补基础或让 AI 陪练。落在最左和最右两支的能力放心处置；落在底部那支的，才是真正要划进止步线、刻意设阻力的。这棵树的每个判断点都需要人来答——它本身就是 AI 替不掉的那层。

Walking the tree: Q1 screens out the pure “knowing” layer (hand to AI); Q2 is the sovereignty question, intercepting capacities whose atrophy would shake the judgment root; Q3 is Bjork’s boundary: only for those with a base is retained difficulty “desirable,” otherwise build the base first or let AI spar. Capacities landing on the far-left and far-right branches are disposed of safely; only those on the bottom branch are truly to be drawn inside the stop-line with deliberate friction. Every decision node on this tree must be answered by a human; it is itself the layer AI cannot displace.
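The routing described in the caption can be sketched as a small function. This is a minimal Python sketch, assuming each of the three questions reduces to a yes/no answer that a person supplies. The names `Capacity`, `Disposition`, and `route` are hypothetical, and the label for the branch where Q2 is answered "no" (`LOW_STAKES_HANDOFF`) is an assumption, since the caption does not name that disposition.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    HAND_TO_AI = "hand to AI"
    LOW_STAKES_HANDOFF = "hand off, low stakes"  # assumed label; not named in the caption
    BUILD_BASE_FIRST = "build the base first, or let AI spar"
    INSIDE_STOP_LINE = "inside the stop-line, with deliberate friction"


@dataclass
class Capacity:
    name: str
    pure_knowing: bool             # Q1: is this purely the "knowing" layer?
    atrophy_shakes_judgment: bool  # Q2: would atrophy shake the judgment root?
    has_base: bool                 # Q3: does the learner already have a base?


def route(c: Capacity) -> Disposition:
    """Walk the three questions in order; the answers come from a human."""
    if c.pure_knowing:
        return Disposition.HAND_TO_AI
    if not c.atrophy_shakes_judgment:
        return Disposition.LOW_STAKES_HANDOFF
    if not c.has_base:
        return Disposition.BUILD_BASE_FIRST
    return Disposition.INSIDE_STOP_LINE


# Two sub-capacities from Case 1, with answers as that engineer might give them.
syntax = Capacity("syntax / API meaning", True, False, True)
intent = Capacity("design intent / implicit constraints", False, True, True)
print(route(syntax).value)
print(route(intent).value)
```

The function adds nothing the tree does not already say; its only use is to make visible that every input is a human answer, which is the point the caption closes on.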

案例二 · 一门数据结构课把"合意困难"重新设计回去

Case 2 · A data-structures course redesigns desirable difficulty back in

情境：一位大学讲师发现，自从学生普遍用上 AI 编程助手，数据结构课的作业分数全面上升，但期中闭卷一考——分数明显下降。学生交上来的链表、树、图作业近乎完美，可一旦撤掉 AI、换一道结构相似但题面陌生的题，大多数人写不出来。这是第 2 节识别/重现错觉的教科书级现场：作业制造了识别层的虚假流畅，掩盖了重现层的空洞。她没有禁用 AI（那是脆弱的、也执行不了的），而是按合意困难三类，把刻意的难度重新设计回课程里，且每一处都设计成"AI 在场也绕不开"。

Scene: a university lecturer noticed that ever since students broadly adopted AI coding assistants, data-structures assignment scores had risen across the board, while scores on the closed-book midterm dropped markedly. Submitted linked-list, tree, and graph assignments were near-perfect, yet remove AI and swap in a structurally similar but unfamiliar problem, and most could not produce it. This is a textbook scene of Section 2’s recognition/reproduction illusion: assignments manufactured recognition-layer fluency that masked a hollow reproduction layer. She did not ban AI (a ban is fragile and unenforceable). Instead, following the three families of desirable difficulty, she redesigned deliberate difficulty back into the course, with each piece built so that it cannot be bypassed even with AI present.

间隔 · Spacing

回访式小测 · Revisiting quizzes

每周三分钟闭卷小测，必考三周前学过的旧主题——逼分散提取，对撞"学完就忘"。基于间隔效应（Cepeda 元分析 Ⅰ 级〔R14〕）。

A weekly three-minute closed-book quiz always tests a topic from three weeks ago, forcing distributed retrieval and colliding with “learn-then-forget.” Grounded in the spacing-effect meta-analysis (grade I〔R14〕).

提取 · Retrieval

先白板再键盘 · Whiteboard before keyboard

作业要求先交一张手写/口头讲解视频（无 AI），再交代码。提取走在生成前，把重现这一关补回（测试效应 Ⅱ 级〔R1〕）。

Assignments require a handwritten/spoken-explanation video first (no AI), then the code. Retrieval precedes generation, restoring the reproduction gate (testing effect, grade II〔R1〕).

交错 · Interleaving

混合题型作业 · Mixed problem sets

不再"本周只做树"，而是把树/图/哈希混在同一份作业里——逼学生先判断"该用哪种结构"，而判断恰是 AI 在场也得自己做的那一步。

No more “trees only this week”; trees, graphs, and hashing are mixed in one set, forcing students to first judge “which structure applies,” and judging is the step they must do themselves even with AI present.

之前 · Before

单一主题、可一次性用 AI 完成、考前突击的作业流。作业均分 92，期中闭卷均分 58，两者背离却无人警觉——直到期末挂科率翻倍。即时反馈（高作业分）系统性误导了师生双方对掌握程度的判断。

A single-topic, AI-completable-in-one-pass, cram-before-the-exam assignment flow. Assignment average 92, closed-book midterm average 58 – the divergence went unnoticed until the final’s failure rate doubled. Immediate feedback (high assignment scores) systematically misled both teacher and students about the degree of mastery.

之后 · 三类困难设计回去

After · three difficulties designed back in

作业均分降到 81（当下更难受，符合预期），但期中闭卷均分升到 74，两者的差距大幅收窄——这正是合意困难的指纹：牺牲当下流畅，换回长期且可迁移的留存。学生抱怨"变难了"，但学期末的迁移测试通过率显著高于上一届。

Assignment average fell to 81 (more painful in the moment, as expected), but the closed-book midterm average rose to 74, sharply narrowing the gap – the fingerprint of desirable difficulty: sacrifice present fluency for long-term, transferable retention. Students complained “it got harder,” yet end-of-term transfer-test pass rates were markedly higher than the prior cohort’s.
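The "narrowing gap" is plain arithmetic on the four averages quoted above. A small Python sketch of that arithmetic follows; `fluency_gap` is a hypothetical name, and treating the difference between assignment and closed-book averages as a rough reading of the recognition/reproduction gap is an interpretation, not a validated metric.

```python
def fluency_gap(assignment_avg: float, closed_book_avg: float) -> float:
    """Points by which AI-available work outscores closed-book work."""
    return assignment_avg - closed_book_avg


before = fluency_gap(92, 58)  # averages from the "Before" flow
after = fluency_gap(81, 74)   # averages after the redesign

print(before)  # 34
print(after)   # 7
```

The gap shrinks from 34 points to 7 even though the assignment average itself went down, which is the pattern the case calls the fingerprint of desirable difficulty.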

这个案例的承重点：合意困难不是"让课更难"这么粗。它是精确地把难度加在重现与判断这两个 AI 绕不开的环节上，同时继续把纯获取层交给 AI。讲师没有和工具对抗，她重设的是评估与练习的结构，让结构自己把学习逼回内化端。注意 Bjork 的边界她也守住了：这些难度只对已经听过课、有基础的学生"合意"，对完全没基础的旁听者就只是挫败——所以小测和白板讲解都建立在课程已铺好的基础之上（R8 的告诫）。

The load-bearing point: desirable difficulty is not as crude as “make the course harder.” It is placing difficulty precisely on the two links AI cannot bypass – reproduction and judgment – while continuing to hand the pure acquisition layer to AI. The lecturer did not fight the tool; she redesigned the structure of assessment and practice so the structure itself forces learning back to the internalization end. Note she also kept Bjork’s boundary: these difficulties are “desirable” only for students who attended and have a base. For a baseless auditor they are mere frustration, so the quizzes and whiteboard explanations all build on the base the course has already laid (R8’s caveat).

案例三 · 一位医学生给"鉴别诊断推理"划止步线

Case 3 · A medical student draws a stop-line around “differential-diagnosis reasoning”

情境：一位临床阶段的医学生，用 AI 做症状到诊断的推理非常顺畅：输入一组症状，AI 秒给一份排好序的鉴别诊断列表，附带每条的支持/反对证据。她差点把整个推理过程外包出去。但她对这项能力跑了一遍第 6 节的止步线决策，结论是：这是绝不能退的那条边界。理由是更结构性的一点，并非"AI 不准"（它常常很准）：临床推理是她未来独立行医时为后果负责的能力根基。一个不能独立做出鉴别诊断的医生，无法判断 AI 给出的列表"这次靠不靠谱"，也无法在 AI 漏掉一个罕见但致命的诊断时把它捞回来。她外包的可以是文献检索、剂量计算、指南查询，但不能外包"从症状推到诊断"这条主推理链。

Scene: a clinical-phase medical student found AI extremely handy for symptom-to-diagnosis reasoning: feed in a cluster of symptoms and AI instantly returns a ranked differential list with supporting and opposing evidence for each item. She nearly outsourced the entire reasoning process. But she ran this capacity through Section 6’s stop-line decision and concluded: this is the boundary that must never be ceded. The reason is not “AI is inaccurate” (it is often very accurate) but something more structural: clinical reasoning is the root capacity by which she will be accountable for consequences in independent practice. A physician who cannot independently form a differential cannot judge whether AI’s list “holds up this time,” nor catch a rare-but-lethal diagnosis when AI misses it. What she may outsource is literature search, dose calculation, guideline lookup; what she may not outsource is the main reasoning chain from symptom to diagnosis.

她落地的方式，是设一道明确的顺序规约，而非"不用 AI"：先自己产出完整的鉴别诊断列表并写下推理（自答在先），然后才打开 AI 对照：AI 此时的角色是"找出我漏掉了什么、我哪条推理错了"，而不是"替我想"。每一次她和 AI 的诊断列表不一致，无论谁对，都进她的错题反思库（第 9 节），记的是"我当时为什么会这样推、漏在哪个环节"，而非"正确答案"。三个月下来，她和 AI 不一致、且她对的比例从 9% 升到 23%，这正是第 8 节那条"质疑 AI 的命中率"先行指标在上升，说明她的推理能力没有被 AI 替代，反而在与 AI 的对抗中被磨利了。

Her landing was not “don’t use AI” but a clear ordering rule: first produce the full differential and write the reasoning herself (self-answer first), then open AI to compare. AI’s role here is “find what I missed, where my reasoning went wrong,” not “think for me.” Every time her and AI’s differentials disagreed, regardless of who was right, it went into her error-reflection log (Section 9), recording not “the correct answer” but “why I reasoned that way, which link I missed.” Over three months, the share of cases where she and AI disagreed and she was right rose from 9% to 23%. That is exactly Section 8’s “AI-challenge hit-rate” leading indicator rising, showing her reasoning was not replaced by AI but sharpened in sparring against it.
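The indicator she tracked can be computed from a simple tally of compared cases. A minimal Python sketch, assuming the denominator is all compared cases rather than only the disagreements (the text can be read either way); `Comparison` and `challenge_hit_rate` are hypothetical names, and the sample numbers are constructed only to reproduce the 9% figure.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    disagreed: bool           # her differential differed from AI's
    learner_was_right: bool   # only meaningful when disagreed is True


def challenge_hit_rate(cases: list[Comparison]) -> float:
    """Share of all compared cases where she disagreed with AI and was right."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c.disagreed and c.learner_was_right)
    return hits / len(cases)


# Illustrative tally: 100 compared cases, 20 disagreements, 9 of them won by her.
cases = (
    [Comparison(True, True)] * 9
    + [Comparison(True, False)] * 11
    + [Comparison(False, False)] * 80
)
print(challenge_hit_rate(cases))  # 0.09
```

Every disagreement still goes into the reflection log regardless of who was right; the rate is read as a trend over months, not as a score to push up.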

止步线的判据 · The stop-line criterion

止步线最干净的判据：一项能力，萎缩了你就再也判断不了 AI 对不对、也无法为后果负责，它就在线内。AI 做得好反而更危险，因为外包看起来零成本。

The cleanest stop-line criterion: a capacity is inside the line if, once it atrophies, you can neither judge whether AI is right nor be accountable for the consequences. AI doing it well makes it more dangerous, not less, because outsourcing then looks costless.

案例四 · 一个自学者在萎缩硬化前用仪表盘抓住了它

Case 4 · A self-learner catches atrophy on the dashboard before it hardened

情境：一名转行学数据分析的自学者，半年里进步飞快，靠的是 AI 全程陪写代码、解释报错、给思路。表面信号全是绿的：项目越做越复杂、产出越来越快。但他按第 8 节给自己挂了一组刻意偏向滞后、无 AI 在场的先行指标，每两周自测一次。第三个月，一条指标先动了："不用 AI 我还会吗"的失败频次在升，他发现自己越来越难独立写出一段哪怕简单的数据清洗逻辑，必须先打开 AI 才有"手感"。紧接着第二条指标确认了它：迁移测试通过率在降，给他一个结构相似但 AI 没见过上下文的新任务，他卡住的概率比两个月前更高。

Scene: a career-changer self-studying data analysis progressed fast over six months, on the back of AI co-writing code, explaining errors, and supplying ideas throughout. The surface signals were all green: projects grew more complex, output grew faster. But following Section 8 he had set up a small group of leading indicators, deliberately measured after a delay and with no AI present, and he tested himself on them every two weeks. In month three, one indicator moved first: the failure frequency on “could I still do this without AI” rose. He found it increasingly hard to write even a simple data-cleaning routine on his own, and needed to open AI first to get “the feel.” Then a second indicator confirmed it: the transfer-test pass rate fell. Given a structurally similar task whose context AI had not seen, he was more likely to get stuck than two months earlier.

这两条指标都领先于任何外部失败。他的项目还在正常推进，没有任何老板或客户察觉问题：如果他只看产出，会一直绿灯到某天 AI 用不了、或遇到一个 AI 解不了的真问题时才暴雷，而那时萎缩已经硬化、补救成本极高。仪表盘的全部价值就在这个提前量：它在能力损失转化为可见后果之前就让损失可见。他的补救也很直接：把数据清洗这项能力从"全交"档拨回"合意带"（INSTRUMENT 13），恢复"先自己写、卡住超过 15 分钟才问 AI"的延迟提示规约，并把每次卡点记进反思库。两个月后，那两条指标掉头，他没有放弃 AI，只是把一项悄悄滑出止步线的能力，重新拽了回来。

The crux: both indicators lead any external failure. His projects were still progressing normally, and no boss or client noticed anything. Had he watched only output, the light would have stayed green until the day AI was unavailable or a real problem AI could not solve appeared. By then, atrophy would have hardened and remediation cost soared. The dashboard’s entire value is this lead time: it makes capacity loss visible before that loss turns into a visible consequence. His remedy was direct too. He moved the data-cleaning capacity from the “hand over fully” setting back into the desirable-difficulty band (INSTRUMENT 13), restored a delayed-hint rule of “write it yourself first, and ask AI only after being stuck for more than 15 minutes,” and logged each sticking point in the reflection log. Two months on, both indicators turned around. He did not abandon AI; he pulled back a capacity that had quietly slid outside the stop-line.
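The two indicators and the delayed-hint rule from this case are simple enough to track in a few lines. A minimal Python sketch under stated assumptions: `Check`, `worsening`, and `may_ask_ai` are hypothetical names, the comparison looks only at the last two self-tests (a real log would likely want a longer window to smooth noise), and the 15-minute threshold is the one figure taken from the case.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One two-week self-test, taken with no AI present."""
    no_ai_failures: int
    no_ai_attempts: int
    transfer_passes: int
    transfer_attempts: int


def _rate(part: int, whole: int) -> float:
    return part / whole if whole else 0.0


def worsening(history: list[Check]) -> list[str]:
    """Name the indicators that got worse between the last two checks."""
    if len(history) < 2:
        return []
    prev, last = history[-2], history[-1]
    moved = []
    if _rate(last.no_ai_failures, last.no_ai_attempts) > _rate(
        prev.no_ai_failures, prev.no_ai_attempts
    ):
        moved.append("no_ai_failure_rate")
    if _rate(last.transfer_passes, last.transfer_attempts) < _rate(
        prev.transfer_passes, prev.transfer_attempts
    ):
        moved.append("transfer_pass_rate")
    return moved


def may_ask_ai(minutes_stuck: float, threshold: float = 15) -> bool:
    """Delayed-hint rule: ask only after being stuck longer than the threshold."""
    return minutes_stuck > threshold


# Illustrative history: one indicator moves first, then the second confirms it.
history = [Check(1, 5, 4, 5), Check(2, 5, 4, 5), Check(3, 5, 3, 5)]
print(worsening(history[:2]))
print(worsening(history))
```

The function only reports which alarm is sounding. Deciding what to do about it stays with the person, in line with the smoke-alarm reading of the dashboard.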

反过来：这套方法论自己会在哪里失败

In reverse: where this methodology fails on its own terms

诚实地走完四个成功案例，还要补一个会失败的案例——否则就违反了本卷自己的证据纪律（别把处方说得比证据硬）。这套方法论有三种现实的失败模式，都值得点名：

Having walked four successful cases, honesty requires showing where the methodology fails too; otherwise we violate the volume’s own evidence discipline (do not state a prescription more firmly than its evidence allows). It has three realistic failure modes, all worth naming:

止步线划得太宽，退回低效。一个把几乎所有认知都划进止步线、拒绝外包的人，算不上认知主权，只是和已解瓶颈较劲——他在纯"知道"层（语法、样板、查得到的事实）上浪费的合意困难，本可以省下来投到真正的判断根上。止步线的价值恰恰在于它窄：守住少数几条真正承重的边界，其余尽量交出去。划得太宽，方法论就退化成它批判过的"覆盖优先"的镜像。
The stop-line drawn too wide, regressing into inefficiency. A person who draws almost all cognition inside the stop-line and refuses to outsource is not exercising cognitive sovereignty but wrestling a solved bottleneck: the desirable difficulty he wastes on the pure “knowing” layer (syntax, boilerplate, look-up-able facts) could have been saved and invested in the real judgment root. The stop-line’s value lies precisely in being narrow: guard the few truly load-bearing boundaries and hand the rest out. Drawn too wide, the methodology degrades into a mirror image of the “coverage-first” habit it critiqued.
基础不足时强加难度，制造的是挫败而非学习。这是 Bjork 边界（R8）最常被忽略的一面——合意困难只对"有基础能成功响应"的学习者合意。对一个连基本概念都没建立的初学者强行闭卷、延迟提示，难度不会变"合意"，只会让他停滞、放弃。对这类学习者，正确的做法恰恰相反：先让 AI 当密集的脚手架把基础铺起来，等他能成功响应了，再逐步撤除、调高阻力。把顺序反过来，方法论就成了劝退器。
Imposing difficulty without a base produces frustration, not learning. This is the most-ignored face of Bjork’s boundary (R8): difficulty is desirable only for a learner with enough base to respond successfully. Forcing closed-book work and delayed hints on a beginner who has not even built the basic concepts will not make the difficulty “desirable”; it will only stall and discourage them. For such a learner the right move is the opposite: let AI be a dense scaffold to lay the base first, and once they can respond successfully, gradually withdraw it and raise the friction. Get the order backwards and the methodology becomes a quit-trigger.
仪表盘指标被当成新的 KPI 来表演。如果有人把"质疑 AI 的命中率"当成要冲高的分数，他会开始为了指标表现而无意义地反驳 AI——这正好掉进 Campbell 定律〔R22〕：指标一旦成了目标就失真。仪表盘是烟雾报警器，不是计分板；它响了你去查原因，而不是想办法让它显示好看的数字。
The dashboard indicators gamed as a new KPI. If someone treats “AI-challenge hit-rate” as a score to maximize, they will start pushing back on AI meaninglessly just to game the metric – falling straight into Campbell’s law〔R22〕: an indicator distorts once it becomes a target. The dashboard is a smoke alarm, not a scoreboard; when it sounds you investigate the cause, you do not engineer it to display a flattering number.

这三种失败模式有一个共同的解药，也正好回到内核：判断。止步线划多宽、难度加多大、指标怎么读：没有一条能交给一个固定公式，全都要人根据具体情境做判断，并随能力成长持续调整。这恰恰证明了本卷的核心立场是"把判断这一步留在人手里"，并非"少用 AI"这种粗糙的姿态：连这套方法论本身，都需要被它的使用者用判断来校准，而不是机械照搬。一套需要判断才能正确使用的方法论，本身就是对"判断不可外包"这一命题的演示。

These three failure modes share one antidote, which returns us to the kernel: judgment. How wide to draw the stop-line, how much difficulty to add, how to read the indicators – none can be handed to a fixed formula; all require a person to judge by the specific situation and keep adjusting as capacity grows. This is exactly what proves the volume’s core stance is not the crude posture of “use AI less” but “keep the judgment step in human hands” – even this methodology itself must be calibrated by its user’s judgment rather than applied mechanically. A methodology that requires judgment to use correctly is itself a demonstration of the proposition that judgment cannot be outsourced.

四个案例横跨工程、教学、临床、自学，但走的是同一条内核路径：先用机理看清"知道 vs 能做"的成本差，再用仪器把一团能力拆到可判定的颗粒，对落在止步线内的那部分设合意困难、建反思库、挂先行指标。它们共同演示的，是把判断这一步牢牢留在人手里，把执行尽量交出去，并用工程化的反馈回路守住这条边界不被便利的默认引力悄悄推移；重点从不在"少用 AI"。

The four cases span engineering, teaching, the clinic, and self-study, yet walk the same kernel path. Use the mechanics to see the “knowing vs doing” cost gap, and use the instruments to split a lump of capacity to a judgeable grain. For the part landing inside the stop-line, add desirable difficulty, build a reflection log, and hang leading indicators. What they jointly demonstrate is not “use AI less.” It is keeping the judgment step firmly in human hands, handing execution out as far as possible, and using engineered feedback loops to keep this boundary from being quietly pushed by convenience’s default gravity.

LEARN

TOOLKIT · 工具包

TOOLKIT

工具包 · 可直接照做

Toolkit · do-this artifacts

可直接照做的内化端工具

The internalization-end tools you can run as-is

前面都在讲道理，这一章把它全变成你明天就能照着做的动作。

Everything before this argued the case. This chapter turns it into moves you can follow starting tomorrow.

一句话 · In one line

五件明天就能用的工具，把整卷落成能照做的规约。最小、也最高杠杆的一条：开口问 AI 之前，先写下你自己的答案。

Five tools you can use tomorrow, turning the whole volume into rules you can just follow. The smallest, highest-leverage one: write your own answer down before you ask AI.

工具一 · 错题反思库规约：记的是推理路径，不是答案

Tool 1 · The error-reflection-log spec: log the reasoning path, not the answer

大多数人记的"错题本"记错了对象——抄下正确答案，下次照抄。那还是在记"知道"，对内化毫无帮助。AI-Native 的反思库（第 9 节）记的是你当时的推理路径与它在哪里断裂，因为可迁移的能力长在路径上，不在答案上。每条记录是固定五栏，刻意做到人机同源、可 diff（你和 AI 各填一份，差异本身就是学习信号）：

Most people’s “error notebook” logs the wrong object: copy down the correct answer, copy it again next time. That is still logging “knowing,” and it does nothing for internalization. The AI-Native reflection log (Section 9) records your reasoning path at the time and where it fractured, because transferable capacity grows on the path, not the answer. Each entry is a fixed five-field record, deliberately kept in one shared format for human and AI so the two can be diffed (you and AI each fill one in; the difference is itself a learning signal):

栏 1 · Field 1

触发 · Trigger

遇到的具体问题，以及"我当时以为这是个什么问题"——很多错从误判问题类型开始。

The specific problem, and “what kind of problem I thought it was”; many errors start with misclassifying the problem type.

栏 2 · Field 2

我的路径 · My path

撤掉 AI、凭自己走的完整推理（哪怕错的），逐步写下——这是重现而非识别，最吃力也最值钱。

The full reasoning you walked yourself with AI removed (even if wrong), written step by step. This is reproduction, not recognition: the hardest part and the most valuable.

栏 3 · Field 3

断裂点 · Fracture point

和 AI/正解对照后，定位推理在哪一步第一次偏离，以及偏离的根因（不是"算错了"，是"我为什么会这样想"）。

After comparing with AI or the solution, locate the step at which the reasoning first diverged, and the root cause (not “miscalculated” but “why I thought that way”).

栏 4 · Field 4

可迁移的修正 · Transferable fix

把断裂点抽象成一条下次能用在别的题上的规则，而非只修这一题。迁移性是它和普通错题本的分水岭。

Abstract the fracture into a rule usable on other problems next time, not just a patch for this one. Transferability is what separates it from an ordinary error notebook.

栏 5 · Field 5

回访日 · Revisit date

按间隔效应排一个未来日期，到期闭卷重做这条——不回访的反思库只是仓库（R14）。

Schedule a future date by the spacing effect; on it, redo this entry closed-book. A reflection log never revisited is just a warehouse (R14).

为什么坚持人机同源、可 diff？因为你和 AI 各自独立填栏 2（推理路径）后，两份的差异就是最高密度的学习信号：AI 想到而你没想到的，是你的盲区；你想到而 AI 没想到的，是你尚存的、值得守护的独立判断。把这个差异本身当成学习对象，比单看正确答案信息量大一个量级。这条规约不依赖任何特定软件，一个表格、一个 Markdown 文件、一张纸都能跑——它的承重不在工具，在"记路径不记答案"这条纪律。

Why insist on one shared, diffable format? Because once you and AI each independently fill field 2 (the reasoning path), the difference between the two is the highest-density learning signal. What AI thought of and you did not is your blind spot; what you thought of and AI did not is your surviving, worth-guarding independent judgment. Treating that difference itself as the learning object carries an order of magnitude more information than reading the correct answer alone. This spec depends on no particular software: a spreadsheet, a Markdown file, or a sheet of paper all run it. Its load is not in the tool but in the discipline of “log the path, not the answer.”
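The five-field record and the field-2 diff can be sketched in a few lines. This is a minimal Python sketch, assuming reasoning steps are written as short labels that can be compared after light normalization; real paths written as free prose would need a human to line them up. `ReflectionEntry`, `diff_paths`, and the sample steps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReflectionEntry:
    trigger: str                          # field 1: the problem, and what I took it to be
    my_path: list[str]                    # field 2: my own steps, written with AI removed
    fracture_point: str = ""              # field 3: filled in after comparing
    transferable_fix: str = ""            # field 4: a rule usable on other problems
    revisit_date: Optional[date] = None   # field 5: closed-book redo date


def _norm(step: str) -> str:
    return " ".join(step.lower().split())


def diff_paths(mine: list[str], ai: list[str]) -> tuple[list[str], list[str]]:
    """Return (blind_spots, independent): steps only AI had, and steps only I had."""
    mine_set = {_norm(s) for s in mine}
    ai_set = {_norm(s) for s in ai}
    blind_spots = [s for s in ai if _norm(s) not in mine_set]
    independent = [s for s in mine if _norm(s) not in ai_set]
    return blind_spots, independent


entry = ReflectionEntry(
    trigger="Row counts doubled after a join; I took it for a filter bug.",
    my_path=["check null handling", "trace join keys"],
)
ai_path = ["Trace join keys", "check timezone of timestamps"]
blind_spots, independent = diff_paths(entry.my_path, ai_path)
print(blind_spots)   # ['check timezone of timestamps']
print(independent)   # ['check null handling']
```

Exact-match comparison is deliberately crude. The useful part is the shape of the output: one list of blind spots to study, one list of independent steps worth keeping.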

工具二 · 先自答协议：永远不空手向 AI 求助

Tool 2 · The self-answer-first protocol: never ask AI empty-handed

这是全卷处方里最小、也最高杠杆的一条，可以浓缩成一句操作律：在向 AI 提问之前，先写下你自己的答案、猜测或推理草稿，哪怕它很可能是错的。它之所以是杠杆点，是因为它一次性守住了三样东西：

This is the smallest and highest-leverage prescription in the volume, compressible to one operating law: before asking AI, write down your own answer, guess, or draft reasoning first, even if it is likely wrong. It is a leverage point because it guards three things at once:

它强制激活重现（你得先从脑中生成，而非等着识别 AI 的输出），把识别/重现错觉关在门外。
It forces reproduction (you must generate from your head rather than wait to recognize AI’s output), shutting the recognition/reproduction illusion out.
它把 AI 从"答案来源"降级为"对照与校验源"——你带着一个待检验的假设去，而不是带着一个空洞去，于是你读 AI 输出时是在验证而非接收，验证这个动作本身就在练判断。
It demotes AI from “answer source” to “comparison and verification source” – you arrive with a hypothesis to test rather than a void, so when you read AI’s output you are verifying, not receiving, and the act of verifying is itself practicing judgment.
它产出了反思库栏 2 所需的原始材料——你的独立路径。
It produces the raw material field 2 of the reflection log needs – your independent path.

落地形态可以极轻：在每个 AI 对话框上方放一行自我提示"我先猜：______"，或养成"先在便签上写三句再回车"的习惯。

The landing form can be extremely light: a self-prompt line above each AI chat box, “my guess first: ______,” or a habit of “write three sentences on a sticky note before pressing enter.”
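For anyone who reaches AI through a script rather than a chat box, the same protocol can be enforced as a small gate. A minimal Python sketch: `ask_after_guess` is a hypothetical helper, and `ask_ai` stands in for whatever function actually sends the question; no real client library is assumed.

```python
from typing import Callable


def ask_after_guess(
    question: str, my_guess: str, ask_ai: Callable[[str], str]
) -> dict:
    """Refuse to forward the question until a non-empty guess has been written."""
    if not my_guess.strip():
        raise ValueError("Write your own guess before asking AI.")
    answer = ask_ai(question)
    # Keep guess and answer side by side: the guess is raw material for
    # field 2 of the reflection log.
    return {"question": question, "my_guess": my_guess, "ai_answer": answer}


# Usage with a stub in place of a real AI call.
record = ask_after_guess(
    "Why is the join duplicating rows?",
    "Maybe the join key is not unique on the right side.",
    ask_ai=lambda q: "stub answer",
)
print(record["my_guess"])
```

The gate checks only that something was written, not that it is right; a wrong guess serves the protocol as well as a correct one.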

工具三 · 迁移测试设计：撤掉 AI、换情境、看还能不能做出来

Tool 3 · Transfer-test design: remove AI, change the context, see if you can still do it

第 8 节说迁移测试通过率是最该看的信号；这里给出怎么设计一道合格的迁移测试。一道好的迁移测试要同时满足三个条件，少一个就会退化成自欺：

Section 8 says transfer-test pass rate is the signal most worth watching; here is how to design a valid transfer test. A good one must satisfy three conditions at once; drop any and it degrades into self-deception:

无 AI 在场——只要 AI 可触达，测的就是"你+AI"的联合能力，而不是你的能力，这是最常被偷掉的条件。
No AI present – as long as AI is reachable, you are testing the joint “you + AI” capacity, not yours; this is the most frequently stolen condition.
情境变化——题面、数据、上下文必须与你练过的不同，否则你测的可能只是对特定题型的模式记忆，而非可迁移的理解；迁移的全部意义在于"换个壳还认得出内核"。
Changed context – the wording, data, and context must differ from what you practiced, or you may be testing pattern-memory for a specific problem type rather than transferable understanding; transfer’s whole meaning is “recognize the kernel through a new shell.”
要求重现而非识别——必须让你从头生成（写出、做出、讲出），而不是从给定选项里认出对的。
Require reproduction, not recognition – you must generate from scratch (write, do, explain), not pick the right one from given options.

判据很干脆：通过 = 在以上三条都满足时还能独立做出来。设计节奏建议跟睡眠/间隔走（R14/R15）：隔天测、隔周测、隔月测，三道都过，才算真的编译进了"能做"。一个轻量版本：每学完一个主题，让 AI 帮你生成一道"结构相同但换皮"的新题，然后立刻关掉 AI，闭卷做，这里 AI 是出题者，不是答题者，这个角色分配本身就是方法论。

The criterion is blunt: pass = you can still do it independently when all three hold. Pace the design with sleep/spacing (R14/R15) – test next-day, next-week, next-month; pass all three and it is truly compiled into “doing.” A lightweight version: after finishing a topic, have AI generate a “same-structure, reskinned” new problem, then immediately close AI and do it closed-book – here AI is the question-setter, not the answerer, and that role assignment is itself the methodology.
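The three conditions and the next-day, next-week, next-month rhythm can be written down as a checklist. A minimal Python sketch, with assumptions labeled: `TransferAttempt`, `schedule`, and `compiled` are hypothetical names, and "next month" is approximated as 30 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TransferAttempt:
    ai_present: bool               # condition 1 is violated if True
    context_changed: bool          # condition 2: new wording, data, context
    reproduced_from_scratch: bool  # condition 3: generated, not picked from options
    solved: bool


def is_valid(a: TransferAttempt) -> bool:
    """All three design conditions hold; otherwise the test measures something else."""
    return (not a.ai_present) and a.context_changed and a.reproduced_from_scratch


def passed(a: TransferAttempt) -> bool:
    return is_valid(a) and a.solved


def schedule(learned_on: date) -> list[date]:
    """Next-day, next-week, next-month test dates (month taken as 30 days)."""
    return [learned_on + timedelta(days=d) for d in (1, 7, 30)]


def compiled(attempts: list[TransferAttempt]) -> bool:
    """True only when all three spaced attempts were valid and solved."""
    return len(attempts) == 3 and all(passed(a) for a in attempts)


print(schedule(date(2025, 1, 1)))
```

An attempt solved with AI within reach counts as not passed here, which matches the text: that attempt measured the joint "you + AI" capacity.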

FIG. L.14 / 三类合意困难的机理 · THE THREE DESIRABLE DIFFICULTIES

看懂：间隔/提取/交错各自怎么把"当下变难"换成"长期变牢"

Read: how spacing/retrieval/interleaving each trade “harder now” for “sturdier later”

三栏一起读：间隔（左）把练习拉开，对撞集中突击；提取（中）让信息从脑中输出而非再输入，对撞重读式的虚假流畅；交错（右）把不同题型混排，逼出"该用哪招"的判断。三者机理不同，但指纹相同——都让你当下更吃力（红），换来长期更牢、且能迁移到新情境的留存（蓝）。这正是为什么 AI 提供的"轻松"是个陷阱：轻松感恰恰是合意困难被绕过的信号。

Read the three panels together: spacing (left) pulls practice apart, colliding with massed cramming; retrieval (center) sends information out of the brain rather than back in, colliding with reread fluency; interleaving (right) mixes problem types, forcing the judgment of “which move applies.” Different mechanisms, same fingerprint: each makes you work harder now (red) in exchange for sturdier, transfer-capable retention later (blue). This is exactly why the “ease” AI offers is a trap: the feeling of ease is precisely the signal that desirable difficulty has been bypassed.

工具四 · 提问者训练操：把"会答"练成"会问"

Tool 4 · Questioner-development drills: training from “can answer” into “can ask”

第 3 节主张学习目标上游到提问、质疑、整合这套元能力。但"成为更好的提问者"常常停在口号——这里给四个可练的具体操，每个都对准一种被 AI 充裕悄悄削弱的提问肌肉。它们不需要任何特殊场景，把日常和 AI 的协作改造一下就能练。

Section 3 argues the learning goal moves upstream to the meta-skills of asking, challenging, and integrating. But “become a better question-asker” often stalls at a slogan, so here are four practicable drills, each aimed at a questioning muscle quietly weakened by AI abundance. They need no special setting; reshape your everyday AI collaboration and you are already training them.

操 1Drill 1

反驳得对Push back, correctly

每次 AI 给完答案，强制自己找出至少一处可质疑点并验证它对不对。练的是第 8 节的"质疑命中率"，这是头号先行指标（R18）。After every AI answer, force yourself to find at least one challengeable point and verify whether it holds. Trains Section 8’s “challenge hit-rate,” the top leading indicator (R18).

操 2Drill 2

问题升格Upgrade the question

把你刚问 AI 的问题，重写成一个更切中真问题的版本——练的是"识别真问题"，这是 AI 替不了的判断起点。Rewrite the question you just asked AI into a version that hits the real problem more squarely – trains “spotting the real problem,” the judgment starting point AI cannot stand in for.

操 3Drill 3

追到底层Chase to the bottom

对 AI 的每个答案连问三层"为什么"，直到触到第一性原理或触到它的知识边界——练的是把识别层的"懂了"推进到机理层。Ask “why” three layers deep on every AI answer, until you hit a first principle or its knowledge boundary – trains pushing recognition-layer “got it” down to the mechanism layer.

操 4Drill 4

先建假设Hypothesis first

面对一个新领域，先自己列出"我猜关键问题是哪些"，再让 AI 补全——练的是带着结构去探索，而非让 AI 替你定义问题空间。Facing a new field, first list “what I guess the key questions are,” then let AI complete it – trains exploring with structure rather than letting AI define the problem space for you.

这四个操共享一个底层逻辑：AI 充裕最先削弱的是"问"的能力，不是"答"的能力（那本来就在外包）——因为当答案随手可得，人会逐渐失去先在脑中生成问题、并判断哪个问题值得问的习惯。提问肌肉一旦闲置就萎缩，而它恰恰是判断与品味的前端。这四个操就是给这块肌肉刻意施加的负荷。

The four drills share one underlying logic: what AI abundance weakens first is not the capacity to “answer” (that was already outsourced) but the capacity to “ask.” When answers are at hand, a person gradually loses the habit of generating questions in the head first and judging which question is worth asking. The asking-muscle atrophies once idled, and it is precisely the front end of judgment and taste. These four drills are the deliberate load placed on that muscle.
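Drill 1 only trains anything if the challenges are actually checked, so it helps to log them. The sketch below assumes one plausible reading of "challenge hit-rate": among the challenges you verified, the share where the AI really was wrong. Section 8 may define it differently; the `Challenge` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Challenge:
    """One point in an AI answer that you questioned (hypothetical record)."""
    claim: str
    verified: bool      # did you check it against a source or a test?
    ai_was_wrong: bool  # outcome of that check; ignored if not verified


def challenge_hit_rate(log: list) -> Optional[float]:
    """Share of verified challenges that held up; None if nothing was verified.

    Assumed definition, for illustration only.
    """
    checked = [c for c in log if c.verified]
    if not checked:
        return None
    return sum(c.ai_was_wrong for c in checked) / len(checked)


log = [
    Challenge("the cited default timeout", verified=True, ai_was_wrong=True),
    Challenge("the complexity claim", verified=True, ai_was_wrong=False),
    Challenge("the historical date", verified=False, ai_was_wrong=False),
]
print(challenge_hit_rate(log))  # 0.5
```

Unverified challenges are excluded rather than counted as misses, which keeps the number from rewarding pushback that was never checked.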

工具五 · AI 止步线决策程序：把"该不该外包"变成可重复的判定

Tool 5 · The AI stop-line decision procedure: making “outsource or not” a repeatable verdict

前四件是练习，这一件是判定程序——把 FIG L.13 决策树固化成一段可重复执行的流程，让"该不该把这件事交给 AI"不再凭感觉，而有一套每次都能跑的步骤。它有五步，顺序不能乱：

The first four are drills; this one is the verdict procedure – freezing the FIG L.13 decision tree into a repeatable run so that “should I hand this to AI” is no longer by feel but a set of steps you can run every time. It has five steps, order fixed:

① 拆颗粒——先把要判定的能力拆到"可单独判定"的最小颗粒（案例一的教训：整团判定必然出错）。
① Split the grain – break the capacity under judgment to the smallest “separately judgeable” grain (Case 1’s lesson: judging the whole lump necessarily errs).
② 问可充裕——AI 能否充裕代劳且有可机检判据？否→暂缓，照常自学；是→进③。
② Ask abundance – can AI abundantly do it with a machine-checkable criterion? No → park, keep learning yourself; yes → go to ③.
③ 问主权——若这项能力萎缩，你是否就再也无法判断 AI 在此事上对不对、也无法为后果负责？否→放心外包，把省下的精力还给止步线内的事；是→进④。
③ Ask sovereignty – if this capacity atrophies, would you no longer be able to judge whether AI is right about it nor be accountable for the consequences? No → hand off freely, return the saved effort to what is inside the stop-line; yes → go to ④.
④ 问基础——你是否已有足够基础能"成功响应"刻意保留的难度（Bjork 边界 R8）？否→先补基础或让 AI 当陪练；是→进⑤。
④ Ask base – do you already have enough base to “respond successfully” to retained difficulty (Bjork’s boundary, R8)? No → build the base first or let AI spar; yes → go to ⑤.
⑤ 落处置——划入止步线：让 AI 当陪练而非代办，设合意困难（用 INSTRUMENT 14 把阻力调到合意带），建反思库，挂先行指标。
⑤ Land the disposition – draw it inside the stop-line: let AI spar rather than do-for-you, add desirable difficulty (use INSTRUMENT 14 to dial friction to the band), build a reflection log, hang leading indicators.
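
The five steps above can be sketched as a short function. This is an illustration under stated assumptions, not the instrument itself: the names `Grain`, `Disposition`, and `decide` are hypothetical, step ① is assumed to have happened already (the function judges one grain at a time), and the three yes/no answers are supplied by you, not computed.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    PARK = "park: keep learning it yourself"
    HAND_OFF = "hand off freely"
    BUILD_BASE = "build the base first, or let AI spar"
    INSIDE_STOP_LINE = "inside the stop-line: AI spars, you do the work"


@dataclass(frozen=True)
class Grain:
    """One separately judgeable capacity, i.e. the output of step 1."""
    name: str
    ai_abundant_and_checkable: bool  # step 2
    atrophy_costs_sovereignty: bool  # step 3
    enough_base: bool                # step 4


def decide(grain: Grain) -> Disposition:
    """Walk steps 2 to 5 in their fixed order and return the disposition."""
    if not grain.ai_abundant_and_checkable:
        return Disposition.PARK
    if not grain.atrophy_costs_sovereignty:
        return Disposition.HAND_OFF
    if not grain.enough_base:
        return Disposition.BUILD_BASE
    return Disposition.INSIDE_STOP_LINE


# Hypothetical grains, for illustration only.
grains = [
    Grain("format a reference list", True, False, True),
    Grain("judge whether a proof step is valid", True, True, True),
]
for g in grains:
    print(f"{g.name}: {decide(g).value}")
```

Keeping each answer as a stored field rather than a passing thought is what makes the verdict reviewable three months later.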

这套程序的价值在于它可重复、可复盘：每次判定都留下"为什么这样判"的痕迹，三个月后你能回看自己的止步线是否需要调整——能力在长，止步线也该随之移动。

The procedure’s value is that it is repeatable and reviewable: each verdict leaves a trace of “why I judged this way,” so three months on you can revisit whether your stop-line needs adjusting – as capacity grows, the stop-line should move with it.

INSTRUMENT 14 · 交出还是保留 · 现场判定器 HAND-OFF OR KEEP · LIVE DECIDER

把工具五的决策程序变成一次现场点击：依次回答三个问题，仪器给出处置（放心外包 / 暂缓自学 / 划入止步线），并在"划入止步线"时让你顺手把阻力调到合意带——这一步就是 INSTRUMENT 13 的旋钮内嵌进来。它把"交出 vs 保留"和"阻力调到刚好"两件事合成一次判定，对应 FIG L.13 那棵树。

This instrument turns Tool 5’s decision procedure into a single live pass: answer three questions in order and it returns a disposition (hand off freely / park and self-learn / draw inside the stop-line). When the disposition is “inside the stop-line,” it lets you dial the friction to the band right there; that step is INSTRUMENT 13’s dial, embedded. It fuses “hand off vs keep” and “dial friction just right” into a single verdict, mirroring the FIG L.13 tree.

Q1 · AI 能充裕代劳、且有可机检判据吗？Can AI abundantly do it, with a machine-checkable criterion?

Q2 · 萎缩它，你会失去判断/兜底主权吗？If it atrophies, do you lose sovereignty to judge / backstop?

Q3 · 你已有足够基础响应难度吗？（Bjork 边界）Do you have enough base to respond to difficulty? (Bjork’s boundary)

LEARN

MECHANISM · 机理

MECHANISM

机理 · 速度公理

Mechanism · The speed axiom

有些过程的价值，正在于慢

Some Processes Are Valuable Because They’re Slow

熬夜一晚把一门课"速通"完，第二天却几乎全忘光——AI 承诺加速一切，可有些学习，快了反而学不进去。

Pull an all-nighter to speed-run a course and by the next day it’s nearly all gone. AI promises to accelerate everything, yet some learning, sped up, simply doesn’t take.

一句话In one line

记忆从脆弱变牢固要靠间隔和睡眠，这是 AI 压不掉的物理时间；所以有一类过程的价值正在于慢。"感觉学得更快"最不可信，真效果得隔几天撤掉 AI 再测。Memory turns from fragile to stable through spacing and sleep: physical time AI can’t compress. So one class of process is valuable because it’s slow. “Felt faster” is the least trustworthy signal; measure the real effect days later, with AI removed.

"加速一切"在这里撞上一堵物理的墙。记忆从短期变成长期，这个巩固过程需要时间，也需要睡眠：间隔效应里，隔 24 小时复习胜过隔 15 分钟[R14]；睡眠期间（慢波睡眠/REM），大脑主动回放、重新激活记忆，光是睡一觉，在零额外练习的情况下就能提升技能表现（Stickgold 2006;Diekelmann & Born 2010;Walker et al. 2003，证据分级 Ⅱ）。睡眠对记忆的泛化和迁移尤其关键——而迁移正是"会做"的检验标准（第 2 节讲过）。

“Accelerate everything” runs into a wall of physics here. The consolidation that turns memory from short-term to long-term takes time and needs sleep: in the spacing effect, a 24-hour gap beats a 15-minute gap[R14]. During sleep (slow-wave/REM), the brain actively replays and reactivates memory; a single night’s sleep alone can improve skill performance with zero extra practice (Stickgold 2006; Diekelmann & Born 2010; Walker et al. 2003, grade II). Sleep matters especially for generalization and transfer of memory, and transfer is exactly the test standard for “knowing how” (Section 2).

旧误读 · 学习即传输Old misreading · learning as transfer

既然信息能瞬时传输，学习的"慢"被当成纯摩擦——等待、重复、睡一觉，全是待优化的延迟。于是 AI 时代的诱惑是：把一切压成一次性灌输，今晚学完今晚就会。

Since information transmits instantly, learning’s “slowness” gets treated as pure friction – waiting, repetition, sleeping on it, all delays to be optimized away. So the temptation of the AI era is: compress everything into a one-shot infusion, learn it tonight and know it tonight.

新 · 原理New · principle

巩固所需的脑状态（间隔、睡眠）是物理时间，不可被 AI 压缩。"慢"不是摩擦，是一部分价值来源。处方随之改变：脚手架内建间隔与睡眠节律（第 9 节的"下次复看日期"就是这条的工程化），把时间当盟友，不当敌人。

The brain states consolidation needs (spacing, sleep) are physical time and cannot be compressed by AI. “Slow” is not friction but a source of value. The prescription shifts: scaffolds build in spacing and sleep rhythm (Section 9’s “next review date” is the engineering of this), treating time as ally, not enemy.

还有一条"加速反而拖慢"的旁证，提醒别把"快"当默认的好。METR 2025 的随机对照实验：16 名资深开源开发者，246 个真实 issue，允许用 AI 反而慢了 19%——而他们自己预测会快 24%，做完之后仍以为快了 20%，主观和客观完全反着来，且在他们最熟悉的任务上拖慢得更明显。说明一下口径：样本小，可能存在学习曲线，不能外推成"AI 永远拖慢专家"（证据分级 Ⅲ，旁证而非定论）。它的用处是戳破"加速一切"这个默认假设，不是反过来证明"AI 总是更慢"。

One more piece of corroboration shows that speeding up can backfire, a reminder not to treat “fast” as the default good. METR 2025’s RCT: 16 senior open-source developers, 246 real issues; being allowed to use AI made them 19% slower, while they’d predicted 24% faster, and even afterward still believed they’d been 20% faster. Subjective and objective point in opposite directions, and the slowdown was sharper on the tasks they knew best. Caveat: small N, possibly a learning-curve effect, can’t be extrapolated to “AI always slows experts down” (grade III corroboration, not a settled finding). Its job is to puncture the default assumption that everything gets faster, not to prove the reverse.

速度公理怎样反过来约束整卷的处方

How the speed axiom pushes back on the whole volume’s prescription

速度公理不是孤立的一节，它给整卷的处方加了一道关卡：凡是号称"让学习更快"的方案，都得先问一句——它压缩掉的，是不是巩固所必需的那段物理时间？这道关卡让这卷和市面上大多数"AI 高效学习"方案分了道。那些方案的卖点几乎都是压缩时间：一晚速通一门课、十分钟拿下一个概念、把一周的内容塞进一次对话。速度公理直接判定，这类承诺在"会做"这一层是结构性不可能的：信息可以瞬间到手（知道），巩固却没法瞬间完成（会做），因为巩固要走一段只在你睡着时发生、AI 进不去的离线加工。

The speed axiom isn’t an isolated section; it adds a gate to the whole volume’s prescription: any scheme claiming to make learning faster has to answer one question first: does what it compresses include the physical time consolidation needs? This gate is what parts this volume from most “AI-efficient learning” schemes on the market. Their pitch is almost always compressed time: speed-run a course in one night, master a concept in ten minutes, cram a week into one session. The speed axiom simply rules such promises structurally impossible at the “knowing how” layer: information arrives instantly (knowing that), but consolidation cannot; it runs through offline processing that happens mostly while you sleep, somewhere AI can’t follow you.

所以这卷的处方在每个落点上都是反着来设计的：脚手架内建间隔而不是集中（第 9 节的复看日期）、检验信号看隔期表现而不是当堂表现（第 2、8 节）、合意困难刻意让当下变慢（第 5 节）。这些设计合在一起，服从的是同一条母约束：顺着时间常数走，别跟它对着干。看清这一点，就明白为什么这卷敢说"有些过程的价值正在于慢"——慢不是审美偏好，是巩固这件事本身的物理参数，任何无视它的加速方案，都是在拿长期留存换当下那点流畅感。

So this volume’s prescription is deliberately designed the other way round at every point: scaffolds build in spacing rather than massing (Section 9’s review dates), test signals check lagged rather than in-session performance (Sections 2, 8), and desirable difficulty deliberately slows the present down (Section 5). Together these obey one master constraint: go with the time constant, don’t fight it. See that, and you see why this volume can say some processes are valuable precisely because they’re slow. Slow is a physical parameter of consolidation itself, not an aesthetic preference, and any acceleration scheme that ignores it is trading long-term retention for a moment of present fluency.

睡眠不是停机，是离线的主动重训

Sleep Is Active Offline Retraining, Not Downtime

"慢有价值"里最容易被跳过、却最硬的一块是睡眠的角色。直觉把睡眠当学习的间歇：什么都没发生的停机时间。神经科学给出的是相反的图景：睡眠是记忆主动加工的高峰期，不是空档。慢波睡眠期间，海马把白天获得的记忆痕迹反复重新激活、回放，逐步转交给新皮层做长期存储——这不是被动的衰减，是有方向的重训（Stickgold 2006;Diekelmann & Born 2010，证据分级 Ⅱ）。最惊人的证据是：在一项技能任务里，光是睡一觉，零额外练习，表现就能提升：大脑在你睡着的时候，替你把白天没练顺的东西又练了一遍。

The hardest, most-skipped piece of “slow has value” is sleep’s role. Instinct treats sleep as a gap in learning, downtime where nothing happens. Neuroscience says the opposite: sleep is a peak of active memory processing, not a void. During slow-wave sleep, the hippocampus repeatedly reactivates and replays the day’s memory traces, gradually handing them off to the neocortex for long-term storage: not passive decay, but directed retraining (Stickgold 2006; Diekelmann & Born 2010, grade II). The most striking evidence: on a skill task, sleeping alone, with zero extra practice, improves performance. The brain reran, while you slept, what you hadn’t yet smoothed out during the day.

这对学习方法论有一个直接又反直觉的推论：把学习压进一次性的熬夜速通，不只是累，是主动切断了记忆从脆弱变稳固的必经通道。AI 能把信息瞬间送到你面前，但它送不来那一夜的离线巩固[R15]。所以这卷的处方里，"睡够、把学习摊开在好几天里"不是养生建议，是跟间隔效应同源的一条硬约束——脚手架（第 9 节的间隔复看）正是顺着这条睡眠-巩固的时间常数设计的。

This yields a direct, counter-intuitive corollary: cramming learning into a one-shot all-nighter isn’t just tiring, it actively cuts off the passage memory must travel through to go from fragile to stable. AI can deliver information instantly, but it can’t deliver that night’s offline consolidation[R15]. So in this volume’s prescription, “sleep enough, spread learning across days” is a hard constraint that shares its root with the spacing effect, not wellness advice. The scaffold’s spaced review (Section 9) is designed precisely along this sleep-consolidation time constant.

主观和客观的背离：为什么"感觉更快"最不可信

Subjective versus objective: why “felt faster” is least trustworthy

METR 2025 那个实验里最值得反复琢磨的，不是"慢了 19%"这个数字，是主观和客观完全反着来：16 名资深开发者预测用 AI 会快 24%，实际慢了 19%，做完之后还是以为自己快了 20%。三个数字排在一起，画出一条让人不安的背离——人对"自己是不是变快了/变强了"的内省，一旦 AI 介入，就会系统性地失准。这条背离和合意困难的元认知陷阱其实是同一个现象的两个场景：那边是"当下的流畅被误读成已经掌握"，这边是"用 AI 的顺手被误读成更高效"。

What most deserves a second look in METR 2025’s RCT is not the “19% slower” figure but the fact that subjective and objective point in opposite directions: 16 senior developers predicted 24% faster with AI, were actually 19% slower, and even afterward still believed they’d been 20% faster. Put those three numbers side by side and you get an unsettling divergence: people’s sense of whether they’re getting faster or better goes systematically wrong once AI is in the loop. This divergence and the metacognitive trap of desirable difficulty are two scenes of the same phenomenon: over there, present fluency gets misread as mastery; over here, the smoothness of using AI gets misread as efficiency.
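
Putting the three figures on one signed scale makes the size of the divergence concrete. A small worked calculation follows; it assumes the reported percentages all describe the change in task completion time with AI allowed (negative means faster, positive means slower), and the variable and function names are illustrative.

```python
# Signed change in completion time with AI allowed, in percent.
# Negative = faster, positive = slower. Treating all three figures as
# changes in completion time is an assumption of this sketch.
predicted = -24  # forecast before doing the tasks
actual = 19      # measured result
recalled = -20   # belief after doing the tasks


def gap_points(believed: int, measured: int) -> int:
    """Percentage points separating a belief from the measurement."""
    return measured - believed


forecast_gap = gap_points(predicted, actual)       # 43 points
hindsight_gap = gap_points(recalled, actual)       # 39 points
correction = forecast_gap - hindsight_gap          # 4 points
wrong_sign_after = (recalled < 0) != (actual < 0)  # True

print(forecast_gap, hindsight_gap, correction, wrong_sign_after)
```

On this reading, finishing the work closed only 4 of the 43 points between forecast and measurement, and the belief still carried the wrong sign. The caveats attached to the study apply to this arithmetic as well.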

两者共同的根是：大脑用"省力的感觉"去代理"效果"，而 AI 恰好把这两者解耦了——它能让过程变得非常省力，却不保证真实产出或真实留存也跟着提升，甚至是相反。口径得说清楚：METR 是 Ⅲ 级旁证，样本小、可能带学习曲线、还只是编程任务，不能外推成"AI 永远拖慢专家"；它唯一的用处，是戳破"省力 = 高效"这个默认等式。这个等式一旦被戳破，"凭感觉判断学习效果"这件事就彻底靠不住了——这正是第 8 节仪表盘必须偏向滞后、无 AI 在场指标的根本原因。

The shared root: the brain proxies “effect” with “felt ease,” and AI happens to decouple the two. It can make a process feel effortless without guaranteeing real output or real retention keeps pace, and it can even invert them. Say the caveat plainly: METR is grade III corroboration, small N, possibly a learning-curve effect, and coding-specific; it can’t be extrapolated into “AI always slows experts down.” Its only use is puncturing the default equation “effortless equals efficient.” Once that equation is punctured, judging your own learning by feel becomes wholly unreliable: the root reason the Section 8 dashboard has to lean on lagged, AI-absent indicators.

巩固是生理过程，不是效率上的偏好

Consolidation is physiology, not an efficiency preference

"慢有价值"很容易被听成一句怀旧的价值判断，得把它钉回生理学里才站得住。记忆从依赖海马的脆弱痕迹，转成新皮层里稳定的表征，这个系统巩固过程需要离线时间，也高度依赖睡眠：慢波睡眠期间，白天的记忆痕迹被反复重新激活、回放，正是这次转移的承重机制（Stickgold 2006;Diekelmann & Born 2010;Rasch & Born 2013，证据分级 Ⅱ）。这是一条硬约束——没有那段离线时间，长期表征根本形成不了，不是"最好留点时间消化消化"这种软建议。

“Slow has value” is easily heard as a nostalgic value judgment; it only holds once nailed back to physiology. The turn of memory from fragile, hippocampus-dependent traces into stable neocortical representation, systems consolidation, is a physical process that needs offline time and depends heavily on sleep: during slow-wave sleep, the day’s memory traces get repeatedly reactivated and replayed, the load-bearing mechanism of that transfer (Stickgold 2006; Diekelmann & Born 2010; Rasch & Born 2013, grade II). This is a hard constraint: without that offline time, the long-term representation simply doesn’t form. It is not the soft advice to “leave some time to digest it.”

间隔效应是同一枚硬币的另一面：隔 24 小时复习胜过隔 15 分钟，是因为拉开的复习给每一次提取都留了一个"快忘掉、再找回来"的窗口，正是这个再巩固的窗口在加固记忆痕迹。AI 能把信息的传输压到瞬间完成，但传输不是巩固——它压不掉那段必须发生在你大脑里、而且大半发生在你睡着时的离线加工。

The spacing effect is the flip side of the same coin: a 24-hour gap beats a 15-minute one because spaced review gives each retrieval a window of almost-forgetting-then-recovering, and that reconsolidation window is what hardens the trace. AI can compress the transmission of information down to instant, but transmission isn’t consolidation: it can’t compress the offline processing that has to happen inside your own brain, mostly while you’re asleep.

这把"慢"从一句修辞升级成了方法论上的公理：脚手架不该跟这条时间常数对着干，该顺着它设计。第 9 节反思库里的"下次复看日期"，就是这条公理的工程化——它不是任务管理，是把间隔效应硬编码进了流程。同样道理，一个把整门课压进一晚速通的学习计划，不管 AI 讲得多顺，都是在跟生理学对着干：它优化了当晚的流畅感，牺牲了三周后的留存。所以这卷的处方多了一句：把时间当盟友编进流程里，别当成延迟优化掉。

This upgrades “slow” from rhetoric to a methodological axiom: the scaffold shouldn’t fight this time constant, it should be designed along it. Section 9’s reflection-log “next review date” is exactly this axiom engineered — not task management, but the spacing effect hard-coded into the process. Same logic: a study plan that crams a whole course into one all-nighter, however smoothly AI explains it, is fighting physiology: it optimizes that night’s fluency and sacrifices retention three weeks out. So this volume’s prescription adds one more clause: build time into the process as an ally, not something to optimize away as latency.
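
What “hard-coding the spacing effect into the process” might look like is sketched below. The Section 9 reflection log is not reproduced here, so everything in this block is a stand-in: the `ReflectionEntry` record, its fields, and especially the rule of doubling the gap after a successful AI-free recall and resetting it after a failure, which is a simple illustrative heuristic and not the volume's schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ReflectionEntry:
    """One reflection-log item carrying its own next review date (hypothetical)."""
    lesson: str
    next_review: date
    interval_days: int = 1  # assumed starting gap

    def review(self, today: date, recalled_without_ai: bool) -> None:
        """Reschedule after a review.

        Illustrative rule only: double the gap on success, reset to one day
        on failure.
        """
        self.interval_days = self.interval_days * 2 if recalled_without_ai else 1
        self.next_review = today + timedelta(days=self.interval_days)


def due(entries: list, today: date) -> list:
    """Entries whose review date has arrived."""
    return [e for e in entries if e.next_review <= today]


entry = ReflectionEntry("why the proof needed induction", next_review=date(2025, 3, 1))
entry.review(today=date(2025, 3, 1), recalled_without_ai=True)
print(entry.next_review)  # 2025-03-03
```

The point of the date field is that the delay is part of the record: the entry cannot be marked done on the night it was written.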

检验信号Test signal

测隔几天的留存与迁移，而非当堂表现；顺着时间常数的学习流当下更费劲、隔期更好，反过来就是用速度偷换留存。Test retention and transfer days later, not in-session; a flow along the time constant feels harder now and performs better later – the reverse trades retention for speed.

LEARN

FAILURE · 失败模式

FAILURE

失败 · 误用方式 + 第二仪器

Failure · Anti-patterns + 2nd instrument

AI-Native 学习最常见的误用方式

How AI-Native Learning Most Often Goes Wrong

这一章把前面各节的边界反过来说一遍，好让你认出自己正滑向哪一种误用。

This chapter restates the earlier boundaries in reverse, so you can recognize which failure mode you’re sliding toward.

一句话In one line

最常见的翻车不是"没用 AI"，是"用了 AI 却用反了方向"；它们共享一个根——便利陷阱，把"AI 做得了"错读成"该让 AI 做"。The most common failure is “used AI, but backwards,” not “didn’t use AI.” They share one root: the convenience trap, misreading “AI can do it” as “AI should do it.”

这卷作者自己最该防的误用：把萎缩写成已证

The one failure this volume’s author has to guard against first: writing atrophy as proven

最后一种误用方式，矛头是往里指的——它针对的是写这一卷的人，以及任何被说服之后想替它辩护的人。一个唱反调的命题天然带着一种诱惑：为了让警示显得更有力，把"认知可能萎缩"悄悄写成"认知已经萎缩"，把一份有据的担忧夸成"AI 已经让人变笨"。这条线一旦越过，整卷的可信度就会垮掉，原因有两层。第一，证据根本撑不住这种判决：萎缩那一侧的证据全是相关、自报、短期、小样本，最强的因果证据反而指向正面[R24]，而且没有一份纵向数据（第 10 节那两份清单就是为了钉死这一点而建的）。把软证据当硬结论用，本身就是学术上的失实。

The last failure mode points inward — at whoever writes this volume, and at anyone persuaded by it who then wants to defend it further. A contrarian claim carries a built-in temptation: to make the warning land harder, quietly rewrite “cognition may atrophy” as “cognition has atrophied,” and inflate an evidence-grounded concern into “AI already makes you dumber.” Cross that line and the whole volume’s credibility collapses, for two reasons. First, the evidence simply can’t carry that verdict: the atrophy-side evidence is all correlational, self-reported, short-term, small-sample, while the strongest causal evidence actually points positive[R24], and there’s no longitudinal data at all; Section 10’s two ledgers exist precisely to pin this down. Treating soft evidence as a hard conclusion is, plainly, academic misrepresentation.

第二层更隐蔽，是修辞上的反噬：话说得太满，读者会连同那份夸张一起，把真正该被认真对待的担忧也一并丢掉——你越是危言耸听，越没人信你那条本来站得住的核心提醒。所以这卷把立场固定死在同一档：有据的警告，加一个写明了什么会让我们改判的赌注，一步都不越过证据。这一条放在失败模式的最后，是因为它是元层级的——前五种是读者可能犯的，这一种是作者必须先在自己身上防住的。一卷讲认知诚实的书，第一个该诚实对待的，就是自己证据到底有多硬。

The second layer is subtler, the rhetorical backfire: overstate it and readers throw out the legitimate concern along with the exaggeration; the louder the alarm, the less anyone believes the core reminder that was actually defensible. So this volume pins its stance to one place: an evidence-grounded warning, plus a bet with its falsification condition spelled out, not one step past the evidence. This item sits last among the failure modes because it’s meta-level; the first five are ones a reader might commit, this one the author has to guard against in themselves first. A volume about cognitive honesty owes its first honesty to how hard its own evidence actually is.

两个最容易被忽视的误用：给指标表演，和用力过猛

Two easily-missed failures: gaming the metrics, and pushing too hard

六种误用里，有两种特别隐蔽，因为它们都伪装成"在认真执行方法论"。第一种是把指标做成分数：把第 8 节的仪表盘变成自我考核的分数或排行榜，然后开始为指标表演——多反驳几次 AI 好抬高 override rate、多记几条凑数的错题。这正好掉进设计卷反复警告的那个反模式：用分数替换了真实被测量的行为，一个指标一旦成了目标，就不再是个好指标（古德哈特定律）[R22]。仪表盘正确的用法是烟雾报警器——响了去查，不是攒积分。自查：我是在用信号校准自己，还是在为信号表演？

Of the six, two are especially sneaky because both disguise themselves as diligently executing the methodology. The first is turning the metrics into a score: making Section 8’s dashboard into a self-grading leaderboard, then performing for the metric by overriding AI a few extra times just to lift the override rate, or logging filler entries to pad the count. This falls straight into the anti-pattern the design volume keeps warning about: a score replaces the real behavior it was supposed to measure, and once a metric becomes the target it stops being a good metric (Goodhart’s law)[R22]. The dashboard’s correct use is as a smoke alarm: go check when it sounds, don’t farm points. Self-check: am I calibrating with the signal, or performing for it?

第二种是不合意的困难：把"抵抗便利"做成自虐，在完全没有基础的领域硬是不用 AI，结果只是低效的挫败。这违反了 Bjork 那条明确的边界——困难只对有基础、能成功撑过去的学习者才"合意"，没有基础，它就只是绊脚石[R8]。抵抗是有分寸的工程，不是道德上的苦行：在你已经有根基、撤除测试还撑得住的地方加阻力；在你还没根基的地方，老实把 AI 当脚手架用，等根基长出来再撤。自查：这点阻力是在让我长出能力，还是只是让我停在原地？

The second is undesirable difficulty: turning “resist convenience” into self-punishment, refusing AI outright in a domain where you have zero foundation, producing nothing but inefficient frustration. This violates Bjork’s explicit boundary: difficulty is only “desirable” for a learner with enough footing to push through successfully; without that footing, it’s just an obstacle[R8]. Resistance is engineering with proportion, not moral asceticism: add friction where you already have grounding and the removal test still holds; where you don’t, honestly use AI as a scaffold and withdraw it once the grounding has grown. Self-check: is this friction growing my capability, or just stalling me?

包装化练习：为什么"学得有趣"会反噬

Chocolate-covered broccoli: why “make it fun” backfires

这个教育游戏化里的老比喻值得拆开讲，因为它正对应 AI 学习产品最常见、也最难自觉察觉的误区。原意是说：把枯燥的练习裹上一层游戏化的糖衣——积分、动画、拟人讲解——就以为学习变好了。失败在于：糖衣没有改变核心活动本身，只是用外部奖励暂时把它盖住；糖衣一旦融化，本该做的练习依然没做[R26]。

This old metaphor from educational gamification is worth unpacking, because it’s exactly the pit AI learning products fall into most often, and notice least. The original idea: coat a dull drill, the broccoli, in a layer of game-y sugar (points, animation, an anthropomorphic narrator) and assume that makes learning good. The failure: the coating never changes the core activity, it just masks it with an external reward for a while; once the sugar dissolves, the broccoli still hasn’t been eaten[R26].

AI 时代的变体更隐蔽：把"让 AI 把知识讲得格外顺滑、格外有共鸣、格外不费力"错当成把学习变好了。但第 2 节已经证明，学习承重的地方是犯错-纠正循环，不是讲解顺不顺滑[R2]。顺滑的讲解作用在"知道"这一层（识别、被动接收），它确实让那一层更舒服——可那一层早就被 AI 充裕化了，舒不舒服已经不再是瓶颈。真正的练习（亲手试错、碰到阻力、自己纠正）不但没有被加强，反而被顺滑的讲解挤掉了：越觉得顺，越没在练那个贵的层。所以这个误用的危险不在"没用"，在于它制造了一种很强烈的"我在学"的错觉，而错觉恰恰是最难自查出来的东西。

The AI-era variant is subtler: mistaking “have AI explain this with extra smoothness, extra resonance, zero effort” for having made learning better. But Section 2 already showed that learning’s load-bearing part is the error-correction loop, not how smoothly something gets explained[R2]. A smooth explanation acts on the “knowing that” layer (recognition, passive reception), and it genuinely makes that layer more pleasant. But AI made that layer abundant long ago; pleasant or not, it’s no longer the bottleneck. The real broccoli (trying and failing by hand, hitting friction, self-correcting) is not strengthened by the smooth explanation but crowded out by it: the smoother it feels, the less you’ve actually practiced the expensive layer. So the danger here is not that it doesn’t work but that it manufactures a strong illusion of learning, and illusions are the hardest thing to catch yourself in.

每一种误用，都是前面某个承重命题的镜像

Each failure is the mirror image of an earlier claim

这一节不是新内容，是把前面各节的承重命题倒过来说一遍——每一种失败模式，都精确对应某一节的边界被越过了。这样排列有个用处：怀疑自己正在误用方法论时，可以顺着失败模式倒查回那一节的处方。

This section adds no new content; it restates the earlier load-bearing claims in reverse. Each failure mode maps precisely onto one section’s boundary being crossed. The arrangement has a use: when you suspect you’re misusing the methodology, you can trace the failure mode back to that section’s prescription.

它们也不是六个各自孤立的坑，是共享一个根：把"AI 做得了"错读成"该让 AI 做"——便利陷阱是这个根的总名，其余五种都是它在不同环节的变体。包装化练习是它落在"核心活动"这一环，指标表演是它落在"监测"这一环，空中楼阁是它落在"元能力"这一环，不合意的困难是抵抗用力过猛的反向失手，把萎缩当已证是这卷的作者自己最该防的立场失手。一一点名，是为了让你认出自己正滑向哪一种。

They’re not six isolated pits either; they share one root: misreading “AI can do it” as “AI should do it.” The convenience trap is the umbrella name for that root, and the other five are its variants at different stages. Chocolate-covered broccoli is that root showing up at the “core activity” stage, gaming the metrics at “monitoring,” and castles in the air at “meta-skill.” Undesirable difficulty is resistance overshooting in the other direction, and treating atrophy as proven is the stance misstep this volume’s own author most has to guard against. Each one is named so you can recognize which you’re sliding toward.

便利陷阱（头号）——把"AI 做得了"误当成"该让 AI 做"，在高可充裕 × 高不可外包那一格悄悄交出判断/品味/深度思考（第 4 节/07 的危险区）。自查：你还记得不用 AI 时怎么做这件事吗？记不清就触线了。
The convenience trap (the prime one) – mistaking “AI can do it” for “AI should do it,” quietly handing over judgment/taste/deep thinking in the high-abundance × high-un-outsourceability cell (the danger zone of Section 4/07). Self-check: do you still remember how to do this without AI? If not, you have crossed the line.
包装化练习的反向使用——以为"让 AI 把知识讲得更有趣/更省力"就是学习。可学习的承重在犯错-纠正循环，不在讲解的顺滑。把核心活动（亲手试错）外包掉、只留包装（被讲解得很顺），等于留下了学习的表层体验，却丢掉了真正的练习。自查：这一小时里我产出了多少，还是只接收了很多？
Chocolate-covered broccoli, eaten upside down – believing that “AI makes the knowledge more fun / more effortless” is learning. But learning’s load-bearing part is the error-correction loop, not the smoothness of explanation. Outsourcing the core activity (trying and erring by hand) and keeping only the wrapper (being explained to pleasantly) is throwing out the broccoli and eating the coating. Self-check: in this hour, how much did I produce versus merely receive?
指标的 pointsification——把第 8 节的仪表盘做成自我考核的分数/排行，于是开始为指标表演（多推翻几次 AI 显得"有主导权"），而非真在监测。指标是烟雾报警器，不是 KPI。自查：我是在用信号校准，还是在刷信号？
Pointsification of the metrics – turning the Section 8 dashboard into a self-grading score/leaderboard, then performing for the metric (overriding AI a few extra times to look “in command”) instead of genuinely monitoring. The indicators are a smoke alarm, not a KPI. Self-check: am I calibrating with the signal, or gaming it?
不合意的困难——把"抵抗便利"做成自虐：在毫无基础的领域强行不用 AI，结果只是低效挫败。Bjork 的边界：困难只对有基础能成功响应者"合意"[R8]。抵抗要有度（第 4 节）。自查：这点阻力让我长出能力，还是只让我停滞？
Undesirable difficulty – turning “resist convenience” into self-flagellation: refusing AI in a domain where you have no foundation, yielding only inefficient frustration. Bjork’s boundary: difficulty is “desirable” only for those with enough background to respond successfully[R8]. Resistance must be measured (Section 4). Self-check: does this friction grow capability, or just jam me?
空中楼阁的元能力——以为"提问/质疑/整合"能脱离具体"能做"独立训练。你质疑不了一个你毫无根基领域里的 AI 输出（第 3 节）。自查：我的质疑命中率在升还是在降？降，多半是根基空了。
Castle-in-the-air meta-skills – believing “asking/challenging/integrating” can be trained in isolation from concrete “knowing how.” You cannot challenge an AI output in a domain where you have no grounding (Section 3). Self-check: is my challenge hit-rate rising or falling? Falling usually means the grounding has emptied out.
把萎缩当已证（立场失手）——这是本卷自己最该防的误用方式：为了警示力度，把"可能萎缩"写成"认知能力已经下降"。证据不支持（第 10 节），且会反噬可信度。自查：我说的是"有据的警告 + 可证伪赌注"，还是越过证据下了判决？
Treating atrophy as proven (a stance misstep) – the failure mode this volume itself must most guard against: for warning force, writing “may atrophy” as “cognitive capacity has already declined.” The evidence does not support it (Section 10) and it backfires on credibility. Self-check: am I stating “an evidence-grounded warning plus a falsifiable bet,” or have I overshot the evidence into a verdict?

INSTRUMENT 12 · 该不该让 AI 做 · 单项判定 THE DON'T-OUTSOURCE TEST

第 7 节的认知体检是坐标式全局扫描；这一件是单项快速判定——对一项你正在考虑外包的能力，逐条勾选，得一个"外包安全分"与一句判词。问题全部对应前面各节的承重：勾得越多 → 越该留在止步线内。

The Section 7 cognitive audit is a coordinate-style global scan; this one is a quick single-item test – for one capacity you are considering outsourcing, tick each line and get an “outsourcing-safety score” plus a one-line verdict. Every question maps to an earlier section’s load-bearing claim: the more you tick, the more it belongs inside the stop-line.

它是我判断力 / 价值感 / 直觉 / 品味的一部分吗？（第 6 节）Is it part of my judgment / value sense / intuition / taste? (Section 6) 三个月不用 AI，我还做得来这件事吗？答"不确定"也算勾。（第 8 节）After three months without AI, could I still do this? “Not sure” counts as a tick. (Section 8) 它是我向 AI 提出好问题 / 质疑得准的前提吗？（第 3 节）Is it a precondition for me to ask AI good questions / challenge it accurately? (Section 3) 我的工作产出，依赖我亲手具备它吗？（第 6 节接缝）Does my work output depend on my possessing it by hand? (Section 6 seam) 它的成长靠犯错-纠正循环，而非一次性答案吗？（第 2 节）Does it grow through the error-correction loop rather than a one-shot answer? (Section 2)
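
The tick-and-count logic of this test is simple enough to sketch. The source specifies neither the scoring formula nor the verdict bands, so both are assumptions here: the score is taken as the number of unticked items, the cut-offs at 2 and 4 ticks are invented for illustration, and the question identifiers are hypothetical labels for the five checklist lines above.

```python
# Hypothetical identifiers for the five checklist lines, in order.
QUESTION_IDS = (
    "judgment_or_taste",          # Section 6
    "three_months_without_ai",    # Section 8 ("not sure" counts as a tick)
    "precondition_for_asking",    # Section 3
    "output_depends_on_it",       # Section 6 seam
    "grows_by_error_correction",  # Section 2
)


def dont_outsource_test(ticked: set) -> tuple:
    """Return (outsourcing-safety score, verdict) for one capacity.

    Assumptions: score = number of unticked items; the verdict thresholds
    below are illustrative and not taken from the instrument.
    """
    unknown = ticked - set(QUESTION_IDS)
    if unknown:
        raise ValueError(f"unknown question ids: {sorted(unknown)}")
    ticks = len(ticked)
    safety = len(QUESTION_IDS) - ticks
    if ticks >= 4:
        verdict = "keep inside the stop-line"
    elif ticks >= 2:
        verdict = "hand off only with a removal test scheduled"
    else:
        verdict = "reasonably safe to hand off"
    return safety, verdict


print(dont_outsource_test({"judgment_or_taste", "grows_by_error_correction"}))
# (3, 'hand off only with a removal test scheduled')
```

Whatever the thresholds, the direction is the one the instrument states: more ticks means the capacity belongs further inside the stop-line.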

LEARN

SPECULATION · 未来推演

SPECULATION

推论 · 外推，非事实

Inference · Extrapolation, not fact

往后推演：认知主权的可能性空间

The Projection: the Possibility Space of Cognitive Sovereignty

一个五年内不会有答案的问题，怎么严肃对待而不装懂？这一章给一种做法。

How do you take seriously a question that won’t be answered within five years, without faking expertise? This chapter shows one way.

一句话In one line

头号悬案五年内不会有定论，所以我们不画一条预言曲线，而是张开一个可能性空间：两股力量交叉出四个 2030 世界，每个都配着先行指标和证伪条件。The open question won’t settle in five years, so instead of drawing a prophecy curve, we open up a possibility space: two forces cross into four 2030 worlds, each carrying its own leading indicators and falsification conditions.

这一章的性质 · 推论接下来是基于 2024–2026 公开轨迹的外推，不是事实陈述。它压在第 4、10 节那批 Ⅲ 级以下的证据上——相关、自报、短期、零纵向——所以全章只下赌注，不下判决。一旦首批多年期纵向数据出现并指向某个方向，这一章该是全站最先被改写的地方。

What this chapter is · Inference What follows extrapolates from the public trajectory of 2024–2026; it isn’t a statement of fact. It rests on the grade-III-and-below evidence from Sections 4 and 10: correlational, self-reported, short-term, zero longitudinal. So the whole chapter places bets, never verdicts. The moment the first multi-year longitudinal data shows up and points somewhere, this chapter should be the first thing rewritten.

DEEP TIME · 为什么现在正是这扇窗Why this is the window

把时间尺度拉远看：文字（约公元前 3200 年）、印刷术（1440）、搜索引擎（1998），每一次都把"知道"的获取成本砍掉一个数量级，也每一次都招来同一种焦虑——苏格拉底在《斐德罗篇》里就担心过文字会让记忆萎缩[R16]。他也不算全错：死记硬背式的记忆确实让位了。但每一次，被释放出来的认知都上移到了更高阶的工作。AI 是这条曲线上最陡的一次下跌，可它多了一个前所未有的特征：它外包的不只是存储（像文字那样），而是推理和生成本身。苏格拉底的赌注此前每一次都输了——萎缩没有压过再分配。这一卷全部的张力就在于：这一次被外包的东西离"思考"本身更近了，所以那几次历史归纳能给我们多少外推的信心，得先打个折。

Zoom the timescale way out: writing (c. 3200 BCE), the printing press (1440), the search engine (1998). Each cut the cost of acquiring “knowing that” by an order of magnitude, and each drew the same anxiety. Socrates in the Phaedrus worried writing would atrophy memory[R16]. He wasn’t entirely wrong: rote memory did give ground. But each time, the freed-up cognition moved up to higher-order work. AI is the steepest drop yet on that curve, but it adds something none of the earlier ones had: it outsources not just storage, the way writing did, but reasoning and generation themselves. Socrates’s bet lost every earlier time; atrophy never won out over redistribution. The tension running through this whole volume is that this time, what’s being outsourced sits closer to thinking itself, so however much confidence those earlier cases buy us, it has to come with a discount.

正在汇流的三股力量，各自带着一份证伪信号

Three converging forces, each carrying its own falsification signal

推演不等于畅想。三股可观测的力量正同时把学习推向同一个十字路口——它们不是预测，是已经在动的曲线，各自都带着"什么观测会让它失效"这句话。把证伪条件写在每一条旁边，就是为了让这张推演图能被未来推翻，而不是写成一篇永远正确、因而没有任何信息量的散文。

Speculation isn’t daydreaming. Three observable forces are pushing learning toward the same crossroads at once: not forecasts, curves already in motion, each carrying its own answer to “what observation would kill this.” Writing the falsification condition beside each one is what lets this map get overturned by the future, instead of being an essay that’s always right and therefore carries no information at all.

力量一 · 摩擦归零的工具默认

Force 1 · The friction-to-zero tool default

未来数年: 每一代学习工具（IDE 补全、答题、写作、辅导 agent）的默认进化方向都是更顺、更省力、更自动。到 2030，"先自己想一遍"在工具流里会需要刻意绕路才做得到。
Coming years: Every generation of learning tool (IDE completion, answer-bots, writing, tutoring agents) defaults to smoother, easier, more automatic. By 2030, “think it through yourself first” will require a deliberate detour inside the toolflow.

已在动: 2024–2026 一手可观测，无需推演。
In motion: First-hand observable in 2024–2026, no speculation needed.

证伪: 若主流工具开始把"合意困难"作为默认卖点（如默认开启延迟提示、闭卷模式），摩擦归零的单向趋势即被推翻。
Falsified if: mainstream tools start shipping “desirable difficulty” as a default selling point (delay-prompts, closed-book mode on by default). That would overturn the one-way slide to zero friction.

力量二 · 纵向数据终于到场

Force 2 · The longitudinal data finally arrives

未来数年: 今天关于卸载的证据全是横断/短期（第 10 节）。第一批"重度 AI 协作者三到五年后在无 AI 迁移任务上的表现"的纵向研究，大概率在 2028–2030 出结果，它会把头号悬案从赌注推向某个方向。
Coming years: Today’s offloading evidence is all cross-sectional or short-term (Section 10). The first longitudinal studies of “how heavy AI-collaborators perform on AI-absent transfer tasks after three to five years” will most likely report in 2028–2030, moving the open question from a bet toward a direction.

推论态: 研究在跑，结论未出；本曲线推演权重中等。
Inferred: Studies are running, conclusions are not in; medium speculative weight.

证伪: 若到 2032 仍无任何方法学过关的多年期纵向研究发表，则"数据将裁决"这条本身落空，本卷只能继续下赌注。
Falsified if: by 2032 no methodologically sound multi-year longitudinal study has been published. In that case “the data will adjudicate” itself fails, and this volume can only keep betting.

力量三 · 教育机构开始定价"无 AI 能力"

Force 3 · Institutions start pricing “AI-absent capability”

未来数年: 当任何人借 AI 都能产出合格作业，"在 AI 缺席下还能做"的能力变成稀缺的可定价信号。预期 2027 起回潮的闭卷/口试/现场演示评估，本质是机构在给"撤除后仍能独立运转"的能力重新标价。
Coming years: Once anyone with AI can produce passable work, “can still do it with AI absent” becomes a scarce, priceable signal. The closed-book / oral / live-demo assessments expected to resurge from 2027 are, at root, institutions re-pricing the “still operates independently after removal” capacity.

早期信号: 2024–2025 已有大学回退手写考试，零星但方向一致。
Early signal: In 2024–2025 some universities already reverted to handwritten exams; sparse but directionally consistent.

证伪: 若主流认证转向"人机协作产出"为唯一评估口径、且不再单独考核无 AI 能力，则这条力量反向，"撤除测试"失去制度支撑。
Falsified if: mainstream credentialing shifts to “human-AI joint output” as the sole assessment and stops testing AI-absent capability separately. This force then reverses and the “removal test” loses institutional support.

两条不确定性轴，张开四个 2030 世界

Two axes of uncertainty, opening four 2030 worlds

三股力量划出了边界，但 2030 年究竟落在哪个世界，取决于两条影响很大、却至今高度不确定的轴。横轴是头号悬案怎么裁决：卸载最终被证实更接近萎缩，还是更接近再分配（第 4 节那两个对立假说，目前没人有纵向数据能回答）。纵轴是个体和机构对便利怎么回应：是主动设计摩擦、守住撤除能力的主权学习者占上风，还是顺着零摩擦默认往下滑的依赖学习者占上风。两条轴交叉出四个世界，每个都标了正在滑向它的先行指标，也标了能推翻它的观测。这是 GBN 的双轴情景法[R20]——不预测哪个会发生，只把整个可能性空间画全，好让你认出自己正滑向哪一格。

The three forces mark the boundaries, but which world 2030 lands in turns on two axes that are high-impact and still highly uncertain. The horizontal is how the open question gets adjudicated: does offloading turn out closer to atrophy, or closer to redistribution (Section 4’s two rival hypotheses — nobody has longitudinal data to answer this yet). The vertical is how individuals and institutions respond to convenience: do sovereign learners who design in friction and guard their removal capacity prevail, or do dependent learners who slide down the zero-friction default prevail. The two axes cross into four worlds, each tagged with the leading indicators sliding toward it and the observation that would falsify it. This is the GBN two-axis scenario method[R20]; it doesn’t predict which one happens, it just maps the whole possibility space so you can recognize which cell you’re sliding into.

FIG. L.7 / 2030 情景四象限 · THE 2030 SCENARIO QUADRANT
看懂：裁决方向 × 学习者姿态 = 四个未来
Read: adjudication × learner posture = four futures

四个未来怎么读：横轴是头号悬案最终怎么裁决（左＝萎缩被坐实，右＝只是再分配），纵轴是学习者整体的姿态（上＝主动抵抗的主权学习者占上风，下＝顺默认下滑的依赖学习者占上风）。四象限里，无声的空心化（左下）最坏：能力真在退、人却因元认知错觉浑然不觉；升级的常态（右上）最好。注意一个不对称——本卷主张的"抵抗便利"把你推向上半区，而上半区在两种裁决下都不亏（萎缩世界里它救命，再分配世界里它正好是把资源往高阶引导的动作）。这正是第 4 节那个"保险"论证的图形版：在裁决未出的窗口期，往上半区站是无悔的下注。两条主轴都画成虚线——它们是这卷的推演坐标，不是测得的轴。The four futures: the x-axis is how the open question is finally adjudicated (left = atrophy confirmed, right = mere redistribution); the y-axis is the learner population’s posture (up = sovereign learners who actively resist prevail, down = dependent learners who slide down the default prevail). Among the four, the Quiet Hollowing (bottom-left) is worst: capability really decays yet people, via the metacognitive illusion, do not notice; the Upgraded Default (top-right) is best. Note an asymmetry – the “resist convenience” this volume prescribes pushes you into the upper half, and the upper half loses nothing under either verdict (in the atrophy world it saves you; in the redistribution world it is exactly the move that steers resources toward higher-order work). This is the graphical form of Section 4’s “insurance” argument: during the window before the verdict, standing in the upper half is a no-regret bet. Both main axes are drawn dashed: they are this volume’s speculative coordinates, not measured ones.
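The asymmetry described for FIG. L.7 is a weak-dominance claim, and that can be checked mechanically. What follows is a minimal sketch, assuming purely illustrative ordinal payoffs: the numbers and the names `PAYOFF` and `weakly_dominates` are not from this volume. Only the ordering they encode follows the caption (Quiet Hollowing worst, Upgraded Default best).

```python
# Illustrative ordinal payoffs (higher = better). ASSUMPTION: the numbers are
# invented for this sketch; only their ordering follows the figure caption.
PAYOFF = {
    ("sovereign", "atrophy"): 1,         # upper-left: resistance "saves you"
    ("sovereign", "redistribution"): 3,  # upper-right: the Upgraded Default (best)
    ("dependent", "atrophy"): 0,         # lower-left: the Quiet Hollowing (worst)
    ("dependent", "redistribution"): 2,  # lower-right
}
VERDICTS = ("atrophy", "redistribution")


def weakly_dominates(a: str, b: str) -> bool:
    """True if posture a is at least as good as b under every verdict
    and strictly better under at least one."""
    diffs = [PAYOFF[(a, v)] - PAYOFF[(b, v)] for v in VERDICTS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


print(weakly_dominates("sovereign", "dependent"))  # prints True
```

The no-regret reading holds only while the upper row is never worse than the lower row. If the counter-bet in this section is right, the upper-right payoff falls below the lower-right one (resistance becomes wasted effort) and the dominance disappears.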

一件来自 2031 的虚构文物，让推演摸得着

A fictional 2031 artifact, to make the speculation tangible

全是断言的推演读起来很虚。下面这件是设计虚构——一份明确编造的未来文物，把"机构开始给无 AI 能力定价"这股力量，投射到 2031 年一页具体的东西上。它不是预测，是把这卷的赌注做成一件你能拿在手里看的东西。

Speculation made of nothing but assertions reads thin. What follows is design fiction: an explicitly made-up future artifact projecting the “institutions start pricing AI-absent capability” force onto one concrete page from 2031. It’s not a prediction. It’s this volume’s bet turned into something you can hold.

SPECULATIVE · 虚构 · Fiction

ARTIFACT · 大学课程大纲节选 · 2031 秋 · Course syllabus excerpt · Fall 2031

CS 247：算法设计（无 AI 内核段）

CS 247: Algorithm Design (the AI-absent core block)

课程结构: 本课分两段。协作段（70%）：鼓励全程使用 AI——这是你毕业后真实的工作方式。无 AI 内核段（30%）：在监考、断网、纸笔条件下完成；这一段不是怀旧，是给你的"撤除后仍能独立运转"的能力建立可被雇主与认证机构信任的记录。
Course structure: This course splits in two. Collaboration block (70%): AI use is encouraged throughout – this is how you will actually work after graduation. AI-absent core block (30%): completed proctored, offline, with pen and paper; this block is not nostalgia but builds a record of your “still operates independently after removal” capacity that employers and credentialers can trust.
合意困难，明码标价: 内核段刻意保留三类难度：闭卷推导、延迟提示（卡住 15 分钟才解锁参考）、交错复习。课程评估中明确告知：这些摩擦会让你当下更难受，但它们正是这门课不可被 AI 代劳的部分——也是你学位里唯一无法被外包者复制的信号。
Desirable difficulty, openly priced: The core block deliberately keeps three kinds of difficulty: closed-book derivation, delayed hints (references unlock only after 15 minutes stuck), and interleaved review. The rubric states plainly: this friction will make the present harder, but it is exactly the part of this course that cannot be done by AI – and the only signal in your degree that an outsourcer cannot copy.
证书标注: 成绩单分列两个分数：协作能力与无 AI 能力。用人方可分别查询。（2031 起本州 12 所大学采用同一双分制。）
Transcript notation: The transcript lists two scores: collaboration capability and AI-absent capability, separately queryable by employers. (From 2031, 12 universities in this state adopted the same dual-score scheme.)

「我们不再假装 AI 不存在，也不再假装它无关紧要。我们把课程拆成两半，分别给两种能力定价——因为 2031 年的就业市场已经这样定价了。」——课程说明页

“We no longer pretend AI does not exist, nor that it does not matter. We split the course in two and price the two capacities separately – because the 2031 job market already prices them that way.” – course description page

反赌注：最该被拿来反对这卷的那个论证

The counter-bet: the strongest argument against this volume

一份诚实的推演，必须写下反对自己的最强赌注，不然它就只是在自我确认。这卷的核心命题是："长期把深度思考外包出去，可能侵蚀深度思考本身，所以该把抵抗当保险买下来。"最有力的反驳不是"萎缩不会发生"，是一个更釜底抽薪的说法——这卷可能把"被外包的能力"和"值得保留的能力"错当成了同一件事。延伸心智（Clark & Chalmers 1998）的强版本主张：认知从来就是人和工具耦合出来的系统，没有哪一代人拥有过"纯粹的、没被外包过"的思考；我们今天珍视的"独立推理"，本身就是被书写、印刷、计算器、搜索一路塑造出来的产物。这要是成立，这卷守护的"撤除后仍能独立运转"可能就是个假目标——就像非要一个现代数学家在没有符号记法的条件下证明定理，那算不上更纯粹的能力，只是更低效的折磨。

An honest speculation has to write down the strongest bet against itself, or it’s just confirming itself. This volume’s core claim is: outsourcing deep thinking for the long haul may erode deep thinking itself, so buy resistance as insurance. The strongest rebuttal is something more foundational than “atrophy won’t happen”: this volume may be conflating “the capacity that gets outsourced” with “the capacity worth keeping.” The strong version of the extended mind (Clark & Chalmers 1998) holds that cognition has always been a person-tool coupled system; no generation ever had “pure,” never-outsourced thinking; the “independent reasoning” we prize today was itself shaped by writing, print, calculators, search. If that’s right, then what this volume guards, “still operates independently after removal,” might be a false target, like demanding a modern mathematician prove a theorem with no symbolic notation. That wouldn’t be a purer capacity. It would just be a less efficient kind of torment.

更尖锐的版本，来自再分配假说的乐观读法：如果 AI 系统性地把人的认知从可外包的任务解放出来、推向判断、品味、提问，那花力气"抵抗便利、保留低阶手工能力"，反而是一种逆着历史潮流的内卷：把本该往上流动的认知资源，浪费在了机器早就做得更好的那一层上。这卷不认为这个反赌注已经赢了，但它有可能赢，而且赢的条件是清楚写好的，见下面的证伪。把这个反赌注写在这里，是因为一本讲认知诚实的书，必须对自己的核心命题也用上同一套诚实。

A sharper version comes from the optimistic reading of the redistribution hypothesis. If AI systematically frees human cognition from offloadable tasks toward judgment, taste, and asking, then spending effort to “resist convenience and preserve low-order manual capacity” is itself busywork running against the historical current: wasting cognitive resources that should be moving up on a layer machines already handle better. This volume doesn’t think that counter-bet has already won. But it could win, and the conditions under which it would are stated plainly below. It’s written here because a book about cognitive honesty has to apply that same honesty to its own core claim.

反赌注的证伪 / 兑现条件 · When the counter-bet wins or loses

反赌注赢：纵向随机研究显示重度 AI 协作者撤掉 AI 后不退步、甚至进步；反赌注输：无 AI 能力随依赖时长系统性下降。今天无人有数据分辨这两个世界。
The counter-bet wins if a longitudinal randomized study shows heavy AI-collaborators don’t regress with AI removed, or even improve; it loses if AI-absent capability falls systematically with dependence duration. No one has the data yet to tell the two worlds apart.

LEARN

LANDING · 落地

落地 · 收束与起步

Landing · Close & start

抵抗便利的学习者操作系统

The Learner’s Operating System for Resisting Convenience

这一章把全卷收成原则加信号，再加一步起步动作，最后用"不变 / 在变 / 未决"三分收尾。

This chapter collapses the volume into principles plus signals plus one starting step, then closes with an invariant / shifting / open split.

一句话 · In one line

全卷收成一个词：认知主权，也就是大量用 AI 的同时，仍然保有自己判断、识别真问题、裁决好坏的能力。守住它，"人回归于意义"才有个前提。
The whole volume comes down to one term, cognitive sovereignty: using AI heavily while still holding onto the capacity to judge, spot real problems, and tell good from bad. Guard it, and “people return to meaning” finally has something to stand on.

收尾的原理，配上你能自己读的信号

Closing principles, paired with signals you can read yourself

一套操作系统光给原则不够，还得给反馈回路，不然你没法知道原则到底有没有起作用。把全卷的检验信号收成一组（完整六条见下方表格），每一条都设计成个人尺度可读、且偏向滞后、无 AI 在场，好避开即时流畅带来的元认知陷阱。它们是仪表盘，不是 KPI——一起读才有意义，单看任何一条都会被当下的感受带偏（第 8 节说过这个悖论）。一个状态健康的学习者，会看到"升=好"的那几条在缓慢往上爬、"降=好"的那两条在缓慢回落，而不是任何一条短期突然飙高。

An operating system that only gives principles isn’t enough; it needs feedback loops too, or you have no way to know whether the principles are actually working. Gather the volume’s test signals into one set (all six in the table below), each designed to be readable at your own scale, and biased toward lagged, AI-absent measurement, so as to dodge the metacognitive trap of immediate fluency. They’re a dashboard, not a KPI: they only mean something read together, and any single one alone gets skewed by how things feel right now (Section 8’s paradox). A learner in good shape sees the “rising is good” lines climb slowly and the “falling is good” lines recede slowly, not any one line spike overnight.

认知主权：这一卷到底在守什么

Cognitive sovereignty: what this volume is actually guarding

把全卷收成一个词，就是认知主权——一个人在和 AI 协作的时候，依然保有"自己做判断、自己识别真问题、自己裁决好坏"这份能力和这个位置。这个词把这卷的反调，从一种情绪（担心 AI 让人变懒）提升成了一个有结构的目标。主权不是排斥工具：一个有主权的学习者会大量用 AI，但他始终是发号施令、也能验证命令是否被正确执行的那一方——他外包的是执行，不是判断。主权也不是一劳永逸的东西：它得被持续守护，因为侵蚀它的那股力量（便利的默认引力）每天都在场。

Collapse the whole volume into one word and it’s cognitive sovereignty: a person retaining, while collaborating with AI, the capacity and the standing to make their own judgments, spot real problems themselves, and decide good from bad themselves. This word lifts the volume’s dissent out of a feeling (“worried AI is making people lazy”) into a goal with actual structure. Sovereignty isn’t rejecting tools: a sovereign learner uses AI heavily, but is always the one giving the orders and able to check whether they were carried out right, outsourcing execution, not judgment. Nor is sovereignty won once and kept forever. It has to be guarded continuously, because the force eroding it, convenience’s default pull, is there every single day.

前面十六节都是守护它的具体工程：机理告诉你它为什么会被侵蚀（那道成本剪刀差），断裂点告诉你最危险的侵蚀发生在哪，脚手架和流向规则告诉你怎么用工程、而不是意志去守住它，仪表盘告诉你怎么在侵蚀发生之前就看见它，止步线告诉你哪条边界绝不能退。这一卷在整个系列里之所以省不掉，是因为其余几卷描绘的那个"人回归意义、人做判断和品味"的未来，都预设了人还拥有一个没有萎缩的认知主体——守住那个主体，就是学习方法论存在的全部理由。当执行变得充裕，最稀缺、最值得被刻意守护的，恰恰是那个还能判断"这一切到底是为了什么"的人。

The preceding sixteen sections are the concrete engineering of guarding it: the mechanism tells you why it erodes (the cost scissors), the fracture point tells you where the most dangerous erosion happens, the scaffold and flow rules tell you how to guard it with engineering instead of willpower, the dashboard tells you how to see erosion before it happens, the stop-line tells you which boundary must never give way. This volume can’t be cut from the series because the future the other volumes describe (people returning to meaning, people doing judgment and taste) presupposes a person who still has an un-atrophied cognitive core. Guarding that core is the entire reason a learning methodology exists at all. Once execution becomes abundant, the scarcest thing, the thing most worth deliberately protecting, is the person still capable of judging what all of this is even for.

起步只有一步：先做一次撤除演练

The starting path is one step: run a removal drill

收尾最怕给一长串待办，让人无从下手。这一卷的起步路径刻意压成一步，而且这一步本身就是整套方法论的缩影：挑一项你已经高度依赖 AI 的认知任务，做一次撤除演练——合上 AI，在它不在场的情况下从头做一遍，记下你卡在哪里。这一步同时干了三件事：它跑了一次 INSTRUMENT 11/12 的判定（这项能力到底落在哪个象限、该不该留在止步线内）；它产出了你自己那份 N=1 纵向研究的第一个数据点（第 8 节的迁移信号，从这里开始有基线）；它生成了错题反思库的第一条记录（第 9 节，记的不是答案，是"撤掉 AI 后我卡在哪、为什么"）。一步落地，三层脚手架同时起步。卡得越狠的地方，越可能是被悄悄外包、其实本不该外包的能力——那里正是你该重建犯错-纠正循环和合意困难的第一块工地。

A closing chapter should never hand over a long to-do list that leaves you nowhere to start. This volume compresses the starting path to one step, and that one step is itself a miniature of the whole methodology: pick one cognitive task you already lean on AI heavily for, and run a removal drill: close AI, do it from scratch with it absent, note exactly where you get stuck. This single step does three things at once. It runs an INSTRUMENT 11/12 verdict, which quadrant this capacity actually sits in, whether it belongs inside the stop-line. It produces the first data point of your own N=1 longitudinal study, the Section 8 transfer signal, giving you a baseline. And it writes the first entry in your error-reflection log (Section 9), not the answer, but where you got stuck with AI gone and why. One step lands, and three scaffold layers start moving at once. Wherever you get stuck worst is most likely a capacity that’s been quietly outsourced when it shouldn’t have been — that’s the first construction site for rebuilding its error-correction loop and its desirable difficulty.
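The first entry of that error-reflection log can be sketched as a small record. This is a minimal sketch under stated assumptions: the class names, the field names, and `minutes_stuck` as a severity measure are all hypothetical, and the example data is invented. The source only specifies that an entry records where you got stuck with AI gone and why, not the answer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class StuckPoint:
    where: str          # the step where you stalled with AI absent
    why: str            # your own diagnosis, written before reopening AI
    minutes_stuck: int  # hypothetical severity measure


@dataclass
class RemovalDrillEntry:
    task: str
    drill_date: date
    stuck_points: list[StuckPoint] = field(default_factory=list)

    def rebuild_first(self) -> Optional[StuckPoint]:
        """Return the worst stuck point, the first candidate for rebuilding."""
        if not self.stuck_points:
            return None
        return max(self.stuck_points, key=lambda p: p.minutes_stuck)


# Invented example data, for illustration only.
entry = RemovalDrillEntry(
    task="write a SQL window query without AI",
    drill_date=date(2026, 1, 10),
    stuck_points=[
        StuckPoint("PARTITION BY clause", "never typed it unaided", 12),
        StuckPoint("choosing the frame", "did not know the default frame", 25),
    ],
)
print(entry.rebuild_first().where)  # prints "choosing the frame"
```

Keeping the dated entry is what turns a one-off drill into the baseline of an N=1 series: repeating the same drill later gives a second point to compare against.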

为什么收尾不给一个静态答案

Why the close refuses a static answer

大多数方法论收尾时会给一份确定的清单——照做就好。这一卷做不到，而这个"做不到"本身就是它最诚实的立场。原因就在第 4 节那桩头号悬案：认知会不会因为长期外包而萎缩，证据还没有定案。一份假装确定的收尾清单，恰恰会犯这卷自己点名的最后一种误用——把一桩未决的悬案抬成了定论。

Most methodologies close with a definite checklist: just follow it. This volume can’t, and that inability is itself its most honest stance. The reason sits in Section 4’s open question: whether cognition atrophies under long-term outsourcing hasn’t been settled by the evidence. A closing checklist that fakes certainty would commit exactly the last failure mode this volume names in itself: elevating an open question into a settled one.

所以这一卷用一个会动的三分法收尾，而不是一个静态答案——把全部主张按证据状态分进三个会随时间移动的格子。不变那一格（合意困难、测试效应、睡眠巩固）是几十年可复现、不依赖任何 AI 研究的硬地基，模型再强也动不了它，这是你现在就能照做、也不会过时的部分。在变那一格（"知道"的获取成本）还在持续往下掉，学习的目标也跟着往上游走，你要做的是跟着它一起调整，而不是锚死在某个旧目标上。前沿那一格（萎缩到底会不会真发生）是一个未来数据随时可能改写的赌注，对它正确的姿态是挂着先行指标持续监测，而不是现在就站队——这一格完整的推演见第 16 节。

So this volume closes with a moving three-part split, not a static answer, sorting every claim by evidence state into three cells that shift over time. The invariant cell (desirable difficulty, the testing effect, sleep consolidation) is decades-replicable, independent of any AI research, a hard floor no stronger model will move; this is the part you can act on today and it won’t date. The shifting cell, the cost of acquiring “knowing that,” keeps falling, and the learning goal keeps moving upstream with it; your job is to adjust along with it, not anchor to some old target. The frontier cell (does atrophy really happen) is a bet future data could overturn at any time; the right posture toward it is hanging leading indicators and monitoring continuously, not taking a side now. The full speculation on this cell is Section 16.

这个收尾结构本身就是这套方法论的一次示范：面对一个证据还没定案的领域，负责任的做法不是假装有答案，是把已知的、在变的、未决的三样分清楚，再对每一格用配得上它证据状态的力度去行动。它在个人认知这一层能做到这个精度，也就此止步——把止步线画给一个人，把机构、教育、政策的尺度明确交还回去。但完全从第一性原理设计学习，走得比"给一个人配一条止步线"更远：它要连"课程""学分""按学期计时"这些容器一起重新画。我们的方向感是：内化状态本身会变成可查的凭证，编排单位从"学期"挪成"能力"，脚手架、反思库、止步线不再是个人自备的插件，而是机构默认自带的骨架。最强的反方一样锋利（见第 16 节）：也许从来就没有一版"未被外部形塑过"的内化可以拿来当基准，重新设计容器只是换一套习惯，不是逼近真相。能分辨这两边的观测是同一件事——等纵向数据先把萎缩还是再分配裁清楚，再去看按能力而非学期编排的机构里，学习者的判断力曲线往哪弯。这一步已经出了个人尺度，得靠组织卷接住。

This closing structure is itself a demonstration of the methodology: facing a domain where the evidence hasn’t settled, the responsible move is to separate the known, the shifting, and the open, and act on each with force matched to its evidence state, not to fake an answer. At the individual layer of cognition it holds to that precision, and stops there, drawing the stop-line for one person, handing the scale of institutions, education, and policy explicitly back. But designing learning from first principles all the way through goes further than handing one person a stop-line: it means redrawing the containers themselves, the course, the credit, the semester clock. Our best guess is that internalization itself becomes the checkable credential, the unit of scheduling moves from term to capability, and scaffold, log, and stop-line stop being something a person bolts on and become the default skeleton an institution ships with. The strongest counter cuts just as deep (see Section 16): maybe there never was a version of internalization that wasn’t already shaped by some outside medium, and redesigning the container is just trading one habit for another, not closing in on the truth. What would tell the two apart is one and the same observation: once the longitudinal data settles atrophy versus redistribution, watch which way the judgment curve bends for learners inside institutions actually scheduled by capability instead of terms. That question is already past the individual scale; it needs the org volume to carry it.

原理一 · Principle 1

先想后问 · Think before you ask

先自己产出假设，再让 AI 校验/补全，不空手求助，以保住提问/质疑的元能力（第 3 节）。
Produce your own hypothesis first, then let AI verify or complete it; never ask empty-handed. This preserves the asking and challenging meta-skills (Section 3).

原理二 · Principle 2

保留犯错-纠正循环 · Keep the error-correction loop

错了先自纠再看 AI；承重的是循环结构，不是答案（第 2 节）。
When wrong, self-correct before consulting AI; what bears weight is the loop, not the answer (Section 2).

原理三 · Principle 3

建认知脚手架 · Build the cognitive scaffold

反思库 / 知识库人机同源、可 diff，内建合意困难（第 5 节）。
A same-source, diffable reflection/knowledge base with desirable difficulty built in (Section 5).

第四条原理单独点出，因为它是全卷的题眼：划 AI 止步线——明确哪些能力刻意不外包（第 6 节）。配套信号：提问质量↑ / 质疑 AI 的命中率↑ / 迁移测试通过率↑ / 反思库回流使用率↑ / "答案召回"在学习时间中的占比↓ / 主动设阻力的习惯化。起步路径只有一步：先做一次"外包 vs 内化"认知体检（下方 INSTRUMENT），标出已被悄悄外包、却本不该外包的能力，对其重建犯错-纠正循环与合意困难。

The fourth principle is called out on its own because it is the volume’s keystone: draw the AI stop-line – make explicit which capacities you deliberately do not outsource (Section 6). Companion signals: question quality up / hit-rate of challenging AI up / transfer-test pass rate up / reflection-log return-use up / share of “answer recall” in learning time down / habituation of adding friction on purpose. The starting path is one step: run an “offload vs internalize” cognitive audit (the INSTRUMENT below), mark the capacities quietly outsourced but that should not have been, and rebuild their error-correction loop and desirable difficulty.
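The companion signals can be read as a dashboard in a few lines. This is a minimal sketch, assuming each signal is a series of lagged, AI-absent readings you record yourself. The identifier names are hypothetical, and the "rising is good" direction for the sixth signal is an assumption, since the list above gives it no arrow.

```python
# Healthy direction per signal: +1 = rising is good, -1 = falling is good.
SIGNALS = {
    "question_quality": +1,
    "challenge_hit_rate": +1,
    "transfer_test_pass_rate": +1,
    "reflection_log_return_use": +1,
    "answer_recall_share": -1,
    "deliberate_friction_habit": +1,  # ASSUMPTION: no arrow given in the source
}


def read_dashboard(history: dict[str, list[float]]) -> dict[str, bool]:
    """Per signal: True if the net change over the window is in the healthy direction."""
    report = {}
    for name, direction in SIGNALS.items():
        series = history[name]
        report[name] = (series[-1] - series[0]) * direction > 0
    return report


def healthy(history: dict[str, list[float]]) -> bool:
    """Signals are read together: one stalled signal fails the whole reading."""
    return all(read_dashboard(history).values())


# Invented readings over three checkpoints, for illustration only.
history = {
    "question_quality": [2.0, 2.5, 3.0],
    "challenge_hit_rate": [0.20, 0.25, 0.30],
    "transfer_test_pass_rate": [0.40, 0.45, 0.50],
    "reflection_log_return_use": [1.0, 2.0, 3.0],
    "answer_recall_share": [0.60, 0.50, 0.45],
    "deliberate_friction_habit": [1.0, 2.0, 2.0],
}
print(healthy(history))  # prints True
```

Comparing only the first and last reading is a deliberate simplification: it ignores short-term spikes, in line with the warning that no single line should be read on how things feel right now.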

INSTRUMENT 11 · 外包 vs 内化 · 认知体检 OFFLOAD-VS-INTERNALIZE AUDIT

为一项学习/认知任务打两轴：X · 可充裕度（AI 能多大程度替你做到）× Y · 不可外包度（这项能力萎缩了，你会不会损失认知主导权 / 它是不是下游赖以运转的人类底座）。两轴张成四象限——其中一格是本卷最反直觉的便利陷阱。切两轴看你落在哪，以及该怎么处置。

Score a learning/cognitive task on two axes: X · abundance-ability (how far AI can do it for you) × Y · un-outsourceability (if this capacity atrophies, do you lose cognitive command / is it the human bedrock the downstream runs on). The two axes span four quadrants – one of which is this volume’s most counter-intuitive cell, the convenience trap. Toggle both axes to see where you land and what to do.

X · 可充裕度 · Abundance-ability

Y · 不可外包度 · Un-outsourceability

核心内化区 · Core Internalization: 低可充裕 × 高不可外包 · Low abundance × high un-outsourceability

⚠ 便利陷阱 · ⚠ The Convenience Trap: 高可充裕 × 高不可外包 · High abundance × high un-outsourceability

暂缓 · Park It: 低可充裕 × 低不可外包 · Low abundance × low un-outsourceability

放心外包 · Outsource Freely: 高可充裕 × 低不可外包 · High abundance × low un-outsourceability
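The four quadrants amount to a two-threshold lookup. This is a minimal sketch, assuming both axes are self-scored on a 0 to 1 scale with a midpoint cut; the scale, the `cut` parameter, and the function name `classify` are assumptions, not part of the instrument.

```python
def classify(abundance: float, un_outsourceability: float, cut: float = 0.5) -> str:
    """Map a task's two audit scores (assumed 0..1) to its quadrant name."""
    high_x = abundance >= cut            # AI can largely do it for you
    high_y = un_outsourceability >= cut  # losing it costs you cognitive command
    if high_y:
        return "Convenience Trap" if high_x else "Core Internalization"
    return "Outsource Freely" if high_x else "Park It"


print(classify(0.9, 0.8))  # prints "Convenience Trap"
```

The Convenience Trap is the cell where both scores are high: AI can do the task well, which is exactly why it gets handed over, and the capacity is one you cannot afford to lose.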

INSTRUMENT 13 · 合意困难旋钮 · 把抵抗调到刚好 THE FRICTION DIAL

前两件仪器判断"该不该外包"；这一件回答"抵抗调到多大"。合意困难是倒 U（FIG L.2）——太松不留痕，太紧只挫败，峰值在中间那条带。拨动旋钮，看每一档对应的脚手架姿态，以及它把你当下的省力换成了多少长期的留存。注意 Bjork 的边界：困难只对有基础能成功响应者合意。

The first two instruments decide “outsource or not”; this one answers “how hard to set resistance.” Desirable difficulty is an inverted U (FIG L.2) – too loose leaves no trace, too tight only frustrates, the peak in the middle band. Turn the dial to see the scaffold posture each notch implies, and how much present effort saved it trades for long-term retention. Mind Bjork’s boundary: difficulty is desirable only for those with enough base to respond successfully.

旋钮两端 · Dial endpoints: 当下省力 present ease ↔ 长期留存 long-term retention
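The inverted U behind the dial can be pictured with a toy curve. This is a sketch only: the quadratic shape, the 0 to 1 difficulty scale, and the peak at 0.5 are assumptions chosen for illustration, not a fitted model from Bjork or from this volume, and the function ignores Bjork's boundary condition about the learner's base.

```python
def retention_gain(difficulty: float) -> float:
    """Toy inverted U on [0, 1]: zero at both ends, peak of 1.0 at 0.5.

    ASSUMPTION: the curve is illustrative only, not an empirical model.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    return 4.0 * difficulty * (1.0 - difficulty)


for d in (0.1, 0.5, 0.9):
    print(d, round(retention_gain(d), 2))
```

Too loose (0.1) and too tight (0.9) score the same low value, while the middle band carries the gain. A fuller sketch would shift the peak with the learner's existing base, which is what Bjork's caveat implies.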

THE LAST LAYER · 最后一层，不给静态答案 · The Last Layer: No Static Answer

不变 · 硬 / Invariant · hard

理解 ≠ 信息；慢有其价值 · Understanding ≠ information; slow has value

合意困难、测试效应（Bjork；Roediger & Karpicke），睡眠/间隔巩固：数十年可复现、不依赖 AI。这是全卷最硬的地基，不会因模型更强而变。
Desirable difficulty, the testing effect (Bjork; Roediger & Karpicke), sleep/spacing consolidation: decades-replicable, AI-independent. The volume’s hardest floor; it does not move because models get stronger.

在变 / Shifting

"知道"无成本获取 · “Knowing” is costless to obtain

陈述性知识、讲解、示范随取随到且可个性化。学习目标随之上游到提问/质疑/整合的元能力；这一层正在移动，且还会继续移。
Declarative knowledge, explanations, demonstrations on demand and personalizable. The goal moves upstream to the asking/challenging/integrating meta-skills; this layer is moving, and will keep moving.

前沿 · 未决 / Frontier · open

认知萎缩是否真发生 · Does atrophy really happen

头号悬案。证据仅相关/短期，最强因果反而正向（方向性正效应，精确量级无可核来源），无任何多年期纵向数据。本卷下的是赌注不是判决，证伪条件见第 4 节。
The open question. Evidence is only correlational/short-term, the strongest causal evidence is positive (directional; its precise magnitude has no checkable source), and there is no multi-year longitudinal data. This volume places a bet, not a verdict; falsification condition in Section 4.

AI-Native 学习者 · 可执行 skill

The AI-Native Learner · executable skill

前面十七节讲的是"为什么充裕的答案不是学习、该守住什么"；这一件替你把守护这件事真正跑起来：它不是"设计一所学校"，是为一个人内化一项能力搭出一份可运行的学习协议。给它一项你想学会的能力，它先过一道范围闸（绿地从零开始 / 已被悄悄掏空的能力做转化 / 出域的组织级培训课程——后者诚实判定为"另一本书"，不伪装通吃），再产出四件实物：一份把合意困难内建进流程的脚手架、一条显式的交与不交边界、一个有真实字段和回写边的反思库、一块测能力而不是测吞吐量的仪表盘。这一面最反直觉的地方是：充裕的查、答、讲解，正是它要防的那份诱惑——在认知这个面上，"替你做"会改变"被做的那个人"。

The first seventeen sections argue why abundant answers aren’t learning and what to guard. This piece runs that guarding for you. It builds a runnable learning protocol for one person internalizing one capacity, not “design a school.” Give it a capacity you want to learn, and it runs a scope gate first: greenfield from zero, transformation of a quietly hollowed-out skill, or out-of-scope organizational training (the last of which it honestly judges “another book” rather than faking universal coverage). Then it produces four real artifacts: a scaffold with desirable difficulty built into the flow, an explicit offload boundary, a reflection log with real fields and a write-back edge, and a dashboard that measures capability, not throughput. The most counter-intuitive part of this surface: abundant lookup, answers, explanations are exactly the temptation it has to guard against — on the surface of cognition, having it done for you changes the person it’s done for.

# 在 Claude Code 里调用 / invoke inside Claude Code
$ /skill ai-native-learning
> "帮我用 AI 学会 X，又不被它替我思考：……"
> "help me learn X with AI without it doing the thinking for me: ..."

  → 范围闸 · 绿地 / 转化 / 出域 | scope gate · greenfield / transformation / out-of-scope
  → 交与不交边界 + 脚手架 + 反思库 + 仪表盘 | offload-boundary + scaffold + reflection store + dashboard
  → 一份 AI-Native 学习协议 | one AI-Native Learning Protocol

开源仓库：Open-source: github.com/watterfall/ai-native-architect/skills/ai-native-learning ↗

一行安装：Install (one line): /plugin marketplace add watterfall/ai-native-architect ↗

这一件是什么 · 学习面的可执行配套
架构层的 architect 负责设计组织；六个配套件各对应一个面，同一内核，彼此耦合，阅读没有固定起点。这一件把"学习"这个面真正跑起来。它最锋利的判断节点，也是全系统最硬的止步线：哪一种困难是合意的、必须留在人这一侧。要是让 agent 替你完成那场挣扎，人就学不到：外包合意困难，等于摧毁了这项活动本来要建立的那份能力。

What this is · the learning executable companion
The architecture-layer architect designs the organization; the six companion pieces map one to each surface, sharing one kernel, mutually coupled, with no fixed reading entry. This is the piece that makes the learning surface runnable. Its sharpest judgment node is also the hardest stop-line in the whole system: which difficulty is desirable and has to stay on the human side. If the agent does the struggle for you, the human doesn’t learn: outsourcing desirable difficulty destroys the very capability the activity exists to build.

SPEC.V / AI NATIVE METHODOLOGY / OWL METHODOLOGY SERIES

SCOPE / 一套方法论 · 完整组织光谱 N=1 → N=众多（一人公司至 agent 网络，同一套第一性原理）One methodology · the full organizational spectrum N=1 → N=many (from the one-person company to the agent network, on a single set of first principles)

SERIES / 六卷同一内核 · 本卷是其中一个面，完整接线见上方「方法论系列」。Six volumes, one kernel · this volume is one surface; the full wiring is above under “The Series.”

CONTACT / 案例投稿与合作洽谈：Case submissions and collaboration: contact@ai-native.build

APPENDIX · SOURCES / 证据与引用登记 —— 分级口径：Ⅰ 同行评审元分析或多实验室复现（最硬）· Ⅱ 同行评审受控／影像研究 · Ⅲ 小样本／未评审／横断面相关（引用须写"相关"，不得写"导致"）· Ⅳ 综述或从业者一手陈述 · Ⅴ 哲学文本或推演（是论证，不是事实）。一条纪律：全卷核心命题"认知会萎缩"现有证据多为 Ⅲ 级相关性，故降格为可证伪的赌注、不当判决。Evidence and citation registry; grading key: Ⅰ peer-reviewed meta-analysis or multi-lab replication (hardest) · Ⅱ peer-reviewed controlled / imaging study · Ⅲ small-sample / un-reviewed / cross-sectional correlation (citations must read “correlates with,” never “causes”) · Ⅳ review article or practitioner first-hand account · Ⅴ philosophical text or extrapolation (an argument, not a fact). One discipline: the volume’s core claim — “cognition atrophies” — rests mostly on grade-Ⅲ correlational evidence today, so it is downgraded to a falsifiable bet, not a verdict.
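The grading key's citation discipline (grade-Ⅲ sources may be cited as correlation, never as cause) lends itself to a mechanical check. This is a minimal sketch: the function name, the word list, and the sample sentences are assumptions made for illustration, and the grades shown are copied from a few rows of the registry that follows.

```python
import re

# Grades for a few references, as listed in the registry.
GRADES = {"R1": "Ⅱ", "R3": "Ⅰ", "R4": "Ⅲ", "R5": "Ⅲ", "R10": "Ⅲ"}

# ASSUMPTION: a small, non-exhaustive list of causal phrasings.
CAUSAL_WORDS = re.compile(r"\b(causes?|caused|leads? to|results? in)\b", re.IGNORECASE)


def violates_grade_rule(ref: str, sentence: str) -> bool:
    """True if a sentence citing a grade-Ⅲ source uses causal wording."""
    return GRADES[ref] == "Ⅲ" and bool(CAUSAL_WORDS.search(sentence))


print(violates_grade_rule("R4", "AI use causes a decline in critical thinking."))
print(violates_grade_rule("R4", "AI use correlates with lower critical thinking."))
```

A keyword check like this only catches the obvious phrasings; it is a tripwire for drafts, not a substitute for reading each citation against its grade.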

REF	级GR	SOURCE	承重论断Load-bearing claim
R1	Ⅱ	Roediger & Karpicke《Test-Enhanced Learning》Psychological Science 17(3) 2006:249-255 · doi.org/10.1111/j.1467-9280.2006.01693.x	测试效应——主动提取（考自己）比重读更利于长期保持，而重读在短期看起来更好，于是人系统性误判；受控实验、数十年可复现。AI 把"重读式"轻松最大化，恰好踩中这个误判陷阱The testing effect: active retrieval (self-testing) beats rereading for long-term retention, yet rereading looks better in the short term, so people systematically misjudge; a controlled experiment, replicated for decades. AI maximizes the ease of the “reread” mode, landing squarely in that misjudgment trap
R2	Ⅱ	Ericsson, Krampe & Tesch-Römer《The Role of Deliberate Practice in the Acquisition of Expert Performance》Psychological Review 100(3) 1993:363-406 · doi.org/10.1037/0033-295X.100.3.363	刻意练习的承重部分是结构化的犯错-纠正循环，不是小时数；其"练习量主导"主张后被 R3 元分析显著压缩，故引用时承重的是循环结构、不是"一万小时"（被夸大的流行说法）The load-bearing part of deliberate practice is the structured error-and-correction loop, not the hour count; its “practice volume dominates” claim was later substantially shrunk by the R3 meta-analysis, so what a citation should carry is the loop structure, not “10,000 hours” (an overstated pop-claim)
R3	Ⅰ	Macnamara, Hambrick & Oswald《Deliberate Practice and Performance in Music, Games, Sports, Education, and Professions: A Meta-Analysis》Psychological Science 25(8) 2014:1608-1618 · doi.org/10.1177/0956797614535810	多研究合并元分析——刻意练习整体仅解释约 14% 表现方差（教育 4%、职业 <1% 不显著）；本卷诚实纳入的反例，用来限定 R2 不被过度外推A pooled meta-analysis: deliberate practice explains only about 14% of performance variance overall (education 4%, professions <1%, non-significant); the counter-example this volume honestly includes, used to bound R2 against over-extrapolation
R4	Ⅲ	Gerlich《AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking》Societies 15(1):6 · 2025 · doi.org/10.3390/soc15010006（N=666，横断面相关） (N=666, cross-sectional correlational)	AI 使用与批判性思维显著负相关，由认知卸载中介（总效应 b≈-0.42）——作者自承不能证因果、无纵向数据。萎缩假说的相关性证据，明确标"相关非因果"AI use correlates significantly negatively with critical thinking, mediated by cognitive offloading (total effect b≈-0.42); the author concedes no causation and no longitudinal data. Correlational evidence for the atrophy hypothesis, explicitly flagged “correlation, not causation”
R5	Ⅲ	Kosmyna et al. (MIT Media Lab)《Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing》arXiv:2506.08872 · 2025 · arxiv.org/abs/2506.08872（preprint，N=54→第四轮仅 18，未经评审） (preprint, N=54 → only 18 by the 4th session, un-reviewed)	EEG 显示"认知负债"随 LLM 写作累积——样本极小、未经同行评审，记 Ⅲ；作为萎缩假说的短期生理信号引用，不作长期因果结论EEG shows “cognitive debt” accumulating with LLM-assisted writing; a tiny, un-peer-reviewed sample, graded Ⅲ; cited as a short-term physiological signal for the atrophy hypothesis, not as a long-term causal conclusion
R6	Ⅱ	Sparrow, Liu & Wegner《Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips》Science 333(6043) 2011:776-778 · doi.org/10.1126/science.1207745	"Google 效应"——预期信息可再获取时，人记得内容更少、记得"去哪找"更多。卸载改变的是记忆策略，是萎缩/再分配之争的机理先例The “Google effect”: when information is expected to be re-accessible, people remember less of the content and more of “where to find it.” Offloading changes the memory strategy; a mechanistic precedent for the atrophy-vs-reallocation debate
R7	Ⅴ	再分配假说（理论立场，非实证）—— 无经同行评审的实证来源；哲学同向支撑见延伸心智 R12。引用一律写"假说预测"，不作为实证证据。The redistribution hypothesis (a theoretical position, not empirical) — no peer-reviewed empirical source; its philosophical ally is the extended mind (R12). Always cite as “the hypothesis predicts,” never as empirical evidence.	再分配假说——AI 不让认知能力净退化，而是把认知资源从可卸载的低阶任务重分配到更高阶工作；与萎缩假说对置的另一极，目前同样缺纵向数据裁决The redistribution hypothesis: AI does not net-degrade cognitive capacity but reallocates cognitive resources from offloadable low-level tasks to higher-order work; the opposite pole to the atrophy hypothesis, equally lacking longitudinal data to adjudicate
R8	Ⅱ	R. A. Bjork & E. L. Bjork《Desirable Difficulties in Theory and Practice》J. Applied Research in Memory and Cognition (JARMAC) 9(4) 2020:475-479 · doi.org/10.1016/j.jarmac.2020.09.003（"desirable difficulties"框架首见 R. Bjork 1994《Memory》章） (the “desirable difficulties” frame first appears in R. Bjork 1994, chapter in Memory)	合意困难——间隔、提取、交错等刻意保留的难度提升长期保持与迁移；边界（Bjork 告诫）：只对有基础能成功响应的学习者有益，否则只是"不合意的困难"。数十年可复现Desirable difficulty: deliberately retained difficulties (spacing, retrieval, interleaving) improve long-term retention and transfer; the boundary (Bjork’s caveat) is that they help only learners with enough base to respond successfully, otherwise they are merely “undesirable difficulty.” Replicated for decades
R9	Ⅱ	Maguire et al.《Navigation-Related Structural Change in the Hippocampi of Taxi Drivers》PNAS 97(8) 2000:4398-4403 · doi.org/10.1073/pnas.070039597	伦敦出租车司机长期空间导航与更大的后海马相关——常被用来支持"卸载致萎缩"，但本卷只取其"用进"一面，并配 R10 标明方向未定London taxi drivers’ long-term spatial navigation correlates with a larger posterior hippocampus; often invoked to support “offloading causes atrophy,” but this volume takes only its “use-it” side and pairs it with R10 to mark the direction as undetermined
R10	Ⅲ	Dahmani & Bohbot, “Habitual Use of GPS Negatively Impacts Spatial Memory During Self-Guided Navigation,” Scientific Reports 10:6310 · 2020 · doi.org/10.1038/s41598-020-62877-0	Habitual GPS use correlates with less hippocampal grey matter; direction undetermined (does GPS cause atrophy, or do people with weaker hippocampi rely on GPS more?). Cross-sectional and correlational; on this basis the volume declines to make any causal “offloading causes atrophy” claim.
R11	Ⅲ	Stadler, Bannert & Sailer, Computers in Human Behavior, 2024 (a rare causal experiment, N=91: the LLM-assisted group shows lower cognitive load but worse argument quality)	Causal-grade evidence that “less effort ≠ better learning”: subjective ease diverges from objective quality, supporting the volume’s “deliberately retain difficulty” stance. Listed alongside R3/R10 with its sample and extrapolation limits stated explicitly.
R12	Ⅴ	Clark & Chalmers, “The Extended Mind,” Analysis 58(1) 1998:7-19 · doi.org/10.1093/analys/58.1.7	The extended mind: pen, paper, maps, and notebooks are already external brains, so cognitive offloading is in itself neutral. It argues that offloading moves the cognitive boundary outward rather than draining capacity; a philosophical ally of the redistribution hypothesis (a philosophical argument, graded Ⅴ).
R13	Ⅳ	Risko & Gilbert, “Cognitive Offloading,” Trends in Cognitive Sciences 20(9) 2016:676-688 · doi.org/10.1016/j.tics.2016.07.002	The authoritative review of cognitive offloading: it defines the concept of “outsourcing memory, computation, or navigation to external tools,” which is itself solid. The volume borrows its frame, while the long-term consequences of offloading remain governed by the individual evidence grades of R4-R11.
R14	Ⅰ	Cepeda, Pashler, Vul, Wixted & Rohrer, “Distributed Practice in Verbal Recall Tasks: A Review and Quantitative Synthesis,” Psychological Bulletin 132(3) 2006:354-380 · doi.org/10.1037/0033-2909.132.3.354	The spacing-effect meta-analysis: pooled across 317 experiments in 184 articles, distributed practice significantly outperforms massed practice. Spacing is the best-evidenced branch of “desirable difficulty” and underpins the speed axiom that consolidation needs physical time.
R15	Ⅱ	Diekelmann & Born, “The Memory Function of Sleep,” Nature Reviews Neuroscience 11(2) 2010:114-126 · doi.org/10.1038/nrn2762; Stickgold, “Sleep-dependent memory consolidation,” Nature 437(7063) 2005:1272-1278	Memory consolidation depends on time and sleep (sleep-stage replay); controlled and imaging evidence, replicated across labs. This is a physical time constant AI cannot compress, and scaffolds must be designed to work with it, not against it.
R16	Ⅴ	Plato, “Phaedrus” (Socrates on writing and memory, c. 370 BCE)	Each time the cost of acquiring “knowing that” is cut, the same anxiety recurs: Socrates feared writing would atrophy memory. A philosophical text (Ⅴ), cited as the historical origin of this anxiety curve, not as an empirical claim.
R17	Ⅴ	Vygotsky, “Mind in Society: The Development of Higher Psychological Processes,” Harvard University Press 1978 (the theoretical source of the Zone of Proximal Development (ZPD) and scaffolding; compiled from 1930s Russian manuscripts)	The defining feature of a scaffold is that it is removable: support is gradually withdrawn as ability grows, until the learner stands alone. On this basis the volume judges whether AI is a scaffold or a crutch by whether it is designed to be removable (a classic theoretical frame, graded Ⅴ).
R18	Ⅲ	American Psychological Association (APA) 2026 survey (N=1,923) · self-report scale	The less often one “pushes back on AI, and correctly,” the lower the self-reported confidence in independent reasoning (a negative correlation; the precise coefficient is not independently verified); a decline is the top warning sign. The survey describes itself as “descriptive, not supporting causation,” hence graded Ⅲ and cited as a practical gauge for metacognitive monitoring.
R19	Ⅲ	METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity,” arXiv:2507.09089 · 2025-07 · arxiv.org/abs/2507.09089 · metr.org (strong RCT design, 16 senior maintainers, 246 real issues; arXiv preprint plus institutional report, not peer-reviewed, graded Ⅲ)	Allowing AI made developers 19% slower, yet they had predicted being 24% faster and still felt faster afterward. Subjective and objective measures point in opposite directions; a gauge of “synthetic confidence” and a warning against treating “fast / less effort” as good by default.
R20	Ⅳ	Scenario planning (two-axis 2×2 / GBN): Pierre Wack, “Scenarios: Uncharted Waters Ahead,” HBR 1985-09 · hbr.org/1985/09; Peter Schwartz, “The Art of the Long View,” Doubleday/Currency 1991 (ISBN 978-0-385-26732-8; Schwartz co-founded Global Business Network)	The methodological footnote for the “four worlds” of the projection act: take the two most critical and most uncertain driving forces as the axes, spanning four quadrants and four scenarios. The aim is to widen perception, not to predict a single future (a classic methodology, Ⅳ, directly usable; the specific four scenarios it generates remain Ⅴ-grade extrapolation, since the method’s reliability does not carry over to the scenario content).
R21	Ⅴ	Paulo Freire, “Pedagogy of the Oppressed,” Continuum 1970/2000 (ISBN 978-0-8264-1276-8; the “banking model of education” concept is from Chapter 2)	The “banking model of education,” which treats knowledge as a deposit to be stored and the student as an empty account that receives it, is the naming source for the “knowledge transfer” metaphor this volume critiques. A philosophy-of-education text (Ⅴ), cited as the conceptual frame for the metaphor critique, not as an empirical claim. Its sting is amplified under AI: if learning is mere storage, AI is the better container.
R22	Ⅳ	Donald T. Campbell, “Assessing the Impact of Planned Social Change,” Evaluation and Program Planning 2(1) 1979:67-90 · doi.org/10.1016/0149-7189(79)90048-X (“Campbell’s law”; same lineage as Goodhart’s law)	Campbell’s law: the more a quantitative social indicator is used for decision-making, the more it is subject to corruption pressure and the more it distorts the social process it was meant to monitor. On this basis the volume argues that grades, credentials, and standardized tests, as proxy signals of learning, distort structurally once AI cuts the cost of faking them to near zero (a classic social-science proposition, Ⅳ).
R23	Ⅱ	Endoscopy AI de-skilling study (multi-centre, real-world; unaided adenoma detection rate 28.4%→22.4% after AI removal) · Lancet Gastroenterology & Hepatology 2025;10(10):896–903 · doi.org/10.1016/S2468-1253(25)00133-5 (Budzyń et al.; verified · 2025-09)	The only direct real-world, behavioral-layer evidence of de-skilling to date: not self-report, not cross-sectional, and observed in experts. De-skilling surfaces only at the moment AI is removed, which makes this the real-world version of the volume’s “removal drill.” Observational, a within-group comparison before and after removal (not randomized), hence graded Ⅱ with the observational design flagged.
R24	Ⅱ	Bastani et al., “Generative AI without guardrails can harm learning: Evidence from high school mathematics” (scaffold-vs-crutch controlled experiment: the direct-answer GPT group did worse than never-AI controls after removal, while the tutor-style hint group was unharmed; with a post-publication erratum) · PNAS 2025;122(26):e2422633122 · doi.org/10.1073/pnas.2422633122 (erratum doi 10.1073/pnas.2518204122; working paper SSRN 4895486; verified · 2025)	The flagship causal evidence for the “scaffold vs crutch” divide: whether AI is a scaffold or a crutch turns not on whether it is used but on hints versus answers, and on whether the learner still operates independently after removal. The checkable mechanistic anchor for the volume’s scaffold prescription.
R25	Ⅳ	Product timeline of OpenAI Study Mode’s launch (2025) and quiet retirement (2026-04) · vendor move, not independently verified	An instance of why “friction” cannot be entrusted to vendors by default: a study mode built around “fewer answers” can vanish overnight on a commercial trade-off. Desirable difficulty must therefore live in a self-owned process layer, not in a product toggle. A product-trend datum (Ⅳ), cited as an illustration, not as an empirical claim.
R26	Ⅳ	Amy Bruckman, “Can Educational Be Fun?” (Game Developers Conference ’99, San Jose; origin of the “chocolate-dipped broccoli” critique) · 1999 · faculty.cc.gatech.edu/~asb/papers/bruckman-gdc99.pdf	Coating a dull drill in game-like sugar (points, animation) does not change the core activity; once the coating melts, the drill still has not been done. A practitioner’s first-hand account, cited as the concept’s source, not as empirical data.

REV	DATE	DESCRIPTION
L.0	2026-06	First edition of the AI-Native Learning Methodology: the cognitive-sovereignty thesis · 12 argument diagrams · its own 22-source evidence registry (R1–R22), with each load-bearing claim graded Ⅰ–Ⅴ; the atrophy claim is honestly marked as correlational evidence and downgraded to a falsifiable bet
L.1	2026-06	vNext evidence hardening: added hard real-world, behavioral-layer de-skilling evidence (Lancet endoscopy R23, Bastani scaffold/crutch controlled experiment R24) and the OpenAI Study Mode retirement timeline R25; the load-bearing scaffold-benefit claim is qualified in all five places as “a hypothesis prediction, range not yet confirmed (anchor R7 unverified)”

REV. 2026-06 R25 / END OF DOCUMENT