PART III / AI-NATIVE 设计

PART III / AI-NATIVE DESIGN

AI Native 设计方法论

AI Native Design Methodology

同一句 brief——"做个好看的落地页"——丢给模型十次，能拿回十版都"挑不出错"的稿：对齐干净、配色时髦、间距规整。可上线只能留一版，挑哪版、凭什么？这一问就是这一卷的全部。当一稿、一个变体、一整套界面都近乎免费，稀缺的不再是"做出来"，而是"什么算好"——品味与意图；而生成默认滑向均值（slop：通用、雷同、一眼 AI），十版往往长得像同一版，品味反而成了最稀缺的判断。这里说的设计不止于界面：它要把意图变成人愿意用、为之停留的形态，产品、交互、系统、表达都在其内，而对象始终是具体的人——所以"为谁、何为好"是 AI 接不走、也最不该接走的那部分。

Hand one brief, “make a good-looking landing page,” to a model ten times and you get back ten comps that all “pass”: clean alignment, on-trend palette, tidy spacing. But you can ship only one. Which one, and on what grounds? That question is the whole of this volume. When one comp, one variant, a whole interface is near-free, the scarce thing is no longer “making it” but “what counts as good”: taste and intent. Generation defaults to the mean (slop: generic, derivative, obviously-AI), so the ten often look like the same one and taste becomes the scarcest judgment of all. Design here is not confined to interfaces: it turns intent into a form people want to use and will stay with, spanning product, interaction, system, and expression, and its object is always specific people, so “for whom, and what is good” is the part AI cannot take over, and should least of all be allowed to.

① 出稿已充裕，做出来不再稀缺 → ② 判断沿可验证性梯度裂成两半：能只看产物判对错的（对齐、规格符合、可访问性）交给机器自动跑，只能人来定的审美判断留给人 → ③ 设计系统把"何为好"写成生成前的护栏 → ④ 人腾出手，回到"为谁、何为好"。下面每一节逐一兑现这四步。

① comps are already abundant, so making them is no longer scarce → ② judgment splits in two along the verifiability gradient: what can be judged right-or-wrong from the artifact alone (alignment, spec-conformance, accessibility) goes to the machine to run automatically, while constitutive aesthetic judgment stays with people → ③ the design system writes “what is good” into a pre-generation guardrail → ④ people, freed, return to “for whom, and what is good.” The sections below cash out these four steps one by one.

面向认知：研究 · 学习 · 创新

Cognition-facing: Research · Learning · Innovation

面向执行：组织 · 工程 · 设计

Execution-facing: Org · Engineering · Design

本卷讲设计——把意图变成人能理解、愿意使用、为之停留的形态，从产品到交互到表达，而不止于界面。

This volume is about design: turning intent into a form people can understand, want to use, and stay with, from product to interaction to expression, not interfaces alone.


AI-ENABLED DESIGN → AI-NATIVE DESIGN

产出

Output

AI-Enabled：更多稿、更快图 → AI-Native：意图、品味与系统约束一起进入生成

AI-Enabled: More drafts, faster images → AI-Native: Intent, taste, and system constraints enter generation together

评价

Review

AI-Enabled：凭“像不像”挑稿 → AI-Native：用指纹、语境和可用性判断何为好

AI-Enabled: Choose by whether it looks close → AI-Native: Judge quality through fingerprints, context, and usability

边界

Boundary

AI-Enabled：生成后再修补 → AI-Native：先写生成护栏，再让系统拒绝 slop

AI-Enabled: Patch after generation → AI-Native: Write guardrails first so the system can refuse slop


AI-NATIVE DOCUMENT PACK · PART III

设计文档包：把“何为好”写成生成护栏

Design Pack: writing “what good means” as generation guardrails

本卷的文档是品味的基础设施，不是出稿说明：让生成能铺开，让人能判断，让系统能拒绝 slop。

This volume’s documents are infrastructure for taste, not production notes: they let generation fan out, let people judge, and let the system refuse slop.

Thesis

生成变富后，稀缺的是品味、意图与“为谁”，不是画稿。

When generation is abundant, the scarce things are taste, intent, and “for whom,” not comps.

AI-Native 设计把设计师从亲手出稿推回判断环：铺开候选、评判差异、导向下一轮，并把新学到的判据沉淀进设计系统。

AI-Native design moves the designer from hand-producing one comp into the judgment loop: spread candidates, judge differences, steer the next round, and distill new criteria into the design system.

DSN

CONCEPT · 概念

CONCEPT

定义 · 先划界

Definition

当二十个方案都能做出来，设计师在判断什么？

When Twenty Designs Are Possible, What Is a Designer Judging?

分界不是用了什么工具。是团队说不说得清：为什么是这一版，不是另外十九版。

The line isn’t which tool a team used. It’s whether they can say why this version, and not the other nineteen.

一句话

In one line

候选稿生成得比理解用户、做取舍还快，这条流程就该重画了。判断有没有变得可追溯，才是判断是不是 AI-Native 的标准——出稿快不快，不是。

When candidate drafts arrive faster than a team can understand users and make trade-offs, the process needs redrawing. Whether it’s AI-Native comes down to whether judgment becomes traceable, not whether drafts arrive faster.

下午四点的评审桌上，铺着二十个界面。它们都完整、干净，也都能上线。设计师没有因此轻松，反而停住了：新手第一次打开时会不会迷路？屏幕阅读器能不能读懂？品牌在这一版里还剩下什么？当每一稿都来得太快，真正慢下来的是这些问题的回答。

At a four o’clock review, twenty screens cover the table. Each is complete, clean, and capable of shipping. That does not make the designer’s job easier. It makes the room pause: will a first-time user get lost? Can a screen reader make sense of it? What remains of the brand in this version? When every draft arrives quickly, what slows down is not production but answering those questions.

设计这些年来最稀缺的是产出工时——画稿、对齐像素、切图、做变体，都要花人时。许多流程就是照着“做出来很贵”这个前提搭的。现在做出来几乎不要钱了，问题可能堆在了别处：用户研究、可访问性、品牌一致性，或者上线之后有没有人真的用得顺。这个分卷提出的工作假设是：产出速度想变成体验质量，得先把“为谁做、凭什么这样取舍”写进生成和评审的环节，否则速度只是速度。

For years, design’s scarcest resource was production hours: drawing comps, aligning pixels, cutting assets, building variants all cost people time. Most process was built around the assumption that making things is expensive. Now making is nearly free, and the hard part may have moved elsewhere: user research, accessibility, brand coherence, or whether people can actually use the thing once it ships. This volume’s working hypothesis is: speed only turns into experience quality once a team writes “who this is for, and why this trade-off” into generation and review. Otherwise speed is just speed.

三种看似先进、却把问题留在原地的做法

Three Seemingly Advanced Moves That Leave the Problem Where It Is

说清"是什么"最快的办法，是看它把哪些相邻做法甩在了后面。第一种：把 AI 当出图机。用 Midjourney 出配图、用插件出图标，然后照旧手工拼版——这只是把 AI 焊在旧的"产出工时"流程上，换了一支更快的笔，流程图和瓶颈都没动。第二种更隐蔽：把 AI 当自动美化器。指望它"让这版更好看"，结果它套上了当下最流行的视觉模板——这正是 slop：更光滑，但更没有自己。品味本该是人来做的事，被错误地外包给了一台只会拟合均值的机器。第三种：把设计系统当成事后补的文档。先生成、再回头凑一套 token 应付审计——这样设计系统就丢掉了它在 AI-Native 流程里唯一重要的角色：生成之前的护栏，而不是生成之后的辩护词。三种做法的共同毛病，是都没有围绕"生成已经充裕"这个前提重画流程，只是把 AI 塞进了旧流程的某个工位。

The fastest way to say what this is, is to name what it leaves behind. First: AI as an image machine. Generate an illustration in Midjourney, a set of icons with a plugin, then hand-assemble the layout as always. That just grafts AI onto the old production-hours process, a faster pen on the same bottleneck. Nothing about the process diagram changes. Second, more subtle: AI as an auto-beautifier. Expected to “make this look better,” it delivers whatever visual template is trending, which is exactly slop: smoother, less itself. Taste, a human’s job, gets outsourced to a machine that only fits the mean. Third: the design system as an after-the-fact document. Generate first, then patch together tokens to pass an audit, which strips the design system of the one role that actually matters in an AI-Native process: a guardrail before generation, not a defense after it. All three share the same failure: none redraws the process around the fact that generation is now abundant. They just slot AI into one station of the old line.

往正面说：AI-Native 设计是一套能被检验的流程改写。团队在生成之前就把“为谁做、为什么这样取舍、什么不能牺牲”讲清楚，然后让评审和真实上线持续拷问这些话有没有站住。常见的做法，是把人的精力挪到判断和方向上，把设计系统变成生成前的护栏，把“更快”从唯一的指标位置上撤下来。但这三件事凑齐了也不算数——检验的是用户能不能完成任务，有没有人被这版设计挡在外面，团队能不能把取舍讲清楚。

Stated positively: AI-Native design is a process rewrite you can test. Before generation, a team says who this is for, why this trade-off, and what it will not sacrifice, then lets review and real shipping keep challenging whether that holds up. Common moves: shift human effort toward judgment and direction, turn the design system into a pre-generation guardrail, retire “faster” as the only metric that counts. But having all three doesn’t settle it. The real test is whether users complete their tasks, whether anyone gets shut out by this version, and whether the team can actually explain its trade-offs.
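The three statements a team makes before generation (who this is for, why this trade-off, what it will not sacrifice) can be held in a small structure and checked for gaps before any generation runs. The following is a minimal sketch under stated assumptions: `DesignBrief`, its field names, and `missing_fields` are hypothetical, chosen only to mirror the paragraph above, not a prescribed schema.

```python
from dataclasses import dataclass, field


@dataclass
class DesignBrief:
    """Hypothetical pre-generation brief; field names are illustrative only."""
    audience: str = ""                                   # who this is for
    trade_offs: list[str] = field(default_factory=list)  # why this trade-off
    red_lines: list[str] = field(default_factory=list)   # what will not be sacrificed


def missing_fields(brief: DesignBrief) -> list[str]:
    """Return the names of the statements that have not been written down yet."""
    missing = []
    if not brief.audience.strip():
        missing.append("audience")
    if not brief.trade_offs:
        missing.append("trade_offs")
    if not brief.red_lines:
        missing.append("red_lines")
    return missing


# "Make a good-looking landing page" and nothing else.
vague = DesignBrief()

# An assumed example of a brief that states all three.
specific = DesignBrief(
    audience="first-time users on a phone, some using a screen reader",
    trade_offs=["fewer options on the first screen, at the cost of power-user shortcuts"],
    red_lines=["text contrast never drops below the accessibility threshold"],
)

print(missing_fields(vague))
print(missing_fields(specific))
```

A complete brief only makes generation eligible to start. As the paragraph says, the real test remains whether users complete their tasks and whether the team can explain its trade-offs.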

这三条不是发证书用的清单，它们其实是三个更难回答的问题：生成变多了之后，谁去做用户研究？哪些判据能写进系统，哪些只能留在具体情境里现场判断？上线之后暴露出的问题，能不能真的回流到下一轮生成？答不上来，这套流程可能仍旧只是换了一支更快的笔。

These three aren’t a checklist you get certified against. They’re three harder questions: once generation multiplies, who does the user research? Which criteria can live in the system, and which only make sense judged on the spot, in context? Can what breaks after launch actually feed back into the next round of generation? If a team can’t answer, the process may still just be a faster pen.

这里说的“品味”，不该被神化成设计师才有的私人直觉，也不该被划成模型永远进不去的禁区。它就是针对具体的人、具体的任务、具体的后果做出的取舍，可以被追问、可以被反驳。如果哪天模型在真实使用里稳定地提出更好的取舍，还能说清楚代价是什么，这一卷划给“人来判断”的边界，也得跟着挪。

Taste here shouldn’t be mystified into a designer’s private intuition, and it shouldn’t be fenced off as territory a model can never enter. It’s a trade-off made about particular people, particular tasks, particular consequences, one that can be questioned and pushed back on. If a model starts reliably proposing better trade-offs in real use, and can account for what they cost, the line this volume draws around “humans judge” has to move too.

① 充裕

① ABUNDANCE

稿 / 变体 / UI / 文案

Comps / variants / UI / copy

生成成默认，"做出来"不再稀缺。

Generation is the default; making it is no longer scarce.

② 判断

② JUDGMENT

品味 · 意图 · 何为好

Taste · intent · what’s good

新瓶颈是审美与体验判断 + 连贯。

The new bottleneck is aesthetic and experience judgment, plus coherence.

③ 上下文

③ CONTEXT

设计系统即护栏

Design system as guardrail

tokens / 组件 / 品牌成为生成的规格。

Tokens / components / brand become the spec for generation.

④ 人

④ MEANING

共情 · 品味 · 为意义负责

Empathy · taste · meaning

设计师回到理解用户、守住品味与意图。

Designers return to understanding users, holding taste and intent.

第②步同样沿可验证性梯度分叉：可机检的部分（对齐 / 规格符合度 / 可访问性）并入①充裕、被自动化；只能人来定的审美判断（品味 / 为谁而存在 / 异质性）下沉④、留给人——这正是设计卷与体系总图共用的那条线。

Step ② forks the same way along the verifiability gradient: the machine-checkable part (alignment / spec-conformance / accessibility) joins ① abundance and gets automated; constitutive aesthetic judgment (taste / who it exists for / heterogeneity) sinks to ④ and stays with people, the very line the design volume shares with the system map.
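Accessibility is listed among the machine-checkable criteria, and text contrast is one concrete case: it can be ruled right or wrong from two color values alone. The sketch below uses the relative-luminance and contrast-ratio formulas from WCAG 2.x and the 4.5:1 threshold WCAG sets for normal-size text at level AA. The function names are illustrative, and only 6-digit hex colors are handled.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#1a2b3c'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg: str, bg: str) -> bool:
    """True when the pair meets the 4.5:1 AA threshold for normal-size text."""
    return contrast_ratio(fg, bg) >= 4.5


print(round(contrast_ratio("#000000", "#ffffff"), 2))
print(passes_aa_normal_text("#767676", "#ffffff"))
print(passes_aa_normal_text("#777777", "#ffffff"))
```

A check like this can run on every generated variant with no human in the loop. Whether a passing palette suits the people it is meant for is the part that stays on the other side of the fork.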

同一个内核，作用在设计这个面

The one kernel, acting on the design face

系列里六卷——组织、工程、设计、研究、学习、创新——讲的都是同一条内核，只是落在不同的面上；各讲各的只是表象。组织卷说的是"执行充裕、判断退守、上下文成基设、人回归意义"；工程卷说的是"打字充裕、验证成瓶颈、代码库可查询、人做系统专长"。设计卷换的只是名词：充裕的是稿与变体，退守的判断是品味与意图，成了基设的上下文是设计系统，人回归的意义是共情与为人设计。读过任何一个姊妹卷的读者，会认出这是同一台机器换了零件——这才是它能叫"系列"、而不是六篇互不相干的文章的原因。

The six volumes in this series (organization, engineering, design, research, learning, innovation) tell one kernel landing on six different faces; that each seems to tell its own story is only the surface. The org volume tells it as “execution becomes abundant, judgment retreats, context becomes infrastructure, people return to meaning.” The engineering volume tells it as “typing becomes abundant, verification becomes the bottleneck, the codebase becomes queryable, people do deep systems work.” The design volume just swaps the nouns for the same four steps: what’s abundant is comps and variants, the judgment that retreats is taste and intent, the context that becomes infrastructure is the design system, and the meaning people return to is empathy and designing for people. Read any sibling volume and you’ll recognize the same machine with different parts swapped in: that’s what makes this a series rather than six unrelated essays.

核心图

KEY FIG

FIG. D0.0 / THE FORK · 判断沿可验证性梯度分叉

FIG. D0.0 / THE FORK · Judgment forks along the verifiability gradient

看懂：第②步的判断如何一半被自动化、一半下沉到人

Read: how step ②’s judgment splits: half is automated, half sinks to people

第②步是一个分叉口，不是黑箱判断。一条分诊问句"仅看产物文本能判对错吗"把判断切成两半：可机检的那半回流到①充裕、被自动化（DSN 08 的硬约束）；只能人来定的那半下沉到④、留给人（软判据）。整条分叉立在③上下文之上。这张图是后面所有章节的骨架。

Step ② is a fork, not a black-box judgment. One triage question, “can this be ruled right or wrong from the artifact text alone?”, cuts judgment in two: the machine-checkable half flows back into ① abundance and gets automated (DSN 08’s hard constraints); the constitutive half sinks to ④ and stays with people (soft criteria). The whole fork stands on ③ context. This figure is the skeleton of every section that follows.
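The fork can be written down as a routing step. In the sketch below, each criterion carries the answer to the triage question as a boolean, and `triage` only sorts on it. The criteria and their answers are the ones the text itself gives; `Criterion` and `triage` are hypothetical names, and assigning the boolean is itself a human call that this code does not make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    # Answer to the triage question: can this be ruled right or wrong
    # from the artifact text alone?
    decidable_from_artifact: bool


def triage(criteria: list[Criterion]) -> dict[str, list[str]]:
    """Route each criterion to automation or to human review."""
    routes: dict[str, list[str]] = {"automate": [], "human": []}
    for c in criteria:
        routes["automate" if c.decidable_from_artifact else "human"].append(c.name)
    return routes


criteria = [
    Criterion("alignment", True),
    Criterion("spec-conformance", True),
    Criterion("accessibility", True),
    Criterion("taste", False),
    Criterion("who it exists for", False),
    Criterion("heterogeneity", False),
]

routes = triage(criteria)
print(routes["automate"])
print(routes["human"])
```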

这次重画，和设计史上每一次工具革命的关键差别

How this redraw differs from every prior tool revolution in design history

设计史上不是没经历过工具革命：手绘到桌面出版，纸面到 Photoshop，静态切图到 Figma 的协作矢量。每一次都让做出来变得更快更便宜，但有一件事始终没变——做的人和判断的人是同一个人，判断始终嵌在做的动作里。设计师推像素的同时就在判断好坏，两件事分不开。

Design history has had real tool revolutions before: hand-drawing to desktop publishing, paper to Photoshop, static slices to Figma’s collaborative vectors. Each made making faster and cheaper, but one thing never changed: the person making and the person judging were the same person, and judgment lived inside the act of making. A designer judged good and bad while pushing pixels; the two were never really separable.

这一次不一样，差别在结构上：它第一次把"做"这件事几乎整个交给了机器，逼得"判断"从"做"里被抽出来、单独显形。过去你不需要专门写下"什么算好"，因为判断本来就活在你的手里；现在手交给了生成，判断如果不被明确写下来、说清楚，它就会消失——一消失，生成就会滑回均值。这也是这一卷反复强调"写规格、说判据、让判断回流"的原因：这些动作不是新发明，只是第一次必须从藏在手里变成写在纸上。看懂这个差别，才看得懂这次是设计师价值结构的一次重组，不只是"又来了一个更快的工具"。

This time is structurally different. For the first time, making gets handed almost entirely to a machine, which forces judgment out of making and makes it stand on its own. You never had to write down “what counts as good” before, because judgment simply lived in your hands. Now the hands belong to generation, and if judgment isn’t written down and stated explicitly, it disappears. Once it disappears, generation slides back to the mean. That’s why this volume keeps insisting on writing the spec, stating the criteria, feeding judgment back: none of that is a new invention, it’s just the first time it has to move out of the hand and onto the page. Get this difference and you get why this is a reorganization of what a designer’s value is made of, not just another faster tool.

DSN

MECHANISM · 不对称

THE ASYMMETRY

机理

Mechanism

生成变富，品味变稀缺

Generation gets cheap, taste gets scarce

这一节拆开不对称的两侧：成本塌了，判断没跟着塌。

This section opens up both sides of the asymmetry: cost collapsed, judgment did not follow.

一句话

In one line

出稿几乎不要钱了，"什么算好"却一分都没便宜。生成默认收敛到均值——这就是 slop——模型越强，这层 slop 反而越光滑、越难辨认，于是品味成了整条链上最稀缺的判断。

Producing drafts is nearly free now. What counts as good hasn’t gotten one cent cheaper. Generation defaults to the mean (that’s slop), and the stronger the model, the smoother and harder to spot that slop gets. So taste becomes the scarcest judgment left in the chain.

过去设计的稀缺是"做出来"的工时，流程就是照着省这份工时建的。生成一把出稿变成随取随用，产出就不再是约束了。瓶颈整个搬到了一个机器答不了的问题上：这一版，好吗？对不对路？是不是为它要服务的人做的？

The old scarcity in design was production hours, and process was built to save them. Once generation makes comps something you draw on at will, production stops being the constraint. The whole bottleneck lands on a question the machine can’t answer: is this version good? On target? Made for the people it’s supposed to serve?

为什么生成的默认终点是"均值"——而均值就是 slop

Why generation’s default destination is “the mean,” and why the mean is slop

这条风险来自约束不够，不是模型注定只会吐"均值"。请求只说"做一个好看的落地页"，系统就拿不到用户、任务、风险、取舍这些信号，于是它容易滑回训练里见得最多、最稳妥的样子。那种结果不一定丑，但常常不为哪个具体的人负责。这里管它叫 slop，是想指出"看起来合格"跟"在这里合适"之间那道缝，不是给模型能力下数学判决。

This risk comes from being underconstrained, not from a model destined to output only the mean. When a request only says “make a good-looking landing page,” the system has no signal about users, tasks, risk, or trade-offs, so it slides back toward whatever is most common and safest in its training. That result isn’t necessarily ugly, but it’s often accountable to no one in particular. We call this slop to name the gap between “looks acceptable” and “fits here,” not to issue a mathematical verdict on model capability.

更强的模型、更多真实使用的数据、更清楚的规格、更好的评审，都可能把结果拉离这个默认。要做的事不是提前把"品味"锁死给人，而是让每一次生成的结果都能回答：它服务谁，牺牲了什么，凭什么这么判断，用出来的反证能不能回到下一轮。谁能稳定地把这条链走通，谁就在参与设计判断——不管这个"谁"最终是人还是模型。

A stronger model, more data from real use, clearer specs, better review: all of these can pull output away from that default. The job isn’t to lock “taste” to humans in advance. It’s to make every generated result answer: whom it serves, what it sacrificed, on what basis, and whether evidence from use can loop back into the next round. Whoever can reliably carry that chain is doing design judgment, whether that “whoever” turns out to be a person or a model.

生成 · 近乎免费

Generation · near-free

一稿、十个变体、一整套界面与状态——随取随用，近零边际成本。

One comp, ten variants, a whole interface with states, on demand at near-zero marginal cost.

品味 · 依旧稀缺

Taste · still scarce

"哪个更好、为什么、是否为人"——没有捷径，只能由人判断。这就是新瓶颈。

“Which is better, why, is it for people”: no shortcut, only human judgment. This is the new bottleneck.

图

FIG

FIG. D1.1 / DISTRIBUTION CLAMP · 生成默认堆在均值，设计系统把分布夹离均值

FIG. D1.1 / DISTRIBUTION CLAMP · Generation piles on the mean; the design system clamps the distribution off it

看懂：生成的输出本身是一条压在"均值=slop"上的钟形分布，护栏是把整条分布夹窄、推向品牌那一侧，不是挑一个好结果

Read: generation’s output is itself a bell centered on “the mean = slop”; the guardrail clamps the whole distribution narrow and pushes it toward the brand side, rather than picking one good result

关键不在"挑出那个好结果"——那是事后筛选，挑不快也挑不稳。护栏做的是上游的事：把生成那条压在均值上的宽分布，整条夹窄、整条推离均值、推向品牌成立的那一侧。所以设计系统的回报是"每一次生成的期望都更接近你认的东西"，而非"少返工一次"。这也是为什么 slop 是默认而非偶发：不加夹具，分布的峰永远落在均值。

The point is not “pick the one good result”: that is downstream filtering, neither fast nor reliable. The guardrail acts upstream: it takes generation’s broad mean-centered distribution and clamps the whole thing narrow, shifts the whole thing off the mean toward where the brand holds. So a design system’s payoff is not “one less rework” but “every generation’s expectation lands closer to what you would sign off on.” This is also why slop is the default, not the accident: without a clamp, the peak always sits on the mean.
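The difference between downstream filtering and an upstream clamp can be made concrete with a toy calculation. Everything numeric here is an assumption for illustration: the three outcome labels, their probabilities, and the simplification that a guardrail works by ruling outcomes out and renormalizing what remains. A real design system shifts the distribution in less tidy ways.

```python
from fractions import Fraction

# Assumed probabilities of what a single unconstrained generation yields.
unconstrained = {
    "generic": Fraction(7, 10),
    "near-brand": Fraction(2, 10),
    "on-brand": Fraction(1, 10),
}


def clamp(dist: dict[str, Fraction], forbidden: set[str]) -> dict[str, Fraction]:
    """Drop the outcomes the guardrail rules out and renormalize the rest."""
    kept = {k: p for k, p in dist.items() if k not in forbidden}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}


def expected_draws_until(dist: dict[str, Fraction], outcome: str) -> Fraction:
    """Mean number of independent generations until `outcome` first appears."""
    return 1 / dist[outcome]


clamped = clamp(unconstrained, {"generic"})

# Filtering after the fact: keep generating until something on-brand turns up.
print(expected_draws_until(unconstrained, "on-brand"))
# Clamping before generation: the same question against the narrowed distribution.
print(expected_draws_until(clamped, "on-brand"))
```

Under these assumed numbers the filter needs ten generations on average and the clamp needs three. The exact figures do not matter; the point is that the clamp changes what every single generation is likely to return.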

把两条曲线画在一起：成本塌、判断不塌

Plot the two curves together: cost collapses, judgment does not

倒回十八个月前：做一整套带状态、响应式的界面，还得排一位设计师画上几天人时。现在，同样一稿加十个变体，一句提示、几分钟就回来了——生成的边际成本沿模型能力曲线塌了一到两个数量级。另一条曲线几乎没动："哪个好、为谁好"这条判断，不会因为模型变强就变便宜，因为它问的不是"能不能做出来"，而是"该不该是这样"。一条塌了、一条没动，中间那道剪刀差，就是品味变成瓶颈的全部原因〔源：Anthropic 2025 agentic-coding 实践与 Karpathy "software is changing" 论述，证据级 Ⅳ 一手从业者〕[R1][R2]。

Rewind eighteen months: building a full set of stateful, responsive screens still meant booking a designer for person-days. Today the same comp plus ten variants comes back from one prompt in a few minutes: generation’s marginal cost has fallen one to two orders of magnitude along the model-capability curve. The other curve has barely moved: “which is good, good for whom” doesn’t get cheaper as the model gets stronger, because it asks not “can this be made” but “should it be this way.” One curve falls, the other holds, and the scissor-gap between them is the whole reason taste becomes the bottleneck 〔Source: Anthropic 2025 agentic-coding practice and Karpathy’s “software is changing” talk, grade Ⅳ practitioner〕[R1][R2].

核心图

KEY FIG

FIG. D1.0 / GENERATION × TASTE · 生成×品味平面

FIG. D1.0 / GENERATION × TASTE · The generation × taste plane

看懂：AI 把你推向哪一格，品味要在哪一格注入

Read: which quadrant AI pushes you into, and where taste must be injected

AI 只沿横轴帮你——把生成推向"近乎免费"。它不会替你沿纵轴往上走。不加判断，你就从 Q2（旧手艺）平移到 Q4（slop 默认区）：更便宜，但谁也不为。胜势 Q1 是人把品味这条纵轴重新加上去换来的，不是 AI 送的。

AI helps you only along the horizontal axis: pushing generation toward “near-free.” It will not climb the vertical axis for you. Without injected judgment you simply slide from Q2 (old craft) to Q4 (the slop default): cheaper, but for no one. The win, Q1, is what a human buys back by re-adding the vertical axis of taste, not a gift from AI.

不对称的直接后果：团队该把省下的人力重新投到哪

The asymmetry’s direct consequence: where a team should reallocate the freed-up effort

如果生成把产出成本压塌了、判断成本却没动，一个理性的团队就该做一次显式的人力再分配——而不是简单地"用 AI 提效、然后裁掉一半设计师"。后者是对这条不对称的误读：它假设瓶颈还在产出，省了产出就万事大吉。真相是瓶颈搬去了判断，所以省下来的产出人力应该重新投到判断这一侧：把规格写得更有判别力，把品味外化成可复用的护栏，把每一轮判断都回流进系统。

If generation collapses production cost while judgment cost stays put, a rational team should run an explicit reallocation of effort: not just “use AI to boost efficiency, then cut half the designers.” The latter misreads the asymmetry: it assumes the bottleneck is still production, so saving on production settles everything. The truth is the bottleneck moved to judgment. The production effort you freed up should go there instead: into writing specs with sharper discriminating power, turning taste into reusable guardrails, feeding each round’s judgment back into the system.

一个做对了的团队，外观上会变成这样：花在 Figma 里推像素的时间大幅减少，花在写"为谁、何为好、什么是红线"、评审候选、争论"这版为什么对路"的时间大幅增加。设计师人数不一定变少，但每个人做的事会明显往上挪，从执行的人变成判断和定方向的人。误读这条不对称的代价是实打实的：以为靠 AI 就能省掉判断，结果只是更快地产出没人负责品味的 slop——悄悄把 Q2 的旧手艺挪成了 Q4 的默认区。

A team that gets this right looks, from the outside, like this: time pushing pixels in Figma drops sharply, time writing “for whom, what’s good, what’s a red line,” reviewing candidates, and arguing over “why this version is on-target” rises sharply. Headcount doesn’t necessarily shrink, but everyone’s work moves visibly upward, from executing to judging and setting direction. Misreading the asymmetry costs something real: believing AI can save you the judgment, you just produce, faster, slop that no one’s taste is accountable for, quietly sliding Q2’s old craft into Q4’s default.

还有一个常被忽略的二阶效应：产出近乎免费之后，尝试的成本也近乎为零，探索的边界因此应该被大幅推开。过去每一稿都贵，团队习惯早早收敛到一个"安全"的方向，不敢发散——发散意味着浪费宝贵的人时。现在这个约束没了：铺十个真正不同的方向，和只铺一个，成本差不多。理性的策略应该反过来：判断之前尽可能多地发散，因为发散几乎免费，发散得越广，判断能挑的可能性空间就越大，撞中"真正对路的方向"的概率也越高。

There’s also a second-order effect people often miss: once production is nearly free, the cost of trying something is nearly zero too, so the boundary of exploration should be pushed much wider. In the past, with every comp expensive, teams converged early on a “safe” direction and avoided diverging: divergence meant burning precious hours. That constraint is gone now: spreading ten genuinely different directions costs about the same as spreading one. The rational move flips: diverge as much as possible before you judge, because divergence is nearly free, and the wider you spread, the bigger the space your judgment gets to pick from, and the better your odds of hitting the direction that’s actually right.

可惜不少团队把省下来的成本用错了地方：拿去更快地收敛（"出图快了，赶紧定稿往下走"），而不是更广地探索。这也是对这条不对称的误用——它把"生成变便宜"这份红利，花在了给旧习惯加速上，没意识到这份红利真正解锁的，是"以前不敢做的大范围探索"。用对了，生成的廉价给判断一个前所未有大的素材库，而不只是让人做完得更快。

Sadly, many teams spend that saved cost in the wrong place: on converging faster (“comps are quick now, lock it in and move on”) rather than exploring wider. That’s another misuse of the asymmetry: spending the “generation got cheap” dividend on accelerating old habits, missing that the real unlock is the wide-range exploration you never used to dare. Used right, cheap generation hands your judgment an unprecedented library of material, rather than just letting you finish faster.

结构性警告 · slop

Structural warning · slop

生成的默认产物是 slop：收敛到见得最多的那种样子。它看起来完成了，却谁都不像、谁也不为。slop 不是做得差，是判断没放进去——躲开它只有一个办法：把人的品味放回环里。

Generation’s default output is slop: it converges on whatever it has seen the most. It looks finished, yet resembles no one and serves no one. Slop is judgment left out, not bad craft. The only way around it is putting human taste back in the loop.

DSN

01·5

WHEN · 为什么是现在

WHY NOW

证据 · 时机

Evidence · Timing

这套不对称不是预测，是已经发生的事

This asymmetry is not a forecast but something already happening

这一节给出可被推翻的信号：若信号不成立，整卷前提就该被质疑。

This section lays out refutable signals: if they do not hold, the whole volume’s premise deserves doubt.

一句话

In one line

这套不对称，过去十八个月已经发生了：成本塌了一到两个数量级，判断没跟上。既然是事实，等着就是在付成本——该现在就把判断迁过去，但工具层面别过早押注某一个。

This asymmetry already happened, over the last eighteen months: cost fell one to two orders of magnitude, judgment didn’t keep up. Treat it as fact, and waiting has a cost: move judgment now, but don’t go all-in on any one tool yet.

信号一：成本那头已经塌了。从一句话描述出一整套带状态、响应式、能交付代码的界面，这在 2024–2025 年间的多个生成式 UI 工具和 agentic-coding 实践里已是常规能力，不是演示。出一版界面从"几天人时"压到"一句提示加几分钟"，这是横轴上一到两个数量级的位移〔源：Anthropic 2025 agentic-coding 实践、多家生成式前端工具公开能力，证据级 Ⅳ 一手从业者〕[R1]。信号二：判断那头没跟着塌。同一时期，"哪一版对路、是不是为这群人做的、有没有越过品味的线"，并没有变得更好自动化——把同一个需求丢给模型十次，你会拿到十个都"做得对"、却要靠人来挑的结果。生成解决的是能不能做出来，没解决该是哪一个。信号三：slop 变成了看得见的公共现象。"一眼看出是 AI 做的"从一种模糊的感觉，变成了能用具体指纹描述（青色配深底、紫蓝渐变、玻璃拟态、Inter 居中）、甚至能被检测的东西——这本身就是"生成天然收敛到均值"的直接证据。

Signal one: the cost side has already collapsed. Producing a whole interface (stateful, responsive, deliverable code) from a one-line description is by now routine capability across several generative-UI tools and agentic-coding practices of 2024–2025, not a demo. Compressing one interface version from person-days to a prompt plus a few minutes is a one-to-two order-of-magnitude shift on the horizontal axis 〔Source: Anthropic 2025 agentic-coding practice, public capabilities of several generative front-end tools, grade Ⅳ practitioner〕[R1]. Signal two: the judgment side hasn’t collapsed with it. Over the same period, “which version is on-target, is it for these people, has it crossed the line of taste” hasn’t become any more automatable. Hand the same brief to a model ten times and you get ten results, all “done right,” that still need a human to pick among. Generation solved whether it can be made, not which one it should be. Signal three: slop has become a visible public phenomenon. “You can tell at a glance it’s AI-made” has gone from a vague feeling to something describable by concrete fingerprints (cyan on dark, purple-blue gradients, glassmorphism, centered Inter), even detectable. That, on its own, is direct evidence that generation defaults to converging on the mean.
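To show what “describable by concrete fingerprints, even detectable” could mean in practice, here is a deliberately crude sketch that scans a stylesheet for some of the fingerprints named above. The regexes and the `FINGERPRINTS` table are hypothetical heuristics, not an established detector: the gradient pattern matches any `linear-gradient`, not purple-blue ones specifically, cyan-on-dark is not covered, and a serious check would need the rendered page rather than raw CSS.

```python
import re

# Hypothetical heuristics; each pattern is only a rough proxy for the named look.
FINGERPRINTS = {
    "glassmorphism": re.compile(r"backdrop-filter\s*:\s*blur", re.I),
    "gradient": re.compile(r"linear-gradient\(", re.I),  # any hue, not only purple-blue
    "inter-font": re.compile(r"font-family\s*:[^;]*\bInter\b", re.I),
    "centered-text": re.compile(r"text-align\s*:\s*center", re.I),
}


def fingerprint_hits(css: str) -> list[str]:
    """Return the sorted names of fingerprints whose pattern appears in the CSS text."""
    return sorted(name for name, pattern in FINGERPRINTS.items() if pattern.search(css))


sample = """
.hero { text-align: center; font-family: Inter, sans-serif;
        background: linear-gradient(135deg, #7c3aed, #2563eb); }
.card { backdrop-filter: blur(12px); }
"""
plain = "body { font-family: Georgia, serif; color: #222; }"

print(fingerprint_hits(sample))
print(fingerprint_hits(plain))
```

A hit list like this says only that a comp shares surface traits with the common default. Whether those traits are wrong for this brand and these users is still a judgment call.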

图

FIG

FIG. D2.1 / THE SCISSORS · 一塌一平，张开的剪刀差就是瓶颈

FIG. D2.1 / THE SCISSORS · One collapses, one stays flat; the opening gap is the bottleneck

看懂：过去十八个月两条曲线以不同速率移动——生成成本塌了一到两个数量级，判断成本近乎水平，中间的剪刀差正是品味成为瓶颈的全部原因

Read: over eighteen months the two curves moved at different rates: generation cost fell one-to-two orders of magnitude, judgment cost stayed near-flat; the scissors gap between them is the whole reason taste becomes the bottleneck

这张图唯一要让人记住的，是两条线的斜率不同，而不是任何具体数字。橙线塌、蓝线平，不是因为判断"更难做"，而是因为它问的是另一类问题：不是"能不能做出来"，是"该不该是这样、为谁"——这类问题不随模型变强而变便宜。瓶颈因此不会随下一代模型消失，它会随每一次成本下塌而更突出。[R1][R2]

The one thing to take from this chart is the difference in slope, not any specific number. The orange line collapses and the blue one stays flat, not because judgment “got harder” but because it asks a different kind of question: not “can it be built” but “should it be this, for whom,” a kind that does not get cheaper as the model improves. So the bottleneck will not vanish with the next model; it gets more prominent with every collapse in cost. [R1][R2]

"已经发生"意味着等待是有成本的

“Already happening” means waiting has a cost

把不对称当成已经发生的事实，而不是对未来的预言，有一个直接的行动含义：等待不是中性的，它一直在累积成本。如果这只是一个对未来的押注，"再观望一两年"是理性的；但既然成本那头已经塌了、判断那头已经成了瓶颈、slop 已经是看得见的公共现象，那么每多用一段时间的旧流程，就是在用更高的单位成本，做本来可以更便宜的产出——同时把本该投到判断上的注意力，继续锁在产出上。

Treating the asymmetry as an accomplished fact rather than a forecast has a direct action implication: waiting isn’t neutral, it keeps accumulating cost. If this were only a bet on the future, “watch and wait a year or two” would be the rational move. But the cost side has already collapsed, the judgment side has already become the bottleneck, slop is already visible in public, so every extra stretch running the old process means paying a higher unit cost for production that could be cheaper, while keeping attention locked on production when it should have moved to judgment.

更隐蔽的成本是判断能力会萎缩：一个团队要是迟迟不把重心挪到判断上，它的设计师就一直没机会练"写有判别力的规格、说清楚这版为什么对路"这些新瓶颈需要的能力；等到不得不迁移的那天，会发现这些能力不是开个会就能补上的。所以"现在"不是营销话术里的紧迫感，是这条不对称的真实推论：变化已经发生了，重画流程最好的时机是现在，成本最低的时机也是现在。

The more insidious cost is that judgment capability atrophies: if a team is slow to shift its center of gravity to judgment, its designers never get to practice the abilities the new bottleneck demands: writing discriminating specs, stating why a version is on-target. By the time migration becomes unavoidable, they find these aren’t skills you patch in with a meeting. So “now” isn’t marketing urgency. It’s a real corollary of the asymmetry: the change has already happened, so the best time to redraw the process is now, and the cheapest time is also now.

这里得给"现在就动"加一条诚实的限定，免得它被读成盲目的紧迫感：动，指的是开始把重心往判断侧挪、开始搭可机检的护栏、开始把闭环跑一遍——不是"立刻把手上所有工具换成最新的 AI 设计工具"。工具会快速迭代，会有赢家输家，押注某个具体工具是有风险的；但押注"产物能写成文本、判断重于产出、护栏立在生成之前"这几条结构性方向，几乎没有风险，因为它们不依赖任何具体工具的存亡——不管最后谁赢，这几条都成立。所以"现在就动"的正确读法是：结构层面立刻迁移（这是安全又划算的），工具层面保持敏捷、别过早押死在一个上（这是审慎的）。把这两件事分开，你就既不会因为观望一直付迁移成本，也不会因为押错工具被套牢。这才是把不对称当"已发生的结构事实"、而不是"哪个产品的营销叙事"应有的清醒。

“Move now” needs an honest qualifier, lest it read as blind urgency. Moving means beginning to shift the center of gravity toward judgment, beginning to build machine-checkable guardrails, and beginning to run the loop once; it does not mean “immediately replacing every existing tool with the newest AI design tool.” Tools will iterate fast, with winners and losers, so betting on any one tool is risky. But betting on the structural directions (the artifact can be written as text, judgment outweighs production, guardrails sit before generation) carries almost no risk, because none of them depends on any one tool’s survival; whichever tool wins in the end, these hold. So the correct reading of “move now” is: migrate at the structural level immediately (safe and worthwhile), and stay agile at the tool level without going all-in on one too early (prudent). Keep these apart and you neither keep paying the migration cost by waiting nor get locked in by betting on the wrong tool. That’s the clear-headedness that comes from treating the asymmetry as an accomplished structural fact, not some product’s marketing narrative.

最后得给这套"为什么是现在"加一层克制，免得它滑成技术决定论：成本那头塌了、判断成了瓶颈，这些是真的；但"所以一切都该立刻 AI 化"不跟着成立。有些设计场景判断密度极高、可机检的部分极少——比如承载强烈情感的品牌重塑，或高度依赖特定文化语境的视觉系统——这类场景里，生成本来能帮上的忙就有限，硬上 AI 流程反而可能添乱。

承认不对称已经发生，不等于承认它在每个角落都一样强烈。诚实的姿态是：把这套方法当成一张瓶颈走向图——它告诉你瓶颈在往哪搬、杠杆点在哪，但具体到某个场景该投多少、哪些环节真受益，还得你自己判断。而"该不该上、上到什么程度"这个元判断，本身就是这套方法最看重的那种判断。一套好方法不该要你信它，该给你一双看清瓶颈走向的眼睛，连"它在这里适不适用"也留给你判断——这正是它和营销叙事的区别：营销要你整包接受，方法只递给你看清结构的工具。

Finally, this “why now” needs a layer of restraint, lest it slide into technological determinism. The cost side has collapsed and judgment has become the bottleneck; that much is real. But “so everything should be AI-ified immediately” doesn’t follow. Some design scenarios carry extremely high judgment density and very little machine-checkable surface (a rebrand that carries intense emotion, a visual system deeply dependent on a specific cultural context). In these, what generation can help with is inherently limited, and forcing an AI process onto them may add noise instead.

Granting the asymmetry has happened doesn’t mean it’s equally strong everywhere. The honest posture: treat this method as a bottleneck map. It tells you where the bottleneck is moving and where the leverage points are, but how much to invest in a given scenario and which steps actually benefit still need your own judgment. And that meta-judgment, whether and how far to adopt, is itself the kind of judgment this method values most. A good method shouldn’t ask you to believe in it. It should give you eyes to see where the bottleneck moves, and leave even “does this apply here” to your own judgment. That’s exactly what separates it from a marketing narrative: marketing wants you to take the whole package; a method just hands you the tools to see the structure clearly.

证伪条件

Falsification condition

这一卷的前提会被推翻，如果：(a) 模型不需要人类给规格，就开始稳定地产出"不是均值、确实为某群人对路"的设计——那说明品味已经被自动化了，②不再退守；或者 (b) 生成的成本并没有实质下降，出稿仍然是团队的瓶颈——那说明①充裕还没到来。只要这两条都不成立（到本版为止都不成立），这套不对称就是真实的结构事实，不是修辞。

This volume’s premise is refuted if: (a) models begin, without human specs, to reliably produce designs that are “not the mean and genuinely on-target for some group”: that would mean taste is automated and ② no longer retreats; or (b) generation’s cost hasn’t materially dropped and producing comps is still the team’s bottleneck: that would mean ① abundance hasn’t arrived. As long as neither holds (and neither does as of this edition), the asymmetry is a real structural fact, not rhetoric.

DSN

DESIGN-AS-CODE · 设计即代码

DESIGN-AS-CODE

重画 · 原理

Redraw · Principle

设计即代码——为什么新工具都在去画布化

Design-as-code: why the new tools de-canvas

为什么 pencil、Remotion、html-video 这些工具价值被放大？答案在产物形态，不在工具本身。

Why is the value of tools like pencil, Remotion, and html-video being amplified? The answer lies in the artifact’s form, not in the tools themselves.

一句话

In one line

产物从二进制画布变成代码，设计就拿到工程同款杠杆：可读、可 diff、可生成、可机检；设计师要的是会读、会判断、会指导生成。

When the artifact turns from a binary canvas into code, design gains the same leverage as engineering: legible, diffable, generatable, machine-checkable; what a designer needs is to read, judge, and steer generation.

这正是工程部分那五条贯穿原理，作用在设计这个面上。真正起作用的是"产物变代码"：它让设计满足了同样对 agent 友好的属性——凡满足的都被放大，凡是锁在私有二进制里的都被边缘化：

These are the engineering part’s five through-lines, applied to the design surface. What actually does the work is “artifact becomes code”: it makes design satisfy the same agent-friendly properties. Whatever meets them gets amplified; whatever is locked in proprietary binaries gets marginalized:

对 agent 可读：agent 能直接读写设计源，不必解析私有画布。
Legible to agents: agents read and write the design source directly, no proprietary canvas to parse.
可 diff / 可版本：一次改动是一次提交，可评审、可回滚，设计进入工程的协作纪律。
Diffable / versionable: a change is a commit, reviewable, revertible; design enters engineering’s collaboration discipline.
可生成 / 可组合：agent 批量铺变体、组合组件，人只做判断与导向。
Generatable / composable: agents spin up variants and compose components; humans only judge and steer.
可验证：tokens 与约束能被机器检查"是否离牌"：品味的护栏可机检。
Verifiable: tokens and constraints can be machine-checked for “off-brand”: the guardrails of taste become checkable.
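
A minimal sketch of what "machine-checked for off-brand" could look like, assuming the design source is CSS and the tokens are a plain name-to-hex mapping. The token names, the palette values, and the `off_token_colors` helper are all hypothetical, invented here for illustration; a real design system would use its own token format and linter.

```python
import re

# Hypothetical brand palette: token name -> hex value (assumed for illustration).
TOKENS = {
    "color.brand.primary": "#1a73e8",
    "color.text.default": "#202124",
    "color.surface": "#ffffff",
}

# Matches 6-digit hex colors only; shorthand and 8-digit forms are out of scope here.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")


def off_token_colors(css: str, tokens: dict[str, str]) -> list[tuple[int, str]]:
    """Return (line_number, hex) for every 6-digit hex color not in the token set."""
    allowed = {value.lower() for value in tokens.values()}
    findings = []
    for line_number, line in enumerate(css.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            color = match.group(0).lower()
            if color not in allowed:
                findings.append((line_number, color))
    return findings


SAMPLE_CSS = """\
.button { background: #1a73e8; color: #ffffff; }
.badge  { background: #ff00aa; }
"""

# Line 2 uses a color that no token defines, so it is flagged.
print(off_token_colors(SAMPLE_CSS, TOKENS))  # [(2, '#ff00aa')]
```

The check only answers "is this value in the token set"; whether the palette itself is right for the audience stays a human call.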

"去画布化"是把产物挪到 agent 够得着的形态，不是审美口号

“De-canvasing” moves the artifact within reach of agents; it is not an aesthetic slogan

画布工具（Figma、Sketch、PSD）把设计状态存成私有二进制：图层树、约束、矢量路径全锁在格式里，只有那个软件读得懂。人能用眼睛看，但 agent 进不去——它读不出"这个按钮用了哪个 token"，也没法把一次修改表达成能评审的文本差异。新一代工具（pencil/paper 用代码描述图形、Remotion 用 React 描述视频、html-video 直接用网页技术出动效）做的是同一件事：把同一份设计重新表达为纯文本。不是"长得更现代"，是产物形态本身变了。

Canvas tools (Figma, Sketch, PSD) store design state as a proprietary binary: the layer tree, constraints, and vector paths are all locked in a format only that software understands. A human can look with their eyes, but an agent can’t get in: it can’t reliably read out “which token this button uses,” and it can’t express a change as a reviewable text diff. The new generation of tools (pencil/paper describing graphics as code, Remotion describing video as React, html-video doing motion straight in web tech) all do the same thing: re-express the same design as plain text. Not “look more modern”: the artifact’s form itself changes.

一旦如此，设计就掉进了软件工程三十年攒下来的全部基础设施里：git、diff、code review、CI、自动化生成。这是产物形态的相变，不是工具竞赛〔源：本系列工程卷"五条贯穿原理"与 design-as-code 实践，证据级 Ⅳ〕[R3]。

Once that happens, design falls into the entire infrastructure software engineering has spent thirty years accumulating: git, diff, code review, CI, automated generation. This is a phase change in the artifact’s form, not a tool race 〔Source: this series’ engineering volume “five through-lines” and design-as-code practice, grade Ⅳ〕[R3].

把"产物变代码"带来的协作纪律说具体一点，会更有说服力。设计是 Figma 文件的时候，团队协作靠的是一套社交协议：在文件里留评论，开会口头对齐，谁是这份文件的 owner——脆弱，追不了溯，改动一多就乱。设计是代码的时候，协作立刻继承工程那套验证了几十年的纪律：每次改动是一个带作者、时间、说明的 commit；并入主干得先过 review；冲突有明确规则；出问题能精确回滚到任何历史版本；谁改了哪一行一目了然。这套纪律是"产物是文本"这个形态自带的：文本能 diff，能 diff 就能 review，能 review 就能协作而不互相覆盖。

这对 AI-Native 尤其关键：一旦 agent 开始大量并行地改设计，没这套纪律，多个 agent 的产出立刻互相打架、没人审得过来。代码形态等于给"人和多个 agent 一起改同一份设计"配了一套现成、能扩展的协作底座，这是画布工具在结构上给不了的。

The case gets more persuasive once the collaboration discipline that “artifact becomes code” brings is made concrete. When design is a Figma file, team collaboration rests on a social protocol: comments left in the file, verbal alignment in meetings, an understanding of who owns the file. It is fragile, untraceable, and chaotic once changes pile up. When design is code, collaboration immediately inherits a discipline engineering has validated over decades: each change is a commit with author, time, and message; merging to the trunk requires passing review; conflicts have explicit rules; problems can be reverted precisely to any historical version; who changed which line is plain to see. This discipline comes free with the “artifact is text” form: text can be diffed, a diff can be reviewed, and review lets people collaborate without overwriting each other.

This matters especially for AI-Native: once agents start changing design en masse and in parallel, without this discipline multiple agents’ outputs instantly conflict and nobody can review them all. The code form hands “humans plus multiple agents changing one design together” a ready-made, scalable collaboration substrate, something canvas tools structurally can’t give.
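
To make "a change is a reviewable text diff" concrete, here is a small sketch using Python's standard `difflib`. The token file contents and the `design_diff` / `changed_lines` helpers are assumptions made up for this example; in practice the diff would come from git rather than from a script.

```python
import difflib

# Two hypothetical revisions of a token file (names and values are illustrative).
BEFORE = """\
button:
  radius: 4px
  background: color.brand.primary
"""
AFTER = """\
button:
  radius: 8px
  background: color.brand.primary
"""


def design_diff(before: str, after: str) -> list[str]:
    """Return a unified diff between two text revisions of a design source."""
    return list(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile="button.tokens (before)",
            tofile="button.tokens (after)",
        )
    )


def changed_lines(diff: list[str]) -> list[str]:
    """Keep only added/removed content lines, dropping the file headers."""
    return [
        line.rstrip("\n")
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


print("".join(design_diff(BEFORE, AFTER)))
print(changed_lines(design_diff(BEFORE, AFTER)))
```

A reviewer, human or agent, sees exactly one changed property instead of having to compare two canvases by eye.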

这里得避免一个把"产物变代码"读偏的方向：它不是要设计师都去学写代码、变成前端工程师。"产物是代码"说的是产物的表达形态是文本，不是要求每个设计师去手敲那段文本——恰恰相反，手敲文本这活儿正该交给生成。设计师需要的不是写代码的能力，是读得懂、判断得了那段代码描述的设计好不好的能力，加上把意图说清楚到足以指导生成的能力。代码形态对设计师的要求不在"会写"，在"会判断、会指导"——这又落回了这一卷的主线：人退到判断与方向，执行（包括把设计写成代码）交给生成。

One misreading of “artifact becomes code” needs to be headed off here: it doesn’t mean designers must all learn to write code and become front-end engineers. “The artifact is code” says that the artifact’s form of expression is text, not that every designer must hand-type that text. Quite the opposite: hand-typing that text is exactly the part that should go to generation. What a designer needs is not the ability to write code but the ability to read the code and judge whether the design it describes is good, plus the ability to state intent clearly enough to steer generation. In other words, the code form asks of designers not “can write” but “can judge, can steer,” which lands right back on this volume’s through-line: people retreat to judgment and direction, and execution (including writing the design as code) goes to generation.

所以"设计即代码"和"设计师要回到品味与意义"不矛盾，是同一件事的两面：正因为产物变成了 agent 能读写的代码，生成才接得住执行，设计师才能腾出手只做判断。想清楚这一点，就不会因为"我不会写代码"误以为自己被挡在 AI-Native 设计门外——你需要的从来是判断，不是写。

So “design as code” and “designers return to taste and meaning” are two faces of the same thing: precisely because the artifact becomes code an agent can read and write, generation can take over execution, freeing the designer to do only judgment. Get this straight and you won’t mistake “I can’t write code” for being excluded from AI-Native design: what you need has never been to write. It’s to judge.

FIG. D2.0 / ARTIFACT BECOMES CODE · 产物变代码 → 四属性
看懂：产物从二进制画布变文本，一次性获得哪四个杠杆
Read: the artifact goes from binary canvas to text; which four levers does it gain at once?

四个属性是"产物=文本"这一形态的副产物，不是工具的功能。这正是工程卷那条放大律——满足 agent 友好属性的被放大，锁在私有二进制里的被边缘化——落到设计面上的样子。最后一条（可验证）是品味护栏可机检的入口，下一张图与 DSN 06/08 接上。

The four properties are not tool features but byproducts of the “artifact = text” form. This is exactly the engineering volume’s amplification law (what meets agent-friendly properties gets amplified, what is locked in proprietary binary gets sidelined) landing on the design surface. The last property (verifiable) is the entry point where taste’s guardrails become machine-checkable; it connects to DSN 06/08.

"图形即代码"不是新发明，是一条早就存在、如今被 AI 引爆的暗线

“Graphics as code” is no new invention but a long-standing undercurrent now detonated by AI

有段历史值得提一句：把视觉产物写成文本，从来不是新鲜事。SVG 用 XML 描述矢量图，CSS 用声明式规则描述样式，LaTeX 用标记描述排版，PostScript 用程序描述页面——几十年里，"图形即代码"一直是条暗线，只是因为人手写它太慢，大多数设计还是回到所见即所得的画布。AI 改变的不是这条暗线本身，是它的经济性：生成能以近零成本写出、读懂、改动这些文本表达之后，"人手写太慢"这一个拦路虎没了。于是这条一直在的暗线被引爆：代码形态从"理论上更好、实践上太累"变成"理论实践都占了"。这解释了为什么这一波是形态的相变，不是又一次工具迭代——它让一直更优的那个形态终于变得可行。从业者该读出的信号是：押注产物能不能写成文本，而不是押注某个具体工具的功能。

A note of history is worth making: expressing visual artifacts as text is not new. SVG describes vectors in XML, CSS describes styling in declarative rules, LaTeX describes typesetting in markup, PostScript describes pages as a program. For decades “graphics as code” has existed as an undercurrent, held back only because writing it by hand was too slow, so most design kept returning to the WYSIWYG canvas. What AI changes is not the undercurrent itself but its economics: once generation can write, read, and edit these text expressions at near-zero cost, the one obstacle, “too slow to hand-write,” disappears. So the long-present undercurrent gets detonated: the code form goes from “better in theory, exhausting in practice” to better on both counts. That’s why this wave is a phase change in form, not another tool iteration: the form that was always superior has finally become feasible. The signal for practitioners to read: bet on whether the artifact can be expressed as text, not on any one tool’s feature set.

同构 / 深潜

Isomorphism / dive

设计系统 ↔ 架构的"结构对 agent 可读"是同一招——都是让海量生成连贯的护栏。见架构篇 ↗。

The design system ↔ the architecture chapter’s “structure legible to agents” is the same move: both are guardrails that keep mass generation coherent. See the Architecture chapter ↗.

DSN

REDRAW · 从造物到判物

MAKING → JUDGING

重画 · 流程

Redraw · Process

从打磨一稿到判断多稿

From polishing one to judging many

动作从"打磨一稿"变成"判断多稿"，品味成了瓶颈，一如工程里的验证。

The move goes from “polish one comp” to “judge many”; taste becomes the bottleneck, like verification in engineering.

一句话

In one line

执行交给生成之后，人从产出链的末端搬到了前端，剩下三件机器做不了的事：挑、评、导。三件里最被低估的是"导"——把评判变成下一轮的方向。

Once execution goes to generation, the human moves from the end of the chain to the front, left with three things a machine can’t do: pick, critique, steer. The most underrated of the three is steer: turning critique into the next round’s direction.

重画的做法，是把动作前移到判断：让生成把候选铺开，设计师做三件机器做不了的事——挑（哪一版对路）、评（好在哪、差在哪）、导（往哪个方向再生成）。这和工程里的 trust-but-verify 是同一招：生成可信但要验，只是这里验的不是对不对，是好不好、像不像自己。

The redraw moves the action upstream to judgment: let generation spread the candidates, and have the designer do the three things a machine can’t: pick (which version is on-target), critique (why it’s good, where it falls short), steer (which direction to regenerate). This is the same move as engineering’s trust-but-verify: generation is trusted but checked. Only here, what’s checked is whether something is good, whether it still looks like you, not correctness.

人的节点没消失，它从产出链的末端搬到了前端

The human node did not vanish; it moved from the end of the chain to the front

旧流程里，设计师的手贯穿全程：从空白画布开始，一笔一笔把脑子里那一稿做出来，价值就在执行的精度上。新流程里，执行交给了生成，人的手退出"做"这一段，收到两端——前端的意图与规格（要往哪生成、什么算好），中段的判断与导向（挑哪版、为什么、再往哪走）。这不是把设计师降级成按按钮的人，恰恰相反：会做的人很多，会判断的人稀缺。一个动作只要能被生成承包，它就从人的核心价值里退场；留给人的，永远是当下还没法自动化的那个判断节点。设计师的稀缺性因此上移了一层——从手上功夫，移到眼力和方向感〔源：Anthropic 2025 agentic-coding 实践＋ Karpathy「软件正在再次改变」验证瓶颈论述，证据级 Ⅳ 一手从业者〕[R1][R2]。

In the old process the designer’s hand ran the whole length: from a blank canvas, stroke by stroke, realizing the comp in their head, with value sitting in execution precision. In the new process execution goes to generation; the human hand exits the “making” segment and lands at two points: the upstream intent and spec (which way to generate, what counts as good) and the mid-stream judgment and steering (which version, why, where next). This does not demote the designer to “the person who presses buttons.” Quite the opposite: plenty of people can make, few can judge. The moment an action can be subcontracted to generation, it exits the human’s core value; what’s left for people is always the judgment node that can’t yet be automated. The designer’s scarcity therefore moves up a layer: from hand-skill to eye and a sense of direction 〔Source: Anthropic 2025 agentic-coding practice + Karpathy’s “Software Is Changing (Again)” talk on the verification bottleneck, grade Ⅳ, first-hand practitioner〕[R1][R2].

FIG. D3.0 / MAKING → JUDGING · 人的节点上移
看懂：设计师的手从"做"退出，落到"规格"与"判断"两处
Read: the designer’s hand exits “making” and lands on “spec” and “judgment”

两条泳道同样从左到右，但人的红框位置变了：旧泳道里人占着中段的"执行"；新泳道里执行交给机器（蓝），人退到①规格与③判断两个红框。会做不再值钱，会判断才值钱——这就是"瓶颈搬家"在个人动作层面的样子。

Both lanes run left to right, but the human’s red box has moved: in the old lane people occupied the mid-stream “execution”; in the new lane execution goes to the machine (blue) and people retreat to the two red boxes, ① spec and ③ judgment. Being able to make stops being the prize; being able to judge becomes it. This is “the bottleneck moves” at the level of an individual’s actions.

为什么"判断多稿"比"打磨一稿"更难，而不是更省事

Why “judging many” is harder than “polishing one,” not easier

有个常见的误会，是把"判断多稿"想成轻松活——反正不用自己动手，扫一眼挑个顺眼的就完事。恰恰相反：判断是比执行更高阶的认知活动，做好它比打磨一稿更难。打磨一稿的时候，你脑子里有个明确的目标形象，剩下都是手上功夫；判断多稿的时候，你面对的是一组都"看起来不错"的候选，得在它们之间做出有理由的取舍——这要求你先把判据想清楚（不然就是凭感觉挑），再逐个拿判据去对，还得从落选的里面读出"该往哪个方向再生成"。这种能力需要刻意练，不是会做设计就自动会。这也是为什么"生成让设计变简单了"是个危险的错觉：它让产出变简单了，却让真正决定成败的那部分——判断——变得更密集、更吃功力。把判断当轻松活的团队，最后只是更快地从一堆 slop 里挑出一个略好的 slop。

A common misreading frames “judging many” as the easy job: you don’t have to make it yourself, just glance and pick the nice-looking one. Quite the opposite: judgment is a higher-order cognitive activity than execution, and doing it well is harder than polishing one comp. When polishing one, you hold a clear target image in your head and the rest is hand-skill; when judging many, you face a set of candidates that all “look fine” and must make a reasoned trade-off among them, which requires you first to think the criteria through (or you’re just picking by feel), then evaluate each against them, and also read from the rejects which direction to regenerate. This capability needs deliberate practice; it doesn’t come free with knowing how to design. It’s also why “generation made design simpler” is a dangerous illusion: it made production simpler while making judgment, the part that actually decides success, denser and more demanding. A team that treats judgment as the easy job ends up just picking, faster, a slightly better slop out of a pile of slop.

这里还藏着一个对新人不友好、但必须直面的事实：判断力很难靠"看"速成，它主要靠"做过、并且复盘过为什么"攒出来。过去设计师的成长路径很清楚——做得多，手感和判断一起长。现在执行被生成接管了，新人失去了"在做中练判断"这条天然路径，却又被直接推到了"判断多稿"这个更高阶的任务面前。这是 AI-Native 设计一个真实的、目前还没有好答案的难题：如果新人不再需要做大量执行，他们的判断力从哪来？目前能看到的部分答案，是把判断本身变成可以训练的东西——让新人反复练"对着规格评判候选、说清判据命中还是落空"，让资深设计师把自己的判断显式讲出来供人学（这又接回 DSN 03·5 的"外化品味"）。但这件事值得诚实承认它的难，而不是假装"反正有 AI，新人也能直接上手"。把判断当轻松活的代价，会先落在新人身上，再落到整个团队的判断力储备上。

There is also a fact here that is unfriendly to juniors but has to be faced squarely: judgment is hard to fast-track by watching; it accumulates mainly through having done the work, and having reviewed why. A designer’s growth path used to be clear: make a great deal, and feel and judgment grow together in the making. Now that execution is taken over by generation, juniors lose the natural path of training judgment by making, yet get pushed straight to the higher-order task of judging many. This is a real, not-yet-well-answered difficulty of AI-Native design: if juniors no longer need to do large amounts of execution, where does their judgment come from? The partial answer visible so far is to make judgment itself trainable: have juniors repeatedly practice judging candidates against a spec and stating which criteria hit or missed, and have seniors explicitly articulate their own judgment so others can learn from it (which loops back to DSN 03·5’s “externalizing taste”). But this deserves an honest admission of how hard it is, rather than pretending “AI’s here, so juniors can just jump in.” The cost of treating judgment as the easy job lands on juniors first, then on the whole team’s reserve of judgment.

最后要点名"挑、评、导"三者里最被低估的是导——把评判变成下一轮的具体方向。挑（选出对路那版）和评（说清为什么）已经有不少人意识到重要，但导常被忽略：它要求你不只判断眼前，还要从落选的候选里读出信息——这一批整体偏冷，说明气质该往暖调调；这一版某个局部对了，说明值得往那个方向深挖。导是一种把判断变成生成指令的能力，直接决定下一轮铺开是更聚焦还是又一次发散。一个只会挑、评，不会导的人，环会卡在"挑出一个还行的，但不知道怎么让下一轮更好"——这正是很多人觉得"用 AI 做设计到某个点就上不去了"的根因。导的能力把判断接回生成，让闭环真正转起来。这也是为什么 DSN 07 那条环里，④导向是接住“人的判断”和“机器下一轮生成”的那根轴。

Finally, worth naming: of pick, critique, steer, the most underrated is steer: turning critique into the next round’s concrete direction. Picking (choosing the on-target one) and critiquing (saying why) are already recognized as important by plenty of people, but steer gets overlooked: it requires you not just to judge what’s in front of you, but to read information out of the rejected candidates: this batch running cold overall says the character should warm up; one part of this version landing right says it’s worth deepening in that direction. Steer is the ability to turn judgment into a generation instruction, and it directly decides whether the next spread is more focused or another divergence. Someone who can only pick and critique but not steer gets the loop stuck at “picked a decent one but don’t know how to make the next round better,” which is the root cause behind a lot of people feeling AI design plateaus at some point. Steering is what reconnects judgment to generation and actually turns the loop. That’s also why, in the DSN 07 loop, ④ steer is the axle connecting a human’s judgment to the machine’s next round of generation.

检验信号

Test signal

品味命中率上升——候选里一次选中"对路那版"的比例；以及设计师花在判断而非亲手产出上的时间占比上升。产出量本身不是指标。证伪：若设计师花在判断上的时间没升、命中率没升，只是出稿更快了，那就还停在旧流程，只是手更快——没真正发生"造物→判物"的迁移。

Taste hit-rate rises, meaning the share of rounds in which the on-target version is picked on the first try; and the share of the designer’s time spent judging rather than producing by hand rises. Output volume itself is not the metric. Falsified if: neither the share of time spent on judgment nor the hit-rate rises, and comps merely arrive faster. Then you are still in the old process with a faster hand; the “making → judging” shift has not actually happened.
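
The two signals are simple ratios, so they can be tracked with very little machinery. The sketch below assumes a hypothetical per-round log (the `Round` record and its fields are invented here) in which a human has already marked whether the first pick was on-target; the marking itself is the human judgment, not something the code decides.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One generate-and-judge round (hypothetical log format)."""
    first_pick_on_target: bool  # marked by a human reviewer
    minutes_judging: float
    minutes_producing: float


def hit_rate(rounds: list[Round]) -> float:
    """Share of rounds where the first pick was the on-target version."""
    if not rounds:
        raise ValueError("no rounds logged")
    return sum(r.first_pick_on_target for r in rounds) / len(rounds)


def judging_share(rounds: list[Round]) -> float:
    """Share of total time spent judging rather than producing by hand."""
    judging = sum(r.minutes_judging for r in rounds)
    total = judging + sum(r.minutes_producing for r in rounds)
    if total == 0:
        raise ValueError("no time logged")
    return judging / total


def shift_happened(before: list[Round], after: list[Round]) -> bool:
    """True only if BOTH signals rose; faster output alone does not count."""
    return (
        hit_rate(after) > hit_rate(before)
        and judging_share(after) > judging_share(before)
    )


earlier = [Round(False, 10, 50), Round(True, 15, 45)]
later = [Round(True, 40, 20), Round(True, 30, 30), Round(False, 35, 25)]
print(hit_rate(earlier), hit_rate(later))
print(shift_happened(earlier, later))
```

Note that the number of comps produced never enters the calculation, which mirrors the point that output volume is not the metric.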

同一道分叉，落在设计面上为什么不一样

The same fork lands differently on the design surface

这道分叉值得就地推一遍，而不是直接从工程卷搬结论。内核第②步说的是"判断会沿一条可核验性的梯度分叉"：能被机器核验的判断交给机器，不能的留给人。关键在"可核验"到底指什么——它指存在一道判定程序：给定产物，机器能独立、可复现地判出对错。工程面能把大半判断推给机器，正是因为正确性大多有这道程序：测试要么过要么不过，类型要么对要么错，benchmark 给出一个数。这是一道判定，不是一种感受。

This fork is worth deriving in place rather than importing wholesale from the Engineering volume. The kernel’s step ② says judgment forks along a verifiability gradient: what a machine can verify goes to the machine, what it can’t stays with people. Everything hinges on what “verifiable” means; it means a decision procedure exists: given an artifact, a machine can independently and reproducibly rule it right or wrong. Engineering can push most judgment to machines precisely because correctness usually has such a procedure: a test passes or fails, a type checks or doesn’t, a benchmark spits out a number. That’s a verdict, not a feeling.

设计面上，被判断的是体验和美——而“好”在这里今天还没有判定程序。“这一版是否对路、是不是为这群人、有没有越过品味那条线”，没法归约成一个会通过或不通过的测试。可以测可用性、测转化、测对比度合规，但这些只是“好”的代理，不是“好”本身；把代理当判定，正是 slop 和同质化的来路（DSN 09·7）。所以同一道分叉，在设计面上落点不一样：工程面上判断的大头滑向机器，设计面上承重的那部分判断——品味——因为没有可机检的判定程序而滑不过去，停在人这一侧。要紧的是别把这一步说满：品味“结构性”留在人这侧，是本卷押得最重的一注，不是已证出的定理。今天它停在这里，只因为那道判定程序还没人造出来；会让我们改判的不是哪个测试被刷过，而是另一种情形——当模型在真实使用里稳定提出更好的取舍、还能说清代价，人这一端就不再是判断的来源，那道分叉于是被重画，不是被推翻。在此之前，品味是这道分叉在设计面上的构成性判断：拿走它，今天这道叉就没有了人那一端。〔证据级 Ⅴ 论证；可机检代理与“好”本身的区别，见 DSN 06 规格化的上限〕

On the design surface, what gets judged is experience and beauty, and here “good” has no decision procedure yet. “Whether this version is on-target, whether it’s for these people, whether it crossed the taste line” can’t be reduced to a test that passes or fails. You can measure usability, measure conversion, measure contrast compliance, but these are only proxies for “good,” not “good” itself; mistaking the proxy for the verdict is exactly where slop and homogenization come from (DSN 09·7). So the same fork lands somewhere different on the design surface: on the engineering surface the bulk of judgment slides to the machine; on the design surface the load-bearing judgment, taste, can’t slide across and stays on the human side, because no machine-checkable decision procedure exists. The thing not to overstate: that taste sits “structurally” with people is this volume’s heaviest bet, not a proven theorem. It sits here today only because that decision procedure has not been built. What would change our minds is not some benchmark getting gamed but a different situation entirely: when a model reliably proposes better trade-offs in real use and can account for their cost, the human end stops being the source of judgment, and the fork gets redrawn rather than refuted. Until then, taste is the constitutive judgment of this fork on the design surface: take it away and today’s fork has no human end at all. 〔Grade Ⅴ, argument; for the distinction between machine-checkable proxies and “good” itself, see the ceiling of formalization in DSN 06〕

FIG. D3.5 / THE VERIFIABILITY-GRADIENT FORK · 设计面
看懂：把设计判断按"有没有判定程序"铺成一条梯度，看内核第②步在哪一点叉开
Read: lay design judgments on a gradient by “is there a decision procedure,” and see where the kernel’s step ② forks
笔画图例：实线＝观察到的事实 · 虚线＝当前押注（可改判）· 点线＝竞争解释
Stroke legend: solid = observed fact · dashed = current bet (revisable) · dotted = competing explanation

这条梯度是按"有没有一道可复现的对/错判定程序"排的，不是"难易"排序。左端（对比度、间距刻度、token、lint）存在这道程序，判断滑向机器（实线＝观察）；越往右程序越写不尽，到"它活着吗、是不是对的想法"这端干脆没有程序——只剩一种感受。本卷把"品味因此停在人这侧"画成当前押注（虚线箭头与虚框），不是已证的定局；底部那条点线是竞争分支：若模型在真实使用里稳定给出更好取舍、还能说清代价，人这端就会溶解，这道叉被重画而非被推翻。所以要问的不是"机器是不是永远够不到"，而是"那道判定程序会不会、何时被造出来"。

This gradient is ordered by “whether a reproducible right/wrong decision procedure exists,” not by difficulty. At the left end (contrast, spacing scale, tokens, lint) the procedure exists and the judgment slides to the machine (solid = observed); the further right, the less it can be written down, until at “is it alive, is it the right idea” there is no procedure at all — only a feeling. This volume draws “taste therefore stays on the human side” as the current bet (dashed arrow and box), not a settled fact; the dotted branch at the bottom is the competing one: if a model reliably offers better trade-offs in real use and can account for their cost, the human end dissolves and the fork is redrawn, not refuted. So the real question is not “can machines never reach it” but “whether, and when, that decision procedure gets built.”

可机检 · 交给生成与护栏
Machine-checkable · hand to generation & guardrails

对比度 / 触达尺寸 / 无障碍合规
Contrast / touch-target size / accessibility compliance
是否符合 token 与组件规格
Conformance to token & component spec
链路是否走得通（可用性硬错误）
Whether the flow works at all (hard usability errors)

无判定程序 · 结构性留给人
No decision procedure · structurally human

这一版是否对路、是否"还像我们"
Whether this version is on-target, still “looks like us”
节奏、情绪、何时留白、何时收住
Pacing, emotion, when to leave space, when to hold back
越过"够好"那条线了没有——品味的判据
Whether it crossed the “good enough” line: the taste verdict
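
For the machine-checkable column, contrast compliance is a good example of a judgment that does have a decision procedure. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio formulas and the 4.5:1 AA threshold for normal-size text; the function names are this example's own, and it handles only 6-digit hex colors.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a 6-digit hex color such as '#1a73e8'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, threshold: float = 4.5) -> bool:
    """True if the pair meets the AA threshold for normal-size text."""
    return contrast_ratio(foreground, background) >= threshold


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # 21.0
print(passes_aa("#767676", "#ffffff"))  # True
print(passes_aa("#777777", "#ffffff"))  # False
```

Nothing comparable can be written for the human column: there is no `is_on_target(comp)` whose verdict a machine can reproduce, which is the asymmetry the two lists describe.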

DSN

03·5

TASTE · 品味是稀缺判断

TASTE AS SCARCE JUDGMENT

机理 · 拆解

Mechanism · Anatomy

"品味"不是不可讨论的直觉，它是一组可拆解的判断

“Taste” is not an untouchable intuition; it is a set of decomposable judgments

把品味当天赋，就会放弃在 AI-Native 流程里最该做的事。这一节拆开它。

Treat taste as innate and you give up the very thing you should most do in an AI-Native process. This section takes it apart.

一句话

In one line

品味能拆成三层：为谁着想的共情、什么算好的判别、往哪走的方向感。共情是根——稀缺，但不神秘。

Taste breaks down into three layers: empathy for whom, discernment of what’s good, a sense of direction for where next. Empathy is the root: scarce, but not mysterious.

画一个渐变、排一版网格、调一组配色——这些"会做"的技能，模型已经又快又好地替你做了。可十版都"做得对"的稿摆上桌，要挑出哪一版对这群人是对的，模型就哑了。这一哑，分出了技能和品味：品味不是稀缺的技能，是稀缺的判断。它问的不是"会不会做"，是"该不该是这样"——靠的是对用户处境的理解、对品牌语境的记忆、对"过犹不及"那条线的敏感，全是没法只从产物文本里读出来的东西。这也是 DSN 00 那道分诊问句把它判到④的原因：品味坐在可验证性梯度的最远端，离可机检最远，也就离可自动化最远。

Draw a gradient, lay out a grid, tune a palette: these “can-do” skills a model already performs for you, fast and well. But put ten comps that are all “done right” on the table and ask which one is right for these people, and the model goes silent. That silence is where skill and taste part ways: taste is a scarce judgment, not a scarce skill. It asks not “can it be done” but “should it be this way,” and that depends on understanding the user’s situation, remembering the brand context, and sensing where “more becomes worse,” all things that can’t be read from the artifact text alone. This is exactly why DSN 00’s triage question sorts it to ④: taste sits at the far end of the verifiability gradient, furthest from machine-checkable, and so furthest from automatable.

拆开看：共情是品味的根，没有它品味只是个人偏好

Unpacked: empathy is the root of taste; without it, taste is mere preference

三个组成里，共情是根，另外两个都长在它上面。判别力（什么算好）和方向感（往哪走），一旦脱离"为谁"，就退化成纯粹的个人偏好——"我觉得好看"。这正是 slop 和品味最容易被混的地方：一个设计师凭个人审美做的判断，和一个设计师为某群用户做的判断，外表都像"主观选择"，但前者是偏好，后者是品味。区别就在有没有共情这个根：品味永远能回答"对谁、什么处境、为什么"，偏好只能回答"我喜欢"。

Of the three components, empathy is the root, and the other two grow on it. Discernment (what counts as good) and direction (where to go) degenerate into pure personal preference, “I think it looks nice,” the moment they’re cut off from “for whom.” This is exactly where slop and taste get confused most easily: a judgment made from personal aesthetics and a judgment made for a specific group of users both look, on the surface, like “subjective choices.” But the first is preference and the second is taste. The difference is whether the root of empathy is there: taste can always answer “for whom, what situation, why”; preference can only answer “I like it.”

共情也是最难交出去的一层，但难在哪，值得说准。AI 能基于数据模拟“某类用户可能怎么想”，给你有用的素材；它今天做不到的，是为这群人过得好不好负责——在没有指标催的时候，仍然盯着他们真实的处境、在意结果好坏。品味的根现在就长在这种在意上：一个不在意用户的设计师，工具再强也只做得出更精致的自我表达；一个真在意的，工具再简陋也能做出“为他们而存在”的东西。这里得留一个诚实的问号，别把话说满：用户要的是“被服务好”，这到底非得有一颗会在意的心，还是一套可靠摸清他们处境、并为结果担责的流程就够——“机器本体上不会在意”是断言，不是我们验过的事。眼下更稳的说法是：把在意外化成看得见的关照，至今仍然只有人稳定做得到。这就是为什么这一卷反复回到“为人”：不是道德口号，是当下让品味成其为品味、而不是偏好的那个根。

Empathy is also the layer hardest to hand off, but it’s worth being precise about where the difficulty lies. AI can simulate from data how a class of users might think and hand you useful material; what it can’t do today is answer for whether those people are well served — keep watching their real situation, keep caring how it turns out, when no metric is pushing it to. The root of taste grows, right now, on that caring: give a designer who doesn’t care the strongest tool and they’ll produce only more polished self-expression; one who truly does can make something that “exists for them” with crude tools. But an honest question mark belongs here, not an overstatement: what users actually want is to be well served, and whether that strictly requires a mind that cares, or just a process that reliably reads their situation and stays accountable for the outcome, is not something we’ve tested — “a machine ontologically cannot care” is an assertion, not a finding. The steadier claim for now: externalizing that care into attention you can see and check is still something only people do reliably. That’s why this volume keeps returning to “for people”: not a moral slogan, but the root that, for now, makes taste taste rather than preference.

把品味拆成共情、判别、方向这三层，还有个直接的实用价值：它让"怎么培养品味"从一句空话变成能练的东西。要是品味是拆不开的天赋直觉，"培养品味"就无从下手；但既然它是三层判断合成的，每一层都能单独练。练共情：逼自己离开屏幕，真去接触用户，看他们在真实处境里怎么用、卡在哪、为什么放弃——共情是看出来、问出来的，不是想出来的。练判别：刻意做"对照明确判据评判候选"的练习，每次都逼自己说出"好在哪、差在哪"，把模糊的好恶逼成清楚的判据。练方向：复盘自己和高手判断差在哪——同一组候选，为什么高手选了 B 你选了 A，他看到了什么你没看到。这三层各有练法，合起来就是一条能走的品味成长路径。

Decomposing taste into these three layers (empathy, discernment, direction) has a direct practical payoff too: it turns “how to cultivate taste” from an empty phrase into something you can actually train. If taste were an indivisible, innate intuition, “cultivating taste” would offer no handle to grab; but since it is a synthesis of three layers of judgment, each layer can be trained on its own. Train empathy: force yourself off the screen and into real contact with users, and watch how they use things in their real situation, where they get stuck, why they give up. Empathy is something you see and ask your way into, not something you imagine. Train discernment: deliberately practice judging candidates against explicit criteria, forcing yourself every time to state where something is good and where it’s weak, pressing vague likes and dislikes into clear criteria. Train direction: review where your judgment differs from an expert’s. Given the same set of candidates, why did they pick B while you picked A, and what did they see that you didn’t? Each of the three has its own way to practice, and together they make up a walkable path for growing taste.

这也再次说明：品味稀缺，是因为要长期投入才练得成；不神秘，是因为它能被拆开、被刻意练。

Which again shows taste is scarce but not mysterious: scarce because it takes sustained investment; not mysterious because it can be taken apart and deliberately practiced.

品味稀缺，但不神秘——它可以被外化、被教、被回流

Taste is scarce but not mysterious: it can be externalized, taught, and fed back

"稀缺"和"不可言说"是两件事，混在一起会犯双重错误。承认品味稀缺，是承认它没法被生成替代、必须由人持有；但要是进一步以为它"不可言说"，就等于放弃了 DSN 04/08 写规格那件事——规格恰恰是品味的外化形式。一个资深设计师判断"这版不行"只要一秒，但要是他能把"不行在哪、好该是什么样"说清楚、写下来，这一秒的判断就变成了能喂给生成的护栏、能教给新人的判据、能在团队里复用的资产。AI-Native 设计的一项核心训练，正是逼设计师把藏在直觉里的品味一条条说出来——不是说出来品味就不稀缺了，是说出来它才能进那条闭环、才能被复利。说不出"为什么好"的判断，对团队和生成都是不可见的，它只在那一个人脑子里有效一次。

“Scarce” and “ineffable” are two different things, and conflating them is a double error. Granting that taste is scarce means granting it can’t be replaced by generation and has to be held by people. But go one step further and assume it’s “ineffable,” and you give up the spec-writing in DSN 04/08, when the spec is precisely the externalized form of taste. A senior designer needs only a second to judge “this version doesn’t work,” but if they can say and write down where it doesn’t work and what good should look like, that one-second judgment turns into a guardrail you can feed to generation, criteria you can teach a junior, an asset the team can reuse. A core piece of AI-Native design training is exactly this: forcing designers to articulate, one item at a time, the taste hiding in their intuition. Articulating it does not make it any less scarce; it is what lets taste enter the loop and compound. A judgment that can’t say why something is good is invisible to the team and to generation; it works once, inside one head.

品味是主观的吗？是相对的，但不是任意的

Is taste subjective? Relative, yes; arbitrary, no

有个论点常被用来否定"品味可以讨论"："品味是主观的，各花入各眼，没什么对错好说。"这个论点把相对和任意混在了一起。品味确实是相对的——它相对于"为谁、什么语境、什么目的"才成立；同一个设计，对独立开发者对路，对幼儿教育产品就不对路。但相对不等于任意：一旦把"为谁、什么语境、什么目的"钉死，"哪一版更对路"就有了能讨论、能论证、甚至能让大多数有经验的人达成共识的答案。这正是为什么 DSN 08 的规格里 FOR-WHOM 必须写在最前面——它把"主观"换成了"相对于一个明确对象的可判断性"。把品味说成纯主观，往往是放弃判断的借口；设计判断从来不是"我喜欢"，是"对这群人、在这个目的下，这一版更对，因为……"。AI 在这件事上的位置也就定了：它能帮你模拟"某类用户可能怎么反应"，给判断提供素材；但把相对性钉死的那个锚——为谁、什么目的——以及最后"对不对路"的裁定，还得由理解这群人的人来下。

An argument often used to deny that taste can be discussed goes: “taste is subjective, beauty is in the eye of the beholder, there’s no right or wrong to speak of.” This argument conflates relative with arbitrary. Taste is indeed relative: it holds relative to “for whom, what context, what purpose”; the same design is on-target for solo developers and off-target for an early-childhood education product. But relative doesn’t mean arbitrary: once “for whom, what context, what purpose” is pinned down, “which version is more on-target” has an answer that can be discussed, argued, even agreed on by most experienced people. That’s exactly why FOR-WHOM has to come first in DSN 08’s spec: it converts “subjective” into judge-ability relative to a defined object. Calling taste purely subjective is often just an excuse to abandon judgment; real design judgment is never “I like it,” it’s “for these people, under this purpose, this version is more right, because…” AI’s place in this follows from that: it can help simulate how a class of users might react, supplying material for judgment, but the anchor that pins down relativity, for whom and what purpose, and the final verdict on whether something is on-target, still has to be set by a person who understands these people.

检验信号

Test signal

品味在被外化的信号：团队评审时，"我觉得这版更好"逐渐被"它更好，因为对 X 用户在 Y 处境下命中了 Z"取代；新人的命中率随规格完善而上升。证伪：若资深设计师的判断始终说不出理由、无法写进规格、新人怎么也学不会，那要么品味还没被真正拆解，要么把"个人偏好"误当成了"品味"。

Signal that taste is being externalized: in review, “I feel this version is better” gets gradually replaced by “it’s better because it hits Z for user X in situation Y”; juniors’ hit-rate rises as the spec sharpens. Falsified if: a senior’s judgment can never state a reason, never makes it into the spec, and juniors never learn it. In that case either taste has not really been decomposed, or “personal preference” has been mistaken for “taste.”

上一段说：把“为谁、什么语境、什么目的”钉死，“哪一版更对路”就有得讨论、能让多数有经验的人达成共识。可评审桌上真发生的，常常是另一幕。

The paragraph above said: pin down “for whom, what context, what purpose” and “which version is more on-target” becomes discussable, something most experienced people can converge on. What actually happens at the review table is often a different scene.

同一份规格摆在两个资深设计师面前，对象已经钉死：为第一次上线的独立开发者做的引导页，目的是尽快发布第一个页面，气质要稳、不要打鸡血。生成给了两版，都合规、都可访问、都不是 slop。A 版：一个聚焦输入框，其余全部拿掉，屏幕上只剩一个动作。B 版：先铺一条三步预览，把接下来会发生什么摆在眼前。设计师一号选 A：“对一个发怵的新手，一个清楚的动作把台阶降到最低，多余的上下文是噪音。”设计师二号选 B：“对一个还在掂量值不值得投入的人，看见完整路径才建立信任，光一个输入框把回报藏起来了。”两人指的是同一群用户、同一个目的，各自都讲得通，谁也没在耍稻草人。

One spec, two senior designers, the object already pinned: an onboarding screen for first-time solo developers, the goal to publish a first page fast, the character calm rather than hyped. Generation returned two versions, both compliant, both accessible, neither slop. Version A: a single focused input, everything else stripped, one action on the screen. Version B: a three-step preview laid out first, showing what is coming. Designer one picks A: “for an anxious first-timer, one clear action lowers the first step as far as it will go; extra context is noise.” Designer two picks B: “for someone still weighing whether it is worth the investment, seeing the whole path is what builds trust; a bare input hides the payoff.” Both point at the same users and the same purpose, both hold up, and neither is fighting a strawman.

解释一 · 分歧是对象没钉够细。“第一次上线的独立开发者”里其实藏着两拨人：发怵的和评估的。规格还差一刀——这一屏到底先服务哪一拨。把这刀切下去，取舍就收敛，共识回来了。照这个解释，“品味是谁的判断”有答案：谁握着规格，谁就握着这一票；设计决定跟着产品判断走，不玄。

Explanation one · the disagreement comes from an object that was not pinned down finely enough. “First-time solo developers” actually hides two crowds: the anxious and the evaluating. The spec is one cut short: which crowd does this screen serve first? Make that cut and the trade-off converges; consensus comes back. On this reading, “whose judgment is taste” has an answer: whoever holds the spec holds the vote; the design call follows the product call, nothing mystical.

解释二 · 分歧是一道规格化不掉的价值岔路。A 和 B 编码了两套“什么才是为这群人好”的理论——降低焦虑，还是先挣得信任——没有一个中立视角能在不先选定“产品想和用户建立哪种关系”之前，把两者排出高下。把规格切得更细并不会溶掉这道岔，只是把同一个判断挪给写规格的人。这里的品味不是越较真越收敛的判别力，是一份得有人署名、有人担责的主张。

Explanation two · the disagreement is a value fork no spec dissolves. A and B encode two theories of what “good for these people” means, lower the anxiety or earn the trust first, and no view from nowhere ranks them without first choosing what relationship the product wants with the user. Cutting the spec finer does not dissolve the fork; it just relocates the same judgment onto whoever writes the spec. Taste here is not a discernment that converges the harder you push; it is a commitment someone has to sign and answer for.

暂定回答 · Q-DSN-01

Working answer · Q-DSN-01

我们目前押：评审桌上多数这类分歧是解释一，即对象没钉够细，纪律就是把“为谁”切到取舍能收敛为止，这也是规格里 FOR-WHOM 排第一的原因。但总有一层残差是解释二；把它一律当成解释一，正是团队把一个价值选择洗成“最佳实践”的手法。落地动作：当两版在更细的规格下仍然都站得住，就别再用品味去仲裁，把它抬回产品那一注：点名谁拍板（为后果担责的那个人），并把这次决定记成赌注，不是定论。区分性观察：把规格切细后，若资深判断能稳定收敛，解释一赢，品味大体是可追回的判断；若那道岔路照旧存在，解释二承重，“品味”就有一部分是被授予的裁决权，不是被发现的对错。

Our current bet: most review-table disagreements of this kind are explanation one: the object is not cut fine enough, and the discipline is to sharpen “for whom” until the trade-off converges, which is why FOR-WHOM comes first in the spec. But a residue is always explanation two, and treating it uniformly as explanation one is exactly how a team launders a value choice into a “best practice.” The move on the ground: when two versions both survive a sharper spec, stop arbitrating on taste and escalate to the product bet, naming who decides (the person accountable for the outcome) and logging the call as a bet, not a settled truth. Distinguishing observation: if sharpening the spec reliably converges senior verdicts, explanation one wins and taste is largely recoverable judgment; if the forked residue survives every sharpening, explanation two is load-bearing and “taste” is partly granted authority to decide, not discovered correctness.
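The move described above (sharpen the spec, and if seniors still split, escalate and log the call as a bet) could be sketched as a small record. This is only an illustration under stated assumptions: `triage`, `BetRecord`, every field name, and the sample values (including the choice of version A) are hypothetical, not part of the method as the text defines it.

```python
from dataclasses import dataclass


def triage(verdicts_after_sharpening):
    """Return 'converged' if every reviewer picked the same version once the
    spec was sharpened (explanation one), else 'escalate' (the residue the
    text attributes to explanation two)."""
    return "converged" if len(set(verdicts_after_sharpening)) == 1 else "escalate"


@dataclass(frozen=True)
class BetRecord:
    """Hypothetical log entry: a call recorded as a bet, not a settled truth."""
    question: str       # the fork being decided
    options: tuple      # versions that both survived the sharper spec
    chosen: str         # the version that ships
    decided_by: str     # the person accountable for the outcome
    revisit_when: str   # observation that would reopen the bet

    def summary(self):
        rejected = ", ".join(o for o in self.options if o != self.chosen)
        return (f"BET: {self.chosen} over {rejected}; "
                f"owner: {self.decided_by}; revisit when: {self.revisit_when}")


verdicts = ["A", "B"]  # designer one picks A, designer two picks B
if triage(verdicts) == "escalate":
    bet = BetRecord(
        question="lower the anxiety, or earn the trust first?",
        options=("A", "B"),
        chosen="A",  # illustrative only; the text does not pick a winner
        decided_by="product owner",
        revisit_when="first-run completion data is in",
    )
    print(bet.summary())
```

The `revisit_when` field is the part that keeps the entry a bet: it names, in advance, what would count as evidence for reopening it.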

更深的一问。把两版都讲清楚，并没有让这次选择更对，只是让分歧变得看得见——那么“能解释每一次取舍”到底是让判断变好了，还是只让我们更会为早就偏爱的那版辩护？再往前一步：如果市场稳定奖励 B 版，只因为它长得像用户早就信任的一切，那“更对路”这个裁定，追的是用户的好，还是熟悉本身？

The deeper question. Explaining both versions did not make the choice more right; it only made the disagreement legible. So does “being able to explain every trade-off” actually improve the judgment, or just make us better at defending the version we already preferred? One step further: if the market steadily rewards version B only because it looks like everything users already trust, is the “more on-target” verdict tracking the user’s good, or tracking familiarity itself?

DSN

SYSTEM · 系统即护栏 / 何为好

SYSTEM & THE SPEC OF GOOD

重画 · 规格

Redraw · Spec

设计系统即护栏，把"何为好"写下来

The system as guardrail, writing the spec of “good”

一句话

In one line

tokens、组件、品牌是生成之前的护栏：把"何为好"写下来，生成才不会滑向均值。上限不在模型多强，在规格写得有多准；这份规格还得切两层：机器守"离没离牌"，人守"是不是为人"。

Tokens, components, brand are guardrails set before generation: write down what counts as good, and generation stops sliding to the mean. The ceiling isn’t set by how strong the model is, it’s set by how precise the spec is; and that spec splits into two layers: the machine holds “on-brand,” the human holds “for people.”

老板给的 brief 只有三个字——"要高级"。把这三个字丢给生成，你会拿回一堆均值：因为"高级"那把尺藏在他脑子里，模型学不到、也没法拿它评判，生成不会读心。要让生成不滑回均值，唯一的办法是把那把尺写下来——把意图和判据外化成能指导、能评判生成的规格。这是不能外包给模型的人类规格，和验证篇"人定何为对"是同一道，只不过这里定的是"何为好"。这把尺该包含：

The boss’s whole brief was three words: “make it premium.” Hand those three words to generation and you get back a pile of the mean, because the ruler for “premium” lives in his head: the model can’t learn it and can’t judge by it, and generation doesn’t read minds. The only way to keep generation off the mean is to write that ruler down: externalize intent and criteria into a spec that steers and judges generation. This is a human spec you can’t outsource, the same move as the Verification chapter’s “humans define what’s right,” except here you’re defining what’s good. The ruler should hold:

这里有一条常被忽略的因果链，值得说透：生成的质量上限，不取决于模型多强，取决于喂给它的规格多有判别力。给一个极强的模型一句"做个好看的 dashboard"，它只能还你一个均值；给一个中等的模型一份写清了"为谁、什么气质、什么算完成、什么是红线"的规格，它反而能落在窄带里。在 AI-Native 设计里，设计师的杠杆点从"手有多巧"移到了"规格有多准"。设计系统就是这份规格持久化、可复用、可机检的形态——它把一次性的口头判断，沉淀成每次生成都自动生效的护栏。把它当事后文档，等于每次都从零向模型解释"我们要什么"；把它当前置护栏，等于让团队所有人（和所有 agent）共享同一把已经磨好的尺。

There’s a causal chain worth spelling out here, one that gets overlooked a lot: the ceiling on generation’s quality doesn’t depend on how strong the model is, it depends on how discriminating the spec you feed it is. Give an extremely strong model “make a good-looking dashboard” and it can only hand back the mean; give a mid-tier model a spec that spells out “for whom, what character, what counts as done, what’s a red line,” and it lands in the narrow band instead. That means in AI-Native design, the designer’s leverage point moves from how skilled the hand is to how precise the spec is. The design system is exactly that spec in persisted, reusable, machine-checkable form: it distills one-off verbal judgments into a guardrail that fires automatically on every generation. Treat it as a post-hoc doc and you re-explain “what we want” to the model from scratch every time; treat it as an upfront guardrail and the whole team, and every agent, shares one ruler already ground sharp.

这条因果链还解释了一个让不少团队困惑的现象：为什么同样用最新的模型，有的团队产出稳定地好，有的却始终在 slop 里打转？差别几乎从不在模型，在那份喂给模型的规格的质量，以及背后那套设计系统的完备度。模型对所有人都一样，产出的差异几乎全来自约束它的护栏不同。在 AI-Native 时代，一个团队的设计竞争力，越来越不体现在"谁能招到手更巧的设计师"，而体现在"谁能把何为好沉淀成更准、更完备、更可机检的规格与系统"。这是对组织能力的重新定义：核心资产从"个人的手艺"转向"被外化、可复用、能喂给生成的判断"。也正因如此，早早认真建系统、认真写规格、认真让判断回流的团队，会在这条曲线上越走越快；把设计系统当文档敷衍的团队，无论换多新的模型，都会发现自己被卡在均值附近——因为模型从来不是瓶颈，护栏才是。

This causal chain also explains something that puzzles a lot of teams: using the same newest model, why do some teams produce steadily good work while others keep spinning in slop? The difference is almost never the model: it’s the quality of the spec fed to it, and how complete the design system behind it is. The model is the same for everyone; the difference in output comes almost entirely from the guardrails constraining it. Which means in the AI-Native era, a team’s design edge shows up less in “who can hire the more skilled-handed designer” and more in “who can distill what’s good into a sharper, more complete, more machine-checkable spec and system.” That’s a redefinition of organizational capability: the core asset moves from individual craft to externalized, reusable, generation-feedable judgment. And that’s exactly why teams that start early on building the system, writing the spec, feeding judgment back run faster and faster on this curve, while teams that treat the design system as a doc to fob off find themselves stuck near the mean no matter how new a model they swap in, because the model was never the bottleneck. The guardrails are.

为谁、为什么：要服务的人是谁，让他们完成什么、感受到什么。
For whom, for what: who it serves, what it should let them do and feel.
意图与气质：该有的调性与个性，以及明确不要什么。
Intent and character: the tone it should carry, and explicitly what it should not be.
"完成"的判据：怎样算好、算对路、算可发——一组能据以验收的具体信号。
Criteria for “done”: what counts as good, on-target, shippable: concrete signals you can accept against.
反 slop 红线：明令避开的套路（见下一节），把"别像 AI 做的"变成可检查的条目。
Anti-slop red lines: clichés to avoid by rule (see the next section), turning “don’t look AI-made” into checkable items.
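One way to keep the four parts of this ruler from drifting back into someone's head is to give them a fixed shape that can be checked for gaps. The sketch below is a minimal illustration, assuming a hypothetical `DesignSpec` container; the field names and the sample values in `not_this` and `red_lines` are made up for the example, while the audience and character echo the onboarding case discussed earlier.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DesignSpec:
    """Hypothetical container for the written-down ruler."""
    for_whom: str        # who it serves, what they should do and feel
    character: str       # the tone it should carry
    not_this: tuple      # explicitly what it should not be
    done_signals: tuple  # concrete signals you can accept against
    red_lines: tuple     # anti-slop cliches to avoid by rule

    def missing_parts(self):
        """Names of the parts left empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


spec = DesignSpec(
    for_whom="first-time solo developers publishing a first page",
    character="calm rather than hyped",
    not_this=("hype", "growth-hack urgency"),
    done_signals=(),  # deliberately left empty to show the check
    red_lines=("gradient text on headings", "identical equal-size card grids"),
)
print(spec.missing_parts())  # ['done_signals']
```

An empty part is a useful warning: that piece of the ruler has not been externalized yet, so generation has nothing to be steered or judged by on that axis.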

设计系统从"交付后的文档"升级为"生成前的护栏"

The design system upgrades from “post-hoc doc” to “pre-generation guardrail”

过去的设计系统是事后产物：先把界面做出来，再回头整理出一套 tokens、组件库、品牌规范，供团队对齐。它的角色是记录已经形成的共识。在 AI-Native 流程里，这个角色翻转了——设计系统必须前置成喂给生成的规格。原因很直接：生成的默认终点是均值（见 DSN 01 的不对称），唯一能把它从均值拉开的，是生成发生之前就给定的约束。没有护栏的生成，等于把方向盘交给训练分布；有护栏的生成，才是把它约束在"这群人、这个品牌"的窄带里。所以设计系统不再是设计的结果，是设计的前提——它越完备、越可机检，海量生成就越不容易离牌〔源：Patterns（Cell Press）同行评议——生成轨迹坍缩到通用视觉母题，证据级 Ⅱ；测的是趋同引力，不外推具体比例〕[R8]。

The old design system was a post-hoc artifact: build the interface first, then go back and tidy up a set of tokens, a component library, brand guidelines for the team to align on. Its job was recording a consensus already formed. In the AI-Native process that job inverts: the design system has to move upfront, into the spec fed to generation. The reason is direct: generation’s default destination is the mean (see DSN 01’s asymmetry), and the only thing that pulls it off the mean is a constraint set before generation happens. Generation without guardrails hands the wheel to the training distribution; generation with guardrails is what constrains it to the narrow band of “these people, this brand.” So the design system is no longer the result of design: it’s the precondition for it. The more complete and machine-checkable it is, the harder it is for mass generation to drift off-brand 〔Source: Patterns (Cell Press), peer-reviewed — generation trajectories collapse to generic visual motifs, grade Ⅱ; measures the gravity of convergence, no specific proportion extrapolated〕[R8].

同一份规格切两层：机器守"不离牌"，人守"是否为人"

One spec, two layers: the machine holds “on-brand,” the human holds “for people”

规格不是一团均质的文字，它内部就分两层，正对应内核的①和④。把它们混在一起，是后面所有失败模式的根源：要么把品味硬塞进 lint（换来对齐完美却没灵魂的 slop），要么让人去盯机器该管的对齐间距（把稀缺的判断力耗在能自动化的事上）。下表把这条切分做成能照搬的对照〔判据见 DSN 08 的分诊问句〕：

A spec isn’t one uniform block of text: it splits internally into two layers, mapping exactly onto the kernel’s ① and ④. Mixing them is the root of every failure mode downstream: either you force taste into lint and get pixel-perfect, soulless slop, or you have people police the alignment and spacing a machine should own, burning scarce judgment on what’s automatable. The table below turns this cut into something you can copy directly 〔criterion: the triage question in DSN 08〕:

维度	Dimension	硬约束层 · 可机检（并入①）	Hard layer · machine-checkable (→①)	软判据层 · 人来定的品味（留给④）	Soft layer · constitutive taste (→④)
问的是	Asks	仅看产物文本能否判对错？	Right/wrong from artifact text alone?	是否为这群人、有没有灵魂？	For these people? Does it have soul?
典型项	Typical items	token 符合度 · 对齐间距 · 对比度 ≥4.5:1 · 触达 ≥44px · 无离系统字号	Token-conformance · alignment · contrast ≥4.5:1 · hit-target ≥44px · no off-system sizes	为谁而做 · 调性气质 · 情感命中 · 异质性 · "完成"的感觉	For-whom · tone & character · emotional fit · heterogeneity · the feel of “done”
谁来守	Held by	机器 · lint / CI 自动跑	Machine · lint / CI, automatic	人 · 设计师亲自判	Human · the designer judges
写成什么	Written as	规则 / 阈值 / 断言	Rules / thresholds / assertions	意图陈述 + 验收信号	Intent statements + acceptance signals
塞错层的后果	Cost of wrong layer	（本就该自动化，无代价）	(belongs here, no cost)	硬塞 lint → 优化出没人想用的完美界面	Forced into lint → a flawless interface no one wants
内核位置	Kernel slot	① 充裕（被自动化）	① abundance (automated)	④ 意义（人回归）	④ meaning (people return)
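Two of the hard-layer items in the table, contrast ≥4.5:1 and hit-target ≥44px, are exactly the kind of rule a machine can hold. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas; the element dictionary shape (`fg`, `bg`, `width_px`, `height_px`) and the function names are assumptions made for the example, not an existing tool's API.

```python
MIN_CONTRAST = 4.5       # threshold from the table
MIN_HIT_TARGET_PX = 44   # threshold from the table


def _linear(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """Contrast ratio between two '#rrggbb' colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def hard_layer_violations(element):
    """Check one element (hypothetical dict shape) against both thresholds."""
    problems = []
    ratio = contrast_ratio(element["fg"], element["bg"])
    if ratio < MIN_CONTRAST:
        problems.append(f"contrast {ratio:.2f}:1 is below {MIN_CONTRAST}:1")
    if min(element["width_px"], element["height_px"]) < MIN_HIT_TARGET_PX:
        problems.append(f"hit target is smaller than {MIN_HIT_TARGET_PX}px")
    return problems


button = {"fg": "#777777", "bg": "#ffffff", "width_px": 40, "height_px": 48}
for problem in hard_layer_violations(button):
    print(problem)
```

Nothing in this check says whether the button is for these people. That question stays in the soft layer, which is the point of keeping the two columns apart.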

图 FIG. D4.0 / SYSTEM-AS-SPEC · 设计系统即护栏
看懂：tokens / 组件 / 品牌如何夹住生成的输出分布
Read: how tokens / components / brand clamp the output distribution of generation

把生成的输出想成一条概率分布。无护栏时，峰落在训练分布的均值——也就是 slop。设计系统靠两道硬约束墙把概率质量夹离均值、推到"只对这群人成立"那个窄带，不只是把分布变窄那么简单。墙是可机检的硬约束（token/对齐/对比度），墙内的形状仍由人的软判据塑造。

Think of generation’s output as a probability distribution. Without guardrails the peak sits at the mean of the training distribution, that is, slop. The design system does more than narrow the distribution; two hard walls clamp the probability mass off the mean, pushing it into the band that is “true only for these people.” The walls are machine-checkable hard constraints (token / alignment / contrast); the shape inside the walls is still sculpted by human soft criteria.

图 FIG. D4.1 / SYSTEM AS A TWO-LAYER SPEC · 把护栏拆成两层
看懂：tokens→组件→品牌叠成可机检的护栏层，它如何夹住生成的输出分布
Read: tokens→components→brand stack into the machine-checkable guardrail layer, and how it clamps generation’s output distribution

设计系统是两层规格，不是一份文档。A 层是可机检的护栏：tokens（原子值）、组件（组合契约）、品牌里能写成 lint 的部分（禁用模式、字体白名单），自下而上叠成一道生成前就固化、机器能逐条核验的硬约束。这道护栏的作用，正是把右边那条本会峰在均值（=slop）的输出分布，夹离均值、推进"只对这群人成立"的窄带。但护栏只决定墙在哪：墙内"哪一版才真的对路"，是 B 层、是人的软判据。把这两层分清楚，就不会再问"设计系统能不能替我判断"：A 层永远替你，B 层永远替不了你。

A design system is a two-layer spec, not a document. Layer A is the machine-checkable guardrail: tokens (atomic values), components (composition contract), and the lint-expressible part of brand (banned patterns, font whitelist), stacking bottom-up into a hard constraint frozen before generation and verifiable item by item. The job of this guardrail is exactly to take the output distribution on the right, which would otherwise peak at the mean (that is, slop), and clamp it off the mean into the narrow band that is “true only for these people.” But the guardrail only sets where the walls stand; “which version inside is truly on-target” is Layer B, the human’s soft criteria. Keep the two layers distinct and you stop asking “can the design system judge for me”: for Layer A it always can, for Layer B it never can.

设计系统与架构的"结构即护栏"是同一招：同一个原理，不是类比

The design system and architecture’s “structure as guardrail” are the same move: one principle, not an analogy

这里值得把系列内部这几处"同一招"说透，因为它证明这套方法不是设计领域的特例，是同一条原理在不同面上的复现。架构篇讲过：当 agent 大量生成代码，唯一能让这些代码连贯、不互相打架、不偏离意图的，是一套对 agent 可读的结构约束——清晰的模块边界、类型、接口契约。设计系统在设计面上做的是一模一样的事：当 agent 大量生成界面，让它们连贯、不离牌、不滑向 slop 的，是一套对 agent 可读的结构约束——tokens、组件契约、品牌原则。两边都是用"前置的、可机检的结构护栏"，去调和"海量生成"和"连贯性"这对矛盾。

This intra-series echo is worth stating fully, because it’s proof this method is the same principle recurring on a different face, not a special case for design. The Architecture chapter argued: when agents generate code en masse, the only thing keeping that code coherent, non-conflicting, on-intent is a set of structural constraints legible to agents: clear module boundaries, types, interface contracts. The design system does exactly the same thing on the design face: when agents generate interfaces en masse, what keeps them coherent, on-brand, off the slide into slop is a set of structural constraints legible to agents: tokens, component contracts, brand principles. Both reconcile the same tension, mass generation versus coherence, with upfront, machine-checkable structural guardrails.

这不是修辞上的类比，是因为它们面对的是同一个底层问题：生成一旦变廉价，质量的瓶颈就从"能不能生成"转移到"生成出来的一堆东西能不能保持一致与意图"，解决这个瓶颈的通用形态，就是把意图固化成生成前、机器能读的约束。这就是为什么读过架构篇的人会在这里有强烈的既视感：同一台机器，换了个面。

This isn’t a rhetorical analogy. It follows from facing the same underlying problem: once generation gets cheap, the quality bottleneck shifts from “can it be generated” to “can the pile of generated things stay consistent and on-intent,” and the general shape of the fix is to freeze intent into constraints set before generation, readable by a machine. That’s why anyone who’s read the Architecture chapter gets a strong sense of déjà vu here: same machine, different face.

这也回答了一个实操问题：AI-Native 流程里，设计系统该投入到什么程度才算够？旧标准是"够团队对齐就行"——它只是事后文档，投入太多是浪费。新标准要高得多：设计系统的完备度和可机检度，直接决定生成质量的上限，所以它值得当成核心资产持续投入，而不是有空才收拾的边角料。具体说，过去可能用自然语言松散描述的东西（"主色调用我们的品牌蓝""间距保持一致"），现在都值得固化成机器能直接消费的形式：token 写成 JSON 而不是截图，组件写成有明确接口契约的代码而不是画板示例，品牌原则里能机检的部分（对比度、字体白名单、禁用模式）写成 lint 规则。投入的回报是复利的：设计系统越完备，每次生成就越省判断、越不离牌，而每一轮判断的回流又让它更完备。这是一条正反馈——正因为是正反馈，早投入比晚投入划算得多。把设计系统当事后文档的团队，等于一直在放弃这条复利曲线〔源：W3C Design Tokens 社区组规范草案＋ Material／Carbon／Polaris 设计系统 token 公开实践，证据级 Ⅳ 行业实践〕[R7]。

This also answers a practical question: in an AI-Native process, how much should you invest in the design system before it’s enough? The old standard was “enough to align the team”: since it was only a post-hoc doc, over-investing was waste. The new standard is much higher: the design system’s completeness and machine-checkability directly set the ceiling on generation quality, so it’s worth continuous investment as a core asset, not scrap you tidy when there’s time. Concretely, things once loosely described in natural language (“use our brand blue for the primary,” “keep spacing consistent”) are now worth freezing into forms a machine can consume directly: tokens as JSON, not a screenshot; components as code with explicit interface contracts, not artboard examples; the machine-checkable parts of brand principles (contrast, font whitelist, banned patterns) as lint rules. The return compounds: the more complete the system, the less judgment each generation burns and the less it drifts off-brand, and each round’s judgment feeding back makes it more complete still. That’s a positive feedback loop, and precisely because it is, investing early beats investing late by a wide margin. A team that treats the design system as a post-hoc doc is continuously giving up that compounding curve 〔Source: W3C Design Tokens Community Group draft spec + the public token practice of the Material / Carbon / Polaris design systems, grade Ⅳ industry practice〕[R7].
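To make "tokens as JSON, brand rules as lint" concrete, here is a minimal sketch. The token file shape is illustrative only (it is not the W3C draft format), the token values are invented, and `lint_styles` with its property lists is a hypothetical helper, not an existing linter.

```python
import json

# Illustrative tokens file content; names and values are made up.
TOKENS_JSON = """
{
  "color": {"brand-primary": "#1f4fd8", "text": "#1a1a1a", "surface": "#ffffff"},
  "space": {"sm": 8, "md": 16, "lg": 24},
  "font-family": ["IBM Plex Sans", "Source Serif 4"]
}
"""

COLOR_PROPS = {"color", "background-color", "border-color"}
SPACING_PROPS = {"padding", "margin", "gap"}


def lint_styles(styles, tokens):
    """Report every style value that is not backed by a token."""
    allowed_colors = {value.lower() for value in tokens["color"].values()}
    allowed_spaces = set(tokens["space"].values())
    allowed_fonts = set(tokens["font-family"])
    violations = []
    for prop, value in styles.items():
        if prop in COLOR_PROPS and str(value).lower() not in allowed_colors:
            violations.append(f"{prop}: {value} is not a color token")
        elif prop in SPACING_PROPS and value not in allowed_spaces:
            violations.append(f"{prop}: {value} is not a spacing token")
        elif prop == "font-family" and value not in allowed_fonts:
            violations.append(f"{prop}: {value} is not on the font whitelist")
    return violations


tokens = json.loads(TOKENS_JSON)
generated = {"color": "#1F4FD8", "padding": 12, "font-family": "Inter"}
for violation in lint_styles(generated, tokens):
    print(violation)
```

Because the tokens live in text, the same file can be fed to generation as context and run in CI as a check, which is what lets each round of judgment flow back into the system.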

但这条复利曲线有一个必须守住的边界，不然它会反噬：设计系统能固化的，永远只是"已经被判断过的好"，它替代不了"对新情况做新判断"。护栏写得越完备，越容易生出一种危险的诱惑——以为照着系统生成，结果就一定好，于是停止判断，把系统当成了自动驾驶。这是把③（上下文/护栏）误当成了④（人的判断）。护栏的作用是把生成约束在"过去判断过的好"那个窄带里，让你不必每次都从头判断已经判断过的事；但当一个新情况出现——一类系统没覆盖过的内容、一群没服务过的用户、一个旧判据不再适用的场景——系统会沉默，或者更糟，会用旧判据给你一个"看起来合规、其实不对路"的结果。这时仍然得由人重新判断，并把新判断回流进系统（这正是⑥沉淀）。所以设计系统是判断的沉淀器，不是判断的替代品。守住这条边界，复利曲线才成立；越界把它当自动驾驶，它就会在第一个新情况面前，悄悄把你带回均值。

But this compounding curve has a boundary you have to hold, or it backfires: what a design system can freeze is only “good that’s already been judged”: it can’t substitute for making a new judgment about a new situation. The more complete the guardrails, the more tempting it gets to believe generating by the system guarantees a good result, so you stop judging and treat the system as autopilot. That’s mistaking ③ (context/guardrail) for ④ (human judgment). The guardrails’ job is to constrain generation to the narrow band of “good judged in the past,” so you don’t have to re-judge the already-judged every time, but when a genuinely new situation shows up (content the system never covered, a group of users it never served, a scenario where old criteria no longer apply), the system goes quiet, or worse, hands you, by old criteria, a result that “looks compliant but is off-target.” At that point a human still has to judge fresh, and feed that new judgment back into the system (that’s exactly ⑥ distill). So the design system is a sediment of judgment, not a substitute for it. Hold this boundary and the compounding curve holds. Cross it, treat the system as autopilot, and at the first new situation it will quietly carry you back to the mean.

DSN

PLAYBOOK · 守住人本 / 反 slop

HOLD THE HUMAN

行动 · 承重

Action · Load-bearing

守住人本，拒绝 slop

Hold the human, refuse slop

生成把执行接走后，设计师被还给共情、品味、意义。

Once generation takes over execution, the designer is returned to empathy, taste, and meaning.

一句话

In one line

设计成不成功，不看出稿多快，看它到底是不是为具体的人做的。把"更快"本身当成赢，是最隐蔽的那种失败。

Whether a design succeeds is about whether it’s actually made for specific people, not how fast the comps ship. Mistaking “faster” itself for the win is the most insidious failure there is.

交给生成

Hand to generation

铺变体、补全状态与边角
Variants, states, edge cases
套用设计系统、对齐切图
Applying the system, alignment, export
初稿与探索性方向
First drafts, exploratory directions

留给人 · 品味与意义

Keep with humans · taste & meaning

共情：理解用户真正要什么
Empathy: what users truly want
品味：判好坏、守独特、避 slop
Taste: judge, hold distinctiveness, avoid slop
意义：为"这值不值得存在"负责
Meaning: own whether it deserves to exist

承重命题：设计的成败，不看出稿多快，而看它最终是不是真的为具体的人而做。这不是一句装饰性的话，它是整卷的承重墙，也是整个系列那条人本主线落在设计面上的地方。把它当真，会改变你判断"AI-Native 设计成没成功"的标准：一个团队要是用 AI 把出稿速度提了十倍，产出的却全是更快、更光滑的 slop——谁也不为、谁也不记得——那按这条命题，它失败了，哪怕每个效率指标都好看。

The load-bearing claim: a design’s success isn’t judged by how fast its comps ship, but by whether it ends up truly made for specific people. That’s not a pretty phrase: it’s this volume’s load-bearing wall, and where the series’ human through-line lands on the design face. Take it seriously and it changes your criterion for whether AI-Native design has succeeded: if a team uses AI to make comps ten times faster and produces only faster, smoother slop (for no one, remembered by no one), then by this claim it has failed, however good every efficiency metric looks.

反过来，一个团队出稿没快多少，但设计师把省下的每一分注意力都投进了共情和品味，做出的东西真让目标用户觉得"这是为我做的"，那按这条命题，它成功了。出稿更快本身从来不是赢，它只是把人从重复产出里腾出来；被腾出来的人，得回到只有人能做的那件事上——理解人、为人负责。把"更快"本身错当成赢，是 AI-Native 转型里最常见、也最隐蔽的失败：它能让你在所有仪表盘都是绿的时候，悄悄丢掉设计存在的理由。

Flip it around: take a team whose comps didn’t get much faster, but whose designers poured every spared minute into empathy and taste and made something that genuinely makes the target user feel “this was made for me.” By this claim, that team succeeded. Shipping faster is never, on its own, the win. It only frees people from repetitive production; the person freed up has to go back to what only a person can do: understand people, be responsible to them. Mistaking “faster” itself for the win is the most common, and most insidious, failure in an AI-Native transition: it lets you quietly lose the reason design exists while every dashboard glows green.

"把人还给意义"在设计面上具体指什么

What “returning people to meaning” concretely means on the design face

"人回归意义"在别的卷里可能还偏抽象，但在设计这个面上落得最实，因为设计本来就是一门关于人的手艺：它全部的价值就在"被某个人用、被某个人感受到"。生成接管了出稿、铺变体、对齐切图这些产出动作之后，被还给设计师的正是这门手艺最初的内核：去理解一个具体的人在一个具体处境里真正需要什么（共情），去判断眼前这一版是不是真的命中了那个需要（品味），去为"这个东西值不值得存在、该是什么样"负责（意义）。这三件事过去常被产出工时挤到边上：设计师大量时间花在像素和导出上，留给共情和判断的反而不多。

“People return to meaning” may stay abstract in other volumes, but on the design face it lands most concretely, because design is by nature a craft about people: its entire value lies in being used by someone, being felt by someone. Once generation takes over the production actions (drawing comps, spreading variants, alignment and export), what gets returned to the designer is exactly the original core of this craft: understanding what a specific person in a specific situation truly needs (empathy), judging whether the version in front of them actually hits that need (taste), owning whether this thing deserves to exist and what it should be (meaning). These three used to get squeezed to the margins by production hours: designers spent enormous time on pixels and exports, leaving little for empathy and judgment.

AI-Native 设计的承诺，不是"设计师能更快地出图"，是把设计师从产出里解放出来，还给他这门手艺一开始就该做的事。这和工程卷"人做系统专长与产品判断、不做吞吐"是同一次解放，只是对象从代码换成了体验与美。

The real promise of AI-Native design is freeing the designer from production and returning them to what this craft was meant to do from the start, not “designers can produce comps faster.” That’s the same liberation as the engineering volume’s “people do deep systems expertise and product judgment, not throughput,” just with the object swapped from code to experience and beauty.

这也是为什么在整个系列里，设计被看作人本主线落得最实的一面。组织卷把"人重新回到中心"立成整套方法论的归宿，但在组织层面它还偏宏观、偏原则；越往下到具体职能，这个目的就越得被翻译成"在这个面上，回归意义具体是什么"。工程卷把它翻译成"人做系统专长而非吞吐"，已经具体了一层；到了设计，它落得最实，因为设计这门手艺的对象本来就是人：它不像写代码那样隔着一层逻辑，直接面对的是"一个人会不会被这个东西打动、会不会觉得被理解"。所以在设计这个面上，"为具体的人而做"不是一条需要费力论证的抽象原则，几乎就是这门手艺的定义本身：一个不为人的设计，从一开始就不配叫设计。

AI-Native 设计因此是检验整条人本主线能不能真的落地的试金石：要是它在最贴近"人"的这个面上都守不住，那它在别的面上多半也只是好看的话；守住了，才证明这条主线不是装饰，是真能一路落到具体动作上的承重结构。

This is also why, across the whole series, design is seen as the face where the human through-line lands most concretely. The org volume sets “putting people back at the center” as the methodology’s destination, but at the organizational level it stays macro, still a principle; the further down toward a concrete function, the more that purpose has to be translated into “on this face, what does returning to meaning actually mean.” The engineering volume translates it into “people do systems expertise, not throughput,” already one layer more concrete. At design it lands most concretely of all, because this craft’s object is people to begin with: unlike writing code, which sits behind a layer of logic, it faces directly whether a person will be moved by this thing, will feel understood. So on the design face, “made for specific people” is nearly the definition of the craft itself, not an abstract principle that needs laborious arguing: a design that isn’t for people doesn’t deserve to be called design in the first place.

AI-Native design is thus the touchstone for whether the whole human through-line can really land: if it can’t be held even on the face closest to “people,” it’s probably just pretty words on the other faces too. Hold it here, and that proves the through-line is a load-bearing structure that really does carry all the way down to concrete action, not decoration.

反 slop 红线（命中越多，越滑向均值）

Anti-slop red lines (the more you hit, the closer to the mean)

深底配青 / 霓虹强调色 · 紫到蓝渐变 · 渐变文字做标题或数字
Dark bg + cyan/neon accents · purple-to-blue gradients · gradient text on headings or metrics
玻璃拟态滥用 · 千篇一律的等大卡片网格 · 巨数字 + 小标签仪表盘模板
Glassmorphism everywhere · identical equal-size card grids · the big-number + small-label hero template
通用字体（Inter / Roboto / 系统默认）· 万物居中 · 每个标题上方一个大圆角图标
Generic fonts (Inter / Roboto / system defaults) · centering everything · a big rounded icon above every heading
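Some of these red lines leave traces in CSS text, so "the more you hit, the closer to the mean" can be counted in a rough way. The sketch below is a heuristic only: the three regular expressions are assumptions about how gradient text, glassmorphism, and generic fonts commonly appear in stylesheets, and most items on the list (centering everything, equal-size card grids) are not covered.

```python
import re

# Heuristic fingerprints; each pattern is an assumption, not a standard rule.
RED_LINE_PATTERNS = {
    "gradient text": re.compile(r"background-clip\s*:\s*text", re.IGNORECASE),
    "glassmorphism": re.compile(r"backdrop-filter\s*:[^;]*blur\(", re.IGNORECASE),
    "generic font": re.compile(
        r"font-family\s*:[^;]*\b(?:inter|roboto)\b", re.IGNORECASE
    ),
}


def red_line_hits(css_text):
    """Return the sorted names of the red lines whose pattern appears."""
    return sorted(
        name for name, pattern in RED_LINE_PATTERNS.items()
        if pattern.search(css_text)
    )


sample_css = """
h1 { font-family: Inter, sans-serif; -webkit-background-clip: text; }
.card { backdrop-filter: blur(12px); }
"""
print(red_line_hits(sample_css))  # ['generic font', 'glassmorphism', 'gradient text']
```

A list like this is a point-in-time snapshot: a clean result means none of today's listed fingerprints were found, not that the design was made for anyone in particular.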

检验信号

Test signal

slop 率下降、独特度上升：把成品丢给陌生人，他会问"这怎么做出来的"，而不是"这哪个 AI 做的"。同时盯用户侧：可用性、共情命中、"这是为我做的"那种感觉。

Slop rate down, distinctiveness up: show the result to a stranger and they ask “how was this made,” not “which AI made this.” Watch the user side too: usability, empathy hits, that “this was made for me” feeling.

DSN

MECHANISM · 为何代码产物被放大

WHY CODE ARTIFACTS WIN

机理

Mechanism

赢的是产物变代码

What wins is the artifact becoming code

把上一节的原理拆到成因层，并划出边界：放大律只管"可被规格约束"的那一半。

Take the prior section’s principle down to the causal layer and mark its boundary: the amplification law governs only the half a spec can bind.

一句话

In one line

让设计被放大的，是产物从二进制变成了文本；代码只放大"能写进规格"的那一半，"有没有灵魂"的另一半，还是归人管。

What amplifies design is the artifact turning from binary into text; code amplifies only the half a spec can hold, and the “does it have a soul” half still belongs to people.

这四个属性是"产物=文本"这个形态的副产物，不是工具的功能。Figma 文件、PSD、私有画布把状态锁进二进制：人能看，agent 却看不懂、diff 不出、生成不了、也验不了。换成 HTML / JSX / tokens.json，同一份设计立刻可读、可 diff、可生成、可验证。所以 pencil / Remotion 这类工具真正被放大的原因，不在它们更好用，在它们选对了产物形态。〔源：本系列工程卷 design-as-code 实践与 pencil／Remotion 等设计即代码工具链，证据级 Ⅳ〕[R3]

These four properties are a byproduct of the “artifact = text” form, not tool features. Figma files, PSDs, proprietary canvases lock state into binary: a human can look; an agent can’t parse it, diff it, generate it, or machine-check it. Swap in HTML / JSX / tokens.json and the same design becomes readable, diffable, generatable, verifiable all at once. That’s the real reason pencil / Remotion-class tools get amplified: not that they’re nicer, but that they picked the right artifact form. 〔Source: this series’ engineering volume design-as-code practice + the pencil / Remotion design-as-code toolchain, grade Ⅳ〕[R3]
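"Diffable" is the easiest of the four properties to show. Once tokens are plain data, a change between two versions reduces to a short, reviewable list. The sketch below assumes nested dictionaries loaded from two versions of a tokens file; `flatten`, `diff_tokens`, and the sample values are hypothetical.

```python
def flatten(tree, prefix=""):
    """Turn nested token groups into {'color.primary': value} paths."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_tokens(old, new):
    """List (path, old_value, new_value) for every token that changed.
    A value of None means the token is absent on that side."""
    before, after = flatten(old), flatten(new)
    return [
        (path, before.get(path), after.get(path))
        for path in sorted(before.keys() | after.keys())
        if before.get(path) != after.get(path)
    ]


v1 = {"color": {"primary": "#1f4fd8"}, "space": {"md": 16}}
v2 = {"color": {"primary": "#0a3cc2"}, "space": {"md": 16, "lg": 24}}
for path, old_value, new_value in diff_tokens(v1, v2):
    print(f"{path}: {old_value} -> {new_value}")
```

The same change inside a binary canvas file would show up as an opaque blob; here a reviewer, or an agent, can see that exactly one color moved and one spacing step was added.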

二进制画布 · 被边缘化

Binary canvas · marginalized

状态锁在私有格式里。agent 只能截图猜、人工转译；每次改动是一团不可读的 diff，生成与验证都无从下手。

State locked in a proprietary format. An agent can only screenshot and guess, or a human transcribes; each change is an opaque diff, with no handhold for generation or verification.

代码 / 文本 · 被放大

Code / text · amplified

对 agent 可读、可 diff／版本、可生成／组合、可机检规格。设计就此进入工程的协作纪律，海量生成可被审、可回滚、可约束。

Legible to agents + diffable/versionable + generatable/composable + machine-checkable spec. Design enters engineering’s collaboration discipline; mass generation becomes reviewable, revertible, constrainable.

边界 · 何时这条律失效

Boundary · when the law stalls

产物变代码只放大可被规格约束的那一半。对齐、间距、token 符合度：可机检，被放大；而"这版有没有灵魂、是否为这群人而做"无法写进类型系统。把设计当纯工程问题，就会优化掉所有可机检的指标，产出一个挑不出错却谁也不想用的界面。代码是杠杆，不是品味的替身。

Artifact-as-code amplifies only the half a spec can bind. Alignment, spacing, token-conformance are machine-checkable and get amplified; whether a version has a soul, whether it is made for these people, cannot be written into a type system. Treat design as a pure engineering problem and you optimize every machine-checkable metric into an interface that passes review yet no one wants to use. Code is leverage, not a stand-in for taste.

为什么是"产物形态"而不是"工具能力"在做功

Why it is the “artifact form,” not “tool capability,” that does the work

很容易把 pencil、Remotion 的价值归功于"它们功能更强、更智能"。这是误判，会让人去追下一个更炫的工具，错过杠杆点。做功的是产物从二进制变成文本这件事本身：它一次性满足了所有对 agent 友好的属性，于是被整个软件协作生态接住。检验的办法很干脆：设想把 Figma 的导出格式换成一份完全可读的语义化文本 schema，其他功能一概不变。仅这一改，agent 就能读它、diff 它、生成它、机检它，价值立刻被放大。

It’s tempting to credit pencil’s or Remotion’s value to “they’re more powerful, smarter.” That’s a misdiagnosis, and it sends people chasing the next flashier tool while missing the real leverage point. What does the work is the very fact that the artifact turns from binary into text: that one change satisfies every agent-friendly property at once, so the entire software-collaboration ecosystem can catch it. The test is blunt: imagine swapping Figma’s export format for a fully readable, semantic text schema and changing nothing else. That one change alone lets an agent read it, diff it, generate it, and machine-check it, and the value is amplified immediately.

反过来：给一个再聪明的 AI 设计工具，只要它把结果存回不可读的私有二进制，agent 照样进不去，杠杆照样不出现。可见做功的是形态，不是聪明。

Conversely: take even the smartest AI design tool. If it stores its results back into an unreadable proprietary binary, the agent still can’t get in and the leverage still doesn’t appear. It’s the form doing the work, not the cleverness.

记症状会过期，懂成因不会——为什么这一节讲机制而非清单

Memorizing symptoms expires; understanding causes does not: why this section teaches mechanism, not a list

有人会问：既然 slop 的指纹能列成清单（DSN 05/09 已经列了），这一节为什么还要费力讲成因？因为清单会过期，成因不会。截至本版（2026-07），slop 的指纹是青配深底、紫蓝渐变、玻璃拟态，但这些只是当下训练分布里最高频的模式；一两年后，等大家都开始反这几条、新的高频模式形成了，slop 的指纹就会换一批面孔——也许是某种新的"高级灰极简"，某种新的版式套路。要是只背了今天这张清单，到时候你手里拿的是一张过期的地图，面对一套全新的 slop 束手无策，甚至因为"它不在我的清单上"而误以为它不是 slop〔源：Patterns（Cell Press）同行评议——生成轨迹坍缩到一组通用视觉母题，坐实"趋同引力可测"，证据级 Ⅱ；指纹清单为时点观察，本版自标 2026-07〕[R8]。

Someone might ask: since slop’s fingerprints can be listed (DSN 05/09 already did), why does this section labor over causes? Because the list expires; the causes don’t. As of this edition (2026-07), slop’s fingerprints are cyan-on-dark, purple-blue gradients, glassmorphism, but these are only the highest-frequency patterns in the current training distribution. A year or two on, once everyone starts rejecting these and a new high-frequency pattern forms, slop’s fingerprints will swap faces: maybe some new “refined-gray minimalism,” some new layout cliché. If you only memorized today’s list, by then you’re holding an expired map, helpless in front of a brand-new slop, maybe even mistaking it for non-slop because “it’s not on my list.” 〔Source: Patterns (Cell Press), peer-reviewed — generation trajectories collapse onto a set of generic visual motifs, confirming “the gravity of convergence is measurable,” grade Ⅱ; the fingerprint list is a point-in-time observation, self-dated 2026-07〕[R8].

但要是懂了成因（slop = 高频 × 安全 × 易生成的交集，也就是对均值的收敛），那不管指纹怎么换脸，你都能用同一把尺认出它：问这版有没有为这群人做过取舍，还是只是选了当下最省事、最不会出错的默认。这就是为什么这一卷始终在讲成因而不是症状：症状是时点的，成因是结构的。背症状让你能对付今天，懂成因才能对付还没出现的明天。

But once you understand the cause (slop = the intersection of high-frequency × safe × easy-to-generate: convergence to the mean), then however the fingerprints change faces, you can recognize it with the same ruler: ask whether this version made a real trade-off for these people, or just picked the most labor-saving, least-wrong default of the moment. That’s why this volume keeps teaching cause rather than symptom: symptoms are point-in-time, cause is structural. Memorizing symptoms handles today; understanding cause handles a tomorrow that hasn’t arrived yet.

有个推论值得点明，它会让人对"反 slop"这件事保持谦卑：今天你引以为傲的"高级"审美，明天也可能变成新的 slop。当下被当成反 slop 的那些做法——克制的留白、单色极简、衬线大标题、不对称网格——要是因为被推崇而变得高频、被无数人模仿、成了生成模型的新默认，就会精确满足"高频 × 安全 × 易生成"这三条，变成下一代的 slop。这不是说这些做法本身不好，是说"好"从来不住在某个具体视觉样式里，住在"是不是为这群人做了取舍"这个动作里。一旦某种样式变成不假思索的默认套用，它就丢了那个动作，不管它曾经多"高级"。

There’s a corollary worth naming, one that keeps you humble about “anti-slop”: the “refined” aesthetic you’re proud of today may become tomorrow’s new slop. Take the practices currently regarded as anti-slop: restrained whitespace, monochrome minimalism, serif headlines, asymmetric grids. If, by getting celebrated, they become high-frequency, imitated by countless people, the new default of generation models, they’ll satisfy exactly the same three conditions (high-frequency × safe × easy-to-generate) and become the next generation’s slop. That’s not to say these practices are bad in themselves. It’s that “good” never lives in a specific visual style: it lives in the act of making a real trade-off for these people. Once a style becomes a thoughtless default applied by rote, it loses that act, however “refined” it once was.

这条推论的实践含义是：反 slop 不是一份能背下来、一劳永逸的"好品味清单"，是一种得每次重新执行的判断动作——每次都重新问"这是为谁、为什么是这样"，而不是套用上次管用的答案。把任何一种审美当成永久正确的安全牌，本身就是滑向 slop 的开始。

The practical upshot: anti-slop is not a “good-taste list” you memorize once and for all but a judgment act you have to re-perform every time. Each time you ask again “for whom, and why this way,” rather than reapplying the answer that worked last time. Treating any aesthetic as a permanently correct safe card is itself the start of sliding into slop.

这条放大律有它精确的边界，说清楚它比夸它更重要。它只作用于"能被规格约束"的那一半设计：对齐、间距、token、可访问性、状态完备度。这一半被代码形态接住之后会被巨幅放大、近乎免费。但另一半——这一版有没有打动人、是不是为它要服务的那群人做的、节奏对不对——没法写进任何类型系统或 lint 规则。

把设计整体当成纯工程问题，就会发生一种隐蔽的退化：你会不自觉地只优化能测的，因为能测的有反馈、能跑 CI、能让进度看得见。于是每个可机检指标都满分，产物却是空的——一个挑不出任何错、却谁也不想多看一眼的界面。这就是"把代码当品味替身"的代价。正确的姿势是：让代码形态把可机检那半彻底自动化，从而腾出人的全部注意力，去守那不可机检的另一半。

This amplification law has a precise boundary, and stating it clearly matters more than praising it. It acts only on the half of design that “can be bound by a spec”: alignment, spacing, tokens, accessibility, state-completeness. Once caught by the code form, that half gets hugely amplified, nearly free. But the other half (whether this version moves anyone, whether it’s made for the people it’s meant to serve, whether the pacing is right) can’t be written into any type system or lint rule.

Treat design wholesale as a pure engineering problem and a hidden degeneration sets in: you unconsciously optimize only what you can measure, because the measurable gives feedback, runs in CI, makes progress visible. So every machine-checkable metric scores full marks while the artifact is hollow: an interface with no findable flaw that no one wants to look at twice. That’s the cost of treating code as a stand-in for taste. The right posture: let the code form fully automate the machine-checkable half, freeing all of the human’s attention to guard the half that can’t be machine-checked.

"只优化可测的"是一种隐蔽的退化——它会自动发生，除非你刻意防

“Optimize only the measurable” is a hidden degeneration: it happens automatically unless you guard against it

这条退化值得单独说，因为它是一种系统会自动滑向的状态，不是某个人偷懒的结果。能测的东西有个不公平的优势：它能给即时反馈，能进 CI，能在仪表盘上变成一条向上的线，能在汇报里被引用。不能测的东西（这版有没有打动人）正相反：反馈慢、说不清、量不出进度。一个团队同时面对这两类目标的时候，注意力会不知不觉往能测的那边倾斜——它持续、廉价地给正反馈，不能测的那边则一直沉默又昂贵。时间一长，团队会在不知不觉中把"好设计"重新定义成"所有可测指标都过关"，而那恰恰是空心 slop 的定义。

防住这条退化需要刻意的反制：在流程里给不能测的判断留出受保护的时间和话语权——比如评审里逼一个人回答"这版指标全过，但它打动人了吗"——并明确承认"可测指标全过"只是必要条件，不是充分条件：它说明没犯低级错误，不说明做出了好东西。把这条写进团队约定，就是抵抗"工程吞掉设计"的护栏。

This degeneration deserves its own treatment, because it’s a state the system slides into automatically, not the result of someone being lazy. The measurable has an unfair advantage: it gives instant feedback, enters CI, becomes an upward line on a dashboard, gets cited in a report. The unmeasurable (does this version move anyone) is the opposite: slow feedback, hard to state, no way to quantify progress. When a team faces both kinds of goal at once, attention tilts imperceptibly toward the measurable side: it continuously, cheaply hands out positive feedback while the unmeasurable side stays forever silent and expensive. Over time the team redefines “good design,” without noticing, as “all measurable metrics pass,” which is exactly the definition of hollow slop.

Guarding against this takes a deliberate counter-move: reserve protected time and standing for unmeasurable judgment in the process (for instance, force someone in review to answer “metrics aside, does this version move anyone”) and explicitly grant that “all measurable metrics pass” is only a necessary condition, not a sufficient one: it says no rookie mistakes were made, not that something good was made. Writing this into the team’s working agreement is the guardrail against engineering swallowing design.

把这条边界正过来说，会得到一个让人安心的结论：代码形态非但不威胁设计师的核心价值，反而是在保护它。它把所有可机检、机械、重复的活——那些过去吞掉设计师大量时间、却最不需要人判断的活——彻底交给了机器，从而腾出设计师的时间和注意力，专投到不可机检的那一半：理解用户、判断好坏、守住品味和意义。代码形态做的是"把人不该做的事拿走"，留下的正是"只有人能做、也最值得人做的事"。

担心"设计被工程吞掉"的人，往往把因果搞反了：被吞掉的不是设计，是设计里本就该自动化的那部分；设计——为人的判断——非但没被吞，反而因为周围杂活被清走，第一次有了充分施展的空间。所以正确的态度不是抵触代码形态，是主动拥抱它来清场，然后把清出来的全部注意力，押在那条目前仍只有人守得稳的人本边界上——这是当下的能力边界，不是永久保证：哪天机器在真实使用里稳定守住了它，这条线就该重画。这就是 DSN 06 这条放大律现在指向的好消息：杠杆归机器，意义归人。

Turn this boundary right-side up and you reach a reassuring conclusion: the code form protects the designer’s core value rather than threatening it. It hands every machine-checkable, mechanical, repetitive job (the jobs that used to eat vast amounts of designer time yet needed the least human judgment) entirely to the machine, freeing the designer’s time and attention to invest wholly in the unmeasurable half: understanding users, judging good and bad, holding taste and meaning. In other words, the code form takes away what people shouldn’t be doing, leaving exactly what only people can do, and most deserve to.

People who worry about “design being devoured by engineering” often have the causality backwards: what gets devoured is not design but the part of design that should have been automated all along. Real design, judgment for people, isn’t devoured at all; with the chores cleared away, it gets room to fully unfold for the first time. So the right move is not to resist the code form but to actively embrace it to clear the ground, then bet all the freed attention on the human boundary that machines have not yet reliably reached. That boundary marks today’s capability, not a permanent guarantee: the day a machine holds it steadily in real use, that line gets redrawn. That’s the good news DSN 06’s amplification law points to for now: leverage to the machine, meaning to people.

DSN

WORKFLOW · AI 设计工作流

THE AI DESIGN WORKFLOW

工件 · 可拷贝

Artifact · Copyable

铺开候选，再收敛——一条可照做的环

Spread candidates, then converge: a loop you can run

把"生成多稿 + 判断"做成一条有节拍的环：规格 → 铺开 → 评判 → 导向 → 收敛 → 沉淀，是工程 SDD 环在设计面的镜像。

Make “generate many, then judge” a loop with a cadence (spec → spread → critique → steer → converge → distill), the engineering SDD loop mirrored on the design surface.

一句话

In one line

六步环里，机器管"铺开"、人管"评判与导向"；只要第⑥步把判断写回规格，护栏就越用越准，否则每轮从均值重启。

In the six-step loop the machine runs “spread” and the human runs “critique and steer”; as long as step ⑥ writes judgment back into the spec, guardrails sharpen with use, otherwise every round restarts from the mean.

交给生成 · 铺面

Hand to generation · cover surface

② 铺开：按规格批量出 6–12 个方向
② Spread: 6–12 directions in a batch, on spec
④ 导向后再生成：选中方向上铺变体
④ Regenerate after steering: spread variants on the chosen direction
补全状态/边角/响应式/文案
Fill states, edge cases, responsive, copy

留给人 · 品味与方向

Keep with humans · taste & direction

① 规格：写下为谁、何为好、不要什么
① Spec: for whom, what is good, what to avoid
③ 评判：挑对路那版、说清为什么
③ Critique: pick the on-target one, say why
⑤⑥ 收敛与沉淀：定稿、把判断回流进系统
⑤⑥ Converge & distill: ship, feed judgment back

上下文怎么流：规格（①）是喂给生成的护栏，决定②铺出来的候选是否偏离约束；③的"为什么好／差"不能只留在脑子里，而要写成下一轮的提示与对系统的修订。判断必须回流，否则每轮都从均值重新开始。下面是一条可拷贝的环：

How context flows: the spec (①) is the guardrail fed to generation, and it determines whether the candidates spread in ② stay within the constraints. ③’s “why good / why weak” cannot stay in your head: write it into the next round’s prompt and into a revision of the system. Judgment must flow back, or every round restarts from the mean. A loop you can copy:

②铺开的关键是"方向多样"，不是"数量多"

The point of ② spread is “directional diversity,” not “high count”

有一个细微但决定成败的区别：铺开 12 个候选，和铺开 12 个方向，是两件完全不同的事。前者常常是同一个想法的 12 次微调——换个色、挪个间距、改个字号——它们挤在审美空间的同一个点附近，给人的判断提供不了任何有意义的对比，只会制造"选择的错觉"。后者是 12 条真正不同的假设：极简文档风 vs 终端美学 vs 杂志排版 vs 卡片流……每一条代表一个对"该怎么为这群人做"的不同回答。判断力只有在面对差异时才有用武之地——你能说出"A 方向的克制更命中这群用户的工程感，B 方向太热闹"，这是判断；而在 12 个微调里挑一个，你只是在表达偏好。所以②的指令不该是"多生成几个"，而该是"生成几个互不相同的方向"，并且在提示里显式要求方向间的差异度。这也解释了 DSN 11 那条反指标——不停"再来一个"却说不出在找什么——的本质：那是在数量轴上空转，而非在方向轴上探索。

There is a subtle but decisive distinction: spreading 12 candidates and spreading 12 directions are entirely different things. The former is often 12 tweaks of the same idea (a different color, a shifted margin, a changed size), crowded near the same point in aesthetic space, offering judgment no meaningful contrast and only manufacturing “the illusion of choice.” The latter is 12 genuinely different hypotheses: minimal-docs vs terminal aesthetic vs magazine typography vs card flow… each a different answer to “how should this be made for these people.” Judgment has work to do only when facing real difference: being able to say “direction A’s restraint hits this audience’s engineering sensibility better, direction B is too busy” is judgment; picking one of 12 tweaks is only expressing a preference. So ②’s instruction should not be “generate a few more” but “generate a few mutually distinct directions,” with the diversity between directions demanded explicitly in the prompt. This also explains the essence of DSN 11’s counter-signal: endless “one more” with no statement of what you seek is spinning on the count axis rather than exploring on the direction axis.
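The difference between “12 tweaks” and “12 directions” can be given a rough, machine-checkable proxy. The sketch below is only an illustration under stated assumptions: each candidate is described by a hand-assigned set of trait tags (the tag names and the `MIN_SPREAD` threshold are hypothetical, not from this volume), and the smallest pairwise Jaccard distance flags a spread that is really one idea repeated. It does not judge which direction is good; that stays with the human in ③.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |a ∩ b| / |a ∪ b|: 0.0 for identical tag sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def min_pairwise_distance(candidates: dict[str, set[str]]) -> float:
    """Smallest distance between any two candidates; a low value means near-duplicates."""
    pairs = combinations(candidates.values(), 2)
    return min((jaccard_distance(a, b) for a, b in pairs), default=0.0)


# Hypothetical trait tags attached to each candidate by a person or a tagging pass.
tweaks = {
    "v1": {"glass", "gradient", "sans", "cards"},
    "v2": {"glass", "gradient", "sans", "cards", "rounded"},
    "v3": {"glass", "gradient", "sans", "grid"},
}
directions = {
    "minimal-docs": {"mono-color", "serif", "single-column"},
    "terminal": {"monospace", "dark", "code-first"},
    "magazine": {"serif", "asymmetric", "large-type"},
}

MIN_SPREAD = 0.5  # assumed threshold; a team would tune this for its own tag vocabulary

for name, batch in (("tweaks", tweaks), ("directions", directions)):
    spread = min_pairwise_distance(batch)
    verdict = "real directions" if spread >= MIN_SPREAD else "tweaks of one idea"
    print(f"{name}: min distance {spread:.2f} -> {verdict}")
```

A check like this can only say that a batch lacks contrast; it cannot say which contrast matters for these people.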

① 规格 SPEC：写下为谁、意图气质、"完成"判据、反 slop 红线（见 DSN 08）。喂给生成前先定，别边生成边想。
① SPEC: write for-whom, intent/character, “done” criteria, anti-slop red lines (see DSN 08). Set it before generating, not while generating.
② 铺开 SPREAD：一次出多个方向（不是一个的微调）。要的是方向多样性，不是数量。
② SPREAD: produce several directions at once (not tweaks of one). You want directional diversity, not count.
③ 评判 CRITIQUE：挑、并说清判据命中／落空在哪。说不出"为什么"，就还没在判断。
③ CRITIQUE: pick, and name where it hits or misses the criteria. If you cannot say “why,” you are not yet judging.
④ 导向 STEER：把评判变成下一轮的具体指令，只在选中方向上深化。
④ STEER: turn the critique into the next round’s concrete instructions; deepen only the chosen direction.
⑤ 收敛 CONVERGE：定一稿，跑可机检的护栏（token／对齐／可访问性）。
⑤ CONVERGE: settle one, run the machine-checkable guardrails (token / alignment / accessibility).
⑥ 沉淀 DISTILL：把这轮新学到的判据回流进设计系统与下次的规格——让护栏越用越准。
⑥ DISTILL: feed the round’s new criteria back into the design system and the next spec; guardrails sharpen with use.
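The six steps above can be sketched as one function, mainly to show where the loop closes. This is a minimal sketch, not a prescribed implementation: the `Spec` and `Critique` types, the callback signatures, and the stub lambdas in the usage lines are all hypothetical. The two points it tries to make concrete are that a critique without a stated “why” is rejected, and that step ⑥ returns a revised spec rather than discarding the judgment.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class Spec:
    for_whom: str
    character: str
    done_when: tuple[str, ...]
    hard_rules: tuple[str, ...]
    learned_criteria: tuple[str, ...] = field(default=())


@dataclass(frozen=True)
class Critique:
    chosen: str                     # the direction picked in step 3
    why: str                        # which criteria it hits or misses
    new_criteria: tuple[str, ...]   # what this round taught, for step 6


def run_round(
    spec: Spec,                                          # step 1, written by a human
    spread: Callable[[Spec], list[str]],                 # step 2, machine
    critique: Callable[[Spec, list[str]], Critique],     # step 3, human
    deepen: Callable[[Spec, Critique], str],             # step 4, machine, steered
    lint: Callable[[str], list[str]],                    # step 5, machine
) -> tuple[str, Spec]:
    candidates = spread(spec)
    verdict = critique(spec, candidates)
    if not verdict.why.strip():
        raise ValueError("no stated 'why': this is picking, not judging")
    final = deepen(spec, verdict)
    violations = lint(final)
    if violations:
        raise ValueError(f"hard rules failed: {violations}")
    # Step 6: write the judgment back so the next round starts from a higher floor.
    next_spec = replace(
        spec, learned_criteria=spec.learned_criteria + verdict.new_criteria
    )
    return final, next_spec


# Usage with stub callbacks standing in for the generator, the designer, and the linter.
spec = Spec("solo developers", "restrained", ("recognizes self",), ("contrast >= 4.5",))
final, next_spec = run_round(
    spec,
    spread=lambda s: ["terminal", "magazine"],
    critique=lambda s, c: Critique(c[0], "hits engineering feel", ("keep first screen sparse",)),
    deepen=lambda s, v: f"{v.chosen}-v2",
    lint=lambda artifact: [],
)
print(final, next_spec.learned_criteria)
```

An open loop is the same function with the second return value thrown away.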

这条环必须闭合——判断不回流，每轮都从均值重启

The loop must close: without feedback, every round restarts from the mean

很多团队把这套做成了一条开环：规格→铺开→挑一个→发版，下一次又从空白开始。开环的致命处在于，第③步那些宝贵的判断——"这版为什么好、那版差在哪、我其实在找什么"——全留在了设计师的脑子里，没有变成任何可复用的东西。于是生成每一轮都从训练分布的均值重新出发，护栏永远停在第一天的水平。这正是为什么环里⑥沉淀这一步是承重的：它把人这一轮做出的判断外化成下一轮的规格修订和系统更新，让护栏带着判断一起长。闭环与开环的差别，不是多走一步流程，而是判断到底有没有复利——开环里每次判断用完即弃，闭环里每次判断都垫高了起点。这与工程卷"上下文成为可查询基设"是同一道：判断只有被写下来、回流进系统，才从一次性的灵光变成团队的资产。

Many teams build this as an open loop: spec → spread → pick one → ship, and next time start from blank again. The fatal flaw of the open loop is that the precious judgment of step ③ (“why this version is good, where that one fails, what I am actually looking for”) all stays in the designer’s head and never becomes anything reusable. So generation restarts every round from the mean of the training distribution, and the guardrails stay frozen at day-one quality. This is exactly why the ⑥ distill step is load-bearing: it externalizes the judgment a human made this round into the next round’s spec revisions and system updates, letting the guardrails grow alongside the judgment. The difference between a closed and an open loop is not one extra process step but whether judgment compounds: in the open loop each judgment is used once and discarded; in the closed loop each one raises the floor. This is the same line as the engineering volume’s “context becomes queryable infrastructure”: judgment becomes a team asset, not a one-off spark, only when it is written down and flows back into the system.

核心图 KEY FIG · FIG. D5.0 / THE AI DESIGN LOOP · 品味是验证器

看懂：comp→变体→评判→收敛，闭环的关键是判断回流

Read: comp→variants→critique→converge; the close is judgment flowing back

外圈是工作流的四个工位，沿顺时针流转。真正让它从"开环堆 slop"变成"闭环长品味"的，是中心那个验证器（人的品味）和左侧那条红色回流箭头——它把③评判出的判据写回①规格。没有这条回流，环只是一台更快的 slop 印刷机；有了它，护栏每转一圈都更准。这与工程卷的 SDD 环是同一招，只是验证器从"正确"换成了"品味"。

The outer ring is the workflow’s four stations, flowing clockwise. What turns it from “an open loop piling up slop” into “a closed loop growing taste” is the verifier at the center (human taste) and the red feedback arrow on the left, which writes the criteria surfaced in ③ critique back into ① spec. Without that feedback the loop is just a faster slop press; with it, the guardrails sharpen each turn. This is the same move as the engineering volume’s SDD loop; only the verifier is swapped from “correctness” to “taste.”

这条环和工程的 SDD 环为什么是同一条——以及关键的那一处不同

Why this loop and engineering’s SDD loop are the same, and the one central difference

工程卷讲过一条 SDD（spec-driven development）环：人写规格 → agent 生成实现 → 人验证 → 不对就回去修规格再生成。把它和这里的设计环并排看，结构完全一致：都是"人定标准 → 机器生成 → 人验证 → 判断回流修标准"的闭环，都靠中心那个不可外包的验证器把生成约束在意图内。这种一致，来自两者都是同一条内核在不同面的实现：执行充裕之后，人退到"定标准 + 验证"这两个判断节点上。

The engineering volume described an SDD (spec-driven development) loop: humans write the spec → agents generate the implementation → humans verify → if wrong, go back, fix the spec, regenerate. Place it beside the design loop here and the structure is identical: both are closed loops of “humans set the standard → machine generates → humans verify → judgment flows back to fix the standard,” both relying on the non-outsourceable verifier at the center to constrain generation within intent. This identity follows from both being implementations of the one kernel on different faces: after execution becomes abundant, people retreat to the two judgment nodes of “set the standard + verify.”

但有一处关键的不同，必须说清，否则会把设计错当成工程：工程环里的验证器验的是正确性，它在原则上可被自动化测试逼近（一个实现要么通过测试要么不通过，边界相对清晰）；而设计环里的验证器验的是品味，它在原则上就不可被完全自动化，因为"对这群人是否对路"没有一个可被测试用例固定的真值。这条不同决定了：工程里人验证的占比会随着测试覆盖率提高而下降，而设计里人验证的占比有一个不会归零的下限：那个下限正是品味，正是 ④ 的常驻人口。把这条记牢，就不会犯"既然能自动化测试，设计是不是也能全自动验收"的错。

But there is one central difference that must be made clear, or design gets mistaken for engineering: the verifier in the engineering loop checks correctness, which in principle can be approached by automated tests (an implementation either passes the tests or not, a relatively clear boundary); the verifier in the design loop checks taste, which in principle cannot be fully automated, because “on-target for these people or not” has no ground truth pinnable by a test case. This difference dictates: in engineering the share of human verification falls as test coverage rises, while in design the share of human verification has a floor that never reaches zero: that floor is precisely taste, precisely the standing population of ④. Hold this firmly and you will not make the error of “since tests can be automated, can design acceptance be fully automated too.”

关于这条环，最后值得提醒一个节奏上的常见错误：把六步压成三步偷偷跑。压力之下，人很容易把"①规格→②铺开→③评判→④导向→⑤收敛→⑥沉淀"压缩成"随便提个示→挑一个→发版"——省掉了写规格、省掉了说清判据、省掉了判断回流。每一步被省掉，都对应一种已经讲过的失败：省①规格＝生成从均值起步（滑向 slop）；省③的"说清为什么"＝评判退化成凭感觉投票（无法导向）；省⑥沉淀＝判断不复利（每轮从零开始）。这条环的价值恰恰在于它逼你不跳步：六步是六个各自承重的检查点，不是流程繁文，每一个都对应一处人类判断必须显式发生的地方。所以照做这条环最大的好处，是它用结构强迫你在每个该判断的地方真的判断了一次，而非"按流程走显得专业"——这正是从"用更快的手"到"用更准的判断"那一步迁移，在日常操作层面的具体抓手。

On this loop, one last reminder about a common rhythm mistake: compressing six steps into three and running them on the sly. Under pressure, people easily compress “① spec → ② spread → ③ critique → ④ steer → ⑤ converge → ⑥ distill” into “throw out a prompt → pick one → ship”: skipping writing the spec, skipping stating the criteria, skipping judgment feedback. Each skipped step corresponds to a failure already discussed: skip ① spec = generation starts from the mean (slides to slop); skip ③’s “say why” = critique degenerates into voting by feel (cannot steer); skip ⑥ distill = judgment does not compound (every round starts from zero). The value of this loop is precisely that it forces you not to skip: the six steps are not process red tape but six individually load-bearing checkpoints, each corresponding to a place where human judgment must explicitly happen. So the biggest benefit of running this loop is not “looking professional by following process” but that its structure forces you to actually judge once at each place a judgment is due. This is exactly the day-to-day handle for that migration from “a faster hand” to “more accurate judgment.”

有效 / 失效的信号

Right / wrong signals

先行指标：每轮铺开的方向真有差异（非一版的微调）；评判时能逐条指名判据；判断回流后下一轮命中率上升。反指标：不停"再生成一个"却说不出在找什么——那是用生成代替判断，环空转，只会更快地堆出同质化候选。

Leading: each spread holds genuinely distinct directions (not tweaks of one); you can name the criteria hit per candidate; hit-rate rises next round after judgment flows back. Counter: endless “generate another” with no statement of what you are looking for. That substitutes generation for judgment, the loop spins, and you only pile up sameness faster.

DSN 07·5

CASE · 一遍完整的环

A WORKED LOOP

案例 · 走一遍

Case · walked through

把抽象的环，在一个真实需求上走一遍

Walking the abstract loop through one real brief

这是方法的自我检验：连最常见的需求都走不顺，就不该被信。

This is the method testing itself: if it cannot run even the most common kind of brief, it does not deserve to be believed.

一句话

In one line

在一个落地页上走完六步环：生成做"做出来"的活，人只做三件机器做不了的事；这六步换任何产物都成立。

Walk the full six-step loop on a landing page: generation does the “making,” the human does only the three things machines cannot; the six steps hold for any artifact.

① 规格。先写下不可外包给生成的人类判断：FOR-WHOM=独立开发者，在焦虑找工具、注意力极短的处境里，要在 30 秒内判断"这是否为我"；CHARACTER=克制、工程感、可信，明确不要欢快插画、不要科技蓝渐变；DONE-WHEN=目标用户一眼认出自己、陌生人问"怎么做的"而非"哪个 AI 做的"；HARD-RULES=只用系统 token、对比度≥4.5:1、无渐变文字、字体白名单排除 Inter/Roboto。② 铺开。把规格喂给生成，一次出 8 个方向（不是一个的微调）：有的走极简文档风、有的走终端/代码美学、有的走杂志排版、有的仍滑回了默认的玻璃拟态。注意：哪怕给了红线，总有几版会漏过——这正说明硬约束需要被写成 lint 在收敛时强制跑，而不能只靠提示里写一句。

① Spec. First write the human judgments that cannot be outsourced to generation: FOR-WHOM = solo developers, anxious and tool-hunting with very short attention, who must judge “is this for me” within 30 seconds; CHARACTER = restrained, engineering-grade, trustworthy, explicitly no cheerful illustration, no tech-blue gradient; DONE-WHEN = the target user recognizes themselves at a glance, a stranger asks “how was this made” not “which AI made it”; HARD-RULES = system tokens only, contrast ≥ 4.5:1, no gradient text, typeface whitelist excluding Inter/Roboto. ② Spread. Feed the spec to generation and produce 8 directions at once (not tweaks of one): some go minimal-docs, some terminal/code aesthetic, some magazine typography, some still slide back to the default glassmorphism. Note: even with red lines given, a few versions slip through, which is exactly why hard constraints must be written as lint and forced to run at convergence, not left to one line in the prompt.

③ 评判。关键不是"挑出最好看的"，而是逐条对规格说清命中/落空：终端美学那版命中了 CHARACTER 的"工程感"和 FOR-WHOM 的"一眼认出自己"，但首屏信息密度过高，违背了"30 秒内判断"；杂志排版那版气质对、可信感强，但太重、加载慢。说不出这些"为什么"，就还没在判断、只是在挑。④ 导向。把评判变成下一轮的具体指令："在终端美学方向上深化，但首屏只留一句价值主张 + 一个真实代码片段，密度降一半。"——只在选中方向上再生成，不重开八个。⑤ 收敛。定一版，跑 HARD-RULES 的 lint：对比度过、token 过、渐变文字零、字体过。⑥ 沉淀。把这轮新学到的判据——"独立开发者落地页首屏密度上限""真实代码片段比抽象插画更命中可信感"——写回设计系统的规格库，下一个类似需求的起点就抬高了。这一步是闭环与开环的唯一差别。

③ Critique. The key is not “pick the prettiest” but to state, item by item against the spec, what hits and what misses: the terminal-aesthetic version hits CHARACTER’s “engineering-grade” and FOR-WHOM’s “recognize yourself at a glance,” but its above-the-fold information density is too high, violating “judge within 30 seconds”; the magazine-typography version has the right character and strong trust, but is too heavy and loads slowly. If you cannot say these “whys,” you are not judging yet, only picking. ④ Steer. Turn the critique into the next round’s concrete instruction: “deepen the terminal-aesthetic direction, but above the fold keep only one value proposition + one real code snippet, halve the density.” Regenerate only on the chosen direction; do not reopen all eight. ⑤ Converge. Settle one version and run the HARD-RULES lint: contrast passes, tokens pass, zero gradient text, typeface passes. ⑥ Distill. Write the round’s new criteria (“above-the-fold density ceiling for solo-developer landing pages,” “a real code snippet hits trust better than abstract illustration”) back into the design system’s spec library, and the starting point for the next similar brief is raised. This step is the only difference between a closed and an open loop.
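Three of the case’s HARD-RULES (system tokens only, no gradient text, typeface allowlist) can be sketched as a lint that runs at ⑤. This is an illustrative sketch over a simplified style record, not a real CSS parser; the token names, the allowlisted typefaces, and the `lint_rule` helper are assumptions made for the example.

```python
# Hypothetical design-system values; a real project would load these from its token file.
ALLOWED_COLOR_TOKENS = {"--color-ink", "--color-paper", "--color-accent"}
ALLOWED_TYPEFACES = {"IBM Plex Mono", "Source Serif 4"}


def lint_rule(selector: str, decl: dict[str, str]) -> list[str]:
    """Return the hard-rule violations found in one selector's declarations."""
    problems: list[str] = []

    # Rule: colors must be system tokens, written as var(--token-name).
    color = decl.get("color")
    if color is not None:
        is_token = color.startswith("var(") and color[4:-1] in ALLOWED_COLOR_TOKENS
        if not is_token:
            problems.append(f"{selector}: off-token color {color}")

    # Rule: the first family in the font stack must be on the allowlist.
    font = decl.get("font-family")
    if font is not None:
        primary = font.split(",")[0].strip().strip("\"'")
        if primary not in ALLOWED_TYPEFACES:
            problems.append(f"{selector}: typeface {primary} not on allowlist")

    # Rule: no gradient text, which is typically a gradient background clipped to glyphs.
    clipped = decl.get("background-clip") == "text"
    if clipped and "gradient(" in decl.get("background", ""):
        problems.append(f"{selector}: gradient text")

    return problems


clean = {"color": "var(--color-ink)", "font-family": "IBM Plex Mono, monospace"}
slipped = {
    "color": "#00e5ff",
    "font-family": "Inter, sans-serif",
    "background": "linear-gradient(90deg, #a0f, #0ff)",
    "background-clip": "text",
}

for selector, decl in ((".hero h1", clean), (".hero h2", slipped)):
    for problem in lint_rule(selector, decl):
        print(problem)
```

Because the check reads only the artifact, it catches the versions that slipped past the red lines in the prompt without anyone having to look.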

为什么这一遍能推广——它没用任何落地页特有的东西

Why this walk generalizes: it used nothing specific to landing pages

这个案例用的是落地页，但注意：上面六步里，没有一步依赖"它是落地页"这个事实。把对象换成一个移动 App 的引导流、一份数据报表、一套图标系统、甚至一段产品视频，六步的结构原样成立：只有 FOR-WHOM 的内容、HARD-RULES 的具体阈值、③评判时对照的判据会换，而"先写有判别力的规格→铺真正不同的方向→逐条说清判据命中→只在选中方向上深化→收敛跑硬约束→把判断回流"这个骨架不变。这正是判断一套方法是不是抓到了底层结构的试金石：它的步骤能不能在不改结构的前提下，换一个对象重跑一遍。

能，说明它抓的是结构；只对某一类对象成立，说明它抓的是表象。这个落地页案例之所以放在这里，是因为它最常见、最容易被读者拿自己手上的活去对照，不是因为落地页本身特别重要：你完全可以现在就把它换成你正在做的东西，走一遍这六步，看卡在哪一步。卡住的那一步，往往就是你现在最该补的能力。

This case uses a landing page, but note: among the six steps above, not one depends on the fact that it is a landing page. Swap the object for a mobile app’s onboarding flow, a data report, an icon system, even a product video, and the six-step structure holds intact: only the content of FOR-WHOM, the specific thresholds of HARD-RULES, and the criteria checked against in ③ change. The skeleton “write a discriminating spec → spread genuinely different directions → state criteria hits item by item → deepen only the chosen direction → converge and run hard constraints → feed judgment back” stays fixed. This is exactly the touchstone for whether a method has caught the underlying structure: whether its steps can rerun on a different object without changing the structure.

If yes, it caught the structure; if it holds only for one kind of object, it caught the appearance. This landing-page case sits here not because landing pages are especially important but because they are most common and easiest for a reader to check against their own work: you could right now swap it for whatever you are making, walk the six steps, and see where you get stuck. The step you get stuck on is often the capability you most need to build next.

这一遍里有一个细节值得单独拎出来，因为它最容易被跳过、却最决定成败：③评判时"说清为什么"这个动作，是评判的核心，不是修饰。很多人以为评判就是"挑出那个最好的"，挑完就完事；但如果你说不出"它为什么对路、落选的那些差在哪条判据上"，你做的就只是凭感觉投票。这个区别有实在的后果：说不出理由，④导向就无从下手（你不知道该让生成往哪个具体方向深化），⑥沉淀也无从谈起（你没有可回流的判据）。所以"说清为什么"这个看似多余的口头动作，其实是把一次性的直觉判断转化成可导向、可回流、可复用的结构化判断的关键一步。

One detail in this walk deserves to be pulled out on its own, because it is the easiest to skip yet the most decisive: the act of “saying why” in ③ critique is its very substance, not an ornament of judgment. Many assume critique is “pick the best one” and you are done; but if you cannot say “why it is on-target, on which criterion each reject falls short,” what you did is just voting by feel. This distinction has real consequences: with no reason stated, ④ steer has no handle (you do not know which concrete direction to deepen), and ⑥ distill is out of the question (you have no criteria to feed back). So the seemingly redundant verbal act of “saying why” is in fact the key step that converts a one-off intuitive judgment into a steer-able, feed-back-able, reusable structured judgment.

一个简单的自律：每次挑完候选，强迫自己写下三句话：这版命中了哪条判据、落选的主要差在哪、下一轮该往哪个方向深化。写不出这三句，说明你还停在"挑"，没进到"判"。这一条自律，几乎是整条闭环能不能真正闭合的开关。

A simple discipline: after picking a candidate, force yourself to write three sentences: which criterion this version hits, where the main reject falls short, which direction to deepen next round. If you cannot write these three, you are still at “picking,” not yet at “judging.” This one discipline is nearly the switch for whether the whole loop actually closes.

这一遍验证了什么

What this walk verifies

注意全程：生成做了所有"做出来"的活（铺 8 个方向、深化、补全），人只做了三件机器做不了的事：把规格写得有判别力、逐条说清判据命中、把判断回流进系统。产出量不是这一遍的功劳，命中率才是：第二轮就收敛，是因为第一轮的判断没有白费。这就是 DSN 03→07 在一个真实需求上的合一。

Note throughout: generation did all the “making” (spread 8 directions, deepen, fill in), and the human did only the three things machines cannot: write a discriminating spec, state criteria hits item by item, feed judgment back into the system. Output volume is not the win of this walk; hit-rate is: it converged on the second round because the first round’s judgment was not wasted. This is DSN 03→07 made one on a real brief.

DSN

SPEC · 何为好的规格

THE SPEC OF GOOD

工件 · 模板

Artifact · Template

把"好"写成可生成、半可机检的规格

Write “good” as a spec that is generatable and half machine-checkable

接 DSN 04，这里给可拷贝的规格样例：硬约束并入①充裕，软判据留在④判断。

Following DSN 04, here is a copyable spec sample: hard constraints fold into ① abundance, soft criteria stay at ④ judgment.

一句话

In one line

把"好"写成可拷贝的规格，切两层：硬约束写成 lint 进 CI，软判据只能人评；最有判别力的一项往往是写下"不要什么"。

Write “good” as a copyable spec in two layers: hard constraints become lint in CI, soft criteria are judged only by people; its most discriminating item is often writing down “what not to be.”

为什么要分两层：把软判据硬塞进 lint，会得到对齐完美却没灵魂的 slop；把硬约束留给人眼盯，会把人的注意力耗在机器该管的事上。分层后，机器守住"不离牌"，人专注守"是否为人"。一份可拷贝的规格骨架（贴进 repo 的 design-spec.md 即可喂生成）：

Why split the layers: forcing soft criteria into lint yields pixel-perfect, soulless slop; leaving hard constraints to the human eye burns attention on what a machine should own. Split, the machine holds “stay on-brand” while the human guards “is it for people.” A copyable spec skeleton (drop it into a repo’s design-spec.md to feed generation):

FOR-WHOM（软·人）：谁用、在什么处境、要完成什么、要感受到什么。例：给独立开发者，焦虑找工具时，30 秒内判断"这是否为我"。
FOR-WHOM (soft · human): who, in what situation, to do what, to feel what. e.g. for solo devs, anxious tool-hunting, to judge “is this for me” within 30 seconds.
CHARACTER（软·人）：该有的调性 + 明确的"不要"。例：要克制、工程感；不要欢快、不要科技蓝渐变。
CHARACTER (soft · human): the tone it carries + an explicit “not.” e.g. restrained, engineering-grade; not playful, no tech-blue gradient.
DONE-WHEN（软·人）：一组验收信号。例：陌生人问"怎么做的"而非"哪个 AI 做的"；目标用户一眼认出自己。
DONE-WHEN (soft · human): a set of acceptance signals. e.g. a stranger asks “how was this made,” not “which AI”; the target user recognizes themselves at a glance.
HARD-RULES（硬·可机检）：只用系统 token；对比度 ≥ 4.5:1；触达尺寸 ≥ 44px；无离系统的字号/间距；命中"反 slop 红线"零次。这一层写成 lint，CI 里跑。
HARD-RULES (hard · machine-checkable): system tokens only; contrast ≥ 4.5:1; hit-target ≥ 44px; no off-system sizes/spacing; zero hits on the anti-slop red lines. This layer becomes lint, run in CI.
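Two of the HARD-RULES in the skeleton above, contrast ≥ 4.5:1 and hit-target ≥ 44px, are the kind that can be written as assertions. The sketch below computes the contrast ratio from the WCAG 2.x relative-luminance definition; the shape of the `element` record and the `check_hard_rules` helper are assumptions made for illustration, not part of any particular tool.

```python
MIN_CONTRAST = 4.5        # from the spec's HARD-RULES
MIN_HIT_TARGET_PX = 44    # from the spec's HARD-RULES


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def check_hard_rules(element: dict) -> list[str]:
    """Check one element record (hypothetical shape) against the two thresholds."""
    failures: list[str] = []
    ratio = contrast_ratio(element["fg"], element["bg"])
    if ratio < MIN_CONTRAST:
        failures.append(f"contrast {ratio:.2f} is below {MIN_CONTRAST}")
    if element.get("interactive"):
        if min(element["width_px"], element["height_px"]) < MIN_HIT_TARGET_PX:
            failures.append(f"hit target is below {MIN_HIT_TARGET_PX}px")
    return failures


button = {"fg": "#777777", "bg": "#ffffff", "interactive": True,
          "width_px": 120, "height_px": 36}
print(check_hard_rules(button))
```

Nothing in this check needs to know who the page is for, which is what qualifies these rules for the hard layer.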

分诊判据

Triage test

某条约束该进硬层还是软层？问一句："无需理解用户，仅看产物文本就能判定对错吗？"能 → 硬层（lint / CI，并入①）；不能 → 软层（留给人判，留在④）。把这条问句对每条规格走一遍，就得到一份不假装品味可计算的规格。

Hard layer or soft? Ask: “can this be ruled right or wrong from the artifact text alone, without understanding the user?” Yes → hard (lint / CI, folded into ①); no → soft (judged by people, kept at ④). Run every constraint through this question and you get a spec that does not pretend taste is computable.
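As a small illustration of the triage question, the routing itself is mechanical once a person has answered it for each constraint. In this sketch the `Constraint` type and the `decidable_from_artifact` field are hypothetical names; the flag records a human answer, and the code only sorts constraints into the two layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    text: str
    # Human answer to: "can this be ruled right or wrong from the artifact alone,
    # without understanding the user?"
    decidable_from_artifact: bool


def triage(constraints: list[Constraint]) -> dict[str, list[str]]:
    """Route each constraint to the hard layer (lint / CI) or the soft layer (human review)."""
    layers: dict[str, list[str]] = {"hard": [], "soft": []}
    for c in constraints:
        layers["hard" if c.decidable_from_artifact else "soft"].append(c.text)
    return layers


spec_items = [
    Constraint("contrast >= 4.5:1", True),
    Constraint("hit target >= 44px", True),
    Constraint("system tokens only", True),
    Constraint("target user recognizes themselves at a glance", False),
    Constraint("restrained, engineering-grade character", False),
]

layers = triage(spec_items)
print("hard:", layers["hard"])
print("soft:", layers["soft"])
```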

一份好规格的标志：它能让别人（或 agent）替你做出"你会认的"东西

The mark of a good spec: it lets someone else (or an agent) make something “you would sign off on”

怎么判断一份设计规格写得够不够好？有一个干脆的操作性判据：把它交给一个没和你聊过、不在你脑子里的人或 agent，让对方据此生成；产物回来，你认不认？认，说明规格把你脑中那把尺真的外化出来了；不认，说明判据还藏在你的直觉里，没写下来——而藏在直觉里的标准，生成既学不到也评不了。这条判据之所以有用，是因为它把"规格质量"从一个主观感受变成了一次可重复的实验：换三个不同的人/agent 跑同一份规格，若产出彼此接近且都在你的可接受带内，规格就收敛了；若产出四散，说明规格里还有大量留白被各自的均值填上了。好规格写得有判别力，不是写得详尽——它能把"你要的"和"看起来差不多但不对的"区分开。

How do you tell whether a design spec is good enough? There is a blunt operational test: hand it to a person or agent who has never talked with you and is not inside your head, have them generate from it; when the artifact comes back, do you sign off? If yes, the spec really externalized the ruler in your head; if no, the criteria are still hiding in your intuition, unwritten, and a standard that lives in intuition is one generation can neither learn nor be judged against. This test is useful because it turns “spec quality” from a subjective feeling into a repeatable experiment: run the same spec past three different people/agents, and if their outputs are close to each other and all inside your acceptance band, the spec has converged; if outputs scatter, there is still a lot of blank in the spec that each filled with its own mean. A good spec is not exhaustive but discriminating: it tells “what you want” apart from “what looks about right but is wrong.”
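The repeatable experiment described above can be recorded in a few lines. This is a sketch under assumptions: each run carries a human sign-off and a hand-assigned set of trait tags, and “close to each other” is approximated by a maximum pairwise tag distance with an assumed `max_spread` threshold. The sign-off is still a human judgment; the code only aggregates it.

```python
from itertools import combinations


def tag_distance(a: set[str], b: set[str]) -> float:
    """Share of tags not common to both outputs (0.0 identical, 1.0 disjoint)."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def spec_converged(runs: list[dict], max_spread: float) -> bool:
    """True if every output was signed off and no two outputs differ by more than max_spread."""
    all_accepted = all(run["signed_off"] for run in runs)
    widest = max(
        (tag_distance(a["traits"], b["traits"]) for a, b in combinations(runs, 2)),
        default=0.0,
    )
    return all_accepted and widest <= max_spread


# Three independent generators given the same spec (hypothetical results).
tight = [
    {"by": "agent-a", "signed_off": True, "traits": {"monospace", "dark", "code-first"}},
    {"by": "agent-b", "signed_off": True, "traits": {"monospace", "dark", "code-first", "dense"}},
    {"by": "designer-c", "signed_off": True, "traits": {"monospace", "dark", "single-cta"}},
]
scattered = [
    {"by": "agent-a", "signed_off": True, "traits": {"monospace", "dark"}},
    {"by": "agent-b", "signed_off": False, "traits": {"glass", "gradient"}},
    {"by": "designer-c", "signed_off": True, "traits": {"serif", "magazine"}},
]

print(spec_converged(tight, max_spread=0.7))      # outputs cluster and all were accepted
print(spec_converged(scattered, max_spread=0.7))  # blanks in the spec were filled differently
```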

这与验证篇"人定何为对"是同一道工序的两个面：验证那边写的是正确性的判据（可被测试用例固定），设计这边写的是好的判据（一半可机检、一半只能由人认）。两边共享一个深层结构：当执行变充裕，"判据"本身成了不可外包的人类产物。模型可以生成无穷多候选，但"按什么标准收"必须由人给定——给不出标准，就只能默认收到均值。所以写规格是设计师在 AI-Native 时代最核心的动作之一，不是流程文档工作：它就是把"判断"这件稀缺的事，沉淀成可复用、可回流、可喂给生成的资产。

This and the Verification chapter’s “humans define what’s right” are two faces of the same operation: verification writes the criteria of correctness (pinnable by test cases); design writes the criteria of good (half machine-checkable, half only human-acceptable). Both share a deep structure: once execution becomes abundant, the “criteria” themselves become the non-outsourceable human artifact. A model can generate infinitely many candidates, but “by what standard to converge” must be supplied by a human: supply no standard and you default to converging on the mean. So writing the spec is not process-documentation busywork but one of the designer’s most central moves in the AI-Native era: it is precisely the act of distilling the scarce thing, judgment, into a reusable, feed-back-able, generation-feedable asset.

规格的两个反面：写太死，和写太空

A spec’s two failure modes: over-pinned, and over-empty

写规格有两个对称的失败，理解它们能帮你找到那条窄路。写太死：把每个像素、每个色值、每个间距都规定死，本质上是用文字重新画了一遍稿：这既没利用生成的探索能力（你已经把答案写死了，生成只剩复刻），也把本该留给方向探索的②铺开压成了零自由度。更隐蔽的是，写太死往往是把软判据误当硬约束的产物：你以为自己在写规格，其实在写一份固执的个人偏好，堵死了所有可能更好的方向。

Writing a spec has two symmetric failures, and understanding them helps you find the narrow path. Over-pinned: nailing down every pixel, color value, and margin is essentially redrawing the comp in words: it neither uses generation’s exploratory power (you have already fixed the answer, leaving generation only to replicate) nor leaves the ② spread any degrees of freedom for directional exploration. More insidiously, over-pinning is often the product of mistaking soft criteria for hard constraints: you think you are writing a spec but are writing a stubborn personal preference that blocks every potentially better direction.

写太空：只写"做个现代、专业、好用的界面"，等于什么都没说：这些词对每个项目都成立，因此对这个项目毫无判别力，生成只能回你均值。这是把规格写成了正确的废话。那条窄路是：硬约束写到可机检的精确（这一侧越死越好，因为它本就该自动化），软判据写到有判别力但不限定具体形态（写清"为谁、什么气质、什么算完成、不要什么"，但把"具体长什么样"留给②去探索、③去判断）。一句话：规格要锁死目标，敞开路径。

Over-empty: writing only “make a modern, professional, usable interface” is saying nothing: these words hold for every project and therefore have no discriminating power for this one, so generation can only return you the mean. This writes the spec as correct nonsense. The narrow path is to write hard constraints to machine-checkable precision (the more fixed this side the better, since it should be automated anyway) and to write soft criteria so that they discriminate without fixing concrete form (spell out “for whom, what character, what counts as done, what to avoid,” but leave “what it concretely looks like” to ② to explore and ③ to judge). In a phrase: a spec locks the target and opens the path.

最有判别力的一项，往往是"明确不要什么"

The most discriminating item is often “explicitly what not to be”

在所有规格项里，有一项的判别力被严重低估：明确写下"不要什么"。正面描述"要什么"很容易写成正确的废话："要现代、要专业、要好用"对任何项目都成立，因此对收敛几乎没用。但反面的"不要什么"天然带判别力，因为它直接对准了生成最可能滑向的那个均值。写下"不要科技蓝渐变、不要欢快插画、不要玻璃拟态、不要把它做成又一个 SaaS dashboard"，等于在生成出发之前就把那条通往 slop 的最宽的路堵死了。这背后的机制是：生成的默认就是均值，而"不要什么"恰恰是在描述均值的形状：你越清楚自己要避开的那个默认长什么样，你的规格就越能把生成推离它〔源：Patterns（Cell Press）同行评议——生成无强约束时坍缩到分布均值的通用母题，证据级 Ⅱ；测的是趋同引力，不外推具体比例〕[R8]。

所以一份好规格里，"不要"清单往往比"要"清单更有信息量、更省下游的判断成本。这也和 DSN 09 的反 slop 红线接上了：那张红线表，本质上就是一份通用的、可机检的"不要什么"，把它纳进每个项目的规格，你就免费获得了一道挡住最常见 slop 的护栏。

Among all spec items, one has badly underrated discriminating power: explicitly writing down “what not to be.” A positive description of “what to be” easily becomes correct nonsense: “be modern, be professional, be usable” holds for any project and is therefore almost useless for convergence. But the negative “what not to be” carries discriminating power by nature, because it aims directly at the mean generation is most likely to slide toward. Writing “no tech-blue gradient, no cheerful illustration, no glassmorphism, do not make it yet another SaaS dashboard” blocks, before generation even sets out, the widest road to slop. The mechanism behind this: generation’s default is the mean, and “what not to be” is precisely describing the shape of the mean: the clearer you are about what the default you want to avoid looks like, the more your spec can push generation off it 〔Source: Patterns (Cell Press), peer-reviewed — with no strong constraint, generation collapses onto the generic motifs at the distribution’s mean, grade Ⅱ; measures the gravity of convergence, no specific proportion extrapolated〕[R8].

So in a good spec, the “not” list is often more informative than the “to be” list and saves more downstream judgment cost. This connects to DSN 09’s anti-slop red lines: that red-line table is essentially a general, machine-checkable “what not to be.” Fold it into every project’s spec and you get, for free, a guardrail against the most common slop.

还要厘清一个关于"半可机检"的常见误解：这个"半"字不是说有一半判据模棱两可、说不清，而是说判据在被分类之后，恰好一半能交给机器、一半必须留给人，且两边都各自清晰。硬约束那半（对比度、token、触达尺寸）清晰到可以写成断言、跑进 CI、给出确定的通过/不通过；软判据那半（为谁、气质、是否打动人）也可以清晰——清晰到能写下"目标用户一眼认出自己""陌生人问怎么做的而非哪个 AI"这样具体到可被人验收的信号——只是它的验收者是人而非机器。所以"半可机检"是一种诚实的精确：它既不假装品味能被算法判定（那是把软的硬塞），也不把可机检的部分推给人眼盯（那是浪费判断力）。一份好规格的高级之处，恰恰在于它清楚地知道自己的每一条该落在哪半，并据此把对的那半交给对的判定者。这种"知道什么该交给机器、什么必须留给人"的清醒，本身就是这套方法论在规格这个具体工件上的体现。

One common misunderstanding about “half machine-checkable” also needs clearing up. The “half” does not mean that half the criteria are ambiguous and cannot be stated; it means that once the criteria are classified, one half can go to the machine and the other half must stay with people, and both halves are clear in their own way. The hard-constraint half (contrast, tokens, touch-target size) is clear enough to write as assertions, run in CI, and return a definite pass/fail. The soft-criteria half (for whom, character, whether it moves anyone) can be clear too: clear enough to write down signals a human can accept against, such as “the target user recognizes themselves at a glance” or “a stranger asks how it was made, not which AI.” The only difference is that its acceptor is a human, not a machine. So “half machine-checkable” is an honest precision: it neither pretends taste can be algorithmically decided (forcing the soft into the hard) nor pushes the machine-checkable part onto human eyes (wasting judgment). The sophistication of a good spec lies precisely in knowing which half each of its items belongs to, and handing the right half to the right adjudicator accordingly. This clear-headedness (knowing what to give the machine and what must stay with people) is itself this methodology made manifest on the concrete artifact of the spec.
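The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the volume's own tooling: `Criterion`, `route`, `SPEC` and the artifact keys are hypothetical names, the 44 px touch-target floor is an assumed house rule, and the contrast function follows the WCAG 2.x relative-luminance formula with the 4.5:1 AA threshold for normal-size text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def _linear(channel: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative-luminance formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a #rrggbb color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


@dataclass
class Criterion:
    name: str
    # check is None => soft criterion: stated clearly, but accepted by a human.
    check: Optional[Callable[[dict], bool]] = None


def route(criteria, artifact):
    """Run the hard half now; return the soft half as a review list for people."""
    machine = {c.name: c.check(artifact) for c in criteria if c.check is not None}
    human = [c.name for c in criteria if c.check is None]
    return machine, human


# Hypothetical spec: two hard constraints, two soft signals quoted from the text.
SPEC = [
    Criterion("body text contrast >= 4.5:1",
              lambda a: contrast_ratio(a["text"], a["background"]) >= 4.5),
    Criterion("touch target >= 44 px", lambda a: a["min_target_px"] >= 44),
    Criterion("target user recognizes themselves at a glance"),
    Criterion("a stranger asks how it was made, not which AI"),
]

machine, human = route(
    SPEC, {"text": "#777777", "background": "#ffffff", "min_target_px": 48}
)
```

Here `machine` carries a definite pass/fail per hard item (the gray `#777777` on white falls just short of 4.5:1), while `human` is simply the list of signals still waiting for a person. Neither half is vague; they only differ in who adjudicates.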

DSN

ANTI-SLOP · 异质审美的守护

ANTI-SLOP & HETEROGENEITY

机理 · 失效

Mechanism · Failure

slop 是同质化，解药是"只对这群人成立"

Slop is homogenization; the cure is “true only for these people”

先看 slop 的成因与可枚举的指纹（下表），再谈解药。

First the mechanism behind slop and its enumerable fingerprints (table below), then the cure.

一句话

In one line

slop 是做得像所有人：问题在独特性轴，不在质量轴，"再精修"治不了。解药是把审美钉在一群具体的人身上，对所有人都"还行"正是滑回均值的征兆。

Slop is work done like everyone else’s: its problem is on the distinctiveness axis, not the quality axis, so “more polish” cannot cure it. The cure is pinning the aesthetic to a specific group of people; being “fine” for everyone is exactly the sign of sliding back to the mean.

成因分析：模型为最小化期望损失，会偏向最高频的视觉模式——青配深底、紫蓝渐变、玻璃拟态、Inter 居中、巨数字仪表盘。它们之所以是 slop，不因为丑，而因为到处都是、谁也不为。下表把它们做成可机检的红线条目（接 DSN 08 的 HARD-RULES），命中即扣分：

Causal analysis: to minimize expected loss, a model leans toward the highest-frequency visual patterns: cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards. They are slop not because they are ugly but because they are everywhere and for no one. The table below turns them into machine-checkable red-line items (feeding DSN 08’s HARD-RULES); each hit costs points:

指纹 · 配色

FINGERPRINT · COLOR

青配深底 / 霓虹 / 紫蓝渐变

Cyan-on-dark / neon / purple-blue gradient

修法：从品牌或主题取一个真实的、有来由的主色，限定调色板，删掉渐变文字。

Fix: take one real, motivated primary from brand or subject; constrain the palette; kill gradient text.

指纹 · 材质

FINGERPRINT · MATERIAL

玻璃拟态 / 处处大圆角 + 柔投影

Glassmorphism / rounded-everything + soft shadow

修法：让材质服务层级而非装饰；多数表面用实色与硬边界，模糊只留给真正悬浮的层。

Fix: let material serve hierarchy, not decoration; flat surfaces and hard borders for most, blur only for truly floating layers.

指纹 · 排版

FINGERPRINT · TYPE

Inter/Roboto · 万物居中

Inter/Roboto · centering everything

修法：选一款有性格的字（含对比的衬线/特征字形）；建立左对齐为主的真实排版网格。

Fix: pick a typeface with character (a contrasting serif / distinctive forms); build a real left-aligned typographic grid.

指纹 · 布局

FINGERPRINT · LAYOUT

等大卡片网格 · 巨数字仪表盘模板

Equal-card grid · big-number dashboard template

修法：按内容权重定尺寸差异；用真实层级与节奏，而非把一切塞进等大盒。

Fix: size by content weight; use real hierarchy and rhythm instead of stuffing all into equal boxes.

指纹 · 图标

FINGERPRINT · ICON

每个标题上方一个大圆角图标

A big rounded icon above every heading

修法：图标只在帮助识别时用；多数标题靠文字与排版承担，不靠装饰图占位。

Fix: use icons only where they aid recognition; let words and type carry most headings, not decorative placeholders.

指纹 · 语气

FINGERPRINT · VOICE

空洞口号 · "赋能/革命性/无缝"

Empty slogans · “empower / revolutionary / seamless”

修法：用具体名词与动词写真实价值；删掉所有不增信息的形容词堆叠。

Fix: write real value in concrete nouns and verbs; delete every adjective stack that adds no information.

slop 不是"做得差"，是"做得像所有人"——同质化才是失败模式

Slop is “done like everyone,” not “done badly”: homogenization is the failure mode

把 slop 理解成"粗制滥造"会指向错误的解药——以为多打磨就能解决。真相相反：slop 往往做得很精，对齐完美、配色和谐、动效顺滑，每个可机检指标都满分。它的问题不在质量轴，在独特性轴：它收敛到了所有人见得最多的那个样子。这就是为什么"再精修一遍"治不了 slop——你是在沿错误的那条轴用力。模型为最小化期望损失，天然偏向训练分布里最高频的视觉模式；高频意味着"大家都这么做"，"大家都这么做"意味着对谁都不特别。所以 slop 的反面是"更具体"，不是"更高级"：具体到只为某一群人、某一个品牌、某一种处境成立。一个设计若让圈外人无感、却让目标用户心头一动，那不是缺陷，是它找到了自己的边界。

Reading slop as “shoddy” points to the wrong cure: the belief that more polish fixes it. The truth is the opposite: slop is often finely made, with perfect alignment, harmonious color, smooth motion, every machine-checkable metric at full marks. Its problem is not on the quality axis but the distinctiveness axis: it has converged on the shape everyone has seen most. That is why “polish it once more” cannot cure slop: you are pushing along the wrong axis. To minimize expected loss, a model naturally leans toward the highest-frequency visual patterns in its training distribution; high-frequency means “everyone does this,” and “everyone does this” means special to no one. So the opposite of slop is not “more refined” but “more specific”: specific enough to be true only for some group, some brand, some situation. A design that leaves outsiders cold yet moves the target user is not flawed; it has found its boundary.

异质审美的守护，因此是一条结构性的设计纪律，而不是风格偏好。当生成把"达到平均水准"变成免费，整个行业的默认产出会一起向均值滑——这不是某个团队的懒惰，是生成经济学的引力。对抗它需要主动的、付出代价的选择：明确"这只对这群人成立"，并接受"对另一群人不成立"是这个选择的必然代价而非失误。这恰恰是组织卷那条人本主线在设计面上最锋利的体现——为人，意味着为具体的人，而不是为统计意义上的"所有人"。下面这张图把这条引力与对抗它的力画在一起。

Guarding heterogeneity is therefore a structural design discipline, not a style preference. Once generation makes “reaching average quality” free, the whole industry’s default output slides toward the mean together: this is not any one team’s laziness but the gravity of generation economics. Resisting it takes an active, costly choice: stating “this is true only for these people,” and accepting that “it is not true for those people” is the necessary cost of that choice, not a mistake. This is precisely the sharpest expression, on the design surface, of the org volume’s human through-line: being for people means being for specific people, not for the statistical “everyone”. The figure below plots this gravity against the force that resists it.

图 FIG. D6.0 / HETEROGENEITY GUARD · 同质化滑向均值 vs 异质守护

看懂：生成引力把审美拖向均值，异质守护把它钉在"这群人"

Read: generation gravity drags aesthetics to the mean; the guard pins them to “these people”

每个圆是一个有识别度的设计。生成引力（灰虚线）把它们都往中间那个均值吸——这就是同质化失败模式：不加抵抗，所有设计都滑成 slop。唯一的对抗（红箭头）是主动把自己钉在某一群人身上，并接受"对其它群无感"是这个选择的代价而非失误。守异质，就是守"为具体的人"。

Each circle is a design with an identity. Generation gravity (gray dashed) pulls them all toward the mean in the middle. That is the homogenization failure mode: unresisted, every design slides into slop. The only counter (red arrow) is to actively pin yourself to one group of people and accept that “cold to other groups” is the cost of that choice, not a mistake. Guarding heterogeneity is guarding “for specific people.”

同行评议实证 · 均值化已经发生

Peer-reviewed evidence · the mean is already here

本卷把"均值 = slop"当作成因分析的结论提出；它已不只是断言。一项发表于 Patterns（Cell Press）的同行评议研究分析了 700 条生成轨迹、横跨 7 档采样温度，发现它们几乎全部坍缩到同一组 12 个通用视觉母题——无论怎么调采样，生成都收敛回那一小撮"见得最多的样子"。这把"生成默认滑向均值"从趋势话术变成可引证的实测：slop 不是偶发劣稿，是无人施加判断时，生成系统的稳态吸引子。〔源：Patterns（Cell Press），同行评议；Hintze、Proschinger Åström & Schossau《Autonomous language-image generation loops converge to generic visual motifs》，2025;7(1):101451，DOI 10.1016/j.patter.2025.101451（已核实 · alphaXiv 全文＋Crossref 2026-06）；证据级 Ⅱ 同行评议实证。该研究测的是生成轨迹的视觉母题坍缩，与本卷"异质守护"结论方向一致，但不据此外推任何具体比例〕[R8]

This volume offers “the mean = slop” as a conclusion of causal analysis; it is no longer only an assertion. A peer-reviewed study in Patterns (Cell Press) analyzed 700 generation trajectories across 7 sampling temperatures and found they nearly all collapse onto the same set of 12 universal visual motifs: however you tune sampling, generation converges back to that small handful of “most-seen” shapes. This turns “generation defaults to the mean” from trend-talk into a citable measurement: slop is not the occasional bad comp but the stable attractor of a generative system when no one applies judgment. [Source: Patterns (Cell Press), peer-reviewed; Hintze, Proschinger Åström & Schossau, “Autonomous language-image generation loops converge to generic visual motifs,” 2025;7(1):101451, DOI 10.1016/j.patter.2025.101451 (verified · alphaXiv full text + Crossref, 2026-06); grade Ⅱ peer-reviewed empirical. The study measures the visual-motif collapse of generation trajectories, directionally consistent with this volume’s heterogeneity-guard conclusion, but no specific proportion is extrapolated from it][R8]

异质守护 · 信号

Heterogeneity · signals

有效：目标用户说"这是为我做的"，圈外人无感——这正是好的边界，不是缺陷；陌生人问"怎么做的"而非"哪个 AI 做的"。失效：所有人都说"还行/挺专业"，没人有强反应；命中上表 ≥3 条指纹。承重一句：对所有人都成立的审美，等于对均值收敛，也就失去具体对象。守住异质，就是守住"这群人"。

Right: the target user says “this was made for me” while outsiders feel nothing (that is a good boundary, not a flaw); a stranger asks “how was this made,” not “which AI.” Wrong: everyone says “fine / professional,” no one reacts strongly; ≥3 fingerprints above hit. Load-bearing: an aesthetic true for everyone converges to the mean and loses its specific audience. Guarding heterogeneity is guarding “these people.”

DSN

09·5

FAILURE · slop 指纹的成因与具体判据

THE SLOP FINGERPRINT

机理 · 失效解剖

Mechanism · Autopsy

为什么 slop 长得都一样——指纹的成因

Why slop all looks the same: the anatomy of the fingerprint

青配深底、紫蓝渐变、玻璃拟态、Inter 居中、巨数字仪表盘：同一机制在不同界面留下的同一组指纹。这一节拆它的成因。

Cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards: the same mechanism leaving the same fingerprints across interfaces. This section dissects the cause.

一句话

In one line

每条指纹都是"高频 × 安全 × 易生成"的交集；修法是"加回被省略的判断"，而非换新套路。记症状会过期，懂成因不会。

Every fingerprint is the intersection of “high-frequency × safe × easy-to-generate”; the fix is “adding back a skipped judgment,” not switching to a newer cliché. Memorizing symptoms expires; understanding causes does not.

成因一：高频。模型偏向训练分布里出现最多的视觉模式。过去几年的 dribbble/产品落地页/dashboard 模板里，深色玻璃拟态加霓虹渐变铺天盖地，于是它成了"设计应有的样子"的统计代表。成因二：安全。这些模式在评审里几乎不会被否：它们"看起来专业"，没人会因为用了 Inter 而被批评。安全意味着低风险，低风险的东西最容易成为默认。成因三：易生成。渐变、圆角、居中、等大卡片，都是用最少的结构决策就能填满画面的招式：它们不要求理解内容层级，只要求把盒子排整齐。三者叠加，就得到一个稳定的吸引子：生成在没有强约束时，必然落到这组指纹上。这解释了一个反直觉的事实：slop 不是模型"不够强"的产物，恰恰是它"足够强地拟合了均值"的产物。〔源：Patterns（Cell Press）同行评议——700 条生成轨迹×7 档温度几乎全部坍缩到同一组通用视觉母题，把"生成默认收敛到均值"从断言升为实测，证据级 Ⅱ；不外推具体比例〕[R8]

Cause one: high-frequency. A model leans toward the visual patterns that appear most in its training distribution. Across the last few years of dribbble / product landing pages / dashboard templates, dark glassmorphism with neon gradients was everywhere, so it became the statistical representative of “what design should look like.” Cause two: safe. These patterns are almost never rejected in review: they “look professional,” and no one gets criticized for using Inter. Safe means low-risk, and the low-risk thing most easily becomes the default. Cause three: easy-to-generate. Gradients, rounded corners, centering, equal-size cards are all moves that fill a screen with the fewest structural decisions: they require no understanding of content hierarchy, only that the boxes be lined up neatly. Stack the three and you get a stable attractor: with no strong constraint, generation inevitably lands on this set of fingerprints. This explains a counterintuitive fact: slop is precisely a product of the model being “strong enough to fit the mean,” not a product of it being “not strong enough.” 〔Source: Patterns (Cell Press), peer-reviewed — 700 generation trajectories × 7 temperatures nearly all collapse onto the same set of generic visual motifs, raising “generation defaults to the mean” from assertion to measurement, grade Ⅱ; no specific proportion extrapolated〕[R8]

每条指纹的修法，都是"加回一个被生成省略的判断"

Every fingerprint’s fix is “adding back a judgment generation skipped”

既然指纹来自"省略判断"，修法就一定是"把那个判断加回去"，而不是换一个更新潮的套路（换套路只是换一个均值）。配色指纹的修法，是回答"这个主色为什么是它"：从品牌、从内容主题、从用户处境里找一个有来由的颜色，而不是默认霓虹；材质指纹的修法，是回答"这里的模糊/投影在服务什么层级"，让材质承担信息层级而非装饰；排版指纹的修法，是回答"这个字为什么是它"，选一款与内容气质相符、有辨识度的字，并建立真实的左对齐网格而非万物居中；布局指纹的修法，是回答"这些内容的权重一样吗"，按内容权重定尺寸差异，而非把一切塞进等大盒。

Since fingerprints come from “skipping a judgment,” the fix must be “adding that judgment back,” not switching to a trendier cliché (switching clichés only swaps one mean for another). The fix for the color fingerprint is answering “why is this the primary color”: find a motivated color from the brand, the content’s subject, the user’s situation, rather than defaulting to neon; the fix for the material fingerprint is answering “what hierarchy is this blur/shadow serving,” letting material carry information hierarchy, not decoration; the fix for the type fingerprint is answering “why is this the typeface,” picking one whose character matches the content, with real identity, and building a genuine left-aligned grid instead of centering everything; the fix for the layout fingerprint is answering “do these contents weigh the same,” sizing by content weight rather than stuffing all into equal boxes.

共同结构：每条修法都把一个"生成默认替你做了的省事决定"，换成"一个你为这群人主动做的判断"。这正是 DSN 03·5 说的：品味就是这些判断的合成，反 slop 就是逐条把它们加回去。

The shared structure: every fix swaps a “labor-saving default generation made for you” for “a judgment you actively made for these people.” This is exactly what DSN 03·5 said: taste is the synthesis of these judgments, and anti-slop is adding them back item by item.

图 FIG. D6.1 / FINGERPRINT ANATOMY · 逐处标注一张 slop 落地页 · a slop landing page, annotated part by part

看懂：左边是典型 AI 落地页的线框，每一处指纹（渐变标题、玻璃拟态、三卡、居中 Inter）右边都接着一个被省略的判断：指纹是判断缺位的痕迹，不是审美故障

Read: left is a typical AI landing-page wireframe; each fingerprint (gradient title, glassmorphism, three cards, centered Inter) maps on the right to a skipped judgment: a fingerprint is the trace of an absent decision, not an aesthetic fault

把这张图当成清单的因果版：左边每个编号是肉眼可见的征兆，右边对应的不是"换个更好看的做法"，而是一个本该有人来做、却被生成跳过的判断。这解释了为什么 slop 长得都一样——它们省略的是同一批判断；也解释了为什么修 slop 不能靠"再生成一版漂亮的"：漂亮还是会落在均值，缺的判断没人加回来。

Read this as the causal version of a checklist: each numbered item on the left is a visible symptom; the match on the right is not “a prettier alternative” but a decision someone should have made that generation skipped. This is why slop all looks alike: it omits the same set of judgments; and why you cannot fix slop by “generating a prettier one”: prettier still lands on the mean, and the missing decisions still go unmade.

把一张典型 slop 落地页逐处标注，每处都对应一个被省略的判断

Annotating a typical slop landing page point by point: each maps to a skipped judgment

设想一张最常见的 AI 生成落地页，从上到下逐处看，你会发现它像一份征兆清单：顶部是深色背景配一行紫到蓝的渐变大标题，渐变文字本身就是第一处指纹：它把"标题"当成了炫技的画布，而非传达信息的层级，被省略的判断是"这行字到底要让谁、在几秒内、读到什么"。主视觉区是一块玻璃拟态卡片浮在模糊光斑上，第二处指纹：模糊在这里不服务任何层级，纯粹是装饰，被省略的判断是"这个浮层比背后的东西更重要吗，凭什么浮起来"。功能区是三到四张等大的圆角卡片整齐排成一行，每张顶上一个大圆角图标，第三、四处指纹叠加：等大意味着"这些功能同等重要"这个几乎从不为真的假设没被质疑，图标只是占位装饰，被省略的判断是"这些内容的权重真的一样吗，这个图标帮人认出了什么"。

Picture the most common AI-generated landing page and read it top to bottom; it reads like a checklist of symptoms. At the top sits a dark background with a large purple-to-blue gradient headline. Gradient text is itself the first fingerprint: it treats the headline as a canvas for showing off rather than as a level of hierarchy that conveys information, and the skipped judgment is “who exactly should read what, in how many seconds, from this line.” The hero area is a glassmorphism card floating over blurred light blobs, the second fingerprint: here blur serves no hierarchy and is pure decoration, and the skipped judgment is “is this floating layer more important than what is behind it, and what entitles it to float.” The features area is three or four equal rounded cards lined up in a row, each with a big rounded icon on top, the third and fourth fingerprints stacked: equal sizing leaves the almost-never-true assumption “these features are equally important” unquestioned, and the icon is mere placeholder decoration. The skipped judgment is “do these contents really weigh the same, and what does this icon help anyone recognize.”

数据区是几个巨大的数字配小标签，第五处指纹：它套用了"看起来很有料"的仪表盘模板，被省略的判断是"这些数字对这群用户真的重要吗，还是只是为了填满版面显得专业"。全篇居中、用 Inter，最后两处：居中是最省排版决策的默认（不用想对齐网格），Inter 是最安全的字（不会被批但也毫无性格），被省略的判断是"这个内容的气质适合居中吗，这款字说出了品牌的什么"。

The data area is a few giant numbers with small labels: the fifth fingerprint, applying the “looks substantial” dashboard template, the skipped judgment being “do these numbers really matter to these users, or are they just there to fill the page and look professional.” Centered throughout, in Inter: the last two, where centering is the default that saves the most layout decisions (no alignment grid to think about) and Inter is the safest typeface (never criticized, also utterly without character). The skipped judgment is “does this content’s character suit centering, and what does this typeface say about the brand.”

把这张标注图连起来看，会得到一个关于 slop 的更深的定义：slop 是一连串"省略的判断"在视觉上的累加。每一处指纹单独看都不致命：用一次居中、用一回渐变，本身不是罪；致命的是整张图从头到尾，没有一处是为这群人主动判断过的，每一处都选了那个最省事、最不会错的默认。这正是为什么 slop 给人的感觉是"哪里都对，整体却空"：因为它确实哪里都没犯错（每个默认都是安全的），但也确实哪里都没有人在场（每个默认都绕过了判断）。

Read this annotated figure as a whole and you reach a deeper definition of slop: slop is the visual accumulation of a series of “skipped judgments.” No single fingerprint is fatal on its own: centering once, a gradient once, is no crime; what is fatal is that from top to bottom the whole figure has not one place actively judged for these people, every place picking the most labor-saving, least-wrong default. This is exactly why slop feels “right everywhere, hollow overall”: it really did make no mistakes anywhere (every default is safe), but really has no one present anywhere (every default bypasses judgment).

反过来，一个有品味的设计未必每处都标新立异，但它一定在那些最该判断的地方真的判断了：主色有来由、层级有取舍、字有性格、该强调的被强调。所以"标注一张 slop"这个练习本身就是有价值的训练：它逼你在每一处问"这里有没有人做过判断"，而这个问句，正是把"挑出 slop"升级成"看懂 slop 为什么是 slop"的钥匙。

Conversely, a design with taste need not be novel everywhere, but it has made real judgments where judgment was most due: a motivated primary, a hierarchy with trade-offs, type with character, emphasis where emphasis belongs. So the exercise of “annotating a slop” is itself valuable training: it forces you to ask, at each place, “did anyone make a judgment here,” and that question is the key that upgrades “spotting slop” into “understanding why slop is slop.”

具体判据 · 接 HARD-RULES

Concrete criteria · feeding HARD-RULES

把上面的成因翻译成可机检条目，正是 DSN 08 HARD-RULES 那一层的来源：禁渐变文字、限定调色板取自品牌 token、模糊层数上限、字体白名单（排除 Inter/Roboto 系统默认）、卡片尺寸必须随内容权重变化、命中"空洞口号词表"零次。这些可写进 lint；而"这个主色为什么是它、这款字为什么对"仍是软判据，留给人。下方 INSTRUMENT 12/13 帮你把这两层分开跑一遍。

Translating the causes above into machine-checkable items is exactly where DSN 08’s HARD-RULES layer comes from: ban gradient text, constrain the palette to brand tokens, cap the number of blur layers, whitelist typefaces (excluding the Inter/Roboto system defaults), require card size to vary with content weight, and allow zero hits on the “empty-slogan word list.” These can be written into lint, while “why this primary, why this typeface is right” remain soft criteria, kept with people. INSTRUMENT 12/13 below help you run the two layers separately.
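As a rough illustration of how such items could land in lint, here is a minimal Python sketch. Everything named in it is an assumption for illustration: the brand palette, the typeface allowlist, the blur cap of one, and the `lint` function itself are hypothetical, and the regular expressions are heuristics over raw CSS text rather than a real CSS parser. The card-size rule is left out because it needs layout and content-weight data that a stylesheet alone does not carry.

```python
import re

# Hypothetical project values; a real spec would supply its own.
BRAND_TOKENS = {"#1f3a2e", "#f4efe6", "#c2410c"}
FONT_ALLOWLIST = {"fraunces", "ibm plex sans"}
EMPTY_SLOGANS = {"empower", "revolutionary", "seamless"}
MAX_BLUR_LAYERS = 1


def lint(css: str, copy: str) -> list[str]:
    """Return the red-line items that this stylesheet and copy hit."""
    hits = []
    css_l = css.lower()

    # Gradient text is usually built with background-clip: text (heuristic).
    if re.search(r"background-clip\s*:\s*text", css_l):
        hits.append("gradient-text")

    # Any 6-digit hex color outside the brand tokens is off-palette.
    off_palette = set(re.findall(r"#[0-9a-f]{6}\b", css_l)) - BRAND_TOKENS
    if off_palette:
        hits.append("off-palette:" + ",".join(sorted(off_palette)))

    # Count blurred layers against the cap.
    if len(re.findall(r"backdrop-filter\s*:\s*blur", css_l)) > MAX_BLUR_LAYERS:
        hits.append("too-many-blur-layers")

    # Only the first family in each font stack is checked.
    for decl in re.findall(r"font-family\s*:\s*([^;}]+)", css_l):
        first = decl.split(",")[0].strip().strip("'\"")
        if first not in FONT_ALLOWLIST:
            hits.append("typeface-not-allowed:" + first)

    # Zero tolerance for words on the empty-slogan list.
    words = set(re.findall(r"[a-z]+", copy.lower()))
    for word in sorted(words & EMPTY_SLOGANS):
        hits.append("empty-slogan:" + word)
    return hits


SLOP_CSS = """
h1 { font-family: Inter, sans-serif; background: linear-gradient(90deg, #7c3aed, #2563eb);
     -webkit-background-clip: text; color: transparent; }
.card { backdrop-filter: blur(12px); }
.hero { backdrop-filter: blur(24px); background: #1f3a2e; }
"""
hits = lint(SLOP_CSS, "Seamless tools that empower your team")
```

A clean result from a check like this says only that no red line was crossed. Whether the primary color is motivated or the typeface is right for the content stays on the human side of the split.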

DSN

09·7

FAILURE · 同质化是系统性风险

HOMOGENIZATION AS SYSTEMIC RISK

机理 · 宏观失效

Mechanism · Macro failure

最大的失败模式不在一个产品里，在整个行业一起滑向均值

The biggest failure mode is not inside one product but a whole industry sliding to the mean together

前面是单个产品避开 slop，这里放大到系统层：为什么整个数字世界会越来越像。

The prior sections were one product avoiding slop; here we scale up to the system level: why the whole digital world grows more and more alike.

一句话

In one line

个体的"安全选择"在系统层累加成整个行业滑向均值；反面是机会：均值越廉价，"只对这群人成立"的设计越升值，守异质既是抵抗也是押注。

Individual “safe choices” sum at the system level into a whole industry sliding to the mean; the flip side is an opportunity: the cheaper the mean, the more valuable design “true only for these people,” so guarding heterogeneity is both resistance and a bet.

机制：共享的均值是一个共享的吸引子。过去，设计的多样性有一部分来自工具和人的差异：不同设计师的手感不同、不同工具的默认不同，这些差异在产出里留下了不同的痕迹。当生成把"达到平均水准"变成几乎免费，且大家用的是同一批模型、同一批流行提示，这些差异来源被抹平了：所有人都从同一个分布的同一个峰附近起步。结果是一种趋同压力：不是有人强迫，而是每个理性的个体都选了那条最省事、最不会错的默认，而这些默认恰好是同一个。十个团队各自做出"看起来很专业"的产品，叠在一起却发现它们像一个模子刻的。这就是同质化：个体层面的"安全选择"，在系统层面累加成"集体失去辨识度"。

Mechanism: a shared mean is a shared attractor. In the past, part of design’s diversity came from the difference between tools and people: different designers’ touch, different tools’ defaults, leaving different marks in the output. Once generation makes “reaching average quality” nearly free, and everyone uses the same batch of models and the same popular prompts, those sources of difference get flattened: everyone starts near the same peak of the same distribution. The result is a convergence pressure: no one forces it, but each rational individual picks the most labor-saving, least-wrong default, and those defaults happen to be the same one. Ten teams each make a product that “looks professional,” yet stacked together they turn out cut from one mold. This is homogenization: “safe choices” at the individual level summing, at the system level, into “a collective loss of distinctiveness.”

实测锚点

Measured anchor

这条机制不止是推断，已有受控实验测到它的方向。Doshi 与 Hauser 的对照实验（Science Advances，2024，约 300 名受试者写短篇故事，部分人获得 AI 创意提示）发现一个分裂的效应：拿到 AI 提示的故事在个体层面被评得更新颖、更有用，但这批故事在集体层面用语义相似度衡量却更彼此趋同：个人创意上升，集体多样性下降。这正是"共享均值是共享吸引子"的实验影像：放大每一个个体，同时压平整个分布。〔源：Doshi & Hauser, Science Advances 10(28), 2024，证据级 Ⅱ 受控实验；该研究对象是叙事文本，迁移到视觉/产品设计是一次合理但仍需验证的外推，故不外推具体数字〕[R4]

This mechanism is not only inferred; a controlled experiment has measured its direction. Doshi and Hauser’s randomized study (Science Advances, 2024; roughly 300 participants writing short stories, some given AI story ideas) found a split effect: stories written with AI prompts were rated more novel and useful at the individual level, yet the set of those stories was more similar to one another at the collective level, measured by semantic similarity: individual creativity up, collective diversity down. This is the experimental image of “a shared mean is a shared attractor”: amplify each individual while flattening the whole distribution. [Source: Doshi & Hauser, Science Advances 10(28), 2024, grade Ⅱ controlled experiment; the study’s object is narrative text, so carrying it to visual/product design is a reasonable but still-unverified extrapolation, and no specific figure is extrapolated here][R4]
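To make “collective similarity” concrete, one common way to quantify how alike a set of items is, given a vector per item, is the mean pairwise cosine similarity. The Python sketch below is only an illustration of that measure. The 2-D vectors are invented toy values, not data from the study, and the study’s own embedding and scoring procedure is not reproduced here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs.

    Higher values mean the set as a whole is more homogeneous.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy "embeddings", invented for illustration only.
unaided = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # items point in different directions
assisted = [(1.0, 0.9), (0.9, 1.0), (1.0, 1.0)]  # items cluster around one direction

spread_out = mean_pairwise_similarity(unaided)
clustered = mean_pairwise_similarity(assisted)
```

In this toy setup `clustered` comes out higher than `spread_out`. The point of the measure is that it is a property of the set: each item in the clustered group could be rated well on its own and the group would still score as more alike.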

为什么这是真正该担心的失败模式？因为它对个体几乎无痛——你的产品"看起来没问题"，每个指标都过关，没有任何单点的失败提醒你出了事。痛感被推迟、被分散到整个行业和长期：用户对一切都"还行"地无感，没有什么值得记住、值得偏爱、值得忠诚。设计本该制造的"这是为我做的"那种归属感，在普遍的均值里被稀释为零。异质守护就是对这条引力的个体层面抵抗：明确地、付出代价地选择"只对这群人成立"。它在个体层面看起来是放弃了一部分潜在受众，在系统层面却是维持多样性的唯一办法。这正是组织卷人本主线在设计面上的最终落点：为人，意味着为具体的、有差异的人，而维持这种差异，需要每个设计师主动逆着均值的引力站住。

Why is this the failure mode truly worth worrying about? Because it is almost painless to the individual: your product “looks fine,” every metric passes, no single-point failure warns you something went wrong. The pain is deferred and dispersed across the whole industry and the long term: users feel an “it’s fine” indifference to everything, with nothing worth remembering, preferring, or being loyal to. The sense of belonging (“this was made for me”) that design is meant to create gets diluted to zero in the universal mean. Guarding heterogeneity is the individual-level resistance to this gravity: choosing, explicitly and at a cost, “true only for these people.” It looks, at the individual level, like giving up part of a potential audience; at the system level it is the only way to keep diversity alive. This is the final landing, on the design face, of the org volume’s human through-line: being for people means being for specific, differing people, and keeping that difference alive requires each designer to actively stand against the gravity of the mean.

图 FIG. D8.0 / DISTINCTIVENESS APPRECIATES · 供给趋无限，均值贬值、偏离均值升值 · as supply → ∞, the mean depreciates while the off-mean appreciates

看懂：当"看起来专业"被无限供给，它的价值趋零、变成入场券；同一时间，"为这群人极致对路"的偏离反而越来越稀缺、越来越值钱——这是同质化危机的另一面

Read: when “looks professional” is supplied without limit, its value tends to zero and becomes mere admission; meanwhile the off-mean “extreme fit for this group” grows scarcer and more valuable: the flip side of the homogenization crisis

这是 D6.0"引力把所有设计吸向均值"那张图的时间面：把镜头拉长，会看到两条价值曲线在反向移动。slop 越泛滥，"看起来专业"就越不构成优势——它从差异化变成了基线、入场券；同一过程里，真正"只对这群人成立"的设计因为周围一片均值而更显眼、更稀缺、更值钱。所以守异质不只是防御（别滑向 slop），它是一笔押注：押那个正在升值的稀缺品。这条同样不外推具体数字，只主张方向。

This is the time face of D6.0’s “gravity pulling every design toward the mean”: pull the lens back and the two value curves move in opposite directions. The more slop floods in, the less “looks professional” buys you: it shifts from a differentiator to a baseline, an admission ticket; in the same motion, design that is genuinely “true only for these people” becomes more conspicuous, scarcer, worth more against the surrounding mean. So guarding heterogeneity is not only defense (don’t slide into slop); it is a bet on the scarce thing that is appreciating. This too extrapolates no specific number, only the direction.

同质化里藏着一个反直觉的机会：异质本身在升值

Hidden in homogenization is a counterintuitive opportunity: distinctiveness is appreciating

这条系统性风险有一个反直觉的另一面，值得对从业者点明：当均值变廉价且无处不在，偏离均值的那部分反而变得更稀缺、更值钱。经济学的直觉在这里成立：任何东西一旦供给无限，它的价值就趋零；slop 正在变成无限供给，所以"看起来专业"本身已经不再是优势，它是基线、是入场券。

This systemic risk has a counterintuitive flip side worth naming for practitioners: once the mean is cheap and everywhere, the part that deviates from the mean becomes scarcer and more valuable. The economic intuition holds here: anything in infinite supply trends toward zero value; slop is becoming infinite supply, so “looks professional” is no longer an advantage but the baseline, the price of admission.

真正能制造差异、让人记住、让目标用户产生"这是为我做的"那种归属感的设计，反而因为周围一片均值而更显眼、更有价值。异质守护不只是一种防御性的纪律（避免滑向 slop），它同时是一种进攻性的机会（在一片趋同里成为那个被记住的）。对一个团队或个人，这是把同质化危机翻转成定位优势的入口：当所有人都在用 AI 把自己变得更像，主动选择"只对这群人极致对路、对其他人无感"的那个，反而占据了越来越空旷的差异化高地。守异质，因此既是对系统性风险的抵抗，也是对一个正在升值的稀缺品的押注。

Design that genuinely creates difference, gets remembered, makes the target user feel the belonging of “this was made for me” becomes, precisely because everything around it is the mean, more conspicuous and more valuable. This means guarding heterogeneity is not only a defensive discipline (avoiding the slide to slop) but also an offensive opportunity (being the one remembered amid the convergence). For a team or an individual, this is the entry to flipping the homogenization crisis into a positioning advantage: when everyone uses AI to make themselves more alike, the one who actively chooses “extremely on-target for these people, cold to everyone else” occupies the increasingly empty high ground of differentiation. Guarding heterogeneity is therefore both a resistance to systemic risk and a bet on a scarce good that is appreciating.

要避免被误读，得把"异质"和"为不同而不同"区分开。异质守护不是鼓励标新立异、不是为了和别人不一样而故意做怪，那只是把"滑向均值"换成了"滑向猎奇"，同样没有共情这个根，同样是 slop 的变体。异质，是"为这群人对路"自然生长出来的结果：当你真的为一群具体的人、在一个具体的目的下做设计，你的取舍会自然地偏离那个为所有人优化的均值，因为这群人和"所有人"本就不同。异质是共情的副产物，不是目标本身。

To avoid being misread, “heterogeneity” must be distinguished from “different for difference’s sake.” Guarding heterogeneity is not encouraging novelty-seeking, not deliberately being weird to stand apart; that merely swaps “sliding to the mean” for “sliding to the bizarre,” equally rootless in empathy, equally a variant of slop. True heterogeneity is the natural outgrowth of “being on-target for these people”: when you genuinely design for a group of specific people under a specific purpose, your trade-offs naturally diverge from the mean optimized for everyone, because these people are inherently different from “everyone.” In other words, heterogeneity is a byproduct of empathy, not the goal itself.

这个区分很重要，因为它防住了一种常见的过度修正：团队听说要"反 slop、要独特"，于是开始为独特而独特，做出一堆刻意古怪、却同样不为用户着想的东西。检验很简单：问这个"不一样"是从"为这群人"长出来的，还是从"想显得不一样"长出来的。前者是异质守护，后者只是换了一种 slop。守异质的正道，永远是先把"为谁"明确下来，让差异自然涌现（自己长出来），而不是把差异本身当成追求。

This distinction matters because it guards against a common over-correction: a team hears “anti-slop, be distinctive” and starts being distinctive for its own sake, making a pile of deliberately odd things equally inconsiderate of users. The test is simple: ask whether this “different” grew from “for these people” or from “wanting to seem different.” The former is guarding heterogeneity; the latter is just another flavor of slop. The right way to guard heterogeneity is always to pin down “for whom” first and let difference emerge naturally, not to pursue difference itself.

把这条系统性失败和这一卷开头的不对称连起来，整张图就闭合了：因为生成把"达到均值"变成免费（不对称），所以默认产出是 slop（均值即 slop），所以不加干预整个行业会一起向均值滑（同质化），所以唯一的解药是每个设计师主动把审美钉在具体的人身上（异质守护），而这件事恰恰是机器做不了、必须由人来扛的判断（品味＝稀缺判断），而它之所以值得扛，是因为为具体的人而做本就是设计的根基，也是整个系列共同的指向（人回到意义）。

Connect this systemic failure to the asymmetry at the volume’s opening and the whole figure closes: because generation makes “reaching the mean” free (the asymmetry), the default output is slop (the mean is slop), so unresisted the whole industry slides to the mean together (homogenization). So the only cure is each designer actively pinning their aesthetic to specific people (guarding heterogeneity). This is exactly the judgment a machine cannot do and a human must carry (taste = scarce judgment), and it is worth carrying because being for specific people is design’s foundation and the whole series’ shared direction (people return to meaning).

这是一条从经济学前提一路推到人本结论的完整逻辑链，而非六个零散的观点：每一环都是上一环的必然后果。这也是为什么这一卷敢说自己抓的是结构而非风格：它的每个主张都是这条链上一个被前因逼出来的节点，而非孤立的审美偏好。读到这里，再回看 DSN 01 那张生成×品味平面，你会发现整卷讲的其实是同一件事的不同切面：人，必须重新把品味这条纵轴，亲手加回到一个正在集体丢失它的世界里。

This is one complete chain of reasoning from an economic premise to a human conclusion, not six scattered points: each link a necessary consequence of the last. This is why this volume dares to say it caught structure, not style: each of its claims is a node on this chain forced out by what came before, not an isolated aesthetic preference. Reading this far, look back at the generation × taste plane of DSN 01 and you find the whole volume was telling different faces of one thing: people must, by hand, re-add the vertical axis of taste to a world that is collectively losing it.

证伪 · 这条担忧可能错在哪

Falsification · where this worry could be wrong

这条担忧若错，会错在：若生成模型未来能主动制造有意义的差异（针对不同人群给出真正不同且对路的设计，而非随机扰动），同质化引力就会被模型自身抵消，异质守护也就不再需要人来扛。截至本版（2026-07），没有证据表明模型在做这件事：它们优化的是"对得多"，不是"对这群人特别"。只要这一点不变，同质化就是真实的系统性风险，异质守护就仍是人的责任。

If this worry is wrong, it is wrong here: if future generation models can actively manufacture meaningful difference (not random perturbation but genuinely different, on-target designs for different groups), the homogenization gravity would be canceled by the model itself, and guarding heterogeneity would no longer need a human to carry it. As of this edition (2026-07), there is no evidence models do this: they optimize “right for many,” not “special for these people.” As long as that holds, homogenization is a real systemic risk and guarding heterogeneity remains a human responsibility.

DSN

MOTION · video-as-code / 动效即代码

VIDEO-AS-CODE

机理 · 同构

Mechanism · Isomorphism

动效与视频，同一招再走一遍

Motion and video: the same move, once more

design-as-code 不止于静态界面：动效与视频走的是同一条逻辑。

Design-as-code does not stop at static interfaces: motion and video follow the same logic.

一句话

In one line

视频用代码写（Remotion 的 React），改一个字是一次 diff、出十种语言是改一组参数；可机检的活全自动化，省下的人力要更密地投到机器判不了的"节奏"上。

Video written as code (Remotion’s React): changing a word is one diff, shipping ten languages is one set of parameters; the machine-checkable work is fully automated, and the freed effort must go more densely into the “pacing” the machine cannot judge.

过去做一条产品视频，改一个字要重渲整条、要回到不可读的时间线软件里手动对齐。当视频是代码：文案是参数（可批量本地化）、配色取自同一套 token（不离牌）、一次改动是一次 diff（可评审可回滚）、agent 能按规格批量出变体（铺开→判断→收敛，同一条环）。动效和界面同源，是一份可生成产物，不再是产出工时的黑洞。

Producing a product video used to mean: change one word, re-render the whole thing, go back into opaque timeline software to align by hand. When video is code: copy is a parameter (batch-localizable), palette comes from the same tokens (on-brand), one change is one diff (reviewable, revertible), and an agent can spin up variants on spec (the same spread → judge → converge loop). Motion stops being a black hole of production hours and becomes a generatable artifact, same-source as the interface.

时间线软件 · 二进制Timeline software · binary

改字要重剪、本地化要重做、变体靠手工；agent 进不来，品味花在重复劳动上。

Changing a word means re-editing, localization means redoing, variants are manual; no agent can enter, and taste is spent on repetitive labor.

video-as-code · 文本Video-as-code · text

文案／时长／配色皆变量，agent 按规格铺变体，人只判"哪条对路、节奏对不对"。同一条工作流环。

Copy/duration/palette are variables; an agent spreads variants on spec; the human only judges “which is on-target, is the pacing right.” The same workflow loop.

把动效纳进来，不只是多覆盖一个产物类型，它其实在验证整套方法的可迁移性。一个方法论若只对静态界面成立、一碰到时间维度就失效，那它八成抓的是表象而非结构。动效是一次干净的压力测试：它把"产物是不是私有二进制"这个变量重新拉满（传统视频工具的工程文件正是典型的不可读二进制），于是同一条逻辑链应该重新出现，而它确实出现了。

Bringing motion in is not just covering one more artifact type; it tests the portability of the whole method. A methodology that holds only for static interfaces and breaks the moment it meets the time dimension has most likely grasped surface appearance rather than structure. Motion is a clean stress test: it pushes the variable “is the artifact a proprietary binary” back to its maximum (the project files of traditional video tools are the canonical unreadable binary), so the same chain of logic should reappear, and it does.

改一个字要重渲整条、本地化要重做一遍、变体靠手工，这些正是"二进制画布被边缘化"在时间维度上的复现；而 Remotion 把视频表达成 React，让文案成参数、配色取自同一套 token、一次改动成一次 diff，这又正是"代码产物被放大"的复现。这条逻辑链在一个全新的产物类型上原样重演，本身就是这套方法抓到了底层结构、而非临时拼凑的证据。

Change one word and re-render the whole thing, redo localization, variants by hand: these are precisely “the binary canvas gets sidelined” recurring in the time dimension; while Remotion expressing video as React (copy as a parameter, palette from the same tokens, one change as one diff) is precisely “the code artifact gets amplified” recurring. This chain of logic replaying intact on a brand-new artifact type is itself evidence that this method has caught the underlying structure, not an improvised patchwork.

时间维度多出一条不可机检的轴：节奏

The time dimension adds one more axis that cannot be machine-checked: pacing

把 design-as-code 搬到动效，验证了内核的迁移性：同一套逻辑换个面依旧成立。但诚实地说，时间维度比静态多出一条不可机检的轴，所以人那一半的判断在这里反而更重，不是更轻。静态设计里，可机检的硬约束（token、对齐、对比度）能覆盖相当一部分"对不对"；到了动效，多了节奏这条轴（一个转场是 200ms 还是 400ms、一句旁白后该停顿几拍、信息以什么顺序揭示），这些既不能写进 token，也没有"正确值"，只有"对这段内容、这个情绪对不对"。video-as-code 把②③④（补全状态、本地化、铺变体）这些可机检的劳动自动化掉之后，省下的人力恰恰要更密集地投到节奏判断上。代码给了你随意尝试不同节奏的自由（改个参数即可重渲），但哪个节奏对仍然只有人的耳朵和眼睛能定。

Carrying design-as-code into motion verifies the kernel’s portability: the same logic holds on another face. But honestly, the time dimension has one more axis that cannot be machine-checked than static does, so the human half of judgment is heavier here, not lighter. In static design the hard machine-checkable constraints (token, alignment, contrast) cover a fair share of “right or wrong”; in motion there is the added axis of pacing (whether a transition is 200ms or 400ms, how many beats to hold after a line of narration, in what order information is revealed), none of which can be written into tokens, none of which has a “correct value,” only “right or wrong for this content, this emotion.” This means that after video-as-code automates the machine-checkable labor of ②③④ (filling states, localization, spreading variants), the freed-up human effort must in fact go more densely into pacing judgment. Code gives you the freedom to try different pacings at will (change a parameter, re-render), but which pacing is right can still only be settled by a human’s ear and eye.
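To make “change a parameter, re-render” concrete, here is a minimal sketch in plain TypeScript. The `PacingDirection` type, its field names, the two sample directions, and the 30 fps default are all illustrative assumptions, not Remotion’s API. The code only converts millisecond pacing intents into frame counts, so that several pacing directions can be rendered side by side for a human to judge.

```typescript
// Hypothetical shape for one pacing direction (names are illustrative only).
interface PacingDirection {
  label: string;
  transitionMs: number;    // e.g. 200 vs 400, as in the text
  holdAfterLineMs: number; // pause after a line of narration
}

interface FramePacing {
  label: string;
  transitionFrames: number;
  holdFrames: number;
}

// Convert a duration in milliseconds to a whole number of frames.
function msToFrames(ms: number, fps: number): number {
  return Math.round((ms / 1000) * fps);
}

// Express one pacing direction in frames. The 30 fps default is an assumption.
function toFramePacing(d: PacingDirection, fps: number = 30): FramePacing {
  return {
    label: d.label,
    transitionFrames: msToFrames(d.transitionMs, fps),
    holdFrames: msToFrames(d.holdAfterLineMs, fps),
  };
}

// Two made-up directions to spread; which one is right stays a human call.
const directions: PacingDirection[] = [
  { label: "brisk", transitionMs: 200, holdAfterLineMs: 300 },
  { label: "calm", transitionMs: 400, holdAfterLineMs: 800 },
];

const pacingCandidates: FramePacing[] = directions.map((d) => toFramePacing(d));
```

Nothing in this sketch scores the candidates. It only makes trying another pacing as cheap as editing one number.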

这条边界也回答了一个常见的误解："既然视频成了代码，是不是 AI 能端到端生成成品视频了？"能生成，但生成的默认仍是节奏上的均值：四平八稳、哪里都不出错、也哪里都不动人，正是动效版的 slop。所以动效的工作流和静态完全一样：规格（包括节奏意图）→ 铺开多个节奏方向 → 人判哪个对 → 导向再生成 → 收敛。代码形态把这条环的每一步都变得便宜可迭代，但环中央那个"判节奏"的验证器，依旧是人。这就是为什么我们说动效是"同一招再走一遍"，而不是"又一个被 AI 解决的问题"。

This boundary also answers a common misreading: “since video is now code, can AI generate finished video end to end?” It can generate, but the default of that generation is still the mean of pacing (even, error-free everywhere, moving nowhere), which is precisely the motion version of slop. So the motion workflow is fully isomorphic to the static one: spec (including pacing intent) → spread several pacing directions → human judges which is right → steer and regenerate → converge. The code form makes every step of this loop cheap and iterable, but the “judge the pacing” verifier at the loop’s center is still a human. That is why we say motion is “the same move once more,” not “another problem AI has solved.”

动效里还有一类被低估的杠杆：本地化与个性化变体。一条产品视频要出十种语言、三种时长、给不同人群各调一版语气，在时间线软件里这是十几乘以几的重复劳动，每一版都要手动重剪、重对齐、重导出，成本高到大多数团队干脆放弃，只出一版凑合。当视频是代码，这件事的成本结构彻底变了：语言是一个文案参数数组、时长是一个时间轴变量、人群语气是一组可切换的配置，agent 可以按这些参数批量渲染出所有组合，人只需要判断"每一版的节奏在它的语境里对不对"。

Motion holds another underrated lever: localization and personalized variants. Shipping a product video in ten languages and three durations, with the tone tuned for different groups, means in timeline software a dozen-odd versions multiplied several times over, each one re-edited, re-aligned, and re-exported by hand. The cost is so high that most teams simply give up and ship one make-do version. When video is code, this cost structure changes completely: language is an array of copy parameters, duration a timeline variable, group tone a switchable configuration, and an agent can batch-render every combination from these parameters, with the human only judging “is each version’s pacing right in its context.”
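The “every combination from these parameters” step is a Cartesian product over the three axes. A minimal sketch follows, assuming a hypothetical `VariantSpec` shape and made-up sample values; the actual rendering call is left out because it depends on the video toolchain in use.

```typescript
// Hypothetical description of one video variant; field names are illustrative.
interface VariantSpec {
  language: string;
  durationSec: number;
  tone: string;
}

// Cartesian product of the three parameter axes named in the text.
function buildVariants(
  languages: string[],
  durationsSec: number[],
  tones: string[],
): VariantSpec[] {
  const variants: VariantSpec[] = [];
  for (const language of languages) {
    for (const durationSec of durationsSec) {
      for (const tone of tones) {
        variants.push({ language, durationSec, tone });
      }
    }
  }
  return variants;
}

// Sample values are assumptions for illustration only.
const variants = buildVariants(
  ["en", "zh", "ja"],
  [15, 30, 60],
  ["new-user", "power-user"],
);
// 3 languages x 3 durations x 2 tones = 18 specs to hand to a renderer.
```

Each spec is plain data, so adding a language is a one-line diff, and the human review queue is simply this list.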

这把过去"做不起所以不做"的个性化，变成了"几乎免费所以值得做"的常规操作。它的意义不只是省力，而是让"为不同人群各做对一版"（也就是 DSN 09 说的异质守护）在动效这个过去最贵的产物类型上，第一次变得经济可行。代码形态在这里再次证明：它放大的从来是"为具体的人做具体的东西"这件事的可行性，而非某个酷炫功能。

This turns the personalization that used to be “unaffordable so not done” into the routine of “nearly free so worth doing.” Its significance is not only saved effort but that “making one version right for each different group” (DSN 09’s heterogeneity-guarding) becomes, for the first time, economically feasible on motion, the artifact type that used to be most expensive. The code form proves once more here: what it amplifies is the feasibility of “making specific things for specific people,” never some flashy feature.

同构 / 边界Isomorphism / boundary

这是 DSN 06–08 那套逻辑照搬到时间维度——静态怎么做，动效就怎么做。边界也一样：节奏、情绪、何时该停顿留白，无法写进 token，仍是人的品味判断。代码给的是杠杆，不是节奏感。This is the DSN 06–08 logic carried into the time dimension: do for motion what you do for static. The boundary is the same too: pacing, emotion, when to hold a beat or leave space cannot be written into tokens; they remain human taste. Code gives leverage, not a sense of rhythm.

DSN

SPECULATION · 未来推演

SPECULATION · The Speculation Act

推论 · 外推，非事实

Inference · Extrapolation, Not Fact

当生成成本趋零，设计组织会变成什么

When generation cost goes to zero, what the design org becomes

把本卷命题往后投影，张开一个可能性空间，而非预测单一曲线。

This act projects the volume’s thesis forward, opening a possibility space rather than predicting a single curve.

一句话In one line

三条力量（生成趋零、系统即规格、品味即护城河）正在汇流，前两条挤压第三条；每条都附先行指标与证伪条件，能被证伪的推演才值得推演。Three forces (generation → free, system-as-spec, taste-as-moat) are converging, the first two squeezing the third; each carries a leading indicator and a falsification condition, and only speculation that can be falsified is worth speculating.

本章性质 · 推论：以下是基于 2023-2026 公开轨迹的外推，不是事实陈述。每条推演都附先行指标与证伪条件；当观察与推演相悖时，本章应最先被改写。〔证据级 Ⅴ 论证/外推〕

Nature of this chapter · Inference: What follows extrapolates from the public trajectory of 2023-2026; it is not a statement of fact. Each projection carries leading indicators and a falsification condition; when observation contradicts the speculation, this chapter should be the first to be rewritten. [Grade Ⅴ, argument/projection]

THE PROJECTION · 把本卷命题推到时间轴上The thesis pushed onto the timeline

推演不是畅想。本卷只立了一条命题：当"达到平均水准的产出"接近免费，价值就从"会做"塌向"会判断"，而判断里最不能被机器替代的那部分是品味。把这条命题放上时间轴，要追问的不是"AI 会不会更强"（那几乎确定），而是那个无法被机器核验的判断节点，会不会、何时、被什么侵蚀。三条正在汇流的力量决定边界，两条不确定性轴张开四个世界，三件来自那些世界的文物让推演可触。最后必须记下与本卷对赌的反命题：万一品味并不稀缺呢。

Speculation is not daydreaming. This volume rests on one claim: when “output at the average bar” approaches free, value collapses from being-able-to-make toward being-able-to-judge, and the part of judgment least replaceable by machines is taste. Put that claim on a timeline and the question is not “will AI get stronger” (that is nearly certain) but whether, when, and by what the machine-uncheckable judgment node gets eroded. Three converging forces set the boundaries, two axes of uncertainty open four worlds, and three artifacts from those worlds make the speculation tangible. Finally we must record the counter-bet against this volume: what if taste is not scarce after all.

三条力量正在汇流，头两条会挤压第三条

Three forces converging: the first two squeeze the third

设计组织的重构是三条独立成熟、正在汇流的力量，不是一条曲线。每条只问三件事：成立则解锁什么形态 / 当前在哪 / 什么信号会把它证伪。第一条与第二条若都成立，会同时挤压第三条，这正是本卷命题的胜负手。

The redrawing of the design org is three forces maturing independently and now converging, not one curve. Each asks only three things: what form it unlocks if it holds, where it is now, and what signal would falsify it. If the first two both hold, they squeeze the third at the same time, which is exactly the decisive point of this volume’s thesis.

生成成本趋零 · GENERATION → FREE

Generation Cost → Zero

解锁Unlocks出一张"看起来专业"的稿、一段过得去的视频、一套能跑的组件，边际成本趋近于零。设计的稀缺资源从"产出能力"彻底移走：一人可铺开过去一个团队的候选量。Producing a “looks professional” comp, a passable video, a runnable component set drops to near-zero marginal cost. Design’s scarce resource moves off “ability to produce” entirely; one person spreads the candidate volume a whole team once did.

TRL规模化中2023-26 文生图/视频/前端代码逐年逼近"专业均值"，成本逐年下降〔证据级 Ⅳ〕。Scaling up In 2023-26 text-to-image / video / front-end code close in on the “professional mean” year over year, at falling cost [grade Ⅳ].

证伪Falsified if若生成在"过得去"处长期封顶、最后一公里（品牌一致、可交付、合规）仍需大量人工返工，则成本并未趋零，执行仍是稀缺资源。If generation caps at “passable” for the long run and the last mile (brand consistency, deliverability, compliance) still needs heavy human rework, then cost has not gone to zero and execution remains scarce.

设计系统即规格 · SYSTEM-AS-SPEC

Design-System-as-Spec

解锁Unlocks当 token、组件、规则被写成机器可读的规格（沿 DSN 06-08 的方向），"何为对"的一大半变成可机检的护栏：生成被约束在系统内，品牌一致不再靠逐稿盯。设计系统从文档升级为可执行的判断载体。When tokens, components, and rules are written as machine-readable spec (along the DSN 06-08 direction), much of “what is right” becomes a machine-checkable guardrail; generation is constrained inside the system and brand consistency no longer rides on per-comp policing. The design system upgrades from document to an executable carrier of judgment.

TRL早期商用token 与组件库已标准化；"约束生成"工具 2025 起进入早期商用，品味规则的形式化仍浅〔证据级 Ⅳ〕。Early commercial Tokens and component libraries are standardized; “constrained-generation” tooling entered early commercial use from 2025, while the formalization of taste rules stays shallow [grade Ⅳ].

证伪Falsified if若品味的关键部分始终无法被写成可机检的规则（节奏、情绪、何时留白），则系统只能挡住低级错误，挡不住趋同——护栏有上限。If the load-bearing part of taste can never be written as a machine-checkable rule (pacing, emotion, when to leave space), the system catches only low-level errors, not convergence; the guardrail has a ceiling.

品味作为唯一护城河 · TASTE AS MOAT

Taste as the Remaining Moat

解锁Unlocks前两条把执行和合规都抹平后，组织间唯一不能被复制的差异，是"挑得准、知道为什么、敢承担"的判断密度。竞争从"谁做得多"转向"谁判得对"：品味成为最后的差异化资产，可被定价、被招聘、被组织化。Once the first two flatten execution and compliance, the only difference between organizations that cannot be copied is the density of judgment that picks accurately, knows why, and bears the consequence. Competition shifts from “who makes more” to “who judges right”; taste becomes the last differentiating asset: priced, hired for, and organized around.

TRL论证态这是本卷的承重命题，也是最该被对赌的一条：见下方反命题与情景台。〔证据级 Ⅴ 论证〕Argument stage This is the volume’s load-bearing claim and the one most deserving a counter-bet: see the counter-bet and the scenario bench below. [grade Ⅴ, argument]

证伪Falsified if若模型学会了在统计上稳定地复现"被市场判为好"的设计（用户偏好可被高保真预测），则品味也被自动化，护城河蒸发，这正是反命题。If models learn to reproduce, with statistical stability, the designs the market judges as good (user preference becomes high-fidelity predictable), then taste too is automated and the moat evaporates, which is precisely the counter-bet.

为什么不是四条 · BOUNDARY NOTE

Why Not a Fourth

边界Scope具身/机器人、能源算力地租、监管这些更宽的力量在组织卷推演；本卷只追设计这个面上的三条。把它们都堆进来会稀释命题：推演的纪律是只外推自己能负责的那条线。Broader forces (embodiment/robotics, energy and compute rent, regulation) are speculated in the Org volume; this volume tracks only the three on the design face. Piling them all in would dilute the thesis; the discipline of speculation is to extrapolate only the line you can be held responsible for.

交叉Coupling监管确会外溢到设计（AI 生成内容标注、版权），但它改变的是约束，不改变"品味是否可机检"这个本卷的轴心问题。Regulation does spill into design (labeling of AI-generated content, copyright), but it changes the constraints, not this volume’s pivot question of whether taste is machine-checkable.

INSTRUMENT 14 · 情景台 SCENARIO BENCH

三条力量划定边界，但 2032 落在哪个世界，取决于两条高影响、高不确定的力量：X 轴生成能力（停在"专业均值" vs 突破到"可复现被判为好的设计"）与 Y 轴品味分布（仍稀缺集中于少数判断者 vs 被工具民主化、人人可调）。切换两轴，看本卷命题在那个象限里站得住还是塌掉，以及什么先行指标说明我们正滑向它（GBN 双轴情景法）。

Three forces mark the boundaries, but which world 2032 falls into turns on two high-impact, high-uncertainty forces: X · generation capability (stalls at the “professional mean” vs breaks through to reproducing designs judged good) and Y · taste distribution (stays scarce and concentrated in a few judges vs gets democratized by tools so anyone can dial it). Toggle the two axes to see whether this volume’s thesis holds or collapses in that quadrant, and what leading indicator says we are sliding toward it (the GBN two-axis scenario method).

X · 生成能力Generation Capability

Y · 品味分布Taste Distribution

品味溢价Taste Premium

停在均值 × 稀缺Stalls × Scarce

判断寡头Judgment Oligopoly

复现"好" × 稀缺Reproduces × Scarce

寒武纪长尾Cambrian Long Tail

停在均值 × 民主化Stalls × Democratized

均值之海Sea of the Mean

复现"好" × 民主化Reproduces × Democratized
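The bench can be read as a two-key lookup. The sketch below assumes each quadrant name pairs with the axis combination listed directly beneath it; the type and function names are illustrative, not part of any tool.

```typescript
// X axis: does generation stall at the professional mean or reproduce "good"?
type GenerationCapability = "stalls" | "reproduces";
// Y axis: is taste scarce and concentrated, or democratized by tooling?
type TasteDistribution = "scarce" | "democratized";

type Quadrant =
  | "Taste Premium"
  | "Judgment Oligopoly"
  | "Cambrian Long Tail"
  | "Sea of the Mean";

// Pairing assumed from the order of the labels above.
const SCENARIO_BENCH: Record<GenerationCapability, Record<TasteDistribution, Quadrant>> = {
  stalls: { scarce: "Taste Premium", democratized: "Cambrian Long Tail" },
  reproduces: { scarce: "Judgment Oligopoly", democratized: "Sea of the Mean" },
};

// Look up the world that a pair of axis settings lands in.
function quadrantFor(x: GenerationCapability, y: TasteDistribution): Quadrant {
  return SCENARIO_BENCH[x][y];
}
```

For example, `quadrantFor("reproduces", "democratized")` returns `"Sea of the Mean"`, the quadrant that needs both not-yet-confirmed conditions to hold.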

SHORT-TERM2026-2028

"生成铺开候选"成为默认工序

“Generate the candidates” becomes the default step

从一稿到多稿。独立设计师与小团队默认先让生成铺开十几二十个候选，再把人的时间几乎全部投到挑、评、导上。"会用 Figma 画得快"不再是稀缺技能；"能在二十稿里一眼挑出那一版、并说清为什么"成了新的入门线。设计系统开始被当成"喂给生成的规格"来维护，而不只是交付文档。

From one comp to many. Independent designers and small teams default to letting generation spread a dozen-plus candidates first, then pour almost all human time into picking, critiquing, steering. “Fast in Figma” stops being the scarce skill; “spot the right one out of twenty and say why” becomes the new entry bar. Design systems start being maintained as “spec fed to generation,” not just delivery documents.

校准锚：方向成立，斜率被高估。这是十年曲线的头两年，不是终点。生成在"过得去"处仍频繁卡住最后一公里（品牌细节、跨端一致、可交付状态），返工成本在 2026-28 仍然真实存在，本块所有"趋零"说法都该先打这个折扣。〔证据级 Ⅳ 从业者外推〕

Calibration anchor: the direction holds, the slope is overestimated. These are the first two years of a decade-long curve, not its endpoint. Generation still jams on the last mile at “passable” (brand detail, cross-platform consistency, deliverable state), and rework cost is real through 2026-28; every “→ zero” claim in this block should be discounted by that first. [grade Ⅳ, practitioner extrapolation]

MID-TERM2028-2030

"品味"开始被招聘、被定价、被组织化

“Taste” starts being hired for, priced, and organized

设计组织里出现明确的判断岗与执行岗分层：少数人持有"何为好"的最终判断与品牌方向，生成承包其余。岗位描述里"精通某工具"的权重下降，"判断质量、方向感、能把品味讲清楚"的权重上升。设计系统从静态库演化为"带判断的护栏"：可机检的部分挡住低级错误，挡不住的部分（节奏、情绪）显式留给人。同质化压力成为产品评审的常规议题，而不只是审美洁癖。

A clear split appears inside design orgs between judgment roles and execution roles: a few hold final judgment on “what is good” and the brand direction; generation takes the rest. In job descriptions the weight on “proficient in tool X” falls and the weight on “judgment quality, sense of direction, able to articulate taste” rises. Design systems evolve from static libraries into “guardrails with judgment”: the machine-checkable part catches low-level errors, the uncheckable part (pacing, emotion) is left explicitly to humans. Homogenization pressure becomes a routine review topic, not just a matter of aesthetic fastidiousness.

分歧点。这一段是本卷与反命题第一次正面相遇：如果到 2030 偏好模型已能稳定预测"这群人会判为好"，那"判断岗"会比预期更早地被压薄。下方反命题块记录了这条对赌。

The point of divergence. This stretch is where this volume first meets its counter-bet head-on: if by 2030 preference models can stably predict “this audience will judge it good,” the “judgment role” thins earlier than expected. The counter-bet block below records that wager.

LONG-TERM2030-2032+

形态多元，而非单一收敛

Plural forms, not a single convergence

最可能的既非"设计师消失"、也非"设计师照旧"，而是光谱分叉：一端是高度自动化、以可机检规格驱动的"均值产品"（够好即可、追求规模与速度），另一端是以稀缺品味为护城河的"判断密度组织"（少数人 + 大量生成，靠"挑得准"卖溢价）。两端之间是仍在用人手做大部分判断的传统团队。"设计师"这个词本身被重新定义：从"会做界面的人"转向"为体验的好坏承担后果的人"。

The most likely outcome is neither “designers vanish” nor “designers carry on as before” but a forking spectrum: at one end, highly automated “mean products” driven by machine-checkable spec (good-enough, chasing scale and speed); at the other, “judgment-density organizations” with scarce taste as their moat (a few people plus heavy generation, selling a premium on picking accurately). Between them sit traditional teams still doing most judgment by hand. The word “designer” itself is redefined: from “someone who can make interfaces” to “someone who bears the consequences for whether the experience is good.”

与"多元"判断对赌的，是"均值之海"的收敛预测：如果生成与偏好预测都成熟、且品味被工具民主化，差异化资产可能整体蒸发，所有产品滑向同一个被验证为"高转化"的局部最优。多元光谱与均值收敛，谁成为 2030 年代设计世界的主图景，是本章最值得跟踪的分歧。〔证据级 Ⅴ〕

Betting against the “plurality” judgment is the convergence prediction of the “Sea of the Mean”: if generation and preference prediction both mature and taste is democratized by tooling, the differentiating asset may evaporate wholesale, and every product slides toward the same local optimum validated as “high-converting.” Whether the plural spectrum or the convergence becomes the main picture of the 2030s design world is the divergence in this chapter most worth tracking. [grade Ⅴ]

COUNTER-TREND反趋势Counter-trend

"人手做的"成为溢价信号

“Made by a human” becomes a premium signal

所有强趋势都激发反趋势。当生成内容铺满，"100% 人类设计 / 手作"开始作为差异化卖点出现在小众品牌、独立刊物、精品工作室：卖点不在"人手一定更好"，而在"可被证明不是均值生成"本身成了稀缺信号。"反 slop"会从个人洁癖变成一种被市场认可的定位。这一支不会成为主流，但它结构性地存在，并持续提醒：当一切都趋同，"不趋同"本身就有价值。

Every strong trend provokes a counter-trend. As generated content saturates, “100% human-designed / handmade” begins appearing as a differentiator among niche brands, independent publications, and boutique studios: the point is not that human hands are necessarily better, but that “provably not the generated mean” becomes a scarce signal in itself. “Anti-slop” shifts from personal fastidiousness to a market-recognized position. This branch will not become mainstream, but it is structurally present, a standing reminder: when everything converges, “not converging” is itself worth something.

已坐实的一件文物 · 不是虚构One artifact already real · not fiction

下面三件是明确虚构的未来文物；但推演担心的那股力量，有一件此刻已为真。那项发表于 Patterns（Cell Press）的同行评议研究（700 条生成轨迹、7 档温度，几乎全部坍缩到 12 个通用视觉母题）就是"生成默认收敛到均值"在当下（截至 2026 年 6 月）已被测到、可引证的证据。它不证明上方情景台里那个完整的"均值之海"象限（那还要再加上"偏好可被高保真复现"与"品味被工具民主化"两条尚未坐实的条件），只坐实一件事：趋同的引力此刻已是实测现象，而非待验证的预言。这正是为什么"异质守护"要从洁癖升级为工序：要对抗的那股力，已经在场。〔源：同 R8，证据级 Ⅱ 同行评议实证〕[R8]The three pieces below are explicitly fictional future artifacts; but of the force this speculation fears, one artifact is already real now. That peer-reviewed study in Patterns (Cell Press) (700 generation trajectories, 7 temperatures, nearly all collapsing onto 12 universal visual motifs) is measured, citable evidence that, as of June 2026, “generation defaults to the mean” is already observed. It does not prove the full “Sea of the Mean” quadrant on the scenario bench above (that still needs two not-yet-confirmed conditions: preference becoming high-fidelity reproducible, and taste democratized by tooling); it confirms only one thing: the gravity of convergence is now a measured phenomenon, not a prophecy awaiting test. This is exactly why “guarding heterogeneity” must rise from fastidiousness to a process step: the force to resist is already present. [Source: same as R8, grade Ⅱ peer-reviewed empirical][R8]

把命题投影到 2031：三件明确虚构的未来文物

Projecting the thesis onto 2031: three explicitly fictional future artifacts

推演若只有论断会显得抽象。下面三件是 design fiction：明确虚构的未来文物，用以让"品味成为护城河的设计组织"可触。它们是把命题投影到 2031 的一种方式，不是预测。

Speculation made only of assertions feels abstract. The three pieces below are design fiction: explicitly fictional future artifacts that make “the design org where taste is the moat” tangible. They are a way of projecting the thesis onto 2031, not predictions.

SPECULATIVE · 虚构 · Fiction

ARTIFACT 01 · 招聘启事 · Job Posting

招聘：品味负责人（Head of Taste）· 不招生产者

Hiring: Head of Taste · Not Hiring Producers

「你不会被要求出稿。出稿这件事，我们的生成管线一天能给你三千版。你要做的是它做不了的：在三千版里挑出该上线的那一版，说清为什么是它、为什么不是另外两版，并为这个判断的后果负责。」

“You will not be asked to produce comps. Producing comps is something our generation pipeline can hand you three thousand of a day. You do what it cannot: pick the one that should ship out of three thousand, say why it and not the other two, and own the consequences of that judgment.”

职责: 判断而非生产 · 设定品味与品牌边界 · 维护"喂给生成的规格" · 为不可逆的发布决策担责
Responsibilities: Judge rather than produce · set taste and brand boundaries · maintain the “spec fed to generation” · own irreversible release decisions
不要求: 任何单一生成工具的熟练度（我们假设它一年内会被换掉）
Not required: Proficiency in any single generation tool (we assume it will be replaced within a year)
考核: 命中率与方向正确度——挑中的版本上线后的真实表现，以及"为什么"是否能复用进规格、让下一轮命中率更高（非出稿量）
Evaluation: Hit rate and directional correctness: the real-world performance of the picked version after launch, and whether the “why” can be folded back into spec so the next round’s hit rate rises (not comp volume)

SPECULATIVE · 虚构 · Fiction

ARTIFACT 02 · 工具更新日志 · Tool Changelog

某生成式设计工具 v9.0 更新日志（节选）· 人的角色被翻转

A Generative Design Tool, v9.0 Changelog (Excerpt) · The Human Role Inverts

新增: 「判断模式」成为默认。打开文件即生成 N 版候选；画布是一墙待裁的候选，不再是空白。"新建空白画板"降级到二级菜单。
Added: “Judgment mode” is now the default. Open a file and N candidates are generated; the canvas is no longer blank but a wall of candidates awaiting a verdict. “New blank artboard” is demoted to a submenu.
变更: 主操作从"画"变成"挑 / 评 / 导"。每次裁决会问一句"为什么"，把理由沉淀进项目的品味规格——工具开始替你积累那条判断回路。
Changed: The primary action shifts from “draw” to “pick / critique / steer.” Each verdict prompts a “why,” depositing the reason into the project’s taste spec: the tool begins accruing your judgment loop for you.
已知限制: 工具能挡住违反规格的候选，不能替你决定规格本身对不对。节奏、情绪、何时留白仍需人裁——这是设计上我们刻意不自动化的边界，也是本工具的设计立场。
Known limitation: The tool can block candidates that violate the spec; it cannot decide for you whether the spec itself is right. Pacing, emotion, when to leave space still require a human verdict: a boundary we deliberately do not automate, and this tool’s design stance.

SPECULATIVE · 虚构 · Fiction

ARTIFACT 03 · 同质化事故复盘 · Homogenization Postmortem

"我们的 App 和竞品长得一模一样" · 复盘摘要

“Our App Looks Identical to a Competitor’s” · Postmortem Summary

2031，一次品牌重做半年后，团队发现自己的产品与两家竞品的关键页几乎无法区分。没人抄谁：三方都用了同几个生成模型、同几套流行规格、同样把"提升转化"交给同一类偏好预测。趋同是每个理性团队都各自滑向了同一个被验证为"高转化"的局部最优，不是抄袭。

In 2031, half a year after a brand refresh, a team found its product nearly indistinguishable from two competitors’ on the key screens. No one copied anyone: all three used the same few generation models, the same popular spec, and handed “improve conversion” to the same class of preference prediction. The convergence was every rational team independently sliding toward the same local optimum validated as “high-converting,” not plagiarism.

根因: 共享模型 + 共享规格 + 共享优化目标 = 共享吸引子（呼应 DSN 09·7 同质化机制）
Root cause: Shared models + shared spec + shared optimization target = a shared attractor (echoing the DSN 09·7 homogenization mechanism)
责任链: 落在把"何为好"完全外包给转化数字的判断者：症结不在"模型趋同了"，而在没人守异质这件事的判断节点空着
Chain of responsibility: Falls on the judges who outsourced “what is good” entirely to conversion numbers: the problem is not “the models converged” but that the judgment node for guarding heterogeneity was left empty
修复: 把一条"异质守护"指标放进评审（与均值的距离），并显式保留一个人来回答"这是否还像我们"：把本卷的命题做成一道工序，而非口号
Remediation: Put a “guard-heterogeneity” metric into review (distance from the mean) and explicitly keep one human to answer “does this still look like us”: turning this volume’s thesis into a step in the process, not a slogan

主动写下的反命题 · COUNTER-BET本卷押注"品味稀缺、不可机检"。诚实要求把最强的反方观点完整摆出，而不是立一个稻草人来打。反方的最强形式不是"AI 会画得更好看"，而是"品味在统计上并不神秘，因此可被学习与复制"。论证分三步，每一步都已有早期苗头：其一，"被某群人判为好"很可能是一个有结构、可学习的分布：人的审美偏好并非随机，它被文化、语境、近因强烈约束，而凡是有结构的东西，足够的数据加足够强的模型原则上就能逼近。其二，规模正在把这个分布的数据补齐：每一次 A/B、每一次留存、每一次"用户更爱哪版"都在给偏好函数喂标注，偏好建模因此可能不必依赖"理解为什么好"，只需在结果上稳定复现"被判为好"。其三，本卷反复强调的"导/挑/评"这套判断动作，一旦能被显式表达成规格（这正是 DSN 06-08 在推动的事），也就同时把它暴露成了可被模仿的训练信号：我们越是成功地把品味外化成可机检的规格，就越是在亲手为"自动化品味"铺设训练集。这是本卷方法论里一个真实的内在张力，不该被掩盖。什么观察会证实反方、判本卷败：当一个"生成 + 偏好预测"系统在双盲条件下、对一个它训练时未见过的新受众群、于一个旧判据不再适用的新情境里，其挑选命中率能稳定追平甚至超过该领域资深判断者（且这种优势可跨品类复现、而非靠过拟合某一类视觉），那么"品味结构性地停在人这侧"就被推翻了，护城河蒸发，本卷的承重命题随之失效，应整章改写。本卷给自己留的、尚未被证伪的余地只有一处：那个"未见过的新情境"：偏好模型擅长内插已被判过的分布，但一个真正没有先例的新审美命题（一种没人做过、却"对"的东西），没有历史标注可学。只要新情境持续产生、且其判断无法靠内插历史得出，人的判断节点就还在；这条余地一旦也被关上（模型能稳定地为无先例情境做出被验证为对的判断），本卷就该认输。作者把这条证伪条件白纸黑字写在这里，正是为了不把"品味永远稀缺"当成不可质疑的信仰。〔证据级 Ⅴ 论证，与本卷对赌〕

The counter-bet on record · COUNTER-BETThis volume bets that taste is scarce and machine-uncheckable. Honesty demands laying out the strongest opposing case in full rather than erecting a straw man to knock down. The counter-bet’s strongest form is not “AI will draw prettier things” but “taste is not statistically mysterious and is therefore learnable and reproducible.” The argument runs in three steps, each with an early signal already visible. First, “judged good by a given audience” is very likely a structured, learnable distribution: human aesthetic preference is strongly constrained by culture, context, and recency, not random, and anything with structure can in principle be approximated by enough data plus a strong enough model. Second, scale is filling in that distribution’s data: every A/B test, every retention curve, every “users preferred this version” is labeling the preference function, so preference modeling may not need to “understand why it is good,” only to reproduce “judged good” stably at the level of outcomes. Third, the very judgment act this volume keeps emphasizing (steer / pick / critique), once it can be expressed explicitly as spec (exactly what DSN 06-08 push toward), is thereby also exposed as an imitable training signal: the more successfully we externalize taste into machine-checkable spec, the more we are, with our own hands, laying down the training set for “automated taste.” This is a real internal tension inside this volume’s method, and it should not be hidden. 
What observation would confirm the counter-bet and rule this volume lost: when a “generation + preference-prediction” system, under double-blind conditions, for a new audience it did not see in training, in a novel situation where old criteria no longer apply, can stably match or exceed a senior domain judge’s pick-accuracy, and that edge reproduces across categories rather than overfitting one visual genre, then “taste sits structurally on the human side” is overturned, the moat evaporates, this volume’s load-bearing claim fails with it, and the chapter should be rewritten. The one not-yet-falsified margin this volume keeps for itself is precisely that “novel situation never seen”: preference models excel at interpolating distributions already judged, but a genuinely unprecedented aesthetic proposition (something no one has made yet that is nonetheless “right”) has no historical labels to learn from. As long as novel situations keep arising and their judgment cannot be reached by interpolating history, the human judgment node remains; the day that margin closes too, when a model stably makes verified-correct judgments for situations without precedent, this volume should concede. The author writes this falsification condition down in plain ink precisely so as not to hold “taste is forever scarce” as an unquestionable article of faith. [grade Ⅴ, argument, betting against this volume]

推演溢出的东西

Second-Order Effects

推演的终点是它溢出的东西，不是设计组织本身。以下每条都标注在哪个象限下成立：没有无条件的预言。

The endpoint of speculation is not the design org itself but what spills over from it. Each item below is annotated with the quadrant under which it holds; there are no unconditional prophecies.

新岗位：品味负责人 / 判断岗与执行岗分层〔品味溢价 / 判断寡头象限〕；"异质守护"成为可考核职责〔均值之海被察觉后的任一象限〕；设计系统维护者升级为"规格 + 判断载体"的持有人〔系统即规格成立时〕。
New roles: Head of Taste / a split between judgment and execution roles [Taste Premium / Judgment Oligopoly]; “guarding heterogeneity” becomes an evaluable duty [any quadrant once the Sea of the Mean is noticed]; the design-system maintainer becomes the holder of “spec plus judgment carrier” [when system-as-spec holds].
新方法工具：与均值的距离作为评审指标〔同质化象限〕；可机检的品味护栏（挡低级错误，显式留白给人）〔系统即规格成立〕；命中率而非出稿量作为考核〔判断岗成型后〕。
New methods and tools: distance-from-the-mean as a review metric [convergence quadrants]; machine-checkable taste guardrails (block low-level errors, leave the uncheckable to humans) [system-as-spec holds]; hit rate rather than comp volume as evaluation [once judgment roles form].
二阶影响：就业——执行岗收缩、判断岗稀缺化的断层〔全象限，强度随生成能力上升〕；审美生态——同质化的系统性风险与"人手做"的反向溢价同时出现〔均值之海加速两端〕；版权与署名——当作者是"挑的人"而非"做的人"，署名与责任如何归属〔全象限〕。
Second-order effects: employment: the rupture between shrinking execution roles and a scarcified judgment role [all quadrants, intensity scaling with generation capability]; the aesthetic ecosystem: the systemic risk of homogenization and the reverse premium on “made by hand” emerging together [the Sea of the Mean accelerates both ends]; copyright and attribution: when the author is “the one who picks” not “the one who makes,” how authorship and liability are assigned [all quadrants].
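One way “distance from the mean” could be made computable is sketched below. It assumes each design has already been reduced to a numeric feature vector (how that is done is outside this sketch and is itself a judgment call), and it uses plain Euclidean distance to the centroid of a reference set as a stand-in metric. All names are hypothetical.

```typescript
type FeatureVector = number[];

// Component-wise mean of the reference designs (the "mean" being guarded against).
function centroid(vectors: FeatureVector[]): FeatureVector {
  if (vectors.length === 0) throw new Error("need at least one reference vector");
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    if (v.length !== dim) throw new Error("dimension mismatch among references");
    v.forEach((x, i) => {
      sum[i] += x;
    });
  }
  return sum.map((s) => s / vectors.length);
}

// Euclidean distance from a candidate to the centroid of the references.
function distanceFromMean(candidate: FeatureVector, references: FeatureVector[]): number {
  const mean = centroid(references);
  if (candidate.length !== mean.length) throw new Error("candidate dimension mismatch");
  const squared = candidate.reduce((acc, x, i) => acc + (x - mean[i]) ** 2, 0);
  return Math.sqrt(squared);
}
```

A low score only flags that a candidate sits close to the pack. Whether it still “looks like us” remains the question a person answers in review.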

DSN

PLAYBOOK · 落地 / 失败模式 / 自检

PLAYBOOK · LANDING & FAILURE MODES

行动 · 承重

Action · Load-bearing

起步、最常见的误用方式、一件自检

Where to start, the most common ways to get it wrong, one self-check

把这一卷收成可执行的落点：四条原则、四组信号、一条起步路径，加上设计师在 AI-Native 化里最常掉的三个坑。最后给一件可玩的自检——把"品味是稀缺判断"做成你下次发版前能跑一遍的清单。

Bring the volume to an executable landing: four principles, four signal sets, a starting path, plus the three pits designers most often fall into going AI-Native. Then a playable self-check that turns “taste is the scarce judgment” into a list you can run before the next release.

一句话In one line

生成负责"多"，人负责"对"：先建护栏再开生成，把"规格→铺开→评判→导向→收敛→沉淀"这条环跑通，并避开三个坑：把"快"当赢、用生成代替判断、把品味硬塞进 lint。Generation handles “many,” people handle “right”: build guardrails before generating, run the “spec → spread → critique → steer → converge → distill” loop end to end, and dodge the three pits: mistaking “fast” for the win, substituting generation for judgment, and forcing taste into lint.

四条原则：设计系统先行、生成多判断严、写下何为好、守住人本与异质

Four principles: design system first, generate many and judge hard, write down what’s good, hold the human and the heterogeneous

设计系统先行：先立 token／组件／红线，再开生成——没有护栏的生成只会更快地铺 slop。
Design system first: stand up tokens / components / red lines before generating; generation without guardrails only spreads slop faster.
生成多、判断严：铺开候选交给机器，挑／评／导留给人；产出量不是指标，品味命中率才是。
Generate many, judge hard: spreading candidates goes to the machine; pick/critique/steer stays with people; output volume is not the metric, taste hit-rate is.
写下"何为好"：软判据写给人、硬约束写成 lint；不写下来，生成只会滑回均值。
Write down “what’s good”: soft criteria for humans, hard constraints as lint; unwritten, generation slides back to the mean.
守住人本与异质：出稿快慢从不是赢的标准，为具体的人而做才是；好的设计只对一群人成立，别为"对所有人还行"而妥协。
Hold the human and the heterogeneous: speed is never the standard for winning; being made for specific people is. Good design is true for one group, so do not trade it for “fine for everyone.”
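As an illustration of “hard constraints as lint,” the sketch below checks color-token pairs against the WCAG 2.x contrast formula, using the 4.5:1 AA threshold for normal-size text as the default. The `TokenPair` shape and the token names in the usage note are hypothetical; a real setup would read them from the design system’s own token files.

```typescript
// Parse "#rrggbb" into [r, g, b] channels in 0..255.
function parseHex(hex: string): [number, number, number] {
  const m = /^#([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map((c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Hypothetical token pair: a foreground token drawn on a background token.
interface TokenPair {
  name: string;
  foreground: string;
  background: string;
}

// Return the names of pairs that fall below the minimum ratio.
function lintContrast(pairs: TokenPair[], minRatio: number = 4.5): string[] {
  return pairs
    .filter((p) => contrastRatio(p.foreground, p.background) < minRatio)
    .map((p) => p.name);
}
```

Run over a pair such as `{ name: "text-muted-on-surface", foreground: "#cccccc", background: "#ffffff" }`, the lint reports the name as a failure. The soft criteria (voice, pacing, “does this feel like us”) have no such formula and stay in prose written for people.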

AI 是协作者，不是评判者——这条边界决定了谁握最终判断

AI is a collaborator, not the judge: this boundary decides who holds the final call

一个容易滑过去、却决定成败的边界：在设计里，AI 可以是极强的协作者（铺候选、补状态、给建议、甚至模拟某类用户的反应），但不能成为最终的评判者。原因回到 DSN 03·5：评判"这版是否为这群人对路"需要没法机器化的品味判断，它坐落在可验证性梯度的最远端，AI 给出的"评分"本质仍是对均值的拟合——让它当裁判，等于让均值来定义好坏，那条异质守护的线就会被悄悄抹平。可以让 AI 帮你把判断说得更清楚（"这版为什么让你犹豫？是层级、是语气、还是节奏？"），这是协作；但不能让 AI 替你做出那个判断。把这条边界写进团队的工作约定：AI 的输出永远是"候选 + 理由"，最终"收哪个"的按钮必须由人按下，并由人说清为什么。一旦让模型既当运动员又当裁判，闭环里那个验证器就被偷换成了均值生成器，整套方法的承重点就塌了〔源：Doshi & Hauser《Science Advances》受控实验——AI 放大个体新颖、压平集体多样性；Patterns（Cell Press）同行评议——生成坍缩到分布均值母题；两者均证据级 Ⅱ；前者对象为叙事文本，迁移到设计为合理但未验证的外推〕[R4][R8]。

A boundary that is easy to skip past yet decisive: in design, AI can be an extremely strong collaborator (spreading candidates, filling states, giving suggestions, even simulating how a class of users might react), but it cannot become the final judge. The reason returns to DSN 03·5: judging “is this version on-target for these people” needs constitutive taste, sitting at the far end of the verifiability gradient, and the “score” AI gives is still at bottom a fit to the mean. Making it the referee means letting the mean define good and bad, and the line of heterogeneity-guarding gets quietly erased. You can let AI help you articulate the judgment more clearly (“why does this version make you hesitate: hierarchy, voice, or pacing?”), which is collaboration; but you cannot let AI make that judgment for you. Write this boundary into the team’s working agreement: AI’s output is always “candidates + reasons,” and the final “converge on which” button must be pressed by a human who states why. Once the model is both athlete and referee, the verifier inside the closed loop is swapped for a mean-generator, and the whole method’s load-bearing point collapses [Source: Doshi & Hauser, Science Advances, controlled experiment: AI amplifies individual novelty and flattens collective diversity; Patterns (Cell Press), peer-reviewed: generation collapses onto the distribution’s mean motifs; both grade Ⅱ; the former studies narrative text, so transfer to design is a reasonable but unverified extrapolation][R4][R8].
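The working agreement can be encoded as a small contract: the model side can only emit candidates with reasons, and convergence requires a verdict record from a named person with a non-empty “why.” This is a sketch with hypothetical names, not a prescribed implementation, and a type cannot prove that a human actually decided; it only makes skipping the step visible.

```typescript
// What the model side may produce: a candidate plus its stated reason.
interface Candidate {
  id: string;
  reason: string;
}

// What only a person may produce: the pick and the explanation for it.
interface HumanVerdict {
  decidedBy: "human"; // literal type: there is deliberately no "model" option
  reviewer: string;
  chosenId: string;
  why: string;
}

// Converge on one candidate; refuses verdicts that do not state why.
function converge(candidates: Candidate[], verdict: HumanVerdict): Candidate {
  if (verdict.why.trim().length === 0) {
    throw new Error("a verdict must state why");
  }
  const chosen = candidates.find((c) => c.id === verdict.chosenId);
  if (chosen === undefined) {
    throw new Error(`unknown candidate id: ${verdict.chosenId}`);
  }
  return chosen;
}
```

The stored `why` is also what can later be folded back into the spec, so the record serves the “distill” step as well as accountability.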

起步路径

Starting path

① 把现有设计搬到代码形态（哪怕只是 token ＋组件落进 repo），先拿到可读／可 diff／可生成。② 用下方自检挑出"最该注入品味"的几处。③ 在那几处跑一遍 DSN 07 的环（规格→铺开→评判→导向→收敛→沉淀），把第一轮的判断回流进系统。先小，先让护栏与判断成形，再扩面。

① Move existing design into code form (even just tokens + components committed to the repo) so that it first becomes readable, diffable, and generatable. ② Use the self-check below to find the few places “most in need of injected taste.” ③ Run the DSN 07 loop there (spec → spread → critique → steer → converge → distill), feeding the first round’s judgment back into the system. Start small; let guardrails and judgment take shape, then widen.

这条起步路径刻意把"先建护栏"放在"开生成"之前，是因为顺序本身承重。一个常见的失败是反过来：先兴奋地让 AI 铺一堆界面，再回头想"该用什么规范统一它们"，这时你已经被一堆好看但各异的候选淹没，判断力耗在收拾局面上，而不是导向。先建护栏（哪怕极小：三五个 token、两三条红线、一句"为谁而做"）意味着第一轮生成就落在窄带里，你的判断从一开始就用在刀刃上。所以"先小"是把稀缺的判断力花在最高杠杆的地方，不是保守：先让护栏与判断在一个小范围内成形、跑通那条闭环，确认判断真的在回流、命中率真的在上升，再扩到更大的面。把第一个完整循环跑通，比一次铺开十个页面重要得多。

This starting path deliberately puts “build guardrails first” before “start generating,” because the order itself is load-bearing. A common failure reverses it: excitedly have AI spread a pile of interfaces first, then go back wondering “what standard should unify them,” by which point you are drowning in good-looking but divergent candidates, your judgment spent on cleanup rather than steering. Building guardrails first (even minimal: three to five tokens, two or three red lines, one “for whom” line) means the very first round of generation lands in the narrow band, and your judgment is spent on the cutting edge from the start. So “start small” is spending scarce judgment where leverage is highest, not conservatism: let guardrails and judgment take shape in a small scope first, run the closed loop through, confirm that judgment really flows back and hit-rate really rises, then widen to a larger surface. Getting that first complete cycle running matters far more than spreading ten pages at once.

如果要把这一卷压成可以贴在墙上的一句话，那就是：生成负责"多"，人负责"对"；而"对"的标准，必须由人写下来、并随每一轮判断变得更准。这一句里装着全部四原则——设计系统先行（把"对"的标准前置成护栏）、生成多判断严（分工："多"给机器、"对"给人）、写下何为好（标准必须外化，否则生成只能滑回均值）、守住人本与异质（"对"的最终判据是"为这群具体的人对"，而非"对所有人还行"）。它也装着全部三个失败模式的镜像：把"快"当胜利＝只追求了"多"忘了"对"；用生成代替判断＝放弃了人对"对"的责任；把品味当可计算＝误以为"对"的标准能全交给机器。把这一句记住，遇到任何具体决策时，问自己：这一步我是在帮生成产出更多，还是在帮自己把"对"的标准说得更清楚？前者机器越来越能替你做，后者永远是你的活——也永远是这门手艺的价值所在。

If this volume had to be compressed into one line you could pin on a wall, it would be: generation handles “many,” people handle “right”; and the standard for “right” must be written down by people and grow sharper with each round of judgment. That one line holds all four principles: design system first (front-load the standard for “right” as a guardrail), generate many and judge hard (the division of labor: “many” to the machine, “right” to people), write down what is good (the standard must be externalized, or generation can only slide back to the mean), and hold the human and the heterogeneous (the final criterion of “right” is “right for these specific people,” not “fine for everyone”). It also holds the mirror image of all three failure modes: mistaking “fast” for the win = chasing only “many,” forgetting “right”; generation in place of judgment = abandoning the human’s responsibility for “right”; treating taste as computable = wrongly believing the standard for “right” can be handed entirely to the machine. Remember this one line, and at any concrete decision ask yourself: in this step am I helping generation produce more, or helping myself state the standard for “right” more clearly? The former the machine can increasingly do for you; the latter is forever your work, and forever where the real value of this craft lies.

最常见的三种误用

The three most common ways to get it wrong

失败 · 一

FAILURE · 1

把"快"当胜利

Mistaking “fast” for the win

出稿更快了，只是更快地产 slop。出稿快本身不是赢；若没换来更高品味命中，瓶颈仍停在旧流程里。

Comps come faster, but it is just slop faster. Speed itself is not the win; if it buys no higher taste hit-rate, the bottleneck is still inside the old process.

失败 · 二

FAILURE · 2

用生成代替判断

Generation in place of judgment

不停"再来一个"却说不出在找什么。环空转，候选越堆越多，方向反而越来越糊。该停下写规格。

Endless “one more” with no statement of what you seek. The loop spins, candidates pile up, direction blurs. Stop and write the spec.

失败 · 三

FAILURE · 3

把品味当可计算

Treating taste as computable

把软判据硬塞 lint，优化掉每个可机检指标，得到挑不出错却没人想用的界面。软判据留给人。

Forcing soft criteria into lint, optimizing every checkable metric into a flawless interface no one wants. Soft criteria stay with people.

INSTRUMENT 12 · SLOP 自检表

INSTRUMENT 12 · SLOP SELF-CHECK

发版前跑一遍：勾掉你这版命中的征兆。命中越多，越滑向均值；读数会给出 slop 分、所处带，与第一处该注入品味的地方。

Run it before release: tick the fingerprints this version hits. The more hits, the closer to the mean; the readout gives a slop score, the band, and the first place to inject taste.

把每类设计决策分诊：哪些交给生成、哪些定成规则、哪些必须人判

Triage each kind of design decision: hand to generation, set as a rule, or judge by human

DSN 08 给了一条分诊问句，这里把它做成一台可玩的分配器。它沿两条轴打分：横轴问"这类决策可生成 / 可规则化吗？"，纵轴问"它需要品味判断吗？"。两轴一交叉，每类决策落进一格，给出明确判词——交给生成、设计系统定规则、还是必由人判。这台分配器的价值在于逼你把"设计"这个笼统的词拆成一类一类的决策、逐类问这两个问题，不在于告诉你某个答案——这本身就是把品味从直觉里外化出来的过程。试着用它过一遍你手上的项目：补全状态、配色主调、信息层级、品牌语气、对齐间距，各落在哪格？

DSN 08 gave a triage question; here it becomes a playable allocator. It scores along two axes: the horizontal asks “can this kind of decision be generated / ruled?” and the vertical asks “does it need taste judgment?” Cross the two and each kind of decision lands in a cell with a clear verdict: hand to generation, set a design-system rule, or judge by human. The allocator’s value is not in handing you an answer but in forcing you to break the blanket word “design” into kinds of decisions and ask these two questions of each kind, which is itself the process of externalizing taste from intuition. Try running your current project through it: filling states, the primary palette, information hierarchy, brand voice, alignment and spacing. Which cell does each land in?

INSTRUMENT 13 · 设计判断分配器

INSTRUMENT 13 · DESIGN-JUDGMENT ALLOCATOR

选一类设计决策的两个属性，看它该落在哪个节点。两轴：可生成/可规则化？× 需品味判断？

Pick the two attributes of a kind of design decision and see which node it belongs to. Two axes: generable/rule-able? × needs taste?

① 这类决策可生成 / 可规则化吗？

① Can it be generated / ruled?

② 它需要人来定的品味判断吗？

② Does it need constitutive taste?

x=Y · y=N

交给生成

Hand to generation

补全状态、铺响应式、套系统、对齐切图——机器全包。

Fill states, spread responsive layouts, apply the system, handle alignment and export: the machine does it all.

x=Y · y=Y

设计系统定规则

Design-system rule

可机检但带价值取向：调色板、间距阶、对比度阈值——人先定规则，机器再执行。

Machine-checkable yet value-laden: palette, spacing scale, contrast thresholds: humans set the rule, the machine enforces.

x=N · y=N

需上下文的事实题

Context-fact question

不靠品味但须懂语境：这群用户的真实流程、设备、约束——人查清事实，喂给生成。

No taste but needs context: these users’ real flows, devices, constraints: humans establish facts, then feed generation.

x=N · y=Y

必由人判 · 品味

Human taste · keep here

为谁而做、有没有灵魂、对不对路——可验证性梯度最远端，不可外包给生成。

For whom, has soul, on-target: the far end of the verifiability gradient, not outsourceable.
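The four cells above can be sketched as a small lookup table. This is a minimal illustration, not the instrument's actual implementation: the names `Verdict`, `ALLOCATOR`, and `allocate` are hypothetical, and the two booleans stand for the x and y answers in the instrument.

```python
from typing import NamedTuple


class Verdict(NamedTuple):
    cell: str   # the cell's verdict, as named in the instrument
    owner: str  # who does the work in that cell


# Keys are (rule_able, needs_taste), i.e. the x and y answers.
ALLOCATOR = {
    (True, False): Verdict("Hand to generation", "machine"),
    (True, True): Verdict("Design-system rule", "humans set the rule, machine enforces"),
    (False, False): Verdict("Context-fact question", "humans establish facts, then feed generation"),
    (False, True): Verdict("Human taste", "human"),
}


def allocate(rule_able: bool, needs_taste: bool) -> Verdict:
    """Return the cell a kind of design decision lands in."""
    return ALLOCATOR[(rule_able, needs_taste)]


# Example placements taken from the cell descriptions above.
print(allocate(True, False).cell)  # filling states
print(allocate(True, True).cell)   # contrast thresholds
print(allocate(False, True).cell)  # "for whom"
```

The table does not decide anything by itself: answering the two questions for each kind of decision is still the human's work, which is the point the allocator makes.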

对一个设计师的职业，这套方法意味着什么

What this method means for a designer’s career

把这卷收到个人层面：如果你是一个设计师，这套方法不是在预告你的工作会消失，而是在指出你的价值重心会迁移，且越早主动迁移越有利。会做（出稿、铺变体、对齐切图）的部分会持续贬值，因为它正是生成最擅长接管的；会判断（写有判别力的规格、说清为什么这版对路、把判断回流进系统）和会方向（看出该往哪生成、守住为谁而做的边界）的部分会持续升值，因为它坐落在可验证性梯度上模型够不到的那一端。

Bringing this volume down to the individual level: if you are a designer, this method does not foretell that your work will vanish; it points out that your center of value will migrate, and the earlier you migrate it deliberately, the better off you are. The making part (producing comps, spreading variants, alignment and export) will keep depreciating, because it is exactly what generation is best at taking over; the judging part (writing discriminating specs, stating why a version is on-target, feeding judgment back into the system) and the directing part (seeing which way to generate, holding the boundary of for-whom) will keep appreciating. They sit at the end of the verifiability gradient the model cannot reach.

值得主动投资的能力，是"把品味外化得更清楚、把判断说得更有理有据"，不再是"把某个工具用得更熟"。一个具体的自检：回顾你上周的工作，花在产出上的时间和花在判断、写规格、给方向上的时间，比例是多少？如果还是前者占绝大多数，说明你还停在旧流程里用更快的手，迁移还没发生。这套方法给的是一张把自己的价值重心往上挪的地图，不是工具。

This means the capability worth investing in is no longer “getting more fluent with some tool” but “externalizing taste more clearly, stating judgment with more reasoned grounds.” A concrete self-check: review last week’s work and ask what the ratio is between time spent on production and time spent judging, writing specs, and giving direction. If the former still dominates by far, you are still in the old process with a faster hand, and the real migration has not happened. What this method offers is a map for moving your own center of value upward, not a tool.

这套迁移有一个常被忽略、却对个人最重要的隐含前提：判断力是会随用而长、随弃而萎的，所以"现在就开始判断"本身就是在投资未来的自己。价值重心上移是一条需要持续走的成长曲线，而非一次性切换的开关：你越早开始在工作里刻意练规格、练评判、练方向，你的判断力就越早进入复利增长；反之，越是抱着"等工具更成熟、等团队都转了我再转"的心态拖延，你就越是在让那条本该增长的曲线停在原地，而周围真正动手的人在拉开差距。

This migration has an implication often overlooked yet most important to the individual: judgment grows with use and atrophies with disuse, so “start judging now” is itself an investment in your future self. Moving your center of value upward is a growth curve you must keep walking, not a one-time switch: the earlier you start deliberately practicing spec, critique, and direction in your work, the earlier your judgment enters compounding growth; conversely, the more you procrastinate with “I’ll switch once the tools mature, once the team has moved,” the more you let that curve that should be growing stall in place while the people actually doing the work pull ahead.

这条曲线对个人的残酷与公平都在于：它不奖励你用了多新的工具，只奖励你做了多少次判断、并复盘了多少次为什么。所以这一卷给设计师的最后一句话不是"去学某个 AI 工具"，而是：从你手上正在做的下一个设计开始，强迫自己在每个该判断的地方真的判断一次、并写下为什么，这一个动作重复足够多次，就是你在 AI-Native 时代最可靠的护城河。

This curve is both cruel and fair to the individual: it rewards not how new your tools are but how many real judgments you made and how often you reviewed why. So this volume’s last word to designers is not “go learn some AI tool” but: starting with the very next design in your hands, force yourself to actually judge once at each place a judgment is due and write down why. This one act, repeated enough times, is your most reliable moat in the AI-Native era.

把整卷收成一句可以带走的话：AI 让"做"变得廉价，于是"判断该做什么、为谁做"第一次成了设计几乎全部的价值。过去这两件事缠在一起，做得好的人通常也判断得好，于是没人需要把判断单拎出来谈；现在执行被剥离给生成，判断被迫独立显形，它的稀缺、它的可训练、它对人本的依赖，才第一次看得这么清楚。这一卷做的全部事情，就是把这个被剥离出来的判断，从"藏在直觉里"还原成"可拆解、可外化、可回流、可练习的具体动作"。但这仍是过渡态的答案——它假设"一个人为一群人做出一版被挑中的设计"这个形状不变，只是重新分配了谁出稿、谁判断。真要把设计从头设计一遍，绕不开两个这一面单独问不完的问题：谁来定义"这群人"、谁为生成被个性化之后的后果负责——答案得穿到组织那一层去找。

To bring the whole volume to one takeaway line: AI made “making” cheap, so “judging what to make and for whom” became, for the first time, design’s real and nearly entire value. These two used to be entangled: those who made well usually judged well, so no one needed to discuss judgment on its own. Now that execution has been stripped off and handed to generation, judgment is forced to take independent shape, and only now are its scarcity, its trainability, and its dependence on the human seen this clearly. Everything this volume did was to restore that stripped-out judgment from “mysticism hidden in intuition” to “a concrete act that can be decomposed, externalized, fed back, and practiced.” But that answer is still transitional: it keeps fixed the shape of “one person makes one chosen version for a group,” only reassigning who draws and who judges. Redesigning design from zero runs into two questions that this facet alone cannot finish asking: who gets to define “this group,” and who answers for the consequences once generation is personalized. The answers have to be sought one layer up, in the organization.

收束 · 系列人本主线

Close · the series’ human through-line

这一卷是整个系列人本主线落得最具体的一面：AI-Native 设计把设计师还给共情、品味与意义，而非更快地产 slop；为具体的人，做真正为他们而存在的东西。

This volume is where the series’ human through-line lands most concretely: AI-Native design returns the designer to empathy, taste, and meaning rather than producing slop faster; making, for specific people, something that genuinely exists for them.

DSN

WORKED · 落到产物上

WORKED CASES

案例 · 走一遍

Cases · run it

把内核四步，按到四个真实产物上

Pressing the four-step kernel onto four real artifacts

原理若只停在原理，就只是好听的话；这里让它见真章。

Principle that stays principle is just nice talk; here it faces the test.

一句话

In one line

把同一套内核按到四个真实产物上，每个案例都给出可核对的前后差，和那一刀只能留在人手里的判断。

Press the same kernel onto four real artifacts; each gives a checkable before/after and the one cut that can only stay in human hands.

案例 A · 结账流程在近免费生成下重做

CASE A · A checkout flow, redone under near-free generation

铺开→收敛真实跑法

spread→converge, as it ran

背景

Setting

一个中小电商的结账流程，老问题是第三步（地址＋配送方式）弃单率高。传统做法：交互稿排一周、评审、切图、交付前端两周——一个备选，赌它对。AI-Native 做法：把约束写清后，让生成在一个下午铺出 九个结构不同的候选——单页全展开、三步分页、手风琴折叠、地址自动带出＋仅确认、配送选项前置、运费早现、游客优先、钱包优先、混合式。产出不是瓶颈了，挑哪个、为什么才是。

A small-to-mid-size e-commerce shop’s checkout flow, whose chronic problem was a high abandonment rate at step three (address + delivery method). Traditional path: a week of interaction comps, a review, asset export, then two weeks of front-end handoff, all for one candidate and a bet that it is right. The AI-native path: with constraints written down, generation spread nine structurally different candidates in one afternoon: single-page expanded, three-step paged, accordion, address-autofill-then-confirm, delivery-first, freight-shown-early, guest-first, wallet-first, hybrid. Production stopped being the bottleneck; which one, and why became it.

内核①②

KERNEL ①②

执行（出九稿）交给生成；判断（选哪个）退守到人。

Execution (nine comps) to generation; judgment (which) back to people.

收敛是怎么真发生的

How convergence actually happened

九个候选没有靠"哪个看起来顺眼"挑。团队先把弃单的真实成因查清——会话回放显示，多数人卡在运费在最后一步才出现，而非布局繁简。这一条事实把九选一从审美题变成了事实题：凡是"运费早现"的候选直接进入决赛圈，其余无论多漂亮都淘汰。决赛三个候选做了一周灰度 A/B：运费早现＋地址自动带出的混合式，弃单率从 31% 降到 19%〔源：本案例数字为脱敏复盘区间，证据级 Ⅳ 一手从业者，非公开实验，不外推到其他品类〕[R5]。

The nine were not picked by “which looks nicer.” The team first established the real cause of abandonment: session replay showed most people stalled because freight appeared only at the last step, not because layouts were busy. That single fact turned nine-into-one from an aesthetic question into a factual one: every “freight-early” candidate advanced; the rest, however pretty, were cut. Three finalists ran a week of staged-rollout A/B testing: the hybrid with freight-early plus address-autofill cut abandonment from 31% to 19%〔source: figures here are a de-identified retrospective range, grade Ⅳ practitioner first-hand, not a public experiment, not extrapolated to other categories〕[R5].
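To read the abandonment figures above without mixing up two kinds of change, the arithmetic can be written out. This is a small sketch with a hypothetical helper `change`; it uses only the two reported numbers (31% and 19%), which the source itself describes as a de-identified retrospective range, so the result is illustrative rather than a measured effect size.

```python
def change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return points, relative


# Abandonment rate at checkout, before and after, as reported in Case A.
points, relative = change(31, 19)
print(f"{points:+.0f} points, {relative:+.1f}% relative")
# prints "-12 points, -38.7% relative"
```

The drop is 12 percentage points, which is roughly a 39% relative reduction in abandonment; stating which of the two is meant avoids overstating or understating the result.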

留在人手里

STAYED HUMAN

把"为什么弃单"查成事实，再用事实当判据筛候选——这一刀机器递不出。

Establishing “why they abandon” as fact, then using fact as the cut: a cut the machine cannot hand you.

读后

Read-off

注意这不是"生成帮我们更快画完一个稿"。是流程换了形：先铺开候选空间，再用一条查清的事实把空间收掉大半，最后只在三个里做受控比对。生成做了所有"做出来"的活，人做的全是"该是哪个、凭什么"的判断。两周的交付压成了一个下午加一周灰度，省下的是把赌注从一个押到了九个，再用证据收敛，不是工时。

Note this is not “generation helped us finish one comp faster.” The process changed shape: spread the candidate space first, collapse most of it with one established fact, then run a controlled comparison among only three. Generation did all the “making”; the human did all the “which one, on what grounds” judgment. Two weeks of handoff compressed to one afternoon plus a week of staged rollout. What was gained was not labor saved but a change in the bet: from staking everything on one candidate to spreading it across nine and then converging on evidence.

内核③④

KERNEL ③④

事实写进筛选规则（③）；为真实用户的处境而判（④）。

Fact written into the screening rule (③); judging for real users’ situation (④).

案例图

CASE FIG

FIG. D9 / SPREAD-THEN-COLLAPSE · 九候选如何被一条事实收掉 · how nine candidates collapse on one fact

看懂：候选空间先铺宽，再被"运费早现"这条事实砍成决赛圈

Read: the candidate space spreads wide, then one fact (“freight early”) cuts it to a shortlist

漏斗不是"生成给的越多越好"。宽口是生成的功劳（九个结构不同的候选，近零成本）；窄口是人的功劳（一条查清的事实把审美题压成事实题）。决赛圈才用受控比对。机器把"做出来"做到极廉，人把"该是哪个"判得有据——这就是铺开→收敛的全部。

The funnel is not “the more generation gives, the better.” The wide mouth is generation’s contribution (nine structurally distinct candidates, near-zero cost); the narrow mouth is the human’s (one established fact compresses an aesthetic question into a factual one). Only the shortlist gets controlled comparison. The machine makes “making” nearly free; the human judges “which one” on grounds: that is the whole of spread-then-converge.

案例 B · 把模糊的"高级感"写成可生成、半可机检的规格

CASE B · Turning a fuzzy “premium feel” into a generatable, half machine-checkable spec

品味写成规格

taste as spec

起点：一句话的"好"

Start: a one-line “good”

一个金融类 App 改版，老板的全部 brief 是三个字："要高级。"过去这种 brief 靠一个资深设计师把它"画出来"，好坏全在那个人脑子里、说不清也教不会。在生成几乎免费的语境下，这种 brief 是有毒的：你把"要高级"丢给模型，它给你的就是训练分布里最常见的那种"高级"——深色＋衬线＋大留白＋金色描边，所有金融 App 都长这样的那种。模糊的"好"喂出的必然是均值 slop。

A fintech app redesign; the boss’s entire brief was three words: “make it premium.” Historically a senior designer would “draw out” what that meant, with good and bad living unspoken in one head, unteachable. Under near-free generation this brief is toxic: hand “premium” to a model and you get the most common “premium” in its distribution: dark + serif + big whitespace + gold trim, the look every fintech app already wears. A fuzzy “good” feeds back mean slop by construction.

病灶

THE WOUND

模糊判据 + 廉价生成 = 自动收敛到均值。

Fuzzy criteria + cheap generation = auto-converge to the mean.

把"高级"拆成可判别的条目

Decomposing “premium” into discriminating items

解法是把"高级"拆成一组任何人（和机器）都能逐条判的条目，不是换个更懂的设计师。团队跟老板做了三轮"指物问答"：拿十个现有界面，逐个问"这个算高级吗、为什么"。三轮后，"高级"被拆成六条可判据：① 信息密度低但不空（每屏不超过一个主操作）；② 字体仅两档字号、字重对比靠粗细不靠颜色；③ 主色取自品牌而非通用金色，对比度 ≥ 4.5:1；④ 不用渐变文字、不用玻璃拟态；⑤ 数字用等宽体右对齐；⑥ 动效仅用于状态确认、时长 ≤ 200ms。前四条可写进 lint 直接机检；后两条是软判据，留给人复核。"高级"从一个人脑里的模糊直觉，变成了一张能喂给生成、又能半自动验收的规格。

The fix is decomposing “premium” into items anyone (and any machine) can rule on one by one, not a designer who “gets it” better. The team ran three rounds of point-and-ask with the boss: take ten existing screens and ask, one at a time, “is this premium, and why?” After three rounds, “premium” decomposed into six rule-able items: ① low information density but not empty (at most one primary action per screen); ② only two type sizes, weight contrast carried by boldness not color; ③ primary color drawn from the brand, not generic gold, contrast ≥ 4.5:1; ④ no gradient text, no glassmorphism; ⑤ numbers in monospace, right-aligned; ⑥ motion only for state confirmation, ≤ 200ms. The first four go straight into a lint; the last two are soft criteria left for human review. “Premium” went from one head’s mysticism to a spec you can feed generation and half-automatically accept.

两层规格

TWO LAYERS

①–④ 进 lint（可机检）；⑤⑥ 留人（软判据）。判断一次，执行无数次。

①–④ to lint (machine-checkable); ⑤⑥ to humans (soft). Judge once, enforce countless times.
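The machine-checkable layer (items ① to ④) could be sketched as a small lint. This is a minimal sketch under stated assumptions, not the team's actual tooling: the `screen` dictionary and its keys are hypothetical, and the contrast check follows the WCAG 2.x relative-luminance definition, which is one common way to compute a ratio such as 4.5:1.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def lint_screen(screen: dict, min_contrast: float = 4.5) -> list[str]:
    """Check one screen against items 1 to 4; return the violations found."""
    violations = []
    if screen["primary_actions"] > 1:
        violations.append("item 1: more than one primary action")
    if len(set(screen["font_sizes"])) > 2:
        violations.append("item 2: more than two type sizes")
    if contrast_ratio(screen["primary"], screen["background"]) < min_contrast:
        violations.append("item 3: primary color contrast below threshold")
    for effect in sorted({"gradient-text", "glassmorphism"} & set(screen["effects"])):
        violations.append(f"item 4: banned effect {effect}")
    return violations


# Hypothetical screen description, for illustration only.
screen = {
    "primary_actions": 1,
    "font_sizes": [16, 24],
    "primary": "#000000",
    "background": "#ffffff",
    "effects": ["glassmorphism"],
}
print(lint_screen(screen))
```

Items ⑤ and ⑥ are deliberately absent: the source keeps them as soft criteria for human review, so the lint only covers the layer that was declared machine-checkable.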

读后

Read-off

写规格的那三轮指物问答，本身就是这个项目最贵、最不可外包的工作——它把一个人的隐性品味，逼成了显性的、可传递的、可机检一半的判据。这正是设计卷反复说的"把判断从做里抽出来、写下来"：过去品味活在那位设计师的手上，他一走就带走；现在它写成了六条规格，新来的人和生成模型都能照着跑。规格不是限制创意，它是把创意里可复用的那部分固化、把不可复用的那部分（为什么是这个品牌色）留给人继续判。

Those three rounds of point-and-ask were themselves the most expensive, least outsourceable work in the project: they forced one person’s tacit taste into explicit, transferable, half-machine-checkable criteria. This is exactly what the volume keeps saying: pull judgment out of making and write it down. Taste used to live in that designer’s hands and leave when he left; now it is six written rules a new hire and a generation model can both run against. A spec does not constrain creativity; it freezes the reusable part of judgment and leaves the unreusable part (why this brand color) for humans to keep judging.

内核②③

KERNEL ②③

判断退守（②）后，被写成上下文规格（③）喂回生成。

After judgment retreats (②), it is written as context spec (③) fed back to generation.

机理图

MECHANISM

FIG. D10 / GRAVITY vs FORCE · 均值引力对异质守护力 · mean-gravity against the heterogeneity force

看懂：生成默认把设计拉向分布均值；只有人施加一个反方向的力，产物才落到"只对这群人成立"

Read: generation pulls design toward the distribution mean by default; only a human counter-force lands it on “true only for these people”

这是设计卷整套主张的力量对比图。均值引力是常开的、免费的：生成模型优化"对得多"，自然把每个产物往大家见得最多的样子拉。异质守护力是人施加的、昂贵的：它要求"为这群人特别"，方向恰好相反。产物停在哪，取决于两力平衡：撤掉人的力，产物必然滑回均值。所以同质化是不施力时的默认结局，不是某次失手；异质守护也就是把产物按在它该在的位置上的那只手，不是锦上添花。

This is the diagram of the two forces behind the whole volume. Mean-gravity is always-on and free: a generation model optimizes “right for many” and naturally pulls every artifact toward the shape most people have seen. The heterogeneity force is human-supplied and costly: it demands “particular to these people,” pointing the opposite way. Where the artifact rests is set by the balance: remove the human force and it slides back to the mean. So homogenization is the default when no force is applied, not one slip; and heterogeneity-guarding is the hand holding the artifact where it belongs, not a nicety.

案例 C · 选择"只对这群人成立"，并付出代价

CASE C · Choosing “true only for these people,” and paying the cost

异质守护的一次决策

a heterogeneity-guarding decision

两条路摆在面前

Two roads

一个给视障人群用的播客 App。改版时摆出两条路：路 A，按通用最佳实践做——大图卡片、瀑布流、自动播放预览，这是生成默认会给的、也是评审会上最容易过的方案，因为它"看起来对"；路 B，为这群人的真实处境做——主屏只有三个超大触控区、全程可纯键盘/读屏操作、关闭一切自动播放、用声音而非视觉做状态反馈、对比度拉到 7:1。路 B 在任何"通用美观"的评审标准下都会扣分：它不好看、不"现代"、留白少、信息密度高。

A podcast app for blind and low-vision users. The redesign laid out two roads. Road A, build to generic best practice: big image cards, infinite scroll, autoplay previews. This is what generation gives by default, and the easiest to get through review, because it “looks right.” Road B, build for these users’ real situation: a home screen of just three oversized touch zones, full keyboard/screen-reader operation, all autoplay off, state feedback by sound not sight, contrast pushed to 7:1. Road B loses points under any “generic good-looking” review standard: not pretty, not “modern,” little whitespace, high density.

分叉

THE FORK

通用均值（A）对真实人群（B）——只能选一个当北极星。

Generic mean (A) vs real people (B): only one can be the north star.

付的代价是真的

The cost was real

团队选了路 B，而且明确认了代价：在面向明眼用户的应用商店截图里，它"卖相"差，下载转化低于同类；做品牌物料时，市场部反复想把那三个大触控区改"精致点"。这些代价是月月在数据和会议里出现的，不是想象的。但留在人手里的判断是：这个产品的"好"，不由通用审美定义，由它服务的人能不能独立用完一集播客定义。可用性测试里，纯读屏完成"找到→播放→收藏"的成功率从改版前的 41% 升到 92%〔源：本案例为脱敏复盘的内部可用性测试数字，非公开调查数据；证据级 Ⅳ 一手，不外推为通用结论〕[R6]。

The team chose road B and named the cost outright: in app-store screenshots aimed at sighted users it “sells” poorly, with download conversion below peers; in brand work, marketing kept wanting to make the three big touch zones “more refined.” These costs were not imagined; they showed up every month in data and meetings. But the judgment that stayed human was this: this product’s “good” is not defined by generic aesthetics but by whether the people it serves can finish an episode independently. In usability testing, screen-reader-only success at “find → play → save” rose from 41% before the redesign to 92%〔source: de-identified internal usability test, not public survey data; grade Ⅳ first-hand, not extrapolated to a generic claim〕[R6].

留在人手里

STAYED HUMAN

定义"为谁好"，并承担"对别人不好"的代价。机器不会替你认这个账。

Defining “good for whom,” and bearing the cost of “not good for others.” The machine will not own that bill for you.

读后

Read-off

异质守护不是免费的口号，它每次都要付一笔真实的代价——放弃一部分通用观众、顶住"做得更普适些"的压力、在某些通用指标上认输。但这恰恰是 AI-Native 设计里人最不可替代的那一刀：生成会一直把你往"对所有人都还行"拉，只有人能决定"我就为这群人做到最好，哪怕对别人差"。这一刀一旦交出去，产品就再也回不到"为它要服务的人而做"，只会均匀地谁都不得罪、谁也不真正服务。

Heterogeneity-guarding is not a free slogan; every time it costs something real: giving up some of the generic audience, resisting pressure to “make it more universal,” conceding on certain generic metrics. But that is precisely the most irreplaceable cut humans hold in AI-native design: generation will forever pull you toward “fine for everyone,” and only a human can decide “I will be best for these people, even if worse for others.” Surrender that cut and the product can never return to being made for the people it serves; it only evenly offends no one and serves no one.

内核④

KERNEL ④

人回到"为谁而做"的意义判断——并为之承担取舍。

People return to the meaning judgment of “for whom,” and own the trade-off.

案例 D · 一个"快但是 slop"的已上线产品，怎么诊断、怎么救回

CASE D · A shipped-fast-but-slop product: diagnosing and rescuing it

slop 急救

slop rescue

症状

Symptom

一个 SaaS 仪表盘，团队用生成工具三天就上线了：快，是真快。但上线两周，用户反馈高度一致："看起来挺专业的，但我说不出它哪不对，就是不想用。"留存第七日掉到 11%。这是典型的 slop 症状：没有明显 bug，每个页面单看都"还行"，合在一起却空洞、雷同、没有重量。它是做得太顺、太均值了，不是做坏了。

A SaaS dashboard the team shipped in three days with generation tools: fast, genuinely fast. But two weeks in, user feedback was eerily uniform: “it looks professional, but I can’t say what’s wrong; I just don’t want to use it.” Day-7 retention sank to 11%. This is the textbook slop symptom: no obvious bugs, every page “fine” alone, yet hollow, samey, and weightless together. It was not built badly; it was built too smoothly, too close to the mean.

slop ≠ bug

SLOP ≠ BUG

没有错，只是没有"为谁"——这是诊断的入口。

Nothing is wrong; there is just no “for whom”: the entry point for diagnosis.

用指纹清单逐条诊断

Diagnosing against the fingerprint list

救回的第一步是诊断，不是重做。团队拿前面几节那张 slop 指纹清单（配色雷同、玻璃拟态、处处大圆角＋柔投影、空洞口号词、所有卡片一样大、字重靠颜色不靠粗细），逐条对照仪表盘：命中五条。命中本身就指出了病因：这五条都是"生成默认会给、没人下令它别给"的特征。诊断结论写成一句：这个产品的每个像素都对得起"通用专业感"，没有一个像素是为它的真实用户（每天看十次、只关心三个数的运营）而做的。

The first rescue step is diagnosis, not a rebuild. The team took the slop-fingerprint list from the earlier sections (samey palettes, glassmorphism, rounded-everything + soft shadows, hollow slogan words, all cards the same size, weight carried by color not boldness) and checked the dashboard item by item: five hits. The hits themselves named the cause: all five are features “generation gives by default, with no one ordering it not to.” The diagnosis wrote up as one line: every pixel honors “generic professionalism,” and not one pixel was made for its real user (the operator who looks ten times a day and cares about three numbers).

诊断工具

THE TOOL

指纹清单把"说不出哪不对"翻译成五条可指认的具体征兆。

The fingerprint list translates “can’t say what’s wrong” into five nameable, concrete symptoms.
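The item-by-item check can be sketched as a plain checklist. The six fingerprint names come from the list quoted above; the function `diagnose` is hypothetical, and so is the choice of which fingerprint did not hit, since the case reports five hits without saying which five.

```python
FINGERPRINTS = [
    "samey palette",
    "glassmorphism",
    "rounded-everything + soft shadows",
    "hollow slogan words",
    "all cards the same size",
    "weight carried by color, not boldness",
]


def diagnose(observed: set[str]) -> tuple[list[str], float]:
    """Return the fingerprints that hit, in list order, and the share that hit."""
    hits = [name for name in FINGERPRINTS if name in observed]
    return hits, len(hits) / len(FINGERPRINTS)


# Hypothetical audit: assume every fingerprint except one was observed.
dashboard = set(FINGERPRINTS) - {"hollow slogan words"}
hits, share = diagnose(dashboard)
print(len(hits), "of", len(FINGERPRINTS), "fingerprints hit")
# prints "5 of 6 fingerprints hit"
```

The count only names symptoms. The diagnosis in the case, that no pixel was made for the real user, still had to be written by a person.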

救法：重判，不是重画

The fix: re-judge, not repaint

救回不靠"再生成一版更漂亮的"，那只会换一种 slop。救法是把"为谁"补回来：先做三个真实运营的影子观察，确认他们每天只盯"今日转化、异常订单、待处理工单"三个数；据此把首屏 80% 的卡片删掉，只留这三个，做大、做对比、做成"扫一眼就知道有没有事"；删掉所有装饰性渐变和玻璃拟态，把省下的视觉预算全给那三个数。重做只花了两天（生成依然廉价），但这两天前面压着一周的判断：查清为谁、定义这个产品的"好"。改版后第七日留存从 11% 升到 34%〔源：本案例脱敏复盘区间，证据级 Ⅳ 一手从业者〕[R5]。

Rescue is not “generate a prettier version”: that just swaps one slop for another. The fix restores the “for whom”: shadow three real operators, confirm they watch only three numbers daily: today’s conversion, anomalous orders, open tickets; on that basis delete 80% of the home-screen cards, keep only those three, make them big, contrasted, “glanceable for whether anything’s wrong”; strip every decorative gradient and glass layer and spend the freed visual budget entirely on those three numbers. The rebuild took just two days (generation is still cheap), but those two days sat behind a week of judgment: establish for whom, define this product’s “good.” Day-7 retention rose from 11% to 34% after the redesign〔source: de-identified retrospective range, grade Ⅳ practitioner first-hand〕[R5].

内核全链

FULL KERNEL

诊断（②判断）→ 查清为谁（③上下文）→ 重生成（①执行）→ 验收（④意义）。

Diagnose (② judgment) → establish for whom (③ context) → regenerate (① execution) → accept (④ meaning).

前后图

BEFORE/AFTER

FIG. D11 / SLOP RESCUE · 把"为谁"补回来，不是重画 · restore the “for whom”, not repaint

看懂：救 slop 的关键动作发生在生成之前：补回判断，再让生成执行

Read: the decisive move in rescuing slop happens before generation: restore judgment, then let generation execute

两端的"画"都是生成出来的，廉价、快。差别全在中间那个黑盒：救 slop 的决定性动作是先补回被跳过的判断，不是再画一版：查清为谁、他们只关心什么。前后对比里像素的变化（七个平均卡片→三个有主次的数）只是结果，工作是那一周的判断。这也解释了为什么"再生成一版更漂亮的"永远救不了 slop：它跳过的正是中间这一步。

Both “drawings” are generated: cheap, fast. The whole difference is the black box between them: the decisive move in rescuing slop is not another comp but restoring the skipped judgment, which means establishing for whom and the only things they care about. The pixel change in the before/after (seven average cards → three prioritized numbers) is only the result; the real work was that week of judgment. This is why “generate a prettier one” never rescues slop: it skips exactly this middle step.

DSN

CRITIQUE · 旧结构

OLD STRUCTURES

批判 · 承重点

Critique · the load-bearing point

六种传统设计结构，在生成廉价时各自从哪一处断裂

Six traditional design structures, and where each one snaps when generation gets cheap

这些结构曾经合理，是因为"做出来"昂贵；生成把它压到近零，承重点就断了。

These structures were once sound because “making” was expensive; generation crushes that to near-zero, and the load-bearing point snaps.

一句话

In one line

六种传统结构断在同一处：都把人的价值锚在执行位（产出的速度或精致度），而执行正是机器接管的那一半；修法只有一个方向，把人挪到判断位。

Six traditional structures snap at the same place: each anchors human value to the execution seat (the speed or polish of output), and execution is exactly the half the machine takes over; the fix has one direction: move the human to the judgment seat.

共同的断裂机理：六种结构表面各异，断裂处是同一个——它们都把人的价值定义在"产出物的速度或精致度"上。当这两样都被生成做到又快又好，建立在它们之上的角色、流程、组织就同时失去了承重墙。下面逐条点名，给出每一种过去为何合理、现在断在哪、AI-Native 下它该变成什么。

The shared break: the six structures look different but snap at the same place: each defines human value by “the speed or polish of output.” When generation delivers both speed and polish, the roles, flows, and orgs built on them lose their load-bearing wall at once. Each is named below: why it was once sound, where it snaps now, and what it must become under AI-native.

旧结构 ① · 装饰工 / 最后一公里美化

OLD ① · DESIGN-AS-DECORATION / LAST-MILE PRETTIFIER

"功能先做完，最后叫设计来美化一下"

“Build the function first, call design at the end to pretty it up”

过去合理：美化耗工时，得有专人。断在哪：美化正是生成最擅长的——它能瞬间出十版"更好看"。当美化免费，把人定位成美化工，等于把人按在机器最强的那一格上。该变成：人不做最后一公里的美化，做第一公里的判断——为谁、何为好、哪版对路。

Once sound: prettifying cost hours, so it needed a dedicated person. Where it snaps: prettifying is exactly what generation does best: ten “nicer” versions instantly. When prettifying is free, casting humans as prettifiers pins them on the machine’s strongest square. Becomes: humans do not do the last-mile polish but the first-mile judgment: for whom, what good means, which version is on-target.

旧结构 ② · 孤胆天才 / 作者式设计师

OLD ② · THE LONE-GENIUS AUTEUR

"好设计出自一个有品味的天才之手"

“Good design flows from one tasteful genius’s hand”

过去合理：品味是隐性的、长在手上的，只能由那个人产出。断在哪：手的活已交给生成，天才的"手"不再稀缺；稀缺的是把品味写下来的能力。作者式设计师若不把判据外显，他的品味就随他离职清零、也无法喂给生成。该变成：从"用手产出品味"转向"把品味写成可传递、可机检一半的规格"——见案例 B。

Once sound: taste was tacit, lived in the hand, only that person could produce it. Where it snaps: the hand’s work has gone to generation, so the genius’s “hand” is no longer scarce; what is scarce is the ability to write taste down. An auteur who never externalizes his criteria sees his taste zero out when he leaves, and it cannot be fed to generation. Becomes: from “producing taste by hand” to “writing taste as transferable, half-machine-checkable spec” (see Case B).

旧结构 ③ · 出稿—交付—甩给开发的瀑布

OLD ③ · MOCKUP-THEN-HANDOFF WATERFALL

"设计师画死稿，标注好，扔过墙给前端实现"

“Designer freezes a comp, annotates it, throws it over the wall to front-end”

过去合理：画稿和写码是两种昂贵技能，分工省成本。断在哪：当产物本身变成代码（设计即代码，见 DSN 02），"画稿—标注—再翻译成代码"这道墙凭空多出一次有损翻译，每翻一次都丢信息、生 bug。生成能直接产出可运行的产物，墙两边的"翻译"成了纯损耗。该变成：设计与实现在同一介质（代码/token）里同步演化，没有甩墙这一步。

Once sound: drawing and coding were two expensive skills; splitting them saved cost. Where it snaps: when the artifact itself becomes code (design-as-code, see DSN 02), the wall of “comp → annotate → translate to code” adds a gratuitous lossy translation that bleeds information and breeds bugs each pass. Generation produces runnable artifacts directly, so the cross-wall “translation” becomes pure waste. Becomes: design and implementation co-evolve in one medium (code/tokens), with no over-the-wall step.

旧结构 ④ · 像素级画板文化

OLD ④ · PIXEL-PERFECT ARTBOARD CULTURE

"把每一帧每一态都在画板上逐像素固定"

“Nail every frame and state to the pixel on the artboard”

过去合理：实现昂贵、改一次代价高，所以要在画板里把一切定死、减少返工。断在哪：当生成能在几分钟内出齐所有响应式状态与变体，把人时投入到"逐像素固定一帧"上，是把稀缺的判断花在了机器随手能补的细节上。画板里的"完美一帧"还会骗人——它在真实数据、真实设备、真实边界条件下往往不成立。该变成：定义约束与判据（规格、token、护栏），让生成铺出全部状态，人验收的是"对不对路"而非"像素对不对齐"。

Once sound: implementation was expensive and a change costly, so you nailed everything on the artboard to cut rework. Where it snaps: when generation produces every responsive state and variant in minutes, spending human-hours nailing one frame to the pixel spends scarce judgment on details the machine fills offhand. The artboard’s “perfect frame” also lies: it often fails under real data, real devices, real edge cases. Becomes: define constraints and criteria (spec, tokens, guardrails), let generation spread all states, and have humans accept “on-target?” not “pixels aligned?”

旧结构 ⑤ · 设计当内部服务台

OLD ⑤ · DESIGN AS INTERNAL SERVICE DESK

"业务提需求 → 设计接单出图 → 计件交付"

“Business files a ticket → design takes the order and ships comps → piecework delivery”

过去合理：出图是稀缺产能，排队接单能让稀缺资源利用率最高。断在哪：出图不再稀缺，"接单出图"这条价值链整段塌掉——业务自己用生成就能出图。设计若继续做服务台，它守护的恰好是已经免费的那项产能，而把真正稀缺的（为谁、何为好的判断）拱手让给了不懂判断的人。该变成：从"接单出图"转向"定义并守护判据"——成为品味与责任的主理人，不再是产能的瓶颈。

Once sound: comps were scarce capacity, and queued ticketing maximized utilization of a scarce resource. Where it snaps: comps are no longer scarce, and the whole “take-ticket-ship-comp” value chain collapses: business can produce comps with generation itself. A design team that stays a service desk guards precisely the capacity that is now free, while ceding the truly scarce thing (the judgment of for-whom and what-good-means) to people who do not know how to make that judgment. Becomes: from “take the ticket, ship the comp” to “define and guard the criteria,” no longer the capacity bottleneck but the owner of taste and responsibility.

旧结构 ⑥ · "搞得有冲击力点"式 brief

OLD ⑥ · THE “MAKE IT POP” BRIEF

"做得高级点 / 有冲击力点 / 现代点"

“Make it premium / make it pop / make it modern”

过去合理：brief 模糊没关系，反正中间隔着一个会追问、会用手把模糊变具体的设计师。断在哪：当 brief 直接喂给生成，模糊不再被人消化，而是被模型用训练分布的均值填满——"高级"返回最常见的高级、"现代"返回最常见的现代。模糊 brief × 廉价生成 = 自动产 slop（见案例 B、案例 D）。该变成：brief 必须先被拆成可判别的判据（哪条成立、为谁成立），模糊的"好"在喂给生成前就得被人翻译成可生成的规格。

Once sound: a fuzzy brief was fine because a designer sat in the middle who would interrogate it and turn fuzz into specifics by hand. Where it snaps: when the brief feeds straight into generation, fuzz is no longer digested by a human but filled by the model with the mean of its training distribution: “premium” returns the most common premium, “modern” the most common modern. Fuzzy brief × cheap generation = automatic slop (see Cases B and D). Becomes: a brief must first be decomposed into decidable criteria (which one holds, and for whom), and a fuzzy “good” must be translated by a human into a generatable spec before it ever reaches generation.

结构图 STRUCTURE · FIG. D12 / SEAT SWAP · 旧结构把人放执行位，新结构把人挪到判断位 · old structures seat humans in execution; the new one moves them to judgment

看懂：六种旧结构断在同一处——人坐在了机器最强的执行格

Read: all six old structures snap at the same place: humans sit on the machine’s strongest square, execution

把六种旧结构叠在一张图上，它们的断点重合：每一种都让人坐在左栏（执行位），而执行恰是生成做到又快又好的那一半。它们是同一个错误的六个变体，不是各自独立的坏习惯：把人的价值定义在产出物上。修法也只有一个方向：把人从左栏挪到右栏，去做机器答不了的判断。这张图是 DSN 10 整节的骨架，也是为什么本卷反复说"判断退守到人"。

Overlay the six old structures on one diagram and their break points coincide: each seats the human in the left column (execution), exactly the half generation does fast and well. They are six variants of one error, not six independent bad habits: defining human value by output. The fix has only one direction: move the human from the left column to the right, to the judgment the machine cannot answer. This diagram is the skeleton of all of DSN 10, and why the volume keeps saying “judgment retreats to people.”

值得说清的是：点名旧结构不是说做执行的人没价值，也不是要谁明天就失业。是说当一种结构把人的价值锚点定在已经免费的产能上时，这种结构会先失去解释力、再失去存在理由。一个团队可以继续用 Figma 画板、继续有资深设计师、继续接业务需求，只要价值锚点从"出得快、画得精"挪到"判得准、守得住为谁"。结构批判的对象从来是那条把人按在执行位上的隐含假设，不是工具或岗位。

Worth stating plainly: naming old structures does not say execution workers have no value, nor that anyone is fired tomorrow. It says that when a structure anchors human value to a capacity that is now free, that structure first loses explanatory power, then its reason to exist. A team can keep using Figma artboards, keep senior designers, keep taking business requests, as long as the value anchor moves from “ships fast, draws fine” to “judges true, guards for-whom.” The target of the critique is the buried assumption that pins humans to the execution seat, never the tool or the role.

DSN

TOOLKIT · 可照做

DO-THIS TOOLKIT

工具 · 拿去用

Tools · take and use

把"何为好"变成今天就能跑的工具，而非口号

Turning “what good means” into tools you can run today, not slogans

一句话 In one line

设计面比组织面能给更多可操作工具，因为产物具体、可机检一半；这里给五件今天就能跑的工具。

The design surface affords more operable tools than the org surface, because its artifact is concrete and half machine-checkable; here are five you can run today.

工具一 · 设计令牌即代码：让"好"可被机器执行

Tool 1 · Design tokens as code: making “good” machine-enforceable

设计令牌（design tokens）是把一切视觉决策——颜色、间距、字号、圆角、阴影、动效时长——抽象成命名变量，存成机器可读的格式（JSON / CSS 变量 / 平台原生），再由构建链注入所有产物。它的意义在 AI-Native 语境下被放大：当生成在产出无数变体，令牌是少数能让海量生成保持一致的锚点。人只需判断一次"主色是这个、间距走 8 的倍数、对比度不低于 4.5:1"，这个判断就以令牌的形式被所有生成强制继承。判断一次，执行无数次——这正是内核②③的落地形态〔源：W3C Design Tokens Community Group 规范草案与各大设计系统（Material / Carbon / Polaris）token 实践，证据级 Ⅳ 行业实践〕[R7]。

Design tokens abstract every visual decision (color, spacing, type size, radius, shadow, motion duration) into named variables, stored in a machine-readable format (JSON / CSS variables / platform-native) and injected into every artifact by the build chain. Their significance grows in an AI-native context: when generation turns out countless variants, tokens are one of the few anchors that keep mass generation consistent. A human judges once (“the primary is this, spacing runs in multiples of 8, contrast no lower than 4.5:1”), and that judgment is force-inherited by all generation as tokens. Judge once, enforce countless times: this is the concrete form of kernel steps ② and ③〔source: W3C Design Tokens Community Group draft and the token practice of major design systems (Material / Carbon / Polaris), grade Ⅳ industry practice〕[R7].

没有令牌 · 生成各跑各的

No tokens · generation drifts

十个页面十种蓝、间距随手取、对比度时好时坏——每版单看还行，合起来散。一致性靠人逐页盯，盯不过来。

Ten pages, ten blues, spacing picked offhand, contrast hit-or-miss: each page fine alone, incoherent together. Consistency rides on a human checking page by page, and they cannot keep up.

有令牌 · 判断一次被强制继承

With tokens · one judgment force-inherited

主色、间距阶、对比度阈值定义一次，所有生成从令牌取值。改一个令牌，全站同步。人省下的盯页时间，全投到"这个主色为什么对"。

Primary, spacing scale, contrast threshold defined once; all generation reads from tokens. Change one token, the whole product updates. The watching time saved goes entirely into “why is this primary right.”
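
A minimal Python sketch of "judge once, enforce countless times." Everything named here is an assumption made for illustration: the token names, the hex values, and the helper functions are hypothetical and do not come from the W3C draft or from any of the design systems cited. The contrast formula follows the standard sRGB relative-luminance definition used for WCAG-style contrast checks.

```python
# Hypothetical token set: every name and value here is illustrative.
TOKENS = {
    "color.primary": "#0b5fff",
    "color.surface": "#ffffff",
    "color.text": "#111111",
    "space.unit": 8,        # spacing must be a multiple of this
    "contrast.min": 4.5,    # minimum text/background contrast ratio
}


def to_css_variables(tokens: dict) -> str:
    """Render the colour tokens as CSS custom properties."""
    lines = [
        f"  --{name.replace('.', '-')}: {value};"
        for name, value in tokens.items()
        if name.startswith("color.")
    ]
    return ":root {\n" + "\n".join(lines) + "\n}"


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of an sRGB colour written as '#rrggbb'."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colours (1.0 = identical, 21.0 = black on white)."""
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def check_against_tokens(tokens: dict, fg: str, bg: str, spacings: list[int]) -> list[str]:
    """Return the token rules a generated variant breaks (empty list = passes)."""
    problems = []
    if contrast_ratio(fg, bg) < tokens["contrast.min"]:
        problems.append(f"contrast {fg} on {bg} below {tokens['contrast.min']}:1")
    for px in spacings:
        if px % tokens["space.unit"] != 0:
            problems.append(f"spacing {px}px is off the {tokens['space.unit']}px scale")
    return problems


if __name__ == "__main__":
    print(to_css_variables(TOKENS))
    print(check_against_tokens(TOKENS, "#111111", "#ffffff", [8, 16, 24]))
    print(check_against_tokens(TOKENS, "#cccccc", "#ffffff", [8, 12]))
```

The design choice worth noting: the human decision lives only in `TOKENS`; the checks read from it, so changing one value changes what every generated variant is held to.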

工具二 · 设计系统即两层护栏：硬约束 + 软判据

Tool 2 · The design system as a two-layer guardrail: hard rules + soft criteria

设计系统不该是一本没人看的规范文档，而该是一道两层护栏。A 层是硬约束（HARD-RULES）：可机检、可写进 lint、命中即拦——禁渐变文字、调色板取自 token、模糊层数上限、字体白名单（排除 Inter/Roboto 等系统默认）、对比度阈值、卡片尺寸必须随内容权重变化、"空洞口号词表"命中零次。B 层是软判据（SOFT-CRITERIA）：不可机检、必由人复核——"这个主色为什么是它""这款字为什么对这群人""这条动效有没有意义"。A 层让生成不会跑出红线，把人从无穷的低级一致性检查里解放；B 层把人的注意力收束到真正需要判断的少数几处。两层一硬一软，恰好对应可验证性梯度的两端（见核心图 FIG. D0）。

A design system should not be an unread spec document but a two-layer guardrail. Layer A is hard rules (HARD-RULES): machine-checkable, lint-able, and blocking on hit. Examples: no gradient text, palette drawn only from tokens, a cap on blur layers, a font allowlist (excluding go-to defaults such as Inter and Roboto), a contrast threshold, card size that must vary with content weight, and zero hits against the “hollow slogan words” list. Layer B is soft criteria (SOFT-CRITERIA): not machine-checkable and always reviewed by a human: “why is this primary the right one,” “why is this typeface right for these people,” “does this motion mean anything.” Layer A keeps generation inside the red lines and frees humans from endless low-level consistency checks; Layer B funnels human attention to the few places that truly need judgment. One hard, one soft, mapping onto the two ends of the verifiability gradient (see key figure FIG. D0).

护栏图 GUARDRAIL · FIG. D13 / TWO-LAYER CLAMP · 硬约束夹住生成，软判据留给人 · hard rules clamp generation, soft criteria stay human

看懂：A 层硬约束像夹钳把生成夹在红线内；越过夹钳的判断交给 B 层的人

Read: Layer-A hard rules clamp generation inside the red lines; judgment past the clamp goes to the human in Layer B

两层护栏不把规范写得更厚，而是把规范切成两半按不同方式执行：能机检的一半（A 层）做成 lint，让机器零成本地把海量生成夹在红线内，人完全不必碰；不能机检的一半（B 层）显式标成软判据，把人的注意力收束到这几处真正需要判断的问题上。夹钳的意义是：它让"一致性"不再消耗人的判断力，从而把判断力全部留给夹钳拦不住的那些只能人来定的问题。

A two-layer guardrail does not thicken the spec but cuts it in half, enforced two different ways: the machine-checkable half (Layer A) becomes a lint that clamps mass generation inside the red lines at zero cost, untouched by humans; the non-checkable half (Layer B) is marked explicitly as soft criteria, funneling human attention to the few questions that truly need judgment. The point of the clamp: it stops “consistency” from consuming human judgment, leaving all of that judgment for the constitutive questions the clamp cannot catch.
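
The two layers can be sketched as a small lint plus an explicit hand-off list. This is an illustration under stated assumptions, not a real linter: the allowlist, palette, slogan words, and blur cap are hypothetical values; `background-clip: text` is used only as a rough proxy for gradient text; and only a subset of the Layer A rules named above is covered (the card-size rule is left out).

```python
import re

# All four rule tables below are hypothetical examples.
ALLOWED_FONTS = {"Source Serif 4", "IBM Plex Sans"}
PALETTE = {"#0b5fff", "#ffffff", "#111111"}
HOLLOW_WORDS = {"seamless", "next-gen", "empower"}
MAX_BLUR_LAYERS = 1

# Layer B: never automated, only surfaced to a human reviewer.
SOFT_CRITERIA = [
    "Why is this primary color the right one?",
    "Why is this typeface right for these people?",
    "Does this motion mean anything?",
]


def lint_hard_rules(css: str, copy: str) -> list[str]:
    """Layer A: return every hard-rule violation found in the CSS and the copy."""
    violations = []
    # Rough proxy: gradient text is usually built with background-clip: text.
    if re.search(r"background-clip\s*:\s*text", css):
        violations.append("gradient-text")
    # Every six-digit hex colour must come from the token palette.
    for color in re.findall(r"#[0-9a-fA-F]{6}\b", css):
        if color.lower() not in PALETTE:
            violations.append(f"off-palette:{color.lower()}")
    # Only the first family in each font stack is checked against the allowlist.
    for family in re.findall(r"font-family\s*:\s*([^;]+);", css):
        first = family.split(",")[0].strip().strip("'\"")
        if first not in ALLOWED_FONTS:
            violations.append(f"font-not-allowed:{first}")
    if len(re.findall(r"blur\(", css)) > MAX_BLUR_LAYERS:
        violations.append("too-many-blur-layers")
    for word in sorted(HOLLOW_WORDS):
        if re.search(rf"\b{re.escape(word)}\b", copy, re.IGNORECASE):
            violations.append(f"hollow-word:{word}")
    return violations


def review(css: str, copy: str) -> dict:
    """Block on any Layer A hit; otherwise hand the Layer B questions to a human."""
    violations = lint_hard_rules(css, copy)
    return {
        "blocked": bool(violations),
        "violations": violations,
        # Soft criteria are only worth a human's time once Layer A is clean.
        "human_review": [] if violations else list(SOFT_CRITERIA),
    }
```

Note that `review` never tries to answer the soft criteria; it only decides when they are worth asking.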

工具三 · 品味评分卡：把"好"拆成可逐条判的判据

Tool 3 · The taste scorecard: decomposing “good” into item-by-item criteria

"品味"听着玄，但在一个具体产物上，它几乎总能拆成一组可逐条判的判据。下面是一张通用的品味评分卡骨架——每条都问一个"是/否/部分"的具体问题，而非"美不美"的整体感受。用法：把候选逐条过一遍，记录命中与否；过完得到的是一张"哪里对、哪里没对、为什么"的诊断表，不是一个分数。它把"我觉得不太对"逼成"第 3、第 6 条没过"，从而可讨论、可传递、可喂回生成。

“Taste” sounds mystical, but on a concrete artifact it almost always decomposes into a set of item-by-item criteria. Below is a generic taste-scorecard skeleton: each line asks a specific yes/no/partial question, not a holistic “is it pretty.” Usage: run a candidate through line by line and record hits; what you get is a diagnostic table of “where it’s right, where it’s not, and why,” not a single score. It forces “I feel it’s off” into “lines 3 and 6 fail,” which can then be discussed, transferred, and fed back to generation.

品味评分卡 · 七问 · 逐条判，不打总分

TASTE SCORECARD · seven questions · judge per line, no overall score

1 · 为谁

1 · For whom

这个产物能说出它具体为谁而做吗？还是"为所有人"——后者通常等于没有人。

Can this artifact name who specifically it is for? Or is it “for everyone,” which usually means no one.

软判据 · 必由人判

soft · human-only

2 · 主次

2 · Hierarchy

扫一眼，能立刻分出"最重要的一件事"吗？还是所有元素争同样的注意力（slop 的典型征兆）。

At a glance, does the single most important thing stand out? Or do all elements fight for equal attention (a textbook slop symptom)?

半可机检 · 视觉权重

half-checkable · visual weight

3 · 有来由

3 · Motivated

主色、字体、布局，每一个都能说出"为什么是它"吗？还是"生成默认给的、没人问过为什么"。

Can the primary color, typeface, and layout each say “why it”? Or are they “what generation defaulted to, never questioned”?

软判据 · 必由人判

soft · human-only

4 · 无指纹

4 · No fingerprint

有没有 slop 指纹（渐变文字、玻璃拟态、处处大圆角＋柔投影、空洞口号词）？这一条可机检。

Any slop fingerprint (gradient text, glassmorphism, rounded-everything + soft shadows, hollow slogan words)? This line is machine-checkable.

硬约束 · 可 lint

hard · lint-able

5 · 经得起真实数据

5 · Survives real data

放进真实长度的文案、真实数量的列表、真实边界条件，它还成立吗？还是只在"完美一帧"里好看。

Put in real-length copy, real-count lists, real edge cases: does it still hold? Or only look good in the “perfect frame”?

半可机检 · 压力测试

half-checkable · stress test

6 · 有重量

6 · Has weight

它有没有一处让人记住、愿意停留的地方？还是哪都"还行"、合起来空洞——这是 slop 与好作品最难机检、却最要命的分野。

Is there one place that lands, that makes someone stay? Or is everything “fine” and the whole hollow, the hardest-to-check yet most decisive line between slop and good work.

软判据 · 必由人判

soft · human-only

7 · 认了代价

7 · Owns a cost

它为了"对这群人"放弃了什么吗？一个不放弃任何人的设计，通常没有为任何人真正做好（见案例 C）。

Did it give up anything to be “for these people”? A design that gives up no one is usually not truly good for anyone (see Case C).

软判据 · 必由人判

soft · human-only

注意第 4 条是硬约束（可写进 lint），第 2、5 条半可机检，其余四条是软判据。这张卡本身就示范了两层护栏：你能把可机检的几条自动化掉，把人的注意力集中到第 1、3、6、7 这几条机器答不了的问题上。一张拆开的评分卡，比一句"我觉得不够高级"对生成有用一万倍——因为前者能逐条喂回去，后者只会换来另一版均值 slop。

Note line 4 is a hard rule (lint-able), lines 2 and 5 are half-checkable, and the other four are soft criteria. The scorecard itself demonstrates the two-layer guardrail: you can automate the checkable lines and concentrate human attention on lines 1, 3, 6, 7: the ones the machine cannot answer. A decomposed scorecard is ten-thousand times more useful to generation than “I feel it’s not premium enough,” because the former feeds back line by line while the latter only buys another mean-slop version.
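
One way to hold the scorecard as data, so that "lines 3 and 6 fail" becomes something you can pass around. The seven names and their hard / half / soft labels are taken from the card above; the class names, the `diagnose` function, and the answer format are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HARD = "hard · lint-able"
    HALF = "half-checkable"
    SOFT = "soft · human-only"


@dataclass(frozen=True)
class Criterion:
    number: int
    name: str
    kind: Kind


SCORECARD = [
    Criterion(1, "For whom", Kind.SOFT),
    Criterion(2, "Hierarchy", Kind.HALF),
    Criterion(3, "Motivated", Kind.SOFT),
    Criterion(4, "No fingerprint", Kind.HARD),
    Criterion(5, "Survives real data", Kind.HALF),
    Criterion(6, "Has weight", Kind.SOFT),
    Criterion(7, "Owns a cost", Kind.SOFT),
]

VERDICTS = ("yes", "partial", "no")


def diagnose(answers: dict[int, tuple[str, str]]) -> list[str]:
    """Turn per-line verdicts into a diagnostic list; deliberately no total score.

    `answers` maps a criterion number to (verdict, reason).
    Lines answered "yes" are omitted; missing lines are reported as unanswered.
    """
    lines = []
    for criterion in SCORECARD:
        verdict, reason = answers.get(criterion.number, ("unanswered", ""))
        if verdict == "yes":
            continue
        if verdict not in VERDICTS and verdict != "unanswered":
            raise ValueError(f"unknown verdict {verdict!r} for line {criterion.number}")
        detail = f": {reason}" if reason else ""
        lines.append(
            f"{criterion.number} · {criterion.name} [{criterion.kind.value}] {verdict}{detail}"
        )
    return lines


def lines_for_humans() -> list[int]:
    """The criteria a machine cannot answer and a person must judge."""
    return [c.number for c in SCORECARD if c.kind is Kind.SOFT]


if __name__ == "__main__":
    answers = {n: ("yes", "") for n in range(1, 8)}
    answers[3] = ("no", "primary is the generator's default blue")
    answers[6] = ("partial", "hero lands, the rest is filler")
    print("\n".join(diagnose(answers)))
```

The output is a list of failing lines with reasons, which can be pasted back into the next generation prompt line by line.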

工具四 · 铺开候选协议：把"多生成几版"变成可复用的判断流程

Tool 4 · The candidate-spread protocol: turning “generate more versions” into a reusable judgment process

"铺开候选再收敛"若不写成协议，很容易退化成"无脑生成一百版然后挑顺眼的"——那只是把一个均值换成另一个均值。把它写成五步协议，它才是判断流程而非产量竞赛：① 先把约束与判据写下来（没有判据，铺开就是噪音）；② 沿"结构维度"而非"皮肤维度"铺开（要九个布局/流程不同的候选，不是九个换了配色的同一个）；③ 用一条查清的事实做第一刀收敛（把审美题压成事实题，见案例 A）；④ 决赛圈才做受控比对（A/B、可用性测试、真实数据压测）；⑤ 把胜出的理由写回判据/令牌/护栏（让这次的判断沉淀成下次的规格）。第 ① 和第 ⑤ 步是把这个协议和"无脑刷版"区分开的关键：前者保证铺开有方向，后者保证判断被复用。

“Spread candidates, then converge,” if unwritten, easily degenerates into “mindlessly generate a hundred and pick the nice one,” which just swaps one mean for another. Written as a five-step protocol, it becomes a judgment process rather than an output race: ① write constraints and criteria down first (without criteria, spread is noise); ② spread along the structural dimension, not the skin dimension (nine candidates with different layout/flow, not nine recolors of the same one); ③ make the first cut with one established fact (compress aesthetics into fact, see Case A); ④ run controlled comparison only on the shortlist (A/B, usability test, real-data stress test); ⑤ write the winner’s reasons back into criteria/tokens/guardrails (let this judgment settle into next time’s spec). Steps ① and ⑤ are what separate this protocol from “mindless re-rolling”: the first keeps the spread directed, the last keeps judgment reused.
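
Step ② of the protocol can be checked mechanically before anyone starts judging: are the candidates structurally different, or are some just recolors? A rough sketch follows; the `Candidate` fields and the idea of treating (layout, flow) as the structural signature are assumptions made for illustration, and a real project would pick its own structural dimensions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    layout: str   # structural dimension
    flow: str     # structural dimension
    palette: str  # skin dimension, ignored when comparing structure


def structural_groups(candidates: list[Candidate]) -> dict:
    """Group candidate names by their structural signature (layout, flow)."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[(candidate.layout, candidate.flow)].append(candidate.name)
    return dict(groups)


def spread_report(candidates: list[Candidate]) -> dict:
    """Summarize how much of the spread is real structure versus recoloring."""
    groups = structural_groups(candidates)
    return {
        "candidates": len(candidates),
        "distinct_structures": len(groups),
        # Groups with more than one member differ only in skin.
        "recolor_groups": {sig: names for sig, names in groups.items() if len(names) > 1},
    }


if __name__ == "__main__":
    spread = [
        Candidate("A", "single-column", "one-page", "blue"),
        Candidate("B", "single-column", "one-page", "green"),
        Candidate("C", "split-screen", "wizard", "blue"),
    ]
    print(spread_report(spread))
```

If `distinct_structures` is far below `candidates`, the spread is mostly skin and is worth regenerating before step ③.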

工具五 · 生成×品味坐标：把一个设计决策放上去看它该怎么处理

Tool 5 · The generation × taste plane: place a design decision on it to see how to handle it

最后一件工具是一张可放决策的坐标，把前面所有原理收成一个可操作的二维平面。横轴：这个决策能被生成做到多廉价（左=昂贵/需人、右=近免费）。纵轴：判定它好坏多依赖品味（下=可机检的事实题、上=只能人来定的品味题）。四个象限给出四种该怎么处理的处方——这正是 INSTRUMENT 13 设计判断分配台背后的平面。下方 FIG. D14 把这张坐标画出来，配套的 INSTRUMENT 15 让你把自己手头的决策点上去、即时拿到处方。

The last tool is a coordinate plane on which you place a decision, collapsing all the prior principles into one operable two-dimensional surface. X-axis: how cheaply generation can carry out this decision (left = expensive/needs-human, right = near-free). Y-axis: how much judging its quality depends on taste (bottom = machine-checkable fact question, top = constitutive taste question). The four quadrants give four prescriptions for how to handle it: this is the plane behind INSTRUMENT 13, the design-judgment allocator. FIG. D14 below draws the coordinate, and the companion INSTRUMENT 15 lets you place your own decision on it and get an instant prescription.

坐标图 PLANE · FIG. D14 / GENERATION × TASTE · 两轴四象限，每格一个处方 · two axes, four quadrants, one prescription each

看懂：把一个设计决策放上去——横轴看生成多廉价，纵轴看判好坏多靠品味

Read: place a design decision: X for how cheap generation is, Y for how much judging quality needs taste

这张坐标是设计卷所有处方的总收口。右下：廉价又可机检，整段交给生成。右上：做起来廉价但带价值取向，人把价值固化成规则（令牌/护栏），机器据规则执行。左下：不靠品味但要先查清事实（研究的活）。左上：既需人又带只能人来定的品味——可验证性梯度的最远端，本卷押注它停在人这侧（是押注，非永久保证）。把任何一个设计决策放上去，它落在哪个象限，就该用哪种处方。下方 INSTRUMENT 15 让这张图变成可点的。

This coordinate is where every prescription in the volume converges. Bottom-right: cheap and checkable, so hand it wholesale to generation. Top-right: cheap to make but value-laden, so the human freezes the value into a rule (tokens/guardrails) and the machine enforces it. Bottom-left: not taste-driven, but the facts must be established first (research work). Top-left: needs a human and turns on constitutive taste, the far end of the verifiability gradient, which this volume bets stays on the human side (a bet, not a permanent guarantee). Place any design decision on it; whichever quadrant it lands in gives the prescription to use. INSTRUMENT 15 below makes this figure clickable.
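
The four quadrants reduce to a small lookup. The sketch below assumes both axes are scored from 0 to 1 and split at a midpoint of 0.5; both the scale and the midpoint are assumptions made for illustration, since the plane as described is qualitative.

```python
def prescribe(generation_cheapness: float, taste_dependence: float, midpoint: float = 0.5) -> str:
    """Map a design decision to one of the four prescriptions.

    generation_cheapness: 0 = expensive / needs a human, 1 = near-free (X-axis).
    taste_dependence:     0 = machine-checkable fact, 1 = constitutive taste (Y-axis).
    """
    for value in (generation_cheapness, taste_dependence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("both coordinates must be in [0, 1]")
    cheap = generation_cheapness >= midpoint
    tasteful = taste_dependence >= midpoint
    if cheap and not tasteful:
        return "bottom-right: hand it wholesale to generation"
    if cheap and tasteful:
        return "top-right: human freezes the value into a rule (tokens/guardrails); machine enforces"
    if not cheap and not tasteful:
        return "bottom-left: establish the facts first (research work)"
    return "top-left: keep the judgment with people"


if __name__ == "__main__":
    print(prescribe(0.9, 0.1))  # e.g. generating responsive variants of a settled layout
    print(prescribe(0.9, 0.8))  # e.g. choosing the primary color
    print(prescribe(0.2, 0.2))  # e.g. finding out why users abandon checkout
    print(prescribe(0.2, 0.9))  # e.g. deciding who the product is for
```

The example placements in the comments are illustrative guesses, not positions taken from the figure.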

INSTRUMENT 15 · Slop ↔ 品味自测器

INSTRUMENT 15 · Slop ↔ taste self-scorer

把你手头的一个产物逐条过这七问（取自上面的品味评分卡）。每勾选一个"没过"，分数累加；过完即时拿到一个 slop 风险判语和首要修法。这是诊断，不是打总分：它告诉你断在哪条、先修哪条。

Run one artifact you have on hand through these seven questions (from the taste scorecard above). Each “fail” you check adds to the score; when you finish, you get an instant slop-risk verdict and the first fix to make. This is a diagnosis, not an overall score: it tells you which line fails and which to fix first.

设计这一面 · 可执行 skill

The AI-Native Design Skill

前面各节讲"为什么生成会变廉价、品味会变稀缺、该怎么判"；这一件替你把设计真的做出来：它是这个面的可执行配套，不是"设计一个设计组织"：拿到一个产品、一个界面、一条落地页、一个组件、一套设计系统或一段动效，它先过一道重画而非嫁接的闸（把 agent 删掉若塌回"一个设计师手搓一张稿"，就还是更快的铅笔），再跑"先立护栏 → 写可机检一半的规格 → 沿结构维度铺开候选 → 人用品味收敛 → 把判断喂回系统"这条闭环。它的覆盖面是产品 / 交互 / 系统 / 表达，不是给界面套皮。

The sections above cover “why generation gets cheap, taste gets scarce, and how to judge”; this piece actually produces the design with you. It is this surface’s executable companion, not a way to “design a design org.” Hand it a product, an interface, a landing page, a component, a design system, or a motion piece, and it first runs a redraw-not-graft gate (delete the agents; if the work collapses back to one designer hand-crafting one comp, it is still just a faster pencil), then runs the closed loop “stand up the guardrail first → write a half-machine-checkable spec → spread candidates along the structural dimension → converge with human taste → feed judgment back into the system.” Its scope is product / interaction / system / expression, not skinning a screen.

# 在 Claude Code 里调用
# invoke inside Claude Code
$ /skill ai-native-design
> "帮我设计这条落地页，多铺几版再帮我挑一版……"
> "design this landing page, spread a few directions and help me pick..."

  → 重画闸 · 绿地 / 旧产品切出新面 / 仅赋能 / 人/信任边界
    redraw gate · greenfield / carve a new surface / mere enablement / human-trust boundary
  → 一份设计产物（稿/组件/系统）＋ 令牌即代码 ＋ 品味理由 ＋ 指纹反同质检
    a design artifact (mockups/components/system) + tokens-as-code + a taste rationale + a fingerprint anti-homogenization check

开源仓库 Open-source: github.com/watterfall/ai-native-architect/skills/ai-native-design

安装 Install: /plugin marketplace add watterfall/ai-native-architect

本件性质 · 设计面的可执行配套

架构层那件（ai-native-architect）设计组织；这一件与其余配套件各对应一个面——同一内核、彼此耦合、阅读无固定起点。它把本卷方法论跑成设计产物。判断节点 = 品味：在海量草稿里挑哪一版、并守住其中的人。生成草稿是廉价的，判它们是稀缺的。止步线：永不外包品味——别把"挑哪版、为什么"那一按交给模型；也别把"为谁、有没有灵魂、对不对路"这类软判据硬塞进 lint，那会把每个可机检指标都拉满、却做出一个没人想要的完美界面。

What this is · the design executable companion

The architecture piece (ai-native-architect) designs the organization; this and the other companion pieces each carry one surface: one kernel, mutually coupled, with no fixed reading entry. It runs this volume’s methodology into design artifacts. Judgment node = taste: choosing which version among the abundant drafts, and holding on to the human in it. Generating drafts is cheap; judging them is scarce. Stop-line: never outsource taste. Do not hand the final “which one, and why” call to the model, and do not force soft criteria (“for whom, has soul, on-target”) into lint, which maxes out every checkable metric yet ships a flawless interface no one wants.

SPEC.V / AI NATIVE METHODOLOGY / OWL METHODOLOGY SERIES

SCOPE / 一套方法论 · 完整组织光谱 N=1 → N=众多（一人公司至 agent 网络，同一套第一性原理） | One methodology · the full organizational spectrum N=1 → N=many (from the one-person company to the agent network, on a single set of first principles)

SERIES / 六卷同一内核 · 本卷是其中一个面，完整接线见上方「方法论系列」。 | Six volumes, one kernel · this volume is one surface; the full wiring is above under “The Series.”

CONTACT / 案例投稿与合作洽谈 | Case submissions and collaboration: contact@ai-native.build

APPENDIX · SOURCES / 证据与引用登记 —— 分级口径：Ⅰ 审计级实证（监管文件交叉验证）· Ⅱ 同行评审 · Ⅲ 理论模型／工作论文（引用须写"模型预测"，不得写"已证明"）· Ⅳ 从业者一手陈述 · Ⅴ 咨询预测（是预测，不是事实）。引用条目以本表为准；本轮 3 票对抗复核未发现被驳倒条目。

Evidence and citation registry; grading key: Ⅰ audit-grade empirics (cross-checked against regulatory filings) · Ⅱ peer-reviewed · Ⅲ theoretical model / working paper (citations must read “the model predicts,” never “proven”) · Ⅳ practitioner first-hand account · Ⅴ advisory forecast (a forecast, not a fact). Citation rows are authoritative in this table; the current 3-vote adversarial review found no overturned source.

REF	级 GR	SOURCE	承重论断 | Load-bearing claim
R1	Ⅳ	Anthropic《How Anthropic teams use Claude Code》2025-07-24 · agentic-coding 一手实践 | Anthropic, “How Anthropic teams use Claude Code,” 2025-07-24 · first-hand agentic-coding practice · anthropic.com/news	从一句描述生成整套带状态、响应式、可交付代码的界面，已是常规能力而非演示——出一版界面从"几天人时"压到"一次提示＋几分钟"（成本侧已塌的从业者证据） | Generating a full stateful, responsive, deliverable UI from one prompt is now routine, not a demo: a version drops from “days of human time” to “one prompt + minutes” (practitioner evidence that the cost side has collapsed)
R2	Ⅳ	Karpathy《Software Is Changing (Again)》YC AI Startup School · 2025-06-16 | Karpathy, “Software Is Changing (Again),” YC AI Startup School · 2025-06-16 · ycombinator.com/library/MW	Software 3.0 与"验证瓶颈"——生成变廉价后，做功的环节从"能否做出来"移到"该不该是这样、由谁来验"（判断不随模型变便宜的论述锚） | Software 3.0 and the “verification bottleneck”: once generation gets cheap, the load-bearing step moves from “can it be built” to “should it be this, and who verifies” (the anchor for judgment not getting cheaper with the model)
R3	Ⅳ	设计即代码工具链：pencil/paper（以代码描述图形）· Remotion（以 React 描述视频，remotion.dev）· html-video（用网页技术出动效）；并本系列工程卷"五条贯穿原理"与 design-as-code 实践 | Design-as-code tooling: pencil/paper (graphics described as code) · Remotion (video described as React, remotion.dev) · html-video (motion via web tech); plus this series’ engineering volume “five through-lines” and its design-as-code practice	画布工具把设计锁进私有二进制；新一代工具把同一份设计重新表达为纯文本，于是设计掉进软件工程三十年的 git/diff/CI 基础设施——这是产物形态的相变，不是工具竞赛 | Canvas tools lock design in proprietary binary; the new tools re-express the same design as plain text, so design falls into software engineering’s thirty years of git/diff/CI infrastructure: a phase change in artifact form, not a tool race
R4	Ⅱ	Doshi & Hauser，受控实验《Generative AI enhances individual creativity but reduces the collective diversity of novel content》Science Advances 10(28) · 2024 | Doshi & Hauser, controlled experiment, “Generative AI enhances individual creativity but reduces the collective diversity of novel content,” Science Advances 10(28) · 2024 · doi.org/10.1126/sciadv.adn5290	约 300 名受试者写短篇故事，部分获 AI 提示：个体层面更新颖，集体层面（语义相似度）更趋同——"放大个体、压平分布"的实验影像（对象是叙事文本，迁移到视觉/产品设计是合理但未验证的外推，故不外推具体数字） | ~300 participants writing short stories, some given AI prompts: more novel individually, more similar collectively (by semantic similarity): the experimental image of “amplify the individual, flatten the distribution” (the object is narrative text; carrying it to visual/product design is a reasonable but unverified extrapolation, so no specific figure is carried over)
R5	Ⅳ	从业者复盘（脱敏）· DSN 09 案例 A、D | Practitioner retrospective (de-identified) · DSN 09 Cases A and D	结账重做（弃单 31%→19%）与仪表盘 slop 急救（D7 留存 11%→34%）的前后区间，引自一手项目复盘。为脱敏内部数据、非公开受控实验，故仅作区间陈述、不外推到其他品类或团队；用于支撑"铺开→收敛"与"先补判断再重生成"的机理，不用于证明任何普适转化率。 | The before/after ranges for the checkout redo (abandon 31%→19%) and the dashboard slop rescue (D7 retention 11%→34%), drawn from first-hand project retrospectives. De-identified internal data, not a public controlled experiment, so stated only as ranges and not extrapolated to other categories or teams; used to support the mechanism of “spread→converge” and “restore judgment before regenerating,” not to prove any universal conversion rate.
R6	Ⅳ	从业者复盘（脱敏）＋ WebAIM 屏幕阅读器用户调查（仅作读屏用户构成的背景参考，非任务完成率来源） | Practitioner retrospective (de-identified) + WebAIM Screen Reader User Survey (background reference on the screen-reader user population only, not a source of task-completion rates) · webaim.org/projects/screenreadersurvey10	DSN 09 案例 C 的无障碍可用性区间（纯读屏完成关键流程 41%→92%）为脱敏内部复盘数字，非公开调查数据；WebAIM 调查不含任务完成率、不为该量级背书；证据级 Ⅳ 一手，不外推为通用结论。 | The accessibility-usability range in DSN 09 Case C (screen-reader-only completion of the key flow, 41%→92%) is a de-identified internal retrospective, not public survey data; the WebAIM survey contains no task-completion rates and does not back this magnitude; grade Ⅳ first-hand, not extrapolated to a generic claim.
R7	Ⅳ	W3C Design Tokens Community Group 规范草案 ＋ Material／Carbon／Polaris 设计系统 token 公开实践 | W3C Design Tokens Community Group draft specification · w3.org/community/design-tokens + the public token practice of the Material / Carbon / Polaris design systems	支撑 DSN 11 工具一"设计令牌即代码"：令牌作为机器可读的视觉决策锚点，使一次判断被海量生成强制继承。引规范草案与主流设计系统的公开实践，证据级 Ⅳ 行业实践（规范为草案、各家实现细节不同，不主张统一标准已成定论）。 | Supports DSN 11 Tool 1 “design tokens as code”: tokens as a machine-readable anchor for visual decisions, so one judgment is force-inherited by mass generation. Cites the draft spec and the public practice of mainstream design systems; grade Ⅳ industry practice (the spec is a draft and implementations differ, so no settled unified standard is claimed).
R8	Ⅱ	Patterns（Cell Press），同行评议研究 · Hintze、Proschinger Åström & Schossau《Autonomous language-image generation loops converge to generic visual motifs》2025;7(1):101451 · 700 条生成轨迹 × 7 档温度的视觉母题坍缩分析 | Patterns (Cell Press), peer-reviewed · Hintze, Proschinger Åström & Schossau, “Autonomous language-image generation loops converge to generic visual motifs,” 2025;7(1):101451 · DOI 10.1016/j.patter.2025.101451 · analysis of visual-motif collapse across 700 generation trajectories × 7 temperatures	700 条生成轨迹、横跨 7 档采样温度，几乎全部坍缩到同一组 12 个通用视觉母题——把"生成默认收敛到训练分布均值（slop = 均值）"从本卷断言升为同行评议实测；slop 是无人施加判断时生成系统的稳态吸引子。仅坐实"趋同引力此刻可测"，不外推具体比例，也不单独证明情景台"均值之海"象限（后者另需偏好可复现＋品味民主化两条未坐实条件）。 | 700 generation trajectories across 7 sampling temperatures nearly all collapse onto the same set of 12 universal visual motifs, raising “generation defaults to the mean of its training distribution (slop = the mean)” from this volume’s assertion to a peer-reviewed measurement; slop is the stable attractor of a generative system when no judgment is applied. Confirms only that “the gravity of convergence is now measurable,” extrapolates no specific proportion, and does not by itself prove the scenario bench’s “Sea of the Mean” quadrant (which additionally needs the two unconfirmed conditions of reproducible preference + democratized taste).

REV	DATE	DESCRIPTION
1.0	2026-06	设计卷成形 —— 八个主题章（生成变富品味变稀缺 · 设计即代码 · 从打磨到判断 · 品味可拆解 · 设计系统即护栏 · 反 slop 红线 · AI 设计环 · 决策分诊）· 三节深化（DSN 09 四个真实案例 · DSN 10 六种旧结构批判 · DSN 11 五件可照做工具）· INSTRUMENT 10 Slop 自检表 + INSTRUMENT 13 设计判断分配台 + INSTRUMENT 15 Slop↔品味自测器 · 十九张论证图 · 本卷独立证据登记 R1-R8（与组织卷登记分离） | Design volume takes shape: eight themed chapters (generation gets cheap, taste gets scarce · design-as-code · from making to judging · taste decomposed · the system as guardrail · anti-slop red lines · the AI design loop · decision triage) · three deepening sections (DSN 09 four real cases · DSN 10 critique of six old structures · DSN 11 five do-this tools) · INSTRUMENT 10 the Slop self-check + INSTRUMENT 13 the design-judgment allocator + INSTRUMENT 15 the Slop↔taste self-scorer · nineteen argument-bearing figures · this volume’s own evidence registry R1-R8 (separated from the organization volume’s)

REV. 2026-06 R1.0 / END OF DOCUMENT

AI Native 设计方法论

AI Native Design Methodology

设计文档包：把“何为好”写成生成护栏

Design Pack: writing “what good means” as generation guardrails

生成变富后，稀缺的是品味、意图与“为谁”，不是画稿。

When generation is abundant, the scarce thing is taste, intent, and “for whom,” not comps.

让生成收敛，必须先把品味外化。

To make generation converge, taste must be externalized first.

可机检的交给系统，只能人来定的审美判断留给人。

Machine-checkable design goes to the system; constitutive aesthetic judgment stays human.

先写一张小规格，再铺开候选。

Write a small spec before spreading candidates.

当二十个方案都能做出来，设计师在判断什么？

When Twenty Designs Are Possible, What Is a Designer Judging?

三种看似先进、却把问题留在原地的做法

Three Seemingly Advanced Moves That Leave the Problem Where It Is

同一个内核，作用在设计这个面

The one kernel, acting on the design face

这次重画，和设计史上每一次工具革命的关键差别

How this redraw differs from every prior tool revolution in design history

生成变富，品味变稀缺

Generation gets cheap, taste gets scarce

为什么生成的默认终点是"均值"——而均值就是 slop

Why generation’s default destination is “the mean,” and why the mean is slop

把两条曲线画在一起：成本塌、判断不塌

Plot the two curves together: cost collapses, judgment does not

不对称的直接后果：团队该把省下的人力重新投到哪

The asymmetry’s direct consequence: where a team should reallocate the freed-up effort

这套不对称不是预测，是已经发生的事

This asymmetry is not a forecast but something already happening

"已经发生"意味着等待是有成本的

“Already happening” means waiting has a cost

设计即代码——为什么新工具都在去画布化

Design-as-code: why the new tools de-canvas

"去画布化"是把产物挪到 agent 够得着的形态，不是审美口号

“De-canvasing” moves the artifact into reach of agents, not an aesthetic slogan

"图形即代码"不是新发明，是一条早就存在、如今被 AI 引爆的暗线

“Graphics as code” is no new invention but a long-standing undercurrent now detonated by AI

从打磨一稿到判断多稿

From polishing one to judging many

人的节点没消失，它从产出链的末端搬到了前端

The human node did not vanish; it moved from the end of the chain to the front

为什么"判断多稿"比"打磨一稿"更难，而不是更省事

Why “judging many” is harder than “polishing one,” not easier

同一道分叉，落在设计面上为什么不一样

The same fork lands differently on the design face

"品味"不是不可讨论的直觉，它是一组可拆解的判断

“Taste” is not an untouchable intuition; it is a set of decomposable judgments

拆开看：共情是品味的根，没有它品味只是个人偏好

Unpacked: empathy is the root of taste; without it, taste is mere preference

品味稀缺，但不神秘——它可以被外化、被教、被回流

Taste is scarce but not mysterious: it can be externalized, taught, and fed back

品味是主观的吗？是相对的，但不是任意的

Is taste subjective? Relative, yes; arbitrary, no

设计系统即护栏，把"何为好"写下来

The system as guardrail, writing the spec of “good”

设计系统从"交付后的文档"升级为"生成前的护栏"

The design system upgrades from “post-hoc doc” to “pre-generation guardrail”

同一份规格切两层：机器守"不离牌"，人守"是否为人"

One spec, two layers: the machine holds “on-brand,” the human holds “for people”

设计系统与架构的"结构即护栏"是同一招：同一个原理，不是类比

The design system and architecture’s “structure as guardrail” are the same move: one principle, not an analogy

守住人本，拒绝 slop

Hold the human, refuse slop

"把人还给意义"在设计面上具体指什么

What “returning people to meaning” concretely means on the design face

反 slop 红线（命中越多，越滑向均值）

Anti-slop red lines (the more you hit, the closer to the mean)

赢的是产物变代码

What wins is the artifact becoming code

为什么是"产物形态"而不是"工具能力"在做功

Why it is the “artifact form,” not “tool capability,” that does the work

记症状会过期，懂成因不会——为什么这一节讲机制而非清单

Memorizing symptoms expires; understanding causes does not: why this section teaches mechanism, not a list

"只优化可测的"是一种隐蔽的退化——它会自动发生，除非你刻意防

“Optimize only the measurable” is a hidden degeneration: it happens automatically unless you guard against it

铺开候选，再收敛——一条可照做的环

Spread candidates, then converge: a loop you can run

②铺开的关键是"方向多样"，不是"数量多"

The point of ② spread is “directional diversity,” not “high count”