PART III / AI-NATIVE 设计AI-NATIVE DESIGN

AI Native 设计方法论

AI Native Design Methodology

当一稿、一个变体、一整套界面都近乎免费,稀缺的不再是"做出来",而是"什么算好"——品味与意图。而生成默认滑向均值(slop:通用、雷同、一眼 AI),所以品味反而成了最稀缺的判断。这里说的设计,不止于界面与视觉:它是把意图变成人能理解、愿意使用、为之停留的形态这门更大的手艺——产品、交互、系统、表达都在其内。设计的对象始终是具体的人,所以"为谁、何为好"是 AI 接不走、也最不该接走的那部分判断。纪律同样:工具是表层,提取底层原理。

When one comp, one variant, or a whole interface is near-free, the scarce thing is no longer "making it" but "what counts as good": taste and intent. And because generation defaults to the mean (slop: generic, derivative, obviously AI), taste becomes the scarcest judgment of all. Design here is not confined to interfaces and visuals: it is the larger craft of turning intent into a form people can understand, want to use, and will stay with, which includes product, interaction, system, and expression. Its object is always specific people, so "for whom, and what is good" is the part of judgment AI cannot take over, and should least of all be allowed to. The same discipline applies: tools are the surface; extract the principle beneath.

① 生成已充裕 → ② 判断沿可验证性梯度分叉:可机检的部分(对齐 / 规格符合 / 可访问性)被自动化,构成性的审美判断留给人 → ③ 设计系统与意图成基础设施 → ④ 人回到"为谁、何为好"。无需读过组织卷,本卷即此一面。

① generation is abundant → ② judgment forks along the verifiability gradient: the machine-checkable part (alignment / spec-conformance / accessibility) is automated, while constitutive aesthetic judgment stays with people → ③ the design system and intent become infrastructure → ④ people return to "for whom, and what is good." You need not have read the Org volume; this volume shows the same kernel on one face, design.

面向认知 Cognition-facing: 研究 Research · 学习 Learning · 创新 Innovation
面向执行 Execution-facing: 组织 Org · 工程 Engineering · 设计 Design

本卷讲设计——把意图变成人能理解、愿意使用、为之停留的形态,从产品到交互到表达,而不止于界面。

This volume is about design — turning intent into a form people can understand, want to use, and stay with, from product to interaction to expression, not interfaces alone.

AI-ENABLED DESIGN → AI-NATIVE DESIGN

产出 Output
AI-enabled: 更多稿、更快图 / More drafts, faster images
AI-native: 意图、品味与系统约束一起进入生成 / Intent, taste, and system constraints enter generation together

评价 Review
AI-enabled: 凭“像不像”挑稿 / Choose by whether it looks close
AI-native: 用指纹、语境和可用性判断何为好 / Judge quality through fingerprints, context, and usability

边界 Boundary
AI-enabled: 生成后再修补 / Patch after generation
AI-native: 先写生成护栏,再让系统拒绝 slop / Write guardrails first so the system can refuse slop

设计从“出稿效率”转为“品味基础设施”。详见 SHEET 05 · 品味判断
Design moves from draft speed to taste infrastructure. See SHEET 05 · Taste Judgment
AI-NATIVE DOCUMENT PACK · PART III

设计文档包:把“何为好”写成生成护栏

Design Pack: writing “what good means” as generation guardrails

本卷的文档不是出稿说明,而是品味的基础设施:让生成能铺开,让人能判断,让系统能拒绝 slop。

The design document is not a production note; it is infrastructure for taste: generation can spread, humans can judge, and the system can refuse slop.

Thesis

生成变富后,稀缺不是画稿,而是品味、意图与“为谁”。

When generation is abundant, the scarce thing is not comps but taste, intent, and “for whom.”

AI-Native 设计把设计师从亲手出稿推回判断环:铺开候选、评判差异、导向下一轮,并把新学到的判据沉淀进设计系统。

AI-Native design moves the designer from hand-producing one comp into the judgment loop: spread candidates, judge differences, steer the next round, and distill new criteria into the design system.

DSN 00
CONCEPT · 概念
CONCEPT
定义 · 先划界
Definition

从 AI 辅助设计,到AI-Native 设计

From AI-Assisted Design to AI-Native Design

设计师用 AI 出图、装一个 Figma 插件,仍然只是 AI 辅助:旧出稿流程更快了,但判断结构没有改变。AI-Native 设计承认生成已经充裕,于是围绕生成重画整张设计流程图。差别不是程度,是种类。

A designer generating images with AI or adding a Figma plugin is still in AI-assisted design: the old production process speeds up, but the judgment structure is unchanged. AI-Native design accepts that generation is abundant and redraws the whole design process around it. The difference is not degree, but kind.

多年来设计的稀缺资源是产出工时——画稿、对齐像素、切图、做变体的人时。所有设计流程都为"做出来很贵"而建。当生成成默认,出稿、铺变体、改文案不再拖慢团队,但瓶颈没消失,它搬家了:搬到"哪个更好、对不对路、是否为人"。把内核四步填上"设计"的具体内容,就是这一部分的命题。

For years design's scarce resource was production hours — drawing comps, aligning pixels, exporting assets, making variants. Every process was built for "making things is expensive." Once generation is the default, producing comps, variants, and copy no longer slows the team, but the bottleneck does not vanish — it moves: to "which is better, on-target, for people." Fill the kernel's four steps with the specifics of design and you get this part's thesis.

用反例划界:三件看起来像、其实不是 AI-Native 设计的事

Drawing the boundary by counterexample: three things that look like it but are not

说清"是什么"最快的方式,是先说清"不是什么"。反例一:把 AI 当出图机。用 Midjourney 出一张配图、用插件生成一版图标,然后照旧手工拼版——这是把 AI 嫁接到旧的"产出工时"流程上,AI 只是换了一支更快的笔。流程图没变,瓶颈也没动。反例二:把 AI 当自动美化器。指望模型"让这版更好看",于是它给你套上当下最流行的视觉模板——结果恰恰是 slop:更光滑,但更没自己。这把品味这件人该做的事,错误地外包给了一台只会拟合均值的机器。反例三:把设计系统当事后文档。先生成、再回头补一套 token 应付审计——这让设计系统失去了它在 AI-Native 流程里唯一重要的角色:生成前的护栏。三个反例的共同错误,是没有围绕"生成已充裕"这一前提重画流程,而只是把 AI 塞进旧流程的某个工位。

The fastest way to say "what it is" is to first pin down "what it is not." Counterexample one: AI as an image machine. Generate an illustration with Midjourney, a set of icons with a plugin, then hand-assemble the layout as before — this grafts AI onto the old "production hours" process, with AI merely a faster pen. The process diagram has not changed, and neither has the bottleneck. Counterexample two: AI as an auto-beautifier. Expecting the model to "make this look better," it dresses your work in the currently most popular visual template — and the result is precisely slop: smoother, but with less self. This wrongly outsources taste, a human's job, to a machine that only fits the mean. Counterexample three: the design system as a post-hoc doc. Generate first, then go back and patch in a token set to pass audit — this strips the design system of the one role that matters in an AI-Native process: the pre-generation guardrail. The shared error of all three is not redrawing the process around the premise that "generation is already abundant," and instead just slotting AI into one station of the old process.

那么正面说,AI-Native 设计是什么?它是承认"生成已充裕"这个前提,并据此把整张设计流程图重画一遍的设计方式。重画的标志有三个,对应内核的②③④:第一,把人的动作从"做"前移到"判断与方向"——不再以产出工时论价值,而以品味命中率论价值(②);第二,把设计系统从交付物升级为生成前的护栏,把"何为好"显式地写成可指导、可机检生成的规格(③);第三,不再把"更快"本身当成赢,而是让被生成解放出来的设计师回到共情、品味与意义(④)。这三条合起来,就是"种类上的不同"而非"程度上的不同":旧流程优化的是"怎么更快地做出来",新流程优化的是"怎么更准地判断该做什么、为谁做"。一个团队是不是真的 AI-Native,不看它用没用 AI 工具,而看它有没有发生这三条重画——尤其是有没有把价值重心从产出真正移到判断。没移,就还在旧流程里用更快的笔;移了,才是种类上的新设计。

So, stated positively, what is AI-Native design? It is the way of designing that accepts the premise "generation is already abundant" and redraws the whole design process diagram accordingly. The redraw has three marks, mapping to the kernel's ②③④: first, moving the human's action forward from "making" to "judgment and direction" — valuing not by production hours but by taste hit-rate (②); second, upgrading the design system from a deliverable to a pre-generation guardrail, explicitly writing "what is good" as a spec that steers and machine-checks generation (③); third, no longer treating "faster" itself as the win, but returning the designer freed by generation to empathy, taste, and meaning (④). Together these three are "a difference in kind," not "a difference in degree": the old process optimizes "how to make it faster," the new process optimizes "how to judge more accurately what to make, and for whom." Whether a team is truly AI-Native is judged not by whether it uses AI tools but by whether these three redraws have happened — especially whether the center of value has really moved from production to judgment. Not moved, and it is still the old process with a faster pen; moved, and only then is it design new in kind.

充裕 ABUNDANCE
稿 / 变体 / UI / 文案
Comps / variants / UI / copy
生成成默认,"做出来"不再稀缺。
Generation is the default; making it is no longer scarce.

判断 JUDGMENT
品味 · 意图 · 何为好
Taste · intent · what's good
新瓶颈是审美与体验判断 + 连贯。
The new bottleneck is aesthetic/experience judgment + coherence.

上下文 CONTEXT
设计系统即护栏
Design system as guardrail
tokens / 组件 / 品牌成为生成的规格。
Tokens / components / brand become the spec for generation.

意义 MEANING
共情 · 品味 · 为意义负责
Empathy · taste · meaning
设计师回到理解用户、守住品味与意图。
Designers return to understanding users, holding taste and intent.

第②步同样沿可验证性梯度分叉:可机检的部分(对齐 / 规格符合度 / 可访问性)并入①充裕、被自动化;构成性的审美判断(品味 / 为谁而存在 / 异质性)下沉④、留给人——这正是设计卷与体系总图共用的那条线。

Step ② forks the same way along the verifiability gradient: the machine-checkable part (alignment / spec-conformance / accessibility) joins ① abundance and gets automated; constitutive aesthetic judgment (taste / who it exists for / heterogeneity) sinks to ④ and stays with people — the very line the design volume shares with the system map.

这一步不是新增了一卷,是同一个内核作用在设计这个面

This is not a new volume; it is the one kernel acting on the design face

系列里每一卷——组织、工程、设计、研究、学习、创新——都不是各讲各的,而是同一条内核落在不同的面上。组织卷把它讲成"执行充裕→判断退守→上下文成基设→人回归意义";工程卷把它讲成"打字充裕→验证成瓶颈→代码库可查询→人做系统专长"。设计卷只是把同样四步的名词,换成设计的具体内容:充裕的是稿与变体,退守的判断是品味与意图,成为基设的上下文是设计系统,回归的意义是共情与为人。读者若已读过任一姊妹卷,会认出这是同一台机器换了零件——这正是它能称"系列"而非孤立六篇的原因。

Every volume in the series — organization, engineering, design, research, learning, innovation — does not each tell its own story; they are the one kernel landing on different faces. The org volume tells it as "execution becomes abundant → judgment retreats → context becomes infrastructure → people return to meaning"; the engineering volume as "typing becomes abundant → verification becomes the bottleneck → the codebase becomes queryable → people do deep systems expertise." The design volume only swaps the nouns of those same four steps for design's specifics: what is abundant is comps and variants, the judgment that retreats is taste and intent, the context that becomes infrastructure is the design system, and the meaning people return to is empathy and being-for-people. A reader who has read any sibling volume will recognize the same machine with different parts swapped in — which is exactly why this is a series, not six isolated essays.

核心图 KEY FIG · FIG. D0.0 / THE FORK · 判断沿可验证性梯度分叉
看懂:第②步的判断如何一半被自动化、一半下沉到人
Read: how step ②'s judgment splits, half automated and half sinking to people

[图 Figure]
① ABUNDANCE · 生成成默认 / Generation is default: 稿 / 变体 / UI / 文案,近零边际成本 / comps / variants / UI / copy, near-zero marginal cost
判断 judge · 问:仅看产物文本能判对错吗? / Ask: judge-able from artifact text alone?
能 → 可机检 / yes → machine-checkable: 并入 ① · 被自动化 / joins ① · automated. 对齐 / token 符合度 / 对比度 / 可访问性 → lint / alignment / token-conformance / contrast / a11y → lint
不能 → 构成性品味 / no → constitutive taste: 下沉 ④ · 留给人 / sinks to ④ · human. 品味 / 为谁而存在 / 异质性 / 意义 / taste / who it exists for / heterogeneity / meaning
③ 上下文 CONTEXT · 设计系统即护栏 / the design system as guardrail: tokens / 组件 / 品牌,是托住整条分叉的可查询基设 / tokens / components / brand, the queryable infrastructure holding the whole fork
第②步不是一个黑箱判断,而是一个分叉口。一条分诊问句——"仅看产物文本能判对错吗"——把判断切成两半:可机检的那半回流到①充裕、被自动化(DSN 08 的硬约束);构成性的那半下沉到④、留给人(软判据)。整条分叉立在③上下文之上。这张图是后面所有 SHEET 的骨架。
Step ② is not a black-box judgment but a fork. One triage question — "can this be ruled right or wrong from the artifact text alone?" — cuts judgment in two: the machine-checkable half flows back into ① abundance and gets automated (DSN 08's hard constraints); the constitutive half sinks to ④ and stays with people (soft criteria). The whole fork stands on ③ context. This figure is the skeleton of every SHEET that follows.
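
As one illustration of the machine-checkable half, a contrast check is the kind of rule that can be decided from the artifact text alone. The sketch below computes the WCAG 2.x contrast ratio between two hex colors. The function names, and the choice of the 4.5:1 AA threshold for normal-size text as the default, are assumptions made for this example; they are not taken from this volume's own tooling.

```python
# Minimal sketch of one machine-checkable rule: WCAG 2.x contrast ratio.
# Function names are hypothetical; the formula follows the WCAG definition
# of relative luminance for sRGB colors.

def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    # 0.04045 is the sRGB breakpoint; older WCAG text used 0.03928,
    # which gives the same result for 8-bit channel values.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 to 1.0."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two colors, from 1.0 to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, threshold: float = 4.5) -> bool:
    """True if the pair meets the assumed AA threshold for normal text."""
    return contrast_ratio(fg, bg) >= threshold
```

A check like this can run as a lint step on every generated candidate. Whether a passing candidate is also on-target for its audience is the other half of the fork, and no function of this kind answers it.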

这次重画,和设计史上每一次工具革命的关键差别

How this redraw differs from every prior tool revolution in design history

设计史上经历过多次工具革命:从手绘到桌面出版(DTP),从纸面到 Photoshop,从静态切图到 Figma 的协作矢量。每一次都让"做出来"变得更快更便宜,但有一个东西始终没变——做的人和判断的人是同一个,且判断始终内嵌在做的过程里。设计师在推像素的同时就在判断好坏,两件事交织得难分彼此。这一次的重画在结构上不同:它第一次把"做"几乎完全剥离给了机器,从而把"判断"从"做"里抽离出来,逼它独立显形。过去你不需要单独写下"何为好",因为判断就活在你的手上;现在手交给了生成,判断若不被显式地写下来、说清楚,就会消失——而它一消失,生成就滑回均值。这就是为什么这一卷反复强调"写规格、说判据、回流判断":不是这些动作新发明的,而是它们第一次必须从隐性变成显性。理解这个差别,就理解了为什么这次不是"又一个更快的工具",而是设计师价值结构的一次重组。

Design history has been through many tool revolutions: from hand-drawing to desktop publishing (DTP), from paper to Photoshop, from static slicing to Figma's collaborative vectors. Each made "making" faster and cheaper, but one thing never changed — the maker and the judge were the same person, and judgment was always embedded inside the act of making. The designer judged good and bad while pushing pixels, the two interwoven past separating. This redraw is structurally different: for the first time it strips "making" almost entirely to the machine, thereby extracting "judgment" out of "making" and forcing it to take independent shape. You used to not need to write down "what is good" separately, because judgment lived in your hand; now the hand goes to generation, and judgment, if not explicitly written down and stated, simply disappears — and the moment it disappears, generation slides back to the mean. This is why this volume insists, again and again, on "write the spec, state the criteria, feed judgment back": these actions are not newly invented; it is that for the first time they must turn from implicit to explicit. Grasp this difference and you grasp why this is not "yet another faster tool" but a reorganization of the designer's value structure.

DSN 01
MECHANISM · 不对称
THE ASYMMETRY
机理 · 受力
Mechanism

生成变富,品味变稀缺

Generation gets cheap, taste gets scarce

出稿、铺变体、做一整套界面近乎免费;"什么算好"却没变便宜半分。更糟:生成默认收敛到训练分布的均值,也就是大家见得最多的那个样子。正因如此,人的品味成了最稀缺的判断节点。

Producing comps, variants, or a whole interface is near-free; "what counts as good" has not gotten one bit cheaper. Worse: generation defaults to the mean of its training distribution, the shape everyone has seen most. That is exactly why human taste becomes the scarcest judgment node.

机制:过去设计的稀缺是"做出来"的工时,流程围绕省产出来建。为何失效:当生成把出稿变成随取随用,产出不再是约束。瓶颈整个落到一个机器答不了的问题上:这一版,好吗?对不对路?是不是为它要服务的人做的?

Mechanism: the old scarcity was production hours, so the process was built to spend fewer of them. Why it stalls: when generation makes comps something you draw on at will, production is no longer the constraint. The bottleneck lands wholesale on a question the machine cannot answer: Is this version good? Is it on-target? Is it made for the people it serves?

为什么生成的默认终点是"均值"——而均值就是 slop

Why generation's default destination is "the mean" — and the mean is slop

这条不对称的根,藏在生成模型的目标函数里。模型被训练去最小化期望损失,在面对一个欠约束的请求("做一个好看的落地页")时,最安全的策略就是输出训练分布里最高频、最不容易出错的那个形态。统计上,这就是均值——所有它见过的"落地页"叠在一起的那个最大公约数。问题在于:均值天然谁也不特别像、谁也不特别为。它对所有人都"还行",恰恰因为它没为任何一群人做过取舍。这就是 slop 的数学定义——不是丑,是"对均值的收敛"。理解这一点能纠正一个常见的误判:以为"模型更强了 slop 就会消失"。恰恰相反,模型越强,它拟合均值越精准,无约束时产出的 slop 反而越光滑、越难一眼识破。能把它拉离均值的,从来不是更强的模型,而是更强的约束——也就是人给定的规格与品味。

The root of this asymmetry hides in the generative model's objective function. A model is trained to minimize expected loss, so when it faces an under-constrained request ("make a good-looking landing page"), its safest strategy is to output the highest-frequency, least-error-prone form in its training distribution. Statistically, that is the center of the distribution, which this volume calls the mean (strictly speaking, the most frequent form is the mode): the lowest common denominator of every "landing page" it has seen, stacked together. The problem is that the mean, by nature, resembles no one in particular and serves no one in particular. It is "fine" for everyone precisely because it made no trade-off for any one group. This is slop in the statistical sense: not ugliness but "convergence to the mean." Grasping this corrects a common misdiagnosis, the belief that "slop will disappear once the model gets stronger." Quite the opposite: the stronger the model, the more precisely it fits the mean, and the slop it produces unconstrained gets smoother and harder to spot at a glance. What pulls it off the mean has never been a stronger model but stronger constraints, namely the spec and taste a human supplies.

生成 · 近乎免费Generation · near-free
一稿、十个变体、一整套界面与状态——随取随用,近零边际成本。
One comp, ten variants, a whole interface with states — on demand, near-zero marginal cost.
品味 · 依旧稀缺Taste · still scarce
"哪个更好、为什么、是否为人"——没有捷径,只能由人判断。这就是新瓶颈。
"Which is better, why, is it for people" — no shortcut, only human judgment. This is the new bottleneck.
FIG. D1.1 / DISTRIBUTION CLAMP · 生成默认堆在均值,设计系统把分布夹离均值 · generation piles on the mean; the design system clamps the distribution off it
看懂:生成的输出本身是一条压在"均值=slop"上的钟形分布,护栏不是挑一个好结果,而是把整条分布夹窄、推向品牌那一侧
Read: generation's output is itself a bell curve centered on "the mean = slop"; the guardrail does not pick one good result, it clamps the whole distribution narrow and pushes it toward the brand side

[图 Figure]
横轴 / Horizontal axis: 两端为偏离均值(独特、为某群人),中央为均值 = slop / off-mean (distinct, for some group) at both ends, the mean = slop at the center
生成的默认分布:峰压在均值上 / Default generation: peaked on the mean
设计系统 / tokens = 夹具:夹窄 · 推向品牌(BRAND) / Design system / tokens = the clamp: clamped, pushed to BRAND
不是挑一个,是约束整条分布 / Not pick-one, but constrain the whole
关键不在"挑出那个好结果"——那是事后筛选,挑不快也挑不稳。护栏做的是上游的事:把生成那条压在均值上的宽分布,整条夹窄、整条推离均值、推向品牌成立的那一侧。所以设计系统的回报不是"少返工一次",而是"每一次生成的期望都更接近你认的东西"。这也是为什么 slop 是默认而非偶发:不加夹具,分布的峰永远落在均值。
The point is not "pick the one good result" — that is downstream filtering, neither fast nor reliable. The guardrail acts upstream: it takes generation's broad mean-centered distribution and clamps the whole thing narrow, shifts the whole thing off the mean toward where the brand holds. So a design system's payoff is not "one less rework" but "every generation's expectation lands closer to what you would sign off on." This is also why slop is the default, not the accident: without a clamp, the peak always sits on the mean.
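
The clamp described above can be sketched as conditioning a distribution on a constraint and renormalizing, as opposed to filtering single outputs after the fact. The toy below is only an analogy: real generative models are not lookup tables, and the style names, the probabilities, and the `BRAND_ALLOWED` set are all invented for illustration.

```python
from typing import Callable, Dict

# Hypothetical frequencies of visual styles in a model's training data.
# Names and numbers are invented; they only stand in for "what it saw most".
PRIOR: Dict[str, float] = {
    "purple-gradient-glass": 0.45,
    "generic-saas-blue": 0.30,
    "editorial-serif": 0.15,
    "brand-warm-mono": 0.10,
}


def clamp(prior: Dict[str, float], allowed: Callable[[str], bool]) -> Dict[str, float]:
    """Condition the distribution on a constraint and renormalize it."""
    kept = {style: p for style, p in prior.items() if allowed(style)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the constraint rejects every style in the distribution")
    return {style: p / total for style, p in kept.items()}


def most_likely(dist: Dict[str, float]) -> str:
    """The style an unguided sampler would land on most often."""
    return max(dist, key=dist.get)


# Assumed guardrail: the design system admits only these two styles.
BRAND_ALLOWED = {"editorial-serif", "brand-warm-mono"}
clamped = clamp(PRIOR, lambda style: style in BRAND_ALLOWED)
```

In this toy, `most_likely(PRIOR)` is the most common style, while `most_likely(clamped)` falls inside the brand set, and every draw from `clamped` does too. That is the sense in which the guardrail moves the expectation of each generation and does not just rescue one result.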

把两条曲线画在一起:成本塌、判断不塌

Plot the two curves together: cost collapses, judgment does not

不对称之所以是机理而非口号,是因为两件事在过去十八个月里以不同的速率移动。生成的边际成本沿着模型能力曲线塌了一到两个数量级——一稿、十个变体、一整套响应式状态,从"几天人时"变成"几分钟一次提示"。而"哪个好、为谁好"这条判断曲线近乎水平:它不随模型变强而变便宜,因为它问的不是"能不能做出来",而是"该不该是这样"。两条曲线一塌一平,中间张开的剪刀差,就是品味成为瓶颈的全部原因〔源:Anthropic 2025 agentic-coding 实践与 Karpathy "software is changing" 论述,证据级 Ⅳ 一手从业者〕[R1][R2]

The asymmetry is a mechanism, not a slogan, because the two things have moved at different rates over the last eighteen months. Generation's marginal cost has collapsed one-to-two orders of magnitude along the model-capability curve — one comp, ten variants, a full set of responsive states, from "person-days" to "a prompt and a few minutes." The judgment curve, "which is good, good for whom," is nearly flat: it does not get cheaper as the model gets stronger, because it asks not "can this be made" but "should it be this way." One curve falls, the other stays level, and the scissor-gap that opens between them is the entire reason taste becomes the bottleneck 〔Source: Anthropic 2025 agentic-coding practice and Karpathy's "software is changing" talk, grade Ⅳ practitioner〕[R1][R2].

核心图 KEY FIG · FIG. D1.0 / GENERATION × TASTE · 生成×品味平面
看懂:AI 把你推向哪一格,品味要在哪一格注入
Read: which quadrant AI pushes you into, and where taste must be injected

[图 Figure]
横轴 X: 生成 贵 → 生成 近乎免费 / generation costly → generation near-free
纵轴 Y: 品味 低 → 品味 高 / taste low → taste high

Q2 · 旧手艺 / Old craft (品味高、生成贵 / taste high, generation costly)
资深设计师手工打磨一稿:好,但贵、慢、难规模化。
A senior hand-polishing one comp: good, but costly, slow, hard to scale.

Q1 · ★ 胜势:富而有品 / The win: cheap + tasteful (品味高、生成近乎免费 / taste high, generation near-free)
生成铺面、人注入品味:又快又好又规模化。这是目标格。
Generation covers surface, human injects taste: fast, good, scalable. The target.

Q3 · 徒劳 / Wasted effort (品味低、生成贵 / taste low, generation costly)
花大力气却没注入判断——贵的 slop,最不划算。
Great effort, no judgment injected: expensive slop, the worst trade.

Q4 · ⚠ slop 默认区 / The slop default (品味低、生成近乎免费 / taste low, generation near-free)
不加干预,生成就落这里:便宜、好看、谁也不为。
Untouched, generation lands here: cheap, slick, for no one.

箭头 / Arrows: AI 把你推向这里(变便宜) / AI pushes you here (cheaper); 品味必须把你抬上去 / taste must lift you up
X 轴:生成成本 2023-26 逐年趋零(Anthropic 2025,证据级 Ⅳ)· Y 轴:品味无可机检判定程序(证据级 Ⅴ 论证)
X: generation cost → 0, 2023-26 (Anthropic 2025, grade Ⅳ) · Y: taste has no machine-checkable decision procedure (grade Ⅴ argument)
AI 只沿横轴帮你——把生成推向"近乎免费"。它不会替你沿纵轴往上走。不加判断,你就从 Q2(旧手艺)平移到 Q4(slop 默认区):更便宜,但谁也不为。胜势 Q1 不是 AI 送的,是人把品味这条纵轴重新加上去换来的。
AI helps you only along the horizontal axis — pushing generation toward "near-free." It will not climb the vertical axis for you. Without injected judgment you simply slide from Q2 (old craft) to Q4 (the slop default): cheaper, but for no one. The win, Q1, is not a gift from AI; it is what a human buys back by re-adding the vertical axis of taste.

不对称的直接后果:团队该把省下的人力重新投到哪

The asymmetry's direct consequence: where a team should reallocate the freed-up effort

如果生成把产出成本压塌、而判断成本不变,那么一个理性的团队就该做一次显式的人力再分配,而不是简单地"用 AI 提效然后裁掉一半设计师"。后者是对这条不对称的误读——它假设瓶颈还在产出,所以省了产出就万事大吉。真相是瓶颈搬到了判断,所以省下的产出人力应当重新投向判断侧:投到把规格写得更有判别力、把品味外化成可复用的护栏、把每一轮的判断回流进系统这些事上。一个做对了的团队,外观上的变化是:花在 Figma 里推像素的时间大幅减少,花在写"为谁、何为好、什么是红线"和评审候选、讨论"为什么这版对路"的时间大幅增加。设计师的人数不一定变少,但每个人的工作内容会显著上移——从执行者变成判断者与方向制定者。误读这条不对称的代价很实在:以为能靠 AI 省掉判断,结果只是更快地产出无人负责品味的 slop,把 Q2 的设计悄悄平移到了 Q4。

If generation collapses production cost while judgment cost stays fixed, a rational team should explicitly reallocate human effort, not simply "use AI to boost efficiency and then cut half the designers." The latter misreads the asymmetry: it assumes the bottleneck is still production, so saving on production is the end of it. In fact the bottleneck has moved to judgment, so the freed-up production effort should be reinvested on the judgment side: writing more discriminating specs, externalizing taste into reusable guardrails, and feeding each round's judgment back into the system. A team that gets this right looks, on the surface, like this: time spent pushing pixels in Figma drops sharply, while time spent writing "for whom, what is good, what is a red line," reviewing candidates, and discussing "why this version is on-target" rises sharply. The headcount of designers need not shrink, but each person's work shifts markedly upward, from executor to judge and direction-setter. The cost of misreading the asymmetry is concrete: a team that believes AI can spare it the judgment only produces slop faster, with no one responsible for its taste, and quietly slides its Q2 design into Q4.

还有一个常被忽略的二阶效应:当产出近乎免费,尝试的成本也近乎为零,于是探索的边界应该被大幅推开。过去因为每一稿都贵,团队倾向于早早收敛到一个"安全"的方向,不敢做太多发散——发散意味着浪费宝贵的人时。现在这个约束消失了:铺十个真正不同的方向和铺一个,成本相差无几。这意味着理性的策略应该反过来——在判断之前尽可能多地发散,因为发散几乎免费,而发散得越广,你的判断能从越大的可能性空间里挑选,命中"那个真正对路的方向"的概率就越高。可惜很多团队把省下的成本用错了地方:他们用它去更快地收敛("反正出图快了,赶紧定稿往下走"),而不是更广地探索。这是对不对称的又一种误用——它把"生成变便宜"的红利花在了加速旧习惯上,而没意识到这个红利真正解锁的是"以前不敢做的大范围探索"。用对了,生成的廉价不是让你更快地做完,而是让你的判断有了前所未有大的素材库。

There is also an often-overlooked second-order effect: when production is near-free, the cost of trying is near-zero, so the boundary of exploration should be pushed wide open. In the past, because every comp was expensive, teams tended to converge early on a "safe" direction, afraid to diverge much — divergence meant wasting precious person-hours. Now that constraint is gone: spreading ten genuinely different directions and spreading one cost about the same. This means the rational strategy should invert — diverge as much as possible before judging, because divergence is nearly free, and the wider you diverge the larger the possibility space your judgment can pick from, the higher the probability of hitting "the genuinely on-target direction." Sadly, many teams spend the saved cost in the wrong place: they use it to converge faster ("comps are fast now, settle and move on") rather than explore wider. This is another misuse of the asymmetry — it spends the "generation got cheap" dividend on accelerating old habits, not realizing the dividend truly unlocks "the wide-range exploration you never dared before." Used right, generation's cheapness does not let you finish faster; it gives your judgment an unprecedentedly large library of material.
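
The claim that wider divergence raises the odds of a hit can be made concrete under one strong assumption: that each direction is an independent try with the same chance p of being on-target. Then the chance that at least one of n directions hits is 1 - (1 - p)^n. The sketch below computes it; the function name and the 10% figure in the usage note are illustrative assumptions, not measurements.

```python
def p_at_least_one_hit(p_single: float, n_directions: int) -> float:
    """Chance that at least one of n directions is on-target.

    Assumes the directions are independent and share the same
    per-direction hit probability, which is a simplification.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_directions < 0:
        raise ValueError("n_directions must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_directions
```

With an assumed 10% chance per direction, one direction gives a 10% chance of a hit and ten directions give roughly 65%. The independence assumption is where "genuinely different" matters: ten near-duplicates of one idea behave much more like a single try, so the gain comes from spread, not from count.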

受力警告 · slop
Force warning · slop

生成的默认产物是 slop:收敛到见得最多的样子。它看起来完成了,却谁都不像、谁也不为。slop 不是做得差,是没把判断放进去——避开它的唯一办法,是把人的品味放回环里。

Generation's default output is slop: it converges on what it has seen most. It looks finished, yet resembles no one and is for no one. Slop is not bad craft; it is judgment left out. The only way around it is to put human taste back in the loop.

DSN 01·5
WHEN · 为什么是现在
WHY NOW
证据 · 时机
Evidence · Timing

这套不对称不是预测,是已经发生的事

This asymmetry is not a forecast but something already happening

"生成变富、品味变稀"听起来像对未来的押注,但它已经在过去十八个月里以可观察的方式发生了。把它当成已发生的事实而非预言,才能据此重画流程——而不是"等技术成熟再说"。这一节给出可被推翻的具体信号:如果这些信号不成立,整卷的前提就该被质疑。

"Generation gets cheap, taste gets scarce" sounds like a bet on the future, but it has already happened in observable ways over the last eighteen months. Treating it as an accomplished fact rather than a prophecy is what lets you redraw the process now — rather than "waiting until the tech matures." This sheet gives concrete, refutable signals: if these signals do not hold, the whole volume's premise deserves to be questioned.

信号一:成本侧已经塌了。能从一句描述出一整套带状态、响应式、可交付代码的界面,已是 2024–2025 年间多个生成式 UI 工具与 agentic-coding 实践的常规能力,而不是演示。出一版界面从"几天人时"压到"一次提示 + 几分钟",这是横轴上一到两个数量级的位移〔源:Anthropic 2025 agentic-coding 实践、多家生成式前端工具公开能力,证据级 Ⅳ 一手从业者〕[R1]

信号二:判断侧没跟着塌。同期,"哪一版对路、是否为这群人、有没有越过那条品味的线"并没有变得更可自动化——把同一个需求丢给模型十次,你会得到十个都"做得对"却需要人来挑的结果。生成解决了"做得出来",没解决"该是哪一个"。

信号三:slop 成了可观察的公共现象。"一眼看出是 AI 做的"从一个模糊感受,变成了能被具体指纹(青配深底、紫蓝渐变、玻璃拟态、Inter 居中)描述、甚至能被检测的现象——这本身就是"生成默认收敛到均值"的直接证据。

Signal one: the cost side has already collapsed. Producing a whole interface (with states, responsive, as deliverable code) from a one-line description is by now a routine capability across several generative-UI tools and agentic-coding practices of 2024–2025, not a demo. Compressing "produce one interface version" from "person-days" to "a prompt plus a few minutes" is a one-to-two order-of-magnitude shift along the horizontal axis 〔Source: Anthropic 2025 agentic-coding practice, public capabilities of several generative front-end tools, grade Ⅳ practitioner〕[R1].

Signal two: the judgment side has not collapsed with it. Over the same period, "which version is on-target, is it for these people, has it crossed the line of taste" has not become more automatable: give the same brief to a model ten times and you get ten results that are all "done right" yet need a human to pick among them. Generation solved "can be made," not "which one it should be."

Signal three: slop has become an observable public phenomenon. "You can tell at a glance it was AI-made" has gone from a vague feeling to a phenomenon describable by concrete fingerprints (cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter), and even detectable, which is itself direct evidence that "generation defaults to converging on the mean."
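
To make "describable by fingerprints, even detectable" concrete, here is a deliberately crude sketch that scans CSS text for two of the fingerprints named above. The patterns are illustrative guesses and not a validated detector; the `CHECKS` table and the `fingerprints` function are hypothetical names introduced for this example.

```python
import re

# Hypothetical heuristics: each fingerprint fires only if all of its
# patterns appear somewhere in the stylesheet. This is a whole-file scan,
# so it cannot tell whether two properties sit in the same rule.
CHECKS = {
    "glassmorphism": [r"backdrop-filter\s*:\s*blur\("],
    "centered-inter": [
        r"font-family\s*:[^;]*\bInter\b",
        r"text-align\s*:\s*center",
    ],
}


def fingerprints(css: str) -> list[str]:
    """Return the sorted names of fingerprints whose patterns all match."""
    return sorted(
        name
        for name, patterns in CHECKS.items()
        if all(re.search(p, css, re.IGNORECASE) for p in patterns)
    )
```

A scan like this only says that a candidate resembles the common default. Deciding whether that resemblance is a problem for a given brand and audience remains a human call.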

FIG. D2.1 / THE SCISSORS · 一塌一平,张开的剪刀差就是瓶颈 · one collapses, one stays flat; the opening gap is the bottleneck
看懂:过去十八个月两条曲线以不同速率移动——生成成本塌了一到两个数量级,判断成本近乎水平,中间的剪刀差正是品味成为瓶颈的全部原因
Read: over eighteen months the two curves moved at different rates: generation cost fell one-to-two orders of magnitude, judgment cost stayed near-flat; the scissors gap between them is the whole reason taste becomes the bottleneck

[图 Figure]
纵轴 / Vertical axis: 每单位成本 / cost per unit
横轴 / Horizontal axis: 时间 →(模型能力沿曲线变强) / time → (model capability rising along the curve)
橙线 / Orange line: 生成的边际成本,塌 1–2 个数量级 / marginal cost of generation, down 1–2 orders of magnitude
蓝线 / Blue line: 判断的成本(哪个好 · 为谁),近乎水平,不随模型变便宜 / cost of judgment (which · for whom), near-flat, does not get cheaper with the model
剪刀差 = 瓶颈 / scissors = bottleneck
观测,非预测:Anthropic 2025 agentic-coding · Karpathy 2025,证据级 Ⅳ
Observed, not forecast: Anthropic 2025 agentic-coding · Karpathy 2025, grade Ⅳ
这张图唯一要让人记住的,是两条线的斜率不同,而不是任何具体数字。橙线塌、蓝线平,不是因为判断"更难做",而是因为它问的是另一类问题——不是"能不能做出来",是"该不该是这样、为谁"——这类问题不随模型变强而变便宜。瓶颈因此不会随下一代模型消失,它会随每一次成本下塌而更突出[R1][R2]
The one thing to take from this chart is the difference in slope, not any specific number. The orange line collapses and the blue one stays flat — not because judgment "got harder," but because it asks a different kind of question — not "can it be built" but "should it be this, for whom" — a kind that does not get cheaper as the model improves. So the bottleneck will not vanish with the next model; it gets more prominent with every collapse in cost. [R1][R2]

"已经发生"意味着等待是有成本的

"Already happening" means waiting has a cost

把不对称当成已发生的事实,而非未来的预言,有一个直接的行动含义:等待不是中性的,它有持续累积的成本。如果这只是一个对未来的押注,"再观望一两年"是理性的;但既然成本侧已经塌、判断侧已经成为瓶颈、slop 已经是可观察的公共现象,那么每多用一段时间的旧流程,就是在用更高的单位成本做本可以更便宜的产出,同时把本该投到判断侧的注意力继续锁在产出侧。更隐蔽的成本是判断能力的萎缩:如果一个团队迟迟不把工作重心移到判断上,它的设计师就一直没有机会训练"写有判别力的规格、说清为什么这版对路"这些新瓶颈所需的能力——而当不得不迁移时,会发现这些能力不是开个会就能补上的。所以"现在"不是一个营销话术里的紧迫感,而是这条不对称的真实推论:既然变化已经发生,重画流程的最佳时机就是现在,成本最低的时机也是现在。

Treating the asymmetry as an accomplished fact rather than a future prophecy has a direct action implication: waiting is not neutral; it carries a steadily accumulating cost. If this were merely a bet on the future, "watch and wait another year or two" would be rational; but since the cost side has already collapsed, the judgment side has already become the bottleneck, and slop is already an observable public phenomenon, every additional stretch of running the old process means doing, at a higher unit cost, production that could be cheaper, while continuing to lock on the production side the attention that should go to judgment. The more insidious cost is the atrophy of judgment capability: if a team is slow to shift its center of work to judgment, its designers never get the chance to train the abilities the new bottleneck demands — writing discriminating specs, stating why a version is on-target — and when migration becomes unavoidable, they find these are not abilities you patch in with a meeting. So "now" is not a marketing urgency but a real corollary of this asymmetry: since the change has already happened, the best time to redraw the process is now, and the cheapest time is also now.

需要给"现在就动"加一条诚实的限定,以免它被读成盲目的紧迫感:动,指的是开始把工作重心往判断侧迁移、开始建可机检的护栏、开始跑一遍闭环,而不是"立刻把所有现有工具换成最新的 AI 设计工具"。工具会快速迭代、会有赢家输家,押注某个具体工具是有风险的;但押注"产物的可文本化、判断重于产出、护栏前置"这几条结构性方向几乎没有风险,因为它们不依赖任何具体工具的存亡——无论最后哪个工具胜出,这些方向都成立。所以"现在就动"的正确解读是:在结构层面立刻开始迁移(这是安全且划算的),在工具层面保持敏捷、不过早 all-in 某一个(这是审慎的)。把这两件事分开,就既不会因为观望而持续付迁移成本,也不会因为押错工具而被套牢。这正是把不对称当成"已发生的结构性事实"、而非"某个产品的营销叙事"应有的清醒。

"Move now" needs an honest qualifier, lest it read as blind urgency: moving means beginning to shift the center of work to the judgment side, beginning to build machine-checkable guardrails, beginning to run the loop once — not "immediately replacing every existing tool with the newest AI design tool." Tools will iterate fast and have winners and losers; betting on any one tool is risky. But betting on the structural directions of "the artifact's text-expressibility, judgment over production, guardrails up front" carries almost no risk, because they do not depend on any one tool's survival — whichever tool wins in the end, these directions hold. So the correct reading of "move now" is: at the structural level, start migrating immediately (safe and worthwhile), and at the tool level, stay agile and do not go all-in on one too early (prudent). Keep these two apart and you neither keep paying the migration cost by waiting nor get locked in by betting on the wrong tool. This is the clear-headedness due to treating the asymmetry as "an accomplished structural fact" rather than "some product's marketing narrative."

最后要给这套"为什么是现在"加一层必要的克制,避免它滑成技术决定论:成本侧塌了、判断成了瓶颈,这些是真的;但"所以一切都该立刻 AI 化"并不跟着成立。有些设计场景的判断密度极高、可机检的部分极少(比如一个承载强烈情感的品牌重塑、一个高度依赖特定文化语境的视觉系统),在这些场景里,生成能帮的忙本来就有限,硬上 AI 流程反而可能添乱。承认不对称已发生,不等于承认它在每个角落都同等强烈。诚实的姿态是:把这套方法当成一张受力图,告诉你瓶颈在往哪搬、杠杆点在哪,但具体到某个场景该投入多少、哪些环节真能受益,仍需你自己判断——而这个"判断该不该上、上到什么程度"的元判断,本身就是这套方法最看重的那种判断。一套好方法不该要求你信仰它,而该给你一副看清受力的眼睛,连"它在这里适不适用"也交给你判断。这正是它区别于营销叙事的地方:营销要你全盘照单,方法只给你看清结构的工具。

Finally, this "why now" needs a layer of necessary restraint, lest it slide into technological determinism: the cost side has collapsed and judgment has become the bottleneck — these are real; but "so everything should be AI-ified immediately" does not follow. Some design scenarios have extremely high judgment density and very little machine-checkable surface (a brand rebrand carrying intense emotion, a visual system deeply dependent on a specific cultural context), and in these, what generation can help with is inherently limited; forcing an AI process may add noise instead. Granting the asymmetry has happened is not granting it is equally strong in every corner. The honest posture is to treat this method as a force diagram that tells you where the bottleneck is moving and where the leverage points are, while how much to invest in a given scenario and which steps truly benefit still needs your own judgment — and this meta-judgment of "whether and how far to adopt" is itself the kind of judgment this method values most. A good method should not demand you believe in it but give you eyes to see the force clearly, leaving even "does it apply here" to your judgment. This is exactly what distinguishes it from a marketing narrative: marketing wants you to take the whole package; a method only hands you the tools to see the structure clearly.

证伪条件
Falsification condition

这一卷的前提会被推翻,如果:(a) 模型在没有人类规格的情况下,开始稳定产出"不是均值、且确实为某群人对路"的设计——那说明品味已被自动化,②不再退守;或 (b) 生成的成本并未实质下降、出稿仍是团队瓶颈——那说明①充裕尚未到来。只要这两条都不成立(目前都不成立),不对称就是真实受力,而非修辞。

This volume's premise is refuted if: (a) models begin, without human specs, to reliably produce designs that are "not the mean and genuinely on-target for some group," which would mean taste is automated and step ② no longer leaves judgment with people; or (b) generation's cost has not materially dropped and producing comps is still the team bottleneck, which would mean the abundance of ① has not arrived. As long as neither holds (and neither does today), the asymmetry is a real force, not rhetoric.

DSN
02
DESIGN-AS-CODE · 设计即代码
DESIGN-AS-CODE
重画 · 原理
Redraw · Principle

设计即代码——为什么新工具都在去画布化

Design-as-code — why the new tools de-canvas

为什么 pencil / paper、Remotion(用 React 写视频)、Hyperframes / html-video 这类工具价值被放大?因为它们把设计产物从不可读的二进制画布变成代码 / 纯文本。一旦产物是代码,设计就拿到了工程同款杠杆。

Why is the value of tools like pencil / paper, Remotion (video written in React), and Hyperframes / html-video being amplified? Because they turn the design artifact from an opaque binary canvas into code / plain text. Once the artifact is code, design gets the same leverage as engineering.

这正是工程部分那五条贯穿原理,作用在设计这个面上。真正起作用的是"产物变代码":它让设计满足了同样对 agent 友好的属性——凡满足的都被放大,凡是锁在私有二进制里的都被边缘化:

These are the engineering part's five through-lines at work on the design surface. What actually does the work is "artifact becomes code": it makes design satisfy the same agent-friendly properties. Whatever satisfies them gets amplified; whatever is locked in proprietary binaries gets marginalized:

"去画布化"不是审美口号,是把产物挪到 agent 够得着的形态

"De-canvasing" is not an aesthetic slogan; it moves the artifact into reach of agents

画布工具(Figma、Sketch、PSD)把设计状态存成私有二进制:图层树、约束、矢量路径都锁在格式里,只有那个软件读得懂。人能用眼睛看,但 agent 进不去——它既不能可靠地读出"这个按钮用了哪个 token",也无法把一次修改表达成可评审的文本差异。新一代工具(pencil/paper 用代码描述图形、Remotion 用 React 描述视频、html-video 直接用网页技术出动效)共同的动作不是"长得更现代",而是把同一份设计重新表达为纯文本。一旦如此,设计就掉进了软件工程三十年积累的全部基础设施里:git、diff、code review、CI、自动化生成。这不是工具竞赛,是产物形态的相变〔源:本系列工程卷"五条贯穿原理"与 design-as-code 实践,证据级 Ⅳ〕[R3]

Canvas tools (Figma, Sketch, PSD) store design state as a proprietary binary: the layer tree, constraints, and vector paths are locked in a format only that software understands. A human can look with their eyes, but an agent cannot get in — it can neither reliably read out "which token this button uses" nor express a change as a reviewable text diff. The new generation of tools (pencil/paper describing graphics as code, Remotion describing video as React, html-video doing motion straight in web tech) share a move that is not "looking more modern" but re-expressing the same design as plain text. Once that happens, design falls into the entire infrastructure software engineering has accumulated for thirty years: git, diff, code review, CI, automated generation. This is not a tool race but a phase change in the form of the artifact 〔Source: this series' engineering volume "five through-lines" and design-as-code practice, grade Ⅳ〕[R3].
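To make "express a change as a reviewable text diff" concrete, here is a minimal sketch using Python's standard `difflib`. The token names (`button.primary.bg`, `button.radius`) and the file name `tokens.json` are illustrative assumptions, not part of any real design system; the point is only that once the design source is text, a one-value change shows up as a one-line diff.

```python
import difflib
import json


def token_diff(before: dict, after: dict, name: str = "tokens.json") -> str:
    """Return a unified diff between two token dictionaries."""
    # Sorted keys and fixed indentation keep the diff limited to real changes.
    old = json.dumps(before, indent=2, sort_keys=True).splitlines()
    new = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(old, new, f"a/{name}", f"b/{name}", lineterm="")
    )


# Hypothetical tokens: only the button radius changes between the two versions.
before = {"button.primary.bg": "#1a73e8", "button.radius": "4px"}
after = {"button.primary.bg": "#1a73e8", "button.radius": "8px"}

diff_text = token_diff(before, after)
print(diff_text)
```

The unchanged color token appears only as context, while the radius change is the single removed/added pair. That is the unit a reviewer, human or agent, can approve, reject, or revert.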

把"产物变代码"带来的协作纪律说具体些,会更有说服力。当设计是 Figma 文件,团队协作靠的是"在文件里留评论、约会议口头对齐、谁是这个文件的 owner"这套社交协议;它脆弱、不可追溯、改动一多就乱。当设计是代码,协作立刻继承工程那套已经被验证了几十年的纪律:每次改动是一个 commit,带作者、时间、说明;要并入主干得先过 review;冲突有明确的合并规则;出了问题能精确回滚到任一历史版本;谁改了哪一行一目了然。这套纪律不是凭空多出来的福利,而是"产物是文本"这一形态自动带来的——文本能 diff,能 diff 就能 review,能 review 就能协作而不互相覆盖。这对 AI-Native 尤其关键,因为当 agent 开始大量并行地改设计,没有这套纪律,多个 agent 的产出会立刻互相打架、无人能审。代码形态等于给"人 + 多个 agent 一起改同一份设计"提供了一套现成的、可扩展的协作底座——这是画布工具结构上给不了的。

The collaboration discipline that "artifact becomes code" brings is more persuasive when made concrete. When design is a Figma file, team collaboration rests on a social protocol: leaving comments in the file, aligning verbally in meetings, agreeing on "who owns this file." That protocol is fragile, untraceable, and chaotic once changes pile up. When design is code, collaboration immediately inherits an engineering discipline validated over decades: each change is a commit with author, time, and message; merging to the trunk requires passing review; conflicts have explicit merge rules; a problem can be rolled back precisely to any historical version; who changed which line is plain to see. This discipline is not a windfall conjured from nowhere; it comes automatically with the "artifact is text" form: text can be diffed, a diff can be reviewed, and review lets people collaborate without overwriting each other. This matters especially for AI-Native work, because once agents begin changing design en masse and in parallel, without this discipline the outputs of multiple agents would instantly conflict and no one could review them. The code form thus hands "humans + multiple agents changing one design together" a ready-made, scalable collaboration substrate, something canvas tools structurally cannot give.

这里要避免一个把"产物变代码"读偏的方向:它不是要设计师都去学写代码、都变成前端工程师。"产物是代码"指的是产物的表达形态是文本,而不是要求每个设计师手敲那段文本——恰恰相反,手敲那段文本的活,正是该交给生成的部分。设计师需要的不是写代码的能力,而是读得懂、判断得了那段代码所描述的设计好不好的能力,以及把意图说清楚到能指导生成的能力。换句话说,代码形态对设计师的要求不是"会写",而是"会判断 + 会指导"——这恰好又落回了这一卷的主线:人退到判断与方向,执行(包括把设计写成代码)交给生成。所以"设计即代码"和"设计师要回到品味与意义"不矛盾,反而是同一件事的两面:正因为产物变成了 agent 能读写的代码,生成才能接管执行,设计师才能腾出手来只做判断。把这条理清楚,就不会因为"我不会写代码"而误以为自己被排除在 AI-Native 设计之外——你需要的从来不是写,是判断。

Here a misreading of "artifact becomes code" must be avoided: it does not mean designers must all learn to write code and become front-end engineers. "The artifact is code" refers to the artifact's expression form being text, not a demand that every designer hand-type that text — quite the opposite, hand-typing that text is exactly the part that should go to generation. What a designer needs is not the ability to write code but the ability to read and judge whether the design that code describes is good, plus the ability to state intent clearly enough to steer generation. In other words, the code form asks of designers not "can write" but "can judge + can steer" — which again lands on this volume's through-line: people retreat to judgment and direction, execution (including writing the design as code) goes to generation. So "design as code" and "designers return to taste and meaning" do not conflict but are two faces of one thing: precisely because the artifact becomes code an agent can read and write, generation can take over execution, and the designer is freed to do only judgment. Get this straight and you will not mistake "I can't write code" for being excluded from AI-Native design — what you need has never been to write, but to judge.

FIG. D2.0 / ARTIFACT BECOMES CODE · 产物变代码 → 四属性
看懂:产物从二进制画布变文本,一次性获得哪四个杠杆
Read: artifact goes binary → text and gains which four levers at once
二进制画布 · 被边缘化 / BINARY CANVAS · sidelined:Figma / PSD / 私有画布 (proprietary);agent 只能截图猜,改动是不可读的 diff。Agents can only screenshot and guess; diffs are opaque.
变文本 → text:代码 / 纯文本 / CODE / PLAIN TEXT(HTML / JSX / tokens.json)
  • 可读 Legible:agent 直接读写设计源,不必解析私有画布 / agents read and write the source, no proprietary canvas to parse
  • 可 diff / 可版本 Diffable:一改动 = 一提交,可评审、可回滚 / a change = a commit, reviewable, revertible
  • 可生成 / 组合 Generatable:批量铺变体,人只判断与导向 / spin up variants, humans only judge and steer
  • 可验证 Verifiable:token / 约束机器检"离牌",品味的护栏可机检 → 接 DSN 06/08 / lint for off-brand, taste's guardrails become checkable → DSN 06/08
四个属性不是工具的功能,而是"产物=文本"这一形态的副产物。这正是工程卷那条放大律——满足 agent 友好属性的被放大,锁在私有二进制里的被边缘化——落到设计面上的样子。最后一条(可验证)是品味护栏可机检的入口,下一张图与 DSN 06/08 接上。
The four properties are not tool features but byproducts of the "artifact = text" form. This is exactly the engineering volume's amplification law — what meets agent-friendly properties gets amplified, what is locked in proprietary binary gets sidelined — landing on the design surface. The last property (verifiable) is the entry point where taste's guardrails become machine-checkable; it connects to DSN 06/08.

"图形即代码"不是新发明,是一条早就存在、如今被 AI 引爆的暗线

"Graphics as code" is no new invention but a long-standing undercurrent now detonated by AI

值得提醒一句历史:把视觉产物表达为文本,从来不是新事。SVG 用 XML 描述矢量图、CSS 用声明式规则描述样式、LaTeX 用标记描述排版、PostScript 用程序描述页面——几十年里,"图形即代码"一直作为一条暗线存在,只是因为人手写它太慢,所以大多数设计仍然回到所见即所得的画布。AI 改变的不是这条暗线本身,而是它的经济性:当生成能以近零成本写出、读懂、改动这些文本表达,"人手写太慢"这个唯一的拦路虎消失了。于是这条一直存在的暗线被引爆——代码形态从"理论上更优、实践上太累"变成"理论实践双优"。这解释了为什么这一波不是又一次工具迭代,而是一次形态的相变:不是发明了新东西,是让一直更优的那个形态终于变得可行。从业者该读出的信号是:押注产物的可文本化程度,而不是押注某个具体工具的功能。

A note of history is worth making: expressing visual artifacts as text is not new at all. SVG describes vectors in XML, CSS describes styling in declarative rules, LaTeX describes typesetting in markup, PostScript describes pages as a program — for decades "graphics as code" has existed as an undercurrent, held back only because writing it by hand is too slow, so most design still returns to the WYSIWYG canvas. What AI changes is not the undercurrent itself but its economics: once generation can write, read, and edit these text expressions at near-zero cost, the one obstacle — "too slow to hand-write" — disappears. So the long-present undercurrent is detonated: the code form goes from "theoretically superior, practically exhausting" to "superior in both theory and practice." This explains why this wave is not yet another tool iteration but a phase change in form: nothing new was invented; the form that was always superior finally became feasible. The signal a practitioner should read: bet on how text-expressible the artifact is, not on the features of any one tool.

同构 / 深潜
Isomorphism / dive

设计系统 ↔ 架构的"结构对 agent 可读"是同一招——都是让海量生成连贯的护栏。见架构篇 ↗。

The design system ↔ the Architecture chapter's "structure legible to agents" is the same move: both are guardrails that keep mass generation coherent. See the Architecture chapter ↗.

DSN
03
REDRAW · 从造物到判物
MAKING → JUDGING
重画 · 流程
Redraw · Process

从打磨一稿到判断多稿

From polishing one to judging many

设计师的动作从"打磨一稿",转向"生成多稿 + 判断、筛选、导向"。会做不再值钱,会挑、会指方向才值钱。品味成了瓶颈——一如工程里的验证,只是对象换成体验与美。

The designer's move shifts from "polish one comp" to "generate many, then judge, curate, steer." Being able to make is no longer the prize; being able to pick and to point the direction is. Taste becomes the bottleneck — like verification in engineering, only the object is experience and beauty.

重画:把动作前移到判断——让生成铺开候选,设计师做三件机器做不了的事:挑(哪版对路)、评(为什么好、差在哪)、导(往哪个方向再生成)。这与工程的 trust-but-verify 同构:生成可信但要验,只是这里验的不是正确性,是品味与体验。

Redraw: move the action upstream to judgment — let generation spread candidates, and have the designer do the three things machines cannot: pick (which is on-target), critique (why it is good, where it falls short), steer (which direction to regenerate). Isomorphic to engineering's trust-but-verify: generation is trusted but verified, only here what is verified is taste and experience, not correctness.

人的节点没消失,它从产出链的末端搬到了前端

The human node did not vanish; it moved from the end of the chain to the front

旧流程里,设计师的手贯穿全程:从空白画布起,一笔一笔把脑中那一稿做出来,价值集中在执行的精度。新流程里,执行交给生成,人的手退出"做"这一段,集中到两端——前端的意图与规格(要往哪生成、何为好),与中段的判断与导向(挑哪版、为什么、再往哪走)。这不是把设计师降级成"按按钮的人",恰恰相反:会做的人很多,会判断的人稀缺。当一个动作能被生成承包,它就从人的核心价值里退场;留给人的,永远是当下还不能被自动化的那个判断节点。设计师的稀缺性因此上移了一层——从手上功夫,移到眼力与方向感。

In the old process the designer's hand ran the whole length: from a blank canvas, stroke by stroke, realizing the comp in their head, with value concentrated in execution precision. In the new process execution goes to generation; the human hand exits the "making" segment and concentrates at the two ends — the front-end intent and spec (which way to generate, what is good) and the mid-stream judgment and steering (which version, why, where next). This is not a demotion of the designer to "the person who presses buttons"; quite the opposite: many people can make, few can judge. When an action can be subcontracted to generation, it exits the human's core value; what remains for people is always the judgment node that cannot yet be automated. The designer's scarcity therefore moves up a layer — from hand-skill to eye and a sense of direction.

FIG. D3.0 / MAKING → JUDGING · 人的节点上移
看懂:设计师的手从"做"退出,落到"规格"与"判断"两处
Read: the designer's hand exits "making," lands on "spec" and "judgment"
  • 旧 · 手贯穿全程 / OLD · hand runs the whole length:空白画布 Blank canvas → 一笔一笔手工执行(人时在此;价值 = 执行精度)Stroke-by-stroke hand-execution (human hours sit here; value = execution precision) → 一稿 One comp
  • 新 · 手退到两端 / NEW · hand retreats to the two ends:① 规格 Spec(人 · 意图 / human · intent)→ ② 生成铺开多稿 Generate many(机器 · 近零成本 / machine · near-free)→ ③ 挑 · 评 · 导 Pick · critique · steer(人 · 品味瓶颈在此 / human · the taste bottleneck)→ 导向后再生成 steer → regenerate → 定稿(对路那版)Ship the on-target one
两条泳道同样从左到右,但人的红框位置变了:旧泳道里人占着中段的"执行";新泳道里执行交给机器(蓝),人退到①规格与③判断两个红框。会做不再值钱,会判断才值钱——这就是"瓶颈搬家"在个人动作层面的样子。
Both lanes run left to right, but the human's red box has moved: in the old lane people occupied the mid-stream "execution"; in the new lane execution goes to the machine (blue) and people retreat to the two red boxes, ① spec and ③ judgment. Being able to make stops being the prize; being able to judge becomes it — this is "the bottleneck moves" at the level of an individual's actions.

为什么"判断多稿"比"打磨一稿"更难,而不是更省事

Why "judging many" is harder than "polishing one," not easier

一个常见的误解是把"判断多稿"想成轻松活——反正不用自己做,扫一眼挑个顺眼的就行。恰恰相反:判断是比执行更高阶的认知活动,做好它比打磨一稿更难。打磨一稿时,你有一个明确的目标形象在脑中,剩下的是手上功夫;判断多稿时,你面对的是一组都"看起来不错"的候选,必须在它们之间做出有理由的取舍——这要求你先把判据想清楚(否则你只是在凭感觉挑),再逐个对照判据评估,还要能从落选项里读出"该往哪个方向再生成"。这是一种需要刻意训练的能力,不是会做设计就自动会的。这也解释了为什么"生成让设计变简单了"是个危险的错觉:它让产出变简单了,却让真正决定成败的那部分——判断——变得更密集、更吃功力。把判断当轻松活的团队,会发现自己只是更快地在一堆 slop 里挑出一个略好的 slop。

A common misreading frames "judging many" as the easy job — you don't have to make it yourself, just glance and pick the nice-looking one. Quite the opposite: judgment is a higher-order cognitive activity than execution, and doing it well is harder than polishing one comp. When polishing one, you hold a clear target image in your head and the rest is hand-skill; when judging many, you face a set of candidates that all "look fine" and must make a reasoned trade-off among them — which requires you to first think the criteria through (or you are merely picking by feel), then evaluate each against the criteria, and also read from the rejects "which direction to regenerate." This is a capability that needs deliberate training, not one that comes automatically with knowing how to design. It also explains why "generation made design simpler" is a dangerous illusion: it made production simpler while making the part that truly decides success — judgment — denser and more demanding of skill. A team that treats judgment as the easy job will find it has only picked, faster, a slightly better slop out of a pile of slop.

这里也藏着一个对新人不友好、却必须直面的事实:判断力很难靠"看"速成,它主要靠"做过、并且复盘过为什么"积累。过去一个设计师的成长路径是清晰的——大量地做,在做的过程中手感和判断一起长。现在执行被生成接管,新人失去了"在做中练判断"的那条天然路径,却又被直接推到了"判断多稿"这个更高阶的任务前。这是 AI-Native 设计一个真实的、还没有好答案的难题:如果不再需要新人做大量执行,他们的判断力从哪里来?目前能看到的部分答案是——把判断本身变成可训练的对象:让新人反复做"对照规格评判候选、说清判据命中/落空"的练习,让资深设计师把自己的判断显式地讲出来供学习(这又回到了 DSN 03·5 的"外化品味")。但这件事值得诚实承认它的难度,而不是假装"反正有 AI,新人也能上手"。把判断当轻松活的代价,会先落在新人身上,再落在整个团队的判断力储备上。

Hidden here too is a fact unfriendly to juniors yet that must be faced squarely: judgment is hard to fast-track by "watching"; it accumulates mainly through "having done it, and having reviewed why." A designer's growth path used to be clear — make a great deal, and feel and judgment grow together in the making. Now that execution is taken over by generation, juniors lose the natural path of "training judgment by making," yet are pushed straight to the higher-order task of "judging many." This is a real, not-yet-well-answered difficulty of AI-Native design: if juniors no longer need to do large amounts of execution, where does their judgment come from? The partial answer visible so far is to make judgment itself a trainable object: have juniors repeatedly practice "judging candidates against a spec, stating criteria hits and misses," and have seniors explicitly articulate their judgment for learning (which returns to DSN 03·5's "externalizing taste"). But this deserves an honest admission of its difficulty rather than pretending "with AI around, juniors can just jump in." The cost of treating judgment as the easy job falls first on juniors, then on the whole team's reserve of judgment.

最后值得点明"挑、评、导"三者中最被低估的是导——把评判转成下一轮的具体方向。挑(选出对路那版)和评(说清为什么)已经被很多人意识到重要,但"导"常被忽略:它要求你不仅判断现状,还要从落选的候选里读出"信息"——这一批整体偏冷说明气质方向要调暖,这一版的某个局部对了说明值得在那个方向上深化。导是一种把判断转化为生成指令的能力,它直接决定了下一轮铺开是更聚焦还是又一次发散。一个只会挑评、不会导的人,环会卡在"挑出一个还行的,但不知道怎么让下一轮更好"——这正是很多人感觉"用 AI 做设计到某个点就上不去了"的根因。导的能力把判断接回了生成,让闭环真正转起来。三者合一,挑评导才构成完整的判断动作;缺了导,判断就只是静态的评价,无法驱动迭代。这也是为什么 DSN 07 那条环里,④导向是连接"人的判断"与"机器的下一轮生成"的那根轴。

Finally, worth naming: of "pick, critique, steer," the most underrated is steer — turning critique into the next round's concrete direction. Picking (choosing the on-target one) and critiquing (saying why) are already recognized as important by many, but steering is often overlooked: it requires you not only to judge the present but to read "information" out of the rejected candidates — this batch running cold overall says the character direction should warm up; one local part of this version being right says it is worth deepening in that direction. Steer is the ability to convert judgment into a generation instruction, and it directly decides whether the next spread is more focused or yet another divergence. Someone who can only pick and critique but not steer gets the loop stuck at "picked a decent one but don't know how to make the next round better" — which is the root cause of many people feeling "AI design plateaus at some point." The ability to steer reconnects judgment to generation and makes the closed loop actually turn. Together, pick-critique-steer form the complete judgment act; without steer, judgment is only static evaluation that cannot drive iteration. This is also why, in the DSN 07 loop, ④ steer is the axle connecting "human judgment" to "the machine's next round of generation."
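The step from critique to steering can be sketched as a small data transformation. This is an illustration under stated assumptions, not a prescribed workflow: the `Critique` record, the miss/hit labels, and the "more than half the batch" threshold are all hypothetical. It shows the two readings the paragraph describes: a miss shared across the batch becomes a direction to adjust, and a local hit in any candidate becomes something to keep and deepen.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Critique:
    """One reviewer note per candidate (hypothetical structure)."""
    candidate: str
    misses: list = field(default_factory=list)  # criteria the candidate failed
    hits: list = field(default_factory=list)    # parts that were right


def steer(critiques: list, batch_threshold: float = 0.5) -> dict:
    """Turn a batch of critiques into a direction for the next round."""
    n = len(critiques)
    # Count each miss once per candidate so one verbose critique cannot dominate.
    miss_counts = Counter(m for c in critiques for m in set(c.misses))
    # A miss shared by most of the batch signals a direction problem, not a one-off.
    adjust = sorted(m for m, k in miss_counts.items() if k / n > batch_threshold)
    # Any local hit is worth carrying into the next round.
    keep = sorted({h for c in critiques for h in c.hits})
    return {"adjust": adjust, "keep": keep}


batch = [
    Critique("A", misses=["tone: too cold"]),
    Critique("B", misses=["tone: too cold", "density: too busy"]),
    Critique("C", misses=["tone: too cold"], hits=["header hierarchy"]),
]
direction = steer(batch)
print(direction)
```

The output is still only raw material for a brief. Deciding what "warmer" should mean for these users remains the designer's call.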

检验信号
Test signal

品味命中率上升——候选里一次选中"对路那版"的比例;以及设计师花在判断而非亲手产出上的时间占比上升。产出量本身不是指标。证伪:若设计师花在判断上的时间没升、命中率没升,只是出稿更快了,那就还停在旧流程,只是手更快——没真正发生"造物→判物"的迁移。

Taste hit-rate rises: how often the on-target version is picked first. The share of time spent judging rather than producing by hand rises as well. Output volume itself is not the metric. Falsified if: the share of time on judgment does not rise and hit-rate does not rise, and only comps come faster. Then you are still in the old process with a faster hand; the "making → judging" shift has not actually happened.
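The two signals above are simple ratios, so a team can track them with very little tooling. The sketch below assumes a hypothetical per-round log (`Round`) in which someone records whether the first pick turned out to be the on-target version and roughly how the time was split. How "on-target" gets confirmed is left to the team and is not defined here.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One generate-and-judge round (hypothetical log entry)."""
    first_pick_on_target: bool  # did the first pick turn out to be the on-target one?
    judging_minutes: float      # time spent picking, critiquing, steering
    making_minutes: float       # time spent producing by hand


def hit_rate(rounds: list) -> float:
    """Share of rounds where the first pick was the on-target version."""
    return sum(r.first_pick_on_target for r in rounds) / len(rounds)


def judging_share(rounds: list) -> float:
    """Share of total logged time that went to judgment rather than making."""
    judging = sum(r.judging_minutes for r in rounds)
    making = sum(r.making_minutes for r in rounds)
    return judging / (judging + making)


rounds = [
    Round(True, 20, 10),
    Round(False, 15, 25),
    Round(True, 30, 10),
    Round(True, 25, 15),
]
print(f"hit rate: {hit_rate(rounds):.2f}, judging share: {judging_share(rounds):.2f}")
```

Neither function counts how many comps were produced, which matches the note that output volume is not the metric. The falsification reading is a comparison across periods: if both numbers stay flat while comps simply arrive faster, the shift has not happened.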

同一道分叉,落在设计面上为什么不一样

The same fork lands differently on the design face

值得把这道分叉就地推一遍,而不是从工程卷搬结论。内核第二步说"判断会沿一条可核验性的梯度分叉":能被机器核验的判断交给机器,不能的留给人。关键在于"可核验"到底指什么——它指存在一道判定程序:给定产物,机器能独立地、可复现地判出对错。工程面之所以能把大半判断推给机器,正因为正确性多半有这道程序:测试要么通过要么失败,类型要么对要么错,benchmark 给出一个数。这是一道判定,不是一种感受。

It is worth deriving this fork in place rather than importing the conclusion from the Engineering volume. The kernel's second step says "judgment forks along a verifiability gradient": what a machine can verify goes to the machine, what it cannot stays with people. Everything turns on what "verifiable" means — it means a decision procedure exists: given an artifact, a machine can independently and reproducibly rule it right or wrong. Engineering can push most judgment to machines precisely because correctness usually has such a procedure: a test passes or fails, a type checks or does not, a benchmark returns a number. That is a verdict, not a feeling.

设计面上,被判断的对象是体验与美——而"好"在这里没有判定程序。"这一版是否对路、是否为这群人、有没有越过那条品味的线",不能被归约成一个会通过或失败的测试。可以测可用性、可以测转化、可以测对比度合规,但这些是"好"的代理,不是"好"本身;把代理当成判定,恰恰是 slop 与同质化的来路(DSN 09·7)。于是同一道分叉在设计面上落点不同:工程面上,判断的大头滑向机器那一侧;设计面上,承重的那部分判断——品味——因为没有可机检的判定程序而无法滑过去,结构性地停在人这一侧。这不是"暂时还没自动化",是"被判断的东西本身不是一道可判定命题"。品味因此不是诸多判断里碰巧难的一种,它是这道分叉在设计面上的构成性判断:取走它,分叉就没有了人那一端。〔证据级 Ⅴ 论证;可机检代理与"好"本身的区分,见 DSN 06 规格化的上限〕

On the design face, what gets judged is experience and beauty, and here "good" has no decision procedure. "Whether this version is on-target, whether it is for these people, whether it has crossed the line of taste" cannot be reduced to a test that passes or fails. You can measure usability, conversion, and contrast compliance, but these are proxies for "good," not "good" itself; mistaking the proxy for the verdict is exactly where slop and homogenization come from (DSN 09·7). So the same fork lands at a different place: on the engineering face the bulk of judgment slides to the machine side; on the design face the load-bearing judgment, taste, cannot slide across because it has no machine-checkable decision procedure, and it sits structurally on the human side. This is not "not yet automated"; it is "the thing being judged is not a decidable proposition in the first place." Taste is therefore not one merely-hard judgment among many; it is the constitutive judgment of this fork on the design face: take it away and the fork has no human end at all. 〔Grade Ⅴ, argument; for the distinction between machine-checkable proxies and "good" itself, see DSN 06 on the ceiling of formalization〕

FIG. D3.5 / THE VERIFIABILITY-GRADIENT FORK · 设计面
看懂:把设计判断按"有没有判定程序"铺成一条梯度,看内核第②步在哪一点叉开
Read: lay design judgments on a gradient by "is there a decision procedure," and see where the kernel's step ② forks
可核验性梯度 · 是否存在一道"对/错"判定程序 / VERIFIABILITY GRADIENT · is there a "right/wrong" decision procedure
从"判定程序存在 · 可机检"到"无判定程序 · 构成性品味" / from "procedure exists · machine-checkable" to "no procedure · constitutive taste":
  • 对比度比值 contrast ratio:≥ 4.5:1 ?
  • 间距刻度 spacing scale:∈ {4,8,12…} ? / on the 4-px grid ?
  • token 符合度 token conformance:在白名单内 ? / in whitelist ?
  • lint 规则 / 状态完备 lint rules / state coverage:规则可写尽 ? / rule expressible ?
  • 节奏 · 情绪 · 留白 pacing · emotion · whitespace:是一种感受 / a feeling, not a verdict
  • 它活着吗 · 是对的想法吗 is it alive · is it the RIGHT idea:构成性判断 / the constitutive judgment
内核第②步在此叉开 / kernel step ② forks here
滑向机器 · 交给生成与护栏 / SLIDES TO MACHINE · generation & guardrails:有判定程序的判断:机器能独立、可复现地判出对错。这一半被代码形态放大、近乎免费。代理指标(可用性 / 转化)也落这侧——但只是"好"的代理。Judgments with a decision procedure: a machine can rule it right/wrong, reproducibly. Amplified, near-free. Proxies (usability / conversion) sit here too, but only as proxies for "good."
结构性停在人 · 不可滑过去 / STAYS WITH PEOPLE · cannot slide across:没有判定程序的判断:取走它,这道叉就没有了人的那一端。这不是"暂时没自动化"。品味 = 这道叉在设计面上的构成性判断。Judgments with no decision procedure: remove it and the fork has no human end. Not "not yet automated." Taste = the constitutive judgment of this fork on the design face.
〔证据级 Ⅴ 论证〕〔Grade Ⅴ, argument〕
这条梯度不是"难易"排序,而是按"有没有一道可复现的对/错判定程序"排的。左端(对比度、间距刻度、token、lint)存在这道程序,所以能滑到机器那侧、被代码形态放大;越往右,程序越写不尽,到"它活着吗、是不是对的想法"这端干脆没有程序——它是一种感受,不是一道判定。内核第②步正是在这条梯度的某一点叉开:可机检的归机器,不可机检的结构性地停在人这侧。品味因此不是诸多判断里碰巧最难的一种,而是这道叉在设计面上的构成性判断——取走它,叉就没有了人那一端。
This gradient is not a ranking by difficulty but by "whether a reproducible right/wrong decision procedure exists." At the left end (contrast, spacing scale, tokens, lint) the procedure exists, so these slide to the machine side and get amplified by the code form; the further right, the less the procedure can be written down, until at "is it alive, is it the right idea" there is no procedure at all — it is a feeling, not a verdict. The kernel's step ② forks at a point on this gradient: the machine-checkable goes to the machine, the uncheckable sits structurally on the human side. Taste is therefore not the merely-hardest judgment among many but the constitutive judgment of this fork on the design face — take it away and the fork has no human end.
可机检 · 交给生成与护栏
Machine-checkable · hand to generation & guardrails
  • 对比度 / 触达尺寸 / 无障碍合规
  • Contrast / touch-target size / accessibility compliance
  • 是否符合 token 与组件规格
  • Conformance to token & component spec
  • 链路是否走得通(可用性硬错误)
  • Whether the flow works at all (hard usability errors)
无判定程序 · 结构性留给人
No decision procedure · structurally human
  • 这一版是否对路、是否"还像我们"
  • Whether this version is on-target, still "looks like us"
  • 节奏、情绪、何时留白、何时收住
  • Pacing, emotion, when to leave space, when to hold back
  • 越过"够好"那条线了没有——品味的判据
  • Whether it crossed the "good enough" line — the taste verdict
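The machine-checkable side of the list above can be sketched as a small guardrail check. The contrast computation follows the WCAG 2.x relative-luminance and contrast-ratio formulas; everything else is an assumption made for illustration: the flat `style` dictionary, the `ALLOWED` color whitelist, the 4-px grid, and the 4.5:1 threshold taken from the figure. Nothing in it can say whether a design is on-target; it only rules the decidable questions right or wrong.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: lighter luminance over darker, each offset by 0.05."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def check_style(style: dict, allowed_colors: set, grid_px: int = 4,
                min_contrast: float = 4.5) -> list:
    """Return human-readable violations; an empty list means every check passed."""
    violations = []
    for key in ("color", "background"):
        if style[key] not in allowed_colors:
            violations.append(f"{key} {style[key]} is not a whitelisted token value")
    ratio = contrast_ratio(style["color"], style["background"])
    if ratio < min_contrast:
        violations.append(f"contrast {ratio:.2f}:1 is below {min_contrast}:1")
    if style["padding_px"] % grid_px != 0:
        violations.append(f"padding {style['padding_px']}px is off the {grid_px}-px grid")
    return violations


# Hypothetical whitelist and two candidate styles.
ALLOWED = {"#000000", "#ffffff", "#1a73e8"}
on_spec = {"color": "#000000", "background": "#ffffff", "padding_px": 12}
off_spec = {"color": "#cccccc", "background": "#ffffff", "padding_px": 10}

print(check_style(on_spec, ALLOWED))
for violation in check_style(off_spec, ALLOWED):
    print(violation)
```

Each check returns a verdict that any machine can reproduce, which is why this half can run on every generated candidate before a person looks at it. The second list, the judgments with no decision procedure, has no equivalent function to write.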
DSN
03·5
TASTE · 品味是稀缺判断
TASTE AS SCARCE JUDGMENT
机理 · 拆解
Mechanism · Anatomy

"品味"不是玄学,它是一组可拆解的判断

"Taste" is not mysticism; it is a set of decomposable judgments

把品味当成天赋玄学,会得出"它无法被讨论、无法被教、无法被外化成规格"的错误结论——而那恰恰会让人放弃在 AI-Native 流程里最该做的事。品味其实是可拆解的:它是对"为谁而做"的共情、对"什么算好"的判别力、对"往哪走"的方向感三者的合成。拆开它,才能说清它为什么稀缺、为什么不可外包给生成。

Treating taste as innate mysticism leads to the wrong conclusion that "it cannot be discussed, taught, or externalized into a spec" — which is exactly what makes people give up the thing they should most be doing in an AI-Native process. Taste is in fact decomposable: it is the synthesis of empathy for "who it is made for," discrimination of "what counts as good," and a sense of direction for "where to go next." Decompose it and you can say clearly why it is scarce and why it cannot be outsourced to generation.

为什么品味是稀缺的判断,而非稀缺的技能?技能可以被生成承包——画一个渐变、排一版网格、调一组配色,这些"会做"的事,模型已经做得又快又好。但品味问的不是"会不会做",而是"该不该是这样":在无数个都"做得对"的候选里,认出哪一个对这群人是对的。这是一种判别,不是一种产出;它依赖对用户处境的理解、对品牌语境的记忆、对"过犹不及"那条线的敏感——全是无法仅从产物文本读出的东西。这正是 DSN 00 那条分诊问句把它判到④的原因:品味坐落在可验证性梯度的最远端,离"可机检"最远,所以也离"可自动化"最远。

Why is taste a scarce judgment, not a scarce skill? A skill can be subcontracted to generation — drawing a gradient, laying out a grid, tuning a palette; these "can-do" things a model already does fast and well. But taste asks not "can it be done" but "should it be this way": among countless candidates all "done right," recognizing which one is right for these people. That is a discrimination, not an output; it relies on understanding the user's situation, memory of the brand context, sensitivity to the line where "more becomes worse" — all things that cannot be read from the artifact text alone. This is exactly why DSN 00's triage question sorts it to ④: taste sits at the far end of the verifiability gradient, furthest from "machine-checkable," and therefore furthest from "automatable."

拆开看:共情是品味的根,没有它品味只是个人偏好

Unpacked: empathy is the root of taste; without it, taste is mere preference

三个组成里,共情是根,另外两个都长在它上面。判别力(什么算好)和方向感(往哪走)若脱离了"为谁",就退化成纯粹的个人偏好——"我觉得好看"。这正是 slop 与品味最容易被混淆的地方:一个设计师凭个人审美做的判断,和一个设计师为某群用户做的判断,外表都像"主观选择",但前者是偏好,后者是品味。区别就在有没有共情这个根——品味永远能回答"对谁、在什么处境下、为什么",偏好只能回答"我喜欢"。这也解释了为什么 AI 在共情这件事上帮不上根本的忙:它可以基于数据模拟"某类用户可能怎么想",提供有用的素材,但它无法真的在意那群人过得好不好——而品味的根,恰恰是这种在意。一个不在意用户的设计师,给他再强的工具,做出的也只是更精致的自我表达;一个真在意用户的设计师,哪怕工具简陋,也能做出"为他们而存在"的东西。这就是为什么这一卷反复回到"为人":它不是道德口号,它是品味之所以是品味、而非偏好的那个根。

Of the three components, empathy is the root, and the other two grow on it. Discrimination (what counts as good) and direction (where to go) degenerate into pure personal preference — "I think it looks nice" — once detached from "for whom." This is exactly where slop and taste are most easily confused: a judgment a designer makes from personal aesthetics and a judgment a designer makes for some group of users both look, on the surface, like "subjective choices," but the former is preference and the latter is taste. The difference lies in whether the root of empathy is present — taste can always answer "for whom, in what situation, why," preference can only answer "I like it." This also explains why AI cannot fundamentally help with empathy: it can simulate, from data, "how a class of users might think," offering useful material, but it cannot actually care whether those people are well served — and the root of taste is precisely that care. Give a designer who does not care about users the strongest tool and they make only more polished self-expression; a designer who truly cares about users can, even with crude tools, make something that "exists for them." This is why this volume keeps returning to "for people": not a moral slogan but the very root that makes taste taste rather than preference.

把品味拆成共情、判别、方向这三层,还有一个直接的实用价值:它让"如何培养品味"从一句空话变成可操作的训练。如果品味是不可拆的天赋玄学,那"培养品味"就无从下手;但既然它是三层判断的合成,每一层就都可以被单独练。练共情:逼自己离开屏幕去真正接触用户、观察他们在真实处境里怎么用、卡在哪、为什么放弃——共情不是想象出来的,是看出来、问出来的。练判别:刻意做"对照明确判据评判候选"的练习,每次都强迫自己说出"好在哪、差在哪",把模糊的好恶逼成清晰的判据。练方向:复盘自己和高手的判断差在哪——同一组候选,为什么高手选了 B 你选了 A,他看到了什么你没看到。这三层各有各的练法,合起来就是一条可执行的品味成长路径。这也再次印证了"品味稀缺但不神秘":稀缺,是因为它要长期投入才能练成;不神秘,是因为它确实可以被拆解、被刻意练习、被一层一层地积累。

Decomposing taste into the three layers of empathy, discrimination, and direction also has a direct practical value: it turns "how to cultivate taste" from an empty phrase into training you can actually run. If taste were indivisible innate mysticism, "cultivating taste" would offer no handle; but since it is the synthesis of three layers of judgment, each layer can be trained on its own. Train empathy: force yourself to leave the screen and genuinely meet users, observe how they use things in real situations, where they get stuck, why they give up. Empathy is not imagined; it comes from watching and asking. Train discrimination: deliberately practice "judging candidates against explicit criteria," each time forcing yourself to state what is good and what falls short, pressing vague likes and dislikes into clear criteria. Train direction: review where your judgment differs from an expert's. Given the same set of candidates, why did the expert pick B while you picked A, and what did they see that you did not? Each of the three layers has its own way to practice, and together they form an executable path for growing taste. This confirms once more that "taste is scarce but not mysterious": scarce because it takes long investment to build; not mysterious because it can indeed be decomposed, deliberately practiced, and accumulated layer by layer.

品味稀缺,但不神秘——它可以被外化、被教、被回流

Taste is scarce but not mysterious — it can be externalized, taught, and fed back

"稀缺"和"不可言说"是两件事,混淆它们会犯下双重错误。承认品味稀缺,是承认它无法被生成替代、必须由人持有;但若进一步以为它"不可言说",就会放弃 DSN 04/08 写规格那件事——而规格恰恰是品味的外化形式。一个资深设计师做出"这版不行"的判断只需一秒,但若他能把"不行在哪、好该是什么样"说清楚、写下来,这一秒的判断就变成了可喂给生成的护栏、可教给新人的判据、可在团队里复用的资产。AI-Native 设计的一项核心训练,正是逼设计师把藏在直觉里的品味逐条说出来——不是因为说出来品味就不稀缺了,而是因为说出来它才能进入那条闭环、才能被复利。说不出"为什么好"的判断,对团队和生成都是不可见的;它只在那个人脑子里有效一次。

"Scarce" and "ineffable" are two different things, and conflating them is a double error. Granting that taste is scarce is granting it cannot be replaced by generation and must be held by people; but going further to assume it is "ineffable" makes you give up the spec-writing of DSN 04/08 — and the spec is precisely the externalized form of taste. A senior designer needs only a second to judge "this version is no good," but if they can say and write down "where it's no good, what good should look like," that one-second judgment becomes a guardrail you can feed to generation, criteria you can teach a junior, an asset the team can reuse. A core training of AI-Native design is exactly to force the designer to articulate, item by item, the taste hidden in their intuition — not because articulating it makes taste un-scarce, but because only articulated can it enter the closed loop and compound. A judgment that cannot say "why it's good" is invisible to the team and to generation; it works once, only inside that one head.

品味是主观的吗?是相对的,但不是任意的

Is taste subjective? Relative, yes; arbitrary, no

一个常被用来否定"品味可被讨论"的论点是:"品味是主观的,各花入各眼,没什么对错好说。"这个论点混淆了相对与任意。品味确实是相对的——它相对于"为谁、什么语境、什么目的"而成立;同一个设计,对独立开发者对路,对幼儿教育产品就不对路。但相对不等于任意:一旦把"为谁、什么语境、什么目的"明确下来,"哪一版更对路"就有了可讨论、可论证、甚至可被多数有经验的人达成共识的答案。这正是为什么 DSN 08 的规格里 FOR-WHOM 必须写在最前面——它把"主观"转换成了"相对于一个明确对象的可判断性"。把品味说成纯主观,往往是放弃判断的借口;真正的设计判断从来不是"我喜欢",而是"对这群人、在这个目的下,这一版更对,因为……"。AI 在这件事上的位置也由此确定:它可以帮你模拟"某类用户可能怎么反应",提供判断的素材;但"为谁、什么目的"这个把相对性明确下来的锚,以及最终"对不对路"的裁定,仍然必须由理解这群人的人来下。

An argument often used to deny that "taste can be discussed" runs: "taste is subjective, beauty is in the eye of the beholder, there's no right or wrong to speak of." This argument conflates relative with arbitrary. Taste is indeed relative — it holds relative to "for whom, what context, what purpose"; the same design is on-target for solo developers and off-target for an early-childhood education product. But relative does not mean arbitrary: once "for whom, what context, what purpose" is pinned down, "which version is more on-target" has an answer that can be discussed, argued, even agreed on by most experienced people. This is exactly why FOR-WHOM must come first in DSN 08's spec — it converts "subjective" into "judge-ability relative to a defined object." Calling taste purely subjective is often an excuse to abandon judgment; real design judgment is never "I like it" but "for these people, under this purpose, this version is more right, because…" AI's place in this is thereby fixed too: it can help you simulate "how a class of users might react," supplying material for judgment; but the anchor that pins down relativity — "for whom, what purpose" — and the final verdict on "on-target or not" must still be set by a human who understands these people.

检验信号Test signal

品味在被外化的信号:团队评审时,"我觉得这版更好"逐渐被"它更好,因为对 X 用户在 Y 处境下命中了 Z"取代;新人的命中率随规格完善而上升。证伪:若资深设计师的判断始终说不出理由、无法写进规格、新人怎么也学不会,那要么品味还没被真正拆解,要么把"个人偏好"误当成了"品味"。

Signal that taste is being externalized: in review, "I feel this version is better" gets gradually replaced by "it's better because it hits Z for user X in situation Y"; juniors' hit-rate rises as the spec sharpens. Falsified if: a senior's judgment can never state a reason, never enters the spec, and juniors never learn it; then either taste has not really been decomposed, or "personal preference" has been mistaken for "taste."

DSN
04
SYSTEM · 系统即护栏 / 何为好
SYSTEM & THE SPEC OF GOOD
重画 · 规格
Redraw · Spec

设计系统即护栏,把"何为好"写下来

The system as guardrail, writing the spec of "good"

tokens、组件、品牌、原则——不再只是交付后的文档,而是生成的前置规格与护栏:让海量生成留在系统内、不离牌、不滑向 slop。而要让生成往对的方向收敛,设计师还得把"何为好"写下来。

Tokens, components, brand, principles are no longer post-hoc docs but the upfront spec and guardrail for generation: keeping mass generation inside the system, on-brand, off the slope to slop. And to make generation converge in the right direction, the designer must write down "what is good."

生成不会读心。藏在资深设计师直觉里的标准,模型既学不到、也评不了;不写下来,生成就只能滑回均值。把意图与判据外化成可指导、可评判生成的规格——这是不可外包给模型的人类规格,和验证篇"人定何为对"是同一道,只不过这里定"何为好"。这把尺该包含:

Generation cannot read minds. A standard hidden in a senior designer's intuition is one the model can neither learn from nor evaluate against; left unwritten, generation slides back to the mean. Externalize intent and criteria into a spec that can steer and judge generation. This is a human spec that cannot be outsourced to the model, the same kind of thing as the Verification chapter's "humans define what's right," except that here you define what's good. The ruler should hold:

这里有一个常被忽略的因果链,值得说透:生成的质量上限,不取决于模型多强,而取决于喂给它的规格多有判别力。给一个极强的模型一句"做个好看的 dashboard",它只能回你一个均值;给一个中等的模型一份写清了"为谁、什么气质、什么算完成、什么是红线"的规格,它反而能落在窄带里。这意味着在 AI-Native 设计里,设计师的杠杆点从"手有多巧"移到了"规格有多准"。设计系统正是这份规格的持久化、可复用、可机检的形态——它把一次性的口头判断,沉淀成每一次生成都自动生效的护栏。把它当事后文档,等于每次都从零开始向模型解释"我们要什么";把它当前置护栏,等于让团队所有人(和所有 agent)共享同一把已经磨好的尺。

There is an often-overlooked causal chain worth stating fully: the ceiling on generation's quality depends not on how strong the model is but on how discriminating the spec fed to it is. Give an extremely strong model "make a good-looking dashboard" and it can only return you the mean; give a mid-tier model a spec that spells out "for whom, what character, what counts as done, what is a red line" and it can land in the narrow band instead. This means that in AI-Native design the designer's leverage point moves from "how skilled the hand is" to "how precise the spec is." The design system is precisely the persisted, reusable, machine-checkable form of that spec — it distills one-off verbal judgments into a guardrail that takes effect automatically on every generation. Treat it as a post-hoc doc and you re-explain "what we want" to the model from scratch every time; treat it as an upfront guardrail and the whole team (and every agent) shares one ruler already ground sharp.
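The four things the paragraph asks a spec to spell out (for whom, what character, what counts as done, what is a red line) can be carried as a small structured object instead of a one-line prompt. The sketch below is only an illustration under stated assumptions: the class name `DesignSpec`, its field names, the `to_brief` format and the sample values are all hypothetical, not a format this volume defines.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesignSpec:
    """Hypothetical container for the four items the text asks a spec to state."""
    for_whom: str                       # the audience; placed first on purpose
    character: str                      # tone and character the result should carry
    done_when: list[str]                # signals that count as "done"
    red_lines: list[str] = field(default_factory=list)  # things that must never appear

    def to_brief(self) -> str:
        """Render the spec as a plain-text brief that can precede a generation request."""
        lines = [f"FOR-WHOM: {self.for_whom}", f"CHARACTER: {self.character}"]
        lines += [f"DONE-WHEN: {item}" for item in self.done_when]
        lines += [f"RED-LINE: {item}" for item in self.red_lines]
        return "\n".join(lines)


def is_discriminating(spec: DesignSpec) -> bool:
    """Crude completeness check: every one of the four fields must say something."""
    return bool(spec.for_whom.strip() and spec.character.strip()
                and spec.done_when and spec.red_lines)


# "Make a good-looking dashboard": nothing here pulls generation off the mean.
vague = DesignSpec(for_whom="", character="good-looking", done_when=[])

# Illustrative values only.
sharp = DesignSpec(
    for_whom="solo developers checking build health between tasks",
    character="quiet, dense, terminal-adjacent",
    done_when=["failing builds are findable in one glance"],
    red_lines=["no decorative gradients"],
)

print(sharp.to_brief())
```

The check is deliberately shallow: it can tell that a field is empty, not whether what it says is right. Whether the brief describes the right people remains a human call.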

这条因果链还解释了一个让很多团队困惑的现象:为什么同样用最新的模型,有的团队产出稳定地好,有的却始终在 slop 里打转?差别几乎从不在模型,而在那份喂给模型的规格的质量,以及背后那套设计系统的完备度。模型对所有人都是同一个;它产出的差异,几乎全部来自约束它的护栏不同。这意味着在 AI-Native 时代,一个团队的设计竞争力,越来越不体现在"谁能招到手更巧的设计师",而体现在"谁能把何为好沉淀成更准、更完备、更可机检的规格与系统"。这是一个对组织能力的重新定义:核心资产从"个人的手艺"转向"被外化、可复用、能喂给生成的判断"。也正因如此,那些早早开始认真建设计系统、认真写规格、认真让判断回流的团队,会在这条曲线上越走越快,而把设计系统当文档敷衍的团队,会发现自己无论换多新的模型,都被卡在均值附近——因为模型从不是瓶颈,护栏才是。

This causal chain also explains a phenomenon that puzzles many teams: why, using the same newest model, do some teams produce steadily good work while others keep circling in slop? The difference is almost never the model; it is the quality of the spec fed to the model and the completeness of the design system behind it. The model is the same for everyone, so the difference in its output comes almost entirely from the guardrails constraining it. In the AI-Native era, then, a team's design competitiveness shows less and less in "who can hire the designer with the more skilled hand" and more in "who can distill what-is-good into a more precise, more complete, more machine-checkable spec and system." This redefines organizational capability: the core asset shifts from "individual craft" to "judgment that is externalized, reusable, and can be fed to generation." For the same reason, teams that start early on seriously building the design system, writing specs, and feeding judgment back move faster and faster along this curve, while teams that treat the design system as perfunctory documentation find themselves stuck near the mean no matter how new a model they swap in, because the model was never the bottleneck; the guardrails are.

设计系统从"交付后的文档"升级为"生成前的护栏"

The design system upgrades from "post-hoc doc" to "pre-generation guardrail"

过去的设计系统是事后产物:先把界面做出来,再回头整理出一套 tokens、组件库、品牌规范,供团队对齐。它的角色是记录已经形成的共识。在 AI-Native 流程里,这个角色翻转了——设计系统必须前置成喂给生成的规格。原因很直接:生成的默认终点是均值(见 DSN 01 的不对称),唯一能把它从均值拉开的,是在生成发生之前就给定的约束。没有护栏的生成,等于把方向盘交给训练分布;有护栏的生成,才是把它约束在"这群人、这个品牌"的窄带里。所以设计系统不再是设计的结果,而是设计的前提——它越完备、越可机检,海量生成就越不容易离牌。

The old design system was a post-hoc artifact: build the interface first, then go back and tidy up a set of tokens, a component library, brand guidelines for the team to align on. Its role was recording a consensus already formed. In the AI-Native process that role inverts — the design system must move upfront into the spec fed to generation. The reason is direct: generation's default destination is the mean (see DSN 01's asymmetry), and the only thing that can pull it off the mean is a constraint given before generation happens. Generation without guardrails hands the wheel to the training distribution; generation with guardrails is what constrains it to the narrow band of "these people, this brand." So the design system is no longer the result of design but its precondition — the more complete and machine-checkable it is, the harder it is for mass generation to drift off-brand.

同一份规格切两层:机器守"不离牌",人守"是否为人"

One spec, two layers: the machine holds "on-brand," the human holds "for people"

规格不是一团均质的文字,它内部就分两层,而这两层正对应内核的①和④。把它们混在一起,是后面所有失败模式的根源:要么把品味硬塞进 lint(得到对齐完美却没灵魂的 slop),要么让人去盯机器该管的对齐间距(把稀缺的判断力耗在可自动化的事上)。下表把这条切分做成可照搬的对照〔判据见 DSN 08 的分诊问句〕:

A spec is not a homogeneous blob of text; it splits internally into two layers, and those two layers map exactly onto the kernel's ① and ④. Mixing them is the root of every failure mode downstream: either you force taste into lint (and get pixel-perfect, soulless slop), or you have humans police the alignment and spacing a machine should own (burning scarce judgment on the automatable). The table below turns this cut into a copyable contrast 〔criterion: the triage question in DSN 08〕:

维度 Dimension | 硬约束层 · 可机检(并入①) Hard layer · machine-checkable (→①) | 软判据层 · 构成性品味(留给④) Soft layer · constitutive taste (→④)
问的是 Asks | 仅看产物文本能否判对错? Right/wrong from artifact text alone? | 是否为这群人、有没有灵魂? For these people? Does it have soul?
典型项 Typical items | token 符合度 · 对齐间距 · 对比度 ≥4.5:1 · 触达 ≥44px · 无离系统字号 Token-conformance · alignment · contrast ≥4.5:1 · hit-target ≥44px · no off-system sizes | 为谁而做 · 调性气质 · 情感命中 · 异质性 · "完成"的感觉 For-whom · tone & character · emotional fit · heterogeneity · the feel of "done"
谁来守 Held by | 机器 · lint / CI 自动跑 Machine · lint / CI, automatic | 人 · 设计师亲自判 Human · the designer judges
写成什么 Written as | 规则 / 阈值 / 断言 Rules / thresholds / assertions | 意图陈述 + 验收信号 Intent statements + acceptance signals
塞错层的后果 Cost of wrong layer | (本就该自动化,无代价) (belongs here; no cost) | 硬塞 lint → 优化出没人想用的完美界面 Forced into lint → a flawless interface no one wants
内核位置 Kernel slot | ① 充裕(被自动化) ① abundance (automated) | ④ 意义(人回归) ④ meaning (people return)
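Two of the hard-layer items in the table, contrast ≥4.5:1 and hit-target ≥44px, are plain thresholds, so they can be written as assertions a machine runs. The sketch below uses the WCAG 2.x relative-luminance and contrast-ratio formulas. The element dictionary shape (`fg`, `bg`, `width`, `height`) and the function names are assumptions made for illustration, and it assumes the 44px limit applies to both width and height.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color as defined by WCAG 2.x."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio; always >= 1 regardless of argument order."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


MIN_CONTRAST = 4.5  # threshold taken from the table
MIN_HIT_PX = 44     # threshold taken from the table


def hard_layer_violations(element: dict) -> list[str]:
    """Return readable violations for one element; an empty list means it passes."""
    problems = []
    ratio = contrast_ratio(element["fg"], element["bg"])
    if ratio < MIN_CONTRAST:
        problems.append(f"contrast {ratio:.2f}:1 is below {MIN_CONTRAST}:1")
    if min(element["width"], element["height"]) < MIN_HIT_PX:
        problems.append(
            f"hit target {element['width']}x{element['height']}px is below {MIN_HIT_PX}px"
        )
    return problems


# Hypothetical button: mid-gray text on white, slightly too narrow.
print(hard_layer_violations({"fg": "#777777", "bg": "#ffffff", "width": 40, "height": 44}))
```

Nothing in this check says whether the button is for the right people; that question belongs to the soft layer and is not something a threshold can answer.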
FIG. D4.0 / SYSTEM-AS-SPEC · 设计系统即护栏
看懂:tokens / 组件 / 品牌如何夹住生成的输出分布
Read: how tokens / components / brand clamp the output distribution of generation
[图中标注 Figure labels]
  • 无护栏 · 生成滑回均值(峰 = 均值 = slop) No guardrail · slides to the mean (peak = mean = slop)
  • ③ 设计系统 = 护栏(tokens / 组件 / 品牌 / 红线) ③ Design system = guardrail (tokens / components / brand / red lines)
  • 有护栏 · 收敛到"这群人"的窄带(两道硬约束墙之间,峰被夹到"对这群人成立"处) With guardrail · narrow band for "these people" (between two hard walls, peak clamped to "true for these people")
把生成的输出想成一条概率分布。无护栏时,峰落在训练分布的均值——也就是 slop。设计系统不是把分布变窄那么简单,而是两道硬约束墙把概率质量夹离均值、推到"只对这群人成立"那个窄带。墙是可机检的硬约束(token/对齐/对比度),墙内的形状仍由人的软判据塑造。
Think of generation's output as a probability distribution. Without guardrails the peak sits at the mean of the training distribution — that is, slop. The design system does more than narrow the distribution; two hard walls clamp the probability mass off the mean, pushing it into the band that is "true only for these people." The walls are machine-checkable hard constraints (token / alignment / contrast); the shape inside the walls is still sculpted by human soft criteria.
FIG. D4.1 / SYSTEM AS A TWO-LAYER SPEC · 把护栏拆成两层
看懂:tokens→组件→品牌叠成可机检的护栏层,它如何夹住生成的输出分布
Read: tokens→components→brand stack into the machine-checkable guardrail layer, and how it clamps generation's output distribution
[图中标注 Figure labels]
A 层 · 可机检护栏(生成前固化) Layer A · machine-checkable guardrail (frozen pre-generation)
  • ① TOKENS · 原子约束 Atomic constraints:色 / 间距 / 字阶 = JSON color / spacing / type = JSON;机检:值在白名单内? check: value in whitelist?
  • ② COMPONENTS · 组合契约 Composition contract:组件 = 有接口的代码 component = code w/ interface;机检:只用许可的拼装? check: only allowed assembly?
  • ③ BRAND · 原则的可机检部分 machine-checkable brand:禁用模式 / 字体白名单 = lint banned patterns / fonts = lint;机检:命中红线? check: hits a red line?
  • = 一道可机检的护栏层,夹住生成的输出分布 = one machine-checkable layer that clamps generation's output distribution
  • 无护栏 · 峰=均值=slop no guardrail · peak=mean=slop → 峰被夹离均值 → "这群人"窄带 peak clamped off the mean → "these people"
B 层 · 墙内的形状 · 人的软判据 Layer B · the shape inside the walls · human soft criteria
  • 护栏决定边界在哪;边界之内"哪一版才对路"仍由品味判——A 夹住分布,B 在分布里挑峰。 The guardrail sets where the walls are; which version inside is on-target stays taste's call: A clamps, B picks.
设计系统不是一份文档,而是两层规格。A 层是可机检的护栏:tokens(原子值)、组件(组合契约)、品牌里能写成 lint 的部分(禁用模式、字体白名单),自下而上叠成一道生成前就固化、机器能逐条核验的硬约束。这道护栏的作用,正是把右边那条本会峰在均值(=slop)的输出分布,夹离均值、推进"只对这群人成立"的窄带。但护栏只决定墙在哪——墙内"哪一版才真的对路",是 B 层、是人的软判据。把这两层分清楚,就不会再问"设计系统能不能替我判断":A 层永远替你,B 层永远替不了你。
A design system is not a document but a two-layer spec. Layer A is the machine-checkable guardrail: tokens (atomic values), components (composition contract), and the lint-expressible part of brand (banned patterns, font whitelist), stacked bottom-up into a hard constraint that is frozen before generation and that a machine can verify item by item. The job of this guardrail is to take the output distribution shown in the figure, which would otherwise peak at the mean (= slop), and clamp it off the mean into the narrow band that is "true only for these people." But the guardrail only sets where the walls stand; "which version inside is truly on-target" is Layer B, the human's soft criteria. Keep the two layers distinct and you stop asking "can the design system judge for me": for Layer A it always can; for Layer B it never can.
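The three tiers of Layer A each reduce to a yes/no question about a generated artifact: is the value a token, is the assembly allowed, does it hit a red line. One way to picture that is a single recursive pass over a component tree. This is a sketch under assumptions: the tree shape (`type`, `color`, `patterns`, `children`), the token names, the composition table and the banned-pattern names are all invented for illustration.

```python
# Tier 1: tokens. Only these color values may appear (hypothetical names).
ALLOWED_COLORS = {"color.brand.primary", "color.text.default", "color.surface"}

# Tier 2: components. Which child types each component may contain (hypothetical).
ALLOWED_CHILDREN = {"Page": {"Card"}, "Card": {"Heading", "Text", "Button"}}

# Tier 3: brand. Patterns the brand forbids outright (hypothetical red lines).
BANNED_PATTERNS = {"purple-blue-gradient", "glassmorphism"}


def check_node(node: dict, path: str = "") -> list[str]:
    """Walk a component tree and collect every Layer A violation found."""
    here = f"{path}/{node['type']}"
    problems = []

    # Tier 1: is the value in the whitelist?
    color = node.get("color")
    if color is not None and color not in ALLOWED_COLORS:
        problems.append(f"{here}: color {color!r} is not a token")

    # Tier 3: does the node hit a red line?
    for pattern in node.get("patterns", []):
        if pattern in BANNED_PATTERNS:
            problems.append(f"{here}: banned pattern {pattern!r}")

    # Tier 2: is every child an allowed assembly? Then recurse into it.
    allowed = ALLOWED_CHILDREN.get(node["type"], set())
    for child in node.get("children", []):
        if child["type"] not in allowed:
            problems.append(f"{here}: {child['type']} is not allowed inside {node['type']}")
        problems.extend(check_node(child, here))

    return problems


generated = {
    "type": "Page",
    "children": [{
        "type": "Card",
        "color": "#3b82f6",              # raw value instead of a token
        "patterns": ["glassmorphism"],   # hits a red line
        "children": [{"type": "Heading"}, {"type": "Table"}],  # Table is not allowed in Card
    }],
}

for problem in check_node(generated):
    print(problem)
```

A tree that returns an empty list is inside the walls. Which of the many trees inside the walls is on-target is Layer B, and this function has nothing to say about it.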

设计系统与架构的"结构即护栏"是同一招——这不是类比,是同一个原理

The design system and architecture's "structure as guardrail" are the same move — not an analogy but one principle

这里值得把系列内部的同构点说透,因为它证明这套方法不是设计领域的特例,而是同一条原理在不同面的复现。架构篇讲过:当 agent 大量生成代码,唯一能让这些代码连贯、不互相打架、不偏离意图的,是一套对 agent 可读的结构约束——清晰的模块边界、类型、接口契约。设计系统在设计面上做的是一模一样的事:当 agent 大量生成界面,让它们连贯、不离牌、不滑向 slop 的,是一套对 agent 可读的结构约束——tokens、组件契约、品牌原则。两边都是把"海量生成"和"连贯性"这对矛盾,用"前置的、可机检的结构护栏"来调和。这不是修辞上的类比,而是因为它们面对的是同一个底层问题:当生成变廉价,质量的瓶颈从"能不能生成"转移到"生成出来的一堆东西能不能保持一致与意图",而解决这个瓶颈的通用形态,就是把意图固化成生成前的、机器能读的约束。这就是为什么读过架构篇的人会在这里有强烈的既视感——同一台机器,换了个面。

It is worth stating the intra-series isomorphism fully here, because it proves this method is not a special case of design but the same principle recurring on a different face. The Architecture chapter argued: when agents generate code en masse, the only thing that keeps that code coherent, non-conflicting, and on-intent is a set of structural constraints legible to agents — clear module boundaries, types, interface contracts. The design system does exactly the same thing on the design face: when agents generate interfaces en masse, what keeps them coherent, on-brand, off the slope to slop is a set of structural constraints legible to agents — tokens, component contracts, brand principles. Both reconcile the tension between "mass generation" and "coherence" with "upfront, machine-checkable structural guardrails." This is not a rhetorical analogy but follows from facing the same underlying problem: once generation gets cheap, the quality bottleneck shifts from "can it be generated" to "can the pile of generated things stay consistent and on-intent," and the general form for solving that bottleneck is to freeze intent into pre-generation, machine-readable constraints. This is why someone who has read the Architecture chapter feels a strong déjà vu here — the same machine, a different face.

这也回答了一个实操问题:在 AI-Native 流程里,设计系统该投入到什么程度才算够?旧标准是"够团队对齐就行"——因为它只是事后文档,过度完备是浪费。新标准要高得多:设计系统的完备度与可机检度,直接决定生成质量的上限,所以它值得被当成核心资产持续投入,而不是有空才整理的边角料。具体说,过去可能用自然语言松散描述的东西("主色调用我们的品牌蓝""间距保持一致"),现在都值得被固化成机器能直接消费的形式:token 写成 JSON 而非截图、组件写成有明确接口契约的代码而非画板示例、品牌原则里能机检的部分(对比度、字体白名单、禁用模式)写成 lint 规则。投入的回报是复利的:设计系统越完备,每一次生成就越省判断、越不离牌,而每一轮判断的回流又让它更完备。这是一条正反馈——也正因为是正反馈,早投入比晚投入划算得多。把设计系统当事后文档的团队,等于一直在放弃这条复利曲线。

This also answers a practical question: in an AI-Native process, how far should you invest in the design system before it is "enough"? The old standard was "enough to align the team"; since the system was only a post-hoc doc, over-completeness was waste. The new standard is far higher: the design system's completeness and machine-checkability directly determine the ceiling on generation quality, so it deserves continuous investment as a core asset, not treatment as a scrap to tidy when there is time. Concretely, things once described loosely in natural language ("use our brand blue for the primary," "keep spacing consistent") now deserve to be frozen into forms a machine can consume directly: tokens written as JSON rather than as a screenshot, components written as code with explicit interface contracts rather than as artboard examples, and the machine-checkable parts of brand principles (contrast, font whitelist, banned patterns) written as lint rules. The return on this investment compounds: the more complete the design system, the more judgment each generation saves and the better it stays on-brand, and each round of judgment flowing back makes the system more complete still. This is a positive feedback loop, and precisely because it is, investing early pays off far more than investing late. A team that treats the design system as a post-hoc doc is continuously forgoing this compounding curve.
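To make "use our brand blue" and "keep spacing consistent" concrete, the sketch below stores both as JSON tokens and scans a piece of generated CSS for literals that are not in the system. It is a rough illustration under assumptions: the token names and values are made up, and the regex scan is cruder than a real linter, since it flags every `px` literal whether or not it is a spacing value.

```python
import json
import re

# Hypothetical tokens file content; names and values are illustrative only.
TOKENS_JSON = """
{
  "color": {"brand-blue": "#1d4ed8", "text": "#111827"},
  "spacing": {"sm": "8px", "md": "16px", "lg": "24px"}
}
"""

HEX = re.compile(r"#[0-9a-fA-F]{6}\b")  # six-digit hex colors
PX = re.compile(r"\b\d+px\b")           # integer pixel literals


def off_system_values(css: str, tokens: dict) -> list[str]:
    """Return color and pixel literals in `css` that match no token value."""
    colors = {value.lower() for value in tokens["color"].values()}
    spacing = set(tokens["spacing"].values())
    found = [m for m in HEX.findall(css) if m.lower() not in colors]
    found += [m for m in PX.findall(css) if m not in spacing]
    return found


tokens = json.loads(TOKENS_JSON)
css = ".cta { color: #1D4ED8; padding: 16px; margin: 13px; border-color: #3b82f6; }"
print(off_system_values(css, tokens))  # -> ['#3b82f6', '13px']
```

Because the tokens live in text, the same file can be handed to generation as context and to CI as the rule set, which is the sense in which the system works before generation rather than after it.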

但这条复利曲线有一个必须守住的边界,否则它会反噬:设计系统能固化的,永远只是"已经被判断过的好",它不能替代"对新情况做新判断"。一套护栏写得越完备,越有一种危险的诱惑——以为只要照着系统生成,结果就一定好,于是停止判断、把系统当成了自动驾驶。这是把③(上下文/护栏)误当成了④(人的判断)。护栏的作用是把生成约束在"过去判断过的好"那个窄带里,让你不必每次都从头判断那些已经判断过的事;但当一个真正的新情况出现——一类系统没覆盖过的内容、一群没服务过的用户、一个旧判据不再适用的场景——系统会沉默,或者更糟,会用旧判据给你一个"看起来合规、其实不对路"的结果。这时仍然需要人重新判断,并把新判断回流进系统(这正是⑥沉淀)。所以设计系统是判断的沉淀器,不是判断的替代品。守住这条边界,复利曲线才成立;越界把它当自动驾驶,它就会在第一个新情况面前,把你悄悄带回均值。

But this compounding curve has a boundary that must be held, or it backfires: what a design system can freeze is only "good that has already been judged"; it cannot substitute for "making a new judgment about a new situation." The more complete the guardrails, the stronger a dangerous temptation becomes: believing that as long as you generate by the system the result must be good, so you stop judging and treat the system as autopilot. This mistakes ③ (context/guardrail) for ④ (human judgment). The guardrails' job is to constrain generation to the narrow band of "good judged in the past," sparing you from re-judging the already-judged each time. But when a genuinely new situation arises (a kind of content the system never covered, a group of users never served, a scenario where the old criteria no longer apply), the system falls silent or, worse, applies the old criteria and hands you a result that "looks compliant but is off-target." Here a human must judge anew and feed the new judgment back into the system (precisely ⑥ distill). So the design system is where judgment settles, not a substitute for judgment. Hold this boundary and the compounding curve holds; cross it and treat the system as autopilot, and at the first new situation it will quietly carry you back to the mean.

DSN
05
PLAYBOOK · 守住人本 / 反 slop
HOLD THE HUMAN
行动 · 承重
Action · Load-bearing

守住人本,拒绝 slop

Hold the human, refuse slop

效率从来不是终点。把出稿变快,若只是更快地产出 slop,那不是 AI-Native 设计的胜利。生成接管产出后,设计师被还给那些一开始就最该由人做的事——共情、品味、意义。机器铺面,人定义方向与意义。

Efficiency was never the point. Making comps faster, if it only means producing slop faster, is no win. Once generation takes over production, the designer is returned to what should have been theirs all along — empathy, taste, meaning. The machine covers surface; the human defines direction and meaning.

交给生成Hand to generation
  • 铺变体、补全状态与边角
  • Variants, states, edge cases
  • 套用设计系统、对齐切图
  • Applying the system, alignment, export
  • 初稿与探索性方向
  • First drafts, exploratory directions
留给人 · 品味与意义Keep with humans · taste & meaning
  • 共情:理解用户真正要什么
  • Empathy: what users truly want
  • 品味:判好坏、守独特、避 slop
  • Taste: judge, hold distinctiveness, avoid slop
  • 意义:为"这值不值得存在"负责
  • Meaning: own whether it deserves to exist

承重命题:设计的成败,不看出稿多快,而看它最终是不是真的为具体的人而做。这不是装饰性表述,它是整卷的承重墙,也是整个系列那条人本主线在设计面上的落点。把它当真,会改变你对"AI-Native 设计成功了没有"的判断标准:如果一个团队用 AI 把出稿速度提了十倍,却产出的全是更快、更光滑的 slop——谁也不为、谁也不记得——那么按这条命题,它失败了,哪怕每个效率指标都漂亮。反过来,如果一个团队出稿没快多少,但设计师把省下的每一分注意力都投到了共情和品味上,做出的东西真的让目标用户觉得"这是为我做的",那么按这条命题,它成功了。出稿更快本身从来不是赢,它只是把人从重复产出里腾出来;被腾出来的人,要回到只有人能做的那件事——理解人、为人负责。把"更快"本身错当成赢,是 AI-Native 转型里最常见、也最隐蔽的失败——它让你在所有仪表盘都绿的情况下,悄悄丢掉了设计存在的理由。

The load-bearing claim: a design's success is judged not by how fast its comps ship, but by whether it is, in the end, truly made for specific people. This is not a pretty phrase but the volume's load-bearing wall, and where the series' human through-line lands on the design face. Take it seriously and it changes your criterion for "has AI-Native design succeeded": if a team uses AI to make comps ten times faster yet produces only faster, smoother slop — for no one, remembered by no one — then by this proposition it has failed, however pretty every efficiency metric. Conversely, if a team's comps did not get much faster but its designers put every spared minute of attention into empathy and taste, making something that genuinely makes the target user feel "this was made for me," then by this proposition it has succeeded. Shipping faster is never, in itself, the win; it only frees people from repetitive production; the freed person must return to what only a person can do — understanding people, being responsible to people. Mistaking "faster" itself for the win is the most common and most insidious failure of an AI-Native transition — it lets you quietly lose the reason design exists while every dashboard glows green.

"把人还给意义"在设计面上具体指什么

What "returning people to meaning" concretely means on the design face

"人回归意义"在别的卷里可能还偏抽象,但在设计这个面上它落得最实,因为设计本来就是一门关于人的手艺——它的全部价值在于"被某个人用、被某个人感受到"。当生成接管了出稿、铺变体、对齐切图这些产出动作,被还给设计师的恰恰是这门手艺最初的内核:去理解一个具体的人在一个具体的处境里真正需要什么(共情),去判断眼前这一版是否真的命中了那个需要(品味),去为"这个东西值不值得存在、该是什么样"负责(意义)。这三件事过去常常被产出工时挤到边上——设计师大量的时间花在像素和导出上,留给共情和判断的反而不多。AI-Native 设计的真正承诺,不是"设计师可以更快地出图",而是把设计师从产出里解放出来,还给他这门手艺一开始就该做的事。这与工程卷"人做系统专长与产品判断、不做吞吐"是同一个解放,只是对象从代码换成了体验与美。

"People return to meaning" may stay abstract in other volumes, but on the design face it lands most concretely, because design is by nature a craft about people — its entire value lies in "being used by someone, being felt by someone." Once generation takes over the production actions of drawing comps, spreading variants, alignment and export, what is returned to the designer is exactly the original core of this craft: to understand what a specific person in a specific situation truly needs (empathy), to judge whether the version in front of them actually hits that need (taste), to own whether "this thing deserves to exist and what it should be" (meaning). These three were often squeezed to the margins by production hours — designers spent vast time on pixels and exports, leaving little for empathy and judgment. The real promise of AI-Native design is not "designers can produce comps faster" but freeing the designer from production and returning them to what this craft was meant to do from the start. This is the same liberation as the engineering volume's "people do deep systems expertise and product judgment, not throughput," only the object shifts from code to experience and beauty.

这也是为什么在整个系列里,设计被看作人本主线落得最实的一面。组织卷把"人重新回到中心"立为整套方法论的归宿,但在组织层面它还偏宏观、偏原则;越往下到具体职能,这个目的就越需要被翻译成"在这个面上,回归意义具体指什么"。工程卷把它翻译成"人做系统专长而非吞吐",已经具体了一层;而到了设计,它落得最实,因为设计这门手艺的对象本来就是人——它不像写代码那样隔着一层逻辑,它直接面对"一个人会不会被这个东西打动、会不会觉得被理解"。所以在设计这个面上,"为具体的人而做"不是一句需要费力论证的抽象原则,而几乎是这门手艺的定义本身:一个不为人的设计,从一开始就不配叫设计。AI-Native 设计因此是检验整条人本主线是否真能落地的试金石——如果它在最贴近"人"的这个面上都守不住,那它在别的面上多半也只是装饰性表述。守住了,则证明这条主线不是装饰,而是真能一路落到具体动作的承重结构。

This is also why, across the whole series, design is seen as the face where the human through-line lands most concretely. The org volume establishes "putting people back at the center" as the methodology's destination, but at the organizational level it is still macro, still principle; the further down you go toward a concrete function, the more this purpose needs translating into "on this face, what does returning to meaning concretely mean." The engineering volume translates it into "people do systems expertise, not throughput," which is already one layer more concrete. At design it lands most concretely of all, because this craft's object is people to begin with: unlike writing code, which sits behind a layer of logic, design directly faces the question "will a person be moved by this thing, will they feel understood." So on the design face, "made for specific people" is not an abstract principle needing laborious argument but nearly the definition of the craft itself: a design that is not for people never deserved to be called design. AI-Native design is thus the touchstone for whether the whole human through-line can really land. If it cannot be held even on the face closest to "people," it is probably just a decorative phrase on the other faces too. If it holds, that proves the through-line is not decoration but a load-bearing structure that can be carried all the way down to concrete actions.

反 slop 红线(命中越多,越滑向均值)

Anti-slop red lines (the more you hit, the closer to the mean)

检验信号Test signal

slop 率下降、独特度上升——把成品丢给陌生人,他会问"这怎么做出来的",而不是"这哪个 AI 做的"。同时盯用户侧:可用性、共情命中、"这是为我做的"那种感觉。

Slop rate down, distinctiveness up: show the result to a stranger and they ask "how was this made," not "which AI made this." Watch the user side too: usability, empathy hits, that "this was made for me" feeling.

DSN
06
MECHANISM · 为何代码产物被放大
WHY CODE ARTIFACTS WIN
机理 · 受力
Mechanism

不是工具赢,是产物变代码

The tool does not win; the artifact becoming code wins

把上一节的原理拆到受力层:当设计产物是私有二进制画布,agent 读不进、改不动、验不了,它只能被边缘化;当产物是代码 / 纯文本,它一次性获得四个属性——这正是工程那条放大律落到设计面。但这条律有边界:它只放大"可被规格约束"的那一半。

Take the prior section's principle down to the force level: when the artifact is a proprietary binary canvas, an agent cannot read into it, edit it, or check it, so it can only be marginalized; when the artifact is code / plain text, it gains four properties at once (the engineering amplification law landing on the design surface). But the law has a boundary: it amplifies only the half that can be bound by a spec.

受力分析:这四属性不是工具的功能,是"产物=文本"这一形态的副产物。Figma 文件、PSD、私有画布把状态锁进二进制——人能看,agent 看不懂、diff 不出、生成不了、机器验不了。换成 HTML / JSX / tokens.json,同一份设计立刻可读、可 diff、可生成、可验证。这就是 pencil / Remotion 这类工具被放大的真正原因——不是它更好用,是它选对了产物形态。

Force analysis: these four properties are not tool features; they are a byproduct of the "artifact = text" form. Figma files, PSDs, proprietary canvases lock state into binary: a human can look; an agent cannot parse, diff, generate, or machine-check it. Swap in HTML / JSX / tokens.json and the same design becomes readable, diffable, generatable, verifiable at once. That is the real reason pencil / Remotion-class tools are amplified: not that they are nicer, but that they picked the right artifact form.

二进制画布 · 被边缘化Binary canvas · marginalized
状态锁在私有格式里。agent 只能截图猜、人工转译;每次改动是一团不可读的 diff,生成与验证都无从下手。
State locked in a proprietary format. An agent can only screenshot and guess, or a human transcribes; each change is an opaque diff, with no handhold for generation or verification.
代码 / 文本 · 被放大Code / text · amplified
对 agent 可读 + 可 diff/版本 + 可生成/组合 + 可机检规格。设计就此进入工程的协作纪律,海量生成可被审、可回滚、可约束。
Legible to agents + diffable/versionable + generatable/composable + machine-checkable spec. Design enters engineering's collaboration discipline; mass generation becomes reviewable, revertible, constrainable.
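The "diffable" property is easy to see once the artifact is text. The sketch below diffs two versions of a tokens file with Python's standard `difflib`. The token contents and the file labels are made-up examples.

```python
import difflib
import json

# Two hypothetical versions of the same tokens file.
before = {"color": {"primary": "#1d4ed8"}, "spacing": {"md": "16px"}}
after = {"color": {"primary": "#1e40af"}, "spacing": {"md": "16px"}}


def token_diff(old: dict, new: dict) -> list[str]:
    """Return a unified diff between two token sets, serialized deterministically."""
    a = json.dumps(old, indent=2, sort_keys=True).splitlines()
    b = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(
        a, b, "tokens.json@before", "tokens.json@after", lineterm=""
    ))


for line in token_diff(before, after):
    print(line)

# Keep only the changed lines, dropping the "---" / "+++" file headers.
changed = [
    line for line in token_diff(before, after)
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

The diff isolates the single changed value, so a reviewer or an agent can see exactly what moved and revert it. A binary canvas offers no equivalent view of the same change.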
边界 · 何时这条律失效Boundary · when the law stalls

产物变代码只放大可被规格约束的那一半。对齐、间距、token 符合度——可机检,被放大;而"这版有没有灵魂、是否为这群人而做"无法写进类型系统。把设计当纯工程问题,就会优化掉所有可机检的指标,产出一个挑不出错却谁也不想用的界面。代码是杠杆,不是品味的替身。

Artifact-as-code amplifies only the half a spec can bind. Alignment, spacing, token-conformance are machine-checkable and get amplified; whether a version has a soul, whether it is made for these people, cannot be written into a type system. Treat design as a pure engineering problem and you optimize every machine-checkable metric into an interface that passes review yet no one wants to use. Code is leverage, not a stand-in for taste.

为什么是"产物形态"而不是"工具能力"在做功

Why it is the "artifact form," not "tool capability," that does the work

很容易把 pencil、Remotion 的价值归功于"它们功能更强/更智能"。这是误判,会让人去追下一个更炫的工具,而错过真正的杠杆点。做功的不是任何单一功能,而是产物从二进制变成文本这件事本身——它一次性满足了所有对 agent 友好的属性,于是被整个软件协作生态接住。检验这一点的方法很干脆:设想把 Figma 的导出格式换成一份完全可读的、语义化的文本 schema,而其它功能一概不变。仅这一改,agent 就能读、能 diff、能生成、能机检它——价值立刻被放大。反过来:给一个再聪明的 AI 设计工具,只要它把结果存回不可读的私有二进制,agent 依旧进不去,杠杆依旧不出现。可见做功的是形态,不是聪明。

It is tempting to credit pencil's or Remotion's value to "they are more powerful / smarter." That is a misdiagnosis, and it sends people chasing the next flashier tool while missing the real leverage point. What does the work is not any single feature but the fact that the artifact turns from binary into text, which satisfies all the agent-friendly properties at once and is therefore caught by the entire software-collaboration ecosystem. The test is blunt: imagine swapping Figma's export format for a fully readable, semantic text schema while changing nothing else. That one change alone lets an agent read, diff, generate, and machine-check it, and value is amplified immediately. Conversely, take even the smartest AI design tool: as long as it stores its results back into an unreadable proprietary binary, the agent still cannot get in and the leverage still does not appear. So it is the form that does the work, not the cleverness.

记症状会过期,懂成因不会——为什么这一节讲机制而非清单

Memorizing symptoms expires; understanding causes does not — why this sheet teaches mechanism, not a list

有人会问:既然 slop 的指纹可以列成清单(DSN 05/09 已经列了),为什么这一节还要费力讲成因?因为清单会过期,成因不会。今天 slop 的指纹是青配深底、紫蓝渐变、玻璃拟态,但这些只是当下训练分布里最高频的模式;一两年后,当大家都开始反这几条、新的高频模式形成,slop 的指纹就会换一批面孔——也许是某种新的"高级灰极简"、某种新的版式套路。如果你只背了今天这张清单,到那时你会拿着一张过期的地图,对着一套全新的 slop 束手无策,甚至因为"它不在我的清单上"而误以为它不是 slop。但如果你理解了成因——slop = 高频 × 安全 × 易生成的交集 = 对均值的收敛——那么无论指纹怎么换面孔,你都能用同一把尺认出它:问这版有没有为这群人做过真正的取舍,还是只是选了当下最省事、最不会错的默认。这就是为什么这一卷始终在讲受力而非症状:症状是时点的,受力是结构的;背症状让你能应付今天,懂受力让你能应付还没出现的明天。

Someone may ask: since slop's fingerprints can be listed (DSN 05/09 already did), why does this sheet labor over the causes? Because the list expires; the causes do not. Today slop's fingerprints are cyan-on-dark, purple-blue gradients, glassmorphism, but these are only the highest-frequency patterns in the current training distribution; a year or two on, once everyone starts rejecting these and a new high-frequency pattern forms, slop's fingerprints will swap faces — perhaps some new "refined-gray minimalism," some new layout cliché. If you only memorized today's list, by then you will hold an expired map, helpless before a brand-new slop, even mistaking it for non-slop because "it's not on my list." But if you understand the cause — slop = the intersection of high-frequency × safe × easy-to-generate = convergence to the mean — then however the fingerprints change faces, you can recognize it with the same ruler: ask whether this version made a real trade-off for these people, or merely picked the most labor-saving, least-wrong default of the moment. This is why this volume keeps teaching force rather than symptom: symptoms are point-in-time, force is structural; memorizing symptoms lets you handle today, understanding force lets you handle a tomorrow that has not yet arrived.

有一个推论值得点明,因为它会让人对"反 slop"这件事保持谦卑:今天你引以为傲的"高级"审美,明天也可能成为新的 slop。当下被认为是反 slop 的那些做法——克制的留白、单色的极简、衬线大标题、不对称网格——如果它们因为被推崇而变得高频、被无数人模仿、成为生成模型的新默认,那么它们就会精确地满足"高频 × 安全 × 易生成"这三条,从而变成下一代的 slop。这不是说这些做法本身不好,而是说"好"从来不在某个具体的视觉样式里,而在"是否为这群人做了真正的取舍"这个动作里。一旦某种样式变成了不假思索的默认套用,它就丢掉了那个动作,无论它曾经多"高级"。这条推论的实践含义是:反 slop 不是一份可以背下来、一劳永逸的"好品味清单",而是一种需要每次重新执行的判断动作——每一次都重新问"这是为谁、为什么是这样",而不是套用上一次管用的答案。把任何审美当成永久正确的安全牌,本身就是滑向 slop 的开始。

A corollary worth naming, because it keeps you humble about "anti-slop": the "refined" aesthetic you take pride in today may become tomorrow's new slop. The practices currently regarded as anti-slop — restrained whitespace, monochrome minimalism, serif headlines, asymmetric grids — if, by being celebrated, they become high-frequency, imitated by countless people, the new default of generation models, then they will precisely satisfy the three conditions of "high-frequency × safe × easy-to-generate" and become the next generation's slop. This is not to say these practices are bad in themselves, but that "good" never lives in a specific visual style; it lives in the act of "whether a real trade-off was made for these people." Once a style becomes a thoughtless default applied by rote, it loses that act, however "refined" it once was. The practical implication: anti-slop is not a "good-taste list" you can memorize once and for all, but a judgment act that must be re-performed each time — each time re-asking "for whom, why this way," rather than applying the answer that worked last time. Treating any aesthetic as a permanently correct safe card is itself the start of sliding into slop.

这条放大律有它精确的边界,把它说清楚比夸它重要。它作用于"能被规格约束"的那一半设计——对齐、间距、token、可访问性、状态完备度。这一半被代码形态接住后会被巨幅放大、近乎免费。但另一半——这一版有没有打动人、是否为它要服务的那群人而做、节奏对不对——无法被写进任何类型系统或 lint 规则。把设计整体当成纯工程问题,就会发生一种隐蔽的退化:你会不自觉地只优化能测的,因为能测的有反馈、能跑 CI、能进度可视。于是每个可机检指标都满分,产物却空心——一个挑不出任何错、却谁也不想多看一眼的界面。这就是"代码当成品味替身"的代价。正确的姿势是:让代码形态把可机检那半彻底自动化,从而腾出人的全部注意力,去守那不可机检的另一半。

This amplification law has a precise boundary, and stating it clearly matters more than praising it. It acts only on the half of design that "can be bound by a spec" — alignment, spacing, tokens, accessibility, state-completeness. Once caught by the code form, that half gets hugely amplified, nearly free. But the other half — whether this version moves anyone, whether it is made for the people it is meant to serve, whether the pacing is right — cannot be written into any type system or lint rule. Treat design wholesale as a pure engineering problem and a hidden degeneration sets in: you unconsciously optimize only what you can measure, because the measurable gives feedback, runs in CI, shows up on a dashboard. So every machine-checkable metric scores full marks while the artifact is hollow — an interface with no findable flaw that no one wants to look at twice. That is the cost of "code as a stand-in for taste." The correct posture: let the code form fully automate the machine-checkable half, thereby freeing all of the human's attention to guard the half that cannot be machine-checked.

"只优化可测的"是一种隐蔽的退化——它会自动发生,除非你刻意防

"Optimize only the measurable" is a hidden degeneration — it happens automatically unless you guard against it

这条退化值得单独说,因为它不是某个人偷懒的结果,而是一种系统会自动滑向的状态。可测的东西有一个不公平的优势:它能给即时反馈、能进 CI、能在仪表盘上变成一条向上的线、能在汇报里被引用。不可测的东西(这版有没有打动人)则相反:反馈慢、主观、说不清、无法量化进度。当一个团队同时面对这两类目标,理性的注意力会不知不觉地向可测的那侧倾斜——不是因为有人决定不要品味了,而是因为可测的那侧持续地、廉价地给正反馈,而不可测的那侧总是沉默又昂贵。久而久之,团队就在不知不觉中把"好设计"重新定义成了"所有可测指标都过关",而那恰恰是空心 slop 的定义。防住这条退化需要刻意的反制:在流程里给不可测的判断留出受保护的时间与话语权(比如评审里强制有人回答"这版除了指标全过,它打动人了吗"),并明确承认"可测指标全过"只是必要条件而非充分条件——它说明没犯低级错误,不说明做出了好东西。把这条写进团队约定,是抵抗"工程化吞噬设计"的护栏。

This degeneration deserves its own treatment, because it is not the result of someone being lazy but a state the system slides into automatically. The measurable has an unfair advantage: it gives instant feedback, enters CI, becomes an upward line on a dashboard, gets cited in a report. The unmeasurable (does this version move anyone) is the opposite: slow feedback, subjective, hard to state, impossible to quantify as progress. When a team faces both kinds of goal at once, rational attention tilts imperceptibly toward the measurable side — not because anyone decided to drop taste, but because the measurable side continuously, cheaply gives positive feedback while the unmeasurable side is forever silent and expensive. Over time the team redefines, without noticing, "good design" as "all measurable metrics pass," which is exactly the definition of hollow slop. Guarding against this degeneration takes deliberate counter-measures: reserve protected time and standing for unmeasurable judgment in the process (for instance, mandate that someone in review answer "metrics aside, does this version move anyone"), and explicitly grant that "all measurable metrics pass" is only a necessary condition, not a sufficient one — it says no rookie mistakes were made, not that something good was made. Writing this into the team agreement is the guardrail against "engineering devouring design."
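"Necessary, not sufficient" can be encoded in a review gate so that green metrics alone never produce an accept. The following is only a sketch of one possible team convention, not a process this volume prescribes: the `Review` fields, the requirement of a named reviewer, and the verdict strings are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """Hypothetical review record combining machine checks and a human answer."""
    metrics_pass: bool                   # outcome of the automated, measurable checks
    moves_people: Optional[bool] = None  # human answer; None means nobody has answered yet
    reviewer: str = ""                   # who gave the human answer


def verdict(review: Review) -> str:
    """Measurable checks gate entry; only a named human answer can accept."""
    if not review.metrics_pass:
        return "blocked: fix machine-checkable errors first"
    if review.moves_people is None or not review.reviewer:
        return "pending: a named person must answer the human question"
    if review.moves_people:
        return "accepted"
    return "rejected: passes metrics, misses people"


print(verdict(Review(metrics_pass=True)))
print(verdict(Review(metrics_pass=True, moves_people=False, reviewer="lead designer")))
```

The only point of the structure is that there is no path from `metrics_pass=True` to "accepted" that skips the human field. Whether the answer recorded there is thoughtful is still up to the team.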

把这条边界正过来说,会得到一个让人安心的结论:代码形态非但不威胁设计师的核心价值,反而是在保护它。因为它把所有可机检的、机械的、重复的活——那些过去吞掉设计师大量时间、却最不需要人类判断的活——彻底交给了机器,从而把设计师的时间和注意力腾出来,专投到不可机检的那一半:理解用户、判断好坏、守住品味与意义。换句话说,代码形态做的是"把人不该做的事拿走",留下的恰恰是"只有人能做、也最值得人做的事"。担心"设计被工程吞掉"的人,往往把因果搞反了:被吞掉的不是设计,而是设计里本就该自动化的那部分;真正的设计——为人的判断——非但没被吞,还因为周围的杂活被清走而第一次有了充分施展的空间。所以正确的态度不是抵触代码形态,而是主动拥抱它来清场,然后把清出来的全部注意力,押在那条机器永远够不到的人本边界上。这就是 DSN 06 那条放大律最终指向的好消息:杠杆归机器,意义归人。

Turn this boundary right-side up and you reach a reassuring conclusion: the code form does not threaten the designer's core value; it protects it. It hands every machine-checkable, mechanical, repetitive job (the jobs that used to eat vast amounts of designer time yet least needed human judgment) entirely to the machine, freeing the designer's time and attention for the half that cannot be machine-checked: understanding users, judging good and bad, holding taste and meaning. In other words, what the code form does is "take away what people should not be doing," leaving exactly "what only people can do, and what is most worth their doing." Those who fear "design being devoured by engineering" often have the causality backwards: what gets devoured is not design but the part of design that should have been automated all along. Real design, meaning judgment made for people, is not devoured; with the chores cleared away, it gets room to fully unfold for the first time. So the right attitude is not to resist the code form but to embrace it actively to clear the ground, then bet all the freed attention on the human boundary the machine can never reach. This is the good news the DSN 06 amplification law ultimately points to: leverage to the machine, meaning to people.

DSN
07
WORKFLOW · AI 设计工作流
THE AI DESIGN WORKFLOW
工件 · 可拷贝
Artifact · Copyable

铺开候选,再收敛——一条可照做的环

Spread candidates, then converge: a loop you can run

"生成多稿 + 判断"听起来对,落不了地。把它做成一条有节拍的环:规格 → 铺开 → 评判 → 导向 → 收敛 → 沉淀。每一步标清交给生成还是留给人、上下文往哪流。这是工程那条 SDD 环在设计面的镜像,只是验的不是正确,是品味。

"Generate many, then judge" sounds right but does not land. Make it a loop with a cadence: spec → spread → critique → steer → converge → distill. Each step marks what goes to generation versus stays with people, and where context flows. It mirrors the engineering SDD loop on the design surface, only what is verified is taste, not correctness.

交给生成 · 铺面
Hand to generation · cover surface
  • ② 铺开:按规格批量出 6–12 个方向
  • ② Spread: 6–12 directions in a batch, on spec
  • ④ 导向后再生成:选中方向上铺变体
  • ④ Regenerate after steering: spread variants on the chosen direction
  • 补全状态/边角/响应式/文案
  • Fill states, edge cases, responsive, copy
留给人 · 品味与方向
Keep with humans · taste & direction
  • ① 规格:写下为谁、何为好、不要什么
  • ① Spec: for whom, what is good, what to avoid
  • ③ 评判:挑对路那版、说清为什么
  • ③ Critique: pick the on-target one, say why
  • ⑤⑥ 收敛与沉淀:定稿、把判断回流进系统
  • ⑤⑥ Converge & distill: ship, feed judgment back

上下文怎么流:规格(①)是喂给生成的护栏,决定②铺出来的候选是否偏离约束;③的"为什么好/差"不能只留在脑子里,要写成下一轮的提示与对系统的修订——判断必须回流,否则每轮都从均值重新开始。一条可拷贝的环:

How context flows: the spec (①) is the guardrail fed to generation, and it determines whether the candidates spread in ② stay within the constraints. The "why good / why weak" from ③ cannot stay in your head; write it into the next round's prompt and into a revision of the system. Judgment must flow back, or every round restarts from the mean. A loop you can copy:
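
One way to make the six steps concrete is a small driver in Python. This is a minimal sketch under assumptions, not a prescribed implementation: `Spec`, `Verdict`, `run_loop`, and the two callbacks are hypothetical names introduced here. `generate` stands in for whatever generation tool you use, and `critique` stands in for the human review step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Spec:
    for_whom: str                      # ① written by a human
    character: str
    avoid: List[str]
    learned_criteria: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    chosen: str            # the candidate the human picked in ③
    why: str               # the stated reason, in the human's words
    next_instruction: str  # ④ the steer for the next round


# Hypothetical callback shapes: generation is the machine, critique is the human.
Generate = Callable[[Spec, str], List[str]]
Critique = Callable[[Spec, List[str]], Verdict]


def run_loop(spec: Spec, generate: Generate, critique: Critique,
             rounds: int = 2) -> Tuple[str, Spec]:
    """Run spec -> spread -> critique -> steer -> converge -> distill."""
    instruction = "spread mutually distinct directions"
    chosen = ""
    for _ in range(rounds):
        candidates = generate(spec, instruction)   # ② machine: spread or regenerate
        verdict = critique(spec, candidates)       # ③ human: pick and say why
        chosen = verdict.chosen
        instruction = verdict.next_instruction     # ④ human: steer
        spec.learned_criteria.append(verdict.why)  # ⑥ judgment flows back into the spec
    return chosen, spec                            # ⑤ settled version plus the revised spec
```

The point of the sketch is the last line of the loop body: the stated reason is appended to the spec, so the next call to `generate` starts from a spec that already carries this round's judgment.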

②铺开的关键是"方向多样",不是"数量多"

The point of ② spread is "directional diversity," not "high count"

有一个细微但决定成败的区别:铺开 12 个候选,和铺开 12 个方向,是两件完全不同的事。前者常常是同一个想法的 12 次微调——换个色、挪个间距、改个字号——它们挤在审美空间的同一个点附近,给人的判断提供不了任何有意义的对比,只会制造"选择的错觉"。后者是 12 条真正不同的假设:极简文档风 vs 终端美学 vs 杂志排版 vs 卡片流……每一条代表一个对"该怎么为这群人做"的不同回答。判断力只有在面对真正的差异时才有用武之地——你能说出"A 方向的克制更命中这群用户的工程感,B 方向太热闹",这是判断;而在 12 个微调里挑一个,你只是在表达偏好。所以②的指令不该是"多生成几个",而该是"生成几个互不相同的方向",并且在提示里显式要求方向间的差异度。这也解释了 DSN 11 那条反指标——不停"再来一个"却说不出在找什么——的本质:那是在数量轴上空转,而非在方向轴上探索。

There is a subtle but decisive distinction: spreading 12 candidates and spreading 12 directions are entirely different things. The former is often 12 tweaks of the same idea — a different color, a shifted margin, a changed size — crowded near the same point in aesthetic space, offering judgment no meaningful contrast and only manufacturing "the illusion of choice." The latter is 12 genuinely different hypotheses: minimal-docs vs terminal aesthetic vs magazine typography vs card flow… each a different answer to "how should this be made for these people." Judgment has work to do only when facing real difference — being able to say "direction A's restraint hits this audience's engineering sensibility better, direction B is too busy" is judgment; picking one of 12 tweaks is only expressing a preference. So ②'s instruction should not be "generate a few more" but "generate a few mutually distinct directions," with the diversity between directions demanded explicitly in the prompt. This also explains the essence of DSN 11's counter-signal — endless "one more" with no statement of what you seek: that is spinning on the count axis rather than exploring on the direction axis.
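
The difference between "12 candidates" and "12 directions" can be made checkable in a rough way. The sketch below assumes each candidate is tagged with a few coarse attributes; the axis names in `DIRECTION_AXES` and the `min_directions` threshold are hypothetical choices for illustration, and a real team would pick its own. Candidates that differ only on fine details (an accent color, a margin) collapse into one direction.

```python
from typing import Dict, List, Tuple

Candidate = Dict[str, str]

# Hypothetical coarse axes that define a "direction"; fine tweaks are ignored.
DIRECTION_AXES: Tuple[str, ...] = ("layout", "type_voice", "metaphor")


def direction_key(candidate: Candidate) -> Tuple[str, ...]:
    """Reduce a candidate to its coarse direction, dropping fine details."""
    return tuple(candidate.get(axis, "") for axis in DIRECTION_AXES)


def count_directions(candidates: List[Candidate]) -> int:
    """Count genuinely different directions, not the number of candidates."""
    return len({direction_key(c) for c in candidates})


def is_real_spread(candidates: List[Candidate], min_directions: int = 4) -> bool:
    """True when the batch explores enough distinct directions."""
    return count_directions(candidates) >= min_directions
```

A count like this cannot say which direction is on-target; it only flags a batch that is spinning on the count axis, so that human judgment is spent on real contrast.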

这条环必须闭合——判断不回流,每轮都从均值重启

The loop must close — without feedback, every round restarts from the mean

很多团队把这套做成了一条开环:规格→铺开→挑一个→发版,下一次又从空白开始。开环的致命处在于,第③步那些宝贵的判断——"这版为什么好、那版差在哪、我其实在找什么"——全留在了设计师的脑子里,没有变成任何可复用的东西。于是生成每一轮都从训练分布的均值重新出发,护栏永远停在第一天的水平。这正是为什么环里⑥沉淀这一步是承重的:它把人这一轮做出的判断外化成下一轮的规格修订和系统更新,让护栏带着判断一起长。闭环与开环的差别,不是多走一步流程,而是判断到底有没有复利——开环里每次判断用完即弃,闭环里每次判断都垫高了起点。这与工程卷"上下文成为可查询基设"是同一道:判断只有被写下来、回流进系统,才从一次性的灵光变成团队的资产。

Many teams build this as an open loop: spec → spread → pick one → ship, and next time start from blank again. The fatal flaw of the open loop is that the precious judgment of step ③ — "why this version is good, where that one fails, what I am actually looking for" — all stays in the designer's head and never becomes anything reusable. So generation restarts every round from the mean of the training distribution, and the guardrails stay frozen at day-one quality. This is exactly why the ⑥ distill step is load-bearing: it externalizes the judgment a human made this round into the next round's spec revisions and system updates, letting the guardrails grow alongside the judgment. The difference between a closed and an open loop is not one extra process step but whether judgment compounds — in the open loop each judgment is used once and discarded; in the closed loop each one raises the floor. This is the same line as the engineering volume's "context becomes queryable infrastructure": judgment becomes a team asset, not a one-off spark, only when it is written down and flows back into the system.

核心图 KEY FIG · FIG. D5.0 / THE AI DESIGN LOOP · 品味是验证器 (taste is the verifier)
[Figure: the AI design loop. At the center, taste is the verifier (human, not outsourceable). Four stations surround it: ① SPEC, write what is good (human: for whom, tone, red lines); ② SPREAD, 6–12 directions (machine, near-free); ③ CRITIQUE, pick and say why (human: taste judges here); ④⑤⑥ STEER → DISTILL, steer, converge, distill (human: feed judgment back). A load-bearing feedback arrow carries judgment back to revise the spec. How to read it: comp → variants → critique → converge; what closes the loop is judgment flowing back.]
外圈是工作流的四个工位,沿顺时针流转。真正让它从"开环堆 slop"变成"闭环长品味"的,是中心那个验证器(人的品味)和左侧那条红色回流箭头——它把③评判出的判据写回①规格。没有这条回流,环只是一台更快的 slop 印刷机;有了它,护栏每转一圈都更准。这与工程卷的 SDD 环同构,只是验证器从"正确"换成了"品味"。
The outer ring holds the workflow's four stations, flowing clockwise. What turns it from "an open loop piling up slop" into "a closed loop growing taste" is the verifier at the center (human taste) and the red feedback arrow on the left, which writes the criteria surfaced in ③ critique back into ① spec. Without that feedback the loop is just a faster slop press; with it, the guardrails sharpen each turn. The loop is isomorphic to the engineering volume's SDD loop, except that the verifier is swapped from "correctness" to "taste."

这条环和工程的 SDD 环为什么是同一条——以及关键的那一处不同

Why this loop and engineering's SDD loop are the same — and the one crucial difference

工程卷讲过一条 SDD(spec-driven development)环:人写规格 → agent 生成实现 → 人验证 → 不对就回去修规格再生成。把它和这里的设计环并排看,结构完全一致:都是"人定标准 → 机器生成 → 人验证 → 判断回流修标准"的闭环,都靠中心那个不可外包的验证器把生成约束在意图内。这种一致不是巧合,而是因为两者都是同一条内核在不同面的实现——执行充裕之后,人退到"定标准 + 验证"这两个判断节点上。但有一处关键的不同,必须说清,否则会把设计错当成工程:工程环里的验证器验的是正确性,它在原则上可被自动化测试逼近——一个实现要么通过测试要么不通过,边界相对清晰;而设计环里的验证器验的是品味,它在原则上就不可被完全自动化,因为"对这群人是否对路"没有一个可被测试用例固定的真值。这条不同决定了:工程里人验证的占比会随着测试覆盖率提高而下降,而设计里人验证的占比有一个不会归零的下限——那个下限正是品味,正是 ④ 的常驻人口。把这条记牢,就不会犯"既然能自动化测试,设计是不是也能全自动验收"的错。

The engineering volume described an SDD (spec-driven development) loop: humans write the spec → agents generate the implementation → humans verify → if wrong, go back, fix the spec, regenerate. Place it beside the design loop here and the structure is identical: both are closed loops of "humans set the standard → machine generates → humans verify → judgment flows back to fix the standard," and both rely on the non-outsourceable verifier at the center to keep generation within intent. This identity is no coincidence: both are implementations of the same kernel on different faces. Once execution becomes abundant, people retreat to the two judgment nodes, "set the standard" and "verify." But there is one crucial difference that must be made clear, or design gets mistaken for engineering. The verifier in the engineering loop checks correctness, which in principle can be approached by automated tests: an implementation either passes the tests or it does not, a relatively clear boundary. The verifier in the design loop checks taste, which in principle cannot be fully automated, because "on-target for these people or not" has no ground truth that a test case can pin down. The consequence: in engineering the share of human verification falls as test coverage rises, while in design it has a floor that never reaches zero. That floor is precisely taste, the permanent resident of stage ④ in this volume's opening chain (people return to "for whom, and what is good"), not step ④ of the loop. Hold this firmly and you will not fall into the error of asking "since tests can be automated, can design acceptance be fully automated too?"

关于这条环,最后值得提醒一个节奏上的常见错误:把六步压成三步偷偷跑。压力之下,人很容易把"①规格→②铺开→③评判→④导向→⑤收敛→⑥沉淀"压缩成"随便提个示→挑一个→发版"——省掉了写规格、省掉了说清判据、省掉了判断回流。每一步被省掉,都对应一种已经讲过的失败:省①规格=生成从均值起步(滑向 slop);省③的"说清为什么"=评判退化成凭感觉投票(无法导向);省⑥沉淀=判断不复利(每轮从零开始)。这条环的价值恰恰在于它逼你不跳步:六步不是流程繁文,而是六个各自承重的检查点,每一个都对应一处人类判断必须显式发生的地方。所以照做这条环最大的好处,不是"按流程走显得专业",而是它用结构强迫你在每个该判断的地方真的判断了一次——这正是从"用更快的手"到"用更准的判断"那一步迁移,在日常操作层面的具体抓手。

On this loop, one last reminder about a common rhythm mistake: compressing six steps into three and running them on the sly. Under pressure, people easily compress "① spec → ② spread → ③ critique → ④ steer → ⑤ converge → ⑥ distill" into "throw out a prompt → pick one → ship" — skipping writing the spec, skipping stating the criteria, skipping judgment feedback. Each skipped step corresponds to a failure already discussed: skip ① spec = generation starts from the mean (slides to slop); skip ③'s "say why" = critique degenerates into voting by feel (cannot steer); skip ⑥ distill = judgment does not compound (every round starts from zero). The value of this loop is precisely that it forces you not to skip: the six steps are not process red tape but six individually load-bearing checkpoints, each corresponding to a place where human judgment must explicitly happen. So the biggest benefit of running this loop is not "looking professional by following process" but that its structure forces you to actually judge once at each place a judgment is due — which is exactly the day-to-day handle for that migration from "a faster hand" to "more accurate judgment."

有效 / 失效的信号
Right / wrong signals

先行指标:每轮铺开的方向真有差异(非一版的微调);评判时能逐条指名判据;判断回流后下一轮命中率上升。反指标:不停"再生成一个"却说不出在找什么,那是用生成代替判断,环空转,只会更快地堆出同质化候选。

Leading: each spread holds genuinely distinct directions (not tweaks of one); you can name the criteria hit per candidate; hit-rate rises next round after judgment flows back. Counter: endless "generate another" with no statement of what you are looking for. That substitutes generation for judgment, the loop spins, and you only pile up sameness faster.

DSN
07·5
CASE · 一遍完整的环
A WORKED LOOP
案例 · 走一遍
Case · walked through

把抽象的环,在一个真实需求上走一遍

Walking the abstract loop through one real brief

前面的环和规格都讲了原理,这里在一个具体需求上走一遍——一个面向独立开发者的工具落地页——让"规格→铺开→评判→导向→收敛→沉淀"变成可对照的动作,也让"好规格的判别力"和"判断回流"看得见。这是方法的自我检验:若一套方法连一个最常见的需求都走不顺,它就不该被相信。

The loop and the spec above gave the principle; here we walk it through one concrete brief, a landing page for a tool aimed at solo developers. The walk turns "spec → spread → critique → steer → converge → distill" into actions you can check against, and makes "a good spec's discriminating power" and "judgment flowing back" visible. This is the method testing itself: a method that cannot handle even the most common brief does not deserve trust.

① 规格。先写下不可外包给生成的人类判断:FOR-WHOM=独立开发者,在焦虑找工具、注意力极短的处境里,要在 30 秒内判断"这是否为我";CHARACTER=克制、工程感、可信,明确不要欢快插画、不要科技蓝渐变;DONE-WHEN=目标用户一眼认出自己、陌生人问"怎么做的"而非"哪个 AI 做的";HARD-RULES=只用系统 token、对比度≥4.5:1、无渐变文字、字体白名单排除 Inter/Roboto。② 铺开。把规格喂给生成,一次出 8 个方向(不是一个的微调):有的走极简文档风、有的走终端/代码美学、有的走杂志排版、有的仍滑回了默认的玻璃拟态。注意:哪怕给了红线,总有几版会漏过——这正说明硬约束需要被写成 lint 在收敛时强制跑,而不能只靠提示里写一句。

① Spec. First write the human judgments that cannot be outsourced to generation: FOR-WHOM = solo developers, anxious and tool-hunting with very short attention, who must judge "is this for me" within 30 seconds; CHARACTER = restrained, engineering-grade, trustworthy, explicitly no cheerful illustration, no tech-blue gradient; DONE-WHEN = the target user recognizes themselves at a glance, a stranger asks "how was this made" not "which AI made it"; HARD-RULES = system tokens only, contrast ≥ 4.5:1, no gradient text, typeface whitelist excluding Inter/Roboto. ② Spread. Feed the spec to generation and produce 8 directions at once (not tweaks of one): some go minimal-docs, some terminal/code aesthetic, some magazine typography, some still slide back to the default glassmorphism. Note: even with red lines given, a few versions slip through — which is exactly why hard constraints must be written as lint and forced to run at convergence, not left to one line in the prompt.

③ 评判。关键不是"挑出最好看的",而是逐条对规格说清命中/落空:终端美学那版命中了 CHARACTER 的"工程感"和 FOR-WHOM 的"一眼认出自己",但首屏信息密度过高,违背了"30 秒内判断";杂志排版那版气质对、可信感强,但太重、加载慢。说不出这些"为什么",就还没在判断、只是在挑。④ 导向。把评判变成下一轮的具体指令:"在终端美学方向上深化,但首屏只留一句价值主张 + 一个真实代码片段,密度降一半。"——只在选中方向上再生成,不重开八个。⑤ 收敛。定一版,跑 HARD-RULES 的 lint:对比度过、token 过、渐变文字零、字体过。⑥ 沉淀。把这轮新学到的判据——"独立开发者落地页首屏密度上限""真实代码片段比抽象插画更命中可信感"——写回设计系统的规格库,下一个类似需求的起点就抬高了。这一步是闭环与开环的唯一差别。

③ Critique. The key is not "pick the prettiest" but to state, item by item against the spec, what hits and what misses: the terminal-aesthetic version hits CHARACTER's "engineering-grade" and FOR-WHOM's "recognize yourself at a glance," but its above-the-fold information density is too high, violating "judge within 30 seconds"; the magazine-typography version has the right character and strong trust, but is too heavy and loads slowly. If you cannot state these "whys," you are not judging yet, only picking. ④ Steer. Turn the critique into the next round's concrete instruction: "deepen the terminal-aesthetic direction, but above the fold keep only one value proposition + one real code snippet, halve the density." Regenerate only on the chosen direction; do not reopen eight. ⑤ Converge. Settle one version and run the HARD-RULES lint: contrast passes, tokens pass, zero gradient text, typeface passes. ⑥ Distill. Write the round's new criteria ("above-the-fold density ceiling for solo-developer landing pages," "a real code snippet hits trust better than abstract illustration") back into the design system's spec library, so the starting point for the next similar brief is raised. This step is the only difference between a closed and an open loop.
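
Of the HARD-RULES that run at ⑤, the contrast rule is the easiest to show as code. The sketch below computes the contrast ratio from the WCAG 2.x relative-luminance definition and compares it with the 4.5:1 threshold from the spec. The function names and the `#rrggbb` input format are assumptions made for this illustration; a real pipeline would read colors from the design tokens.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_contrast(fg: str, bg: str, minimum: float = 4.5) -> bool:
    """Hard rule from the spec: contrast must be at least 4.5:1."""
    return contrast_ratio(fg, bg) >= minimum
```

A check like this returns a definite pass or fail from the artifact alone, which is what qualifies it for the hard layer. It says nothing about whether the page moves the people it is for.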

为什么这一遍能推广——它没用任何落地页特有的东西

Why this walk generalizes — it used nothing specific to landing pages

这个案例用的是落地页,但注意:上面六步里,没有一步依赖"它是落地页"这个事实。把对象换成一个移动 App 的引导流、一份数据报表、一套图标系统、甚至一段产品视频,六步的结构原样成立——只有 FOR-WHOM 的内容、HARD-RULES 的具体阈值、③评判时对照的判据会换,而"先写有判别力的规格→铺真正不同的方向→逐条说清判据命中→只在选中方向上深化→收敛跑硬约束→把判断回流"这个骨架不变。这正是判断一套方法是不是抓到了底层结构的试金石:它的步骤能不能在不改结构的前提下,换一个对象重跑一遍。能,说明它抓的是受力;只对某一类对象成立,说明它抓的是表象。这个落地页案例之所以放在这里,不是因为落地页特别重要,而是因为它最常见、最容易被读者拿自己手上的活去对照——你完全可以现在就把它换成你正在做的东西,走一遍这六步,看卡在哪一步。卡住的那一步,往往就是你现在最该补的能力。

This case uses a landing page, but note: among the six steps above, not one depends on the fact that it is a landing page. Swap the object for a mobile app's onboarding flow, a data report, an icon system, even a product video, and the six-step structure holds intact — only the content of FOR-WHOM, the specific thresholds of HARD-RULES, and the criteria checked against in ③ change, while the skeleton "write a discriminating spec → spread genuinely different directions → state criteria hits item by item → deepen only the chosen direction → converge and run hard constraints → feed judgment back" stays fixed. This is exactly the touchstone for whether a method has caught the underlying structure: whether its steps can rerun on a different object without changing the structure. If yes, it caught the force; if it holds only for one kind of object, it caught the appearance. This landing-page case sits here not because landing pages are especially important but because they are most common and easiest for a reader to check against their own work — you could right now swap it for whatever you are making, walk the six steps, and see where you get stuck. The step you get stuck on is often the capability you most need to build next.

这一遍里有一个细节值得单独拎出来,因为它最容易被跳过、却最决定成败:③评判时"说清为什么"这个动作,不是评判的修饰,而是评判的本体。很多人以为评判就是"挑出那个最好的",挑完就完事;但如果你说不出"它为什么对路、落选的那些差在哪条判据上",你做的就不是评判,而是凭感觉投票。这个区别有实在的后果:说不出理由,④导向就无从下手(你不知道该让生成往哪个具体方向深化),⑥沉淀也无从谈起(你没有可回流的判据)。所以"说清为什么"这个看似多余的口头动作,其实是把一次性的直觉判断转化成可导向、可回流、可复用的结构化判断的关键一步。一个简单的自律:每次挑完候选,强迫自己写下三句话——这版命中了哪条判据、落选的主要差在哪、下一轮该往哪个方向深化。写不出这三句,说明你还停在"挑",没进到"判"。这一条自律,几乎是整条闭环能不能真正闭合的开关。

One detail in this walk deserves to be pulled out on its own, because it is the easiest to skip yet the most decisive: the act of "saying why" in ③ critique is not an ornament of judgment but its very substance. Many assume critique is "pick the best one" and you are done; but if you cannot say "why it is on-target, on which criterion each reject falls short," what you did is not critique but voting by feel. This distinction has real consequences: with no reason stated, ④ steer has no handle (you do not know which concrete direction to deepen), and ⑥ distill is out of the question (you have no criteria to feed back). So the seemingly redundant verbal act of "saying why" is in fact the key step that converts a one-off intuitive judgment into a steer-able, feed-back-able, reusable structured judgment. A simple discipline: after picking a candidate, force yourself to write three sentences — which criterion this version hits, where the main reject falls short, which direction to deepen next round. If you cannot write these three, you are still at "picking," not yet at "judging." This one discipline is nearly the switch for whether the whole loop actually closes.
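
The three-sentence discipline can be enforced by the shape of the record you keep. A minimal sketch, assuming critique notes are stored as structured data; `CritiqueNote` and its field names are hypothetical. The record refuses to exist unless all three sentences are written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CritiqueNote:
    hit: str     # which criterion the chosen version hits
    miss: str    # where the main reject falls short
    deepen: str  # which direction to deepen next round

    def __post_init__(self) -> None:
        # An empty sentence means the reviewer is still picking, not judging.
        for name in ("hit", "miss", "deepen"):
            if not getattr(self, name).strip():
                raise ValueError(f"'{name}' is empty: still picking, not judging")
```

For the landing-page walk, a valid note might read: `CritiqueNote(hit="terminal version hits the engineering feel", miss="magazine version is too heavy and loads slowly", deepen="terminal direction, half the above-the-fold density")`.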

这一遍验证了什么
What this walk verifies

注意全程:生成做了所有"做出来"的活(铺 8 个方向、深化、补全),人只做了三件机器做不了的事:把规格写得有判别力、逐条说清判据命中、把判断回流进系统。产出量不是这一遍的功劳,命中率才是:第二轮就收敛,是因为第一轮的判断没有白费。这就是 DSN 03→07 在一个真实需求上的合一。

Note throughout: generation did all the "making" (spread 8 directions, deepen, fill in), and the human did only the three things machines cannot: write a discriminating spec, state criteria hits item by item, feed judgment back into the system. Output volume is not the win of this walk; hit-rate is. It converged on the second round because the first round's judgment was not wasted. This is DSN 03 through 07 coming together on one real brief.

DSN
08
SPEC · 何为好的规格
THE SPEC OF GOOD
工件 · 模板
Artifact · Template

把"好"写成可生成、半可机检的规格

Write "good" as a spec that is generatable and half machine-checkable

DSN 04 立了规格该含什么,这里给可拷贝的样例,并把它切成两层:可机检的硬约束(token / 对齐 / 对比度 / 触达尺寸——能写成 lint 规则,并入①充裕)与构成性的软判据(为谁、气质、何为完成——只能由人评,留在④)。好的规格让生成往对处收敛,而不把品味假装成可计算。

DSN 04 set what a spec must hold; here is a copyable sample, split in two layers: hard constraints that machine-check (token / alignment / contrast / hit-target, writable as lint rules, folded into ① abundance) and constitutive soft criteria (for-whom, character, what-counts-as-done, judged only by people, kept at ④). A good spec makes generation converge toward good without pretending taste is computable.

为什么要分两层:把软判据硬塞进 lint,会得到对齐完美却没灵魂的 slop;把硬约束留给人眼盯,会把人的注意力耗在机器该管的事上。分层后,机器守住"不离牌",人专注守"是否为人"。一份可拷贝的规格骨架(贴进 repo 的 design-spec.md 即可喂生成):

Why split the layers: forcing soft criteria into lint yields pixel-perfect, soulless slop; leaving hard constraints to the human eye burns attention on what a machine should own. Split, the machine holds "stay on-brand" while the human guards "is it for people." A copyable spec skeleton (drop it into a repo's design-spec.md to feed generation):
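
One possible shape for that skeleton, sketched in Python so the two layers stay separate and the file can be regenerated whenever the spec is revised. The four labels FOR-WHOM, CHARACTER, DONE-WHEN, and HARD-RULES and their contents come from the landing-page walk in DSN 07·5. The `DesignSpec` class, the AVOID label, and the Markdown layout are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DesignSpec:
    for_whom: str          # soft: judged by people
    character: str         # soft
    done_when: List[str]   # soft
    avoid: List[str]       # soft: the "what not to be" list (hypothetical label)
    hard_rules: List[str]  # hard: meant to become lint / CI checks

    def to_markdown(self) -> str:
        """Render the spec as text suitable for a design-spec.md file."""
        lines = [
            "# design-spec",
            "",
            "## Soft criteria (judged by people)",
            f"FOR-WHOM: {self.for_whom}",
            f"CHARACTER: {self.character}",
            "DONE-WHEN:",
            *[f"- {item}" for item in self.done_when],
            "AVOID:",
            *[f"- {item}" for item in self.avoid],
            "",
            "## Hard constraints (machine-checked)",
            "HARD-RULES:",
            *[f"- {rule}" for rule in self.hard_rules],
        ]
        return "\n".join(lines) + "\n"


LANDING_PAGE_SPEC = DesignSpec(
    for_whom="solo developers, anxious and tool-hunting, who must judge "
             "'is this for me' within 30 seconds",
    character="restrained, engineering-grade, trustworthy",
    done_when=[
        "the target user recognizes themselves at a glance",
        "a stranger asks 'how was this made', not 'which AI made it'",
    ],
    avoid=["cheerful illustration", "tech-blue gradient"],
    hard_rules=[
        "system tokens only",
        "contrast >= 4.5:1",
        "no gradient text",
        "typeface whitelist excludes Inter and Roboto",
    ],
)
```

Calling `LANDING_PAGE_SPEC.to_markdown()` yields the text to drop into design-spec.md. Keeping `hard_rules` as its own field makes it easy to hand that list to lint while the soft fields go to the reviewer.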

分诊判据
Triage test

某条约束该进硬层还是软层?问一句:"无需理解用户,仅看产物文本就能判定对错吗?"能 → 硬层(lint / CI,并入①);不能 → 软层(留给人判,留在④)。把这条问句对每条规格走一遍,就得到一份不假装品味可计算的规格。

Hard layer or soft? Ask: "can this be ruled right or wrong from the artifact text alone, without understanding the user?" Yes → hard (lint / CI, folded into ①); no → soft (judged by people, kept at ④). Run every constraint through this question and you get a spec that does not pretend taste is computable.

一份好规格的标志:它能让别人(或 agent)替你做出"你会认的"东西

The mark of a good spec: it lets someone else (or an agent) make something "you would sign off on"

怎么判断一份设计规格写得够不够好?有一个干脆的操作性判据:把它交给一个没和你聊过、不在你脑子里的人或 agent,让对方据此生成;产物回来,你认不认?认,说明规格把你脑中那把尺真的外化出来了;不认,说明判据还藏在你的直觉里,没写下来——而藏在直觉里的标准,生成既学不到也评不了。这条判据之所以有用,是因为它把"规格质量"从一个主观感受变成了一次可重复的实验:换三个不同的人/agent 跑同一份规格,若产出彼此接近且都在你的可接受带内,规格就收敛了;若产出四散,说明规格里还有大量留白被各自的均值填上了。好规格不是写得详尽,而是写得有判别力——它能把"你要的"和"看起来差不多但不对的"区分开。

How do you tell whether a design spec is good enough? There is a blunt operational test: hand it to a person or agent who has never talked with you and is not inside your head, have them generate from it; when the artifact comes back, do you sign off? If yes, the spec really externalized the ruler in your head; if no, the criteria are still hiding in your intuition, unwritten — and a standard that lives in intuition is one generation can neither learn nor be judged against. This test is useful because it turns "spec quality" from a subjective feeling into a repeatable experiment: run the same spec past three different people/agents, and if their outputs are close to each other and all inside your acceptance band, the spec has converged; if outputs scatter, there is still a lot of blank in the spec that each filled with its own mean. A good spec is not exhaustive but discriminating — it tells "what you want" apart from "what looks about right but is wrong."

这与验证篇"人定何为对"是同一道工序的两个面:验证那边写的是正确性的判据(可被测试用例固定),设计这边写的是"好"的判据(一半可机检、一半只能由人认)。两边共享一个深层结构:当执行变充裕,"判据"本身成了不可外包的人类产物。模型可以生成无穷多候选,但"按什么标准收"必须由人给定;给不出标准,就只能默认收到均值。所以写规格不是流程文档工作,而是设计师在 AI-Native 时代最核心的动作之一:它就是把"判断"这件稀缺的事,沉淀成可复用、可回流、可喂给生成的资产。

This and the Verification chapter's "humans define what's right" are two faces of the same operation: verification writes the criteria of correctness (which test cases can pin down); design writes the criteria of good (half machine-checkable, half acceptable only by a human). Both share a deep structure: once execution becomes abundant, the "criteria" themselves become the non-outsourceable human artifact. A model can generate arbitrarily many candidates, but "by what standard to converge" must be supplied by a human; supply no standard and you default to converging on the mean. So writing the spec is not process-documentation busywork but one of the designer's most central moves in the AI-Native era: it is precisely the act of distilling the scarce thing, judgment, into an asset that can be reused, fed back, and fed to generation.

规格的两个反面:写太死,和写太空

A spec's two failure modes: over-pinned, and over-empty

写规格有两个对称的失败,理解它们能帮你找到那条窄路。写太死:把每个像素、每个色值、每个间距都规定死,本质上是用文字重新画了一遍稿——这既没利用生成的探索能力(你已经把答案写死了,生成只剩复刻),也把本该留给方向探索的②铺开压成了零自由度。更隐蔽的是,写太死往往是把软判据误当硬约束的产物:你以为自己在写规格,其实在写一份固执的个人偏好,堵死了所有可能更好的方向。写太空:只写"做个现代、专业、好用的界面",等于什么都没说——这些词对每个项目都成立,因此对这个项目毫无判别力,生成只能回你均值。这是把规格写成了正确的废话。那条窄路是:硬约束写到可机检的精确(这一侧越死越好,因为它本就该自动化),软判据写到有判别力但不限定具体形态(写清"为谁、什么气质、什么算完成、不要什么",但把"具体长什么样"留给②去探索、③去判断)。一句话:规格要锁死目标,敞开路径。

Writing a spec has two symmetric failures, and understanding them helps you find the narrow path. Over-pinned: nailing down every pixel, color value, and margin is essentially redrawing the comp in words — it neither uses generation's exploratory power (you have already fixed the answer, leaving generation only to replicate) nor leaves the ② spread any degrees of freedom for directional exploration. More insidiously, over-pinning is often the product of mistaking soft criteria for hard constraints: you think you are writing a spec but are writing a stubborn personal preference that blocks every potentially better direction. Over-empty: writing only "make a modern, professional, usable interface" is saying nothing — these words hold for every project and therefore have no discriminating power for this one, so generation can only return you the mean. This writes the spec as correct nonsense. The narrow path is: write hard constraints to machine-checkable precision (the more fixed this side the better, since it should be automated anyway), and write soft criteria to be discriminating without fixing concrete form (spell out "for whom, what character, what counts as done, what to avoid," but leave "what it concretely looks like" to ② to explore and ③ to judge). In a phrase: a spec locks the target and opens the path.

最有判别力的一项,往往是"明确不要什么"

The most discriminating item is often "explicitly what not to be"

在所有规格项里,有一项的判别力被严重低估:明确写下"不要什么"。正面描述"要什么"很容易写成正确的废话——"要现代、要专业、要好用"对任何项目都成立,因此对收敛几乎没用。但反面的"不要什么"天然带判别力,因为它直接对准了生成最可能滑向的那个均值。写下"不要科技蓝渐变、不要欢快插画、不要玻璃拟态、不要把它做成又一个 SaaS dashboard",等于在生成出发之前就把那条通往 slop 的最宽的路堵死了。这背后的机制是:生成的默认就是均值,而"不要什么"恰恰是在描述均值的形状——你越清楚自己要避开的那个默认长什么样,你的规格就越能把生成推离它。所以一份好规格里,"不要"清单往往比"要"清单更有信息量、更省下游的判断成本。这也和 DSN 09 的反 slop 红线接上了:那张红线表,本质上就是一份通用的、可机检的"不要什么"——把它纳进每个项目的规格,你就免费获得了一道挡住最常见 slop 的护栏。

Among all spec items, one has badly underrated discriminating power: explicitly writing down "what not to be." A positive description of "what to be" easily becomes correct nonsense — "be modern, be professional, be usable" holds for any project and is therefore almost useless for convergence. But the negative "what not to be" carries discriminating power by nature, because it aims directly at the mean generation is most likely to slide toward. Writing "no tech-blue gradient, no cheerful illustration, no glassmorphism, do not make it yet another SaaS dashboard" blocks, before generation even sets out, the widest road to slop. The mechanism behind this: generation's default is the mean, and "what not to be" is precisely describing the shape of the mean — the clearer you are about what the default you want to avoid looks like, the more your spec can push generation off it. So in a good spec, the "not" list is often more informative than the "to be" list and saves more downstream judgment cost. This connects to DSN 09's anti-slop red lines: that red-line table is essentially a general, machine-checkable "what not to be" — fold it into every project's spec and you get, for free, a guardrail against the most common slop.

还要厘清一个关于"半可机检"的常见误解:这个"半"字不是说有一半判据模棱两可、说不清,而是说判据在被分类之后,恰好一半能交给机器、一半必须留给人,且两边都各自清晰。硬约束那半(对比度、token、触达尺寸)清晰到可以写成断言、跑进 CI、给出确定的通过/不通过;软判据那半(为谁、气质、是否打动人)也可以清晰——清晰到能写下"目标用户一眼认出自己""陌生人问怎么做的而非哪个 AI"这样具体到可被人验收的信号——只是它的验收者是人而非机器。所以"半可机检"是一种诚实的精确:它既不假装品味能被算法判定(那是把软的硬塞),也不把可机检的部分推给人眼盯(那是浪费判断力)。一份好规格的高级之处,恰恰在于它清楚地知道自己的每一条该落在哪半,并据此把对的那半交给对的判定者。这种"知道什么该交给机器、什么必须留给人"的清醒,本身就是这套方法论在规格这个具体工件上的体现。

One common misunderstanding about "half machine-checkable" also needs clearing up: the "half" does not mean half the criteria are ambiguous and unstatable, but that once criteria are classified, exactly half can go to the machine and half must stay with people, with both sides clear in their own way. The hard-constraint half (contrast, tokens, hit-target) is clear enough to write as assertions, run in CI, return a definite pass/fail; the soft-criteria half (for-whom, character, whether it moves anyone) can also be clear — clear enough to write down signals concrete enough for a human to accept against, like "the target user recognizes themselves at a glance," "a stranger asks how it was made, not which AI" — only its acceptor is a human, not a machine. So "half machine-checkable" is an honest precision: it neither pretends taste can be algorithmically decided (forcing the soft into the hard) nor pushes the machine-checkable part to human eyes (wasting judgment). The sophistication of a good spec lies precisely in knowing clearly which half each of its items belongs to, and handing the right half to the right adjudicator accordingly. This clear-headedness — knowing what to give the machine and what must stay with people — is itself this methodology made manifest on the concrete artifact of the spec.

DSN
09
ANTI-SLOP · 异质审美的守护
ANTI-SLOP & HETEROGENEITY
机理 · 失效
Mechanism · Failure

slop 是同质化,解药是"只对这群人成立"

Slop is homogenization; the cure is "true only for these people"

slop 的机理:生成默认收敛到训练分布的均值,均值就是"所有人见得最多的样子"——所以 slop 的本质是同质化,而非粗制。它的指纹可枚举(下表)。而避开它不能靠"做得更精",要靠把审美钉在一群具体的人身上:好的设计往往只对某群人成立,对所有人都"还行"恰恰是滑回均值的征兆。

The slop mechanism: generation defaults to the mean of its training distribution, and the mean is "what everyone has seen most," so slop is homogenization, not poor craft. Its fingerprints are enumerable (table below). And escaping it is not about "more polish" but about pinning the aesthetic to a specific group of people: good design is often true only for some group; being "fine" for everyone is exactly the sign of sliding back to the mean.

受力分析:模型为最小化期望损失,会偏向最高频的视觉模式——青配深底、紫蓝渐变、玻璃拟态、Inter 居中、巨数字仪表盘。它们之所以是 slop,不因为丑,而因为到处都是、谁也不为。下表把它们做成可机检的红线条目(接 DSN 08 的 HARD-RULES),命中即扣分:

Force analysis: to minimize expected loss, a model leans toward the highest-frequency visual patterns: cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards. They are slop not because they are ugly but because they are everywhere and for no one. The table below turns them into machine-checkable red-line items (feeding DSN 08's HARD-RULES); each hit costs points:

指纹 · 配色
FINGERPRINT · COLOR
青配深底 / 霓虹 / 紫蓝渐变
Cyan-on-dark / neon / purple-blue gradient
修法:从品牌或主题取一个真实的、有来由的主色,限定调色板,删掉渐变文字。
Fix: take one real, motivated primary from brand or subject; constrain the palette; kill gradient text.
指纹 · 材质
FINGERPRINT · MATERIAL
玻璃拟态 / 处处大圆角 + 柔投影
Glassmorphism / rounded-everything + soft shadow
修法:让材质服务层级而非装饰;多数表面用实色与硬边界,模糊只留给真正悬浮的层。
Fix: let material serve hierarchy, not decoration; flat surfaces and hard borders for most, blur only for truly floating layers.
指纹 · 排版
FINGERPRINT · TYPE
Inter/Roboto · 万物居中
Inter/Roboto · centering everything
修法:选一款有性格的字(含对比的衬线/特征字形);建立左对齐为主的真实排版网格。
Fix: pick a typeface with character (a contrasting serif / distinctive forms); build a real left-aligned typographic grid.
指纹 · 布局
FINGERPRINT · LAYOUT
等大卡片网格 · 巨数字仪表盘模板
Equal-card grid · big-number dashboard template
修法:按内容权重定尺寸差异;用真实层级与节奏,而非把一切塞进等大盒。
Fix: size by content weight; use real hierarchy and rhythm instead of stuffing all into equal boxes.
指纹 · 图标
FINGERPRINT · ICON
每个标题上方一个大圆角图标
A big rounded icon above every heading
修法:图标只在帮助识别时用;多数标题靠文字与排版承担,不靠装饰图占位。
Fix: icons only where they aid recognition; let most headings carry on words and type, not decorative placeholders.
指纹 · 语气
FINGERPRINT · VOICE
空洞口号 · "赋能/革命性/无缝"
Empty slogans · "empower / revolutionary / seamless"
修法:用具体名词与动词写真实价值;删掉所有不增信息的形容词堆叠。
Fix: write real value in concrete nouns and verbs; delete every adjective stack that adds no information.
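
Some of these fingerprints can be caught from the artifact text alone. The sketch below is a deliberately small heuristic lint over CSS and copy, covering gradient text, glassmorphism blur, the Inter/Roboto default, and the three empty slogans named above. The regular expressions and function names are assumptions made for this illustration, and layout fingerprints such as the equal-card grid would need a different kind of check.

```python
import re
from typing import List

# Each red line pairs a fingerprint name with a pattern over raw CSS text.
RED_LINES = [
    ("gradient text", re.compile(r"background-clip\s*:\s*text", re.I)),
    ("glassmorphism blur", re.compile(r"backdrop-filter\s*:\s*[^;]*blur\(", re.I)),
    ("default typeface", re.compile(r"font-family\s*:[^;]*\b(Inter|Roboto)\b", re.I)),
]

# English slogans from the voice fingerprint; a real list would be per language.
EMPTY_SLOGANS = re.compile(r"\b(empower\w*|revolutionary|seamless\w*)\b", re.I)


def lint_css(css: str) -> List[str]:
    """Return the names of the red lines that the stylesheet hits."""
    return [name for name, pattern in RED_LINES if pattern.search(css)]


def lint_copy(text: str) -> List[str]:
    """Return the empty-slogan words found in the copy, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in EMPTY_SLOGANS.finditer(text)})
```

An empty result from both functions means only that the most common defaults were avoided. Whether the design is true for its specific audience remains a human call.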

slop 不是"做得差",是"做得像所有人"——同质化才是失败模式

Slop is not "done badly"; it is "done like everyone" — homogenization is the failure mode

把 slop 理解成"粗制滥造"会指向错误的解药——以为多打磨就能解决。真相相反:slop 往往做得很精,对齐完美、配色和谐、动效顺滑,每个可机检指标都满分。它的问题不在质量轴,在独特性轴:它收敛到了所有人见得最多的那个样子。这就是为什么"再精修一遍"治不了 slop——你是在沿错误的那条轴用力。模型为最小化期望损失,天然偏向训练分布里最高频的视觉模式;高频意味着"大家都这么做","大家都这么做"意味着对谁都不特别。所以 slop 的反面不是"更高级",而是"更具体":具体到只为某一群人、某一个品牌、某一种处境成立。一个设计若让圈外人无感、却让目标用户心头一动,那不是缺陷,是它找到了自己的边界

Reading slop as "shoddy" points to the wrong cure — the belief that more polish fixes it. The truth is the opposite: slop is often finely made, with perfect alignment, harmonious color, smooth motion, every machine-checkable metric at full marks. Its problem is not on the quality axis but the distinctiveness axis: it has converged on the shape everyone has seen most. That is why "polish it once more" cannot cure slop — you are pushing along the wrong axis. To minimize expected loss, a model naturally leans toward the highest-frequency visual patterns in its training distribution; high-frequency means "everyone does this," and "everyone does this" means special to no one. So the opposite of slop is not "more refined" but "more specific": specific enough to be true only for some group, some brand, some situation. A design that leaves outsiders cold yet moves the target user is not flawed; it has found its boundary.

异质审美的守护,因此是一条结构性的设计纪律,而不是风格偏好。当生成把"达到平均水准"变成免费,整个行业的默认产出会一起向均值滑——这不是某个团队的懒惰,是生成经济学的引力。对抗它需要主动的、付出代价的选择:明确"这只对这群人成立",并接受"对另一群人不成立"是这个选择的必然代价而非失误。这恰恰是组织卷那条人本主线在设计面上最锋利的体现——为人,意味着为具体的人,而不是为统计意义上的"所有人"。下面这张图把这条引力与对抗它的力画在一起。

Guarding heterogeneity is therefore a structural design discipline, not a style preference. Once generation makes "reaching average quality" free, the whole industry's default output slides toward the mean together — this is not any one team's laziness but the gravity of generation economics. Resisting it takes an active, costly choice: stating "this is true only for these people," and accepting that "it is not true for those people" is the necessary cost of that choice, not a mistake. This is precisely the sharpest expression, on the design surface, of the org volume's human through-line — being for people means being for specific people, not for the statistical "everyone." The figure below plots this gravity against the force that resists it.

FIG. D6.0 / HETEROGENEITY GUARD · 同质化滑向均值 vs 异质守护
看懂:生成引力把审美拖向均值,异质守护把它钉在"这群人"
Read: generation gravity drags aesthetics to the mean; the guard pins them to "these people"
图中标注 / Figure labels: 审美空间 · 各具 identity 的设计散布 / aesthetic space · distinct identities scattered; 均值 = slop(见得最多 · 谁也不为)/ the mean = slop (seen most · for no one); A 群、B 群、C 群 / groups A, B, C; you; 生成引力 → / generation gravity →; 异质守护:钉在"这群人" / guard: pin to "these people"; 对 B/C 群无感 = 边界,非缺陷 / cold to B/C = a boundary, not a flaw
实测:AI 提升个体新颖度、降低集体多样性(Doshi & Hauser, Science Advances 2024,证据级 Ⅱ 受控实验)
Measured: AI raises individual novelty, lowers collective diversity (Doshi & Hauser, Science Advances 2024, grade Ⅱ controlled experiment)
每个圆是一个有 identity 的设计。生成引力(灰虚线)把它们都往中间那个均值吸——这就是同质化失败模式:不加抵抗,所有设计都滑成 slop。唯一的对抗(红箭头)是主动把自己钉在某一群人身上,并接受"对其它群无感"是这个选择的代价而非失误。守异质,就是守"为具体的人"。
Each circle is a design with an identity. Generation gravity (gray dashed) pulls them all toward the mean in the middle — that is the homogenization failure mode: unresisted, every design slides into slop. The only counter (red arrow) is to actively pin yourself to one group of people and accept that "cold to other groups" is the cost of that choice, not a mistake. Guarding heterogeneity is guarding "for specific people."
异质守护 · 信号
Heterogeneity · signals

有效:目标用户说"这是为我做的",圈外人无感(这正是好的边界,不是缺陷);陌生人问"怎么做的"而非"哪个 AI 做的"。失效:所有人都说"还行/挺专业",没人有强反应;命中上表 ≥3 条指纹。承重一句:对所有人都成立的审美,等于对均值收敛,也就失去具体对象。守住异质,就是守住"这群人"。

Right: the target user says "this was made for me" while outsiders feel nothing (that is a good boundary, not a flaw); a stranger asks "how was this made," not "which AI." Wrong: everyone says "fine / professional," no one reacts strongly; three or more of the fingerprints in the table above are hit. Load-bearing: an aesthetic true for everyone converges to the mean and loses its specific audience. Guarding heterogeneity is guarding "these people."

DSN
09·5
FAILURE · slop 指纹的成因与具体判据
THE SLOP FINGERPRINT
机理 · 失效解剖
Mechanism · Autopsy

为什么 slop 长得都一样——指纹的成因

Why slop all looks the same — the anatomy of the fingerprint

青配深底、紫蓝渐变、玻璃拟态、Inter 居中、巨数字仪表盘——这不是巧合,是同一个生成机制在不同界面上留下的同一组指纹。理解它的成因(而非只记住症状),才能在没见过的新套路出现时也认得出来。这一节把指纹拆到机制层:每条指纹都是"高频 × 安全 × 易生成"三件事的交集。

Cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards: this is not coincidence but the same generation mechanism leaving the same set of fingerprints on different interfaces. Understanding the cause, not just memorizing the symptoms, lets you recognize new clichés you have not seen yet. This section takes the fingerprint down to the mechanism level: every fingerprint is the intersection of three things, high-frequency × safe × easy-to-generate.

成因一:高频。模型偏向训练分布里出现最多的视觉模式。过去几年的 dribbble/产品落地页/dashboard 模板里,深色玻璃拟态加霓虹渐变铺天盖地,于是它成了"设计应有的样子"的统计代表。成因二:安全。这些模式在评审里几乎不会被否——它们"看起来专业",没人会因为用了 Inter 而被批评。安全意味着低风险,低风险的东西最容易成为默认。成因三:易生成。渐变、圆角、居中、等大卡片,都是用最少的结构决策就能填满画面的招式——它们不要求理解内容层级,只要求把盒子排整齐。三者叠加,就得到一个稳定的吸引子:生成在没有强约束时,必然落到这组指纹上。这解释了一个反直觉的事实——slop 不是模型"不够强"的产物,恰恰是它"足够强地拟合了均值"的产物。

Cause one: high-frequency. A model leans toward the visual patterns that appear most in its training distribution. Across the last few years of dribbble / product landing pages / dashboard templates, dark glassmorphism with neon gradients was everywhere, so it became the statistical representative of "what design should look like." Cause two: safe. These patterns are almost never rejected in review — they "look professional," and no one gets criticized for using Inter. Safe means low-risk, and the low-risk thing most easily becomes the default. Cause three: easy-to-generate. Gradients, rounded corners, centering, equal-size cards are all moves that fill a screen with the fewest structural decisions — they require no understanding of content hierarchy, only that the boxes be lined up neatly. Stack the three and you get a stable attractor: with no strong constraint, generation inevitably lands on this set of fingerprints. This explains a counterintuitive fact — slop is not a product of the model being "not strong enough"; it is precisely a product of it being "strong enough to fit the mean."

每条指纹的修法,都是"加回一个被生成省略的判断"

Every fingerprint's fix is "adding back a judgment generation skipped"

既然指纹来自"省略判断",修法就一定是"把那个判断加回去",而不是换一个更新潮的套路(换套路只是换一个均值)。配色指纹的修法,是回答"这个主色为什么是它"——从品牌、从内容主题、从用户处境里找一个有来由的颜色,而不是默认霓虹;材质指纹的修法,是回答"这里的模糊/投影在服务什么层级"——让材质承担信息层级而非装饰;排版指纹的修法,是回答"这个字为什么是它"——选一款与内容气质相符、有辨识度的字,并建立真实的左对齐网格而非万物居中;布局指纹的修法,是回答"这些内容的权重一样吗"——按内容权重定尺寸差异,而非把一切塞进等大盒。共同结构:每条修法都把一个"生成默认替你做了的省事决定",换成"一个你为这群人主动做的判断"。这正是 DSN 03·5 说的——品味就是这些判断的合成,反 slop 就是逐条把它们加回去。

Since fingerprints come from "skipping a judgment," the fix must be "adding that judgment back," not switching to a trendier cliché (switching clichés only swaps one mean for another). The fix for the color fingerprint is answering "why is this the primary color" — find a motivated color from the brand, the content's subject, the user's situation, rather than defaulting to neon; the fix for the material fingerprint is answering "what hierarchy is this blur/shadow serving" — let material carry information hierarchy, not decoration; the fix for the type fingerprint is answering "why is this the typeface" — pick one whose character matches the content, with real identity, and build a genuine left-aligned grid instead of centering everything; the fix for the layout fingerprint is answering "do these contents weigh the same" — size by content weight rather than stuffing all into equal boxes. The shared structure: every fix swaps a "labor-saving default generation made for you" for "a judgment you actively made for these people." This is exactly what DSN 03·5 said — taste is the synthesis of these judgments, and anti-slop is adding them back item by item.

FIG. D6.1 / FINGERPRINT ANATOMY · 逐处标注一张 slop 落地页 · a slop landing page, annotated part by part
看懂:左边是典型 AI 落地页的线框,每一处指纹(渐变标题、玻璃拟态、三卡、居中 Inter)右边都接着一个被省略的判断。指纹不是审美故障,是判断缺位的痕迹
Read: left is a typical AI landing-page wireframe; each fingerprint (gradient title, glassmorphism, three cards, centered Inter) maps on the right to a skipped judgment. A fingerprint is not an aesthetic fault but the trace of an absent decision
一张典型 slop 落地页 / a typical slop landing page
① 紫蓝渐变大标题 / purple-blue gradient title. 省了:这行字让谁、几秒内、读到什么 / skipped: who reads what, in how many seconds
② 玻璃拟态浮层 / glassmorphism float. 省了:它凭什么比背后的东西更重要 / skipped: why it outranks what is behind it
③ 三张等权卡片 / three equal-weight cards. 省了:这三件真的一样重要吗,谁先看 / skipped: are these equal, and what comes first
④ 居中 Inter + 通用 CTA / centered Inter + generic CTA. 省了:这群人此刻最想被说服的那一句 / skipped: the one line this group most needs now
每一处指纹 = 一个被生成省略的判断。 / every fingerprint = one judgment generation skipped.
反 slop 不是换审美,是把这些判断逐条加回去。 / anti-slop is not a new look; it is adding these decisions back, one by one.
把这张图当成清单的因果版:左边每个编号是肉眼可见的征兆,右边对应的不是"换个更好看的做法",而是一个本该有人来做、却被生成跳过的判断。这解释了为什么 slop 长得都一样——它们省略的是同一批判断;也解释了为什么修 slop 不能靠"再生成一版漂亮的":漂亮还是会落在均值,缺的判断没人加回来。
Read this as the causal version of a checklist: each numbered item on the left is a visible symptom; the match on the right is not "a prettier alternative" but a decision someone should have made that generation skipped. This is why slop all looks alike — it omits the same set of judgments; and why you cannot fix slop by "generating a prettier one": prettier still lands on the mean, and the missing decisions still go unmade.

把一张典型 slop 落地页逐处标注,每处都对应一个被省略的判断

Annotating a typical slop landing page point by point — each maps to a skipped judgment

设想一张最常见的 AI 生成落地页,从上到下逐处看,你会发现它像一份征兆清单:顶部是深色背景配一行紫到蓝的渐变大标题,渐变文字本身就是第一处指纹——它把"标题"当成了炫技的画布,而非传达信息的层级,被省略的判断是"这行字到底要让谁、在几秒内、读到什么"。主视觉区是一块玻璃拟态卡片浮在模糊光斑上,第二处指纹——模糊在这里不服务任何层级,纯粹是装饰,被省略的判断是"这个浮层比背后的东西更重要吗,凭什么浮起来"。功能区是三到四张等大的圆角卡片整齐排成一行,每张顶上一个大圆角图标,第三、四处指纹叠加——等大意味着"这些功能同等重要"这个几乎从不为真的假设没被质疑,图标只是占位装饰,被省略的判断是"这些内容的权重真的一样吗,这个图标帮人认出了什么"。数据区是几个巨大的数字配小标签,第五处指纹——它套用了"看起来很有料"的仪表盘模板,被省略的判断是"这些数字对这群用户真的重要吗,还是只是为了填满版面显得专业"。全篇居中、用 Inter,最后两处——居中是最省排版决策的默认(不用想对齐网格),Inter 是最安全的字(不会被批但也毫无性格),被省略的判断是"这个内容的气质适合居中吗,这款字说出了品牌的什么"。

Picture the most common AI-generated landing page and read it top to bottom, and you find it reads like a checklist of symptoms: at the top, a dark background with a row of purple-to-blue gradient headline — gradient text is itself the first fingerprint, treating the "headline" as a canvas for showing off rather than a hierarchy that conveys information, the skipped judgment being "who exactly should read what, in how many seconds, from this line." The hero area is a glassmorphism card floating over blurred light blobs — the second fingerprint, where blur serves no hierarchy and is pure decoration, the skipped judgment being "is this floating layer more important than what is behind it, what entitles it to float." The features area is three or four equal rounded cards lined up neatly, each with a big rounded icon on top — the third and fourth fingerprints stacked, where equal sizing leaves the almost-never-true assumption "these features are equally important" unquestioned and the icon is mere placeholder decoration, the skipped judgment being "do these contents really weigh the same, and what does this icon help anyone recognize." The data area is a few giant numbers with small labels — the fifth fingerprint, applying the "looks substantial" dashboard template, the skipped judgment being "do these numbers really matter to these users, or are they just there to fill the page and look professional." Centered throughout, in Inter — the last two, where centering is the default that saves the most layout decisions (no alignment grid to think about) and Inter is the safest typeface (never criticized, also utterly without character), the skipped judgment being "does this content's character suit centering, and what does this typeface say about the brand."

把这张标注图连起来看,会得到一个关于 slop 的更深的定义:slop 是一连串"省略的判断"在视觉上的累加。每一处指纹单独看都不致命——用一次居中、用一回渐变,本身不是罪;致命的是整张图从头到尾,没有一处是为这群人主动判断过的,每一处都选了那个最省事、最不会错的默认。这正是为什么 slop 给人的感觉是"哪里都对,整体却空":因为它确实哪里都没犯错(每个默认都是安全的),但也确实哪里都没有人在场(每个默认都绕过了判断)。反过来,一个有品味的设计未必每处都标新立异,但它一定在那些最该判断的地方真的判断了——主色有来由、层级有取舍、字有性格、该强调的被强调。所以"标注一张 slop"这个练习本身就是有价值的训练:它逼你在每一处问"这里有没有人做过判断",而这个问句,正是把"挑出 slop"升级成"看懂 slop 为什么是 slop"的钥匙。

Read this annotated figure as a whole and you reach a deeper definition of slop: slop is the visual accumulation of a series of "skipped judgments." No single fingerprint is fatal on its own — centering once, a gradient once, is no crime; what is fatal is that from top to bottom the whole figure has not one place actively judged for these people, every place picking the most labor-saving, least-wrong default. This is exactly why slop feels "right everywhere, hollow overall": it really did make no mistakes anywhere (every default is safe), but really has no one present anywhere (every default bypasses judgment). Conversely, a design with taste need not be novel everywhere, but it surely actually judged where judgment was most due — a motivated primary, a hierarchy with trade-offs, type with character, the emphasis emphasized. So the exercise of "annotating a slop" is itself valuable training: it forces you to ask, at each place, "did anyone make a judgment here," and that question is the key that upgrades "spotting slop" into "understanding why slop is slop."

具体判据 · 接 HARD-RULES
Concrete criteria · feeding HARD-RULES

把上面的成因翻译成可机检条目,正是 DSN 08 HARD-RULES 那一层的来源:禁渐变文字、限定调色板取自品牌 token、模糊层数上限、字体白名单(排除 Inter/Roboto 系统默认)、卡片尺寸必须随内容权重变化、命中"空洞口号词表"零次。这些可写进 lint;而"这个主色为什么是它、这款字为什么对"仍是软判据,留给人。下方 INSTRUMENT 12/13 帮你把这两层分开跑一遍。

Translating the causes above into machine-checkable items is exactly the source of DSN 08's HARD-RULES layer: ban gradient text, constrain the palette to brand tokens, cap blur layers, whitelist typefaces (excluding Inter/Roboto system defaults), require card size to vary with content weight, zero hits on the "empty-slogan word list." These can go into lint, while "why this primary, why this typeface is right" remains a soft criterion, kept with people. INSTRUMENT 12/13 below help you run the two layers separately.
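The hard layer listed above can be sketched as a small lint pass. This is a minimal illustration under stated assumptions, not the HARD-RULES implementation itself: the `DesignNode` shape, the rule names, the sample palette, and the word list are all hypothetical, and the card-size rule uses a crude proxy (three or more cards with identical dimensions), because content weight is not machine-readable in this toy model.

```typescript
// All names below are hypothetical; they only illustrate a hard-rule lint.
interface TextNode {
  kind: "text";
  content: string;
  fontFamily: string;
  fill: string; // a color value or a CSS gradient string
}

interface CardNode {
  kind: "card";
  width: number;
  height: number;
  blurLayers: number;
  fill: string;
}

type DesignNode = TextNode | CardNode;

interface HardRules {
  brandPalette: string[]; // allowed fills, assumed to come from brand tokens
  fontWhitelist: string[];
  maxBlurLayers: number;
  emptySlogans: string[];
}

interface Violation {
  rule: string;
  detail: string;
}

function lint(nodes: DesignNode[], rules: HardRules): Violation[] {
  const out: Violation[] = [];
  const palette = new Set(rules.brandPalette.map((c) => c.toLowerCase()));

  for (const node of nodes) {
    const fill = node.fill.toLowerCase();
    if (node.kind === "text" && fill.includes("gradient")) {
      out.push({ rule: "no-gradient-text", detail: node.content });
    } else if (!palette.has(fill)) {
      out.push({ rule: "palette-from-tokens", detail: node.fill });
    }

    if (node.kind === "text") {
      if (!rules.fontWhitelist.includes(node.fontFamily)) {
        out.push({ rule: "font-whitelist", detail: node.fontFamily });
      }
      const lower = node.content.toLowerCase();
      for (const word of rules.emptySlogans) {
        if (lower.includes(word.toLowerCase())) {
          out.push({ rule: "no-empty-slogans", detail: word });
        }
      }
    } else if (node.blurLayers > rules.maxBlurLayers) {
      out.push({ rule: "max-blur-layers", detail: String(node.blurLayers) });
    }
  }

  // Proxy for "card size must vary with content weight":
  // flag three or more cards that all share one size.
  const cards = nodes.filter((n): n is CardNode => n.kind === "card");
  const sizes = new Set(cards.map((c) => `${c.width}x${c.height}`));
  if (cards.length >= 3 && sizes.size === 1) {
    out.push({ rule: "card-size-varies", detail: `${cards.length} cards share one size` });
  }
  return out;
}

// Made-up sample page and rules, for illustration only.
const sampleRules: HardRules = {
  brandPalette: ["#0b3d2e", "#f4efe6"],
  fontWhitelist: ["Source Serif 4"],
  maxBlurLayers: 1,
  emptySlogans: ["seamless", "revolutionary", "empower"],
};

const samplePage: DesignNode[] = [
  { kind: "text", content: "Seamless AI for everyone", fontFamily: "Inter", fill: "linear-gradient(90deg, #7b5cff, #3ba0ff)" },
  { kind: "card", width: 320, height: 200, blurLayers: 2, fill: "#0b3d2e" },
  { kind: "card", width: 320, height: 200, blurLayers: 0, fill: "#0b3d2e" },
  { kind: "card", width: 320, height: 200, blurLayers: 0, fill: "#0b3d2e" },
];

const violations = lint(samplePage, sampleRules);
```

A pass like this only reports that a default was taken. It cannot say which color or typeface would be right for the audience, so the soft criteria stay with the reviewer.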

DSN
09·7
FAILURE · 同质化是系统性风险
HOMOGENIZATION AS SYSTEMIC RISK
机理 · 宏观失效
Mechanism · Macro failure

最大的失败模式不在一个产品里,在整个行业一起滑向均值

The biggest failure mode is not inside one product but a whole industry sliding to the mean together

前面谈的是单个产品如何避开 slop。但有一个更大的失败模式发生在系统层:当所有团队都用同几个生成模型、喂相似的提示、收敛到相似的均值,整个数字世界会变得越来越像。这不是危言耸听,而是生成经济学的直接推论。理解这个宏观失效,才能理解为什么"异质守护"不只是个人审美洁癖,而是对抗一种系统性引力的必要纪律。

The above is about how a single product avoids slop. But a bigger failure mode happens at the system level: when all teams use the same few generation models, feed similar prompts, and converge on similar means, the entire digital world grows more and more alike. This is not alarmism but a direct corollary of generation economics. Grasping this macro failure is what lets you see why "guarding heterogeneity" is not personal aesthetic fastidiousness but a discipline necessary to resist a systemic gravity.

机制:共享的均值是一个共享的吸引子。过去,设计的多样性有一部分来自工具和人的差异——不同设计师的手感不同、不同工具的默认不同,这些差异在产出里留下了不同的痕迹。当生成把"达到平均水准"变成几乎免费,且大家用的是同一批模型、同一批流行提示,这些差异来源被抹平了:所有人都从同一个分布的同一个峰附近起步。结果是一种趋同压力——不是有人强迫,而是每个理性的个体都选了那条最省事、最不会错的默认,而这些默认恰好是同一个。十个团队各自做出"看起来很专业"的产品,叠在一起却发现它们像一个模子刻的。这就是同质化:个体层面的"安全选择",在系统层面累加成"集体失去辨识度"。

Mechanism: a shared mean is a shared attractor. In the past, part of design's diversity came from the difference between tools and people — different designers' touch, different tools' defaults, leaving different marks in the output. Once generation makes "reaching average quality" nearly free, and everyone uses the same batch of models and the same popular prompts, those sources of difference get flattened: everyone starts near the same peak of the same distribution. The result is a convergence pressure — no one forces it, but each rational individual picks the most labor-saving, least-wrong default, and those defaults happen to be the same one. Ten teams each make a product that "looks professional," yet stacked together they turn out cut from one mold. This is homogenization: "safe choices" at the individual level summing, at the system level, into "a collective loss of distinctiveness."

实测锚点
Measured anchor

这条机制不止是推断,已有受控实验测到它的方向。Doshi 与 Hauser 的对照实验(Science Advances,2024,约 300 名受试者写短篇故事,部分人获得 AI 创意提示)发现一个分裂的效应:拿到 AI 提示的故事在个体层面被评得更新颖、更有用,但这批故事在集体层面用语义相似度衡量却更彼此趋同。个人创意上升,集体多样性下降。这正是"共享均值是共享吸引子"的实验影像:放大每一个个体,同时压平整个分布。〔源:Doshi & Hauser, Science Advances 10(28), 2024,证据级 Ⅱ 受控实验;该研究对象是叙事文本,迁移到视觉/产品设计是一次合理但仍需验证的外推,故不外推具体数字〕[R4]

This mechanism is not only inferred; a controlled experiment has measured its direction. Doshi and Hauser's randomized study (Science Advances, 2024; roughly 300 participants writing short stories, some given AI story ideas) found a split effect: stories written with access to AI ideas were rated more novel and useful at the individual level, yet the set of those stories was more similar to one another at the collective level, measured by semantic similarity. Individual creativity went up; collective diversity went down. This is the experimental image of "a shared mean is a shared attractor": amplify each individual while flattening the whole distribution. [Source: Doshi & Hauser, Science Advances 10(28), 2024, grade Ⅱ controlled experiment; the study's object is narrative text, so carrying it to visual/product design is a reasonable but still-unverified extrapolation, and no specific figure is extrapolated here][R4]
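To make "more similar to one another, measured by semantic similarity" concrete, here is a minimal sketch of one common way to score a set of items for collective convergence: the mean pairwise cosine similarity of their embedding vectors. The function names and the toy vectors are assumptions made up for illustration; they are not data from the study, and the study's exact procedure may differ.

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Average similarity over every unordered pair in the set.
// Higher values mean the set as a whole is more homogeneous.
function meanPairwiseSimilarity(vectors: number[][]): number {
  let sum = 0;
  let pairs = 0;
  vectors.forEach((a, i) => {
    vectors.slice(i + 1).forEach((b) => {
      sum += cosine(a, b);
      pairs += 1;
    });
  });
  return pairs === 0 ? 0 : sum / pairs;
}

// Toy embeddings, invented for illustration only.
const spreadSet = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];
const convergedSet = [
  [1, 0.1, 0],
  [1, 0, 0.1],
  [1, 0.1, 0.1],
];

const spreadScore = meanPairwiseSimilarity(spreadSet);
const convergedScore = meanPairwiseSimilarity(convergedSet);
```

The point of a set-level score is that it can rise even while every individual item is rated better, which is the split effect described above.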

为什么这是真正该担心的失败模式?因为它对个体几乎无痛——你的产品"看起来没问题",每个指标都过关,没有任何单点的失败提醒你出了事。痛感被推迟、被分散到整个行业和长期:用户对一切都"还行"地无感,没有什么值得记住、值得偏爱、值得忠诚。设计本该制造的"这是为我做的"那种归属感,在普遍的均值里被稀释为零。异质守护就是对这条引力的个体层面抵抗:明确地、付出代价地选择"只对这群人成立"。它在个体层面看起来是放弃了一部分潜在受众,在系统层面却是维持多样性的唯一办法。这正是组织卷人本主线在设计面上的最终落点——为人,意味着为具体的、有差异的人,而维持这种差异,需要每个设计师主动逆着均值的引力站住。

Why is this the failure mode truly worth worrying about? Because it is almost painless to the individual — your product "looks fine," every metric passes, no single-point failure warns you something went wrong. The pain is deferred and dispersed across the whole industry and the long term: users feel an "it's fine" indifference to everything, with nothing worth remembering, preferring, or being loyal to. The sense of belonging — "this was made for me" — that design is meant to create gets diluted to zero in the universal mean. Guarding heterogeneity is the individual-level resistance to this gravity: choosing, explicitly and at a cost, "true only for these people." It looks, at the individual level, like giving up part of a potential audience; at the system level it is the only way to keep diversity alive. This is the final landing, on the design face, of the org volume's human through-line — being for people means being for specific, differing people, and keeping that difference alive requires each designer to actively stand against the gravity of the mean.

FIG. D8.0 / DISTINCTIVENESS APPRECIATES · 供给趋无限,均值贬值、偏离均值升值 · as supply →∞, the mean depreciates while the off-mean appreciates
看懂:当"看起来专业"被无限供给,它的价值趋零、变成入场券;同一时间,"为这群人极致对路"的偏离反而越来越稀缺、越来越值钱。这是同质化危机的另一面
Read: when "looks professional" is supplied without limit, its value tends to zero and becomes mere admission; meanwhile the off-mean "extreme fit for this group" grows scarcer and more valuable. This is the flip side of the homogenization crisis
坐标轴 / axes: 价值 / value; 均值产物的供给量 →(AI 让它趋于无限)/ supply of mean output → (AI drives it toward infinite)
曲线 / curve: "看起来专业"的价值 → 趋零,变成入场券 / value of "looks professional" → to zero, becomes admission
曲线 / curve: "只对这群人成立"的价值 ↑ 越稀缺越值钱 / value of "true only for these people" ↑ scarcer = worth more
交叉点:差异化高地开始空旷 / crossover: the distinctiveness high-ground empties out
经济学直觉:供给无限 ⇒ 价值趋零(一面的代价正是另一面的机会)/ the economics: infinite supply ⇒ value → 0 (one side's cost is the other side's opening)
这是 D6.0"引力把所有设计吸向均值"那张图的时间面:把镜头拉长,会看到两条价值曲线在反向移动。slop 越泛滥,"看起来专业"就越不构成优势——它从差异化变成了基线、入场券;同一过程里,真正"只对这群人成立"的设计因为周围一片均值而更显眼、更稀缺、更值钱。所以守异质不只是防御(别滑向 slop),它是一笔押注:押那个正在升值的稀缺品。这条同样不外推具体数字,只主张方向。
This is the time face of D6.0's "gravity pulling every design toward the mean": pull the lens back and the two value curves move in opposite directions. The more slop floods in, the less "looks professional" buys you — it shifts from a differentiator to a baseline, an admission ticket; in the same motion, design that is genuinely "true only for these people" becomes more conspicuous, scarcer, worth more against the surrounding mean. So holding the heterogeneous is not only defense (don't slide into slop); it is a bet — on the scarce thing that is appreciating. This too extrapolates no specific number, only the direction.

同质化里藏着一个反直觉的机会:异质本身在升值

Hidden in homogenization is a counterintuitive opportunity: distinctiveness is appreciating

这条系统性风险有一个反直觉的另一面,值得对从业者点明:当均值变廉价且无处不在,偏离均值的那部分反而变得更稀缺、更值钱。经济学的直觉在这里成立——任何东西一旦供给无限,它的价值就趋零;slop 正在变成无限供给,所以"看起来专业"本身已经不再是优势,它是基线、是入场券。真正能制造差异、让人记住、让目标用户产生"这是为我做的"那种归属感的设计,反而因为周围一片均值而更显眼、更有价值。这意味着异质守护不只是一种防御性的纪律(避免滑向 slop),它同时是一种进攻性的机会(在一片趋同里成为那个被记住的)。对一个团队或个人,这是把同质化危机翻转成定位优势的入口:当所有人都在用 AI 把自己变得更像,主动选择"只对这群人极致对路、对其他人无感"的那个,反而占据了越来越空旷的差异化高地。守异质,因此既是对系统性风险的抵抗,也是对一个正在升值的稀缺品的押注。

This systemic risk has a counterintuitive flip side worth naming for practitioners: once the mean is cheap and everywhere, the part that deviates from the mean becomes scarcer and more valuable. The economic intuition holds here — anything in infinite supply trends toward zero value; slop is becoming infinite supply, so "looks professional" is no longer an advantage but the baseline, the price of admission. Design that genuinely creates difference, gets remembered, makes the target user feel the belonging of "this was made for me" becomes, precisely because everything around it is the mean, more conspicuous and more valuable. This means guarding heterogeneity is not only a defensive discipline (avoiding the slide to slop) but also an offensive opportunity (being the one remembered amid the convergence). For a team or an individual, this is the entry to flipping the homogenization crisis into a positioning advantage: when everyone uses AI to make themselves more alike, the one who actively chooses "extremely on-target for these people, cold to everyone else" occupies the increasingly empty high ground of differentiation. Guarding heterogeneity is therefore both a resistance to systemic risk and a bet on a scarce good that is appreciating.

要避免被误读,得把"异质"和"为不同而不同"区分开。异质守护不是鼓励标新立异、不是为了和别人不一样而故意做怪——那只是把"滑向均值"换成了"滑向猎奇",同样没有共情这个根,同样是 slop 的变体。真正的异质,是"为这群人对路"自然生长出来的结果:当你真的为一群具体的人、在一个具体的目的下做设计,你的取舍会自然地偏离那个为所有人优化的均值,因为这群人和"所有人"本就不同。换句话说,异质是共情的副产物,不是目标本身。这个区分很重要,因为它防住了一种常见的过度修正:团队听说要"反 slop、要独特",于是开始为独特而独特,做出一堆刻意古怪、却同样不为用户着想的东西。检验很简单——问这个"不一样"是从"为这群人"长出来的,还是从"想显得不一样"长出来的。前者是异质守护,后者只是换了一种 slop。守异质的正道,永远是先把"为谁"明确下来,让差异自然涌现,而不是把差异本身当成追求。

To avoid being misread, "heterogeneity" must be distinguished from "different for difference's sake." Guarding heterogeneity is not encouraging novelty-seeking, not deliberately being weird to stand apart — that merely swaps "sliding to the mean" for "sliding to the bizarre," equally rootless in empathy, equally a variant of slop. True heterogeneity is the natural outgrowth of "being on-target for these people": when you genuinely design for a group of specific people under a specific purpose, your trade-offs naturally diverge from the mean optimized for everyone, because these people are inherently different from "everyone." In other words, heterogeneity is a byproduct of empathy, not the goal itself. This distinction matters because it guards against a common over-correction: a team hears "anti-slop, be distinctive" and starts being distinctive for its own sake, making a pile of deliberately odd things equally inconsiderate of users. The test is simple — ask whether this "different" grew from "for these people" or from "wanting to seem different." The former is guarding heterogeneity; the latter is just another flavor of slop. The right way to guard heterogeneity is always to pin down "for whom" first and let difference emerge naturally, not to pursue difference itself.

把这条系统性失败和这一卷开头的不对称连起来,整张图就闭合了:因为生成把"达到均值"变成免费(不对称),所以默认产出是 slop(均值即 slop),所以不加干预整个行业会一起向均值滑(同质化),所以唯一的解药是每个设计师主动把审美钉在具体的人身上(异质守护),而这件事恰恰是机器做不了、必须由人来扛的判断(品味=稀缺判断),而它之所以值得扛,是因为为具体的人而做本就是设计的根基,也是整个系列共同的指向(人回到意义)。这不是六个零散的观点,而是一条从经济学前提一路推到人本结论的完整受力链:每一环都是上一环的必然后果。这也是为什么这一卷敢说自己抓的是结构而非风格——因为它的每个主张都不是孤立的审美偏好,而是这条链上一个被前因逼出来的节点。读到这里,再回看 DSN 01 那张生成×品味平面,你会发现整卷讲的其实是同一件事的不同切面:人,必须重新把品味这条纵轴,亲手加回到一个正在集体丢失它的世界里。

Connect this systemic failure to the asymmetry at the volume's opening and the whole figure closes: because generation makes "reaching the mean" free (the asymmetry), the default output is slop (the mean is slop), so unresisted the whole industry slides to the mean together (homogenization), so the only cure is each designer actively pinning their aesthetic to specific people (guarding heterogeneity), and this is exactly the judgment a machine cannot do and a human must carry (taste = scarce judgment), and it is worth carrying because being for specific people is design's foundation and the whole series' shared direction (people return to meaning). This is not six scattered points but one complete force chain reasoned all the way from an economic premise to a human conclusion: each link a necessary consequence of the last. This is why this volume dares to say it caught structure, not style — because each of its claims is not an isolated aesthetic preference but a node on this chain forced out by what came before. Reading this far, look back at the generation × taste plane of DSN 01 and you find the whole volume was telling different faces of one thing: people must, by hand, re-add the vertical axis of taste to a world that is collectively losing it.

证伪 · 这条担忧可能错在哪
Falsification · where this worry could be wrong

这条担忧若错,会错在:若生成模型未来能主动制造有意义的差异(不是随机扰动,而是针对不同人群给出真正不同且对路的设计),同质化引力就会被模型自身抵消,异质守护也就不再需要人来扛。目前没有证据表明模型在做这件事:它们优化的是"对得多",不是"对这群人特别"。只要这一点不变,同质化就是真实的系统性风险,异质守护就仍是人的责任。

If this worry is wrong, it is wrong here: if future generation models can actively manufacture meaningful difference (not random perturbation but genuinely different, on-target designs for different groups), the homogenization gravity would be canceled by the model itself, and guarding heterogeneity would no longer need a human to carry it. There is currently no evidence models do this: they optimize "right for many," not "special for these people." As long as that holds, homogenization is a real systemic risk and guarding heterogeneity remains a human responsibility.

DSN
10
MOTION · video-as-code / 动效即代码
VIDEO-AS-CODE
机理 · 同构
Mechanism · Isomorphism

动效与视频,同一招再走一遍

Motion and video: the same move, once more

design-as-code 不止于静态界面。Remotion(用 React 写视频)让一段视频变成可编程、可 diff、可由 agent 生成的代码——时间轴、文案、配色都是变量。动效与视频走的是和静态设计完全相同的受力:产物变代码,就拿到同款杠杆;品味与意图依旧是稀缺判断。

Design-as-code does not stop at static interfaces. Remotion (video written in React) turns a video into programmable, diffable, agent-generatable code, where timeline, copy, and palette are all variables. Motion and video take exactly the same force as static design: artifact becomes code, gains the same leverage; taste and intent remain the scarce judgment.

过去做一条产品视频,改一个字要重渲整条、要回到不可读的时间线软件里手动对齐。当视频是代码:文案是参数(可批量本地化)、配色取自同一套 token(不离牌)、一次改动是一次 diff(可评审可回滚)、agent 能按规格批量出变体(铺开→判断→收敛同一条环)。动效不再是产出工时的黑洞,而是和界面同源的一份可生成产物。

Producing a product video used to mean: change one word, re-render the whole thing, go back into opaque timeline software to align by hand. When video is code: copy is a parameter (batch-localizable), palette comes from the same tokens (on-brand), one change is one diff (reviewable, revertible), and an agent can spin up variants on spec (the same spread → judge → converge loop). Motion stops being a black hole of production hours and becomes a generatable artifact, same-source as the interface.

时间线软件 · 二进制
Timeline software · binary
改字要重剪、本地化要重做、变体靠手工;agent 进不来,品味花在重复劳动上。
Changing a word means re-editing, localization means redoing, variants are manual; no agent can enter, and taste is spent on repetitive labor.
video-as-code · 文本
Video-as-code · text
文案/时长/配色皆变量,agent 按规格铺变体,人只判"哪条对路、节奏对不对"。同一条工作流环。
Copy/duration/palette are variables; an agent spreads variants on spec; the human only judges "which is on-target, is the pacing right." The same workflow loop.
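The contrast above can be made concrete with a minimal sketch of a video described as data. It deliberately uses no Remotion API: `VideoSpec`, `tokens`, `totalFrames`, and `diffSpec` are hypothetical names invented here, and wiring such a spec into an actual React composition is left out. The sketch only shows why a one-word copy change becomes a one-line, reviewable diff once copy, duration, and palette are plain variables.

```typescript
// Hypothetical design tokens shared with the static interface.
const tokens = {
  brandPrimary: "#0b3d2e",
  surface: "#f4efe6",
} as const;

type TokenName = keyof typeof tokens;

// Hypothetical video description: everything a reviewer might change is a field.
interface VideoSpec {
  headline: string;
  durationSec: number;
  fps: number;
  background: TokenName; // palette is referenced by token name, not by raw color
  accent: TokenName;
}

// Frame count is derived from the spec, never edited by hand.
function totalFrames(spec: VideoSpec): number {
  return Math.round(spec.durationSec * spec.fps);
}

// Resolve token names to concrete colors at render time.
function resolveColors(spec: VideoSpec): { background: string; accent: string } {
  return { background: tokens[spec.background], accent: tokens[spec.accent] };
}

// Field-level diff between two versions of a spec.
function diffSpec(before: VideoSpec, after: VideoSpec): string[] {
  const keys = Object.keys(before) as (keyof VideoSpec)[];
  return keys
    .filter((key) => before[key] !== after[key])
    .map((key) => `${key}: ${JSON.stringify(before[key])} -> ${JSON.stringify(after[key])}`);
}

// Made-up example values.
const baseSpec: VideoSpec = {
  headline: "Ship faster",
  durationSec: 15,
  fps: 30,
  background: "surface",
  accent: "brandPrimary",
};

const revisedSpec: VideoSpec = { ...baseSpec, headline: "Ship calmer" };
const specChanges = diffSpec(baseSpec, revisedSpec);
```

Because colors are referenced by token name, a brand palette change flows into the video without touching the spec, which is the "same-source as the interface" property described above.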

把动效纳进来,不只是多覆盖一个产物类型,它其实在验证整套方法的可迁移性。一个方法论若只对静态界面成立、一碰到时间维度就失效,那它八成抓的是表象而非受力。动效是一次干净的压力测试:它把"产物是不是私有二进制"这个变量重新拉满(传统视频工具的工程文件正是典型的不可读二进制),于是同一条受力链应该重新出现——而它确实出现了。改一个字要重渲整条、本地化要重做一遍、变体靠手工,这些正是"二进制画布被边缘化"在时间维度上的复现;而 Remotion 把视频表达成 React,让文案成参数、配色取自同一套 token、一次改动成一次 diff,这又正是"代码产物被放大"的复现。受力链在一个全新的产物类型上原样重演,本身就是这套方法不是临时拼凑、而是抓到了底层结构的证据。

Bringing motion in is not just covering one more artifact type; it actually verifies the portability of the whole method. A methodology that holds only for static interfaces and breaks the moment it meets the time dimension has most likely grasped surface appearance rather than force. Motion is a clean stress test: it re-maxes the variable "is the artifact a proprietary binary" (the project files of traditional video tools are exactly the canonical unreadable binary), so the same force chain should reappear — and it does. Change one word and re-render the whole thing, redo localization, variants by hand: these are precisely "the binary canvas gets sidelined" recurring in the time dimension; while Remotion expressing video as React — copy as a parameter, palette from the same tokens, one change as one diff — is precisely "the code artifact gets amplified" recurring. The force chain replaying intact on a brand-new artifact type is itself evidence that this method is not an improvised patchwork but has caught the underlying structure.

时间维度多出一条不可机检的轴:节奏

The time dimension adds one more axis that cannot be machine-checked: pacing

把 design-as-code 搬到动效,验证了内核的迁移性——同一套受力换个面依旧成立。但诚实地说,时间维度比静态多出一条不可机检的轴,所以人那一半的判断在这里反而更重,不是更轻。静态设计里,可机检的硬约束(token、对齐、对比度)能覆盖相当一部分"对不对";到了动效,多了节奏这条轴——一个转场是 200ms 还是 400ms、一句旁白后该停顿几拍、信息以什么顺序揭示——这些既不能写进 token,也没有"正确值",只有"对这段内容、这个情绪对不对"。这意味着 video-as-code 把②③④补全状态、本地化、铺变体这些可机检的劳动自动化掉之后,省下的人力恰恰要更密集地投到节奏判断上。代码给了你随意尝试不同节奏的自由(改个参数即可重渲),但哪个节奏对仍然只有人的耳朵和眼睛能定。

Carrying design-as-code into motion verifies the kernel's portability: the same force holds on another face. But to be honest, the time dimension adds an axis that cannot be machine-checked and that static design lacks, so the human half of judgment is heavier here, not lighter. In static design the hard machine-checkable constraints (token, alignment, contrast) cover a fair share of "right or wrong"; motion adds the axis of pacing: whether a transition is 200ms or 400ms, how many beats to hold after a line of narration, in what order information is revealed. None of these can be written into tokens and none has a "correct value," only "right or wrong for this content, this emotion." This means that after video-as-code automates the machine-checkable labor of ②③④ (filling states, localization, spreading variants), the freed-up human effort must go more densely into pacing judgment. Code gives you the freedom to try different pacings at will (change a parameter, re-render), but which pacing is right can still only be settled by a human's ear and eye.

这条边界也回答了一个常见的误解:"既然视频成了代码,是不是 AI 能端到端生成成品视频了?"能生成,但生成的默认仍是节奏上的均值——四平八稳、哪里都不出错、也哪里都不动人,正是动效版的 slop。所以动效的工作流和静态完全同构:规格(包括节奏意图)→ 铺开多个节奏方向 → 人判哪个对 → 导向再生成 → 收敛。代码形态把这条环的每一步都变得便宜可迭代,但环中央那个"判节奏"的验证器,依旧是人。这就是为什么我们说动效是"同一招再走一遍",而不是"又一个被 AI 解决的问题"。

This boundary also answers a common misreading: "since video is now code, can AI generate finished video end to end?" It can generate, but the default of that generation is still the mean of pacing — even, error-free everywhere, moving nowhere — which is precisely the motion version of slop. So the motion workflow is fully isomorphic to the static one: spec (including pacing intent) → spread several pacing directions → human judges which is right → steer and regenerate → converge. The code form makes every step of this loop cheap and iterable, but the "judge the pacing" verifier at the loop's center is still a human. That is why we say motion is "the same move once more," not "another problem AI has solved."

动效里还有一类被低估的杠杆:本地化与个性化变体。一条产品视频要出十种语言、三种时长、给不同人群各调一版语气,在时间线软件里这是十几乘以几的重复劳动,每一版都要手动重剪、重对齐、重导出——成本高到大多数团队干脆放弃,只出一版凑合。当视频是代码,这件事的成本结构彻底变了:语言是一个文案参数数组、时长是一个时间轴变量、人群语气是一组可切换的配置,agent 可以按这些参数批量渲染出所有组合,人只需要判断"每一版的节奏在它的语境里对不对"。这把过去"做不起所以不做"的个性化,变成了"几乎免费所以值得做"的常规操作。它的意义不只是省力,而是让"为不同人群各做对一版"——也就是 DSN 09 说的异质守护——在动效这个过去最贵的产物类型上,第一次变得经济可行。代码形态在这里再次证明:它放大的从来不是某个酷炫功能,而是"为具体的人做具体的东西"这件事的可行性。

Motion holds another underrated lever: localization and personalized variants. Shipping a product video in ten languages, three durations, with the tone tuned for different groups, is in timeline software a repetition of dozens-times-several manual jobs — each version re-edited, re-aligned, re-exported by hand — at a cost so high most teams simply give up and ship one make-do version. When video is code, this cost structure changes completely: language is an array of copy parameters, duration a timeline variable, group tone a switchable configuration, and an agent can batch-render every combination from these parameters, with the human only judging "is each version's pacing right in its context." This turns the personalization that used to be "unaffordable so not done" into the routine of "nearly free so worth doing." Its significance is not only saved effort but that "making one version right for each different group" — DSN 09's heterogeneity-guarding — becomes, for the first time, economically feasible on motion, the artifact type that used to be most expensive. The code form proves once more here: what it amplifies is never some flashy feature but the feasibility of "making specific things for specific people."
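The cost shift described above comes down to a cross product over parameters. The sketch below is a minimal illustration with hypothetical names (`RenderJob`, `expandVariants`) and made-up locale and tone values; rendering itself is not shown. Each job carries a `pacingApproved` flag that starts as `false`, to mark that enumeration is automatic while the pacing judgment is not.

```typescript
// Hypothetical render job: one combination of language, duration, and tone.
interface RenderJob {
  id: string;
  locale: string;
  durationSec: number;
  tone: string;
  pacingApproved: boolean; // set only by a human reviewer, never by the generator
}

// Enumerate every combination of the three parameter lists.
function expandVariants(locales: string[], durations: number[], tones: string[]): RenderJob[] {
  const jobs: RenderJob[] = [];
  for (const locale of locales) {
    for (const durationSec of durations) {
      for (const tone of tones) {
        jobs.push({
          id: `${locale}-${durationSec}s-${tone}`,
          locale,
          durationSec,
          tone,
          pacingApproved: false,
        });
      }
    }
  }
  return jobs;
}

// Made-up values: ten languages, three durations, two audience tones.
const locales = ["en", "zh", "ja", "de", "fr", "es", "pt", "ko", "it", "nl"];
const durations = [15, 30, 60];
const tones = ["newcomer", "expert"];

const renderQueue = expandVariants(locales, durations, tones);
const awaitingReview = renderQueue.filter((job) => !job.pacingApproved);
```

Sixty jobs from three short lists is cheap to generate; the review queue is the same length, which is where the saved effort is meant to go.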

同构 / 边界
Isomorphism / boundary

这是 DSN 06–08 那套受力照搬到时间维度:静态怎么做,动效就怎么做。边界也一样:节奏、情绪、何时该停顿留白,无法写进 token,仍是人的品味判断。代码给的是杠杆,不是节奏感。

This is the DSN 06–08 force carried into the time dimension: do for motion what you do for static. The boundary is the same too: pacing, emotion, when to hold a beat or leave space cannot be written into tokens; they remain human taste. Code gives leverage, not a sense of rhythm.

DSN
10
SPECULATION · 推演幕
SPECULATION · The Speculation Act
推论 · 外推,非事实
Inference · Extrapolation, Not Fact

当生成成本趋零,设计组织会变成什么

When generation cost goes to zero, what the design org becomes

这一幕不画一条"AI 越来越强"的曲线,而是把本卷的命题——生成富足、品味稀缺——投影到 2026→2032。它不预测哪条线会发生,而是张开一个可能性空间:哪些分支可能、各自的先行指标、以及什么观察会把它证伪。能被证伪的推演才值得推演。

This act does not draw a single "AI keeps getting stronger" curve. It projects this volume's thesis — generation abundant, taste scarce — onto 2026 through 2032. It does not predict which line occurs; it opens a possibility space: which branches are possible, their leading indicators, and what observation would falsify each. Only speculation that can be falsified is worth speculating.

本章性质 · 推论:以下是基于 2023-2026 公开轨迹的外推,不是事实陈述。每条推演都附先行指标与证伪条件;当观察与推演相悖时,本章应最先被改写。〔证据级 Ⅴ 论证/外推〕
Nature of this chapter · Inference: What follows extrapolates from the public trajectory of 2023-2026; it is not a statement of fact. Each line carries leading indicators and a falsification condition; when observation contradicts the speculation, this chapter should be the first to be rewritten. [Grade Ⅴ, argument/projection]
THE PROJECTION · 把本卷命题推到时间轴上 · The thesis pushed onto the timeline

推演不是畅想。本卷只立了一条命题:当"达到平均水准的产出"接近免费,价值就从"会做"塌向"会判断",而判断里最不能被机器替代的那部分是品味。把这条命题放上时间轴,要追问的不是"AI 会不会更强"——那几乎确定——而是那个无法被机器核验的判断节点,会不会、何时、被什么侵蚀。三条正在汇流的力量决定边界,两条不确定性轴张开四个世界,三件来自那些世界的文物让推演可触。最后必须记下与本卷对赌的反命题:万一品味并不稀缺呢。

Speculation is not daydreaming. This volume rests on one claim: when "output at the average bar" approaches free, value collapses from being-able-to-make toward being-able-to-judge, and the part of judgment least replaceable by machines is taste. Put that claim on a timeline and the question is not "will AI get stronger" — that is nearly certain — but whether, when, and by what the machine-uncheckable judgment node gets eroded. Three converging forces set the boundaries, two axes of uncertainty open four worlds, and three artifacts from those worlds make the speculation tangible. Finally we must record the counter-bet against this volume: what if taste is not scarce after all.

三条汇流的力量

Three Converging Forces

设计组织的重构不是一条曲线,是三条独立成熟、正在汇流的力量。每条只问三件事:成立则解锁什么形态 / 当前在哪 / 什么信号会把它证伪。第一条与第二条若都成立,会同时挤压第三条——这正是本卷命题的胜负手。

The redrawing of the design org is not one curve but three forces maturing independently and now converging. Each asks only three things: what form it unlocks if it holds, where it is now, and what signal would falsify it. If the first two both hold, they squeeze the third at the same time — which is exactly the decisive point of this volume's thesis.

生成成本趋零 · GENERATION → FREE
Generation Cost → Zero
解锁 · Unlocks
出一张"看起来专业"的稿、一段过得去的视频、一套能跑的组件,边际成本趋近于零。设计的稀缺资源从"产出能力"彻底移走;一人可铺开过去一个团队的候选量。
Producing a "looks professional" comp, a passable video, or a runnable component set drops to near-zero marginal cost. Design's scarce resource moves off "ability to produce" entirely; one person spreads the candidate volume a whole team once did.
TRL · 规模化中 · Scaling up
2023-26 文生图/视频/前端代码逐年逼近"专业均值",成本逐年下降〔证据级 Ⅳ〕。
In 2023-26, text-to-image / video / front-end code close in on the "professional mean" year over year, at falling cost [grade Ⅳ].
证伪 · Falsified if
若生成在"过得去"处长期封顶、最后一公里(品牌一致、可交付、合规)仍需大量人工返工,则成本并未趋零,执行仍是稀缺资源。
If generation caps at "passable" for the long run and the last mile (brand consistency, deliverability, compliance) still needs heavy human rework, then cost has not gone to zero and execution remains scarce.
设计系统即规格 · SYSTEM-AS-SPEC
Design-System-as-Spec
解锁 · Unlocks
当 token、组件、规则被写成机器可读的规格(沿 DSN 06-08 的方向),"何为对"的一大半变成可机检的护栏;生成被约束在系统内,品牌一致不再靠逐稿盯。设计系统从文档升级为可执行的判断载体。
When tokens, components, and rules are written as machine-readable spec (along the DSN 06-08 direction), much of "what is right" becomes a machine-checkable guardrail; generation is constrained inside the system and brand consistency no longer rides on per-comp policing. The design system upgrades from document to an executable carrier of judgment.
TRL · 早期商用 · Early commercial
token 与组件库已标准化;"约束生成"工具 2025 起进入早期商用,品味规则的形式化仍浅〔证据级 Ⅳ〕。
Tokens and component libraries are standardized; "constrained-generation" tooling entered early commercial use from 2025, while the formalization of taste rules stays shallow [grade Ⅳ].
证伪 · Falsified if
若品味的关键部分始终无法被写成可机检的规则(节奏、情绪、何时留白),则系统只能挡住低级错误,挡不住趋同;护栏有上限。
If the load-bearing part of taste can never be written as a machine-checkable rule (pacing, emotion, when to leave space), the system catches only low-level errors, not convergence; the guardrail has a ceiling.
品味作为唯一护城河 · TASTE AS MOAT
Taste as the Remaining Moat
解锁 · Unlocks
前两条把执行和合规都抹平后,组织间唯一不能被复制的差异,是"挑得准、知道为什么、敢承担"的判断密度。竞争从"谁做得多"转向"谁判得对";品味成为最后的差异化资产,可被定价、被招聘、被组织化。
Once the first two flatten execution and compliance, the only difference between organizations that cannot be copied is the density of judgment that picks accurately, knows why, and bears the consequence. Competition shifts from "who makes more" to "who judges right"; taste becomes the last differentiating asset, one that can be priced, hired for, and organized around.
TRL · 论证态 · Argument stage
这是本卷的承重命题,也是最该被对赌的一条;见下方反命题与情景台。〔证据级 Ⅴ 论证〕
This is the volume's load-bearing claim and the one most deserving a counter-bet; see the counter-bet and the scenario bench below. [grade Ⅴ, argument]
证伪 · Falsified if
若模型学会了在统计上稳定地复现"被市场判为好"的设计(用户偏好可被高保真预测),则品味也被自动化,护城河蒸发。这正是反命题。
If models learn to reproduce, with statistical stability, the designs the market judges as good (user preference becomes high-fidelity predictable), then taste too is automated and the moat evaporates, which is precisely the counter-bet.
为什么不是四条 · BOUNDARY NOTE
Why Not a Fourth
边界 · Scope
具身/机器人、能源算力地租、监管这些更宽的力量在组织卷推演;本卷只追设计这个面上的三条。把它们都堆进来会稀释命题;推演的纪律是只外推自己能负责的那条线。
Broader forces (embodiment/robotics, energy and compute rent, regulation) are speculated in the Org volume; this volume tracks only the three on the design face. Piling them all in would dilute the thesis; the discipline of speculation is to extrapolate only the line you can be held responsible for.
交叉 · Coupling
监管确会外溢到设计(AI 生成内容标注、版权),但它改变的是约束,不改变"品味是否可机检"这个本卷的轴心问题。
Regulation does spill into design (labeling of AI-generated content, copyright), but it changes the constraints, not this volume's pivot question of whether taste is machine-checkable.
INSTRUMENT 14 · 情景台 SCENARIO BENCH

三条力量划定边界,但 2032 落在哪个世界,取决于两条高影响、高不确定的力量:X 轴 生成能力(停在"专业均值" vs 突破到"可复现被判为好的设计")与 Y 轴 品味分布(仍稀缺集中于少数判断者 vs 被工具民主化、人人可调)。切换两轴,看本卷命题在那个象限里站得住还是塌掉,以及什么先行指标说明我们正滑向它(GBN 双轴情景法)。

Three forces mark the boundaries, but which world 2032 falls into turns on two high-impact, high-uncertainty forces: X · generation capability (stalls at the "professional mean" vs breaks through to reproducing designs judged good) and Y · taste distribution (stays scarce and concentrated in a few judges vs gets democratized by tools so anyone can dial it). Toggle the two axes to see whether this volume's thesis holds or collapses in that quadrant, and what leading indicator says we are sliding toward it (the GBN two-axis scenario method).

X · 生成能力 Generation Capability
Y · 品味分布 Taste Distribution
品味溢价 Taste Premium:停在均值 × 稀缺 (Stalls × Scarce)
判断寡头 Judgment Oligopoly:复现"好" × 稀缺 (Reproduces × Scarce)
寒武纪长尾 Cambrian Long Tail:停在均值 × 民主化 (Stalls × Democratized)
均值之海 Sea of the Mean:复现"好" × 民主化 (Reproduces × Democratized)
SHORT-TERM · 2026-2028
"生成铺开候选"成为默认工序
"Generate the candidates" becomes the default step

从一稿到多稿。独立设计师与小团队默认先让生成铺开十几二十个候选,再把人的时间几乎全部投到挑、评、导上。"会用 Figma 画得快"不再是稀缺技能;"能在二十稿里一眼挑出那一版、并说清为什么"成了新的入门线。设计系统开始被当成"喂给生成的规格"来维护,而不只是交付文档。

From one comp to many. Independent designers and small teams default to letting generation spread a dozen-plus candidates first, then pour almost all human time into picking, critiquing, steering. "Fast in Figma" stops being the scarce skill; "spot the right one out of twenty and say why" becomes the new entry bar. Design systems start being maintained as "spec fed to generation," not just delivery documents.

校准锚:方向成立,斜率被高估。这是十年曲线的头两年,不是终点。生成在"过得去"处仍频繁卡住最后一公里(品牌细节、跨端一致、可交付状态),返工成本在 2026-28 仍然真实存在——本块所有"趋零"说法都该先打这个折扣。〔证据级 Ⅳ 从业者外推〕

Calibration anchor: the direction holds, the slope is overestimated. These are the first two years of a decade-long curve, not its endpoint. Generation still jams on the last mile at "passable" (brand detail, cross-platform consistency, deliverable state), and rework cost is real through 2026-28; every "→ zero" claim in this block should be discounted by that first. [grade Ⅳ, practitioner extrapolation]

MID-TERM · 2028-2030
"品味"开始被招聘、被定价、被组织化
"Taste" starts being hired for, priced, and organized

设计组织里出现明确的判断岗与执行岗分层:少数人持有"何为好"的最终判断与品牌方向,生成承包其余。岗位描述里"精通某工具"的权重下降,"判断质量、方向感、能把品味讲清楚"的权重上升。设计系统从静态库演化为"带判断的护栏"——可机检的部分挡住低级错误,挡不住的部分(节奏、情绪)显式留给人。同质化压力成为产品评审的常规议题,而不只是审美洁癖。

A clear split appears inside design orgs between judgment roles and execution roles: a few hold final judgment on "what is good" and the brand direction; generation takes the rest. In job descriptions the weight on "proficient in tool X" falls and the weight on "judgment quality, sense of direction, able to articulate taste" rises. Design systems evolve from static libraries into "guardrails with judgment" — the machine-checkable part catches low-level errors, the uncheckable part (pacing, emotion) is left explicitly to humans. Homogenization pressure becomes a routine review topic, not aesthetic fastidiousness.

分歧点。这一段是本卷与反命题第一次正面相遇:如果到 2030 偏好模型已能稳定预测"这群人会判为好",那"判断岗"会比预期更早地被压薄。下方反命题块记录了这条对赌。

The point of divergence. This stretch is where this volume first meets its counter-bet head-on: if by 2030 preference models can stably predict "this audience will judge it good," the "judgment role" thins earlier than expected. The counter-bet block below records that wager.

LONG-TERM · 2030-2032+
形态多元,而非单一收敛
Plural forms, not a single convergence

最可能的不是"设计师消失",也不是"设计师照旧",而是光谱分叉:一端是高度自动化、以可机检规格驱动的"均值产品"(够好即可、追求规模与速度),另一端是以稀缺品味为护城河的"判断密度组织"(少数人 + 大量生成,靠"挑得准"卖溢价)。两端之间是仍在用人手做大部分判断的传统团队。"设计师"这个词本身被重新定义——从"会做界面的人"转向"为体验的好坏承担后果的人"。

The most likely outcome is neither "designers vanish" nor "designers carry on as before" but a forking spectrum: at one end, highly automated "mean products" driven by machine-checkable spec (good-enough, chasing scale and speed); at the other, "judgment-density organizations" with scarce taste as their moat (a few people plus heavy generation, selling a premium on picking accurately). Between them sit traditional teams still doing most judgment by hand. The word "designer" itself is redefined — from "someone who can make interfaces" to "someone who bears the consequences for whether the experience is good."

与"多元"判断对赌的,是"均值之海"的收敛预测:如果生成与偏好预测都成熟、且品味被工具民主化,差异化资产可能整体蒸发,所有产品滑向同一个被验证为"高转化"的局部最优。多元光谱与均值收敛,谁成为 2030 年代设计世界的主图景,是本章最值得跟踪的分歧。〔证据级 Ⅴ〕

Betting against the "plurality" judgment is the convergence prediction of the "Sea of the Mean": if generation and preference prediction both mature and taste is democratized by tooling, the differentiating asset may evaporate wholesale, and every product slides toward the same local optimum validated as "high-converting." Whether the plural spectrum or the convergence becomes the main picture of the 2030s design world is the divergence most worth tracking in this chapter. [grade Ⅴ]

COUNTER-TREND · 反趋势
"人手做的"成为溢价信号
"Made by a human" becomes a premium signal

所有强趋势都激发反趋势。当生成内容铺满,"100% 人类设计 / 手作"开始作为差异化卖点出现在小众品牌、独立刊物、精品工作室——不是因为人手一定更好,而是因为"可被证明不是均值生成"本身成了稀缺信号。"反 slop"会从个人洁癖变成一种被市场认可的定位。这一支不会成为主流,但它结构性地存在,并持续提醒:当一切都趋同,"不趋同"本身就有价值。

Every strong trend provokes a counter-trend. As generated content saturates, "100% human-designed / handmade" begins appearing as a differentiator among niche brands, independent publications, and boutique studios — not because human hands are necessarily better, but because "provably not the generated mean" becomes a scarce signal in itself. "Anti-slop" shifts from personal fastidiousness to a market-recognized position. This branch will not become mainstream, but it is structurally present, a standing reminder: when everything converges, "not converging" is itself worth something.

来自那些世界的三件文物

Three Artifacts from Those Worlds

推演若只有论断会显得抽象。下面三件是 design fiction——明确虚构的未来文物,用以让"品味成为护城河的设计组织"可触。它们不是预测,是把命题投影到 2031 的一种方式。

Speculation made only of assertions feels abstract. The three pieces below are design fiction: explicitly fictional future artifacts that make "the design org where taste is the moat" tangible. They are not predictions; they are a way of projecting the thesis onto 2031.

SPECULATIVE · 虚构 · Fiction
ARTIFACT 01 · 招聘启事 · Job Posting
招聘:品味负责人(Head of Taste)· 不招生产者
Hiring: Head of Taste · Not Hiring Producers

「你不会被要求出稿。出稿这件事,我们的生成管线一天能给你三千版。你要做的是它做不了的:在三千版里挑出该上线的那一版,说清为什么是它、为什么不是另外两版,并为这个判断的后果负责。」

"You will not be asked to produce comps. Producing comps is something our generation pipeline can hand you three thousand of a day. You do what it cannot: pick the one that should ship out of three thousand, say why it and not the other two, and own the consequences of that judgment."

职责
判断而非生产 · 设定品味与品牌边界 · 维护"喂给生成的规格" · 为不可逆的发布决策担责
Responsibilities
Judge rather than produce · set taste and brand boundaries · maintain the "spec fed to generation" · own irreversible release decisions
不要求
任何单一生成工具的熟练度(我们假设它一年内会被换掉)
Not required
Proficiency in any single generation tool (we assume it will be replaced within a year)
考核
命中率与方向正确度——挑中的版本上线后的真实表现,以及"为什么"是否能复用进规格、让下一轮命中率更高(非出稿量)
Evaluation
Hit rate and directional correctness — the real-world performance of the picked version after launch, and whether the "why" can be folded back into spec so the next round's hit rate rises (not comp volume)
SPECULATIVE · 虚构 · Fiction
ARTIFACT 02 · 工具更新日志 · Tool Changelog
某生成式设计工具 v9.0 更新日志(节选)· 人的角色被翻转
A Generative Design Tool, v9.0 Changelog (Excerpt) · The Human Role Inverts
新增
「判断模式」成为默认。打开文件即生成 N 版候选;画布不再是空白,而是一墙待裁的候选。"新建空白画板"降级到二级菜单。
Added
"Judgment mode" is now the default. Open a file and N candidates are generated; the canvas is no longer blank but a wall of candidates awaiting a verdict. "New blank artboard" is demoted to a submenu.
变更
主操作从"画"变成"挑 / 评 / 导"。每次裁决会问一句"为什么",把理由沉淀进项目的品味规格——工具开始替你积累那条判断回路。
Changed
The primary action shifts from "draw" to "pick / critique / steer." Each verdict prompts a "why," depositing the reason into the project's taste spec — the tool begins accruing your judgment loop for you.
已知限制
工具能挡住违反规格的候选,不能替你决定规格本身对不对。节奏、情绪、何时留白仍需人裁——这是设计上我们刻意不自动化的边界,也是本工具的设计立场。
Known limitation
The tool can block candidates that violate the spec; it cannot decide for you whether the spec itself is right. Pacing, emotion, when to leave space still require a human verdict — a boundary we deliberately do not automate, and this tool's design stance.
SPECULATIVE · 虚构 · Fiction
ARTIFACT 03 · 同质化事故复盘 · Homogenization Postmortem
"我们的 App 和竞品长得一模一样" · 复盘摘要
"Our App Looks Identical to a Competitor's" · Postmortem Summary

2031,一次品牌重做半年后,团队发现自己的产品与两家竞品的关键页几乎无法区分。没人抄谁——三方都用了同几个生成模型、同几套流行规格、同样把"提升转化"交给同一类偏好预测。趋同不是抄袭,是每个理性团队都各自滑向了同一个被验证为"高转化"的局部最优

In 2031, half a year after a brand refresh, a team found its product nearly indistinguishable from two competitors' on the key screens. No one copied anyone — all three used the same few generation models, the same popular spec, and handed "improve conversion" to the same class of preference prediction. The convergence was not plagiarism but every rational team independently sliding toward the same local optimum validated as "high-converting."

根因
共享模型 + 共享规格 + 共享优化目标 = 共享吸引子(呼应 DSN 09·7 同质化机制)
Root cause
Shared models + shared spec + shared optimization target = a shared attractor (echoing the DSN 09·7 homogenization mechanism)
责任链
落在把"何为好"完全外包给转化数字的判断者——不是"模型趋同了",是没人守异质这件事的判断节点空着
Chain of responsibility
Falls on the judges who outsourced "what is good" entirely to conversion numbers — not "the models converged" but that the judgment node for guarding heterogeneity was left empty
修复
把一条"异质守护"指标放进评审(与均值的距离),并显式保留一个人来回答"这是否还像我们"——把本卷的命题做成一道工序,而非口号
Remediation
Put a "guard-heterogeneity" metric into review (distance from the mean) and explicitly keep one human to answer "does this still look like us" — turning this volume's thesis into a step in the process, not a slogan
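One way to read "distance from the mean" as a review metric is sketched below. This is an assumption-laden illustration, not something the postmortem specifies: it presumes each design has already been reduced to a numeric feature vector (how to do that is left open here), and the function names `centroid` and `distance_from_mean` are hypothetical. It measures plain Euclidean distance from the centroid of a reference population, such as competitor screens.

```python
import math


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance_from_mean(candidate, population):
    """Euclidean distance between a candidate and the population centroid."""
    center = centroid(population)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(candidate, center)))


# Hypothetical 2-D features standing in for real design embeddings.
population = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]]
ours = [4.0, 5.0]

score = distance_from_mean(ours, population)
```

A number like this can flag drift toward the shared attractor, but it cannot answer "does this still look like us"; that question is the one the remediation explicitly keeps with a human.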
记录在案的反命题 · COUNTER-BET

本卷押注"品味稀缺、不可机检"。诚实要求把最强的反方观点完整摆出,而不是立一个稻草人来打。反方的最强形式不是"AI 会画得更好看",而是"品味在统计上并不神秘,因此可被学习与复制"。论证分三步,每一步都已有早期苗头。

其一,"被某群人判为好"很可能是一个有结构、可学习的分布:人的审美偏好并非随机,它被文化、语境、近因强烈约束,而凡是有结构的东西,足够的数据加足够强的模型原则上就能逼近。

其二,规模正在把这个分布的数据补齐:每一次 A/B、每一次留存、每一次"用户更爱哪版"都在给偏好函数喂标注,偏好建模因此可能不必依赖"理解为什么好",只需在结果上稳定复现"被判为好"。

其三,本卷反复强调的"导/挑/评"这套判断动作,一旦能被显式表达成规格(这正是 DSN 06-08 在推动的事),也就同时把它暴露成了可被模仿的训练信号:我们越是成功地把品味外化成可机检的规格,就越是在亲手为"自动化品味"铺设训练集。这是本卷方法论里一个真实的内在张力,不该被掩盖。

什么观察会证实反方、判本卷败:当一个"生成 + 偏好预测"系统在双盲条件下、对一个它训练时未见过的新受众群、于一个旧判据不再适用的新情境里,其挑选命中率能稳定追平甚至超过该领域资深判断者,且这种优势可跨品类复现、而非靠过拟合某一类视觉,那么"品味结构性地停在人这侧"就被推翻了,护城河蒸发,本卷的承重命题随之失效,应整章改写。本卷给自己留的、尚未被证伪的余地只有一处:那个"未见过的新情境"。偏好模型擅长内插已被判过的分布,但一个真正没有先例的新审美命题(一种没人做过、却"对"的东西),没有历史标注可学。只要新情境持续产生、且其判断无法靠内插历史得出,人的判断节点就还在;这条余地一旦也被关上(模型能稳定地为无先例情境做出被验证为对的判断),本卷就该认输。作者把这条证伪条件白纸黑字写在这里,正是为了不把"品味永远稀缺"当成不可质疑的信仰。〔证据级 Ⅴ 论证,与本卷对赌〕

The counter-bet on record · COUNTER-BET

This volume bets that taste is scarce and machine-uncheckable. Honesty demands laying out the strongest opposing case in full rather than erecting a straw man to knock down. The counter-bet's strongest form is not "AI will draw prettier things" but "taste is not statistically mysterious and is therefore learnable and reproducible." The argument runs in three steps, each with an early signal already visible.

First, "judged good by a given audience" is very likely a structured, learnable distribution: human aesthetic preference is not random; it is strongly constrained by culture, context, and recency, and anything with structure can in principle be approximated by enough data plus a strong enough model.

Second, scale is filling in that distribution's data: every A/B test, every retention curve, every "users preferred this version" is labeling the preference function, so preference modeling may not need to "understand why it is good," only to reproduce "judged good" stably at the level of outcomes.

Third, the very judgment act this volume keeps emphasizing (steer / pick / critique), once it can be expressed explicitly as spec (exactly what DSN 06-08 push toward), is thereby also exposed as an imitable training signal: the more successfully we externalize taste into machine-checkable spec, the more we are, with our own hands, laying down the training set for "automated taste." This is a real internal tension inside this volume's method, and it should not be hidden.

What observation would confirm the counter-bet and rule this volume lost: when a "generation + preference-prediction" system, under double-blind conditions, for a new audience it did not see in training, in a novel situation where old criteria no longer apply, can stably match or exceed a senior domain judge's pick-accuracy — and that edge reproduces across categories rather than overfitting one visual genre — then "taste sits structurally on the human side" is overturned, the moat evaporates, this volume's load-bearing claim fails with it, and the chapter should be rewritten. The one not-yet-falsified margin this volume keeps for itself is precisely that "novel situation never seen": preference models excel at interpolating distributions already judged, but a genuinely unprecedented aesthetic proposition (something no one has made yet that is nonetheless "right") has no historical labels to learn from. As long as novel situations keep arising and their judgment cannot be reached by interpolating history, the human judgment node remains; the day that margin closes too — when a model stably makes verified-correct judgments for situations without precedent — this volume should concede. The author writes this falsification condition down in plain ink precisely so as not to hold "taste is forever scarce" as an unquestionable article of faith. [grade Ⅴ, argument, betting against this volume]

推演溢出的东西

Second-Order Effects

推演的终点不是设计组织本身,是它溢出的东西。以下每条都标注在哪个象限下成立——没有无条件的预言。

The endpoint of speculation is not the design org itself but what spills over from it. Each item below is annotated with the quadrant under which it holds; there are no unconditional prophecies.

DSN
11
PLAYBOOK · 落地 / 失败模式 / 自检
PLAYBOOK · LANDING & FAILURE MODES
行动 · 承重
Action · Load-bearing

起步、最常见的误用方式、一件自检

Where to start, the most common ways to get it wrong, one self-check

把这一卷收成可执行的落点:四条原则、四组信号、一条起步路径,加上设计师在 AI-Native 化里最常掉的三个坑。最后给一件可玩的自检——把"品味是稀缺判断"做成你下次发版前能跑一遍的清单。

Bring the volume to an executable landing: four principles, four signal sets, a starting path, plus the three pits designers most often fall into going AI-Native. Then a playable self-check that turns "taste is the scarce judgment" into a list you can run before the next release.

四原则

Four principles

AI 是协作者,不是评判者——这条边界决定了谁握最终判断

AI is a collaborator, not the judge — this boundary decides who holds the final call

一个容易滑过去、却决定成败的边界:在设计里,AI 可以是极强的协作者(铺候选、补状态、给建议、甚至模拟某类用户的反应),但不能成为最终的评判者。原因回到 DSN 03·5:评判"这版是否为这群人对路"需要构成性的品味判断,它坐落在可验证性梯度的最远端,AI 给出的"评分"本质仍是对均值的拟合——让它当裁判,等于让均值来定义好坏,那条异质守护的线就会被悄悄抹平。可以让 AI 帮你把判断说得更清楚("这版为什么让你犹豫?是层级、是语气、还是节奏?"),这是协作;但不能让 AI 替你做出那个判断。把这条边界写进团队的工作约定:AI 的输出永远是"候选 + 理由",最终"收哪个"的按钮必须由人按下,并由人说清为什么。一旦让模型既当运动员又当裁判,闭环里那个验证器就被偷换成了均值生成器,整套方法的承重点就塌了。

A boundary easy to skip past yet decisive: in design, AI can be an extremely strong collaborator (spreading candidates, filling states, giving suggestions, even simulating how a class of users might react), but it cannot become the final judge. The reason returns to DSN 03·5: judging "is this version on-target for these people" needs constitutive taste, sitting at the far end of the verifiability gradient, and the "score" AI gives is still at bottom a fit to the mean — making it the referee means letting the mean define good and bad, and the line of heterogeneity-guarding gets quietly erased. You can let AI help you articulate the judgment more clearly ("why does this version make you hesitate — hierarchy, voice, or pacing?"), which is collaboration; but you cannot let AI make that judgment for you. Write this boundary into the team's working agreement: AI's output is always "candidates + reasons," and the final "converge on which" button must be pressed by a human who states why. Once the model is both athlete and referee, the verifier inside the closed loop is swapped for a mean-generator, and the whole method's load-bearing point collapses.
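The working agreement above ("candidates + reasons" from AI, the final call from a named human who states why) can be made hard to skip by encoding it in the decision record itself. The following is a minimal sketch under assumptions: the `Candidate`, `Decision`, and `converge` names are hypothetical and not part of any tool mentioned in this volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """What the model is allowed to hand over: an option plus its stated reason."""
    id: str
    reason: str


@dataclass(frozen=True)
class Decision:
    """What only a human can produce: the pick, who made it, and why."""
    chosen_id: str
    decided_by: str
    why: str


def converge(candidates, chosen_id, human_name, why):
    """Record a final pick; refuse it without a named human and a stated reason."""
    if not human_name.strip():
        raise ValueError("a named human must press the button")
    if not why.strip():
        raise ValueError("the human must state why")
    if chosen_id not in {c.id for c in candidates}:
        raise ValueError("chosen_id is not among the candidates")
    return Decision(chosen_id, human_name, why)


candidates = [
    Candidate("a", "highest contrast"),
    Candidate("b", "calmer hierarchy"),
]
decision = converge(candidates, "b", "Mei", "calmer hierarchy suits first-time users")
```

The check cannot verify that the "why" is good; it only guarantees the judgment step was not silently skipped, and it leaves a written reason that can later be folded back into the spec.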

起步路径

Starting path

① 把现有设计搬到代码形态(哪怕只是 token + 组件落进 repo),先拿到可读/可 diff/可生成。② 用下方自检挑出"最该注入品味"的几处。③ 在那几处跑一遍 DSN 07 的环(规格→铺开→评判→导向→收敛→沉淀),把第一轮的判断回流进系统。先小,先让护栏与判断成形,再扩面。

这条起步路径刻意把"先建护栏"放在"开生成"之前,是因为顺序本身承重。一个常见的失败是反过来:先兴奋地让 AI 铺一堆界面,再回头想"该用什么规范统一它们"——这时你已经被一堆好看但各异的候选淹没,判断力耗在收拾局面上,而不是导向。先建护栏(哪怕极小:三五个 token、两三条红线、一句"为谁而做")意味着第一轮生成就落在窄带里,你的判断从一开始就用在刀刃上。所以"先小"不是保守,是把稀缺的判断力花在最高杠杆的地方——先让护栏与判断在一个小范围内成形、跑通那条闭环,确认判断真的在回流、命中率真的在上升,再扩到更大的面——把第一个完整循环跑通,比一次铺开十个页面重要得多。

This starting path deliberately puts "build guardrails first" before "start generating," because the order itself is load-bearing. A common failure reverses it: excitedly have AI spread a pile of interfaces first, then go back wondering "what standard should unify them" — by which point you are drowning in good-looking but divergent candidates, your judgment spent on cleanup rather than steering. Building guardrails first (even minimal: three to five tokens, two or three red lines, one "for whom" line) means the very first round of generation lands in the narrow band, and your judgment is spent on the cutting edge from the start. So "start small" is not conservatism but spending scarce judgment where leverage is highest — let guardrails and judgment take shape in a small scope first, run the closed loop through, confirm that judgment really flows back and hit-rate really rises, then widen to a larger surface — getting that first complete cycle running matters far more than spreading ten pages at once.

如果要把这一卷压成可以贴在墙上的一句话,那就是:生成负责"多",人负责"对";而"对"的标准,必须由人写下来、并随每一轮判断变得更准。这一句里装着全部四原则——设计系统先行(把"对"的标准前置成护栏)、生成多判断严(分工:"多"给机器、"对"给人)、写下何为好(标准必须外化,否则生成只能滑回均值)、守住人本与异质("对"的最终判据是"为这群具体的人对",而非"对所有人还行")。它也装着全部三个失败模式的镜像:把"快"当胜利=只追求了"多"忘了"对";用生成代替判断=放弃了人对"对"的责任;把品味当可计算=误以为"对"的标准能全交给机器。把这一句记住,遇到任何具体决策时,问自己:这一步我是在帮生成产出更多,还是在帮自己把"对"的标准说得更清楚?前者机器越来越能替你做,后者永远是你的活——也永远是这门手艺真正的价值所在。

If this volume had to be compressed into one line you could pin on a wall, it would be: generation handles "many," people handle "right"; and the standard for "right" must be written down by people and grow sharper with each round of judgment. That one line holds all four principles — design system first (front-load the standard for "right" as a guardrail), generate many and judge hard (the division of labor: "many" to the machine, "right" to people), write down what is good (the standard must be externalized, or generation can only slide back to the mean), hold the human and the heterogeneous (the final criterion of "right" is "right for these specific people," not "fine for everyone"). It also holds the mirror image of all three failure modes: mistaking "fast" for the win = chasing only "many," forgetting "right"; generation in place of judgment = abandoning the human's responsibility for "right"; treating taste as computable = wrongly believing the standard for "right" can be handed entirely to the machine. Remember this one line, and at any concrete decision ask yourself: in this step am I helping generation produce more, or helping myself state the standard for "right" more clearly? The former the machine can increasingly do for you; the latter is forever your work — and forever where the real value of this craft lies.

① Move existing design into code form (even just tokens + components into the repo) to first gain readable/diffable/generatable. ② Use the self-check below to find the few places "most in need of injected taste." ③ Run the DSN 07 loop there (spec → spread → critique → steer → converge → distill), feeding the first round's judgment back into the system. Start small; let guardrails and judgment take shape, then widen.

最常见的三种误用

The three most common ways to get it wrong

失败 · 一
FAILURE · 1
把"快"当胜利
Mistaking "fast" for the win
出稿更快了,只是更快地产 slop。出稿快本身不是赢;若没换来更高品味命中,那不是 AI-Native 设计的胜利。
Comps arrive faster, but they are just slop produced faster. Speed alone is not the win; if it buys no higher taste hit-rate, it is not a win for AI-Native design.
失败 · 二
FAILURE · 2
用生成代替判断
Generation in place of judgment
不停"再来一个"却说不出在找什么。环空转,候选越堆越多,方向反而越来越糊。该停下写规格。
Endless "one more" with no statement of what you seek. The loop spins, candidates pile up, direction blurs. Stop and write the spec.
失败 · 三
FAILURE · 3
把品味当可计算
Treating taste as computable
把软判据硬塞 lint,优化掉每个可机检指标,得到挑不出错却没人想用的界面。软判据留给人。
Forcing soft criteria into lint and optimizing every checkable metric yields an interface in which no fault can be found and which no one wants to use. Soft criteria stay with people.
INSTRUMENT 12 · SLOP 自检表 · SLOP SELF-CHECK

发版前跑一遍:勾掉你这版命中的征兆。命中越多,越滑向均值;读数会给出 slop 分、所处带,与第一处该注入品味的地方。

Run it before release: tick the fingerprints this version hits. The more hits, the closer to the mean; the readout gives a slop score, the band, and the first place to inject taste.
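The readout described here (a score, a band, and the first place to inject taste) could be computed roughly as follows. Everything specific in this sketch is an assumption for illustration: the four fingerprints, the areas they map to, the band names, and the thresholds are hypothetical placeholders, since the actual checklist items live in the interactive instrument and are not reproduced in this text.

```python
# Hypothetical fingerprints, each paired with the area where taste would be injected.
FINGERPRINTS = [
    ("generic hero with default gradient", "visual identity"),
    ("three identical feature cards", "information hierarchy"),
    ("copy that could describe any product", "brand voice"),
    ("no stated audience", "intent"),
]


def slop_readout(hits):
    """Turn a set of ticked fingerprint names into a score, a band, and a first fix."""
    ticked = [(name, area) for name, area in FINGERPRINTS if name in hits]
    score = len(ticked) / len(FINGERPRINTS)
    # Band thresholds are illustrative, not taken from the instrument.
    if score < 0.34:
        band = "distinct"
    elif score < 0.67:
        band = "drifting"
    else:
        band = "mean"
    first_fix = ticked[0][1] if ticked else None
    return {"score": score, "band": band, "first_fix": first_fix}


readout = slop_readout({"three identical feature cards", "no stated audience"})
```

The score is only a prompt for attention; deciding what the fix should look like in the flagged area remains a human judgment.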

把每类设计决策分诊:哪些交给生成、哪些定成规则、哪些必须人判

Triage each kind of design decision: hand to generation, set as a rule, or judge by human

DSN 08 给了一条分诊问句,这里把它做成一台可玩的分配器。它沿两条轴打分:横轴问"这类决策可生成 / 可规则化吗?",纵轴问"它需要品味判断吗?"。两轴一交叉,每类决策落进一格,给出明确判词——交给生成、设计系统定规则、还是必由人判。这台分配器的价值不在于告诉你某个答案,而在于逼你把"设计"这个笼统的词拆成一类一类的决策,逐类问这两个问题——这本身就是把品味从直觉里外化出来的过程。试着用它过一遍你手上的项目:补全状态、配色主调、信息层级、品牌语气、对齐间距,各落在哪格?

DSN 08 gave a triage question; here it becomes a playable allocator. It scores along two axes: the horizontal asks "can this kind of decision be generated / ruled?" and the vertical asks "does it need taste judgment?" Cross the two and each kind of decision lands in a cell with a clear verdict: hand to generation, set a design-system rule, or judge by human. The allocator's value is not in handing you an answer but in forcing you to break the blanket word "design" into individual kinds of decision and ask these two questions of each, which is itself the process of externalizing taste from intuition. Try running your current project through it: filling states, the primary palette, information hierarchy, brand voice, alignment and spacing. Which cell does each land in?

INSTRUMENT 13 · 设计判断分配器 · DESIGN-JUDGMENT ALLOCATOR

选一类设计决策的两个属性,看它该落在哪个节点。两轴:可生成/可规则化?× 需品味判断?

Pick the two attributes of a kind of design decision and see which node it belongs to. Two axes: generable/rule-able? × needs taste?

① 这类决策可生成 / 可规则化吗?
① Can it be generated / ruled?
② 它需要构成性的品味判断吗?
② Does it need constitutive taste?
x=Y · y=N
交给生成
Hand to generation
补全状态、铺响应式、套系统、对齐切图——机器全包。
Fill states, responsive, apply the system, alignment/export — the machine does it all.
x=Y · y=Y
设计系统定规则
Design-system rule
可机检但带价值取向:调色板、间距阶、对比度阈值——人先定规则,机器再执行。
Machine-checkable yet value-laden: palette, spacing scale, contrast thresholds — humans set the rule, the machine enforces.
x=N · y=N
需上下文的事实题
Context-fact question
不靠品味但须懂语境:这群用户的真实流程、设备、约束——人查清事实,喂给生成。
No taste but needs context: these users' real flows, devices, constraints — humans establish facts, then feed generation.
x=N · y=Y
必由人判 · 品味
Human taste · keep here
为谁而做、有没有灵魂、对不对路——可验证性梯度最远端,不可外包给生成。
For whom, has soul, on-target — the far end of the verifiability gradient, not outsourceable.
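The four cells above amount to a two-input lookup, which can be sketched as below. The function name `allocate` and the returned strings are hypothetical conveniences; the mapping itself follows the four cells as listed, and the sample decisions are the examples those cells give.

```python
def allocate(generable: bool, needs_taste: bool) -> str:
    """Map the two allocator answers (x, y) to the verdict of the matching cell."""
    if generable and not needs_taste:
        return "hand to generation"        # x=Y, y=N
    if generable and needs_taste:
        return "design-system rule"        # x=Y, y=Y
    if not generable and not needs_taste:
        return "context-fact question"     # x=N, y=N
    return "human taste"                   # x=N, y=Y


# Example decisions taken from the cell descriptions: (generable, needs_taste).
decisions = {
    "fill states": (True, False),
    "palette": (True, True),
    "users' real flows": (False, False),
    "for whom": (False, True),
}

verdicts = {name: allocate(x, y) for name, (x, y) in decisions.items()}
```

The lookup is trivial on purpose: the work is in answering the two questions honestly for each kind of decision, not in the mapping.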

对一个设计师的职业,这套方法意味着什么

What this method means for a designer's career

把这卷收到个人层面:如果你是一个设计师,这套方法不是在预告你的工作会消失,而是在指出你的价值重心会迁移,且越早主动迁移越有利。会做(出稿、铺变体、对齐切图)的部分会持续贬值,因为它正是生成最擅长接管的;会判断(写有判别力的规格、说清为什么这版对路、把判断回流进系统)和会方向(看出该往哪生成、守住为谁而做的边界)的部分会持续升值,因为它坐落在可验证性梯度上模型够不到的那一端。这意味着值得主动投资的能力,不再是"把某个工具用得更熟",而是"把品味外化得更清楚、把判断说得更有理有据"。一个具体的自检:回顾你上周的工作,花在产出上的时间和花在判断、写规格、给方向上的时间,比例是多少?如果还是前者占绝大多数,那不是因为 AI 没用,而是你还停在旧流程里用更快的手——真正的迁移还没发生。这套方法给的不是工具,是一张把自己的价值重心往上挪的地图。

Bringing this volume down to the individual level: if you are a designer, this method is not foretelling that your work will vanish but pointing out that your center of value will migrate, and the earlier you migrate it deliberately, the better off you are. The making part (producing comps, spreading variants, alignment and export) will keep depreciating, because it is exactly what generation is best at taking over; the judging part (writing discriminating specs, stating why a version is on-target, feeding judgment back into the system) and the directing part (seeing which way to generate, holding the boundary of for-whom) will keep appreciating, because they sit at the end of the verifiability gradient the model cannot reach. This means the capability worth investing in is no longer "getting more fluent with some tool" but "externalizing taste more clearly, stating judgment with more reasoned grounds." A concrete self-check: review last week's work — what is the ratio between time spent on production and time spent judging, writing specs, giving direction? If the former still dominates by far, it is not because AI is useless but because you are still in the old process with a faster hand — the real migration has not happened. What this method offers is not a tool but a map for moving your own center of value upward.

这套迁移有一个常被忽略、却对个人最重要的隐含前提:判断力是会随用而长、随弃而萎的,所以"现在就开始判断"本身就是在投资未来的自己。价值重心上移不是一道一次性切换的开关,而是一条需要持续走的成长曲线——你越早开始在工作里刻意练规格、练评判、练方向,你的判断力就越早进入复利增长;反之,越是抱着"等工具更成熟、等团队都转了我再转"的心态拖延,你就越是在让那条本该增长的曲线停在原地,而周围真正动手的人在拉开差距。这条曲线对个人的残酷与公平都在于:它不奖励你用了多新的工具,只奖励你做了多少次真正的判断、并复盘了多少次为什么。所以这一卷给设计师的最后一句话不是"去学某个 AI 工具",而是:从你手上正在做的下一个设计开始,强迫自己在每个该判断的地方真的判断一次、并写下为什么——这一个动作,重复足够多次,就是你在 AI-Native 时代最可靠的护城河。

This migration rests on an implicit premise, often overlooked yet the most important one for the individual: judgment grows with use and atrophies with disuse, so "start judging now" is itself an investment in your future self. Moving your center of value upward is not a one-time switch but a growth curve you must keep walking. The earlier you start deliberately practicing spec, critique, and direction in your work, the earlier your judgment starts to compound; conversely, the longer you put it off with "I'll switch once the tools mature, once the team has moved," the longer that curve stalls while the people actually doing the work pull ahead. The curve is both cruel and fair to the individual: it rewards not how new your tools are but how many real judgments you made and how often you reviewed why. So this volume's last word to designers is not "go learn some AI tool" but this: starting with the very next design in your hands, force yourself to actually judge at each place a judgment is due and write down why. That one act, repeated enough times, is your most reliable moat in the AI-Native era.

把整卷收成一句可以带走的话:AI 让"做"变得廉价,于是"判断该做什么、为谁做"第一次成了设计真正的、几乎全部的价值。过去这两件事缠在一起,做得好的人通常也判断得好,于是没人需要把判断单拎出来谈;现在执行被剥离给生成,判断被迫独立显形,它的稀缺、它的可训练、它对人本的依赖,才第一次看得这么清楚。这一卷做的全部事情,就是把这个被剥离出来的判断,从"藏在直觉里的玄学"还原成"可拆解、可外化、可回流、可练习的具体动作"。而它最终指向的,是那条贯穿整个系列的主张:机器接管了重复的铺面,人被还给那件一开始就最该由人做、也最值得由人做的事——为具体的人,做真正为他们而存在的东西。

To bring the whole volume to one takeaway line: AI made "making" cheap, so "judging what to make and for whom" became, for the first time, design's real and nearly entire value. These two used to be entangled — those who made well usually judged well, so no one needed to discuss judgment on its own; now that execution is stripped to generation, judgment is forced to take independent shape, and only now are its scarcity, its trainability, its dependence on the human seen this clearly. Everything this volume did was to restore that stripped-out judgment from "mysticism hidden in intuition" to "a concrete act that can be decomposed, externalized, fed back, and practiced." And what it ultimately points to is the claim running through the whole series: the machine took over the repetitive surface; the person was returned to what was theirs to do from the start and most worth doing — making, for specific people, something that genuinely exists for them.

收束 · 系列人本主线
Close · the series' human through-line

这一卷是整个系列人本主线落得最具体的一面。AI-Native 设计不是让人更快地产 slop,而是把设计师还给共情、品味与意义:为具体的人,做真正为他们而存在的东西。

This volume is where the series' human through-line lands most concretely. AI-Native design is not producing slop faster; it returns the designer to empathy, taste, and meaning: making, for specific people, something that genuinely exists for them.

DSN
09
WORKED · 落到产物上
WORKED CASES
案例 · 走一遍
Cases · run it

把内核四步,按到四个真实产物上

Pressing the four-step kernel onto four real artifacts

前面的原理若只停在原理,就还只是好听的话。这一节把同一套内核——执行交给生成、判断退守到人、上下文写成规格、人回流到"为谁而做"——按到四个具体产物上:一次结账流程重做、一次把"好"写成规格、一次为少数人放弃多数人、一次把已上线的 slop 救回来。每个案例都给出可核对的前后差与留在人手里的那一刀。

Principle that stays principle is just nice talk. This section presses the same kernel — execution to generation, judgment back to people, context written as spec, people returning to "for whom" — onto four concrete artifacts: a checkout-flow redo, a "good" written as a spec, a choice to drop the many for the few, and a shipped slop product rescued. Each gives a checkable before/after and the one cut that stayed in human hands.

案例 A · 结账流程在近免费生成下重做
CASE A · A checkout flow, redone under near-free generation
铺开→收敛 · 真实跑法
spread→converge, as it ran
背景
Setting

一个中小电商的结账流程,老问题是第三步(地址+配送方式)弃单率高。传统做法:交互稿排一周、评审、切图、交付前端两周——一个备选,赌它对。AI-Native 做法:把约束写清后,让生成在一个下午铺出 九个结构不同的候选——单页全展开、三步分页、手风琴折叠、地址自动带出+仅确认、配送选项前置、运费早现、游客优先、钱包优先、混合式。产出不是瓶颈了,挑哪个、为什么才是。

A small-to-mid-size e-commerce checkout whose chronic problem was a high abandonment rate at step three (address + delivery). Traditional path: a week of interaction comps, a review, slicing, then two weeks of front-end handoff, all for one candidate and a bet that it is right. The AI-native path: with constraints written down, generation spread nine structurally different candidates in one afternoon: single-page expanded, three-step paged, accordion, address-autofill-then-confirm, delivery-first, freight-shown-early, guest-first, wallet-first, hybrid. Production stopped being the bottleneck; which one, and why, became it.

内核①②
KERNEL ①②
执行(出九稿)交给生成;判断(选哪个)退守到人。
Execution (nine comps) to generation; judgment (which) back to people.
收敛是怎么真发生的
How convergence actually happened

九个候选没有靠"哪个看起来顺眼"挑。团队先把弃单的真实成因查清——会话回放显示,多数人卡在运费在最后一步才出现,而非布局繁简。这一条事实把九选一从审美题变成了事实题:凡是"运费早现"的候选直接进入决赛圈,其余无论多漂亮都淘汰。决赛三个候选做了一周灰度 A/B:运费早现+地址自动带出的混合式,弃单率从 31% 降到 19%〔源:本案例数字为脱敏复盘区间,证据级 Ⅳ 一手从业者,非公开实验,不外推到其他品类〕[R5]

The nine were not picked by "which looks nicer." The team first established the real cause of abandonment: session replay showed most people stalled because freight appeared only at the last step, not because the layout was too busy. That single fact turned the nine-to-one choice from an aesthetic question into a factual one: every "freight-early" candidate advanced; the rest, however pretty, were cut. Three finalists ran a week of staged-rollout A/B testing: the hybrid with freight-early plus address-autofill dropped abandonment from 31% to 19%〔source: figures here are a de-identified retrospective range, grade Ⅳ practitioner first-hand, not a public experiment, not extrapolated to other categories〕[R5].
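The fact gate described here can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual tooling: the `Candidate` type, the `shows_freight_early` flag, and both helper functions are hypothetical names, and the flags simply mirror the three finalists the case reports (delivery-first, freight-early, hybrid).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    # Hypothetical flag: is freight visible before the final step?
    shows_freight_early: bool


# The nine structures named in the case; flags mirror its three finalists.
CANDIDATES = [
    Candidate("single-page", False),
    Candidate("three-step paged", False),
    Candidate("accordion", False),
    Candidate("address-autofill", False),
    Candidate("delivery-first", True),
    Candidate("freight-early", True),
    Candidate("guest-first", False),
    Candidate("wallet-first", False),
    Candidate("hybrid", True),
]


def fact_gate(candidates):
    """Keep only candidates that satisfy the established fact."""
    return [c for c in candidates if c.shows_freight_early]


def relative_drop(before, after):
    """Relative reduction of a rate, e.g. from 0.31 to 0.19."""
    return (before - after) / before


finalists = fact_gate(CANDIDATES)
print([c.name for c in finalists])
print(f"relative drop in abandonment: {relative_drop(0.31, 0.19):.0%}")
```

The gate itself is trivial; the expensive part is the human work of establishing which flag to filter on. For the reported figures, 31% to 19% is 12 percentage points, or roughly a 39% relative reduction.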

留在人手里
STAYED HUMAN
把"为什么弃单"查成事实,再用事实当判据筛候选——这一刀机器递不出。
Establishing "why they abandon" as fact, then using fact as the cut — a cut the machine cannot hand you.
读后
Read-off

注意这不是"生成帮我们更快画完一个稿"。是流程换了形:先铺开候选空间,再用一条查清的事实把空间收掉大半,最后只在三个里做受控比对。生成做了所有"做出来"的活,人做的全是"该是哪个、凭什么"的判断。两周的交付压成了一个下午加一周灰度,省下的不是工时,是把赌注从一个押到了九个,再用证据收敛

Note this is not "generation helped us finish one comp faster." The process changed shape: spread the candidate space first, collapse most of it with one established fact, then run a controlled comparison among only three. Generation did all the "making"; the human did all the "which one, on what grounds" judgment. Two weeks of handoff compressed to one afternoon plus a week of staged rollout, and the gain was not saved labor but the move from betting on one candidate to betting on nine, then converging on evidence.

内核③④
KERNEL ③④
事实写进筛选规则(③);为真实用户的处境而判(④)。
Fact written into the screening rule (③); judging for real users' situation (④).
案例图 CASE FIG · FIG. D9 / SPREAD-THEN-COLLAPSE
九候选如何被一条事实收掉
How nine candidates collapse on one fact
看懂:候选空间先铺宽,再被"运费早现"这条事实砍成决赛圈
Read: the candidate space spreads wide, then one fact ("freight early") cuts it to a shortlist
[Figure: nine candidates from one afternoon of generation (single-page, three-step paged, accordion, address-autofill, delivery-first, freight-early, guest-first, wallet-first, hybrid) pass a fact gate from session replay ("freight shows too late"; keep only freight-early candidates), leaving three finalists (delivery-first, freight-early, hybrid) for a staged A/B. Hybrid wins; abandonment 31% → 19%. The human judged two things only: which fact holds, and which candidate serves it.]
漏斗不是"生成给的越多越好"。宽口是生成的功劳(九个结构不同的候选,近零成本);窄口是人的功劳(一条查清的事实把审美题压成事实题)。决赛圈才用受控比对。机器把"做出来"做到极廉,人把"该是哪个"判得有据——这就是铺开→收敛的全部。
The funnel is not "the more generation gives, the better." The wide mouth is generation's contribution (nine structurally distinct candidates, near-zero cost); the narrow mouth is the human's (one established fact compresses an aesthetic question into a factual one). Only the shortlist gets controlled comparison. The machine makes "making" nearly free; the human judges "which one" on grounds — that is the whole of spread-then-converge.
案例 B · 把模糊的"高级感"写成可生成、半可机检的规格
CASE B · Turning a fuzzy "premium feel" into a generatable, half machine-checkable spec
品味写成规格
taste as spec
起点:一句话的"好"
Start: a one-line "good"

一个金融类 App 改版,老板的全部 brief 是三个字:"要高级。"过去这种 brief 靠一个资深设计师把它"画出来",好坏全在那个人脑子里、说不清也教不会。在生成几乎免费的语境下,这种 brief 是有毒的:你把"要高级"丢给模型,它给你的就是训练分布里最常见的那种"高级"——深色+衬线+大留白+金色描边,所有金融 App 都长这样的那种。模糊的"好"喂出的必然是均值 slop。

A fintech app redesign; the boss's entire brief was three words: "make it premium." Historically a senior designer would "draw out" what that meant, with good and bad living unspoken in one head, impossible to articulate or teach. Under near-free generation this brief is toxic: hand "premium" to a model and you get the most common "premium" in its training distribution (dark + serif + big whitespace + gold trim), the look every fintech app already wears. A fuzzy "good" feeds back mean slop by construction.

病灶
THE WOUND
模糊判据 + 廉价生成 = 自动收敛到均值。
Fuzzy criteria + cheap generation = auto-converge to the mean.
把"高级"拆成可判别的条目
Decomposing "premium" into discriminating items

解法不是换个更懂的设计师,是把"高级"拆成一组任何人(和机器)都能逐条判的条目。团队跟老板做了三轮"指物问答":拿十个现有界面,逐个问"这个算高级吗、为什么"。三轮后,"高级"被拆成六条可判据:① 信息密度低但不空(每屏不超过一个主操作);② 字体仅两档字号、字重对比靠粗细不靠颜色;③ 主色取自品牌而非通用金色,对比度 ≥ 4.5:1;④ 不用渐变文字、不用玻璃拟态;⑤ 数字用等宽体右对齐;⑥ 动效仅用于状态确认、时长 ≤ 200ms。前四条可写进 lint 直接机检;后两条是软判据,留给人复核。"高级"从一个人脑里的玄学,变成了一张能喂给生成、又能半自动验收的规格。

The fix is not a designer who "gets it" better; it is decomposing "premium" into items anyone (and any machine) can rule on one by one. The team ran three rounds of point-and-ask with the boss: take ten existing screens and ask, one at a time, "is this premium, and why?" After three rounds, "premium" decomposed into six rule-able items: ① low information density but not empty (at most one primary action per screen); ② only two type sizes, weight contrast carried by boldness not color; ③ primary color drawn from the brand, not generic gold, contrast ≥ 4.5:1; ④ no gradient text, no glassmorphism; ⑤ numbers in monospace, right-aligned; ⑥ motion only for state confirmation, ≤ 200ms. The first four go straight into a lint; the last two are soft criteria left for human review. "Premium" went from one head's mysticism to a spec you can feed generation and half-automatically accept.
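As a rough sketch of how the first four items might become a lint, the Python below checks a screen description against them. Everything about the input shape is an assumption: the `screen` dictionary, its keys, and the `lint` function are hypothetical, and the checks cover only the mechanically testable part of each item (for example, item ③ is reduced to its contrast threshold; whether the color is "drawn from the brand" is not tested). The contrast calculation follows the WCAG 2.x relative-luminance definition.

```python
def _linear(channel):
    """Convert an sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


BANNED_EFFECTS = {"gradient-text", "glassmorphism"}

# Items 5 and 6 stay with human reviewers in the case, so they are only listed.
SOFT_ITEMS = {
    5: "numbers in monospace, right-aligned",
    6: "motion only for state confirmation, <= 200ms",
}


def lint(screen):
    """Return the numbers of the machine-checked items (1-4) that fail."""
    failed = []
    if screen["primary_actions"] > 1:                # item 1
        failed.append(1)
    if len(set(screen["font_sizes_px"])) > 2:        # item 2 (size part only)
        failed.append(2)
    if contrast_ratio(screen["primary_color"], screen["background"]) < 4.5:
        failed.append(3)                             # item 3 (contrast part only)
    if BANNED_EFFECTS & set(screen["effects"]):      # item 4
        failed.append(4)
    return failed


# Hypothetical screen descriptions, as a generator might emit them.
good = {"primary_actions": 1, "font_sizes_px": [16, 28, 16],
        "primary_color": "#1A2B4C", "background": "#FFFFFF", "effects": []}
bad = {"primary_actions": 3, "font_sizes_px": [12, 14, 16, 28],
       "primary_color": "#CCCCCC", "background": "#FFFFFF",
       "effects": ["glassmorphism"]}

print(lint(good), lint(bad))
```

A lint of this kind only reports which items failed; deciding what the items should be remains the point-and-ask work described in the case.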

两层规格
TWO LAYERS
①–④ 进 lint(可机检);⑤⑥ 留人(软判据)。判断一次,执行无数次。
①–④ to lint (machine-checkable); ⑤⑥ to humans (soft). Judge once, enforce countless times.
读后
Read-off

写规格的那三轮指物问答,本身就是这个项目最贵、最不可外包的工作——它把一个人的隐性品味,逼成了显性的、可传递的、可机检一半的判据。这正是设计卷反复说的"把判断从做里抽出来、写下来":过去品味活在那位设计师的手上,他一走就带走;现在它写成了六条规格,新来的人和生成模型都能照着跑。规格不是限制创意,它是把创意里可复用的那部分固化、把不可复用的那部分(为什么是这个品牌色)留给人继续判。

Those three rounds of point-and-ask were themselves the most expensive, least outsourceable work in the project — they forced one person's tacit taste into explicit, transferable, half-machine-checkable criteria. This is exactly what the volume keeps saying: pull judgment out of making and write it down. Taste used to live in that designer's hands and leave when he left; now it is six written rules a new hire and a generation model can both run against. A spec does not constrain creativity; it freezes the reusable part of judgment and leaves the unreusable part — why this brand color — for humans to keep judging.

内核②③
KERNEL ②③
判断退守(②)后,被写成上下文规格(③)喂回生成。
After judgment retreats (②), it is written as context spec (③) fed back to generation.
机理图 MECHANISM · FIG. D10 / GRAVITY vs FORCE
均值引力对异质守护力
Mean-gravity against the heterogeneity force
看懂:生成默认把设计拉向分布均值;只有人施加一个反方向的力,产物才落到"只对这群人成立"
Read: generation pulls design toward the distribution mean by default; only a human counter-force lands it on "true only for these people"
[Figure: the artifact sits between two forces. Mean-gravity (generation's default direction; always on, free) pulls it toward the mean, the slop basin. The heterogeneity force (human-supplied, costly) pulls it toward "true only for these people." Equilibrium is where the two forces balance. Remove the human force and the artifact falls to the mean: not a risk, the default outcome.]
这是设计卷整套主张的受力图。均值引力是常开的、免费的:生成模型优化"对得多",自然把每个产物往大家见得最多的样子拉。异质守护力是人施加的、昂贵的:它要求"为这群人特别",方向恰好相反。产物停在哪,取决于两力平衡——撤掉人的力,产物必然滑回均值。所以同质化不是某次失手,是不施力时的默认结局;异质守护也就不是锦上添花,是把产物按在它该在的位置上的那只手。
This is the force diagram behind the whole volume. Mean-gravity is always-on and free: a generation model optimizes "right for many" and naturally pulls every artifact toward the shape most people have seen. The heterogeneity force is human-supplied and costly: it demands "particular to these people," pointing the opposite way. Where the artifact rests is set by the balance — remove the human force and it slides back to the mean. So homogenization is not one slip but the default when no force is applied; and heterogeneity-guarding is not a nicety but the hand holding the artifact where it belongs.
案例 C · 选择"只对这群人成立",并付出代价
CASE C · Choosing "true only for these people," and paying the cost
异质守护的一次决策
a heterogeneity-guarding decision
两条路摆在面前
Two roads

一个给视障人群用的播客 App。改版时摆出两条路:路 A,按通用最佳实践做——大图卡片、瀑布流、自动播放预览,这是生成默认会给的、也是评审会上最容易过的方案,因为它"看起来对";路 B,为这群人的真实处境做——主屏只有三个超大触控区、全程可纯键盘/读屏操作、关闭一切自动播放、用声音而非视觉做状态反馈、对比度拉到 7:1。路 B 在任何"通用美观"的评审标准下都会扣分:它不好看、不"现代"、留白少、信息密度高。

A podcast app for blind and low-vision users. The redesign laid out two roads. Road A: build to generic best practice, with big image cards, infinite scroll, and autoplay previews. This is what generation gives by default and what passes review most easily, because it "looks right." Road B: build for these users' real situation, with a home screen of just three oversized touch zones, full keyboard and screen-reader operation, all autoplay off, state feedback by sound rather than sight, and contrast pushed to 7:1. Road B loses points under any "generic good-looking" review standard: not pretty, not "modern," little whitespace, high density.

分叉
THE FORK
通用均值(A)对真实人群(B)——只能选一个当北极星。
Generic mean (A) vs real people (B) — only one can be the north star.
付的代价是真的
The cost was real

团队选了路 B,而且明确认了代价:在面向明眼用户的应用商店截图里,它"卖相"差,下载转化低于同类;做品牌物料时,市场部反复想把那三个大触控区改"精致点"。这些代价不是想象的,是月月在数据和会议里出现的。但留在人手里的判断是:这个产品的"好",不由通用审美定义,由它服务的人能不能独立用完一集播客定义。可用性测试里,纯读屏完成"找到→播放→收藏"的成功率从改版前的 41% 升到 92%〔源:本案例为脱敏复盘,无障碍可用性区间引自 WebAIM 屏幕阅读器用户调查的同类量级,证据级 Ⅳ 一手+Ⅱ 调查参照,不外推为通用转化结论〕[R6]

The team chose road B and named the cost outright: in app-store screenshots aimed at sighted users it "sells" poorly, with download conversion below peers; in brand work, marketing kept wanting to make the three big touch zones "more refined." These costs were not imagined — they showed up monthly in data and meetings. But the judgment that stayed human was this: this product's "good" is not defined by generic aesthetics but by whether the people it serves can finish an episode independently. In usability testing, screen-reader-only success at "find → play → save" rose from 41% before to 92%〔source: de-identified retrospective; the accessibility-usability range cites the same order of magnitude as the WebAIM screen-reader user survey, grade Ⅳ first-hand + Ⅱ survey reference, not extrapolated to a generic conversion claim〕[R6].

留在人手里
STAYED HUMAN
定义"为谁好",并承担"对别人不好"的代价。机器不会替你认这个账。
Defining "good for whom," and bearing the cost of "not good for others." The machine will not own that bill for you.
读后
Read-off

异质守护不是免费的口号,它每次都要付一笔真实的代价——放弃一部分通用观众、顶住"做得更普适些"的压力、在某些通用指标上认输。但这恰恰是 AI-Native 设计里人最不可替代的那一刀:生成会一直把你往"对所有人都还行"拉,只有人能决定"我就为这群人做到最好,哪怕对别人差"。这一刀一旦交出去,产品就再也回不到"为它要服务的人而做",只会均匀地谁都不得罪、谁也不真正服务。

Heterogeneity-guarding is not a free slogan; every time it costs something real — giving up some of the generic audience, resisting pressure to "make it more universal," conceding on certain generic metrics. But that is precisely the most irreplaceable cut humans hold in AI-native design: generation will forever pull you toward "fine for everyone," and only a human can decide "I will be best for these people, even if worse for others." Surrender that cut and the product can never return to being made for the people it serves; it only evenly offends no one and serves no one.

内核④
KERNEL ④
人回到"为谁而做"的意义判断——并为之承担取舍。
People return to the meaning judgment of "for whom" — and own the trade-off.
案例 D · 一个"快但是 slop"的已上线产品,怎么诊断、怎么救回
CASE D · A shipped-fast-but-slop product: diagnosing and rescuing it
slop 急救
slop rescue
症状
Symptom

一个 SaaS 仪表盘,团队用生成工具三天就上线了——快,是真快。但上线两周,用户反馈高度一致:"看起来挺专业的,但我说不出它哪不对,就是不想用。"留存第七日掉到 11%。这是典型的 slop 症状:没有明显 bug,每个页面单看都"还行",合在一起却空洞、雷同、没有重量。它不是做坏了,是做得太顺、太均值了

A SaaS dashboard the team shipped in three days with generation tools: fast, genuinely fast. But two weeks in, user feedback was strikingly uniform: "it looks professional, but I can't say what's wrong; I just don't want to use it." Day-7 retention sank to 11%. This is the textbook slop symptom: no obvious bugs, every page "fine" alone, yet hollow, samey, and weightless together. It was not built badly; it was built too smoothly, too close to the mean.

slop ≠ bug
SLOP ≠ BUG
没有错,只是没有"为谁"——这是诊断的入口。
Nothing is wrong; there is just no "for whom" — the entry point for diagnosis.
用指纹清单逐条诊断
Diagnosing against the fingerprint list

救回的第一步不是重做,是诊断。团队拿前面 SHEET 那张 slop 指纹清单(配色雷同、玻璃拟态、处处大圆角+柔投影、空洞口号词、所有卡片一样大、字重靠颜色不靠粗细),逐条对照仪表盘——命中五条。命中本身就指出了病因:这五条都是"生成默认会给、没人下令它别给"的特征。诊断结论写成一句:这个产品的每个像素都对得起"通用专业感",没有一个像素是为它的真实用户——每天看十次、只关心三个数的运营——而做的。

The first rescue step is not a rebuild but diagnosis. The team took the slop-fingerprint list from the earlier SHEET (samey palettes, glassmorphism, rounded-everything + soft shadows, hollow slogan words, all cards the same size, weight carried by color not boldness) and checked the dashboard item by item — five hits. The hits themselves named the cause: all five are features "generation gives by default, with no one ordering it not to." The diagnosis wrote up as one line: every pixel honors "generic professionalism," and not one pixel was made for its real user — the operator who looks ten times a day and cares about three numbers.
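A checklist audit like this one is easy to keep as data. The sketch below is illustrative only: `diagnose` and `cards_same_size` are hypothetical helpers, and the sample audit is invented for the example, since the case does not say which five of the six fingerprints were hit.

```python
FINGERPRINTS = [
    "samey palette",
    "glassmorphism",
    "rounded-everything + soft shadows",
    "hollow slogan words",
    "all cards the same size",
    "weight carried by color, not boldness",
]


def diagnose(observed):
    """Return the fingerprints an audit marked as present, in list order."""
    unknown = set(observed) - set(FINGERPRINTS)
    if unknown:
        raise ValueError(f"not on the fingerprint list: {sorted(unknown)}")
    return [f for f in FINGERPRINTS if observed.get(f, False)]


def cards_same_size(cards):
    """One fingerprint that can be checked mechanically from layout data."""
    return len({(c["w"], c["h"]) for c in cards}) == 1


# Invented audit notes for illustration; not the case's actual findings.
layout = [{"w": 320, "h": 180}] * 7
audit = {
    "samey palette": True,
    "glassmorphism": True,
    "rounded-everything + soft shadows": True,
    "hollow slogan words": False,
    "all cards the same size": cards_same_size(layout),
    "weight carried by color, not boldness": True,
}

hits = diagnose(audit)
print(f"{len(hits)} of {len(FINGERPRINTS)} fingerprints hit:")
for name in hits:
    print(" -", name)
```

The output is a named list of symptoms rather than a score, which is what lets "I can't say what's wrong" turn into something a team can discuss.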

诊断工具
THE TOOL
指纹清单把"说不出哪不对"翻译成五条可指认的具体征兆。
The fingerprint list translates "can't say what's wrong" into five nameable, concrete symptoms.
救法:不是重画,是重判
The fix: not repaint, re-judge

救回不靠"再生成一版更漂亮的"——那只会换一种 slop。救法是把"为谁"补回来:先做三个真实运营的影子观察,确认他们每天只盯"今日转化、异常订单、待处理工单"三个数;据此把首屏 80% 的卡片删掉,只留这三个,做大、做对比、做成"扫一眼就知道有没有事";删掉所有装饰性渐变和玻璃拟态,把省下的视觉预算全给那三个数。重做只花了两天(生成依然廉价),但这两天前面压着一周的判断:查清为谁、定义这个产品的"好"。改版后第七日留存从 11% 升到 34%〔源:本案例脱敏复盘区间,证据级 Ⅳ 一手从业者〕[R5]

Rescue is not "generate a prettier version" — that just swaps one slop for another. The fix restores the "for whom": shadow three real operators, confirm they watch only three numbers daily — today's conversion, anomalous orders, open tickets; on that basis delete 80% of the home-screen cards, keep only those three, make them big, contrasted, "glanceable for whether anything's wrong"; strip every decorative gradient and glass layer and spend the freed visual budget entirely on those three numbers. The rebuild took just two days (generation is still cheap), but those two days sat behind a week of judgment: establish for whom, define this product's "good." Day-7 retention rose from 11% to 34% after the redesign〔source: de-identified retrospective range, grade Ⅳ practitioner first-hand〕[R5].

内核全链
FULL KERNEL
诊断(②判断)→ 查清为谁(③上下文)→ 重生成(①执行)→ 验收(④意义)。
Diagnose (② judgment) → establish for whom (③ context) → regenerate (① execution) → accept (④ meaning).
前后图 BEFORE/AFTER · FIG. D11 / SLOP RESCUE
不是重画,是把"为谁"补回来
Not repaint, restore the "for whom"
看懂:救 slop 的关键动作发生在生成之前:补回判断,再让生成执行
Read: the decisive move in rescuing slop happens before generation: restore judgment, then let generation execute
[Figure: BEFORE (slop): seven equal cards, nothing emphasized, D7 retention 11%. THE WEEK BEFORE: shadow three operators; they watch three numbers; judgment, not pixels. AFTER (for whom): three numbers (conversion, anomalies, tickets) with clear priority, 80% of decoration cut. D7 retention 11% → 34%.]
两端的"画"都是生成出来的,廉价、快。差别全在中间那个黑盒:救 slop 的决定性动作不是再画一版,是先补回被跳过的判断——查清为谁、它们只关心什么。前后对比里像素的变化(七个平均卡片→三个有主次的数)只是结果,真正的工作是那一周的判断。这也解释了为什么"再生成一版更漂亮的"永远救不了 slop:它跳过的正是中间这一步。
Both "drawings" are generated — cheap, fast. The whole difference is the black box between them: the decisive move in rescuing slop is not another comp but restoring the skipped judgment — establish for whom, and what they alone care about. The pixel change in the before/after (seven average cards → three prioritized numbers) is only the result; the real work was that week of judgment. This is why "generate a prettier one" never rescues slop: it skips exactly this middle step.
DSN
10
CRITIQUE · 旧结构
OLD STRUCTURES
批判 · 受力点
Critique · where it breaks

六种传统设计结构,在生成廉价时各自从哪一处断裂

Six traditional design structures, and where each one snaps when generation gets cheap

这些结构不是因为旧而错——它们曾经合理,是因为"做出来"昂贵而合理。一旦生成把"做出来"压到近零,每一种都从一个具体的承重点断裂:它们把人摆在了执行位,而执行恰好是机器接管的那一半。逐一点名,逐一说清断在哪。

These structures are not wrong for being old — they were once sound, sound because "making" was expensive. The moment generation crushes "making" to near-zero, each snaps at a specific load-bearing point: each puts humans in the execution seat, and execution is exactly the half the machine takes over. Named one by one, with the break located in each.

共同的断裂机理:六种结构表面各异,断裂处是同一个——它们都把人的价值定义在"产出物的速度或精致度"上。当这两样都被生成做到又快又好,建立在它们之上的角色、流程、组织就同时失去了承重墙。下面逐条点名,给出每一种过去为何合理、现在断在哪、AI-Native 下它该变成什么。

The shared break: the six structures look different but snap at the same place — each defines human value by "the speed or polish of output." When generation does both fast and well, the roles, flows, and orgs built on them lose their load-bearing wall at once. Each is named below: why it was once sound, where it snaps now, and what it must become under AI-native.

旧结构 ① · 装饰工 / 最后一公里美化
OLD ① · DESIGN-AS-DECORATION / LAST-MILE PRETTIFIER
"功能先做完,最后叫设计来美化一下"
"Build the function first, call design at the end to pretty it up"
过去合理:美化耗工时,得有专人。断在哪:美化正是生成最擅长的——它能瞬间出十版"更好看"。当美化免费,把人定位成美化工,等于把人按在机器最强的那一格上。该变成:人不做最后一公里的美化,做第一公里的判断——为谁、何为好、哪版对路。
Once sound: prettifying cost hours, so it needed a dedicated person. Where it snaps: prettifying is exactly what generation does best — ten "nicer" versions instantly. When prettifying is free, casting humans as prettifiers pins them on the machine's strongest square. Becomes: humans do not do the last-mile polish but the first-mile judgment — for whom, what good means, which version is on-target.
旧结构 ② · 孤胆天才 / 作者式设计师
OLD ② · THE LONE-GENIUS AUTEUR
"好设计出自一个有品味的天才之手"
"Good design flows from one tasteful genius's hand"
过去合理:品味是隐性的、长在手上的,只能由那个人产出。断在哪:手的活已交给生成,天才的"手"不再稀缺;稀缺的是把品味写下来的能力。作者式设计师若不把判据外显,他的品味就随他离职清零、也无法喂给生成。该变成:从"用手产出品味"转向"把品味写成可传递、可机检一半的规格"——见案例 B。
Once sound: taste was tacit, lived in the hand, only that person could produce it. Where it snaps: the hand's work has gone to generation; the genius's "hand" is no longer scarce — scarce is the ability to write taste down. An auteur who never externalizes criteria sees his taste zero out when he leaves, and it cannot be fed to generation. Becomes: from "producing taste by hand" to "writing taste as transferable, half-machine-checkable spec" — see Case B.
旧结构 ③ · 出稿—交付—甩给开发的瀑布
OLD ③ · MOCKUP-THEN-HANDOFF WATERFALL
"设计师画死稿,标注好,扔过墙给前端实现"
"Designer freezes a comp, annotates it, throws it over the wall to front-end"
过去合理:画稿和写码是两种昂贵技能,分工省成本。断在哪:当产物本身变成代码(设计即代码,见 DSN 02),"画稿—标注—再翻译成代码"这道墙凭空多出一次有损翻译,每翻一次都丢信息、生 bug。生成能直接产出可运行的产物,墙两边的"翻译"成了纯损耗。该变成:设计与实现在同一介质(代码/token)里同步演化,没有甩墙这一步。
Once sound: drawing and coding were two expensive skills; splitting them saved cost. Where it snaps: when the artifact itself becomes code (design-as-code, see DSN 02), the wall of "comp → annotate → translate to code" adds a gratuitous lossy translation that bleeds information and breeds bugs each pass. Generation produces runnable artifacts directly, so the cross-wall "translation" becomes pure waste. Becomes: design and implementation co-evolve in one medium (code/tokens), with no over-the-wall step.
旧结构 ④ · 像素级画板文化
OLD ④ · PIXEL-PERFECT ARTBOARD CULTURE
"把每一帧每一态都在画板上逐像素固定"
"Nail every frame and state to the pixel on the artboard"
过去合理:实现昂贵、改一次代价高,所以要在画板里把一切定死、减少返工。断在哪:当生成能在几分钟内出齐所有响应式状态与变体,把人时投入到"逐像素固定一帧"上,是把稀缺的判断花在了机器随手能补的细节上。画板里的"完美一帧"还会骗人——它在真实数据、真实设备、真实边界条件下往往不成立。该变成:定义约束与判据(规格、token、护栏),让生成铺出全部状态,人验收的是"对不对路"而非"像素对不对齐"。
Once sound: implementation was expensive and a change costly, so you nailed everything on the artboard to cut rework. Where it snaps: when generation produces every responsive state and variant in minutes, spending human-hours nailing one frame to the pixel spends scarce judgment on details the machine fills offhand. The artboard's "perfect frame" also lies — it often fails under real data, real devices, real edge cases. Becomes: define constraints and criteria (spec, tokens, guardrails), let generation spread all states, and have humans accept "on-target?" not "pixels aligned?"
旧结构 ⑤ · 设计当内部服务台
OLD ⑤ · DESIGN AS INTERNAL SERVICE DESK
"业务提需求 → 设计接单出图 → 计件交付"
"Business files a ticket → design takes the order and ships comps → piecework delivery"
过去合理:出图是稀缺产能,排队接单能让稀缺资源利用率最高。断在哪:出图不再稀缺,"接单出图"这条价值链整段塌掉——业务自己用生成就能出图。设计若继续做服务台,它守护的恰好是已经免费的那项产能,而把真正稀缺的(为谁、何为好的判断)拱手让给了不懂判断的人。该变成:从"接单出图"转向"定义并守护判据"——不再是产能的瓶颈,而是品味与责任的owner。
Once sound: comps were scarce capacity, and queued ticketing maximized utilization of a scarce resource. Where it snaps: comps are no longer scarce, and the whole "take-ticket-ship-comp" value chain collapses — business can produce comps with generation itself. A design team that stays a service desk guards precisely the capacity that is now free, while ceding the truly scarce thing (the judgment of for-whom and what-good-means) to people who do not judge it. Becomes: from "take the ticket, ship the comp" to "define and guard the criteria" — no longer the capacity bottleneck but the owner of taste and responsibility.
旧结构 ⑥ · "搞得有冲击力点"式 brief
OLD ⑥ · THE "MAKE IT POP" BRIEF
"做得高级点 / 有冲击力点 / 现代点"
"Make it premium / make it pop / make it modern"
过去合理:brief 模糊没关系,反正中间隔着一个会追问、会用手把模糊变具体的设计师。断在哪:当 brief 直接喂给生成,模糊不再被人消化,而是被模型用训练分布的均值填满——"高级"返回最常见的高级、"现代"返回最常见的现代。模糊 brief × 廉价生成 = 自动产 slop(见案例 B、案例 D)。该变成:brief 必须先被拆成可判别的判据(哪条成立、为谁成立),模糊的"好"在喂给生成前就得被人翻译成可生成的规格。
Once sound: a fuzzy brief was fine because a designer sat in the middle who would interrogate it and turn fuzz into specifics by hand. Where it snaps: when the brief feeds straight into generation, fuzz is no longer digested by a human but filled by the model with the mean of its training distribution — "premium" returns the most common premium, "modern" the most common modern. Fuzzy brief × cheap generation = automatic slop (see Cases B and D). Becomes: a brief must first be decomposed into rule-able criteria (which holds, for whom), and a fuzzy "good" must be translated by a human into a generatable spec before it ever reaches generation.
结构图 STRUCTURE · FIG. D12 / SEAT SWAP
旧结构把人放执行位,新结构把人挪到判断位
Old structures seat humans in execution; the new one moves them to judgment
看懂:六种旧结构断在同一处:人坐在了机器最强的执行格
Read: all six old structures snap at the same place: humans sit on the machine's strongest square, execution
[Figure: two columns. Left, EXECUTION (machine-strong): comps, prettifying, pixel-nailing, piecework; the six old structures (① decoration, ② lone genius, ③ handoff wall, ④ pixel artboard, ⑤ service desk, ⑥ "make it pop") pin the human here. Right, JUDGMENT (human-only): for whom, what good means, which version, owning the cost. The seat swap moves the human from the left column to the right, the square the machine cannot answer.]
把六种旧结构叠在一张图上,它们的断点重合:每一种都让人坐在左栏——执行位,而执行恰是生成做到又快又好的那一半。它们不是各自独立的坏习惯,是同一个错误的六个变体:把人的价值定义在产出物上。修法也只有一个方向:把人从左栏挪到右栏,去做机器答不了的判断。这张图是 DSN 10 整节的骨架,也是为什么本卷反复说"判断退守到人"。
Overlay the six old structures on one diagram and their break points coincide: each seats the human in the left column — execution, exactly the half generation does fast and well. They are not six independent bad habits but six variants of one error: defining human value by output. The fix has only one direction: move the human from the left column to the right, to the judgment the machine cannot answer. This diagram is the skeleton of all of DSN 10, and why the volume keeps saying "judgment retreats to people."

值得说清的是:点名旧结构不是说做执行的人没价值,也不是要谁明天就失业。是说当一种结构把人的价值锚点定在已经免费的产能上时,这种结构会先失去解释力、再失去存在理由。一个团队可以继续用 Figma 画板、继续有资深设计师、继续接业务需求——只要价值锚点从"出得快、画得精"挪到"判得准、守得住为谁"。结构批判的对象从来不是工具或岗位,是那条把人按在执行位上的隐含假设。

Worth stating plainly: naming old structures does not say execution workers have no value, nor that anyone is fired tomorrow. It says that when a structure anchors human value to a capacity that is now free, that structure first loses explanatory power, then its reason to exist. A team can keep using Figma artboards, keep senior designers, keep taking business requests — as long as the value anchor moves from "ships fast, draws fine" to "judges true, guards for-whom." The target of the critique was never the tool or the role; it is the buried assumption that pins humans to the execution seat.

DSN
11
TOOLKIT · 可照做
DO-THIS TOOLKIT
工具 · 拿去用
Tools · take and use

把"何为好"变成今天就能跑的工具,而非口号

Turning "what good means" into tools you can run today, not slogans

设计这个面比组织面提供更多可操作的工具,因为它的产物是具体的、可机检一半的。这一节给五件不是概念而是"今天就能照着跑"的工具:把设计令牌写成代码、把设计系统做成两层护栏、把品味拆成一张评分卡、把"铺开候选"写成可复用协议、把"生成×品味"做成一张能放决策的坐标。每件都附判据,不附口号。

The design surface affords more operable tools than the org surface, because its artifact is concrete and half machine-checkable. This section gives five tools that are not concepts but "run it today": write design tokens as code, build the design system as a two-layer guardrail, decompose taste into a scorecard, write "spread candidates" as a reusable protocol, and make "generation × taste" a coordinate you can place a decision on. Each comes with criteria, not slogans.

工具一 · 设计令牌即代码:让"好"可被机器执行

Tool 1 · Design tokens as code: making "good" machine-enforceable

设计令牌(design tokens)是把一切视觉决策——颜色、间距、字号、圆角、阴影、动效时长——抽象成命名变量,存成机器可读的格式(JSON / CSS 变量 / 平台原生),再由构建链注入所有产物。它的意义在 AI-Native 语境下被放大:当生成在产出无数变体,令牌是少数能让海量生成保持一致的锚点。人只需判断一次"主色是这个、间距走 8 的倍数、对比度不低于 4.5:1",这个判断就以令牌的形式被所有生成强制继承。判断一次,执行无数次——这正是内核②③的落地形态〔源:W3C Design Tokens Community Group 规范草案与各大设计系统(Material / Carbon / Polaris)token 实践,证据级 Ⅳ 行业实践〕[R7]

Design tokens abstract every visual decision — color, spacing, type size, radius, shadow, motion duration — into named variables, stored in a machine-readable format (JSON / CSS variables / platform-native), then injected into every artifact by the build chain. Their meaning is amplified under AI-native: when generation spits out countless variants, tokens are one of the few anchors that keep mass generation consistent. A human judges once — "the primary is this, spacing in multiples of 8, contrast no lower than 4.5:1" — and that judgment is force-inherited by all generation as tokens. Judge once, enforce countless times — the landed form of kernel ②③〔source: W3C Design Tokens Community Group draft and the token practice of major design systems (Material / Carbon / Polaris), grade Ⅳ industry practice〕[R7].

没有令牌 · 生成各跑各的
No tokens · generation drifts
十个页面十种蓝、间距随手取、对比度时好时坏——每版单看还行,合起来散。一致性靠人逐页盯,盯不过来。
Ten pages, ten blues, spacing picked offhand, contrast hit-or-miss — each page fine alone, incoherent together. Consistency rides on a human checking page by page, and they cannot keep up.
有令牌 · 判断一次被强制继承
With tokens · one judgment force-inherited
主色、间距阶、对比度阈值定义一次,所有生成从令牌取值。改一个令牌,全站同步。人省下的盯页时间,全投到"这个主色为什么对"。
Primary, spacing scale, contrast threshold defined once; all generation reads from tokens. Change one token, the whole product updates. The watching time saved goes entirely into "why is this primary right."
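To make "define once, inherit everywhere" concrete, here is a minimal Python sketch of a token file being validated and emitted as CSS custom properties. The token names, the flat dotted-key layout, and both functions are assumptions made for illustration; they do not follow the W3C Design Tokens draft format, and a real build chain would more likely use an existing token tool.

```python
# Hypothetical token set: one human decision per entry.
TOKENS = {
    "color.primary": "#1A2B4C",
    "space.sm": 8,
    "space.md": 16,
    "space.lg": 24,
    "radius.card": 8,
    "motion.confirm.ms": 200,
}


def validate(tokens):
    """Return spacing tokens that break the 'multiples of 8' decision."""
    return [name for name, value in tokens.items()
            if name.startswith("space.") and value % 8 != 0]


def to_css(tokens):
    """Emit every token as a CSS custom property on :root."""
    lines = []
    for name, value in tokens.items():
        css_name = "--" + name.replace(".", "-")
        if name.endswith(".ms"):
            css_value = f"{value}ms"      # durations
        elif isinstance(value, int):
            css_value = f"{value}px"      # lengths
        else:
            css_value = value             # colors and other strings
        lines.append(f"  {css_name}: {css_value};")
    return ":root {\n" + "\n".join(lines) + "\n}"


problems = validate(TOKENS)
if problems:
    raise SystemExit(f"spacing tokens off the 8-grid: {problems}")
print(to_css(TOKENS))
```

Generated pages would then reference `var(--color-primary)` and `var(--space-md)` instead of literal values, so changing one entry in `TOKENS` changes every page that reads it.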

工具二 · 设计系统即两层护栏:硬约束 + 软判据

Tool 2 · The design system as a two-layer guardrail: hard rules + soft criteria

设计系统不该是一本没人看的规范文档,而该是一道两层护栏。A 层是硬约束(HARD-RULES):可机检、可写进 lint、命中即拦,包括禁渐变文字、调色板取自 token、模糊层数上限、字体白名单(排除 Inter/Roboto 等被用滥的默认字体)、对比度阈值、卡片尺寸必须随内容权重变化、"空洞口号词表"命中零次。B 层是软判据(SOFT-CRITERIA):不可机检、必由人复核,例如"这个主色为什么是它""这款字为什么对这群人""这条动效有没有意义"。A 层让生成不会跑出红线,把人从无穷的低级一致性检查里解放;B 层把人的注意力收束到真正需要判断的少数几处。两层一硬一软,恰好对应可验证性梯度的两端(见核心图 FIG. D0)。

A design system should not be an unread spec document but a two-layer guardrail. Layer A is hard rules (HARD-RULES): machine-checkable, lintable, blocked on hit. These include a ban on gradient text, a palette drawn from tokens, a cap on blur layers, a font allowlist (excluding overused defaults such as Inter/Roboto), a contrast threshold, card size that must vary with content weight, and zero hits against a "hollow slogan words" list. Layer B is soft criteria (SOFT-CRITERIA): not machine-checkable, human-reviewed only, such as "why is this primary color the one," "why is this typeface right for these people," and "does this motion mean anything." Layer A keeps generation inside the red lines and frees humans from endless low-level consistency checks; Layer B funnels human attention to the few places that truly need judgment. One hard, one soft, mapping exactly onto the two ends of the verifiability gradient (see key figure FIG. D0).

护栏图 GUARDRAIL · FIG. D13 / TWO-LAYER CLAMP
硬约束夹住生成,软判据留给人
Hard rules clamp generation, soft criteria stay human
看懂:A 层硬约束像夹钳把生成夹在红线内;越过夹钳的判断交给 B 层的人
Read: Layer-A hard rules clamp generation inside the red lines; judgment past the clamp goes to the human in Layer B
[Figure: mass generation output passes through LAYER A · HARD-RULES (lint): ban gradient text, palette = token, contrast ≥ 4.5, font allowlist; block on hit, no human involved; the machine-checkable half. What passes reaches LAYER B · SOFT-CRITERIA (human): why this primary? right type for them? does this motion mean anything? The constitutive half, not outsourceable.]
两层护栏不是把规范写得更厚,是把规范切成两半按不同方式执行:能机检的一半(A 层)做成 lint,让机器零成本地把海量生成夹在红线内,人完全不必碰;不能机检的一半(B 层)显式标成软判据,把人的注意力收束到这几处真正需要判断的问题上。夹钳的意义是:它让"一致性"不再消耗人的判断力,从而把判断力全部留给夹钳拦不住的那些构成性问题。
A two-layer guardrail is not a thicker spec but a spec cut in half and enforced two different ways: the machine-checkable half (Layer A) becomes a lint that clamps mass generation inside the red lines at zero cost, untouched by humans; the non-checkable half (Layer B) is marked explicitly as soft criteria, funneling human attention to the few questions that truly need judgment. The point of the clamp: it stops "consistency" from consuming human judgment, leaving all of that judgment for the constitutive questions the clamp cannot catch.
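
The split between the two layers can be sketched as a small lint pass, offered here as an illustration rather than a reference implementation. The contrast check uses the WCAG 2.x relative-luminance formula and the 4.5 threshold shown in FIG. D13; the palette, the font allowlist, the blur cap, and the shape of the `spec` dictionary are all hypothetical assumptions.

```python
# Hypothetical Layer-A configuration; in practice these would come from tokens.
PALETTE = {"#111111", "#FFFFFF", "#1F4FD8"}
FONT_ALLOWLIST = {"IBM Plex Sans", "Source Serif"}
MAX_BLUR_LAYERS = 2
MIN_CONTRAST = 4.5  # threshold shown in FIG. D13

# Layer B: never evaluated by code, only surfaced to a human reviewer.
SOFT_CRITERIA = [
    "Why is this the primary color?",
    "Why is this typeface right for these people?",
    "Does this motion mean anything?",
]


def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x definition)."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def lint(spec: dict) -> list[str]:
    """Layer A: return every hard-rule violation found in a generated spec."""
    violations = []
    if spec.get("gradient_text"):
        violations.append("gradient text is banned")
    off_palette = {spec["fg"], spec["bg"]} - PALETTE
    if off_palette:
        violations.append(f"colors not from tokens: {sorted(off_palette)}")
    if spec["font"] not in FONT_ALLOWLIST:
        violations.append(f"font not on allowlist: {spec['font']}")
    if spec.get("blur_layers", 0) > MAX_BLUR_LAYERS:
        violations.append("too many blur layers")
    if contrast_ratio(spec["fg"], spec["bg"]) < MIN_CONTRAST:
        violations.append("contrast below threshold")
    return violations


def review(spec: dict) -> dict:
    """Block on any hard-rule hit; otherwise hand the soft questions to a human."""
    violations = lint(spec)
    if violations:
        return {"status": "blocked", "violations": violations, "human_questions": []}
    return {"status": "needs_human", "violations": [], "human_questions": SOFT_CRITERIA}


clean = review({"fg": "#111111", "bg": "#FFFFFF", "font": "IBM Plex Sans"})
sloppy = review({"fg": "#1F4FD8", "bg": "#111111", "font": "Inter", "gradient_text": True})
```

Note that `review` never tries to answer a soft criterion. A spec that passes Layer A is not "good"; it has only earned a human's attention.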

工具三 · 品味评分卡:把"好"拆成可逐条判的判据

Tool 3 · The taste scorecard: decomposing "good" into item-by-item criteria

"品味"听着玄,但在一个具体产物上,它几乎总能拆成一组可逐条判的判据。下面是一张通用的品味评分卡骨架——每条都问一个"是/否/部分"的具体问题,而非"美不美"的整体感受。用法:把候选逐条过一遍,记录命中与否;过完得到的不是一个分数,是一张"哪里对、哪里没对、为什么"的诊断表。它把"我觉得不太对"逼成"第 3、第 6 条没过",从而可讨论、可传递、可喂回生成。

"Taste" sounds mystical, but on a concrete artifact it almost always decomposes into a set of item-by-item criteria. Below is a generic taste-scorecard skeleton — each line asks a specific yes/no/partial question, not a holistic "is it pretty." Usage: run a candidate through line by line and record hits; what you get is not a single score but a diagnostic table of "where it's right, where it's not, and why." It forces "I feel it's off" into "lines 3 and 6 fail," which can then be discussed, transferred, and fed back to generation.

品味评分卡 · 七问 · 逐条判,不打总分
TASTE SCORECARD · seven questions · judge per line, no overall score
1 · 为谁
1 · For whom

这个产物能说出它具体为谁而做吗?还是"为所有人"——后者通常等于没有人。

Can this artifact name who specifically it is for? Or is it "for everyone" — which usually means no one.

软判据 · 必由人判
soft · human-only
2 · 主次
2 · Hierarchy

扫一眼,能立刻分出"最重要的一件事"吗?还是所有元素争同样的注意力(slop 的典型征兆)。

At a glance, does the single most important thing stand out? Or do all elements fight for equal attention (a textbook slop symptom)?

半可机检 · 视觉权重
half-checkable · visual weight
3 · 有来由
3 · Motivated

主色、字体、布局,每一个都能说出"为什么是它"吗?还是"生成默认给的、没人问过为什么"。

Can the primary color, typeface, and layout each say "why it"? Or are they "what generation defaulted to, never questioned"?

软判据 · 必由人判
soft · human-only
4 · 无指纹
4 · No fingerprint

有没有 slop 指纹(渐变文字、玻璃拟态、处处大圆角+柔投影、空洞口号词)?这一条可机检。

Any slop fingerprint (gradient text, glassmorphism, rounded-everything + soft shadows, hollow slogan words)? This line is machine-checkable.

硬约束 · 可 lint
hard · lint-able
5 · 经得起真实数据
5 · Survives real data

放进真实长度的文案、真实数量的列表、真实边界条件,它还成立吗?还是只在"完美一帧"里好看。

Put in real-length copy, real-count lists, real edge cases — does it still hold? Or only look good in the "perfect frame"?

半可机检 · 压力测试
half-checkable · stress test
6 · 有重量
6 · Has weight

它有没有一处让人记住、愿意停留的地方?还是哪都"还行"、合起来空洞——这是 slop 与好作品最难机检、却最要命的分野。

Is there one place that lands, that makes someone stay? Or is everything "fine" and the whole hollow — the hardest-to-check yet most decisive line between slop and good work.

软判据 · 必由人判
soft · human-only
7 · 认了代价
7 · Owns a cost

它为了"对这群人"放弃了什么吗?一个不放弃任何人的设计,通常没有为任何人真正做好(见案例 C)。

Did it give up anything to be "for these people"? A design that gives up no one is usually not truly good for anyone (see Case C).

软判据 · 必由人判
soft · human-only

注意第 4 条是硬约束(可写进 lint),第 2、5 条半可机检,其余四条是软判据。这张卡本身就示范了两层护栏:你能把可机检的几条自动化掉,把人的注意力集中到第 1、3、6、7 这几条机器答不了的问题上。一张拆开的评分卡,比一句"我觉得不够高级"对生成有用一万倍——因为前者能逐条喂回去,后者只会换来另一版均值 slop。

Note that line 4 is a hard rule (lint-able), lines 2 and 5 are half-checkable, and the other four are soft criteria. The scorecard itself demonstrates the two-layer guardrail: you can automate the checkable lines and concentrate human attention on lines 1, 3, 6, and 7, the ones the machine cannot answer. A decomposed scorecard is ten thousand times more useful to generation than "I feel it's not premium enough," because the former can be fed back line by line while the latter only buys another version of mean slop.
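
As a rough sketch, the scorecard can be held as data, with each line tagged by who is able to judge it. The seven lines and their tags come from the card above; the `diagnose` helper, the verdict format, and the sample reasons are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    HARD = "hard, lint-able"
    HALF = "half-checkable"
    SOFT = "soft, human-only"


# (line number, criterion, who can judge it), as tagged in the scorecard.
SCORECARD = [
    (1, "For whom", Kind.SOFT),
    (2, "Hierarchy", Kind.HALF),
    (3, "Motivated", Kind.SOFT),
    (4, "No fingerprint", Kind.HARD),
    (5, "Survives real data", Kind.HALF),
    (6, "Has weight", Kind.SOFT),
    (7, "Owns a cost", Kind.SOFT),
]


def lines_of(kind: Kind) -> list[int]:
    """Line numbers that share one judging mode."""
    return [n for n, _, k in SCORECARD if k is kind]


def diagnose(verdicts: dict[int, tuple[str, str]]) -> list[str]:
    """Turn per-line verdicts ("yes" / "no" / "partial") into feedback lines.

    Returns one line per criterion that did not fully pass; no total is computed.
    """
    names = {n: name for n, name, _ in SCORECARD}
    missing = set(names) - set(verdicts)
    if missing:
        raise ValueError(f"unjudged lines: {sorted(missing)}")
    return [
        f"line {n} ({names[n]}): {verdict}, {reason}"
        for n, (verdict, reason) in sorted(verdicts.items())
        if verdict != "yes"
    ]


# Hypothetical review of one candidate: lines 3 and 6 do not pass.
verdicts = {n: ("yes", "") for n, _, _ in SCORECARD}
verdicts[3] = ("no", "primary color is the generator default")
verdicts[6] = ("partial", "nothing memorable above the fold")
feedback = diagnose(verdicts)
```

The `feedback` list is the part that can be pasted back into the next generation prompt, which is the practical difference between a diagnosis and a score.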

工具四 · 铺开候选协议:把"多生成几版"变成可复用的判断流程

Tool 4 · The candidate-spread protocol: turning "generate more versions" into a reusable judgment process

"铺开候选再收敛"若不写成协议,很容易退化成"无脑生成一百版然后挑顺眼的"——那只是把一个均值换成另一个均值。把它写成五步协议,它才是判断流程而非产量竞赛:① 先把约束与判据写下来(没有判据,铺开就是噪音);② 沿"结构维度"而非"皮肤维度"铺开(要九个布局/流程不同的候选,不是九个换了配色的同一个);③ 用一条查清的事实做第一刀收敛(把审美题压成事实题,见案例 A);④ 决赛圈才做受控比对(A/B、可用性测试、真实数据压测);⑤ 把胜出的理由写回判据/令牌/护栏(让这次的判断沉淀成下次的规格)。第 ① 和第 ⑤ 步是把这个协议和"无脑刷版"区分开的关键:前者保证铺开有方向,后者保证判断被复用。

"Spread candidates, then converge," if unwritten, easily degenerates into "mindlessly generate a hundred and pick the nice one" — which just swaps one mean for another. Written as a five-step protocol, it becomes a judgment process rather than an output race: ① write constraints and criteria down first (without criteria, spread is noise); ② spread along the structural dimension, not the skin dimension (nine candidates with different layout/flow, not nine recolors of the same one); ③ make the first cut with one established fact (compress aesthetics into fact, see Case A); ④ run controlled comparison only on the shortlist (A/B, usability test, real-data stress test); ⑤ write the winner's reasons back into criteria/tokens/guardrails (let this judgment settle into next time's spec). Steps ① and ⑤ are what separate this protocol from "mindless re-rolling": the first keeps the spread directed, the last keeps judgment reused.

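The five steps of the candidate-spread protocol can be sketched as a small pipeline, under loud assumptions: the `Candidate` fields, the idea of using `(layout, flow)` as the structural key, the sample fact, and the `compare` stand-in for a real A/B or usability test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str
    layout: str   # structural dimension
    flow: str     # structural dimension
    palette: str  # skin dimension


def spread(candidates: list[Candidate]) -> list[Candidate]:
    """Step 2: keep one candidate per distinct structure; recolors collapse."""
    by_structure: dict[tuple[str, str], Candidate] = {}
    for c in candidates:
        by_structure.setdefault((c.layout, c.flow), c)
    return list(by_structure.values())


def run_protocol(
    criteria: list[str],
    candidates: list[Candidate],
    fact: Callable[[Candidate], bool],
    compare: Callable[[list[Candidate]], tuple[Candidate, str]],
    spec_log: list[str],
) -> Candidate:
    # Step 1: without written criteria, a spread is only noise.
    if not criteria:
        raise ValueError("write constraints and criteria down before spreading")
    # Step 3: first cut with one established fact.
    shortlist = [c for c in spread(candidates) if fact(c)]
    # Step 4: controlled comparison on the shortlist only (stand-in callable).
    winner, reason = compare(shortlist)
    # Step 5: write the reason back so the judgment is reused next time.
    spec_log.append(reason)
    return winner


# Hypothetical run: the "fact" is that the flow must not force login first.
pool = [
    Candidate("A", "single-page", "guest-first", "blue"),
    Candidate("A-green", "single-page", "guest-first", "green"),
    Candidate("B", "wizard", "login-first", "blue"),
    Candidate("C", "accordion", "guest-first", "blue"),
]
spec_log: list[str] = []
winner = run_protocol(
    criteria=["checkout must work for guests"],
    candidates=pool,
    fact=lambda c: c.flow == "guest-first",
    compare=lambda shortlist: (shortlist[0], "single-page won the usability test"),
    spec_log=spec_log,
)
```

In this sketch the recolor "A-green" never reaches the shortlist, and the run cannot start without criteria or end without writing a reason back, which are the two properties that separate the protocol from re-rolling.
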
工具五 · 生成×品味坐标:把一个设计决策放上去看它该怎么处理

Tool 5 · The generation × taste plane: place a design decision on it to see how to handle it

最后一件工具是一张可放决策的坐标,把前面所有原理收成一个可操作的二维平面。横轴:这个决策能被生成做到多廉价(左=昂贵/需人、右=近免费)。纵轴:判定它好坏多依赖品味(下=可机检的事实题、上=构成性的品味题)。四个象限给出四种该怎么处理的处方——这正是 INSTRUMENT 13 设计判断分配台背后的平面。下方 FIG. D14 把这张坐标画出来,配套的 INSTRUMENT 15 让你把自己手头的决策点上去、即时拿到处方。

The last tool is a coordinate you place a decision on, collapsing all the prior principles into one operable two-dimensional plane. X-axis: how cheaply generation can do this decision (left = expensive/needs-human, right = near-free). Y-axis: how much judging its quality depends on taste (bottom = machine-checkable fact question, top = constitutive taste question). The four quadrants give four prescriptions for how to handle it — this is the plane behind INSTRUMENT 13, the design-judgment allocator. FIG. D14 below draws the coordinate, and the companion INSTRUMENT 15 lets you place your own decision on it and get an instant prescription.

坐标图 · FIG. D14 / GENERATION × TASTE · 两轴四象限,每格一个处方
PLANE · FIG. D14 / GENERATION × TASTE · two axes, four quadrants, one prescription each
看懂:把一个设计决策放上去,横轴看生成多廉价,纵轴看判好坏多靠品味
Read: place a design decision on the plane; X for how cheaply generation can do it, Y for how much judging its quality needs taste
[Figure: four quadrants.
HAND TO GENERATION · cheap + checkable: hand it over wholesale. E.g. responsive states, variants, spacing alignment.
ESTABLISH THE FACT · needs context + factual: establish the fact, then feed generation. E.g. where users abandon, what device they use.
DESIGN-SYSTEM RULE · cheap to make + value-laden: the human sets the rule, the machine enforces it. E.g. palette, spacing scale, contrast threshold (→ tokens/guardrails).
KEEP HUMAN · TASTE · needs a human + constitutive taste: not outsourceable. E.g. for whom, has soul, on-target.]
这张坐标是设计卷所有处方的总收口。右下:廉价又可机检,整段交给生成。右上:做起来廉价但带价值取向,人把价值固化成规则(令牌/护栏),机器据规则执行。左下:不靠品味但要先查清事实(研究的活)。左上:既需人又是构成性品味——可验证性梯度的最远端,永远留给人。把任何一个设计决策放上去,它落在哪个象限,就该用哪种处方。下方 INSTRUMENT 15 让这张图变成可点的。
This plane is where every prescription in the volume converges. Bottom-right: cheap and checkable, so hand it wholesale to generation. Top-right: cheap to make but value-laden, so the human freezes the value into a rule (tokens/guardrails) and the machine enforces it. Bottom-left: not a matter of taste, but the facts must be established first (research work). Top-left: needs a human and is constitutive taste, the far end of the verifiability gradient, kept human for good. Place any design decision on the plane; the quadrant it lands in is the prescription to use. INSTRUMENT 15 below makes this figure clickable.
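
The quadrant lookup can be written out directly. In this sketch both axes are normalized to the range 0 to 1 and split at 0.5; the normalization and the midpoint are assumptions, since the source only describes the two axes and the four prescriptions.

```python
def prescribe(cheapness: float, taste: float) -> str:
    """Map a design decision to its quadrant's prescription.

    cheapness: 0 = expensive / needs a human, 1 = near-free for generation.
    taste:     0 = machine-checkable fact question, 1 = constitutive taste question.
    The 0.5 split on each axis is an illustrative assumption.
    """
    if not (0.0 <= cheapness <= 1.0 and 0.0 <= taste <= 1.0):
        raise ValueError("both coordinates must be within [0, 1]")
    right = cheapness >= 0.5
    top = taste >= 0.5
    if right and not top:
        return "HAND TO GENERATION"
    if right and top:
        return "DESIGN-SYSTEM RULE"
    if not right and not top:
        return "ESTABLISH THE FACT"
    return "KEEP HUMAN · TASTE"


# Hypothetical placements for decisions named in FIG. D14.
examples = {
    "responsive states": prescribe(0.9, 0.1),
    "where users abandon": prescribe(0.2, 0.1),
    "palette": prescribe(0.9, 0.8),
    "for whom": prescribe(0.1, 0.9),
}
```

Placing a decision is itself a judgment call; the function only makes the consequence of that placement explicit.
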
INSTRUMENT 15 · Slop ↔ 品味自测器
INSTRUMENT 15 · Slop ↔ taste self-scorer

把你手头的一个产物逐条过这七问(取自上面的品味评分卡)。每勾选一个"没过",分数累加;过完即时拿到一个 slop 风险判语和首要修法。这不是打总分,是诊断——它告诉你断在哪条、先修哪条。

Run one artifact you have through these seven questions (from the taste scorecard above). Each "fail" you check adds to the score; on finishing you get an instant slop-risk verdict and the first fix. This is not an overall grade but a diagnosis: it tells you which line fails and which to fix first.

设计这一面 · 可执行 skill

The AI-Native Design Skill

前面所有 SHEET 讲"为什么生成会变廉价、品味会变稀缺、该怎么判";这一件替你把设计真的做出来——它不是"设计一个设计组织",而是这个面的可执行配套:拿到一个产品、一个界面、一条落地页、一个组件、一套设计系统或一段动效,它先过一道重画而非嫁接的闸(把 agent 删掉若塌回"一个设计师手搓一张稿",就还是更快的铅笔),再跑"先立护栏 → 写可机检一半的规格 → 沿结构维度铺开候选 → 人用品味收敛 → 把判断喂回系统"这条闭环。它的覆盖面是产品 / 交互 / 系统 / 表达,不是给界面套皮。

Every SHEET above covers why generation gets cheap, why taste gets scarce, and how to judge; this piece actually produces the design with you. It does not "design a design org"; it is this surface's executable companion. Hand it a product, an interface, a landing page, a component, a design system, or a motion piece, and it first runs a redraw-not-graft gate: delete the agents, and if the process collapses back to one designer hand-crafting one comp, it is still just a faster pencil. It then runs the closed loop "stand up the guardrail first → write the spec for the machine-checkable half → spread candidates along the structural dimension → converge with human taste → feed judgment back into the system." Its scope is product / interaction / system / expression, not skinning a screen.

# 在 Claude Code 里调用 / invoke inside Claude Code
$ /skill ai-native-design
> "帮我设计这条落地页,多铺几版再帮我挑一版……"
> "design this landing page, spread a few directions and help me pick..."

  重画闸 · 绿地 / 旧产品切出新面 / 仅赋能 / 人/信任边界
  redraw gate · greenfield / carve a new surface / mere enablement / human-trust boundary
  一份设计产物(稿/组件/系统)+ 令牌即代码 + 品味理由 + 指纹反同质检
  a design artifact (mockups/components/system) + tokens-as-code + a taste rationale + a fingerprint anti-homogenization check

开源仓库 / Open-source: github.com/watterfall/ai-native-architect/skills/ai-native-design

安装 / Install: /plugin marketplace add watterfall/ai-native-architect

本件性质 · 设计面的可执行配套
架构层那件(ai-native-architect)设计组织;这一件与其余配套件各对应一个面:同一内核、彼此耦合、阅读无固定起点。它把本卷方法论跑成设计产物。判断节点 = 品味:在海量草稿里挑哪一版、并守住其中的人。生成草稿是廉价的,判它们是稀缺的。止步线:永不外包品味。别把"挑哪版、为什么"那一按交给模型;也别把"为谁、有没有灵魂、对不对路"这类软判据硬塞进 lint,那会把每个可机检指标都拉满、却做出一个没人想要的完美界面。
What this is · the design surface's executable companion
The architecture piece (ai-native-architect) designs the organization; this piece and the other companion pieces each carry one surface: one kernel, mutually coupled, with no fixed starting point for reading. It runs this volume's methodology into design artifacts. Judgment node = taste: choosing which version among the abundant drafts, and holding on to the human in it. Generating drafts is cheap; judging them is scarce. Stop-line: never outsource taste. Do not hand the "which one, and why" decision to the model, and do not force soft criteria ("for whom, has soul, on-target") into lint, which maxes out every checkable metric yet ships a flawless interface no one wants.
SPEC.V / AI NATIVE METHODOLOGY / OWL METHODOLOGY SERIES
SCOPE / 一套方法论 · 完整组织光谱 N=1 → N=众多(一人公司至 agent 网络,同一套第一性原理)
SCOPE / One methodology · the full organizational spectrum N=1 → N=many (from the one-person company to the agent network, on a single set of first principles)
SERIES / 六卷同一内核 · 本卷是其中一个面,完整接线见上方「方法论系列」。
SERIES / Six volumes, one kernel · this volume is one surface; the full wiring is above under "The Series."
APPENDIX · SOURCES / 证据与引用登记 / Evidence and citation registry
分级口径:审计级实证(监管文件交叉验证)· 同行评审 · 理论模型/工作论文(引用须写"模型预测",不得写"已证明")· 从业者一手陈述 · 咨询预测(是预测,不是事实)。引用条目以本表为准;本轮 3 票对抗复核未发现被驳倒条目。
Grading key: audit-grade empirics (cross-checked against regulatory filings) · peer-reviewed · theoretical model / working paper (citations must read "the model predicts," never "proven") · practitioner first-hand account · advisory forecast (a forecast, not a fact). Citation rows are authoritative in this table; the current 3-vote adversarial review found no overturned source.
REF · GR · SOURCE · 承重论断 / Load-bearing claim

R1
来源:Anthropic《How Anthropic teams use Claude Code》2025-07-24 · agentic-coding 一手实践 · anthropic.com/news
Source: Anthropic, "How Anthropic teams use Claude Code," 2025-07-24 · first-hand agentic-coding practice · anthropic.com/news
承重论断:从一句描述生成整套带状态、响应式、可交付代码的界面,已是常规能力而非演示:出一版界面从"几天人时"压到"一次提示+几分钟"(成本侧已塌的从业者证据)
Load-bearing claim: Generating a full stateful, responsive, deliverable UI from one prompt is now routine, not a demo: a version drops from "days of human time" to "one prompt + minutes" (practitioner evidence that the cost side has collapsed)

R2
来源:Karpathy《Software Is Changing (Again)》YC AI Startup School · 2025-06-16 · ycombinator.com/library/MW
Source: Karpathy, "Software Is Changing (Again)," YC AI Startup School · 2025-06-16 · ycombinator.com/library/MW
承重论断:Software 3.0 与"验证瓶颈":生成变廉价后,做功的环节从"能否做出来"移到"该不该是这样、由谁来验"(判断不随模型变便宜的论述锚)
Load-bearing claim: Software 3.0 and the "verification bottleneck": once generation gets cheap, the load-bearing step moves from "can it be built" to "should it be this, and who verifies" (the anchor for judgment not getting cheaper with the model)

R3
来源:设计即代码工具链:pencil/paper(以代码描述图形)· Remotion(以 React 描述视频,remotion.dev)· html-video(用网页技术出动效);并本系列工程卷"五条贯穿原理"与 design-as-code 实践
Source: Design-as-code tooling: pencil/paper (graphics described as code) · Remotion (video described as React, remotion.dev) · html-video (motion via web tech); plus this series' engineering volume "five through-lines" and its design-as-code practice
承重论断:画布工具把设计锁进私有二进制;新一代工具把同一份设计重新表达为纯文本,于是设计掉进软件工程三十年的 git/diff/CI 基础设施。这是产物形态的相变,不是工具竞赛
Load-bearing claim: Canvas tools lock design in proprietary binary; the new tools re-express the same design as plain text, so design falls into software engineering's thirty years of git/diff/CI infrastructure. This is a phase change in artifact form, not a tool race

R4
来源:Doshi & Hauser,受控实验《Generative AI enhances individual creativity but reduces the collective diversity of novel content》Science Advances 10(28) · 2024 · doi.org/10.1126/sciadv.adn5290
Source: Doshi & Hauser, controlled experiment, "Generative AI enhances individual creativity but reduces the collective diversity of novel content," Science Advances 10(28) · 2024 · doi.org/10.1126/sciadv.adn5290
承重论断:约 300 名受试者写短篇故事,部分获 AI 提示:个体层面更新颖,集体层面(语义相似度)更趋同,即"放大个体、压平分布"的实验影像(对象是叙事文本,迁移到视觉/产品设计是合理但未验证的外推,故不外推具体数字)
Load-bearing claim: ~300 participants wrote short stories, some given AI prompts: more novel individually, more similar collectively (by semantic similarity), the experimental image of "amplify the individual, flatten the distribution" (the object is narrative text; carrying it to visual/product design is a reasonable but unverified extrapolation, so no specific figure is carried over)

R5
来源:从业者复盘(脱敏)· DSN 09 案例 A/D
Source: Practitioner retrospective (de-identified) · DSN 09 Cases A / D
承重论断:结账重做(弃单 31%→19%)与仪表盘 slop 急救(D7 留存 11%→34%)的前后区间,引自一手项目复盘。为脱敏内部数据、非公开受控实验,故仅作区间陈述、不外推到其他品类或团队;用于支撑"铺开→收敛"与"先补判断再重生成"的机理,不用于证明任何普适转化率。
Load-bearing claim: The before/after ranges for the checkout redo (abandon 31%→19%) and the dashboard slop rescue (D7 retention 11%→34%), drawn from first-hand project retrospectives. De-identified internal data, not a public controlled experiment, so stated only as ranges and not extrapolated to other categories or teams; used to support the mechanism of "spread→converge" and "restore judgment before regenerating," not to prove any universal conversion rate.

R6 · GR Ⅳ+Ⅱ
来源:WebAIM,屏幕阅读器用户调查(参照)+从业者复盘(脱敏)· webaim.org/projects/screenreadersurvey10
Source: WebAIM, Screen Reader User Survey (reference) + practitioner retrospective (de-identified) · webaim.org/projects/screenreadersurvey10
承重论断:DSN 09 案例 C 的无障碍可用性区间(纯读屏完成关键流程 41%→92%)为脱敏复盘,量级参照 WebAIM 公开的屏幕阅读器用户调查(同类任务可完成性的数量级);证据级 Ⅳ 一手+Ⅱ 公开调查参照,不外推为通用转化结论。
Load-bearing claim: The accessibility-usability range in DSN 09 Case C (screen-reader-only completion of the key flow, 41%→92%) is a de-identified retrospective whose order of magnitude references WebAIM's public Screen Reader User Survey (task-completability magnitude for comparable tasks); grade Ⅳ first-hand + Ⅱ public-survey reference, not extrapolated to a generic conversion claim.

R7
来源:W3C Design Tokens Community Group,规范草案 · w3.org/community/design-tokens + Material/Carbon/Polaris 设计系统 token 公开实践
Source: W3C Design Tokens Community Group, draft specification · w3.org/community/design-tokens + the public token practice of the Material / Carbon / Polaris design systems
承重论断:支撑 DSN 11 工具一"设计令牌即代码":令牌作为机器可读的视觉决策锚点,使一次判断被海量生成强制继承。引规范草案与主流设计系统的公开实践,证据级 Ⅳ 行业实践(规范为草案、各家实现细节不同,不主张统一标准已成定论)。
Load-bearing claim: Supports DSN 11 Tool 1 "design tokens as code": tokens as a machine-readable anchor for visual decisions, so one judgment is force-inherited by mass generation. Cites the draft spec and the public practice of mainstream design systems; grade Ⅳ industry practice (the spec is a draft and implementations differ, so no settled unified standard is claimed).
REV · DATE · DESCRIPTION
1.0 · 2026-06
设计卷成形:八 SHEET(生成变富品味变稀缺 · 设计即代码 · 从打磨到判断 · 品味可拆解 · 设计系统即护栏 · 反 slop 红线 · AI 设计环 · 决策分诊)· 三节深化(DSN 09 四个真实案例 · DSN 10 六种旧结构批判 · DSN 11 五件可照做工具)· INSTRUMENT 10 Slop 自检表 + INSTRUMENT 13 设计判断分配台 + INSTRUMENT 15 Slop↔品味自测器 · 十九张论证图 · 本卷独立证据登记 R1-R7(与组织卷登记分离)
Design volume takes shape: eight SHEETs (generation gets cheap, taste gets scarce · design-as-code · from making to judging · taste decomposed · the system as guardrail · anti-slop red lines · the AI design loop · decision triage) · three deepening sections (DSN 09 four real cases · DSN 10 critique of six old structures · DSN 11 five do-this tools) · INSTRUMENT 10 the Slop self-check + INSTRUMENT 13 the design-judgment allocator + INSTRUMENT 15 the Slop↔taste self-scorer · nineteen argument-bearing figures · this volume's own evidence registry R1-R7 (separated from the organization volume's)
REV. 2026-06 R1.0 / END OF DOCUMENT