When one comp, one variant, a whole interface is near-free, the scarce thing is no longer "making it" but "what counts as good" — taste and intent. And generation defaults to the mean (slop: generic, derivative, obviously-AI), so taste becomes the scarcest judgment of all. Design here is not confined to interfaces and visuals: it is the larger craft of turning intent into a form people can understand, want to use, and will stay with — product, interaction, system, and expression all included. Its object is always specific people, so "for whom, and what is good" is the part of judgment AI cannot take over, and should least of all be allowed to. Same discipline: tools are surface; extract the principle beneath.
① generation is abundant → ② judgment forks along the verifiability gradient: the machine-checkable (alignment / spec-conformance / accessibility) is automated, constitutive aesthetic judgment stays with people → ③ the design system and intent become infrastructure → ④ people return to "for whom, and what is good." You need not have read the Org volume; this volume is that one surface.
This volume is about design — turning intent into a form people can understand, want to use, and stay with, from product to interaction to expression, not interfaces alone.
AI-Native design moves the designer from hand-producing one comp into the judgment loop: spread candidates, judge differences, steer the next round, and distill new criteria into the design system.
Taste Infrastructure
To make generation converge, taste must be externalized first.
Alignment, spacing, accessibility, and token conformance can be automated; “is this for these people” and “does it have a soul” cannot be linted. Turn checkable rules into guardrails and keep the un-outsourceable in critique.
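The checkable half of that split is the kind of rule a linter can own. Below is a minimal sketch, assuming brand tokens are kept as a plain Python dict of hex colors; the token names, values, and the `lint_pair` helper are hypothetical. The contrast math follows the WCAG 2.x definitions of relative luminance and contrast ratio, with 4.5:1 as the AA minimum for normal-size text. What it cannot do is the point: nothing here asks whether the screen is for these people.

```python
# Hypothetical brand token set: names and values are illustrative only.
BRAND_TOKENS = {
    "color.text.primary": "#1A1A1A",
    "color.surface": "#FFFFFF",
    "color.accent": "#0B57D0",
}

def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel (WCAG 2.x relative-luminance formula)."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(hex_color: str) -> float:
    """Relative luminance of a #RRGGBB color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)

def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (lighter + 0.05) / (darker + 0.05), from 1:1 to 21:1."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

def lint_pair(fg: str, bg: str, min_ratio: float = 4.5) -> list[str]:
    """Return rule violations for one text/background pair; an empty list means pass."""
    problems = []
    allowed = {v.upper() for v in BRAND_TOKENS.values()}
    for role, value in (("foreground", fg), ("background", bg)):
        if value.upper() not in allowed:
            problems.append(f"{role} {value} is not a brand token (off-brand)")
    ratio = contrast_ratio(fg, bg)
    if ratio < min_ratio:
        problems.append(f"contrast {ratio:.2f}:1 is below {min_ratio}:1")
    return problems

print(lint_pair("#1A1A1A", "#FFFFFF"))  # → []
print(lint_pair("#00FFFF", "#111111"))  # off-brand cyan-on-dark: flagged
```

The second call passes contrast (about 15:1) but fails conformance twice, since neither color is a brand token: the linter can say "off-brand", never "off-target".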
Start with four lines: for whom, what it should feel like, what it must not become, and the done-when signal. Then spread candidates, critique by hand, and feed new criteria back into the system.
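A minimal sketch of the four-line brief as a typed record, assuming a Python toolchain; the `DesignBrief` class, its field names, and the example values are hypothetical. The only behavior it encodes is the ordering the text asks for: no generation request goes out until all four lines are filled in.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class DesignBrief:
    """The four lines written before any generation (field names are illustrative)."""
    for_whom: str         # who this exists for
    should_feel: str      # what it should feel like
    must_not_become: str  # the red line: what it must not turn into
    done_when: str        # the signal that ends iteration

    def missing(self) -> list[str]:
        """Names of lines still empty; generation should wait until this is []."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def as_preamble(self) -> str:
        """Render the brief as plain text to prepend to every generation request."""
        if self.missing():
            raise ValueError(f"brief incomplete: {self.missing()}")
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))

brief = DesignBrief(
    for_whom="night-shift nurses checking meds on a phone, one-handed",
    should_feel="calm, legible at a glance",
    must_not_become="a generic SaaS dashboard with gradient hero cards",
    done_when="a nurse finds the next dose in under 3 seconds in a hallway test",
)
print(brief.as_preamble())
```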
A designer generating images with AI or adding a Figma plugin is still in AI-assisted design: the old production process speeds up, but the judgment structure is unchanged. AI-Native design accepts that generation is abundant and redraws the whole design process around it. The difference is not degree, but kind.
For years design's scarce resource was production hours — drawing comps, aligning pixels, exporting assets, making variants. Every process was built for "making things is expensive." Once generation is the default, producing comps, variants, and copy no longer slows the team, but the bottleneck does not vanish — it moves: to "which is better, on-target, for people." Fill the kernel's four steps with the specifics of design and you get this part's thesis.
Drawing the boundary by counterexample: three things that look like it but are not
The fastest way to say "what it is" is to first pin down "what it is not." Counterexample one: AI as an image machine. Generate an illustration with Midjourney, a set of icons with a plugin, then hand-assemble the layout as before — this grafts AI onto the old "production hours" process, with AI merely a faster pen. The process diagram has not changed, and neither has the bottleneck. Counterexample two: AI as an auto-beautifier. Expecting the model to "make this look better," it dresses your work in the currently most popular visual template — and the result is precisely slop: smoother, but with less self. This wrongly outsources taste, a human's job, to a machine that only fits the mean. Counterexample three: the design system as a post-hoc doc. Generate first, then go back and patch in a token set to pass audit — this strips the design system of the one role that matters in an AI-Native process: the pre-generation guardrail. The shared error of all three is not redrawing the process around the premise that "generation is already abundant," and instead just slotting AI into one station of the old process.
So, stated positively, what is AI-Native design? It is the way of designing that accepts the premise "generation is already abundant" and redraws the whole design process diagram accordingly. The redraw has three marks, mapping to the kernel's ②③④: first, moving the human's action forward from "making" to "judgment and direction" — valuing not by production hours but by taste hit-rate (②); second, upgrading the design system from a deliverable to a pre-generation guardrail, explicitly writing "what is good" as a spec that steers and machine-checks generation (③); third, no longer treating "faster" itself as the win, but returning the designer freed by generation to empathy, taste, and meaning (④). Together these three are "a difference in kind," not "a difference in degree": the old process optimizes "how to make it faster," the new process optimizes "how to judge more accurately what to make, and for whom." Whether a team is truly AI-Native is judged not by whether it uses AI tools but by whether these three redraws have happened — especially whether the center of value has really moved from production to judgment. Not moved, and it is still the old process with a faster pen; moved, and only then is it design new in kind.
① ABUNDANCE
Comps / variants / UI / copy
Generation is the default; making it is no longer scarce.
② JUDGMENT
Taste · intent · what's good
The new bottleneck is aesthetic/experience judgment + coherence.
③ CONTEXT
Design system as guardrail
Tokens / components / brand become the spec for generation.
④ MEANING
Empathy · taste · meaning
Designers return to understanding users, holding taste and intent.
Step ② forks the same way along the verifiability gradient: the machine-checkable part (alignment / spec-conformance / accessibility) joins ① abundance and gets automated; constitutive aesthetic judgment (taste / who it exists for / heterogeneity) sinks to ④ and stays with people — the very line the design volume shares with the system map.
This is not a new volume; it is the one kernel acting on the design face
Every volume in the series — organization, engineering, design, research, learning, innovation — does not each tell its own story; they are the one kernel landing on different faces. The org volume tells it as "execution becomes abundant → judgment retreats → context becomes infrastructure → people return to meaning"; the engineering volume as "typing becomes abundant → verification becomes the bottleneck → the codebase becomes queryable → people do deep systems expertise." The design volume only swaps the nouns of those same four steps for design's specifics: what is abundant is comps and variants, the judgment that retreats is taste and intent, the context that becomes infrastructure is the design system, and the meaning people return to is empathy and being-for-people. A reader who has read any sibling volume will recognize the same machine with different parts swapped in — which is exactly why this is a series, not six isolated essays.
KEY FIG. D0.0 / THE FORK · Judgment forks along the verifiability gradient. Read: how step ②'s judgment splits, half automated, half sinking to people.
Step ② is not a black-box judgment but a fork. One triage question — "can this be ruled right or wrong from the artifact text alone?" — cuts judgment in two: the machine-checkable half flows back into ① abundance and gets automated (DSN 08's hard constraints); the constitutive half sinks to ④ and stays with people (soft criteria). The whole fork stands on ③ context. This figure is the skeleton of every SHEET that follows.
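The triage question can be sketched as a router: a criterion that comes with a checker (a function that rules pass/fail from the artifact text alone) goes to automation; one without goes to the human critique agenda. The `Criterion` type, the example criteria, and the `triage` helper are hypothetical illustrations of the fork, not DSN 08's actual rule set, and the two checkers are deliberately crude string tests.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Criterion:
    name: str
    # A checker rules pass/fail from the artifact text alone; None means no such rule exists.
    check: Optional[Callable[[str], bool]] = None

def triage(criteria: list[Criterion], artifact: str) -> tuple[dict[str, bool], list[str]]:
    """Split criteria along the fork: machine verdicts vs. questions left for people."""
    automated: dict[str, bool] = {}
    for_people: list[str] = []
    for c in criteria:
        if c.check is not None:
            automated[c.name] = c.check(artifact)  # hard constraint: flows back to automation
        else:
            for_people.append(c.name)              # constitutive: stays with people
    return automated, for_people

criteria = [
    Criterion("no hard-coded px values", lambda src: "px" not in src),
    Criterion("images have alt text", lambda src: "<img" not in src or "alt=" in src),
    Criterion("is it for these people?"),
    Criterion("does it have a soul?"),
]

artifact = '<img src="dose-chart.png" alt="Dose schedule"><div style="padding: var(--space-2)">'
automated, for_people = triage(criteria, artifact)
print(automated)   # machine verdicts
print(for_people)  # critique agenda for the humans in the loop
```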
How this redraw differs from every prior tool revolution in design history
Design history has been through many tool revolutions: from hand-drawing to desktop publishing (DTP), from paper to Photoshop, from static slicing to Figma's collaborative vectors. Each made "making" faster and cheaper, but one thing never changed: the maker and the judge were the same person, and judgment was always embedded in the act of making. The designer judged good and bad while pushing pixels, the two interwoven beyond separation. This redraw is structurally different: for the first time it hands "making" almost entirely to the machine, pulling "judgment" out of "making" and forcing it to take independent shape. You never used to need to write down "what is good" separately, because judgment lived in your hand; now the hand goes to generation, and judgment, unless it is explicitly written down and stated, simply disappears. The moment it disappears, generation slides back to the mean. This is why this volume insists, again and again, on "write the spec, state the criteria, feed judgment back": these actions are not new inventions; for the first time they must move from implicit to explicit. Grasp this difference and you grasp why this is not "yet another faster tool" but a reorganization of the designer's value structure.
Producing comps, variants, a whole interface is near-free; "what counts as good" has not gotten one bit cheaper. Worse: generation defaults to the mean of its distribution, the shape everyone has seen most. Which is exactly why human taste becomes the scarcest judgment node.
Mechanism: the old scarcity was production hours, so the process was built to spend fewer. Why it stalls: when generation makes comps something you draw on at will, production is no longer the constraint. The bottleneck lands wholesale on a question the machine cannot answer: is this version good? on-target? made for the people it serves?
Why generation's default destination is "the mean" — and the mean is slop
The root of this asymmetry hides in the generation model's objective function. A model is trained to minimize expected loss, and facing an under-constrained request ("make a good-looking landing page"), its safest strategy is to output the highest-frequency, least-error-prone form in its training distribution. Statistically, that is the mean — the greatest common divisor of every "landing page" it has seen, stacked together. The problem: the mean is by nature like no one in particular and for no one in particular. It is "fine" for everyone precisely because it made no trade-off for any one group. This is slop's mathematical definition — not ugliness but "convergence to the mean." Grasping this corrects a common misdiagnosis: the belief that "slop will disappear once the model gets stronger." Quite the opposite — the stronger the model, the more precisely it fits the mean, and the slop it produces unconstrained gets smoother and harder to spot at a glance. What can pull it off the mean has never been a stronger model but stronger constraints — namely the spec and taste a human supplies.
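A toy illustration of the "safest strategy" claim, under one loud assumption: the loss is squared error over a single numeric "style" axis, and the model must answer every under-constrained request with one constant output. Real generative models are trained with other objectives (for example cross-entropy over tokens), so this shows the intuition, not the mechanism of any specific model. The sample positions are made up.

```python
from statistics import mean

def squared_loss(c: float, samples: list[float]) -> float:
    """Average squared error of answering every request with the same output c."""
    return sum((x - c) ** 2 for x in samples) / len(samples)

# Hypothetical "style axis" positions of training examples (e.g. how ornate a page is).
samples = [1.0, 2.0, 2.0, 4.0, 6.0]
mu = mean(samples)

# Scan candidate constant outputs on a grid from 0.0 to 8.0; keep the loss minimizer.
grid = [i / 10 for i in range(0, 81)]
best = min(grid, key=lambda c: squared_loss(c, samples))
print(mu, best)  # → 3.0 3.0
```

The optimum, 3.0, is not any of the samples: it is the output that is "fine" for all of them and matches none, which is the statistical shape of slop described above.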
Generation · near-free
One comp, ten variants, a whole interface with states — on demand, near-zero marginal cost.
Taste · still scarce
"Which is better, why, is it for people" — no shortcut, only human judgment. This is the new bottleneck.
FIG. D1.1 / DISTRIBUTION CLAMP · Generation piles on the mean; the design system clamps the distribution off it. Read: generation's output is itself a bell curve centered on "the mean = slop"; the guardrail does not pick one good result, it clamps the whole distribution narrow and pushes it toward the brand side.
The point is not "pick the one good result" — that is downstream filtering, neither fast nor reliable. The guardrail acts upstream: it takes generation's broad mean-centered distribution and clamps the whole thing narrow, shifts the whole thing off the mean toward where the brand holds. So a design system's payoff is not "one less rework" but "every generation's expectation lands closer to what you would sign off on." This is also why slop is the default, not the accident: without a clamp, the peak always sits on the mean.
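The "clamp, don't filter" point has a one-line justification. Assume a generated output's position on a single style axis is a random variable X with mean μ and variance σ², and let b be the position you would sign off on (the brand). Expanding around μ and using E[X − μ] = 0, the expected squared distance from the brand splits into an offset term and a spread term:

```latex
\mathbb{E}\big[(X-b)^2\big]
  = \mathbb{E}\big[(X-\mu)^2\big] + 2(\mu-b)\,\mathbb{E}[X-\mu] + (\mu-b)^2
  = \sigma^2 + (\mu-b)^2
```

Shifting μ toward b shrinks the second term and narrowing the distribution shrinks σ²; picking the best of a batch after the fact changes neither, so the next generation still lands, on average, just as far away.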
Plot the two curves together: cost collapses, judgment does not
The asymmetry is a mechanism, not a slogan, because the two things have moved at different rates over the last eighteen months. Generation's marginal cost has collapsed one-to-two orders of magnitude along the model-capability curve — one comp, ten variants, a full set of responsive states, from "person-days" to "a prompt and a few minutes." The judgment curve, "which is good, good for whom," is nearly flat: it does not get cheaper as the model gets stronger, because it asks not "can this be made" but "should it be this way." One curve falls, the other stays level, and the scissor-gap that opens between them is the entire reason taste becomes the bottleneck 〔Source: Anthropic 2025 agentic-coding practice and Karpathy's "software is changing" talk, grade Ⅳ practitioner〕[R1][R2].
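A quick check of the "one-to-two orders of magnitude" figure, under stated assumptions: one person-day is 8 working hours (480 minutes), and "a prompt and a few minutes" is somewhere between 5 and 30 minutes. Both endpoints land inside the claimed range:

```latex
\log_{10}\frac{480\ \text{min}}{30\ \text{min}} = \log_{10} 16 \approx 1.20,
\qquad
\log_{10}\frac{480\ \text{min}}{5\ \text{min}} = \log_{10} 96 \approx 1.98
```

A comp that used to take several person-days would push the ratio past two orders (three days at 5 minutes gives log₁₀ 288 ≈ 2.46).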
KEY FIG. D1.0 / GENERATION × TASTE · The generation × taste plane. Read: which quadrant AI pushes you into, and where taste must be injected.
AI helps you only along the horizontal axis — pushing generation toward "near-free." It will not climb the vertical axis for you. Without injected judgment you simply slide from Q2 (old craft) to Q4 (the slop default): cheaper, but for no one. The win, Q1, is not a gift from AI; it is what a human buys back by re-adding the vertical axis of taste.
The asymmetry's direct consequence: where a team should reallocate the freed-up effort
If generation collapses production cost while judgment cost stays fixed, then a rational team should perform an explicit reallocation of human effort, not simply "use AI to boost efficiency and then cut half the designers." The latter misreads the asymmetry — it assumes the bottleneck is still production, so saving production is the end of it. The truth is the bottleneck moved to judgment, so the freed-up production effort should be reinvested on the judgment side: into writing more discriminating specs, externalizing taste into reusable guardrails, feeding each round's judgment back into the system. A team that gets this right looks, on the surface, like this: time spent pushing pixels in Figma drops sharply, and time spent writing "for whom, what is good, what is a red line," reviewing candidates, and discussing "why this version is on-target" rises sharply. The headcount of designers need not shrink, but each person's work shifts markedly upward — from executor to judge and direction-setter. The cost of misreading the asymmetry is concrete: believing AI can save you judgment, you only produce, faster, slop for whose taste no one is responsible, quietly translating Q2 design into Q4.
There is also an often-overlooked second-order effect: when production is near-free, the cost of trying is near-zero, so the boundary of exploration should be pushed wide open. In the past, because every comp was expensive, teams tended to converge early on a "safe" direction, afraid to diverge much — divergence meant wasting precious person-hours. Now that constraint is gone: spreading ten genuinely different directions and spreading one cost about the same. This means the rational strategy should invert — diverge as much as possible before judging, because divergence is nearly free, and the wider you diverge the larger the possibility space your judgment can pick from, the higher the probability of hitting "the genuinely on-target direction." Sadly, many teams spend the saved cost in the wrong place: they use it to converge faster ("comps are fast now, settle and move on") rather than explore wider. This is another misuse of the asymmetry — it spends the "generation got cheap" dividend on accelerating old habits, not realizing the dividend truly unlocks "the wide-range exploration you never dared before." Used right, generation's cheapness does not let you finish faster; it gives your judgment an unprecedentedly large library of material.
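The "diverge wider" argument can be made concrete under a strong simplifying assumption: each genuinely different direction independently has the same probability p of being on-target, so the chance that at least one of n candidates hits is 1 − (1 − p)^n. The value p = 0.1 is hypothetical, and the independence assumption is exactly why the text asks for genuinely different directions rather than ten near-copies.

```python
def p_at_least_one_hit(p: float, n: int) -> float:
    """Probability that at least one of n independent candidates is on-target."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n

for n in (1, 3, 10):
    print(n, round(p_at_least_one_hit(0.1, n), 3))
# → 1 0.1
# → 3 0.271
# → 10 0.651
```

Near-copies break the independence assumption: if they all hit or all miss together, ten of them behave like n = 1.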
Force warning · slop
Generation's default output is slop: it converges on what it has seen most. It looks finished, yet resembles no one and is for no one. Slop is not bad craft; it is judgment left out, and the only way around it is to put human taste back in the loop.
DSN 01·5
WHEN · WHY NOW
Evidence · Timing
This asymmetry is not a forecast but something already happening
"Generation gets cheap, taste gets scarce" sounds like a bet on the future, but it has already happened in observable ways over the last eighteen months. Treating it as an accomplished fact rather than a prophecy is what lets you redraw the process now — rather than "waiting until the tech matures." This sheet gives concrete, refutable signals: if these signals do not hold, the whole volume's premise deserves to be questioned.
Signal one: the cost side has already collapsed. Producing a whole interface — with states, responsive, deliverable code — from a one-line description is by now a routine capability across several generative-UI tools and agentic-coding practices of 2024–2025, not a demo. Compressing "produce one interface version" from "person-days" to "a prompt plus a few minutes" is a one-to-two order-of-magnitude shift along the horizontal axis 〔Source: Anthropic 2025 agentic-coding practice, public capabilities of several generative front-end tools, grade Ⅳ practitioner〕[R1]. Signal two: the judgment side has not collapsed with it. Over the same period, "which version is on-target, is it for these people, has it crossed the line of taste" has not become more automatable — give the same brief to a model ten times and you get ten results that are all "done right" yet need a human to pick among. Generation solved "can be made," not "which one it should be." Signal three: slop has become an observable public phenomenon. "You can tell at a glance it was AI-made" has gone from a vague feeling to a phenomenon describable by concrete fingerprints (cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter), even detectable — which is itself direct evidence that "generation defaults to converging on the mean."
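A crude sketch of what "even detectable" can mean once the artifact is text: scan a stylesheet for the fingerprints named above. The regex heuristics and the `slop_fingerprints` helper are hypothetical and deliberately naive (they miss variants such as `rgb()` colors and flag legitimate uses); treat a hit as a prompt for human review, not a verdict.

```python
import re

# Illustrative heuristics for the fingerprints named in the text; not a validated detector.
FINGERPRINTS = {
    "glassmorphism": re.compile(r"backdrop-filter\s*:\s*blur\(", re.I),
    "purple-blue gradient": re.compile(
        r"linear-gradient\([^)]*(purple|violet|#8b5cf6|#7c3aed)[^)]*(blue|#3b82f6|#2563eb)", re.I
    ),
    "Inter font": re.compile(r"font-family\s*:\s*['\"]?Inter\b", re.I),
    "centered text": re.compile(r"text-align\s*:\s*center", re.I),
    "cyan accent": re.compile(r"#(00ffff|22d3ee|06b6d4)\b|\bcyan\b", re.I),
}

def slop_fingerprints(css: str) -> list[str]:
    """Names of fingerprints whose pattern appears anywhere in the stylesheet text."""
    return [name for name, pattern in FINGERPRINTS.items() if pattern.search(css)]

hero = """
.hero { background: linear-gradient(135deg, #7c3aed, #2563eb);
        backdrop-filter: blur(12px); text-align: center;
        font-family: 'Inter', sans-serif; }
"""
print(slop_fingerprints(hero))
# → ['glassmorphism', 'purple-blue gradient', 'Inter font', 'centered text']
```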
FIG. D2.1 / THE SCISSORS · One collapses, one stays flat; the opening gap is the bottleneck. Read: over eighteen months the two curves moved at different rates. Generation cost fell one to two orders of magnitude while judgment cost stayed near-flat, and the scissors gap between them is the whole reason taste becomes the bottleneck.
The one thing to take from this chart is the difference in slope, not any specific number. The orange line collapses and the blue one stays flat — not because judgment "got harder," but because it asks a different kind of question — not "can it be built" but "should it be this, for whom" — a kind that does not get cheaper as the model improves. So the bottleneck will not vanish with the next model; it gets more prominent with every collapse in cost. [R1][R2]
Treating the asymmetry as an accomplished fact rather than a future prophecy has a direct action implication: waiting is not neutral; it carries a steadily accumulating cost. If this were merely a bet on the future, "watch and wait another year or two" would be rational; but since the cost side has already collapsed, the judgment side has already become the bottleneck, and slop is already an observable public phenomenon, every additional stretch of running the old process means doing, at a higher unit cost, production that could be cheaper, while continuing to lock on the production side the attention that should go to judgment. The more insidious cost is the atrophy of judgment capability: if a team is slow to shift its center of work to judgment, its designers never get the chance to train the abilities the new bottleneck demands — writing discriminating specs, stating why a version is on-target — and when migration becomes unavoidable, they find these are not abilities you patch in with a meeting. So "now" is not a marketing urgency but a real corollary of this asymmetry: since the change has already happened, the best time to redraw the process is now, and the cheapest time is also now.
"Move now" needs an honest qualifier, lest it read as blind urgency: moving means beginning to shift the center of work to the judgment side, beginning to build machine-checkable guardrails, beginning to run the loop once — not "immediately replacing every existing tool with the newest AI design tool." Tools will iterate fast and have winners and losers; betting on any one tool is risky. But betting on the structural directions of "the artifact's text-expressibility, judgment over production, guardrails up front" carries almost no risk, because they do not depend on any one tool's survival — whichever tool wins in the end, these directions hold. So the correct reading of "move now" is: at the structural level, start migrating immediately (safe and worthwhile), and at the tool level, stay agile and do not go all-in on one too early (prudent). Keep these two apart and you neither keep paying the migration cost by waiting nor get locked in by betting on the wrong tool. This is the clear-headedness due to treating the asymmetry as "an accomplished structural fact" rather than "some product's marketing narrative."
Finally, this "why now" needs a layer of necessary restraint, lest it slide into technological determinism: the cost side has collapsed and judgment has become the bottleneck — these are real; but "so everything should be AI-ified immediately" does not follow. Some design scenarios have extremely high judgment density and very little machine-checkable surface (a brand rebrand carrying intense emotion, a visual system deeply dependent on a specific cultural context), and in these, what generation can help with is inherently limited; forcing an AI process may add noise instead. Granting the asymmetry has happened is not granting it is equally strong in every corner. The honest posture is to treat this method as a force diagram that tells you where the bottleneck is moving and where the leverage points are, while how much to invest in a given scenario and which steps truly benefit still needs your own judgment — and this meta-judgment of "whether and how far to adopt" is itself the kind of judgment this method values most. A good method should not demand you believe in it but give you eyes to see the force clearly, leaving even "does it apply here" to your judgment. This is exactly what distinguishes it from a marketing narrative: marketing wants you to take the whole package; a method only hands you the tools to see the structure clearly.
Falsification condition
This volume's premise is refuted if: (a) models begin, without human specs, to reliably produce designs that are "not the mean and genuinely on-target for some group", which would mean taste is automated and ② no longer retreats; or (b) generation's cost has not materially dropped and producing comps is still the team bottleneck, which would mean ① abundance has not arrived. As long as neither holds (and neither does today), the asymmetry is a real force, not rhetoric.
Why are pencil / paper, Remotion (video in React), Hyperframes / html-video amplified? Because they turn the design artifact from an opaque binary canvas into code / plain text. Once the artifact is code, design gets the same leverage as engineering.
This is the engineering volume's five through-lines, applied to the design surface. What actually does the work is "artifact becomes code": it lets design satisfy the same agent-friendly properties. What meets them gets amplified; what stays locked in proprietary binaries gets marginalized:
Legible to agents: agents read and write the design source directly, no proprietary canvas to parse.
Diffable / versionable: a change is a commit — reviewable, revertible; design enters engineering's collaboration discipline.
Generatable / composable: agents spin up variants and compose components; humans only judge and steer.
Verifiable: tokens and constraints can be machine-checked for "off-brand" — the guardrails of taste become checkable.
"De-canvasing" is not an aesthetic slogan; it moves the artifact into reach of agents
Canvas tools (Figma, Sketch, PSD) store design state as a proprietary binary: the layer tree, constraints, and vector paths are locked in a format only that software understands. A human can look with their eyes, but an agent cannot get in — it can neither reliably read out "which token this button uses" nor express a change as a reviewable text diff. The new generation of tools (pencil/paper describing graphics as code, Remotion describing video as React, html-video doing motion straight in web tech) share a move that is not "looking more modern" but re-expressing the same design as plain text. Once that happens, design falls into the entire infrastructure software engineering has accumulated for thirty years: git, diff, code review, CI, automated generation. This is not a tool race but a phase change in the form of the artifact 〔Source: this series' engineering volume "five through-lines" and design-as-code practice, grade Ⅳ〕[R3].
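The contrast with a binary canvas can be shown in a few lines: once the component is text, "which token does this button use" is a query, not a reverse-engineering job. A minimal sketch with a hypothetical component source and a hypothetical `token_usage` helper; the regex parsing is naive and stands in for a real CSS parser.

```python
import re

# Hypothetical component source: the design expressed as text an agent can read directly.
BUTTON_SRC = """
.button-primary {
  background: var(--color-accent);
  padding: var(--space-2) var(--space-4);
  border-radius: var(--radius-card);
  color: #ffffff;
}
"""

TOKEN_REF = re.compile(r"var\(\s*(--[\w-]+)")            # custom-property references
PROPERTY = re.compile(r"^\s*([\w-]+)\s*:\s*(.+?);", re.M)  # one "property: value;" per line

def token_usage(css: str) -> dict[str, list[str]]:
    """Map each CSS property to the custom-property tokens its value references."""
    return {prop: TOKEN_REF.findall(value) for prop, value in PROPERTY.findall(css)}

print(token_usage(BUTTON_SRC))
```

It reports `background`, `padding`, and `border-radius` as token-backed and `color` with an empty list, because that value is a raw `#ffffff`: exactly the kind of finding an agent can surface or fix without touching a canvas.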
Making the collaboration discipline that "artifact becomes code" brings more concrete is more persuasive. When design is a Figma file, team collaboration rests on a social protocol: leaving comments in the file, aligning verbally in meetings, "who owns this file" — fragile, untraceable, and chaotic once changes pile up. When design is code, collaboration immediately inherits engineering's discipline validated over decades: each change is a commit with author, time, and message; merging to the trunk requires passing review; conflicts have explicit merge rules; problems can be precisely reverted to any historical version; who changed which line is plain to see. This discipline is not a windfall conjured from nowhere but is brought automatically by the "artifact is text" form — text can diff, diff enables review, review enables collaborating without overwriting each other. This matters especially for AI-Native, because once agents begin changing design en masse and in parallel, without this discipline the outputs of multiple agents would instantly conflict, reviewable by no one. The code form thus hands "humans + multiple agents changing one design together" a ready-made, scalable collaboration substrate — something canvas tools structurally cannot give.
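What "a change is a commit" buys can be seen with nothing beyond Python's standard `difflib`: once tokens are plain text, a design change is a line-level diff that a reviewer, human or agent, can read and approve. The token file name and contents are hypothetical.

```python
import difflib

# Hypothetical token file before and after one proposed change.
before = 'color.accent: "#1A73E8"\nradius.card: 8px\nspace.2: 8px\n'.splitlines(keepends=True)
after = 'color.accent: "#0B57D0"\nradius.card: 8px\nspace.2: 8px\n'.splitlines(keepends=True)

diff = list(
    difflib.unified_diff(before, after, fromfile="tokens.yaml@main", tofile="tokens.yaml@proposal")
)
print("".join(diff), end="")
```

Only the changed accent line shows up as a `-`/`+` pair; that pair is the whole review surface, and reverting it means reverting one commit.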
Here a misreading of "artifact becomes code" must be avoided: it does not mean designers must all learn to write code and become front-end engineers. "The artifact is code" refers to the artifact's expression form being text, not a demand that every designer hand-type that text — quite the opposite, hand-typing that text is exactly the part that should go to generation. What a designer needs is not the ability to write code but the ability to read and judge whether the design that code describes is good, plus the ability to state intent clearly enough to steer generation. In other words, the code form asks of designers not "can write" but "can judge + can steer" — which again lands on this volume's through-line: people retreat to judgment and direction, execution (including writing the design as code) goes to generation. So "design as code" and "designers return to taste and meaning" do not conflict but are two faces of one thing: precisely because the artifact becomes code an agent can read and write, generation can take over execution, and the designer is freed to do only judgment. Get this straight and you will not mistake "I can't write code" for being excluded from AI-Native design — what you need has never been to write, but to judge.
FIG. D2.0 / ARTIFACT BECOMES CODE · Artifact becomes code → four properties. Read: which four levers the artifact gains at once when it goes from binary canvas to text.
The four properties are not tool features but byproducts of the "artifact = text" form. This is exactly the engineering volume's amplification law — what meets agent-friendly properties gets amplified, what is locked in proprietary binary gets sidelined — landing on the design surface. The last property (verifiable) is the entry point where taste's guardrails become machine-checkable; it connects to DSN 06/08.
"Graphics as code" is no new invention but a long-standing undercurrent now detonated by AI
A note of history is worth making: expressing visual artifacts as text is not new at all. SVG describes vectors in XML, CSS describes styling in declarative rules, LaTeX describes typesetting in markup, PostScript describes pages as a program — for decades "graphics as code" has existed as an undercurrent, held back only because writing it by hand is too slow, so most design still returns to the WYSIWYG canvas. What AI changes is not the undercurrent itself but its economics: once generation can write, read, and edit these text expressions at near-zero cost, the one obstacle — "too slow to hand-write" — disappears. So the long-present undercurrent is detonated: the code form goes from "theoretically superior, practically exhausting" to "superior in both theory and practice." This explains why this wave is not yet another tool iteration but a phase change in form: nothing new was invented; the form that was always superior finally became feasible. The signal a practitioner should read: bet on how text-expressible the artifact is, not on the features of any one tool.
Isomorphism / dive
The design system ↔ the Architecture chapter's "structure legible to agents" is the same move: both are guardrails that keep mass generation coherent. See the Architecture chapter.
The designer's move shifts from "polish one comp" to "generate many, then judge, curate, steer." Being able to make is no longer the prize; being able to pick and to point the direction is. Taste becomes the bottleneck — like verification in engineering, only the object is experience and beauty.
Redraw: move the action upstream to judgment — let generation spread candidates, and have the designer do the three things machines cannot: pick (which is on-target), critique (why it is good, where it falls short), steer (which direction to regenerate). Isomorphic to engineering's trust-but-verify: generation is trusted but verified, only here what is verified is taste and experience, not correctness.
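The redrawn loop can be sketched as a plain function. Generation is a swappable callable, and judgment is a callback a person supplies. `generate`, `judge`, and the integer "direction" in the toy usage are hypothetical placeholders. The sketch shows only where the human sits in the loop; it does not claim that taste can be computed.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# judge(candidates, direction) -> (pick or None, next direction, stop?)
Judge = Callable[[Sequence[Any], Any], Tuple[Optional[Any], Any, bool]]
Generate = Callable[[Any, int], Sequence[Any]]


def judgment_loop(generate: Generate, judge: Judge, direction: Any,
                  max_rounds: int = 5, spread: int = 8) -> Tuple[Optional[Any], Any]:
    """Machine spreads candidates each round; a person picks, critiques, and steers."""
    picked = None
    for _ in range(max_rounds):
        candidates = generate(direction, spread)                # machine: spread
        choice, direction, done = judge(candidates, direction)  # human: pick, critique, steer
        if choice is not None:
            picked = choice
        if done:
            break
    return picked, direction


# Toy stand-ins: "direction" is an int and candidates are ints near it.
def toy_generate(direction: int, n: int) -> list:
    return [direction + i for i in range(n)]


def toy_judge(candidates, direction):
    best = max(candidates)
    if best >= 10:                      # "on-target" in this toy
        return best, direction, True
    return None, direction + 5, False   # critique: too low; steer upward


print(judgment_loop(toy_generate, toy_judge, direction=0))  # → (12, 5)
```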
The human node did not vanish; it moved from the end of the chain to the front
In the old process the designer's hand ran the whole length: from a blank canvas, stroke by stroke, realizing the comp in their head, with value concentrated in execution precision. In the new process execution goes to generation; the human hand exits the "making" segment and concentrates at the two ends — the front-end intent and spec (which way to generate, what is good) and the mid-stream judgment and steering (which version, why, where next). This is not a demotion of the designer to "the person who presses buttons"; quite the opposite: many people can make, few can judge. When an action can be subcontracted to generation, it exits the human's core value; what remains for people is always the judgment node that cannot yet be automated. The designer's scarcity therefore moves up a layer — from hand-skill to eye and a sense of direction.
FIG. D3.0 / MAKING → JUDGING · THE HUMAN NODE MOVES UP. Read: the designer's hand exits "making" and lands on "spec" and "judgment."
Both lanes run left to right, but the human's red box has moved: in the old lane people occupied the mid-stream "execution"; in the new lane execution goes to the machine (blue) and people retreat to the two red boxes, ① spec and ③ judgment. Being able to make stops being the prize; being able to judge becomes it — this is "the bottleneck moves" at the level of an individual's actions.
Why "judging many" is harder than "polishing one," not easier
A common misreading frames "judging many" as the easy job: you don't have to make it yourself, just glance and pick the nice-looking one. The opposite is true. Judgment is a higher-order cognitive activity than execution, and doing it well is harder than polishing one comp. When polishing one, you hold a clear target image in your head and the rest is hand-skill. When judging many, you face a set of candidates that all "look fine" and must make a reasoned trade-off among them. That requires three steps. First, think the criteria through; otherwise you are merely picking by feel. Then evaluate each candidate against those criteria. Finally, read from the rejects which direction to regenerate. This capability needs deliberate training; it does not come automatically with knowing how to design. It also explains why "generation made design simpler" is a dangerous illusion. Generation made production simpler, but it made the part that actually decides success, judgment, denser and more demanding of skill. A team that treats judgment as the easy job will only get faster at picking slightly better slop out of a pile of slop.
This also hides a fact that is unfriendly to juniors but must be faced squarely: judgment is hard to fast-track by watching. It accumulates mainly through having done the work and having reviewed why. A designer's growth path used to be clear: make a great deal, and feel and judgment grow together in the making. Now that generation takes over execution, juniors lose the natural path of training judgment by making, yet are pushed straight to the higher-order task of judging many. This is a real and not yet well-answered difficulty of AI-Native design: if juniors no longer do large amounts of execution, where does their judgment come from? The partial answer visible so far is to make judgment itself a trainable object. Juniors repeatedly practice judging candidates against a spec and stating which criteria were hit and which were missed. Seniors articulate their judgment explicitly so others can learn from it (which returns to DSN 03·5's "externalizing taste"). This difficulty deserves an honest admission rather than the pretense that "with AI around, juniors can just jump in." The cost of treating judgment as the easy job falls first on juniors, then on the whole team's reserve of judgment.
Finally, of "pick, critique, steer," the most underrated is steer: turning critique into the next round's concrete direction. Many people already see that picking (choosing the on-target one) and critiquing (saying why) matter, but steering is often overlooked. It requires you not only to judge the present but to read information out of the rejected candidates. If a batch runs cold overall, the character should warm up. If one local part of a version is right, that direction is worth deepening. Steering is the ability to convert judgment into a generation instruction, and it directly decides whether the next spread is more focused or yet another divergence. Someone who can pick and critique but not steer gets the loop stuck at "picked a decent one but don't know how to make the next round better." That is the root cause of many people's feeling that "AI design plateaus at some point." Steering reconnects judgment to generation and makes the closed loop actually turn. Together, pick, critique, and steer form the complete judgment act; without steer, judgment is only static evaluation that cannot drive iteration. This is also why, in the DSN 07 loop, ④ steer is the axle connecting human judgment to the machine's next round of generation.
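Steering can be pictured as editing the next instruction rather than rerolling it. The sketch below reduces "character direction" to a few numeric dials, which is a deliberate oversimplification. Every name in it (`Direction`, `steer`, the dials `warmth` and `density`) is hypothetical. The sketch shows only the shape of steer: a batch-level read such as "this batch runs cold" becomes a nudge, and a part judged right is frozen so later rounds cannot undo it.

```python
from dataclasses import dataclass, field


@dataclass
class Direction:
    """Hypothetical generation direction: named dials in [0, 1] plus frozen dials."""
    dials: dict
    frozen: set = field(default_factory=set)


def steer(direction: Direction, nudges: list, keep: set) -> Direction:
    """Turn a critique into the next round's direction.

    nudges: batch-level reads such as ("warmth", +0.2) for "this batch runs cold".
    keep:   dials judged right in a picked version; frozen so later nudges cannot undo them.
    """
    dials = dict(direction.dials)
    for name, delta in nudges:
        if name in direction.frozen:
            continue  # an earlier round already judged this dial on-target
        dials[name] = min(1.0, max(0.0, dials.get(name, 0.5) + delta))
    return Direction(dials, direction.frozen | keep)


d0 = Direction({"warmth": 0.3, "density": 0.6})
d1 = steer(d0, [("warmth", 0.2)], keep={"density"})  # warm up; density was right
d2 = steer(d1, [("density", 0.3)], keep=set())       # ignored: density is frozen
print(round(d1.dials["warmth"], 2), d2.dials["density"])  # → 0.5 0.6
```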
Test signal
Taste hit-rate rises (how often the on-target version is picked first), and so does the share of time spent judging rather than producing by hand. Output volume itself is not the metric. Falsified if: neither the share of time spent judging nor the hit-rate rises, and only comps come faster. Then you are still in the old process with a faster hand; the "making → judging" shift has not actually happened.
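The two signals are simple enough to compute from a review log. The sketch below assumes a hypothetical per-task record; the field names and the way "on-target" gets confirmed are placeholders for whatever a team actually logs. Output volume is deliberately not a field. The comparison reports which signals rose. When both are `False`, that is the falsification case described above.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """One design task, as a team might log it (fields are hypothetical)."""
    first_pick_on_target: bool  # was the first pick later confirmed on-target?
    minutes_judging: float
    minutes_producing: float


def taste_signals(records: list) -> tuple:
    """Return (hit_rate, judging_share) for one period."""
    if not records:
        raise ValueError("no records for this period")
    hit_rate = sum(r.first_pick_on_target for r in records) / len(records)
    judging = sum(r.minutes_judging for r in records)
    total = judging + sum(r.minutes_producing for r in records)
    judging_share = judging / total if total else 0.0
    return hit_rate, judging_share


def compare_periods(before: list, after: list) -> dict:
    """Which of the two signals rose? Both False matches the falsification case."""
    hit_before, share_before = taste_signals(before)
    hit_after, share_after = taste_signals(after)
    return {
        "hit_rate_rose": hit_after > hit_before,
        "judging_share_rose": share_after > share_before,
    }


before = [ReviewRecord(False, 10, 50), ReviewRecord(True, 10, 30)]
after = [ReviewRecord(True, 30, 10), ReviewRecord(True, 25, 15)]
print(taste_signals(after))           # → (1.0, 0.6875)
print(compare_periods(before, after))
```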
The same fork lands differently on the design face
It is worth deriving this fork in place rather than importing the conclusion from the Engineering volume. The kernel's second step says "judgment forks along a verifiability gradient": what a machine can verify goes to the machine, what it cannot stays with people. Everything turns on what "verifiable" means — it means a decision procedure exists: given an artifact, a machine can independently and reproducibly rule it right or wrong. Engineering can push most judgment to machines precisely because correctness usually has such a procedure: a test passes or fails, a type checks or does not, a benchmark returns a number. That is a verdict, not a feeling.
On the design face, what gets judged is experience and beauty — and here "good" has no decision procedure. "Whether this version is on-target, whether it is for these people, whether it crossed the taste line" cannot be reduced to a test that passes or fails. You can measure usability, measure conversion, measure contrast compliance, but these are proxies for "good," not "good" itself; mistaking the proxy for the verdict is exactly where slop and homogenization come from (DSN 09·7). So the same fork lands at a different place: on the engineering face the bulk of judgment slides to the machine side; on the design face the load-bearing judgment — taste — cannot slide across because it has no machine-checkable decision procedure, and it sits structurally on the human side. This is not "not yet automated"; it is "the thing being judged is not a decidable proposition in the first place." Taste is therefore not one merely-hard judgment among many — it is the constitutive judgment of this fork on the design face: take it away and the fork has no human end at all. [grade Ⅴ, argument; the proxy-vs-"good" distinction is the ceiling of formalization in DSN 06]
FIG. D3.5 / THE VERIFIABILITY-GRADIENT FORK · DESIGN FACE. Read: lay design judgments on a gradient by "is there a decision procedure," and see where the kernel's step ② forks.
This gradient is not a ranking by difficulty but by "whether a reproducible right/wrong decision procedure exists." At the left end (contrast, spacing scale, tokens, lint) the procedure exists, so these slide to the machine side and get amplified by the code form; the further right, the less the procedure can be written down, until at "is it alive, is it the right idea" there is no procedure at all — it is a feeling, not a verdict. The kernel's step ② forks at a point on this gradient: the machine-checkable goes to the machine, the uncheckable sits structurally on the human side. Taste is therefore not the merely-hardest judgment among many but the constitutive judgment of this fork on the design face — take it away and the fork has no human end.
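Contrast is the clearest example of the left end. WCAG 2.x defines relative luminance and the contrast ratio precisely enough that any two correct implementations return the same verdict. The sketch below is minimal and accepts 6-digit hex sRGB colors only. It uses the WCAG 2.x formula and the AA threshold of 4.5:1 for normal-size text.

```python
def _linearize(channel_8bit: int) -> float:
    """sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg: str, bg: str) -> bool:
    return contrast_ratio(fg, bg) >= 4.5


print(round(contrast_ratio("#000000", "#ffffff"), 2))  # → 21.0
print(passes_aa_normal_text("#767676", "#ffffff"))     # → True  (about 4.54:1)
print(passes_aa_normal_text("#777777", "#ffffff"))     # → False (about 4.48:1)
```

The verdict is reproducible, and that reproducibility is what the right end of the gradient ("is it alive") lacks.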
Treating taste as innate mysticism leads to the wrong conclusion that it "cannot be discussed, taught, or externalized into a spec." That conclusion is exactly what makes people give up the thing they should most be doing in an AI-Native process. Taste is in fact decomposable: it is the synthesis of empathy for "who it is made for," discrimination of "what counts as good," and a sense of direction for "where to go next." Decompose it and you can say clearly why it is scarce and why it cannot be outsourced to generation.
Why is taste a scarce judgment, not a scarce skill? A skill can be subcontracted to generation — drawing a gradient, laying out a grid, tuning a palette; these "can-do" things a model already does fast and well. But taste asks not "can it be done" but "should it be this way": among countless candidates all "done right," recognizing which one is right for these people. That is a discrimination, not an output; it relies on understanding the user's situation, memory of the brand context, sensitivity to the line where "more becomes worse" — all things that cannot be read from the artifact text alone. This is exactly why DSN 00's triage question sorts it to ④: taste sits at the far end of the verifiability gradient, furthest from "machine-checkable," and therefore furthest from "automatable."
Unpacked: empathy is the root of taste; without it, taste is mere preference
Of the three components, empathy is the root, and the other two grow on it. Discrimination (what counts as good) and direction (where to go) degenerate into pure personal preference — "I think it looks nice" — once detached from "for whom." This is exactly where slop and taste are most easily confused: a judgment a designer makes from personal aesthetics and a judgment a designer makes for some group of users both look, on the surface, like "subjective choices," but the former is preference and the latter is taste. The difference lies in whether the root of empathy is present — taste can always answer "for whom, in what situation, why," preference can only answer "I like it." This also explains why AI cannot fundamentally help with empathy: it can simulate, from data, "how a class of users might think," offering useful material, but it cannot actually care whether those people are well served — and the root of taste is precisely that care. Give a designer who does not care about users the strongest tool and they make only more polished self-expression; a designer who truly cares about users can, even with crude tools, make something that "exists for them." This is why this volume keeps returning to "for people": not a moral slogan but the very root that makes taste taste rather than preference.
Decomposing taste into the three layers of empathy, discrimination, and direction has a direct practical value too: it turns "how to cultivate taste" from an empty phrase into operable training. If taste were indivisible innate mysticism, "cultivating taste" would have no handle; but since it is the synthesis of three layers of judgment, each layer can be trained on its own. Train empathy: force yourself to leave the screen and genuinely contact users, observe how they use things in real situations, where they get stuck, why they give up — empathy is not imagined but seen and asked out. Train discrimination: deliberately practice "judging candidates against explicit criteria," each time forcing yourself to state "good where, weak where," pressing vague likes and dislikes into clear criteria. Train direction: review where your judgment differs from an expert's — given the same set of candidates, why did the expert pick B while you picked A, what did they see that you did not. Each of the three layers has its own way to practice, and together they form an executable path for growing taste. This confirms once more "taste is scarce but not mysterious": scarce because it takes long investment to build; not mysterious because it can indeed be decomposed, deliberately practiced, and accumulated layer by layer.
Taste is scarce but not mysterious — it can be externalized, taught, and fed back
"Scarce" and "ineffable" are two different things, and conflating them is a double error. Granting that taste is scarce is granting it cannot be replaced by generation and must be held by people; but going further to assume it is "ineffable" makes you give up the spec-writing of DSN 04/08 — and the spec is precisely the externalized form of taste. A senior designer needs only a second to judge "this version is no good," but if they can say and write down "where it's no good, what good should look like," that one-second judgment becomes a guardrail you can feed to generation, criteria you can teach a junior, an asset the team can reuse. A core training of AI-Native design is exactly to force the designer to articulate, item by item, the taste hidden in their intuition — not because articulating it makes taste un-scarce, but because only articulated can it enter the closed loop and compound. A judgment that cannot say "why it's good" is invisible to the team and to generation; it works once, only inside that one head.
An argument often used to deny that "taste can be discussed" runs: "taste is subjective, beauty is in the eye of the beholder, there's no right or wrong to speak of." This argument conflates relative with arbitrary. Taste is indeed relative — it holds relative to "for whom, what context, what purpose"; the same design is on-target for solo developers and off-target for an early-childhood education product. But relative does not mean arbitrary: once "for whom, what context, what purpose" is pinned down, "which version is more on-target" has an answer that can be discussed, argued, even agreed on by most experienced people. This is exactly why FOR-WHOM must come first in DSN 08's spec — it converts "subjective" into "judge-ability relative to a defined object." Calling taste purely subjective is often an excuse to abandon judgment; real design judgment is never "I like it" but "for these people, under this purpose, this version is more right, because…" AI's place in this is thereby fixed too: it can help you simulate "how a class of users might react," supplying material for judgment; but the anchor that pins down relativity — "for whom, what purpose" — and the final verdict on "on-target or not" must still be set by a human who understands these people.
Test signal
Signal that taste is being externalized: in review, "I feel this version is better" is gradually replaced by "it's better because it hits Z for user X in situation Y," and juniors' hit-rate rises as the spec sharpens. Falsified if: a senior's judgment can never state a reason, never enters the spec, and juniors never learn it. Then either taste has not really been decomposed, or personal preference has been mistaken for taste.
DSN 04
SYSTEM & THE SPEC OF GOOD
Redraw · Spec
The system as guardrail: writing down the spec of "good"
Tokens, components, brand, principles are no longer post-hoc docs but the upfront spec and guardrail for generation: keeping mass generation inside the system, on-brand, off the slope to slop. And to make generation converge in the right direction, the designer must write down "what is good."
Generation cannot read minds. A standard hidden in a senior designer's intuition is one the model can neither learn nor be judged against; left unwritten, generation slides back to the mean. Externalize intent and criteria into a spec that steers and judges generation. This is a human spec that cannot be outsourced, of the same kind as the Verification chapter's "humans define what's right," except that here you define what's good. The ruler should hold:
For whom, for what: who it serves, and what it should let them do and feel.
Intent and character: the tone it should carry, and explicitly what it should not be.
Criteria for "done": what counts as good, on-target, and shippable; a set of concrete signals you can accept against.
Anti-slop red lines: clichés to avoid by rule (see the next sheet), turning "don't look AI-made" into checkable items.
There is an often-overlooked causal chain worth stating in full. The ceiling on generation's quality depends not on how strong the model is but on how discriminating the spec fed to it is. Give an extremely strong model "make a good-looking dashboard" and it can only return the mean. Give a mid-tier model a spec that spells out for whom, what character, what counts as done, and what is a red line, and it can land in the narrow band instead. In AI-Native design, then, the designer's leverage moves from "how skilled the hand is" to "how precise the spec is." The design system is precisely the persisted, reusable, machine-checkable form of that spec. It distills one-off verbal judgments into a guardrail that takes effect automatically on every generation. Treat it as a post-hoc doc and you re-explain "what we want" to the model from scratch every time. Treat it as an upfront guardrail and the whole team (and every agent) shares one ruler, already calibrated.
This causal chain also explains something that puzzles many teams: using the same newest model, why do some teams produce steadily good work while others keep spinning in slop? The difference is almost never the model. It is the quality of the spec fed to the model and the completeness of the design system behind it. The model is the same for everyone; the difference in output comes almost entirely from the guardrails constraining it. In the AI-Native era, a team's design competitiveness therefore shows less and less in who can hire the most skilled hands. It shows more and more in who can distill "what is good" into a more precise, complete, machine-checkable spec and system. This redefines organizational capability: the core asset shifts from individual craft to externalized, reusable judgment that can be fed to generation. For the same reason, teams that start early on building the design system, writing specs, and feeding judgment back run faster and faster along this curve. Teams that treat the design system as a box-ticking doc stay stuck near the mean however new a model they swap in. The model was never the bottleneck; the guardrails are.
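The four things the ruler should hold map naturally onto a small data structure that can both be rendered into a generation brief and report its own gaps. This is a minimal sketch under stated assumptions. `DesignSpec`, its field names, and the brief format are hypothetical. Putting FOR WHOM first in the brief follows this volume's ordering; it is not a requirement of any particular model.

```python
from dataclasses import dataclass, field


@dataclass
class DesignSpec:
    """The four things the ruler should hold. Field names are hypothetical."""
    for_whom: str                                      # who it serves; what they should do and feel
    intent: str                                        # tone and character
    not_this: list = field(default_factory=list)       # explicitly what it should not be
    done_criteria: list = field(default_factory=list)  # signals you can accept against
    red_lines: list = field(default_factory=list)      # anti-slop rules

    def gaps(self) -> list:
        """Name the sections still empty; unwritten, generation drifts to the mean there."""
        missing = []
        if not self.for_whom.strip():
            missing.append("for_whom")
        if not self.intent.strip():
            missing.append("intent")
        if not self.done_criteria:
            missing.append("done_criteria")
        if not self.red_lines:
            missing.append("red_lines")
        return missing

    def to_brief(self) -> str:
        """Render the spec as a plain-text generation brief, FOR WHOM first."""
        lines = [f"FOR WHOM: {self.for_whom}", f"INTENT: {self.intent}"]
        lines += [f"NOT: {item}" for item in self.not_this]
        lines += [f"DONE WHEN: {item}" for item in self.done_criteria]
        lines += [f"NEVER: {item}" for item in self.red_lines]
        return "\n".join(lines)


spec = DesignSpec(
    for_whom="solo developers checking a deploy late at night",
    intent="calm and dense; no celebratory confetti",
    red_lines=["purple-to-blue gradients"],
)
print(spec.gaps())  # → ['done_criteria']
print(spec.to_brief())
```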
The design system upgrades from "post-hoc doc" to "pre-generation guardrail"
The old design system was a post-hoc artifact: build the interface first, then go back and tidy up a set of tokens, a component library, brand guidelines for the team to align on. Its role was recording a consensus already formed. In the AI-Native process that role inverts — the design system must move upfront into the spec fed to generation. The reason is direct: generation's default destination is the mean (see DSN 01's asymmetry), and the only thing that can pull it off the mean is a constraint given before generation happens. Generation without guardrails hands the wheel to the training distribution; generation with guardrails is what constrains it to the narrow band of "these people, this brand." So the design system is no longer the result of design but its precondition — the more complete and machine-checkable it is, the harder it is for mass generation to drift off-brand.
One spec, two layers: the machine holds "on-brand," the human holds "for people"
A spec is not a homogeneous blob of text. It splits internally into two layers, and those layers map exactly onto the kernel's ① and ④. Mixing them is the root of every failure mode downstream. Either you force taste into lint and get pixel-perfect, soulless slop, or you have humans police the alignment and spacing a machine should own and burn scarce judgment on the automatable. The cut follows the triage question in DSN 08: whatever has a decision procedure a machine can run goes to the guardrail layer; whatever does not stays with people.
Think of generation's output as a probability distribution. Without guardrails the peak sits at the mean of the training distribution — that is, slop. The design system does more than narrow the distribution; two hard walls clamp the probability mass off the mean, pushing it into the band that is "true only for these people." The walls are machine-checkable hard constraints (token / alignment / contrast); the shape inside the walls is still sculpted by human soft criteria.
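The "walls" picture can be made concrete with a toy discrete distribution. Hard constraints act as conditioning: outcomes that violate them are removed and the rest is renormalized. The four outcomes and their probabilities below are invented for illustration, and real generation is not a four-row table. The point of the sketch is that the walls move the mode off the generic peak but do not say which of the remaining outcomes is right.

```python
from typing import Callable, Dict


def apply_walls(dist: Dict[str, float], allowed: Callable[[str], bool]) -> Dict[str, float]:
    """Condition an output distribution on hard constraints, then renormalize.

    dist:    outcome -> probability (a toy stand-in for generation's output)
    allowed: predicate encoding the machine-checkable walls
    """
    kept = {k: p for k, p in dist.items() if allowed(k)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("the walls exclude every outcome; the spec is contradictory")
    return {k: p / total for k, p in kept.items()}


# Hypothetical toy distribution: the peak sits on generic, mean-like outcomes.
raw = {
    "generic/dark-neon": 0.40,
    "generic/purple-gradient": 0.35,
    "on-brand/warm-serif": 0.15,
    "on-brand/dense-grid": 0.10,
}
clamped = apply_walls(raw, lambda outcome: outcome.startswith("on-brand/"))
print({k: round(v, 2) for k, v in clamped.items()})
# → {'on-brand/warm-serif': 0.6, 'on-brand/dense-grid': 0.4}
```

Choosing between the 0.6 and 0.4 outcomes is still the human soft criterion inside the walls.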
FIG. D4.1 / SYSTEM AS A TWO-LAYER SPEC · SPLITTING THE GUARDRAIL INTO TWO LAYERS. Read: tokens → components → brand stack into the machine-checkable guardrail layer; see how it clamps generation's output distribution.
A design system is not a document but a two-layer spec. Layer A is the machine-checkable guardrail: tokens (atomic values), components (composition contract), and the lint-expressible part of brand (banned patterns, font whitelist), stacking bottom-up into a hard constraint frozen before generation and verifiable item by item. The job of this guardrail is exactly to take the output distribution on the right — which would otherwise peak at the mean (= slop) — and clamp it off the mean into the narrow band that is "true only for these people." But the guardrail only sets where the walls stand; "which version inside is truly on-target" is Layer B, the human's soft criteria. Keep the two layers distinct and you stop asking "can the design system judge for me": Layer A always can, Layer B never can.
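The two-layer split can be encoded directly: a criterion either carries a decision procedure or it does not. In this minimal sketch, `Criterion`, the 4px spacing rule, and the font whitelist are hypothetical examples. The sketch shows that Layer A runs automatically and Layer B never appears in the machine's verdict.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Criterion:
    name: str
    # A decision procedure, if one exists. None means no machine can rule on it.
    check: Optional[Callable[[dict], bool]] = None


def split_layers(criteria: list) -> tuple:
    """Layer A: has a decision procedure (lint it). Layer B: does not (critique it)."""
    layer_a = [c.name for c in criteria if c.check is not None]
    layer_b = [c.name for c in criteria if c.check is None]
    return layer_a, layer_b


def run_layer_a(criteria: list, artifact: dict) -> dict:
    """Run every machine-checkable criterion; Layer B never appears in the result."""
    return {c.name: c.check(artifact) for c in criteria if c.check is not None}


criteria = [
    Criterion("spacing on the 4px scale", lambda a: a["padding"] % 4 == 0),
    Criterion("font on the whitelist", lambda a: a["font"] in {"IBM Plex Sans"}),
    Criterion("is it for these people"),
    Criterion("does it have a soul"),
]
print(split_layers(criteria))
print(run_layer_a(criteria, {"padding": 10, "font": "IBM Plex Sans"}))
# → {'spacing on the 4px scale': False, 'font on the whitelist': True}
```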
The design system and architecture's "structure as guardrail" are the same move — not an analogy but one principle
It is worth stating the intra-series isomorphism fully here, because it proves this method is not a special case of design but the same principle recurring on a different face. The Architecture chapter argued: when agents generate code en masse, the only thing that keeps that code coherent, non-conflicting, and on-intent is a set of structural constraints legible to agents — clear module boundaries, types, interface contracts. The design system does exactly the same thing on the design face: when agents generate interfaces en masse, what keeps them coherent, on-brand, off the slope to slop is a set of structural constraints legible to agents — tokens, component contracts, brand principles. Both reconcile the tension between "mass generation" and "coherence" with "upfront, machine-checkable structural guardrails." This is not a rhetorical analogy but follows from facing the same underlying problem: once generation gets cheap, the quality bottleneck shifts from "can it be generated" to "can the pile of generated things stay consistent and on-intent," and the general form for solving that bottleneck is to freeze intent into pre-generation, machine-readable constraints. This is why someone who has read the Architecture chapter feels a strong déjà vu here — the same machine, a different face.
This also answers a practical question: in an AI-Native process, to what degree should you invest in the design system before it is "enough"? The old standard was "enough to align the team" — since it was only a post-hoc doc, over-completeness was waste. The new standard is far higher: the design system's completeness and machine-checkability directly determine the ceiling on generation quality, so it deserves continuous investment as a core asset, not as a scrap to tidy when there is time. Concretely, things once described loosely in natural language ("use our brand blue for the primary," "keep spacing consistent") now deserve to be frozen into forms a machine can consume directly: tokens written as JSON not a screenshot, components written as code with explicit interface contracts not artboard examples, the machine-checkable parts of brand principles (contrast, typeface whitelist, banned patterns) written as lint rules. The return on this investment compounds: the more complete the design system, the more each generation saves judgment and stays on-brand, and each round's judgment flowing back makes it more complete still. This is a positive feedback loop — and precisely because it is, investing early pays off far more than investing late. A team that treats the design system as a post-hoc doc is continuously forgoing this compounding curve.
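Freezing those rules into machine-consumable form can be as plain as a JSON token file plus a lint pass. The token names, values, font names, and flat `style` dictionary below are hypothetical. A production setup would more likely use an established token format and a linter wired into CI. The decision each check makes would be the same.

```python
import json

# Hypothetical token file; names and values are illustrative only.
TOKENS_JSON = """
{
  "color": {"brand-primary": "#1f4fd1", "ink": "#1a1a1a", "paper": "#ffffff"},
  "space": [4, 8, 12, 16, 24, 32],
  "font": {"allowed": ["IBM Plex Sans", "Source Serif 4"]}
}
"""

COLOR_PROPS = ("color", "background")
SPACE_PROPS = ("padding", "margin", "gap")


def lint(style: dict, tokens: dict) -> list:
    """Check one component's resolved style against the tokens; return violations."""
    errors = []
    palette = {value.lower() for value in tokens["color"].values()}
    for prop in COLOR_PROPS:
        value = style.get(prop)
        if value is not None and value.lower() not in palette:
            errors.append(f"{prop}: {value} is not a color token")
    for prop in SPACE_PROPS:
        value = style.get(prop)
        if value is not None and value not in tokens["space"]:
            errors.append(f"{prop}: {value} is off the spacing scale")
    font = style.get("font-family")
    if font is not None and font not in tokens["font"]["allowed"]:
        errors.append(f"font-family: {font} is not on the whitelist")
    return errors


tokens = json.loads(TOKENS_JSON)
print(lint({"color": "#1F4FD1", "padding": 16, "font-family": "IBM Plex Sans"}, tokens))  # → []
print(lint({"background": "#00ffff", "gap": 10, "font-family": "Inter"}, tokens))  # three violations
```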
But this compounding curve has a boundary that must be held, or it backfires: what a design system can freeze is only "good that has already been judged"; it cannot substitute for "making a new judgment about a new situation." The more complete the guardrails, the more dangerous a temptation arises — believing that as long as you generate by the system the result must be good, so you stop judging and treat the system as autopilot. This mistakes ③ (context/guardrail) for ④ (human judgment). The guardrails' job is to constrain generation to the narrow band of "good judged in the past," sparing you from re-judging the already-judged each time; but when a genuinely new situation arises — a kind of content the system never covered, a group of users never served, a scenario where old criteria no longer apply — the system falls silent, or worse, hands you, by old criteria, a result that "looks compliant but is off-target." Here a human must judge anew and feed the new judgment back into the system (precisely ⑥ distill). So the design system is a sediment of judgment, not a substitute for it. Hold this boundary and the compounding curve holds; cross it and treat the system as autopilot, and at the first new situation it will quietly carry you back to the mean.
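One way to keep the system from turning into autopilot is to make its silence explicit. A check returns "uncovered" when no rule applies, so the case routes to a person instead of passing by default. Once the person has judged, the new judgment is distilled back in as a rule. `Verdict`, `check`, `distill`, and the example rules are hypothetical names for illustration. The code only stores the human's judgment; it does not make it.

```python
from enum import Enum
from typing import Callable, Dict

Rule = Callable[[object], bool]


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNCOVERED = "uncovered"  # the system has no rule here: route to a person


def check(prop: str, value: object, rules: Dict[str, Rule]) -> Verdict:
    rule = rules.get(prop)
    if rule is None:
        return Verdict.UNCOVERED  # never silently treat "no rule" as "pass"
    return Verdict.PASS if rule(value) else Verdict.FAIL


def distill(rules: Dict[str, Rule], prop: str, rule: Rule) -> Dict[str, Rule]:
    """Feed a new human judgment back into the system as a rule (step ⑥)."""
    updated = dict(rules)  # return a new mapping; the old system is left untouched
    updated[prop] = rule
    return updated


rules = {"padding": lambda v: v in {4, 8, 12, 16}}
print(check("padding", 8, rules))        # → Verdict.PASS
print(check("motion", "bounce", rules))  # → Verdict.UNCOVERED
rules = distill(rules, "motion", lambda v: v in {"none", "fade"})
print(check("motion", "bounce", rules))  # → Verdict.FAIL
```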
Efficiency was never the point. Making comps faster, if it only means producing slop faster, is no win. Once generation takes over production, the designer is returned to what should have been theirs all along — empathy, taste, meaning. The machine covers surface; the human defines direction and meaning.
Hand to generation
Variants, states, edge cases
Applying the system, alignment, export
First drafts, exploratory directions
Keep with humans · taste & meaning
Empathy: what users truly want
Taste: judge, hold distinctiveness, avoid slop
Meaning: own whether it deserves to exist
The load-bearing claim: a design's success is judged not by how fast its comps ship but by whether it is, in the end, truly made for specific people. This is not a pretty phrase but the volume's load-bearing wall, and the point where the series' human through-line lands on the design face. Take it seriously and it changes your criterion for whether AI-Native design has succeeded. Suppose a team uses AI to make comps ten times faster yet produces only faster, smoother slop, made for no one and remembered by no one. By this proposition it has failed, however good every efficiency metric looks. Suppose instead a team's comps did not get much faster, but its designers put every spared minute of attention into empathy and taste and made something that genuinely makes the target user feel "this was made for me." By this proposition it has succeeded. Shipping faster is never the win in itself; it only frees people from repetitive production. The freed person must return to what only a person can do: understanding people and being responsible to them. Mistaking "faster" for the win is the most common and most insidious failure of an AI-Native transition: it lets you quietly lose the reason design exists while every dashboard glows green.
What "returning people to meaning" concretely means on the design face
"People return to meaning" may stay abstract in other volumes, but on the design face it lands most concretely, because design is by nature a craft about people — its entire value lies in "being used by someone, being felt by someone." Once generation takes over the production actions of drawing comps, spreading variants, alignment and export, what is returned to the designer is exactly the original core of this craft: to understand what a specific person in a specific situation truly needs (empathy), to judge whether the version in front of them actually hits that need (taste), to own whether "this thing deserves to exist and what it should be" (meaning). These three were often squeezed to the margins by production hours — designers spent vast time on pixels and exports, leaving little for empathy and judgment. The real promise of AI-Native design is not "designers can produce comps faster" but freeing the designer from production and returning them to what this craft was meant to do from the start. This is the same liberation as the engineering volume's "people do deep systems expertise and product judgment, not throughput," only the object shifts from code to experience and beauty.
This is also why, across the whole series, design is the face where the human through-line lands most concretely. The org volume establishes "putting people back at the center" as the methodology's destination, but at the organizational level that is still macro, still principle; the closer you get to a concrete function, the more this purpose needs translating into "on this face, what does returning to meaning concretely mean." The engineering volume translates it into "people do systems expertise, not throughput," already one layer more concrete. Design goes furthest, because the craft's object is people to begin with: unlike code, which sits behind a layer of logic, it faces directly whether a person will be moved by the thing and feel understood. So on the design face, "made for specific people" is not an abstract principle needing laborious argument but nearly the definition of the craft itself: a design that is not for people does not deserve to be called design in the first place. AI-Native design is thus the touchstone for whether the whole human through-line can really land. If it cannot hold even on the face closest to people, it is probably just pretty words on the other faces too; if it holds, that proves the through-line is not decoration but a load-bearing structure that can be carried all the way down to concrete actions.
反 slop 红线(命中越多,越滑向均值)
Anti-slop red lines (the more you hit, the closer to the mean)
深底配青 / 霓虹强调色 · 紫到蓝渐变 · 渐变文字做标题或数字
Dark bg + cyan/neon accents · purple-to-blue gradients · gradient text on headings or metrics
Generic fonts (Inter / Roboto / system defaults) · centering everything · a big rounded icon above every heading
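Some of these red lines are concrete enough to check mechanically. The sketch below is a minimal illustration that assumes the artifact is plain CSS text. The rule names and regular expressions are assumptions, not an existing linter, and they catch only the crudest hits. Hex-valued gradients, centering everything, or a big rounded icon above every heading would need richer checks. Counting hits mirrors the heading: the more red lines a stylesheet trips, the closer it sits to the mean.

```python
import re

# Hypothetical red-line rules over plain CSS text. Each is a crude regex check:
# it catches obvious hits only and is no substitute for a reviewer's eye.
RED_LINES = {
    # Gradient text: text clipped to its (gradient) background.
    "gradient-text": re.compile(r"background-clip\s*:\s*text", re.IGNORECASE),
    # Purple-to-blue gradient, detected only via named color keywords.
    "purple-to-blue-gradient": re.compile(
        r"linear-gradient\([^;]*\bpurple\b[^;]*\bblue\b", re.IGNORECASE
    ),
    # Generic typefaces named on the red-line list.
    "generic-font": re.compile(
        r"font-family\s*:[^;]*\b(Inter|Roboto|system-ui)\b", re.IGNORECASE
    ),
}


def red_line_hits(css: str) -> list[str]:
    """Return the names of the red lines this stylesheet hits, each at most once."""
    return [name for name, pattern in RED_LINES.items() if pattern.search(css)]


css = """
h1   { background: linear-gradient(90deg, purple, blue); background-clip: text; }
body { font-family: Inter, sans-serif; }
"""
print(red_line_hits(css))  # → ['gradient-text', 'purple-to-blue-gradient', 'generic-font']
```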
检验信号 · Test signal
slop 率下降、独特度上升——把成品丢给陌生人,他会问"这怎么做出来的",而不是"这哪个 AI 做的"。同时盯用户侧:可用性、共情命中、"这是为我做的"那种感觉。
Slop rate down, distinctiveness up. Show the result to a stranger and they ask "how was this made," not "which AI made this." Watch the user side too: usability, empathy hits, that "this was made for me" feeling.
DSN 06
MECHANISM · 为何代码产物被放大
WHY CODE ARTIFACTS WIN
机理 · 受力
Mechanism
不是工具赢,是产物变代码赢
The tool does not win; the artifact becoming code wins
This section takes the prior section's principle down to the level of forces. When the artifact is a proprietary binary canvas, an agent cannot read, edit, or check it, so the agent is pushed to the margins. When the artifact is code or plain text, it gains four properties at once: it becomes readable, diffable, generatable, and verifiable (the engineering amplification law landing on the design surface). But the law has a boundary: it amplifies only the half of design that a spec can bind.
Force analysis: these four properties are not tool features; they are a byproduct of the "artifact = text" form. Figma files, PSDs, proprietary canvases lock state into binary: a human can look; an agent cannot parse, diff, generate, or machine-check it. Swap in HTML / JSX / tokens.json and the same design becomes readable, diffable, generatable, verifiable at once. That is the real reason pencil / Remotion-class tools are amplified: not that they are nicer, but that they picked the right artifact form.
State locked in a proprietary format. An agent can only screenshot and guess, or a human transcribes; each change is an opaque diff, with no handhold for generation or verification.
产物变代码只放大可被规格约束的那一半。对齐、间距、token 符合度——可机检,被放大;而"这版有没有灵魂、是否为这群人而做"无法写进类型系统。把设计当纯工程问题,就会优化掉所有可机检的指标,产出一个挑不出错却谁也不想用的界面。代码是杠杆,不是品味的替身。
Artifact-as-code amplifies only the half a spec can bind. Alignment, spacing, and token conformance are machine-checkable and get amplified; whether a version has a soul, whether it is made for these people, cannot be written into a type system. Treat design as a pure engineering problem and you will optimize every machine-checkable metric and end up with an interface that passes review yet no one wants to use. Code is leverage, not a stand-in for taste.
为什么是"产物形态"而不是"工具能力"在做功
Why it is the "artifact form," not "tool capability," that does the work
It is tempting to credit pencil's or Remotion's value to their being "more powerful" or "smarter." That is a misdiagnosis, and it sends people chasing the next flashier tool while missing the real leverage point. What does the work is not any single feature but the artifact turning from binary into text. That change satisfies all the agent-friendly properties at once, so the whole software-collaboration ecosystem can take hold of the artifact. The test is blunt: imagine swapping Figma's export format for a fully readable, semantic text schema while changing nothing else. That one change alone lets an agent read, diff, generate, and machine-check the design, and value is amplified immediately. Conversely, take the smartest AI design tool. As long as it stores its results in an unreadable proprietary binary, the agent still cannot get in and the leverage still does not appear. It is the form that does the work, not the cleverness.
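The "blunt test" can be made tangible. Once design state lives in text, a change is a readable diff rather than an opaque binary blob. The sketch below uses Python's standard `json` and `difflib` and assumes a hypothetical `tokens.json` with nested color and spacing tokens. The schema and values are illustrative only.

```python
import difflib
import json

# Two versions of a hypothetical tokens.json; one accent color changes.
before = {"color": {"accent": "#0F766E", "text": "#111827"}, "space": {"md": 16}}
after = {"color": {"accent": "#B45309", "text": "#111827"}, "space": {"md": 16}}


def as_lines(tokens: dict) -> list[str]:
    """Serialize deterministically (sorted keys, fixed indent) so a diff shows only real changes."""
    return json.dumps(tokens, indent=2, sort_keys=True).splitlines()


def token_diff(old: dict, new: dict) -> list[str]:
    """Unified diff of two token sets: the same view a human reviewer or an agent gets."""
    return list(
        difflib.unified_diff(
            as_lines(old), as_lines(new), "tokens.json (before)", "tokens.json (after)", lineterm=""
        )
    )


def changed_lines(old: dict, new: dict) -> list[str]:
    """Just the +/- lines, without the ---/+++ file headers."""
    return [
        line
        for line in token_diff(old, new)
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


for line in changed_lines(before, after):
    print(line)
```

Sorting keys and fixing the indent is the design choice that matters here. It makes the serialization deterministic, so the diff shows exactly the one token that changed, and a reviewer or an agent can reason about that change directly.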
记症状会过期,懂成因不会——为什么这一节讲机制而非清单
Memorizing symptoms expires; understanding causes does not — why this sheet teaches mechanism, not a list
Someone may ask: since slop's fingerprints can be listed (DSN 05/09 already did), why does this sheet labor over the causes? Because the list expires; the causes do not. Today slop's fingerprints are cyan-on-dark, purple-blue gradients, glassmorphism, but these are only the highest-frequency patterns in the current training distribution; a year or two on, once everyone starts rejecting these and a new high-frequency pattern forms, slop's fingerprints will swap faces — perhaps some new "refined-gray minimalism," some new layout cliché. If you only memorized today's list, by then you will hold an expired map, helpless before a brand-new slop, even mistaking it for non-slop because "it's not on my list." But if you understand the cause — slop = the intersection of high-frequency × safe × easy-to-generate = convergence to the mean — then however the fingerprints change faces, you can recognize it with the same ruler: ask whether this version made a real trade-off for these people, or merely picked the most labor-saving, least-wrong default of the moment. This is why this volume keeps teaching force rather than symptom: symptoms are point-in-time, force is structural; memorizing symptoms lets you handle today, understanding force lets you handle a tomorrow that has not yet arrived.
A corollary worth naming, because it keeps you humble about "anti-slop": the "refined" aesthetic you take pride in today may become tomorrow's new slop. The practices currently regarded as anti-slop — restrained whitespace, monochrome minimalism, serif headlines, asymmetric grids — if, by being celebrated, they become high-frequency, imitated by countless people, the new default of generation models, then they will precisely satisfy the three conditions of "high-frequency × safe × easy-to-generate" and become the next generation's slop. This is not to say these practices are bad in themselves, but that "good" never lives in a specific visual style; it lives in the act of "whether a real trade-off was made for these people." Once a style becomes a thoughtless default applied by rote, it loses that act, however "refined" it once was. The practical implication: anti-slop is not a "good-taste list" you can memorize once and for all, but a judgment act that must be re-performed each time — each time re-asking "for whom, why this way," rather than applying the answer that worked last time. Treating any aesthetic as a permanently correct safe card is itself the start of sliding into slop.
This amplification law has a precise boundary, and stating it clearly matters more than praising it. It acts only on the half of design that "can be bound by a spec" — alignment, spacing, tokens, accessibility, state-completeness. Once caught by the code form, that half gets hugely amplified, nearly free. But the other half — whether this version moves anyone, whether it is made for the people it is meant to serve, whether the pacing is right — cannot be written into any type system or lint rule. Treat design wholesale as a pure engineering problem and a hidden degeneration sets in: you unconsciously optimize only what you can measure, because the measurable gives feedback, runs in CI, shows up on a dashboard. So every machine-checkable metric scores full marks while the artifact is hollow — an interface with no findable flaw that no one wants to look at twice. That is the cost of "code as a stand-in for taste." The correct posture: let the code form fully automate the machine-checkable half, thereby freeing all of the human's attention to guard the half that cannot be machine-checked.
"只优化可测的"是一种隐蔽的退化——它会自动发生,除非你刻意防
"Optimize only the measurable" is a hidden degeneration — it happens automatically unless you guard against it
This degeneration deserves its own treatment, because it is not the result of someone being lazy but a state the system slides into automatically. The measurable has an unfair advantage: it gives instant feedback, enters CI, becomes an upward line on a dashboard, gets cited in a report. The unmeasurable (does this version move anyone) is the opposite: slow feedback, subjective, hard to state, impossible to quantify as progress. When a team faces both kinds of goal at once, rational attention tilts imperceptibly toward the measurable side — not because anyone decided to drop taste, but because the measurable side continuously, cheaply gives positive feedback while the unmeasurable side is forever silent and expensive. Over time the team redefines, without noticing, "good design" as "all measurable metrics pass," which is exactly the definition of hollow slop. Guarding against this degeneration takes deliberate counter-measures: reserve protected time and standing for unmeasurable judgment in the process (for instance, mandate that someone in review answer "metrics aside, does this version move anyone"), and explicitly grant that "all measurable metrics pass" is only a necessary condition, not a sufficient one — it says no rookie mistakes were made, not that something good was made. Writing this into the team agreement is the guardrail against "engineering devouring design."
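One way to write "necessary, not sufficient" into a team agreement is to make the ship gate refuse to open on metrics alone. The sketch below assumes a review record that carries lint results plus a reviewer's written answer to the mandated question. `Review` and `ready_to_ship` are hypothetical names, not a real tool.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    # Machine-checkable half: whatever the lint / CI run reported.
    lint_failures: list[str] = field(default_factory=list)
    # Unmeasurable half, given protected standing: a reviewer's written answer to
    # the (hypothetical) mandated question "metrics aside, does this version move anyone?"
    human_answer: str = ""
    human_approves: bool = False


def ready_to_ship(review: Review) -> bool:
    """All metrics passing is necessary, not sufficient: a stated human judgment is also required."""
    metrics_pass = not review.lint_failures
    judged = bool(review.human_answer.strip()) and review.human_approves
    return metrics_pass and judged


# Green dashboard, but nobody has answered the question: not shippable.
print(ready_to_ship(Review()))  # → False
```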
Turn this boundary right-side up and you reach a reassuring conclusion: the code form does not threaten the designer's core value but protects it. Because it hands every machine-checkable, mechanical, repetitive job — the jobs that used to eat vast designer time yet least needed human judgment — entirely to the machine, thereby freeing the designer's time and attention to invest wholly in the unmeasurable half: understanding users, judging good and bad, holding taste and meaning. In other words, the code form does "take away what people should not be doing," leaving exactly "what only people can do, and most deserve to do." Those who fear "design being devoured by engineering" often have the causality backwards: what gets devoured is not design but the part of design that should have been automated all along; real design — judgment for people — is not devoured but, with the chores cleared away, gets room to fully unfold for the first time. So the right attitude is not to resist the code form but to actively embrace it to clear the ground, then bet all the freed attention on the human boundary the machine can never reach. This is the good news the DSN 06 amplification law ultimately points to: leverage to the machine, meaning to people.
DSN 07
WORKFLOW · AI 设计工作流
THE AI DESIGN WORKFLOW
工件 · 可拷贝
Artifact · Copyable
铺开候选,再收敛——一条可照做的环
Spread candidates, then converge: a loop you can run
"Generate many, then judge" sounds right but does not land. Make it a loop with a cadence: spec → spread → critique → steer → converge → distill. Each step marks what goes to generation versus stays with people, and where context flows. It mirrors the engineering SDD loop on the design surface, only what is verified is taste, not correctness.
How context flows: the spec (①) is the guardrail fed to generation, setting how on- or off-target ② comes out; ③'s "why good / why weak" cannot stay in your head; write it into the next round's prompt and into a system revision. Judgment must flow back, or every round restarts from the mean. A loop you can copy:
②铺开的关键是"方向多样",不是"数量多"
The point of ② spread is "directional diversity," not "high count"
有一个细微但决定成败的区别:铺开 12 个候选,和铺开 12 个方向,是两件完全不同的事。前者常常是同一个想法的 12 次微调——换个色、挪个间距、改个字号——它们挤在审美空间的同一个点附近,给人的判断提供不了任何有意义的对比,只会制造"选择的错觉"。后者是 12 条真正不同的假设:极简文档风 vs 终端美学 vs 杂志排版 vs 卡片流……每一条代表一个对"该怎么为这群人做"的不同回答。判断力只有在面对真正的差异时才有用武之地——你能说出"A 方向的克制更命中这群用户的工程感,B 方向太热闹",这是判断;而在 12 个微调里挑一个,你只是在表达偏好。所以②的指令不该是"多生成几个",而该是"生成几个互不相同的方向",并且在提示里显式要求方向间的差异度。这也解释了 DSN 11 那条反指标——不停"再来一个"却说不出在找什么——的本质:那是在数量轴上空转,而非在方向轴上探索。
There is a subtle but decisive distinction: spreading 12 candidates and spreading 12 directions are entirely different things. The former is often 12 tweaks of the same idea — a different color, a shifted margin, a changed size — crowded near the same point in aesthetic space, offering judgment no meaningful contrast and only manufacturing "the illusion of choice." The latter is 12 genuinely different hypotheses: minimal-docs vs terminal aesthetic vs magazine typography vs card flow… each a different answer to "how should this be made for these people." Judgment has work to do only when facing real difference — being able to say "direction A's restraint hits this audience's engineering sensibility better, direction B is too busy" is judgment; picking one of 12 tweaks is only expressing a preference. So ②'s instruction should not be "generate a few more" but "generate a few mutually distinct directions," with the diversity between directions demanded explicitly in the prompt. This also explains the essence of DSN 11's counter-signal — endless "one more" with no statement of what you seek: that is spinning on the count axis rather than exploring on the direction axis.
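To catch "12 tweaks of one idea" before critique starts, you can compare candidates on coarse direction attributes and ignore fine details. The sketch below assumes each candidate has been tagged, by a person or a model, with a layout, a type class, and a palette idea. Which attributes count as a "direction" is an assumption you would set per project, and the threshold of 3 is arbitrary.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    layout: str      # coarse hypothesis, e.g. "docs", "terminal", "magazine", "card-flow"
    typeface: str    # type class, e.g. "serif", "mono", "grotesk"
    palette: str     # palette idea, e.g. "paper", "mono-dark", "dark-neon"
    accent_hex: str  # fine detail: changing only this is a tweak, not a new direction


def direction(c: Candidate) -> tuple[str, str, str]:
    """Assumed rule: a direction is the coarse hypothesis; the exact accent is ignored."""
    return (c.layout, c.typeface, c.palette)


def distinct_directions(cands: list[Candidate]) -> int:
    return len({direction(c) for c in cands})


def is_real_spread(cands: list[Candidate], min_directions: int = 3) -> bool:
    return distinct_directions(cands) >= min_directions


tweaks = [Candidate("card-flow", "grotesk", "dark-neon", h) for h in ("#22D3EE", "#06B6D4", "#0891B2")]
spread = [
    Candidate("docs", "serif", "paper", "#B45309"),
    Candidate("terminal", "mono", "mono-dark", "#22C55E"),
    Candidate("magazine", "serif", "paper", "#B91C1C"),
]
print(distinct_directions(tweaks), distinct_directions(spread))  # → 1 3
```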
① 规格 SPEC:写下为谁、意图气质、"完成"判据、反 slop 红线(见 DSN 08)。喂给生成前先定,别边生成边想。
① SPEC: write for-whom, intent/character, "done" criteria, anti-slop red lines (see DSN 08). Set it before generating, not while generating.
② 铺开 SPREAD:一次出多个方向(不是一个的微调)。要的是方向多样性,不是数量。
② SPREAD: produce several directions at once (not tweaks of one). You want directional diversity, not count.
③ 评判 CRITIQUE:挑、并说清判据命中/落空在哪。说不出"为什么",就还没在判断。
③ CRITIQUE: pick, and name where it hits or misses the criteria. If you cannot say "why," you are not yet judging.
④ 导向 STEER:把评判变成下一轮的具体指令,只在选中方向上深化。
④ STEER: turn the critique into the next round's concrete instructions; deepen only the chosen direction.
⑤ 收敛 CONVERGE:定一稿,跑可机检的护栏(token / 对齐 / 可访问性)。
⑤ CONVERGE: settle one, run the machine-checkable guardrails (token / alignment / accessibility).
⑥ 沉淀 DISTILL:把这轮新学到的判据回流进设计系统与下次的规格——让护栏越用越准。
⑥ DISTILL: feed the round's new criteria back into the design system and the next spec; guardrails sharpen with use.
这条环必须闭合——判断不回流,每轮都从均值重启
The loop must close — without feedback, every round restarts from the mean
Many teams build this as an open loop: spec → spread → pick one → ship, and next time start from blank again. The fatal flaw of the open loop is that the precious judgment of step ③ — "why this version is good, where that one fails, what I am actually looking for" — all stays in the designer's head and never becomes anything reusable. So generation restarts every round from the mean of the training distribution, and the guardrails stay frozen at day-one quality. This is exactly why the ⑥ distill step is load-bearing: it externalizes the judgment a human made this round into the next round's spec revisions and system updates, letting the guardrails grow alongside the judgment. The difference between a closed and an open loop is not one extra process step but whether judgment compounds — in the open loop each judgment is used once and discarded; in the closed loop each one raises the floor. This is the same line as the engineering volume's "context becomes queryable infrastructure": judgment becomes a team asset, not a one-off spark, only when it is written down and flows back into the system.
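The difference between an open and a closed loop is whether step ⑥ writes anything back. The sketch below assumes the spec can be modeled as a target audience plus a growing tuple of learned criteria. `Spec` and `run_round` are hypothetical names, and the criterion strings are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    for_whom: str
    criteria: tuple[str, ...] = ()  # learned criteria; grows only if the loop closes


def run_round(spec: Spec, critique_whys: list[str], closed: bool) -> Spec:
    """One pass of spec → spread → critique → steer → converge → distill.

    `critique_whys` stands in for the reasons stated at step ③. In an open loop they
    are used once and dropped; in a closed loop step ⑥ writes new ones back into the spec.
    """
    if not closed:
        return spec  # judgment stays in someone's head; the next round starts from the same floor
    # Deduplicate while keeping order, and skip criteria the spec already holds.
    new = [why for why in dict.fromkeys(critique_whys) if why not in spec.criteria]
    return Spec(spec.for_whom, spec.criteria + tuple(new))


rounds = [
    ["a real code snippet beats abstract illustration"],
    ["above-the-fold density ceiling", "a real code snippet beats abstract illustration"],
]
open_spec = closed_spec = Spec("solo developers")
for whys in rounds:
    open_spec = run_round(open_spec, whys, closed=False)
    closed_spec = run_round(closed_spec, whys, closed=True)
print(len(open_spec.criteria), len(closed_spec.criteria))  # → 0 2
```

Run on the same two rounds of critique, the open loop ends with no criteria while the closed loop ends with two. That is the "each judgment raises the floor" claim in miniature.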
核心图 · KEY FIG · FIG. D5.0 / THE AI DESIGN LOOP · 品味是验证器
看懂:comp→变体→评判→收敛,闭环的关键是判断回流
Read: comp → variants → critique → converge; what closes the loop is judgment flowing back.
The outer ring is the workflow's four stations, flowing clockwise. What turns it from "an open loop piling up slop" into "a closed loop growing taste" is the verifier at the center (human taste) and the red feedback arrow on the left — it writes the criteria surfaced in ③ critique back into ① spec. Without that feedback the loop is just a faster slop press; with it, the guardrails sharpen each turn. Isomorphic to the engineering volume's SDD loop, only the verifier is swapped from "correctness" to "taste."
这条环和工程的 SDD 环为什么是同一条——以及关键的那一处不同
Why this loop and engineering's SDD loop are the same — and the one crucial difference
The engineering volume described an SDD (spec-driven development) loop: humans write the spec → agents generate the implementation → humans verify → if wrong, go back, fix the spec, regenerate. Place it beside the design loop here and the structure is identical: both are closed loops of "humans set the standard → machine generates → humans verify → judgment flows back to fix the standard," both relying on the non-outsourceable verifier at the center to constrain generation within intent. This identity is no coincidence but follows from both being implementations of the one kernel on different faces — after execution becomes abundant, people retreat to the two judgment nodes of "set the standard + verify." But there is one crucial difference that must be made clear, or design gets mistaken for engineering: the verifier in the engineering loop checks correctness, which in principle can be approached by automated tests — an implementation either passes the tests or not, a relatively clear boundary; the verifier in the design loop checks taste, which in principle cannot be fully automated, because "on-target for these people or not" has no ground truth pinnable by a test case. This difference dictates: in engineering the share of human verification falls as test coverage rises, while in design the share of human verification has a floor that never reaches zero — and that floor is precisely taste, precisely the standing population of ④. Hold this firmly and you will not make the error of "since tests can be automated, can design acceptance be fully automated too."
On this loop, one last reminder about a common rhythm mistake: silently compressing six steps into three. Under pressure, people easily compress "① spec → ② spread → ③ critique → ④ steer → ⑤ converge → ⑥ distill" into "throw out a prompt → pick one → ship," skipping the spec, skipping the statement of criteria, and skipping the feedback of judgment. Each skipped step corresponds to a failure already discussed. Skip ① spec and generation starts from the mean (it slides to slop). Skip ③'s "say why" and critique degenerates into voting by feel (nothing to steer by). Skip ⑥ distill and judgment does not compound (every round starts from zero). The value of this loop is precisely that it forces you not to skip: the six steps are not process red tape but six individually load-bearing checkpoints, each marking a place where human judgment must explicitly happen. So the biggest benefit of running this loop is not looking professional by following process. It is that the structure forces you to actually judge wherever a judgment is due, and that is the day-to-day handle for the migration from "a faster hand" to "more accurate judgment."
有效 / 失效的信号 · Right / wrong signals
先行指标:每轮铺开的方向真有差异(非一版的微调);评判时能逐条指名判据;判断回流后下一轮命中率上升。反指标:不停"再生成一个"却说不出在找什么——那是用生成代替判断,环空转,只会更快地堆出同质化候选。
Leading signals: each spread holds genuinely distinct directions (not tweaks of one); you can name, per candidate, the criteria it hits; the hit rate rises in the next round after judgment flows back. Counter-signal: endless "generate another" with no statement of what you are looking for. That substitutes generation for judgment: the loop spins, and you only pile up sameness faster.
The loop and the spec above gave the principle; here we walk it through one concrete brief — a landing page for a tool aimed at solo developers — turning "spec → spread → critique → steer → converge → distill" into actions you can check against, and making "a good spec's discriminating power" and "judgment flowing back" visible. This is the method testing itself: if a method cannot even run a most-common brief smoothly, it does not deserve belief.
① 规格。先写下不可外包给生成的人类判断:FOR-WHOM=独立开发者,在焦虑找工具、注意力极短的处境里,要在 30 秒内判断"这是否为我";CHARACTER=克制、工程感、可信,明确不要欢快插画、不要科技蓝渐变;DONE-WHEN=目标用户一眼认出自己、陌生人问"怎么做的"而非"哪个 AI 做的";HARD-RULES=只用系统 token、对比度≥4.5:1、无渐变文字、字体白名单排除 Inter/Roboto。② 铺开。把规格喂给生成,一次出 8 个方向(不是一个的微调):有的走极简文档风、有的走终端/代码美学、有的走杂志排版、有的仍滑回了默认的玻璃拟态。注意:哪怕给了红线,总有几版会漏过——这正说明硬约束需要被写成 lint 在收敛时强制跑,而不能只靠提示里写一句。
① Spec. First write the human judgments that cannot be outsourced to generation: FOR-WHOM = solo developers, anxious and tool-hunting with very short attention, who must judge "is this for me" within 30 seconds; CHARACTER = restrained, engineering-grade, trustworthy, explicitly no cheerful illustration, no tech-blue gradient; DONE-WHEN = the target user recognizes themselves at a glance, a stranger asks "how was this made" not "which AI made it"; HARD-RULES = system tokens only, contrast ≥ 4.5:1, no gradient text, typeface whitelist excluding Inter/Roboto. ② Spread. Feed the spec to generation and produce 8 directions at once (not tweaks of one): some go minimal-docs, some terminal/code aesthetic, some magazine typography, some still slide back to the default glassmorphism. Note: even with red lines given, a few versions slip through — which is exactly why hard constraints must be written as lint and forced to run at convergence, not left to one line in the prompt.
③ 评判。关键不是"挑出最好看的",而是逐条对规格说清命中/落空:终端美学那版命中了 CHARACTER 的"工程感"和 FOR-WHOM 的"一眼认出自己",但首屏信息密度过高,违背了"30 秒内判断";杂志排版那版气质对、可信感强,但太重、加载慢。说不出这些"为什么",就还没在判断、只是在挑。④ 导向。把评判变成下一轮的具体指令:"在终端美学方向上深化,但首屏只留一句价值主张 + 一个真实代码片段,密度降一半。"——只在选中方向上再生成,不重开八个。⑤ 收敛。定一版,跑 HARD-RULES 的 lint:对比度过、token 过、渐变文字零、字体过。⑥ 沉淀。把这轮新学到的判据——"独立开发者落地页首屏密度上限""真实代码片段比抽象插画更命中可信感"——写回设计系统的规格库,下一个类似需求的起点就抬高了。这一步是闭环与开环的唯一差别。
③ Critique. The key is not "pick the prettiest" but to state, item by item against the spec, what hits and what misses: the terminal-aesthetic version hits CHARACTER's "engineering-grade" and FOR-WHOM's "recognize yourself at a glance," but its above-the-fold information density is too high, violating "judge within 30 seconds"; the magazine-typography version has the right character and strong trust, but is too heavy and loads slowly. Unable to say these "whys," you are not judging yet, only picking. ④ Steer. Turn the critique into the next round's concrete instruction: "deepen the terminal-aesthetic direction, but above the fold keep only one value proposition + one real code snippet, halve the density" — regenerate only on the chosen direction, do not reopen eight. ⑤ Converge. Settle one version, run the HARD-RULES lint: contrast passes, tokens pass, zero gradient text, typeface passes. ⑥ Distill. Write the round's new criteria — "above-the-fold density ceiling for solo-developer landing pages," "a real code snippet hits trust better than abstract illustration" — back into the design system's spec library, and the starting point for the next similar brief is raised. This step is the only difference between a closed and an open loop.
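Step ⑤'s lint is the part of this walk a machine can own outright. The sketch below covers two HARD-RULES, system tokens only and contrast ≥ 4.5:1, using the WCAG 2.x relative-luminance and contrast-ratio formulas. The `(foreground, background)` input shape and the token set are assumptions. In practice you would likely use an existing accessibility checker instead.

```python
def _linear(c8: int) -> float:
    """sRGB 8-bit channel → linear value, per the WCAG 2.x relative-luminance definition."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i : i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1.0 up to 21.0."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def hard_rules_lint(text_pairs: list[tuple[str, str]], tokens: set[str]) -> list[str]:
    """Check (foreground, background) text pairs against two HARD-RULES:
    system tokens only, and contrast >= 4.5:1."""
    failures = []
    for fg, bg in text_pairs:
        for color in (fg, bg):
            if color.upper() not in tokens:
                failures.append(f"off-system color {color}")
        ratio = contrast(fg, bg)
        if ratio < 4.5:
            failures.append(f"contrast {ratio:.2f}:1 below 4.5:1 ({fg} on {bg})")
    return failures


TOKENS = {"#111827", "#FFFFFF", "#767676"}  # hypothetical system palette (uppercase hex)
print(hard_rules_lint([("#111827", "#FFFFFF"), ("#777777", "#FFFFFF")], TOKENS))
```

A well-known borderline case shows why this belongs in CI rather than to the eye. `#777777` on white comes out just under 4.5:1, while the nearly identical `#767676` passes.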
为什么这一遍能推广——它没用任何落地页特有的东西
Why this walk generalizes — it used nothing specific to landing pages
This case uses a landing page, but note: among the six steps above, not one depends on the fact that it is a landing page. Swap the object for a mobile app's onboarding flow, a data report, an icon system, even a product video, and the six-step structure holds intact — only the content of FOR-WHOM, the specific thresholds of HARD-RULES, and the criteria checked against in ③ change, while the skeleton "write a discriminating spec → spread genuinely different directions → state criteria hits item by item → deepen only the chosen direction → converge and run hard constraints → feed judgment back" stays fixed. This is exactly the touchstone for whether a method has caught the underlying structure: whether its steps can rerun on a different object without changing the structure. If yes, it caught the force; if it holds only for one kind of object, it caught the appearance. This landing-page case sits here not because landing pages are especially important but because they are most common and easiest for a reader to check against their own work — you could right now swap it for whatever you are making, walk the six steps, and see where you get stuck. The step you get stuck on is often the capability you most need to build next.
One detail in this walk deserves to be pulled out on its own, because it is the easiest to skip yet the most decisive: the act of "saying why" in ③ critique is not an ornament of judgment but its very substance. Many assume critique is "pick the best one" and you are done; but if you cannot say "why it is on-target, on which criterion each reject falls short," what you did is not critique but voting by feel. This distinction has real consequences: with no reason stated, ④ steer has no handle (you do not know which concrete direction to deepen), and ⑥ distill is out of the question (you have no criteria to feed back). So the seemingly redundant verbal act of "saying why" is in fact the key step that converts a one-off intuitive judgment into a steer-able, feed-back-able, reusable structured judgment. A simple discipline: after picking a candidate, force yourself to write three sentences — which criterion this version hits, where the main reject falls short, which direction to deepen next round. If you cannot write these three, you are still at "picking," not yet at "judging." This one discipline is nearly the switch for whether the whole loop actually closes.
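The three-sentence discipline can be enforced structurally. The critique record below cannot feed step ④ unless all three sentences are present. This is a minimal sketch; `Critique` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    pick: str
    criterion_hit: str    # sentence 1: which criterion this version hits
    main_reject_gap: str  # sentence 2: where the main reject falls short
    next_direction: str   # sentence 3: which direction to deepen next round

    def is_judgment(self) -> bool:
        """Without all three sentences this is still 'picking', not 'judging'."""
        return all(s.strip() for s in (self.criterion_hit, self.main_reject_gap, self.next_direction))

    def steer_instruction(self) -> str:
        """Step ④ needs the reasons; a bare vote gives it no handle."""
        if not self.is_judgment():
            raise ValueError("cannot steer: the critique does not say why")
        return (
            f"Deepen '{self.pick}': keep {self.criterion_hit}; "
            f"avoid {self.main_reject_gap}; next, {self.next_direction}."
        )


judged = Critique(
    "terminal",
    "the engineering-grade character",
    "the magazine version's weight and slow load",
    "one value proposition plus one real code snippet above the fold",
)
vote = Critique("terminal", "", "", "")
print(judged.steer_instruction())
print(vote.is_judgment())  # → False
```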
这一遍验证了什么 · What this walk verifies
注意全程:生成做了所有"做出来"的活(铺 8 个方向、深化、补全),人只做了三件机器做不了的事——把规格写得有判别力、逐条说清判据命中、把判断回流进系统。产出量不是这一遍的功劳,命中率才是:第二轮就收敛,是因为第一轮的判断没有白费。这就是 DSN 03→07 在一个真实需求上的合一。
Note throughout: generation did all the "making" (spreading 8 directions, deepening, filling in), and the human did only the three things machines cannot: write a discriminating spec, state criteria hits item by item, and feed judgment back into the system. Output volume is not the win of this walk; hit rate is. It converged on the second round because the first round's judgment was not wasted. This is DSN 03→07 brought together on a real brief.
DSN 08
SPEC · 何为好的规格
THE SPEC OF GOOD
工件 · 模板
Artifact · Template
把"好"写成可生成、半可机检的规格
Write "good" as a spec that is generatable and half machine-checkable
DSN 04 set out what a spec must hold; here is a copyable sample, split into two layers. The first is hard constraints that can be machine-checked (token / alignment / contrast / hit-target, writable as lint rules, folded into ① abundance). The second is constitutive soft criteria (for-whom, character, what counts as done, judged only by people, kept at ④). A good spec makes generation converge toward good without pretending taste is computable.
Why split the layers: forcing soft criteria into lint yields pixel-perfect, soulless slop; leaving hard constraints to the human eye burns attention on what a machine should own. Split, the machine holds "stay on-brand" while the human guards "is it for people." A copyable spec skeleton (drop it into a repo's design-spec.md to feed generation):
FOR-WHOM (soft · human): who, in what situation, to do what, to feel what. e.g. for solo devs, anxious tool-hunting, to judge "is this for me" within 30 seconds.
CHARACTER (soft · human): the tone it carries + an explicit "not." e.g. restrained, engineering-grade; not playful, no tech-blue gradient.
DONE-WHEN(软·人):一组验收信号。例:陌生人问"怎么做的"而非"哪个 AI 做的";目标用户一眼认出自己。
DONE-WHEN (soft · human): a set of acceptance signals. e.g. a stranger asks "how was this made," not "which AI"; the target user recognizes themselves at a glance.
HARD-RULES (hard · machine-checkable): system tokens only; contrast ≥ 4.5:1; hit-target ≥ 44px; no off-system sizes/spacing; zero hits on the anti-slop red lines. This layer becomes lint, run in CI.
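The HARD-RULES layer is the only part of this skeleton meant to become code. The sketch below implements two of its rules, hit-target ≥ 44px and no off-system spacing, over hypothetical element records. The spacing scale and record shape are assumptions, and the soft items deliberately get no checker.

```python
# Hypothetical values; a real system would read these from its token source.
SPACING_SCALE = {4, 8, 12, 16, 24, 32, 48}  # allowed spacing tokens, px
MIN_HIT_TARGET = 44                          # px, from HARD-RULES


def check_element(el: dict) -> list[str]:
    """Lint one element record, e.g. {"id": "cta", "interactive": True, "w": 120, "h": 40, "margin": 14}."""
    problems = []
    if el.get("interactive"):
        smallest = min(el["w"], el["h"])
        if smallest < MIN_HIT_TARGET:
            problems.append(f"{el['id']}: hit-target {smallest}px < {MIN_HIT_TARGET}px")
    if "margin" in el and el["margin"] not in SPACING_SCALE:
        problems.append(f"{el['id']}: off-system spacing {el['margin']}px")
    return problems


def lint_hard_layer(elements: list[dict]) -> list[str]:
    """Only the hard layer runs here; FOR-WHOM, CHARACTER and DONE-WHEN stay with people."""
    return [p for el in elements for p in check_element(el)]


page = [
    {"id": "cta", "interactive": True, "w": 120, "h": 40, "margin": 14},
    {"id": "logo", "w": 32, "h": 32, "margin": 16},
]
for problem in lint_hard_layer(page):
    print(problem)
```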
分诊判据 · Triage test
某条约束该进硬层还是软层?问一句:"无需理解用户,仅看产物文本就能判定对错吗?"能 → 硬层(lint / CI,并入①);不能 → 软层(留给人判,留在④)。把这条问句对每条规格走一遍,就得到一份不假装品味可计算的规格。
Hard layer or soft? Ask: "can this be ruled right or wrong from the artifact text alone, without understanding the user?" Yes → hard (lint / CI, folded into ①); no → soft (judged by people, kept at ④). Run every constraint through this question and you get a spec that does not pretend taste is computable.
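The triage question is asked by a person; code can only route the answer. The sketch below assumes each constraint carries a human-supplied yes/no to that question. `Constraint` and `triage` are hypothetical names.

```python
from typing import NamedTuple


class Constraint(NamedTuple):
    text: str
    # A person's answer to the triage question: "can this be ruled right or wrong
    # from the artifact text alone, without understanding the user?"
    decidable_from_artifact: bool


def triage(constraints: list[Constraint]) -> dict[str, list[str]]:
    """Route each constraint to the layer that should judge it; the code never guesses the answer."""
    return {
        "lint": [c.text for c in constraints if c.decidable_from_artifact],        # hard layer, CI (①)
        "review": [c.text for c in constraints if not c.decidable_from_artifact],  # soft layer, people (④)
    }


spec = [
    Constraint("contrast >= 4.5:1", True),
    Constraint("system tokens only", True),
    Constraint("target user recognizes themselves at a glance", False),
    Constraint("restrained, engineering-grade character", False),
]
print(triage(spec))
```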
一份好规格的标志:它能让别人(或 agent)替你做出"你会认的"东西
The mark of a good spec: it lets someone else (or an agent) make something "you would sign off on"
How do you tell whether a design spec is good enough? There is a blunt operational test: hand it to a person or agent who has never talked with you and is not inside your head, have them generate from it; when the artifact comes back, do you sign off? If yes, the spec really externalized the ruler in your head; if no, the criteria are still hiding in your intuition, unwritten — and a standard that lives in intuition is one generation can neither learn nor be judged against. This test is useful because it turns "spec quality" from a subjective feeling into a repeatable experiment: run the same spec past three different people/agents, and if their outputs are close to each other and all inside your acceptance band, the spec has converged; if outputs scatter, there is still a lot of blank in the spec that each filled with its own mean. A good spec is not exhaustive but discriminating — it tells "what you want" apart from "what looks about right but is wrong."
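The "three people or agents" experiment can be scored once each output has been described by a set of features. The sketch below assumes hand-tagged feature sets and uses Jaccard overlap as the closeness measure. Both the feature vocabulary and the 0.6 threshold are illustrative assumptions, and the acceptance verdicts still come from you.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two feature sets: 1.0 identical, 0.0 disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def spec_converged(outputs: list[set[str]], accepted: list[bool], min_similarity: float = 0.6) -> bool:
    """Assumed test: independent outputs from one spec are all inside the author's
    acceptance band AND close to one another (every pair at or above the threshold)."""
    if not all(accepted):
        return False
    return all(jaccard(a, b) >= min_similarity for a, b in combinations(outputs, 2))


# Hand-tagged features of what three agents produced from the same spec (hypothetical).
tight = [
    {"terminal", "mono", "dark", "code-snippet"},
    {"terminal", "mono", "dark", "code-snippet", "grid"},
    {"terminal", "mono", "dark", "code-snippet"},
]
scattered = [{"terminal", "mono"}, {"glass", "gradient"}, {"magazine", "serif"}]
print(spec_converged(tight, [True, True, True]))      # → True
print(spec_converged(scattered, [True, True, True]))  # → False
```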
This and the Verification chapter's "humans define what's right" are two faces of the same operation. Verification writes the criteria of correctness (pinnable by test cases); design writes the criteria of good (half machine-checkable, half acceptable only by people). Both share a deep structure: once execution becomes abundant, the criteria themselves become the human artifact that cannot be outsourced. A model can generate endless candidates, but the standard by which to converge must be supplied by a human; supply none and generation converges on the mean by default. So writing the spec is not process-documentation busywork but one of the designer's most central moves in the AI-Native era. It distills the scarce thing, judgment, into an asset that can be reused, fed back, and fed to generation.
规格的两个反面:写太死,和写太空
A spec's two failure modes: over-pinned, and over-empty
Writing a spec has two symmetric failures, and understanding them helps you find the narrow path between them. Over-pinned: nailing down every pixel, color value, and margin is essentially redrawing the comp in words. It does not use generation's exploratory power (you have already fixed the answer, leaving generation only to replicate it), and it leaves the ② spread no degrees of freedom for directional exploration. More insidiously, over-pinning often comes from mistaking soft criteria for hard constraints: you think you are writing a spec, but you are writing a stubborn personal preference that blocks every potentially better direction. Over-empty: writing only "make a modern, professional, usable interface" says nothing. These words hold for every project and therefore have no discriminating power for this one, so generation can only hand you back the mean; the spec is correct nonsense. The narrow path is to write hard constraints to machine-checkable precision (the more fixed this side the better, since it should be automated anyway) and to write soft criteria that discriminate without fixing concrete form. Spell out for whom, what character, what counts as done, and what to avoid, but leave what it concretely looks like to ② to explore and ③ to judge. In a phrase: a spec locks the target and leaves the path open.
最有判别力的一项,往往是"明确不要什么"
The most discriminating item is often "explicitly what not to be"
Among all spec items, one has badly underrated discriminating power: explicitly writing down "what not to be." A positive description of "what to be" easily becomes correct nonsense — "be modern, be professional, be usable" holds for any project and is therefore almost useless for convergence. But the negative "what not to be" carries discriminating power by nature, because it aims directly at the mean generation is most likely to slide toward. Writing "no tech-blue gradient, no cheerful illustration, no glassmorphism, do not make it yet another SaaS dashboard" blocks, before generation even sets out, the widest road to slop. The mechanism behind this: generation's default is the mean, and "what not to be" is precisely describing the shape of the mean — the clearer you are about what the default you want to avoid looks like, the more your spec can push generation off it. So in a good spec, the "not" list is often more informative than the "to be" list and saves more downstream judgment cost. This connects to DSN 09's anti-slop red lines: that red-line table is essentially a general, machine-checkable "what not to be" — fold it into every project's spec and you get, for free, a guardrail against the most common slop.
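Folding the shared red lines into each project's "not" list amounts to an ordered union. The project's own exclusions go first because they discriminate most for this project. The following is a minimal sketch. `SHARED_RED_LINES` is a hypothetical stand-in for the DSN 09 table, not its actual contents.

```python
# Hypothetical shared anti-slop red lines: a general "what not to be".
SHARED_RED_LINES = [
    "gradient text",
    "glassmorphism on non-floating surfaces",
    "equal-size card grid regardless of content weight",
]


def build_not_list(project_nots: list[str], shared: list[str] = SHARED_RED_LINES) -> list[str]:
    """Project-specific exclusions first, then shared red lines, without duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for item in project_nots + shared:
        key = item.strip().lower()  # case-insensitive de-duplication
        if key not in seen:
            seen.add(key)
            merged.append(item.strip())
    return merged


project = ["tech-blue gradient", "cheerful illustration", "Gradient text",
           "yet another SaaS dashboard"]
for line in build_not_list(project):
    print("NOT:", line)
```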
One common misunderstanding of "half machine-checkable" also needs clearing up. "Half" does not mean that half the criteria are ambiguous or unstatable. It means that once the criteria are classified, one class goes to the machine and the other must stay with people, and each class is clear in its own way. The hard-constraint half (contrast, tokens, hit targets) is clear enough to write as assertions, run in CI, and return a definite pass or fail. The soft-criteria half (for whom, character, whether it moves anyone) can be just as clear. It can state signals concrete enough for a human to accept against, such as "the target user recognizes themselves at a glance" or "a stranger asks how it was made, not which AI." Only its acceptor is a human rather than a machine. So "half machine-checkable" is an honest precision. It neither pretends taste can be decided algorithmically (forcing the soft into the hard) nor pushes the machine-checkable part onto human eyes (wasting judgment). The sophistication of a good spec lies in knowing which half each item belongs to and handing it to the right adjudicator. That clear-headedness about what to give the machine and what must stay with people is this methodology made concrete in the spec.
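The split can be made literal. The following sketch runs the machine half and returns the human half as an acceptance checklist. The hard item uses the WCAG 2.x relative-luminance and contrast-ratio definitions with the 4.5:1 AA threshold for body text. The `SpecItem` structure and the sample items are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def _channel(c8: int) -> float:
    """Linearize one 8-bit sRGB channel, per the WCAG 2.x relative-luminance definition."""
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


@dataclass
class SpecItem:
    text: str
    check: Optional[Callable[[], bool]] = None  # None means a human is the acceptor


def route(items: list[SpecItem]) -> tuple[dict[str, bool], list[str]]:
    """Run the machine half; return the human half as an acceptance checklist."""
    machine = {i.text: i.check() for i in items if i.check is not None}
    human = [i.text for i in items if i.check is None]
    return machine, human


spec = [
    SpecItem("body text contrast >= 4.5:1", lambda: contrast_ratio("#595959", "#ffffff") >= 4.5),
    SpecItem("the target user recognizes themselves at a glance"),
    SpecItem("a stranger asks how it was made, not which AI"),
]
results, checklist = route(spec)
print(results)    # machine verdicts, suitable for CI
print(checklist)  # handed to a human reviewer
```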
DSN 09
ANTI-SLOP · 异质审美的守护
ANTI-SLOP & HETEROGENEITY
机理 · 失效
Mechanism · Failure
slop 是同质化,解药是"只对这群人成立"
Slop is homogenization; the cure is "true only for these people"
The slop mechanism: generation defaults to the mean of its training distribution, and the mean is "what everyone has seen most," so slop is homogenization, not poor craft. Its fingerprints are enumerable (table below). And escaping it is not about "more polish" but about pinning the aesthetic to a specific group of people: good design is often true only for some group; being "fine" for everyone is exactly the sign of sliding back to the mean.
Force analysis: to minimize expected loss, a model leans toward the highest-frequency visual patterns: cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards. They are slop not because they are ugly but because they are everywhere and for no one. The table below turns them into machine-checkable red-line items (feeding DSN 08's HARD-RULES), and each hit deducts points:
指纹 · 配色
FINGERPRINT · COLOR
青配深底 / 霓虹 / 紫蓝渐变
Cyan-on-dark / neon / purple-blue gradient
修法:从品牌或主题取一个真实的、有来由的主色,限定调色板,删掉渐变文字。
Fix: take one real, motivated primary from brand or subject; constrain the palette; kill gradient text.
指纹 · 材质
FINGERPRINT · MATERIAL
玻璃拟态 / 处处大圆角 + 柔投影
Glassmorphism / rounded-everything + soft shadow
修法:让材质服务层级而非装饰;多数表面用实色与硬边界,模糊只留给真正悬浮的层。
Fix: let material serve hierarchy, not decoration; flat surfaces and hard borders for most, blur only for truly floating layers.
指纹 · 排版
FINGERPRINT · TYPE
Inter/Roboto · 万物居中
Inter/Roboto · centering everything
修法:选一款有性格的字(含对比的衬线/特征字形);建立左对齐为主的真实排版网格。
Fix: pick a typeface with character (a contrasting serif / distinctive forms); build a real left-aligned typographic grid.
指纹 · 布局
FINGERPRINT · LAYOUT
等大卡片网格 · 巨数字仪表盘模板
Equal-card grid · big-number dashboard template
修法:按内容权重定尺寸差异;用真实层级与节奏,而非把一切塞进等大盒。
Fix: size by content weight; use real hierarchy and rhythm instead of stuffing all into equal boxes.
指纹 · 图标
FINGERPRINT · ICON
每个标题上方一个大圆角图标
A big rounded icon above every heading
修法:图标只在帮助识别时用;多数标题靠文字与排版承担,不靠装饰图占位。
Fix: icons only where they aid recognition; let most headings carry on words and type, not decorative placeholders.
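The table can be read as a red-line scorer. Each detected fingerprint deducts points, and three or more hits count as a failure, which is the line the heterogeneity signals later in this section draw. The following is a minimal sketch. The per-fingerprint weights are hypothetical, and detecting a fingerprint in a real design is assumed to happen elsewhere.

```python
# Hypothetical per-fingerprint deductions; the weights are illustrative only.
DEDUCTIONS = {
    "color": 20,     # cyan-on-dark / neon / purple-blue gradient
    "material": 15,  # glassmorphism, rounded-everything + soft shadow
    "type": 15,      # Inter/Roboto, centering everything
    "layout": 20,    # equal-card grid, big-number dashboard template
    "icon": 10,      # big rounded icon above every heading
}
FAIL_AT_HITS = 3  # three or more fingerprints: treat as slop


def score(hits: set[str], start: int = 100) -> tuple[int, bool]:
    """Return (remaining score, is_slop) for a set of detected fingerprint names."""
    unknown = hits - set(DEDUCTIONS)
    if unknown:
        raise ValueError(f"unknown fingerprints: {sorted(unknown)}")
    remaining = start - sum(DEDUCTIONS[h] for h in hits)
    return remaining, len(hits) >= FAIL_AT_HITS


print(score({"color", "layout"}))
print(score({"color", "material", "type"}))
```

A clean score only means no red line was hit. It says nothing about whether the design is for these people, which is the half that stays with a human.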
Reading slop as "shoddy" points to the wrong cure — the belief that more polish fixes it. The truth is the opposite: slop is often finely made, with perfect alignment, harmonious color, smooth motion, every machine-checkable metric at full marks. Its problem is not on the quality axis but the distinctiveness axis: it has converged on the shape everyone has seen most. That is why "polish it once more" cannot cure slop — you are pushing along the wrong axis. To minimize expected loss, a model naturally leans toward the highest-frequency visual patterns in its training distribution; high-frequency means "everyone does this," and "everyone does this" means special to no one. So the opposite of slop is not "more refined" but "more specific": specific enough to be true only for some group, some brand, some situation. A design that leaves outsiders cold yet moves the target user is not flawed; it has found its boundary.
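The idea that expected loss pulls toward the center has a precise textbook form. It is offered here as an idealized lens, not as a model of how any particular generator is trained. For a single point prediction $c$ of a random outcome $X$, squared loss is minimized by the mean and 0-1 loss by the mode:

```latex
\begin{aligned}
\arg\min_{c}\ \mathbb{E}\big[(X-c)^2\big] &= \mathbb{E}[X]
  &&\text{since } \tfrac{d}{dc}\,\mathbb{E}\big[(X-c)^2\big] = -2\big(\mathbb{E}[X]-c\big) = 0,\\
\arg\min_{c}\ \mathbb{E}\big[\mathbf{1}[X\neq c]\big] &= \arg\max_{c}\ P(X=c)
  &&\text{since } \mathbb{E}\big[\mathbf{1}[X\neq c]\big] = 1 - P(X=c).
\end{aligned}
```

In both cases, with no other term in the objective, the optimum sits at the center of the distribution. In this lens, a spec's constraints act as the extra terms that move it off center.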
Guarding heterogeneity is therefore a structural design discipline, not a style preference. Once generation makes "reaching average quality" free, the whole industry's default output slides toward the mean together — this is not any one team's laziness but the gravity of generation economics. Resisting it takes an active, costly choice: stating "this is true only for these people," and accepting that "it is not true for those people" is the necessary cost of that choice, not a mistake. This is precisely the sharpest expression, on the design surface, of the org volume's human through-line — being for people means being for specific people, not for the statistical "everyone." The figure below plots this gravity against the force that resists it.
FIG. D6.0 / HETEROGENEITY GUARD · 同质化滑向均值 vs 异质守护 · homogenization sliding to the mean vs. the heterogeneity guard
看懂:生成引力把审美拖向均值,异质守护把它钉在"这群人"
Read: generation gravity drags aesthetics to the mean; the guard pins them to "these people"
Each circle is a design with an identity. Generation gravity (gray dashed) pulls them all toward the mean in the middle — that is the homogenization failure mode: unresisted, every design slides into slop. The only counter (red arrow) is to actively pin yourself to one group of people and accept that "cold to other groups" is the cost of that choice, not a mistake. Guarding heterogeneity is guarding "for specific people."
异质守护 · 信号
Heterogeneity · signals
有效:目标用户说"这是为我做的",圈外人无感——这正是好的边界,不是缺陷;陌生人问"怎么做的"而非"哪个 AI 做的"。失效:所有人都说"还行/挺专业",没人有强反应;命中上表 ≥3 条指纹。承重一句:对所有人都成立的审美,等于对均值收敛,也就失去具体对象。守住异质,就是守住"这群人"。
Right: the target user says "this was made for me" while outsiders feel nothing (a good boundary, not a flaw); a stranger asks "how was this made," not "which AI made this." Wrong: everyone says "fine" or "professional" and no one reacts strongly; three or more fingerprints from the table above are hit. Load-bearing: an aesthetic true for everyone converges to the mean and loses its specific audience. Guarding heterogeneity is guarding "these people."
DSN 09·5
FAILURE · slop 指纹的成因与具体判据
THE SLOP FINGERPRINT
机理 · 失效解剖
Mechanism · Autopsy
为什么 slop 长得都一样——指纹的成因
Why slop all looks the same — the anatomy of the fingerprint
Cyan-on-dark, purple-blue gradients, glassmorphism, centered Inter, big-number dashboards — this is not coincidence but the same generation mechanism leaving the same set of fingerprints on different interfaces. Understanding the cause (not just memorizing symptoms) lets you recognize even the new clichés you have not seen yet. This sheet takes the fingerprint down to the mechanism level: every fingerprint is the intersection of three things — high-frequency × safe × easy-to-generate.
成因一:高频。模型偏向训练分布里出现最多的视觉模式。过去几年的 dribbble/产品落地页/dashboard 模板里,深色玻璃拟态加霓虹渐变铺天盖地,于是它成了"设计应有的样子"的统计代表。成因二:安全。这些模式在评审里几乎不会被否——它们"看起来专业",没人会因为用了 Inter 而被批评。安全意味着低风险,低风险的东西最容易成为默认。成因三:易生成。渐变、圆角、居中、等大卡片,都是用最少的结构决策就能填满画面的招式——它们不要求理解内容层级,只要求把盒子排整齐。三者叠加,就得到一个稳定的吸引子:生成在没有强约束时,必然落到这组指纹上。这解释了一个反直觉的事实——slop 不是模型"不够强"的产物,恰恰是它"足够强地拟合了均值"的产物。
Cause one: high frequency. A model leans toward the visual patterns that appear most in its training distribution. Across the last few years of Dribbble, product landing pages, and dashboard templates, dark glassmorphism with neon gradients was everywhere, so it became the statistical representative of "what design should look like." Cause two: safety. These patterns are almost never rejected in review. They "look professional," and no one gets criticized for using Inter. Safe means low-risk, and the low-risk option most easily becomes the default. Cause three: ease of generation. Gradients, rounded corners, centering, and equal-size cards all fill a screen with the fewest structural decisions. They require no understanding of content hierarchy, only that the boxes line up neatly. Stack the three and you get a stable attractor: with no strong constraint, generation inevitably lands on this set of fingerprints. This explains a counterintuitive fact. Slop is not a product of the model being "not strong enough" but of its being "strong enough to fit the mean."
每条指纹的修法,都是"加回一个被生成省略的判断"
Every fingerprint's fix is "adding back a judgment generation skipped"
Since fingerprints come from "skipping a judgment," the fix must be "adding that judgment back," not switching to a trendier cliché (switching clichés only swaps one mean for another). The fix for the color fingerprint is answering "why is this the primary color" — find a motivated color from the brand, the content's subject, the user's situation, rather than defaulting to neon; the fix for the material fingerprint is answering "what hierarchy is this blur/shadow serving" — let material carry information hierarchy, not decoration; the fix for the type fingerprint is answering "why is this the typeface" — pick one whose character matches the content, with real identity, and build a genuine left-aligned grid instead of centering everything; the fix for the layout fingerprint is answering "do these contents weigh the same" — size by content weight rather than stuffing all into equal boxes. The shared structure: every fix swaps a "labor-saving default generation made for you" for "a judgment you actively made for these people." This is exactly what DSN 03·5 said — taste is the synthesis of these judgments, and anti-slop is adding them back item by item.
FIG. D6.1 / FINGERPRINT ANATOMY · 逐处标注一张 slop 落地页 · a slop landing page, annotated part by part
看懂:左边是典型 AI 落地页的线框,每一处指纹(渐变标题、玻璃拟态、三卡、居中 Inter)右边都接着一个被省略的判断——指纹不是审美故障,是判断缺位的痕迹
Read: left is a typical AI landing-page wireframe; each fingerprint (gradient title, glassmorphism, three cards, centered Inter) maps on the right to a skipped judgment. A fingerprint is not an aesthetic fault but the trace of an absent decision.
Read this as the causal version of a checklist: each numbered item on the left is a visible symptom; the match on the right is not "a prettier alternative" but a decision someone should have made that generation skipped. This is why slop all looks alike — it omits the same set of judgments; and why you cannot fix slop by "generating a prettier one": prettier still lands on the mean, and the missing decisions still go unmade.
把一张典型 slop 落地页逐处标注,每处都对应一个被省略的判断
Annotating a typical slop landing page point by point — each maps to a skipped judgment
设想一张最常见的 AI 生成落地页,从上到下逐处看,你会发现它像一份征兆清单:顶部是深色背景配一行紫到蓝的渐变大标题,渐变文字本身就是第一处指纹——它把"标题"当成了炫技的画布,而非传达信息的层级,被省略的判断是"这行字到底要让谁、在几秒内、读到什么"。主视觉区是一块玻璃拟态卡片浮在模糊光斑上,第二处指纹——模糊在这里不服务任何层级,纯粹是装饰,被省略的判断是"这个浮层比背后的东西更重要吗,凭什么浮起来"。功能区是三到四张等大的圆角卡片整齐排成一行,每张顶上一个大圆角图标,第三、四处指纹叠加——等大意味着"这些功能同等重要"这个几乎从不为真的假设没被质疑,图标只是占位装饰,被省略的判断是"这些内容的权重真的一样吗,这个图标帮人认出了什么"。数据区是几个巨大的数字配小标签,第五处指纹——它套用了"看起来很有料"的仪表盘模板,被省略的判断是"这些数字对这群用户真的重要吗,还是只是为了填满版面显得专业"。全篇居中、用 Inter,最后两处——居中是最省排版决策的默认(不用想对齐网格),Inter 是最安全的字(不会被批但也毫无性格),被省略的判断是"这个内容的气质适合居中吗,这款字说出了品牌的什么"。
Picture the most common AI-generated landing page and read it top to bottom: it reads like a checklist of symptoms. At the top, a dark background carries a purple-to-blue gradient headline. Gradient text is the first fingerprint. It treats the headline as a canvas for showing off rather than as a layer of hierarchy that conveys information. The skipped judgment: who exactly should read what from this line, and in how many seconds? The hero area is a glassmorphism card floating over blurred light blobs, the second fingerprint. The blur serves no hierarchy and is pure decoration. The skipped judgment: is this floating layer more important than what sits behind it, and what entitles it to float? The features area is three or four equal rounded cards lined up neatly, each topped by a big rounded icon: the third and fourth fingerprints stacked. Equal sizing leaves unquestioned the almost-never-true assumption that these features matter equally, and the icons are placeholder decoration. The skipped judgment: do these contents really weigh the same, and what does each icon help anyone recognize? The data area is a few giant numbers with small labels, the fifth fingerprint, borrowed from the "looks substantial" dashboard template. The skipped judgment: do these numbers really matter to these users, or are they there to fill the page and look professional? Everything is centered and set in Inter, the last two fingerprints. Centering is the default that saves the most layout decisions (no alignment grid to think about), and Inter is the safest typeface (never criticized, and utterly without character). The skipped judgment: does this content's character suit centering, and what does this typeface say about the brand?
Read this annotated figure as a whole and you reach a deeper definition of slop: slop is the visual accumulation of a series of "skipped judgments." No single fingerprint is fatal on its own — centering once, a gradient once, is no crime; what is fatal is that from top to bottom the whole figure has not one place actively judged for these people, every place picking the most labor-saving, least-wrong default. This is exactly why slop feels "right everywhere, hollow overall": it really did make no mistakes anywhere (every default is safe), but really has no one present anywhere (every default bypasses judgment). Conversely, a design with taste need not be novel everywhere, but it surely actually judged where judgment was most due — a motivated primary, a hierarchy with trade-offs, type with character, the emphasis emphasized. So the exercise of "annotating a slop" is itself valuable training: it forces you to ask, at each place, "did anyone make a judgment here," and that question is the key that upgrades "spotting slop" into "understanding why slop is slop."
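The annotation exercise translates directly into a review record. For each region of the page, note the visible choice and, if someone actually judged it, why and for whom. The following is a minimal sketch. The `Region` fields and the sample page are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    choice: str
    rationale: str = ""  # empty: a default, no one judged this for these people


def skipped_judgments(page: list[Region]) -> list[str]:
    """Regions whose visible choice has no recorded reason."""
    return [f"{r.name}: {r.choice}" for r in page if not r.rationale.strip()]


page = [
    Region("headline", "purple-to-blue gradient text"),
    Region("hero", "glass card over blurred blobs"),
    Region("features", "three equal cards with icons"),
    Region("primary color", "ochre",
           rationale="matches the archive-paper subject; our users are historians"),
]
for line in skipped_judgments(page):
    print("no judgment here ->", line)
```

Three of four regions flagged is the "right everywhere, hollow overall" pattern: nothing is wrong, yet no one is present. The check only detects a missing rationale. Whether a written rationale is any good is still a human call.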
把上面的成因翻译成可机检条目,正是 DSN 08 HARD-RULES 那一层的来源:禁渐变文字、限定调色板取自品牌 token、模糊层数上限、字体白名单(排除 Inter/Roboto 系统默认)、卡片尺寸必须随内容权重变化、命中"空洞口号词表"零次。这些可写进 lint;而"这个主色为什么是它、这款字为什么对"仍是软判据,留给人。下方 INSTRUMENT 12/13 帮你把这两层分开跑一遍。
Translating the causes above into machine-checkable items is exactly where DSN 08's HARD-RULES layer comes from: ban gradient text, constrain the palette to brand tokens, cap blur layers, whitelist typefaces (excluding the Inter/Roboto defaults), require card size to vary with content weight, and allow zero hits on the "empty-slogan word list." These go into lint, while "why this primary, why this typeface is right" remains a soft criterion kept with people. INSTRUMENTS 12 and 13 below help you run the two layers separately.
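The HARD-RULES items listed above can be sketched as a lint over an extracted style description. This rests on several stated assumptions. The `Design` fields, the token palette, the typeface whitelist, the blur cap and the slogan list are all hypothetical. Extracting these facts from a real stylesheet or component tree is left out.

```python
from dataclasses import dataclass

BRAND_TOKENS = {"#1f1a17", "#f4efe6", "#b5562b"}  # hypothetical brand palette
FONT_WHITELIST = {"Fraunces", "IBM Plex Serif"}    # hypothetical; Inter/Roboto excluded
MAX_BLUR_LAYERS = 1                                # hypothetical cap
SLOGANS = ("unlock your potential", "seamless experience", "next-level")


@dataclass
class Design:
    colors: set[str]
    font: str
    blur_layers: int
    card_widths: list[int]
    copy: str
    gradient_text: bool = False


def hard_rule_violations(d: Design) -> list[str]:
    """Return every machine-checkable red line the design crosses."""
    v: list[str] = []
    if d.gradient_text:
        v.append("gradient text")
    off = {c.lower() for c in d.colors} - BRAND_TOKENS
    if off:
        v.append(f"off-token colors: {sorted(off)}")
    if d.font not in FONT_WHITELIST:
        v.append(f"font not whitelisted: {d.font}")
    if d.blur_layers > MAX_BLUR_LAYERS:
        v.append(f"blur layers {d.blur_layers} > {MAX_BLUR_LAYERS}")
    # Only catches the degenerate case: every card exactly the same size.
    if len(d.card_widths) > 1 and len(set(d.card_widths)) == 1:
        v.append("all cards the same size")
    hits = [s for s in SLOGANS if s in d.copy.lower()]
    if hits:
        v.append(f"empty slogans: {hits}")
    return v


slop = Design(colors={"#0ff", "#6d28d9"}, font="Inter", blur_layers=3,
              card_widths=[320, 320, 320], copy="A seamless experience.", gradient_text=True)
for problem in hard_rule_violations(slop):
    print(problem)
```

The card rule catches only identical sizes. Whether sizes actually track content weight remains a judgment, and so does whether a whitelisted typeface is the right one.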
DSN 09·7
FAILURE · 同质化是系统性风险
HOMOGENIZATION AS SYSTEMIC RISK
机理 · 宏观失效
Mechanism · Macro failure
最大的失败模式不在一个产品里,在整个行业一起滑向均值
The biggest failure mode is not inside one product but a whole industry sliding to the mean together
The above is about how a single product avoids slop. But a bigger failure mode happens at the system level: when all teams use the same few generation models, feed similar prompts, and converge on similar means, the entire digital world grows more and more alike. This is not alarmism but a direct corollary of generation economics. Grasping this macro failure is what lets you see why "guarding heterogeneity" is not personal aesthetic fastidiousness but a discipline necessary to resist a systemic gravity.
Mechanism: a shared mean is a shared attractor. In the past, part of design's diversity came from the difference between tools and people — different designers' touch, different tools' defaults, leaving different marks in the output. Once generation makes "reaching average quality" nearly free, and everyone uses the same batch of models and the same popular prompts, those sources of difference get flattened: everyone starts near the same peak of the same distribution. The result is a convergence pressure — no one forces it, but each rational individual picks the most labor-saving, least-wrong default, and those defaults happen to be the same one. Ten teams each make a product that "looks professional," yet stacked together they turn out cut from one mold. This is homogenization: "safe choices" at the individual level summing, at the system level, into "a collective loss of distinctiveness."
实测锚点
Measured anchor
这条机制不止是推断,已有受控实验测到它的方向。Doshi 与 Hauser 的对照实验(Science Advances,2024,约 300 名受试者写短篇故事,部分人获得 AI 创意提示)发现一个分裂的效应:拿到 AI 提示的故事在个体层面被评得更新颖、更有用,但这批故事在集体层面用语义相似度衡量却更彼此趋同——个人创意上升,集体多样性下降。这正是"共享均值是共享吸引子"的实验影像:放大每一个个体,同时压平整个分布。〔源:Doshi & Hauser, Science Advances 10(28), 2024,证据级 Ⅱ 受控实验;该研究对象是叙事文本,迁移到视觉/产品设计是一次合理但仍需验证的外推,故不外推具体数字〕[R4]
This mechanism is not only inferred; a controlled experiment has measured its direction. Doshi and Hauser's randomized study (Science Advances, 2024; roughly 300 participants writing short stories, some given AI story ideas) found a split effect. Stories written with AI prompts were rated more novel and useful at the individual level, yet the set of those stories was more similar to one another at the collective level, measured by semantic similarity: individual creativity up, collective diversity down. This is the experimental image of "a shared mean is a shared attractor": amplify each individual while flattening the whole distribution. [Source: Doshi & Hauser, Science Advances 10(28), 2024, grade Ⅱ controlled experiment; the study's object is narrative text, so carrying it to visual/product design is a reasonable but still-unverified extrapolation, and no specific figure is extrapolated here][R4]
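"More similar to one another at the collective level" can be made concrete as the mean pairwise cosine similarity across a set of embeddings. A higher value means a more homogeneous set. The following is a minimal sketch of that kind of measure, not a reproduction of the study's pipeline. The toy 3-d vectors are hypothetical stand-ins for whatever embedding model one would actually use.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_pairwise_similarity(vectors: list[list[float]]) -> float:
    """Average cosine similarity over all pairs; higher means less collective diversity."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy embeddings (hypothetical): one set spread out, one clustered near a shared center.
spread = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
clustered = [[1, 0.9, 0.8], [0.9, 1, 0.8], [0.8, 0.9, 1]]
print(mean_pairwise_similarity(spread), mean_pairwise_similarity(clustered))
```

Each item in the clustered set may look fine on its own. The set as a whole scores close to 1, which is the "individual up, collective down" split in miniature.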
Why is this the failure mode truly worth worrying about? Because it is almost painless to the individual — your product "looks fine," every metric passes, no single-point failure warns you something went wrong. The pain is deferred and dispersed across the whole industry and the long term: users feel an "it's fine" indifference to everything, with nothing worth remembering, preferring, or being loyal to. The sense of belonging — "this was made for me" — that design is meant to create gets diluted to zero in the universal mean. Guarding heterogeneity is the individual-level resistance to this gravity: choosing, explicitly and at a cost, "true only for these people." It looks, at the individual level, like giving up part of a potential audience; at the system level it is the only way to keep diversity alive. This is the final landing, on the design face, of the org volume's human through-line — being for people means being for specific, differing people, and keeping that difference alive requires each designer to actively stand against the gravity of the mean.
FIG. D8.0 / DISTINCTIVENESS APPRECIATES · 供给趋无限,均值贬值、偏离均值升值 · as supply →∞, the mean depreciates while the off-mean appreciates
看懂:当"看起来专业"被无限供给,它的价值趋零、变成入场券;同一时间,"为这群人极致对路"的偏离反而越来越稀缺、越来越值钱——这是同质化危机的另一面
Read: when "looks professional" is supplied without limit, its value tends to zero and it becomes mere admission; meanwhile the off-mean "extreme fit for this group" grows scarcer and more valuable. This is the flip side of the homogenization crisis.
This is the time face of D6.0's "gravity pulling every design toward the mean": pull the lens back and the two value curves move in opposite directions. The more slop floods in, the less "looks professional" buys you — it shifts from a differentiator to a baseline, an admission ticket; in the same motion, design that is genuinely "true only for these people" becomes more conspicuous, scarcer, worth more against the surrounding mean. So holding the heterogeneous is not only defense (don't slide into slop); it is a bet — on the scarce thing that is appreciating. This too extrapolates no specific number, only the direction.
同质化里藏着一个反直觉的机会:异质本身在升值
Hidden in homogenization is a counterintuitive opportunity: distinctiveness is appreciating
这条系统性风险有一个反直觉的另一面,值得对从业者点明:当均值变廉价且无处不在,偏离均值的那部分反而变得更稀缺、更值钱。经济学的直觉在这里成立——任何东西一旦供给无限,它的价值就趋零;slop 正在变成无限供给,所以"看起来专业"本身已经不再是优势,它是基线、是入场券。真正能制造差异、让人记住、让目标用户产生"这是为我做的"那种归属感的设计,反而因为周围一片均值而更显眼、更有价值。这意味着异质守护不只是一种防御性的纪律(避免滑向 slop),它同时是一种进攻性的机会(在一片趋同里成为那个被记住的)。对一个团队或个人,这是把同质化危机翻转成定位优势的入口:当所有人都在用 AI 把自己变得更像,主动选择"只对这群人极致对路、对其他人无感"的那个,反而占据了越来越空旷的差异化高地。守异质,因此既是对系统性风险的抵抗,也是对一个正在升值的稀缺品的押注。
This systemic risk has a counterintuitive flip side worth naming for practitioners: once the mean is cheap and everywhere, the part that deviates from the mean becomes scarcer and more valuable. The economic intuition holds here — anything in infinite supply trends toward zero value; slop is becoming infinite supply, so "looks professional" is no longer an advantage but the baseline, the price of admission. Design that genuinely creates difference, gets remembered, makes the target user feel the belonging of "this was made for me" becomes, precisely because everything around it is the mean, more conspicuous and more valuable. This means guarding heterogeneity is not only a defensive discipline (avoiding the slide to slop) but also an offensive opportunity (being the one remembered amid the convergence). For a team or an individual, this is the entry to flipping the homogenization crisis into a positioning advantage: when everyone uses AI to make themselves more alike, the one who actively chooses "extremely on-target for these people, cold to everyone else" occupies the increasingly empty high ground of differentiation. Guarding heterogeneity is therefore both a resistance to systemic risk and a bet on a scarce good that is appreciating.
To avoid being misread, "heterogeneity" must be distinguished from "different for difference's sake." Guarding heterogeneity is not encouraging novelty-seeking, not deliberately being weird to stand apart — that merely swaps "sliding to the mean" for "sliding to the bizarre," equally rootless in empathy, equally a variant of slop. True heterogeneity is the natural outgrowth of "being on-target for these people": when you genuinely design for a group of specific people under a specific purpose, your trade-offs naturally diverge from the mean optimized for everyone, because these people are inherently different from "everyone." In other words, heterogeneity is a byproduct of empathy, not the goal itself. This distinction matters because it guards against a common over-correction: a team hears "anti-slop, be distinctive" and starts being distinctive for its own sake, making a pile of deliberately odd things equally inconsiderate of users. The test is simple — ask whether this "different" grew from "for these people" or from "wanting to seem different." The former is guarding heterogeneity; the latter is just another flavor of slop. The right way to guard heterogeneity is always to pin down "for whom" first and let difference emerge naturally, not to pursue difference itself.
Connect this systemic failure to the asymmetry at the volume's opening and the whole figure closes: because generation makes "reaching the mean" free (the asymmetry), the default output is slop (the mean is slop), so unresisted the whole industry slides to the mean together (homogenization), so the only cure is each designer actively pinning their aesthetic to specific people (guarding heterogeneity), and this is exactly the judgment a machine cannot do and a human must carry (taste = scarce judgment), and it is worth carrying because being for specific people is design's foundation and the whole series' shared direction (people return to meaning). This is not six scattered points but one complete force chain reasoned all the way from an economic premise to a human conclusion: each link a necessary consequence of the last. This is why this volume dares to say it caught structure, not style — because each of its claims is not an isolated aesthetic preference but a node on this chain forced out by what came before. Reading this far, look back at the generation × taste plane of DSN 01 and you find the whole volume was telling different faces of one thing: people must, by hand, re-add the vertical axis of taste to a world that is collectively losing it.
证伪 · 这条担忧可能错在哪
Falsification · where this worry could be wrong
这条担忧若错,会错在:若生成模型未来能主动制造有意义的差异(不是随机扰动,而是针对不同人群给出真正不同且对路的设计),同质化引力就会被模型自身抵消,异质守护也就不再需要人来扛。目前没有证据表明模型在做这件事——它们优化的是"对得多",不是"对这群人特别"。只要这一点不变,同质化就是真实的系统性风险,异质守护就仍是人的责任。
If this worry is wrong, it is wrong here: if future generation models can actively manufacture meaningful difference (not random perturbation but genuinely different, on-target designs for different groups), the homogenization gravity would be canceled by the model itself, and guarding heterogeneity would no longer need a human to carry it. There is currently no evidence that models do this; they optimize "right for many," not "special for these people." As long as that holds, homogenization is a real systemic risk and guarding heterogeneity remains a human responsibility.
Design-as-code does not stop at static interfaces. Remotion (video written in React) turns a video into programmable, diffable, agent-generatable code, where timeline, copy, and palette are all variables. Motion and video are subject to exactly the same force as static design: once the artifact becomes code, it gains the same leverage, and taste and intent remain the scarce judgment.
Producing a product video used to mean: change one word, re-render the whole thing, go back into opaque timeline software to align by hand. When video is code: copy is a parameter (batch-localizable), palette comes from the same tokens (on-brand), one change is one diff (reviewable, revertible), and an agent can spin up variants on spec (the same spread → judge → converge loop). Motion stops being a black hole of production hours and becomes a generatable artifact, same-source as the interface.
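When a video is code, its copy, duration and palette can live in one plain data object. "Change one word" then becomes a one-key diff that a reviewer can read. The following is a minimal sketch. The field names and token values are hypothetical, and this is not Remotion's API, though in a Remotion project an equivalent object would be passed to a composition as its props.

```python
# Hypothetical shared design tokens, the same source the UI draws from.
TOKENS = {"ink": "#1f1a17", "paper": "#f4efe6", "accent": "#b5562b"}


def video_props(headline: str, seconds: int, fps: int = 30) -> dict:
    """One product-video variant as plain data; palette comes from the UI's tokens."""
    return {
        "headline": headline,
        "duration_frames": seconds * fps,
        "background": TOKENS["paper"],
        "accent": TOKENS["accent"],
    }


def diff(old: dict, new: dict) -> dict:
    """Keys whose values changed, as (old, new) pairs: the reviewable unit of a change."""
    return {k: (old.get(k), new.get(k))
            for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


v1 = video_props("Ship reports in minutes", 15)
v2 = video_props("Ship reports in seconds", 15)
print(diff(v1, v2))
```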
时间线软件 · 二进制
Timeline software · binary
改字要重剪、本地化要重做、变体靠手工;agent 进不来,品味花在重复劳动上。
Changing a word means re-editing, localization means redoing, variants are manual; no agent can enter, and taste is spent on repetitive labor.
video-as-code · 文本
Video-as-code · text
文案/时长/配色皆变量,agent 按规格铺变体,人只判"哪条对路、节奏对不对"。同一条工作流环。
Copy/duration/palette are variables; an agent spreads variants on spec; the human only judges "which is on-target, is the pacing right." The same workflow loop.
Bringing motion in is not just covering one more artifact type; it tests the portability of the whole method. A methodology that holds for static interfaces but breaks the moment it meets the time dimension has most likely grasped surface appearance rather than the underlying force. Motion is a clean stress test because it pushes one variable back to its extreme: whether the artifact is a proprietary binary. The project files of traditional video tools are the canonical unreadable binary. The same force chain should therefore reappear, and it does. Re-rendering the whole video to change one word, redoing localization, and building variants by hand are "the binary canvas gets sidelined" recurring in the time dimension. Remotion expressing video as React, with copy as a parameter, palette from the same tokens, and one change as one diff, is "the code artifact gets amplified" recurring. That the force chain replays intact on a brand-new artifact type is itself evidence that the method is not an improvised patchwork but has caught the underlying structure.
时间维度多出一条不可机检的轴:节奏
The time dimension adds one more axis that cannot be machine-checked: pacing
Carrying design-as-code into motion verifies the kernel's portability: the same force holds on another face. To be honest, though, the time dimension adds an axis that static design lacks and that cannot be machine-checked, so the human half of judgment gets heavier here, not lighter. In static design the hard, machine-checkable constraints (tokens, alignment, contrast) settle a fair share of right and wrong. Motion adds pacing: whether a transition runs 200ms or 400ms, how many beats to hold after a line of narration, in what order information is revealed. None of it can be written into tokens, and none has a correct value, only right or wrong for this content and this emotion. So once video-as-code automates the machine-checkable labor of ②③④ (filling states, localization, spreading variants), the freed-up human effort must go more densely into judging pacing. Code gives you the freedom to try different pacings at will (change a parameter, re-render), but which pacing is right can still be settled only by a human's ear and eye.
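Code makes "try another pacing" a parameter change. Video timelines are commonly counted in frames, so a duration in milliseconds has to be converted at the project frame rate. The following sketch spreads a few pacing candidates for a human to compare. It assumes 30 fps and hypothetical candidate values. It generates options and does not pick one.

```python
def ms_to_frames(ms: float, fps: int = 30) -> int:
    """Nearest whole frame for a duration in milliseconds."""
    return round(ms * fps / 1000)


def pacing_candidates(transition_ms: list[int], hold_after_line_ms: list[int],
                      fps: int = 30) -> list[dict]:
    """Every combination of transition length and post-narration hold, in frames."""
    return [
        {"transition": ms_to_frames(t, fps), "hold": ms_to_frames(h, fps), "label": f"{t}ms/{h}ms"}
        for t in transition_ms
        for h in hold_after_line_ms
    ]


for c in pacing_candidates([200, 400], [500, 1200]):
    print(c)
```

Which of the four candidates is right for this content and this emotion is the part no function returns.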
这条边界也回答了一个常见的误解:"既然视频成了代码,是不是 AI 能端到端生成成品视频了?"能生成,但生成的默认仍是节奏上的均值——四平八稳、哪里都不出错、也哪里都不动人,正是动效版的 slop。所以动效的工作流和静态完全同构:规格(包括节奏意图)→ 铺开多个节奏方向 → 人判哪个对 → 导向再生成 → 收敛。代码形态把这条环的每一步都变得便宜可迭代,但环中央那个"判节奏"的验证器,依旧是人。这就是为什么我们说动效是"同一招再走一遍",而不是"又一个被 AI 解决的问题"。
This boundary also answers a common misreading: "since video is now code, can AI generate finished video end to end?" It can generate, but the default of that generation is still the mean of pacing — even, error-free everywhere, moving nowhere — which is precisely the motion version of slop. So the motion workflow is fully isomorphic to the static one: spec (including pacing intent) → spread several pacing directions → human judges which is right → steer and regenerate → converge. The code form makes every step of this loop cheap and iterable, but the "judge the pacing" verifier at the loop's center is still a human. That is why we say motion is "the same move once more," not "another problem AI has solved."
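The spec → spread → judge → steer → converge loop can be written with the judgment injected as a function. This makes the point structurally: the code owns the cheap steps, and the `judge` callback owns the verdict. The following is a minimal sketch. `generate` and `judge` are hypothetical stand-ins. A real `judge` is a person, but here it is a scripted stub so the example runs. Adding rejected candidates to an "avoid" list is one assumed steering choice among many.

```python
from typing import Callable, Optional


def converge(spec: dict,
             generate: Callable[[dict, int], list[str]],
             judge: Callable[[list[str]], Optional[str]],
             max_rounds: int = 5) -> Optional[str]:
    """Spread candidates, let the judge pick one or reject all, steer, repeat."""
    for round_no in range(max_rounds):
        candidates = generate(spec, round_no)
        choice = judge(candidates)  # the human verifier at the loop's center
        if choice is not None:
            return choice
        # Steer: rejected candidates join the spec's "what not to be".
        spec = {**spec, "avoid": spec.get("avoid", []) + candidates}
    return None


# Scripted stubs so the sketch runs.
def fake_generate(spec: dict, round_no: int) -> list[str]:
    return [f"pacing-{round_no}-{i}" for i in range(3)]


def scripted_judge(candidates: list[str]) -> Optional[str]:
    return next((c for c in candidates if c == "pacing-1-2"), None)


print(converge({"intent": "calm, unhurried reveal"}, fake_generate, scripted_judge))
```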
Motion holds another underrated lever: localization and personalized variants. Shipping a product video in ten languages, three durations, with the tone tuned for different groups, is in timeline software a repetition of dozens-times-several manual jobs — each version re-edited, re-aligned, re-exported by hand — at a cost so high most teams simply give up and ship one make-do version. When video is code, this cost structure changes completely: language is an array of copy parameters, duration a timeline variable, group tone a switchable configuration, and an agent can batch-render every combination from these parameters, with the human only judging "is each version's pacing right in its context." This turns the personalization that used to be "unaffordable so not done" into the routine of "nearly free so worth doing." Its significance is not only saved effort but that "making one version right for each different group" — DSN 09's heterogeneity-guarding — becomes, for the first time, economically feasible on motion, the artifact type that used to be most expensive. The code form proves once more here: what it amplifies is never some flashy feature but the feasibility of "making specific things for specific people."
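The cost shift comes from the variant matrix becoming data. Languages, durations and tones multiply into render jobs mechanically. The following is a minimal sketch using `itertools.product`. The copy table, tone names and job fields are hypothetical, and the actual render call is left out.

```python
from itertools import product

COPY = {  # hypothetical localized headline per language
    "en": "Ship reports in minutes",
    "de": "Berichte in Minuten",
    "fr": "Des rapports en quelques minutes",
}
DURATIONS_S = [6, 15, 30]
TONES = ["plain", "warm"]


def variant_jobs(fps: int = 30) -> list[dict]:
    """One render job per (language, duration, tone) combination."""
    return [
        {"lang": lang, "headline": COPY[lang], "duration_frames": s * fps, "tone": tone,
         "name": f"promo-{lang}-{s}s-{tone}"}
        for lang, s, tone in product(COPY, DURATIONS_S, TONES)
    ]


jobs = variant_jobs()
print(len(jobs), jobs[0]["name"])
```

The machine produces all 18 jobs. A human still watches each one for whether its pacing is right in its context.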
同构 / 边界 · Isomorphism / boundary
这是 DSN 06–08 那套受力照搬到时间维度——静态怎么做,动效就怎么做。边界也一样:节奏、情绪、何时该停顿留白,无法写进 token,仍是人的品味判断。代码给的是杠杆,不是节奏感。
This is the DSN 06–08 set of forces carried into the time dimension: do for motion what you do for static. The boundary is the same too. Pacing, emotion, and when to hold a beat or leave space cannot be written into tokens; they remain matters of human taste. Code gives leverage, not a sense of rhythm.
DSN 10
SPECULATION · 推演幕
SPECULATION · The Speculation Act
推论 · 外推,非事实
Inference · Extrapolation, Not Fact
当生成成本趋零,设计组织会变成什么
When generation cost goes to zero, what the design org becomes
This act does not draw a single "AI keeps getting stronger" curve. It projects this volume's thesis (generation abundant, taste scarce) onto 2026 through 2032. It does not predict which line occurs. Instead, it opens a possibility space: which branches are possible, their leading indicators, and what observation would falsify each. Only speculation that can be falsified is worth making.
Nature of this chapter · Inference. What follows extrapolates from the public trajectory of 2023-2026; it is not a statement of fact. Each line carries leading indicators and a falsification condition; when observation contradicts the speculation, this chapter should be the first to be rewritten. [Grade Ⅴ, argument/projection]
THE PROJECTION · 把本卷命题推到时间轴上 · The thesis pushed onto the timeline
Speculation is not daydreaming. This volume rests on one claim: when "output at the average bar" approaches free, value collapses from being-able-to-make toward being-able-to-judge, and the part of judgment least replaceable by machines is taste. Put that claim on a timeline and the question is not "will AI get stronger" — that is nearly certain — but whether, when, and by what the machine-uncheckable judgment node gets eroded. Three converging forces set the boundaries, two axes of uncertainty open four worlds, and three artifacts from those worlds make the speculation tangible. Finally we must record the counter-bet against this volume: what if taste is not scarce after all.
The redrawing of the design org is not one curve but three forces maturing independently and now converging. Each is asked only three things: what form it unlocks if it holds, where it is now, and what signal would falsify it. If the first two both hold, they press on the third at the same time, and that is exactly where this volume's thesis is decided.
生成成本趋零 · GENERATION → FREE
Generation Cost → Zero
解锁 · Unlocks
出一张"看起来专业"的稿、一段过得去的视频、一套能跑的组件,边际成本趋近于零。设计的稀缺资源从"产出能力"彻底移走——一人可铺开过去一个团队的候选量。
Producing a "looks professional" comp, a passable video, or a runnable component set drops to near-zero marginal cost. Design's scarce resource moves off "ability to produce" entirely; one person spreads the candidate volume a whole team once did.
TRL · 规模化中 · Scaling up
2023-26 文生图/视频/前端代码逐年逼近"专业均值",成本逐年下降〔证据级 Ⅳ〕。
In 2023-26, text-to-image, video, and front-end code generation close in on the "professional mean" year over year, at falling cost [grade Ⅳ].
证伪 · Falsified if
若生成在"过得去"处长期封顶、最后一公里(品牌一致、可交付、合规)仍需大量人工返工,则成本并未趋零,执行仍是稀缺资源。
If generation caps at "passable" for the long run and the last mile (brand consistency, deliverability, compliance) still needs heavy human rework, then cost has not gone to zero and execution remains scarce.
设计系统即规格 · SYSTEM-AS-SPEC
Design-System-as-Spec
解锁 · Unlocks
当 token、组件、规则被写成机器可读的规格(沿 DSN 06-08 的方向),"何为对"的一大半变成可机检的护栏——生成被约束在系统内,品牌一致不再靠逐稿盯。设计系统从文档升级为可执行的判断载体。
When tokens, components, and rules are written as machine-readable spec (along the DSN 06-08 direction), much of "what is right" becomes a machine-checkable guardrail. Generation is constrained inside the system, and brand consistency no longer rides on per-comp policing. The design system upgrades from a document to an executable carrier of judgment.
TRL · 早期商用 · Early commercial
token 与组件库已标准化;"约束生成"工具 2025 起进入早期商用,品味规则的形式化仍浅〔证据级 Ⅳ〕。
Tokens and component libraries are standardized; "constrained-generation" tooling entered early commercial use from 2025, while the formalization of taste rules stays shallow [grade Ⅳ].
证伪 · Falsified if
若品味的关键部分始终无法被写成可机检的规则(节奏、情绪、何时留白),则系统只能挡住低级错误,挡不住趋同——护栏有上限。
If the load-bearing part of taste can never be written as a machine-checkable rule (pacing, emotion, when to leave space), the system catches only low-level errors, not convergence; the guardrail has a ceiling.
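Here is a minimal sketch of what "design system as spec" might look like at its smallest: tokens as data, plus a checker that returns violations. The token names, hex values, spacing scale, and the `check_component` helper are all hypothetical. Note what the checker can and cannot see. It catches an off-palette color or an off-scale gap. Nothing in it can tell whether the component is converging on everyone else's, which is exactly the ceiling the falsification condition names.

```python
# Hypothetical machine-readable tokens; names and values are illustrative only.
TOKENS = {
    "color": {"ink": "#1a1a1a", "paper": "#ffffff", "accent": "#0055cc"},
    "space": [0, 4, 8, 12, 16, 24, 32, 48],  # allowed spacing values in px
}


def check_component(props: dict, tokens: dict = TOKENS) -> list[str]:
    """Return guardrail violations for one component; an empty list means it conforms."""
    errors = []
    palette = {value.lower() for value in tokens["color"].values()}
    for key in ("color", "background"):
        value = props.get(key)
        if value is not None and value.lower() not in palette:
            errors.append(f"{key} {value} is not a palette token")
    for key in ("padding", "gap", "margin"):
        value = props.get(key)
        if value is not None and value not in tokens["space"]:
            errors.append(f"{key} {value}px is off the spacing scale")
    return errors


if __name__ == "__main__":
    print(check_component({"color": "#1A1A1A", "padding": 10}))
    # ['padding 10px is off the spacing scale']
```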
品味作为唯一护城河 · TASTE AS MOAT
Taste as the Remaining Moat
解锁 · Unlocks
前两条把执行和合规都抹平后,组织间唯一不能被复制的差异,是"挑得准、知道为什么、敢承担"的判断密度。竞争从"谁做得多"转向"谁判得对"——品味成为最后的差异化资产,可被定价、被招聘、被组织化。
Once the first two flatten execution and compliance, the only difference between organizations that cannot be copied is the density of judgment that picks accurately, knows why, and bears the consequence. Competition shifts from "who makes more" to "who judges right." Taste becomes the last differentiating asset: priced, hired for, and organized around.
TRL · 论证态 · Argument stage
这是本卷的承重命题,也是最该被对赌的一条——见下方反命题与情景台。〔证据级 Ⅴ 论证〕
This is the volume's load-bearing claim and the one that most deserves a counter-bet; see the counter-bet and the scenario bench below. [grade Ⅴ, argument]
证伪 · Falsified if
若模型学会了在统计上稳定地复现"被市场判为好"的设计(用户偏好可被高保真预测),则品味也被自动化,护城河蒸发——这正是反命题。
If models learn to reproduce, with statistical stability, the designs the market judges as good (user preference becomes high-fidelity predictable), then taste too is automated and the moat evaporates, which is precisely the counter-bet.
为什么不是四条 · BOUNDARY NOTE
Why Not a Fourth
边界 · Scope
具身/机器人、能源算力地租、监管这些更宽的力量在组织卷推演;本卷只追设计这个面上的三条。把它们都堆进来会稀释命题——推演的纪律是只外推自己能负责的那条线。
Broader forces (embodiment/robotics, energy and compute rent, regulation) are speculated in the Org volume; this volume tracks only the three on the design face. Piling them all in would dilute the thesis; the discipline of speculation is to extrapolate only the line you can be held responsible for.
交叉 · Coupling
监管确会外溢到设计(AI 生成内容标注、版权),但它改变的是约束,不改变"品味是否可机检"这个本卷的轴心问题。
Regulation does spill into design (labeling of AI-generated content, copyright), but it changes the constraints, not this volume's pivot question of whether taste is machine-checkable.
INSTRUMENT 14 · 情景台 SCENARIO BENCH
三条力量划定边界,但 2032 落在哪个世界,取决于两条高影响、高不确定的力量:X 轴 生成能力(停在"专业均值" vs 突破到"可复现被判为好的设计")与 Y 轴 品味分布(仍稀缺集中于少数判断者 vs 被工具民主化、人人可调)。切换两轴,看本卷命题在那个象限里站得住还是塌掉,以及什么先行指标说明我们正滑向它(GBN 双轴情景法)。
Three forces mark the boundaries, but which world 2032 falls into turns on two high-impact, high-uncertainty forces: X · generation capability (stalls at the "professional mean" vs breaks through to reproducing designs judged good) and Y · taste distribution (stays scarce and concentrated in a few judges vs gets democratized by tools so anyone can dial it). Toggle the two axes to see whether this volume's thesis holds or collapses in that quadrant, and what leading indicator says we are sliding toward it (the GBN two-axis scenario method).
X · 生成能力 · Generation Capability
Y · 品味分布 · Taste Distribution
品味溢价 · Taste Premium: 停在均值 × 稀缺 · Stalls × Scarce
判断寡头 · Judgment Oligopoly: 复现"好" × 稀缺 · Reproduces × Scarce
寒武纪长尾 · Cambrian Long Tail: 停在均值 × 民主化 · Stalls × Democratized
均值之海 · Sea of the Mean: 复现"好" × 民主化 · Reproduces × Democratized
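The bench reduces to a lookup from the two axes to a named world. The enum names and the `scenario` function below are hypothetical. Only the four scenario names and their axis positions come from the bench itself.

```python
from enum import Enum


class Generation(Enum):
    """X axis: how far generation capability gets by 2032."""
    STALLS = "stalls at the professional mean"
    REPRODUCES = "reproduces designs judged good"


class Taste(Enum):
    """Y axis: how taste is distributed."""
    SCARCE = "scarce, concentrated in a few judges"
    DEMOCRATIZED = "democratized by tools"


# Quadrant names and positions as given on the scenario bench.
SCENARIOS = {
    (Generation.STALLS, Taste.SCARCE): "Taste Premium",
    (Generation.REPRODUCES, Taste.SCARCE): "Judgment Oligopoly",
    (Generation.STALLS, Taste.DEMOCRATIZED): "Cambrian Long Tail",
    (Generation.REPRODUCES, Taste.DEMOCRATIZED): "Sea of the Mean",
}


def scenario(x: Generation, y: Taste) -> str:
    """Return the world a given pair of axis readings points to."""
    return SCENARIOS[(x, y)]
```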
SHORT-TERM · 2026-2028
"生成铺开候选"成为默认工序
"Generate the candidates" becomes the default step
From one comp to many. Independent designers and small teams default to letting generation spread a dozen-plus candidates first, then pour almost all human time into picking, critiquing, steering. "Fast in Figma" stops being the scarce skill; "spot the right one out of twenty and say why" becomes the new entry bar. Design systems start being maintained as "spec fed to generation," not just delivery documents.
Calibration anchor: the direction holds, the slope is overestimated. These are the first two years of a decade-long curve, not its endpoint. Generation still jams on the last mile at "passable" (brand detail, cross-platform consistency, deliverable state), and rework cost is real through 2026-28; every "→ zero" claim in this block should be discounted by that first. [grade Ⅳ, practitioner extrapolation]
MID-TERM · 2028-2030
"品味"开始被招聘、被定价、被组织化
"Taste" starts being hired for, priced, and organized
A clear split appears inside design orgs between judgment roles and execution roles: a few hold final judgment on "what is good" and the brand direction; generation takes the rest. In job descriptions the weight on "proficient in tool X" falls and the weight on "judgment quality, sense of direction, able to articulate taste" rises. Design systems evolve from static libraries into "guardrails with judgment" — the machine-checkable part catches low-level errors, the uncheckable part (pacing, emotion) is left explicitly to humans. Homogenization pressure becomes a routine review topic, not aesthetic fastidiousness.
The point of divergence. This stretch is where this volume first meets its counter-bet head-on: if by 2030 preference models can stably predict "this audience will judge it good," the "judgment role" thins earlier than expected. The counter-bet block below records that wager.
The most likely outcome is neither "designers vanish" nor "designers carry on as before" but a forking spectrum: at one end, highly automated "mean products" driven by machine-checkable spec (good-enough, chasing scale and speed); at the other, "judgment-density organizations" with scarce taste as their moat (a few people plus heavy generation, selling a premium on picking accurately). Between them sit traditional teams still doing most judgment by hand. The word "designer" itself is redefined — from "someone who can make interfaces" to "someone who bears the consequences for whether the experience is good."
Betting against the "plurality" judgment is the convergence prediction of the "Sea of the Mean": if generation and preference prediction both mature and taste is democratized by tooling, the differentiating asset may evaporate wholesale, and every product slides toward the same local optimum validated as "high-converting." Whether the plural spectrum or the convergence becomes the main picture of the 2030s design world is the most trackable divergence in this chapter. [grade Ⅴ]
Every strong trend provokes a counter-trend. As generated content saturates, "100% human-designed / handmade" begins appearing as a differentiator among niche brands, independent publications, and boutique studios — not because human hands are necessarily better, but because "provably not the generated mean" becomes a scarce signal in itself. "Anti-slop" shifts from personal fastidiousness to a market-recognized position. This branch will not become mainstream, but it is structurally present, a standing reminder: when everything converges, "not converging" is itself worth something.
Speculation made only of assertions feels abstract. The three pieces below are design fiction: explicitly fictional future artifacts that make "the design org where taste is the moat" tangible. They are not predictions; they are a way of projecting the thesis onto 2031.
"You will not be asked to produce comps. Producing comps is something our generation pipeline can hand you three thousand of a day. You do what it cannot: pick the one that should ship out of three thousand, say why it and not the other two, and own the consequences of that judgment."
职责
判断而非生产 · 设定品味与品牌边界 · 维护"喂给生成的规格" · 为不可逆的发布决策担责
Responsibilities
Judge rather than produce · set taste and brand boundaries · maintain the "spec fed to generation" · own irreversible release decisions
不要求
任何单一生成工具的熟练度(我们假设它一年内会被换掉)
Not required
Proficiency in any single generation tool (we assume it will be replaced within a year)
Hit rate and directional correctness — the real-world performance of the picked version after launch, and whether the "why" can be folded back into spec so the next round's hit rate rises (not comp volume)
SPECULATIVE · 虚构 · Fiction
ARTIFACT 02 · 工具更新日志 · Tool Changelog
某生成式设计工具 v9.0 更新日志(节选)· 人的角色被翻转
A Generative Design Tool, v9.0 Changelog (Excerpt) · The Human Role Inverts
新增
「判断模式」成为默认。打开文件即生成 N 版候选;画布不再是空白,而是一墙待裁的候选。"新建空白画板"降级到二级菜单。
Added
"Judgment mode" is now the default. Open a file and N candidates are generated; the canvas is no longer blank but a wall of candidates awaiting a verdict. "New blank artboard" is demoted to a submenu.
The primary action shifts from "draw" to "pick / critique / steer." Each verdict prompts a "why," depositing the reason into the project's taste spec — the tool begins accruing your judgment loop for you.
The tool can block candidates that violate the spec; it cannot decide for you whether the spec itself is right. Pacing, emotion, when to leave space still require a human verdict — a boundary we deliberately do not automate, and this tool's design stance.
SPECULATIVE · 虚构 · Fiction
ARTIFACT 03 · 同质化事故复盘 · Homogenization Postmortem
"我们的 App 和竞品长得一模一样" · 复盘摘要
"Our App Looks Identical to a Competitor's" · Postmortem Summary
In 2031, half a year after a brand refresh, a team found its product nearly indistinguishable from two competitors' on the key screens. No one copied anyone. All three used the same few generation models and the same popular spec, and all three handed "improve conversion" to the same class of preference-prediction model. The convergence was not plagiarism but every rational team independently sliding toward the same local optimum validated as "high-converting."
根因
共享模型 + 共享规格 + 共享优化目标 = 共享吸引子(呼应 DSN 09·7 同质化机制)
Root cause
Shared models + shared spec + shared optimization target = a shared attractor (echoing the DSN 09·7 homogenization mechanism)
责任链
落在把"何为好"完全外包给转化数字的判断者——不是"模型趋同了",是没人守异质这件事的判断节点空着
Chain of responsibility
Falls on the judges who outsourced "what is good" entirely to conversion numbers — not "the models converged" but that the judgment node for guarding heterogeneity was left empty
Put a "guard-heterogeneity" metric into review (distance from the mean) and explicitly keep one human to answer "does this still look like us" — turning this volume's thesis into a step in the process, not a slogan
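Here is one way the "distance from the mean" metric could be computed, as a hedged sketch. Each design is represented as a numeric feature vector, and the metric measures how far ours sits from the centroid of the comparison set. The feature encoding is the hard, unspecified part and is assumed here. `distance_from_mean` is a hypothetical helper that uses plain Euclidean distance.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one comparison design")
    dims = len(vectors[0])
    if any(len(v) != dims for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def distance_from_mean(ours: list[float], field: list[list[float]]) -> float:
    """Euclidean distance between our design and the centroid of the field.

    The features (e.g. normalized color, density, radius values) are assumed;
    a small distance flags drift toward the shared attractor, nothing more.
    """
    center = centroid(field)
    if len(ours) != len(center):
        raise ValueError("our vector must match the field's dimensionality")
    return math.dist(ours, center)
```

A number like this can only flag drift toward the attractor. Whether the result still looks like us remains the question the named human answers.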
The counter-bet on record · COUNTER-BET
This volume bets that taste is scarce and machine-uncheckable. Honesty demands laying out the strongest opposing case in full rather than erecting a straw man to knock down. The counter-bet's strongest form is not "AI will draw prettier things" but "taste is not statistically mysterious and is therefore learnable and reproducible." The argument runs in three steps, each with an early signal already visible. First, "judged good by a given audience" is very likely a structured, learnable distribution. Human aesthetic preference is not random; it is strongly constrained by culture, context, and recency, and anything with structure can in principle be approximated by enough data plus a strong enough model. Second, scale is filling in that distribution's data. Every A/B test, every retention curve, every "users preferred this version" is labeling the preference function, so preference modeling may not need to "understand why it is good," only to reproduce "judged good" stably at the level of outcomes. Third, consider the very judgment act this volume keeps emphasizing (steer / pick / critique). Once it can be expressed explicitly as spec, which is exactly what DSN 06-08 push toward, it is thereby also exposed as an imitable training signal. The more successfully we externalize taste into machine-checkable spec, the more we are, with our own hands, laying down the training set for "automated taste." This is a real internal tension inside this volume's method, and it should not be hidden.
What observation would confirm the counter-bet and rule this volume lost: when a "generation + preference-prediction" system, under double-blind conditions, for a new audience it did not see in training, in a novel situation where old criteria no longer apply, can stably match or exceed a senior domain judge's pick-accuracy — and that edge reproduces across categories rather than overfitting one visual genre — then "taste sits structurally on the human side" is overturned, the moat evaporates, this volume's load-bearing claim fails with it, and the chapter should be rewritten. The one not-yet-falsified margin this volume keeps for itself is precisely that "novel situation never seen": preference models excel at interpolating distributions already judged, but a genuinely unprecedented aesthetic proposition (something no one has made yet that is nonetheless "right") has no historical labels to learn from. As long as novel situations keep arising and their judgment cannot be reached by interpolating history, the human judgment node remains; the day that margin closes too — when a model stably makes verified-correct judgments for situations without precedent — this volume should concede. The author writes this falsification condition down in plain ink precisely so as not to hold "taste is forever scarce" as an unquestionable article of faith. [grade Ⅴ, argument, betting against this volume]
推演溢出的东西 · Second-Order Effects
推演的终点不是设计组织本身,是它溢出的东西。以下每条都标注在哪个象限下成立——没有无条件的预言。
The endpoint of speculation is not the design org itself but what spills over from it. Each item below is annotated with the quadrant under which it holds; there are no unconditional prophecies.
New roles: Head of Taste / a split between judgment and execution roles [Taste Premium / Judgment Oligopoly]; "guarding heterogeneity" becomes an evaluable duty [any quadrant once the Sea of the Mean is noticed]; the design-system maintainer becomes the holder of "spec plus judgment carrier" [when system-as-spec holds].
New methods and tools: distance-from-the-mean as a review metric [convergence quadrants]; machine-checkable taste guardrails (block low-level errors, leave the uncheckable to humans) [system-as-spec holds]; hit rate rather than comp volume as evaluation [once judgment roles form].
Second-order effects: employment — the rupture between shrinking execution roles and a scarcified judgment role [all quadrants, intensity scaling with generation capability]; the aesthetic ecosystem — the systemic risk of homogenization and the reverse premium on "made by hand" emerging together [the Sea of the Mean accelerates both ends]; copyright and attribution — when the author is "the one who picks" not "the one who makes," how authorship and liability are assigned [all quadrants].
DSN 11
PLAYBOOK · 落地 / 失败模式 / 自检
PLAYBOOK · LANDING & FAILURE MODES
行动 · 承重
Action · Load-bearing
起步、最常见的误用方式、一件自检
Where to start, the most common ways to get it wrong, one self-check
Bring the volume to an executable landing: four principles, four signal sets, a starting path, plus the three pits designers most often fall into going AI-Native. Then a playable self-check that turns "taste is the scarce judgment" into a list you can run before the next release.
Design system first: stand up tokens / components / red lines before generating; generation without guardrails only spreads slop faster.
生成多、判断严:铺开候选交给机器,挑/评/导留给人;产出量不是指标,品味命中率才是。
Generate many, judge hard: spreading candidates goes to the machine; pick/critique/steer stays with people; output volume is not the metric, taste hit-rate is.
写下"何为好":软判据写给人、硬约束写成 lint;不写下来,生成只会滑回均值。
Write down "what's good": soft criteria for humans, hard constraints as lint; unwritten, generation slides back to the mean.
Hold the human and the heterogeneous: speed is never the standard for winning — being made for specific people is; good design is true for one group, so do not trade it for "fine for everyone."
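Of the hard constraints that can be written as lint, text contrast is the most standard. The sketch below computes the WCAG 2.x contrast ratio from sRGB relative luminance and checks it against the 4.5:1 AA threshold for normal text. The function names are hypothetical, and the sketch assumes six-digit hex colors.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, as defined for WCAG contrast."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05); order of arguments does not matter."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def lint_contrast(fg: str, bg: str, minimum: float = 4.5) -> bool:
    """Hard-constraint lint: True if the pair meets the minimum ratio."""
    return contrast_ratio(fg, bg) >= minimum


if __name__ == "__main__":
    print(round(contrast_ratio("#767676", "#ffffff"), 2))  # 4.54
```

This is the kind of rule that belongs in lint rather than in critique: it has one right answer and a machine can check it on every build. Whether the palette is right for these people stays a soft criterion written for humans.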
AI 是协作者,不是评判者——这条边界决定了谁握最终判断
AI is a collaborator, not the judge — this boundary decides who holds the final call
一个容易滑过去、却决定成败的边界:在设计里,AI 可以是极强的协作者(铺候选、补状态、给建议、甚至模拟某类用户的反应),但不能成为最终的评判者。原因回到 DSN 03·5:评判"这版是否为这群人对路"需要构成性的品味判断,它坐落在可验证性梯度的最远端,AI 给出的"评分"本质仍是对均值的拟合——让它当裁判,等于让均值来定义好坏,那条异质守护的线就会被悄悄抹平。可以让 AI 帮你把判断说得更清楚("这版为什么让你犹豫?是层级、是语气、还是节奏?"),这是协作;但不能让 AI 替你做出那个判断。把这条边界写进团队的工作约定:AI 的输出永远是"候选 + 理由",最终"收哪个"的按钮必须由人按下,并由人说清为什么。一旦让模型既当运动员又当裁判,闭环里那个验证器就被偷换成了均值生成器,整套方法的承重点就塌了。
A boundary easy to skip past yet decisive: in design, AI can be an extremely strong collaborator (spreading candidates, filling states, giving suggestions, even simulating how a class of users might react), but it cannot become the final judge. The reason returns to DSN 03·5: judging "is this version on-target for these people" needs constitutive taste, sitting at the far end of the verifiability gradient, and the "score" AI gives is still at bottom a fit to the mean — making it the referee means letting the mean define good and bad, and the line of heterogeneity-guarding gets quietly erased. You can let AI help you articulate the judgment more clearly ("why does this version make you hesitate — hierarchy, voice, or pacing?"), which is collaboration; but you cannot let AI make that judgment for you. Write this boundary into the team's working agreement: AI's output is always "candidates + reasons," and the final "converge on which" button must be pressed by a human who states why. Once the model is both athlete and referee, the verifier inside the closed loop is swapped for a mean-generator, and the whole method's load-bearing point collapses.
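The working agreement can be made mechanical. In this hypothetical sketch, the model's output is a `Candidate` that always carries reasons. `converge` refuses any `Verdict` that was not issued by a human or that states no reason. The types, the `judge_is_human` flag, and the error choices are illustrative assumptions, not a prescribed tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """AI output: a candidate always travels with its reasons."""
    name: str
    reasons: tuple[str, ...]


@dataclass(frozen=True)
class Verdict:
    """Who pressed 'converge', on which candidate, and why."""
    chosen: str
    judge: str
    judge_is_human: bool
    why: str


def converge(candidates: list[Candidate], verdict: Verdict) -> Candidate:
    """Accept a verdict only if a human made it and stated a reason."""
    if not verdict.judge_is_human:
        raise PermissionError("the model may propose and score, but not converge")
    if not verdict.why.strip():
        raise ValueError("a human verdict must state why")
    by_name = {c.name: c for c in candidates}
    if verdict.chosen not in by_name:
        raise KeyError(f"unknown candidate: {verdict.chosen}")
    return by_name[verdict.chosen]
```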
这条起步路径刻意把"先建护栏"放在"开生成"之前,是因为顺序本身承重。一个常见的失败是反过来:先兴奋地让 AI 铺一堆界面,再回头想"该用什么规范统一它们"——这时你已经被一堆好看但各异的候选淹没,判断力耗在收拾局面上,而不是导向。先建护栏(哪怕极小:三五个 token、两三条红线、一句"为谁而做")意味着第一轮生成就落在窄带里,你的判断从一开始就用在刀刃上。所以"先小"不是保守,是把稀缺的判断力花在最高杠杆的地方——先让护栏与判断在一个小范围内成形、跑通那条闭环,确认判断真的在回流、命中率真的在上升,再扩到更大的面——把第一个完整循环跑通,比一次铺开十个页面重要得多。
This starting path deliberately puts "build guardrails first" before "start generating," because the order itself is load-bearing. A common failure reverses it: excitedly have AI spread a pile of interfaces first, then go back wondering "what standard should unify them" — by which point you are drowning in good-looking but divergent candidates, your judgment spent on cleanup rather than steering. Building guardrails first (even minimal: three to five tokens, two or three red lines, one "for whom" line) means the very first round of generation lands in the narrow band, and your judgment is spent on the cutting edge from the start. So "start small" is not conservatism but spending scarce judgment where leverage is highest — let guardrails and judgment take shape in a small scope first, run the closed loop through, confirm that judgment really flows back and hit-rate really rises, then widen to a larger surface — getting that first complete cycle running matters far more than spreading ten pages at once.
If this volume had to be compressed into one line you could pin on a wall, it would be: generation handles "many," people handle "right"; and the standard for "right" must be written down by people and grow sharper with each round of judgment. That one line holds all four principles — design system first (front-load the standard for "right" as a guardrail), generate many and judge hard (the division of labor: "many" to the machine, "right" to people), write down what is good (the standard must be externalized, or generation can only slide back to the mean), hold the human and the heterogeneous (the final criterion of "right" is "right for these specific people," not "fine for everyone"). It also holds the mirror image of all three failure modes: mistaking "fast" for the win = chasing only "many," forgetting "right"; generation in place of judgment = abandoning the human's responsibility for "right"; treating taste as computable = wrongly believing the standard for "right" can be handed entirely to the machine. Remember this one line, and at any concrete decision ask yourself: in this step am I helping generation produce more, or helping myself state the standard for "right" more clearly? The former the machine can increasingly do for you; the latter is forever your work — and forever where the real value of this craft lies.
① Move existing design into code form (even just tokens + components into the repo) to first gain readable/diffable/generatable. ② Use the self-check below to find the few places "most in need of injected taste." ③ Run the DSN 07 loop there (spec → spread → critique → steer → converge → distill), feeding the first round's judgment back into the system. Start small; let guardrails and judgment take shape, then widen.
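Step ③ can be sketched as a loop under stated assumptions. `generate`, `violates`, and `judge` are hypothetical callbacks standing in for the generation tool, the machine-checkable guardrails, and the human verdict. The spread size of 12 is an arbitrary default. The point of the shape is the end of each round, where the stated reason is distilled into the spec so that the next round is steered by it.

```python
from typing import Callable, Optional

Candidate = dict
Spec = dict


def design_loop(spec: Spec,
                generate: Callable[[Spec, int], list],
                violates: Callable[[Candidate, Spec], bool],
                judge: Callable[[list], tuple],
                rounds: int = 3,
                spread: int = 12) -> tuple[Optional[Candidate], Spec]:
    """spec -> spread -> critique -> steer -> converge -> distill (sketch)."""
    chosen: Optional[Candidate] = None
    for _ in range(rounds):
        candidates = generate(spec, spread)                              # spread
        admissible = [c for c in candidates if not violates(c, spec)]    # machine-checkable critique
        if not admissible:
            continue                                                     # nothing passed the guardrails
        chosen, why = judge(admissible)                                  # human converges and says why
        criteria = spec.get("criteria", []) + [why]
        spec = {**spec, "criteria": criteria}                            # distill; steers the next round
    return chosen, spec
```

The spec is rebuilt rather than mutated. Each round's accumulated criteria are therefore an auditable record of what the humans judged and why, which is the "judgment flowing back" that the starting path asks you to confirm.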
发版前跑一遍:勾掉你这版命中的征兆。命中越多,越滑向均值——读数会给出 slop 分、所处带,与第一处该注入品味的地方。
Run it before release: tick the fingerprints this version hits. The more hits, the closer to the mean. The readout gives a slop score, the band, and the first place to inject taste.
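The readout can be sketched as a weighted tally. The fingerprints, weights, and band thresholds below are hypothetical placeholders, not the checklist this page uses. The sketch only shows the mechanics: the score is the hit share of total weight, the band comes from thresholds, and the heaviest hit is the first place to inject taste.

```python
# Hypothetical fingerprints and weights; placeholders, not the page's actual checklist.
FINGERPRINTS = {
    "default gradient hero": 3,
    "stock three-card feature grid": 2,
    "generic 'seamless' copy": 2,
    "uniform rounded corners everywhere": 1,
    "no stated answer to 'for whom'": 3,
}


def slop_readout(hits: set[str]) -> dict:
    """Score ticked fingerprints; thresholds 0.3 and 0.6 are assumptions."""
    unknown = hits - set(FINGERPRINTS)
    if unknown:
        raise ValueError(f"unknown fingerprints: {sorted(unknown)}")
    total = sum(FINGERPRINTS.values())
    score = sum(FINGERPRINTS[h] for h in hits) / total
    if score >= 0.6:
        band = "near the mean"
    elif score >= 0.3:
        band = "drifting"
    else:
        band = "distinct"
    # Heaviest hit first; ties broken by name so the readout is deterministic.
    first = max(hits, key=lambda h: (FINGERPRINTS[h], h), default=None)
    return {"score": round(score, 2), "band": band, "inject_taste_first": first}
```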
把每类设计决策分诊:哪些交给生成、哪些定成规则、哪些必须人判
Triage each kind of design decision: hand to generation, set as a rule, or judge by human
DSN 08 gave a triage question; here it becomes a playable allocator. It scores along two axes: the horizontal asks "can this kind of decision be generated / ruled?" and the vertical asks "does it need taste judgment?" Cross the two and each kind of decision lands in a cell with a clear verdict — hand to generation, set a design-system rule, or judge by human. The allocator's value is not telling you a single answer but forcing you to break the blanket word "design" into decision by decision and ask these two questions of each — which is itself the process of externalizing taste from intuition. Try running your current project through it: filling states, the primary palette, information hierarchy, brand voice, alignment and spacing — which cell does each land in?
选一类设计决策的两个属性,看它该落在哪个节点。两轴:可生成/可规则化?× 需品味判断?
Pick the two attributes of a kind of design decision and see which node it belongs to. Two axes: generable/rule-able? × needs taste?
① 这类决策可生成 / 可规则化吗? · ① Can it be generated / ruled?
② 它需要构成性的品味判断吗? · ② Does it need constitutive taste?
x=Y · y=N
交给生成
Hand to generation
补全状态、铺响应式、套系统、对齐切图——机器全包。
Filling states, responsive layouts, applying the system, alignment and asset export: the machine does it all.
x=Y · y=Y
设计系统定规则
Design-system rule
可机检但带价值取向:调色板、间距阶、对比度阈值——人先定规则,机器再执行。
Machine-checkable yet value-laden: palette, spacing scale, contrast thresholds — humans set the rule, the machine enforces.
x=N · y=N
需上下文的事实题
Context-fact question
不靠品味但须懂语境:这群用户的真实流程、设备、约束——人查清事实,喂给生成。
No taste but needs context: these users' real flows, devices, constraints — humans establish facts, then feed generation.
x=N · y=Y
必由人判 · 品味
Judged by a human · taste
为谁而做、有没有灵魂、对不对路——可验证性梯度最远端,不可外包给生成。
For whom it is made, whether it has a soul, whether it is on-target: the far end of the verifiability gradient, not outsourceable to generation.
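The allocator's four cells reduce to two booleans. The `triage` function is a hypothetical sketch. The example decisions come from the cell descriptions of the allocator; classifying anything else still requires answering both questions honestly.

```python
def triage(generable_or_ruleable: bool, needs_taste: bool) -> str:
    """Map the allocator's two axes to the node that should own a decision."""
    if generable_or_ruleable and not needs_taste:
        return "hand to generation"       # x=Y, y=N
    if generable_or_ruleable and needs_taste:
        return "design-system rule"       # x=Y, y=Y: humans set it, the machine enforces
    if not needs_taste:
        return "context-fact question"    # x=N, y=N: humans establish facts, feed generation
    return "human taste"                  # x=N, y=Y: not outsourceable


# Examples taken from the cell descriptions above.
EXAMPLES = {
    "filling states": (True, False),
    "palette / contrast thresholds": (True, True),
    "these users' real flows and devices": (False, False),
    "for whom it is made": (False, True),
}

if __name__ == "__main__":
    for decision, axes in EXAMPLES.items():
        print(f"{decision}: {triage(*axes)}")
```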
对一个设计师的职业,这套方法意味着什么
What this method means for a designer's career
把这卷收到个人层面:如果你是一个设计师,这套方法不是在预告你的工作会消失,而是在指出你的价值重心会迁移,且越早主动迁移越有利。会做(出稿、铺变体、对齐切图)的部分会持续贬值,因为它正是生成最擅长接管的;会判断(写有判别力的规格、说清为什么这版对路、把判断回流进系统)和会方向(看出该往哪生成、守住为谁而做的边界)的部分会持续升值,因为它坐落在可验证性梯度上模型够不到的那一端。这意味着值得主动投资的能力,不再是"把某个工具用得更熟",而是"把品味外化得更清楚、把判断说得更有理有据"。一个具体的自检:回顾你上周的工作,花在产出上的时间和花在判断、写规格、给方向上的时间,比例是多少?如果还是前者占绝大多数,那不是因为 AI 没用,而是你还停在旧流程里用更快的手——真正的迁移还没发生。这套方法给的不是工具,是一张把自己的价值重心往上挪的地图。
Bringing this volume down to the individual level: if you are a designer, this method is not foretelling that your work will vanish but pointing out that your center of value will migrate, and the earlier you migrate it deliberately, the better off you are. The making part (producing comps, spreading variants, alignment and export) will keep depreciating, because it is exactly what generation is best at taking over; the judging part (writing discriminating specs, stating why a version is on-target, feeding judgment back into the system) and the directing part (seeing which way to generate, holding the boundary of for-whom) will keep appreciating, because they sit at the end of the verifiability gradient the model cannot reach. This means the capability worth investing in is no longer "getting more fluent with some tool" but "externalizing taste more clearly, stating judgment with more reasoned grounds." A concrete self-check: review last week's work — what is the ratio between time spent on production and time spent judging, writing specs, giving direction? If the former still dominates by far, it is not because AI is useless but because you are still in the old process with a faster hand — the real migration has not happened. What this method offers is not a tool but a map for moving your own center of value upward.
这套迁移有一个常被忽略、却对个人最重要的隐含前提:判断力是会随用而长、随弃而萎的,所以"现在就开始判断"本身就是在投资未来的自己。价值重心上移不是一道一次性切换的开关,而是一条需要持续走的成长曲线——你越早开始在工作里刻意练规格、练评判、练方向,你的判断力就越早进入复利增长;反之,越是抱着"等工具更成熟、等团队都转了我再转"的心态拖延,你就越是在让那条本该增长的曲线停在原地,而周围真正动手的人在拉开差距。这条曲线对个人的残酷与公平都在于:它不奖励你用了多新的工具,只奖励你做了多少次真正的判断、并复盘了多少次为什么。所以这一卷给设计师的最后一句话不是"去学某个 AI 工具",而是:从你手上正在做的下一个设计开始,强迫自己在每个该判断的地方真的判断一次、并写下为什么——这一个动作,重复足够多次,就是你在 AI-Native 时代最可靠的护城河。
This migration has an implication often overlooked yet most important to the individual: judgment grows with use and atrophies with disuse, so "start judging now" is itself an investment in your future self. Moving your center of value upward is not a one-time switch but a growth curve you must keep walking — the earlier you start deliberately practicing spec, critique, and direction in your work, the earlier your judgment enters compounding growth; conversely, the more you procrastinate with "I'll switch once the tools mature, once the team has moved," the more you let that curve that should be growing stall in place while the people actually doing the work pull ahead. This curve is both cruel and fair to the individual: it rewards not how new your tools are but how many real judgments you made and how often you reviewed why. So this volume's last word to designers is not "go learn some AI tool" but: starting with the very next design in your hands, force yourself to actually judge once at each place a judgment is due and write down why — this one act, repeated enough times, is your most reliable moat in the AI-Native era.
To bring the whole volume to one takeaway line: AI made "making" cheap, so "judging what to make and for whom" became, for the first time, design's real and nearly entire value. These two used to be entangled — those who made well usually judged well, so no one needed to discuss judgment on its own; now that execution is stripped to generation, judgment is forced to take independent shape, and only now are its scarcity, its trainability, its dependence on the human seen this clearly. Everything this volume did was to restore that stripped-out judgment from "mysticism hidden in intuition" to "a concrete act that can be decomposed, externalized, fed back, and practiced." And what it ultimately points to is the claim running through the whole series: the machine took over the repetitive surface; the person was returned to what was theirs to do from the start and most worth doing — making, for specific people, something that genuinely exists for them.
收束 · 系列人本主线 · Close · the series' human through-line
这一卷是整个系列人本主线落得最具体的一面——AI-Native 设计不是让人更快地产 slop,而是把设计师还给共情、品味与意义:为具体的人,做真正为他们而存在的东西。
This volume is where the series' human through-line lands most concretely. AI-Native design is not producing slop faster; it returns the designer to empathy, taste, and meaning: making, for specific people, something that genuinely exists for them.
DSN 09
WORKED · 落到产物上
WORKED CASES
案例 · 走一遍
Cases · run it
把内核四步,按到四个真实产物上
Pressing the four-step kernel onto four real artifacts
Principle that stays principle is just nice talk. This section presses the same kernel — execution to generation, judgment back to people, context written as spec, people returning to "for whom" — onto four concrete artifacts: a checkout-flow redo, a "good" written as a spec, a choice to drop the many for the few, and a shipped slop product rescued. Each gives a checkable before/after and the one cut that stayed in human hands.
案例 A · 结账流程在近免费生成下重做 · CASE A · A checkout flow, redone under near-free generation · 铺开→收敛 · 真实跑法 · spread→converge, as it ran
A mid-size e-commerce checkout whose chronic wound was a high abandonment rate at step three (address + delivery). Traditional path: a week of interaction comps, a review, slicing, two weeks of front-end handoff; one candidate, and a bet that it is right. The AI-native path: with constraints written down, generation spread nine structurally different candidates in one afternoon. They were single-page expanded, three-step paged, accordion, address-autofill-then-confirm, delivery-first, freight-shown-early, guest-first, wallet-first, and hybrid. Production stopped being the bottleneck; which one, and why, became it.
内核①②
KERNEL ①②
执行(出九稿)交给生成;判断(选哪个)退守到人。
Execution (nine comps) to generation; judgment (which) back to people.
The nine were not picked by "which looks nicer." The team first established the real cause of abandonment: session replay showed that most people stalled because the freight cost appeared only at the last step, not because the layouts were busy. That single fact turned nine-into-one from an aesthetic question into a factual one. Every "freight-early" candidate advanced; the rest, however pretty, were cut. The three finalists then ran a week of gated A/B testing, and the hybrid with freight-early plus address autofill dropped abandonment from 31% to 19%〔source: figures here are a de-identified retrospective range, grade Ⅳ practitioner first-hand, not a public experiment, not extrapolated to other categories〕[R5].
留在人手里
STAYED HUMAN
把"为什么弃单"查成事实,再用事实当判据筛候选——这一刀机器递不出。
Establishing "why they abandon" as fact, then using fact as the cut — a cut the machine cannot hand you.
Note that this is not "generation helped us finish one comp faster." The process changed shape: spread the candidate space first, collapse most of it with one established fact, then run a controlled comparison among only three. Generation did all of the "making"; people did all of the "which one, on what grounds" judgment. Two weeks of handoff compressed into one afternoon plus a week of gated testing, and the real gain was not saved labor but the shift from betting on one candidate to spreading nine and converging on evidence.
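The collapse step of Case A can be sketched as a filter: tag each generated candidate with the attribute the established fact speaks to, keep only the candidates that satisfy it, and refuse to proceed if the fact fails to narrow the field. This is a minimal sketch, not the team's tooling: the `Candidate` record and its `shows_freight_early` flag are hypothetical, and because the case does not say which of the nine showed freight early, the flags below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    """One generated checkout variant (hypothetical record for illustration)."""
    name: str
    shows_freight_early: bool  # does the shipping cost appear before the final step?


# An established fact, expressed as a predicate every candidate must satisfy.
Fact = Callable[[Candidate], bool]


def collapse(candidates: list[Candidate], fact: Fact, max_shortlist: int = 3) -> list[Candidate]:
    """Cut the spread with one fact; only the survivors earn a controlled comparison."""
    survivors = [c for c in candidates if fact(c)]
    if len(survivors) > max_shortlist:
        raise ValueError("the fact did not narrow enough; establish a sharper one before testing")
    return survivors


# The nine structures from Case A; the freight flags are illustrative assumptions.
spread = [
    Candidate("single-page expanded", False),
    Candidate("three-step paged", False),
    Candidate("accordion", False),
    Candidate("address-autofill-then-confirm", False),
    Candidate("delivery-first", True),
    Candidate("freight-shown-early", True),
    Candidate("guest-first", False),
    Candidate("wallet-first", False),
    Candidate("hybrid", True),
]

shortlist = collapse(spread, lambda c: c.shows_freight_early)
print([c.name for c in shortlist])
```

Raising an error when the fact leaves too many survivors is a deliberate choice in this sketch: it sends the team back to evidence rather than into an A/B test of nine options.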
内核③④
KERNEL ③④
事实写进筛选规则(③);为真实用户的处境而判(④)。
Fact written into the screening rule (③); judging for real users' situation (④).
FIG. D9 / SPREAD-THEN-COLLAPSE · how nine candidates collapse on one fact. Read: the candidate space spreads wide, then one fact, "freight early," cuts it to a shortlist.
The funnel is not "the more generation gives, the better." The wide mouth is generation's contribution (nine structurally distinct candidates, near-zero cost); the narrow mouth is the human's (one established fact compresses an aesthetic question into a factual one). Only the shortlist gets controlled comparison. The machine makes "making" nearly free; the human judges "which one" on grounds — that is the whole of spread-then-converge.
案例 B · 把模糊的"高级感"写成可生成、半可机检的规格
CASE B · Turning a fuzzy "premium feel" into a generatable, half machine-checkable spec
品味写成规格
taste as spec
A fintech app redesign; the boss's entire brief was one phrase: "make it premium." Historically a senior designer would "draw out" what that meant, with the line between good and bad living unspoken in one head and impossible to teach. Under near-free generation this brief is toxic: hand "premium" to a model and you get the most common "premium" in its distribution (dark + serif + big whitespace + gold trim), the look every fintech app already wears. A fuzzy "good" feeds back mean slop by construction.
病灶
THE WOUND
模糊判据 + 廉价生成 = 自动收敛到均值。
Fuzzy criteria + cheap generation = auto-converge to the mean.
The fix is not a designer who "gets it" better; it is decomposing "premium" into items anyone (and any machine) can rule on one by one. The team ran three rounds of point-and-ask with the boss: take ten existing screens and ask, one at a time, "is this premium, and why?" After three rounds, "premium" decomposed into six rule-able items: ① low information density but not empty (at most one primary action per screen); ② only two type sizes, with emphasis carried by font weight, not color; ③ primary color drawn from the brand, not generic gold, with contrast ≥ 4.5:1; ④ no gradient text, no glassmorphism; ⑤ numbers in monospace, right-aligned; ⑥ motion only for state confirmation, ≤ 200ms. The first four go straight into a lint; the last two are soft criteria left for human review. "Premium" went from one person's private mysticism to a spec you can feed to generation and accept half-automatically.
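Rules ① to ④ are concrete enough to run as a lint. The sketch below assumes each screen has already been reduced to a hypothetical `Screen` summary; the field names and the two-color brand palette are illustrative, not taken from the case. Contrast uses the WCAG 2.x relative-luminance formula, and for rule ② the sketch checks only the count of type sizes, leaving "weight, not color" to review.

```python
from dataclasses import dataclass


def _linear(channel: int) -> float:
    """sRGB channel (0-255) to linear light, as in WCAG 2.x relative luminance."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast(fg: str, bg: str) -> float:
    """WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05)."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


@dataclass
class Screen:
    """Hypothetical per-screen summary extracted before linting."""
    primary_actions: int
    font_sizes: set[int]
    primary_color: str
    text_color: str
    background: str
    gradient_text: bool = False
    glassmorphism: bool = False


BRAND_PALETTE = {"#0b3d2e", "#1f6f5c"}  # illustrative brand colors, not from the case


def lint_premium(s: Screen) -> list[str]:
    """Rules 1-4 of the decomposed "premium" spec; rules 5-6 stay with human review."""
    errors = []
    if s.primary_actions > 1:
        errors.append("1: more than one primary action")
    if len(s.font_sizes) > 2:
        errors.append("2: more than two type sizes")  # "weight, not color" left to review
    if s.primary_color.lower() not in BRAND_PALETTE:
        errors.append("3: primary color not from the brand palette")
    if contrast(s.text_color, s.background) < 4.5:
        errors.append("3: text contrast below 4.5:1")
    if s.gradient_text or s.glassmorphism:
        errors.append("4: gradient text or glassmorphism")
    return errors


ok = Screen(1, {14, 24}, "#0B3D2E", "#111111", "#FFFFFF")
bad = Screen(3, {12, 14, 20}, "#C9A227", "#999999", "#FFFFFF", gradient_text=True)
print(lint_premium(ok))   # []
print(lint_premium(bad))  # five errors
```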
两层规格
TWO LAYERS
①–④ 进 lint(可机检);⑤⑥ 留人(软判据)。判断一次,执行无数次。
①–④ to lint (machine-checkable); ⑤⑥ to humans (soft). Judge once, enforce countless times.
Those three rounds of point-and-ask were themselves the most expensive, least outsourceable work in the project — they forced one person's tacit taste into explicit, transferable, half-machine-checkable criteria. This is exactly what the volume keeps saying: pull judgment out of making and write it down. Taste used to live in that designer's hands and leave when he left; now it is six written rules a new hire and a generation model can both run against. A spec does not constrain creativity; it freezes the reusable part of judgment and leaves the unreusable part — why this brand color — for humans to keep judging.
内核②③
KERNEL ②③
判断退守(②)后,被写成上下文规格(③)喂回生成。
After judgment retreats (②), it is written as context spec (③) fed back to generation.
FIG. D10 / GRAVITY vs FORCE · mean-gravity against the heterogeneity force. Read: generation pulls design toward the distribution mean by default; only a human counter-force lands it on "true only for these people."
This is the force diagram behind the whole volume. Mean-gravity is always-on and free: a generation model optimizes "right for many" and naturally pulls every artifact toward the shape most people have seen. The heterogeneity force is human-supplied and costly: it demands "particular to these people," pointing the opposite way. Where the artifact rests is set by the balance — remove the human force and it slides back to the mean. So homogenization is not one slip but the default when no force is applied; and heterogeneity-guarding is not a nicety but the hand holding the artifact where it belongs.
案例 C · 选择"只对这群人成立",并付出代价
CASE C · Choosing "true only for these people," and paying the cost
异质守护的一次决策
a heterogeneity-guarding decision
两条路摆在面前
Two roads
一个给视障人群用的播客 App。改版时摆出两条路:路 A,按通用最佳实践做——大图卡片、瀑布流、自动播放预览,这是生成默认会给的、也是评审会上最容易过的方案,因为它"看起来对";路 B,为这群人的真实处境做——主屏只有三个超大触控区、全程可纯键盘/读屏操作、关闭一切自动播放、用声音而非视觉做状态反馈、对比度拉到 7:1。路 B 在任何"通用美观"的评审标准下都会扣分:它不好看、不"现代"、留白少、信息密度高。
A podcast app for blind and low-vision users. The redesign laid out two roads. Road A: build to generic best practice (big image cards, infinite scroll, autoplay previews), the default that generation would give and the easiest to pass review, because it "looks right." Road B: build for these users' real situation, with a home screen of just three oversized touch zones, full keyboard and screen-reader operation, all autoplay off, state feedback by sound rather than sight, and contrast pushed to 7:1. Road B loses points under any "generic good-looking" review standard: it is not pretty, not "modern," has little whitespace, and carries high information density.
分叉
THE FORK
通用均值(A)对真实人群(B)——只能选一个当北极星。
Generic mean (A) vs real people (B) — only one can be the north star.
The team chose road B and named the cost outright: in app-store screenshots aimed at sighted users it "sells" poorly, with download conversion below peers; in brand work, marketing kept wanting to make the three big touch zones "more refined." These costs were not imagined; they showed up monthly in data and meetings. But the judgment that stayed human was this: this product's "good" is defined not by generic aesthetics but by whether the people it serves can finish an episode independently. In usability testing, screen-reader-only success at "find → play → save" rose from 41% before the redesign to 92% after〔source: de-identified retrospective; the accessibility-usability range cites the same order of magnitude as the WebAIM screen-reader user survey, grade Ⅳ first-hand + Ⅱ survey reference, not extrapolated to a generic conversion claim〕[R6].
留在人手里
STAYED HUMAN
定义"为谁好",并承担"对别人不好"的代价。机器不会替你认这个账。
Defining "good for whom," and bearing the cost of "not good for others." The machine will not own that bill for you.
Heterogeneity-guarding is not a free slogan; every time, it costs something real: giving up part of the generic audience, resisting pressure to "make it more universal," conceding on certain generic metrics. But that is precisely the most irreplaceable cut humans hold in AI-native design. Generation will forever pull you toward "fine for everyone," and only a human can decide "I will be best for these people, even if worse for others." Surrender that cut and the product can never return to being made for the people it serves; it can only offend no one and serve no one, in equal measure.
内核④
KERNEL ④
人回到"为谁而做"的意义判断——并为之承担取舍。
People return to the meaning judgment of "for whom" — and own the trade-off.
案例 D · 一个"快但是 slop"的已上线产品,怎么诊断、怎么救回
CASE D · A shipped-fast-but-slop product: diagnosing and rescuing it
slop 急救
slop rescue
A SaaS dashboard the team shipped in three days with generation tools — fast, genuinely fast. But two weeks in, user feedback was eerily uniform: "it looks professional, but I can't say what's wrong; I just don't want to use it." Day-7 retention sank to 11%. This is the textbook slop symptom: no obvious bugs, every page "fine" alone, yet hollow, samey, and weightless together. It was not built badly; it was built too smoothly, too mean-ward.
slop ≠ bug
SLOP ≠ BUG
没有错,只是没有"为谁"——这是诊断的入口。
Nothing is wrong; there is just no "for whom" — the entry point for diagnosis.
The first rescue step is not a rebuild but diagnosis. The team took the slop-fingerprint list from the earlier SHEET (samey palettes, glassmorphism, rounded-everything + soft shadows, hollow slogan words, all cards the same size, weight carried by color rather than boldness) and checked the dashboard item by item: five hits. The hits themselves named the cause: all five are features that "generation gives by default when no one tells it not to." The diagnosis fit in one line: every pixel honors "generic professionalism," and not one pixel was made for the real user, the operator who looks ten times a day and cares about three numbers.
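The fingerprint pass can be written as a table of named predicates run over a design's extracted properties, so "can't say what's wrong" comes back as a list of names. The six fingerprints are the ones listed above; the property names, the word list, and every threshold (for example, what radius counts as "rounded everywhere") are hypothetical assumptions for illustration.

```python
import re
from typing import Any, Callable

Design = dict[str, Any]
HOLLOW_WORDS = {"seamless", "powerful", "next-gen", "revolutionary"}  # hypothetical list

# The six slop fingerprints from Case D; property names and thresholds are assumptions.
FINGERPRINTS: dict[str, Callable[[Design], bool]] = {
    "samey palette": lambda d: d["distinct_hues"] <= 2,
    "glassmorphism": lambda d: d["blur_layers"] > 0,
    "rounded-everything + soft shadows": lambda d: d["min_radius_px"] >= 12 and d["soft_shadows"],
    "hollow slogan words": lambda d: bool(
        HOLLOW_WORDS & set(re.findall(r"[a-z-]+", d["copy"].lower()))),
    "all cards the same size": lambda d: len(set(d["card_sizes"])) == 1,
    # A single font weight means emphasis cannot come from boldness.
    "weight carried by color, not boldness": lambda d: len(d["font_weights"]) == 1,
}


def fingerprint_hits(design: Design) -> list[str]:
    """Return the name of every fingerprint present in the design."""
    return [name for name, present in FINGERPRINTS.items() if present(design)]


# A hypothetical description of the Case D dashboard.
dashboard = {
    "distinct_hues": 2,
    "blur_layers": 3,
    "min_radius_px": 16,
    "soft_shadows": True,
    "copy": "Seamless insights, powerful decisions",
    "card_sizes": [(320, 180)] * 7,
    "font_weights": {400, 600},
}
print(fingerprint_hits(dashboard))
```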
诊断工具
THE TOOL
指纹清单把"说不出哪不对"翻译成五条可指认的具体征兆。
The fingerprint list translates "can't say what's wrong" into five nameable, concrete symptoms.
Rescue is not "generate a prettier version" — that just swaps one slop for another. The fix restores the "for whom": shadow three real operators, confirm they watch only three numbers daily — today's conversion, anomalous orders, open tickets; on that basis delete 80% of the home-screen cards, keep only those three, make them big, contrasted, "glanceable for whether anything's wrong"; strip every decorative gradient and glass layer and spend the freed visual budget entirely on those three numbers. The rebuild took just two days (generation is still cheap), but those two days sat behind a week of judgment: establish for whom, define this product's "good." Day-7 retention rose from 11% to 34% after the redesign〔source: de-identified retrospective range, grade Ⅳ practitioner first-hand〕[R5].
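The pruning rule behind the rebuild can be written down so that it rests on observation rather than on whoever argues loudest. A minimal sketch, assuming shadowing notes are recorded as the set of cards each operator actually looked at; the four extra card names, the notes, and the `min_share` threshold are hypothetical.

```python
from collections import Counter


def keep_watched(cards: list[str], observations: list[set[str]], min_share: float = 1.0) -> list[str]:
    """Keep only the cards that at least `min_share` of observed operators looked at."""
    counts = Counter(card for watched in observations for card in watched)
    return [card for card in cards if counts[card] / len(observations) >= min_share]


# Seven "average" cards: the three named in Case D plus four hypothetical ones.
cards = ["MAU trend", "today's conversion", "revenue YTD", "anomalous orders",
         "traffic sources", "open tickets", "NPS"]

# Hypothetical shadowing notes for three operators.
observations = [
    {"today's conversion", "anomalous orders", "open tickets", "traffic sources"},
    {"today's conversion", "anomalous orders", "open tickets"},
    {"today's conversion", "anomalous orders", "open tickets", "revenue YTD"},
]

print(keep_watched(cards, observations))
```

Lowering `min_share` is where judgment re-enters: it decides whether a card seen by one operator in three still earns a place on the home screen.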
FIG. D11 / SLOP RESCUE · not repaint, restore the "for whom." Read: the decisive move in rescuing slop happens before generation: restore judgment, then let generation execute.
Both "drawings" are generated — cheap, fast. The whole difference is the black box between them: the decisive move in rescuing slop is not another comp but restoring the skipped judgment — establish for whom, and what they alone care about. The pixel change in the before/after (seven average cards → three prioritized numbers) is only the result; the real work was that week of judgment. This is why "generate a prettier one" never rescues slop: it skips exactly this middle step.
DSN
10
CRITIQUE · 旧结构
OLD STRUCTURES
批判 · 受力点
Critique · where it breaks
六种传统设计结构,在生成廉价时各自从哪一处断裂
Six traditional design structures, and where each one snaps when generation gets cheap
These structures are not wrong because they are old. They were once sound, and sound precisely because "making" was expensive. The moment generation crushes the cost of "making" to near zero, each snaps at a specific load-bearing point: each puts people in the execution seat, and execution is exactly the half the machine takes over.
The shared break: the six structures look different but snap at the same place — each defines human value by "the speed or polish of output." When generation does both fast and well, the roles, flows, and orgs built on them lose their load-bearing wall at once. Each is named below: why it was once sound, where it snaps now, and what it must become under AI-native.
旧结构 ① · 装饰工 / 最后一公里美化
OLD ① · DESIGN-AS-DECORATION / LAST-MILE PRETTIFIER
"功能先做完,最后叫设计来美化一下"
"Build the function first, call design at the end to pretty it up"
Once sound: prettifying cost hours, so it needed a dedicated person. Where it snaps: prettifying is exactly what generation does best — ten "nicer" versions instantly. When prettifying is free, casting humans as prettifiers pins them on the machine's strongest square. Becomes: humans do not do the last-mile polish but the first-mile judgment — for whom, what good means, which version is on-target.
旧结构 ② · 孤胆天才 / 作者式设计师
OLD ② · THE LONE-GENIUS AUTEUR
"好设计出自一个有品味的天才之手"
"Good design flows from one tasteful genius's hand"
Once sound: taste was tacit, lived in the hand, only that person could produce it. Where it snaps: the hand's work has gone to generation; the genius's "hand" is no longer scarce — scarce is the ability to write taste down. An auteur who never externalizes criteria sees his taste zero out when he leaves, and it cannot be fed to generation. Becomes: from "producing taste by hand" to "writing taste as transferable, half-machine-checkable spec" — see Case B.
旧结构 ③ · 出稿—交付—甩给开发的瀑布
OLD ③ · MOCKUP-THEN-HANDOFF WATERFALL
"设计师画死稿,标注好,扔过墙给前端实现"
"Designer freezes a comp, annotates it, throws it over the wall to front-end"
Once sound: drawing and coding were two expensive skills; splitting them saved cost. Where it snaps: when the artifact itself becomes code (design-as-code, see DSN 02), the wall of "comp → annotate → translate to code" adds a gratuitous lossy translation that bleeds information and breeds bugs each pass. Generation produces runnable artifacts directly, so the cross-wall "translation" becomes pure waste. Becomes: design and implementation co-evolve in one medium (code/tokens), with no over-the-wall step.
旧结构 ④ · 像素级画板文化
OLD ④ · PIXEL-PERFECT ARTBOARD CULTURE
"把每一帧每一态都在画板上逐像素固定"
"Nail every frame and state to the pixel on the artboard"
Once sound: implementation was expensive and a change costly, so you nailed everything on the artboard to cut rework. Where it snaps: when generation produces every responsive state and variant in minutes, spending human-hours nailing one frame to the pixel spends scarce judgment on details the machine fills offhand. The artboard's "perfect frame" also lies — it often fails under real data, real devices, real edge cases. Becomes: define constraints and criteria (spec, tokens, guardrails), let generation spread all states, and have humans accept "on-target?" not "pixels aligned?"
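Under the new prescription, people define the dimensions and the acceptance criteria, and the full state matrix is enumerated instead of being hand-nailed frame by frame. A minimal sketch with hypothetical dimension names and values; in practice each combination would be rendered by generation and accepted against the criteria, which this sketch does not attempt.

```python
from itertools import product

# Hypothetical dimensions a human defines once; generation renders every combination.
DIMENSIONS = {
    "breakpoint": ["mobile", "tablet", "desktop"],
    "state": ["empty", "loading", "loaded", "error"],
    "data": ["short copy", "real-length copy", "overflow"],
}


def state_matrix(dims: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination of every dimension: the frames nobody should nail by hand."""
    keys = list(dims)
    return [dict(zip(keys, combo)) for combo in product(*dims.values())]


matrix = state_matrix(DIMENSIONS)
# Human acceptance attention goes first to the combinations most likely to break.
edge_cases = [m for m in matrix if m["state"] == "error" or m["data"] == "overflow"]
print(len(matrix), len(edge_cases))  # 36 18
```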
旧结构 ⑤ · 设计当内部服务台
OLD ⑤ · DESIGN AS INTERNAL SERVICE DESK
"业务提需求 → 设计接单出图 → 计件交付"
"Business files a ticket → design takes the order and ships comps → piecework delivery"
Once sound: comps were scarce capacity, and queued ticketing maximized utilization of a scarce resource. Where it snaps: comps are no longer scarce, and the whole "take-ticket-ship-comp" value chain collapses — business can produce comps with generation itself. A design team that stays a service desk guards precisely the capacity that is now free, while ceding the truly scarce thing (the judgment of for-whom and what-good-means) to people who do not judge it. Becomes: from "take the ticket, ship the comp" to "define and guard the criteria" — no longer the capacity bottleneck but the owner of taste and responsibility.
OLD ⑥ · THE FUZZY BRIEF
Once sound: a fuzzy brief was fine because a designer sat in the middle who would interrogate it and turn fuzz into specifics by hand. Where it snaps: when the brief feeds straight into generation, the fuzz is no longer digested by a human but filled in by the model with the mean of its training distribution: "premium" returns the most common premium, "modern" the most common modern. Fuzzy brief × cheap generation = automatic slop (see Cases B and D). Becomes: a brief must first be decomposed into rule-able criteria (which holds, and for whom), and a human must translate a fuzzy "good" into a generatable spec before it ever reaches generation.
FIG. D12 / SEAT SWAP · old structures seat humans in execution; the new one moves them to judgment. Read: all six old structures snap at the same place: humans sit on the machine's strongest square, execution.
Overlay the six old structures on one diagram and their break points coincide: each seats the human in the left column — execution, exactly the half generation does fast and well. They are not six independent bad habits but six variants of one error: defining human value by output. The fix has only one direction: move the human from the left column to the right, to the judgment the machine cannot answer. This diagram is the skeleton of all of DSN 10, and why the volume keeps saying "judgment retreats to people."
Worth stating plainly: naming old structures does not say execution workers have no value, nor that anyone is fired tomorrow. It says that when a structure anchors human value to a capacity that is now free, that structure first loses explanatory power, then its reason to exist. A team can keep using Figma artboards, keep senior designers, keep taking business requests — as long as the value anchor moves from "ships fast, draws fine" to "judges true, guards for-whom." The target of the critique was never the tool or the role; it is the buried assumption that pins humans to the execution seat.
DSN
11
TOOLKIT · 可照做
DO-THIS TOOLKIT
工具 · 拿去用
Tools · take and use
把"何为好"变成今天就能跑的工具,而非口号
Turning "what good means" into tools you can run today, not slogans
The design surface affords more operable tools than the org surface, because its artifact is concrete and half machine-checkable. This section gives five tools that are not concepts but "run it today": write design tokens as code, build the design system as a two-layer guardrail, decompose taste into a scorecard, write "spread candidates" as a reusable protocol, and make "generation × taste" a coordinate you can place a decision on. Each comes with criteria, not slogans.
工具一 · 设计令牌即代码:让"好"可被机器执行
Tool 1 · Design tokens as code: making "good" machine-enforceable
Design tokens abstract every visual decision (color, spacing, type size, radius, shadow, motion duration) into named variables, stored in a machine-readable format (JSON / CSS variables / platform-native), then injected into every artifact by the build chain. Their importance grows in AI-native work: when generation spits out countless variants, tokens are one of the few anchors that keep mass generation consistent. A human judges once ("the primary is this, spacing in multiples of 8, contrast no lower than 4.5:1"), and all generation is forced to inherit that judgment through the tokens. Judge once, enforce countless times: this is the landed form of kernel ②③〔source: W3C Design Tokens Community Group draft and the token practice of major design systems (Material / Carbon / Polaris), grade Ⅳ industry practice〕[R7].
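A minimal sketch of tokens as code, assuming a token file in the general shape of the W3C Design Tokens Community Group draft (nested groups, with `$value` and `$type` on each token) and string dimensions such as "8px"; the draft has revised its value formats over time, so treat the exact keys and value shapes as assumptions. The script flattens the tree into CSS custom properties and enforces the "multiples of 8" spacing judgment at build time. Token names and values are illustrative.

```python
import json

# Illustrative tokens in a DTCG-draft-like shape: nested groups, "$value"/"$type" on leaves.
TOKENS = json.loads("""
{
  "color": {
    "primary": {"$value": "#0b3d2e", "$type": "color"},
    "text":    {"$value": "#111111", "$type": "color"}
  },
  "space": {
    "sm": {"$value": "8px",  "$type": "dimension"},
    "md": {"$value": "16px", "$type": "dimension"}
  }
}
""")


def flatten(tree: dict, path: tuple = ()) -> dict[str, dict]:
    """Walk nested groups; any node that carries "$value" is a token."""
    out: dict[str, dict] = {}
    for key, node in tree.items():
        if key.startswith("$"):
            continue  # group-level metadata such as "$type" or "$description"
        if "$value" in node:
            out["-".join(path + (key,))] = node
        else:
            out.update(flatten(node, path + (key,)))
    return out


def off_grid_spacing(tokens: dict[str, dict], step: int = 8) -> list[str]:
    """The one-time judgment "spacing in multiples of 8", enforced on every build."""
    return [name for name, token in tokens.items()
            if name.startswith("space-") and int(token["$value"].removesuffix("px")) % step]


def to_css(tokens: dict[str, dict]) -> str:
    """Emit CSS custom properties that every generated artifact must reference."""
    body = "\n".join(f"  --{name}: {token['$value']};" for name, token in tokens.items())
    return ":root {\n" + body + "\n}"


flat = flatten(TOKENS)
if off_grid_spacing(flat):
    raise ValueError(f"spacing off the 8px grid: {off_grid_spacing(flat)}")
print(to_css(flat))
```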
没有令牌 · 生成各跑各的
No tokens · generation drifts
十个页面十种蓝、间距随手取、对比度时好时坏——每版单看还行,合起来散。一致性靠人逐页盯,盯不过来。
Ten pages, ten blues, spacing picked offhand, contrast hit-or-miss — each page fine alone, incoherent together. Consistency rides on a human checking page by page, and they cannot keep up.
有令牌 · 判断一次被强制继承
With tokens · one judgment force-inherited
Primary color, spacing scale, and contrast threshold are defined once, and all generation reads from the tokens. Change one token and the whole product updates. The time once spent policing consistency page by page goes entirely into "why is this primary color right."
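Changing one token updates the whole product because semantic and component tokens point at a base token instead of repeating its value. A minimal sketch, assuming DTCG-draft-style curly-brace aliases such as `{color.primary}` stored in a flat name-to-value map; the token names are hypothetical.

```python
import re

ALIAS = re.compile(r"^\{([^{}]+)\}$")  # a whole value like "{color.primary}"


def resolve(tokens: dict[str, str], name: str, seen: frozenset = frozenset()) -> str:
    """Follow alias chains down to a literal value, refusing cycles."""
    if name in seen:
        raise ValueError(f"alias cycle through {name!r}")
    value = tokens[name]
    match = ALIAS.match(value)
    return resolve(tokens, match.group(1), seen | {name}) if match else value


tokens = {
    "color.brand.green": "#0b3d2e",          # base token: the value a human judged once
    "color.primary": "{color.brand.green}",  # semantic token
    "button.bg": "{color.primary}",          # component tokens
    "link.fg": "{color.primary}",
}
print({n: resolve(tokens, n) for n in ("button.bg", "link.fg")})

tokens["color.brand.green"] = "#1f6f5c"  # change one base token...
print({n: resolve(tokens, n) for n in ("button.bg", "link.fg")})  # ...and every consumer follows
```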
工具二 · 设计系统即两层护栏:硬约束 + 软判据
Tool 2 · The design system as a two-layer guardrail: hard rules + soft criteria
A design system should not be an unread spec document but a two-layer guardrail. Layer A is hard rules (HARD-RULES), machine-checkable and lint-able, where any hit blocks the artifact: ban gradient text, draw the palette from tokens, cap blur layers, keep a font allowlist (excluding ubiquitous defaults such as Inter and Roboto), enforce a contrast threshold, require card size to vary with content weight, and allow zero hits on "hollow slogan words." Layer B is soft criteria (SOFT-CRITERIA), not machine-checkable and reviewed by humans only: "why is this primary color the one," "why is this typeface right for these people," "does this motion mean anything." Layer A keeps generation inside the red lines and frees humans from endless low-level consistency checks; Layer B funnels human attention to the few places that truly need judgment. One hard, one soft, mapping exactly onto the two ends of the verifiability gradient (see key figure FIG. D0).
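The two layers can be wired so that Layer A decides mechanically and Layer B only ever produces questions for a person. A minimal sketch covering a subset of the Layer-A rules listed above (palette-from-tokens and contrast are omitted for brevity); the `Artifact` fields, the font allowlist, the blur cap, and the word list are hypothetical assumptions.

```python
from dataclasses import dataclass

ALLOWED_FONTS = {"Source Serif 4", "IBM Plex Mono"}       # hypothetical allowlist
MAX_BLUR_LAYERS = 1                                       # hypothetical cap
HOLLOW_WORDS = {"seamless", "revolutionary", "next-gen"}  # hypothetical list

# Layer B: questions only a person can answer; the machine just asks them.
SOFT_CRITERIA = (
    "Why is this primary color the one?",
    "Why is this typeface right for these people?",
    "Does this motion mean anything?",
)


@dataclass
class Artifact:
    fonts: set[str]
    blur_layers: int
    gradient_text: bool
    copy_words: set[str]
    card_sizes: list[int]       # e.g. card heights in px
    content_weights: list[int]  # relative importance of each card's content


def layer_a(a: Artifact) -> list[str]:
    """Hard rules: machine-checkable, and any hit blocks the artifact."""
    hits = []
    if a.gradient_text:
        hits.append("gradient text")
    if not a.fonts <= ALLOWED_FONTS:
        hits.append(f"fonts off the allowlist: {sorted(a.fonts - ALLOWED_FONTS)}")
    if a.blur_layers > MAX_BLUR_LAYERS:
        hits.append(f"{a.blur_layers} blur layers (cap {MAX_BLUR_LAYERS})")
    if a.copy_words & HOLLOW_WORDS:
        hits.append(f"hollow slogan words: {sorted(a.copy_words & HOLLOW_WORDS)}")
    if len(set(a.card_sizes)) == 1 and len(set(a.content_weights)) > 1:
        hits.append("card size ignores content weight")
    return hits


def review(a: Artifact) -> dict:
    """Layer A gates mechanically; only a clean artifact reaches Layer B."""
    hits = layer_a(a)
    if hits:
        return {"status": "blocked", "hard_hits": hits, "for_human": []}
    return {"status": "needs human review", "hard_hits": [], "for_human": list(SOFT_CRITERIA)}


draft = Artifact(fonts={"Inter"}, blur_layers=3, gradient_text=True,
                 copy_words={"seamless", "dashboard"},
                 card_sizes=[240] * 4, content_weights=[3, 1, 1, 1])
print(review(draft)["hard_hits"])
```

The design choice to return no soft questions for a blocked artifact mirrors the clamp: human attention is spent only on drafts that have already cleared the red lines.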
FIG. D13 / TWO-LAYER CLAMP · hard rules clamp generation, soft criteria stay human. Read: Layer-A hard rules clamp generation inside the red lines; judgment past the clamp goes to the human in Layer B.
A two-layer guardrail is not a thicker spec but a spec cut in half and enforced two different ways: the machine-checkable half (Layer A) becomes a lint that clamps mass generation inside the red lines at zero cost, untouched by humans; the non-checkable half (Layer B) is marked explicitly as soft criteria, funneling human attention to the few questions that truly need judgment. The point of the clamp: it stops "consistency" from consuming human judgment, leaving all of that judgment for the constitutive questions the clamp cannot catch.
工具三 · 品味评分卡:把"好"拆成可逐条判的判据
Tool 3 · The taste scorecard: decomposing "good" into item-by-item criteria
"Taste" sounds mystical, but on a concrete artifact it almost always decomposes into a set of item-by-item criteria. Below is a generic taste-scorecard skeleton — each line asks a specific yes/no/partial question, not a holistic "is it pretty." Usage: run a candidate through line by line and record hits; what you get is not a single score but a diagnostic table of "where it's right, where it's not, and why." It forces "I feel it's off" into "lines 3 and 6 fail," which can then be discussed, transferred, and fed back to generation.
品味评分卡 · 七问
TASTE SCORECARD · seven questions
逐条判,不打总分
judge per line, no overall score
1 · 为谁
1 · For whom
这个产物能说出它具体为谁而做吗?还是"为所有人"——后者通常等于没有人。
Can this artifact name who specifically it is for? Or is it "for everyone" — which usually means no one.
软判据 · 必由人判
soft · human-only
2 · 主次
2 · Hierarchy
扫一眼,能立刻分出"最重要的一件事"吗?还是所有元素争同样的注意力(slop 的典型征兆)。
At a glance, does the single most important thing stand out? Or do all elements fight for equal attention (a textbook slop symptom)?
半可机检 · 视觉权重
half-checkable · visual weight
3 · 有来由
3 · Motivated
主色、字体、布局,每一个都能说出"为什么是它"吗?还是"生成默认给的、没人问过为什么"。
Can the primary color, typeface, and layout each say "why it"? Or are they "what generation defaulted to, never questioned"?
软判据 · 必由人判
soft · human-only
4 · 无指纹
4 · No fingerprint
有没有 slop 指纹(渐变文字、玻璃拟态、处处大圆角+柔投影、空洞口号词)?这一条可机检。
Any slop fingerprint (gradient text, glassmorphism, rounded-everything + soft shadows, hollow slogan words)? This line is machine-checkable.
硬约束 · 可 lint
hard · lint-able
5 · 经得起真实数据
5 · Survives real data
放进真实长度的文案、真实数量的列表、真实边界条件,它还成立吗?还是只在"完美一帧"里好看。
Put in real-length copy, real-count lists, real edge cases — does it still hold? Or only look good in the "perfect frame"?
6 · Has soul
Is there one place that lands, that makes someone stay? Or is everything "fine" and the whole hollow? This is the hardest line to check, yet the most decisive one between slop and good work.
软判据 · 必由人判
soft · human-only
7 · 认了代价
7 · Owns a cost
它为了"对这群人"放弃了什么吗?一个不放弃任何人的设计,通常没有为任何人真正做好(见案例 C)。
Did it give up anything to be "for these people"? A design that gives up no one is usually not truly good for anyone (see Case C).
Note that line 4 is a hard rule (lint-able), lines 2 and 5 are half-checkable, and the other four are soft criteria. The scorecard itself demonstrates the two-layer guardrail: automate the checkable lines and concentrate human attention on lines 1, 3, 6, and 7, the ones the machine cannot answer. A decomposed scorecard is far more useful to generation than "I feel it's not premium enough," because the former feeds back line by line while the latter only buys another mean-slop version.
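The scorecard can be kept as data, so each verdict is recorded per line and routed by its kind instead of being summed into a score. A minimal sketch; the verdict vocabulary ("yes" / "partial" / "no") follows the yes/no/partial questions above, the short title for line 6 is an assumption because the scorecard omits it, and the sample verdicts are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    HARD = "hard · lint-able"
    HALF = "half-checkable"
    SOFT = "soft · human-only"


SCORECARD = [
    (1, "For whom", Kind.SOFT),
    (2, "Hierarchy", Kind.HALF),
    (3, "Motivated", Kind.SOFT),
    (4, "No fingerprint", Kind.HARD),
    (5, "Survives real data", Kind.HALF),
    (6, "Has soul", Kind.SOFT),  # title assumed
    (7, "Owns a cost", Kind.SOFT),
]
VERDICTS = {"yes", "partial", "no"}


def diagnose(verdicts: dict[int, str]) -> list[str]:
    """One row per line: number, title, kind, verdict. Deliberately no total."""
    rows = []
    for number, title, kind in SCORECARD:
        verdict = verdicts.get(number, "unjudged")
        if verdict not in VERDICTS | {"unjudged"}:
            raise ValueError(f"line {number}: unknown verdict {verdict!r}")
        rows.append(f"{number}  {title:<20}{kind.value:<20}{verdict}")
    return rows


def failing(verdicts: dict[int, str]) -> list[int]:
    return [number for number, _, _ in SCORECARD if verdicts.get(number) == "no"]


def human_queue() -> list[int]:
    """Lines the machine cannot answer; reviewers spend their attention here."""
    return [number for number, _, kind in SCORECARD if kind is Kind.SOFT]


# Hypothetical verdicts for one candidate.
verdicts = {1: "yes", 2: "yes", 3: "no", 4: "yes", 5: "partial", 6: "no", 7: "yes"}
print("\n".join(diagnose(verdicts)))
print("fails:", failing(verdicts), "human-only:", human_queue())
```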
工具四 · 铺开候选协议:把"多生成几版"变成可复用的判断流程
Tool 4 · The candidate-spread protocol: turning "generate more versions" into a reusable judgment process
"Spread candidates, then converge," if unwritten, easily degenerates into "mindlessly generate a hundred and pick the nice one" — which just swaps one mean for another. Written as a five-step protocol, it becomes a judgment process rather than an output race: ① write constraints and criteria down first (without criteria, spread is noise); ② spread along the structural dimension, not the skin dimension (nine candidates with different layout/flow, not nine recolors of the same one); ③ make the first cut with one established fact (compress aesthetics into fact, see Case A); ④ run controlled comparison only on the shortlist (A/B, usability test, real-data stress test); ⑤ write the winner's reasons back into criteria/tokens/guardrails (let this judgment settle into next time's spec). Steps ① and ⑤ are what separate this protocol from "mindless re-rolling": the first keeps the spread directed, the last keeps judgment reused.
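The five steps can be enforced as a pipeline so that the two guard steps, ① and ⑤, cannot be skipped. A minimal sketch under stated assumptions: `Variant.structure` stands for the layout/flow family (recolors share one structure), the `fact` and `compare` callables stand in for the established fact and the controlled comparison, and the abandonment rates in the usage are placeholder numbers, not results.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Variant:
    name: str
    structure: str  # layout/flow family; nine recolors of one layout share a structure
    skin: str


@dataclass
class Spec:
    criteria: list[str] = field(default_factory=list)


def run_protocol(spec: Spec,
                 variants: list[Variant],
                 fact: Callable[[Variant], bool],
                 compare: Callable[[list[Variant]], tuple[Variant, str]]) -> Variant:
    # (1) Criteria first: without them the spread is noise.
    if not spec.criteria:
        raise ValueError("write criteria before spreading")
    # (2) Spread along structure, not skin: reject variants that share a structure.
    if len({v.structure for v in variants}) < len(variants):
        raise ValueError("some variants differ only in skin")
    # (3) First cut with one established fact.
    shortlist = [v for v in variants if fact(v)]
    if not shortlist:
        raise ValueError("the fact eliminated every variant")
    # (4) Controlled comparison on the shortlist only.
    winner, reason = compare(shortlist)
    # (5) Write the winner's reason back into the spec for next time.
    spec.criteria.append(reason)
    return winner


# Placeholder numbers (not results) standing in for a controlled comparison.
PLACEHOLDER_ABANDON = {"one-page": 0.24, "hybrid": 0.21}


def ab_compare(shortlist: list[Variant]) -> tuple[Variant, str]:
    best = min(shortlist, key=lambda v: PLACEHOLDER_ABANDON[v.name])
    return best, f"{best.structure}: lowest abandonment in the controlled comparison"


spec = Spec(["freight cost visible before the last step"])
variants = [Variant("one-page", "single-page", "light"),
            Variant("three-step", "paged", "light"),
            Variant("hybrid", "hybrid", "light")]
winner = run_protocol(spec, variants, fact=lambda v: v.structure != "paged", compare=ab_compare)
print(winner.name, spec.criteria)
```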
工具五 · 生成×品味坐标:把一个设计决策放上去看它该怎么处理
Tool 5 · The generation × taste plane: place a design decision on it to see how to handle it
The last tool is a coordinate you place a decision on, collapsing all the prior principles into one operable two-dimensional plane. X-axis: how cheaply generation can do this decision (left = expensive/needs-human, right = near-free). Y-axis: how much judging its quality depends on taste (bottom = machine-checkable fact question, top = constitutive taste question). The four quadrants give four prescriptions for how to handle it — this is the plane behind INSTRUMENT 13, the design-judgment allocator. FIG. D14 below draws the coordinate, and the companion INSTRUMENT 15 lets you place your own decision on it and get an instant prescription.
FIG. D14 / GENERATION × TASTE · two axes, four quadrants, one prescription each. Read: place a design decision on the plane, with X for how cheap generation is and Y for how much judging quality needs taste.
This coordinate is the catch-all for every prescription in the volume. Bottom-right: cheap and checkable, hand it wholesale to generation. Top-right: cheap to make but value-laden, so the human freezes value into a rule (tokens/guardrails) the machine enforces. Bottom-left: not taste-driven but fact-needing first (research work). Top-left: both human-needing and constitutive taste — the far end of the verifiability gradient, kept human forever. Place any design decision on it; whichever quadrant it lands in is the prescription to use. INSTRUMENT 15 below makes this figure clickable.
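Placing a decision on the plane reduces to two scores and a lookup. A minimal sketch, assuming both axes are scored from 0 to 1 and split at 0.5; the scale, the split point, and the sample decisions are hypothetical, while the four prescriptions paraphrase the quadrants described above.

```python
PRESCRIPTIONS = {
    ("cheap", "checkable"): "hand it wholesale to generation",
    ("cheap", "taste"): "freeze the value into a rule (tokens/guardrails); let the machine enforce it",
    ("costly", "checkable"): "research first: establish the fact before anything is generated",
    ("costly", "taste"): "keep it human: the far end of the verifiability gradient",
}


def place(generation_cheapness: float, taste_dependence: float, split: float = 0.5) -> str:
    """Both scores run from 0 to 1; the 0.5 split is an assumption, not a rule."""
    for score in (generation_cheapness, taste_dependence):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    x = "cheap" if generation_cheapness >= split else "costly"
    y = "taste" if taste_dependence >= split else "checkable"
    return PRESCRIPTIONS[(x, y)]


# Hypothetical decisions and scores.
for decision, gen, taste in [
    ("contrast on every screen", 0.9, 0.1),
    ("the brand primary color", 0.8, 0.9),
    ("why users abandon step three", 0.2, 0.2),
    ("who this product is for", 0.1, 0.95),
]:
    print(f"{decision}: {place(gen, taste)}")
```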
Run one of your artifacts through the seven questions of the taste scorecard above. Each line you mark as failing raises the count, and when you finish you get an instant slop-risk verdict plus the first fix to make. The count is a diagnosis, not a quality score: it tells you which lines fail and which one to fix first.
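The verdict and the "first fix" can be sketched as a count plus a priority order. This is a minimal sketch, not the instrument's actual logic: the risk thresholds and the fix order (judgment lines 1, 3, 7, 6 first, echoing Case D's "restore the for-whom before regenerating," then the half-checkable lines, then the lint-able line) are assumptions.

```python
from typing import Optional

FIX_ORDER = [1, 3, 7, 6, 2, 5, 4]  # assumption: judgment lines before checkable ones
FIRST_FIX = {
    1: "name the specific people it is for",
    3: "give a reason for the primary color, typeface, and layout",
    7: "state what it gives up, and for whom",
    6: "find the one place that should land",
    2: "make one thing clearly the most important",
    5: "stress-test with real copy, real counts, real edge cases",
    4: "run the fingerprint lint and remove every hit",
}


def slop_risk(failed_lines: set[int]) -> tuple[str, Optional[str]]:
    """Return a coarse risk level and the first fix; thresholds are hypothetical."""
    unknown = failed_lines - set(FIX_ORDER)
    if unknown:
        raise ValueError(f"no such scorecard lines: {sorted(unknown)}")
    count = len(failed_lines)
    level = "low" if count == 0 else "medium" if count <= 2 else "high"
    first = next((FIRST_FIX[n] for n in FIX_ORDER if n in failed_lines), None)
    return level, first


print(slop_risk({2, 4, 6}))  # ('high', 'find the one place that should land')
```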
Every SHEET above covers why generation gets cheap, why taste gets scarce, and how to judge; this piece actually produces the design with you. It does not "design a design org"; it is this surface's executable companion. Hand it a product, an interface, a landing page, a component, a design system, or a motion piece, and it first runs a redraw-not-graft gate (delete the agents: if the workflow collapses back to one designer hand-crafting one comp, it is still a faster pencil), then runs the closing loop: stand up the guardrail first → write a half-machine-checkable spec → spread candidates along the structural dimension → converge with human taste → feed judgment back into the system. Its scope is product, interaction, system, and expression, not skinning a screen.
# 在 Claude Code 里调用 / invoke inside Claude Code
$ /skill ai-native-design
> "帮我设计这条落地页,多铺几版再帮我挑一版……" ("design this landing page, spread a few directions and help me pick...")
→ redraw gate · greenfield / carve a new surface / mere enablement / human-trust boundary
→ a design artifact (mockups/components/system) + tokens-as-code + a taste rationale + a fingerprint anti-homogenization check
What this is · the design executable companion
The architecture piece (ai-native-architect) designs the organization; this piece and the other companion pieces each carry one surface: one kernel, mutually coupled, with no fixed reading entry. It runs this volume's methodology into design artifacts. Judgment node = taste: choosing which version among abundant drafts, and keeping a human in that choice. Generating drafts is cheap; judging them is scarce. Stop-line: never offload taste. Do not hand the "which one, and why" call to the model, and do not force soft criteria ("for whom," "has soul," "on-target") into lint, which maxes every checkable metric yet ships a flawless interface no one wants.
SPEC.V / AI NATIVE METHODOLOGY / OWL METHODOLOGY SERIES
SCOPE / 一套方法论 · 完整组织光谱 N=1 → N=众多(一人公司至 agent 网络,同一套第一性原理)
One methodology · the full organizational spectrum N=1 → N=many (from the one-person company to the agent network, on a single set of first principles)
SERIES / 六卷同一内核 · 本卷是其中一个面,完整接线见上方「方法论系列」。
Six volumes, one kernel · this volume is one surface; the full wiring is above under "The Series."
APPENDIX · SOURCES / Evidence and citation registry. Grading key: Ⅰ audit-grade empirics (cross-checked against regulatory filings) · Ⅱ peer-reviewed · Ⅲ theoretical model / working paper (citations must read "the model predicts," never "proven") · Ⅳ practitioner first-hand account · Ⅴ advisory forecast (a forecast, not a fact). The rows in this table are the authoritative citations; the current 3-vote adversarial review found no overturned source.
REF
级GR
SOURCE
承重论断Load-bearing claim
R1
Ⅳ
Anthropic《How Anthropic teams use Claude Code》2025-07-24 · agentic-coding 一手实践 "How Anthropic teams use Claude Code" 2025-07-24 · first-hand agentic-coding practice · anthropic.com/news
从一句描述生成整套带状态、响应式、可交付代码的界面,已是常规能力而非演示——出一版界面从"几天人时"压到"一次提示+几分钟"(成本侧已塌的从业者证据)Generating a full stateful, responsive, deliverable UI from one prompt is now routine, not a demo: a version drops from "days of human time" to "one prompt + minutes" (practitioner evidence that the cost side has collapsed)
R2
Ⅳ
Karpathy《Software Is Changing (Again)》YC AI Startup School · 2025-06-16 "Software Is Changing (Again)" YC AI Startup School · 2025-06-16 · ycombinator.com/library/MW
Software 3.0 与"验证瓶颈"——生成变廉价后,做功的环节从"能否做出来"移到"该不该是这样、由谁来验"(判断不随模型变便宜的论述锚)Software 3.0 and the "verification bottleneck": once generation gets cheap, the load-bearing step moves from "can it be built" to "should it be this, and who verifies" (the anchor for judgment not getting cheaper with the model)
R3
Ⅳ
设计即代码工具链:pencil/paper(以代码描述图形)· Remotion(以 React 描述视频,Design-as-code tooling: pencil/paper (graphics described as code) · Remotion (video described as React, remotion.dev)· html-video(用网页技术出动效);并本系列工程卷"五条贯穿原理"与 design-as-code 实践) · html-video (motion via web tech); plus this series' engineering volume "five through-lines" and its design-as-code practice
画布工具把设计锁进私有二进制;新一代工具把同一份设计重新表达为纯文本,于是设计掉进软件工程三十年的 git/diff/CI 基础设施——这是产物形态的相变,不是工具竞赛Canvas tools lock design in proprietary binary; the new tools re-express the same design as plain text, so design falls into software engineering's thirty years of git/diff/CI infrastructure — a phase change in artifact form, not a tool race
R4
Ⅱ
Doshi & Hauser,受控实验, controlled experiment《Generative AI enhances individual creativity but reduces the collective diversity of novel content》Science Advances 10(28) · 2024 · doi.org/10.1126/sciadv.adn5290
约 300 名受试者写短篇故事,部分获 AI 提示:个体层面更新颖,集体层面(语义相似度)更趋同,这是"放大个体、压平分布"的实验影像(对象是叙事文本,迁移到视觉/产品设计是合理但未验证的外推,故不外推具体数字) / About 300 participants wrote short stories, some with access to AI-generated story ideas: the stories were more novel individually but more similar to one another collectively (by semantic similarity). This is the experimental picture of "amplify the individual, flatten the distribution." The object is narrative text; carrying the result over to visual or product design is a reasonable but unverified extrapolation, so no specific figure is carried over
R5
Ⅳ
从业者复盘(脱敏)· DSN 09 案例 A/D / Practitioner retrospective (de-identified) · DSN 09 Cases A / D
结账重做(弃单 31%→19%)与仪表盘 slop 急救(D7 留存 11%→34%)的前后区间,引自一手项目复盘。为脱敏内部数据、非公开受控实验,故仅作区间陈述、不外推到其他品类或团队;用于支撑"铺开→收敛"与"先补判断再重生成"的机理,不用于证明任何普适转化率。 / The before/after ranges for the checkout redo (abandonment 31%→19%) and the dashboard slop rescue (D7 retention 11%→34%) come from first-hand project retrospectives. The data are de-identified and internal, not from a public controlled experiment, so they are stated only as ranges and not extrapolated to other categories or teams. They support the mechanism of "spread→converge" and "restore judgment before regenerating," not any universal conversion rate.
DSN 09 案例 C 的无障碍可用性区间(纯读屏完成关键流程 41%→92%)为脱敏复盘,量级参照 WebAIM 公开的屏幕阅读器用户调查(同类任务可完成性的数量级);证据级 Ⅳ 一手+Ⅱ 公开调查参照,不外推为通用转化结论。 / The accessibility-usability range in DSN 09 Case C (screen-reader-only completion of the key flow, 41%→92%) comes from a de-identified retrospective; its order of magnitude is checked against WebAIM's public Screen Reader User Survey (task completability for comparable tasks). Grade: Ⅳ first-hand plus a Ⅱ public-survey reference; not extrapolated into a generic conversion claim.
R7
Ⅳ
W3C Design Tokens Community Group · 规范草案 / draft specification · w3.org/community/design-tokens + Material/Carbon/Polaris 设计系统 token 公开实践 / the public token practice of the Material / Carbon / Polaris design systems
支撑 DSN 11 工具一"设计令牌即代码":令牌作为机器可读的视觉决策锚点,使一次判断被海量生成强制继承。引规范草案与主流设计系统的公开实践,证据级 Ⅳ 行业实践(规范为草案、各家实现细节不同,不主张统一标准已成定论)。 / Supports DSN 11 Tool 1, "design tokens as code": tokens serve as a machine-readable anchor for visual decisions, so a single judgment is forcibly inherited by everything generated at scale. Cites the draft spec and the public practice of mainstream design systems; grade Ⅳ industry practice (the spec is still a draft and implementations differ, so no settled unified standard is claimed).
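Row R3 claims that once a design is plain text, it inherits software engineering's diff tooling. A minimal sketch of that claim follows. It assumes a hypothetical button spec serialized as JSON (the field names and values are illustrative, not any tool's real format) and uses Python's standard `difflib` to show that a single design decision appears as a reviewable two-line diff.

```python
import difflib
import json

# Hypothetical spec for one button, kept as plain text (JSON) instead of a
# proprietary binary canvas file. All field names and values are illustrative.
before = {
    "component": "PrimaryButton",
    "padding": "12px 20px",
    "radius": "8px",
    "background": "{color.brand.primary}",
}
# A single design decision changes between versions.
after = {**before, "radius": "4px"}


def to_lines(spec: dict) -> list[str]:
    """Serialize deterministically (sorted keys) so identical designs give identical text."""
    return json.dumps(spec, indent=2, sort_keys=True).splitlines()


diff = list(
    difflib.unified_diff(
        to_lines(before),
        to_lines(after),
        fromfile="button.v1.json",
        tofile="button.v2.json",
        lineterm="",
    )
)
print("\n".join(diff))
```

Deterministic serialization is the design choice that matters here. Without sorted keys, two identical designs could produce different text, and the diff would report noise instead of decisions.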
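Row R4 measures collective convergence by semantic similarity. One common way to operationalize this, assumed here and not necessarily the paper's exact procedure, is to take the mean pairwise cosine similarity of text embeddings. The 3-d vectors below are hypothetical stand-ins for real embeddings. They show only how the metric behaves and carry over no figure from the study.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_pairwise_similarity(embeddings: list[list[float]]) -> float:
    """Average cosine similarity over all pairs; higher means the set is more alike."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy vectors standing in for real text embeddings (hypothetical data).
unaided = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assisted = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.2], [1.0, 0.9, 0.0]]

print(mean_pairwise_similarity(unaided))   # orthogonal vectors: 0.0
print(mean_pairwise_similarity(assisted))  # close to 1: the outputs converge
```

The metric is computed at the level of the set, not the individual item. Each assisted output can score well on its own while the set as a whole collapses toward one point, which is exactly the "amplify the individual, flatten the distribution" pattern.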
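Row R7's mechanism (tokens as a machine-readable anchor that generated output must inherit) is what makes token conformance lint-able. A minimal sketch follows, with these assumptions. It uses the `$type`/`$value` shape of the W3C Design Tokens Community Group draft, which may change. It treats any hex color that no token defines as a violation. Token names, values, and the CSS snippet are hypothetical. Real systems also resolve aliases and check other token types and color syntaxes.

```python
import json
import re

# Tokens in the shape of the W3C DTCG draft ($type / $value); values are hypothetical.
TOKENS_JSON = """
{
  "color": {
    "brand": {"primary": {"$type": "color", "$value": "#0055ff"}},
    "text":  {"default": {"$type": "color", "$value": "#1a1a1a"}}
  }
}
"""


def flatten(node: dict, path: str = "") -> dict[str, str]:
    """Map dotted token names (e.g. 'color.brand.primary') to their $value."""
    out = {}
    for key, child in node.items():
        name = f"{path}.{key}" if path else key
        if isinstance(child, dict) and "$value" in child:
            out[name] = child["$value"]
        elif isinstance(child, dict):
            out.update(flatten(child, name))
    return out


# Six-digit hex colors only; \b keeps it from matching inside 8-digit hex values.
HEX = re.compile(r"#[0-9a-fA-F]{6}\b")


def off_token_colors(css: str, tokens: dict[str, str]) -> list[str]:
    """Return hex colors in generated CSS that no token defines (case-insensitive)."""
    allowed = {v.lower() for v in tokens.values()}
    return [c for c in HEX.findall(css) if c.lower() not in allowed]


tokens = flatten(json.loads(TOKENS_JSON))
generated_css = ".cta { background: #0055FF; color: #1a1a1a; border-color: #3b82f6; }"
print(off_token_colors(generated_css, tokens))  # → ['#3b82f6']
```

This check covers only the machine-checkable layer. It can say that `#3b82f6` came from outside the system. It cannot say whether the screen is right for these people, and that question stays in critique.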
REV
DATE
DESCRIPTION
1.0
2026-06
设计卷成形:八 SHEET(生成变富品味变稀缺 · 设计即代码 · 从打磨到判断 · 品味可拆解 · 设计系统即护栏 · 反 slop 红线 · AI 设计环 · 决策分诊)· 三节深化(DSN 09 四个真实案例 · DSN 10 六种旧结构批判 · DSN 11 五件可照做工具)· INSTRUMENT 10 Slop 自检表 + INSTRUMENT 13 设计判断分配台 + INSTRUMENT 15 Slop↔品味自测器 · 十九张论证图 · 本卷独立证据登记 R1-R7(与组织卷登记分离) / Design volume takes shape: eight SHEETs (generation gets cheap, taste gets scarce · design-as-code · from making to judging · taste decomposed · the system as guardrail · anti-slop red lines · the AI design loop · decision triage) · three deepening sections (DSN 09 four real cases · DSN 10 critique of six old structures · DSN 11 five do-this tools) · INSTRUMENT 10 the Slop self-check + INSTRUMENT 13 the design-judgment allocator + INSTRUMENT 15 the Slop↔taste self-scorer · nineteen argument-bearing figures · this volume's own evidence registry R1-R7 (kept separate from the organization volume's registry)