Eugene's Page

Sat, 27 Jun 2026 11:42:29 GMT

UE“反射”概念：

反射：UE 通过 UClass/UProperty 等系统在运行时提供类型信息和动态访问能力。
UE 的反射系统是通过 UHT 工具和特定宏实现的代码生成机制。你用 UCLASS 标记类、UPROPERTY 标记变量、UFUNCTION 标记函数，这些宏会被 UHT 识别。
UHT 在编译前扫描这些标记，生成.generated.h 和.cpp 文件，里面包含类的反射注册代码，比如 StaticClass () 函数和 UClass 对象的构造逻辑。
生成的代码会把类信息注册到引擎全局的 GObjectClasses 数组里，让引擎在运行时能动态获取类结构、调用函数或访问属性，这支撑了蓝图交互、垃圾回收等核心功能。

因为 UE 需要在运行时动态处理代码信息。比如蓝图可视化编程，引擎得通过反射知道 C++ 类有哪些函数和变量，才能让蓝图调用它们。

比如你在 C++ 里写了一个角色类，里面有个 UFUNCTION 标记的跳跃函数 Jump ()。没有反射的话，蓝图编辑器根本不知道这个 Jump () 函数存在，因为编译后的机器码里，函数名和参数这些信息都被优化掉了。
有了反射，UHT 会在编译时为这个 Jump () 函数生成反射元数据，包括函数名、参数类型、返回值，以及它属于哪个类。引擎运行时能通过这些元数据，在蓝图编辑器里把 Jump () 函数显示出来，你才能拖拽节点调用它。
如果后续你在 C++ 里给 Jump () 加了一个高度参数，反射系统会自动更新元数据，蓝图里对应的函数节点也会同步显示出新参数，整个过程不需要手动写任何蓝图和 C++ 交互的绑定代码。

回退操作 Command 模式（轻量级）：**

每次操作封装为 ICommand { Do(); Undo(); }
维护 undoStack 和 redoStack
执行操作 → 压入 undoStack，清空 redoStack
Undo → 弹出 undoStack，执行 Undo()，压入 redoStack

2. Snapshot 模式（适用于复杂场景）：

操作前序列化整个对象状态的快照
Undo 时直接恢复快照
优点：实现简单，不容易出 bug
缺点：内存开销大

实际项目中的混合方案：

简单属性修改 → Command 模式（记录 oldValue/newValue）
复杂操作（节点图变更、场景编辑）→ Snapshot 或 Diff 模式
合并机制：连续同类操作合并（如拖拽 Slider 时合并为一条记录）

UE智能指针对比表

指针类型	管理对象	所有权	核心作用	适用场景
TObjectPtr	UObject派生类	共享	安全访问UObject，自动参与垃圾回收	替代传统UPROPERTY指针，日常UObject引用
TWeakObjectPtr	UObject派生类	无	弱引用UObject，不阻止回收	避免循环引用，临时访问可能被销毁的UObject
TSoftObjectPtr	UObject派生类	无	软引用UObject，支持资源异步加载	引用可能未加载的资源，如关卡外的模型、纹理
TSharedPtr	非UObject类型	共享	通过引用计数管理生命周期	需要多持有者共享非UObject资源
TUniquePtr	非UObject类型	独占	唯一拥有对象，不可复制	管理无需共享的非UObject资源，如自定义数据结构
TWeakPtr	非UObject类型	无	弱引用TSharedPtr，不增加引用计数	配合TSharedPtr避免循环引用

关键区别说明

管理对象边界：前三种严格用于UObject派生类，依赖UE垃圾回收系统；后三种用于非UObject类型，靠手动内存管理机制。
UObject指针细分：

TObjectPtr是强引用，会让UObject保持存活，是日常开发的首选。
TWeakObjectPtr是弱引用，当UObject被标记为回收时，指针会自动置空，常用在UI控件引用角色对象这类场景。
TSoftObjectPtr存储的是资源路径而非直接内存地址，对象未加载时可异步加载，适合开放世界游戏引用远处的资源。

非UObject指针细分：

TSharedPtr通过引用计数共享对象，当引用计数为0时自动释放内存，但需注意手动避免循环引用。
TUniquePtr是独占式指针，不允许复制，只能通过移动语义转移所有权，性能开销最小。
TWeakPtr需要绑定到TSharedPtr使用，当TSharedPtr释放对象后，TWeakPtr会自动失效，解决循环引用问题。

ECS 架构是什么？和传统 OOP 有什么区别？

	OOP	ECS
数据布局	对象分散在堆上	Component 连续内存排列
缓存友好性	差（指针跳转）	好（数据局部性）
逻辑组织	方法绑定在类上	System 独立遍历 Component
组合性	需要多重继承/组合模式	天然组合（挂 Component 即可）
其实ECS节省的是cpu去查找的时间。

核心概念：

Entity：ID 标识，不存数据
Component：纯数据（Position, Velocity, Health…）
System：纯逻辑（MovementSystem 遍历所有 Position+Velocity 组件）

核心区别：OOP 以对象为核心，数据与逻辑封装在类中，易形成复杂继承树；ECS 将数据与逻辑分离，实体为组件容器，系统批量处理同类组件，数据连续存储提升缓存效率，支持动态组合与并行计算。
UE5 Mass 系统案例：作为 ECS 实现，Mass 用 “片段” 存储实体数据，“处理器” 统一处理逻辑。如《黑客帝国》Demo 中的万人级 crowd 模拟，通过将角色位置、速度等数据打包连续存储，移动处理器可批量更新所有角色坐标，性能远超传统 Actor 方案。

堆Stack 栈heap

堆（Heap）：动态分配内存，大小不固定，生命周期由程序员控制，访问速度较慢，适合存储大对象或需要在运行时确定大小的数据。（没有固定的存取顺序）
栈（Stack）：自动分配内存，大小固定，生命周期由函数调用控制，访问速度快，适合存储局部变量和函数参数。（有固定的存取顺序，后进先出）

Function Calling 的原理是什么？你在项目中怎么用的？

原理： LLM 不直接执行函数，而是 输出结构化的函数调用意图（函数名 + 参数），由宿主程序解析并执行。

RAG 是什么？你是怎么实现的？

RAG（Retrieval-Augmented Generation） = 先检索相关文档，再让 LLM 基于检索结果回答。

ControlNet 是什么？它解决了什么问题？

参考答案：

ControlNet 为预训练 Diffusion Model 添加 空间控制能力。

解决的问题： 原始 Text-to-Image 无法精确控制生成图像的构图、姿态、边缘等空间结构。

原理：

在 Stable Diffusion 的 U-Net 每个 Block 上添加一个并行的 “Zero Convolution” 分支
输入额外的条件图（边缘检测/Canny、深度图、姿态/OpenPose、法线贴图等）
训练时只训练 ControlNet 分支，冻结原始模型

常见 ControlNet 类型：

Canny Edge：控制轮廓
Depth：控制深度结构
OpenPose：控制人物姿态
Segment：控制区域分割
Scribble：控制草图

LoRA 是什么？为什么它很受欢迎？

LoRA（Low-Rank Adaptation） 是一种参数高效微调方法。
LoRA 是一种参数高效的大模型微调技术，核心是冻结原模型权重，仅训练少量低秩矩阵来模拟任务适配所需的参数更新。它参数量仅为全量微调的 0.1%-1%，大幅降低显存占用和训练成本，且推理时可合并权重无额外延迟。
在游戏领域，能快速微调图生图模型生成风格统一的角色装备、场景素材，或微调对话模型让 NPC 生成符合设定的自然台词，适配小团队高效开发需求。

MVC、MVP、MVVM 的区别是什么？

模式	组件职责	组件关系	优缺点
MVC	Model（数据） View（界面） Controller（逻辑）	Controller 直接操作 Model 和 View	简单直观，适合小型项目；Controller 可能变得臃肿
MVP	Model（数据） View（界面） Presenter（逻辑）	Presenter 直接操作 Model，间接更新 View	Presenter 可测试性强；View 依赖 Presenter，增加耦合
MVVM	Model（数据） View（界面） ViewModel（逻辑）	ViewModel 直接操作 Model，通过数据绑定更新 View	双向绑定简化 UI 更新；学习曲线较陡峭，可能引入性能问题

MVP的Preseter和MVVM的ViewModel在职责上非常相似，都是处理业务逻辑和数据交互的中介，但MVVM通过数据绑定机制让ViewModel直接更新View，减少了Presenter中大量的UI更新代码，使得代码更简洁、可测试性更强。MVVM适合复杂UI交互较多的项目，而MVP则更适合简单UI或需要严格分离测试的场景。

GPU 渲染流水线的完整阶段？

参考答案：

GPU 渲染管线（Rendering Pipeline）是 GPU 执行图形渲染的完整流程：

应用阶段（CPU 侧）：

1. 应用阶段（Application Stage）
   → 视锥体裁剪（Frustum Culling）
   → 批次合批（Draw Call Batching）
   → 输出 Draw Call + 顶点数据到 GPU

几何阶段（GPU 顶点着色器）：

2. 顶点着色器（Vertex Shader）
   → 模型空间 → 世界空间 → 视图空间 → 齐次裁剪空间
   → 顶点变换：LocalMatrix × WorldMatrix × ViewMatrix × ProjectionMatrix

3. 曲面细分（Tessellation，可选）
   → Hull Shader → Tessellator → Domain Shader

4. 几何着色器（Geometry Shader，可选）
   → 以图元为单位处理，可生成/销毁图元

光栅化阶段（Rasterization）：

5. 图元装配 & 裁剪
   → Clipping（齐次空间裁剪）
   → Perspective Divide → NDC → Viewport Transform

6. 背面剔除（Back-face Culling）
   → 根据顶 点环绕顺序（顺时针/逆时针）剔除背面

7. 光栅化（Rasterization）
   → 离散化：点/线/三角形 → 片段（Fragment）
   → 视口变换：NDC → Screen Space

片段/像素阶段：

8. 片段着色器（Fragment / Pixel Shader）
   → 逐像素着色：光照计算、纹理采样、颜色输出

9. 逐片段操作（Per-Fragment Operations）
   → 深度测试（Depth Test / Z-Test）
   → 模板测试（Stencil Test）
   → 混合（Alpha Blending）
   → 输出到 Framebuffer

Sat, 27 Jun 2026 11:42:29 GMT

lang: “en”

UE “Reflection” Concept

Reflection: UE provides runtime type information and dynamic access capabilities through systems like UClass and UProperty.
UE’s reflection system is a code-generation mechanism implemented via the UHT tool and specific macros. You mark classes with UCLASS, variables with UPROPERTY, and functions with UFUNCTION — these macros are recognized by UHT.
Before compilation, UHT scans these markers and generates .generated.h and .cpp files containing the reflection registration code for each class, such as the StaticClass() function and the construction logic for UClass objects.
The generated code registers class information into the engine’s global GObjectClasses array, enabling the engine at runtime to dynamically retrieve class structure, invoke functions, or access properties — which in turn powers blueprint interaction, garbage collection, and other core features.

This exists because UE needs to dynamically process code information at runtime. Blueprint visual scripting, for example, requires the engine to know via reflection which functions and variables a C++ class exposes so that blueprints can call them.

Say you write a character class in C++ with a Jump() function marked with UFUNCTION. Without reflection, the blueprint editor has no idea Jump() exists — the function name, parameters, and all that metadata get optimized away in the compiled machine code.
With reflection, UHT generates reflection metadata for Jump() at compile time, including its name, parameter types, return value, and which class it belongs to. At runtime, the engine uses this metadata to surface Jump() in the blueprint editor as a callable node you can drag in.
If you later add a height parameter to Jump() in C++, the reflection system automatically updates the metadata, and the corresponding blueprint node syncs up to show the new parameter — no manual binding code between blueprint and C++ required.

Undo/Redo — Command Pattern (Lightweight)

Wrap each operation as ICommand { Do(); Undo(); }
Maintain undoStack and redoStack
Execute operation → push to undoStack, clear redoStack
Undo → pop from undoStack, call Undo(), push to redoStack

2. Snapshot Pattern (for complex scenarios):

Serialize a snapshot of the entire object state before each operation
On Undo, restore the snapshot directly
Pros: simple to implement, less prone to bugs
Cons: high memory overhead

Hybrid approach for real projects:

Simple property edits → Command pattern (record oldValue/newValue)
Complex operations (node graph changes, scene edits) → Snapshot or Diff pattern
Merge mechanism: collapse consecutive operations of the same type (e.g., dragging a slider merges into a single history entry)

UE Smart Pointer Comparison

Pointer Type	Managed Object	Ownership	Core Role	Use Case
TObjectPtr	UObject-derived	Shared	Safe UObject access, participates in garbage collection automatically	Replaces traditional UPROPERTY pointers; everyday UObject references
TWeakObjectPtr	UObject-derived	None	Weak reference to UObject, does not prevent GC	Avoids circular references; temporary access to UObjects that may be destroyed
TSoftObjectPtr	UObject-derived	None	Soft reference to UObject, supports async asset loading	References assets that may not be loaded, e.g., meshes or textures outside the current level
TSharedPtr	Non-UObject types	Shared	Manages lifetime via reference counting	Multiple owners sharing a non-UObject resource
TUniquePtr	Non-UObject types	Exclusive	Sole ownership, non-copyable	Managing non-UObject resources that don’t need sharing, e.g., custom data structures
TWeakPtr	Non-UObject types	None	Weak reference to a TSharedPtr, does not increment ref count	Avoids circular references when used alongside TSharedPtr

Key Distinctions

Object boundary: The first three are strictly for UObject-derived classes and rely on UE’s garbage collection system; the last three are for non-UObject types and use manual memory management.
UObject pointer breakdown:
- TObjectPtr is a strong reference that keeps the UObject alive — the go-to choice for everyday development.
- TWeakObjectPtr is a weak reference; when a UObject is marked for collection, the pointer is automatically nulled. Common in scenarios like UI widgets holding references to character objects.
- TSoftObjectPtr stores a resource path rather than a direct memory address. The asset can be asynchronously loaded when it isn’t in memory, making it ideal for open-world games referencing distant assets.
Non-UObject pointer breakdown:
- TSharedPtr shares an object via reference counting, automatically freeing memory when the count reaches zero. Be mindful of circular references — they must be avoided manually.
- TUniquePtr is an exclusive pointer: no copying allowed, ownership transfers only through move semantics. Lowest performance overhead.
- TWeakPtr must be bound to a TSharedPtr. Once TSharedPtr releases the object, TWeakPtr automatically becomes invalid, resolving circular reference issues.

What is ECS Architecture? How Does It Differ from Traditional OOP?

	OOP	ECS
Data layout	Objects scattered on the heap	Components laid out in contiguous memory
Cache friendliness	Poor (pointer chasing)	Good (data locality)
Logic organization	Methods bound to classes	Systems iterate over Components independently
Composability	Requires multiple inheritance / composition patterns	Natural composition (just attach Components)

ECS essentially saves CPU time on data lookups.

Core concepts:

Entity: an ID only, stores no data
Component: pure data (Position, Velocity, Health…)
System: pure logic (MovementSystem iterates all Position+Velocity components)

Core difference: OOP centers on objects — data and logic are encapsulated in classes, which tends to grow complex inheritance trees. ECS separates data from logic: entities are containers for components, systems process batches of the same component type, contiguous data storage improves cache efficiency, and the architecture naturally supports dynamic composition and parallel computation.
UE5 Mass system example: as an ECS implementation, Mass stores entity data in “fragments” and unifies logic in “processors.” The Matrix Awakens demo’s crowd simulation of thousands of characters packs position, velocity, and other data into contiguous storage, letting the movement processor batch-update all character coordinates — performance that far exceeds the traditional Actor approach.

Stack vs. Heap

Heap: Dynamically allocated memory, variable size, lifetime controlled by the programmer, slower access. Suitable for large objects or data whose size is determined at runtime. (No fixed access order.)
Stack: Automatically allocated memory, fixed size, lifetime controlled by the function call, fast access. Suitable for local variables and function parameters. (Fixed access order: last in, first out.)

What Is Function Calling and How Have You Used It in Projects?

How it works: The LLM doesn’t execute functions directly — it outputs a structured function-call intent (function name + arguments), which the host program parses and executes.

What Is RAG and How Did You Implement It?

RAG (Retrieval-Augmented Generation) = retrieve relevant documents first, then have the LLM answer based on those retrieved results.

What Is ControlNet and What Problem Does It Solve?

Reference answer:

ControlNet adds spatial control capabilities to a pretrained Diffusion Model.

The problem it solves: Raw Text-to-Image generation cannot precisely control the composition, pose, edges, or other spatial structure of generated images.

How it works:

A parallel “Zero Convolution” branch is added to each block of Stable Diffusion’s U-Net
Additional conditioning images are fed as input (edge detection / Canny, depth maps, pose / OpenPose, normal maps, etc.)
During training, only the ControlNet branch is trained; the original model weights are frozen

Common ControlNet types:

Canny Edge: controls outlines
Depth: controls depth structure
OpenPose: controls human pose
Segment: controls region segmentation
Scribble: controls sketch-based guidance

What Is LoRA and Why Is It So Popular?

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method.
LoRA freezes the original model weights and trains only a small number of low-rank matrices to approximate the parameter updates needed for task adaptation. The trainable parameter count is just 0.1%–1% of full fine-tuning, drastically reducing VRAM usage and training cost. During inference, the weights can be merged with the base model, adding no extra latency.
In game development, LoRA lets you quickly fine-tune an image-to-image model to generate stylistically consistent character equipment and environment assets, or fine-tune a dialogue model so NPCs produce setting-appropriate natural dialogue — a great fit for small teams that need to move fast.

What Is the Difference Between MVC, MVP, and MVVM?

Pattern	Component Responsibilities	Component Relationships	Pros / Cons
MVC	Model (data) / View (UI) / Controller (logic)	Controller directly operates both Model and View	Simple and intuitive, good for small projects; Controller can become bloated
MVP	Model (data) / View (UI) / Presenter (logic)	Presenter directly operates Model, updates View indirectly	Presenter is highly testable; View depends on Presenter, increasing coupling
MVVM	Model (data) / View (UI) / ViewModel (logic)	ViewModel directly operates Model, updates View via data binding	Two-way binding simplifies UI updates; steeper learning curve, potential performance overhead

MVP’s Presenter and MVVM’s ViewModel are very similar in responsibility — both act as intermediaries handling business logic and data interaction. The key difference is that MVVM’s data-binding mechanism lets ViewModel update the View directly, eliminating the large amount of UI-update code you’d write in a Presenter. This makes the code more concise and testable. MVVM suits projects with complex, frequent UI interactions; MVP fits simpler UIs or scenarios where strict test isolation is needed.

What Are the Complete Stages of the GPU Rendering Pipeline?

Reference answer:

The GPU Rendering Pipeline is the full process by which a GPU executes graphics rendering:

Application Stage (CPU side):

1. Application Stage
   → Frustum Culling
   → Draw Call Batching
   → Outputs Draw Calls + vertex data to GPU

Geometry Stage (GPU vertex shaders):

2. Vertex Shader
   → Model Space → World Space → View Space → Homogeneous Clip Space
   → Vertex transform: LocalMatrix × WorldMatrix × ViewMatrix × ProjectionMatrix

3. Tessellation (optional)
   → Hull Shader → Tessellator → Domain Shader

4. Geometry Shader (optional)
   → Processes per-primitive, can emit or discard primitives

Rasterization Stage:

5. Primitive Assembly & Clipping
   → Clipping (homogeneous space clipping)
   → Perspective Divide → NDC → Viewport Transform

6. Back-face Culling
   → Discards back-facing primitives based on vertex winding order (CW/CCW)

7. Rasterization
   → Discretization: points / lines / triangles → Fragments
   → Viewport Transform: NDC → Screen Space

Fragment / Pixel Stage:

8. Fragment Shader (Pixel Shader)
   → Per-pixel shading: lighting calculations, texture sampling, color output

9. Per-Fragment Operations
   → Depth Test (Z-Test)
   → Stencil Test
   → Alpha Blending
   → Output to Framebuffer

Obsidian 学习路径与功能笔记

Fri, 08 May 2026 16:00:00 GMT

Obsidian 学习路径与功能笔记

目标：以最少的折腾时间，把 Obsidian 用成”长期可复利”的知识库；先稳住基本功，再按需扩展插件与方法论。

0. 为什么是 Obsidian

本地优先：所有笔记是 .md 纯文本，跟随 Git/网盘随便同步；与本仓库 Hexo 博客天然兼容（notes/_posts/**/*.md 可直接被博客引擎消费）。
链接驱动：用 [[wikilink]] 把碎片连成网，长期沉淀越久越值钱。
插件生态：核心插件 + 社区插件 ≈ “可编程的笔记系统”。
零锁定：随时可以离开，文件即数据。

1. 我自己的文件目录路径

这个 Vault 的根目录 C:\Users\youdr\iCloudDrive\Doc\notes\ 下有四个隐藏文件夹，分别服务于不同的工具链：

notes/
├── .claude/       # Claude Code 的 vault 级配置
├── .claudian/     # Claudian 插件的运行时数据
├── .obsidian/     # Obsidian 本体的所有配置
└── .omc/          # oh-my-claudecode (OMC) 的状态存储

1. 学习路径总览（建议按周推进）

阶段	时长	核心目标	关键产出
W1 基础	3–5 天	掌握 Vault、Markdown、双链、标签	第一篇带链接的笔记
W2 组织	1 周	文件夹策略、模板、每日笔记	个人 PKM 结构成型
W3 进阶	1 周	Dataview、Templater、Graph View	自动化索引页
W4 工作流	1 周	与 Hexo / Git / VSCode 联动	笔记 → 博客一键流程
持续	—	方法论（Zettelkasten / PARA / Johnny Decimal）	可复利的二阶笔记

2. W1：基础——把”骨架”立起来

2.1 核心概念

Vault（库）：一个文件夹 = 一个 Vault。所有 .md 与 .obsidian/ 配置都在里面。
Note（笔记）：一个 .md 文件 = 一条笔记。命名建议：日期前缀 + 主题，如 20260509.ObsidianFunctionLearning.md。
Frontmatter（YAML 元数据）：文件最顶部的 --- 块，存放 title / tags / date / status，被 Dataview / 主题 / Hexo 共同消费。
Link（双链）：[[文件名]] 或 [[文件名|显示文本]]；[[A#二级标题]] 跳转到具体小节；[[A^block-id]] 引用块。
Backlink（反向链接）：右侧面板自动列出”谁链接了我”，是 Obsidian 的灵魂功能。
Tag（标签）：#topic/subtopic 支持层级；和文件夹是互补关系，不是替代。

2.2 必须背下来的快捷键

操作	快捷键
全局命令面板	`Ctrl + P`
快速切换文件	`Ctrl + O`
新建笔记	`Ctrl + N`
双链补全	输入 `[[` 触发
切换源码/预览	`Ctrl + E`
打开当天 Daily Note	`Ctrl + Shift + D`（启用 Daily Notes 插件后）
源码模式切换（自己设置）	ctrl + /

2.3 W1 练习

在 notes/ 打开 Vault，写 3 条笔记（任意主题）。
让其中两条用 [[]] 互相链接。
给每条加 tags: [...]，在右侧面板看 Backlink。

3. W2：组织——确立结构与模板

3.1 文件夹策略（与本仓库现状对齐）

当前仓库已有的目录可直接套用：

notes/
├── _posts/        # Hexo 发布的正式文章（双语）
│   ├── zh-CN/
│   └── en/
├── Learning/      # 学习笔记 / 个人草稿（本文件所在）
├── AIdocs/        # 项目级架构、决策、路线图
└── about/         # 关于页

建议在 Learning/ 下再分：

daily/：每日笔记（自动创建）
topic/：主题长文（成熟后迁出到 _posts/）
inbox/：临时草稿，未分类

3.2 三种主流方法论（任选其一即可，别全上）

方法	一句话	适合谁
Zettelkasten	一卡一念，靠双链组网，不靠分类	长期写作者、研究者
PARA	Project / Area / Resource / Archive	项目驱动型工作者
Johnny Decimal	`10-19 / 11.01` 编号制	偏好结构与索引的人

建议：你已经有 _posts / Learning / AIdocs 这种”项目+资源”分布，先跑 PARA，等笔记数量过 500 篇再考虑 Zettelkasten。

3.3 必装核心插件（自带）

进入 设置 → 核心插件，把这些打开：

✅ Daily Notes：每日一篇时间轴笔记
✅ Templates：插入模板内容
✅ Outline：右侧大纲
✅ Backlinks / Outgoing Links：反向 / 正向链接面板
✅ Graph View：知识图谱
✅ Tag Pane：标签面板
✅ File Recovery：自动备份，强烈推荐
⚠️ Workspaces：多布局切换（进阶可开）

4. W3：进阶——让笔记自己动起来

4.1 必装社区插件（短列表，不要贪多）

插件	作用
Dataview	用类 SQL 查询笔记元数据，自动生成索引页
Templater	比内置 Templates 强 100 倍，支持 JS 脚本
Excalidraw	手绘 / 流程图，附带双链
Advanced Tables	表格编辑器（写 Markdown 表格的人都需要）
Obsidian Git	Vault 自动 commit / push（你这个仓库正好用得上）
Iconize / Iconic	给文件夹/文件加图标，提升可视性
Style Settings	调主题细节（字体、间距、颜色）
Linter	Markdown 风格统一，YAML 排序

4.2 Dataview 入门示例

在任意笔记里写：

```dataview
TABLE date, status, file.tags AS tags
FROM "Learning"
WHERE status = "in-progress"
SORT date DESC
```

→ 自动列出 Learning/ 下所有 status: in-progress 的笔记。

4.3 Templater 模板示例

在 notes/Templates/learning.md：

---
title: <% tp.file.title %>
date: <% tp.date.now("YYYY-MM-DD") %>
tags: []
status: in-progress
---

# <% tp.file.title %>

## 背景

## 内容

## 参考

→ 新建笔记时一键套用，title / date 自动填。

5. W4：工作流——把 Obsidian 嵌进现有管线

本仓库是 Hexo 博客 + Git 版本管理 + Obsidian 笔记 的三件套，目标是：

草稿（Learning/inbox/） →
成熟（Learning/topic/） →
发布（_posts/zh-CN/ 与 _posts/en/） →
博客上线（Hexo build）

5.1 与 Hexo 兼容的 frontmatter

博客文章需要的字段（参考 notes/_posts/ 现有文章）：

---
title: 文章标题
date: 2026-05-09 12:00:00
categories: [分类]
tags: [标签1, 标签2]
lang: zh-CN
---

5.2 与 Git 联动

用 Obsidian Git 插件做”自动 commit”。
但本仓库已经有自己的提交规范（见 git log 风格），建议：
- 写作期间：手动 commit。
- 每日睡前：用 Obsidian Git 一键 push。

5.3 与博客主题（hexo-theme-magnetic）的注意事项

你当前主题里的 tag-graph.js 与 Obsidian Graph View 是两套图谱，互不影响。
笔记里 [[wikilink]] 在博客渲染时不会自动转成超链接（除非装 Hexo 插件 hexo-filter-github-emojis 类的扩展）。如果要发到博客，改成标准 Markdown 链接。

6. 进阶专题（按需展开）

6.1 Canvas（白板）

内置功能，新建白板 → 把多张笔记拖进来当卡片，画连线。适合做知识地图、项目看板。

6.2 Sync 方案对比

方式	成本	优点	坑
Obsidian Sync 官方	$4/月	端到端加密、最稳	收费
iCloud / OneDrive	免费	简单	`.obsidian/` 容易冲突
Git（推荐你这种）	免费	完整版本史	大文件需 LFS
Syncthing	免费	局域网快	配置略折腾

6.3 移动端

iOS / Android 客户端免费。
移动端 + iCloud / Git 跨设备 → 手机随手记，电脑深度整理。

7. 路径布置建议（针对本仓库）

关键问题：根目录 D:\Project\UGit\EugenePage\.obsidian 已存在，说明 Vault 当前打开的是整个仓库而不是 notes/。

两种方案，二选一：

方案 A：把 `notes/` 单独作为 Vault（推荐）

在 Obsidian 起始页 → “打开文件夹作为库” → 选 D:\Project\UGit\EugenePage\notes。
优点：Vault 范围干净，只看到笔记，不被 themes/、scripts/ 干扰。
操作：把根目录的 .obsidian/ 移动到 notes/.obsidian/（或删掉重建），并在 .gitignore 里保留 notes/.obsidian/workspace.json（个人布局，不必跟踪）但保留核心插件配置。

方案 B：保持仓库根作为 Vault

优点：可以同时编辑主题代码与笔记。
缺点：Graph View 会扫描所有 .md，大量噪音。
必须做：在 Obsidian 设置 → 文件与链接 → 排除的文件 里把 themes/、node_modules/、public/ 全部排除。

`.gitignore` 建议（任一方案都加）

# Obsidian 个人配置（团队不共享）
.obsidian/workspace.json
.obsidian/workspace-mobile.json
.obsidian/cache
.obsidian/plugins/*/data.json   # 视情况，含敏感的不要 push

但 .obsidian/core-plugins.json、community-plugins.json、appearance.json、hotkeys.json 建议跟踪，方便多机同步。

8. 插件/流程

Image Auto Upload

复制或拖入图片时自动上传至图床，与 PicGo 生态兼容。底层依赖 PicList（PicGo 的社区增强版）的命令行接口，需提前配置好图床后方可使用。功能定位与 Typora 的图片上传一致，是笔记软件的基础素质之一。

Obsidian CLI + Claudian

让 AI（Claude Code）直接读写 Vault 的桥梁，分两步启用：

开启 CLI：设置 → 关于 → Obsidian CLI → 点击注册 → 重启 Obsidian。
安装插件：从社区插件市场搜索并安装 Claudian。
前提：本机已完成 Claude Code 的配置，Claudian 会自动识别并接入。

整体的体验下来，会很像在 VSCode 里面使用 GitHub Copilot。在输入框的 Yolo 功能,，相当于是自动化修改。
而且 Claude 里面其实也有 Plan 模式的，点击 Shift + Tab 就可以直接在对话框里切换 Plan 模式
然后和它共同商量每一步应该怎么做，最后再让它去执行.
同时使用斜杠，依旧可以调用一些 Claude 里面的命令.

这一部分参考教程：
https://www.bilibili.com/video/BV1xFwxzKE5D

配套 Claude Code Skills（kepano/obsidian-skills）

Obsidian CEO Steph Ango 在 kepano/obsidian-skills 发布了一组官方 Agent Skill，让 Claude Code 真正”懂” Obsidian 的文件格式与协议。安装方式：把每个 skill 文件夹放到 vault 的 .claude/skills// 下，仅对该 vault 启动的 Claude Code 生效，不会污染全局或其它项目。

Skill	用途	我个人是否安装
obsidian-markdown	读写 Obsidian Flavored Markdown：`[[wikilink]]`、`![[embed]]`、callouts（`> [!note]`）、properties frontmatter 等 Obsidian 专属语法。不装的话 Claude 写 `.md` 时会按通用 Markdown 处理，可能破坏专属语法。	未安装（计划安装）
obsidian-bases	读写 `.base` 文件（Obsidian 1.9+ 引入的数据库视图，支持 views / filters / formulas / summaries）	未安装（暂不需要，当前 vault 还没有 `.base` 文件，等真正用到 Bases 再补）
json-canvas	读写 `.canvas` 文件（白板的 JSON 格式，包含节点、边、组、连线），让 Claude 能直接生成或修改 Canvas	未安装（计划安装）
obsidian-cli	教 Claude 调用 Obsidian 内置的 `obsidian://` URI 协议（如 `obsidian://open?vault=...&file=...`）以及可选的 HTTP API。不需要额外安装任何命令行二进制——所有调用走 Obsidian 自带能力。	未安装（计划安装）
defuddle	用 Defuddle 库从网页抽取干净 Markdown，自动去掉导航栏、广告、推荐等噪音，节省 token，适合”网页剪藏 → 笔记”场景	未安装（计划安装）

关于 obsidian-cli 的常见误解：这个 skill 不等于”装一个独立 CLI 工具”。obsidian:// URI 协议从 Obsidian 1.0 起就是默认内置功能，skill 的作用只是让 Claude 学会调用它来实现”打开某篇笔记、触发某个命令、跳转到指定 block”等操作。HTTP API 部分若想启用，需要额外安装社区插件 Local REST API（可选）。

与 Claudian 的关系：Claudian 是 Obsidian 端的插件，提供”在 Obsidian UI 里跟 Claude Code 对话”的入口；上面这些 skill 是 Claude Code 端的能力包，让 Claude 在读写 vault 文件时更专业。两者互补、不冲突。

下面我挨个介绍我比较推荐的这几个 skills。

Advanced Canvas 插件 + json-canvas

比如说我这一篇文章的顶部，有一个关于文件路径的介绍。上面有个思维导图，这个思维导图就是用 JSON Canvas 画出来的。
如果遇到一些比较难的文章或者是比较复杂的架构，可以让他帮你整理思维导图，方便理解。

Advanced Canvas 提供 30+ 增强功能：自定义流程图节点样式、Graph View 集成、幻灯片演示模式；支持 Portal（Canvas 套娃）与单节点嵌入 Markdown。

9. 参考资源

官方文档：https://help.obsidian.md/
官方论坛：https://forum.obsidian.md/
中文社区：少数派（sspai.com）的 Obsidian 系列
YouTube：Linking Your Thinking、Nicole van der Hoeven、Bryan Jenks
方法论：
- Niklas Luhmann《How to Take Smart Notes》（Zettelkasten 圣经）
- Tiago Forte《Building a Second Brain》（PARA 提出者）

Houdini MCP Project Comparison

Sat, 02 May 2026 04:00:00 GMT

Houdini MCP Project Comparison: capoomgit/houdini-mcp vs healkeiser/fxhoudinimcp

Introduction

As the MCP (Model Context Protocol) standard gains traction, more and more DCC applications are adding AI assistant integrations. In the Houdini ecosystem, two major open-source MCP projects currently exist:

capoomgit/houdini-mcp — an early-stage project with a clean, minimal structure
healkeiser/fxhoudinimcp — a newer, more feature-complete implementation

This post compares the two across architecture design, feature coverage, installation experience, and extensibility to help you pick the right one for your workflow.

Overview Comparison

Dimension	houdini-mcp (capoomgit)	fxhoudinimcp (healkeiser)
Focus	Lightweight MCP bridge	Full-featured Houdini MCP server
Tool count	Unspecified; covers basic operations	168 tools + 8 resources + 6 workflow prompts
Architecture	Custom TCP socket (port 9876)	Houdini’s built-in `hwebserver` (port 8100)
Installation	Manual file copy to Houdini directory	PyPI package, `pip install fxhoudinimcp`
Package manager dependency	Requires `uv`	Standard `pip` works fine
Thread safety	Not explicitly addressed	`hdefereval.executeInMainThreadWithResult()`
License	Not specified	MIT
Maintenance status	Community-maintained	Actively developed

Architecture Comparison

houdini-mcp (capoomgit)

1	Claude Desktop ──(stdio)──> MCP Bridge Script ──(TCP :9876)──> Houdini Plugin

Communication: The MCP Bridge Script talks to Claude via stdin/stdout and to Houdini via a custom TCP socket.
Server side: A hand-rolled HoudiniMCPServer listening on localhost:9876.
Inspired by: Adapted from blender-mcp.

fxhoudinimcp (healkeiser)

1	Claude Desktop / Cursor / Claude Code ──(stdio/streamable-http)──> FXHoudini MCP Server ──(HTTP :8100)──> Houdini hwebserver

Communication: The MCP Server talks to AI clients via stdio or streamable-http, and talks to Houdini via HTTP/JSON.
Server side: Uses Houdini’s built-in hwebserver directly — no custom server process needed.
Thread safety: Uses hdefereval.executeInMainThreadWithResult() to ensure all hou.* API calls run on the main thread.

Architecture Analysis

Aspect	houdini-mcp	fxhoudinimcp
Server implementation	Custom socket	Houdini native `hwebserver`
Transport protocol	TCP	HTTP / JSON
MCP transport	stdio	stdio + streamable-http
Thread safety	Unknown	Explicitly guaranteed
Dependency complexity	Requires a separate Bridge Script process	MCP Server communicates directly with hwebserver

Verdict: fxhoudinimcp’s architecture is more robust — it reuses Houdini’s native components, reducing the surface area for custom-code bugs.

Feature Coverage Comparison

houdini-mcp Feature Set

Provides basic Houdini control:

Create and modify nodes
Execute Python / HScript code
Basic scene operations
OPUS integration: Connects to the OPUS procedural furniture and environment asset library via RapidAPI (exclusive feature)

fxhoudinimcp Feature Set (19 categories, 168 tools)

Category	Tools	Description
Scene Management	7	Open, save, import/export, scene info
Node Operations	16	Create, delete, copy, connect, layout, flag
Parameters	10	Get/set values, expressions, keyframes, spare parameters
Geometry (SOPs)	12	Points, primitives, attributes, groups, sampling, nearest-point lookup
LOPs/USD	18	Stage inspection, Prim, layers, composition, variants, lights
DOPs	8	Simulation info, DOP objects, step/reset, memory usage
PDG/TOPs	10	Cook, Work Items, scheduler, dependency graph
COPs (Copernicus)	7	Image nodes, layers, VDB data
HDAs	10	Create, install, and manage digital assets
Animation	9	Keyframes, playbar control, frame range
Rendering	9	Viewport screenshots, render nodes, settings, render launch
VEX	5	Create/edit Wrangle nodes, validate VEX code
Code Execution	4	Python, HScript, expressions, environment variables
Viewport/UI	11	Pane management, screenshots, status messages, error detection
Scene Context	8	Network overview, Cook chain, selection, scene summary, error analysis
Workflows	8	One-click Pyro/RBD/FLIP/Vellum setup, SOP chains, render configuration
Materials	4	List, inspect, create materials and shader networks
CHOPs	4	Channel data, CHOP nodes, export channels to parameters
Cache	4	List, inspect, clear, write file caches
Takes	4	List, create, switch Takes and parameter overrides

Highlights:

One-click workflows: Instant Pyro, RBD, FLIP, and Vellum simulation setup
Full USD/LOPs support: 18 tools covering the USD pipeline
Copernicus (COPs) support: Image processing node operations
Scene context analysis: Error detection and Cook chain tracing

Installation and Configuration Comparison

houdini-mcp Installation Steps

Install uv (Python package manager)
Manually create the Houdini scripts directory and copy files
Run uv add "mcp[cli]" in the directory
Manually create a Shelf Tool
(Optional) Create a Houdini Package JSON for auto-loading
Configure claude_desktop_config.json

{
  "mcpServers": {
    "houdini": {
      "command": "uv",
      "args": ["run", "python", "C:/path/to/houdini_mcp_server.py"]
    }
  }
}

fxhoudinimcp Installation Steps

pip install fxhoudinimcp (or uv pip install fxhoudinimcp)
Copy the Package JSON to the Houdini packages directory
Configure the MCP client

{
  "mcpServers": {
    "fxhoudini": {
      "command": "python",
      "args": ["-m", "fxhoudinimcp"],
      "env": {
        "HOUDINI_HOST": "localhost",
        "HOUDINI_PORT": "8100"
      }
    }
  }
}

Claude Code support (one-liner):

1	claude mcp add --scope user fxhoudini -- python -m fxhoudinimcp

Installation Experience Comparison

Aspect	houdini-mcp	fxhoudinimcp
Installation steps	5-6 steps, multiple manual operations	2-3 steps, standardized process
Package manager	Requires `uv`	Standard `pip` or `uv` both work
PyPI package	No	Yes
Auto-start	Requires manual Package configuration	`uiready.py` handles auto-start
Documentation quality	Basic README	Detailed categorized docs + environment variable reference

Client Support Comparison

AI Client	houdini-mcp	fxhoudinimcp
Claude Desktop	Supported	Supported
Cursor	Supported	Supported
VS Code	Not mentioned	Supported
Claude Code CLI	Not mentioned	Supported (one-liner)

Exclusive Features

Exclusive to houdini-mcp

OPUS integration: Access to the OPUS procedural asset library (furniture and environment assets) via RapidAPI. Requires a RapidAPI account and an active API subscription.

Exclusive to fxhoudinimcp

One-click simulation workflows: Pyro / RBD / FLIP / Vellum setup in a single call
Deep USD/LOPs support: 18 dedicated tools
Copernicus image processing: COPs node operations
Scene error analysis: Automatic Cook error detection and reporting
Environment variable configuration: HOUDINI_HOST, HOUDINI_PORT, FXHOUDINIMCP_AUTOSTART, and more
Dual transport mode: stdio + streamable-http

Recommendations by Use Case

Choose houdini-mcp (capoomgit) if you:

Only need basic AI control of Houdini
Are already using a uv-based workflow
Specifically need OPUS procedural asset library integration
Have a simple project scope and want to get started quickly

Choose fxhoudinimcp (healkeiser) if you:

Need comprehensive Houdini coverage (SOPs, LOPs, DOPs, TOPs, COPs, etc.)
Work with USD/LOPs pipelines
Want one-click simulation workflows (Pyro / FLIP / Vellum / RBD)
Prefer a standardized installation via a PyPI package
Use Claude Code CLI as your primary AI tool
Need guaranteed thread safety
Value active maintenance and long-term project evolution

Conclusion

Evaluation Dimension	houdini-mcp	fxhoudinimcp	Winner
Feature richness	Basic	168 tools	fxhoudinimcp
Architecture robustness	Custom socket	Native hwebserver	fxhoudinimcp
Installation convenience	Multi-step manual	One-liner pip	fxhoudinimcp
Client compatibility	Desktop + Cursor	Desktop + Cursor + VSCode + Claude Code	fxhoudinimcp
Asset ecosystem	OPUS integration	None	houdini-mcp
Documentation quality	Basic	Comprehensive	fxhoudinimcp
Maintenance activity	Community-maintained	Actively developed	fxhoudinimcp

Overall recommendation: For most users, fxhoudinimcp is the better choice — broader feature coverage, a more robust architecture, and a smoother installation process. If you specifically need the OPUS procedural asset library integration, houdini-mcp is worth a look as a complementary tool.

References

capoomgit/houdini-mcp
healkeiser/fxhoudinimcp
blender-mcp — the project that inspired houdini-mcp

Houdini MCP 项目对比评测

Sat, 02 May 2026 04:00:00 GMT

Houdini MCP 项目对比评测：capoomgit/houdini-mcp vs healkeiser/fxhoudinimcp

引言

随着 MCP（Model Context Protocol）协议的普及，越来越多的 DCC 软件开始接入 AI 助手。在 Houdini 生态中，目前有两个主要的 MCP 开源项目：

capoomgit/houdini-mcp — 早期项目，结构简洁
healkeiser/fxhoudinimcp — 后起之秀，功能全面

本文从架构设计、功能覆盖、安装体验、扩展性等维度进行对比，帮助选择适合自己工作流的项目。

总览对比

维度	houdini-mcp (capoomgit)	fxhoudinimcp (healkeiser)
定位	轻量级 MCP 桥接	全面型 Houdini MCP 服务器
工具数量	未明确分类，基础功能为主	168 个工具 + 8 资源 + 6 工作流提示
架构	自定义 TCP Socket（端口 9876）	Houdini 内置 `hwebserver`（端口 8100）
安装方式	手动复制文件到 Houdini 目录	PyPI 发布，`pip install fxhoudinimcp`
包管理依赖	强依赖 `uv`	标准 `pip` 即可
线程安全	未明确说明	`hdefereval.executeInMainThreadWithResult()`
许可证	未明确	MIT
维护状态	社区维护	活跃开发中

架构设计对比

houdini-mcp（capoomgit）

1	Claude Desktop ──(stdio)──> MCP Bridge Script ──(TCP :9876)──> Houdini Plugin

通信方式：MCP Bridge Script 通过 stdin/stdout 与 Claude 通信，通过自定义 TCP Socket 与 Houdini 通信。
服务端：自己实现的 HoudiniMCPServer，监听在 localhost:9876。
灵感来源：基于 blender-mcp 改写。

fxhoudinimcp（healkeiser）

1	Claude Desktop / Cursor / Claude Code ──(stdio/streamable-http)──> FXHoudini MCP Server ──(HTTP :8100)──> Houdini hwebserver

通信方式：MCP Server 通过 stdio 或 streamable-http 与 AI 客户端通信，通过 HTTP/JSON 与 Houdini 通信。
服务端：直接使用 Houdini 内置的 hwebserver，无需额外启动自定义服务器。
线程安全：使用 hdefereval.executeInMainThreadWithResult() 确保 hou.* API 调用在主线程执行。

架构差异分析

对比点	houdini-mcp	fxhoudinimcp
服务端实现	自定义 Socket	Houdini 原生 `hwebserver`
传输协议	TCP	HTTP / JSON
MCP 传输	stdio	stdio + streamable-http
线程安全	未知	有明确保障
依赖复杂度	需要额外运行 Bridge Script	MCP Server 直接与 hwebserver 通信

结论：fxhoudinimcp 的架构更稳健 — 复用 Houdini 原生组件，减少自定义代码带来的潜在问题。

功能覆盖对比

houdini-mcp 功能范围

提供基础的 Houdini 控制：

创建和修改节点
执行 Python / HScript 代码
场景基础操作
OPUS 集成：通过 RapidAPI 接入 OPUS 的程序化家具和环境资产库（独有功能）

fxhoudinimcp 功能范围（19 个分类，168 个工具）

分类	工具数	说明
Scene Management	7	打开、保存、导入/导出、场景信息
Node Operations	16	创建、删除、复制、连接、布局、标记
Parameters	10	获取/设置值、表达式、关键帧、自定义参数
Geometry (SOPs)	12	点、面、属性、组、采样、最近点查找
LOPs/USD	18	Stage 检查、Prim、层、合成、变体、灯光
DOPs	8	模拟信息、DOP 对象、步进/重置、内存使用
PDG/TOPs	10	Cook、Work Item、调度器、依赖图
COPs (Copernicus)	7	图像节点、层、VDB 数据
HDAs	10	创建、安装、管理数字资产
Animation	9	关键帧、播放条控制、帧范围
Rendering	9	视口截图、渲染节点、设置、渲染启动
VEX	5	创建/编辑 Wrangle、验证 VEX 代码
Code Execution	4	Python、HScript、表达式、环境变量
Viewport/UI	11	面板管理、截图、状态消息、错误检测
Scene Context	8	网络概览、Cook 链、选择、场景摘要、错误分析
Workflows	8	一键 Pyro/RBD/FLIP/Vellum 设置、SOP 链、渲染配置
Materials	4	列出、检查、创建材质和着色器网络
CHOPs	4	通道数据、CHOP 节点、导出通道到参数
Cache	4	列出、检查、清除、写入文件缓存
Takes	4	列出、创建、切换 Take 及参数覆盖

亮点：

一键工作流：Pyro、RBD、FLIP、Vellum 模拟一键搭建
USD/LOPs 全面支持：18 个工具覆盖 USD 工作流
Copernicus (COPs) 支持：图像处理节点操作
场景上下文分析：错误检测、Cook 链追踪

安装与配置对比

houdini-mcp 安装步骤

安装 uv（Python 包管理工具）
手动创建 Houdini 脚本目录并复制文件
在目录中运行 uv add "mcp[cli]"
手动创建 Shelf Tool
（可选）创建 Houdini Package JSON 实现自动加载
配置 claude_desktop_config.json

{
  "mcpServers": {
    "houdini": {
      "command": "uv",
      "args": ["run", "python", "C:/path/to/houdini_mcp_server.py"]
    }
  }
}

fxhoudinimcp 安装步骤

pip install fxhoudinimcp（或 uv pip install fxhoudinimcp）
复制 Package JSON 到 Houdini packages 目录
配置 MCP 客户端

{
  "mcpServers": {
    "fxhoudini": {
      "command": "python",
      "args": ["-m", "fxhoudinimcp"],
      "env": {
        "HOUDINI_HOST": "localhost",
        "HOUDINI_PORT": "8100"
      }
    }
  }
}

Claude Code 支持（一行命令）：

1	claude mcp add --scope user fxhoudini -- python -m fxhoudinimcp

安装体验对比

对比点	houdini-mcp	fxhoudinimcp
安装步骤	5-6 步，多处手动操作	2-3 步，标准化流程
包管理	强依赖 `uv`	标准 `pip` / `uv` 均可
PyPI 发布	无	有
自动启动	需手动配置 Package	`uiready.py` 自动启动
文档质量	基础 README	详细的分类文档 + 环境变量说明

客户端支持对比

AI 客户端	houdini-mcp	fxhoudinimcp
Claude Desktop	支持	支持
Cursor	支持	支持
VS Code	未提及	支持
Claude Code CLI	未提及	支持（一行命令）

独有功能

houdini-mcp 独有

OPUS 集成：通过 RapidAPI 接入 OPUS 程序化资产库，可获取家具和环境资产。需要注册 RapidAPI 账号并订阅 API。

fxhoudinimcp 独有

一键模拟工作流：Pyro / RBD / FLIP / Vellum 一键搭建
USD/LOPs 深度支持：18 个工具
Copernicus 图像处理：COPs 节点操作
场景错误分析：自动检测和报告 Cook 错误
环境变量配置：HOUDINI_HOST、HOUDINI_PORT、FXHOUDINIMCP_AUTOSTART 等
双传输模式：stdio + streamable-http

适用场景建议

选择 houdini-mcp（capoomgit）的情况

只需要基础的 AI 控制 Houdini 功能
已经在使用 uv 工作流
需要 OPUS 程序化资产库的集成
项目结构简单，希望快速上手

选择 fxhoudinimcp（healkeiser）的情况

需要全面的 Houdini 功能覆盖（SOPs、LOPs、DOPs、TOPs、COPs 等）
需要 USD/LOPs 工作流支持
需要一键模拟工作流（Pyro / FLIP / Vellum / RBD）
希望使用标准化安装（PyPI 包）
使用 Claude Code CLI 作为主要 AI 工具
需要线程安全保障
重视项目的活跃维护和长期演进

结论

评价维度	houdini-mcp	fxhoudinimcp	胜出
功能丰富度	基础	168 工具	fxhoudinimcp
架构稳健性	自定义 Socket	原生 hwebserver	fxhoudinimcp
安装便利性	手动多步	pip 一键	fxhoudinimcp
客户端兼容	Desktop + Cursor	Desktop + Cursor + VSCode + Claude Code	fxhoudinimcp
资产生态	OPUS 集成	无	houdini-mcp
文档质量	基础	完善	fxhoudinimcp
维护活跃度	社区维护	活跃开发	fxhoudinimcp

综合推荐：对于大多数用户，fxhoudinimcp 是更好的选择 — 更全面的功能覆盖、更稳健的架构、更便捷的安装流程。如果你特别需要 OPUS 程序化资产库的集成，可以额外关注 houdini-mcp。

参考链接

AI Agent Framework Research Notes

Thu, 30 Apr 2026 04:00:00 GMT

AI Agent Framework Research Notes

Last updated: 2026-04-30
As AI Agent technology evolves at a rapid pace, new agent development frameworks keep appearing. This document surveys and compares six of the most widely adopted Agent frameworks available today, helping developers choose the right tool for their use case.

I. Framework Overview Comparison
II. LangGraph
III. CrewAI
IV. LlamaIndex
V. Dify
VI. OpenAI Agents SDK
VII. Google ADK
VIII. Framework Selection Guide

I. Framework Overview Comparison

Dimension	LangGraph	CrewAI	LlamaIndex	Dify	OpenAI Agents SDK	Google ADK
Developer	LangChain Inc.	CrewAI Inc.	LlamaIndex Inc.	LangGenius	OpenAI	Google
Latest Version	v1.1.10	v1.14.3	v0.14.6	v1.6.0+	v0.14.6	v1.31.1
License	MIT	MIT	MIT	Dify License (Apache 2.0+)	MIT	Apache 2.0
Language	Python / JS	Python	Python / TS	Visual (multi-language SDK)	Python / JS	Python / Java / Go / TS
Core Philosophy	Graph orchestration	Role-playing teams	RAG + Agent	Low-code platform	Minimal multi-agent	Code-first
Model Support	Model-agnostic	Model-agnostic	Model-agnostic	Model-agnostic	100+ LLMs	Model-agnostic
Learning Curve	Steep	Moderate	Moderate	Low	Low	Moderate
Best For	Complex stateful workflows	Multi-role collaboration	RAG + Agent	Rapid prototyping / non-technical users	OpenAI ecosystem apps	Google ecosystem apps

II. LangGraph

2.1 Introduction

LangGraph is a low-level orchestration framework developed by the LangChain team, specifically designed for building long-running, stateful AI Agents. It draws design inspiration from Google’s Pregel and Apache Beam.

Core positioning: rather than abstracting away prompts or architecture, LangGraph provides low-level infrastructure that gives developers fine-grained control over agent workflows. It is already used in production by companies such as Klarna, Replit, and Elastic.

Project Info	Details
Latest Version	v1.1.10 (2026-04-27)
License	MIT
Install	`pip install -U langgraph`
GitHub	langchain-ai/langgraph
Docs	docs.langchain.com/oss/python/langgraph

2.2 Core Architecture

LangGraph models agent workflows as a Graph, built from three core components:

State: A shared data structure, typically defined using TypedDict or a Pydantic Model
Nodes: Functions that encode agent logic — they receive the current state and return an updated state
Edges: Functions that determine the next node, supporting conditional branching or fixed transitions

2.3 Key Features

Persistence: Saves the graph state as a checkpoint after each execution step; supports in-memory, Redis, and other backends
Human-in-the-Loop: Uses interrupt() to pause execution and wait for human input before resuming
Streaming: Supports multiple streaming modes including values, messages, and updates
Subgraphs: Supports nested graphs where subgraphs have their own independent checkpoints and interrupt capabilities
Time Travel: Can rewind to any historical checkpoint, with support for forking and replaying
Visualization: After compilation, can generate Mermaid diagrams to visualize the workflow structure

2.4 Code Example

from typing import Literal
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain.messages import SystemMessage, HumanMessage, ToolMessage

# Define tools
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

tools = [multiply, add]
tools_by_name = {tool.name: tool for tool in tools}

# Define nodes
def llm_call(state: MessagesState):
    """LLM decides whether to call a tool"""
    return {
        "messages": [
            llm_with_tools.invoke(
                [SystemMessage(content="You are a helpful arithmetic assistant.")]
                + state["messages"]
            )
        ]
    }

def tool_node(state: dict):
    """Execute tool calls"""
    result = []
    for tool_call in state["messages"][-1].tool_calls:
        tool = tools_by_name[tool_call["name"]]
        observation = tool.invoke(tool_call["args"])
        result.append(ToolMessage(content=str(observation), tool_call_id=tool_call["id"]))
    return {"messages": result}

# Define conditional edge routing
def should_continue(state: MessagesState) -> Literal["tool_node", END]:
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tool_node"
    return END

# Build and compile the graph
builder = StateGraph(MessagesState)
builder.add_node("llm_call", llm_call)
builder.add_node("tool_node", tool_node)
builder.add_edge(START, "llm_call")
builder.add_conditional_edges("llm_call", should_continue, ["tool_node", END])
builder.add_edge("tool_node", "llm_call")

agent = builder.compile()

# Run
result = agent.invoke({"messages": [HumanMessage(content="Add 3 and 4, then multiply by 5.")]})

2.5 Strengths and Limitations

Strengths: Fine-grained control, stateful execution, native human-in-the-loop, fault-tolerant recovery, time-travel debugging, framework-agnostic

Limitations: Steep learning curve, lots of boilerplate code, best experience requires the LangSmith ecosystem, fast-moving release cycle

III. CrewAI

3.1 Introduction

CrewAI is a Python framework for orchestrating multiple autonomous AI Agents, built entirely from scratch with no dependency on LangChain or any other framework. Its core idea is to simulate real team collaboration through role-playing.

Project Info	Details
Latest Version	v1.14.3 (2025-04-24)
License	MIT
Install	`pip install 'crewai[tools]'`
GitHub	crewAIInc/crewAI
Docs	docs.crewai.com

3.2 Core Concepts

Agent: Identity and behavior defined through role, goal, and backstory
Task: A concrete unit of work; can specify the assigned agent, context dependencies, and output format
Crew: A collection of collaborating agents, defining the execution process and memory configuration
Tools: A rich set of built-in tools (search, file read/write, code execution, etc.) with MCP integration support
Process: Either Sequential or Hierarchical (automatically creates a Manager Agent)

3.3 Key Features

Role-playing design: Intuitive role definitions that closely mirror real team collaboration
Collaborative workflows: Agents can delegate tasks to one another and pass context between them
Four memory systems: Short-term memory, long-term memory, entity memory, and contextual memory
Flows: Enterprise-grade event-driven workflow orchestration, supporting @start, @listen, and @router decorators
Checkpoint & Fork: Supports saving, restoring, and branching execution state
YAML-driven configuration: Agents and tasks can be defined via YAML files

3.4 Code Example

from crewai import Agent, Task, Crew, Process

# Define Agents
researcher = Agent(
    role='Senior AI Researcher',
    goal='Discover the latest development trends in the AI Agent space',
    backstory='You are an experienced researcher with a knack for spotting cutting-edge technology developments.',
    verbose=True,
    memory=True,
)

writer = Agent(
    role='Technical Report Writing Specialist',
    goal='Transform research findings into clear, well-structured reports',
    backstory='You are a technical writing expert who excels at turning complex information into readable reports.',
    verbose=True,
    memory=True,
)

# Define Tasks
research_task = Task(
    description='Conduct comprehensive research on {topic} and gather the latest development trends.',
    expected_output='A detailed list of research findings with 10 key points',
    agent=researcher,
)

writing_task = Task(
    description='Write a complete technical report based on the research findings.',
    expected_output='A complete report in Markdown format',
    agent=writer,
    context=[research_task],
    output_file='report.md',
)

# Assemble the Crew and run
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff(inputs={'topic': 'multi-agent collaboration systems'})

3.5 Strengths and Limitations

Strengths: Fully standalone with no external dependencies, intuitive role-playing design, four memory systems, YAML-driven configuration, active community (100,000+ certified developers)

Limitations: Python only, high API overhead for multi-agent collaboration, complex to debug, enterprise features require a paid plan

IV. LlamaIndex

4.1 Introduction

LlamaIndex (formerly GPT Index) is an open-source framework that started out focused on RAG (Retrieval-Augmented Generation) and has since expanded into a document intelligence and OCR platform. Founded by Jerry Liu in 2022.

Project Info	Details
Latest Version	v0.14.6
License	MIT
Install	`pip install llama-index`
GitHub	run-llama/llama_index
Docs	developers.llamaindex.ai

4.2 Core Concepts

Workflow: An event-driven orchestration mechanism where steps are defined using the @step decorator
Context: A global runtime context that coordinates data passing between steps and supports persistence
Event-driven architecture: StartEvent → custom events → StopEvent, forming a directed graph
AgentWorkflow: A high-level abstraction that automatically selects the appropriate agent type based on LLM capabilities

4.3 Agent Types

Type	Use Case	Characteristics
FunctionAgent	When the LLM supports function calling	Uses native function calling directly — most efficient
ReActAgent	When the LLM does not support function calling	Executes via the ReAct (Reasoning + Acting) loop
CodeActAgent	Scenarios that require code execution	Generates and executes code via tags

4.4 Key Features

RAG + Agent integration: RAG is a first-class capability, not an add-on; supports 130+ data formats
Multi-agent collaboration: Native support for multi-agent handoff mechanisms
Context persistence: Workflow state can be serialized and restored, suitable for production environments
LlamaParse: Enterprise-grade document parsing and OCR
300+ integration packages: Covers mainstream LLMs, vector databases, and data sources

4.5 Code Example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
import asyncio

# Build a RAG index
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Define tools
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

async def search_documents(query: str) -> str:
    """Search documents for answers."""
    response = await query_engine.aquery(query)
    return str(response)

# Create the agent
agent = FunctionAgent(
    tools=[multiply, search_documents],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are a helpful assistant that can calculate and search documents.",
)

# Run
async def main():
    response = await agent.run("What did the author do in college? Also, what's 7 * 8?")
    print(response)

asyncio.run(main())

4.6 Strengths and Limitations

Strengths: Deep RAG + Agent integration, flexible event-driven architecture, 300+ ecosystem integrations, multi-agent support, LlamaParse enterprise-grade parsing

Limitations: Steep learning curve, relatively heavy framework, TypeScript version has incomplete feature coverage, fast release cycle with frequent breaking changes, enterprise features require a paid plan

V. Dify

5.1 Introduction

Dify (Do It For You) is an open-source LLM application development platform positioned as an agentic workflow builder. It combines Backend-as-a-Service with LLMOps, enabling both non-technical users and developers to rapidly build AI applications.

Project Info	Details
Latest Version	v1.6.0+
License	Dify Open Source License (Apache 2.0+)
Deploy	`docker compose up -d`
GitHub	langgenius/dify
Docs	docs.dify.ai

5.2 Core Features

Visual workflow builder: Drag-and-drop canvas supporting parallel processing, conditional branching, and loop nodes
Agent strategies: Supports Function Calling, ReAct, and custom strategy plugins
RAG pipeline: A complete data source → processing → knowledge base → retrieval flow
Model management: Seamless integration with hundreds of LLMs, with model switching and performance comparison
Prompt IDE: An intuitive prompt authoring interface
LLMOps: Monitor and analyze application logs and performance

5.3 Agent Strategies

Strategy	Use Case
Function Calling	Models with native tool calling support (e.g., GPT-4, Claude)
ReAct	Models without native function calling, or when explicit reasoning traces are needed
Custom Strategy Plugin	Complex behaviors requiring multi-turn tool calls, etc.

5.4 How to Create an Agent

Dify uses a visual / no-code approach:

Create an “Agent” type application in Dify Studio
Select an LLM model
Set the Agent strategy (automatically detects Function Calling support)
Choose from 50+ built-in tools or add custom tools
Write a system prompt
Preview and debug, then publish with one click

5.5 Integration Capabilities

API: Full RESTful API with SSE streaming support
SDK: Node.js, PHP, and Java clients
Plugin system: Six plugin categories — models, tools, agent strategies, extensions, data sources, and triggers
MCP integration: Native support for the Model Context Protocol
Deployment: Docker Compose, Kubernetes, Terraform, AWS CDK

5.6 Strengths and Limitations

Strengths: Low-code / no-code, ready out of the box (50+ built-in tools), rapid path from prototype to production, multi-model support, active community (800+ contributors)

Limitations: Limited customization flexibility (less than code-first frameworks), execution subject to step/time limits, license is not pure Apache 2.0, risk of platform lock-in, advanced reasoning modes are less mature than dedicated frameworks

VI. OpenAI Agents SDK

6.1 Introduction

OpenAI Agents SDK is a lightweight multi-agent framework officially released by OpenAI, evolved from the internal Swarm experimental project. Its core philosophy is minimalist design — building complex workflows from just a few concepts: Agent, Handoff, Guardrail, and Tool.

Project Info	Details
Latest Version	v0.14.6 (2026-04-25)
License	MIT
Install	`pip install openai-agents`
GitHub	openai/openai-agents-python
Docs	openai.github.io/openai-agents-python

6.2 Core Concepts

Agent: An LLM configured with instructions, tools, guardrails, and handoff capabilities
Runner: The agent executor, providing run() (async), run_sync() (synchronous), and run_streamed() (streaming)
Handoff: Task delegation between agents; the receiving agent inherits the full conversation history
Guardrails: Safety rails in three categories — input guardrails, output guardrails, and tool guardrails
Tools: Supports function tools, MCP tools, OpenAI hosted tools, and Agent as Tool

6.3 Key Features

Minimalist design: Few core primitives, gentle learning curve
Provider-agnostic: Supports 100+ LLMs via any-llm / LiteLLM
Three-layer guardrails: Safety validation at the input → output → tool level
Built-in Tracing: Visualize and debug agent execution flows
Realtime Agents: Build voice agents (gpt-realtime-1.5)
Sandbox Agents: Added in v0.14.0 — executes code in a containerized environment
Structured output: Define output types via Pydantic Model using output_type

6.4 Code Example

import asyncio
from agents import Agent, Runner, function_tool

# Define a tool
@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a specified city."""
    return f"The weather in {city} is sunny."

# Define specialist Agents
billing_agent = Agent(
    name="Billing Agent",
    instructions="You are a billing specialist.",
)

refund_agent = Agent(
    name="Refund Agent",
    instructions="You are a refund specialist.",
)

# Define a triage Agent
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the user's question to the correct specialist Agent: billing issues -> Billing Agent; refund issues -> Refund Agent.",
    handoffs=[billing_agent, refund_agent],
    tools=[get_weather],
)

# Run
async def main():
    result = await Runner.run(
        triage_agent,
        "I was charged twice for my subscription. Please help me resolve this.",
    )
    print(f"Final answer: {result.final_output}")
    print(f"Handled by Agent: {result.last_agent.name}")

asyncio.run(main())

6.5 Strengths and Limitations

Strengths: Officially maintained, minimalist design, provider-agnostic, three-layer guardrails, built-in tracing, voice agent support

Limitations: Still at 0.x — API may change, deep dependency on the OpenAI ecosystem, no parallel agent execution support, no built-in persistent memory system

VII. Google ADK

7.1 Introduction

Google ADK (Agent Development Kit) is an open-source, code-first agent development framework released by Google. Its design philosophy is to make AI agent development feel like traditional software development. It is optimized for Gemini and Google Cloud, while remaining model-agnostic and deployment-agnostic.

Project Info	Details
Latest Version	v1.31.1 (2026-04-30)
License	Apache 2.0
Install	`pip install google-adk`
GitHub	google/adk-python
Docs	google.github.io/adk-docs

7.2 Core Concepts

LlmAgent (alias Agent): The core building block — combines an LLM model, instructions, and tools
SequentialAgent: Executes sub-agents one after another in order (pipeline style)
ParallelAgent: Runs multiple sub-agents concurrently
LoopAgent: Repeatedly executes sub-agents with support for exit conditions
sub_agents: Nesting sub-agents to build hierarchical multi-agent architectures

7.3 Key Features

Multi-agent orchestration: Sequential, parallel, loop-based, and LLM-driven dynamic routing
Built-in tools: Google Search, Vertex AI Search, code executor, and more
Google ecosystem integration: Native Gemini, Vertex AI Agent Engine, Cloud Run
Flexible deployment: Local, Agent Engine (fully managed), Cloud Run, GKE, Docker
Built-in evaluation: CLI tool adk eval for systematic agent performance assessment
A2A protocol: Supports Agent-to-Agent remote communication
Lifecycle callbacks: before/after_agent, before/after_model, and before/after_tool hooks

7.4 Code Example

import asyncio
from google.adk.agents import Agent, SequentialAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from google.adk.tools import google_search

# Define a weather Agent
weather_agent = Agent(
    name="weather_assistant",
    model="gemini-2.5-flash",
    instruction="You are a weather query assistant. Use Google Search to find the latest weather information.",
    tools=[google_search],
)

# Define a translation Agent
translate_agent = Agent(
    name="translate_assistant",
    model="gemini-2.5-flash",
    instruction="You are a translation assistant. Translate content into Chinese.",
)

# Compose into a sequential workflow
pipeline = SequentialAgent(
    name="WeatherPipeline",
    sub_agents=[weather_agent, translate_agent],
)

# Run
session_service = InMemorySessionService()
runner = Runner(agent=pipeline, app_name="weather_app", session_service=session_service)

async def run_agent(query: str):
    session = session_service.create_session(
        app_name="weather_app", user_id="user_1", session_id="session_1"
    )
    content = types.Content(role='user', parts=[types.Part(text=query)])
    async for event in runner.run_async(
        user_id="user_1", session_id="session_1", new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            print(f"Agent reply: {event.content.parts[0].text.strip()}")

asyncio.run(run_agent("What's the weather in Tokyo today?"))

7.5 Strengths and Limitations

Strengths: Code-first, powerful orchestration (sequential / parallel / loop), deep Google ecosystem integration, built-in evaluation, multi-language support (Python / Java / Go / TS), Apache 2.0 open source

Limitations: Best experience requires Gemini and Google Cloud, relatively new framework with an early-stage community ecosystem, frequent releases mean the API may change, access to Google services is restricted from mainland China

VIII. Framework Selection Guide

Choose by Use Case

Use Case	Recommended Framework	Reason
Complex stateful workflows	LangGraph	Low-level graph orchestration, persistence, time travel
Multi-role team collaboration	CrewAI	Role-playing design, delegation mechanism, memory systems
RAG + Agent	LlamaIndex	Deep RAG integration, 130+ data formats, document parsing
Rapid prototyping / non-technical teams	Dify	Visual drag-and-drop, low-code, ready out of the box
Primarily OpenAI models	OpenAI Agents SDK	Officially maintained, minimal API, tracing and debugging
Google Cloud deployment	Google ADK	Gemini-optimized, Vertex AI integration, built-in evaluation
Need fine-grained control	LangGraph / Google ADK	Low-level APIs, lifecycle callback hooks
Need production-grade guardrails	OpenAI Agents SDK	Three-layer Guardrails

Choose by Team Profile

Team Profile	Recommendation
Full-stack development teams	LangGraph, Google ADK
Python data science teams	CrewAI, LlamaIndex
Product managers / operations teams	Dify
Heavy OpenAI ecosystem users	OpenAI Agents SDK
Google Cloud users	Google ADK
Need to validate ideas quickly	Dify, OpenAI Agents SDK

Note: The framework information above is based on research conducted in April 2026. All frameworks iterate quickly — check the official documentation for the latest information before getting started.

AI Agent 框架调研笔记

Thu, 30 Apr 2026 04:00:00 GMT

AI Agent 框架调研笔记

更新时间：2026-04-30
随着 AI Agent 技术的快速发展，各类 Agent 开发框架层出不穷。本文档对当前主流的 6 个 Agent 框架进行调研和对比分析，帮助开发者选择合适的工具。

一、框架概览对比

维度	LangGraph	CrewAI	LlamaIndex	Dify	OpenAI Agents SDK	Google ADK
开发方	LangChain Inc.	CrewAI Inc.	LlamaIndex Inc.	LangGenius	OpenAI	Google
最新版本	v1.1.10	v1.14.3	v0.14.6	v1.6.0+	v0.14.6	v1.31.1
许可证	MIT	MIT	MIT	Dify License (Apache 2.0+)	MIT	Apache 2.0
语言	Python / JS	Python	Python / TS	可视化（多语言 SDK）	Python / JS	Python / Java / Go / TS
核心理念	图编排	角色扮演团队	RAG + Agent	低代码平台	极简多 Agent	代码优先
模型支持	模型无关	模型无关	模型无关	模型无关	100+ LLM	模型无关
学习曲线	较陡	中等	中等	低	低	中等
适合场景	复杂有状态工作流	多角色协作	RAG + Agent	快速原型/非技术用户	OpenAI 生态应用	Google 生态应用

二、LangGraph

2.1 简介

LangGraph 是由 LangChain 团队开发的底层编排框架，专门用于构建长时间运行的、有状态的 AI Agent。设计灵感来自 Google 的 Pregel 和 Apache Beam。

核心定位：不抽象化提示词或架构，提供底层基础设施，让开发者精细控制 Agent 工作流。已被 Klarna、Replit、Elastic 等公司用于生产环境。

项目信息	详情
最新版本	v1.1.10（2026-04-27）
许可证	MIT
安装	`pip install -U langgraph`
GitHub	langchain-ai/langgraph
文档	docs.langchain.com/oss/python/langgraph

2.2 核心架构

LangGraph 将 Agent 工作流建模为图（Graph），由三个核心组件构成：

State（状态）：共享数据结构，通常用 TypedDict 或 Pydantic Model 定义
Nodes（节点）：编码 Agent 逻辑的函数，接收当前状态、返回更新后的状态
Edges（边）：决定下一个节点的函数，支持条件分支或固定转换

2.3 关键特性

持久化（Persistence）：每个执行步骤将图状态保存为 checkpoint，支持内存、Redis 等后端
人机协作（Human-in-the-Loop）：通过 interrupt() 暂停执行，等待人工输入后恢复
流式输出（Streaming）：支持 values、messages、updates 等多种流式模式
子图（Subgraphs）：支持图嵌套，子图拥有独立的 checkpoint 和中断能力
时间旅行：可回溯到任意历史 checkpoint，支持 fork 和重放
可视化：编译后可生成 Mermaid 图形展示工作流结构

2.4 代码示例

from typing import Literal
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain.messages import SystemMessage, HumanMessage, ToolMessage

# 定义工具
def multiply(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

tools = [multiply, add]
tools_by_name = {tool.name: tool for tool in tools}

# 定义节点
def llm_call(state: MessagesState):
    """LLM 决定是否调用工具"""
    return {
        "messages": [
            llm_with_tools.invoke(
                [SystemMessage(content="You are a helpful arithmetic assistant.")]
                + state["messages"]
            )
        ]
    }

def tool_node(state: dict):
    """执行工具调用"""
    result = []
    for tool_call in state["messages"][-1].tool_calls:
        tool = tools_by_name[tool_call["name"]]
        observation = tool.invoke(tool_call["args"])
        result.append(ToolMessage(content=str(observation), tool_call_id=tool_call["id"]))
    return {"messages": result}

# 定义条件边路由
def should_continue(state: MessagesState) -> Literal["tool_node", END]:
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tool_node"
    return END

# 构建并编译图
builder = StateGraph(MessagesState)
builder.add_node("llm_call", llm_call)
builder.add_node("tool_node", tool_node)
builder.add_edge(START, "llm_call")
builder.add_conditional_edges("llm_call", should_continue, ["tool_node", END])
builder.add_edge("tool_node", "llm_call")

agent = builder.compile()

# 运行
result = agent.invoke({"messages": [HumanMessage(content="Add 3 and 4, then multiply by 5.")]})

2.5 优势与局限

优势： 精细化控制、有状态执行、原生人机协作、容错恢复、时间旅行调试、框架无关

局限： 学习曲线较陡、样板代码多、最佳体验需配合 LangSmith 生态、版本迭代快

三、CrewAI

3.1 简介

CrewAI 是一个用于编排多个自主 AI Agent 的 Python 框架，完全从零构建，不依赖 LangChain 或其他框架。核心理念是通过角色扮演模拟真实团队协作。

项目信息	详情
最新版本	v1.14.3（2025-04-24）
许可证	MIT
安装	`pip install 'crewai[tools]'`
GitHub	crewAIInc/crewAI
文档	docs.crewai.com

3.2 核心概念

Agent（智能体）：通过 role（角色）、goal（目标）、backstory（背景故事）定义身份和行为
Task（任务）：具体工作单元，可指定执行者、上下文依赖和输出格式
Crew（团队）：一组协作 Agent 的集合，定义执行流程和记忆配置
Tools（工具）：丰富的内置工具集（搜索、文件读写、代码执行等），支持 MCP 集成
Process（流程）：Sequential（顺序）或 Hierarchical（层级，自动创建 Manager Agent）

3.3 关键特性

角色扮演设计：直观的角色定义方式，贴近真实团队协作
协作工作流：Agent 间可委派任务、传递上下文
四种记忆系统：短期记忆、长期记忆、实体记忆、上下文记忆
Flows（流程编排）：企业级事件驱动工作流，支持 @start、@listen、@router 装饰器
Checkpoint & Fork：支持执行状态的保存、恢复和分支
YAML 配置驱动：Agent 和 Task 可通过 YAML 文件定义

3.4 代码示例

from crewai import Agent, Task, Crew, Process

# 定义 Agent
researcher = Agent(
    role='高级 AI 研究员',
    goal='发现 AI Agent 领域的最新发展趋势',
    backstory='你是一位经验丰富的研究员，擅长发现前沿技术的最新动态。',
    verbose=True,
    memory=True,
)

writer = Agent(
    role='技术报告撰写专家',
    goal='将研究发现转化为清晰、结构化的报告',
    backstory='你是一位技术写作专家，擅长将复杂信息转化为易读的报告。',
    verbose=True,
    memory=True,
)

# 定义 Task
research_task = Task(
    description='对 {topic} 进行全面调研，收集最新的发展趋势。',
    expected_output='包含 10 个要点的详细研究发现列表',
    agent=researcher,
)

writing_task = Task(
    description='根据研究发现撰写一份完整的技术报告。',
    expected_output='完整的 Markdown 格式报告',
    agent=writer,
    context=[research_task],
    output_file='report.md',
)

# 组建 Crew 并执行
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff(inputs={'topic': '多智能体协作系统'})

3.5 优势与局限

优势： 完全独立无依赖、角色扮演直观、四种记忆系统、YAML 配置驱动、活跃社区（10 万+ 认证开发者）

局限： 仅支持 Python、多 Agent 协作 API 开销大、调试复杂、企业功能需付费

四、LlamaIndex

4.1 简介

LlamaIndex（原名 GPT Index）是一个开源框架，最初专注于 RAG（检索增强生成），现已扩展为文档智能体和 OCR 平台。由 Jerry Liu 于 2022 年创立。

项目信息	详情
最新版本	v0.14.6
许可证	MIT
安装	`pip install llama-index`
GitHub	run-llama/llama_index
文档	developers.llamaindex.ai

4.2 核心概念

Workflow（工作流）：事件驱动的编排机制，通过 @step 装饰器定义步骤
Context（上下文）：全局运行时上下文，协调步骤间数据传递，支持持久化
事件驱动架构：StartEvent → 自定义事件 → StopEvent，形成有向图
AgentWorkflow：高层封装，自动根据 LLM 能力选择合适的 Agent 类型

4.3 Agent 类型

类型	适用场景	特点
FunctionAgent	LLM 支持函数调用时	直接使用原生 function calling，效率最高
ReActAgent	LLM 不支持函数调用时	通过 ReAct（推理+行动）循环执行
CodeActAgent	需要执行代码的场景	通过标签生成并执行代码

4.4 关键特性

RAG + Agent 一体化：RAG 是核心能力而非补充，130+ 数据格式接入
多智能体协作：原生支持多 Agent 交接（handoff）机制
Context 持久化：工作流状态可序列化恢复，适合生产环境
LlamaParse：企业级文档解析和 OCR
300+ 集成包：覆盖主流 LLM、向量数据库、数据源

4.5 代码示例

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
import asyncio

# 构建 RAG 索引
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# 定义工具
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

async def search_documents(query: str) -> str:
    """Search documents for answers."""
    response = await query_engine.aquery(query)
    return str(response)

# 创建智能体
agent = FunctionAgent(
    tools=[multiply, search_documents],
    llm=OpenAI(model="gpt-4o-mini"),
    system_prompt="You are a helpful assistant that can calculate and search documents.",
)

# 运行
async def main():
    response = await agent.run("What did the author do in college? Also, what's 7 * 8?")
    print(response)

asyncio.run(main())

4.6 优势与局限

优势： RAG + Agent 深度集成、事件驱动架构灵活、300+ 生态集成、多智能体支持、LlamaParse 企业级解析

局限： 学习曲线较陡、框架较重、TS 版本功能覆盖不全、版本迭代快有 breaking changes、企业功能需付费

五、Dify

5.1 简介

Dify（Do It For You）是一个开源的 LLM 应用开发平台，定位为智能体工作流构建器。将 Backend-as-a-Service 与 LLMOps 结合，让非技术用户和开发者都能快速构建 AI 应用。

项目信息	详情
最新版本	v1.6.0+
许可证	Dify Open Source License (Apache 2.0+)
部署	`docker compose up -d`
GitHub	langgenius/dify
文档	docs.dify.ai

5.2 核心功能

可视化工作流构建器：拖拽式画布，支持并行处理、条件分支、循环节点
Agent 策略：支持 Function Calling、ReAct 和自定义策略插件
RAG 管道：完整的数据源 → 处理 → 知识库 → 检索流程
模型管理：无缝集成数百种 LLM，支持模型切换和性能比较
Prompt IDE：直观的提示词编写界面
LLMOps：监控和分析应用日志和性能

5.3 Agent 策略

策略	适用场景
Function Calling	模型原生支持工具调用（如 GPT-4、Claude）
ReAct	模型不支持原生函数调用，或需要显式推理追踪
自定义策略插件	需要多轮工具调用等复杂行为

5.4 创建 Agent 的方式

Dify 采用可视化/无代码方式：

在 Dify Studio 中创建 “Agent” 类型应用
选择 LLM 模型
设置 Agent 策略（自动检测 Function Calling 支持）
从 50+ 内置工具中选择或添加自定义工具
编写系统提示词
调试预览后一键发布

5.5 集成能力

API：完整的 RESTful API，支持 SSE 流式响应
SDK：Node.js、PHP、Java 客户端
插件系统：模型、工具、Agent 策略、扩展、数据源、触发器六类插件
MCP 集成：原生支持 Model Context Protocol
部署：Docker Compose、Kubernetes、Terraform、AWS CDK

5.6 优势与局限

优势： 低代码/无代码、开箱即用（50+ 内置工具）、快速原型到生产、多模型支持、活跃社区（800+ 贡献者）

局限： 自定义灵活性受限（不如代码框架）、执行有步骤/时间限制、许可证非纯 Apache 2.0、平台锁定风险、高级推理模式不如专用框架成熟

六、OpenAI Agents SDK

6.1 简介

OpenAI Agents SDK 是 OpenAI 官方推出的轻量级多智能体框架，从内部 Swarm 实验项目演化而来。核心理念是极简设计——只用 Agent / Handoff / Guardrail / Tool 几个概念构建复杂工作流。

项目信息	详情
最新版本	v0.14.6（2026-04-25）
许可证	MIT
安装	`pip install openai-agents`
GitHub	openai/openai-agents-python
文档	openai.github.io/openai-agents-python

6.2 核心概念

Agent：配置了指令、工具、护栏和交接能力的 LLM
Runner：Agent 执行器，提供 run()（异步）、run_sync()（同步）、run_streamed()（流式）
Handoff：Agent 间的任务委托，被委托者继承完整对话历史
Guardrails：安全护栏，分输入护栏、输出护栏、工具护栏三类
Tools：支持函数工具、MCP 工具、OpenAI 托管工具、Agent as Tool

6.3 关键特性

极简设计：核心原语少，学习曲线平缓
Provider 无关：通过 any-llm / LiteLLM 支持 100+ LLM
三层护栏：输入 → 输出 → 工具级别的安全校验
内置追踪（Tracing）：可视化调试 Agent 运行流程
Realtime Agents：支持构建语音 Agent（gpt-realtime-1.5）
Sandbox Agents：v0.14.0 新增，在容器环境中执行代码
结构化输出：通过 Pydantic Model 定义 output_type

6.4 代码示例

import asyncio
from agents import Agent, Runner, function_tool

# 定义工具
@function_tool
def get_weather(city: str) -> str:
    """获取指定城市的天气信息。"""
    return f"The weather in {city} is sunny."

# 定义专业 Agent
billing_agent = Agent(
    name="Billing Agent",
    instructions="你是账单问题专家。",
)

refund_agent = Agent(
    name="Refund Agent",
    instructions="你是退款问题专家。",
)

# 定义分流 Agent
triage_agent = Agent(
    name="Triage Agent",
    instructions="根据用户问题路由到正确的专业 Agent：账单 -> Billing Agent；退款 -> Refund Agent。",
    handoffs=[billing_agent, refund_agent],
    tools=[get_weather],
)

# 运行
async def main():
    result = await Runner.run(
        triage_agent,
        "我的订阅被扣了两次费用，请帮我处理。",
    )
    print(f"最终回答: {result.final_output}")
    print(f"处理 Agent: {result.last_agent.name}")

asyncio.run(main())

6.5 优势与局限

优势： 官方维护、极简设计、Provider 无关、三层护栏、内置追踪、语音 Agent 支持

局限： 仍处 0.x 阶段 API 可能变动、深度依赖 OpenAI 生态、不支持并行 Agent 执行、无内置持久化记忆系统

七、Google ADK

7.1 简介

Google ADK（Agent Development Kit） 是 Google 推出的开源、代码优先的 Agent 开发框架。设计理念是让 AI Agent 开发更像传统软件开发，针对 Gemini 和 Google Cloud 优化，但保持模型无关和部署无关。

项目信息	详情
最新版本	v1.31.1（2026-04-30）
许可证	Apache 2.0
安装	`pip install google-adk`
GitHub	google/adk-python
文档	google.github.io/adk-docs

7.2 核心概念

LlmAgent（别名 Agent）：核心构建块，组合 LLM 模型 + 指令 + 工具
SequentialAgent：按顺序依次执行子 Agent（管道式）
ParallelAgent：并发执行多个子 Agent
LoopAgent：重复执行子 Agent，支持退出条件
sub_agents：通过嵌套构建层级式多 Agent 架构

7.3 关键特性

多 Agent 编排：顺序、并行、循环和 LLM 驱动的动态路由
内置工具：Google Search、Vertex AI Search、代码执行器等
Google 生态集成：原生 Gemini、Vertex AI Agent Engine、Cloud Run
灵活部署：本地、Agent Engine（全托管）、Cloud Run、GKE、Docker
内置评估：CLI 工具 adk eval 系统化评估 Agent 性能
A2A 协议：支持 Agent-to-Agent 远程通信
生命周期回调：before/after_agent、before/after_model、before/after_tool 钩子

7.4 代码示例

import asyncio
from google.adk.agents import Agent, SequentialAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from google.adk.tools import google_search

# 定义天气 Agent
weather_agent = Agent(
    name="weather_assistant",
    model="gemini-2.5-flash",
    instruction="你是一个天气查询助手。使用 Google 搜索查找最新天气信息。",
    tools=[google_search],
)

# 定义翻译 Agent
translate_agent = Agent(
    name="translate_assistant",
    model="gemini-2.5-flash",
    instruction="你是一个翻译助手，将内容翻译成中文。",
)

# 组合成顺序工作流
pipeline = SequentialAgent(
    name="WeatherPipeline",
    sub_agents=[weather_agent, translate_agent],
)

# 运行
session_service = InMemorySessionService()
runner = Runner(agent=pipeline, app_name="weather_app", session_service=session_service)

async def run_agent(query: str):
    session = session_service.create_session(
        app_name="weather_app", user_id="user_1", session_id="session_1"
    )
    content = types.Content(role='user', parts=[types.Part(text=query)])
    async for event in runner.run_async(
        user_id="user_1", session_id="session_1", new_message=content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            print(f"Agent 回复: {event.content.parts[0].text.strip()}")

asyncio.run(run_agent("What's the weather in Tokyo today?"))

7.5 优势与局限

优势： 代码优先、强大编排能力（顺序/并行/循环）、Google 生态深度集成、内置评估、多语言支持（Python/Java/Go/TS）、Apache 2.0 开源

局限： 最佳体验需 Gemini 和 Google Cloud、框架较新社区生态初期、高频发布 API 可能变动、中国大陆访问 Google 服务受限

八、框架选型指南

按使用场景选择

场景	推荐框架	理由
复杂有状态工作流	LangGraph	底层图编排、持久化、时间旅行
多角色团队协作	CrewAI	角色扮演设计、委派机制、记忆系统
RAG + Agent	LlamaIndex	RAG 深度集成、130+ 数据格式、文档解析
快速原型 / 非技术团队	Dify	可视化拖拽、低代码、开箱即用
OpenAI 模型为主	OpenAI Agents SDK	官方维护、极简 API、追踪调试
Google Cloud 部署	Google ADK	Gemini 优化、Vertex AI 集成、内置评估
需要精细控制	LangGraph / Google ADK	底层 API、回调钩子
需要生产级护栏	OpenAI Agents SDK	三层 Guardrails

按团队特点选择

团队特点	推荐
全栈开发团队	LangGraph、Google ADK
Python 数据科学团队	CrewAI、LlamaIndex
产品经理 / 运营团队	Dify
OpenAI 生态重度用户	OpenAI Agents SDK
Google Cloud 用户	Google ADK
需要快速验证想法	Dify、OpenAI Agents SDK

注意：以上框架信息基于 2026 年 4 月的调研，各框架迭代较快，建议使用前查看官方文档获取最新信息。

Tile Explorer Web — 24h AI GameDev Hackathon Project (Software)

Tue, 28 Apr 2026 04:00:00 GMT

Tile Explorer is a browser-based tile-matching puzzle game I built in 24 hours. The entire project runs on a purely native web stack — PixiJS (loaded via CDN) for rendering, Web Audio API for procedurally synthesized sound effects, zero build tools, zero npm dependencies. Double-click index.html and it just runs. The game is deployed on GitHub Pages, with a live leaderboard powered by Supabase’s free tier. Total hosting cost: ¥0/month.

↑ Playable right here (requires an internet connection to load)

Core gameplay: Patterned tiles are stacked across the board. Tap an accessible tile to send it into a 7-slot collection tray at the bottom. Match 3 identical patterns and they clear automatically. Clear every tile from the board to complete the level.

Key highlights of the project:

Mathematically guaranteed solvability: Level layouts are generated from a difficulty formula where total tile count = patternTypes × setsPerType × 3, which structurally ensures every pattern appears in multiples of three. A backtracking solver runs inside a Web Worker to forward-validate each layout — only layouts with a confirmed solution path are accepted. The solver also records the optimal move count, which serves as the star-rating baseline.
Procedural audio synthesis: Every interactive sound effect — taps, clears, combos, power-ups, warnings — is synthesized in real time via the Web Audio API. Zero audio files, zero network requests. Combo sounds are built on a C-major chord progression system, progressively brightening from triangle waves to sawtooth waves to give players a satisfying sense of escalating momentum. When BGM is playing, sound effects auto-duck by 6dB and smoothly recover over 200ms.
Data-driven architecture: Difficulty curves, power-up properties, and theme configurations are all declarative, editable config tables. A designer can tune difficulty curves and power-up parameters by editing JS config files directly — no touching game logic code. Six visual themes each have their own library of 32 emoji patterns, a background image, and a BGM track; themes rotate automatically every 3 levels.
PWA + offline support: Full Progressive Web App support is implemented — installable to a phone’s home screen and fully playable offline. The Service Worker uses a three-tier caching strategy: precached static assets, cache-first for CDN resources, and Stale-While-Revalidate for theme media. Dual-CDN failover provides automatic fallback.
Zero-cost online leaderboard: Built on Supabase’s free tier (PostgreSQL + REST API). A UUID is auto-generated on first visit and stored in localStorage — no account required. The database enforces row-level security (RLS); the client holds only the anon key. All input goes through dual regex validation plus XSS sanitization. Scores earned offline are queued locally and submitted automatically once connectivity is restored.

On the engineering side: tile occlusion uses spatial hashing (O(n) instead of O(n²)); clear particle effects use a pre-allocated object pool to avoid GC jitter; opacity calculations follow an exponential decay model based on the Weber–Fechner law; and the collection slots use a smart clustering insertion algorithm to help players quickly spot matching opportunities.

The entire project was completed within 24 hours. My own code spans 14 JS modules + 3 CSS files + 1 HTML file, covering 10,000 levels, 6 power-up types, and 6 themes. AI assistance generated the vast majority of the code, along with all audio synthesis parameters and BGM assets. My role focused on architecture design, requirements refinement, data structure design, and overall code quality.

Tile Explorer Web — 24h AI GameDev 马拉松作品 (软件作品)

Tue, 28 Apr 2026 04:00:00 GMT

Tile Explorer 是我在 24 小时内完成的一款浏览器三消瓦片解谜游戏。整个项目完全采用 Web 原生技术栈开发，渲染引擎使用 PixiJS（CDN 引入），音效通过 Web Audio API 程序化合成，零构建工具、零 npm 依赖——双击 index.html 即可运行。游戏已部署至 GitHub Pages，后端使用 Supabase 免费层实现在线排行榜，整体运维成本为 0 元/月。

↑ 上方可直接游玩（需要联网加载）

核心玩法：版面上堆叠着带有图案的瓦片，点击可用瓦片将其送入底部 7 格收集槽，凑齐 3 个相同图案自动消除，清空版面上所有瓦片即通关。

项目的主要亮点：

数学可解性保证：关卡布局由难度公式推导生成，瓦片总数 = patternTypes × setsPerType × 3，从根本上保证每种图案数量均为 3 的倍数。同时，Web Worker 中运行回溯求解器对每个布局做正向验证，只有确认存在通关路径才会采用，并记录最优步数作为星级评分基准。
程序化音效合成：所有交互音效（点击、消除、连击、道具、警告等）均通过 Web Audio API 实时合成，零音频文件、零网络请求。连击音效基于 C 大调和弦递进系统设计，从三角波到锯齿波逐渐变亮，给玩家”蓄力”的感知。BGM 播放时音效自动 Ducking（降 6dB），200ms 后平滑恢复。
数据驱动架构：难度曲线、道具属性、主题配置均为可编辑的声明式配置表。策划可直接修改 JS 配置文件调整难度曲线和道具参数，无需触碰游戏逻辑代码。6 套视觉主题各有独立的 32 emoji 图案库、背景图和 BGM，每 3 关自动轮换。
PWA + 离线支持：实现了完整的 Progressive Web App 支持——可安装到手机主屏幕、支持完全离线游玩。Service Worker 采用三级缓存策略（静态资源预缓存、CDN 资源缓存优先、主题媒体 Stale-While-Revalidate），双 CDN 容灾自动回退。
零成本在线排行榜：使用 Supabase 免费层（PostgreSQL + REST API），首次访问自动生成 UUID 存入 localStorage，无需注册。数据库启用行级安全（RLS），客户端仅持有 anon key，输入经双重正则校验 + XSS 清洗。离线成绩存入本地队列，联网后自动提交。

工程方面，瓦片覆盖关系使用空间哈希（O(n) 替代 O(n²)），消除特效使用预分配粒子对象池避免 GC 抖动，透明度计算遵循韦伯-费希纳定律的指数衰减模型，槽位采用智能聚类插入算法帮助玩家快速识别匹配机会。

整个项目在 24 小时内完成，自有代码 14 个 JS 模块 + 3 个 CSS + 1 个 HTML，覆盖 10,000 关、6 种道具、6 套主题。过程中 AI 辅助生成了绝大部分代码与全部音效参数、BGM 资产，我主要负责架构设计、需求梳理、数据结构设计及代码质量把控。

Veil Lingo — Online English Education Platform (Software Project)

Sat, 18 Apr 2026 04:00:00 GMT

Veil Lingo is a live one-on-one English speaking education platform targeting Chinese learners, connecting them with professional teachers from English-speaking countries. The platform name draws from John Rawls’ philosophical concept of the “veil of ignorance” — the idea being to create a fair, transparent teaching marketplace where the quality of instruction itself becomes the core basis for pricing. The project is deployed and live at talk-lingo.com.

↑ Live site preview above (or visit talk-lingo.com directly)

The project covers three user-facing portals: a student portal (browse teachers, book lessons, credit wallet, review system), a teacher portal (personal profile, calendar scheduling, earnings dashboard, rating feedback), and an admin backend (data dashboard, teacher approval, review moderation, violation management, system parameter configuration) — totaling 28+ pages and 34+ components.

Key technical highlights:

Dynamic Pricing and Salary Algorithm: The platform’s core differentiating design. Lesson prices float dynamically based on a teacher’s booking rate — high-demand teachers see prices automatically rise, while prices pull back when demand is low, creating a positive incentive loop. Teacher salaries are similarly auto-adjusted based on demand and ratings, ensuring top teachers earn higher returns. All parameters are configurable in the admin backend, so strategy adjustments require no code changes.
Pairwise Comparison Review System: Students can evaluate two teachers they’ve taken lessons with in a head-to-head comparison. This produces more reliable quality signals than traditional independent scoring, helping the platform more accurately identify differences in teaching ability.
Multi-Dimensional Radar Chart Scoring: Teacher evaluations span multiple teaching dimensions, visualized as radar charts. This gives students an intuitive view of a teacher’s style and strengths, and provides teachers with clear direction for improvement.
Mainland China Network Optimization: Geo-aware routing via Cloudflare Workers automatically selects the optimal access path for mainland users, reducing latency and improving availability.
Full Internationalization Support: Complete bilingual coverage in Chinese and English, with 874 translation keys managing all user-facing copy through a translation system.

On the tech stack side, the frontend uses Next.js (App Router + Server Components) + TypeScript + Tailwind CSS + shadcn/ui. The backend runs on Supabase (PostgreSQL + Auth + Storage + Realtime), with Row-Level Security enforcing data access control. The app is deployed on Vercel, with Cloudflare handling CDN and DNS. The entire project was built from scratch to production launch, covering full-stack development end to end: database design (21 tables + 26 migration scripts), authentication and authorization, payment wallet, scheduled jobs, SEO optimization, and more.

Veil Lingo — 在线英语教育平台 (软件作品)

Sat, 18 Apr 2026 04:00:00 GMT

Veil Lingo（无知之幕）是一个已上线的在线一对一口语教育平台，面向中国英语学习者，连接来自英语国家的专业教师。平台名取自约翰·罗尔斯的「无知之幕」哲学概念——意在创造一个公平、透明的教学市场，让教学质量本身成为定价的核心依据。项目已部署上线，域名为 talk-lingo.com。

↑ 上方为线上实站点预览（也可直接访问 talk-lingo.com）

项目包含三个用户端：学生端（浏览教师、预约课程、信用钱包、评价系统）、教师端（个人档案、日历排班、收入看板、评分反馈）和管理后台（数据看板、教师审批、评价审核、违规管理、系统参数配置），合计 28+ 个页面、34+ 个组件。

技术上的主要亮点：

动态定价与薪资算法：平台核心差异化设计。课程价格根据教师预约率动态浮动——高需求教师价格自动上调，低需求时回调，形成正向激励循环。教师薪资同样根据需求与评价自动调节，确保优秀教师获得更高回报。所有参数可在管理后台配置，无需改代码即可调整策略。
配对比较评价系统：学生可以对上过课的两位教师进行头对头对比评价，比传统独立评分能产生更可靠的质量信号，帮助平台更准确地识别教学水平差异。
多维度雷达图评分：教师评价覆盖多个教学维度，通过雷达图可视化呈现，帮助学生直观了解教师的教学风格和强项，也为教师提供清晰的改进方向。
中国大陆网络优化：通过 Cloudflare Workers 实现地理感知路由，针对大陆用户自动选择最优访问路径，降低延迟并提升可用性。
完整的国际化支持：中英双语全覆盖，874 个翻译键，所有面向用户的文案均通过翻译系统管理。

技术栈方面，前端采用 Next.js（App Router + Server Components）+ TypeScript + Tailwind CSS + shadcn/ui，后端使用 Supabase（PostgreSQL + Auth + Storage + Realtime），通过 Row-Level Security 确保数据安全。部署在 Vercel 上，Cloudflare 提供 CDN 和 DNS 服务。整个项目从零到上线，涉及完整的全栈开发：数据库设计（21 张表 + 26 个迁移脚本）、认证授权、支付钱包、定时任务、SEO 优化等。

Hermes Agent Research Notes

Thu, 16 Apr 2026 02:00:00 GMT

Hermes Agent Research Notes

1. Project Overview

Hermes Agent is an open-source, self-learning AI agent framework developed by Nous Research.

Project Info	Details
Initial Release	2026-02-25 (v0.1.0)
Current Version	v0.8.0 (2026-04-08)
GitHub Stars	22k+
License	MIT
Language	Python

Core philosophy: an agent should grow alongside its user — through a built-in learning loop, it creates skills from experience and continuously improves. The more you use it, the better it gets.

2. Core Features

2.1 Self-Learning Feedback Loop

Automatically creates reusable Skill documents after completing complex tasks
Skills self-iterate and improve through usage
Built-in FTS5 full-text search + LLM summarization for cross-session memory recall
Honcho-based user modeling to understand who you are

2.2 Multi-Platform Integration

A single Gateway process covers: Telegram, Discord, Slack, WhatsApp, Signal, Email. Supports voice memo transcription with continuous cross-platform conversations.

2.3 Terminal Interface

Full TUI: multi-line editing, slash command completion, conversation history, interrupt redirection, and streaming tool output.

2.4 Model-Agnostic

Supports Nous Portal, OpenRouter (200+ models), OpenAI, Anthropic, Hugging Face, Xiaomi MiMo, and more. Switch with hermes model — zero code changes required.

2.5 Scheduled Tasks

Built-in Cron scheduler. Define scheduled tasks in natural language (daily digests, backups, audits) and results are automatically delivered to any platform.

2.6 Parallel Sub-Agents

Spawn isolated sub-agents for parallel workflows. Supports Python scripts that call tools via RPC, compressing multi-step pipelines into single-turn operations with zero context overhead.

2.7 Flexible Deployment

6 terminal backends: Local, Docker, SSH, Daytona, Singularity, Modal. Serverless on-demand wake-up keeps idle costs near zero.

3. Quick Start

# Install (supports Linux / macOS / WSL2 / Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Start
source ~/.bashrc
hermes              # Start a conversation
hermes model        # Select a model
hermes tools        # Configure tools
hermes gateway      # Start the message gateway
hermes setup        # Full setup wizard

4. Comparison with OpenClaw

OpenClaw (formerly Clawdbot/MoltBot) was released in January 2026 by Austrian engineer Peter Steinberger, and is the hottest open-source agent project of 2026 (200k+ Stars). Hermes has a clear lineage connection — it even ships a built-in OpenClaw migration tool (hermes claw migrate).

Dimension	Hermes Agent	OpenClaw
Release Date	2026-02	2026-01
Developer	Nous Research (team)	Peter Steinberger (solo start)
GitHub Stars	22k+	200k+
Core Philosophy	Self-learning loop — builds skills from experience, continuously iterates	Autonomous execution — completes real tasks on behalf of the user
Skill System	Auto-created + self-improving, compatible with agentskills.io standard	Primarily manual configuration, no automatic learning loop
Model Support	Model-agnostic (OpenRouter / Xiaomi MiMo / HuggingFace, etc.)	Primarily tied to the Claude family
Messaging Platforms	Telegram / Discord / Slack / WhatsApp / Signal / Email	Telegram / Discord / Slack / Feishu
Deployment	VPS / Docker / SSH / Serverless (6 backends)	Local-first, Docker / self-hosted
Memory System	Honcho user modeling + FTS5 cross-session search	MEMORY.md static memory file
Community Size	Rapidly growing	Large ecosystem, rich plugins and templates

Summary: OpenClaw has a more mature ecosystem and a larger community — a better fit for users who need autonomous execution out of the box. Hermes is lighter and emphasizes a “the more you use it, the better it knows you” self-learning mechanism, making it ideal for users who want an agent that’s a long-term companion and continuously adapts to their habits. Migration paths exist between the two, so you can switch as needed.

5. Comparison with Other Tools

Feature	Hermes Agent	Claude Code	OpenAI Codex
Self-Learning Skill System	Yes	Yes (OMC extension)	No
Multi-Platform Messaging	Telegram / Discord / Slack / WhatsApp / Signal	CLI + IDE	CLI + API
Model Choice	Any model	Claude family	GPT family
Scheduled Tasks	Built-in Cron	Requires external scheduler	No
Deployment	VPS / Docker / Serverless	Local / IDE	Cloud
Open Source	MIT	Partial	No

6. Assessment

Strengths: Unique self-learning mechanism, model-agnostic, broad platform coverage, flexible deployment, active community.

Limitations: The project is relatively new (only 2 months old), and API stability remains to be seen. Compared to mature tools like Claude Code, the ecosystem and plugin count still have room to grow.

Best Use Case: When you want a long-running personal agent that continuously learns your preferences — especially for cross-platform scenarios (Telegram, WeChat, etc.).

Sources: Hermes GitHub | Hermes Official Docs | OpenClaw GitHub | MIT Technology Review China

Hermes Agent 调研笔记

Thu, 16 Apr 2026 02:00:00 GMT

Hermes Agent 调研笔记

一、项目概览

Hermes Agent 是 Nous Research 开发的开源、自学习 AI Agent 框架。

项目信息	详情
首次发布	2026-02-25 (v0.1.0)
当前版本	v0.8.0 (2026-04-08)
GitHub Stars	22k+
协议	MIT
语言	Python

核心理念：Agent 应该随用户一起成长——通过内置学习循环，从经验中创建技能、持续改进，越用越强。

二、核心特性

2.1 自学习闭环

完成复杂任务后自动创建可复用的 Skill 文档
Skill 在使用过程中自我迭代优化
内置 FTS5 全文搜索 + LLM 摘要，支持跨会话记忆召回
基于 Honcho 的用户画像建模，理解你是谁

2.2 多平台接入

单一 Gateway 进程即可覆盖：Telegram、Discord、Slack、WhatsApp、Signal、Email。支持语音备忘录转录，跨平台对话连续。

2.3 终端交互

完整的 TUI 界面：多行编辑、斜杠命令补全、对话历史、中断重定向、流式工具输出。

2.4 模型无关

支持 Nous Portal、OpenRouter (200+ 模型)、OpenAI、Anthropic、Hugging Face、小米 MiMo 等，hermes model 一键切换，零代码改动。

2.5 定时任务

内置 Cron 调度器，用自然语言定义定时任务（日报、备份、审计），结果自动投递到任意平台。

2.6 并行子代理

可生成隔离子代理并行工作流，支持通过 RPC 调用工具的 Python 脚本，将多步骤流水线压缩为零上下文开销的单轮操作。

2.7 灵活部署

6 种终端后端：Local、Docker、SSH、Daytona、Singularity、Modal。支持 Serverless 按需唤醒，空闲时几乎零成本。

三、快速上手

# 安装（支持 Linux / macOS / WSL2 / Termux）
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# 启动
source ~/.bashrc
hermes              # 开始对话
hermes model        # 选择模型
hermes tools        # 配置工具
hermes gateway      # 启动消息网关
hermes setup        # 完整设置向导

四、与 OpenClaw 对比

OpenClaw（前身 Clawdbot/MoltBot）由奥地利工程师 Peter Steinberger 于 2026 年 1 月发布，是 2026 年最火的开源 Agent 项目（200k+ Stars）。Hermes 与之有明确的渊源——内置了 OpenClaw 迁移工具（hermes claw migrate）。

维度	Hermes Agent	OpenClaw
发布时间	2026-02	2026-01
开发者	Nous Research（团队）	Peter Steinberger（个人起步）
GitHub Stars	22k+	200k+
核心理念	自学习闭环——从经验中创建技能、持续迭代	自主执行——代替用户完成真实操作
Skill 系统	自动创建 + 自我改进，兼容 agentskills.io 标准	手动配置为主，无自动学习闭环
模型支持	模型无关（OpenRouter/小米 MiMo/HuggingFace 等）	主要绑定 Claude 系列
消息平台	Telegram/Discord/Slack/WhatsApp/Signal/Email	Telegram/Discord/Slack/飞书
部署方式	VPS/Docker/SSH/Serverless（6 种后端）	本地优先，Docker/自托管
记忆系统	Honcho 用户画像 + FTS5 跨会话搜索	MEMORY.md 静态记忆文件
社区规模	快速增长中	庞大生态，插件/模板丰富

总结：OpenClaw 生态更成熟、社区更大，适合需要”开箱即用”自主执行的用户；Hermes 更轻量、更强调”越用越懂你”的自学习机制，适合希望 Agent 长期陪伴并持续适应自己习惯的用户。两者有迁移路径，可以按需切换。

五、与其他工具对比

特性	Hermes Agent	Claude Code	OpenAI Codex
自学习 Skill 系统	有	有 (OMC 扩展)	无
多平台消息	Telegram/Discord/Slack/WhatsApp/Signal	CLI + IDE	CLI + API
模型选择	任意模型	Claude 系列	GPT 系列
定时任务	内置 Cron	需外部调度	无
部署方式	VPS / Docker / Serverless	本地 / IDE	云端
开源	MIT	部分	否

六、评价

优势：自学习机制独特、模型无关、多平台覆盖、部署灵活、社区活跃。

局限：项目较新（仅 2 个月），API 稳定性待观察；与 Claude Code 等成熟工具相比，生态和插件数量尚有差距。

适用场景：需要一个长期运行、持续学习你偏好的个人 Agent，尤其是跨平台（Telegram/微信）使用场景。

参考来源：Hermes GitHub | Hermes 官方文档 | OpenClaw GitHub | MIT Technology Review China

SQL Basics Notes

Sun, 12 Apr 2026 02:00:00 GMT

I. Introduction to SQL

1.1 What is SQL

SQL (Structured Query Language): the standard programming language for managing relational databases.

RDBMS: Relational Database Management System
Common databases (by type): MySQL, PostgreSQL, SQLite, Oracle, SQL Server
1. SQLite: lightweight, embedded — great for mobile apps
2. MySQL: open-source, widely used — great for web apps
3. PostgreSQL: open-source, feature-rich — great for complex apps
4. Oracle: enterprise-grade, fully featured — great for large-scale apps
5. SQL Server: developed by Microsoft — great for Windows environments

1.2 Basic SQL Categories

Four schools of thought — these are the disciplines you use to communicate with a database. Master them and you’re a data wrangler; give up and you’re just a data janitor. 🐶

Category	Purpose	Keywords
DDL	Define database structure	CREATE, ALTER, DROP
DML	Manipulate data	INSERT, UPDATE, DELETE
DQL	Query data	SELECT
DCL	Control permissions	GRANT, REVOKE

II. Basic Syntax

2.1 Basic Rules

SQL statements end with a semicolon ; (some databases allow omitting it)
Keywords are case-insensitive, but the convention is to write keywords in uppercase and table/column names in lowercase
Strings and dates are wrapped in single quotes ' '
Comments: -- single-line comment, /* multi-line comment */

2.2 Writing Style

-- Recommended writing style
SELECT
    id,
    name,
    email
FROM
    users/* Whether to use double quotes depends on the database type */
WHERE
    status = 'active'
ORDER BY
    create_time DESC;

2.3 Common Operators

Arithmetic Operators

Operator	Description
`+`	Addition
`-`	Subtraction
`*`	Multiplication
`/`	Division
`%` or `MOD()`	Modulo

Comparison Operators

Operator	Description
`=`	Equal to
`<>` or `!=`	Not equal to
`>`	Greater than
`<`	Less than
`>=`	Greater than or equal to
`<=`	Less than or equal to

Logical Operators

Operator	Description
`AND`	Logical AND (higher precedence than OR — use parentheses like in C++)
`OR`	Logical OR
`NOT`	Logical NOT

2.4 Common Commands (MySQL)

-- Show all databases
SHOW DATABASES;

-- Show all tables in the current database
SHOW TABLES;

-- View table structure
DESC table_name;
-- or
DESCRIBE table_name;

-- View the CREATE TABLE statement
SHOW CREATE TABLE table_name;

-- Show full column info for a table
SHOW FULL COLUMNS FROM table_name;

2.5 ⚠️ Things to Watch Out For

Query syntax keywords have a specific ordering relationship.

III. DDL — Data Definition

2.1 Creating a Database

1 2	CREATE DATABASE database_name; USE database_name;

2.2 Creating a Table

CREATE TABLE table_name (
    column1 data_type [constraint],
    column2 data_type [constraint],
    ...
);

Common data types:

Integer: INT, BIGINT
Decimal: DECIMAL(m,n), FLOAT, DOUBLE
String: VARCHAR(n), CHAR(n), TEXT
Date/Time: DATE, DATETIME, TIMESTAMP

2.3 Constraints

CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    age INT DEFAULT 18,
    FOREIGN KEY (dept_id) REFERENCES departments(id)
);

Common constraints:

PRIMARY KEY: primary key, uniquely identifies a row
NOT NULL: value cannot be null
UNIQUE: value must be unique
DEFAULT: default value
FOREIGN KEY: foreign key constraint
AUTO_INCREMENT: auto-increment (MySQL)

2.4 Altering Table Structure

-- Add a column
ALTER TABLE table_name ADD column_name data_type;

-- Modify a column
ALTER TABLE table_name MODIFY column_name new_data_type;

-- Drop a column
ALTER TABLE table_name DROP COLUMN column_name;

-- Rename a table
ALTER TABLE table_name RENAME TO new_table_name;

2.5 Dropping a Table

1 2	DROP TABLE table_name; -- Drop the table entirely TRUNCATE TABLE table_name; -- Clear all data (keep the structure)

III. DML — Data Manipulation

3.1 Inserting Data

-- Insert a single row
INSERT INTO table_name (col1, col2) VALUES (val1, val2);

-- Insert multiple rows
INSERT INTO table_name (col1, col2) VALUES
(val1, val2),
(val3, val4),
(val5, val6);

-- Import from another table
INSERT INTO table_name SELECT * FROM other_table WHERE condition;

3.2 Updating Data

1
2
3

UPDATE table_name
SET col1 = new_val1, col2 = new_val2
WHERE condition;

3.3 Deleting Data

1	DELETE FROM table_name WHERE condition;

IV. DQL — Data Query (Core)

4.1 Basic Queries

-- Query all columns
SELECT * FROM table_name;

-- Query specific columns
SELECT col1, col2 FROM table_name;

-- Deduplicate
SELECT DISTINCT col FROM table_name;

-- Alias
SELECT col AS alias FROM table_name;

4.2 Conditional Queries — WHERE

SELECT * FROM table_name WHERE condition;

-- Comparison operators
WHERE age > 18
WHERE name = 'Alice'
WHERE age >= 18 AND age <= 30

-- Range
WHERE age BETWEEN 18 AND 30

-- Enumeration
WHERE status IN ('active', 'pending')

-- Pattern matching
WHERE name LIKE 'A%'       -- starts with A
WHERE name LIKE '%son%'    -- contains "son"
WHERE name LIKE 'A_'       -- starts with A, exactly 2 characters

-- Null checks
WHERE email IS NULL
WHERE email IS NOT NULL

4.3 Sorting — ORDER BY

SELECT * FROM table_name ORDER BY col1 ASC, col2 DESC;

-- ASC: ascending (default)
-- DESC: descending

4.4 Limiting Results — LIMIT

-- MySQL
SELECT * FROM table_name LIMIT 10;
SELECT * FROM table_name LIMIT 5, 10;  -- Start from row 5, fetch 10 rows

-- SQL Server
SELECT TOP 10 * FROM table_name;

-- Oracle
SELECT * FROM table_name WHERE ROWNUM <= 10;

4.5 Aggregate Functions

SELECT
    COUNT(*)          AS total_rows,
    COUNT(col)        AS non_null_count,
    SUM(col)          AS total,
    AVG(col)          AS average,
    MAX(col)          AS maximum,
    MIN(col)          AS minimum
FROM table_name;

4.6 Grouping — GROUP BY

SELECT col, aggregate_function
FROM table_name
GROUP BY col
HAVING aggregate_condition;

Note: WHERE filters before grouping; HAVING filters after grouping.

-- Example: average salary per department
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 5000;

4.7 Multi-Table Queries

Joins (JOIN)

-- INNER JOIN: only keep matching rows
SELECT *
FROM table1
INNER JOIN table2 ON table1.col = table2.col;

-- LEFT JOIN: keep all rows from the left table; NULLs where there's no match on the right
SELECT *
FROM table1
LEFT JOIN table2 ON table1.col = table2.col;

-- RIGHT JOIN: keep all rows from the right table
SELECT *
FROM table1
RIGHT JOIN table2 ON table1.col = table2.col;

-- FULL JOIN (MySQL doesn't support this natively — simulate with UNION)
SELECT * FROM table1 LEFT JOIN table2 ON ...
UNION
SELECT * FROM table1 RIGHT JOIN table2 ON ...;

Subqueries

-- Subquery in WHERE
SELECT * FROM table_name WHERE col = (SELECT col FROM ...);

-- IN subquery
SELECT * FROM table_name WHERE col IN (SELECT col FROM ...);

-- EXISTS subquery
SELECT * FROM table_name WHERE EXISTS (SELECT 1 FROM ... WHERE condition);

4.8 UNION — Combined Queries

SELECT col FROM table1
UNION                 -- merge and deduplicate
SELECT col FROM table2;

SELECT col FROM table1
UNION ALL            -- merge and keep duplicates
SELECT col FROM table2;

V. Common Functions

5.1 String Functions

Function	Description
`CONCAT(s1, s2)`	Concatenate strings
`LENGTH(s)`	Get string length
`UPPER(s)` / `LOWER(s)`	Convert case
`TRIM(s)`	Strip leading/trailing spaces
`SUBSTRING(s, start, len)`	Extract a substring
`REPLACE(s, old, new)`	Replace substring
`IFNULL(s, default)`	Replace NULL with a default value

5.2 Numeric Functions

Function	Description
`ROUND(n, d)`	Round to d decimal places
`CEIL(n)` / `FLOOR(n)`	Ceiling / floor
`ABS(n)`	Absolute value
`MOD(n, m)`	Modulo
`RAND()`	Random number

5.3 Date Functions

Function	Description
`NOW()` / `SYSDATE()`	Current date and time
`CURDATE()`	Current date
`YEAR(d)` / `MONTH(d)` / `DAY(d)`	Extract year / month / day
`DATE_FORMAT(d, format)`	Format a date
`DATE_ADD(d, INTERVAL n unit)`	Add/subtract from a date
`DATEDIFF(d1, d2)`	Difference between two dates

1	SELECT DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s') FROM table_name;

5.4 Conditional Logic

-- IF
SELECT IF(age >= 18, 'adult', 'minor') FROM table_name;

-- CASE WHEN
SELECT
    CASE
        WHEN score >= 90 THEN 'A'
        WHEN score >= 80 THEN 'B'
        WHEN score >= 60 THEN 'C'
        ELSE 'D'
    END AS grade
FROM table_name;

VI. Indexes

6.1 Index Types

Type	Description
Regular index	Allows duplicate values
Unique index	Values must be unique
Primary key index	Auto-created with the primary key; unique and not null
Full-text index	Full-text search (MyISAM)
Composite index	Spans multiple columns

6.2 Creating Indexes

-- Create an index
CREATE INDEX index_name ON table_name(col);

-- Create a unique index
CREATE UNIQUE INDEX index_name ON table_name(col);

-- Create a composite index
CREATE INDEX index_name ON table_name(col1, col2);

-- View indexes
SHOW INDEX FROM table_name;

-- Drop an index
DROP INDEX index_name ON table_name;

6.3 Indexing Principles

Good candidates: large datasets, frequently queried columns, columns often used in WHERE
Avoid: small datasets, frequently updated columns, low-cardinality columns
Leftmost prefix rule: composite indexes are used starting from the leftmost column

VII. Transactions

7.1 Transaction Properties (ACID)

Atomicity: either everything succeeds or everything fails
Consistency: data is in a valid state before and after the transaction
Isolation: concurrent transactions don’t interfere with each other
Durability: once committed, data is permanently saved

7.2 Transaction Control

-- Start a transaction
START TRANSACTION;
-- or
BEGIN;

-- Commit
COMMIT;

-- Rollback
ROLLBACK;

-- Set a savepoint
SAVEPOINT savepoint_name;

-- Rollback to a savepoint
ROLLBACK TO savepoint_name;

7.3 Isolation Levels

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read
READ UNCOMMITTED	Possible	Possible	Possible
READ COMMITTED	Not possible	Possible	Possible
REPEATABLE READ (default)	Not possible	Not possible	Possible
SERIALIZABLE	Not possible	Not possible	Not possible

1	SET SESSION TRANSACTION ISOLATION LEVEL level;

VIII. Views

-- Create a view
CREATE VIEW view_name AS
SELECT col1, col2
FROM table_name
WHERE condition;

-- Use a view
SELECT * FROM view_name;

-- Drop a view
DROP VIEW view_name;

IX. References

SQL 基础笔记

Sun, 12 Apr 2026 02:00:00 GMT

一、SQL 简介

1.1 什么是 SQL

SQL（Structured Query Language）：用于管理关系型数据库的标准编程语言。

RDBMS：Relational Database Management System，关系型数据库管理系统
常见数据库（类型）：MySQL、PostgreSQL、SQLite、Oracle、SQL Server
1.SQLite：轻量级、嵌入式，适合移动应用
2.MySQL：开源、流行，适合Web应用
3.PostgreSQL：开源、功能强大，适合复杂应用
4.Oracle：企业级、功能全面，适合大型应用
5.SQL Server：微软开发，适合Windows环境

1.2 SQL 基本分类

四大门派，用这几门绝学来与数据库进行交流。学废了也就当个搬运工。🐶

分类	用途	关键字
DDL	定义数据库结构	CREATE、ALTER、DROP
DML	操作数据	INSERT、UPDATE、DELETE
DQL	查询数据	SELECT
DCL	控制权限	GRANT、REVOKE

二、基础语法

2.1 基本规则

SQL 语句以分号 ; 结尾（部分数据库可不加）
关键字不区分大小写，但习惯上关键字大写，表名/字段名小写
字符串和日期用单引号 ' ' 包围
注释：-- 单行注释、/* 多行注释 */

2.2 书写规范

-- 推荐的书写风格
SELECT
    id,
    name,
    email
FROM
    users/* 是否使用双引号取决于数据库类型 */
WHERE
    status = 'active'
ORDER BY
    create_time DESC;

2.3 常用运算符

算术运算符

运算符	说明
`+`	加
`-`	减
`*`	乘
`/`	除
`%` 或 `MOD()`	取余

比较运算符

运算符	说明
`=`	等于
`<>` 或 `!=`	不等于
`>`	大于
`<`	小于
`>=`	大于等于
`<=`	小于等于

逻辑运算符

运算符	说明
`AND`	且（优先级比OR更高和Cpp一样可以用括号）
`OR`	或
`NOT`	非

2.4 常用命令（MySQL）

-- 显示所有数据库
SHOW DATABASES;

-- 显示当前数据库所有表
SHOW TABLES;

-- 查看表结构
DESC 表名;
-- 或
DESCRIBE 表名;

-- 查看建表语句
SHOW CREATE TABLE 表名;

-- 显示表的所有列信息
SHOW FULL COLUMNS FROM 表名;

2.5 ⚠️注意事项

查询语法关键字是带顺序关系的。

三、DDL 数据定义

2.1 创建数据库

1 2	CREATE DATABASE 数据库名; USE 数据库名;

2.2 创建表

CREATE TABLE 表名 (
    字段名1 数据类型 [约束],
    字段名2 数据类型 [约束],
    ...
);

常用数据类型：

整数：INT、BIGINT
小数：DECIMAL(m,n)、FLOAT、DOUBLE
字符串：VARCHAR(n)、CHAR(n)、TEXT
日期：DATE、DATETIME、TIMESTAMP

2.3 约束

CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    age INT DEFAULT 18,
    FOREIGN KEY (dept_id) REFERENCES departments(id)
);

常用约束：

PRIMARY KEY：主键，唯一标识
NOT NULL：非空
UNIQUE：唯一
DEFAULT：默认值
FOREIGN KEY：外键约束
AUTO_INCREMENT：自增（MySQL）

2.4 修改表结构

-- 添加字段
ALTER TABLE 表名 ADD 字段名 数据类型;

-- 修改字段
ALTER TABLE 表名 MODIFY 字段名 新数据类型;

-- 删除字段
ALTER TABLE 表名 DROP COLUMN 字段名;

-- 重命名表
ALTER TABLE 表名 RENAME TO 新表名;

2.5 删除表

1 2	DROP TABLE 表名; -- 删除表结构 TRUNCATE TABLE 表名; -- 清空表数据（保留结构）

三、DML 数据操作

3.1 插入数据

-- 插入单条
INSERT INTO 表名 (字段1, 字段2) VALUES (值1, 值2);

-- 插入多条
INSERT INTO 表名 (字段1, 字段2) VALUES
(值1, 值2),
(值3, 值4),
(值5, 值6);

-- 从其他表导入
INSERT INTO 表名 SELECT * FROM 其他表 WHERE 条件;

3.2 更新数据

1
2
3

UPDATE 表名
SET 字段1 = 新值1, 字段2 = 新值2
WHERE 条件;

3.3 删除数据

1	DELETE FROM 表名 WHERE 条件;

四、DQL 数据查询（核心）

4.1 基本查询

-- 查询所有字段
SELECT * FROM 表名;

-- 查询指定字段
SELECT 字段1, 字段2 FROM 表名;

-- 去重
SELECT DISTINCT 字段 FROM 表名;

-- 别名
SELECT 字段 AS 别名 FROM 表名;

4.2 条件查询 WHERE

SELECT * FROM 表名 WHERE 条件;

-- 比较运算符
WHERE age > 18
WHERE name = '张三'
WHERE age >= 18 AND age <= 30

-- 范围
WHERE age BETWEEN 18 AND 30

-- 枚举
WHERE status IN ('active', 'pending')

-- 模糊匹配
WHERE name LIKE '张%'      -- 张开头
WHERE name LIKE '%三%'      -- 包含三
WHERE name LIKE '张_'       -- 张开头，2个字

-- 空值
WHERE email IS NULL
WHERE email IS NOT NULL

4.3 排序 ORDER BY

SELECT * FROM 表名 ORDER BY 字段1 ASC, 字段2 DESC;

-- ASC：升序（默认）
-- DESC：降序

4.4 限制 LIMIT

-- MySQL
SELECT * FROM 表名 LIMIT 10;
SELECT * FROM 表名 LIMIT 5, 10;  -- 从第5条开始，取10条

-- SQL Server
SELECT TOP 10 * FROM 表名;

-- Oracle
SELECT * FROM 表名 WHERE ROWNUM <= 10;

4.5 聚合函数

SELECT
    COUNT(*)          AS 总记录数,
    COUNT(字段)       AS 非空数量,
    SUM(字段)         AS 求和,
    AVG(字段)         AS 平均值,
    MAX(字段)         AS 最大值,
    MIN(字段)         AS 最小值
FROM 表名;

4.6 分组 GROUP BY

SELECT 字段, 聚合函数
FROM 表名
GROUP BY 字段
HAVING 聚合条件;

注意：WHERE 在分组前过滤，HAVING 在分组后过滤。

-- 示例：统计每个部门的平均工资
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id
HAVING AVG(salary) > 5000;

4.7 多表查询

连接（JOIN）

-- 内连接：只保留匹配的行
SELECT *
FROM 表1
INNER JOIN 表2 ON 表1.字段 = 表2.字段;

-- 左连接：保留左表全部，右表无匹配为 NULL
SELECT *
FROM 表1
LEFT JOIN 表2 ON 表1.字段 = 表2.字段;

-- 右连接：保留右表全部
SELECT *
FROM 表1
RIGHT JOIN 表2 ON 表1.字段 = 表2.字段;

-- 全连接（MySQL 不支持，用 UNION 模拟）
SELECT * FROM 表1 LEFT JOIN 表2 ON ...
UNION
SELECT * FROM 表1 RIGHT JOIN 表2 ON ...;

子查询

-- WHERE 中的子查询
SELECT * FROM 表名 WHERE 字段 = (SELECT 字段 FROM ...);

-- IN 子查询
SELECT * FROM 表名 WHERE 字段 IN (SELECT 字段 FROM ...);

-- EXISTS 子查询
SELECT * FROM 表名 WHERE EXISTS (SELECT 1 FROM ... WHERE 条件);

4.8 UNION 联合查询

SELECT 字段 FROM 表1
UNION                 -- 去重合并
SELECT 字段 FROM 表2;

SELECT 字段 FROM 表1
UNION ALL            -- 保留重复
SELECT 字段 FROM 表2;

五、常用函数

5.1 字符串函数

函数	作用
`CONCAT(s1, s2)`	拼接字符串
`LENGTH(s)`	获取长度
`UPPER(s)` / `LOWER(s)`	大小写转换
`TRIM(s)`	去除首尾空格
`SUBSTRING(s, start, len)`	截取子串
`REPLACE(s, old, new)`	替换
`IFNULL(s, default)`	NULL 替换

5.2 数值函数

函数	作用
`ROUND(n, d)`	四舍五入
`CEIL(n)` / `FLOOR(n)`	向上/下取整
`ABS(n)`	绝对值
`MOD(n, m)`	取余
`RAND()`	随机数

5.3 日期函数

函数	作用
`NOW()` / `SYSDATE()`	当前日期时间
`CURDATE()`	当前日期
`YEAR(d)` / `MONTH(d)` / `DAY(d)`	提取年月日
`DATE_FORMAT(d, format)`	格式化日期
`DATE_ADD(d, INTERVAL n unit)`	日期加减
`DATEDIFF(d1, d2)`	日期差值

1	SELECT DATE_FORMAT(create_time, '%Y-%m-%d %H:%i:%s') FROM 表名;

5.4 条件判断

-- IF
SELECT IF(age >= 18, '成年', '未成年') FROM 表名;

-- CASE WHEN
SELECT
    CASE
        WHEN score >= 90 THEN 'A'
        WHEN score >= 80 THEN 'B'
        WHEN score >= 60 THEN 'C'
        ELSE 'D'
    END AS grade
FROM 表名;

六、索引

6.1 索引类型

类型	说明
普通索引	允许重复值
唯一索引	值唯一
主键索引	主键自动创建，唯一且非空
全文索引	全文搜索（MyISAM）
组合索引	多列组合

6.2 创建索引

-- 创建索引
CREATE INDEX 索引名 ON 表名(字段);

-- 创建唯一索引
CREATE UNIQUE INDEX 索引名 ON 表名(字段);

-- 创建组合索引
CREATE INDEX 索引名 ON 表名(字段1, 字段2);

-- 查看索引
SHOW INDEX FROM 表名;

-- 删除索引
DROP INDEX 索引名 ON 表名;

6.3 索引原则

适合：数据量大、查询频繁、WHERE 条件常用
避免：数据量小、更新频繁、区分度低的字段
最左前缀：组合索引从左开始使用

七、事务

7.1 事务特性（ACID）

Atomicity（原子性）：要么全部成功，要么全部失败
Consistency（一致性）：事务前后数据状态一致
Isolation（隔离性）：并发事务互不干扰
Durability（持久性）：提交后数据永久保存

7.2 事务控制

-- 开启事务
START TRANSACTION;
-- 或
BEGIN;

-- 提交
COMMIT;

-- 回滚
ROLLBACK;

-- 设置保存点
SAVEPOINT 保存点名称;

-- 回滚到保存点
ROLLBACK TO 保存点名称;

7.3 隔离级别

隔离级别	脏读	不可重复读	幻读
READ UNCOMMITTED	可能	可能	可能
READ COMMITTED	不可能	可能	可能
REPEATABLE READ（默认）	不可能	不可能	可能
SERIALIZABLE	不可能	不可能	不可能

1	SET SESSION TRANSACTION ISOLATION LEVEL 级别;

八、视图

-- 创建视图
CREATE VIEW 视图名 AS
SELECT 字段1, 字段2
FROM 表名
WHERE 条件;

-- 使用视图
SELECT * FROM 视图名;

-- 删除视图
DROP VIEW 视图名;

九、资料

Notes on Building Agentic Workflows (Andrew Ng)

Wed, 08 Apr 2026 09:45:00 GMT

I. Introduction to Agentic Workflows

1.1 What Is Agentic AI

Core definition: An LLM-based application that completes tasks through multi-step execution flows.

Non-agentic: Single prompt, one-shot completion (like writing an essay without a backspace key)
Agentic: Multi-step flow — outline → decide if research is needed → execute searches → write draft → reflect and revise → final output

Analogy: Making a stir-fry dish with multiple AIs each handling a role (prep / cooking / plating / review)

The blue labels mark different stages of AI evolution: from prompt engineer to content engineer to Hermes engineer (Agent).

1.2 Levels of Autonomy

Agentic is an adjective, not a noun — this sidesteps the debate over “what really counts as an Agent.”

Low Autonomy	High Autonomy
All steps predefined, tool calls hardcoded	Agent dynamically decides the step sequence
AI only generates text	Agent can create new tools on its own
“Obedient but brainless assistant”	“Smart, accountable intern”

The essence: not just “can do work,” but “knows how to think about the work, what tools to use, and can self-correct.”
A proper Agentic AI should be capable of autonomous planning (Planning — selecting tools on its own) and autonomous reflection (Reflection — with memory and self-review).

1.3 Three Major Benefits

Significant performance gains: On HumanEval, an agentic GPT-3.5 can outperform a non-agentic GPT-4 (though both sound like ancient history now)
Parallel speedup: Multiple LLM instances search and read simultaneously, then aggregate results
Modular design: Freely swap components (search engines, different LLMs, different tools)

1.4 Task Decomposition Method

Core methodology:

Observe human behavior → 2. Break into sub-steps → 3. Assess LLM/tool feasibility → 4. Iterate and improve

Case study — progressive decomposition of article writing:

1 step: Generate directly (shallow)
3 steps: Outline → Search → Write (better, but may feel disconnected)
5 steps: Outline → Search → Draft → Self-critique → Revise (best — simulates the human write-reflect-revise loop)
Core principle: “If a step produces poor results, break it down into smaller sub-steps.”

Building blocks: Model (LLM) + Tools (APIs, information retrieval, code execution)

1.5 Agentic AI Evaluation (Evals — think before you build)

Andrew Ng emphasizes: The biggest predictor distinguishing effective from ineffective practitioners is a rigorous development process built around evaluation and error analysis.

Methodology:

Build first, observe outputs, then evaluate (don’t design all metrics upfront): just see how things work.
Identify low-quality outputs and define error types.
Build evaluation metrics to track errors: write scripts to automatically scan all agent outputs and count how often error outputs appear.
For subjective criteria, use LLM as Judge (1–5 scoring, but don’t let the model score directly without guidance).

Two types of evaluation: End-to-end evaluation (overall output quality) / Component-level evaluation (per-step quality)

1.6 Overview of the Four Design Patterns

Pattern	Core Idea
Reflection	Multiple agents check, evaluate, and improve their own outputs.
Tool Use	Gives LLMs the ability to call external tools/functions
Planning	The model autonomously decides the sequence of steps for complex tasks
Multi-Agent	Multiple agents with different specializations collaborate

Real-World Example: oh-my-claudecode (OMC)

OMC (oh-my-claudecode) is a real Agent system that perfectly maps to these four design patterns:

Reflection: code-reviewer / verifier agent — after the executor writes code, an independent reviewer examines it. This is exactly what section 2.1 describes: “use different models, one to generate and one to review (1+1>2).” OMC’s rule is Never self-approve in the same active context — writing code and reviewing code must always be two separate agents.
Tool Use: MCP servers (Context7 for docs, filesystem for file ops, LSP for code analysis), the Skill system (/commit, /plan, and other callable skills), Bash tools — matching section 3.1’s “tools are functions, the model autonomously decides when to use them.” Claude independently judges which tool to use for each task.
Planning: planner agent, /plan skill, plan mode — matching section 5.2’s “LLM outputs a structured plan before executing.” In plan mode, Claude first explores the codebase and designs an approach, only writing code after you approve.
Multi-Agent: Team mode (/team) can launch multiple specialized agents simultaneously (explorer for search, executor for writing code, reviewer for review, designer for UI…), sharing a TaskList and collaborating via SendMessage.

II. The Reflection Design Pattern

Reflection really does improve output quality. So maybe taking more time to reflect on yourself actually leads to growth. 🐶

2.1 Reflection Improves Task Output

Core analogy: Humans review and revise drafts — AI can do the same.

Email writing example:

AI generates V1 → Feed V1 back to the LLM with a reflection prompt → Generate improved V2

Progressive path for code writing:

Basic: LLM writes code V1 → LLM reviews and generates V2
Advanced: Use different models — one to generate, one reasoning model to review (1+1>2)
Ultimate: Combine external feedback — execute V1 in a sandbox, capture errors, feed back to LLM to generate V2

Key insight: Reflection is an engineering practice, not magic; external feedback is the critical differentiator

2.2 Internal — Two Golden Rules of Self-Reflection

Explicitly instruct the reflection action: Say “review,” “check,” “verify” (specific actions), not just “improve.” For objective tasks: build ground-truth datasets + automated code evaluation (e.g., checking SQL query correctness).
Specify concrete criteria: List explicit evaluation dimensions (e.g., “professional tone,” “factually accurate”). For subjective tasks: use a Rubric to guide LLM scoring — avoid direct comparison (which introduces positional bias).

The paper “Self-Refine” shows that reflection consistently improved performance across all 7 tasks and 4 models tested.

2.3 External — Breaking Through the Performance Ceiling

Three performance curves:

🔴 No reflection: rapid gains from prompt engineering, then plateaus
🔵 With reflection: breaks through the plateau to a higher level
🟡 Reflection + external feedback: breaks through again to the highest level

Three types of external feedback: regex matching (avoid mentioning competitors) → search validation (fact-checking) → word count checks (format constraints)

External feedback breaks the model out of its information silo, addressing inherent weaknesses (precise counting, fact verification) and enabling closed-loop optimization.

Performance curve comparison: reflection vs. external feedback

III. Tool Use

3.1 What Tools Actually Are

Tools are functions — the model autonomously decides when to use them.

Key capability — conditional invocation: the model intelligently judges when a tool is needed.

“How much caffeine is in green tea?” → Answer from internal knowledge
“What time is it now?” → Call the get_current_time tool

Multi-tool chaining: a calendar assistant can chain check_calendar → make_appointment

3.2 How Does an LLM Actually “Call” a Function?

In theory, an LLM never touches the execution layer of any function — from start to finish, it does exactly one thing: generate text.
What we call “calling a function” is fundamentally a text relay protocol.

sequenceDiagram
    participant U as User
    participant S as System / Engineering Code
    participant L as LLM

    U->>L: "What time is it?"

    Note over S,L: Step 1: System prompt tells LLM which tools are available
    Note right of L: System prompt includes:
function name: get_current_time
description: get the current time
parameters: none

    Note over S,L: Step 2: LLM outputs structured text (nothing is executed!)
    L-->>S: tool_calls: [{
  name: "get_current_time",
  arguments: {}
}]

    Note over S: Step 3: Outer code intercepts
actually executes the function
gets result "15:20:45"

    Note over S,L: Step 4: Result is fed back into the LLM context
    S->>L: { role: "tool",
  content: "15:20:45" }

    L-->>S: "It is currently 3:20 PM."
    S->>U: "It is currently 3:20 PM."

Three key insights:

The LLM is not “calling” a function — it is predicting “the next output should be a JSON snippet expressing that I want to use this tool.” This is a pattern learned from training on large amounts of code and API documentation.
The outer engineering code is what actually executes the function — frameworks like Claude Code, the OpenAI SDK, and LangChain parse the LLM’s structured text output, execute the function, and feed the result back.
The LLM’s core ability is “judgment” — it decides “should I answer this user’s question from internal knowledge, or should I call a tool?” The “green tea caffeine” (internal knowledge) vs. “what time is it” (call a tool) example in section 3.1 illustrates exactly this judgment.

Early on, hand-written prompt templates were needed to trigger tool calls (e.g., “FUNCTION: get_current_time()”). Modern LLMs natively understand tool calling without hardcoded trigger syntax.

3.3 Tool Syntax and AI SDK (Writing Functions for LLMs to Call)

The AI SDK (from Andrew Ng’s team) unifies access to multiple LLM providers:

Function name → Python function name
Description → docstring
Parameter types → automatically extracted

3.4 Models That Write Their Own Code (LLM Writes and Calls Its Own Code)

Traditional approach (predefined add/subtract, etc.) vs. code execution (let the model write code itself):

Model outputs code inside tags
Code is extracted and executed in a sandbox
Error messages are fed back to the model for reflection and revision

⚠️ Security warning: A real-world case — an agentic code executor ran rm *.py and deleted all project files. Sandbox environments (Docker, E2B) are mandatory.

3.5 MCP (Model Context Protocol)

MCP standardizes how LLMs access external tools and data sources, expanding the “tool surface” available to the LLM.

Problem: m applications × n tools = m×n amount of work
MCP solution: Build n shared MCP Servers, m applications connect → work reduced to m+n
Client: The application that needs tools (Cursor, Claude Desktop, etc.)
Server: The tool/data provider (Slack, GitHub, PostgreSQL, etc.)

IV. Practical Tips for Building Agentic AI

4.1 Evals in Practice

Evaluation approaches can be divided along two dimensions, forming a 2×2 matrix to guide evaluation design:

Evaluation Dimension	Objective Evals (check with code)	Subjective Evals (use LLM as judge)
Each question has a unique correct answer (Per-Example Ground Truth)	Case 1: Invoice date extraction (each invoice has a different correct date — use code to check for a match)	Case 3: Counting gold-standard points (each topic has different key ideas — use LLM to check coverage)
Only unified rules / format / standards, no fixed answer (No Per-Example Ground Truth)	Case 2: Marketing copy length (all headlines must be 10 words — use code to check compliance)	Rubric Grading (e.g., evaluate charts against a unified clarity rubric)

Start fast and rough: Don’t be intimidated into treating evals as a massive project or spend endless time on theory first. Start with 10–20 examples and get some quick metrics to complement manual observation.
Iterate on your evals:
1. As the system and evals mature, scale up the evaluation set.
2. If the system improves but eval scores don’t go up, it’s time to improve the evals themselves.
Take inspiration from expert behavior: For systems automating human tasks, observe where the system underperforms human experts and use that as the focus for the next phase of work.

4.2 Error Analysis and Prioritization

As system complexity grows, intuition-driven debugging becomes unreliable — systematic analysis is required.

Core method:

Inspect traces and intermediate outputs: Each step’s output is called a “span”; combined, they form a “trace.”
Focus on error cases and quantify them: Build a table tracking failure rates per component.
- Example: 45% unsatisfactory search results vs. 5% poor search keyword generation → prioritize improving the search component.
- Make a habit of regularly reading the conversation log between the LLM and its tools.

4.3 Component-Level Evaluation

Analogous to unit tests vs. integration tests. Advantages: faster iteration, cleaner signal, teams can work in parallel.

Workflow: Error analysis pinpoints the problem component → Component-level eval for tuning → End-to-end eval to validate overall improvement

4.4 Strategies for Problem-Solving

Non-LLM components: Tune parameters/hyperparameters (number of search results, RAG similarity threshold), switch vendors.

LLM components (by priority):

Improve the prompt (clearer instructions, few-shot examples)
Try different LLMs (use evals to test multiple models)
Task decomposition (break complex steps into generate + reflect)
Fine-tuning (last resort — highest cost)

4.5 Latency and Cost Optimization

For early-stage teams, output quality matters far more than latency and cost. Optimize quality first, then latency, then cost.
Apply the same modular thinking: first identify which component is slowest/most expensive, then optimize it specifically (e.g., refine the prompt, switch models, reduce call frequency).

4.6 Four Phases of the Development Process

Phase	Focus	Analysis Activity
1. Rapid Prototype	Get the end-to-end flow working (“build the garbage first”)	Manually inspect outputs, read traces
2. Initial Evaluation	Go beyond manual observation	Build 10–20 example end-to-end evals
3. Rigorous Analysis	Need precise improvement direction	Error analysis, quantify component failure rates
4. Efficient Tuning	System is mature, improve at component level	Component-level evals

Two main developer activities: building (writing code) and analyzing (deciding where to focus). Teams typically spend too much time building and too little time analyzing.

V. Patterns for Highly Autonomous Agents

5.1 Planning Workflows

Planning pattern: The agent autonomously decides the tool-call sequence — no hardcoding.

Case study — customer service assistant (tools: get description, get price, check inventory, check orders, process purchase, process returns):

User asks: “Do you have round sunglasses under $100?”
LLM plans: get description → check inventory → get price → output answer

Advantages: Rich capabilities, no need to pre-orchestrate. Risks: The LLM’s plan is unpredictable and may be unstable.

5.2 Structured Plans

Natural language plans are ambiguous → require the LLM to output a structured plan (JSON/XML):

[
  {"description": "Find round sunglasses", "tool": "get_item_descriptions", "arguments": {"query": "round sunglasses"}},
  {"description": "Check inventory", "tool": "check_inventory", "arguments": {"items": "$step1_result"}}
]

5.3 Code As Action

Code As Action — HuggingFace smolagents

Drawing from the CodeAgent concept in HuggingFace smolagents — letting the LLM write code directly to express multi-step plans.

Advantages: Can call large libraries (hundreds of Pandas functions), highly expressive, research shows better performance than JSON/text plans.
Risks: The code the LLM writes must be executed in a sandbox environment.

5.4 Multi-Agent Workflows

Even when all agents use the same LLM, splitting complex tasks into independent roles is more effective.

My personal intuition is that it works because different prompts/contexts cause the model to focus on different things.

Advantages:

Task decomposition: Natural division of work by role/skill
Focus: Developers build one role at a time; simpler tasks = better output
Modular reuse: General-purpose agents (e.g., “chart designer”) can be reused across applications
Bypasses context limits: Each agent handles its own context (critical for 128k context constraints)
Cost savings: Shorter contexts = fewer tokens = lower cost and faster response

5.5 Four Communication Patterns

Pattern	Structure	Pros	Cons	Best For
Linear	Sequential, one-directional	Simple	Inflexible	Fixed-flow tasks
Hierarchy (two-tier)	Manager coordinates all subordinates	Easy to control	Manager bottleneck	Multi-task coordination
Deep Hierarchy	Sub-agents have their own sub-agents	Scalable, modular	Complex, hard to debug	Large systems
All-to-All (Decentralized)	All agents communicate freely	Creative	Unpredictable results	Exploratory / generative tasks

Given current LLM capabilities, linear and hierarchical patterns are more practical (the deeper the hierarchy, the more information is lost in transmission).
Beyond these four patterns, there is also a conversation pattern — a downgraded version of the decentralized model. In conversation mode, only two agents talk to each other at a time: one executes the task, the other reviews it, and together they hand off a result both are satisfied with.

5.6 Framework Recommendations

LangChain: Linear workflows
smolagents: Hierarchical workflows (author’s recommendation — simple, low abstraction, @tool decorator makes development easy)
MetaGPT / CamelAI: Decentralized workflows

Summary and Personal Reflections

I previously built a Skill at work that used Claude Code to call MCP tools to inspect UE assets and parse build error logs (though maybe that Skill doesn’t quite qualify as Agentic AI). Thinking about it through the lens of Agentic AI, it probably could have been built much more robustly. I was also genuinely surprised by the stability of the test projects in Andrew Ng’s course.

1. Planning: After a timer fires, Claude Code first analyzes the error log, decides which tool to call (check docs, check code, check historical error records, etc.), then executes the tools — or even writes its own database query code on the fly.
1. Reflection: After receiving the tool result, Claude Code performs a self-review to determine if the result is useful. If it isn’t satisfied, it adjusts the query parameters and calls the tool again, repeating until it gets a result it’s happy with.
1. Multi-Agent: You could design multiple specialized agents — one dedicated to log analysis, one to querying docs, one to querying code — collaborating through a shared context.
1. Evals: You could design automated evaluation scripts to quantify Claude Code’s performance on resolving errors — metrics like success rate, average time to resolution, etc. (Each completed result could auto-upload a JSON record to a server for the admin to review weekly. Users could also be asked whether the AI’s suggested solution actually solved their problem, building up a solution database so the AI can reference past resolutions for similar future issues.)

One more thing: different models may suit different harnesses, since their capabilities vary (as mentioned in Hung-yi Lee’s course — for example, Sonnet has a kind of “context anxiety,” meaning its capabilities noticeably degrade when the context gets very long).

Finally, returning to the Tool Use design pattern: one key practical insight is that the design quality of MCP tools directly determines the capability ceiling of the agent. Drawing from my experience with the UE MCP project, here are six tool design principles I’ve distilled:

Description is the most important design decision — the description is the interface: The caller of an MCP tool is the LLM, not a human. The LLM relies on the description field to decide “when to call this and how to call it.” A good description includes: what it does, when to use it, boundary constraints, parameter semantics, and what the return value means. A poor description makes the tool dead weight.
Granularity control — use subsystems as boundaries: Tools that are too fine-grained (e.g., splitting node creation by coordinate axis) lead to long call chains and compounding errors; tools that are too coarse (e.g., generating an entire character blueprint in one call) become black boxes where the LLM can’t localize failures. Use engine subsystems as boundaries — each tool does one complete thing.
Return values must be “LLM-friendly”: Return values must include enough decision context — on success, indicate what operations are available next; on failure, provide error_type, error_message, and suggestion so the agent can self-correct rather than blindly retry.
Separate reads from writes, make side effects transparent: When uncertain, LLMs tend to call tools that “look safe.” Read-only tools and write operations should be clearly categorized, with write operations explicitly annotating side effects in the description (e.g., “creates a new file on disk,” “irreversible operation”).
Idempotent design — make the LLM willing to retry: The LLM may call the same tool repeatedly due to timeouts or misjudgments. Design tools to be safe to call multiple times (e.g., if the asset already exists, return the existing asset instead of throwing an error).
Layered tool structure: High-level tools (task-oriented complete workflows, e.g., setup_character_blueprint()) reduce the number of calls needed; mid-level tools (single-step operations) preserve flexibility; low-level APIs should not be directly exposed to the LLM. Guide the LLM in the description to prefer high-level paths.

The one-sentence summary: Good MCP tool design, at its core, means “designing tools so the LLM can use them correctly, as if it had read the documentation.” This perfectly aligns with the core idea of the Tool Use pattern in Andrew Ng’s course — the quality of your tools sets the upper bound on your agent’s autonomous decision-making.

References

https://www.bilibili.com/video/BV1DfrdByE2H — Course video (Bilibili)
Original GitHub notes: Contains runnable code you can study as Jupyter Notebooks in VSCode — very convenient.
Another video mentioned later in the course: Agentic Knowledge Graph Construction
Original course link
Hung-yi Lee’s course: A good companion — feels a bit like an Agent course in its own right

Agentic工作流搭建教程笔记（吴恩达）

Wed, 08 Apr 2026 09:45:00 GMT

一、Agentic工作流简介

1.1 什么是Agentic AI

核心定义：基于LLM的应用通过多步骤执行来完成任务的流程。

Non-agentic：单次prompt，one-shot完成（类似写作文不许用退格键）
Agentic：多步骤流程——列大纲 → 决定是否需要调研 → 执行搜索 → 写初稿 → 反思修改 → 最终定稿

类比：做番茄炒蛋，让多个AI各司其职（备料/烹饪/摆盘/审查）

蓝色标注的分别是AI的不同发展阶段：从prompt engineer到content engineer再到Hermes engineer（Agent）.

1.2 自主性等级

Agentic是一个形容词而非名词，避免”什么才算真正的Agent”的争论。

低自主	高自主
所有步骤预定义，工具调用硬编码	Agent动态决策步骤顺序
AI只负责生成文本	Agent可自行创建新工具
“听话但没脑子的助手”	“聪明、有责任心的实习生”

本质：不只是”能干活”，而是”知道怎么思考如何干活、用什么工具、能自检纠错”。
一个合格的Agentic AI应该能自主规划（Planning，自己选择工具）和自主反思（Reflection，有记忆和反思）。

1.3 三大益处

性能大幅提升：HumanEval上，Agentic的GPT-3.5可超越Non-agentic的GPT-4（BTW这些听起来都是上古模型了）
并行加速：多个LLM实例同时搜索/阅读，汇总后输出
模块化设计：自由替换组件（搜索引擎、不同LLM、不同工具）

1.4 任务分解方法

核心方法论：

观察人类行为 → 2. 拆解为子步骤 → 3. 评估LLM/工具可行性 → 4. 迭代优化

案例——写文章的递进分解：

1步：直接生成（肤浅）
3步：大纲 → 搜索 → 写文（较好但可能脱节）
5步：大纲 → 搜索 → 初稿 → 自我批评 → 修改（最好，模拟人的写-反思-修改循环）
核心方法论：“如果某一步骤效果不好，就把它再拆成更小的子步骤。”

构建模块：模型（LLM）+ 工具（API、信息检索、代码执行）

1.5 Agentic AI评估（Evals，在做之前对项目进行思考）

吴恩达强调：区分有效和无效实践者的最大预测因子，是围绕评估和错误分析的规范开发流程。

方法论：

先构建，观察输出，再做评估（不要预先设计所有评估标准）：看看这事情咋弄。
识别低质量输出，定义错误类型。
建立评估指标追踪错误：编写脚本自动扫描智能体的所有输出，统计提及错误输出的次数和频率。
主观标准可用 LLM as Judge（1-5分打分，但是不要直接让大模型去打分）。

两类评估：端到端评估（整体输出质量） / 组件级评估（单步质量）（这两种就是一个整体一个局部评估）

1.6 四大设计模式总览

模式	核心思想
Reflection 反思	多Agent检查、评估、改进自己的输出。
Tool Use 工具使用	给LLM调用外部工具/函数的能力
Planning 规划	模型自主决定复杂任务的步骤序列
Multi-Agent 多智能体	多个不同专长的Agent协作

现实案例：oh-my-claudecode（OMC）

OMC （oh-My-Claude）是一个真实的 Agent 系统，完美对应了这四大设计模式：

Reflection 反思：code-reviewer / verifier agent — executor 写完代码后，由独立的 reviewer 审查。这就是笔记 2.1 说的”用不同模型，一个生成一个审查（1+1>2）”。OMC 的规则是 Never self-approve in the same active context，写代码和审代码必须是两个独立的 Agent。
Tool Use 工具使用：MCP servers（Context7 查文档、filesystem 操作文件、LSP 做代码分析）、Skill 系统（/commit、/plan 等可调用技能）、Bash 工具 — 对应笔记 3.1 的”工具就是函数，模型自主决定何时使用”。Claude 会根据任务自主判断该用哪个工具。
Planning 规划：planner agent、/plan skill、plan mode — 对应笔记 5.2 的”LLM输出结构化计划后再执行”。在 plan mode 下，Claude 会先探索代码库、设计方案，等你批准后才动手写代码。
Multi-Agent 多智能体：Team 模式（/team）可以同时启动多个专项 Agent（explorer 负责搜索、executor 负责写代码、reviewer 负责审查、designer 负责 UI…），它们共享 TaskList，通过 SendMessage 通信协作。

二、反思设计模式（Reflection）

使用反射模式真的是能提升效果的。所以人有多时候多反思一下自己，可能真的就能进步。🐶

2.1 反思提升任务输出

核心类比：人类会审查和修改草稿，AI也可以。

邮件写作案例：

AI生成V1 → 将V1反馈给LLM并附上反思prompt → 生成改进的V2

代码写作进阶路径：

基础：LLM写代码V1 → LLM审查生成V2
进阶：用不同模型——一个生成，一个推理模型审查（1+1>2）
终极：结合外部反馈——在沙箱执行V1，捕获错误，反馈给LLM生成V2

关键：反思是工程实践而非魔法；外部反馈是关键区分因素

2.2 Internal——自我反思的两条黄金法则

明确指示反思动作：说”审查””检查””验证”（具体的事情），不只是”改进”。客观任务：构建ground truth数据集 + 自动化代码评估（如SQL查询正确性）
指定具体标准：列出明确的评估维度（如”专业语气””事实准确”）。主观任务：用**评分标准（Rubric）**引导LLM打分，避免直接比较（存在位置偏差）

论文”Self-refine”表明：在全部7个任务、4个模型上，反思一致地提升了性能。

2.3 external——外部突破性能天花板

三条性能曲线：

🔴 无反思：prompt工程快速提升后停滞
🔵 有反思：突破停滞到更高水平
🟡 反思+外部反馈：再次突破到最高水平

三类外部反馈：正则匹配（避免提竞品）→ 搜索验证（事实核查）→ 字数检查（格式约束）

外部反馈打破模型信息孤岛，解决固有弱点（精确计数、事实核实），实现闭环优化。

反思与外部反馈的性能曲线对比

三、工具使用（Tool Use）

3.1 工具的本质

工具就是函数，模型自主决定何时使用。

关键能力——条件调用：模型智能判断何时需要工具。

“绿茶含多少咖啡因？” → 从内部知识回答
“现在几点？” → 调用get_current_time工具

多工具协作：日历助手可链式调用 check_calendar → make_appointment

3.2 LLM 是如何”调用”函数的？

LLM 理论上根本触碰不到函数的执行层——它从头到尾只做一件事：生成文本。
所谓的”调用函数”，本质是一个文本中转协议。

sequenceDiagram
    participant U as 用户
    participant S as 系统/工程代码
    participant L as LLM

    U->>L: "现在几点了？"

    Note over S,L: Step 1: 系统 prompt 告诉 LLM 可用工具
    Note right of L: 系统 prompt 中包含：
函数名: get_current_time
描述: 获取当前时间
参数: 无

    Note over S,L: Step 2: LLM 输出结构化文本（没有执行任何东西！）
    L-->>S: tool_calls: [{
  name: "get_current_time",
  arguments: {}
}]

    Note over S: Step 3: 外层代码拦截
真正执行函数
拿到结果 "15:20:45"

    Note over S,L: Step 4: 把结果塞回 LLM 上下文
    S->>L: { role: "tool",
  content: "15:20:45" }

    L-->>S: "现在是下午3点20分。"
    S->>U: "现在是下午3点20分。"

三个关键认知：

LLM 不是在”调用”函数 — 它只是在预测”接下来应该输出一段 JSON 表示我想用这个工具”。这是从大量代码和 API 文档训练中学到的模式。
真正执行函数的是外层工程代码 — Claude Code、OpenAI SDK、LangChain 这些框架负责解析 LLM 输出的结构化文本，执行函数，再把结果喂回去。
LLM 的核心能力在于”判断” — 它决定”这个用户问题我该用内部知识回答，还是该调工具”。笔记 3.1 里”绿茶咖啡因”（内部知识）vs “现在几点”（调工具）就是这个判断。

早期需要手写prompt模板触发（如”FUNCTION: get_current_time()”），现代LLM已原生理解工具调用，无需硬编码触发语法。

3.3 工具语法与AI SDK（写好函数让LLM调用）

AI SDK（Andrew Ng团队出品）统一多家LLM提供商访问：

函数名 → Python函数名
描述 → docstring
参数类型 → 自动提取

3.4 模型自己写代码（LLM自己写自己调）

传统方式（预定义add/subtract等）vs 代码执行（让模型自己写代码）

模型输出标签中的代码
在沙箱中提取执行
错误信息反馈给模型进行反思修改

⚠️ 安全警告：真实案例——Agentic代码执行器运行rm *.py删除了项目所有文件。必须使用沙箱环境（Docker、E2B）。

3.5 MCP（Model Context Protocol）

MCP标准化LLM访问外部工具和数据源的方式。让LLM调用的“工具范围”更大。

问题：m个应用 × n个工具 = m×n的工作量
MCP方案：建n个共享MCP Server，m个应用连接 → 工作量降为 m+n
Client：需要工具的应用（Cursor、Claude Desktop等）
Server：工具/数据提供者（Slack、GitHub、PostgreSQL等）

四、构建Agentic AI的实用技巧

4.1 评估（Evals）实战

评估方式可以从两个维度划分，形成一个 2x2 的矩阵，用于指导评估的设计：

评估维度	客观评估 (Objective Evals) （用代码检查）	主观评估 (Subjective Evals) （用 LLM 作为评判者）
每个问题有唯一正确答案 (Per-Example Ground Truth)	案例一：发票日期提取 (每个发票有不同的正确日期，用代码检查是否匹配)	案例三：统计黄金标准点 (每个主题有不同的重要观点，用 LLM 检查是否充分提及)
只有统一规则 / 格式 / 标准，没有固定答案 (No Per-Example Ground Truth)	案例二：营销文案长度 (所有标题都要求是 10 个词，用代码检查是否符合统一标准)	评分标准评估 (Rubric Grading) (例如，根据统一的清晰度评分标准来评估图表)

从快速而粗糙的评估开始：不要因为觉得评估是一个大型项目，就不敢轻易建立，或者花漫长的时间去做理论调研。先用 10-20 个例子开始，快速获得一些指标来辅助人工观察。
迭代改进评估：
1. 随着系统和评估的成熟，可以增加评估集的规模。
2. 如果系统改进了但评估分数没有提高，意味着该改进评估本身了。
以专业人士的行为为灵感：对于自动化人类任务的系统，观察系统在哪些方面性能不如人类专家，以此作为下一阶段工作的重点。

4.2 错误分析与优先级

系统复杂度上升后，直觉驱动debug不可靠，需要系统化分析。

核心方法：

检查traces和中间输出：每步输出叫”span”，合在一起叫”trace”
聚焦错误案例并量化：建表格追踪各组件失败率
- 例：搜索结果不满意45% vs 搜索关键词生成5% → 优先改搜索组件
- 习惯性地多看看LLM和工具的交流过程。

4.3 组件级评估

类比单元测试 vs 集成测试。优势：更快迭代、信号更清晰、团队可并行。

工作流：错误分析定位问题组件 → 组件级评估调优 → 端到端评估验证整体改善

4.4 解决问题的策略

非LLM组件：调参数/超参数（搜索结果数、RAG相似度阈值）、换供应商

LLM组件（按优先级）：

改进Prompt（明确指令、few-shot示例）
尝试不同LLM（用eval测试多个模型）
任务分解（将复杂步骤拆为生成+反思）
微调（最后手段，成本最高）

4.5 延迟与成本优化

早期团队，输出质量远比延迟和成本重要。先优化质量，再优化延迟，最后优化成本。
还是以组建化的思想，先分析哪个组件最慢/最贵，再针对性优化（如改prompt、换模型、减少调用频率）。

4.6 开发过程四阶段

阶段	核心	分析活动
1. 快速原型	端到端先跑通（”先造垃圾”）	手动检查输出、阅读traces
2. 初始评估	超越手动观察	建10-20例端到端eval
3. 严格分析	需要精确改善方向	错误分析，量化组件失败率
4. 高效调优	系统成熟，组件级改善	组件级eval

开发者两大活动：构建（写代码）和分析（决定聚焦哪里）。团队常花太多时间构建、太少时间分析。

五、高度自治智能体的模式

5.1 规划工作流（Planning）

规划模式：Agent自主决定工具调用序列，不硬编码。

案例——客服助手（工具：查描述、查价格、查库存、查订单、处理购买、处理退货）：

用户问”有100刀以下的圆墨镜吗？”
LLM规划：查描述 → 查库存 → 查价格 → 输出答案

优势：能力丰富，无需预编排。风险：无法预测LLM的计划，可能不稳定。

5.2 结构化计划

自然语言计划有歧义 → 要求LLM输出结构化计划（JSON/XML）：

[
  {"description": "查找圆墨镜", "tool": "get_item_descriptions", "arguments": {"query": "round sunglasses"}},
  {"description": "检查库存", "tool": "check_inventory", "arguments": {"items": "$step1_result"}}
]

5.3 代码即计划 Code As Action

Code As Action — HuggingFace smolagents

参考HuggingFace smolagents的CodeAgent概念——让LLM直接写代码表达多步计划。

优势：可调用大型库（Pandas数百函数）、表达力强、研究显示性能优于JSON/文本计划。
风险：明确要求LLM编写的代码并需要沙箱执行。

5.4 多智能体工作流

即使所有Agent用同一个LLM，拆分复杂任务为独立角色也更有效。

我个人觉得可能是因为提示词/上下文不同，导致模型的关注点是不一样的。

优势：

任务分解：按角色/技能自然分工
聚焦：开发者一次构建一个角色；更简单的任务 = 更好的输出
模块化复用：通用Agent（如”图表设计师”）可跨应用复用
突破上下文限制：每个Agent处理自己的上下文（对128k上下文限制至关重要）
成本节省：更短的上下文 = 更少的token = 更低的成本和更快的响应

5.5 四种通信模式

模式	结构	优点	缺点	适用场景
线性 Linear	顺序单向传递	简单	不灵活	固定流程任务
层级（两层）Hierarchy	Manager协调所有下属	易控制	Manager瓶颈	多任务协调
深层层级 Deep Hierarchy	子Agent有自己的子Agent	可扩展、模块化	复杂难调试	大型系统
全连接（去中心化）All to all	所有Agent自由通信	有创意	结果不可预测	探索/生成任务

当前LLM能力下，线性和层级模式更实用（层级越深信息损失越大）。
其实在这四种模式外，还有一种对话模式，比较类似去中心模式的降级版。对话模式每次都只有两个Agent互相对话交流，一方执行任务，另一方审查任务，最终交出一份双方都满意的结果。

5.6 框架推荐

LangChain：线性工作流
smolagents：层级工作流（作者推荐——简单、低抽象、@tool装饰器易开发）
MetaGPT / CamelAI：去中心化工作流

总结与个人思考

我之前在公司构建了一个通过ClaudeCode调用MCP去检查UE资产和打包报错Log的Skill（但或许这个Skill其实并不算Agentic AI），如果用Agentic AI的思维来做这个项目的话，它可能能做得更完善一点。而且我非常惊讶，吴恩达课程里面的几个测试项目的稳定性。

1. 规划（Planning）：定时器触发后，ClaudeCode先分析报错日志，判断需要调用哪个工具（查文档、查代码、查历史报错记录等），然后再执行工具，甚至自行编写数据库查询代码。
1. 反思（Reflection）：ClaudeCode在得到工具结果后，先进行自我审查，看看结果是否有用，如果不满意就调整查询参数重新调用工具，直到得到满意的结果。
1. 多智能体（Multi-Agent）：可以设计多个专门的Agent，比如一个专门分析日志的Agent，一个专门查询文档的Agent，一个专门查询代码的Agent，它们通过共享上下文进行协作。
1. 评估（Evals）：可以设计一些自动化的评估脚本，来量化ClaudeCode在解决报错问题上的表现，比如成功率、平均解决时间等指标。（每一个结果执行完成之后，会有一个Json表格自动上传服务器，然后管理员每周看统计效果。并且可以让用户回答该AI解决思路是否解决了你的问题，组建一个问题解决方案数据库，这样AI遇到了类似问题就可以参考之前的解决方案）。

另外，不同的模型可能适用于不同的harness，因为模型的能力不太一样（李宏毅的课程提到了，比如说sonnet 他对于上下文会有焦虑，所以当内容很多的时候，它会出现明显的能力下降)。

最后，回到 Tool Use 这个设计模式，一个关键的实践洞察是：MCP 工具的设计质量直接决定了 Agent 的能力上限。结合我在 UE MCP 项目中的经验，总结出以下六条工具设计原则：

Description 是最重要的设计——描述即接口：MCP 工具的调用方是 LLM 而非人类，LLM 靠 description 字段判断”什么时候该调用、怎么调用”。好的描述要包含：做什么、什么时候用、边界限制、参数语义、返回值含义。描述写得烂，工具就是死的。
粒度控制——以子系统为边界：工具太细（如按坐标轴拆分节点创建）导致调用链过长、容易出错累积；太粗（如一句话生成整个角色蓝图）变成黑盒，出错了 LLM 无法定位。以引擎子系统为边界划分，每个工具做一件完整的事。
返回值要”对 LLM 友好”：返回值必须包含足够的决策上下文——成功时提示下一步可用的操作，失败时给出 error_type、error_message 和 suggestion，让 Agent 能自我纠正而不是盲重试。
读写分离，副作用透明：LLM 在不确定时倾向于调用”看起来安全”的工具。只读工具和写操作要明确分类，写操作在描述里标注副作用（如”会在磁盘上创建新文件”、”不可逆操作”）。
幂等性设计，让 LLM 敢于重试：LLM 可能因超时或误判而重复调用同一工具，设计为重复调用安全（如：资产已存在则返回现有资产而非报错）。
分层工具结构：高层工具（面向任务的完整工作流，如 setup_character_blueprint()）减少调用次数；中层工具（面向单步操作）保证灵活性；底层 API 不直接暴露给 LLM。描述里引导 LLM 优先走高层路径。

核心一句话：好用的 MCP 工具设计，本质是”让 LLM 像一个读过文档的开发者一样能正确使用它”。 这和吴恩达课程中 Tool Use 模式的核心思想完美对应——工具的质量决定了 Agent 自主决策的上限。

参考资料

https://www.bilibili.com/video/BV1DfrdByE2H 课程地址
原github笔记:这里面有可以运行的代码，可以通过在VSCode里面的Jupyter Notebook的形式来学习，会很方便。
👆视频里面提到后面部分存在的另外一个视频：代理知识图谱构建
原始的视频地址
李宏毅课程：这个课程有点像 Agent

Claude Code Tips & Workflows

Sun, 22 Mar 2026 02:00:00 GMT

Claude Code Tips & Workflows

This post documents the stable workflows, prompting strategies, and plugin usage patterns I’ve settled on while working with Claude Code. The goal isn’t to list every feature — it’s to capture the practices that actually improve delivery speed.

What Claude Code Is Good At

From the perspective of game development, toolchain work, and content engineering, Claude Code is best suited for:

Understanding existing project structure
Batch refactoring scripts or tooling code
Adding tests, documentation, or scaffolding
Integrating third-party SDKs, service APIs, or CLI tools
Mid-complexity multi-file changes

Its strengths aren’t about “just throwing everything at it and hoping for the best.” It’s more about:

Reliable long-context comprehension
Strong performance on code explanation, refactoring, and summarization
Workflows that follow an analyze-first, then execute pattern

If your tasks are very granular and real-time — like line-level autocomplete while you type — an in-IDE completion tool is still more direct.

Claude Configuration File Locations

	`project-root/CLAUDE.md`	`~/.claude/settings.json`	`~/.claude.json`
Purpose	Instructions for Claude	User configuration	Internal system state
Written by	User (manually)	User (manually edited)	Claude Code (auto-maintained)
Contents	Project conventions, coding style, workflow agreements	API keys, model config, permission rules, MCP servers	Startup count, tool usage stats, per-project session records
Analogy	`.editorconfig` / `.eslintrc`	VS Code `settings.json`	VS Code `state.vscdb`
Version control	Should be committed to git	Do not commit (contains keys)	Do not commit (contains user ID)
`env`	—	API endpoint, model name	—
`permissions`	—	Tool allowlist	—
`mcpServers`	—	Global MCP servers	—
`enabledPlugins`	—	Plugin toggles	—
`numStartups`	—	—	Total startup count
`projects`	—	—	Per-project MCP, trust state, session stats
`toolUsage`	—	—	Call count and last-used time per tool
`tipsHistory`	—	—	Tips that have already been shown
Notes	This file is always present in the context window.

My Stable Workflow

Ask It to Map the Impact First

Before touching shared modules, base libraries, or build scripts, I’ll ask:

1 2	If I change this function signature, which callers will be affected? Please list them by file and flag the high-risk points.

This step is genuinely valuable for avoiding unintended breakage.

Ask for Verification Steps Along the Way

Beyond just the code changes, I’ll also request:

After completing the changes, please add:
1. How to verify the result
2. Recommended commands to run
3. Edge cases that might fail

This brings the final output much closer to a deliverable state.

Be Explicit About What Not to Do

For example:

Don’t change public APIs
Don’t introduce new dependencies
Don’t touch UI styles
Don’t modify the database schema

Negative constraints like these are critical, especially in existing projects.

Side note: Claude Sonnet tends to suffer from fairly serious context anxiety. It’s best used for short, quick tasks that can be completed in a single shot.

Plugins

The real value of plugins and external tool integrations isn’t “more features” — it’s turning Claude Code from something that only talks into something that can look things up, run things, and verify things.

My usual criteria come down to three questions:

Does it reduce manual context switching?
Does it ground the analysis in the actual codebase?
Does it form a stable workflow, not just a one-off demo?

ralph-loop (Loop Plugin)

I think of ralph-loop as a “loop execution framework” or a “task closure enhancer” rather than a simple plugin.

It works well for:

Tasks that require multiple rounds of analysis, execution, and checking
Situations where you want AI to iterate at a fixed cadence
Breaking large tasks into observable, reviewable rounds

Auto-Approving Confirmations

If your goal is simply “let it edit files in the current workspace without asking every time,” reach for Claude Code’s built-in permission modes first, rather than expecting the plugin to bypass confirmations.

I think about this in two tiers:

Option 1: Auto-approve file edits only

This is the safer approach.

Claude Code has an acceptEdits mode whose core effect is:

File edits within the workspace can be batch-accepted
Commands, network requests, and other side-effectful operations will still prompt you

If your main frustration is the “can I edit this file?” prompt, this is the tier to use.

You can check the current mode via /config or /permissions, or explicitly set it in a settings file:

{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "permissions": {
    "defaultMode": "acceptEdits"
  }
}

Common placement options:

Global effect: ~/.claude/settings.json
Current repo only (local): .claude/settings.local.json
Team-shared config: .claude/settings.json

If you only want to open this up on your own machine for the current project, .claude/settings.local.json is the best fit — it won’t be committed to the repo and won’t affect other projects.

Option 2: Skip command confirmations too

If you want Claude Code to skip confirmations for command execution and tool calls as well, you can pass a startup flag:

1	claude --dangerously-skip-permissions

This mode skips all permission prompts outright. It’s appropriate when you fully trust the current repo and explicitly accept that it may automatically edit files, run commands, and call tools.

The risk level here is noticeably higher — I wouldn’t leave it on by default. Better suited for:

Your own personal repos
Well-isolated test environments
Short-term high-frequency iteration tasks

I’d avoid using it for:

Repos of unknown origin
Projects with sensitive configs mixed in
Projects that can trigger deployments, publishing, or database operations

Practical Recommendation

If your concern is “I don’t want ralph-loop to keep asking me during multi-round execution,” the priority order should be:

Switch the default permission mode to acceptEdits first
Apply it only to the current repo via .claude/settings.local.json
Only switch to --dangerously-skip-permissions if even command confirmations are breaking your flow

In short:

Just want to skip file edit confirmations → use acceptEdits
Want to skip basically all confirmations → use --dangerously-skip-permissions

The first is for daily use; the second is for when you’ve consciously accepted the risks.

The bottom line: ralph-loop doesn’t make Claude Code “smarter” — it makes your task orchestration more stable.

Oh My Claude Code (OMC — Claude Agent Suite)

A zero-learning-curve tool for getting into Claude workflows. Provides a pre-configured collection of agents.

Installation

I use npm. Run in your terminal:
npm i -g oh-my-claude-sisyphus@latest
Then omc setup to complete the setup.
Project site

Five Available Modes

🔸 Autopilot (fully autonomous) — end-to-end automation from planning through implementation to testing
🔸 Ultra Pilot (parallel acceleration) — up to 5 parallel workers running simultaneously, ~5x throughput
🔸 Swarm (collaborative team) — multiple agents collaborate like a dev team, pulling from a shared task pool
🔸 Pipeline (sequential) — agents chained in a fixed order, ideal for workflows that must proceed step by step
🔸 EcoCode (economy mode) — maximizes token savings while maintaining efficiency

Status Bar in the New Claude CLI

[OMC#4.11.2] | session:33m | ctx:32% | T:17 A:1
● This is the OMC status bar. Here’s what each field means:

[OMC#4.11.2] — OMC plugin version
session:33m — current session has been running for 33 minutes
ctx:32% — 32% of the context window has been used
T:17 — Tool calls: 17 tool invocations in this session
A:1 — Agents: 1 active sub-agent currently running

Common Commands

There’s a subtle distinction worth noting: claude --resume is a startup flag you run in the terminal, while /xxx commands are typed directly inside a Claude Code session.

claude --resume: Resume the most recent or a specific session. Great when the terminal closes unexpectedly or you want to pick up where you left off.
/rewind: Roll the current session back to an earlier point. Very handy when a few rounds of thinking went sideways and you want to undo a stretch of work.
/resume / /continue: Resume an existing session from within a session, or open the session picker to continue a previous task.
/remote-control / /rc: Expose the current local session for remote control, so you can continue the task from claude.ai/code or a mobile device. Perfect for stepping away from your desk without losing context.
claude --dangerously-skip-permissions: Skip all permission confirmations at startup — no more per-action approval prompts. Use this when you fully trust the repo and want Claude Code to autonomously edit files, run commands, and call tools. Highest risk mode; best kept for personal projects or isolated environments.
/compact: Compress the current context into a summary and continue the same session. A lifesaver when the context is nearly full but you don’t want to re-explain the whole project. (Not supported by OMC.)
/doctor: Check Claude Code’s installation, permissions, and configuration. When a command stops working, a plugin misbehaves, or the environment looks off, running this first usually saves a lot of debugging time.

Errors & Debugging

PowerShell Garbled Output on Chinese Windows

Example error:

● Bash(powershell.exe -NoProfile -Command "...")
  ⎿  Error: Exit code 1
     ����λ�� ��:1 �ַ��: 169
     + ... ref])|Out-Null;if(.Count -eq 0){'OK: no syntax errors'}else{|%{extglo ...
     +                                                                 ~
     ������ʹ�ÿչ��Ԫ��
         + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
         + FullyQualifiedErrorId : EmptyPipeElement

Root cause:

Simplified Chinese Windows defaults to code page 936 (GBK/GB2312), but Claude Code processes strings as UTF-8 internally. When PowerShell outputs Chinese error messages encoded in GBK, Claude Code interprets them as UTF-8 → mojibake.

This is a very common issue affecting virtually all CJK (Chinese/Japanese/Korean) Windows users. English Windows is unaffected because code page 437 is ASCII-compatible with UTF-8.

Fix: Set the environment variable in Claude Code settings

Add to your project-level .claude/settings.local.json:

{
  "env": {
    "CHCP": "65001"
  }
}

Or for global effect, add it to ~/.claude/settings.json.

Why not change the system locale:

Windows offers a “Beta: Use Unicode UTF-8 for worldwide language support” toggle that changes the system code page from 936 to 65001. It’s the most thorough fix, but it’s a global irreversible change that can break older Chinese software with hardcoded GBK encoding (legacy installers, older archive tools, etc.). The env CHCP=65001 approach only affects Claude Code’s shell process — the rest of the system remains untouched.

You’ll need to restart your Claude Code session for the change to take effect.

References

Learning video by a Bilibili creator

Claude Code 使用技巧整理

Sun, 22 Mar 2026 02:00:00 GMT

Claude Code 使用技巧整理

这篇文档主要记录我在使用 Claude Code 过程中的一些稳定工作流、提问方式和插件使用经验。目标不是把所有功能列一遍，而是整理出真正能提高交付效率的做法。

Claude Code 适合做什么

如果从游戏开发、工具链开发和内容工程的视角来看，Claude Code 更适合以下几类任务：

理解已有工程结构
批量重构脚本或工具代码
补测试、补文档、补脚手架
对接第三方 SDK、服务接口或命令行工具
做中等复杂度的多文件改动

它的优势不是“直接无脑一把梭”，而是：

长上下文理解能力比较稳定
对代码解释、重构、归纳总结表现较好
适合先分析、再执行的工作流

如果任务非常碎、非常即时，比如一边写一边需要行级补全，那还是 IDE 内补全工具更直接。

Claude配置文件位置

	`项目根/CLAUDE.md`	`~/.claude/settings.json`	`~/.claude.json`
语义	给 Claude 的指令	用户配置	系统内部状态
谁写	用户手写	用户手动编辑	Claude Code 自动维护
内容	项目规范、编码风格、工作流约定	API 密钥、模型配置、权限规则、MCP 服务器	启动次数、工具使用统计、项目级会话记录
类比	`.editorconfig` / `.eslintrc`	VS Code 的 `settings.json`	VS Code 的 `state.vscdb`
版本控制	应提交到 git	不提交（含密钥）	不提交（含用户 ID）
`env`	—	API 地址、模型名	—
`permissions`	—	工具白名单	—
`mcpServers`	—	全局 MCP 服务器	—
`enabledPlugins`	—	插件开关	—
`numStartups`	—	—	启动总次数
`projects`	—	—	每个项目的 MCP、信任状态、会话统计
`toolUsage`	—	—	每个工具的调用次数和最后使用时间
`tipsHistory`	—	—	已展示过的提示信息
备注	这个文件一直存在于上下文中。

我比较稳定的使用流程

先要求它说明影响面

在改公共模块、基础类库、构建脚本之前，可以先问：

1 2	如果修改这个函数签名，会影响哪些调用方？请按文件列出来，并说明高风险点。

这一步对避免误改很有价值。

让它顺手补验证步骤

除了改代码，我还会顺带要求：

修改完成后，请补充：
1. 如何验证
2. 建议执行的命令
3. 可能失败的边界情况

这样最后产出更接近可交付状态。

把“不要做什么”写明

例如：

不要改 public API
不要引入新依赖
不要动 UI 样式
不要修改数据库结构

这类负向约束非常重要，尤其是在已有项目里。

另外，Claude Sonnot会有比较严重的上下文焦虑。所以它只适合用于来做短暂的快速完成的小功能。

ClaudeCode底层实现讨论

模型自带的搜索能力WebSearch如何实现的

Claude Code 内置了两个独立的联网工具，经常被混为一谈：

工具	职责	输入	输出
WebSearch	搜索引擎入口	关键词 query	标题 + 链接列表
WebFetch	页面内容读取	具体 URL	页面正文

两者配合使用：先用 WebSearch 找到相关链接，再用 WebFetch 读取具体内容。

底层调用链路

这里有个比较有意思的实现细节，是社区通过逆向 Claude Code 流量发现的：

主对话触发：当 Claude 判断需要搜索时，主会话调用 WebSearch，传入 query 参数
派生子对话：Anthropic 服务端会为这次搜索单独起一个 Claude Opus 子会话，调用 Anthropic 内部的 web_search 服务端工具
结果回传：子会话处理完后，结果作为工具返回值传回主对话
可能多轮：整个过程可能在单次请求中重复多次（比如先搜一次，根据结果决定再搜一次）

这个设计的意图是让主 Agent 保持轻量，并限制注入面（injection surface），搜索逻辑在隔离的子会话里运行。

版本与收费说明

目前最新工具版本是 web_search_20260209，支持动态过滤（Dynamic Filtering，正式”进入 Claude 上下文之前”，先让代码过滤一遍，只保留有用的部分**）
API 用户：WebSearch 是单独计费的附加功能，每次搜索额外收费
Max 套餐用户：已经包含在套餐里，不单独扣费，可以直接用
这也是为什么有人反馈”用 Claude 订阅账号跑 Claude Code 时 WebSearch 显示 Rate limit，但用 API Key 却正常”——两者走的是不同的配额通道

和 MCP 搜索插件的区别

如果你已经用 Max 套餐，内置的 WebSearch + WebFetch 对日常搜索够用，不需要额外装 Tavily、Brave 这类 MCP 搜索插件。MCP 搜索插件更适合需要更高频次搜索、或者需要自定义搜索行为的 API 用户。

Plugin /其他工具/工作流

插件或者外部工具接入，真正的价值不是“功能变多”，而是让 Claude Code 可以从“只会说”变成“能查、能跑、能验证”。

我的判断标准通常是三条：

能不能减少手动切换上下文
能不能把分析结果落到真实工程上
能不能形成稳定工作流，而不是偶尔演示一次

ralph-loop （循环插件）

ralph-loop 这类插件我更倾向把它看成“循环执行框架”或者“任务闭环增强器”。

它比较适合下面这些场景：

一个任务需要多轮分析、执行、检查
需要让 AI 按固定节奏反复迭代
需要把大任务拆成可观察的小回合

自动化同意问题

如果你的目标只是“允许它直接修改当前工作区里的文件，不要每次都弹确认”，优先用 Claude Code 自带的权限模式，而不是指望插件本身绕过确认。

我建议分两档理解：

方案 1：只自动同意编辑文件

这是更稳的做法。

Claude Code 有一个 acceptEdits 模式，核心效果是：

工作区内的文件编辑可以批量接受
但执行命令、联网请求、其他有副作用的操作，仍然会继续询问

如果你主要烦的是“改这个文件能不能同意”这种提示，那应该优先用这一档。

可以通过 /config 或 /permissions 检查当前模式，也可以在设置文件里显式写上：

{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "permissions": {
    "defaultMode": "acceptEdits"
  }
}

常见放置位置：

全局生效：~/.claude/settings.json
当前仓库本地生效：.claude/settings.local.json
团队共享配置：.claude/settings.json

如果你只是想在自己机器上对当前项目放开，最适合放到 .claude/settings.local.json。这样不会提交进仓库，也不会影响别的项目。

方案 2：连命令确认也一起跳过

如果你希望 Claude Code 连命令执行、工具调用这些确认也尽量别问，可以直接用启动参数：

1	claude --dangerously-skip-permissions

这个模式相当于直接跳过权限确认，适合你完全信任当前仓库、并且明确知道它可能会自动执行改文件、跑命令、调用工具等操作的场景。

但这一档风险明显更高，不建议默认长期打开。更适合：

自己的个人仓库
隔离良好的测试环境
临时做高频迭代任务

不太建议直接用于：

来路不明的仓库
混有敏感配置的工程
会执行部署、发布、数据库操作的项目

实战建议

如果你的诉求只是“ralph-loop 在多轮执行时别一直问我能不能改文件”，那优先级应该是：

先把默认权限模式切到 acceptEdits
用 .claude/settings.local.json 只对当前仓库生效
只有在你连命令确认都嫌打断流程时，才改用 --dangerously-skip-permissions

简单说：

只想免掉文件编辑确认，用 acceptEdits
想把所有确认基本都跳过，用 --dangerously-skip-permissions

前者适合日常主力使用，后者适合你明确接受风险时再开。

简单说，ralph-loop 不是让 Claude Code “更聪明”，而是让你的任务编排更稳定。

Oh My Claude Code（OMC，Claude智能体集合）

主打一个零成本学习Claude的一个工具。提供一个预设好的agent合集。

安装方式

我使用npm包管理。在terminal里执行：
npm i -g oh-my-claude-sisyphus@latest
然后omc setup,完成设置。
项目网站

提供了5种模式

🔸 Autopilot（完全自主） - 从规划、实施到测试的全流程自动化
🔸 Ultra Pilot（并行加速） - 最多5个并行工作者同时处理，速度提升5倍
🔸 Swarm（协作团队） - 多个智能体像开发团队一样协作，从共享任务池领取工作
🔸 Pipeline（流水线） - 按固定顺序串联智能体，适合必须按步骤推进的工作流
🔸 EcoCode（经济模式） - 在保持效率的前提下最大化节省 token 消耗

新版本claude命令行下的标识

[OMC#4.11.2] | session:33m | ctx:32% | T:17 A:1
● 这是 OMC 状态栏的显示，各项含义：

[OMC#4.11.2] — OMC 插件版本号
session:33m — 当前会话已持续 33 分钟
ctx:32% — 上下文窗口已使用 32%
T:17 — Tool calls，本次会话已执行的工具新调用次数（17 次）
A:1 — Agents，当前活跃的子代理数量（1 个）

在运行命令的时候，遇到一个报错：

● Ran 4 stop hooks (ctrl+o to expand)
  ⎿  Stop hook error: Failed with non-blocking status code: /usr/bin/bash: line 1: node: command not found
  ⎿  Stop hook error: Failed with non-blocking status code: /usr/bin/bash: line 1: node: command not found
  ⎿  Stop hook error: Failed with non-blocking status code: /usr/bin/bash: line 1: node: command not found

首先建议全局安装 OMC，这样在全局的命令行里都能调用到这个，比在plugin里面管理会好些：

1 2	npm install -g oh-my-claude-sisyphus omc setup

上面提到的这个问题可能是因为没有安装tmux，

1	winget install psmux

claude-tap（API 流量追踪器）

一个本地代理工具，用来拦截并可视化 Claude Code、Codex CLI、Gemini CLI 等编程代理的真实 API 流量。

核心用途：调试 AI 行为时，能直接看到底层发生了什么——

查看完整的系统提示词、对话历史、工具定义与工具调用结果
比较相邻两次请求的差异，精确定位是哪条提示、哪个参数发生了变化
每次运行生成 JSONL 日志 + 自包含 HTML 查看器，方便留存和分享
数据全留本地，无需任何托管仪表盘，常见 auth header 会自动脱敏

安装（需 Python 3.11+）：

1
2
3

uv tool install claude-tap
# 或
pip install claude-tap

GitHub 项目地址
第一次使用

1	claude-tap --tap-live

工具在当前时间点不支持 Python 3.14 这个版本，所以为了适用这工具，我还特地降级为 3.13.

常用命令

这里有一个小区别：claude --resume 是在终端里执行的启动参数，/xxx 则是在 Claude Code 会话里直接输入的命令。

claude --resume：恢复最近一次或指定会话。终端意外关闭，或者你想接着上次上下文继续做时很好用。
/rewind：把当前会话回退到前面的某个节点。刚刚几轮思路跑偏、想撤回一段操作时非常方便。
/resume / /continue：在会话里恢复已有 session，或者直接打开会话选择器继续之前的任务。
/remote-control / /rc：把当前本地会话开放给远程控制，可以在 claude.ai/code 或移动端继续接手当前任务。这个功能很适合临时离开电脑但又不想断掉上下文的场景。
claude --dangerously-skip-permissions：启动时直接跳过权限确认，不再逐次询问用户同意。适合你完全信任当前仓库、并且希望 Claude Code 自主连续执行改文件、跑命令、调用工具的场景，但风险也最高，最好只在个人项目或隔离环境里临时使用。
/compact：把当前上下文压缩成摘要后继续同一会话。上下文快满、但你又不想重新解释项目背景时特别省事。（这个OMC不支持）
/doctor：检查 Claude Code 的安装、权限和配置问题。命令失效、插件不工作、环境异常时，先跑它通常能省掉不少排查时间。

报错及Debug

PreToolUse / PostToolUse Hooks 报错

报错示例：

Read 1 file (ctrl+o to expand)
  ⎿  PreToolUse:Read hook error    ⎿  ECONNREFUSED
  ⎿  PreToolUse:Read hook error    ⎿  ECONNREFUSED
  ⎿  PostToolUse:Read hook error   ⎿  ECONNREFUSED
  ⎿  PostToolUse:Read hook error   ⎿  ECONNREFUSED

原因分析：
一些公司会通过 Hooks 的方式监控 Claude 的使用情况，它同时还是 Claude Code 的 LLM 网关 —— ANTHROPIC_BASE_URL 也指向同一个端口。

“内网AI网关工具们”监听的 Endpoint：

Endpoint	触发时机	用途（推断）
`/hook/claude`	`UserPromptSubmit` / `Stop` / `StopFailure` / `Subagent*` / `PostToolUseFailure`	通用事件流 —— 把”用户提交了 prompt”、”会话结束”、”子 agent 启停”这类生命周期事件喂给“内网AI网关工具们”
`/hook/claude/pre-tool`	`PreToolUse`	工具调用前拦截 —— “内网AI网关工具们”能在这里看到你要调什么工具、什么参数
`/hook/claude/post-tool`	`PostToolUse`	工具调用后回执 —— “内网AI网关工具们”能看到工具返回了什么

“内网AI网关工具们”拿这些数据干嘛：

会话观测 / 录制 —— “内网AI网关工具们”是 GUI，需要实时知道 Claude Code 当前在做什么，才能在它的界面里展示”当前会话、调了哪些工具、用了多少 token”。
多 agent 编排 —— 看它的扩展代码（gateway-dispatch.ts），“内网AI网关工具们”内部跑了一个 daemon，可以在 Codex / Claude / Gemini / CodeMaker 之间分发任务、做后台任务卡片。Hooks 是它统一观察这些 agent 的入口。
可能的策略干预 —— PreToolUse hook 在 Claude Code 里是有能力 block / 改写工具调用的（这是规范的能力），但具体“内网AI网关工具们”用没用没看到证据。用空 {} 打 pre-tool 它回 422，说明它确实在解析 payload。

小结：

这些 hooks 是“内网AI网关工具们”把自己 UI 接到 Claude Code 上的观测 + 控制通道，不是必需。如果你不用“内网AI网关工具们”的 GUI 看会话状态、不用它编排多 agent，可以删；但因为 ANTHROPIC_BASE_URL 也走“内网AI网关工具们”，“内网AI网关工具们”进程本身仍然必须活着。

Windows 中文环境 PowerShell 乱码

报错示例：

● Bash(powershell.exe -NoProfile -Command "...")
  ⎿  Error: Exit code 1
     ����λ�� ��:1 �ַ��: 169
     + ... ref])|Out-Null;if(.Count -eq 0){'OK: no syntax errors'}else{|%{extglo ...
     +                                                                 ~
     ������ʹ�ÿչ��Ԫ��
         + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
         + FullyQualifiedErrorId : EmptyPipeElement

原因分析：

Windows 简体中文版默认 code page 是 936 (GBK/GB2312)，但 Claude Code 内部按 UTF-8 处理字符串。PowerShell 输出中文错误信息时用 GBK 编码，Claude Code 按 UTF-8 解读 → 乱码。

这是一个非常常见的问题，几乎所有 CJK（中日韩）Windows 用户都会遇到。英文 Windows 不会触发，因为 code page 437 与 UTF-8 在 ASCII 范围内兼容。
修复方案：在 Claude Code settings 里设置环境变量

在项目级 .claude/settings.local.json 中添加：

{
  "env": {
    "CHCP": "65001"
  }
}

或全局生效，写入 ~/.claude/settings.json。

为什么不推荐改系统区域设置：

Windows 提供”Beta: 使用 Unicode UTF-8 提供全球语言支持”选项，会把系统 code page 从 936 改成 65001，效果最彻底，但属于全局不可逆改动，可能导致少数旧中文软件（GBK 硬编码的安装程序、老版压缩工具等）乱码。env CHCP=65001 只在 Claude Code 的 shell 进程里生效，系统其他部分完全不受影响。

修改后需要重启 Claude Code 会话才能生效。

参考资料

一个B站博主的学习视频

Research Notes on Mainstream AI Subscription Plans

Fri, 13 Mar 2026 19:10:00 GMT

Research Notes on Mainstream AI Subscription Plans

Last updated: 2026-03-14
As AI technology advances rapidly, AI subscription services are proliferating. This document surveys and compares the current mainstream AI subscription options to help individuals and teams find the right fit.
Provider categories:
Native AI providers: Companies that build their own models (OpenAI, Google, Anthropic, etc.)
Third-party AI providers: Platforms that aggregate multiple model sources (OpenRouter, Together AI, Replicate, etc.)

Provider Category Overview
Native AI Providers
Third-Party AI Providers
Comparison Summary

Provider Category Overview

Native AI Providers

Definition: Companies that develop their own AI models and offer direct API access.

Characteristics:

✅ Strongest model capabilities (cutting-edge technology)
✅ Mature ecosystem, rich documentation
✅ Official support, high stability
❌ Single vendor, risk of vendor lock-in
❌ Relatively higher prices (though OpenAI allows region-switching tricks for lower rates)
❌ Complex integration when using multiple vendors

Best for:

Projects requiring peak model performance
Enterprise applications with high stability requirements
Global products needing multilingual support
Teams that don’t want to rely on third-party proxies

Third-Party AI Providers (Aggregators)

Definition: Platforms that aggregate multiple AI model sources behind a single unified API.

Characteristics:

✅ Unified interface, lower integration complexity
✅ Rich model selection, flexible switching
✅ Smart routing and automatic failover
✅ Cost optimization, transparent pricing
❌ Extra middleware layer, may introduce additional latency
❌ Dependent on third-party platform stability
❌ Feature set may not be as complete as native providers

Best for:

Projects that need to connect to multiple models simultaneously
Teams wanting to reduce vendor lock-in risk
Cost-sensitive scenarios requiring flexible model switching
Rapid prototyping and testing

Native AI Providers

OpenAI

Official Websites

Overview

OpenAI is the pioneer and leader in large language models, offering the GPT series (GPT-4, GPT-3.5, etc.) and image generation models (DALL-E). It is currently the most mature AI API provider in the industry.

Core Models

Language Models

GPT-4 Turbo: Latest GPT-4, faster and cheaper, supports 128K context
GPT-4: Top-tier language model, supports 8K/32K/128K context
GPT-3.5 Turbo: Great value, fast responses, supports 16K context
GPT-4o: Multimodal model supporting text, images, and audio

Image Models

DALL-E 3: High-quality image generation
DALL-E 2: Previous-generation image generation

Other Models

Whisper: Speech recognition (multilingual)
Embeddings: Text embedding vectors
Text-to-Speech: Voice synthesis
Moderation: Content moderation

Subscription Plans

Free Tier

Price: $0/month
Credits: $5 free credit (new users)
Limits: Lower rate limits

API Pay-as-you-go

GPT-4 Turbo: $0.01 / 1K input tokens, $0.03 / 1K output tokens
GPT-4: $0.03 / 1K input tokens, $0.06 / 1K output tokens
GPT-3.5 Turbo: $0.0015 / 1K input tokens, $0.002 / 1K output tokens
DALL-E 3: $0.04 / image
Whisper: $0.006 / minute

ChatGPT Plus (Personal)

Price: $20/month
Includes:
- GPT-4 access
- DALL-E 3 image generation
- Advanced data analysis
- Browsing capability
- Priority access to new features

Team

Price: $25/user/month
Includes:
- Everything in ChatGPT Plus
- Admin console
- Team collaboration workspace
- Data isolation
- Higher rate limits

Enterprise

Price: Contact sales
Includes:
- Unlimited speed
- Priority support
- API access
- Data encryption
- Custom model fine-tuning
- Compliance certifications (SOC2, HIPAA)

Core Strengths

1. Model Capabilities

Industry-leading language models
Excellent multilingual support
Powerful code generation
Outstanding reasoning and comprehension

2. Ecosystem

API: REST API, Python/JS SDK
LangChain: Native support
Vercel AI SDK: Native support
VS Code plugins: Copilot and more
Rich documentation: Detailed API docs and examples

3. Advanced Features

Function Calling: Call external functions
Streaming: Stream responses
JSON Mode: Guaranteed JSON output
Vision: Image understanding
Fine-tuning: Custom model tuning
Assistants API: Build AI assistants

4. Enterprise Features

Azure OpenAI: Enterprise-grade deployment
Data privacy: Data not used for training (Enterprise tier)
Compliance: SOC2, HIPAA, GDPR
SLA: Enterprise service level agreements
Technical support: Dedicated support team

Best For

Applications that demand peak model performance
Enterprise apps with high stability requirements
Global products needing multilingual support
Teams wanting a complete ecosystem and toolchain
Cost-insensitive scenarios

Pros & Cons

Pros:

✅ Strongest models, industry benchmark
✅ Mature ecosystem, rich tooling
✅ Best documentation and community support
✅ Comprehensive enterprise features
✅ Continuous updates and improvements

Cons:

❌ Relatively higher prices
❌ Single vendor, lock-in risk
❌ Some features require Enterprise tier
❌ Data compliance concerns for users outside the US

Docs & Resources

Official docs: https://platform.openai.com/docs
API Reference: https://platform.openai.com/docs/api-reference
GitHub: https://github.com/openai
Community: OpenAI Developer Forum

Google Gemini

Official Websites

Overview

Google Gemini (formerly Bard) is Google’s multimodal large language model offering strong text, image, and audio understanding, with deep integration across the Google ecosystem.

Core Models

Gemini Series

Gemini Ultra: Most powerful model, multimodal
Gemini Pro: Mainstream model, balanced performance and cost
Gemini Pro Vision: Vision model
Gemini Flash: High-speed response model

Other Models

PaLM 2: Previous-generation language model
Imagen: Image generation
Codey: Code model

Subscription Plans

Free Tier

Price: $0/month
Includes:
- Gemini Pro access
- Daily usage limit
- Web interface access

AI Studio (API Pay-as-you-go)

Gemini Pro: $0.0005 / 1K input tokens, $0.0015 / 1K output tokens
Gemini Pro Vision: $0.0025 / 1K input tokens, $0.0075 / 1K output tokens
Imagen: $0.002 / image

Google One AI Premium (Personal)

Price: $19.99/month
Includes:
- Gemini Ultra access
- 2TB Google Cloud storage
- Google Workspace premium features

Enterprise

Price: Contact sales
Includes:
- Vertex AI platform access
- Custom model fine-tuning
- Data privacy protection
- Compliance certifications
- Technical support

Core Strengths

1. Multimodal Capabilities

Native multimodal (text, images, audio, video)
Cross-modal understanding and generation
Real-time video analysis

2. Google Ecosystem Integration

Google Workspace: Docs, Gmail, Sheets integration
Google Search: Real-time search capability
Google Maps: Geospatial information
YouTube: Video content understanding
Android: Mobile integration

3. Developer Experience

Vertex AI: Enterprise-grade AI platform
AI Studio: Free development environment
Google Cloud: Cloud-native deployment
Kaggle: Data science community

4. Performance Advantages

MLOps: Model deployment and monitoring
A/B Testing: Model comparison
AutoML: Automated machine learning
TPU optimization: Hardware acceleration

Best For

Projects needing Google ecosystem integration
Multimodal application development
Enterprise AI platforms
Teams that need MLOps capabilities
Existing Google Cloud users

Pros & Cons

Pros:

✅ Strong multimodal capabilities
✅ Deep Google ecosystem integration
✅ Relatively lower prices
✅ Mature enterprise AI platform
✅ Rich developer tools

Cons:

❌ Language model capabilities slightly behind GPT-4
❌ Documentation and community not as strong as OpenAI
❌ Some features still in Beta
❌ Access restricted in some regions

Docs & Resources

Official docs: https://ai.google.dev/docs
Vertex AI: https://cloud.google.com/vertex-ai
AI Studio: https://aistudio.google.com/

Anthropic Claude

Official Websites

Overview

Anthropic was founded by former OpenAI employees and focuses on AI safety and alignment. The Claude series is known for its safety, long context window, and natural conversational quality.

Core Models

Claude 3 Series

Claude 3 Opus: Most powerful, highest intelligence
Claude 3 Sonnet: Balanced model, good performance-to-cost ratio
Claude 3 Haiku: Fast model, low cost

Claude 2 Series

Claude 2.1: Long context (200K tokens)
Claude 2: Previous-generation model

Subscription Plans

Free Tier

Price: $0/month
Credits: Limited usage quota

API Pay-as-you-go

Claude 3 Opus: $15 / 1M input tokens, $75 / 1M output tokens
Claude 3 Sonnet: $3 / 1M input tokens, $15 / 1M output tokens
Claude 3 Haiku: $0.25 / 1M input tokens, $1.25 / 1M output tokens
Claude 2.1: $8 / 1M input tokens, $24 / 1M output tokens

Claude Pro (Personal)

Price: $20/month
Includes:
- Claude 3 Opus access
- Higher usage limits
- Priority access to new features

Team

Price: $30/user/month
Includes:
- Everything in Claude Pro
- Team management features
- Higher usage limits

Enterprise

Price: Contact sales
Includes:
- Custom model fine-tuning
- Data privacy protection
- Compliance certifications
- Dedicated support

Core Strengths

1. Safety and Alignment

Leading AI safety research
Constitutional AI methodology
Refusal of harmful content
Strong interpretability

2. Long Context

Claude 2.1 supports 200K tokens
Long document comprehension and summarization
Large codebase analysis

3. Natural Conversation

Highly fluent dialogue
Natural tone and voice
Ideal for chatbots
Creative writing

4. Coding Capabilities

Excellent performance on programming tasks
Code generation and debugging
Technical documentation understanding
Code refactoring suggestions

5. Developer Experience

API: REST API, Python/JS SDK
LangChain: Native support
Function Calling: Call external functions
Streaming: Stream responses
Tool Use: External tool integration

Best For

Scenarios with high safety requirements
Applications requiring long context
Coding assistant tools
Chatbots
Content moderation and compliance

Pros & Cons

Pros:

✅ Best safety and alignment properties
✅ Long context support (200K)
✅ Natural and fluent conversation
✅ Strong coding capabilities
✅ Reasonably priced

Cons:

❌ Model capabilities slightly behind GPT-4
❌ Ecosystem not as mature as OpenAI
❌ Smaller documentation and community
❌ Fewer tools and plugins

Docs & Resources

Official docs: https://docs.anthropic.com/
API Reference: https://docs.anthropic.com/claude/reference
GitHub: https://github.com/anthropics
Research papers: https://www.anthropic.com/research

Zhipu AI

Official Websites

Overview

Zhipu AI is a leading Chinese large model company that developed the GLM series, offering Chinese-optimized language models and multimodal capabilities.

Core Models

Language Models

GLM-4: New-generation LLM, capabilities benchmarked against GPT-4
GLM-3-Turbo: Fast response model
GLM-3-6B: Lightweight model

Multimodal Models

CogView: Image generation
CogVideo: Video generation
CogView3: Third-generation image model

Specialized Models

CodeGeeX: Code model
CharacterGLM: Role-playing
MedicalGLM: Healthcare

Subscription Plans

Free Tier

Price: ¥0/month
Includes:
- Basic model access
- Daily usage limit
- Online chat

API Pay-as-you-go

GLM-4: ¥0.1 / 1K input tokens, ¥0.1 / 1K output tokens
GLM-3-Turbo: ¥0.005 / 1K input tokens, ¥0.005 / 1K output tokens
CogView: ¥0.05 / image

Personal Plan

Price: ¥49/month
Includes:
- GLM-4 premium access
- Higher usage limits
- Priority responses

Enterprise

Price: Contact sales
Includes:
- Private deployment
- Model fine-tuning
- Data privacy protection
- Dedicated technical support
- Compliance certifications

Core Strengths

1. Chinese Language Optimization

Strong Chinese comprehension and generation
Deep understanding of Chinese culture
Excellent Chinese instruction-following

2. Domestic Compliance Support

Compliant with Chinese data regulations
Data stays onshore
Compatible with domestic hardware

3. Multimodal Capabilities

Text, images, and video
Cross-modal understanding
Wide application scenarios

4. Cost Advantage

Relatively lower prices
Optimized for the domestic market
Suitable for large-scale applications

5. Developer Experience

API: REST API, Python/Java SDK
LangChain: Native support
Web interface: Online debugging
Detailed docs: Chinese-language documentation

Best For

Domestic application development
Chinese-primary applications
Scenarios with strict data compliance requirements
Cost-sensitive projects
Teams requiring domestic-only deployment

Pros & Cons

Pros:

✅ Strong Chinese language capabilities
✅ Compliant with Chinese regulations
✅ Relatively lower cost
✅ Domestic deployment support
✅ Solid multimodal capabilities

Cons:

❌ Model capabilities slightly behind GPT-4
❌ Weaker English capabilities
❌ Smaller ecosystem and community
❌ Fewer tools and plugins

Docs & Resources

Official docs: https://open.bigmodel.cn/dev/api
GitHub: https://github.com/THUDM
Open-source projects: GLM-4, CodeGeeX, etc.

Baidu ERNIE Bot

Official Websites

Overview

Baidu’s ERNIE Bot (Wenxin Yiyan) is a large language model based on Baidu’s ERNIE series, with deep integration across the Baidu ecosystem.

Core Models

ERNIE Series

ERNIE 4.0: Latest version, multimodal
ERNIE 3.5: Mainstream version
ERNIE 4.0 Turbo: Fast version

Specialized Models

ERNIE Bot: Dialogue model
ERNIE Speed: High-speed response
ERNIE Lite: Lightweight

Subscription Plans

Free Tier

Price: ¥0/month
Includes:
- Basic dialogue
- Daily usage limit

API Pay-as-you-go

ERNIE 4.0: ¥0.12 / 1K tokens
ERNIE 3.5: ¥0.008 / 1K tokens
ERNIE Speed: ¥0.004 / 1K tokens

VIP Membership

Price: ¥49/month
Includes:
- ERNIE 4.0 access
- Higher usage limits
- Exclusive features

Enterprise

Price: Contact sales
Includes:
- Baidu Cloud integration
- Private deployment
- Model fine-tuning
- Dedicated support

Core Strengths

1. Baidu Ecosystem Integration

Baidu Search: Real-time search
Baidu Maps: Geospatial information
Baidu Baike: Knowledge base
Baidu Netdisk: Cloud storage

2. Chinese Language Optimization

Strong Chinese capabilities
Chinese cultural understanding
Rich Chinese knowledge base

3. Enterprise Features

Baidu Cloud: Cloud-native deployment
Intelligent Cloud: AI development platform
Compliance: Meets domestic requirements
Technical support: 24/7 support

4. Developer Tools

Qianfan Platform: AI development platform
ModelArts: Model training
AppBuilder: Application building

Best For

Baidu ecosystem integration
Domestic enterprise applications
Teams needing Baidu Cloud services
High compliance requirements
SMBs looking for rapid deployment

Pros & Cons

Pros:

✅ Deep Baidu ecosystem integration
✅ Strong Chinese capabilities
✅ Comprehensive enterprise features
✅ Good Baidu Cloud support
✅ Relatively lower prices

Cons:

❌ Average model capabilities
❌ Weaker English capabilities
❌ Relatively closed ecosystem
❌ Fewer developer tools

Docs & Resources

Official docs: https://cloud.baidu.com/doc/WENXINWORKSHOP/
Qianfan Platform: https://console.bce.baidu.com/qianfan/

Alibaba Cloud Qwen

Official Websites

Overview

Alibaba Cloud’s Qwen (Tongyi Qianwen) is a series of large language models ranging from lightweight to ultra-large scale, offering diverse model choices.

Core Models

Qwen Series

Qwen-Max: Most powerful model
Qwen-Plus: Mainstream model
Qwen-Turbo: Fast response
Qwen-Long: Long context

Open-Source Models

Qwen-72B: Open-source large-scale
Qwen-14B: Open-source mid-scale
Qwen-7B: Open-source small-scale

Specialized Models

Qwen-VL: Vision-language model
Qwen-Audio: Audio model
CodeQwen: Code model

Subscription Plans

Free Tier

Price: ¥0/month
Includes:
- Basic model access
- Daily usage limit

API Pay-as-you-go

Qwen-Max: ¥0.04 / 1K tokens
Qwen-Plus: ¥0.008 / 1K tokens
Qwen-Turbo: ¥0.003 / 1K tokens

Enterprise

Price: Contact sales
Includes:
- Deep Alibaba Cloud integration
- Private deployment
- Model fine-tuning
- SLA guarantees

Core Strengths

1. Alibaba Cloud Ecosystem

Alibaba Cloud ECS: Cloud servers
OSS: Object storage
RDS: Database
Function Compute: Serverless

2. Rich Model Selection

Multiple scales of models
Downloadable open-source models
Long context support

3. Chinese Language Optimization

Strong Chinese capabilities
Integration with Alibaba products
Optimized for e-commerce scenarios

4. Developer Tools

DashScope: AI development platform
ModelScope: Model community
PAI: Machine learning platform

Best For

Existing Alibaba Cloud users
E-commerce applications
Enterprise-grade applications
Teams that prefer open-source models
Cost-sensitive projects

Pros & Cons

Pros:

✅ Deep Alibaba Cloud integration
✅ Rich model selection
✅ Downloadable open-source models
✅ Strong Chinese capabilities
✅ Lower prices

Cons:

❌ Average model capabilities
❌ Weaker English capabilities
❌ Relatively closed ecosystem
❌ Toolchain not fully mature

Docs & Resources

Official docs: https://help.aliyun.com/zh/dashscope/
ModelScope: https://modelscope.cn/

ByteDance Doubao

Official Websites

Overview

ByteDance’s Doubao is an AI assistant and multimodal model platform offering dialogue, image generation, voice, and other capabilities.

Core Models

Doubao Series

Doubao-Pro: Professional version
Doubao-Lite: Lightweight version
Doubao-Character: Role-playing

Vision Models

Skylark: Image generation
Skylark-2: Second-generation images

Other Models

Voice models: Speech synthesis
Video models: Video generation

Subscription Plans

Free Tier

Price: ¥0/month
Includes:
- Basic dialogue
- Daily usage limit

API Pay-as-you-go

Doubao-Pro: ¥0.008 / 1K tokens
Doubao-Lite: ¥0.001 / 1K tokens
Skylark: ¥0.05 / image

Enterprise

Price: Contact sales
Includes:
- Volcano Engine integration
- Private deployment
- Dedicated support

Core Strengths

1. ByteDance Ecosystem

Douyin (TikTok): Short video integration
Toutiao: News and content
Feishu (Lark): Workplace collaboration
Volcano Engine: Cloud services

2. Multimodal

Text, images, audio, and video
Cross-modal understanding
Creative content generation

3. Cost Advantage

Lower prices
Suitable for large-scale applications
Generous free quota

4. Developer Tools

Volcano Engine: Development platform
Open Platform: API services
Detailed docs: Chinese-language documentation

Best For

ByteDance ecosystem integration
Multimodal applications
Content creation
Cost-sensitive projects
Consumer-facing (C-end) applications

Pros & Cons

Pros:

✅ ByteDance ecosystem integration
✅ Strong multimodal capabilities
✅ Lower prices
✅ Generous free quota
✅ Suitable for consumer apps

Cons:

❌ Average model capabilities
❌ Fewer enterprise features
❌ Toolchain not fully mature
❌ Relatively closed ecosystem

Docs & Resources

Volcano Engine: https://platform.volcengine.com/
Open Platform: https://open.volcengine.com/

Moonshot Kimi

Official Website

https://www.moonshot.cn/

Overview

Moonshot AI’s Kimi is known for its ultra-long context window — supporting up to 2 million tokens — making it ideal for long document analysis and summarization.

Core Models

Kimi Series

moonshot-v1-128k: 128K context
moonshot-v1-32k: 32K context
moonshot-v1-8k: 8K context

Subscription Plans

Free Tier

Price: ¥0/month
Includes:
- Basic dialogue
- 20 files/day

Pro

Price: ¥68/month
Includes:
- 128K context
- Unlimited file uploads
- Higher usage limits

Enterprise

Price: Contact sales
Includes:
- API access
- Private deployment
- Dedicated support

Core Strengths

1. Ultra-Long Context

2 million token context window
Long document analysis
Large codebase comprehension

2. File Processing

Supports multiple formats
PDF, Word, Excel, and more
Document summarization and analysis

3. Chinese Language Optimization

Strong Chinese capabilities
Chinese document processing
Chinese cultural understanding

4. User Experience

Clean interface
Easy to use
Suited for individual users

Best For

Long document analysis
Codebase comprehension
Research and academic work
Personal knowledge management
Document summarization

Pros & Cons

Pros:

✅ Ultra-long context (2 million tokens)
✅ Strong file processing
✅ Good Chinese language optimization
✅ Pleasant user experience
✅ Great for personal use

Cons:

❌ Average model capabilities
❌ Relatively narrow feature set
❌ Fewer enterprise features
❌ Toolchain not fully mature

Docs & Resources

Official website: https://www.moonshot.cn/
Usage docs: Available on the official website

GitHub Copilot (Microsoft)

Official Websites

Overview

GitHub Copilot is Microsoft’s AI coding assistant that aggregates multiple leading large language models — OpenAI, Anthropic Claude, Google Gemini, and others — with deep integration into VS Code, Visual Studio, JetBrains, and other IDEs. It provides code completion, chat, Agent mode, code review, and comprehensive coding assistance. For developers, Copilot Pro is one of the best-value AI coding subscriptions available today.

One important billing quirk: GitHub Copilot resets usage quotas on the 1st of each month — code completions and Premium request allowances both reset on the 1st. This means subscribing around the 15th lets you get roughly two months of quota for a single month’s payment (if you don’t subscribe continuously, you effectively get 15 full days each month). In practice, I stopped my subscription on the 7th, then re-subscribed on the 15th (same account — interestingly I was only charged $7 that time, and I haven’t fully figured out the billing logic). My Premium request quota for that month remained at whatever was left over from before the 7th. So I’d suggest alternating between two accounts — essentially getting $20 worth of value out of $10, which is now roughly on par with the Claude Pro $20 tier.

Core Models

Copilot supports multiple AI models and you can switch freely within the IDE:

Claude Sonnet 4.6 / Claude 3.7 Sonnet: Anthropic’s strong coding models
Claude Opus 4.5: Anthropic’s most powerful reasoning model
GPT-4.1 / GPT-5 mini: OpenAI’s latest models
Gemini 2.5 Pro / Gemini 3.1 Pro: Google’s high-performance models
o3-mini / o1-mini: Reasoning-enhanced models

Subscription Plans

Free

Price: $0/month
Includes:
- 2,000 code completions/month
- 50 Premium requests/month
- Basic chat functionality
- VS Code and other IDE support

Pro (Personal Professional)

Price: $10/month or $100/year
Includes:
- Unlimited code completions
- 300 Premium requests/month
- Multi-model switching (Claude, Gemini, GPT, etc.)
- Agent mode (autonomous multi-step coding tasks)
- Code Review
- Copilot CLI (command-line assistant)
- Copilot Chat (conversational coding assistance)

Business (Team)

Price: $19/user/month
Includes:
- Everything in Pro
- Organization management dashboard
- Knowledge base integration (index org codebase)
- Custom model selection policies
- IP indemnification
- SAML SSO

Enterprise

Price: $39/user/month
Includes:
- Everything in Business
- Requires GitHub Enterprise Cloud ($21/user/month)
- Custom model fine-tuning
- Pull Request summaries
- Security vulnerability fix suggestions
- Advanced security compliance features

Core Strengths

1. Deep IDE Integration

VS Code: Best experience, native integration
Visual Studio: Seamless within the Microsoft ecosystem
JetBrains suite: IntelliJ, PyCharm, and more
Neovim / Vim: Terminal-friendly
Xcode: Apple ecosystem support

2. Free Multi-Model Switching

Supports OpenAI, Anthropic, Google, and more
One-click model switching within the IDE
Choose the best model for each task
Premium requests consumed based on model complexity

3. Agent Mode

Autonomously understands requirements and executes multi-step coding tasks
Auto-reads files, edits code, runs terminal commands
End-to-end automated coding experience

4. Coding-Scenario Optimization

Code completion: Real-time context-aware completion
Code generation: Generate code from natural language
Code explanation: Understand complex code logic
Bug fixing: Intelligently locate and fix issues
Code refactoring: Optimize code structure and quality
Unit testing: Auto-generate test cases

Best For

Everyday coding development (top recommendation)
Code review and refactoring
Learning new languages and frameworks
Rapid prototyping
Team collaborative coding

Pros & Cons

Pros:

✅ Extremely competitive price ($10/month, outstanding value)
✅ Multi-model support with free switching
✅ Best-in-class IDE integration experience
✅ Agent mode for strong automation
✅ Deep Microsoft ecosystem integration
✅ Free for students

Cons:

❌ Primarily focused on coding; general conversation is limited
❌ Monthly cap on Premium requests
❌ Tied to the GitHub ecosystem
❌ Enterprise tier is pricey

Personal Take

For developers, GitHub Copilot Pro is one of the most worthwhile AI subscriptions out there. $10/month gets you unlimited code completions plus 300 multi-model Premium requests — just the right amount, not too little and nothing wasted. Compared to subscribing separately to ChatGPT Plus ($20) or Claude Pro ($20), Copilot Pro delivers clearly better value, and since you use it directly inside the IDE, your workflow stays seamless.

Docs & Resources

Official docs: https://docs.github.com/en/copilot
Model list: https://docs.github.com/copilot/reference/ai-models/supported-models
GitHub Blog: https://github.blog

Third-Party AI Providers

OpenRouter

Official Website

https://openrouter.ai/

Overview

OpenRouter is a unified AI model API gateway that provides a single interface for accessing hundreds of AI models. Its core idea is to let developers connect to multiple AI providers through one API, simplifying integration work.

Core Strengths

1. Model Ecosystem (300+ Models)

Large language models: GPT-4, Claude, DeepSeek, GLM, Llama, Mistral, and more
Image models: DALL-E, Stable Diffusion, Midjourney, and more
Multimodal: Vision models, audio, video

2. Smart Routing System

Model Fallbacks: Automatic failover
Provider Routing: Intelligent routing selection
Auto Router: Automatically select the best model (powered by NotDiamond)
Optimization by cost, performance, or reliability

3. Advanced Features

Multimodal Support

Image Inputs: Send images to vision models
Image Generation: Generate images
PDF Inputs: Process PDF documents
Audio: Voice input/output
Video Inputs: Video processing

Enhancement Features

Zero Data Retention (ZDR): No data retained
Structured Outputs: JSON Schema validation
Web Search: Real-time web search
Prompt Caching: Cache prompts to reduce costs
Response Healing: Auto-fix malformed responses
Zero Completion Insurance: No charge for failed responses

4. Model Variants

:free - Free model variant
:extended - Extended context window
:exacto - Prioritizes tool call quality
:thinking - Extended reasoning
:online - Real-time web search
:nitro - High-speed inference

5. Developer Experience

SDKs & Frameworks

Official SDKs: TypeScript, Python
Compatible with: OpenAI SDK, Anthropic Agent SDK
Frameworks: LangChain, Vercel AI SDK, PydanticAI, TanStack AI
Tools: Zapier, Infisical, LiveKit

Integration Tools

BYOK (Bring Your Own Key): Use your own API keys
Guardrails: Data policies and model access restrictions
Broadcast: Integrates with Langfuse, Datadog, Braintrust, and more

6. Management Features

Organization Management: Team collaboration and API key management
App Attribution: Application attribution and ranking
Activity Export: Usage data export
Crypto API: Cryptocurrency payment support

Pricing

Billing: Per token
Transparent pricing: Clear pricing per model
Cost optimization: Smart routing reduces costs
Free models: Some models available in :free variant

Best For

Projects connecting to multiple AI models simultaneously
Enterprise apps requiring high availability and failover
Developers looking to reduce migration costs
Scenarios requiring flexible model switching
A/B testing across different models

Pros & Cons

Pros:

✅ Unified interface, lower integration complexity
✅ Hundreds of models, rich selection
✅ Smart routing and failover
✅ Advanced features (caching, structured outputs, etc.)
✅ Compatible with major SDKs, low learning curve
✅ Active community and ecosystem

Cons:

❌ Extra middleware layer, possible additional latency
❌ Dependent on OpenRouter’s service stability
❌ Some advanced features may cost extra

Docs & Resources

Official docs: https://openrouter.ai/docs
GitHub: https://github.com/openrouter
Community projects: Awesome OpenRouter

Together AI

Official Website

https://www.together.ai/

Overview

Together AI is an AI infrastructure provider offering hosted inference for open-source models, along with custom model training and deployment services.

Core Strengths

1. Open-Source Model Hosting

Llama series: Llama 3, Llama 2, and more
Mistral series: Mistral, Mixtral, and more
Other open-source models: Falcon, Vicuna, and more
Regular updates with the latest open-source models

2. High-Performance Inference

GPU optimization: Optimized for specific GPUs
Flash Attention: Accelerated inference
Low latency: Optimized inference engine
High throughput: Supports large-scale concurrency

3. Custom Models

Model fine-tuning: Fine-tuning service
Custom training: Train on your own data
Model evaluation: Model performance benchmarking tools
Model deployment: One-click deployment

4. Developer Tools

Python SDK: Full Python client
OpenAI-compatible: Works with the OpenAI SDK
Monitoring and analytics: Usage tracking
Cost management: Detailed cost analysis

5. Enterprise Features

Private deployment: Support for private cloud
Data privacy: GDPR compliant
SLA guarantees: Enterprise service levels
Technical support: Professional team support

Pricing

Billing: Per token
Transparent pricing: Open-source model prices generally lower than proprietary
Volume discounts: Discounts for high usage
Reserved instances: Reserve capacity for long-term use

Best For

Developers who prefer open-source models
Teams needing custom model training
Cost-sensitive large-scale applications
Enterprises requiring private deployment

Pros & Cons

Pros:

✅ Rich open-source model ecosystem
✅ Well-optimized performance, fast
✅ Supports custom model training
✅ Strong openness and control
✅ Relatively lower costs

Cons:

❌ Does not include proprietary models like GPT or Claude
❌ Model capabilities may not match proprietary models
❌ Limited multimodal support

Docs & Resources

Website: https://www.together.ai/
Docs: https://docs.together.ai/
GitHub: https://github.com/togethercomputer

Replicate

Official Website

https://replicate.com/

Overview

Replicate is an AI model hosting platform that makes it easy for developers to run open-source AI models — including large language models, image generation, audio processing, and more.

Core Strengths

1. Rich Model Library

Language models: Llama, Mistral, Falcon, and more
Image generation: Stable Diffusion series
Image processing: Super-resolution, inpainting, style transfer, and more
Audio processing: Speech synthesis, recognition, and more
Video generation: Video synthesis and editing
Other models: OCR, NLP, and more

2. Ease of Use

Simple API: Clean REST API
Python SDK: Python client
Web Playground: Test models online
Rich examples: Extensive usage examples

3. Custom Models

Upload models: Upload your own models
Docker support: Docker-based model deployment
Cog API: Performance-optimized Cog API
Version control: Model versioning

4. Community Ecosystem

Model sharing: Community model library
Fork models: Build on others’ models
Open-source friendly: Large open-source model collection

5. Developer Experience

Live preview: Preview model output online
Debugging tools: Convenient debugging and optimization
Monitoring dashboard: Usage and cost monitoring
Webhooks: Async task callbacks

Pricing

Billing: By compute time
Transparent pricing: Clear hourly cost
Free credits: Free credits for new users
Pay-as-you-go: Flexible billing

Best For

Rapid prototyping
Testing different models
Small-scale applications
Teams needing diverse model types
Open-source model enthusiasts

Pros & Cons

Pros:

✅ Very rich model library
✅ Easy to use, quick to get started
✅ Supports custom models
✅ Active community
✅ Relatively affordable

Cons:

❌ Does not include proprietary models (GPT, Claude)
❌ Performance may not match dedicated services
❌ Limited enterprise features
❌ Multimodal integration requires manual handling

Docs & Resources

Website: https://replicate.com/
Docs: https://replicate.com/docs
GitHub: https://github.com/replicate

Fireworks.ai

Official Website

https://fireworks.ai/

Overview

Fireworks.ai is a high-performance AI inference platform focused on delivering fast, low-cost AI model inference.

Core Strengths

1. High-Performance Inference

Ultra-fast inference: Industry-leading inference speed
Low latency: Optimized inference engine
High throughput: Supports large-scale concurrency
GPU optimization: Deep hardware-level optimization

2. Model Ecosystem

Open-source models: Llama, Mistral, and more
Optimized models: Fireworks-optimized model variants
Custom models: Support for custom model deployment
Multimodal: Text, images, and more

3. Cost Advantage

Transparent pricing: Clear billing
Pay-as-you-go: Flexible billing model
Volume discounts: Discounts for high usage
Reserved instances: Lower costs for long-term use

4. Developer Experience

OpenAI-compatible: Works with the OpenAI SDK
Python SDK: Full Python client
REST API: Standard REST interface
Monitoring tools: Usage tracking

5. Enterprise Features

Private deployment: Private cloud support
Data security: Enterprise-grade security
SLA guarantees: Service level agreements
Technical support: Professional support team

Technical Highlights

Flash Attention: Accelerated attention computation
KV Cache: Optimized caching mechanism
Quantization: Model quantization to reduce costs
Distributed inference: Distributed deployment support

Pricing

Billing: Per token
Cost advantage: Competitive pricing compared to other providers
Flexible billing: Multiple billing modes supported

Best For

Performance-demanding applications
Cost-sensitive large-scale applications
Scenarios requiring low latency
Projects preferring open-source models

Pros & Cons

Pros:

✅ Extremely fast inference speed
✅ Clear cost advantage
✅ OpenAI-compatible, low migration cost
✅ Well-optimized performance
✅ Enterprise-grade features

Cons:

❌ Relatively fewer models
❌ Does not include proprietary models
❌ Limited multimodal support
❌ Smaller community ecosystem

Docs & Resources

Website: https://fireworks.ai/
Docs: https://fireworks.ai/docs

Hugging Face Inference

Official Website

https://huggingface.co/

Overview

Hugging Face is the largest open-source model community, offering model hosting, inference services, datasets, and more. Hugging Face Inference is its inference API service.

Core Strengths

1. Model Ecosystem (Largest)

Massive model library: Tens of thousands of models
Language models: Llama, Mistral, BERT, T5, and more
Image models: Stable Diffusion, ViT, and more
Audio models: Whisper, AudioLDM, and more
Multimodal: All kinds of multimodal models

2. Community-Driven

Open-source ecosystem: Largest open-source model community
Model sharing: Users can share their models
Collaborative development: Community-driven model improvements
Rich resources: Tutorials, docs, and examples galore

3. Inference Services

Serverless API: Serverless inference
Inference Endpoints: Dedicated inference endpoints
Private deployment: Private cloud support
GPU acceleration: GPU-accelerated inference

4. Developer Tools

Python SDK: The transformers library
JavaScript SDK: Browser support
API clients: Clients for multiple languages
Web UI: Online testing and demos

5. Enterprise Features

Inference Endpoints: Enterprise-grade inference endpoints
Data security: GDPR compliant
SLA guarantees: Service level agreements
Private repositories: Private model repositories

Pricing

Serverless: Pay per usage
Inference Endpoints: Hourly billing (monthly/annual)
Free tier: Free usage available
Enterprise pricing: Customized enterprise plans

Best For

Teams needing specific open-source models
Open-source model enthusiasts
Research and experimentation
Projects requiring diverse model choices
Open-source initiatives

Pros & Cons

Pros:

✅ Most models of any platform
✅ Richest community ecosystem
✅ Open-source friendly
✅ Rich documentation and tutorials
✅ Supports virtually all open-source models

Cons:

❌ Does not include proprietary models (GPT, Claude)
❌ Performance may not match dedicated providers
❌ Enterprise-grade features require extra payment
❌ Inference speed may be slower

Docs & Resources

Website: https://huggingface.co/
Docs: https://huggingface.co/docs/api-inference
GitHub: https://github.com/huggingface

SiliconFlow

Official Website

https://siliconflow.cn/

Overview

SiliconFlow is a Chinese company aiming to become a leading global AI capability provider. It offers multimodal model capabilities spanning language, speech, images, and video, aggregating both domestic and international model sources.

Core Strengths

1. Full-Scenario Product Matrix (Multimodal Aggregation)

Language models: DeepSeek-R1, DeepSeek-V3, QwQ-32B, GLM-4-9B-Chat, and more
Voice models: CosyVoice2-0.5B
Image models: Kolors
Video models: HunyuanVideo-HD, Wan2.1-I2V-14B-720P, Wan2.1-T2V-14B, and more

2. Performance Optimization

High-speed inference: Language model speed improved by 10x+
Low latency: Voice generation latency as low as 100ms
Deep optimization for domestic Chinese models

3. Cost Advantage

Image generation cost savings of 66%
Language model cost savings of 46%
Hosting cost reduction for customers of 52%

4. Enterprise-Grade Features

High Stability

Developer-validated high reliability
Comprehensive monitoring and fault-tolerance
Enterprise-grade professional technical support

High Security

BYOC deployment: Protect data privacy
Compute/network/storage isolation: Comprehensive security
Meets industry standards and compliance requirements
Supports domestic-only deployment

High Scalability

Dynamic scaling to support elastic workloads
One-click custom model deployment
Hybrid cloud deployment support

5. Intelligent Capabilities

Smart scaling for flexible business growth
Intelligent cost analysis for budget control
Access to multiple advanced model services

Technical Advantages

Deep optimization for domestic Chinese LLMs (DeepSeek, GLM, etc.)
Comprehensive multimodal capabilities
Enterprise deployment solutions
Compliant with Chinese data regulations
Localized service support

Pricing

Billing: Per token or per call
Cost advantage: Significant savings compared to overseas providers
Flexible plans: Multiple pricing options available

Best For

Domestic enterprises using Chinese large models
Multimodal AI application development
Scenarios with strict data security and compliance requirements
Cost-sensitive projects
Enterprise-grade deployment scenarios

Pros & Cons

Pros:

✅ Clear cost advantage
✅ Comprehensive multimodal capabilities
✅ Well-optimized for Chinese domestic models
✅ Compliant with Chinese regulations
✅ Localized service support
✅ Comprehensive enterprise features

Cons:

❌ International model coverage not as broad as OpenRouter
❌ Documentation and community relatively new
❌ Lower degree of internationalization

Docs & Resources

Website: https://siliconflow.cn/
API docs: https://docs.siliconflow.cn/

Comparison Summary

Native Providers vs Third-Party Providers

Feature	Native Providers	Third-Party Providers
Model capability	Strongest	Depends on upstream
Model variety	Single vendor	Rich selection
Unified interface	Per vendor	✅ Unified interface
Smart routing	❌	✅
Failover	❌	✅
Integration complexity	High (multi-vendor)	Low
Vendor lock-in	High	Low
Latency	Low	Slightly higher
Stability	High	Platform-dependent
Cost	Higher	More optimization room
Ecosystem	Mature but closed	Open
Enterprise features	Comprehensive	Partial support
Compliance	Needs verification	Mixed

Quick Comparison Table (All Providers)

Feature	OpenAI	Google	Claude	Zhipu	Baidu	Alibaba	Doubao	Kimi	GitHub Copilot	OpenRouter	Together	Replicate	Fireworks	HF	SiliconFlow
Type	Native	Native	Native	Native	Native	Native	Native	Native	Coding tool	Third-party	Third-party	Third-party	Third-party	Third-party	Third-party
Model capability	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Model variety	1	1	1	1	1	1	1	1	Multi-provider	300+	50+	Thousands	20+	Tens of thousands	Multiple
Chinese	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
Multimodal	✅	✅	Partial	✅	Partial	Partial	✅	❌	❌	✅	Partial	✅	Partial	✅	✅
Smart routing	❌	❌	❌	❌	❌	❌	❌	❌	❌	✅	❌	❌	❌	❌	Partial
Cost	High	Medium	Medium-high	Medium	Medium	Medium	Low	Medium	Extremely low	Medium	Low	Low	Low	Medium	Low
Enterprise features	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐
Documentation	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Community	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐⭐	⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐
Compliance	Low	Low	Low	High	High	High	High	High	Medium	Medium	Medium	Medium	Medium	Medium	High

Recommendations

Choose a native provider if:

OpenAI

You need peak model performance
Enterprise-grade apps with high stability requirements
Global products needing multilingual support
You don’t want to rely on third parties
Cost is not a primary concern

Google Gemini

You need Google ecosystem integration
Multimodal application development
You’re on Google Cloud
You need MLOps capabilities

Anthropic Claude

High safety requirements
You need long context (200K)
Coding assistant tools
Chatbots

Zhipu AI

Domestic Chinese application development
Chinese-primary applications
Strict compliance requirements
Cost-sensitive

Baidu ERNIE Bot

Baidu ecosystem integration
Need Baidu Cloud services
SMB rapid deployment

Alibaba Cloud Qwen

Existing Alibaba Cloud users
E-commerce applications
Open-source model preference

ByteDance Doubao

ByteDance ecosystem integration
Multimodal applications
Consumer-facing apps
Cost-sensitive

Moonshot Kimi

Long document analysis
Research and academic work
Personal knowledge management

GitHub Copilot

Everyday coding development (strongly recommended)
Coding scenarios needing multi-model switching
Limited budget but need high-quality AI assistance
Seamless in-IDE use without switching between browser and editor

Choose a third-party provider if:

OpenRouter

You need to connect to multiple models at once
You want smart routing and failover
Reducing vendor lock-in risk
You need A/B testing

Together AI

You prefer open-source models
You need custom model training
Cost-sensitive large-scale applications

Replicate

Rapid prototyping
Testing different models
Small-scale applications
Open-source model enthusiasts

Fireworks.ai

Extremely high performance requirements
Cost-sensitive large-scale applications
Low latency requirements

Hugging Face

Specific open-source models
Research and experimentation
Community-driven development

SiliconFlow

Domestic enterprises
Multimodal applications
Strict compliance requirements
Cost-sensitive

Best Practices

1. Hybrid Strategy

Core features → Native provider (stability, capability)
Cost optimization → Third-party open-source models
Compliance requirements → Locally compliant provider
A/B testing → Third-party aggregation platform

2. Avoiding Vendor Lock-In

Use an abstraction layer to wrap the API
Design swappable model selection strategies
Maintain multi-provider backup plans

3. Cost Optimization

Use caching to reduce repeated requests
Choose models based on task complexity
Monitor usage and costs
Take advantage of free quotas

4. Monitoring and Observability

Track model performance metrics
Monitor usage and costs
Set up alerting mechanisms
Use platform analytics tools

Learning Resources

Native Providers

OpenAI: https://platform.openai.com/docs
Google AI: https://ai.google.dev/docs
Anthropic: https://docs.anthropic.com/
Zhipu AI: https://open.bigmodel.cn/dev/api
Baidu: https://cloud.baidu.com/doc/WENXINWORKSHOP/
Alibaba Cloud: https://help.aliyun.com/zh/dashscope/
ByteDance: https://platform.volcengine.com/
Kimi: https://www.moonshot.cn/
GitHub Copilot: https://docs.github.com/en/copilot

Third-Party Providers

OpenRouter: https://openrouter.ai/docs
Together AI: https://docs.together.ai/
Replicate: https://replicate.com/docs
Fireworks.ai: https://fireworks.ai/docs
Hugging Face: https://huggingface.co/docs/api-inference
SiliconFlow: https://docs.siliconflow.cn/

Search Keywords

AI subscription plan comparison
LLM API pricing
OpenAI vs Claude vs Google
third-party AI provider
Chinese AI model comparison
AI API aggregation platform
OpenRouter tutorial
AI inference platform

Future Updates

This document will be updated continuously to track the latest developments and pricing changes from AI providers. I recommend checking each provider’s official announcements and changelogs regularly.

Update plan:

Update pricing information
Add new models and services
Supplement with real-world use cases
Add performance benchmark data
Update compliance and privacy policies

This document is based on information as of March 2026. AI providers change rapidly — always refer to official sources for the latest information.

Eugene's Page

UE“反射”概念：

回退操作 Command 模式（轻量级）：**

UE智能指针对比表

关键区别说明

ECS 架构是什么？和传统 OOP 有什么区别？

堆Stack 栈heap

Function Calling 的原理是什么？你在项目中怎么用的？

RAG 是什么？你是怎么实现的？

ControlNet 是什么？它解决了什么问题？

LoRA 是什么？为什么它很受欢迎？

MVC、MVP、MVVM 的区别是什么？

GPU 渲染流水线的完整阶段？

UE “Reflection” Concept

Undo/Redo — Command Pattern (Lightweight)

UE Smart Pointer Comparison

Key Distinctions

What is ECS Architecture? How Does It Differ from Traditional OOP?

Stack vs. Heap

What Is Function Calling and How Have You Used It in Projects?

What Is RAG and How Did You Implement It?

What Is ControlNet and What Problem Does It Solve?

What Is LoRA and Why Is It So Popular?

What Is the Difference Between MVC, MVP, and MVVM?

What Are the Complete Stages of the GPU Rendering Pipeline?

Obsidian 学习路径与功能笔记

Obsidian 学习路径与功能笔记

0. 为什么是 Obsidian

1. 我自己的文件目录路径

1. 学习路径总览（建议按周推进）

2. W1：基础——把”骨架”立起来

2.1 核心概念

2.2 必须背下来的快捷键

2.3 W1 练习

3. W2：组织——确立结构与模板

3.1 文件夹策略（与本仓库现状对齐）

3.2 三种主流方法论（任选其一即可，别全上）

3.3 必装核心插件（自带）

4. W3：进阶——让笔记自己动起来

4.1 必装社区插件（短列表，不要贪多）

4.2 Dataview 入门示例

4.3 Templater 模板示例

5. W4：工作流——把 Obsidian 嵌进现有管线

5.1 与 Hexo 兼容的 frontmatter

5.2 与 Git 联动

5.3 与博客主题（hexo-theme-magnetic）的注意事项

6. 进阶专题（按需展开）

6.1 Canvas（白板）

6.2 Sync 方案对比

6.3 移动端

7. 路径布置建议（针对本仓库）

方案 A：把 notes/ 单独作为 Vault（推荐）

方案 B：保持仓库根作为 Vault

.gitignore 建议（任一方案都加）

8. 插件/流程

Image Auto Upload

Obsidian CLI + Claudian

配套 Claude Code Skills（kepano/obsidian-skills）

Advanced Canvas 插件 + json-canvas

9. 参考资源

Houdini MCP Project Comparison

Houdini MCP Project Comparison: capoomgit/houdini-mcp vs healkeiser/fxhoudinimcp

Introduction

Overview Comparison

Architecture Comparison

houdini-mcp (capoomgit)

fxhoudinimcp (healkeiser)

Architecture Analysis

Feature Coverage Comparison

houdini-mcp Feature Set

fxhoudinimcp Feature Set (19 categories, 168 tools)

Installation and Configuration Comparison

houdini-mcp Installation Steps

fxhoudinimcp Installation Steps

Installation Experience Comparison

Client Support Comparison

Exclusive Features

Exclusive to houdini-mcp

Exclusive to fxhoudinimcp

Recommendations by Use Case

方案 A：把 `notes/` 单独作为 Vault（推荐）

`.gitignore` 建议（任一方案都加）