COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL distills human traces into versioned, inspectable AI skill packages rather than opaque prompts.

How can we systematically distill human expertise and interaction style into inspectable, governed AI skills without attempting to replicate the person themselves?

LLM agents often lack the specific expertise, judgment, and interaction style of human teammates, yet existing methods for capturing these traits rely on hidden memories or monolithic, uneditable prompts. COLLEAGUE.SKILL is an automated pipeline that distills heterogeneous human traces—such as chat logs, documents, and code reviews—into structured, versioned skill packages. These packages separate expert capabilities from behavioral constraints, allowing users to inspect, correct, and install them across multiple agent hosts. The system supports over 215 community-contributed skills and has garnered more than 100,000 cumulative stars on its public gallery, demonstrating a viable distribution surface for person-grounded artifacts.

Paper Primer

The system treats person-grounded knowledge as an artifact construction problem. It parses raw data into two distinct tracks: a capability track for decision heuristics and mental models, and a behavior track for communication style and interaction rules.

The core mechanism is a dual-representation writer: it renders these tracks into independent Markdown files (`work.md` and `persona.md`) that can be invoked separately or combined into a single `SKILL.md` entry point. This modularity allows users to apply expert judgment without necessarily adopting a specific persona, or vice versa.
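The separation can be illustrated with a minimal Python sketch. The file names `work.md`, `persona.md`, and `SKILL.md` come from the text; the helpers `write_tracks` and `compose_skill`, the `mode` values, and the section headings are hypothetical illustrations, not the project's actual writer.

```python
import tempfile
from pathlib import Path


def write_tracks(root: Path, work: str, persona: str) -> None:
    """Write the capability and behavior tracks as independent Markdown files."""
    (root / "work.md").write_text(work, encoding="utf-8")
    (root / "persona.md").write_text(persona, encoding="utf-8")


def compose_skill(root: Path, mode: str = "full") -> str:
    """Assemble SKILL.md text from one or both tracks (hypothetical helper).

    mode: "full" (both tracks), "capability" (work.md only),
          or "behavior" (persona.md only).
    """
    if mode not in ("full", "capability", "behavior"):
        raise ValueError(f"unknown mode: {mode}")
    sections = []
    if mode in ("full", "capability"):
        sections.append("## Capability\n" + (root / "work.md").read_text(encoding="utf-8"))
    if mode in ("full", "behavior"):
        sections.append("## Behavior\n" + (root / "persona.md").read_text(encoding="utf-8"))
    return "# Skill\n\n" + "\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_tracks(root, "- Block merges that lack tests.\n", "- Keep replies short and direct.\n")
    skill = compose_skill(root, "full")
    (root / "SKILL.md").write_text(skill, encoding="utf-8")
    print(compose_skill(root, "capability"))
```

Because the tracks live in separate files, a capability-only or behavior-only variant is just a different selection over the same artifacts, with no re-extraction.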

The system enables iterative, natural-language correction of agent behavior.

The pipeline supports a correction handler that accepts feedback (e.g., "he would not say that"), generates a patch, archives the previous version, and updates the skill package while preserving the audit trail. The workflow supports full versioning, rollback, and granular updates to specific sections of the skill artifact.

Why treat these skills as "artifacts" rather than just training a model on the data?

Treating them as artifacts makes the knowledge inspectable, correctable, and portable. Unlike hidden model weights or monolithic prompts, these packages allow users to audit the source boundaries, roll back changes, and install specific capabilities across different agent hosts.

Does this system claim to faithfully reproduce a person's identity?

No. The authors explicitly state that the system is not for identity replacement or behavioral cloning. It is a tool for distilling selected, bounded practices—such as review checklists or decision heuristics—into a technical object that remains under user control.

For researchers and builders, this paper shifts the focus from "how to make an agent sound like a person" to "how to package expert judgment as an auditable, versioned software dependency."

Introduction to Person-Grounded Agents

We frame the need to turn human traces into inspectable, correctable AI skill packages.

LLM agents are moving beyond isolated commands toward carrying reusable context about how work and interaction should be performed. In practice, users want agents to preserve bounded fragments of a person’s expertise—review judgments, decision heuristics, or interpersonal style—without attempting full identity simulation. This gap motivates a person‑grounded trace‑to‑skill distillation problem.

A person‑grounded skill is a bounded, inspectable artifact that encodes selected traces of a person or role, exposing their knowledge, mental models, and interaction constraints without claiming to replicate the whole identity.

As a toy illustration, suppose a person's raw evidence consists of four traces of 2 KB each, so the raw trace size is $4 \times 2\,\text{KB} = 8\,\text{KB}$.

Extraction selects $4$ representative snippets of $1\,\text{KB}$ each, yielding a compact package of $4 \times 1\,\text{KB} = 4\,\text{KB}$.

The compact size makes inspection, versioning, and distribution feasible on commodity hardware.

The example shows how trace-grounded distillation condenses heterogeneous evidence into a smaller, curated package (here, from 8 KB to 4 KB), enabling practical skill packaging.
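The size arithmetic can be reproduced in a few lines. This sketch assumes a placeholder selection rule (keep the first 1 KB of each trace); the actual extractor chooses representative content rather than a fixed prefix, and `select_snippets` is a hypothetical name.

```python
KB = 1024


def select_snippets(traces: list[bytes], snippet_size: int = KB) -> list[bytes]:
    """Keep one fixed-size snippet per trace (placeholder selection rule)."""
    return [trace[:snippet_size] for trace in traces]


traces = [bytes(2 * KB) for _ in range(4)]  # four traces of 2 KB each
package = select_snippets(traces)

raw_size = sum(len(t) for t in traces)
package_size = sum(len(s) for s in package)
print(raw_size // KB, package_size // KB)  # 8 4
```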

COLLEAGUE.SKILL (Automated AI Skill Generation via Expert Knowledge Distillation) implements this pipeline: it ingests heterogeneous traces, renders a versioned skill package with separate capability and behavior tracks, and provides installers for agents such as Claude Code, OpenClaw, and Hermes. The system supports natural-language correction, rollback, and optional gallery distribution, and it ships domain-specific presets for colleague, celebrity/public-figure, and relationship settings.

The shift from identity‑based agents to trace‑grounded skill artifacts makes human expertise portable, inspectable, and safely controllable.

System Architecture and Standards

We formalize skill generation as a bounded artifact problem and describe its dual‑track representation.

The core difficulty is turning heterogeneous traces (documents, chats, code reviews) into a reusable, inspectable AI skill without leaking the full identity of the source person.

Think of a skill package as a well‑defined toolbox: every file has a fixed role, the toolbox can be opened by any compatible agent, and a manifest tells the agent exactly what to expect.

Consider a minimal package. Its manifest.json declares version 1.0, preset “colleague”, and lists the three files with their SHA‑256 hashes.

work.md holds a single capability rule: “When asked for project status, reply with the latest sprint summary.”

SKILL.md references work.md as its capability track and includes an empty persona section, making the package fully invokable.

An agent loads manifest.json, verifies the version, reads SKILL.md, and routes a “status” query to the rule in work.md.

The example shows how the standard enforces a clear entry point, explicit versioning, and separable tracks while remaining small enough to trace by hand.
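The load sequence in the example can be sketched in Python. It assumes that the three manifest-listed files are `SKILL.md`, `work.md`, and an empty `persona.md`, that `SKILL.md` names its capability track with a `capability:` line, and that routing is a simple keyword match; the function names and file layout are hypothetical, not the Agent Skills standard itself.

```python
import hashlib
import json
import tempfile
from pathlib import Path

SUPPORTED_VERSION = "1.0"  # assumption: this toy agent accepts only version 1.0


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_package(root: Path) -> None:
    """Write the toy package from the example plus a manifest of file hashes."""
    files = {
        "SKILL.md": "capability: work.md\npersona: (empty)\n",
        "work.md": "## Status\nWhen asked for project status, reply with the latest sprint summary.\n",
        "persona.md": "",  # assumption: the empty persona track is an empty file
    }
    for name, text in files.items():
        (root / name).write_text(text, encoding="utf-8")
    manifest = {
        "version": "1.0",
        "preset": "colleague",
        "files": {name: sha256_of(root / name) for name in files},
    }
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def load_and_route(root: Path, query: str) -> str:
    """Verify the manifest, read SKILL.md, and route a query to a capability rule."""
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    if manifest["version"] != SUPPORTED_VERSION:
        raise ValueError(f"unsupported manifest version: {manifest['version']}")
    for name, expected in manifest["files"].items():
        if sha256_of(root / name) != expected:
            raise ValueError(f"hash mismatch for {name}")
    # SKILL.md is the entry point; here it simply names the capability track file.
    skill = (root / "SKILL.md").read_text(encoding="utf-8")
    entries = dict(line.split(": ", 1) for line in skill.splitlines() if ": " in line)
    rules = (root / entries["capability"]).read_text(encoding="utf-8").splitlines()
    # Toy router: a "status" query maps to the first non-heading line in work.md.
    if "status" in query.lower():
        return next(line for line in rules if line and not line.startswith("#"))
    return "no matching rule"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_package(root)
    print(load_and_route(root, "What's the project status?"))
```

Checking hashes before reading any track means a tampered or partially edited package is rejected before the agent acts on it.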

How does this dual‑track artifact differ from a conventional persona model that bundles knowledge and style together?

In a conventional persona model the knowledge base, procedural heuristics, and surface tone are interleaved in a single language model, making it impossible to inspect or replace one component without retraining. The Agent Skills standard isolates the capability track (work.md) from the style track (persona.md), so each can be examined, updated, or omitted independently while the manifest guarantees compatibility.

This table outlines the artifacts used in the Agent Skills standard, their primary consumers, and their contents.

With the standard in place, the COLLEAGUE.SKILL pipeline can reliably turn raw work traces into a versioned, inspectable skill that agents load and execute without ever exposing the full identity of the source person.

The Distillation Pipeline

Separate capability and behavior tracks, followed by versioned governance, turn raw traces into inspectable AI skills.

Building person‑grounded AI skills collapses when a single prompt tries to encode both durable work methods and fleeting interaction quirks, making inspection impossible. The paper’s trick is to split generation into a capability track and a behavior track, then lock the resulting artifact behind versioned governance rails.

The system treats a raw interaction trace as a blueprint, extracts reusable procedural chunks, and writes them into a structured skill contract that can be inspected, edited, and versioned.

Consider a customer-support exchange about a refund. Normalize it into three fields: scene, wrong, correct.

Capability track extracts the policy rule “refunds may be flexible when justified”.

Behavior track extracts the polite phrasing “I’m sorry to hear that…”.

Combine into a Markdown contract with a “## Capability” section containing the rule and a “## Behavior” section containing the phrasing.

Commit the contract as version v1 in the repository.

The resulting contract isolates the policy from the wording, so a later audit can change the wording without touching the rule, or vice‑versa.
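A minimal sketch of this walk-through follows. It assumes the two track extractors are stand-ins that return hard-coded strings (the real tracks are model-driven) and that the "repository" is an in-memory dict; `Exchange`, `render_contract`, and `section` are hypothetical names, and the scene and reply strings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One trace normalized into the three fields from the example."""
    scene: str
    wrong: str
    correct: str


def extract_capability(ex: Exchange) -> list[str]:
    # Stand-in for the model-based capability track (hard-coded for the example).
    return ["refunds may be flexible when justified"]


def extract_behavior(ex: Exchange) -> list[str]:
    # Stand-in for the model-based behavior track (hard-coded for the example).
    return ["I'm sorry to hear that..."]


def render_contract(capability: list[str], behavior: list[str]) -> str:
    lines = ["## Capability", *(f"- {rule}" for rule in capability), "",
             "## Behavior", *(f"- {phrase}" for phrase in behavior)]
    return "\n".join(lines) + "\n"


def section(contract: str, heading: str) -> str:
    """Return the body under one level-2 heading."""
    body, inside = [], False
    for line in contract.splitlines():
        if line.startswith("## "):
            inside = line == f"## {heading}"
        elif inside:
            body.append(line)
    return "\n".join(body).strip()


repo: dict[str, str] = {}  # toy version store: label -> contract text

ex = Exchange(
    scene="customer requests a refund after the standard window",  # hypothetical
    wrong="No refunds, sorry.",
    correct="I'm sorry to hear that... we can make an exception here.",
)
repo["v1"] = render_contract(extract_capability(ex), extract_behavior(ex))

# A later audit changes only the wording; the policy section stays identical.
repo["v2"] = render_contract(extract_capability(ex), ["I understand the frustration..."])
print(section(repo["v1"], "Capability") == section(repo["v2"], "Capability"))  # True
```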

Collect source material (chat logs, PDFs, Slack exports, etc.) and optional profile fields.

Select a preset (colleague, celebrity, relationship) that tailors the prompt focus.

Run the capability track to extract durable methods and heuristics.

Run the behavior track to extract expression and interaction patterns.

Render both tracks into structured Markdown sections.

Package the Markdown into the artifact contract and hand it to the writer module.

The figure illustrates a five-stage workflow process: 1. Trace intake, 2. Preset router, 3. Dual distill, 4. Artifact writer, and 5. Productization. Each stage includes specific sub-components and descriptive text. A "Governance rail" is positioned at the bottom, connected to the main flow, featuring five key elements: local-first storage, provenance + evidence, correction log, version / rollback, and optional gallery.
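The steps above can be approximated by a small routing-and-distillation skeleton. The `PRESET_FOCUS` strings, the `SkillDraft` type, and the keyword-based toy extractors are assumptions standing in for the system's model-based prompts; only the preset names and the two-track split come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical prompt-focus hints per preset; the real preset prompts are not given.
PRESET_FOCUS = {
    "colleague": "work methods, review checklists, decision heuristics",
    "celebrity": "public mental models grounded in first-person sources",
    "relationship": "locally owned interaction preferences and boundaries",
}

Extractor = Callable[[str, str], list[str]]


@dataclass
class SkillDraft:
    preset: str
    capability: list[str] = field(default_factory=list)
    behavior: list[str] = field(default_factory=list)


def route_preset(preset: str) -> str:
    if preset not in PRESET_FOCUS:
        raise ValueError(f"unknown preset: {preset}")
    return PRESET_FOCUS[preset]


def distill(sources: list[str], preset: str,
            extract_capability: Extractor, extract_behavior: Extractor) -> SkillDraft:
    """Run both tracks over every source, each guided by the preset's focus."""
    focus = route_preset(preset)
    draft = SkillDraft(preset=preset)
    for text in sources:
        draft.capability += extract_capability(text, focus)
        draft.behavior += extract_behavior(text, focus)
    return draft


def render(draft: SkillDraft) -> dict[str, str]:
    """Render the two tracks into separate Markdown files for the writer."""
    return {
        "work.md": "## Capability\n" + "".join(f"- {c}\n" for c in draft.capability),
        "persona.md": "## Behavior\n" + "".join(f"- {b}\n" for b in draft.behavior),
    }


# Toy extractors: keyword prefixes stand in for model calls and ignore `focus`.
def toy_capability(text: str, focus: str) -> list[str]:
    return [line[len("Rule:"):].strip() for line in text.splitlines() if line.startswith("Rule:")]


def toy_behavior(text: str, focus: str) -> list[str]:
    return [line[len("Say:"):].strip() for line in text.splitlines() if line.startswith("Say:")]


sources = ["Rule: block merges without tests\nSay: Quick note before merging"]
files = render(distill(sources, "colleague", toy_capability, toy_behavior))
print(files["work.md"])
```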

After a skill contract is created, a set of immutable safety and lifecycle mechanisms surrounds it, guaranteeing that every change is traceable and reversible and that consent is respected.

Suppose a user flags the time-related phrasing in a v2 contract. The version manager first archives the current v2 contract.

Apply the correction patch, updating the “Behavior” section’s time phrasing.

Increment the version to v3 and record the patch in the correction log.

Because provenance links the patch to the original trace, a reviewer can see why the time shift was needed.

If the user revokes consent, the rollback feature can revert to v1, which lacks the time‑specific entry.

The rail ensures every behavioral tweak is both auditable and reversible, protecting against accidental drift.
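The archive, patch, and rollback walk-through can be mimicked with a toy version manager. This sketch assumes that rollback restores old content as a new lifecycle version rather than deleting history, so the correction log stays append-only; `VersionManager`, the trace ID, and the 9am/10am phrasing are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VersionManager:
    """Toy governance rail: every change archives the prior state and is logged."""
    current: str
    version: int
    archive: dict[int, str] = field(default_factory=dict)
    correction_log: list[dict] = field(default_factory=list)

    def _archive_current(self) -> None:
        self.archive[self.version] = self.current

    def apply_patch(self, old: str, new: str, trace_id: str) -> None:
        if old not in self.current:
            raise ValueError("patch target not found in current version")
        self._archive_current()
        self.current = self.current.replace(old, new, 1)
        self.version += 1
        # Provenance: the log entry links the patch back to the motivating trace.
        self.correction_log.append({"to": self.version, "old": old, "new": new, "trace": trace_id})

    def rollback(self, target: int) -> None:
        if target not in self.archive:
            raise KeyError(f"no archived version v{target}")
        self._archive_current()
        # Design choice: restore old content as a new version instead of erasing history.
        self.current = self.archive[target]
        self.version += 1
        self.correction_log.append({"to": self.version, "rollback_to": target})


v1 = "## Behavior\n- Greets colleagues politely.\n"
v2 = v1 + "- Sends updates at 9am.\n"

vm = VersionManager(current=v2, version=2, archive={1: v1})
vm.apply_patch("at 9am", "at 10am", trace_id="trace-017")  # v2 archived, now v3
vm.rollback(1)                                              # v3 archived, v1 content restored as v4
print(vm.version, vm.current == v1)  # 4 True
```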

Receive natural‑language feedback (e.g., “he would not say that”).

Classify the feedback as expert‑work or expression‑level.

If expert‑work, generate a Markdown patch targeting the matching level‑2 heading.

If expression‑level, create a structured correction record {scene, wrong, correct}.

Archive the current version, apply the patch or record, and increment the lifecycle version.

Regenerate derived artifacts (e.g., compiled prompts, UI previews).

The version manager updates its index, enabling list, backup, rollback, and cleanup operations.
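The handler steps can be sketched as follows. A keyword list stands in for whatever classifier the system actually uses, the dict layout of `skill` is hypothetical, and the steps for regenerating derived artifacts and updating the version index are omitted.

```python
import re
from dataclasses import dataclass

# Hypothetical cue phrases standing in for the real feedback classifier.
EXPRESSION_CUES = ("would not say", "wouldn't say", "phrasing", "too formal", "too casual")


@dataclass
class CorrectionRecord:
    """Structured expression-level correction: {scene, wrong, correct}."""
    scene: str
    wrong: str
    correct: str


def classify(feedback: str) -> str:
    text = feedback.lower()
    return "expression" if any(cue in text for cue in EXPRESSION_CUES) else "expert-work"


def patch_section(markdown: str, heading: str, new_body: str) -> str:
    """Replace the body under one level-2 heading and leave other sections intact."""
    pattern = re.compile(rf"(^## {re.escape(heading)}\n)(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL)
    if not pattern.search(markdown):
        raise ValueError(f"heading not found: {heading}")
    return pattern.sub(lambda m: m.group(1) + new_body.rstrip("\n") + "\n\n", markdown, count=1)


def handle_feedback(feedback: str, skill: dict, *, heading: str = "", new_body: str = "",
                    scene: str = "", wrong: str = "", correct: str = "") -> str:
    """Route feedback to a Markdown patch or a correction record, archiving first."""
    kind = classify(feedback)
    new_work, new_corrections = skill["work_md"], list(skill["corrections"])
    if kind == "expert-work":
        new_work = patch_section(skill["work_md"], heading, new_body)  # may raise; state untouched
    else:
        new_corrections.append(CorrectionRecord(scene, wrong, correct))
    # Archive the current version, apply the change, then bump the lifecycle version.
    skill["archive"].append((skill["version"], skill["work_md"], skill["corrections"]))
    skill["work_md"], skill["corrections"] = new_work, new_corrections
    skill["version"] += 1
    return kind


skill = {"work_md": "## Review\n- Check tests.\n", "corrections": [], "version": 1, "archive": []}
handle_feedback("he would not say that", skill,
                scene="stand-up update", wrong="Per my last email...", correct="Quick reminder:")
handle_feedback("He also checks migrations", skill,
                heading="Review", new_body="- Check tests.\n- Check migrations.")
print(skill["version"], len(skill["corrections"]))  # 3 1
```

Patching by level-2 heading keeps each correction local to one section, which is what makes granular updates and clean diffs possible.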

The public‑figure extension adds a research pass that pulls first‑person writings, long‑form interviews, and timeline evidence, then runs the same dual‑track pipeline. Evidence limits are recorded so downstream agents can downgrade confidence when sources are thin.
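One way a downstream agent might act on recorded evidence limits is sketched below. The thresholds, the `EvidenceLimits` fields, and the bracketed hedge format are hypothetical; the text states only that limits are recorded so that confidence can be downgraded when sources are thin.

```python
from dataclasses import dataclass

# Hypothetical thresholds; not specified in the text.
MIN_SOURCES_FOR_HIGH = 5
MIN_SOURCES_FOR_MEDIUM = 2


@dataclass
class EvidenceLimits:
    """Counts of evidence gathered by the research pass (illustrative fields)."""
    first_person_writings: int
    long_form_interviews: int
    timeline_events: int

    def total(self) -> int:
        return self.first_person_writings + self.long_form_interviews + self.timeline_events


def confidence_label(limits: EvidenceLimits) -> str:
    n = limits.total()
    if n >= MIN_SOURCES_FOR_HIGH and limits.first_person_writings > 0:
        return "high"
    if n >= MIN_SOURCES_FOR_MEDIUM:
        return "medium"
    return "low"


def hedge(answer: str, limits: EvidenceLimits) -> str:
    """Prefix an answer with a confidence note unless evidence is strong."""
    label = confidence_label(limits)
    if label == "high":
        return answer
    return f"[{label} confidence: based on {limits.total()} public source(s)] {answer}"


thin = EvidenceLimits(first_person_writings=0, long_form_interviews=1, timeline_events=0)
print(hedge("They favour long-term bets.", thin))
```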

The relationship preset reuses the artifact pipeline but treats personal interaction traces as locally owned, editable state. Stronger consent, retention, and access‑control assumptions are baked into the governance rail, making deletion and correction first‑order requirements.

Applications and Lifecycle

We showcase three application presets and compare our skill lifecycle to prior agent systems.

The central premise is that COLLEAGUE.SKILL distills human expertise into bounded, inspectable AI skills by grounding them in concrete source traces rather than mimicking a person’s identity.

COLLEAGUE.SKILL supports three distinct application presets—colleague, celebrity, and relationship—each with tailored evidence, governance, and runtime configurations.

Figure 2 illustrates the preset hierarchy and its three branches.

The lifecycle treats a skill as a versioned artifact that can be created, inspected, invoked, corrected, archived, and shared, while a parallel update lane records corrections, generates patches, archives metadata, and enables rollback.

**Figure 2.** Application presets layered on the COLLEAGUE.SKILL person-grounded skill pipeline. The shared artifact workflow branches into colleague, celebrity, and relationship presets with different evidence scopes, governance requirements, and invocation aliases.

**Figure.** Versioned Skill Lifecycle

**Figure.** Public Deployment Surface

Discussion and Implications

We examine the limits of person‑grounded skills and how they should be deployed responsibly.

The paper’s premise is that we can distill selected human knowledge and interaction style into a bounded, inspectable AI skill without trying to recreate the whole person. This “person‑grounded skill” encodes how a role weighs evidence, detects risk, explains trade‑offs, refuses bad requests, and adapts communication. It is deliberately limited to avoid identity replacement.

The key difference is that we produce inspectable artifacts whose evidence and limits are visible, rather than black‑box models that merely mimic a person’s surface behavior.

The workflow matters because a single prompt cannot guarantee accountability. COLLEAGUE.SKILL treats trace‑to‑skill distillation as a file‑based pipeline: creation, inspection, invocation, correction, rollback, deletion, host installation, and optional distribution. These operations let users audit, repair, withhold, or share the skill in a controlled way.

Extraction quality can be examined in the generated work.md and persona.md files, while installation and sharing rely on manifests instead of ad‑hoc instructions. Governance operates on explicit metadata rather than hidden prompt state, making the research object sharper and easier to study.

The same mechanism underlies four domains: colleague‑level expert checklists, celebrity mental‑model packages, private relationship constraints, and a public gallery of skills. Together they illustrate an ecosystem of reusable, bounded artifacts rather than a collection of unrestricted person simulations.

Our claims are artifact‑level: we define a package format, implement the workflow, expose correction and rollback state, support multiple agent hosts, and show the same pipeline works across all four settings. We do not claim that the skills faithfully reproduce a person or that they automatically improve downstream performance.

Evaluating behavioral fidelity therefore requires human and task‑based studies: comparing full versus capability‑only versus behavior‑only variants under matched source evidence, and measuring risk‑utility trade‑offs such as over‑attachment in relationship skills or hallucinated motives in public‑figure extensions.

Productization is not a cosmetic add‑on; the installer, manifest, rollback, and deletion mechanisms turn the distillation into a legible software object whose ownership, provenance, versioning, and deployment boundaries can be audited. These concrete handles enable future research to compare source scopes, correction histories, and invocation modes without reverse‑engineering hidden model memory.

Limitations remain: the paper does not address source‑matching quality, downstream task performance, emotional safety, or user‑trust calibration. Real deployments will depend on source quality, extraction fidelity, model behavior, and human review, and corrections may inadvertently encode editor bias.

Responsible deployment therefore requires explicit participation, scoped source collection, access controls, retention limits, and non‑mandatory use. Governance affordances are provided, but lawful source use, consent, and full redaction must be handled through separate review processes. Gallery publication should be opt‑in with attestation, review, takedown mechanisms, source‑boundary labels, and clear disclaimers for celebrity or relationship extensions.

In conclusion, COLLEAGUE.SKILL does not aim to recreate people; it creates portable, versioned, inspectable skill artifacts that encode selected human traces while keeping provenance, consent, and safety visible. This product‑oriented approach gives researchers concrete objects to read, revise, install, withhold, and delete, paving the way for future benchmarks that evaluate judgment preservation and safety rather than open‑ended impersonation.

Acknowledgements

We thank the community contributors whose feedback enabled the public COLLEAGUE.SKILL ecosystem.

We especially thank the community members who contributed skills, submitted feedback, and supported the public gallery.

0xAlexWu 123pyLeo 1544501967 1sh1ro 2559063619 Aar0nPB AdeleZhu Adrin agenmod AicbLab aka556 alchaincyf AnonBug arould001 awecsfgvs baibai2013 bankeluilian baojiachen0214 Bayson-create BeamusWayne binggandata BiscuitCoder bombers26 Bughouse1024 ByteRax c0dedance Canding3021 cantian-ai ccjincc ceetity Charpup ChrisWu11 ClarkYoung-xhs CommitHu502Craft cyber-immortal cyberk1895 Cyh29hao dadwadw233 daiyanpgg-wq DanZai233 davecat Dclef derrickgong87 dglijin-oss dull-bird DysonSWang EastZsRoad FANzR-arch Fhui Formangarden524 gufenglees guilings Hchen1218 heywanrong hotcoffeeshake
