🔧 What Is a Skill?
In the context of AI agents, a skill (also called a tool or function) is a packaged, reusable capability that an LLM can invoke to interact with the world outside its context window. Skills bridge the gap between language understanding and real-world action.
Every skill has four components:
- Name — a unique identifier the model uses to reference the skill (e.g., web_search)
- Description — natural language text explaining what the skill does and when to use it; the LLM reads this to decide when to call the skill
- Input schema — a structured definition of the arguments (JSON Schema); the model fills these in based on the task
- Implementation — the actual code that runs: an API call, database query, shell command, or any other operation
Example: A get_weather skill has the description "Fetches current weather for a city", accepts { "city": "string" } as input, and calls a weather API. The LLM never sees the API key or HTTP call — it just receives the result back as text.
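The four components can be sketched in a few lines of Python. This is a minimal illustration, not any particular provider's API: the schema follows JSON Schema, but the wrapper keys (`name`, `description`, `input_schema`) and the stubbed weather lookup are assumptions for the example.

```python
import json

# Hypothetical skill definition: name, description, and input schema.
GET_WEATHER_SKILL = {
    "name": "get_weather",
    "description": "Fetches current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Implementation — the part the LLM never sees. A stub stands in
    # for the real HTTP call and API key; only the result text
    # is returned to the model.
    fake_api_response = {"city": city, "temp_c": 21, "conditions": "sunny"}
    return json.dumps(fake_api_response)
```

The definition dict is what the model reads; the function is what your runtime executes.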
⚙️ How Tool Calling Works
When an agent has access to skills, the LLM doesn't call them directly — it requests a call by generating a structured output. The runtime (your application code) executes the actual call and returns the result. Here's the full cycle:
- Tool registration — Your code sends the model a list of available skills with their descriptions and schemas
- Model reasoning — The LLM reads the task and the available tools, decides which tool to call and with what arguments
- Tool call request — The model outputs a structured "tool call" message (e.g., { "tool": "web_search", "args": { "query": "current Gemini models" } })
- Execution — Your runtime code calls the actual function, API, or service
- Result injection — The result is added back to the conversation context
- Continued reasoning — The model reads the result and either calls another tool, asks a follow-up, or generates the final answer
This loop can repeat many times in one turn. A complex agentic task might call 10–20 tools in sequence before producing a final response.
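The cycle above can be condensed into a short loop. This is a sketch, not a real SDK: `call_model` is a hypothetical stand-in for an LLM API, scripted here so the loop actually runs end to end.

```python
def run_agent(task, tools, call_model, max_steps=20):
    """Drive the reason → request → execute → inject loop until a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)                 # model reasoning
        if reply["type"] == "tool_call":                    # tool call request
            result = tools[reply["tool"]](**reply["args"])  # execution by the runtime
            messages.append({"role": "tool",                # result injection
                             "name": reply["tool"], "content": result})
        else:
            return reply["content"]                         # continued reasoning ended in a final answer
    return "stopped: max_steps reached"

# Scripted stand-in for the LLM: call one tool, then answer.
def scripted_model(messages, tools):
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "get_time", "args": {}}
    return {"type": "final", "content": "It is 12:00."}
```

Note the `max_steps` cap: because the loop can repeat many times per turn, a bound (plus logging) is your safety net against runaway agents.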
📊 Skills vs Plugins vs MCP Servers
The terminology has evolved rapidly. Here's how the concepts relate historically:
| Era | Name | How it worked | Status |
|---|---|---|---|
| 2023 | ChatGPT Plugins | User-installable extensions with an OpenAPI spec; ChatGPT called them via HTTP | Deprecated (replaced by GPTs + tools) |
| 2023 | OpenAI Function Calling | API-level JSON Schema definitions; the model outputs a structured function call, your code executes it | Active — renamed "tool use" |
| 2023– | Tool Use / Skills | Same as function calling; Anthropic popularized "tool use", others use "skills" or "functions" | Active — current standard |
| 2024– | MCP Servers | Standardized protocol (Anthropic, Dec 2024) for packaging and distributing tool servers; any MCP client can connect to any MCP server | Active — growing ecosystem |
Key distinction: Traditional tool use is defined inline in your application code. MCP servers are standalone processes that expose tools over a standardized protocol — making them reusable across different AI clients (Claude Desktop, Cursor, VS Code, custom agents). Read more: What Is MCP.
🗂️ Tool Categories & Risk Levels
Not all tools carry the same risk. Grouping by risk level helps define your agent's action space and where to add human-in-the-loop checkpoints:
| Category | Examples | Risk | Recommendation |
|---|---|---|---|
| Read-only | web_search, read_file, get_weather, list_dir | Low | Allow autonomously; log all calls |
| Write (local) | write_file, create_dir, edit_code | Medium | Scope to a sandboxed workspace directory |
| Network / External | send_email, post_to_slack, call_api | Medium–High | HITL approval for irreversible sends |
| OS / Shell | run_command, execute_script, install_package | High | Restrict to containerized environment; HITL required |
| Destructive | delete_file, drop_table, revoke_access | Critical | Always require explicit human confirmation |
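A risk-tiered table like this translates directly into a gate in the runtime. The sketch below is illustrative: the tool-to-tier mapping and the `approve` callback (your HITL prompt, Slack ping, etc.) are assumptions, not a standard API.

```python
# Illustrative risk tiers mirroring the table above.
RISK = {
    "web_search": "low", "read_file": "low",
    "write_file": "medium",
    "send_email": "high", "run_shell": "high",
    "delete_file": "critical",
}

NEEDS_APPROVAL = {"high", "critical"}

def gate(tool_name: str, approve) -> bool:
    """Return True if the call may proceed; route risky tools through a human."""
    risk = RISK.get(tool_name, "critical")  # unknown tools default to critical
    if risk in NEEDS_APPROVAL:
        return approve(tool_name)           # human-in-the-loop checkpoint
    return True                             # low/medium: allow autonomously (and log)
```

Defaulting unknown tools to critical is deliberate: a tool that slipped into the registry without a risk assignment should not run unattended.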
🏗️ Anatomy of a Good Skill
The single most important design decision for a skill is its description. The LLM reads descriptions to decide which tool to call — a poorly written description leads to wrong tool selection, missed calls, or ambiguous arguments.
What makes a description effective
- Be explicit about when to use it — "Use this tool when the user asks about real-time data or current events" is better than "Searches the web"
- State what it does NOT do — "Does not return historical data older than 30 days"
- Describe the output format — "Returns a JSON array of search results with title, url, and snippet fields"
- Keep it short — aim for under ~200 tokens; longer descriptions can dilute the model's attention when many tools are registered
Schema design principles
- Required vs optional — Only mark fields required if truly necessary; optional fields with defaults reduce model errors
- Use enums for constrained values — "format": {"enum": ["json", "markdown", "text"]} prevents hallucinated values
- Prefer idempotent tools — Tools that can be safely retried (reads, lookups) are safer than one-shot actions (sends, deletes)
- Return structured data — JSON responses are easier for the model to reason about than unstructured text blobs
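Putting the description and schema principles together, here is one possible tool definition. The tool itself (`export_report`) and the wrapper keys are hypothetical; the schema body is standard JSON Schema.

```python
# Hypothetical tool applying the principles above: explicit when/when-not
# description, an enum to constrain values, and a default so the field
# can stay optional.
EXPORT_REPORT_TOOL = {
    "name": "export_report",
    "description": (
        "Exports the current report to a file. Use when the user asks to "
        "save or download results. Does not email or share the report. "
        "Returns the path of the exported file as a string."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "format": {
                "enum": ["json", "markdown", "text"],  # prevents hallucinated formats
                "default": "markdown",                  # sensible default
            },
        },
        "required": [],  # optional-with-default reduces model errors
    },
}
```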
💡 Real-world Examples
| Skill name | What it does | Key arguments | Risk |
|---|---|---|---|
| web_search | Query a search engine and return top results | query: string, num_results?: number | Low |
| read_file | Read a file from the workspace directory | path: string, offset?: number, limit?: number | Low |
| write_file | Create or overwrite a file | path: string, content: string | Medium |
| run_tests | Execute the project test suite and return results | filter?: string | Medium |
| browser_screenshot | Navigate to a URL and return a screenshot | url: string | Medium |
| send_email | Send an email to one or more recipients | to: string[], subject: string, body: string | High — always HITL |
| run_shell | Execute an arbitrary shell command | command: string, cwd?: string | High — sandbox required |
✅ Best Practices
Minimal permissions
Grant each agent only the tools it needs for its specific task. An agent that answers FAQ questions doesn't need write_file or send_email. The smaller the action space, the smaller the blast radius if the agent is manipulated. See: Excessive Agency.
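One simple way to enforce this is a per-agent allowlist applied before tools are ever registered with the model. The agent names and tool stubs below are illustrative.

```python
# Full tool registry (stubs for illustration).
ALL_TOOLS = {
    "web_search": lambda query: f"results for {query}",
    "write_file": lambda path, content: f"wrote {path}",
    "send_email": lambda to, body: f"sent to {to}",
}

# Hypothetical per-agent allowlists: the FAQ agent never even sees write tools.
AGENT_ALLOWLISTS = {
    "faq_agent": {"web_search"},
    "coding_agent": {"web_search", "write_file"},
}

def tools_for(agent_name: str) -> dict:
    """Return only the tools this agent is permitted to use."""
    allowed = AGENT_ALLOWLISTS.get(agent_name, set())  # unknown agent → no tools
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}
```

Filtering at registration time, rather than rejecting calls later, means a manipulated agent cannot even request a tool outside its action space.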
Audit all tool calls
Log every tool call with its arguments and result. This enables debugging, cost attribution, and detection of anomalous behavior (e.g., an agent reading ~/.ssh/ when it should only access the project directory). AgentOps platforms like LangSmith make this easy.
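A lightweight way to get this for free is to wrap every tool at registration time. This sketch logs to an in-memory list; in production you would write to a persistent, append-only store (or use an AgentOps platform as noted above).

```python
import functools
import time

AUDIT_LOG = []  # stand-in for a persistent, append-only audit store

def audited(name, fn):
    """Wrap a tool so every call records its name, arguments, and result."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        result = fn(**kwargs)
        AUDIT_LOG.append({
            "ts": time.time(),
            "tool": name,
            "args": kwargs,      # enables anomaly detection on arguments
            "result": result,    # enables debugging and cost attribution
        })
        return result
    return wrapper
```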
Human-in-the-loop for irreversible actions
Any action that is hard or impossible to undo — sending messages, deleting data, making purchases — should require explicit human confirmation before execution. Build HITL checkpoints into your agent runtime, not just the prompt. Prompts can be overridden; code cannot be.
Beware tool poisoning
If you allow agents to install or discover MCP servers dynamically, a malicious server can register a tool with a description that contains hidden instructions. Always review tool descriptions before adding them to your agent's registry. See: Tool Poisoning.
Write clear error returns
When a tool fails, return a structured error with enough context for the model to recover or escalate — not a raw exception stack trace. Example: { "error": "rate_limited", "retry_after": 5 }. A well-described error lets the agent retry, use a fallback tool, or inform the user gracefully.
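One way to enforce this uniformly is a small wrapper around every tool invocation. The error names below (`timeout`, the class-name fallback) are illustrative conventions, not a standard.

```python
def safe_call(fn, **kwargs):
    """Run a tool and convert failures into structured, recoverable errors."""
    try:
        return {"ok": True, "result": fn(**kwargs)}
    except TimeoutError:
        # Specific, actionable error: the model knows it can retry.
        return {"ok": False, "error": "timeout", "hint": "retry or narrow the request"}
    except Exception as exc:
        # Generic fallback: name the error class, never the raw traceback.
        return {"ok": False, "error": type(exc).__name__}
```

The model sees `{"ok": false, "error": "timeout", ...}` instead of a stack trace, which keeps the context clean and gives it a clear signal to retry, fall back, or escalate.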