🔧 What Is a Skill?
In the context of AI agents, a skill (also called a tool or function) is a packaged, reusable capability that an LLM can invoke to interact with the world outside its context window. Skills bridge the gap between language understanding and real-world action.
Every skill has four components:
- Name — a unique identifier the model uses to reference the skill (e.g., web_search)
- Description — natural language text explaining what the skill does and when to use it; the LLM reads this to decide when to call the skill
- Input schema — a structured definition of the arguments (JSON Schema); the model fills these in based on the task
- Implementation — the actual code that runs: an API call, database query, shell command, or any other operation
Example: A get_weather skill has the description "Fetches current weather for a city", accepts { "city": "string" } as input, and calls a weather API. The LLM never sees the API key or HTTP call — it just receives the result back as text.
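The four components can be sketched in a few lines of Python. This is a minimal illustration, not any particular provider's API: the schema follows JSON Schema, but the wrapper keys (`name`, `description`, `input_schema`) and the stubbed weather lookup are assumptions for the example.

```python
import json

# Hypothetical skill definition: name, description, and input schema.
GET_WEATHER_SKILL = {
    "name": "get_weather",
    "description": "Fetches current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    # Implementation — the part the LLM never sees. A stub stands in
    # for the real HTTP call and API key; only the result text
    # is returned to the model.
    fake_api_response = {"city": city, "temp_c": 21, "conditions": "sunny"}
    return json.dumps(fake_api_response)
```

The definition dict is what the model reads; the function is what your runtime executes.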
⚙️ How Tool Calling Works
When an agent has access to skills, the LLM doesn't call them directly — it requests a call by generating a structured output. The runtime (your application code) executes the actual call and returns the result. Here's the full cycle:
- Tool registration — Your code sends the model a list of available skills with their descriptions and schemas
- Model reasoning — The LLM reads the task and the available tools, decides which tool to call and with what arguments
- Tool call request — The model outputs a structured "tool call" message (e.g., { "tool": "web_search", "args": { "query": "current Gemini models" } })
- Execution — Your runtime code calls the actual function, API, or service
- Result injection — The result is added back to the conversation context
- Continued reasoning — The model reads the result and either calls another tool, asks a follow-up, or generates the final answer
This loop can repeat many times in one turn. A complex agentic task might call 10–20 tools in sequence before producing a final response.
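The cycle above can be condensed into a short loop. This is a sketch, not a real SDK: `call_model` is a hypothetical stand-in for an LLM API, scripted here so the loop actually runs end to end.

```python
def run_agent(task, tools, call_model, max_steps=20):
    """Drive the reason → request → execute → inject loop until a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)                 # model reasoning
        if reply["type"] == "tool_call":                    # tool call request
            result = tools[reply["tool"]](**reply["args"])  # execution by the runtime
            messages.append({"role": "tool",                # result injection
                             "name": reply["tool"], "content": result})
        else:
            return reply["content"]                         # continued reasoning ended in a final answer
    return "stopped: max_steps reached"

# Scripted stand-in for the LLM: call one tool, then answer.
def scripted_model(messages, tools):
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "get_time", "args": {}}
    return {"type": "final", "content": "It is 12:00."}
```

Note the `max_steps` cap: because the loop can repeat many times per turn, a bound (plus logging) is your safety net against runaway agents.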
📊 Skills vs Plugins vs MCP Servers
The terminology has evolved rapidly. Here's how the concepts relate historically:
| Era | Name | How it worked | Status |
|---|---|---|---|
| 2023 | ChatGPT Plugins | User-installable extensions with an OpenAPI spec; ChatGPT called them via HTTP | Deprecated (replaced by GPTs + tools) |
| 2023 | OpenAI Function Calling | API-level JSON Schema definitions; the model outputs a structured function call, your code executes it | Active — renamed "tool use" |
| 2023– | Tool Use / Skills | Same as function calling; Anthropic popularized "tool use", others use "skills" or "functions" | Active — current standard |
| 2024– | MCP Servers | Standardized protocol (Anthropic, Dec 2024) for packaging and distributing tool servers; any MCP client can connect to any MCP server | Active — growing ecosystem |
Key distinction: Traditional tool use is defined inline in your application code. MCP servers are standalone processes that expose tools over a standardized protocol — making them reusable across different AI clients (Claude Desktop, Cursor, VS Code, custom agents). Read more: What Is MCP.
🗂️ Tool Categories & Risk Levels
Not all tools carry the same risk. Grouping by risk level helps define your agent's action space and where to add human-in-the-loop checkpoints:
| Category | Examples | Risk | Recommendation |
|---|---|---|---|
| Read-only | web_search, read_file, get_weather, list_dir | Low | Allow autonomously; log all calls |
| Write (local) | write_file, create_dir, edit_code | Medium | Scope to a sandboxed workspace directory |
| Network / External | send_email, post_to_slack, call_api | Medium–High | HITL approval for irreversible sends |
| OS / Shell | run_command, execute_script, install_package | High | Restrict to containerized environment; HITL required |
| Destructive | delete_file, drop_table, revoke_access | Critical | Always require explicit human confirmation |
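A risk-tiered table like this translates directly into a gate in the runtime. The sketch below is illustrative: the tool-to-tier mapping and the `approve` callback (your HITL prompt, Slack ping, etc.) are assumptions, not a standard API.

```python
# Illustrative risk tiers mirroring the table above.
RISK = {
    "web_search": "low", "read_file": "low",
    "write_file": "medium",
    "send_email": "high", "run_shell": "high",
    "delete_file": "critical",
}

NEEDS_APPROVAL = {"high", "critical"}

def gate(tool_name: str, approve) -> bool:
    """Return True if the call may proceed; route risky tools through a human."""
    risk = RISK.get(tool_name, "critical")  # unknown tools default to critical
    if risk in NEEDS_APPROVAL:
        return approve(tool_name)           # human-in-the-loop checkpoint
    return True                             # low/medium: allow autonomously (and log)
```

Defaulting unknown tools to critical is deliberate: a tool that slipped into the registry without a risk assignment should not run unattended.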
🏗️ Anatomy of a Good Skill
The single most important design decision for a skill is its description. The LLM reads descriptions to decide which tool to call — a poorly written description leads to wrong tool selection, missed calls, or ambiguous arguments.
What makes a description effective
- Be explicit about when to use it — "Use this tool when the user asks about real-time data or current events" is better than "Searches the web"
- State what it does NOT do — "Does not return historical data older than 30 days"
- Describe the output format — "Returns a JSON array of search results with title, url, and snippet fields"
- Keep it short — aim for under ~200 tokens; longer descriptions can dilute the model's attention when many tools are registered
Schema design principles
- Required vs optional — Only mark fields required if truly necessary; optional fields with defaults reduce model errors
- Use enums for constrained values — "format": {"enum": ["json", "markdown", "text"]} prevents hallucinated values
- Prefer idempotent tools — Tools that can be safely retried (reads, lookups) are safer than one-shot actions (sends, deletes)
- Return structured data — JSON responses are easier for the model to reason about than unstructured text blobs
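Putting the description and schema principles together, here is one possible tool definition. The tool itself (`export_report`) and the wrapper keys are hypothetical; the schema body is standard JSON Schema.

```python
# Hypothetical tool applying the principles above: explicit when/when-not
# description, an enum to constrain values, and a default so the field
# can stay optional.
EXPORT_REPORT_TOOL = {
    "name": "export_report",
    "description": (
        "Exports the current report to a file. Use when the user asks to "
        "save or download results. Does not email or share the report. "
        "Returns the path of the exported file as a string."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "format": {
                "enum": ["json", "markdown", "text"],  # prevents hallucinated formats
                "default": "markdown",                  # sensible default
            },
        },
        "required": [],  # optional-with-default reduces model errors
    },
}
```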
💡 Real-world Examples
| Skill name | What it does | Key arguments | Risk |
|---|---|---|---|
| web_search | Query a search engine and return top results | query: string, num_results?: number | Low |
| read_file | Read a file from the workspace directory | path: string, offset?: number, limit?: number | Low |
| write_file | Create or overwrite a file | path: string, content: string | Medium |
| run_tests | Execute the project test suite and return results | filter?: string | Medium |
| browser_screenshot | Navigate to a URL and return a screenshot | url: string | Medium |
| send_email | Send an email to one or more recipients | to: string[], subject: string, body: string | High — always HITL |
| run_shell | Execute an arbitrary shell command | command: string, cwd?: string | High — sandbox required |
✅ Best Practices
Minimal permissions
Grant each agent only the tools it needs for its specific task. An agent that answers FAQ questions doesn't need write_file or send_email. The smaller the action space, the smaller the blast radius if the agent is manipulated. See: Excessive Agency.
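One simple way to enforce this is a per-agent allowlist applied before tools are ever registered with the model. The agent names and tool stubs below are illustrative.

```python
# Full tool registry (stubs for illustration).
ALL_TOOLS = {
    "web_search": lambda query: f"results for {query}",
    "write_file": lambda path, content: f"wrote {path}",
    "send_email": lambda to, body: f"sent to {to}",
}

# Hypothetical per-agent allowlists: the FAQ agent never even sees write tools.
AGENT_ALLOWLISTS = {
    "faq_agent": {"web_search"},
    "coding_agent": {"web_search", "write_file"},
}

def tools_for(agent_name: str) -> dict:
    """Return only the tools this agent is permitted to use."""
    allowed = AGENT_ALLOWLISTS.get(agent_name, set())  # unknown agent → no tools
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}
```

Filtering at registration time, rather than rejecting calls later, means a manipulated agent cannot even request a tool outside its action space.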
Audit all tool calls
Log every tool call with its arguments and result. This enables debugging, cost attribution, and detection of anomalous behavior (e.g., an agent reading ~/.ssh/ when it should only access the project directory). AgentOps platforms like LangSmith make this easy.
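A lightweight way to get this for free is to wrap every tool at registration time. This sketch logs to an in-memory list; in production you would write to a persistent, append-only store (or use an AgentOps platform as noted above).

```python
import functools
import time

AUDIT_LOG = []  # stand-in for a persistent, append-only audit store

def audited(name, fn):
    """Wrap a tool so every call records its name, arguments, and result."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        result = fn(**kwargs)
        AUDIT_LOG.append({
            "ts": time.time(),
            "tool": name,
            "args": kwargs,      # enables anomaly detection on arguments
            "result": result,    # enables debugging and cost attribution
        })
        return result
    return wrapper
```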
Human-in-the-loop for irreversible actions
Any action that is hard or impossible to undo — sending messages, deleting data, making purchases — should require explicit human confirmation before execution. Build HITL checkpoints into your agent runtime, not just the prompt. Prompts can be overridden; code cannot be.
Beware tool poisoning
If you allow agents to install or discover MCP servers dynamically, a malicious server can register a tool with a description that contains hidden instructions. Always review tool descriptions before adding them to your agent's registry. See: Tool Poisoning.
Write clear error returns
When a tool fails, return a structured error with enough context for the model to recover or escalate — not a raw exception stack trace. Example: { "error": "rate_limited", "retry_after": 5 }. A well-described error lets the agent retry, use a fallback tool, or inform the user gracefully.
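One way to enforce this uniformly is a small wrapper around every tool invocation. The error names below (`timeout`, the class-name fallback) are illustrative conventions, not a standard.

```python
def safe_call(fn, **kwargs):
    """Run a tool and convert failures into structured, recoverable errors."""
    try:
        return {"ok": True, "result": fn(**kwargs)}
    except TimeoutError:
        # Specific, actionable error: the model knows it can retry.
        return {"ok": False, "error": "timeout", "hint": "retry or narrow the request"}
    except Exception as exc:
        # Generic fallback: name the error class, never the raw traceback.
        return {"ok": False, "error": type(exc).__name__}
```

The model sees `{"ok": false, "error": "timeout", ...}` instead of a stack trace, which keeps the context clean and gives it a clear signal to retry, fall back, or escalate.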