What Are AI Skills & Tools

How AI agents use packaged capabilities — from function calls to MCP servers. A practical guide for developers building with LLMs.

7 min read · Updated: April 2026

🔧 What Is a Skill?

In the context of AI agents, a skill (also called a tool or function) is a packaged, reusable capability that an LLM can invoke to interact with the world outside its context window. Skills bridge the gap between language understanding and real-world action.

Every skill has four components:

  • Name — a unique identifier the model uses to reference the skill (e.g., web_search)
  • Description — natural language text explaining what the skill does and when to use it; the LLM reads this to decide when to call the skill
  • Input schema — a structured definition of the arguments (JSON Schema); the model fills these in based on the task
  • Implementation — the actual code that runs: an API call, database query, shell command, or any other operation

Example: A get_weather skill has the description "Fetches current weather for a city", accepts { "city": "string" } as input, and calls a weather API. The LLM never sees the API key or HTTP call — it just receives the result back as text.
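The four components can be sketched in code. This is a minimal illustration, not any particular SDK's format: the spec dict loosely mirrors the JSON Schema shape the model sees, and the implementation is a stub standing in for a real weather API call.

```python
# Hypothetical get_weather skill: the spec is what the model sees,
# the function is the implementation the runtime executes.
get_weather_spec = {
    "name": "get_weather",                                   # unique identifier
    "description": "Fetches current weather for a city",     # read by the LLM
    "input_schema": {                                        # JSON Schema for arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> str:
    """Implementation: a real skill would call a weather API here.
    The model never sees this code -- only the returned text."""
    return f"Weather in {city}: 18°C, partly cloudy"  # stubbed result
```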

⚙️ How Tool Calling Works

When an agent has access to skills, the LLM doesn't call them directly — it requests a call by generating a structured output. The runtime (your application code) executes the actual call and returns the result. Here's the full cycle:

  1. Tool registration — Your code sends the model a list of available skills with their descriptions and schemas
  2. Model reasoning — The LLM reads the task and the available tools, decides which tool to call and with what arguments
  3. Tool call request — The model outputs a structured "tool call" message (e.g., { "tool": "web_search", "args": { "query": "current Gemini models" } })
  4. Execution — Your runtime code calls the actual function, API, or service
  5. Result injection — The result is added back to the conversation context
  6. Continued reasoning — The model reads the result and either calls another tool, asks a follow-up, or generates the final answer

This loop can repeat many times in one turn. A complex agentic task might call 10–20 tools in sequence before producing a final response.
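The six steps above can be sketched as a runtime loop. `call_model` is a hypothetical function standing in for a provider SDK call; real APIs use richer message shapes, but the control flow is the same: the model requests, the runtime executes, the result goes back into context.

```python
import json

def run_agent(task: str, tools: dict, call_model) -> str:
    """Sketch of the tool-calling loop. `call_model` is assumed to return
    either {"tool": name, "args": {...}} or {"final": text}."""
    messages = [{"role": "user", "content": task}]
    while True:
        reply = call_model(messages)           # steps 2-3: model decides, may request a call
        if "final" in reply:
            return reply["final"]              # step 6: final answer, loop ends
        result = tools[reply["tool"]](**reply["args"])            # step 4: runtime executes
        messages.append({"role": "tool", "content": json.dumps(result)})  # step 5: inject result
```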

📊 Skills vs Plugins vs MCP Servers

The terminology has evolved rapidly. Here's how the concepts relate historically:

| Era | Name | How it worked | Status |
|---|---|---|---|
| 2023 | ChatGPT Plugins | User-installable extensions with an OpenAPI spec; ChatGPT called them via HTTP | Deprecated (replaced by GPTs + tools) |
| 2023 | OpenAI Function Calling | API-level JSON Schema definitions; the model outputs a structured function call, your code executes it | Active — renamed "tool use" |
| 2023– | Tool Use / Skills | Same as function calling; Anthropic coined "tool use", others use "skills" or "functions" | Active — current standard |
| 2024– | MCP Servers | Standardized protocol (Anthropic, Dec 2024) for packaging and distributing tool servers; any MCP client can connect to any MCP server | Active — growing ecosystem |

Key distinction: Traditional tool use is defined inline in your application code. MCP servers are standalone processes that expose tools over a standardized protocol — making them reusable across different AI clients (Claude Desktop, Cursor, VS Code, custom agents). Read more: What Is MCP.

🗂️ Tool Categories & Risk Levels

Not all tools carry the same risk. Grouping by risk level helps define your agent's action space and where to add human-in-the-loop checkpoints:

| Category | Examples | Risk | Recommendation |
|---|---|---|---|
| Read-only | web_search, read_file, get_weather, list_dir | Low | Allow autonomously; log all calls |
| Write (local) | write_file, create_dir, edit_code | Medium | Scope to a sandboxed workspace directory |
| Network / External | send_email, post_to_slack, call_api | Medium–High | HITL approval for irreversible sends |
| OS / Shell | run_command, execute_script, install_package | High | Restrict to containerized environment; HITL required |
| Destructive | delete_file, drop_table, revoke_access | Critical | Always require explicit human confirmation |
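One way to enforce these tiers is a small gate in the runtime that checks a tool's risk level before executing it. This is an illustrative sketch: the tier assignments mirror the table above, and `ask_human` is a hypothetical approval hook, not a real API.

```python
# Risk-based gating sketch. Unknown tools default to "critical"
# so nothing runs ungated by accident.
RISK = {
    "web_search": "low", "read_file": "low", "get_weather": "low",
    "write_file": "medium", "edit_code": "medium",
    "send_email": "high", "run_command": "high",
    "delete_file": "critical", "drop_table": "critical",
}

def execute(tool: str, run, ask_human=lambda t: False):
    level = RISK.get(tool, "critical")
    if level in ("high", "critical") and not ask_human(tool):
        return {"error": "denied", "tool": tool, "risk": level}
    print(f"[audit] {tool} ({level})")   # log every call, even low-risk ones
    return run()
```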

🏗️ Anatomy of a Good Skill

The single most important design decision for a skill is its description. The LLM reads descriptions to decide which tool to call — a poorly written description leads to wrong tool selection, missed calls, or ambiguous arguments.

What makes a description effective

  • Be explicit about when to use it — "Use this tool when the user asks about real-time data or current events" is better than "Searches the web"
  • State what it does NOT do — "Does not return historical data older than 30 days"
  • Describe the output format — "Returns a JSON array of search results with title, url, and snippet fields"
  • Keep it short — descriptions longer than ~200 tokens can overwhelm the model's attention when many tools are registered
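Putting the guidelines above together, here is a vague description contrasted with an improved one. Both are illustrative strings for a hypothetical web_search tool, not taken from any real product.

```python
# A description that forces the model to guess...
BAD = "Searches the web"

# ...versus one that states when to use it, what it returns,
# and what it does NOT do.
GOOD = (
    "Use this tool when the user asks about real-time data or current "
    "events. Queries a web search engine and returns a JSON array of up "
    "to 10 results with title, url, and snippet fields. Does not fetch "
    "full page contents and does not return results older than 30 days."
)
```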

Schema design principles

  • Required vs optional — Only mark fields required if truly necessary; optional fields with defaults reduce model errors
  • Use enums for constrained values — "format": {"enum": ["json", "markdown", "text"]} prevents hallucinated values
  • Prefer idempotent tools — Tools that can be safely retried (reads, lookups) are safer than one-shot actions (sends, deletes)
  • Return structured data — JSON responses are easier for the model to reason about than unstructured text blobs
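As a sketch of these schema principles together, here is a hypothetical export_report tool schema: one truly required field, optionals with defaults, and an enum constraining the output format.

```python
# Illustrative input schema following the principles above.
export_report_schema = {
    "type": "object",
    "properties": {
        "report_id": {"type": "string"},
        # enum prevents the model from inventing formats like "html"
        "format": {"enum": ["json", "markdown", "text"], "default": "json"},
        # optional with a default: the model can simply omit it
        "include_raw": {"type": "boolean", "default": False},
    },
    "required": ["report_id"],   # only what is truly necessary
}
```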

💡 Real-world Examples

| Skill name | What it does | Key arguments | Risk |
|---|---|---|---|
| web_search | Query a search engine and return top results | query: string, num_results?: number | Low |
| read_file | Read a file from the workspace directory | path: string, offset?: number, limit?: number | Low |
| write_file | Create or overwrite a file | path: string, content: string | Medium |
| run_tests | Execute the project test suite and return results | filter?: string | Medium |
| browser_screenshot | Navigate to a URL and return a screenshot | url: string | Medium |
| send_email | Send an email to one or more recipients | to: string[], subject: string, body: string | High — always HITL |
| run_shell | Execute an arbitrary shell command | command: string, cwd?: string | High — sandbox required |

✅ Best Practices

Minimal permissions

Grant each agent only the tools it needs for its specific task. An agent that answers FAQ questions doesn't need write_file or send_email. The smaller the action space, the smaller the blast radius if the agent is manipulated. See: Excessive Agency.

Audit all tool calls

Log every tool call with its arguments and result. This enables debugging, cost attribution, and detection of anomalous behavior (e.g., an agent reading ~/.ssh/ when it should only access the project directory). AgentOps platforms like LangSmith make this easy.

Human-in-the-loop for irreversible actions

Any action that is hard or impossible to undo — sending messages, deleting data, making purchases — should require explicit human confirmation before execution. Build HITL checkpoints into your agent runtime, not just the prompt. Prompts can be overridden; code cannot be.
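A checkpoint in code might look like the following sketch. The tool names and the `confirm` hook are illustrative; the point is that the gate runs in the runtime before the tool executes, so no amount of prompt injection can skip it.

```python
# Tools considered irreversible in this hypothetical agent.
IRREVERSIBLE = {"send_email", "delete_file", "make_purchase"}

def guarded_call(tool, fn, args, confirm=input):
    """Require explicit human approval before any irreversible tool runs."""
    if tool in IRREVERSIBLE:
        answer = confirm(f"Agent wants to run {tool}({args}). Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return {"error": "rejected_by_human", "tool": tool}
    return fn(**args)
```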

Beware tool poisoning

If you allow agents to install or discover MCP servers dynamically, a malicious server can register a tool with a description that contains hidden instructions. Always review tool descriptions before adding them to your agent's registry. See: Tool Poisoning.

Write clear error returns

When a tool fails, return a structured error with enough context for the model to recover or escalate — not a raw exception stack trace. Example: { "error": "rate_limited", "retry_after": 5 }. A well-described error lets the agent retry, use a fallback tool, or inform the user gracefully.
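One pattern for this is a wrapper that translates exceptions into structured errors before they reach the model. The exception types and error codes below are illustrative, not a standard.

```python
class RateLimitError(Exception):
    """Hypothetical exception raised by a rate-limited API client."""

def safe_tool(fn):
    """Convert exceptions into structured errors the model can act on,
    instead of leaking a raw stack trace into the context."""
    def wrapper(**kwargs):
        try:
            return fn(**kwargs)
        except RateLimitError:
            return {"error": "rate_limited", "retry_after": 5}
        except FileNotFoundError as e:
            return {"error": "not_found", "detail": str(e)}
        except Exception as e:                       # last resort: name the failure
            return {"error": "tool_failed", "detail": type(e).__name__}
    return wrapper
```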