How Large Language Models Use Tools
Background
1. Transformer
The Transformer is the foundation of all modern large language models (LLMs), including GPT, Claude, Llama, and Gemini.
It is a sequence model that predicts the next token given all previous tokens:
P(x_t | x_1, x_2, …, x_{t-1})
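To make that concrete, here is a minimal sketch of autoregressive decoding in Python. The function next_token_logits is a hypothetical stand-in for a real Transformer forward pass; everything else (vocabulary size, sampling) is illustrative only.

```python
# Minimal sketch of autoregressive decoding.
# next_token_logits is a placeholder for a real Transformer forward pass.
import numpy as np

VOCAB_SIZE = 8          # toy vocabulary: token ids 0..7
rng = np.random.default_rng(0)

def next_token_logits(prefix: list[int]) -> np.ndarray:
    """Stand-in for a Transformer: returns one logit per vocabulary token."""
    # A real model would condition on the whole prefix via self-attention;
    # here we just seed a generator from the prefix to stay deterministic.
    local = np.random.default_rng(hash(tuple(prefix)) % (2**32))
    return local.normal(size=VOCAB_SIZE)

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def generate(prefix: list[int], steps: int) -> list[int]:
    tokens = list(prefix)
    for _ in range(steps):
        probs = softmax(next_token_logits(tokens))   # P(x_t | x_1, ..., x_{t-1})
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))
    return tokens

print(generate([1, 2, 3], steps=5))
```

Each step turns the prefix into a probability distribution over the vocabulary and samples one token from it; the sampled token is appended and the loop repeats.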
Transformers use self-attention to capture relationships between tokens, allowing them to reason over long contexts and generate coherent, contextually relevant language.
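The sketch below shows the core of that mechanism: single-head scaled dot-product self-attention with a causal mask, written in NumPy. The weight matrices and dimensions are arbitrary placeholders, not values from any particular model.

```python
# Minimal sketch of single-head scaled dot-product self-attention
# with a causal mask (each token attends only to itself and earlier tokens).
import numpy as np

def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    """x: (seq_len, d_model) token embeddings; returns contextualized vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # pairwise token affinities
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)      # block attention to future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 16
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # (4, 16)
```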
However, a Transformer by itself only predicts tokens: it has no built-in way to fetch real-time data or call external tools and APIs.
That limitation is what protocols like MCP (Model Context Protocol) are designed to solve.
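As a rough illustration of the loop such protocols standardize, the sketch below shows a model requesting a tool call and the host running it. The function names (call_model, run_tool) and message shapes are hypothetical placeholders, not the actual MCP wire format.

```python
# Illustrative tool-use loop; names and message shapes are hypothetical,
# not the real MCP specification.
import json

def call_model(messages: list[dict]) -> dict:
    """Placeholder for an LLM call; may return plain text or a tool request."""
    return {"type": "tool_call", "tool": "get_weather", "arguments": {"city": "Paris"}}

def run_tool(name: str, arguments: dict) -> str:
    """Placeholder for executing a tool the model cannot perform itself."""
    return json.dumps({"city": arguments["city"], "temp_c": 18})

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
reply = call_model(messages)
if reply["type"] == "tool_call":
    result = run_tool(reply["tool"], reply["arguments"])
    messages.append({"role": "tool", "content": result})
    # The model would then be called again with the tool result
    # so it can produce a final text answer for the user.
```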