Over the past few years, Large Language Models (LLMs) like ChatGPT, Claude, and LLaMA have taken centre stage in AI research and product development. These models are incredibly powerful at generating human-like text, answering questions, summarising content, and more.
But now, a new paradigm is emerging: AI Agents. These are not just models that predict text—they are systems that act. They plan, use tools, interact with environments, and complete tasks autonomously.
So, what exactly is the difference between an LLM and an AI Agent? How do they work under the hood? And when should you use one over the other?
In this blog, we'll look at how agents differ from LLMs, not just conceptually but mathematically, through intuitive examples and simple math.
To explain how agents differ from LLMs, we’ll need to break it down into two parts:
LLMs: Predicting the Next Token
An LLM is a function approximator: it maps a sequence of tokens (words) to a probability distribution over the next token.
Mathematical Definition:
An LLM approximates the probability distribution:
P(w_t ∣ w_1, w_2, …, w_{t-1}; θ), where:
- w_t: the next token
- w_1, …, w_{t-1}: the context tokens
- θ: the model parameters (the weights of the neural network)
LLMs are trained to minimize cross-entropy loss:
L(θ) = −Σ_t log P(w_t ∣ w_{<t}; θ)
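To make that loss concrete, here is a minimal PyTorch sketch; the logits and target tokens are toy values made up for illustration:

```python
import torch
import torch.nn.functional as F

# Toy setup: a vocabulary of 5 tokens and 3 context positions.
# logits[i] holds the model's unnormalized scores for the token that follows position i.
logits = torch.randn(3, 5)            # shape: (num_positions, vocab_size)
targets = torch.tensor([2, 0, 4])     # the actual next tokens w_t at each position

# Cross-entropy: the mean of -log P(w_t | w_<t; θ) over positions.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```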
The LLM is a pure, stateless function:
LLM: Input Text → Next-Token Distribution
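As a concrete sketch, here is one way to inspect that next-token distribution with Hugging Face transformers; the choice of GPT-2 and the top-5 cutoff are just assumptions for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The sun rises in the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, sequence_length, vocab_size)

# P(w_t | w_1, ..., w_{t-1}; θ): softmax over the scores at the last position.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

The exact numbers depend on the model, but the shape of the output is always the same: a probability for every token in the vocabulary.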
Agents: Decision-Making with Goals + Memory + Tools
An agent is a broader computational entity that can (a minimal loop sketch follows this list):
- Perceive its environment (via observations),
- Decide on actions,
- Execute those actions,
- Update its memory,
- Often use LLMs as tools along the way.
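Here is a minimal sketch of that perceive-decide-act loop; the `tools` table, the decision rule, and the stopping condition are all made-up placeholders, not a real agent framework:

```python
from typing import Callable, Dict

def agent_loop(observation: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> dict:
    memory: dict = {"facts": {}}                        # m_t: internal memory / belief state
    for _ in range(max_steps):
        if "answer" in memory["facts"]:                 # decide: goal reached, stop acting
            return memory
        action, arg = "lookup", observation             # a_t chosen from observations and memory
        memory["facts"]["answer"] = tools[action](arg)  # execute the action, update memory
    return memory

# Example with a single toy tool standing in for an LLM or a knowledge base.
tools = {"lookup": lambda q: "east" if "sun rises" in q else "unknown"}
print(agent_loop("The sun rises in the ...", tools))    # {'facts': {'answer': 'east'}}
```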
Mathematical Definition
An agent's behaviour can be modelled as a policy function in a Partially Observable Markov Decision Process (POMDP): π(a_t ∣ o_{≤t}, m_t)
- a_t: action at time t
- o_{≤t}: sequence of observations up to time t
- m_t: internal memory or belief state
Agent's objective: maximize expected reward:
max_π E[ Σ_t R(s_t, a_t) ]
where:
- s_t: hidden state (environment)
- R(s_t, a_t): reward function
Example:
Let's take the prompt:
“The sun rises in the ….”
We’ll explain mathematically and intuitively what:
- An LLM does
- An Agent does (which may use an LLM)
LLM Behavior — Predicting Next Token
P(w_t = 'east' ∣ w_1 = 'The', w_2 = 'sun', w_3 = 'rises', w_4 = 'in', w_5 = 'the'; θ)
Where:
- w_t: the next word
- w_{<t}: the previous words
- θ: model parameters (weights learned during training)
It estimates probabilities of possible next tokens:
Tokens | Probability |
---|---|
east | 0.92 |
morning | 0.03 |
sky | 0.02 |
west | 0.01 |
north | 0.01 |
As is evident from the table, "east" has the highest probability (0.92).
So the LLM completes:
"The sun rises in the east."
LLM doesn’t verify if it’s true — it just chooses the most statistically likely word based on training.
Now let’s see how an Agent reacts to the same input.
Agent’s behaviour (policy):
π(a_t ∣ o_{≤t}, m_t)
Where:
- a_t: action (e.g., "search", "ask LLM", "query KB")
- o_{≤t}: all observations (input, environment)
- m_t: memory (state, facts)
Steps an Agent might take (a minimal sketch in code follows this list):
- Observation: "The sun rises in the …"
- Recognize the task: this is a factual statement; it needs knowledge.
- Action: query a tool or the LLM for the fact:
  - Use a tool: a_1 = call_knowledge_base("sunrise direction")
  - → returns "east"
- Form the answer: "The sun rises in the east."
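Put together, a toy version of those steps might look like this; `call_knowledge_base` is the hypothetical tool named above, backed here by a single hard-coded fact:

```python
# Toy walkthrough of the steps above; call_knowledge_base is hypothetical and
# backed by a hard-coded fact purely for illustration.
def call_knowledge_base(query: str) -> str:
    return {"sunrise direction": "east"}[query]

observation = "The sun rises in the ..."              # step 1: observe the prompt
task_is_factual = "sun rises" in observation          # step 2: recognize the task (toy check)

if task_is_factual:
    fact = call_knowledge_base("sunrise direction")   # step 3: a_1 = call_knowledge_base(...)
    answer = f"The sun rises in the {fact}."          # step 4: form the answer
    print(answer)                                     # -> The sun rises in the east.
```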
Deep dive into Agent’s inner working
Let’s go deep into the math behind how an Agent processes:
“The sun rises in the …”
We'll model this using the POMDP (Partially Observable Markov Decision Process) framework — the foundation for goal-driven agents — and show how it differs from a stateless LLM.
Agent Framework (POMDP View):
An agent operates over time using a policy:
π(a_t ∣ o_{≤t}, m_t)
where:
- a_t: action (e.g., "search", "ask LLM", "query Knowledge Base")
- o_{≤t}: all observations (input, environment)
- m_t: memory (state, facts)
Agent's goal: maximize the total expected reward:
max_π E[ Σ_t R(s_t, a_t) ]
Where:
- s_t: true (hidden) environment state at time t
- R(s_t, a_t): reward received after taking action a_t
Now, the steps the agent takes for:
“The sun rises in the ….”
Step 1: Observation o_t
The agent reads the user prompt: o_t = "The sun rises in the …"
This is partially observable — it doesn’t fully tell us what the user wants (completion? verification? correction?). So the agent must infer intent.
Step 2: Belief/Memory State (m_t)
Assume the agent has a belief or internal memory, such as:
m_t = { facts: { sunrise: "east" }, user intents: ["complete", "fact-check"] }
This belief state helps the agent remember important information. It typically contains:
Facts:
Knowledge the agent has acquired or been told.
Example: "sunrise": "east"
→ the agent knows the sun rises in the east.
User Intents:
What the user has asked or wants to do.
Example: ["complete", "fact-check"]
→ the user wants the agent to complete a task and also verify some facts.
Purpose:
This structured memory helps the agent make decisions more intelligently — by combining current observations with what it already knows or assumes.
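In code, that structured memory can be as simple as a nested dictionary; the keys mirror the example above, and the exact schema is an assumption:

```python
# A toy belief/memory state m_t, mirroring the example above.
memory = {
    "facts": {"sunrise": "east"},                 # knowledge the agent has acquired
    "user_intents": ["complete", "fact-check"],   # what the agent assumes the user wants
}

# The agent combines the current observation with this memory when deciding what to do.
observation = "The sun rises in the ..."
knows_answer = "sunrise" in memory["facts"]
print(knows_answer)   # True: the agent can answer without calling an external tool
```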
Step 3: Inference
The agent needs to infer the hidden state, i.e. what the user actually wants:
Let:
- s_t: hidden user intent (e.g. "complete sentence", "verify fact")
The agent maintains a belief distribution over states:
b_t(s_t) = P(s_t ∣ o_{≤t}, m_t)
Let's assume the following belief distribution (a short code sketch follows the table):
State (user intent) | Probability |
---|---|
complete sentence | 0.80 |
ask for fact | 0.15 |
irrelevant | 0.05 |
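Writing that table as a dictionary makes it explicit that the belief is a probability distribution over intents; the numbers are the assumed values from the table:

```python
# Belief distribution b_t(s_t) over hidden user intents.
belief = {
    "complete sentence": 0.80,
    "ask for fact": 0.15,
    "irrelevant": 0.05,
}

# A belief is a probability distribution, so it should sum to 1.
assert abs(sum(belief.values()) - 1.0) < 1e-9
print(max(belief, key=belief.get))   # -> 'complete sentence' (the most likely intent)
```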
Step 4: Action Selection
The agent uses its policy π to select the next action:
a_t = argmax_a Σ_{s_t} b_t(s_t) ⋅ Q(s_t, a)
Where:
- Q(s_t, a): estimated value of taking action a in state s_t
[In words: choose the action a with the highest expected value, weighting your estimate of how good each action is in each state, Q(s_t, a), by your belief b_t(s_t) about which state you are actually in.]
Let's now break down each part of that decision for the prompt:
a_t = argmax_a Σ_{s_t} b_t(s_t) ⋅ Q(s_t, a)
"The sun rises in the …"
The agent isn’t 100% sure what the user wants. It considers two possible intentions (states):
State s_t | Meaning | Belief b_t(s_t) |
---|---|---|
complete | User wants to complete the sentence | 0.80 (80% likely) |
verify | User wants to fact-check a statement | 0.20 (20% likely) |
Now the agent thinks:
“Which action should I take?”
Candidate Actions:
Action | Description |
---|---|
LLM_Complete() | Use LLM to complete the sentence |
lookup("sunrise") | Check a tool/knowledge base |
ask_clarification() | Ask user: “Do you want to complete or fact-check?” |
The Agent’s Knowledge: Q-values
Q-values are like ratings of how good each action is in each situation.
Let’s say:
State | Action | Q(s_t, a) |
---|---|---|
complete | LLM_Complete | 0.8 |
complete | lookup | 0.9 |
verify | LLM_Complete | 0.5 |
verify | lookup | 0.95 |
Expected Value of Each Action:
Now compute the expected value of each action across both possible states:
For LLM_Complete:
EV = 0.8 ⋅ 0.8 + 0.2 ⋅ 0.5 = 0.64 + 0.10 = 0.74
For lookup("sunrise"):
EV = 0.8 ⋅ 0.9 + 0.2 ⋅ 0.95 = 0.72 + 0.19 = 0.91
For ask_clarification (say we assign a value of 0.5 in both states):
EV = 0.8 ⋅ 0.5 + 0.2 ⋅ 0.5 = 0.40 + 0.10 = 0.50
Agent chooses:
a_t = argmax_a EV(a) = lookup("sunrise")
Because it gives the highest expected benefit, given uncertainty.
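The whole of Step 4 fits in a few lines of Python; the beliefs and Q-values below are the assumed numbers from the tables above:

```python
# Expected value of each action: EV(a) = sum over states s of b_t(s) * Q(s, a).
belief = {"complete": 0.8, "verify": 0.2}        # b_t(s_t)

q_values = {                                     # Q(s_t, a), from the table above
    ("complete", "LLM_Complete"): 0.8,
    ("complete", "lookup"): 0.9,
    ("verify", "LLM_Complete"): 0.5,
    ("verify", "lookup"): 0.95,
    ("complete", "ask_clarification"): 0.5,      # assumed flat value for clarification
    ("verify", "ask_clarification"): 0.5,
}

actions = ["LLM_Complete", "lookup", "ask_clarification"]
ev = {a: round(sum(belief[s] * q_values[(s, a)] for s in belief), 2) for a in actions}

print(ev)                   # {'LLM_Complete': 0.74, 'lookup': 0.91, 'ask_clarification': 0.5}
print(max(ev, key=ev.get))  # -> 'lookup', the action the agent chooses
```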
Step 5: Execute Action → Get Result
result = lookup("sunrise") = "east"
Now the agent updates its memory: m_{t+1} = m_t ∪ {completion: "east"}
Step 6: Generate Output
Use a template or the LLM to form the final answer:
"The sun rises in the east."
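Steps 5 and 6 in code, reusing the hypothetical lookup tool and a simple output template:

```python
# Steps 5-6; lookup() is the same hypothetical tool as above.
def lookup(query: str) -> str:
    return {"sunrise": "east"}[query]

memory = {"facts": {"sunrise": "east"}, "user_intents": ["complete", "fact-check"]}

result = lookup("sunrise")                                  # step 5: execute the chosen action
memory["completion"] = result                               # m_{t+1} = m_t ∪ {completion: "east"}

answer = f"The sun rises in the {memory['completion']}."    # step 6: template-based output
print(answer)                                               # -> The sun rises in the east.
```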
Conclusion:
TL;DR

Aspect | LLM | Agent |
---|---|---|
What it is | A next-token predictor: P(w_t ∣ w_{<t}; θ) | A policy over actions: π(a_t ∣ o_{≤t}, m_t) |
State | Stateless; only sees the prompt context | Keeps memory/belief state m_t across steps |
Behaviour | Outputs the statistically most likely next token | Infers intent, selects actions, calls tools |
Objective | Minimize cross-entropy loss | Maximize expected reward Σ_t R(s_t, a_t) |