1. What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment to maximize cumulative reward.

In RL, an agent:

  • Perceives the environment’s state

  • Takes actions

  • Receives feedback (reward or penalty)

  • Learns a policy that maximizes long-term reward

It’s trial-and-error learning, where the agent explores and improves over time.


2. Key Components of RL

ComponentDescription
AgentLearner or decision maker
EnvironmentExternal system the agent interacts with
State (S)Current situation of the environment
Action (A)Choices the agent can make
Reward (R)Feedback signal (positive or negative)
Policy (π)Strategy mapping states to actions
Value Function (V)Expected long-term reward from a state
Q-Function (Q)Expected reward of taking action a in state s

3. The RL Cycle

  1. Agent observes the state

  2. It selects an action based on its policy

  3. The environment returns a new state and a reward

  4. Agent updates its policy based on experience


4. Types of Reinforcement Learning

(a) Model-Free RL

  • Learns directly from experience

  • No knowledge of environment dynamics

Examples:

  • Q-Learning

  • SARSA

(b) Model-Based RL

  • Learns a model of the environment (transition and reward functions)

  • Uses the model to plan


Q-Learning (Off-policy)

  • Learns Q-values:

  • Doesn’t follow the same policy it updates

SARSA (On-policy)

  • Learns Q-values while following current policy

Deep Q-Networks (DQN)

  • Combines Q-learning with neural networks

  • Handles high-dimensional state spaces (like games)


6. Exploration vs Exploitation

  • Exploration: Try new actions to discover better rewards

  • Exploitation: Use known actions that give high reward

A balance is needed (e.g., using ε-greedy policy).


7. Reward Signal

The reward is crucial in shaping agent behavior:

  • Immediate reward teaches short-term behavior

  • Discounted future rewards (via discount factor γ) encourage long-term planning


8. Applications of RL

  • Game Playing: AlphaGo, Dota bots

  • Robotics: Learning to walk, grip objects

  • Recommendation Systems

  • Autonomous Vehicles

  • Industrial Control Systems


9. Challenges in RL

  • Delayed rewards: Action effects seen after many steps

  • Exploration complexity: Large action/state spaces

  • Sample inefficiency: Requires lots of interaction data

  • Stability and convergence in training (especially deep RL)


Summary

FeatureReinforcement Learning
Learning StyleTrial-and-error
Feedback TypeScalar reward signal
GoalMaximize total reward
Best ForSequential decision-making
Core IdeaLearn from interaction with environment

Reinforcement learning is like teaching a dog tricks by giving it treats—it improves by trying, failing, and adjusting its behavior to get more rewards over time.