Debmalya’s Substack

LLM based fine-tuning of Reinforcement Learning Agents

Reinforcement Learning Agents for Industrial Control Systems

Debmalya Biswas
Dec 30, 2024

1. Introduction

AI agents are the current hype. I have written about them, and others are discussing them too. All this attention, however, has also created a lot of confusion regarding what agentic AI systems are and how they differ from generative AI (Gen AI). Fig. 1 below tries to bring some clarity to this debate by showing the evolution of agentic AI systems.

Fig. 1: Agentic AI evolution (Image by Author)

Given a user task, the goal of an agent platform is to identify an agent (or a group of agents) capable of executing that task. The first component we need is therefore an orchestration layer that decomposes the task into sub-tasks, with the execution of the respective agents coordinated by an orchestration engine. Today, this task decomposition is typically done by prompting an LLM, which is where agentic AI overlaps with Gen AI.
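The decompose-and-dispatch step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. `decompose` is a hypothetical stand-in for the LLM prompt and simply returns a fixed plan. `AGENTS` is a hypothetical registry that maps a capability name to a callable agent. Neither belongs to any specific framework, and a real orchestration engine would also handle planning failures, retries and monitoring.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    capability: str   # which kind of agent should execute this sub-task
    description: str  # human-readable description of the sub-task


def decompose(task: str) -> list[SubTask]:
    """Hypothetical stand-in for prompting an LLM to decompose a task.

    A real platform would send `task` to an LLM and parse the returned plan;
    this stub always returns the same two-step plan.
    """
    return [
        SubTask("retrieve", f"collect documents relevant to: {task}"),
        SubTask("summarize", "summarize the collected documents"),
    ]


# Hypothetical agent registry: capability name -> agent callable.
AGENTS: dict[str, Callable[[str], str]] = {
    "retrieve": lambda text: f"docs for '{text}'",
    "summarize": lambda text: f"summary of {text}",
}


def orchestrate(task: str) -> list[tuple[str, str]]:
    """Execute the sub-tasks in order, feeding each agent's output to the next."""
    trace = []
    current = task
    for sub in decompose(task):
        agent = AGENTS.get(sub.capability)
        if agent is None:
            raise LookupError(f"no agent registered for '{sub.capability}'")
        current = agent(current)
        trace.append((sub.capability, current))
    return trace


print(orchestrate("Q3 sales report"))
```

The returned trace pairs each capability with the output of its agent. The summarizer therefore works on the retriever's output rather than on the raw task, which mirrors the sequential orchestration described above.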

Unfortunately, this also means that agentic AI today is limited by the reasoning capabilities of large language models (LLMs).

The agent platform then monitors the execution and its environment, and adapts autonomously. Given the long-running nature of such complex tasks, memory management is key for agentic AI systems. The current solution is to use vector databases (Vector DBs) to store the agent memory externally, making data items accessible as needed.
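The external-memory pattern can be illustrated with a toy, in-process stand-in for a vector DB. The sketch makes two assumptions. First, a bag-of-words `embed` function replaces a real embedding model. Second, a brute-force cosine-similarity scan replaces a vector DB's index. `VectorMemory` is a hypothetical class, not the API of any particular product.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): bag-of-words counts instead of a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorMemory:
    """Hypothetical stand-in for an external vector DB holding agent memory."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        # Store the embedding alongside the original memory item.
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank all stored items by similarity to the query (linear scan).
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = VectorMemory()
memory.add("valve 3 pressure exceeded threshold")
memory.add("customer asked for invoice copy")
print(memory.search("pressure alarm on valve 3"))
```

A production setup would replace `embed` with a learned embedding model and the linear scan with the approximate nearest-neighbour index of the chosen vector DB. The agent-facing interface (add, then search by similarity) stays the same.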

It is also important to mention that most use-cases will require integration with enterprise systems (e.g., a CRM system). See, for example, the Model Context Protocol (MCP), recently proposed by Anthropic to connect AI agents to the external systems where enterprise data resides.

A reference architecture for an agentic AI platform with the above components is illustrated in Fig. 2.

Fig. 2: Agentic AI platform reference architecture focusing on RL agents (Image by Author)

In this article, we focus on the AI agent types and their capabilities (highlighted on the right-hand side of Fig. 2), especially RL agents.
(Please refer to our previous article on Stateful & Responsible AI Agents for a deep dive into the other components.)

When we talk about AI agents today, we mostly mean LLM agents, which loosely translates to invoking (prompting) an LLM to perform natural language processing (NLP) tasks, e.g., processing and summarizing documents, or generating responses based on retrieved data. See, for example, the “researcher” agent scenario outlined by LangGraph.

It is important to mention, however, that some agentic tasks may be better suited to other machine learning (ML) techniques, e.g., reinforcement learning (RL) or predictive analytics, depending on the use-case objectives.

In this article, we focus on RL agents and show how LLMs can be used to fine-tune the RL reward and policy functions.
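The article's specific approach is not reproduced here. As a generic illustration of how an external judge such as an LLM could shape an RL agent's reward, the sketch below applies potential-based reward shaping (Ng, Harada and Russell, 1999) to tabular Q-learning on a toy setpoint-control task. The following are all assumptions made for illustration:

- the environment and its setpoint,
- the hyperparameters,
- `llm_potential`, a hypothetical stub that stands in for an LLM's assessment of how promising a state is.

```python
import random

TARGET = 5              # hypothetical setpoint (e.g., a temperature level)
STATES = range(11)      # discrete process levels 0..10
ACTIONS = (-1, 0, 1)    # decrease, hold, increase


def step(state: int, action: int) -> tuple[int, float]:
    """Toy plant dynamics: move one level, clipped to [0, 10]."""
    next_state = min(max(state + action, 0), 10)
    reward = -abs(next_state - TARGET)  # base reward: distance from the setpoint
    return next_state, float(reward)


def llm_potential(state: int) -> float:
    """Hypothetical stand-in for an LLM scoring how promising a state is.

    A real system might prompt an LLM with a description of the state;
    here the score is hard-coded as the negative distance to the setpoint.
    """
    return -abs(state - TARGET)


def q_learning(episodes=500, steps=20, alpha=0.1, gamma=0.9, eps=0.1,
               shaping=True, seed=0):
    """Tabular Q-learning with optional potential-based reward shaping."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(STATES)
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r = step(s, a)
            if shaping:
                # Potential-based shaping term: F = gamma * phi(s') - phi(s).
                r += gamma * llm_potential(s2) - llm_potential(s)
            td_target = r + gamma * max(q[(s2, act)] for act in ACTIONS)
            q[(s, a)] += alpha * (td_target - q[(s, a)])
            s = s2
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in STATES}


policy = greedy_policy(q_learning())
print(policy)
```

The shaping term has the form F = γΦ(s') − Φ(s), so it is known to leave the optimal policy of the underlying task unchanged while giving the agent denser feedback. An arbitrary LLM-generated bonus added to the reward would not carry that guarantee.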

2. Reinforcement Learning for Industrial Control Systems

© 2025 Debmalya Biswas