Reinforcement Learning

An Overview of Bandits

In this post, we will take a look at the bandit problem and discuss some solution strategies. This is a fairly introductory overview, so a basic understanding of probability should be enough to follow along. Most of the posts on my page cover RL and related topics, and this post is no different. If you are already familiar with some RL, then the best way to understand bandits is as a simplified RL problem.

Inverse Reinforcement Learning

This is a review of the paper Algorithms for Inverse Reinforcement Learning. I recommend picking up some reinforcement learning (RL) basics before you read this. The first couple of posts from the RL course on my page might be a good starting point. Inverse RL (IRL) is a topic I’ve been interested in recently, so I’m excited to write this post. So let’s get cracking!

Modeling RL Problems

Disclaimer: The content of this article does not come from any textbook or other reliable source. It consists of observations drawn purely from my very limited experience with RL. I recommend that you gather some RL basics before you read this article. The first couple of posts from the course on my page could be a good start. In this article, I’m going to talk about something that I haven’t seen discussed anywhere before, but I’m going to take a shot at it.

Bridging the Gaps With Reinforcement Learning

In this post, I will be talking about a unique way to use reinforcement learning (RL) in deep learning applications. I definitely recommend brushing up on some deep learning fundamentals and, if possible, some policy gradient fundamentals as well before you get started with this post. Traditionally, RL is used to solve sequential decision-making problems in video games, robotics, or any other domain where there is a concrete RL task at hand.

Deep RL for IR applications

Developing RL techniques for IR and NLP applications

The Human Swarm Project

Developed RL policies to control a swarm of drones to save humans from a forest fire. For the IUSSTF-Viterbi programme, 2019.