In this post, we will take a look at the bandit problem and discuss some solution strategies. This is a fairly introductory overview, so a basic understanding of probability should be enough to get through it. Like most posts on my page, this one is about RL and related topics. If you are already familiar with some RL, the best way to understand bandits is as a simplified RL problem.

This is a review of the paper Algorithms for Inverse Reinforcement Learning. I recommend some reinforcement learning (RL) basics before you read this. The first couple of posts from the RL course on my page might be a good starting point.
Inverse RL (IRL) is a topic I’ve been interested in recently, so I’m excited to write this post. So let’s get cracking!
The Problem The Inverse RL problem.

Disclaimer: The content of this article does not come from any textbook or other reliable source. These are observations made purely from my very limited experience with RL.
I recommend that you gather some RL basics before you proceed to read this article. The first couple of posts from the course on my page could be a good start.
In this article, I’m going to talk about something that I haven’t seen discussed anywhere before, but I’m going to take a shot at it.

In this post, I will be talking about a unique way to use reinforcement learning (RL) in deep learning applications. I definitely recommend brushing up on some deep learning fundamentals and, if possible, some policy gradient fundamentals as well before you get started with this post.
Traditionally, RL is used to solve sequential decision-making problems in video games, robotics, or any other domain where there is a concrete RL task at hand.

Developing RL techniques for IR and NLP applications

Developed RL policies to control a swarm of drones to save humans from a forest fire, for the [IUSSTF-Viterbi programme](https://www.iusstf.org/program/iusstf-viterbi-program) 2019.
