This repository contains various simulations of agents that currently don't think, built with state-of-the-art simulators like Nvidia Isaac Gym, PyBullet, MuJoCo, and Unity. I will train different Reinforcement Learning algorithms on these agents across all of these simulators and test how each algorithm performs.
Naming Prefix | Description |
---|---|
HKIsaac__ | Simulations in Isaac Gym |
HKPyBullet__ | Simulations in PyBullet |
HKMuJuCo__ | Simulations in MuJoCo |
- Nvidia AI for Robotics
- OmniIsaacGymEnvs GitHub
- Nvidia Omniverse
- Nvidia Isaac
- Nvidia Isaac Gym Documentation
- Google DeepMind Control Suite
- MuJoCo
- PyBullet
Reinforcement Learning (RL) is a type of machine learning where agents learn to make decisions by interacting with an environment to maximize some notion of cumulative reward. It's characterized by trial-and-error, feedback, and the balance between exploration of uncharted territory and exploitation of current knowledge.
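The balance between exploration and exploitation can be made concrete with a small sketch. The example below is an illustration only, not code from this repository: it assumes a hypothetical multi-armed bandit with Gaussian rewards and uses the epsilon-greedy rule, where the function name `run_bandit` and all parameter values are made up for the example.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a hypothetical Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # how often each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even if it currently looks bad.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Feedback from the (simulated) environment: a noisy reward.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental mean update of the pulled arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


# Assumed example values: three arms, the last one pays best on average.
estimates, counts, total_reward = run_bandit([0.2, 0.5, 1.5])
print("pull counts per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

With `epsilon=0.1`, roughly one step in ten is spent exploring; the rest of the time the agent exploits what it has learned so far, so most pulls should end up on the best-paying arm.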
An Agent is an entity capable of perceiving its environment, making decisions on what actions to take, and learning from the outcomes of these actions. The agent's goal is to find the best strategy, or policy, that will maximize the cumulative rewards over time. This process involves observing the environment, executing actions, receiving rewards, and updating the policy based on the learned experiences.
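The observe, act, receive reward, update cycle described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the setup used in this repository: `Corridor` is a hypothetical five-cell toy environment, `QAgent` is a hypothetical tabular Q-learning agent, and the hyperparameters are arbitrary.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, reward 1.0 on reaching cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class QAgent:
    """Hypothetical tabular Q-learning agent with an epsilon-greedy policy."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Decide: mostly follow the current policy, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        row = self.q[state]
        best = max(row)
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def learn(self, state, action, reward, next_state, done):
        # Update the policy from the outcome of the action just taken.
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


env = Corridor()
agent = QAgent(n_states=5, n_actions=2)

for episode in range(200):
    state = env.reset()                                   # observe
    done = False
    while not done:
        action = agent.act(state)                         # decide
        next_state, reward, done = env.step(action)       # act, receive reward
        agent.learn(state, action, reward, next_state, done)  # update
        state = next_state

# Greedy action in each non-terminal cell after training (1 = move right).
greedy_policy = [max(range(2), key=lambda a: agent.q[s][a]) for s in range(4)]
print("greedy policy:", greedy_policy)
```

The same loop shape carries over to the physics simulators listed above, where the state would be sensor or joint readings and the action a motor command, although each simulator exposes its own API for stepping the environment.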
Explanation of the Reinforcement Learning Pipeline:
- Environment: Illustrated by a globe, it represents the world or context in which the RL agent operates. In RL, the environment is where the agent performs actions and receives feedback in the form of states and rewards.
- Reward: Shown as a bag of money with a dollar sign, this symbolizes the reward function. In RL, rewards are given to the agent for performing certain actions, which guide the agent to learn the optimal policy. The reward is the feedback signal used to quantify the success of an action taken in a given state.
- Agent: Depicted as a neural network, this is the learner or decision-maker. The agent decides what actions to take to achieve its goal, based on the policy it learns from interacting with the environment.
- Training: Illustrated with a barbell being lifted by the word "agent", this represents the process of learning. The agent is metaphorically "exercising" through training to improve its policy. Training involves the agent interacting with the environment, receiving rewards, and adjusting its policy accordingly.
- Deployment: The final icon is a robot, representing the deployment phase where the trained agent is put into action in the real world or a production environment. The robot implies that the agent is now capable of performing tasks autonomously.
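
The five stages of the pipeline can be laid out as one short script. The sketch below is an assumption-laden illustration, not this repository's code: `PointEnv`, `reward_fn`, `LinearAgent`, `rollout`, and `train` are hypothetical names, and simple hill climbing on a single policy parameter stands in for a real RL algorithm to keep the example small.

```python
import random


def reward_fn(position):
    """Reward: the closer the point is to the origin, the higher the reward."""
    return -abs(position)


class PointEnv:
    """Environment: a point on a line that the agent pushes toward the origin."""

    def __init__(self, start=5.0, horizon=20):
        self.start, self.horizon = start, horizon
        self.x, self.t = start, 0

    def reset(self):
        self.x, self.t = self.start, 0
        return self.x

    def step(self, action):
        self.x += max(-1.0, min(1.0, action))  # clip the push to [-1, 1]
        self.t += 1
        return self.x, reward_fn(self.x), self.t >= self.horizon


class LinearAgent:
    """Agent: a one-parameter policy, action = -gain * observation."""

    def __init__(self, gain=0.0):
        self.gain = gain

    def act(self, obs):
        return -self.gain * obs


def rollout(env, agent):
    """Run one episode and return the cumulative reward."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        total += reward
    return total


def train(env, iterations=200, noise=0.1, seed=0):
    """Training: hill climbing, keeping a perturbed gain only if it scores better."""
    rng = random.Random(seed)
    agent = LinearAgent()
    best = rollout(env, agent)
    for _ in range(iterations):
        candidate = LinearAgent(agent.gain + rng.gauss(0.0, noise))
        score = rollout(env, candidate)
        if score > best:
            agent, best = candidate, score
    return agent


env = PointEnv()
untrained_return = rollout(env, LinearAgent())
trained_agent = train(env)

# Deployment: the trained policy is frozen and simply executed.
deployed_return = rollout(env, trained_agent)
print("untrained return:", untrained_return)
print("deployed return:", deployed_return)
```

Keeping the reward function, environment, agent, training routine, and deployment call as separate pieces mirrors the diagram: any one of them can be swapped, for example replacing `PointEnv` with a simulator-backed environment, without rewriting the others.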