Exploring Reinforcement Learning: Can AI Learn to Play QWOP? | DigiKey
Reinforcement learning is a form of machine learning where AI agents improve at some task through experience interacting with an environment. In this instance, the environment is the video game QWOP. After much trial and error, the agent learns to “scoot” safely along the ground. You can find a written tutorial here walking you through the process of training the agent shown in the video:
https://www.digikey.com/en/maker/projects/teach-an-ai-to-play-qwop/ce7e360e67ae4017809be3576385ae5e
The code for the project can be found here: https://github.com/ShawnHymel/qwop-ai
In the video, Shawn asks a few friends to provide a demonstration of the deceptively simple game, QWOP, which involves using the q, w, o, and p keys to move a runner’s legs. The goal is to make the character run 100 meters. Thanks to the ragdoll physics and almost no ground friction, the results are often hilarious, and making it more than a few meters is frustratingly difficult.
We then build an AI agent that uses reinforcement learning to play the game through trial and error. Rewards are based on the distance traveled, and the agent can choose one of five actions: no action, press q, press w, press q and p together, or press w and o together.
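As a rough illustration of the five-action setup, here is a minimal Python sketch. The index order, the names ACTIONS, keys_for_action, and distance_reward, and the choice to reward the change in distance between steps are all assumptions made for this example; the description only says that rewards are based on distance, so the linked project may compute them differently.

```python
# Hypothetical mapping from a discrete action index to the keys held down.
# The five actions come from the description; the index order is assumed.
ACTIONS = {
    0: (),          # no action
    1: ("q",),      # press q
    2: ("w",),      # press w
    3: ("q", "p"),  # press q and p together
    4: ("w", "o"),  # press w and o together
}


def keys_for_action(action: int) -> tuple:
    """Return the keys to hold for a given action index."""
    return ACTIONS[action]


def distance_reward(prev_distance: float, distance: float) -> float:
    """Assumed reward scheme: progress made since the previous step.

    Moving backward produces a negative reward.
    """
    return distance - prev_distance
```

With this scheme, an agent that moves from 1.0 to 1.5 meters in one step would receive a reward of 0.5 for that step.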
Observations are constructed by taking screenshots at around 15 frames per second and stacking four frames together so that the agent can judge speed, angular velocity, etc. We use OpenCV to resize the images and convert them to grayscale. We use Tesseract for optical character recognition (OCR) to extract the distance run (shown at the top of the screen) and to detect whether the game has reached a terminal state (by looking for the words “press space” to appear).
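The sketch below shows one way the OCR output and the four-frame stack could be handled once the screenshot and Tesseract steps have already run. It uses only the Python standard library, and it assumes the OCR text contains the distance as a decimal number; parse_distance, is_terminal, and FrameStack are hypothetical names for this example, not functions from the project.

```python
import re
from collections import deque

# Matches the first signed decimal number in a string, e.g. "-1.5" or "12".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_distance(ocr_text: str):
    """Return the first number found in the OCR text, or None if OCR failed."""
    match = _NUMBER.search(ocr_text)
    return float(match.group()) if match else None


def is_terminal(ocr_text: str) -> bool:
    """The episode is over once the words "press space" appear on screen."""
    return "press space" in ocr_text.lower()


class FrameStack:
    """Keeps the most recent frames so the agent can infer motion."""

    def __init__(self, size: int = 4):
        self.size = size
        self.frames = deque(maxlen=size)

    def reset(self, frame):
        """Fill the stack with copies of the first frame of an episode."""
        self.frames.clear()
        for _ in range(self.size):
            self.frames.append(frame)
        return list(self.frames)

    def push(self, frame):
        """Add the newest frame; the oldest one is dropped automatically."""
        self.frames.append(frame)
        return list(self.frames)
```

Returning None when no number is found lets the caller decide how to handle a bad OCR read, for example by reusing the previous distance.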
We wrap these interactions in a class using gymnasium. We then pass this environment wrapper to Stable Baselines, which trains our agent using the Proximal Policy Optimization (PPO) algorithm. In the end, the agent was able to make the character scoot safely along the ground to the 50-meter mark, where it struggled to make it over the hurdle. Without relying on imitation learning or much more hyperparameter tuning, this is probably the best we will get for now.
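To show the shape of the interface that a gymnasium environment exposes, here is a dependency-free stand-in. It does not import gymnasium or control the real game: FakeQwopEnv, its made-up PROGRESS table, and run_episode are hypothetical, and the observation is simply the distance rather than stacked screenshots. A real environment would subclass gymnasium.Env and define action and observation spaces, but reset and step would return values in the same order.

```python
class FakeQwopEnv:
    """Toy stand-in that mimics the reset/step contract of gymnasium."""

    # Made-up distance gained per action index; the real game uses key presses.
    PROGRESS = (0.0, 0.1, -0.1, 0.3, 0.2)

    def __init__(self, max_steps: int = 100, goal: float = 100.0):
        self.max_steps = max_steps
        self.goal = goal
        self.distance = 0.0
        self.steps = 0

    def reset(self, seed=None, options=None):
        """Start a new episode and return (observation, info)."""
        self.distance = 0.0
        self.steps = 0
        return self.distance, {}

    def step(self, action: int):
        """Apply one action; return (obs, reward, terminated, truncated, info)."""
        delta = self.PROGRESS[action]
        self.distance += delta
        self.steps += 1
        terminated = self.distance >= self.goal   # reached the finish line
        truncated = self.steps >= self.max_steps  # ran out of time
        return self.distance, delta, terminated, truncated, {"distance": self.distance}


def run_episode(env, policy):
    """Run one episode with a policy that maps an observation to an action."""
    obs, info = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return total_reward, info
```

A training library plays the role of run_episode here: it calls reset and step repeatedly and uses the returned rewards to update the policy.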
Related Videos: Intro to Edge AI - https://www.youtube.com/watch?v=Ejld8XZmvwE
Related Articles: https://www.digikey.com/en/maker/projects/teach-an-ai-to-play-qwop/ce7e360e67ae4017809be3576385ae5e
Learn more:
Maker.io - https://www.digikey.com/en/maker
DigiKey’s Blog – TheCircuit https://www.digikey.com/en/blog
Connect with DigiKey on Facebook https://www.facebook.com/digikey.electronics/
And follow us on Twitter https://twitter.com/digikey
00:00 - Intro to QWOP
01:11 - QWOP attempts
03:11 - Intro to reinforcement learning
06:15 - Creating a custom gymnasium environment
17:29 - Creating a custom Weights and Biases logger
19:48 - Train reinforcement learning agent
21:20 - Check the agent’s performance
23:45 - Test the agent
24:50 - Going further to train a better agent
25:34 - Conclusion

