Exploring Reinforcement Learning: Can AI Learn to Play QWOP? | DigiKey

Reinforcement learning is a form of machine learning where AI agents improve at some task through experience interacting with an environment. In this instance, the environment is the video game QWOP. After much trial and error, the agent learns to “scoot” safely along the ground. You can find a written tutorial here walking you through the process of training the agent shown in the video:

https://www.digikey.com/en/maker/projects/teach-an-ai-to-play-qwop/ce7e360e67ae4017809be3576385ae5e

The code for the project can be found here: https://github.com/ShawnHymel/qwop-ai

In the video, Shawn asks a few friends to provide a demonstration of the deceptively simple game, QWOP, which involves using the q, w, o, and p keys to move a runner’s legs. The goal is to make the character run 100 meters. Thanks to the ragdoll physics and almost no ground friction, the results are often hilarious, and making it more than a few meters is frustratingly difficult.

We then build an agent that uses reinforcement learning to play the game through trial and error. Rewards are based on the distance traveled, and the agent can choose one of five actions: no action, press q, press w, press q and p together, or press w and o together.
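As a rough illustration of the action set and reward described above, the sketch below maps each of the five discrete actions to the keys it holds down. The names `ACTION_KEYS` and `distance_reward` are hypothetical, and rewarding the change in distance between steps is an assumption; the description only says that rewards are based on distance.

```python
# Discrete action index -> keys held down for that step.
# The five actions come from the description; the index order is an assumption.
ACTION_KEYS = {
    0: (),          # no action
    1: ("q",),      # press q
    2: ("w",),      # press w
    3: ("q", "p"),  # press q and p together
    4: ("w", "o"),  # press w and o together
}


def distance_reward(prev_distance, distance):
    """Assumed reward shaping: meters gained since the previous step.

    Moving backward yields a negative reward.
    """
    return distance - prev_distance
```

A per-step difference like this rewards forward progress immediately rather than only at the end of an episode, which is one common way to make a distance-based reward easier to learn from.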

Observations are constructed by taking screenshots at a rate of around 15 frames per second and stacking four frames together so that the agent can judge speed, angular velocity, etc. We use OpenCV to resize the images and convert them to grayscale. Tesseract provides optical character recognition (OCR) to extract the distance run (shown at the top of the screen) and to detect whether the game has reached a terminal state (by looking for the words “press space”).
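The frame stacking and OCR post-processing described above might look something like the following. This is a minimal sketch using only the standard library: frames are treated as opaque objects (in the project they would be resized grayscale images), and the OCR output is assumed to arrive as a plain string. `FrameStack`, `parse_distance`, and `is_terminal` are hypothetical names, not taken from the project code.

```python
import re
from collections import deque


class FrameStack:
    """Keeps the most recent `size` frames, oldest first."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, frame):
        # Fill the stack with the first frame so the observation
        # always contains `size` entries, even at the start of an episode.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return list(self.frames)

    def push(self, frame):
        # Appending to a full deque silently drops the oldest frame.
        self.frames.append(frame)
        return list(self.frames)


def parse_distance(ocr_text):
    """Return the first (possibly negative) decimal number in the text, or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", ocr_text)
    return float(match.group()) if match else None


def is_terminal(ocr_text):
    """Treat the episode as over when the OCR text contains 'press space'."""
    return "press space" in ocr_text.lower()
```

Returning `None` when no number is found lets the caller decide how to handle an OCR miss, for example by reusing the previous distance.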

We wrap these interactions into a class using gymnasium. We then pass this environment wrapper to Stable Baselines, which trains our agent using the Proximal Policy Optimization (PPO) algorithm. In the end, the agent was able to make the character scoot safely along the ground to the 50-meter mark, where it struggled to make it over the hurdle. Without relying on imitation learning or a much larger amount of hyperparameter tuning, this is probably the best we will get for now.
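To make the wrapper idea concrete, here is a standard-library sketch of the `reset`/`step` contract that Gymnasium environments follow: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. It is not the project's environment. `FakeQwopGame` is a hypothetical stand-in for the browser game, the per-action distances are invented for illustration, and the class does not subclass `gymnasium.Env` so that it runs without extra dependencies.

```python
class FakeQwopGame:
    """Hypothetical stand-in for the game: each action moves a fixed distance."""

    # Invented distances per action index, for illustration only.
    STEP = {0: 0.0, 1: 0.1, 2: 0.1, 3: 0.3, 4: 0.3}

    def __init__(self):
        self.distance = 0.0

    def restart(self):
        self.distance = 0.0

    def apply(self, action):
        self.distance += self.STEP[action]


class QwopLikeEnv:
    """Follows the Gymnasium reset/step return conventions (duck-typed)."""

    def __init__(self, game, goal=100.0, max_steps=1000):
        self.game = game
        self.goal = goal            # distance that ends the episode successfully
        self.max_steps = max_steps  # time limit, reported as truncation
        self.steps = 0
        self.prev_distance = 0.0

    def reset(self, seed=None, options=None):
        self.game.restart()
        self.steps = 0
        self.prev_distance = 0.0
        return self._observe(), {}

    def step(self, action):
        self.game.apply(action)
        self.steps += 1
        distance = self.game.distance
        reward = distance - self.prev_distance  # assumed distance-delta reward
        self.prev_distance = distance
        terminated = distance >= self.goal
        truncated = self.steps >= self.max_steps
        return self._observe(), reward, terminated, truncated, {"distance": distance}

    def _observe(self):
        # A real wrapper would return the stacked grayscale frames here.
        return [self.game.distance]
```

With the real libraries installed, a class like this would subclass `gymnasium.Env`, declare `observation_space` and `action_space`, and be handed to Stable Baselines3 along the lines of `PPO("CnnPolicy", env).learn(total_timesteps=...)`; the policy name and the number of timesteps are assumptions here, not settings taken from the video.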

Related Videos: Intro to Edge AI - https://www.youtube.com/watch?v=Ejld8XZmvwE

Related Articles: https://www.digikey.com/en/maker/projects/teach-an-ai-to-play-qwop/ce7e360e67ae4017809be3576385ae5e

Learn more:
Maker.io - https://www.digikey.com/en/maker
DigiKey’s Blog, TheCircuit - https://www.digikey.com/en/blog
Connect with DigiKey on Facebook - https://www.facebook.com/digikey.electronics/
Follow us on Twitter - https://twitter.com/digikey

00:00 - Intro to QWOP
01:11 - QWOP attempts
03:11 - Intro to reinforcement learning
06:15 - Creating a custom gymnasium environment
17:29 - Creating a custom Weights and Biases logger
19:48 - Train reinforcement learning agent
21:20 - Check the agent’s performance
23:45 - Test the agent
24:50 - Going further to train a better agent
25:34 - Conclusion

6/5/2023 5:58:26 PM
