Solving 4-puzzle with Reinforcement Learning (Q-learning) in Python


This article describes a simple approach to solving the 4-puzzle problem with reinforcement learning (using Q-learning). Not all instances of the 4-puzzle problem are solvable by only shifting the space (represented by 0). Let’s aim at solving only the solvable instances, and only with model-free methods.
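To make the solvability remark concrete, here is a minimal sketch that finds the solvable instances by breadth-first search outward from the goal. It assumes a row-major string encoding of the 2x2 board with ‘0’ as the space and ‘0123’ as the goal; the names ADJACENT, neighbours and solvable_states are illustrative and not taken from the original post.

```python
from collections import deque

GOAL = "0123"  # assumed row-major encoding of the 2x2 board; '0' is the space

# Indices of the cells adjacent to each cell on a 2x2 board laid out as
#   0 1
#   2 3
ADJACENT = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}


def neighbours(state):
    """Yield every state obtained by sliding the space into an adjacent cell."""
    blank = state.index("0")
    for target in ADJACENT[blank]:
        cells = list(state)
        cells[blank], cells[target] = cells[target], cells[blank]
        yield "".join(cells)


def solvable_states(goal=GOAL):
    """Breadth-first search from the goal. Moves are reversible, so every
    state found this way can also reach the goal."""
    seen = {goal}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for nxt in neighbours(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


SOLVABLE = solvable_states()
print(len(SOLVABLE))        # 12: half of the 24 arrangements
print("0132" in SOLVABLE)   # False: tiles 2 and 3 swapped cannot be undone
```

Under these assumptions the search visits 12 of the 24 arrangements, so exactly half of the instances can be solved by shifting the space.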

The Markov Decision Process for the 4-puzzle problem

Consider a 24-state MDP for the 4-puzzle problem, with each state encoded as a flattened string and the goal state being ‘0123’.
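As a rough illustration of this encoding, the sketch below enumerates all 24 states as strings and converts one back to a 2x2 grid. Row-major flattening is an assumption here, and STATES and to_grid are hypothetical names introduced only for this example.

```python
from itertools import permutations

GOAL = "0123"

# Every ordering of the four symbols is one state of the MDP (4! = 24).
STATES = ["".join(p) for p in permutations(GOAL)]


def to_grid(state):
    """Unflatten a state string into a 2x2 grid, assuming row-major order."""
    return [list(state[:2]), list(state[2:])]


print(len(STATES))       # 24
print(to_grid("2103"))   # [['2', '1'], ['0', '3']]
```

Strings are convenient here because they are hashable, so they can serve directly as dictionary keys in a Q-table.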

For each state except the goal state, we have exactly two valid actions from the set of actions {‘Up’, ‘Down’, ‘Left’, ‘Right’}. The goal state is the exception: it has an additional action, ‘STAY’, that allows a self-transition.
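The action rules can be sketched as a small transition function. This is only one possible implementation: it assumes that an action names the direction in which the space moves, that states are row-major strings, and that action names are spelled as in the set above. MOVES, step and valid_actions are hypothetical names.

```python
GOAL = "0123"

# (row, column) offset applied to the space for each action. Assumed
# convention: the action names the direction in which the space moves.
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}


def step(state, action):
    """Return the next state, or None if the action is invalid in `state`."""
    if action == "STAY":
        # Only the goal state is allowed a self-transition.
        return state if state == GOAL else None
    row, col = divmod(state.index("0"), 2)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < 2 and 0 <= new_col < 2):
        return None  # the space would leave the board
    cells = list(state)
    src, dst = 2 * row + col, 2 * new_row + new_col
    cells[src], cells[dst] = cells[dst], cells[src]
    return "".join(cells)


def valid_actions(state):
    """List the actions that are valid in `state`."""
    return [a for a in (*MOVES, "STAY") if step(state, a) is not None]


print(valid_actions("2103"))   # ['Up', 'Right']
print(valid_actions(GOAL))     # ['Down', 'Right', 'STAY']
print(step("2103", "Up"))      # 0123
```

Because the space always sits in a corner of the 2x2 board, two of the four moves are always blocked, which is why every non-goal state ends up with exactly two valid actions.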

The only state-action pairs that have a positive reward of +100 are (‘2103’, ‘UP’), (‘1023’, ‘LEFT’) and (‘1023

