Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them?
I'm attempting to implement the intelligence for a small agent in a game world. The agent is a small two-wheeled robot that can move forward and backward and turn left and right. It has a couple of sensors for detecting a boundary on the ground, a couple of ultrasonic sensors for detecting objects far away, and a couple of bump sensors for detecting contact with an object or opponent. It can also do simple dead reckoning to estimate its position in the world, using its starting position as a reference. The full set of state features available to it is:
edge_detected=0|1, edge_left=0|1, edge_right=0|1, edge_both=0|1
sonar_detected=0|1, sonar_left=0|1, sonar_left_dist=near|far|very_far, sonar_right=0|1, sonar_right_dist=near|far|very_far, sonar_both=0|1
contact_detected=0|1, contact_left=0|1, contact_right=0|1, contact_both=0|1
estimated_distance_from_edge_in_front=near|far|very_far
estimated_distance_from_edge_in_back=near|far|very_far
estimated_distance_from_edge_to_left=near|far|very_far
estimated_distance_from_edge_to_right=near|far|very_far
The goal is to identify the state in which the reward signal is received and learn a policy that acquires that reward as quickly as possible. Represented discretely as a traditional Markov model, this state space has 2,985,984 possible values (2^12 * 3^6), which is far too many to explore exhaustively with tabular methods like Q-learning or SARSA.
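For reference, the state count above can be checked directly from the feature list (12 binary features and 6 three-valued distance features):

```python
# 12 features take values 0|1; 6 features take values near|far|very_far.
n_binary = 12
n_ternary = 6
n_states = (2 ** n_binary) * (3 ** n_ternary)
print(n_states)  # 2985984
```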
Can anyone recommend a reinforcement learning library appropriate for this domain (preferably with Python bindings), or an unimplemented algorithm that I could potentially implement myself?

Answer 1:
Your actual state is the robot's position and orientation in the world. The sensor readings are only an approximation of that state, and are likely to render many distinct states indistinguishable.
Now, if you go down this road, you could use linear function approximation. The state then reduces to just 24 binary features (the 12 0|1 features, plus the 6 near|far|very_far features encoded as 2 bits each). This is such a small number that you could even use all pairs of features for learning. Further down this road is online discovery of feature dependencies (see Alborz Geramifard's paper, for example). This is directly related to your interest in hierarchical learning.
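To make the linear-function-approximation idea concrete, here is a minimal semi-gradient SARSA sketch over a binary feature vector, including the all-pairs expansion mentioned above. The action set `ACTIONS` and all hyperparameters are assumptions for illustration, not anything specified in the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical action set for the two-wheeled robot (not named in the post).
ACTIONS = ["forward", "backward", "turn_left", "turn_right"]

def features(bits):
    """bits: length-24 0/1 vector encoding the state.
    Returns a bias term, the raw features, and all pairwise products."""
    x = np.asarray(bits, dtype=float)
    pairs = np.outer(x, x)[np.triu_indices(len(x), k=1)]
    return np.concatenate(([1.0], x, pairs))

N_FEATURES = 1 + 24 + 24 * 23 // 2  # 301 weights per action

class LinearSarsa:
    """Semi-gradient SARSA with a linear Q-function, one weight vector per action."""

    def __init__(self, alpha=0.05, gamma=0.95, eps=0.1):
        self.w = np.zeros((len(ACTIONS), N_FEATURES))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def q(self, x):
        return self.w @ x  # Q(s, a) for every action at once

    def act(self, x):
        # epsilon-greedy action selection
        if rng.random() < self.eps:
            return int(rng.integers(len(ACTIONS)))
        return int(np.argmax(self.q(x)))

    def update(self, x, a, r, x_next, a_next, done):
        # TD target uses the *next chosen* action (on-policy SARSA).
        target = r if done else r + self.gamma * self.q(x_next)[a_next]
        td_error = target - self.q(x)[a]
        self.w[a] += self.alpha * td_error * x
```

With 301 weights per action instead of nearly three million table entries, every experienced transition generalizes to all states sharing the same active features.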
An alternative is to use a conventional algorithm to track the robot's position and use the position as input to RL.
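Since the question mentions the robot already does dead reckoning, the position-tracking step could be as simple as standard differential-drive odometry. This is a generic textbook update, not code from any particular library, and the parameter names are illustrative:

```python
import math

def dead_reckon(x, y, theta, d_left, d_right, wheel_base):
    """One differential-drive odometry step.
    d_left/d_right: distance each wheel traveled since the last update.
    wheel_base: distance between the two wheels."""
    d = (d_left + d_right) / 2.0               # forward travel of the midpoint
    dtheta = (d_right - d_left) / wheel_base   # change in heading
    # Move along the average heading during the step.
    x += d * math.cos(theta + dtheta / 2.0)
    y += d * math.sin(theta + dtheta / 2.0)
    theta = (theta + dtheta) % (2.0 * math.pi)
    return x, y, theta
```

The tracked (x, y, theta) could then be discretized into coarse grid cells and fed to the learner as state, in place of (or alongside) the raw sensor features.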