Implementations of Hierarchical Reinforcement Learning


Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them?

I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forward and backwards, and turn left and right. It has a couple sensors for detecting a boundary on the ground, a couple ultrasonic sensors for detecting objects far away, and a couple bump sensors for detecting contact with an object or opponent. It also can do some simple dead reckoning to estimate its position in the world using its starting position as a reference. So all the state features available to it are:

edge_detected=0|1 edge_left=0|1 edge_right=0|1 edge_both=0|1 sonar_detected=0|1 sonar_left=0|1 sonar_left_dist=near|far|very_far sonar_right=0|1 sonar_right_dist=near|far|very_far sonar_both=0|1 contact_detected=0|1 contact_left=0|1 contact_right=0|1 contact_both=0|1 estimated_distance_from_edge_in_front=near|far|very_far estimated_distance_from_edge_in_back=near|far|very_far estimated_distance_from_edge_to_left=near|far|very_far estimated_distance_from_edge_to_right=near|far|very_far

The goal is to identify the state where the reward signal is received, and learn a policy to acquire that reward as quickly as possible. In a traditional Markov model, this state space represented discretely would have 2985984 possible values, which is far too much to explore each and every one using something like Q-learning or SARSA.

Can anyone recommend a reinforcement library appropriate for this domain (preferably with Python bindings) or an unimplemented algorithm that I could potentially implement myself?


Your actual state is the robot's position and orientation in the world. Using these sensor readings is an approximation, since it is likely to render many states indistinguishable.

Now, if you go down this road, you could use linear function approximation. Then this is just 24 binary features (12 0|1 + 6*2 near|far|very_far). This is such a small number that you could even use all pairs of features for learning. Farther down this road is online discovery of feature dependencies (see Alborz Geramifard's paper, for example). This is directly related to your interest in hierarchical learning.

An alternative is to use a conventional algorithm to track the robot's position and use the position as input to RL.


  • Handling custom attributes
  • MySQL 3.23 UNION fails with Error 1064 [closed]
  • Hierarchical Time Series
  • send file using default bluetooth application in android
  • regular expression in grep
  • RegExp word boundary with special characters (.) javascript
  • Does iOS support Bluetooth SPP?
  • Symfony2 FOSUserBundle error - FileLoaderImportCircularReferenceException
  • jquery styles not applied in dynamically creation
  • How to deal with overlapping X-axis labels in DOJO chart?
  • Getting p-values from leave-one-out in R
  • Handle Tab Key Press on WebBrowser Control and Prevent Switching between Html Elements
  • Draw circle on google maps by passing parameters to the url
  • Why can't pass only 1 coulmn to glmnet when it is possible in glm function in R?
  • Re-using zip iterator in python 3 [duplicate]
  • Problems uploading App to Apple's App Store using RoboVM
  • How to fix this floating point square root algorithm
  • How to change a windows path to a linux path in all files under a directory using sed
  • Could not install package 'Microsoft.Owin.Security 2.0.1'
  • How to force refresh on CallLog.Calls.CACHED_NAME column?
  • Submission of new app with iAds
  • c++ search a vector for element first seen position
  • Is there any purpose for h2-h6 headings in HTML5?
  • testing a POST using phpunit in laravel 4
  • Compare struct to a constant in C
  • Activation Function choice for Neural network
  • GAE: Way to get reference to an HttpSession from its ID?
  • Spring boot 2.0.0.M4 required a bean named 'entityManagerFactory' that could not be found
  • Does Mobilefirst provide a provision to access web services directly?
  • What is the purpose of TaskExecutor in spring?
  • GridView breaks while scrolling
  • Blackberry - Custom EditField Cursor
  • Content-Length header not returned from Pylons response
  • iOS: Detect app start via notification press
  • Body moving without any force applied? (Box2d)
  • formatting the colorbar ticklabels with SymLogNorm normalization in matplotlib
  • Why HTML5 Canvas with a larger size stretch a drawn line?
  • what is the difference between the asp.net mvc application and asp.net web application
  • Android Google Maps API OnLocationChanged only called once