Benchmarks

Strong scaling

An artificial Markov Decision Process (MDP) with 100,000 states is generated, in which each state transitions on average to 150 other states and the stage cost is uniformly distributed. It was solved on the Euler cluster of ETH Zurich; the number of ranks corresponds to the number of cores. The strong scaling plot suggests a parallel fraction of 97% in the solving part.

_images/speedupthesis.png
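The quoted 97% follows from fitting Amdahl's law to the measured speedups. A minimal sketch of the model (the specific core counts below are illustrative, not the ones from the plot):

```python
def amdahl_speedup(p: int, f: float) -> float:
    """Amdahl's law: ideal speedup on p cores when a fraction f of
    the work is perfectly parallel and (1 - f) is serial."""
    return 1.0 / ((1.0 - f) + f / p)

# With f = 0.97, the speedup saturates well below linear scaling:
for p in (1, 8, 64, 512):
    print(p, round(amdahl_speedup(p, 0.97), 1))
```

Fitting `f` to the measured speedup curve is what yields the 97% estimate; the asymptotic speedup is bounded by 1/(1 - f) ≈ 33.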

PyMDPToolbox forest example

_images/Benchmarkpymdptoolboxsparse.png

While madupite was designed to run in parallel on clusters, it also achieves competitive performance on a single core. Here we compare the performance of madupite and PyMDPToolbox on the forest management scenario. The benchmark ran on a single core of an 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz. The code can be found in the benchmarks folder of the GitHub repository. The diagram shows the minimum, maximum, and mean over 10 runs for each algorithm and number of states. PI, OPI, and VI correspond to the policy iteration, modified policy iteration, and value iteration algorithms implemented in PyMDPToolbox. Due to the choice of stopping criterion, OPI and VI in PyMDPToolbox only return an optimal policy without the associated optimal cost, which explains the performance gap between VI and mdplib.
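To illustrate the distinction the stopping criterion makes, here is a minimal NumPy sketch of value iteration that computes both the optimal cost vector and a greedy policy; this is an illustration of the algorithm, not the PyMDPToolbox or madupite implementation, and the tolerance and shapes are assumptions:

```python
import numpy as np

def value_iteration(P, g, discount=0.95, tol=1e-8, max_iter=10_000):
    """P: (A, S, S) transition tensor, g: (S, A) stage costs.
    Returns the optimal cost vector V and a greedy policy."""
    S = g.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = g[s, a] + discount * sum_t P[a, s, t] * V[t]
        Q = g + discount * np.einsum("ast,t->sa", P, V)
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)
```

A solver that only has to return the policy can terminate as soon as the greedy policy stops changing, which typically happens many iterations before `V` itself converges to within `tol` of the optimal cost.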

AI-Toolbox tiger-antelope example

_images/aitoolbox3.png

Using the same default options as in the previous example, we compare the performance of madupite and AI-Toolbox on the tiger-antelope example from AI-Toolbox. The MDP models a tiger chasing an antelope on a discrete square grid. The state is encoded by the x- and y-coordinates of both animals. The action determines to which of the four adjacent cells the tiger moves. The antelope moves randomly: with probability 1/5 each, it moves to one of its four adjacent cells or remains in its current position, except that it never moves onto the tiger's position when the tiger occupies an adjacent cell. The state in which antelope and tiger occupy the same position is modelled as an absorbing state, i.e. the system remains in that state regardless of the action. The plot shows how madupite can outperform AI-Toolbox by using multiple cores. Furthermore, creating the MDP for a square of size 14 takes less than 3 seconds with madupite but 90 seconds with AI-Toolbox.
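The "state encoded by the x- and y-coordinates of both animals" can be sketched as a row-major packing of the four coordinates into a single index; the actual index layout used by AI-Toolbox may differ, so treat this as an illustration of the state-space size rather than the library's encoding:

```python
N = 14  # grid side length (the "square of size 14" from the benchmark)

def encode(tx, ty, ax, ay, n=N):
    """Pack tiger (tx, ty) and antelope (ax, ay) coordinates
    into a single state index in [0, n**4)."""
    return ((tx * n + ty) * n + ax) * n + ay

def decode(s, n=N):
    """Invert encode: recover the four coordinates from the index."""
    s, ay = divmod(s, n)
    s, ax = divmod(s, n)
    tx, ty = divmod(s, n)
    return tx, ty, ax, ay

assert decode(encode(3, 7, 1, 12)) == (3, 7, 1, 12)
print(N ** 4)  # 38416 states for a 14x14 grid
```

The state count grows as N^4, which is why MDP construction time dominates for AI-Toolbox at this grid size.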