Tutorial

mpirun

madupite's distributed sparse linear solvers rely on MPI calls made through PETSc. Therefore, you always need to launch your script with mpirun:

export OMP_NUM_THREADS=1 # optional, see explanation below
mpirun -n <number of ranks> python main.py

The number of MPI ranks is specified by the -n option. While PETSc’s parallelism is based solely on MPI, the underlying BLAS/LAPACK routines may be parallelized with OpenMP, which can be controlled by setting the environment variable OMP_NUM_THREADS. More details can be found in the PETSc manual and in the mpirun documentation of your MPI implementation.

Minimal example

This example demonstrates how to quickly load and solve an MDP.

import madupite as md
with md.PETScContextManager():
    # create an mdp instance and load the transition probabilities and stage-costs
    mdp = md.PyMdp("./transprob.bin", "./stagecost.bin")
    # generate an initial cost and solve the MDP according to the default options
    mdp.solve("./outputDirectory/")

Generate an artificial MDP and solve it with various options

Since the routines for generating MDPs are sequential, they only need to be executed by the main rank, which is why they are wrapped in an if md.MPI_master() statement.

import madupite as md

# every function call should be within the PETScContextManager
with md.PETScContextManager():
    # create an empty mdp instance
    mdp = md.PyMdp()
    # Since we use mpirun, every line of code will be executed by each rank.
    # Parts that should only be run by a single rank must be called inside
    # an if md.MPI_master() statement.
    if md.MPI_master():
        # this function creates a random MDP with 8 states, 4 actions,
        # sparsity 0.5, and uniformly distributed stage costs
        transprob, stagecost = md.generateArtificialMDP(8, 4, 0.5)
        # write the numpy/scipy matrices to PETSc binary format
        md.writePETScBinary(stagecost, "./stagecost.bin")
        md.writePETScBinary(transprob, "./transprob.bin")

    # load the transition probabilities and stage costs
    mdp.loadP("./transprob.bin")
    mdp.loadG("./stagecost.bin")

    # generate an initial cost and overwrite the solver options with the json file
    mdp.generateInitCost("./options.json")
    # as an example we investigate the impact of different discount factors
    for i in range(4):
        mdp["discount factor"] = (i + 1) * 0.99 / 4
        # The first argument specifies the folder where the output files will be stored.
        # solve also accepts a second argument that reads options from
        # a json file and overwrites the existing options.
        mdp.solve("./" + str(i) + "/")
    # Now let's try another inner solver
    mdp["solver"]["ksptype"] = "richardson"
    mdp["solver"]["preconditioner"] = "jacobi"
    # if you don't need an output file you can just assign the
    # empty string to cost filename, policy filename or metadata filename.
    mdp["cost filename"] = ""
    mdp.solve("./jac/")

Input data format

The data for the MDP (the transition probability tensor and the stage-cost matrix) must be provided in PETSc binary format. The pip installation provides a generator for artificial MDPs and a converter from numpy/scipy matrices to PETSc binary.

import madupite as md
md.writePETScBinary(matrix, "path/to/matrix.bin")

If you want to create your own MDP, you must match the data layout expected by this solver:

  • stagecost matrix: \(G_{i,j}=\) “expected stage-cost of applying input j in state i”.

  • The transition probabilities are often expressed as a 3-dimensional tensor. In order to make use of the parallel matrix layout from PETSc, the first two dimensions must be combined:

\[P_{i,j,k}= \text{"transition probability from state i to k given input j"}\]

is flattened into a matrix:

\[\begin{split}i&=\left\lfloor\frac{x}{\#\text{actions}}\right\rfloor \\ a&=x\bmod \#\text{actions} \\ P'_{x,k}&= \text{"transition probability from state } i \text{ to state } k \text{ given input } a\text{"}.\end{split}\]

It holds that \(P_{i,j,k}=P'_{i\cdot\#\text{actions}+j,\,k}\), which can be implemented in Python as:

# combine the first and second dimension of a 3d numpy array in python
dim1, dim2, dim3 = transprobtensor.shape
transprobmat = transprobtensor.reshape(dim1 * dim2, dim3)
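
For illustration, the following sketch builds a small random MDP with NumPy, normalizes the transition tensor so that each state-action pair defines a probability distribution, flattens it as described above, and writes both matrices to PETSc binary. The tensor shape, the index check, and the file names are only illustrative, and writePETScBinary is assumed to accept dense numpy arrays as stated above.

import madupite as md
import numpy as np

nstates, nactions = 10, 4

# random transition tensor P[i, j, k]: probability of moving from state i to k under action j
P = np.random.rand(nstates, nactions, nstates)
P /= P.sum(axis=2, keepdims=True)  # normalize each (state, action) row to a distribution
# random stage-cost matrix G[i, j]
G = np.random.rand(nstates, nactions)

# flatten the first two dimensions: row x = i * #actions + j
Pflat = P.reshape(nstates * nactions, nstates)
assert np.allclose(P[3, 2, :], Pflat[3 * nactions + 2, :])

md.writePETScBinary(Pflat, "./transprob.bin")
md.writePETScBinary(G, "./stagecost.bin")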

The options.json file

This file specifies the details of the solver. Every call to mdp.solve(…) solves the loaded MDP according to the options that are currently set. If you want to solve the same MDP multiple times with different settings, just change the options in the json file. The repository contains a template optionstemplate.json; a sketch of what such a file might contain follows the property list below.

properties

  • initial cost type

a constant number (without quotes) or ‘random’ for a uniformly distributed initial guess

  • discount factor

float between 0 and 1

  • solver

properties

  • ksptype

Sparse linear solver, e.g. ‘gmres’ from KSPType

  • options

String that contains PETSc options for the solver, e.g. -ksp_gmres_restart 5

  • preconditioner

Preconditioner, e.g. ‘jacobi’ from PCType

  • inner stopping criterion

properties

  • alpha

default for inexact policy iteration (fast) or exact, which is equivalent to standard policy iteration (slow); the accuracy is controlled by the outer stopping criterion

  • decreasing

tbd

  • outer stopping criterion

The maximum norm of the Bellman residual, e.g. 1e-9

  • policy filename

filename, can be empty

  • cost filename

filename, can be empty

  • metadata filename

filename, can be empty
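
As a sketch of what such a file might contain, the snippet below assembles an options dictionary using the keys listed above (and the nesting visible in the earlier example, e.g. mdp["solver"]["ksptype"]) and writes it to options.json. All concrete values are placeholders and the exact accepted values, in particular for the inner stopping criterion, should be taken from optionstemplate.json in the repository.

import json

# illustrative values only; the key names follow the property list above
options = {
    "initial cost type": "random",
    "discount factor": 0.99,
    "solver": {
        "ksptype": "gmres",
        "options": "-ksp_gmres_restart 5",
        "preconditioner": "jacobi",
    },
    "inner stopping criterion": {
        "alpha": "default",  # placeholder, see the description above and optionstemplate.json
        # "decreasing": ...  # format not documented here, see optionstemplate.json
    },
    "outer stopping criterion": 1e-9,
    "policy filename": "policy.out",
    "cost filename": "cost.out",
    "metadata filename": "metadata.json",
}

with open("./options.json", "w") as f:
    json.dump(options, f, indent=4)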

Metadata format

The metadata provides insight into convergence and timings. It copies the options used during solving and adds the following properties.

properties

  • number of actions

Number of actions

  • number of states

Number of states

  • ranks

Number of ranks available to the solver

  • residual log (as array)

properties

  • inner solver

properties

  • iterations

Number of iterations of the inner solver

  • residual

Residual of the inner solver.

  • residual

The Bellman residual. The solver stops as soon as this value falls below the outer stopping criterion.

  • time

Time in seconds since the solver started

  • time and date

e.g. Fri Jul 7 09:57:40 2023
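
The metadata file is plain JSON, so it can be inspected directly with Python's json module: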

import json
with open("./metadata.json") as f:
    metadata = json.load(f)
# access the total execution time of the solver
totaltime = metadata["residual log"][-1]["time"]
# get a list of the Bellman residuals for each iPI iteration
residuals = [d["residual"] for d in metadata["residual log"]]

Generate arbitrary MDPs

generateMDP() allows generating arbitrary MDPs. Users have to specify a function that returns the transition probabilities for a state-action pair in the form of an index array and the corresponding values, as well as a stage-cost function that returns the cost of a state-action pair. A minimal sketch of the calling convention is given directly below, followed by two full examples: the forest management scenario from PyMDPToolbox and the tiger-antelope example from AI-Toolbox.
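
The sketch assumes the argument order used in the examples below (number of states, number of actions, transition probability function, stage-cost function, output paths for P and G); the toy MDP itself, a deterministic 5-state cycle with two actions, is purely illustrative.

import madupite as md
import numpy as np

nstates, nactions = 5, 2

def probfunc(state, action):
    # action 0 moves deterministically to the next state (cyclically), action 1 stays put
    nextstate = (state + 1) % nstates if action == 0 else state
    return np.array([nextstate]), np.array([1.0])

def costfunc(state, action):
    # staying in the last state is free, everything else costs 1
    return 0 if (state == nstates - 1 and action == 1) else 1

md.generateMDP(nstates, nactions, probfunc, costfunc, "./P.bin", "./G.bin")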

Forest management example

import madupite as md
import numpy as np

# example parameter values; adjust as needed
nstates = 10  # number of forest growth stages
mactions = 2  # number of actions: 0 = wait, 1 = cut
p = 0.1       # probability that a fire burns the forest down
r1 = 4        # reward for waiting in the oldest state
r2 = 2        # reward for cutting the oldest state

def probfunc(state, action):
    if action == 0:  # wait
        # with probability p a fire resets the forest to state 0,
        # otherwise it grows by one stage (capped at the oldest state)
        idx = np.array([0, min(state + 1, nstates - 1)])
        val = np.array([p, 1 - p])
        return idx, val
    else:  # cut
        idx = np.array([0])
        val = np.array([1.0])
        return idx, val

def costfunc(state, action):
    # negative costs encode rewards: r1 for waiting in the oldest state,
    # r2 for cutting the oldest state, 1 for cutting any other non-empty state
    if action == 0 and state == nstates - 1:
        return -r1
    if action == 1 and state > 0:
        if state == nstates - 1:
            return -r2
        else:
            return -1
    return 0

md.generateMDP(
    nstates,
    mactions,
    probfunc,
    costfunc,
    "./data/sparse/P" + str(nstates) + ".bin",
    "./data/sparse/G" + str(nstates) + ".bin",
)

This will create the following MDP (depicted here as reward matrices R; the stage costs written by costfunc are the negated rewards):

           | p 1-p 0.......0  |
           | .  0 1-p 0....0  |
P[:,0,:] = | .  .  0  .       |
           | .  .        .    |
           | .  .         1-p |
           | p  0  0....0 1-p |

           | 1 0..........0 |
           | . .          . |
P[:,1,:] = | . .          . |
           | . .          . |
           | . .          . |
           | 1 0..........0 |

         |  0  |
         |  .  |
R[:,0] = |  .  |
         |  .  |
         |  0  |
         |  r1 |

         |  0  |
         |  1  |
R[:,1] = |  .  |
         |  .  |
         |  1  |
         |  r2 |

tiger-antelope example

The tiger chases the antelope on a 2d grid with periodic boundary conditions (“wrap-around world”). The tiger can choose to move by one field in any direction (left, right, up, down) or to stay. The antelope can move to the same fields, but it moves randomly and never moves onto the cell occupied by the tiger. The original formulation assigns a negative cost to the final absorbing state, but an equivalent formulation could assign a positive cost to every state except the absorbing one.

import madupite as md

for square_size in range(4, 23):
    # function that maps a state(int) to coordinates of both animals
    def state2coords(state) -> list:
        return [
            (state % (square_size ** (i + 1))) // (square_size**i) for i in range(4)
        ]

    # function that maps coordinates of both animals to a state(int)
    def coords2state(coords):
        for i in range(4):
            if coords[i] == square_size:
                coords[i] = 0
            elif coords[i] == -1:
                coords[i] = square_size - 1
        state = 0
        for i in range(4):
            state += coords[i] * square_size ** (i)
        return state

    # 1d-distance of x and y in a wrap-around world
    def wrapDiff(x, y):
        diff1 = max(x, y) - min(x, y)
        diff2 = min(x, y) + square_size - max(x, y)
        return min(diff1, diff2)

    # manhattan distance in a wrap-around world
    def manhattanDistance(state, action=None):
        x_t, y_t, x_a, y_a = state2coords(state)
        # apply action
        if action is not None:
            if action == 3:  # right
                x_t = x_t + 1
            elif action == 2:  # left
                x_t = x_t - 1
            elif action == 0:  # up
                y_t = y_t + 1
            elif action == 1:  # down
                y_t = y_t - 1
        # measure distance
        return wrapDiff(x_t, x_a) + wrapDiff(y_t, y_a)

    # there are five actions: up,down,left,right,stand
    # the state space encodes the x and y position of the antelope and the tiger
    # the world has periodic boundary conditions
    def transprobfunc(state, action):
        coords = state2coords(state)
        x_t, y_t, x_a, y_a = coords
        idx = []
        val = []
        if x_t == x_a and y_t == y_a:
            # Final absorbing state, Tiger caught antelope
            idx = [state]
            val = [1]
        else:
            x_tn = x_t
            y_tn = y_t
            if action == 3:  # right
                x_tn = x_t + 1
            elif action == 2:  # left
                x_tn = x_t - 1
            elif action == 0:  # up
                y_tn = y_t + 1
            elif action == 1:  # down
                y_tn = y_t - 1

            idx = [
                coords2state([x_tn, y_tn, x_a, y_a]),
                coords2state([x_tn, y_tn, x_a + 1, y_a]),
                coords2state([x_tn, y_tn, x_a - 1, y_a]),
                coords2state([x_tn, y_tn, x_a, y_a + 1]),
                coords2state([x_tn, y_tn, x_a, y_a - 1]),
            ]

            # antelope does not move to tiger's position
            if manhattanDistance(state) <= 1:
                idx.remove(coords2state([x_tn, y_tn, x_t, y_t]))
                val = [1 / 4 for i in range(4)]
            else:
                val = [1 / 5 for i in range(5)]

        return idx, val

    def costfunc(state, action):
        coords = state2coords(state)
        x_t, y_t, x_a, y_a = coords
        if x_t == x_a and y_t == y_a:
            # Final absorbing state, Tiger caught antelope
            return -10
        elif action != 4:
            dist = manhattanDistance(state, action)
            if dist == 0:
                return -2.5
            elif dist == 1:
                return -2

        # # absorbing state modelling that allows discount factor 1
        # # instead of rewarding the final state we penalize every other state.
        # if x_t == x_a and y_t == y_a:
        #     return 0
        # else:
        #     return 10
        return 0

    nstates = square_size**4
    md.generateMDP(
        nstates,
        5,
        transprobfunc,
        costfunc,
        "./data/P" + str(nstates) + ".bin",
        "./data/G" + str(nstates) + ".bin",
    )