Tutorials#
Loading and reading data with madupite
#
In this tutorial, we want to show how to load and read MDP data that is stored in files. The data itself must be stored as a PETSc binary file (explained here). madupite
provides a method to save numpy or scipy matrices to PETSc binary files (madupite.writePETScBinary()
).
Assuming the stage cost matrix and transition probability tensor are stored as g.bin
and P.bin
we can load them with madupite.Matrix.fromFile()
. We need to specify whether it is a stage cost matrix (md.MatrixCategory.Cost
) or a transition probability tensor (md.MatrixCategory.Dynamics
) to ensure that the number of states and actions is correctly inferred.
Furthermore, you can specify whether the matrix is sparse or dense using the md.MatrixType
enum. Sparse matrices are stored in a compressed format, which can save memory and speed up computations.
Defining an object for matrix preallocation is not necessary when loading from files since the information about non-zero elements is stored in the binary file.
import madupite as md
import numpy as np
def main():
# Write matrices to file
numStates = 10
numActions = 3
cost_matrix = np.random.rand(numStates, numActions)
md.writePETScBinary(cost_matrix, "g.bin")
prob_matrix = np.random.rand(numStates * numActions, numStates)
# normalize rows to 1, in order to have a valid transition probability matrix
prob_matrix /= prob_matrix.sum(axis=1)[:, None]
md.writePETScBinary(prob_matrix, "P.bin")
# Load matrices from file
g = md.Matrix.fromFile(
filename="g.bin",
category=md.MatrixCategory.Cost,
type=md.MatrixType.Dense,
)
P = md.Matrix.fromFile(
filename="P.bin",
category=md.MatrixCategory.Dynamics,
type=md.MatrixType.Sparse,
)
if __name__ == "__main__":
main()
Warning
Note that as of madupite
V1.0, the files themselves must contain the data in a sparse format because PETSc does not support reading dense matrices from binary files. By specifying the matrix type as dense, the data will be read as a sparse matrix and then converted to a dense matrix. This is recommended for stage cost matrices to benefit from data locality and speed up computations.
Generating data with madupite
#
Depending on the problem, creating the MDP data with numpy and reading them with madupite is often slower than generating them directly with madupite. This is because madupite can generate the transition probabilities in parallel and in the correct format, which avoids the need to convert the data.
In the following example, we show how to generate the stage cost matrix and transition probability tensor with madupite. We define a cost function and a probability function that are used to generate the data. The cost function takes the current state and action as input and returns the cost. The probability function takes the current state and action as input and returns the transition probabilities and the next state indices.
import madupite as md
def costfunc(s, a):
return s + a
def probfunc(s, a):
transition_probabilities = [0.2, 0.8]
state_indices = [s, (s + a) % 50]
return transition_probabilities, state_indices
def main():
num_states = 50
num_actions = 3
g = md.createStageCostMatrix(
numStates=num_states, numActions=num_actions, func=costfunc
)
P = md.createTransitionProbabilityTensor(
numStates=num_states,
numActions=num_actions,
func=probfunc,
)
# Solve the MDP ...
if __name__ == "__main__":
main()
Matrix Preallocation#
For large MDPs with sparse transition probability tensors, it is often beneficial to preallocate the matrices to avoid reallocations during the computation. This can be done by specifying the preallocation
argument. The method takes an instance of the madupite.MatrixPreallocation
class, which specifies the number of non-zero elements per row in the diagonal and off-diagonal block. See the example below for more details (adapted from PETSc).
Consider the following 8x8 matrix with 34 non-zero values, that is assembled across 3 ranks. Let’s assume that rank0 owns 3 rows, rank1 owns 3 rows, rank2 owns 2 rows. This division can be shown as follows:
1 2 0 | 0 3 0 | 0 4
rank0 0 5 6 | 7 0 0 | 8 0
9 0 10 | 11 0 0 | 12 0
-------------------------------------
13 0 14 | 15 16 17 | 0 0
rank1 0 18 0 | 19 20 21 | 0 0
0 0 0 | 22 23 0 | 24 0
-------------------------------------
rank2 25 26 27 | 0 0 28 | 29 0
30 0 0 | 31 32 33 | 0 34
This can be represented as a collection of submatrices as:
A B C
D E F
G H I
Where the submatrices A, B, C are owned by rank0, D, E, F are owned by rank1, G, H, I are owned by rank2.
The DIAGONAL submatrices corresponding to rank0, rank1, rank2 are submatrices [A], [E], [I] respectively. The OFF-DIAGONAL submatrices corresponding to rank0, rank1, rank2 are [BC], [DF], [GH] respectively.
When d_nz
, o_nz
parameters are specified, d_nz
storage elements are
allocated for every row of the local diagonal submatrix, and o_nz
storage locations are allocated for every row of the OFF-DIAGONAL submatrix.
Typically one chooses d_nz
and o_nz
as the max nonzeros per local
rows for each of the local DIAGONAL, and the OFF-DIAGONAL submatrices.
In this case, the values of d_nz
, o_nz
are:
rank0 dnz = 2, o_nz = 2
rank1 dnz = 3, o_nz = 2
rank2 dnz = 1, o_nz = 4
When d_nnz
, o_nnz
parameters are specified, the storage is specified
for every row, corresponding to both DIAGONAL and OFF-DIAGONAL submatrices.
In the above case the values for d_nnz
, o_nnz
are:
rank0 d_nnz = [2,2,2] and o_nnz = [2,2,2]
rank1 d_nnz = [3,3,2] and o_nnz = [2,1,1]
rank2 d_nnz = [1,1] and o_nnz = [4,4]
import madupite as md
# ...
rank, size = md.mpi_rank_size()
# Option 1
pc = md.MatrixPreallocation()
if rank == 0:
pc.d_nz = 2
pc.o_nz = 2
elif rank == 1:
pc.d_nz = 3
pc.o_nz = 2
else:
pc.d_nz = 1
pc.o_nz = 4
# Option 2
pc2 = md.MatrixPreallocation()
if rank == 0:
pc2.d_nnz = [2, 2, 2]
pc2.o_nnz = [2, 2, 2]
elif rank == 1:
pc2.d_nnz = [3, 3, 2]
pc2.o_nnz = [2, 1, 1]
else:
pc2.d_nnz = [1, 1]
pc2.o_nnz = [4, 4]
P = md.createTransitionProbabilityTensor(
numStates=num_states,
numActions=num_actions,
func=probfunc,
preallocation=pc
)
# Solve the MDP ...
Data format#
The data format for the MDP is defined by the stage cost matrix and the transition probability tensor. The stage cost matrix is a matrix of size numStates x numActions
, where each element (s, a) represents the cost of taking action a in state s. The transition probabilities are usually expressed as a tensor of size numStates x numActions x numStates
, where each element (s, a, s’) represents the probability of transitioning from state s to state s’ after applying action a. For madupite
the tensor is flattened to a matrix of size numStates*numActions x numStates
, where each row i represents the transition probabilities from state i // numStates to state s’ after applying action i % numStates.
The tensor can be reshaped as follows:
>>> import numpy as np
>>> numStates = 3
>>> numActions = 2
>>> P=np.array(
... [[[0.5, 0.5, 0.0 ],
... [0.25, 0.33, 0.42]],
...
... [[0.3, 0.3, 0.4 ],
... [0.4, 0.2, 0.4 ]],
...
... [[0.6 , 0.1, 0.3 ],
... [0.7 , 0.1, 0.2 ]]])
>>>
>>> P.reshape((numStates*numActions, numStates))
array([[0.5 , 0.5 , 0. ],
[0.25, 0.33, 0.42],
[0.3 , 0.3 , 0.4 ],
[0.4 , 0.2 , 0.4 ],
[0.6 , 0.1 , 0.3 ],
[0.7 , 0.1 , 0.2 ]])