Decentralized partially observable Markov decision process


The decentralized partially observable Markov decision process (Dec-POMDP) is a model for coordination and decision-making among multiple agents. It is a probabilistic model that can account for uncertainty in outcomes, sensors, and communication.
It generalizes the Markov decision process (MDP) and the partially observable Markov decision process (POMDP) to the setting of multiple decentralized agents.

Definition

Formal definition

A Dec-POMDP is a 7-tuple (S, {A_i}, T, R, {Ω_i}, O, γ), where
* S is a set of states,
* A_i is a set of actions for agent i, with A = ×_i A_i the set of joint actions,
* T(s, a, s') = P(s' | s, a) is the state-transition probability,
* R(s, a) is the reward function, shared by the whole team,
* Ω_i is a set of observations for agent i, with Ω = ×_i Ω_i the set of joint observations,
* O(a, s', o) = P(o | a, s') is the observation probability, and
* γ ∈ [0, 1] is the discount factor.
At each time step, each agent takes an action, the state updates according to the transition function, each agent receives an observation according to the observation function, and a single reward is generated for the whole team according to the reward function. This process repeats until some given horizon, or forever. The goal is to maximize the expected cumulative reward over this finite or infinite number of steps; in the infinite-horizon case the discount factor γ keeps the sum finite.
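The step dynamics described above can be sketched in code. The following is a minimal, hypothetical two-agent example (the states, actions, observations, probabilities, and rewards are invented for illustration and are not part of the formal model): each agent independently chooses an action from its own observation, the state and observations update stochastically, and a single discounted team reward accumulates.

```python
import random

# A tiny, hypothetical two-agent Dec-POMDP, hand-made for illustration.
STATES = ["left", "right"]
ACTIONS = ["listen", "open"]                 # same action set for both agents
OBSERVATIONS = ["hear-left", "hear-right"]

def transition(state, joint_action, rng):
    """T(s' | s, a): if any agent opens, the state resets uniformly at random."""
    if "open" in joint_action:
        return rng.choice(STATES)
    return state                             # listening leaves the state unchanged

def observe(next_state, joint_action, rng):
    """O(o | a, s'): each agent hears the true side with probability 0.85."""
    correct = "hear-left" if next_state == "left" else "hear-right"
    wrong = "hear-right" if next_state == "left" else "hear-left"
    return tuple(correct if rng.random() < 0.85 else wrong
                 for _ in joint_action)

def reward(state, joint_action):
    """R(s, a): a single team reward, shared by all agents."""
    if joint_action == ("listen", "listen"):
        return -1.0                          # small cost for waiting
    return 10.0 if state == "left" else -50.0

def run_episode(policy, horizon=5, gamma=0.9, seed=0):
    """Roll out one episode and return the discounted team return.

    `policy` maps an agent's own observation to its action -- each agent
    decides from local information only, which is what makes the process
    decentralized.
    """
    rng = random.Random(seed)
    state = rng.choice(STATES)
    obs = ("hear-left", "hear-left")         # arbitrary initial observations
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        joint_action = tuple(policy(o) for o in obs)
        total += discount * reward(state, joint_action)
        state = transition(state, joint_action, rng)
        obs = observe(state, joint_action, rng)
        discount *= gamma
    return total
```

For example, a policy under which both agents always listen collects a reward of -1 at every step, so over a horizon of 5 with γ = 0.9 the return is -(1 + 0.9 + 0.81 + 0.729 + 0.6561) = -4.0951, regardless of the random seed.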