Kolkata Paise Restaurant Problem


The Kolkata Paise Restaurant Problem (KPR) is a mathematical game for competitive resource allocation without any coordination. Its name is drawn from the once-common "Paise Restaurants" in the Indian city of Kolkata. These were affordable eateries, from the early 1900s to the 1970s, that offered fixed-price meals at extremely low cost. The KPR problem is an anti-coordination game that models how a large number of individuals compete for limited resources without direct communication or coordination.
The problem becomes trivial—yet optimally efficient—if a non-playing coordinator or dictator intervenes. By simply instructing all players to form a queue and visit the restaurant matching their position in the line on the first day, and then rotate to the next restaurant each subsequent day, full resource utilization is achieved immediately. This ensures food for all customers, maximum revenue for all restaurants, and requires no learning or convergence time.
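Concretely, one such dictated schedule (a minimal illustration; the indexing convention is arbitrary) assigns player $i \in \{0, 1, \dots, N-1\}$ on day $t$ to restaurant

$$ r_i(t) = (i + t) \bmod N, $$

so that on every day the assignment is a permutation of the restaurants: each restaurant receives exactly one customer and the utilization fraction is 1 from the very first day.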
However, the true complexity of the problem arises when individuals act independently, each making decisions based on personal experiences of past success or failure, or available information about previous crowd sizes at the restaurants. In this decentralized setting, players aim to maximize their own payoff, which incidentally also drives optimal utilization and revenue at the system level—but only through emergent, self-organized behavior.
The KPR model generalizes the El Farol Bar problem, extending it from a binary choice to multiple options; foundational work on KPR and several early reviews are available in the literature. When reduced to two players, the game aligns with classic anti-coordination models such as the Chicken Game or the Hawk–Dove Game. Tamir argued, following Anderson's "More is different", that this extension to a large number of choices for all the players makes the KPR game much more complex, and more appropriate for decentralized optimization problems, than finite-option games. The emergence of distributed coordination in the KPR problem with finite information has also been studied.
Algorithmically, KPR shares traits with the Gale–Shapley algorithm in decentralized matching contexts. Broader connections appear in studies of the "Kolkata Game" or "Kolkata Algorithm".

Problem Definition

  • There are N restaurants and λN players; typically λ = 1. N can be arbitrarily large.
  • Each day, customers independently choose a restaurant based on their past experience of success or failure. Players do not know each other's choices but have access to historical data of past selections.
  • At each restaurant, only one of the players arriving there is randomly chosen and served. The rest of those arriving at that restaurant leave without food for that day.
  • The ideal outcome is perfect coordination, where each player picks a different restaurant, reached in zero or at most a short convergence time. Achieving this without coordination is difficult in the KPR game, leading to inefficiencies and only partial utilization. The KPR objective is to evolve collective 'parallel learning' algorithms that maximize the utilization fraction in minimum time; a minimal simulation sketch of the game mechanics is given below.
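The mechanics above can be illustrated with a minimal simulation sketch in Python (the function names and parameters are illustrative assumptions, not taken from the KPR literature): each agent picks a restaurant, one arrival per restaurant is served at random, and the utilization fraction is the fraction of agents fed. With purely random choices and λ = 1, this reproduces the roughly 63% utilization derived in the next section.

```python
import random

def play_round(choices):
    """One KPR day: given each agent's restaurant choice, return per-agent
    success flags.  At each restaurant exactly one of the arriving agents
    (picked at random) is served; the rest go unfed that day."""
    arrivals = {}
    for agent, restaurant in enumerate(choices):
        arrivals.setdefault(restaurant, []).append(agent)
    served = {random.choice(agents) for agents in arrivals.values()}
    return [agent in served for agent in range(len(choices))]

def random_choice_utilization(n=10_000, days=20):
    """Baseline strategy: every agent picks uniformly at random each day
    (lambda = 1, i.e. n agents and n restaurants)."""
    total = 0.0
    for _ in range(days):
        choices = [random.randrange(n) for _ in range(n)]
        total += sum(play_round(choices)) / n   # fraction of agents fed today
    return total / days

if __name__ == "__main__":
    # Expected to print roughly 0.63, i.e. 1 - 1/e, for lambda = 1.
    print(f"random-choice utilization ≈ {random_choice_utilization():.3f}")
```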

Strategies & Optimization

In the KPR problem, strategies are evaluated on the basis of their aggregate payoff, i.e. the utilization fraction. For λ = 1, this is given equivalently by the average fraction of attended restaurants or by the average fraction of agents getting food on any day.
In the random choice case of the KPR problem, each of the λN agents randomly selects one of the N restaurants. The probability that exactly m agents choose the same restaurant is

$$ P(m) = \binom{\lambda N}{m} \left(\frac{1}{N}\right)^{m} \left(1 - \frac{1}{N}\right)^{\lambda N - m}, $$

giving a Poisson distribution in the limit $N \to \infty$:

$$ P(m) = \frac{\lambda^{m}}{m!}\, e^{-\lambda}. $$

The fraction of restaurants not chosen by any agent is then $P(0) = e^{-\lambda}$, giving the utilization fraction $f = 1 - e^{-\lambda}$. This becomes equal to $1 - 1/e \simeq 0.63$ for $\lambda = 1$, meaning about 63% of restaurants are utilized, or about 63% of agents get food on any day in this case. The convergence time is zero.
For the random choice case discussed above, the agents have no memory, nor do they employ any learning strategy. A minimal learning stochastic strategy, with utilization fraction ~0.79, gives each customer a probability of returning to yesterday's restaurant that varies inversely with yesterday's crowd size there, while choosing among the other restaurants with uniform probability. This improves on the simple random choice case above, though the convergence time seems to grow very weakly with N, perhaps logarithmically or even more slowly.
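A sketch of this stochastic crowd-avoiding rule, under the common reading that an agent returns to yesterday's restaurant with probability 1/n, where n was yesterday's crowd size there, and otherwise jumps to a uniformly random other restaurant (again an illustrative sketch, not the authors' reference implementation):

```python
import random

def crowd_avoiding_utilization(n=10_000, days=200):
    """Stochastic crowd-avoiding strategy (lambda = 1): an agent returns to
    yesterday's restaurant with probability 1/(yesterday's crowd size there),
    otherwise moves to a uniformly random other restaurant."""
    last_choice = [random.randrange(n) for _ in range(n)]   # day-0 random choices
    utilization = 0.0
    for _ in range(days):
        crowd = [0] * n
        for r in last_choice:
            crowd[r] += 1                                   # yesterday's crowd sizes
        choices = []
        for agent in range(n):
            r_prev = last_choice[agent]
            if random.random() < 1.0 / crowd[r_prev]:       # stay with prob 1/n_prev
                choices.append(r_prev)
            else:                                           # else pick another restaurant
                r_new = random.randrange(n - 1)
                choices.append(r_new if r_new < r_prev else r_new + 1)
        utilization = len(set(choices)) / n                 # fraction of restaurants serving food
        last_choice = choices
    return utilization

if __name__ == "__main__":
    # After a short transient this is expected to hover around 0.79-0.80.
    print(f"crowd-avoiding utilization ≈ {crowd_avoiding_utilization():.3f}")
```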
Increased utilization fraction for customers, each having a fixed low budget allowance for a local search using a Traveling Salesman Problem (TSP) type algorithm, has also been studied by Kastampolidou, Papalitsas and Andronikos. Employing a locally clustered structure of the restaurant distribution, each agent can visit multiple restaurants to search for a vacant one and get food there. This approach is effectively equivalent to each agent acting as a salesman solving a local TSP over the restaurants in its neighborhood. Of course, some agents will still fail to find a vacant restaurant within the local neighborhood allowed by the small budget, so full utilization remains impossible for any finite budget. The approach yields a probabilistic formula for the resulting increase in the utilization fraction, confirming its advantage.
Strategies without an external dictator are available to achieve full utilization of resources if objective orderings of players and resources are available to all players. In some cases, a social norm could provide that ordering; in others, history of play would eventually provide it.
Extensions of KPR to on-call car hire problems have been explored by Martin et al., and a mean-field solution of a generalized KPR problem has been given for the same kind of resource competition in the spatial setting of the vehicle-for-hire market. Strategic driver relocations for higher efficiency of ride-hailing platforms have also been analyzed and reviewed in a KPR-like context.
Applications of KPR to optimized job allocation problems in the Internet of Things have been studied by Park and Saad. The stability of KPR dynamics induced by the introduction of dining clubs has also been studied, as has the impact of expert opinions on such evolving, mutually competing search strategies for the resources.
The KPR model has also been applied to anthropological and sociological analyses of models of polytheism, and, algorithmically, to cancer therapy.
Extensions to quantum games for three-player KPR have been studied, where, dimensionality permitting, each player can achieve a much higher mean payoff. Recent studies of the three-player KPR game examine the advantages of higher payoffs in quantum games when the initial state is maximally entangled. General introductions to econophysics, sociophysics, classical KPR, quantum games and quantum KPR are available, as is a later review of classical and quantum KPR games.
Early attempts to exploit KPR-type strategies for developing Artificial Intelligence (AI) models have also been made, along with studies of KPR-like approaches to the graph partitioning problem. Recently, the KPR game has been extended and exploited to create a new benchmark for evaluating AI algorithms.

Tournaments

The KPR problem can be studied via tournaments to establish its grandmaster strategy, much as the Prisoner's Dilemma was studied to discover and establish Tit for Tat, Generous Tit for Tat, and zero-determinant strategies.
The open-source MAD Chairs software creates an oTree server as standard for human-subject experiments, but also allows tournaments to include AI and hard-coded strategies for the KPR problem.
No strategy performed well against all others in Axelrod's Prisoner's Dilemma tournaments, so Axelrod ranked them on total winnings; the grandmaster strategy in KPR tournament rankings is clearer. The turntaking strategy achieves both high outcomes and an edge over each other individual strategy when the compared strategies have equal populations, where "edge" refers to its average lowest outcome minus the average lowest outcome of the compared strategy. In other words, no coalition competing against turntaking would be sustainable, because it would include a member who should rationally expect to improve their outcomes by defecting to turntaking. Assuming that each alien society would reach the same conclusion by running its own tournaments, turntaking is a universal basis for coordination. It sorts restaurants by popularity and patrons by debt of favors owed to each other, and any player can independently calculate these from the history of play.

Applications

The KPR model can describe various real-life resource allocation problems, such as:
  • Ride-hailing services: Passengers and taxi drivers compete in the on-call-hire market, sometimes overcrowding some areas while others remain unserved, unless properly matched dynamically by the on-call service managers.
  • Online resource access: Users competing for limited computing resources. Here the management needs to allocate jobs dynamically and appropriately, anticipating or extrapolating from the sizes of the incoming jobs and of the queues, as in Internet-of-Things contexts.
  • Dining clubs in KPR: Clubs that can act as proxies on behalf of players who are unwilling to take the risk of choosing an overcrowded restaurant and consequently failing to get food.
  • AI benchmarking: Benchmarks for algorithms optimizing decentralized resource utilization, generated employing KPR-inspired strategies.

Quantum Games and Quantum Strategies

We start with a simple example of a quantum game demonstrating the use of quantum entanglement in the definition of a game strategy. The setup was introduced by Clauser, Horne, Shimony and Holt (CHSH) in the context of hidden-variable theories, and later became a prototype example in quantum game theory. The 'game' has also been used as a test of quantumness in the context of quantum computers.
Consider a game in which Alice and Bob are each asked a yes/no question. There are 4 possible pairs of questions, (0,0), (0,1), (1,0) and (1,1), chosen at random. In the (1,1) case they get a reward of 1 for uncorrelated answers (their answers differ), and in the other cases they get a reward of 1 for correlated answers (their answers agree). The game is repeated. Alice and Bob are not allowed to communicate during the game; however, they can decide on a strategy in advance. They could agree to always give the same fixed answer, and such a strategy will yield a 75% success rate. Alternatively, they can each toss a coin, and it is easy to show that such a strategy will yield a 50% success rate. Suppose now they share a maximally entangled two-qubit state:

$$ |\psi\rangle = \frac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr). $$

Define the local unitary operator

$$ U(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. $$

Alice and Bob can apply such unitary operators on $|\psi\rangle$ and then measure their entangled state in the computational basis. Alice uses the angles $\theta_{A_0} = 0$ and $\theta_{A_1} = \pi/4$ for her two questions, and Bob uses $\theta_{B_0} = \pi/8$ and $\theta_{B_1} = -\pi/8$. It is easy to show that for these angles Alice and Bob win in $\cos^2(\pi/8) \approx 85\%$ of the cases. The unitary operators acting on the entangled state produce an uncorrelated pair of "coins" for the (1,1) question, and a correlated pair of "coins" for the other questions.
Therefore, using entangled states, unitary operators, and quantum measurement can improve the success of such games.
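A small numeric check of this entangled strategy (a sketch using NumPy; the rotation convention and the standard angles 0 and π/4 for Alice, ±π/8 for Bob, are assumptions of this sketch) computes the average success probability over the four question pairs:

```python
import numpy as np

def rotation(theta):
    """Local unitary U(theta): rotates the measurement basis by theta."""
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

def chsh_win_probability(alice_angles=(0.0, np.pi / 4),
                         bob_angles=(np.pi / 8, -np.pi / 8)):
    """Average success probability of the entangled CHSH strategy over the
    four equally likely question pairs (x, y).  Winning means a == b except
    on the (1, 1) question, where a != b is required."""
    # Maximally entangled state (|00> + |11>)/sqrt(2) as a 2x2 amplitude matrix.
    psi = np.eye(2) / np.sqrt(2)
    total = 0.0
    for x, a in enumerate(alice_angles):
        for y, b in enumerate(bob_angles):
            # Apply local rotations, then read joint outcome probabilities.
            amp = rotation(a) @ psi @ rotation(b).T
            prob = amp ** 2                       # |amplitude|^2 (all real here)
            p_equal = prob[0, 0] + prob[1, 1]     # probability that a == b
            total += (1 - p_equal) if (x == 1 and y == 1) else p_equal
    return total / 4

if __name__ == "__main__":
    # Expected: cos^2(pi/8) ≈ 0.854, beating the best classical value of 0.75.
    print(f"quantum CHSH success ≈ {chsh_win_probability():.3f}")
```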
We can use the above example to define a quantum strategy of a game. The players are sharing an entangled state, and they are allowed to use local unitary operators. Following the local operations, a quantum measurement of the state will yield a probability distribution of the payoffs defined by the game. The entanglement is the "strategic" part of quantum strategy, and the local unitary operators are the "tactical" part of the quantum strategy.
Quantum strategies can change the Nash equilibrium landscape: they can erase classical Nash equilibrium strategies and build new quantum Nash equilibrium strategies. Moreover, one can obtain quantum Nash equilibria with payoffs different from the classical ones.
For example, in Eisert's quantum Prisoner's Dilemma, the old Nash equilibrium is no longer valid; there exists a new Nash equilibrium strategy in the form of two identical local unitary matrices. Moreover, in the new Nash equilibrium both players have a better payoff than in the classical case: the payoff equals the classical payoff for mutual cooperation, and therefore the players escape the dilemma.
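A minimal numeric sketch of this equilibrium, using the Eisert–Wilkens–Lewenstein construction at maximal entanglement (the payoff values 3, 0, 5, 1 and the operator conventions below are textbook choices, assumed here rather than taken from this article):

```python
import numpy as np

# Conventional Prisoner's Dilemma payoffs for player A (an assumption here):
# (C,C) -> 3, (C,D) -> 0, (D,C) -> 5, (D,D) -> 1.
PAYOFF_A = np.array([[3.0, 0.0], [5.0, 1.0]])

D_OP = np.array([[0, 1], [-1, 0]], dtype=complex)   # classical "defect" move
Q_OP = np.array([[1j, 0], [0, -1j]])                # Eisert's quantum move
SY = np.array([[0, -1j], [1j, 0]])                  # Pauli Y

def ewl_payoffs(u_a, u_b, gamma=np.pi / 2):
    """Payoffs (A, B) in the EWL quantum Prisoner's Dilemma.
    gamma = pi/2 corresponds to maximal entanglement."""
    yy = np.kron(SY, SY)
    # Entangling gate J = cos(gamma/2) I - i sin(gamma/2) (sigma_y x sigma_y).
    J = np.cos(gamma / 2) * np.eye(4) - 1j * np.sin(gamma / 2) * yy
    psi0 = np.zeros(4, dtype=complex)
    psi0[0] = 1.0                                    # start from |CC>
    psi = J.conj().T @ np.kron(u_a, u_b) @ J @ psi0  # J^dagger (U_A x U_B) J |CC>
    probs = (np.abs(psi) ** 2).reshape(2, 2)         # outcomes |CC>,|CD>,|DC>,|DD>
    return float(np.sum(probs * PAYOFF_A)), float(np.sum(probs * PAYOFF_A.T))

if __name__ == "__main__":
    print("Q vs Q :", ewl_payoffs(Q_OP, Q_OP))   # expected (3.0, 3.0)
    print("D vs Q :", ewl_payoffs(D_OP, Q_OP))   # deviating to D hurts A: (0.0, 5.0)
    print("D vs D :", ewl_payoffs(D_OP, D_OP))   # classical mutual defection (1.0, 1.0)
```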
In the above setup of Eisert's quantum Prisoner's Dilemma, there is the possibility of "stirring", where one player can undo the other player's actions by playing a strategy that "commutes through" J, the entangling operator. It is as if the player were operating on the space before it was entangled, destroying the temporal order of the game. However, if we restrict the local unitary operations, we can prevent stirring.
In general, player A's local operations can always affect the joint state and hence the reduced density matrix for B may change. Clearly, if we allow non-local operations, then stirring is possible.
A mixed quantum strategy is a strategy in which each player has a probability distribution over local unitary operations, which are applied to a density matrix shared among the players. The shared quantum state should not be separable, otherwise the whole game resembles a classical one. In general, a mixed quantum strategy will yield Nash equilibria.
Flitney and Abbott introduced a quantum version of the Monty Hall game. It was shown that introducing a quantum strategy erases the classical Nash equilibrium, while a mixed quantum strategy over maximally entangled states brings back the classical Nash equilibrium.
In a repeated game, introducing quantum strategies can affect the stability of the classical Nash equilibrium in the evolutionarily-stable-strategy sense. A stable Nash equilibrium could turn into an unstable one through an invasion of 'quantum mutants', and vice versa, an evolutionarily unstable Nash equilibrium could become stable under quantum 'mutations'.
In a repeated CHSH game, Reichardt, Unger and Vazirani showed that the above quantum strategy is robust: if Alice and Bob play the CHSH game many times and win in about 85 percent of them, then we can conclude that they are using something close to the above quantum strategy. Moreover, if Alice and Bob are using strategies which may depend on the success of previous games, and they win in about 85 percent of the games, then we can conclude that enough independent games were played with the above quantum strategy.