# Algorithms for Simple Stochastic Games

2009

Thesis

M.S.C.S.

Computer Science

## Major Professor

Rahul Tripathi, Ph.D.

## Committee Member

Nagarajan Ranganathan, Ph.D.

## Committee Member

Sudeep Sarkar, Ph.D.

## Keywords

Game theory, Optimal strategies, Algorithms, Computational complexity, Computational equilibrium

## Abstract

A simple stochastic game (SSG) is a game defined on a directed multigraph and played between two players, MAX and MIN. The players control disjoint subsets of vertices: player MAX controls a subset V_MAX and player MIN controls a subset V_MIN. Each remaining vertex belongs either to V_AVE, a subset of vertices that support stochastic transitions, or to SINK, a subset of vertices that have zero outdegree and are each associated with a payoff in the range [0, 1]. The game starts by placing a token on a designated start vertex. The token is then moved from its current vertex to a neighboring one according to the following rules. A fixed strategy σ of player MAX determines where to move the token when it is at a vertex of V_MAX. Likewise, a strategy τ of player MIN determines where to move the token when it is at a vertex of V_MIN. When the token is at a vertex of V_AVE, it is moved to a neighbor chosen uniformly at random. The game stops when the token arrives at a SINK vertex; at this point, player MAX receives the payoff associated with that SINK vertex.
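The rules above can be illustrated with a minimal Python sketch of a single play under fixed strategies. The encoding is an assumption made for illustration, not the thesis's notation: `graph` maps each non-sink vertex to a kind (`"MAX"`, `"MIN"`, or `"AVE"`) and a successor list (repeated entries model parallel edges of the multigraph), `payoff` maps each SINK vertex to its payoff, and `play`, `sigma`, and `tau` are hypothetical names. The abstract does not say what happens if no SINK vertex is ever reached, so the sketch simply gives up after `max_steps` moves.

```python
import random

def play(graph, payoff, start, sigma, tau, rng, max_steps=10_000):
    """Simulate one play; return MAX's payoff, or None if no sink is reached."""
    v = start
    for _ in range(max_steps):
        if v in payoff:            # token is on a SINK vertex: the game stops
            return payoff[v]
        kind, successors = graph[v]
        if kind == "MAX":          # MAX's strategy picks the next vertex
            v = sigma[v]
        elif kind == "MIN":        # MIN's strategy picks the next vertex
            v = tau[v]
        else:                      # AVE vertex: uniformly random neighbor
            v = rng.choice(successors)
    return None                    # assumption: cut off plays that never stop

# Hypothetical toy game, invented for this example.
graph = {
    "a": ("MAX", ["b", "s0"]),
    "b": ("AVE", ["s1", "c"]),
    "c": ("MIN", ["s0", "s1"]),
}
payoff = {"s0": 0.0, "s1": 1.0}
sigma = {"a": "b"}    # MAX moves from a to b
tau = {"c": "s0"}     # MIN moves from c to the payoff-0 sink

rng = random.Random(1)
outcomes = [play(graph, payoff, "a", sigma, tau, rng) for _ in range(5000)]
estimate = sum(outcomes) / len(outcomes)   # empirical estimate of MAX's expected payoff
```

In this toy game, the play from `a` reaches `s1` with probability 1/2 and otherwise reaches `s0` through `c`, so the exact expected payoff under these two strategies is 1/2; `estimate` is only a sampled approximation of that number.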

A fundamental question related to SSGs is the SSG value problem: Given an SSG G, is there a strategy of player MAX that guarantees an expected payoff of at least 1/2 regardless of the strategy of player MIN? This problem is among the rare natural combinatorial problems that belong to the class NP ∩ coNP but for which there is no known polynomial-time algorithm. In this thesis, we survey known algorithms for the SSG value problem and classify them into four groups: iterative approximation, strategy improvement, mathematical programming, and randomized algorithms. We obtain two new algorithmic results. Our first result is an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm. Our second result is a randomized Las Vegas strategy improvement algorithm whose expected running time is O(2^{0.78n}).
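As a rough illustration of the first group, iterative approximation, the sketch below repeatedly applies the max, min, and average updates until the vertex values stop changing. It is a generic value-iteration sketch under stated assumptions, not the thesis's algorithm: the graph encoding, the names `value_iteration`, `tol`, and `max_iters`, and the toy game are all hypothetical. Starting from zero amounts to assuming that plays which never reach a SINK vertex are worth 0 to MAX, and a small change between rounds is only a heuristic stopping rule, since such iterations can converge slowly.

```python
def value_iteration(graph, payoff, tol=1e-12, max_iters=100_000):
    """Approximate vertex values by repeated max / min / average updates."""
    values = {v: 0.0 for v in graph}   # assumption: start non-sink values at 0
    values.update(payoff)              # SINK values are fixed at their payoffs
    for _ in range(max_iters):
        new = dict(values)
        for v, (kind, successors) in graph.items():
            succ_values = [values[w] for w in successors]
            if kind == "MAX":
                new[v] = max(succ_values)
            elif kind == "MIN":
                new[v] = min(succ_values)
            else:                      # AVE: uniform average over neighbors
                new[v] = sum(succ_values) / len(succ_values)
        delta = max(abs(new[v] - values[v]) for v in graph)
        values = new
        if delta < tol:                # heuristic stopping rule
            break
    return values

# Hypothetical toy game, invented for this example.
graph = {
    "a": ("MAX", ["b", "s0"]),
    "b": ("AVE", ["s1", "c"]),
    "c": ("MIN", ["s0", "s1"]),
}
payoff = {"s0": 0.0, "s1": 1.0}

values = value_iteration(graph, payoff)
# Tolerance-based comparison against the 1/2 threshold of the value problem.
answer = values["a"] >= 0.5 - 1e-9
```

For this toy game the updates settle at 0 for `c`, 1/2 for `b`, and 1/2 for `a`, so the value problem with start vertex `a` has a positive answer. In general, comparing an approximate value against the 1/2 threshold needs more care than the fixed tolerance used here.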
