Graduation Year


Document Type




Degree Name

Doctor of Philosophy (Ph.D.)

Degree Granting Department

Electrical Engineering

Major Professor

Nasir Ghani, Ph.D.

Committee Member

Richard Gitlin, Sc.D.

Committee Member

Ismail Uysal, Ph.D.

Committee Member

Srinivas Katkoori, Ph.D.

Committee Member

Tao Zhang, Ph.D.


Survivability, Network Virtualization, Disaster Recovery, Progressive Recovery


Cloud infrastructure services are enabling organizations and enterprises to outsource a wide range of computing, storage, and networking needs to external service providers. These offerings make extensive use of underlying network virtualization, i.e., virtual network (VN) embedding, techniques to provision and interconnect customized storage/computing resource pools across large network substrates. However, as cloud-based services continue to gain traction, there is a growing need to address a range of resiliency concerns, particularly with regard to large-scale outages. These conditions can be triggered by events such as natural disasters, malicious man-made attacks, and even cascading power failures.

Overall, a wide range of studies have looked at network virtualization survivability, with most efforts focusing on pre-fault protection strategies that set aside backup datacenter and network bandwidth resources. These contributions include single node/link failure schemes as well as recent studies on correlated multi-failure "disaster" recovery schemes. However, pre-fault provisioning is very resource-intensive and imposes high costs on clients. Moreover, this approach cannot guarantee recovery under generalized multi-failure conditions. Although post-fault restoration (remapping) schemes have also been studied, the effectiveness of these methods is constrained by the scale of infrastructure damage. As a result, there is a pressing need to investigate longer-term post-fault infrastructure repair strategies to minimize VN service disruption. However, this is a largely unexplored area and requires specialized consideration, as damaged infrastructures will likely be repaired in a time-staged, incremental manner, i.e., progressive recovery.

Furthermore, more specialized multicast VN (MVN) services are also being used to support a range of content distribution and real-time streaming needs over cloud-based infrastructures. In general, these one-to-many services impose more challenging requirements in terms of geographic coverage, delay, delay variation, and reliability. Some recent studies have looked at MVN embedding and survivability design. In particular, the survivability contributions cover both pre-fault protection and post-fault restoration methods, and also include some multi-failure recovery techniques. Nevertheless, there are no known efforts that incorporate risk vulnerabilities into the MVN embedding process. Indeed, there is a strong need to develop such methods in order to reduce the impact of large-scale outages, and this remains an open topic area.

In light of the above, this dissertation develops novel solutions to improve the resiliency of network virtualization services in the presence of large-scale outages. Foremost, new multi-stage (progressive) infrastructure repair strategies are proposed to improve the post-fault recovery of VN services. These contributions include advanced simulated annealing metaheuristics as well as more scalable polynomial-time heuristic algorithms. Furthermore, enhanced "risk-aware" mapping solutions are also developed to achieve more reliable multicast (MVN) embedding, providing a further basis for developing more specialized repair strategies in the future. The performance of these solutions is evaluated extensively using custom-developed simulation models.