Graduation Year


Document Type




Degree Granting Department

Chemical Engineering

Major Professor

Aydin K. Sunol, Ph.D.

Committee Member

John A. Llewellyn, Ph.D.

Committee Member

Scott W. Campbell, Ph.D.

Committee Member

Luis H. Garcia-Rubio, Ph.D.

Committee Member

Rafael Perez, Ph.D.


data mining, symbolic regression, function identification, parameter regression, statistical analysis, process simulation


Local thermodynamic models are practical alternatives to computationally expensive rigorous models that involve implicit computational procedures, and they often complement rigorous models to accelerate computation for real-time optimization and control. Human-centered strategies for developing these models are based on approximation of theoretical models. A Genetic Programming (GP) system can extract knowledge from given data in the form of symbolic expressions. This research describes a fully data-driven, automatic, self-evolving algorithm that uses genetic programming to build appropriate approximating formulae for local models. No a priori information on the type of mixture (ideal, non-ideal, etc.) and no other assumptions are necessary.

The approach involves synthesis of models from a given set of variables and the mathematical operators that may relate them. The selection of variables is automated through principal component analysis and heuristics. For each candidate model, the model parameters are optimized in an integrated, nested inner loop. The trade-off between accuracy and model complexity is addressed by incorporating the Minimum Description Length (MDL) principle into the fitness (objective) function.
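The accuracy versus complexity trade-off can be illustrated with a minimal sketch. The abstract does not give the exact MDL expression used in the fitness function, so the common two-part approximation 0.5*n*ln(SSE/n) + 0.5*k*ln(n) is assumed here purely for illustration; the name mdl_fitness and the residual values are hypothetical.

```python
import math

def mdl_fitness(residuals, n_params):
    """Two-part description-length score; lower is better.

    ASSUMPTION: the form 0.5*n*ln(SSE/n) + 0.5*k*ln(n) is a common
    approximation, not necessarily the one used in the thesis.
    """
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    sse = max(sse, 1e-12)  # guard against log(0) for a perfect fit
    error_cost = 0.5 * n * math.log(sse / n)        # cost of encoding the misfit
    complexity_cost = 0.5 * n_params * math.log(n)  # cost of encoding the parameters
    return error_cost + complexity_cost

# Hypothetical residuals from two candidate models fitted to the same 8 points.
simple_residuals = [0.10, -0.10] * 4    # 2-parameter candidate
complex_residuals = [0.09, -0.09] * 4   # 6-parameter candidate, slightly better fit

simple_score = mdl_fitness(simple_residuals, n_params=2)
complex_score = mdl_fitness(complex_residuals, n_params=6)
print("simple model preferred:", simple_score < complex_score)
```

In this sketch the small gain in accuracy of the larger candidate does not pay for its four extra parameters, so the simpler candidate receives the lower (better) score.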

Statistical tools, including residual analysis, are used to evaluate the performance of the models. Adjusted R-square is used to test a model's accuracy, and the F-test is used to test whether the terms in the model are necessary. Analysis of the models generated with the data-driven approach shows the theoretically expected range of compositional dependence of the partition coefficients, as well as the limits of ideal gas and ideal solution behavior. Finally, the model built by GP is integrated into a steady-state and dynamic flowsheet simulator to show the benefits of using such models in simulation. The test systems were propane-propylene for ideal solutions and acetone-water for non-ideal solutions. The results show that the generated models are accurate over the whole range of the data and that their performance is tunable. The generated local models can indeed be used as empirical models, which goes beyond eliminating the local model updating procedures and further enhances the utility of the approach for deployment in real-time applications.
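The two statistics named above can be sketched as follows. This is a minimal illustration using the textbook definitions of adjusted R-square and the partial F-statistic for nested models; the function names and the sample numbers are hypothetical and are not taken from the thesis.

```python
def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R-square: R-square penalized for the number of predictors."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    r2 = 1.0 - sse / sst
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def partial_f(sse_reduced, sse_full, p_reduced, p_full, n):
    """Partial F-statistic for dropping (p_full - p_reduced) terms.

    p_reduced and p_full count predictors, excluding the intercept.
    A large value suggests the extra terms in the full model are needed.
    """
    numerator = (sse_reduced - sse_full) / (p_full - p_reduced)
    denominator = sse_full / (n - p_full - 1)
    return numerator / denominator

# Hypothetical observations and model predictions.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.0, 4.1, 4.9]
print("adjusted R-square:", adjusted_r2(y, y_hat, n_predictors=1))
print("partial F:", partial_f(1.0, 0.5, p_reduced=1, p_full=2, n=13))
```

The F value would then be compared against a critical value of the F distribution with (p_full - p_reduced, n - p_full - 1) degrees of freedom at the chosen significance level.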