Degree Granting Department

Information Systems and Decision Sciences

Major Professor

Donald J. Berndt, Ph.D.

Co-Major Professor

Balaji Padmanabhan, Ph.D.

Committee Member

Joni L. Jones, Ph.D.

Committee Member

Richard P. Will, Ph.D.


Keywords

clickstream research, information foraging theory, web mining, information scent, data mining


Abstract

This dissertation sought to explain goal achievement at limited-traffic “long tail” Web sites using Information Foraging Theory (IFT). The central thesis of IFT is that individuals are driven by a metaphorical sense of smell that guides them through patches of information in their environment. An information patch is an area of the search environment that contains similar information. Information scent is the driving force behind a person's navigational selection among a group of competing options. Because foragers are assumed to be rational, scent is a mechanism for reducing search costs: it increases the accuracy with which a forager identifies the option that leads to valuable information. IFT was originally developed for a “production rule” environment, in which a user performs an action when the conditions of a rule are met. Applying IFT to clickstream research, however, required conceptualizing information scent and patches in an environment without production rules. To that end, this dissertation asked three research questions: (1) how to learn information patches, (2) how to learn trails of scent, and (3) how to combine both concepts to create a Clickstream Model of Information Foraging (CMIF).

Patches and trails were learned using contrast sets, which distinguish individuals who achieved a goal from those who did not. User-centric and site-centric versions of the CMIF, which extend and operationalize IFT, were used to present and evaluate hypotheses. The user-centric version had four hypotheses and examined product purchasing behavior from panel data, whereas the site-centric version had nine hypotheses and predicted contact form submission using data from a Web hosting company.
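
To make the contrast-set idea concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes each session is reduced to the set of pages it visited and flags page sets whose support (the share of sessions containing them) differs between goal achievers and non-achievers by at least a minimum deviation. It mirrors only the support-difference criterion of contrast set mining; the statistical significance testing used by algorithms such as STUCCO is omitted. The page names, the `supports` and `contrast_sets` functions, and the `min_dev` threshold are all hypothetical.

```python
from itertools import combinations


def supports(sessions, max_size=2):
    """Fraction of sessions containing each page set of up to max_size pages."""
    counts = {}
    for pages in sessions:
        unique = sorted(set(pages))  # ignore repeat visits and order within a session
        for size in range(1, max_size + 1):
            for itemset in combinations(unique, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    n = len(sessions)
    return {itemset: c / n for itemset, c in counts.items()}


def contrast_sets(achieved, not_achieved, min_dev=0.3, max_size=2):
    """Page sets whose support differs by at least min_dev between the two groups.

    A positive difference means the page set is more common among goal achievers.
    (Hypothetical helper; significance testing is deliberately left out.)
    """
    sup_a = supports(achieved, max_size)
    sup_b = supports(not_achieved, max_size)
    result = []
    for itemset in set(sup_a) | set(sup_b):
        diff = sup_a.get(itemset, 0.0) - sup_b.get(itemset, 0.0)
        if abs(diff) >= min_dev:
            result.append((itemset, round(diff, 3)))
    # Largest absolute difference first; ties broken by itemset for a stable order.
    return sorted(result, key=lambda x: (-abs(x[1]), x[0]))


# Hypothetical sessions: visited pages for users who did / did not submit a contact form.
achieved = [["home", "services", "contact"],
            ["services", "pricing", "contact"],
            ["home", "pricing", "contact"]]
not_achieved = [["home", "blog"], ["home", "services"], ["blog", "about"]]

print(contrast_sets(achieved, not_achieved)[0])  # (('contact',), 1.0)
```

In this toy data the contact page appears in every goal-achieving session and in none of the others, so it is the strongest contrast, while the home page, equally common in both groups, is not reported. The `max_size` cap on page-set length keeps the enumeration small.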

In general, the results show that patches and trails exist on several Web sites, and the majority of hypotheses were supported in each version of the CMIF. This dissertation contributed to the literature by providing a theoretically grounded model that tested and extended IFT; introducing a methodology for learning patches and trails; detailing a methodology for preprocessing clickstream data for long tail Web sites; and focusing on traditionally under-studied long tail Web sites.