Graduation Year

2021

Document Type

Thesis

Degree

M.S.C.S.

Degree Name

MS in Computer Science (M.S.C.S.)

Degree Granting Department

Computer Science and Engineering

Major Professor

Sriram Chellappan, Ph.D.

Co-Major Professor

John Licato, Ph.D.

Committee Member

Marvin Andujar, Ph.D.

Keywords

multilingual, neural networks, NLP, participatory research

Abstract

Machine Translation (MT) has the potential to bridge the gap between the developed world and marginalized communities by making information more accessible in real time. While there are over 7,000 spoken languages in the world, only about a hundred are served by high-quality MT systems, and even fewer benefit from more advanced language technologies. Resource scarcity and the lack of digital infrastructure are only some of the many challenges associated with globalizing NLP. Large-scale multilingual studies and datasets often receive little to no feedback from native speakers or linguistic experts of the languages involved, which leads to serious problems of data quality and potential biases. In this thesis, we present a case study of participatory research in 22 Turkic languages involving native speakers, language technologists, researchers, linguists, commercial entities, and more. Through this work, we compile and release the largest public corpus for MT in Turkic languages, along with 26 bilingual baseline models. We outline the curation and release of public datasets, the development of machine translation technologies, and their deployment in real-world scenarios. In addition, we discuss the lessons learned through this case study, its applications and limitations, and its implications for future projects.

Scholar Commons Citation

Mirzakhalov, Jamshidbek, "Turkic Interlingua: A Case Study of Machine Translation in Low-resource Languages" (2021). USF Tampa Graduate Theses and Dissertations.
https://digitalcommons.usf.edu/etd/8829

Included in

Computer Sciences Commons

USF Tampa Graduate Theses and Dissertations

Turkic Interlingua: A Case Study of Machine Translation in Low-resource Languages