Big Data to Knowledge – Harnessing Semiotic Relationships of Data Quality and Skills in Genome Curation Work


Keywords

data quality, DIK hierarchy, genome curation, semiotics




This article examines how genomic scientists view data quality assurance in relation to semiotics and the data–information–knowledge (DIK) hierarchy. The signs communicated in genome curation work were found at different semantic levels of DIK, which correlate specific data quality dimensions with their respective skills. Syntactic data quality dimensions ranked highest among the semiotic data quality dimensions, indicating that scientists devote considerable effort to data wrangling activities in genome curation work. Semantic- and pragmatic-related sign communications concerned meaningful interpretation and therefore required additional adaptive and interpretative skills to deal with data quality issues. This expanded concept of ‘curation’ as sign/semiotic has not previously been explored from practical to theoretical perspectives. The findings can inform policy makers and practitioners in developing frameworks and cyberinfrastructure that facilitate the ‘Big Data to Knowledge’ initiatives and advocacy of funding agencies. The findings can also help in planning data quality assurance policies and thus maximise the efficiency of genomic data management. Our results strongly support the relevance of communicating data quality skills to data quality assurance in genome curation activities.



Citation / Publisher Attribution

Journal of Information Science, in press