Contact: Magnus, Senior Researcher
This project develops and applies Natural Language Processing tools for assessing the validity, reliability, and representativeness of online text data, in order to facilitate the use of web data for studying opinions and societies through communicative online behavior.
The application of Big Data technology in various fields of the Humanities and Social Sciences in recent years has given scholars innovative ways to address the changing landscape of public opinions and attitudes. Although Computer Science has made, and continues to make, great progress both in data management and in the development of analytical tools for handling vast amounts of data, problems of selection, measurement error, and other types of bias remain unsolved. This project will validate the use of online text data as a complement to traditional surveys and polls. We do this by first answering the question “What text data is actually available on the Internet?”. We then collect text data using traditional survey experiments in order to answer the question “What does representative data really look like?”. Lastly, we develop computational methods that can answer questions such as “Is this text data relevant for my purposes?”, “Is this text data reliable?”, and “Is this text data representative of a population?”. The outcome of the project will contribute important tools and new measures for using online text data as a complement to traditional surveys and measures of human behavior, attitudes, and opinions.
Studying opinions in web data
University of Bergen, University of Gothenburg, GESIS Leibniz Institute for the Social Sciences, Södertörn University