How will we take polls and gather opinions when people are no longer willing to participate in surveys? Using AI to analyse what people write online can supplement traditional political science studies in an age when fewer and fewer people are willing to participate in phone surveys. RISE is collaborating with political scientists at Mid Sweden University and several other universities to use AI and machine learning for the purpose of analysing human language on the web.
Opinion surveys on how people are expected to vote are becoming increasingly difficult and expensive to conduct. Uncertainty in the results has risen because it has become very hard to reach people via surveys and phone calls. New breakthroughs in language analysis methods that use deep learning to understand human language create opportunities for learning people’s opinions. They can also help with detecting different attempts to exert influence online.
New methods create opportunities
Political scientists at several universities are collaborating with RISE in a project called "Studies of Public Opinion in Web Data", aimed at studying what people write online using language technology methods. The purpose is to find new ways of discovering public opinion on various topics. It is hoped that the project will be able to contribute new tools and methods to supplement traditional opinion surveys.
– “This area is new and revolutionary. In recent years, new methods using deep learning have been developed with the ability to read a wide variety of languages,” says Magnus Sahlgren, who is in charge of the text analysis group at RISE.
The group is building computer systems that can understand human language. Breakthroughs in recent years have opened up entirely new possibilities for understanding the context around how words are used.
Distinct differences between countries
The first part of the project compared, among other things, how different key concepts such as democracy, corruption and migration were perceived in different countries by analysing large amounts of web data using algorithms and models. It turned out that differences between languages and countries could be discerned by analysing discussions online.
– “As regards the term democracy, we could see clear differences between Western Europe, the Anglo-Saxon world and the Middle East. In Western Europe, discussions on democracy tend to revolve around terms of procedure and freedom. In newer democracies in other parts of the world, however, they revolve more around state-building and state capacity,” says Stefan Dahlberg, Professor of Political Science at Mid Sweden University.
In the past, analyses of language online have focused more on identifying the presence of keywords to detect opinions and emotions. With the methods used now, the focus is more on what topics people are talking about. For example, besides knowing that a discussion is about immigration, there is also a desire to know how it is being discussed. Is immigration seen as a problem or an opportunity? The method involves analysing how people talk about a subject, which makes it possible to see which type of language usage is dominating the debate. The models are trained to understand the meaning of what is written and that specific words can have a variety of meanings.
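The difference between spotting a keyword and looking at how it is discussed can be illustrated with a small sketch. It is a deliberate simplification and not the deep learning models used in the project: it only counts the words that appear near a term. The function names, the example posts and the stop-word list are all invented for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mentions(posts, keyword):
    """Keyword approach: count the posts that contain the keyword at all."""
    return sum(keyword in tokenize(post) for post in posts)


def context_counts(posts, keyword, window=3, stopwords=frozenset()):
    """Count the words found within `window` tokens of each keyword hit."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] not in stopwords:
                    counts[tokens[j]] += 1
    return counts


# Invented example posts, for illustration only.
posts = [
    "Immigration is a problem for the housing market",
    "Immigration is an opportunity for the labour market",
    "We see immigration as an opportunity",
]
# Hypothetical stop-word list covering the function words in the posts above.
STOPWORDS = frozenset({"is", "a", "an", "as", "for", "the", "we", "see"})

n_mentions = mentions(posts, "immigration")
context = context_counts(posts, "immigration", window=3, stopwords=STOPWORDS)

print(n_mentions)             # all three posts mention the keyword
print(context.most_common())  # the nearby words hint at how it is framed
```

The keyword count only says that all three posts are about immigration, while the context counts show whether "problem" or "opportunity" is the more common framing. A model that understands meaning goes much further than this, for instance by handling negation and words with several senses.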
– “These methods can also be useful for detecting propaganda and false news. It’s important to note, however, that the technology can be used for both good and bad purposes. It’s very important for there to be transparency as to who has generated the message, news or propaganda,” says Magnus Sahlgren.
Although the technology has advanced considerably in recent years, there are still many challenges. The models currently in use require an enormous amount of data and computational resources. In fact, access to data is one of the problems, because machines learn by processing large quantities of data. For this project, the data was purchased from an outside supplier.
Machine learning for linguistic intelligence
In order for analysis of web data to be genuinely useful, you first need to determine whether the text information is relevant for the purpose, reliable, and representative of the population. The next step in the project will focus on the last of these: determining just how representative online discussions are of the overall population. To find out, machine learning is used, with the models trained on data from political science studies.
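One simple way to think about representativeness is to compare how groups are distributed in a web sample with a reference distribution from a traditional survey. The sketch below shows that comparison together with post-stratification weights, a standard survey technique that is swapped in here for illustration; the article does not say the project uses it. It also assumes that each author's group is known, which is rarely the case for web data. The group labels and all the numbers are invented.

```python
def shares(counts):
    """Convert raw counts per group into proportions that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gap(sample_counts, reference_shares):
    """Sample share minus reference share for each group.

    Positive values mean the group is over-represented in the sample.
    """
    sample = shares(sample_counts)
    return {
        group: sample.get(group, 0.0) - ref
        for group, ref in reference_shares.items()
    }


def poststratification_weights(sample_counts, reference_shares):
    """Weight per group that rescales the sample to the reference shares."""
    sample = shares(sample_counts)
    return {
        group: reference_shares[group] / share
        for group, share in sample.items()
        if share > 0
    }


# Invented numbers: authors per age group in a hypothetical web sample.
web_sample = {"18-29": 500, "30-64": 400, "65+": 100}
# Invented numbers: population shares from a hypothetical reference survey.
reference = {"18-29": 0.20, "30-64": 0.55, "65+": 0.25}

gaps = representation_gap(web_sample, reference)
weights = poststratification_weights(web_sample, reference)

for group in reference:
    print(f"{group}: gap {gaps[group]:+.2f}, weight {weights[group]:.3f}")
```

In this invented example the youngest group is heavily over-represented online, so its posts would be weighted down and posts from older authors weighted up. Weighting only corrects for the characteristics that are measured, which is one reason why assessing representativeness is treated as a research question of its own.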
– “One of the biggest challenges to overcome is to successfully build models that truly understand what has been written. To do that, the models must be trained in a variety of languages. For example, there is a big difference between the style of language used in a typical news text and what typically is used in more informal discussion forums, such as Familjeliv (a forum about family life) or Flashback,” says Magnus Sahlgren.
This project, which studies public opinion through web data, is unique in that it compares a variety of languages and countries. Most similar studies focus on just one country, and they also typically collect the data themselves. The interdisciplinary approach of this project is also unique. In this study, researchers at Mid Sweden University and the other higher education institutions are responsible for formulating the political science problems and deciding how to go about solving them. RISE provides the technology.
– “In this project, we’ve been able to achieve things not possible for many other political scientists. It typically takes quite a long time before they have access to models as powerful as these. Thanks to RISE though, we’ve had access to the very latest technology,” says Stefan Dahlberg.