Report on IPS Digital Frontiers Seminar- Man and Machine (2): Online and Offline Sensing of Public Sentiments on National Day Rally 2016

By Valerie Yeo

As more people take to cyberspace to express their opinions on various issues, governments and organisations face the challenge of making sense of a cluttered online space. How can we understand public opinion through Singapore’s online space? On 23 February 2017, the Institute of Policy Studies (IPS) organised a seminar that presented findings from an ongoing joint study by IPS and the Living Analytics Research Centre (LARC) at the Singapore Management University.

The study aims to develop an online media analytics system that analyses online public opinion and, using that analysis, predicts how the wider public feels about important issues and events. The collaboration involves human analysis of online data by IPS and computer analysis of the same data by LARC. A total of 72 participants from the public and private sectors, as well as academics, attended the seminar.

Tan Tarn How, Senior Research Fellow from the Arts, Culture and Media cluster at IPS, chaired the session. There were two presentations. The first, on online sensing, was given by Dr Jiang Jing, Associate Professor at the School of Information Systems, and Dr Palakorn Achananuparp, Research Scientist from LARC. The second, on offline sentiments, drew on findings from a survey conducted by Senior Research Fellow Dr Carol Soon from IPS.

Opening Remarks by Tan Tarn How

Mr Tan introduced the seminar as a sequel to a previous study, “Rationality of the Political Online Space”, conducted in 2014. The online media analytics system has been further refined in this study to answer the perennial question: How can public opinion be understood through Singapore’s online space? Can some kind of correspondence be drawn between what is found online and what happens in the general population? To analyse online and offline public sentiments, and the convergence between the two, researchers used the recent National Day Rally 2016 (NDR 2016) as a case study. By applying online and offline sensing to NDR 2016, researchers analysed the public’s responses to key trajectories and policies laid out for Singapore.

Presentation by Dr Jiang Jing and Dr Palakorn Achananuparp

Big Data and Online Sensing

The first objective of the study was to discover how Singaporeans who went online responded to the issues mentioned during NDR 2016. Traditionally, a survey would be conducted, but the sample would be limited and not easily scaled up. Studying cyberspace has advantages such as the large number of users and the availability of public data. Online content collected from two weeks before to two weeks after 21 August 2016, when NDR 2016 was held, was analysed to answer questions such as: which online platforms were the most popular, which NDR topics gained the most attention and the most positive sentiments, which topics saw the most divided opinions, and how interest and sentiments towards various topics differed across online platforms. The second objective of the study was to compare the findings of online sensing with those from the survey done by IPS and gauge whether the interest and sentiments harnessed from online media are representative of the larger public.

The online platforms studied included blogs, Facebook, Twitter, websites of Singapore mass media, political parties’ and politicians’ social networking sites, and online discussion forums. The methodology employed for machine training involved three steps: identify NDR-related content, categorise posts and comments by topic, and determine whether the sentiments expressed were positive or negative. The accuracy rates for machine learning were high for relevance of content and sentiment classification. The accuracy rate for topic categorisation was slightly lower because some topics had very few posts.
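
The three steps above can be pictured as a simple pipeline. The sketch below is only an illustration of that flow: it swaps in toy keyword rules for the trained machine-learning classifiers used in the study, and every keyword list, function name and the "undetermined" fallback label is a hypothetical assumption rather than part of the researchers' system.

```python
import re

# Toy stand-ins for the three trained classifiers. All keyword lists are
# hypothetical and exist only to make the sketch runnable.
RELEVANCE_PATTERN = re.compile(r"\bndr\b|national day rally")
TOPIC_KEYWORDS = {
    "Elected Presidency": {"elected presidency"},
    "leadership succession": {"succession"},
}
POSITIVE_WORDS = {"good", "support"}
NEGATIVE_WORDS = {"bad", "oppose"}


def is_relevant(text):
    # Step 1: identify NDR-related content.
    return RELEVANCE_PATTERN.search(text.lower()) is not None


def topics_of(text):
    # Step 2: categorise by topic; a post may receive several labels.
    lowered = text.lower()
    return [topic for topic, keywords in TOPIC_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)]


def sentiment_of(text):
    # Step 3: decide whether the sentiment is positive or negative.
    words = set(text.lower().split())
    positive = len(words & POSITIVE_WORDS)
    negative = len(words & NEGATIVE_WORDS)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "undetermined"  # assumption: the toy rules cannot decide


def analyse(text):
    # Irrelevant content is dropped before topic and sentiment labelling.
    if not is_relevant(text):
        return None
    return {"topics": topics_of(text), "sentiment": sentiment_of(text)}
```

In a real system each function would be replaced by a classifier trained on human-coded examples, which is where accuracy rates like those mentioned above would be measured.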

The popularity of an online platform was measured in two ways: the aggregation of original posts and comments, and engagement volume, which includes comments, likes and shares. Based on the aggregation of original posts and comments, Facebook was most popular, followed by discussion forums, Twitter, blogs and mainstream media. Most of the content consisted of comments because it takes more effort to produce something original (e.g., an article on a blog). Facebook was the top platform when it came to engagement, followed by mainstream media, blogs, Twitter and discussion forums. Sharing was the most common form of engagement, and one reason could be that it requires the least effort compared to posting comments. The number of likes was proportionately smaller than the number of shares and comments because the function was not available on all platforms.
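
The two popularity measures can be expressed in a few lines. This is a minimal sketch under stated assumptions: the platform names, field names and all counts below are made-up placeholders, not figures from the study.

```python
from dataclasses import dataclass


@dataclass
class PlatformCounts:
    """Hypothetical per-platform tallies; all numbers are placeholders."""
    posts: int
    comments: int
    likes: int
    shares: int


def content_volume(counts):
    # Measure 1: aggregation of original posts and comments.
    return counts.posts + counts.comments


def engagement_volume(counts):
    # Measure 2: comments, likes and shares.
    return counts.comments + counts.likes + counts.shares


def rank_platforms(data, measure):
    # Platform names ordered from highest to lowest on the chosen measure.
    return sorted(data, key=lambda name: measure(data[name]), reverse=True)


# Placeholder data for illustration only (not from the study).
example = {
    "platform_a": PlatformCounts(posts=10, comments=90, likes=50, shares=200),
    "platform_b": PlatformCounts(posts=40, comments=80, likes=0, shares=20),
    "platform_c": PlatformCounts(posts=5, comments=30, likes=100, shares=60),
}

by_content = rank_platforms(example, content_volume)
by_engagement = rank_platforms(example, engagement_volume)
```

Because the two measures count different things, the same platform can rank differently on each, as the findings for discussion forums and mainstream media show.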

The popularity of NDR topics was measured by the total number of posts that mentioned the issue. The NDR topic that received the most interest was “PM Lee”. This topic also received the most positive sentiments. The topic that received the most negative sentiments was “Elected Presidency”.

Five policy issues were selected by IPS researchers after NDR took place. The selection was made based on their prominence in both mainstream media and the online space. The issues — “policies that dealt with disruption to economy”, “leadership succession”, “review of ElderShield”, “minority representation for the Elected Presidency” and the “Asatizah Recognition Scheme” (registration of Islamic teachers) — were also used in the survey to assess public reception to policy changes. Online sensing found that among these five topics, people were most interested in “leadership succession” and felt most positively about it too. 

The divisiveness of topics was measured by weighing positive sentiments against negative sentiments. If there was an equal amount of positive and negative sentiment for a topic, it meant that sentiments were divided and the topic was a divisive one. Foreign policy was the most divisive issue. When it came to the five selected topics, “policies that dealt with disruption to economy” was the most divisive issue.
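
The report does not give the exact formula used to weigh positive against negative sentiments. One plausible way to do it, shown here purely as an assumption, is a balance score that equals 1 when positive and negative posts are evenly split and 0 when all posts share the same sentiment. The function name and the example counts are hypothetical.

```python
def divisiveness(positive, negative):
    """Balance score in [0, 1]: 1 = evenly split, 0 = unanimous.

    An assumed formula for illustration, not the study's own measure.
    """
    total = positive + negative
    if total == 0:
        raise ValueError("no sentiment-bearing posts for this topic")
    return 1.0 - abs(positive - negative) / total


# Placeholder counts for illustration only (not from the study).
evenly_split = divisiveness(50, 50)   # maximally divisive
mostly_positive = divisiveness(90, 10)  # low divisiveness
```

Ranking topics by such a score from highest to lowest would put the most divided topics first.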

Cross-platform analysis of interest and sentiment towards specific topics revealed further insights. Based on the distribution of topics, Facebook and Twitter were the most similar in terms of the topics discussed (both saw a dominance of discussion on “PM Lee” and “unwell”) while forums and blogs were similar (topics such as “Elected Presidency” and “PM Lee” were the most popular topics on both platforms). Thus, a division is observed where users of blogs and forums are interested in similar topics, while Facebook and Twitter users are interested in a different set of topics. When it came to sentiments, those on Facebook, Twitter and mainstream media were largely positive, while sentiments on blogs and forums were mainly negative. For positive sentiments, posts were largely clustered around the same topics such as “PM Lee” and “unwell”. However, the distribution of topics for negative sentiments was spread out over different topics. Users of Facebook, forums and blogs felt negatively towards “Elected Presidency”, Twitter towards “PM Lee” and mainstream media towards “foreign policy”.

Thus, online sensing revealed that the topic and sentiment distributions on Facebook and Twitter are clearly different from those on blogs and forums. Also, Facebook accounts for the majority of users and posts generated, making it the go-to platform for public opinion.

Presentation by Dr Carol Soon

Survey and Offline Sensing

The survey had two objectives: first, to determine Singaporeans’ media usage as well as their interest and sentiments towards NDR issues; second, to triangulate online sensing and analytics with surveys, so as to develop a tool that accurately analyses public opinion across different domains in the long term. An online survey was conducted with 2,000 Singaporeans who engaged with the NDR, broadly defined as those who watched or listened to it live, watched a repeat telecast, read or heard reports, or talked about it to other people.

In terms of the demographics of respondents, 62.8% of respondents were aged 30–54 years old, 62.4% of respondents earned $5,000 and above per month, and 75% of respondents had a diploma, first degree, postgraduate or professional qualification.

In terms of how people engaged with NDR, watching and listening to it live was most common (with 56.5% of respondents having done so), followed by reading or hearing reports of it (35.9% having done so). Among those who watched or listened to the NDR, 89% watched it on TV. When it came to media usage for seeking information relating to NDR, the most popular medium was TV, followed by print newspapers and Facebook. The least used media were radio and other social networking sites (e.g., Twitter and Instagram).

When it came to participating in online activities relating to the NDR on online platforms such as blogs, YouTube sites, Facebook, online websites of Singapore mass media, political parties’ and politicians’ social networking sites and/or websites, Instant Messaging platforms, online discussion forums and other social networking sites (e.g., Instagram and Twitter), the findings showed that people engaged more in passive activities, which required less effort (e.g., learnt more about fellow Singaporeans’ views) rather than activities that required more effort (e.g., wrote a post or made a video). This was a trend consistent for all platforms.

The five NDR topics mentioned earlier were ranked based on interest and sentiment. Respondents were most interested in “policies that dealt with disruption to economy” (93.5%) followed by “leadership succession” (91.1%), “review of ElderShield” (89.4%), “minority representation for the Elected Presidency” (80.5%) and the “Asatizah Recognition Scheme”, or registration of Islamic teachers (65.8%). 

Breaking down the findings by demographics — race, age, income and education — there were differences in interest for certain topics. Minorities were more likely to be more interested in race-related policies such as “minority representation for the Elected Presidency” and “Asatizah Recognition Scheme”. Higher-income, higher-educated and older respondents were more likely to be more interested in “leadership succession”. The higher educated were also more likely to be more interested in “policies that dealt with the disruption to economy”.

For sentiments, respondents felt most positively towards “policies that dealt with the disruption to economy”, followed by “review of ElderShield”, “leadership succession”, “minority representation for the Elected Presidency” and “Asatizah Recognition Scheme”. Minorities were more likely to feel more positive towards race-related policies as well. Older respondents were more likely to feel more positively across all topics, while income did not make any difference to how people felt towards any of the topics.

A comparison was also made between online users (people who used at least one of the online platforms, excluding Instant Messaging) and non-online users (those who did not use any of those platforms). There was no significant difference in interest between online and non-online users for all five topics. For sentiments, non-online users were more likely than online users to feel positive about “review of ElderShield” and “leadership succession”.

Lastly, the results of online sensing and the survey for the same five NDR topics were compared. Survey respondents were divided into three groups: all 2,000 respondents, 1,535 online users and 110 active users (i.e., those who participated in at least nine out of eighteen high engagement activities such as “posted a tweet/photo” and “started a discussion thread”). Across all three groups, online sensing reflected the survey results to a certain extent, as the top two topics for interest were the same. For online sensing, “leadership succession” was top and “policies that dealt with disruption to economy” was second, but the order was reversed for survey respondents. However, the difference in interest between the top two topics for the survey was very small, at 2.4 percentage points. In particular, Facebook and mainstream media were the most reflective of active users’ interest in topics. Online interest reflected the interests of active users more accurately than the interests of the general population.
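
The comparison of the top two topics can be reproduced from the figures given in this report. The sketch below uses the survey interest percentages reported earlier and the online sensing top two as stated above; the report does not list the full online ranking, so only the top two are compared. Variable names are illustrative.

```python
# Survey interest (% of respondents), as reported in the survey findings.
survey_interest = {
    "policies that dealt with disruption to economy": 93.5,
    "leadership succession": 91.1,
    "review of ElderShield": 89.4,
    "minority representation for the Elected Presidency": 80.5,
    "Asatizah Recognition Scheme": 65.8,
}

# Online sensing: only the top two positions are stated in the report.
online_top_two = [
    "leadership succession",
    "policies that dealt with disruption to economy",
]

# Rank survey topics from most to least interest and take the top two.
survey_ranking = sorted(survey_interest, key=survey_interest.get, reverse=True)
survey_top_two = survey_ranking[:2]

# Same two topics in both methods, but in the opposite order.
same_topics = set(survey_top_two) == set(online_top_two)
order_reversed = survey_top_two == online_top_two[::-1]

# Gap between the survey's first and second topics, in percentage points.
gap = round(
    survey_interest[survey_top_two[0]] - survey_interest[survey_top_two[1]], 1
)
```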

As for sentiments, the same trend was observed where the top two positive topics were the same. While “leadership succession” had the most positive sentiments for online sensing and “policies that dealt with disruption to economy” came second, the order was reversed for active users. However, the last two topics and their rankings were congruent across all three groups and online sensing — with “Asatizah Recognition Scheme” as fourth and “minority representation for the Elected Presidency” as fifth. 

In conclusion, the most popular platforms as sources of information were TV and print newspapers, while for engagement it was Facebook. While online sensing reflected offline interest in NDR issues to a certain extent, it reflected offline sentiments more closely.

Discussion

Mr Tan chaired the discussion session where the following issues were raised:

Factors for sentiment analysis

A participant noted that liking and sharing were used to measure engagement rather than sentiment for online sensing. However, liking, sharing and commenting are considered click speech and also express sentiments. Notably, sharing is ambiguous because there could be many motivations — as a show of support or to critique the issue.

Dr Achananuparp said that sharing is an interesting behaviour that warrants more study. Analogous to sharing, it is ambiguous if retweeting is a signal of endorsement as well. In a previous study done during General Election 2011, he compared retweets with the sentiments that were expressed using text analytics. Indeed, when people retweeted, they expressed positive sentiments towards the original content. Likewise, it is possible to do the same for likes and shares, to determine whether it can be assumed that these functions generally express positive sentiment. 

A participant asked how many Facebook posts were unique in content because these posts may just be sharing the same few mainstream media articles.

Dr Jiang replied that the aim of the study was to examine people’s interest in different topics. Thus, repetition would not pose a problem because it reflects which topics people pay attention to. Original content is perhaps more likely to be found on platforms such as forums and blogs. However, the number of users on those platforms is smaller than the number of Facebook users. Thus, Facebook will be useful in providing a rough gauge of the percentage of people interested in particular topics, while blogs and forums will give more detail on public opinion, such as the arguments used and the evidence provided.

Challenges to online sensing

A participant asked whether, if the content volume for several topics was less than 100, it would still be meaningful to conduct sentiment analysis, given the accuracy rate of 84%.

Dr Jiang answered that generally, when one has a small sample, it is not as reliable. For the study, researchers used a random sample of all the topics and measured the accuracy on the whole. Thus, it would be advisable to focus on analysing topics that received more attention and posts rather than minority topics.

Another participant asked if it would be increasingly difficult to conduct online sensing as people move to closed chat platforms such as Telegram, WhatsApp and Facebook Messenger. He asked how this constraint could be overcome.  

Dr Jiang acknowledged the challenge that such a trend poses. She said that even for Facebook, which is supposed to be an open platform, many accounts are private and their data irretrievable. Fortunately, given that Facebook attracts many users, even if researchers were to look only at public pages, there would still be a significant number of posts. As for closed chat platforms, she said that one possibility is to engage a small set of users to share their content and friendship links and, from there, generalise to a larger population.

A participant asked if the reaction to the spontaneous incident of Prime Minister Lee taking ill could have affected the results.

Dr Soon replied that PM Lee taking ill could have diverted some attention from other topics and that the team had recognised the implications of using NDR as a case study. However, the team decided to proceed as there would be future studies that would enable researchers to compare and validate the findings. She explained that as the team’s focus was on policy issues, they did not select PM Lee taking ill as one of the five topics for the survey study and comparison with online sensing.

Triangulation of online sensing and survey data

One participant asked how the results from online sensing and the survey could be combined to provide a more complete picture of public opinion.

Dr Soon replied that different methodologies offer different insights and have their respective pros and cons. There were similarities in terms of what was found for online sensing and survey results, like the top topics that had attracted most interest and the most positive sentiments. The survey can complement online sensing by shedding light on how different demographic groups feel about various topics. As was presented earlier, demographic factors did have a significant impact for certain topics. In addition, the study showed that there was no difference in interest between online and non-online users and difference in sentiments occurred for two out of five topics. Without overstating the claims, this could indicate that online users are quite similar to non-users in terms of interest and sentiments, especially the former. The caveat is that this was just one case study, but this is worthy of further investigation.

Methodology

A participant sought to clarify how topics were assigned to posts for online sensing and how “Elected Presidency” differed from “minority representation”.

Dr Soon explained that a post could be assigned multiple labels to provide a more nuanced analysis. The human coders from IPS adopted a grounded approach and assigned topics at a fine-grained level as opposed to doing so broadly. When a post discussed the Elected Presidency, it could refer to minority representation, eligibility criteria, the Council of Presidential Advisors, possible presidential candidates or the constitutional commission. Thus, minority representation was differentiated from the Elected Presidency so as to examine the proportion of people who talked about this particular aspect.

Mr Tan commented that the current study is a static image of NDR. However, researchers would like to work towards capturing a moving image for certain long-term issues like transport, healthcare, cost of living and immigration. It will be useful to note the fluctuations of online sentiment over time and track the emergence of new topics.

In closing, Dr Achananuparp remarked that online users are a subset of the offline population, so they will reflect some part of society. A pertinent question is how online users differ from the larger population and how representative they are of it. By adjusting for bias in demographics and applying techniques to make the two groups converge, a more accurate prediction can be made.
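
The report does not say which adjustment technique the researchers have in mind. One common approach to demographic bias is post-stratification weighting, where each group's average is weighted by its share of the population rather than its share of the online sample. The sketch below illustrates that idea only; the group names, shares and sentiment scores are all hypothetical.

```python
def weighted_mean(shares, group_means):
    """Average of per-group means, weighted by the given group shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[group] * group_means[group] for group in group_means)


# Hypothetical numbers for illustration only (not from the study).
online_sample_share = {"younger": 0.8, "older": 0.2}   # who posts online
population_share = {"younger": 0.5, "older": 0.5}      # wider population
mean_sentiment = {"younger": 0.2, "older": 0.6}        # per-group score

# Unadjusted estimate: groups count by their share of the online sample.
raw_estimate = weighted_mean(online_sample_share, mean_sentiment)

# Adjusted estimate: groups count by their share of the population.
adjusted_estimate = weighted_mean(population_share, mean_sentiment)
```

In this made-up case the over-represented younger group pulls the raw estimate down, and reweighting moves it closer to what the wider population would report. The approach assumes that online users within a group hold views similar to non-users in the same group, which is the kind of question the survey comparison was designed to probe.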

Valerie Yeo is a Research Assistant (Special Projects) at IPS.

Find out more about the presentations and the Digital Frontiers Series at our event page.
