Social media platforms have become ubiquitous since their conception in the digital age. Originally, the platforms existed as social networking sites where individuals could connect with others, rather than as the channels for dyadic communication that they are commonly used as now (1). Although their functionality has evolved continuously, these outlets have given individuals opportunities to share self-produced content publicly and digitally (1). Whether that content is visual, auditory, literary, or of another style often depends on the platform itself. To understand how social media functions and how platforms have shifted large-scale communication, it is important to understand two bookend theories of mass communication. The first is the hypodermic needle theory, also known as the bullet theory. In this original, anecdotal conception, made popular in the 1930s, the media plays an active role, injecting a passive and malleable audience with messages (2). In the later part of the 20th century, however, as the role of traditional media like newspapers and radio dimmed, a new theory was proposed to explain communication: the uses and gratifications theory. In contrast to the bullet theory, it suggests that individuals will not accept all messages posed by the media and will instead select only those sources that best match their personal interests and interact actively with those outlets (3). The uses and gratifications model has become a popular explanation for the appeal of social media platforms, which prioritize user interaction and contribution and allow individuals to participate in multiple communities simultaneously (4).
It is well established that American teenagers are avid users of social media platforms. A 2016 nationally representative survey performed by the AP-NORC Center for Public Affairs Research found that approximately 89% of American teenagers have access to a smartphone and 80% to a laptop computer, affording the instant and continuous connection to the Internet that is a prerequisite for social media participation (5). The study further found that 94% of American teenagers aged 13 to 17 actively use social media platforms through the creation of content, a sharp increase from the approximately 57% found in a 2004 survey conducted by the Pew Research Center, a national think tank.
Surveillance has long been an important controversy in society, reaching as far back as the dystopian predictions of Orwell’s 1984. With the ballooning popularity of social media platforms, however, comes heightened concern about the surveillance of personal information. In Internet surveillance, search terms and their counts are of high importance to marketing agencies. One study that examined the movie, music, and video game industries found that “search counts are generally predictive of consumer activities” after discovering a strong correlation that could be used to forecast long-term trends (6). With regard to surveillance for marketing, further studies have added that eye-tracking, which collects data about an individual’s fixation and movement throughout a webpage in order to enhance the visual experience, is yet another mode of surveillance (7). Apart from consumer surveillance, surveillance is also performed by third-party agencies for government and national security purposes, a practice that has come under fire in recent years. After the Patriot Act of 2001, triggered by the terrorist attacks of September 11th, it became important for the American government to regulate the available information and observe the new frontier of “cyberspace” for any illegal activities (8).
Recently, a new application of social media surveillance has gained traction in the fields of public health and epidemiology. Termed social media syndromic surveillance, the process involves the tracking of “structured data based on established surveillance and monitoring protocols tailored to each disease (i.e. used for calculating the incidence, seasonality, and burden of disease)” according to a systematic review (9). Trial syndromic surveillance systems that track social media have allowed many doctors and public health officials to respond more efficiently to disease outbreaks: because social media is updated rapidly and relevant keywords, their counts, and the locations of their use are easy to track, officials can be notified instantly (10, 11). The data collected by this method can be applied to a variety of purposes, which can be boiled down to three key categories (12). The first is “nowcasting”: the surveillance of diseases to analyze their prevalence and locations of high impact, in hopes of aggregating enough data to eventually construct an Early Warning Detection system and forecast the diseases’ trajectories. The second, known as pharmacovigilance, concerns the study of adverse reactions that individuals may experience after taking various medications. The third is creating situational and behavioral awareness by collecting data to understand the social trends, behaviors, and events that may give rise to a disease or catalyze its spread. To understand the applications of social media syndromic surveillance in more concrete terms, we can look to implementations and trials of these systems performed across the world.
The first trial to consider is one from 2009, which set an important precedent for social media syndromic surveillance by studying Google searches to better understand epidemiological conditions (13). In the study, researchers from Google and the Centers for Disease Control and Prevention (CDC) built a system that automatically analyzed “hundreds of billions of individual searches from 5 years of Google web search logs” in order to observe the prevalence of searches regarding ILI, or “influenza-like illness”. The researchers then studied the data to determine which queries could best explain trends in the number of physician visits recorded in CDC files. Essentially, the researchers “rewarded queries that showed regional variations similar to the regional variations in CDC ILI data” and then weeded out queries that were coincidentally associated with the trends but were not valid, such as “high school basketball.” Aggregating the final queries allowed the researchers to create a model that, when compared to the CDC’s data, accurately predicted the trends in ILI-related physician visits with a mean correlation of 0.90. These results demonstrated a promising system for forecasting epidemiological trends, one that would become more and more efficient as Internet usage increased and more data were collected.
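To make the query-selection idea more concrete, the following is a minimal Python sketch, not the researchers' actual method or code. It assumes a simplified approach in which each candidate query is scored by its Pearson correlation with a weekly ILI series, the top-scoring queries are summed into one signal, and that signal is compared with the ILI series again. All numbers, the candidate queries "flu symptoms", "fever remedy", and "movie times", and the function names are hypothetical and invented purely for illustration.

```python
from math import sqrt

# Toy weekly values, invented for illustration only (not CDC or Google data).
# Hypothetical percentage of physician visits attributed to ILI in each week.
ili_visits = [1.0, 1.4, 2.1, 3.0, 2.2, 1.3]

# Hypothetical weekly share of all searches for each candidate query.
query_share = {
    "flu symptoms": [0.10, 0.15, 0.22, 0.31, 0.21, 0.12],
    "fever remedy": [0.05, 0.06, 0.09, 0.13, 0.10, 0.06],
    "movie times": [0.30, 0.28, 0.31, 0.29, 0.30, 0.31],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_queries(shares, target):
    """Return (query, correlation) pairs, best-correlated first."""
    scores = {query: pearson(series, target) for query, series in shares.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def aggregate(shares, queries):
    """Sum the weekly shares of the selected queries into one signal."""
    return [sum(week) for week in zip(*(shares[q] for q in queries))]


ranked = rank_queries(query_share, ili_visits)
top_queries = [query for query, _ in ranked[:2]]  # keep the two best queries
combined = aggregate(query_share, top_queries)
fit = pearson(combined, ili_visits)

print("Selected queries:", top_queries)
print("Correlation of combined signal with ILI visits:", round(fit, 3))
```

In this toy data the unrelated query ranks last and is excluded from the combined signal. A real system would also need to compare regional variation and to screen out queries that correlate only by coincidence, as the study describes.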
Another example of a social media syndromic surveillance system in action comes from a 2011 study that examined tweets from 2009 during an influenza outbreak in the Midwestern United States (14). In this study, researchers built a system that searched Twitter data for a set of preset keywords like “flu, swine, influenza, vaccine, tamiflu … symptom, syndrome, and illness.” The system found tweets containing these words and marked them with their timestamp and geographic location in order to build a continuously updated map of influenza-related social media posts. Using learning systems known as Support Vector Machines, the researchers compared H1N1-related tweet volume and drug-related tweet volume with the confirmed H1N1 case counts, ultimately finding that social media syndromic surveillance could successfully provide insight into both the level of public concern about the flu and the cases that were occurring and being treated.
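The keyword-matching step described above can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the study's implementation: the `Post` record, the sample posts, and the function names are hypothetical, the keyword set contains only the terms quoted above (the study's full list is elided in the quotation), and the Support Vector Machine step is not covered.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Only the keywords quoted in the text; the study's full list is longer.
KEYWORDS = {
    "flu", "swine", "influenza", "vaccine",
    "tamiflu", "symptom", "syndrome", "illness",
}


@dataclass
class Post:
    """Hypothetical record for one social media post."""
    text: str
    timestamp: str  # ISO 8601 string, e.g. "2009-10-05T09:12:00"
    region: str     # hypothetical location label


def mentions_keyword(post):
    """True if any whole word in the post is one of the keywords."""
    words = set(re.findall(r"[a-z0-9]+", post.text.lower()))
    return bool(words & KEYWORDS)


def count_by_region_and_day(posts):
    """Count keyword-matching posts per (region, date) pair."""
    counts = Counter()
    for post in posts:
        if mentions_keyword(post):
            day = post.timestamp[:10]  # keep only the YYYY-MM-DD part
            counts[(post.region, day)] += 1
    return counts


# Invented sample posts, for illustration only.
posts = [
    Post("Got my flu vaccine today", "2009-10-05T09:12:00", "Iowa"),
    Post("Stuck home with influenza", "2009-10-05T14:30:00", "Iowa"),
    Post("Great basketball game tonight", "2009-10-05T20:00:00", "Iowa"),
    Post("Is Tamiflu worth taking?", "2009-10-06T08:00:00", "Ohio"),
]

counts = count_by_region_and_day(posts)
for (region, day), total in sorted(counts.items()):
    print(region, day, total)
```

Because this sketch matches whole words exactly, a plural such as "symptoms" would not be counted. A real system would need to handle word variants, misspellings, and posts that mention the flu without describing an actual case.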
Finally, we can look at an even more recent example of a social media syndromic surveillance system, from a 2018 paper by English researcher Serban and team. The researchers built a system known as SENTINEL, which incorporates data from the CDC as well as Twitter to collect information on current epidemiological conditions and forecast future trends. The system marks an improvement over its predecessors by going through “1.8 million tweets per day in normal usage on a single machine, and [having] the ability to process 90 million per day if more data is available” (12). Using American data, the system successfully achieved three purposes: early warning detection for diseases, situational awareness of “health-related events”, and predictions of present levels of prevalence. The major drawback of such systems, however, is their relatively nascent stage, which according to several sources has led to a shortage of usability and effectiveness studies as well as to technical issues (9). Further research will be necessary to fully understand whether these systems can be implemented for regular use.
Edited by: Sophia Xiao
Illustrated by: Caroline Cao