The Project

Approach and Methodology

The Consortium will address the challenges by:

  • Reviewing the current state of the art, both through a survey of the academic and grey literature and through consultation with experts.
  • Describing what a modular platform architecture could look like: one able to combine different approaches to detect, monitor and offer support in fighting malicious communication phenomena, and to evolve as the threats themselves evolve. This includes describing which software tools and techniques are available and which are not, thus providing recommendations for further funding, research and testing.
  • Setting up a platform prototype (Observatory) to collect, analyse and present large quantities of data using different techniques, in order to develop a decision-making support tool for users, both individuals and institutions.
  • Extracting from all the above, and discussing with experts, key recommendations for action for stakeholders, including policy makers, regulators, traditional media companies, social media companies and citizens.

The scientific approach used in developing this proposal is thoroughly multidisciplinary, with competences ranging from Computer Science (to aggregate and validate news), to Economics and Politics (to work on the co-creation of policies and decision-making support tools with stakeholders), to Computational Science and behavioural user analysis (to understand how opinions and stories form and spread on social media and how they shape behaviour offline), to Network theory and complex systems science (for the modelling and measurement of the various quantities), to Information Visualization and Information Aesthetics (to represent and visualize complex and large data sets in meaningful, comprehensible, accessible and usable ways).

The Observatory (platform) will consist of distributed elements (browser plugins that will reside and run on end users’ computers) and of a centralized architecture which will be developed by customising and integrating physical resources already developed by Catchy and HER. The platform will integrate data harvested from a variety of sources (web, RSS, social media, APIs, data integration), process them using a number of techniques (natural language processing, network analysis, semantic analysis, machine learning, sentiment and emotional analysis), produce results and reports, visualize information in static and interactive ways, and export and integrate data, information and visualizations into other destinations.
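
As a purely illustrative sketch of the modular architecture described above, the following Python fragment shows how independent analysis stages could be registered in a pluggable pipeline; all class, stage and field names are hypothetical placeholders rather than the actual Catchy/HER components.

```python
# Illustrative sketch of the Observatory's modular processing pipeline.
# All class and stage names are hypothetical placeholders, not the actual
# Catchy/HER components; the point is the pluggable architecture.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Item:
    """A single harvested piece of content plus accumulated annotations."""
    source: str                     # e.g. "rss", "twitter", "web"
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


class Pipeline:
    """Chains independent analysis stages so new techniques can be plugged in."""

    def __init__(self) -> None:
        self.stages: List[Callable[[Item], Item]] = []

    def register(self, stage: Callable[[Item], Item]) -> None:
        self.stages.append(stage)

    def process(self, items: List[Item]) -> List[Item]:
        for stage in self.stages:
            items = [stage(item) for item in items]
        return items


def language_stage(item: Item) -> Item:
    # Placeholder: a real stage would detect the language of the text.
    item.annotations["lang"] = "und"
    return item


def sentiment_stage(item: Item) -> Item:
    # Placeholder: a real stage would call an NLP/sentiment model.
    item.annotations["sentiment"] = 0.0
    return item


if __name__ == "__main__":
    pipeline = Pipeline()
    pipeline.register(language_stage)
    pipeline.register(sentiment_stage)
    results = pipeline.process([Item(source="rss", text="Example headline")])
    print(results[0].annotations)
```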

It will also host in vitro experiments (through A/B testing techniques, gamified setups including challenges, objectives and recognitions/badges, and filtering techniques) to support policy design and decision-making activities.
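
The in vitro A/B experiments could, at their simplest, follow a pattern such as the one sketched below; the assignment rule, the simulated engagement metric and the synthetic lift are illustrative assumptions, not the platform's actual experimental design.

```python
# Minimal sketch of an A/B experiment of the kind described above; the metric
# and the significance of the synthetic lift are illustrative assumptions.
import random
from statistics import mean


def assign_variant(user_id: int) -> str:
    """Deterministically split users into two experimental arms."""
    return "A" if user_id % 2 == 0 else "B"


def summarise(outcomes: dict) -> None:
    for variant, values in outcomes.items():
        print(f"variant {variant}: n={len(values)}, mean outcome={mean(values):.3f}")


if __name__ == "__main__":
    random.seed(0)
    outcomes = {"A": [], "B": []}
    for user_id in range(1000):
        variant = assign_variant(user_id)
        # Simulated engagement outcome; variant B gets a small synthetic lift.
        lift = 0.05 if variant == "B" else 0.0
        outcomes[variant].append(random.random() + lift)
    summarise(outcomes)
```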

Evidence collected via the platform and analysed by the researchers will make it possible to ground the recommendations in original empirical research as well as in the literature review.

Objective 1: Understanding of how information spreads on social media

Objective 2: Understanding of the emergence of narratives and change of behaviour in the age of social media

Objective 3: Observatory of the spread of information and user behaviour

Objective 4: Recommendations for a research agenda and for stakeholder actions


Context Analysis

Since the appearance of Facebook and Twitter, the impact of social networks on the way people consume, produce and interact with information has grown exponentially. The rapid change that the media environment has undergone since the advent of social media can be seen in the quantity of information produced each day. The total amount of data in the world was 4.4 Zettabytes (ZB) in 2013, and is set to rise steeply to 44 ZB by 2020. According to a report from IBM Marketing Cloud [IBM Report 2017], 90% of the data in the world today has been created in the last two years alone, at 2.5 quintillion bytes of data a day.

This can also be seen in the changing habits and behaviour of people – and particularly of younger generations – in the way they consume and interact with information. Every action today can be photographed, filmed and distributed to friends, family or beyond. This new era of mass self-communication [Castells 2010], which is self-generated in content, self-directed in emission and self-selected in reception, reaches a potentially global audience through p2p networks and is the backbone of today’s communication networks. This dramatic shift in the production, consumption and interaction with information is bound to have a deep impact not only on the media environment but also on most systems, whether looking at the micro-level of individuals – the way they think, see themselves and the world, interact with others and take decisions, including what to buy or whom they should vote for – or at the macro-level, in the way our democratic and political systems and economic models work.

“The shift from traditional mass media to a system of horizontal communication networks organised around the Internet and wireless communication has introduced a multiplicity of communication patterns at the source of a fundamental cultural transformation, as virtuality becomes an essential dimension of our reality” says leading sociologist Manuel Castells in describing the basis for our new “networked society”.

These enormous shifts are having huge ramifications in all fields and, in fact, more and more resources are being poured into this new medium in order to better harness and use it. In the political arena, candidates and parties are developing ever more sophisticated social media strategies to get elected – look, for instance, at the impact of big data analytics and microtargeting during the Brexit referendum and the 2016 US elections [The Guardian 2017, BBC 2017] – while policy-makers are struggling to genuinely engage with their constituencies via these new channels. Indeed, as remarked by the Commission’s White Paper on the future of Europe, “restoring trust, building consensus and creating a sense of belonging is harder in an era where information has never been so plentiful, so accessible, yet so difficult to grasp. The 24/7 nature of the news cycle is quicker and harder to keep up with and respond to than it ever has been before. More tweets are now sent every day than in a whole year ten years ago. And by 2018, around a third of the world’s population will use social media networks”.

Social media are also becoming ever more important for companies, which invest enormous amounts to boost their reputations and where – on a darker note – campaigns are launched to damage competitors’ reputations. An example of this more sinister behaviour could be observed during the “fridge incident” in China, where Siemens’ name was blacklisted by consumers after a video of one broken fridge door went viral [Economic Observer 2011], with heavy losses for the company’s reputation and revenue stream. Businesses are also becoming increasingly good at using social media for their own commercial purposes: microtargeting practices in the marketing world are increasingly sophisticated and widely adopted. As demonstrated by the first large-scale study on the effect of microtargeting, carried out by Matz et al. (in press), they are also extremely effective: the research, involving 3.7 million individuals, showed that persuasive messages matched to people’s extraversion or openness-to-experience level resulted in up to 40% more clicks and up to 50% more purchases than their mismatched or un-personalized counterparts.

Although the era of social media is just in its infancy, the dramatic impact it has had on the world has sparked the emergence of completely new fields of academic research.

For simplicity, we can divide this research into two main areas:

  1. Understanding how information spreads on social media
  2. Understanding the emergence of narratives and change of behaviour in the age of social media

1. Understanding how information spreads on social media

The emergence of the network society has created a whole field of research looking at, and trying to understand, how information spreads on social media. Why do some things go viral? Why were Obama’s or Trump’s social media campaigns so much more effective than their opponents’? Why do people retire into the comfort of their own “echo chambers”, reaffirming their worldview and creating an increasingly polarized world? Why has the information environment of social media developed in a way so different from that envisaged by its creators?

The initial view of the changing media environment spearheaded by the internet and the emergence of social media was that of an open culture of information that would allow truly democratic collaboration and empowerment, with the economic and political barriers to participation taken away. In 2005 Surowiecki formalized the concept of the wisdom of crowds, which posits that the emergence of social media would support the development of a more democratic decision-making process, since decisions and tasks carried out through collective action are often better than those made by individuals [Surowiecki 2005]. This paradigm brought many to hail the Internet as a tool to provide true freedom and equality, as access to information becomes universal and modes of collaboration endless and fluid.

Wikipedia is usually taken as the prime example of the potential of this collaborative vision. In Wikipedia, the large number of users and contributors who constantly fact-check, correct and update entries results in an encyclopaedia whose quality and accuracy of information are comparable to those of the leading encyclopaedias and sometimes better [Giles 2005].

Another very successful example is the operating system Linux. Both cases show that a necessary condition for the wisdom of crowds to develop is the presence of a fertile environment that can accept crowd-based decisions, that provides everybody with the same information, and that works in a decentralized fashion, leveraging the differences between individuals and aggregating their opinions fairly. Unfortunately, this idealistic scenario has only worked in a few cases and for certain kinds of action. More often than not, very different patterns of behaviour have emerged; in the real world, where all too often incentives work against the collective and in favour of individual interests, significant research has demonstrated that networks develop in a different way. This was made clear in a variety of experiments based on coordination, such as the 2009 DARPA Network Challenge, which offered a $40,000 prize to the first team to find red weather balloons placed in 10 undisclosed locations in the continental United States. The winning team distributed the task on social platforms and adopted a variant of a known incentive framework [Kleinberg & Raghavan 2005]. Notwithstanding the incentive approach, the winning team had to overcome a number of fake reports and deliberate spoofing. This raised serious concerns about the role of the Internet and social networks in timely and effective communication for the management of populations in the case of natural disasters and/or terror attacks, when pervasive and timely warnings and alerts can be crucial to save lives [National Academic Press 2015].
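
For illustration, the recursive incentive scheme reportedly used by the winning MIT team can be sketched as follows: the finder of a balloon receives a fixed reward and each person up the recruitment chain receives half of what the person they recruited receives. The figures and names below are an approximation for clarity, not an exact reconstruction of the team's payout rules.

```python
# Illustrative sketch of the recursive incentive scheme used in the DARPA
# Network Challenge: the finder receives a fixed reward and each recruiter up
# the chain receives half of what the person they recruited receives. The
# $2000-per-balloon figure follows commonly reported accounts; treat it as an
# approximation.

def recursive_payouts(chain, finder_reward=2000.0):
    """chain: list of participants from the balloon's finder up to the root recruiter."""
    payouts = {}
    reward = finder_reward
    for person in chain:
        payouts[person] = reward
        reward /= 2.0
    return payouts


if __name__ == "__main__":
    # Dana found the balloon; Carol recruited Dana; Bob recruited Carol; ...
    chain = ["Dana", "Carol", "Bob", "Alice"]
    for person, amount in recursive_payouts(chain).items():
        print(f"{person}: ${amount:.2f}")
    # The total paid out stays below 2 * finder_reward regardless of chain
    # length, which keeps the scheme within the prize budget.
```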

Research today demonstrates that it is impossible to describe with simple features the effects of social media on the development of our society, as the complexity of accounting for the actions of millions of individuals far exceeds our capacity. Social media are today an incredibly powerful instrument of news creation and distribution. The emergence and ubiquity of issues such as “fake news”, “microtargeting”, “computational propaganda” and echo chambers demonstrate the power that a diverse range of actors ascribes to social media. It is therefore of the utmost importance to understand the forces and the causes that generate these phenomena, which are seriously changing present-day society [The Guardian 2017].

SCIENCE OF NETWORKS: Statistical approaches, complex systems and network theory are key instruments for describing the spreading of information on social media. Statistical physics in particular has proven very effective in detecting sudden changes in the qualitative properties of a system (functional/non-functional). Here the role of the topological structure of the connections [Caldarelli 2007] between individuals in present-day society has been investigated in a variety of papers [Quattrociocchi et al. 2014]. The key point is to detect and understand the role of superspreaders, i.e. the persons/sites who account for most of the diffusion in the system. As communication today displays patterns similar to those of other complex networks, be they biological or computer networks, an obvious approach to better understand the spreading of information on social media is to consider the analogous diffusion patterns of disease in society. Yet this model has proven insufficient to explain certain features of the spreading of information on social media, as it ignores the fact that diseases and news are transmitted through different mechanisms. Indeed, for diseases the role of superspreaders has traditionally been linked to measures of centrality [Lloyd-Smith et al. 2005, Reluga 2009]. Contrary to expectations [Jackson et al 2011, Ghali et al. 2012], when considering social networks, traditional quantities such as PageRank or degree centrality are not always successful in detecting the key players. Much more successful approaches are related to the analysis of the persistence of k-core structures [Pei et al. 2015, Gong et al 2016] across different social networks.
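
As a minimal sketch of the comparison discussed above, assuming the networkx library is available, the following fragment ranks the nodes of a toy graph by their k-core index and reports degree and PageRank alongside it; the toy graph is illustrative only.

```python
# Minimal sketch comparing degree centrality, PageRank and k-core membership
# as proxies for identifying potential superspreaders; the graph is a toy
# example, not a real social network.
import networkx as nx


def rank_candidates(graph: nx.Graph, top_n: int = 5) -> None:
    degree = dict(graph.degree())
    pagerank = nx.pagerank(graph)
    core_number = nx.core_number(graph)   # k-core index of each node

    print(f"{'node':>6} {'degree':>7} {'pagerank':>9} {'k-core':>7}")
    by_core = sorted(graph.nodes, key=lambda n: core_number[n], reverse=True)
    for node in by_core[:top_n]:
        print(f"{node:>6} {degree[node]:>7} {pagerank[node]:>9.4f} {core_number[node]:>7}")


if __name__ == "__main__":
    # A small well-known toy network with a densely connected core.
    g = nx.karate_club_graph()
    rank_candidates(g)
```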

Although significant research has been done to understand the spreading of (mis)information on social media, the complexity of the environment, the number of individuals involved and the presence of echo chambers make the system so complex that no simple model seems to be sufficient at this time. Also worth noting is the fast – and relatively unpredictable – pace at which people move from one platform to the next, changing their patterns and preferred methods of communication (text, photo, other) as technology and services progress. More interdisciplinary studies will be required in order to answer the fundamental questions relating to the workings of social networks.

RESEARCH ON ALGORITHMS AND SOCIAL MEDIA: Today, algorithms curate everyday online content by prioritizing, classifying, associating, and filtering information. In doing so, they exert power to shape the users’ experience and even their perception of the world [Diakopoulos 2015]. In the light of the issues raised above, another core area of research has been looking into the structure and biases of algorithms and their effect on social media. Some of the most powerful influencers on elections today are the social media platforms and the algorithms they use to spread information [The Guardian 2017]. Yet it is not possible for researchers to measure and effectively study these phenomena, as the majority of data is not made available by the privately owned platforms (Twitter, Facebook, etc.). Philip Howard of the Oxford Internet Institute highlighted “that there have been several democratic exercises in the last year that have gone off the rails because of large amounts of misinformation in the public sphere; Brexit and its outcome, and the Trump election and its outcome, are what I think of as ‘mistakes’, in that there were such significant amounts of misinformation out in the public sphere.” [The Guardian 2017]. Whilst over 60% of internet users in 2015 were completely unaware of the curated nature of their social media feeds [Eslami et al. 2015], these feeds are in reality highly curated content, with a clear algorithmic selection based on a number of factors including payment history, popularity, how interactive a user has been and the actions of others.

Researchers today agree that the impact of the algorithms of social media platforms should not be underestimated and that they bear significant responsibility for the spreading of (mis)information. The politics behind algorithms is becoming one of the most important questions: “who (and how) is setting the news agenda in the era of algorithms?”, leading to the question: Who Controls the Public Sphere in an Era of Algorithms? [Reed 2016]. Along these lines there is growing concern about the rise of bots [Ferrara et al. 2016] as producers and amplifiers of fake news in systems like Twitter. DARPA, for example, organised a challenge to find the best algorithmic approaches to detect and measure the influence of bots, including those deployed by terrorist organisations [Subrahmanian 2016]. There is also a strong debate as to whether or not the use of algorithms increases [Messing et al. 2014] the presence of so-called echo chambers [Bakshy et al. 2015, Bessi et al. 2016].
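
Purely as an illustration of the kind of feature-based bot scoring studied in this literature [Ferrara et al. 2016], the fragment below combines a few account-level heuristics into a score; the features, thresholds and weights are hypothetical examples, not a validated detector.

```python
# Illustrative feature-based bot scoring; all features, thresholds and weights
# below are hypothetical examples, not the DARPA challenge methods or any
# published detector.
from dataclasses import dataclass


@dataclass
class Account:
    tweets_per_day: float
    followers: int
    following: int
    default_profile_image: bool
    account_age_days: int


def bot_score(a: Account) -> float:
    """Return a heuristic score in [0, 1]; higher means more bot-like."""
    score = 0.0
    if a.tweets_per_day > 50:
        score += 0.3                       # unusually high posting rate
    if a.following > 0 and a.followers / a.following < 0.1:
        score += 0.2                       # follows many, followed by few
    if a.default_profile_image:
        score += 0.2
    if a.account_age_days < 30:
        score += 0.3                       # very young account
    return min(score, 1.0)


if __name__ == "__main__":
    suspect = Account(tweets_per_day=120, followers=15, following=900,
                      default_profile_image=True, account_age_days=10)
    print(f"bot score: {bot_score(suspect):.2f}")
```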

2. Understanding the emergence of narratives and change of behaviour in the age of social media

Social media do not affect society only through algorithms and changing patterns of behaviour. Research shows that the “content” of information is key to understanding how social media networks work and why certain pieces of information have more effect than others. Evidence that came out in the wake of the Brexit referendum demonstrated the impact of misinformation and the declining role of evidence in public discourse in general. An interesting aspect of the debates was the completely diverging narrative used on social media compared to the speeches of politicians and the coverage in traditional media.

This raised two important questions. The first was the declining importance of facts and evidence; Michael Gove’s (in)famous quote, “people in this country have had enough of experts”, was later echoed in the United States by the coinage of the term “alternative facts”. The second issue was the inadequacy of pollsters in predicting the outcome of the vote. In fact, since the Brexit campaign and in preparation for the 2017 UK elections, all UK polling services tweaked their methodologies to try to become more accurate, with poor results. This highlights that today even professional forecasters cannot read the intricacies and complexity of the new world of social media, demonstrating the need for a more systematic and comprehensive approach that not only looks at patterns of behaviour and the spread of information, but also works closely with psychologists, political scientists, economists and other experts to design interdisciplinary, and perhaps intersectoral, ways of approaching the understanding of this new realm.

A new culture is forming, the culture of real virtuality [Castells, 2010], in which the interaction between the online and offline worlds becomes increasingly hybridized in our everyday life. Evidence from academic research has shown that this new culture – and the communication fabric made available by the Internet and wireless communications that permeates everything we do, wherever and whenever we do it – is increasingly polarising society, creating echo chambers and parallel realities where the meaning of truth is contested.

NARRATIVE THEORY: The above trends have brought increasing focus on the role of narratives in framing reality. A narrative is a story that has a beginning, a body and an end, and that unfolds in accordance with a plot. A narrative describes the temporal sequences of actions performed by agents, and the consequences of these coordinated actions. Narratives are shared with others. Groups that share narratives form narrative communities. Narrative schemas are abstract representations of the common features of sets of narratives that have a similar structure. By specifying how scenarios usually unfold, they provide the basis on which individuals understand the rules that govern the social world.

Cognitive science shows that an estimated 98% of our thought is reflexive, i.e. unconscious, rather than reflective (conscious) [Lakoff, 2008]. This startling fact underlines why narrative theory is so important. Narratives provide the structure or blueprint with which individuals embedded in a social context understand their social reality (e.g., Berger & Luckmann, 1966; McAdams, 2006; Nowak et al., 2016; Polkinghorne, 1991). So although life unfolds on a momentary and granular basis, a narrative organizes this granularity into a higher-level structure that gives meaning to subsets of lower-level actions and events. Individuals do not decide about the credibility of information on an item-by-item basis. Rather, they process information in the context of narratives they have adopted, believing information that fits those narratives (“it must be true, because this is how things usually are”) while rejecting information contrary to them. Moreover, individuals actively seek information that confirms their narrative schemas and avoid information that contradicts their narratives. For example, it was shown that 77.92% of likes and 80.86% of comments on fake content generated by trolls and satire sites came from users interacting with conspiracy stories (Bessi et al. 2015). Efforts to debunk fake news that supports adopted narratives often lead to the paradoxical effect of initiating a search for information that supports the narrative, which in effect strengthens it.

NETWORKS OF TRUST: Evidence shows that the perception of “truth” depends on the narratives that individuals accept as reality. Evidence also shows that information and misinformation spread in the same way across social media. Research has revealed that a key factor for individuals in classifying information is its origin. If the information comes from a source perceived as trusted and fits the accepted narrative, it will be accepted without question; if it comes from a source perceived as trusted but does not fit the narrative, the user will search for evidence; if it comes from a source perceived as untrusted, it will be discredited without much thought. Thus the study of networks of trust is becoming an increasingly important topic for researchers, as much as the study of the impact of influencers.
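
The decision rule described above can be encoded as a toy function, for instance as a starting point for agent-based simulations; the rule follows the text, while the function name and return labels are illustrative assumptions.

```python
# Toy encoding of the trust/narrative decision rule described above; useful
# only as an illustrative model, not an empirically validated one.
def evaluate_information(source_trusted: bool, fits_narrative: bool) -> str:
    if source_trusted and fits_narrative:
        return "accept without question"
    if source_trusted and not fits_narrative:
        return "search for corroborating evidence"
    return "discredit without much thought"


if __name__ == "__main__":
    for trusted in (True, False):
        for fits in (True, False):
            print(f"trusted={trusted}, fits={fits} -> {evaluate_information(trusted, fits)}")
```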

SENTIMENT ANALYSIS: Whilst narrative theory remains a very resource-intensive field, sentiment analysis has emerged to help with understanding the meaning of very large numbers of natural-language or text messages. Sentiment analysis provides a way to aggregate information contained in a high volume of unstructured material. In large bodies of text, for example, specific information may often be disregarded, because the essence of meaning may be characterized by the evaluation contained in the text (Osgood 1952). Sentiment analysis refers to the automatic discovery of evaluation in text (whether it is positive, neutral or negative) with respect to emotions, attitudes and opinions. It usually aims to discover the opinions or attitudes of a set of individuals (the authors of the texts) towards a person (e.g. a political candidate), an object (e.g. a product) or a topic (e.g. reaction to a news event). It is also used to diagnose users’ affective states. Sentiment analysis is one of the most active research fields in natural language processing, both in scientific studies in computer science and the social sciences and in practical applications for business (e.g. marketing) and society (for example in the diagnosis of public opinion and its shifts). Sentiment analysis is an especially important tool for the analysis of the content of social media, which contains a large volume of opinionated data. It provides a common metric with which the attitudes of groups of individuals in social networks may be revealed, changes in attitudes can be tracked over time, and attitudes toward products or individuals (e.g. political candidates) can be diagnosed.
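
A minimal lexicon-based sketch, assuming a tiny hand-made word list, illustrates the kind of aggregation described above; real sentiment analysis systems use far richer lexicons or trained models.

```python
# Minimal lexicon-based sentiment sketch; the tiny lexicon and the simple
# counting rule are illustrative assumptions, not a production model.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def sentiment(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def aggregate(messages):
    """Aggregate individual labels into a distribution over a corpus."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for msg in messages:
        counts[sentiment(msg)] += 1
    return counts


if __name__ == "__main__":
    posts = ["I love this candidate", "terrible debate performance", "no opinion"]
    print(aggregate(posts))
```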

PSYCHOMETRICS, PSYCHOLOGICAL OPERATIONS – PSY-OPS: The availability of big data combined with psychometrics and psychological operations has raised a number of questions for researchers, policymakers, media and other stakeholders. Indeed, as people use social media more and more, this enormous amount of data on individual behaviour and interests is today being used to make predictions about the future [Qiu et al 2017]. The effects that this can have on our behaviours, institutional systems and our conception of democracy are central questions today. Laboratory studies show that persuasive appeals are more effective in influencing behaviour when they are formulated to fit individuals’ unique psychological characteristics. Recent research shows that people’s psychological characteristics can be accurately predicted from the digital footprints they leave in the digital environment. It has been shown that a wide range of psycho-demographic traits, including political and religious views, sexual orientation, personality, intelligence and race, can be accurately predicted from an individual’s Facebook Likes, Tweets or browsing logs (PNAS 2013 Kosinski, JPSP, Quercia). For example, Facebook-Likes-based predictions of personality were shown to be more accurate than personality ratings by friends and family members (PNAS 2015 Wu). The ability to reveal the psycho-demographic traits of large numbers of people based on their digital footprints is a key enabler of the persuasive microtargeting addressed in more detail below.
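
The prediction setting described above can be sketched as follows: a binary user-by-page "Likes" matrix is used to predict a psychological trait. The data below are synthetic and the choice of logistic regression is just one of many modelling options reported in the literature.

```python
# Sketch of trait prediction from digital footprints: a synthetic binary
# user-by-page Likes matrix and a synthetic binary trait. The data and model
# choice are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_users, n_pages = 500, 200

# Synthetic Likes matrix and a trait loosely correlated with a subset of pages.
likes = rng.integers(0, 2, size=(n_users, n_pages))
signal = likes[:, :10].sum(axis=1)
trait = (signal + rng.normal(0, 1, n_users) > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    likes, trait, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```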

O2O EFFECT: Online to Offline (O2O) is a new model of e-commerce in which online consumers can obtain products and services offline (Du, Tang 2014; Tsai, Yang & Wang). This area of research is gaining traction as the culture of real virtuality becomes more pervasive. O2O mechanisms may also be used in political persuasion, where a group of users identified online as having a particular political orientation, or a specific profile of likes, may be prompted towards some real-world action, e.g. to attend a rally, join a meeting or stage a boycott. In a similar vein, an identified offline group of individuals (e.g. the attendee list of a particular meeting) may be targeted in online messaging.

MICROTARGETING: One of the most interesting and novel trends today is that of microtargeting. Microtargeting uses big data mining techniques to adjust information to its viewer’s profile. It is used by political parties and election campaigns to communicate with specific groups of voters, relying on predictive market segmentation (also known as cluster analysis), in order to influence elections. The most alarming example of this kind came out after the 2016 US elections, when it emerged that a company working at the intersection of big data analytics and psychometrics, Cambridge Analytica, claimed to have successfully influenced both the Brexit referendum and the US elections by using such data to create microtargeted messages aimed at users across the voter base [The Guardian 2017]. The case of Cambridge Analytica made so apparent the technical capacity we have today to use new technologies to “manipulate” voters that governments across the world have started to create research groups looking specifically at this problem, described by the Guardian as the hijacking of our democracy. Following the validation of claims that Russian bots and volunteers used microtargeting in the 2016 US presidential elections, the New York Times described this as “a worldwide, internet-based assault on democracy”. Scholars at the Oxford Internet Institute have tracked armies of volunteers and bots as they move propaganda across Facebook and Twitter in efforts to undermine trust in democracy or to elect their preferred candidates in the Philippines, India, France, the Netherlands, Britain and elsewhere [Vaidhyanathan 2017, New York Times].
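
The predictive market segmentation (cluster analysis) step that microtargeting relies on can be illustrated with a simple k-means sketch; the voter features and data below are synthetic placeholders, not any campaign's actual variables.

```python
# Sketch of predictive market segmentation via k-means clustering; the voter
# features and data are synthetic illustrations only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Synthetic voter features: [economic attitude, social attitude, media hours/day]
voters = np.vstack([
    rng.normal([-1.0, -0.5, 1.0], 0.3, size=(100, 3)),
    rng.normal([1.0, 0.8, 3.5], 0.3, size=(100, 3)),
    rng.normal([0.0, 1.2, 2.0], 0.3, size=(100, 3)),
])

X = StandardScaler().fit_transform(voters)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Each segment could, in principle, receive differently framed messaging.
for seg in range(3):
    centroid = voters[segments == seg].mean(axis=0)
    print(f"segment {seg}: n={np.sum(segments == seg)}, centroid={np.round(centroid, 2)}")
```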

DEBUNKING FAKE NEWS: The problem of fake news is much more complex than it looks at first sight. Fake news is not a new phenomenon; it has been present in the public sphere forever, as part of political propaganda, marketing efforts, popularity-seeking by newspapers and so on. The novelty lies in the technological aspects of how fake news is propagated, rather than in the mere fact that false information is intentionally spread. The essence of democracy is that citizens make informed decisions, and thus access to reliable information is at the foundation of democracy. In fact, the fourth estate – the media – has held a key place in today’s democratic systems. As the main channel of information spread shifts from traditional media to social media, the traditional ways of assuring the reliability of information, such as clearly established standards of responsible journalism, grow progressively weaker. The systematic distortion of information by internal or external sources that we see today represents a dangerously effective way of manipulating societies.

Fake news, and the techniques and processes used to spread it, are in constant evolution, and the definition of fake news is itself problematic, as in most cases fake news results from varying mixtures of true and false facts. On top of that, the communities, technologies, networks, software platforms and online services used to spread these malicious communications are constantly evolving. Indeed, academics, organizations and companies that deal with these phenomena through their research, products and services agree that there is no single one-size-fits-all model, and that scenarios and battlegrounds are ever evolving. Significant efforts have been made by the academic community as well as by media organisations, governments, civil society organisations and citizens to respond to these challenges. Case studies include algorithms, fact-checking tools, and efforts by the big new media giants such as Facebook and Google to address the issue. So far no silver bullet has proven effective, but it will be important to look at the successes and failures of existing initiatives when trying to address the fundamental question of how to deal with the scale of the problem of fake news in today’s media environment and its effect on our political and economic institutions.
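
One ingredient of fact-checking tools, matching a newly seen claim against a database of already fact-checked claims, can be illustrated as below; the database, similarity measure and threshold are simplistic placeholders rather than any existing tool's method.

```python
# Illustrative sketch of claim matching against a database of fact-checked
# claims by lexical overlap; the database, threshold and Jaccard measure are
# simplistic placeholders, not a real fact-checking system.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


FACT_CHECKS = {
    "politician x promised to double pensions": "FALSE",
    "city y banned cars from the centre": "TRUE",
}


def lookup(claim: str, threshold: float = 0.5):
    """Return the best-matching fact-checked claim, its verdict and the score."""
    best = max(FACT_CHECKS, key=lambda known: jaccard(claim, known))
    score = jaccard(claim, best)
    return (best, FACT_CHECKS[best], score) if score >= threshold else None


if __name__ == "__main__":
    print(lookup("politician x promised to double pensions next year"))
    print(lookup("a completely unrelated claim"))
```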

In the 21st-century social media information war, faith in democracy is the first casualty. This is why a better understanding of the way in which (mis)information spreads across social media, in a comprehensive and interdisciplinary way, is a prerequisite for finding meaningful ways to address the deeper issues that these new technologies are raising and how they are affecting social behaviour.