Twitter Project

The Twitter team at CCNSD studies Farsi Twitter and its trending topics. Our team uses Twitter data as an indicator of the social, cultural, and political concerns and interests of Twitter users. We assume, however, that the impact of Twitter and other social media does not stop online, and that these processes are part of a much larger sphere of influence on Farsi-speaking countries and diasporic communities. Currently, we are working on two different projects:

(1) Human-bot interaction in the formation of collective identities in Farsi Twitter

(2) A typology of fake news based on its propagation.


Human-bot interaction in the formation of collective identities in Farsi Twitter

For collective action to take place, a collective identity or discursive community (or, in a less strict sense, a collective voice) is formed through a dynamic process. Today, Online Social Networks (OSNs) have changed the way these collective identities or voices are shaped and the way actors come to constitute themselves as a collective or an online community. When a collective identity is shaped on social media, the boundaries between inside and outside tend to blur, leading to a less concrete definition and to more conflict and negotiation in the discursive space. We are interested in studying these collective identities and the dynamic processes of their formation.

Although the ephemeral role of OSNs is clear in flash mobilization or in the aggregation of the public around contentious issues, our focus is on the formation of a more durable phenomenon: collective identities (or discursive communities). Simple aggregation of individuals around a common cause does not create long-standing community and solidarity. We look at individual actors on OSNs and ask whether collective identity is fostered through individual activities on OSNs. We also ask whether collective solidarity is shaped around different political, social, cultural, economic, and environmental issues, or whether the process of using OSNs is, in general, more individualizing.

Among the various OSNs, our focus is on Twitter. As the most popular microblogging tool, Twitter has become a live source of discussion of current social and political events. Compared to other platforms, Twitter plays a significant role in facilitating the formation of online communities. It is more public and offers limited privacy by default, contributes more to the rapid spread of information, is the most timely platform in covering the news, and is the most commonly used corporate social media platform (Min, Wilson, & Moon, 2014; Osborne & Dredze, 2014; Olteanu, Castillo, Diakopoulos, & Aberer, 2015).

We are particularly interested in the role of Twitter bots in the formation of these collective identities or discursive communities. The bot population is estimated to be between 5% and 15% of accounts on Twitter, an estimate which excludes cyborgs and possibly very sophisticated bots that “escape a human annotator’s judgment” (Varol et al., 2017). Twitter bots, frequently part of a botnet (a group of coordinated bots), are used to perform simple and structurally repetitive tasks at a much higher rate than humans can. Bots tweet, retweet, mention, and follow other users or bots, either individually or collectively in a coordinated manner (i.e. as botnets). Automated programs or bots are developed to disrupt or influence online discourse, using spam hashtags and scamming users, and they exercise a profound impact on content popularity and activity on Twitter (Ferrara, Varol, Davis, Menczer & Flammini, 2016).

Dr. Kosar Karimipour Introduces the Farsi Twitter Project

Among the different types of bots, we focus on social bots, the type that aims to influence public discourse on Twitter. Today, social bots and botnets attempt to play an opinion-making role in OSNs in general, and on Twitter in particular, by affecting or manipulating the discourse as a whole. A growing stream of research looks for, and quantifies, the effects of such coordinated activities on a range of topics, from elections to civil war and even advertising (see for example Abokhodair, Yoo & McDonald, 2015; Bessi & Ferrara, 2016; Clark et al., 2016). Social bots are created by entities (individuals or organizations) with the technical skills to target a population and promote content, motivated by a wish to boost popularity or to influence public opinion (Bessi & Ferrara, 2016; Ferrara et al., 2016). Frequently referred to as sybils, they seem to exert a disproportionately large influence on OSNs by creating pseudonymous identities. Researchers have studied the flow of information and ideas and social connectivity, and have shown how the behavior of bots inside botnets (interacting with other bots) differs from, and is interconnected with, their behavior outside them (interacting with humans) (see for example Berger & Morgan, 2015; Bessi & Ferrara, 2016; Stella, Ferrara, & De Domenico, 2018). Following this stream of literature, we aim to detect the effect of social bots on online discourse on Farsi Twitter. Our research questions focus on the interaction between human and bot users in the formation of collective identities or discursive communities.

Bot detection, or the classification of accounts into bots, human users, and possibly cyborgs, is the first step toward studying the patterns of communication among different classes of entities on the Farsi Twitter network. Easily detectable bots show lower levels of entropy, their tweet content follows spam patterns, and their account properties (e.g. URL ratio, account name, tweeting device makeup) deviate more from the norm. Simple classification algorithms are built to detect these bots using metadata such as the followers/followings ratio, tweeting frequency, account age, and the number of mentions/replies (Chu, Gianvecchio, Wang & Jajodia, 2012). Over time, however, some bots have become more sophisticated in emulating the patterns of human activity and are now harder to detect. To detect these more sophisticated bots, a combination of features is usually evaluated. These features include Twitter user metadata (e.g. default picture, the length and number of digits of screen names and usernames, and the per-hour and total number of tweets, retweets, or followers); timing (e.g. the time between two (re)tweets); the follower/following network (e.g. language use, the distribution of account age, network density, and clustering coefficient); content (e.g. the number and entropy of words and the proportion of part-of-speech tags); and sentiment (e.g. the number of emotions per tweet and the polarization score of each tweet) (Varol et al., 2017).
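As a rough illustration of the simpler metadata and content features listed above, the following Python sketch computes a handful of them for a single account. The field names (`followers_count`, `account_age_days`, and so on) and the whitespace tokenization are assumptions made for this example; they are not the feature set or the schema used by the team or by Varol et al.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def account_features(account, tweets):
    """Compute a few simple bot-detection features for one account.

    `account` is a dict with hypothetical field names chosen for this sketch;
    `tweets` is a list of tweet texts posted by that account.
    """
    # Naive whitespace tokenization; real Farsi text would need a proper tokenizer.
    words = [word for text in tweets for word in text.split()]
    name = account["screen_name"]
    return {
        "follower_following_ratio": account["followers_count"]
        / max(account["followings_count"], 1),
        "screen_name_length": len(name),
        "screen_name_digits": sum(ch.isdigit() for ch in name),
        "default_picture": account["default_profile_image"],
        "tweets_per_day": account["statuses_count"]
        / max(account["account_age_days"], 1),
        "word_entropy": shannon_entropy(words),
    }


example_account = {
    "screen_name": "user12345",
    "followers_count": 50,
    "followings_count": 100,
    "default_profile_image": True,
    "statuses_count": 3000,
    "account_age_days": 30,
}
features = account_features(example_account, ["salam salam", "donya"])
```

Features like these would typically be fed to a classifier; on their own they only separate the easily detectable bots described above.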

We use the Botometer API to get a bot score for each account, and a classification mechanism to divide the accounts into bots, possible bots, and humans. Botometer was developed by the Indiana University Network Science Institute (IUNI) and the Center for Complex Networks and Systems Research (CNetS) to classify an account as a bot or a human (Yang et al., 2019). After detecting the botnets (or users showing bot-like behavior) in Farsi Twitter, we seek to study the possible characteristic differences between botnets and human networks. We also ask which alternative discourses, narratives, or hoaxes are disseminated by botnets, and whether the botnets succeed in disseminating them.
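The three-way split into bots, possible bots, and humans can be sketched as a simple thresholding of the bot score. The sketch below assumes that scores have already been retrieved and rescaled to the range 0 to 1; the two cut-off values are placeholders chosen for illustration, not the thresholds used in this project.

```python
def classify_account(bot_score, bot_threshold=0.7, human_threshold=0.3):
    """Map a bot score in [0, 1] to one of three labels.

    Both thresholds are illustrative assumptions, not the team's values.
    """
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must lie between 0 and 1")
    if bot_score >= bot_threshold:
        return "bot"
    if bot_score <= human_threshold:
        return "human"
    return "possible bot"


# Hypothetical scores keyed by account identifier.
scores = {"account_a": 0.92, "account_b": 0.45, "account_c": 0.08}
labels = {account: classify_account(score) for account, score in scores.items()}
```

Keeping an explicit middle class avoids forcing ambiguous accounts, such as cyborgs, into either group.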

Once the required data from Farsi Twitter is available, we aim to take a number of measurements and evaluate the corresponding metrics that could reveal informative patterns in the data. In particular, we need continuous (gap-free) data for tweets in Farsi Twitter over a sufficiently long time interval, preferably including the follower-following metadata that would help us build the friendship network. To identify trends in the network, we need to study the data over time, and it is essential that these time intervals span all stages of development of the Twitter conversations around trending hashtags, from the emergence of a trend to its decline. This enables a meaningful and rigorous analysis of the data with potentially valuable interpretations.

As soon as this data is available, we will first run macro-level analyses that could shed light on the influence of bots on making a conversation trend or, more generally, on the formation of a collective identity. This includes, but is not limited to, evaluating the share of bots in the network (both content and friendship networks) involved in each particular conversation over time. At the next stage of the data analysis, we plan to use standard network analysis methods, such as centrality measures, to evaluate the relative importance of bots in the network corresponding to each piece of content.

Various candidate analysis techniques have been discussed in our group, such as methods for evaluating cascades on networks, information diffusion in multilayer networks (friendship and content), and constructing a gravity field of contents. We prefer not to commit to any of these yet and to keep our options for the technical aspects of the next stage open. In light of the results of the initial macro-level analyses, we could get a clearer idea of which particular methods would produce more informative results and could generalize to other media, societies, or contexts.
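One of the macro-level measurements mentioned above, the share of bots taking part in a conversation over time, could be computed along the following lines. This is a minimal sketch: the input format (pairs of user identifier and ISO timestamp), the daily granularity, and the label names are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def bot_share_per_day(tweets, labels):
    """Fraction of active accounts labeled "bot" on each day.

    `tweets` is an iterable of (user_id, iso_timestamp) pairs and `labels`
    maps user_id to a label such as "bot" or "human" (hypothetical format).
    """
    users_by_day = defaultdict(set)
    for user_id, timestamp in tweets:
        day = datetime.fromisoformat(timestamp).date()
        users_by_day[day].add(user_id)

    share = {}
    for day, users in sorted(users_by_day.items()):
        bots = sum(1 for user in users if labels.get(user) == "bot")
        share[day.isoformat()] = bots / len(users)
    return share


sample_tweets = [
    ("a", "2021-06-01T10:00:00"),
    ("b", "2021-06-01T11:30:00"),
    ("a", "2021-06-02T09:15:00"),
]
sample_labels = {"a": "bot", "b": "human"}
daily_share = bot_share_per_day(sample_tweets, sample_labels)
```

Counting distinct accounts rather than tweets keeps a single hyperactive bot from dominating the measure; counting tweets instead would answer a different question, namely the share of content produced by bots.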

Iranian Presidential Election in 2021

This is a retweet network of the election discussion over the 2 weeks before the election. Each node’s color represents its community. Each community is tagged according to the political interests of its hubs, as shown in the graph.

This is a retweet network of the election discussion over the 1 week after the election. Each node’s color represents its community. Each community is tagged according to the political interests of its hubs, as shown in the graph.
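Hubs in a retweet network like the ones pictured are commonly identified with a centrality measure. The sketch below uses in-degree centrality (the fraction of other accounts that retweeted a given account) as one simple option; the captions do not state which measure was used for these figures, so this choice and the edge format are assumptions.

```python
from collections import defaultdict


def in_degree_centrality(retweets):
    """In-degree centrality for a directed retweet network.

    `retweets` is an iterable of (retweeter, original_author) pairs.
    Repeated retweets between the same pair count once, and the result
    is normalized by the number of other nodes in the network.
    """
    retweeters = defaultdict(set)
    nodes = set()
    for retweeter, author in retweets:
        nodes.update((retweeter, author))
        if retweeter != author:  # ignore self-retweets
            retweeters[author].add(retweeter)
    denominator = max(len(nodes) - 1, 1)
    return {node: len(retweeters[node]) / denominator for node in nodes}


def top_hubs(centrality, k):
    """Return the k accounts with the highest centrality (ties by name)."""
    ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


sample_retweets = [("b", "a"), ("c", "a"), ("c", "b"), ("b", "a")]
centrality = in_degree_centrality(sample_retweets)
hubs = top_hubs(centrality, 1)
```

The same scores, combined with bot labels, would give one way to compare the relative importance of bots and humans within each community.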

Social media is the latest fascination of political communication strategists, particularly during election campaigns. While television and newspapers remain the dominant sources of news in general, social media has become an increasingly important source of political news in particular. Twitter, due to its political nature, has played a significant role in this. With this sudden shift in paradigm, many issues have emerged, including mis/disinformation, echo chambers, polarization, and online manipulation through the use of bots and trolls. These issues have been heavily studied in different contexts, such as COVID-19 pandemic misinformation on Italian Twitter (Mattia et al., 2021) and the activity of bots and non-genuine accounts on Twitter during the 2016 U.S. presidential election, the 2017 French presidential election, and Brexit (Gorwa et al., 2020).

Recent political upheavals in Iran, Tunisia, and Egypt have frequently been portrayed as ‘social media revolutions’ in the Western media, with journalists, pundits, bloggers, diplomats, and PR professionals offering views, opinions, and analysis around the clock. Largely absent from this lively discussion have been academics, who probably could not keep up with the pace of the technophilic media frenzy surrounding these events.

With the rise of technology, and specifically the emergence of “Web 2.0”, the effect that AI agents have in shaping personal decisions and ideologies has become significant. Social media affects the way its users eat, dress, and behave. Moreover, with the rapid increase in the number of social media users, these platforms play a significant role in elections and in the way societies are shaped or disintegrate. Because more and more individuals use social media as their news source, the propagation of dis/misinformation is becoming a serious threat to the public. In addition, the way a topic is discussed and the shape of its network determine how likely a user is to encounter opposing views. Since people are often influenced by social media in everyday decisions such as meals and style, many companies are interested in steering users’ attention toward their products. Similarly, since these platforms also serve as a source of information, they have become a tool for malicious actors who exploit online discussions in an attempt to manipulate public opinion. More research is needed to prevent such manipulation of users’ opinions.

Over the past few years, this team has been studying the retweet and friendship networks of trending topics on Twitter. We built a program called “Twitter Machine” that takes a keyword and an initial date as input and stores every tweet containing that keyword since the given date in an SQLite database. The tweets are made available through Twitter’s Standard Search API. Twitter Machine deconstructs the JSON that the API returns, records the necessary data in an SQLite database, and updates the user index in a PostgreSQL database. PostgreSQL is used as a relational database and contains every user participating in a conversation gathered by Twitter Machine. The users’ information is updated each time Twitter Machine runs a new task. In a recent study of the Iranian presidential election discussion on Twitter, Twitter Machine collected 8,818,675 tweets that included hashtags relating to the election. These tweets were posted by 153,115 users over the 8 weeks covered by the study.
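The step in which the API's JSON is deconstructed and stored in SQLite can be sketched as follows. This is not the Twitter Machine code: the table layout and function name are hypothetical, and the JSON field names (`statuses`, `id_str`, `full_text`, `retweeted_status`) follow the v1.1 search response format as an assumption about the input.

```python
import json
import sqlite3

# Hypothetical table layout for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id          TEXT PRIMARY KEY,
    user_id           TEXT NOT NULL,
    created_at        TEXT NOT NULL,
    text              TEXT NOT NULL,
    retweeted_user_id TEXT
)
"""


def store_tweets(conn, api_response):
    """Parse one search response (a JSON string) and insert its tweets.

    Tweets already present in the table are skipped. Returns the number
    of tweets found in the response.
    """
    payload = json.loads(api_response)
    rows = []
    for status in payload.get("statuses", []):
        retweeted = status.get("retweeted_status")
        rows.append((
            status["id_str"],
            status["user"]["id_str"],
            status["created_at"],
            status.get("full_text") or status.get("text", ""),
            retweeted["user"]["id_str"] if retweeted else None,
        ))
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
sample_response = json.dumps({"statuses": [
    {"id_str": "1", "user": {"id_str": "10"},
     "created_at": "Fri Jun 18 08:00:00 +0000 2021", "text": "original"},
    {"id_str": "2", "user": {"id_str": "11"},
     "created_at": "Fri Jun 18 08:05:00 +0000 2021", "text": "RT original",
     "retweeted_status": {"id_str": "1", "user": {"id_str": "10"}}},
]})
parsed = store_tweets(conn, sample_response)
stored = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

Recording the retweeted user's identifier alongside each tweet makes it straightforward to extract retweet edges for network analysis later.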

The data (lists of users and tweets) is available for collaboration on future projects. Feel free to get in touch for more information about the data.

Current Team Members

Saeedeh Mohammadi

Researcher in Computational Social Sciences at Center For Complex Networks and Social Data Science

Parham Moradi

Data Science, Computational Social Sciences & Complex System

Amirhossein Pilehvarian

Parsa Bigdeli

Researcher in Network Science at Center For Complex Networks and Social Data Science

Former Team Members

Kosar Karimipour

a sociologist by training and now a postdoc at CCNSD

Arya Gholampour

BA in Physics