## Previous Talks

*Essentials of Real-World Data Analysis (2):*Noise, Duplicate and Outlier Detection

🗣** Jamshid ArdalanKia**, Visitor at CCNSD – Shahid Beheshti University

Mon, Farvardin 26th, 1399**16:00 – 17:00**

Ibn al-Haytham Hall, Physics Dept. SBU

#### Summary

How can we explore/extract patterns from experimental (real) data-sets which are valuable but messy and containing missing values? In real-world data analysis, discovering the statistical characteristics of data is of great importance. We are eager to reveal the intrinsic and latent properties of data sets which they are not complete or accurate! So, the question is, given such messy data-sets, how can we extract the patterns embedded in the data and then, to what extent can we check the accuracy of our results? As discussed in the previous talk, 80% of data scientists’ time is spent on data cleaning. Data cleaning is not something separated from data science. In this regard, analysis of outliers and noise, duplicates and missing values is of great significance for coming to a reliable study. In this presentation, for the continuation of discussions about data cleaning before running the preprocessing step, we statistically visualize the noise and outlier data. By these methods, we can further design our criteria for what we consider as noise and outlier. Then, this question arises: Should we get rid of them? Or conversely, are we lucky to have them?

*Essentials of Real-World Data Analysis*

🗣** Jamshid ArdalanKia**, Visitor at CCNSD – Shahid Beheshti University

Mon, Farvardin 19th, 1399**16:00 – 17:00**

Ibn al-Haytham Hall, Physics Dept. SBU

#### Summary

How can we explore/extract patterns from experimental (real) data-sets which are valuable but messy and containing missing values? In real-world data analysis, discovering the statistical characteristics of data is of great importance. We are eager to reveal the intrinsic and latent properties of data sets which they are not complete or accurate! So, the question is, given such messy data-sets, how can we extract the patterns embedded in the data and then, to what extent can we check the accuracy of our results?

At CCNSD, the cornerstone of our projects is extracting the network(s) from experimental data-sets with the highest possible accuracy. Then, alongside different visualization techniques, we continue our job as network scientists to come to a good description or/and prediction for the system under study. Therefore, the first challenge to tackle while working with real-world data is the problem of missing values and noise reduction/removal procedures before doing any kind of analysis

.

In this presentation, for the sake of more precise results and not coming to adverse conclusions in our projects, I would briefly talk about why we encounter with missing values, how to avoid them as input parameters, how to set some criteria on them, how to find, perceive, handle, manipulate, or/and get rid of fake information in the data!

*The Role of Agents’ Relation Age in Social Balance of a Singed Network*

🗣** Dr. Sahar Arabzadeh**, Visitor at CCNSD – Shahid Beheshti University

Mon, Esfand 20th, 1397**16:00 – 17:00**

Ibn al-Haytham Hall, Physics Dept. SBU

#### Summary

Why we are intelligent species among many organisms? There are many arguments about this. By looking at the natural process of growth, evolution or social relations, you can immediately capture that any agent has a lifetime that is not infinite. From an evolutionary perspective, we should have enough time to achieve our abilities and express them. Could there be a human civilization if we would have lived just 10 years old? On the other hand, could it be more intelligent species if they were let to live more?

It seems that more chance to survive and get to the maximum life span, give more chance to see intelligent species or in our world, more individuals who reach to balance. Why? Because the excellence of an evolutionary system is the achievement of balance.

It is not just that how long it survives, but that how long does it take to have the opportunity to communicate with other agents. This is the nodes’ duty to reduce network stress by changing their relations. It is important to have time to form a balanced and minimized stress society. In other words, we can say more intelligent species are those who communicate with their environment in the best way.

*CCNSD Interdisciplinary Research Projects on Twitter, Arms Trade & Co-citation Maps*

🗣 Dr. Kosar Karimi Pour, Post-doctoral Researcher at CCNSD – Shahid Beheshti University

#### Summary

Here at CCNSD, researchers from different backgrounds (e.g. physics, mathematics, business administration, and sociology) collaborate to understand the complex nature of social networks. In this presentation, I will give brief introductions into three empirical research projects currently underway in the center. Through these introductions, I am hoping to stimulate a discussion on the power and challenges of such interdisciplinary collaborations. In the first project, we use Twitter API for data crawling to answer a series of questions, including how social bots in Farsi Twitter have evolved in the past ten years, what are the patterns of interactions between bots and human users in Farsi Twitter, and what are the different patterns of propagation of true and false news on Twitter. In the second project that I will introduce, we use international arms trade data to explore the network mechanism through which the arms trade moved away from its polarized state during the Cold War. Finally, I will talk about the project on intellectual mapping of the field of Middle Eastern Studies. In this project, which is an extension of my Ph.D. dissertation, we use data from Google Scholar to create longitudinal co-citation maps of Middle Eastern Studies in English and ask whether the social backgrounds of individual scholars can (partly) explain their positions in the intellectual network of the field.

#### Absorbing Phase Transition in the Coupled Dynamics of Node & Link States in Random Networks

🗣 Dr. Meghdad Saeedian, IFISC (CSIC-UIB), Universitat Illes Balears, Spain

#### Summary

We present a stochastic dynamics model of coupled evolution for the binary states of nodes and links in a complex network. In the context of opinion formation node states represent two possible opinions and link states a positive or negative relation. Dynamics proceeds via node and link state update towards pairwise satisfactory relations in which nodes in the same state are connected by positive links or nodes in different states are connected by negative links. By a mean-field rate equations analysis and Monte Carlo simulations in random networks we find an absorbing phase transition from a dynamically active phase to an absorbing phase. The transition occurs for a critical value of the relative time scale for node and link state updates. In the absorbing phase the order parameter, measuring global order, approaches exponentially the final frozen configuration. Finite-size effects are such that in the absorbing phase the final configuration is reached in a characteristic time that scales logarithmically with system size, while in the active phase, finite-size fluctuations take the system to a frozen configuration in a characteristic time that grows exponentially with system size. There is also a finite-size topological transition associated with group splitting the network of these final frozen configurations.

*Using big trace data to understand social behavior and decision making*

🗣 Amirhosein Farzam – Northwestern,PhD. student in Applied Mathematics

#### Summary

GPS data is a new source of information that became available at scales that would allow for studies on individuals only recently. Despite it being a conveniently available source of massive data, there have been relatively few studies on using this data to understand social concepts and study political issues systematically. We developed a pipeline for processing big trace data and matching it to nodes on the streets network using a Hidden Markov Model in a computationally feasible way. The output of this pipeline is then used to compute the posterior distribution of a random path model parameters that would shed light on taste-based sectarian segregation in Baghdad.

*Exact solution of generalized cooperative SIR dynamics*

🗣Fatemeh Zarei, SUT