Effects of missing data in social networks

April 14, 2024 | G. Kossinets
The paper examines how missing data affect the structural properties of social networks, focusing on three mechanisms: network boundary specification (omission of actors or interactions), survey non-response, and censoring by vertex degree (fixed choice design). Using the scientific collaboration network from the Los Alamos E-print Archive and random bipartite graphs, the study shows that boundary specification and fixed choice designs can significantly alter network-level statistics. Omitting interaction contexts, or imposing a fixed choice on them, leads to overestimates of clustering and assortativity, while actor non-response leads to underestimates, which inflates measurement error. Social networks with multiple interaction contexts also exhibit surprising properties due to overlapping cliques: for example, assortativity by degree does not necessarily improve network robustness to random omission of nodes.

The paper stresses that missing data mechanisms must be accounted for in network analysis, and calls for sensitivity analyses and for methods that minimize the effects of missing data. The results suggest that average vertex degree, clustering coefficient, and assortativity are all sensitive to missing data, and that the choice of network representation (e.g., bipartite vs. unipartite) can influence these estimates.
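The three mechanisms can be explored with a small sensitivity analysis. The sketch below is illustrative only and is not the paper's code: it builds a random bipartite (actor-by-context) graph, projects it onto actors, applies a simplified version of each mechanism, and recomputes mean degree and global clustering. All function names, the graph sizes, and the exact rules (a tie is lost under non-response only when both actors are non-respondents; under fixed choice a tie survives if either actor names it) are assumptions made for this example. Which way each statistic moves will depend on these choices, so the output should not be read as a reproduction of the paper's results.

```python
import random
from itertools import combinations

def random_bipartite(n_actors, n_contexts, per_actor, rng):
    """Assign each actor to `per_actor` randomly chosen contexts."""
    return {a: set(rng.sample(range(n_contexts), per_actor))
            for a in range(n_actors)}

def project(affiliations):
    """One-mode projection: actors are tied if they share a context."""
    members = {}
    for actor, contexts in affiliations.items():
        for c in contexts:
            members.setdefault(c, set()).add(actor)
    adj = {a: set() for a in affiliations}
    for group in members.values():
        for u, v in combinations(group, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def mean_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj) if adj else 0.0

def clustering(adj):
    """Global clustering: closed connected triples / all connected triples."""
    closed = triples = 0
    for nb in adj.values():
        for v, w in combinations(nb, 2):
            triples += 1
            closed += w in adj[v]
    return closed / triples if triples else 0.0

def omit_contexts(affiliations, fraction, rng):
    """Boundary specification (assumed form): drop a random share of contexts."""
    contexts = sorted(set().union(*affiliations.values()))
    dropped = set(rng.sample(contexts, round(len(contexts) * fraction)))
    return {a: cs - dropped for a, cs in affiliations.items()}

def non_response(adj, fraction, rng):
    """Non-response (assumed form): ties between two non-respondents are lost."""
    silent = set(rng.sample(sorted(adj), round(len(adj) * fraction)))
    return {u: {v for v in nb if not (u in silent and v in silent)}
            for u, nb in adj.items()}

def fixed_choice(adj, k, rng):
    """Fixed choice (assumed form): each actor names at most k ties;
    a tie is kept if at least one of its two actors names it."""
    out = {u: set() for u in adj}
    for u, nb in adj.items():
        for v in rng.sample(sorted(nb), min(k, len(nb))):
            out[u].add(v)
            out[v].add(u)
    return out

# Hypothetical sizes and rates, chosen only to keep the demo fast.
rng = random.Random(0)
aff = random_bipartite(300, 150, 3, rng)
full = project(aff)
variants = {
    "complete": full,
    "contexts omitted": project(omit_contexts(aff, 0.3, rng)),
    "non-response": non_response(full, 0.3, rng),
    "fixed choice k=5": fixed_choice(full, 5, rng),
}
for name, g in variants.items():
    print(f"{name:>17}: mean degree {mean_degree(g):.2f}, "
          f"clustering {clustering(g):.3f}")
```

Repeating the run over many seeds and missingness rates, and comparing each variant against the complete network, gives a rough picture of how far a given statistic can drift under each mechanism.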