Understanding Effects of missing data in social networks

This paper by G. Kossinets explores the impact of missing data on the structural properties of social networks, focusing on three main mechanisms: network boundary specification (non-inclusion of actors or affiliations), survey non-response, and censoring by vertex degree (fixed choice design). The study uses the scientific collaboration network from the Los Alamos E-print Archive and random bipartite graphs to examine these mechanisms. Key findings include:

1. **Network Boundary Specification**: This can significantly alter estimates of network-level statistics, such as the clustering and assortativity coefficients. Omitting interaction contexts, or imposing a fixed choice of affiliations, leads to overestimates of these coefficients, while actor non-response leads to underestimates, which inflates measurement error.
2. **Survey Non-Response**: This can decrease the average degree of the network, but the effect is less severe than that of network boundary specification. Reciprocal nominations can help mitigate the impact of non-response.
3. **Fixed Choice Design**: This mechanism can introduce a non-random missing data pattern that affects the structural properties of the truncated graph. The effect depends on the network's mixing pattern (assortative or disassortative).

The paper also discusses the concept of "redundancy" in group affiliation, which measures the average importance of interaction contexts. High redundancy implies that new interaction contexts are likely to link actors who are already connected, which reduces the sensitivity of the network to boundary specification.

Overall, the study highlights the importance of considering the sources and mechanisms of missing data in social network analysis to ensure accurate and reliable estimates of network properties.
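To make the three mechanisms concrete, the following is a minimal Python sketch of how each one could be simulated on a random bipartite (actor-by-group) graph. It is not the paper's code: every function name, the uniform random group assignment, the rule that a tie is lost only when both endpoints fail to respond, and the rule that a fixed-choice tie survives if either actor names it are assumptions made for illustration.

```python
import random
from itertools import combinations


def random_bipartite(n_actors, n_groups, per_actor, rng):
    """Hypothetical generator: each actor joins `per_actor` groups uniformly at random."""
    return {a: set(rng.sample(range(n_groups), per_actor)) for a in range(n_actors)}


def project(affiliations):
    """One-mode projection: two actors are tied if they share at least one group."""
    members = {}
    for actor, groups in affiliations.items():
        for g in groups:
            members.setdefault(g, set()).add(actor)
    adj = {a: set() for a in affiliations}
    for group_members in members.values():
        for u, v in combinations(sorted(group_members), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_degree(adj):
    """Average number of ties per actor."""
    return sum(len(nb) for nb in adj.values()) / len(adj) if adj else 0.0


def transitivity(adj):
    """Global clustering: closed connected triples divided by all connected triples."""
    closed = triples = 0
    for u, nb in adj.items():
        k = len(nb)
        triples += k * (k - 1) // 2
        closed += sum(1 for v, w in combinations(sorted(nb), 2) if w in adj[v])
    return closed / triples if triples else 0.0


def drop_groups(affiliations, fraction, rng):
    """Boundary specification (assumed form): omit a random fraction of groups."""
    all_groups = sorted({g for gs in affiliations.values() for g in gs})
    dropped = set(rng.sample(all_groups, int(fraction * len(all_groups))))
    return {a: gs - dropped for a, gs in affiliations.items()}


def non_response(adj, fraction, rng):
    """Non-response (assumed form): a tie is lost only if both endpoints are silent."""
    actors = sorted(adj)
    silent = set(rng.sample(actors, int(fraction * len(actors))))
    return {u: {v for v in nb if not (u in silent and v in silent)}
            for u, nb in adj.items()}


def fixed_choice(adj, k, rng):
    """Fixed choice (assumed form): each actor names at most k ties;
    a tie is kept if at least one endpoint names it."""
    named = {u: set(rng.sample(sorted(nb), min(k, len(nb)))) for u, nb in adj.items()}
    return {u: {v for v in adj[u] if v in named[u] or u in named[v]} for u in adj}


if __name__ == "__main__":
    rng = random.Random(0)
    aff = random_bipartite(n_actors=200, n_groups=80, per_actor=2, rng=rng)
    full = project(aff)
    variants = {
        "full": full,
        "30% groups omitted": project(drop_groups(aff, 0.3, rng)),
        "30% non-response": non_response(full, 0.3, rng),
        "fixed choice k=3": fixed_choice(full, 3, rng),
    }
    for name, g in variants.items():
        print(f"{name}: mean degree={mean_degree(g):.2f}, clustering={transitivity(g):.3f}")
```

Comparing each degraded graph against `full` over many random seeds gives a rough sensitivity analysis in the spirit of the study. The direction and size of any bias will depend on the assumed rules and parameters above.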

Effects of missing data in social networks

April 14, 2024 | G. Kossinets