You find my publications on Google Scholar and listed below.

  1. Mapping non-local relationships between metadata and network structure with metadata-dependent encoding of random walks
    Aleix Bassolas, Anton Holmgren, Antoine Marot, Martin Rosvall, and Vincenzo Nicosia
    Sci. Adv. 8, eabn7558 (2022)
    Abstract »
    Integrating structural information and metadata, such as gender, social status, or interests, enriches networks and enables a better understanding of the large-scale structure of complex systems. However, existing approaches to augment networks with metadata for community detection only consider immediately adjacent nodes, and cannot exploit the non-local relationships between metadata and large-scale network structure present in many spatial and social systems. Here we develop a flow-based community-detection framework based on the map equation, that integrates network information and metadata of distant nodes and reveals more complex relationships. We analyze social and spatial networks and find that our methodology can detect functional metadata-informed communities distinct from those derived solely from network information or metadata. For example, in a mobility network of London, we identify communities that reflect the heterogeneity of income distribution, and in a European power-grid network, we identify communities that capture relationships between geography and energy prices beyond country borders.
  2. Network-based biostratigraphy for the late Permian to mid-Triassic Beaufort Group (Karoo Supergroup) in South Africa enhances biozone applicability and stratigraphic correlation
    Pia A. Viglietti, Alexis Rojas, Martin Rosvall, Brady Klimes, Kenneth D. Angielczyk
    J. Paleontol. 65, 5 (2022)
    Abstract »
    The Permo-Triassic vertebrate assemblage zones (AZs) of South Africa's Karoo Basin are a standard for local and global correlations. However, temporal, geographical and methodological limitations challenge the AZs reliability. We analyse a unique fossil dataset comprising 1408 occurrences of 115 species grouped into 19 stratigraphic bin intervals from the Cistecephalus, Daptocephalus, Lystrosaurus declivis and Cynognathus AZs. Using network science tools we compare six frameworks: Broom, Rubidge, Viglietti, Member, Formation, and one suggesting diachroneity of the Daptocephalus/Lystrosaurus AZ boundary (Gastaldo). Our results demonstrate that historical frameworks (Broom, Rubidge) still identify the Karoo AZs. No scheme supports the Cistecephalus AZ, and it probably comprises two discrete communities. The Lystrosaurus declivis AZ is traced across all frameworks, despite many shared species with the underlying Daptocephalus AZ, suggesting that the extinction event across this interval is not a statistical artefact. A community shift at the upper Katberg to lower Burgersdorp formations may indicate a depositional hiatus which has important implications for regional correlations, and Mesozoic ecosystem evolution. The Gastaldo model still identifies a Lystrosaurus and Daptocephalus AZ community shift, does not significantly improve recent AZ models (Viglietti), and highlights important issues with some AZ studies. Localized bed-scale lithostratigraphy (sandstone datums), and singleton fossils cannot be used to reject the patterns shown by hundreds of fossils, and regional chronostratigraphic markers of the Karoo foreland basin. Metre-level occurrence data suggests that 20–50 m sampling intervals capture Karoo AZs, unifying the use of metre-level placements of singleton fossils to delineate biozone boundaries and make regional correlations.
  3. Cross-validation of correlation networks using modular structure
    Magnus Neuman, Viktor Jonsson, Joaquín Calatayud, and Martin Rosvall
    Appl Netw Sci 7, 75 (2022)
    PREPRINT (Version 1) available at Research Square
    Abstract »
    Correlation networks derived from multivariate data appear in many applications across the sciences. These networks are usually dense and require sparsification to detect meaningful structure. However, current methods for sparsifying correlation networks struggle with balancing overfitting and underfitting. We propose a module-based cross-validation procedure to threshold these networks, making modular structure an integral part of the thresholding. We illustrate our approach using synthetic and real data and find that its ability to recover a planted partition has a step-like dependence on the number of data samples. The reward for sampling more varies non-linearly with the number of samples, with minimal gains after a critical point. A comparison with the well-established WGCNA method shows that our approach allows for revealing more modular structure in the data used here.
  4. Compressing network populations with modal networks reveals structural diversity
    Alec Kirkley, Alexis Rojas, Martin Rosvall, and Jean-Gabriel Young
    Abstract »
    Analyzing relational data collected over time requires a critical decision: Is one network representation sufficient? Or are more networks needed to capture changing structures? While the choice may be evident in some cases, for example when analyzing a physical system going through abrupt changes between two known states, other datasets can pose more difficult modeling challenges. Here we describe efficient nonparametric methods derived from the minimum description length principle to construct these network representations automatically. The methods input a population of networks measured on the same set of nodes and output a small set of representative networks together with an assignment of each measurement to one of these representative networks. We show that these methods recover planted heterogeneity in synthetic network populations and effectively identify important structural heterogeneities in example network populations representing global trade and the fossil record.
  5. Similarity-based Link Prediction from Modular Compression of Network Flows
    Christopher Blöcker, Jelena Smiljanić, Ingo Scholtes, and Martin Rosvall
    Abstract »
    Node similarity scores constitute a foundation for machine learning in graphs. Besides clustering, node classification, and anomaly detection, they are a basis for link prediction with critical applications in biological systems, information networks, and recommender systems. Recent works on link prediction use vector space embeddings to calculate node similarities. While these methods can provide good performance in undirected networks, they have several disadvantages: limited interpretability, problem-specific hyperparameter tuning, manual model fitting through dimensionality reduction, and poor performance of symmetric similarities in directed link prediction. To address these issues, we propose MapSim, a novel information-theoretic approach to assess node similarities based on modular compression of network flows. Different from vector space embeddings, MapSim represents nodes in a discrete, non-metric space of communities and yields asymmetric similarities suitable to predict directed and undirected links in an unsupervised fashion. The resulting similarities can be explained based on a network's hierarchical modular organisation, facilitating interpretability. MapSim naturally accounts for Occam's razor, leading to parsimonious representations of clusters at multiple scales. Addressing unsupervised link prediction, we compare MapSim to popular embedding-based algorithms across 47 data sets of networks from a few hundred to hundreds of thousands of nodes and millions of links. Our analysis shows that MapSim's average performance across all networks is more than 7% higher than its closest competitor, outperforming all embedding methods in 14 of the 47 networks, and a more than 33% better worst-case performance. Our method demonstrates the potential of compression-based approaches in graph representation learning, with promising applications in other graph learning tasks.
  6. Map equation centrality: a community-aware centrality score based on the map equation
    Christopher Blöcker, Juan Carlos Nieves, and Martin Rosvall
    Appl. Netw. Sci. 7, 56 (2022)
    Abstract »
    To measure node importance, network scientists employ centrality scores that typically take a microscopic or macroscopic perspective, relying on node features or global network structure. However, traditional centrality measures such as degree centrality, betweenness centrality, or PageRank neglect the community structure found in real-world networks. To study node importance based on network flows from a mesoscopic perspective, we analytically derive a community-aware information-theoretic centrality score based on network flow and the coding principles behind the map equation: map equation centrality. Map equation centrality measures how much further we can compress the network’s modular description by not coding for random walker transitions to the respective node, using an adapted coding scheme and determining node importance from a network flow-based point of view. The information-theoretic centrality measure can be determined from a node’s local network context alone because changes to the coding scheme only affect other nodes in the same module. Map equation centrality is agnostic to the chosen network flow model and allows researchers to select the model that best reflects the dynamics of the process under study. Applied to synthetic networks, we highlight how our approach enables a more fine-grained differentiation between nodes than node-local or network-global measures. Predicting influential nodes for two different dynamical processes on real-world networks with traditional and other community-aware centrality measures, we find that activating nodes based on map equation centrality scores tends to create the largest cascades in a linear threshold model.
  7. Flow-based community detection in hypergraphs
    Anton Eriksson, Timoteo Carletti, Renaud Lambiotte, Alexis Rojas, and Martin Rosvall
    In: Battiston, F., Petri, G. (eds) Higher-Order Systems. Understanding Complex Systems., Springer, Cham. (2022)
    Abstract »
    To connect structure, dynamics and function in systems with multibody interactions, network scientists model random walks on hypergraphs and identify communities that confine the walks for a long time. The two flow-based community-detection methods Markov stability and the map equation identify such communities based on different principles and search algorithms. But how similar are the resulting communities? We explain both methods' machinery applied to hypergraphs and compare them on synthetic and real-world hypergraphs using various hyperedge-size biased random walks and time scales. We find that the map equation is more sensitive to time-scale changes and that Markov stability is more sensitive to hyperedge-size biases.
  8. Mapping flows on weighted and directed networks with incomplete observations
    Jelena Smiljanić, Christopher Blöcker, Daniel Edler, and Martin Rosvall
    J. Complex Netw. 9, 6 (2021)
    Abstract »
    Detecting significant community structure in networks with incomplete observations is challenging because the evidence for specific solutions fades away with missing data. For example, recent research shows that flow-based community detection methods can highlight spurious communities in sparse undirected and unweighted networks with missing links. Current Bayesian approaches developed to overcome this problem do not work for incomplete observations in weighted and directed networks that describe network flows. To overcome this gap, we extend the idea behind the Bayesian estimate of the map equation for unweighted and undirected networks to enable more robust community detection in weighted and directed networks. We derive an empirical Bayes estimate of the transitions rates that can incorporate metadata information and show how an efficient implementation in the community-detection method Infomap provides more reliable communities even with a significant fraction of data missing.
  9. How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs
    Anton Eriksson, Daniel Edler, Alexis Rojas, Manlio de Domenico, and Martin Rosvall
    Commun. Phys. 4, 133 (2021)
    Abstract »
    Hypergraphs offer an explicit formalism to describe multibody interactions in complex systems. To connect dynamics and function in systems with these higher-order interactions, network scientists have generalised random-walk models to hypergraphs and studied the multibody effects on flow-based centrality measures. Mapping the large-scale structure of those flows requires effective community detection methods applied to cogent network representations. For different hypergraph data and research questions, which combination of random-walk model and network representation is best? We define unipartite, bipartite, and multilayer network representations of hypergraph flows and explore how they and the underlying random-walk model change the number, size, depth, and overlap of identified multilevel communities. These results help researchers choose the appropriate modelling approach when mapping flows on hypergraphs.
  10. A multiscale view of the Phanerozoic fossil record reveals the three major biotic transitions
    Alexis Rojas, Joaquín Calatayud, Michal Kowalewski, Magnus Neuman, and Martin Rosvall
    Commun. Biol. 4, 309 (2021)
    bioRxiv 866186
    Abstract »
    The hypothesis of the Great Evolutionary Faunas is a foundational concept of macroevolutionary research postulating that three global mega-assemblages have dominated Phanerozoic oceans following abrupt biotic transitions. Empirical estimates of this large-scale pattern depend on several methodological decisions and are based on approaches unable to capture multiscale dynamics of the underlying Earth-Life System. Combining a multilayer network representation of fossil data with a multilevel clustering that eliminates the subjectivity inherent to distance-based approaches, we demonstrate that Phanerozoic oceans sequentially harbored four global benthic mega-assemblages. Shifts in dominance patterns among these global marine mega-assemblages were abrupt (end-Cambrian 494 Ma; end-Permian 252 Ma) or protracted (mid-Cretaceous 129 Ma), and represent the three major biotic transitions in Earth’s history. Our findings suggest that gradual ecological changes associated with the Mesozoic Marine Revolution triggered a protracted biotic transition comparable in magnitude to the end-Permian transition initiated by the most severe biotic crisis of the past 500 million years. Overall, our study supports the notion that both long-term ecological changes and major geological events have played crucial roles in shaping the mega-assemblages that dominated Phanerozoic oceans.
  11. Regularities in species niches reveal the world’s climatic regions
    Joaquín Calatayud, Magnus Neuman, Alexis Rojas, Anton Eriksson, and Martin Rosvall
    eLife 10:e58397 (2021)
    bioRxiv 851030
    Abstract »
    Climate regions form the basis of many ecological, evolutionary and conservation studies. However, our understanding of climate regions is limited to how they shape vegetation: They do not account for the distribution of animals. Here we develop a network-based framework to identify important climates worldwide based on regularities in realized niches of about 26,000 tetrapods. We show that high-energy climates, including deserts, tropical savannas, and steppes, are consistent across animal- and plant-derived classifications, indicating similar underlying climatic determinants. Conversely, temperate climates differ across all groups, suggesting that these climates allow for idiosyncratic adaptations. Finally, we show how the integration of niche classifications with geographical information enables the detection of climatic transition zones and the signal of geographic and historical processes. Our results identify the climates shaping the distribution of tetrapods and call for caution when using general climate classifications to study the ecology, evolution, or conservation of specific taxa.
  12. Identifying flow modules in ecological networks using Infomap
    Carmel Farage, Daniel Edler, Anna Eklöf, Martin Rosvall, and Shai Pilosof
    Methods Ecol. Evol. 12(6), 778-786 (2021)
    medRxiv 2020.04.14.040519
    Abstract »
    Analysing how species interact in modules is a fundamental problem in network ecology. Theory shows that a modular network structure can reveal underlying dynamic ecological and evolutionary processes, influence dynamics that operate on the network and affect the stability of the ecological system. Although many ecological networks describe flows, such as biomass flows in food webs or disease transmission, most modularity analyses have ignored network flows, which can hinder our understanding of the interplay between structure and dynamics. Here we present Infomap, an established method based on network flows to the field of ecological networks. Infomap is a flexible tool that can identify modules in virtually any type of ecological network and is particularly useful for directed, weighted and multilayer networks. We illustrate how Infomap works on all these network types. We also provide a fully documented repository with additional ecological examples. Finally, to help researchers analyse their networks with Infomap, we introduce the open source R package infomapecology. Analysing flow‐based modularity is useful across ecology and transcends to other biological and non‐biological disciplines. A dynamic approach for detecting modular structure has strong potential to provide new insights into the organisation of ecological networks.
  13. Impacts of Oxazepam on Perch (Perca fluviatilis) Behavior: Fish Familiarized to Lake Conditions Do Not Show Predicted Anti-anxiety Response
    Johan Fahlman, Gustav Hellström, Micael Jonsson, Jerker Berglund Fick, Martin Rosvall, and Jonatan Klaminder
    Environ. Sci. Technol. 55(6):3624-3633 (2021)
    Abstract »
    A current theory in environmental science states that dissolved anxiolytics (oxazepam) from wastewater effluents can reduce anti-predator behavior in fish with potentially negative impacts on prey fish populations. Here, we hypothesize that European perch (Perca fluviatilis) populations being exposed to oxazepam in situ show reduced anti-predator behavior, which has previously been observed for exposed isolated fish in laboratory studies. We tested our hypothesis by exposing a whole-lake ecosystem, containing both perch (prey) and northern pike (Esox lucius; predator), to oxazepam while tracking fish behavior before and after exposure in the exposed lake as well as in an unexposed nearby lake (control). Oxazepam concentrations in the exposed lake ranged between 11 and 24 μg L-1, which is >200 times higher than concentrations reported for European rivers. In contrast to our hypothesis, we did not observe an oxazepam-induced reduction in anti-predator behavior, inferred from perch swimming activity, distance to predators, distance to conspecifics, home-range size, and habitat use. In fact, exposure to oxazepam instead stimulated anti-predator behavior (decreased activity, decreased distance to conspecifics, and increased littoral habitat use) when using behavior in the control lake as a reference. Shoal dynamics and temperature changes may have masked modest reductions in anti-predator behavior due to oxazepam. Although we cannot fully resolve the mechanism(s) behind our observations, our results indicate that the effects of oxazepam on perch behavior in a familiar natural ecosystem are negligible in comparison to the effects of other environmental conditions.
  14. Socioeconomic and environmental patterns behind H1N1 spreading in Sweden
    András Bóta, Martin Holmberg, Lauren Gardner, and Martin Rosvall
    Sci. Rep. 11, 22512 (2021)
    medRxiv 2020.03.18.20038349
    Abstract »
    Identifying the critical factors related to influenza spreading is crucial in predicting and mitigating epidemics. Specifically, uncovering the relationship between epidemic onset and various risk indicators such as socioeconomic, mobility and climate factors can reveal locations and travel patterns that play critical roles in furthering an outbreak. We study the 2009 A(H1N1) influenza outbreaks in Sweden’s municipalities between 2009 and 2015 and use the Generalized Inverse Infection Method (GIIM) to assess the most significant contributing risk factors. GIIM represents an epidemic spreading process on a network: nodes correspond to geographical objects, links indicate travel routes, and transmission probabilities assigned to the links guide the infection process. Our results reinforce existing observations that the influenza outbreaks considered in this study were driven by the country’s largest population centers, while meteorological factors also contributed significantly. Travel and other socioeconomic indicators have a negligible effect. We also demonstrate that by training our model on the 2009 outbreak, we can predict the epidemic onsets in the following five seasons with high accuracy.
  15. Mapping flows on sparse networks with missing links
    Jelena Smiljanić, Daniel Edler, and Martin Rosvall
    Phys. Rev. E 102, 012302 (2020)
    Abstract »
    Unreliable network data can cause community-detection methods to overfit and highlight spurious structures with misleading information about the organization and function of complex systems. Here we show how to detect significant flow-based communities in sparse networks with missing links using the map equation. Since the map equation builds on Shannon entropy estimation, it assumes complete data such that analyzing undersampled networks can lead to overfitting. To overcome this problem, we incorporate a Bayesian approach with assumptions about network uncertainties into the map equation framework. Results in both synthetic and real-world networks show that the Bayesian estimate of the map equation provides a principled approach to revealing significant structures in undersampled networks.
  16. Mapping flows on bipartite networks
    Christopher Blöcker and Martin Rosvall
    Phys. Rev. E 102, 052305 (2020)
    Abstract »
    Mapping network flows provides insight into the organization of networks, but even though many real networks are bipartite, no method for mapping flows takes advantage of the bipartite structure. What do we miss by discarding this information and how can we use it to understand the structure of bipartite networks better? The map equation models network flows with a random walk and exploits the information-theoretic duality between compression and finding regularities to detect communities in networks. However, it does not use the fact that random walks in bipartite networks alternate between node types, information worth 1 bit. To make some or all of this information available to the map equation, we developed a coding scheme that remembers node types at different rates. We explored the community landscape of bipartite real-world networks from no node-type information to full node-type information and found that using node types at a higher rate generally leads to deeper community hierarchies and a higher resolution. The corresponding compression of network flows exceeds the amount of extra information provided. Consequently, taking advantage of the bipartite structure increases the resolution and reveals more network regularities.
  17. Positive associations among rare species and their persistence in ecological assemblages
    Joaquín Calatayud, Enrique Andivia, Adrián Escudero, Carlos J. Melián, Rubén Bernardo-Madrid, Markus Stoffel, Cristina Aponte, Nagore G. Medina, Rafael Molina-Venegas, Xavier Arnan, Martin Rosvall, Magnus Neuman, Jorge Ari Noriega, Fernanda Alves-Martins, Isabel Draper, Arantzazu Luzuriaga, Juan Antonio Ballesteros-Cánovas, César Morales-Molino, Pablo Ferrandis, Asier Herrero, Luciano Pataro, Leandro Juen, Alex Cea, and Jaime Madrigal-González
    Nature Ecology & Evolution 4, 40-45 (2020)
    Abstract »
    According to the competitive exclusion principle, species with low competitive abilities should be excluded by more efficient competitors; yet, they generally remain as rare species. Here, we describe the positive and negative spatial association networks of 326 disparate assemblages, showing a general organization pattern that simultaneously supports the primacy of competition and the persistence of rare species. Abundant species monopolize negative associations in about 90% of the assemblages. On the other hand, rare species are mostly involved in positive associations, forming small network modules. Simulations suggest that positive interactions among rare species and microhabitat preferences are the most probable mechanisms underpinning this pattern and rare species persistence. The consistent results across taxa and geography suggest a general explanation for the maintenance of biodiversity in competitive environments.
  18. From networks to optimal higher-order models of complex systems
    Renaud Lambiotte, Martin Rosvall, and Ingo Scholtes
    Nature Physics 15, 313–320 (2019)
    Abstract »
    Rich data are revealing that complex dependencies between the nodes of a network may not be captured by models based on pairwise interactions. Higher-order network models go beyond these limitations, offering new perspectives for understanding complex systems.
  19. Modelling Temporal Networks with Markov Chains, Community Structures and Change Points
    Tiago P. Peixoto and Martin Rosvall
    In: Holme P., Saramäki J. (eds) Temporal Network Theory. Computational Social Sciences. Springer, Cham (2019)
    Abstract »
    While temporal networks contain crucial information about the evolving systems they represent, only recently have researchers showed how to incorporate higher-order Markov chains, community structure and abrupt transitions to describe them. However, each approach can only capture one aspect of temporal networks, which often are multifaceted with dynamics taking place concurrently at small and large structural scales and also at short and long timescales. Therefore, these approaches must be combined for more realistic descriptions of empirical systems. Here we present two data-driven approaches developed to capture multiple aspects of temporal network dynamics. Both approaches capture short timescales and small structural scales with Markov chains. Whereas one approach also captures large structural scales with communities, the other instead captures long timescales with change points. Using a nonparametric Bayesian inference framework, we illustrate how the multi-aspect approaches better describe evolving systems by combining different scales, because the most plausible models combine short timescales and small structural scales with large-scale structural and dynamical modular patterns or many change points.
  20. Human activity is altering the world’s zoogeographical regions
    Rubén Bernardo-Madrid, Joaquín Calatayud, Manuela González-Suarez, Martin Rosvall, Pablo M. Lucas, Marta Rueda, Alexandre Antonelli, and Eloy Revilla
    Ecology Letters 22, 1297–1305 (2019)
    bioRxiv 287300
    Abstract »
    Zoogeographical regions, or zooregions, are areas of the Earth defined by species pools that reflect ecological, historical and evolutionary processes acting over millions of years. Consequently, researchers have assumed that zooregions are robust and unlikely to change on a human timescale. However, the increasing number of human‐mediated introductions and extinctions can challenge this assumption. By delineating zooregions with a network‐based algorithm, here we show that introductions and extinctions are altering the zooregions we know today. Introductions are homogenising the Eurasian and African mammal zooregions and also triggering less intuitive effects in birds and amphibians, such as dividing and redefining zooregions representing the Old and New World. Furthermore, these Old and New World amphibian zooregions are no longer detected when considering introductions plus extinctions of the most threatened species. Our findings highlight the profound and far‐reaching impact of human activity and call for identifying and protecting the uniqueness of biotic assemblages.
  21. Exploring the solution landscape enables more reliable network community detection
    Joaquín Calatayud, Rubén Bernardo-Madrid, Magnus Neuman, Alexis Rojas, and Martin Rosvall
    Phys. Rev. E 100, 052308 (2019)
    Abstract »
    To understand how a complex system is organized and functions, researchers often identify communities in the system's network of interactions. Because it is practically impossible to explore all solutions to guarantee the best one, many community-detection algorithms rely on multiple stochastic searches. But for a given combination of network and stochastic algorithm, how many searches are sufficient to find a solution that is good enough? The standard approach is to pick a reasonably large number of searches and select the network partition with the highest quality or derive a consensus solution based on all network partitions. However, if different partitions have similar qualities such that the solution landscape is degenerate, the single best partition may miss relevant information, and a consensus solution may blur complementary communities. Here we address this degeneracy problem with coarse-grained descriptions of the solution landscape. We cluster network partitions based on their similarity and suggest an approach to determine the minimum number of searches required to describe the solution landscape adequately. To make good use of all partitions, we also propose different ways to explore the solution landscape, including a significance clustering procedure. We test these approaches on synthetic and real-world networks, and find that different networks and algorithms require a different number of searches and that exploring the coarse-grained solution landscape can reveal noteworthy complementary solutions and enable more reliable community detection.
  22. Constrained information flows in temporal networks reveal intermittent communities
    Ulf Aslak, Martin Rosvall, and Sune Lehmann
    Phys. Rev. E 97, 062312 (2018)
    Abstract »
    Many real-world networks are representations of dynamic systems with interactions that change over time, often in uncoordinated ways and at irregular intervals. For example, university students connect in intermittent groups that repeatedly form and dissolve based on multiple factors, including their lectures, interests, and friends. Such dynamic systems can be represented as multilayer networks where each layer represents a snapshot of the temporal network. In this representation, it is crucial that the links between layers accurately capture real dependencies between those layers. Often, however, these dependencies are unknown. Therefore, current methods connect layers based on simplistic assumptions that cannot capture node-level layer dependencies. For example, connecting every node to itself in other layers with the same weight can wipe out essential dependencies between intermittent groups, making it difficult or even impossible to identify them. In this paper, we present a principled approach to estimating node-level layer dependencies based on the network structure within each layer. We implement our node-level coupling method in the community detection framework Infomap and demonstrate its performance compared to current methods on synthetic and real temporal networks. We show that our approach more effectively constrains information inside multilayer communities so that Infomap can better recover planted groups in multilayer benchmark networks that represent multiple modes with different groups and better identify intermittent communities in real temporal contact networks. These results suggest that node-level layer coupling can improve the modeling of information spreading in temporal networks and better capture their dynamic community structure.
  23. Modeling sequences and temporal networks with dynamic community structures
    Tiago P. Peixoto and Martin Rosvall
    Nature Communications 8, 582 (2017)
    Abstract »
    In evolving complex systems such as air traffic and social organizations, collective effects emerge from their many components' dynamic interactions. While the dynamic interactions can be represented by temporal networks with nodes and links that change over time, they remain highly complex. It is therefore often necessary to use methods that extract the temporal networks' large-scale dynamic community structure. However, such methods are subject to overfitting or suffer from effects of arbitrary, a priori imposed timescales, which should instead be extracted from data. Here we simultaneously address both problems and develop a principled data-driven method that determines relevant timescales and identifies patterns of dynamics that take place on networks as well as shape the networks themselves. We base our method on an arbitrary-order Markov chain model with community structure, and develop a nonparametric Bayesian inference framework that identifies the simplest such model that can explain temporal interaction data.
  24. Mapping higher-order network flows in memory and multilayer networks with Infomap
    Daniel Edler, Ludvig Bohlin, and Martin Rosvall
    Algorithms 10, 112 (2017)
    Abstract »
    Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a~complex system's objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of~different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.
  25. Different approaches to community detection
    Martin Rosvall, Jean-Charles Delvenne, Michael T. Schaub, and Renaud Lambiotte
    Advances in Network Clustering and Blockmodeling, 105-119 (2020)
    Abstract »
    Community detection, the decomposition of a graph into essential building blocks, has been a core research topic in network science over the past years. Since a precise notion of what constitutes a community has remained evasive, community detection algorithms have often been compared on benchmark graphs with a particular form of assortative community structure and classified based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different goals and reasons for why we want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different facets of community detection also delineates the many lines of research and points out open directions and avenues for future research.
  26. The many facets of community detection in complex networks
    Michael T. Schaub, Jean-Charles Delvenne, Martin Rosvall, and Renaud Lambiotte
    Appl. Netw. Sci. 2: 4 (2017)
    Abstract »
    Community detection, the decomposition of a graph into essential building blocks, has been a core research topic in network science over the past years. Since a precise notion of what constitutes a community has remained evasive, community detection algorithms have often been compared on benchmark graphs with a particular form of assortative community structure and classified based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different goals and reasons for why we want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different facets of community detection also delineates the many lines of research and points out open directions and avenues for future research.
  27. Infomap Bioregions: Interactive mapping of biogeographical regions from species distributions
    Daniel Edler, Thaís Guedes, Alexander Zizka, Martin Rosvall, and Alexandre Antonelli
    Syst. Biol. 66 (2): 197-204 (2017)
    arXiv:1512.00892 Infomap Bioregions
    Abstract »
    Biogeographical regions (bioregions) reveal how different sets of species are spatially grouped and therefore are important units for conservation, historical biogeography, ecology and evolution. Several methods have been developed to identify bioregions based on species distribution data rather than expert opinion. One approach successfully applies network theory to simplify and highlight the underlying structure in species distributions. However, this method lacks tools for simple and efficient analysis. Here we present Infomap Bioregions, an interactive web application that inputs species distribution data and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional or global scale. The application uses a novel adaptive resolution method to make best use of often incomplete species distribution data. The results can be downloaded as vector graphics, shapefiles or in table format. We validate the tool by processing large datasets of publicly available species distribution data of the world's amphibians using species ranges, and mammals using point occurrences. We then calculate the fit between the inferred bioregions and WWF ecoregions. As examples of applications, researchers can reconstruct ancestral ranges in historical biogeography or identify indicator species for targeted conservation.
  28. Scalable and efficient flow-based community detection for large-scale graph analysis
    Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe
    ACM Trans. Knowl. Discov. Data 11, 3, Article 32 (2017)
    Abstract »
    Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs. In this paper, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on 8 cores, and prioritization improves algorithm performance by an additional 20%–50% in average, depending on the graph properties. Additionally, RelaxMap converges in the similar number of iterations and provides solutions of equivalent quality as the serial Infomap implementation.
  29. Efficient community detection of network flows for varying Markov times and bipartite networks
    Masoumeh Kheirkhah, Andrea Lancichinetti, and Martin Rosvall
    Phys. Rev. E 93, 032309 (2016)
    Abstract »
    Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Oppositely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner-workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.
  30. Robustness of journal rankings by network flows with different amounts of memory
    Ludvig Bohlin, Alcides Viamontes Esquivel, Andrea Lancichinetti, and Martin Rosvall
    J. Assn. Inf. Sci. Tec. 67, 2527 (2016)
    Abstract »
    As the number of scientific journals has multiplied, journal rankings have become increasingly important for scientific decisions. From submissions and subscriptions to grants and hirings, researchers, policy makers, and funding agencies make important decisions with influence from journal rankings such as the ISI journal impact factor. Typically, the rankings are derived from the citation network between a selection of journals and unavoidably depend on this selection. However, little is known about how robust rankings are to the selection of included journals. Here we compare the robustness of three journal rankings based on network flows induced on citation networks. They model pathways of researchers navigating scholarly literature, stepping between journals and remembering their previous steps to different degree: zero-step memory as impact factor, one-step memory as Eigenfactor, and two-step memory, corresponding to zero-, first-, and second-order Markov models of citation flow between journals. We conclude that a second-order Markov model is slightly more robust, because it combines the advantages of the lower-order models: perturbations that remain local and citation weights that depend on journal importance. However, the robustness gain comes at the cost of requiring more data, because the second-order Markov model requires citation data from twice as long a period.
  31. Maps of sparse Markov chains efficiently reveal community structure in network flows with memory
    Christian Persson, Ludvig Bohlin, Daniel Edler, and Martin Rosvall
    Abstract »
    To better understand the flows of ideas or information through social and biological systems, researchers develop maps that reveal important patterns in network flows. In practice, network flow models have implied memoryless first-order Markov chains, but recently researchers have introduced higher-order Markov chain models with memory to capture patterns in multi-step pathways. Higher-order models are particularly important for effectively revealing actual, overlapping community structure, but higher-order Markov chain models suffer from the curse of dimensionality: their vast parameter spaces require exponentially increasing data to avoid overfitting and therefore make mapping inefficient already for moderate-sized systems. To overcome this problem, we introduce an efficient cross-validated mapping approach based on network flows modeled by sparse Markov chains. To illustrate our approach, we present a map of citation flows in science with research fields that overlap in multidisciplinary journals. Compared with currently used categories in science of science studies, the research fields form better units of analysis because the map more effectively captures how ideas flow through science.
  32. Mapping bilateral information interests using the activity of Wikipedia editors
    Fariba Karimi, Ludvig Bohlin, Ann Samoilenko, Martin Rosvall, and Andrea Lancichinetti
    Palgrave Communications 1, 15041 (2015)
    Abstract »
    We live in a global village where electronic communication has eliminated the geographical barriers of information exchange. The road is now open to worldwide convergence of information interests, shared values and understanding. Nevertheless, interests still vary between countries around the world. This raises important questions about what today’s world map of information interests actually looks like and what factors cause the barriers of information exchange between countries. To quantitatively construct a world map of information interests, we devise a scalable statistical model that identifies countries with similar information interests and measures the countries’ bilateral similarities. From the similarities we connect countries in a global network and find that countries can be mapped into 18 clusters with similar information interests. Through regression we find that language and religion best explain the strength of the bilateral ties and formation of clusters. Our findings provide a quantitative basis for further studies to better understand the complex interplay between shared interests and conflict on a global scale. The methodology can also be extended to track changes over time and capture important trends in global information exchange.
  33. Sources of and processes controlling CO2 emissions change with the size of streams and rivers
    E. R. Hotchkiss, R. O. Hall Jr, R. A. Sponseller, D. Butman, J. Klaminder, H. Laudon, M. Rosvall, and J. Karlsson
    Nature Geoscience 8, 696-699 (2015)
    Abstract »
    Carbon dioxide (CO2) evasion from streams and rivers to the atmosphere represents a substantial flux in the global carbon cycle1, 2, 3. The proportions of CO2 emitted from streams and rivers that come from terrestrially derived CO2 or from CO2 produced within freshwater ecosystems through aquatic metabolism are not well quantified. Here we estimated CO2 emissions from running waters in the contiguous United States, based on freshwater chemical and physical characteristics and modelled gas transfer velocities at 1463 United States Geological Survey monitoring sites. We then assessed CO2 production from aquatic metabolism, compiled from previously published measurements of net ecosystem production from 187 streams and rivers across the contiguous United States. We find that CO2 produced by aquatic metabolism contributes about 28% of CO2 evasion from streams and rivers with flows between 0.0001 and 19,000 m3 s−1. We mathematically modelled CO2 flux from groundwater into running waters along a stream–river continuum to evaluate the relationship between stream size and CO2 source. Terrestrially derived CO2 dominates emissions from small streams, and the percentage of CO2 emissions from aquatic metabolism increases with stream size. We suggest that the relative role of rivers as conduits for terrestrial CO2 efflux and as reactors mineralizing terrestrial organic carbon is a function of their size and connectivity with landscapes.
  34. Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems
    Manlio De Domenico, Andrea Lancichinetti, Alex Arenas, and Martin Rosvall
    Physical Review X 5, 011027 (2015)
    Abstract »
    To comprehend interconnected systems across the social and natural sciences, researchers have developed many powerful methods to identify functional modules. For example, with interaction data aggregated into a single network layer, flow-based methods have proven useful for identifying modular dynamics in weighted and directed networks that capture constraints on flow processes. However, many interconnected systems consist of agents or components that exhibit multiple layers of interactions, possibly from several different processes. Inevitably, representing this intricate network of networks as a single aggregated network leads to information loss and may obscure the actual organization. Here, we propose a method based on a compression of network flows that can identify modular flows both within and across layers in nonaggregated multilayer networks. Our numerical experiments on synthetic multilayer networks, with some layers originating from the same interaction process, show that the analysis fails in aggregated networks or when treating the layers separately, whereas the multilayer method can accurately identify modules across layers that originate from the same interaction process. We capitalize on our findings and reveal the community structure of two multilayer collaboration networks with topics as layers: scientists affiliated with the Pierre Auger Observatory and scientists publishing works on networks on the arXiv. Compared to conventional aggregated methods, the multilayer method uncovers connected topics and reveals smaller modules with more overlap that better capture the actual organization.
  35. Estimating the resolution limit of the map equation in community detection
    Tatsuro Kawamoto and Martin Rosvall
    Physical Review E 91, 012809 (2015)
    Abstract »
    A community detection algorithm is considered to have a resolution limit if the scale of the smallest modules that can be resolved depends on the size of the analyzed subnetwork. The resolution limit is known to prevent some community detection algorithms from accurately identifying the modular structure of a network. In fact, any global objective function for measuring the quality of a two-level assignment of nodes into modules must have some sort of resolution limit or an external resolution parameter. However, it is yet unknown how the resolution limit affects the so-called map equation, which is known to be an efficient objective function for community detection. We derive an analytical estimate and conclude that the resolution limit of the map equation is set by the total number of links between modules instead of the total number of links in the full network as for modularity. This mechanism makes the resolution limit much less restrictive for the map equation than for modularity; in practice, it is orders of magnitudes smaller. Furthermore, we argue that the effect of the resolution limit often results from shoehorning multilevel modular structures into two-level descriptions. As we show, the hierarchical map equation effectively eliminates the resolution limit for networks with nested multilevel modular structures.
  36. Memory in network flows and its effects on spreading dynamics and community detection
    Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte
    Nature Communications 5, 4630 (2014)
    Abstract »
    Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is usconventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may don where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we showignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, captdynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in sciencommunication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for horder memory in network flows can help us better understand how real systems are organized and function.
  37. Mapping change in the overnight money market
    Morten L. Bech, Carl T. Bergstrom, Martin Rosvall, and Rodney J. Garratt
    Physica A 424, 44-51 (2014)
    NY Fed 507
    Abstract »
    We use an information-theoretic approach to describe changes in lending relationships between financial institutions around the time of the Lehman Brothers failure. Unlike previous work that conducts maximum likelihood estimation on undirected networks our analysis distinguishes between borrowers and lenders and looks for broader lending relationships (multi-bank lending cycles) that extend beyond the immediate counter-parties. We detect significant changes in lending patterns following implementation of the Interest on Required and Excess Reserves policy by the Federal Reserve in October 2008. Analysis of micro-scale rates of change in the data suggests these changes were triggered by the collapse of Lehman Brothers a few weeks before.
  38. Stock Portfolio Structure of Individual Investors Infers Future Trading Behavior
    Ludvig Bohlin and Martin Rosvall
    PLoS ONE 9(7): e103006 (2014)
    Abstract »
    Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold, and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market in a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to earn a better understanding of stock market dynamics.
  39. Finding cultural holes: how structure and culture diverge in networks of scholarly communication
    Daril A Vilhena, Jacob G Foster, Martin Rosvall, Jevin D West, James Evans, Carl T Bergstrom
    Sociological Science 1, 221 (2014)
    Abstract »
    Divergent interests, expertise, and language form cultural barriers to communication. No formalism has been available to characterize these “cultural holes.” Here we use information theory to measure cultural holes and demonstrate our formalism in the context of scientific communication using papers from JSTOR. We extract scientific fields from the structure of citation flows and infer field-specific cultures by cataloging phrase frequencies in full text and measuring the relative efficiency of between-field communication. We then combine citation and cultural information in a novel topographic map of science, mapping citations to geographic distance and cultural holes to topography. By analyzing the full citation network, we find that communicative efficiency decays with citation distance in a field-specific way. These decay rates reveal hidden patterns of cohesion and fragmentation. For example, the ecological sciences are balkanized by jargon, whereas the social sciences are relatively integrated. Our results highlight the importance of enriching structural analyses with cultural data.
  40. Effect of memory on the dynamics of random walks on networks
    Renaud Lambiotte, Vsevolod Salnikov, and Martin Rosvall
    Journal of Complex Networks (2014)
    Abstract »
    Pathways of diffusion observed in real-world systems often require stochastic processes going beyond first-order Markov models, as implicitly assumed in network theory. In this work, we focus on second-order Markov models, and derive an analytical expression for the effect of memory on the spectral gap and thus, equivalently, on the characteristic time needed for the stochastic process to asymptotically reach equilibrium. Perturbation analysis shows that standard first-order Markov models can either overestimate or underestimate the diffusion rate of flows across the modular structure of a system captured by a second-order Markov network. We test the theoretical predictions on a toy example and on numerical data, and discuss their implications for network theory, in particular in the case of temporal or multiplex networks.
  41. Dynamics of interacting information waves in networks
    Atieh Mirshahvalad, Alcides Viamontes, Ludvig Lizana, and Martin Rosvall
    Phys. Rev. E 89, 012809 (2014)
    Abstract »
    To better understand the inner workings of information spreading, network researchers often use simple models to capture the spreading dynamics. But most models only highlight the effect of local interactions on the global spreading of a single information wave, and ignore the effects of interactions between multiple waves. Here we take into account the effect of multiple interacting waves by using an agent-based model in which the interaction between information waves is based on their novelty. We analyzed the global effects of such interactions and found that information that actually reaches nodes reaches them faster. This effect is caused by selection between information waves: slow waves die out and only fast waves survive. As a result, and in contrast to models with non-interacting information dynamics, the access to information decays with the distance from the source. Moreover, when we analyzed the model on various synthetic and real spatial road networks, we found that the decay rate also depends on the path redundancy and the effective dimension of the system. In general, the decay of the information wave frequency as a function of distance from the source follows a power law distribution with an exponent between -0.2 for a two-dimensional system with high path redundancy and -0.5 for a tree-like system with no path redundancy. We found that the real spatial networks provide an infrastructure for information spreading that lies in between these two extremes. Finally, to better understand the mechanics behind the scaling results, we provide analytic calculations of the scaling for a one-dimensional system.
  42. Community detection and visualization of networks with the map equation framework
    Ludvig Bohlin, Daniel Edler, Andrea Lancichinetti, and Martin Rosvall
    Book chapter in Measuring Scholarly Impact: Methods and Practice
    Available as a tutorial to
    Abstract »
    Large networks contain plentiful information about the organization of a system. The challenge is to extract useful information buried in the structure of myriad nodes and links. Therefore, powerful tools for simplifying and highlighting important structures in networks are essential for comprehending their organization. Such tools are called community detection methods and they are designed to identify strongly intra-connected modules that often correspond to important functional units. Here we describe one such method, known as the map equation, and its accompanying algorithms for finding, evaluating, and visualizing the modular organization of networks. The map equation framework is very flexible and can with its search algorithm Infomap, for example, identify two-level, multi-level and overlapping organization in weighted, directed, and multiplex networks. Because the map equation framework operates on the flow induced by the links of a network, it naturally captures flow of ideas and citation flow, and is therefore well-suited for analysis of bibliometric networks.
  43. Resampling Effects on Significance Analysis of Network Clustering and Ranking
    Atieh Mirshahvalad, Olivier H. Beauchesne, Éric Archambault, and Martin Rosvall
    PLoS ONE 8(1): e53943 (2013)
    Abstract »
    Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984–2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.
  44. Scalable Flow-Based Community Detection for Large-Scale Network Analysis
    Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe
    2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW), 303 (2013)
    Abstract »
    Community-detection is a powerful approach to uncover important structures in large networks. Since networks often describe flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm is called Info map, which optimizes the objective function known as the map equation. While Info map is known to be an effective algorithm, its serial implementation cannot take advantage of multicore processing in modern computers. In this paper, we propose a novel parallel generalization of Info map called Relax Map. This algorithm relaxes concurrency assumptions to avoid lock overhead, achieving 70% parallel efficiency in shared-memory multicore experiments while exhibiting similar convergence properties and finding similar community structures as the serial algorithm. We evaluate our approach on a variety of real graph datasets as well as synthetic graphs produced by a popular graph generator used for benchmarking community detection algorithms. We describe the algorithm, the experiments, and some emerging research directions in high-performance community detection on massive graphs.
  45. Coherence Out of Chaos: Mapping European Union Law by Running Randomly Through the Maze of CJEU Case Law
    Johan Lindholm, Mattias Derlén, Martin Rosvall, and Atieh Mirshahvalad
    Europarättslig Tidskrift 3, 517 (2013)
    Abstract »
    Recent research has demonstrated the ability of network analysis to better understand law. In this study we apply network analysis to the case law of the European Court of Justice (CJEU) in order understand its role as a source of law. In doing so, we apply network analysis tools not previously used in legal scholarship, most significantly (i) a modified version of the PageRank algorithm, (ii) the Map Equation, and (iii) resampling to infer “missing” links. In the article we demonstrate that this method can help us to understand not only the CJEU’s case law but law generally.
  46. Ranking and clustering of nodes in networks with smart teleportation
    Renaud Lambiotte and Martin Rosvall
    Phys. Rev. E 85, 056107 (2012)
    Abstract »
    Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering.
  47. Significant communities in large sparse networks
    Atieh Mirshahvalad, Johan Lindholm, Mattias Derlén, and Martin Rosvall
    PLoS ONE 7(3): e33721 (2012)
    Abstract »
    Researchers use community-detection algorithms to reveal large-scale organization in biological and social networks, but community detection is useful only if the communities are significant and not a result of noisy data. To assess the statistical significance of the network communities, or the robustness of the detected structure, one approach is to perturb the network structure by removing links and measure how much the communities change. However, perturbing sparse networks is challenging because they are inherently sensitive; they shatter easily if links are removed. Here we propose a simple method to perturb sparse networks and assess the significance of their communities. We generate resampled networks by adding extra links based on local information, then we aggregate the information from multiple resampled networks to find a coarse-grained description of significant clusters. In addition to testing our method on benchmark networks, we use our method on the sparse network of the European Court of Justice (ECJ) case law, to detect significant and insignificant areas of law. We use our significance analysis to draw a map of the ECJ case law network that reveals the relations between the areas of law.
  48. A Network-Based Approach to Visualize Prevalence and Progression of Metabolic Syndrome Components
    Robin Haring, Martin Rosvall, Uwe Völker, Henry Völzke, Heyo Kroemer, Matthias Nauck, and Henri Wallaschofski
    PLoS ONE 7(6): e39461 (2012)
    Abstract »
    The additional clinical value of clustering cardiovascular risk factors to define the metabolic syndrome (MetS) is still under debate. However, it is unclear which cardiovascular risk factors tend to cluster predominately and how individual risk factor states change over time. We used data from 3,187 individuals aged 20–79 years from the population-based Study of Health in Pomerania for a network-based approach to visualize clustered MetS risk factor states and their change over a five-year follow-up period. MetS was defined by harmonized Adult Treatment Panel III criteria, and each individual's risk factor burden was classified according to the five MetS components at baseline and follow-up. We used the map generator to depict 32 (25) different states and highlight the most important transitions between the 1,024 (322) possible states in the weighted directed network. At baseline, we found the largest fraction (19.3%) of all individuals free of any MetS risk factors and identified hypertension (15.4%) and central obesity (6.3%), as well as their combination (19.0%), as the most common MetS risk factors. Analyzing risk factor flow over the five-year follow-up, we found that most individuals remained in their risk factor state and that low high-density lipoprotein cholesterol (HDL) (6.3%) was the most prominent additional risk factor beyond hypertension and central obesity. Also among individuals without any MetS risk factor at baseline, low HDL (3.5%), hypertension (2.1%), and central obesity (1.6%) were the first risk factors to manifest during follow-up. We identified hypertension and central obesity as the predominant MetS risk factor cluster and low HDL concentrations as the most prominent new onset risk factor.
  49. Comparing network covers using mutual information
    Alcides Viamontes Esquivel and Martin Rosvall
    Abstract »
    In network science, researchers often use mutual information to understand the difference between network partitions produced by community detection methods. Here we extend the use of mutual information to covers, that is, the cases where a node can belong to more than one module. In our proposed solution, the underlying stochastic process used to compare partitions is extended to deal with covers, and the random variables of the new process are simply fed into the usual definition of mutual information. With partitions, our extended process behaves exactly as the conventional approach for partitions, and thus, the mutual information values obtained are the same. We also describe how to perform sampling and do error estimation for our extended process, as both are necessary steps for a practical application of this measure. The stochastic process that we define here is not only applicable to networks, but can also be used to compare more general set-to-set binary relations.
  50. Compression of flow can reveal overlapping modular organization in networks
    Alcides Viamontes Esquivel and Martin Rosvall
    Phys. Rev. X 1, 021025 (2011)
    arXiv:1105.0812 Source code
    Abstract »
    To better understand the organization of overlapping modules in large networks with respect to flow, we introduce the map equation for overlapping modules. In this information-theoretic framework, we use the correspondence between compression and regularity detection. The generalized map equation measures how well we can compress a description of flow in the network when we partition it into modules with possible overlaps. When we minimize the generalized map equation over overlapping network partitions, we detect modules that capture flow and determine which nodes at the boundaries between modules should be classified in multiple modules and to what degree. With a novel greedy-search algorithm, we find that some networks, for example, the neural network of the nematode Caenorhabditis elegans, are best described by modules dominated by hard boundaries, but that others, for example, the sparse European-roads network, have an organization of highly overlapping modules.
  51. Reinforced communication and social navigation: remember your friends and remember yourself
    Atieh Mirshahvalad and Martin Rosvall
    Phys. Rev. E 84, 036102 (2011)
    Abstract »
    In social systems, people communicate with each other and form groups based on their interests. The pattern of interactions, the network, and the ideas that flow on the network naturally evolve together. Researchers use simple models to capture the feedback between changing network patterns and ideas on the network, but little is understood about the role of past events in the feedback process. Here we introduce a simple agent-based model to study the coupling between peoples' ideas and social networks, and better understand the role of history in dynamic social networks. We measure how information about ideas can be recovered from information about network structure and, the other way around, how information about network structure can be recovered from information about ideas. We find that it is in general easier to recover ideas from the network structure than vice versa.
  52. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems
    Martin Rosvall and Carl T. Bergstrom
    PLoS ONE 6(4): e18209 (2011)
    arXiv:1010.0431 Source code
    Abstract »
    To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation that reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network — the optimal number of levels and modular partition at each level — with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks
  53. The transmission sense of information
    Carl T. Bergstrom and Martin Rosvall
    Biology and Philosophy 26, 159-176 (2011)
    arXiv:0810.4168 Response to commentaries on The Transmission Sense of Information
    Abstract »
    Biologists rely heavily on the language of information, coding, and transmission that is commonplace in the field of information theory developed by Claude Shannon, but there is open debate about whether such language is anything more than facile metaphor. Philosophers of biology have argued that when biologists talk about information in genes and in evolution, they are not talking about the sort of information that Shannon's theory addresses. First, philosophers have suggested that Shannon theory is only useful for developing a shallow notion of correlation, the so-called "causal sense" of information. Second, they typically argue that in genetics and evolutionary biology, information language is used in a "semantic sense," whereas semantics are deliberately omitted from Shannon theory. Neither critique is well-founded. Here we propose an alternative to the causal and semantic senses of information: a transmission sense of information, in which an object X conveys information if the function of X is to reduce, by virtue of its sequence properties, uncertainty on the part of an agent who observes X. The transmission sense not only captures much of what biologists intend when they talk about information in genes, but also brings Shannon's theory back to the fore. By taking the viewpoint of a communications engineer and focusing on the decision problem of how information is to be packaged for transport, this approach resolves several problems that have plagued the information concept in biology, and highlights a number of important features of the way that information is encoded, stored, and transmitted as genetic sequence.
  54. Time walkers and spatial dynamics of ageing information
    Ludvig Lizana, Martin Rosvall, and Kim Sneppen
    Phys. Rev. Lett. 104, 040603 (2010)
    Abstract »
    The distribution of information is essential for living system's ability to coordinate and adapt. Random walkers are often used to model this distribution process and, in doing so, one effectively assumes that information maintains its relevance over time. But the value of information in social and biological systems often decay and must continuously be updated. To capture the spatial dynamics of ageing information, we introduce time walkers. A time walker moves like a random walker, but interacts with traces left by other walkers, some representing older information, some newer. The traces forms a navigable information landscape. We quantify the dynamical properties of time walkers moving on a two-dimensional lattice and the quality of the information landscape generated by their movements. We visualise the self-similar landscape as a river network, and show that searching in this landscape is superior to random searching and scales as the length of loop-erased random walks.
  55. The map equation
    Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom
    Eur. Phys. J. Special Topics 178, 13 (2009)
    arXiv:0906.1405 Map generator
    Abstract »
    Many real-world networks are so large that we must simplify their structure before we can extract useful information about the systems they represent. As the tools for doing these simplifications proliferate within the network literature, researchers would benefit from some guidelines about which of the so-called community detection algorithms are most appropriate for the structures they are studying and the questions they are asking. Here we show that different methods highlight different aspects of a network's structure and that the the sort of information that we seek to extract about the system must guide us in our decision. For example, many community detection algorithms, including the popular modularity maximization approach, infer module assignments from an underlying model of the network formation process. However, we are not always as interested in how a system's network structure was formed, as we are in how a network's extant structure influences the system's behavior. To see how structure influences current behavior, we will recognize that links in a network induce movement across the network and result in system-wide interdependence. In doing so, we explicitly acknowledge that most networks carry flow. To highlight and simplify the network structure with respect to this flow, we use the map equation. We present an intuitive derivation of this flow-based and information-theoretic method and provide an interactive on-line application that anyone can use to explore the mechanics of the map equation. The differences between the map equation and the modularity maximization approach are not merely conceptual. Because the map equation attends to patterns of flow on the network and the modularity maximization approach does not, the two methods can yield dramatically different results for some network structures. To illustrate this and build our understanding of each method, we partition several sample networks. We also describe an algorithm and provide source code to efficiently decompose large weighted and directed networks based on the map equation.
  56. Mapping change in large networks
    Martin Rosvall and Carl T. Bergstrom
    PLoS ONE 5(1): e8694 (2010)
    arXiv:0812.1242Alluvial generator
    Abstract »
    Change is the very nature of interaction patterns in biology, technology, economics, and science itself: The interactions within and between organisms change; the air, ground, and sea traffic change; the global financial flow changes; the scientific research front changes. With increasingly available data, networks and clustering tools have become important methods used to comprehend instances of these large-scale structures. But blind to the difference between noise and trends in the data, these tools alone must fail when used to study change. Only if we can assign significance to the partition of single networks can we distinguish structural changes from fluctuations and assess how much confidence we should have in the changes. Here we show that bootstrap resampling accompanied by significance clustering provides a solution to this problem. We use the significance clustering to realize de Solla Price's vision of mapping the change in science.
  57. Reinforced communication and social navigation generate groups in model networks
    Martin Rosvall and Kim Sneppen
    Phys. Rev. E 79, 026111 (2009)
    Abstract »
    To investigate the role of information flow in group formation, we introduce a model of communication and social navigation. We let agents gather information in an idealized network society, and demonstrate that heterogeneous groups can evolve without presuming that individuals have different interests. In our scenario, individuals' access to global information is constrained by local communication with the nearest neighbors on a dynamic network. The result is reinforced interests among like-minded agents in modular networks; the flow of information works as a glue that keeps individuals together. The model explains group formation in terms of limited information access and highlights global broadcasting of information as a way to counterbalance this fragmentation. To illustrate how the information constraints imposed by the communication structure affect future development of real-world systems, we extrapolate dynamics from the topology of four social networks.
  58. Maps of random walks on complex networks reveal community structure
    Martin Rosvall and Carl T. Bergstrom
    PNAS 105, 1118 (2008)
    arXiv:0707.0609 Source code Map generator Interactive map
    Abstract »
    To comprehend the multipartite organization of large-scale biological and social systems, we introduce a new information-theoretic approach to reveal community structure in weighted and directed networks. The method decomposes a network into modules by optimally compressing a description of information flows on the network. The result is a map that both simplifies and highlights the regularities in the structure and their relationships to each other. We illustrate the method by making a map of scientific communication as captured in the citation patterns of more than 6000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network — which includes physics, chemistry, molecular biology, and medicine — information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
  59. An information-theoretic framework for resolving community structure in complex networks
    Martin Rosvall and Carl T. Bergstrom
    PNAS 104, 7327 (2007)
    physics/0612035 Source code
    Abstract »
    To understand the structure of a large-scale biological, social, or technological network, it can be helpful to decompose the network into smaller subunits or modules. In this article, we develop an information-theoretic foundation for the concept of modularity in networks. We identify the modules of which the network is composed by finding an optimal compression of its topology, capitalizing on regularities in its structure. We explain the advantages of this approach and illustrate them by partitioning a number of real-world and model networks.
  60. Dynamics of opinions and social structures
    Martin Rosvall and Kim Sneppen
    Abstract »
    Social groups with widely different music tastes, political convictions, and religious beliefs emerge and disappear on scales from extreme subcultures to mainstream mass-cultures. Both the underlying social structure and the formation of opinions are dynamic, and changes in one affect the other. Several positive feedback mechanisms have been proposed to drive the diversity in social and economic systems, but little effort has been devoted to pinpointing the interplay between a dynamically changing social network and the spread and gathering of information on the network. Here we analyze this phenomenon in terms of a social network model that explicitly simulates the feedback between information assembly and the emergence of social structures: changing beliefs are coupled to changing relationships because agents self-organize a dynamic network to facilitate their hunter-gatherer behavior in information space. Our analysis demonstrates that tribal organizations and modular social networks can emerge as a result of contact-seeking agents that reinforce their beliefs among like-minded. We also find that prestigious persons can streamline the social network into hierarchical structures around themselves.
  61. Network models of phage-bacteria coevolution
    Martin Rosvall, Ian B. Dodd, Sandeep Krishna, and Kim Sneppen
    Phys. Rev. E 74, 066105 (2006)
    Abstract »
    Bacteria and their bacteriophages are the most abundant, widespread and diverse groups of biological entities on the planet. In an attempt to understand how the interactions between bacteria, virulent phages and temperate phages might affect the diversity of these groups, we developed a novel stochastic network model for examining the co-evolution of these ecologies. In our approach, nodes represent whole species or strains of bacteria or phages, rather than individuals, with "speciation" and extinction modelled by duplication and removal of nodes. Phage-bacteria links represent host-parasite relationships and temperate-virulent phage links denote prophage-encoded resistance. The effect of horizontal transfer of genetic information between strains was also included in the dynamic rules. The observed networks evolved in a highly dynamic fashion but the ecosystems were prone to collapse (one or more entire groups going extinct) Diversity could be stably maintained in the model only if the probability of speciation was independent of the diversity. Such an effect could be achieved in real ecosystems if the speciation rate is primarily set by the availability of ecological niches.
  62. Networks and our limited information horizon
    Martin Rosvall and Kim Sneppen
    International Journal of Bifurcation and Chaos 17, 2509 (2007)
    Abstract »
    In this paper we quantify our limited information horizon, by measuring the information necessary to locate specific nodes in a network. To investigate different ways to overcome this horizon, and the interplay between communication and topology in social networks, we let agents communicate in a model society. In this way, they build a perception of the network that they can use to create strategic links to improve their standing in the network. We observe a narrow distribution of links when communication is low and a network with a broad distribution of links when communication is high.
  63. Self-assembly of information in networks
    Martin Rosvall and Kim Sneppen
    Europhys. Lett. 74, 1109 (2006)
    Abstract »
    We model self-assembly of information in networks to investigate necessary conditions for building a global perception of a system by local communication. Our approach is to let agents chat in a model system to self-organize distant communication pathways. We demonstrate that simple local rules allow agents to build a perception of the system that is robust to dynamical changes and mistakes. We find that messages are most effectively forwarded in the presence of hubs, while transmission in hub-free networks is more robust against misinformation and failures.
  64. Modeling self-organization of communication and topology in social networks
    Martin Rosvall and Kim Sneppen
    Phys. Rev. E 74, 016108 (2006)
    Abstract »
    This paper introduces a model of self-organization between communication and topology in social networks, with feedback between different communication habits and the topology. To study this feedback, we let agents communicate to build a perception of a network and use this information to create strategic links. We observe a narrow distribution of links when the communication is low and a system with a broad distribution of links when the communication is high. We also analyze the outcome of chatting, cheating, and lying as strategies to get better access to information in the network. Chatting, although only adopted by a few agents, results in a global gain in the system. Contrary, a global loss is inevitable in a system with too many liars.
  65. Degree landscapes in scale-free networks
    Jacob Bock Axelsen, Sebastian Bernhardsson, Martin Rosvall, Kim Sneppen, and Ala Trusina
    Phys. Rev. E 74, 036119 (2006)
    Abstract »
    We generalize the degree-organizational view of real-world networks with broad degree-distributions in a landscape analogue with mountains (high-degree nodes) and valleys (low-degree nodes) For example, correlated degrees between adjacent nodes correspond to smooth landscapes (social networks), hierarchical networks to one-mountain landscapes (the Internet), and degree-disassortative networks without hierarchical features to rough landscapes with several mountains. We also generate ridge landscapes to model networks organized under constraints imposed by the space the networks are embedded in, associated to spatial or, in molecular networks, to functional localization. To quantify the topology, we here measure the widths of the mountains and the separation between different mountains.
  66. Searchability of networks
    Martin Rosvall, Andreas Grönlund, Petter Minnhagen, and Kim Sneppen
    Phys. Rev. E 72, 046117 (2005)
    Abstract »
    We investigate the searchability of complex systems in terms of their interconnectedness. Associating searchability with the number and size of branch points along the paths between the nodes, we find that scale-free networks are relatively difficult to search, and thus that the abundance of scale-free networks in nature and society may reflect an attempt to protect local areas in a highly interconnected network from nonrelated communication. In fact, starting from a random node, real-world networks with higher order organization like modular or hierarchical structure are even more difficult to navigate than random scale-free networks. The searchability at the node level opens the possibility for a generalized hierarchy measure that captures both the hierarchy in the usual terms of trees, as in military structures, and the intrinsic hierarchical nature of topological hierarchies for scale-free networks, as in the Internet.
  67. Communication boundaries in networks
    Ala Trusina, Martin Rosvall, and Kim Sneppen
    Phys. Rev. Lett. 94, 238701 (2005)
    Abstract »
    We investigate and quantify the interplay between topology and the ability to send specific signals in complex networks. We find that in a majority of investigated real-world networks, the ability to communicate is favored by the network topology at short distances, but disfavored at longer distances. We further discuss how the ability to locate specific nodes can be improved if information associated with the overall traffic in the network is available.
  68. Navigating networks with limited information
    Martin Rosvall, Petter Minnhagen, and Kim Sneppen
    Phys. Rev. E 71, 066111 (2005)
    Abstract »
    We study navigation with limited information in networks and demonstrate that many real-world networks have a structure that can be described as favoring communication at short distance at the cost of constraining communication at long distance. This feature, which is robust and more evident with limited than with complete information, reflects both topological and possibly functional design characteristics. For example, the characteristics of the networks studied derived from a city and from the Internet are manifested through modular network designs. We also observe that directed navigation in typical networks requires remarkably little information on the level of individual nodes. By studying navigation, or specific signaling, we take a complementary approach to the common studies of information transfer devoted to the broadcasting of information in studies of virus spreading and the like.
  69. Hide and seek on complex networks
    Kim Sneppen, Ala Trusina, and Martin Rosvall
    Europhys. Lett. 69, 853 (2005)
    Abstract »
    Signaling pathways and networks determine the ability to communicate in systems ranging from living cells to human society. We investigate how the network structure constrains communication in social, man-made and biological networks. We find that human networks of governance and collaboration are predictable on tête-è-tête level, reflecting well-defined pathways, but globally inefficient. In contrast, the Internet tends to have better overall communication abilities and more alternative pathways, and is therefore more robust. Between these extremes, the molecular network of Saccharomyces cerevisea is more similar to the simpler social systems, whereas the pattern of interactions in the more complex Drosophilia melanogaster resembles the robust Internet.
  70. Measuring information networks
    Kim Sneppen, Ala Trusina, and Martin Rosvall
    Pramana J. Phys. 64, 1121 (2005)
    Abstract »
    Traffic and communication between different parts of a complex system are fundamental elements in maintaining its overall cooperativity. Because a complex system consists of many different parts, it matters where signals are transmitted. Thus signaling and traffic are in principle specific, with each message going from a unique sender to a specific recipient. In the current paper we review some measures of network topology that are related to its ability to direct specific communication.
  71. Networks and cities: an information perspective
    Martin Rosvall, Ala Trusina, Petter Minnhagen, and Kim Sneppen
    Phys. Rev. Lett. 94, 028701 (2005)
    Abstract »
    Traffic is constrained by the information involved in locating the receiver and the physical distance between sender and receiver. We here focus on the former, and investigate traffic in the perspective of information handling. We re-plot the road map of cities in terms of the information needed to locate specific addresses and create information city networks with roads mapped to nodes and intersections to links between nodes. These networks have the broad degree distribution found in many other complex networks. Mapping to an information city network makes it possible to quantify the information associated with locating specific addresses.
  72. Self-organization of structures and networks from merging and small-scale fluctuations
    Petter Minnhagen, Martin Rosvall, Kim Sneppen, and Ala Trusina
    Physica A 340, 725 (2004)
    Abstract »
    We discuss merging-and-creation as a self-organizing process for scale-free topologies in networks. Three power-law classes characterized by the power-law exponents 3/2, 2 and 5/2 are identified and the process is generalized to networks. In the network context, the merging can be viewed as a consequence of optimization related to more efficient signaling.
  73. A simple model for self-organization of bipartite networks
    Kim Sneppen, Martin Rosvall, Ala Trusina, and Petter Minnhagen
    Europhys. Lett. 67, 349 (2004)
    Abstract »
    We suggest a minimalistic model for directed networks and suggest an application to the injection and merging of magnetic field lines. We obtain a network of connected donor and acceptor vertices with degree distribution 1/s^2, and with dynamic reconnection events of size \Delta s occurring with a frequency that scales as 1/\Delta s^3. This suggests that the model is in the same universality class as the model for self-organization in the solar atmosphere suggested by Hughes et al.
  74. Modeling dynamics of information networks
    Martin Rosvall, and Kim Sneppen
    Phys. Rev. Lett. 91, 178701 (2003)
    Abstract »
    We propose an information-based model for network dynamics in which imperfect information leads to networks where the different vertices have a widely different number of edges to other vertices, and where the topology has hierarchical features. The possibility to observe scale-free networks is linked to a minimally connected system where hubs remain dynamic.