Complex Networks: Analysis, Numerics, and Applications
Dates and times (all times in EST / GMT-5, V = Virtual, P = In Person):
Time | Friday, Feb 18th | Saturday, Feb 19th
9:15-9:30 | Log-in / Registration | Log-in
9:30-10:15 | John Conroy & Neil Molino (V) | Maria Cameron (P)
10:30-11:15 | Mauro Maggioni (V) | Yihong Wu (V)
11:30-12:15 | John Dickerson (P) | Cetin Savkli (P)
12:30-2:00 | Lunch | Lunch
2:00-2:45 | Bhaswar Bhattacharya (V) | Vince Lyzinski (P)
3:00-3:45 | Michelle Girvan (V) | Zachary Lubberts (P)
4:00-4:45 | David Bindel (V) | Edo Airoldi & Chencheng Cai (V)
Where:
UMD Kirwan 3206, with simultaneous broadcast via this Zoom link.
Recorded talks will become available on our YouTube channel.
List of Speakers
Dr. Edo Airoldi (Temple) & Dr. Chencheng Cai (Temple)
Optimizing Randomized and Deterministic Saturation Designs under Interference
Randomized saturation designs are a family of designs that assign a possibly different treatment proportion to each cluster of a population at random. As a result, they generalize the well-known (stratified) completely randomized designs and the cluster-based randomized designs, which are included as special cases. We show that, under the stable unit treatment value assumption, either the cluster-based or the stratified completely randomized design is in fact optimal for the bias and variance of the difference-in-means estimator among randomized saturation designs. However, this is no longer the case when interference is present. We provide the closed form of the bias and variance of the difference-in-means estimator under a linear model of interference and investigate the optimization of each of these objectives. In addition to the randomized saturation designs, we propose a deterministic saturation design, where the treatment proportions for clusters are fixed, rather than randomized, in order to further improve the estimator under correct model specification. Through simulations, we illustrate the merits of optimizing randomized saturation designs to the graph and potential outcome structure, as well as showcasing the additional improvements yielded by well-chosen deterministic saturation designs.
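To make the design family concrete, here is a minimal sketch of a randomized saturation design and the difference-in-means estimator (illustrative only, not the authors' implementation; function and variable names are our own):

```python
import random

def randomized_saturation_design(clusters, saturations, seed=0):
    """Assign each cluster a treatment proportion drawn at random from
    `saturations`, then treat that fraction of its units at random.
    Saturations {0, 1} recover a cluster-randomized design; a single
    saturation recovers a stratified completely randomized design."""
    rng = random.Random(seed)
    treated = set()
    for units in clusters.values():
        pi = rng.choice(saturations)
        treated |= set(rng.sample(units, round(pi * len(units))))
    return treated

def difference_in_means(outcomes, treated):
    """Mean outcome among treated units minus mean among control units."""
    t = [y for u, y in outcomes.items() if u in treated]
    c = [y for u, y in outcomes.items() if u not in treated]
    return sum(t) / len(t) - sum(c) / len(c)
```

Under interference, a unit's outcome also depends on how many of its network neighbors are treated, which is what makes the choice among saturations consequential.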
|
Dr. Bhaswar Bhattacharya (UPenn) |
Motif Estimation via Subgraph Sampling: The Fourth-Moment Phenomenon |
Network sampling has emerged as an indispensable tool for understanding features of large-scale complex networks where it is practically impossible to search/query over all the nodes. Examples include social networks, biological networks, internet and communication networks, and socio-economic networks, among others. In this talk we will discuss a unified framework for statistical inference for counting motifs, such as edges, triangles, and wedges, in the widely used subgraph sampling model. In particular, we will provide precise conditions for the consistency and the asymptotic normality of the natural Horvitz–Thompson (HT) estimator, which can be used for constructing confidence intervals and hypothesis testing for the motif counts. As a consequence, an interesting fourth-moment phenomenon for the asymptotic normality of the HT estimator and connections to fundamental results in random graph theory will emerge.
Based on joint work with Sayan Das and Sumit Mukherjee.
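For edge counts, the HT estimator has a particularly simple form under subgraph sampling: each vertex is kept independently with probability p, an edge is observed only if both endpoints are kept, and the observed count is inverse-weighted by that inclusion probability. A minimal sketch (illustrative; names are our own):

```python
import random
from itertools import combinations

def ht_edge_estimate(edges, p, seed=0):
    """Horvitz-Thompson estimate of the edge count under subgraph
    sampling: keep each vertex independently with probability p; an
    edge is observed iff both endpoints are kept, so its inclusion
    probability is p**2."""
    rng = random.Random(seed)
    vertices = {v for e in edges for v in e}
    kept = {v for v in vertices if rng.random() < p}
    observed = sum(1 for u, v in edges if u in kept and v in kept)
    return observed / p**2

# Example: complete graph on 20 vertices (190 edges); averaging over
# repeated samples illustrates unbiasedness.
edges = list(combinations(range(20), 2))
est = sum(ht_edge_estimate(edges, 0.5, seed=s) for s in range(200)) / 200
```

Triangles and wedges work the same way, with inclusion probabilities p**3 for their three vertices.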
|
Dr. David Bindel (Cornell) |
Spectral network analysis: beyond the spectrum's edge |
Most spectral methods for analyzing complex networks involve the extremal eigenvalues and associated eigenvectors of a Laplacian or of some other matrix associated with a graph. In this talk, we discuss two lines of work -- one on "local spectra" for community detection, one on graph spectral densities -- that generalize spectral analysis techniques to extract information from the interior of the spectrum. Our approach highlights connections between approaches from network science, geometry, and condensed matter physics.
Joint work with Kun Dong, Austin Benson, Kun He, and John Hopcroft.
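The basic object, a graph spectral density, can be sketched in a few lines of numpy (a toy version via full eigendecomposition; the methods in the talk use stochastic estimators to reach the interior of the spectrum at scale, and the function name here is our own):

```python
import numpy as np

def spectral_density(A, bins=20):
    """Histogram estimate of the normalized-Laplacian spectral density
    of a graph with adjacency matrix A. Assumes no isolated vertices
    (degrees are clipped away from zero only as a numerical guard)."""
    d = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(A)) - Dinv @ A @ Dinv   # normalized Laplacian
    lam = np.linalg.eigvalsh(L)            # spectrum lies in [0, 2]
    return np.histogram(lam, bins=bins, range=(0.0, 2.0), density=True)
```

The shape of this density in the bulk, not just at its edges, is what carries the extra structural information.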
|
Dr. Maria Cameron (UMD) |
Analysis of hydrocarbon pyrolysis with the aid of random graph theory |
Understanding hydrocarbon pyrolysis under extreme temperature and pressure is of interest to the combustion, organic chemistry, and astrophysics communities. Molecular dynamics simulations of hydrocarbon pyrolysis give us experimental insight, but they are extremely expensive, taking weeks or even months to run. These simulations have shown the generation of over 7 thousand chemical species and over 13 thousand unique chemical reactions, and there is strong evidence that this is just a small fraction of possible reactions. We propose a methodology featuring an originally developed simplified 10-reaction model and the use of random graph theory, specifically the configuration model, that allows us to predict the equilibrium molecule size distribution for various initial compositions and temperatures without running molecular dynamics simulations. We demonstrate the strength and discuss the limitations of this approach.
Joint work with Vincent Dufour-Decieus (Stanford University), Christopher Moakler (UMD), and Evan Reed (Stanford University).
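The configuration model itself is easy to sample: give each vertex as many half-edge "stubs" as its prescribed degree and pair the stubs uniformly at random; connected components then play the role of molecules. A minimal sketch (illustrative; names are our own):

```python
import random
from collections import defaultdict

def configuration_model(degrees, seed=0):
    """Sample a multigraph with the given degree sequence: vertex v
    gets degrees[v] stubs, and stubs are paired uniformly at random
    (self-loops and multi-edges are possible)."""
    assert sum(degrees) % 2 == 0, "total degree must be even"
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))

def component_sizes(n, edges):
    """Connected-component sizes -- the 'molecule sizes' when vertices
    are atoms and edges are bonds. Union-find with path compression."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    sizes = defaultdict(int)
    for v in range(n):
        sizes[find(v)] += 1
    return sorted(sizes.values(), reverse=True)
```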
|
Dr. John Conroy (IDA CSS) & Dr. Neil Molino (IDA CSS)
Towards Two to Five Truths Revealed in Non-Negative Matrix Factorizations
In this talk we explore the role of matrix scaling of a matrix of counts when building a topic model using a non-negative matrix factorization. We show that a scaling inspired by the normalized Laplacian (NL) for graphs can greatly improve the quality of a non-negative matrix factorization. The results parallel those in the spectral graph clustering work of [2], where the authors proved that adjacency spectral embedding (ASE) spectral clustering was more likely to discover core-periphery partitions, while Laplacian spectral embedding (LSE) was more likely to discover affinity partitions. In text applications, non-negative matrix factorizations (NMF) are typically applied to a matrix of co-occurrence counts between "contexts" and "terms".¹ Terms, in our experiments, were tokens as defined by CountVectorizer, which breaks on white space and removes punctuation. Depending on the application, a context could be a set of two or more consecutive terms, a sentence, or one or more documents. We studied five matrix scalings: (1) the original counts (None), (2) column scaling (CS), (3) row scaling (RS), (4) pointwise mutual information (PWMI), and (5) the normalized Laplacian (NL). As the matrices are context-term matrices, RS scaling turns each row into the maximum likelihood multinomial for its context, and CS estimates the term distributions across the contexts. PWMI scales the counts by the row and column marginals, while NL scales them by the square roots of these marginals. To the extent that the spectral result of [2] carries over to NMF applied to text, we would expect NL to be best at recovering latent topics. We test these five scalings on three datasets: the 20 newsgroups data (20 groups), for which we ran NMF both with raw counts and with LSE-normalized counts; the Clemson Russian troll data (8 types of trolls);² and a collection of 500 newswire documents covering 50 topics from the Document Understanding Conference (DUC). Using the adjusted Rand index (ARI) to measure cluster similarity, we see an increase of 50% for the tweet data and over 200% for the newsgroup dataset. For the DUC dataset, NL edges out RS and gives over 40% improvement over None. The second part of the talk will give some analysis of why these matrix scalings behave as they do.
¹ See [1] for a text summarization example.
² https://fivethirtyeight.com/features/why-were-sharing-3-million-russian-troll-tweets/
[1] John M. Conroy and Sashka T. Davis. Section mixture models for scientific document summarization. International Journal on Digital Libraries, 19(2-3):305-322, 2018.
[2] Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, John M. Conroy, Vince Lyzinski, Minh Tang, Avanti Athreya, Joshua Cape, and Eric Bridgeford. On a two-truths phenomenon in spectral graph clustering. Proceedings of the National Academy of Sciences, 116(13):5995-6000, 2019.
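The five scalings have short closed forms; a numpy sketch (the function name is our own, the PWMI variant shown is the ratio form without the logarithm, and row and column sums are assumed positive):

```python
import numpy as np

def scale_counts(A, method="NL"):
    """Scale a context-by-term count matrix A. RS: each row becomes a
    multinomial over terms; CS: each column a distribution over
    contexts; PWMI: counts divided by row and column marginals (ratio
    form of pointwise mutual information); NL: counts divided by the
    square roots of those marginals, as in the normalized Laplacian."""
    A = np.asarray(A, dtype=float)
    r = A.sum(axis=1, keepdims=True)   # row (context) marginals
    c = A.sum(axis=0, keepdims=True)   # column (term) marginals
    if method == "RS":
        return A / r
    if method == "CS":
        return A / c
    if method == "PWMI":
        return A * A.sum() / (r * c)
    if method == "NL":
        return A / (np.sqrt(r) * np.sqrt(c))
    return A  # "None": raw counts
```

Each scaled matrix is then factored with NMF and the dominant factor per context gives the cluster label.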
|
Dr. John Dickerson (UMD) |
Designing Efficient, Fair, and Robust Platform Markets: Case Studies in Worldwide Blood Donation and Organ Exchange |
Markets are systems that empower interested parties — humans, firms, governments, or autonomous agents — to exchange goods, services, and information. In some markets, such as stock and commodity exchanges, prices do all of the “work” of matching supply and demand. Due to logistical or societal constraints, many markets, e.g., school choice, rideshare, online dating, advertising, cadaveric organ allocation, online labor, public housing, refugee placement, and kidney exchange, cannot rely solely on prices to match supply and demand. Techniques from artificial intelligence (AI), computer science, and mathematics have a long history of both being created via, and also finding application in, the design and analysis of markets of both types. AI techniques determine how to discover structure in an uncertain matching problem, learn how to decide between matching now versus waiting, and balance competing objectives such as fairness, diversity, and economic efficiency.
This talk covers optimization- and AI-based approaches to the design and analysis of markets. It focuses on managing uncertainty — specifically, on optimizing costly information gathering done before a clearing algorithm is run, and then incorporating that uncertainty into the algorithm(s) at clearing time. Techniques will be presented through the lens of worldwide blood donation as well as kidney exchange, an organized market where patients with end-stage renal failure swap willing but incompatible donors. These markets are currently fielded nationally and internationally and are run (to varying degrees) by AI-based systems, thus surfacing pertinent questions at the intersection of ethics and artificial intelligence.
This talk covers recent work published in the last couple of years at AAAI, AIJ, EC, ICML, JAIR, Management Science, NeurIPS, and Operations Research, as well as work that is currently under submission.
|
Dr. Michelle Girvan (UMD) |
Reservoir Computing for Forecasting the Dynamics of Complex Networks |
|
Dr. Zachary Lubberts (JHU) |
Beyond the Adjacency Matrix: Random Line Graphs and Inference for Networks with Edge Attributes |
Any modern network inference paradigm must incorporate multiple aspects of network structure, including information that is often encoded both in vertices and in edges. Methodology for handling vertex attributes has been developed for a number of network models, but comparable techniques for edge-related attributes remain largely unavailable. We address this gap in the literature by extending the latent position random graph model to the line graph of a random graph, which is formed by creating a vertex for each edge in the original random graph, and connecting each pair of edges incident to a common vertex in the original graph. We prove concentration inequalities for the spectrum of a line graph, and then establish that although naive spectral decompositions can fail to extract necessary signal for edge clustering, there exist signal-preserving singular subspaces of the line graph that can be recovered through a carefully chosen projection. Moreover, we can consistently estimate edge latent positions in a random line graph, even though such graphs are of a random size, typically have high rank, and possess no spectral gap. Our results also demonstrate that the line graph of a stochastic block model exhibits underlying block structure, and we synthesize and test our methods in simulations for cluster recovery and edge covariate inference in stochastic block model graphs.
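The line-graph construction itself is simple; a minimal sketch (illustrative, quadratic in the number of edges for clarity):

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph: one vertex per edge of the original graph,
    with two line-graph vertices adjacent iff the corresponding edges
    share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]
    return [(e, f) for e, f in combinations(edges, 2) if set(e) & set(f)]

# A path a-b-c has two edges meeting at b, so its line graph has a
# single edge between (a, b) and (b, c).
```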
|
Dr. Vince Lyzinski (UMD) |
Multiplex Graph Matching Matched Filters |
We consider the problem of detecting a noisy induced multiplex template network in a larger multiplex background network. Our approach, which extends the framework of Sussman et al. (2019) to the multiplex setting, leverages a multiplex analogue of the classical graph matching problem to use the template as a matched filter for efficiently searching the background for candidate template matches. The effectiveness of our approach is demonstrated both theoretically and empirically, with particular attention paid to richly featured network examples.
|
Dr. Mauro Maggioni (JHU) |
Learning Interaction Laws in Networks of Particle- and Agent-Based Systems
Interacting agent-based systems are ubiquitous in science, from the modeling of particles in Physics to prey-predator and colony models in Biology, to opinion dynamics in economics and social sciences. Oftentimes the laws of interaction between the agents are quite simple: for example, they depend only on pairwise interactions, and only on pairwise distance in each interaction. Yet the time-varying networks that the agents form can exhibit large-time and large-space behaviors that are quite coordinated and complex. We consider the following inference problem for a system of interacting particles or agents: given only observed trajectories of the agents in the system, can we learn what the laws of interaction are? We would like to do this without assuming any particular form for the interaction laws, i.e., they might be "any" function of pairwise distances. We consider this problem both in the mean-field limit (i.e., the number of particles going to infinity) and in the case of a finite number of agents with an increasing number of observations, although in this talk we will mostly focus on the latter case. We cast this as an inverse problem and present a solution in the simplest yet interesting case where the interaction is governed by an (unknown) function of pairwise distances. We discuss when this problem is well-posed, construct estimators for the interaction kernels with provably good statistical and computational properties, and discuss extensions to second-order systems, more general interaction kernels, and stochastic systems. We measure empirically the performance of our techniques on various examples, including extensions to agent systems with different types of agents, second-order systems, and families of systems with parametric interaction kernels. We also conduct numerical experiments to test the large-time behavior of these systems, especially in the cases where they exhibit emergent behavior.
This is joint work with F. Lu, J. Miller, S. Tang and M. Zhong.
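Under strong simplifying assumptions (1-D agents, exact velocity observations, a kernel that is exactly piecewise constant on known distance bins), the inverse problem reduces to plain linear least squares. A toy sketch of that reduction, not the estimators from the talk; all names and parameter choices are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 8, 40                      # agents, observed configurations
bins = np.array([0.0, 1.0, 3.0])  # distance bins for the kernel
c_true = np.array([1.0, 0.2])     # true kernel value on each bin

def rhs(x, c):
    """First-order dynamics: dx_i/dt = (1/N) sum_j phi(|x_j - x_i|)(x_j - x_i),
    with phi piecewise constant on `bins` with values `c`."""
    d = x[None, :] - x[:, None]                       # d[i, j] = x_j - x_i
    k = np.clip(np.digitize(np.abs(d), bins) - 1, 0, len(c) - 1)
    phi = c[k]
    np.fill_diagonal(phi, 0.0)                        # no self-interaction
    return (phi * d).sum(axis=1) / N

# Each (configuration, agent) pair gives one linear equation in the
# unknown bin values of phi.
rows, vels = [], []
for _ in range(T):
    x = rng.uniform(0.0, 3.0, size=N)
    v = rhs(x, c_true)                                # "observed" velocities
    d = x[None, :] - x[:, None]
    k = np.clip(np.digitize(np.abs(d), bins) - 1, 0, 1)
    for i in range(N):
        row = np.zeros(2)
        for j in range(N):
            if j != i:
                row[k[i, j]] += d[i, j] / N
        rows.append(row)
        vels.append(v[i])
c_hat, *_ = np.linalg.lstsq(np.array(rows), np.array(vels), rcond=None)
```

With noise-free observations the system is consistent and `c_hat` recovers `c_true`; the talk's estimators handle finite-difference noise, adaptive bases, and well-posedness.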
|
Dr. Cetin Savkli (JHU APL) |
Link cohesion in highly connected graphs |
Real-world graphs are often very dense, and discovering the structure of communities can be a challenging task. In this talk we will discuss truss- and trapeze-based link strength metrics and how they can be used to discover community structure. The low computational cost and simplicity of these metrics make them useful tools for pruning dense graphs to enhance the quality of subsequent analysis.
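The basic building block of truss-style metrics is easy to state: the support of an edge is the number of triangles it participates in, and low-support edges are candidates for pruning. A minimal sketch (illustrative; the cohesion metrics in the talk refine this idea):

```python
from collections import defaultdict

def edge_support(edges):
    """Number of triangles through each edge (the 'support' used in
    truss decompositions): the count of common neighbors of the two
    endpoints."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {tuple(sorted((u, v))): len(adj[u] & adj[v]) for u, v in edges}
```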
|
Dr. Yihong Wu (Yale) |
Spectral Graph Matching: Theory and Algorithms |
Graph matching aims at finding the vertex correspondence that maximally aligns the edge sets of two given graphs. This task amounts to solving a computationally intractable quadratic assignment problem (QAP). We propose a new spectral method, which computes the eigendecomposition of the two adjacency matrices and returns a matching based on the pairwise alignments between all eigenvectors of the first graph with all eigenvectors of the second. Each alignment is inversely weighted by the gap between the corresponding eigenvalues. This spectral method can be equivalently viewed as solving a regularized quadratic programming relaxation of the QAP. We show that, for a correlated Erdos-Renyi model with average degree at least polylog(n), this method finds the exact matching with high probability if the two graphs differ by at most a 1/polylog(n) fraction of edges. The proposed algorithm matches the state of the art of polynomial-time algorithms based on combinatorial ideas, and exponentially improves the performance of existing spectral methods that only compare top eigenvectors or eigenvectors of the same order. The analysis exploits local laws for the resolvents of sparse Wigner matrices. The superiority of this new spectral method is also demonstrated on a variety of datasets, in terms of both matching accuracy and computational efficiency. Finally, we discuss various open questions in the context of statistical and computational limits of this problem.
Based on joint work with Zhou Fan (Yale), Cheng Mao (Georgia Tech), Jiaming Xu (Duke), and Sophie Yu (Duke) available at https://arxiv.org/abs/1907.08880, https://arxiv.org/abs/1907.08883, https://arxiv.org/abs/2008.10097, https://arxiv.org/abs/2110.11816.
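A numpy sketch of the all-pairs alignment idea: the similarity matrix aggregates every eigenvector pair of the two adjacency matrices, inversely weighted by the eigenvalue gap, and is then rounded to a permutation (greedily here for brevity; the paper rounds with a linear assignment solver, and the function name and regularization value are our own):

```python
import numpy as np

def spectral_match(A, B, eta=0.2):
    """Match the vertices of two graphs from the pairwise alignments of
    ALL eigenvectors of A with ALL eigenvectors of B, each pair weighted
    inversely by the squared eigenvalue gap (regularized by eta)."""
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    n = A.shape[0]
    su = U.sum(axis=0)            # 1^T u_i
    sv = V.sum(axis=0)            # 1^T v_j
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            w = 1.0 / ((lam[i] - mu[j]) ** 2 + eta ** 2)
            S += w * su[i] * sv[j] * np.outer(U[:, i], V[:, j])
    # Greedy rounding: repeatedly take the largest remaining similarity.
    match = {}
    for _ in range(n):
        i, j = np.unravel_index(np.argmax(S), S.shape)
        match[int(i)] = int(j)
        S[i, :] = -np.inf
        S[:, j] = -np.inf
    return match
```

Comparing all eigenvector pairs, rather than only top or same-order eigenvectors, is what makes the method robust to eigenvalue permutations between the two graphs.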
|
Organizing Committee:
Radu Balan Maria Cameron Wojciech Czaja Vince Lyzinski