SIAM International Conference on Data Mining (SDM20)

May 7 – 9, 2020; Cincinnati, Ohio


Tutorial Title:  Task-driven Discovery of Incomplete Networks: From Vertex Classification to Dense Subgraph Discovery to Representation Learning


Abstract:  Most network analysis is conducted on existing incomplete samples of much larger complete, fully observed graphs. For example, many researchers obtain graphs from online data repositories without knowing how these graphs were collected. Thus, these graphs can be poor representations of the fully observed networks. More complete data would lead to more accurate analyses, but data acquisition is at best costly and at worst error-prone. For example, think of an adversary that deliberately poisons the answer to a query (e.g., by inserting noise into an API’s responses). The network discovery problem is defined as follows: Given a query budget for identifying additional nodes and edges, how can one improve the observed graph sample so that it is a more accurate representation of the complete, fully observed network? The key phrase in this problem definition is “a more accurate representation of the complete, fully observed network”. In active exploration, the more accurate representation involves growing the graph by adding nodes and edges. However, in active learning, the more accurate representation is learning the best performing function on the network for a down-stream task such as vertex classification. This is a novel problem that is related to, but distinct from, topics such as graph sampling and crawling. Given the prevailing use of graph samples in the research literature, this problem is of considerable importance, even though it has been ignored. In this tutorial, we will discuss the state-of-art approaches to network discovery for vertex classification, dense subgraph discovery, and representation learning. In addition, we will discuss issues surrounding adversarial machine learning when querying for network discovery.



·      Tina Eliassi-Rad, Northeastern University

·      Rajmonda Caceres, MIT Lincoln Laboratory

·      Peter Morales, MIT Lincoln Laboratory



·      Timothy LaRock, Northeastern University

·      Timothy Sakharov, Northeastern University

·      Benjamin A. Miller, Northeastern University

·      Joel Kurucar, MIT Lincoln Laboratory

·      Sucheta Soundarajan, Syracuse University

·      Sahely Bhadra, Indian Institute of Technology, Palakkad



·      Introduction and motivation (10 minutes)

·      Graph crawling and sampling (15 minutes)

·      Estimating network parameters (15 minutes)

·      Task-driven network discovery

o   Algorithmic choices (30 minutes)

§  Making assumptions about the process that is generating the data

§  Learning online only vs. offline and online

§  Making sequential discussions with and without memory

§  Exploring vs. exploiting

·      Down-stream tasks (30 minutes)

o   Vertex classification

o   Dense subgraph discovery

o   Representation learning

·      Dealing with Adversaries in Network Discovery (10 minutes)

·      Wrap-up, future work, and Q&A (10 minutes)


Slides: Stay tuned…