SIAM International
Conference on Data Mining (SDM20)
May 7 – 9,
2020; Cincinnati, Ohio
Tutorial Title: Task-driven
Discovery of Incomplete Networks: From Vertex Classification to Dense Subgraph
Discovery to Representation Learning
Abstract: Most
network analysis is conducted on existing incomplete samples of much larger
complete, fully observed graphs. For example, many researchers obtain graphs
from online data repositories without knowing how these graphs were collected.
Thus, these graphs can be poor representations of the fully observed networks.
More complete data would lead to more accurate analyses, but data acquisition
is at best costly and at worst error-prone. For
example, think of an adversary that deliberately poisons the answer to a query
(e.g., by inserting noise into an API’s responses). The network discovery
problem is defined as follows: Given a query budget for identifying additional
nodes and edges, how can one improve the observed graph sample so that it is a
more accurate representation of the complete, fully observed network? The key
phrase in this problem definition is “a more accurate representation of the
complete, fully observed network”. In active exploration, the more accurate
representation involves growing the graph by adding nodes and edges. However,
in active learning, the more accurate representation is learning the best
performing function on the network for a down-stream task such as vertex
classification. This is a novel problem that is related to, but distinct from,
topics such as graph sampling and crawling. Given the prevailing use of graph
samples in the research literature, this problem is of considerable importance,
even though it has been ignored. In this tutorial, we will discuss the
state-of-art approaches to network discovery for vertex classification, dense
subgraph discovery, and representation learning. In addition, we will discuss
issues surrounding adversarial machine learning when querying for network
discovery.
Presenters
·
Tina Eliassi-Rad, Northeastern University
·
Rajmonda Caceres, MIT Lincoln Laboratory
·
Peter Morales, MIT
Lincoln Laboratory
Contributors
·
Timothy
LaRock, Northeastern University
·
Timothy
Sakharov, Northeastern University
·
Benjamin
A. Miller, Northeastern University
·
Joel Kurucar, MIT Lincoln Laboratory
·
Sucheta Soundarajan,
Syracuse University
·
Sahely Bhadra, Indian Institute of Technology, Palakkad
Outline:
·
Introduction
and motivation (10 minutes)
·
Graph
crawling and sampling (15 minutes)
·
Estimating
network parameters (15 minutes)
·
Task-driven
network discovery
o Algorithmic choices (30 minutes)
§ Making assumptions about the
process that is generating the data
§ Learning online only vs. offline
and online
§ Making sequential discussions with
and without memory
§ Exploring vs. exploiting
·
Down-stream
tasks (30 minutes)
o Vertex classification
o Dense subgraph discovery
o Representation learning
·
Dealing
with Adversaries in Network Discovery (10 minutes)
·
Wrap-up,
future work, and Q&A (10 minutes)
Slides:
Stay tuned…