The 25TH ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD 2019)
Sunday
August 4, 2019, 1:00 – 5:00 PM
Dena’ina Civic and Convention
Center, Anchorage, AK.
Tutorial Title: Incompleteness
in Networks: Biases, Skewed Results, and Some Solutions
Abstract: Network
analysis is often conducted on incomplete samples of much larger fully observed
networks (which are supposed to represent some phenomena of interest). For
example, many researchers obtain networks from online data repositories (such
as snap.stanford.edu and icon.colorado.edu) without knowing how the
networks were collected. Such networks can be poor representations of the fully
observed phenomena. More complete data would lead to more accurate analyses,
but data acquisition can be at best costly and at worst error-prone (e.g.,
consider an adversary that deliberately poisons the answer to a query). Past
work on this topic has forked into two branches: (1) assume a network model and
use the observed data to infer the unobserved data; and (2) do not assume a
network model, instead use the observed data to learn a policy for data
acquisition given a query budget. We focus on the second branch. That is, given
a query budget for identifying additional nodes and edges, how can one improve
the observed network sample so that it is a more accurate representation of the
fully observed network? This problem is related to, but distinct from, topics
such as graph sampling and crawling. In this tutorial, we will discuss latent biases
in incomplete networks and present methods for enriching such networks through
active probing of nodes and edges. We will focus on multi-armed bandit, online
active learning, and Markov decision process formulations of this problem
(a.k.a. the network discovery problem); and clarify distinctions between
learning to grow the network (a.k.a. active exploration) and learning the
“best” function on the network (a.k.a. active learning).
Presenters
·
Tina Eliassi-Rad, Northeastern University
·
Rajmonda
Caceres, MIT Lincoln Laboratory
·
Timothy
LaRock, Northeastern University
·
Peter Morales, MIT
Lincoln Laboratory
Contributors
·
Timothy
Sakharov, Northeastern University
·
Benjamin
A. Miller, Northeastern University
·
Joel Kurucar, MIT
Lincoln Laboratory
·
Sucheta Soundarajan, Syracuse University
·
Sahely Bhadra,
Indian Institute of Technology, Palakkad
Outline:
·
Introduction
o
Complex networks and their properties
o
Partial observability, biases, and skewed results
o
The network discovery (a.k.a. network completion)
problem
·
Online learning for network discovery
o
Estimation approaches
o
Multi-armed bandit solutions
o
Coffee Break: 2:30 – 3:00 @ Lobby Foyer
o
Network online learning and its limitations
·
Offline learning for network discovery
o
The need for offline learning
o
Reinforcement learning
·
Wrap-up, open questions, and Q&A
Slides:
Available here.