The 25TH ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019)

 

Sunday August 4, 2019, 1:00 – 5:00 PM

Dena’ina Civic and Convention Center, Anchorage, AK.

 

Tutorial Title:  Incompleteness in Networks: Biases, Skewed Results, and Some Solutions

 

Abstract:  Network analysis is often conducted on incomplete samples of much larger fully observed networks (which are supposed to represent some phenomena of interest). For example, many researchers obtain networks from online data repositories (such as snap.stanford.edu and icon.colorado.edu) without knowing how the networks were collected. Such networks can be poor representations of the fully observed phenomena. More complete data would lead to more accurate analyses, but data acquisition can be at best costly and at worst error-prone (e.g., consider an adversary that deliberately poisons the answer to a query). Past work on this topic has forked into two branches: (1) assume a network model and use the observed data to infer the unobserved data; and (2) do not assume a network model, instead use the observed data to learn a policy for data acquisition given a query budget. We focus on the second branch. That is, given a query budget for identifying additional nodes and edges, how can one improve the observed network sample so that it is a more accurate representation of the fully observed network? This problem is related to, but distinct from, topics such as graph sampling and crawling. In this tutorial, we will discuss latent biases in incomplete networks and present methods for enriching such networks through active probing of nodes and edges. We will focus on multi-armed bandit, online active learning, and Markov decision process formulations of this problem (a.k.a. the network discovery problem); and clarify distinctions between learning to grow the network (a.k.a. active exploration) and learning the “best” function on the network (a.k.a. active learning).

 

Presenters

·      Tina Eliassi-Rad, Northeastern University

·      Rajmonda Caceres, MIT Lincoln Laboratory

·      Timothy LaRock, Northeastern University

·      Peter Morales, MIT Lincoln Laboratory

 

Contributors

·      Timothy Sakharov, Northeastern University

·      Benjamin A. Miller, Northeastern University

·      Joel Kurucar, MIT Lincoln Laboratory

·      Sucheta Soundarajan, Syracuse University

·      Sahely Bhadra, Indian Institute of Technology, Palakkad

 

Outline:

·      Introduction

o   Complex networks and their properties

o   Partial observability, biases, and skewed results

o   The network discovery (a.k.a. network completion) problem

·      Online learning for network discovery

o   Estimation approaches

o   Multi-armed bandit solutions

o   Coffee Break: 2:30 – 3:00 @ Lobby Foyer

o   Network online learning and its limitations

·      Offline learning for network discovery

o   The need for offline learning

o   Reinforcement learning

·      Wrap-up, open questions, and Q&A

 

Slides: Available here.