2018 SIAM International Conference
on Data Mining (SDMձ8)
May 3-5,
2018 San Diego, California
Tutorial Title: Problems
with Partially Observed (Incomplete) Networks: Biases, Skewed Results, and
Solutions
Abstract: Networked
representations of physical and social phenomena are ubiquitous. Examples
include social and information networks, technological and communication
networks, co-purchasing networks, etc. These networks are often incomplete
because the phenomena are partially observed. Working with incomplete networks
can skew analyses. Acquiring the full data is often unrealistic (e.g.,
obtaining the Twitter Firehose is not viable), but one may be able to collect
data selectively to enrich the incomplete network. With a limited query budget,
which parts of a partially observed network should be examined to give the best
(i.e., most complete) view of the entire network? Suppose that one has obtained
a sample of a Twitter retweet network from a Web site. The sample was collected
for some other purpose (unbeknownst to us), and so may not contain the most
useful structural information for one’s purposes. How should one best
supplement this sampled data? This tutorial addresses the above questions. In
particular, it we will focus on multi-armed bandit and reinforcement learning
solutions
Presenters
·
Tina Eliassi-Rad, Northeastern University,
tina@eliassi.org
·
Sucheta Soundarajan, Syracuse University,
susounda@syr.edu
·
Sahely Bhadra,
Indian Institute of Technology, Palakkad, Kerala, India, sahely@iitpkd.ac.in
Schedule: This
two-hour tutorial will cover the following:
·
Complex
networks and their properties
·
Partial
observability, biases, and skewed results
·
The
network completion problem
·
Multi-armed
bandit solutions
·
Limits
of learning in incomplete networks
Slides: Available
here.
Resources & code: Will
be uploaded soon. Stay tuned.
Target Audience and
Prerequisites: Our target audience includes
researchers and practitioners in data mining and machine learning, with an
interest in incomplete (a.k.a. partially observed) networks and graphs. We are
targeting people who are concerned about the latent biases in the “real-world”
data being used in research and industry. We expect the audience to come away
with an overview of the state-of-art in enriching incomplete networks and have
a better understanding of the challenges in this area. No assumption is made
about familiarity with complex networks, graph mining, graph sampling, and
incomplete data. A brief overview of them will be included in the tutorial.
References
[1] K. Avrachenkov,
N. Litvak, L. O. Prokhorenkova, and E. Sayargulova. Quick detection of high-degree entities in
large directed networks. In ICDM, pages 54–65, 2014.
[2] M. Bilgic,
L. Mihalkova, and L. Getoor.
Active learning for networked data. In ICML, pages 79–86, 2010.
[3] S. Bubeck
and N. Cesa-Bianchi. Regret analysis of stochastic
and nonstochastic multi-armed bandit problems.
Foundations and Trends in Machine Learning, vol. 5, no. 1, pages 1–122, 2012.
[4] S. Chen, A. Mira, and J.-P. Onnela. Flexible Model Selection for Mechanistic Network
Models via Super Learner. CoRR, abs/1804.00237v1,
April 2018.
[5] J. Cho, H. Garcia-Molina, and L.
Page. Efficient crawling through URL ordering. In WWW, pages 161–172, 1998.
[6] R. Cohen, S. Havlin,
and D. Ben-Avraham. Efficient immunization strategies for computer networks and
populations. Phys. Rev. Lett., vol. 91, no. 24, p. 247901, 2003.
[7] C. Cooper, T. Radzik,
and Y. Siantos. Estimating network parameters using
random walks. In CASoN, pages 33–40, 2012.
[8] C. Dann, G. Neumann, and J. Peters.
Policy evaluation with temporal differences: A survey and comparison. Journal
of Machine Learning Research 15: 809–883 (2014).
[9] A. Dasgupta, R. Kumar, and D.
Sivakumar. Social sampling. In KDD, pages 235–243, 2012.
[10] A. Dasgupta, R. Kumar, and T. Sarlós. On estimating the average degree. In WWW, pages
795–806, 2014.
[11] I. Grondman,
L. Busoniu, G.A.D. Lopes, and R. Babuska.
A survey of actor-critic reinforcement learning: Standard and natural policy
gradients. IEEE Trans. Systems, Man, and Cybernetics, Part C 42(6): 1291–1307
(2012).
[12] S. Hanneke and E. P. Xing. Network
completing and survey sampling. In AISTATS, pages 209–215, 2009.
[13] M. Kim and J. Leskovec.
The network completion problem: Inferring missing nodes and edges in networks.
In SDM, pages 47–58, 2011.
[14] T. LaRock,
T. Sakharov, S. Bhadra, and T. Eliassi-Rad. Limits of Learning in Incomplete
Networks, NetSci’18, Paris, France, June 2018.
[15] J. Leskovec
and C. Faloutsos. Sampling from large graphs. In KDD,
pages 631–636, 2006.
[16] S. A. Macskassy
and F. Provost. Suspicion scoring based on guilt-by association, collective
inference, and focused data access. In International Conference on Intelligence
Analysis, 2005.
[17] K. Madhawa
and T. Murata. Exploring Partially Observed Networks with Nonparametric
Bandits, CoRR, abs/1804.07059v1, April 2018.
[18] A. S. Maiya
and T. Berger-Wolf. Online sampling of high centrality individuals in social
networks. In PAKDD, pages 91–98, 2010.
[19] A. S. Maiya
and T. Berger-Wolf. Sampling community structure. In WWW, pages 701–710, 2010.
[20] A. S. Maiya
and T. Y. Berger-Wolf. Benefits of bias: Towards better characterization of
network sampling. In KDD, pages 105–113, 2011.
[21] F. Murai,
D. Rennó, B. Ribeiro, G.L. Pappa,
D. Towsley and K. Gile. Selective
harvesting over networks. CoRR, abs/1703.05082v1, March
2017.
[22] L. Peel. Estimating network
parameters for selecting community detection algorithms. In FUSION, pages 1–8,
2010.
[23] J. J. Pfeiffer III, J. Neville, and
P. N. Bennett. Active sampling of networks. In MLG Workshop, 2012.
[24] J. J. Pfeiffer III, J. Neville, and
P. N. Bennett. Active exploration in networks: Using probabilistic
relationships for learning and inference. In CIKM, pages 639–648, 2014.
[25] V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? Improving data quality and data
mining using multiple, noisy labelers. In KDD, pages 614–622, 2008.
[26] S. Soundarajan,
T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxOutProbe:
An algorithm for increasing the size of partially observed networks. In NIPS
Workshop on Networks in the Social and Information Sciences, December 2015.
[27] S. Soundarajan,
T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxReach:
Reducing network incompleteness through node probes. In ASONAM 2016: 152–157.
[28] S. Soundarajan,
T. Eliassi-Rad, B. Gallagher, and A. Pinar. e-WGX: Adaptive edge probing for
enhancing incomplete networks. In WebSci, June 2017.
[29] R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D.
Silver, C. Szepesvári, and E. Wiewiora.
Fast gradient-descent methods for temporal-difference learning with linear
function approximation. In ICML 2009: 993–1000.
[30]
D.
Thompson, J. Bennett, C. Seshadhri, and A. Pinar. A
provably-robust sampling method for generating colormaps of large data. In
LDAV, pages 77–84, 2013.
[31]
J.
Vermorel and M. Mohri.
Multi-armed bandit algorithms and empirical evaluation. In ECML, pages 437–448,
2005.