2018 SIAM International Conference on Data Mining (SDMձ8)

May 3-5, 2018      San Diego, California

 

Tutorial Title:  Problems with Partially Observed (Incomplete) Networks: Biases, Skewed Results, and Solutions

 

Abstract:  Networked representations of physical and social phenomena are ubiquitous. Examples include social and information networks, technological and communication networks, co-purchasing networks, etc. These networks are often incomplete because the phenomena are partially observed. Working with incomplete networks can skew analyses. Acquiring the full data is often unrealistic (e.g., obtaining the Twitter Firehose is not viable), but one may be able to collect data selectively to enrich the incomplete network. With a limited query budget, which parts of a partially observed network should be examined to give the best (i.e., most complete) view of the entire network? Suppose that one has obtained a sample of a Twitter retweet network from a Web site. The sample was collected for some other purpose (unbeknownst to us), and so may not contain the most useful structural information for one’s purposes. How should one best supplement this sampled data? This tutorial addresses the above questions. In particular, it we will focus on multi-armed bandit and reinforcement learning solutions

 

Presenters

·      Tina Eliassi-Rad, Northeastern University, tina@eliassi.org

·      Sucheta Soundarajan, Syracuse University, susounda@syr.edu

·      Sahely Bhadra, Indian Institute of Technology, Palakkad, Kerala, India, sahely@iitpkd.ac.in

 

Schedule:  This two-hour tutorial will cover the following:

·      Complex networks and their properties

·      Partial observability, biases, and skewed results

·      The network completion problem

·      Multi-armed bandit solutions

·      Limits of learning in incomplete networks

 

Slides:  Available here.

 

Resources & code:  Will be uploaded soon. Stay tuned.

 

Target Audience and Prerequisites:  Our target audience includes researchers and practitioners in data mining and machine learning, with an interest in incomplete (a.k.a. partially observed) networks and graphs. We are targeting people who are concerned about the latent biases in the “real-world” data being used in research and industry. We expect the audience to come away with an overview of the state-of-art in enriching incomplete networks and have a better understanding of the challenges in this area. No assumption is made about familiarity with complex networks, graph mining, graph sampling, and incomplete data. A brief overview of them will be included in the tutorial.

 

References

[1]  K. Avrachenkov, N. Litvak, L. O. Prokhorenkova, and E. Sayargulova. Quick detection of high-degree entities in large directed networks. In ICDM, pages 54–65, 2014.

[2]  M. Bilgic, L. Mihalkova, and L. Getoor. Active learning for networked data. In ICML, pages 79–86, 2010.

[3]  S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, vol. 5, no. 1, pages 1–122, 2012.

[4]  S. Chen, A. Mira, and J.-P. Onnela. Flexible Model Selection for Mechanistic Network Models via Super Learner. CoRR, abs/1804.00237v1, April 2018.

[5]  J. Cho, H. Garcia-Molina, and L. Page. Efficient crawling through URL ordering. In WWW, pages 161–172, 1998.

[6]  R. Cohen, S. Havlin, and D. Ben-Avraham. Efficient immunization strategies for computer networks and populations. Phys. Rev. Lett., vol. 91, no. 24, p. 247901, 2003.

[7]  C. Cooper, T. Radzik, and Y. Siantos. Estimating network parameters using random walks. In CASoN, pages 33–40, 2012.

[8]  C. Dann, G. Neumann, and J. Peters. Policy evaluation with temporal differences: A survey and comparison. Journal of Machine Learning Research 15: 809–883 (2014).

[9]  A. Dasgupta, R. Kumar, and D. Sivakumar. Social sampling. In KDD, pages 235–243, 2012.

[10]    A. Dasgupta, R. Kumar, and T. Sarlós. On estimating the average degree. In WWW, pages 795–806, 2014.

[11]    I. Grondman, L. Busoniu, G.A.D. Lopes, and R. Babuska. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Trans. Systems, Man, and Cybernetics, Part C 42(6): 1291–1307 (2012).

[12]    S. Hanneke and E. P. Xing. Network completing and survey sampling. In AISTATS, pages 209–215, 2009.

[13]    M. Kim and J. Leskovec. The network completion problem: Inferring missing nodes and edges in networks. In SDM, pages 47–58, 2011.

[14]    T. LaRock, T. Sakharov, S. Bhadra, and T. Eliassi-Rad. Limits of Learning in Incomplete Networks, NetSci’18, Paris, France, June 2018.

[15]    J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD, pages 631–636, 2006.

[16]    S. A. Macskassy and F. Provost. Suspicion scoring based on guilt-by association, collective inference, and focused data access. In International Conference on Intelligence Analysis, 2005.

[17]    K. Madhawa and T. Murata. Exploring Partially Observed Networks with Nonparametric Bandits, CoRR, abs/1804.07059v1, April 2018.

[18]    A. S. Maiya and T. Berger-Wolf. Online sampling of high centrality individuals in social networks. In PAKDD, pages 91–98, 2010.

[19]    A. S. Maiya and T. Berger-Wolf. Sampling community structure. In WWW, pages 701–710, 2010.

[20]    A. S. Maiya and T. Y. Berger-Wolf. Benefits of bias: Towards better characterization of network sampling. In KDD, pages 105–113, 2011.

[21]    F. Murai, D. Rennó, B. Ribeiro, G.L. Pappa, D. Towsley and K. Gile. Selective harvesting over networks. CoRR, abs/1703.05082v1, March 2017.

[22]    L. Peel. Estimating network parameters for selecting community detection algorithms. In FUSION, pages 1–8, 2010.

[23]    J. J. Pfeiffer III, J. Neville, and P. N. Bennett. Active sampling of networks. In MLG Workshop, 2012.

[24]    J. J. Pfeiffer III, J. Neville, and P. N. Bennett. Active exploration in networks: Using probabilistic relationships for learning and inference. In CIKM, pages 639–648, 2014.

[25]    V. S. Sheng, F. Provost, and P. G. Ipeirotis. Get another label? Improving data quality and data mining using multiple, noisy labelers. In KDD, pages 614–622, 2008.

[26]    S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxOutProbe: An algorithm for increasing the size of partially observed networks. In NIPS Workshop on Networks in the Social and Information Sciences, December 2015.

[27]    S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. MaxReach: Reducing network incompleteness through node probes. In ASONAM 2016: 152–157.

[28]    S. Soundarajan, T. Eliassi-Rad, B. Gallagher, and A. Pinar. e-WGX: Adaptive edge probing for enhancing incomplete networks. In WebSci, June 2017.

[29]    R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In ICML 2009: 993–1000.

[30]    D. Thompson, J. Bennett, C. Seshadhri, and A. Pinar. A provably-robust sampling method for generating colormaps of large data. In LDAV, pages 77–84, 2013.

[31]    J. Vermorel and M. Mohri. Multi-armed bandit algorithms and empirical evaluation. In ECML, pages 437–448, 2005.