TWC: Medium: Collaborative: Know Thy Enemy: Data Mining Meets Networks for Understanding Web-Based Malware Dissemination


Tina Eliassi-Rad (PI)
Network Science Institute
College of Computer and Information Science
Northeastern University

Phone: (617) 373-6475
Email: tina AT; eliassi AT
Address: 360 Huntington Avenue, Mailstop 1010-177, Boston, MA 02115

This material is based upon work supported by the National Science Foundation under Grant No. CNS-1314603. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


1.1. Abstract


How does web-based malware spread? We use the term "web-based malware" to describe malware that is distributed through websites and malicious posts in social networks. We are in an arms race against web-based malware distributors, and as in any war, knowledge is power: the more we know about them, the better we can defend ourselves. Our goal is to understand the dissemination of web-based malware by creating "MalScope," a suite of methods and tools that uses cutting-edge approaches to build spatiotemporal models, generators, and sampling techniques for malware dissemination. From a scientific point of view, this project brings together two disciplines: data mining and network security. The outcome is a suite of novel, sophisticated, and scalable techniques and models that will enhance our understanding of malware dissemination at a large scale. We use two types of web-based malware dissemination data: (1) user machines accessing dangerous sites and downloading web-based malware; and (2) Facebook users being exposed to malicious posts. We already have such data and will continue to obtain more from our industry partners (e.g., Symantec's WINE project) and open-access projects, or by collecting it on our own (e.g., MyPageKeeper).

The broader impact of our work is that it will enable the development of security solutions for end-users and industry. A 15-minute network outage costs a 200-employee company about $40K, while identity theft costs about $1,500 per person on average. By knowing the enemy better, security researchers and industry can more effectively stop the interconnected manifestations of Internet threats: identity theft, the creation of botnets, and DoS attacks. The PIs have a track record of technology transfer through collaborations with industrial labs (Yahoo, MSR, Symantec, AT&T, IBM) and national labs (LLNL, Sandia), open-source software ("Pegasus"), and spin-off startups (StopTheHacker). Educational impacts include developing a new course and providing publicly available educational material and open-source software.

1.2. Keywords

Data mining, web-based malware dissemination, graph mining.

1.3. Funding agency


The following professors are co-PIs on this project:


The following graduate students and postdocs have worked on the project:


Selected Papers:


Selected tutorials with co-PIs:


Selected invited talks at conferences and workshops:


All other papers, talks, tutorial slides, and other resources are available here.

Last updated July 5, 2016 by Tina Eliassi-Rad.