Tutorial Abstract

Web Mining

Ricardo Baeza-Yates, Aristides Gionis

Friday, September 19, morning
Location: R007

The Web continues to grow and evolve very fast, changing our daily lives. This activity represents the collaborative work of the millions of institutions and people that contribute content to the Web as well as the one billion people that use it. In this ocean of hyperlinked data there is explicit and implicit information and knowledge. Web mining is the task of analyzing this data and extracting information and knowledge for many different purposes. The data comes in three main flavors: content (text, images, etc.), structure (hyperlinks) and usage (navigation, queries, etc.), which call for different techniques such as text, graph or log mining. Each case reflects the wisdom of some group of people that can be used to make the Web better, for example user-generated tags in Web 2.0 sites. In this tutorial we walk through the mining process and show several applications, ranging from Web site design to search engines. The tutorial will consist of four major parts.
We introduce the main concepts behind Web mining, the different data found on the Web and typical applications.
We walk through the mining process: data collection, data cleaning, data warehousing and data analysis. This includes crawling in the case of content mining and, in the case of usage mining, privacy issues and techniques such as k-anonymization.
We discuss the main techniques used for the different data types and typical applications.
We finish with three detailed cases: Web site design, Web spam detection and query mining.