[slides] The Web as a Graph%3A Measurements%2C Models%2C and Methods

The paper "The Web as a Graph: Measurements, Models, and Methods" by Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew S. Tomkins explores the structure of the World Wide Web (WWW) as a directed graph, with nodes representing web pages and edges representing hyperlinks. The authors highlight the significant impact of the Web on society, noting its rapid growth and the rich information content embedded in the network of links. The paper begins by describing two algorithms that operate on the Web graph: Kleinberg's HITS method for web search and the enumeration of certain bipartite cliques for community discovery. These algorithms are chosen because they are driven by specific structures in the Web graph, which are fundamental to how web content is created. The authors then present a series of measurements and properties of the Web graph, such as the Zipfian distribution of in- and out-degrees, which suggest that traditional random graph models like $\mathcal{G}_{n,p}$ are inadequate for modeling the Web graph. They propose a new family of random graph models that incorporate a "copying" process, where links are added by copying from other nodes in the graph. This copying process is seen as a key aspect of both content creation and the observed statistics. The paper concludes with a discussion of related work, including the use of Web graph structure to enhance web search and the statistical analysis of academic citation graphs, which also exhibit Zipf distributions. The authors suggest that their new models and observations open up a rich sub-field of research in the analysis of random graphs and graph algorithms on the Web.The paper "The Web as a Graph: Measurements, Models, and Methods" by Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew S. Tomkins explores the structure of the World Wide Web (WWW) as a directed graph, with nodes representing web pages and edges representing hyperlinks. The authors highlight the significant impact of the Web on society, noting its rapid growth and the rich information content embedded in the network of links. The paper begins by describing two algorithms that operate on the Web graph: Kleinberg's HITS method for web search and the enumeration of certain bipartite cliques for community discovery. These algorithms are chosen because they are driven by specific structures in the Web graph, which are fundamental to how web content is created. The authors then present a series of measurements and properties of the Web graph, such as the Zipfian distribution of in- and out-degrees, which suggest that traditional random graph models like $\mathcal{G}_{n,p}$ are inadequate for modeling the Web graph. They propose a new family of random graph models that incorporate a "copying" process, where links are added by copying from other nodes in the graph. This copying process is seen as a key aspect of both content creation and the observed statistics. The paper concludes with a discussion of related work, including the use of Web graph structure to enhance web search and the statistical analysis of academic citation graphs, which also exhibit Zipf distributions. The authors suggest that their new models and observations open up a rich sub-field of research in the analysis of random graphs and graph algorithms on the Web.

The Web as a Graph: Measurements, Models, and Methods

1999 | Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew S. Tomkins