Fast Random Walk with Restart and Its Applications

Fast Random Walk with Restart and Its Applications

| Hanghang Tong, Christos Faloutsos, Jia-Yu Pan
This paper proposes a fast solution for computing random walk with restart (RWR) on large graphs, which is used for various applications such as image captioning, neighborhood formation, and content-based image retrieval. The key idea is to exploit two properties of real graphs: (a) linear correlations and (b) blockwise, community-like structure. The proposed method, B_LIN, uses low-rank matrix approximation and graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. This approach significantly reduces pre-computation and storage costs, while maintaining high query performance. The method is validated on real-world datasets, including Corel images and DBLP author-conference data. Experimental results show that B_LIN achieves up to 150x speedup with 90%+ quality preservation compared to the straightforward implementation. For the DBLP dataset, it achieves up to 1,800x speedup with no quality loss. The method is also extended to bipartite graphs, achieving fast performance for tasks such as finding related conferences or authors. The paper also provides theoretical justification for the method, including an error bound for NB_LIN. The proposed approach balances the trade-off between off-line processing cost and on-line response quality, making it suitable for large-scale applications. The results demonstrate that the proposed methods can efficiently handle large graphs while maintaining high accuracy and performance.This paper proposes a fast solution for computing random walk with restart (RWR) on large graphs, which is used for various applications such as image captioning, neighborhood formation, and content-based image retrieval. The key idea is to exploit two properties of real graphs: (a) linear correlations and (b) blockwise, community-like structure. The proposed method, B_LIN, uses low-rank matrix approximation and graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. This approach significantly reduces pre-computation and storage costs, while maintaining high query performance. The method is validated on real-world datasets, including Corel images and DBLP author-conference data. Experimental results show that B_LIN achieves up to 150x speedup with 90%+ quality preservation compared to the straightforward implementation. For the DBLP dataset, it achieves up to 1,800x speedup with no quality loss. The method is also extended to bipartite graphs, achieving fast performance for tasks such as finding related conferences or authors. The paper also provides theoretical justification for the method, including an error bound for NB_LIN. The proposed approach balances the trade-off between off-line processing cost and on-line response quality, making it suitable for large-scale applications. The results demonstrate that the proposed methods can efficiently handle large graphs while maintaining high accuracy and performance.
Reach us at info@study.space