Quickly processing very large graphs containing billions to trillions of vertices and edges is increasingly important in practice today. The goal of this workshop is to bring together researchers studying theoretical and practical aspects of large-scale parallel graph processing to share their results on new problems, models, and algorithms, and to share progress in translating theoretical results to practical large-scale graph processing solutions.
Some topics that will be covered include industrial applications of parallel graph processing, frameworks for graph processing, representations of dynamic graphs, parallel graph-clustering, theoretically-efficient graph algorithms, and parallel graph partitioning.
The workshop will take place on Monday, July 11th from 2–6pm. Please see the SPAA site for more details.
Virtual Participation: We will stream and record the workshop on zoom. Please find the meeting info here.
Organized by: Laxman Dhulipala
Title: Parallel Algorithm Design for Classic Graph Problems (Yan Gu)
Abstract: This talk will cover a few new parallel algorithms on classic graph problems such as single-source shortest-paths (the rho-stepping and the delta*-stepping algorithms), biconnectivity (the vertex-labeling algorithm), and strongly connected components (the BGSS algorithm). These algorithms are simple and have good theoretical guarantees on work, span, and space. In addition, this talk will discuss a new data structure, the parallel hash bag, that can maintain the vertex subset in many graph algorithms efficiently, and the local-search technique that can accelerate many traversal-based graph algorithms.
Title: Scaling up Hierarchical Agglomerative Clustering to Trillion-Edge Graphs (Jakub Łącki)
Abstract: Hierarchical agglomerative clustering (HAC) is a simple and widely popular clustering method known for its high empirical quality. However, the applications of HAC on large datasets have been limited by the high computational cost of the existing implementations. Addressing this limitation has been an elusive task, as the algorithm has been believed to be inherently sequential and computationally expensive.
In this talk we study the HAC problem on edge-weighted graphs assuming the widely-used average linkage similarity function. We first give conditional hardness results showing that HAC requires time that is super-linear in the input size, and cannot be parallelized efficiently, formalizing the commonly held intuitions about the algorithm. We then show how both of these limitations can be bypassed by allowing approximation. Namely, we show that 1+ε approximate HAC can be implemented in near-linear work and polylogarithmic depth.
Finally, we show that our ideas lead to highly scalable HAC implementations. In a shared memory parallel setting (i.e., on a single machine) we obtain an over 50x speedup over the best existing baseline. Moreover, in the distributed setting, we demonstrate that our implementation scales to graphs containing trillions of edges.
Title: Recent Advances in Parallel Graph Partitioning (Lars Gottesbüren)
Abstract: Dividing data or workload among capacity-bounded processors in a way that minimizes inter-processor communication is a fundamental application of balanced graph partitioning. This talk gives an overview of our recent work on parallelizing techniques for a variety of time-quality trade-offs. I will also discuss adaptations to hypergraphs, buffered streaming, partitioning into a large number of blocks, as well as intricacies of efficient parallel implementations.
Title: So You Want to Make a Dynamic Graph Data Structure (Brian Wheatman)
Abstract: This talk discusses dynamic graph data structures and what to consider when designing new ones. It breaks the task of designing graph data structures into creating structures that can support a few key operations efficiently. The focus is on how different use cases, which can cause different data patterns and access patterns, can influence the design of these structures and how to exploit these differences to maximize performance.
Title: GraphIt - A High-Performance DSL for Graph Analytics (Julian Shun)
Abstract: This talk will present GraphIt, a domain-specific language for parallel graph computations that generates fast implementations for algorithms with different performance characteristics running on graphs with different sizes and structures. GraphIt separates what is computed (algorithm) from how it is computed (schedule). Programmers specify the algorithm using an algorithm language, and performance optimizations are specified using a separate scheduling language. The algorithm language simplifies expressing the algorithms, while exposing opportunities for optimizations. The scheduling language enables programmers to easily search through the complex optimization tradeoff space for graph processing. GraphIt supports efficient implementations of both unordered and ordered graph algorithms using combinations of existing as well as new optimizations that we design. Furthermore, we were able to make GraphIt portable across multicore CPUs, GPUs, Swarm, and the Hammerblade manycore architecture with reasonable effort by decoupling the architecture-independent algorithm from the architecture-specific schedules and backends.
The workshop follows the ACM Policy on Harassment and Discrimination.