KCore

At this point, you should be familiar with some of the basic features of the GBBS library, such as the vertexSubset datatype, and with core parts of the API, such as edgeMap. For examples of how to use these, please refer to either the previous tutorial or the API reference.

In this tutorial, we will introduce and use other parts of the GBBS library such as the bucketing data structure and the nghCount primitive in order to compute the k-core of a graph. The implementation we describe is presented in more detail in [1, 2].

Definitions#

Given $G = (V, E)$, the $k$-core of $G$ is the maximal set of vertices $S \subseteq V$ such that the degree of every $v \in S$ in $G[S]$ (the induced subgraph of $G$ over $S$) is at least $k$. These objects have been useful in the study of social networks and bioinformatics, and have natural applications in the visualization of complex networks.

(Figure: the 1-core, 2-core, and 3-core of an example graph.)

For instance, in the figure above, the 1-core consists of all vertices with degree at least 1 (in this example, every vertex). The vertices in the 2-core are computed by repeatedly deleting vertices with degree less than 2 until every remaining vertex has degree at least 2. The 3-core is computed similarly, by deleting vertices with degree less than 3. The largest core in this example is the 3-core.

In the $k$-core problem, the goal is to produce a map $f : V \rightarrow \mathbb{N}$ such that for each $v \in V$, $f(v)$ is the maximum core that $v$ is a part of. We refer to $f(v)$ as the coreness of $v$. A commonly used measure is the degeneracy of a graph, which is simply the largest $k$ for which $G$ has a non-empty $k$-core (3 in the example above).

The Peeling Algorithm#

In this tutorial we will give a parallel implementation of the classic peeling algorithm due to Matula and Beck (1983) (the same algorithm was implicitly described in the work of Anderson and Mayr (1984)).

The sequential peeling algorithm is very simple. It first bucket-sorts the vertices by their degrees, and then repeatedly deletes the minimum-degree vertex, moving each affected neighbor to the bucket corresponding to its new induced degree. As each vertex, and each edge in each direction, is processed exactly once, the algorithm runs in $O(m + n)$ work.
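To make this concrete, here is a minimal sequential sketch of the bucket-based peeling loop. It is for exposition only and is not GBBS code: it uses a plain adjacency list, and all names (SeqKCore, adj, and so on) are our own.

#include <algorithm>
#include <cstdint>
#include <vector>

using uintE = uint32_t;

// Sequential Matula-Beck-style peeling over a plain adjacency list.
// Illustrative sketch only; not part of the GBBS codebase.
std::vector<uintE> SeqKCore(const std::vector<std::vector<uintE>>& adj) {
  const uintE n = static_cast<uintE>(adj.size());
  std::vector<uintE> D(n);  // current induced degree of each vertex
  uintE max_deg = 0;
  for (uintE v = 0; v < n; v++) {
    D[v] = static_cast<uintE>(adj[v].size());
    max_deg = std::max(max_deg, D[v]);
  }
  // Bucket-sort the vertices by degree.
  std::vector<std::vector<uintE>> buckets(max_deg + 1);
  for (uintE v = 0; v < n; v++) buckets[D[v]].push_back(v);
  std::vector<bool> peeled(n, false);
  for (uintE b = 0; b <= max_deg; b++) {
    // Index-based loop: peeling may append new entries to buckets[b].
    for (size_t i = 0; i < buckets[b].size(); i++) {
      uintE u = buckets[b][i];
      if (peeled[u] || D[u] != b) continue;  // stale entry; skip it
      peeled[u] = true;                      // coreness(u) = D[u] = b
      for (uintE v : adj[u]) {
        if (!peeled[v] && D[v] > b) {
          D[v]--;                            // induced degree drops by one
          buckets[D[v]].push_back(v);        // move v to its new bucket
        }
      }
    }
  }
  return D;  // on exit, D[v] is the coreness of v
}

Each vertex is peeled once, and each edge triggers at most one decrement in each direction, which is where the $O(m + n)$ work bound comes from.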

We will implement a parallel peeling algorithm based on this idea. At a high level, the algorithm works as follows. It stores the vertices in a set of buckets corresponding to their induced degrees. In each parallel step, the algorithm peels all vertices in the smallest non-empty bucket (call this bucket number $k$), and assigns these vertices a coreness of $k$. It then updates the induced degrees of the neighbors of the peeled vertices, and moves them into the buckets corresponding to their new induced degrees. Note that if the induced degree of a vertex falls below the bucket currently being peeled, $k$, the vertex is moved to bucket $k$. The algorithm terminates once all vertices are peeled. Later we will argue that our implementation of this algorithm runs in $O(m + n)$ expected work and $O(\rho \log n)$ depth w.h.p., where $\rho$ is the peeling complexity of the graph, defined as the number of rounds needed to peel the graph down to an empty graph when each peeling step removes all minimum-degree vertices.
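The following sequential simulation of this round-based schedule (again a sketch with our own names, not GBBS code) also computes $\rho$ and $k_{\max}$, which our implementation will report later in the tutorial:

#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

using uintE = uint32_t;

// Sequential simulation of the parallel peeling schedule (sketch only).
// Returns the coreness array together with rho and k_max.
std::tuple<std::vector<uintE>, uintE, uintE> PeelingSchedule(
    const std::vector<std::vector<uintE>>& adj) {
  const uintE n = static_cast<uintE>(adj.size());
  std::vector<uintE> D(n);
  for (uintE v = 0; v < n; v++) D[v] = static_cast<uintE>(adj[v].size());
  std::vector<bool> peeled(n, false);
  uintE finished = 0, rho = 0, k_max = 0;
  while (finished != n) {
    // Find the smallest non-empty bucket (minimum remaining induced degree).
    uintE k = UINT32_MAX;
    for (uintE v = 0; v < n; v++)
      if (!peeled[v]) k = std::min(k, D[v]);
    // Peel every vertex in that bucket in a single round.
    std::vector<uintE> active;
    for (uintE v = 0; v < n; v++)
      if (!peeled[v] && D[v] == k) active.push_back(v);
    for (uintE u : active) peeled[u] = true;  // coreness(u) = k
    // Update the neighbors' induced degrees, never dropping below k.
    for (uintE u : active)
      for (uintE v : adj[u])
        if (!peeled[v] && D[v] > k) D[v]--;
    finished += static_cast<uintE>(active.size());
    k_max = k;
    rho++;
  }
  return {D, rho, k_max};
}

In GBBS, the scan for the minimum bucket, the per-round peel, and the degree updates are all performed in parallel using the bucketing structure and nghCount, as we develop below.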

Getting Started#

Let's get started! cd into the benchmarks/KCore/ directory and create a new directory called KCore_tutorial. Create three files in this directory: BUILD, KCore.cc, and KCore.h. We will start by filling in some easy content in the BUILD and KCore.cc files.

Add the following to the BUILD file.

package(
    default_visibility = ["//visibility:public"],
)

cc_library(
    name = "KCore",
    hdrs = ["KCore.h"],
    deps = [
        "//gbbs:gbbs",
        "//gbbs:julienne",
    ],
)

cc_binary(
    name = "KCore_main",
    srcs = ["KCore.cc"],
    deps = [":KCore"],
)

And add the following to the KCore.cc file.

#include "KCore.h"
namespace gbbs {
template <class Graph>
double KCore_runner(Graph& G, commandLine P) {
size_t num_buckets = P.getOptionLongValue("-nb", 16);
std::cout << "### Application: KCore" << std::endl;
std::cout << "### Graph: " << P.getArgument(0) << std::endl;
std::cout << "### Threads: " << num_workers() << std::endl;
std::cout << "### n: " << G.n << std::endl;
std::cout << "### m: " << G.m << std::endl;
std::cout << "### Params: -nb (num_buckets) = " << num_buckets << std::endl;
std::cout << "### ------------------------------------" << std::endl;
timer t; t.start();
auto cores = KCore(G, num_buckets);
double tt = t.stop();
std::cout << "### Running Time: " << tt << std::endl;
return tt;
}
} // namespace gbbs
generate_symmetric_main(gbbs::KCore_runner, false);

Thus far we have simply set up the skeleton of the k-core implementation. The runner program in KCore.cc specifies that it runs on a symmetric graph (either compressed or uncompressed), and takes a parameter called num_buckets (-nb), which we will discuss shortly.

Implementing the Parallel Peeling Algorithm#

Add the following skeleton to KCore.h:

#pragma once

#include "gbbs/gbbs.h"
#include "gbbs/julienne.h"

namespace gbbs {

template <class Graph>
inline sequence<uintE> KCore(Graph& G, size_t num_buckets = 16) {
  const uintE n = G.n;
  auto D = sequence<uintE>(n, [&](size_t i) {
    return G.get_vertex(i).out_degree(); });
  return D;
}

}  // namespace gbbs

So far the code doesn't do much. The signature of the KCore function tells us that it accepts as input a graph and a number of buckets, and returns a sequence of integers representing the coreness of each vertex. Inside the function body, the main point of interest is the array D, which will store the coreness values; it is initialized to the degree of each vertex in the graph.

Initializing the Bucketing Structure#

The next step in the algorithm before performing the peeling computation is to initialize a bucketing structure that will maintain a dynamic mapping between a set of identifiers (the vertices) and their bucket (representing their current induced degree).

auto B = make_vertex_buckets(n, D, increasing, num_buckets);

This declaration creates a bucketing structure, B, where:

  • n specifies the number of identifiers
  • D specifies the mapping from identifier to its current bucket number
  • increasing specifies that the buckets are processed in increasing order (smallest to largest)
  • num_buckets specifies the number of open buckets to maintain

Bucketing: NextBucket#

Now that the bucketing structure is initialized, we can implement the main peeling portion of the algorithm. The first step is to loop until all vertices have been processed, extracting all of the vertices in the next non-empty bucket on each iteration. Add the following loop after the initialization of the bucketing structure, B.

uintE finished = 0, rho = 0, k_max = 0;
while (finished != n) {
  auto bkt = B.next_bucket();
  auto active = vertexSubset(n, std::move(bkt.identifiers));
  uintE k = bkt.id;
  k_max = k;
  rho++;
  finished += active.size();
}

As described above, this code loops until all vertices have been peeled (i.e., until finished == n). Each iteration extracts the identifiers in the next non-empty bucket using the B.next_bucket() primitive. This primitive returns a bucket object consisting of the following:

  • identifiers: a sequence of identifiers contained in the extracted bucket
  • id: the id of the extracted bucket

The code first initializes a vertexSubset from the extracted identifiers (the vertices to be peeled), and then increments the number of finished vertices by the number of extracted identifiers.

Updating the Induced Degrees#

The next step is to compute the effect that peeling these vertices from the graph has on their neighbors' degrees. To do this we will use the nghCount primitive provided by GBBS.

First, add the following declaration of the helper EdgeMap structure after the initialization of the bucketing structure, near the top of the function. This helper data structure is used internally to perform the nghCount.

auto em = hist_table<uintE, uintE>(
    std::make_tuple(UINT_E_MAX, 0), (size_t)G.m / 50);

The nghCount primitive takes as input

  • G: an input graph
  • active: a vertexSubset
  • cond_f: a boolean function
  • apply_f: a function from std::tuple<uintE, uintE> -> std::optional<std::tuple<uintE, uintE>>
  • em: a reference to the helper EdgeMap struct
  • flags: an optional flags parameter

It performs the following useful logic. It emits a value of $1$ for each edge $(u, v)$ where $u \in \mathsf{active}$ and $C(v) = \mathsf{true}$ (writing $C$ for cond_f). Next, the emitted values for each such $v$ are reduced in parallel to obtain a single summed value, $R_v$. Finally, the apply_f function is called on the pair $(v, R_v)$ to obtain an optional result $O$. The (vertex, augmented value) pair $(v, o)$ is emitted to the output vertexSubset if and only if $O = \mathsf{Some}(o)$ (i.e., a non-null option was returned).

(Figure: an example of nghCount; the green vertices in (1) are the active set, and the red vertices in (2) receive counts.)

For example, consider the figure above. The green vertices are the vertices in $\mathsf{active}$ in (1). Suppose cond_f returns $\mathsf{true}$ for $v_2, v_4, v_5$ and $\mathsf{false}$ for $v_3, v_6$, and apply_f simply returns $\mathsf{Some}(R_v)$. The counts associated with the red vertices in (2) show the vertices and their associated values in the output vertexSubset emitted by the nghCount primitive.
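To make these semantics concrete, here is a sequential sketch of the result nghCount computes. This is for illustration only and is not the GBBS implementation (which runs in parallel, using the helper structure introduced below); the adjacency-list representation and all names here are our own.

#include <cstdint>
#include <functional>
#include <optional>
#include <tuple>
#include <unordered_map>
#include <vector>

using uintE = uint32_t;
using VertexVal = std::tuple<uintE, uintE>;

// Sequential reference semantics of nghCount over a plain adjacency list.
// Illustrative sketch only; the real primitive is parallel.
std::vector<VertexVal> ngh_count_semantics(
    const std::vector<std::vector<uintE>>& adj,
    const std::vector<uintE>& active,
    const std::function<bool(uintE)>& cond_f,
    const std::function<std::optional<VertexVal>(const VertexVal&)>& apply_f) {
  std::unordered_map<uintE, uintE> R;  // R[v] = #edges (u, v) with u active
  for (uintE u : active)
    for (uintE v : adj[u])
      if (cond_f(v)) R[v]++;  // emit a 1 per qualifying edge, then sum
  std::vector<VertexVal> out;
  for (const auto& [v, Rv] : R) {
    auto o = apply_f(std::make_tuple(v, Rv));
    if (o.has_value()) out.push_back(*o);  // keep only Some(o) results
  }
  return out;  // plays the role of the output vertexSubsetData<uintE>
}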

Now that we understand the semantics of nghCount, we can use it to compute the number of neighbors removed from each vertex. The apply_f function will simply take the number of removed neighbors and perform the following steps:

  • potentially update the coreness of this vertex in D
  • return the new bucket that the neighbor should be moved to, or a null option if the neighbor does not need to be moved
auto apply_f = [&](const std::tuple<uintE, uintE>& p)
-> const std::optional<std::tuple<uintE, uintE> > {
uintE v = std::get<0>(p), edgesRemoved = std::get<1>(p);
uintE deg = D[v];
if (deg > k) {
uintE new_deg = std::max(deg - edgesRemoved, k);
D[v] = new_deg;
return wrap(v, B.get_bucket(new_deg));
}
return std::nullopt;
};
auto cond_f = [] (const uintE& u) { return true; };
vertexSubsetData<uintE> moved = nghCount(
G, active, cond_f, apply_f, em, no_dense);

Note that the code above makes use of the bucketing optimization described in the Julienne paper [1]. Specifically, the implementation is optimized for the case where only a small number of buckets are processed, which is typically the case in practice. Thus, instead of maintaining the identifiers stored in every single bucket, the implementation only explicitly represents a prefix of the next $C$ buckets, for some user-specified parameter $C$ (the num_buckets parameter in our code).

The key point for our implementation is that we can potentially skip updating the bucketing structure for many vertices whose degrees decreased. To find out whether we actually need to move a vertex to a bucket currently in the open range, we call the B.get_bucket primitive with the vertex's new bucket (new_deg). If the bucketing data structure does not need to move the vertex from its current bucket to the bucket corresponding to new_deg, it returns the value UINT_E_MAX, which the wrap primitive then ignores by returning std::nullopt.

Note that the other reason a neighbor may not need to move buckets is that it has already been peeled.
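Based on this description, the behavior of the wrap helper can be summarized by the following sketch (our own illustration of the semantics just described, not the GBBS source):

#include <cstdint>
#include <limits>
#include <optional>
#include <tuple>

using uintE = uint32_t;
constexpr uintE UINT_E_MAX = std::numeric_limits<uintE>::max();

// Sketch of wrap's assumed behavior: drop identifiers whose bucket does
// not need updating, signaled by get_bucket returning UINT_E_MAX.
inline std::optional<std::tuple<uintE, uintE>> wrap_sketch(uintE v, uintE bkt) {
  if (bkt == UINT_E_MAX) return std::nullopt;  // no bucket move needed
  return std::make_tuple(v, bkt);              // otherwise move v to bucket bkt
}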

Updating the Buckets#

The last step is to update the bucketing data structure with the new buckets for each of the affected neighbors.

B.update_buckets(moved);

The Finished Application#

#pragma once

#include "gbbs/gbbs.h"
#include "gbbs/julienne.h"

namespace gbbs {

template <class Graph>
inline sequence<uintE> KCore(Graph& G, size_t num_buckets = 16) {
  const uintE n = G.n;
  auto D = sequence<uintE>(n, [&](size_t i) {
    return G.get_vertex(i).out_degree(); });
  auto B = make_vertex_buckets(n, D, increasing, num_buckets);
  auto em = hist_table<uintE, uintE>(
      std::make_tuple(UINT_E_MAX, 0), (size_t)G.m / 50);

  uintE finished = 0, rho = 0, k_max = 0;
  while (finished != n) {
    auto bkt = B.next_bucket();
    auto active = vertexSubset(n, std::move(bkt.identifiers));
    uintE k = bkt.id;
    k_max = k;
    rho++;
    finished += active.size();

    auto apply_f = [&](const std::tuple<uintE, uintE>& p)
        -> const std::optional<std::tuple<uintE, uintE> > {
      uintE v = std::get<0>(p), edgesRemoved = std::get<1>(p);
      uintE deg = D[v];
      if (deg > k) {
        uintE new_deg = std::max(deg - edgesRemoved, k);
        D[v] = new_deg;
        return wrap(v, B.get_bucket(new_deg));
      }
      return std::nullopt;
    };
    auto cond_f = [](const uintE& u) { return true; };
    vertexSubsetData<uintE> moved = nghCount(
        G, active, cond_f, apply_f, em, no_dense);

    B.update_buckets(moved);
  }
  std::cout << "### rho = " << rho << " k_{max} = " << k_max << std::endl;
  return D;
}

}  // namespace gbbs

Compilation#

Using Bazel, you can compile the KCore application as follows:

$ bazel build //benchmarks/KCore/KCore_tutorial/...

To use make, you can copy the makefile from benchmarks/KCore/JulienneDBS17/makefile into the new directory and run make.

Testing#

Let's try running our program on one of the test inputs provided by Ligra in the inputs/ directory.

$ numactl -i all ./bazel-bin/benchmarks/KCore/KCore_tutorial/KCore_main \
-s -rounds 4 inputs/rMatGraph_J_5_100
# list_alloc init_blocks: 1000
# after init:
# Used: 0, allocated: 79872, node size: 16392, bytes: 1309261824
### Application: KCore
### Graph: inputs/rMatGraph_J_5_100
### Threads: 144
### n: 128
### m: 708
### Params: -nb (num_buckets) = 16
### ------------------------------------
### rho = 21 k_{max} = 4
### Running Time: 0.000540018
...

The flags above indicate the following:

  • The -s flag indicates that the graph is symmetric.
  • The -rounds flag indicates the number of times to run the benchmark.
  • The -nb flag indicates the number of open buckets the bucketing implementation should maintain.

Great! We've successfully implemented a work-efficient k-core algorithm in GBBS. You should now have all of the pieces necessary to implement your own bucketing-based algorithms in GBBS.

References#

[1] Laxman Dhulipala, Guy Blelloch, and Julian Shun
Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing
Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 293-304, 2017.

[2] Laxman Dhulipala, Guy Blelloch, and Julian Shun
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pp. 393-404, 2018.