BFS

In this tutorial we'll implement a parallel version of Breadth-First Search in GBBS. Given a graph, $G = (V,E)$ , and a source node $s$ , a Breadth-First Search processes the graph level by level, starting from $s$ . The output of the algorithm is an array $A$ s.t. $\forall v \in V, A[v]$ holds the parent of $v$ in the breadth-first tree rooted at $s$ .

Getting Started#

In what follows we will reimplement the BFS algorithm shown in the file above and go through all of the steps to create a new benchmark in GBBS.

To get started, cd into the benchmarks/BFS directory and create a new directory BFS_tutorial. Create two new files in this directory, BFS.cc and BFS.h. We will start with the interesting algorithmic code in BFS.h.

Let's start by creating a stub for the BFS function:

template <class Graph>
inline sequence<uintE> BFS(Graph& G, uintE source) {

}

One of the goals of GBBS is to abstract away implementation specific details about how the graph is represented. Towards this goal, the algorithm implementations in GBBS typically take the graph type as a template argument, as shown above. This approach lets us run the same code on the graph whether it is compressed or uncompressed, and directed or undirected.

The uintE type name is an integer type used to represent vertex-ids in GBBS (the analogous type for edge-ids is uintT although we won't use it in this implementation).

The BFS function's signature indicates that it takes as input a graph, and a source vertex-id and returns a sequence of integers representing the parent of each vertex in the BFS tree.

Next, modify the BFS implementation to the following:

template <class Graph>
inline sequence<uintE> BFS(Graph& G, uintE source) {
  using W = typename Graph::weight_type;
  uintE n = G.n;

  // Creates Parents array, initialized to all -1, except for source.
  auto Parents = sequence<uintE>(n, [&](size_t i) { return UINT_E_MAX; });
  Parents[source] = source;

}

Here, we set W to the weight-type in the provided graph. For unweighted graphs, this is the type of an empty struct (with no data), and otherwise it is the graph's weight type, e.g., int32 or float.

Next, we set the Parents array to a sequence containing $n$ (G.n) elements where the i'th element is some suitable null value, in this case the maximum unsigned uintE, UINT_E_MAX. Note that the initialization of this sequence will happen in parallel when we use a parallel runtime. We also set the parent of the source vertex to itself.

So far, we have set up the variables and data necessary for the algorithm, and we're ready to start the interesting parts of the BFS.

Representing a Frontier#

We can express a frontier generated by the BFS with the vertexSubset datatype (from the Ligra interface), which lets us represent a subset of the vertices. Each level of the traversal will be represented by a vertexSubset. The initial vertexSubset, is just a set containing our source node can be constructed the following constructor

template <class W>
struct BFS_F {
  uintE* Parents;
  BFS_F(uintE* _Parents) : Parents(_Parents) { }
  inline bool updateAtomic (uintE s, uintE d, W w) { // Atomically update.
    return pbbslib::atomic_compare_and_swap(&Parents[d], UINT_E_MAX, s);
  }
  inline bool update (uintE s, uintE d, W w) { // Defer to updateAtomic for now.
    return updateAtomic(s, d, w);
  }
  // Cond function checks if vertex has been visited yet.
  inline bool cond (uintE d) { return (Parents[d] == UINT_E_MAX); }
};
vertexSubset Frontier(n, source); // creates initial frontier

Traversing a Frontier#

Now, we need to describe the logic to produce the next frontier given the current frontier. Enter edgeMap. edgeMap (from the Ligra interface) allows us to process the out edges of a vertexSubset, and produce a new vertexSubset.

The behavior of edgeMap is intended to be customized by providing a user-defined structure. edgeMap expects a parameter of some template class F that has the following fields:

template <class W>
struct F {
  F(...) { ... }
  inline bool update (uintE u, uintE v, W wgh) {
    // logic for how to process the edge (s,d)
  }

  inline bool updateAtomic (uintE u, uintE v, W wgh) {
    // logic for how to process the edge (s,d)
  }

  inline bool cond (uintE d) {
    // return 1 if update should be applied on and edge (s,d)
    // return 0 otherwise
  }
};

There is an important distinction between update and updateAtomic that we will discuss later. For now, we will make sure that all update logic is atomic. Atomicity is important because if the vertex has more than one in-edge from the current frontier, then multiple calls to update can happen in parallel which may result in an incorrect result.

For our BFS, let's implement a struct BFS_F where

updateAtomic: atomically update Parents array. We can implement this using a compare and swap operation, which is provided in the pbbslib library as pbbslib::atomic_compare_and_swap.
cond: avoid revisiting previously visited vertices by checking if Parents[v] == UINT_E_MAX.

template <class W>
struct BFS_F {
  uintE* Parents;
  BFS_F(uintE* _Parents) : Parents(_Parents) { }
  inline bool updateAtomic (uintE s, uintE d, W w) { // Atomically update.
    return pbbslib::atomic_compare_and_swap(&Parents[d], UINT_E_MAX, s);
  }
  inline bool update (uintE s, uintE d, W w) { // Defer to updateAtomic for now.
    return updateAtomic(s, d, w);
  }
  // Cond function checks if vertex has been visited yet.
  inline bool cond (uintE d) { return (Parents[d] == UINT_E_MAX); }
};

Add this code before the definition of BFS.

Notice that while BFS_F will correctly produce the next frontier, the tree computed by the BFS is still non-deterministic! We will discuss how to fix this in a later section.

Implementing Traversal Logic#

All we need to do to finish up the BFS is to actually call edgeMap, and add a termination condition that signifies when the traversal is finished. Given a current frontier, our condition should just apply the edgeMap while the current frontier is non-empty. In code:

while (!Frontier.isEmpty()) { // Loop until frontier is empty.
  Frontier = edgeMap(GA, Frontier, BFS_F(Parents.begin()));
}

The Final Algorithm#

Our finished BFS algorithm should look as follows:

#pragma once

#include "gbbs/gbbs.h"

namespace gbbs {

template <class W>
struct BFS_F {
  uintE* Parents;
  BFS_F(uintE* _Parents) : Parents(_Parents) { }
  inline bool updateAtomic (uintE s, uintE d, W w) { // Atomically update.
    return pbbslib::atomic_compare_and_swap(&Parents[d], UINT_E_MAX, s);
  }
  inline bool update (uintE s, uintE d, W w) { // Defer to updateAtomic for now.
    return updateAtomic(s, d, w);
  }
  // Cond function checks if vertex has been visited yet.
  inline bool cond (uintE d) { return (Parents[d] == UINT_E_MAX); }
};

template <class Graph>
inline sequence<uintE> BFS(Graph& G, uintE source) {
  using W = typename Graph::weight_type;
  uintE n = G.n;

  // Creates Parents array, initialized to all -1, except for source.
  auto Parents = sequence<uintE>(n, [&](size_t i) { return UINT_E_MAX; });
  Parents[source] = source;

  vertexSubset Frontier(n, source);
  size_t reachable = 0;
  while (!Frontier.isEmpty()) {
    std::cout << Frontier.size() << std::endl;
    reachable += Frontier.size();
    Frontier = edgeMap(G, Frontier, BFS_F<W>(Parents.begin()));
  }
  std::cout << "Reachable: " << reachable << std::endl;
  return Parents;
}

}  // namespace gbbs

At this point we have created a BFS implementation

Wrapping up: BFS.cc and Build-files.#

Lastly, let's see what code we need to add in order to create a binary to run BFS on graph inputs. Place the following code in BFS.cc.

#include "BFS.h"

namespace gbbs {

template <class Graph>
double BFS_runner(Graph& G, commandLine P) {
  uintE src = static_cast<uintE>(P.getOptionLongValue("-src", 0));

  std::cout << "### Application: BFS" << std::endl;
  std::cout << "### Graph: " << P.getArgument(0) << std::endl;
  std::cout << "### Threads: " << num_workers() << std::endl;
  std::cout << "### n: " << G.n << std::endl;
  std::cout << "### m: " << G.m << std::endl;
  std::cout << "### Params: -src = " << src << std::endl;
  std::cout << "### ------------------------------------" << std::endl;

  timer t; t.start();
  auto parents = BFS(G, src);
  double tt = t.stop();

  std::cout << "### Running Time: " << tt << std::endl;
  return tt;
}

}  // namespace gbbs

generate_main(gbbs::BFS_runner, false);

The main points of interest in this code are the generate_main function, which is a macro defined by GBBS (see gbbs/benchmark.h) that first loads the graph supplied as input by the user, and calls gbbs::BFS_runner. The false flag passed to this macro indicates that this application does not modify the input graph. The macro expects a runner function that takes as input a graph and a commandLine object which supplies the command-line arguments provided by the user.

Lastly, if you are using Bazel add a file called BUILD with the following contents to this directory:

cc_library(
  name = "BFS",
  hdrs = ["BFS.h"],
  deps = ["//gbbs:gbbs"]
)

cc_binary(
  name = "BFS_main",
  srcs = ["BFS.cc"],
  deps = [":BFS"]
)

package(
  default_visibility = ["//visibility:public"],
)

This BUILD file contains two build rules, which tells Bazel how to build the desired outputs, such as executable binaries (cc_binary) and libraries (cc_library). Build rules are also called targets, and point to a specific set of source files and dependencies. A target can also point to other targets, as seen in the dependencies for cc_binary.

You can read more about Bazel here.

Compilation#

Using Bazel you can compile the application the BFS application as follows:

$ bazel build //benchmarks/BFS/BFS_tutorial/...

To use make you can copy the makefile from benchmarks/BFS/NondeterministicBFS/makefile, and run make.

Testing#

Let's try running our program on one of the test-inputs provided by Ligra in the inputs/ directory.

$ numactl -i all ./bazel-bin/benchmarks/BFS/NonDeterministicBFS/BFS_main \
    -s -rounds 4 -src 0 inputs/rMatGraph_J_5_100
# list_alloc init_blocks: 1000
# after init:
# Used: 0, allocated: 79872, node size: 16392, bytes: 1309261824
### Application: BFS
### Graph: inputs/rMatGraph_J_5_100
### Threads: 144
### n: 128
### m: 708
### Params: -src = 0
### ------------------------------------
1
8
20
50
41
5
Reachable: 125
### Running Time: 0.000113964
...

The flags above indicate the following:

The -s flag indicates that the graph is symmetric.
The -rounds flag indicates the number of times to run the benchmark.
The -src flag indicates the source vertex.

Great! We've successfully implemented a shared memory breadth-first search in GBBS. You should now have all of the pieces needed to implement new benchmarks in GBBS.