Topology: Distributed

This example uses RPC to enable a coordinator instance to get the topology of all the other active instances. The RPC Engine

This example showcases how the abstract HiCR Core API can be used to discover compute and memory devices in the system. The code is structured as follows:

  • source/ contains the different variants of this example corresponding to different backends

    • mpi.cpp corresponds to the MPI backend implementation. It uses hwloc to discover local resources and then communicates them via MPI

Local Topology Discovery

Each instance discovers its own local topology

// Creating HWloc topology object
hwloc_topology_t topology;

// Reserving memory for hwloc
hwloc_topology_init(&topology);

// Initializing hwloc (CPU) topology manager
HiCR::backend::hwloc::TopologyManager tm(&topology);

// Gathering topology from the topology manager
const auto t = tm.queryTopology();

Topology RPC registration

Each instance instantiates the RPCEngine and registers and RPC to serialize and send its local topology to the caller

// Creating RPC engine instance
HiCR::frontend::RPCEngine rpcEngine(...);

// Initialize RPC engine
rpcEngine.initialize();

// Creating execution unit to run as RPC
auto executionUnit = std::make_shared<HiCR::backend::pthreads::ExecutionUnit>([&](void *closure) { sendTopology(rpcEngine); });

// Adding RPC target by name and the execution unit id to run
rpcEngine.addRPCTarget(TOPOLOGY_RPC_NAME, executionUnit);

Listening for RPCs

All the instances except Root listens for incoming RPC requests

// Listening for RPC requests
rpcEngine.listen();

RPC Invokation

The Root instance requests the topology from all the other instances, merge them, and then displays them

// Getting instance manager from the rpc engine
auto im = rpcEngine.getInstanceManager();

// Querying instance list
auto &instances = im->getInstances();

// Getting the pointer to our own (coordinator) instance
auto coordinator = im->getCurrentInstance();

// Invoke RPC
for (const auto &instance : instances)
  if (instance->getId() != coordinator->getId()) rpcEngine.requestRPC(*instance, TOPOLOGY_RPC_NAME);

// Getting return values from the RPCs containing each of the worker's topology
for (const auto &instance : instances)
  if (instance == coordinator)
  {
    // Getting return value as a memory slot
    auto returnValue = rpcEngine.getReturnValue(*instance);

    // Receiving raw serialized topology information from the worker
    std::string serializedTopology = (char *)returnValue->getPointer();

    // Parsing serialized raw topology into a json object
    auto topologyJson = nlohmann::json::parse(serializedTopology);

    // Freeing return value
    rpcEngine.getMemoryManager()->freeLocalMemorySlot(returnValue);

    // HiCR topology object to obtain
    HiCR::Topology topology;

    // Merge topologies
    topology.merge(HiCR::backend::hwloc::TopologyManager::deserializeTopology(topologyJson));
    }
}

The result should look like the following:

* Worker 1 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb
* Worker 2 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb
* Worker 3 Topology:
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 93.071026 Gb
  + 'NUMA Domain'
    Compute Resources: 44 Processing Unit(s)
    Memory Space:     'RAM', 94.437321 Gb