Asking the Right Question

How many cores are on my system?

This is a seemingly simple question. We’d like to use the number of available processors to determine how many jobs we can safely run in parallel. Linux provides several ways to get the number of available processors, including:

nproc
lscpu
/proc/cpuinfo

bash


> nproc 
32

> lscpu | grep ^CPU\(s\)\: | awk '{print $2}'
32

> grep --count "processor" /proc/cpuinfo
32

The sys/sysinfo.h header provides the get_nprocs and get_nprocs_conf functions, and C++ itself has provided std::thread::hardware_concurrency since C++11.

cpp


#include <iostream>
#include <sys/sysinfo.h>
#include <thread>

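int main() {
    // Processors that are currently online and available to the scheduler.
    auto online_procs = get_nprocs();
    // Processors configured on the system, including any that are offline.
    auto all_procs = get_nprocs_conf();
    std::cout << "nproc: " << online_procs << "\n"
              << "nproc_conf: " << all_procs << std::endl;

    // A hint for the number of concurrent threads the hardware supports;
    // may be 0 if the value cannot be determined.
    auto max_jobs = std::thread::hardware_concurrency();
    std::cout << "Maximum Workload: " << max_jobs << std::endl;
    return 0;
}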

bash-session


> g++ avail_cpus.cpp -o avail-test-cpp
> ./avail-test-cpp
nproc: 32
nproc_conf: 32
Maximum Workload: 32
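
As a rough sketch of how such a count might be used to size a pool of workers: the helper names `worker_count` and `run_workers` below are hypothetical and not part of any library. `hardware_concurrency` is only a hint and may return 0, so the sketch falls back to a single worker in that case.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: turn the detected processor count into a worker count.
// hardware_concurrency() may return 0 when the value cannot be determined,
// so fall back to a single worker in that case.
unsigned worker_count(unsigned detected) {
    return std::max(1u, detected);
}

// Hypothetical helper: launch `workers` threads that each do one unit of
// placeholder work, wait for all of them, and return how many units finished.
unsigned run_workers(unsigned workers) {
    assert(workers >= 1);
    std::atomic<unsigned> finished{0};
    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (unsigned i = 0; i < workers; ++i) {
        // Real code would pull jobs from a queue here.
        pool.emplace_back([&finished] { finished.fetch_add(1); });
    }
    for (auto &worker : pool) {
        worker.join();
    }
    return finished.load();
}
```

A caller would then write something like `run_workers(worker_count(std::thread::hardware_concurrency()))`.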

Rust also provides a thread utility function, available_parallelism, since Rust 1.59.

rust


fn main() {
    if let Ok(max_jobs) = std::thread::available_parallelism() {
        println!("Maximum Workload: {}", max_jobs);
    } 
}

bash-session


> cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/available`
Maximum Workload: 32

We have our answer, right? We’ve detected the number of processors available and we can use that to control the number of threads that we launch. Let’s create some test images and test them on a local Kubernetes cluster.

Test Environment

Key	Value
Kubernetes	Rancher Desktop
Kubernetes version	v1.25.4
Container Engine	dockerd(moby)
VM CPUs	10
VM RAM	16GB

We’ll create deployments for both the C++ and Rust programs and see what they report.

bash


# create a deployment for the C++ code
> kubectl create deployment avail-bootcamp-cpp --image=avail-test-cpp:0.0.2
# view the output from the C++ program
> kubectl logs avail-bootcamp-cpp-7c8dc94758-nqndr
Maximum Workload: 10
nproc: 10
nproc_conf: 10

# create a deployment for the Rust code
> kubectl create deployment avail-bootcamp --image=available:0.0.1
# view the output from the Rust program
> kubectl logs avail-bootcamp-57fcd7dc7f-4lhlr
Maximum Workload: 10

So far so good. Both programs report a maximum workload of 10, which matches the number of processors available to our Kubernetes cluster.

But we really shouldn’t have two programs trying to fully utilize the limited resources in our Kubernetes cluster. Let’s give each a CPU request of 2 processors (the minimum it is guaranteed) and a CPU limit of 4 processors (the most it can scale up to as needed).

We’ll do this using kubectl set resources, but typically this would be done in your values.yaml. After running set resources, Kubernetes will terminate the currently running pod and spin up a new one with the requested limits in place.

bash-session


# C++ deployment: request 2000m of CPU and limit usage to 4000m
kubectl set resources deployment avail-bootcamp-cpp --limits=cpu=4000m,memory=512Mi --requests=cpu=2000m,memory=256Mi

# Rust deployment: request 2000m of CPU and limit usage to 4000m
kubectl set resources deployment avail-bootcamp --limits=cpu=4000m,memory=512Mi --requests=cpu=2000m,memory=256Mi

Once the new pods are up and running, we can check the logs to see what they report for the Maximum Workload.

bash-session


# view the output from the C++ program
kubectl logs avail-bootcamp-cpp-598bdd878-hk45x
nproc: 10
nproc_conf: 10
Maximum Workload: 10

# view the output from the Rust program
kubectl logs avail-bootcamp-7cb6d4b7dc-qcmm4
Maximum Workload: 4

The Rust version correctly picks up the fact that we’ve put limits in place. In this instance it reports 4, which corresponds to the maximum allowed CPU time of 4000m rather than the requested 2000m.

But what’s going on with the C++ version? It’s still reporting 10 as the maximum workload. This is a case where we’ve asked C++ the wrong question, or perhaps we’ve asked the right question but are using the answer incorrectly. We asked how many processors are available, or what level of concurrency the hardware supports. The answer does not take into account the limits placed on the process by the user.

In Rust, the function name, available_parallelism, is our hint that we are asking a slightly different question. Instead of asking how many cores are available, we are asking Rust for an estimate of how much parallelism is available on the currently running system. The Rust implementation takes the user’s restrictions into account.

To fix the C++ program we’re going to have to dig into Linux Control Groups (cgroups). In Rust this has already been done for you. For most simple cases this will work without issue; however, there are several scenarios where the estimate may be inaccurate or may not be provided at all. available_parallelism returns a Result for a reason …
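
As a starting point, here is a minimal C++17 sketch of the kind of lookup involved. It assumes a cgroup v2 host where the container’s CPU limit is exposed in `/sys/fs/cgroup/cpu.max` as `<quota> <period>` in microseconds, or as `max <period>` when no limit is set. With a limit of 4000m that file would typically contain `400000 100000`. The helper names (`parse_cpu_max`, `read_cgroup_v2_limit`, `effective_parallelism`) are hypothetical, rounding down to whole CPUs with a floor of 1 is just one possible choice, and cgroup v1 (which uses different files) is not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <exception>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: parse the text of a cgroup v2 cpu.max file.
// Returns the limit in whole CPUs (rounded down, but at least 1), or
// nullopt when no limit is set or the text cannot be parsed.
std::optional<unsigned> parse_cpu_max(const std::string &contents) {
    std::istringstream in(contents);
    std::string quota;
    long period_us = 0;
    if (!(in >> quota >> period_us) || quota == "max" || period_us <= 0) {
        return std::nullopt;
    }
    long quota_us = 0;
    try {
        quota_us = std::stol(quota);
    } catch (const std::exception &) {
        return std::nullopt;
    }
    if (quota_us <= 0) {
        return std::nullopt;
    }
    return static_cast<unsigned>(std::max(1L, quota_us / period_us));
}

// Hypothetical helper: read the limit from the assumed cgroup v2 location.
std::optional<unsigned> read_cgroup_v2_limit(
    const std::string &path = "/sys/fs/cgroup/cpu.max") {
    std::ifstream file(path);
    if (!file) {
        return std::nullopt;  // not cgroup v2, or the file is not visible
    }
    std::string line;
    std::getline(file, line);
    return parse_cpu_max(line);
}

// Hypothetical helper: cap the hardware count by the cgroup limit, if any.
unsigned effective_parallelism(unsigned hardware, std::optional<unsigned> quota) {
    assert(!quota || *quota >= 1);
    unsigned base = std::max(1u, hardware);  // hardware may be reported as 0
    return quota ? std::min(base, *quota) : base;
}
```

A caller could combine the pieces as `effective_parallelism(std::thread::hardware_concurrency(), read_cgroup_v2_limit())` after including `<thread>`. With the 10-CPU VM and the 4000m limit used above, the parsing and capping logic would yield 4 under the stated assumptions.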

For now, I’ll leave you with links to the cgroup documentation and we’ll fix the C++ version in a future post.

cgroup version 1

cgroup version 2