You are considering the architectural design of your Docker-based CPM system with the goal of stability, load balancing, and redundancy.
- A Docker-based CPM
- One or more Linux hosts
- One or more private locations
See this article on scaling and rightsizing for the CPM that covers many of these details.
You can estimate the job demand that your private location will face with a formula:

number of monitors / (average monitor frequency in minutes * number of workers * number of replicas or hosts)
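As a sketch of how to apply this formula, the following uses purely illustrative numbers (100 monitors, a 10-minute average frequency, 8 heavy workers, 2 hosts); a ratio below 1.0 suggests capacity exceeds demand:

```python
# Hypothetical demand-ratio sketch based on the formula above.
# All inputs are illustrative, not from any specific deployment.

def demand_ratio(monitors, avg_frequency_min, workers, hosts):
    """monitors / (average frequency * workers * replicas or hosts)."""
    return monitors / (avg_frequency_min * workers * hosts)

ratio = demand_ratio(monitors=100, avg_frequency_min=10, workers=8, hosts=2)
print(ratio)  # 0.625 -- below 1.0, so capacity exceeds demand
```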
- How many synthetic monitors can we execute on one CPM instance?
It depends on the number, type, and frequency of monitors being run on the CPM, as well as the resources provided to the CPM. More CPU cores yield more heavy workers, which in turn require more memory, at 2.5 GB per CPU core. As a result, a host with 8 CPU cores can process 8 simultaneous Chrome-based jobs. Assuming each job takes 30 seconds to complete on average, it could handle a maximum of 16 "heavy" jobs per minute.
This assumes no timeouts, which occupy a runner thread for up to 180 seconds by default. Essentially, there are many factors to consider, and each case will be different.
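The throughput arithmetic above can be sketched as follows, assuming one heavy worker per CPU core as described:

```python
# Rough heavy-job throughput sketch: each CPU core runs one
# Chrome-based job at a time, so throughput scales with cores
# and inversely with average job duration.

def heavy_jobs_per_minute(cpu_cores, avg_job_seconds):
    return cpu_cores * (60 / avg_job_seconds)

print(heavy_jobs_per_minute(8, 30))   # 16.0 jobs/min in the ideal case
print(heavy_jobs_per_minute(8, 180))  # about 2.7 jobs/min if every job hits the 180 s timeout
```

Note how a run of timeouts cuts effective throughput by roughly a factor of six in this sketch, which is why timeouts matter so much to sizing.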
It's important to note that scripted browser and simple browser monitors require the most time per heavy worker to complete, sometimes upwards of 120 seconds. API test monitors execute faster. Ping monitors aren't a concern here: they execute on the minion container itself and contribute little additional load on the host.
Frequency is a significant factor that greatly impacts how many jobs the CPM must handle per minute. For example, a monitor with a 1-minute frequency occupies 10x the resources of a monitor with a 10-minute frequency.
- We have 8 CPU cores per private location; won't that distribute the load?
There are several ways to load balance a private location. With one instance of the CPM running on one host, jobs run simultaneously up to the lightweight and heavy worker counts, which depend on the number of CPU cores provided to the minion. Lightweight workers are set to 25 * num_cpus and heavy workers are set to num_cpus. See our environment variables section, which describes the variables that control these counts.
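The worker-count defaults described above can be expressed as a small sketch (the function name here is illustrative, not an actual CPM setting):

```python
# Worker counts derived from CPU cores, per the defaults above:
# heavy workers = num_cpus, lightweight workers = 25 * num_cpus.

def worker_counts(num_cpus):
    return {"heavy": num_cpus, "lightweight": 25 * num_cpus}

print(worker_counts(8))  # {'heavy': 8, 'lightweight': 200}
```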
Another way to load balance the private location is to add more hosts. This is a best practice, as adding more hosts provides both load balancing and failover protection. Ideally, each host would be configured with enough resources to handle the full job demand alone, so that if one host fails, the other can take on the full load. Each host pulls jobs from the queue as quickly as it can. In this way load balancing is achieved, though no specific rule, such as round-robin, is applied.
Finally, additional private locations can be added as a kind of load balancing, though in a more pre-planned way. The way to utilize multiple private locations is to allocate each one to a specific role or purpose, for example, dev and prod. Specific locations can then be chosen for different monitors, so not all monitors need run on one location. This can relieve the prod location from running dev jobs, which may be more likely to fail and cause performance degradation due to our default of 2 retries on the prod CPM. Essentially, each failing monitor is tried three times (soft failures) before a failed result is sent to the UI (hard failure). For unreliable or poorly written scripts, it is best to run them on a dev CPM so they don't degrade the performance of the prod CPM.
- How many private locations are required to handle 100+ monitors?
One private location can handle all your monitors. There is no limit to the number of monitors that can run on a private location, assuming enough CPM instances/hosts exist. Assuming the worst-case scenario of 100 scripted browser jobs set to a 1-minute frequency at, say, 30 seconds per job, you'd need 50 CPU cores and 125 GB of memory. Whether that is spread across several hosts or one big host is up to you.
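The worst-case sizing above works out as follows, assuming one heavy worker per core and the 2.5 GB per core figure mentioned earlier (the function itself is just an illustration):

```python
# Worst-case sizing sketch: 100 scripted browser monitors at
# 1-minute frequency, 30 seconds per job, 2.5 GB of memory per core.

def size_for_worst_case(monitors, job_seconds, gb_per_core=2.5):
    # Concurrent heavy jobs = jobs arriving per minute * the fraction
    # of a minute each job occupies a worker.
    concurrent_jobs = monitors * (job_seconds / 60)
    cores = concurrent_jobs  # one heavy worker per core
    memory_gb = cores * gb_per_core
    return cores, memory_gb

cores, memory_gb = size_for_worst_case(100, 30)
print(cores, memory_gb)  # 50.0 125.0
```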
Note that as host size increases, disk I/O can become a bottleneck. At 15 scripted browser jobs per minute, disk IOPS will hover around 200-300 write operations per second.