Dziekan P., Żmijewski P.
A numerical cloud model with Lagrangian particles coupled to an Eulerian flow is adapted for distributed memory systems. Eulerian and Lagrangian calculations can be done in parallel on CPUs and GPUs, respectively. The fraction of time when CPUs and GPUs work simultaneously is maximized at around 80 % for an optimal ratio of CPU and GPU workloads. The optimal ratio of workloads is different for different systems because it depends on the relation between computing performance of CPUs and GPUs. GPU workload can be adjusted by changing the number of Lagrangian particles, which is limited by device memory. Lagrangian computations scale with the number of nodes better than Eulerian computations because the former do not require collective communications. This means that the ratio of CPU and GPU computation times also depends on the number of nodes. Therefore, for a fixed number of Lagrangian particles, there is an optimal number of nodes, for which the time CPUs and GPUs work simultaneously is maximized. Scaling efficiency up to this optimal number of nodes is close to 100 %. Simulations that use both CPUs and GPUs take between 10 and 120 times less time and use between 10 to 60 times less energy than simulations run on CPUs only. Simulations with Lagrangian microphysics take up to 8 times longer to finish than simulations with Eulerian bulk microphysics, but the difference decreases as more nodes are used. The presented method of adaptation for computing clusters can be used in any numerical model with Lagrangian particles coupled to an Eulerian fluid flow.
Geoscientific Model Development, 2022, vol. 15(11), pp. 4489–4501, doi: 10.5194/gmd-15-4489-2022