I am trying to run GetFEM in parallel, but I am having an issue with the solver: it does not seem to get any faster with more cores.
I am using MUMPS, given the size of my problem.
For instance, the parallel example demo_parallel_laplacian.py does not run faster with more cores, and even for a larger problem (2.5M DOF) there is no speedup.
I am not sure whether this is an issue with GetFEM or with MUMPS.
I have configured GetFEM with:
./configure --prefix=/home/rmpadmos/getfem/ --with-pic --enable-paralevel=2 --enable-python CC=mpicc CXX=mpicxx FC=mpif90 --with-optimization=-O3 --enable-par-mumps --enable-metis --enable-qhull --enable-multithread-blas --enable-blas64-support
How do I speed up the solver in GetFEM?
Hi, are you using mpi4py? When running GetFEM from Python with mpi4py, I could not get the MPI parallelization to work either. We should do some more testing to figure out whether it is an mpi4py/MUMPS issue or a GetFEM one. From C++ I used to get quite reasonable MPI scaling, but I haven't tested it recently.
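As a first sanity check, it is worth confirming that all MPI ranks actually start when you launch your script with mpirun. A minimal sketch (the file name is made up; mpi4py is imported before getfem, as done in demo_parallel_laplacian.py):

# check_mpi.py -- run with e.g.: mpirun -np 4 python check_mpi.py
import mpi4py.MPI as mpi  # import before getfem, as in demo_parallel_laplacian.py
import getfem as gf

print("rank", mpi.COMM_WORLD.rank, "of", mpi.COMM_WORLD.size)

If only rank 0 prints, the MPI setup itself is the problem rather than GetFEM or MUMPS.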
On a different note, do you know whether it is the linear solve or the assembly that takes most of the time? If it is the assembly, the easy solution is to use OpenMP instead of MPI; assembly is trivially parallelizable. You just need to compile with sequential MUMPS and the --with-openmp flag, and then call, at the beginning of your Python script,
gf.util_set_num_threads(4)
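Before going that route, it is worth checking where the time actually goes. A rough timing sketch (just an illustration, assuming md is your already set-up gf.Model):

import time
import getfem as gf

# md = ... your model, set up as usual ...

t0 = time.time()
md.assembly("build_all")  # assemble tangent matrix and right-hand side
print("assembly:", time.time() - t0)

t0 = time.time()
x = gf.linsolve_mumps(md.tangent_matrix(), md.rhs())  # direct solve with MUMPS
print("solve:", time.time() - t0)

That should tell you whether the OpenMP route is worth pursuing.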
Maybe you can start by saving your matrix to a file with
md.tangent_matrix().save("mm","K.mm")
to use it for testing.
Then we can do some testing with a simple script like:
import getfem as gf
import numpy as np
import time

K = gf.SpMat("load", "mm", "K.mm")  # load the saved matrix in MatrixMarket format
N = K.size()[0]
b = np.ones(N)                      # dummy right-hand side

starttime = time.time()
x = gf.linsolve_mumps(K, b)         # direct solve with MUMPS
print("duration", time.time() - starttime)
I am indeed using mpi4py. Model assembly does run in parallel; it is the linear solver that takes most of the time and seems to be the issue.
Running with more cores increases the CPU load even though the compute time is unchanged. Perhaps there is an issue with MUMPS spawning too many threads? I will run some more tests.
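If the extra load comes from the multithreaded BLAS (you configured with --enable-multithread-blas), one quick test, just a guess on my side, is to restrict each MPI rank to a single thread and see whether the timings change (your_script.py is a placeholder):

OMP_NUM_THREADS=1 mpirun -np 8 python your_script.py

If the CPU load drops while the compute time stays the same, the extra load was thread oversubscription rather than useful work.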
Just a short update on the status: with the current development version, compiling GetFEM with paralevel 2 and running Python scripts with mpi4py and MUMPS works pretty well. It scales reasonably for up to 16 processes.