I am trying to run GetFEM in parallel, but I am having an issue with the solver: it does not seem to get any faster with more cores.
I am using MUMPS, given the size of my problem.
For instance, the parallel example demo_parallel_laplacian.py does not run faster with more cores, and even for a larger problem (2.5M DOF) there is no speedup.
I am not sure whether this is an issue with GetFEM or with MUMPS.
I have configured GetFEM with:
./configure --prefix=/home/rmpadmos/getfem/ --with-pic --enable-paralevel=2 --enable-python CC=mpicc CXX=mpicxx FC=mpif90 --with-optimization=-O3 --enable-par-mumps --enable-metis --enable-qhull --enable-multithread-blas --enable-blas64-support
How do I speed up the solver in GetFEM?
Hi, are you using mpi4py? When running GetFEM from Python with mpi4py, I could not get the MPI parallelization to work either. We should do some more testing to figure out whether it is an mpi4py/MUMPS issue or a GetFEM one. From C++ I used to get quite reasonable MPI scaling, but I haven't tested it recently.
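As a first sanity check, it is worth confirming that all MPI ranks actually start when you launch your script with mpirun. A minimal sketch (the file name is made up; mpi4py is imported before getfem, as done in demo_parallel_laplacian.py):

# check_mpi.py -- run with e.g.: mpirun -np 4 python check_mpi.py
import mpi4py.MPI as mpi  # import before getfem, as in demo_parallel_laplacian.py
import getfem as gf

print("rank", mpi.COMM_WORLD.rank, "of", mpi.COMM_WORLD.size)

If only rank 0 prints, the MPI setup itself is the problem rather than GetFEM or MUMPS.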
On a different note, do you know whether it is the linear solve or the assembly that takes most of the time? If it is the assembly, the easy solution is to use OpenMP instead of MPI; assembly is trivially parallelizable. You just need to compile with sequential MUMPS and the --with-openmp flag, and then call, at the beginning of your Python script,
gf.util_set_num_threads(4)
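Before going that route, it is worth checking where the time actually goes. A rough timing sketch (just an illustration, assuming md is your already set-up gf.Model):

import time
import getfem as gf

# md = ... your model, set up as usual ...

t0 = time.time()
md.assembly("build_all")  # assemble tangent matrix and right-hand side
print("assembly:", time.time() - t0)

t0 = time.time()
x = gf.linsolve_mumps(md.tangent_matrix(), md.rhs())  # direct solve with MUMPS
print("solve:", time.time() - t0)

That should tell you whether the OpenMP route is worth pursuing.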
Maybe you can start by saving your matrix to a file with
md.tangent_matrix().save("mm","K.mm")
to use it for testing.
Then we can do some testing with a simple script like:
import getfem as gf
import numpy as np
import time

K = gf.SpMat("load", "mm", "K.mm")  # load the saved matrix in MatrixMarket format
N = K.size()[0]
b = np.ones(N)                      # dummy right-hand side

starttime = time.time()
x = gf.linsolve_mumps(K, b)         # direct solve with MUMPS
print("duration", time.time() - starttime)
I am indeed using mpi4py. Model assembly does run in parallel; it is the linear solver that takes most of the time and seems to be the issue.
Running with more cores increases the CPU load even though the compute time is unchanged. Perhaps there is an issue with MUMPS spawning too many threads? I will run some more tests.
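If the extra load comes from the multithreaded BLAS (you configured with --enable-multithread-blas), one quick test, just a guess on my side, is to restrict each MPI rank to a single thread and see whether the timings change (your_script.py is a placeholder):

OMP_NUM_THREADS=1 mpirun -np 8 python your_script.py

If the CPU load drops while the compute time stays the same, the extra load was thread oversubscription rather than useful work.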
Just a short update on the status: with the current development version, compiling GetFEM with paralevel 2 and running Python scripts with mpi4py and MUMPS works pretty well. It scales reasonably for up to 16 processes.