Problem during boundary condition assembly

perf is installed with

sudo apt install linux-perf

Hello Kostas,

That gives me

[sudo] password for eac:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-perf

Sincerely,
Eric Comstock

OK, it seems that the package name is slightly different in Ubuntu than in Debian. Try

sudo apt install linux-tools-common

or

sudo apt install linux-tools-generic

In any case, install it with apt; do not use any other method to install software on your Ubuntu system.

Hello Kostas,

Thank you! I am still having a problem installing, though - during the installation step of the guide you sent, I got

eac@GREIVOUS2:/mnt/c/Users/Admin/Documents/Important Files/GT/Research/Summer 2025/Memos/06 02 2025 - getFEM high-D parellelization testing$ uname -r
5.15.146.1-microsoft-standard-WSL2
eac@GREIVOUS2:/mnt/c/Users/Admin/Documents/Important Files/GT/Research/Summer 2025/Memos/06 02 2025 - getFEM high-D parellelization testing$ sudo apt install linux-tools-5.15.146.1-generic
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-tools-5.15.146.1-generic
E: Couldn't find any package by glob 'linux-tools-5.15.146.1-generic'
eac@GREIVOUS2:/mnt/c/Users/Admin/Documents/Important Files/GT/Research/Summer 2025/Memos/06 02 2025 - getFEM high-D parellelization testing$

Please advise.

Sincerely,
Eric Comstock

The link I provided was just for your information; you were not meant to run that command. Just install linux-tools-generic.

Hello Kostas,

I tried that as well, and I am still getting an error when I try to run perf. I am getting the error message

  You may need to install the following packages for this specific kernel:
    linux-tools-5.15.146.1-microsoft-standard-WSL2
    linux-cloud-tools-5.15.146.1-microsoft-standard-WSL2

  You may also want to install one of the following packages to keep up to date:
    linux-tools-standard-WSL2
    linux-cloud-tools-standard-WSL2

However, when I try to install the mentioned packages, I get the error that they do not exist.

Sincerely,
Eric Comstock

what is the output of these commands?

whereis perf
ls /usr/lib/linux-tools/

Hello Kostas,

Thank you!

I tried running those commands, and I got the following output:

eac@GREIVOUS2:/mnt/c/Users/Admin$ whereis perf
perf: /usr/bin/perf /usr/share/man/man1/perf.1.gz
eac@GREIVOUS2:/mnt/c/Users/Admin$ ls /usr/lib/linux-tools/
5.15.0-141-generic

However, I still have an error when I try to use perf.

Sincerely,
Eric Comstock

instead of perf use

/usr/lib/linux-tools/5.15.0-141-generic/perf

in the command for running your program
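
If it is unclear which kernel version directory holds the binary, the lookup can be scripted. This is only a minimal sketch, assuming the Ubuntu layout shown above (`/usr/lib/linux-tools/<kernel-version>/perf`); the helper name `find_perf_binaries` is made up for this example.

```python
import glob
import os


def find_perf_binaries(base_dir="/usr/lib/linux-tools"):
    """Return the sorted paths of executable perf binaries found in
    base_dir/<kernel-version>/perf (layout assumed from the ls output above)."""
    candidates = glob.glob(os.path.join(base_dir, "*", "perf"))
    return sorted(
        path for path in candidates
        if os.path.isfile(path) and os.access(path, os.X_OK)
    )


if __name__ == "__main__":
    # Print every perf binary found, one per line; prints nothing if none exist.
    for perf_path in find_perf_binaries():
        print(perf_path)
```

Any path it prints can be used in place of plain `perf` in the command line.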

Hello Kostas,

Thank you! That seems to have worked. I will parse the results and send you what I find.

Sincerely,
Eric Comstock

Hello Kostas,

First, I had to turn the sampling frequency down to 100 to prevent errors related to the perf.data file being 5 GB. I got some results that did not make a whole lot of sense to me. The command I ran was

LD_LIBRARY_PATH=/home/eac/FEM_cpp_testing/test_compilation_folder/opt/lib/ PYTHONPATH=/home/eac/FEM_cpp_testing/test_compilation_folder/opt/lib/python3.10/site-packages/getfem /usr/lib/linux-tools/5.15.0-141-generic/perf record -F 100 --call-graph dwarf python3 test6D_linear_BCs_fast_default.py

and then

hotspot

which generated a number of errors:

QStandardPaths: wrong permissions on runtime directory /run/user/1000/, 0755 instead of 0700
QStandardPaths: wrong permissions on runtime directory /run/user/1000/, 0755 instead of 0700
feature not properly read PerfHeader::BPF_PROG_INFO 4 0
feature not properly read PerfHeader::SAMPLE_TIME 16 0
feature not properly read PerfHeader::CACHE 6828 0
feature not properly read PerfHeader::CPU_TOPOLOGY 1068 756
feature not properly read PerfHeader::BPF_BTF 4 0
Linux version "5.15.179" detected. Switching to automatic buffering.
unhandled event type 73   PERF_RECORD_THREAD_MAP
unhandled event type 74   PERF_RECORD_CPU_MAP
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/hwloc.sm. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/initial-pmix_shared-segment-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smlockseg-3960668161. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds12_27269/initial-pmix_shared-segment-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds12_27269/dstore_sm.lock. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds12_27269/dstore_sm.lock. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds12_27269/initial-pmix_shared-segment-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smlockseg-3960668161. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/initial-pmix_shared-segment-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smseg-3960668161-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smdataseg-3960668161-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smseg-3960668161-0. This can break stack unwinding and lead to missing symbols.
PerfUnwind::MissingElfFile: Could not find ELF file for /tmp/ompi.GREIVOUS2.1000/pid.27269/pmix_dstor_ds21_27269/smdataseg-3960668161-0. This can break stack unwinding and lead to missing symbols.
Invalid memory read requested by dwfl fffffffffffffffc

Please let me know if any of these invalidate the results.

Here are the results:

Format:
Symbol
Binary
CPU clock time %

std::_Rb_tree_increment(std::_Rb_tree_node_base const*)
libstdc++.so.6.0.30
29.3%

gmm::strongest_value_type<std::vector<double>, gmm::rsvector<double> >::value_type gmm::vect_sp<std::vector<double>, gmm::rsvector<double> >(std::vector<double> const&, gmm::rsvector<double> const&)
_getfem.cpython-310-x86_64-linux-gnu.so
24.8%

void gmm::add_spec<gmm::scaled_vector_const_ref<gmm::rsvector<double>, double>, std::vector<double> >(gmm::scaled_vector_const_ref<gmm::rsvector<double>, double> const&, std::vector<double>&, gmm::abstract_vector) [clone .isra.0]
_getfem.cpython-310-x86_64-linux-gnu.so
24.1%

void gmm::range_basis_eff_Lanczos<gmm::col_matrix<gmm::rsvector<double> > >(gmm::col_matrix<gmm::rsvector<double> > const&, std::set<unsigned long>&, double)
_getfem.cpython-310-x86_64-linux-gnu.so
7.27%

dgemm_
libblas.so.3.10.0
1.97%

I could only find strongest_value_type, add_spec, range_basis_eff_Lanczos, and dgemm_ in header files in the source code, and I could not find _Rb_tree_increment.

Do you know what these do?

Sincerely,
Eric Comstock

useful results

  1. dgemm_ is the BLAS matrix-matrix product; ideally, most of your computation time should be spent in calls like this.

  2. The function called here is not strongest_value_type (that is only part of the return type in the signature); the function called in this line is gmm::vect_sp.

  3. gmm::add_spec is just vector addition. You need to find out whether this function itself runs slowly (e.g. because it is applied to very large vectors) or whether it is simply called a gazillion times (because of some poor programming logic).

  4. range_basis_eff_Lanczos has to do with eliminating the null space of a multiplier when you have used the add_multiplier function to add the multiplier. You could save this computation time by defining your multiplier as a normal (filtered) variable, provided you restrict the region and the order of the variable so that the resulting constraint has no null space.

In general, also try to check which functions call these time-consuming functions, to get an idea of why they are called so often (or with too much data) in the first place.

Hello Kostas,

Thank you for the detailed breakdown!

  1. Okay - it makes sense that dgemm is called, but it only takes up 2% of the runtime. It is called by a series of functions, which is called by standard_solve (which makes sense).

  2. Thank you!

  3. Got it - is there a way to find this out in perf, or should I modify the source code to measure it directly (e.g. printing something into a file whenever it is run)?

  4. Okay - how do I do that? I remember I was using the filtered boundary conditions earlier, but we switched to add_linear_term.

It seems that range_basis_eff_Lanczos is also what calls add_spec, vect_sp, and _Rb_tree_increment. This function seems to take up most of the time. It is called by range_basis, which is called by model::actualize_sizes(), which is called from model::nb_dof, from ga_workspace::ga_workspace, from add_linear_term_, from add_linear_term.
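
As a rough check of "most of the time", the percentages from the listing above can be added up. This is only a sketch: it assumes the numbers reported by hotspot are self costs (so they can be summed without double counting) and that essentially all samples in the three callees come from range_basis_eff_Lanczos.

```python
# Percentages copied from the hotspot listing above.
# Assumption: these are self costs, so adding them does not double count.
self_cost_percent = {
    "std::_Rb_tree_increment": 29.3,
    "gmm::vect_sp": 24.8,
    "gmm::add_spec": 24.1,
    "gmm::range_basis_eff_Lanczos": 7.27,
}

# Estimated share of the run spent in the null-space computation and its callees.
null_space_share = round(sum(self_cost_percent.values()), 2)

# Share spent in the BLAS matrix-matrix product, for comparison.
dgemm_share = 1.97

print(f"range_basis_eff_Lanczos and callees: {null_space_share}%")
print(f"dgemm_: {dgemm_share}%")
```

Under those assumptions, roughly 85% of the sampled time goes to the null-space computation, against about 2% for dgemm_.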

Sincerely,
Eric Comstock

Your interpretation looks correct. It all comes from your use of “add_multiplier”, which leads to an expensive null-space computation, but only the first time the dofs of the multiplier variable need to be defined.

You need to figure out how to replace your add_multiplier with add_filtered_fem_variable and still get the same results (select the correct order for the multiplier fem and the correct region). This will save you all these heavy computations.
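
A minimal sketch of what that replacement could look like when there are several boundary regions. The helper name `add_filtered_multipliers` and the `mult<region>` naming scheme are made up for this example; the only model call used is `add_filtered_fem_variable(name, mesh_fem, region)`, and choosing the right mesh_fem and region numbers is still up to you.

```python
def add_filtered_multipliers(md, mf, regions, prefix="mult"):
    """Add one filtered fem variable per region and return the variable names.

    md      -- a model object providing add_filtered_fem_variable(name, mf, region)
    mf      -- the mesh_fem to use for the multipliers (its order matters)
    regions -- iterable of region numbers, one multiplier per region
    prefix  -- hypothetical naming scheme: variable name is prefix + region number
    """
    names = []
    for region in regions:
        name = f"{prefix}{region}"
        md.add_filtered_fem_variable(name, mf, region)
        names.append(name)
    return names
```

For example, `add_filtered_multipliers(md, mf, range(46, 52))` would add the variables mult46 through mult51, one per region.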

Hello Kostas,

Thank you!

I got it to partially work with add_filtered_fem_variable, but it caused me to get the “ICNTL(14) too low” error again. To change this, I only need to edit ICNTL(14) in gmm_MUMPS_interface.h, correct?

I am using

    md.add_filtered_fem_variable("mult46", mf, 46)
    logging.info('Multiplier 1/6')
    md.add_filtered_fem_variable("mult47", mf, 47)
    logging.info('Multiplier 2/6')
    md.add_filtered_fem_variable("mult48", mf, 48)
    logging.info('Multiplier 3/6')
    md.add_filtered_fem_variable("mult49", mf, 49)
    logging.info('Multiplier 4/6')
    md.add_filtered_fem_variable("mult50", mf, 50)
    logging.info('Multiplier 5/6')
    md.add_filtered_fem_variable("mult51", mf, 51)
    logging.info('Multiplier 6/6')

to get my multipliers, and the error is

thon3.10/site-packages/getfem python3 test6D_add_filtered_fem_variable.py
Level 1 Warning in getfem_regular_meshes.cc, line 33: CAUTION : Simplexification in dimension >= 5 has not been tested and the resulting mesh should be not conformal
message from gf_model_get follow:
List of model variables and data:
Variable       f                              1 copy   fem dependant     4096 doubles.
Variable       mult40                         1 copy   fem dependant     3068 doubles.
Variable       mult46                         1 copy   fem dependant     3056 doubles.
Variable       mult47                         1 copy   fem dependant     3056 doubles.
Variable       mult48                         1 copy   fem dependant     3056 doubles.
Variable       mult49                         1 copy   fem dependant     3056 doubles.
Variable       mult50                         1 copy   fem dependant     1024 doubles.
Variable       mult51                         1 copy   fem dependant     1024 doubles.
Data           B1                             1 copy   fem dependant     4096 doubles.
Data           B2                             1 copy   fem dependant     4096 doubles.
Data           B3                             1 copy   fem dependant     4096 doubles.
Data           DirichletData                  1 copy   fem dependant     4096 doubles.
Data           E1                             1 copy   fem dependant     4096 doubles.
Data           E2                             1 copy   fem dependant     4096 doubles.
Data           E3                             1 copy   fem dependant     4096 doubles.
Data           zeros                          1 copy   fem dependant     4096 doubles.

Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319:  (source term): generic source term assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3476: Generic linear assembly brick: generic matrix assembly
Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319: Source term: generic source term assembly
Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319: Source term: generic source term assembly
Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319: Source term: generic source term assembly
Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319: Source term: generic source term assembly
Trace 2 in getfem_models.cc, line 3308: Generic source term assembly
Trace 2 in getfem_models.cc, line 3319: Source term: generic source term assembly
Trace 2 in getfem_models.cc, line 2652: Global generic assembly RHS
Assembly time 2.48828
 iter   0 residual            1
Trace 2 in getfem_models.cc, line 2654: Global generic assembly tangent term
Assembly time 1.49739
logic_error exception caught
Traceback (most recent call last):
  File "/mnt/c/Users/Admin/Documents/Important Files/GT/Research/Summer 2025/Memos/06 16 2025 - getFEM term filtering/test6D_add_filtered_fem_variable.py", line 309, in <module>
    force, stability, result_arrays = calc_F(0e-6, 0, -7.8, 0., 1e6+0.0, 2.9, make_grids(N, 10))
  File "/mnt/c/Users/Admin/Documents/Important Files/GT/Research/Summer 2025/Memos/06 16 2025 - getFEM term filtering/test6D_add_filtered_fem_variable.py", line 259, in calc_F
    md.solve("noisy", "lsolver", "mumps")
  File "/home/eac/FEM_cpp_testing/test_compilation_folder/opt/lib/python3.10/site-packages/getfem/getfem.py", line 2989, in solve
    return self.get("solve", *args)
  File "/home/eac/FEM_cpp_testing/test_compilation_folder/opt/lib/python3.10/site-packages/getfem/getfem.py", line 2813, in get
    return getfem('model_get',self.id, *args)
RuntimeError: (Getfem::InterfaceError) -- Error in ../../src/gmm/gmm_MUMPS_interface.h, line 205 bool gmm::mumps_error_check(int, int):
Solve with MUMPS failed: error -9, increase ICNTL(14)

Note - it did go much more quickly at first, though - my logging file now reads

2025-06-16 08:59:24,077 [DEBUG] Basis functions per element: ('1 - x - y - z - w - v - u', 'x', 'y', 'z', 'w', 'v', 'u')
2025-06-16 08:59:48,650 [INFO] Multiplier 1/6
2025-06-16 08:59:48,651 [INFO] Multiplier 2/6
2025-06-16 08:59:48,651 [INFO] Multiplier 3/6
2025-06-16 08:59:48,652 [INFO] Multiplier 4/6
2025-06-16 08:59:48,652 [INFO] Multiplier 5/6
2025-06-16 08:59:48,653 [INFO] Multiplier 6/6
2025-06-16 08:59:48,709 [INFO] Linear 1/6
2025-06-16 08:59:48,715 [INFO] Linear 2/6
2025-06-16 08:59:48,722 [INFO] Linear 3/6
2025-06-16 08:59:48,727 [INFO] Linear 4/6
2025-06-16 08:59:48,730 [INFO] Linear 5/6
2025-06-16 08:59:48,732 [INFO] Linear 6/6

…but it errors afterwards.

Sincerely,
Eric Comstock

No, you should not change ICNTL(14). You need to reflect on why add_filtered_fem_variable gives you a different behavior than add_multiplier.
You pass two things to add_filtered_fem_variable:

  • a mesh_fem object
  • a region

If you define these two correctly, you will not get an overly dense tangent matrix (the ICNTL(14) error is a side effect of that).

Hello Kostas,

Is there any way to examine each of those objects, to determine if they are being defined correctly?

Sincerely,
Eric Comstock

to begin with, for a given mesh, you need to know beforehand (calculate by hand) how many degrees of freedom your multiplier variable should have. Then you can print

model.mesh_fem_of_variable("mult_varname").nb_dof()

and compare.

You can also examine the value of
model.mesh_fem_of_variable("mult_varname").nb_basic_dof()

which counts all dofs, without restricting the variable to the region.

Hello Kostas,

I got the error “AttributeError: 'MeshFem' object has no attribute 'nb_dof'. Did you mean: 'nbdof'?”

I switched to md.mesh_fem_of_variable("mult46").nbdof(), and got some very odd results - it was exactly the same as md.mesh_fem_of_variable("mult46").nb_basic_dof(), which were both the total number of dofs I expect in my problem.

Please advise.

Sincerely,
Eric Comstock

yes, I meant nbdof(). It seems we are coming closer to the root cause of the issue.

Your region for the filtered variable is either not defined properly, or ignored.

Print the content of mesh.region(region_number).

What does it look like?
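
The checks discussed so far can be gathered in one small helper. This is only a sketch: the function name `report_multiplier_dofs` and the `multipliers` dictionary are made up for this example, and it relies only on the calls already mentioned in this thread (`mesh_fem_of_variable`, `nbdof`, `nb_basic_dof`, and `mesh.region`).

```python
def report_multiplier_dofs(md, mesh, multipliers):
    """Print and return dof counts and region contents for filtered multipliers.

    md          -- the model
    mesh        -- the mesh the regions are defined on
    multipliers -- dict mapping variable name -> region number (hypothetical input)

    Returns a list of (name, region, nbdof, nb_basic_dof, is_filtered) tuples,
    where is_filtered is True when nbdof is smaller than nb_basic_dof.
    """
    rows = []
    for name, region in multipliers.items():
        mf_var = md.mesh_fem_of_variable(name)
        nbdof = mf_var.nbdof()
        nb_basic = mf_var.nb_basic_dof()
        # If the region restriction is applied, fewer dofs than the basic
        # (unrestricted) count are expected.
        is_filtered = nbdof < nb_basic
        print(f"{name}: region {region}, nbdof={nbdof}, "
              f"nb_basic_dof={nb_basic}, filtered={is_filtered}")
        print(mesh.region(region))
        rows.append((name, region, nbdof, nb_basic, is_filtered))
    return rows
```

With the multipliers above it could be called as `report_multiplier_dofs(md, mesh, {"mult46": 46, "mult47": 47})`; a row with `filtered=False` points at a region that is empty, wrongly defined, or ignored.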