Finding performance bottlenecks is important. The cases in test_assembly.cc are already well optimized, but they do not cover every possible use case.
Profiling a specific program based on GetFEM only requires that the program (if it is compiled at all) and GetFEM itself are compiled with the debug flag "-g". The program can then be started under perf, as in the following examples:
OMP_NUM_THREADS=1 perf record --call-graph dwarf ./test_assembly
or
OMP_NUM_THREADS=1 perf record --call-graph dwarf python3 gf_benchmark.py
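The debug flag requirement mentioned above can be met, for example, as follows. This is only a minimal sketch, assuming GetFEM is built from source with its configure script and a GCC-compatible compiler. The file name my_program.cc is hypothetical, and the optimization level -O2 is an assumption rather than a GetFEM requirement.

```shell
# Assumption: run from the top of a GetFEM source tree that provides a configure script.
# Keep optimization enabled and add debug symbols so that perf can resolve call stacks.
./configure CXXFLAGS="-O2 -g"
make

# Hypothetical user program: compile its sources with the same flags.
# Include paths and link flags for GetFEM are omitted here.
g++ -O2 -g -c my_program.cc -o my_program.o
```

Keeping optimization enabled while adding "-g" means the profile reflects the code that is actually run in production, with symbol names still available to perf.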
The profiled program runs at close to its normal speed, and perf writes a fairly large file called perf.data containing all the profiling information. To visualize this file, start hotspot in the same folder with:
hotspot
By default, it searches for a file called perf.data and provides a visualization like this: