Compiling a CP2K binary – the massively parallel, open-source quantum chemistry and solid state physics program written in Fortran 2003 – can be a daunting task if you’ve never built a large project using maketools. This post aims to demystify the build process by showcasing a full CP2K build on a Cray XC40 supercomputer (codename Sisu) including dependencies.

The CP2K installation process can be divided into four parts: compiling dependencies, compiling the binary, checking that it works (= regression testing), and benchmarking the performance of the binary. Several versions of the CP2K binary can be built using different parallelization strategies and combinations of optimization and debugging flags. In this post, I will focus on the first three steps of the installation process. I will devote a separate post to exploring the effects of different optimization flags/options, although I will build some of the binary variants in this post.

This post was last edited on August 28, 2018 after its original publish date on April 26, 2017.

# Building the dependencies

## Mandatory dependencies

A number of mandatory prerequisites are needed to build CP2K. An up-to-date list can be found in the official installation instructions. These include the obvious Fortran/C compilers and other build tools (make and Python) as well as the numerical libraries BLAS and LAPACK for performing linear algebra operations. To build an MPI parallelized binary (highly recommended!), you will also need a working MPI installation and a parallel version of the mathematical libraries (ScaLAPACK).

Most of the above dependencies are either directly available in most Linux distributions or can be easily installed from official repositories using the distribution’s package manager (e.g. pacman or apt-get). To maximize performance, always use tuned vendor-provided libraries when they are available for your platform (e.g. use MKL on Intel machines instead of the reference Netlib BLAS/LAPACK library). These libraries should already be available when building on a hosted HPC machine but will require manual installation otherwise. The installation scripts of the CP2K toolchain are a good source of help for building these dependencies. In fact, you can perform a complete installation of all dependencies with the toolchain, but here I will manually install only those dependencies that I need.

I will use the Cray Compilation Environment (CCE) with GNU compilers, Libsci for the (Sca)LAPACK/BLAS math libraries, and cray-mpich for MPI. These can simply be loaded as modules.
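A minimal sketch of loading such an environment is given below. The module names (PrgEnv-cray, PrgEnv-gnu, cray-libsci, cray-mpich) are assumptions based on typical Cray XC setups, and run_module is a hypothetical helper that only prints the command when no module system is present. Check module avail on your own machine before relying on any of these names.

```shell
# Hypothetical helper: run a `module` command if the module system is
# available in this shell, otherwise report what would have been done.
run_module() {
    if command -v module >/dev/null 2>&1; then
        module "$@"
    else
        echo "[skipped] module $*"
    fi
}

# Assumed module names; verify them with `module avail` on your system.
run_module swap PrgEnv-cray PrgEnv-gnu   # GNU compilers behind the Cray wrappers
run_module load cray-libsci              # BLAS / LAPACK / ScaLAPACK
run_module load cray-mpich               # MPI
run_module list                          # confirm what is loaded
```

The wrapper keeps the script harmless on machines without a module system, such as a laptop used for drafting build scripts.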

## Optional dependencies

In addition to the mandatory prerequisites listed in the previous section, a number of optional libraries can be linked to CP2K to extend its features and to improve its performance. I will be using the following libraries:

• FFTW3 (improved FFT performance)
• libxsmm (improved matrix multiplication performance for small matrices)
• libgrid (improved performance for collocation/integration routines)
• ELPA (improved performance for matrix diagonalizations)
• libint (enables Hartree-Fock (HF) and post-HF calculations)
• libxc (provides a greater number of exchange-correlation functionals)

I will not build the FFTW3 library as it is available as a module (module load fftw, version 3.3.4.11) on my machine. For the other libraries, I will use the following settings to compile them unless otherwise specified

where ${libname} and ${libver} are the name and version, respectively, of the library being built.

Each of these libraries is built using the standard maketools process of configure, make and make install (see e.g. this post if you are unfamiliar with the concept). The CCE automatically sets a number of compiler flags, for example, to link the Libsci math libraries. These features can be leveraged by using the Cray wrappers for all compilers (e.g. ftn instead of gfortran). The CCE thus simplifies the configuration process somewhat, but the key principles remain the same. Please refer to the toolchain for complete configuration scripts. I will use the same library versions that are used in the toolchain. The compute nodes on Sisu use Haswell processors, which is why I’ve set the flags -march=haswell -mtune=haswell -mavx2.
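As a rough illustration of such common build settings, the sketch below exports the Cray compiler wrappers and the Haswell flags as environment variables. The install prefix under ${HOME}/libs and the -O2 optimization level are assumptions made for this example, not the exact settings used for the builds in this post.

```shell
# Library being built (example values; set these per library).
libname=libxc
libver=3.0.0

# Hypothetical install location, one directory per library and version.
prefix="${HOME}/libs/${libname}/${libver}"

# Cray compiler wrappers for C, C++ and Fortran.
export CC=cc CXX=CC FC=ftn

# Haswell flags from the text; -O2 is an assumption for this sketch.
arch_flags="-march=haswell -mtune=haswell -mavx2"
export CFLAGS="-O2 -g -fno-omit-frame-pointer ${arch_flags}"
export CXXFLAGS="${CFLAGS}"
export FCFLAGS="${CFLAGS}"

echo "Building ${libname} ${libver} with ${FC}, installing into ${prefix}"
```

Keeping the prefix parameterized by ${libname} and ${libver} makes it easy to keep several versions of a library side by side.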

### libxc

Libxc is a library of exchange-correlation functionals for density functional theory calculations. It provides functionals ranging from LDA all the way to meta hybrid functionals. I will compile version 3.0 of the library. The list of functionals in this release can be found here. For instructions on how to use libxc functionals in CP2K, take a look at the corresponding regtests. Note that not all libxc functionals are supported by CP2K. The compilation proceeds as follows
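The configure, make and make install sequence for a library such as libxc can be sketched as follows. The helper names run and build_autotools, the source directory libxc-3.0.0 and the install prefix are all hypothetical. By default the sketch only prints the commands (DRY_RUN=1); set DRY_RUN=0 inside an unpacked source tree to actually execute them.

```shell
# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_autotools SRC_DIR PREFIX [extra configure arguments...]
# Runs configure, make and make install inside SRC_DIR.
build_autotools() {
    src_dir=$1
    prefix=$2
    shift 2
    (
        # Only enter the source tree when really building.
        [ "$DRY_RUN" = "1" ] || cd "$src_dir" || exit 1
        run ./configure --prefix="$prefix" "$@" &&
        run make -j 4 &&
        run make install
    )
}

# Hypothetical source directory and prefix.
build_autotools libxc-3.0.0 "${HOME}/libs/libxc/3.0.0"
```

Running the steps in a subshell means the cd does not leak into the calling script, so several libraries can be built one after another from the same working directory.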

### libint

Libint is a library for computing four-center electron repulsion integrals. These integrals are needed for Hartree-Fock (hybrid exchange-correlation functionals) and other higher level (MP2, RPA) calculations. CP2K is compatible only with the first major version of the library, specifically releases 1.1.4 and later (libraries from the next major version 2.x.y won’t work). I will use version 1.1.6, which is again the toolchain default.

To successfully compile libint, I had to remove the -g -fno-omit-frame-pointer flags from CXXFLAGS. I will compile libint to support angular momenta up to h-functions for energies (configure flag --with-libint-max-am=5) and g-functions for derivatives (--with-libderiv-max-am1=4).
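The version constraint above is easy to get wrong, so here is a small sketch that checks a libint version string before building. The function name libint_version_ok is hypothetical, and the comparison relies on sort -V from GNU coreutils. The configure line is only printed, using the two flags quoted in the text.

```shell
# libint_version_ok VERSION
# Succeeds if VERSION belongs to the 1.x.y series and is at least 1.1.4.
libint_version_ok() {
    case $1 in
        1.*) ;;             # first major version only
        *) return 1 ;;      # 2.x.y and anything else is rejected
    esac
    # sort -V orders version strings; if 1.1.4 sorts first (or is equal),
    # the candidate is new enough.
    lowest=$(printf '%s\n' "1.1.4" "$1" | sort -V | head -n 1)
    [ "$lowest" = "1.1.4" ]
}

libint_ver=1.1.6
if libint_version_ok "$libint_ver"; then
    echo "libint ${libint_ver}: compatible with CP2K"
    # Flags from the text: h-functions for energies, g-functions for derivatives.
    echo "./configure --with-libint-max-am=5 --with-libderiv-max-am1=4"
else
    echo "libint ${libint_ver}: not compatible with CP2K"
fi
```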

### libgrid

Due to the dual basis set nature of CP2K, one of the key operations in CP2K is the collocation and integration of Gaussian products (routines calculate_rho_elec and integrate_v_rspace): the sparse matrix representation of the electron density (coefficients of Gaussian basis functions) is mapped to realspace multigrids, and vice versa. The routines implementing these operations can be tuned for a specific architecture and packed into a library (libgrid) using auto generation tools, which can be found in the folder ${cp2k_basedir}/tools/autotune_grid. Please refer to the README therein for additional details.

Building libgrid is slightly more involved than the other libraries. After unpacking all of the data, I set the build options in config.in. I then generated the makefile with ./generate_makefile.sh and compiled all code variants using make -j 4 all_gen (on a login node). Each binary was then timed on the compute nodes using 1 core (aprun -n 1 make all_run), which took several hours. It might be possible to run this step in parallel, but I opted for the safe choice. The best code variants were then selected with make gen_best > make_gen_best.log 2>&1 & and packaged into a library with make libgrid.a.

### ELPA

ELPA is an efficient numerical library for diagonalizing matrices in a block-cyclic data layout (ScaLAPACK format). The library provides optimized kernels tailored to specific architectures. By default, all supported kernels are installed. The performance of ELPA usually exceeds the performance of vendor-specific diagonalization libraries. In CP2K, ELPA can be linked in to replace all calls to the ScaLAPACK diagonalization routine (cp_fm_syevd). The performance benefits should be most apparent in simulations using a standard diagonalization based solver (&DIAGONALIZATION) or when using the &FULL_ALL preconditioner for OT.
It should be noted that ELPA cannot be used for ‘small’ systems (where small refers to the size of the input matrix relative to the number of processors). I will build both MPI only and hybrid MPI/OpenMP versions of the library (version 2016.05.004). Compared to the other libraries, some additional flags have to be defined in the configure phase, but otherwise the installation proceeds analogously. To check which kernels were installed, you can use the print_kernels binary available in the object directory (where make was invoked), or inspect the library header file elpa_kernel_constants.h, which was installed in the include folder of the library.

### libxsmm

Libxsmm is a specialized matrix multiplication library for small matrices targeting Intel architecture. It substitutes CP2K’s own (optional) libsmm library (${cp2k_basedir}/tools/build_libsmm). Small matrix multiplications are a common operation in CP2K because Gaussian basis functions are used; see e.g. the DBCSR timing report at the end of a CP2K output file. The library features just-in-time code generation, and overall the installation process is very simple. As a slight caveat, I actually had to compile version 1.7.1 of the library on Taito, a computer cluster with Haswell nodes similar to those of its supercomputer sibling, due to issues with the gcc compiler on Sisu (which nonetheless works with version 1.5.1). Thanks to the CSC service desk for discovering the workaround!

I installed libxsmm using

Edit on August 28, 2018: Thanks to the help of @hfp, one of the developers behind libxsmm, I managed to track down the cause behind the compilation crash on Sisu. Turns out that the binutils were outdated (2.23.1) on Sisu, while a newer 2.25 version was available on Taito. This issue can be bypassed by compiling the library with the extra flag INTRINSICS=1, see also instructions here.
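The binutils check described in the edit can be scripted. The sketch below decides whether to pass INTRINSICS=1 to the libxsmm make invocation based on the binutils version. The helper names version_lt and libxsmm_make_flags are hypothetical, and treating 2.25 as the first unaffected version is an assumption: only 2.23.1 (fails) and 2.25 (works) were actually observed.

```shell
# version_lt A B: succeeds if version A sorts strictly before version B.
# Uses sort -V from GNU coreutils.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# libxsmm_make_flags BINUTILS_VERSION
# Prints the extra make flag needed for old binutils, or nothing.
libxsmm_make_flags() {
    # Assumption: versions older than 2.25 need the workaround.
    if version_lt "$1" "2.25"; then
        echo "INTRINSICS=1"
    fi
}

# Try to read the binutils version from the first line of `ld --version`.
binutils_ver=$(ld --version 2>/dev/null | head -n 1 |
               grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
if [ -n "$binutils_ver" ]; then
    echo "binutils ${binutils_ver}: extra make flags: $(libxsmm_make_flags "$binutils_ver")"
else
    echo "could not detect the binutils version"
fi
```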

# Building the CP2K binary

Compiling the CP2K binary is very similar to building the dependencies. A development version of CP2K can be downloaded with

svn checkout http://svn.code.sf.net/p/cp2k/code/trunk .

Alternatively, you can download a stable release version following these instructions. I will use the latest SVN development version in this post (r17867 at the time of writing). Instead of configuring the installation with a separate script, the compilation flags and external libraries are defined in an arch file (${cp2k_basedir}/arch/). A number of example arch files are included in the directory for different architectures. You can find more examples in the CP2K dashboard or by searching the CP2K Google groups forum. I will use one arch file template each for the MPI only (popt) and hybrid MPI/OpenMP (psmp) versions of CP2K.

In the templates, DFLAGS tells which external libraries should be included in CP2K and how they have been configured (refer to the INSTALL file). The include and library files of the libraries we built earlier are added with FCFLAGS, LDFLAGS and LIBS. These ‘basic’ templates already use quite aggressive optimization flags (-march=haswell -mtune=haswell -mavx2 -funroll-loops -ffast-math -ftree-vectorize; refer to the gcc manual for more details regarding the flags), which I’ve found not to be an issue with GNU compilers. In the psmp version, threaded versions of the ELPA, FFTW3 and libxsmm libraries are linked to CP2K. Notice that the values of libint_max_am and libderiv_max_am need to be 1 larger than during compilation of libint (you can verify the correct values from the libint library header files).

In addition to the basic installation, I will compile binaries with fused multiply-add (FMA) and hugepages (larger virtual memory pages) support. These might lead to performance improvements in certain use-cases (more on this later in the separate benchmarking post). FMA instructions can be enabled by adding -mfma to FCFLAGS. On Cray XC40, hugepages can be used by loading any of the available modules with different pagesizes (module avail craype-hugepages).
I will use the craype-hugepages-32M module with a pagesize of 32M. The pagesize can be changed at runtime without recompiling the binary by loading a different module (see man intro_hugepages).

With the arch files sorted, CP2K can be compiled with make, which installs the binaries into ${cp2k_basedir}/exe/your_arch/. If your installation fails, study the make log file carefully, reread the installation instructions, and/or seek help online (the Google groups forum is a good source of information!).
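The off-by-one rule for the libint values and the per-version make invocation can be sketched as follows. The macro names -D__LIBINT, -D__LIBINT_MAX_AM and -D__LIBDERIV_MAX_AM1 are written from memory and should be treated as assumptions to be checked against the INSTALL file. The arch name your_arch is a placeholder, and the make commands are only printed.

```shell
# Values passed to libint's configure earlier in the post.
libint_max_am=5       # --with-libint-max-am
libderiv_max_am1=4    # --with-libderiv-max-am1

# CP2K expects each value incremented by one.
# Macro names are assumptions; verify them in the CP2K INSTALL file.
libint_dflags="-D__LIBINT -D__LIBINT_MAX_AM=$((libint_max_am + 1)) -D__LIBDERIV_MAX_AM1=$((libderiv_max_am1 + 1))"
echo "DFLAGS += ${libint_dflags}"

# Print one build command per binary variant.
arch=your_arch        # placeholder for the name of your arch file
for version in popt psmp; do
    echo "make -j 4 ARCH=${arch} VERSION=${version}"
done
```

Deriving the macro values from the configure values in one place avoids the two drifting apart when libint is rebuilt with different angular momentum limits.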

# Testing the CP2K binary

Having built a CP2K binary, the next step is to verify that it produces correct results for known test cases called regtests. CP2K ships with nearly 3000 regtests, which you can find in the ${cp2k_basedir}/tests/ directory. The idea of regression testing is to compare the output of the newly compiled binary against reference values. A test-dependent tolerance parameter determines how much the output can deviate from the reference value before it is considered incorrect.

The SVN development version of CP2K is regularly tested on different architectures and the results are collected in the dashboard. If the dashboard regression testers report no errors (due to issues with latest patches), binaries built using common compilers and machines should pass all regtests. Certain optimization flags or compiler versions might, however, lead to incorrect regtest results. In such an event, it is worth rerunning the test with a binary compiled with a lower optimization level (e.g. -O1) to try and pinpoint any issues. Sometimes wrong test results can be safely ignored if the error is small compared to the tolerance, but the assessment has to be made case-by-case.

The easiest way to run the full regtest suite on a local machine is to use make based testing: make -j X ARCH=your_arch VERSION=your_version test. On HPC machines with a batch job submission system, the regtest script ${cp2k_basedir}/tools/regtesting/do_regtest should be copied, for example, to the ${cp2k_basedir}/regtesting directory. Next, a configuration file suitable for the machine must be created for running the script. I used one such configuration file for testing the popt binaries. The regtest can then be started by executing ./do_regtest -c $your_conf in an appropriate batch job submission script. You can find more information about regression testing here.
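To make the role of the tolerance concrete, here is a minimal sketch of the kind of comparison a regression test performs. It assumes a relative-deviation criterion, which is an assumption of this example rather than a description of what do_regtest does internally. The function name within_tolerance and the numbers in the usage line are made up.

```shell
# within_tolerance VALUE REFERENCE TOLERANCE
# Succeeds if |VALUE - REFERENCE| / |REFERENCE| <= TOLERANCE.
# Falls back to the absolute deviation when REFERENCE is zero.
within_tolerance() {
    awk -v v="$1" -v r="$2" -v tol="$3" 'BEGIN {
        diff = v - r
        if (diff < 0) diff = -diff
        ref = (r < 0) ? -r : r
        if (ref == 0) exit !(diff <= tol + 0)
        exit !(diff / ref <= tol + 0)
    }'
}

# Made-up energy and reference value, purely for illustration.
if within_tolerance -17.1452 -17.1451 1e-4; then
    echo "OK"
else
    echo "WRONG"
fi
```

A result flagged as WRONG by such a check is a prompt to investigate, for example by rebuilding at a lower optimization level, rather than proof that the binary is broken.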

I’ve tabulated the regtest results below for the binaries I built in the previous section. The results reveal no issues with the binaries.

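| Optimization level | Version | OK | NEW | WRONG |
|---|---|---|---|---|
| ‘Basic’ | popt | 2955 | 19 | 0 |
| ‘Basic’ | psmp | 2955 | 19 | 0 |
| FMA | popt | 2955 | 19 | 0 |
| FMA | psmp | 2955 | 19 | 0 |
| FMA + Hugepages | popt | 2955 | 19 | 0 |
| FMA + Hugepages | psmp | 2955 | 19 | 0 |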