Generated Sources In CMake Builds

April 18, 2017 (updated October 23, 2021) by Craig Scott

Using a set of source files to build libraries and executables is about the most basic thing a build system needs to do. This is relatively easy with CMake, but things get more interesting when some of the source files need to be generated as part of the build. CMake provides a range of functionality which can be used to create files, but getting build dependencies correct is an area where many developers struggle or even simply give up. It doesn’t have to be that way!

Generating Files At Configure Time

The easiest scenario involves copying a file from somewhere into a known location during the configure stage and using it as a source or header file in the build stage. The configure_file() command makes this trivial and even has the ability to transform the input such that ${someVar} and @someVar@ are replaced with the value of the corresponding CMake variable during the copy. This can be a better alternative to passing information through compiler defines in some situations. A particularly effective example of this is passing an application version string defined in CMake through to C++ source code:

version.cpp.in:

const char* getVersion()
{
    return "@MyProj_VERSION@";
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

configure_file(version.cpp.in version.cpp)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/version.cpp
)

In the above, the MyProj_VERSION variable is automatically populated by CMake as part of the project() command when the VERSION option is provided. The configure_file() command then substitutes that CMake variable’s value during the copy, so the version.cpp file ends up with the version string embedded directly. The version.cpp file is generated in the build directory and this file is then added as a source for the myapp executable.

One of the good things about configure_file() is that it automatically handles build dependencies. If the version.cpp.in input file ever changes, CMake will re-run the configure stage at the start of the build so that the version.cpp file is regenerated. Furthermore, CMake also recognises that version.cpp is an input file to the myapp target, so version.cpp will be recompiled and myapp relinked. The project does not have to specify any of these dependencies; CMake recognises and defines them automatically. The configure_file() command also has the useful characteristic that if the generated content doesn’t change, then the file is not actually updated and therefore CMake’s dependency tracking won’t cause downstream dependencies to be rebuilt.
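
As a further sketch of the same technique, the version can also be exposed through a generated header instead of a source file. The file name version.h.in and the macro names below are hypothetical and not part of the original example. The @ONLY option restricts substitution to the @someVar@ form, which is handy when the input legitimately contains ${...} text that should be left alone.

```cmake
# version.h.in (hypothetical input file) might contain:
#   #define MYPROJ_VERSION       "@MyProj_VERSION@"
#   #define MYPROJ_VERSION_MAJOR @MyProj_VERSION_MAJOR@

cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

# Relative input path: relative to the current source directory.
# Relative output path: relative to the current build directory.
# @ONLY: replace @VAR@ references only, leave ${VAR} untouched.
configure_file(version.h.in version.h @ONLY)

add_executable(myapp main.cpp)

# Lets main.cpp pick up the generated header with #include "version.h"
target_include_directories(myapp PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
```

Because the header is only rewritten when its content changes, sources that include it should be recompiled only after the version actually changes.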

Sometimes the content of source files might be hard-coded directly in the CMakeLists.txt files or it may be built up from CMake variable contents. Files can then be written using one of the file() command forms which create file contents. The above example could also be implemented like this:

cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

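# A relative path here would be interpreted relative to the source directory,
# so give the full path into the build directory
file(WRITE ${CMAKE_CURRENT_BINARY_DIR}/version.cpp
     "const char* getVersion() { return \"${MyProj_VERSION}\"; }"
)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/version.cpp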
)

The file() command has other forms which can be used to write files, such as file(APPEND...) and file(GENERATE...), the latter being useful if generator expressions need to be evaluated as part of the content to be written. One disadvantage of using file(WRITE) or file(APPEND) to write files is that they update the output file every time CMake is run. As a result, anything that depends on that output file will be seen as out of date, even if the file’s contents haven’t changed. The file(GENERATE) command only updates the file if the contents change, just like configure_file(), so for this reason, file(GENERATE) or configure_file() are usually better alternatives if the output file is used as an input to something else.
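
A minimal sketch of the file(GENERATE) alternative follows, assuming the same project as the example above. An absolute output path is used so that the location does not depend on policy settings. Keep in mind that file(GENERATE) writes its output at the end of the CMake run, during generation, rather than at the point where the command is called.

```cmake
cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

# Written during the generation step, and only if the content has changed.
# Generator expressions could also appear in CONTENT; none are needed here.
file(GENERATE
    OUTPUT  ${CMAKE_CURRENT_BINARY_DIR}/version.cpp
    CONTENT "const char* getVersion() { return \"${MyProj_VERSION}\"; }\n"
)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/version.cpp
)
```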

Generating Files At Build Time

A more challenging scenario is where a tool needs to be run at build time and the tool’s outputs are files to be used as sources for CMake targets. During the configure stage (i.e. when CMake is run), these files do not yet exist, so CMake needs some way of knowing they are generated files or it will expect them to be present. Unfortunately, a common approach used in this situation is to take a copy of a set of generated files and save them with the project’s sources (i.e. check them into version control). The project then defines a custom target which the developer can run to update the saved sources. This approach has a number of problems:

Builds should not require manual steps or building a set of targets in a certain order; that’s something the build system should be able to handle on its own.
Builds can modify sources, which they should always aim to avoid. If a developer is using multiple build trees with the same set of sources (e.g. for different build configurations such as Debug and Release), then the builds may conflict in terms of the sources they want to generate (e.g. a Debug build may add additional logging or consistency checks).
It is too easy to use generated code which is not up to date. This can be a hard-to-detect cause of code consistency problems if the generated code is used across multiple projects.
If using code review as part of merge requests, etc., changes in the generated code can swamp the interesting changes made by the developer. In some cases, the amount of change in the generated code can even test the limits of review tools in terms of the size of a change they can handle.

Instead of defining a custom target to generate the sources manually, projects should define custom outputs with add_custom_command(). CMake can then automatically work out dependencies when those outputs are used as inputs to another target.

cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

add_custom_command(
    OUTPUT  generated.cpp          # Treated as relative to CMAKE_CURRENT_BINARY_DIR
    COMMAND mytool
              ${CMAKE_CURRENT_SOURCE_DIR}/someInputFile.cpp.in
              ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
    DEPENDS someInputFile.cpp.in   # Treated as relative to CMAKE_CURRENT_SOURCE_DIR
)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
)

With this approach, generated sources do not have to be saved in the source tree or checked in to version control. The sources are generated at build time and targets that list any of the custom command’s output files as sources will be dependent on the custom command. Extra dependencies can also be specified in add_custom_command() with the DEPENDS option, allowing files it uses as inputs to become part of the dependency hierarchy. If such input files change, the output files are regenerated and targets using those output files will be rebuilt. The main restriction here is that the targets using the generated outputs as inputs must be defined in the same directory scope as the custom command.
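
One commonly used way around the same-directory restriction is to wrap the custom command in a custom target and make consumers in other directories depend on that target. The sketch below illustrates the idea only; the generate_sources target, the GENERATED_CPP variable, the app subdirectory and the otherapp target are all hypothetical names, and mytool is the same placeholder tool as in the example above.

```cmake
# ---- Top level CMakeLists.txt ----
set(GENERATED_CPP ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp)

add_custom_command(
    OUTPUT  ${GENERATED_CPP}
    COMMAND mytool
              ${CMAKE_CURRENT_SOURCE_DIR}/someInputFile.cpp.in
              ${GENERATED_CPP}
    DEPENDS someInputFile.cpp.in
)

# A custom target gives other directories something to depend on
add_custom_target(generate_sources DEPENDS ${GENERATED_CPP})

add_subdirectory(app)

# ---- app/CMakeLists.txt ----
# The custom command is not visible in this directory scope, so tell CMake
# explicitly that the file will be created at build time...
set_source_files_properties(${GENERATED_CPP} PROPERTIES GENERATED TRUE)

add_executable(otherapp other.cpp ${GENERATED_CPP})

# ...and ensure the generation step runs before otherapp is built
add_dependencies(otherapp generate_sources)
```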

This method can also be used when the tool performing the generation is itself a build target. If the command to run is an executable target, CMake will substitute the location of the binary automatically. It will also set up a target-level dependency which ensures the executable is built before the custom command runs, but not a file-level dependency that would re-run generation every time that target is rebuilt (to get that behavior, the executable has to be listed in DEPENDS as well).

cmake_minimum_required(VERSION 3.0)
project(MyProj VERSION 2.4.3)

add_executable(generator generator.cpp)

add_custom_command(
    OUTPUT  generated.cpp
    COMMAND generator
              ${CMAKE_CURRENT_SOURCE_DIR}/someInputFile.cpp.in
              ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
    DEPENDS generator someInputFile.cpp.in
)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
)

Closing Remarks

The most effective way to generate content to be used as source files depends on a number of factors. If the file contents can be generated at configure time, this is often the simplest approach. The main drawback to a configure-time copy is that if the configure stage is not fast, re-running CMake every time the input file changes can be annoying. Generating source files at build time is preferable where the generation is expensive or where it requires a tool that is itself built as part of the project. This lets CMake handle dependencies between the tool and the generation steps as well as being more suitable for parallel builds (in comparison, the configure stage is inherently non-parallel).

One scenario not covered in the above is where one of the files being generated is a file read in by CMake as part of the configure stage. A generator may produce a CMakeLists.txt file, for example, which means the generator has to exist at configure time before any build has been performed. If the generator is built by the project, a chicken-and-egg situation results. Handling this sort of scenario requires more complicated techniques, with one effective solution being documented here. Alternatively, if the generator can be factored out into its own project, a more traditional superbuild approach using ExternalProject may be suitable.
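
A rough sketch of what such a superbuild might look like is given below. It assumes a hypothetical layout in which the generator and the main project live in generator/ and main/ subdirectories, each with its own CMakeLists.txt, that the generator project installs its executable to bin, and that the main project locates it with find_program(). None of these details come from the article.

```cmake
cmake_minimum_required(VERSION 3.0)
project(MySuperbuild NONE)   # The superbuild itself compiles nothing

include(ExternalProject)

# Configure, build and install the generator first
ExternalProject_Add(generator
    SOURCE_DIR  ${CMAKE_CURRENT_SOURCE_DIR}/generator
    INSTALL_DIR ${CMAKE_CURRENT_BINARY_DIR}/tools
    CMAKE_ARGS  -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
)

# The main project is configured only after the generator has been installed,
# so the generator already exists during the main project's configure stage
ExternalProject_Add(mainproject
    SOURCE_DIR      ${CMAKE_CURRENT_SOURCE_DIR}/main
    DEPENDS         generator
    CMAKE_ARGS      -DCMAKE_PROGRAM_PATH=${CMAKE_CURRENT_BINARY_DIR}/tools/bin
    INSTALL_COMMAND ""
)
```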


18 thoughts on “Generated Sources In CMake Builds”

Neil

February 8, 2018 at 9:49 pm

Very useful indeed. Thanks a lot. I’ve now got something which takes my GLSL shaders and puts them into a map inside the C++ code, thus solving my “how to embed my shaders in the executable” problem nicely.
Joe

March 9, 2018 at 5:25 am

Hi,

I stumbled across this trying to solve this very problem when porting a C++ project to Linux. This looks to be exactly what I want but unfortunately none of this works for the static library I’m trying to build.

My C++ static library project has a pre-build step that runs a custom tool on a file named “myfile.tra” to produce an output named “myfile.c”. The “myfile.c” is supposed to be compiled into the static library.

But even after trying your approach, any attempt to reference that generated file “myfile.c” in the subsequent add_library() statement (for the static library) fails completely. CMake sees that it is not there (when CMake is running, that is) and complains. It doesn’t appear to be smart enough to realize that the previous add_executable() and add_custom_command() are supposed to create it. Not sure what I am missing.

add_generator(generator myfile.tra)
add_custom_command(
    OUTPUT myfile.c
    COMMAND generator ${TOOLSDIR}/gentool --out ${CMAKE_CURRENT_LIST_DIR}/myfile.c
    DEPENDS generator myfile.tra
)
add_library(mylibrary static ${CMAKE_CURRENT_LIST_DIR}/myfile.c)
... rest of the CMake stuff for mylibrary ...

I have verified that my tool generation command is correct. I am using CMake 3.5 if that matters.

Am I missing something obvious?
- Craig Scott
  
  March 9, 2018 at 7:25 am
  
  By specifying OUTPUT myfile.c, you are saying it generates myfile.c in the current binary directory, but your call to add_library() is expecting it in the current source directory. Update your add_library() call to use CMAKE_CURRENT_BINARY_DIR instead of CMAKE_CURRENT_LIST_DIR and do the same for the actual command you give to add_custom_command(). You should not generally be generating files into the source directory anyway; aim to keep your source dir clean and free of build artefacts.
  
  I’m also not sure if CMake understands static in lowercase in the call to add_library(). Even if it does, it is very unusual. It would always be STATIC in uppercase in all the examples and docs I’ve seen.
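  
  Applied to the snippet in the question, those changes might look like the sketch below. The tool arguments are kept exactly as posted, since only the tool’s author knows what it expects; just the output location and the STATIC keyword differ.
  
  ```cmake
  add_custom_command(
      OUTPUT  myfile.c   # Relative, so treated as relative to CMAKE_CURRENT_BINARY_DIR
      COMMAND generator ${TOOLSDIR}/gentool --out ${CMAKE_CURRENT_BINARY_DIR}/myfile.c
      DEPENDS generator myfile.tra
  )
  
  add_library(mylibrary STATIC ${CMAKE_CURRENT_BINARY_DIR}/myfile.c)
  ```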
Al

September 12, 2018 at 5:38 am

Very good information, that expands on the topics touched here, can be found in this post:

https://samthursfield.wordpress.com/2015/11/21/cmake-dependencies-between-targets-and-files-and-custom-commands/

(Highly recommended, since it explains concepts, which the official cmake documentation just points out in obfuscated manner. Here’s an example: the “super-bad” official cmake-documentation on add_custom_command states: “Do not list the output in more than one independent target that may build in parallel or the two instances of the rule may conflict (instead use the add_custom_target() command to drive the command and make the other targets depend on that one)”.
— Check the above link for a more readable and understandable explanation of what this means!)

PS: Just acquired your book. I like your writing. Clean, understandable and hands-on. In other words: totally different to the official cmake-documentation and the rather bad book called “Mastering CMake”.
Dale Barnard

May 24, 2019 at 4:26 am

This is so great. I had been struggling with using SWIG with CMake, and this helped me see the light. Thanks!
Kai-Uwe Behrmann

June 29, 2019 at 10:01 pm

Thanks. Your article with the examples gave me the valuable hints for integration of code generator output into the build process. So each time the related sources change, the generated code is checked.
Martin

September 20, 2019 at 1:45 am
I was struggling to find a way to express the dependency when the generated files are in a subdirectory:

For example there is a command “cmd input.xml” that generates “generated.cpp”, but input.xml and generated.cpp are in a sub directory to the one that contains the CMakeLists.txt file
```
add_custom_command(OUTPUT subdir/generated.cpp
                   COMMAND $(CMAKE_CMD)
                   ARGS -E chdir subdir cmd input.xml
                   MAIN_DEPENDENCY input.xml
)

add_library(my_lib MODULE
  subdir/generated.cpp
)
```
would not work for me (it would not run the custom command, or would tell me that the source subdir/generated.cpp did not exist), but using a macro like this:
```
set(GENERATED subdir/generated.cpp)
add_custom_command(OUTPUT ${GENERATED}
                   COMMAND $(CMAKE_CMD)
                   ARGS -E chdir subdir cmd input.xml
                   MAIN_DEPENDENCY input.xml
)

add_library(my_lib MODULE
  ${GENERATED}
)
```
works.

Do you think this is a CMake bug, or is this behaviour intended?
Craig Scott

September 20, 2019 at 5:36 am
The first example failing is expected. The reason is related to the use of relative paths for the generated file. The custom command produces the file in the build directory, but the add_library() command adds a source file expected to be in the source directory. The correct version should look like this:
```
add_custom_command(OUTPUT subdir/generated.cpp
                   COMMAND $(CMAKE_CMD)
                   ARGS -E chdir subdir cmd input.xml
                   MAIN_DEPENDENCY input.xml
)

add_library(my_lib MODULE
  ${CMAKE_CURRENT_BINARY_DIR}/subdir/generated.cpp   # This is the changed line
)
```
The second example succeeding is unexpected. I don’t have an immediate answer for why it does at the moment, but if I find a reason for it, I’ll add a further comment later.
Dennis Ferron

May 19, 2020 at 4:08 pm

I’m adding code generation to my game engine project. From another blog (https://blog.kangz.net/posts/2016/05/26/integrating-a-code-generator-with-cmake/) I learned of the idea of having the code generator itself print out its outputs and dependencies when given --print_outputs or --print_dependencies command line arguments. So the generator is first queried with two execute_process commands to find out what the dependencies and outputs are, and then the add_custom_command can be configured with that information.

The problem I’m running into with that approach is that (I think) the execute_process command runs at configure time and/or isn’t aware that its COMMAND argument depends on the code generator sub-project being built first. If the code generator fails to build, running execute_process to get dependencies fails, and then I can’t build from the top level. (That is, I can build the generator manually (because it does not include add_custom_command) but I can’t build all (because a CMake script with add_custom_command is included from the top level.))

The difference is that in the example from the blog I linked to, the code generator executable was an external tool. In my situation the code generator is also built in the same CMake project. Thankfully I don’t think I need to resort to a nested CMake invocation. A possible solution I just thought of is:

The tool which is run with execute_process to --print_outputs or --print_dependencies does not HAVE to be the same executable as the code generator. I can factor that out into a simpler executable or script which just understands how to find my template files and knows what files the code generator will make. It can change less frequently than my code generator executable so wouldn’t trigger a chicken and egg problem on rebuilds (though first build might be a problem).

In fact, I’d prefer to just have the template files (inputs) for the code generator also in my CMake project. What I really need is a way to tell CMake, run my add_custom_command on any file in the project with a particular extension.
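
One way such a per-file rule is often approximated is a foreach() loop that creates one custom command per template. The following is only a sketch: the .tpl extension, the file names, the game target, and a generator target that takes an input path and an output path are all assumptions made for illustration.

```cmake
# Hypothetical list of template files, relative to the current source directory
set(templates
    player.tpl
    enemy.tpl
)

set(generated_sources)
foreach(tpl IN LISTS templates)
    # e.g. player.tpl -> <build dir>/player.cpp
    get_filename_component(name ${tpl} NAME_WE)
    set(out ${CMAKE_CURRENT_BINARY_DIR}/${name}.cpp)

    add_custom_command(
        OUTPUT  ${out}
        COMMAND generator ${CMAKE_CURRENT_SOURCE_DIR}/${tpl} ${out}
        DEPENDS generator ${tpl}
    )
    list(APPEND generated_sources ${out})
endforeach()

add_executable(game
    main.cpp
    ${generated_sources}
)
```

The templates are listed explicitly here rather than collected with a glob, because a glob evaluated at configure time would not notice newly added files until CMake is re-run.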
Kughas

May 12, 2021 at 9:29 am

In the last example there will be a version of generator for every build type. Ideally you need generator built only in the Release configuration and then reused for all other build types.
Craig Scott

May 12, 2021 at 9:51 am

Good observation. But it would be undesirable to require that the Release configuration always be built and even if it was, it would be fragile to require it to be built before other configurations. This affects multi-config generators like Visual Studio and Xcode, but Ninja Multi-Config has some unique capabilities in this area. Ninja Multi-Config supports a CMAKE_CROSS_CONFIGS variable which can be used to coordinate how multiple configurations can be built and used in one build. It has specific features for almost exactly the scenario you describe and its docs even discuss an example that looks very similar. You can find details here:

https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html
- Kughas
  
  May 19, 2021 at 2:20 am
  
  This is great. Exactly what I was looking for. Thanks.
Claus Klein

October 22, 2021 at 6:01 pm
I had to change your example a bit, because the generator does not find the in file:
```
cmake_minimum_required(VERSION 3.10...3.21)

project(MyProj VERSION 2.4.3)

add_executable(generator generator.cpp)

add_custom_command(
    OUTPUT  generated.cpp
    COMMAND generator ${CMAKE_CURRENT_SOURCE_DIR}/someInputFile.cpp.in generated.cpp
    DEPENDS generator someInputFile.cpp.in
)

add_executable(myapp
    main.cpp
    ${CMAKE_CURRENT_BINARY_DIR}/generated.cpp
)
```
- Craig Scott
  
  October 22, 2021 at 6:43 pm
  
  Sure, this depends on how you implement your generator in the generator.cpp file. But it does probably make sense to pass the input file as a command-line argument, as in your modified version of the example. I’ll update the article shortly.
Claus Klein

October 22, 2021 at 7:12 pm

Another problem I found on Windows with the Ninja generator is:
If the commands are too long, CMake generates a batch file.

I ran into problems because the IDL compiler used is a cmd file wrapper for a Java application.

On Windows, I have to use “call idl2cpp.cmd … ” which is not portable!

Without call …, the subsequent commands in the rule are not executed, and no error is reported.

Q: Do you have any idea how this can be solved?
Q: What is the maximum command length beyond which a batch file is generated?
Q: Is this a Windows-specific problem or a general Ninja problem?
- Craig Scott
  
  October 23, 2021 at 12:25 am
  
  This would be better asked in the CMake forums
  - Claus Klein
    
    October 26, 2021 at 6:46 am
    
    https://discourse.cmake.org/t/add-custom-command-does-not-work-in-each-cases-but-there-is-no-error/4338
James Wilkinson

June 24, 2022 at 8:58 am

This was really helpful; thanks for giving great detail and working examples.