Do Not Redefine CMake Commands

September 14, 2018 (updated September 29, 2019) by Craig Scott

In this article, we take a closer look at a particular example from the popular Effective CMake talk by Dan Pfeifer. The example in question relies on undocumented behavior, the dangers of which may not be immediately obvious and can lead to infinite recursion.

In the section of the talk where package management is discussed, the following example code is presented (around 52m38s):

set(CMAKE_PREFIX_PATH "/prefix")
set(as_subproject Foo)

macro(find_package)
  if(NOT "${ARGV0}" IN_LIST as_subproject)
    _find_package(${ARGV})
  endif()
endmacro()

add_subdirectory(Foo)
add_subdirectory(App)

The intent behind the above block of CMake code is to redefine the built-in find_package() command such that it checks if the package to be found is one of those named in the as_subproject list variable. If it is in that list, then it should be assumed that the package will be added to the project directly via add_subdirectory() and the find_package() call should do nothing. Otherwise, the call should be forwarded to the original built-in find_package() command. It relies on undocumented behavior to call through to the original implementation using _find_package().

(The following explanation is largely extracted from the Functions And Macros chapter of the book Professional CMake: A Practical Guide)

When function() or macro() is called to define a new command, if a command already exists with that name, the undocumented CMake behavior is to make the old command available using the same name except with an underscore prepended. This applies whether the old name is for a built-in command, a custom function or a macro. If a command is only ever overridden once, techniques like the one in the example above appear to work, but if the command is overridden again, the original command is no longer accessible. The prepending of one underscore to “save” the previous command only applies to the current name; it is not applied recursively to all previous overrides. This has the potential to lead to infinite recursion, as the following contrived example demonstrates:

function(printme)
    message("Hello from first")
endfunction()

function(printme)
    message("Hello from second")
    _printme()
endfunction()

function(printme)
    message("Hello from third")
    _printme()
endfunction()

printme()

One may naively expect the output to be as follows:

Hello from third
Hello from second
Hello from first

But instead, the first implementation is never called because the second one ends up calling itself in an infinite loop. When CMake processes the above, here is what occurs:

1. The first implementation of printme() is created and made available as a command of that name. No command by that name previously existed, so no further action is required.
2. The second implementation of printme() is encountered. CMake finds an existing command by that name, so it defines the name _printme to point to the old command and sets printme to point to the new definition.
3. The third implementation of printme() is encountered. Again, CMake finds an existing command by that name, so it redefines the name _printme to point to the old command (which is the second implementation) and sets printme to point to the new definition.

When printme() is called, execution enters the third implementation, which calls _printme(). This enters the second implementation, which also calls _printme(). However, _printme() now points at the second implementation itself, so infinite recursion results. Execution never reaches the first implementation.
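
For contrast, here is a minimal sketch of the single-override case, where the chain does terminate. It assumes a CMake version that exhibits the undocumented underscore-prefix behavior described above, and it can be run in script mode with cmake -P (the file name is arbitrary).

```cmake
# single_override.cmake (hypothetical file name)
# Run with: cmake -P single_override.cmake

function(printme)
    message("Hello from first")
endfunction()

# Overriding once: the previous definition is still reachable as
# _printme() because of CMake's undocumented renaming behavior.
function(printme)
    message("Hello from second")
    _printme()   # resolves to the first implementation
endfunction()

printme()
```

With only one override, _printme() still refers to the first implementation, so the second message is followed by the first and the call returns. Adding a third definition is what rebinds _printme to the second implementation and produces the recursion.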

For the find_package() example mentioned earlier, the implications of find_package() being redefined more than once are catastrophic:

- The original find_package() command becomes permanently inaccessible.
- Any attempt to call find_package() will result in infinite recursion if _find_package() is called by the new implementation.

In all fairness, Dan made clear in his talk that he envisioned the code sample in question only ever being executed once, as part of a package manager. The find_package() macro redefinition would be inserted by a package manager as part of how it incorporated external dependencies under its control. Even if find_package() were only redefined once, though, it would still be relying on undocumented CMake behavior which may be modified or removed completely in a future version. Reliance on such behavior should be discouraged and, as the above discussion shows, the technique is not safe to use in general.

In a future article, we will also look at another aspect of this example to discuss some potential traps when forwarding command arguments. Those gotchas will add further weight to the recommendation to avoid overriding existing commands.
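
One way to get similar behavior without relying on the underscore-prefixed name is to wrap find_package() in a separately named command rather than redefining it, an option raised in the comments below. The following is a minimal sketch for a CMakeLists.txt, not a recommendation for public projects. The names myproj_find_package and MYPROJ_AS_SUBPROJECT are hypothetical, Bar stands in for any package that is not built as a subproject, and the plain ${ARGV} forwarding is subject to the argument-forwarding traps mentioned above.

```cmake
cmake_minimum_required(VERSION 3.5)
project(WrapperSketch LANGUAGES NONE)

# Hypothetical names: neither the variable nor the macro below is part
# of CMake or of the talk. They exist only for this illustration.
set(MYPROJ_AS_SUBPROJECT Foo)

# A macro rather than a function, so that variables set by
# find_package() (for example Bar_FOUND) land in the caller's scope.
macro(myproj_find_package)
  if(NOT "${ARGV0}" IN_LIST MYPROJ_AS_SUBPROJECT)
    # Calls the real, documented command under its own name.
    # Unquoted ${ARGV} drops empty arguments and splits list values.
    find_package(${ARGV})
  endif()
endmacro()

myproj_find_package(Foo REQUIRED)  # skipped: Foo is in MYPROJ_AS_SUBPROJECT
myproj_find_package(Bar QUIET)     # forwarded as find_package(Bar QUIET)
```

Because find_package() itself is never redefined, no _find_package() name is involved and repeated inclusion of this code cannot trigger the recursion described earlier. The cost is that every project involved has to call the wrapper instead of the standard command.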

Have a CMake maintainer work on your project

Get the book for more CMake content

6 thoughts on “Do Not Redefine CMake Commands”

Daniel Michael Russell
May 3, 2019 at 2:31 am
What sucks is that there are only a handful of talks like this, and they all approach the problem slightly differently. Meanwhile, as a new user of cmake, building an open source project, I almost immediately ran into this (rather severe) weakness of cmake and have been looking for how to resolve it, but keep getting confused by the slight differences (hacks) in approach by these talks, as well as an almost total blackout on the Internet.
It seems to me, in conclusion, that these are FUNDAMENTAL and unsolved problems.
I would like 2 things to be added to CMAKE officially, so that we are not hacking around and creating trouble just to do the basics:
1) An add_subdirectory command that allows for a few extra options, including the ability to do the same stuff as ExternalProject_Add, except that it
a) works at CONFIGURE time to download the directory from GIT etc.
b) takes an extra parameter to prefix all targets in the subdirectory with a namespace, preventing namespace pollution
one possible way to do this is to change/copy/modify ExternalProject_Add so that it supports use during configure time via an option, and then use a standardized naming system such that the system can tie find_package together with ExternalProject_Add using the name given as the first parameter given to it. In other words, I need a way to tie ExternalProject together with find_package so they work together.
2) Upgrade find_package to support subdirectory adding and working with target names instead of old-fashioned file names to be searched on the filesystem. Add standardized variable names to control turning on subdirectory mode as well as specifying the subdir to use:
_USE_AS_SUBDIR
_SUBDIR
or something like that. The user would set these, then use the normal find_package() command which would
- Craig Scott
  May 4, 2019 at 5:09 pm
  “An add_subdirectory command that allows for a few extra options, including the ability to do the same stuff as ExternalProject_Add, except that it
  a) works at CONFIGURE time to download the directory from GIT etc.”
  See the FetchContent module.
  “b) takes an extra parameter to prefix all targets in the subdirectory with a namespace, preventing namespace pollution”
  The lack of a namespace feature for targets is something I too have wanted since working on FetchContent. I’m not aware of anyone working on such a capability at the moment though.
  “In other words, I need a way to tie ExternalProject together with find_package so they work together.”
  In CMake 3.14, I added the FetchContent_MakeAvailable() function to the FetchContent module. I named it that way specifically because I’m trying to work it towards supporting both add_subdirectory() and find_package() as ways to bring a dependency into the build. It’s not trivial though, see the (long) discussions in this issue for some of the details.
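
(Editorial sketch, not part of the original reply.) A minimal illustration of the configure-time download mentioned above, assuming CMake 3.14 or later. The choice of googletest and the v1.14.0 tag is purely illustrative; any Git repository containing a CMakeLists.txt could be substituted.

```cmake
cmake_minimum_required(VERSION 3.14)
project(FetchContentSketch LANGUAGES CXX)

include(FetchContent)

# Record where the dependency comes from. Nothing is downloaded yet.
FetchContent_Declare(
  googletest
  GIT_REPOSITORY https://github.com/google/googletest.git
  GIT_TAG        v1.14.0   # illustrative choice of dependency and tag
)

# Downloads during the configure step (if not already populated) and
# then adds the fetched source tree to the build via add_subdirectory().
FetchContent_MakeAvailable(googletest)
```

The dependency's targets are added directly to the main build without any namespace prefix, which is the limitation discussed in this thread.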
- nolange
  May 17, 2019 at 12:53 am
  I would very much like that all the C-Make Tutors work together on a set of guidelines, with clear cut how-to’s.
  That being said, I really don’t think you should use CMake for building external dependencies. You either have to do that when configuring or you lack the outputs for find_package. Both are bad solutions, and this is a limitation CMake will always have.
  It gets worse if you use a project that does that, you might not want the version the project has coded in its Makefiles.
  It gets a lot worse if you want the option to be able to use any of the following variants by choice: system installed, staged (prepared output in another directory) or built from source.
  I found another tool layer to be better fitting, and just using find_package as the binaries will be somehow available by the time CMake configures your project. Conan fits this purpose pretty well and seems the way forward.
Samuel Flis
May 26, 2020 at 2:10 am
I’ve run into this issue lately as well. Since you don’t mention any other ways of achieving the same result as Dan Pfeifer, I assume there is no reasonable workaround?
The only way I can think of is to wrap find_package in your own macro instead of redefining it, but that means your cmake files will look different, as you do not use standard cmake functions. This might be fine in a closed development environment, but if you want your project to work in the outside world, wrapping find_package doesn’t solve anything. I would like to hear your thoughts about this.
Craig Scott
May 28, 2020 at 9:21 pm
Wrapping find_package() with your own function would be fine for a private project that nothing else will consume, or where you are in control of all projects involved. This may be the case for projects in a company, for example. For open source projects, I would recommend against it because you can’t know all the different ways your project may be used, both now and in the future (including any potential improvements to CMake). Right now, there isn’t a clear path that doesn’t involve some sort of compromise. Your best bet is to stick with standard CMake commands if you can and let consuming projects decide how they want to incorporate you into their build. That avoids forcing your own methods, package managers, assumptions, etc. onto consumers.
Kuba Sunderland-Ober
October 1, 2020 at 11:22 am
This is just the tip of the iceberg. Suppose you actually use cmake for something non-trivial like adding support for a new architecture that has “rich” target and source properties and needs those to be defaulted and properly inherited – just like CMake itself does when it effectively determines the source property foo by looking up a target property foo and finally checking CMAKE_foo. In other words: one has to implement property defaulting from a CMAKE_xyz-like variable to a target property, but potentially to be overridden by a source property. The only place to do it is by tying into add_executable and add_library (not necessarily by overriding those with a macro etc., but still – it’s not documented at all how to do it and it requires reading the source to figure out). CMake has a mechanism to do all this already, but (of course) it seems too much to ask to have it actually be usable by more than just cmake developers… whereas it should be a simple task, because fundamentally there’s not much to it. Just an arbitrary design choice that puts an obstacle in people’s way.
Cmake is unfortunately full of situations where hard things are easy if cmake explicitly supports them, but otherwise quite reasonably simple things are almost infinitely hard.
Another example: you want to produce compiler or linker arguments in a format that’s not baked into the C++ generators code: there’s no way to do it. You have to write a script (cmake or otherwise) to be a shim between cmake’s generator output and the compiler, and that script then has to re-format the arguments. The price for that is that for every compiler/assembler/linker invocation you invoke an extra executable. On Windows this has a far from trivial cost and negatively affects build times when the compiler/assembler/linker itself is fast. Suppose you’re building a project that has a thousand source files… you can almost double the compilation time just by using the shim. That’s just a rather simple example. But anyone not using the compilers baked into cmake will run into it as the very first thing. And the idea that “toolchain files” somehow help with any of it is preposterous: they are the last thing you want if you wish a seamless cmake experience. When the user of the buildsystem sets the ${CMAKE_SYSTEM_NAME}, everything else should follow suit, just as it does when using the supported compilers such as gcc, clang, msvc, etc. To support that, you end up writing stuff that goes into the Platform and Compiler module subfolders, and the guidance there is precisely zero, even though there’s no technical reason not to have this stuff documented properly. And if someone were to suggest documenting it, they would be promptly dismissed on the grounds that this is an implementation detail that’s not supposed to be documented.
Moderation note: This comment has been modified from its original to remove some inflammatory remarks.
