Playing with canaries
Looking at SSP over several architectures.
In this post, we will talk about canaries, which are part of the “Stack Smashing Protector” (SSP) mechanism built into GCC (and most other modern compilers). This article aims to describe canaries and to summarize how SSP is implemented on different architectures. Developers enabling SSP should be aware of these differences when writing code meant to be built for several architectures (for example, embedded software in IoT devices). We will dig deep into the libC and the kernel to understand all the components of the canary.
The following architectures were tested in VMs (thanks to QEMU):
All using the same setup:
All the links to source code are provided within this article, and all the code excerpts can be found on elttam’s GitHub. The disassembled snippets are run against compiled versions of files on the GitHub, which you can compile to reproduce, with:
The stack protection mechanism appeared in response to widespread stack buffer overflow vulnerabilities, which started attracting a lot of attention after the famous Phrack article by Aleph One in 1996.
Starting in 2000, Hiroaki Etoh (from IBM) first suggested modifying the GCC compilation process to integrate a low-overhead mechanism protecting against stack overflows. This gave birth to “StackGuard”, the GCC stack-smashing protection still in use today. As early as 2000, several implementations were tested, and consequently attacked.
However, through constant improvement, GCC ended up implementing the “StackGuard” protection, thoroughly described in the GCC Summit paper (2003).
The goal of SSP is to give the program a way to detect that the stack has been corrupted to the point where an attacker could redirect the code flow and execute arbitrary code. To that end, a random value is inserted at the base of the stack frame of a function, like this:
In the example above, if the boundaries of Var2 are not properly checked (for example when using strcpy()-type functions) and a write runs past it towards the return address, it will corrupt the canary; the program will detect this and force a premature (but safe) exit, preventing further memory corruption and, ultimately, code execution.
As we can observe, one immediate weakness is that the canary does not prevent the corruption of the other variables of the current context (here Var1). However, compilers can rearrange the layout of the variables to prevent that.
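The detection logic can be imitated in plain C. The sketch below is only a simulation: the struct fake_frame, the DEMO_CANARY value and the function name are made up for illustration, and the real check is emitted by the compiler in the function prologue and epilogue rather than written by hand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: the canary sits right after the
 * buffer, between the buffer and whatever must be protected. */
struct fake_frame {
    char      var2[8];   /* the overflowable buffer */
    uintptr_t canary;    /* guard value checked before "returning" */
};

/* Arbitrary demo value; a real canary is random. */
#define DEMO_CANARY ((uintptr_t)0x41424300u)

/* Copies len bytes of src over the frame (clamped to the frame size so the
 * demo itself stays well-defined) and reports whether the canary survived.
 * Returns 1 if the canary is intact, 0 if it was corrupted. */
int copy_and_check(const void *src, size_t len)
{
    struct fake_frame frame;

    memset(&frame, 0, sizeof(frame));
    frame.canary = DEMO_CANARY;          /* "prologue": install the canary */

    if (len > sizeof(frame))
        len = sizeof(frame);
    memcpy(&frame, src, len);            /* unchecked copy starting at var2 */

    return frame.canary == DEMO_CANARY;  /* "epilogue": verify the canary */
}
```

A 4-byte copy stays inside var2 and leaves the canary intact, while a 12-byte copy spills over it and is detected.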
The SSP is enabled at two levels:
1 - during compilation, the compiler inserts a canary check stub. The following options are supported by any recent compiler:
* -fstack-protector (since GCC 4.1): includes a canary when a function defines an array of char with a size of 8 bytes or more
* -fstack-protector-all: adds a canary for all non-inline functions
* -fstack-protector-strong (since GCC 4.9): provides a smarter way to protect any sensitive location within the current context (the best description can be found on Kees Cook’s blog)
2 - the protection takes effect when the binary is loaded by the loader (ld.so).
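To make the difference between the three options more concrete, here is a small sketch with one function per case. The function names are hypothetical, and the comments describe the behaviour we would expect from GCC at -O0; optimisation levels and other compilers may decide differently, so it is worth confirming with a disassembler.

```c
#include <string.h>

/* Has a char array of 8+ bytes: expected to be instrumented by
 * -fstack-protector, -fstack-protector-strong and -fstack-protector-all. */
size_t with_char_buffer(const char *s)
{
    char buf[16];

    strncpy(buf, s, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return strlen(buf);
}

/* Only a small int array: expected to be skipped by -fstack-protector,
 * but picked up by -strong (any local array) and by -all. */
int with_int_array(int x)
{
    int pair[2] = { x, x + 1 };

    return pair[0] + pair[1];
}

/* No array and no address-taken local: only -fstack-protector-all
 * is expected to add a canary here. */
int plain(int x)
{
    return x * 2;
}
```

Compiling this file once per option and looking for calls to __stack_chk_fail in each function is a quick way to see which ones received a canary.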
The original paper defines 3 possible canary types:
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/terminator_canary.png?raw=true" width="100%">
In practice, the presence of a canary within an ELF can be checked by looking for the __stack_chk_fail@plt symbol, which is the PLT entry for the procedure invoked should the canary be tampered with. Some tools (like checksec.sh or pwntools) can also be used.
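As a rough illustration of that check, the sketch below scans a file for the string __stack_chk_fail. This is a crude heuristic of our own (the function names are hypothetical): it does not parse the ELF symbol tables, so it can be fooled by a stray string, and it will miss stripped static binaries.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the byte string `needle` occurs in buf[0..len), else 0. */
int buffer_contains(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0 || len < n)
        return 0;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    return 0;
}

/* Crude heuristic: 1 if the file mentions __stack_chk_fail anywhere,
 * 0 if it does not, -1 if the file cannot be read. */
int file_mentions_stack_chk_fail(const char *path)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *data = NULL;
    long size = 0;
    int found = -1;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0, SEEK_END) == 0 && (size = ftell(fp)) >= 0 &&
        fseek(fp, 0, SEEK_SET) == 0 &&
        (data = malloc((size_t)size + 1)) != NULL &&
        fread(data, 1, (size_t)size, fp) == (size_t)size)
        found = buffer_contains(data, (size_t)size, "__stack_chk_fail");
    free(data);
    fclose(fp);
    return found;
}
```

For anything beyond a quick look, the dedicated tools mentioned above remain the better option since they inspect the actual symbol tables.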
To be secure, the canary must ensure at least the following properties:
Let’s examine that!
The glibc manipulates the canary through a global variable called __stack_chk_guard.
By reading the source code of glibc-2.24, one can quickly see when the userland canary* is set up. The canary is generated by the loader, through the following calls:
*Note: the reason I specified “userland canary” is because the Linux kernel uses another (and different) canary to protect against stack overflow within the kernel. However, for simplicity, canary will always refer to userland canary for now. We will cover the kernel-land canary later in this article.
We can observe that the function _dl_setup_stack_chk_guard() can create all the canary types mentioned earlier: if dl_random is null, then __stack_chk_guard will be a “terminator canary”, otherwise a “random canary”.
In practice, on recent glibc, dl_random is never null (we will understand why later on), and so the canary is only a (mem-)copy of it, with its least significant byte being nullified.
This operation is done to force the termination of a C-string, and make the canary harder for attackers to overwrite. On the other hand, it also reduces the number of possible values for the canary to 2^((sizeof(register)-1)*8).
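The “random canary” branch can be sketched in a few lines. This is a simplified imitation of what the loader does, not the glibc code itself, and the function name canary_from_random is ours.

```c
#include <stdint.h>
#include <string.h>

/* Copies one register-sized word out of the 16 random bytes, then zeroes the
 * byte stored at the lowest address (which is the least significant byte on
 * little-endian machines). */
uintptr_t canary_from_random(const unsigned char rnd[16])
{
    uintptr_t guard;

    memcpy(&guard, rnd, sizeof(guard));
    ((unsigned char *)&guard)[0] = 0;
    return guard;
}
```

On a 64-bit target this leaves 7 random bytes, i.e. 2^56 possible canaries, and 2^24 on a 32-bit one.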
The canary is roughly a (mem)copy of _dl_random, which, according to a vague description, is populated by the kernel. Let’s see how it’s done:
_dl_aux_init() is called by LIBC_START_MAIN(), itself called by _start, which is the ELF entry point in userland from the kernel, as defined by the System V R4 ABI (see [the x86_64 ABI](https://refspecs.linuxbase.org/elf/x86_64-abi-0.21.pdf), page 25 and onward). _dl_aux_init() is the function in charge of handling, in userland, the values passed from the kernel through the “Auxiliary Vector”.
An Auxiliary Vector is an ELF structure that aims to provide information from the kernel to the application. Note that this structure must be present but can be empty. If not empty, it provides information essentially in the form of an associative array, whose keys can be found in the manpage of getauxval(). Among other valuable information we find:
The AT_RANDOM value can be found in the libC:
The Auxiliary Vector can be dumped directly from the terminal by invoking the target binary with the environment variable LD_SHOW_AUXV set:
The same information is exposed through the procfs structure (/proc/<pid>/auxv):
Note: Some hardened kernels/systems will not expose this information.
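For readers who prefer code to shell one-liners, here is a minimal sketch of a parser for that procfs file. It assumes the reader and the target process have the same word size, so that each entry is a pair of unsigned long; the struct and function names are ours.

```c
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* One entry of /proc/<pid>/auxv: a (type, value) pair of native words. */
struct auxv_pair {
    unsigned long type;
    unsigned long value;
};

/* Returns the value stored for `type` in the auxv file at `path`
 * (e.g. "/proc/self/auxv"), or 0 if the file or the entry is missing. */
unsigned long auxv_lookup(const char *path, unsigned long type)
{
    struct auxv_pair entry;
    unsigned long value = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    while (fread(&entry, sizeof(entry), 1, fp) == 1) {
        if (entry.type == AT_NULL)      /* end-of-vector marker */
            break;
        if (entry.type == type) {
            value = entry.value;
            break;
        }
    }
    fclose(fp);
    return value;
}
```

Calling auxv_lookup("/proc/self/auxv", AT_RANDOM) should return the same address as getauxval(AT_RANDOM) when procfs is available.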
Using the greetz.c test file compiled with -fstack-protector, we can also use GDB to confirm this.
Bingo, we have a perfect match: rax contains the value of the first 8 bytes of the AUXV AT_RANDOM! This information is useful, because now we have an easy way to determine the canary of any process.
The file read_canary_from_pid.c provides a Proof-of-Concept for this attack:
Quick note: this code will work on all recent Linux systems, for all architectures, as long as the system supports the process_vm_readv syscall and exposes the process’ Auxiliary Vector. A Python version is also provided, which relies solely on procfs information.
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/canary-dump.png?raw=true" width="100%">
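The memory-reading half of such a tool can be sketched as follows. Note that this is not the PoC from the repository: it takes the procfs route (/proc/<pid>/mem with pread) rather than process_vm_readv, and the function name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads len bytes at virtual address addr of process pid through
 * /proc/<pid>/mem. Returns 0 on success, -1 on any failure (missing procfs,
 * insufficient permissions, unmapped address, short read). */
int read_process_memory(pid_t pid, uintptr_t addr, void *out, size_t len)
{
    char path[64];
    ssize_t got;
    int fd;

    snprintf(path, sizeof(path), "/proc/%ld/mem", (long)pid);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    got = pread(fd, out, len, (off_t)addr);
    close(fd);
    return (got >= 0 && (size_t)got == len) ? 0 : -1;
}
```

Fed with the AT_RANDOM address of the target, a 16-byte read returns the random buffer; keeping the first register-sized word and clearing its least significant byte should then give the canary.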
This means that if a process allows reading arbitrary files (for example through a Directory Traversal vulnerability on a Web server), it is possible to retrieve the canary this way, provided you can seek through the file descriptor. For example, if targeting an HTTP server, the leak would look something like:
1 - read /proc/self/auxv to get the AT_RANDOM location
2 - read /proc/self/mem and force an lseek access to reach the location found above via the HTTP header Range (for instance Range: bytes=<0xAT_RANDOM_ADDRESS>-<0xAT_RANDOM_ADDRESS+16>)
3 - truncate the result to sizeof(register)
4 - nullify the least significant byte (data &= ~0xff)
5 - and you have the canary, without even knowing the __stack_chk_guard location in memory!
That’s pretty cool, but back to our business. Right now, what we really want to know is how the canary gets populated.
So far, we only know where the canary gets its value from, but we do not know how the 16-byte (or 128-bit) location pointed to by the Auxiliary Vector AT_RANDOM gets filled.
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/deeper.jpg?raw=true" width="100%">
The creation of a new process goes way beyond the purpose of this article, so we will simply cover the part that interests us. Note that there are plenty of excellent resources covering this topic.
When sys_execve is called, the kernel will prepare the new process. If the executable is an ELF, it will call load_elf_binary(), which will in turn call create_elf_tables().
It is this function that will populate the 16-byte buffer k_rand_bytes with random data, and expose it to the user.
And finally, it will create the Auxiliary Vector entry for AT_RANDOM:
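Finally, we know exactly how the canary gets its value, which we can summarize here:
1 - 16 random bytes are generated by the kernel during sys_execve, using the function void get_random_bytes(void *buf, int nbytes),
2 - their address is passed to userland through the Auxiliary Vector entry AT_RANDOM,
3 - the _dl_random global is pointing to this location,
4 - the first sizeof(register) bytes, with the least significant byte nullified, are memcopy-ed into __stack_chk_guard.
There, done!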
Unfortunately for us, that does not leave us much room for attacking its randomness. Although evaluating/attacking the random generation from this function has been done in the past, we will not cover it as part of this article. This function is at the core of most (if not all) cryptographic mechanisms in Linux (cryptographic key generation, TCP sequencing, Bluetooth pairing exchange, Linux kernel canary, etc.), and even though this source of randomness is not perfect, it is considered very secure.
It is actually such a good source of entropy that developers could also rely on it for initializing the user-land random generator srand for non-forking processes.
The following C snippet could be used to that end:
This would actually be better than the traditional (and vulnerable) call:
which is still used way too much
<img src="https://github.com/elttam/canary-fun/blob/master/assets/images/github-search.png?raw=true" width="100%">
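One possible way to seed srand from AT_RANDOM is sketched below. The function name and the choice of which bytes to use are ours, not something mandated by any API. Since, as far as we know, glibc derives its stack and pointer guards from this same buffer, a seed taken from it is best kept away from untrusted parties.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>

/* Seeds srand() from the kernel-provided AT_RANDOM bytes.
 * Returns 0 on success, -1 if the auxiliary vector has no AT_RANDOM entry. */
int seed_rand_from_at_random(void)
{
    const unsigned char *rnd = (const unsigned char *)getauxval(AT_RANDOM);
    unsigned int seed;

    if (rnd == NULL)
        return -1;
    /* Assumption: use the tail of the 16-byte buffer, away from the word
     * that is copied into the stack canary. */
    memcpy(&seed, rnd + 16 - sizeof(seed), sizeof(seed));
    srand(seed);
    return 0;
}
```

The seed is fixed for the lifetime of the process image, which is why this only makes sense for non-forking processes: a forked child would replay the same sequence.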
But there is a design flaw here: as we saw, the canary value is set via the LIBC_START_MAIN function. This means that the value is only generated when a new ELF is executed and mapped in memory (via the sys_execve syscall). A regular fork, however, results in the child process systematically inheriting its canary from its parent. This weakness makes the canary inherently vulnerable to brute-force attacks (CTF players are very familiar with such attacks).
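To see why an inherited canary is so much weaker, here is a self-contained simulation of the classic byte-by-byte attack. Everything in it is hypothetical: the child_survives() oracle stands in for “overflow a forked child by N bytes and observe whether it crashes”, and CANARY_SIZE assumes a 64-bit target.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_SIZE 8  /* assumption: 64-bit target */

/* Simulated oracle: the child survives only if every canary byte that was
 * overwritten matches the real one. */
static int child_survives(const unsigned char *secret,
                          const unsigned char *guess, size_t prefix_len)
{
    return memcmp(secret, guess, prefix_len) == 0;
}

/* Recovers the canary one byte at a time, writing it to `found`.
 * Returns the number of oracle queries (i.e. children sacrificed). */
size_t bruteforce_canary(const unsigned char secret[CANARY_SIZE],
                         unsigned char found[CANARY_SIZE])
{
    size_t queries = 0;

    for (size_t i = 0; i < CANARY_SIZE; i++) {
        for (unsigned int b = 0; b < 256; b++) {
            found[i] = (unsigned char)b;
            queries++;
            if (child_survives(secret, found, i + 1))
                break;
        }
    }
    return queries;
}
```

Because each byte is confirmed independently, the worst case is 8 * 256 = 2048 attempts instead of 2^56, which is what makes forking servers such attractive targets.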
Now that we know where the canary’s value comes from, let’s spend some time trying to analyse how the canary is used at the assembly level as directed by the compiler (here GCC-6.3.0), for several architectures, starting naturally with Intel.
The GCC implementation of the canary for Intel architectures relies on the selector gs, as a quick grep (or even better, rg) will tell:
The i386 family uses segmentation to translate virtual addresses in protected mode to physical addresses. Several 16-bit selectors exist to make this mechanism possible; the most commonly known are the Code Selector (CS), Data Selector (DS) and Stack Selector (SS). Two additional selectors exist, FS and GS, without a specific purpose. GS is usually used to store TLS information. GCC uses this segment register to save the canary at the offset GS:0x14. Without TLS, its location is pointed to by the symbol __stack_chk_guard.
If we apply it to the binary greetz, the canary is copied into the current context right after the function prologue:
And the epilogue is in charge of checking whether the canary has been modified:
As seen by grep-ing GCC earlier, x86_64 uses FS instead of GS as a selector, with an offset of 0x28. This implementation choice is interesting since x86_64 has a flat memory model, and GS and FS only act as offsetting registers. However, it is used because of the following property:
Every segment register has a “visible” part and a “hidden” part. When a segment selector is loaded into the visible part of a segment register, the processor also loads the hidden part of the segment register with the base address, segment limit, and access control information from the segment descriptor pointed to by the segment selector. The information cached in the segment register (visible and hidden) allows the processor to translate addresses without taking extra bus cycles to read the base address and limit from the segment descriptor.
Source: Intel® 64 and IA-32 Architectures, Sect 3.4.3 - Segment registers
Using FS keeps the canary in the current memory layout without letting a potential attacker directly reach its address.
On Intel, the canary is stored in a readable and writable location. By inserting a simple C stub, such as
it becomes possible to read the canary’s value from inside the current process (read_canary.c is here):
Using the same movl instruction and swapping the arguments also allows re-writing it. By combining those two mechanisms (reading and writing), one can replace a forked process’ canary with an arbitrary one directly at runtime. This means that x86 developers can protect their code against brute-force attacks on the SSP with a very simple stub, and minimal performance impact.
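A read stub of that kind could look like the sketch below. It assumes GCC-style inline assembly and the TLS offsets used by GCC with glibc on Linux as far as we know (%gs:0x14 on i386, %fs:0x28 on x86_64); the function name is ours, and on any other architecture it simply returns 0.

```c
#include <stdint.h>

/* Returns the current thread's canary on x86/x86_64 Linux, 0 elsewhere. */
uintptr_t read_canary(void)
{
    uintptr_t value = 0;

#if defined(__x86_64__)
    __asm__ volatile ("mov %%fs:0x28, %0" : "=r"(value));
#elif defined(__i386__)
    __asm__ volatile ("mov %%gs:0x14, %0" : "=r"(value));
#endif
    return value;
}
```

Swapping the operands of the mov turns the same stub into a write, which is the building block for renewing the canary.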
To prove it, the file greetz-renew-canary.c was written as a Proof-of-Concept, in which the child process’ canary is replaced with a dummy value (in this case 0x4142434445464748). This code runs similarly on 32 and 64 bits.
As we can see, through a quite simple hack, we have protected our forked process against brute-force attacks! A good seed for the new canary would be to re-use another chunk of the randomly generated buffer (provided by AT_RANDOM).
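The renewal idea can be sketched as follows. This is not the code of greetz-renew-canary.c: the macros and the function name are hypothetical, the offsets are the x86 ones we believe GCC and glibc use, and the child exits immediately so that it never returns through frames that still hold the parent’s canary.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define READ_CANARY(v)  __asm__ volatile ("mov %%fs:0x28, %0" : "=r"(v))
#define WRITE_CANARY(v) __asm__ volatile ("mov %0, %%fs:0x28" : : "r"(v) : "memory")
#elif defined(__i386__)
#define READ_CANARY(v)  __asm__ volatile ("mov %%gs:0x14, %0" : "=r"(v))
#define WRITE_CANARY(v) __asm__ volatile ("mov %0, %%gs:0x14" : : "r"(v) : "memory")
#endif

/* Forks, installs fresh_canary in the child and checks that it reads back.
 * Returns 0 if the child saw the new value, 1 if not, -1 on fork failure or
 * on architectures without a TLS-based canary. */
int renew_canary_in_child(uintptr_t fresh_canary)
{
#if defined(READ_CANARY)
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        uintptr_t now = 0;

        WRITE_CANARY(fresh_canary);
        READ_CANARY(now);
        /* Never return: frames above still hold the parent's canary. */
        _exit(now == fresh_canary ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
#else
    (void)fresh_canary;
    return -1;
#endif
}
```

In a real server the child would carry on with its work from that point, only calling into new functions; returning into a frame set up before the renewal would trip the check.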
Other implementations such as RenewSSP provide a ready-to-use library to force the canary renewal upon forking. Similarly, this library uses this “hack” to update the canary values of forked processes, and works only for x86. The very nature of this hack will never allow it to be merged upstream.
Now that canaries on Intel architectures hold no secrets for us, let’s move on to other architectures and implementations.
The location of the SSP canary can be found under the symbol __stack_chk_guard, and the failure procedure (__stack_chk_fail) by its PLT location.
Let’s compile greetz.c on an ARMv6l (RaspberryPi-like) with -fstack-protector, and disassemble the (vulnerable) greetz() function. It will look something like this:
At 0x000085c0, the binary loads the canary, and stores it on the stack at 0x000085c8. A careful reader will have noticed those weird andeq instructions after the return (pop pc). The first address (at 0x00008604 greetz+80) corresponds to the address where the canary location is hardcoded by the compiler. But because it is within the .text segment, GDB assumes it is code and disassembles it as such, whereas it is really an address.
The canary is written in BSS, so its location will always be predictable unless the binary is compiled as PIE. But wait, if the compiler hardcodes a value to indicate where to find the canary, how can this work if the memory is totally randomized?
To do that, the compiler will cheat: it will hardcode an offset at the end of the function (at 0x7f558808) and, since on ARM $pc is a register like any other, it will simply add $pc to this offset to find the canary (at 0x7f55880c)!
This means that the compiler requires the .got page to be located immediately after the .text page(s). Such predictability allows attacks such as Offset2lib.
Fun fact: if only one function is to be SSP-protected, the compiler can optimize the code to strip the reference to __stack_chk_guard. The location of the canary will stay the same, but no symbol will exist.
MIPS-compiled binaries can also be protected by SSP, and the canary check implementation on MIPS is very similar to the ARM approach.
But unlike ARM, the stub inserted by the compiler points to an address in the GOT. This location holds another address pointing into a read-only location mapped by ld.so, where __stack_chk_guard is stored.
This double dereference does not really allow us to hack our way into simply updating the canary when the process forks, like we did on Intel.
Just as on ARM, the few ways to recover the canary are either brute-forcing the 2^24 possible values, or an information leak. Many home routers are MIPS-based Linux boxes, and still have many format string vulnerabilities, which can be precious for this kind of attack.
Last but not least, let’s see SSP on PowerPC. As it happens, there is not much more to say for this architecture, which is very similar to ARM and MIPS.
A page is allocated in memory as read/write, which will contain the canary.
As expected, the canary is populated the same way as described before, and the PoC read_canary_from_pid can still be used to learn the canary of a running process:
And now that we’ve covered all the major architectures, you might also be curious to know about the kernel-land canary.
Well, Linux also protects itself against overflows thanks to a per-process field called stack_canary. This field is populated very early during the kernel initialization by calling the architecture-specific function boot_init_stack_canary().
On x86, Linux uses the same function as in user-land (i.e. get_random_bytes()), and shuffles the result using the timestamp like this:
For MIPS and ARM (including AARCH64), the kernel canary uses get_random_bytes() as well, but the result is XOR-ed with the LINUX_VERSION_CODE macro:
And every fork() will generate a new kernel canary for the current process:
Very similarly to user-land, the procedure __stack_chk_fail() will be invoked to panic() the kernel when a corruption is detected.
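The two mixing strategies can be imitated in user space as below. This is a sketch of our understanding of the kernel headers, not kernel code: the random bits, timestamp and version code are passed in as plain parameters, and all names (including DEMO_KERNEL_VERSION) are ours.

```c
#include <stdint.h>

/* Same packing as the classic KERNEL_VERSION(a, b, c) macro. */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* x86-style: add the timestamp counter and a shifted copy of it. */
uint64_t mix_with_timestamp(uint64_t random_bits, uint64_t tsc)
{
    return random_bits + tsc + (tsc << 32);
}

/* ARM/MIPS-style: XOR the random bits with the version code. */
unsigned long mix_with_version(unsigned long random_bits,
                               unsigned long version_code)
{
    return random_bits ^ version_code;
}
```

In both cases the extra term adds little entropy of its own, since a version code is public and a timestamp is guessable; the unpredictability still comes from get_random_bytes().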
In this article, we’ve tried to cover a big part of the SSP protection, namely canary generation and use. We’ve tested it across several architectures, which had us peeking down into kernel-land. Although the focus was on understanding the canary mechanism, it is important to note that SSP encompasses more mechanisms, such as local variable re-ordering, and can also be finely tuned according to specific needs (using --param=ssp-buffer-size=N with N=8 as a default).
To conclude, SSP provides fairly good protection against stack buffer overflows on all architectures tested. Developers should be encouraged to systematically provide binaries compiled with this flag. In case of doubt as to which SSP option offers the best security/performance trade-off, we recommend -fstack-protector-strong, as it provides more protection against buffer overruns by improving the traditional SSP argument re-ordering (to detect function pointers and such).
As you may have noticed reading the implementation details across all the different architectures, the SSP implementation within the C compiler is pretty much the same everywhere; the most notable exception is Intel, which uses an architecture-specific property to provide a better way to reach the canary.
So if we were to summarize the pros & cons of the use of a stack canary, we could say that:
Pros:
Cons:
* only execve() generates a new canary, forking a process does not, meaning that the canaries of forked processes may be brute-forced;
* offset2lib attacks.
Newer protections, such as SafeStack, may offer a newer/better alternative, which may just be the subject of a follow-up blog post.
Well, that’s it. I hope you’ve enjoyed reading these notes, and feel free to poke me for comments or questions.