CppCon 2018: Morris Hafner “UEFI Applications With Modern C++”

– Welcome, everyone, to my talk on writing UEFI Applications with C++ or Modern C++. My name is Morris Hafner. I moved from Germany to
Scotland to eat haggis and work on compilers. I’m a software engineer
at Codeplay Software where I work on SYCL, the Khronos standard for heterogeneous programming for C++. I’m also a post-grad student at the University of Edinburgh and unfortunately I also have a tendency to break compilers, debuggers, linkers, pretty much you name it and I like to say that it’s a curse, but it’s probably also because I want to
do very exotic things like trying to make modern C++ happen on exotic platforms like UEFI. So what is UEFI? It stands for Unified
Extensible Firmware Interface and it’s meant to be
the replacement for BIOS on the PC platform. So it was initially developed
for the Intel Itanium platform in the ’90s and you all
know how that went down, but the technology works
well and was reused for x86 and can also be used
for ARM and RISC-V computers. It has many more features
compared to BIOS. You have network access,
you have Secure Boot which was quite controversial
in the Linux community. You also have non-volatile RAM. There’s even a bytecode specification if you want to share application code between different CPU architectures, but the most interesting thing about UEFI compared to BIOS is that
you don’t need to write as much assembly code anymore. So can I get a short hand, who has written code for BIOS? Maybe a boot loader or kernel? Yeah, a couple of people. So you might be all
familiar with this then. So the BIOS was the
initialization system, so to speak, for the original IBM PC, and what happens if you press the power button, until UEFI, is that it does some
very basic initialization on your main board and then loads the first 512 bytes of code from the master boot record, jumps to some specific address,
and then you’re on your own. And even very basic things like switching from 16-bit mode to 32-bit
mode to 64-bit mode you have to do yourself, and this obviously will
require assembly code because switching to 32-bit mode isn’t covered by the C language. So it is much
lower level and you have to call interrupts instead
of having proper functions like you have in EFI. So EFI executables are just
binaries on some FAT32 partition with a very specific file system code. As you can already see,
it was heavily influenced by Microsoft with concern to organization because it uses the same binary
format as Microsoft Windows, it uses the same calling
convention as Microsoft Windows, and unfortunately it
also uses UTF-16 strings, which cause a headache. And when you press the power button on pretty much every PC
here in this very room including MacBooks, it does
initialization of your hardware like BIOS, but then it doesn’t move or jump to some address
but launches an executable which is usually located
in EFI/Boot/bootx64.efi. So this is usually the case. You may change the default. You may also have to deal
with firmware limitations like the one on my laptop
here, which hard-codes the path of the Windows boot loader, and this is an interesting exercise if you want to install
Linux on this machine. So how do we get an executable
to run on this platform? So first of all, we’re going
to need some kind of toolchain. So the thing you usually want to use is the TianoCore EDK II, which is just a truly massive framework
and contains functions for all kinds of things. It even has a port of Python 2, so if you want to run Python 2 without an operating
system, download EDK II. But I want to have
something simpler than that. So I went with
gnu-efi, which is pretty much just a set of headers, a static library, a linker script, and
some basic functionality for certain C language
features in its crt0. As for compilers, you can use MSVC. I am going to use the MinGW
gcc, but you cannot use clang. If you try to use clang,
you get a nice error message saying that freestanding COFF
executables are not supported and you get a nice internal
compiler error afterwards. Right, so the standard library. So we have a SDK. We have a compiler and now
we have a standard library, or rather we need a standard library. Unfortunately we don’t have one. EDK II ships with an implementation of the C95 standard library,
which is pretty much just the C89 library with bug
fixes and the ISO 646 keywords, so and, or, not, and so on. But we also don’t have
a C++ standard library so things like operator
new, dynamic_cast, RTTI, everything that would
require runtime support is just off the table. But in practice, and this
is very, very hacky, we can still use things
from our hosted implementation like std::array or
std::tuple or the type traits because those are just
compile time constructs that don’t have any
dependency on the runtime. So it’s hacky, but it works
and I just went with that. And with that we can finally
compile an executable. So your compiler invocation line will probably look like that. You want to invoke your compiler. You want to disable the red zone, which is an optimization in the x86-64 calling convention, and this
allows certain small functions that use less than, I believe,
128 bytes of stack space to not decrease the stack pointer. Even though you’re using stack space, you don’t move the stack pointer. But we are in a freestanding
environment here. We don’t have an operating system that would catch all the interrupts and if an interrupt occurs on a system, what happens is that the processor wants to save its data on the
stack and if we didn’t decrease the stack pointer, we would
cause memory corruption so we have to disable this optimization. Next, so after that we are
officially a freestanding environment. We have no operating system. We are in a Microsoft universe
where wchar_t is two bytes and not four bytes. We don’t have a standard library. We need to change the
entry point from main to something else because the signature is slightly different. And then we tell the
linker that we don’t have a Win32 executable. We have an EFI application
and we are signaling this by setting the subsystem of our application to 10. Otherwise Windows would just think that this was a valid Win32 application and you could launch this on Windows and it would horribly crash. I can put this in a CMake toolchain file and then we can finally compile the code. So Hello World looks like that. We include two headers, efi.h and efilib.h. Our entry point is extern “C” and uses the Microsoft ABI,
and the very important thing here is the system table, which contains all the services you can
access through EFI. For example, ConOut to print
out strings on the terminal. So I use a UTF-16 string with Hello World. Unfortunately I also need
to cast the const away because the interface is broken. But yeah, this is Hello World for EFI. And you compile this with the command line I just showed you here. And then if you want to run
it, you can download OVMF from the Internet, which is
an Open Source implementation of UEFI and is compatible with QEMU so you can just set it as the firmware for your QEMU virtual machine. You put your executable
inside of a virtual hard drive with a FAT32 partition and
then you can hopefully launch your Hello World application
inside of your virtual machine or actually on real hardware if you would like to do that instead. So I’ve already shown you the ConOut thing and there are obviously many more features than printing something
out on the terminal in EFI and if you would like to
have access to some service, you usually need to query
for its existence using GUIDs. Because, for example, if
I want to render something on the screen, I need
to have a graphics card installed on my system
and not every system has a graphics card
installed on the system so this might fail. So we need to query it for the existence of certain services and if that succeeds, we get a nice struct back which contains a bunch of function
pointers and this gives us a nice object-oriented
style interface even in C. – [Audience Member] Nice. – Yeah, nice in terms of C. We’re obviously biased
here. (audience laughs) And if you look closely
here, the signatures are very, very interesting
because we always have a return. We always return an
error code, EFI_STATUS. We have some input parameters,
we have mutable parameters, and we have some output parameters. So Windows developers might already be familiar with those IN, IN OUT, and OUT annotations and in EFI you have the same. So that’s very interesting because in C, this is probably fine and
the thing you want to do, but in C++ we are more used
to interfaces where we return all of our
actual output arguments by value and
usually don’t want to deal with any error codes. So what we would like
to have is a function that takes all the
input arguments by value or by const reference for that
matter, the mutable arguments by reference, returns
an expected of a tuple or my error code, and then I
have a much nicer interface. Because the other thing
about output parameters in C is that you need to
create your output variables on the stack right before you
make your call to the function and those values in there don’t make sense and we can’t mark those as const because we only write to
them in the next statement. So it’s just an error prone
way to call the functions. So how do we improve that strategy? I like to create a wrapper
for all of those EFI functions and for that I came up
with a function called wrap that takes an EFI function
or EFI function pointer, takes three integers
as template parameters, the number of input arguments, the number of mutable arguments, and the number of output arguments, and then I like to pass
in a function pointer and then what I get back is a new lambda,
a new function, with my fixed interface that is more C++. For this to work, I
need to do a few things. First of all, I need to get the list of all my argument types. Then, I need to split this up
so it has my input arguments, the mutable arguments, and the
output arguments separately and then I can finally
create my EFI function and then we can add some
error handling to that. For the first step we need
to get the argument list. My advice is don’t bother with it. Just use Boost.CallableTraits
as your starting step. It has a really nice
type alias called args_t where you pass in a callable. In my case, LocateDevicePath
or the type of LocateDevicePath and then it returns a tuple
with all the arguments. And then the next step would
be to split up this tuple so we have our inputs, the mutables, and the outputs separated. And we can do this with a
constexpr function like that. So my idea is that I
implement a two-way split. So given a tuple, I
want to have a new tuple that contains two tuples
where the first tuple has every element from zero
to N and the second tuple contains all elements from N to the end. And C++14 with std::make_index_sequence makes this actually very easy to do. I just call make_index_sequence with N for the first tuple and then
call make_index_sequence again from N to the end for the second tuple and forward everything
to the st_impl function that basically just copies
everything into my new tuple. Note that I actually
don’t call this function. This is all just to get the types right and here I do my final three way split. So here I’m just creating
a bunch of type aliases that don’t cause any runtime code and here if I have my argument list and I want to get the first part of all my input arguments,
I just do my split at the number of input
arguments and to get the mutable arguments
and the output arguments, I need to split the second tuple at the number of mutable arguments. And then I end up with three
meaningful type lists here, In containing the input arguments, InOut containing the mutable arguments, and Out containing the output arguments. And with those three typedefs, I can call another constexpr function
called make_out_param_adapter that takes three variadic template parameters. The sequence of inputs,
the sequence of mutables, the sequence of outputs. And it takes the EFI
function as a parameter here and returns a new lambda
that captures our EFI function, and this lambda only takes
the inputs and the mutables as arguments, but not the outputs. The outputs are just locally
allocated on the stack here. So I remove the pointer type and create a tuple of my results. Then I create a second tuple temporarily just to get pointers
to my results right here. So now I have my inputs, my mutables, and the pointers to my outputs. I concatenate those three, pass everything to std::apply, and then
can call my EFI function with my deconstructed tuple. And then I can return the
tuple of my output arguments. So I have a question. Who thinks this causes any overhead? We have one hand, two hands. Okay. So most people were
wrong, we have overhead. So here I have a function f that has a single output parameter and then I came up with a wrapper that I would probably write in C, so it returns an integer,
allocates r on the stack, and returns the result. And then the cpp function is something that is most similar to
what we just metaprogrammed, so we allocate a tuple of int and then return a tuple of int. And you can see on line 11
that we have a dead store here, so we move a zero to some address. So we have some overhead
here and the reason for this is that the tuple causes
value initialization of its elements, which means in our case that the integer is initialized to zero whereas in C this is
just an uninitialized value, but the compiler cannot optimize that out because we are wrapping
an opaque function pointer and the compiler can’t
determine that we actually only write to this memory address. We never read from it. In C, that’s easy to tell the compiler because we just say,
“Okay, it’s uninitialized, so you can just optimize it
out because it’s undefined.” In C++, we set it to zero. The compiler has to assume
that this may be read by some other function. It can’t optimize that out. How do we get around this? We can just create a small wrapper which I call uninitialized that has an empty, user-provided default constructor, and when we value
initialize this struct here, this only default
initializes our value. So initialization
unfortunately happens in C++ and we can just wrap all
of our output variables in this uninitialized
type because we are sure that we want to have uninitialized values and pass this to our function. So minor change and this
instruction goes away, yeah? – [Audience Member] If
you’d used = default instead of empty braces,
would you have gotten the same results? – I believe the overhead
still doesn’t go away because this still causes a
value to be default initialized, but initialization… Okay, Chris says no. I believe him.
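A minimal sketch of the uninitialized wrapper described above, assuming C++17. The names get_answer, defaulted and call_with_wrapper are hypothetical and only serve the illustration; the code on the slides may differ. It also contrasts the empty user-provided constructor with the = default variant from the audience question:

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical reconstruction of the wrapper described in the talk.
template <typename T>
struct uninitialized {
    // User-provided empty constructor: value-initializing this struct only
    // default-initializes `value`, so no zero has to be stored.
    uninitialized() {}
    T value;
};

// Variant from the audience question: with `= default` the constructor is
// not user-provided, so `defaulted<T>{}` zero-initializes `value`.
template <typename T>
struct defaulted {
    defaulted() = default;
    T value;
};

// Stand-in for an opaque C function with a single output parameter.
void get_answer(int* out) { *out = 42; }

int call_with_wrapper() {
    // The tuple value-initializes its element, which runs the empty constructor.
    std::tuple<uninitialized<int>> result{};
    get_answer(&std::get<0>(result).value);  // written before it is ever read
    return std::get<0>(result).value;
}

static_assert(!std::is_trivially_default_constructible_v<uninitialized<int>>);
static_assert(std::is_trivially_default_constructible_v<defaulted<int>>);
```

Whether the zeroing store actually disappears from the generated code depends on the compiler and optimization level, so this only illustrates the language rule, not a measured result.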
– Okay. That’s good, I know
you’re pressed for time. – Yeah. Okay. And now what we still need to
do is wrap the error codes. So as I said before, we don’t
have access to exceptions and even then exceptions
cause some overhead so I don’t really want to use them. Instead I want to use an expected type. So if you haven’t seen
Simon Brand’s infamous talk or Andrei’s talk from
earlier this conference, an expected type is something
like a special variant that either contains a result
value or an error value. In my case, I’m using Simon
Brand’s implementation tl::expected and now we just
need to extend our wrapper with a small if constexpr and we check if the return type of our
function is EFI_STATUS so it can return an error code. If it can return an error
code, we call our function, check the error code, check
if it’s not EFI_SUCCESS. If it’s not EFI_SUCCESS which
means there is some error, we return only the error
code as an unexpected value. Otherwise, right, we have a success. We just return our tuple
inside of the expected. Then there’s also this
case where EFI functions don’t have an error code that they return. They just return void. In this case, we can just
return our tuple directly without wrapping it in an expected. So question number two, overhead. Who thinks this causes overhead? No one. Okay. – [Audience Member] We’re
unwilling to guess at this point. (audience laughs) Uh-oh. – Oh. Okay, nevermind. (audience laughs) Um… Okay, so I can’t show you that,
but what I essentially did was I put my entire wrapper
inside of Compiler Explorer. I wrote the same wrapper
basically in C and C++ so I just called my EFI function and in C I create my output variables on the stack. In C++, I just called my
wrapper and then I check my error code in C. So in C if something failed,
I print out an error message. If it succeeded, I print another message, and I do the same in C++, and the result was that the generated assembly
code is indeed different, but only because the compiler
inverted the jump instruction. So unless you want to take
something like branch prediction into account, it doesn’t
really cause any overhead. Right, so this was the
metaprogramming part, but I still want to make
some things very clear. I made a lot of simplifications
and assumptions. So first of all, I only
have a 30 minute slot. I can’t cover everything. Secondly, I’m wrapping C and not C++. I made the assumption
that we are only wrapping fundamental types and PODs. I believe the compiler
has a much harder time optimizing things if
things are not trivially copy constructible for example. Also, the trick with the callable traits only works because we
don’t have any overloads. I’m only wrapping C and C
doesn’t have any overloads. But you may want to add
overloads by yourself with a future std::overload
when you’re writing a high-level wrapper, using
my wrapper for example. Okay, so I’ve shown you a
bunch of metaprogramming, but I still haven’t shown you how to write an application
that is not Hello World. So we could write our own kernel now or we could write our own boot loader now. You just call ExitBootServices
and the machine is your own, but I chose to render a couple of things on the framebuffer instead
and because of time I don’t think I have
the time to show that, but I can just show you
later after the talk. But yeah. First of all, I need to create an instance of the Graphics Output Protocol. So as I said before, we are not sure if our hardware actually
supports graphics. So first of all, we need
to query for the existence of some graphics adapter and we do this by first of all creating two wrappers. So we need two functions,
locate_handle_buffer and handle_protocol. And with those two
functions, we can finally create our instance. So I call locate_handle_buffer
and if that succeeds, I call handle_protocol
and if that succeeds, I cast the result of handle_protocol to EFI_GRAPHICS_OUTPUT_PROTOCOL. And the nice thing is if anything fails, I just print out fail
and I’m done with it. And the nice thing is even
though we didn’t have exceptions, I could fit my error
handling on my slides because I used expected. Okay, so with our Graphics
Output Protocol instance we can create a framebuffer
by just iterating over the available modes. So different resolutions,
different color depths, and so on. In my code I just chose
a very specific resolution that is pretty much
available on every system. If it doesn’t work, well, tough luck. I just exit. And yeah, we created our framebuffer. We can actually draw to the screen now. So in my case, I also wanted
to have some double buffering so I emulated it by just stack
allocating enough space and then implementing two functions, swap_to_screen and clear. I’m also using std::fill and std::copy because those don’t have any
runtime dependencies either. And with that I can render
things on the screen and unfortunately I still
don’t have access to the heap because I was mostly
just too lazy to do it. I could just probably reimplement malloc, but I can’t just open, say,
a glTF file from the hard drive, do some mesh processing in memory, and then render everything on my screen. I only have my stack,
but implicit surfaces give me an escape hatch. Implicit surfaces give me
a functional representation of my scene and with that,
everything is just a function, everything is stack allocated, and I
can just ray trace that scene. Unfortunately no time for details here. I can show you the demo after the talk. But yeah, this was
basically the talk on EFI so I want to cover a few more things. Everything that I’ve done
here was technically incorrect and not conformant. Partially because I made a bunch of hacks, but also because of the standard. Even if we have a standard
library available for EFI, the standard says about
freestanding environments that the available subset
of functions available is pretty much just the subset
of the C standard library. Now right now, yes? – [Audience Member] So
that’s not exactly true? – Yeah.
– The way the standard specifies the freestanding
mode is at least those headers. So your implementation
is absolutely allowed to provide more than the bare minimum that the standard specifies. – Okay, so to paraphrase, he pointed out that an implementation
is allowed to provide more features from the
C++ standard library. The standard just mandates
that it has to provide at least more or less the C subset. Was that more or less correct? – Yeah.
– Yeah, okay. So right now to fix that,
there is the SG14 meeting happening right now and one
of the papers on the agenda is Ben Craig’s proposal for
freestanding environments that tries to mandate more
classes to be available in a freestanding environment. Because when I’m writing this tuple, why am I not allowed to use the tuple when I want to make sure that it works in every freestanding environment? And there’s another interesting
paper going on right now, the Zero Cost deterministic
Exceptions by Herb Sutter and those basically reintroduce
something similar to exception specifications
and this will probably allow us to write throw
and catch statements, but everything will just
map down to a language built-in expected type, so to speak. So maybe in the future
we will still be able to throw and catch even if you don’t have access to the heap or any
other language runtime features. So yeah, those are my
references so you can check out my code here on my GitHub
at mmha/efiraytracer. If you want to learn more
about UEFI in general, I definitely can recommend checking out the OSDev Wiki on UEFI. There is the Freestanding Proposal. There is the Implementation
of Expected by Simon and yeah, that’s it. Thank you. (audience applauds) Yeah, so to my surprise we have one and a half minutes for questions. Yes, Jason? – [Jason] At the very
beginning of your EFI main, it looked like you had a C++ attribute in your extern “C” EFI
main and I was just curious if that was… – So the question was about the… Right, this gnu::ms_abi thing? – [Jason] Yes. – So the thing about GCC
is that most attributes are also available in C++ notation. So you can also write __attribute__ in GCC, but I think that’s ugly and I just chose the C++11 notation. – [Jason] So your compiler
was fine with that? – Yep.
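The two attribute spellings from this exchange, as a sketch. The function names are hypothetical, and the preprocessor guard is an assumption added here so the snippet still builds on targets where the ms_abi attribute does not apply:

```cpp
// Sketch only: add_cxx11 and add_gnu are illustrative names, not from the talk.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
// C++11 attribute notation, the style chosen for the entry point in the talk.
[[gnu::ms_abi]] int add_cxx11(int a, int b) { return a + b; }
// The older GNU notation for the same calling convention attribute.
__attribute__((ms_abi)) int add_gnu(int a, int b) { return a + b; }
#else
// Fallback for other targets: same functions with the default convention.
int add_cxx11(int a, int b) { return a + b; }
int add_gnu(int a, int b) { return a + b; }
#endif
```

Both spellings request the Microsoft x64 calling convention for the function; the difference is purely syntactic.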
– Okay. – Yes? – [Audience Member] So
one thing you can do if you use clang for it
is to build an ELF file and objcopy it. (audience laughs) It actually works and
you don’t actually have to worry about dealing with
Microsoft performance limits. – [Audience Member] You
have to manually relocate the symbols if you do that. I know because I just did that. – [Audience Member]
objcopy works if you– – So the comment was
that I can also use clang by using objcopy? That’s even more hacky, I won’t do that. – [Audience Member] That’s literally what that program was created for. – Sure. Okay, yeah? So thank you very much.
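
The wrap function described in the talk can be sketched roughly as follows, assuming C++17. This is not the speaker's code: it drops the mutable (in/out) arguments, deduces the argument list directly from the function pointer type instead of using Boost.CallableTraits, unpacks the outputs with a nested lambda instead of concatenating tuples, and uses std::variant as a stand-in for tl::expected. Status, kSuccess, Error, slice_impl, split_at and div_mod are hypothetical names for illustration:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>

// Stand-ins for EFI_STATUS and EFI_SUCCESS so the sketch builds without <efi.h>.
using Status = unsigned long long;
constexpr Status kSuccess = 0;
struct Error { Status code; };

// Declared but never defined: it is only used inside decltype to compute the
// tuple of elements [Offset, Offset + sizeof...(Is)) of Tuple.
template <std::size_t Offset, typename Tuple, std::size_t... Is>
auto slice_impl(std::index_sequence<Is...>)
    -> std::tuple<std::tuple_element_t<Offset + Is, Tuple>...>;

// Two-way split of a tuple type at index N.
template <std::size_t N, typename Tuple>
struct split_at {
    using first = decltype(slice_impl<0, Tuple>(std::make_index_sequence<N>{}));
    using second = decltype(slice_impl<N, Tuple>(
        std::make_index_sequence<std::tuple_size_v<Tuple> - N>{}));
};

static_assert(std::is_same_v<split_at<1, std::tuple<int, char, long>>::second,
                             std::tuple<char, long>>);

// Builds a lambda that takes only the inputs, creates the outputs locally,
// and returns either the tuple of outputs or the error code.
template <typename F, typename... In, typename... Out>
auto make_out_param_adapter(F fn, std::tuple<In...>*, std::tuple<Out*...>*) {
    return [fn](In... in) -> std::variant<std::tuple<Out...>, Error> {
        std::tuple<Out...> results{};
        // Unpack the result tuple and pass a pointer to each element.
        Status status = std::apply(
            [&](Out&... out) { return fn(in..., &out...); }, results);
        if (status != kSuccess) return Error{status};
        return results;
    };
}

// NumIn is the number of leading input arguments; the rest are outputs.
template <std::size_t NumIn, typename... Args>
auto wrap(Status (*fn)(Args...)) {
    using parts = split_at<NumIn, std::tuple<Args...>>;
    return make_out_param_adapter(
        fn, static_cast<typename parts::first*>(nullptr),
        static_cast<typename parts::second*>(nullptr));
}

// Hypothetical C-style function: two inputs, two outputs, status return.
Status div_mod(int a, int b, int* quotient, int* remainder) {
    if (b == 0) return 1;
    *quotient = a / b;
    *remainder = a % b;
    return kSuccess;
}
```

Under these assumptions, wrap<2>(&div_mod) yields a callable that takes the two inputs. Calling it with a non-zero divisor returns the tuple alternative, and calling it with a zero divisor returns the Error alternative. The results tuple here is value-initialized, so it still has the zeroing overhead discussed in the talk.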

3 thoughts on “CppCon 2018: Morris Hafner “UEFI Applications With Modern C++”

  1. Code like this gives template metaprogramming a bad name. Do you honestly believe this is better than the actual C-based UEFI API? IMHO, the result is needlessly complex and fugly.

  2. Good talk, full of actual commands and code, and he explains all the pitfalls. I've been meaning to get into this, and this guy basically jumpstarted my whole process. I already compiled and used the TianoCore EDK2 and got it running. The freestanding COFF executable error in Clang is sad; hopefully they can fix that, because overall we need more and better EFI utility programs to run before the OS.
