You can get single-bit bools in a C++ struct with bit-fields, e.g.
    struct foo {
        bool a:1;
        bool b:1;
        ...
    };
but you can't take a pointer to such a member.
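A minimal compilable sketch of that restriction (the elided members are dropped; this assumes any conforming C++ compiler):

    struct foo {
        bool a : 1;
        bool b : 1;
    };

    int main() {
        foo f{};
        f.a = true;          // reading and writing the bit-field is fine
        bool copy = f.b;     // copying the value out is fine
        // bool* p = &f.a;   // error: cannot take the address of a bit-field
        // bool& r = f.a;    // error: cannot bind a non-const reference to a bit-field
        (void)copy;
    }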
So storing a boolean in a whole byte is the more speed-efficient option!
(In C you can store a boolean in one bit if, for example, you need to store a great number of booleans and memory size is more important than speed.)
You're pulling in an entire cache line (64 bytes) with any load. Why would you not just turn the bool into a sum type that carries the payload with it? That way you can actually use the rest of the cache line, instead of loading 64 bytes to work with a single bit, throwing the other 511 bits in the trash, and then doing another load on top for the data.
It's even worse when you do multithreading with packed bools, because threads can keep thrashing each other's cache lines (false sharing), forcing them to wait for a load from L3 or, worse, DRAM.
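A rough sketch of both ideas, with std::optional standing in for the sum type and 64 bytes assumed as the cache-line size (the names and sizes are illustrative, not canonical):

    #include <cstdint>
    #include <optional>
    #include <thread>
    #include <vector>

    int main() {
        // A sum type (here std::optional) keeps the "present?" flag in the
        // same cache line as the payload it describes, instead of a packed
        // bool in one array and the data somewhere else.
        std::vector<std::optional<std::uint32_t>> slots(1024);
        slots[7] = 42;   // flag and payload are set together

        // Per-thread flags aligned to their own cache line, so two threads
        // flipping their own flag never invalidate each other's line.
        // 64 bytes is typical, not guaranteed.
        struct alignas(64) PerThread { bool done = false; };
        PerThread state[2];

        std::thread t([&] { state[1].done = true; });
        state[0].done = true;
        t.join();
    }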
To flip a single packed bit, the CPU has to do a read-modify-write:
1. load the byte at $address into $register
2. use whatever native instructions there are to change just that single bit in $register - in the worst case you need several of them
3. write the byte from $register back to $address
In contrast, all modern platforms have a single "store immediate (= hardcoded in the instruction encoding) value" instruction, so it's either two (set register to 0 / 1, write register to RAM) or even one (write value directly to RAM) instruction.
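In C++ terms, a hedged sketch of the two cases (buffer names and bit layout are just illustrative; the compiler picks the actual instructions):

    #include <cstddef>
    #include <cstdint>

    // Packed bit: read the byte, modify the bit, write the byte back.
    void set_packed_bit(std::uint8_t* bits, std::size_t i) {
        bits[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
    }

    // Whole-byte bool: the compiler can emit a single store of an
    // immediate value straight to memory.
    void set_byte_bool(bool* bools, std::size_t i) {
        bools[i] = true;
    }

With optimisations on, the second function typically compiles to a single `mov byte [addr], 1`, while the first still needs the load/shift/or/store dance (or something like an x86 `bts` on memory).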
Bit-packing structures (another poster showed an example here in the thread) used to be done pretty much everywhere and you'll notice it if you deal with reverse engineering code even from the Windows 98 era... but in anything more modern it's not needed because computers nowadays have more than 640 KB of RAM [1].
By the way, accessing / modifying small values in large(r) containers is a common problem in computing in general, so if it interests you, you might want to look into "SSD write amplification", "SMR HDD write amplification" or "flash wear leveling".
[1] https://lunduke.locals.com/post/5488507/myth-bill-gates-said...
At runtime, booleans are 1 byte each. The additional work of shifting and masking to get to the bit you want simply isn't worth it. When reading from memory, the CPU will read a whole cache line of 64 bytes anyway. So unless you want to process a lot of booleans (and only booleans) at once, it's simply not worth having them 1 bit each.
Because aligned data is faster to access, a single boolean sandwiched in between integers or pointers can even take up 4 or 8 bytes of memory (the rest is padding).
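For example, on a typical 64-bit ABI (the sizes in the comments are what common compilers produce, not something the standard guarantees):

    #include <cstdint>

    struct WithPointers {
        void* p;     // 8 bytes
        bool  flag;  // 1 byte + 7 bytes of padding so q stays 8-byte aligned
        void* q;     // 8 bytes
    };  // sizeof is typically 24: the bool effectively costs 8 bytes

    struct WithInts {
        std::int32_t a;     // 4 bytes
        bool         flag;  // 1 byte + 3 bytes of padding
        std::int32_t b;     // 4 bytes
    };  // sizeof is typically 12: the bool effectively costs 4 bytes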
Note: Some languages have built-in optimisations where an array of booleans is actually 1 bit each; C++'s `std::vector<bool>` is one example.

I've seen people advise to never use booleans, simply because a number (or enum) can convey more meaning while taking up the same amount of space.
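A small illustration of both points (the `FileMode` enum and `open_file` are made-up names):

    #include <vector>

    // std::vector<bool> is specialised to pack its elements into bits,
    // which is why you can't take a plain bool* or bool& into it.
    std::vector<bool> packed(1000);   // roughly 125 bytes of storage, not 1000
    // bool& ref = packed[0];         // error: operator[] returns a proxy, not bool&

    // An enum often documents intent better than a bare bool, at the same
    // size (you can pin the size with an explicit underlying type).
    enum class FileMode : unsigned char { Read, Write };
    void open_file(const char* path, FileMode mode);   // clearer than open_file(path, true)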
Other answers go into detail about why, but you have a more fundamental misconception: nearly all implementations use at least 1 byte per boolean, and the majority of those use 4 bytes.

Using 4 bytes is partly historical: x86 used to suffer from several performance pitfalls if memory accesses weren't aligned (to multiples of 4). While this generally isn't an issue on x86 any more, other architectures do still have this limitation, and ABIs designed during that era (i.e. many of the ones we use today) inherited it out of necessity. Furthermore, some instructions (including on recent x86 designs, I think) need their memory operands to be 4-byte aligned.
Memory is cheap, so... shrug.
In embedded, this can be a very different story though. There we're often working with tiny memories and lower clock speeds, and the concern is packing everything in as tightly as possible.