HACKER Q&A
📣 kureikain

Is Evented or Async I/O just polling at the end of the day?


Hi hacker,

So I was trying to understand evented I/O, asyncio, epoll, etc.

I understand that instead of waiting and blocking when doing I/O, we keep running and tell the "kernel" or "lower stack" that we're interested in knowing when the data arrives...

But how does the kernel, or the lower component that really does the I/O, "know" when the data has arrived so it can notify us? Isn't it just polling at the lowest layer? Something has to wait there and keep checking?


  👤 LowerThanZero Accepted Answer ✓
Not necessarily. See, at the hardware level the CPU can use what is called an "interrupt". In its simplest form it's a CPU pin that receives electrical signals from the outside world. The CPU can then, well... interrupt what it was doing and see what the problem is. The piece of code that runs when the interrupt occurs is called the "interrupt handler".

Say you wanna read a sector from a disk drive and your disk controller is hooked to the interrupt signal of the CPU... you tell the drive controller "gimme sector X" and then you (as an OS kernel) mind your own business doing other things. Once the platter has spun into the right position and the read head is positioned etc. etc., that is, milliseconds later, the controller gets what you asked for and issues an "interrupt" to the CPU. The kernel interrupt handler will go and fetch the desired block. Or even better, a DMA controller can do the job: it takes the data from the disk controller, moves it somewhere in memory, and only then bothers the CPU to say "The required block has arrived, your highness" ;-)

Network controllers can issue interrupts so you know a packet has arrived... keyboard controllers too and so on. Hopefully you get the idea. :)
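
A loose user-space analogy of that flow, sketched in Python: a POSIX timer signal stands in for the controller's interrupt line, and `on_alarm` plays the interrupt handler. This assumes a Unix-like system (`SIGALRM` and `setitimer` are not available on Windows). The names `on_alarm`, `work_done` and `interrupted_at` are made up for illustration, and real hardware interrupts are handled inside the kernel, not like this.

```python
import signal
import time

work_done = 0        # how much "other business" the main program got through
interrupted_at = []  # value of work_done at the moment the "interrupt" fired

def on_alarm(signum, frame):
    # Stand-in for an interrupt handler: it runs only when the signal arrives.
    interrupted_at.append(work_done)

signal.signal(signal.SIGALRM, on_alarm)
# Stand-in for "gimme sector X": ask to be signalled once, about 50 ms from now.
signal.setitimer(signal.ITIMER_REAL, 0.05)

# The main program keeps working for 200 ms and never checks whether data is ready.
deadline = time.monotonic() + 0.2
while time.monotonic() < deadline:
    work_done += 1

print("handler ran after", interrupted_at[0], "units of work; total", work_done)
```

The loop never looks at `interrupted_at`; the handler gets run in the middle of it, which is the point of the analogy.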


👤 brudgers
At the lowest layer, it's electrons moving through transistors or quantum mechanics or quarks or turtles or what have you. It's all a question of relevant abstractions. I'm not being dismissive because it is hard to accept that object oriented programs quickly become indistinguishable from functional programs as we move down the stack.

It's really a question of what our chosen abstraction allows us to take for granted. If we're worried about what we take for granted, then we need to be wary of "just" as in "just polling." Polling entails scheduling, and efficient scheduling is NP-hard. The code that implements the scheduling abstraction at the kernel layer is almost certainly of greater inherent complexity than the application-layer code that uses the asyncio abstraction, because the kernel has to deal with multiple hardware threads, speculative execution, pipelining, and several orders of magnitude of latency for lots and lots of processes pooled together.

If you're interested in diving down, I recommend The Design of the Unix Operating System by Bach. It has clear diagrams and language that show the big ideas happening below the application layer.


👤 LowerThanZero
Your question, though, raises an interesting problem: ULTIMATELY, is there a piece of code which loops while waiting for data? I guess the answer is "maybe", but it's more of a philosophical question... in my previous example the OS kernel running on the main CPU doesn't loop waiting for a disk drive block or a network packet, but the code running on the controller CPU most likely does.

So it's a matter of perspective, I guess. You don't want to poll in your code and you want to let the kernel do the job for you. Does it mean that the kernel doesn't poll? Not necessarily; the kernel itself might be forced to "poll" (e.g. the device driver can't do otherwise) or it can outsource ;-) the dirty and time-consuming polling job to a peripheral, which in turn will have an embedded CPU and firmware.

Bottom line: if you poll, it is GUARANTEED to be a waste of time and CPU cycles, but if you let the kernel do it there's a chance that the system might do a better overall job. That's because main CPU time is really expensive while peripheral CPU resources might be cheaper in the big picture, so ultimately it's not "poll vs. no poll" but rather "specialized poll vs. wasteful poll".
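
To make the "wasteful poll" side concrete, here is a minimal Python sketch comparing a busy loop on a non-blocking socket with a single blocking `select.select()` call that lets the kernel do the waiting. The helper names (`delayed_send`, `busy_poll`, `blocking_wait`) and the 100 ms delay are assumptions made up for the example; a thread writing into a `socket.socketpair()` stands in for the slow device.

```python
import select
import socket
import threading
import time

def delayed_send(sock, delay=0.1):
    """Pretend to be a slow device: deliver data after `delay` seconds."""
    time.sleep(delay)
    sock.sendall(b"block")

def busy_poll():
    """Ask 'is it there yet?' in a tight loop and count the attempts."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    device = threading.Thread(target=delayed_send, args=(writer,))
    device.start()
    attempts = 0
    while True:
        attempts += 1
        try:
            data = reader.recv(16)
            break
        except BlockingIOError:
            pass  # nothing yet, spin again
    device.join()
    reader.close()
    writer.close()
    return data, attempts

def blocking_wait():
    """Hand the waiting to the kernel and count how often we had to ask."""
    reader, writer = socket.socketpair()
    device = threading.Thread(target=delayed_send, args=(writer,))
    device.start()
    waits = 0
    select.select([reader], [], [])  # sleeps until the socket is readable
    waits += 1
    data = reader.recv(16)
    device.join()
    reader.close()
    writer.close()
    return data, waits

poll_data, poll_attempts = busy_poll()
wait_data, wait_count = blocking_wait()
print("busy poll attempts:", poll_attempts)
print("blocking waits:", wait_count)
```

Both versions receive the same bytes; the difference is how many times user code had to ask, and that the busy loop keeps a CPU core occupied the whole time.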


👤 citrin_ru
I'm not a kernel expert, but to my understanding it works in the following way: the kernel has threads which, e.g., process incoming TCP/UDP packets or send outgoing traffic. When such a thread detects an 'interesting' event, e.g. incoming data was saved into a buffer for a TCP connection, it sends a notification which causes user-space threads to wake up from sleep and process the event.

There is always polling in at least one place, the kernel scheduler, but the scheduler generally doesn't care whether sync or async IO is used (with sync IO the scheduler has more work, because there are more threads/processes for the same number of connections, but the operating principles are the same).
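
From user space, the "sleep until notified" part can be sketched with Python's standard `selectors` module: one thread registers several sockets and sleeps in `select()` until the kernel reports that one of them is readable. The three socket pairs, the `peer` helper and the delays are assumptions made up for the example, standing in for remote connections.

```python
import selectors
import socket
import threading
import time

sel = selectors.DefaultSelector()  # epoll, kqueue or select, depending on the OS
pairs = [socket.socketpair() for _ in range(3)]

# One thread watches all the "connections" at once.
for i, (reader, _writer) in enumerate(pairs):
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=i)

def peer(writer, i):
    """Stand-in for a remote client that sends a message after a short delay."""
    time.sleep(0.05 * (i + 1))
    writer.sendall(f"msg{i}".encode())

threads = [threading.Thread(target=peer, args=(w, i))
           for i, (_r, w) in enumerate(pairs)]
for t in threads:
    t.start()

received = {}
while len(received) < len(pairs):
    # No timeout: this thread sleeps until the kernel reports a readable socket.
    for key, _mask in sel.select():
        received[key.data] = key.fileobj.recv(16)
        sel.unregister(key.fileobj)

for t in threads:
    t.join()
sel.close()
for reader, writer in pairs:
    reader.close()
    writer.close()

print(received)
```

With blocking (sync) IO the same job would typically take one thread per connection, which is the extra scheduler work mentioned above.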