So I was trying to understand evented I/O, asyncio, epoll, etc.
I understand that instead of blocking while we wait on I/O, we keep running and tell the "kernel" or "lower stack" that we want to be notified when the data arrives.
But how does the kernel, or whatever lower component actually does the I/O, "know" that the data has arrived so it can notify us? Isn't it just polling at the lowest layer? Doesn't something have to wait there and keep checking?
Network controllers can issue interrupts so you know a packet has arrived... keyboard controllers too and so on. Hopefully you get the idea. :)
It's really a question of what our chosen abstraction allows us to take for granted. If we're worried about what we take for granted, then we need to be wary of the "just" in "just polling." Polling entails scheduling, and optimal scheduling is NP-hard in general. The code that implements scheduling inside the kernel is almost certainly of greater inherent complexity than application code that uses asyncio, because the kernel has to deal with multiple hardware threads, speculative execution, pipelining, and latencies spanning several orders of magnitude, all for lots and lots of processes pooled together.
If you're interested in diving down, I recommend The Design of the Unix Operating System by Bach. It has clear diagrams and language that show the big ideas happening below the application layer.
So it's a matter of perspective, I guess. You don't want to poll in your own code; you want to let the kernel do the job for you. Does that mean the kernel doesn't poll? Not necessarily: the kernel itself might be forced to "poll" (e.g. because the device driver can't do otherwise), or it can outsource ;-) the dirty, time-consuming polling job to a peripheral, which in turn has its own embedded CPU and firmware.
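Here's a rough sketch of what "letting the kernel do the job" looks like from user space, using Python's standard `selectors` module. The `wait_for_data` function and the delayed sender thread are made up for illustration; the point is only that the program registers interest once and then sleeps inside `select()` instead of checking in a loop.

```python
import selectors
import socket
import threading
import time


def wait_for_data(delay=0.2):
    """Register a socket with the kernel and sleep until it becomes readable."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)

    # DefaultSelector picks the best mechanism available (epoll on Linux,
    # kqueue on BSD/macOS, plain select elsewhere).
    sel = selectors.DefaultSelector()
    sel.register(reader, selectors.EVENT_READ)

    # Hypothetical peer: sends some bytes a little later.
    def send_later():
        time.sleep(delay)
        writer.sendall(b"hello")

    sender = threading.Thread(target=send_later)
    sender.start()

    # Blocks inside the kernel. This thread is descheduled, not spinning,
    # until the socket is readable or the timeout expires.
    events = sel.select(timeout=5)

    data = b""
    for key, _mask in events:
        data = key.fileobj.recv(1024)

    sender.join()
    sel.unregister(reader)
    sel.close()
    reader.close()
    writer.close()
    return data


if __name__ == "__main__":
    print(wait_for_data())
```

Whether anything underneath that `select()` call polls hardware is up to the kernel and the driver; the application never sees it.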
Bottom line: if you poll in your own code, it is GUARANTEED to waste time and CPU cycles, but if you let the kernel do it, there's a chance the system will do a better job overall. That's because main CPU time is really expensive, while peripheral CPU resources might be cheaper in the big picture. So ultimately it's not "poll vs. no poll" but rather "specialized poll vs. wasteful poll."
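To put a rough number on "wasteful poll", here's a small sketch that counts how many times a user-space busy loop checks a non-blocking socket and finds nothing, compared with a single blocking `select()` call. The helper names (`send_after`, `busy_poll`, `kernel_wait`) and the delay are arbitrary choices for the illustration, and the exact count will vary a lot from machine to machine.

```python
import select
import socket
import threading
import time


def send_after(sock, delay):
    """Start a thread that writes one byte to `sock` after `delay` seconds."""
    def run():
        time.sleep(delay)
        sock.sendall(b"x")

    t = threading.Thread(target=run)
    t.start()
    return t


def busy_poll(delay=0.1):
    """Spin on a non-blocking recv; return how many attempts found nothing."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    sender = send_after(writer, delay)
    wasted = 0
    while True:
        try:
            reader.recv(1)
            break
        except BlockingIOError:
            wasted += 1  # nothing there yet; CPU time spent learning that
    sender.join()
    reader.close()
    writer.close()
    return wasted


def kernel_wait(delay=0.1):
    """Block in select() until the socket is readable; return the wakeup count."""
    reader, writer = socket.socketpair()
    sender = send_after(writer, delay)
    wakeups = 0
    readable, _, _ = select.select([reader], [], [], 5)
    if readable:
        wakeups += 1
        reader.recv(1)
    sender.join()
    reader.close()
    writer.close()
    return wakeups


if __name__ == "__main__":
    print("wasted checks while busy polling:", busy_poll())
    print("wakeups while waiting in the kernel:", kernel_wait())
```

The busy loop typically racks up a large number of failed checks for one byte of data, while the `select()` version wakes up once.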
There is always polling in at least one place, the kernel scheduler, but the scheduler generally doesn't care whether sync or async I/O is used. With sync I/O the scheduler has more work, because there are more threads/processes for the same number of connections, but the operating principles are the same.
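As a rough illustration of the thread-count point, here's a sketch of an asyncio server that handles several connections on a single thread. Everything in it (the `serve_and_query` name, the uppercase-echo protocol, the loopback address) is invented for the example. A thread-per-connection server would hand the scheduler one thread per client instead.

```python
import asyncio
import threading


async def serve_and_query(n_clients=5):
    """Serve n_clients connections and record which threads ran the handlers."""
    handler_threads = set()

    async def handle(reader, writer):
        handler_threads.add(threading.get_ident())
        line = await reader.readline()  # yields to the event loop while waiting
        writer.write(line.upper())
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"client {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    replies = await asyncio.gather(*(client(i) for i in range(n_clients)))
    server.close()
    await server.wait_closed()
    return replies, handler_threads


if __name__ == "__main__":
    replies, threads = asyncio.run(serve_and_query())
    print(replies)
    print("threads used by handlers:", len(threads))
```

All handlers run on the event loop's thread, so the kernel scheduler sees one runnable thread for this process no matter how many connections are open.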