Infrastructure

Anatomy of a Hypervisor

Open the cover and look at what the hypervisor actually does — instruction trapping, shadow page tables, paravirt drivers, and the small modern miracles that make a VM feel like a host.


Every cloud abstraction we work with (Kubernetes, serverless, function runtimes, AI inference services) sits on top of a hypervisor doing something extraordinary on its behalf: making one machine look like many, trustworthily. This post opens the cover and looks at the parts. The aim is not to teach hypervisor internals in depth (books exist for that) but to give the platform engineer a working mental model of where the abstractions leak, where performance is taxed, and which parts of the layer have changed materially in the last decade.

Type-1: bare-metal hypervisors run production
VT-x/AMD-V: hardware-assisted instruction trapping
EPT/RVI: hardware-assisted memory translation
SR-IOV: hardware-assisted NIC partitioning

Five things the hypervisor is actually doing

Each layer is a small modern miracle that almost nobody thinks about anymore

1. Instruction trapping & emulation (CPU): VT-x / AMD-V intercept privileged instructions; the hypervisor decides whether to emulate, defer, or pass through.

2. Memory translation (MMU): two-stage page tables (EPT / RVI) map guest virtual → guest physical → host physical, with hardware acceleration.

3. Paravirtualised I/O (storage / net): virtio drivers in the guest talk to a backend in the hypervisor, which is far faster than emulating real hardware.

4. Hardware partitioning (direct-attach): SR-IOV NICs, NVMe namespaces, and GPU vGPU/MIG slicing give near line-rate isolation without emulation overhead.

5. Live migration (mobility): iterative dirty-page tracking and a final stop-and-copy, typically with no more than a few hundred milliseconds of guest pause.
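
The live-migration entry above (iterative dirty-page tracking followed by a final stop-and-copy) can be sketched as a small model. This is a toy under stated assumptions, not how any particular hypervisor implements it: the function name, its parameters, and the example numbers are all hypothetical, the dirty rate is assumed constant, and the model ignores device state, compression, and throttling.

```python
def simulate_precopy(mem_pages, dirty_pages_per_s, link_pages_per_s,
                     stop_copy_threshold_pages, max_rounds=30):
    """Toy pre-copy migration model. Returns (rounds, downtime_seconds)."""
    if dirty_pages_per_s >= link_pages_per_s:
        # If the guest dirties memory faster than the link can ship it,
        # iterating never shrinks the remaining set.
        raise ValueError("dirty rate must be below link rate to converge")

    to_send = mem_pages  # the first round copies all of guest memory
    rounds = 0
    while to_send > stop_copy_threshold_pages and rounds < max_rounds:
        round_time = to_send / link_pages_per_s
        # Pages dirtied while this round was in flight must be sent again.
        to_send = min(mem_pages, dirty_pages_per_s * round_time)
        rounds += 1

    # The guest is paused only while the last, small remainder is copied.
    downtime = to_send / link_pages_per_s
    return rounds, downtime


# Hypothetical guest: 4 GiB of 4 KiB pages, 25k pages/s dirtied,
# a link that moves 250k pages/s, pause once 2,000 pages or fewer remain.
rounds, downtime = simulate_precopy(1_048_576, 25_000, 250_000, 2_000)
print(f"{rounds} pre-copy rounds, {downtime * 1000:.1f} ms modelled pause")
```

The useful intuition is the ratio of dirty rate to link rate: each round shrinks the remaining set by roughly that factor, so write-heavy guests need more rounds or a longer pause.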

How the CPU layer actually works

The original challenge of x86 virtualization was that some sensitive instructions did not trap when executed outside the most privileged ring. They silently behaved differently instead of faulting (POPF is the classic example), so a naïve trap-and-emulate hypervisor could not catch them. The first commercial solutions used binary translation (rewriting guest code on the fly). Then Intel VT-x and AMD-V introduced a new execution mode that is orthogonal to the existing rings: the hypervisor runs in root mode and the guest in non-root mode. The hypervisor enters the guest with a dedicated instruction (VMLAUNCH or VMRESUME on Intel, VMRUN on AMD), and the hardware hands control back through a VM exit whenever the guest does something that needs intervention. Each VM exit is expensive (hundreds to thousands of cycles once handling is included); the engineering of a modern hypervisor is in large part the engineering of not exiting unnecessarily. Paravirtualised drivers, posted interrupts, APICv, and direct-attach hardware all exist to reduce the exit count.
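
To make the exit budget concrete, here is a back-of-the-envelope calculation. It is a sketch only: the function name and every number in it (exit rates, 2,000 cycles per exit, a 3 GHz core) are illustrative assumptions, not measurements from any hypervisor.

```python
def exit_overhead_fraction(exits_per_second, cycles_per_exit, cpu_hz):
    """Fraction of one core's cycles spent on exits and their handling."""
    return exits_per_second * cycles_per_exit / cpu_hz


CPU_HZ = 3_000_000_000      # assumed 3 GHz core
CYCLES_PER_EXIT = 2_000     # assumed round-trip cost including handling

# Hypothetical I/O-heavy guest before and after cutting exits tenfold
# (for example by moving from an emulated device to a paravirtual one).
before = exit_overhead_fraction(100_000, CYCLES_PER_EXIT, CPU_HZ)
after = exit_overhead_fraction(10_000, CYCLES_PER_EXIT, CPU_HZ)

print(f"before: {before:.1%} of a core, after: {after:.1%} of a core")
```

Under these assumptions the per-exit cost is fixed by the hardware and handler, so the exit rate is the variable a platform team can actually influence.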

How the memory layer actually works

A guest sees what it believes is its own physical memory. The hypervisor must translate that into host physical memory transparently. Originally this was done with shadow page tables in software: slow, complex, and a source of subtle bugs. Modern hardware (Intel EPT, AMD RVI, also known as NPT) implements a second-stage page table in silicon: the guest's own page tables map guest virtual to guest physical, the second stage maps guest physical to host physical, and the hardware walks both. The cost is real but bounded, and it shows up mainly as longer page walks on TLB misses: usually a few percent overhead on memory-intensive workloads, much less on most. NUMA awareness, transparent huge pages, and memory overcommit policies are the operational levers most platform teams actually tune.
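
The two-stage lookup can be illustrated with a toy model. This is a simplification under stated assumptions: each stage is a flat dictionary from page number to frame number rather than a multi-level radix tree, and all names and addresses are hypothetical. The last helper gives the commonly cited worst-case count of memory references for a nested walk, which arises because every guest page-table pointer is itself a guest physical address that needs a second-stage walk.

```python
PAGE_SIZE = 4096  # 4 KiB pages, assumed for the toy


class TranslationFault(Exception):
    """A stage has no mapping (a guest page fault or an EPT violation)."""


def translate(addr, table, stage):
    """Translate one address through one stage (page number -> frame number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise TranslationFault(f"{stage}: no mapping for {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(gva, guest_page_table, second_stage):
    # Stage 1 is owned by the guest OS, stage 2 by the hypervisor.
    gpa = translate(gva, guest_page_table, "stage 1 (guest page table)")
    return translate(gpa, second_stage, "stage 2 (EPT/RVI)")


def worst_case_walk_references(guest_levels=4, host_levels=4):
    """Memory references for a nested walk with no TLB or walk-cache hits."""
    return (guest_levels + 1) * (host_levels + 1) - 1


guest_page_table = {0x10: 0x2}   # guest virtual page 0x10 -> guest frame 0x2
second_stage = {0x2: 0x7F}       # guest frame 0x2 -> host frame 0x7F

hpa = guest_virtual_to_host_physical(0x10123, guest_page_table, second_stage)
print(hex(hpa))
print(worst_case_walk_references())
```

The walk count is one reason huge pages help virtualised workloads: fewer levels at either stage means a shorter nested walk when the TLB misses.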

How I/O escapes the emulation tax

Naïvely emulating a real network card is correct but slow, because every register access the guest makes can cost a VM exit. Paravirtual I/O (virtio in the open ecosystem, VMXNET3 in vSphere) gives the guest a driver that talks to a hypervisor backend through shared-memory queues, removing most of the emulation overhead. The next step beyond that is hardware partitioning: SR-IOV makes a single physical NIC present multiple virtual functions that the guest can drive directly, at close to line rate. NVMe namespaces play a similar role for storage, and GPU vGPU and MIG do it for compute accelerators. Each of these techniques trades flexibility for performance (a guest bound to a passed-through device is, for example, harder to live-migrate), and the choice depends on workload profile.
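
The reason a paravirtual queue is cheaper than register-level emulation is batching: the guest fills a shared queue with plain memory writes and then notifies the backend once. The sketch below models only that idea. It is not the virtio ring layout (real virtio uses a descriptor table plus available and used rings in shared memory), and the class and method names are hypothetical.

```python
from collections import deque


class ToyBackend:
    """Stands in for the hypervisor-side device backend."""

    def __init__(self):
        self.notifications = 0   # each one models a guest-to-host transition
        self.completed = []

    def notify(self, queue):
        self.notifications += 1
        while queue:             # drain everything posted since the last kick
            self.completed.append(queue.popleft())


class ToyVirtqueue:
    """Stands in for the guest driver's view of a shared queue."""

    def __init__(self, backend):
        self.backend = backend
        self.pending = deque()

    def add_buffer(self, buf):
        self.pending.append(buf)           # memory write only, no notification

    def kick(self):
        self.backend.notify(self.pending)  # one notification for the batch


backend = ToyBackend()
vq = ToyVirtqueue(backend)
for i in range(64):
    vq.add_buffer(f"packet-{i}".encode())
vq.kick()

print(backend.notifications, len(backend.completed))
```

An emulated NIC would instead pay for several trapped register accesses per packet, which is the overhead the paravirtual design avoids.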

Why the layer keeps mattering

Containers did not replace VMs. They layered on top of them. Most Kubernetes nodes in production today run inside a VM, even at the hyperscalers, because the hardware-isolation guarantees the hypervisor provides are still the cheapest way to get tenant separation, live migration, and snapshot semantics. Confidential computing extends this with AMD SEV-SNP and Intel TDX, which encrypt guest memory against the hypervisor itself and mark the modern frontier of the layer. The threat model inverts: the hypervisor becomes a less-trusted layer than the guest, and the silicon enforces the boundary.

68%: typical VM density per modern host after consolidation, up from 10–20% on bare metal; varies wildly by workload mix and over-commit policy.

What the hypervisor share looks like in Bangladesh

Indicative hypervisor share across Bangladeshi enterprise estates

VMware vSphere (BFSI, telco, government): 58%
KVM (RHV / OpenStack / Proxmox; growing post-Broadcom): 21%
Microsoft Hyper-V (Windows-heavy estates): 11%
Nutanix AHV (HCI deployments): 7%
Other: 3%

Source: Cloud Digit field observations, 2024–2025.

Where the layer is heading

Three movements deserve attention. Confidential computing (SEV-SNP, TDX) turns the hypervisor into a less-trusted layer than the guest, which inverts a quarter-century of design assumptions. Lightweight VM tech such as Firecracker and Cloud Hypervisor collapses VM startup to roughly a hundred milliseconds or less, blurring the line with containers. And unikernel patterns push the OS and the application into a single bootable image for tail-latency workloads. The hypervisor stays. What sits on top of it keeps changing.
