kvm: Intel associative TLBs

Traditional x86 architecture implicitly requires flushing the TLB on every context switch (a write to CR3), so that the incoming process's address space does not conflict with linear-to-physical translations cached for the previous process. When the MMU is virtualized with shadow page tables, throwing these cached translations away can be quite expensive.
Intel introduced the Virtual Processor ID (VPID) into its VT-x technology to tag cached translations with the virtual processor they belong to, thereby avoiding unnecessary TLB flushes (for example, on every VM entry and VM exit).
KVM uses a single global bitmap to manage VPIDs for all guests and all vCPUs. The VPID field is 16 bits wide, so the bitmap covers 65,536 identifiers (VMX_NR_VPIDS), with VPID 0 reserved for the host. When a vCPU is created, it is assigned a VPID on a first-come, first-served basis. The bitmap is protected by the vmx_vpid_lock spinlock.

 static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);  
 static DEFINE_SPINLOCK(vmx_vpid_lock);  
 ...  
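 static void allocate_vpid(struct vcpu_vmx *vmx)
 {
      int vpid;
      vmx->vpid = 0;          /* 0 means "untagged"; it is also the host's VPID */
      if (!enable_vpid)
           return;
      spin_lock(&vmx_vpid_lock);
      /* bit 0 is reserved for the host at module init, so this never yields 0 */
      vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
      if (vpid < VMX_NR_VPIDS) {
           vmx->vpid = vpid;
           __set_bit(vpid, vmx_vpid_bitmap);
      }
      /* if the bitmap is full, vmx->vpid stays 0 and the vCPU runs untagged */
      spin_unlock(&vmx_vpid_lock);
 }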

Similarly, when a vCPU is destroyed (for example, when the guest shuts down), its VPID is returned to the pool:

 static void free_vpid(struct vcpu_vmx *vmx)  
 {  
      if (!enable_vpid)  
           return;  
      spin_lock(&vmx_vpid_lock);  
      if (vmx->vpid != 0)     /* 0 means no VPID was ever assigned */
           __clear_bit(vmx->vpid, vmx_vpid_bitmap);
      spin_unlock(&vmx_vpid_lock);  
 }  
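
To make the first-come, first-served policy concrete, here is a minimal user-space sketch of the same bitmap scheme. It is not kernel code: the names (vpid_init, vpid_alloc, vpid_free, NR_VPIDS) are hypothetical, the spinlock is dropped because the sketch is single-threaded, and it assumes, as KVM does, that VPID 0 is reserved for the host and doubles as the "no VPID" return value.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors VMX_NR_VPIDS, since the VPID field is 16 bits wide. */
#define NR_VPIDS      (1u << 16)
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_VPIDS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Global bitmap: bit n set means VPID n is in use (hypothetical model). */
static unsigned long vpid_bitmap[NR_WORDS];

static bool vpid_test(unsigned n)
{
    return (vpid_bitmap[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1ul;
}

static void vpid_set(unsigned n)
{
    vpid_bitmap[n / BITS_PER_WORD] |= 1ul << (n % BITS_PER_WORD);
}

static void vpid_clear(unsigned n)
{
    vpid_bitmap[n / BITS_PER_WORD] &= ~(1ul << (n % BITS_PER_WORD));
}

/* Start with an empty pool, then reserve VPID 0 for the host. */
void vpid_init(void)
{
    for (size_t i = 0; i < NR_WORDS; i++)
        vpid_bitmap[i] = 0;
    vpid_set(0);
}

/* First-come, first-served: hand out the lowest free VPID.
 * Returns 0 when the pool is exhausted, i.e. "run without VPID tagging". */
unsigned vpid_alloc(void)
{
    for (unsigned n = 0; n < NR_VPIDS; n++) {
        if (!vpid_test(n)) {
            vpid_set(n);
            return n;
        }
    }
    return 0;
}

/* Release a VPID; 0 is never handed out, so there is nothing to clear. */
void vpid_free(unsigned vpid)
{
    if (vpid != 0 && vpid < NR_VPIDS)
        vpid_clear(vpid);
}
```

Because the search always restarts from the lowest bit, a freed VPID is the next one handed out: after allocating 1, 2 and 3 and freeing 2, the next allocation returns 2 again.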

To invalidate cached translations selectively by VPID, Intel added the INVVPID instruction. It supports the following invalidation types (for more information, see the Intel reference manual vol. 3C 2.8 – Caching Translation Information):

  • Individual address: the processor invalidates translations for a specific linear address tagged with the given VPID.
  • Single context: the processor invalidates all translations tagged with the given VPID.
  • All context: the processor invalidates all translations tagged with any VPID except VPID 0 (the host's).
  • Single context, retaining global translations: the processor invalidates all translations tagged with the given VPID, except global translations.
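
The four invalidation types can be illustrated with a toy TLB whose entries carry a VPID tag. This is a simulation sketch, not how the hardware or KVM implements it: struct tlb_entry and toy_invvpid() are hypothetical, and only the type numbering (0 to 3) is taken from the Intel manual's definition of INVVPID. Real INVVPID also fails for VPID 0 in the individual-address and single-context cases, which the toy ignores.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* INVVPID types, numbered as in the Intel manual. */
enum invvpid_type {
    INVVPID_INDIVIDUAL_ADDR            = 0,
    INVVPID_SINGLE_CONTEXT             = 1,
    INVVPID_ALL_CONTEXT                = 2,
    INVVPID_SINGLE_CONTEXT_KEEP_GLOBAL = 3,
};

/* One cached translation, tagged with the VPID that created it (toy model). */
struct tlb_entry {
    bool     valid;
    uint16_t vpid;
    uint64_t linear_page;
    bool     global;   /* e.g. a kernel mapping with the G bit set */
};

/* Apply one INVVPID-style invalidation to the toy TLB.
 * Returns the number of entries that were dropped. */
size_t toy_invvpid(struct tlb_entry *tlb, size_t n, enum invvpid_type type,
                   uint16_t vpid, uint64_t linear_page)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        struct tlb_entry *e = &tlb[i];
        bool hit = false;

        if (!e->valid)
            continue;
        switch (type) {
        case INVVPID_INDIVIDUAL_ADDR:
            hit = e->vpid == vpid && e->linear_page == linear_page;
            break;
        case INVVPID_SINGLE_CONTEXT:
            hit = e->vpid == vpid;
            break;
        case INVVPID_ALL_CONTEXT:
            hit = e->vpid != 0;          /* host entries (VPID 0) survive */
            break;
        case INVVPID_SINGLE_CONTEXT_KEEP_GLOBAL:
            hit = e->vpid == vpid && !e->global;
            break;
        }
        if (hit) {
            e->valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

Note how entries from different VPIDs coexist in the same TLB: switching from one vCPU to another needs no flush, and a single-context invalidation for one VPID leaves every other VPID's translations intact.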

Whenever KVM needs to flush a vCPU's TLB or reset a vCPU (for example, when setting up the architectural state at boot time), both standard x86 operations, it calls vpid_sync_context().

This function issues the appropriate invalidation type from those described above. The vpid_sync_vcpu_single() routine must pass vmx->vpid to specify which ID it is referring to. Both the global (all-context) and single-context paths end up calling __invvpid(), which does the actual work in inline assembly.
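
The dispatch can be approximated in user space as follows. In this sketch the CPU capability checks are plain booleans, fake_invvpid() records its arguments instead of executing INVVPID, and both the preference for a single-context flush over an all-context one and the early return for VPID 0 are assumptions about the policy, not a copy of KVM's source. The descriptor layout (VPID in bits 15:0, reserved bits 63:16, linear address in bits 127:64) follows the Intel manual's INVVPID definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed extent values, matching the INVVPID type numbers. */
#define EXTENT_SINGLE_CONTEXT 1
#define EXTENT_ALL_CONTEXT    2

/* 128-bit INVVPID descriptor (bit-field layout as laid out by GCC/Clang). */
struct invvpid_desc {
    uint64_t vpid : 16;
    uint64_t rsvd : 48;   /* must be zero */
    uint64_t gva;         /* only used by individual-address invalidation */
};
_Static_assert(sizeof(struct invvpid_desc) == 16, "descriptor is 128 bits");

/* Hypothetical stand-in for __invvpid(): record the last request
 * instead of executing the instruction. */
static int last_extent = -1;
static struct invvpid_desc last_desc;

static void fake_invvpid(int extent, uint16_t vpid, uint64_t gva)
{
    struct invvpid_desc d = { vpid, 0, gva };
    last_extent = extent;
    last_desc = d;
}

/* Single-context flush for one vCPU; VPID 0 means "untagged", so skip it. */
static void sync_vcpu_single(bool has_single, uint16_t vpid)
{
    if (vpid == 0)
        return;
    if (has_single)
        fake_invvpid(EXTENT_SINGLE_CONTEXT, vpid, 0);
}

/* All-context flush: the descriptor's VPID field is not used. */
static void sync_vcpu_global(bool has_global)
{
    if (has_global)
        fake_invvpid(EXTENT_ALL_CONTEXT, 0, 0);
}

/* Assumed policy: prefer the narrow flush, fall back to flushing every VPID. */
void sync_context(bool has_single, bool has_global, uint16_t vpid)
{
    if (has_single)
        sync_vcpu_single(has_single, vpid);
    else
        sync_vcpu_global(has_global);
}
```

Preferring the single-context type keeps the other guests' cached translations alive; the all-context fallback is correct but throws away every tagged entry on the CPU.
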
The VPID feature can be enabled or disabled with the vpid parameter of the kvm_intel kernel module; its current value is exposed at /sys/module/kvm_intel/parameters/vpid.
A while ago I proposed a patch that added tracing of VPID management, to help simulate tagged-TLB behavior and performance. Unfortunately, tracing these events for experimentation and research was not considered mainstream enough to be merged upstream. Understandable.
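
As a small how-to, the current setting can be read programmatically from sysfs. This is a hypothetical helper (parse_param_bool() and kvm_intel_vpid_enabled() are illustrative names), and it assumes boolean module parameters are printed as "Y" or "N". If the parameter is read-only at runtime on your kernel, changing it means reloading the module, e.g. with `modprobe kvm_intel vpid=0`.

```c
#include <stdio.h>

/* Interpret a boolean module-parameter value as printed by sysfs.
 * Returns 1 (enabled), 0 (disabled), or -1 if unrecognized. */
int parse_param_bool(const char *s)
{
    switch (s[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/* Read the current VPID setting; -1 if kvm_intel is not loaded
 * or the file cannot be read. */
int kvm_intel_vpid_enabled(void)
{
    char buf[8] = {0};
    FILE *f = fopen("/sys/module/kvm_intel/parameters/vpid", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_param_bool(buf);
}
```
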

Source: https://blog.stgolabs.net/2012/05/kvm-intel-associative-tlbs.html