In RHV 4.3.6, there are three migration policies: Minimal downtime, Suspend workload if needed, and Post-copy migration. This post explains them in detail.
Live VM migration mainly involves transferring the VM’s CPU, memory, and I/O state from the source KVM host to the destination KVM host. With “Minimal downtime” and “Suspend workload if needed”, the source KVM host marks all of the VM’s memory pages as dirty and transfers the memory state to the destination KVM host iteratively (pre-copy). The transfer is performed live, with the VM still running on the source host. In the first iteration, the source host transfers the entire memory of the VM to the destination; in subsequent iterations, only the pages modified since the previous iteration are transferred. Once the remaining dirty pages are few enough to be sent within the allowed downtime, the VM is paused on the source, the last pages are transferred, and the VM resumes on the destination KVM host, which by then has the latest version of all the VM’s memory pages.
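The iterative transfer described above can be sketched as a small simulation. This is only a toy model under stated assumptions, not what QEMU or VDSM actually run: the function name precopy_migrate, its parameters, and the constant dirty rate and bandwidth are all hypothetical.

```python
def precopy_migrate(total_pages, dirty_rate, bandwidth, max_downtime_ms,
                    max_iterations=30):
    """Toy model of iterative pre-copy (all parameters are assumptions).

    total_pages     -- number of memory pages in the VM
    dirty_rate      -- pages the guest dirties per second
    bandwidth       -- pages the network can transfer per second
    max_downtime_ms -- longest pause the policy currently allows
    Returns (converged, iterations, pages_sent).
    """
    # Pages that can be sent while the VM is paused for max_downtime_ms.
    downtime_budget = bandwidth * max_downtime_ms / 1000
    to_send = total_pages  # first iteration: the entire memory
    pages_sent = 0
    for iteration in range(1, max_iterations + 1):
        if to_send <= downtime_budget:
            # Few enough dirty pages: pause, send the rest, resume on target.
            pages_sent += to_send
            return True, iteration, pages_sent
        seconds = to_send / bandwidth
        pages_sent += to_send
        # Pages dirtied while this iteration was being transferred.
        to_send = min(total_pages, dirty_rate * seconds)
    return False, max_iterations, pages_sent


# A VM that dirties memory slower than the network can copy it converges.
print(precopy_migrate(1_000_000, 10_000, 100_000, 100))
# A VM that dirties memory faster than the network never converges.
print(precopy_migrate(1_000_000, 200_000, 100_000, 100))
```

In this model a migration "stalls" when an iteration does not shrink the amount of dirty memory, which is the situation the downtime schedules below react to.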
Minimal downtime
A policy that lets the VM migrate in typical situations. The VM should not experience any significant downtime. If the VM migration is not converging for a long time, the migration will be aborted. The guest agent hook mechanism is enabled.
Running engine-config -g MigrationPolicies on the engine shows:
– The max migrations in parallel: 2
– The max stalling limit is 6. If the migration is still stalling after that, it is aborted. The downtime schedule, in milliseconds, is:
- initial downtime(initialItems): 100
- stalling 1 iteration, set downtime to 150
- stalling 2 iteration, set downtime to 200
- stalling 3 iteration, set downtime to 300
- stalling 4 iteration, set downtime to 400
- stalling 6 iteration, set downtime to 500
- if still stalling, abort
MigrationPolicies-Minimal downtime: [{"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827b"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Minimal downtime","description":"A policy that lets the VM migrate in typical situations. The VM should not experience any significant downtime. If the VM migration is not converging for a long time, the migration will be aborted. The guest agent hook mechanism is enabled.","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"abort","params":[]}]}}
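To read the schedule out of the JSON without counting braces by hand, a short script helps. The sketch below is an illustration, not an RHV tool: POLICY_JSON is a trimmed copy of the policy above (id and description dropped), and describe_schedule is a hypothetical helper name.

```python
import json

# Trimmed copy of the "Minimal downtime" policy shown above.
POLICY_JSON = """
{"name": "Minimal downtime", "maxMigrations": 2,
 "config": {
  "convergenceItems": [
   {"stallingLimit": 1, "convergenceItem": {"action": "setDowntime", "params": ["150"]}},
   {"stallingLimit": 2, "convergenceItem": {"action": "setDowntime", "params": ["200"]}},
   {"stallingLimit": 3, "convergenceItem": {"action": "setDowntime", "params": ["300"]}},
   {"stallingLimit": 4, "convergenceItem": {"action": "setDowntime", "params": ["400"]}},
   {"stallingLimit": 6, "convergenceItem": {"action": "setDowntime", "params": ["500"]}}],
  "initialItems": [{"action": "setDowntime", "params": ["100"]}],
  "lastItems": [{"action": "abort", "params": []}]}}
"""


def describe_schedule(policy):
    """Flatten a policy's config into (stage, action, params) tuples."""
    config = policy["config"]
    steps = []
    for item in config["initialItems"]:
        steps.append(("initial", item["action"], item["params"]))
    for entry in config["convergenceItems"]:
        item = entry["convergenceItem"]
        stage = f"stalling {entry['stallingLimit']}"
        steps.append((stage, item["action"], item["params"]))
    for item in config["lastItems"]:
        steps.append(("last", item["action"], item["params"]))
    return steps


schedule = describe_schedule(json.loads(POLICY_JSON))
for stage, action, params in schedule:
    print(f"{stage:12} {action} {' '.join(params)}")
```

The same helper should work on the other two policies, since they share the initialItems / convergenceItems / lastItems layout.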
Suspend workload if needed
A policy that lets the VM migrate in most situations, including VMs running heavy workloads. On the other hand, the VM may experience a more significant downtime. The migration may still be aborted for extreme workloads. The guest agent hook mechanism is enabled.
It is very similar to “Minimal downtime”, except that maxMigrations is 1, migrationCompression is enabled, and there is one additional step between stalling 6 and abort.
Running engine-config -g MigrationPolicies on the engine shows:
– The max migrations in parallel: 1
– The max stalling limit is 6. After that, the downtime is raised to a very high 5 seconds (5000 ms); if that does not help either, the migration is aborted. The downtime schedule, in milliseconds, is:
- initial downtime(initialItems): 100
- stalling 1 iteration, set downtime to 150
- stalling 2 iteration, set downtime to 200
- stalling 3 iteration, set downtime to 300
- stalling 4 iteration, set downtime to 400
- stalling 6 iteration, set downtime to 500
- if still stalling, set downtime to 5000
- if still stalling, abort
MigrationPolicies-Suspend workload if needed: {"id":{"uuid":"80554327-0569-496b-bdeb-fcbbf52b827c"},"maxMigrations":1,"autoConvergence":true,"migrationCompression":true,"enableGuestEvents":true,"name":"Suspend workload if needed","description":"A policy that lets the VM migrate in most situations, including VMs running heavy workloads. On the other hand, the VM may experience a more significant downtime. The migration may still be aborted for extreme workloads. The guest agent hook mechanism is enabled.","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}},{"stallingLimit":3,"convergenceItem":{"action":"setDowntime","params":["300"]}},{"stallingLimit":4,"convergenceItem":{"action":"setDowntime","params":["400"]}},{"stallingLimit":6,"convergenceItem":{"action":"setDowntime","params":["500"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"setDowntime","params":["5000"]},{"action":"abort","params":[]}]}}
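One way to read this schedule is as a lookup from "how many stalled iterations so far" to "which actions have been applied". The sketch below encodes that reading. It is an assumption based on the list above (one lastItems entry per further stalled iteration), not the actual VDSM logic, and SUSPEND_WORKLOAD and actions_for are hypothetical names.

```python
# Hand-written summary of the "Suspend workload if needed" config above.
SUSPEND_WORKLOAD = {
    "initialItems": [("setDowntime", 100)],
    # stallingLimit -> action taken when that many stalled iterations are seen
    "convergenceItems": {
        1: ("setDowntime", 150),
        2: ("setDowntime", 200),
        3: ("setDowntime", 300),
        4: ("setDowntime", 400),
        6: ("setDowntime", 500),
    },
    "lastItems": [("setDowntime", 5000), ("abort", None)],
}


def actions_for(policy, stalled_iterations):
    """Return every action applied after the given number of stalled iterations.

    Assumed reading: a convergence item fires when the stalling count reaches
    its limit, and each further stalled iteration past the highest limit
    consumes one entry from lastItems.
    """
    actions = list(policy["initialItems"])
    for count in range(1, stalled_iterations + 1):
        if count in policy["convergenceItems"]:
            actions.append(policy["convergenceItems"][count])
    extra = stalled_iterations - max(policy["convergenceItems"])
    if extra > 0:
        actions.extend(policy["lastItems"][:extra])
    return actions


for stalled in (0, 4, 6, 7, 8):
    print(stalled, actions_for(SUSPEND_WORKLOAD, stalled)[-1])
```

Under this reading, the only difference from “Minimal downtime” shows up at the seventh stalled iteration, where the 5000 ms downtime is tried before giving up.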
Post-copy migration
The VM should not experience any significant downtime. If the VM migration is not converging for a long time, the migration will be switched to post-copy. The guest agent hook mechanism is enabled.
In post-copy VM migration, the VM is suspended on the source host as soon as the post-copy phase begins. Its CPU state is transferred to the destination host, where the VM resumes right away while most of its memory state still resides on the source host. Pages that the VM touches before they have arrived are fetched from the source on demand, and the remaining pages are pushed in the background; the migration completes once all RAM has been transferred. Compared with “Minimal downtime” and “Suspend workload if needed”, post-copy is more network bandwidth-friendly because it transfers each VM page over the network only once, so for VMs running write-intensive applications it gives a lower total migration time than those two policies. However, if the network is interrupted while in post-copy mode, the VM’s state is split across the two hosts and cannot be recovered, which means the VM can be lost if a network failure occurs during migration.
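The "each page only once" property can be illustrated with a toy model of the destination host. This is a sketch for intuition only, not how QEMU implements post-copy; the class PostCopyDestination and its methods are hypothetical.

```python
class PostCopyDestination:
    """Toy model: the VM runs here while its pages still live on the source."""

    def __init__(self, source_pages):
        self.source = source_pages  # page number -> content on the source host
        self.local = {}             # pages already present on the destination
        self.transfers = 0          # pages sent over the network so far

    def _fetch(self, page_no):
        # One network transfer for a page the destination does not have yet.
        self.local[page_no] = self.source[page_no]
        self.transfers += 1

    def read(self, page_no):
        if page_no not in self.local:
            self._fetch(page_no)    # demand fetch on first access
        return self.local[page_no]

    def write(self, page_no, data):
        if page_no not in self.local:
            self._fetch(page_no)
        # The write happens locally, so the page never has to be re-sent.
        self.local[page_no] = data

    def background_push(self):
        # The source pushes every page the destination has not asked for.
        for page_no in self.source:
            if page_no not in self.local:
                self._fetch(page_no)


source = {n: f"page-{n}" for n in range(8)}
dest = PostCopyDestination(source)
dest.read(3)
dest.write(3, "modified")
dest.write(5, "modified")
dest.background_push()
print(dest.transfers, "transfers for", len(source), "pages")
```

Writes on the destination never trigger a re-send, which is why a write-heavy guest does not inflate the transfer volume the way it does in pre-copy. The model also hints at the risk: until background_push finishes, part of the VM's memory exists only on the source.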
Running engine-config -g MigrationPolicies on the engine shows:
– The max migrations in parallel: 2
– The max stalling limit is 2.
- initial downtime(initialItems): 100
- stalling 1 iteration, set downtime to 150
- stalling 2 iteration, set downtime to 200
- if still stalling, switch to post-copy
- final fallback (lastItems): abort
MigrationPolicies-Post-copy migration: {"id":{"uuid":"a7aeedb2-8d66-4e51-bb22-32595027ce71"},"maxMigrations":2,"autoConvergence":true,"migrationCompression":false,"enableGuestEvents":true,"name":"Post-copy migration","description":"The VM should not experience any significant downtime. If the VM migration is not converging for a long time, the migration will be switched to post-copy. The guest agent hook mechanism is enabled.","config":{"convergenceItems":[{"stallingLimit":1,"convergenceItem":{"action":"setDowntime","params":["150"]}},{"stallingLimit":2,"convergenceItem":{"action":"setDowntime","params":["200"]}}],"initialItems":[{"action":"setDowntime","params":["100"]}],"lastItems":[{"action":"postcopy","params":[]},{"action":"abort","params":[]}]}}]