Brian Smith, 04/07/2018 05:27 PM
# KVM PCI Passthrough and Omni-Path

A KVM guest can use Omni-Path Architecture (OPA) hardware when the host is configured for PCI passthrough. This document is OPA- and Debian-centric, but the concepts should apply to other Linux host operating systems and PCI devices.

## BIOS Settings

1. Intel VT must be enabled.
2. Integrated IO / Intel VT (VT-d, which provides the IOMMU) must be enabled.

## Kernel Command Line

Add this to the host's kernel command line and reboot the host:

```
intel_iommu=on iommu=pt
```

When configured properly, ```/sys/kernel/iommu_groups/``` will contain many subdirectories. If that directory is empty, the IOMMU is not working.

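As a quick check of the statement above, the IOMMU groups can be counted from the shell. This is a minimal sketch, not part of the original procedure: the function name `count_iommu_groups` is made up for illustration, and the optional argument exists only so the function can be pointed at a different directory.

```shell
# Hypothetical helper: print the number of IOMMU groups.
# $1 (optional): directory to inspect, defaults to /sys/kernel/iommu_groups.
count_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local count=0
    local dir
    for dir in "$root"/*/; do
        # An unmatched glob stays literal, so test that the entry really exists.
        [ -d "$dir" ] && count=$((count + 1))
    done
    echo "$count"
}
```

Running `count_iommu_groups` on the host after the reboot should print a number greater than zero. A result of 0 corresponds to the empty-directory case described above.
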
## Install KVM

```
$ sudo apt install qemu-kvm libvirt-clients libvirt-daemon-system virtinst libosinfo-bin virsh
$ sudo adduser YOU libvirt
$ sudo adduser YOU libvirt-qemu
$ sudo adduser YOU kvm
```

## Disable hfi1 on host

The hfi1 driver must not be loaded on the host machine in order to use PCI passthrough. In ```/etc/modprobe.d/hfi1.conf```:

```
blacklist hfi1
```

Also, there is no reason to have the Intel Fabric Suite (IFS) installed on the host. The host machine should have no OPA functionality enabled.

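Whether the blacklist took effect can be checked after a reboot by looking for hfi1 in the kernel's module list. The sketch below assumes the usual `/proc/modules` format, where the module name is the first field on each line. The helper name `module_loaded` is hypothetical, and the second argument is there only so a different listing file can be supplied.

```shell
# Hypothetical helper: succeed if a module appears in a /proc/modules-style listing.
# $1: module name, $2 (optional): listing file, defaults to /proc/modules.
module_loaded() {
    local name="$1"
    local listing="${2:-/proc/modules}"
    # The module name is the first whitespace-separated field of each line.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}
```

For example, `module_loaded hfi1 && echo "hfi1 is still loaded"` prints a warning only if the driver is present on the host.
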
## Configure PCI Passthrough

The hfi1 device must be set up for PCI passthrough. Find the device's port (its PCI address) in the output of lspci:

```
$ lspci | grep Omni | cut -f1 '-d '
```

For the script below, prefix the port with the PCI domain 0000:, for example "0000:80:02.0".

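The prefixing step can also be done in the shell. The following is a small sketch under the assumption, taken from the text above, that the domain is always 0000. The function name `full_pci_address` is invented for this example.

```shell
# Hypothetical helper: turn a short lspci address (bus:slot.func) into
# the full form with the 0000: domain prefix. Full addresses pass through.
full_pci_address() {
    case "$1" in
        ????:??:??.?) echo "$1" ;;
        ??:??.?)      echo "0000:$1" ;;
        *)            echo "unrecognized PCI address: $1" >&2; return 1 ;;
    esac
}
```

With this in place, `full_pci_address 80:02.0` prints the value in the form the script expects.
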
Use the following script, replacing the value of PCI_PORT with the port of the hfi1. It must be run as root:

```
#!/bin/bash

PCI_PORT=0000:80:02.0   # PCI address of the device, including the 0000: domain
DEV_VENDOR=8086         # PCI vendor ID (Intel)
DEV_MODEL=24f0          # PCI device ID of the hfi1

# Reload the vfio modules from a clean state.
rmmod vfio_pci
rmmod vfio
# Unbind the device from the driver that currently owns it.
echo "$PCI_PORT" > "/sys/bus/pci/devices/$PCI_PORT/driver/unbind"
modprobe vfio
modprobe vfio_pci
# Tell vfio-pci to claim devices with this vendor and device ID.
echo "$DEV_VENDOR $DEV_MODEL" > /sys/bus/pci/drivers/vfio-pci/new_id
```

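After the script has run, the hfi1 should normally be bound to vfio-pci. One way to check, sketched here with a hypothetical helper named `bound_driver`, is to read the `driver` symlink that sysfs keeps for every PCI device that has a driver bound. The second argument is an assumption added so the sysfs root can be overridden.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device.
# $1: full PCI address, $2 (optional): sysfs root, defaults to /sys.
# Returns non-zero if no driver is bound.
bound_driver() {
    local addr="$1"
    local sysfs="${2:-/sys}"
    local link="$sysfs/bus/pci/devices/$addr/driver"
    [ -L "$link" ] || return 1
    # The symlink target ends in the driver's directory name.
    basename "$(readlink "$link")"
}
```

Calling `bound_driver` with the hfi1's full address should print `vfio-pci` once the setup has worked.
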
## Configure Default Network for DNS Forwarding

```
$ sudo virsh net-edit default
```

Add this tag inside the network element:

```
<domain name='MYDOMAIN' localOnly='no'/>
```

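To confirm the edit was saved, the domain name can be read back from the network definition. This is only a sketch: the filter name `network_domain` is hypothetical, and it assumes the XML uses single-quoted attributes as in the tag shown above.

```shell
# Hypothetical filter: read network XML on stdin and print the value of
# the name attribute of its <domain> element, if there is one.
network_domain() {
    sed -n "s/.*<domain name='\([^']*\)'.*/\1/p"
}
```

It is meant to be fed from `virsh net-dumpxml`, for example `sudo virsh net-dumpxml default | network_domain`.
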
## Create Guest

While it is possible to manage guests as an unprivileged user, those guests get a non-functional network setup in the default configuration.

**Use virsh as root.**

```
$ systemctl start libvirtd
$ virt-install --virt-type kvm --name GUEST_NAME \
    --vcpus=4 --cdrom $HOME/kvm-guest/debian-8.7.0-amd64-DVD-1.iso \
    -v --os-variant debian8 \
    --disk path=PATH_TO_CREATE_DISK,size=16 --memory 4096 --graphics vnc
```

Connect a VNC client to the guest's console through an SSH tunnel to the host.

From the workstation:

```
$ ssh -L5910:localhost:5900 YOU@HOST
```

Now connect a VNC client to localhost:5910 and complete the install.

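The tunnel above targets port 5900, which corresponds to VNC display :0. If other guests are already running, this guest may have been given a different display. `virsh vncdisplay GUEST_NAME` prints the display in use. The sketch below, with the invented helper name `vnc_port`, converts such a display string into the TCP port to use as the tunnel target.

```shell
# Hypothetical helper: convert a VNC display such as ":1" or
# "127.0.0.1:1" into its TCP port (5900 plus the display number).
vnc_port() {
    local display="${1##*:}"   # keep only what follows the last colon
    echo $((5900 + 10#$display))
}
```

For example, `vnc_port "$(virsh vncdisplay GUEST_NAME)"` gives the port to substitute for 5900 in the ssh command.
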
## Import Existing Disk to New Guest

To import an existing guest disk image, use the following command:

```
$ sudo virt-install --virt-type kvm --name GUEST_NAME \
    --vcpus=4 --import \
    -v --os-variant debian8 \
    --disk PATH_TO_DISK_IMAGE,device=disk,bus=virtio --memory 4096 --graphics vnc
```

## Connect to Guest, Configure DNS

The default network for KVM is 192.168.122.0/24, and the guest should be assigned a DHCP address on it when it boots. Use the VNC connection to run ```ip addr``` in the guest and find that address. ssh should then be able to connect to the guest from the host.

Unfortunately, dnsmasq doesn't appear to set the search domain properly. For Debian, configure a search domain in the guest's ```/etc/network/interfaces```.

```
allow-hotplug eth0
iface eth0 inet dhcp
    dns-search MYDOMAIN
```

## Configure Guest for PCI Passthrough

Shut down the guest if it is running.

```
$ virsh shutdown GUEST_NAME
```

Find the PCI device in virsh's node device tree. Look for the entry that matches the port found via lspci.

```
$ virsh nodedev-list --tree
```

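The PCI entries in that tree are named after the device address with the separators replaced by underscores, as in the `pci_0000_81_00_0` name used in this document. A small sketch, with the hypothetical helper name `nodedev_name`, for turning a full PCI address into that form so it can be searched for in the tree:

```shell
# Hypothetical helper: convert a full PCI address such as 0000:81:00.0
# into the matching virsh node device name, pci_0000_81_00_0.
nodedev_name() {
    # Replace both ':' and '.' with '_' and add the pci_ prefix.
    echo "pci_$(echo "$1" | tr ':.' '__')"
}
```

For example, `virsh nodedev-list --tree | grep "$(nodedev_name 0000:81:00.0)"` shows only the matching line of the tree.
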
Detach the device from the host. Use the child device of the one that matches the device you found via lspci.

```
$ virsh nodedev-detach pci_0000_81_00_0
```

Dump the device info.

```
$ virsh nodedev-dumpxml pci_0000_81_00_0
```

The dumped XML gives bus, slot and function as decimal values. Convert them to hex for use in the guest configuration. The printf utility may be used to do this.

```
$ printf %x VALUE
```

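The conversion can be combined with building the address element for the guest configuration. The following is a minimal sketch, not the author's tooling. The helper name `hostdev_address` is hypothetical, and its four arguments are the decimal domain, bus, slot and function values read from the nodedev-dumpxml output.

```shell
# Hypothetical helper: print a libvirt <address> element from the decimal
# domain, bus, slot and function values reported by nodedev-dumpxml.
hostdev_address() {
    local domain="$1" bus="$2" slot="$3" func="$4"
    printf "<address domain='0x%04x' bus='0x%02x' slot='0x%02x' function='0x%x'/>\n" \
        "$domain" "$bus" "$slot" "$func"
}
```

For a device reported as domain 0, bus 129, slot 0, function 0, `hostdev_address 0 129 0 0` yields an element with bus='0x81', matching the pci_0000_81_00_0 device used in this document.
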
Edit the guest and add a hostdev section inside the devices element:

```
$ virsh edit GUEST_NAME

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x81' slot='0x0' function='0x0'/>
  </source>
</hostdev>
```

Boot the guest:

```
$ virsh start GUEST_NAME
```

Once the guest has booted, the passthrough device should be present in the guest's lspci output and usable by the guest's kernel drivers.

**Note**: the PCI device may have different capabilities in the VM than it has on the physical host. Ideally, the driver takes this into account. Refer to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c009af473b2026caaa26107e34d7cc68dad7756 for a patch that fixes one such problem in hfi1.

## References

1. https://wiki.debian.org/KVM
2. https://jamielinux.com/docs/libvirt-networking-handbook/nat-based-network.html
3. https://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
4. https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
5. https://wiki.debian.org/VGAPassthrough

----

Brian T. Smith
Senior Technical Staff
System Fabric Works, Inc.
bsmith@systemfabricworks.com