qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/5] SGX NUMA support plus vepc reset
@ 2021-10-22 19:27 Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 1/5] numa: Enable numa for SGX EPC sections Yang Zhong
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

The basic SGX patches have been merged into the QEMU release; the remaining
NUMA support for SGX should now be enabled. Patch 1 implements the SGX NUMA
ACPI support to enable NUMA in the SGX guest. Since libvirt needs detailed
host SGX EPC section info to decide how to allocate EPC sections for an SGX
NUMA guest, the SGXEPCSection list is introduced to show detailed section
info through the QMP and HMP interfaces.

This version also adds vEPC reset support, because the related kernel
patches are mostly merged into the kernel release; please see the link below:
https://lore.kernel.org/all/20211021201155.1523989-1-pbonzini@redhat.com/

Thanks!

Yang


Changes from v1:
- Added documentation for new members. (Eric)
- Changed "index" to "node" in struct SGXEPCSection. (Eric, Paolo)
- Squashed the previous patch 4 and patch 5 into patch 3. (Paolo)
- Added the reset patch (patch 5) in this version.


Yang Zhong (5):
  numa: Enable numa for SGX EPC sections
  monitor: Support 'info numa' command
  numa: Support SGX numa in the monitor and Libvirt interfaces
  doc: Add the SGX numa description
  sgx: Reset the vEPC regions during VM reboot

 docs/system/i386/sgx.rst  |  31 ++++++--
 qapi/machine.json         |  10 ++-
 qapi/misc-target.json     |  19 ++++-
 include/hw/i386/sgx-epc.h |   3 +
 include/hw/i386/x86.h     |   1 +
 linux-headers/linux/kvm.h |   6 ++
 hw/core/numa.c            |   6 ++
 hw/i386/acpi-build.c      |   4 ++
 hw/i386/sgx-epc.c         |   3 +
 hw/i386/sgx.c             | 148 +++++++++++++++++++++++++++++++++++---
 hw/i386/x86.c             |   4 ++
 monitor/hmp-cmds.c        |   1 +
 qemu-options.hx           |   4 +-
 13 files changed, 222 insertions(+), 18 deletions(-)



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/5] numa: Enable numa for SGX EPC sections
  2021-10-22 19:27 [PATCH v2 0/5] SGX NUMA support plus vepc reset Yang Zhong
@ 2021-10-22 19:27 ` Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 2/5] monitor: Support 'info numa' command Yang Zhong
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

The basic SGX support did not enable NUMA for SGX EPC sections, which
resulted in all EPC sections being located in NUMA node 0. This patch
enables the SGX NUMA function in the guest, so that an EPC section can
be placed in the same NUMA node as RAM.

The related guest kernel log:
[    0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
[    0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
The SRAT table correctly shows the SGX EPC section memory info in different
NUMA nodes.
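As a quick sanity check (a standalone sketch, not part of the patch), the SRAT ranges above can be decoded to confirm that they match the EPC sizes given on the command line below (64M on node 0, 28M on node 1):

```python
# Decode the SRAT memory-affinity ranges printed by the guest kernel
# and check that they match the sgx-epc sizes from the QEMU command line.

def srat_range_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] physical address range."""
    return end - start + 1

MIB = 1024 * 1024

# Node 0 PXM 0 [mem 0x180000000-0x183ffffff] -> the 64M EPC section
node0 = srat_range_size(0x180000000, 0x183ffffff)
# Node 1 PXM 1 [mem 0x184000000-0x185bfffff] -> the 28M EPC section
node1 = srat_range_size(0x184000000, 0x185bfffff)

print(node0 // MIB, node1 // MIB)  # 64 28
```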

The SGX EPC NUMA-related command line:
 ......
 -m 4G,maxmem=20G \
 -smp sockets=2,cores=2 \
 -cpu host,+sgx-provisionkey \
 -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \
 -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \
 -numa node,nodeid=0,cpus=0-1,memdev=node0 \
 -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \
 -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \
 -numa node,nodeid=1,cpus=2-3,memdev=node1 \
 -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \
 ......

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 qapi/machine.json         | 10 ++++++++-
 include/hw/i386/sgx-epc.h |  3 +++
 hw/i386/acpi-build.c      |  4 ++++
 hw/i386/sgx-epc.c         |  3 +++
 hw/i386/sgx.c             | 44 +++++++++++++++++++++++++++++++++++++++
 monitor/hmp-cmds.c        |  1 +
 qemu-options.hx           |  4 ++--
 7 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/qapi/machine.json b/qapi/machine.json
index 5db54df298..38a1e3438f 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,12 +1207,15 @@
 #
 # @memdev: memory backend linked with device
 #
+# @node: the numa node
+#
 # Since: 6.2
 ##
 { 'struct': 'SgxEPCDeviceInfo',
   'data': { '*id': 'str',
             'memaddr': 'size',
             'size': 'size',
+            'node': 'int',
             'memdev': 'str'
           }
 }
@@ -1285,10 +1288,15 @@
 #
 # @memdev: memory backend linked with device
 #
+# @node: the numa node
+#
 # Since: 6.2
 ##
 { 'struct': 'SgxEPC',
-  'data': { 'memdev': 'str' } }
+  'data': { 'memdev': 'str',
+            'node': 'int'
+          }
+}
 
 ##
 # @SgxEPCProperties:
diff --git a/include/hw/i386/sgx-epc.h b/include/hw/i386/sgx-epc.h
index a6a65be854..581fac389a 100644
--- a/include/hw/i386/sgx-epc.h
+++ b/include/hw/i386/sgx-epc.h
@@ -25,6 +25,7 @@
 #define SGX_EPC_ADDR_PROP "addr"
 #define SGX_EPC_SIZE_PROP "size"
 #define SGX_EPC_MEMDEV_PROP "memdev"
+#define SGX_EPC_NUMA_NODE_PROP "node"
 
 /**
  * SGXEPCDevice:
@@ -38,6 +39,7 @@ typedef struct SGXEPCDevice {
 
     /* public */
     uint64_t addr;
+    uint32_t node;
     HostMemoryBackendEpc *hostmem;
 } SGXEPCDevice;
 
@@ -56,6 +58,7 @@ typedef struct SGXEPCState {
 } SGXEPCState;
 
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size);
+void sgx_epc_build_srat(GArray *table_data);
 
 static inline uint64_t sgx_epc_above_4g_end(SGXEPCState *sgx_epc)
 {
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 81418b7911..563a38992f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2062,6 +2062,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
         nvdimm_build_srat(table_data);
     }
 
+    if (pcms->sgx_epc.size != 0) {
+        sgx_epc_build_srat(table_data);
+    }
+
     /*
      * TODO: this part is not in ACPI spec and current linux kernel boots fine
      * without these entries. But I recall there were issues the last time I
diff --git a/hw/i386/sgx-epc.c b/hw/i386/sgx-epc.c
index 55e2217eae..e5cd2789be 100644
--- a/hw/i386/sgx-epc.c
+++ b/hw/i386/sgx-epc.c
@@ -21,6 +21,7 @@
 
 static Property sgx_epc_properties[] = {
     DEFINE_PROP_UINT64(SGX_EPC_ADDR_PROP, SGXEPCDevice, addr, 0),
+    DEFINE_PROP_UINT32(SGX_EPC_NUMA_NODE_PROP, SGXEPCDevice, node, 0),
     DEFINE_PROP_LINK(SGX_EPC_MEMDEV_PROP, SGXEPCDevice, hostmem,
                      TYPE_MEMORY_BACKEND_EPC, HostMemoryBackendEpc *),
     DEFINE_PROP_END_OF_LIST(),
@@ -139,6 +140,8 @@ static void sgx_epc_md_fill_device_info(const MemoryDeviceState *md,
     se->memaddr = epc->addr;
     se->size = object_property_get_uint(OBJECT(epc), SGX_EPC_SIZE_PROP,
                                         NULL);
+    se->node = object_property_get_uint(OBJECT(epc), SGX_EPC_NUMA_NODE_PROP,
+                                        NULL);
     se->memdev = object_get_canonical_path(OBJECT(epc->hostmem));
 
     info->u.sgx_epc.data = se;
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 11607568b6..9a77519609 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -21,6 +21,7 @@
 #include "qapi/qapi-commands-misc-target.h"
 #include "exec/address-spaces.h"
 #include "sysemu/hw_accel.h"
+#include "hw/acpi/aml-build.h"
 
 #define SGX_MAX_EPC_SECTIONS            8
 #define SGX_CPUID_EPC_INVALID           0x0
@@ -29,6 +30,46 @@
 #define SGX_CPUID_EPC_SECTION           0x1
 #define SGX_CPUID_EPC_MASK              0xF
 
+static int sgx_epc_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_SGX_EPC)) {
+        *list = g_slist_append(*list, DEVICE(obj));
+    }
+
+    object_child_foreach(obj, sgx_epc_device_list, opaque);
+    return 0;
+}
+
+static GSList *sgx_epc_get_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), sgx_epc_device_list, &list);
+    return list;
+}
+
+void sgx_epc_build_srat(GArray *table_data)
+{
+    GSList *device_list = sgx_epc_get_device_list();
+
+    for (; device_list; device_list = device_list->next) {
+        DeviceState *dev = device_list->data;
+        Object *obj = OBJECT(dev);
+        uint64_t addr, size;
+        int node;
+
+        node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+                                        &error_abort);
+        addr = object_property_get_uint(obj, SGX_EPC_ADDR_PROP, &error_abort);
+        size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP, &error_abort);
+
+        build_srat_memory(table_data, addr, size, node, MEM_AFFINITY_ENABLED);
+    }
+    g_slist_free(device_list);
+}
+
 static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
 {
     return (low & MAKE_64BIT_MASK(12, 20)) +
@@ -179,6 +220,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
         /* set the memdev link with memory backend */
         object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
                               &error_fatal);
+        /* set the numa node property for sgx epc object */
+        object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, list->value->node,
+                             &error_fatal);
         object_property_set_bool(obj, "realized", true, &error_fatal);
         object_unref(obj);
     }
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bcaa41350e..8af26e3e20 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1878,6 +1878,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
                                se->id ? se->id : "");
                 monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", se->memaddr);
                 monitor_printf(mon, "  size: %" PRIu64 "\n", se->size);
+                monitor_printf(mon, "  node: %" PRId64 "\n", se->node);
                 monitor_printf(mon, "  memdev: %s\n", se->memdev);
                 break;
             default:
diff --git a/qemu-options.hx b/qemu-options.hx
index 5f375bbfa6..aaa5a1926d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -127,11 +127,11 @@ SRST
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
-    "                sgx-epc.0.memdev=memid\n",
+    "                sgx-epc.0.memdev=memid,sgx-epc.0.node=numaid\n",
     QEMU_ARCH_ALL)
 
 SRST
-``sgx-epc.0.memdev=@var{memid}``
+``sgx-epc.0.memdev=@var{memid},sgx-epc.0.node=@var{numaid}``
     Define an SGX EPC section.
 ERST
 



* [PATCH v2 2/5] monitor: Support 'info numa' command
  2021-10-22 19:27 [PATCH v2 0/5] SGX NUMA support plus vepc reset Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 1/5] numa: Enable numa for SGX EPC sections Yang Zhong
@ 2021-10-22 19:27 ` Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 3/5] numa: Support SGX numa in the monitor and Libvirt interfaces Yang Zhong
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

Add the MEMORY_DEVICE_INFO_KIND_SGX_EPC case so that SGX NUMA info is
included in the monitor's 'info numa' command output.
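The per-node accounting this case performs can be sketched as follows (a minimal illustration with hypothetical section data, not QEMU's actual structures):

```python
# Sketch of the 'info numa' accounting for SGX EPC devices: each EPC
# section's size is added to its NUMA node's total memory. EPC sections
# are not hot-pluggable, so they contribute nothing to plugged memory.

def account_epc(node_mem: dict, sections: list) -> None:
    for sec in sections:
        entry = node_mem.setdefault(sec["node"],
                                    {"node_mem": 0, "node_plugged_mem": 0})
        entry["node_mem"] += sec["size"]
        # node_plugged_mem stays 0 for EPC devices

# Hypothetical sections matching the cover letter's example guest
sections = [{"node": 0, "size": 64 << 20}, {"node": 1, "size": 28 << 20}]
node_mem = {}
account_epc(node_mem, sections)
print(node_mem[0]["node_mem"], node_mem[1]["node_mem"])  # 67108864 29360128
```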

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 hw/core/numa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 510d096a88..1aa05dcf42 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -756,6 +756,7 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
     PCDIMMDeviceInfo     *pcdimm_info;
     VirtioPMEMDeviceInfo *vpi;
     VirtioMEMDeviceInfo *vmi;
+    SgxEPCDeviceInfo *se;
 
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
@@ -781,6 +782,11 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
                 node_mem[vmi->node].node_mem += vmi->size;
                 node_mem[vmi->node].node_plugged_mem += vmi->size;
                 break;
+            case MEMORY_DEVICE_INFO_KIND_SGX_EPC:
+                se = value->u.sgx_epc.data;
+                node_mem[se->node].node_mem += se->size;
+                node_mem[se->node].node_plugged_mem = 0;
+                break;
             default:
                 g_assert_not_reached();
             }



* [PATCH v2 3/5] numa: Support SGX numa in the monitor and Libvirt interfaces
  2021-10-22 19:27 [PATCH v2 0/5] SGX NUMA support plus vepc reset Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 1/5] numa: Enable numa for SGX EPC sections Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 2/5] monitor: Support 'info numa' command Yang Zhong
@ 2021-10-22 19:27 ` Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 4/5] doc: Add the SGX numa description Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot Yang Zhong
  4 siblings, 0 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

Add an SGXEPCSection list to SGXInfo to show detailed info about the
multiple SGX EPC sections, rather than only the total size as before.
This patch enables NUMA support for the 'info sgx' command and the QMP
interfaces. The new interfaces show the EPC section info for each NUMA
node. Libvirt can use the QMP interface to get the detailed host SGX
EPC capabilities and decide how to allocate host EPC sections to the
guest.

(qemu) info sgx
 SGX support: enabled
 SGX1 support: enabled
 SGX2 support: enabled
 FLC support: enabled
 NUMA node #0: size=67108864
 NUMA node #1: size=29360128

The QMP interfaces show:
(QEMU) query-sgx
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": true}}

(QEMU) query-sgx-capabilities
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], "flc": true}}
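The host section sizes returned by query-sgx-capabilities come from enumerating CPUID leaf 0x12 sub-leaves; the size computation mirrors sgx_calc_section_metric() in hw/i386/sgx.c. A sketch of that calculation in Python (the C code operates on the raw ECX/EDX register values):

```python
def make_64bit_mask(shift: int, length: int) -> int:
    """Python equivalent of QEMU's MAKE_64BIT_MASK(shift, length)."""
    return ((1 << length) - 1) << shift

def sgx_calc_section_metric(low: int, high: int) -> int:
    # Bits 31:12 of ECX hold bits 31:12 of the section size;
    # bits 19:0 of EDX hold bits 51:32.
    return (low & make_64bit_mask(12, 20)) + \
           ((high & make_64bit_mask(0, 20)) << 32)

# A 64 MiB section: the size bits live entirely in ECX, EDX is 0
print(hex(sgx_calc_section_metric(0x04000000, 0)))  # 0x4000000
```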

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 qapi/misc-target.json | 19 ++++++++++++++--
 hw/i386/sgx.c         | 51 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 5aa2b95b7d..1022aa0184 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -337,6 +337,21 @@
   'if': 'TARGET_ARM' }
 
 
+##
+# @SGXEPCSection:
+#
+# Information about intel SGX EPC section info
+#
+# @node: the numa node
+#
+# @size: the size of epc section
+#
+# Since: 6.2
+##
+{ 'struct': 'SGXEPCSection',
+  'data': { 'node': 'int',
+            'size': 'uint64'}}
+
 ##
 # @SGXInfo:
 #
@@ -350,7 +365,7 @@
 #
 # @flc: true if FLC is supported
 #
-# @section-size: The EPC section size for guest
+# @sections: The EPC sections info for guest
 #
 # Since: 6.2
 ##
@@ -359,7 +374,7 @@
             'sgx1': 'bool',
             'sgx2': 'bool',
             'flc': 'bool',
-            'section-size': 'uint64'},
+            'sections': ['SGXEPCSection']},
    'if': 'TARGET_I386' }
 
 ##
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 9a77519609..b5b710a556 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -76,11 +76,13 @@ static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
            ((high & MAKE_64BIT_MASK(0, 20)) << 32);
 }
 
-static uint64_t sgx_calc_host_epc_section_size(void)
+static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
 {
+    SGXEPCSectionList *head = NULL, **tail = &head;
+    SGXEPCSection *section;
     uint32_t i, type;
     uint32_t eax, ebx, ecx, edx;
-    uint64_t size = 0;
+    uint32_t j = 0;
 
     for (i = 0; i < SGX_MAX_EPC_SECTIONS; i++) {
         host_cpuid(0x12, i + 2, &eax, &ebx, &ecx, &edx);
@@ -94,10 +96,13 @@ static uint64_t sgx_calc_host_epc_section_size(void)
             break;
         }
 
-        size += sgx_calc_section_metric(ecx, edx);
+        section = g_new0(SGXEPCSection, 1);
+        section->node = j++;
+        section->size = sgx_calc_section_metric(ecx, edx);
+        QAPI_LIST_APPEND(tail, section);
     }
 
-    return size;
+    return head;
 }
 
 SGXInfo *qmp_query_sgx_capabilities(Error **errp)
@@ -121,13 +126,35 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
     info->sgx1 = eax & (1U << 0) ? true : false;
     info->sgx2 = eax & (1U << 1) ? true : false;
 
-    info->section_size = sgx_calc_host_epc_section_size();
+    info->sections = sgx_calc_host_epc_sections();
 
     close(fd);
 
     return info;
 }
 
+static SGXEPCSectionList *sgx_get_epc_sections_list(void)
+{
+    GSList *device_list = sgx_epc_get_device_list();
+    SGXEPCSectionList *head = NULL, **tail = &head;
+    SGXEPCSection *section;
+
+    for (; device_list; device_list = device_list->next) {
+        DeviceState *dev = device_list->data;
+        Object *obj = OBJECT(dev);
+
+        section = g_new0(SGXEPCSection, 1);
+        section->node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+                                                 &error_abort);
+        section->size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP,
+                                                 &error_abort);
+        QAPI_LIST_APPEND(tail, section);
+    }
+    g_slist_free(device_list);
+
+    return head;
+}
+
 SGXInfo *qmp_query_sgx(Error **errp)
 {
     SGXInfo *info = NULL;
@@ -146,14 +173,13 @@ SGXInfo *qmp_query_sgx(Error **errp)
         return NULL;
     }
 
-    SGXEPCState *sgx_epc = &pcms->sgx_epc;
     info = g_new0(SGXInfo, 1);
 
     info->sgx = true;
     info->sgx1 = true;
     info->sgx2 = true;
     info->flc = true;
-    info->section_size = sgx_epc->size;
+    info->sections = sgx_get_epc_sections_list();
 
     return info;
 }
@@ -161,6 +187,7 @@ SGXInfo *qmp_query_sgx(Error **errp)
 void hmp_info_sgx(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
+    SGXEPCSectionList *section_list, *section;
     g_autoptr(SGXInfo) info = qmp_query_sgx(&err);
 
     if (err) {
@@ -175,8 +202,14 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
                    info->sgx2 ? "enabled" : "disabled");
     monitor_printf(mon, "FLC support: %s\n",
                    info->flc ? "enabled" : "disabled");
-    monitor_printf(mon, "size: %" PRIu64 "\n",
-                   info->section_size);
+
+    section_list = info->sections;
+    for (section = section_list; section; section = section->next) {
+        monitor_printf(mon, "NUMA node #%" PRId64 ": ",
+                       section->value->node);
+        monitor_printf(mon, "size=%" PRIu64 "\n",
+                       section->value->size);
+    }
 }
 
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size)



* [PATCH v2 4/5] doc: Add the SGX numa description
  2021-10-22 19:27 [PATCH v2 0/5] SGX NUMA support plus vepc reset Yang Zhong
                   ` (2 preceding siblings ...)
  2021-10-22 19:27 ` [PATCH v2 3/5] numa: Support SGX numa in the monitor and Libvirt interfaces Yang Zhong
@ 2021-10-22 19:27 ` Yang Zhong
  2021-10-22 19:27 ` [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot Yang Zhong
  4 siblings, 0 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

Add the SGX NUMA reference command line and describe how to check
whether SGX NUMA is supported with multiple EPC sections.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 docs/system/i386/sgx.rst | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst
index f103ae2a2f..9e4ada761f 100644
--- a/docs/system/i386/sgx.rst
+++ b/docs/system/i386/sgx.rst
@@ -141,8 +141,7 @@ To launch a SGX guest:
   |qemu_system_x86| \\
    -cpu host,+sgx-provisionkey \\
    -object memory-backend-epc,id=mem1,size=64M,prealloc=on \\
-   -object memory-backend-epc,id=mem2,size=28M \\
-   -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+   -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0
 
 Utilizing SGX in the guest requires a kernel/OS with SGX support.
 The support can be determined in guest by::
@@ -152,8 +151,32 @@ The support can be determined in guest by::
 and SGX epc info by::
 
   $ dmesg | grep sgx
-  [    1.242142] sgx: EPC section 0x180000000-0x181bfffff
-  [    1.242319] sgx: EPC section 0x181c00000-0x1837fffff
+  [    0.182807] sgx: EPC section 0x140000000-0x143ffffff
+  [    0.183695] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
+
+To launch a SGX numa guest:
+
+.. parsed-literal::
+
+  |qemu_system_x86| \\
+   -cpu host,+sgx-provisionkey \\
+   -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \\
+   -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \\
+   -numa node,nodeid=0,cpus=0-1,memdev=node0 \\
+   -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \\
+   -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \\
+   -numa node,nodeid=1,cpus=2-3,memdev=node1 \\
+   -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1
+
+and SGX epc numa info by::
+
+  $ dmesg | grep sgx
+  [    0.369937] sgx: EPC section 0x180000000-0x183ffffff
+  [    0.370259] sgx: EPC section 0x184000000-0x185bfffff
+
+  $ dmesg | grep SRAT
+  [    0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
+  [    0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
 
 References
 ----------



* [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot
  2021-10-22 19:27 [PATCH v2 0/5] SGX NUMA support plus vepc reset Yang Zhong
                   ` (3 preceding siblings ...)
  2021-10-22 19:27 ` [PATCH v2 4/5] doc: Add the SGX numa description Yang Zhong
@ 2021-10-22 19:27 ` Yang Zhong
  2021-10-22 21:46   ` Paolo Bonzini
  4 siblings, 1 reply; 8+ messages in thread
From: Yang Zhong @ 2021-10-22 19:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: yang.zhong, pbonzini, jarkko, eblake, philmd

For bare-metal SGX on real hardware, the hardware provides guarantees
about SGX state at reboot.  For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
an uninitialized state on startup, including after every guest reboot.

QEMU can invoke the ioctl to bring its vEPC pages back to the uninitialized
state. Some pages may fail to be removed if they are SECS pages whose child
pages live in a separate vEPC region.  Therefore, the ioctl returns the
number of EREMOVE failures, telling QEMU to try the ioctl again after it
has processed all vEPC regions.
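The retry logic can be sketched as a toy simulation (the remove_all() below stands in for the SGX_IOC_VEPC_REMOVE_ALL ioctl, which returns the number of pages that could not be EREMOVEd; the page model is purely illustrative):

```python
# Simulate resetting multiple vEPC regions. A SECS page cannot be
# removed while child pages in another region still reference it, so
# a later pass over all regions is needed to clean up SECS pages.

class Region:
    def __init__(self, secs_pages, child_pages):
        self.secs = secs_pages      # blocked until all children are gone
        self.children = child_pages

regions = [Region(secs_pages=1, child_pages=0),
           Region(secs_pages=0, child_pages=3)]

def remove_all(region, children_left_elsewhere):
    """Toy ioctl: removes this region's child pages; its SECS pages fail
    while any child pages remain anywhere. Returns number of failures."""
    region.children = 0
    if children_left_elsewhere:
        return region.secs          # SECS removal fails for now
    failures, region.secs = 0, 0
    return failures

def vepc_reset(regions):
    while True:                     # retry until every region is clean
        failures = 0
        for r in regions:
            others = any(x.children for x in regions if x is not r)
            failures += remove_all(r, others)
        if failures == 0:
            return

vepc_reset(regions)
print(sum(r.secs + r.children for r in regions))  # 0
```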

The related kernel patches (v4) will be merged into a kernel release; see:
https://lore.kernel.org/all/20211021201155.1523989-1-pbonzini@redhat.com/

Once the kernel patchset is merged, the kernel commit IDs will be updated
here.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
---
 include/hw/i386/x86.h     |  1 +
 linux-headers/linux/kvm.h |  6 +++++
 hw/i386/sgx.c             | 53 +++++++++++++++++++++++++++++++++++++++
 hw/i386/x86.c             |  4 +++
 4 files changed, 64 insertions(+)

diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 23267a3674..e78ca6c156 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -141,5 +141,6 @@ qemu_irq x86_allocate_cpu_irq(void);
 void gsi_handler(void *opaque, int n, int level);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
 DeviceState *ioapic_init_secondary(GSIState *gsi_state);
+void sgx_epc_reset(void *opaque);
 
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index bcaf66cc4d..ee110e660b 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -887,6 +887,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_GET_EMULATED_CPUID	  _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
 #define KVM_GET_MSR_FEATURE_INDEX_LIST    _IOWR(KVMIO, 0x0a, struct kvm_msr_list)
 
+/*
+ * ioctl for /dev/sgx_vepc
+ */
+#define SGX_MAGIC 0xA4
+#define SGX_IOC_VEPC_REMOVE_ALL       _IO(SGX_MAGIC, 0x04)
+
 /*
  * Extension capability list.
  */
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index b5b710a556..3e21094c30 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -22,6 +22,8 @@
 #include "exec/address-spaces.h"
 #include "sysemu/hw_accel.h"
 #include "hw/acpi/aml-build.h"
+#include "hw/i386/x86.h"
+#include <sys/ioctl.h>
 
 #define SGX_MAX_EPC_SECTIONS            8
 #define SGX_CPUID_EPC_INVALID           0x0
@@ -70,6 +72,57 @@ void sgx_epc_build_srat(GArray *table_data)
     g_slist_free(device_list);
 }
 
+static int sgx_remove_all_pages(PCMachineState *pcms, int num)
+{
+    HostMemoryBackend *hostmem;
+    SGXEPCDevice *epc;
+    int failures = 0, failures_1 = 0;
+    unsigned long ret = 0;
+    int fd, j;
+
+    for (j = 0; j < num; j++) {
+        epc = pcms->sgx_epc.sections[j];
+        hostmem = MEMORY_BACKEND(epc->hostmem);
+        fd = memory_region_get_fd(host_memory_backend_get_memory(hostmem));
+
+        failures = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
+        if (failures < 0) {
+            return failures;
+        } else if (failures > 0) {
+            /* Remove SECS pages */
+            sleep(1);
+            failures_1 = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
+        }
+
+        /*
+         * The host or guest can support 8 EPC sections, use the
+         * corresponding bit to show each section removal status.
+         */
+        if (failures_1) {
+            set_bit(j, &ret);
+        }
+    }
+
+    return ret;
+}
+
+void sgx_epc_reset(void *opaque)
+{
+    PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+    GSList *device_list = sgx_epc_get_device_list();
+    int len = g_slist_length(device_list);
+    int ret;
+
+    do {
+        ret = sgx_remove_all_pages(pcms, len);
+        if (ret == -ENOTTY) {
+            break;
+        }
+    } while (ret);
+
+    g_slist_free(device_list);
+}
+
 static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
 {
     return (low & MAKE_64BIT_MASK(12, 20)) +
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 76de7e2265..03d30a487a 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -39,6 +39,7 @@
 #include "sysemu/replay.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/cpu-timers.h"
+#include "sysemu/reset.h"
 #include "trace.h"
 
 #include "hw/i386/x86.h"
@@ -1307,6 +1308,9 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     visit_type_SgxEPCList(v, name, &x86ms->sgx_epc_list, errp);
 
     qapi_free_SgxEPCList(list);
+
+    /* register the reset callback for sgx reset */
+    qemu_register_reset(sgx_epc_reset, NULL);
 }
 
 static void x86_machine_initfn(Object *obj)



* Re: [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot
  2021-10-22 19:27 ` [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot Yang Zhong
@ 2021-10-22 21:46   ` Paolo Bonzini
  2021-10-26  8:15     ` Yang Zhong
  0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2021-10-22 21:46 UTC (permalink / raw)
  To: Yang Zhong, qemu-devel; +Cc: philmd, jarkko, eblake

On 22/10/21 21:27, Yang Zhong wrote:
> +
> +    for (j = 0; j < num; j++) {
> +        epc = pcms->sgx_epc.sections[j];
> +        hostmem = MEMORY_BACKEND(epc->hostmem);
> +        fd = memory_region_get_fd(host_memory_backend_get_memory(hostmem));
> +
> +        failures = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
> +        if (failures < 0) {
> +            return failures;
> +        } else if (failures > 0) {
> +            /* Remove SECS pages */
> +            sleep(1);
> +            failures_1 = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
> +        }
> +
> +        /*
> +         * The host or guest can support 8 EPC sections, use the
> +         * corresponding bit to show each section removal status.
> +         */
> +        if (failures_1) {
> +            set_bit(j, &ret);
> +        }
> +    }

This sleep is not necessary; just do two passes over all the regions.
So something like

     int failures;

     /*
      * The second pass is needed to remove SECS pages that could not
      * be removed during the first.
      */
     for (i = 0; i < 2; i++) {
         failures = 0;
         for (j = 0; j < pcms->sgx_epc.nr_sections; j++) {
             epc = pcms->sgx_epc.sections[j];
             hostmem = MEMORY_BACKEND(epc->hostmem);
             fd = memory_region_get_fd(host_memory_backend_get_memory(hostmem));

             r = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
             if (r < 0) {
                 return r;
             }
             if (r > 0) {
                 /* SECS pages remain */
                 failures++;
                 if (i == 1) {
                     error_report("cannot reset vEPC section %d\n", j);
                 }
             }
         }
         if (!failures) {
             return 0;
         }
     }
     return failures;

is enough, without any need to do further retries.

Paolo




* Re: [PATCH v2 5/5] sgx: Reset the vEPC regions during VM reboot
  2021-10-22 21:46   ` Paolo Bonzini
@ 2021-10-26  8:15     ` Yang Zhong
  0 siblings, 0 replies; 8+ messages in thread
From: Yang Zhong @ 2021-10-26  8:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: yang.zhong, philmd, jarkko, eblake, qemu-devel

On Fri, Oct 22, 2021 at 11:46:30PM +0200, Paolo Bonzini wrote:
> On 22/10/21 21:27, Yang Zhong wrote:
> >+
> >+    for (j = 0; j < num; j++) {
> >+        epc = pcms->sgx_epc.sections[j];
> >+        hostmem = MEMORY_BACKEND(epc->hostmem);
> >+        fd = memory_region_get_fd(host_memory_backend_get_memory(hostmem));
> >+
> >+        failures = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
> >+        if (failures < 0) {
> >+            return failures;
> >+        } else if (failures > 0) {
> >+            /* Remove SECS pages */
> >+            sleep(1);
> >+            failures_1 = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
> >+        }
> >+
> >+        /*
> >+         * The host or guest can support 8 EPC sections, use the
> >+         * corresponding bit to show each section removal status.
> >+         */
> >+        if (failures_1) {
> >+            set_bit(j, &ret);
> >+        }
> >+    }
> 
> This sleep is not necessary, just do two tries on all the regions.
> So something like
> 
>     int failures;
> 
>     /*
>      * The second pass is needed to remove SECS pages that could not
>      * be removed during the first.
>      */
>     for (i = 0; i < 2; i++) {
>         failures = 0;
>         for (j = 0; j < pcms->sgx_epc.nr_sections; j++) {
>             epc = pcms->sgx_epc.sections[j];
>             hostmem = MEMORY_BACKEND(epc->hostmem);
> >             fd = memory_region_get_fd(host_memory_backend_get_memory(hostmem));
> 
>             r = ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL);
>             if (r < 0) {
>                 return r;
>             }
>             if (r > 0) {
>                 /* SECS pages remain */
>                 failures++;
> >                 if (i == 1) {
>                     error_report("cannot reset vEPC section %d\n", j);
>                 }
>             }
>         }
>         if (!failures) {
>             return 0;
>         }
>     }
>     return failures;
> 
> is enough, without any need to do further retries.
>

  Thanks Paolo, I will update it in the next version. Please also
  help review the other patches, thanks!

  Yang

> Paolo



