kernel-hardening.lists.openwall.com archive mirror
* [PATCH RFC 0/9] PKS write protected page tables
@ 2021-05-05  0:30 Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 1/9] list: Support getting most recent element in list_lru Rick Edgecombe
                   ` (11 more replies)
  0 siblings, 12 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

This is a POC for write protecting page tables with PKS (Protection Keys for 
Supervisor) [1]. The basic idea is to make the page tables read only, except 
temporarily on a per-cpu basis when they need to be modified. I’m looking for 
opinions on whether people like the general direction of this in terms of 
value and implementation.

Why would people want this?
===========================
Page tables are the basis for many types of protections and as such, are a 
juicy target for attackers. Mapping them read-only will make them harder to 
use in attacks.

This protects against an attacker that has acquired the ability to write to 
the page tables. It's not foolproof because an attacker who can execute 
arbitrary code can either disable PKS directly, or simply call the same 
functions that the kernel uses for legitimate page table writes.

Why use PKS for this?
=====================
PKS is an upcoming CPU feature that allows supervisor virtual memory 
permissions to be changed without flushing the TLB, like PKU does for user 
memory. Protecting page tables would normally be really expensive because you 
would have to do it with paging itself. PKS helps by providing a way to toggle 
the writability of the page tables with just a per-cpu MSR.
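To make the mechanism concrete, here is a userspace-style sketch of the bit
arithmetic involved. It assumes the PKRS MSR uses the same two-bits-per-key
layout as the user PKRU register (access-disable and write-disable bits); the
helper names are made up for illustration and are not the kernel's actual PKS
API.

```c
#include <stdint.h>

/* Illustrative only: IA32_PKRS is assumed to use the PKRU layout,
 * two bits per 4-bit key: Access-Disable (bit 0), Write-Disable (bit 1). */
#define PKR_AD_BIT		0x1u
#define PKR_WD_BIT		0x2u
#define PKR_BITS_PER_PKEY	2

/* Return a new PKRS value with @pkey set read-only (WD set, AD clear). */
static uint32_t pkrs_set_readonly(uint32_t pkrs, int pkey)
{
	int shift = pkey * PKR_BITS_PER_PKEY;

	pkrs &= ~((PKR_AD_BIT | PKR_WD_BIT) << shift);
	return pkrs | (PKR_WD_BIT << shift);
}

/* Return a new PKRS value with @pkey fully writable (both bits clear). */
static uint32_t pkrs_set_writable(uint32_t pkrs, int pkey)
{
	int shift = pkey * PKR_BITS_PER_PKEY;

	return pkrs & ~((PKR_AD_BIT | PKR_WD_BIT) << shift);
}
```

Flipping a key between read-only and writable is then a single per-cpu
register update, with no TLB flush.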

Performance impacts
===================
Setting direct map permissions on whatever random page gets allocated for a 
page table would result in a lot of kernel range shootdowns and direct map 
large page shattering. So the way the PKS page table memory is created is 
similar to this module page clustering series[2], where a cache of pages is 
replenished from 2MB pages such that the direct map permissions and associated 
breakage is localized on the direct map. In the PKS page tables case, a PKS 
key is pre-applied to the direct map for pages in the cache.

There would be some memory overhead cost in order to protect the direct
map page tables. There would also be some extra kernel range shootdowns to
replenish the cache on occasion, from setting the PKS key on the direct map of
the new pages. I don’t have any actual performance data yet.

This is based on V6 [1] of the core PKS infrastructure patches. PKS 
infrastructure follow-ons are planned to enable keys to be set to the same
permissions globally. Since this usage needs a key to be set globally 
read-only by default, a small temporary solution is hacked up in patch 8. Long 
term, PKS protected page tables would use a better and more generic solution 
to achieve this.

[1]
https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/
[2]
https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/

Thanks,

Rick


Rick Edgecombe (9):
  list: Support getting most recent element in list_lru
  list: Support list head not in object for list_lru
  x86/mm/cpa: Add grouped page allocations
  mm: Explicitly zero page table lock ptr
  x86, mm: Use cache of page tables
  x86/mm/cpa: Add set_memory_pks()
  x86/mm/cpa: Add perm callbacks to grouped pages
  x86, mm: Protect page tables with PKS
  x86, cpa: PKS protect direct map page tables

 arch/x86/boot/compressed/ident_map_64.c |   5 +
 arch/x86/include/asm/pgalloc.h          |   6 +
 arch/x86/include/asm/pgtable.h          |  26 +-
 arch/x86/include/asm/pgtable_64.h       |  33 ++-
 arch/x86/include/asm/pkeys_common.h     |   8 +-
 arch/x86/include/asm/set_memory.h       |  23 ++
 arch/x86/mm/init.c                      |  40 +++
 arch/x86/mm/pat/set_memory.c            | 312 +++++++++++++++++++++++-
 arch/x86/mm/pgtable.c                   | 144 ++++++++++-
 include/asm-generic/pgalloc.h           |  42 +++-
 include/linux/list_lru.h                |  26 ++
 include/linux/mm.h                      |   7 +
 mm/Kconfig                              |   6 +-
 mm/list_lru.c                           |  38 ++-
 mm/memory.c                             |   1 +
 mm/swap.c                               |   7 +
 mm/swap_state.c                         |   6 +
 17 files changed, 705 insertions(+), 25 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 1/9] list: Support getting most recent element in list_lru
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 2/9] list: Support list head not in object for list_lru Rick Edgecombe
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

In future patches, some functionality will use list_lru that also needs
to keep track of the most recently used element on a node. Since this
information is already contained within list_lru, add a function to get
it so that an additional list is not needed in the caller.

Do not support memcg aware list_lrus, since this is not needed by the
intended caller.
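Since list_lru_add() inserts at the tail of a node's list, the most recently
added element is simply the tail. A minimal userspace sketch of that removal,
using a toy circular list modeled on the kernel's struct list_head (the
locking and per-node bookkeeping of the real function are omitted here):

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, mirroring struct list_head.
 * Because additions go to the tail, head->prev is always the MRU item. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_add_tail(struct list_head *item, struct list_head *h)
{
	item->prev = h->prev;
	item->next = h;
	h->prev->next = item;
	h->prev = item;
}

/* Pop the most-recently-added element, or NULL if the list is empty. */
static struct list_head *list_pop_mru(struct list_head *h)
{
	struct list_head *mru = h->prev;

	if (mru == h)
		return NULL;
	mru->prev->next = h;
	h->prev = mru->prev;
	mru->prev = mru->next = mru;	/* like list_del_init() */
	return mru;
}
```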

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/list_lru.h | 13 +++++++++++++
 mm/list_lru.c            | 28 ++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 9dcaa3e582c9..4bde44a5024b 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -103,6 +103,19 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item);
  */
 bool list_lru_del(struct list_lru *lru, struct list_head *item);
 
+/**
+ * list_lru_get_mru: get and remove the tail of one of the node lists
+ * @lru: the lru pointer
+ * @nid: the node id
+ *
+ * This function removes the most recently added item from the list of the
+ * node id specified. This function must not be used if the list_lru is
+ * memcg aware.
+ *
+ * Return value: The element removed
+ */
+struct list_head *list_lru_get_mru(struct list_lru *lru, int nid);
+
 /**
  * list_lru_count_one: return the number of objects currently held by @lru
  * @lru: the lru pointer.
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 6f067b6b935f..fd5b19dcfc72 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -156,6 +156,34 @@ bool list_lru_del(struct list_lru *lru, struct list_head *item)
 }
 EXPORT_SYMBOL_GPL(list_lru_del);
 
+struct list_head *list_lru_get_mru(struct list_lru *lru, int nid)
+{
+	struct list_lru_node *nlru = &lru->node[nid];
+	struct list_lru_one *l = &nlru->lru;
+	struct list_head *ret;
+
+	/* This function does not attempt to search through the memcg lists */
+	if (list_lru_memcg_aware(lru)) {
+		WARN_ONCE(1, "list_lru: %s not supported on memcg aware list_lrus", __func__);
+		return NULL;
+	}
+
+	spin_lock(&nlru->lock);
+	if (list_empty(&l->list)) {
+		ret = NULL;
+	} else {
+		/* Get tail */
+		ret = l->list.prev;
+		list_del_init(ret);
+
+		l->nr_items--;
+		nlru->nr_items--;
+	}
+	spin_unlock(&nlru->lock);
+
+	return ret;
+}
+
 void list_lru_isolate(struct list_lru_one *list, struct list_head *item)
 {
 	list_del_init(item);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 2/9] list: Support list head not in object for list_lru
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 1/9] list: Support getting most recent element in list_lru Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations Rick Edgecombe
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

In future patches, there will be a need to keep track of objects with
list_lru where the list_head is not in the object (it will be in struct
page). Since list_lru automatically determines the node id from the
list_head, this will fail when using struct page.

So create a new function in list_lru, list_lru_add_node(), that allows
the node id of the item to be passed in. Otherwise it behaves exactly
like list_lru_add().

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/list_lru.h | 13 +++++++++++++
 mm/list_lru.c            | 10 ++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h
index 4bde44a5024b..7ad149b22223 100644
--- a/include/linux/list_lru.h
+++ b/include/linux/list_lru.h
@@ -90,6 +90,19 @@ void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg);
  */
 bool list_lru_add(struct list_lru *lru, struct list_head *item);
 
+/**
+ * list_lru_add_node: add an element to the lru list's tail
+ * @lru: the lru pointer
+ * @item: the item to be added.
+ * @nid: the node id of the item
+ *
+ * Like list_lru_add, but takes the node id as parameter instead of
+ * calculating it from the list_head passed in.
+ *
+ * Return value: true if the list was updated, false otherwise
+ */
+bool list_lru_add_node(struct list_lru *lru, struct list_head *item, int nid);
+
 /**
  * list_lru_del: delete an element to the lru list
  * @list_lru: the lru pointer
diff --git a/mm/list_lru.c b/mm/list_lru.c
index fd5b19dcfc72..8e32a6fc1527 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -112,9 +112,8 @@ list_lru_from_kmem(struct list_lru_node *nlru, void *ptr,
 }
 #endif /* CONFIG_MEMCG_KMEM */
 
-bool list_lru_add(struct list_lru *lru, struct list_head *item)
+bool list_lru_add_node(struct list_lru *lru, struct list_head *item, int nid)
 {
-	int nid = page_to_nid(virt_to_page(item));
 	struct list_lru_node *nlru = &lru->node[nid];
 	struct mem_cgroup *memcg;
 	struct list_lru_one *l;
@@ -134,6 +133,13 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item)
 	spin_unlock(&nlru->lock);
 	return false;
 }
+
+bool list_lru_add(struct list_lru *lru, struct list_head *item)
+{
+	int nid = page_to_nid(virt_to_page(item));
+
+	return list_lru_add_node(lru, item, nid);
+}
 EXPORT_SYMBOL_GPL(list_lru_add);
 
 bool list_lru_del(struct list_lru *lru, struct list_head *item)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 1/9] list: Support getting most recent element in list_lru Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 2/9] list: Support list head not in object for list_lru Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05 12:08   ` Mike Rapoport
  2021-05-05  0:30 ` [PATCH RFC 4/9] mm: Explicitly zero page table lock ptr Rick Edgecombe
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

For x86, setting memory permissions on the direct map results in fracturing
large pages. Direct map fracturing can be reduced by locating pages that
will have their permissions set close together.

Create a simple page cache that allocates pages from huge page size
blocks. Don't guarantee that a page will come from a huge page grouping;
instead fall back to non-grouped pages to fulfill the allocation if
needed. Also, register a shrinker so that the system can ask for the
pages back if needed. Since this is only needed when there is a direct
map, compile it out on highmem systems.

Free pages in the cache are tracked in per-node lists inside a
list_lru. NUMA_NO_NODE requests are serviced by checking each per-node
list in a round robin fashion. If pages are requested for a certain node
but the cache is empty for that node, a whole additional huge page size
page is allocated.
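The NUMA_NO_NODE round robin described above can be sketched in userspace
like this, with per-node free counts standing in for the per-node lists and a
C11 atomic standing in for atomic_fetch_inc() (rr_get_node() and its
parameters are illustrative, not the patch's API):

```c
#include <stdatomic.h>

/* Rotating starting point, like gpc->nid_round_robin in the patch. */
static _Atomic unsigned int nid_round_robin;

/* Probe each node's cache once, starting at a rotating node.
 * Returns the node a page was taken from, or -1 if every cache is empty
 * (at which point the caller replenishes from a fresh huge page block). */
static int rr_get_node(unsigned int nr_nodes, int stock[])
{
	unsigned int start, i, nid;

	start = atomic_fetch_add(&nid_round_robin, 1) % nr_nodes;
	for (i = 0; i < nr_nodes; i++) {
		nid = (start + i) % nr_nodes;
		if (stock[nid] > 0) {
			stock[nid]--;
			return nid;
		}
	}
	return -1;
}
```

Rotating the starting node spreads NUMA_NO_NODE allocations across nodes
instead of always draining node 0 first.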

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/set_memory.h |  14 +++
 arch/x86/mm/pat/set_memory.c      | 151 ++++++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4352f08bfbb5..b63f09cc282a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -4,6 +4,9 @@
 
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
+#include <linux/gfp.h>
+#include <linux/list_lru.h>
+#include <linux/shrinker.h>
 
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
@@ -135,4 +138,15 @@ static inline int clear_mce_nospec(unsigned long pfn)
  */
 #endif
 
+struct grouped_page_cache {
+	struct shrinker shrinker;
+	struct list_lru lru;
+	gfp_t gfp;
+	atomic_t nid_round_robin;
+};
+
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp);
+struct page *get_grouped_page(int node, struct grouped_page_cache *gpc);
+void free_grouped_page(struct grouped_page_cache *gpc, struct page *page);
+
 #endif /* _ASM_X86_SET_MEMORY_H */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..6877ef66793b 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2306,6 +2306,157 @@ int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
 	return retval;
 }
 
+#ifndef CONFIG_HIGHMEM
+static struct page *__alloc_page_order(int node, gfp_t gfp_mask, int order)
+{
+	if (node == NUMA_NO_NODE)
+		return alloc_pages(gfp_mask, order);
+
+	return alloc_pages_node(node, gfp_mask, order);
+}
+
+static struct grouped_page_cache *__get_gpc_from_sc(struct shrinker *shrinker)
+{
+	return container_of(shrinker, struct grouped_page_cache, shrinker);
+}
+
+static unsigned long grouped_shrink_count(struct shrinker *shrinker,
+					  struct shrink_control *sc)
+{
+	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
+	unsigned long page_cnt = list_lru_shrink_count(&gpc->lru, sc);
+
+	return page_cnt ? page_cnt : SHRINK_EMPTY;
+}
+
+static enum lru_status grouped_isolate(struct list_head *item,
+				       struct list_lru_one *list,
+				       spinlock_t *lock, void *cb_arg)
+{
+	struct list_head *dispose = cb_arg;
+
+	list_lru_isolate_move(list, item, dispose);
+
+	return LRU_REMOVED;
+}
+
+static void __dispose_pages(struct grouped_page_cache *gpc, struct list_head *head)
+{
+	struct list_head *cur, *next;
+
+	list_for_each_safe(cur, next, head) {
+		struct page *page = list_entry(cur, struct page, lru);
+
+		list_del(cur);
+
+		__free_pages(page, 0);
+	}
+}
+
+static unsigned long grouped_shrink_scan(struct shrinker *shrinker,
+					 struct shrink_control *sc)
+{
+	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
+	unsigned long isolated;
+	LIST_HEAD(freeable);
+
+	if (!(sc->gfp_mask & gpc->gfp))
+		return SHRINK_STOP;
+
+	isolated = list_lru_shrink_walk(&gpc->lru, sc, grouped_isolate,
+					&freeable);
+	__dispose_pages(gpc, &freeable);
+
+	/* Every item walked gets isolated */
+	sc->nr_scanned += isolated;
+
+	return isolated;
+}
+
+static struct page *__remove_first_page(struct grouped_page_cache *gpc, int node)
+{
+	unsigned int start_nid, i;
+	struct list_head *head;
+
+	if (node != NUMA_NO_NODE) {
+		head = list_lru_get_mru(&gpc->lru, node);
+		if (head)
+			return list_entry(head, struct page, lru);
+		return NULL;
+	}
+
+	/* If NUMA_NO_NODE, search the nodes in round robin for a page */
+	start_nid = (unsigned int)atomic_fetch_inc(&gpc->nid_round_robin) % nr_node_ids;
+	for (i = 0; i < nr_node_ids; i++) {
+		int cur_nid = (start_nid + i) % nr_node_ids;
+
+		head = list_lru_get_mru(&gpc->lru, cur_nid);
+		if (head)
+			return list_entry(head, struct page, lru);
+	}
+
+	return NULL;
+}
+
+/* Allocate and add some new pages to the cache to be handed out later */
+static struct page *__replenish_grouped_pages(struct grouped_page_cache *gpc, int node)
+{
+	const unsigned int hpage_cnt = HPAGE_SIZE >> PAGE_SHIFT;
+	struct page *page;
+	int i;
+
+	page = __alloc_page_order(node, gpc->gfp, HUGETLB_PAGE_ORDER);
+	if (!page)
+		return __alloc_page_order(node, gpc->gfp, 0);
+
+	split_page(page, HUGETLB_PAGE_ORDER);
+
+	for (i = 1; i < hpage_cnt; i++)
+		free_grouped_page(gpc, &page[i]);
+
+	return &page[0];
+}
+
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp)
+{
+	int err = -ENOMEM;
+
+	memset(gpc, 0, sizeof(struct grouped_page_cache));
+
+	if (list_lru_init(&gpc->lru))
+		goto out;
+
+	gpc->shrinker.count_objects = grouped_shrink_count;
+	gpc->shrinker.scan_objects = grouped_shrink_scan;
+	gpc->shrinker.seeks = DEFAULT_SEEKS;
+	gpc->shrinker.flags = SHRINKER_NUMA_AWARE;
+
+	err = register_shrinker(&gpc->shrinker);
+	if (err)
+		list_lru_destroy(&gpc->lru);
+
+out:
+	return err;
+}
+
+struct page *get_grouped_page(int node, struct grouped_page_cache *gpc)
+{
+	struct page *page;
+
+	page = __remove_first_page(gpc, node);
+
+	if (page)
+		return page;
+
+	return __replenish_grouped_pages(gpc, node);
+}
+
+void free_grouped_page(struct grouped_page_cache *gpc, struct page *page)
+{
+	INIT_LIST_HEAD(&page->lru);
+	list_lru_add_node(&gpc->lru, &page->lru, page_to_nid(page));
+}
+#endif /* !CONFIG_HIGHMEM */
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 4/9] mm: Explicitly zero page table lock ptr
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (2 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 5/9] x86, mm: Use cache of page tables Rick Edgecombe
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

In ptlock_init() there is a VM_BUG_ON_PAGE() check on the page table lock
pointer. Explicitly zero the lock in ptlock_free() so a page table lock
can be re-initialized without triggering the BUG_ON().

It appears this doesn't normally trigger because the private field
shares the same space in struct page as ptl and page tables always
return to the buddy allocator before being re-initialized as new page
tables. When the page returns to the buddy allocator, private gets
used to store the page order, so it inadvertently clears ptl as well.
In future patches, pages will get re-initialized as page tables without
returning to the buddy allocator so this is needed.
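The field overlay being described can be shown with a toy model: the split
lock pointer and the buddy allocator's private field share storage, so a trip
through the buddy allocator (which stores the free-page order in private)
incidentally wipes ptl, while a path that skips the buddy allocator must clear
it by hand, as this patch does. The struct and function here are illustrative
stand-ins, not the real struct page:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the struct page field sharing described above: in the
 * real struct page, ptl and private occupy the same union. */
struct toy_page {
	union {
		uintptr_t private;	/* buddy allocator: page order */
		void *ptl;		/* page table: split lock pointer */
	};
};

/* What freeing to the buddy allocator effectively does to the field. */
static void toy_free_to_buddy(struct toy_page *page, unsigned int order)
{
	page->private = order;	/* overwrites any stale ptl value */
}
```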

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 mm/memory.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memory.c b/mm/memory.c
index 5efa07fb6cdc..130f8c1e380a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5225,5 +5225,6 @@ bool ptlock_alloc(struct page *page)
 void ptlock_free(struct page *page)
 {
 	kmem_cache_free(page_ptl_cachep, page->ptl);
+	page->ptl = NULL;
 }
 #endif
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (3 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 4/9] mm: Explicitly zero page table lock ptr Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  8:51   ` Peter Zijlstra
  2021-05-06 18:24   ` Shakeel Butt
  2021-05-05  0:30 ` [PATCH RFC 6/9] x86/mm/cpa: Add set_memory_pks() Rick Edgecombe
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

Change the page table allocation functions defined in pgalloc.h to use
a cache of physically grouped pages. This will allow the page tables to
be set with PKS permissions later.

Userspace page tables are gathered up using mmu gather, and freed along
with other types of pages in swap.c. Reuse the PageTable page flag to
communicate that swap needs to return such a page to the cache of page
tables, and not free it to the page allocator. Set this flag in the
free_tlb() family of functions called by mmu gather.

Do not set PKS permissions on the page tables, because the page table
setting functions cannot handle it yet. This will be done in later
patches.
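The PageTable-flag dispatch described above amounts to a tagged free path: the
generic release code checks a per-page flag and routes the page either back to
the table cache or to the page allocator. A userspace sketch (the flag, enum,
and function names here are stand-ins for the kernel's versions):

```c
#include <stdbool.h>

enum where { FREED_TO_BUDDY, FREED_TO_TABLE_CACHE };

struct fake_page {
	bool is_table;		/* models PageTable(page) */
};

/* Models the check added to release_pages()/free_page_and_swap_cache(). */
static enum where release_one(struct fake_page *page)
{
	if (page->is_table) {
		page->is_table = false;		/* __ClearPageTable() */
		return FREED_TO_TABLE_CACHE;	/* free_table() */
	}
	return FREED_TO_BUDDY;			/* normal put_page() path */
}
```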

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgalloc.h |  4 ++
 arch/x86/mm/pgtable.c          | 75 ++++++++++++++++++++++++++++++++++
 include/asm-generic/pgalloc.h  | 42 +++++++++++++++----
 include/linux/mm.h             |  7 ++++
 mm/swap.c                      |  7 ++++
 mm/swap_state.c                |  6 +++
 6 files changed, 132 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 62ad61d6fefc..e38b54853a51 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -7,6 +7,10 @@
 #include <linux/pagemap.h>
 
 #define __HAVE_ARCH_PTE_ALLOC_ONE
+#ifdef CONFIG_PKS_PG_TABLES
+#define __HAVE_ARCH_FREE_TABLE
+#define __HAVE_ARCH_ALLOC_TABLE
+#endif
 #define __HAVE_ARCH_PGD_FREE
 #include <asm-generic/pgalloc.h>
 
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index f6a9e2e36642..7ccd031d2384 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -6,12 +6,16 @@
 #include <asm/tlb.h>
 #include <asm/fixmap.h>
 #include <asm/mtrr.h>
+#include <asm/set_memory.h>
+#include <linux/page-flags.h>
 
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
 phys_addr_t physical_mask __ro_after_init = (1ULL << __PHYSICAL_MASK_SHIFT) - 1;
 EXPORT_SYMBOL(physical_mask);
 #endif
 
+static struct grouped_page_cache gpc_pks;
+static bool pks_page_en;
 #ifdef CONFIG_HIGHPTE
 #define PGTABLE_HIGHMEM __GFP_HIGHMEM
 #else
@@ -33,6 +37,46 @@ pgtable_t pte_alloc_one(struct mm_struct *mm)
 	return __pte_alloc_one(mm, __userpte_alloc_gfp);
 }
 
+#ifdef CONFIG_PKS_PG_TABLES
+struct page *alloc_table(gfp_t gfp)
+{
+	struct page *table;
+
+	if (!pks_page_en)
+		return alloc_page(gfp);
+
+	table = get_grouped_page(numa_node_id(), &gpc_pks);
+	if (!table)
+		return NULL;
+
+	if (gfp & __GFP_ZERO)
+		memset(page_address(table), 0, PAGE_SIZE);
+
+	if (memcg_kmem_enabled() &&
+	    (gfp & __GFP_ACCOUNT) &&
+	    __memcg_kmem_charge_page(table, gfp, 0)) {
+		free_table(table);
+		return NULL;
+	}
+
+	VM_BUG_ON_PAGE(*(unsigned long *)&table->ptl, table);
+
+	return table;
+}
+
+void free_table(struct page *table_page)
+{
+	if (!pks_page_en) {
+		__free_pages(table_page, 0);
+		return;
+	}
+
+	if (memcg_kmem_enabled() && PageMemcgKmem(table_page))
+		__memcg_kmem_uncharge_page(table_page, 0);
+	free_grouped_page(&gpc_pks, table_page);
+}
+#endif /* CONFIG_PKS_PG_TABLES */
+
 static int __init setup_userpte(char *arg)
 {
 	if (!arg)
@@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
 {
 	pgtable_pte_page_dtor(pte);
 	paravirt_release_pte(page_to_pfn(pte));
+	/* Set Page Table so swap knows how to free it */
+	__SetPageTable(pte);
 	paravirt_tlb_remove_table(tlb, pte);
 }
 
@@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
 	tlb->need_flush_all = 1;
 #endif
 	pgtable_pmd_page_dtor(page);
+	/* Set Page Table so swap knows how to free it */
+	__SetPageTable(virt_to_page(pmd));
 	paravirt_tlb_remove_table(tlb, page);
 }
 
 #if CONFIG_PGTABLE_LEVELS > 3
 void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 {
+	/* Set Page Table so swap knows how to free it */
+	__SetPageTable(virt_to_page(pud));
 	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
 	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
 }
@@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
 #if CONFIG_PGTABLE_LEVELS > 4
 void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
 {
+	/* Set Page Table so swap knows how to free it */
+	__SetPageTable(virt_to_page(p4d));
 	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
 	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
 }
@@ -411,12 +463,24 @@ static inline void _pgd_free(pgd_t *pgd)
 
 static inline pgd_t *_pgd_alloc(void)
 {
+	if (pks_page_en) {
+		struct page *page = alloc_table(GFP_PGTABLE_USER);
+
+		if (!page)
+			return NULL;
+		return page_address(page);
+	}
+
 	return (pgd_t *)__get_free_pages(GFP_PGTABLE_USER,
 					 PGD_ALLOCATION_ORDER);
 }
 
 static inline void _pgd_free(pgd_t *pgd)
 {
+	if (pks_page_en) {
+		free_table(virt_to_page(pgd));
+		return;
+	}
 	free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
 }
 #endif /* CONFIG_X86_PAE */
@@ -859,6 +923,17 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 	return 1;
 }
 
+#ifdef CONFIG_PKS_PG_TABLES
+static int __init pks_page_init(void)
+{
+	pks_page_en = !init_grouped_page_cache(&gpc_pks,
+					       GFP_KERNEL | PGTABLE_HIGHMEM);
+
+	return !pks_page_en;
+}
+
+device_initcall(pks_page_init);
+#endif /* CONFIG_PKS_PG_TABLES */
 #else /* !CONFIG_X86_64 */
 
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 02932efad3ab..3437db2a2740 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -2,11 +2,26 @@
 #ifndef __ASM_GENERIC_PGALLOC_H
 #define __ASM_GENERIC_PGALLOC_H
 
+#include <linux/mm.h>
+
 #ifdef CONFIG_MMU
 
 #define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)
 #define GFP_PGTABLE_USER	(GFP_PGTABLE_KERNEL | __GFP_ACCOUNT)
 
+#ifndef __HAVE_ARCH_ALLOC_TABLE
+static inline struct page *alloc_table(gfp_t gfp)
+{
+	return alloc_page(gfp);
+}
+#else /* __HAVE_ARCH_ALLOC_TABLE */
+extern struct page *alloc_table(gfp_t gfp);
+#endif /* __HAVE_ARCH_ALLOC_TABLE */
+
+#ifdef __HAVE_ARCH_FREE_TABLE
+extern void free_table(struct page *);
+#endif /* __HAVE_ARCH_FREE_TABLE */
+
 /**
  * __pte_alloc_one_kernel - allocate a page for PTE-level kernel page table
  * @mm: the mm_struct of the current context
@@ -18,7 +33,12 @@
  */
 static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm)
 {
-	return (pte_t *)__get_free_page(GFP_PGTABLE_KERNEL);
+	struct page *page = alloc_table(GFP_PGTABLE_KERNEL);
+
+	if (!page)
+		return NULL;
+
+	return (pte_t *)page_address(page);
 }
 
 #ifndef __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
@@ -41,7 +61,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
  */
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
-	free_page((unsigned long)pte);
+	free_table(virt_to_page(pte));
 }
 
 /**
@@ -60,11 +80,11 @@ static inline pgtable_t __pte_alloc_one(struct mm_struct *mm, gfp_t gfp)
 {
 	struct page *pte;
 
-	pte = alloc_page(gfp);
+	pte = alloc_table(gfp);
 	if (!pte)
 		return NULL;
 	if (!pgtable_pte_page_ctor(pte)) {
-		__free_page(pte);
+		free_table(pte);
 		return NULL;
 	}
 
@@ -99,7 +119,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
 static inline void pte_free(struct mm_struct *mm, struct page *pte_page)
 {
 	pgtable_pte_page_dtor(pte_page);
-	__free_page(pte_page);
+	free_table(pte_page);
 }
 
 
@@ -123,11 +143,11 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	page = alloc_pages(gfp, 0);
+	page = alloc_table(gfp);
 	if (!page)
 		return NULL;
 	if (!pgtable_pmd_page_ctor(page)) {
-		__free_pages(page, 0);
+		free_table(page);
 		return NULL;
 	}
 	return (pmd_t *)page_address(page);
@@ -139,7 +159,7 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 {
 	BUG_ON((unsigned long)pmd & (PAGE_SIZE-1));
 	pgtable_pmd_page_dtor(virt_to_page(pmd));
-	free_page((unsigned long)pmd);
+	free_table(virt_to_page(pmd));
 }
 #endif
 
@@ -160,10 +180,14 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 	gfp_t gfp = GFP_PGTABLE_USER;
+	struct page *table;
 
 	if (mm == &init_mm)
 		gfp = GFP_PGTABLE_KERNEL;
-	return (pud_t *)get_zeroed_page(gfp);
+	table = alloc_table(gfp);
+	if (!table)
+		return NULL;
+	return (pud_t *)page_address(table);
 }
 #endif
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64a71bf20536..d6dedfc02aab 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2185,6 +2185,13 @@ static inline bool ptlock_init(struct page *page) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
+#ifndef CONFIG_PKS_PG_TABLES
+static inline void free_table(struct page *table_page)
+{
+	__free_pages(table_page, 0);
+}
+#endif /* !CONFIG_PKS_PG_TABLES */
+
 static inline void pgtable_init(void)
 {
 	ptlock_cache_init();
diff --git a/mm/swap.c b/mm/swap.c
index 31b844d4ed94..d6ff697be28e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -36,6 +36,7 @@
 #include <linux/hugetlb.h>
 #include <linux/page_idle.h>
 #include <linux/local_lock.h>
+#include <asm/pgalloc.h>
 
 #include "internal.h"
 
@@ -888,6 +889,12 @@ void release_pages(struct page **pages, int nr)
 			continue;
 		}
 
+		if (PageTable(page)) {
+			__ClearPageTable(page);
+			free_table(page);
+			continue;
+		}
+
 		if (!put_page_testzero(page))
 			continue;
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3cdee7b11da9..a60ec3d4ab21 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -22,6 +22,7 @@
 #include <linux/swap_slots.h>
 #include <linux/huge_mm.h>
 #include <linux/shmem_fs.h>
+#include <asm/pgalloc.h>
 #include "internal.h"
 
 /*
@@ -310,6 +311,11 @@ static inline void free_swap_cache(struct page *page)
 void free_page_and_swap_cache(struct page *page)
 {
 	free_swap_cache(page);
+	if (PageTable(page)) {
+		__ClearPageTable(page);
+		free_table(page);
+		return;
+	}
 	if (!is_huge_zero_page(page))
 		put_page(page);
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 6/9] x86/mm/cpa: Add set_memory_pks()
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (4 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 5/9] x86, mm: Use cache of page tables Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 7/9] x86/mm/cpa: Add perm callbacks to grouped pages Rick Edgecombe
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

Add a function for setting the PKS protection key on kernel memory.
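For reference, on x86 the protection key occupies PTE bits 62:59, so applying
a key means setting the key's bits while clearing the complementary ones,
which is what the set/clear prot masks in the patch accomplish. A userspace
sketch of the arithmetic (PAGE_PKEY() and pte_apply_pkey() here are local
stand-ins for the kernel's _PAGE_PKEY() macro and CPA machinery):

```c
#include <stdint.h>

/* Protection key field: 4 bits at PTE bits 62:59 (bit 63 is NX). */
#define PTE_PKEY_SHIFT	59
#define PAGE_PKEY(k)	((uint64_t)((k) & 0xF) << PTE_PKEY_SHIFT)

/* Apply @key to @pte the way change_page_attr_set_clr() would:
 * clear the complementary key bits, then set the key's bits. */
static uint64_t pte_apply_pkey(uint64_t pte, unsigned int key)
{
	pte &= ~PAGE_PKEY(0xF & ~key);	/* clear mask */
	return pte | PAGE_PKEY(key);	/* set mask */
}
```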

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/set_memory.h | 1 +
 arch/x86/mm/pat/set_memory.c      | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index b63f09cc282a..a2bab1626fdd 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -52,6 +52,7 @@ int set_memory_decrypted(unsigned long addr, int numpages);
 int set_memory_np_noalias(unsigned long addr, int numpages);
 int set_memory_nonglobal(unsigned long addr, int numpages);
 int set_memory_global(unsigned long addr, int numpages);
+int set_memory_pks(unsigned long addr, int numpages, int key);
 
 int set_pages_array_uc(struct page **pages, int addrinarray);
 int set_pages_array_wc(struct page **pages, int addrinarray);
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 6877ef66793b..29e61afb4a94 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -1914,6 +1914,13 @@ int set_memory_wb(unsigned long addr, int numpages)
 }
 EXPORT_SYMBOL(set_memory_wb);
 
+int set_memory_pks(unsigned long addr, int numpages, int key)
+{
+	return change_page_attr_set_clr(&addr, numpages, __pgprot(_PAGE_PKEY(key)),
+					__pgprot(_PAGE_PKEY(0xF & ~(unsigned int)key)),
+					0, 0, NULL);
+}
+
 int set_memory_x(unsigned long addr, int numpages)
 {
 	if (!(__supported_pte_mask & _PAGE_NX))
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 7/9] x86/mm/cpa: Add perm callbacks to grouped pages
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (5 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 6/9] x86/mm/cpa: Add set_memory_pks() Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 8/9] x86, mm: Protect page tables with PKS Rick Edgecombe
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

Future patches will need to set permissions on pages in the cache, so
add callbacks that let grouped page cache callers provide hooks the
component can call when replenishing the cache or freeing pages via
the shrinker.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/set_memory.h |  8 +++++++-
 arch/x86/mm/pat/set_memory.c      | 26 +++++++++++++++++++++++---
 arch/x86/mm/pgtable.c             |  3 ++-
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index a2bab1626fdd..b370a20681db 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -139,14 +139,20 @@ static inline int clear_mce_nospec(unsigned long pfn)
  */
 #endif
 
+typedef int (*gpc_callback)(struct page*, unsigned int);
+
 struct grouped_page_cache {
 	struct shrinker shrinker;
 	struct list_lru lru;
 	gfp_t gfp;
+	gpc_callback pre_add_to_cache;
+	gpc_callback pre_shrink_free;
 	atomic_t nid_round_robin;
 };
 
-int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp);
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp,
+			    gpc_callback pre_add_to_cache,
+			    gpc_callback pre_shrink_free);
 struct page *get_grouped_page(int node, struct grouped_page_cache *gpc);
 void free_grouped_page(struct grouped_page_cache *gpc, struct page *page);
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 29e61afb4a94..6387499c855d 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2356,6 +2356,9 @@ static void __dispose_pages(struct grouped_page_cache *gpc, struct list_head *he
 
 		list_del(cur);
 
+		if (gpc->pre_shrink_free)
+			gpc->pre_shrink_free(page, 1);
+
 		__free_pages(page, 0);
 	}
 }
@@ -2413,18 +2416,33 @@ static struct page *__replenish_grouped_pages(struct grouped_page_cache *gpc, in
 	int i;
 
 	page = __alloc_page_order(node, gpc->gfp, HUGETLB_PAGE_ORDER);
-	if (!page)
-		return __alloc_page_order(node, gpc->gfp, 0);
+	if (!page) {
+		page = __alloc_page_order(node, gpc->gfp, 0);
+		if (gpc->pre_add_to_cache)
+			gpc->pre_add_to_cache(page, 1);
+		return page;
+	}
 
 	split_page(page, HUGETLB_PAGE_ORDER);
 
+	/* If preparing the pages for the cache fails, try to clean up and free */
+	if (gpc->pre_add_to_cache && gpc->pre_add_to_cache(page, hpage_cnt)) {
+		if (gpc->pre_shrink_free)
+			gpc->pre_shrink_free(page, hpage_cnt);
+		for (i = 0; i < hpage_cnt; i++)
+			__free_pages(&page[i], 0);
+		return NULL;
+	}
+
 	for (i = 1; i < hpage_cnt; i++)
 		free_grouped_page(gpc, &page[i]);
 
 	return &page[0];
 }
 
-int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp)
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp,
+			    gpc_callback pre_add_to_cache,
+			    gpc_callback pre_shrink_free)
 {
 	int err = 0;
 
@@ -2442,6 +2460,8 @@ int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp)
 	if (err)
 		list_lru_destroy(&gpc->lru);
 
+	gpc->pre_add_to_cache = pre_add_to_cache;
+	gpc->pre_shrink_free = pre_shrink_free;
 out:
 	return err;
 }
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 7ccd031d2384..bcef1f458b75 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -926,7 +926,8 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 #ifdef CONFIG_PKS_PG_TABLES
 static int __init pks_page_init(void)
 {
-	pks_page_en = !init_grouped_page_cache(&gpc_pks, GFP_KERNEL | PGTABLE_HIGHMEM);
+	pks_page_en = !init_grouped_page_cache(&gpc_pks, GFP_KERNEL | PGTABLE_HIGHMEM,
+					       NULL, NULL);
 
 out:
 	return !pks_page_en;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 8/9] x86, mm: Protect page tables with PKS
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (6 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 7/9] x86/mm/cpa: Add perm callbacks to grouped pages Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  0:30 ` [PATCH RFC 9/9] x86, cpa: PKS protect direct map page tables Rick Edgecombe
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

Write protect page tables with PKS. Toggle writability inside the page
table modification functions defined in pgtable.h.

Do not protect the direct map page tables as it is more complicated and
will come in a later patch.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/boot/compressed/ident_map_64.c |  5 ++
 arch/x86/include/asm/pgalloc.h          |  2 +
 arch/x86/include/asm/pgtable.h          | 26 ++++++++-
 arch/x86/include/asm/pgtable_64.h       | 33 ++++++++++--
 arch/x86/include/asm/pkeys_common.h     |  8 ++-
 arch/x86/mm/pgtable.c                   | 72 ++++++++++++++++++++++---
 mm/Kconfig                              |  6 ++-
 7 files changed, 140 insertions(+), 12 deletions(-)

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..2999be8f9347 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -349,3 +349,8 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 */
 	add_identity_map(address, end);
 }
+
+#ifdef CONFIG_PKS_PG_TABLES
+void enable_pgtable_write(void) {}
+void disable_pgtable_write(void) {}
+#endif
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index e38b54853a51..f1062d23d7c7 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -6,6 +6,8 @@
 #include <linux/mm.h>		/* for struct page */
 #include <linux/pagemap.h>
 
+#define STATIC_TABLE_KEY	1
+
 #define __HAVE_ARCH_PTE_ALLOC_ONE
 #ifdef CONFIG_PKS_PG_TABLES
 #define __HAVE_ARCH_FREE_TABLE
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index b1529b44a996..da6bae8bef7a 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -117,6 +117,14 @@ extern pmdval_t early_pmd_flags;
 #define arch_end_context_switch(prev)	do {} while(0)
 #endif	/* CONFIG_PARAVIRT_XXL */
 
+#ifdef CONFIG_PKS_PG_TABLES
+void enable_pgtable_write(void);
+void disable_pgtable_write(void);
+#else /* CONFIG_PKS_PG_TABLES */
+static void enable_pgtable_write(void) { }
+static void disable_pgtable_write(void) { }
+#endif /* CONFIG_PKS_PG_TABLES */
+
 /*
  * The following only work if pte_present() is true.
  * Undefined behaviour if not..
@@ -1102,7 +1110,9 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
 static inline void ptep_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pte_t *ptep)
 {
+	enable_pgtable_write();
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
+	disable_pgtable_write();
 }
 
 #define flush_tlb_fix_spurious_fault(vma, address) do { } while (0)
@@ -1152,7 +1162,9 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long addr, pmd_t *pmdp)
 {
+	enable_pgtable_write();
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
+	disable_pgtable_write();
 }
 
 #define pud_write pud_write
@@ -1167,10 +1179,18 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp, pmd_t pmd)
 {
 	if (IS_ENABLED(CONFIG_SMP)) {
-		return xchg(pmdp, pmd);
+		pmd_t ret;
+
+		enable_pgtable_write();
+		ret = xchg(pmdp, pmd);
+		disable_pgtable_write();
+
+		return ret;
 	} else {
 		pmd_t old = *pmdp;
+		enable_pgtable_write();
 		WRITE_ONCE(*pmdp, pmd);
+		disable_pgtable_write();
 		return old;
 	}
 }
@@ -1253,13 +1273,17 @@ static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
  */
 static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
 {
+	enable_pgtable_write();
 	memcpy(dst, src, count * sizeof(pgd_t));
+	disable_pgtable_write();
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
 	/* Clone the user space pgd as well */
+	enable_pgtable_write();
 	memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
 	       count * sizeof(pgd_t));
+	disable_pgtable_write();
 #endif
 }
 
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 56d0399a0cd1..a287f3c8a0a3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -64,7 +64,9 @@ void set_pte_vaddr_pud(pud_t *pud_page, unsigned long vaddr, pte_t new_pte);
 
 static inline void native_set_pte(pte_t *ptep, pte_t pte)
 {
+	enable_pgtable_write();
 	WRITE_ONCE(*ptep, pte);
+	disable_pgtable_write();
 }
 
 static inline void native_pte_clear(struct mm_struct *mm, unsigned long addr,
@@ -80,7 +82,9 @@ static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
 
 static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
+	enable_pgtable_write();
 	WRITE_ONCE(*pmdp, pmd);
+	disable_pgtable_write();
 }
 
 static inline void native_pmd_clear(pmd_t *pmd)
@@ -91,7 +95,12 @@ static inline void native_pmd_clear(pmd_t *pmd)
 static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 {
 #ifdef CONFIG_SMP
-	return native_make_pte(xchg(&xp->pte, 0));
+	pteval_t pte_val;
+
+	enable_pgtable_write();
+	pte_val = xchg(&xp->pte, 0);
+	disable_pgtable_write();
+	return native_make_pte(pte_val);
 #else
 	/* native_local_ptep_get_and_clear,
 	   but duplicated because of cyclic dependency */
@@ -104,7 +113,12 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
 #ifdef CONFIG_SMP
-	return native_make_pmd(xchg(&xp->pmd, 0));
+	pteval_t pte_val;
+
+	enable_pgtable_write();
+	pte_val = xchg(&xp->pmd, 0);
+	disable_pgtable_write();
+	return native_make_pmd(pte_val);
 #else
 	/* native_local_pmdp_get_and_clear,
 	   but duplicated because of cyclic dependency */
@@ -116,7 +130,9 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
+	enable_pgtable_write();
 	WRITE_ONCE(*pudp, pud);
+	disable_pgtable_write();
 }
 
 static inline void native_pud_clear(pud_t *pud)
@@ -127,7 +143,12 @@ static inline void native_pud_clear(pud_t *pud)
 static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 {
 #ifdef CONFIG_SMP
-	return native_make_pud(xchg(&xp->pud, 0));
+	pteval_t pte_val;
+
+	enable_pgtable_write();
+	pte_val = xchg(&xp->pud, 0);
+	disable_pgtable_write();
+	return native_make_pud(pte_val);
 #else
 	/* native_local_pudp_get_and_clear,
 	 * but duplicated because of cyclic dependency
@@ -144,13 +165,17 @@ static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 	pgd_t pgd;
 
 	if (pgtable_l5_enabled() || !IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) {
+		enable_pgtable_write();
 		WRITE_ONCE(*p4dp, p4d);
+		disable_pgtable_write();
 		return;
 	}
 
 	pgd = native_make_pgd(native_p4d_val(p4d));
 	pgd = pti_set_user_pgtbl((pgd_t *)p4dp, pgd);
+	enable_pgtable_write();
 	WRITE_ONCE(*p4dp, native_make_p4d(native_pgd_val(pgd)));
+	disable_pgtable_write();
 }
 
 static inline void native_p4d_clear(p4d_t *p4d)
@@ -160,7 +185,9 @@ static inline void native_p4d_clear(p4d_t *p4d)
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
+	enable_pgtable_write();
 	WRITE_ONCE(*pgdp, pti_set_user_pgtbl(pgdp, pgd));
+	disable_pgtable_write();
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 6917f1a27479..5682a922d60f 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -25,7 +25,13 @@
  *
  * NOTE: This needs to be a macro to be used as part of the INIT_THREAD macro.
  */
-#define INIT_PKRS_VALUE (PKR_AD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \
+
+/*
+ * HACK: There is no global pkeys support yet. We want the pg table key to be
+ * read only, not disabled. Assume the page table key will be key 1 and set it
+ * WD in the default mask.
+ */
+#define INIT_PKRS_VALUE (PKR_WD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \
 			 PKR_AD_KEY(4) | PKR_AD_KEY(5) | PKR_AD_KEY(6) | \
 			 PKR_AD_KEY(7) | PKR_AD_KEY(8) | PKR_AD_KEY(9) | \
 			 PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index bcef1f458b75..6e536fe77943 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -7,6 +7,7 @@
 #include <asm/fixmap.h>
 #include <asm/mtrr.h>
 #include <asm/set_memory.h>
+#include <linux/pkeys.h>
 #include <linux/page-flags.h>
 
 #ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
@@ -16,6 +17,7 @@ EXPORT_SYMBOL(physical_mask);
 
 static struct grouped_page_cache gpc_pks;
 static bool pks_page_en;
+
 #ifdef CONFIG_HIGHPTE
 #define PGTABLE_HIGHMEM __GFP_HIGHMEM
 #else
@@ -49,8 +51,11 @@ struct page *alloc_table(gfp_t gfp)
 	if (!table)
 		return NULL;
 
-	if (gfp & __GFP_ZERO)
+	if (gfp & __GFP_ZERO) {
+		enable_pgtable_write();
 		memset(page_address(table), 0, PAGE_SIZE);
+		disable_pgtable_write();
+	}
 
 	if (memcg_kmem_enabled() &&
 	    gfp & __GFP_ACCOUNT &&
@@ -607,9 +612,12 @@ int ptep_test_and_clear_young(struct vm_area_struct *vma,
 {
 	int ret = 0;
 
-	if (pte_young(*ptep))
+	if (pte_young(*ptep)) {
+		enable_pgtable_write();
 		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
 					 (unsigned long *) &ptep->pte);
+		disable_pgtable_write();
+	}
 
 	return ret;
 }
@@ -620,9 +628,12 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 {
 	int ret = 0;
 
-	if (pmd_young(*pmdp))
+	if (pmd_young(*pmdp)) {
+		enable_pgtable_write();
 		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
 					 (unsigned long *)pmdp);
+		disable_pgtable_write();
+	}
 
 	return ret;
 }
@@ -631,9 +642,12 @@ int pudp_test_and_clear_young(struct vm_area_struct *vma,
 {
 	int ret = 0;
 
-	if (pud_young(*pudp))
+	if (pud_young(*pudp)) {
+		enable_pgtable_write();
 		ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
 					 (unsigned long *)pudp);
+		disable_pgtable_write();
+	}
 
 	return ret;
 }
@@ -642,6 +656,7 @@ int pudp_test_and_clear_young(struct vm_area_struct *vma,
 int ptep_clear_flush_young(struct vm_area_struct *vma,
 			   unsigned long address, pte_t *ptep)
 {
+	int ret;
 	/*
 	 * On x86 CPUs, clearing the accessed bit without a TLB flush
 	 * doesn't cause data corruption. [ It could cause incorrect
@@ -655,7 +670,10 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
 	 * shouldn't really matter because there's no real memory
 	 * pressure for swapout to react to. ]
 	 */
-	return ptep_test_and_clear_young(vma, address, ptep);
+	enable_pgtable_write();
+	ret = ptep_test_and_clear_young(vma, address, ptep);
+	disable_pgtable_write();
+	return ret;
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -666,7 +684,9 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
+	enable_pgtable_write();
 	young = pmdp_test_and_clear_young(vma, address, pmdp);
+	disable_pgtable_write();
 	if (young)
 		flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
 
@@ -924,10 +944,50 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 }
 
 #ifdef CONFIG_PKS_PG_TABLES
+static int pks_key;
+
+static int _pks_protect(struct page *page, unsigned int cnt)
+{
+	/* TODO: do this in one step */
+	if (set_memory_4k((unsigned long)page_address(page), cnt))
+		return 1;
+	set_memory_pks((unsigned long)page_address(page), cnt, pks_key);
+	return 0;
+}
+
+static int _pks_unprotect(struct page *page, unsigned int cnt)
+{
+	set_memory_pks((unsigned long)page_address(page), cnt, 0);
+	return 0;
+}
+
+void enable_pgtable_write(void)
+{
+	if (pks_page_en)
+		pks_mk_readwrite(STATIC_TABLE_KEY);
+}
+
+void disable_pgtable_write(void)
+{
+	if (pks_page_en)
+		pks_mk_readonly(STATIC_TABLE_KEY);
+}
+
 static int __init pks_page_init(void)
 {
+	/*
+	 * TODO: Needs global keys to be initially set globally readable, for now
+	 * warn if it's not the expected static key
+	 */
+	pks_key = pks_key_alloc("PKS protected page tables");
+	if (pks_key < 0)
+		goto out;
+	WARN_ON(pks_key != STATIC_TABLE_KEY);
+
 	pks_page_en = !init_grouped_page_cache(&gpc_pks, GFP_KERNEL | PGTABLE_HIGHMEM,
-					       NULL, NULL);
+					       _pks_protect, _pks_unprotect);
+	if (!pks_page_en)
+		pks_key_free(pks_key);
 
 out:
 	return !pks_page_en;
diff --git a/mm/Kconfig b/mm/Kconfig
index 463e95ea0df1..0a856332fd38 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -812,7 +812,11 @@ config ARCH_HAS_SUPERVISOR_PKEYS
 	bool
 config ARCH_ENABLE_SUPERVISOR_PKEYS
 	def_bool y
-	depends on PKS_TEST
+	depends on PKS_TEST || PKS_PG_TABLES
+
+config PKS_PG_TABLES
+	def_bool y
+	depends on !PAGE_TABLE_ISOLATION && !HIGHMEM && !X86_PAE && PGTABLE_LEVELS = 4
 
 config PERCPU_STATS
 	bool "Collect percpu memory statistics"
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH RFC 9/9] x86, cpa: PKS protect direct map page tables
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (7 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 8/9] x86, mm: Protect page tables with PKS Rick Edgecombe
@ 2021-05-05  0:30 ` Rick Edgecombe
  2021-05-05  2:03 ` [PATCH RFC 0/9] PKS write protected " Ira Weiny
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 32+ messages in thread
From: Rick Edgecombe @ 2021-05-05  0:30 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel, Rick Edgecombe

Protecting direct map page tables is a bit more difficult because a page
table may be needed for a page split while setting the PKS permission on
a new page table. So with an empty cache of page tables, the page table
allocator could get into a situation where it cannot create any more page
tables.

Several solutions were looked at:

1. Break the direct map with pages allocated from the large page being
converted to PKS. This would result in a window where the table could be
written to right before it was linked into the page tables. It also
depends on high order pages being available, and so would regress from
the unprotected behavior in that respect.
2. Hold some page tables in reserve to be able to break the large page
for a new 2MB page, but if there are no 2MB pages available we may need
to add a single page to the cache, in which case we would use up the
reserve of page tables needed to break a new page, but not get enough
page tables back to replenish the reserve.
3. Always map the direct map at 4k when protecting page tables so that
pages don't need to be broken to map them with a PKS key. This would have
an undesirable performance cost.

4. Lastly, the strategy employed in this patch: keep a separate cache of
page tables used only for the direct map. Early in boot, squirrel away
enough page tables to map the direct map at 4k. This comes with the same
memory overhead as mapping the direct map at 4k, but retains the other
benefits of mapping the direct map with large pages.

Some direct map page tables currently still escape protection, so there
are a few TODOs. This is a rough sketch of the idea.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/set_memory.h |   2 +
 arch/x86/mm/init.c                |  40 +++++++++
 arch/x86/mm/pat/set_memory.c      | 134 +++++++++++++++++++++++++++++-
 3 files changed, 172 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index b370a20681db..55e2add0452b 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -90,6 +90,8 @@ bool kernel_page_present(struct page *page);
 
 extern int kernel_set_to_readonly;
 
+void add_pks_table(unsigned long addr);
+
 #ifdef CONFIG_X86_64
 /*
  * Prevent speculative access to the page by either unmapping
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index dd694fb93916..09ae02003151 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -26,6 +26,7 @@
 #include <asm/pti.h>
 #include <asm/text-patching.h>
 #include <asm/memtype.h>
+#include <asm/pgalloc.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -119,6 +120,8 @@ __ref void *alloc_low_pages(unsigned int num)
 	if (after_bootmem) {
 		unsigned int order;
 
+		WARN_ON(IS_ENABLED(CONFIG_PKS_PG_TABLES));
+		/* TODO: When does this happen, how to deal with the order? */
 		order = get_order((unsigned long)num << PAGE_SHIFT);
 		return (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 	}
@@ -153,6 +156,11 @@ __ref void *alloc_low_pages(unsigned int num)
 		clear_page(adr);
 	}
 
+	printk("Allocating unprotected page table: %lx\n", (unsigned long)__va(pfn << PAGE_SHIFT));
+	/*
+	 * TODO: Save the va of this table to PKS protect post boot, but we need a small allocation
+	 * for the list...
+	 */
 	return __va(pfn << PAGE_SHIFT);
 }
 
@@ -532,6 +540,36 @@ unsigned long __ref init_memory_mapping(unsigned long start,
 	return ret >> PAGE_SHIFT;
 }
 
+/* TODO: Check this math */
+static u64 calc_tables_needed(unsigned int size)
+{
+	unsigned int puds = size >> PUD_SHIFT;
+	unsigned int pmds = size >> PMD_SHIFT;
+	unsigned int needed_to_map_tables = 0; //??
+
+	return puds + pmds + needed_to_map_tables;
+}
+
+static void __init reserve_page_tables(u64 start, u64 end)
+{
+	u64 reserve_size = calc_tables_needed(end - start);
+	u64 reserved = 0;
+	u64 cur;
+	int i;
+
+	while (reserved < reserve_size) {
+		cur = memblock_find_in_range(start, end, HPAGE_SIZE, HPAGE_SIZE);
+		if (!cur) {
+			WARN(1, "Could not reserve HPAGE size page tables");
+			return;
+		}
+		memblock_reserve(cur, HPAGE_SIZE);
+		for (i = 0; i < HPAGE_SIZE; i += PAGE_SIZE)
+			add_pks_table((long unsigned int)__va(cur + i));
+		reserved += HPAGE_SIZE;
+	}
+}
+
 /*
  * We need to iterate through the E820 memory map and create direct mappings
  * for only E820_TYPE_RAM and E820_KERN_RESERVED regions. We cannot simply
@@ -568,6 +606,8 @@ static unsigned long __init init_range_memory_mapping(
 		init_memory_mapping(start, end, PAGE_KERNEL);
 		mapped_ram_size += end - start;
 		can_use_brk_pgt = true;
+		if (IS_ENABLED(CONFIG_PKS_PG_TABLES))
+			reserve_page_tables(start, end);
 	}
 
 	return mapped_ram_size;
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 6387499c855d..a5d21a664c98 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -69,6 +69,90 @@ static DEFINE_SPINLOCK(cpa_lock);
 #define CPA_PAGES_ARRAY 4
 #define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
 
+#ifdef CONFIG_PKS_PG_TABLES
+static LLIST_HEAD(tables_cache);
+static LLIST_HEAD(tables_to_covert);
+static bool tables_inited;
+
+struct pks_table_llnode {
+	struct llist_node node;
+	void *table;
+};
+
+static void __add_dmap_table_to_convert(void *table, struct pks_table_llnode *ob)
+{
+	ob->table = table;
+	llist_add(&ob->node, &tables_to_covert);
+}
+
+static void add_dmap_table_to_convert(void *table)
+{
+	struct pks_table_llnode *ob;
+
+	ob = kmalloc(sizeof(*ob), GFP_KERNEL);
+
+	WARN(!ob, "Page table unprotected\n");
+
+	__add_dmap_table_to_convert(table, ob);
+}
+
+void add_pks_table(unsigned long addr)
+{
+	struct llist_node *node = (struct llist_node *)addr;
+
+	enable_pgtable_write();
+	llist_add(node, &tables_cache);
+	disable_pgtable_write();
+}
+
+static void *get_pks_table(void)
+{
+	return llist_del_first(&tables_cache);
+}
+
+static void *_alloc_dmap_table(void)
+{
+	struct page *page = alloc_pages(GFP_KERNEL, 0);
+
+	if (!page)
+		return NULL;
+
+	return page_address(page);
+}
+
+static struct page *alloc_dmap_table(void)
+{
+	void *tablep = get_pks_table();
+
+	/* Fall back to an unprotected table if something went wrong */
+	if (!tablep) {
+		if (tables_inited)
+			WARN(1, "Allocating unprotected direct map table\n");
+		tablep = _alloc_dmap_table();
+	}
+
+	if (tablep && !tables_inited)
+		add_dmap_table_to_convert(tablep);
+
+	return virt_to_page(tablep);
+}
+
+static void free_dmap_table(struct page *table)
+{
+	add_pks_table((unsigned long)virt_to_page(table));
+}
+#else /* CONFIG_PKS_PG_TABLES */
+static struct page *alloc_dmap_table(void)
+{
+	return alloc_pages(GFP_KERNEL, 0);
+}
+
+static void free_dmap_table(struct page *table)
+{
+	__free_page(table);
+}
+#endif
+
 static inline pgprot_t cachemode2pgprot(enum page_cache_mode pcm)
 {
 	return __pgprot(cachemode2protval(pcm));
@@ -1068,14 +1152,15 @@ static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
 
 	if (!debug_pagealloc_enabled())
 		spin_unlock(&cpa_lock);
-	base = alloc_pages(GFP_KERNEL, 0);
+	base = alloc_dmap_table();
+
 	if (!debug_pagealloc_enabled())
 		spin_lock(&cpa_lock);
 	if (!base)
 		return -ENOMEM;
 
 	if (__split_large_page(cpa, kpte, address, base))
-		__free_page(base);
+		free_dmap_table(base);
 
 	return 0;
 }
@@ -1088,7 +1173,7 @@ static bool try_to_free_pte_page(pte_t *pte)
 		if (!pte_none(pte[i]))
 			return false;
 
-	free_page((unsigned long)pte);
+	free_dmap_table(virt_to_page(pte));
 	return true;
 }
 
@@ -1100,7 +1185,7 @@ static bool try_to_free_pmd_page(pmd_t *pmd)
 		if (!pmd_none(pmd[i]))
 			return false;
 
-	free_page((unsigned long)pmd);
+	free_dmap_table(virt_to_page(pmd));
 	return true;
 }
 
@@ -2484,6 +2569,47 @@ void free_grouped_page(struct grouped_page_cache *gpc, struct page *page)
 	list_lru_add_node(&gpc->lru, &page->lru, page_to_nid(page));
 }
 #endif /* !HIGHMEM */
+
+#ifdef CONFIG_PKS_PG_TABLES
+/* PKS protect reserved dmap tables */
+static int __init init_pks_dmap_tables(void)
+{
+	struct pks_table_llnode *cur_entry;
+	static LLIST_HEAD(from_cache);
+	struct pks_table_llnode *tmp;
+	struct llist_node *cur, *next;
+
+	llist_for_each_safe(cur, next, llist_del_all(&tables_cache))
+		llist_add(cur, &from_cache);
+
+	while ((cur = llist_del_first(&from_cache))) {
+		llist_add(cur, &tables_cache);
+
+		tmp = kmalloc(sizeof(*tmp), GFP_KERNEL);
+		if (!tmp)
+			goto out_err;
+		tmp->table = cur;
+		llist_add(&tmp->node, &tables_to_covert);
+	}
+
+	tables_inited = true;
+
+	while ((cur = llist_del_first(&tables_to_covert))) {
+		cur_entry = llist_entry(cur, struct pks_table_llnode, node);
+		set_memory_pks((unsigned long)cur_entry->table, 1, STATIC_TABLE_KEY);
+		kfree(cur_entry);
+	}
+
+	return 0;
+out_err:
+	WARN(1, "Unable to protect all page tables\n");
+	llist_add(llist_del_all(&from_cache), &tables_cache);
+	return 0;
+}
+
+device_initcall(init_pks_dmap_tables);
+#endif
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
-- 
2.30.2


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (8 preceding siblings ...)
  2021-05-05  0:30 ` [PATCH RFC 9/9] x86, cpa: PKS protect direct map page tables Rick Edgecombe
@ 2021-05-05  2:03 ` Ira Weiny
  2021-05-05  6:25 ` Kees Cook
  2021-05-05 11:08 ` Vlastimil Babka
  11 siblings, 0 replies; 32+ messages in thread
From: Ira Weiny @ 2021-05-05  2:03 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening, rppt, dan.j.williams, linux-kernel

On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote:
> 
> This is based on V6 [1] of the core PKS infrastructure patches. PKS 
> infrastructure follow-on’s are planned to enable keys to be set to the same 
> permissions globally. Since this usage needs a key to be set globally 
> read-only by default, a small temporary solution is hacked up in patch 8. Long 
> term, PKS protected page tables would use a better and more generic solution 
> to achieve this.

Before you send this out I've been thinking about this more and I think I would
prefer you not call this 'globally' setting the key.  Because you don't really
want to be able to update the key globally like I originally suggested for
kmap().  What is required is to set a different default for the key which gets
used by all threads by 'default'.

What is really missing is how to get the default changed after it may have been
used by some threads...  thus the 'global' nature...  Perhaps I am picking nits
here but I think it may go over better with Thomas and the maintainers.  Or
maybe not...  :-)

Would it be too much trouble to call this a 'default' change?  Because that is
really what you implement?

Ira

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (9 preceding siblings ...)
  2021-05-05  2:03 ` [PATCH RFC 0/9] PKS write protected " Ira Weiny
@ 2021-05-05  6:25 ` Kees Cook
  2021-05-05  8:37   ` Peter Zijlstra
                     ` (2 more replies)
  2021-05-05 11:08 ` Vlastimil Babka
  11 siblings, 3 replies; 32+ messages in thread
From: Kees Cook @ 2021-05-05  6:25 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening, ira.weiny, rppt, dan.j.williams, linux-kernel

On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote:
> This is a POC for write protecting page tables with PKS (Protection Keys for 
> Supervisor) [1]. The basic idea is to make the page tables read only, except 
> temporarily on a per-cpu basis when they need to be modified. I’m looking for 
> opinions on whether people like the general direction of this in terms of 
> value and implementation.

Yay!

> Why would people want this?
> ===========================
> Page tables are the basis for many types of protections and as such, are a 
> juicy target for attackers. Mapping them read-only will make them harder to 
> use in attacks.
> 
> This protects against an attacker that has acquired the ability to write to 
> the page tables. It's not foolproof because an attacker who can execute 
> arbitrary code can either disable PKS directly, or simply call the same 
> functions that the kernel uses for legitimate page table writes.

I think it absolutely has value. The exploit techniques I'm aware of that
target the page table are usually attempting to upgrade an arbitrary
write into execution (e.g. write to kernel text after setting kernel
text writable in the page table) or similar "data only" attacks (make
sensitive page writable).

It looks like PKS-protected page tables would be much like the
RO-protected text pages in the sense that there is already code in
the kernel to do things to make it writable, change text, and set it
read-only again (alternatives, ftrace, etc). That said, making the PKS
manipulation code be inline to page-writing code would make it less
available for ROP/JOP, if an attack DID want to go that route.

> Why use PKS for this?
> =====================
> PKS is an upcoming CPU feature that allows supervisor virtual memory 
> permissions to be changed without flushing the TLB, like PKU does for user 
> memory. Protecting page tables would normally be really expensive because you 
> would have to do it with paging itself. PKS helps by providing a way to toggle 
> the writability of the page tables with just a per-cpu MSR.

The per-cpu-ness is really important for both performance and for avoiding
temporal attacks where an arbitrary write in one CPU is timed against
a page table write in another CPU.

> Performance impacts
> ===================
> Setting direct map permissions on whatever random page gets allocated for a 
> page table would result in a lot of kernel range shootdowns and direct map 
> large page shattering. So the way the PKS page table memory is created is 
> similar to this module page clustering series[2], where a cache of pages is 
> replenished from 2MB pages such that the direct map permissions and associated 
> breakage is localized on the direct map. In the PKS page tables case, a PKS 
> key is pre-applied to the direct map for pages in the cache.
> 
> There would be some costs of memory overhead in order to protect the direct 
> map page tables. There would also be some extra kernel range shootdowns to 
> replenish the cache on occasion, from setting the PKS key on the direct map of 
> the new pages. I don’t have any actual performance data yet.

What CPU models are expected to have PKS?

> This is based on V6 [1] of the core PKS infrastructure patches. PKS 
> infrastructure follow-ons are planned to enable keys to be set to the same 
> permissions globally. Since this usage needs a key to be set globally 
> read-only by default, a small temporary solution is hacked up in patch 8. Long 
> term, PKS protected page tables would use a better and more generic solution 
> to achieve this.
> 
> [1] https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/

Ah, neat!

> [2] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/

Ooh. What does this do for performance? It sounds like less TLB
pressure, IIUC?

-- 
Kees Cook

* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  6:25 ` Kees Cook
@ 2021-05-05  8:37   ` Peter Zijlstra
  2021-05-05 18:38     ` Kees Cook
  2021-05-05 19:51   ` Edgecombe, Rick P
  2021-05-06  0:00   ` Ira Weiny
  2 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2021-05-05  8:37 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, rppt,
	dan.j.williams, linux-kernel

On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote:

> It looks like PKS-protected page tables would be much like the
> RO-protected text pages in the sense that there is already code in
> the kernel to do things to make it writable, change text, and set it
> read-only again (alternatives, ftrace, etc).

We don't actually modify text by changing the mapping at all. We modify
through a writable (but not executable) temporary alias on the page (on
x86).

Once a mapping is RX it will *never* be writable again (until we tear it
all down).

* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05  0:30 ` [PATCH RFC 5/9] x86, mm: Use cache of page tables Rick Edgecombe
@ 2021-05-05  8:51   ` Peter Zijlstra
  2021-05-05 12:09     ` Mike Rapoport
  2021-05-06 18:24   ` Shakeel Butt
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2021-05-05  8:51 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: dave.hansen, luto, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening, ira.weiny, rppt, dan.j.williams, linux-kernel

On Tue, May 04, 2021 at 05:30:28PM -0700, Rick Edgecombe wrote:
> @@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
>  {
>  	pgtable_pte_page_dtor(pte);
>  	paravirt_release_pte(page_to_pfn(pte));
> +	/* Set Page Table so swap knows how to free it */
> +	__SetPageTable(pte);
>  	paravirt_tlb_remove_table(tlb, pte);
>  }
>  
> @@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
>  	tlb->need_flush_all = 1;
>  #endif
>  	pgtable_pmd_page_dtor(page);
> +	/* Set Page Table so swap knows how to free it */
> +	__SetPageTable(virt_to_page(pmd));
>  	paravirt_tlb_remove_table(tlb, page);
>  }
>  
>  #if CONFIG_PGTABLE_LEVELS > 3
>  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
>  {
> +	/* Set Page Table so swap knows how to free it */
> +	__SetPageTable(virt_to_page(pud));
>  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
>  	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
>  }
> @@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
>  #if CONFIG_PGTABLE_LEVELS > 4
>  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
>  {
> +	/* Set Page Table so swap knows how to free it */
> +	__SetPageTable(virt_to_page(p4d));
>  	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
>  	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
>  }

This, to me, seems like a really weird place to __SetPageTable(), why
can't we do that on allocation?

> @@ -888,6 +889,12 @@ void release_pages(struct page **pages, int nr)
>  			continue;
>  		}
>  
> +		if (PageTable(page)) {
> +			__ClearPageTable(page);
> +			free_table(page);
> +			continue;
> +		}
> +
>  		if (!put_page_testzero(page))
>  			continue;
>  
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 3cdee7b11da9..a60ec3d4ab21 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -22,6 +22,7 @@
>  #include <linux/swap_slots.h>
>  #include <linux/huge_mm.h>
>  #include <linux/shmem_fs.h>
> +#include <asm/pgalloc.h>
>  #include "internal.h"
>  
>  /*
> @@ -310,6 +311,11 @@ static inline void free_swap_cache(struct page *page)
>  void free_page_and_swap_cache(struct page *page)
>  {
>  	free_swap_cache(page);
> +	if (PageTable(page)) {
> +		__ClearPageTable(page);
> +		free_table(page);
> +		return;
> +	}
>  	if (!is_huge_zero_page(page))
>  		put_page(page);
>  }

And then free_table() can __ClearPageTable() and all is nice and
symmetric and all this weirdness goes away, no?

* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
                   ` (10 preceding siblings ...)
  2021-05-05  6:25 ` Kees Cook
@ 2021-05-05 11:08 ` Vlastimil Babka
  2021-05-05 11:56   ` Peter Zijlstra
  11 siblings, 1 reply; 32+ messages in thread
From: Vlastimil Babka @ 2021-05-05 11:08 UTC (permalink / raw)
  To: Rick Edgecombe, dave.hansen, luto, peterz, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening
  Cc: ira.weiny, rppt, dan.j.williams, linux-kernel

On 5/5/21 2:30 AM, Rick Edgecombe wrote:
> This is a POC for write protecting page tables with PKS (Protection Keys for 
> Supervisor) [1]. The basic idea is to make the page tables read only, except 
> temporarily on a per-cpu basis when they need to be modified. I’m looking for 
> opinions on whether people like the general direction of this in terms of 
> value and implementation.
> 
> Why would people want this?
> ===========================
> Page tables are the basis for many types of protections and as such, are a 
> juicy target for attackers. Mapping them read-only will make them harder to 
> use in attacks.
> 
> This protects against an attacker that has acquired the ability to write to 
> the page tables. It's not foolproof because an attacker who can execute 
> arbitrary code can either disable PKS directly, or simply call the same 
> functions that the kernel uses for legitimate page table writes.

Yeah, it's a good idea. I once used a similar approach locally while
debugging a problem that appeared to be stray writes hitting page tables;
without PKS I simply made the whole pages read-only whenever they were not
being touched by the designated code.

> Why use PKS for this?
> =====================
> PKS is an upcoming CPU feature that allows supervisor virtual memory 
> permissions to be changed without flushing the TLB, like PKU does for user 
> memory. Protecting page tables would normally be really expensive because you 
> would have to do it with paging itself. PKS helps by providing a way to toggle 
> the writability of the page tables with just a per-cpu MSR.

I can see in patch 8/9 that you are flipping the MSR around individual
operations on page table entries. In my patch I tied making the page table
writable to taking the page table lock (IIRC I had only the PTE level fully
handled though). I wonder if that would be a better tradeoff even for your
MSR approach?

Vlastimil

> Performance impacts
> ===================
> Setting direct map permissions on whatever random page gets allocated for a 
> page table would result in a lot of kernel range shootdowns and direct map 
> large page shattering. So the way the PKS page table memory is created is 
> similar to this module page clustering series[2], where a cache of pages is 
> replenished from 2MB pages such that the direct map permissions and associated 
> breakage is localized on the direct map. In the PKS page tables case, a PKS 
> key is pre-applied to the direct map for pages in the cache.
> 
> There would be some costs of memory overhead in order to protect the direct 
> map page tables. There would also be some extra kernel range shootdowns to 
> replenish the cache on occasion, from setting the PKS key on the direct map of 
> the new pages. I don’t have any actual performance data yet.
> 
> This is based on V6 [1] of the core PKS infrastructure patches. PKS 
> infrastructure follow-ons are planned to enable keys to be set to the same 
> permissions globally. Since this usage needs a key to be set globally 
> read-only by default, a small temporary solution is hacked up in patch 8. Long 
> term, PKS protected page tables would use a better and more generic solution 
> to achieve this.
> 
> [1]
> https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/
> [2]
> https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com
> /
> 
> Thanks,
> 
> Rick
> 
> 
> Rick Edgecombe (9):
>   list: Support getting most recent element in list_lru
>   list: Support list head not in object for list_lru
>   x86/mm/cpa: Add grouped page allocations
>   mm: Explicitly zero page table lock ptr
>   x86, mm: Use cache of page tables
>   x86/mm/cpa: Add set_memory_pks()
>   x86/mm/cpa: Add perm callbacks to grouped pages
>   x86, mm: Protect page tables with PKS
>   x86, cpa: PKS protect direct map page tables
> 
>  arch/x86/boot/compressed/ident_map_64.c |   5 +
>  arch/x86/include/asm/pgalloc.h          |   6 +
>  arch/x86/include/asm/pgtable.h          |  26 +-
>  arch/x86/include/asm/pgtable_64.h       |  33 ++-
>  arch/x86/include/asm/pkeys_common.h     |   8 +-
>  arch/x86/include/asm/set_memory.h       |  23 ++
>  arch/x86/mm/init.c                      |  40 +++
>  arch/x86/mm/pat/set_memory.c            | 312 +++++++++++++++++++++++-
>  arch/x86/mm/pgtable.c                   | 144 ++++++++++-
>  include/asm-generic/pgalloc.h           |  42 +++-
>  include/linux/list_lru.h                |  26 ++
>  include/linux/mm.h                      |   7 +
>  mm/Kconfig                              |   6 +-
>  mm/list_lru.c                           |  38 ++-
>  mm/memory.c                             |   1 +
>  mm/swap.c                               |   7 +
>  mm/swap_state.c                         |   6 +
>  17 files changed, 705 insertions(+), 25 deletions(-)
> 


* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05 11:08 ` Vlastimil Babka
@ 2021-05-05 11:56   ` Peter Zijlstra
  2021-05-05 19:46     ` Edgecombe, Rick P
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2021-05-05 11:56 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, rppt,
	dan.j.williams, linux-kernel

On Wed, May 05, 2021 at 01:08:35PM +0200, Vlastimil Babka wrote:
> On 5/5/21 2:30 AM, Rick Edgecombe wrote:

> > Why use PKS for this?
> > =====================
> > PKS is an upcoming CPU feature that allows supervisor virtual memory 
> > permissions to be changed without flushing the TLB, like PKU does for user 
> > memory. Protecting page tables would normally be really expensive because you 
> > would have to do it with paging itself. PKS helps by providing a way to toggle 
> > the writability of the page tables with just a per-cpu MSR.
> 
> I can see in patch 8/9 that you are flipping the MSR around individual
> operations on page table entries. In my patch I tied making the page table
> writable to taking the page table lock (IIRC I had only the PTE level fully
> handled though). I wonder if that would be a better tradeoff even for your
> MSR approach?

There's also the HIGHPTE code we could abuse to kmap an alias while
we're at it.

* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05  0:30 ` [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations Rick Edgecombe
@ 2021-05-05 12:08   ` Mike Rapoport
  2021-05-05 13:09     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Rapoport @ 2021-05-05 12:08 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: dave.hansen, luto, peterz, linux-mm, x86, akpm, linux-hardening,
	kernel-hardening, ira.weiny, dan.j.williams, linux-kernel

On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe wrote:
> For x86, setting memory permissions on the direct map results in fracturing
> large pages. Direct map fracturing can be reduced by locating pages that
> will have their permissions set close together.
> 
> Create a simple page cache that allocates pages from huge page size
> blocks. Don't guarantee that a page will come from a huge page grouping,
> instead fallback to non-grouped pages to fulfill the allocation if
> needed. Also, register a shrinker such that the system can ask for the
> pages back if needed. Since this is only needed when there is a direct
> map, compile it out on highmem systems.

I only had time to skim through the patches, but I like the idea of having
a simple cache that allocates larger pages with a fallback to basic page
size.

I just think it should be more generic and closer to the page allocator.
I was thinking about adding a GFP flag indicating that the allocated
pages should be removed from the direct map. Then alloc_pages() could use
such a cache whenever this GFP flag is specified, with a fallback for
lower order allocations.
 
> Free pages in the cache are kept track of in per-node list inside a
> list_lru. NUMA_NO_NODE requests are serviced by checking each per-node
> list in a round robin fashion. If pages are requested for a certain node
> but the cache is empty for that node, a whole additional huge page size
> page is allocated.
> 
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/include/asm/set_memory.h |  14 +++
>  arch/x86/mm/pat/set_memory.c      | 151 ++++++++++++++++++++++++++++++
>  2 files changed, 165 insertions(+)
> 
> diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
> index 4352f08bfbb5..b63f09cc282a 100644
> --- a/arch/x86/include/asm/set_memory.h
> +++ b/arch/x86/include/asm/set_memory.h
> @@ -4,6 +4,9 @@
>  
>  #include <asm/page.h>
>  #include <asm-generic/set_memory.h>
> +#include <linux/gfp.h>
> +#include <linux/list_lru.h>
> +#include <linux/shrinker.h>
>  
>  /*
>   * The set_memory_* API can be used to change various attributes of a virtual
> @@ -135,4 +138,15 @@ static inline int clear_mce_nospec(unsigned long pfn)
>   */
>  #endif
>  
> +struct grouped_page_cache {
> +	struct shrinker shrinker;
> +	struct list_lru lru;
> +	gfp_t gfp;
> +	atomic_t nid_round_robin;
> +};
> +
> +int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp);
> +struct page *get_grouped_page(int node, struct grouped_page_cache *gpc);
> +void free_grouped_page(struct grouped_page_cache *gpc, struct page *page);
> +
>  #endif /* _ASM_X86_SET_MEMORY_H */
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 16f878c26667..6877ef66793b 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2306,6 +2306,157 @@ int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
>  	return retval;
>  }
>  
> +#ifndef CONFIG_HIGHMEM
> +static struct page *__alloc_page_order(int node, gfp_t gfp_mask, int order)
> +{
> +	if (node == NUMA_NO_NODE)
> +		return alloc_pages(gfp_mask, order);
> +
> +	return alloc_pages_node(node, gfp_mask, order);
> +}
> +
> +static struct grouped_page_cache *__get_gpc_from_sc(struct shrinker *shrinker)
> +{
> +	return container_of(shrinker, struct grouped_page_cache, shrinker);
> +}
> +
> +static unsigned long grouped_shrink_count(struct shrinker *shrinker,
> +					  struct shrink_control *sc)
> +{
> +	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
> +	unsigned long page_cnt = list_lru_shrink_count(&gpc->lru, sc);
> +
> +	return page_cnt ? page_cnt : SHRINK_EMPTY;
> +}
> +
> +static enum lru_status grouped_isolate(struct list_head *item,
> +				       struct list_lru_one *list,
> +				       spinlock_t *lock, void *cb_arg)
> +{
> +	struct list_head *dispose = cb_arg;
> +
> +	list_lru_isolate_move(list, item, dispose);
> +
> +	return LRU_REMOVED;
> +}
> +
> +static void __dispose_pages(struct grouped_page_cache *gpc, struct list_head *head)
> +{
> +	struct list_head *cur, *next;
> +
> +	list_for_each_safe(cur, next, head) {
> > +		struct page *page = list_entry(cur, struct page, lru);
> +
> +		list_del(cur);
> +
> +		__free_pages(page, 0);
> +	}
> +}
> +
> +static unsigned long grouped_shrink_scan(struct shrinker *shrinker,
> +					 struct shrink_control *sc)
> +{
> +	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
> +	unsigned long isolated;
> +	LIST_HEAD(freeable);
> +
> +	if (!(sc->gfp_mask & gpc->gfp))
> +		return SHRINK_STOP;
> +
> +	isolated = list_lru_shrink_walk(&gpc->lru, sc, grouped_isolate,
> +					&freeable);
> +	__dispose_pages(gpc, &freeable);
> +
> +	/* Every item walked gets isolated */
> +	sc->nr_scanned += isolated;
> +
> +	return isolated;
> +}
> +
> +static struct page *__remove_first_page(struct grouped_page_cache *gpc, int node)
> +{
> +	unsigned int start_nid, i;
> +	struct list_head *head;
> +
> +	if (node != NUMA_NO_NODE) {
> +		head = list_lru_get_mru(&gpc->lru, node);
> +		if (head)
> +			return list_entry(head, struct page, lru);
> +		return NULL;
> +	}
> +
> +	/* If NUMA_NO_NODE, search the nodes in round robin for a page */
> +	start_nid = (unsigned int)atomic_fetch_inc(&gpc->nid_round_robin) % nr_node_ids;
> +	for (i = 0; i < nr_node_ids; i++) {
> +		int cur_nid = (start_nid + i) % nr_node_ids;
> +
> +		head = list_lru_get_mru(&gpc->lru, cur_nid);
> +		if (head)
> +			return list_entry(head, struct page, lru);
> +	}
> +
> +	return NULL;
> +}
> +
> +/* Get and add some new pages to the cache to be used by VM_GROUP_PAGES */
> +static struct page *__replenish_grouped_pages(struct grouped_page_cache *gpc, int node)
> +{
> +	const unsigned int hpage_cnt = HPAGE_SIZE >> PAGE_SHIFT;
> +	struct page *page;
> +	int i;
> +
> +	page = __alloc_page_order(node, gpc->gfp, HUGETLB_PAGE_ORDER);
> +	if (!page)
> +		return __alloc_page_order(node, gpc->gfp, 0);
> +
> +	split_page(page, HUGETLB_PAGE_ORDER);
> +
> +	for (i = 1; i < hpage_cnt; i++)
> +		free_grouped_page(gpc, &page[i]);
> +
> +	return &page[0];
> +}
> +
> +int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp)
> +{
> +	int err = 0;
> +
> +	memset(gpc, 0, sizeof(struct grouped_page_cache));
> +
> +	err = list_lru_init(&gpc->lru);
> +	if (err)
> +		goto out;
> +
> +	gpc->shrinker.count_objects = grouped_shrink_count;
> +	gpc->shrinker.scan_objects = grouped_shrink_scan;
> +	gpc->shrinker.seeks = DEFAULT_SEEKS;
> +	gpc->shrinker.flags = SHRINKER_NUMA_AWARE;
> +
> +	err = register_shrinker(&gpc->shrinker);
> +	if (err)
> +		list_lru_destroy(&gpc->lru);
> +
> +out:
> +	return err;
> +}
> +
> +struct page *get_grouped_page(int node, struct grouped_page_cache *gpc)
> +{
> +	struct page *page;
> +
> +	page = __remove_first_page(gpc, node);
> +
> +	if (page)
> +		return page;
> +
> +	return __replenish_grouped_pages(gpc, node);
> +}
> +
> +void free_grouped_page(struct grouped_page_cache *gpc, struct page *page)
> +{
> +	INIT_LIST_HEAD(&page->lru);
> +	list_lru_add_node(&gpc->lru, &page->lru, page_to_nid(page));
> +}
> +#endif /* !CONFIG_HIGHMEM */
>  /*
>   * The testcases use internal knowledge of the implementation that shouldn't
>   * be exposed to the rest of the kernel. Include these directly here.
> -- 
> 2.30.2
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05  8:51   ` Peter Zijlstra
@ 2021-05-05 12:09     ` Mike Rapoport
  2021-05-05 13:19       ` Peter Zijlstra
  2021-05-06 17:59       ` Matthew Wilcox
  0 siblings, 2 replies; 32+ messages in thread
From: Mike Rapoport @ 2021-05-05 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, dan.j.williams,
	linux-kernel

On Wed, May 05, 2021 at 10:51:55AM +0200, Peter Zijlstra wrote:
> On Tue, May 04, 2021 at 05:30:28PM -0700, Rick Edgecombe wrote:
> > @@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
> >  {
> >  	pgtable_pte_page_dtor(pte);
> >  	paravirt_release_pte(page_to_pfn(pte));
> > +	/* Set Page Table so swap knows how to free it */
> > +	__SetPageTable(pte);
> >  	paravirt_tlb_remove_table(tlb, pte);
> >  }
> >  
> > @@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
> >  	tlb->need_flush_all = 1;
> >  #endif
> >  	pgtable_pmd_page_dtor(page);
> > +	/* Set Page Table so swap knows how to free it */
> > +	__SetPageTable(virt_to_page(pmd));
> >  	paravirt_tlb_remove_table(tlb, page);
> >  }
> >  
> >  #if CONFIG_PGTABLE_LEVELS > 3
> >  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> >  {
> > +	/* Set Page Table so swap knows how to free it */
> > +	__SetPageTable(virt_to_page(pud));
> >  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> >  	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
> >  }
> > @@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> >  #if CONFIG_PGTABLE_LEVELS > 4
> >  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
> >  {
> > +	/* Set Page Table so swap knows how to free it */
> > +	__SetPageTable(virt_to_page(p4d));
> >  	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
> >  	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
> >  }
> 
> This, to me, seems like a really weird place to __SetPageTable(), why
> can't we do that on allocation?

We call __ClearPageTable() at pgtable_pxy_page_dtor(), so at least for pte
and pmd we need to somehow tell release_pages() what kind of page it was.
 
> > @@ -888,6 +889,12 @@ void release_pages(struct page **pages, int nr)
> >  			continue;
> >  		}
> >  
> > +		if (PageTable(page)) {
> > +			__ClearPageTable(page);
> > +			free_table(page);
> > +			continue;
> > +		}
> > +
> >  		if (!put_page_testzero(page))
> >  			continue;
> >  
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index 3cdee7b11da9..a60ec3d4ab21 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/swap_slots.h>
> >  #include <linux/huge_mm.h>
> >  #include <linux/shmem_fs.h>
> > +#include <asm/pgalloc.h>
> >  #include "internal.h"
> >  
> >  /*
> > @@ -310,6 +311,11 @@ static inline void free_swap_cache(struct page *page)
> >  void free_page_and_swap_cache(struct page *page)
> >  {
> >  	free_swap_cache(page);
> > +	if (PageTable(page)) {
> > +		__ClearPageTable(page);
> > +		free_table(page);
> > +		return;
> > +	}
> >  	if (!is_huge_zero_page(page))
> >  		put_page(page);
> >  }
> 
> And then free_table() can __ClearPageTable() and all is nice and
> symmetric and all this weirdness goes away, no?

-- 
Sincerely yours,
Mike.

* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05 12:08   ` Mike Rapoport
@ 2021-05-05 13:09     ` Peter Zijlstra
  2021-05-05 18:45       ` Mike Rapoport
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2021-05-05 13:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, dan.j.williams,
	linux-kernel

On Wed, May 05, 2021 at 03:08:27PM +0300, Mike Rapoport wrote:
> On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe wrote:
> > For x86, setting memory permissions on the direct map results in fracturing
> > large pages. Direct map fracturing can be reduced by locating pages that
> > will have their permissions set close together.
> > 
> > Create a simple page cache that allocates pages from huge page size
> > blocks. Don't guarantee that a page will come from a huge page grouping,
> > instead fallback to non-grouped pages to fulfill the allocation if
> > needed. Also, register a shrinker such that the system can ask for the
> > pages back if needed. Since this is only needed when there is a direct
> > map, compile it out on highmem systems.
> 
> I only had time to skim through the patches, but I like the idea of having
> a simple cache that allocates larger pages with a fallback to basic page
> size.
> 
> I just think it should be more generic and closer to the page allocator.
> I was thinking about adding a GFP flag indicating that the allocated
> pages should be removed from the direct map. Then alloc_pages() could use
> such a cache whenever this GFP flag is specified, with a fallback for
> lower order allocations.

That doesn't provide enough information I think. Removing from direct
map isn't the only consideration, you also want to group them by the
target protection bits such that we don't get to use 4k pages quite so
much.

* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05 12:09     ` Mike Rapoport
@ 2021-05-05 13:19       ` Peter Zijlstra
  2021-05-05 21:54         ` Edgecombe, Rick P
  2021-05-06 17:59       ` Matthew Wilcox
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2021-05-05 13:19 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, dan.j.williams,
	linux-kernel

On Wed, May 05, 2021 at 03:09:09PM +0300, Mike Rapoport wrote:
> On Wed, May 05, 2021 at 10:51:55AM +0200, Peter Zijlstra wrote:
> > On Tue, May 04, 2021 at 05:30:28PM -0700, Rick Edgecombe wrote:
> > > @@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
> > >  {
> > >  	pgtable_pte_page_dtor(pte);
> > >  	paravirt_release_pte(page_to_pfn(pte));
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(pte);
> > >  	paravirt_tlb_remove_table(tlb, pte);
> > >  }
> > >  
> > > @@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
> > >  	tlb->need_flush_all = 1;
> > >  #endif
> > >  	pgtable_pmd_page_dtor(page);
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(pmd));
> > >  	paravirt_tlb_remove_table(tlb, page);
> > >  }
> > >  
> > >  #if CONFIG_PGTABLE_LEVELS > 3
> > >  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> > >  {
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(pud));
> > >  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> > >  	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
> > >  }
> > > @@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> > >  #if CONFIG_PGTABLE_LEVELS > 4
> > >  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
> > >  {
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(p4d));
> > >  	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
> > >  	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
> > >  }
> > 
> > This, to me, seems like a really weird place to __SetPageTable(), why
> > can't we do that on allocation?
> 
> We call __ClearPageTable() at pgtable_pxy_page_dtor(), so at least for pte
> and pmd we need to somehow tell release_pages() what kind of page it was.

Hurph, right, but then the added comment is misleading; s/Set/Reset/g.
Still, I'm thinking that if we do these allocators, the allocator would be
the most natural place for the set/clear; perhaps we can then remove them
from the {c,d}tor.

* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  8:37   ` Peter Zijlstra
@ 2021-05-05 18:38     ` Kees Cook
  0 siblings, 0 replies; 32+ messages in thread
From: Kees Cook @ 2021-05-05 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, rppt,
	dan.j.williams, linux-kernel

On Wed, May 05, 2021 at 10:37:29AM +0200, Peter Zijlstra wrote:
> On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote:
> 
> > It looks like PKS-protected page tables would be much like the
> > RO-protected text pages in the sense that there is already code in
> > the kernel to do things to make it writable, change text, and set it
> > read-only again (alternatives, ftrace, etc).
> 
> We don't actually modify text by changing the mapping at all. We modify
> through a writable (but not executable) temporary alias on the page (on
> x86).
> 
> Once a mapping is RX it will *never* be writable again (until we tear it
> all down).

Yes, quite true. I was trying to answer the concern about "is it okay
that there is a routine in the kernel that can write to page tables
(via temporarily disabling PKS)?" by saying "yes, this is fine -- we
already have similar routines in the kernel that bypass memory
protections". That's okay because the defense is primarily about
blocking flaws that allow attacker-controlled writes to be used to
leverage greater control over kernel state, to which the page tables are
pretty central. :)

-- 
Kees Cook

* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05 13:09     ` Peter Zijlstra
@ 2021-05-05 18:45       ` Mike Rapoport
  2021-05-05 21:57         ` Edgecombe, Rick P
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Rapoport @ 2021-05-05 18:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rick Edgecombe, dave.hansen, luto, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, ira.weiny, dan.j.williams,
	linux-kernel

On Wed, May 05, 2021 at 03:09:12PM +0200, Peter Zijlstra wrote:
> On Wed, May 05, 2021 at 03:08:27PM +0300, Mike Rapoport wrote:
> > On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe wrote:
> > > For x86, setting memory permissions on the direct map results in fracturing
> > > large pages. Direct map fracturing can be reduced by locating pages that
> > > will have their permissions set close together.
> > > 
> > > Create a simple page cache that allocates pages from huge page size
> > > blocks. Don't guarantee that a page will come from a huge page grouping,
> > > instead fallback to non-grouped pages to fulfill the allocation if
> > > needed. Also, register a shrinker such that the system can ask for the
> > > pages back if needed. Since this is only needed when there is a direct
> > > map, compile it out on highmem systems.
> > 
> > I only had time to skim through the patches, I like the idea of having a
> > simple cache that allocates larger pages with a fallback to basic page
> > size.
> > 
> > I just think it should be more generic and closer to the page allocator.
> > I was thinking about adding a GFP flag that will tell that the allocated
> > pages should be removed from the direct map. Then alloc_pages() could use
> > such cache whenever this GFP flag is specified with a fallback for lower
> > order allocations.
> 
> That doesn't provide enough information I think. Removing from direct
> map isn't the only consideration, you also want to group them by the
> target protection bits such that we don't get to use 4k pages quite so
> much.

Unless I'm missing something we anyway hand out 4k pages from the cache and
the neighbouring 4k may end up with different protections.

This is also similar to what happens in the set Rick posted a while ago to
support grouped vmalloc allocations:

[1] https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/

-- 
Sincerely yours,
Mike.


* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05 11:56   ` Peter Zijlstra
@ 2021-05-05 19:46     ` Edgecombe, Rick P
  0 siblings, 0 replies; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-05 19:46 UTC (permalink / raw)
  To: peterz, vbabka
  Cc: kernel-hardening, Hansen, Dave, luto, x86, linux-mm, akpm,
	linux-kernel, rppt, linux-hardening, Weiny, Ira, Williams, Dan J

On Wed, 2021-05-05 at 13:56 +0200, Peter Zijlstra wrote:
> On Wed, May 05, 2021 at 01:08:35PM +0200, Vlastimil Babka wrote:
> > On 5/5/21 2:30 AM, Rick Edgecombe wrote:
> 
> > > Why use PKS for this?
> > > =====================
> > > PKS is an upcoming CPU feature that allows supervisor virtual
> > > memory 
> > > permissions to be changed without flushing the TLB, like PKU does
> > > for user 
> > > memory. Protecting page tables would normally be really expensive
> > > because you 
> > > would have to do it with paging itself. PKS helps by providing a
> > > way to toggle 
> > > the writability of the page tables with just a per-cpu MSR.
> > 
> > I can see in patch 8/9 that you are flipping the MSR around
> > individual
> > operations on page table entries. In my patch I hooked making the
> > page table
> > writable to obtaining the page table lock (IIRC I had only the PTE
> > level fully
> > handled though). Wonder if that would be better tradeoff even for
> > your MSR approach?
> 
Hmm, I see, that could reduce the sprinkling of the enable/disable
calls. It seems some (most?) of the kernel address space page table
modifications don't take the page table locks, though, so those
callers would have to call something else to be able to write.

> There's also the HIGHPTE code we could abuse to kmap an alias while
> we're at it.

For a non-PKS debug feature?

It might fit pretty easily into CONFIG_DEBUG_PAGEALLOC on top of this
series. enable/disable_pgtable_write() could take a pointer that is
ignored in PKS mode, but triggers a cpa call in a
CONFIG_DEBUG_PAGETABLE_WRITE mode.


* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  6:25 ` Kees Cook
  2021-05-05  8:37   ` Peter Zijlstra
@ 2021-05-05 19:51   ` Edgecombe, Rick P
  2021-05-06  0:00   ` Ira Weiny
  2 siblings, 0 replies; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-05 19:51 UTC (permalink / raw)
  To: keescook
  Cc: kernel-hardening, luto, Hansen, Dave, x86, linux-mm, peterz,
	akpm, linux-kernel, rppt, linux-hardening, Weiny, Ira, Williams,
	Dan J

On Tue, 2021-05-04 at 23:25 -0700, Kees Cook wrote:
> > infrastructure follow-ons are planned to enable keys to be set to
> > the same 
> > permissions globally. Since this usage needs a key to be set globally
> > read-only by default, a small temporary solution is hacked up in
> > patch 8. Long 
> > term, PKS protected page tables would use a better and more generic
> > solution 
> > to achieve this.
> > 
> > [1]
> > https://lore.kernel.org/lkml/20210401225833.566238-1-ira.weiny@intel.com/
> 
> Ah, neat!
> 
> > [2]
> > https://lore.kernel.org/lkml/20210405203711.1095940-1-rick.p.edgecombe@intel.com/
> 
> Ooh. What does this do for performance? It sounds like less TLB
> pressure, IIUC?

Yea, less TLB pressure, faster page table walks in theory. There was
some testing that showed having all 4k pages was bad for performance:
https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com/

I'm not sure exactly how much breakage is needed before problems start
to show up, but there was also someone posting that large amounts of
tracing were noticeable for their workload.


* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05 13:19       ` Peter Zijlstra
@ 2021-05-05 21:54         ` Edgecombe, Rick P
  0 siblings, 0 replies; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-05 21:54 UTC (permalink / raw)
  To: peterz, rppt
  Cc: kernel-hardening, Hansen, Dave, luto, x86, linux-mm, akpm,
	linux-kernel, Williams, Dan J, linux-hardening, Weiny, Ira

On Wed, 2021-05-05 at 15:19 +0200, Peter Zijlstra wrote:
> On Wed, May 05, 2021 at 03:09:09PM +0300, Mike Rapoport wrote:
> > On Wed, May 05, 2021 at 10:51:55AM +0200, Peter Zijlstra wrote:
> > > On Tue, May 04, 2021 at 05:30:28PM -0700, Rick Edgecombe wrote:
> > > > @@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb,
> > > > struct page *pte)
> > > >  {
> > > >         pgtable_pte_page_dtor(pte);
> > > >         paravirt_release_pte(page_to_pfn(pte));
> > > > +       /* Set Page Table so swap knows how to free it */
> > > > +       __SetPageTable(pte);
> > > >         paravirt_tlb_remove_table(tlb, pte);
> > > >  }
> > > >  
> > > > @@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather
> > > > *tlb, pmd_t *pmd)
> > > >         tlb->need_flush_all = 1;
> > > >  #endif
> > > >         pgtable_pmd_page_dtor(page);
> > > > +       /* Set Page Table so swap knows how to free it */
> > > > +       __SetPageTable(virt_to_page(pmd));
> > > >         paravirt_tlb_remove_table(tlb, page);
> > > >  }
> > > >  
> > > >  #if CONFIG_PGTABLE_LEVELS > 3
> > > >  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> > > >  {
> > > > +       /* Set Page Table so swap knows how to free it */
> > > > +       __SetPageTable(virt_to_page(pud));
> > > >         paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> > > >         paravirt_tlb_remove_table(tlb, virt_to_page(pud));
> > > >  }
> > > > @@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb,
> > > > pud_t *pud)
> > > >  #if CONFIG_PGTABLE_LEVELS > 4
> > > >  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
> > > >  {
> > > > +       /* Set Page Table so swap knows how to free it */
> > > > +       __SetPageTable(virt_to_page(p4d));
> > > >         paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
> > > >         paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
> > > >  }
> > > 
> > > This, to me, seems like a really weird place to __SetPageTable(),
> > > why
> > > can't we do that on allocation?
> > 
> > We call __ClearPageTable() at pgtable_pxy_page_dtor(), so at least
> > for pte
> > and pmd we need to somehow tell release_pages() what kind of page
> > it was.
> 
> Hurph, right, but then the added comment is misleading;
> s/Set/Reset/g.
> Still I'm thinking that if we do these allocators, moving the
> set/clear
> to the allocator would be the most natural place, perhaps we can
> remove
> them from the {c,d}tor.

Hmm, yes. I guess there could just be x86-specific versions of the
ctor/dtor that don't set the flag. Seems like that should work and be
less confusing. Thanks.




* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05 18:45       ` Mike Rapoport
@ 2021-05-05 21:57         ` Edgecombe, Rick P
  2021-05-09  9:39           ` Mike Rapoport
  0 siblings, 1 reply; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-05 21:57 UTC (permalink / raw)
  To: peterz, rppt
  Cc: kernel-hardening, Hansen, Dave, luto, x86, linux-mm, akpm,
	linux-kernel, Williams, Dan J, linux-hardening, Weiny, Ira

On Wed, 2021-05-05 at 21:45 +0300, Mike Rapoport wrote:
> On Wed, May 05, 2021 at 03:09:12PM +0200, Peter Zijlstra wrote:
> > On Wed, May 05, 2021 at 03:08:27PM +0300, Mike Rapoport wrote:
> > > On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe wrote:
> > > > For x86, setting memory permissions on the direct map results
> > > > in fracturing
> > > > large pages. Direct map fracturing can be reduced by locating
> > > > pages that
> > > > will have their permissions set close together.
> > > > 
> > > > Create a simple page cache that allocates pages from huge page
> > > > size
> > > > blocks. Don't guarantee that a page will come from a huge page
> > > > grouping,
> > > > instead fallback to non-grouped pages to fulfill the allocation
> > > > if
> > > > needed. Also, register a shrinker such that the system can ask
> > > > for the
> > > > pages back if needed. Since this is only needed when there is a
> > > > direct
> > > > map, compile it out on highmem systems.
> > > 
> > > I only had time to skim through the patches, I like the idea of
> > > having a
> > > simple cache that allocates larger pages with a fallback to basic
> > > page
> > > size.
> > > 
> > > I just think it should be more generic and closer to the page
> > > allocator.
> > > I was thinking about adding a GFP flag that will tell that the
> > > allocated
> > > pages should be removed from the direct map. Then alloc_pages()
> > > could use
> > > such cache whenever this GFP flag is specified with a fallback
> > > for lower
> > > order allocations.
> > 
> > That doesn't provide enough information I think. Removing from
> > direct
> > map isn't the only consideration, you also want to group them by
> > the
> > target protection bits such that we don't get to use 4k pages quite
> > so
> > much.
> 
> Unless I'm missing something we anyway hand out 4k pages from the
> cache and
> the neighbouring 4k may end up with different protections.
> 
> This is also similar to what happens in the set Rick posted a while
> ago to
> support grouped vmalloc allocations:
> 

One issue is with the shrinker callbacks. If you are just trying to
reset and free a single page because the system is low on memory, it
could be problematic to have to break a large page, which would require
another page.

I think for vmalloc, eventually the direct map alias should just be
unmapped. The reason it was not done in the linked patch is to move
iteratively in the direction of having permissioned vmalloc
allocations be unmapped.



* Re: [PATCH RFC 0/9] PKS write protected page tables
  2021-05-05  6:25 ` Kees Cook
  2021-05-05  8:37   ` Peter Zijlstra
  2021-05-05 19:51   ` Edgecombe, Rick P
@ 2021-05-06  0:00   ` Ira Weiny
  2 siblings, 0 replies; 32+ messages in thread
From: Ira Weiny @ 2021-05-06  0:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: Rick Edgecombe, dave.hansen, luto, peterz, linux-mm, x86, akpm,
	linux-hardening, kernel-hardening, rppt, dan.j.williams,
	linux-kernel

On Tue, May 04, 2021 at 11:25:31PM -0700, Kees Cook wrote:
> On Tue, May 04, 2021 at 05:30:23PM -0700, Rick Edgecombe wrote:
> 
> > Performance impacts
> > ===================
> > Setting direct map permissions on whatever random page gets allocated for a 
> > page table would result in a lot of kernel range shootdowns and direct map 
> > large page shattering. So the way the PKS page table memory is created is 
> > similar to this module page clustering series[2], where a cache of pages is 
> > replenished from 2MB pages such that the direct map permissions and associated 
> > breakage is localized on the direct map. In the PKS page tables case, a PKS 
> > key is pre-applied to the direct map for pages in the cache.
> > 
> > There would be some costs of memory overhead in order to protect the direct 
> > map page tables. There would also be some extra kernel range shootdowns to 
> > replenish the cache on occasion, from setting the PKS key on the direct map of 
> > the new pages. I don’t have any actual performance data yet.
> 
> What CPU models are expected to have PKS?


Supervisor Memory Protection Keys (PKS) is a feature which is found on Intel’s
Sapphire Rapids (and later) “Scalable Processor” Server CPUs.  It will also be
available in future non-server Intel parts.

QEMU also has some support:

https://www.qemu.org/2021/04/30/qemu-6-0-0/

Ira


* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05 12:09     ` Mike Rapoport
  2021-05-05 13:19       ` Peter Zijlstra
@ 2021-05-06 17:59       ` Matthew Wilcox
  1 sibling, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2021-05-06 17:59 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Peter Zijlstra, Rick Edgecombe, dave.hansen, luto, linux-mm, x86,
	akpm, linux-hardening, kernel-hardening, ira.weiny,
	dan.j.williams, linux-kernel

On Wed, May 05, 2021 at 03:09:09PM +0300, Mike Rapoport wrote:
> On Wed, May 05, 2021 at 10:51:55AM +0200, Peter Zijlstra wrote:
> > On Tue, May 04, 2021 at 05:30:28PM -0700, Rick Edgecombe wrote:
> > > @@ -54,6 +98,8 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte)
> > >  {
> > >  	pgtable_pte_page_dtor(pte);
> > >  	paravirt_release_pte(page_to_pfn(pte));
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(pte);
> > >  	paravirt_tlb_remove_table(tlb, pte);
> > >  }
> > >  
> > > @@ -70,12 +116,16 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
> > >  	tlb->need_flush_all = 1;
> > >  #endif
> > >  	pgtable_pmd_page_dtor(page);
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(pmd));
> > >  	paravirt_tlb_remove_table(tlb, page);
> > >  }
> > >  
> > >  #if CONFIG_PGTABLE_LEVELS > 3
> > >  void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> > >  {
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(pud));
> > >  	paravirt_release_pud(__pa(pud) >> PAGE_SHIFT);
> > >  	paravirt_tlb_remove_table(tlb, virt_to_page(pud));
> > >  }
> > > @@ -83,6 +133,8 @@ void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud)
> > >  #if CONFIG_PGTABLE_LEVELS > 4
> > >  void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d)
> > >  {
> > > +	/* Set Page Table so swap knows how to free it */
> > > +	__SetPageTable(virt_to_page(p4d));
> > >  	paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT);
> > >  	paravirt_tlb_remove_table(tlb, virt_to_page(p4d));
> > >  }
> > 
> > This, to me, seems like a really weird place to __SetPageTable(), why
> > can't we do that on allocation?
> 
> We call __ClearPageTable() at pgtable_pxy_page_dtor(), so at least for pte
> and pmd we need to somehow tell release_pages() what kind of page it was.

One of the things I've been thinking about doing is removing the pgtable
dtors and instead calling the pgtable dtor in __put_page() if PageTable().
Might work nicely with this ...


* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-05  0:30 ` [PATCH RFC 5/9] x86, mm: Use cache of page tables Rick Edgecombe
  2021-05-05  8:51   ` Peter Zijlstra
@ 2021-05-06 18:24   ` Shakeel Butt
  2021-05-07 16:27     ` Edgecombe, Rick P
  1 sibling, 1 reply; 32+ messages in thread
From: Shakeel Butt @ 2021-05-06 18:24 UTC (permalink / raw)
  To: Rick Edgecombe
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra (Intel),
	Linux MM, x86, Andrew Morton, linux-hardening, kernel-hardening,
	Ira Weiny, Mike Rapoport, Dan Williams, LKML

On Tue, May 4, 2021 at 5:36 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
[...]
> +#ifdef CONFIG_PKS_PG_TABLES
> +struct page *alloc_table(gfp_t gfp)
> +{
> +       struct page *table;
> +
> +       if (!pks_page_en)
> +               return alloc_page(gfp);
> +
> +       table = get_grouped_page(numa_node_id(), &gpc_pks);
> +       if (!table)
> +               return NULL;
> +
> +       if (gfp & __GFP_ZERO)
> +               memset(page_address(table), 0, PAGE_SIZE);
> +
> +       if (memcg_kmem_enabled() &&
> +           gfp & __GFP_ACCOUNT &&
> +           !__memcg_kmem_charge_page(table, gfp, 0)) {
> +               free_table(table);
> +               table = NULL;
> +       }
> +
> +       VM_BUG_ON_PAGE(*(unsigned long *)&table->ptl, table);

table can be NULL due to charge failure.


* Re: [PATCH RFC 5/9] x86, mm: Use cache of page tables
  2021-05-06 18:24   ` Shakeel Butt
@ 2021-05-07 16:27     ` Edgecombe, Rick P
  0 siblings, 0 replies; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-07 16:27 UTC (permalink / raw)
  To: shakeelb
  Cc: kernel-hardening, luto, Hansen, Dave, x86, linux-mm, peterz,
	akpm, linux-kernel, rppt, linux-hardening, Weiny, Ira, Williams,
	Dan J

On Thu, 2021-05-06 at 11:24 -0700, Shakeel Butt wrote:
> On Tue, May 4, 2021 at 5:36 PM Rick Edgecombe
> <rick.p.edgecombe@intel.com> wrote:
> > 
> [...]
> > +#ifdef CONFIG_PKS_PG_TABLES
> > +struct page *alloc_table(gfp_t gfp)
> > +{
> > +       struct page *table;
> > +
> > +       if (!pks_page_en)
> > +               return alloc_page(gfp);
> > +
> > +       table = get_grouped_page(numa_node_id(), &gpc_pks);
> > +       if (!table)
> > +               return NULL;
> > +
> > +       if (gfp & __GFP_ZERO)
> > +               memset(page_address(table), 0, PAGE_SIZE);
> > +
> > +       if (memcg_kmem_enabled() &&
> > +           gfp & __GFP_ACCOUNT &&
> > +           !__memcg_kmem_charge_page(table, gfp, 0)) {
> > +               free_table(table);
> > +               table = NULL;
> > +       }
> > +
> > +       VM_BUG_ON_PAGE(*(unsigned long *)&table->ptl, table);
> 
> table can be NULL due to charge failure.

Argh, yes. Thank you. I'll remove the VM_BUG_ON; it was left in
accidentally.


* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-05 21:57         ` Edgecombe, Rick P
@ 2021-05-09  9:39           ` Mike Rapoport
  2021-05-10 19:38             ` Edgecombe, Rick P
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Rapoport @ 2021-05-09  9:39 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: peterz, kernel-hardening, Hansen, Dave, luto, x86, linux-mm,
	akpm, linux-kernel, Williams, Dan J, linux-hardening, Weiny, Ira

On Wed, May 05, 2021 at 09:57:17PM +0000, Edgecombe, Rick P wrote:
> On Wed, 2021-05-05 at 21:45 +0300, Mike Rapoport wrote:
> > On Wed, May 05, 2021 at 03:09:12PM +0200, Peter Zijlstra wrote:
> > > On Wed, May 05, 2021 at 03:08:27PM +0300, Mike Rapoport wrote:
> > > > On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe wrote:
> > > > > For x86, setting memory permissions on the direct map results
> > > > > in fracturing
> > > > > large pages. Direct map fracturing can be reduced by locating
> > > > > pages that
> > > > > will have their permissions set close together.
> > > > > 
> > > > > Create a simple page cache that allocates pages from huge page
> > > > > size
> > > > > blocks. Don't guarantee that a page will come from a huge page
> > > > > grouping,
> > > > > instead fallback to non-grouped pages to fulfill the allocation
> > > > > if
> > > > > needed. Also, register a shrinker such that the system can ask
> > > > > for the
> > > > > pages back if needed. Since this is only needed when there is a
> > > > > direct
> > > > > map, compile it out on highmem systems.
> > > > 
> > > > I only had time to skim through the patches, I like the idea of
> > > > having a
> > > > simple cache that allocates larger pages with a fallback to basic
> > > > page
> > > > size.
> > > > 
> > > > I just think it should be more generic and closer to the page
> > > > allocator.
> > > > I was thinking about adding a GFP flag that will tell that the
> > > > allocated
> > > > pages should be removed from the direct map. Then alloc_pages()
> > > > could use
> > > > such cache whenever this GFP flag is specified with a fallback
> > > > for lower
> > > > order allocations.
> > > 
> > > That doesn't provide enough information I think. Removing from
> > > direct
> > > map isn't the only consideration, you also want to group them by
> > > the
> > > target protection bits such that we don't get to use 4k pages quite
> > > so
> > > much.
> > 
> > Unless I'm missing something we anyway hand out 4k pages from the
> > cache and
> > the neighbouring 4k may end up with different protections.
> > 
> > This is also similar to what happens in the set Rick posted a while
> > ago to
> > support grouped vmalloc allocations:
> > 
> 
> One issue is with the shrinker callbacks. If you are just trying to
> reset and free a single page because the system is low on memory, it
> could be problematic to have to break a large page, which would require
> another page.

I don't follow you here. Maybe I've misread the patches but AFAIU the large
page is broken at allocation time and 4k pages remain 4k pages afterwards.

In my understanding the problem with a simple shrinker is that even if we
have the entire 2M free it is not being reinstated as 2M page in the direct
mapping.
 
-- 
Sincerely yours,
Mike.


* Re: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
  2021-05-09  9:39           ` Mike Rapoport
@ 2021-05-10 19:38             ` Edgecombe, Rick P
  0 siblings, 0 replies; 32+ messages in thread
From: Edgecombe, Rick P @ 2021-05-10 19:38 UTC (permalink / raw)
  To: rppt
  Cc: kernel-hardening, Hansen, Dave, luto, x86, linux-mm, peterz,
	akpm, linux-kernel, Williams, Dan J, linux-hardening, Weiny, Ira

On Sun, 2021-05-09 at 12:39 +0300, Mike Rapoport wrote:
> On Wed, May 05, 2021 at 09:57:17PM +0000, Edgecombe, Rick P wrote:
> > On Wed, 2021-05-05 at 21:45 +0300, Mike Rapoport wrote:
> > > On Wed, May 05, 2021 at 03:09:12PM +0200, Peter Zijlstra wrote:
> > > > On Wed, May 05, 2021 at 03:08:27PM +0300, Mike Rapoport wrote:
> > > > > On Tue, May 04, 2021 at 05:30:26PM -0700, Rick Edgecombe
> > > > > wrote:
> > > > > > For x86, setting memory permissions on the direct map
> > > > > > results
> > > > > > in fracturing
> > > > > > large pages. Direct map fracturing can be reduced by
> > > > > > locating
> > > > > > pages that
> > > > > > will have their permissions set close together.
> > > > > > 
> > > > > > Create a simple page cache that allocates pages from huge
> > > > > > page
> > > > > > size
> > > > > > blocks. Don't guarantee that a page will come from a huge
> > > > > > page
> > > > > > grouping,
> > > > > > instead fallback to non-grouped pages to fulfill the
> > > > > > allocation
> > > > > > if
> > > > > > needed. Also, register a shrinker such that the system can
> > > > > > ask
> > > > > > for the
> > > > > > pages back if needed. Since this is only needed when there
> > > > > > is a
> > > > > > direct
> > > > > > map, compile it out on highmem systems.
> > > > > 
> > > > > I only had time to skim through the patches, I like the idea
> > > > > of
> > > > > having a
> > > > > simple cache that allocates larger pages with a fallback to
> > > > > basic
> > > > > page
> > > > > size.
> > > > > 
> > > > > I just think it should be more generic and closer to the page
> > > > > allocator.
> > > > > I was thinking about adding a GFP flag that will tell that
> > > > > the
> > > > > allocated
> > > > > pages should be removed from the direct map. Then
> > > > > alloc_pages()
> > > > > could use
> > > > > such cache whenever this GFP flag is specified with a
> > > > > fallback
> > > > > for lower
> > > > > order allocations.
> > > > 
> > > > That doesn't provide enough information I think. Removing from
> > > > direct
> > > > map isn't the only consideration, you also want to group them
> > > > by
> > > > the
> > > > target protection bits such that we don't get to use 4k pages
> > > > quite
> > > > so
> > > > much.
> > > 
> > > Unless I'm missing something we anyway hand out 4k pages from the
> > > cache and
> > > the neighbouring 4k may end up with different protections.
> > > 
> > > This is also similar to what happens in the set Rick posted a
> > > while
> > > ago to
> > > support grouped vmalloc allocations:
> > > 
> > 
> > One issue is with the shrinker callbacks. If you are just trying to
> > reset and free a single page because the system is low on memory,
> > it
> > could be problematic to have to break a large page, which would
> > require
> > another page.
> 
> I don't follow you here. Maybe I've misread the patches but AFAIU the
> large
> page is broken at allocation time and 4k pages remain 4k pages
> afterwards.

Yea that's right.

I thought Peter was saying that if the page allocator grouped all
pages with the same permissions together, it could often leave the
direct map as large pages, and so the page allocator would have to
know about permissions.

So I was just trying to say that, to leave large pages on the direct
map, the shrinker has to handle breaking a large page while freeing a
single page. That would have to be addressed to get large pages with
permissions in the first place.

It doesn't seem impossible to solve I guess, so maybe not an important
point. It could maybe just hold a page in reserve.

Now that I think about it, since this PKS tables series holds all
potentially needed direct map page tables in reserve, it shouldn't
actually be a problem for this case. So this could leave the PKS tables
pages as large on the direct map.

> In my understanding the problem with a simple shrinker is that even
> if we
> have the entire 2M free it is not being reinstated as 2M page in the
> direct
> mapping.

Yea, that is a downside to this simple shrinker. 



end of thread, other threads:[~2021-05-10 19:39 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05  0:30 [PATCH RFC 0/9] PKS write protected page tables Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 1/9] list: Support getting most recent element in list_lru Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 2/9] list: Support list head not in object for list_lru Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations Rick Edgecombe
2021-05-05 12:08   ` Mike Rapoport
2021-05-05 13:09     ` Peter Zijlstra
2021-05-05 18:45       ` Mike Rapoport
2021-05-05 21:57         ` Edgecombe, Rick P
2021-05-09  9:39           ` Mike Rapoport
2021-05-10 19:38             ` Edgecombe, Rick P
2021-05-05  0:30 ` [PATCH RFC 4/9] mm: Explicitly zero page table lock ptr Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 5/9] x86, mm: Use cache of page tables Rick Edgecombe
2021-05-05  8:51   ` Peter Zijlstra
2021-05-05 12:09     ` Mike Rapoport
2021-05-05 13:19       ` Peter Zijlstra
2021-05-05 21:54         ` Edgecombe, Rick P
2021-05-06 17:59       ` Matthew Wilcox
2021-05-06 18:24   ` Shakeel Butt
2021-05-07 16:27     ` Edgecombe, Rick P
2021-05-05  0:30 ` [PATCH RFC 6/9] x86/mm/cpa: Add set_memory_pks() Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 7/9] x86/mm/cpa: Add perm callbacks to grouped pages Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 8/9] x86, mm: Protect page tables with PKS Rick Edgecombe
2021-05-05  0:30 ` [PATCH RFC 9/9] x86, cpa: PKS protect direct map page tables Rick Edgecombe
2021-05-05  2:03 ` [PATCH RFC 0/9] PKS write protected " Ira Weiny
2021-05-05  6:25 ` Kees Cook
2021-05-05  8:37   ` Peter Zijlstra
2021-05-05 18:38     ` Kees Cook
2021-05-05 19:51   ` Edgecombe, Rick P
2021-05-06  0:00   ` Ira Weiny
2021-05-05 11:08 ` Vlastimil Babka
2021-05-05 11:56   ` Peter Zijlstra
2021-05-05 19:46     ` Edgecombe, Rick P
