* [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021
@ 2021-07-19 12:31 James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 01/18] lustre: statahead: update task management code James Simmons
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Backport patches from OpenSFS tree as of July 18, 2021.

Amir Shehata (1):
  lnet: RMDA infrastructure updates

Arshad Hussain (1):
  lnet: libcfs: Add checksum speed under /sys/fs

Bobi Jam (1):
  lustre: llite: failed ASSERTION(ldlm_has_layout(lock))

Chris Horn (1):
  lnet: Correct peer NI recovery age out calculation

Dominique Martinet (1):
  lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64

Lai Siyao (2):
  lustre: lmv: compare space to mkdir on parent MDT
  lustre: llite: reset pfid after dir migration

Mike Marciniszyn (2):
  lnet: o2iblnd: Move racy NULL assignment
  lnet: o2iblnd: Avoid double posting invalidate

Mikhail Pershin (1):
  lustre: uapi: per-user changelog names and mask

Mr. NeilBrown (1):
  lustre: statahead: update task management code

Qian Yingjin (2):
  lustre: llite: simplify callback handling for async getattr
  lustre: pcc: introducing OBD_CONNECT2_PCCRO flag

Sebastien Buisson (2):
  lustre: quota: nodemap squashed root cannot bypass quota
  lustre: sec: migrate/extend/split on encrypted file

Serguei Smirnov (1):
  lnet: use ni fatal error when calculating net health

Wang Shilong (2):
  lustre: quota: add get/set project support for non-dir/file
  lustre: readahead: fix to reserve min pages

 fs/lustre/include/lu_object.h           |   3 +-
 fs/lustre/include/lustre_crypto.h       |   2 +
 fs/lustre/include/obd.h                 |  34 +--
 fs/lustre/include/obd_class.h           |   4 +-
 fs/lustre/llite/dir.c                   |   2 +
 fs/lustre/llite/file.c                  | 117 +++++++--
 fs/lustre/llite/llite_internal.h        |  12 +-
 fs/lustre/llite/llite_lib.c             |  28 ++-
 fs/lustre/llite/namei.c                 | 101 +++++++-
 fs/lustre/llite/rw.c                    |   8 +-
 fs/lustre/llite/statahead.c             | 420 ++++++++++++--------------------
 fs/lustre/lmv/lmv_obd.c                 |  23 +-
 fs/lustre/mdc/mdc_internal.h            |   3 +-
 fs/lustre/mdc/mdc_locks.c               |  31 ++-
 fs/lustre/obdclass/llog_swab.c          |  17 +-
 fs/lustre/obdclass/lu_tgt_descs.c       |  11 +-
 fs/lustre/obdclass/obd_sysfs.c          |  61 +++++
 fs/lustre/osc/osc_cache.c               |   2 +-
 fs/lustre/ptlrpc/wiretest.c             |  29 ++-
 include/linux/libcfs/libcfs_crypto.h    |   3 +
 include/linux/lnet/lib-lnet.h           |   1 +
 include/linux/lnet/lib-types.h          |   2 +
 include/uapi/linux/lnet/lnet-types.h    |   2 +-
 include/uapi/linux/lustre/lustre_idl.h  |  22 ++
 include/uapi/linux/lustre/lustre_user.h |  18 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |   1 +
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c     |  16 +-
 net/lnet/libcfs/linux-crypto.c          |   3 +-
 net/lnet/lnet/api-ni.c                  |   5 +-
 net/lnet/lnet/lib-md.c                  |  29 ++-
 net/lnet/lnet/peer.c                    |   3 +-
 32 files changed, 625 insertions(+), 392 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 01/18] lustre: statahead: update task management code
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
@ 2021-07-19 12:31 ` James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 02/18] lustre: llite: simplify callback handling for async getattr James Simmons
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: "Mr. NeilBrown" <neilb@suse.de>

When the rewrite to remove ptlrpc_thread from statahead
was ported to OpenSFS, a number of improvements were made.
This patch back-ports those to Linux.

An important bug-fix is in ll_agl_add() where we now call
wake_up_process() inside a spinlock, so we can be sure the task pointer
is not NULL.

A few while-loops waiting for events have been simplified to check
each 'continue' condition only once per iteration, instead of checking
it before doing an action and then checking it again before sleeping.
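
As a rough illustration of that loop simplification, here is a minimal
single-threaded userspace sketch. It is not kernel code: struct sim,
sim_schedule(), idle_unless_stopping() and run_sim() are hypothetical
stand-ins invented for this example, and sim_schedule() merely scripts
what a waker might do. The patch itself folds set_current_state() and
the condition into a GNU statement expression; a helper function plays
that role here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TASK_RUNNING / TASK_IDLE. */
enum sim_state { SIM_RUNNING, SIM_IDLE };

struct sim {
	enum sim_state state;
	int queued;        /* entries waiting on the list */
	int handled;       /* entries processed so far */
	int wakeups_left;  /* scripted wake-ups that queue one entry each */
	bool stop;         /* plays the role of kthread_should_stop() */
};

/*
 * Stand-in for schedule(): the "sleep" ends because a scripted waker
 * either queued one entry or, once the script runs out, asked us to stop.
 */
static void sim_schedule(struct sim *s)
{
	if (s->wakeups_left > 0) {
		s->wakeups_left--;
		s->queued++;
	} else {
		s->stop = true;
	}
	s->state = SIM_RUNNING;
}

/* Old shape: stop is tested in the loop header and again before sleeping. */
static void old_loop(struct sim *s)
{
	while (!s->stop) {
		if (s->queued > 0) {
			s->queued--;
			s->handled++;
		}
		s->state = SIM_IDLE;
		if (s->queued == 0 && !s->stop)
			sim_schedule(s);
		s->state = SIM_RUNNING;
	}
}

/* Mark the task idle first, then test the stop condition exactly once. */
static bool idle_unless_stopping(struct sim *s)
{
	s->state = SIM_IDLE;
	return !s->stop;
}

/* New shape: each pass either handles one entry or sleeps, never both. */
static void new_loop(struct sim *s)
{
	while (idle_unless_stopping(s)) {
		if (s->queued > 0) {
			s->state = SIM_RUNNING;
			s->queued--;
			s->handled++;
		} else {
			sim_schedule(s);
		}
	}
	s->state = SIM_RUNNING;
}

/* Run one loop shape against the same script; return entries handled. */
static int run_sim(bool use_new, int queued, int wakeups)
{
	struct sim s = { SIM_RUNNING, queued, 0, wakeups, false };

	if (use_new)
		new_loop(&s);
	else
		old_loop(&s);
	assert(s.state == SIM_RUNNING && s.queued == 0);
	return s.handled;
}
```

Calling run_sim() with the same arguments for both shapes should give
the same count of handled entries. Real kernel code additionally relies
on the memory barriers implied by set_current_state() and
wake_up_process(), which this sketch does not model.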

WC-bug-id: https://jira.whamcloud.com/browse/LU-12780
Lustre-commit: 5b630935452d5d8d7 ("LU-12780 llite: don't use ptlrpc_thread for sai_agl_thread")
Lustre-commit: 6bf49037b3a134587 ("LU-12780 llite: avoid ptlrpc_thread for ll_statahead_thread")
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/36258
Reviewed-on: https://review.whamcloud.com/36259
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/statahead.c | 95 +++++++++++++++++++++++----------------------
 1 file changed, 49 insertions(+), 46 deletions(-)

diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 282cb5a..40ea206 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -35,6 +35,7 @@
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
+#include <linux/delay.h>
 
 #define DEBUG_SUBSYSTEM S_LLITE
 
@@ -299,7 +300,6 @@ static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry,
 	if (sai->sai_task)
 		wake_up_process(sai->sai_task);
 	spin_unlock(&lli->lli_sa_lock);
-
 }
 
 /*
@@ -898,37 +898,40 @@ static int ll_agl_thread(void *arg)
 	CDEBUG(D_READA, "agl thread started: sai %p, parent %pd\n",
 	       sai, parent);
 
-	while (!kthread_should_stop()) {
-
+	while (({set_current_state(TASK_IDLE);
+		 !kthread_should_stop(); })) {
 		spin_lock(&plli->lli_agl_lock);
-		/* The statahead thread maybe help to process AGL entries,
-		 * so check whether list empty again.
-		 */
 		clli = list_first_entry_or_null(&sai->sai_agls,
 						struct ll_inode_info,
 						lli_agl_list);
 		if (clli) {
+			__set_current_state(TASK_RUNNING);
 			list_del_init(&clli->lli_agl_list);
 			spin_unlock(&plli->lli_agl_lock);
 			ll_agl_trigger(&clli->lli_vfs_inode, sai);
 			cond_resched();
 		} else {
 			spin_unlock(&plli->lli_agl_lock);
-		}
-
-		set_current_state(TASK_IDLE);
-		if (list_empty(&sai->sai_agls) &&
-		    !kthread_should_stop())
 			schedule();
-		__set_current_state(TASK_RUNNING);
+		}
 	}
+	__set_current_state(TASK_RUNNING);
 	return 0;
 }
 
 static void ll_stop_agl(struct ll_statahead_info *sai)
 {
-	struct ll_inode_info *plli = ll_i2info(sai->sai_dentry->d_inode);
+	struct dentry *parent = sai->sai_dentry;
+	struct ll_inode_info *plli = ll_i2info(parent->d_inode);
 	struct ll_inode_info *clli;
+	struct task_struct *agl_task;
+
+	spin_lock(&plli->lli_agl_lock);
+	agl_task = sai->sai_agl_task;
+	sai->sai_agl_task = NULL;
+	spin_unlock(&plli->lli_agl_lock);
+	if (!agl_task)
+		return;
 
 	CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
 	       sai, (unsigned int)sai->sai_agl_task->pid);
@@ -1076,14 +1079,19 @@ static int ll_statahead_thread(void *arg)
 
 			fid_le_to_cpu(&fid, &ent->lde_fid);
 
-			do {
-				sa_handle_callback(sai);
+			while (({set_current_state(TASK_IDLE);
+				 sai->sai_task; })) {
+				if (sa_has_callback(sai)) {
+					__set_current_state(TASK_RUNNING);
+					sa_handle_callback(sai);
+				}
 
 				spin_lock(&lli->lli_agl_lock);
 				while (sa_sent_full(sai) &&
 				       !agl_list_empty(sai)) {
 					struct ll_inode_info *clli;
 
+					__set_current_state(TASK_RUNNING);
 					clli = list_first_entry(&sai->sai_agls,
 								struct ll_inode_info,
 								lli_agl_list);
@@ -1097,15 +1105,11 @@ static int ll_statahead_thread(void *arg)
 				}
 				spin_unlock(&lli->lli_agl_lock);
 
-				set_current_state(TASK_IDLE);
-				if (sa_sent_full(sai) &&
-				    !sa_has_callback(sai) &&
-				    agl_list_empty(sai) &&
-				    sai->sai_task)
-					/* wait for spare statahead window */
-					schedule();
-				__set_current_state(TASK_RUNNING);
-			} while (sa_sent_full(sai) && sai->sai_task);
+				if (!sa_sent_full(sai))
+					break;
+				schedule();
+			}
+			__set_current_state(TASK_RUNNING);
 
 			sa_statahead(parent, name, namelen, &fid);
 		}
@@ -1138,21 +1142,18 @@ static int ll_statahead_thread(void *arg)
 	 * statahead is finished, but statahead entries need to be cached, wait
 	 * for file release to stop me.
 	 */
-	while (sai->sai_task) {
-		sa_handle_callback(sai);
-
-		set_current_state(TASK_IDLE);
-		/* ensure we see the NULL stored by
-		 * ll_deauthorize_statahead()
-		 */
-		if (!sa_has_callback(sai) &&
-		    smp_load_acquire(&sai->sai_task))
+	while (({set_current_state(TASK_IDLE);
+		 sai->sai_task; })) {
+		if (sa_has_callback(sai)) {
+			__set_current_state(TASK_RUNNING);
+			sa_handle_callback(sai);
+		} else {
 			schedule();
-		__set_current_state(TASK_RUNNING);
+		}
 	}
+	__set_current_state(TASK_RUNNING);
 out:
-	if (sai->sai_agl_task)
-		ll_stop_agl(sai);
+	ll_stop_agl(sai);
 
 	/*
 	 * wait for inflight statahead RPCs to finish, and then we can free sai
@@ -1160,7 +1161,7 @@ static int ll_statahead_thread(void *arg)
 	 */
 	while (sai->sai_sent != sai->sai_replied) {
 		/* in case we're not woken up, timeout wait */
-		schedule_timeout_idle(HZ>>3);
+		msleep(125);
 	}
 
 	/* release resources held by statahead RPCs */
@@ -1176,7 +1177,7 @@ static int ll_statahead_thread(void *arg)
 	wake_up(&sai->sai_waitq);
 	ll_sai_put(sai);
 
-	do_exit(rc);
+	return rc;
 }
 
 /* authorize opened dir handle @key to statahead */
@@ -1220,18 +1221,15 @@ void ll_deauthorize_statahead(struct inode *dir, void *key)
 	sai = lli->lli_sai;
 	if (sai && sai->sai_task) {
 		/*
-		 * statahead thread may not quit yet because it needs to cache
-		 * entries, now it's time to tell it to quit.
+		 * statahead thread may not have quit yet because it needs to
+		 * cache entries, now it's time to tell it to quit.
 		 *
-		 * In case sai is released, wake_up() is called inside spinlock,
-		 * so we use smp_store_release() to serialize ops.
+		 * wake_up_process() provides the necessary barriers
+		 * to pair with set_current_state().
 		 */
 		struct task_struct *task = sai->sai_task;
 
-		/* ensure ll_statahead_thread sees the NULL before
-		 * calling schedule() again.
-		 */
-		smp_store_release(&sai->sai_task, NULL);
+		sai->sai_task = NULL;
 		wake_up_process(task);
 	}
 	spin_unlock(&lli->lli_sa_lock);
@@ -1510,6 +1508,11 @@ static int revalidate_statahead_dentry(struct inode *dir,
 	ldd = ll_d2d(*dentryp);
 	ldd->lld_sa_generation = lli->lli_sa_generation;
 	sa_put(sai, entry, lli);
+	spin_lock(&lli->lli_sa_lock);
+	if (sai->sai_task)
+		wake_up_process(sai->sai_task);
+	spin_unlock(&lli->lli_sa_lock);
+
 	return rc;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/18] lustre: llite: simplify callback handling for async getattr
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 01/18] lustre: statahead: update task management code James Simmons
@ 2021-07-19 12:31 ` James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 03/18] lustre: uapi: per-user changelog names and mask James Simmons
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

This patch prepares the inode and sets lock data directly in the
interpret callback of the intent async getattr RPC request (in
ptlrpcd context), simplifying the old implementation that deferred
this work to the statahead thread.

According to the benchmark results, running "ls -l" on a large
directory containing 1M files (47001 bytes), with no caching on either
the server or the client, gives the following elapsed times:
  - w/o patch:  180 seconds
  - w/ patch:   181 seconds

There is no obvious performance regression.
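
To make the before/after structure concrete, the following is a minimal
userspace sketch of the two callback schemes. It is only a model: struct
entry, struct reply, finish_entry(), interpret_deferred(), worker_drain(),
interpret_direct() and schemes_agree() are hypothetical names for this
example and do not exist in Lustre. The old scheme parks the reply on an
interim list for a worker to finish later; the new scheme finishes the
entry inside the callback itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum entry_state { ENTRY_INIT, ENTRY_READY, ENTRY_FAILED };

/* Hypothetical stand-in for an RPC reply. */
struct reply {
	int rc;    /* 0 on success, negative errno otherwise */
	int attr;  /* stand-in for the attributes carried by the reply */
};

struct entry {
	enum entry_state state;
	int attr;            /* stand-in for the prepared inode */
	struct reply saved;  /* old scheme only: reply kept for the worker */
	struct entry *next;  /* old scheme only: link on the interim list */
};

/* The actual work: turn a reply into a ready (or failed) entry. */
static void finish_entry(struct entry *e, const struct reply *r)
{
	if (r->rc != 0) {
		e->state = ENTRY_FAILED;
		return;
	}
	e->attr = r->attr;
	e->state = ENTRY_READY;
}

/* Old scheme: the callback only parks the entry on the interim list. */
static void interpret_deferred(struct entry **interim, struct entry *e,
			       const struct reply *r)
{
	e->saved = *r;
	e->next = *interim;
	*interim = e;
}

/* Old scheme: a worker later drains the list and finishes each entry. */
static int worker_drain(struct entry **interim)
{
	int n = 0;

	while (*interim) {
		struct entry *e = *interim;

		*interim = e->next;
		e->next = NULL;
		finish_entry(e, &e->saved);
		n++;
	}
	return n;
}

/* New scheme: the callback finishes the entry right away. */
static void interpret_direct(struct entry *e, const struct reply *r)
{
	finish_entry(e, r);
}

/* Feed one reply through both schemes and report whether they agree. */
static bool schemes_agree(const struct reply *r)
{
	struct entry a = { .state = ENTRY_INIT };
	struct entry b = { .state = ENTRY_INIT };
	struct entry *interim = NULL;

	interpret_deferred(&interim, &a, r);
	assert(a.state == ENTRY_INIT); /* nothing done until the worker runs */
	worker_drain(&interim);
	interpret_direct(&b, r);
	return a.state == b.state && a.attr == b.attr;
}
```

In this model the direct scheme needs neither the saved reply nor the
list link, which loosely mirrors the removal of se_minfo, se_req and
sai_interim_entries in the diff below.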

WC-bug-id: https://jira.whamcloud.com/browse/LU-14139
Lustre-commit: cbaaa7cde45f593 ("LU-14139 llite: simplify callback handling for async getattr")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40712
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h          |  34 ++--
 fs/lustre/include/obd_class.h    |   4 +-
 fs/lustre/llite/llite_internal.h |   7 +-
 fs/lustre/llite/statahead.c      | 343 ++++++++++++++-------------------------
 fs/lustre/lmv/lmv_obd.c          |   6 +-
 fs/lustre/mdc/mdc_internal.h     |   3 +-
 fs/lustre/mdc/mdc_locks.c        |  31 ++--
 7 files changed, 160 insertions(+), 268 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index 86d7839..eeb6262 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -818,18 +818,24 @@ struct md_callback {
 			       void *data, int flag);
 };
 
-struct md_enqueue_info;
-/* metadata stat-ahead */
-
-struct md_enqueue_info {
-	struct md_op_data		mi_data;
-	struct lookup_intent		mi_it;
-	struct lustre_handle		mi_lockh;
-	struct inode		       *mi_dir;
-	struct ldlm_enqueue_info	mi_einfo;
-	int (*mi_cb)(struct ptlrpc_request *req,
-		     struct md_enqueue_info *minfo, int rc);
-	void			       *mi_cbdata;
+enum md_opcode {
+	MD_OP_NONE	= 0,
+	MD_OP_GETATTR	= 1,
+	MD_OP_MAX,
+};
+
+struct md_op_item {
+	enum md_opcode			mop_opc;
+	struct md_op_data		mop_data;
+	struct lookup_intent		mop_it;
+	struct lustre_handle		mop_lockh;
+	struct ldlm_enqueue_info	mop_einfo;
+	int (*mop_cb)(struct req_capsule *pill,
+		      struct md_op_item *item,
+		      int rc);
+	void			       *mop_cbdata;
+	struct inode		       *mop_dir;
+	u64				mop_lock_flags;
 };
 
 struct obd_ops {
@@ -1060,8 +1066,8 @@ struct md_ops {
 				const char *name, int namelen,
 				struct lu_fid *fid);
 
-	int (*intent_getattr_async)(struct obd_export *,
-				    struct md_enqueue_info *);
+	int (*intent_getattr_async)(struct obd_export *exp,
+				    struct md_op_item *item);
 
 	int (*revalidate_lock)(struct obd_export *, struct lookup_intent *,
 			       struct lu_fid *, u64 *bits);
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index f2a3d2b..ad9b2fc 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1594,7 +1594,7 @@ static inline int md_init_ea_size(struct obd_export *exp, u32 easize,
 }
 
 static inline int md_intent_getattr_async(struct obd_export *exp,
-					  struct md_enqueue_info *minfo)
+					  struct md_op_item *item)
 {
 	int rc;
 
@@ -1605,7 +1605,7 @@ static inline int md_intent_getattr_async(struct obd_export *exp,
 	lprocfs_counter_incr(exp->exp_obd->obd_md_stats,
 			     LPROC_MD_INTENT_GETATTR_ASYNC);
 
-	return MDP(exp->exp_obd, intent_getattr_async)(exp, minfo);
+	return MDP(exp->exp_obd, intent_getattr_async)(exp, item);
 }
 
 static inline int md_revalidate_lock(struct obd_export *exp,
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index a073d6d..1d5255e 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1477,17 +1477,12 @@ struct ll_statahead_info {
 					     * is not a hidden one
 					     */
 	unsigned int	    sai_skip_hidden;/* skipped hidden dentry count */
-	unsigned int	    sai_ls_all:1,   /* "ls -al", do stat-ahead for
+	unsigned int	    sai_ls_all:1;   /* "ls -al", do stat-ahead for
 					     * hidden entries
 					     */
-				sai_in_readpage:1;/* statahead in readdir() */
 	wait_queue_head_t	sai_waitq;      /* stat-ahead wait queue */
 	struct task_struct     *sai_task;       /* stat-ahead thread */
 	struct task_struct     *sai_agl_task;   /* AGL thread */
-	struct list_head	sai_interim_entries; /* entries which got async
-						      * stat reply, but not
-						      * instantiated
-						      */
 	struct list_head	sai_entries;	/* completed entries */
 	struct list_head	sai_agls;	/* AGLs to be sent */
 	struct list_head	sai_cache[LL_SA_CACHE_SIZE];
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 40ea206..becd0e1 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -55,13 +55,12 @@ enum se_stat {
 
 /*
  * sa_entry is not refcounted: statahead thread allocates it and do async stat,
- * and in async stat callback ll_statahead_interpret() will add it into
- * sai_interim_entries, later statahead thread will call sa_handle_callback() to
- * instantiate entry and move it into sai_entries, and then only scanner process
- * can access and free it.
+ * and in async stat callback ll_statahead_interpret() will prepare the inode
+ * and set lock data in the ptlrpcd context. Then the scanner process will be
+ * woken up if this entry is the waiting one, can access and free it.
  */
 struct sa_entry {
-	/* link into sai_interim_entries or sai_entries */
+	/* link into sai_entries */
 	struct list_head	se_list;
 	/* link into sai hash table locally */
 	struct list_head	se_hash;
@@ -73,10 +72,6 @@ struct sa_entry {
 	enum se_stat		se_state;
 	/* entry size, contains name */
 	int			se_size;
-	/* pointer to async getattr enqueue info */
-	struct md_enqueue_info	*se_minfo;
-	/* pointer to the async getattr request */
-	struct ptlrpc_request	*se_req;
 	/* pointer to the target inode */
 	struct inode		*se_inode;
 	/* entry name */
@@ -113,9 +108,7 @@ static inline int sa_hash(int val)
 	spin_unlock(&sai->sai_cache_lock[i]);
 }
 
-/*
- * Remove entry from SA table.
- */
+/* unhash entry from sai_cache */
 static inline void
 sa_unhash(struct ll_statahead_info *sai, struct sa_entry *entry)
 {
@@ -138,12 +131,6 @@ static inline int sa_sent_full(struct ll_statahead_info *sai)
 	return atomic_read(&sai->sai_cache_count) >= sai->sai_max;
 }
 
-/* got async stat replies */
-static inline int sa_has_callback(struct ll_statahead_info *sai)
-{
-	return !list_empty(&sai->sai_interim_entries);
-}
-
 static inline int agl_list_empty(struct ll_statahead_info *sai)
 {
 	return list_empty(&sai->sai_agls);
@@ -267,8 +254,8 @@ static void sa_free(struct ll_statahead_info *sai, struct sa_entry *entry)
 }
 
 /* called by scanner after use, sa_entry will be killed */
-static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry,
-		   struct ll_inode_info *lli)
+static void
+sa_put(struct ll_statahead_info *sai, struct sa_entry *entry)
 {
 	struct sa_entry *tmp, *next;
 
@@ -295,11 +282,6 @@ static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry,
 			break;
 		sa_kill(sai, tmp);
 	}
-
-	spin_lock(&lli->lli_sa_lock);
-	if (sai->sai_task)
-		wake_up_process(sai->sai_task);
-	spin_unlock(&lli->lli_sa_lock);
 }
 
 /*
@@ -334,55 +316,55 @@ static void sa_put(struct ll_statahead_info *sai, struct sa_entry *entry,
 }
 
 /* finish async stat RPC arguments */
-static void sa_fini_data(struct md_enqueue_info *minfo)
+static void sa_fini_data(struct md_op_item *item)
 {
-	ll_unlock_md_op_lsm(&minfo->mi_data);
-	iput(minfo->mi_dir);
-	kfree(minfo);
+	ll_unlock_md_op_lsm(&item->mop_data);
+	iput(item->mop_dir);
+	kfree(item);
 }
 
-static int ll_statahead_interpret(struct ptlrpc_request *req,
-				  struct md_enqueue_info *minfo, int rc);
+static int ll_statahead_interpret(struct req_capsule *pill,
+				  struct md_op_item *item, int rc);
 
 /*
  * prepare arguments for async stat RPC.
  */
-static struct md_enqueue_info *
+static struct md_op_item *
 sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry)
 {
-	struct md_enqueue_info   *minfo;
+	struct md_op_item *item;
 	struct ldlm_enqueue_info *einfo;
-	struct md_op_data        *op_data;
+	struct md_op_data *op_data;
 
-	minfo = kzalloc(sizeof(*minfo), GFP_NOFS);
-	if (!minfo)
+	item = kzalloc(sizeof(*item), GFP_NOFS);
+	if (!item)
 		return ERR_PTR(-ENOMEM);
 
-	op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child,
+	op_data = ll_prep_md_op_data(&item->mop_data, dir, child,
 				     entry->se_qstr.name, entry->se_qstr.len, 0,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data)) {
-		kfree(minfo);
-		return (struct md_enqueue_info *)op_data;
+		kfree(item);
+		return ERR_CAST(op_data);
 	}
 
 	if (!child)
 		op_data->op_fid2 = entry->se_fid;
 
-	minfo->mi_it.it_op = IT_GETATTR;
-	minfo->mi_dir = igrab(dir);
-	minfo->mi_cb = ll_statahead_interpret;
-	minfo->mi_cbdata = entry;
-
-	einfo = &minfo->mi_einfo;
-	einfo->ei_type   = LDLM_IBITS;
-	einfo->ei_mode   = it_to_lock_mode(&minfo->mi_it);
-	einfo->ei_cb_bl  = ll_md_blocking_ast;
-	einfo->ei_cb_cp  = ldlm_completion_ast;
-	einfo->ei_cb_gl  = NULL;
+	item->mop_it.it_op = IT_GETATTR;
+	item->mop_dir = igrab(dir);
+	item->mop_cb = ll_statahead_interpret;
+	item->mop_cbdata = entry;
+
+	einfo = &item->mop_einfo;
+	einfo->ei_type = LDLM_IBITS;
+	einfo->ei_mode = it_to_lock_mode(&item->mop_it);
+	einfo->ei_cb_bl = ll_md_blocking_ast;
+	einfo->ei_cb_cp = ldlm_completion_ast;
+	einfo->ei_cb_gl = NULL;
 	einfo->ei_cbdata = NULL;
 
-	return minfo;
+	return item;
 }
 
 /*
@@ -393,22 +375,8 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
 sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret)
 {
 	struct ll_inode_info *lli = ll_i2info(sai->sai_dentry->d_inode);
-	struct md_enqueue_info *minfo = entry->se_minfo;
-	struct ptlrpc_request *req = entry->se_req;
 	bool wakeup;
 
-	/* release resources used in RPC */
-	if (minfo) {
-		entry->se_minfo = NULL;
-		ll_intent_release(&minfo->mi_it);
-		sa_fini_data(minfo);
-	}
-
-	if (req) {
-		entry->se_req = NULL;
-		ptlrpc_req_finished(req);
-	}
-
 	spin_lock(&lli->lli_sa_lock);
 	wakeup = __sa_make_ready(sai, entry, ret);
 	spin_unlock(&lli->lli_sa_lock);
@@ -465,7 +433,6 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry)
 	sai->sai_index = 1;
 	init_waitqueue_head(&sai->sai_waitq);
 
-	INIT_LIST_HEAD(&sai->sai_interim_entries);
 	INIT_LIST_HEAD(&sai->sai_entries);
 	INIT_LIST_HEAD(&sai->sai_agls);
 
@@ -528,7 +495,6 @@ static void ll_sai_put(struct ll_statahead_info *sai)
 		LASSERT(sai->sai_task == NULL);
 		LASSERT(sai->sai_agl_task == NULL);
 		LASSERT(sai->sai_sent == sai->sai_replied);
-		LASSERT(!sa_has_callback(sai));
 
 		list_for_each_entry_safe(entry, next, &sai->sai_entries,
 					 se_list)
@@ -619,26 +585,63 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai)
 }
 
 /*
- * prepare inode for sa entry, add it into agl list, now sa_entry is ready
- * to be used by scanner process.
+ * Callback for async stat RPC, this is called in ptlrpcd context. It prepares
+ * the inode and set lock data directly in the ptlrpcd context. It will wake up
+ * the directory listing process if the dentry is the waiting one.
  */
-static void sa_instantiate(struct ll_statahead_info *sai,
-			   struct sa_entry *entry)
+static int ll_statahead_interpret(struct req_capsule *pill,
+				  struct md_op_item *item, int rc)
 {
-	struct inode *dir = sai->sai_dentry->d_inode;
-	struct inode *child;
-	struct md_enqueue_info *minfo;
-	struct lookup_intent *it;
-	struct ptlrpc_request *req;
+	struct lookup_intent *it = &item->mop_it;
+	struct inode *dir = item->mop_dir;
+	struct ll_inode_info *lli = ll_i2info(dir);
+	struct ll_statahead_info *sai = lli->lli_sai;
+	struct sa_entry *entry = (struct sa_entry *)item->mop_cbdata;
 	struct mdt_body	*body;
-	int rc = 0;
+	struct inode *child;
+	u64 handle = 0;
+
+	if (it_disposition(it, DISP_LOOKUP_NEG))
+		rc = -ENOENT;
 
-	LASSERT(entry->se_handle != 0);
+	/*
+	 * because statahead thread will wait for all inflight RPC to finish,
+	 * sai should be always valid, no need to refcount
+	 */
+	LASSERT(sai);
+	LASSERT(entry);
 
-	minfo = entry->se_minfo;
-	it = &minfo->mi_it;
-	req = entry->se_req;
-	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
+	CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
+	       entry->se_qstr.len, entry->se_qstr.name, rc);
+
+	if (rc != 0) {
+		ll_intent_release(it);
+		sa_fini_data(item);
+	} else {
+		/*
+		 * release ibits lock ASAP to avoid deadlock when statahead
+		 * thread enqueues lock on parent in readdir and another
+		 * process enqueues lock on child with parent lock held, eg.
+		 * unlink.
+		 */
+		handle = it->it_lock_handle;
+		ll_intent_drop_lock(it);
+		ll_unlock_md_op_lsm(&item->mop_data);
+	}
+
+	if (rc != 0) {
+		spin_lock(&lli->lli_sa_lock);
+		if (__sa_make_ready(sai, entry, rc))
+			wake_up(&sai->sai_waitq);
+
+		sai->sai_replied++;
+		spin_unlock(&lli->lli_sa_lock);
+
+		return rc;
+	}
+
+	entry->se_handle = handle;
+	body = req_capsule_server_get(pill, &RMF_MDT_BODY);
 	if (!body) {
 		rc = -EFAULT;
 		goto out;
@@ -646,7 +649,7 @@ static void sa_instantiate(struct ll_statahead_info *sai,
 
 	child = entry->se_inode;
 	/* revalidate; unlinked and re-created with the same name */
-	if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) {
+	if (unlikely(!lu_fid_eq(&item->mop_data.op_fid2, &body->mbo_fid1))) {
 		if (child) {
 			entry->se_inode = NULL;
 			iput(child);
@@ -663,7 +666,7 @@ static void sa_instantiate(struct ll_statahead_info *sai,
 		goto out;
 	}
 
-	rc = ll_prep_inode(&child, &req->rq_pill, dir->i_sb, it);
+	rc = ll_prep_inode(&child, pill, dir->i_sb, it);
 	if (rc)
 		goto out;
 
@@ -676,107 +679,18 @@ static void sa_instantiate(struct ll_statahead_info *sai,
 
 	if (agl_should_run(sai, child))
 		ll_agl_add(sai, child, entry->se_index);
-
 out:
 	/*
-	 * sa_make_ready() will drop ldlm ibits lock refcount by calling
+	 * First it will drop ldlm ibits lock refcount by calling
 	 * ll_intent_drop_lock() in spite of failures. Do not worry about
 	 * calling ll_intent_drop_lock() more than once.
 	 */
+	ll_intent_release(&item->mop_it);
+	sa_fini_data(item);
 	sa_make_ready(sai, entry, rc);
-}
-
-/* once there are async stat replies, instantiate sa_entry from replies */
-static void sa_handle_callback(struct ll_statahead_info *sai)
-{
-	struct ll_inode_info *lli;
-
-	lli = ll_i2info(sai->sai_dentry->d_inode);
 
 	spin_lock(&lli->lli_sa_lock);
-	while (sa_has_callback(sai)) {
-		struct sa_entry *entry;
-
-		entry = list_first_entry(&sai->sai_interim_entries,
-					 struct sa_entry, se_list);
-		list_del_init(&entry->se_list);
-		spin_unlock(&lli->lli_sa_lock);
-
-		sa_instantiate(sai, entry);
-		spin_lock(&lli->lli_sa_lock);
-	}
-	spin_unlock(&lli->lli_sa_lock);
-}
-
-/*
- * callback for async stat, because this is called in ptlrpcd context, we only
- * put sa_entry in sai_cb_entries list, and let sa_handle_callback() to really
- * prepare inode and instantiate sa_entry later.
- */
-static int ll_statahead_interpret(struct ptlrpc_request *req,
-				  struct md_enqueue_info *minfo, int rc)
-{
-	struct lookup_intent *it = &minfo->mi_it;
-	struct inode *dir = minfo->mi_dir;
-	struct ll_inode_info *lli = ll_i2info(dir);
-	struct ll_statahead_info *sai = lli->lli_sai;
-	struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata;
-	u64 handle = 0;
-
-	if (it_disposition(it, DISP_LOOKUP_NEG))
-		rc = -ENOENT;
-
-	/*
-	 * because statahead thread will wait for all inflight RPC to finish,
-	 * sai should be always valid, no need to refcount
-	 */
-	LASSERT(sai);
-	LASSERT(entry);
-
-	CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
-	       entry->se_qstr.len, entry->se_qstr.name, rc);
-
-	if (rc) {
-		ll_intent_release(it);
-		sa_fini_data(minfo);
-	} else {
-		/*
-		 * release ibits lock ASAP to avoid deadlock when statahead
-		 * thread enqueues lock on parent in readdir and another
-		 * process enqueues lock on child with parent lock held, eg.
-		 * unlink.
-		 */
-		handle = it->it_lock_handle;
-		ll_intent_drop_lock(it);
-		ll_unlock_md_op_lsm(&minfo->mi_data);
-	}
-
-	spin_lock(&lli->lli_sa_lock);
-	if (rc) {
-		if (__sa_make_ready(sai, entry, rc))
-			wake_up(&sai->sai_waitq);
-	} else {
-		int first = 0;
-
-		entry->se_minfo = minfo;
-		entry->se_req = ptlrpc_request_addref(req);
-		/*
-		 * Release the async ibits lock ASAP to avoid deadlock
-		 * when statahead thread tries to enqueue lock on parent
-		 * for readpage and other tries to enqueue lock on child
-		 * with parent's lock held, for example: unlink.
-		 */
-		entry->se_handle = handle;
-		if (!sa_has_callback(sai))
-			first = 1;
-
-		list_add_tail(&entry->se_list, &sai->sai_interim_entries);
-
-		if (first && sai->sai_task)
-			wake_up_process(sai->sai_task);
-	}
 	sai->sai_replied++;
-
 	spin_unlock(&lli->lli_sa_lock);
 
 	return rc;
@@ -785,16 +699,16 @@ static int ll_statahead_interpret(struct ptlrpc_request *req,
 /* async stat for file not found in dcache */
 static int sa_lookup(struct inode *dir, struct sa_entry *entry)
 {
-	struct md_enqueue_info *minfo;
+	struct md_op_item *item;
 	int rc;
 
-	minfo = sa_prep_data(dir, NULL, entry);
-	if (IS_ERR(minfo))
-		return PTR_ERR(minfo);
+	item = sa_prep_data(dir, NULL, entry);
+	if (IS_ERR(item))
+		return PTR_ERR(item);
 
-	rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo);
+	rc = md_intent_getattr_async(ll_i2mdexp(dir), item);
 	if (rc)
-		sa_fini_data(minfo);
+		sa_fini_data(item);
 
 	return rc;
 }
@@ -814,7 +728,7 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 		.it_op = IT_GETATTR,
 		.it_lock_handle = 0
 	};
-	struct md_enqueue_info *minfo;
+	struct md_op_item *item;
 	int rc;
 
 	if (unlikely(!inode))
@@ -823,9 +737,9 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 	if (d_mountpoint(dentry))
 		return 1;
 
-	minfo = sa_prep_data(dir, inode, entry);
-	if (IS_ERR(minfo))
-		return PTR_ERR(minfo);
+	item = sa_prep_data(dir, inode, entry);
+	if (IS_ERR(item))
+		return PTR_ERR(item);
 
 	entry->se_inode = igrab(inode);
 	rc = md_revalidate_lock(ll_i2mdexp(dir), &it, ll_inode2fid(inode),
@@ -833,15 +747,15 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 	if (rc == 1) {
 		entry->se_handle = it.it_lock_handle;
 		ll_intent_release(&it);
-		sa_fini_data(minfo);
+		sa_fini_data(item);
 		return 1;
 	}
 
-	rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo);
+	rc = md_intent_getattr_async(ll_i2mdexp(dir), item);
 	if (rc) {
 		entry->se_inode = NULL;
 		iput(inode);
-		sa_fini_data(minfo);
+		sa_fini_data(item);
 	}
 
 	return rc;
@@ -934,14 +848,14 @@ static void ll_stop_agl(struct ll_statahead_info *sai)
 		return;
 
 	CDEBUG(D_READA, "stop agl thread: sai %p pid %u\n",
-	       sai, (unsigned int)sai->sai_agl_task->pid);
-	kthread_stop(sai->sai_agl_task);
+	       sai, (unsigned int)agl_task->pid);
+	kthread_stop(agl_task);
 
-	sai->sai_agl_task = NULL;
 	spin_lock(&plli->lli_agl_lock);
-	while ((clli = list_first_entry_or_null(&sai->sai_agls,
-						struct ll_inode_info,
-						lli_agl_list)) != NULL) {
+	clli = list_first_entry_or_null(&sai->sai_agls,
+					struct ll_inode_info,
+					lli_agl_list);
+	if (clli) {
 		list_del_init(&clli->lli_agl_list);
 		spin_unlock(&plli->lli_agl_lock);
 		clli->lli_agl_index = 0;
@@ -950,7 +864,7 @@ static void ll_stop_agl(struct ll_statahead_info *sai)
 	}
 	spin_unlock(&plli->lli_agl_lock);
 	CDEBUG(D_READA, "agl thread stopped: sai %p, parent %pd\n",
-	       sai, sai->sai_dentry);
+	       sai, parent);
 	ll_sai_put(sai);
 }
 
@@ -1014,10 +928,8 @@ static int ll_statahead_thread(void *arg)
 			break;
 		}
 
-		sai->sai_in_readpage = 1;
 		page = ll_get_dir_page(dir, op_data, pos);
 		ll_unlock_md_op_lsm(op_data);
-		sai->sai_in_readpage = 0;
 		if (IS_ERR(page)) {
 			rc = PTR_ERR(page);
 			CDEBUG(D_READA,
@@ -1081,14 +993,9 @@ static int ll_statahead_thread(void *arg)
 
 			while (({set_current_state(TASK_IDLE);
 				 sai->sai_task; })) {
-				if (sa_has_callback(sai)) {
-					__set_current_state(TASK_RUNNING);
-					sa_handle_callback(sai);
-				}
-
 				spin_lock(&lli->lli_agl_lock);
 				while (sa_sent_full(sai) &&
-				       !agl_list_empty(sai)) {
+				       !list_empty(&sai->sai_agls)) {
 					struct ll_inode_info *clli;
 
 					__set_current_state(TASK_RUNNING);
@@ -1140,16 +1047,11 @@ static int ll_statahead_thread(void *arg)
 
 	/*
 	 * statahead is finished, but statahead entries need to be cached, wait
-	 * for file release to stop me.
+	 * for file release closedir() call to stop me.
 	 */
 	while (({set_current_state(TASK_IDLE);
 		 sai->sai_task; })) {
-		if (sa_has_callback(sai)) {
-			__set_current_state(TASK_RUNNING);
-			sa_handle_callback(sai);
-		} else {
-			schedule();
-		}
+		schedule();
 	}
 	__set_current_state(TASK_RUNNING);
 out:
@@ -1159,13 +1061,9 @@ static int ll_statahead_thread(void *arg)
 	 * wait for inflight statahead RPCs to finish, and then we can free sai
 	 * safely because statahead RPC will access sai data
 	 */
-	while (sai->sai_sent != sai->sai_replied) {
+	while (sai->sai_sent != sai->sai_replied)
 		/* in case we're not woken up, timeout wait */
 		msleep(125);
-	}
-
-	/* release resources held by statahead RPCs */
-	sa_handle_callback(sai);
 
 	CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n",
 	       sai, parent);
@@ -1173,8 +1071,8 @@ static int ll_statahead_thread(void *arg)
 	spin_lock(&lli->lli_sa_lock);
 	sai->sai_task = NULL;
 	spin_unlock(&lli->lli_sa_lock);
-
 	wake_up(&sai->sai_waitq);
+
 	ll_sai_put(sai);
 
 	return rc;
@@ -1200,8 +1098,8 @@ void ll_authorize_statahead(struct inode *dir, void *key)
 }
 
 /*
- * deauthorize opened dir handle @key to statahead, but statahead thread may
- * still be running, notify it to quit.
+ * deauthorize opened dir handle @key to statahead, and notify statahead thread
+ * to quit if it's running.
  */
 void ll_deauthorize_statahead(struct inode *dir, void *key)
 {
@@ -1427,10 +1325,6 @@ static int revalidate_statahead_dentry(struct inode *dir,
 		goto out_unplug;
 	}
 
-	/* if statahead is busy in readdir, help it do post-work */
-	if (!sa_ready(entry) && sai->sai_in_readpage)
-		sa_handle_callback(sai);
-
 	if (!sa_ready(entry)) {
 		spin_lock(&lli->lli_sa_lock);
 		sai->sai_index_wait = entry->se_index;
@@ -1507,7 +1401,7 @@ static int revalidate_statahead_dentry(struct inode *dir,
 	 */
 	ldd = ll_d2d(*dentryp);
 	ldd->lld_sa_generation = lli->lli_sa_generation;
-	sa_put(sai, entry, lli);
+	sa_put(sai, entry);
 	spin_lock(&lli->lli_sa_lock);
 	if (sai->sai_task)
 		wake_up_process(sai->sai_task);
@@ -1591,7 +1485,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry,
 		spin_lock(&lli->lli_sa_lock);
 		lli->lli_sai = NULL;
 		spin_unlock(&lli->lli_sa_lock);
-		atomic_dec(&ll_i2sbi(parent->d_inode)->ll_sa_running);
 		rc = PTR_ERR(task);
 		CERROR("can't start ll_sa thread, rc : %d\n", rc);
 		goto out;
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 56d22d1..ac88d20 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -3431,9 +3431,9 @@ static int lmv_clear_open_replay_data(struct obd_export *exp,
 }
 
 static int lmv_intent_getattr_async(struct obd_export *exp,
-				    struct md_enqueue_info *minfo)
+				    struct md_op_item *item)
 {
-	struct md_op_data *op_data = &minfo->mi_data;
+	struct md_op_data *op_data = &item->mop_data;
 	struct obd_device *obd = exp->exp_obd;
 	struct lmv_obd *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *ptgt = NULL;
@@ -3457,7 +3457,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp,
 	if (ctgt != ptgt)
 		return -EREMOTE;
 
-	return md_intent_getattr_async(ptgt->ltd_exp, minfo);
+	return md_intent_getattr_async(ptgt->ltd_exp, item);
 }
 
 static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h
index fab40bd..2416607 100644
--- a/fs/lustre/mdc/mdc_internal.h
+++ b/fs/lustre/mdc/mdc_internal.h
@@ -130,8 +130,7 @@ int mdc_cancel_unused(struct obd_export *exp, const struct lu_fid *fid,
 int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			struct lu_fid *fid, u64 *bits);
 
-int mdc_intent_getattr_async(struct obd_export *exp,
-			     struct md_enqueue_info *minfo);
+int mdc_intent_getattr_async(struct obd_export *exp, struct md_op_item *item);
 
 enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags,
 			      const struct lu_fid *fid, enum ldlm_type type,
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index 4135c3a..a0fcab0 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -49,7 +49,7 @@
 
 struct mdc_getattr_args {
 	struct obd_export	*ga_exp;
-	struct md_enqueue_info	*ga_minfo;
+	struct md_op_item	*ga_item;
 };
 
 int it_open_error(int phase, struct lookup_intent *it)
@@ -1360,10 +1360,10 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 {
 	struct mdc_getattr_args *ga = args;
 	struct obd_export *exp = ga->ga_exp;
-	struct md_enqueue_info *minfo = ga->ga_minfo;
-	struct ldlm_enqueue_info *einfo = &minfo->mi_einfo;
-	struct lookup_intent *it = &minfo->mi_it;
-	struct lustre_handle *lockh = &minfo->mi_lockh;
+	struct md_op_item *item = ga->ga_item;
+	struct ldlm_enqueue_info *einfo = &item->mop_einfo;
+	struct lookup_intent *it = &item->mop_it;
+	struct lustre_handle *lockh = &item->mop_lockh;
 	struct ldlm_reply *lockrep;
 	u64 flags = LDLM_FL_HAS_INTENT;
 
@@ -1388,18 +1388,17 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 	if (rc)
 		goto out;
 
-	rc = mdc_finish_intent_lock(exp, req, &minfo->mi_data, it, lockh);
-
+	rc = mdc_finish_intent_lock(exp, req, &item->mop_data, it, lockh);
 out:
-	minfo->mi_cb(req, minfo, rc);
+	item->mop_cb(&req->rq_pill, item, rc);
 	return 0;
 }
 
 int mdc_intent_getattr_async(struct obd_export *exp,
-			     struct md_enqueue_info *minfo)
+			     struct md_op_item *item)
 {
-	struct md_op_data *op_data = &minfo->mi_data;
-	struct lookup_intent *it = &minfo->mi_it;
+	struct md_op_data *op_data = &item->mop_data;
+	struct lookup_intent *it = &item->mop_it;
 	struct ptlrpc_request *req;
 	struct mdc_getattr_args *ga;
 	struct ldlm_res_id res_id;
@@ -1428,11 +1427,11 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	 * to avoid possible races. It is safe to have glimpse handler
 	 * for non-DOM locks and costs nothing.
 	 */
-	if (!minfo->mi_einfo.ei_cb_gl)
-		minfo->mi_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast;
+	if (!item->mop_einfo.ei_cb_gl)
+		item->mop_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast;
 
-	rc = ldlm_cli_enqueue(exp, &req, &minfo->mi_einfo, &res_id, &policy,
-			      &flags, NULL, 0, LVB_T_NONE, &minfo->mi_lockh, 1);
+	rc = ldlm_cli_enqueue(exp, &req, &item->mop_einfo, &res_id, &policy,
+			      &flags, NULL, 0, LVB_T_NONE, &item->mop_lockh, 1);
 	if (rc < 0) {
 		ptlrpc_req_finished(req);
 		return rc;
@@ -1440,7 +1439,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 
 	ga = ptlrpc_req_async_args(ga, req);
 	ga->ga_exp = exp;
-	ga->ga_minfo = minfo;
+	ga->ga_item = item;
 
 	req->rq_interpret_reply = mdc_intent_getattr_async_interpret;
 	ptlrpcd_add_req(req);
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [lustre-devel] [PATCH 03/18] lustre: uapi: per-user changelog names and mask
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 01/18] lustre: statahead: update task management code James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 02/18] lustre: llite: simplify callback handling for async getattr James Simmons
@ 2021-07-19 12:31 ` James Simmons
  2021-07-19 12:31 ` [lustre-devel] [PATCH 04/18] lnet: Correct peer NI recovery age out calculation James Simmons
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

Allow specifying a name for newly-registered changelog users,
rather than the default "clNNN" that is otherwise used. This
allows services to register a "well-known" changelog user,
rather than having to store the changelog username in HA storage
outside of the filesystem.

Each changelog user still has a unique ID appended to it, to allow
the changelog_clear and changelog_deregister commands to be run
using only the ID if necessary/desired. The user name can also be used
to deregister. The user name is also unique per server.

If no name is given, then the default "clNNN" format is used.

With this new functionality, it is possible to specify the name like:
  testfs-MDT0000: Registered changelog userid 'cl13-watcher'

A per-user mask is also added to allow logging of specific operations
on a per-user basis. The mask can be set only during registration. The
mask resulting from the per-server mask and all user masks is used for
current changelog operations.
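
As a rough illustration of the naming scheme, here is a minimal sketch that
composes a user name like the 'cl13-watcher' example above. It assumes the
format is the "cl" prefix, the numeric ID, and an optional "-name" suffix.
The helper changelog_user_fullname() is hypothetical and is not part of this
patch; only CHANGELOG_USER_NAMELEN is taken from it.

```c
#include <stdio.h>

#define CHANGELOG_USER_NAMELEN 16 /* base name including NUL terminator */

/* Hypothetical helper: writes "cl<ID>" or "cl<ID>-<name>" into buf.
 * The base name is truncated to CHANGELOG_USER_NAMELEN - 1 characters.
 */
static int changelog_user_fullname(char *buf, size_t len, unsigned int id,
				   const char *name)
{
	if (name && name[0] != '\0')
		return snprintf(buf, len, "cl%u-%.*s", id,
				CHANGELOG_USER_NAMELEN - 1, name);

	return snprintf(buf, len, "cl%u", id);
}
```

With id 13 and name "watcher" this yields "cl13-watcher"; with no name it
falls back to the plain "cl13" form.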

Lustre-commit: a15eb4f13224e14 ("LU-13055 mdd: per-user changelog names and mask")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43380
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/llog_swab.c          | 17 ++++++++++++-----
 fs/lustre/ptlrpc/wiretest.c             | 23 ++++++++++++-----------
 include/uapi/linux/lustre/lustre_idl.h  | 18 ++++++++++++++++++
 include/uapi/linux/lustre/lustre_user.h |  2 +-
 4 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/fs/lustre/obdclass/llog_swab.c b/fs/lustre/obdclass/llog_swab.c
index 0b83dc3..7bfc304 100644
--- a/fs/lustre/obdclass/llog_swab.c
+++ b/fs/lustre/obdclass/llog_swab.c
@@ -185,19 +185,26 @@ void lustre_swab_llog_rec(struct llog_rec_hdr *rec)
 		 * to compute its location at runtime
 		 */
 		tail = (struct llog_rec_tail *)((char *)&cr->cr +
-						changelog_rec_size(&cr->cr) +
-						cr->cr.cr_namelen);
+						rec->lrh_len - sizeof(*tail));
 		break;
 	}
 
 	case CHANGELOG_USER_REC:
+	case CHANGELOG_USER_REC2:
 	{
-		struct llog_changelog_user_rec *cur =
-			(struct llog_changelog_user_rec *)rec;
+		struct llog_changelog_user_rec2 *cur =
+			(struct llog_changelog_user_rec2 *)rec;
 
 		__swab32s(&cur->cur_id);
 		__swab64s(&cur->cur_endrec);
-		tail = &cur->cur_tail;
+		if (cur->cur_hdr.lrh_type == CHANGELOG_USER_REC2) {
+			__swab32s(&cur->cur_mask);
+			BUILD_BUG_ON(offsetof(typeof(*cur), cur_padding1) == 0);
+			BUILD_BUG_ON(offsetof(typeof(*cur), cur_padding2) == 0);
+			BUILD_BUG_ON(offsetof(typeof(*cur), cur_padding3) == 0);
+		}
+		tail = (struct llog_rec_tail *)((char *)rec +
+						rec->lrh_len - sizeof(*tail));
 		break;
 	}
 
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 9e0eaa7..c7eb218 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -3567,17 +3567,18 @@ void lustre_assert_wire_constants(void)
 		 (long long)(int)offsetof(struct llog_logid, lgl_ogen));
 	LASSERTF((int)sizeof(((struct llog_logid *)0)->lgl_ogen) == 4, "found %lld\n",
 		 (long long)(int)sizeof(((struct llog_logid *)0)->lgl_ogen));
-	BUILD_BUG_ON(OST_SZ_REC != 274730752);
-	BUILD_BUG_ON(MDS_UNLINK_REC != 274801668);
-	BUILD_BUG_ON(MDS_UNLINK64_REC != 275325956);
-	BUILD_BUG_ON(MDS_SETATTR64_REC != 275325953);
-	BUILD_BUG_ON(OBD_CFG_REC != 274857984);
-	BUILD_BUG_ON(LLOG_GEN_REC != 274989056);
-	BUILD_BUG_ON(CHANGELOG_REC != 275120128);
-	BUILD_BUG_ON(CHANGELOG_USER_REC != 275185664);
-	BUILD_BUG_ON(HSM_AGENT_REC != 275251200);
-	BUILD_BUG_ON(LLOG_HDR_MAGIC != 275010873);
-	BUILD_BUG_ON(LLOG_LOGID_MAGIC != 275010875);
+	BUILD_BUG_ON(OST_SZ_REC != 0x10600f00);
+	BUILD_BUG_ON(MDS_UNLINK_REC != 0x10612404);
+	BUILD_BUG_ON(MDS_UNLINK64_REC != 0x10692404);
+	BUILD_BUG_ON(MDS_SETATTR64_REC != 0x10692401);
+	BUILD_BUG_ON(OBD_CFG_REC != 0x10620000);
+	BUILD_BUG_ON(LLOG_GEN_REC != 0x10640000);
+	BUILD_BUG_ON(CHANGELOG_REC != 0x10660000);
+	BUILD_BUG_ON(CHANGELOG_USER_REC != 0x10670000);
+	BUILD_BUG_ON(CHANGELOG_USER_REC2 != 0x10670002);
+	BUILD_BUG_ON(HSM_AGENT_REC != 0x10680000);
+	BUILD_BUG_ON(LLOG_HDR_MAGIC != 0x10645539);
+	BUILD_BUG_ON(LLOG_LOGID_MAGIC != 0x1064553b);
 
 	/* Checks for struct llog_catid */
 	LASSERTF((int)sizeof(struct llog_catid) == 32, "found %lld\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 68bb807..8f49adb 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -2480,6 +2480,7 @@ enum llog_op_type {
 	/* LLOG_JOIN_REC	= LLOG_OP_MAGIC | 0x50000, obsolete  1.8.0 */
 	CHANGELOG_REC		= LLOG_OP_MAGIC | 0x60000,
 	CHANGELOG_USER_REC	= LLOG_OP_MAGIC | 0x70000,
+	CHANGELOG_USER_REC2	= LLOG_OP_MAGIC | 0x70002,
 	HSM_AGENT_REC		= LLOG_OP_MAGIC | 0x80000,
 	LLOG_HDR_MAGIC		= LLOG_OP_MAGIC | 0x45539,
 	LLOG_LOGID_MAGIC	= LLOG_OP_MAGIC | 0x4553b,
@@ -2575,6 +2576,8 @@ struct llog_changelog_rec {
 	struct llog_rec_tail	cr_do_not_use;	/**< for_sizezof_only */
 } __attribute__((packed));
 
+#define CHANGELOG_USER_NAMELEN 16 /* base name including NUL terminator */
+
 struct llog_changelog_user_rec {
 	struct llog_rec_hdr	cur_hdr;
 	__u32			cur_id;
@@ -2583,6 +2586,21 @@ struct llog_changelog_user_rec {
 	struct llog_rec_tail	cur_tail;
 } __attribute__((packed));
 
+/* this is twice the size of CHANGELOG_USER_REC */
+struct llog_changelog_user_rec2 {
+	struct llog_rec_hdr	cur_hdr;
+	__u32			cur_id;
+	/* only for use in relative time comparisons to detect idle users */
+	__u32			cur_time;
+	__u64			cur_endrec;
+	__u32                   cur_mask;
+	__u32			cur_padding1;
+	char			cur_name[CHANGELOG_USER_NAMELEN];
+	__u64			cur_padding2;
+	__u64			cur_padding3;
+	struct llog_rec_tail	cur_tail;
+} __attribute__((packed));
+
 enum agent_req_status {
 	ARS_WAITING,
 	ARS_STARTED,
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 49b013c..0cd3500 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -1248,7 +1248,7 @@ enum changelog_rec_type {
 	CL_RESYNC	= 22, /* FLR: file was resync-ed */
 	CL_GETXATTR	= 23,
 	CL_DN_OPEN	= 24, /* denied open */
-	CL_LAST
+	CL_LAST,
 };
 
 static inline const char *changelog_type2str(int type)
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/18] lnet: Correct peer NI recovery age out calculation
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-07-19 12:31 ` [lustre-devel] [PATCH 03/18] lustre: uapi: per-user changelog names and mask James Simmons
@ 2021-07-19 12:31 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 05/18] lustre: lmv: compare space to mkdir on parent MDT James Simmons
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:31 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The calculation to age a peer NI out of recovery is only valid if
lnet_recovery_limit is non-zero. When set to zero, we allow peer NIs
to be in recovery indefinitely.
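
The corrected condition can be sketched in isolation as follows. The
function name lpni_aged_out() and its parameter types are assumptions made
for illustration; the actual check lives inline in net/lnet/lnet/peer.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the check in the patch: a recovery limit of
 * zero means "never age out", so the time comparison is skipped.
 */
static bool lpni_aged_out(int64_t now, int64_t last_alive,
			  unsigned int recovery_limit)
{
	return recovery_limit && now > last_alive + recovery_limit;
}
```

Without the first operand, a limit of zero would age out any peer NI whose
last-alive time is in the past, which is the opposite of "indefinitely".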

HPE-bug-id: LUS-9953
Fixes: b414b1afc8 ("lnet: Age peer NI out of recovery")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14654
Lustre-commit: 8f3f0e1219724d6e ("LU-14654 lnet: Correct peer NI recovery age out calculation")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43501
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 29c3372..224f4e2 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -4033,7 +4033,8 @@ int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk)
 		return;
 	}
 
-	if (now > lpni->lpni_last_alive + lnet_recovery_limit) {
+	if (lnet_recovery_limit &&
+	    now > lpni->lpni_last_alive + lnet_recovery_limit) {
 		CDEBUG(D_NET, "lpni %s aged out last alive %lld\n",
 		       libcfs_nid2str(lpni->lpni_nid),
 		       lpni->lpni_last_alive);
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/18] lustre: lmv: compare space to mkdir on parent MDT
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-07-19 12:31 ` [lustre-devel] [PATCH 04/18] lnet: Correct peer NI recovery age out calculation James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 06/18] lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64 James Simmons
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

In QOS subdirectory creation, subdirectories are kept on the parent MDT
if it is less full than average. However, the check compares weight
rather than free space, and since "weight = free space - penalty", the
result is not accurate when MDTs have different penalties, so the check
may not work as intended.

Check free space instead, and loosen the criterion to allow the
free space within the range of the QOS threshold.
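
The loosened criterion can be sketched as a standalone function. This
mirrors the arithmetic in the lmv_obd.c hunk below; the function name
stay_on_parent_mdt() is hypothetical, and the threshold is assumed to be
scaled to the range 0..256, as the division by 256 suggests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the "stay on parent MDT" test. threshold_rr is assumed to be
 * in the range 0..256; larger values are not handled here.
 */
static bool stay_on_parent_mdt(uint64_t cur_avail, uint64_t total_avail,
			       unsigned int total_usable,
			       unsigned int threshold_rr)
{
	uint64_t min_avail;

	if (total_usable == 0)
		return false;

	/* average free space, reduced by the QOS threshold fraction */
	min_avail = total_avail * (256 - threshold_rr) /
		    ((uint64_t)total_usable * 256);
	return cur_avail >= min_avail;
}
```

For example, with four usable MDTs and a combined 4000 units available, a
threshold of 0 requires at least the plain average of 1000, while a
threshold of 43 lowers the bar to 832.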

Fixes: 6a7e36a787eb ("lustre: lmv: qos stay on current MDT if less full")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14762
Lustre-commit: 002c2a80266b23c1 ("LU-14762 lmv: compare space to mkdir on parent MDT")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43997
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lu_object.h     |  3 ++-
 fs/lustre/lmv/lmv_obd.c           | 17 ++++++++++-------
 fs/lustre/obdclass/lu_tgt_descs.c | 11 ++++++-----
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/fs/lustre/include/lu_object.h b/fs/lustre/include/lu_object.h
index bbc4533..84e0489 100644
--- a/fs/lustre/include/lu_object.h
+++ b/fs/lustre/include/lu_object.h
@@ -1434,7 +1434,7 @@ struct lu_svr_qos {
 	struct obd_uuid		 lsq_uuid;	/* ptlrpc's c_remote_uuid */
 	struct list_head	 lsq_svr_list;	/* link to lq_svr_list */
 	u64			 lsq_bavail;	/* total bytes avail on svr */
-	u64			 lsq_iavail;	/* tital inode avail on svr */
+	u64			 lsq_iavail;	/* total inode avail on svr */
 	u64			 lsq_penalty;	/* current penalty */
 	u64			 lsq_penalty_per_obj; /* penalty decrease
 						       * every obj
@@ -1451,6 +1451,7 @@ struct lu_tgt_qos {
 	u64			 ltq_penalty_per_obj; /* penalty decrease
 						       * every obj
 						       */
+	u64			 ltq_avail;	/* bytes/inode avail */
 	u64			 ltq_weight;	/* net weighting */
 	time64_t		 ltq_used;	/* last used time, seconds */
 	bool			 ltq_usable:1;	/* usable for striping */
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index ac88d20..2f84028 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1430,6 +1430,7 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 {
 	struct lu_tgt_desc *tgt, *cur = NULL;
+	u64 total_avail = 0;
 	u64 total_weight = 0;
 	u64 cur_weight = 0;
 	int total_usable = 0;
@@ -1460,23 +1461,25 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 
 		tgt->ltd_qos.ltq_usable = 1;
 		lu_tgt_qos_weight_calc(tgt);
-		if (tgt->ltd_index == *mdt) {
+		if (tgt->ltd_index == *mdt)
 			cur = tgt;
-			cur_weight = tgt->ltd_qos.ltq_weight;
-		}
+		total_avail += tgt->ltd_qos.ltq_avail;
 		total_weight += tgt->ltd_qos.ltq_weight;
 		total_usable++;
 	}
 
-	/* if current MDT has higher-than-average space, stay on same MDT */
-	rand = total_weight / total_usable;
-	if (cur_weight >= rand) {
+	/* if current MDT has above-average space, within range of the QOS
+	 * threshold, stay on the same MDT to avoid creating needless remote
+	 * MDT directories.
+	 */
+	rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) /
+	       (total_usable * 256);
+	if (cur && cur->ltd_qos.ltq_avail >= rand) {
 		tgt = cur;
 		rc = 0;
 		goto unlock;
 	}
 
-	cur_weight = 0;
 	rand = lu_prandom_u64_max(total_weight);
 
 	lmv_foreach_connected_tgt(lmv, tgt) {
diff --git a/fs/lustre/obdclass/lu_tgt_descs.c b/fs/lustre/obdclass/lu_tgt_descs.c
index 2a2b30a..935cff6 100644
--- a/fs/lustre/obdclass/lu_tgt_descs.c
+++ b/fs/lustre/obdclass/lu_tgt_descs.c
@@ -220,14 +220,15 @@ static inline u64 tgt_statfs_iavail(struct lu_tgt_desc *tgt)
 void lu_tgt_qos_weight_calc(struct lu_tgt_desc *tgt)
 {
 	struct lu_tgt_qos *ltq = &tgt->ltd_qos;
-	u64 temp, temp2;
+	u64 penalty;
 
-	temp = (tgt_statfs_bavail(tgt) >> 16) * (tgt_statfs_iavail(tgt) >> 8);
-	temp2 = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty;
-	if (temp < temp2)
+	ltq->ltq_avail = (tgt_statfs_bavail(tgt) >> 16) *
+			 (tgt_statfs_iavail(tgt) >> 8);
+	penalty = ltq->ltq_penalty + ltq->ltq_svr->lsq_penalty;
+	if (ltq->ltq_avail < penalty)
 		ltq->ltq_weight = 0;
 	else
-		ltq->ltq_weight = temp - temp2;
+		ltq->ltq_weight = ltq->ltq_avail - penalty;
 }
 EXPORT_SYMBOL(lu_tgt_qos_weight_calc);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/18] lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 05/18] lustre: lmv: compare space to mkdir on parent MDT James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 07/18] lnet: libcfs: Add checksum speed under /sys/fs James Simmons
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Dominique Martinet, Lustre Development List

From: Dominique Martinet <asmadeus@codewreck.org>

Fix the following warning on new gcc with -Wextra when including
lustre_idl.h in an external project:

.../include/linux/lnet/lnet-types.h: In function LNetMDHandleIsInvalid:
.../include/linux/lnet/lnet-types.h:355:46:
   error: comparison of integer expressions of different signedness:
   int and __u64 {aka long long unsigned int} [-Werror=sign-compare]
        return (LNET_WIRE_HANDLE_COOKIE_NONE == h.cookie);
                                                 ^~
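
A small userspace sketch of the difference, assuming nothing beyond
standard C. The macro names COOKIE_NONE_OLD and COOKIE_NONE_NEW and the
struct handle type are illustrative stand-ins, not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define COOKIE_NONE_OLD	(-1)	/* int: mixed-signedness comparison */
#define COOKIE_NONE_NEW	(~0ULL)	/* unsigned long long: no warning */

/* stand-in for struct lnet_handle_md */
struct handle {
	uint64_t cookie;
};

static bool handle_is_invalid(struct handle h)
{
	/* both operands are unsigned 64-bit, so -Wsign-compare is quiet */
	return COOKIE_NONE_NEW == h.cookie;
}
```

Both macros convert to the same all-ones 64-bit value, so the change is
expected to silence the warning without altering the comparison result.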

WC-bug-id: https://jira.whamcloud.com/browse/LU-14093
Lustre-commit: 27214876fcdfbda0 ("LU-14093 lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/43713
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/lnet-types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/lnet-types.h b/include/uapi/linux/lnet/lnet-types.h
index 43800ae..0c426ac 100644
--- a/include/uapi/linux/lnet/lnet-types.h
+++ b/include/uapi/linux/lnet/lnet-types.h
@@ -134,7 +134,7 @@ struct lnet_counters {
  * 'me' for match entry). Each type of object is given a unique handle type
  * to enhance type checking.
  */
-#define LNET_WIRE_HANDLE_COOKIE_NONE   (-1)
+#define LNET_WIRE_HANDLE_COOKIE_NONE   (~0ULL)
 
 struct lnet_handle_md {
 	__u64	cookie;
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/18] lnet: libcfs: Add checksum speed under /sys/fs
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 06/18] lnet: annotate LNET_WIRE_HANDLE_COOKIE_NONE as u64 James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 08/18] lnet: use ni fatal error when calculating net health James Simmons
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Arshad Hussain, Lustre Development List

From: Arshad Hussain <arshad.hussain@aeoncomputing.com>

This patch adds the total of registered checksums and all
registered checksum names, along with their speeds, under
/sys/kernel/debug/lustre/checksum_speed

Test case sanity/77m added.

Sample output:
$ lctl get_param checksum_speed
checksum_speed=adler32: 1955
crc32: 2423
crc32c: 14035
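
A consumer of this file could split each "name: speed" line as in the
sketch below. The parser parse_checksum_speed() is hypothetical and only
assumes the line layout shown in the sample output; the "checksum_speed="
prefix that lctl prints on the first line would have to be stripped by the
caller.

```c
#include <stdio.h>

/* Hypothetical parser for one "name: speed" line of the sample output.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_checksum_speed(const char *line, char name[32], int *speed)
{
	return sscanf(line, "%31[^:]: %d", name, speed) == 2 ? 0 : -1;
}
```

The speed is in MByte per second, per the comment on
cfs_crypto_hash_speeds in this patch.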

WC-bug-id: https://jira.whamcloud.com/browse/LU-11698
Lustre-commit: d775f9ae37975c85 ("LU-11698 libcfs: Add checksum speed under /sys/fs")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/43943
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/obd_sysfs.c       | 61 ++++++++++++++++++++++++++++++++++++
 include/linux/libcfs/libcfs_crypto.h |  3 ++
 net/lnet/libcfs/linux-crypto.c       |  3 +-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/obdclass/obd_sysfs.c b/fs/lustre/obdclass/obd_sysfs.c
index 43bbbe9..93d2abc 100644
--- a/fs/lustre/obdclass/obd_sysfs.c
+++ b/fs/lustre/obdclass/obd_sysfs.c
@@ -59,6 +59,7 @@
 #include <linux/seq_file.h>
 #include <linux/kobject.h>
 
+#include <linux/libcfs/libcfs_crypto.h>
 #include <uapi/linux/lnet/lnetctl.h>
 #include <obd_support.h>
 #include <obd_class.h>
@@ -420,6 +421,63 @@ static int obd_device_list_open(struct inode *inode, struct file *file)
 	.release = seq_release,
 };
 
+/* checksum_speed */
+static void *checksum_speed_start(struct seq_file *p, loff_t *pos)
+{
+	return pos;
+}
+
+static void checksum_speed_stop(struct seq_file *p, void *v)
+{
+}
+
+static void *checksum_speed_next(struct seq_file *p, void *v, loff_t *pos)
+{
+	++(*pos);
+	if (*pos >= CFS_HASH_ALG_SPEED_MAX - 1)
+		return NULL;
+
+	return pos;
+}
+
+static int checksum_speed_show(struct seq_file *p, void *v)
+{
+	loff_t index = *(loff_t *)v;
+
+	if (!index || index > CFS_HASH_ALG_SPEED_MAX - 1)
+		return 0;
+
+	seq_printf(p, "%s: %d\n", cfs_crypto_hash_name(index),
+		   cfs_crypto_hash_speeds[index]);
+
+	return 0;
+}
+
+static const struct seq_operations checksum_speed_sops = {
+	.start		= checksum_speed_start,
+	.stop		= checksum_speed_stop,
+	.next		= checksum_speed_next,
+	.show		= checksum_speed_show,
+};
+
+static int checksum_speed_open(struct inode *inode, struct file *file)
+{
+	int rc = seq_open(file, &checksum_speed_sops);
+
+	if (rc)
+		return rc;
+
+	return 0;
+}
+
+static const struct file_operations checksum_speed_fops = {
+	.owner		= THIS_MODULE,
+	.open		= checksum_speed_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+
 static int
 health_check_seq_show(struct seq_file *m, void *unused)
 {
@@ -507,6 +565,9 @@ int class_procfs_init(void)
 
 	debugfs_create_file("health_check", 0444, debugfs_lustre_root,
 			    NULL, &health_check_fops);
+
+	debugfs_create_file("checksum_speed", 0444, debugfs_lustre_root,
+			    NULL, &checksum_speed_fops);
 out:
 	return rc;
 }
diff --git a/include/linux/libcfs/libcfs_crypto.h b/include/linux/libcfs/libcfs_crypto.h
index ef099e9..fc60220 100644
--- a/include/linux/libcfs/libcfs_crypto.h
+++ b/include/linux/libcfs/libcfs_crypto.h
@@ -135,6 +135,9 @@ enum cfs_crypto_hash_alg {
 	return NULL;
 }
 
+/*  Array of hash algorithm speed in MByte per second */
+extern int cfs_crypto_hash_speeds[CFS_HASH_ALG_MAX];
+
 /**
  * Return hash name for hash algorithm identifier
  *
diff --git a/net/lnet/libcfs/linux-crypto.c b/net/lnet/libcfs/linux-crypto.c
index aeaa623..7b4338a 100644
--- a/net/lnet/libcfs/linux-crypto.c
+++ b/net/lnet/libcfs/linux-crypto.c
@@ -39,7 +39,8 @@
 /**
  *  Array of hash algorithm speed in MByte per second
  */
-static int cfs_crypto_hash_speeds[CFS_HASH_ALG_MAX];
+int cfs_crypto_hash_speeds[CFS_HASH_ALG_MAX];
+EXPORT_SYMBOL(cfs_crypto_hash_speeds);
 
 /**
  * Initialize the state descriptor for the specified hash algorithm.
-- 
1.8.3.1


* [lustre-devel] [PATCH 08/18] lnet: use ni fatal error when calculating net health
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 07/18] lnet: libcfs: Add checksum speed under /sys/fs James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 09/18] lustre: quota: add get/set project support for non-dir/file James Simmons
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

When an ni is flagged with "fatal_error" by the LND, its health score
remains unaffected. This allows the net containing such an ni to be
selected for tx even if it is the only ni in this net.
Take the "fatal_error" status of the ni into account when calculating
the net health score.
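
The effect on the net health score can be sketched with plain arrays in
place of the kernel list and atomics. The struct ni_sketch type and
net_best_healthv() are illustrative names, not the LNet API.

```c
#include <stddef.h>

/* Simplified stand-in for struct lnet_ni */
struct ni_sketch {
	int healthv;	/* health score of this NI */
	int fatal;	/* non-zero when the LND flagged a fatal error */
};

/* Best health score among the NIs that are not in fatal error */
static int net_best_healthv(const struct ni_sketch *nis, size_t count)
{
	int best = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (!nis[i].fatal && nis[i].healthv > best)
			best = nis[i].healthv;
	}
	return best;
}
```

A net whose only NI is in fatal error now reports a health of 0, so it is
less likely to be preferred over a net with a working NI.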

WC-bug-id: https://jira.whamcloud.com/browse/LU-14750
Lustre-commit: 86a69f9eb5cab3f9 ("LU-14750 lnet: use ni fatal error when calculating net health")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43962
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/api-ni.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 687df3b..dc9020d 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3103,11 +3103,12 @@ int lnet_get_net_healthv_locked(struct lnet_net *net)
 {
 	struct lnet_ni *ni;
 	int best_healthv = 0;
-	int healthv;
+	int healthv, ni_fatal;
 
 	list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
 		healthv = atomic_read(&ni->ni_healthv);
-		if (healthv > best_healthv)
+		ni_fatal = atomic_read(&ni->ni_fatal_error_on);
+		if (!ni_fatal && healthv > best_healthv)
 			best_healthv = healthv;
 	}
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/18] lustre: quota: add get/set project support for non-dir/file
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 08/18] lnet: use ni fatal error when calculating net health James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 10/18] lustre: readahead: fix to reserve min pages James Simmons
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

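Add ability to get/set non-dir/file's project ID and state.
The new LL_IOC_PROJECT ioctl is issued on an open file or directory;
when lu_project.project_name is non-empty, the named child of that
dentry is looked up and used as the target.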
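
As a rough user-space illustration of how such a request might be
prepared, the sketch below mirrors the lu_project layout from this
patch in a local struct. The names lu_project_sketch, PROJ_* and
project_req_init() are hypothetical, and returning -EINVAL for an
over-long name is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* fallback when <limits.h> does not expose it */
#endif

/* Local mirror of the uapi definitions added by this patch. */
enum lu_project_type_sketch { PROJ_NONE = 0, PROJ_SET, PROJ_GET, PROJ_MAX };

struct lu_project_sketch {
	uint32_t project_type;	/* PROJ_SET or PROJ_GET */
	uint32_t project_id;
	uint32_t project_xflags;
	uint32_t project_reserved;
	char	 project_name[NAME_MAX + 1];
};

/* Hypothetical helper: fill a request, rejecting over-long child names. */
static int project_req_init(struct lu_project_sketch *req, uint32_t type,
			    const char *child, uint32_t projid,
			    uint32_t xflags)
{
	assert(req);
	if (type != PROJ_SET && type != PROJ_GET)
		return -EINVAL;
	if (child && strlen(child) > NAME_MAX)
		return -EINVAL;

	memset(req, 0, sizeof(*req));
	req->project_type = type;
	req->project_id = projid;
	req->project_xflags = xflags;
	if (child)
		strcpy(req->project_name, child); /* length checked above */
	return 0;
}
```

A caller would then pass the filled struct to ioctl() on a descriptor
for the parent directory; that step is omitted here because it needs
the real Lustre uapi header.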

WC-bug-id: https://jira.whamcloud.com/browse/LU-11872
Lustre-commit: b31792b0e72425c8 ("LU-11872 quota: add get/set project support for non-dir/file")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/44006
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/dir.c                   |   2 +
 fs/lustre/llite/file.c                  | 111 ++++++++++++++++++++++++++------
 fs/lustre/llite/llite_internal.h        |   5 +-
 fs/lustre/llite/llite_lib.c             |   3 +-
 include/uapi/linux/lustre/lustre_user.h |  16 +++++
 5 files changed, 115 insertions(+), 22 deletions(-)

diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index fa8e697..9666534 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -2098,6 +2098,8 @@ static long ll_dir_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		return ll_ioctl_fsgetxattr(inode, cmd, arg);
 	case FS_IOC_FSSETXATTR:
 		return ll_ioctl_fssetxattr(inode, cmd, arg);
+	case LL_IOC_PROJECT:
+		return ll_ioctl_project(file, cmd, arg);
 	case LL_IOC_PCC_DETACH_BY_FID: {
 		struct lu_pcc_detach_fid *detach;
 		struct lu_fid *fid;
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 54e343f..1ef5fd8 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3314,7 +3314,8 @@ int ll_ioctl_fsgetxattr(struct inode *inode, unsigned int cmd,
 	return 0;
 }
 
-int ll_ioctl_check_project(struct inode *inode, struct fsxattr *fa)
+int ll_ioctl_check_project(struct inode *inode, u32 xflags,
+			   u32 projid)
 {
 	/*
 	 * Project Quota ID state is only allowed to change from within the init
@@ -3324,36 +3325,29 @@ int ll_ioctl_check_project(struct inode *inode, struct fsxattr *fa)
 	if (current_user_ns() == &init_user_ns)
 		return 0;
 
-	if (ll_i2info(inode)->lli_projid != fa->fsx_projid)
+	if (ll_i2info(inode)->lli_projid != projid)
 		return -EINVAL;
 
 	if (test_bit(LLIF_PROJECT_INHERIT, &ll_i2info(inode)->lli_flags)) {
-		if (!(fa->fsx_xflags & FS_XFLAG_PROJINHERIT))
+		if (!(xflags & FS_XFLAG_PROJINHERIT))
 			return -EINVAL;
 	} else {
-		if (fa->fsx_xflags & FS_XFLAG_PROJINHERIT)
+		if (xflags & FS_XFLAG_PROJINHERIT)
 			return -EINVAL;
 	}
 
 	return 0;
 }
 
-int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
-			unsigned long arg)
+static int ll_set_project(struct inode *inode, u32 xflags, u32 projid)
 {
 	struct ptlrpc_request *req = NULL;
 	struct md_op_data *op_data;
-	struct fsxattr fsxattr;
 	struct cl_object *obj;
 	unsigned int inode_flags;
 	int rc = 0;
 
-	if (copy_from_user(&fsxattr,
-			   (const struct fsxattr __user *)arg,
-			   sizeof(fsxattr)))
-		return -EFAULT;
-
-	rc = ll_ioctl_check_project(inode, &fsxattr);
+	rc = ll_ioctl_check_project(inode, xflags, projid);
 	if (rc)
 		return rc;
 
@@ -3362,11 +3356,11 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
-	inode_flags = ll_xflags_to_inode_flags(fsxattr.fsx_xflags);
+	inode_flags = ll_xflags_to_inode_flags(xflags);
 	op_data->op_attr_flags = ll_inode_to_ext_flags(inode_flags);
-	if (fsxattr.fsx_xflags & FS_XFLAG_PROJINHERIT)
+	if (xflags & FS_XFLAG_PROJINHERIT)
 		op_data->op_attr_flags |= LUSTRE_PROJINHERIT_FL;
-	op_data->op_projid = fsxattr.fsx_projid;
+	op_data->op_projid = projid;
 	op_data->op_xvalid |= OP_XVALID_PROJID;
 	rc = md_setattr(ll_i2sbi(inode)->ll_md_exp, op_data, NULL,
 			0, &req);
@@ -3377,16 +3371,14 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 	ll_update_inode_flags(inode, op_data->op_attr_flags);
 
 	/* Avoid OST RPC if this is only ioctl setting project inherit flag */
-	if (fsxattr.fsx_xflags == 0 ||
-	    fsxattr.fsx_xflags == FS_XFLAG_PROJINHERIT)
+	if (xflags == 0 || xflags == FS_XFLAG_PROJINHERIT)
 		goto out_fsxattr;
 
 	obj = ll_i2info(inode)->lli_clob;
 	if (obj) {
 		struct iattr attr = { 0 };
 
-		rc = cl_setattr_ost(obj, &attr, OP_XVALID_FLAGS,
-				    fsxattr.fsx_xflags);
+		rc = cl_setattr_ost(obj, &attr, OP_XVALID_FLAGS, xflags);
 	}
 
 out_fsxattr:
@@ -3395,6 +3387,83 @@ int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 	return rc;
 }
 
+int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
+			unsigned long arg)
+{
+	struct fsxattr fsxattr;
+
+	if (copy_from_user(&fsxattr,
+			   (const struct fsxattr __user *)arg,
+			   sizeof(fsxattr)))
+		return -EFAULT;
+
+	return ll_set_project(inode, fsxattr.fsx_xflags,
+			      fsxattr.fsx_projid);
+}
+
+int ll_ioctl_project(struct file *file, unsigned int cmd,
+		     unsigned long arg)
+{
+	struct lu_project lu_project;
+	struct dentry *dentry = file_dentry(file);
+	struct inode *inode = file_inode(file);
+	struct dentry *child_dentry = NULL;
+	int rc = 0, name_len;
+
+	if (copy_from_user(&lu_project,
+			   (const struct lu_project __user *)arg,
+			   sizeof(lu_project)))
+		return -EFAULT;
+
+	/* apply child dentry if name is valid */
+	name_len = strnlen(lu_project.project_name, NAME_MAX);
+	if (name_len > 0 && name_len <= NAME_MAX) {
+		inode_lock(inode);
+		child_dentry = lookup_one_len(lu_project.project_name,
+					      dentry, name_len);
+		inode_unlock(inode);
+		if (IS_ERR(child_dentry)) {
+			rc = PTR_ERR(child_dentry);
+			goto out;
+		}
+		inode = child_dentry->d_inode;
+		if (!inode) {
+			rc = -ENOENT;
+			goto out;
+		}
+	} else if (name_len > NAME_MAX) {
+		rc = -EINVAL;
+		goto out;
+	}
+
+	switch (lu_project.project_type) {
+	case LU_PROJECT_SET:
+		rc = ll_set_project(inode, lu_project.project_xflags,
+				    lu_project.project_id);
+		break;
+	case LU_PROJECT_GET:
+		lu_project.project_xflags =
+				ll_inode_flags_to_xflags(inode->i_flags);
+		if (test_bit(LLIF_PROJECT_INHERIT,
+			     &ll_i2info(inode)->lli_flags))
+			lu_project.project_xflags |= FS_XFLAG_PROJINHERIT;
+		lu_project.project_id = ll_i2info(inode)->lli_projid;
+		if (copy_to_user((struct lu_project __user *)arg,
+				 &lu_project, sizeof(lu_project))) {
+			rc = -EFAULT;
+			goto out;
+		}
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
+out:
+	if (!IS_ERR_OR_NULL(child_dentry))
+		dput(child_dentry);
+	return rc;
+}
+
 static long ll_file_unlock_lease(struct file *file, struct ll_ioc_lease *ioc,
 				 unsigned long arg)
 {
@@ -4063,6 +4132,8 @@ static int ll_heat_set(struct inode *inode, enum lu_heat_flag flags)
 		return ll_ioctl_fsgetxattr(inode, cmd, arg);
 	case FS_IOC_FSSETXATTR:
 		return ll_ioctl_fssetxattr(inode, cmd, arg);
+	case LL_IOC_PROJECT:
+		return ll_ioctl_project(file, cmd, arg);
 	case BLKSSZGET:
 		return put_user(PAGE_SIZE, (int __user *)arg);
 	case LL_IOC_HEAT_GET: {
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 1d5255e..6cae741 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1157,11 +1157,14 @@ int ll_migrate(struct inode *parent, struct file *file,
 int ll_get_fid_by_name(struct inode *parent, const char *name,
 		       int namelen, struct lu_fid *fid, struct inode **inode);
 int ll_inode_permission(struct inode *inode, int mask);
-int ll_ioctl_check_project(struct inode *inode, struct fsxattr *fa);
+int ll_ioctl_check_project(struct inode *inode, u32 xflags, u32 projid);
 int ll_ioctl_fsgetxattr(struct inode *inode, unsigned int cmd,
 			unsigned long arg);
 int ll_ioctl_fssetxattr(struct inode *inode, unsigned int cmd,
 			unsigned long arg);
+int ll_ioctl_project(struct file *file, unsigned int cmd,
+		     unsigned long arg);
+
 int ll_lov_setstripe_ea_info(struct inode *inode, struct dentry *dentry,
 			     u64 flags, struct lov_user_md *lum,
 			     int lum_size);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 153d34e..10a9a95 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2615,7 +2615,8 @@ int ll_iocontrol(struct inode *inode, struct file *file,
 		if (flags & LUSTRE_PROJINHERIT_FL)
 			fa.fsx_xflags = FS_XFLAG_PROJINHERIT;
 
-		rc = ll_ioctl_check_project(inode, &fa);
+		rc = ll_ioctl_check_project(inode, fa.fsx_xflags,
+					    fa.fsx_projid);
 		if (rc)
 			return rc;
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 0cd3500..da15ca8 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -375,6 +375,7 @@ struct ll_ioc_lease_id {
 #define LL_IOC_PCC_DETACH		_IOW('f', 252, struct lu_pcc_detach)
 #define LL_IOC_PCC_DETACH_BY_FID	_IOW('f', 252, struct lu_pcc_detach_fid)
 #define LL_IOC_PCC_STATE		_IOR('f', 252, struct lu_pcc_state)
+#define LL_IOC_PROJECT			_IOW('f', 253, struct lu_project)
 
 #define LL_STATFS_LMV		1
 #define LL_STATFS_LOV		2
@@ -2311,6 +2312,21 @@ struct lu_pcc_state {
 	char	pccs_path[PATH_MAX];
 };
 
+enum lu_project_type {
+	LU_PROJECT_NONE = 0,
+	LU_PROJECT_SET,
+	LU_PROJECT_GET,
+	LU_PROJECT_MAX
+};
+
+struct lu_project {
+	__u32	project_type; /* enum lu_project_type */
+	__u32	project_id;
+	__u32	project_xflags;
+	__u32	project_reserved;
+	char	project_name[NAME_MAX + 1];
+};
+
 struct fid_array {
 	__u32 fa_nr;
 	/* make header's size equal lu_fid */
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/18] lustre: readahead: fix to reserve min pages
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 09/18] lustre: quota: add get/set project support for non-dir/file James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 11/18] lnet: RMDA infrastructure updates James Simmons
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

@pages_min might be larger than @pages, which indicates that
more pages should be read; left uncorrected, this causes a
warning later.
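
The two adjustments can be sketched in user space as below. The
function names are hypothetical and assert() stands in for the
kernel's WARN_ON_ONCE(); the quarter-of-LRU cap follows the
ccc_lru_max >> 2 expression in the diff.

```c
#include <assert.h>

/* Caller side: raise the request so it covers the guaranteed minimum. */
static unsigned long ra_request_pages(unsigned long pages,
				      unsigned long pages_min)
{
	return pages_min > pages ? pages_min : pages;
}

/*
 * Reservation side: cap the request at a quarter of the LRU budget,
 * then clamp the minimum to whatever request is left.
 */
static unsigned long ra_clamp_min(unsigned long pages,
				  unsigned long pages_min,
				  unsigned long lru_max)
{
	unsigned long cap = lru_max >> 2;

	assert(pages_min <= pages);	/* mirrors the entry-time warning */
	if (pages > cap)
		pages = cap;
	if (pages_min > pages)
		pages_min = pages;
	return pages_min;
}
```

Because the caller raises @pages first, the entry check holds, and the
later clamp only fires when the LRU cap shrinks the request.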

WC-bug-id: https://jira.whamcloud.com/browse/LU-14778
Lustre-commit: 4fc127428f00d6a3 ("LU-14778 readahead: fix to reserve min pages")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/44050
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 184e5e8..4de77f6 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -85,8 +85,9 @@ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi,
 	struct ll_ra_info *ra = &sbi->ll_ra_info;
 	long ret;
 
+	WARN_ON_ONCE(pages_min > pages);
 	/**
-	 * Don't try readahead agreesively if we are limited
+	 * Don't try readahead aggresively if we are limited
 	 * LRU pages, otherwise, it could cause deadlock.
 	 */
 	pages = min(sbi->ll_cache->ccc_lru_max >> 2, pages);
@@ -95,7 +96,7 @@ static unsigned long ll_ra_count_get(struct ll_sb_info *sbi,
 	 * this will make us leak @ra_cur_pages, because
 	 * ll_ra_count_put() acutally freed @pages.
 	 */
-	if (WARN_ON_ONCE(pages_min > pages))
+	if (unlikely(pages_min > pages))
 		pages_min = pages;
 
 	/*
@@ -829,7 +830,8 @@ static int ll_readahead(const struct lu_env *env, struct cl_io *io,
 	/* don't over reserved for mmap range read */
 	if (skip_index)
 		pages_min = 0;
-
+	if (pages_min > pages)
+		pages = pages_min;
 	ria->ria_reserved = ll_ra_count_get(ll_i2sbi(inode), ria, pages,
 					    pages_min);
 	if (ria->ria_reserved < pages)
-- 
1.8.3.1

* [lustre-devel] [PATCH 11/18] lnet: RMDA infrastructure updates
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 10/18] lustre: readahead: fix to reserve min pages James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 12/18] lnet: o2iblnd: Move racy NULL assignment James Simmons
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Add infrastructure to force RDMA for payloads < 4K.
Add infrastructure to extract the first page in a
payload, which is useful for determining the type of the
payload to be transmitted.
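
A minimal sketch of the resulting send decision, assuming a 4K
immediate-message limit as stated above. MSG_SIZE_SKETCH, enum tx_mode
and choose_tx_mode() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSG_SIZE_SKETCH 4096	/* assumed stand-in for IBLND_MSG_SIZE */

enum tx_mode { TX_IMMEDIATE, TX_RDMA };

/* Small messages go IMMEDIATE unless the sender forces RDMA. */
static enum tx_mode choose_tx_mode(size_t hdr_nob, size_t payload_nob,
				   bool rdma_force)
{
	size_t nob = hdr_nob + payload_nob;

	assert(nob >= hdr_nob);	/* the sum must not wrap around */
	if (nob <= MSG_SIZE_SKETCH && !rdma_force)
		return TX_IMMEDIATE;
	return TX_RDMA;
}
```

Only the small-message case changes: a message larger than the limit
uses RDMA whether or not the flag is set.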

WC-bug-id: https://jira.whamcloud.com/browse/LU-14798
Lustre-commit: 7ac839837c1c6cd1f ("LU-14798 lnet: RMDA infrastructure updates")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Lustre-change: https://review.whamcloud.com/37453
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44109
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h       |  1 +
 include/linux/lnet/lib-types.h      |  2 ++
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c |  4 ++--
 net/lnet/lnet/lib-md.c              | 29 +++++++++++++++++++++++------
 4 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 6b9e926..f56ecab 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -711,6 +711,7 @@ void lnet_copy_kiov2iter(struct iov_iter *to,
 void lnet_md_unlink(struct lnet_libmd *md);
 void lnet_md_deconstruct(struct lnet_libmd *lmd, struct lnet_event *ev);
 struct page *lnet_kvaddr_to_page(unsigned long vaddr);
+struct page *lnet_get_first_page(struct lnet_libmd *md, unsigned int offset);
 int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset);
 
 unsigned int lnet_get_lnd_timeout(void);
diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 64d7472..e951e02 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -122,6 +122,8 @@ struct lnet_msg {
 	enum lnet_msg_hstatus	msg_health_status;
 	/* This is a recovery message */
 	bool			msg_recovery;
+	/* force an RDMA even if the message size is < 4K */
+	bool			msg_rdma_force;
 	/* the number of times a transmission has been retried */
 	int			msg_retry_count;
 	/* flag to indicate that we do not want to resend this message */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index ec0d05a..c66acc51 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1553,7 +1553,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 		/* is the REPLY message too small for RDMA? */
 		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[lntmsg->msg_md->md_length]);
-		if (nob <= IBLND_MSG_SIZE)
+		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
 			break;		/* send IMMEDIATE */
 
 		tx = kiblnd_get_idle_tx(ni, target.nid);
@@ -1599,7 +1599,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	case LNET_MSG_PUT:
 		/* Is the payload small enough not to need RDMA? */
 		nob = offsetof(struct kib_msg, ibm_u.immediate.ibim_payload[payload_nob]);
-		if (nob <= IBLND_MSG_SIZE)
+		if (nob <= IBLND_MSG_SIZE && !lntmsg->msg_rdma_force)
 			break;			/* send IMMEDIATE */
 
 		tx = kiblnd_get_idle_tx(ni, target.nid);
diff --git a/net/lnet/lnet/lib-md.c b/net/lnet/lnet/lib-md.c
index fbee4e0..affa921 100644
--- a/net/lnet/lnet/lib-md.c
+++ b/net/lnet/lnet/lib-md.c
@@ -87,9 +87,9 @@ struct page *lnet_kvaddr_to_page(unsigned long vaddr)
 }
 EXPORT_SYMBOL(lnet_kvaddr_to_page);
 
-int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
+struct page *
+lnet_get_first_page(struct lnet_libmd *md, unsigned int offset)
 {
-	int cpt = CFS_CPT_ANY;
 	unsigned int niov;
 	struct bio_vec *kiov;
 
@@ -102,7 +102,7 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 		md = lnet_handle2md(&md->md_bulk_handle);
 
 	if (!md || md->md_niov == 0)
-		return CFS_CPT_ANY;
+		return NULL;
 
 	kiov = md->md_kiov;
 	niov = md->md_niov;
@@ -113,12 +113,29 @@ int lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
 		kiov++;
 		if (niov == 0) {
 			CERROR("offset %d goes beyond iov\n", offset);
-			goto out;
+			return NULL;
 		}
 	}
 
-	cpt = cfs_cpt_of_node(lnet_cpt_table(),
-			      page_to_nid(kiov->bv_page));
+	return kiov->bv_page;
+}
+
+int
+lnet_cpt_of_md(struct lnet_libmd *md, unsigned int offset)
+{
+	struct page *page;
+	int cpt = CFS_CPT_ANY;
+
+	page = lnet_get_first_page(md, offset);
+	if (!page) {
+		CDEBUG(D_NET,
+		       "Couldn't resolve first page of md %p with offset %u\n",
+		       md, offset);
+		goto out;
+	}
+
+	cpt = cfs_cpt_of_node(lnet_cpt_table(), page_to_nid(page));
+
 out:
 	return cpt;
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/18] lnet: o2iblnd: Move racy NULL assignment
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 11/18] lnet: RMDA infrastructure updates James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 13/18] lnet: o2iblnd: Avoid double posting invalidate James Simmons
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mike Marciniszyn, Lustre Development List

From: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

kiblnd_fmr_pool_unmap() can race map and subsequent processing
because of this flaw in unmap:

if (frd) {
        frd->frd_valid = false;
        spin_lock(&fps->fps_lock);
        list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
        spin_unlock(&fps->fps_lock);
        fmr->fmr_frd = NULL;
}

The fmr can be pulled off the list in kiblnd_fmr_pool_unmap() on
another CPU, and fmr_frd could be in a state of flux and
potentially be seen incorrectly later on as the kib_tx is processed.

Fix by moving the fmr_frd assignment to before the fmr is added to the
list.
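
The corrected ordering can be sketched in user space like this. It is
an illustration only: the *_sketch types are hypothetical, a pthread
mutex stands in for the spinlock, and a simple push replaces
list_add_tail().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct frd_sketch {
	bool valid;
	struct frd_sketch *next;
};

struct fmr_sketch {
	struct frd_sketch *frd;
};

struct pool_sketch {
	pthread_mutex_t lock;
	struct frd_sketch *free_list;
};

/* Detach the descriptor from the fmr before publishing it to the pool. */
static void pool_unmap(struct pool_sketch *pool, struct fmr_sketch *fmr)
{
	struct frd_sketch *frd;

	assert(pool && fmr);
	frd = fmr->frd;
	if (!frd)
		return;

	frd->valid = false;
	fmr->frd = NULL;	/* done first: no write to fmr after publish */

	pthread_mutex_lock(&pool->lock);
	frd->next = pool->free_list;
	pool->free_list = frd;
	pthread_mutex_unlock(&pool->lock);
}
```

Once the descriptor is visible on the free list, this function no
longer touches the fmr, which is the property the patch restores.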

WC-bug-id: https://jira.whamcloud.com/browse/LU-14733
Lustre-commit: 023113fb8946f356 ("LU-14733 o2iblnd: Move racy NULL assignment")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Reviewed-on: https://review.whamcloud.com/44189
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index d722e6c..81d9e4d 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1539,10 +1539,10 @@ void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status)
 	fps = fpo->fpo_owner;
 	if (frd) {
 		frd->frd_valid = false;
+		fmr->fmr_frd = NULL;
 		spin_lock(&fps->fps_lock);
 		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
 		spin_unlock(&fps->fps_lock);
-		fmr->fmr_frd = NULL;
 	}
 	fmr->fmr_pool = NULL;
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/18] lnet: o2iblnd: Avoid double posting invalidate
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 12/18] lnet: o2iblnd: Move racy NULL assignment James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 14/18] lustre: quota: nodemap squashed root cannot bypass quota James Simmons
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mike Marciniszyn, Lustre Development List

From: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>

When the kib_tx is provisioned during kiblnd_fmr_pool_map(), spare
WRs in the kib_fast_reg_descriptor are set up and the mapping of
pages is given to the mr.

kiblnd_post_tx_locked() then posts the spare WRs from the
kib_fast_reg_descriptor.

if (rc == 0)
        return 0;

The code returns and the kib_fast_reg_descriptor still contains
the spare WRs. The next time the kib_tx is used, the
now obsolete WRs will be inadvertently posted. For rdmavt, the
obsolete invalidate will cause an -EINVAL to be returned from
the post send.

Fix by adding a state variable frd_posted to the kib_fast_reg_descriptor.
The variable is set to false in kiblnd_fmr_pool_unmap().
kiblnd_post_tx_locked() is adjusted to avoid prepending the
kib_fast_reg_descriptor WRs when frd_posted is true. After
the post succeeds, frd_posted is set to true.
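
The life cycle of the new flag can be sketched as a small state
machine. struct frd_state and the three helpers are hypothetical names
for illustration; they model only the flag handling, not the actual
work-request chaining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frd_state {
	bool valid;
	bool posted;
};

/* Map/unmap path: the descriptor's spare WRs are not yet posted. */
static void frd_reset(struct frd_state *frd)
{
	assert(frd);
	frd->valid = false;
	frd->posted = false;
}

/* Post path: prepend the spare WRs only if they were never posted. */
static bool should_prepend_spare_wrs(const struct frd_state *frd)
{
	return frd && !frd->posted;
}

/* After the post: remember success so a reuse skips the spare WRs. */
static void post_done(struct frd_state *frd, int rc)
{
	if (rc == 0 && frd)
		frd->posted = true;
}
```

A failed post leaves the flag clear, so a retry still prepends the
spare WRs.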

WC-bug-id: https://jira.whamcloud.com/browse/LU-14733
Lustre-commit: 5930576791e864529 ("LU-14733 o2iblnd: Avoid double posting invalidate")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Reviewed-on: https://review.whamcloud.com/44190
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  2 ++
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  1 +
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 12 +++++++-----
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 81d9e4d..b519a31 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1539,6 +1539,7 @@ void kiblnd_fmr_pool_unmap(struct kib_fmr *fmr, int status)
 	fps = fpo->fpo_owner;
 	if (frd) {
 		frd->frd_valid = false;
+		frd->frd_posted = false;
 		fmr->fmr_frd = NULL;
 		spin_lock(&fps->fps_lock);
 		list_add_tail(&frd->frd_list, &fpo->fast_reg.fpo_pool_list);
@@ -1634,6 +1635,7 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx,
 			fmr->fmr_key = is_rx ? mr->rkey : mr->lkey;
 			fmr->fmr_frd = frd;
 			fmr->fmr_pool = fpo;
+			frd->frd_posted = false;
 			return 0;
 		}
 
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 824b204..8d1d7eb 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -272,6 +272,7 @@ struct kib_fast_reg_descriptor { /* For fast registration */
 	struct ib_reg_wr	frd_fastreg_wr;
 	struct ib_mr	       *frd_mr;
 	bool			frd_valid;
+	bool			frd_posted;
 };
 
 struct kib_fmr_pool {
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index c66acc51..32ccac2 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -721,6 +721,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	struct kib_msg *msg = tx->tx_msg;
 	struct kib_peer_ni *peer_ni = conn->ibc_peer;
 	struct lnet_ni *ni = peer_ni->ibp_ni;
+	struct kib_fast_reg_descriptor *frd = tx->tx_fmr.fmr_frd;
 	int ver = conn->ibc_version;
 	int rc;
 	int done;
@@ -809,11 +810,10 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 		/* close_conn will launch failover */
 		rc = -ENETDOWN;
 	} else {
-		struct kib_fast_reg_descriptor *frd = tx->tx_fmr.fmr_frd;
 		const struct ib_send_wr *bad = &tx->tx_wrq[tx->tx_nwrq - 1].wr;
 		struct ib_send_wr *wrq = &tx->tx_wrq[0].wr;
 
-		if (frd) {
+		if (frd && !frd->frd_posted) {
 			if (!frd->frd_valid) {
 				wrq = &frd->frd_inv_wr;
 				wrq->next = &frd->frd_fastreg_wr.wr;
@@ -837,11 +837,13 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 
 	conn->ibc_last_send = ktime_get();
 
-	if (!rc)
+	if (rc == 0) {
+		if (frd)
+			frd->frd_posted = true;
 		return 0;
+	}
 
-	/*
-	 * NB credits are transferred in the actual
+	/* NB credits are transferred in the actual
 	 * message, which can only be the last work item
 	 */
 	conn->ibc_credits += credit;
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/18] lustre: quota: nodemap squashed root cannot bypass quota
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 13/18] lnet: o2iblnd: Avoid double posting invalidate James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 15/18] lustre: llite: reset pfid after dir migration James Simmons
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

When root on a client is squashed via a nodemap's squash_uid/squash_gid,
its I/O must not bypass quota enforcement, as it normally does without
squashing.
So on the client side, do not set OBD_BRW_FROM_GRANT for every page being
used by root. And on the server side, check if root is squashed via a
nodemap and remove OBD_BRW_NOQUOTA.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14739
Lustre-commit: a4fbe7341baf12c00 ("LU-14739 quota: nodemap squashed root cannot bypass quota")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/43988
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 84c6b68..50f6477 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -2380,7 +2380,7 @@ int osc_queue_async_io(const struct lu_env *env, struct cl_io *io,
 	}
 
 	/* check if the file's owner/group is over quota */
-	if (!(cmd & OBD_BRW_NOQUOTA)) {
+	if (!io->ci_noquota) {
 		struct cl_object *obj;
 		struct cl_attr *attr;
 		unsigned int qid[MAXQUOTAS];
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/18] lustre: llite: reset pfid after dir migration
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 14/18] lustre: quota: nodemap squashed root cannot bypass quota James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 16/18] lustre: llite: failed ASSERTION(ldlm_has_layout(lock)) James Simmons
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

A plain directory is turned into a stripe upon
migration/restripe; conversely, if the target is a plain directory, the
target stripe is turned into a directory afterwards.

In the first case, set pfid, and in the latter case, clear pfid,
otherwise ll_lock_cancel_bits() will use the wrong master inode.
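
The two cases can be sketched in user space as below. All names are
hypothetical, and fid_sketch_is_set() is a simplified stand-in for the
fid_is_norm() check used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct fid_sketch {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

struct inode_sketch {
	struct fid_sketch pfid;	/* master FID, zero for a plain directory */
};

/* Simplified stand-in for fid_is_norm(): any non-zero sequence counts. */
static bool fid_sketch_is_set(const struct fid_sketch *fid)
{
	return fid->seq != 0;
}

/* Plain directory became a stripe: remember the master FID. */
static void became_stripe(struct inode_sketch *ino,
			  const struct fid_sketch *master)
{
	assert(ino && master);
	ino->pfid = *master;
}

/* Stripe became a plain directory: drop the stale master FID. */
static void became_directory(struct inode_sketch *ino)
{
	assert(ino);
	if (fid_sketch_is_set(&ino->pfid))
		memset(&ino->pfid, 0, sizeof(ino->pfid));
}
```

Keeping pfid in step with the inode's role is what lets a later lock
cancel find the right master inode, or none for a plain directory.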

WC-bug-id: https://jira.whamcloud.com/browse/LU-14459
Lustre-commit: abbe545a63b304e80 ("LU-14459 llite: reset pfid after dir migration")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43289
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_lib.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 10a9a95..88a1d17 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1401,10 +1401,12 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 				      struct lustre_md *md)
 {
 	struct ll_sb_info *sbi = ll_s2sbi(sb);
+	struct ll_inode_info *lli;
 	struct mdt_body *body = md->body;
 	struct inode *inode;
 	ino_t ino;
 
+	LASSERT(md->lmv);
 	ino = cl_fid_build_ino(fid, sbi->ll_flags & LL_SBI_32BIT_API);
 	inode = iget_locked(sb, ino);
 	if (!inode) {
@@ -1413,10 +1415,8 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		return ERR_PTR(-ENOENT);
 	}
 
+	lli = ll_i2info(inode);
 	if (inode->i_state & I_NEW) {
-		struct ll_inode_info *lli = ll_i2info(inode);
-		struct lmv_stripe_md *lsm = md->lmv;
-
 		inode->i_mode = (inode->i_mode & ~S_IFMT) |
 				(body->mbo_mode & S_IFMT);
 		LASSERTF(S_ISDIR(inode->i_mode), "Not slave inode " DFID "\n",
@@ -1432,12 +1432,17 @@ static struct inode *ll_iget_anon_dir(struct super_block *sb,
 		lli->lli_fid = *fid;
 		ll_lli_init(lli);
 
-		LASSERT(lsm);
 		/* master object FID */
 		lli->lli_pfid = body->mbo_fid1;
 		CDEBUG(D_INODE, "lli %p slave " DFID " master " DFID "\n",
 		       lli, PFID(fid), PFID(&lli->lli_pfid));
 		unlock_new_inode(inode);
+	} else {
+		/* in directory restripe/auto-split, a directory will be
+		 * transformed to a stripe if it's plain, set its pfid here,
+		 * otherwise ll_lock_cancel_bits() can't find the master inode.
+		 */
+		lli->lli_pfid = body->mbo_fid1;
 	}
 
 	return inode;
@@ -1547,6 +1552,12 @@ static int ll_update_lsm_md(struct inode *inode, struct lustre_md *md)
 	if (md->default_lmv)
 		ll_update_default_lsm_md(inode, md);
 
+	/* after dir migration/restripe, a stripe may be turned into a
+	 * directory, in this case, zero out its lli_pfid.
+	 */
+	if (unlikely(fid_is_norm(&lli->lli_pfid)))
+		fid_zero(&lli->lli_pfid);
+
 	/*
 	 * no striped information from request, lustre_md from req does not
 	 * include stripeEA, see ll_md_setattr()
-- 
1.8.3.1


* [lustre-devel] [PATCH 16/18] lustre: llite: failed ASSERTION(ldlm_has_layout(lock))
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (14 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 15/18] lustre: llite: reset pfid after dir migration James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 17/18] lustre: pcc: introducing OBD_CONNECT2_PCCRO flag James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 18/18] lustre: sec: migrate/extend/split on encrypted file James Simmons
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Bobi Jam <bobijam@whamcloud.com>

When setting the layout under the layout lock, the lock could have lost
its layout bits. Return -EAGAIN in that case instead of asserting, so
that the layout lock is fetched again.
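
A minimal userspace sketch of the pattern follows. The struct and all
three functions are hypothetical; in particular, the bounded retry loop
is an assumption about how a caller might react to -EAGAIN and is not
taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LDLM lock; only the layout bit is modelled. */
struct model_lock {
	bool has_layout;
};

/* Same shape as the new check: report -EAGAIN rather than asserting. */
static int layout_lock_set(const struct model_lock *lock)
{
	if (!lock->has_layout)
		return -EAGAIN;
	return 0;
}

/* Hypothetical re-fetch: in this model it simply restores the layout bit. */
static void refetch_layout_lock(struct model_lock *lock)
{
	lock->has_layout = true;
}

/* Hypothetical caller: retry at most max_tries times while -EAGAIN is seen. */
static int layout_refresh(struct model_lock *lock, int max_tries)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries && rc == -EAGAIN; i++) {
		rc = layout_lock_set(lock);
		if (rc == -EAGAIN)
			refetch_layout_lock(lock);
	}
	return rc;
}
```

With a lock that has lost its layout bit, layout_refresh() fails the
first attempt, re-fetches, and succeeds on the second, where an
assertion would have stopped the client outright.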

WC-bug-id: https://jira.whamcloud.com/browse/LU-14780
Lustre-commit: 1b166d6dd6a2f39d ("LU-14780 llite: failed ASSERTION(ldlm_has_layout(lock))")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44054
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 1ef5fd8..b822ca5 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -5609,7 +5609,11 @@ static int ll_layout_lock_set(struct lustre_handle *lockh, enum ldlm_mode mode,
 
 	lock = ldlm_handle2lock(lockh);
 	LASSERT(lock);
-	LASSERT(ldlm_has_layout(lock));
+
+	if (!ldlm_has_layout(lock)) {
+		rc = -EAGAIN;
+		goto out;
+	}
 
 	LDLM_DEBUG(lock, "File " DFID "(%p) being reconfigured",
 		   PFID(&lli->lli_fid), inode);
-- 
1.8.3.1


* [lustre-devel] [PATCH 17/18] lustre: pcc: introducing OBD_CONNECT2_PCCRO flag
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (15 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 16/18] lustre: llite: failed ASSERTION(ldlm_has_layout(lock)) James Simmons
@ 2021-07-19 12:32 ` James Simmons
  2021-07-19 12:32 ` [lustre-devel] [PATCH 18/18] lustre: sec: migrate/extend/split on encrypted file James Simmons
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

Add a new connection flag, OBD_CONNECT2_PCCRO, to address access
consistency problems with old clients that lack PCC-RO support.

By necessity, also include definitions for OBD_CONNECT2_MODE_CONVERT
and OBD_CONNECT2_BATCH_RPC so obd_connect_names[] works.
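
As a rough illustration of how such a flag gates a feature, the sketch
below copies the three flag values from the lustre_idl.h hunk of this
patch. The helpers negotiate_flags2() and flag_negotiated() are
hypothetical, and treating negotiation as a bitwise AND of both sides'
flags is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the lustre_idl.h hunk in this patch. */
#define OBD_CONNECT2_MODE_CONVERT 0x200000ULL /* LDLM mode convert */
#define OBD_CONNECT2_BATCH_RPC    0x400000ULL /* Multi-RPC batch request */
#define OBD_CONNECT2_PCCRO        0x800000ULL /* Read-only PCC */

/* Each flag must be a single bit and must not collide with the others. */
_Static_assert((OBD_CONNECT2_PCCRO & (OBD_CONNECT2_PCCRO - 1)) == 0,
	       "PCCRO must be a single bit");
_Static_assert((OBD_CONNECT2_PCCRO &
		(OBD_CONNECT2_MODE_CONVERT | OBD_CONNECT2_BATCH_RPC)) == 0,
	       "PCCRO must not overlap neighbouring flags");

/* Hypothetical: a feature survives only if both peers advertise it. */
static uint64_t negotiate_flags2(uint64_t client, uint64_t server)
{
	return client & server;
}

/* Hypothetical: test one flag in the negotiated set. */
static bool flag_negotiated(uint64_t negotiated, uint64_t flag)
{
	return (negotiated & flag) == flag;
}
```

Under this model a peer that does not advertise OBD_CONNECT2_PCCRO
causes the bit to drop out of the negotiated set, so the other side can
detect the lack of PCC-RO support.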

WC-bug-id: https://jira.whamcloud.com/browse/LU-10499
Lustre-commit: 6007dc9382df726 ("LU-10499 pcc: introducing OBD_CONNECT2_PCCRO flag")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40791
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c            | 6 ++++++
 include/uapi/linux/lustre/lustre_idl.h | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index c7eb218..cd1456c 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1252,6 +1252,12 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT2_DOM_LVB);
 	LASSERTF(OBD_CONNECT2_REP_MBITS == 0x100000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_REP_MBITS);
+	LASSERTF(OBD_CONNECT2_MODE_CONVERT == 0x200000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_MODE_CONVERT);
+	LASSERTF(OBD_CONNECT2_BATCH_RPC == 0x400000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_BATCH_RPC);
+	LASSERTF(OBD_CONNECT2_PCCRO == 0x800000ULL, "found 0x%.16llxULL\n",
+		 OBD_CONNECT2_PCCRO);
 	LASSERTF(OBD_CONNECT2_ATOMIC_OPEN_LOCK == 0x4000000ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT2_ATOMIC_OPEN_LOCK);
 	LASSERTF(OBD_CKSUM_CRC32 == 0x00000001UL, "found 0x%.8xUL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 8f49adb..2047b92 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -840,6 +840,10 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT2_LSEEK	      0x40000ULL /* SEEK_HOLE/DATA RPC */
 #define OBD_CONNECT2_DOM_LVB	      0x80000ULL /* pack DOM glimpse data in LVB */
 #define OBD_CONNECT2_REP_MBITS	     0x100000ULL /* match reply by mbits, not xid */
+#define OBD_CONNECT2_REP_MBITS       0x100000ULL /* match reply mbits not xid*/
+#define OBD_CONNECT2_MODE_CONVERT    0x200000ULL /* LDLM mode convert */
+#define OBD_CONNECT2_BATCH_RPC	     0x400000ULL /* Multi-RPC batch request */
+#define OBD_CONNECT2_PCCRO	     0x800000ULL /* Read-only PCC */
 #define OBD_CONNECT2_ATOMIC_OPEN_LOCK 0x4000000ULL/* request lock on 1st open */
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
-- 
1.8.3.1


* [lustre-devel] [PATCH 18/18] lustre: sec: migrate/extend/split on encrypted file
  2021-07-19 12:31 [lustre-devel] [PATCH 00/18] lustre: sync to OpenSFS as of July 18, 2021 James Simmons
                   ` (16 preceding siblings ...)
  2021-07-19 12:32 ` [lustre-devel] [PATCH 17/18] lustre: pcc: introducing OBD_CONNECT2_PCCRO flag James Simmons
@ 2021-07-19 12:32 ` James Simmons
  17 siblings, 0 replies; 19+ messages in thread
From: James Simmons @ 2021-07-19 12:32 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Sebastien Buisson <sbuisson@ddn.com>

lfs migrate/extend/split make use of volatile files to swap layouts.
When the operation is carried out on an encrypted file, the volatile
file must be assigned the same encryption context as the original file,
so that data moved/copied to different OSTs is identical to the original
file's.

Also update sanity-sec test_52 to exercise these commands.
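
The namei.c hunk finds the reference file through the fd=<fd> field of
the volatile file name. A userspace sketch of that parsing step is shown
here; volatile_name_fd() is a hypothetical helper that uses strstr() and
strtoul() in place of the kernel's strnstr() and kstrtouint(), and it
rejects every malformed name, whereas the patch falls back to inheriting
the parent's context in some of those cases.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract <fd> from a NUL-terminated name of the form "...:fd=<fd>[:...]".
 * Returns 0 and stores the descriptor, or -EINVAL if the field is
 * missing or malformed.
 */
static int volatile_name_fd(const char *name, unsigned int *fd)
{
	const char *p = strstr(name, ":fd=");
	unsigned long val;
	char *end;

	if (!p)
		return -EINVAL;
	p += 4;

	/* Require a leading digit: strtoul() alone accepts signs and spaces. */
	if (*p < '0' || *p > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(p, &end, 10);
	if (errno || val > UINT_MAX)
		return -EINVAL;

	/* The number must run to the end of the name or to the next ':'. */
	if (*end != '\0' && *end != ':')
		return -EINVAL;

	*fd = (unsigned int)val;
	return 0;
}
```

For a placeholder name such as "VOLATILE:0001:ABCD:fd=7" (the real
prefix is LUSTRE_VOLATILE_HDR, not reproduced here) the helper stores 7;
a name ending in ":fd=" or carrying a non-numeric value yields -EINVAL.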

WC-bug-id: https://jira.whamcloud.com/browse/LU-14677
Lustre-commit: 09c558d16f0a80f4 ("LU-14677 sec: migrate/extend/split on encrypted file")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/43878
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_crypto.h |   2 +
 fs/lustre/llite/llite_lib.c       |   6 +--
 fs/lustre/llite/namei.c           | 101 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/include/lustre_crypto.h b/fs/lustre/include/lustre_crypto.h
index b19bb420..6cc946d 100644
--- a/fs/lustre/include/lustre_crypto.h
+++ b/fs/lustre/include/lustre_crypto.h
@@ -58,6 +58,8 @@ static inline bool ll_sbi_has_encrypt(struct ll_sb_info *sbi)
 
 static inline void ll_sbi_set_encrypt(struct ll_sb_info *sbi, bool set) { }
 #endif
+/* sizeof(struct fscrypt_context_v2) = 40 */
+#define LLCRYPT_ENC_CTX_SIZE 40
 
 /* Encoding/decoding routines inspired from yEnc principles.
  * We just take care of a few critical characters:
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 88a1d17..5610523 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2063,8 +2063,7 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 			 * it is necessary due to possible time
 			 * de-synchronization between MDT inode and OST objects
 			 */
-			if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode) &&
-			    attr->ia_valid & ATTR_SIZE) {
+			if (S_ISREG(inode->i_mode) && IS_ENCRYPTED(inode)) {
 				xvalid |= OP_XVALID_FLAGS;
 				flags = LUSTRE_ENCRYPT_FL;
 				/* Call to ll_io_zero_page is not necessary if
@@ -2073,7 +2072,8 @@ int ll_setattr_raw(struct dentry *dentry, struct iattr *attr,
 				 * In case of Direct IO, all we need is to set
 				 * new size.
 				 */
-				if (attr->ia_size & ~PAGE_MASK &&
+				if (attr->ia_valid & ATTR_SIZE &&
+				    attr->ia_size & ~PAGE_MASK &&
 				    !(attr->ia_valid & ATTR_FILE &&
 				      attr->ia_file->f_flags & O_DIRECT)) {
 					pgoff_t offset;
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index f32aa14..5cc01f0 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -33,6 +33,7 @@
 #include <linux/fs.h>
 #include <linux/sched.h>
 #include <linux/mm.h>
+#include <linux/file.h>
 #include <linux/quotaops.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
@@ -878,10 +879,102 @@ static struct dentry *ll_lookup_it(struct inode *parent, struct dentry *dentry,
 			*secctxlen = 0;
 	}
 	if (it->it_op & IT_CREAT && encrypt) {
-		rc = fscrypt_inherit_context(parent, NULL, op_data, false);
-		if (rc) {
-			retval = ERR_PTR(rc);
-			goto out;
+		/* Volatile file name may look like:
+		 * <parent>/LUSTRE_VOLATILE_HDR:<mdt_index>:<random>:fd=<fd>
+		 * where fd is opened descriptor of reference file.
+		 */
+		if (unlikely(filename_is_volatile(dentry->d_name.name,
+						  dentry->d_name.len, NULL))) {
+			int ctx_size = LLCRYPT_ENC_CTX_SIZE;
+			struct lustre_sb_info *lsi;
+			struct file *ref_file;
+			struct inode *ref_inode;
+			char *p, *q, *fd_str;
+			void *ctx;
+			unsigned int fd;
+
+			p = strnstr(dentry->d_name.name, ":fd=",
+				    dentry->d_name.len);
+			if (!p || strlen(p + 4) == 0) {
+				retval = ERR_PTR(-EINVAL);
+				goto out;
+			}
+
+			q = strchrnul(p + 4, ':');
+			fd_str = kstrndup(p + 4, q - p - 4, GFP_NOFS);
+			if (!fd_str) {
+				retval = ERR_PTR(-ENOMEM);
+				goto out;
+			}
+			rc = kstrtouint(fd_str, 10, &fd);
+			kfree(fd_str);
+			if (rc) {
+				rc = -EINVAL;
+				goto inherit;
+			}
+
+			ref_file = fget(fd);
+			if (!ref_file) {
+				rc = -EINVAL;
+				goto inherit;
+			}
+
+			ref_inode = file_inode(ref_file);
+			if (!ref_inode) {
+				fput(ref_file);
+				rc = -EINVAL;
+				goto inherit;
+			}
+
+			lsi = s2lsi(ref_inode->i_sb);
+
+getctx:
+			ctx = kzalloc(ctx_size, GFP_NOFS);
+			if (!ctx) {
+				retval = ERR_PTR(-ENOMEM);
+				goto out;
+			}
+
+#ifdef CONFIG_FS_ENCRYPTION
+			rc = ref_inode->i_sb->s_cop->get_context(ref_inode,
+								 ctx, ctx_size);
+#else
+			rc = -ENODATA;
+#endif
+			if (rc == -ERANGE) {
+				kfree(ctx);
+				ctx_size *= 2;
+				goto getctx;
+			}
+			fput(ref_file);
+			if (rc < 0) {
+				kfree(ctx);
+				goto inherit;
+			}
+
+			op_data->op_file_encctx_size = rc;
+			if (rc == ctx_size) {
+				op_data->op_file_encctx = ctx;
+			} else {
+				op_data->op_file_encctx = kzalloc(op_data->op_file_encctx_size,
+								  GFP_NOFS);
+				if (!op_data->op_file_encctx) {
+					kfree(ctx);
+					retval = ERR_PTR(-ENOMEM);
+					goto out;
+				}
+				memcpy(op_data->op_file_encctx, ctx,
+				       op_data->op_file_encctx_size);
+				kfree(ctx);
+			}
+		} else {
+inherit:
+			rc = fscrypt_inherit_context(parent, NULL, op_data,
+						     false);
+			if (rc) {
+				retval = ERR_PTR(rc);
+				goto out;
+			}
 		}
 		if (encctx)
 			*encctx = op_data->op_file_encctx;
-- 
1.8.3.1

