lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021
@ 2021-08-02 19:50 James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading James Simmons
                   ` (30 more replies)
  0 siblings, 31 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Port the latest patches from the OpenSFS tree as of Aug 2, 2021.
One patch was held back ("lustre: pcc: add LCM_FL_PCC_RDONLY layout flag")
due to a bug exposed by its change.

Andreas Dilger (2):
  lustre: llite: revert 'simplify callback handling for async getattr'
  lustre: uapi: remove MDS_SETATTR_PORTAL and service

Chris Horn (1):
  lnet: Protect lpni deref in lnet_health_check

Cyril Bordage (3):
  lnet: print device status in net show command
  lnet: check memdup_user_nul using IS_ERR
  lnet: add "stats reset" to lnetctl

Lai Siyao (3):
  lustre: lmv: getattr_name("..") under striped directory
  lustre: mdc: set default LMV on ROOT
  lustre: llite: enable filesystem-wide default LMV

Mr NeilBrown (2):
  lnet: discard lnet_current_net_count
  lnet: convert kiblnd/ksocknal_thread_start to vararg

Oleg Drokin (1):
  lustre: update version to 2.14.53

Patrick Farrell (10):
  lustre: llite: No locked parallel DIO
  lustre: llite: Modify AIO/DIO reference counting
  lustre: llite: Remove transient page counting
  lustre: lov: Improve DIO submit
  lustre: llite: Adjust dio refcounting
  lustre: clio: Skip prep for transients
  lustre: osc: Improve osc_queue_sync_pages
  lustre: osc: Remove lockless truncate
  lustre: osc: Remove client contention support
  lustre: osc: osc: Do not flush on lockless cancel

Serguei Smirnov (1):
  lnet: o2iblnd: clear fatal error on successful failover

Wang Shilong (2):
  lustre: llite: avoid stale data reading
  lustre: llite: avoid project quota overflow

 fs/lustre/include/cl_object.h           |  23 ++-
 fs/lustre/include/lustre_osc.h          |   3 -
 fs/lustre/include/obd.h                 |  37 ++--
 fs/lustre/include/obd_class.h           |   4 +-
 fs/lustre/llite/dir.c                   |   2 +
 fs/lustre/llite/file.c                  |  21 ++-
 fs/lustre/llite/llite_internal.h        |  12 +-
 fs/lustre/llite/llite_lib.c             |  22 ++-
 fs/lustre/llite/namei.c                 |  74 +++++++-
 fs/lustre/llite/rw.c                    |   4 +-
 fs/lustre/llite/rw26.c                  |  22 ++-
 fs/lustre/llite/statahead.c             | 324 +++++++++++++++++++++-----------
 fs/lustre/llite/vvp_internal.h          |   7 -
 fs/lustre/llite/vvp_io.c                |   4 +-
 fs/lustre/llite/vvp_object.c            |   4 +-
 fs/lustre/llite/vvp_page.c              |  22 ++-
 fs/lustre/lmv/lmv_obd.c                 |  44 +++--
 fs/lustre/lmv/lproc_lmv.c               |  26 ++-
 fs/lustre/lov/lov_io.c                  |  23 ++-
 fs/lustre/mdc/lproc_mdc.c               |  43 -----
 fs/lustre/mdc/mdc_dev.c                 |  15 +-
 fs/lustre/mdc/mdc_internal.h            |   3 +-
 fs/lustre/mdc/mdc_locks.c               |  31 +--
 fs/lustre/mdc/mdc_request.c             |   8 +
 fs/lustre/obdclass/cl_io.c              |  20 +-
 fs/lustre/obdclass/cl_page.c            |  21 ++-
 fs/lustre/obdecho/echo_client.c         |   4 +-
 fs/lustre/osc/lproc_osc.c               |  68 -------
 fs/lustre/osc/osc_cache.c               |   3 +-
 fs/lustre/osc/osc_io.c                  |  10 -
 fs/lustre/osc/osc_lock.c                |  31 +--
 fs/lustre/osc/osc_object.c              |  22 ---
 fs/lustre/ptlrpc/wiretest.c             |   2 -
 include/linux/lnet/lib-lnet.h           |   1 -
 include/uapi/linux/lnet/libcfs_ioctl.h  |   3 +-
 include/uapi/linux/lnet/lnet-dlc.h      |   1 +
 include/uapi/linux/lustre/lustre_idl.h  |  10 +-
 include/uapi/linux/lustre/lustre_user.h |   2 +
 include/uapi/linux/lustre/lustre_ver.h  |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c        |  37 +++-
 net/lnet/klnds/o2iblnd/o2iblnd.h        |  10 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c     |  12 --
 net/lnet/klnds/socklnd/socklnd.c        |  16 +-
 net/lnet/klnds/socklnd/socklnd.h        |  10 +-
 net/lnet/klnds/socklnd/socklnd_cb.c     |  17 +-
 net/lnet/libcfs/module.c                |   4 +-
 net/lnet/libcfs/tracefile.c             |   8 +-
 net/lnet/lnet/api-ni.c                  |  32 +---
 net/lnet/lnet/lib-msg.c                 |  71 +++----
 net/lnet/lnet/router_proc.c             |   4 +-
 50 files changed, 642 insertions(+), 559 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO James Simmons
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

remove_mapping() can fail to remove a page from the page cache when
the page refcount != 2. Clear the uptodate flag in vvp_page_delete()
so that stale data is not read from such a page later.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14541
Lustre-commit: f2a16793fa4316fc9cc ("LU-14541 llite: avoid stale data reading")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43476
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_page.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 86353df..2ecd414 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -172,6 +172,12 @@ static void vvp_page_delete(const struct lu_env *env,
 
 	ClearPagePrivate(vmpage);
 	vmpage->private = 0;
+
+	/**
+	 * Vmpage might not be released due page refcount != 2,
+	 * clear Page uptodate here to avoid stale data.
+	 */
+	ClearPageUptodate(vmpage);
 	/*
 	 * Reference from vmpage to cl_page is removed, but the reference back
 	 * is still here. It is removed later in vvp_page_fini().
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count James Simmons
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

If we are doing locked DIO, the OSC & LDLM locks are
released at the end of cl_io_loop, i.e., before we wait for
parallel DIO at the llite layer.

This is problematic because the locks are released before
the i/o done under them is complete; this can lead to data
inconsistencies.  (And at least one LBUG, see LU-14805.)

The easiest solution for now is to do parallel DIO only when
working lockless (which is the default; DIO only switches
to locked to manage conflicts with buffered i/o).

This problem & fix apply to AIO as well as parallel DIO.
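
The rule described above can be sketched as a small userspace model.
This is only an illustration, not the kernel code: struct dio_state
and both helper functions are hypothetical names, and the three
booleans stand in for io->ci_dio_lock, is_sync_kiocb(iocb) and
io->ci_parallel_dio from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state that ll_direct_IO() tests. */
struct dio_state {
	bool dio_lock;     /* models io->ci_dio_lock: DIO runs locked */
	bool sync_kiocb;   /* models is_sync_kiocb(iocb): not AIO */
	bool parallel_dio; /* models io->ci_parallel_dio */
};

/* Condition before the patch: only sync, non-parallel DIO waits. */
static bool dio_must_wait_old(const struct dio_state *s)
{
	assert(s != NULL);
	return s->sync_kiocb && !s->parallel_dio;
}

/* Condition after the patch: locked I/O always waits, so the lock
 * is never dropped while i/o issued under it is still in flight.
 */
static bool dio_must_wait(const struct dio_state *s)
{
	assert(s != NULL);
	return s->dio_lock || dio_must_wait_old(s);
}
```

In this model the two conditions differ only when dio_lock is set:
locked AIO and locked parallel DIO used to be submitted
asynchronously and now wait, while lockless behaviour is unchanged.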

WC-bug-id: https://jira.whamcloud.com/browse/LU-14805
Lustre-commit: 0f8db7e06abbc341 ("LU-14805 llite: No locked parallel DIO")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44131
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/rw26.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index ba9c070..0d72c3e 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -410,10 +410,19 @@ static ssize_t ll_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	else
 		vio->u.readwrite.vui_read += tot_bytes;
 
-	/* If async dio submission is not allowed, we must wait here. */
-	if (is_sync_kiocb(iocb) && !io->ci_parallel_dio) {
+	/* We cannot do async submission - for AIO or regular DIO - unless
+	 * lockless because it causes us to release the lock early.
+	 *
+	 * There are also several circumstances in which we must disable
+	 * parallel DIO, so we check if it is enabled.
+	 *
+	 * The check for "is_sync_kiocb" excludes AIO, which does not need to
+	 * be disabled in these situations.
+	 */
+	if (io->ci_dio_lock || (is_sync_kiocb(iocb) && !io->ci_parallel_dio)) {
 		ssize_t rc2;
 
+		/* Wait here rather than doing async submission */
 		rc2 = cl_sync_io_wait_recycle(env, &aio->cda_sync, 0, 0);
 		if (result == 0 && rc2)
 			result = rc2;
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg James Simmons
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

The variable lnet_current_net_count is never used, so remove it.
The function lnet_get_net_count() is only used to update that
variable, so remove it too.

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: a39f07804153f4f4 ("LU-6142 lnet: discard lnet_current_net_count")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44089
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h |  1 -
 net/lnet/lnet/api-ni.c        | 22 ----------------------
 2 files changed, 23 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index f56ecab..3677a12 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -804,7 +804,6 @@ bool lnet_net_unique(u32 net_id, struct list_head *nilist,
 bool lnet_ni_unique_net(struct list_head *nilist, char *iface);
 void lnet_incr_dlc_seq(void);
 u32 lnet_get_dlc_seq_locked(void);
-int lnet_get_net_count(void);
 
 struct lnet_peer_net *lnet_get_next_peer_net_locked(struct lnet_peer *lp,
 						    u32 prev_lpn_id);
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index dc9020d..ec28139 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -196,8 +196,6 @@ static void lnet_set_lnd_timeout(void)
 			   (lnet_retry_count + 1);
 }
 
-unsigned int lnet_current_net_count;
-
 /*
  * This sequence number keeps track of how many times DLC was used to
  * update the local NIs. It is incremented when a NI is added or
@@ -1671,23 +1669,6 @@ struct lnet_ping_buffer *
 	return count;
 }
 
-int
-lnet_get_net_count(void)
-{
-	struct lnet_net *net;
-	int count = 0;
-
-	lnet_net_lock(0);
-
-	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
-		count++;
-	}
-
-	lnet_net_unlock(0);
-
-	return count;
-}
-
 void
 lnet_swap_pinginfo(struct lnet_ping_buffer *pbuf)
 {
@@ -2516,9 +2497,6 @@ static void lnet_push_target_fini(void)
 		lnet_net_unlock(LNET_LOCK_EX);
 	}
 
-	/* update net count */
-	lnet_current_net_count = lnet_get_net_count();
-
 	return ni_count;
 
 failed1:
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 05/25] lnet: print device status in net show command James Simmons
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Mr NeilBrown <neilb@suse.de>

Rather than requiring the caller to format a thread name into a temp
buffer, change these thread_start functions to accept a format and
args, and to hand them directly to kthread_run().

This is done with a macro rather than a function as the functions are
trivial and varargs is slightly easier with macros.
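
The macro pattern can be tried out in userspace with a rough model.
This is a sketch under stated assumptions, not the LNet code:
fake_thread_run(), thread_start(), last_name_is(), nthreads and
last_name are all hypothetical names, with fake_thread_run() standing
in for kthread_run() and a plain int standing in for the atomic
thread counter. It relies on the same GNU C extensions as the patch
(statement expressions and named variadic macro arguments).

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int nthreads;       /* models the per-LND thread counter */
static char last_name[32]; /* name given to the last started "thread" */

/* Hypothetical stand-in for kthread_run(): formats the thread name
 * and returns 0, or returns -ENOMEM when asked to fail.
 */
__attribute__((format(printf, 2, 3)))
static int fake_thread_run(int fail, const char *namefmt, ...)
{
	va_list ap;

	if (fail)
		return -ENOMEM;

	va_start(ap, namefmt);
	vsnprintf(last_name, sizeof(last_name), namefmt, ap);
	va_end(ap);
	return 0;
}

/* Forward the format and args untouched; count only successful
 * starts; the last expression is the value of the whole macro.
 */
#define thread_start(fail, namefmt, arg...)				\
	({								\
		int __rc = fake_thread_run(fail, namefmt, ##arg);	\
		if (!__rc)						\
			nthreads++;					\
		__rc;							\
	})

/* Helper for checking which name the last start produced. */
static int last_name_is(const char *expected)
{
	assert(expected != NULL);
	return strcmp(last_name, expected) == 0;
}
```

A call such as thread_start(0, "socknal_cd%02d", 3) needs no
temporary name buffer at the call site, which is the point of the
conversion.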

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 9976d2c35d40a170 ("LU-6142 lnet: convert kiblnd/ksocknal_thread_start to vararg")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/44122
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c    | 10 ++++------
 net/lnet/klnds/o2iblnd/o2iblnd.h    | 10 +++++++++-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 12 ------------
 net/lnet/klnds/socklnd/socklnd.c    | 16 ++++++----------
 net/lnet/klnds/socklnd/socklnd.h    | 10 +++++++++-
 net/lnet/klnds/socklnd/socklnd_cb.c | 17 ++---------------
 6 files changed, 30 insertions(+), 45 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index b519a31..3141953 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2712,13 +2712,11 @@ static int kiblnd_start_schedulers(struct kib_sched_info *sched)
 	}
 
 	for (i = 0; i < nthrs; i++) {
-		long id;
-		char name[20];
+		long id = KIB_THREAD_ID(sched->ibs_cpt, sched->ibs_nthreads + i);
 
-		id = KIB_THREAD_ID(sched->ibs_cpt, sched->ibs_nthreads + i);
-		snprintf(name, sizeof(name), "kiblnd_sd_%02ld_%02ld",
-			 KIB_THREAD_CPT(id), KIB_THREAD_TID(id));
-		rc = kiblnd_thread_start(kiblnd_scheduler, (void *)id, name);
+		rc = kiblnd_thread_start(kiblnd_scheduler, (void *)id,
+					 "kiblnd_sd_%02ld_%02ld",
+					 KIB_THREAD_CPT(id), KIB_THREAD_TID(id));
 		if (!rc)
 			continue;
 
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 8d1d7eb..3691bfe 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -907,7 +907,15 @@ int kiblnd_fmr_pool_map(struct kib_fmr_poolset *fps, struct kib_tx *tx,
 
 int kiblnd_connd(void *arg);
 int kiblnd_scheduler(void *arg);
-int kiblnd_thread_start(int (*fn)(void *arg), void *arg, char *name);
+#define kiblnd_thread_start(fn, data, namefmt, arg...)			\
+	({								\
+		struct task_struct *__task = kthread_run(fn, data,	\
+							 namefmt, ##arg);\
+		if (!IS_ERR(__task))					\
+			atomic_inc(&kiblnd_data.kib_nthreads);		\
+		PTR_ERR_OR_ZERO(__task);				\
+	})
+
 int kiblnd_failover_thread(void *arg);
 
 int kiblnd_alloc_pages(struct kib_pages **pp, int cpt, int npages);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 32ccac2..193e75b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -1830,18 +1830,6 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	return rc;
 }
 
-int
-kiblnd_thread_start(int (*fn)(void *arg), void *arg, char *name)
-{
-	struct task_struct *task = kthread_run(fn, arg, "%s", name);
-
-	if (IS_ERR(task))
-		return PTR_ERR(task);
-
-	atomic_inc(&kiblnd_data.kib_nthreads);
-	return 0;
-}
-
 static void
 kiblnd_thread_fini(void)
 {
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index e15f1c0..cbbbb0c 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -2066,15 +2066,13 @@ static int ksocknal_device_event(struct notifier_block *unused,
 	}
 
 	for (i = 0; i < *ksocknal_tunables.ksnd_nconnds; i++) {
-		char name[16];
-
 		spin_lock_bh(&ksocknal_data.ksnd_connd_lock);
 		ksocknal_data.ksnd_connd_starting++;
 		spin_unlock_bh(&ksocknal_data.ksnd_connd_lock);
 
-		snprintf(name, sizeof(name), "socknal_cd%02d", i);
 		rc = ksocknal_thread_start(ksocknal_connd,
-					   (void *)((uintptr_t)i), name);
+					   (void *)((uintptr_t)i),
+					   "socknal_cd%02d", i);
 		if (rc) {
 			spin_lock_bh(&ksocknal_data.ksnd_connd_lock);
 			ksocknal_data.ksnd_connd_starting--;
@@ -2241,14 +2239,12 @@ static int ksocknal_device_event(struct notifier_block *unused,
 
 	for (i = 0; i < nthrs; i++) {
 		long id;
-		char name[20];
 
 		id = KSOCK_THREAD_ID(sched->kss_cpt, sched->kss_nthreads + i);
-		snprintf(name, sizeof(name), "socknal_sd%02d_%02d",
-			 sched->kss_cpt, (int)KSOCK_THREAD_SID(id));
-
-		rc = ksocknal_thread_start(ksocknal_scheduler,
-					   (void *)id, name);
+		rc = ksocknal_thread_start(ksocknal_scheduler, (void *)id,
+					   "socknal_sd%02d_%02d",
+					   sched->kss_cpt,
+					   (int)KSOCK_THREAD_SID(id));
 		if (!rc)
 			continue;
 
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 357769a..45103a3 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -650,7 +650,15 @@ int ksocknal_launch_packet(struct lnet_ni *ni, struct ksock_tx *tx,
 void ksocknal_queue_tx_locked(struct ksock_tx *tx, struct ksock_conn *conn);
 void ksocknal_txlist_done(struct lnet_ni *ni, struct list_head *txlist, int error);
 void ksocknal_query(struct lnet_ni *ni, lnet_nid_t nid, time64_t *when);
-int ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name);
+#define ksocknal_thread_start(fn, data, namefmt, arg...)		\
+	({								\
+		struct task_struct *__task = kthread_run(fn, data,	\
+							 namefmt, ##arg);\
+		if (!IS_ERR(__task))					\
+			atomic_inc(&ksocknal_data.ksnd_nthreads);	\
+		PTR_ERR_OR_ZERO(__task);				\
+	})
+
 void ksocknal_thread_fini(void);
 void ksocknal_launch_all_connections_locked(struct ksock_peer_ni *peer_ni);
 struct ksock_conn_cb *
diff --git a/net/lnet/klnds/socklnd/socklnd_cb.c b/net/lnet/klnds/socklnd/socklnd_cb.c
index bfb98f5..efec479 100644
--- a/net/lnet/klnds/socklnd/socklnd_cb.c
+++ b/net/lnet/klnds/socklnd/socklnd_cb.c
@@ -966,18 +966,6 @@ struct ksock_conn_cb *
 	return -EIO;
 }
 
-int
-ksocknal_thread_start(int (*fn)(void *arg), void *arg, char *name)
-{
-	struct task_struct *task = kthread_run(fn, arg, "%s", name);
-
-	if (IS_ERR(task))
-		return PTR_ERR(task);
-
-	atomic_inc(&ksocknal_data.ksnd_nthreads);
-	return 0;
-}
-
 void
 ksocknal_thread_fini(void)
 {
@@ -1951,7 +1939,6 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 static int
 ksocknal_connd_check_start(time64_t sec, long *timeout)
 {
-	char name[16];
 	int rc;
 	int total = ksocknal_data.ksnd_connd_starting +
 		    ksocknal_data.ksnd_connd_running;
@@ -1991,8 +1978,8 @@ void ksocknal_write_callback(struct ksock_conn *conn)
 	spin_unlock_bh(&ksocknal_data.ksnd_connd_lock);
 
 	/* NB: total is the next id */
-	snprintf(name, sizeof(name), "socknal_cd%02d", total);
-	rc = ksocknal_thread_start(ksocknal_connd, NULL, name);
+	rc = ksocknal_thread_start(ksocknal_connd, NULL,
+				   "socknal_cd%02d", total);
 
 	spin_lock_bh(&ksocknal_data.ksnd_connd_lock);
 	if (!rc)
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/25] lnet: print device status in net show command
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory James Simmons
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

A device can be in a fatal state if the cable was disconnected or the
port was brought down on the switch side. In these cases, the LND
(o2iblnd for now) will flag the device as being in a fatal state. That
device will not be used any further. However, its health will not be
decremented. This causes some confusion when examining the state of
the node. It is better to print the device status in the output of the
lnetctl net show command.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14114
Lustre-commit: f75ff33d9fbefd69 ("LU-14114 lnet: print device status in net show command")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44169
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/lnet-dlc.h | 1 +
 net/lnet/lnet/api-ni.c             | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index c1c063f..ef60224 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -190,6 +190,7 @@ struct lnet_ioctl_local_ni_hstats {
 	__u32 hlni_local_no_route;
 	__u32 hlni_local_timeout;
 	__u32 hlni_local_error;
+	__s32 hlni_fatal_error;
 	__s32 hlni_health_value;
 	__u32 hlni_ping_count;
 	__u64 hlni_next_ping;
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index ec28139..4513d8d 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3692,6 +3692,8 @@ u32 lnet_get_dlc_seq_locked(void)
 		atomic_read(&ni->ni_hstats.hlt_local_timeout);
 	stats->hlni_local_error =
 		atomic_read(&ni->ni_hstats.hlt_local_error);
+	stats->hlni_fatal_error =
+		atomic_read(&ni->ni_fatal_error_on);
 	stats->hlni_health_value =
 		atomic_read(&ni->ni_healthv);
 	stats->hlni_ping_count = ni->ni_ping_count;
-- 
1.8.3.1


* [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 05/25] lnet: print device status in net show command James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr' James Simmons
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

For getattr_name(".."), it should return the FID of the master object
for striped directories. This includes changes on both client and
server:
* lmv_getattr_name() should use the master object FID if it's looking
  up "..".
* mdt_raw_lookup() should check whether the parent object is a sub
  stripe; if so, it needs to look up again to get the master object
  FID. For an old client without the above change this needs to be
  checked twice.

This is needed by NFS export, because ll_get_parent() finds the parent
by getattr_name("..").

Reenable check_fhandle_syscall and update sanityn test_102.
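
The client-side name check can be sketched in isolation as below.
This is a userspace illustration only: name_is_dotdot(), choose_tgt()
and enum tgt_choice are hypothetical names. In the diff the two
branches call lmv_fid2tgt() on op_fid1 and lmv_locate_tgt()
respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two target lookups in the diff. */
enum tgt_choice {
	TGT_BY_FID,  /* models lmv_fid2tgt(lmv, &op_data->op_fid1) */
	TGT_BY_NAME, /* models lmv_locate_tgt(lmv, op_data) */
};

/* True only for the exact two-byte name "..". The length is passed
 * explicitly, so the name need not be NUL-terminated.
 */
static bool name_is_dotdot(const char *name, size_t namelen)
{
	assert(name != NULL);
	return namelen == 2 && name[0] == '.' && name[1] == '.';
}

/* ".." is resolved through the directory's own FID; every other
 * name goes through the normal name-based target lookup.
 */
static enum tgt_choice choose_tgt(const char *name, size_t namelen)
{
	return name_is_dotdot(name, namelen) ? TGT_BY_FID : TGT_BY_NAME;
}
```

Checking the length first means names such as "." or "..a" fall
through to the normal lookup.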

WC-bug-id: https://jira.whamcloud.com/browse/LU-14826
Lustre-commit: cbc62b0b829afdce ("LU-14826 mdt: getattr_name("..") under striped directory")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44168
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lmv/lmv_obd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 2f84028..1d9b830 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1945,7 +1945,11 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	int rc;
 
 retry:
-	tgt = lmv_locate_tgt(lmv, op_data);
+	if (op_data->op_namelen == 2 &&
+	    op_data->op_name[0] == '.' && op_data->op_name[1] == '.')
+		tgt = lmv_fid2tgt(lmv, &op_data->op_fid1);
+	else
+		tgt = lmv_locate_tgt(lmv, op_data);
 	if (IS_ERR(tgt))
 		return PTR_ERR(tgt);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr'
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check James Simmons
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

This reverts commit 248f68f27de7d18c58a44114a46259141ca53115.

This is causing process hangs and timeouts during file removal.

Fixes: 248f68f27d ("lustre: llite: simplify callback handling for async getattr")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14868
Lustre-commit: e90794af4bfac3a5 ("LU-14868 llite: revert 'simplify callback handling for async getattr'")
Reviewed-on: https://review.whamcloud.com/44371
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h          |  32 ++--
 fs/lustre/include/obd_class.h    |   4 +-
 fs/lustre/llite/llite_internal.h |   7 +-
 fs/lustre/llite/statahead.c      | 319 ++++++++++++++++++++++++++-------------
 fs/lustre/lmv/lmv_obd.c          |   6 +-
 fs/lustre/mdc/mdc_internal.h     |   3 +-
 fs/lustre/mdc/mdc_locks.c        |  31 ++--
 7 files changed, 252 insertions(+), 150 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index eeb6262..f619342 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -818,24 +818,18 @@ struct md_callback {
 			       void *data, int flag);
 };
 
-enum md_opcode {
-	MD_OP_NONE	= 0,
-	MD_OP_GETATTR	= 1,
-	MD_OP_MAX,
-};
-
-struct md_op_item {
-	enum md_opcode			mop_opc;
-	struct md_op_data		mop_data;
-	struct lookup_intent		mop_it;
-	struct lustre_handle		mop_lockh;
-	struct ldlm_enqueue_info	mop_einfo;
-	int (*mop_cb)(struct req_capsule *pill,
-		      struct md_op_item *item,
-		      int rc);
-	void			       *mop_cbdata;
-	struct inode		       *mop_dir;
-	u64				mop_lock_flags;
+struct md_enqueue_info;
+/* metadata stat-ahead */
+
+struct md_enqueue_info {
+	struct md_op_data		mi_data;
+	struct lookup_intent		mi_it;
+	struct lustre_handle		mi_lockh;
+	struct inode		       *mi_dir;
+	struct ldlm_enqueue_info	mi_einfo;
+	int (*mi_cb)(struct ptlrpc_request *req,
+		     struct md_enqueue_info *minfo, int rc);
+	void			       *mi_cbdata;
 };
 
 struct obd_ops {
@@ -1067,7 +1061,7 @@ struct md_ops {
 				struct lu_fid *fid);
 
 	int (*intent_getattr_async)(struct obd_export *exp,
-				    struct md_op_item *item);
+				    struct md_enqueue_info *minfo);
 
 	int (*revalidate_lock)(struct obd_export *, struct lookup_intent *,
 			       struct lu_fid *, u64 *bits);
diff --git a/fs/lustre/include/obd_class.h b/fs/lustre/include/obd_class.h
index ad9b2fc..f2a3d2b 100644
--- a/fs/lustre/include/obd_class.h
+++ b/fs/lustre/include/obd_class.h
@@ -1594,7 +1594,7 @@ static inline int md_init_ea_size(struct obd_export *exp, u32 easize,
 }
 
 static inline int md_intent_getattr_async(struct obd_export *exp,
-					  struct md_op_item *item)
+					  struct md_enqueue_info *minfo)
 {
 	int rc;
 
@@ -1605,7 +1605,7 @@ static inline int md_intent_getattr_async(struct obd_export *exp,
 	lprocfs_counter_incr(exp->exp_obd->obd_md_stats,
 			     LPROC_MD_INTENT_GETATTR_ASYNC);
 
-	return MDP(exp->exp_obd, intent_getattr_async)(exp, item);
+	return MDP(exp->exp_obd, intent_getattr_async)(exp, minfo);
 }
 
 static inline int md_revalidate_lock(struct obd_export *exp,
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 6cae741..2247806 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -1480,12 +1480,17 @@ struct ll_statahead_info {
 					     * is not a hidden one
 					     */
 	unsigned int	    sai_skip_hidden;/* skipped hidden dentry count */
-	unsigned int	    sai_ls_all:1;   /* "ls -al", do stat-ahead for
+	unsigned int	    sai_ls_all:1,   /* "ls -al", do stat-ahead for
 					     * hidden entries
 					     */
+				sai_in_readpage:1;/* statahead in readdir() */
 	wait_queue_head_t	sai_waitq;      /* stat-ahead wait queue */
 	struct task_struct     *sai_task;       /* stat-ahead thread */
 	struct task_struct     *sai_agl_task;   /* AGL thread */
+	struct list_head	sai_interim_entries; /* entries which got async
+						      * stat reply, but not
+						      * instantiated
+						      */
 	struct list_head	sai_entries;	/* completed entries */
 	struct list_head	sai_agls;	/* AGLs to be sent */
 	struct list_head	sai_cache[LL_SA_CACHE_SIZE];
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index becd0e1..8930f61 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -32,6 +32,7 @@
 
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/kthread.h>
 #include <linux/mm.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
@@ -55,12 +56,13 @@ enum se_stat {
 
 /*
  * sa_entry is not refcounted: statahead thread allocates it and do async stat,
- * and in async stat callback ll_statahead_interpret() will prepare the inode
- * and set lock data in the ptlrpcd context. Then the scanner process will be
- * woken up if this entry is the waiting one, can access and free it.
+ * and in async stat callback ll_statahead_interpret() will add it into
+ * sai_interim_entries, later statahead thread will call sa_handle_callback() to
+ * instantiate entry and move it into sai_entries, and then only scanner process
+ * can access and free it.
  */
 struct sa_entry {
-	/* link into sai_entries */
+	/* link into sai_interim_entries or sai_entries */
 	struct list_head	se_list;
 	/* link into sai hash table locally */
 	struct list_head	se_hash;
@@ -72,6 +74,10 @@ struct sa_entry {
 	enum se_stat		se_state;
 	/* entry size, contains name */
 	int			se_size;
+	/* pointer to async getattr enqueue info */
+	struct md_enqueue_info	*se_minfo;
+	/* pointer to the async getattr request */
+	struct ptlrpc_request	*se_req;
 	/* pointer to the target inode */
 	struct inode		*se_inode;
 	/* entry name */
@@ -131,6 +137,12 @@ static inline int sa_sent_full(struct ll_statahead_info *sai)
 	return atomic_read(&sai->sai_cache_count) >= sai->sai_max;
 }
 
+/* got async stat replies */
+static inline int sa_has_callback(struct ll_statahead_info *sai)
+{
+	return !list_empty(&sai->sai_interim_entries);
+}
+
 static inline int agl_list_empty(struct ll_statahead_info *sai)
 {
 	return list_empty(&sai->sai_agls);
@@ -316,55 +328,55 @@ static void sa_free(struct ll_statahead_info *sai, struct sa_entry *entry)
 }
 
 /* finish async stat RPC arguments */
-static void sa_fini_data(struct md_op_item *item)
+static void sa_fini_data(struct md_enqueue_info *minfo)
 {
-	ll_unlock_md_op_lsm(&item->mop_data);
-	iput(item->mop_dir);
-	kfree(item);
+	ll_unlock_md_op_lsm(&minfo->mi_data);
+	iput(minfo->mi_dir);
+	kfree(minfo);
 }
 
-static int ll_statahead_interpret(struct req_capsule *pill,
-				  struct md_op_item *item, int rc);
+static int ll_statahead_interpret(struct ptlrpc_request *req,
+				  struct md_enqueue_info *minfo, int rc);
 
 /*
  * prepare arguments for async stat RPC.
  */
-static struct md_op_item *
+static struct md_enqueue_info *
 sa_prep_data(struct inode *dir, struct inode *child, struct sa_entry *entry)
 {
-	struct md_op_item *item;
+	struct md_enqueue_info   *minfo;
 	struct ldlm_enqueue_info *einfo;
-	struct md_op_data *op_data;
+	struct md_op_data        *op_data;
 
-	item = kzalloc(sizeof(*item), GFP_NOFS);
-	if (!item)
+	minfo = kzalloc(sizeof(*minfo), GFP_NOFS);
+	if (!minfo)
 		return ERR_PTR(-ENOMEM);
 
-	op_data = ll_prep_md_op_data(&item->mop_data, dir, child,
+	op_data = ll_prep_md_op_data(&minfo->mi_data, dir, child,
 				     entry->se_qstr.name, entry->se_qstr.len, 0,
 				     LUSTRE_OPC_ANY, NULL);
 	if (IS_ERR(op_data)) {
-		kfree(item);
-		return ERR_CAST(op_data);
+		kfree(minfo);
+		return (struct md_enqueue_info *)op_data;
 	}
 
 	if (!child)
 		op_data->op_fid2 = entry->se_fid;
 
-	item->mop_it.it_op = IT_GETATTR;
-	item->mop_dir = igrab(dir);
-	item->mop_cb = ll_statahead_interpret;
-	item->mop_cbdata = entry;
-
-	einfo = &item->mop_einfo;
-	einfo->ei_type = LDLM_IBITS;
-	einfo->ei_mode = it_to_lock_mode(&item->mop_it);
-	einfo->ei_cb_bl = ll_md_blocking_ast;
-	einfo->ei_cb_cp = ldlm_completion_ast;
-	einfo->ei_cb_gl = NULL;
+	minfo->mi_it.it_op = IT_GETATTR;
+	minfo->mi_dir = igrab(dir);
+	minfo->mi_cb = ll_statahead_interpret;
+	minfo->mi_cbdata = entry;
+
+	einfo = &minfo->mi_einfo;
+	einfo->ei_type   = LDLM_IBITS;
+	einfo->ei_mode   = it_to_lock_mode(&minfo->mi_it);
+	einfo->ei_cb_bl  = ll_md_blocking_ast;
+	einfo->ei_cb_cp  = ldlm_completion_ast;
+	einfo->ei_cb_gl  = NULL;
 	einfo->ei_cbdata = NULL;
 
-	return item;
+	return minfo;
 }
 
 /*
@@ -375,8 +387,22 @@ static int ll_statahead_interpret(struct req_capsule *pill,
 sa_make_ready(struct ll_statahead_info *sai, struct sa_entry *entry, int ret)
 {
 	struct ll_inode_info *lli = ll_i2info(sai->sai_dentry->d_inode);
+	struct md_enqueue_info *minfo = entry->se_minfo;
+	struct ptlrpc_request *req = entry->se_req;
 	bool wakeup;
 
+	/* release resources used in RPC */
+	if (minfo) {
+		entry->se_minfo = NULL;
+		ll_intent_release(&minfo->mi_it);
+		sa_fini_data(minfo);
+	}
+
+	if (req) {
+		entry->se_req = NULL;
+		ptlrpc_req_finished(req);
+	}
+
 	spin_lock(&lli->lli_sa_lock);
 	wakeup = __sa_make_ready(sai, entry, ret);
 	spin_unlock(&lli->lli_sa_lock);
@@ -433,6 +459,7 @@ static struct ll_statahead_info *ll_sai_alloc(struct dentry *dentry)
 	sai->sai_index = 1;
 	init_waitqueue_head(&sai->sai_waitq);
 
+	INIT_LIST_HEAD(&sai->sai_interim_entries);
 	INIT_LIST_HEAD(&sai->sai_entries);
 	INIT_LIST_HEAD(&sai->sai_agls);
 
@@ -495,6 +522,7 @@ static void ll_sai_put(struct ll_statahead_info *sai)
 		LASSERT(sai->sai_task == NULL);
 		LASSERT(sai->sai_agl_task == NULL);
 		LASSERT(sai->sai_sent == sai->sai_replied);
+		LASSERT(!sa_has_callback(sai));
 
 		list_for_each_entry_safe(entry, next, &sai->sai_entries,
 					 se_list)
@@ -585,63 +613,26 @@ static void ll_agl_trigger(struct inode *inode, struct ll_statahead_info *sai)
 }
 
 /*
- * Callback for async stat RPC, this is called in ptlrpcd context. It prepares
- * the inode and set lock data directly in the ptlrpcd context. It will wake up
- * the directory listing process if the dentry is the waiting one.
+ * prepare inode for sa entry, add it into agl list, now sa_entry is ready
+ * to be used by scanner process.
  */
-static int ll_statahead_interpret(struct req_capsule *pill,
-				  struct md_op_item *item, int rc)
+static void sa_instantiate(struct ll_statahead_info *sai,
+			   struct sa_entry *entry)
 {
-	struct lookup_intent *it = &item->mop_it;
-	struct inode *dir = item->mop_dir;
-	struct ll_inode_info *lli = ll_i2info(dir);
-	struct ll_statahead_info *sai = lli->lli_sai;
-	struct sa_entry *entry = (struct sa_entry *)item->mop_cbdata;
-	struct mdt_body	*body;
+	struct inode *dir = sai->sai_dentry->d_inode;
 	struct inode *child;
-	u64 handle = 0;
-
-	if (it_disposition(it, DISP_LOOKUP_NEG))
-		rc = -ENOENT;
-
-	/*
-	 * because statahead thread will wait for all inflight RPC to finish,
-	 * sai should be always valid, no need to refcount
-	 */
-	LASSERT(sai);
-	LASSERT(entry);
-
-	CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
-	       entry->se_qstr.len, entry->se_qstr.name, rc);
-
-	if (rc != 0) {
-		ll_intent_release(it);
-		sa_fini_data(item);
-	} else {
-		/*
-		 * release ibits lock ASAP to avoid deadlock when statahead
-		 * thread enqueues lock on parent in readdir and another
-		 * process enqueues lock on child with parent lock held, eg.
-		 * unlink.
-		 */
-		handle = it->it_lock_handle;
-		ll_intent_drop_lock(it);
-		ll_unlock_md_op_lsm(&item->mop_data);
-	}
-
-	if (rc != 0) {
-		spin_lock(&lli->lli_sa_lock);
-		if (__sa_make_ready(sai, entry, rc))
-			wake_up(&sai->sai_waitq);
-
-		sai->sai_replied++;
-		spin_unlock(&lli->lli_sa_lock);
+	struct md_enqueue_info *minfo;
+	struct lookup_intent *it;
+	struct ptlrpc_request *req;
+	struct mdt_body	*body;
+	int rc = 0;
 
-		return rc;
-	}
+	LASSERT(entry->se_handle != 0);
 
-	entry->se_handle = handle;
-	body = req_capsule_server_get(pill, &RMF_MDT_BODY);
+	minfo = entry->se_minfo;
+	it = &minfo->mi_it;
+	req = entry->se_req;
+	body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
 	if (!body) {
 		rc = -EFAULT;
 		goto out;
@@ -649,7 +640,7 @@ static int ll_statahead_interpret(struct req_capsule *pill,
 
 	child = entry->se_inode;
 	/* revalidate; unlinked and re-created with the same name */
-	if (unlikely(!lu_fid_eq(&item->mop_data.op_fid2, &body->mbo_fid1))) {
+	if (unlikely(!lu_fid_eq(&minfo->mi_data.op_fid2, &body->mbo_fid1))) {
 		if (child) {
 			entry->se_inode = NULL;
 			iput(child);
@@ -666,7 +657,7 @@ static int ll_statahead_interpret(struct req_capsule *pill,
 		goto out;
 	}
 
-	rc = ll_prep_inode(&child, pill, dir->i_sb, it);
+	rc = ll_prep_inode(&child, &req->rq_pill, dir->i_sb, it);
 	if (rc)
 		goto out;
 
@@ -679,18 +670,107 @@ static int ll_statahead_interpret(struct req_capsule *pill,
 
 	if (agl_should_run(sai, child))
 		ll_agl_add(sai, child, entry->se_index);
+
 out:
 	/*
-	 * First it will drop ldlm ibits lock refcount by calling
+	 * sa_make_ready() will drop ldlm ibits lock refcount by calling
 	 * ll_intent_drop_lock() in spite of failures. Do not worry about
 	 * calling ll_intent_drop_lock() more than once.
 	 */
-	ll_intent_release(&item->mop_it);
-	sa_fini_data(item);
 	sa_make_ready(sai, entry, rc);
+}
+
+/* once there are async stat replies, instantiate sa_entry from replies */
+static void sa_handle_callback(struct ll_statahead_info *sai)
+{
+	struct ll_inode_info *lli;
+
+	lli = ll_i2info(sai->sai_dentry->d_inode);
 
 	spin_lock(&lli->lli_sa_lock);
+	while (sa_has_callback(sai)) {
+		struct sa_entry *entry;
+
+		entry = list_first_entry(&sai->sai_interim_entries,
+					 struct sa_entry, se_list);
+		list_del_init(&entry->se_list);
+		spin_unlock(&lli->lli_sa_lock);
+
+		sa_instantiate(sai, entry);
+		spin_lock(&lli->lli_sa_lock);
+	}
+	spin_unlock(&lli->lli_sa_lock);
+}
+
+/*
+ * callback for async stat RPC, because this is called in ptlrpcd context, we
+ * only put sa_entry in sai_interim_entries, and wake up statahead thread to
+ * really prepare inode and instantiate sa_entry later.
+ */
+static int ll_statahead_interpret(struct ptlrpc_request *req,
+				  struct md_enqueue_info *minfo, int rc)
+{
+	struct lookup_intent *it = &minfo->mi_it;
+	struct inode *dir = minfo->mi_dir;
+	struct ll_inode_info *lli = ll_i2info(dir);
+	struct ll_statahead_info *sai = lli->lli_sai;
+	struct sa_entry *entry = (struct sa_entry *)minfo->mi_cbdata;
+	u64 handle = 0;
+
+	if (it_disposition(it, DISP_LOOKUP_NEG))
+		rc = -ENOENT;
+
+	/*
+	 * because statahead thread will wait for all inflight RPC to finish,
+	 * sai should be always valid, no need to refcount
+	 */
+	LASSERT(sai);
+	LASSERT(entry);
+
+	CDEBUG(D_READA, "sa_entry %.*s rc %d\n",
+	       entry->se_qstr.len, entry->se_qstr.name, rc);
+
+	if (rc) {
+		ll_intent_release(it);
+		sa_fini_data(minfo);
+	} else {
+		/*
+		 * release ibits lock ASAP to avoid deadlock when statahead
+		 * thread enqueues lock on parent in readdir and another
+		 * process enqueues lock on child with parent lock held, eg.
+		 * unlink.
+		 */
+		handle = it->it_lock_handle;
+		ll_intent_drop_lock(it);
+		ll_unlock_md_op_lsm(&minfo->mi_data);
+	}
+
+	spin_lock(&lli->lli_sa_lock);
+	if (rc) {
+		if (__sa_make_ready(sai, entry, rc))
+			wake_up(&sai->sai_waitq);
+	} else {
+		int first = 0;
+
+		entry->se_minfo = minfo;
+		entry->se_req = ptlrpc_request_addref(req);
+		/*
+		 * Release the async ibits lock ASAP to avoid deadlock
+		 * when statahead thread tries to enqueue lock on parent
+		 * for readpage and other tries to enqueue lock on child
+		 * with parent's lock held, for example: unlink.
+		 */
+		entry->se_handle = handle;
+		if (!sa_has_callback(sai))
+			first = 1;
+
+		list_add_tail(&entry->se_list, &sai->sai_interim_entries);
+
+		if (first && sai->sai_task)
+			wake_up_process(sai->sai_task);
+	}
 	sai->sai_replied++;
+
 	spin_unlock(&lli->lli_sa_lock);
 
 	return rc;
@@ -699,16 +779,16 @@ static int ll_statahead_interpret(struct req_capsule *pill,
 /* async stat for file not found in dcache */
 static int sa_lookup(struct inode *dir, struct sa_entry *entry)
 {
-	struct md_op_item *item;
+	struct md_enqueue_info *minfo;
 	int rc;
 
-	item = sa_prep_data(dir, NULL, entry);
-	if (IS_ERR(item))
-		return PTR_ERR(item);
+	minfo = sa_prep_data(dir, NULL, entry);
+	if (IS_ERR(minfo))
+		return PTR_ERR(minfo);
 
-	rc = md_intent_getattr_async(ll_i2mdexp(dir), item);
+	rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo);
 	if (rc)
-		sa_fini_data(item);
+		sa_fini_data(minfo);
 
 	return rc;
 }
@@ -728,7 +808,7 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 		.it_op = IT_GETATTR,
 		.it_lock_handle = 0
 	};
-	struct md_op_item *item;
+	struct md_enqueue_info *minfo;
 	int rc;
 
 	if (unlikely(!inode))
@@ -737,9 +817,9 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 	if (d_mountpoint(dentry))
 		return 1;
 
-	item = sa_prep_data(dir, inode, entry);
-	if (IS_ERR(item))
-		return PTR_ERR(item);
+	minfo = sa_prep_data(dir, inode, entry);
+	if (IS_ERR(minfo))
+		return PTR_ERR(minfo);
 
 	entry->se_inode = igrab(inode);
 	rc = md_revalidate_lock(ll_i2mdexp(dir), &it, ll_inode2fid(inode),
@@ -747,15 +827,15 @@ static int sa_revalidate(struct inode *dir, struct sa_entry *entry,
 	if (rc == 1) {
 		entry->se_handle = it.it_lock_handle;
 		ll_intent_release(&it);
-		sa_fini_data(item);
+		sa_fini_data(minfo);
 		return 1;
 	}
 
-	rc = md_intent_getattr_async(ll_i2mdexp(dir), item);
+	rc = md_intent_getattr_async(ll_i2mdexp(dir), minfo);
 	if (rc) {
 		entry->se_inode = NULL;
 		iput(inode);
-		sa_fini_data(item);
+		sa_fini_data(minfo);
 	}
 
 	return rc;
@@ -815,6 +895,9 @@ static int ll_agl_thread(void *arg)
 	while (({set_current_state(TASK_IDLE);
 		 !kthread_should_stop(); })) {
 		spin_lock(&plli->lli_agl_lock);
+		/* The statahead thread maybe help to process AGL entries,
+		 * so check whether list empty again.
+		 */
 		clli = list_first_entry_or_null(&sai->sai_agls,
 						struct ll_inode_info,
 						lli_agl_list);
@@ -852,10 +935,9 @@ static void ll_stop_agl(struct ll_statahead_info *sai)
 	kthread_stop(agl_task);
 
 	spin_lock(&plli->lli_agl_lock);
-	clli = list_first_entry_or_null(&sai->sai_agls,
-					struct ll_inode_info,
-					lli_agl_list);
-	if (clli) {
+	while ((clli = list_first_entry_or_null(&sai->sai_agls,
+						struct ll_inode_info,
+						lli_agl_list)) != NULL) {
 		list_del_init(&clli->lli_agl_list);
 		spin_unlock(&plli->lli_agl_lock);
 		clli->lli_agl_index = 0;
@@ -928,8 +1010,10 @@ static int ll_statahead_thread(void *arg)
 			break;
 		}
 
+		sai->sai_in_readpage = 1;
 		page = ll_get_dir_page(dir, op_data, pos);
 		ll_unlock_md_op_lsm(op_data);
+		sai->sai_in_readpage = 0;
 		if (IS_ERR(page)) {
 			rc = PTR_ERR(page);
 			CDEBUG(D_READA,
@@ -993,9 +1077,14 @@ static int ll_statahead_thread(void *arg)
 
 			while (({set_current_state(TASK_IDLE);
 				 sai->sai_task; })) {
+				if (sa_has_callback(sai)) {
+					__set_current_state(TASK_RUNNING);
+					sa_handle_callback(sai);
+				}
+
 				spin_lock(&lli->lli_agl_lock);
 				while (sa_sent_full(sai) &&
-				       !list_empty(&sai->sai_agls)) {
+				       !agl_list_empty(sai)) {
 					struct ll_inode_info *clli;
 
 					__set_current_state(TASK_RUNNING);
@@ -1047,11 +1136,16 @@ static int ll_statahead_thread(void *arg)
 
 	/*
 	 * statahead is finished, but statahead entries need to be cached, wait
-	 * for file release closedir() call to stop me.
+	 * for file release to stop me.
 	 */
 	while (({set_current_state(TASK_IDLE);
 		 sai->sai_task; })) {
-		schedule();
+		if (sa_has_callback(sai)) {
+			__set_current_state(TASK_RUNNING);
+			sa_handle_callback(sai);
+		} else {
+			schedule();
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 out:
@@ -1061,9 +1155,13 @@ static int ll_statahead_thread(void *arg)
 	 * wait for inflight statahead RPCs to finish, and then we can free sai
 	 * safely because statahead RPC will access sai data
 	 */
-	while (sai->sai_sent != sai->sai_replied)
+	while (sai->sai_sent != sai->sai_replied) {
 		/* in case we're not woken up, timeout wait */
 		msleep(125);
+	}
+
+	/* release resources held by statahead RPCs */
+	sa_handle_callback(sai);
 
 	CDEBUG(D_READA, "statahead thread stopped: sai %p, parent %pd\n",
 	       sai, parent);
@@ -1325,6 +1423,10 @@ static int revalidate_statahead_dentry(struct inode *dir,
 		goto out_unplug;
 	}
 
+	/* if statahead is busy in readdir, help it do post-work */
+	if (!sa_ready(entry) && sai->sai_in_readpage)
+		sa_handle_callback(sai);
+
 	if (!sa_ready(entry)) {
 		spin_lock(&lli->lli_sa_lock);
 		sai->sai_index_wait = entry->se_index;
@@ -1497,7 +1599,6 @@ static int start_statahead_thread(struct inode *dir, struct dentry *dentry,
 	sai->sai_task = task;
 
 	wake_up_process(task);
-
 	/*
 	 * We don't stat-ahead for the first dirent since we are already in
 	 * lookup.
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 1d9b830..71bf7811 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -3438,9 +3438,9 @@ static int lmv_clear_open_replay_data(struct obd_export *exp,
 }
 
 static int lmv_intent_getattr_async(struct obd_export *exp,
-				    struct md_op_item *item)
+				    struct md_enqueue_info *minfo)
 {
-	struct md_op_data *op_data = &item->mop_data;
+	struct md_op_data *op_data = &minfo->mi_data;
 	struct obd_device *obd = exp->exp_obd;
 	struct lmv_obd *lmv = &obd->u.lmv;
 	struct lmv_tgt_desc *ptgt = NULL;
@@ -3464,7 +3464,7 @@ static int lmv_intent_getattr_async(struct obd_export *exp,
 	if (ctgt != ptgt)
 		return -EREMOTE;
 
-	return md_intent_getattr_async(ptgt->ltd_exp, item);
+	return md_intent_getattr_async(ptgt->ltd_exp, minfo);
 }
 
 static int lmv_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
diff --git a/fs/lustre/mdc/mdc_internal.h b/fs/lustre/mdc/mdc_internal.h
index 2416607..fab40bd 100644
--- a/fs/lustre/mdc/mdc_internal.h
+++ b/fs/lustre/mdc/mdc_internal.h
@@ -130,7 +130,8 @@ int mdc_cancel_unused(struct obd_export *exp, const struct lu_fid *fid,
 int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 			struct lu_fid *fid, u64 *bits);
 
-int mdc_intent_getattr_async(struct obd_export *exp, struct md_op_item *item);
+int mdc_intent_getattr_async(struct obd_export *exp,
+			     struct md_enqueue_info *minfo);
 
 enum ldlm_mode mdc_lock_match(struct obd_export *exp, u64 flags,
 			      const struct lu_fid *fid, enum ldlm_type type,
diff --git a/fs/lustre/mdc/mdc_locks.c b/fs/lustre/mdc/mdc_locks.c
index a0fcab0..4135c3a 100644
--- a/fs/lustre/mdc/mdc_locks.c
+++ b/fs/lustre/mdc/mdc_locks.c
@@ -49,7 +49,7 @@
 
 struct mdc_getattr_args {
 	struct obd_export	*ga_exp;
-	struct md_op_item	*ga_item;
+	struct md_enqueue_info	*ga_minfo;
 };
 
 int it_open_error(int phase, struct lookup_intent *it)
@@ -1360,10 +1360,10 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 {
 	struct mdc_getattr_args *ga = args;
 	struct obd_export *exp = ga->ga_exp;
-	struct md_op_item *item = ga->ga_item;
-	struct ldlm_enqueue_info *einfo = &item->mop_einfo;
-	struct lookup_intent *it = &item->mop_it;
-	struct lustre_handle *lockh = &item->mop_lockh;
+	struct md_enqueue_info *minfo = ga->ga_minfo;
+	struct ldlm_enqueue_info *einfo = &minfo->mi_einfo;
+	struct lookup_intent *it = &minfo->mi_it;
+	struct lustre_handle *lockh = &minfo->mi_lockh;
 	struct ldlm_reply *lockrep;
 	u64 flags = LDLM_FL_HAS_INTENT;
 
@@ -1388,17 +1388,18 @@ static int mdc_intent_getattr_async_interpret(const struct lu_env *env,
 	if (rc)
 		goto out;
 
-	rc = mdc_finish_intent_lock(exp, req, &item->mop_data, it, lockh);
+	rc = mdc_finish_intent_lock(exp, req, &minfo->mi_data, it, lockh);
+
 out:
-	item->mop_cb(&req->rq_pill, item, rc);
+	minfo->mi_cb(req, minfo, rc);
 	return 0;
 }
 
 int mdc_intent_getattr_async(struct obd_export *exp,
-			     struct md_op_item *item)
+			     struct md_enqueue_info *minfo)
 {
-	struct md_op_data *op_data = &item->mop_data;
-	struct lookup_intent *it = &item->mop_it;
+	struct md_op_data *op_data = &minfo->mi_data;
+	struct lookup_intent *it = &minfo->mi_it;
 	struct ptlrpc_request *req;
 	struct mdc_getattr_args *ga;
 	struct ldlm_res_id res_id;
@@ -1427,11 +1428,11 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 	 * to avoid possible races. It is safe to have glimpse handler
 	 * for non-DOM locks and costs nothing.
 	 */
-	if (!item->mop_einfo.ei_cb_gl)
-		item->mop_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast;
+	if (!minfo->mi_einfo.ei_cb_gl)
+		minfo->mi_einfo.ei_cb_gl = mdc_ldlm_glimpse_ast;
 
-	rc = ldlm_cli_enqueue(exp, &req, &item->mop_einfo, &res_id, &policy,
-			      &flags, NULL, 0, LVB_T_NONE, &item->mop_lockh, 1);
+	rc = ldlm_cli_enqueue(exp, &req, &minfo->mi_einfo, &res_id, &policy,
+			      &flags, NULL, 0, LVB_T_NONE, &minfo->mi_lockh, 1);
 	if (rc < 0) {
 		ptlrpc_req_finished(req);
 		return rc;
@@ -1439,7 +1440,7 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 
 	ga = ptlrpc_req_async_args(ga, req);
 	ga->ga_exp = exp;
-	ga->ga_item = item;
+	ga->ga_minfo = minfo;
 
 	req->rq_interpret_reply = mdc_intent_getattr_async_interpret;
 	ptlrpcd_add_req(req);
-- 
1.8.3.1
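
The deferral restored by the statahead diff above (the interpret callback
only queues the entry, and the statahead thread instantiates it later) can
be sketched in user space as follows. This is a minimal illustration, not
the Lustre code: struct entry, struct interim, interim_add() and
interim_drain() are hypothetical names, and a pthread mutex stands in for
lli_sa_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical entry; "ready" stands in for the instantiated state. */
struct entry {
	struct entry *next;
	int ready;
};

/* Hypothetical interim queue shared by the callback and the worker. */
struct interim {
	pthread_mutex_t lock;
	struct entry *head;
	struct entry **tail;
};

static void interim_init(struct interim *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tail = &q->head;
}

/*
 * Callback side: only queue the entry. Returns 1 if the queue was empty
 * before the add, i.e. the worker needs a wake-up.
 */
static int interim_add(struct interim *q, struct entry *e)
{
	int first;

	assert(e);
	e->next = NULL;
	pthread_mutex_lock(&q->lock);
	first = (q->head == NULL);
	*q->tail = e;
	q->tail = &e->next;
	pthread_mutex_unlock(&q->lock);

	return first;
}

/* Worker side: drain the queue, dropping the lock around the heavy work. */
static int interim_drain(struct interim *q)
{
	int handled = 0;

	pthread_mutex_lock(&q->lock);
	while (q->head) {
		struct entry *e = q->head;

		q->head = e->next;
		if (!q->head)
			q->tail = &q->head;
		pthread_mutex_unlock(&q->lock);

		e->ready = 1;	/* placeholder for the expensive step */
		handled++;

		pthread_mutex_lock(&q->lock);
	}
	pthread_mutex_unlock(&q->lock);

	return handled;
}
```

The return value of interim_add() plays the role of the "first" flag in
the diff: only the add that finds the queue empty has to wake the worker.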

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr' James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service James Simmons
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

The discovery thread can modify the peer NI/peer net/peer relationship,
so we need to be careful when dereferencing the peer NI pointer in
lnet_health_check(). The discovery thread operates under the net lock, so
move the peer NI dereference under the net lock, which is already taken
for incrementing the health stats.

Move some of the other code that is only relevant for messages with a
health status != LNET_MSG_STATUS_OK under the appropriate condition.
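
As a rough user-space illustration of the pattern (not the LNet code), the
sketch below walks a pointer chain only while holding the lock that
writers also take, and NULL-checks every link. struct peer, struct
peer_net, struct peer_ni, net_lock and both functions are hypothetical
stand-ins, and a pthread mutex replaces the net lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the peer / peer net / peer NI hierarchy. */
struct peer     { int nnis; };
struct peer_net { struct peer *peer; };
struct peer_ni  { struct peer_net *peer_net; };

/* Stand-in for the net lock; readers and writers both take it. */
static pthread_mutex_t net_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side: detach a peer NI from its peer net under the lock. */
static void peer_ni_detach(struct peer_ni *pni)
{
	assert(pni);
	pthread_mutex_lock(&net_lock);
	pni->peer_net = NULL;
	pthread_mutex_unlock(&net_lock);
}

/*
 * Reader side: return true when the peer reached through @pni has at
 * most one interface. The whole chain is walked with net_lock held and
 * each link is checked, so a detach done under the same lock is seen
 * either entirely or not at all.
 */
static bool peer_is_single_rail(struct peer_ni *pni)
{
	bool single = false;

	pthread_mutex_lock(&net_lock);
	if (pni && pni->peer_net && pni->peer_net->peer &&
	    pni->peer_net->peer->nnis <= 1)
		single = true;
	pthread_mutex_unlock(&net_lock);

	return single;
}
```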

HPE-bug-id: LUS-9962
WC-bug-id: https://jira.whamcloud.com/browse/LU-14655
Lustre-commit: d87af24452a2e883 ("LU-14655 lnet: Protect lpni deref in lnet_health_check")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43503
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/lib-msg.c | 71 ++++++++++++++++++++++++++-----------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/net/lnet/lnet/lib-msg.c b/net/lnet/lnet/lib-msg.c
index 580ddf6..e471848 100644
--- a/net/lnet/lnet/lib-msg.c
+++ b/net/lnet/lnet/lib-msg.c
@@ -821,38 +821,6 @@
 		attempt_remote_resend = false;
 	}
 
-	/* Don't further decrement the health value if a recovery message
-	 * failed.
-	 */
-	if (msg->msg_recovery) {
-		handle_local_health = false;
-		handle_remote_health = false;
-	} else {
-		handle_local_health = true;
-		handle_remote_health = true;
-	}
-
-	/* For local failures, health/recovery/resends are not needed if I only
-	 * have a single (non-lolnd) interface. NB: pb_nnis includes the lolnd
-	 * interface, so a single-rail node would have pb_nnis == 2.
-	 */
-	if (the_lnet.ln_ping_target->pb_nnis <= 2) {
-		handle_local_health = false;
-		attempt_local_resend = false;
-	}
-
-	/* For remote failures, health/recovery/resends are not needed if the
-	 * peer only has a single interface. Special case for routers where we
-	 * rely on health feature to manage route aliveness. NB: unlike pb_nnis
-	 * above, lp_nnis does _not_ include the lolnd, so a single-rail node
-	 * would have lp_nnis == 1.
-	 */
-	if (lpni && lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) {
-		attempt_remote_resend = false;
-		if (!lnet_isrouter(lpni))
-			handle_remote_health = false;
-	}
-
 	if (!lo)
 		LASSERT(ni && lpni);
 	else
@@ -865,11 +833,48 @@
 	       lnet_health_error2str(hstatus));
 
 	/* stats are only incremented for errors so avoid wasting time
-	 * incrementing statistics if there is no error.
+	 * incrementing statistics if there is no error. Similarly, whether to
+	 * update health values or perform resends is only applicable for
+	 * messages with a health status != OK.
 	 */
 	if (hstatus != LNET_MSG_STATUS_OK) {
+		/* Don't further decrement the health value if a recovery
+		 * message failed.
+		 */
+		if (msg->msg_recovery) {
+			handle_local_health = false;
+			handle_remote_health = false;
+		} else {
+			handle_local_health = true;
+			handle_remote_health = true;
+		}
+
+		/* For local failures, health/recovery/resends are not needed if
+		 * I only have a single (non-lolnd) interface. NB: pb_nnis
+		 * includes the lolnd interface, so a single-rail node would
+		 * have pb_nnis == 2.
+		 */
+		if (the_lnet.ln_ping_target->pb_nnis <= 2) {
+			handle_local_health = false;
+			attempt_local_resend = false;
+		}
+
 		lnet_net_lock(0);
 		lnet_incr_hstats(ni, lpni, hstatus);
+		/* For remote failures, health/recovery/resends are not needed
+		 * if the peer only has a single interface. Special case for
+		 * routers where we rely on health feature to manage route
+		 * aliveness. NB: unlike pb_nnis above, lp_nnis does _not_
+		 * include the lolnd, so a single-rail node would have
+		 * lp_nnis == 1.
+		 */
+		if (lpni && lpni->lpni_peer_net &&
+		    lpni->lpni_peer_net->lpn_peer &&
+		    lpni->lpni_peer_net->lpn_peer->lp_nnis <= 1) {
+			attempt_remote_resend = false;
+			if (!lnet_isrouter(lpni))
+				handle_remote_health = false;
+		}
 		lnet_net_unlock(0);
 	}
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting James Simmons
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Andreas Dilger <adilger@whamcloud.com>

Remove the MDS_SETATTR_PORTAL and the service threads listening on
this portal. They have been unused since Lustre 2.1 and are no longer
needed.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13326
Lustre-commit: 7a2ef25f1f259c0a ("LU-13326 mds: remove MDS_SETATTR_PORTAL and service")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/37798
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_idl.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 2047b92..65948d8 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -93,15 +93,11 @@
 
 #define CONNMGR_REQUEST_PORTAL	1
 #define CONNMGR_REPLY_PORTAL	2
-/* #define OSC_REQUEST_PORTAL	3 */
 #define OSC_REPLY_PORTAL	4
-/* #define OSC_BULK_PORTAL	5 */
 #define OST_IO_PORTAL		6
 #define OST_CREATE_PORTAL	7
 #define OST_BULK_PORTAL		8
-/* #define MDC_REQUEST_PORTAL	9 */
 #define MDC_REPLY_PORTAL	10
-/* #define MDC_BULK_PORTAL	11 */
 #define MDS_REQUEST_PORTAL	12
 #define MDS_IO_PORTAL		13
 #define MDS_BULK_PORTAL		14
@@ -109,10 +105,7 @@
 #define LDLM_CB_REPLY_PORTAL	16
 #define LDLM_CANCEL_REQUEST_PORTAL 17
 #define LDLM_CANCEL_REPLY_PORTAL   18
-/* #define PTLBD_REQUEST_PORTAL	19 */
-/* #define PTLBD_REPLY_PORTAL	20 */
-/* #define PTLBD_BULK_PORTAL	21 */
-#define MDS_SETATTR_PORTAL	22
+/* #define MDS_SETATTR_PORTAL	22 obsolete after 2.13 */
 #define MDS_READPAGE_PORTAL	23
 #define OUT_PORTAL		24
 #define MGC_REPLY_PORTAL	25
-- 
1.8.3.1

* [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting James Simmons
                   ` (20 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

For DIO pages, it's enough to have a reference on the
cl_object associated with the AIO.  This avoids taking a
reference on the cl_object for each page, which saves about
5% of the time when doing DIO/AIO.

This is possible because the lifetime of the aio struct
always exceeds that of the associated pages.
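
The idea can be sketched in plain C as follows. This is only an
illustration under simplifying assumptions: struct object, struct dio and
the helper functions are hypothetical names, and the counter is not
atomic as a real kernel refcount would be.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, standing in for a cl_object. */
struct object {
	int refcount;	/* not atomic; illustration only */
};

static void object_get(struct object *obj)
{
	obj->refcount++;
}

static void object_put(struct object *obj)
{
	assert(obj->refcount > 0);
	obj->refcount--;
}

/* Hypothetical per-I/O container, standing in for the aio struct. */
struct dio {
	struct object *obj;
	int npages;
};

/* One reference is taken when the container is created ... */
static struct dio *dio_alloc(struct object *obj)
{
	struct dio *dio = calloc(1, sizeof(*dio));

	if (!dio)
		return NULL;
	object_get(obj);
	dio->obj = obj;
	return dio;
}

/* ... pages rely on it instead of taking their own ... */
static void dio_add_page(struct dio *dio)
{
	dio->npages++;
}

/* ... and it is dropped once, after all pages are done. */
static void dio_free(struct dio *dio)
{
	if (!dio)
		return;
	object_put(dio->obj);
	free(dio);
}
```

The per-page get/put pair disappears, so the cost of reference counting
no longer scales with the number of pages in the I/O.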

This patch reduces I/O time by:
Write: 6 ms/GiB
Read: 1 ms/GiB

Resulting totals:
Write: 198 ms/GiB
Read: 197 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5030 MiB/s
read      5174 MiB/s

Plus this patch:
write     5183 MiB/s
read      5200 MiB/s
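
The ms/GiB figures follow from the IOR throughput numbers, assuming
1 GiB = 1024 MiB: time per GiB in ms is 1024 * 1000 / (MiB/s). For
example, 5183 MiB/s gives about 197.6 ms/GiB and 5030 MiB/s about
203.6 ms/GiB, a difference of roughly 6 ms/GiB. A small helper
(ms_per_gib() is a hypothetical name, not part of the patch) makes the
conversion explicit:

```c
#include <assert.h>

/* Convert a throughput in MiB/s into the time needed per GiB, in ms. */
static double ms_per_gib(double mib_per_s)
{
	assert(mib_per_s > 0.0);
	/* 1 GiB = 1024 MiB, 1 s = 1000 ms */
	return 1024.0 * 1000.0 / mib_per_s;
}
```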

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: b3de247b76b4101 ("LU-13799 llite: Modify AIO/DIO reference counting")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39442
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h |  5 +++--
 fs/lustre/llite/file.c        |  5 +++--
 fs/lustre/obdclass/cl_io.c    | 12 ++++++++----
 fs/lustre/obdclass/cl_page.c  |  6 ++++--
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 61a14f4..0f785e5 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2593,8 +2593,8 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		     int ioret);
 int cl_sync_io_wait_recycle(const struct lu_env *env, struct cl_sync_io *anchor,
 			    long timeout, int ioret);
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb);
-void cl_aio_free(struct cl_dio_aio *aio);
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj);
+void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio);
 
 static inline void cl_sync_io_init(struct cl_sync_io *anchor, int nr)
 {
@@ -2624,6 +2624,7 @@ struct cl_sync_io {
 struct cl_dio_aio {
 	struct cl_sync_io	cda_sync;
 	struct cl_page_list	cda_pages;
+	struct cl_object	*cda_obj;
 	struct kiocb		*cda_iocb;
 	ssize_t			cda_bytes;
 	unsigned int		cda_no_aio_complete:1;
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index b822ca5..1bf237b 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1656,7 +1656,8 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		if (!ll_sbi_has_parallel_dio(sbi))
 			is_parallel_dio = false;
 
-		ci_aio = cl_aio_alloc(args->u.normal.via_iocb);
+		ci_aio = cl_aio_alloc(args->u.normal.via_iocb,
+				      ll_i2info(inode)->lli_clob);
 		if (!ci_aio) {
 			rc = -ENOMEM;
 			goto out;
@@ -1814,7 +1815,7 @@ static void ll_heat_add(struct inode *inode, enum cl_io_type iot,
 		cl_sync_io_note(env, &io->ci_aio->cda_sync,
 				rc == -EIOCBQUEUED ? 0 : rc);
 		if (!is_aio) {
-			cl_aio_free(io->ci_aio);
+			cl_aio_free(env, io->ci_aio);
 			io->ci_aio = NULL;
 		}
 	}
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 63ce39c..b5e7744b 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -1131,7 +1131,7 @@ static void cl_aio_end(const struct lu_env *env, struct cl_sync_io *anchor)
 					   ret ?: aio->cda_bytes, 0);
 }
 
-struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb)
+struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb, struct cl_object *obj)
 {
 	struct cl_dio_aio *aio;
 
@@ -1147,15 +1147,19 @@ struct cl_dio_aio *cl_aio_alloc(struct kiocb *iocb)
 		cl_page_list_init(&aio->cda_pages);
 		aio->cda_iocb = iocb;
 		aio->cda_no_aio_complete = 0;
+		cl_object_get(obj);
+		aio->cda_obj = obj;
 	}
 	return aio;
 }
 EXPORT_SYMBOL(cl_aio_alloc);
 
-void cl_aio_free(struct cl_dio_aio *aio)
+void cl_aio_free(const struct lu_env *env, struct cl_dio_aio *aio)
 {
-	if (aio)
+	if (aio) {
+		cl_object_put(env, aio->cda_obj);
 		kmem_cache_free(cl_dio_aio_kmem, aio);
+	}
 }
 EXPORT_SYMBOL(cl_aio_free);
 
@@ -1196,7 +1200,7 @@ void cl_sync_io_note(const struct lu_env *env, struct cl_sync_io *anchor,
 		 * If anchor->csi_aio is set, we are responsible for freeing
 		 * memory here rather than when cl_sync_io_wait() completes.
 		 */
-		cl_aio_free(aio);
+		cl_aio_free(env, aio);
 	}
 }
 EXPORT_SYMBOL(cl_sync_io_note);
diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 1c9e91d..41bd767 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -147,7 +147,8 @@ static void cl_page_free(const struct lu_env *env, struct cl_page *cl_page,
 	cl_page->cp_layer_count = 0;
 	lu_object_ref_del_at(&obj->co_lu, &cl_page->cp_obj_ref,
 			     "cl_page", cl_page);
-	cl_object_put(env, obj);
+	if (cl_page->cp_type != CPT_TRANSIENT)
+		cl_object_put(env, obj);
 	lu_ref_fini(&cl_page->cp_reference);
 	__cl_page_free(cl_page, bufsize);
 }
@@ -227,7 +228,8 @@ struct cl_page *cl_page_alloc(const struct lu_env *env, struct cl_object *o,
 		BUILD_BUG_ON((1 << CP_TYPE_BITS) < CPT_NR); /* cp_type */
 		refcount_set(&cl_page->cp_ref, 1);
 		cl_page->cp_obj = o;
-		cl_object_get(o);
+		if (type != CPT_TRANSIENT)
+			cl_object_get(o);
 		lu_object_ref_add_at(&o->co_lu, &cl_page->cp_obj_ref,
 				     "cl_page", cl_page);
 		cl_page->cp_vmpage = vmpage;
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit James Simmons
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

Transient page counting is not used for anything, as
already noted in the comment on vob_transient_pages, but
costs something like 4% of the time in DIO page submission.

Remove it.

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

This patch reduces i/o time in ms/GiB by:
Write: 6 ms/GiB
Read: 11 ms/GiB

Totals:
Write: 174 ms/GiB
Read: 167 ms/GiB

With previous patches in series:
write     5703 MiB/s
read      5756 MiB/s

Plus this patch:
write     5900 MiB/s
read      6136 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 587e5aa8342980f7 ("LU-13799 llite: Remove transient page counting")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39441
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/vvp_internal.h | 7 -------
 fs/lustre/llite/vvp_object.c   | 4 +---
 fs/lustre/llite/vvp_page.c     | 5 -----
 3 files changed, 1 insertion(+), 15 deletions(-)

diff --git a/fs/lustre/llite/vvp_internal.h b/fs/lustre/llite/vvp_internal.h
index f2599be..b5e1df2 100644
--- a/fs/lustre/llite/vvp_internal.h
+++ b/fs/lustre/llite/vvp_internal.h
@@ -189,13 +189,6 @@ struct vvp_object {
 	struct inode	       *vob_inode;
 
 	/**
-	 * Number of transient pages.  This is no longer protected by i_sem,
-	 * and needs to be atomic.  This is not actually used for anything,
-	 * and can probably be removed.
-	 */
-	atomic_t		vob_transient_pages;
-
-	/**
 	 * Number of outstanding mmaps on this file.
 	 *
 	 * \see ll_vm_open(), ll_vm_close().
diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c
index 096d996..294df88 100644
--- a/fs/lustre/llite/vvp_object.c
+++ b/fs/lustre/llite/vvp_object.c
@@ -63,8 +63,7 @@ static int vvp_object_print(const struct lu_env *env, void *cookie,
 	struct inode *inode = obj->vob_inode;
 	struct ll_inode_info *lli;
 
-	(*p)(env, cookie, "(%d %d) inode: %p ",
-	     atomic_read(&obj->vob_transient_pages),
+	(*p)(env, cookie, "(%d) inode: %p ",
 	     atomic_read(&obj->vob_mmap_cnt), inode);
 	if (inode) {
 		lli = ll_i2info(inode);
@@ -228,7 +227,6 @@ static int __vvp_object_init(const struct lu_env *env,
 			     const struct cl_object_conf *conf)
 {
 	vob->vob_inode = conf->coc_inode;
-	atomic_set(&vob->vob_transient_pages, 0);
 	cl_object_page_init(&vob->vob_cl, sizeof(struct vvp_page));
 	return 0;
 }
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 2ecd414..9e14898 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -437,10 +437,8 @@ static void vvp_transient_page_fini(const struct lu_env *env,
 				    struct pagevec *pvec)
 {
 	struct vvp_page *vpg = cl2vvp_page(slice);
-	struct vvp_object *clobj = cl2vvp(slice->cpl_obj);
 
 	vvp_page_fini_common(vpg, pvec);
-	atomic_dec(&clobj->vob_transient_pages);
 }
 
 static const struct cl_page_operations vvp_transient_page_ops = {
@@ -469,11 +467,8 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 		cl_page_slice_add(page, &vpg->vpg_cl, obj,
 				  &vvp_page_ops);
 	} else {
-		struct vvp_object *clobj = cl2vvp(obj);
-
 		cl_page_slice_add(page, &vpg->vpg_cl, obj,
 				  &vvp_transient_page_ops);
-		atomic_inc(&clobj->vob_transient_pages);
 	}
 	return 0;
 }
-- 
1.8.3.1

* [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting James Simmons
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

Skip some unnecessary looping in page submission for the
DIO case.

This gives about a 2% improvement for AIO/DIO page
submission.
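
The idea can be sketched outside the kernel as follows. This is only an
illustration under stated assumptions: struct page, struct page_list and
every function name here are simplified stand-ins invented for the
example, not the cl_page API. When the caller guarantees that all queued
pages belong to one stripe (the DIO case), the whole queue can be
spliced in one step instead of testing each page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for cl_page / cl_page_list; not the Lustre types. */
struct page {
	int stripe;		/* plays the role of cp_lov_index */
	struct page *next;
};

struct page_list {
	struct page *head;
	struct page *tail;
	int nr;
};

static void list_append(struct page_list *l, struct page *p)
{
	p->next = NULL;
	if (l->tail)
		l->tail->next = p;
	else
		l->head = p;
	l->tail = p;
	l->nr++;
}

/* Move every page of 'stripe' from src to dst, visiting each page: O(n). */
static void move_matching(struct page_list *dst, struct page_list *src,
			  int stripe)
{
	struct page_list keep = { NULL, NULL, 0 };
	struct page *p = src->head;

	while (p) {
		struct page *next = p->next;

		list_append(p->stripe == stripe ? dst : &keep, p);
		p = next;
	}
	*src = keep;
}

/* Move all of src to the tail of dst without inspecting pages: O(1). */
static void splice_all(struct page_list *dst, struct page_list *src)
{
	if (!src->head)
		return;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->nr += src->nr;
	src->head = src->tail = NULL;
	src->nr = 0;
}

/* Build one sub-queue; 'dio' means the caller guarantees a single stripe. */
static void build_subqueue(struct page_list *dst, struct page_list *src,
			   bool dio)
{
	assert(src->head != NULL);
	if (dio)
		splice_all(dst, src);
	else
		move_matching(dst, src, src->head->stripe);
}
```

The saving comes purely from not walking the list; the pages that end up
in the sub-queue are the same either way as long as the single-stripe
assumption holds.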

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 172 ms/GiB
Read: 165 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7726 MiB/s
read         5899 MiB/s

Plus this patch:
write        5954 MiB/s
read         6217 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: d31647c017a390c9 ("LU-13799 lov: Improve DIO submit")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39446
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/lov/lov_io.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/lov/lov_io.c b/fs/lustre/lov/lov_io.c
index 9012ad6..2885943 100644
--- a/fs/lustre/lov/lov_io.c
+++ b/fs/lustre/lov/lov_io.c
@@ -1255,11 +1255,15 @@ static int lov_io_submit(const struct lu_env *env,
 	struct lov_io *lio = cl2lov_io(env, ios);
 	struct lov_io_sub *sub;
 	struct cl_page_list *plist = &lov_env_info(env)->lti_plist;
-	struct cl_page *page;
+	struct cl_page *page = cl_page_list_first(qin);
 	struct cl_page *tmp;
+	bool dio = false;
 	int index;
 	int rc = 0;
 
+	if (page->cp_type == CPT_TRANSIENT)
+		dio = true;
+
 	cl_page_list_init(plist);
 	while (qin->pl_nr > 0) {
 		struct cl_2queue *cl2q = &lov_env_info(env)->lti_cl2q;
@@ -1281,12 +1285,17 @@ static int lov_io_submit(const struct lu_env *env,
 		cl_page_list_move(&cl2q->c2_qin, qin, page);
 
 		index = page->cp_lov_index;
-		cl_page_list_for_each_safe(page, tmp, qin) {
-			/* this page is not on this stripe */
-			if (index != page->cp_lov_index)
-				continue;
-
-			cl_page_list_move(&cl2q->c2_qin, qin, page);
+		/* DIO is already split by stripe */
+		if (!dio) {
+			cl_page_list_for_each_safe(page, tmp, qin) {
+				/* this page is not on this stripe */
+				if (index != page->cp_lov_index)
+					continue;
+
+				cl_page_list_move(&cl2q->c2_qin, qin, page);
+			}
+		} else {
+			cl_page_list_splice(qin, &cl2q->c2_qin);
 		}
 
 		sub = lov_sub_get(env, lio, index);
-- 
1.8.3.1

* [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients James Simmons
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

We get a page reference in cl_page_find, then immediately
add another for cl_page_list_add and remove the first
reference.  This is pretty silly, since the life cycle is
the same on these.

This improves DIO/AIO page submission by around 2%.
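
A userspace sketch of the pattern, assuming a page that carries nothing
but a reference count. The names page_find, queue_add, submit_old and
submit_new are illustrative only and do not correspond one-to-one to
Lustre functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for cl_page; only the reference count is modelled. */
struct page {
	int refs;
};

static void page_get(struct page *p)
{
	p->refs++;
}

static void page_put(struct page *p)
{
	assert(p->refs > 0);
	p->refs--;
}

/* Lookup returns the page with one reference held for the caller. */
static struct page *page_find(struct page *p)
{
	page_get(p);
	return p;
}

/* Queue the page; take a new reference only if the caller asks for one. */
static void queue_add(struct page *p, bool get_ref)
{
	if (get_ref)
		page_get(p);
}

/* Old pattern: three refcount operations to end up holding one reference. */
static void submit_old(struct page *p)
{
	page_find(p);
	queue_add(p, true);
	page_put(p);
}

/* New pattern: the lookup reference is handed over to the queue. */
static void submit_new(struct page *p)
{
	page_find(p);
	queue_add(p, false);
}
```

Both paths leave the page with exactly one reference owned by the queue;
the second simply gets there with one refcount operation instead of
three.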

This patch reduces i/o time in ms/GiB by:
Write: 2 ms/GiB
Read: 2 ms/GiB

Totals:
Write: 170 ms/GiB
Read: 162 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        5955 MiB/s
read         6218 MiB/s

Plus this patch:
write        6028 MiB/s
read         6305 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: 1e4d10af3909452b ("LU-13799 llite: Adjust dio refcounting")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39447
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/cl_object.h   | 18 ++++++++++--------
 fs/lustre/llite/llite_lib.c     |  2 +-
 fs/lustre/llite/rw.c            |  4 ++--
 fs/lustre/llite/rw26.c          |  9 +++++----
 fs/lustre/llite/vvp_io.c        |  4 ++--
 fs/lustre/llite/vvp_page.c      | 11 +++++++----
 fs/lustre/obdclass/cl_io.c      |  8 +++++---
 fs/lustre/obdecho/echo_client.c |  4 ++--
 8 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/fs/lustre/include/cl_object.h b/fs/lustre/include/cl_object.h
index 0f785e5..d068454 100644
--- a/fs/lustre/include/cl_object.h
+++ b/fs/lustre/include/cl_object.h
@@ -2548,14 +2548,16 @@ static inline struct cl_page *cl_page_list_first(struct cl_page_list *plist)
 	list_for_each_entry_safe((page), (temp), &(list)->pl_pages, cp_batch)
 
 void cl_page_list_init(struct cl_page_list *plist);
-void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page);
+void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page,
+		      bool get_ref);
 void cl_page_list_move(struct cl_page_list *dst, struct cl_page_list *src,
 		       struct cl_page *page);
 void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src,
 			    struct cl_page *page);
-void cl_page_list_splice(struct cl_page_list *list, struct cl_page_list *head);
-void cl_page_list_del(const struct lu_env *env, struct cl_page_list *plist,
-		      struct cl_page *page);
+void cl_page_list_splice(struct cl_page_list *list,
+			 struct cl_page_list *head);
+void cl_page_list_del(const struct lu_env *env,
+		      struct cl_page_list *plist, struct cl_page *page);
 void cl_page_list_disown(const struct lu_env *env,
 			 struct cl_io *io, struct cl_page_list *plist);
 void cl_page_list_discard(const struct lu_env *env,
@@ -2563,10 +2565,10 @@ void cl_page_list_discard(const struct lu_env *env,
 void cl_page_list_fini(const struct lu_env *env, struct cl_page_list *plist);
 
 void cl_2queue_init(struct cl_2queue *queue);
-void cl_2queue_disown(const struct lu_env *env,
-		      struct cl_io *io, struct cl_2queue *queue);
-void cl_2queue_discard(const struct lu_env *env,
-		       struct cl_io *io, struct cl_2queue *queue);
+void cl_2queue_disown(const struct lu_env *env, struct cl_io *io,
+		      struct cl_2queue *queue);
+void cl_2queue_discard(const struct lu_env *env, struct cl_io *io,
+		       struct cl_2queue *queue);
 void cl_2queue_fini(const struct lu_env *env, struct cl_2queue *queue);
 void cl_2queue_init_page(struct cl_2queue *queue, struct cl_page *page);
 
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 5610523..7b8f1b5 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -1884,7 +1884,7 @@ int ll_io_zero_page(struct inode *inode, pgoff_t index, pgoff_t offset,
 		anchor = &vvp_env_info(env)->vti_anchor;
 		cl_sync_io_init(anchor, 1);
 		clpage->cp_sync_io = anchor;
-		cl_page_list_add(&queue->c2_qin, clpage);
+		cl_page_list_add(&queue->c2_qin, clpage, true);
 		rc = cl_io_submit_rw(env, io, CRT_WRITE, queue);
 		if (rc)
 			goto queuefini1;
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 4de77f6..48984aa 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -249,7 +249,7 @@ static int ll_read_ahead_page(const struct lu_env *env, struct cl_io *io,
 			vpg->vpg_defer_uptodate = 1;
 			vpg->vpg_ra_used = 0;
 		}
-		cl_page_list_add(queue, page);
+		cl_page_list_add(queue, page, true);
 	} else {
 		/* skip completed pages */
 		cl_page_unassume(env, io, page);
@@ -1657,7 +1657,7 @@ int ll_io_read_page(const struct lu_env *env, struct cl_io *io,
 		cl_sync_io_init(anchor, 1);
 		page->cp_sync_io = anchor;
 
-		cl_page_list_add(&queue->c2_qin, page);
+		cl_page_list_add(&queue->c2_qin, page, true);
 	}
 
 	io_start_index = cl_index(io->ci_obj, io->u.ci_rw.crw_pos);
diff --git a/fs/lustre/llite/rw26.c b/fs/lustre/llite/rw26.c
index 0d72c3e..e5d80cb 100644
--- a/fs/lustre/llite/rw26.c
+++ b/fs/lustre/llite/rw26.c
@@ -264,7 +264,10 @@ struct ll_dio_pages {
 			 */
 			page->cp_inode = inode;
 		}
-		cl_page_list_add(&queue->c2_qin, page);
+		/* We keep the refcount from cl_page_find, so we don't need
+		 * another one here
+		 */
+		cl_page_list_add(&queue->c2_qin, page, false);
 		/*
 		 * Set page clip to tell transfer formation engine
 		 * that page has to be sent even if it is beyond KMS.
@@ -273,8 +276,6 @@ struct ll_dio_pages {
 			cl_page_clip(env, page, 0, size);
 		++io_pages;
 
-		/* drop the reference count for cl_page_find */
-		cl_page_put(env, page);
 		offset += page_size;
 		size -= page_size;
 	}
@@ -731,7 +732,7 @@ static int ll_write_end(struct file *file, struct address_space *mapping,
 		lcc->lcc_page = NULL; /* page will be queued */
 
 		/* Add it into write queue */
-		cl_page_list_add(plist, page);
+		cl_page_list_add(plist, page, true);
 		if (plist->pl_nr == 1) /* first page */
 			vio->u.readwrite.vui_from = from;
 		else
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 0e54f46..a117800 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -1444,7 +1444,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 			cl_page_assume(env, io, page);
 
 			cl_page_list_init(plist);
-			cl_page_list_add(plist, page);
+			cl_page_list_add(plist, page, true);
 
 			/* size fixup */
 			if (last_index == vvp_index(vpg))
@@ -1466,7 +1466,7 @@ static int vvp_io_fault_start(const struct lu_env *env,
 				if (result >= 0) {
 					io->ci_noquota = 1;
 					cl_page_own(env, io, page);
-					cl_page_list_add(plist, page);
+					cl_page_list_add(plist, page, true);
 					lu_ref_add(&page->cp_reference,
 						   "cl_io", io);
 					result = cl_io_commit_async(env, io,
diff --git a/fs/lustre/llite/vvp_page.c b/fs/lustre/llite/vvp_page.c
index 9e14898..60a28d6 100644
--- a/fs/lustre/llite/vvp_page.c
+++ b/fs/lustre/llite/vvp_page.c
@@ -459,16 +459,19 @@ int vvp_page_init(const struct lu_env *env, struct cl_object *obj,
 	vpg->vpg_page = vmpage;
 	get_page(vmpage);
 
-	if (page->cp_type == CPT_CACHEABLE) {
+	if (page->cp_type == CPT_TRANSIENT) {
+		/* DIO pages are referenced by userspace, we don't need to take
+		 * a reference on them. (contrast with get_page() call above)
+		 */
+		cl_page_slice_add(page, &vpg->vpg_cl, obj,
+				  &vvp_transient_page_ops);
+	} else {
 		/* in cache, decref in vvp_page_delete */
 		refcount_inc(&page->cp_ref);
 		SetPagePrivate(vmpage);
 		vmpage->private = (unsigned long)page;
 		cl_page_slice_add(page, &vpg->vpg_cl, obj,
 				  &vvp_page_ops);
-	} else {
-		cl_page_slice_add(page, &vpg->vpg_cl, obj,
-				  &vvp_transient_page_ops);
 	}
 	return 0;
 }
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index b5e7744b..9a0373f 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -825,7 +825,8 @@ void cl_page_list_init(struct cl_page_list *plist)
 /**
  * Adds a page to a page list.
  */
-void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page)
+void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page,
+		      bool get_ref)
 {
 	/* it would be better to check that page is owned by "current" io, but
 	 * it is not passed here.
@@ -836,7 +837,8 @@ void cl_page_list_add(struct cl_page_list *plist, struct cl_page *page)
 	list_add_tail(&page->cp_batch, &plist->pl_pages);
 	++plist->pl_nr;
 	lu_ref_add_at(&page->cp_reference, &page->cp_queue_ref, "queue", plist);
-	cl_page_get(page);
+	if (get_ref)
+		cl_page_get(page);
 }
 EXPORT_SYMBOL(cl_page_list_add);
 
@@ -1019,7 +1021,7 @@ void cl_2queue_init_page(struct cl_2queue *queue, struct cl_page *page)
 	/*
 	 * Add a page to the incoming page list of 2-queue.
 	 */
-	cl_page_list_add(&queue->c2_qin, page);
+	cl_page_list_add(&queue->c2_qin, page, true);
 }
 EXPORT_SYMBOL(cl_2queue_init_page);
 
diff --git a/fs/lustre/obdecho/echo_client.c b/fs/lustre/obdecho/echo_client.c
index c3a12ce..4cc046a 100644
--- a/fs/lustre/obdecho/echo_client.c
+++ b/fs/lustre/obdecho/echo_client.c
@@ -1021,7 +1021,7 @@ static void echo_commit_callback(const struct lu_env *env, struct cl_io *io,
 		struct page *vmpage = pvec->pages[i];
 		struct cl_page *page = (struct cl_page *)vmpage->private;
 
-		cl_page_list_add(&queue->c2_qout, page);
+		cl_page_list_add(&queue->c2_qout, page, true);
 	}
 }
 
@@ -1085,7 +1085,7 @@ static int cl_echo_object_brw(struct echo_object *eco, int rw, u64 offset,
 		/*
 		 * Add a page to the incoming page list of 2-queue.
 		 */
-		cl_page_list_add(&queue->c2_qin, clp);
+		cl_page_list_add(&queue->c2_qin, clp, true);
 
 		/* drop the reference count for cl_page_find, so that the page
 		 * will be freed in cl_2queue_fini.
-- 
1.8.3.1

* [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages James Simmons
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

The work done by cpo_prep() (etc) is unnecessary for
transient pages.  This gives only a minimal performance
boost and is better seen as a step towards removing the
cl_page abstraction for transient pages.

But, it does consistently give around 1% better
performance.

This patch reduces i/o time in ms/GiB by:
Write: 1 ms/GiB
Read: 1 ms/GiB

Totals:
Write: 169 ms/GiB
Read: 161 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        6028 MiB/s
read         6305 MiB/s

Plus this patch:
write        6071 MiB/s
read         6355 MiB/s

WC-bug-id: https://jira.whamcloud.com/browse/LU-13799
Lustre-commit: b8553978789ad3dd ("LU-13799 clio: Skip prep for transients")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39448
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/obdclass/cl_page.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/obdclass/cl_page.c b/fs/lustre/obdclass/cl_page.c
index 41bd767..4bfa1c5 100644
--- a/fs/lustre/obdclass/cl_page.c
+++ b/fs/lustre/obdclass/cl_page.c
@@ -850,12 +850,15 @@ int cl_page_prep(const struct lu_env *env, struct cl_io *io,
 	if (crt >= CRT_NR)
 		return -EINVAL;
 
-	cl_page_slice_for_each(cl_page, slice, i) {
-		if (slice->cpl_ops->cpo_own)
-			result = (*slice->cpl_ops->io[crt].cpo_prep)(env, slice,
-								     io);
-		if (result != 0)
-			break;
+	if (cl_page->cp_type != CPT_TRANSIENT) {
+		cl_page_slice_for_each(cl_page, slice, i) {
+			if (slice->cpl_ops->cpo_own)
+				result = (*slice->cpl_ops->io[crt].cpo_prep)(env,
+									     slice,
+									     io);
+			if (result != 0)
+				break;
+		}
 	}
 
 	if (result >= 0) {
-- 
1.8.3.1

* [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow James Simmons
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Patrick Farrell, Lustre Development List

From: Patrick Farrell <farr0186@gmail.com>

This patch was split and partially done in:
https://review.whamcloud.com/38214

So the text below refers to the combination of this patch
and that one.  This patch now just replaces a looped atomic
add with a single one.  The rest of the grant calculation
change is in
https://review.whamcloud.com/38214

(I am retaining the text below to show the performance
 improvement)
----------
osc_queue_sync_pages now has a grant calculation component,
which has a pretty painful impact on the new faster DIO
performance.  Specifically, the per-page ktime_get() and the
per-page atomic_add cost close to 10% of total CPU time in
the DIO path.

We can make this per batch of pages rather than for each
page, which reduces this cost from 10% of CPU to almost
nothing.
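
As a rough userspace illustration with C11 atomics (the counter and the
function names are assumptions made for this sketch; the kernel code
uses atomic_long_inc()/atomic_long_add() on obd_dirty_pages), the two
accounting styles look like this:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global counter, standing in for obd_dirty_pages. */
static atomic_long dirty_pages;

/* One atomic read-modify-write per page. */
static void account_per_page(int page_count)
{
	assert(page_count >= 0);
	for (int i = 0; i < page_count; i++)
		atomic_fetch_add(&dirty_pages, 1);
}

/* One atomic read-modify-write for the whole batch. */
static void account_per_batch(int page_count)
{
	assert(page_count >= 0);
	atomic_fetch_add(&dirty_pages, page_count);
}
```

The counter ends up with the same value in both cases; the batch form
just touches the shared cache line once per batch rather than once per
page.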

This improves write performance by about 10% (but has no
effect on reads, since they don't use grant).

This patch reduces i/o time in ms/GiB by:
Write: 10 ms/GiB
Read: 0 ms/GiB

Totals:
Write: 158 ms/GiB
Read: 161 ms/GiB

mpirun -np 1 $IOR -w -t 1G -b 64G -o $FILE --posix.odirect

Before patch:
write     6071

After patch:
write     6470

(Read is similar.)

This also fixes a mistake in d23d4cb67c / LU-13419 where it
removed the shrink interval update entirely from the direct
i/o path.

Fixes: d23d4cb67c ("lustre: osc: Move shrink update to per-write")
WC-bug-id: https://jira.whamcloud.com/browse/LU-13419
Lustre-commit: 87c4535f7a5d239a ("LU-13799 osc: Improve osc_queue_sync_pages")
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/39482
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_cache.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/osc/osc_cache.c b/fs/lustre/osc/osc_cache.c
index 50f6477..69cf9ba 100644
--- a/fs/lustre/osc/osc_cache.c
+++ b/fs/lustre/osc/osc_cache.c
@@ -2715,8 +2715,8 @@ int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io,
 			list_for_each_entry(oap, list, oap_pending_item) {
 				osc_consume_write_grant(cli,
 							&oap->oap_brw_page);
-				atomic_long_inc(&obd_dirty_pages);
 			}
+			atomic_long_add(page_count, &obd_dirty_pages);
 			osc_unreserve_grant_nolock(cli, grants, 0);
 			ext->oe_grants = grants;
 		} else {
@@ -2730,6 +2730,7 @@ int osc_queue_sync_pages(const struct lu_env *env, struct cl_io *io,
 			"not enough grant available, switching to sync for this i/o\n");
 		}
 		spin_unlock(&cli->cl_loi_list_lock);
+		osc_update_next_shrink(cli);
 	}
 
 	ext->oe_is_rdma_only = !!(brw_flags & OBD_BRW_RDMA_ONLY);
-- 
1.8.3.1

* [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (14 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR James Simmons
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Wang Shilong, Lustre Development List

From: Wang Shilong <wshilong@ddn.com>

Currently, the project ID is stored as a u32, so its maximum
possible value is 4294967295.

However, the VFS reserves that maximum value for special use;
see the following function:

static inline bool
qid_has_mapping(struct user_namespace *ns, struct kqid qid)
{
        return from_kqid(ns, qid) != (qid_t) -1;
}

So qid_has_mapping() returns false for ID 4294967295.
chown rejects the same value:

$ chown 4294967295:4294967295 c.sh
  chown: invalid user: ‘4294967295:4294967295’
$ chown 4294967294:4294967294 c.sh

Fix this by checking the project ID against the maximum valid
value on the client kernel side, and add a test case for it.
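
The rule being enforced can be sketched in userspace as below. This is
an imitation for illustration: INVALID_PROJID, projid_is_valid() and
check_projid_change() are names invented here, and the function returns
-1 rather than a kernel errno.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace imitation of the kernel rule: (u32)-1 means "no ID". */
#define INVALID_PROJID ((uint32_t)-1)

static_assert(INVALID_PROJID == 4294967295u,
	      "the reserved value is the u32 maximum");

static bool projid_is_valid(uint32_t projid)
{
	return projid != INVALID_PROJID;
}

/*
 * Sketch of the added check: changing to the reserved project ID is
 * refused, while leaving the ID unchanged is always accepted.
 */
static int check_projid_change(uint32_t current_id, uint32_t new_id)
{
	if (current_id != new_id && !projid_is_valid(new_id))
		return -1;
	return 0;
}
```

So a request to set 4294967295 fails, while 4294967294 is still
accepted, matching the chown behaviour shown above.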

WC-bug-id: https://jira.whamcloud.com/browse/LU-14740
Lustre-commit: 3ffa5d680f0092ae ("LU-14740 llite: avoid project quota overflow")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-on: https://review.whamcloud.com/43939
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/file.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index 1bf237b..a4e432e 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -3323,8 +3323,17 @@ int ll_ioctl_check_project(struct inode *inode, u32 xflags,
 	 * namespace. Enforce that restriction only if we are trying to change
 	 * the quota ID state. Everything else is allowed in user namespaces.
 	 */
-	if (current_user_ns() == &init_user_ns)
+	if (current_user_ns() == &init_user_ns) {
+		/*
+		 * Caller is allowed to change the project ID. if it is being
+		 * changed, make sure that the new value is valid.
+		 */
+		if (ll_i2info(inode)->lli_projid != projid &&
+		     !projid_valid(make_kprojid(&init_user_ns, projid)))
+			return -EINVAL;
+
 		return 0;
+	}
 
 	if (ll_i2info(inode)->lli_projid != projid)
 		return -EINVAL;
-- 
1.8.3.1

* [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (15 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate James Simmons
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

Fix a crash in proc_lnet_portal_rotor(). memdup_user_nul() returns an
ERR_PTR on error, not a NULL pointer, so IS_ERR() and PTR_ERR() must be
used to detect the failure and return the correct error code. The same
fix is applied to the other callers that had the wrong check.
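
For readers less familiar with the ERR_PTR convention, a minimal
user-space sketch follows. The macros only mimic <linux/err.h>, and
mock_memdup_nul(), old_check_rc() and fixed_check_rc() are hypothetical
names made up for this illustration, not kernel or Lustre functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-ins that mimic the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

void *ERR_PTR(long error) { return (void *)error; }
long PTR_ERR(const void *ptr) { return (long)ptr; }
int IS_ERR(const void *ptr)
{
	/* Error pointers live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical mock of memdup_user_nul(): copies len bytes and appends
 * a NUL. On failure it returns an encoded errno, never NULL.
 */
char *mock_memdup_nul(const char *src, size_t len)
{
	char *p;

	if (!src)
		return ERR_PTR(-EFAULT);
	p = malloc(len + 1);
	if (!p)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Old-style check: only NULL is treated as failure. */
int old_check_rc(const char *p)
{
	if (!p)
		return -ENOMEM;
	return 0;	/* an ERR_PTR lands here and would be dereferenced */
}

/* Fixed check: decode the error carried in the pointer. */
int fixed_check_rc(const char *p)
{
	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}
```

With a failing copy, old_check_rc() reports success for a pointer that
must not be dereferenced, while fixed_check_rc() returns the errno
encoded in it.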

Fixes: 986fbf5bf19 ("lnet: libcfs: discard cfs_trace_copyin_string()")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14788
Lustre-commit: 449d046e55a42cc4 ("LU-14788 lnet: check memdup_user_nul using IS_ERR")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44091
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/libcfs/module.c    | 4 ++--
 net/lnet/libcfs/tracefile.c | 8 ++++----
 net/lnet/lnet/router_proc.c | 4 ++--
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/lnet/libcfs/module.c b/net/lnet/libcfs/module.c
index 8059569..a249bdd 100644
--- a/net/lnet/libcfs/module.c
+++ b/net/lnet/libcfs/module.c
@@ -317,8 +317,8 @@ static int proc_dobitmasks(struct ctl_table *table, int write,
 		}
 	} else {
 		tmpstr = memdup_user_nul(buffer, nob);
-		if (!tmpstr)
-			return -ENOMEM;
+		if (IS_ERR(tmpstr))
+			return PTR_ERR(tmpstr);
 
 		rc = libcfs_debug_str2mask(mask, strim(tmpstr), is_subsys);
 		/* Always print LBUG/LASSERT to console, so keep this mask */
diff --git a/net/lnet/libcfs/tracefile.c b/net/lnet/libcfs/tracefile.c
index 6321840..e0ef234 100644
--- a/net/lnet/libcfs/tracefile.c
+++ b/net/lnet/libcfs/tracefile.c
@@ -942,8 +942,8 @@ int cfs_trace_dump_debug_buffer_usrstr(void __user *usr_str, int usr_str_nob)
 	int rc;
 
 	str = memdup_user_nul(usr_str, usr_str_nob);
-	if (!str)
-		return -ENOMEM;
+	if (IS_ERR(str))
+		return PTR_ERR(str);
 
 	path = strim(str);
 	if (path[0] != '/')
@@ -1001,8 +1001,8 @@ int cfs_trace_daemon_command_usrstr(void __user *usr_str, int usr_str_nob)
 	int rc;
 
 	str = memdup_user_nul(usr_str, usr_str_nob);
-	if (!str)
-		return -ENOMEM;
+	if (IS_ERR(str))
+		return PTR_ERR(str);
 
 	rc = cfs_trace_daemon_command(str);
 	kfree(str);
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index dd52a08..0de6681 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -816,8 +816,8 @@ static int proc_lnet_portal_rotor(struct ctl_table *table, int write,
 	}
 
 	buf = memdup_user_nul(buffer, nob);
-	if (!buf)
-		return -ENOMEM;
+	if (IS_ERR(buf))
+		return PTR_ERR(buf);
 
 	tmp = strim(buf);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (16 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support James Simmons
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Lockless truncate does not work and cannot be made to work.

Fundamentally, it has no means of ensuring consistency
across clients because it can't force them all to drop
cached data without locking.

It's been off for years - let's just get rid of it.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14838
Lustre-commit: 6335dba83995765 ("LU-14838 osc: Remove lockless truncate")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44204
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h         |  2 --
 fs/lustre/llite/llite_lib.c            |  3 +--
 fs/lustre/mdc/lproc_mdc.c              |  2 --
 fs/lustre/osc/lproc_osc.c              | 35 ----------------------------------
 fs/lustre/osc/osc_io.c                 | 10 ----------
 fs/lustre/osc/osc_lock.c               |  6 +-----
 fs/lustre/ptlrpc/wiretest.c            |  2 --
 include/uapi/linux/lustre/lustre_idl.h |  1 -
 8 files changed, 2 insertions(+), 59 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 13e9363..3a2d8bc 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -116,12 +116,10 @@ struct osc_device {
 	struct osc_stats {
 		u64		os_lockless_writes;	/* by bytes */
 		u64		os_lockless_reads;	/* by bytes */
-		u64		os_lockless_truncates;	/* by times */
 	} od_stats;
 
 	/* configuration item(s) */
 	time64_t		od_contention_time;
-	int			od_lockless_truncate;
 };
 
 /* \defgroup osc osc
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 7b8f1b5..63d0f02 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -284,7 +284,7 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 	data->ocd_connect_flags = OBD_CONNECT_IBITS	| OBD_CONNECT_NODEVOH  |
 				  OBD_CONNECT_ATTRFID	| OBD_CONNECT_GRANT    |
 				  OBD_CONNECT_VERSION	| OBD_CONNECT_BRW_SIZE |
-				  OBD_CONNECT_SRVLOCK	| OBD_CONNECT_TRUNCLOCK|
+				  OBD_CONNECT_SRVLOCK	|
 				  OBD_CONNECT_CANCELSET | OBD_CONNECT_FID      |
 				  OBD_CONNECT_AT	| OBD_CONNECT_LOV_V3   |
 				  OBD_CONNECT_VBR	| OBD_CONNECT_FULL20   |
@@ -510,7 +510,6 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt)
 				  OBD_CONNECT_REQPORTAL | OBD_CONNECT_BRW_SIZE |
 				  OBD_CONNECT_CANCELSET | OBD_CONNECT_FID      |
 				  OBD_CONNECT_SRVLOCK   |
-				  OBD_CONNECT_TRUNCLOCK |
 				  OBD_CONNECT_AT	| OBD_CONNECT_OSS_CAPA |
 				  OBD_CONNECT_VBR	| OBD_CONNECT_FULL20   |
 				  OBD_CONNECT_64BITHASH | OBD_CONNECT_MAXBYTES |
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index 02636ef..b3ace37 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -561,8 +561,6 @@ static int mdc_stats_seq_show(struct seq_file *seq, void *v)
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
 		   stats->os_lockless_reads);
-	seq_printf(seq, "lockless_truncate\t\t%llu\n",
-		   stats->os_lockless_truncates);
 	return 0;
 }
 
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index 3991b2c..bfc5df1 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -539,38 +539,6 @@ static ssize_t contention_seconds_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(contention_seconds);
 
-static ssize_t lockless_truncate_show(struct kobject *kobj,
-				      struct attribute *attr,
-				      char *buf)
-{
-	struct obd_device *obd = container_of(kobj, struct obd_device,
-					      obd_kset.kobj);
-	struct osc_device *od = obd2osc_dev(obd);
-
-	return sprintf(buf, "%u\n", od->od_lockless_truncate);
-}
-
-static ssize_t lockless_truncate_store(struct kobject *kobj,
-				       struct attribute *attr,
-				       const char *buffer,
-				       size_t count)
-{
-	struct obd_device *obd = container_of(kobj, struct obd_device,
-					      obd_kset.kobj);
-	struct osc_device *od = obd2osc_dev(obd);
-	bool val;
-	int rc;
-
-	rc = kstrtobool(buffer, &val);
-	if (rc)
-		return rc;
-
-	od->od_lockless_truncate = val;
-
-	return count;
-}
-LUSTRE_RW_ATTR(lockless_truncate);
-
 static ssize_t destroys_in_flight_show(struct kobject *kobj,
 				       struct attribute *attr,
 				       char *buf)
@@ -890,8 +858,6 @@ static int osc_stats_seq_show(struct seq_file *seq, void *v)
 		   stats->os_lockless_writes);
 	seq_printf(seq, "lockless_read_bytes\t\t%llu\n",
 		   stats->os_lockless_reads);
-	seq_printf(seq, "lockless_truncate\t\t%llu\n",
-		   stats->os_lockless_truncates);
 	return 0;
 }
 
@@ -928,7 +894,6 @@ void lproc_osc_attach_seqstat(struct obd_device *obd)
 	&lustre_attr_cur_dirty_grant_bytes.attr,
 	&lustre_attr_destroys_in_flight.attr,
 	&lustre_attr_grant_shrink_interval.attr,
-	&lustre_attr_lockless_truncate.attr,
 	&lustre_attr_max_dirty_mb.attr,
 	&lustre_attr_max_pages_per_rpc.attr,
 	&lustre_attr_max_rpcs_in_flight.attr,
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index f69f201..047ae00 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -703,16 +703,6 @@ void osc_io_setattr_end(const struct lu_env *env,
 		result = cbargs->opc_rc;
 		io->ci_result = cbargs->opc_rc;
 	}
-	if (result == 0) {
-		if (oio->oi_lockless) {
-			/* lockless truncate */
-			struct osc_device *osc = lu2osc_dev(obj->co_lu.lo_dev);
-
-			LASSERT(cl_io_is_trunc(io) || cl_io_is_fallocate(io));
-			/* XXX: Need a lock. */
-			osc->od_stats.os_lockless_truncates++;
-		}
-	}
 
 	if (cl_io_is_trunc(io)) {
 		u64 size = io->u.ci_setattr.sa_attr.lvb_size;
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index 422f3e5..6d6d271 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -800,7 +800,6 @@ void osc_lock_to_lockless(const struct lu_env *env,
 	struct cl_io *io = oio->oi_cl.cis_io;
 	struct cl_object *obj = slice->cls_obj;
 	struct osc_object *oob = cl2osc(obj);
-	const struct osc_device *osd = lu2osc_dev(obj->co_lu.lo_dev);
 	struct obd_connect_data *ocd;
 
 	LASSERT(ols->ols_state == OLS_NEW ||
@@ -821,10 +820,7 @@ void osc_lock_to_lockless(const struct lu_env *env,
 					 OBD_CONNECT_SRVLOCK);
 		if (io->ci_lockreq == CILR_NEVER ||
 		    /* lockless IO */
-		    (ols->ols_locklessable && osc_object_is_contended(oob)) ||
-		    /* lockless truncate */
-		    (cl_io_is_trunc(io) && osd->od_lockless_truncate &&
-		     (ocd->ocd_connect_flags & OBD_CONNECT_TRUNCLOCK))) {
+		    (ols->ols_locklessable && osc_object_is_contended(oob))) {
 			ols->ols_locklessable = 1;
 			slice->cls_ops = ols->ols_lockless_ops;
 		}
diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index cd1456c..4301bd4 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1108,8 +1108,6 @@ void lustre_assert_wire_constants(void)
 		 OBD_CONNECT_XATTR);
 	LASSERTF(OBD_CONNECT_LARGE_ACL == 0x200ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT_LARGE_ACL);
-	LASSERTF(OBD_CONNECT_TRUNCLOCK == 0x400ULL, "found 0x%.16llxULL\n",
-		 OBD_CONNECT_TRUNCLOCK);
 	LASSERTF(OBD_CONNECT_TRANSNO == 0x800ULL, "found 0x%.16llxULL\n",
 		 OBD_CONNECT_TRANSNO);
 	LASSERTF(OBD_CONNECT_IBITS == 0x1000ULL, "found 0x%.16llxULL\n",
diff --git a/include/uapi/linux/lustre/lustre_idl.h b/include/uapi/linux/lustre/lustre_idl.h
index 65948d8..77a64f2 100644
--- a/include/uapi/linux/lustre/lustre_idl.h
+++ b/include/uapi/linux/lustre/lustre_idl.h
@@ -719,7 +719,6 @@ struct ptlrpc_body_v2 {
 #define OBD_CONNECT_ACL			 0x80ULL /*access control lists */
 #define OBD_CONNECT_XATTR		0x100ULL /*client use extended attr */
 #define OBD_CONNECT_LARGE_ACL		0x200ULL /* more than 32 ACL entries */
-#define OBD_CONNECT_TRUNCLOCK		0x400ULL /*locks on server for punch */
 #define OBD_CONNECT_TRANSNO		0x800ULL /*replay sends init transno */
 #define OBD_CONNECT_IBITS	       0x1000ULL /* not checked in 2.11+ */
 #define OBD_CONNECT_JOIN	       0x2000ULL /*files can be concatenated.
-- 
1.8.3.1


* [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (17 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel James Simmons
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

Lockless buffered i/o and contention detection don't work.
Lockless buffered i/o is unfixable, and contention detection
is broken enough that it will have to be rewritten.

Let's remove both.  This patch starts the removal by
pulling the client side support.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14838
Lustre-commit: 5ad00e36eca11a14 ("LU-14838 osc: Remove client contention support")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44205
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h |  1 -
 fs/lustre/mdc/lproc_mdc.c      | 41 -----------------------------------------
 fs/lustre/mdc/mdc_dev.c        | 15 +--------------
 fs/lustre/osc/lproc_osc.c      | 33 ---------------------------------
 fs/lustre/osc/osc_lock.c       | 19 ++-----------------
 fs/lustre/osc/osc_object.c     | 22 ----------------------
 6 files changed, 3 insertions(+), 128 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 3a2d8bc..8a62eb2 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -658,7 +658,6 @@ int osc_attr_update(const struct lu_env *env, struct cl_object *obj,
 int osc_object_glimpse(const struct lu_env *env, const struct cl_object *obj,
 		       struct ost_lvb *lvb);
 int osc_object_invalidate(const struct lu_env *env, struct osc_object *osc);
-int osc_object_is_contended(struct osc_object *obj);
 int osc_object_find_cbdata(const struct lu_env *env, struct cl_object *obj,
 			   ldlm_iterator_t iter, void *data);
 int osc_object_prune(const struct lu_env *env, struct cl_object *obj);
diff --git a/fs/lustre/mdc/lproc_mdc.c b/fs/lustre/mdc/lproc_mdc.c
index b3ace37..d13a6b7 100644
--- a/fs/lustre/mdc/lproc_mdc.c
+++ b/fs/lustre/mdc/lproc_mdc.c
@@ -268,45 +268,6 @@ static int mdc_cached_mb_seq_show(struct seq_file *m, void *v)
 }
 LDEBUGFS_SEQ_FOPS(mdc_cached_mb);
 
-static int mdc_contention_seconds_seq_show(struct seq_file *m, void *v)
-{
-	struct obd_device *obd = m->private;
-	struct osc_device *od  = obd2osc_dev(obd);
-
-	seq_printf(m, "%lld\n", od->od_contention_time);
-	return 0;
-}
-
-static ssize_t mdc_contention_seconds_seq_write(struct file *file,
-						const char __user *buffer,
-						size_t count, loff_t *off)
-{
-	struct seq_file *sfl = file->private_data;
-	struct obd_device *obd = sfl->private;
-	struct osc_device *od  = obd2osc_dev(obd);
-	int rc;
-	char kernbuf[128];
-	s64 val;
-
-	if (count >= sizeof(kernbuf))
-		return -EINVAL;
-
-	if (copy_from_user(kernbuf, buffer, count))
-		return -EFAULT;
-	kernbuf[count] = 0;
-
-	rc = kstrtos64(kernbuf, count, &val);
-	if (rc)
-		return rc;
-	if (val < 0 || val > INT_MAX)
-		return -ERANGE;
-
-	od->od_contention_time = val;
-
-	return count;
-}
-LDEBUGFS_SEQ_FOPS(mdc_contention_seconds);
-
 static int mdc_unstable_stats_seq_show(struct seq_file *m, void *v)
 {
 	struct obd_device *obd = m->private;
@@ -628,8 +589,6 @@ static ssize_t mdc_dom_min_repsize_seq_write(struct file *file,
 	  .fops	=	&mdc_checksum_type_fops		},
 	{ .name	=	"timeouts",
 	  .fops	=	&mdc_timeouts_fops		},
-	{ .name	=	"contention_seconds",
-	  .fops	=	&mdc_contention_seconds_fops	},
 	{ .name	=	"import",
 	  .fops	=	&mdc_import_fops		},
 	{ .name	=	"state",
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index 1c28f80..ce4148d 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -536,18 +536,7 @@ static int mdc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 		mdc_lock_granted(env, oscl, lockh);
 
 	/* Error handling, some errors are tolerable. */
-	if (oscl->ols_locklessable && rc == -EUSERS) {
-		/* This is a tolerable error, turn this lock into
-		 * lockless lock.
-		 */
-		osc_object_set_contended(cl2osc(slice->cls_obj));
-		LASSERT(slice->cls_ops != oscl->ols_lockless_ops);
-
-		/* Change this lock to ldlmlock-less lock. */
-		osc_lock_to_lockless(env, oscl, 1);
-		oscl->ols_state = OLS_GRANTED;
-		rc = 0;
-	} else if (oscl->ols_glimpse && rc == -ENAVAIL) {
+	if (oscl->ols_glimpse && rc == -ENAVAIL) {
 		LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY);
 		mdc_lock_lvb_update(env, cl2osc(slice->cls_obj),
 				    NULL, &oscl->ols_lvb);
@@ -972,8 +961,6 @@ int mdc_lock_init(const struct lu_env *env, struct cl_object *obj,
 
 	if (!(enqflags & CEF_MUST))
 		osc_lock_to_lockless(env, ols, (enqflags & CEF_NEVER));
-	if (ols->ols_locklessable && !(enqflags & CEF_DISCARD_DATA))
-		ols->ols_flags |= LDLM_FL_DENY_ON_CONTENTION;
 
 	if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io))
 		osc_lock_set_writer(env, io, obj, ols);
diff --git a/fs/lustre/osc/lproc_osc.c b/fs/lustre/osc/lproc_osc.c
index bfc5df1..f9878e0 100644
--- a/fs/lustre/osc/lproc_osc.c
+++ b/fs/lustre/osc/lproc_osc.c
@@ -507,38 +507,6 @@ static ssize_t checksum_dump_store(struct kobject *kobj,
 }
 LUSTRE_RW_ATTR(checksum_dump);
 
-static ssize_t contention_seconds_show(struct kobject *kobj,
-				       struct attribute *attr,
-				       char *buf)
-{
-	struct obd_device *obd = container_of(kobj, struct obd_device,
-					      obd_kset.kobj);
-	struct osc_device *od = obd2osc_dev(obd);
-
-	return sprintf(buf, "%lld\n", od->od_contention_time);
-}
-
-static ssize_t contention_seconds_store(struct kobject *kobj,
-					struct attribute *attr,
-					const char *buffer,
-					size_t count)
-{
-	struct obd_device *obd = container_of(kobj, struct obd_device,
-					      obd_kset.kobj);
-	struct osc_device *od = obd2osc_dev(obd);
-	unsigned int val;
-	int rc;
-
-	rc = kstrtouint(buffer, 10, &val);
-	if (rc)
-		return rc;
-
-	od->od_contention_time = val;
-
-	return count;
-}
-LUSTRE_RW_ATTR(contention_seconds);
-
 static ssize_t destroys_in_flight_show(struct kobject *kobj,
 				       struct attribute *attr,
 				       char *buf)
@@ -887,7 +855,6 @@ void lproc_osc_attach_seqstat(struct obd_device *obd)
 	&lustre_attr_active.attr,
 	&lustre_attr_checksums.attr,
 	&lustre_attr_checksum_dump.attr,
-	&lustre_attr_contention_seconds.attr,
 	&lustre_attr_cur_dirty_bytes.attr,
 	&lustre_attr_cur_grant_bytes.attr,
 	&lustre_attr_cur_lost_grant_bytes.attr,
diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index 6d6d271..f6faed7 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -287,18 +287,7 @@ static int osc_lock_upcall(void *cookie, struct lustre_handle *lockh,
 		osc_lock_granted(env, oscl, lockh);
 
 	/* Error handling, some errors are tolerable. */
-	if (oscl->ols_locklessable && rc == -EUSERS) {
-		/* This is a tolerable error, turn this lock into
-		 * lockless lock.
-		 */
-		osc_object_set_contended(cl2osc(slice->cls_obj));
-		LASSERT(slice->cls_ops != oscl->ols_lockless_ops);
-
-		/* Change this lock to ldlmlock-less lock. */
-		osc_lock_to_lockless(env, oscl, 1);
-		oscl->ols_state = OLS_GRANTED;
-		rc = 0;
-	} else if (oscl->ols_glimpse && rc == -ENAVAIL) {
+	if (oscl->ols_glimpse && rc == -ENAVAIL) {
 		LASSERT(oscl->ols_flags & LDLM_FL_LVB_READY);
 		osc_lock_lvb_update(env, cl2osc(slice->cls_obj),
 				    NULL, &oscl->ols_lvb);
@@ -818,9 +807,7 @@ void osc_lock_to_lockless(const struct lu_env *env,
 					(io->ci_lockreq == CILR_MAYBE) &&
 					(ocd->ocd_connect_flags &
 					 OBD_CONNECT_SRVLOCK);
-		if (io->ci_lockreq == CILR_NEVER ||
-		    /* lockless IO */
-		    (ols->ols_locklessable && osc_object_is_contended(oob))) {
+		if (io->ci_lockreq == CILR_NEVER) {
 			ols->ols_locklessable = 1;
 			slice->cls_ops = ols->ols_lockless_ops;
 		}
@@ -1242,8 +1229,6 @@ int osc_lock_init(const struct lu_env *env,
 	if (!(enqflags & CEF_MUST))
 		/* try to convert this lock to a lockless lock */
 		osc_lock_to_lockless(env, oscl, (enqflags & CEF_NEVER));
-	if (oscl->ols_locklessable && !(enqflags & CEF_DISCARD_DATA))
-		oscl->ols_flags |= LDLM_FL_DENY_ON_CONTENTION;
 
 	if (io->ci_type == CIT_WRITE || cl_io_is_mkwrite(io))
 		osc_lock_set_writer(env, io, obj, oscl);
diff --git a/fs/lustre/osc/osc_object.c b/fs/lustre/osc/osc_object.c
index 0dd926a..517ce5c 100644
--- a/fs/lustre/osc/osc_object.c
+++ b/fs/lustre/osc/osc_object.c
@@ -332,28 +332,6 @@ static int osc_object_fiemap(const struct lu_env *env, struct cl_object *obj,
 	return rc;
 }
 
-int osc_object_is_contended(struct osc_object *obj)
-{
-	struct osc_device *dev = lu2osc_dev(obj->oo_cl.co_lu.lo_dev);
-	time64_t osc_contention_time = dev->od_contention_time;
-	ktime_t retry_time;
-
-	if (OBD_FAIL_CHECK(OBD_FAIL_OSC_OBJECT_CONTENTION))
-		return 1;
-
-	if (!obj->oo_contended)
-		return 0;
-
-	retry_time = ktime_add_ns(obj->oo_contention_time,
-				  osc_contention_time * NSEC_PER_SEC);
-	if (ktime_after(ktime_get(), retry_time)) {
-		osc_object_clear_contended(obj);
-		return 0;
-	}
-	return 1;
-}
-EXPORT_SYMBOL(osc_object_is_contended);
-
 /**
  * Implementation of struct cl_object_operations::coo_req_attr_set() for osc
  * layer. osc is responsible for struct obdo::o_id and struct obdo::o_seq
-- 
1.8.3.1


* [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (18 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 21/26] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Patrick Farrell <pfarrell@whamcloud.com>

The cancellation of an OSC lock without an LDLM lock
(a 'lockless' OSC lock) should not flush pages.  Only
direct i/o is allowed to use a lockless OSC lock, and
direct i/o does not create flushable pages.

DIO pages are not flushable because:
A) they are all synced ASAP, and
B) the OSC extents created for them are not added to the
extent tree which is used to track these pages.

Instead, the flush ends up trying to flush pages from
ongoing buffered i/o.  This can lead to crashes like the
following:

osc_cache_writeback_range()) ASSERTION(hp == 0 && discard == 0) failed

This assert essentially says the lock cancellation
(hp == 1) found an active i/o (an extent in the OES_ACTIVE
state).

This is not allowed because the flushing code assumes an
LDLM lock is being cancelled, which will only start once
there is no active i/o.  Because the OSC lock being
cancelled is not associated with an LDLM lock, this is not
true, and nothing prevents active i/o under a different
lock, leading to this assert.

The solution is simply to not flush pages when cancelling a
no-LDLM-lock OSC lock.

Additional note:
New lockless OSC locks cannot be created if they are
blocked by a regular OSC lock, but a new regular lock can
be created if there is a lockless lock present.

Thus, the sequence is something like this:
Direct i/o creates lockless OSC lock
Buffered i/o creates OSC and LDLM lock on the same range
Direct i/o finishes, starts cancelling its OSC lock
Buffered i/o is still ongoing, with extents in OES_ACTIVE

This results in the above crash during the OSC lock
cancellation.

Note it would be possible to resolve this issue by not
allowing lockless OSC locks to match regular OSC locks, but
this is not necessary, since there's no reason for lockless
locks to flush pages on cancellation.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14814
Lustre-commit: 6717c573ed90da91 ("LU-14814 osc: osc: Do not flush on lockless cancel")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44152
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/osc/osc_lock.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/lustre/osc/osc_lock.c b/fs/lustre/osc/osc_lock.c
index f6faed7..eb3cb58 100644
--- a/fs/lustre/osc/osc_lock.c
+++ b/fs/lustre/osc/osc_lock.c
@@ -1134,16 +1134,8 @@ static void osc_lock_lockless_cancel(const struct lu_env *env,
 {
 	struct osc_lock *ols = cl2osc_lock(slice);
 	struct osc_object *osc = cl2osc(slice->cls_obj);
-	struct cl_lock_descr *descr = &slice->cls_lock->cll_descr;
-	int result;
 
 	LASSERT(!ols->ols_dlmlock);
-	result = osc_lock_flush(osc, descr->cld_start, descr->cld_end,
-				descr->cld_mode, false);
-	if (result)
-		CERROR("Pages for lockless lock %p were not purged(%d)\n",
-		       ols, result);
-
 	osc_lock_wake_waiters(env, osc, ols);
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 21/26] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (19 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 21/25] lustre: update version to 2.14.53 James Simmons
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

The upcoming new feature PCC-RO is combined with FLR and extends
the on-disk data structure 'enum lov_comp_md_flags' for layout
components. It adds a new layout flag: LCM_FL_PCC_RDONLY.

enum lov_comp_md_flags {
        LCM_FL_NONE             = 0x0,
        LCM_FL_RDONLY           = 0x1,
        LCM_FL_WRITE_PENDING    = 0x2,
        LCM_FL_SYNC_PENDING     = 0x3,
        LCM_FL_PCC_RDONLY       = 0x8,
        LCM_FL_FLR_MASK         = 0xB,
};

The LCM_FL_PCC_RDONLY flag, which is dedicated to PCC-RO, is
different from LCM_FL_RDONLY.
A PCC-RO cached file can be in one of these states:
- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: all FLR components are
  synced and up to date. The replicated file is in the
  read-only state, and one client then attaches the file to
  the PCC backend in PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: the file was
  once modified and the data contents of the layout components
  are not synced. The MDT has already picked a primary replica
  and marked the other components as STALE. At this point, a
  client can still PCC-RO attach the file. On this client, the
  primary component and the PCC copy are both up to date.

Because LCM_FL_PCC_RDONLY is a new flag, an old client may not
understand this FLR layout flag, which may result in
inconsistent data access.

This patch adds this new flag for the purpose of compatibility and
interoperability.
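
As a rough illustration of the compatibility concern, the sketch below
uses the enum values quoted above and compares masking with the previous
two-bit mask against the widened LCM_FL_FLR_MASK. OLD_FLR_MASK and the
helpers flr_state() and is_pcc_rdonly() are hypothetical names for this
example only, and a 16-bit flags word is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as listed in the commit message above. */
enum lov_comp_md_flags {
	LCM_FL_NONE		= 0x0,
	LCM_FL_RDONLY		= 0x1,
	LCM_FL_WRITE_PENDING	= 0x2,
	LCM_FL_SYNC_PENDING	= 0x3,
	LCM_FL_PCC_RDONLY	= 0x8,
	LCM_FL_FLR_MASK		= 0xB,
};

/* Hypothetical name: the mask value used before this change. */
#define OLD_FLR_MASK 0x3

/* The two low bits still carry the FLR file state. */
uint16_t flr_state(uint16_t flags)
{
	return flags & OLD_FLR_MASK;
}

/* Bit 3 marks a PCC-RO attached file, independent of the FLR state. */
bool is_pcc_rdonly(uint16_t flags)
{
	return (flags & LCM_FL_PCC_RDONLY) != 0;
}
```

For flags = LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING (0xA), masking with
the old 0x3 yields 0x2 and silently drops the PCC-RO bit, whereas
masking with 0xB keeps both pieces of state.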

WC-bug-id: https://jira.whamcloud.com/browse/LU-13602
Lustre-commit: adc1bbbf20e0a8a5 ("LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40813
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c             |  2 ++
 include/uapi/linux/lustre/lustre_user.h | 13 +++++++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 4301bd4..c3a8a35 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1727,6 +1727,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)LCM_FL_WRITE_PENDING);
 	LASSERTF(LCM_FL_SYNC_PENDING == 3, "found %lld\n",
 		 (long long)LCM_FL_SYNC_PENDING);
+	LASSERTF(LCM_FL_PCC_RDONLY == 8, "found %lld\n",
+		 (long long)LCM_FL_PCC_RDONLY);
 
 	/* Checks for struct lmv_mds_md_v1 */
 	LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n",
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index da15ca8..748c044 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -622,12 +622,13 @@ static inline __u16 mirror_id_of(__u32 id)
  * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1.
  */
 enum lov_comp_md_flags {
-	/* the least 2 bits are used by FLR to record file state */
-	LCM_FL_NONE             = 0,
-	LCM_FL_RDONLY           = 1,
-	LCM_FL_WRITE_PENDING    = 2,
-	LCM_FL_SYNC_PENDING     = 3,
-	LCM_FL_FLR_MASK         = 0x3,
+	/* the least 4 bits are used by FLR to record file state */
+	LCM_FL_NONE             = 0x0,
+	LCM_FL_RDONLY           = 0x1,
+	LCM_FL_WRITE_PENDING    = 0x2,
+	LCM_FL_SYNC_PENDING     = 0x3,
+	LCM_FL_PCC_RDONLY	= 0x8,
+	LCM_FL_FLR_MASK         = 0xB,
 };
 
 struct lov_comp_md_v1 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 21/25] lustre: update version to 2.14.53
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (20 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 21/26] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 22/25] lustre: mdc: set default LMV on ROOT James Simmons
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.53
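
For reference, a small sketch of how the OBD_OCD_VERSION() macro visible
in the diff context packs these four numbers into one integer. The
helpers lustre_version_code() and ver_patch() are hypothetical names
used only in this example.

```c
#include <assert.h>
#include <stdint.h>

/* Same packing as the macro shown in lustre_ver.h. */
#define OBD_OCD_VERSION(major, minor, patch, fix)			\
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical helper: packed code for an arbitrary 2.14.x.0 tag. */
uint32_t lustre_version_code(unsigned int patch)
{
	return OBD_OCD_VERSION(2u, 14u, patch, 0u);
}

/* Hypothetical helper: extract the patch byte again. */
unsigned int ver_patch(uint32_t code)
{
	return (code >> 8) & 0xff;
}
```

Each field occupies one byte, so 2.14.53 packs to 0x020E3500 and
compares greater than 2.14.52 (0x020E3400) as a plain integer.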

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index a840eca..093f898 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 52
+#define LUSTRE_PATCH 53
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.52"
+#define LUSTRE_VERSION_STRING "2.14.53"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/25] lustre: mdc: set default LMV on ROOT
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (21 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 21/25] lustre: update version to 2.14.53 James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 22/26] lustre: update version to 2.14.53 James Simmons
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Lai Siyao, Hongchao Zhang, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

To balance MDT usage, set a default LMV on ROOT if none is set. The
default stripe offset is "-1" and the default stripe count is "1", so
directories created by "mkdir" under ROOT are distributed across all
MDTs according to usage.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13417
Lustre-commit: 3e04b0fd6c3dd363 ("LU-13417 mdd: set default LMV on ROOT")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38553
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_request.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 1fb9c46..8b94f6c 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -557,6 +557,13 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill,
 			goto out;
 		}
 
+		if (md_exp->exp_obd->obd_type->typ_lu == &mdc_device_type) {
+			CERROR("%s: no LMV, upgrading from old version?\n",
+			       md_exp->exp_obd->obd_name);
+			rc = 0;
+			goto out_acl;
+		}
+
 		if (md->body->mbo_valid & OBD_MD_MEA) {
 			lmv_size = md->body->mbo_eadatasize;
 			if (!lmv_size) {
@@ -618,6 +625,7 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill,
 	}
 	rc = 0;
 
+out_acl:
 	/* for ACL, it's possible that FLACL is set but aclsize is zero.
 	 * only when aclsize != 0 there's an actual segment for ACL
 	 * in reply buffer.
-- 
1.8.3.1

* [lustre-devel] [PATCH 22/26] lustre: update version to 2.14.53
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (22 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 22/25] lustre: mdc: set default LMV on ROOT James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 23/25] lustre: llite: enable filesystem-wide default LMV James Simmons
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.53

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index a840eca..093f898 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 52
+#define LUSTRE_PATCH 53
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.52"
+#define LUSTRE_VERSION_STRING "2.14.53"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/25] lustre: llite: enable filesystem-wide default LMV
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (23 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 22/26] lustre: update version to 2.14.53 James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 23/26] lustre: mdc: set default LMV on ROOT James Simmons
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

This change has three parts:
1. Save each directory's depth relative to ROOT after lookup on the
   client side.
2. Once a space-balanced default LMV is set on ROOT, and
   max-inherit/max-inherit-rr is unlimited or not less than the
   directory depth, create new directories in QOS or round-robin mode.
3. Set the ROOT default LMV max-inherit to unlimited and max-inherit-rr
   to 3, and increase the likelihood of creating a subdirectory on the
   local MDT as the directory's depth from ROOT grows. New directories
   are thus placed by space usage, deeper directories are more likely
   to stay on the local MDT, and the top 3 levels are created in
   round-robin mode if the system is balanced.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14792
Lustre-commit: b9c4dc3c33fe87ec ("LU-14792 llite: enable filesystem-wide default LMV")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
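Note: the inheritance check described in part 2 of the commit message
can be sketched in user space as follows. It mirrors the condition in
ll_qos_mkdir_prep() from this patch. The EX_* names and
example_mkdir_flags() are hypothetical, and the numeric values used for
"none" (0) and "unlimited" (255) are assumptions for illustration rather
than values quoted from lustre_user.h. The two flag bits match BIT(6)
and BIT(7) in the obd.h hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as added to enum md_op_flags by this patch. */
#define EX_MF_QOS_MKDIR		(1u << 6)
#define EX_MF_RR_MKDIR		(1u << 7)

/* Assumed sentinel values, for illustration only. */
#define EX_INHERIT_NONE		0
#define EX_INHERIT_UNLIMITED	255
#define EX_INHERIT_RR_NONE	0
#define EX_INHERIT_RR_UNLIMITED	255

static_assert((EX_MF_QOS_MKDIR & EX_MF_RR_MKDIR) == 0,
	      "the two mkdir flags must be distinct bits");

/*
 * Decide which mkdir flags ROOT's default LMV grants to a parent
 * directory at the given depth. Round-robin is only considered when
 * QOS mkdir is already allowed, as in the patch.
 */
unsigned int example_mkdir_flags(uint8_t max_inherit, uint8_t max_inherit_rr,
				 unsigned short parent_depth)
{
	unsigned int flags = 0;

	if (max_inherit != EX_INHERIT_NONE &&
	    (max_inherit == EX_INHERIT_UNLIMITED ||
	     max_inherit >= parent_depth)) {
		flags |= EX_MF_QOS_MKDIR;
		if (max_inherit_rr != EX_INHERIT_RR_NONE &&
		    (max_inherit_rr == EX_INHERIT_RR_UNLIMITED ||
		     max_inherit_rr >= parent_depth))
			flags |= EX_MF_RR_MKDIR;
	}
	return flags;
}
```

With max-inherit unlimited and max-inherit-rr set to 3, a parent at
depth 3 still gets both flags, while a parent at depth 4 gets only the
QOS flag.
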
 fs/lustre/include/obd.h                 |  5 +++
 fs/lustre/llite/dir.c                   |  2 +
 fs/lustre/llite/file.c                  |  5 ++-
 fs/lustre/llite/llite_internal.h        |  5 ++-
 fs/lustre/llite/llite_lib.c             | 17 ++++++++
 fs/lustre/llite/namei.c                 | 74 ++++++++++++++++++++++++++++++---
 fs/lustre/llite/statahead.c             |  5 ++-
 fs/lustre/lmv/lmv_obd.c                 | 32 ++++++++------
 fs/lustre/lmv/lproc_lmv.c               | 26 +++++++++++-
 include/uapi/linux/lustre/lustre_user.h |  2 +
 10 files changed, 149 insertions(+), 24 deletions(-)
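
Note: the depth scaling from part 3 of the commit message is the
one-line change to the threshold in lmv_locate_tgt_qos(). A minimal
sketch of that arithmetic, using a hypothetical function name and plain
integers in place of the lmv_obd fields. It assumes total_usable is
non-zero and that total_avail is small enough for the multiplication by
256 not to overflow 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Available-space threshold above which mkdir stays on the current MDT.
 * threshold_rr is on the 0..256 scale used by the expression in the
 * patch. dir_depth / 4 is an integer division, so the divisor grows by
 * one step every four directory levels.
 */
uint64_t example_stay_threshold(uint64_t total_avail, uint64_t total_usable,
				unsigned int threshold_rr,
				unsigned short dir_depth)
{
	assert(total_usable > 0);
	assert(threshold_rr <= 256);

	return total_avail * (256 - threshold_rr) /
	       (total_usable * 256 * (1 + dir_depth / 4));
}
```

A lower threshold is easier for the current MDT to meet, which is why
deeper directories are more likely to be created locally: for the same
inputs the threshold at depth 4 is half of the threshold at depths 0
through 3.
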

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index f619342..7c5e699 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -706,6 +706,8 @@ enum md_op_flags {
 	MF_MDC_CANCEL_FID4	= BIT(3),
 	MF_GET_MDT_IDX		= BIT(4),
 	MF_GETATTR_BY_FID	= BIT(5),
+	MF_QOS_MKDIR		= BIT(6),
+	MF_RR_MKDIR		= BIT(7),
 };
 
 enum md_cli_flags {
@@ -795,6 +797,9 @@ struct md_op_data {
 
 	u32			op_projid;
 
+	/* mkdir */
+	unsigned short		op_dir_depth;
+
 	u16			op_mirror_id;
 
 	/*
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 9666534..57f7c3c 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -442,6 +442,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
+	op_data->op_dir_depth = ll_i2info(parent)->lli_depth;
+
 	if (ll_sbi_has_encrypt(sbi) &&
 	    (IS_ENCRYPTED(parent) ||
 	     unlikely(fscrypt_dummy_context_enabled(parent)))) {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index a4e432e..aa5c662 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -676,8 +676,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
 		 * of kernel will deal with that later.
 		 */
 		ll_set_lock_data(sbi->ll_md_exp, inode, itp, &bits);
-		if (bits & MDS_INODELOCK_LOOKUP)
+		if (bits & MDS_INODELOCK_LOOKUP) {
 			d_lustre_revalidate(de);
+			ll_update_dir_depth(parent->d_inode, d_inode(de));
+		}
+
 		/* if DoM bit returned along with LAYOUT bit then there
 		 * can be read-on-open data returned.
 		 */
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 2247806..95e4f45 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -178,13 +178,15 @@ struct ll_inode_info {
 			 * -- I am the owner of dir statahead.
 			 */
 			pid_t				lli_opendir_pid;
+			/* directory depth to ROOT */
+			unsigned short			lli_depth;
 			/* stat will try to access statahead entries or start
 			 * statahead if this flag is set, and this flag will be
 			 * set upon dir open, and cleared when dir is closed,
 			 * statahead hit ratio is too low, or start statahead
 			 * thread failed.
 			 */
-			unsigned int			lli_sa_enabled:1;
+			unsigned short			lli_sa_enabled:1;
 			/* generation for statahead */
 			unsigned int			lli_sa_generation;
 			/* rw lock protects lli_lsm_md */
@@ -1215,6 +1217,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs,
 		       u32 flags);
 int ll_update_inode(struct inode *inode, struct lustre_md *md);
 void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags);
+void ll_update_dir_depth(struct inode *dir, struct inode *inode);
 int ll_read_inode2(struct inode *inode, void *opaque);
 void ll_truncate_inode_pages_final(struct inode *inode);
 void ll_delete_inode(struct inode *inode);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 63d0f02..f540caf 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2483,6 +2483,23 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	return 0;
 }
 
+/* update directory depth to ROOT, called after LOOKUP lock is fetched. */
+void ll_update_dir_depth(struct inode *dir, struct inode *inode)
+{
+	struct ll_inode_info *lli;
+
+	if (!S_ISDIR(inode->i_mode))
+		return;
+
+	if (inode == dir)
+		return;
+
+	lli = ll_i2info(inode);
+	lli->lli_depth = ll_i2info(dir)->lli_depth + 1;
+	CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid),
+	       lli->lli_depth);
+}
+
 void ll_truncate_inode_pages_final(struct inode *inode)
 {
 	struct address_space *mapping = &inode->i_data;
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 5cc01f0..54b4e0a 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -741,8 +741,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 
 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
 		/* We have the "lookup" lock, so unhide dentry */
-		if (bits & MDS_INODELOCK_LOOKUP)
+		if (bits & MDS_INODELOCK_LOOKUP) {
 			d_lustre_revalidate(*de);
+			ll_update_dir_depth(parent, d_inode(*de));
+		}
 
 		if (encrypt) {
 			rc = fscrypt_get_encryption_info(inode);
@@ -1415,10 +1417,6 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			return rc;
 	}
 
-	ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits);
-	if (bits & MDS_INODELOCK_LOOKUP)
-		d_lustre_revalidate(dentry);
-
 	d_instantiate(dentry, inode);
 
 	if (encrypt) {
@@ -1427,8 +1425,17 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			return rc;
 	}
 
-	if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX))
+	if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) {
 		rc = ll_inode_init_security(dentry, inode, dir);
+		if (rc)
+			return rc;
+	}
+
+	ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits);
+	if (bits & MDS_INODELOCK_LOOKUP) {
+		d_lustre_revalidate(dentry);
+		ll_update_dir_depth(dir, inode);
+	}
 
 	return rc;
 }
@@ -1451,6 +1458,58 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 		inode->i_ctime.tv_sec = body->mbo_ctime;
 }
 
+/* once default LMV (space balanced) is set on ROOT, it should take effect if
+ * default LMV is not set on parent directory.
+ */
+static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
+{
+	struct inode *root = dir->i_sb->s_root->d_inode;
+	struct ll_inode_info *rlli = ll_i2info(root);
+	struct ll_inode_info *lli = ll_i2info(dir);
+	struct lmv_stripe_md *lsm;
+
+	op_data->op_dir_depth = lli->lli_depth;
+
+	/* parent directory is striped */
+	if (unlikely(lli->lli_lsm_md))
+		return;
+
+	/* default LMV set on parent directory */
+	if (unlikely(lli->lli_default_lsm_md))
+		return;
+
+	/* parent is ROOT */
+	if (unlikely(dir == root))
+		return;
+
+	/* default LMV not set on ROOT */
+	if (!rlli->lli_default_lsm_md)
+		return;
+
+	down_read(&rlli->lli_lsm_sem);
+	lsm = rlli->lli_default_lsm_md;
+	if (!lsm)
+		goto unlock;
+
+	/* not space balanced */
+	if (lsm->lsm_md_master_mdt_index != LMV_OFFSET_DEFAULT)
+		goto unlock;
+
+	if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE &&
+	    (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED ||
+	     lsm->lsm_md_max_inherit >= lli->lli_depth)) {
+		op_data->op_flags |= MF_QOS_MKDIR;
+		if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
+		    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
+		     lsm->lsm_md_max_inherit_rr >= lli->lli_depth))
+			op_data->op_flags |= MF_RR_MKDIR;
+		CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n",
+		       PFID(&lli->lli_fid), op_data->op_flags);
+	}
+unlock:
+	up_read(&rlli->lli_lsm_sem);
+}
+
 static int ll_new_node(struct inode *dir, struct dentry *dentry,
 		       const char *tgt, umode_t mode, int rdev,
 		       u32 opc)
@@ -1475,6 +1534,9 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 		goto err_exit;
 	}
 
+	if (S_ISDIR(mode))
+		ll_qos_mkdir_prep(op_data, dir);
+
 	if (sbi->ll_flags & LL_SBI_FILE_SECCTX) {
 		err = ll_dentry_init_security(dentry, mode, &dentry->d_name,
 					      &op_data->op_file_secctx_name,
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 8930f61..e00fe58 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -1488,8 +1488,11 @@ static int revalidate_statahead_dentry(struct inode *dir,
 			}
 
 			if ((bits & MDS_INODELOCK_LOOKUP) &&
-			    d_lustre_invalid(*dentryp))
+			    d_lustre_invalid(*dentryp)) {
 				d_lustre_revalidate(*dentryp);
+				ll_update_dir_depth(dir, (*dentryp)->d_inode);
+			}
+
 			ll_intent_release(&it);
 		}
 	}
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 71bf7811..fb64b6c 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1427,7 +1427,8 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 	return md_close(tgt->ltd_exp, op_data, mod, request);
 }
 
-static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
+static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
+					      unsigned short dir_depth)
 {
 	struct lu_tgt_desc *tgt, *cur = NULL;
 	u64 total_avail = 0;
@@ -1470,10 +1471,10 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 
 	/* if current MDT has above-average space, within range of the QOS
 	 * threshold, stay on the same MDT to avoid creating needless remote
-	 * MDT directories.
+	 * MDT directories. It's more likely for low level directories.
 	 */
 	rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) /
-	       (total_usable * 256);
+	       (total_usable * 256 * (1 + dir_depth / 4));
 	if (cur && cur->ltd_qos.ltq_avail >= rand) {
 		tgt = cur;
 		rc = 0;
@@ -1727,12 +1728,14 @@ static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data)
 {
 	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
 
-	return lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT;
+	return (op_data->op_flags & MF_QOS_MKDIR) ||
+	       (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT);
 }
 
-/* mkdir by QoS in two cases:
- * 1. 'lfs mkdir -i -1'
- * 2. parent default LMV master_mdt_index is -1
+/* mkdir by QoS in three cases:
+ * 1. ROOT default LMV is space balanced.
+ * 2. 'lfs mkdir -i -1'
+ * 3. parent default LMV master_mdt_index is -1
  *
  * NB, mkdir by QoS only if parent is not striped, this is to avoid remote
  * directories under striped directory.
@@ -1754,11 +1757,12 @@ static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data)
 	return false;
 }
 
-/* if default LMV is set, and its index is LMV_OFFSET_DEFAULT, and
- * 1. max_inherit_rr is set and is not LMV_INHERIT_RR_NONE
+/* if parent default LMV is space balanced, and
+ * 1. max_inherit_rr is set
  * 2. or parent is ROOT
- * mkdir roundrobin.
- * NB, this also needs to check server is balanced, which is checked by caller.
+ * mkdir roundrobin. Or if parent doesn't have default LMV, while ROOT default
+ * LMV requests roundrobin mkdir, do the same.
+ * NB, this needs to check server is balanced, which is done by caller.
  */
 static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
 {
@@ -1767,7 +1771,8 @@ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
 	if (!lmv_op_default_qos_mkdir(op_data))
 		return false;
 
-	return lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE ||
+	return (op_data->op_flags & MF_RR_MKDIR) ||
+	       (lsm && lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE) ||
 	       fid_is_root(&op_data->op_fid1);
 }
 
@@ -1842,7 +1847,8 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	} else if (lmv_op_qos_mkdir(op_data)) {
 		struct lmv_tgt_desc *tmp = tgt;
 
-		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds);
+		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds,
+					 op_data->op_dir_depth);
 		if (tgt == ERR_PTR(-EAGAIN)) {
 			if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
 			    !lmv_op_default_rr_mkdir(op_data) &&
diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c
index 767b40e..b9efae9 100644
--- a/fs/lustre/lmv/lproc_lmv.c
+++ b/fs/lustre/lmv/lproc_lmv.c
@@ -121,10 +121,21 @@ static ssize_t qos_prio_free_store(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 	struct lmv_obd *lmv = &obd->u.lmv;
+	char buf[6], *tmp;
 	unsigned int val;
 	int rc;
 
-	rc = kstrtouint(buffer, 0, &val);
+	/* "100%\n\0" should be largest string */
+	if (count >= sizeof(buf))
+		return -ERANGE;
+
+	strncpy(buf, buffer, sizeof(buf));
+	buf[sizeof(buf) - 1] = '\0';
+	tmp = strchr(buf, '%');
+	if (tmp)
+		*tmp = '\0';
+
+	rc = kstrtouint(buf, 0, &val);
 	if (rc)
 		return rc;
 
@@ -158,10 +169,21 @@ static ssize_t qos_threshold_rr_store(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 	struct lmv_obd *lmv = &obd->u.lmv;
+	char buf[6], *tmp;
 	unsigned int val;
 	int rc;
 
-	rc = kstrtouint(buffer, 0, &val);
+	/* "100%\n\0" should be largest string */
+	if (count >= sizeof(buf))
+		return -ERANGE;
+
+	strncpy(buf, buffer, sizeof(buf));
+	buf[sizeof(buf) - 1] = '\0';
+	tmp = strchr(buf, '%');
+	if (tmp)
+		*tmp = '\0';
+
+	rc = kstrtouint(buf, 0, &val);
 	if (rc)
 		return rc;
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index da15ca8..b317bbf 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -847,6 +847,8 @@ enum {
 	LMV_INHERIT_RR_DEFAULT		= 0,
 	/* not inherit any more */
 	LMV_INHERIT_RR_END		= 1,
+	/* default inherit_rr of ROOT */
+	LMV_INHERIT_RR_ROOT		= 3,
 	/* max inherit depth */
 	LMV_INHERIT_RR_MAX		= 250,
 	/* [251, 254] are reserved */
-- 
1.8.3.1

* [lustre-devel] [PATCH 23/26] lustre: mdc: set default LMV on ROOT
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (24 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 23/25] lustre: llite: enable filesystem-wide default LMV James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 24/25] lnet: o2iblnd: clear fatal error on successful failover James Simmons
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Lai Siyao, Hongchao Zhang, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

To balance MDT usage, set a default LMV on ROOT if none is set. The
default stripe offset is "-1" and the default stripe count is "1", so
directories created by "mkdir" under ROOT are distributed across all
MDTs according to usage.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13417
Lustre-commit: 3e04b0fd6c3dd363 ("LU-13417 mdd: set default LMV on ROOT")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38553
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mdc/mdc_request.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/lustre/mdc/mdc_request.c b/fs/lustre/mdc/mdc_request.c
index 1fb9c46..8b94f6c 100644
--- a/fs/lustre/mdc/mdc_request.c
+++ b/fs/lustre/mdc/mdc_request.c
@@ -557,6 +557,13 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill,
 			goto out;
 		}
 
+		if (md_exp->exp_obd->obd_type->typ_lu == &mdc_device_type) {
+			CERROR("%s: no LMV, upgrading from old version?\n",
+			       md_exp->exp_obd->obd_name);
+			rc = 0;
+			goto out_acl;
+		}
+
 		if (md->body->mbo_valid & OBD_MD_MEA) {
 			lmv_size = md->body->mbo_eadatasize;
 			if (!lmv_size) {
@@ -618,6 +625,7 @@ static int mdc_get_lustre_md(struct obd_export *exp, struct req_capsule *pill,
 	}
 	rc = 0;
 
+out_acl:
 	/* for ACL, it's possible that FLACL is set but aclsize is zero.
 	 * only when aclsize != 0 there's an actual segment for ACL
 	 * in reply buffer.
-- 
1.8.3.1

* [lustre-devel] [PATCH 24/25] lnet: o2iblnd: clear fatal error on successful failover
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (25 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 23/26] lustre: mdc: set default LMV on ROOT James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV James Simmons
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

In an IB bonding configuration, a link-down event sets the fatal error
flag on the bonded interface, so LNet no longer selects it for tx,
e.g. when just one of the two cables is pulled. This change restores
the interface status on successful failover.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14806
Lustre-commit: 4668283cd13079dd ("LU-14806 o2iblnd: clear fatal error on successful failover")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44139
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
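Note: the decision this patch adds can be sketched in user space as a
three-valued link status (1 up, 0 down, -1 unknown) where only a
definite "up" clears the fatal flag after a successful failover. The
struct and function names below are hypothetical stand-ins for
net_device, netif_running() and ethtool_ops->get_link(); this is an
illustration of the logic, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct net_device used here. */
struct example_netdev {
	bool running;					/* netif_running() */
	int (*get_link)(const struct example_netdev *);	/* may be NULL */
	bool carrier;					/* consulted by get_link */
};

/* Example get_link callback: report the carrier state as 1 or 0. */
int example_carrier_link(const struct example_netdev *dev)
{
	return dev->carrier ? 1 : 0;
}

/* 1 = link up, 0 = link down or device not running, -1 = unknown. */
int example_link_status(const struct example_netdev *dev)
{
	assert(dev != NULL);

	if (!dev->running)
		return 0;
	/* Some devices may not provide link settings. */
	if (dev->get_link)
		return dev->get_link(dev);
	return -1;
}

/* Clear the fatal flag only if failover succeeded and the link is up. */
bool example_should_clear_fatal(int failover_rc,
				const struct example_netdev *dev)
{
	return failover_rc == 0 && dev != NULL &&
	       example_link_status(dev) == 1;
}
```

Treating "unknown" the same as "down" is the conservative choice: the
interface stays excluded from tx until the link state can be confirmed.
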
 net/lnet/klnds/o2iblnd/o2iblnd.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 3141953..686581a 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1487,6 +1487,21 @@ static void kiblnd_fini_fmr_poolset(struct kib_fmr_poolset *fps)
 	}
 }
 
+static int kiblnd_get_link_status(struct net_device *dev)
+{
+	int ret = -1;
+
+	LASSERT(dev);
+
+	if (!netif_running(dev))
+		ret = 0;
+	/* Some devices may not be providing link settings */
+	else if (dev->ethtool_ops->get_link)
+		ret = dev->ethtool_ops->get_link(dev);
+
+	return ret;
+}
+
 static int
 kiblnd_init_fmr_poolset(struct kib_fmr_poolset *fps, int cpt, int ncpts,
 			struct kib_net *net,
@@ -2347,6 +2362,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	struct ib_pd *pd;
 	struct kib_net *net;
 	struct sockaddr_in addr;
+	struct net_device *netdev;
 	unsigned long flags;
 	int rc = 0;
 	int i;
@@ -2467,11 +2483,18 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	if (hdev)
 		kiblnd_hdev_decref(hdev);
 
-	if (rc)
+	if (rc) {
 		dev->ibd_failed_failover++;
-	else
+	} else {
 		dev->ibd_failed_failover = 0;
 
+		rcu_read_lock();
+		netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname);
+		if (netdev && (kiblnd_get_link_status(netdev) == 1))
+			kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0);
+		rcu_read_unlock();
+	}
+
 	return rc;
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (26 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 24/25] lnet: o2iblnd: clear fatal error on successful failover James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl James Simmons
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

This change has three parts:
1. Save each directory's depth relative to ROOT after lookup on the
   client side.
2. Once a space-balanced default LMV is set on ROOT, and
   max-inherit/max-inherit-rr is unlimited or not less than the
   directory depth, create new directories in QOS or round-robin mode.
3. Set the ROOT default LMV max-inherit to unlimited and max-inherit-rr
   to 3, and increase the likelihood of creating a subdirectory on the
   local MDT as the directory's depth from ROOT grows. New directories
   are thus placed by space usage, deeper directories are more likely
   to stay on the local MDT, and the top 3 levels are created in
   round-robin mode if the system is balanced.

WC-bug-id: https://jira.whamcloud.com/browse/LU-14792
Lustre-commit: b9c4dc3c33fe87ec ("LU-14792 llite: enable filesystem-wide default LMV")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44090
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/obd.h                 |  5 +++
 fs/lustre/llite/dir.c                   |  2 +
 fs/lustre/llite/file.c                  |  5 ++-
 fs/lustre/llite/llite_internal.h        |  5 ++-
 fs/lustre/llite/llite_lib.c             | 17 ++++++++
 fs/lustre/llite/namei.c                 | 74 ++++++++++++++++++++++++++++++---
 fs/lustre/llite/statahead.c             |  5 ++-
 fs/lustre/lmv/lmv_obd.c                 | 32 ++++++++------
 fs/lustre/lmv/lproc_lmv.c               | 26 +++++++++++-
 include/uapi/linux/lustre/lustre_user.h |  2 +
 10 files changed, 149 insertions(+), 24 deletions(-)

diff --git a/fs/lustre/include/obd.h b/fs/lustre/include/obd.h
index f619342..7c5e699 100644
--- a/fs/lustre/include/obd.h
+++ b/fs/lustre/include/obd.h
@@ -706,6 +706,8 @@ enum md_op_flags {
 	MF_MDC_CANCEL_FID4	= BIT(3),
 	MF_GET_MDT_IDX		= BIT(4),
 	MF_GETATTR_BY_FID	= BIT(5),
+	MF_QOS_MKDIR		= BIT(6),
+	MF_RR_MKDIR		= BIT(7),
 };
 
 enum md_cli_flags {
@@ -795,6 +797,9 @@ struct md_op_data {
 
 	u32			op_projid;
 
+	/* mkdir */
+	unsigned short		op_dir_depth;
+
 	u16			op_mirror_id;
 
 	/*
diff --git a/fs/lustre/llite/dir.c b/fs/lustre/llite/dir.c
index 9666534..57f7c3c 100644
--- a/fs/lustre/llite/dir.c
+++ b/fs/lustre/llite/dir.c
@@ -442,6 +442,8 @@ static int ll_dir_setdirstripe(struct dentry *dparent, struct lmv_user_md *lump,
 	if (IS_ERR(op_data))
 		return PTR_ERR(op_data);
 
+	op_data->op_dir_depth = ll_i2info(parent)->lli_depth;
+
 	if (ll_sbi_has_encrypt(sbi) &&
 	    (IS_ENCRYPTED(parent) ||
 	     unlikely(fscrypt_dummy_context_enabled(parent)))) {
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index a4e432e..aa5c662 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -676,8 +676,11 @@ static int ll_intent_file_open(struct dentry *de, void *lmm, int lmmsize,
 		 * of kernel will deal with that later.
 		 */
 		ll_set_lock_data(sbi->ll_md_exp, inode, itp, &bits);
-		if (bits & MDS_INODELOCK_LOOKUP)
+		if (bits & MDS_INODELOCK_LOOKUP) {
 			d_lustre_revalidate(de);
+			ll_update_dir_depth(parent->d_inode, d_inode(de));
+		}
+
 		/* if DoM bit returned along with LAYOUT bit then there
 		 * can be read-on-open data returned.
 		 */
diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 2247806..95e4f45 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -178,13 +178,15 @@ struct ll_inode_info {
 			 * -- I am the owner of dir statahead.
 			 */
 			pid_t				lli_opendir_pid;
+			/* directory depth to ROOT */
+			unsigned short			lli_depth;
 			/* stat will try to access statahead entries or start
 			 * statahead if this flag is set, and this flag will be
 			 * set upon dir open, and cleared when dir is closed,
 			 * statahead hit ratio is too low, or start statahead
 			 * thread failed.
 			 */
-			unsigned int			lli_sa_enabled:1;
+			unsigned short			lli_sa_enabled:1;
 			/* generation for statahead */
 			unsigned int			lli_sa_generation;
 			/* rw lock protects lli_lsm_md */
@@ -1215,6 +1217,7 @@ int ll_statfs_internal(struct ll_sb_info *sbi, struct obd_statfs *osfs,
 		       u32 flags);
 int ll_update_inode(struct inode *inode, struct lustre_md *md);
 void ll_update_inode_flags(struct inode *inode, unsigned int ext_flags);
+void ll_update_dir_depth(struct inode *dir, struct inode *inode);
 int ll_read_inode2(struct inode *inode, void *opaque);
 void ll_truncate_inode_pages_final(struct inode *inode);
 void ll_delete_inode(struct inode *inode);
diff --git a/fs/lustre/llite/llite_lib.c b/fs/lustre/llite/llite_lib.c
index 63d0f02..f540caf 100644
--- a/fs/lustre/llite/llite_lib.c
+++ b/fs/lustre/llite/llite_lib.c
@@ -2483,6 +2483,23 @@ int ll_update_inode(struct inode *inode, struct lustre_md *md)
 	return 0;
 }
 
+/* update directory depth to ROOT, called after LOOKUP lock is fetched. */
+void ll_update_dir_depth(struct inode *dir, struct inode *inode)
+{
+	struct ll_inode_info *lli;
+
+	if (!S_ISDIR(inode->i_mode))
+		return;
+
+	if (inode == dir)
+		return;
+
+	lli = ll_i2info(inode);
+	lli->lli_depth = ll_i2info(dir)->lli_depth + 1;
+	CDEBUG(D_INODE, DFID" depth %hu\n", PFID(&lli->lli_fid),
+	       lli->lli_depth);
+}
+
 void ll_truncate_inode_pages_final(struct inode *inode)
 {
 	struct address_space *mapping = &inode->i_data;
diff --git a/fs/lustre/llite/namei.c b/fs/lustre/llite/namei.c
index 5cc01f0..54b4e0a 100644
--- a/fs/lustre/llite/namei.c
+++ b/fs/lustre/llite/namei.c
@@ -741,8 +741,10 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
 
 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
 		/* We have the "lookup" lock, so unhide dentry */
-		if (bits & MDS_INODELOCK_LOOKUP)
+		if (bits & MDS_INODELOCK_LOOKUP) {
 			d_lustre_revalidate(*de);
+			ll_update_dir_depth(parent, d_inode(*de));
+		}
 
 		if (encrypt) {
 			rc = fscrypt_get_encryption_info(inode);
@@ -1415,10 +1417,6 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			return rc;
 	}
 
-	ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits);
-	if (bits & MDS_INODELOCK_LOOKUP)
-		d_lustre_revalidate(dentry);
-
 	d_instantiate(dentry, inode);
 
 	if (encrypt) {
@@ -1427,8 +1425,17 @@ static int ll_create_it(struct inode *dir, struct dentry *dentry,
 			return rc;
 	}
 
-	if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX))
+	if (!(ll_i2sbi(inode)->ll_flags & LL_SBI_FILE_SECCTX)) {
 		rc = ll_inode_init_security(dentry, inode, dir);
+		if (rc)
+			return rc;
+	}
+
+	ll_set_lock_data(ll_i2sbi(dir)->ll_md_exp, inode, it, &bits);
+	if (bits & MDS_INODELOCK_LOOKUP) {
+		d_lustre_revalidate(dentry);
+		ll_update_dir_depth(dir, inode);
+	}
 
 	return rc;
 }
@@ -1451,6 +1458,58 @@ void ll_update_times(struct ptlrpc_request *request, struct inode *inode)
 		inode->i_ctime.tv_sec = body->mbo_ctime;
 }
 
+/* once default LMV (space balanced) is set on ROOT, it should take effect if
+ * default LMV is not set on parent directory.
+ */
+static void ll_qos_mkdir_prep(struct md_op_data *op_data, struct inode *dir)
+{
+	struct inode *root = dir->i_sb->s_root->d_inode;
+	struct ll_inode_info *rlli = ll_i2info(root);
+	struct ll_inode_info *lli = ll_i2info(dir);
+	struct lmv_stripe_md *lsm;
+
+	op_data->op_dir_depth = lli->lli_depth;
+
+	/* parent directory is striped */
+	if (unlikely(lli->lli_lsm_md))
+		return;
+
+	/* default LMV set on parent directory */
+	if (unlikely(lli->lli_default_lsm_md))
+		return;
+
+	/* parent is ROOT */
+	if (unlikely(dir == root))
+		return;
+
+	/* default LMV not set on ROOT */
+	if (!rlli->lli_default_lsm_md)
+		return;
+
+	down_read(&rlli->lli_lsm_sem);
+	lsm = rlli->lli_default_lsm_md;
+	if (!lsm)
+		goto unlock;
+
+	/* not space balanced */
+	if (lsm->lsm_md_master_mdt_index != LMV_OFFSET_DEFAULT)
+		goto unlock;
+
+	if (lsm->lsm_md_max_inherit != LMV_INHERIT_NONE &&
+	    (lsm->lsm_md_max_inherit == LMV_INHERIT_UNLIMITED ||
+	     lsm->lsm_md_max_inherit >= lli->lli_depth)) {
+		op_data->op_flags |= MF_QOS_MKDIR;
+		if (lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE &&
+		    (lsm->lsm_md_max_inherit_rr == LMV_INHERIT_RR_UNLIMITED ||
+		     lsm->lsm_md_max_inherit_rr >= lli->lli_depth))
+			op_data->op_flags |= MF_RR_MKDIR;
+		CDEBUG(D_INODE, DFID" requests qos mkdir %#x\n",
+		       PFID(&lli->lli_fid), op_data->op_flags);
+	}
+unlock:
+	up_read(&rlli->lli_lsm_sem);
+}
+
 static int ll_new_node(struct inode *dir, struct dentry *dentry,
 		       const char *tgt, umode_t mode, int rdev,
 		       u32 opc)
@@ -1475,6 +1534,9 @@ static int ll_new_node(struct inode *dir, struct dentry *dentry,
 		goto err_exit;
 	}
 
+	if (S_ISDIR(mode))
+		ll_qos_mkdir_prep(op_data, dir);
+
 	if (sbi->ll_flags & LL_SBI_FILE_SECCTX) {
 		err = ll_dentry_init_security(dentry, mode, &dentry->d_name,
 					      &op_data->op_file_secctx_name,
diff --git a/fs/lustre/llite/statahead.c b/fs/lustre/llite/statahead.c
index 8930f61..e00fe58 100644
--- a/fs/lustre/llite/statahead.c
+++ b/fs/lustre/llite/statahead.c
@@ -1488,8 +1488,11 @@ static int revalidate_statahead_dentry(struct inode *dir,
 			}
 
 			if ((bits & MDS_INODELOCK_LOOKUP) &&
-			    d_lustre_invalid(*dentryp))
+			    d_lustre_invalid(*dentryp)) {
 				d_lustre_revalidate(*dentryp);
+				ll_update_dir_depth(dir, (*dentryp)->d_inode);
+			}
+
 			ll_intent_release(&it);
 		}
 	}
diff --git a/fs/lustre/lmv/lmv_obd.c b/fs/lustre/lmv/lmv_obd.c
index 71bf7811..fb64b6c 100644
--- a/fs/lustre/lmv/lmv_obd.c
+++ b/fs/lustre/lmv/lmv_obd.c
@@ -1427,7 +1427,8 @@ static int lmv_close(struct obd_export *exp, struct md_op_data *op_data,
 	return md_close(tgt->ltd_exp, op_data, mod, request);
 }
 
-static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
+static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt,
+					      unsigned short dir_depth)
 {
 	struct lu_tgt_desc *tgt, *cur = NULL;
 	u64 total_avail = 0;
@@ -1470,10 +1471,10 @@ static struct lu_tgt_desc *lmv_locate_tgt_qos(struct lmv_obd *lmv, u32 *mdt)
 
 	/* if current MDT has above-average space, within range of the QOS
 	 * threshold, stay on the same MDT to avoid creating needless remote
-	 * MDT directories.
+	 * MDT directories. It's more likely for low level directories.
 	 */
 	rand = total_avail * (256 - lmv->lmv_qos.lq_threshold_rr) /
-	       (total_usable * 256);
+	       (total_usable * 256 * (1 + dir_depth / 4));
 	if (cur && cur->ltd_qos.ltq_avail >= rand) {
 		tgt = cur;
 		rc = 0;
@@ -1727,12 +1728,14 @@ static inline bool lmv_op_default_qos_mkdir(const struct md_op_data *op_data)
 {
 	const struct lmv_stripe_md *lsm = op_data->op_default_mea1;
 
-	return lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT;
+	return (op_data->op_flags & MF_QOS_MKDIR) ||
+	       (lsm && lsm->lsm_md_master_mdt_index == LMV_OFFSET_DEFAULT);
 }
 
-/* mkdir by QoS in two cases:
- * 1. 'lfs mkdir -i -1'
- * 2. parent default LMV master_mdt_index is -1
+/* mkdir by QoS in three cases:
+ * 1. ROOT default LMV is space balanced.
+ * 2. 'lfs mkdir -i -1'
+ * 3. parent default LMV master_mdt_index is -1
  *
  * NB, mkdir by QoS only if parent is not striped, this is to avoid remote
  * directories under striped directory.
@@ -1754,11 +1757,12 @@ static inline bool lmv_op_qos_mkdir(const struct md_op_data *op_data)
 	return false;
 }
 
-/* if default LMV is set, and its index is LMV_OFFSET_DEFAULT, and
- * 1. max_inherit_rr is set and is not LMV_INHERIT_RR_NONE
+/* if parent default LMV is space balanced, and
+ * 1. max_inherit_rr is set
  * 2. or parent is ROOT
- * mkdir roundrobin.
- * NB, this also needs to check server is balanced, which is checked by caller.
+ * mkdir roundrobin. Or if parent doesn't have default LMV, while ROOT default
+ * LMV requests roundrobin mkdir, do the same.
+ * NB, this needs to check server is balanced, which is done by caller.
  */
 static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
 {
@@ -1767,7 +1771,8 @@ static inline bool lmv_op_default_rr_mkdir(const struct md_op_data *op_data)
 	if (!lmv_op_default_qos_mkdir(op_data))
 		return false;
 
-	return lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE ||
+	return (op_data->op_flags & MF_RR_MKDIR) ||
+	       (lsm && lsm->lsm_md_max_inherit_rr != LMV_INHERIT_RR_NONE) ||
 	       fid_is_root(&op_data->op_fid1);
 }
 
@@ -1842,7 +1847,8 @@ int lmv_create(struct obd_export *exp, struct md_op_data *op_data,
 	} else if (lmv_op_qos_mkdir(op_data)) {
 		struct lmv_tgt_desc *tmp = tgt;
 
-		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds);
+		tgt = lmv_locate_tgt_qos(lmv, &op_data->op_mds,
+					 op_data->op_dir_depth);
 		if (tgt == ERR_PTR(-EAGAIN)) {
 			if (ltd_qos_is_balanced(&lmv->lmv_mdt_descs) &&
 			    !lmv_op_default_rr_mkdir(op_data) &&
diff --git a/fs/lustre/lmv/lproc_lmv.c b/fs/lustre/lmv/lproc_lmv.c
index 767b40e..b9efae9 100644
--- a/fs/lustre/lmv/lproc_lmv.c
+++ b/fs/lustre/lmv/lproc_lmv.c
@@ -121,10 +121,21 @@ static ssize_t qos_prio_free_store(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 	struct lmv_obd *lmv = &obd->u.lmv;
+	char buf[6], *tmp;
 	unsigned int val;
 	int rc;
 
-	rc = kstrtouint(buffer, 0, &val);
+	/* "100%\n\0" should be largest string */
+	if (count >= sizeof(buf))
+		return -ERANGE;
+
+	strncpy(buf, buffer, sizeof(buf));
+	buf[sizeof(buf) - 1] = '\0';
+	tmp = strchr(buf, '%');
+	if (tmp)
+		*tmp = '\0';
+
+	rc = kstrtouint(buf, 0, &val);
 	if (rc)
 		return rc;
 
@@ -158,10 +169,21 @@ static ssize_t qos_threshold_rr_store(struct kobject *kobj,
 	struct obd_device *obd = container_of(kobj, struct obd_device,
 					      obd_kset.kobj);
 	struct lmv_obd *lmv = &obd->u.lmv;
+	char buf[6], *tmp;
 	unsigned int val;
 	int rc;
 
-	rc = kstrtouint(buffer, 0, &val);
+	/* "100%\n\0" should be largest string */
+	if (count >= sizeof(buf))
+		return -ERANGE;
+
+	strncpy(buf, buffer, sizeof(buf));
+	buf[sizeof(buf) - 1] = '\0';
+	tmp = strchr(buf, '%');
+	if (tmp)
+		*tmp = '\0';
+
+	rc = kstrtouint(buf, 0, &val);
 	if (rc)
 		return rc;
 
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 748c044..1688a53 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -848,6 +848,8 @@ enum {
 	LMV_INHERIT_RR_DEFAULT		= 0,
 	/* not inherit any more */
 	LMV_INHERIT_RR_END		= 1,
+	/* default inherit_rr of ROOT */
+	LMV_INHERIT_RR_ROOT		= 3,
 	/* max inherit depth */
 	LMV_INHERIT_RR_MAX		= 250,
 	/* [251, 254] are reserved */
-- 
1.8.3.1
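
As a rough illustration of the lmv_locate_tgt_qos() change above, the
"stay on the current MDT" threshold is divided by (1 + dir_depth / 4),
so it drops in steps of four directory levels. The sketch below is a
userspace approximation, not the kernel code: demo_stay_threshold() is
a hypothetical name, and it assumes total_usable counts usable MDTs
and that threshold_rr is on the 0..256 scale implied by the formula.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace copy of the threshold arithmetic from the diff:
 *   rand = total_avail * (256 - threshold_rr) /
 *          (total_usable * 256 * (1 + dir_depth / 4))
 * The current MDT is kept when its available space is >= this value.
 * dir_depth / 4 is an integer division, so depths 0..3 share one
 * threshold, depths 4..7 the next, and so on.
 */
static uint64_t demo_stay_threshold(uint64_t total_avail,
				    uint64_t total_usable,
				    unsigned int threshold_rr,
				    unsigned short dir_depth)
{
	/* assumed preconditions, not checked this way in the patch */
	assert(total_usable > 0);
	assert(threshold_rr <= 256);

	return total_avail * (256 - threshold_rr) /
	       (total_usable * 256 * (1 + dir_depth / 4));
}
```

With total_avail = 4000, total_usable = 4 and threshold_rr = 0, the
threshold is 1000 for depths 0 to 3, 500 at depth 4 and 333 at depth 8,
so a deeper directory is more likely to be created on its parent's MDT.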

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (27 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 25/26] lnet: o2iblnd: clear fatal error on successful failover James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl James Simmons
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

This new command resets the stats shown by "lnetctl stats show". This
is useful when debugging connectivity issues: changes in the stats are
easier to detect from a clean state than on top of historical
values.
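
The reset path can be pictured with a small userspace sketch. It is
only an approximation under stated assumptions: demo_counters,
demo_api_mutex, demo_ioctl() and the DEMO_CMD_* numbers are
hypothetical stand-ins for the LNet counters, the_lnet.ln_api_mutex
and the IOC_LIBCFS_RESET_LNET_STATS case, and a pthread mutex replaces
the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for the LNet counters. */
struct demo_counters {
	unsigned long send_count;
	unsigned long recv_count;
	unsigned long drop_count;
};

static struct demo_counters demo_stats;
/* Hypothetical stand-in for the_lnet.ln_api_mutex. */
static pthread_mutex_t demo_api_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical command numbers, not the real ioctl values. */
enum demo_cmd {
	DEMO_CMD_SHOW_STATS = 1,
	DEMO_CMD_RESET_STATS = 2,
};

/* Dispatch one command; returns 0 on success, -1 for unknown commands. */
static int demo_ioctl(enum demo_cmd cmd, struct demo_counters *out)
{
	switch (cmd) {
	case DEMO_CMD_SHOW_STATS:
		assert(out);
		pthread_mutex_lock(&demo_api_mutex);
		*out = demo_stats;	/* snapshot taken under the lock */
		pthread_mutex_unlock(&demo_api_mutex);
		return 0;
	case DEMO_CMD_RESET_STATS:
		/* zero every counter under the same lock, as the patch does */
		pthread_mutex_lock(&demo_api_mutex);
		memset(&demo_stats, 0, sizeof(demo_stats));
		pthread_mutex_unlock(&demo_api_mutex);
		return 0;
	default:
		return -1;
	}
}
```

After a reset, a later "show" snapshot contains only the traffic seen
since the reset, which is the clean-state comparison described above.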

WC-bug-id: https://jira.whamcloud.com/browse/LU-13299
Lustre-commit: db0b09018e771146 ("LU-13299 lnet: add "stats reset" to lnetctl")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44150
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++-
 net/lnet/lnet/api-ni.c                 | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 2c900ef..7b1c880 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -155,6 +155,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_GET_UDSP_SIZE	_IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_UDSP		_IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_CONST_UDSP_INFO	_IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR				       109
+#define IOC_LIBCFS_RESET_LNET_STATS	_IOWR(IOC_LIBCFS_TYPE, 110, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       110
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 4513d8d..c7df936 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3886,6 +3886,14 @@ u32 lnet_get_dlc_seq_locked(void)
 		return rc;
 	}
 
+	case IOC_LIBCFS_RESET_LNET_STATS:
+	{
+		mutex_lock(&the_lnet.ln_api_mutex);
+		lnet_counters_reset();
+		mutex_unlock(&the_lnet.ln_api_mutex);
+		return 0;
+	}
+
 	case IOC_LIBCFS_CONFIG_RTR:
 		config = arg;
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 25/26] lnet: o2iblnd: clear fatal error on successful failover
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (28 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl James Simmons
@ 2021-08-02 19:50 ` James Simmons
  2021-08-02 19:50 ` [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl James Simmons
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

In an IB bonding configuration, a link down event (e.g. when just
one of the two cables is pulled) causes the fatal error flag to be
set on the bonded interface, so LNet does not select it for tx.
This change allows the interface status to be restored on
successful failover.
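
The decision made after a failover can be sketched in userspace as
follows. This is a simplified model, not the o2iblnd code: struct
demo_netdev and the demo_* functions are hypothetical, and the
get_link callback stands in for ethtool_ops->get_link. The link status
is tri-state (1 up, 0 down, -1 unknown), and the fatal flag is cleared
only when the failover succeeded and the link is known to be up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a network device. */
struct demo_netdev {
	bool running;
	/* may be NULL when the device does not report link state */
	int (*get_link)(const struct demo_netdev *dev);
	bool carrier;
};

/* Example get_link callback: report the carrier bit as 1 or 0. */
static int demo_carrier(const struct demo_netdev *dev)
{
	return dev->carrier ? 1 : 0;
}

/* Returns 1 if the link is up, 0 if down, -1 if it cannot be determined. */
static int demo_get_link_status(const struct demo_netdev *dev)
{
	assert(dev);

	if (!dev->running)
		return 0;
	if (dev->get_link)
		return dev->get_link(dev);
	return -1;
}

/*
 * Returns the new value of the fatal flag after a failover attempt.
 * rc is the failover result (0 on success); dev may be NULL when the
 * interface lookup fails. The flag is left untouched unless the
 * failover succeeded and the link status is exactly 1.
 */
static bool demo_fatal_after_failover(int rc, const struct demo_netdev *dev,
				      bool fatal)
{
	if (rc == 0 && dev && demo_get_link_status(dev) == 1)
		return false;
	return fatal;
}
```

Treating "unknown" the same as "down" is the conservative choice here:
an interface that cannot report its link state keeps its fatal flag.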

WC-bug-id: https://jira.whamcloud.com/browse/LU-14806
Lustre-commit: 4668283cd13079dd ("LU-14806 o2iblnd: clear fatal error on successful failover")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44139
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/o2iblnd/o2iblnd.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 3141953..686581a 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -1487,6 +1487,21 @@ static void kiblnd_fini_fmr_poolset(struct kib_fmr_poolset *fps)
 	}
 }
 
+static int kiblnd_get_link_status(struct net_device *dev)
+{
+	int ret = -1;
+
+	LASSERT(dev);
+
+	if (!netif_running(dev))
+		ret = 0;
+	/* Some devices may not be providing link settings */
+	else if (dev->ethtool_ops->get_link)
+		ret = dev->ethtool_ops->get_link(dev);
+
+	return ret;
+}
+
 static int
 kiblnd_init_fmr_poolset(struct kib_fmr_poolset *fps, int cpt, int ncpts,
 			struct kib_net *net,
@@ -2347,6 +2362,7 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	struct ib_pd *pd;
 	struct kib_net *net;
 	struct sockaddr_in addr;
+	struct net_device *netdev;
 	unsigned long flags;
 	int rc = 0;
 	int i;
@@ -2467,11 +2483,18 @@ int kiblnd_dev_failover(struct kib_dev *dev, struct net *ns)
 	if (hdev)
 		kiblnd_hdev_decref(hdev);
 
-	if (rc)
+	if (rc) {
 		dev->ibd_failed_failover++;
-	else
+	} else {
 		dev->ibd_failed_failover = 0;
 
+		rcu_read_lock();
+		netdev = dev_get_by_name_rcu(ns, dev->ibd_ifname);
+		if (netdev && (kiblnd_get_link_status(netdev) == 1))
+			kiblnd_set_ni_fatal_on(dev->ibd_hdev, 0);
+		rcu_read_unlock();
+	}
+
 	return rc;
 }
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl
  2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
                   ` (29 preceding siblings ...)
  2021-08-02 19:50 ` [lustre-devel] [PATCH 25/26] lnet: o2iblnd: clear fatal error on successful failover James Simmons
@ 2021-08-02 19:50 ` James Simmons
  30 siblings, 0 replies; 32+ messages in thread
From: James Simmons @ 2021-08-02 19:50 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Cyril Bordage, Lustre Development List

From: Cyril Bordage <cbordage@whamcloud.com>

This new command resets the stats shown by "lnetctl stats show". This
is useful when debugging connectivity issues: changes in the stats are
easier to detect from a clean state than on top of historical
values.

WC-bug-id: https://jira.whamcloud.com/browse/LU-13299
Lustre-commit: db0b09018e771146 ("LU-13299 lnet: add "stats reset" to lnetctl")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44150
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lnet/libcfs_ioctl.h | 3 ++-
 net/lnet/lnet/api-ni.c                 | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 2c900ef..7b1c880 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -155,6 +155,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_GET_UDSP_SIZE	_IOWR(IOC_LIBCFS_TYPE, 107, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_UDSP		_IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_CONST_UDSP_INFO	_IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR				       109
+#define IOC_LIBCFS_RESET_LNET_STATS	_IOWR(IOC_LIBCFS_TYPE, 110, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       110
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 4513d8d..c7df936 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3886,6 +3886,14 @@ u32 lnet_get_dlc_seq_locked(void)
 		return rc;
 	}
 
+	case IOC_LIBCFS_RESET_LNET_STATS:
+	{
+		mutex_lock(&the_lnet.ln_api_mutex);
+		lnet_counters_reset();
+		mutex_unlock(&the_lnet.ln_api_mutex);
+		return 0;
+	}
+
 	case IOC_LIBCFS_CONFIG_RTR:
 		config = arg;
 
-- 
1.8.3.1


Thread overview: 32+ messages
2021-08-02 19:50 [lustre-devel] [PATCH 00/25] Sync to OpenSFS tree as of Aug 2, 2021 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 01/25] lustre: llite: avoid stale data reading James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 02/25] lustre: llite: No locked parallel DIO James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 03/25] lnet: discard lnet_current_net_count James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 04/25] lnet: convert kiblnd/ksocknal_thread_start to vararg James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 05/25] lnet: print device status in net show command James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 06/25] lustre: lmv: getattr_name("..") under striped directory James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 07/25] lustre: llite: revert 'simplify callback handling for async getattr' James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 08/25] lnet: Protect lpni deref in lnet_health_check James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 09/25] lustre: uapi: remove MDS_SETATTR_PORTAL and service James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 10/25] lustre: llite: Modify AIO/DIO reference counting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 11/25] lustre: llite: Remove transient page counting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 12/25] lustre: lov: Improve DIO submit James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 13/25] lustre: llite: Adjust dio refcounting James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 14/25] lustre: clio: Skip prep for transients James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 15/25] lustre: osc: Improve osc_queue_sync_pages James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 16/25] lustre: llite: avoid project quota overflow James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 17/25] lnet: check memdup_user_nul using IS_ERR James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 18/25] lustre: osc: Remove lockless truncate James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 19/25] lustre: osc: Remove client contention support James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 20/25] lustre: osc: osc: Do not flush on lockless cancel James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 21/26] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 21/25] lustre: update version to 2.14.53 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 22/25] lustre: mdc: set default LMV on ROOT James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 22/26] lustre: update version to 2.14.53 James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 23/25] lustre: llite: enable filesystem-wide default LMV James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 23/26] lustre: mdc: set default LMV on ROOT James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 24/25] lnet: o2iblnd: clear fatal error on successful failover James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 24/26] lustre: llite: enable filesystem-wide default LMV James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 25/25] lnet: add "stats reset" to lnetctl James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 25/26] lnet: o2iblnd: clear fatal error on successful failover James Simmons
2021-08-02 19:50 ` [lustre-devel] [PATCH 26/26] lnet: add "stats reset" to lnetctl James Simmons
