lustre-devel-lustre.org archive mirror
* [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021
@ 2021-08-23  2:27 James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 01/15] lustre: uapi: support fixed directory layout James Simmons
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Update to the latest OpenSFS work, which is at 2.14.54.

Alex Zhuravlev (1):
  lustre: mgc: rework mgc_apply_recover_logs() for gcc10

Amir Shehata (3):
  lnet: keep in insync to change due to GPU Direct Support
  lustre: osc: Support RDMA only pages
  lnet: peer state to lock primary nid

Chris Horn (3):
  lnet: Reflect ni_fatal in NI status
  lnet: Provide kernel API for adding peers
  lustre: obdclass: Add peer/peer NI when processing llog

James Simmons (1):
  lustre: obdclass: reintroduce lu_ref

Lai Siyao (1):
  lustre: uapi: support fixed directory layout

Mikhail Pershin (1):
  lustre: mdt: implement fallocate in MDC/MDT

Oleg Drokin (1):
  lustre: update version to 2.14.54

Qian Yingjin (1):
  lustre: pcc: add LCM_FL_PCC_RDONLY layout flag

Serguei Smirnov (2):
  lnet: socklnd: allow dynamic setting of conns_per_peer
  lnet: socklnd: set conns_per_peer based on link speed

Shaun Tancheff (1):
  lustre: llite: Proved an abstraction for AS_EXITING

 fs/lustre/Kconfig                          |   9 +
 fs/lustre/include/lu_ref.h                 | 104 +++++++-
 fs/lustre/include/lustre_osc.h             |  10 +-
 fs/lustre/ldlm/ldlm_lib.c                  |   3 +-
 fs/lustre/llite/llite_internal.h           |   7 +
 fs/lustre/llite/vvp_object.c               |   2 +-
 fs/lustre/mdc/mdc_dev.c                    |  29 ++-
 fs/lustre/mgc/mgc_request.c                |  24 +-
 fs/lustre/obdclass/Makefile                |   3 +-
 fs/lustre/obdclass/cl_io.c                 |   8 +
 fs/lustre/obdclass/lu_ref.c                | 393 +++++++++++++++++++++++++++++
 fs/lustre/obdclass/lustre_peer.c           |  18 +-
 fs/lustre/osc/osc_io.c                     |   7 +-
 fs/lustre/osc/osc_request.c                |  18 +-
 fs/lustre/ptlrpc/wiretest.c                |   3 +
 include/linux/lnet/api.h                   |   1 +
 include/linux/lnet/lib-lnet.h              |  19 +-
 include/linux/lnet/lib-types.h             |  32 +++
 include/uapi/linux/lnet/libcfs_ioctl.h     |   3 +-
 include/uapi/linux/lnet/lnet-dlc.h         |  14 +
 include/uapi/linux/lustre/lustre_user.h    |  18 +-
 include/uapi/linux/lustre/lustre_ver.h     |   4 +-
 net/lnet/klnds/o2iblnd/o2iblnd.c           |   1 +
 net/lnet/klnds/o2iblnd/o2iblnd.h           |   9 +-
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c        |  16 +-
 net/lnet/klnds/socklnd/socklnd.c           |  70 ++---
 net/lnet/klnds/socklnd/socklnd.h           |   4 +
 net/lnet/klnds/socklnd/socklnd_modparams.c | 126 ++++++++-
 net/lnet/lnet/api-ni.c                     |  59 ++++-
 net/lnet/lnet/lib-move.c                   |  62 ++++-
 net/lnet/lnet/peer.c                       |  60 ++++-
 net/lnet/lnet/router_proc.c                |   2 +-
 32 files changed, 1007 insertions(+), 131 deletions(-)

-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 01/15] lustre: uapi: support fixed directory layout
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 02/15] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lai Siyao, Lustre Development List

From: Lai Siyao <lai.siyao@whamcloud.com>

Users may not want directories to be split automatically in some cases:
* directory migrated.
* directory restriped.

To support this, an LMV flag LMV_HASH_FLAG_FIXED is added, and it will
be set on migrated/restriped directories. NB, if a directory is migrated
or restriped to a one-stripe directory, it won't be transformed into a
plain directory, because this flag needs to be kept.
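
As a rough illustration of how such a flag can be tested, here is a minimal
userspace sketch. The constants are copied from this patch; the two helper
functions and their names are hypothetical and are not part of the Lustre
headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the patch (lustre_user.h / wiretest.c). */
#define LMV_HASH_TYPE_MASK	0x0000ffff
#define LMV_HASH_FLAG_FIXED	0x02000000
#define LMV_HASH_FLAG_KNOWN	0xfe000000

/* Hypothetical helper: true if the directory must not be split automatically. */
static inline bool lmv_hash_is_fixed_sketch(uint32_t hash)
{
	return (hash & LMV_HASH_FLAG_FIXED) != 0;
}

/* Hypothetical helper: strip the flag bits and return only the hash type. */
static inline uint32_t lmv_hash_type_sketch(uint32_t hash)
{
	return hash & LMV_HASH_TYPE_MASK;
}
```

Because the flag lives outside LMV_HASH_TYPE_MASK, setting it does not change
the hash type, which is consistent with a one-stripe directory keeping its
striped layout so the flag is preserved.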

WC-bug-id: https://jira.whamcloud.com/browse/LU-14459
Lustre-commit: 4c2514f483280137 ("LU-14459 mdt: support fixed directory layout")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43291
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c             | 1 +
 include/uapi/linux/lustre/lustre_user.h | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 4301bd4..7d504bd 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1778,6 +1778,7 @@ void lustre_assert_wire_constants(void)
 	BUILD_BUG_ON(LMV_MAGIC_V1 != 0x0CD20CD0);
 	BUILD_BUG_ON(LMV_MAGIC_STRIPE != 0x0CD40CD0);
 	BUILD_BUG_ON(LMV_HASH_TYPE_MASK != 0x0000ffff);
+	BUILD_BUG_ON(LMV_HASH_FLAG_FIXED != 0x02000000);
 	BUILD_BUG_ON(LMV_HASH_FLAG_MERGE != 0x04000000);
 	BUILD_BUG_ON(LMV_HASH_FLAG_SPLIT != 0x08000000);
 	BUILD_BUG_ON(LMV_HASH_FLAG_LOST_LMV != 0x10000000);
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index b317bbf..7fcc009 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -718,6 +718,9 @@ static inline bool lmv_is_known_hash_type(__u32 type)
 	       (type & LMV_HASH_TYPE_MASK) == LMV_HASH_TYPE_CRUSH;
 }
 
+/* fixed layout, such directories won't split automatically */
+/* NB, update LMV_HASH_FLAG_KNOWN when adding new flag */
+#define LMV_HASH_FLAG_FIXED		0x02000000
 #define LMV_HASH_FLAG_MERGE		0x04000000
 #define LMV_HASH_FLAG_SPLIT		0x08000000
 
@@ -733,6 +736,8 @@ static inline bool lmv_is_known_hash_type(__u32 type)
 #define LMV_HASH_FLAG_LAYOUT_CHANGE	\
 	(LMV_HASH_FLAG_MIGRATION | LMV_HASH_FLAG_SPLIT | LMV_HASH_FLAG_MERGE)
 
+#define LMV_HASH_FLAG_KNOWN		0xfe000000
+
 /* both SPLIT and MIGRATION are set for directory split */
 static inline bool lmv_hash_is_splitting(__u32 hash)
 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 02/15] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 01/15] lustre: uapi: support fixed directory layout James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 03/15] lustre: mdt: implement fallocate in MDC/MDT James Simmons
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Qian Yingjin <qian@ddn.com>

The upcoming PCC-RO feature is combined with FLR and extends
the on-disk data structure 'enum lov_comp_md_flags' for layout
components. It adds a new layout flag: LCM_FL_PCC_RDONLY.

enum lov_comp_md_flags {
        LCM_FL_NONE             = 0x0,
        LCM_FL_RDONLY           = 0x1,
        LCM_FL_WRITE_PENDING    = 0x2,
        LCM_FL_SYNC_PENDING     = 0x3,
        LCM_FL_PCC_RDONLY       = 0x8,
        LCM_FL_FLR_MASK         = 0xB,
};

The LCM_FL_PCC_RDONLY flag, which is dedicated to PCC-RO, is
different from LCM_FL_RDONLY.
A PCC-RO cached file can be in one of these states:
- LCM_FL_PCC_RDONLY | LCM_FL_RDONLY: all FLR components are
  synced and up to date. The replicated file is in read-only
  state, and a client has then attached the file to the PCC
  backend in PCC-RO mode.
- LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING: the file was modified
  at some point and the data of the layout components is not
  synced. The MDT has already picked a primary replica and marked
  the other components as STALE. At this point a client can still
  PCC-RO attach the file. On this client, the primary component
  and the PCC copy are both up to date.

Because LCM_FL_PCC_RDONLY is a new flag, an old client may not
understand this FLR layout flag, which may result in
inconsistent data access.

This patch adds this new flag for the purpose of compatibility and
interoperability.
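
The two states above can be decoded with a small sketch like the following.
The enum values are the ones listed in this message; the helper names and the
use of a 16-bit flags argument are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as listed in the commit message above. */
enum lov_comp_md_flags {
	LCM_FL_NONE          = 0x0,
	LCM_FL_RDONLY        = 0x1,
	LCM_FL_WRITE_PENDING = 0x2,
	LCM_FL_SYNC_PENDING  = 0x3,
	LCM_FL_PCC_RDONLY    = 0x8,
	LCM_FL_FLR_MASK      = 0xB,
};

/* Hypothetical helper: FLR state only, with the PCC-RO bit masked out. */
static inline unsigned int flr_state_sketch(uint16_t lcm_flags)
{
	return lcm_flags & (LCM_FL_FLR_MASK & ~LCM_FL_PCC_RDONLY);
}

/* Hypothetical helper: true if the file is marked as PCC-RO cached. */
static inline bool pcc_ro_cached_sketch(uint16_t lcm_flags)
{
	return (lcm_flags & LCM_FL_PCC_RDONLY) != 0;
}
```

With this split, LCM_FL_PCC_RDONLY | LCM_FL_WRITE_PENDING decodes to the FLR
state LCM_FL_WRITE_PENDING plus the PCC-RO bit, matching the second case
described above.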

WC-bug-id: https://jira.whamcloud.com/browse/LU-13602
Lustre-commit: adc1bbbf20e0a8a5 ("LU-13602 pcc: add LCM_FL_PCC_RDONLY layout flag")
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/40813
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/wiretest.c             |  2 ++
 include/uapi/linux/lustre/lustre_user.h | 13 +++++++------
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/ptlrpc/wiretest.c b/fs/lustre/ptlrpc/wiretest.c
index 7d504bd..b063cb9 100644
--- a/fs/lustre/ptlrpc/wiretest.c
+++ b/fs/lustre/ptlrpc/wiretest.c
@@ -1727,6 +1727,8 @@ void lustre_assert_wire_constants(void)
 		 (long long)LCM_FL_WRITE_PENDING);
 	LASSERTF(LCM_FL_SYNC_PENDING == 3, "found %lld\n",
 		 (long long)LCM_FL_SYNC_PENDING);
+	LASSERTF(LCM_FL_PCC_RDONLY == 8, "found %lld\n",
+		 (long long)LCM_FL_PCC_RDONLY);
 
 	/* Checks for struct lmv_mds_md_v1 */
 	LASSERTF((int)sizeof(struct lmv_mds_md_v1) == 56, "found %lld\n",
diff --git a/include/uapi/linux/lustre/lustre_user.h b/include/uapi/linux/lustre/lustre_user.h
index 7fcc009..1940e52 100644
--- a/include/uapi/linux/lustre/lustre_user.h
+++ b/include/uapi/linux/lustre/lustre_user.h
@@ -622,12 +622,13 @@ static inline __u16 mirror_id_of(__u32 id)
  * on-disk data for lcm_flags. Valid if lcm_magic is LOV_MAGIC_COMP_V1.
  */
 enum lov_comp_md_flags {
-	/* the least 2 bits are used by FLR to record file state */
-	LCM_FL_NONE             = 0,
-	LCM_FL_RDONLY           = 1,
-	LCM_FL_WRITE_PENDING    = 2,
-	LCM_FL_SYNC_PENDING     = 3,
-	LCM_FL_FLR_MASK         = 0x3,
+	/* the least 4 bits are used by FLR to record file state */
+	LCM_FL_NONE             = 0x0,
+	LCM_FL_RDONLY           = 0x1,
+	LCM_FL_WRITE_PENDING    = 0x2,
+	LCM_FL_SYNC_PENDING     = 0x3,
+	LCM_FL_PCC_RDONLY	= 0x8,
+	LCM_FL_FLR_MASK         = 0xB,
 };
 
 struct lov_comp_md_v1 {
-- 
1.8.3.1


* [lustre-devel] [PATCH 03/15] lustre: mdt: implement fallocate in MDC/MDT
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 01/15] lustre: uapi: support fixed directory layout James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 02/15] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 04/15] lnet: Reflect ni_fatal in NI status James Simmons
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Mikhail Pershin, Lustre Development List

From: Mikhail Pershin <mpershin@whamcloud.com>

- add CLIO fallocate() handling in MDC
- implement FALLOCATE RPC handling at MDT side
- update test group 150 in sanity to work with
  sanity-dom.sh test
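
The MDC change in the diff below picks the byte range for the RPC differently
for fallocate and for truncate. A simplified, standalone model of that choice
might look like this; the struct, the function, and the EOF stand-in are
hypothetical names used only for this sketch, not Lustre definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for OBD_OBJECT_EOF; the value here is an assumption. */
#define OBD_OBJECT_EOF_SKETCH	UINT64_MAX

/* Hypothetical container for the two obdo fields the patch fills in. */
struct rpc_range_sketch {
	uint64_t o_size;	/* start of the affected range */
	uint64_t o_blocks;	/* end of the affected range */
};

/*
 * Mirrors the branch added to mdc_io_setattr_start(): fallocate uses the
 * caller's [offset, end) range, truncate uses [new_size, EOF).
 */
static struct rpc_range_sketch
pick_range_sketch(bool is_fallocate, uint64_t falloc_offset,
		  uint64_t falloc_end, uint64_t new_size)
{
	struct rpc_range_sketch r;

	if (is_fallocate) {
		r.o_size = falloc_offset;
		r.o_blocks = falloc_end;
	} else {
		r.o_size = new_size;
		r.o_blocks = OBD_OBJECT_EOF_SKETCH;
	}
	return r;
}
```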

WC-bug-id: https://jira.whamcloud.com/browse/LU-14382
Lustre-commit: 163870abfb7c3fe3 ("LU-14382 mdt: implement fallocate in MDC/MDT")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41418
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h |  4 ++++
 fs/lustre/mdc/mdc_dev.c        | 29 ++++++++++++++++++++---------
 fs/lustre/osc/osc_io.c         |  5 +++--
 fs/lustre/osc/osc_request.c    |  3 ++-
 4 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 8a62eb2..09868ea 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -678,6 +678,8 @@ int osc_reconnect(const struct lu_env *env, struct obd_export *exp,
 int osc_disconnect(struct obd_export *exp);
 int osc_punch_send(struct obd_export *exp, struct obdo *oa,
 		   obd_enqueue_update_f upcall, void *cookie);
+int osc_fallocate_base(struct obd_export *exp, struct obdo *oa,
+		       obd_enqueue_update_f upcall, void *cookie, int mode);
 
 /* osc_io.c */
 int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios,
@@ -712,6 +714,8 @@ void osc_io_lseek_end(const struct lu_env *env,
 		      const struct cl_io_slice *slice);
 int osc_io_lru_reserve(const struct lu_env *env, const struct cl_io_slice *ios,
 		       loff_t pos, size_t count);
+int osc_punch_start(const struct lu_env *env, struct cl_io *io,
+		    struct cl_object *obj);
 
 /* osc_lock.c */
 void osc_lock_to_lockless(const struct lu_env *env, struct osc_lock *ols,
diff --git a/fs/lustre/mdc/mdc_dev.c b/fs/lustre/mdc/mdc_dev.c
index ce4148d..4777b47 100644
--- a/fs/lustre/mdc/mdc_dev.c
+++ b/fs/lustre/mdc/mdc_dev.c
@@ -35,6 +35,7 @@
 
 #include <obd_class.h>
 #include <lustre_osc.h>
+#include <linux/falloc.h>
 #include <uapi/linux/lustre/lustre_param.h>
 
 #include "mdc_internal.h"
@@ -1035,11 +1036,13 @@ static int mdc_io_setattr_start(const struct lu_env *env,
 					      &oio->oi_trunc);
 		if (rc < 0)
 			return rc;
+	} else if (cl_io_is_fallocate(io) &&
+		   io->u.ci_setattr.sa_falloc_mode & FALLOC_FL_PUNCH_HOLE) {
+		rc = osc_punch_start(env, io, obj);
+		if (rc < 0)
+			return rc;
 	}
 
-	if (cl_io_is_fallocate(io))
-		return -EOPNOTSUPP;
-
 	if (oio->oi_lockless == 0) {
 		cl_object_attr_lock(obj);
 		rc = cl_object_attr_get(env, obj, attr);
@@ -1070,7 +1073,7 @@ static int mdc_io_setattr_start(const struct lu_env *env,
 			return rc;
 	}
 
-	if (!(ia_valid & ATTR_SIZE))
+	if (!(ia_valid & ATTR_SIZE) && !cl_io_is_fallocate(io))
 		return 0;
 
 	memset(oa, 0, sizeof(*oa));
@@ -1078,12 +1081,10 @@ static int mdc_io_setattr_start(const struct lu_env *env,
 	oa->o_mtime = attr->cat_mtime;
 	oa->o_atime = attr->cat_atime;
 	oa->o_ctime = attr->cat_ctime;
-
-	oa->o_size = size;
-	oa->o_blocks = OBD_OBJECT_EOF;
 	oa->o_valid = OBD_MD_FLID | OBD_MD_FLGROUP | OBD_MD_FLATIME |
 		      OBD_MD_FLCTIME | OBD_MD_FLMTIME | OBD_MD_FLSIZE |
 		      OBD_MD_FLBLOCKS;
+
 	if (oio->oi_lockless) {
 		oa->o_flags = OBD_FL_SRVLOCK;
 		oa->o_valid |= OBD_MD_FLFLAGS;
@@ -1095,9 +1096,19 @@ static int mdc_io_setattr_start(const struct lu_env *env,
 	}
 
 	init_completion(&cbargs->opc_sync);
+	if (cl_io_is_fallocate(io)) {
+		int falloc_mode = io->u.ci_setattr.sa_falloc_mode;
 
-	rc = osc_punch_send(osc_export(cl2osc(obj)), oa,
-			    mdc_async_upcall, cbargs);
+		oa->o_size = io->u.ci_setattr.sa_falloc_offset;
+		oa->o_blocks = io->u.ci_setattr.sa_falloc_end;
+		rc = osc_fallocate_base(osc_export(cl2osc(obj)), oa,
+					mdc_async_upcall, cbargs, falloc_mode);
+	} else {
+		oa->o_size = size;
+		oa->o_blocks = OBD_OBJECT_EOF;
+		rc = osc_punch_send(osc_export(cl2osc(obj)), oa,
+				    mdc_async_upcall, cbargs);
+	}
 	cbargs->opc_rpc_sent = rc == 0;
 	return rc;
 }
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index 047ae00..d828ae0 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -548,8 +548,8 @@ static void osc_trunc_check(const struct lu_env *env, struct cl_io *io,
  * if server doesn't support fallocate punch, we also need these data to be
  * flushed first to prevent re-ordering with the punch
  */
-static int osc_punch_start(const struct lu_env *env, struct cl_io *io,
-			   struct cl_object *obj)
+int osc_punch_start(const struct lu_env *env, struct cl_io *io,
+		    struct cl_object *obj)
 {
 	struct osc_object *osc = cl2osc(obj);
 	pgoff_t pg_start = cl_index(obj, io->u.ci_setattr.sa_falloc_offset);
@@ -564,6 +564,7 @@ static int osc_punch_start(const struct lu_env *env, struct cl_io *io,
 			     osc);
 	return 0;
 }
+EXPORT_SYMBOL(osc_punch_start);
 
 static int osc_io_setattr_start(const struct lu_env *env,
 				const struct cl_io_slice *slice)
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 2b2ee83..2ac0300 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -472,7 +472,7 @@ int osc_fallocate_base(struct obd_export *exp, struct obdo *oa,
 
 	ptlrpc_request_set_replen(req);
 
-	req->rq_interpret_reply = (ptlrpc_interpterer_t)osc_setattr_interpret;
+	req->rq_interpret_reply = osc_setattr_interpret;
 	BUILD_BUG_ON(sizeof(*sa) > sizeof(req->rq_async_args));
 	sa = ptlrpc_req_async_args(sa, req);
 	sa->sa_oa = oa;
@@ -482,6 +482,7 @@ int osc_fallocate_base(struct obd_export *exp, struct obdo *oa,
 	ptlrpcd_add_req(req);
 	return 0;
 }
+EXPORT_SYMBOL(osc_fallocate_base);
 
 static int osc_sync_interpret(const struct lu_env *env,
 			      struct ptlrpc_request *req,
-- 
1.8.3.1


* [lustre-devel] [PATCH 04/15] lnet: Reflect ni_fatal in NI status
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (2 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 03/15] lustre: mdt: implement fallocate in MDC/MDT James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 05/15] lustre: obdclass: reintroduce lu_ref James Simmons
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

If the ni_fatal_error_on flag is set on an NI then that NI should be
considered down.
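
The order of the checks can be modelled outside the kernel as in the sketch
below. The types and names are simplified placeholders, not the LNet
definitions; only the order of the checks follows the helper added by this
patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder status values; the real LNET_NI_STATUS_* constants differ. */
enum ni_status_sketch { NI_SKETCH_UP, NI_SKETCH_DOWN };

/* Hypothetical, reduced view of the fields the helper looks at. */
struct ni_sketch {
	bool is_loopback;			/* NID is the loopback NID */
	bool fatal_error_on;			/* fatal error flag is set */
	const enum ni_status_sketch *status;	/* may be NULL */
};

/* Loopback is always up; a fatal error overrides any recorded status. */
static enum ni_status_sketch ni_get_status_sketch(const struct ni_sketch *ni)
{
	if (ni->is_loopback)
		return NI_SKETCH_UP;
	if (ni->fatal_error_on)
		return NI_SKETCH_DOWN;
	if (ni->status)
		return *ni->status;
	return NI_SKETCH_UP;
}
```

Putting the precedence in one helper is what lets the three call sites in the
diff drop their open-coded variants, including the ones that dereferenced
ni_status without a NULL check.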

HPE-bug-id: LUS-10167
WC-bug-id: https://jira.whamcloud.com/browse/LU-14790
Lustre-commit: d77e95cc6d4e947b ("LU-14790 lnet: Reflect ni_fatal in NI status")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/44072
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h | 14 ++++++++++++++
 net/lnet/lnet/api-ni.c        | 14 +++-----------
 net/lnet/lnet/router_proc.c   |  2 +-
 3 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 3677a12..ed54477 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -116,6 +116,20 @@
 	return update;
 }
 
+static inline unsigned int
+lnet_ni_get_status_locked(struct lnet_ni *ni)
+__must_hold(&ni->ni_lock)
+{
+	if (ni->ni_nid == LNET_NID_LO_0)
+		return LNET_NI_STATUS_UP;
+	else if (atomic_read(&ni->ni_fatal_error_on))
+		return LNET_NI_STATUS_DOWN;
+	else if (ni->ni_status)
+		return ni->ni_status->ns_status;
+	else
+		return LNET_NI_STATUS_UP;
+}
+
 static inline bool
 lnet_ni_set_status(struct lnet_ni *ni, u32 status)
 {
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index c7df936..370c1d6 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -1829,9 +1829,7 @@ struct lnet_ping_buffer *
 			ns->ns_nid = ni->ni_nid;
 
 			lnet_ni_lock(ni);
-			ns->ns_status = ni->ni_status ?
-					ni->ni_status->ns_status :
-						LNET_NI_STATUS_UP;
+			ns->ns_status = lnet_ni_get_status_locked(ni);
 			ni->ni_status = ns;
 			lnet_ni_unlock(ni);
 
@@ -2936,10 +2934,7 @@ void lnet_lib_exit(void)
 	}
 
 	cfg_ni->lic_nid = ni->ni_nid;
-	if (ni->ni_nid == LNET_NID_LO_0)
-		cfg_ni->lic_status = LNET_NI_STATUS_UP;
-	else
-		cfg_ni->lic_status = ni->ni_status->ns_status;
+	cfg_ni->lic_status = lnet_ni_get_status_locked(ni);
 	cfg_ni->lic_dev_cpt = ni->ni_dev_cpt;
 
 	memcpy(&tun->lt_cmn, &ni->ni_net->net_tunables, sizeof(tun->lt_cmn));
@@ -3022,10 +3017,7 @@ void lnet_lib_exit(void)
 	config->cfg_config_u.cfg_net.net_peer_rtr_credits =
 		ni->ni_net->net_tunables.lct_peer_rtr_credits;
 
-	if (ni->ni_nid == LNET_NID_LO_0)
-		net_config->ni_status = LNET_NI_STATUS_UP;
-	else
-		net_config->ni_status = ni->ni_status->ns_status;
+	net_config->ni_status = lnet_ni_get_status_locked(ni);
 
 	if (ni->ni_cpts) {
 		int num_cpts = min(ni->ni_ncpts, LNET_MAX_SHOW_NUM_CPT);
diff --git a/net/lnet/lnet/router_proc.c b/net/lnet/lnet/router_proc.c
index 0de6681..43f70b6 100644
--- a/net/lnet/lnet/router_proc.c
+++ b/net/lnet/lnet/router_proc.c
@@ -673,7 +673,7 @@ static int proc_lnet_nis(struct ctl_table *table, int write,
 
 			lnet_ni_lock(ni);
 			LASSERT(ni->ni_status);
-			stat = (ni->ni_status->ns_status ==
+			stat = (lnet_ni_get_status_locked(ni) ==
 				LNET_NI_STATUS_UP) ? "up" : "down";
 			lnet_ni_unlock(ni);
 
-- 
1.8.3.1


* [lustre-devel] [PATCH 05/15] lustre: obdclass: reintroduce lu_ref
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (3 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 04/15] lnet: Reflect ni_fatal in NI status James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 06/15] lnet: keep in insync to change due to GPU Direct Support James Simmons
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

Previously lu_ref was removed due to the lack of testing. Intel
brought this back to life so reintroduce this debugging feature.
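
For readers unfamiliar with the idea, the following userspace sketch shows
the general technique of tracking each reference by a (scope, source) pair.
It is a simplified model with hypothetical names, using malloc and a singly
linked list with no locking, and is not the kernel implementation in the diff
below.

```c
#include <stdlib.h>
#include <string.h>

/* One outstanding reference, identified by who took it and why. */
struct ref_link_sketch {
	struct ref_link_sketch *next;
	const char *scope;
	const void *source;
};

/* Tracker embedded in the object being referenced. */
struct ref_sketch {
	struct ref_link_sketch *links;
	int refs;	/* number of tracked links */
	int failed;	/* allocation failures, so later deletes can be excused */
};

static void ref_add_sketch(struct ref_sketch *ref, const char *scope,
			   const void *source)
{
	struct ref_link_sketch *link = calloc(1, sizeof(*link));

	if (!link) {
		ref->failed++;
		return;
	}
	link->scope = scope;
	link->source = source;
	link->next = ref->links;
	ref->links = link;
	ref->refs++;
}

/* Returns 0 if a matching link was found and removed, -1 otherwise. */
static int ref_del_sketch(struct ref_sketch *ref, const char *scope,
			  const void *source)
{
	struct ref_link_sketch **pp;

	for (pp = &ref->links; *pp; pp = &(*pp)->next) {
		struct ref_link_sketch *link = *pp;

		if (link->source == source && !strcmp(link->scope, scope)) {
			*pp = link->next;
			free(link);
			ref->refs--;
			return 0;
		}
	}
	return -1;
}
```

A delete that finds no matching link points at an unbalanced get/put pair,
which is the kind of bug this style of tracking is meant to expose.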

WC-bug-id: https://jira.whamcloud.com/browse/LU-6142
Lustre-commit: 5c98de856618f30 ("LU-6142 obdclass: resolve lu_ref checkpatch issues")
Reviewed-on: https://review.whamcloud.com/44088
WC-bug-id: https://jira.whamcloud.com/browse/LU-8066
Lustre-commit: 6b319185659104b ("LU-8066 obdclass: move lu_ref to debugfs")

Reviewed-on: https://review.whamcloud.com/44311
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
---
 fs/lustre/Kconfig           |   9 +
 fs/lustre/include/lu_ref.h  | 104 ++++++++++--
 fs/lustre/obdclass/Makefile |   3 +-
 fs/lustre/obdclass/cl_io.c  |   8 +
 fs/lustre/obdclass/lu_ref.c | 393 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 503 insertions(+), 14 deletions(-)

diff --git a/fs/lustre/Kconfig b/fs/lustre/Kconfig
index bb0e4e7..bcd9d0a 100644
--- a/fs/lustre/Kconfig
+++ b/fs/lustre/Kconfig
@@ -61,6 +61,15 @@ config LUSTRE_DEBUG_EXPENSIVE_CHECK
 
 	  Use with caution. If unsure, say N.
 
+config LUSTRE_DEBUG_LU_REF
+	bool "Enable Lustre lu_ref checks"
+	depends on LUSTRE_DEBUG_EXPENSIVE_CHECK
+	help
+	  lu_ref gives the ability to track references to a given object. It is
+	  quite cpu expensive so its disabled by default.
+
+	  Use with caution. If unsure, say N.
+
 config LUSTRE_TRANSLATE_ERRNOS
 	bool
 	depends on LUSTRE_FS && !X86
diff --git a/fs/lustre/include/lu_ref.h b/fs/lustre/include/lu_ref.h
index 493df95..7b368c2 100644
--- a/fs/lustre/include/lu_ref.h
+++ b/fs/lustre/include/lu_ref.h
@@ -104,12 +104,91 @@
  * @{
  */
 
-/*
- * dummy data structures/functions to pass compile for now.
- * We need to reimplement them with kref.
+#ifdef CONFIG_LUSTRE_DEBUG_LU_REF
+
+/**
+ * Data-structure to keep track of references to a given object. This is used
+ * for debugging.
+ *
+ * lu_ref is embedded into an object which other entities (objects, threads,
+ * etc.) refer to.
  */
-struct lu_ref {};
-struct lu_ref_link {};
+struct lu_ref {
+	/**
+	 * Spin-lock protecting lu_ref::lf_list.
+	 */
+	spinlock_t		lf_guard;
+	/**
+	 * List of all outstanding references (each represented by struct
+	 * lu_ref_link), pointing to this object.
+	 */
+	struct list_head	lf_list;
+	/**
+	 * # of links.
+	 */
+	short			lf_refs;
+	/**
+	 * Flag set when lu_ref_add() failed to allocate lu_ref_link. It is
+	 * used to mask spurious failure of the following lu_ref_del().
+	 */
+	short			lf_failed;
+	/**
+	 * flags - attribute for the lu_ref, for pad and future use.
+	 */
+	short			lf_flags;
+	/**
+	 * Where was I initialized?
+	 */
+	short			lf_line;
+	const char		*lf_func;
+	/**
+	 * Linkage into a global list of all lu_ref's (lu_ref_refs).
+	 */
+	struct list_head	lf_linkage;
+};
+
+struct lu_ref_link {
+	struct lu_ref	*ll_ref;
+	struct list_head ll_linkage;
+	const char	*ll_scope;
+	const void	*ll_source;
+};
+
+void lu_ref_init_loc(struct lu_ref *ref, const char *func, const int line);
+void lu_ref_fini(struct lu_ref *ref);
+#define lu_ref_init(ref) lu_ref_init_loc(ref, __func__, __LINE__)
+
+void lu_ref_add(struct lu_ref *ref, const char *scope, const void *source);
+
+void lu_ref_add_atomic(struct lu_ref *ref, const char *scope,
+		       const void *source);
+
+void lu_ref_add_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope, const void *source);
+
+void lu_ref_del(struct lu_ref *ref, const char *scope, const void *source);
+
+void lu_ref_set_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope, const void *source0, const void *source1);
+
+void lu_ref_del_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope, const void *source);
+
+void lu_ref_print(const struct lu_ref *ref);
+
+void lu_ref_print_all(void);
+
+int lu_ref_global_init(void);
+
+void lu_ref_global_fini(void);
+
+#else /* !CONFIG_LUSTRE_DEBUG_LU_REF */
+
+struct lu_ref {
+};
+
+struct lu_ref_link {
+};
 
 static inline void lu_ref_init(struct lu_ref *ref)
 {
@@ -119,18 +198,16 @@ static inline void lu_ref_fini(struct lu_ref *ref)
 {
 }
 
-static inline struct lu_ref_link *lu_ref_add(struct lu_ref *ref,
-					     const char *scope,
-					     const void *source)
+static inline void lu_ref_add(struct lu_ref *ref,
+			      const char *scope,
+			      const void *source)
 {
-	return NULL;
 }
 
-static inline struct lu_ref_link *lu_ref_add_atomic(struct lu_ref *ref,
-						    const char *scope,
-						    const void *source)
+static inline void lu_ref_add_atomic(struct lu_ref *ref,
+				     const char *scope,
+				     const void *source)
 {
-	return NULL;
 }
 
 static inline void lu_ref_add_at(struct lu_ref *ref,
@@ -172,6 +249,7 @@ static inline void lu_ref_print(const struct lu_ref *ref)
 static inline void lu_ref_print_all(void)
 {
 }
+#endif /* CONFIG_LUSTRE_DEBUG_LU_REF */
 
 /** @} lu */
 
diff --git a/fs/lustre/obdclass/Makefile b/fs/lustre/obdclass/Makefile
index 1c46ea4..659cdf0 100644
--- a/fs/lustre/obdclass/Makefile
+++ b/fs/lustre/obdclass/Makefile
@@ -6,7 +6,8 @@ obj-$(CONFIG_LUSTRE_FS) += obdclass.o
 obdclass-y := llog.o llog_cat.o llog_obd.o llog_swab.o class_obd.o \
 	      genops.o obd_sysfs.o lprocfs_status.o lprocfs_counters.o \
 	      lustre_handles.o lustre_peer.o statfs_pack.o linkea.o \
-	      obdo.o obd_config.o obd_mount.o lu_object.o lu_ref.o \
+	      obdo.o obd_config.o obd_mount.o lu_object.o \
 	      cl_object.o cl_page.o cl_lock.o cl_io.o kernelcomm.o \
 	      jobid.o integrity.o obd_cksum.o range_lock.o \
 	      lu_tgt_descs.o lu_tgt_pool.o
+obdclass-$(CONFIG_LUSTRE_DEBUG_LU_REF) += lu_ref.o
diff --git a/fs/lustre/obdclass/cl_io.c b/fs/lustre/obdclass/cl_io.c
index 9a0373f..f33a5f3 100644
--- a/fs/lustre/obdclass/cl_io.c
+++ b/fs/lustre/obdclass/cl_io.c
@@ -895,6 +895,14 @@ void cl_page_list_move_head(struct cl_page_list *dst, struct cl_page_list *src,
  */
 void cl_page_list_splice(struct cl_page_list *src, struct cl_page_list *dst)
 {
+#ifdef CONFIG_LUSTRE_DEBUG_LU_REF
+	struct cl_page *page;
+	struct cl_page *tmp;
+
+	cl_page_list_for_each_safe(page, tmp, src)
+		lu_ref_set_at(&page->cp_reference, &page->cp_queue_ref,
+			      "queue", src, dst);
+#endif
 	dst->pl_nr += src->pl_nr;
 	src->pl_nr = 0;
 	list_splice_tail_init(&src->pl_pages, &dst->pl_pages);
diff --git a/fs/lustre/obdclass/lu_ref.c b/fs/lustre/obdclass/lu_ref.c
index fd7ac39..0eb92ce 100644
--- a/fs/lustre/obdclass/lu_ref.c
+++ b/fs/lustre/obdclass/lu_ref.c
@@ -42,3 +42,396 @@
 #include <obd_class.h>
 #include <obd_support.h>
 #include <lu_ref.h>
+
+#ifdef CONFIG_LUSTRE_DEBUG_LU_REF
+/**
+ * Asserts a condition for a given lu_ref. Must be called with
+ * lu_ref::lf_guard held.
+ */
+#define REFASSERT(ref, expr) do {					\
+	struct lu_ref *__tmp = (ref);					\
+									\
+	if (unlikely(!(expr))) {					\
+		lu_ref_print(__tmp);					\
+		spin_unlock(&__tmp->lf_guard);				\
+		lu_ref_print_all();					\
+		LASSERT(0);						\
+		spin_lock(&__tmp->lf_guard);				\
+	}								\
+} while (0)
+
+static struct kmem_cache *lu_ref_link_kmem;
+
+static struct lu_kmem_descr lu_ref_caches[] = {
+	{
+		.ckd_cache = &lu_ref_link_kmem,
+		.ckd_name  = "lu_ref_link_kmem",
+		.ckd_size  = sizeof(struct lu_ref_link)
+	},
+	{
+		.ckd_cache = NULL
+	}
+};
+
+/**
+ * Global list of active (initialized, but not finalized) lu_ref's.
+ *
+ * Protected by lu_ref_refs_guard.
+ */
+static LIST_HEAD(lu_ref_refs);
+static DEFINE_SPINLOCK(lu_ref_refs_guard);
+static struct lu_ref lu_ref_marker = {
+	.lf_guard	= __SPIN_LOCK_UNLOCKED(lu_ref_marker.lf_guard),
+	.lf_list	= LIST_HEAD_INIT(lu_ref_marker.lf_list),
+	.lf_linkage	= LIST_HEAD_INIT(lu_ref_marker.lf_linkage)
+};
+
+void lu_ref_print(const struct lu_ref *ref)
+{
+	struct lu_ref_link *link;
+
+	CERROR("lu_ref: %p %d %d %s:%d\n",
+	       ref, ref->lf_refs, ref->lf_failed, ref->lf_func, ref->lf_line);
+	list_for_each_entry(link, &ref->lf_list, ll_linkage) {
+		CERROR("     link: %s %p\n", link->ll_scope, link->ll_source);
+	}
+}
+
+static int lu_ref_is_marker(const struct lu_ref *ref)
+{
+	return ref == &lu_ref_marker;
+}
+
+void lu_ref_print_all(void)
+{
+	struct lu_ref *ref;
+
+	spin_lock(&lu_ref_refs_guard);
+	list_for_each_entry(ref, &lu_ref_refs, lf_linkage) {
+		if (lu_ref_is_marker(ref))
+			continue;
+
+		spin_lock(&ref->lf_guard);
+		lu_ref_print(ref);
+		spin_unlock(&ref->lf_guard);
+	}
+	spin_unlock(&lu_ref_refs_guard);
+}
+
+void lu_ref_init_loc(struct lu_ref *ref, const char *func, const int line)
+{
+	ref->lf_refs = 0;
+	ref->lf_func = func;
+	ref->lf_line = line;
+	spin_lock_init(&ref->lf_guard);
+	INIT_LIST_HEAD(&ref->lf_list);
+	spin_lock(&lu_ref_refs_guard);
+	list_add(&ref->lf_linkage, &lu_ref_refs);
+	spin_unlock(&lu_ref_refs_guard);
+}
+EXPORT_SYMBOL(lu_ref_init_loc);
+
+void lu_ref_fini(struct lu_ref *ref)
+{
+	spin_lock(&ref->lf_guard);
+	REFASSERT(ref, list_empty(&ref->lf_list));
+	REFASSERT(ref, ref->lf_refs == 0);
+	spin_unlock(&ref->lf_guard);
+	spin_lock(&lu_ref_refs_guard);
+	list_del_init(&ref->lf_linkage);
+	spin_unlock(&lu_ref_refs_guard);
+}
+EXPORT_SYMBOL(lu_ref_fini);
+
+static struct lu_ref_link *lu_ref_add_context(struct lu_ref *ref,
+					      int flags,
+					      const char *scope,
+					      const void *source)
+{
+	struct lu_ref_link *link;
+
+	link = NULL;
+	if (lu_ref_link_kmem) {
+		link = kmem_cache_zalloc(lu_ref_link_kmem, flags);
+		if (link) {
+			link->ll_ref = ref;
+			link->ll_scope = scope;
+			link->ll_source = source;
+			spin_lock(&ref->lf_guard);
+			list_add_tail(&link->ll_linkage, &ref->lf_list);
+			ref->lf_refs++;
+			spin_unlock(&ref->lf_guard);
+		}
+	}
+
+	if (!link) {
+		spin_lock(&ref->lf_guard);
+		ref->lf_failed++;
+		spin_unlock(&ref->lf_guard);
+		link = ERR_PTR(-ENOMEM);
+	}
+
+	return link;
+}
+
+void lu_ref_add(struct lu_ref *ref, const char *scope, const void *source)
+{
+	might_sleep();
+	lu_ref_add_context(ref, GFP_NOFS, scope, source);
+}
+EXPORT_SYMBOL(lu_ref_add);
+
+void lu_ref_add_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope, const void *source)
+{
+	link->ll_ref = ref;
+	link->ll_scope = scope;
+	link->ll_source = source;
+	spin_lock(&ref->lf_guard);
+	list_add_tail(&link->ll_linkage, &ref->lf_list);
+	ref->lf_refs++;
+	spin_unlock(&ref->lf_guard);
+}
+EXPORT_SYMBOL(lu_ref_add_at);
+
+/**
+ * Version of lu_ref_add() to be used in non-blockable contexts.
+ */
+void lu_ref_add_atomic(struct lu_ref *ref, const char *scope,
+		       const void *source)
+{
+	lu_ref_add_context(ref, GFP_ATOMIC, scope, source);
+}
+EXPORT_SYMBOL(lu_ref_add_atomic);
+
+static inline int lu_ref_link_eq(const struct lu_ref_link *link,
+				 const char *scope,
+				 const void *source)
+{
+	return link->ll_source == source && !strcmp(link->ll_scope, scope);
+}
+
+/**
+ * Maximal chain length seen so far.
+ */
+static unsigned int lu_ref_chain_max_length = 127;
+
+/**
+ * Searches for a lu_ref_link with given [scope, source] within given lu_ref.
+ */
+static struct lu_ref_link *lu_ref_find(struct lu_ref *ref, const char *scope,
+				       const void *source)
+{
+	struct lu_ref_link *link;
+	unsigned int iterations;
+
+	iterations = 0;
+	list_for_each_entry(link, &ref->lf_list, ll_linkage) {
+		++iterations;
+		if (lu_ref_link_eq(link, scope, source)) {
+			if (iterations > lu_ref_chain_max_length) {
+				CWARN("Long lu_ref chain %d \"%s\":%p\n",
+				      iterations, scope, source);
+				lu_ref_chain_max_length = iterations * 3 / 2;
+			}
+			return link;
+		}
+	}
+	return NULL;
+}
+
+void lu_ref_del(struct lu_ref *ref, const char *scope, const void *source)
+{
+	struct lu_ref_link *link;
+
+	spin_lock(&ref->lf_guard);
+	link = lu_ref_find(ref, scope, source);
+	if (link) {
+		list_del(&link->ll_linkage);
+		ref->lf_refs--;
+		spin_unlock(&ref->lf_guard);
+		kmem_cache_free(lu_ref_link_kmem, link);
+	} else {
+		REFASSERT(ref, ref->lf_failed > 0);
+		ref->lf_failed--;
+		spin_unlock(&ref->lf_guard);
+	}
+}
+EXPORT_SYMBOL(lu_ref_del);
+
+void lu_ref_set_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope,
+		   const void *source0, const void *source1)
+{
+	spin_lock(&ref->lf_guard);
+	REFASSERT(ref, !IS_ERR_OR_NULL(link));
+	REFASSERT(ref, link->ll_ref == ref);
+	REFASSERT(ref, lu_ref_link_eq(link, scope, source0));
+	link->ll_source = source1;
+	spin_unlock(&ref->lf_guard);
+}
+EXPORT_SYMBOL(lu_ref_set_at);
+
+void lu_ref_del_at(struct lu_ref *ref, struct lu_ref_link *link,
+		   const char *scope, const void *source)
+{
+	spin_lock(&ref->lf_guard);
+	REFASSERT(ref, !IS_ERR_OR_NULL(link));
+	REFASSERT(ref, link->ll_ref == ref);
+	REFASSERT(ref, lu_ref_link_eq(link, scope, source));
+	list_del(&link->ll_linkage);
+	ref->lf_refs--;
+	spin_unlock(&ref->lf_guard);
+}
+EXPORT_SYMBOL(lu_ref_del_at);
+
+static void *lu_ref_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct lu_ref *ref = seq->private;
+
+	spin_lock(&lu_ref_refs_guard);
+	if (list_empty(&ref->lf_linkage))
+		ref = NULL;
+	spin_unlock(&lu_ref_refs_guard);
+
+	return ref;
+}
+
+static void *lu_ref_seq_next(struct seq_file *seq, void *p, loff_t *pos)
+{
+	struct lu_ref *ref = p;
+	struct lu_ref *next;
+
+	LASSERT(seq->private == p);
+	LASSERT(!list_empty(&ref->lf_linkage));
+
+	(*pos)++;
+	spin_lock(&lu_ref_refs_guard);
+	next = list_entry(ref->lf_linkage.next, struct lu_ref, lf_linkage);
+	if (&next->lf_linkage == &lu_ref_refs)
+		p = NULL;
+	else
+		list_move(&ref->lf_linkage, &next->lf_linkage);
+	spin_unlock(&lu_ref_refs_guard);
+
+	return p;
+}
+
+static void lu_ref_seq_stop(struct seq_file *seq, void *p)
+{
+	/* Nothing to do */
+}
+
+
+static int lu_ref_seq_show(struct seq_file *seq, void *p)
+{
+	struct lu_ref *ref  = p;
+	struct lu_ref *next;
+
+	spin_lock(&lu_ref_refs_guard);
+	next = list_entry(ref->lf_linkage.next, struct lu_ref, lf_linkage);
+	if ((&next->lf_linkage == &lu_ref_refs) || lu_ref_is_marker(next)) {
+		spin_unlock(&lu_ref_refs_guard);
+		return 0;
+	}
+
+	/* print the entry */
+	spin_lock(&next->lf_guard);
+	seq_printf(seq, "lu_ref: %p %d %d %s:%d\n",
+		   next, next->lf_refs, next->lf_failed,
+		   next->lf_func, next->lf_line);
+	if (next->lf_refs > 64) {
+		seq_puts(seq, "  too many references, skip\n");
+	} else {
+		struct lu_ref_link *link;
+		int i = 0;
+
+		list_for_each_entry(link, &next->lf_list, ll_linkage)
+			seq_printf(seq, "  #%d link: %s %p\n",
+				   i++, link->ll_scope, link->ll_source);
+	}
+	spin_unlock(&next->lf_guard);
+	spin_unlock(&lu_ref_refs_guard);
+
+	return 0;
+}
+
+static const struct seq_operations lu_ref_seq_ops = {
+	.start	= lu_ref_seq_start,
+	.stop	= lu_ref_seq_stop,
+	.next	= lu_ref_seq_next,
+	.show	= lu_ref_seq_show
+};
+
+static int lu_ref_seq_open(struct inode *inode, struct file *file)
+{
+	struct lu_ref *marker = &lu_ref_marker;
+	int result = 0;
+
+	result = seq_open(file, &lu_ref_seq_ops);
+	if (result == 0) {
+		spin_lock(&lu_ref_refs_guard);
+		if (!list_empty(&marker->lf_linkage))
+			result = -EAGAIN;
+		else
+			list_add(&marker->lf_linkage, &lu_ref_refs);
+		spin_unlock(&lu_ref_refs_guard);
+
+		if (result == 0) {
+			struct seq_file *f = file->private_data;
+
+			f->private = marker;
+		} else {
+			seq_release(inode, file);
+		}
+	}
+
+	return result;
+}
+
+static int lu_ref_seq_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *m = file->private_data;
+	struct lu_ref *ref = m->private;
+
+	spin_lock(&lu_ref_refs_guard);
+	list_del_init(&ref->lf_linkage);
+	spin_unlock(&lu_ref_refs_guard);
+
+	return seq_release(inode, file);
+}
+
+static const struct file_operations lu_ref_dump_fops = {
+	.owner		= THIS_MODULE,
+	.open		= lu_ref_seq_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= lu_ref_seq_release
+};
+
+int lu_ref_global_init(void)
+{
+	int result;
+
+	CDEBUG(D_CONSOLE,
+	       "lu_ref tracking is enabled. Performance isn't.\n");
+
+	result = lu_kmem_init(lu_ref_caches);
+	if (result)
+		return result;
+
+	debugfs_create_file("lu_refs", 0444, debugfs_lustre_root,
+			    NULL, &lu_ref_dump_fops);
+
+	return result;
+}
+
+void lu_ref_global_fini(void)
+{
+	/* debugfs file gets cleaned up by debugfs_remove_recursive on
+	 * debugfs_lustre_root
+	 */
+	lu_kmem_fini(lu_ref_caches);
+}
+
+#endif /* CONFIG_LUSTRE_DEBUG_LU_REF */
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [lustre-devel] [PATCH 06/15] lnet: keep in insync to change due to GPU Direct Support
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (4 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 05/15] lustre: obdclass: reintroduce lu_ref James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 07/15] lustre: osc: Support RDMA only pages James Simmons
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Because most of the HPC community runs kernels that are 10+ years
old, Nvidia created its own version of PCI peer2peer, which sites
want to use. The OpenSFS tree supports this one-off out-of-tree
driver, and that support affects the LNet code. To stay in sync,
port these changes to the proper Linux tree. This also makes it
possible to support PCI peer2peer in the future. The initial
abstraction was poorly done, so it will have to be revisited.
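
For readers following the lib-move.c hunks below, the NI comparison
order after this patch can be sketched in plain userspace C. This is
only an illustration: struct ni_cand, DEV_PRIO_NONE and
ni_cand_is_better() are hypothetical names, not LNet symbols, and the
final round-robin tie-break is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEV_PRIO_NONE UINT_MAX	/* hypothetical name: no device preference */

/* Hypothetical, simplified stand-in for the per-NI values that
 * lnet_get_best_ni() compares; this is not struct lnet_ni.
 */
struct ni_cand {
	int healthv;		/* higher is better */
	unsigned int sel_prio;	/* lower is better */
	unsigned int dev_prio;	/* lower is better */
	unsigned int distance;	/* lower is better */
	int credits;		/* higher is better */
};

/* Return true if @ni should replace @best. The order follows the
 * comment in the patch: health, selection policy, direct dma prio,
 * distance, credits. A full tie keeps @best in this sketch.
 */
static bool ni_cand_is_better(const struct ni_cand *ni,
			      const struct ni_cand *best)
{
	assert(ni && best);

	if (ni->healthv != best->healthv)
		return ni->healthv > best->healthv;
	if (ni->sel_prio != best->sel_prio)
		return ni->sel_prio < best->sel_prio;
	if (ni->dev_prio != best->dev_prio)
		return ni->dev_prio < best->dev_prio;
	if (ni->distance != best->distance)
		return ni->distance < best->distance;
	return ni->credits > best->credits;
}
```

With the stub helpers added in lib-types.h every NI reports UINT_MAX,
so the device priority step never decides anything in this tree.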

WC-bug-id: https://jira.whamcloud.com/browse/LU-14798
Lustre-commit: a7a889f77cec3ad44 ("LU-14798 lnet: add LNet GPU Direct Support")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
lustre-change: https://review.whamcloud.com/37368
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44110
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h      | 25 +++++++++++++++
 net/lnet/klnds/o2iblnd/o2iblnd.c    |  1 +
 net/lnet/klnds/o2iblnd/o2iblnd.h    |  9 +++---
 net/lnet/klnds/o2iblnd/o2iblnd_cb.c | 16 ++++++++--
 net/lnet/lnet/lib-move.c            | 62 ++++++++++++++++++++++++++++++-------
 5 files changed, 95 insertions(+), 18 deletions(-)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index e951e02..6b97ab9 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -312,8 +312,33 @@ struct lnet_lnd {
 
 	/* accept a new connection */
 	int (*lnd_accept)(struct lnet_ni *ni, struct socket *sock);
+
+	/* get dma_dev priority */
+	unsigned int (*lnd_get_dev_prio)(struct lnet_ni *ni,
+					 unsigned int dev_idx);
 };
 
+/* FIXME !!!!! The abstraction for GPU page support (PCI peer2peer)
+ * was done only for the external NVIDIA driver and done very
+ * poorly. Once DRI / TTM supports peer2peer we can redo this
+ * right.
+ */
+static inline unsigned int lnet_get_dev_prio(struct device *dev,
+					     unsigned int dev_idx)
+{
+	return UINT_MAX;
+}
+
+static inline bool lnet_is_rdma_only_page(struct page *page)
+{
+	return false;
+}
+
+static inline unsigned int lnet_get_dev_idx(struct page *page)
+{
+	return 0;
+}
+
 struct lnet_tx_queue {
 	int			tq_credits;	/* # tx credits free */
 	int			tq_credits_min;	/* lowest it's been */
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.c b/net/lnet/klnds/o2iblnd/o2iblnd.c
index 686581a..a4949d8 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.c
@@ -2953,6 +2953,7 @@ static int kiblnd_startup(struct lnet_ni *ni)
 	.lnd_ctl	= kiblnd_ctl,
 	.lnd_send	= kiblnd_send,
 	.lnd_recv	= kiblnd_recv,
+	.lnd_get_dev_prio = kiblnd_get_dev_prio,
 };
 
 static void ko2inlnd_assert_wire_constants(void)
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd.h b/net/lnet/klnds/o2iblnd/o2iblnd.h
index 3691bfe..5066f7b 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd.h
+++ b/net/lnet/klnds/o2iblnd/o2iblnd.h
@@ -858,18 +858,18 @@ static inline void kiblnd_dma_unmap_single(struct ib_device *dev,
 #define KIBLND_UNMAP_ADDR_SET(p, m, a)	do {} while (0)
 #define KIBLND_UNMAP_ADDR(p, m, a)	(a)
 
-static inline int kiblnd_dma_map_sg(struct ib_device *dev,
+static inline int kiblnd_dma_map_sg(struct kib_hca_dev *hdev,
 				    struct scatterlist *sg, int nents,
 				    enum dma_data_direction direction)
 {
-	return ib_dma_map_sg(dev, sg, nents, direction);
+	return ib_dma_map_sg(hdev->ibh_ibdev, sg, nents, direction);
 }
 
-static inline void kiblnd_dma_unmap_sg(struct ib_device *dev,
+static inline void kiblnd_dma_unmap_sg(struct kib_hca_dev *hdev,
 				       struct scatterlist *sg, int nents,
 				       enum dma_data_direction direction)
 {
-	ib_dma_unmap_sg(dev, sg, nents, direction);
+	ib_dma_unmap_sg(hdev->ibh_ibdev, sg, nents, direction);
 }
 
 static inline u64 kiblnd_sg_dma_address(struct ib_device *dev,
@@ -959,3 +959,4 @@ void kiblnd_pack_msg(struct lnet_ni *ni, struct kib_msg *msg, int version,
 int kiblnd_send(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg);
 int kiblnd_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg,
 		int delayed, struct iov_iter *to, unsigned int rlen);
+unsigned int kiblnd_get_dev_prio(struct lnet_ni *ni, unsigned int dev_idx);
diff --git a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 193e75b..8ccd2ab 100644
--- a/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/net/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -615,7 +615,7 @@ static void kiblnd_unmap_tx(struct kib_tx *tx)
 		kiblnd_fmr_pool_unmap(&tx->tx_fmr, tx->tx_status);
 
 	if (tx->tx_nfrags) {
-		kiblnd_dma_unmap_sg(tx->tx_pool->tpo_hdev->ibh_ibdev,
+		kiblnd_dma_unmap_sg(tx->tx_pool->tpo_hdev,
 				    tx->tx_frags, tx->tx_nfrags, tx->tx_dmadir);
 		tx->tx_nfrags = 0;
 	}
@@ -636,7 +636,7 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	tx->tx_dmadir = (rd != tx->tx_rd) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	tx->tx_nfrags = nfrags;
 
-	rd->rd_nfrags = kiblnd_dma_map_sg(hdev->ibh_ibdev, tx->tx_frags,
+	rd->rd_nfrags = kiblnd_dma_map_sg(hdev, tx->tx_frags,
 					  tx->tx_nfrags, tx->tx_dmadir);
 
 	for (i = 0, nob = 0; i < rd->rd_nfrags; i++) {
@@ -1721,6 +1721,18 @@ static int kiblnd_map_tx(struct lnet_ni *ni, struct kib_tx *tx,
 	lnet_finalize(lntmsg, -EIO);
 }
 
+unsigned int
+kiblnd_get_dev_prio(struct lnet_ni *ni, unsigned int dev_idx)
+{
+	struct kib_net *net = ni->ni_data;
+	struct device *dev = NULL;
+
+	if (net)
+		dev = net->ibn_dev->ibd_hdev->ibh_ibdev->dma_device;
+
+	return lnet_get_dev_prio(dev, dev_idx);
+}
+
 int
 kiblnd_recv(struct lnet_ni *ni, void *private, struct lnet_msg *lntmsg,
 	    int delayed, struct iov_iter *to, unsigned int rlen)
diff --git a/net/lnet/lnet/lib-move.c b/net/lnet/lnet/lib-move.c
index 33d7e78..035bda3 100644
--- a/net/lnet/lnet/lib-move.c
+++ b/net/lnet/lnet/lib-move.c
@@ -1420,16 +1420,38 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	return best_route;
 }
 
+static inline unsigned int
+lnet_dev_prio_of_md(struct lnet_ni *ni, unsigned int dev_idx)
+{
+	if (dev_idx == UINT_MAX)
+		return UINT_MAX;
+
+	if (!ni || !ni->ni_net || !ni->ni_net->net_lnd ||
+	    !ni->ni_net->net_lnd->lnd_get_dev_prio)
+		return UINT_MAX;
+
+	return ni->ni_net->net_lnd->lnd_get_dev_prio(ni, dev_idx);
+}
+
 static struct lnet_ni *
 lnet_get_best_ni(struct lnet_net *local_net, struct lnet_ni *best_ni,
 		 struct lnet_peer *peer, struct lnet_peer_net *peer_net,
-		 int md_cpt)
+		 struct lnet_msg *msg, int md_cpt)
 {
-	struct lnet_ni *ni = NULL;
+	struct lnet_libmd *md = msg->msg_md;
+	unsigned int offset = msg->msg_offset;
 	unsigned int shortest_distance;
+	struct lnet_ni *ni = NULL;
 	int best_credits;
 	int best_healthv;
 	u32 best_sel_prio;
+	unsigned int best_dev_prio;
+	unsigned int dev_idx = UINT_MAX;
+	struct page *page = lnet_get_first_page(md, offset);
+
+	msg->msg_rdma_force = lnet_is_rdma_only_page(page);
+	if (msg->msg_rdma_force)
+		dev_idx = lnet_get_dev_idx(page);
 
 	/* If there is no peer_ni that we can send to on this network,
 	 * then there is no point in looking for a new best_ni here.
@@ -1440,9 +1462,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 	if (!best_ni) {
 		best_sel_prio = LNET_MAX_SELECTION_PRIORITY;
 		shortest_distance = UINT_MAX;
+		best_dev_prio = UINT_MAX;
 		best_credits = INT_MIN;
 		best_healthv = 0;
 	} else {
+		best_dev_prio = lnet_dev_prio_of_md(best_ni, dev_idx);
 		shortest_distance = cfs_cpt_distance(lnet_cpt_table(), md_cpt,
 						     best_ni->ni_dev_cpt);
 		best_credits = atomic_read(&best_ni->ni_tx_credits);
@@ -1456,6 +1480,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		int ni_healthv;
 		int ni_fatal;
 		u32 ni_sel_prio;
+		unsigned int ni_dev_prio;
 
 		ni_credits = atomic_read(&ni->ni_tx_credits);
 		ni_healthv = atomic_read(&ni->ni_healthv);
@@ -1471,6 +1496,8 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 					    md_cpt,
 					    ni->ni_dev_cpt);
 
+		ni_dev_prio = lnet_dev_prio_of_md(ni, dev_idx);
+
 		/*
 		 * All distances smaller than the NUMA range
 		 * are treated equally.
@@ -1478,22 +1505,21 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		if (distance < lnet_numa_range)
 			distance = lnet_numa_range;
 
-		/*
-		 * Select on health, shorter distance, available
-		 * credits, then round-robin.
+		/* Select on health, selection policy, direct dma prio,
+		 * shorter distance, available credits, then round-robin.
 		 */
 		if (ni_fatal)
 			continue;
 
 		if (best_ni)
 			CDEBUG(D_NET,
-			       "compare ni %s [c:%d, d:%d, s:%d, p:%u] with best_ni %s [c:%d, d:%d, s:%d, p:%u]\n",
+			       "compare ni %s [c:%d, d:%d, s:%d, p:%u, g:%u] with best_ni %s [c:%d, d:%d, s:%d, p:%u, g:%u]\n",
 			       libcfs_nid2str(ni->ni_nid), ni_credits, distance,
-			       ni->ni_seq, ni_sel_prio,
+			       ni->ni_seq, ni_sel_prio, ni_dev_prio,
 			       (best_ni) ? libcfs_nid2str(best_ni->ni_nid)
 			       : "not selected", best_credits, shortest_distance,
 			       (best_ni) ? best_ni->ni_seq : 0,
-			       best_sel_prio);
+			       best_sel_prio, best_dev_prio);
 		else
 			goto select_ni;
 
@@ -1507,6 +1533,11 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 		else if (ni_sel_prio < best_sel_prio)
 			goto select_ni;
 
+		if (ni_dev_prio > best_dev_prio)
+			continue;
+		else if (ni_dev_prio < best_dev_prio)
+			goto select_ni;
+
 		if (distance > shortest_distance)
 			continue;
 		else if (distance < shortest_distance)
@@ -1522,6 +1553,7 @@ void lnet_usr_translate_stats(struct lnet_ioctl_element_msg_stats *msg_stats,
 
 select_ni:
 		best_sel_prio = ni_sel_prio;
+		best_dev_prio = ni_dev_prio;
 		shortest_distance = distance;
 		best_healthv = ni_healthv;
 		best_ni = ni;
@@ -1812,6 +1844,7 @@ struct lnet_ni *
 lnet_find_best_ni_on_spec_net(struct lnet_ni *cur_best_ni,
 			      struct lnet_peer *peer,
 			      struct lnet_peer_net *peer_net,
+			      struct lnet_msg *msg,
 			      int cpt)
 {
 	struct lnet_net *local_net;
@@ -1829,7 +1862,7 @@ struct lnet_ni *
 	 *	3. Round Robin
 	 */
 	best_ni = lnet_get_best_ni(local_net, cur_best_ni,
-				   peer, peer_net, cpt);
+				   peer, peer_net, msg, cpt);
 
 	return best_ni;
 }
@@ -2064,6 +2097,7 @@ struct lnet_ni *
 	if (!sd->sd_best_ni) {
 		lpn = gwni->lpni_peer_net;
 		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL, gw, lpn,
+							       sd->sd_msg,
 							       sd->sd_md_cpt);
 		if (!sd->sd_best_ni) {
 			CERROR("Internal Error. Expected local ni on %s but non found :%s\n",
@@ -2143,7 +2177,7 @@ struct lnet_ni *
 
 struct lnet_ni *
 lnet_find_best_ni_on_local_net(struct lnet_peer *peer, int md_cpt,
-			       bool discovery)
+			       struct lnet_msg *msg, bool discovery)
 {
 	struct lnet_peer_net *lpn = NULL;
 	struct lnet_peer_net *best_lpn = NULL;
@@ -2237,8 +2271,8 @@ struct lnet_ni *
 		/* Select the best NI on the same net as best_lpn chosen
 		 * above
 		 */
-		best_ni = lnet_find_best_ni_on_spec_net(NULL, peer,
-							best_lpn, md_cpt);
+		best_ni = lnet_find_best_ni_on_spec_net(NULL, peer, best_lpn,
+							msg, md_cpt);
 	}
 
 	return best_ni;
@@ -2298,6 +2332,7 @@ struct lnet_ni *
 		best_ni =
 			lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
 						      sd->sd_best_lpni->lpni_peer_net,
+						      sd->sd_msg,
 						      sd->sd_md_cpt);
 		/* If there is no best_ni we don't have a route */
 		if (!best_ni) {
@@ -2350,6 +2385,7 @@ struct lnet_ni *
 		sd->sd_best_ni = lnet_find_best_ni_on_spec_net(NULL,
 							       sd->sd_peer,
 							       sd->sd_best_lpni->lpni_peer_net,
+							       sd->sd_msg,
 							       sd->sd_md_cpt);
 		if (!sd->sd_best_ni) {
 			CERROR("Unable to forward message to %s. No local NI available\n",
@@ -2382,6 +2418,7 @@ struct lnet_ni *
 		sd->sd_best_ni =
 		  lnet_find_best_ni_on_spec_net(NULL, sd->sd_peer,
 						sd->sd_best_lpni->lpni_peer_net,
+						sd->sd_msg,
 						sd->sd_md_cpt);
 
 		if (!sd->sd_best_ni) {
@@ -2403,6 +2440,7 @@ struct lnet_ni *
 	 */
 	sd->sd_best_ni = lnet_find_best_ni_on_local_net(sd->sd_peer,
 					sd->sd_md_cpt,
+					sd->sd_msg,
 					lnet_msg_discovery(sd->sd_msg));
 	if (sd->sd_best_ni) {
 		sd->sd_best_lpni =
-- 
1.8.3.1

* [lustre-devel] [PATCH 07/15] lustre: osc: Support RDMA only pages
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (5 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 06/15] lnet: keep in insync to change due to GPU Direct Support James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 08/15] lustre: mgc: rework mgc_apply_recover_logs() for gcc10 James Simmons
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Wang Shilong, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Some memory architectures and CPU-offload cards with
on-board memory do not map data pages into the CPU
address space. Allow RDMA of data directly into those
pages without accessing contents.

Therefore, prevent checksumming of these types of pages.
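
The decision made in osc_brw_prep_request() can be sketched in
userspace C as follows. struct brw_ctx and brw_prep_checksum() are
hypothetical names used only for illustration. The reading that
short_io_size is zeroed because inline short I/O would need CPU access
to the page is an assumption, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the inputs the patch looks at. */
struct brw_ctx {
	bool cl_checksum;		/* client checksums enabled */
	bool flavor_has_bulk;		/* sec flavor already covers bulk */
	bool first_page_rdma_only;	/* lnet_is_rdma_only_page() result */
	int short_io_size;		/* candidate short I/O size */
};

/* Mirror the patch: RDMA-only pages turn off checksums and short I/O,
 * then the usual client and security-flavor checks apply.
 */
static bool brw_prep_checksum(struct brw_ctx *ctx)
{
	bool enable_checksum = true;

	assert(ctx);

	if (ctx->first_page_rdma_only) {
		enable_checksum = false;
		ctx->short_io_size = 0;
	}

	if (!ctx->cl_checksum || ctx->flavor_has_bulk)
		enable_checksum = false;

	return enable_checksum;
}
```

Computing the flag once lets the OST_WRITE and read branches test a
single variable instead of repeating the condition.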

WC-bug-id: https://jira.whamcloud.com/browse/LU-14798
Lustre-commit: 29eabeb34c5ba2cffd ("LU-14798 lustre: Support RDMA only pages")
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Lustre-change: https://review.whamcloud.com/37454
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Whamcloud-bug-id: EX-773
Reviewed-on: https://review.whamcloud.com/44111
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/include/lustre_osc.h |  6 ++++--
 fs/lustre/osc/osc_io.c         |  2 ++
 fs/lustre/osc/osc_request.c    | 15 +++++++++++----
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/lustre/include/lustre_osc.h b/fs/lustre/include/lustre_osc.h
index 09868ea..cdc9aae 100644
--- a/fs/lustre/include/lustre_osc.h
+++ b/fs/lustre/include/lustre_osc.h
@@ -956,8 +956,10 @@ struct osc_extent {
 				oe_ndelay:1,
 	/* direct IO pages */
 				oe_dio:1,
-	/* this extent consists of RDMA only pages */
-				oe_is_rdma_only;
+	/* this extent consists of pages that are not directly accessible
+	 *  from the CPU
+	 */
+				oe_is_rdma_only:1;
 	/* how many grants allocated for this extent.
 	 *  Grant allocated for this extent. There is no grant allocated
 	 *  for reading extents and sync write extents.
diff --git a/fs/lustre/osc/osc_io.c b/fs/lustre/osc/osc_io.c
index d828ae0..b867985 100644
--- a/fs/lustre/osc/osc_io.c
+++ b/fs/lustre/osc/osc_io.c
@@ -153,6 +153,8 @@ int osc_io_submit(const struct lu_env *env, const struct cl_io_slice *ios,
 	page = cl_page_list_first(qin);
 	if (page->cp_type == CPT_TRANSIENT)
 		brw_flags |= OBD_BRW_NOCACHE;
+	if (lnet_is_rdma_only_page(page->cp_vmpage))
+		brw_flags |= OBD_BRW_RDMA_ONLY;
 
 	/*
 	 * NOTE: here @page is a top-level page. This is done to avoid
diff --git a/fs/lustre/osc/osc_request.c b/fs/lustre/osc/osc_request.c
index 2ac0300..db73fce 100644
--- a/fs/lustre/osc/osc_request.c
+++ b/fs/lustre/osc/osc_request.c
@@ -1402,6 +1402,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	const char *obd_name = cli->cl_import->imp_obd->obd_name;
 	struct inode *inode = NULL;
 	bool directio = false;
+	bool enable_checksum = true;
 
 	if (pga[0]->pg) {
 		inode = page2inode(pga[0]->pg);
@@ -1545,6 +1546,11 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 		}
 	}
 
+	if (lnet_is_rdma_only_page(pga[0]->pg)) {
+		enable_checksum = false;
+		short_io_size = 0;
+	}
+
 	/* Check if read/write is small enough to be a short io. */
 	if (short_io_size > cli->cl_max_short_io_bytes || niocount > 1 ||
 	    !imp_connect_shortio(cli->cl_import))
@@ -1700,10 +1706,12 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 	if (osc_should_shrink_grant(cli))
 		osc_shrink_grant_local(cli, &body->oa);
 
+	if (!cli->cl_checksum || sptlrpc_flavor_has_bulk(&req->rq_flvr))
+		enable_checksum = false;
+
 	/* size[REQ_REC_OFF] still sizeof (*body) */
 	if (opc == OST_WRITE) {
-		if (cli->cl_checksum &&
-		    !sptlrpc_flavor_has_bulk(&req->rq_flvr)) {
+		if (enable_checksum) {
 			/* store cl_cksum_type in a local variable since
 			 * it can be changed via lprocfs
 			 */
@@ -1743,8 +1751,7 @@ static int osc_brw_prep_request(int cmd, struct client_obd *cli,
 		req_capsule_set_size(pill, &RMF_RCS, RCL_SERVER,
 				     sizeof(u32) * niocount);
 	} else {
-		if (cli->cl_checksum &&
-		    !sptlrpc_flavor_has_bulk(&req->rq_flvr)) {
+		if (enable_checksum) {
 			if ((body->oa.o_valid & OBD_MD_FLFLAGS) == 0)
 				body->oa.o_flags = 0;
 			body->oa.o_flags |= obd_cksum_type_pack(obd_name,
-- 
1.8.3.1

* [lustre-devel] [PATCH 08/15] lustre: mgc: rework mgc_apply_recover_logs() for gcc10
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (6 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 07/15] lustre: osc: Support RDMA only pages James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 09/15] lnet: socklnd: allow dynamic setting of conns_per_peer James Simmons
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Alex Zhuravlev <bzzz@whamcloud.com>

Rework mgc_apply_recover_logs() to use a separate buffer of
appropriate size so that gcc10 doesn't complain:
mgc_request.c:1506:24: error: argument 4 may overlap destination
        object [-Werror=restrict]
 1506 |        pos += sprintf(obdname + pos, "-%s-%s", cname, inst);
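
The pattern the rework moves to can be sketched in userspace C: the
instance string lives in its own fixed-size buffer, so the source and
destination of the final formatted write cannot overlap. The names
build_obdname() and INST_MAXLEN are hypothetical, and %p stands in for
the kernel-only %px.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INST_MAXLEN 64	/* hypothetical; the patch uses MTI_NAME_MAXLEN */

/* Build "<base>-<cname>-<instance>" in @out.
 * Returns 0 on success, -1 if anything would be truncated.
 */
static int build_obdname(char *out, size_t outsz, const char *base,
			 const char *cname, const void *instance)
{
	char inst[INST_MAXLEN + 1];	/* separate from @out */
	int pos;
	int n;

	assert(out && base && cname);

	n = snprintf(inst, sizeof(inst), "%p", (void *)instance);
	if (n < 0 || (size_t)n >= sizeof(inst))
		return -1;

	pos = snprintf(out, outsz, "%s", base);
	if (pos < 0 || (size_t)pos >= outsz)
		return -1;

	/* Bound the append by the space left after @pos. */
	n = snprintf(out + pos, outsz - pos, "-%s-%s", cname, inst);
	if (n < 0 || (size_t)n >= outsz - pos)
		return -1;

	return 0;
}
```

The sketch bounds the append by the remaining space rather than the
full buffer size, which is the stricter of the two choices.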

WC-bug-id: https://jira.whamcloud.com/browse/LU-14093
Lustre-commit: d13d8158e816b7ac ("LU-14093 mgc: rework mgc_apply_recover_logs() for gcc10")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/40484
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/mgc/mgc_request.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/lustre/mgc/mgc_request.c b/fs/lustre/mgc/mgc_request.c
index 50044aa2..3955d1f 100644
--- a/fs/lustre/mgc/mgc_request.c
+++ b/fs/lustre/mgc/mgc_request.c
@@ -1093,7 +1093,7 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 	struct lustre_cfg *lcfg;
 	struct lustre_cfg_bufs bufs;
 	u64 prev_version = 0;
-	char *inst;
+	char inst[MTI_NAME_MAXLEN + 1];
 	char *buf;
 	int bufsz;
 	int pos;
@@ -1107,19 +1107,15 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 	/* get dynamic nids setting */
 	dynamic_nids = mgc->obd_dynamic_nids;
 
-	inst = kzalloc(PAGE_SIZE, GFP_KERNEL);
-	if (!inst)
-		return -ENOMEM;
-
-	pos = snprintf(inst, PAGE_SIZE, "%px", cfg->cfg_instance);
-	if (pos >= PAGE_SIZE) {
-		kfree(inst);
+	pos = snprintf(inst, sizeof(inst), "%px", cfg->cfg_instance);
+	if (pos >= sizeof(inst))
 		return -E2BIG;
-	}
 
-	++pos;
-	buf = inst + pos;
-	bufsz = PAGE_SIZE - pos;
+	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	bufsz = PAGE_SIZE;
+	pos = 0;
 
 	while (datalen > 0) {
 		int entry_len = sizeof(*entry);
@@ -1204,7 +1200,7 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 				  is_ost ? "OST" : "MDT", entry->mne_index);
 
 		cname = is_ost ? "osc" : "mdc",
-			pos += sprintf(obdname + pos, "-%s-%s", cname, inst);
+		pos += snprintf(obdname + pos, bufsz - pos, "-%s-%s", cname, inst);
 		lustre_cfg_bufs_reset(&bufs, obdname);
 
 		/* find the obd by obdname */
@@ -1308,7 +1304,7 @@ static int mgc_apply_recover_logs(struct obd_device *mgc,
 		/* continue, even one with error */
 	}
 
-	kfree(inst);
+	kfree(buf);
 	return rc;
 }
 
-- 
1.8.3.1

* [lustre-devel] [PATCH 09/15] lnet: socklnd: allow dynamic setting of conns_per_peer
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (7 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 08/15] lustre: mgc: rework mgc_apply_recover_logs() for gcc10 James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 10/15] lnet: Provide kernel API for adding peers James Simmons
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Modify lnetctl and associated code to allow dynamic setting
of the conns_per_peer LND parameter per NI.

The parameter can be set for a specific active nid:
        lnetctl net set --nid 192.168.122.10@tcp --conns-per-peer=4

Or when adding a new net, taking effect on the new nid:
        lnetctl net add --net tcp --if eth0 --conns-per-peer=1

By default, the conns_per_peer value specified as the module
parameter is used.
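
The fallback described above can be sketched in userspace C. The
helper name effective_conns_per_peer() is hypothetical, and treating
zero as "not set" is an assumption made for this illustration. Only
DEFAULT_CONNS_PER_PEER and its value come from the patch.

```c
#include <assert.h>

#define DEFAULT_CONNS_PER_PEER 1	/* value from lib-lnet.h in this patch */

/* Pick the value an NI would use: a per-NI setting wins, then the
 * module parameter, then the built-in default. Zero is assumed to
 * mean "not set" in this sketch.
 */
static unsigned int effective_conns_per_peer(unsigned int ni_value,
					     unsigned int modparam_value)
{
	unsigned int value;

	if (ni_value)
		value = ni_value;
	else if (modparam_value)
		value = modparam_value;
	else
		value = DEFAULT_CONNS_PER_PEER;

	/* Every path yields at least one connection. */
	assert(value > 0);
	return value;
}
```

For example, an NI configured with --conns-per-peer=4 would use 4
regardless of the module parameter under these assumptions.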

WC-bug-id: https://jira.whamcloud.com/browse/LU-12815
Lustre-commit: a5cbe7883db6d77b ("LU-12815 socklnd: allow dynamic setting of conns_per_peer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/41463
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-lnet.h              |  3 ++
 include/uapi/linux/lnet/libcfs_ioctl.h     |  3 +-
 include/uapi/linux/lnet/lnet-dlc.h         | 14 ++++++
 net/lnet/klnds/socklnd/socklnd.c           | 70 +++++++++++++++---------------
 net/lnet/klnds/socklnd/socklnd.h           |  4 ++
 net/lnet/klnds/socklnd/socklnd_modparams.c | 55 +++++++++++++++++++++--
 net/lnet/lnet/api-ni.c                     | 43 ++++++++++++++++++
 7 files changed, 154 insertions(+), 38 deletions(-)

diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index ed54477..760c093 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -86,6 +86,9 @@
 #define DEFAULT_PEER_CREDITS	8
 #define DEFAULT_CREDITS	256
 
+/* default number of connections per peer */
+#define DEFAULT_CONNS_PER_PEER	1
+
 int choose_ipv4_src(u32 *ret, int interface, u32 dst_ipaddr, struct net *ns);
 
 bool lnet_is_route_alive(struct lnet_route *route);
diff --git a/include/uapi/linux/lnet/libcfs_ioctl.h b/include/uapi/linux/lnet/libcfs_ioctl.h
index 7b1c880..f2ae76c 100644
--- a/include/uapi/linux/lnet/libcfs_ioctl.h
+++ b/include/uapi/linux/lnet/libcfs_ioctl.h
@@ -156,6 +156,7 @@ struct libcfs_ioctl_data {
 #define IOC_LIBCFS_GET_UDSP		_IOWR(IOC_LIBCFS_TYPE, 108, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_GET_CONST_UDSP_INFO	_IOWR(IOC_LIBCFS_TYPE, 109, IOCTL_CONFIG_SIZE)
 #define IOC_LIBCFS_RESET_LNET_STATS	_IOWR(IOC_LIBCFS_TYPE, 110, IOCTL_CONFIG_SIZE)
-#define IOC_LIBCFS_MAX_NR				       110
+#define IOC_LIBCFS_SET_CONNS_PER_PEER	_IOWR(IOC_LIBCFS_TYPE, 111, IOCTL_CONFIG_SIZE)
+#define IOC_LIBCFS_MAX_NR				       111
 
 #endif /* __LIBCFS_IOCTL_H__ */
diff --git a/include/uapi/linux/lnet/lnet-dlc.h b/include/uapi/linux/lnet/lnet-dlc.h
index ef60224..2ca70eb 100644
--- a/include/uapi/linux/lnet/lnet-dlc.h
+++ b/include/uapi/linux/lnet/lnet-dlc.h
@@ -81,9 +81,16 @@ struct lnet_ioctl_config_o2iblnd_tunables {
 	__u16 lnd_ntx;
 };
 
+struct lnet_ioctl_config_socklnd_tunables {
+	__u32 lnd_version;
+	__u16 lnd_conns_per_peer;
+	__u16 lnd_pad;
+};
+
 struct lnet_lnd_tunables {
 	union {
 		struct lnet_ioctl_config_o2iblnd_tunables lnd_o2ib;
+		struct lnet_ioctl_config_socklnd_tunables lnd_sock;
 	} lnd_tun_u;
 };
 
@@ -280,6 +287,13 @@ struct lnet_ioctl_reset_health_cfg {
 	lnet_nid_t rh_nid;
 };
 
+struct lnet_ioctl_reset_conns_per_peer_cfg {
+	struct libcfs_ioctl_hdr rcpp_hdr;
+	__u16 rcpp_all:1;
+	__s16 rcpp_value;
+	lnet_nid_t rcpp_nid;
+};
+
 struct lnet_ioctl_recovery_list {
 	struct libcfs_ioctl_hdr rlst_hdr;
 	enum lnet_health_type rlst_type:32;
diff --git a/net/lnet/klnds/socklnd/socklnd.c b/net/lnet/klnds/socklnd/socklnd.c
index cbbbb0c..96cb0e0 100644
--- a/net/lnet/klnds/socklnd/socklnd.c
+++ b/net/lnet/klnds/socklnd/socklnd.c
@@ -37,7 +37,7 @@
  * Author: Eric Barton <eric@bartonsoftware.com>
  */
 
-#include <linux/pci.h>
+#include <linux/ethtool.h>
 #include <linux/inetdevice.h>
 #include <linux/sunrpc/addr.h>
 #include "socklnd.h"
@@ -135,6 +135,7 @@ static int ksocknal_ip2index(struct sockaddr *addr, struct lnet_ni *ni)
 	conn_cb->ksnr_ctrl_conn_count = 0;
 	conn_cb->ksnr_blki_conn_count = 0;
 	conn_cb->ksnr_blko_conn_count = 0;
+	conn_cb->ksnr_max_conns = 0;
 
 	return conn_cb;
 }
@@ -394,6 +395,19 @@ struct ksock_peer_ni *
 	return count;
 }
 
+static unsigned int
+ksocknal_get_conns_per_peer(struct ksock_peer_ni *peer_ni)
+{
+	struct lnet_ni *ni = peer_ni->ksnp_ni;
+	struct lnet_ioctl_config_socklnd_tunables *tunables;
+
+	LASSERT(ni);
+
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_sock;
+
+	return tunables->lnd_conns_per_peer;
+}
+
 static void
 ksocknal_incr_conn_count(struct ksock_conn_cb *conn_cb,
 			 int type)
@@ -409,19 +423,16 @@ struct ksock_peer_ni *
 		break;
 	case SOCKLND_CONN_BULK_IN:
 		conn_cb->ksnr_blki_conn_count++;
-		if (conn_cb->ksnr_blki_conn_count >=
-		    *ksocknal_tunables.ksnd_conns_per_peer)
+		if (conn_cb->ksnr_blki_conn_count >= conn_cb->ksnr_max_conns)
 			conn_cb->ksnr_connected |= BIT(type);
 		break;
 	case SOCKLND_CONN_BULK_OUT:
 		conn_cb->ksnr_blko_conn_count++;
-		if (conn_cb->ksnr_blko_conn_count >=
-		    *ksocknal_tunables.ksnd_conns_per_peer)
+		if (conn_cb->ksnr_blko_conn_count >= conn_cb->ksnr_max_conns)
 			conn_cb->ksnr_connected |= BIT(type);
 		break;
 	case SOCKLND_CONN_ANY:
-		if (conn_cb->ksnr_conn_count >=
-		    *ksocknal_tunables.ksnd_conns_per_peer)
+		if (conn_cb->ksnr_conn_count >= conn_cb->ksnr_max_conns)
 			conn_cb->ksnr_connected |= BIT(type);
 		break;
 	default:
@@ -429,9 +440,8 @@ struct ksock_peer_ni *
 		break;
 	}
 
-	CDEBUG(D_NET, "Add conn type %d, ksnr_connected %x conns_per_peer %d\n",
-	       type, conn_cb->ksnr_connected,
-	       *ksocknal_tunables.ksnd_conns_per_peer);
+	CDEBUG(D_NET, "Add conn type %d, ksnr_connected %x ksnr_max_conns %d\n",
+	       type, conn_cb->ksnr_connected, conn_cb->ksnr_max_conns);
 }
 
 static void
@@ -597,6 +607,13 @@ struct ksock_peer_ni *
 
 	ksocknal_add_conn_cb_locked(peer_ni, conn_cb);
 
+	/* Remember conns_per_peer setting at the time
+	 * of connection initiation. It will define the
+	 * max number of conns per type for this conn_cb
+	 * while it's in use.
+	 */
+	conn_cb->ksnr_max_conns = ksocknal_get_conns_per_peer(peer_ni);
+
 	write_unlock_bh(&ksocknal_data.ksnd_global_lock);
 
 	return 0;
@@ -1002,7 +1019,13 @@ struct ksock_peer_ni *
 				continue;
 
 			num_dup++;
-			if (num_dup < *ksocknal_tunables.ksnd_conns_per_peer)
+			/* If max conns per type is not registered in conn_cb
+			 * as ksnr_max_conns, use ni's conns_per_peer
+			 */
+			if ((peer_ni->ksnp_conn_cb &&
+			     num_dup < peer_ni->ksnp_conn_cb->ksnr_max_conns) ||
+			    (!peer_ni->ksnp_conn_cb &&
+			     num_dup < ksocknal_get_conns_per_peer(peer_ni)))
 				continue;
 
 			/* Reply on a passive connection attempt so the peer_ni
@@ -1229,7 +1252,7 @@ struct ksock_peer_ni *
 		 * of the given type got created
 		 */
 		if (ksocknal_get_conn_count_by_type(conn_cb, conn->ksnc_type) ==
-		    *ksocknal_tunables.ksnd_conns_per_peer)
+		    conn_cb->ksnr_max_conns)
 			LASSERT((conn_cb->ksnr_connected &
 				BIT(conn->ksnc_type)) != 0);
 
@@ -2288,7 +2311,6 @@ static int ksocknal_device_event(struct notifier_block *unused,
 ksocknal_startup(struct lnet_ni *ni)
 {
 	struct ksock_net *net;
-	struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;
 	struct ksock_interface *ksi = NULL;
 	struct lnet_inetdev *ifaces = NULL;
 	struct sockaddr_in *sa;
@@ -2309,28 +2331,8 @@ static int ksocknal_device_event(struct notifier_block *unused,
 
 	net->ksnn_incarnation = ktime_get_real_ns();
 	ni->ni_data = net;
-	net_tunables = &ni->ni_net->net_tunables;
-
-	if (net_tunables->lct_peer_timeout == -1)
-		net_tunables->lct_peer_timeout =
-			*ksocknal_tunables.ksnd_peertimeout;
-
-	if (net_tunables->lct_max_tx_credits == -1)
-		net_tunables->lct_max_tx_credits =
-			*ksocknal_tunables.ksnd_credits;
-
-	if (net_tunables->lct_peer_tx_credits == -1)
-		net_tunables->lct_peer_tx_credits =
-			*ksocknal_tunables.ksnd_peertxcredits;
-
-	if (net_tunables->lct_peer_tx_credits >
-	    net_tunables->lct_max_tx_credits)
-		net_tunables->lct_peer_tx_credits =
-			net_tunables->lct_max_tx_credits;
 
-	if (net_tunables->lct_peer_rtr_credits == -1)
-		net_tunables->lct_peer_rtr_credits =
-			*ksocknal_tunables.ksnd_peerrtrcredits;
+	ksocknal_tunables_setup(ni);
 
 	rc = lnet_inet_enumerate(&ifaces, ni->ni_net_ns);
 	if (rc < 0)
diff --git a/net/lnet/klnds/socklnd/socklnd.h b/net/lnet/klnds/socklnd/socklnd.h
index 45103a3..7a55492 100644
--- a/net/lnet/klnds/socklnd/socklnd.h
+++ b/net/lnet/klnds/socklnd/socklnd.h
@@ -403,6 +403,9 @@ struct ksock_conn_cb {
 	int			ksnr_conn_count;	/* total # conns for
 							 * this cb
 							 */
+	unsigned int		ksnr_max_conns;		/* conns_per_peer at
+							 * peer creation
+							 */
 };
 
 #define SOCKNAL_KEEPALIVE_PING	1	/* cookie for keepalive ping */
@@ -696,6 +699,7 @@ int ksocknal_lib_get_conn_tunables(struct ksock_conn *conn, int *txmem,
 void ksocknal_write_callback(struct ksock_conn *conn);
 
 int ksocknal_tunables_init(void);
+void ksocknal_tunables_setup(struct lnet_ni *ni);
 
 void ksocknal_lib_csum_tx(struct ksock_tx *tx);
 
diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c
index bc772e4..c6cce1e 100644
--- a/net/lnet/klnds/socklnd/socklnd_modparams.c
+++ b/net/lnet/klnds/socklnd/socklnd_modparams.c
@@ -24,6 +24,8 @@
 #include <asm/hypervisor.h>
 #endif
 
+#define CURRENT_LND_VERSION 1
+
 static int sock_timeout;
 module_param(sock_timeout, int, 0644);
 MODULE_PARM_DESC(sock_timeout, "dead socket timeout (seconds)");
@@ -139,8 +141,8 @@
 module_param(zc_recv_min_nfrags, int, 0644);
 MODULE_PARM_DESC(zc_recv_min_nfrags, "minimum # of fragments to enable ZC recv");
 
-static unsigned int conns_per_peer = 1;
-module_param(conns_per_peer, uint, 0444);
+static unsigned int conns_per_peer = DEFAULT_CONNS_PER_PEER;
+module_param(conns_per_peer, uint, 0644);
 MODULE_PARM_DESC(conns_per_peer, "number of connections per peer");
 
 #if SOCKNAL_VERSION_DEBUG
@@ -150,9 +152,13 @@
 #endif
 
 struct ksock_tunables ksocknal_tunables;
+static struct lnet_ioctl_config_socklnd_tunables default_tunables;
 
 int ksocknal_tunables_init(void)
 {
+	default_tunables.lnd_version = CURRENT_LND_VERSION;
+	default_tunables.lnd_conns_per_peer = conns_per_peer;
+
 	/* initialize ksocknal_tunables structure */
 	ksocknal_tunables.ksnd_timeout = &sock_timeout;
 	ksocknal_tunables.ksnd_nscheds = &nscheds;
@@ -201,4 +207,47 @@ int ksocknal_tunables_init(void)
 		*ksocknal_tunables.ksnd_zc_min_payload = (16 << 20) + 1;
 
 	return 0;
-};
+}
+
+void ksocknal_tunables_setup(struct lnet_ni *ni)
+{
+	struct lnet_ioctl_config_socklnd_tunables *tunables;
+	struct lnet_ioctl_config_lnd_cmn_tunables *net_tunables;
+
+	/* If no tunables specified, setup default tunables */
+	if (!ni->ni_lnd_tunables_set)
+		memcpy(&ni->ni_lnd_tunables.lnd_tun_u.lnd_sock,
+		       &default_tunables, sizeof(*tunables));
+
+	tunables = &ni->ni_lnd_tunables.lnd_tun_u.lnd_sock;
+
+	/* Current API version */
+	tunables->lnd_version = CURRENT_LND_VERSION;
+
+	net_tunables = &ni->ni_net->net_tunables;
+
+	if (net_tunables->lct_peer_timeout == -1)
+		net_tunables->lct_peer_timeout =
+			*ksocknal_tunables.ksnd_peertimeout;
+
+	if (net_tunables->lct_max_tx_credits == -1)
+		net_tunables->lct_max_tx_credits =
+			*ksocknal_tunables.ksnd_credits;
+
+	if (net_tunables->lct_peer_tx_credits == -1)
+		net_tunables->lct_peer_tx_credits =
+			*ksocknal_tunables.ksnd_peertxcredits;
+
+	if (net_tunables->lct_peer_tx_credits >
+	    net_tunables->lct_max_tx_credits)
+		net_tunables->lct_peer_tx_credits =
+			net_tunables->lct_max_tx_credits;
+
+	if (net_tunables->lct_peer_rtr_credits == -1)
+		net_tunables->lct_peer_rtr_credits =
+			*ksocknal_tunables.ksnd_peerrtrcredits;
+
+	if (!tunables->lnd_conns_per_peer)
+		tunables->lnd_conns_per_peer = (conns_per_peer) ?
+			conns_per_peer : DEFAULT_CONNS_PER_PEER;
+}
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index 370c1d6..bb5fb56 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -3657,6 +3657,30 @@ u32 lnet_get_dlc_seq_locked(void)
 	lnet_net_unlock(LNET_LOCK_EX);
 }
 
+static void
+lnet_ni_set_conns_per_peer(lnet_nid_t nid, int value, bool all)
+{
+	struct lnet_net *net;
+	struct lnet_ni *ni;
+
+	lnet_net_lock(LNET_LOCK_EX);
+	list_for_each_entry(net, &the_lnet.ln_nets, net_list) {
+		list_for_each_entry(ni, &net->net_ni_list, ni_netlist) {
+			if (ni->ni_nid != nid && !all)
+				continue;
+			if (LNET_NETTYP(net->net_id) == SOCKLND)
+				ni->ni_lnd_tunables.lnd_tun_u.lnd_sock.lnd_conns_per_peer = value;
+			else if (LNET_NETTYP(net->net_id) == O2IBLND)
+				ni->ni_lnd_tunables.lnd_tun_u.lnd_o2ib.lnd_conns_per_peer = value;
+			if (!all) {
+				lnet_net_unlock(LNET_LOCK_EX);
+				return;
+			}
+		}
+	}
+	lnet_net_unlock(LNET_LOCK_EX);
+}
+
 static int
 lnet_get_local_ni_hstats(struct lnet_ioctl_local_ni_hstats *stats)
 {
@@ -4086,6 +4110,25 @@ u32 lnet_get_dlc_seq_locked(void)
 		return 0;
 	}
 
+	case IOC_LIBCFS_SET_CONNS_PER_PEER: {
+		struct lnet_ioctl_reset_conns_per_peer_cfg *cfg = arg;
+		int value;
+
+		if (cfg->rcpp_hdr.ioc_len < sizeof(*cfg))
+			return -EINVAL;
+		if (cfg->rcpp_value < 0)
+			value = 1;
+		else
+			value = cfg->rcpp_value;
+		CDEBUG(D_NET,
+		       "Setting conns_per_peer to %d for %s. all = %d\n",
+		       value, libcfs_nid2str(cfg->rcpp_nid), cfg->rcpp_all);
+		mutex_lock(&the_lnet.ln_api_mutex);
+		lnet_ni_set_conns_per_peer(cfg->rcpp_nid, value, cfg->rcpp_all);
+		mutex_unlock(&the_lnet.ln_api_mutex);
+		return 0;
+	}
+
 	case IOC_LIBCFS_NOTIFY_ROUTER: {
 		time64_t deadline = ktime_get_real_seconds() - data->ioc_u64[0];
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org


* [lustre-devel] [PATCH 10/15] lnet: Provide kernel API for adding peers
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (8 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 09/15] lnet: socklnd: allow dynamic setting of conns_per_peer James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 11/15] lustre: obdclass: Add peer/peer NI when processing llog James Simmons
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.
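
The way the NID list is turned into peer additions can be modelled in
user space. The sketch below mirrors the loop in LNetAddPeer() from
this patch but is not kernel code: nid_t, NID_ANY, struct peer_add and
plan_peer_adds() are hypothetical stand-ins, and the loopback NID and
the discovery setting are passed in as plain arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t nid_t;		/* stand-in for lnet_nid_t */
#define NID_ANY	((nid_t)-1)	/* stand-in for LNET_NID_ANY */

struct peer_add {
	nid_t prim_nid;		/* peer the NID is attached to */
	nid_t nid;		/* NID_ANY: create the peer itself */
};

/* Record the (prim_nid, nid) pairs that would be passed to
 * lnet_add_peer_ni(). Returns the number of entries written to
 * out[], which must have room for num_nids entries.
 */
static size_t plan_peer_adds(const nid_t *nids, size_t num_nids,
			     nid_t loopback, bool discovery_disabled,
			     struct peer_add *out)
{
	nid_t pnid = 0;
	size_t i, n = 0;

	for (i = 0; i < num_nids; i++) {
		if (nids[i] == loopback)
			continue;	/* loopback is never a peer */

		if (!pnid) {
			/* first usable NID becomes the primary NID */
			pnid = nids[i];
			out[n++] = (struct peer_add){ pnid, NID_ANY };
		} else if (discovery_disabled) {
			/* without discovery, each NID is its own peer */
			out[n++] = (struct peer_add){ nids[i], NID_ANY };
		} else {
			/* otherwise attach the NID to the primary */
			out[n++] = (struct peer_add){ pnid, nids[i] };
		}
	}

	return n;
}
```

With discovery enabled, every later NID is attached to the first
non-loopback NID; with discovery disabled, each NID is planned as a
separate peer.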

HPE-bug-id: LUS-9293
WC-bug-id: https://jira.whamcloud.com/browse/LU-14661
Lustre-commit: ac201366ad5700ed ("LU-14661 lnet: Provide kernel API for adding peers")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43509
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/api.h      |  1 +
 include/linux/lnet/lib-lnet.h |  2 +-
 net/lnet/lnet/api-ni.c        |  2 +-
 net/lnet/lnet/peer.c          | 60 ++++++++++++++++++++++++++++++++++++++++---
 4 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/include/linux/lnet/api.h b/include/linux/lnet/api.h
index 891c4a6..d32c7c1 100644
--- a/include/linux/lnet/api.h
+++ b/include/linux/lnet/api.h
@@ -164,6 +164,7 @@ int LNetGet(lnet_nid_t self,
 int LNetCtl(unsigned int cmd, void *arg);
 void LNetDebugPeer(struct lnet_process_id id);
 int LNetGetPeerDiscoveryStatus(void);
+int LNetAddPeer(lnet_nid_t *nids, u32 num_nids);
 
 /** @} lnet_misc */
 
diff --git a/include/linux/lnet/lib-lnet.h b/include/linux/lnet/lib-lnet.h
index 760c093..37489ae 100644
--- a/include/linux/lnet/lib-lnet.h
+++ b/include/linux/lnet/lib-lnet.h
@@ -854,7 +854,7 @@ struct lnet_peer_net *lnet_peer_get_net_locked(struct lnet_peer *peer,
 void lnet_peer_clr_pref_rtrs(struct lnet_peer_ni *lpni);
 int lnet_peer_add_pref_rtr(struct lnet_peer_ni *lpni, lnet_nid_t nid);
 int lnet_peer_ni_set_non_mr_pref_nid(struct lnet_peer_ni *lpni, lnet_nid_t nid);
-int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr);
+int lnet_add_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid, bool mr, bool temp);
 int lnet_del_peer_ni(lnet_nid_t key_nid, lnet_nid_t nid);
 int lnet_get_peer_info(struct lnet_ioctl_peer_cfg *cfg, void __user *bulk);
 int lnet_get_peer_ni_info(u32 peer_index, u64 *nid,
diff --git a/net/lnet/lnet/api-ni.c b/net/lnet/lnet/api-ni.c
index bb5fb56..41d2d26 100644
--- a/net/lnet/lnet/api-ni.c
+++ b/net/lnet/lnet/api-ni.c
@@ -4015,7 +4015,7 @@ u32 lnet_get_dlc_seq_locked(void)
 		mutex_lock(&the_lnet.ln_api_mutex);
 		rc = lnet_add_peer_ni(cfg->prcfg_prim_nid,
 				      cfg->prcfg_cfg_nid,
-				      cfg->prcfg_mr);
+				      cfg->prcfg_mr, false);
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		return rc;
 	}
diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 224f4e2..c2f5d8b 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -1320,6 +1320,51 @@ struct lnet_peer_ni *
 	return rc;
 }
 
+int
+LNetAddPeer(lnet_nid_t *nids, u32 num_nids)
+{
+	lnet_nid_t pnid = 0;
+	bool mr;
+	int i, rc;
+
+	if (!nids || num_nids < 1)
+		return -EINVAL;
+
+	rc = LNetNIInit(LNET_PID_ANY);
+	if (rc < 0)
+		return rc;
+
+	mutex_lock(&the_lnet.ln_api_mutex);
+
+	mr = lnet_peer_discovery_disabled == 0;
+
+	rc = 0;
+	for (i = 0; i < num_nids; i++) {
+		if (nids[i] == LNET_NID_LO_0)
+			continue;
+
+		if (!pnid) {
+			pnid = nids[i];
+			rc = lnet_add_peer_ni(pnid, LNET_NID_ANY, mr, true);
+		} else if (lnet_peer_discovery_disabled) {
+			rc = lnet_add_peer_ni(nids[i], LNET_NID_ANY, mr, true);
+		} else {
+			rc = lnet_add_peer_ni(pnid, nids[i], mr, true);
+		}
+
+		if (rc && rc != -EEXIST)
+			goto unlock;
+	}
+
+unlock:
+	mutex_unlock(&the_lnet.ln_api_mutex);
+
+	LNetNIFini();
+
+	return rc == -EEXIST ? 0 : rc;
+}
+EXPORT_SYMBOL(LNetAddPeer);
+
 lnet_nid_t
 LNetPrimaryNID(lnet_nid_t nid)
 {
@@ -1538,6 +1583,11 @@ struct lnet_peer_net *
 			else if ((lp->lp_state ^ flags) & LNET_PEER_MULTI_RAIL)
 				rc = -EPERM;
 			goto out;
+		} else if (!(flags & LNET_PEER_CONFIGURED)) {
+			if (lp->lp_primary_nid == nid) {
+				rc = -EEXIST;
+				goto out;
+			}
 		}
 		/* Delete and recreate as a configured peer. */
 		lnet_peer_del(lp);
@@ -1777,17 +1827,19 @@ struct lnet_peer_net *
  * being created/modified/deleted by a different thread.
  */
 int
-lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr)
+lnet_add_peer_ni(lnet_nid_t prim_nid, lnet_nid_t nid, bool mr, bool temp)
 {
 	struct lnet_peer *lp = NULL;
 	struct lnet_peer_ni *lpni;
-	unsigned int flags;
+	unsigned int flags = 0;
 
 	/* The prim_nid must always be specified */
 	if (prim_nid == LNET_NID_ANY)
 		return -EINVAL;
 
-	flags = LNET_PEER_CONFIGURED;
+	if (!temp)
+		flags = LNET_PEER_CONFIGURED;
+
 	if (mr)
 		flags |= LNET_PEER_MULTI_RAIL;
 
@@ -1806,7 +1858,7 @@ struct lnet_peer_net *
 	lp = lpni->lpni_peer_net->lpn_peer;
 
 	/* Peer must have been configured. */
-	if (!(lp->lp_state & LNET_PEER_CONFIGURED)) {
+	if (!temp && !(lp->lp_state & LNET_PEER_CONFIGURED)) {
 		CDEBUG(D_NET, "peer %s was not configured\n",
 		       libcfs_nid2str(prim_nid));
 		return -ENOENT;
-- 
1.8.3.1


* [lustre-devel] [PATCH 11/15] lustre: obdclass: Add peer/peer NI when processing llog
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (9 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 10/15] lnet: Provide kernel API for adding peers James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 12/15] lnet: peer state to lock primary nid James Simmons
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Chris Horn, Lustre Development List

From: Chris Horn <chris.horn@hpe.com>

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

HPE-bug-id: LUS-9293
WC-bug-id: https://jira.whamcloud.com/browse/LU-14661
Lustre-commit: 16321de596f63951 ("LU-14661 obdclass: Add peer/peer NI when processing llog")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/43510
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_lib.c        |  3 ++-
 fs/lustre/obdclass/lustre_peer.c | 18 +++++++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/lustre/ldlm/ldlm_lib.c b/fs/lustre/ldlm/ldlm_lib.c
index c5ee2c3..90bad71 100644
--- a/fs/lustre/ldlm/ldlm_lib.c
+++ b/fs/lustre/ldlm/ldlm_lib.c
@@ -173,9 +173,10 @@ int client_import_add_nids_to_conn(struct obd_import *imp, lnet_nid_t *nids,
 	list_for_each_entry(conn, &imp->imp_conn_list, oic_item) {
 		if (class_check_uuid(&conn->oic_uuid, nids[0])) {
 			*uuid = conn->oic_uuid;
+			spin_unlock(&imp->imp_lock);
 			rc = class_add_nids_to_uuid(&conn->oic_uuid, nids,
 						    nid_count);
-			break;
+			return rc;
 		}
 	}
 	spin_unlock(&imp->imp_lock);
diff --git a/fs/lustre/obdclass/lustre_peer.c b/fs/lustre/obdclass/lustre_peer.c
index c0a0bfb..f7e6a0f 100644
--- a/fs/lustre/obdclass/lustre_peer.c
+++ b/fs/lustre/obdclass/lustre_peer.c
@@ -81,6 +81,7 @@ int class_add_uuid(const char *uuid, u64 nid)
 {
 	struct uuid_nid_data *data, *entry;
 	int found = 0;
+	int rc;
 
 	LASSERT(nid != 0);  /* valid newconfig NID is never zero */
 
@@ -119,9 +120,15 @@ int class_add_uuid(const char *uuid, u64 nid)
 	if (found) {
 		CDEBUG(D_INFO, "found uuid %s %s cnt=%d\n", uuid,
 		       libcfs_nid2str(nid), entry->un_nid_count);
+		rc = LNetAddPeer(entry->un_nids, entry->un_nid_count);
+		CDEBUG(D_INFO, "Add peer %s rc = %d\n",
+		       libcfs_nid2str(data->un_nids[0]), rc);
 		kfree(data);
 	} else {
 		CDEBUG(D_INFO, "add uuid %s %s\n", uuid, libcfs_nid2str(nid));
+		rc = LNetAddPeer(data->un_nids, data->un_nid_count);
+		CDEBUG(D_INFO, "Add peer %s rc = %d\n",
+		       libcfs_nid2str(data->un_nids[0]), rc);
 	}
 	return 0;
 }
@@ -173,7 +180,8 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids,
 			   int nid_count)
 {
 	struct uuid_nid_data *entry;
-	int i;
+	int i, rc;
+	bool matched = false;
 
 	if (nid_count >= MTI_NIDS_MAX) {
 		CDEBUG(D_NET, "too many NIDs (%d) for UUID '%s'\n",
@@ -188,6 +196,8 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids,
 
 		if (!obd_uuid_equals(&entry->un_uuid, uuid))
 			continue;
+
+		matched = true;
 		CDEBUG(D_NET, "Updating UUID '%s'\n", obd_uuid2str(uuid));
 		for (i = 0; i < nid_count; i++)
 			entry->un_nids[i] = nids[i];
@@ -195,6 +205,12 @@ int class_add_nids_to_uuid(struct obd_uuid *uuid, lnet_nid_t *nids,
 		break;
 	}
 	spin_unlock(&g_uuid_lock);
+	if (matched) {
+		rc = LNetAddPeer(entry->un_nids, entry->un_nid_count);
+		CDEBUG(D_INFO, "Add peer %s rc = %d\n",
+		       libcfs_nid2str(entry->un_nids[0]), rc);
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL(class_add_nids_to_uuid);
-- 
1.8.3.1


* [lustre-devel] [PATCH 12/15] lnet: peer state to lock primary nid
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (10 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 11/15] lustre: obdclass: Add peer/peer NI when processing llog James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 13/15] lustre: llite: Proved an abstraction for AS_EXITING James Simmons
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Amir Shehata, Lustre Development List

From: Amir Shehata <ashehata@whamcloud.com>

Introduce the following two peer states:

LNET_PEER_LOCK_PRIMARY, set by Lustre to lock the primary NID
of a peer to the NID Lustre is configured with.

LNET_PEER_BAD_CONFIG, set by LNet if Lustre attempts to set
a peer's primary NID to a NID that is already the primary NID of
another peer.
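
As a rough illustration of how such state bits are set and tested, a
user-space sketch follows. The bit positions match the definitions in
this patch; BIT() is redefined locally and peer_primary_locked() is a
hypothetical helper, since this patch only adds the definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n)			(1U << (n))	/* local stand-in */
#define LNET_PEER_LOCK_PRIMARY	BIT(20)
#define LNET_PEER_BAD_CONFIG	BIT(21)

/* Hypothetical helper: true if the primary NID of a peer with this
 * state word has been locked by the upper layer.
 */
static bool peer_primary_locked(unsigned int lp_state)
{
	return (lp_state & LNET_PEER_LOCK_PRIMARY) != 0;
}
```

The two flags occupy separate bits, so setting one leaves the other
unchanged.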

WC-bug-id: https://jira.whamcloud.com/browse/LU-14668
Lustre-commit: 684943e2d0c2ad09 ("LU-14668 lnet: peer state to lock primary nid")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43562
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/linux/lnet/lib-types.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/lnet/lib-types.h b/include/linux/lnet/lib-types.h
index 6b97ab9..85b0d54 100644
--- a/include/linux/lnet/lib-types.h
+++ b/include/linux/lnet/lib-types.h
@@ -812,6 +812,13 @@ struct lnet_peer {
 #define LNET_PEER_MARK_DELETION		BIT(18)
 /* lnet_peer_del()/lnet_peer_del_locked() has been called on the peer */
 #define LNET_PEER_MARK_DELETED		BIT(19)
+/* lock primary NID to what's requested by ULP */
+#define LNET_PEER_LOCK_PRIMARY		BIT(20)
+/* this is for informational purposes only. It is set if a peer gets
+ * configured from Lustre with a primary NID which belongs to another peer
+ * which is also configured by Lustre as the primary NID.
+ */
+#define LNET_PEER_BAD_CONFIG		BIT(21)
 
 struct lnet_peer_net {
 	/* chain on lp_peer_nets */
-- 
1.8.3.1


* [lustre-devel] [PATCH 13/15] lustre: llite: Proved an abstraction for AS_EXITING
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (11 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 12/15] lnet: peer state to lock primary nid James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 14/15] lnet: socklnd: set conns_per_peer based on link speed James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 15/15] lustre: update version to 2.14.54 James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Shaun Tancheff, Lustre Development List

From: Shaun Tancheff <shaun.tancheff@hpe.com>

Linux kernel v3.14-7405-g91b0abe36a7b added the AS_EXITING flag,
which is set while an address_space mapping is exiting.

Provide an abstraction, mapping_clear_exiting(), to clear
the AS_EXITING flag. This balances the kernel's
mapping_set_exiting().

HPE-bug-id: LUS-9977
WC-bug-id: https://jira.whamcloud.com/browse/LU-14787
Lustre-commit: e423a0bd7a4a59be ("LU-14787 libcfs: Proved an abstraction for AS_EXITING")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/44070
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/llite/llite_internal.h | 7 +++++++
 fs/lustre/llite/vvp_object.c     | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/lustre/llite/llite_internal.h b/fs/lustre/llite/llite_internal.h
index 95e4f45..6b5e318 100644
--- a/fs/lustre/llite/llite_internal.h
+++ b/fs/lustre/llite/llite_internal.h
@@ -32,6 +32,8 @@
 
 #ifndef LLITE_INTERNAL_H
 #define LLITE_INTERNAL_H
+
+#include <linux/pagemap.h>
 #include <obd.h>
 #include <uapi/linux/lustre/lustre_ver.h>
 #include <lustre_disk.h>	/* for s2sbi */
@@ -1110,6 +1112,11 @@ void ll_cl_add(struct file *file, const struct lu_env *env, struct cl_io *io,
 void ll_cl_remove(struct file *file, const struct lu_env *env);
 struct ll_cl_context *ll_cl_find(struct file *file);
 
+static inline void mapping_clear_exiting(struct address_space *mapping)
+{
+	clear_bit(AS_EXITING, &mapping->flags);
+}
+
 extern const struct address_space_operations ll_aops;
 
 /* llite/file.c */
diff --git a/fs/lustre/llite/vvp_object.c b/fs/lustre/llite/vvp_object.c
index 294df88..8a53458 100644
--- a/fs/lustre/llite/vvp_object.c
+++ b/fs/lustre/llite/vvp_object.c
@@ -164,7 +164,7 @@ static int vvp_prune(const struct lu_env *env, struct cl_object *obj)
 	}
 
 	ll_truncate_inode_pages_final(inode);
-	clear_bit(AS_EXITING, &inode->i_mapping->flags);
+	mapping_clear_exiting(inode->i_mapping);
 
 	return 0;
 }
-- 
1.8.3.1


* [lustre-devel] [PATCH 14/15] lnet: socklnd: set conns_per_peer based on link speed
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (12 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 13/15] lustre: llite: Proved an abstraction for AS_EXITING James Simmons
@ 2021-08-23  2:27 ` James Simmons
  2021-08-23  2:27 ` [lustre-devel] [PATCH 15/15] lustre: update version to 2.14.54 James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown
  Cc: Serguei Smirnov, Lustre Development List

From: Serguei Smirnov <ssmirnov@whamcloud.com>

Specifying conns_per_peer=0 for an NI now sets conns_per_peer
as a function of the corresponding link speed:
    conns_per_peer = ilog2(Gbps) / 2 + 1

Listed below are the resulting defaults for common link speeds:
    100Gbps, 200Gbps    -> 4
    50Gbps              -> 3
    5Gbps, 10Gbps       -> 2
    less than 4Gbps     -> 1
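
The table can be reproduced with a small user-space sketch of the same
arithmetic. This is not the kernel code: ilog2_u32() is a hypothetical
stand-in for the kernel's ilog2(), speed_mbps_to_cpp() is an
illustrative name, and the speed is assumed to be in Mbps, as in the
patch below.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's ilog2():
 * floor(log2(v)) for v > 0.
 */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	assert(v > 0);
	while (v >>= 1)
		r++;

	return r;
}

/* conns_per_peer = ilog2(Gbps) / 2 + 1, with the speed in Mbps */
static unsigned int speed_mbps_to_cpp(unsigned int speed_mbps)
{
	/* clamp to 1Gbps so ilog2 is never called with 0 */
	if (speed_mbps < 1000)
		speed_mbps = 1000;

	return ilog2_u32(speed_mbps / 1000) / 2 + 1;
}
```

Because both divisions are integer divisions, speeds between two
powers of two share a result; 100Gbps and 200Gbps both map to 4.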

WC-bug-id: https://jira.whamcloud.com/browse/LU-12815
Lustre-commit: c44afcfb72a1c2fd ("LU-12815 socklnd: set conns_per_peer based on link speed")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44417
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/klnds/socklnd/socklnd_modparams.c | 75 +++++++++++++++++++++++++++++-
 1 file changed, 73 insertions(+), 2 deletions(-)

diff --git a/net/lnet/klnds/socklnd/socklnd_modparams.c b/net/lnet/klnds/socklnd/socklnd_modparams.c
index c6cce1e..72f9df2 100644
--- a/net/lnet/klnds/socklnd/socklnd_modparams.c
+++ b/net/lnet/klnds/socklnd/socklnd_modparams.c
@@ -23,6 +23,8 @@
 #if defined(__x86_64__) || defined(__i386__)
 #include <asm/hypervisor.h>
 #endif
+#include <linux/inetdevice.h>
+#include <linux/ethtool.h>
 
 #define CURRENT_LND_VERSION 1
 
@@ -154,6 +156,75 @@
 struct ksock_tunables ksocknal_tunables;
 static struct lnet_ioctl_config_socklnd_tunables default_tunables;
 
+static int ksocklnd_ni_get_eth_intf_speed(struct lnet_ni *ni)
+{
+	const struct in_ifaddr *ifa;
+	struct net_device *dev;
+	int intf_idx = -1;
+	int ret = -1;
+
+	rtnl_lock();
+	for_each_netdev(ni->ni_net_ns, dev) {
+		int flags = dev_get_flags(dev);
+		struct in_device *in_dev;
+
+		if (flags & IFF_LOOPBACK) /* skip the loopback IF */
+			continue;
+
+		if (!(flags & IFF_UP))
+			continue;
+
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			continue;
+
+		in_dev_for_each_ifa_rcu(ifa, in_dev) {
+			if (strcmp(ifa->ifa_label, ni->ni_interface) == 0)
+				intf_idx = dev->ifindex;
+		}
+		if (intf_idx >= 0)
+			break;
+	}
+	if (intf_idx >= 0) {
+		struct ethtool_link_ksettings cmd;
+		int ethtool_ret;
+
+		/* Some devices may not be providing link settings */
+		ethtool_ret = __ethtool_get_link_ksettings(dev, &cmd);
+		if (!ethtool_ret)
+			ret = cmd.base.speed;
+		else
+			ret = ethtool_ret;
+	}
+	rtnl_unlock();
+
+	return ret;
+}
+
+static int ksocklnd_speed2cpp(int speed)
+{
+	/* Use the minimum of 1Gbps to avoid calling ilog2 with 0 */
+	if (speed < 1000)
+		speed = 1000;
+
+	/* Pick heuristically optimal conns_per_peer value
+	 * for the specified ethernet interface speed (Mbps)
+	 */
+	return ilog2(speed / 1000) / 2 + 1;
+}
+
+static int ksocklnd_lookup_conns_per_peer(struct lnet_ni *ni)
+{
+	int cpp = DEFAULT_CONNS_PER_PEER;
+	int speed = ksocklnd_ni_get_eth_intf_speed(ni);
+
+	CDEBUG(D_NET, "intf %s speed %d\n", ni->ni_interface, speed);
+	if (speed > 0)
+		cpp = ksocklnd_speed2cpp(speed);
+
+	return cpp;
+}
+
 int ksocknal_tunables_init(void)
 {
 	default_tunables.lnd_version = CURRENT_LND_VERSION;
@@ -248,6 +319,6 @@ void ksocknal_tunables_setup(struct lnet_ni *ni)
 			*ksocknal_tunables.ksnd_peerrtrcredits;
 
 	if (!tunables->lnd_conns_per_peer)
-		tunables->lnd_conns_per_peer = (conns_per_peer) ?
-			conns_per_peer : DEFAULT_CONNS_PER_PEER;
+		tunables->lnd_conns_per_peer =
+			ksocklnd_lookup_conns_per_peer(ni);
 }
-- 
1.8.3.1
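
Note on the heuristic in patch 14: the mapping in ksocklnd_speed2cpp() is
easier to judge with concrete numbers. The following is a minimal userspace
sketch of the same arithmetic, not the kernel code itself. example_ilog2()
is a stand-in for the kernel's ilog2(), and EXAMPLE_DEFAULT_CPP is a
hypothetical placeholder for DEFAULT_CONNS_PER_PEER, whose real value is
defined outside this patch.

```c
/* Hypothetical placeholder: the real DEFAULT_CONNS_PER_PEER is defined
 * elsewhere in socklnd and its value is not shown in this patch.
 */
#define EXAMPLE_DEFAULT_CPP 1

/* Userspace stand-in for the kernel's ilog2(): floor(log2(v)) for v > 0 */
int example_ilog2(unsigned int v)
{
	int log = -1;

	while (v) {
		v >>= 1;
		log++;
	}
	return log;
}

/* Same arithmetic as ksocklnd_speed2cpp() in the patch; speed is in Mbps */
int example_speed2cpp(int speed)
{
	/* Clamp to 1Gbps so example_ilog2() is never called with 0 */
	if (speed < 1000)
		speed = 1000;

	return example_ilog2(speed / 1000) / 2 + 1;
}

/* Mirrors ksocklnd_lookup_conns_per_peer(): a non-positive speed (lookup
 * failure or unknown link speed) falls back to the default.
 */
int example_lookup_cpp(int speed)
{
	return speed > 0 ? example_speed2cpp(speed) : EXAMPLE_DEFAULT_CPP;
}
```

With this arithmetic, 1000 Mbps maps to 1 connection per peer, 10000 to 2,
25000 and 40000 to 3, and 100000 to 4. The result grows by one for every
fourfold increase in link speed, because the base-2 logarithm is halved.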

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

* [lustre-devel] [PATCH 15/15] lustre: update version to 2.14.54
  2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
                   ` (13 preceding siblings ...)
  2021-08-23  2:27 ` [lustre-devel] [PATCH 14/15] lnet: socklnd: set conns_per_peer based on link speed James Simmons
@ 2021-08-23  2:27 ` James Simmons
  14 siblings, 0 replies; 16+ messages in thread
From: James Simmons @ 2021-08-23  2:27 UTC (permalink / raw)
  To: Andreas Dilger, Oleg Drokin, NeilBrown; +Cc: Lustre Development List

From: Oleg Drokin <green@whamcloud.com>

New tag 2.14.54

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 include/uapi/linux/lustre/lustre_ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/lustre/lustre_ver.h b/include/uapi/linux/lustre/lustre_ver.h
index 093f898..90254ed 100644
--- a/include/uapi/linux/lustre/lustre_ver.h
+++ b/include/uapi/linux/lustre/lustre_ver.h
@@ -3,9 +3,9 @@
 
 #define LUSTRE_MAJOR 2
 #define LUSTRE_MINOR 14
-#define LUSTRE_PATCH 53
+#define LUSTRE_PATCH 54
 #define LUSTRE_FIX 0
-#define LUSTRE_VERSION_STRING "2.14.53"
+#define LUSTRE_VERSION_STRING "2.14.54"
 
 #define OBD_OCD_VERSION(major, minor, patch, fix)			\
 	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))
-- 
1.8.3.1
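
Note on the version bump in patch 15: the OBD_OCD_VERSION() macro visible in
the hunk context packs the four version fields into one integer, one byte
per field. The sketch below reproduces that packing in userspace to show
what 2.14.54 becomes. The example_* names are hypothetical helpers written
for this illustration; only the shift-and-add expression is taken from the
header.

```c
#include <stdint.h>

/* Same expression as OBD_OCD_VERSION() in lustre_ver.h */
#define EXAMPLE_OCD_VERSION(major, minor, patch, fix) \
	(((major) << 24) + ((minor) << 16) + ((patch) << 8) + (fix))

/* Hypothetical helper: pack a version, one byte per field */
uint32_t example_pack_version(uint32_t major, uint32_t minor,
			      uint32_t patch, uint32_t fix)
{
	return EXAMPLE_OCD_VERSION(major, minor, patch, fix);
}

/* Hypothetical helper: extract the patch field (bits 8..15) */
uint32_t example_version_patch(uint32_t version)
{
	return (version >> 8) & 0xff;
}
```

Packing 2.14.54.0 gives 0x020E3600, while the previous 2.14.53.0 gives
0x020E3500, so a plain integer comparison orders the two releases correctly.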


Thread overview: 16+ messages
2021-08-23  2:27 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS as of Aug 22, 2021 James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 01/15] lustre: uapi: support fixed directory layout James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 02/15] lustre: pcc: add LCM_FL_PCC_RDONLY layout flag James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 03/15] lustre: mdt: implement fallocate in MDC/MDT James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 04/15] lnet: Reflect ni_fatal in NI status James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 05/15] lustre: obdclass: reintroduce lu_ref James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 06/15] lnet: keep in insync to change due to GPU Direct Support James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 07/15] lustre: osc: Support RDMA only pages James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 08/15] lustre: mgc: rework mgc_apply_recover_logs() for gcc10 James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 09/15] lnet: socklnd: allow dynamic setting of conns_per_peer James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 10/15] lnet: Provide kernel API for adding peers James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 11/15] lustre: obdclass: Add peer/peer NI when processing llog James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 12/15] lnet: peer state to lock primary nid James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 13/15] lustre: llite: Proved an abstraction for AS_EXITING James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 14/15] lnet: socklnd: set conns_per_peer based on link speed James Simmons
2021-08-23  2:27 ` [lustre-devel] [PATCH 15/15] lustre: update version to 2.14.54 James Simmons
