Codebase list ibutils / upstream/1.5.7+0.2.gbd7e502 ibdiag / doc / ibdiag_release_notes.txt
upstream/1.5.7+0.2.gbd7e502

Tree @upstream/1.5.7+0.2.gbd7e502 (Download .tar.gz)

ibdiag_release_notes.txt @upstream/1.5.7+0.2.gbd7e502raw · history · blame

                     Open Fabrics InfiniBand Diagnostic Utilities
                     --------------------------------------------

*******************************************************************************
RELEASE: OFED 1.3
DATE: Feb 2008

===============================================================================
Table of Contents
===============================================================================
1. Overview
2. New features
3. Major Bugs Fixed
4. Known Issues

===============================================================================
1. Overview
===============================================================================

The ibdiag package was enhanced to check more aspects of the network setup,
including partitions, IPoIB and QoS. Additional major feature is it's ability
to write a topology file of the discovered network. A summary table is provided
with a list of the executed checks and their results.

===============================================================================
2. New Features
===============================================================================

The following new checks were added to the tools:

ibdiagnet new features:
-----------------------
+ Partitions Check:
  - Validate all leaf switch ports (conencted to a host) which enforce
    partitions are not blocking partitions set on the host ports they
    are connected to.
  - Report for each partition the member hosts and their membership status.
    Full membership allows hosts to communicate to any other member.
    Limited membership allows communication with full members only.
    The new report file is named ibdiagnet.pkey.

+ IPoIB Subnets Check:
  - The IPoIB subnets and their properties are reported.
  - For each group all the host ports that are part of the partition are
    checked to have a high enough communication rate to be part of the group
     (warn if not).
  - If all the group members can use a communication rate higher then the group
    rate a warning is produced as the subnet uses a suboptimal rate.

Other changes:
+ The multicast groups report was enhanced to provide the details of each
  group and the memebers list is provided in a new report file: ibdiagnet.mcgs.

+ A new flag, -wt <file-name>, was added. ibdiagnet, with the new option,
  writes out a discovered topology file by the provided file-name and
  the required new IBNL files into an output directory named ibdiag_ibnl.
  This new feature allows you to capture the current state of the fabric
  and later compare to it. Such the features provided by the "Topology
  Matching" check become available. These feature include recognizing
  changes in connections, speed and width.

+ Load subnet database from file:
  Ibdiagnet dumps its internal database, which contains the subnet structure,
  to a file (/tmp/ibdiagnet.db by default). This file can be loaded in later
  ibdiagnet runs (using the -load_db <db file> option). When this option is set,
  ibdiagnet loads the subnet data from the file and skips the discovery stage.
  Using this option can save the subnet discovery time for large cluster.
  Note: Some if ibdiagnet checks would not be performed when the -load_db
        option is set. These checks are:
        - Duplicated guids.
        - Zero guids.
        - Links in INIT state.
        - SMs status.

 + A new flag, -skip <skip-option(s)>, was added. When this flag is specified,
   ibdiagnet skips the given check. One or more space separated values can be
   specified.
   Available skip options: dup_guids, zero_guids, pm, logical_state, part, ipoib.
   The -skip flag can be used in order to run only specific checks, or to reduce
   ibdiagnet run time.

ibdiagpath new features:
------------------------
+ Partitions Check:
  - The list of partitions of source and destination ports is reported.
  - A check for which partitions are common to the source, destination and
    every port on the path (if enforcing partitions) is calculated and
    reported. A warning is provided if a source partition is blocked by
    a port on the path.
    An error is provided in there are no common partitions for the path.

+ IPoIB Subnets Check:
  - The IPoIB subnets available for the path and reported.
  - If the source or destination ports are members in partitions which have
    an IPoIB group and for some reason can not join the group a warning is
    provided.

+ QoS Check:
  With the introduction of QoS, the following new issues might arise from
  impropper setup of the fabric:
  - VL Arbitration Tables might use VLs which are higher then the currently
    supported maximal VL on the port. A warning is provided for such cases.
  - VL Arbitration Tables might "block" a VL by setting its weight to zero.
    A warning is provided for these cases
  - SLs (service levels) might be mapped to VLs which are blocked by the
    two above rules. In such case these SLs can not be used by the path.
    A report including the set of "valid" SLs for the path is provided.
  - If there are no "valid" SLs an error is provided since the source and
    destination ports can not communicate.

Common changes to all tools:
----------------------------
A summary table of all the checks perfomed and their total number of errors and
warnings was added to the tools standard output.

===============================================================================
3. Major Bugs Fixed
===============================================================================

+ Fabrics Qualities report is now available in the main log file (and not only
  in the standard output

===============================================================================
4. Known Issues
===============================================================================

- Ibdiagnet tries to query port counters for ports in INIT state. In this
  case, run time would be longer and an error message for each port would be
  printed to screen.
  Workaround:
  * Use "-skip pm" option if links in INIT state are found.
  * Run opensm to activate the links.

- A failure in IPoIB check may cause ibdiagnet to exit, without printing the
  summary report.

- Ibdiagnet "-wt" option may generate a bad topology file when running on a
  cluster that contains complex switch systems.

*******************************************************************************
RELEASE: OFED 1.2
DATE: June 2007

===============================================================================
Table of Contents
===============================================================================
1. Overview
2. Requirements
3. Reports
4. Known Issues


===============================================================================
1. Overview
===============================================================================

The ibdiag package provides tools that encapsulate the best known diagnostic
procedures for InfiniBand fabrics. Target users of these utilities are network
managers not expert in the details of the InfiniBand specification.

The following tools are provided:
o ibdiagnet - performs a diagnostic check of the entire network.
  Should be used on any suspicion of fabric misbehavior.
  The default invocation can be enhanced to perform more advanced checks and
  produce more reports. The following is a partial set of checks it performs:
   - Check for a single master SM
   - Check all routes between hosts are set correctly (include also multicast
     groups)
    - Check for fabric links health

o ibdiagpath - traces a path between two nodes specified by LIDs or a directed
  path. This utility should be used when connectivity between specific two
  hosts is broken.

o ibdiagui - a graphic user interface on top of ibdiagnet
  Mostly suitable for medium size fabrics (<100 nodes) and for users who needs
  to explore an unknown network. The main features it provides is an
  automatically generated connectivity graph, an object properties browser and
  hyperlinking of ibdiagnet log to these widgets.

Note: man pages are provided for each tools.

The package tools performs the following diagnostic procedures:
* Discover the InfiniBand fabric connectivity
* Determine whether or not a Subnet Manager (SM) is running
* Identify links which drop packets and/or incur errors by sending MAD
  packets multiple times, across all the links, reporting port monitor counters
* Identify fabric level mismatches or inconsistencies such as:
  - Duplicate port GUIDs - Two or more different ports with the same GUID
  - Duplicate node GUIDs - Two or more different nodes with the same node GUID
  - Duplicate LIDs - Two or more devices that have the same assigned LID
  - Zero valued LIDs - A device with LID=0 indicates that the SM did not
    assign a LID to this device.
  - Zero valued system GUIDs - A device with system GUID=0 indicates that
    the vendor did not assign it a GUID.
  - An InfiniBand link is in the INIT state, which prevents data transfer
  - Unexpected link width (when using the -lw flag)
  - Unexpected link speed (when using the -ls flag)
  - Check for partitions and SL2VL settings preventing communication between
     specific nodes (ibdiagpath)

===============================================================================
2. Requirements
===============================================================================

Software dependency:

1. ibis and ibdm must be installed (are included in same

2. ibdiagui also depends on installation of the following packages:
    * Tk8.4 is standard on all Linux distributions, but if not please
      download and install from:
      http://www.tcl.tk/software/tcltk/
    * Graphviz - an automatic graph layout utility. Should be downloaded
      and installed from:
      http://www.graphviz.org/

3. ibdiags are part of the ibutils package and should be installed
    as part of it. Updates are possible as standalone package.

===============================================================================
3. Reports
===============================================================================
The default directory for all generated report files is /tmp .

Both utilities collect summary information regarding all the fabric SM's
during the run, and then output that information at end of the run in file
/tmp/ibdiagnet.sm.

Each report message includes:
   - Device Type
   - Device portGUID
   - The direct path to the device
   - If a topology file is provided to be matched with the discovered fabric,
     the node name is also provided in the report message. Otherwise, host
     names are included only in HCA-related report messages.

===============================================================================
4. Known Issues
===============================================================================
   ibdiagpath issues:
   - If no subnet manager is initialized in the subnet, FDB tables may be
     incorrectly set. Consequently, PortCounter MADs cannot be sent.

   - A link along a LID-routed path in INIT state causes ibdiagpath performance
     queries to fail. The performance queries fail since they cannot proceed
      via non-ACTIVE links.

   - ibdiagpath cannot validate the provided topology file against the
      existing fabric topology. If the topology file includes a device/link
      that does not exist, or the device/link information is incorrect,
      then ibdiagpath may - in name-based routing - extract a non-existing
      path based on the incorrect topology file.

   - If the hostname provided for the -s flag is not the actual local
      hostname, then all the extracted names from the topology file will
      be incorrect. However, all the other information provided will be correct.

   - Executing "ibis exit" in order to terminate ibis, while running over
      RHEL5 i686, causes ibis to exit uncleanly.