QoS Management in OpenSM ============================================================================== Table of contents ============================================================================== 1. Overview 2. Full QoS Policy File 3. Simplified QoS Policy Definition 4. Policy File Syntax Guidelines 5. Examples of Full Policy File 6. Simplified QoS Policy - Details and Examples 7. SL2VL Mapping and VL Arbitration ============================================================================== 1. Overview ============================================================================== When QoS in OpenSM is enabled (-Q or --qos), OpenSM looks for QoS Policy file. The default name of OpenSM QoS policy file is /usr/local/etc/opensm/qos-policy.conf. The default may be changed by using -Y or --qos_policy_file option with OpenSM. During fabric initialization and at every heavy sweep OpenSM parses the QoS policy file, applies its settings to the discovered fabric elements, and enforces the provided policy on client requests. The overall flow for such requests is: - The request is matched against the defined matching rules such that the QoS Level definition is found. - Given the QoS Level, path(s) search is performed with the given restrictions imposed by that level. There are two ways to define QoS policy: - Full policy, where the policy file syntax provides an administrator various ways to match PathRecord/MultiPathRecord (PR/MPR) request and enforce various QoS constraints on the requested PR/MPR - Simplified QoS policy definition, where an administrator would be able to match PR/MPR requests by various ULPs and applications running on top of these ULPs. While the full policy syntax is very flexible, in many cases the simplified policy definition would be sufficient. ============================================================================== 2. Full QoS Policy File ============================================================================== QoS policy file has the following sections: I) Port Groups (denoted by port-groups). This section defines zero or more port groups that can be referred later by matching rules (see below). Port group lists ports by: - Port GUID - Port name, which is a combination of NodeDescription and IB port number - PKey, which means that all the ports in the subnet that belong to partition with a given PKey belong to this port group - Partition name, which means that all the ports in the subnet that belong to partition with a given name belong to this port group - Node type, where possible node types are: CA, SWITCH, ROUTER, ALL, and SELF (SM's port). II) QoS Setup (denoted by qos-setup). This section describes how to set up SL2VL and VL Arbitration tables on various nodes in the fabric. However, this is not supported in OpenSM currently. SL2VL and VLArb tables should be configured in the OpenSM options file (default location - /usr/local/etc/opensm/opensm.conf). III) QoS Levels (denoted by qos-levels). Each QoS Level defines Service Level (SL) and a few optional fields: - MTU limit - Rate limit - PKey - Packet lifetime When path(s) search is performed, it is done with regards to restriction that these QoS Level parameters impose. One QoS level that is mandatory to define is a DEFAULT QoS level. It is applied to a PR/MPR query that does not match any existing match rule. Similar to any other QoS Level, it can also be explicitly referred by any match rule. IV) QoS Matching Rules (denoted by qos-match-rules). Each PathRecord/MultiPathRecord query that OpenSM receives is matched against the set of matching rules. Rules are scanned in order of appearance in the QoS policy file such as the first match takes precedence. Each rule has a name of QoS level that will be applied to the matching query. A default QoS level is applied to a query that did not match any rule. Queries can be matched by: - Source port group (whether a source port is a member of a specified group) - Destination port group (same as above, only for destination port) - PKey - QoS class - Service ID To match a certain matching rule, PR/MPR query has to match ALL the rule's criteria. However, not all the fields of the PR/MPR query have to appear in the matching rule. For instance, if the rule has a single criterion - Service ID, it will match any query that has this Service ID, disregarding rest of the query fields. However, if a certain query has only Service ID (which means that this is the only bit in the PR/MPR component mask that is on), it will not match any rule that has other matching criteria besides Service ID. ============================================================================== 3. Simplified QoS Policy Definition ============================================================================== Simplified QoS policy definition comprises of a single section denoted by qos-ulps. Similar to the full QoS policy, it has a list of match rules and their QoS Level, but in this case a match rule has only one criterion - its goal is to match a certain ULP (or a certain application on top of this ULP) PR/MPR request, and QoS Level has only one constraint - Service Level (SL). The simplified policy section may appear in the policy file in combine with the full policy, or as a stand-alone policy definition. See more details and list of match rule criteria below. ============================================================================== 4. Policy File Syntax Guidelines ============================================================================== - Empty lines are ignored. - Leading and trailing blanks, as well as empty lines, are ignored, so the indentation in the example is just for better readability. - Comments are started with the pound sign (#) and terminated by EOL. - Any keyword should be the first non-blank in the line, unless it's a comment. - Keywords that denote section/subsection start have matching closing keywords. - Having a QoS Level named "DEFAULT" is a must - it is applied to PR/MPR requests that didn't match any of the matching rules. - Any section/subsection of the policy file is optional. ============================================================================== 5. Examples of Full Policy File ============================================================================== As mentioned earlier, any section of the policy file is optional, and the only mandatory part of the policy file is a default QoS Level. Here's an example of the shortest policy file: qos-levels qos-level name: DEFAULT sl: 0 end-qos-level end-qos-levels Port groups section is missing because there are no match rules, which means that port groups are not referred anywhere, and there is no need defining them. And since this policy file doesn't have any matching rules, PR/MPR query won't match any rule, and OpenSM will enforce default QoS level. Essentially, the above example is equivalent to not having QoS policy file at all. The following example shows all the possible options and keywords in the policy file and their syntax: # # See the comments in the following example. # They explain different keywords and their meaning. # port-groups port-group # using port GUIDs name: Storage # "use" is just a description that is used for logging # Other than that, it is just a comment use: SRP Targets port-guid: 0x10000000000001, 0x10000000000005-0x1000000000FFFA port-guid: 0x1000000000FFFF end-port-group port-group name: Virtual Servers # The syntax of the port name is as follows: # "node_description/Pnum". # node_description is compared to the NodeDescription of the node, # and "Pnum" is a port number on that node. port-name: vs1 HCA-1/P1, vs2 HCA-1/P1 end-port-group # using partitions defined in the partition policy port-group name: Partitions partition: Part1 pkey: 0x1234 end-port-group # using node types: CA, ROUTER, SWITCH, SELF (for node that runs SM) # or ALL (for all the nodes in the subnet) port-group name: CAs and SM node-type: CA, SELF end-port-group end-port-groups qos-setup # This section of the policy file describes how to set up SL2VL and VL # Arbitration tables on various nodes in the fabric. # However, this is not supported in OpenSM currently - the section is # parsed and ignored. SL2VL and VLArb tables should be configured in the # OpenSM options file (by default - /usr/local/etc/opensm/opensm.conf). end-qos-setup qos-levels # Having a QoS Level named "DEFAULT" is a must - it is applied to # PR/MPR requests that didn't match any of the matching rules. qos-level name: DEFAULT use: default QoS Level sl: 0 end-qos-level # the whole set: SL, MTU-Limit, Rate-Limit, PKey, Packet Lifetime qos-level name: WholeSet sl: 1 mtu-limit: 4 rate-limit: 5 pkey: 0x1234 packet-life: 8 end-qos-level end-qos-levels # Match rules are scanned in order of their apperance in the policy file. # First matched rule takes precedence. qos-match-rules # matching by single criteria: QoS class qos-match-rule use: by QoS class qos-class: 7-9,11 # Name of qos-level to apply to the matching PR/MPR qos-level-name: WholeSet end-qos-match-rule # show matching by destination group and service id qos-match-rule use: Storage targets destination: Storage service-id: 0x10000000000001, 0x10000000000008-0x10000000000FFF qos-level-name: WholeSet end-qos-match-rule qos-match-rule source: Storage use: match by source group only qos-level-name: DEFAULT end-qos-match-rule qos-match-rule use: match by all parameters qos-class: 7-9,11 source: Virtual Servers destination: Storage service-id: 0x0000000000010000-0x000000000001FFFF pkey: 0x0F00-0x0FFF qos-level-name: WholeSet end-qos-match-rule end-qos-match-rules ============================================================================== 6. Simplified QoS Policy - Details and Examples ============================================================================== Simplified QoS policy match rules are tailored for matching ULPs (or some application on top of a ULP) PR/MPR requests. This section has a list of per-ULP (or per-application) match rules and the SL that should be enforced on the matched PR/MPR query. Match rules include: - Default match rule that is applied to PR/MPR query that didn't match any of the other match rules - SDP - SDP application with a specific target TCP/IP port range - SRP with a specific target IB port GUID - RDS - iSER - iSER application with a specific target TCP/IP port range - IPoIB with a default PKey - IPoIB with a specific PKey - any ULP/application with a specific Service ID in the PR/MPR query - any ULP/application with a specific PKey in the PR/MPR query - any ULP/application with a specific target IB port GUID in the PR/MPR query Since any section of the policy file is optional, as long as basic rules of the file are kept (such as no referring to nonexisting port group, having default QoS Level, etc), the simplified policy section (qos-ulps) can serve as a complete QoS policy file. The shortest policy file in this case would be as follows: qos-ulps default : 0 #default SL end-qos-ulps It is equivalent to the previous example of the shortest policy file, and it is also equivalent to not having policy file at all. Below is an example of simplified QoS policy with all the possible keywords: qos-ulps default : 0 # default SL sdp, port-num 30000 : 0 # SL for application running on top # of SDP when a destination # TCP/IPport is 30000 sdp, port-num 10000-20000 : 0 sdp : 1 # default SL for any other # application running on top of SDP rds : 2 # SL for RDS traffic iser, port-num 900 : 0 # SL for iSER with a specific target # port iser : 3 # default SL for iSER ipoib, pkey 0x0001 : 0 # SL for IPoIB on partition with # pkey 0x0001 ipoib : 4 # default IPoIB partition, # pkey=0x7FFF any, service-id 0x6234 : 6 # match any PR/MPR query with a # specific Service ID any, pkey 0x0ABC : 6 # match any PR/MPR query with a # specific PKey srp, target-port-guid 0x1234 : 5 # SRP when SRP Target is located on # a specified IB port GUID any, target-port-guid 0x0ABC-0xFFFFF : 6 # match any PR/MPR query with # a specific target port GUID end-qos-ulps Similar to the full policy definition, matching of PR/MPR queries is done in order of appearance in the QoS policy file such as the first match takes precedence, except for the "default" rule, which is applied only if the query didn't match any other rule. All other sections of the QoS policy file take precedence over the qos-ulps section. That is, if a policy file has both qos-match-rules and qos-ulps sections, then any query is matched first against the rules in the qos-match-rules section, and only if there was no match, the query is matched against the rules in qos-ulps section. Note that some of these match rules may overlap, so in order to use the simplified QoS definition effectively, it is important to understand how each of the ULPs is matched: 6.1 IPoIB IPoIB query is matched by PKey. Default PKey for IPoIB partition is 0x7fff, so the following three match rules are equivalent: ipoib : ipoib, pkey 0x7fff : any, pkey 0x7fff : 6.2 SDP SDP PR query is matched by Service ID. The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits holding the remote TCP/IP Port Number to connect to. The following two match rules are equivalent: sdp : any, service-id 0x0000000000010000-0x000000000001ffff : 6.3 RDS Similar to SDP, RDS PR query is matched by Service ID. The Service ID for RDS is 0x000000000106PPPP, where PPPP are 4 hex digits holding the remote TCP/IP Port Number to connect to. Default port number for RDS is 0x48CA, which makes a default Service-ID 0x00000000010648CA. The following two match rules are equivalent: rds : any, service-id 0x00000000010648CA : 6.4 iSER Similar to RDS, iSER query is matched by Service ID, where the the Service ID is also 0x000000000106PPPP. Default port number for iSER is 0x0CBC, which makes a default Service-ID 0x0000000001060CBC. The following two match rules are equivalent: iser : any, service-id 0x0000000001060CBC : 6.5 SRP Service ID for SRP varies from storage vendor to vendor, thus SRP query is matched by the target IB port GUID. The following two match rules are equivalent: srp, target-port-guid 0x1234 : any, target-port-guid 0x1234 : Note that any of the above ULPs might contain target port GUID in the PR query, so in order for these queries not to be recognized by the QoS manager as SRP, the SRP match rule (or any match rule that refers to the target port guid only) should be placed at the end of the qos-ulps match rules. 6.6 MPI SL for MPI is manually configured by MPI admin. OpenSM is not forcing any SL on the MPI traffic, and that's why it is the only ULP that did not appear in the qos-ulps section. ============================================================================== 7. SL2VL Mapping and VL Arbitration ============================================================================== OpenSM cached options file has a set of QoS related configuration parameters, that are used to configure SL2VL mapping and VL arbitration on IB ports. These parameters are: - Max VLs: the maximum number of VLs that will be on the subnet. - High limit: the limit of High Priority component of VL Arbitration table (IBA 7.6.9). - VLArb low table: Low priority VL Arbitration table (IBA 7.6.9) template. - VLArb high table: High priority VL Arbitration table (IBA 7.6.9) template. - SL2VL: SL2VL Mapping table (IBA 7.6.6) template. It is a list of VLs corresponding to SLs 0-15 (Note that VL15 used here means drop this SL). There are separate QoS configuration parameters sets for various target types: CAs, routers, switch external ports, and switch's enhanced port 0. The names of such parameters are prefixed by "qos__" string. Here is a full list of the currently supported sets: qos_ca_ - QoS configuration parameters set for CAs. qos_rtr_ - parameters set for routers. qos_sw0_ - parameters set for switches' port 0. qos_swe_ - parameters set for switches' external ports. Here's the example of typical default values for CAs and switches' external ports (hard-coded in OpenSM initialization): qos_ca_max_vls 15 qos_ca_high_limit 0 qos_ca_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 qos_ca_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 qos_swe_max_vls 15 qos_swe_high_limit 0 qos_swe_vlarb_high 0:4,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0,11:0,12:0,13:0,14:0 qos_swe_vlarb_low 0:0,1:4,2:4,3:4,4:4,5:4,6:4,7:4,8:4,9:4,10:4,11:4,12:4,13:4,14:4 qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 VL arbitration tables (both high and low) are lists of VL/Weight pairs. Each list entry contains a VL number (values from 0-14), and a weighting value (values 0-255), indicating the number of 64 byte units (credits) which may be transmitted from that VL when its turn in the arbitration occurs. A weight of 0 indicates that this entry should be skipped. If a list entry is programmed for VL15 or for a VL that is not supported or is not currently configured by the port, the port may either skip that entry or send from any supported VL for that entry. Note, that the same VLs may be listed multiple times in the High or Low priority arbitration tables, and, further, it can be listed in both tables. The limit of high-priority VLArb table (qos__high_limit) indicates the number of high-priority packets that can be transmitted without an opportunity to send a low-priority packet. Specifically, the number of bytes that can be sent is high_limit times 4K bytes. A high_limit value of 255 indicates that the byte limit is unbounded. Note: if the 255 value is used, the low priority VLs may be starved. A value of 0 indicates that only a single packet from the high-priority table may be sent before an opportunity is given to the low-priority table. Keep in mind that ports usually transmit packets of size equal to MTU. For instance, for 4KB MTU a single packet will require 64 credits, so in order to achieve effective VL arbitration for packets of 4KB MTU, the weighting values for each VL should be multiples of 64. Below is an example of SL2VL and VL Arbitration configuration on subnet: qos_ca_max_vls 15 qos_ca_high_limit 6 qos_ca_vlarb_high 0:4 qos_ca_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 qos_ca_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 qos_swe_max_vls 15 qos_swe_high_limit 6 qos_swe_vlarb_high 0:4 qos_swe_vlarb_low 0:0,1:64,2:128,3:192,4:0,5:64,6:64,7:64 qos_swe_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,7 In this example, there are 8 VLs configured on subnet: VL0 to VL7. VL0 is defined as a high priority VL, and it is limited to 6 x 4KB = 24KB in a single transmission burst. Such configuration would suilt VL that needs low latency and uses small MTU when transmitting packets. Rest of VLs are defined as low priority VLs with different weights, while VL4 is effectively turned off.