Contrail: How to configure QoS?



Quality of service has always been a critical piece of MPLS VPN networks. QoS makes it possible to classify customer applications and to prioritize them when congestion occurs on a network link.
This post describes how quality of service is implemented in Contrail and how to configure it. The description provided is based on Contrail version 3.2.0.




Main concepts




The Quality of Service implementation in Contrail is based on multiple concepts.

The classification block groups packets into classes based on the DSCP, MPLS EXP or 802.1p value. The output of the classification process is a forwarding-class identifier which corresponds to the class of the packet. Classification is always done at the ingress packet processing step.
The classification rules are defined using qos-config objects in Contrail.

For each class of service to be implemented, a forwarding-class object should be created. This forwarding-class object describes the behavior associated with the class of service, such as the packet rewrite rules (DSCP, EXP, 802.1p) and the output logical queue identifier.
Contrail does not manage the output packet scheduling policies. For performance reasons, this task is delegated to the NIC and the vrouter does not perform any scheduling action.

The logical queue (described in the qos-queue object) is a purely internal Contrail construct and does not represent a queue on the NIC.

A mapping between the logical queues (Contrail queues) and the hardware queues (NIC queues) is therefore required at the vrouter level.




The figure above describes two VMs hosted on different compute nodes. VM2 sends two types of IPv4 traffic: critical traffic to VM1, which requires low latency and no packet loss, and best-effort traffic to the Internet.
Two classes of service must be implemented in order to ensure flow prioritization between the critical and the best-effort traffic.

At the Contrail level, we use two forwarding-classes (FC), as we need to describe two behaviors (one for each class of traffic). Let's consider FC1 to be associated with the critical traffic and FC2 with the best-effort traffic.
Each forwarding-class is associated with rewrite rules and a logical queue.
FC1 rewrites the DSCP value to EF and enqueues the traffic in logical queue 2. Logical queue 2 is mapped to hardware queue 0.
FC2 rewrites the DSCP value to BE and enqueues the traffic in logical queue 1. Logical queue 1 is mapped to hardware queue 1.

A classification policy (qos-config) is required to ensure that the traffic is correctly classified when entering the vrouter. In our example, we use the DSCP value to classify the packets: packets arriving with DSCP EF are associated with FC1, while all other packets are associated with FC2.
In our example, the classification policy is applied to the VM interface of VM2.

The NIC is configured to treat hardware queue 0 as a strict priority queue, while the other hardware queues are served in a regular round-robin fashion.



When a best-effort packet from VM2 enters the vrouter (through the tap interface), the classification policy is applied to determine the forwarding class associated with the packet. As its DSCP differs from EF, the packet is associated with forwarding class FC2. The appropriate forwarding actions are taken on the packet, including the DSCP rewrite, which applies to the outer IP header of the packet (tunnel encapsulation). As FC2 is associated with logical queue 1, the vrouter knows that it should enqueue the packet in hardware queue 1 of the NIC. The NIC then dequeues the packet based on its packet scheduling parameters.
When a critical packet comes in from VM2, the vrouter classifies the packet in FC1 and performs the necessary actions, including the rewrite rules. It enqueues the packet in hardware queue 0 of the NIC, and the NIC dequeues the packet immediately thanks to the strict priority.
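
This whole chain can be summarized with a short toy model (plain Python written for this post, not actual vrouter code; all values come from the example above):

# Toy model of the QoS pipeline described above -- purely illustrative,
# the real vrouter implements this in C.

# qos-config: DSCP value -> forwarding-class ID (default FC when nothing matches)
CLASSIFIER = {46: 1}   # DSCP EF (46) -> FC1
DEFAULT_FC = 2         # anything else -> FC2 (best effort)

# forwarding-class: FC ID -> (DSCP rewrite value, logical queue)
FORWARDING_CLASSES = {1: {'dscp': 46, 'logical_queue': 2},
                      2: {'dscp': 0,  'logical_queue': 1}}

# vrouter-agent mapping: logical queue -> NIC hardware queue
LQ_TO_HWQ = {2: 0, 1: 1}

def process(dscp_in):
    fc_id = CLASSIFIER.get(dscp_in, DEFAULT_FC)   # 1. classify at ingress
    fc = FORWARDING_CLASSES[fc_id]
    outer_dscp = fc['dscp']                       # 2. rewrite the outer header
    hwq = LQ_TO_HWQ[fc['logical_queue']]          # 3. resolve the hardware queue
    return fc_id, outer_dscp, hwq

print(process(46))   # critical: (1, 46, 0) -> strict priority queue
print(process(0))    # best effort: (2, 0, 1)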


Classification policies

Classification policies are maintained in the qos-config objects in Contrail.
There are three main types of classification policies:
  • vhost policy: handles the traffic coming from the compute host OS and going through the vrouter.
  • fabric policy: handles the traffic coming from the fabric port and entering the vrouter.
  • project policies (which are more regular policies): handle any other type of traffic.



The classification policy is defined by associating DSCP values, EXP values and/or 802.1p values with forwarding-class identifiers. A default forwarding class must also be defined for the case where nothing matches.


The following script gives an example of how to configure a project classification policy using the VNC API:

from vnc_api import vnc_api

vnc_lib = vnc_api.VncApi(api_server_host='10.0.0.1',api_server_port='8082',
                        auth_host='10.0.0.1',username='admin',password='admin',tenant_name='admin')


print "Retrieve project"
admin = vnc_lib.project_read(fq_name=['default-domain','admin'])

print "Create DSCP mappings"
dscp_obj = vnc_api.QosIdForwardingClassPair(key=46,forwarding_class_id=1)
dscp_obj2 = vnc_api.QosIdForwardingClassPair(key=54,forwarding_class_id=2)
print "Create DSCP list"
dscp_list_obj = vnc_api.QosIdForwardingClassPairs(qos_id_forwarding_class_pair=[dscp_obj,dscp_obj2])

print "Create EXP mappings"
exp_obj1 = vnc_api.QosIdForwardingClassPair(key=6,forwarding_class_id=1)
exp_obj2 = vnc_api.QosIdForwardingClassPair(key=0,forwarding_class_id=2)
print "Create EXP list"
exp_list_obj = vnc_api.QosIdForwardingClassPairs(qos_id_forwarding_class_pair=[exp_obj1,exp_obj2])


print "Create project QoS policy"
qoscfg_obj = vnc_api.QosConfig(name="MY_TEST_QOS_CFG",
                                parent_obj=admin,
                                qos_config_type='project',
                                dscp_entries=dscp_list_obj,
                                mpls_exp_entries=exp_list_obj,
                                default_forwarding_class_id=0,
                                display_name="MY_TEST_QOS_CFG")

vnc_lib.qos_config_create(qoscfg_obj)



This example creates two lists of forwarding-class mappings, one for the DSCP classification and one for the MPLS EXP classification. In this example, we map DSCP 46 to FC1, DSCP 54 to FC2, EXP 6 to FC1 and EXP 0 to FC2.
The example finally creates the qos-config object, named "MY_TEST_QOS_CFG", which defines FC0 as the default forwarding class and uses the mapping lists we created earlier.

An API call can be used to verify the proper configuration of the policy:

root@SRV1:~/Contrail# curl -s -H "X-Auth-Token: $(keystone token-get | awk '/ id / {print $4}')"  http://192.168.227.37:8082/qos-configs | python -mjson.tool
{
    "qos-configs": [
        {
            "fq_name": [
                "default-domain",
                "admin",
                "MY_TEST_QOS_CFG"
            ],
            "href": "http://192.168.227.37:8082/qos-config/e6f7c9fa-5024-4f09-b5c3-05b4d20771bb",
            "uuid": "e6f7c9fa-5024-4f09-b5c3-05b4d20771bb"
        }
    ]
}


root@SRV1:~/Contrail# curl -s -H "X-Auth-Token: $(keystone token-get | awk '/ id / {print $4}')"  http://192.168.227.37:8082/qos-config/e6f7c9fa-5024-4f09-b5c3-05b4d20771bb | python -mjson.tool

{
    "qos-config": {
        "default_forwarding_class_id": 0,
        "display_name": "MY_TEST_QOS_CFG",
        "dscp_entries": {
            "qos_id_forwarding_class_pair": [
                {
                    "forwarding_class_id": 2,
                    "key": 54
                },
                {
                    "forwarding_class_id": 1,
                    "key": 46
                }
            ]
        },
        "fq_name": [
            "default-domain",
            "admin",
            "MY_TEST_QOS_CFG"
        ],
        "global_system_config_refs": [
            {
                "attr": null,
                "href": "http://192.168.227.37:8082/global-system-config/7cbfca32-1552-4d31-b5c2-7c1673cadd01",
                "to": [
                    "default-global-system-config"
                ],
                "uuid": "7cbfca32-1552-4d31-b5c2-7c1673cadd01"
            }
        ],
        "href": "http://192.168.227.37:8082/qos-config/e6f7c9fa-5024-4f09-b5c3-05b4d20771bb",
        "id_perms": {
            "created": "2017-05-15T13:33:08.070543",
            "creator": null,
            "description": null,
            "enable": true,
            "last_modified": "2017-05-15T13:33:08.070543",
            "permissions": {
                "group": "cloud-admin",
                "group_access": 7,
                "other_access": 7,
                "owner": "admin",
                "owner_access": 7
            },
            "user_visible": true,
            "uuid": {
                "uuid_lslong": 13097318415499489723,
                "uuid_mslong": 16642993024894521097
            }
        },
        "mpls_exp_entries": {
            "qos_id_forwarding_class_pair": [
                {
                    "forwarding_class_id": 1,
                    "key": 6
                },
                {
                    "forwarding_class_id": 2,
                    "key": 0
                }
            ]
        },
        "name": "MY_TEST_QOS_CFG",
        "parent_href": "http://192.168.227.37:8082/project/a5af7aac-3915-4161-9dd4-d6ae6c795ced",
        "parent_type": "project",
        "parent_uuid": "a5af7aac-3915-4161-9dd4-d6ae6c795ced",
        "perms2": {
            "global_access": 0,
            "owner": "a5af7aac391541619dd4d6ae6c795ced",
            "owner_access": 7,
            "share": []
        },
        "qos_config_type": "project",
        "uuid": "e6f7c9fa-5024-4f09-b5c3-05b4d20771bb"
    }
}


Classification policies can also be configured using the Contrail Web UI, as displayed below:




Applying classification policies

The project classification policies can be applied at three different levels:
  • VM interface level: the traffic coming from the VM into the vrouter is classified using the rules defined in the policy.
  • Network policy level: the traffic matching the policy is classified using the rules defined in the policy (a sketch for this case follows the two scripts below).
  • Virtual network level: the traffic entering the virtual network is classified using the rules defined in the policy.

The following Python script adds a classification policy to a VM interface:

from vnc_api import vnc_api

vnc_lib = vnc_api.VncApi(api_server_host='10.0.0.1',api_server_port='8082', auth_host='10.0.0.1', username='admin', password='admin', tenant_name='admin')                      

qoscfg = vnc_lib.qos_config_read(fq_name= ['default-domain', 'admin', 'MY_TEST_QOS_CFG'])

# Read the target VM interface by UUID (example UUID)
myif = vnc_lib.virtual_machine_interface_read(id='a3d92c0d-eedd-4973-b862-76462a54786f')

myif.add_qos_config(qoscfg)

vnc_lib.virtual_machine_interface_update(myif)

The following Python script adds a classification policy to a virtual network:

from vnc_api import vnc_api

vnc_lib = vnc_api.VncApi(api_server_host='10.0.0.1', api_server_port='8082', auth_host='10.0.0.1', username='admin', password='admin', tenant_name='admin')

qoscfg = vnc_lib.qos_config_read(fq_name= ['default-domain', 'admin', 'MY_TEST_QOS_CFG'])

myvn = vnc_lib.virtual_network_read(fq_name= ['default-domain', 'admin', 'SLI1'])

myvn.add_qos_config(qoscfg)

vnc_lib.virtual_network_update(myvn)
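
For the network policy level, the qos-config is referenced from the rule action list. The following sketch comes with a caveat: it assumes that the ActionListType of the vnc_api schema exposes a qos_action field holding the qos-config fq_name as a string (which is the case in 3.2-era schemas), and the policy name MY_POLICY is purely illustrative:

from vnc_api import vnc_api

vnc_lib = vnc_api.VncApi(api_server_host='10.0.0.1', api_server_port='8082', auth_host='10.0.0.1', username='admin', password='admin', tenant_name='admin')

qoscfg = vnc_lib.qos_config_read(fq_name=['default-domain', 'admin', 'MY_TEST_QOS_CFG'])

# MY_POLICY is a placeholder for an existing network policy
mypolicy = vnc_lib.network_policy_read(fq_name=['default-domain', 'admin', 'MY_POLICY'])

# Reference the qos-config (as an fq_name string) from every rule of the policy
entries = mypolicy.get_network_policy_entries()
for rule in entries.policy_rule:
    if rule.action_list:
        rule.action_list.qos_action = ':'.join(qoscfg.get_fq_name())

mypolicy.set_network_policy_entries(entries)
vnc_lib.network_policy_update(mypolicy)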


The Web UI can also be used to apply a classification policy. The example below displays the UI configuration of a classification policy on a network policy.



The classification policy can also be checked using the vrouter CLI:

root@SRV1:~/Contrail# vif --list
Vrouter Interface Table

Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
       Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
       D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
       Mnp=No MAC Proxy, Dpdk=DPDK PMD Interface, Rfl=Receive Filtering Offload, Mon=Interface is Monitored
       Uuf=Unknown Unicast Flood, Vof=VLAN insert/strip offload, Df=Drop New Flows

vif0/0      OS: p2p1 (Speed 1000, Duplex 1)
            Type:Physical HWaddr:90:e2:ba:8e:3a:74 IPaddr:0
            Vrf:0 Flags:TcL3L2VpPr MTU:1514 QOS:-1 Ref:7
            RX packets:3795777  bytes:1804258263 errors:0
            TX packets:1955571097  bytes:218940274962 errors:0
            Drops:180

vif0/1      OS: vhost0
            Type:Host HWaddr:90:e2:ba:8e:3a:74 IPaddr:a000002
            Vrf:0 Flags:L3L2 MTU:1514 QOS:1 Ref:3
            RX packets:7012898  bytes:16291156125 errors:0
            TX packets:3748951  bytes:1797336837 errors:0
            Drops:9

vif0/21     OS: tap1de8ec30-2d
            Type:Virtual HWaddr:00:00:5e:00:01:00 IPaddr:0
            Vrf:1 Flags:PL3L2D MTU:9160 QOS:0 Ref:5
            RX packets:101483  bytes:11610723 errors:0
            TX packets:1848289764  bytes:140471430681 errors:0
            Drops:50435

This example shows that the QoS policy 0 is applied to vif0/21, which is a VM interface. The content of the policy can be retrieved using the qosmap utility:

root@SRV1:~/Contrail# qosmap --get-qos 0
QOS Map 0/0
    DSCP              FC
      10               4
      18               3
      26               2
      34               1
      46               1
      48               5
      56               5

     EXP              FC

 DOTONEP              FC

vhost traffic classification

The vhost traffic can be classified by the vrouter based on the DSCP, EXP or 802.1p value in the packet. If the emitting process does not set the appropriate value in the packet, the packet cannot be classified correctly by the vrouter.

To overcome this limitation, iptables rules can be set on the host OS to ensure the appropriate marking of the flows coming from the processes.
The following configuration sets a DSCP value of 48 on XMPP flows and a DSCP value of 10 on introspect flows:

root@SRV1:/# iptables -t mangle -A OUTPUT -p tcp -m tcp --dport 5269 -j DSCP --set-dscp 48
root@SRV1:/# iptables -t mangle -A OUTPUT -p tcp -m tcp --sport 8085 -j DSCP --set-dscp 10



Forwarding-classes

The forwarding-class object defines the behavior associated with a particular class of service.
A forwarding-class object should be configured for each required class of service.

The forwarding-classes are global to the system and do not need to be applied anywhere. Each time a new forwarding-class ID is created, the system automatically takes it into account.

The forwarding-class defines the rewrite rules for the DSCP, EXP and 802.1p as well as the output logical queue.


The following script shows how to configure a forwarding-class:

from vnc_api import vnc_api

vnc_lib = vnc_api.VncApi(api_server_host='10.0.0.1',api_server_port='8082',
                        auth_host='10.0.0.1',username='admin',password='admin',tenant_name='admin')



print "Create a QoS queue\n"
qosq = vnc_api.QosQueue(name="90",qos_queue_identifier=90,display_name="90")
vnc_lib.qos_queue_create(qosq)

print "Create Forwarding Class\n"
fc = vnc_api.ForwardingClass(name="42",
                                forwarding_class_id=42,
                                forwarding_class_dscp=12,
                                forwarding_class_mpls_exp=2,
                                forwarding_class_vlan_priority=5)

print "Adding the QoS queue to the FC\n"
fc.add_qos_queue(qosq)

print "Terminate the creation of the FC\n"
vnc_lib.forwarding_class_create(fc)


An API call can be used to check that the object has been successfully created in Contrail:

root@SRV1:~/Contrail# curl -s -H "X-Auth-Token: $(keystone token-get | awk '/ id / {print $4}')"  http://192.168.227.37:8082/forwarding-classs | python -mjson.tool
{
    "forwarding-classs": [
        {
            "fq_name": [
                "default-global-system-config",
                "default-global-qos-config",
                "42"
            ],
            "href": "http://192.168.227.37:8082/forwarding-class/c1b0cbc7-3470-4e83-9999-340896ebbc86",
            "uuid": "c1b0cbc7-3470-4e83-9999-340896ebbc86"
        }
    ]
}



root@SRV1:~/Contrail# curl -s -H "X-Auth-Token: $(keystone token-get | awk '/ id / {print $4}')"  http://192.168.227.37:8082/forwarding-class/c1b0cbc7-3470-4e83-9999-340896ebbc86 | python -mjson.tool
{
    "forwarding-class": {
        "display_name": "42",
        "forwarding_class_dscp": 12,
        "forwarding_class_id": 42,
        "forwarding_class_mpls_exp": 2,
        "forwarding_class_vlan_priority": 5,
        "fq_name": [
            "default-global-system-config",
            "default-global-qos-config",
            "42"
        ],
        "href": "http://192.168.227.37:8082/forwarding-class/c1b0cbc7-3470-4e83-9999-340896ebbc86",
        "id_perms": {
            "created": "2017-05-15T14:34:31.026058",
            "creator": null,
            "description": null,
            "enable": true,
            "last_modified": "2017-05-15T14:34:31.026058",
            "permissions": {
                "group": "cloud-admin",
                "group_access": 7,
                "other_access": 7,
                "owner": "admin",
                "owner_access": 7
            },
            "user_visible": true,
            "uuid": {
                "uuid_lslong": 11067934770736118918,
                "uuid_mslong": 13956879301659872899
            }
        },
        "name": "42",
        "parent_href": "http://192.168.227.37:8082/global-qos-config/89c0154e-eede-478e-9674-2283ceaeefbd",
        "parent_type": "global-qos-config",
        "parent_uuid": "89c0154e-eede-478e-9674-2283ceaeefbd",
        "perms2": {
            "global_access": 0,
            "owner": "a5af7aac391541619dd4d6ae6c795ced",
            "owner_access": 7,
            "share": []
        },
        "qos_queue_refs": [
            {
                "attr": null,
                "href": "http://192.168.227.37:8082/qos-queue/5ee0adaf-479b-4949-abc7-ce75bddc7686",
                "to": [
                    "default-global-system-config",
                    "default-global-qos-config",
                    "90"
                ],
                "uuid": "5ee0adaf-479b-4949-abc7-ce75bddc7686"
            }
        ],
        "uuid": "c1b0cbc7-3470-4e83-9999-340896ebbc86"
    }
}


A forwarding class can also be configured using the Web UI:



Note: when using the Web UI, there is no need to configure the qos-queue object which corresponds to the logical queue. It is automatically created by the system when the logical queue ID is provided in the forwarding-class configuration.

Rewrite rules

The packet rewrite rules are applied when the packet is sent to the fabric port. There is no remarking done when the packet is forwarded between VMs hosted on the same compute node.
The remarking happens on the outer header of the packet (on the tunnel header).

The following capture shows how the vrouter marked the outer MPLS and IP headers of the packet (the example uses an MPLSoGRE encapsulation on the fabric port).
The associated forwarding-class configuration sets the DSCP value to CS6 and the EXP value to 6.




Queuing

The vrouter maintains the concept of logical queues, and each forwarding-class is associated with a logical queue. These logical queues are purely internal to Contrail and do not have any direct link with the NIC hardware queues.

The number of logical queues used in Contrail can be higher than the number of queues available on the NIC.

A mapping between the logical queues and the hardware queues must therefore be defined in the vrouter-agent configuration.

This can be done through the testbed.py or directly in contrail-vrouter-agent.conf.

The following configuration provides a sample testbed.py configuration:

env.qos = {
        host1: [
                {'hardware_q_id':'1','logical_queue':['1'],'default':'True'},
                {'hardware_q_id':'9','logical_queue':['2']},
                {'hardware_q_id':'41','logical_queue':['6','10-20','50']},
        ]
}

In the configuration sample provided, hardware queue 1 is associated with logical queue 1 and is set as the default hardware queue. Hardware queue 9 is associated with logical queue 2. Hardware queue 41 is associated with multiple logical queues: 6, 10 up to 20, and 50.
This shows how the number of logical queues can be higher than the number of hardware queues: multiple logical queues can be aggregated into a single hardware queue.
The default hardware queue is used whenever a logical queue referenced in Contrail does not have an associated hardware queue.
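
To make the semantics of this mapping concrete, here is an illustrative resolution function (written for this post, not the agent's actual code) based on the sample above:

# Illustrative logical-queue to hardware-queue resolution, mirroring the
# testbed.py sample above (not the vrouter agent's actual code).
QUEUE_MAP = {1: ['1'], 9: ['2'], 41: ['6', '10-20', '50']}
DEFAULT_HWQ = 1

def hw_queue_for(logical_queue):
    for hwq, logicals in QUEUE_MAP.items():
        for entry in logicals:
            if '-' in entry:                          # a range such as '10-20'
                low, high = map(int, entry.split('-'))
                if low <= logical_queue <= high:
                    return hwq
            elif int(entry) == logical_queue:
                return hwq
    return DEFAULT_HWQ        # unmapped logical queue -> default hardware queue

print(hw_queue_for(15))   # 41 (falls in the '10-20' range)
print(hw_queue_for(90))   # 1  (unmapped, default hardware queue)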

When using the testbed.py, the fab task "setup_qos_queuing" should be used to deploy the appropriate vrouter agent configuration file.
The fab script updates the configuration file, deactivates XPS and restarts the vrouter agent.
XPS (Transmit Packet Steering) is a mechanism that selects a transmit queue on a multi-queue device based on per-CPU-core affinity. This mechanism must be disabled because the transmit queue selection should be done on a per-forwarding-class basis and not on a per-core basis.
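
For reference, XPS can also be disabled by hand by clearing the per-queue CPU masks exposed in sysfs. A minimal sketch (assuming the fabric interface is p2p1, as elsewhere in this post; it must run as root):

import glob

# Disable XPS on every transmit queue of the fabric interface by clearing
# the CPU affinity mask exposed in sysfs.
for path in glob.glob('/sys/class/net/p2p1/queues/tx-*/xps_cpus'):
    with open(path, 'w') as f:
        f.write('0')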

The resulting contrail-vrouter-agent.conf is as follows:

[QOS]
[QUEUE-1]
# This is the default hardware queue
default_hw_queue= true
# Logical nic queues for qos config
logical_queue=[1]

[QUEUE-9]
# Logical nic queues for qos config
logical_queue=[2]

[QUEUE-41]
# Logical nic queues for qos config
logical_queue=[6,10-20,50]


With this mapping, the vrouter knows the exact hardware queue to use for a packet associated with a particular forwarding-class. The qosmap utility provided with Contrail can be used to check the forwarding-class configuration on the vrouter:

root@SRV1:~/Contrail# qosmap --get-fc 42
Forwarding Class Map 0
 FC            DSCP  EXP  .1p    Queue
 42              12    2    5       1

The example above shows that FC 42 is associated with hardware queue 1. When we set up forwarding class 42, it was associated with logical queue 90, but logical queue 90 is not mapped to a hardware queue. As a consequence, Contrail uses the default hardware queue, which is 1.

The appropriate queueing can be checked using the ethtool -S command:

root@SRV1:~/Contrail# ethtool -S p2p1 | grep tx_queue_.*_pac
     tx_queue_0_packets: 8535430
     tx_queue_1_packets: 7302346
     tx_queue_2_packets: 8741187
     tx_queue_3_packets: 9341669
     tx_queue_4_packets: 9206391
     tx_queue_5_packets: 8638594
     tx_queue_6_packets: 7609915
     tx_queue_7_packets: 8226832
     tx_queue_8_packets: 9333220
     tx_queue_9_packets: 25695758
     tx_queue_10_packets: 7711683
     tx_queue_11_packets: 9209279


Scheduling

The job of the vrouter ends when it has enqueued the packet in the appropriate hardware queue. At this stage, the NIC starts to schedule the packet based on its scheduling configuration.

The packet scheduling task is fully under the NIC's responsibility. As a consequence, the scheduling parameters and capabilities depend on the NIC being used.

Contrail has been successfully tested with Intel NIANTIC-based cards (X520) and provides tools to set up the scheduling on these NICs. Using other NICs may work but, in that case, the configuration of the NIC driver to perform the scheduling must be done by the user.

Intel NIANTIC cards support Data Center Bridging technology and especially packet prioritization through ETS (Enhanced Transmission Selection). They support eight priority groups, each of which can act either in strict priority mode or in guaranteed bandwidth mode.
These NICs support 64 queues and, to make it simpler, each group of 8 queues is mapped to a priority group as displayed in the diagram below.


Each priority group can be defined to act in strict priority mode or in guaranteed bandwidth mode. Multiple priority groups can be defined in strict priority mode; in that case, the priority group with the highest identifier is served first. There is no way to rate-limit a strict priority group, so it may consume all the bandwidth.
When there is congestion between priority groups in guaranteed bandwidth mode, the queues are served so as to guarantee the bandwidth percentage of each priority group.

In order to implement classes of service with different behaviors, it is important to choose the hardware queue numbers appropriately.
Let's say that we want to implement six classes of service:

  • One class with strict priority for low latency traffic.
  • One class for high priority data with 50% of guaranteed bandwidth.
  • One class for medium priority data with 34% of guaranteed bandwidth.
  • One class for low priority data with 10% of guaranteed bandwidth.
  • One class for network control traffic with 5% of guaranteed bandwidth.
  • One class for best effort traffic with 1% of guaranteed bandwidth.
To ensure the right behavior for each class, we need to pick hardware queues in a different priority group for each class. As a consequence, we need 6 queues located in 6 different priority groups. An example of hardware queue mapping could be:
Class             FC ID   HWQ   LQ   PG   Property
Low latency           1     1    2    0   Strict
High priority         4    33    5    4   50%
Medium priority       3    25    4    3   34%
Low priority          2    17    3    2   10%
Network Control       5     9    6    1   5%
Best Effort           0    41    1    5   1%

Note that any logical queue ID can be used, as long as the logical queue is mapped to the appropriate hardware queue.
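
Since each group of eight hardware queues maps to one priority group, the PG column of the table can be derived directly from the hardware queue number (PG = HWQ // 8). A quick illustrative check of the mapping above:

# Each group of 8 hardware queues maps to one priority group (PG = HWQ // 8).
# Check that the queues chosen above land in 6 distinct priority groups.
classes = {'Low latency': 1, 'Network Control': 9, 'Low priority': 17,
           'Medium priority': 25, 'High priority': 33, 'Best Effort': 41}

pgs = dict((name, hwq // 8) for name, hwq in classes.items())
print(pgs)                            # {'Low latency': 0, 'Network Control': 1, ...}
assert len(set(pgs.values())) == 6    # one distinct priority group per class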


The configuration of the scheduling (for NIANTIC cards) can be achieved in two ways:
  • By using the testbed.py and the appropriate fab script.
  • By using the qosmap utility.


The testbed.py can be modified using the following sample configuration:

env.qos_niantic = {
   host2:[
    { 'priority_id': '0', 'scheduling': 'strict', 'bandwidth': '0'},
    { 'priority_id': '1', 'scheduling': 'rr', 'bandwidth': '5'},
    { 'priority_id': '2', 'scheduling': 'rr', 'bandwidth': '10'},
    { 'priority_id': '3', 'scheduling': 'rr', 'bandwidth': '34'},
    { 'priority_id': '4', 'scheduling': 'rr', 'bandwidth': '50'},
    { 'priority_id': '5', 'scheduling': 'rr', 'bandwidth': '1'}
   ]
}

This configuration defines the properties of each priority group to be used:
  • the scheduling type: 'rr' for round-robin (guaranteed bandwidth mode) or 'strict' for strict priority queuing.
  • the bandwidth percentage: only relevant for 'rr' priority groups.

The fab "setup_qos_scheduling" can be used to read the testbed.py and deploy the qos scheduling on the appropriate hosts. This fab file uses the qosmap utility to program the NIC.

The qosmap utility can also be used directly on the compute node to program the NIC:

/usr/bin/qosmap --set-queue p2p1 --dcbx ieee --bw 0,5,10,34,50,1,0,0 --strict 10000000

The command takes as parameters:
  • The name of the network interface to be used.
  • A list of bandwidth percentages ordered by priority group (from 0 to 7).
  • A bitmask for the strictness of each priority group: when a bit is set, the corresponding priority group is strict. The MSB corresponds to PG0 (see the helper below).
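
As an illustration, the --bw and --strict arguments used above can be derived with a small helper (hypothetical code written for this post):

# Build the qosmap --strict bitmask: 8 bits, MSB = PG0, LSB = PG7.
def strict_mask(strict_pgs):
    return ''.join('1' if pg in strict_pgs else '0' for pg in range(8))

bw = [0, 5, 10, 34, 50, 1, 0, 0]    # bandwidth percentage for PG0..PG7
print(','.join(map(str, bw)))       # 0,5,10,34,50,1,0,0  (--bw argument)
print(strict_mask({0}))             # 10000000            (--strict argument)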


The scheduling configuration can be checked using the qosmap utility:

root@SRV1:/home/nie# qosmap --get-queue p2p1
Priority Operation
Interface:                   p2p1
DCBX:                        IEEE
DCB State:                  Disabled

                               P0   P1   P2   P3   P4   P5   P6   P7
Traffic Class:                  0    1    2    3    4    5    6    7

                              TC0  TC1  TC2  TC3  TC4  TC5  TC6  TC7
Priority Group:                 0    1    2    3    4    5    6    7

                              PG0  PG1  PG2  PG3  PG4  PG5  PG6  PG7
Priority Group Bandwidth:       0    5   10   34   50    1    0    0
Strictness:                     1    0    0    0    0    0    0    0


Scheduling operation

The command 'tc -s class show dev <if>' allows checking the status of each queue, and especially the packet drops. Note that this command displays the queues at the kernel level, not at the hardware level.


root@SRV1:/home/nie# tc -s class show dev p2p1
class mq :1 root
 Sent 887690749 bytes 8535456 pkt (dropped 0, overlimits 0 requeues 1082)
 backlog 0b 0p requeues 1082
class mq :2 root
 Sent 759488618 bytes 7302349 pkt (dropped 0, overlimits 0 requeues 2352)
 backlog 0b 0p requeues 2352
class mq :3 root
 Sent 909090550 bytes 8741189 pkt (dropped 0, overlimits 0 requeues 1526)
 backlog 0b 0p requeues 1526
class mq :4 root
 Sent 971531518 bytes 9341669 pkt (dropped 0, overlimits 0 requeues 355)
 backlog 0b 0p requeues 355


As an example, I set up a compute node with a 1Gbps fabric port and sent about 400Mbps in each class of service, except best effort (which was not used) and high priority data (which sent 300Mbps).



The resulting transmission rate in each class was:

Queue 1 (LL), sent 180663484394 (348.644897460938 Mbps), 225516959 (57050.6666666667 pps), drop 868487 (0 pps)

Queue 9 (NC), sent 7471607762 (32.4336995442708 Mbps), 8570597 (5157.76666666667 pps), drop 16499636 (30697.333333 pps)

Queue 17 (LP), sent 72786421764 (64.9051523844401 Mbps), 90339913 (10484 pps), drop 61050612 (24270.1333333333 pps)

Queue 25 (MP), sent 61952485572 (220.722416178385 Mbps), 77279185 (36014.3333333333 pps), drop 3711981 (13944.4 pps)

Queue 33 (HP), sent 310291353350 (254.943929036458 Mbps), 386488821 (41575.0333333333 pps), drop 54688423 (0 pps)

Queue 41 (BE), sent 15004491634 (0 Mbps), 18453155 (0 pps), drop 76531343 (0 pps)

Note 1: this output comes from a home-made script which periodically runs the "tc" command presented above to compute transmission rates.
Note 2: the transmission rate per class is not perfectly accurate and may vary around the desired value.


We can see that the property of each class is handled correctly by the NIC (a short numeric check follows the list):
  • The low latency class (LL) does not experience any drop thanks to the strict priority. After the LL class has been served, around 652Mbps remain available for the other classes (LL uses 348Mbps in the displayed capture).
  • The network control class (NC) uses approximately 32Mbps, which corresponds to 5% of the remaining bandwidth (652Mbps).
  • The medium priority class (MP) uses approximately 220Mbps, which corresponds to 34% of the remaining bandwidth (652Mbps).
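
A quick back-of-the-envelope check (illustrative Python, using the 348Mbps measured for the strict class) confirms these shares:

# Sanity check of the guaranteed-bandwidth shares observed above.
link = 1000.0            # 1Gbps fabric port, in Mbps
ll = 348.6               # strict-priority (LL) class is served first
remaining = link - ll    # ~652 Mbps left for the guaranteed-bandwidth groups

for name, percent in [('NC', 5), ('LP', 10), ('MP', 34), ('HP', 50), ('BE', 1)]:
    print("%s: expected ~%.1f Mbps" % (name, remaining * percent / 100.0))
# NC ~32.6, LP ~65.2 and MP ~221.7 Mbps match the measured 32.4, 64.9 and 220.7.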