Optimizing TCP Goodput and Delay in next generation IEEE 802.11 (ax) devices

In this paper we suggest three scheduling strategies for the IEEE 802.11ax transmission of DL unidirectional TCP data from the Access Point to stations. Two strategies are based on the Single User operation mode and one is based on the Multi User operation mode, using Multi User Multiple-Input-Multiple-Output (MU-MIMO) and OFDMA. We measure the Goodput of the system as a function of the time intervals over which these Goodputs are received in all three strategies. For up to 8 stations the MU strategy outperforms the SU. For 16 and 32 stations it is not clear whether MU outperforms SU or vice versa. For 64 stations the SU strategies outperform the MU significantly. We also checked the influence of the Delayed Acks feature on the received Goodputs and found that this feature has significance only when the TCP data segments are relatively short.


Background
The latest IEEE 802.11 Standard (WiFi) [1], created and maintained by the IEEE LAN/MAN Standards Committee (IEEE 802.11), is currently the most effective solution within the range of Wireless Local Area Networks (WLAN). Since its first release in 1997 the standard has provided the basis for wireless network products using the WiFi brand, and it has since been improved upon in many ways. One of the main goals of these improvements is to increase the system throughput provided to users and to improve the standard's Quality-of-Service (QoS) capabilities. To fulfill the promise of increasing IEEE 802.11 performance and QoS capabilities, a new amendment, IEEE 802.11ax (also known as High Efficiency (HE)), was recently introduced [2]. IEEE 802.11ax is considered to be the sixth generation of WLAN in the IEEE 802.11 set of WLAN types and is a successor to IEEE 802.11ac [3,4]. The scope of the IEEE 802.11ax amendment is to define modifications for both the IEEE 802.11 PHY and MAC layers that enable at least a four-fold improvement in the average throughput per station in densely deployed networks [5][6][7][8]. Currently the IEEE 802.11ax project is finalizing revision 2.0, which will be the baseline for WFA IEEE 802.11ax certification.

Research question
In order to achieve its goals, one of the main challenges of IEEE 802.11ax is to enable UL and DL simultaneous transmissions by several stations and to improve Quality-of-Service performance. The current paper is a continuation of papers [9][10][11]. In these papers the authors suggest scheduling strategies for the parallel transmissions of the AP to a given set of stations using new features of IEEE 802.11ax. The authors assume UDP-like traffic where the AP transmits data MSDUs to the stations, which reply with MAC acknowledgments. In this paper we assume DL unidirectional TCP-like traffic in which the AP transmits TCP Data MSDUs to a given set of stations, and the stations reply with TCP Ack MSDUs. As far as we know the issue of transmitting TCP traffic over IEEE 802.11ax has not yet been investigated. We suggest several scheduling strategies for the transmission of TCP traffic over the DL using Single User (SU) and Multi User (MU) modes for scenarios with 1, 4, 8, 16, 32 and 64 stations over a reliable channel. This is one of the aspects by which new amendments of the IEEE 802.11 standard are compared [12]. In this paper we are interested in finding an upper bound on the maximum DL unidirectional TCP Goodput that can be achieved by IEEE 802.11ax and in comparing the various scheduling strategies. Therefore, we first assume the traffic saturation model, where TCP connections always have data to transmit and the TCP Ack is generated immediately by receivers. Second, we neutralize aspects of the PHY layer such as the number of Spatial Streams (SS) in use, channel correlation when using Multi User Multiple Input Multiple Output (MU-MIMO), the use of the sounding protocol, etc.
As mentioned, we assume that every TCP connection has an unlimited number of TCP Data segments to transmit, and we assume that transmissions are made using an optimized (in terms of overhead reduction) two-level aggregation scheme to be described later. Our goal is to find an upper bound on the maximum possible Goodput that the wireless channel offers the TCP connections, where TCP itself does not impose any limitations on the offered load, i.e. on the rate at which MSDUs are handed for transmission to the MAC layer of IEEE 802.11ax. We also assume that the AP and the stations are the end points of the TCP connections. Following e.g. [13][14][15][16] it is quite common to consider short Round Trip Times (RTT) in this kind of high speed network such that no retransmission timeouts occur. Moreover, we assume that every TCP connection's Transmission Window can always provide as many MSDUs to transmit as the IEEE 802.11ax protocol limits allow. This assumption follows the observation that aggregation is useful in a scenario where the offered load on the channel is high. Finally, we assume that every TCP Ack acknowledges either one or two TCP Data segments. The latter possibility is denoted Delayed Acks, a TCP feature that enables a single TCP Ack to acknowledge two TCP Data segments.
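The effect of the Delayed Acks assumption on the number of TCP Acks can be illustrated with a minimal sketch. The function name and the treatment of an odd trailing segment (it receives its own TCP Ack) are assumptions made for illustration only.

```python
def tcp_acks_needed(n_data_segments, delayed_acks):
    """Number of TCP Acks generated for n_data_segments TCP Data segments.

    Without Delayed Acks every TCP Data segment is acknowledged separately.
    With Delayed Acks one TCP Ack covers two TCP Data segments; an odd
    trailing segment is assumed to get its own TCP Ack.
    """
    if n_data_segments < 0:
        raise ValueError("number of segments must be non-negative")
    if delayed_acks:
        return (n_data_segments + 1) // 2  # integer ceiling of n / 2
    return n_data_segments


print(tcp_acks_needed(7, delayed_acks=False))
print(tcp_acks_needed(7, delayed_acks=True))
```

Halving the number of TCP Acks shortens the UL part of a cycle, which is why the feature can change the Goodput.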
This research is only a first step in investigating TCP traffic in IEEE 802.11ax. In further papers we plan to address other TCP traffic scenarios, such as UL unidirectional TCP traffic and bi-directional TCP traffic.

Previous works
The issue of TCP traffic over IEEE 802.11ax that involves bidirectional data packet exchange has not yet been studied. Most of the research papers on IEEE 802.11ax thus far examine different access methods to enable efficient multi-user access to random sets of stations. For example, in [17] the authors deal with the introduction of Orthogonal Frequency Division Multiple Access (OFDMA) into IEEE 802.11ax to enable multi user access. They introduce an OFDMA based multiple access protocol, denoted Orthogonal MAC for IEEE 802.11ax (OMAX), to solve synchronization problems and reduce overhead associated with using OFDMA. In [18] the authors suggest an access protocol over the UL of an IEEE 802.11ax WLAN based on MU-MIMO and OFDMA PHY. In [19] the authors suggest a centralized medium access protocol for the UL of IEEE 802.11ax in order to efficiently use the transmission resources. In this protocol, stations transmit requests for frequency subcarriers, denoted Resource Units (RU), to the AP over the UL. The AP allocates RUs to the stations which later use them for data transmissions over the UL. In [20]
We would like to mention that the issue of TCP traffic over IEEE 802.11ac networks (the predecessor standard of IEEE 802.11ax) has already been investigated, e.g. in [26][27][28], for DL TCP traffic, UL TCP traffic and both DL and UL TCP traffic. However, in all these works there is no possibility of using the MU operation mode over the UL, a feature that was first introduced in IEEE 802.11ax.
The remainder of the paper is organized as follows: In Section 2 we describe the new mechanisms of IEEE 802.11ax relevant to this paper. In Section 3 we describe the scheduling strategies that we suggest in SU and MU modes. We assume the reader is familiar with the basics of the PHY and MAC layers of IEEE 802.11 described in previous papers, e.g. [29]. In Section 4 we analytically compute the Goodputs of the various scheduling strategies. In Section 5 we present the Goodputs of the various scheduling strategies and Section 6 summarizes the paper. In the Appendix we show how to efficiently schedule MPDUs in the various scheduling strategies. Lastly, moving forward, we denote IEEE 802.11ax by 11ax.

The new features in IEEE 802.11ax
IEEE 802.11ax focuses on implementing mechanisms to efficiently serve more users, enabling consistent and reliable streams of data (average throughput per user) in the presence of multiple users. In order to meet these targets 11ax addresses several new mechanisms in both the PHY and MAC layers. At the PHY layer, 11ax enables larger OFDM FFT sizes (4X larger) and therefore every OFDM symbol is 12.8µs compared to 3.2µs in IEEE 802.11ac, the predecessor of 11ax. With narrower sub-carrier spacing (4X closer) the protocol efficiency is increased because the same Guard Interval (GI) is used both in 11ax and in previous versions of the standard.
In addition, to increase the average throughput per user in high-density scenarios, 11ax introduces two new Modulation Coding Schemes (MCSs), MCS10 (1024-QAM 3/4) and MCS11 (1024-QAM 5/6), applicable for transmission with bandwidth larger than 20 MHz.
In this paper we use the Transmission Opportunity (TXOP) feature first introduced in IEEE 802.11n [30]. This feature allows a station, after gaining access to the channel, to transmit several PHY Protocol Data Units (PPDUs) in a row without interruption, and can also allocate some of the TXOP time interval to one or more receivers in order to allow data transmission in the reverse link. This is termed Reverse Direction (RD). For scenarios with bidirectional traffic such as TCP Data segments/Ack segments, this approach is very efficient as it reduces contention in the wireless channel.
We focus on optimizing the TXOP duration and pattern, the PPDU duration and the working point of 11ax's two-level aggregation scheme, first introduced in IEEE 802.11n [30]. In 11ax the size of an MPDU is limited to 11454 bytes and the size of the A-MPDU frame is limited to 4,194,304 bytes. The transmission time of the PPDU (PSDU and its preamble) is limited to 5.484ms (5484µs) due to the L-SIG (one of the legacy preamble's fields) duration limit [1]. The A-MPDU frame structure in two-level aggregation is shown in Figure 1.
IEEE 802.11ax also enables extension of the acknowledgment mechanism by using an acknowledgment window of 256 MPDUs. In this paper we also assume that all MPDUs transmitted in an A-MPDU frame are from the same Traffic Stream (TS). In this case up to 256 MPDUs are allowed in an A-MPDU frame of 11ax.
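The aggregation limits listed above can be collected into a simple feasibility check. The following is a minimal sketch: the constant and function names are hypothetical, and MPDU delimiters and padding are deliberately ignored, so it only approximates the real A-MPDU size.

```python
# Limits quoted in the text for 11ax.
MAX_MPDU_BYTES = 11454
MAX_AMPDU_BYTES = 4_194_304
MAX_PPDU_US = 5484.0
MAX_MPDUS_PER_AMPDU = 256  # acknowledgment window, single Traffic Stream


def ampdu_is_valid(mpdu_sizes_bytes, ppdu_duration_us):
    """Check one A-MPDU against the limits above.

    mpdu_sizes_bytes: sizes of the MPDUs aggregated in the A-MPDU.
    ppdu_duration_us: transmission time of the PPDU including its preamble.
    Delimiters and padding are not counted (simplifying assumption).
    """
    return (
        len(mpdu_sizes_bytes) <= MAX_MPDUS_PER_AMPDU
        and all(size <= MAX_MPDU_BYTES for size in mpdu_sizes_bytes)
        and sum(mpdu_sizes_bytes) <= MAX_AMPDU_BYTES
        and ppdu_duration_us <= MAX_PPDU_US
    )


print(ampdu_is_valid([1500] * 10, ppdu_duration_us=1000.0))
```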
Finally, in 11ax it is possible to transmit/receive simultaneously to/from up to 74 stations over the DL/UL respectively using MU.

HE scheduling strategies for TCP Usage
We compare the 11ax contention based Single User (SU), Reverse Direction (RD) SU and Multi User (MU) DL unidirectional TCP scheduling strategies in order to optimize the performance of DL unidirectional TCP connections from the AP to the stations.

Scheduling strategy 1 - HE DL Single User Reverse Direction unidirectional TCP
Recall that Reverse Direction (RD) is a mechanism by which the owner of a Transmission Opportunity (TXOP), the AP in our case, can enable its receiver to immediately transmit back the TCP Acks during the TXOP so that the receiver does not need to initiate UL transmission by using the Extended Distributed Coordination Function (EDCF) channel access method defined in IEEE 802.11e [1]. This is particularly efficient for bi-directional traffic such as TCP Data/Ack segments as it reduces overhead caused by collisions.
We examine an HE RD based scheduling strategy in which the AP transmits DL HE SU A-MPDU frames containing MPDUs of TCP Data segments to a station and enables the station to answer with a UL HE SU A-MPDU frame containing MPDUs of TCP Ack segments. Both the AP and the stations apply the two-level aggregation. We assume the following scenario for the use of RD, as illustrated in Figure 2.
After waiting AIFS and BackOff according to the 802.11 air access EDCA procedure, the AP initiates a TXOP by transmitting n DL HE SU A-MPDU frames in a row. Every such DL PPDU transmission, followed by receiving the BAck frame from the station, is denoted a HE DL RD TCP Data cycle. In its last DL HE SU A-MPDU frame the AP sets the RDG bit [1], enabling the station to respond with an UL HE SU A-MPDU frame containing TCP Ack segments. The AP then responds with a BAck frame and terminates the TXOP with the CF-End frame [1]. The transmission of the UL HE SU A-MPDU frame by the station, followed by the BAck transmission from the AP, is denoted a HE UL RD TCP Ack cycle.
In this HE RD based scheduling strategy we assume that there are no collisions and that TXOPs are repeated over the channel one after the other. This is made possible by configuring the stations in a way that prevents collisions. For example, the stations are configured to choose their BackOff intervals from very large contention intervals, other than the default ones [1]. Thus, the AP always wins the channel without collisions.
In the case where the AP maintains TCP connections with S stations in parallel, it transmits to the stations using Round Robin, i.e. after maintaining a TXOP with a station the AP initiates a TXOP with the next station, and so on.

Scheduling strategy 2 - HE DL Single User contention based unidirectional TCP
This HE SU scheduling strategy is shown in Figure 3. In this strategy the AP uses TXOPs but not RD: when the AP gets access to the channel it transmits DL HE SU A-MPDU frames containing TCP Data segments to a station in a row. Every transmission of a single DL HE SU A-MPDU frame from the AP is followed by a BAck frame transmission from the destination station; see Figure 3(A).
In this scheduling strategy both the AP and the stations contend in parallel for accessing the air channel in every transmission attempt, using the EDCF channel access method. In case the AP fails to gain access to the channel during its first attempt, it tries to access the channel again according to EDCF, with re-try penalty (longer BackOff interval) as shown in Figure 3(A).
The AP transmits to the stations in a Round Robin fashion. After transmitting TCP Data segments to a station, the AP does not serve that station again before receiving TCP Ack segments from the station and before the AP returns again to the station in the Round Robin order. Notice from the above that if the AP returns to a station in the Round Robin order before that station transmits TCP Ack segments to the AP, the AP skips over the station.
A station transmits to the AP only when it has TCP Ack segments, and it transmits the TCP Acks in one UL HE SU A-MPDU frame. See Figure 3(B).
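The Round Robin rule with skipping described above can be sketched as follows. This is an illustrative model only: the function name, the station identifiers and the representation of outstanding TCP Acks as a set are assumptions, not part of the strategy's definition.

```python
def next_station(order, pending_ack, start):
    """Return the position in `order` of the next station the AP may serve.

    order: station identifiers in Round Robin order.
    pending_ack: set of stations that received TCP Data but have not yet
        returned their TCP Acks; the AP skips over these stations.
    start: position in `order` at which the AP resumes the Round Robin.
    Returns None when every station still owes TCP Acks.
    """
    count = len(order)
    for offset in range(count):
        pos = (start + offset) % count
        if order[pos] not in pending_ack:
            return pos
    return None


stations = ["STA1", "STA2", "STA3"]
waiting = {"STA2"}  # STA2 has not yet transmitted its TCP Acks
pos = next_station(stations, waiting, start=1)
print(stations[pos])  # STA2 is skipped, so STA3 is served
```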

Scheduling strategy 3 - HE DL simultaneous Multi User unidirectional TCP
In the HE DL unidirectional TCP Multi User mode the AP transmits TCP Data to and receives TCP Acks from several stations in parallel. We assume the following scenario: the AP sends DL TCP Data simultaneously to multiple stations in the same PPDU, and the stations send their TCP Acks simultaneously in the same TXOP using Multi User, as illustrated in Figure 4.

IEEE 802.11 Frames/PPDU formats
In Figure 5 we show the 802.11 frames' formats of the BAck, Multi Station BAck, TF and CF-End frames used in the various scheduling strategies. In Figure 6 we show the various PPDUs' formats used in the various scheduling strategies shown in Figures 2-4.
For the TCP Data/Ack segments' transmission in Figure 2, scheduling strategy 1, the PPDU format in Figure 6(A) is used while the BAck and CF-End frames are transmitted using the legacy mode in Figure 6(B).
For the TCP Data/Ack segments' transmission in Figure 3, scheduling strategy 2, the PPDU in Figure 6(A) is used while the BAck frames are transmitted by the legacy mode shown in Figure 6(B).
For the TCP DL Data segments' transmission in Figure 4, scheduling strategy 3, the PPDU format in Figure 6(C) is used and the BAcks are transmitted using the PPDU format in Figure 6(D). The TCP UL Ack segments are transmitted by the PPDU format in
In the 11ax PPDU formats we find the HE-LTF fields, the number of which equals the number of SSs in use; 4 in our case. In this paper we assume that each such field is composed of 2X LTF and therefore of duration 7.2µs [2].
Notice also that the PSDU frame in 11ax contains a Packet Extension (PE) field. This field is mainly used in MU mode and we assume that it is 0µs in SU and the longest possible in MU, 16µs.
In the HE-SIG-B field used in the PPDU format of Figure 6(C) the Modulation/Coding Scheme (MCS) that is used for this field is the minimum between MCS4 and the one used for the data transmissions [2]. The length of this field is also a function of the number of stations to which the AP transmits simultaneously. Therefore, in the case of 4 stations for example, the HE-SIG-B field duration is 8µs for MCS0 and MCS1 and 4µs for MCS2-4 following section 23.3.9.8 in [2]. For MCS5-MCS11 it is 4µs as for MCS4.

Parameters' values
We assume the 5GHz band, a 160MHz channel and that the AP and each station has 4 antennae. In SU mode, i.e. in scheduling strategies 1 and 2, the AP and the stations use up to 4 Spatial Streams and the entire channel is devoted to transmissions of the AP and stations. The BAck frames are transmitted using legacy mode and the basic rates' set is used. The PHY rate $R_{legacy}$ is set to the largest basic rate that is smaller than or equal to the TCP Data/Ack segments' transmission rate $R_{TCP}$.
In Table 1 we show the PHY rates and the length of preambles used in SU mode in scheduling strategies 1 and 2 and in the various MCSs. The values are taken from [2].
When using MU mode in scheduling strategy 3, the 160MHz channel is divided into
The TF and the Multi Station BAck frames are transmitted using the legacy mode and the PHY rate $R_{legacy}$ is set to the largest basic rate that is smaller than or equal to the TCP Data/Ack segments' transmission rate $R_{TCP}$. The minimal basic PHY rate is 6Mbps. If $R_{TCP}$ is smaller than 6Mbps, $R_{legacy}$ is still set to 6Mbps. This can occur in the case of 64 stations.
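The selection of the legacy PHY rate can be sketched as follows. The basic rate set (6, 12, 24) Mbps used as the default is an assumption made for illustration; the text only states that the minimal basic rate is 6Mbps.

```python
def legacy_rate(r_tcp_mbps, basic_rates_mbps=(6.0, 12.0, 24.0)):
    """Largest basic rate not exceeding r_tcp_mbps, floored at the minimum.

    basic_rates_mbps is a hypothetical basic rate set. If the TCP Data/Ack
    rate is below every basic rate, the smallest basic rate is used.
    """
    candidates = [rate for rate in basic_rates_mbps if rate <= r_tcp_mbps]
    if candidates:
        return max(candidates)
    return min(basic_rates_mbps)


print(legacy_rate(30.0))  # largest basic rate at or below 30 Mbps
print(legacy_rate(3.0))   # below the minimum, so the 6 Mbps floor applies
```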
Concerning transmission in non-legacy mode, an OFDM symbol is 12.8µs. In the DL we assume a GI of 0.8µs and therefore the symbol in this direction is 13.6µs. In the UL MU we assume a GI of 1.6µs and therefore the symbol in this direction is 14.4µs. The UL GI is 1.6µs due to the variance in UL arrival times. In UL SU the GI is 0.8µs. When considering transmissions in legacy mode, the symbol is 4µs containing a GI of 0.8µs.
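The symbol durations above translate into payload transmission times as in the following sketch. It assumes that the PHY rate is quoted for the same GI as the one used, so that one symbol carries rate times symbol duration bits; the function names are hypothetical and preambles are not included.

```python
import math

OFDM_SYMBOL_US = 12.8  # 11ax symbol without Guard Interval


def symbol_us(gi_us):
    """Duration of one 11ax OFDM symbol including its Guard Interval."""
    return OFDM_SYMBOL_US + gi_us


def payload_time_us(payload_bits, phy_rate_mbps, gi_us):
    """Time to send payload_bits, rounded up to a whole number of symbols.

    phy_rate_mbps is assumed to be the rate for the given GI, so that
    Mbps multiplied by microseconds gives the bits carried per symbol.
    """
    t_sym = symbol_us(gi_us)
    bits_per_symbol = phy_rate_mbps * t_sym
    return math.ceil(payload_bits / bits_per_symbol) * t_sym


print(symbol_us(0.8), symbol_us(1.6))        # DL and UL MU symbols
print(payload_time_us(2000, 100.0, 0.8))     # two DL symbols are needed
```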
We assume that the MAC Header field is of 28 bytes and the Frame Check Sequence
G to all the stations over S TXOPs. In the MU strategy, scheduling strategy 3, the TCP Goodput G of a TXOP is that provided to all the stations together over one TXOP.
The second target of the analysis is to find the time intervals over which the system enables a given TCP Goodput G to all of its stations. A scheduling strategy that enables a given TCP Goodput to all stations over shorter time intervals is more efficient.

Maximum Goodput of a TXOP
Computing the optimal working point per scheduling strategy, i.e. the one that maximizes the Goodput of a TXOP, is done in 3 stages:
Where:
parallel to the AP. The analysis in scheduling strategy 3 is therefore basically the same as for scheduling strategy 1 with some differences specified below.
The Goodput of scheduling strategy 3 shown in Figure 4 is given by Eq. 3, assuming the AP transmits N DL TCP Data MSDUs in every TCP connection in n HE DL MU A-MPDUs, where:
$\dots \; T_{Sym_{DL}} \cdot R_{DL} \; \dots + Pr(6(D)) + T(BAck) + 2 \cdot SIFS$
$T_{cycle_{Ack}} = Pr(6(B)) + T(TF) + Pr(6(D)) + T_{Sym_{UL}} \cdot \dots$

Goodput vs. delay computation
For scheduling strategies 1 and 3 we measured the Goodput received in every TXOP according to Eqs. 2 and 4 respectively. In these equations the total number of TCP Data bits transmitted in a TXOP is divided by the TXOP length, measured in seconds. However, since we assume that the same TXOPs repeat themselves one after another, the computed Goodput of a TXOP is also the Goodput of the system. We now measure, for every number N of TCP Data segments transmitted in a TXOP, $1 \le N \le N_{MAX}$, the resulting length of the TXOP interval containing the N TCP Data segments, and as mentioned the Goodput is computed using Eqs. 2 and 4 respectively.
For scheduling strategy 2 we also measure the Goodput when transmitting N TCP Data segments. However, in this scheduling strategy there is no TXOP with RD and instead we measure the average time elapsed from the time the AP transmits to a station TCP Data segments until it receives TCP Acks from the station.
From now on we denote by cycle the TXOPs in scheduling strategies 1 and 3, and the above time interval that we described for scheduling strategy 2. By cycle length we denote the length, in seconds, of the cycle. The next step is as follows: Notice that for every number N of TCP Data segments transmitted in a cycle, there is a resulting cycle length which shows how much time is needed in a cycle for the transmission of these TCP Data segments to a specific station.
Thus, for every number N of TCP Data segments, $1 \le N \le N_{MAX}$, we attach two measures: the cycle length in which these N TCP Data segments are transmitted and the resulting Goodput. We now arrange the cycle lengths in a list together with the associated Goodputs in increasing order of the cycle lengths.
Notice that two different cycle lengths can have the same Goodput. One of the cycles has more TCP Data segments but it can also have more A-MPDU/MPDUs' overhead. In addition, the number of TCP Data segments can be large enough so that the addition of one more TCP Data segment barely changes the Goodput. For a set of cycle lengths with the same Goodput we leave only the shortest cycle in the list.
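The construction of the list described above can be sketched as follows. The function name and the rounding used to decide that two Goodputs are "the same" (the `digits` parameter) are assumptions for illustration.

```python
def build_cycle_list(samples, digits=3):
    """Build the sorted list of (cycle_length, goodput) pairs.

    samples: iterable of (cycle_length, goodput), one pair per number N
        of TCP Data segments.
    digits: Goodputs equal after rounding to this many decimals are
        treated as the same Goodput (hypothetical tolerance).
    For each distinct Goodput only the shortest cycle is kept, and the
    result is ordered by increasing cycle length.
    """
    shortest = {}
    for length, goodput in samples:
        key = round(goodput, digits)
        if key not in shortest or length < shortest[key][0]:
            shortest[key] = (length, goodput)
    return sorted(shortest.values())


samples = [(2.0, 100.0), (1.0, 50.0), (3.0, 100.0)]
print(build_cycle_list(samples))  # the 3.0 cycle is dropped
```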
Consider now a cycle length of L ms with a Goodput G. In scheduling strategies 1 and 2 (the SU ones) when the AP is communicating with S stations in Round Robin, a station receives TCP Data segments in every S-th cycle. Thus, a station receives a service for L ms with a Goodput G, and then waits (S − 1) · L ms before receiving TCP Data segments again. In total the system provides a Goodput G for all stations during an interval of S · L ms.
In scheduling strategy 3 (the MU one) where S stations transmit in a TXOP, every station has a Goodput G/S during an interval of L ms. Overall the system provides a Goodput G to all the stations during every interval of L ms.
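The relation between cycle length, number of stations and the interval over which all stations are served can be sketched as follows. The function name is hypothetical; the per-station average for the SU case is derived from the description above (G · L bits delivered to a station once every S · L ms).

```python
def service_interval(cycle_ms, goodput, stations, multi_user):
    """Return (interval_ms, per_station_goodput) for one strategy.

    cycle_ms: cycle length L in ms.
    goodput: Goodput G of a cycle.
    stations: number of stations S.
    multi_user: True for the MU strategy, False for the SU strategies.
    In SU mode all stations are served once every S * L ms; in MU mode
    all stations are served in every cycle of L ms. In both cases the
    average per-station Goodput is G / S.
    """
    interval_ms = cycle_ms if multi_user else stations * cycle_ms
    return interval_ms, goodput / stations


print(service_interval(2.0, 800.0, 4, multi_user=False))
print(service_interval(2.0, 800.0, 4, multi_user=True))
```

For the same L and G, the MU strategy serves all stations S times more often, which is the sense in which shorter intervals are considered more efficient.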

Goodput results
In Figure 7
Therefore, the MU strategy outperforms the SU strategies, and among the SU strategies RD outperforms the contention based one. We can therefore conclude that the MU uses the channel more efficiently in this case, and enables a better performance for TCP than SU.
The same result also holds for 8 stations, Figure 7(C). In the case of 16 stations the MU strategy almost achieves the maximum Goodput. The RD strategy achieves the maximum Goodput, although with much larger delays. The MU strategy has small PHY rates that do not enable transmission of many TCP Acks due to the limit on transmission time of the HE UL MU A-MPDU frame containing the TCP Acks. As a consequence the number of TCP Data segments that can be transmitted in a TXOP is relatively small. Therefore, it is not possible to transmit as many TCP Data segments in a TXOP as in the SU strategies, and the resulting Goodput is smaller.
Notice that the above phenomenon is also observed in the case of 32 stations, Figure 7(E).
In Figure 7(F), the case for 64 stations, the very small PHY rates in MU cause the SU modes to outperform MU significantly.
In Figure 8 we show the same results as in Figure 7
In Figure 9 we show results for the various TCP Data segments' sizes, 208, 464 and 1460 bytes, for MCS11, for the cases of 4, 8 and 16 stations in Figures 8 (A), (B) and (C) respectively. We also show results with and without Delayed Acks. Since the number of TCP Acks that can be transmitted in a cycle does not change, one can expect that as the length of the TCP Data segments decreases, the length of the respective cycles also decreases. This is also true for the respective Goodputs, since the overhead of transmitting TCP Ack segments remains unchanged.
We see these expected results in Figure 9. Notice that for all cases the curves end at the longest cycles possible and these lengths decrease as the TCP Data segment lengths decrease.
We can also see that while for TCP Data segments of 1460 bytes the use of Delayed Acks results only in marginal Goodput improvement, shorter TCP Data segments of 464 and 208 bytes show significant improvement, in the order of 15-20%. With short TCP Data segments one can add many such segments without increasing the number of MPDUs and A-MPDUs significantly, while greatly increasing the number of TCP Data bytes transmitted. Therefore, the ratio of the increase in TCP data to the increase in the A-MPDU/MPDU overheads is much better than in the case of long TCP Data segments, and the increase in the Goodput is more significant.

Summary
In this paper we have introduced three scheduling strategies for the transmission of TCP Data over the DL of an IEEE 802.11ax system, where the AP is the TCP Data transmitter and the stations are the receivers. Two of the strategies are SU and one is MU.
We measured the Goodput of the system as a function of the time it takes the system to provide this Goodput.
We found that for up to 8 stations the MU strategy outperforms those of SU, i.e. the maximum Goodput is achieved in the MU strategy in much shorter time intervals than in the
Goodput of the SU strategies, but does so in much shorter time intervals. The SU strategies achieve a slightly larger Goodput but with much longer time intervals. Therefore, in these cases it is not clear which is the best strategy. For the case of 64 stations the SU strategies are much better than the MU because the latter has very small PHY rate channels.
Finally, we found that using Delayed Acks has only marginal influence on the Goodput when transmitting long TCP Data segments. The Delayed Acks feature results in a significant improvement in the achieved Goodput, in the order of 15-20%, when the TCP Data segments are short.

Appendix
Let $X_{lower}$ be a lower limit on the number of A-MPDUs needed to transmit the N TCP Data MSDUs. One possibility for scheduling the N TCP Data MSDUs in these $X_{lower}$ A-MPDUs is by defining ($X_{lower}$ − 1) Full A-MPDUs and possibly one Partial A-MPDU.
Notice that by using $X_{upper}$ A-MPDUs one uses the smallest amount of overhead caused by MPDUs and the largest amount of overhead caused by A-MPDUs. The MPDU's overhead is the MAC Header, MPDU Delimiter and the FCS fields. The A-MPDU overhead for scheduling strategies 1 and 2 is $Pr(6(A)) + Pr(6(B)) + T(BAck) + 2 \cdot SIFS$. For scheduling strategy 3 the preambles are $Pr(6(C))$ and $Pr(6(D))$.
By using $X_{lower}$ A-MPDUs one uses the largest amount of overhead caused by MPDUs and the smallest amount of overhead caused by A-MPDUs. To find the maximum Goodput when transmitting N TCP Data MPDUs, one needs to review all numbers of A-MPDUs X, $X_{lower} \le X \le X_{upper}$, and determine the minimum MPDUs' overhead when using X A-MPDUs. We then need to find the minimum sum of overheads of both A-MPDUs and MPDUs for all Xs in the range.
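The search over the number of A-MPDUs described above can be sketched as an exhaustive scan. This is a minimal sketch under stated assumptions: the function name is hypothetical, both overheads are assumed to be expressed in the same unit (for example transmission time), and the minimum MPDU overhead for a given X is supplied by the caller as a function rather than derived here.

```python
def best_ampdu_count(x_lower, x_upper, ampdu_overhead, min_mpdu_overhead):
    """Find the number of A-MPDUs X that minimizes the total overhead.

    x_lower, x_upper: inclusive range of candidate numbers of A-MPDUs.
    ampdu_overhead: overhead contributed by each A-MPDU.
    min_mpdu_overhead: callable returning the minimum total MPDU overhead
        when the N TCP Data MSDUs are scheduled into x A-MPDUs.
    Returns (best_x, best_total_overhead); ties go to the smaller X.
    """
    best_x, best_total = None, None
    for x in range(x_lower, x_upper + 1):
        total = x * ampdu_overhead + min_mpdu_overhead(x)
        if best_total is None or total < best_total:
            best_x, best_total = x, total
    return best_x, best_total


# Toy numbers, chosen only to show the trade-off between the two overheads.
toy_mpdu_overhead = {1: 50.0, 2: 30.0, 3: 28.0}
print(best_ampdu_count(1, 3, 10.0, toy_mpdu_overhead.__getitem__))
```

An exhaustive scan is used because the range between the two limits is small; no claim is made that this is how the original computation was implemented.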
In the following we show the scheduling that results in the smallest amount of MPDUs' overhead given N TCP Data MSDUs and X A-MPDUs. For this purpose we now define the following scheduling α of N TCP Data MSDUs into X A-MPDUs, $X_{lower} \le X \le X_{upper}$.
Assume on the contrary that there is another scheduling β in which the N TCP Data MSDUs are scheduled in X A-MPDUs within fewer than P MPDUs.
Notice that if an A-MPDU in scheduling β contains two or more Partial MPDUs then it is possible to re-arrange the scheduling of the MSDUs within the A-MPDU such that the number of MPDUs is not changed and that all the MPDUs in the A-MPDU are Full MPDUs except possibly one Partial MPDU.
We re-arrange all the A-MPDUs in scheduling β as described above. We go then through