Rate limiting

In computer networks, rate limiting is used to control the rate of requests sent or received by a network interface controller. It can be used to prevent DoS attacks [1] and limit web scraping. [2]

Research indicates that flooding rates from a single zombie machine can exceed 20 HTTP GET requests per second, [3] while legitimate request rates are much lower.

Hardware appliances

Hardware appliances can limit the rate of requests on layer 4 or 5 of the OSI model.

Rate limiting can be induced by the network protocol stack of the sender due to a received ECN-marked packet and also by the network scheduler of any router along the way.

While a hardware appliance can limit the rate for a given range of IP addresses on layer 4, it risks blocking a network with many users who are masked by NAT behind a single IP address of an ISP.

Deep packet inspection can be used to filter on the session layer, but this effectively disarms encryption protocols such as TLS and SSL between the appliance and the protocol server (e.g. a web server).

Protocol servers

Protocol servers using a request/response model, such as FTP servers and, typically, web servers, may use a central in-memory key-value database such as Redis or Aerospike for session management. A rate limiting algorithm then checks whether the user session (or IP address) has to be limited based on the information in the session cache.
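As an illustrative sketch only (not part of the original text), such a check can be implemented as a fixed-window counter in Redis; the key scheme, limit, and window below are hypothetical, and the redis-py client talking to a local Redis instance is assumed:

```python
import redis  # assumes the redis-py client library is installed

r = redis.Redis()          # assumes a Redis instance on localhost:6379

LIMIT = 100                # hypothetical: allowed requests per window
WINDOW_SECONDS = 60        # hypothetical window length

def is_rate_limited(client_id: str) -> bool:
    """Fixed-window counter keyed by session ID or IP address."""
    key = f"ratelimit:{client_id}"
    count = r.incr(key)            # atomic increment of the per-client counter
    if count == 1:
        # First request in this window: start the window's expiry timer.
        r.expire(key, WINDOW_SECONDS)
    return count > LIMIT
```

Production systems often prefer sliding windows or token buckets over a fixed window, since a fixed window allows bursts of up to twice the limit across a window boundary.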

If a client has made too many requests within a given time frame, HTTP servers can respond with status code 429 Too Many Requests.
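A minimal, self-contained sketch of this behaviour, using Python's standard http.server and a simple in-memory fixed-window counter (the limit and window values are hypothetical):

```python
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

LIMIT = 5          # hypothetical: requests allowed per window
WINDOW = 60.0      # window length in seconds
counters = defaultdict(lambda: (0, 0.0))   # client IP -> (count, window start)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        client = self.client_address[0]
        count, start = counters[client]
        now = time.time()
        if now - start > WINDOW:       # window expired: start a new one
            count, start = 0, now
        count += 1
        counters[client] = (count, start)
        if count > LIMIT:
            # Too many requests: tell the client when it may retry.
            self.send_response(429)
            self.send_header("Retry-After", str(int(start + WINDOW - now) + 1))
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"OK\n")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```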

However, in some cases (e.g. web servers) session management and the rate limiting algorithm should be built into the application that generates the dynamic content and runs on the web server, rather than into the web server itself.
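One possible shape for such application-level limiting is a decorator around the handlers that generate dynamic content. The sliding-window log below is an illustrative sketch; the rate_limited and render_dashboard names are hypothetical:

```python
import time
from functools import wraps

def rate_limited(limit: int, window: float):
    """Allow at most `limit` calls per `window` seconds per client_id,
    using a sliding-window log kept in application memory."""
    history = {}   # client_id -> timestamps of recent calls

    def decorator(handler):
        @wraps(handler)
        def wrapper(client_id, *args, **kwargs):
            now = time.monotonic()
            recent = [t for t in history.get(client_id, []) if now - t < window]
            if len(recent) >= limit:
                # In a real web application this would become a 429 response.
                raise RuntimeError("rate limit exceeded")
            recent.append(now)
            history[client_id] = recent
            return handler(client_id, *args, **kwargs)
        return wrapper
    return decorator

@rate_limited(limit=100, window=60.0)
def render_dashboard(client_id):
    return f"dynamic content for {client_id}"
```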

When a protocol server or a network device notices that the configured request limit has been reached, it offloads new requests and does not respond to them. Sometimes these requests are added to a queue to be processed once the input rate falls back to an acceptable level, but at peak times the request rate can exceed even the capacity of such queues and requests have to be thrown away.
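The queue-then-drop behaviour can be sketched as a bounded queue drained at a fixed rate; the capacity, drain rate, and helper names below are hypothetical:

```python
import queue
import threading
import time

backlog = queue.Queue(maxsize=1000)   # hypothetical queue capacity

def accept(request) -> bool:
    """Called for requests arriving above the configured rate."""
    try:
        backlog.put_nowait(request)   # hold the request for later processing
        return True
    except queue.Full:
        return False                  # peak load: the request is thrown away

def drain(process, rate_per_second: float):
    """Process queued requests at an acceptable, fixed rate."""
    while True:
        process(backlog.get())
        time.sleep(1.0 / rate_per_second)

threading.Thread(target=drain, args=(print, 50.0), daemon=True).start()
```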

Data centers

Data centers widely use rate limiting to control the share of resources given to different tenants and applications according to their service level agreement. [4] A variety of rate limiting techniques are applied in data centers using software and hardware. Virtualized data centers may also apply rate limiting at the hypervisor layer. Two important performance metrics of rate limiters in data centers are resource footprint (memory and CPU usage), which determines scalability, and precision. There is usually a trade-off: higher precision can be achieved by dedicating more resources to the rate limiters. A considerable body of research focuses on improving the performance of rate limiting in data centers. [4]

Related Research Articles

Asynchronous Transfer Mode (ATM) is a telecommunications standard defined by the American National Standards Institute and ITU-T for digital transmission of multiple types of traffic. ATM was developed to meet the needs of the Broadband Integrated Services Digital Network as defined in the late 1980s, and designed to integrate telecommunication networks. It can handle both traditional high-throughput data traffic and real-time, low-latency content such as telephony (voice) and video. ATM provides functionality that uses features of circuit switching and packet switching networks by using asynchronous time-division multiplexing. ATM was seen in the 1990s as a competitor to Ethernet and networks carrying IP traffic as it was faster and was designed with quality-of-service in mind, but it fell out of favor once Ethernet reached speeds of 1 gigabit per second.

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network, or a cloud computing service, particularly the performance seen by the users of the network. To quantitatively measure quality of service, several related aspects of the network service are often considered, such as packet loss, bit rate, throughput, transmission delay, availability, jitter, etc.

The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.

In computing, a denial-of-service attack is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host connected to a network. Denial of service is typically accomplished by flooding the targeted machine or resource with superfluous requests in an attempt to overload systems and prevent some or all legitimate requests from being fulfilled. The range of attacks varies widely, spanning from inundating a server with millions of requests to slow its performance, overwhelming a server with a substantial amount of invalid data, to submitting requests with an illegitimate IP address.

A multilayer switch (MLS) is a computer networking device that switches on OSI layer 2 like an ordinary network switch and provides extra functions on higher OSI layers. The MLS was invented by engineers at Digital Equipment Corporation.

Traffic shaping is a bandwidth management technique used on computer networks which delays some or all datagrams to bring them into compliance with a desired traffic profile. Traffic shaping is used to optimize or guarantee performance, improve latency, or increase usable bandwidth for some kinds of packets by delaying other kinds. It is often confused with traffic policing, the distinct but related practice of packet dropping and packet marking.

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.

In computer networking, Layer 2 Tunneling Protocol (L2TP) is a tunneling protocol used to support virtual private networks (VPNs) or as part of the delivery of services by ISPs. It uses encryption ('hiding') only for its own control messages, and does not provide any encryption or confidentiality of content by itself. Rather, it provides a tunnel for Layer 2, and the tunnel itself may be passed over a Layer 3 encryption protocol such as IPsec.

Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput.

The leaky bucket is an algorithm based on an analogy of how a bucket with a constant leak will overflow if either the average rate at which water is poured in exceeds the rate at which the bucket leaks or if more water than the capacity of the bucket is poured in all at once. It can be used to determine whether some sequence of discrete events conforms to defined limits on their average and peak rates or frequencies, e.g. to limit the actions associated with these events to these rates or delay them until they do conform to the rates. It may also be used to check conformance or limit to an average rate alone, i.e. remove any variation from the average.
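As an illustration (not part of the article text), the leaky bucket as a meter can be sketched as a level that drains at a fixed rate and rejects events that would overflow it:

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: `capacity` bounds the burst size,
    `leak_rate` (units per second) bounds the long-term average rate."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = time.monotonic()

    def conforms(self, amount: float = 1.0) -> bool:
        now = time.monotonic()
        # Drain the bucket for the time that has passed since the last event.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + amount > self.capacity:
            return False              # non-conforming: the bucket would overflow
        self.level += amount          # conforming: pour the event into the bucket
        return True
```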

The token bucket is an algorithm used in packet-switched and telecommunications networks. It can be used to check that data transmissions, in the form of packets, conform to defined limits on bandwidth and burstiness. It can also be used as a scheduling algorithm to determine the timing of transmissions that will comply with the limits set for the bandwidth and burstiness: see network scheduler.
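Conversely, a token bucket can be sketched as a token count that refills at the permitted rate up to a burst size, with each transmission spending tokens; this is an illustrative sketch, not a reference implementation:

```python
import time

class TokenBucket:
    """Tokens accrue at `rate` per second up to `burst`; each packet
    must withdraw `cost` tokens to conform."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost       # enough tokens: the transmission conforms
            return True
        return False                  # insufficient tokens: delay or drop
```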

NetFlow is a feature that was introduced on Cisco routers around 1996 that provides the ability to collect IP network traffic as it enters or exits an interface. By analyzing the data provided by NetFlow, a network administrator can determine things such as the source and destination of traffic, class of service, and the causes of congestion. A typical flow monitoring setup consists of three main components: a flow exporter, a flow collector, and an analysis application.

Transmission Control Protocol (TCP) uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and congestion window (CWND), to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

TCP tuning techniques adjust the network congestion avoidance parameters of Transmission Control Protocol (TCP) connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases. However, blindly following instructions without understanding their real consequences can hurt performance as well.

A computer network is a set of computers sharing resources located on or provided by network nodes. Computers use common communication protocols over digital interconnections to communicate with each other. These interconnections are made up of telecommunication network technologies based on physically wired, optical, and wireless radio-frequency methods that may be arranged in a variety of network topologies.

Bandwidth management is the process of measuring and controlling the communications on a network link, to avoid filling the link to capacity or overfilling the link, which would result in network congestion and poor performance of the network. Bandwidth is described by bit rate and measured in units of bits per second (bit/s) or bytes per second (B/s).

WAN optimization is a collection of techniques for improving data transfer across wide area networks (WANs). In 2008, the WAN optimization market was estimated to be $1 billion, and was to grow to $4.4 billion by 2014 according to Gartner, a technology research firm. In 2015 Gartner estimated the WAN optimization market to be a $1.1 billion market.

In computing, Microsoft's Windows Vista and Windows Server 2008 introduced in 2007/2008 a new networking stack named Next Generation TCP/IP stack, to improve on the previous stack in several ways. The stack includes native implementation of IPv6, as well as a complete overhaul of IPv4. The new TCP/IP stack uses a new method to store configuration settings that enables more dynamic control and does not require a computer restart after a change in settings. The new stack, implemented as a dual-stack model, depends on a strong host-model and features an infrastructure to enable more modular components that one can dynamically insert and remove.

Bufferbloat is a cause of high latency and jitter in packet-switched networks caused by excess buffering of packets. Bufferbloat can also cause packet delay variation, as well as reduce the overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice over IP (VoIP), audio streaming, online gaming, and even ordinary web browsing.

Deterministic Networking (DetNet) is an effort by the IETF DetNet Working Group to study implementation of deterministic data paths for real-time applications with extremely low data loss rates, packet delay variation (jitter), and bounded latency, such as audio and video streaming, industrial automation, and vehicle control.

References

  1. Richard A. Deal (September 22, 2004). "Cisco Router Firewall Security: DoS Protection". Cisco Press. Retrieved April 16, 2017.
  2. Greenberg, Andy (12 January 2021). "An Absurdly Basic Bug Let Anyone Grab All of Parler's Data". Wired. Archived from the original on 12 January 2021. Retrieved 12 January 2021.
  3. Jinghe Jin; Nazarov Nodir; Chaetae Im; Seung Yeob Nam (7 November 2014). "Mitigating HTTP GET Flooding Attacks through Modified NetFPGA Reference Router". p. 1. Archived from the original on Mar 6, 2023. Retrieved 19 December 2021 via ResearchGate.
  4. Noormohammadpour, M.; Raghavendra, C. S. (May 2018). "Datacenter Traffic Control: Understanding Techniques and Trade-offs". IEEE Communications Surveys & Tutorials. 20 (2): 1. doi:10.1109/COMST.2017.2782753. Archived from the original on Jan 16, 2024 via ResearchGate.
  5. Nikrad Mahdi (April 12, 2017). "An alternative approach to rate limiting". Medium. Retrieved April 16, 2017.