FireStarter: an Encrypted Value Is *Not* a Token!

We’ve been writing a lot on tokenization as we build the content for our next white paper, and in Adrian’s response to the PCI Council’s guidance on tokenization. I want to address something that’s really been ticking me off…

In our latest post in the series we described the details of token generation. One of the options, which we had to include since it’s built into many of the products, is encryption of the original value – then using the encrypted value as the token.

Here’s the thing: If you encrypt the value, it’s encryption, not tokenization! Encryption obfuscates the original data; a token removes it.

Conceptually the major advantages of tokenization are:

The token cannot be reversed back to the original value.
The token maintains the same structure and data type as the original value.

While format preserving encryption can retain the structure and data type, it’s still reversible back to the original if you have the key and algorithm. Yes, you can add per-organization salt, but this is still encryption. I can see some cases where using a hash might make sense, but only if it’s a format preserving hash.
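
To make the distinction concrete, here is a minimal Python sketch; it is not taken from any product. `toy_encrypt` is a deliberately simplistic and insecure stand-in for format preserving encryption (it is not FFX or any real FPE mode), and `TokenVault` is a hypothetical in-memory lookup table. All names are assumptions made for illustration. What it tries to show: the encrypted value is a function of the original and the key, so anyone holding the key can reverse it, while the random token has no mathematical relationship to the original and can only be resolved by asking the vault.

```python
import hashlib
import hmac
import secrets

DIGITS = "0123456789"


def keystream_digits(key: bytes, n: int) -> list:
    """Derive n digits from the key. Toy construction for illustration, NOT real FPE."""
    digest = hmac.new(key, b"toy-fpe-demo", hashlib.sha256).digest()
    if n > len(digest):
        raise ValueError("input too long for this toy keystream")
    return [b % 10 for b in digest[:n]]


def toy_encrypt(pan: str, key: bytes) -> str:
    """Reversible, format preserving transformation: same length, digits only."""
    ks = keystream_digits(key, len(pan))
    return "".join(str((int(d) + k) % 10) for d, k in zip(pan, ks))


def toy_decrypt(ciphertext: str, key: bytes) -> str:
    """Anyone who holds the key and knows the algorithm recovers the original."""
    ks = keystream_digits(key, len(ciphertext))
    return "".join(str((int(d) - k) % 10) for d, k in zip(ciphertext, ks))


class TokenVault:
    """Hypothetical token server: tokens are random and resolved by lookup only."""

    def __init__(self):
        self._token_to_pan = {}
        self._pan_to_token = {}

    def tokenize(self, pan: str) -> str:
        # Return the existing token so one PAN always maps to one token.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            # Same length and data type as the original, but purely random.
            token = "".join(secrets.choice(DIGITS) for _ in pan)
            if token not in self._token_to_pan and token != pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # The only way back to the PAN is through the vault itself.
        return self._token_to_pan[token]
```

With the toy cipher, `toy_decrypt(toy_encrypt(pan, key), key)` returns the original wherever the key is available. With the vault, a stolen token is just a random number unless the attacker can also reach `detokenize`.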

I worry that marketing is deliberately muddling the terms.

Opinions? Otherwise, I declare here and now that if you are using an encrypted value and calling it a ‘token’, that is not tokenization.

19 Comments

Marc Massar 2011-01-27

Wow - I'm almost sad I missed this thread when it was raging. I think the long and short of it is that the reason people are in such disagreement over what a token is or which is better (encryption vs. tokenization) is that tokenization is not a technology that you can easily put your finger on. It's a use case of data transformation...not a specific technology or method. High value data goes into black box, out comes de-valued data. And the opposite: de-valued data goes into black box, out comes high value data. Because we require reversibility in tokenization, the attacks against a token are all going to focus on that function. Using a cipher as a generation method means you have to focus your attacks against the decryption process. Using a lookup table to associate de-valued data to high value data means that the attacks must focus on access to the lookup function. The generation method is irrelevant as long as the reversal process can't be performed without going to "the black box." In cryptography the attacks against ciphers are well documented and often enter the realm of millennia (or more) to perpetrate. On the other hand, when was the last time you heard of someone spoofing, or socially engineering, credentials to a database? Or when was the last time someone published an 0-day against Oracle? (Here's one on your site - http://securosis.com/blog/litchfield-discloses-oracle-0-day-at-black-hat/) Compare that to the last time you heard of an equivalent attack against AES...? It also seems to me that using a lookup function can quickly get out of hand if you are a service provider and provide each merchant with its own collection of token=PAN pairings. Again, sorry I missed the first round of fires on this. @Jay and @Mark - your comments are very well stated - I might have to quote you on my next call about this topic :)

Rich 2011-01-27

Mark, You can say the same things about plenty of encryption systems that are compromised on a regular basis, including those in the payment industry. In other words- specious argument. Both the token and encryption servers need to be secure or the system breaks down for both approaches. At least with tokenization you have a single choke point to focus your efforts on. If anything we've seen real world exploitation of key management in payment systems, but not (yet) tokenization. (BTW- please disclose when you work for a vendor involved in what you are commenting on, we don't filter anything, but we do ask that as a courtesy).

Marc Massar 2011-01-27

@Rich - "At least with tokenization you have a single choke point to focus your efforts on." I think focusing your efforts on securing the tokenization/de-tokenization process (regardless of method) is the correct approach and I think everyone can agree on this. Worrying about an attacker's focus on decrypting a ciphertext in the absence of a decryption service or the key isn't warranted in most cases. Sorry about the disclosure - I hadn't seen any of the other vendors (Voltage, RSA, Mercury Payments) in this thread mention their affiliations and I've never been called out on that before on this site. My comments are my own and do not reflect my employer's. As a security professional I am objective to all points of view.

Mark Bower 2010-08-03

Hi Rich, Thanks for responding - I think we're actually on the same page and I certainly welcome your efforts in bringing attention to the different ways Tokenization can take place on a backdrop of best practices. I guess it's just when I see someone claiming they can tokenize by splitting a credit card number into two halves and storing them in two databases in the same site so it's "no longer cardholder data" that the hairs on the back of my neck stand up. There's no "science" there as I'm sure you would agree :) And no, I didn't read that or see it recommended on your site. With regards to the FUD aspect - I respectfully disagree. Tokenization is a powerful technique and I have no qualms with that - but one of the points you've missed in your analysis is the critical aspect of authentication of the tokenization entry point (not just the detokenization). If this isn't carefully managed, then it is possible to perform a dictionary based attack in some cases which can be made easier if the input fields have predictable input patterns (consider a regionally issued credit card from a smaller bank with 1 BIN). Revealing live PANs without access to a detokenizing system is a risk that needs to be avoided. In your recommendation, all that is covered is this "Send new data to be tokenized and retrieve the token." I think this is a shortcoming in your recommendations given the probable use of Tokenization for data that has known input patterns which reduces the effort needed in the dictionary or table attack. Whilst these kinds of attacks may be difficult, they are not impossible. It's also possible to avoid them by design. I suspect this is one of the considerations Visa has looked at too. Keep lighting the fires :) Regards, Mark Bower
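
As a rough illustration of the dictionary attack described in this comment, the Python sketch below assumes a hypothetical weak scheme in which the "token" is an unsalted SHA-256 hash of the PAN, and an attacker who already knows the BIN and the last four digits. `hash_token` and `recover_pan` are made-up names, and no real product is claimed to work this way. With a known prefix and suffix the attacker only has to enumerate the remaining middle digits.

```python
import hashlib
from itertools import product
from typing import Optional


def hash_token(pan: str) -> str:
    """Hypothetical weak scheme: the 'token' is an unsalted SHA-256 of the PAN."""
    return hashlib.sha256(pan.encode("ascii")).hexdigest()


def recover_pan(token: str, prefix: str, suffix: str, length: int = 16) -> Optional[str]:
    """Enumerate every PAN with the known prefix and suffix and compare hashes.

    No access to the token database or a detokenizing system is needed.
    """
    unknown = length - len(prefix) - len(suffix)
    for digits in product("0123456789", repeat=unknown):
        candidate = prefix + "".join(digits) + suffix
        if hash_token(candidate) == token:
            return candidate
    return None
```

For a 16-digit PAN with a 6-digit BIN and a known last four, `recover_pan(token, bin_prefix, last_four)` has at most one million candidates to try. The same enumeration would work against an unauthenticated tokenizer that returns the same token every time it sees the same PAN, since the attacker can submit candidates and compare the results; that is one reason to authenticate the tokenization entry point as well as the detokenization one.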

Rich 2010-08-03

Mark, Reasonable enough. We do have another post in the series (near the end) of deployment guidance and pitfalls to avoid. I think that should take care of your concerns. You're catching us mid-series here :)

Adrian Lane 2010-08-03

Mark, You are correct that we did not discuss - or worse, trivialized - authentication at the token entry point. It's an important point. And one that is difficult to discuss in that it changes for every deployment model. As Rich mentioned we will discuss in the deployment guidance later in the series. I am also ignorant on another point you raise ... a dictionary attack on input patterns. Can you clarify what you mean? Is this possible with random number tokens? Hashed & salted tokens? -Adrian

Rich 2010-08-02

Mark, Sorry for the time delay in approving your comment. We were *all* out of the office for Black Hat or vacations last week, and while I usually still approve them remotely I lost track due to the event. I want to respond to a couple of points... 1. The PCI scope is pretty clear-

Mark Bower 2010-07-30

Adrian, Regarding your statement: "Key here is to remember, PCI DSS is allowing systems that substitute credit card data with tokens to be removed from the audit based upon the premise that PAN data is not available" I'd be interested if you could point to the specific part of PCI DSS today that states that Tokens remove systems from the validation requirements. There's a lot of work going on in this area but nowhere does this get stated in PCI DSS to be clear. Thus, merely claiming one is "using Tokenization" may or may not reduce scope and may or may not increase security: it has to be done right: only a QSA can make that decision when looking at the specifics of an implementation. A lot of claims are made about Tokenization security, and many are not based on science. I would also point out that getting Tokenization right is a lot more involved than merely substituting data and managing a Data Vault. Many of the types of attacks on cryptosystems still apply in slightly different forms to Tokenization systems especially if such systems do not pay very good attention to the token generation process, exactly what you tokenize in the first place, and most importantly how you manage credentials and access to BOTH the tokenizing system and detokenizing system and any images of it that are distributed. The suggestion that Tokenization is "simple" is also a somewhat misleading statement: if you have to manage, sync, distribute and contain a growing database of tokens, keys and other sensitive materials (credentials), monitor it etc, then this starts to become a significant surface to risk manage - especially the entry and exit points and their data paths. Also, how do you manage a re-tokenize event if your token systems somehow have been compromised so the tokens themselves can now be manipulated, injected and abused? 
Assuring that the tokenizing engine has not been tampered with or the sources of entropy used to generate tokens are within specification are all considerations. One should not underestimate the ingenuity of today's sophisticated attackers. An open access tokenizer for example may permit a successful table based attack on a poorly implemented system given knowledge of cardholder data patterns. A badly designed hashing token approach which does not pay attention to security may lead to simple compromise without even attacking the token database. It is refreshing to see VISA's guidance recognize that more rigor is necessary. Perhaps these types of attacks are what VISA indicated in their statement: "Where properly implemented, tokenization may help simplify a merchant's payment card environment," said Eduardo Perez, Head of Global Payment System Security, Visa Inc. "However, we know from working with the industry and from forensics investigations, that there are some common implementation pitfalls that have contributed to data compromises. For example, entities have failed to monitor for malfunctions, anomalies and suspicious activity, allowing an intruder to manipulate the tokenization system undetected. As more merchants look at tokenization solutions, these best practices will provide guidance on how to implement those solutions effectively and highlight areas for particular vigilance," With regard to Jay's comments: "In the real world, we

Mark Bower 2010-07-30

Note to Adrian/Editor: my comments were also following Rich (not Jay). Also - the intro position that this is follow up to the PCI Council is not correct - this subject is on Visa's recommendations. These are independent and separate to the PCI SSC. Probably important to make that clear!

Jay Jacobs 2010-07-21

'tis a beautiful thing to be challenged and learn... thank you. If I could paraphrase what I grok from this: cryptosystems are complicated and come in a wide array of arrangements. Tokenization only has one model (client/server). The one model for tokenization has qualities that make it difficult to screw up from a client perspective, therefore systems dealing in tokens may be excluded from further inspection. That makes sense to me. Got it, I think. Perhaps my hang up is because I know there are functionally equivalent cryptosystems and token solutions. Both can require client authentication, plaintext being sent to a server and de-valued data being returned. To reverse it, both require authentication, the de-valued data with the original plaintext being returned. Both would require a breach at the one centralized location before the reversal process may be compromised/distributed. But, to my first statement here, that is but one of a myriad of possible solutions for crypto and the only method for tokens. So while I personally feel the less-robust cryptosystems are spoiling the more-robust cryptosystems, I think this thread is more that the architecture of tokenization is getting the spotlight.

Jay Jacobs 2010-07-20

I gotta disagree, perhaps I'm missing something, either that or you're confusing the possibility of reversing encryption with the probability of reversing encryption. The statement "The token cannot be reversed back to the original value" could be appended with "unless access is granted to do so". Because some systems want to reverse it for various purposes, the reversal process exists, right? But the argument here is that encryption is somehow different (injecting: from a security perspective) from tokenization. We could easily make the statement that an encrypted value cannot be reversed back to the original value (unless access is granted to do so). Having an encrypted value without access to the key is functionally no different than having a token without access to reverse it. Breaking it down further, clients wishing to de-tokenize are probably set up to authenticate to a tokenization system, much in the same way clients wishing to decrypt must authenticate into a key management system or Hardware Security Module (HSM). Access to the key can/should be controlled just as access to a tokenization solution. Perhaps, Rich, your position is assuming a foundation of poor key management? Because I struggle with the concept that standalone tokens and standalone ciphertext would have different value to an attacker. Both are devalued and both rely on some other breakdown in security to return value to the data. Is this assuming that keys are somehow more vulnerable to compromise than a tokenization solution? Or are you thinking that brute force of keys is a realistic attack?

Adrian Lane 2010-07-20

@Jay - Very well stated. If the argument was just the theoretical perspective I may agree, but the theoretical security of the token is not a realistic assessment here. You do not get to gauge the security of the _system_ based upon how hard it is to crack the standalone token. These things are not equivalent. Key here is to remember, PCI DSS is allowing systems that substitute credit card data with tokens to be removed from the audit based upon the premise that PAN data is not available. But if it is encrypted with the keys it _is_ available! How can you justify removing a system from a PCI audit when you are actually storing the credit card/PAN and keys? Sure, the theoretical algorithm, and the implementation of that algorithm, may be secure. Yes, if key management is done right it will be _really_ hard to break. But the fact is you don't know if key management has been properly performed! Is the encryption system secure or has it been impaired? It's a ridiculous argument to claim that tokens constructed through mathematical functions are unbreakable so we don't need to audit tokenized systems. Any merchant that wants to gain access to the encrypted content will need to have the keys accessible. That means the entire crypto-system is present and it all needs to be reviewed. If you include systems with tokenized data in the scope of the PCI audit you have nullified the cost savings benefit that makes tokenization so attractive. Thanks again for a great comment. -Adrian

Jay Jacobs 2010-07-20

@Adrian - I must be missing the point, my apologies, perhaps I'm just approaching this from too much of a cryptonerd perspective. Though, I'd like to think I'm not being overly theoretical. To extend your example, any merchant that wants to gain access to the de-tokenized content, we will need to make a de-tokenization interface available to them. They will have the ability to get at the credit card/PAN of every token they have. From the crypto side, if releasing keys to merchants is unacceptable, require that merchants return ciphertext to be decrypted so the key is not shared... What's the difference between those two? Let's say my cryptosystem leverages a networked HSM. Clients connect and authenticate, send in an account number and get back ciphertext. In order to reverse that operation, a client would have to connect and authenticate, send in ciphertext and receive back an account number. Is it not safe to assume that the ciphertext can be passed around safely? Why should systems that only deal in that ciphertext be in scope for PCI when an equivalent token is considered out of scope? Conversely, how do clients authenticate into a tokenization system? Because the security of the tokens (from an attacker's perspective) is basically shifted to that authentication method. What if it's a password stored next to the tokens? What if it's mutual SSL authentication using asymmetric keys? Are we just back to needing good key management and access control? My whole point is that, from my view point, I think encrypting data is getting a bad rap when the problem is poorly implemented security controls. I don't see any reason to believe that we can't have poorly implemented tokenization systems. If we can't control access into a cryptosystem, I don't see why we'd do any better controlling access to a token system. 
With PCI DSS saying tokenization is "better", my guess is we'll see a whole bunch of mediocre token systems that will eventually lead us to realize that hey, we can build just as craptastic tokenization systems as we have cryptosystems.
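
The point about security shifting to the authentication method can be sketched in Python as follows. This is a hypothetical, in-memory `TokenService` invented for illustration only: the class, the permission names and the API-key scheme are all assumptions, and a real deployment would more likely use mutual TLS or an HSM-backed credential store. It shows a token server where both the tokenize and detokenize entry points require an authenticated client, and where only some clients are allowed to reverse a token.

```python
import hashlib
import hmac
import secrets


class AuthError(Exception):
    """Raised when a client is unknown, presents a bad key, or lacks permission."""


class TokenService:
    """Hypothetical token server with authenticated entry and exit points."""

    def __init__(self):
        self._clients = {}  # client_id -> (SHA-256 of API key, set of permissions)
        self._vault = {}    # token -> PAN

    def register(self, client_id: str, permissions) -> str:
        # Issue a random API key; only its hash is kept on the server side.
        api_key = secrets.token_hex(16)
        digest = hashlib.sha256(api_key.encode()).digest()
        self._clients[client_id] = (digest, set(permissions))
        return api_key

    def _check(self, client_id: str, api_key: str, permission: str) -> None:
        record = self._clients.get(client_id)
        if record is None:
            raise AuthError("unknown client")
        digest, permissions = record
        supplied = hashlib.sha256(api_key.encode()).digest()
        # Constant-time comparison avoids leaking key material through timing.
        if not hmac.compare_digest(digest, supplied):
            raise AuthError("bad credentials")
        if permission not in permissions:
            raise AuthError("client may not " + permission)

    def tokenize(self, client_id: str, api_key: str, pan: str) -> str:
        self._check(client_id, api_key, "tokenize")
        while True:
            token = "".join(secrets.choice("0123456789") for _ in pan)
            if token not in self._vault and token != pan:
                break
        self._vault[token] = pan
        return token

    def detokenize(self, client_id: str, api_key: str, token: str) -> str:
        self._check(client_id, api_key, "detokenize")
        return self._vault[token]
```

A point-of-sale client registered with only the `tokenize` permission can hand tokens around freely but gets an `AuthError` from `detokenize`. Swap the vault lookup for an HSM decrypt call and the access-control code would look much the same, which is essentially the equivalence being argued in this comment.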

Adrian Lane 2010-07-20

@Jay - I understand your viewpoint, but we are arguing two different points. I am not questioning the effectiveness of cryptography. I am saying any crypto system must be verified, and if I have the option to avoid that responsibility altogether, that's preferable. As concise as possible to respond to some of your points ... >If the merchant wants to access the data, why use tokens at all? Why not just encrypt the database fields? That's what they are doing today. Why would anyone substitute one crypto system for another if the first one was not broken and the two are functionally equivalent? > The tokens (encrypted & random number variants) are equivalent ... up until the point you introduce decryption apparatus and provide a means to retrieve PAN data. They may _still_ be equivalent, but only once the security of the crypto system is verified. > The point is to _not_ perform an audit. Cryptosystem implementation and deployments can be botched. Removing the need to audit is where time and money are saved. > "Why should systems that only deal in that cipher-text be in scope for PCI when an equivalent token is considered out of scope?" Because there is a chance FPE can be hacked, however slim, and a random number can't be. > The hope is that the security problem devolves to access control of the token server. And the hope is there is only one token server to worry about so you focus your time, money and resources at that point. Further -- if you are a merchant -- you are better off from a security perspective if the token server is not housed within your environment at all. That way any attack and possible compromise is entirely outside of the merchant environment. > There are some craptastic token servers out there ... using reversible hashes, storing keys inside the token repository, using fungible encryption methods, etc. Once again, I see your argument, and I don't really disagree with it. The potential of cryptography is not in question. 
But as there is a cheaper and more secure option available, the use case for encryption _is_ in question. -Adrian

Rich 2010-07-20

@Jay, I have a different perspective here. In the real world, we've seen a number of attacks where the attacker is able to compromise the key (memory parsing attacks, for example), or capture the data when it is unencrypted at some point in the application chain. Tokenization nearly wipes-out that concern. The attacker either needs to hit the tokenization server or any back-end systems that use the real value, which is typically a *far* smaller set of systems than what we see in an encryption implementation. Thus tokenization materially reduces the attack surface, which is why it also reduces audit scope. That isn't true of encryption, since anywhere the value is present is still part of the attack surface. With tokenization, that's reduced to only the locations where the original value is stored. Make sense?

Drew Dillon 2010-07-19

The point being that FPE isn't a standard. What you buy from company A, can't be decrypted by company B. NIST is considering AES FFX mode and, from what I've heard, considering it seriously (though I've heard they were unaware of the CC/SSN use case). But until that happens, non-standard crypto should terrify people looking to encrypt an entire line of business. Layer onto that that company A and B are startups that might get sold any day to another company that may or may not care about the install base of that particular product. That's a big risk. So companies that offer FPE have to tie it back to something that's recognizable, reversible, and reasonably standard. They have to borrow the "reducing audit scope" message of tokenization or what are you doing it for?

Lucas Zaichkowsky 2010-07-19

No matter how the "tokenization" system is implemented, at a basic level somewhere there will be a ciphertext and there will be a key used for decryption. If the service provider houses ciphertext and key, an attacker who gets into their systems can hit the jackpot by getting all the stored data. If the key and data are split, a merchant or service provider hacked individually does not put that stored data at risk. In that type of implementation, you can call the returned data "ciphertext", "keying material" or other technically correct names depending on how the split is performed. Unfortunately, most merchants, VARs, and ISVs are confused and don't understand applied cryptography. I would be willing to bet they are more likely to buy "tokenization" since that's the name that appears in card brand and PCI SSC literature. The PCI SSC has already published an FAQ entry stating that if it can be shown that encrypted card data cannot be decrypted, it's not considered cardholder data. That kind of a judgment makes logical sense. At the end of the day, the goal is to decrease the attack surface area of payment card data which in turn helps reduce fraud. In an implementation that returns ciphertext or keying material to the merchant, understanding the marketing needs, and knowing the intent of PCI, what would you recommend a vendor do?

Lucas Zaichkowsky 2010-07-19

Tadd, What I'm bringing to light is the discussion of how an entire tokenization system should be applied and appropriately labeled. A token by itself is completely useless. For transactions to occur, a real card number has to be passed to the card brand networks. The first sentence of your second paragraph is where the issue lies. There must be a point where the token turns into a credit card number. That is where the tokenization system becomes vulnerable to attack. That is why some systems send ciphertext back to the client to store as a way of splitting the data from the key. Another topic entirely is that there are tokenization systems where the client sends the token to a server and receives a card number in response. Defeats the whole purpose. :/

Branden Williams 2010-07-19

Love it. Even more that we're on the same page and didn't even discuss this. I posted last Thursday as well!