If I use PKCS7 padding in RijndaelManaged with 16 bytes of data then I get 32 bytes of data output. It appears that for PKCS7 when the data size matches the block size it adds a whole extra block of data.
If I use Zeros padding for 16 bytes of data I get out 16 bytes of data. So for Zeros padding if the data matches the block size then it doesn't pad.
I have searched through the documentation and it says nothing about this difference in padding behavior.
Can someone please point me to documentation that specifies what the padding behavior should be for each padding mode when the data size matches the block size?
I came across this article which offers an explanation that seems to jibe with some other articles I found during my searching. Here's the basic reason:
You may be wondering what happens if our data length is a perfect
multiple of the block size. In this scenario, PaddingMode.None and
PaddingMode.Zeros add no padding. However, in the case of
PaddingMode.PKCS7, padding must be added because the cipher must be
able to reverse even a no-padding situation. In this case, an
additional block must be added to the plain text and the value of each
byte set to the block size in bytes.
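The rule in the quoted passage can be modeled in a few lines. This is a minimal sketch in plain Python, not the .NET implementation; `pkcs7_pad`, `pkcs7_unpad`, and `zero_pad` are hypothetical helper names chosen for illustration, and a 16-byte block is assumed.

```python
BLOCK_SIZE = 16  # assumed block size in bytes (the AES block size)

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Always appends between 1 and block_size bytes, each equal to the pad length.
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # The last byte says how many bytes to strip, so removal is unambiguous.
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-pad_len]

def zero_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Appends zero bytes only when the length is not already a multiple of the block.
    return data + b"\x00" * (-len(data) % block_size)

full_block = b"A" * 16
print(len(pkcs7_pad(full_block)))  # 32
print(len(zero_pad(full_block)))   # 16

# Zero padding cannot be reversed reliably: two different inputs pad to the same bytes.
print(zero_pad(b"abc") == zero_pad(b"abc\x00"))  # True
```

The last line shows why PKCS7 cannot skip the extra block: without it, a plaintext that happens to end in bytes that look like padding could not be told apart from a padded one.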
.NET's implementation of Convert.FromBase64String() can be fed a particular kind of invalid input and will produce output without throwing a FormatException. Most invalid input is caught correctly, but this kind is not. In this case FromBase64String ignores the last 4 bits of the input string, which means multiple inputs can decode to the same byte array.
The issue is demonstrated in this code:
var validBase64 = Convert.FromBase64String("TQ==");//assigned length 1 byte array with single byte 0x4d
var invalidBase64 = Convert.FromBase64String("TR==");//also assigned the same
var validConvertedBack = Convert.ToBase64String(validBase64);//assigned TQ==
var invalidConvertedBack = Convert.ToBase64String(invalidBase64);//also assigned TQ==
This occurs for the reason explained in this image:
The first 2 bits of 'Q' (0,1) are used but the next 4 are ignored. Any character whose 6-bit value starts with 0,1 will therefore decode to the same byte array, such as 'R' (0,1,0,0,0,1).
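The bit layout described above can be checked by hand. The following is a small sketch in Python rather than .NET; `sextet_bits` and `decode_single_byte` are hypothetical helpers that model a lenient decoder for the one-byte case only, and are not taken from any library.

```python
# Standard Base64 alphabet: each character stands for a 6-bit value (its index).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def sextet_bits(ch: str) -> str:
    # 6-bit binary string for one Base64 character.
    return format(ALPHABET.index(ch), "06b")

def decode_single_byte(text: str):
    # Handles only the "XX==" form: 6 bits from the first character,
    # 2 bits from the second, and 4 leftover bits that are discarded.
    first, second = sextet_bits(text[0]), sextet_bits(text[1])
    value = int(first + second[:2], 2)
    return value, second[2:]

for text in ("TQ==", "TR=="):
    value, ignored = decode_single_byte(text)
    print(text, sextet_bits(text[1]), hex(value), "ignored bits:", ignored)
# TQ== 010000 0x4d ignored bits: 0000
# TR== 010001 0x4d ignored bits: 0001
```

Only the four ignored bits differ between the two inputs, which is why both produce the single byte 0x4d.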
The Base64 encoding RFC does not define decoding, only encoding, so .NET is not in violation of that spec. However, the spec does warn that a similar issue is dangerous:
> The padding step in base 64 and base 32 encoding can, if improperly
implemented, lead to non-significant alterations of the encoded data.
For example, if the input is only one octet for a base 64 encoding,
then all six bits of the first symbol are used, but only the first
two bits of the next symbol are used. These pad bits MUST be set to
zero by conforming encoders, which is described in the descriptions
on padding below. If this property do not hold, there is no
canonical representation of base-encoded data, and multiple base-
encoded strings can be decoded to the same binary data. If this
property (and others discussed in this document) holds, a canonical
encoding is guaranteed.
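One way to enforce the canonical representation the RFC text describes is to decode and then re-encode, accepting the input only if the round trip reproduces it exactly. The sketch below uses Python's standard `base64` module as a stand-in for illustration; `is_canonical_base64` is a hypothetical helper name, and the same round-trip idea could be applied with Convert.FromBase64String and Convert.ToBase64String.

```python
import base64

def is_canonical_base64(text: str) -> bool:
    # Reject anything the decoder refuses outright.
    try:
        decoded = base64.b64decode(text, validate=True)
    except ValueError:
        return False
    # A conforming encoder sets the pad bits to zero, so re-encoding a
    # non-canonical input yields a different string.
    return base64.b64encode(decoded).decode("ascii") == text

print(is_canonical_base64("TQ=="))  # True
print(is_canonical_base64("TR=="))  # False
```

The check does not depend on whether the underlying decoder is lenient or strict about pad bits: a non-canonical string fails either at decode time or at the comparison.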
Is Microsoft aware of this functionality? If not, could this be a security issue?
I am using the Bouncy Castle library for encryption and decryption in C#, and I was wondering how to deal with plain text smaller than the block size.
here is what I have done so far:
AesFastEngine engine = new AesFastEngine();
GcmBlockCipher cipher = new GcmBlockCipher(engine);
AeadParameters parameters = new AeadParameters(new KeyParameter(key), 128, iv, null);
cipher.Init(true, parameters);
byte[] encData = new byte[plain.Length];
cipher.ProcessBytes(plain, 0, plain.Length, encData, 0);
When the plain data is smaller than the block size, it does nothing.
Unfortunately the Bouncy Castle and Oracle / Java implementations are not online. That is, the online properties of the underlying CTR mode encryption aren't kept. Online in this context means that bytes are encrypted/decrypted directly when they arrive. This may have to do with how the encryption is handled and how the authentication tags are handled.
AES-CTR can be implemented in multiple ways. You can either first encrypt the counter and then directly XOR with plaintext/ciphertext when it arrives. You can also first buffer the plaintext and then, once you have a full block, create the counter, encrypt it and then XOR a full block of plaintext. This has advantages in the sense that it more closely resembles other modes of operation such as CBC. Furthermore, you may not have to buffer the key stream in memory all that time.
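The two strategies can be contrasted with a toy model. This sketch is not Bouncy Castle's code and does not use AES: `keystream_block` is a hash-based stand-in for "encrypt the counter", used only so the example runs with the standard library, and `OnlineCtr` and `BufferedCtr` are hypothetical class names. Authentication tags are left out entirely.

```python
import hashlib

BLOCK = 16  # block size in bytes

def keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in for AES(key, counter). NOT a real cipher, illustration only.
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]

def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))

class OnlineCtr:
    """Keeps unused key stream around and emits output for every byte received."""
    def __init__(self, key: bytes):
        self.key, self.counter, self.stream = key, 0, b""

    def process(self, data: bytes) -> bytes:
        while len(self.stream) < len(data):
            self.stream += keystream_block(self.key, self.counter)
            self.counter += 1
        out = xor(data, self.stream)
        self.stream = self.stream[len(data):]
        return out

    def finish(self) -> bytes:
        return b""  # nothing is ever held back

class BufferedCtr:
    """Holds input until a full block is available; the tail comes out in finish()."""
    def __init__(self, key: bytes):
        self.key, self.counter, self.pending = key, 0, b""

    def _next_block(self, chunk: bytes) -> bytes:
        out = xor(chunk, keystream_block(self.key, self.counter))
        self.counter += 1
        return out

    def process(self, data: bytes) -> bytes:
        self.pending += data
        out = b""
        while len(self.pending) >= BLOCK:
            out += self._next_block(self.pending[:BLOCK])
            self.pending = self.pending[BLOCK:]
        return out

    def finish(self) -> bytes:
        out = self._next_block(self.pending) if self.pending else b""
        self.pending = b""
        return out

key = b"k" * 16
online, buffered = OnlineCtr(key), BufferedCtr(key)
print(len(online.process(b"hello")))    # 5
print(len(buffered.process(b"hello")))  # 0
print(len(buffered.finish()))           # 5
```

Both variants produce the same bytes overall; the buffered one simply returns nothing for a short input until the final call, which matches the symptom described in the question.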
The authentication tag can also be handled differently. Here there are basically three options. You could simply regard the authentication tag as a separate entity to the ciphertext. This lets you keep the online properties of CTR mode and should, in my opinion, be the preferred option. You could also see it as part of the ciphertext, but in that case you lose the online properties during decryption; you would need to know where the ciphertext ends before you can handle the last number of bytes that make up the authentication tag. So you'd need to buffer at least the size in bytes of the authentication tag. Finally, still during decryption, you might only want to return plaintext bytes after verification of the plaintext bytes. In that case you'd need to buffer the entire ciphertext and return the plaintext in one go.
As the authentication tag issues are only for decryption, it is likely that Bouncy will just buffer because of the way CTR is implemented. You'd indeed have to call doFinal (DoFinal in the C# API), as Robert already mentioned in the comments, to retrieve the last block of ciphertext as well as the authentication tag. It could be that the encryption is not yet performed because the encryption routine is kept somewhat symmetrical to the decryption routine as well.
I'm sending some encrypted data to a client through a web service.
The client had requested that I encrypt the data using a given key and IV. I know you should ideally use a different random IV each time, and I've already raised that with them.
The IV they have provided is a string of length 25. This really doesn't seem right to me.
As far as I was aware the IV length should match the block size, so either 128, 192 or 256 bits (string lengths 16, 24 or 32). Am I right, or am I missing something here?
Please note that the IV was provided to me, and therefore I am not trying to pick it.
The provided IV was of the form "ghPNHfg544JUdfjdR5BGVbj67", which I do not believe is correct. (The provided key was a string 16 characters long.)
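The length concern can be checked mechanically. This is a minimal sketch that assumes the string is meant to be used as raw ASCII bytes; the client may intend some other encoding or derivation step, which this does not cover.

```python
AES_BLOCK_BYTES = 16               # AES always uses a 128-bit block
RIJNDAEL_BLOCK_BYTES = (16, 24, 32)  # Rijndael block sizes: 128, 192 or 256 bits

provided_iv = "ghPNHfg544JUdfjdR5BGVbj67"  # the string quoted in the question
iv_bytes = provided_iv.encode("ascii")     # assumption: one byte per character

print(len(iv_bytes))                          # 25
print(len(iv_bytes) == AES_BLOCK_BYTES)       # False
print(len(iv_bytes) in RIJNDAEL_BLOCK_BYTES)  # False
```

Under that assumption, 25 bytes matches neither the AES block size nor any Rijndael block size.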
I need to encrypt a text (16 chars), preferably using AES, and I need to limit the length of the resulting encrypted text (14 or 16 characters). The encrypted text has to consist only of letters and numbers (not '=', '?', ...). Is it possible?
I'll need to get back the original text from the cipher text(encrypted).
Is there a way to do this using RijndaelManaged (System.Security.Cryptography)?
The cypher-text needs to be at least as long (in an entropic sense) as the plain-text. You can't losslessly compress arbitrary texts. So if you limit your output to log2(10+2*26)*16 ≈ 95 bits, the input can't have any more entropy than that. This has nothing to do with AES; it's a mathematical limitation that applies to all lossless encodings.
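The arithmetic behind that figure, as a short sketch. It assumes an alphabet of exactly 62 symbols (digits plus upper- and lower-case ASCII letters), as in the formula above.

```python
import math

ALPHABET_SIZE = 10 + 2 * 26           # digits plus upper- and lower-case letters
bits_per_char = math.log2(ALPHABET_SIZE)

# Information that 16 alphanumeric characters can carry.
capacity_bits = 16 * bits_per_char

# Characters needed to represent one 128-bit AES block in the same alphabet.
chars_for_aes_block = math.ceil(128 / bits_per_char)

print(round(capacity_bits, 1))  # 95.3
print(chars_for_aes_block)      # 22
```

So a single AES block already needs 22 alphanumeric characters, more than the 14 or 16 allowed.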
What's a character? A byte, a char, or a Unicode code point?
AES has the additional problem that it's a block cypher: the minimum output size is equal to the block size, 128 bits. And since the output appears random, it can't be compressed after encrypting. That already exceeds your limit, and most encryption modes add a bit of additional padding.
There are functions which map arbitrary-length input to constant-length output. They are called hash functions. But by the pigeonhole principle they map multiple inputs to the same output, so you can't get back the input for all possible inputs.
I want to encrypt a string and embed it in a URL, so I want to make sure the encrypted output isn't bigger than the input.
Is AES the way to go?
It's impossible to create any algorithm which will always create a smaller output than the input, but can reverse any output back to the input. If you allow "no bigger than the input" then basically you're just talking isomorphic algorithms where they're always the same size as the input. This is due to the pigeonhole principle.
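The pigeonhole argument comes down to counting. In this sketch, `count_strings` and `count_shorter_strings` are hypothetical helpers, and strings are taken to be sequences of bytes (256 possible values per position).

```python
def count_strings(length: int, alphabet_size: int = 256) -> int:
    # Number of distinct strings of exactly this length.
    return alphabet_size ** length

def count_shorter_strings(length: int, alphabet_size: int = 256) -> int:
    # Number of distinct strings strictly shorter than this length (including empty).
    return sum(alphabet_size ** k for k in range(length))

n = 4
print(count_strings(n))          # 4294967296
print(count_shorter_strings(n))  # 16843009
```

There are fewer shorter strings than inputs, so a reversible mapping cannot send every 4-byte input to a distinct shorter output; the same holds for any length.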
Added to that, encryption usually has a little bit of padding (e.g. "to the nearest 8 bytes, rounded up"; in AES, that's 16 bytes). Oh, and on top of that you've got the issue of converting between text and binary. Encryption algorithms usually work in binary, but URLs are in text. Even if you assume ASCII, you could end up with an encrypted binary value which isn't ASCII. The simplest way of representing arbitrary binary data in text is to use base64. There are other alternatives which would be highly fiddly, but the general "convert text to binary, encrypt, convert binary to text" pattern is the simplest one.
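To get a feel for the growth, the sketch below estimates the length of the URL token for a given plaintext length. It assumes PKCS7-style padding to a 16-byte block and URL-safe base64, and it uses random bytes as a stand-in for the ciphertext, since only the sizes matter here. `url_token_length` is a hypothetical helper, and any IV sent alongside the ciphertext is not counted.

```python
import base64
import os

def url_token_length(plaintext_len: int, block: int = 16) -> int:
    # PKCS7-style padding always rounds up to the next full block.
    padded_len = (plaintext_len // block + 1) * block
    # Random bytes stand in for the ciphertext; only the length matters.
    ciphertext_stand_in = os.urandom(padded_len)
    return len(base64.urlsafe_b64encode(ciphertext_stand_in))

for n in (10, 16, 100):
    print(n, url_token_length(n))
# 10 24
# 16 44
# 100 152
```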
Simple answer is no.
Any symmetric encryption algorithm (AES included) will produce an output that is at minimum the same size as the input, and often slightly larger. As Jon Skeet points out, this is usually because of padding or alignment.
Of course you could compress your string using zlib and encrypt but you'd need to decompress after decrypting.
Disclaimer - compressing the string with zlib will not guarantee it comes out smaller though
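The disclaimer is easy to confirm with Python's standard `zlib` module; the sample strings here are made up for illustration.

```python
import zlib

short = b"hello world"        # typical short URL parameter
repetitive = b"abc" * 200     # highly redundant input

# Short input: the zlib header and checksum outweigh any savings.
print(len(short), len(zlib.compress(short)))

# Redundant input: compression helps a lot.
print(len(repetitive), len(zlib.compress(repetitive)))
```

For short strings such as typical URL parameters, the compressed form is usually larger than the original.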
What matters is not really the cipher that you use, but the encryption mode that you use. For example, CTR mode has no length expansion, but every encryption needs a new distinct starting point for the counter. Other modes like OFB, CFB (or CBC with ciphertext stealing) also don't need to be padded to a multiple of the block length of the cipher, but they need an IV. It is unclear from your question whether there is some information available from which an IV could be derived pseudorandomly, and whether any of these modes would be appropriate. It is also unclear whether you need authentication, or whether you need semantic security, i.e. is it a problem if you encrypt the same string twice and get the same ciphertext twice?
If we are talking about symmetric encryption, where the original string must be recoverable from the encrypted one, it is not possible. I think that unless you use hashes (SHA1, SHA256...) you will never obtain an encrypted string smaller than the original text. The problem with hashes is that they are not a solution for retrieving the original string, because they are one-way functions.
When using AES, the output data will be rounded up to a specific length (e.g. a length divisible by 16).
If you want to transfer secret data to another website, an HTTP POST may do better than embedding the data in the URL.
Also just another thing to clarify:
Not only is it true that symmetric encryption algorithms produce an output that is at least as large as the input, the same is true of asymmetric encryption.
"Asymmetric encryption" and "cryptographic hashes" are two different things.
Asymmetric encryption (e.g. RSA) means that given the output (i.e. the ciphertext), you can get the input (i.e. the plaintext) back if you have the right key, it's just that decrypting requires a different key than the key used for encrypting. For asymmetric encryption, the same "pigeonhole principle" argument applies.
Cryptographic hashes (e.g. SHA-1) mean that given the output (i.e. the hash) you can't get the input back, and you can't even find a different input that hashes to the same value (assuming the hash is secure). For cryptographic hashes, the hash can be shorter than the input. (In fact the hash is the same size regardless of the length of the input.)
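The fixed output size is easy to observe with Python's standard `hashlib`; SHA-1 is used here only because it is the example named above.

```python
import hashlib

messages = (b"", b"a", b"a" * 1_000_000)

# SHA-1 digests are 20 bytes long whatever the input length.
digest_sizes = [len(hashlib.sha1(m).digest()) for m in messages]

for message, size in zip(messages, digest_sizes):
    print(len(message), size)
# 0 20
# 1 20
# 1000000 20
```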
And also one more thing: In any secure encryption system the ciphertext will be longer than the plaintext. This is because there are multiple possible ciphertexts that any given plaintext could encrypt to (e.g. using different IVs.) If this were not the case then the cipher would leak information because if two identical plaintexts were encrypted, they would encrypt to identical ciphertexts, and an adversary would then know that the plaintexts were the same.