The inverse question exists with no answer and a comment that I don't understand.
I am attempting to create a Base64 encoded HMAC-SHA1 signature for a pseudo OAuth authentication header for an API. I found a support document (requires authenticated access) that takes you through the evolution of creating the signature. I'm able to create the same data up until the final step which is to Base64 encode the hash.
The support document states that the HMAC-SHA1 signature is:
cb5acd2d3ef689a8fbec4d06c576371834689673
And I get:
CB5ACD2D3EF689A8FBEC4D06C576371834689673
The support document then states
From the hex result string in step 3, encode the value using Base64
and provides the following Base64 encoded result (56 characters):
Y2I1YWNkMmQzZWY2ODlhOGZiZWM0ZDA2YzU3NjM3MTgzNDY4OTY3Mw==
When I use Convert.ToBase64String() to convert my signature I get (28 characters):
y1rNLT72iaj77E0GxXY3GDRolnM=
I'm stumped, I don't know if the support document is incorrect or if I'm doing something wrong. The fact that I'm generating a string that is 28 characters and the example is 56 is too interesting to ignore.
The comment in the aforementioned semi-duplicate question also stumps me. I don't see how the string "MDY" translates to any ascii or unicode digits that make sense to me - I don't understand how the comment author came to that conclusion.
The hex value is being encoded as text ("062..." == 0x30, 0x36, 0x32, ...) rather than as the large number it represents.
Your signature is a 20 byte (160 bit) long byte array.
So it's basically a very long number.
When you show it, you are converting it into a hex string, so each byte is shown as 2 chars, so you get a 40 chars long string.
Base64 encoding gives you 4 chars every 3 bytes of payload to encode.
If you Base64-encode 20 bytes of binary data you get 26.67 characters, rounded up to 28 (the output is padded to a multiple of 4 characters).
If you encode your 40-character string (320 bits), you get 53.3 characters, again rounded up to 56.
I suppose you're doing the latter, and encoding a string instead of a byte[].
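The arithmetic above can be checked with a short Python sketch. The hex string is the signature from the question; the helper name b64_len is only for illustration.

```python
import base64
import math

def b64_len(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per (possibly partial) 3-byte group."""
    return 4 * math.ceil(n_bytes / 3)

hex_sig = "cb5acd2d3ef689a8fbec4d06c576371834689673"  # signature from the question

raw = bytes.fromhex(hex_sig)        # the 20 signature bytes
as_text = hex_sig.encode("ascii")   # the 40 hex characters, taken as text

from_bytes = base64.b64encode(raw).decode("ascii")
from_text = base64.b64encode(as_text).decode("ascii")

# input size, predicted Base64 length, actual Base64 length
print(len(raw), b64_len(len(raw)), len(from_bytes))
print(len(as_text), b64_len(len(as_text)), len(from_text))
```

Encoding the raw bytes gives the short form, and encoding the hex text gives the long form that the support document shows.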
I ran into the same problem here. You have to convert the sha1sum result from hex text into raw bytes before encoding it. For example, when your sha1sum result is dfe5ec35f9c9f144d3814821a558bcfa23ab1a58, the console prints it as a hex string.
You can use UltraEdit or another editor with a hex mode to enter those values as raw bytes.
Then save the result as a file.
The right answer 3+XsNfnJ8UTTgUghpVi8+iOrGlg= comes out after you base64 encode the file.
Have a try. Hope it helps.
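A hex editor is not strictly needed for this. As a rough sketch in Python, the hex text can be turned into raw bytes with bytes.fromhex and then Base64 encoded; the digest is the one quoted above.

```python
import base64

hex_digest = "dfe5ec35f9c9f144d3814821a558bcfa23ab1a58"

raw = bytes.fromhex(hex_digest)                  # 40 hex characters -> 20 raw bytes
encoded = base64.b64encode(raw).decode("ascii")  # Base64 of the bytes, not of the text

print(encoded)
```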
The answer is correct, here is a small python example
import base64
import hashlib
# replace "hello" with your value from which you get sha1
sha = hashlib.sha1(b'hello')
sha_bytes = sha.digest() # 20 bytes
sha_str = sha.hexdigest() # 40 hex characters - you are trying to b64encode this value
b64_sha_bytes = base64.b64encode(sha_bytes) # b'qvTGHdzF6KLavt4PO0gs2a6pQ00='
b64_sha_str = base64.b64encode(sha_str.encode()) # b'YWFmNGM2MWRkY2M1ZThhMmRhYmVkZTBmM2I0ODJjZDlhZWE5NDM0ZA=='
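Since the question is about an HMAC-SHA1 signature rather than a plain SHA-1 hash, the same distinction can be sketched with Python's hmac module. The key and message below are made-up placeholders, not values from the support document.

```python
import base64
import hashlib
import hmac

key = b"hypothetical-secret"           # placeholder, not a real API secret
message = b"hypothetical base string"  # placeholder for the string being signed

mac = hmac.new(key, message, hashlib.sha1)

# Base64 of the 20 raw signature bytes: 28 characters
sig_from_bytes = base64.b64encode(mac.digest()).decode("ascii")

# Base64 of the 40-character lowercase hex text: 56 characters
sig_from_hex = base64.b64encode(mac.hexdigest().encode("ascii")).decode("ascii")

print(len(sig_from_bytes), len(sig_from_hex))
```

If the API really expects the 56-character form, note that the case of the hex text matters: uppercase and lowercase hex are different text and so encode to different Base64 strings.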
I'm trying to write a decoder. The original system is .NET 4.7 and I'm trying to migrate it to PHP, but I'm having trouble converting bytes. As far as I understand, strings in C# are UTF-16LE by default, and I understood the ord and chr functions on the PHP side to work as UCS-2. I want to do the following, but I do not get the same result; the code is below. What can I do to fix this? Thanks in advance.
XOR Encoded Text Bytes = [101,107,217,78,40,68,234,218,162,67,139,81,44,166,24,148];
on C#
string result = System.Text.Encoding.UTF8.GetString(destinationArray);
On PHP
for($i=0;$i<sizeof($encoded);$i++){
echo "\t".$encoded[$i]." => ".chr($encoded[$i])."\n";
$tmpStr .= chr($encoded[$i]);
}
C# Result size=26:
ek�N(D�ڢC�Q,��
PHP Result size=16:
ek�N(D�ڢC�Q,��
The strings look the same, but the byte translation is quite different.
C# Result to Bytes array:
byte[] utf8 = System.Text.Encoding.Unicode.GetBytes(result);
Console.WriteLine(string.Join("-", utf8));
response =
101-0-107-0-253-255-78-0-40-0-68-0-253-255-162-6-67-0-253-255-81-0-44-0-253-255-24-0-253-255
PHP Result to Bytes Array:
echo implode("-",unpack("C*", $tmpStr));
response = 101-107-217-78-40-68-234-218-162-67-139-81-44-166-24-148
If the PHP result is converted to UTF-16LE, the results are different again:
echo implode("-",unpack("C*", mb_convert_encoding($tmpStr,'UTF-16le')));
response =
101-0-107-0-63-0-78-0-40-0-68-0-63-0-162-6-67-0-63-0-81-0-44-0-63-0-24-0-63-0
You are mixing quite different things here.
First, in the C# code, you are not using the same encoding when converting from bytes to a string and then from a string back to bytes: Encoding.UTF8 in the first case and Encoding.Unicode (which is .NET's name for little-endian UTF-16) in the latter. Things cannot go well if you do this. And by the way, I'm not sure that PHP's UCS2 is equivalent to UTF-16:
UTF-8 encodes characters on 1, 2, 3 or 4 bytes depending on the character
UTF-16 encodes characters on 2 or 4 bytes depending on the character
UCS-2 always encodes characters on 2 bytes, and hence cannot encode more than 65536 characters...
Then what you pass to the 'bytes to string' conversions is not necessarily valid! Because you've XORed the input data (I assume it to be some secret string), the resulting bytes may or may not be a valid sequence in some encodings. For example:
It is not valid in ASCII because you have (in your example) bytes > 127
It is not valid in UTF-8 because 217 followed by 78 is not recognized as a 1-, 2-, 3-, or 4-byte character by UTF-8; hence the � you see before the N.
It seems to be invalid UTF-16 as well, but roundtripping works (I could get back the original array using .NET's Unicode.GetString, then Unicode.GetBytes). If I remove your last byte and end up with an odd number of bytes, then UTF-16 roundtripping does not work any more.
Although I did not test it, it should also be invalid UCS-2 because UCS-2 'looks like' UTF-16 for 2-byte characters.
Roundtripping works with ANSI encodings such as windows-1252 because these encodings accept any byte. However, I would discourage using such a trick because you have to be sure the same code page is used on both sides of the encoding/decoding process.
Therefore, I think, in your case, the best way to store your XORed bytes into a string would be to convert the array to base64. In C# you can do it this way:
// For the 16-byte array in the question, the code below gives ZWvZTihE6tqiQ4tRLKYYlA==
var converted = Convert.ToBase64String(array);
// And this one gives you back the initial array
var bytes = Convert.FromBase64String(converted);
Quick googling will tell you to use base64_encode and base64_decode in PHP.
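To make the difference concrete, here is a small sketch, in Python rather than C# or PHP, that runs the byte array from the question through a lenient UTF-8 decode and through Base64. It only illustrates the point made above and is not a port of the original decoder.

```python
import base64

xored = bytes([101, 107, 217, 78, 40, 68, 234, 218,
               162, 67, 139, 81, 44, 166, 24, 148])

# Strict UTF-8 decoding rejects the array outright.
try:
    xored.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Lenient decoding substitutes U+FFFD for bad bytes, so the originals are lost.
lossy = xored.decode("utf-8", errors="replace")
survives_utf8 = lossy.encode("utf-8") == xored

# Base64 keeps every byte and produces plain ASCII text.
text = base64.b64encode(xored).decode("ascii")
restored = base64.b64decode(text)

print(valid_utf8, survives_utf8, restored == xored)
print(text)
```

The same Base64 text can be produced and consumed on both the C# and PHP sides without either side having to agree on a character encoding for the raw bytes.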
Bottom note: if you want to really understand what's going on with all this encoding stuff, here is the must-read blog post on the subject: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
I searched for "How to encode data in UTF-8 format". The best result I found is the following:
UTF8Encoding utf8 = new UTF8Encoding();
String unicodeString = "ABCD";
// Encode the string.
Byte[] encodedBytes = utf8.GetBytes(unicodeString);
// Decode bytes back to string.
String decodedString = utf8.GetString(encodedBytes);
But the problem is that when I look at the encoded data, I find it is nothing more than ASCII codes.
Can anyone help me improve my understanding?
For example, when I pass "ABCD" it gets converted into 65,66,67,68.... I think this is not UTF-8.
UTF-8 is backwards compatible with ASCII of course. You should test with some characters that are not included in ASCII.
If you program in C#, the strings are already encoded in UTF-16. You will not see anything special there. If you want to see a difference, compare the length of the byte[] when you encode the string into different encodings.
Check out the Wikipedia article on UTF8: Wikipedia.
From there:
Backward compatibility: One-byte codes are used only for the ASCII
values 0 through 127. In this case the UTF-8 code has the same value
as the ASCII code. The high-order bit of these codes is always 0. This
means that UTF-8 can be used for parsers expecting 8-bit extended
ASCII even if they are not designed for UTF-8.
The point here is that for anything that would be ASCII 0-127 in UTF8 it's the same. You need to try more extended characters (an example in the article is the Euro symbol) to see how it's different. Or try an ASCII value greater than 127 and you'll see it different.
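Here is a quick way to see this, sketched in Python rather than C#. The sample strings are arbitrary; the point is only that ASCII characters come out as the same single bytes in UTF-8, while other characters do not.

```python
samples = ["ABCD", "€", "ä"]

for s in samples:
    utf8 = s.encode("utf-8")
    utf16 = s.encode("utf-16-le")
    # show the string, its UTF-8 byte values and its UTF-16LE byte values
    print(repr(s), list(utf8), list(utf16))

# For pure ASCII text, the UTF-8 bytes and the ASCII bytes are identical.
ascii_same = "ABCD".encode("utf-8") == "ABCD".encode("ascii")

# The Euro sign needs three bytes in UTF-8.
euro_utf8 = list("€".encode("utf-8"))
```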
I convert a byte array to a string , and I convert this string to byte array.
these two byte arrays are different.
As below:
byte[] tmp = Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(b));
Suppose b is a byte array.
b[0]=3, b[1]=188, b[2]=2 //decimal system
Result:
tmp[0]=3, tmp[1]=63, tmp[2]=2
So that's my problem, what's wrong with it?
188 is out of range for ASCII. Characters that are not in the corresponding character set are transposed to '?' by design (would you prefer transposing to "1/4"?)
ASCII is 7-bit only, so others are invalid. By default it uses ? to replace any invalid bytes and that's why you get a ?.
For 8-bit character sets, you should be looking for either the Extended ASCII (which is later defined "ISO 8859-1") or the code page 437 (which is often confused with Extended ASCII, but in fact it's not).
You can use the following code:
Encoding enc = Encoding.GetEncoding("iso-8859-1");
// For CP437, use Encoding.GetEncoding(437)
byte[] tmp = enc.GetBytes(enc.GetString(b));
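The same round trip can be sketched in Python for comparison. One difference to keep in mind: Python substitutes U+FFFD when decoding and "?" when encoding, while .NET's ASCII decoder substitutes "?" directly. The resulting byte is 63 either way.

```python
b = bytes([3, 188, 2])

# ASCII only defines 0..127, so 188 cannot survive the round trip.
via_ascii = b.decode("ascii", errors="replace").encode("ascii", errors="replace")

# ISO 8859-1 maps every byte 0..255 to a code point, so nothing is lost.
via_latin1 = b.decode("iso-8859-1").encode("iso-8859-1")

print(list(via_ascii), list(via_latin1))
```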
The character 188 is not defined for ASCII. Instead, you're getting 63, which is a question mark.
The ASCII character set has a range from 0 to 127. You can see 188 is not in this range and is converted to ? (= ASC 63).
Not every sequence of bytes is necessarily a valid sequence of encoded values for a particular encoding.
So the result of Encoding.ASCII.GetString(b) on an arbitrary array of bytes, b, is poorly defined. (And could be, for any other encoding also).
If you need to take an arbitrary byte array and obtain a sequence of characters, you might want to look into the Convert classes ToBase64String and FromBase64String. If that's not what you're trying to do, maybe explain the original problem to us.
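188 isn't in the range of ASCII (7 bit). You could use Encoding.Default to get the ANSI encoding (this holds on .NET Framework; on .NET Core and later Encoding.Default is UTF-8, so name a single-byte encoding explicitly there):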
byte[] b = new byte[3]{ 3, 188, 2 };
byte[] tmp = Encoding.Default.GetBytes(Encoding.Default.GetString(b));
In my ASP.Net application working process, I need to do some work with string, which equals something like
=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=
How can I decode it to normal human language?
Thanks in advance!
Update:
Convert.FromBase64String() does not work for string, which equals
=?UTF-8?Q?Bestellbest=C3=A4tigung?=
I get The format of s is invalid. s contains a non-base-64 character, more than two padding characters, or a non-white space-character among the padding characters. exception.
Update:
Solution Here
Alternative solution
Update:
What kind of string encoding is that: Nweiß ???
It's actually a base-64 string:
string zz = "SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=";
byte[] dd = Convert.FromBase64String(zz);
// Returns Ihre Bestellung - Versandbestätigung - 1105891246
string yy = System.Text.Encoding.UTF8.GetString(dd);
I've written a library that will decode these sorts of strings. You can find it at http://github.com/jstedfast/MimeKit
Specifically, take a look at MimeKit.Utils.Rfc2047.DecodeText()
This seems to be MIME Header Encoding. The Q in your second example indicates that it is Quoted Printable.
This question seems to cover the variants fairly well. In a quick search I didn't find any .NET libraries to decode this automatically, but it shouldn't be hard to do manually if you need to.
That's not UTF8. That's a Base64 encoded string.
The UTF-8 only indicates that the decoded string is in UTF-8 format.
After decoding the Base64 string:
SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=
You'll get the following result:
Ihre Bestellung - Versandbestätigung - 1105891246
See Base64 online decode/encode
Looks like a base64 string.
Try Convert.FromBase64String
http://msdn.microsoft.com/en-us/library/system.convert.frombase64string.aspx
This is an encoded word, which is used in email headers when there is non-ASCII content. Encoded words are defined in RFC 2047:
https://www.rfc-editor.org/rfc/rfc2047#section-2
The BNF for an encoded word is:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
So the correct way to interpret this is:
The data is the stuff between the 3rd and 4th question marks
It has been Base64 encoded (the 'B' stands for Base64; if it were a 'Q' then it would be quoted-printable).
Once you decode the data, it will be in the UTF-8 character set.
The result, as @Shai correctly pointed out, is:
Ihre Bestellung - Versandbestätigung - 1105891246
This is German. The umlaut is obviously the reason for the UTF-8 and thus the need for an encoded word. The translation is:
Your order - Delivery confirmation - 1105891246
Apparently it's a tracking number for an order.
All modern email clients (and Outlook) transparently support encoded words.
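For comparison, Python's standard library ships a decoder for RFC 2047 encoded words in email.header. The sketch below is in Python, not .NET, and the wrapper name decode_encoded_word is made up for the example; it handles both the B and the Q form from the question.

```python
from email.header import decode_header, make_header

def decode_encoded_word(value: str) -> str:
    """Decode RFC 2047 encoded words (B or Q form) into a str."""
    return str(make_header(decode_header(value)))

b_form = "=?utf-8?B?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?="
q_form = "=?UTF-8?Q?Bestellbest=C3=A4tigung?="

print(decode_encoded_word(b_form))
print(decode_encoded_word(q_form))
```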
This is a bit of guesswork, but let's try
Remove =? from the start and ?= from the end
Keep the start up to the next ? as the character set
Remove the B? (I don't know what it is)
Convert the rest to a byte[] via System.Convert.FromBase64String()
Convert this to the final string via Encoding.GetString(), using the character set remembered in the second step
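The steps above can be sketched by hand as follows, in Python rather than C#. The letter after the charset is treated as the encoding marker (B for Base64, Q for quoted-printable), which is how RFC 2047 defines it; the function name is hypothetical, and this minimal version handles a single encoded word only.

```python
import base64
import quopri

def decode_word_manually(word: str) -> str:
    # Expected shape: =?charset?encoding?encoded-text?=
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded word")
    charset, encoding, payload = word[2:-2].split("?", 2)
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    elif encoding.upper() == "Q":
        # header=True also turns "_" into a space, as the Q form requires
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raise ValueError(f"unknown encoding {encoding!r}")
    # Decode the bytes using the charset named in the word itself
    return raw.decode(charset)

print(decode_word_manually("=?UTF-8?Q?Bestellbest=C3=A4tigung?="))
```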
Additional information: Unable to translate Unicode character \uDFFF at index 195 to specified code page.
I made an algorithm whose results are binary values (of different lengths). I transformed each one into a uint, then into a char, and appended it to a StringBuilder, as you can see below:
uint n = Convert.ToUInt16(tmp_chars, 2);
_koded_text.Append(Convert.ToChar(n));
My problem is that when I try to save those values to a .txt file, I get the previously mentioned error.
StreamWriter file = new StreamWriter(filename);
file.WriteLine(_koded_text);
file.Close();
What I am saving is this: "忿췾᷿]볯褟ﶞ痢ﳻ��伞ﳴ㿯ﹽ翼蛿㐻ﰻ筹��﷿₩マ랿鳿⏟麞펿"... which are some weird signs.
What I need is to convert those binary values into some kind of string of chars and save it to a .txt file. I saw somewhere that converting to UTF8 should help, but I don't know how to do that. Would changing the file's encoding help too?
You cannot transform binary data to a string directly. The Unicode characters in a string are encoded using utf16 in .NET. That encoding uses two bytes per character, providing 65536 distinct values. Unicode however has over one million codepoints. To make that work, the Unicode codepoints above \uffff (above the BMP, Basic Multilingual Plane) are encoded with a surrogate pair. The first one has a value between 0xd800 and 0xdbff, the second between 0xdc00 and 0xdfff. That provides 2 ^ (10 + 10) = 1 million additional codes.
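The surrogate arithmetic described above can be written out as a short sketch. It is in Python for brevity, and the function name to_surrogates is made up for the example.

```python
from typing import Tuple

def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000     # 20 significant bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits    -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low

high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))
```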
You can perhaps see where this leads: in your case the code detects a low surrogate value (0xdfff) that isn't preceded by a high surrogate. That's illegal. There are lots more possible mishaps: several codepoints are unassigned, and several are diacritics that get mangled when the string is normalized.
You just can't make this work. Base64 encoding is the standard way to carry binary data across a text stream. It uses 6 bits per character, 3 bytes require 4 characters. The character set is ASCII so the odds of the receiving program decoding the character back to binary incorrectly are minimal. Only a decades old IBM mainframe that uses EBCDIC could get you into trouble. Or just plain avoid encoding to text and keep it binary.
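As a rough illustration in Python, the sketch below packs a few bit strings into bytes and Base64 encodes them instead of turning each value into a char. The bit strings are invented stand-ins for the algorithm's output, and padding every value to two bytes is an assumption that drops the original bit lengths.

```python
import base64

# Hypothetical stand-ins for the variable-length binary strings in the question.
bit_strings = ["1101111111111111", "101", "0000000100000010"]

# Pack each value into two bytes (big-endian) instead of into a char.
raw = b"".join(int(bits, 2).to_bytes(2, "big") for bits in bit_strings)

# A lone surrogate such as U+DFFF cannot be written out as UTF-8 text...
try:
    "\udfff".encode("utf-8")
    lone_surrogate_ok = True
except UnicodeEncodeError:
    lone_surrogate_ok = False

# ...but the same 16 bits travel safely as Base64, which is plain ASCII.
text = base64.b64encode(raw).decode("ascii")
assert base64.b64decode(text) == raw

print(lone_surrogate_ok, text)
```

The resulting text can be written to a .txt file with any ASCII-compatible encoding and decoded back to the exact same bytes later.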
Since you're trying to encode binary data to a text stream, this SO question already contains an answer to the question "How do I encode something as base64?" From there, plain ASCII/ANSI text is fine for the output encoding.