Unable to decode Base64 when reading the data from the console in C#

I'm facing an issue when encoding/decoding data in C#. When I hard-code the Base64-encoded data, it is decoded to a string successfully, as shown below:
string encodedText = "eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VA\nZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1W\ncnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9j\nYWxob3N0OjMwMDAifQ==\n";
byte[] data = Convert.FromBase64String(encodedText);
string decodedString = Encoding.UTF8.GetString(data);
But when I read the same value from the console, decoding fails. For example:
string readLine = Console.ReadLine();
Console.WriteLine("Received Data :: " + readLine); // Exactly same data received here
byte[] encodedByte = Convert.FromBase64String(readLine); //Failed here?
string configData = System.Text.Encoding.UTF8.GetString(encodedByte);
The second code failed with the below error message
Unhandled exception. System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
at System.Convert.FromBase64CharPtr(Char* inputPtr, Int32 inputLength)
at System.Convert.FromBase64String(String s)
Note:
I noticed that when I manually remove every \n from the given string, it works fine. However, I'm not sure how to remove the \n programmatically. I tried the code below, but it does not work:
readLine = Regex.Replace(readLine, @"\t|\n|\r", String.Empty);
And also tried with,
readLine = readLine.Replace("\n", String.Empty);
It would be much appreciated if anyone could help with this.

This does not work because the console does not interpret escape sequences: a \n typed or pasted at the command line arrives as two characters, a backslash followed by n, which in C# source is written as "\\n".
Try this:
readLine = readLine.Replace("\\n", "");
//or
readLine = Regex.Replace(readLine, @"\\t|\\n|\\r", String.Empty);

Here is the original re-written to illustrate extra characters:
string encodedText =
"eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VA"
+ "\nZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1W"
+ "\ncnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9j"
+ "\nYWxob3N0OjMwMDAifQ=="
+ "\n";
The data that should be entered in the console is then:
eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VAZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1WcnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9jYWxob3N0OjMwMDAifQ==
There is no encoding issue present with this data "when cleaned up", and it will be read correctly with Console.ReadLine if correctly entered. Try to pipe it in from a file if unable to paste such correctly.
The code from the literal works because of relaxed rules in that newline characters are ignored by Convert.FromBase64String. However, the translation (of "\n" to a literal newline) that occurs in a string literal does NOT occur when entered/read via the console.
Performing a translation of errant \n sequences that appear - read as two characters when typed in the console - would require code such as:
readLine = readLine.Replace("\\n", "");
// "\n".ToCharArray() -> { '\n' } (a single character, 0x0A)
// "\\n".ToCharArray() -> { '\\', 'n' }
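The cleanup described above can be sketched as a small helper. This is only an illustration: the class and method names are made up, and it assumes the pasted text may contain the typed sequences \n, \r and \t (backslash plus letter). Since a backslash is never part of the Base64 alphabet, removing these sequences cannot damage valid data.

```csharp
using System;
using System.Text;

static class ConsoleBase64
{
    // Hypothetical helper: strips escape sequences that were typed literally
    // (backslash + letter) and then decodes the remaining Base64 text as UTF-8.
    public static string DecodeConsoleBase64(string line)
    {
        string cleaned = line
            .Replace("\\n", "")   // backslash followed by 'n', as read from the console
            .Replace("\\r", "")
            .Replace("\\t", "")
            .Trim();              // drop surrounding real whitespace as well

        byte[] data = Convert.FromBase64String(cleaned);
        return Encoding.UTF8.GetString(data);
    }
}
```

It would be called as string configData = ConsoleBase64.DecodeConsoleBase64(Console.ReadLine());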

Related

System.FormatException: 'Invalid length for a Base-64 char array or string.'

I'm beating my head against a wall here, with this simple code that just doesn't work:
string middle = "eyJzdWIiOiJtYXR0d2ViZXIiLCJqdGkiOiJlMWVmNjc5Mi02YTBjLTQ4YWUtYmQzNi0wZDlmMTVlMDFiY2UiLCJpYXQiOjE0OTMwOTI0OTQsIm5iZiI6MTQ5MzA5MjQ5NCwiZXhwIjoxNDkzMjY1Mjk0LCJpc3MiOiJFQ29tbVdlYkFQSTIiLCJhdWQiOiJFQ29tbVdlYkNsaWVudDIifQ";
byte[] newBytes = Convert.FromBase64String(middle);
middle = Encoding.UTF8.GetString(newBytes);
It's that simple! And yet I get the error in the Title.
Also, I ran this on https://www.base64decode.org/ and it decodes perfectly.
Since the string you provided does not meet the requirements of the FromBase64String method, you need to add the trailing padding characters yourself to follow the convention. The method does not add them automatically.
The valueless character, "=", is used for trailing padding. The end of s can consist of zero, one, or two padding characters.
Source.
To fix the issue you are having, add "==" to the end of your string.
For example: string middle = "SomeString==";
To address the exception you are facing:
public static string Base64UrlDecode(this string base64)
{
string padded = base64.PadRight(base64.Length + (4 - base64.Length % 4) % 4, '=');
return Encoding.UTF8.GetString(Convert.FromBase64String(padded));
}
All credit goes to the accepted answer here: Code for decoding/encoding a modified base64 URL.
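How many "=" characters are missing follows from the length of the string modulo 4. A minimal sketch of that rule, with a made-up class and method name, might look like this:

```csharp
using System;

static class Base64Padding
{
    // Hypothetical helper: appends the '=' characters that were stripped from
    // a Base64 string. A remainder of 2 needs "==", a remainder of 3 needs "=".
    public static string RestorePadding(string unpadded)
    {
        int remainder = unpadded.Length % 4;
        if (remainder == 1)
        {
            // No amount of padding can repair this length.
            throw new FormatException("A Base64 string can never have length 4n + 1.");
        }
        return remainder == 0 ? unpadded : unpadded + new string('=', 4 - remainder);
    }
}
```

The result can then be passed to Convert.FromBase64String as usual.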

C# String confusion compared to Java

I'm a Java dev trying to find my way into C#, and I'm confused. I've read about the string type and its immutability; it doesn't seem much different from Java, except that it doesn't seem to be an object like it is there, but I'm getting weird behavior regardless. I have the following ToString method on a class:
public override string ToString()
{
StringBuilder builder = new StringBuilder();
builder.Append("BlockType: ");
builder.Append(BlockType + "\n");
//builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String("dHh0AA==")));
//builder.Append("\n");
builder.Append("BlockName: ");
builder.Append(BlockName + "\n");
//builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String(this.BlockName)));
//builder.Append("\n");
builder.Append("BlockLength: " + this.BlockLength + "\n");
builder.Append("pBlockData: " + this.pBlockData + "\n");
return builder.ToString();
}
When I fill it with data, taking into account that BlockType and BlockName will contain a Base64 string, I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: dHh0AA==
BlockName: YXBwLW5hbWUAAAAAAAAAAA==
BlockLength: 11
pBlockData: System.Byte[]
Which is perfect, exactly what I want. However, when I try to get the ASCII value of those Base64 strings (or UTF-8, I tried both), I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: txt
The code just seems to stop, without an error or stack trace. I have no idea what is going on. At first I thought a \0 was missing, so I added it to the string; then I thought I needed a \r\n ... again, not the solution. I started googling, but only found people wanting to know how to do a Base64 to UTF-8 conversion ... that part seems easy ... this code stopping isn't.
Any insights or links to decent articles about string handling in .net would be appreciated
I've had a look at what you get from this:
var test = Convert.FromBase64String("YXBwLW5hbWUAAAAAAAAAAA==");
var builder = new StringBuilder();
builder.Append(System.Text.Encoding.ASCII.GetChars(test));
The answer is the string "app-name" with a load of null (0) characters at the end.
You could try removing all the null characters by adding this line just before you return builder.ToString():
builder.Replace("\0", null);
That may or may not help, depending on what you're doing with the returned string.
First
builder.Append("pBlockData: " + this.pBlockData + "\n");
Doesn't do what you think it does, specifically if pBlockData is a byte array you will get something like this (output from scriptcs):
> byte[] data = new byte[11];
> StringBuilder sb = new StringBuilder();
> sb.Append("data = ")
{Capacity:16,MaxCapacity:2147483647,Length:7}
> sb.Append(data);
{Capacity:32,MaxCapacity:2147483647,Length:20}
> sb.ToString()
data = System.Byte[]
Second, C# strings (.NET strings in general) are UTF-16, so the runtime doesn't really know how to display raw bytes. It doesn't matter whether they are Base64-encoded, ASCII, or French pickles ;-) the runtime just treats them as binary. Also, null termination is not required; the length of the string is kept as a property of the string object.
So you need to turn the byte array you have into a UTF-16 character array or string before you output it. If the byte array contains valid ASCII, you can look into the 'System.Text.ASCIIEncoding.ASCII.GetDecoder().Convert' method as one way to accomplish this.
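Putting the two answers together, a rough sketch for fields like BlockType and BlockName could decode the Base64 text, interpret the bytes as ASCII, and drop the trailing null padding. The class and method names are invented for illustration, and the sketch assumes the fields really are null-padded ASCII, as the sample values suggest.

```csharp
using System;
using System.Text;

static class BlockNameSketch
{
    // Hypothetical helper: Base64 -> bytes -> ASCII text without trailing '\0'.
    public static string DecodePaddedAscii(string base64)
    {
        byte[] raw = Convert.FromBase64String(base64);
        // GetString converts the bytes to a .NET (UTF-16) string;
        // TrimEnd removes the null characters used to pad the fixed-width field.
        return Encoding.ASCII.GetString(raw).TrimEnd('\0');
    }
}
```

For the values in the question, DecodePaddedAscii("dHh0AA==") gives "txt" and DecodePaddedAscii("YXBwLW5hbWUAAAAAAAAAAA==") gives "app-name".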

Hashing Query String containing Special Characters not working

I have posted a few questions about Tokens and Password reset and have managed to finally figure this all out. Thanks everyone!
So before reading that certain characters will not work in a query string, I decided to hash the query string, but as you've guessed, the plus signs are stripped out.
How do you secure or hash a query string?
This is a sample from a company email I received and the string looks like this:
AweVZe-LujIAuh8i9HiXMCNDIRXfSZYv14o4KX0KywJAGlLklGC1hSw-bJWCYfia-pkBbessPNKtQQ&t=pr&ifl
In my setup, I am simply using a GUID. But does it matter?
In my scenario the user cannot access the password page, even without a GUID. That's because the page is set to redirect on load if the query string doesn't match the session variable.
Are there ways to handle query string to give the result like above?
This question is more about acquiring knowledge.
UPDATE:
Here is the Hash Code:
public static string QueryStringHash(string input)
{
byte[] inputBytes = Encoding.UTF8.GetBytes(input);
SHA512Managed sha512 = new SHA512Managed();
byte[] outputBytes = sha512.ComputeHash(inputBytes);
return Convert.ToBase64String(outputBytes);
}
Then I pass the hash (of the UserID) to a Session variable before sending it as a query string.
On the next page, the Session hash is not the same as the query string value, so the values do not match and the query string is treated as invalid.
Note: I created a Class called Encryption that handles all the Hash and Encryption.
Session["QueryString"] = Encryption.QueryStringHash(UserID);
Response.Redirect("~/public/reset-password.aspx?uprl=" +
HttpUtility.UrlEncode(Session["QueryString"].ToString()));
I also tried everything mentioned on this page but no luck:
How do I replace all the spaces with %20 in C#
Thanks for reading.
The problem is that base64 encoding uses the '+' and '/' characters, which have special meaning in URLs. If you want to base64 encode query parameters, you have to change those characters. Typically, that's done by replacing the '+' and '/' with '-' and '_' (dash and underscore), respectively, as specified in RFC 4648.
In your code, then, you'd do this:
public static string QueryStringHash(string input)
{
byte[] inputBytes = Encoding.UTF8.GetBytes(input);
SHA512Managed sha512 = new SHA512Managed();
byte[] outputBytes = sha512.ComputeHash(inputBytes);
string b64 = Convert.ToBase64String(outputBytes);
b64 = b64.Replace('+', '-');
return b64.Replace('/', '_');
}
On the receiving end, of course, you'll need to replace the '-' and '_' with the corresponding '+' and '/' before calling the method to convert from base 64.
They recommend not using the pad character ('='), but if you do, it should be URL encoded. There's no need to communicate the pad character if you always know how long your encoded strings are. You can add the required pad characters on the receiving end. But if you can have variable length strings, then you'll need the pad character.
Any time you see base 64 encoding used in query parameters, this is how it's done. It's all over the place, perhaps most commonly in YouTube video IDs.
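A minimal sketch of both directions, under the assumption that the padding is dropped when encoding and restored when decoding. The class and method names are hypothetical, and SHA512.Create() is used here in place of SHA512Managed.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class UrlSafeHash
{
    // Standard Base64, then swap the two URL-hostile characters and drop '='.
    public static string ToBase64Url(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Receiving end: undo the character swap and restore the padding.
    public static byte[] FromBase64Url(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        s = s.PadRight(s.Length + (4 - s.Length % 4) % 4, '=');
        return Convert.FromBase64String(s);
    }

    // SHA-512 of the UTF-8 bytes of the input, in URL-safe form.
    public static string HashForQueryString(string input)
    {
        using (SHA512 sha = SHA512.Create())
        {
            return ToBase64Url(sha.ComputeHash(Encoding.UTF8.GetBytes(input)));
        }
    }
}
```

Because the output contains only letters, digits, '-' and '_', the value stored in the session and the value read back from the query string can be compared directly.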
I did something similar before, where I had to pass a hash in a query string. As you've experienced, Base64 can be pretty nasty when mixed with URLs, so I decided to pass it as a hex string instead. It's a little longer, but much easier to deal with. Here is how I did it:
First a method to transform binary into a hex string.
private static string GetHexFromData(byte[] bytes)
{
var output = new StringBuilder();
foreach (var b in bytes)
{
output.Append(b.ToString("X2"));
}
return output.ToString();
}
Then a reverse to convert a hex string back to binary.
private static byte[] GetDataFromHex(string hex)
{
var bytes = new List<byte>();
for (int i = 0; i < hex.Length; i += 2)
{
bytes.Add((byte)int.Parse(hex.Substring(i, 2), System.Globalization.NumberStyles.HexNumber));
}
return bytes.ToArray();
}
Alternatively, if you just need to verify that the hashes are the same, convert both to hex strings and compare the strings (case-insensitively). Hope this helps.

How to deal with ISO-2022-JP ( and other character sets ) in a Twitter update?

Part of my application accepts arbitrary text and posts it as an Update to Twitter. Everything works fine, until it comes to posting foreign ( non ASCII/UTF7/8 ) character sets, then things no longer work.
For example, if someone posts:
に投稿できる
It ( within my code in Visual Studio debugger ) becomes:
=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=
Googling has told me that this represents ( minus ? as delimiters )
=?ISO-2022-JP is the text encoding
?B means it is base64 encoded
?GyRCJEtFajlGJEckLSRrGyhC? Is the encoded string
For the life of me, I can't figure out how to get this string posted as an update to Twitter in its original Japanese characters. As it stands now, sending '=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=' to Twitter will result in exactly that getting posted. I've also tried breaking the string up into pieces as above, using System.Text.Encoding to convert to UTF8 from ISO-2022-JP and vice versa, base64 decoded and not. Additionally, I've played around with the URL encoding of the status update like this:
string[] bits = tweetText.Split(new char[] { '?' });
if (bits.Length >= 4)
{
textEncoding = System.Text.Encoding.GetEncoding(bits[1]);
xml = oAuth.oAuthWebRequest(TwitterLibrary.oAuthTwitter.Method.POST, url, "status=" + System.Web.HttpUtility.UrlEncode(decodedText, textEncoding));
}
No matter what I do, the results never end up back to normal.
EDIT:
Got it in the end. For those following at home, it was pretty close to the answer listed below. It was just that Visual Studio's debugger was steering me the wrong way, plus a bug in the Twitter library I was using. The end result was this:
decodedText = textEncoding.GetString(System.Convert.FromBase64String(bits[3]));
byte[] originalBytes = textEncoding.GetBytes(decodedText);
byte[] utfBytes = System.Text.Encoding.Convert(textEncoding, System.Text.Encoding.UTF8, originalBytes);
// now, back to string form
decodedText = System.Text.Encoding.UTF8.GetString(utfBytes);
Thanks all.
This produced the output you are looking for:
using System;
using System.Text;
class Program {
static void Main(string[] args) {
string input = "に投稿できる";
Console.WriteLine(EncodeTwit(input));
Console.ReadLine();
}
public static string EncodeTwit(string txt) {
var enc = Encoding.GetEncoding("iso-2022-jp");
byte[] bytes = enc.GetBytes(txt);
// Convert.ToBase64String sizes its own output, so no manual buffer arithmetic is needed.
string base64 = Convert.ToBase64String(bytes);
return "=?ISO-2022-JP?B?" + base64 + "?=";
}
}
Standards are great, there are so many to choose from. ISO never disappoints, there are no less than 3 ISO-2022-JP encodings. If you have trouble then also try encodings 50221 and 50222.
Your understanding of how the text is encoded seems correct. In Python 2,
'GyRCJEtFajlGJEckLSRrGyhC'.decode('base64').decode('ISO-2022-JP')
returns the correct unicode string. Note that you need to decode base64 first in order to get the ISO-2022-JP-encoded text.
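The same two steps (Base64 first, then the named charset) can be sketched in C#. The helper below is hypothetical and only handles a single "B" (Base64) token of the form =?charset?B?payload?=. One assumption to flag: on .NET Core / .NET 5+, Encoding.GetEncoding("iso-2022-jp") only works after the code-pages provider has been registered (Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)); on .NET Framework it is available out of the box.

```csharp
using System;
using System.Text;

static class EncodedWordSketch
{
    // Hypothetical helper: decodes one "=?charset?B?payload?=" token and
    // returns the input unchanged if it does not have that shape.
    public static string DecodeEncodedWord(string token)
    {
        string[] parts = token.Split('?');
        // Expected pieces: "=", charset, "B", payload, "="
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=" ||
            !parts[2].Equals("B", StringComparison.OrdinalIgnoreCase))
        {
            return token;
        }

        Encoding charset = Encoding.GetEncoding(parts[1]);   // e.g. "ISO-2022-JP"
        byte[] raw = Convert.FromBase64String(parts[3]);     // Base64 is undone first
        return charset.GetString(raw);                       // then the charset is applied
    }
}
```

The returned value is an ordinary .NET string, so it can be URL-encoded as UTF-8 for the status update without any further conversion.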

How do I get a string type of a hex value that represents an upper ascii value character

Part of our app parses RTF documents and we've come across a special character that is not translating well. When viewed in Word the character is an ellipsis (...), and it's encoded in the RTF as (\'85).
In our vb code we converted the hex (85) to int(133) and then did Chr(133) to return (...)
Here's the code in C# - problem is this doesn't work for values above 127. Any ideas?
Calling code :
// S is Hex number!!!
return Convert.ToChar(HexStringToInt(s)).ToString();
Helper method:
private static int HexStringToInt(string hexString)
{
int i;
try
{
i = Int32.Parse(hexString, NumberStyles.HexNumber);
}
catch (Exception ex)
{
throw new ApplicationException("Error trying to convert hex value: " + hexString, ex);
}
return i;
}
This looks like a character encoding issue to me. In Unicode, code points 128-159 are control characters rather than the printable characters of the "upper ASCII" code pages, so converting 133 directly will not produce the ellipsis.
You need to convert it to a character using the proper decoding first; Convert.ToChar appears to be using UTF-16.
Sometimes there's a manual bit manipulation hack to convert the character from upper ASCII to the appropriate unicode char, but since the ellipsis wasn't in most of the widely used extended ASCII codepages, that's unlikely to work here.
What you really want to do is use the Encoding.GetString(Byte[]) method, with the proper encoding. Put your value into a byte array, then GetString to get the C# native string for the character.
You can learn more about RTF character encodings on the RTF Wikipedia page.
FYI: The horizontal ellipsis is character U+2026 (pdf).
Your original code works perfectly fine for me. It is able to convert any hex value from 00 to FF into the appropriate character. Using VS2008.
private static int HexStringToInt(string hexString)
{
try
{
return Convert.ToChar(hexString);
}
catch (FormatException ex)
{
throw new ArgumentException("Is not a valid hex character.", "hexString", ex);
}
// Convert.ToChar() will throw an ArgumentException also
// if hexString is bad
}
My guess would be that a Char in .NET is actually two bytes (16 bits), as they are UTF-16 encoded. Maybe you are only catching/writing the first byte of the value?
Basically, are you doing something with the char value afterwards that assumes it is 8-bits instead of 16, and is therefore truncating it?
You are probably using the default character encoding when reading in the RTF file, which is UTF-8, when the RTF file is actually stored using the "windows-1252" extended ASCII latin encoding.
C# strings use a 16-bit-wide Unicode (UTF-16) character format. Translating windows-1252 character 0x85 to its Unicode equivalent involves a mapping table, since the code points (character numbers) are very different. Luckily, Windows can do the work for you.
You can change the way the characters are converted when reading in the text by explicitly specifying the source encoding when opening the stream.
using System.IO;
using System.Text;
using (TextReader tr = new StreamReader(path_to_RTF_file, Encoding.GetEncoding(1252)))
{
// Read from the file as usual.
}
Here's some rough code that should work for you:
// Convert hex number, which represents an RTF code-page escaped character,
// to the desired character (uses '85' from your example as a literal):
var number = int.Parse("85", System.Globalization.NumberStyles.HexNumber);
Debug.Assert(number <= byte.MaxValue);
byte[] bytes = new byte[1] { (byte)number };
char[] chars = Encoding.GetEncoding(1252).GetString(bytes).ToCharArray();
// or, use:
// char[] chars = Encoding.Default.GetString(bytes).ToCharArray();
string result = new string(chars);
Just use this function I modified (very slightly) from Chris' website:
private static string charScrubber(string content)
{
StringBuilder sbTemp = new StringBuilder(content.Length);
foreach (char currentChar in content)
{
if ((currentChar != 127 && currentChar > 1))
{
sbTemp.Append(currentChar);
}
}
content = sbTemp.ToString();
return content;
}
You can modify the currentChar condition to remove whichever characters need to be eliminated (as written here, the 0x00 and 0x01 characters and (char)127, i.e. 0x7F, are dropped).
ASCII/Hex table here: http://www.cs.mun.ca/~michael/c/ascii-table.html
Chris' site: http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
-Tom
