C# String confusion compared to Java

I'm a Java developer finding my way into C#, and I'm confused. I've read that the string type is immutable, which is not much different from Java, except that it doesn't seem to be an object as it is there. Regardless, I'm getting weird behavior. I have the following ToString method on a class:
public override string ToString()
{
    StringBuilder builder = new StringBuilder();
    builder.Append("BlockType: ");
    builder.Append(BlockType + "\n");
    //builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String("dHh0AA==")));
    //builder.Append("\n");
    builder.Append("BlockName: ");
    builder.Append(BlockName + "\n");
    //builder.Append(System.Text.ASCIIEncoding.ASCII.GetChars(Convert.FromBase64String(this.BlockName)));
    //builder.Append("\n");
    builder.Append("BlockLength: " + this.BlockLength + "\n");
    builder.Append("pBlockData: " + this.pBlockData + "\n");
    return builder.ToString();
}
When I fill it with data (bearing in mind that BlockType and BlockName contain Base64 strings), I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: dHh0AA==
BlockName: YXBwLW5hbWUAAAAAAAAAAA==
BlockLength: 11
pBlockData: System.Byte[]
This is exactly what I want. However, when I try to decode those Base64 values as ASCII (or UTF-8, I tried both), I get the following result:
FileVersionNo: 0
nx: 1024
ny: 512
TileSize: 256
HorizScale: 10
Precis: 0,01
ExtHeaderLength: 35
nExtHeaderBlocks: 1
pExtHeaderBlocks: System.Collections.Generic.LinkedList`1[LibFhz.HfzExtHeaderBlock]
BlockType: txt
The output just seems to stop, without an error or stack trace, and I have no idea what is going on. At first I thought a \0 was missing, so I added one to the string; then I thought I needed a \r\n. Neither was the solution. Searching online mostly turns up people asking how to do a Base64 to UTF-8 conversion, but that part seems easy; this silent stop isn't.
Any insights or links to decent articles about string handling in .NET would be appreciated.

I've had a look at what you get from this:
var test = Convert.FromBase64String("YXBwLW5hbWUAAAAAAAAAAA==");
var builder = new StringBuilder();
builder.Append(System.Text.Encoding.ASCII.GetChars(test));
The answer is the string "app-name" with a load of null (0) characters at the end.
You could try removing all the null characters by adding this line just before you return builder.ToString():
builder.Replace("\0", null);
That may or may not help, depending on what you're doing with the returned string.
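A likely explanation for the output appearing to stop is the trailing NUL itself: many consoles and UI controls treat '\0' as an end-of-text marker, even though a .NET string can hold it. A minimal sketch, assuming the decoded values only carry NULs as trailing padding (the class name is made up for illustration):

```csharp
using System;
using System.Text;

class NullTrimDemo
{
    static void Main()
    {
        // "dHh0AA==" is the BlockType value shown in the question.
        byte[] raw = Convert.FromBase64String("dHh0AA==");
        string decoded = Encoding.ASCII.GetString(raw);

        // The decoded string holds 't', 'x', 't' and one trailing '\0'.
        Console.WriteLine(decoded.Length);      // 4

        // Strip only the trailing NUL padding before appending to the builder.
        string cleaned = decoded.TrimEnd('\0');
        Console.WriteLine(cleaned);             // txt
        Console.WriteLine(cleaned.Length);      // 3
    }
}
```

TrimEnd('\0') leaves any NUL in the middle of the value alone, whereas the Replace call above removes every NUL in the builder; which one fits depends on the data.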

First
builder.Append("pBlockData: " + this.pBlockData + "\n");
Doesn't do what you think it does, specifically if pBlockData is a byte array you will get something like this (output from scriptcs):
> byte[] data = new byte[11];
> StringBuilder sb = new StringBuilder();
> sb.Append("data = ")
{Capacity:16,MaxCapacity:2147483647,Length:7}
> sb.Append(data);
{Capacity:32,MaxCapacity:2147483647,Length:20}
> sb.ToString()
data = System.Byte[]
Second, C# strings (.NET strings in general) are UTF-16, so the runtime doesn't know how to display raw bytes. It doesn't matter whether they are Base64-encoded, ASCII, or French pickles ;-) the runtime just treats a byte array as binary. Also, null termination is not required; the length of the string is kept as a property of the string object.
So you need to turn the byte array you have into a UTF-16 character array or string before you output it. If the byte array contains valid ASCII, you can look into 'System.Text.Encoding.ASCII.GetString' (or the lower-level 'System.Text.ASCIIEncoding.ASCII.GetDecoder().Convert' method) as one way to accomplish this.

Related

Unable to decode base64 while read the data from console C#

I'm facing an issue while encoding/decoding data with C#. When I hard-code certain Base64-encoded data, it decodes successfully, as below:
string encodedText = "eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VA\nZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1W\ncnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9j\nYWxob3N0OjMwMDAifQ==\n";
byte[] data = Convert.FromBase64String(encodedText);
string decodedString = Encoding.UTF8.GetString(data);
But when I read the same value from the console, decoding fails. For example:
string readLine = Console.ReadLine();
Console.WriteLine("Received Data :: " + readLine); // Exactly same data received here
byte[] encodedByte = Convert.FromBase64String(readLine); //Failed here?
string configData = System.Text.Encoding.UTF8.GetString(encodedByte);
The second snippet fails with the error message below:
Unhandled exception. System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
at System.Convert.FromBase64CharPtr(Char* inputPtr, Int32 inputLength)
at System.Convert.FromBase64String(String s)
Note:
I noticed that it works fine once I remove all the \n from the given string, but I'm not sure how to remove the \n programmatically. I tried the code below, but it does not work:
readLine = Regex.Replace(readLine, @"\t|\n|\r", String.Empty);
I also tried:
readLine = readLine.Replace("\n", String.Empty);
Any help would be much appreciated.

This does not work because text typed into the console is not processed for escape sequences: \n arrives as two separate characters, a backslash followed by n, which in C# source is written '\\n'.
Try this:
readLine = readLine.Replace("\\n", "");
//or
readLine = Regex.Replace(readLine, @"\\t|\\n|\\r", String.Empty);

Here is the original re-written to illustrate extra characters:
string encodedText =
"eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VA"
+ "\nZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1W"
+ "\ncnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9j"
+ "\nYWxob3N0OjMwMDAifQ=="
+ "\n";
The data that should be entered in the console is then:
eyJDb25uX0dyb3VwX0lEIjozMywiVXNlckVtYWlsIjoiVGVzdHNlcnZpc2VAZ21haWwuY29tIiwiVXNlclBhc3N3b3JkIjoib1ZkTEREWUVfX3FuSnZFSE1WcnR5WU5ZZzJSTnNzUnpaWG5KaFJMcCIsIkJhc2VVUkwiOiJodHRwOi8vbG9jYWxob3N0OjMwMDAifQ==
There is no encoding issue with this data once it is cleaned up, and it will be read correctly by Console.ReadLine if it is entered correctly. Try piping it in from a file if you are unable to paste it correctly.
The code from the literal works because of relaxed rules in that newline characters are ignored by Convert.FromBase64String. However, the translation (of "\n" to a literal newline) that occurs in a string literal does NOT occur when entered/read via the console.
Performing a translation of errant \n sequences that appear - read as two characters when typed in the console - would require code such as:
readLine = readLine.Replace("\\n", "");
// "\n".ToCharArray() -> { '\n' }, i.e. the single character 0x0A
// "\\n".ToCharArray() -> { '\\', 'n' }
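The difference between a real newline and a typed backslash-n can be sketched as follows. This is an illustration only: the helper name StripTypedEscapes and the short "Hello" payload are made up, and the string assigned to readLine simulates what Console.ReadLine() would return if the user pasted the source text of a literal.

```csharp
using System;
using System.Text;

class Base64ConsoleDemo
{
    // Hypothetical helper: removes backslash-n, backslash-r and backslash-t
    // sequences that were typed as two characters. A backslash is never part
    // of the Base64 alphabet, so this cannot damage valid payload characters.
    static string StripTypedEscapes(string s)
    {
        return s.Replace("\\n", "").Replace("\\r", "").Replace("\\t", "");
    }

    static void Main()
    {
        // Simulated console input: "Hello" in Base64, with typed "\n" sequences.
        string readLine = "SGVs\\nbG8=\\n";

        byte[] data = Convert.FromBase64String(StripTypedEscapes(readLine));
        Console.WriteLine(Encoding.UTF8.GetString(data)); // Hello
    }
}
```

Removing a bare 'n', 'r' or 't' would corrupt the data, since those letters are valid Base64 characters; only the two-character sequences are safe to drop.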

ulong.Parse(string, NumberStyles) Exception C#

I've been working on this "string to binary" method for longer than usual and I have no idea where I'm going wrong.
I have already searched the internet for a solution, but nothing seems to work the way it is supposed to.
public static string hexToBin(string strValue)
{
byte[] hexThis = ASCIIEncoding.ASCII.GetBytes(strValue.ToString());
string thiI = ToHex(strValue);
ulong number = UInt64.Parse(*string*, System.Globalization.NumberStyles.HexNumber);
byte[] bytes = BitConverter.GetBytes(number);
string binaryString = string.Empty;
foreach (byte singleByte in bytes)
{
binaryString += Convert.ToString(singleByte, 2);
}
return binaryString;
}
ToHex(string) takes a string and returns its hex representation.
But all I keep getting is "Input string was not in a correct format." at the ulong.Parse(string, NumberStyles) call. No matter what my inputs are, I keep getting the same FormatException.
The inputs and their outputs:
string: format exception - "Hello"
hex: format exception - "48 65 6C 6C 6F"
byte[]: format exception - { 72, 101, 108, 108, 111 }
I have also tried using the "Hello" string directly, but it threw the same error.
Could you please let me know what I'm doing wrong here?
I have also tried Clean/Build/Rebuild and restarting Visual Studio, but I keep getting the same format exception.
EDIT: I used UInt64.Parse(), not ulong.Parse(), and the string used is Hello without quotation marks.
EDIT #2:
I did this based on knittl's suggestion and used Convert.ToUInt64 instead of Parse, but I am still getting the same error:
ulong binary;
string binThis;
byte[] ByteThis;
binThis = "Hello";
ByteThis = ASCIIEncoding.ASCII.GetBytes(binThis);
binary = Convert.ToUInt64(ByteThis);
Console.WriteLine(binary);
The CurrentCulture is set to en-US and I'm also using an en-US keyboard.
EDIT #3 - Solved
Thanks to knittl.
The solution is as follows:
string thestring = "example";
byte[] ByteThis = ASCIIEncoding.ASCII.GetBytes(thestring);
string[] finale = new string[ByteThis.Length];
for (int i = 0; i < ByteThis.Length; i++)
{
    // Convert each byte to binary and left-pad it to a full 8 bits.
    finale[i] = Convert.ToString(ByteThis[i], 2).PadLeft(8, '0');
    Console.WriteLine(finale[i]);
}
The Console.WriteLine call is only there to check the result.
This question aimed to get the binary representation of a given string.

It's not totally clear what your method should do (i.e. what format the input string is in: is it a base-10 number, or already a hexadecimal number?).
If it's a hexadecimal number, use ulong.Parse(inputStr, NumberStyles.HexNumber). If not, simply use ulong.Parse(inputStr). Note that NumberStyles.HexNumber does not allow the 0x prefix (Convert.ToUInt64(inputStr, 16) does, however).
Then, once you have your input string parsed to a number, simply use Convert.ToString(number, 2) to convert to base2. You will notice that there is no overload which takes an ulong and an int, but you can simply cast your number to a (signed) long, since the binary representation will be identical between the two (cf. two's complement). So, in effect Convert.ToString((long)number, 2).
No need for complicated loops and conversions to byte arrays.
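The parse-then-convert approach described above can be sketched like this. It assumes the input really is a hexadecimal number without a 0x prefix; the method and class names are illustrative, not taken from the question.

```csharp
using System;
using System.Globalization;

class HexToBinDemo
{
    // Parses a hex string (no "0x" prefix) and returns its base-2 representation.
    static string HexToBin(string hex)
    {
        ulong number = ulong.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

        // Convert.ToString has no (ulong, int) overload, so reinterpret as long.
        // unchecked keeps this safe even if the project enables overflow checking.
        return Convert.ToString(unchecked((long)number), 2);
    }

    static void Main()
    {
        Console.WriteLine(HexToBin("48")); // 1001000
        Console.WriteLine(HexToBin("FF")); // 11111111
    }
}
```

A string such as "Hello" or "48 65 6C 6C 6F" still throws FormatException here, because neither is a single hexadecimal number; text has to be converted byte by byte instead, as in the bonus answer.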

Bonus answer.
If you are not too concerned with performance, you can even use a LINQ one-liner:
Encoding.ASCII.GetBytes(inputStr).Aggregate(
new StringBuilder(),
(sb, ch) => sb.Append(Convert.ToString(ch, 2).PadLeft(8, '0')),
sb => sb.ToString());

Hashing Query String containing Special Characters not working

I have posted a few questions about tokens and password resets and have finally managed to figure it all out. Thanks, everyone!
Before I had read that certain characters do not work in a query string, I decided to hash the query string, but as you've guessed, the plus signs are stripped out.
How do you secure or hash a query string?
This is a sample from a company email I received, and the string looks like this:
AweVZe-LujIAuh8i9HiXMCNDIRXfSZYv14o4KX0KywJAGlLklGC1hSw-bJWCYfia-pkBbessPNKtQQ&t=pr&ifl
In my setup, I am simply using a GUID. But does it matter?
In my scenario the user cannot access the password page, even without a GUID, because the page is set to redirect on load if the query string doesn't match the session variable.
Are there ways to handle a query string that give a result like the one above?
This question is more about acquiring knowledge.
UPDATE:
Here is the hash code:
public static string QueryStringHash(string input)
{
    byte[] inputBytes = Encoding.UTF8.GetBytes(input);
    SHA512Managed sha512 = new SHA512Managed();
    byte[] outputBytes = sha512.ComputeHash(inputBytes);
    return Convert.ToBase64String(outputBytes);
}
Then I store the hash of the UserID in the session before sending it as a query string.
On the next page, the hash in the session is not the same as the one in the query string, so the values don't match and the query string is treated as invalid.
Note: I created a class called Encryption that handles all the hashing and encryption.
Session["QueryString"] = Encryption.QueryStringHash(UserID);
Response.Redirect("~/public/reset-password.aspx?uprl=" +
HttpUtility.UrlEncode(Session["QueryString"].ToString()));
I also tried everything mentioned on this page but no luck:
How do I replace all the spaces with %20 in C#
Thanks for reading.

The problem is that base64 encoding uses the '+' and '/' characters, which have special meaning in URLs. If you want to base64 encode query parameters, you have to change those characters. Typically, that's done by replacing the '+' and '/' with '-' and '_' (dash and underscore), respectively, as specified in RFC 4648.
In your code, then, you'd do this:
public static string QueryStringHash(string input)
{
    byte[] inputBytes = Encoding.UTF8.GetBytes(input);
    SHA512Managed sha512 = new SHA512Managed();
    byte[] outputBytes = sha512.ComputeHash(inputBytes);
    string b64 = Convert.ToBase64String(outputBytes);
    b64 = b64.Replace('+', '-');
    return b64.Replace('/', '_');
}
On the receiving end, of course, you'll need to replace the '-' and '_' with the corresponding '+' and '/' before calling the method to convert from base 64.
They recommend not using the pad character ('='), but if you do, it should be URL encoded. There's no need to communicate the pad character if you always know how long your encoded strings are. You can add the required pad characters on the receiving end. But if you can have variable length strings, then you'll need the pad character.
Any time you see base 64 encoding used in query parameters, this is how it's done. It's all over the place, perhaps most commonly in YouTube video IDs.
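Both directions of this substitution, including restoring the pad characters on the receiving end, can be sketched as below. The Base64Url class and its method names are made up for illustration; this is one way to do it, not the only one.

```csharp
using System;

static class Base64Url
{
    // Encodes bytes with the URL-safe alphabet from RFC 4648 and drops the padding.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverses Encode: restores '+' and '/', then re-adds the '=' padding
    // that Convert.FromBase64String requires.
    public static byte[] Decode(string text)
    {
        string b64 = text.Replace('-', '+').Replace('_', '/');
        switch (b64.Length % 4)
        {
            case 2: b64 += "=="; break;
            case 3: b64 += "="; break;
        }
        return Convert.FromBase64String(b64);
    }
}

class Base64UrlDemo
{
    static void Main()
    {
        // These three bytes encode to "+//+" in standard Base64.
        byte[] original = { 0xFB, 0xFF, 0xFE };

        string token = Base64Url.Encode(original);
        Console.WriteLine(token);                                          // -__-
        Console.WriteLine(BitConverter.ToString(Base64Url.Decode(token))); // FB-FF-FE
    }
}
```

Because the token then contains only letters, digits, '-' and '_', it survives a query string without needing UrlEncode.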

I did something similar before, where I had to pass a hash in a query string. As you've experienced, Base64 can be pretty nasty when mixed with URLs, so I decided to pass it as a hex string instead. It's a little longer, but much easier to deal with. Here is how I did it:
First a method to transform binary into a hex string.
private static string GetHexFromData(byte[] bytes)
{
var output = new StringBuilder();
foreach (var b in bytes)
{
output.Append(b.ToString("X2"));
}
return output.ToString();
}
Then a reverse to convert a hex string back to binary.
private static byte[] GetDataFromHex(string hex)
{
var bytes = new List<byte>();
for (int i = 0; i < hex.Length; i += 2)
{
bytes.Add((byte)int.Parse(hex.Substring(i, 2), System.Globalization.NumberStyles.HexNumber));
}
return bytes.ToArray();
}
Alternatively, if you just need to verify that the hashes are the same, convert both to hex strings and compare the strings (case-insensitively). Hope this helps.

How to store UTF-8 bytes from a C# String in a SQL Server 2000 TEXT column

I have an existing SQL Server 2000 database that stores UTF-8 representations of text in a TEXT column. I don't have the option of modifying the type of the column, and must be able to store non-ASCII Unicode data from a C# program into that column.
Here's the code:
sqlcmd.CommandText =
"INSERT INTO Notes " +
"(UserID, LocationID, Note) " +
"VALUES (" +
Note.UserId.ToString() + ", " +
Note.LocationID.ToString() + ", " +
"@note); " +
"SELECT CAST(SCOPE_IDENTITY() AS BIGINT) ";
SqlParameter noteparam = new SqlParameter( "@note", System.Data.SqlDbType.Text, int.MaxValue );
At this point I've tried a few different ways to get my UTF-8 data into the parameter. For example:
// METHOD ONE
byte[] bytes = (byte[]) Encoding.UTF8.GetBytes( Note.Note );
char[] characters = bytes.Select( b => (char) b ).ToArray();
noteparam.Value = new String( characters );
I've also tried simply
// METHOD TWO
noteparam.Value = Note.Note;
And
// METHOD THREE
byte[] bytes = (byte[]) Encoding.UTF8.GetBytes( Note.Note );
noteparam.Value = bytes;
Continuing, here's the rest of the code:
sqlcmd.Parameters.Add( noteparam );
sqlcmd.Prepare();
try
{
Note.RecordId = (Int64) sqlcmd.ExecuteScalar();
}
catch
{
return false;
}
Method one (get UTF8 bytes into a string) does something strange -- I think it is UTF-8 encoding the string a second time.
Method two stores garbage.
Method three throws an exception in ExecuteScalar() claiming it can't convert the parameter to a String.
Things I already know, so no need telling me:
SQL Server 2000 is past/approaching end-of-life
TEXT columns are not meant for Unicode text
Seriously, SQL Server 2000 is old. You need to upgrade.
Any suggestions?

If your database collation is SQL_Latin1_General_CP1 (the default for the U.S. edition of SQL Server 2000), then you can use the following trick to store Unicode text as UTF-8 in a char, varchar, or text column:
byte[] bytes = Encoding.UTF8.GetBytes(Note.Note);
noteparam.Value = Encoding.GetEncoding(1252).GetString(bytes);
Later, when you want to read back the text, reverse the process:
SqlDataReader reader;
// ...
byte[] bytes = Encoding.GetEncoding(1252).GetBytes((string)reader["Note"]);
string note = Encoding.UTF8.GetString(bytes);
If your database collation is not SQL_Latin1_General_CP1, then you will need to replace 1252 with the correct code page.
Note: If you look at the stored text in Enterprise Manager or Query Analyzer, you'll see strange characters in place of non-ASCII text, just as if you opened a UTF-8 document in a text editor that didn't support Unicode.
How it works: When storing Unicode text in a non-Unicode column, SQL Server automatically converts the text from Unicode to the code page specified by the database collation. Any Unicode characters that don't exist in the target code page will be irreversibly mangled, which is why your first two methods didn't work.
But you were on the right track with method one. The missing step is to "protect" the raw UTF-8 bytes by converting them to Unicode using the Windows-1252 code page. Now, when SQL Server performs the automatic conversion from Unicode to Windows-1252, it gets back the original UTF-8 bytes untouched.

How to deal with ISO-2022-JP ( and other character sets ) in a Twitter update?

Part of my application accepts arbitrary text and posts it as an update to Twitter. Everything works fine until it comes to posting text in foreign (non ASCII/UTF7/8) character sets; then things no longer work.
For example, if someone posts:
に投稿できる
It ( within my code in Visual Studio debugger ) becomes:
=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=
Googling has told me that this represents ( minus ? as delimiters )
=?ISO-2022-JP is the text encoding
?B means it is base64 encoded
?GyRCJEtFajlGJEckLSRrGyhC? Is the encoded string
For the life of me, I can't figure out how to get this string posted as an update to Twitter in its original Japanese characters. As it stands now, sending '=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=' to Twitter results in exactly that getting posted. I've also tried breaking the string up into pieces as above, using System.Text.Encoding to convert to UTF8 from ISO-2022-JP and vice versa, both Base64-decoded and not. Additionally, I've played around with the URL encoding of the status update like this:
string[] bits = tweetText.Split(new char[] { '?' });
if (bits.Length >= 4)
{
textEncoding = System.Text.Encoding.GetEncoding(bits[1]);
xml = oAuth.oAuthWebRequest(TwitterLibrary.oAuthTwitter.Method.POST, url, "status=" + System.Web.HttpUtility.UrlEncode(decodedText, textEncoding));
}
No matter what I do, the results never end up back to normal.
EDIT:
Got it in the end. For those following at home, the fix was pretty close to the answer listed below. The Visual Studio debugger was steering me the wrong way, and there was a bug in the Twitter library I was using. The end result was this:
decodedText = textEncoding.GetString(System.Convert.FromBase64String(bits[3]));
byte[] originalBytes = textEncoding.GetBytes(decodedText);
byte[] utfBytes = System.Text.Encoding.Convert(textEncoding, System.Text.Encoding.UTF8, originalBytes);
// now, back to string form
decodedText = System.Text.Encoding.UTF8.GetString(utfBytes);
Thanks all.

This produced the output you are looking for:
using System;
using System.Text;
class Program {
    static void Main(string[] args) {
        string input = "に投稿できる";
        Console.WriteLine(EncodeTwit(input));
        Console.ReadLine();
    }
    public static string EncodeTwit(string txt) {
        var enc = Encoding.GetEncoding("iso-2022-jp");
        byte[] bytes = enc.GetBytes(txt);
        // Convert.ToBase64String sizes its own output, so no manual buffer math is needed.
        return "=?ISO-2022-JP?B?" + Convert.ToBase64String(bytes) + "?=";
    }
}
Standards are great, there are so many to choose from. ISO never disappoints, there are no less than 3 ISO-2022-JP encodings. If you have trouble then also try encodings 50221 and 50222.

Your understanding of how the text is encoded seems correct. In Python 2,
'GyRCJEtFajlGJEckLSRrGyhC'.decode('base64').decode('ISO-2022-JP')
returns the correct Unicode string. Note that you need to decode the Base64 first in order to get the ISO-2022-JP-encoded text.
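The same two-step decode (Base64 first, then the named charset) can be sketched in C#. The EncodedWord helper is hypothetical and handles only a single Base64 ("B") token of the shape shown in the question. It also assumes the ISO-2022-JP encoding is available to the runtime: on .NET Framework it is built in, while on .NET Core and later it is expected to need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.

```csharp
using System;
using System.Text;

static class EncodedWord
{
    // Hypothetical helper: decodes one "=?charset?B?payload?=" token.
    public static string Decode(string token)
    {
        // Splitting on '?' should yield: "=", charset, "B", payload, "=".
        string[] bits = token.Split('?');
        if (bits.Length != 5 || bits[0] != "=" || bits[4] != "=" ||
            !bits[2].Equals("B", StringComparison.OrdinalIgnoreCase))
        {
            return token; // not a Base64 encoded-word; leave it untouched
        }

        Encoding charset = Encoding.GetEncoding(bits[1]);
        byte[] raw = Convert.FromBase64String(bits[3]); // step 1: undo Base64
        return charset.GetString(raw);                  // step 2: bytes -> string
    }
}

class EncodedWordDemo
{
    static void Main()
    {
        // The token from the question.
        string decoded = EncodedWord.Decode("=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=");
        Console.WriteLine(decoded);
    }
}
```

Once the text is a .NET string it is already UTF-16 internally, so it can be URL-encoded as UTF-8 for the request without a further charset conversion.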
