How to encode to get the desired result? - c#

Original String is:
csrfToken=ajax:1238044988226892967&postTitle=Job Openings Linux Systems Administrator
Staff&postText=Security Clearance: Public Trust -- Linux systems administration experience specifically in managing or supporting RedHat and/or Centos Linux in...&pollChoice1-
ANetPostForm=&pollChoice2-ANetPostForm=&pollChoice3-ANetPostForm=&pollChoice4-ANetPostForm=&pollChoice5-ANetPostForm=&pollEndDate-ANetPostForm=0&contentImageCount=1&contentImageIndex=0&
contentImage=http://www.ideal-jobs.net/images/image070.jpg&contentEntityID=5637974394992087135&contentUrl=http%3a%2f%2fwww.ideal-jobs.net%2fjob-openings-
linux-systems-administrator-staff%2f&contentTitle=Job Openings Linux Systems Administrator
Staff&contentSummary=Security Clearance: Public Trust -- Linux systems administration experience specifically in managing or supporting RedHat and/or Centos Linux in...&contentImageIncluded=true&%23=Save&tweet=&postItem=Share&gid=50565&ajax=true&tetherAccountID=&facebookTetherID=
String i want it to be like after encoding:
csrfToken=ajax%3A6293994705950333071&postTitle=hello&postText=Hi%20everyone%20hae%20a%20good%20day%20%2C%20i%20am%20new%20to%20this%20%3A)&
pollChoice1-ANetPostForm=&pollChoice2-ANetPostForm=&pollChoice3-ANetPostForm=&pollChoice4-
ANetPostForm=&pollChoice5-ANetPostForm=&pollEndDate-ANetPostForm=0&contentImageCount=0&contentImageIndex=-1&contentImage=&contentEntityID=&contentUrl=&contentTitle=&
contentSummary=&contentImageIncluded=true&%23=&gid=163857&postItem=&ajax=true&tetherAccountID=&facebookTetherID=
And currently i am using :
byte[] byteData = HttpUtility.UrlEncodeToBytes(postData);
and i am getting the string (i see in fiddler) like :
csrfToken%3dajax%3a1238044988226892967%26postTitle%3dJob+Openings+Linux+Systems+Administrat
or+Staff%26postText%3dSecurity+Clearance%3a+Public+Trust+--+Linux+systems+administration+ex
perience+specifically+in+managing+or+supporting+RedHat+and%2for+Centos+Linux+in...%26pollCh
oice1-ANetPostForm%3d%26pollChoice2-ANetPostForm%3d%26pollChoice3-ANetPostForm
%3d%26pollChoice4-ANetPostForm%3d%26pollChoice5-ANetPostForm%3d%26pollEndDate-
ANetPostForm%3d0%26contentImageCount%3d1%26contentImageIndex%3d0%26contentImage%3dhttp
%3a%2f%2fwww.ideal-
jobs.net%2fimages%2fimage070.jpg%26contentEntityID%3d5637974394992087135%26contentUrl%3dhtt
p%253a%252f%252fwww.ideal-jobs.net%252fjob-openings-linux-systems-administrator-
staff%252f%26contentTitle%3dJob+Openings+Linux+Systems+Administrator+Staff%26contentSummary
%3dSecurity+Clearance%3a+Public+Trust+--+Linux+systems+administration+experience+specifical
ly+in+managing+or+supporting+RedHat+and%2for+Centos+Linux+in...%26contentImageIncluded%3dtr
ue%26%2523%3dSave%26tweet
%3d%26postItem%3dShare%26gid%3d50565%26ajax%3dtrue%26tetherAccountID
%3d%26facebookTetherID%3d
ALSO TRIED:
UTF8Encoding encoding = new UTF8Encoding();
AND
byte[] byteData = HttpUtility.UrlEncodeUnicodeToBytes(postData);
Still no luck..
Thank you

The problem is you're URL encoding the whole string, including the delimiter characters & and =. You first need to parse the string into fields, then url encode just the field names and values and finally recombine into a string.
Give this a try:
string input; // Your input string
List<string> outputs = new List<string>();
// Parse the original string
NameValueCollection parms = HttpUtility.ParseQueryString(input);
// Loop over each item, url encoding
foreach (string key in parms.AllKeys) {
foreach (string val in parms.GetValues(key))
outputs.Add(HttpUtility.UrlEncode(key) + "=" + HttpUtility.UrlEncode(val));
}
// combine the encoded strings, joining with &
string result = string.Join("&", outputs); // the final result
EDIT
Here is a simpler version I figured out while trying out my previous idea:
string result = HttpUtility.ParseQueryString(postData).ToString();

Related

Hashing Query String containing Special Characters not working

I have posted few questions about Tokens and Password reset and have managed to finally figure this all out. Thanks everyone!
So before reading that certain characters will not work in a query string, I decided to hash the query string but as you've guessed, the plus signs are stripped out.
How do you secure or hash a query string?
This is a sample from a company email I received and the string looks like this:
AweVZe-LujIAuh8i9HiXMCNDIRXfSZYv14o4KX0KywJAGlLklGC1hSw-bJWCYfia-pkBbessPNKtQQ&t=pr&ifl
In my setup, I am simply using a GUID. But does it matter?
In my scenario the user cannot access the password page, even without a GIUD. That's because the page is set to redirect onload if the query string don't match the session variable?
Are there ways to handle query string to give the result like above?
This question is more about acquiring knowledge.
UPDATE:
Here is the Hash Code:
public static string QueryStringHash(string input)
{
byte[] inputBytes = Encoding.UTF8.GetBytes();
SHA512Managed sha512 = new SHA512Managed();
byte[] outputBytes = sha512.ComputeHash(inputBytes);
return Convert.ToBase64String(outputBytes);
}
Then I pass the HASH (UserID) to a SESSION before sending it as a query string:
On the next page, the Session HASH is not the same as the Query which cause the values not to match and rendered the query string invalid.
Note: I created a Class called Encryption that handles all the Hash and Encryption.
Session["QueryString"] = Encryption.QueryStringHash(UserID);
Response.Redirect("~/public/reset-password.aspx?uprl=" +
HttpUtility.UrlEncode(Session["QueryString"].ToString()));
I also tried everything mentioned on this page but no luck:
How do I replace all the spaces with %20 in C#
Thanks for reading.
The problem is that base64 encoding uses the '+' and '/' characters, which have special meaning in URLs. If you want to base64 encode query parameters, you have to change those characters. Typically, that's done by replacing the '+' and '/' with '-' and '_' (dash and underscore), respectively, as specified in RFC 4648.
In your code, then, you'd do this:
public static string QueryStringHash(string input)
{
byte[] inputBytes = Encoding.UTF8.GetBytes();
SHA512Managed sha512 = new SHA512Managed();
byte[] outputBytes = sha512.ComputeHash(inputBytes);
string b64 = Convert.ToBase64String(outputBytes);
b64 = b64.Replace('+', '-');
return b64.Replace('/', '_');
}
On the receiving end, of course, you'll need to replace the '-' and '_' with the corresponding '+' and '/' before calling the method to convert from base 64.
They recommend not using the pad character ('='), but if you do, it should be URL encoded. There's no need to communicate the pad character if you always know how long your encoded strings are. You can add the required pad characters on the receiving end. But if you can have variable length strings, then you'll need the pad character.
Any time you see base 64 encoding used in query parameters, this is how it's done. It's all over the place, perhaps most commonly in YouTube video IDs.
I did something before where I had to pass a hash in a query string. As you've experienced Base 64 can be pretty nasty when mixed with URLs so I decided to pass it as a hex string instead. Its a little longer, but much easier to deal with. Here is how I did it:
First a method to transform binary into a hex string.
private static string GetHexFromData(byte[] bytes)
{
var output = new StringBuilder();
foreach (var b in bytes)
{
output.Append(b.ToString("X2"));
}
return output.ToString();
}
Then a reverse to convert a hex string back to binary.
private static byte[] GetDataFromHex(string hex)
{
var bytes = new List<byte>();
for (int i = 0; i < hex.Length; i += 2)
{
bytes.Add((byte)int.Parse(hex.Substring(i, 2), System.Globalization.NumberStyles.HexNumber));
}
return bytes.ToArray();
}
Alternatively if you just need to verify the hashes are the same, just convert both to hex strings and compare the strings (case-insensitive). hope this helps.

Encoding not converting

An ASP.NET page (ashx) receives a GET request with a UTF8 string. It reads a SqlServer database with Windows-1255 data.
I can't seem to get them to work together. I've used information gathered on SO (mainly Convert a string's character encoding from windows-1252 to utf-8) as well as msdn on the subject.
When I run anything through the functions below - it always ends up the same as it started - not converted at all.
Is something done wrong?
EDIT
What I'm specifically trying to do (getData returns a Dictionary<int, string>):
getData().Where(a => a.Value.Contains(context.Request.QueryString["q"]))
Result is empty, unless I send a "neutral" character such as "'" or ",".
CODE
string windows1255FromUTF8(string p)
{
Encoding win = Encoding.GetEncoding(1255);
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(p);
byte[] winBytes = Encoding.Convert(utf8, win, utfBytes);
return win.GetString(winBytes);
}
string UTF8FromWindows1255(string p)
{
Encoding win = Encoding.GetEncoding(1255);
Encoding utf8 = Encoding.UTF8;
byte[] winBytes = win.GetBytes(p);
byte[] utfBytes = Encoding.Convert(win, utf8, winBytes);
return utf8.GetString(utfBytes);
}
There is nothing wrong with the functions, they are simply useless.
What the functions do is to encode the strings into bytes, convert the data from one encoding to another, then decode the bytes back to a string. Unless the string contains a character that is not possible to encode using the windows-1255 encoding, the returned value should be identical to the input.
Strings in .NET doesn't have an encoding. If you get a string from a source where the text was encoded using for example UTF-8, once it's decoded into a string it doesn't have that encoding any more. You don't have to do anyting to a string to use it when the destination has a specific encoding, whatever library you are using that takes the string will take care of the encoding.
For some reason this worked:
byte[] fromBytes = (fromEncoding.UTF8).GetBytes(myString);
string finalString = (Encoding.GetEncoding(1255)).GetString(fromBytes);
Switching encoding without the conversion...

How to deal with ISO-2022-JP ( and other character sets ) in a Twitter update?

Part of my application accepts arbitrary text and posts it as an Update to Twitter. Everything works fine, until it comes to posting foreign ( non ASCII/UTF7/8 ) character sets, then things no longer work.
For example, if someone posts:
に投稿できる
It ( within my code in Visual Studio debugger ) becomes:
=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=
Googling has told me that this represents ( minus ? as delimiters )
=?ISO-2022-JP is the text encoding
?B means it is base64 encoded
?GyRCJEtFajlGJEckLSRrGyhC? Is the encoded string
For the life of me, I can't figure out how to get this string posted as an update to Twitter in it's original Japanese characters. As it stands now, sending '=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=' to Twitter will result in exactly that getting posted. Ive also tried breaking the string up into pieces as above, using System.Text.Encoding to convert to UTF8 from ISO-2022-JP and vice versa, base64 decoded and not. Additionally, ive played around with the URL Encoding of the status update like this:
string[] bits = tweetText.Split(new char[] { '?' });
if (bits.Length >= 4)
{
textEncoding = System.Text.Encoding.GetEncoding(bits[1]);
xml = oAuth.oAuthWebRequest(TwitterLibrary.oAuthTwitter.Method.POST, url, "status=" + System.Web.HttpUtility.UrlEncode(decodedText, textEncoding));
}
No matter what I do, the results never end up back to normal.
EDIT:
Got it in the end. For those following at home, it was pretty close to the answer listed below in the end. It was just Visual Studios debugger was steering me the wrong way and a bug in the Twitter Library I was using. End result was this:
decodedText = textEncoding.GetString(System.Convert.FromBase64String(bits[3]));
byte[] originalBytes = textEncoding.GetBytes(decodedText);
byte[] utfBytes = System.Text.Encoding.Convert(textEncoding, System.Text.Encoding.UTF8, originalBytes);
// now, back to string form
decodedText = System.Text.Encoding.UTF8.GetString(utfBytes);
Thanks all.
This produced the output you are looking for:
using System;
using System.Text;
class Program {
static void Main(string[] args) {
string input = "に投稿できる";
Console.WriteLine(EncodeTwit(input));
Console.ReadLine();
}
public static string EncodeTwit(string txt) {
var enc = Encoding.GetEncoding("iso-2022-jp");
byte[] bytes = enc.GetBytes(txt);
char[] chars = new char[(bytes.Length * 3 + 1) / 2];
int len = Convert.ToBase64CharArray(bytes, 0, bytes.Length, chars, 0);
return "=?ISO-2022-JP?B?" + new string(chars, 0, len) + "?=";
}
}
Standards are great, there are so many to choose from. ISO never disappoints, there are no less than 3 ISO-2022-JP encodings. If you have trouble then also try encodings 50221 and 50222.
Your understanding of how the text is encoded seems correct. In python
'GyRCJEtFajlGJEckLSRrGyhC'.decode('base64').decode('ISO-2022-JP')
returns the correct unicode string. Note that you need to decode base64 first in order to get the ISO-2022-JP-encoded text.

Encoding Conversion problem

I've got a little problem changing the ecoding of a string. Actually I read from a DB strings that are encoded using the codepage 850 and I have to prepare them in order to be suitable for an interoperable WCF service.
From the DB I read characters \x10 and \x11 (triangular shapes) and i want to convert them to the Unicode format in order to prevent serialization/deserialization problem during WCF call. (Chars
and are not valid according of the XML specs even if WCF serialize them).
Now, I use following code in order to covert string encoding, but nothing happens. Result string is in fact identical to the original one.
I'm probably missing something...
Please help me!!!
Emanuele
static class UnicodeEncodingExtension
{
public static string Convert(this Encoding sourceEncoding, Encoding targetEncoding, string value)
{
string reEncodedString = null;
byte[] sourceBytes = sourceEncoding.GetBytes(value);
byte[] targetBytes = Encoding.Convert(sourceEncoding, targetEncoding, sourceBytes);
reEncodedString = sourceEncoding.GetString(targetBytes);
return reEncodedString;
}
}
class Program
{
private static Encoding Cp850Encoding = Encoding.GetEncoding(850);
private static Encoding UnicodeEncoding = Encoding.UTF8;
static void Main(string[] args)
{
string value;
string resultValue;
value = "\x10";
resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);
value = "\x11";
resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);
value = "\u25b6";
resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);
value = "\u25c0";
resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);
}
}
It seems you think there is a problem based on an incorrect understanding. But jmservera is correct - all strings in .NET are encoded internally as unicode.
You didn't say exactly what you want to accomplish. Are you experiencing a problem at the other end of the wire?
Just FYI, you can set the text encoding on a WCF binding with the textMessageEncoding element in the config file.
I suspect this line may be your culprit
reEncodedString = sourceEncoding.GetString(targetBytes);
which seems to take your target encoded string of bytes and asks your sourceEncoding to make a string out of them. I've not had a chance to verify it but I suspect the following might be better
reEncodedString = targetEncoding.GetString(targetBytes);
All the strings stored in string are in fact Unicode.Unicode. Read: Strings in .Net and C# and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Edit: I suppose that you want the Convert function to automatically change \x11 to \u25c0, but the problem here is that \x11 is valid in almost any encoding, the differences usually start in character \x80, so the Convert function will maintain it even if you do that:
string reEncodedString = null;
byte[] unicodeBytes = UnicodeEncoding.Unicode.GetBytes(value);
byte[] sourceBytes = Encoding.Convert(Encoding.Unicode,
sourceEncoding, unicodeBytes);
You can see in unicode.org the mappings from CP850 to Unicode. So, for this conversion to happen you will have to change these characters manually.
byte[] sourceBytes =Encoding.Default.GetBytes(value)
Encoding.UTF8.GetString(sourceBytes)
this sequence usefull for download unicode file from service(for example xml file that contain persian character)
You should try this:
byte[] sourceBytes = sourceEncoding.GetBytes(value);
var convertedString = Encoding.UTF8.GetString(sourceBytes);

How to convert (transliterate) a string from utf8 to ASCII (single byte) in c#?

I have a string object
"with multiple characters and even special characters"
I am trying to use
UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();
objects in order to convert that string to ascii. May I ask someone to bring some light to this simple task, that is hunting my afternoon.
EDIT 1:
What we are trying to accomplish is getting rid of special characters like some of the special windows apostrophes. The code that I posted below as an answer will not take care of that. Basically
O'Brian will become O?Brian. where ' is one of the special apostrophes
This was in response to your other question, that looks like it's been deleted....the point still stands.
Looks like a classic Unicode to ASCII issue. The trick would be to find where it's happening.
.NET works fine with Unicode, assuming it's told it's Unicode to begin with (or left at the default).
My guess is that your receiving app can't handle it. So, I'd probably use the ASCIIEncoder with an EncoderReplacementFallback with String.Empty:
using System.Text;
string inputString = GetInput();
var encoder = ASCIIEncoding.GetEncoder();
encoder.Fallback = new EncoderReplacementFallback(string.Empty);
byte[] bAsciiString = encoder.GetBytes(inputString);
// Do something with bytes...
// can write to a file as is
File.WriteAllBytes(FILE_NAME, bAsciiString);
// or turn back into a "clean" string
string cleanString = ASCIIEncoding.GetString(bAsciiString);
// since the offending bytes have been removed, can use default encoding as well
Assert.AreEqual(cleanString, Default.GetString(bAsciiString));
Of course, in the old days, we'd just loop though and remove any chars greater than 127...well, those of us in the US at least. ;)
I was able to figure it out. In case someone wants to know below the code that worked for me:
ASCIIEncoding ascii = new ASCIIEncoding();
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);
Let me know if there is a simpler way o doing it.
For anyone who likes Extension methods, this one does the trick for us.
using System.Text;
namespace System
{
public static class StringExtension
{
private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();
public static string ToAscii(this string dirty)
{
byte[] bytes = asciiEncoding.GetBytes(dirty);
string clean = asciiEncoding.GetString(bytes);
return clean;
}
}
}
(System namespace so it's available pretty much automatically for all of our strings.)
Based on Mark's answer above (and Geo's comment), I created a two liner version to remove all ASCII exception cases from a string. Provided for people searching for this answer (as I did).
using System.Text;
// Create encoder with a replacing encoder fallback
var encoder = ASCIIEncoding.GetEncoding("us-ascii",
new EncoderReplacementFallback(string.Empty),
new DecoderExceptionFallback());
string cleanString = encoder.GetString(encoder.GetBytes(dirtyString));
If you want 8 bit representation of characters that used in many encoding, this may help you.
You must change variable targetEncoding to whatever encoding you want.
Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;
var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);

Categories