How to get correct string text? - c#

I'm trying to obtain the correct unicode characters represented by this string:
string originalString = "\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081";
If I hard-code it in the cs file, I can see in debug mode that it shows the correct characters, but if I have the exact string written in a file and I try to read it, it shows the string as it is in the file.
TextReader tr = new StreamReader("c:\\test.txt");
string tmpString = tr.ReadLine();
tr.Close();
byte[] array = Encoding.Unicode.GetBytes(tmpString );
string finalResult = Encoding.Unicode.GetString(array);
How can I make the finalResult string have the correct unicode characters?
Thanks in advance
Gonçalo
EDIT: Already tried placing
TextReader tr = new StreamReader("c:\\test.txt",Encoding.Unicode);
but the characters are different from the correct ones.

Does your file actually contain the content:
\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00
\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300
\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00
\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600
\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900
\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081
If so, you need to convert each escape sequence to its corresponding Unicode character:
string originalString = "\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081";
string tmpString = "\\u0605\\u04c3\\u5000\\u0000\\u5000\\ufd00\\u4400\\ud500\\u7600\\ud300\\u4f00\\ubc00\\u0c00\\u2d00\\u4000\\ue400\\u0e00\\u7400\\u4800\\ub700\\u1d00\\u1300\\ue900\\u6000\\u4c00\\ufb00\\u9900\\u3900\\ud900\\u6700\\uae00\\ueb00\\u8f00\\u2800\\u0200\\ub300\\u5c00\\ufe00\\u0100\\u3d00\\u9100\\u3000\\u0300\\u1600\\u0100\\u7000\\u6200\\u8e00\\u1d00\\u8e00\\u6200\\ua900\\u6300\\uc800\\u0900\\ub700\\ub000\\u6000\\ue400\\u9200\\u3f00\\u9100\\u8d00\\uef00\\u3600\\u0100\\u9e00\\u0081";
string finalResult = Regex.Replace(tmpString, @"\\u(....)", match => ((char)int.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)).ToString());
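The same idea can be wrapped in a small helper. This is only a sketch: the class name `UnicodeEscapes` and the method name `Decode` are made up for illustration, and the pattern is tightened to exactly four hex digits (an assumption about the file's format) so that stray text after a `\u` is left untouched.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnicodeEscapes
{
    // Matches a backslash, the letter 'u', and exactly four hex digits.
    static readonly Regex Escape = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Replaces each \uXXXX escape with the UTF-16 code unit it names.
    // Text that does not match the pattern is copied through unchanged.
    public static string Decode(string text)
    {
        return Escape.Replace(text, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

With this in place, a line read from the file could be converted with `UnicodeEscapes.Decode(tr.ReadLine())`.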

You can pass the encoding as a parameter when reading the file:
TextReader tr = new StreamReader("c:\\test.txt",Encoding.Unicode);
string unicode_string = tr.ReadLine();

Try something like:
TextReader streamReader = new StreamReader("c:\\test.txt");
string input = streamReader.ReadLine();
string[] chars = input.Split(new char[] { '\\', 'u' },
StringSplitOptions.RemoveEmptyEntries);
streamReader.Close();
string answer = string.Empty;
foreach (string character in chars)
{
byte byte1 = byte.Parse(string.Format("{0}{1}",
character[0], character[1]), NumberStyles.AllowHexSpecifier);
byte byte2 = byte.Parse(string.Format("{0}{1}",
character[2], character[3]), NumberStyles.AllowHexSpecifier);
answer += Encoding.Unicode.GetString(new byte[] { byte2, byte1 });
}

Related

C# - How can I get the key and value from strings of xaml resourcedictionary format?

I read the XAML file with a BinaryReader and now have its contents as a string, like this:
<System:String x:Key="ABC">AAAA</System:String>
.........................many
<System:String x:Key="ZZZ">ASDQWE</System:String>
I want to get ABC and AAAA as strings.
Is there a parser or method that can extract a list like this?
This is the code that reads the XAML file into a string:
BinaryReader b = new BinaryReader(file.InputStream);
byte[] binData = b.ReadBytes((int)file.InputStream.Length);
string result = System.Text.Encoding.UTF8.GetString(binData);
A XAML file is an XML file, so you can use LINQ to XML to select the nodes and their values.
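A minimal sketch of that LINQ to XML approach follows. It assumes the entries sit inside a well-formed resource dictionary that declares the standard XAML `x` namespace; the class name `XamlStrings` and the method name `Parse` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class XamlStrings
{
    // The XAML language namespace, which defines the x:Key attribute.
    static readonly XNamespace X = "http://schemas.microsoft.com/winfx/2006/xaml";

    // Returns a key -> value map for every <System:String x:Key="..."> element.
    public static Dictionary<string, string> Parse(string xaml)
    {
        XDocument doc = XDocument.Parse(xaml);
        return doc.Descendants()
            .Where(e => e.Name.LocalName == "String" && e.Attribute(X + "Key") != null)
            .ToDictionary(e => (string)e.Attribute(X + "Key"), e => e.Value.Trim());
    }
}
```

If the string was built with `Encoding.UTF8.GetString` from a file that starts with a byte order mark, `XDocument.Parse` may reject it; loading directly from the stream with `XDocument.Load(file.InputStream)` avoids that step.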
I read the file line by line and extracted it using regular expressions.
StreamReader reader = new StreamReader(file.InputStream);
string textLine = reader.ReadLine();
string key = GetKeyStringFromXaml(textLine);
string val = GetValStringFromXaml(textLine);
public string GetKeyStringFromXaml(string textLine) {
Regex regex = new Regex("<System:String x:Key=\"(.*)\">(.*)</System:String>");
var v = regex.Match(textLine);
return v.Groups[1].ToString();
}
public string GetValStringFromXaml(string textLine) {
Regex regex = new Regex("<System:String x:Key=\"(.*)\">(.*)</System:String>");
var v = regex.Match(textLine);
return v.Groups[2].ToString();
}

Finding ® in a string of text

Let me rephrase my question:
I am reading in text where one of the characters is the registered symbol, ®, from a text file that has no problem displaying the symbol. When I try to print the string after reading it from the file, the symbol is an unprintable character. When I read in the string and split the string to characters and convert the character to an Int16 and print out the hex, I get 0xFFFD. I specify Encoding.UTF8 when I open the StreamReader.
Here is what I have
using (System.IO.StreamReader sr = new System.IO.StreamReader(HttpContext.Current.Server.MapPath("~/App_Code/Hormel") + "/nutrition_data.txt", System.Text.Encoding.UTF8))
{
string line;
while((line = sr.ReadLine()) != null)
{
//after spliting the file on '~'
items[i] = scrubData(utf8.GetString(utf8.GetBytes(items[i].ToCharArray())));
//items[i] = scrubData(items[i]); //original
}
}
Here is the scrubData function
private String scrubData(string data)
{
string newStr = String.Empty;
try
{
if (data.Contains("HORMEL"))
{
string[] s = data.Split(' ');
foreach(string str in s)
{
if (str.Contains("HORMEL"))
{
char[] ch = str.ToCharArray();
for(int i=0; i<ch.Length; i++)
{
EventLogProvider.LogInformation("LoadNutritionInfoTask", "Test", ch[i] + " = " + String.Format("{0:X}", Convert.ToInt16(ch[i])));
}
}
}
}
return String.Empty;
}
catch (Exception ex)
{
EventLogProvider.LogInformation("LoadNutritionInfoTask", "ScrubData", ex.Message);
return data;
}
}
I'm not concerned with what is being returned right now, I am printing out the characters and the hex codes that correspond to them.
First, you need to make sure you're reading the text with the correct encoding. It appears to me that you are using UTF-8, since you say ® (Unicode code point U+00AE) is 0xC2AE, which is the same as UTF-8. You can use that like:
Encoding.UTF8.GetString(new byte[] { 0xc2, 0xae }) // "®", the registered symbol
// or
using (var streamReader = new StreamReader(file, Encoding.UTF8))
Once you've got it as a string in C#, you should use HttpUtility.HtmlEncode to encode it as HTML. E.g.
HttpUtility.HtmlEncode("SomeStuff®") // result is "SomeStuff&#174;"
Check which encoding you are using to decode the bytes.
Try this:
string txt = "textwithsymbol";
string html = "<html></html>";
txt = txt.Replace("\u00ae", html);
Obviously you would replace the txt variable with the text you have read in and "\u00ae" is the symbol you are looking for.
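The 0xFFFD reported in the question is the Unicode replacement character, which the UTF-8 decoder emits for bytes it cannot interpret. One possible cause, which the question does not confirm, is that the file stores ® as the single byte 0xAE (as in ISO-8859-1 or Windows-1252) rather than as the UTF-8 pair 0xC2 0xAE. The sketch below decodes both byte sequences; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class RegisteredSignDemo
{
    // Decodes the bytes as UTF-8; invalid sequences become U+FFFD.
    public static string DecodeAsUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Decodes the bytes as ISO-8859-1, where every byte maps to one character.
    public static string DecodeAsLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

If the file turns out to be in a single-byte encoding, passing that encoding to the StreamReader constructor instead of Encoding.UTF8 should preserve the symbol.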

MemoryStream to string[]

I read the content of a CSV file from a zip file in memory (the requirement is not to write to disk) into a MemoryStream, and use the following code to get a human-readable string:
string result = Encoding.ASCII.GetString(memoryStream.ToArray());
However, we would like the result to be a string[] to map each row in the CSV file.
Is there a way to handle this automatically?
Thanks
Firstly, there's no need to call ToArray on the memory stream. Just use a StreamReader, and call ReadLine() repeatedly:
memoryStream.Position = 0; // Rewind!
List<string> rows = new List<string>();
// Are you *sure* you want ASCII?
using (var reader = new StreamReader(memoryStream, Encoding.ASCII))
{
string line;
while ((line = reader.ReadLine()) != null)
{
rows.Add(line);
}
}
You can use the Split method to split the string on newlines:
string[] result = Encoding.ASCII.GetString(memoryStream.ToArray())
.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
Depending on the contents of your CSV file, this can be a much harder problem than you're giving it credit for.
Assume this is your CSV:
id, data1, data2
1, some data, more data
2, "This element has a new line
right in the middle of the field", and that can create problems if you're reading line by line
If you simply read this in line by line with reader.ReadLine(), you won't get what you want when a quoted field contains a newline (which is generally allowed in CSVs). You need something more like this:
List<string> results = new List<string>();
StringBuilder nextRow = new StringBuilder();
bool inQuote = false;
int next;
while ((next = reader.Read()) != -1) { // assumes reader is a TextReader; Read() returns -1 at end of stream
char nextChar = (char)next;
if (nextChar == '"') {
inQuote = !inQuote; // quote characters toggle the state and are dropped from the output
} else if (!inQuote && nextChar == '\n') {
results.Add(nextRow.ToString());
nextRow.Length = 0;
} else { nextRow.Append(nextChar); }
}
if (nextRow.Length > 0) results.Add(nextRow.ToString()); // last row when there is no trailing newline
Note that this handles double quotes. Missing quotes will be a problem, but they always are in .csv files.

C# Why do I only get partial results when parsing out a CSV or TSV file?

I am trying to get the second value from each row of a CSV file with 100 rows. I get the first 42 values and then it stops, with no error message (there is no error handling at all, for that matter). I am perplexed and on a deadline. The same thing happens with a TSV file, except that I get the first 43 results. Please help, and let me know if anything looks strange to you.
I am using a StreamReader, reading each line into a string, splitting it into an array, taking the second value and adding it to a list...
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
foreach (var line in path)
{
string s = sr.ReadLine();
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);
Your path variable is a string. That means when you foreach over it, you're getting a sequence of characters - 'C' then ':' then '\' etc. I don't think that's what you mean to do...
Here's a simpler approach using File.ReadLines:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
List<string> stkno = (from line in File.ReadLines(path)
let words = line.Split(',')
select words[1]).ToList();
Or:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
List<string> stkno = File.ReadLines(path)
.Select(line => line.Split(',')[1])
.ToList();
If you're using .NET 3.5 and you don't mind reading the whole file in one go, you can use File.ReadAllLines instead.
You are accidentally iterating over the characters in the file path instead of the lines in the file. This change should fix that:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
while (sr.Peek() >= 0)
{
string s = sr.ReadLine();
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);
How about this:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
var secondWords = from line in File.ReadAllLines(path)
let words = line.Split(',')
select words[1];
var message = string.Join(",", secondWords.ToArray());
I think you mean to do:
string path = @"C:\Users\dave\Desktop\codes\testfile.txt";
StreamReader sr = new StreamReader(path);
List<string> stkno = new List<string>();
string s;
while ((s = sr.ReadLine()) != null) // parentheses needed: assign first, then compare with null
{
string[] words = s.Split(',');
stkno.Add(words[1]);
}
var message = string.Join(",", stkno.ToArray());
MessageBox.Show(message);

How to trim the illegal characters from a string

I will read a file from my computer using
StreamReader sr = new StreamReader(FileName);
string str = sr.ReadToEnd();
The resulting string contains some illegal characters such as \n, \r and a few others.
I would like to replace the illegal characters with nothing. I tried using a character array but was not able to remove them. Can anyone help me?
You can use the String.Replace method:
string str = sr.ReadToEnd().Replace("\r", "").Replace("\n", "");
However, this is not a good idea if the string is long and the list of illegal characters is long, because each call to Replace creates a new String instance. A better option is to filter out the illegal characters using LINQ:
char[] illegalChars = new[] { '\r', '\n' }; // add other illegal chars if needed
char[] chars = sr.ReadToEnd().Where(c => !illegalChars.Contains(c)).ToArray();
string str = new String(chars);
However, the call to Contains adds overhead; it is faster to test directly against each illegal character:
char[] chars = sr.ReadToEnd().Where(c => c != '\r' && c != '\n').ToArray();
string str = new String(chars);
And for completeness, here's an even faster version:
StringBuilder sb = new StringBuilder();
foreach(char c in sr.ReadToEnd())
{
if (c != '\r' && c != '\n')
sb.Append(c);
}
string str = sb.ToString();
string str = string.Join(string.Empty, File.ReadAllLines(FileName));
StreamReader sr = new StreamReader (FileName);
StringBuilder sb = new StringBuilder (sr.ReadToEnd());
sb.Replace ("\r\n", String.Empty);
sb.Replace ("\n", String.Empty);
string hereIsYourString = sb.ToString ();
