How to trim the illegal characters from a string - c#

I read a file from my computer using:
StreamReader sr = new StreamReader(FileName);
string str = sr.ReadToEnd();
With this I am getting some illegal characters such as \n, \r and a few others.
I would like to replace the illegal characters with an empty string. I tried building a character array, but I was not able to remove them. Can anyone help me?

You can use the String.Replace method:
string str = sr.ReadToEnd().Replace("\r", "").Replace("\n", "");
However, this is not a very good idea if the string is long and you have a long list of illegal characters, because each call to Replace creates a new String instance. A better option is to filter out the illegal characters using LINQ:
char[] illegalChars = new[] { '\r', '\n' }; // add other illegal chars if needed
char[] chars = sr.ReadToEnd().Where(c => !illegalChars.Contains(c)).ToArray();
string str = new String(chars);
However, the call to Contains adds overhead; it is faster to test directly against each illegal character:
char[] chars = sr.ReadToEnd().Where(c => c != '\r' && c != '\n').ToArray();
string str = new String(chars);
And for completeness, here's an even faster version:
StringBuilder sb = new StringBuilder();
foreach(char c in sr.ReadToEnd())
{
if (c != '\r' && c != '\n')
sb.Append(c);
}
string str = sb.ToString();
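If the list of illegal characters grows beyond two or three, the StringBuilder loop above can be combined with a set lookup so the condition does not have to be edited by hand. This is only a minimal sketch: the TextCleaner class and the RemoveChars method are hypothetical names, not part of the framework.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TextCleaner
{
    // Hypothetical helper: returns a copy of input without any character
    // that appears in the unwanted set.
    public static string RemoveChars(string input, ISet<char> unwanted)
    {
        // Pre-size the builder; the result can never be longer than the input.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (!unwanted.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

It could then be called as TextCleaner.RemoveChars(sr.ReadToEnd(), new HashSet<char> { '\r', '\n' }). Which characters count as illegal is up to you.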

string str = string.Join(string.Empty, File.ReadAllLines(FileName));

StreamReader sr = new StreamReader (FileName);
StringBuilder sb = new StringBuilder (sr.ReadToEnd());
sb.Replace ("\r\n", String.Empty);
sb.Replace ("\n", String.Empty);
string hereIsYourString = sb.ToString ();

Related

Finding ® in a string of text

Let me rephrase my question:
I am reading in text where one of the characters is the registered symbol, ®, from a text file that has no problem displaying the symbol. When I try to print the string after reading it from the file, the symbol is an unprintable character. When I read in the string and split the string to characters and convert the character to an Int16 and print out the hex, I get 0xFFFD. I specify Encoding.UTF8 when I open the StreamReader.
Here is what I have
using (System.IO.StreamReader sr = new System.IO.StreamReader(HttpContext.Current.Server.MapPath("~/App_Code/Hormel") + "/nutrition_data.txt", System.Text.Encoding.UTF8))
{
string line;
while((line = sr.ReadLine()) != null)
{
//after spliting the file on '~'
items[i] = scrubData(utf8.GetString(utf8.GetBytes(items[i].ToCharArray())));
//items[i] = scrubData(items[i]); //original
}
}
Here is the scrubData function
private String scrubData(string data)
{
string newStr = String.Empty;
try
{
if (data.Contains("HORMEL"))
{
string[] s = data.Split(' ');
foreach(string str in s)
{
if (str.Contains("HORMEL"))
{
char[] ch = str.ToCharArray();
for(int i=0; i<ch.Length; i++)
{
EventLogProvider.LogInformation("LoadNutritionInfoTask", "Test", ch[i] + " = " + String.Format("{0:X}", Convert.ToInt16(ch[i])));
}
}
}
}
return String.Empty;
}
catch (Exception ex)
{
EventLogProvider.LogInformation("LoadNutritionInfoTask", "ScrubData", ex.Message);
return data;
}
}
I'm not concerned with what is being returned right now, I am printing out the characters and the hex codes that correspond to them.
First, you need to make sure you're reading the text with the correct encoding. It appears to me that you are using UTF-8, since you say ® (Unicode code point U+00AE) is 0xC2AE, which is its UTF-8 encoding (the two bytes 0xC2 0xAE). You can use that like:
Encoding.UTF8.GetString(new byte[] { 0xc2, 0xae }) // "®", the registered symbol
// or
using (var streamReader = new StreamReader(file, Encoding.UTF8))
Once you've got it as a string in C#, you should use HttpUtility.HtmlEncode to encode it as HTML. E.g.
HttpUtility.HtmlEncode("SomeStuff®") // result is "SomeStuff&#174;"
Check which encoding you are decoding the bytes with.
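To illustrate why the encoding matters: 0xFFFD is the Unicode replacement character, which the UTF-8 decoder emits when it meets bytes that are not valid UTF-8. The sketch below assumes, purely for illustration, that the file was saved in a single-byte encoding such as ISO-8859-1, where ® is the single byte 0xAE. The question does not say what the file's real encoding is, and EncodingDemo is a hypothetical name.

```csharp
using System.Text;

public static class EncodingDemo
{
    // Assumption: the registered symbol stored as one ISO-8859-1 byte.
    private static readonly byte[] Latin1Bytes = { 0xAE };

    // A lone 0xAE is not a valid UTF-8 sequence, so the decoder
    // substitutes the replacement character U+FFFD.
    public static string DecodeAsUtf8() =>
        Encoding.UTF8.GetString(Latin1Bytes);

    // Decoding with the encoding the bytes were written in yields U+00AE.
    public static string DecodeAsLatin1() =>
        Encoding.GetEncoding("ISO-8859-1").GetString(Latin1Bytes);
}
```

If the hex values you log look like the first case, try opening the StreamReader with the encoding the file was actually saved in instead of Encoding.UTF8.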
Try this:
string txt = "textwithsymbol";
string html = "<html></html>";
txt = txt.Replace("\u00ae", html);
Obviously you would replace the txt variable with the text you have read in and "\u00ae" is the symbol you are looking for.

Adding chars before and after each word in a string

Imagine we have a string as :
String mystring = "A,B,C,D";
I would like to add an apostrophe before and after each word in my string, such as:
"'A','B','C','D'"
How can I achieve that?
What's your definition of a word? Anything between commas?
First get the words:
var words = mystring.Split(',');
Then add the apostrophes:
words = words.Select(w => String.Format("'{0}'", w)).ToArray(); // ToArray() is needed because words is a string[]
And turn them back into one string:
var mynewstring = String.Join(",", words);
mystring = "'" + mystring.Replace(",", "','") + "'";
I would let each "word" be determined by the regex \b word boundary. So, you have:
var output = Regex.Replace("A,B,C,D", @"(\b)", @"'$1");
string str = "a,b,c,d";
string.Format("'{0}'", str.Replace(",", "','"));
or
string str = "a,b,c,d";
StringBuilder sb = new StringBuilder(str.Length * 2 + 2);
foreach (var c in str.ToCharArray())
{
sb.AppendFormat((c == ',' ? "{0}" : "'{0}'"), c);
}
str = sb.ToString();
string mystring = "A,B,C,D";
string[] array = mystring.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
string newstring = "";
foreach (var item in array)
{
newstring += "'" + item + "',";
}
newstring = newstring.Remove(newstring.Length - 1);
Console.WriteLine(newstring);
Output will be:
'A','B','C','D'
Or, more simply:
string mystring = "A,B,C,D";
Console.WriteLine(string.Format("'{0}'", mystring.Replace(",", "','")));
You can use regular expressions to solve this problem, like this:
string words = "A,B,C,D";
Regex reg = new Regex(@"(\w+)");
words = reg.Replace(words, match => string.Format("'{0}'", match.Groups[1].Value));

Differentiate between tab-separated, space-separated and comma-separated streams

I am using the following code to read a tab-delimited stream.
using (StreamReader readFile = new StreamReader(path))
{
string line;
string[] row;
while ((line = readFile.ReadLine()) != null)
{
row = line.Split('\t');
parsedData.Add(row);
}
}
However, occasionally a user may supply a space-separated or comma-separated file. How do I automatically detect the delimiter instead of having to change row = line.Split('\t'); to row = line.Split(' '); or row = line.Split(',');?
Thanks.
You can use the string.Split method to split your data on several delimiter characters at once:
var delims = new [] {',', '\t', ' ' };
var result = line.Split(delims, StringSplitOptions.RemoveEmptyEntries);
Or you can use Regex
var result = Regex.Split(line, @"[,\t ]+");
You can't differentiate between them beforehand.
What you can do is try to split on all of them:
row = line.Split('\t', ' ', ',');
This of course assumes that the data between delimiters doesn't contain the delimiters.
You'll have to define what a separator is and how you detect it. If you say: "The separator for a file is the first non-quoted whitespace character I encounter on the first line", then you can read the first line and determine the separator. You can then pass that to the .Split() method.
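One possible definition, sketched below, is "the candidate delimiter that occurs most often on the first line". This is a simple heuristic and not a full parser: it ignores quoting, and the DelimiterSniffer class, its Detect method and the candidate list are all hypothetical choices made for the example.

```csharp
using System.Linq;

public static class DelimiterSniffer
{
    // Candidate delimiters; earlier entries win when counts tie.
    private static readonly char[] Candidates = { '\t', ',', ' ' };

    // Returns the candidate that occurs most often in sampleLine.
    // Falls back to tab when no candidate appears at all.
    public static char Detect(string sampleLine)
    {
        char best = '\t';
        int bestCount = 0;
        foreach (char candidate in Candidates)
        {
            int count = sampleLine.Count(ch => ch == candidate);
            if (count > bestCount)
            {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```

You would read the first line once, call DelimiterSniffer.Detect(firstLine), and pass the returned character to line.Split() for that line and every following one. Data that contains the delimiter inside a field will still confuse it.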
row = line.Split(new char[]{' ', ',', '\t'}, StringSplitOptions.RemoveEmptyEntries);

How to get correct string text?

I'm trying to obtain the correct unicode characters represented by this string:
string originalString = "\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081";
If I hard-code it in the cs file, I can see in debug mode that it shows the correct characters, but if I have the exact string written in a file and I try to read it, it shows the string as it is in the file.
TextReader tr = new StreamReader("c:\\test.txt");
string tmpString = tr.ReadLine();
tr.Close();
byte[] array = Encoding.Unicode.GetBytes(tmpString );
string finalResult = Encoding.Unicode.GetString(array);
How can I make the finalResult string have the correct unicode characters?
Thanks in advance
Gonçalo
EDIT: Already tried placing
TextReader tr = new StreamReader("c:\\test.txt",Encoding.Unicode);
but the characters are different from the correct ones.
Does your file actually contain the content:
\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00
\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300
\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00
\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600
\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900
\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081
If so, you need to convert each sequence to its corresponding unicode character
string originalString = "\u0605\u04c3\u5000\u0000\u5000\ufd00\u4400\ud500\u7600\ud300\u4f00\ubc00\u0c00\u2d00\u4000\ue400\u0e00\u7400\u4800\ub700\u1d00\u1300\ue900\u6000\u4c00\ufb00\u9900\u3900\ud900\u6700\uae00\ueb00\u8f00\u2800\u0200\ub300\u5c00\ufe00\u0100\u3d00\u9100\u3000\u0300\u1600\u0100\u7000\u6200\u8e00\u1d00\u8e00\u6200\ua900\u6300\uc800\u0900\ub700\ub000\u6000\ue400\u9200\u3f00\u9100\u8d00\uef00\u3600\u0100\u9e00\u0081";
string tmpString = "\\u0605\\u04c3\\u5000\\u0000\\u5000\\ufd00\\u4400\\ud500\\u7600\\ud300\\u4f00\\ubc00\\u0c00\\u2d00\\u4000\\ue400\\u0e00\\u7400\\u4800\\ub700\\u1d00\\u1300\\ue900\\u6000\\u4c00\\ufb00\\u9900\\u3900\\ud900\\u6700\\uae00\\ueb00\\u8f00\\u2800\\u0200\\ub300\\u5c00\\ufe00\\u0100\\u3d00\\u9100\\u3000\\u0300\\u1600\\u0100\\u7000\\u6200\\u8e00\\u1d00\\u8e00\\u6200\\ua900\\u6300\\uc800\\u0900\\ub700\\ub000\\u6000\\ue400\\u9200\\u3f00\\u9100\\u8d00\\uef00\\u3600\\u0100\\u9e00\\u0081";
string finalResult = Regex.Replace(tmpString, @"\\u(....)", match => ((char)int.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)).ToString());
You can pass the encoding as a parameter when reading the file:
TextReader tr = new StreamReader("c:\\test.txt",Encoding.Unicode);
string unicode_string = tr.ReadLine();
Try something like:
TextReader streamReader = new StreamReader("c:\\test.txt");
string input = streamReader.ReadLine();
string[] chars = input.Split(new char[] { '\\', 'u' },
StringSplitOptions.RemoveEmptyEntries);
streamReader.Close();
string answer = string.Empty;
foreach (string character in chars)
{
// Each entry holds four hex digits; NumberStyles requires "using System.Globalization;".
byte byte1 = byte.Parse(string.Format("{0}{1}",
character[0], character[1]), NumberStyles.AllowHexSpecifier);
byte byte2 = byte.Parse(string.Format("{0}{1}",
character[2], character[3]), NumberStyles.AllowHexSpecifier);
// Encoding.Unicode is little-endian UTF-16, so the low byte comes first.
answer += Encoding.Unicode.GetString(new byte[] { byte2, byte1 });
}

convert string array to string

I would like to convert a string array to a single string.
string[] test = new string[2];
test[0] = "Hello ";
test[1] = "World!";
I would like to have something like "Hello World!"
string[] test = new string[2];
test[0] = "Hello ";
test[1] = "World!";
string.Join("", test);
A slightly faster option than the already mentioned Join() method is the Concat() method. It doesn't require an empty delimiter parameter as Join() does, hence it is likely faster. Example:
string[] test = new string[2];
test[0] = "Hello ";
test[1] = "World!";
string result = String.Concat(test);
A simple string.Concat() is what you need.
string[] test = new string[2];
test[0] = "Hello ";
test[1] = "World!";
string result = string.Concat(test);
If you also need to add a separator (space, comma, etc.), then string.Join() should be used.
string[] test = new string[2];
test[0] = "Red";
test[1] = "Blue";
string result = string.Join(",", test);
If you have to perform this on a string array with hundreds of elements, then string.Join() is better from a performance point of view. Just pass "" (an empty string) as the separator. StringBuilder can also be used for the sake of performance, but it makes the code a bit longer.
Try:
String.Join("", test);
which should return a string joining the two elements together. "" indicates that you want the strings joined together without any separators.
In the accepted answer, String.Join isn't best practice for the way it is used. String.Concat should have been used (instead of using a null delimiter), since OP included a trailing space in the first item: "Hello ".
However, since OP asked for the result "Hello World!", String.Join is still the appropriate method, but the trailing whitespace should be moved to the delimiter instead.
// string[] test = new string[2];
// test[0] = "Hello ";
// test[1] = "World!";
string[] test = { "Hello", "World!" }; // Alternative array creation syntax
string result = String.Join(" ", test);
Aggregate (from System.Linq) can also be used for the same purpose.
string[] test = new string[2];
test[0] = "Hello ";
test[1] = "World!";
string joinedString = test.Aggregate((prev, current) => prev + " " + current);
string ConvertStringArrayToString(string[] array)
{
//
// Concatenate all the elements into a StringBuilder.
//
StringBuilder stringBuilder = new StringBuilder();
foreach (string value in array)
{
stringBuilder.Append(value);
stringBuilder.Append(' '); // note: this also leaves a trailing space after the last element
}
return stringBuilder.ToString();
}
I used this approach to make my project faster:
RichTextBox rcbCatalyst = new RichTextBox()
{
Lines = arrayString
};
string text = rcbCatalyst.Text;
rcbCatalyst.Dispose();
return text;
RichTextBox.Text will automatically convert your array to a multiline string!
Like this:
string str= test[0]+test[1];
You can also use a loop:
for(int i=0; i<2; i++)
str += test[i];
