How to compare characters in C# - c#

I'm trying to compare two characters in C#. The "==" operator does not work for strings, you have to use the .Equals() method. In the following code example I want to read each character in the input string, and output another string without spaces.
string inputName, outputName = null;
// read input name from file
foreach (char indexChar in inputName)
{
if (!indexChar.Equals(" "))
outputName += indexChar;
}
This does not work, the comparison always equals false, even when the input name has embedded spaces. I also tried using the overload method Equals(string, string), which did not work either. I'm assuming C# treats char variables as a string of length 1. Microsoft's documentation doesn't seem to mention comparing characters. Does anyone have a better method for comparing characters in a string?

" " is a string of length one; a char and a string never match; you want ' ', the space character:
if (indexChar != ' ')
However, if you're just trying to remove all spaces, it is probably easier to just do:
var outputName = inputName.Replace(" ", "");
This avoids allocating lots of intermediate strings.
Note also that the space character isn't the only whitespace character in Unicode. If you need to deal with all whitespace characters, a regex may be a better option:
var outputName = Regex.Replace(inputName, @"\s", "");
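For completeness, here is a minimal sketch of the loop from the question with the comparison fixed. It compares each char against the char literal ' ' (single quotes) rather than the string " ", and uses a StringBuilder instead of repeated string concatenation. The helper names RemoveSpaces and RemoveWhitespace are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: drops only the ASCII space character.
static string RemoveSpaces(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        // char is a value type, so != compares the two characters directly.
        if (c != ' ')
            sb.Append(c);
    }
    return sb.ToString();
}

// Hypothetical helper: drops every character that char.IsWhiteSpace reports
// (tabs, line breaks, non-breaking spaces and so on).
static string RemoveWhitespace(string input)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (!char.IsWhiteSpace(c))
            sb.Append(c);
    }
    return sb.ToString();
}

Console.WriteLine(RemoveSpaces("John Q Public"));       // JohnQPublic
Console.WriteLine(RemoveWhitespace("John\tQ\nPublic")); // JohnQPublic
```

The original indexChar.Equals(" ") compiles because char.Equals accepts any object, but a char is never equal to a string, so it always returns false.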

You can use .CompareTo(char) to compare characters.
Example :
if('Z'.CompareTo('Z') == 0)
Console.WriteLine("Same character !");

Thanks for all the great suggestions. inputName.CompareTo(" ") is not the way to go for this example, you would still have to have a loop. I ended up using:
var outputName = Regex.Replace(inputName, @"\s", "");
which works, and it's only one line of code!

Related

Convert special character in string into a char

I have this string:
string specialCharacterString = @"\n";
where "\n" is the new line special character.
Is it possible to convert/assign that string (of two characters) into a (single) char? How do I do something like:
char specialCharacter = Parse(specialCharacterString);
Where specialCharacter value would be equal to \n
Is there anything in dotnet that would parse the string for me or must I use if or switch the string (the string can contain any special character) to accomplish what I want. Note that char.Parse(string) cannot handle special characters and thinks the string above is actually two characters.
Maybe I am oversimplifying but can't you just do the following:
txtString.Replace("\n", "$");
It is technically a string to string replacement but would be string to char...
You can always cast it to a char since you know what char you are replacing the string with.
Not sure, what business need it is, but if you need parsing C# in C# you can use some tools like Antlr, which supports C# grammar (https://github.com/antlr/grammars-v4/)
I don't think there is any ready tool designed just for strings
Try using Regex.Unescape(specialCharacterString);
It returns a new string in which each escape sequence is converted to the character it represents.
For example:
var literalStringWithEscapeCharacters = @"Hello\tWorld";
var stringWithEscapeCharacters = Regex.Unescape(literalStringWithEscapeCharacters);
Console.WriteLine(stringWithEscapeCharacters);
Will print: Hello World
Instead of: Hello\tWorld
Then you can find escape characters in stringWithEscapeCharacters like this:
var escapeChars= new [] { '\n' };
var characters = stringWithEscapeCharacters.Where(c => escapeChars.Contains(c)).ToList();
All escape characters described here:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#string-escape-sequences
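To get the single char the question asks for, one possible sketch is to unescape the string and take its only character. ParseEscapedChar is a hypothetical helper name, not a framework method. Keep in mind that Regex.Unescape follows regular-expression escape rules, which overlap with C# literal escapes but are not guaranteed to be identical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: turns an escape sequence written out as text
// (for example a backslash followed by n) into the single character it stands for.
static char ParseEscapedChar(string escaped)
{
    string unescaped = Regex.Unescape(escaped);
    if (unescaped.Length != 1)
        throw new FormatException("Expected exactly one character after unescaping.");
    return unescaped[0];
}

char newline = ParseEscapedChar(@"\n");
Console.WriteLine(newline == '\n');               // True
Console.WriteLine((int)ParseEscapedChar(@"\t"));  // 9
```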

C# Char Array remove at specific index

I'm not too sure of the best way to remove a char from the char array if the char at a given index is a number.
private string TextBox_CharacterCheck(string tocheckTextBox)
{
char[] charlist = tocheckTextBox.ToCharArray();
foreach (char character in charlist)
{
if (char.IsNumber(character))
{
}
}
return (new string(charlist));
}
Thanks in advance.
// this is now resolved. thank you to all who contributed
You could use the power of Linq:
return new string(tocheckTextBox.Where(c => !char.IsNumber(c)).ToArray());
This is fairly easy using Regex:
var result = Regex.Replace("a1b2c3d4", @"\d", "");
(as @Adassko notes, you can use "[0-9]" instead of @"\d" if you just want the digits 0 to 9, and not any other numeric characters).
You can also do it fairly efficiently using a StringBuilder:
var sb = new StringBuilder();
foreach (var ch in "a1b2c3d4")
{
if (!char.IsNumber(ch))
{
sb.Append(ch);
}
}
var result = sb.ToString();
You can also do it with linq:
var result = new string("a1b2c3d4".Where(x => !char.IsNumber(x)).ToArray());
Use Regex:
private string TextBox_CharacterCheck(string tocheckTextBox)
{
return Regex.Replace(tocheckTextBox, @"[\d]", string.Empty);
}
System.String is immutable. You could use string.Replace or a regular expression to remove unwanted characters into a new string.
Your best bet is to use regular expressions.
Strings are immutable, meaning that you can't change them; you need to build a new string. To do that efficiently, use the StringBuilder class and Append every character that you want to keep.
Also watch out for your code: char.IsNumber checks not only for the characters 0-9, it also returns true for every numeric character, such as ٢, and you probably don't want that.
Here's a list of digit characters that return true:
0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789
You should also use [0-9] rather than \d in your regular expressions if you want only parsable digits.
You can also use a trick: .Split your string on the unwanted character, then .Join it back. This not only allows you to remove one or more characters, it also lets you replace them with some other character.
I use this trick to remove incorrect characters from file name:
string.Join("-", possiblyIncorrectFileName.Split(Path.GetInvalidFileNameChars()))
This code replaces every character that cannot be used in a valid file name with -.
You can use LINQ to remove the char from the char array if the char at a given index is a number.
CODE
//This will return you the list of char discarding the number.
var removedDigits = tocheckTextBox.Where(x => !char.IsDigit(x));
//This will return the string without numbers.
string output = string.Join("", removedDigits);
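The difference between the ASCII digits and the wider Unicode digit set mentioned above can be seen in a small sketch. Both helper names are hypothetical, and the sample string is made up for the demonstration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: removes only the ASCII digits '0' through '9'.
static string RemoveAsciiDigits(string input) =>
    new string(input.Where(c => c < '0' || c > '9').ToArray());

// Hypothetical helper: removes every character that char.IsDigit reports,
// which includes decimal digits from other scripts.
static string RemoveUnicodeDigits(string input) =>
    new string(input.Where(c => !char.IsDigit(c)).ToArray());

string sample = "a1b\u0662c3"; // \u0662 is ARABIC-INDIC DIGIT TWO
Console.WriteLine(RemoveAsciiDigits(sample).Length);   // 4: the Arabic-Indic digit stays
Console.WriteLine(RemoveUnicodeDigits(sample).Length); // 3
```

Which one is right depends on whether the text box is supposed to accept digits from other scripts.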

Replace Unicode character "�" with a space

I'm doing a massive upload of information from a .csv file and I need to replace this non-ASCII character "�" with a normal space, " ".
The character "�" corresponds to "\uFFFD" for C, C++, and Java, which it seems that it is called REPLACEMENT CHARACTER. There are others, such as spaces type like U+FEFF, U+205F, U+200B, U+180E, and U+202F in the C# official documentation.
I'm trying to do the replacement this way:
public string Errors = "";
public void test(){
string textFromCsvCell = "";
string validCharacters = "^[0-9A-Za-z().:%-/ ]+$";
textFromCsvCell = "This is my text from csv file"; //All spaces aren't normal space " "
string cleaned = textFromCsvCell.Replace("\uFFFD", "\"")
if (Regex.IsMatch(cleaned, validCharacters ))
//All code for insert
else
Errors=cleaned;
//print Errors
}
The test method shows me this text:
"This is my�texto from csv file"
I try some solutions too:
Trying solution 1: Using Trim
Regex.Replace(value.Trim(), @"[^\S\r\n]+", " ");
Try solution 2: Using Replace
System.Text.RegularExpressions.Regex.Replace(str, @"\s+", " ");
Try solution 3: Using Trim
String.Trim(new char[]{'\uFEFF', '\u200B'});
Try solution 4: Add [\S\r\n] to validCharacters
string validCharacters = "^[\S\r\n0-9A-Za-z().:%-/ ]+$";
Nothing works.
How can I replace it?
Sources:
Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)
Trying to replace all white space with a single space
Strip the byte order mark from string in C#
Remove extra whitespaces, but keep new lines using a regular expression in C#
EDITED
This is the original string:
"SYSTEM OF MONITORING CONTINUES OF GLUCOSE"
in 0x... notation
SYSTEM OF0xA0MONITORING CONTINUES OF GLUCOSE
Solution
Go to the Unicode code converter. Look at the conversions and do the replace.
In my case, I do a simple replace:
string value = "SYSTEM OF MONITORING CONTINUES OF GLUCOSE";
//value contains non-breaking whitespace
//value is "SYSTEM OF�MONITORING CONTINUES OF GLUCOSE"
string cleaned = "";
string pattern = @"[^\u0000-\u007F]+";
string replacement = " ";
Regex rgx = new Regex(pattern);
cleaned = rgx.Replace(value, replacement);
if (Regex.IsMatch(cleaned, "^[0-9A-Za-z().:<>%-/ ]+$")) {
//all code for insert
} else {
//Error messages
}
This expression represents all possible spaces: space, tab, page break, line break and carriage return
[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]
References
Regular expressions (MDN)
Using String.Replace:
Use a simple String.Replace().
I've assumed that the only characters you want to remove are the ones you've mentioned in the question: � and you want to replace them by a normal space.
string text = "imp�ortant";
string cleaned = text.Replace('\u00ef', ' ')
.Replace('\u00bf', ' ')
.Replace('\u00bd', ' ');
// Returns 'imp ortant'
Or using Regex.Replace:
string cleaned = Regex.Replace(text, "[\u00ef\u00bf\u00bd]", " ");
// Returns 'imp ortant'
Try it out: Dotnet Fiddle
Define a range of ASCII characters, and replace anything that is not within that range.
We want to find only non-ASCII characters, so we match anything outside the ASCII range and replace it.
Regex.Replace("This is my te\uFFFDxt from csv file", @"[^\u0000-\u007F]+", " ")
The above pattern matches anything that is not (^) in the set [ ] covering the range \u0000-\u007F (the ASCII characters; everything past \u007F is non-ASCII) and replaces it with a space.
Result
This is my te xt from csv file
You can adjust the range provided \u0000-\u007F as needed to expand the range of allowed characters to suit your needs.
If you just want ASCII then try the following:
var ascii = new ASCIIEncoding();
byte[] encodedBytes = ascii.GetBytes(text);
var cleaned = ascii.GetString(encodedBytes).Replace("?", " ");
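One caveat with the ASCIIEncoding round trip is that the final Replace("?", " ") also turns genuine question marks into spaces. A minimal sketch that avoids this by checking each char directly is shown here; ReplaceNonAscii is a hypothetical helper name.

```csharp
using System;
using System.Text;

// Hypothetical helper: replaces every char above U+007F with the given
// replacement and leaves all ASCII characters, including '?', untouched.
static string ReplaceNonAscii(string input, char replacement = ' ')
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
        sb.Append(c <= '\u007F' ? c : replacement);
    return sb.ToString();
}

// \u00A0 is the non-breaking space from the question's 0xA0 dump.
Console.WriteLine(ReplaceNonAscii("SYSTEM OF\u00A0MONITORING?")); // SYSTEM OF MONITORING?
```

Note that a character outside the Basic Multilingual Plane is stored as two chars, so in this sketch it becomes two replacement characters.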

How do I find and remove any rule or newline in an output? [duplicate]

How can I replace Line Breaks within a string in C#?
Use replace with Environment.NewLine
myString = myString.Replace(System.Environment.NewLine, "replacement text"); //add a line terminating ;
As mentioned in other posts, if the string comes from another environment (OS) then you'd need to replace that particular environment's implementation of new line control characters.
The solutions posted so far either only replace Environment.NewLine or they fail if the replacement string contains line breaks because they call string.Replace multiple times.
Here's a solution that uses a regular expression to make all three replacements in just one pass over the string. This means that the replacement string can safely contain line breaks.
string result = Regex.Replace(input, @"\r\n?|\n", replacementString);
To extend The.Anyi.9's answer, you should also be aware of the different types of line break in general use. Dependent on where your file originated, you may want to look at making sure you catch all the alternatives...
string replaceWith = "";
string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
should get you going...
I would use Environment.NewLine when I wanted to insert a newline into a string, but not to remove all newlines from a string.
Depending on your platform you can have different types of newlines, but even inside the same platform often different types of newlines are used. In particular when dealing with file formats and protocols.
string ReplaceNewlines(string blockOfText, string replaceWith)
{
return blockOfText.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);
}
If your code is supposed to run in different environments, I would consider using the Environment.NewLine constant, since it is specifically the newline used in the specific environment.
line = line.Replace(Environment.NewLine, "newLineReplacement");
However, if you get the text from a file originating on another system, this might not be the correct answer, and you should replace with whatever newline constant is used on the other system. It will typically be \n or \r\n.
If you want to "clean" the new lines, flamebaud's comment suggesting the regex @"[\r\n]+" is the best choice.
using System;
using System.Text.RegularExpressions;
class MainClass {
public static void Main (string[] args) {
string str = "AAA\r\nBBB\r\n\r\n\r\nCCC\r\r\rDDD\n\n\nEEE";
Console.WriteLine (str.Replace(System.Environment.NewLine, "-"));
/* Result:
AAA
-BBB
-
-
-CCC
DDD---EEE
*/
Console.WriteLine (Regex.Replace(str, @"\r\n?|\n", "-"));
// Result:
// AAA-BBB---CCC---DDD---EEE
Console.WriteLine (Regex.Replace(str, @"[\r\n]+", "-"));
// Result:
// AAA-BBB-CCC-DDD-EEE
}
}
Use the ReplaceLineEndings method, new in .NET 6:
myString = myString.ReplaceLineEndings();
It replaces all newline sequences in the current string with Environment.NewLine; an overload takes the replacement text.
Documentation:
ReplaceLineEndings
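A short sketch of the overload that takes replacement text, assuming a .NET 6 or later project; the sample string is made up for illustration.

```csharp
using System;

// Mixed Windows (\r\n), classic Mac (\r) and Unix (\n) line endings.
string mixed = "AAA\r\nBBB\rCCC\nDDD";

// Every newline sequence is replaced by the given text in a single call.
string normalized = mixed.ReplaceLineEndings("-");
Console.WriteLine(normalized); // AAA-BBB-CCC-DDD
```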
Don't forget that replace doesn't do the replacement in the string, but returns a new string with the characters replaced. The following will remove line breaks (not replace them). I'd use @Brian R. Bondy's method if replacing them with something else, perhaps wrapped as an extension method. Remember to check for null values first before calling Replace or the extension methods provided.
string line = ...
line = line.Replace( "\r", "").Replace( "\n", "" );
As extension methods:
public static class StringExtensions
{
public static string RemoveLineBreaks( this string lines )
{
return lines.Replace( "\r", "").Replace( "\n", "" );
}
public static string ReplaceLineBreaks( this string lines, string replacement )
{
return lines.Replace( "\r\n", replacement )
.Replace( "\r", replacement )
.Replace( "\n", replacement );
}
}
To make sure all possible ways of line breaks (Windows, Mac and Unix) are replaced you should use:
myString.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "replacement");
Apply the replacements in this order so that a combined line ending such as \r\n does not produce extra line breaks.
Why not both?
string ReplacementString = "";
Regex.Replace(strin.Replace(System.Environment.NewLine, ReplacementString), @"(\r\n?|\n)", ReplacementString);
Note: Replace strin with the name of your input string.
I needed to replace the \r\n with an actual carriage return and line feed and replace \t with an actual tab. So I came up with the following:
public string Transform(string data)
{
string result = data;
char cr = (char)13;
char lf = (char)10;
char tab = (char)9;
result = result.Replace("\\r", cr.ToString());
result = result.Replace("\\n", lf.ToString());
result = result.Replace("\\t", tab.ToString());
return result;
}
var answer = Regex.Replace(value, "(\n|\r)+", replacementString);
As new line can be delimited by \n, \r and \r\n, first we’ll replace \r and \r\n with \n, and only then split data string.
The following lines should go to the parseCSV method:
function parseCSV(data) {
//alert(data);
//replace Windows new lines
data = data.replace(/\r\n/g, "\n");
//replace MAC new lines
data = data.replace(/\r/g, "\n");
//split into rows
var rows = data.split("\n");
}
Use the .Replace() method
Line.Replace("\n", "whatever you want to replace with");
Best way to replace linebreaks safely is
yourString.Replace("\r\n","\n") //handling windows linebreaks
.Replace("\r","\n"); //handling mac linebreaks
That should produce a string with only \n (i.e. line feed) as line breaks.
This code is useful for fixing mixed line breaks too.
Another option is to create a StringReader over the string in question. On the reader, do .ReadLine() in a loop. Then you have the lines separated, no matter what (consistent or inconsistent) separators they had. With that, you can proceed as you wish; one possibility is to use a StringBuilder and call .AppendLine on it.
The advantage is, you let the framework decide what constitutes a "line break".
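A minimal sketch of the StringReader approach described above. JoinLines is a hypothetical helper name, and the sketch uses string.Join rather than StringBuilder.AppendLine so that any separator can be supplied.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: splits text into lines with StringReader.ReadLine,
// which treats \r, \n and \r\n as line terminators, then joins the lines
// back together with the given separator.
static string JoinLines(string text, string separator)
{
    var lines = new List<string>();
    using (var reader = new StringReader(text))
    {
        var line = reader.ReadLine();
        while (line != null)
        {
            lines.Add(line);
            line = reader.ReadLine();
        }
    }
    return string.Join(separator, lines);
}

Console.WriteLine(JoinLines("AAA\r\nBBB\rCCC\nDDD", "-")); // AAA-BBB-CCC-DDD
```

One design consequence: a line break at the very end of the input produces no extra empty line, so a trailing terminator is dropped rather than replaced.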
string s = Regex.Replace(source_string, "\n", "\r\n");
or
string s = Regex.Replace(source_string, "\r\n", "\n");
depending on which way you want to go.
Hope it helps.
If you want to replace only the newlines:
var input = @"sdfhlu \r\n sdkuidfs\r\ndfgdgfd";
var match = @"[\\ ]+";
var replaceWith = " ";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input.Replace(@"\n", replaceWith).Replace(@"\r", replaceWith), match, replaceWith);
Console.WriteLine("output: " + x);
If you want to replace newlines, tabs and white spaces:
var input = @"sdfhlusdkuidfs\r\ndfgdgfd";
var match = @"[\\s]+";
var replaceWith = "";
Console.WriteLine("input: " + input);
var x = Regex.Replace(input, match, replaceWith);
Console.WriteLine("output: " + x);
This is a very long-winded one-liner, but it is the only solution I found that works if you cannot use special character escapes like "\r" and "\n", \x0d and \u000D, or System.Environment.NewLine as parameters to the Replace() method:
MyStr.Replace( System.String.Concat( System.Char.ConvertFromUtf32(13).ToString(), System.Char.ConvertFromUtf32(10).ToString() ), ReplacementString );
This is somewhat offtopic but to get it to work inside Visual Studio's XML .props files, which invoke .NET via the XML properties, I had to dress it up like it is shown below.
The Visual Studio XML --> .NET environment just would not accept special character escapes like "\r" and "\n", \x0d and \u000D, or System.Environment.NewLine as parameters to the replace() method.
$([System.IO.File]::ReadAllText('MyFile.txt').replace( $([System.String]::Concat($([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString()))),$([System.String]::Concat('^',$([System.Char]::ConvertFromUtf32(13).ToString()),$([System.Char]::ConvertFromUtf32(10).ToString())))))
Based on @mark-bayers answer and for cleaner output:
string result = Regex.Replace(ex.Message, @"(\r\n?|\r?\n)+", "replacement text");
It removes \r\n, \n and \r, preferring the longer match, and collapses multiple consecutive occurrences into one.

Remove a single Special Character from a String

I am new to C# and I want to know how I can remove a single apostrophe ( ' ) from my string. My code removes other special characters fine, but not this one ( ' ).
My code is:
mystring=mystring.Replace(@"'"," ");
How can I remove this character from my string? Is there any other way?
The character you are showing us in the comment is another one than the one you are using in the code
(’) => is ANSI 146 (in comment, 92 hex)
(') => is ANSI 39 (in code)
Solution 1: Copy paste the character from the source into the code.
Solution 2: Use a unicode escape sequence:
mystring = mystring.Replace("\u0092", " ");
or, using chars instead of strings:
mystring = mystring.Replace('\u0092', ' ');
Note, in your example you are replacing the apostrophe by a space. If you want to remove it instead do:
mystring = mystring.Replace("\u0092", "");
See: ANSI character set and equivalent Unicode and HTML characters.
That is not a regular apostrophe.
You need something more like this.
mystring = mystring.Replace("\x92", "");
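A hedged caveat on the code point: byte 0x92 is the curly apostrophe only in the Windows-1252 code page. If the text was decoded correctly into a .NET string, that character is U+2019 (RIGHT SINGLE QUOTATION MARK); U+0092 only appears when the bytes were decoded as Latin-1. A minimal sketch that covers all three variants, with RemoveApostrophes as a hypothetical helper name:

```csharp
using System;

// Hypothetical helper: removes the straight apostrophe and both forms in
// which the curly apostrophe may show up in a .NET string.
static string RemoveApostrophes(string input) =>
    input.Replace("'", "")        // U+0027 APOSTROPHE
         .Replace("\u2019", "")   // U+2019 RIGHT SINGLE QUOTATION MARK
         .Replace("\u0092", "");  // U+0092, seen when Windows-1252 bytes were decoded as Latin-1

Console.WriteLine(RemoveApostrophes("it\u2019s John's")); // its Johns
```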
You can use the Regex.Replace method
string output = Regex.Replace(mystring, @"'", "");
I hope I helped
//we can remove . or any special character from string using Replace in csharp//
string name = " .Akhil. ";
name = name.Replace( " .Akhil. ", "Akhil");
Console.WriteLine(name);
