Unable to remove invisible chars using Regex - c#

I want to remove any invisible characters from a string and keep only spaces and characters in the range 0x20-0x7F.
I use this: Regex.Replace(QueryString, @"[^\s\x20-\x7F]", "");
However, it does not work.
QueryString contains the character 0xA0, and after the call that character is still in QueryString.
Why does this fail?

0xA0 is the non-breaking space character - and as such it's matched with \s. Rather than using \s, expand this out into the list of whitespace characters you want to include.
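As a minimal sketch of that suggestion, the pattern below drops \s and lists the allowed range explicitly. The class and method names are made up for illustration, and the upper bound is 0x7E rather than the question's 0x7F on the assumption that DEL (0x7F), being a control character, should also go.

```csharp
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Hypothetical helper: keeps only U+0020..U+007E (space plus printable ASCII).
    // Assumption: 0x7F (DEL) is treated as invisible and removed as well.
    public static string KeepPrintableAscii(string input)
    {
        // No \s in the class, so U+00A0 and other Unicode spaces are removed too.
        return Regex.Replace(input, @"[^\x20-\x7E]", "");
    }
}
```

For example, AsciiFilter.KeepPrintableAscii("a\u00A0b\tc") returns "abc". If tabs or newlines should survive, add them to the class explicitly, e.g. @"[^\t\r\n\x20-\x7E]".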

I think you would rather use StringBuilder to process such strings.
StringBuilder sb = new StringBuilder(str.Length);
foreach (char ch in str)
{
    // keep only characters in the range 0x20-0x7F
    if (0x20 <= ch && ch <= 0x7F)
    {
        sb.Append(ch);
    }
}
string result = sb.ToString();


c# add comma before every numbers in my string except first number

I am developing an application in ASP.NET MVC.
I have a string like the one below:
string myString = "1A5#3a2#";
Now I want to add a comma before every occurrence of a number in my string, except the first occurrence, like this:
string myNewString = "1A,5#,3a,2#";
I know I can use a loop for this, like below:
string myNewString = "";
foreach (var ch in myString)
{
    if (ch >= '0' && ch <= '9')
    {
        // first character: no comma; later digits: prepend a comma
        myNewString = myNewString == "" ? Convert.ToString(ch) : myNewString + "," + Convert.ToString(ch);
    }
    else
    {
        myNewString = myNewString + Convert.ToString(ch);
    }
}
You could use this StringBuilder approach:
public static string InsertBeforeEveryDigit(string input, char insert)
{
    StringBuilder sb = new(input);
    // walk backwards so insertions do not shift the indexes still to be visited
    for (int i = sb.Length - 2; i >= 0; i--)
    {
        if (!char.IsDigit(sb[i]) && char.IsDigit(sb[i+1]))
        {
            sb.Insert(i+1, insert);
        }
    }
    return sb.ToString();
}
Console.Write(InsertBeforeEveryDigit("1A5#3a2#", ',')); // 1A,5#,3a,2#
Update: This answer gives a different result than the one from TWM if the string contains consecutive digits, as in "12A5#3a2#". My answer gives 12A,5#,3a,2# while TWM's gives 1,2A,5#,3a,2#. I'm not sure which is desired.
As I understand it, the code below should work for you:
StringBuilder myNewStringBuilder = new StringBuilder();
foreach (var ch in myString)
{
    if (ch >= '0' && ch <= '9')
    {
        // every digit except one at the very start gets a comma in front
        if (myNewStringBuilder.Length > 0)
        {
            myNewStringBuilder.Append(",");
        }
        myNewStringBuilder.Append(ch);
    }
    else
    {
        myNewStringBuilder.Append(ch);
    }
}
myString = myNewStringBuilder.ToString();
NOTE
Instead of using myNewString variable, I've used StringBuilder object to build up the new string. This is more efficient than concatenating strings, as concatenating strings creates new strings and discards the old ones. The StringBuilder object avoids this by efficiently storing the string in a mutable buffer, reducing the number of object allocations and garbage collections.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. 😊
string myString = "1A5#3a2#";
var result = Regex.Replace(myString, @"(?<=\d\D*)\d\D*", ",$&");
Regex explanation (@regex101):
\d\D* - matches every occurrence of a digit together with any following non-digits (zero or more)
(?<=\d\D*) - positive lookbehind requiring at least one earlier digit group, so the first one is skipped
This can be updated if you need to handle consecutive digits (i.e. "1a55b" -> "1a,55b") by changing \d to \d+:
var result = Regex.Replace(myString, @"(?<=\d+\D*)\d+\D*", ",$&");
Note that a multi-digit number at the very start of the string is still split by this pattern (e.g. "12a5" -> "1,2a,5"), because the lookbehind is satisfied by the first digit alone.

String Conversion - remove some characters and replace non-digits with ASCII code

I need to take the value CS5999-1 and convert it to 678359991. Basically replace any alpha character with the equivalent ASCII value and strip the dash. I need to get rid of non-numeric characters and make the value unique (some of the data coming in is all numeric and I determined this will make the records unique).
I have played around with regular expressions and can replace the characters with an empty string, but can't figure out how to replace the character with an ASCII value.
Code is still stuck in .NET 2.0 (Corporate America) in case that matters for any ideas.
I have tried several different methods to do this and no I don't expect SO members to write the code for me. I am looking for ideas.
To replace the alpha characters with an empty string I have used:
strResults = Regex.Replace(strResults, @"[A-Za-z\s]", string.Empty);
This loop will replace each character with itself. Basically, if I could find a way to substitute the replacement value with the ASCII value I would have it, but I have tried converting the char value to int and several other methods I found, and all come up with an error.
foreach (char c in strMapResults)
{
    strMapResults = strMapResults.Replace(c, c);
}
Check if each character is in the a-z or A-Z range. If so, append its ASCII value to the result, and if it is in the 0-9 range, just append the digit.
public static string AlphaToAscii(string str)
{
    var result = string.Empty;
    foreach (char c in str)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            result += (int)c; // letter: append its numeric code, e.g. 'C' -> "67"
        else if (c >= '0' && c <= '9')
            result += c;      // digit: append as-is
    }
    return result;
}
All characters outside of the alpha-numeric range (such as -) will be ignored.
If you are running this function on particularly large strings or want better performance you may want to use a StringBuilder instead of +=.
For all characters in the ASCII range, the encoded value is the same as the Unicode code point. This is also true of ISO/IEC 8859-1, and UCS-2, but not of other legacy encodings.
And since UCS-2 is the same as UTF-16 for the values in UCS-2 (which includes all ASCII characters, as per the above), and since .NET char is a UTF-16 unit, all you need to do is just cast to int.
var builder = new StringBuilder(str.Length * 3); // Pre-allocate for the worst-case scenario
foreach (char c in str)
{
    if (c >= '0' && c <= '9')
        builder.Append(c);
    else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
        builder.Append((int)c);
}
string result = builder.ToString();
If you want to know how you might do this with a regular expression (you mentioned regex in your question), here's one way to do it.
The code below filters all non-digit characters, converting letters to their ASCII representation, and dumping anything else, including all non-ASCII alphabetical characters. Note that treating (int)char as the equivalent of a character's ASCII value is only valid where the character is genuinely available in the ASCII character set, which is clearly the case for A-Za-z.
MatchEvaluator filter = match =>
{
    var alpha = match.Groups["asciialpha"].Value;
    return alpha != "" ? ((int) alpha[0]).ToString() : "";
};
var filtered = Regex.Replace("CS5999-1", @"(?<asciialpha>[A-Za-z])|\D", filter);
Try this
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "CS5999-1";
            MatchEvaluator evaluator = new MatchEvaluator(Replace);
            string results = Regex.Replace(input, "[A-Za-z\\-]", evaluator);
            Console.WriteLine(results); // 678359991
        }
        // Called once per match: drops the dash, converts a letter to its ASCII code
        static string Replace(Match match)
        {
            if (match.Value == "-")
            {
                return "";
            }
            else
            {
                byte[] ascii = Encoding.UTF8.GetBytes(match.Value);
                return ascii[0].ToString();
            }
        }
    }
}

how to deal with string.split by position

I'd like to ask a question about String.Split.
For example:
char[] semicolon = new[] { ';' };
char[] bracket = new[] { '[', ']' };
string str = "AND[Firstpart;Sndpart]";
I can split str by bracket and then split by semicolon.
Finally, I get the Firstpart and Sndpart inside the brackets.
But if str = "AND[AND[Firstpart;Sndpart];sndpart]";
how can I get AND[Firstpart;Sndpart] and sndpart?
Is there a way to tell C# to split by the second semicolon?
Thanks for your help.
One way is to replace the characters inside brackets with characters that are not used in any of your strings.
Method HideSplit: This method swaps the separator characters inside brackets for placeholder ones, performs the split, and then returns the result with the original characters restored.
This method may be overkill if you want to do this many times, but you should be able to optimize it easily once you get the idea.
private static void Main()
{
    char[] semicolon = new[] { ';' };
    char[] bracket = new[] { '[', ']' };
    string str = "AND[AND[Firstpart;Sndpart];sndpart]";
    string[] splitbyBracket = HideSplit(str, bracket);
}
private static string[] HideSplit(string str, char[] separator)
{
    int counter = 0; // When counter is more than 0 it means we are inside brackets
    StringBuilder result = new StringBuilder(); // To build up string as result
    foreach (char ch in str)
    {
        if (ch == ']') counter--;
        if (counter > 0) // if we are inside brackets perform hide
        {
            if (ch == '[') result.Append('\uFFF0'); // add '\uFFF0' instead of '['
            else if (ch == ']') result.Append('\uFFF1');
            else if (ch == ';') result.Append('\uFFF2');
            else result.Append(ch);
        }
        else result.Append(ch);
        if (ch == '[') counter++;
    }
    // Perform split (characters are hidden now). Empty entries are dropped so that
    // a trailing ']' does not produce an extra empty element.
    string[] split = result.ToString().Split(separator, StringSplitOptions.RemoveEmptyEntries);
    return split.Select(x => x
        .Replace('\uFFF0', '[')
        .Replace('\uFFF1', ']')
        .Replace('\uFFF2', ';')).ToArray(); // unhide characters and give back result.
    // don't forget: using System.Linq;
}
Some examples :
string[] a1 = HideSplit("AND[AND[Firstpart;Sndpart];sndpart]", bracket);
// Will give you this array { AND , AND[Firstpart;Sndpart];sndpart }
string[] a2 = HideSplit("AND[Firstpart;Sndpart];sndpart", semicolon);
// Will give you this array { AND[Firstpart;Sndpart] , sndpart }
string[] a3 = HideSplit("AND[Firstpart;Sndpart]", bracket);
// Will give you this array { AND , Firstpart;Sndpart }
string[] a4 = HideSplit("Firstpart;Sndpart", semicolon);
// Will give you this array { Firstpart , Sndpart }
And you can continue splitting this way.
Is there a way to tell c# to split by second semicolon?
There is no direct way to do that, but if that is precisely what you want, it's not hard to achieve:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] tSplits = str.Split(new[] { ';' }, 3); // at most 3 parts
string[] splits = { tSplits[0] + ";" + tSplits[1], tSplits[2] };
You could achieve the same result using a combination of IndexOf() and Substring(), however that is most likely not what you'll end up using as it's too specific and not very helpful for various inputs.
For your case, you need something that understands context.
In real-world complex cases you'd probably use a lexer / parser, but that seems like overkill here.
Your best bet would probably be to use a loop, walking through all characters while counting opening (+) and closing (-) square brackets and splitting when you find a semicolon and the count is 1.
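The bracket-counting loop described above could be sketched roughly as follows. BracketSplitter and SplitAtDepth are hypothetical names, and the sketch assumes the brackets in the input are balanced.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BracketSplitter
{
    // Hypothetical helper: splits `input` on `separator`, but only where the
    // separator sits at exactly `depth` levels of square-bracket nesting.
    // Assumption: brackets are balanced; no validation is performed.
    public static List<string> SplitAtDepth(string input, char separator, int depth)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        int level = 0; // number of '[' currently open

        foreach (char ch in input)
        {
            if (ch == '[') level++;
            else if (ch == ']') level--;

            if (ch == separator && level == depth)
            {
                // separator at the requested depth: close the current part
                parts.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(ch);
            }
        }

        parts.Add(current.ToString()); // whatever follows the last separator
        return parts;
    }
}
```

Calling BracketSplitter.SplitAtDepth("AND[AND[Firstpart;Sndpart];sndpart]", ';', 1) yields "AND[AND[Firstpart;Sndpart]" and "sndpart]"; the nested semicolon is at depth 2 and is left alone.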
You can use Regex.Split, which is a more flexible form of String.Split:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] arr = Regex.Split(str, @"(.*?;.*?;)");
foreach (var s in arr)
    Console.WriteLine("'{0}'", s);
// output: ''
// 'AND[AND[Firstpart;Sndpart];'
// 'sndpart]'
Regex.Split splits not by chars, but by a string matching a regular expression, so it comes down to constructing a regex pattern that meets particular requirements. Splitting by a second semicolon is in practice splitting by a string that ends in a semicolon and contains another semicolon before it, so the pattern by which you split the input string could be, for example: (.*?;.*?;).
The returned array has three elements instead of two for two reasons: the pattern is wrapped in capturing parentheses, so the matched text itself is included in the result, and the match starts at the beginning of the input string, so the empty string before it is returned as the first element.
You can read more on Regex.Split on msdn.

What is the regular expression to replace white space with a specified character?

I have searched a lot of questions and answers, but I only found lengthy and complicated expressions. I want to replace all white space in a string. I know it can be done with a regex, but I don't know enough about regex to replace all white space with ',' (comma). I have checked some links but didn't find an exact answer. If you know of an existing question or answer like this, please point me to it.
My string is defined as below.
string sText = "BankMaster AccountNo decimal To varchar";
and the result should be return as below.
"BankMaster,AccountNo,decimal,To,varchar"
Full Code:
string sItems = Clipboard.GetText();
string[] lines = sItems.Split('\n');
for (int iLine = 0; iLine < lines.Length; iLine++)
{
    string sLine = lines[iLine];
    sLine = //CODE TO REPLACE WHITE SPACE WITH ','
    string[] cells = sLine.Split(',');
    grdGrid.Rows.Add(iLine, cells[0], cells[1], cells[2], cells[4]);
}
Additional Details
I have more than 16000 lines in a list, and all lines are formatted like the example above. So I am going to use a regular expression instead of a loop and recursive function calls. If you know a faster way than regex, please suggest it.
string result = Regex.Replace(sText, "\\s+", ",");
\s+ stands for "match one or more consecutive whitespace characters of any kind".
By whitespace the regex engine understands space ( ), tab (\t), newline (\n) and carriage return (\r), among others.
string a = "Some text with spaces";
Regex rgx = new Regex("\\s+");
string result = rgx.Replace(a, ",");
Console.WriteLine(result);
The code above will replace all the white spaces with ',' character
There are lots of samples for doing this with regular expressions:
Flex: replace all spaces with comma,
Regex replace all commas with value,
http://www.perlmonks.org/?node_id=896548,
http://www.dslreports.com/forum/r20971008-sed-help-whitespace-to-comma
Try This:
string str = "BankMaster AccountNo decimal To varchar";
StringBuilder temp = new StringBuilder();
str = str.Trim(); // trim before logic to avoid any trailing/leading whitespaces.
foreach (char ch in str)
{
    // first space of a run becomes a comma; further spaces are skipped
    if (ch == ' ' && temp[temp.Length-1] != ',')
    {
        temp.Append(",");
    }
    else if (ch != ' ')
    {
        temp.Append(ch.ToString());
    }
}
Console.WriteLine(temp);
Output:
BankMaster,AccountNo,decimal,To,varchar
Try this:
sText = Regex.Replace(sText , #"\s+", ",");
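Since the question's code replaces white space with commas only to split on commas right afterwards, one possible shortcut is to split on white space directly and skip the regex. This is only a sketch: WhitespaceSplitter and SplitOnWhitespace are made-up names, and whether it is actually faster for 16000 lines has not been measured here.

```csharp
using System;

public static class WhitespaceSplitter
{
    // Hypothetical helper: splits a line on runs of white space without a regex.
    // Passing null as the separator array makes String.Split treat white-space
    // characters as delimiters; RemoveEmptyEntries collapses consecutive ones.
    public static string[] SplitOnWhitespace(string line)
    {
        return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

WhitespaceSplitter.SplitOnWhitespace("BankMaster AccountNo decimal To varchar") returns the five cells directly, and string.Join(",", ...) over that array gives "BankMaster,AccountNo,decimal,To,varchar" if the comma-separated form is still needed.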

How to remove escape sequences from stream

Is there a quick way to find (and remove) all escape sequences from a Stream/String?
I hope the syntax below will be helpful for you:
string inputString = @"hello world]\ ";
StringBuilder sb = new StringBuilder();
string[] parts = inputString.Split(new char[] { ' ', '\n', '\t', '\r', '\f', '\v','\\' }, StringSplitOptions.RemoveEmptyEntries);
int size = parts.Length;
for (int i = 0; i < size; i++)
    sb.AppendFormat("{0} ", parts[i]);
The escape sequences that you are referring to are simply text-based representations of characters that are normally either unprintable (such as new lines or tabs) or conflict with other characters used in source code files (such as the backslash "\").
Although when debugging you might see these characters represented as escaped characters in the debugger, the actual characters in the stream are not "escaped"; they are those actual characters (for example a new line character).
If you want to remove certain characters (such as newline characters) then remove them in the same way you would any other character (e.g. the letter "a"). Strings are immutable, so assign the result back:
// Removes all newline characters in a string
myString = myString.Replace("\n", "");
If you are actually doing some processing on a string that contains escaped characters (such as a source code file) then you can simply replace the escaped string with its unescaped equivalent:
// Replaces the string "\n" with the newline character
myString = myString.Replace("\\n", "\n");
In the above I use the escape sequence for the backslash so that I match the string "\n", instead of the newline character.
If you're going for fewer lines of code:
string inputString = "\ncheese\a";
char[] escapeChars = new[]{ '\n', '\a', '\r' }; // etc
string cleanedString = new string(inputString.Where(c => !escapeChars.Contains(c)).ToArray());
You can use System.Char.IsControl() to detect control characters.
To filter control characters from a string:
public string RemoveControlCharacters(string input)
{
    return
        input.Where(character => !char.IsControl(character))
            .Aggregate(new StringBuilder(), (builder, character) => builder.Append(character))
            .ToString();
}
To filter control characters from a stream you can do something similar, however you will first need a way to convert a Stream to an IEnumerable<char>.
public IEnumerable<char> _ReadCharacters(Stream input)
{
    using (var reader = new StreamReader(input))
    {
        while (!reader.EndOfStream)
        {
            foreach (var character in reader.ReadLine())
            {
                yield return character;
            }
        }
    }
}
Then you can use this method to filter control characters:
public string RemoveControlCharacters(Stream input)
{
    return
        _ReadCharacters(input)
            .Where(character => !Char.IsControl(character))
            .Aggregate(new StringBuilder(), (builder, character) => builder.Append(character))
            .ToString();
}
An escape sequence is a string of characters, usually beginning with the ESC character, but it can contain any character. Escape sequences are used on terminals to control cursor position, graphics mode, etc.
http://en.wikipedia.org/wiki/Escape_sequence
Here is my implementation in Python. It should be easy enough to translate to C.
#!/usr/bin/python2.6/python
import sys

Estart = "\033"            # possible escape start keys
Estop = "HfABCDsuJKmhlp"   # possible esc end keys
replace = "\015"           # ^M character
replace_with = "\n"
f_in = sys.stdin
parsed = sys.stdout
seqfile = open('sequences', 'w')  # for debug
in_seq = 0

c = f_in.read(1)
while len(c) > 0 and not c == '\0':
    # copy ordinary characters until an escape start is seen
    while len(c) > 0 and c != '\0' and not c in Estart:
        if not c in replace:
            parsed.write(c)
        else:
            parsed.write(replace_with[replace.find(c)])
        c = f_in.read(1)
    # inside an escape sequence: divert it to the debug file until an end key
    while len(c) > 0 and c != '\0' and not c in Estop:
        seqfile.write(c)
        c = f_in.read(1)
    seqfile.write(c)  # write final character
    c = f_in.read(1)
f_in.close()
parsed.close()
seqfile.close()
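For C#, a regex-based alternative to the script above might look like the sketch below. It is not a translation of the Python code: instead of a list of end keys it matches the general CSI layout (ESC, '[', parameter bytes, intermediate bytes, one final byte), so other kinds of terminal escape sequences are left untouched. AnsiStripper and StripCsi are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AnsiStripper
{
    // ESC '[' + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F)
    // + one final byte (0x40-0x7E). Assumption: only CSI sequences need removing.
    private static readonly Regex Csi = new Regex(@"\x1B\[[0-?]*[ -/]*[@-~]");

    // Hypothetical helper: returns `input` with all CSI sequences removed.
    public static string StripCsi(string input)
    {
        return Csi.Replace(input, "");
    }
}
```

For example, AnsiStripper.StripCsi("\u001b[31mred\u001b[0m text") returns "red text".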
