Remove any character from a string except digits, dot (.) and comma (,) [duplicate] - c#

Hi friends,
I have a scenario where I have to remove the invalid characters from a price using C# code.
I am looking for a regular expression to remove those characters, or something better than that.
For example, my price is
"3,950,000 ( Ex. TAX )"
and I want to remove "( Ex. TAX )" from it.
In other words, I have to remove every character from the string except digits, the dot (.) and the comma (,).
Please help.
Thanks in advance,
Shivi

private string RemoveExtraText(string value)
{
var allowedChars = "01234567890.,";
return new string(value.Where(c => allowedChars.Contains(c)).ToArray());
}

string s = @"3,950,000 ( Ex. TAX )";
string result = string.Empty;
foreach (var c in s)
{
int ascii = (int)c;
if ((ascii >= 48 && ascii <= 57) || ascii == 44 || ascii == 46)
result += c;
}
Console.Write(result);
Notice that the dot in "Ex. TAX" will stay

How about this:
using System.Text.RegularExpressions;
public static Regex regex = new Regex(
"(\\d|[,\\.])*",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
//// Capture the first Match, if any, in the InputText
Match m = regex.Match(InputText);
//// Capture all Matches in the InputText
MatchCollection ms = regex.Matches(InputText);
//// Test to see if there is a match in the InputText
bool IsMatch = regex.IsMatch(InputText);

You can use LINQ
HashSet<char> validChars = new HashSet<char>(
new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ',', '.' });
var washedString = new string((from c in "3,950,000 ( Ex. TAX )"
where validChars.Contains(c)
select c).ToArray());
but the "." in "Ex. TAX" will remain.
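One way around the leftover dot is to match the number itself instead of filtering characters. The following is only a sketch: the class and method names are made up for illustration, and it assumes the price is the first number-like run in the string.

```csharp
using System.Text.RegularExpressions;

public static class PriceText
{
    // Hypothetical helper: returns the first number-like run in the input,
    // or an empty string when there is none.
    public static string ExtractPrice(string value)
    {
        // One digit, then any mix of digits and commas, then an optional
        // decimal part. A dot that is not followed by a digit is not consumed,
        // so the dot in "Ex." is never picked up.
        Match m = Regex.Match(value, @"\d[\d,]*(?:\.\d+)?");
        return m.Success ? m.Value : string.Empty;
    }
}
```

Calling PriceText.ExtractPrice("3,950,000 ( Ex. TAX )") gives "3,950,000".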

You may use something like [^alpha] or [^a-z].
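Applied to the price question, the negated character class would list the characters to keep. This is a minimal sketch with a hypothetical helper name; like the other filtering answers, it keeps the dot from "Ex. TAX".

```csharp
using System.Text.RegularExpressions;

public static class PriceFilter
{
    // Hypothetical helper: deletes every character that is not
    // a digit, a dot or a comma.
    public static string KeepDigitsDotsCommas(string value)
    {
        // [^...] matches any single character NOT listed in the brackets.
        return Regex.Replace(value, @"[^0-9.,]", "");
    }
}
```

For "3,950,000 ( Ex. TAX )" this returns "3,950,000." (with the trailing dot).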

Related

How to remove a charlist from a string

How can I remove a specific list of chars from a string?
For example I have the string Multilanguage File07 and want to remove all vowels, spaces and numbers to get the string MltlnggFl.
Is there any shorter way than using a foreach loop?
string MyLongString = "Multilanguage File07";
string MyShortString = MyLongString;
char[] charlist = new char[17]
{ 'a', 'e', 'i', 'o', 'u',
'0', '1', '2', '3', '4', '5',
'6', '7', '8', '9', '0', ' ' };
foreach (char letter in charlist)
{
MyShortString = MyShortString.Replace(letter.ToString(), "");
}
Use this code to replace a list of chars within a string:
using System.Text.RegularExpressions;
string MyLongString = "Multilanguage File07";
string MyShortString = Regex.Replace(MyLongString, "[aeiou0-9 ]", "");
Result:
Multilanguage File07 => MltlnggFl
Text from which some chars should be removed 12345 => Txtfrmwhchsmchrsshldbrmvd
Explanation of how it works:
The regex I use here is a list of independent characters defined by the brackets []
=> [aeiou0-9 ]
Regex.Replace() walks through the whole string and checks whether each character matches one of the characters in the class.
Every matched character is replaced by an empty string ("").
How about this:
var charList = new HashSet<char>("aeiou0123456789 ");
MyLongString = new string(MyLongString.Where(c => !charList.Contains(c)).ToArray());
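Try this pattern: (?|([aeyuio0-9 ]+)). Replace it with an empty string and you will get your desired result.
I used branch reset (?|...) so all characters are captured into one group for easier manipulation. Note that branch reset is a PCRE feature that .NET's Regex class does not support; in C#, the plain character class [aeyuio0-9 ]+ does the same job.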
Demo.
public void removeVowels()
{
string str = "MultilanguAge File07";
// List every digit explicitly: inside a plain string, "0-9" is just the three characters '0', '-' and '9'.
var chr = str.Where(c => !"aeiouAEIOU0123456789 ".Contains(c)).ToList();
Console.WriteLine(string.Join("", chr));
}
1st line: declares the input string.
2nd line: uses LINQ to skip vowels (upper and lower case), the digits 0-9 and spaces, and collects the remaining characters into a list.
3rd line: joins the list back into a single string with string.Join.
Result: MltlnggFl
Note: removeVowels removes upper-case vowels as well as lower-case vowels, digits and spaces.

Regex for removing only specific special characters from string

I'd like to write a regex that would remove the special characters on following basis:
To remove white space character
@, &, ', (, ), <, > or #
I have written this regex which removes whitespaces successfully:
string username = Regex.Replace(_username, @"\s+", "");
But I'd like to upgrade/change it so that it can remove the characters above that I mentioned.
Can someone help me out with this?
string username = Regex.Replace(_username, @"(\s+|@|&|'|\(|\)|<|>|#)", "");
use a character set [charsgohere]
string removableChars = Regex.Escape(@"@&'()<>#");
string pattern = "[" + removableChars + "]";
string username = Regex.Replace(_username, pattern, "");
I suggest using Linq instead of regular expressions:
string source = ...
string result = string.Concat(source
.Where(c => !char.IsWhiteSpace(c) &&
c != '(' && c != ')' ...));
In case you have many characters to skip you can organize them into a collection:
HashSet<char> skip = new HashSet<char>() {
'(', ')', ...
};
...
string result = string.Concat(source
.Where(c => !char.IsWhiteSpace(c) && !skip.Contains(c)));
You can easily use the Replace function of the Regex:
string a = "ash&#<>fg fd";
a = Regex.Replace(a, "[@&'(\\s)<>#]", "");
import re
string1 = "12#34#adf$c5,6,7,ok"
output = re.sub(r'[^a-zA-Z0-9]','',string1)
The ^ at the start of the brackets negates the class, so the pattern matches every character except those listed; each match is replaced with an empty string and the cleaned string is returned.
result = 1234adfc567ok

C# regular expression

I have string like this:
{F971h}[0]<0>some result code: 1
and I want to split it into:
F971
0
0
some result code: 1
I know I can first split "{|}|[|]|<|>" it into:
{F971h}
[0]
<0>
some result code: 1
and next: {F971h} -> F971; [0] -> 0; etc.
But how can I do it with one regular expression?
I tried something like this:
Regex rgx = new Regex(@"(?<timestamp>[0-9A-F]+)" + @"(?<subsystem>\d+)" + @"(?<level>\d+)" + @"(?<messagep>[0-9A-Za-z]+)");
var result = rgx.Matches(input);
You can try just Split without any regular expressions:
string source = "{F971h}[0]<0>some result code: 1";
string[] items = source.Split(new char[] { '{', '}', '[', ']', '<', '>' },
StringSplitOptions.RemoveEmptyEntries);
Test:
// F971h
// 0
// 0
// some result code: 1
Console.Write(String.Join(Environment.NewLine, items));
There are two issues with your regex:
You do not allow lowercase ASCII letters in the first capture group (add a-z or a RegexOptions.IgnoreCase flag)
The delimiting characters are missing in the pattern (<, >, [, ], etc.)
Use
{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)
The additions are the literal delimiters {, }, \[, ], < and >, plus a-z in the first group.
See the regex demo
Since the messagep group should match just the rest of the line, I suggest just using .+ at the end. Else, you'd need to replace your [0-9A-Za-z]+ that does not allow whitespace with something like [\w\s]+ (match all word chars and whitespaces, 1 or more times).
C# code:
var s = @"{F971h}[0]<0>some result code: 1";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)";
var m = Regex.Match(s, pat);
if (m.Success)
{
Console.Out.WriteLine(m.Groups["timestamp"].Value);
Console.Out.WriteLine(m.Groups["subsystem"].Value);
Console.Out.WriteLine(m.Groups["level"].Value);
Console.Out.WriteLine(m.Groups["messagep"].Value);
}
Or for a multiline string containing multiple matches:
var s = "{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>[^\r\n]+)";
var res = System.Text.RegularExpressions.Regex.Matches(s, pat)
.Cast<System.Text.RegularExpressions.Match>()
.Select(x => new[] {
x.Groups["timestamp"].Value,
x.Groups["subsystem"].Value,
x.Groups["level"].Value,
x.Groups["messagep"].Value})
.ToList();
You can get it like that:
string line = @"{F971h}[0]<0>some result code: 1";
var matchCollection = Regex.Matches(line, @"\{(?<timestamp>.*?)\}\[(?<subsystem>.*?)\]<(?<level>.*?)>(?<messagep>.*)");
if (matchCollection.Count > 0)
{
string timestamp = matchCollection[0].Groups["timestamp"].Value;
string subsystem = matchCollection[0].Groups["subsystem"].Value;
string level = matchCollection[0].Groups["level"].Value;
string messagep = matchCollection[0].Groups["messagep"].Value;
Console.Out.WriteLine("First part is {0}, second: {1}, third: {2}, last: {3}", timestamp, subsystem, level, messagep);
}
else
{
Console.Out.WriteLine("No match found.");
}
You can watch it live here on regex storm. You'll have to learn about:
Named capture groups
Repetitions
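To make those two topics concrete, here is a small sketch that contrasts a lazy repetition (.*?) with a greedy one (.*) inside a named group. The class, the method and the group name "body" are invented for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // Hypothetical helper: returns the text captured by the named group
    // "body", or null when the pattern does not match at all.
    public static string Capture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups["body"].Value : null;
    }
}
```

With the input "{F971h}[0]<0>{x}", the lazy pattern @"\{(?<body>.*?)\}" stops at the first closing brace and captures "F971h", while the greedy pattern @"\{(?<body>.*)\}" runs to the last closing brace and captures "F971h}[0]<0>{x". That is why the answer above uses .*? for every field except the last one.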
Thank you all! Code below works for me. I missed that it can be multiple string:
{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5
code:
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<message>.+)";
var collection = Regex.Matches(input, pat);
foreach (Match m in collection)
{
var timestamp = m.Groups["timestamp"];
var subsystem = m.Groups["subsystem"];
var level = m.Groups["level"];
var message = m.Groups["message"];
}

what regex must i use to split this?

I am very new to C#.
I want a program that, given input like this,
input : There are 4 numbers in this string 40, 30, and 10
output :
there = string
are = string
4 = number
numbers = string
in = string
this = string
40 = number
, = symbol
30 = number
, = symbol
and = string
10 = number
I tried this:
{
class Program
{
static void Main(string[] args)
{
string input = "There are 4 numbers in this string 40, 30, and 10.";
// Split on one or more non-digit characters.
string[] numbers = Regex.Split(input, @"(\D+)(\s+)");
foreach (string value in numbers)
{
Console.WriteLine(value);
}
}
}
}
But the output is different from what I want. Please help me, I am stuck :((
The regex parser has an if conditional and the ability to group items into named capture groups, both of which I will demonstrate.
Here is an example where the pattern looks for symbols first (only a comma; add more symbols to the set [,]), then numbers, and drops the rest into words.
string text = @"There are 4 numbers in this string 40, 30, and 10";
string pattern = @"
(?([,]) # If a comma (or other then add it) is found its a symbol
(?<Symbol>[,]) # Then match the symbol
| # else its not a symbol
(?(\d+) # If a number
(?<Number>\d+) # Then match the numbers
| # else its not a number
(?<Word>[^\s]+) # So it must be a word.
)
)
";
// Ignore pattern white space allows us to comment the pattern only, does not affect
// the processing of the text!
Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt =>
{
if (mt.Groups["Symbol"].Success)
return "Symbol found: " + mt.Groups["Symbol"].Value;
if (mt.Groups["Number"].Success)
return "Number found: " + mt.Groups["Number"].Value;
return "Word found: " + mt.Groups["Word"].Value;
}
)
.ToList() // Only needed so that ForEach can print the results
.ForEach(rs => Console.WriteLine (rs));
/* Result
Word found: There
Word found: are
Number found: 4
Word found: numbers
Word found: in
Word found: this
Word found: string
Number found: 40
Symbol found: ,
Number found: 30
Symbol found: ,
Word found: and
Number found: 10
*/
Once the regex has tokenized the text into matches, we use LINQ to extract those tokens by checking which named capture group succeeded. In this example we take the successful capture group and project it into a string to print out for viewing.
I discuss the regex if conditional on my blog Regular Expressions and the If Conditional for more information.
You could split using this pattern: @"(,)\s?|\s"
This splits on a comma, but preserves it since it is within a group. The \s? serves to match an optional space but excludes it from the result. Without it, the split would include the space that occurred after a comma. Next, there's an alternation to split on whitespace in general.
To categorize the values, we can take the first character of the string and check for the type using the static Char methods.
string input = "There are 4 numbers in this string 40, 30, and 10";
var query = Regex.Split(input, @"(,)\s?|\s")
.Select(s => new
{
Value = s,
Type = Char.IsLetter(s[0]) ?
"String" : Char.IsDigit(s[0]) ?
"Number" : "Symbol"
});
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.Value, item.Type);
}
To use the Regex.Matches method instead, this pattern can be used: @"\w+|,"
var query = Regex.Matches(input, @"\w+|,").Cast<Match>()
.Select(m => new
{
Value = m.Value,
Type = Char.IsLetter(m.Value[0]) ?
"String" : Char.IsDigit(m.Value[0]) ?
"Number" : "Symbol"
});
Well to match all numbers you could do:
[\d]+
For the strings:
[a-zA-Z]+
And for some of the symbols for example
[,.?\[\]\\\/;:!\*]+
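The three classes above can be combined into one alternation so that a single pass labels every token. This is only a sketch: the class name, method name and group names are assumptions for illustration, and the symbol class is used without the trailing + so that each symbol is reported on its own.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Hypothetical helper: labels each token as number, string or symbol.
    public static List<string> Classify(string input)
    {
        // One named group per token type; whitespace matches none of them
        // and is therefore skipped.
        const string pattern =
            @"(?<number>\d+)|(?<word>[a-zA-Z]+)|(?<symbol>[,.?\[\]\\\/;:!\*])";

        var lines = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            string type = m.Groups["number"].Success ? "number"
                        : m.Groups["word"].Success ? "string"
                        : "symbol";
            lines.Add(m.Value + " = " + type);
        }
        return lines;
    }
}
```

For the input "40, 30, and 10" the returned list is "40 = number", ", = symbol", "30 = number", ", = symbol", "and = string", "10 = number".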
You can very easily do this like so:
string[] tokens = input.Split(' ');
foreach (string token in tokens)
{
if (token.Length == 0)
{
continue; // skip empty entries caused by repeated spaces
}
if (Int32.TryParse(token, out _))
{
Console.WriteLine(token + " = number");
}
else if (token.Length == 1 && !Char.IsLetterOrDigit(token[0]))
{
Console.WriteLine(token + " = symbol");
}
else
{
Console.WriteLine(token + " = string");
}
}
I did not have an IDE handy to test this. Essentially, what you are doing is splitting the input on spaces and then performing some comparisons to determine whether each token is a symbol, string, or number. Note that a token such as "40," keeps its trailing comma and is therefore reported as a string.
If you want to get the numbers
var reg = new Regex(@"\d+");
var matches = reg.Matches(input );
var numbers = matches
.Cast<Match>()
.Select(m=>Int32.Parse(m.Groups[0].Value));
To get your output:
var regSymbols = new Regex(@"(?<number>\d+)|(?<string>\w+)|(?<symbol>(,))");
var sMatches = regSymbols.Matches(input );
var symbols = sMatches
.Cast<Match>()
.Select(m=> new
{
Number = m.Groups["number"].Value,
String = m.Groups["string"].Value,
Symbol = m.Groups["symbol"].Value
})
.Select(
m => new
{
Match = !String.IsNullOrEmpty(m.Number) ?
m.Number : !String.IsNullOrEmpty(m.String)
? m.String : m.Symbol,
MatchType = !String.IsNullOrEmpty(m.Number) ?
"Number" : !String.IsNullOrEmpty(m.String)
? "String" : "Symbol"
}
);
edit
If there are more symbols than a comma, you can group them in a character class, as @Bogdan Emil Mariesan did, and the regex becomes:
@"(?<number>\d+)|(?<string>\w+)|(?<symbol>[,.\?!])"
edit2
To get the strings with =
var outputLines = symbols.Select(m=>
String.Format("{0} = {1}", m.Match, m.MatchType));

How to parse a comma delimited string when comma and parenthesis exists in field

I have this string in C#
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
I want to use a RegEx to parse it to get the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
In addition to the above example, I tested with the following, but am still unable to parse it correctly.
"%exc.uns: 8 hours let # = ABC, DEF", "exc_it = 1 day" , " summ=graffe ", " a,b,(c,d)"
The new text will be in one string
string mystr = @"""%exc.uns: 8 hours let # = ABC, DEF"", ""exc_it = 1 day"" , "" summ=graffe "", "" a,b,(c,d)""";
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == ',' && scopeLevel == 0)
{
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
firstIndex = i + 1;
}
else if (str[i] == '(') scopeLevel++;
else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));
Even faster:
([^,]*\x28[^\x29]*\x29|[^,]+)
That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
^ ^ ^ ^ ^
The Carets symbolize where the grouping stops.
Just this regex:
[^,()]+(\([^()]*\))?
A test example:
var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
.Cast<Match>()
.Select(m => m.Value);
returns
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
If you simply must use Regex, then you can split the string on the following:
, # match a comma
(?= # that is followed by
(?: # either
[^\(\)]* # no parens at all
| # or
(?: #
[^\(\)]* # ...
\( # (
[^\(\)]* # stuff in parens
\) # )
[^\(\)]* # ...
)+ # any number of times
)$ # until the end of the string
)
It breaks your input into the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
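For reference, the lookahead pattern above could be used with Regex.Split roughly as follows. This is a sketch under the same assumption as the answer (parentheses are balanced and not nested); the class and method names are hypothetical, and the pattern is the one shown above written on a single line.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenAwareSplit
{
    // A comma counts as a separator only if the rest of the string consists
    // of text without parentheses, or of complete "( ... )" pairs.
    private const string Pattern =
        @",(?=(?:[^()]*|(?:[^()]*\([^()]*\)[^()]*)+)$)";

    // Hypothetical helper: splits on top-level commas and trims each part.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern)
                    .Select(part => part.Trim())
                    .ToArray();
    }
}
```

All groups in the pattern are non-capturing, so Regex.Split does not add extra entries to the result. The Trim call removes the space that follows some of the commas in the sample input.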
Another way to implement what Snowbear was doing:
public static string[] SplitNest(this string s, char src, string nest, string trg)
{
int scope = 0;
if (trg == null || nest == null) return null;
if (trg.Length == 0 || nest.Length < 2) return null;
if (trg.IndexOf(src) >= 0) return null;
if (nest.IndexOf(src) >= 0) return null;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == src && scope == 0)
{
s = s.Remove(i, 1).Insert(i, trg);
}
else if (s[i] == nest[0]) scope++;
else if (s[i] == nest[1]) scope--;
}
return s.Split(new[] { trg }, StringSplitOptions.None); // this overload also exists on .NET Framework
}
The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");
The function would first turn your string into
adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO
then split on the ~, ignoring the nested commas.
Assuming non nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:
MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join("\n", Regex.Split(s, @"(?<=\)),|,\s"));
The pattern either matches a comma that is preceded by ) (the lookbehind keeps the parenthesis out of the match), or matches a comma followed by a whitespace character.
result =
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
The TextFieldParser (msdn) class seems to have the functionality built-in:
TextFieldParser Class: - Provides methods and properties for parsing structured text files.
Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.
The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.
See the article which helped me find that
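A minimal sketch of how TextFieldParser might be used on an in-memory string is shown below. The class and method names are hypothetical, and a reference to the Microsoft.VisualBasic assembly is assumed. Note that the parser understands quoted fields, not parentheses, so it suits the quoted second example from the question rather than the first one.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFields
{
    // Hypothetical helper: parses comma-delimited text in which a field
    // may be wrapped in double quotes to protect embedded commas.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // Each ReadFields call consumes one line of input.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing the single line "a, b",c,"d (e, f)" (with the quotes as part of the text) yields one row with the three fields a, b and c and d (e, f); the commas inside the quotes are left alone.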
Here's a stronger option, which parses the whole text, including nested parentheses:
string pattern = @"
\A
(?>
(?<Token>
(?:
[^,()] # Regular character
|
(?<Paren> \( ) # Opening paren - push to stack
|
(?<-Paren> \) ) # Closing paren - pop
|
(?(Paren),) # If inside parentheses, match comma.
)*?
)
(?(Paren)(?!)) # If we are not inside parentheses,
(?:,|\Z) # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
# though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);
You can get all matches by iterating over match.Groups["Token"].Captures.
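Iterating over the captures could look like the sketch below. The pattern is the one from this answer with shortened comments; the class and method names, and the Trim call on each token, are additions for illustration only.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedSplit
{
    // Pattern copied from the answer above; it relies on .NET balancing
    // groups, so it is specific to the .NET regex engine.
    private const string Pattern = @"
        \A
        (?>
          (?<Token>
            (?:
              [^,()]              # regular character
              |
              (?<Paren> \( )      # opening paren: push
              |
              (?<-Paren> \) )     # closing paren: pop
              |
              (?(Paren),)         # comma, only while inside parens
            )*?
          )
          (?(Paren)(?!))          # fail if a paren is still open
          (?:,|\Z)                # separator or end of input
        )*?
        \Z
    ";

    // Hypothetical helper: returns one entry per top-level field,
    // or an empty list if the input does not match (e.g. unbalanced parens).
    public static List<string> Tokens(string data)
    {
        var tokens = new List<string>();
        Match match = Regex.Match(data, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success)
        {
            return tokens;
        }
        foreach (Capture capture in match.Groups["Token"].Captures)
        {
            tokens.Add(capture.Value.Trim());
        }
        return tokens;
    }
}
```

A single Match is enough here because the Token group is repeated inside the pattern, so every repetition is recorded in its Captures collection.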
