Regex to strip characters except given ones? - c#

I would like to strip strings but only leave the following:
[a-zA-Z]+[_a-zA-Z0-9-]*
I am trying to output strings that start with a letter and can then contain alphanumeric characters, underscores, and dashes. How can I do this with a regex or another function?

Because every character allowed in the first part of the regex is also allowed in the second part, you could do something like this:
String foo = "_-abc.!##$5o993idl;)"; // your string here.
//First replace removes all the characters you don't want.
foo = Regex.Replace(foo, "[^_a-zA-Z0-9-]", "");
//Second replace removes any characters from the start that aren't allowed there.
foo = Regex.Replace(foo, "^[^a-zA-Z]+", "");
So start out by paring it down to only the allowed characters. Then get rid of any allowed characters that can't be at the beginning.
Of course, if your regex gets more complicated, this solution falls apart fairly quickly.
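The two replacements above can be wrapped in a small helper. This is only a sketch: the class name IdentifierCleaner and the method name Clean are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class IdentifierCleaner
{
    // Hypothetical helper: reduces a string to the shape [a-zA-Z]+[_a-zA-Z0-9-]*.
    public static string Clean(string value)
    {
        // Step 1: drop every character that is not allowed anywhere in the result.
        string allowedOnly = Regex.Replace(value, "[^_a-zA-Z0-9-]", "");
        // Step 2: drop leading characters that are allowed later but not in first position.
        return Regex.Replace(allowedOnly, "^[^a-zA-Z]+", "");
    }

    static void Main()
    {
        Console.WriteLine(Clean("_-abc.!##$5o993idl;)")); // abc5o993idl
        Console.WriteLine(Clean("123-_"));                // prints an empty line
    }
}
```

Note that an input with no letters at all comes back as an empty string, so callers may want to check for that case.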

Assuming that you've got the strings in a collection, I would do it this way:
// Assumes the strings are in a List<string> named strings (a hypothetical name).
// Removes every element that does not match the whole pattern.
strings.RemoveAll(s => !Regex.IsMatch(s, "^[a-zA-Z]+[_a-zA-Z0-9-]*$"));
Or the other way round: if a string matches, add it to a new collection.
If the strings are not in a collection, can you add more details about what your input looks like?

If you want to pull out all of the identifiers matching your regular expression, you can do it like this:
var input = " _wontmatch f_oobar0 another_valid ";
var re = new Regex( @"\b[a-zA-Z][_a-zA-Z0-9-]*\b" );
foreach( Match match in re.Matches( input ) )
    Console.WriteLine( match.Value );

Use MatchCollection matchColl = Regex.Matches("input string","your regex");
Then use:
string [] outStrings = new string[matchColl.Count]; //A string array to contain all required strings
for (int i=0; i < matchColl.Count; i++ )
outStrings[i] = matchColl[i].ToString();
You will have all the required strings in outStrings. Hope this helps.

Edited
var s = Regex.Matches(input_string, "[a-z]+(_*-*[a-z0-9]*)*", RegexOptions.IgnoreCase);
string output_string="";
foreach (Match m in s)
{
output_string = output_string + m;
}
MessageBox.Show(output_string);

Related

c# extracting a certain value within a string

I'm trying to remove a certain bit of text within a string.
Say the string contains HTML elements, like paragraph tags, and I have created tokens that are identified by "{" at the beginning and "}" at the end.
So essentially the string I have would look like this:
text = "<p>{token}</p><p> text goes here {token3}</p>"
I'm wondering whether there is a way to extract all the words, including the "{}", from the string using C#.
Each token can be different from the next, which is why I must use "{" and "}" to identify them.
At the moment I've got this code:
var newWord = text.Contains("{") && word.Contains("}")

Something like
var r = new Regex(@"(\{.*?\})");
foreach (Match match in r.Matches(myString)) ...
The ? makes the quantifier non-greedy. If you omit it, you'll simply get everything between the first { and the last }.
Alternatively, you may also use this:
var result = new List<string>();
var index = text.IndexOf("{");
while (index != -1)
{
    var end = text.IndexOf("}", index);
    if (end == -1) break; // unmatched "{": stop instead of throwing in Substring
    result.Add(text.Substring(index, end - index + 1));
    index = text.IndexOf("{", index + 1);
}

I would just use a regex for this:
Regex reg = new Regex("{.*?}");
var results = reg.Matches(text);
The regex searches for any characters between { and }.
The .*? means match any character but in a non greedy way. So it will search for the shortest possible string between braces.
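To make the greedy versus non-greedy difference concrete, here is a small sketch run against the string from the question. The class TokenDemo and the helper Find are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenDemo
{
    // Hypothetical helper: returns every match of pattern in text as a string.
    public static string[] Find(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    static void Main()
    {
        string text = "<p>{token}</p><p> text goes here {token3}</p>";

        // Non-greedy: each match stops at the nearest closing brace.
        Console.WriteLine(string.Join(" | ", Find(text, @"\{.*?\}")));
        // {token} | {token3}

        // Greedy: a single match runs from the first "{" to the last "}".
        Console.WriteLine(string.Join(" | ", Find(text, @"\{.*\}")));
        // {token}</p><p> text goes here {token3}
    }
}
```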

Using regex to remove everything that is not in between '<#'something'#>' and replace it with commas

I have a string, for example
<#String1#> + <#String2#> , <#String3#> --<#String4#>
And I want to use regex/string manipulation to get the following result:
<#String1#>,<#String2#>,<#String3#>,<#String4#>
I don't really have any experience doing this, any tips?

There are multiple ways to do something like this, and it depends on exactly what you need. However, if you want to use a single regex operation to do it, and you only want to fix stuff that comes between the bracketed strings, then you could do this:
string input = "<#String1#> + <#String2#> , <#String3#> --<#String4#>";
string pattern = "(?<=>)[^<>]+(?=<)";
string replacement = ",";
string result = Regex.Replace(input, pattern, replacement);
The pattern uses [^<>]+ to match any run of non-angle-bracket characters, and combines it with a look-behind assertion, (?<=>), and a look-ahead assertion, (?=<), so that it only matches text that occurs between a closing bracket and the next opening bracket.
If you need to remove text that comes before the first < or after the last >, or if you find the look-around assertions confusing, you may want to consider simply matching the text that comes between the brackets, then looping through all the matches and building a new string yourself, rather than using the Regex.Replace method. For instance:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
List<String> values = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
values.Add(m.Value);
string result = String.Join(",", values);
Or, the same thing using LINQ:
string input = "sdfg<#String1#> + <#String2#> , <#String3#> --<#String4#>ag";
string pattern = @"<[^<>]+>";
string result = String.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value));

If you're just after string manipulation and don't necessarily need a regex, you could simply use the string.Replace method.
yourString = yourString.Replace("#> + <#", "#>,<#");

RegEx: Split string by separator and then by another

I'm having trouble getting the behavior I need.
Assume there is a
sourceString = @"name1$$value1^name2$$value2^name3$$value3";
or possibly a much longer string.
I'd like to split first by the ^ separator and then by $$, to create a dictionary from these name-value pairs.
The string is stored in a file, so it may be very long, and split operations may take too much time.
I'm hoping there is a regex that matches on ^ with an inner group match on $$.

This regex (.*?)\$\$(.*?)(?:\^|$) will match the name value pairs, and here is a Rubular to prove it. And to use it you can use the following code:
var input = "name1$$value1^name2$$value2^name3$$value3";
var pattern = @"(.*?)\$\$(.*?)(?:\^|$)";
var hash = new Dictionary<string, string>();
var match = Regex.Match(input, pattern);
while (match.Success)
{
hash.Add(match.Groups[1].Value, match.Groups[2].Value);
match = match.NextMatch();
}

Why not use:
sourceString.Split(new char[] {'^'}, StringSplitOptions.RemoveEmptyEntries)
Then you can do the same for $$
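A minimal sketch of that two-level split, building the dictionary the question asks for. The names PairParser and Parse are hypothetical, and the handling of malformed pairs and duplicate names is an assumption, since the question does not specify it.

```csharp
using System;
using System.Collections.Generic;

static class PairParser
{
    // Hypothetical helper: splits on '^' first, then splits each pair on "$$".
    public static Dictionary<string, string> Parse(string source)
    {
        var result = new Dictionary<string, string>();
        foreach (string pair in source.Split(new[] { '^' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Limit to 2 parts so a "$$" inside the value is kept (an assumption about the format).
            string[] parts = pair.Split(new[] { "$$" }, 2, StringSplitOptions.None);
            if (parts.Length == 2)
                result[parts[0]] = parts[1]; // assumption: a repeated name overwrites the earlier value
        }
        return result;
    }

    static void Main()
    {
        var pairs = Parse("name1$$value1^name2$$value2^name3$$value3");
        foreach (var entry in pairs)
            Console.WriteLine(entry.Key + " = " + entry.Value);
    }
}
```

Pairs without a "$$" separator are skipped silently here; throwing an exception instead would be an equally reasonable choice.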

Fixing RegEx Split() function - Empty string as first entry

Here is the code up front, to help visualize the problem I am facing.
This is the text that needs to be split.
:20:0444453880181732
:21:0444453880131350
:22:CANCEL/ABCDEF0131835055
:23:BUY/CALL/E/EUR
:82A:ABCDEFZZ80A
:87A:4444655604
:30:061123
:31G:070416/1000/USNY
:31E:070418
:26F:PRINCIPAL
:32B:EUR1000000,00
:36:1,31000000
:33B:USD1310000,00
:37K:PCT1,60000000
:34P:061127USD16000,00
:57A:ABCDEFZZ80A
This is my Regex
Regex r = new Regex(@"\:\d{2}\w*\:", RegexOptions.Multiline);
MatchCollection matches = r.Matches(Content);
string[] items = r.Split(Content);
// ----- Fix for first entry being empty string.
int index = items[0] == string.Empty ? 1 : 0;
foreach (Match match in matches)
{
MessageField field = new MessageField();
field.FieldIdExtended = match.Value;
field.Content = items[index];
Fields.Add(field);
index++;
}
As you can see from the comments the problem occurs with the splitting of the string.
It returns as first item an empty string.
Is there any elegant way to solve this?
Thanks, Dimi

The reason you are getting this behaviour is that the first delimiter in the split has nothing before it, and thus the first entry is blank.
The way to solve this properly is probably to capture the value that you want in the regular expression and then just get it from your match set.
At a rough first guess you probably want something like:
Regex r = new Regex(@"^:(?<id>\d{2}\w*):(?<content>.*)$", RegexOptions.Multiline);
MatchCollection matches = r.Matches(Content);
foreach (Match match in matches)
{
    MessageField field = new MessageField();
    field.FieldIdExtended = match.Groups["id"].ToString();
    field.Content = match.Groups["content"].ToString();
    Fields.Add(field);
}
The use of named capture groups makes it easy to extract stuff. You may need to tweak the regex to be more as you want. Currently it gets 20 as id and 0444453880181732 as content. I wasn't 100% clear on what you needed to capture but you look ok with regex so I assume that isn't a problem. :)
Essentially here you are not really trying to split stuff but match stuff and pull it out.
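As a self-contained sketch of the match-and-extract idea, the following parses a few of the sample lines into id/content pairs. The names FieldParser and Parse are hypothetical, and KeyValuePair stands in for the MessageField class, whose definition is not shown in the question. It also uses [^\r\n]* rather than .* for the content, on the assumption that the input may have Windows line endings, in which case .* would capture a trailing '\r'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class FieldParser
{
    // Named groups: "id" is the tag between the colons, "content" is the rest of the line.
    static readonly Regex FieldPattern =
        new Regex(@"^:(?<id>\d{2}\w*):(?<content>[^\r\n]*)", RegexOptions.Multiline);

    // Hypothetical helper: returns one (id, content) pair per field line.
    public static List<KeyValuePair<string, string>> Parse(string content)
    {
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match match in FieldPattern.Matches(content))
        {
            fields.Add(new KeyValuePair<string, string>(
                match.Groups["id"].Value, match.Groups["content"].Value));
        }
        return fields;
    }

    static void Main()
    {
        string sample = ":20:0444453880181732\r\n:82A:ABCDEFZZ80A\r\n:32B:EUR1000000,00";
        foreach (var field in Parse(sample))
            Console.WriteLine(field.Key + " => " + field.Value);
        // 20 => 0444453880181732
        // 82A => ABCDEFZZ80A
        // 32B => EUR1000000,00
    }
}
```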

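Use:
string[] items = r.Split(Content).Where(s => s.Length > 0).ToArray();
to remove empty entries. Regex.Split has no overload that takes StringSplitOptions, so the empty strings are filtered out with LINQ instead (this requires using System.Linq).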

C# string manipulation

I have a string like
A150[ff;1];A160;A100;D10;B10
from which I want to extract A150, D10, B10.
In between these valid strings, I can have any characters. The one part that is consistent is the semicolon between the legitimate strings.
Then again, the junk characters that I am trying to remove can themselves contain a semicolon.

Without more detail on the specific rules, it looks like you want to use String.Split(';') and then construct a regex to parse out the part you really need from each string in the resulting collection. A semicolon inside the "junk" does not matter, since the leftover junk fragments won't match your regex.

var input = "A150[ff+1];A160;A150[ff-1]";
var temp = new List<string>();
foreach (var s in input.Split(';'))
{
temp.Add(Regex.Replace(s, "(A[0-9]*)\\[*.*", "$1"));
}
foreach (var s1 in temp.Distinct())
{
Console.WriteLine(s1);
}
produces the output
A150
A160

First, you should use
string s="A150[ff;1];A160;A100;D10;B1";
s.IndexOf("A160");
With this call you can get the index of A160, and likewise of the other words.
Then use s.Remove(index, count) to cut out the parts you don't want.

If you only want to remove the 'junk' in between the '[' and ']' characters, you can use a regex for that:
Regex regex = new Regex(@"\[([^\]]+)\]");
string result = regex.Replace("A150[ff;1];A160;A100;D10;B10", "");
Then use String.Split to get the individual items.
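Put together, the strip-then-split approach might look like the sketch below. The names JunkStripper and Extract are hypothetical, and it assumes the junk always sits inside a single, non-nested pair of square brackets, which the question does not actually guarantee.

```csharp
using System;
using System.Text.RegularExpressions;

static class JunkStripper
{
    // Hypothetical helper: removes [...] sections first, then splits on ';'.
    public static string[] Extract(string input)
    {
        // Removing the brackets first means a ';' inside them cannot affect the split.
        string withoutBrackets = Regex.Replace(input, @"\[[^\]]*\]", "");
        return withoutBrackets.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        foreach (string item in Extract("A150[ff;1];A160;A100;D10;B10"))
            Console.WriteLine(item);
        // A150
        // A160
        // A100
        // D10
        // B10
    }
}
```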

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
