Manipulating Matched values in Regex C# - c#

I have read a text file and matched the data I am interested in. My question is, what is the best way to manipulate the data I have matched?
The code I use to read the text file is:
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter = "All files (*.*)|*.*";
//dialog.InitialDirectory = "C:\\";
dialog.Title = "Select a text file";
if (dialog.ShowDialog() == DialogResult.OK)
{
    string fname = dialog.FileName; // selected file
    label1.Text = fname;
    if (String.IsNullOrEmpty(richTextBox1.Text))
    {
        var matches1 = Regex.Matches(System.IO.File.ReadAllText(fname), @"L10 P\d\d\d R \S\S\S\S\S\S\S")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
        richTextBox1.Lines = matches1.ToArray();
    }
}
The result now looks like:
L10 P015 R +4.9025
and I need it to look like this:
#2015=4.9025
L10 is excluded, P015 turns into #2015, R and + turn into =, and the number stays the same.

Use capturing groups:
First change your regex to:
L10 P(?<key>\d{3}) R \S(?<val>\S{6})
The (?<name>...) syntax lets you declare a named capturing group. You can later retrieve the value that was matched by this group.
Next, when you have a match object, you can extract the contents of each group with match.Groups["key"].Value and match.Groups["val"].Value, like this:
.Select(m => string.Format("#2{0}={1}", m.Groups["key"].Value, m.Groups["val"].Value))
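Put together, the named-group approach might look like the sketch below. The class and method names (OffsetFormatter, Reformat) are made up for illustration, and the pattern assumes every value has a one-character sign followed by exactly six non-space characters, as in the single example from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OffsetFormatter // hypothetical helper, not from the original post
{
    // key = the three digits after "P"; val = the six characters after the sign.
    private static readonly Regex LinePattern =
        new Regex(@"L10 P(?<key>\d{3}) R \S(?<val>\S{6})");

    // Returns one "#2<key>=<val>" string per match found in the text.
    public static List<string> Reformat(string text)
    {
        return LinePattern.Matches(text)
            .Cast<Match>()
            .Select(m => string.Format("#2{0}={1}",
                m.Groups["key"].Value, m.Groups["val"].Value))
            .ToList();
    }
}
```

With the question's code, richTextBox1.Lines = OffsetFormatter.Reformat(System.IO.File.ReadAllText(fname)).ToArray(); would then fill the text box with the converted lines.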

var matches = Regex.Matches(System.IO.File.ReadAllText(fname), @"L10 P\d\d\d R \S\S\S\S\S\S\S")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
string[] parts = matches[0].Split(' '); // first match: "L10", "P015", "R", "+4.9025"
string num1 = "2" + parts[1].Substring(1); // "2" + "015"
string num2 = parts[3].Substring(1); // "4.9025"
string finalValue = "#" + num1 + "=" + num2; // "#2015=4.9025"
richTextBox1.Text = finalValue;
I believe that this should work, based on your single example.
This assumes that we are simply always ignoring the first character of the P015 item and the first character of the +4.9025 item.

Why don't you simply split the received string? Your rules are basic, so there is no need for regexes.
string receivingStream = "L10 P015 R +4.9025";
string[] tokens = receivingStream.Split(new char[] { ' ' });
// tokens[0] == "L10"
// tokens[1] == "P015" (the key)
// tokens[2] == "R"
// tokens[3] == "+4.9025" (the number)

You want to be using Regex.Replace to mutate the string once instead of going through all of this matching. You'll want to add grouping to the regex, and use substitutions in the replacement string.
see:
https://msdn.microsoft.com/en-us/library/xwewhkd1(v=vs.110).aspx
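A minimal sketch of that idea, assuming the same line format as in the question. OffsetRewriter and Rewrite are illustrative names; ${key} and ${val} are the standard .NET substitution syntax for named groups.

```csharp
using System.Text.RegularExpressions;

public static class OffsetRewriter // hypothetical helper, not from the original post
{
    // Rewrites every "L10 Pnnn R <sign><value>" occurrence in place.
    // Text that does not match the pattern is left untouched.
    public static string Rewrite(string text)
    {
        return Regex.Replace(
            text,
            @"L10 P(?<key>\d{3}) R \S(?<val>\S{6})",
            "#2${key}=${val}");
    }
}
```

Note that, unlike Regex.Matches, this keeps whatever surrounds the matches, so it fits best when the whole input consists of such lines.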

Related

C# Filter a word with an undefined number of spaces between characters

For example:
I can create a word with spaces between its characters like this:
string example = "**example**";
List<string>outputs = new List<string>();
string example_output = "";
foreach(char c in example)
{
example_output += c + " ";
}
I can then loop over it to remove the spaces one by one and add each variant to the outputs list.
The problem is that I need it to work in scenarios where there are double spaces or more.
For example:
string text = "This is a piece of text for this **example**.";
I basically want to detect AND remove 'example'.
But I want to do that even when it says e xample, e x ample or example.
And in my scenario, since it's a spam filter, I can't just strip the spaces from the whole sentence like below, because I'd need to .Replace() the word with exactly the same spaces as the user typed it.
.Replace(" ", "");
How would I achieve this?
TLDR:
I want to filter out a word with any combination of spaces inside it, without altering any other part of the line.
So example, e xample, e x ample and e x a m ple
should all match the filter word.
I wouldn't mind a method that generates the word with all space combinations as plan B.
You can use this regex to achieve that:
(e[\s]*x[\s]*a[\s]*m[\s]*p[\s]*l[\s]*e)
You could use a regex for that: e\s*x\s*a\s*m\s*p\s*l\s*e
\s matches any whitespace character, and * means zero or more occurrences of it.
Small snippet:
const string myInput = "e x ample";
var regex = new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e"); // verbatim string, so \s is not treated as a C# escape
var match = regex.Match(myInput);
if (match.Success)
{
// We have a match! Bad word
}
Here is the link for the regex: https://regex101.com/r/VFjzTg/1
I see that the problem is to ignore the spaces in the matchstring, but not touch them anywhere else in the string.
You could create a regular expression out of your matchword, allowing arbitrary whitespace between each character.
// prepare regex. Need to do this only once for many applications.
string findword = "example";
// TODO: would need to escape special chars like * ( ) \ . + ? here.
string[] tmp = new string[findword.Length];
for(int i=0;i<tmp.Length;i++)tmp[i]=findword.Substring(i,1);
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(string.Join("\\s*",tmp));
// on each text to filter, do this:
string inp = "A text with the exa mple word in it.";
string outp;
outp = r.Replace(inp,"");
System.Console.WriteLine(outp);
Left out the escaping of regex-special-chars for brevity.
You can try regular expressions:
using System.Text.RegularExpressions;
....
// Having a word to find
string toFind = "Example";
// we build the regular expression
Regex regex = new Regex(
@"\b" + string.Join(@"\s*", toFind.Select(c => Regex.Escape(c.ToString()))) + @"\b",
RegexOptions.IgnoreCase);
// Then we apply regex built for the required text:
string text = "This is a piece of text for this **example**. And more (e X amp le)";
string result = regex.Replace(text, "");
Console.Write(result);
Outcome:
This is a piece of text for this ****. And more ()
Edit: if you want to ignore diacritics, modify the regular expression so that each letter may be followed by combining marks:
string toFind = "Example";
Regex regex = new Regex(@"\b" + string.Join(@"\s*",
toFind.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*")),
RegexOptions.IgnoreCase);
and normalize the text before matching (NormalizationForm lives in System.Text):
string text = "This is a piece of text for this **examplé**. And more (e X amp le)";
string result = regex.Replace(text.Normalize(NormalizationForm.FormD), "");

Split string pattern

I have a string that I need to split in an array of string. All the values are delimited by a pipe | and are separated by a comma.
|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||
The array should have the following 8 values after the split
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
by "" I simply mean an empty string. Please note that the value can also have a comma e.g 2,2. I think probably the best way to do this is through Regex.Split but I am not sure how to write the correct regular expression. Any suggestions or any better way of achieving this will be really appreciated.
You can use Regex.Matches() to get the values instead of Split(), as long as the values between the pipe characters don't contain the pipe character itself:
(?<=\|)[^|]*(?=\|)
This will match zero or more non-pipe characters [^|]* which are preceded (?<=\|) and followed by a pipe (?=\|).
In C#:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
foreach (Match match in results)
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
EDIT: Since a comma always separates two pipe-delimited values, the pattern also matches those separator commas, and they always land at the odd indexes of the match collection. We can therefore walk only the even indexes to get the true values, like this:
var input = "|room 1|,|,|,||,||,||,||,||,||";
var results = Regex.Matches(input, @"(?<=\|)[^|]*(?=\|)");
for (int i = 0; i < results.Count; i+=2)
Console.WriteLine("Found '{0}'", results[i].Value);
This can be also used in the first example above.
Assuming all fields are enclosed by a pipe and delimited by a comma you can use |,| as the delimiter, removing the leading and trailing |
Dim data = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||"
Dim delim = New String() {"|,|"}
Dim results = data.Substring(1, data.Length - 2).Split(delim, StringSplitOptions.None)
For Each s In results
Console.WriteLine(s)
Next
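For readers working in C#, the same idea might be sketched as follows. PipeFields and Split are illustrative names, and the sketch assumes, like the VB code above, that the input always starts and ends with a pipe and that no value is itself a lone comma (which would make |,| ambiguous).

```csharp
using System;

public static class PipeFields // hypothetical helper, not from the original post
{
    // Splits input of the form |a|,|b|,|c| into its field values.
    public static string[] Split(string data)
    {
        // Too short to contain an opening and a closing pipe.
        if (data.Length < 2) return new string[0];

        // Drop the leading and trailing pipe, then split on the |,| delimiter.
        return data.Substring(1, data.Length - 2)
                   .Split(new[] { "|,|" }, StringSplitOptions.None);
    }
}
```

Commas inside a value, such as 2,2, survive because only the three-character sequence |,| counts as a delimiter.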
Output:
111
2,2
room 1
13'2'' x 13'8''
""
""
""
""
No need to use a regex, remove the pipes and split the string on the comma:
var input = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var parts = input.Split(',').Select(x => x.Replace("|", string.Empty));
or
var parts = input.Replace("|", string.Empty).Split(',');
EDIT: OK, in that case, use a while loop to parse the string:
var values = new List<string>();
var str = @"|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
while (str.Length > 0)
{
    var open = str.IndexOf('|');            // opening pipe of the next value
    var close = str.IndexOf('|', open + 1); // matching closing pipe
    var value = str.Substring(open + 1, close - open - 1);
    values.Add(value);
    // skip past the closing pipe and the comma that follows it
    str = close + 2 < str.Length
        ? str.Substring(close + 2)
        : string.Empty;
}
You could try this:
string a = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
string[] result = a.Split('|').Where(s => !s.Contains(",")).Select(s => s.Replace("|",String.Empty)).ToArray();
Maybe this will work for you:
var data = "|111|,|2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|", "").Split(',');
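EDIT: If values can contain commas, you can swap the |,| delimiter for a placeholder character that never appears in the data, then split on that: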
string data = "|111|,|2,2|,|,3|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
var resultArray = data.Replace("|,|", "¬").Replace("|", "").Split('¬');
Check, if this fits your needs...
var str = "|111|,|2,2|,|room 1|,|13'2'' x 13'8''|,||,||,||,||";
//Iterate through all your matches (we're looking for anything between | and |, non-greedy)
foreach (Match m in Regex.Matches(str, @"\|(.*?)\|"))
{
//Groups[0] is entire match, with || symbols, but [1] - something between ()
Console.WriteLine(m.Groups[1].Value);
}
Though, to find anything between | and |, you probably should use [^|] instead of the . character, so that a match can never run across a pipe.
At least for the specified use case, it gives the result you're expecting.

How to trim characters from certain patterned words in string?

Given the following string:
string s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1";
How can I trim the "1" from any 8 character string that ends in a 1? I got so far as to find a working Regex pattern that finds these strings, and I'm guessing I could use a TrimEnd to remove the "1", but how do I modify the string itself?
Regex regex = new Regex("\\w{8}1");
foreach (Match match in regex.Matches(s))
{
MessageBox.Show(match.Value.TrimEnd('1'));
}
The result I'm looking for would be "I need drop the 1 from the end of AAAAAAAA and BBBBBBBB"
Regex.Replace is the tool for the job:
var regex = new Regex("\\b(\\w{8})1\\b");
s = regex.Replace(s, "$1"); // strings are immutable, so assign the result back
I slightly modified the regular expression to match the description of what you are trying to do more closely.
Here is a non-regex approach:
s = string.Join(" ", s.Split().Select(w => w.Length == 9 && w.EndsWith("1") ? w.Substring(0, 8) : w));
In VB with LINQ:
Dim l = 8
Dim s = "I need drop the 1 from the end of AAAAAAAA1 and BBBBBBBB1"
Dim d = s.Split(" ").Aggregate(Function(p1, p2) p1 & " " & If(p2.Length = l + 1 And p2.EndsWith("1"), p2.Substring(0, p2.Length - 1), p2))
Try this:
s = s.Replace(match.Value, match.Value.TrimEnd('1'));
And the s string will have the value you want.
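One caveat with string.Replace is that it rewrites every occurrence of the matched text, including occurrences inside longer words. A possible alternative that keeps the per-match logic is the Regex.Replace overload that takes a MatchEvaluator. The sketch below uses illustrative names (TrailingOneTrimmer, Trim) and adds \b word boundaries, which is an assumption about what "8 character string" means.

```csharp
using System.Text.RegularExpressions;

public static class TrailingOneTrimmer // hypothetical helper, not from the original post
{
    // For each whole word made of 8 word characters followed by "1",
    // keep only the first 8 characters.
    public static string Trim(string s)
    {
        return Regex.Replace(s, @"\b\w{8}1\b", m => m.Value.Substring(0, 8));
    }
}
```

Substring(0, 8) is used instead of TrimEnd('1') so that a word such as AAAAAAA11 loses only its final character.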

Extracting string between two characters?

I want to extract the email IDs between < and >,
for example:
input string : "abc" <abc@gmail.com>; "pqr" <pqr@gmail.com>;
output string : abc@gmail.com;pqr@gmail.com
Without regex, you can use this:
public static string GetStringBetweenCharacters(string input, char charFrom, char charTo)
{
int posFrom = input.IndexOf(charFrom);
if (posFrom != -1) //if found char
{
int posTo = input.IndexOf(charTo, posFrom + 1);
if (posTo != -1) //if found char
{
return input.Substring(posFrom + 1, posTo - posFrom - 1);
}
}
return string.Empty;
}
And then:
GetStringBetweenCharacters("\"abc\" <abc@gmail.com>;", '<', '>')
you will get
abc@gmail.com
string input = @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>;";
var output = String.Join(";", Regex.Matches(input, @"\<(.+?)\>")
.Cast<Match>()
.Select(m => m.Groups[1].Value));
Tested
string input = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string matchedValuesConcatenated = string.Join(";",
Regex.Matches(input, @"(?<=<)([^>]+)(?=>)")
.Cast<Match>()
.Select(m => m.Value));
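(?<=<) is a non-capturing lookbehind, so < is part of the search but not included in the output.
The capturing group matches anything that is not >, one or more times.
You can also use non-capturing groups, @"(?:<)([^>]+)(?:>)", but then the brackets are part of the match, so read m.Groups[1].Value instead of m.Value.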
The answer from LB +1 is also correct. I just did not realize it was correct until I wrote an answer myself.
Use the String.IndexOf(char, int) method to search for < starting at a given index in the string (e.g. the last index that you found a > character at, i.e. at the end of the previous e-mail address - or 0 when looking for the first address).
Write a loop that repeats for as long as you find another < character, and everytime you find a < character, look for the next > character. Use the String.Substring(int, int) method to extract the e-mail address whose start and end position is then known to you.
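The loop described above could be sketched like this. AngleBracketExtractor and Extract are illustrative names, and the sketch assumes brackets are not nested; it stops at the first unmatched < rather than throwing.

```csharp
using System.Collections.Generic;

public static class AngleBracketExtractor // hypothetical helper, not from the original post
{
    // Collects every substring found between a '<' and the next '>'.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        int searchFrom = 0; // index just past the previous '>'
        while (true)
        {
            int open = input.IndexOf('<', searchFrom);
            if (open == -1) break;                  // no more addresses
            int close = input.IndexOf('>', open + 1);
            if (close == -1) break;                 // unbalanced bracket
            results.Add(input.Substring(open + 1, close - open - 1));
            searchFrom = close + 1;
        }
        return results;
    }
}
```

Joining the result with string.Join(";", ...) then gives the semicolon-separated output the question asks for.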
Could use the following regex and some linq.
var regex = new Regex(@"\<(.*?)\>");
var input= @"""abc"" <abc@gmail.com>; ""pqr"" <pqr@gmail.com>";
var matches = regex.Matches(input);
var res = string.Join(";", matches.Cast<Match>().Select(x => x.Value.Replace("<","").Replace(">","")).ToArray());
The <> brackets get removed afterwards, you could also integrate it into Regex I guess.
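string str = "\"abc\" <abc@gmail.com>; \"pqr\" <pqr@gmail.com>;";
string output = string.Empty;
int open;
while ((open = str.IndexOf('<')) != -1)
{
    int close = str.IndexOf('>', open + 1);
    if (close == -1) break; // unbalanced bracket: stop
    if (output.Length > 0) output += ";";
    output += str.Substring(open + 1, close - open - 1); // second argument is a length, not an end index
    str = str.Substring(close + 1); // continue after the closing bracket
}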

C#: Parse substring from a string by detecting whitespace

If I have various strings that have text followed by whitespace followed by text, how can I parse the substring beginning with the first character in the second block of text?
For example:
If I have the string:
"stringA stringB"
How can I extract the substring
"stringB"
The strings are of various lengths but will all be of the format text, whitespace, text.
I'm sure this can be easily done with regex but I'm having trouble finding the proper syntax for c#.
No RegEx needed, just split it.
var test = "stringA stringB";
var second = test.Split()[1];
and if you are in the wonderful LINQ-land
var second = "string1 string2".Split().ElementAtOrDefault(1);
and with RegEx (for completeness)
var str2 = Regex.Match("str1 str2", #"\w (.*$)").Groups[1].Value;
use string.Split()
var test = "stringA stringB";
var elements = test.Split(new[]
{
' '
});
var desiredItem = elements.ElementAtOrDefault(1);
If you want to split on all whitespace characters (MSDN has the details):
var test = "stringA stringB";
//var elements = test.Split(); // pseudo overload
var elements = test.Split(null); // correct overload
var desiredItem = elements.ElementAtOrDefault(1);
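Edit: why "pseudo overload"?
Split is declared with a params char[] parameter, so .Split() gets compiled to .Split(new char[0]).
The parameterless form is not documented in MSDN as a separate overload.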
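One thing to watch for: when the blocks are separated by more than one whitespace character, a plain Split() returns empty entries between them, so index 1 may be an empty string. A possible way around that, assuming the second non-empty token is what is wanted, is the overload that takes StringSplitOptions. SecondToken and Get are illustrative names.

```csharp
using System;

public static class SecondToken // hypothetical helper, not from the original post
{
    // Returns the second whitespace-separated token, or null if there is none.
    public static string Get(string input)
    {
        // A null separator array means "split on any whitespace";
        // RemoveEmptyEntries collapses runs of consecutive whitespace.
        string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 1 ? parts[1] : null;
    }
}
```

The (char[]) cast is there only to pick the char[] overload unambiguously.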
If all strings are separated by a whitespace you don't need a regex here. You could just use the Split() method:
string[] result = { };
string myStrings = "stringA stringB stringC";
result = myStrings.Split(' ');
You don't even need Split(). I think a simple IndexOf/Substring will do the job.
var input = "A B";
var result = string.Empty;
var index = input.IndexOf(' ');
if (index >= 0)
{
result = input.Substring(index + 1);
}
