Regex isn't parsing the email body correctly - c#

I have a regex that should parse the following format out of an email body:
Building: {building number} // new line
Level: {level of building} // new line
Phase: {phase or room number} // new line
Request: {your request}
Example:
Building: 1
Level: 2
Phase: 20
Request: Get 4 chairs
Here's my regex:
string re1 = "(Building)"; // Word 1
string re2 = "(:)"; // Any Single Character 1
string re3 = "(\\s+)"; // White Space 1
string re4 = "(\\d)"; // Any Single Digit 1
string re5 = "(\\n)"; // White Space 2
string re6 = "(Level)"; // Word 2
string re7 = "(:)"; // Any Single Character 2
string re8 = "(\\s+)"; // White Space 3
string re9 = "(\\d)"; // Any Single Digit 2
string re10 = "(\\n)"; // White Space 4
string re11 = "(Phase)"; // Word 3
string re12 = "(:)"; // Any Single Character 3
string re13 = "(\\s+)"; // White Space 5
string re14 = "(\\d+)"; // Integer Number 1
string re15 = "(\\n)"; // White Space 6
string re16 = "(Request)"; // Word 4
string re17 = "(:)"; // Any Single Character 4
string re18 = "(\\s+)"; // White Space 7
string re19 = "(\\s+)"; // Match Any
Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9 + re10 + re11 + re12 + re13 + re14 + re15 + re16 + re17 + re18 + re19, RegexOptions.Multiline);
Match m = r.Match(body);
if (m.Success) {
blah blah blah
} else {
blah blah
}
The problem is that even when the email body is in the correct format, it doesn't match my regex, so nothing is stored in my database.
Is my regex correct?

First, the pattern has some needless complications that prevent it from matching. In particular, the last group, (\s+), is labelled "Match Any" but only matches whitespace, so it can never match the request text. This answer sums up the suggestions made in the comments to improve your regex.
Second, the parentheses turn every piece of the pattern into a capture group. That isn't harmful, but it is useless. It is more helpful to capture only the values passed in the mail, although this is optional. The resulting regex would be:
Building:\s(\d)\s*Level:\s(\d)\s*Phase:\s(\d+)\s*Request:\s(.*)
You can try it here, at Regex101 and see the grouping results of the regular expression.
If you want to retrieve the values, you can use a Matcher.
The resulting Java code, with escaped characters, would be the following:
String regex = "Building:\\s(\\d)\\s*Level:\\s(\\d)\\s*Phase:\\s(\\d+)\\s*Request:\\s(.*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(body);
if (matcher.matches()) {
// There could be exceptions here at runtime if values in the mail
// are not numbers, handle it any way you want
Integer building = Integer.valueOf(matcher.group(1));
Integer level = Integer.valueOf(matcher.group(2));
Integer phase = Integer.valueOf(matcher.group(3));
String request = matcher.group(4);
}
I would STRONGLY recommend being very careful with the last value (the request text) to avoid any kind of SQL injection.
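Since the question is about C#, here is a minimal sketch of how the same pattern could be used with .NET's Regex class. The class name EmailRequestParser and the TryParse signature are made up for illustration and are not from the original post. Note that in .NET the dot also matches '\r', so the request text is trimmed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class EmailRequestParser
{
    // Same pattern as above; \s* between fields tolerates "\n" and "\r\n".
    private static readonly Regex Pattern = new Regex(
        @"Building:\s(\d)\s*Level:\s(\d)\s*Phase:\s(\d+)\s*Request:\s(.*)");

    public static bool TryParse(string body, out int building, out int level,
                                out int phase, out string request)
    {
        building = 0;
        level = 0;
        phase = 0;
        request = string.Empty;

        Match m = Pattern.Match(body);
        if (!m.Success)
            return false;

        // TryParse avoids exceptions on values that do not fit into an int.
        if (!int.TryParse(m.Groups[1].Value, out building) ||
            !int.TryParse(m.Groups[2].Value, out level) ||
            !int.TryParse(m.Groups[3].Value, out phase))
            return false;

        // Trim removes a trailing '\r' left over from "\r\n" line endings.
        request = m.Groups[4].Value.Trim();
        return true;
    }
}
```

When the request text is later written to the database, passing it as a query parameter rather than concatenating it into the SQL string addresses the injection concern.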

Related

Find string pattern

I'm trying to make an app that's looking for a string entered by a user. There will be a text file that's going to store a lot of strings and the app will be checking if the string can be found within this file and display the index of the string. In case the string can't be found, the app will look for specific patterns.
Here's an example of the text file:
This
This |
This is |
This car is #
| - one word
# - one or more words
How will the app work?
If "This" is the string entered by the user, the app will display the index of the first line (0).
If "This apple" is the string entered by the user, the app will display the index of "This |" (1).
If "This is awesome" is the string entered by the user, the app will display the index of "This is |" (2).
If "This car is blue and I like it" is the string entered by the user, the app will display the index of "This car is #" (3).
Usually, if I'm looking for a string I would use this code:
string[] grammarFile = File.ReadAllLines(@"C:\Users\user_name\Desktop\Text.txt");
int resp = Array.IndexOf(grammarFile, userString);
Console.WriteLine(resp);
The main problem is that I have no idea how I could do this for patterns.
You need a definition for a word. I will assume that a word is a consecutive string of any non-whitespace characters.
Let's define a regex that matches a single word:
var singleWordRegex = @"[^\s]+";
and a regex that matches one or more words (a sequence of non-whitespace characters, followed by a sequence of whitespace characters or the end of the string):
var oneOrMoreWordsRegex = @"([^\s]+([\s]|$)+)+";
Now you can transform each string from your textfile to a regex like this:
// Extension methods must be static and declared in a static class.
static Regex ToRegex(this string grammarEntry)
{
var singleWordRegex = @"[^\s]+";
var oneOrMoreWordsRegex = @"([^\s]+([\s]|$)+)+";
return new Regex("^" + grammarEntry.Replace("|", singleWordRegex).Replace("#", oneOrMoreWordsRegex) + "$");
}
and test every grammar entry like this:
var userString = ReadUserString();
string[] grammarFile = File.ReadAllLines(@"C:\Users\user_name\Desktop\Text.txt");
var resp = -1;
for(int i = 0; i < grammarFile.Length; ++i)
{
var grammarEntry = grammarFile[i];
if(grammarEntry.ToRegex().IsMatch(userString))
{
resp = i;
break;
}
}
Console.WriteLine(resp);
On a side note, if you're going to perform many matches it might be wise to save all ToRegex calls to an array as preprocessing.
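The preprocessing idea from the side note could look roughly like this. It is only a sketch: the class name GrammarMatcher and its two methods are invented for illustration, and the grammar lines are assumed to contain no regex metacharacters other than the | and # placeholders.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that builds every regex once instead of on each lookup.
public static class GrammarMatcher
{
    private const string SingleWord = @"[^\s]+";
    private const string OneOrMoreWords = @"([^\s]+([\s]|$)+)+";

    // Convert each grammar line to an anchored regex, up front.
    public static Regex[] Compile(string[] grammarLines)
    {
        return grammarLines
            .Select(line => new Regex(
                "^" + line.Replace("|", SingleWord).Replace("#", OneOrMoreWords) + "$"))
            .ToArray();
    }

    // Index of the first grammar entry that matches, or -1 if none does.
    public static int IndexOfMatch(Regex[] compiled, string userString)
    {
        return Array.FindIndex(compiled, r => r.IsMatch(userString));
    }
}
```

Compile would be called once after File.ReadAllLines, and IndexOfMatch once per user input.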

Get string between strings in c#

I am trying to get string between same strings:
The texts starts here ** Get This String ** Some other text ongoing here.....
I am wondering how to get the string between stars. Should I should use some regex or other functions?
You can try Split:
string source =
"The texts starts here** Get This String **Some other text ongoing here.....";
// 3: we need 3 chunks and we'll take the middle (1) one
string result = source.Split(new string[] { "**" }, 3, StringSplitOptions.None)[1];
You can use IndexOf to do the same without regular expressions.
This one returns the first occurrence of a string between two "**", with surrounding whitespace trimmed. It also checks for the case where no string matches this condition.
public string FindTextBetween(string text, string left, string right)
{
// TODO: Validate input arguments
int beginIndex = text.IndexOf(left); // find occurrence of left delimiter
if (beginIndex == -1)
return string.Empty; // or throw exception?
beginIndex += left.Length;
int endIndex = text.IndexOf(right, beginIndex); // find occurrence of right delimiter
if (endIndex == -1)
return string.Empty; // or throw exception?
return text.Substring(beginIndex, endIndex - beginIndex).Trim();
}
string str = "The texts starts here ** Get This String ** Some other text ongoing here.....";
string result = FindTextBetween(str, "**", "**");
I usually prefer to not use regex whenever possible.
If you want to use regex, this could do:
.*\*\*(.*)\*\*.*
The first and only capture has the text between stars.
Another option would be using IndexOf to find the position of the first star, check if the following character is a star too and then repeat that for the second set. Substring the part between those indexes.
If you can have multiple pieces of text to find in one string, you can use following regex:
\*\*(.*?)\*\*
Sample code:
string data = "The texts starts here ** Get This String ** Some other text ongoing here..... ** Some more text to find** ...";
Regex regex = new Regex(@"\*\*(.*?)\*\*");
MatchCollection matches = regex.Matches(data);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
You could use split but this would only work if there is 1 occurrence of the word.
Example:
string output = "";
string input = "The texts starts here **Get This String **Some other text ongoing here..";
var splits = input.Split( new string[] { "**", "**" }, StringSplitOptions.None );
//Check if the index is available
//if there are no '**' in the string the [1] index will fail
if ( splits.Length >= 2 )
output = splits[1];
Console.Write( output );
Console.ReadKey();
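You can use Substring for this:
string str = "The texts starts here ** Get This String ** Some other text ongoing here";
// Drop everything up to and including the first "**" (assumes both delimiters are present)
string s = str.Substring(str.IndexOf("**") + 2);
// Keep everything before the next "**"
s = s.Substring(0, s.IndexOf("**"));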

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, Substring etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
and keep the rest of the string intact. I am wondering whether String.Replace or Regex.Replace may help.
What I've tried (not much of error-checks):
var pwd_index = argv.IndexOf("--password=");
string converted;
if (pwd_index >= 0)
{
var leftPart = argv.Substring(0, pwd_index);
var pwdStr = argv.Substring(pwd_index);
var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
converted = leftPart + "--password=********\n" + rightPart;
}
else
converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1 and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to hackers anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
( // Store the following match in capture group $1.
password= // Match "password=" literally.
)
[ // Match one from a set of characters.
^ // Negate a set of characters (i.e., match anything not
// contained in the following set).
\n // The character set: consists only of the new line character.
]
* // Match the preceding character set 0 or more times.
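As a variation on the pattern broken down above, the sketch below anchors the match to the start of a line so that only an option spelled exactly -password= is masked, and it stops at '\r' as well as '\n' in case the arguments use Windows line endings. Both changes are assumptions on my side and go beyond the original answer. The names ArgMasker and MaskPassword are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, shown only to illustrate the replacement.
public static class ArgMasker
{
    public static string MaskPassword(string argv)
    {
        // With RegexOptions.Multiline, ^ matches at the start of every line.
        // [^\r\n]* consumes the password value up to the end of that line.
        return Regex.Replace(
            argv,
            @"^(-password=)[^\r\n]*",
            "$1********",
            RegexOptions.Multiline);
    }
}
```

An empty password and a long password both come out as the same eight stars, so the length of the original value is not revealed.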
This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also remove the new String() part and replace it with a string constant.

Getting Index of First Non Alpha character in a string C#

I'm trying to split out a string (at the index) whenever I find the first non alpha or whitespace.
My Regex is really rusty and trying to find some direction on getting this to work.
Example: "Payments Received by 08/14/2015 $0.00" is the string. and I'm able to find the first digit
string alphabet = String.Empty;
string digit = String.Empty;
int digitStartIndex;
Match regexMatch = Regex.Match("Payments Received by 08/14/2015 $0.00", "\\d");
digitStartIndex = regexMatch.Index;
alphabet = line.Substring(0, digitStartIndex);
digit = line.Substring(digitStartIndex);
The problem lies when a string like "Amount This Period + $57.00"
I end up with "Amount This Period + $"
Using Regex in C#, how can I also check for specific non-alphanumeric characters such as $, + and -?
Edit: For the example I'm struggling with, I'm looking for the output (variables alphabet and digit) to be:
"Amount This Period"
"+ $57.00"
To split a string the way you mention, use a regular expression to find the initial alpha/space chars and then the rest.
var s = "Payments Received by 08/14/2015 $0.00";
var re = new Regex("^([a-z ]+)(.+)", RegexOptions.IgnoreCase);
var m = re.Match(s);
if (m.Success)
{
Console.WriteLine(m.Groups[1]);
Console.WriteLine(m.Groups[2]);
}
The ^ is important to find characters at the start.
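To get exactly the two values asked for in the edit, the same idea can be wrapped in a small helper that trims the text part. This is a sketch only: LineSplitter and Split are made-up names, and the pattern uses * instead of + so that lines without any leading text still produce a result.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that splits a line at the first character
// that is neither a letter nor a space.
public static class LineSplitter
{
    private static readonly Regex LeadingText = new Regex(@"^([A-Za-z ]*)(.*)$");

    // Returns { text part, remainder }; the text part is trimmed.
    public static string[] Split(string line)
    {
        Match m = LeadingText.Match(line);
        return new[] { m.Groups[1].Value.Trim(), m.Groups[2].Value };
    }
}
```

For "Amount This Period + $57.00" the first element is the label and the second starts at the + sign, so no list of special characters has to be maintained.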
Ah, then you want this I think:
void Main()
{
var regex = new Regex(@"(.*?)([\$\+\-].*)");
var a = "Payments Received by 08/14/2015 $0.00";
var b = "Amount This Period + $57.00";
Console.WriteLine(regex.Match(a).Groups[1].Value);
Console.WriteLine(regex.Match(a).Groups[2].Value);
Console.WriteLine(regex.Match(b).Groups[1].Value);
Console.WriteLine(regex.Match(b).Groups[2].Value);
}
Outputs:
Payments Received by 08/14/2015
$0.00
Amount This Period
+ $57.00

regex to strip number from var in string

I have a long string and I have a var inside it
var abc = '123456'
Now I wish to get the 123456 from it.
I have tried a regex but it's not working properly
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time
var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456
Your regex is not correct:
(?<abc>+)=(?<var>+)
The + signs are quantifiers, meaning that the preceding element is repeated at least once. Here there is nothing to repeat, since (?< ... > ... ) is a named capture group and is not itself a character.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
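Putting the last pattern to work, a lookup by variable name might be sketched as follows. VarExtractor and Extract are hypothetical names, the group name value is my own choice, and the \b word boundary is an addition so that a name like xabc is not matched by accident.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that returns the quoted value assigned to a variable.
public static class VarExtractor
{
    public static string Extract(string body, string name)
    {
        // Regex.Escape keeps special characters in the name from being
        // interpreted as regex syntax.
        string pattern = @"\b" + Regex.Escape(name) + @"\s*=\s*'(?<value>[^']+)'";
        Match m = Regex.Match(body, pattern);
        return m.Success ? m.Groups["value"].Value : null;
    }
}
```

The method returns null when the variable is not found, so the caller can tell a missing variable apart from a matched value.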
From the example you provided, the number can be extracted like this:
Console.WriteLine (
Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means 1 or more numbers (digits).
But I surmise your data doesn't look like your example.
Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+");
Match m = regex.Match(body);
