Find string pattern - C#

I'm trying to make an app that looks up a string entered by the user. A text file stores a lot of strings, and the app checks whether the entered string can be found in this file and displays its index. If the string can't be found, the app looks for specific patterns instead.
Here's an example of the text file:
This
This |
This is |
This car is #
| - one word
# - one or more words
How will the app work?
If "This" is the string entered by the user, the app will display the index of the first line (0).
If "This apple" is the string entered by the user, the app will display the index of "This |" (1).
If "This is awesome" is the string entered by the user, the app will display the index of "This is |" (2).
If "This car is blue and I like it" is the string entered by the user, the app will display the index of "This car is #" (3).
Usually, if I'm looking for a string I would use this code:
string[] grammarFile = File.ReadAllLines(@"C:\Users\user_name\Desktop\Text.txt");
int resp = Array.IndexOf(grammarFile, userString);
Console.WriteLine(resp);
The main problem is that I have no idea how I could do this for patterns.

You need a definition for a word. I will assume that a word is a consecutive string of any non-whitespace characters.
Let's define a regex that matches a single word:
var singleWordRegex = @"[^\s]+";
and a regex that matches one or more words (a sequence of non-whitespace characters, followed by a sequence of whitespace characters or the end of the string):
var oneOrMoreWordsRegex = @"([^\s]+([\s]|$)+)+";
Now you can transform each string from your textfile to a regex like this:
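public static class GrammarExtensions
{
    // Extension methods must be static and declared in a static class.
    public static Regex ToRegex(this string grammarEntry)
    {
        var singleWordRegex = @"[^\s]+";
        var oneOrMoreWordsRegex = @"([^\s]+([\s]|$)+)+";
        // Replace "|" before "#": the "#" expansion itself contains a "|".
        return new Regex("^" + grammarEntry.Replace("|", singleWordRegex).Replace("#", oneOrMoreWordsRegex) + "$");
    }
}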
and test every grammar entry like this:
var userString = ReadUserString();
string[] grammarFile = File.ReadAllLines(@"C:\Users\user_name\Desktop\Text.txt");
var resp = -1;
for (int i = 0; i < grammarFile.Length; ++i)
{
    var grammarEntry = grammarFile[i];
    if (grammarEntry.ToRegex().IsMatch(userString))
    {
        resp = i;
        break;
    }
}
Console.WriteLine(resp);
On a side note, if you're going to perform many matches, it is worth building each Regex once as a preprocessing step and keeping the results in an array, rather than calling ToRegex again for every lookup.
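The preprocessing idea could look roughly like the sketch below. The names GrammarMatcher, Compile and IndexOfFirstMatch are hypothetical and not part of the original answer. The sketch also makes one change that the answer does not: it runs each grammar line through Regex.Escape first, on the assumption that everything except | and # should be matched literally. The # expansion used here is a simplified one that does not allow trailing whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GrammarMatcher
{
    // Builds one anchored regex per grammar line, once, up front.
    public static Regex[] Compile(string[] grammarLines)
    {
        return grammarLines.Select(line => ToRegex(line)).ToArray();
    }

    static Regex ToRegex(string grammarEntry)
    {
        // Regex.Escape turns "|" into "\|" and "#" into "\#", and also escapes
        // characters such as "." or "(" so they are matched literally.
        string escaped = Regex.Escape(grammarEntry);
        string pattern = escaped
            .Replace(@"\|", @"\S+")               // | : exactly one word
            .Replace(@"\#", @"\S+(?:\s+\S+)*");   // # : one or more words
        return new Regex("^" + pattern + "$");
    }

    // Returns the index of the first grammar line that matches, or -1.
    public static int IndexOfFirstMatch(Regex[] compiled, string userString)
    {
        for (int i = 0; i < compiled.Length; i++)
        {
            if (compiled[i].IsMatch(userString)) return i;
        }
        return -1;
    }
}
```

With the sample file from the question, GrammarMatcher.IndexOfFirstMatch(GrammarMatcher.Compile(grammarFile), "This apple") returns 1.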

Related

Finding multiple occurrences of a word and printing out the entire line in which the word exists using c-sharp

Hey guys, this is something I have been trying to do for a while.
I have a text file online which can be accessed through a link like this: https://example.com/myfile.txt
The text file contains a few sentences. Here are some of them:
This is a sentence
sentences are a combination of multiple words
we cannot imagine a world without sentences
words together form a sentence
here's a sentence orange is a fruit and I love it!
how's the day today?
Okay, those were all random sentences in the text file. Now, if I enter the word 'words' as the input (string input), I want it to print out all the lines that contain the input. Here's an example:
sentences are a combination of multiple words
words together form a sentence

This is an obvious case to use Regex.
using System;
using System.Text.RegularExpressions;
string text = @"This is a sentence
sentences are a combination of multiple words
we cannot imagine a world without sentences
words together form a sentence
here's a sentence orange is a fruit and I love it!
how's the day today?";
string input = "words";
Regex regex = new Regex(@"^.*" + input + @".*$", RegexOptions.Multiline);
foreach (Match match in regex.Matches(text))
{
    Console.WriteLine(match.Value);
}
Explanation of @"^.*" + input + @".*$" with RegexOptions.Multiline:
^.* matches zero or more characters from the start of a line
input matches the text held in the input variable
.*$ matches zero or more characters up to the end of the line
RegexOptions.Multiline makes ^ and $ match at the start and end of each line (by default they match only at the start and end of the whole text).
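One caveat with concatenating input straight into the pattern is that characters such as . or ? in the search term are read as regex syntax. Here is a minimal sketch of the same idea with the term escaped. The helper name LinesContaining is hypothetical and not part of the original answer. The sketch also uses [^\r\n]* instead of .* so that a Windows \r does not end up in the returned lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineSearch
{
    // Returns every line of text that contains input as a literal substring.
    public static List<string> LinesContaining(string text, string input)
    {
        // Regex.Escape makes the search term match literally.
        var regex = new Regex(
            @"^[^\r\n]*" + Regex.Escape(input) + @"[^\r\n]*",
            RegexOptions.Multiline);

        var lines = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            lines.Add(match.Value);
        }
        return lines;
    }
}
```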

So, just for the fun of it, you can use these functions. They do what you asked, and the second one lets you pass a URI, in which case it downloads the text file from that web address and then runs the search.
public static global::System.Collections.Generic.List<string> SearchFor(string[] Lines, string Term)
{
    global::System.Collections.Generic.List<string> r = new global::System.Collections.Generic.List<string>(Lines.Length);
    foreach (string l in Lines)
    {
        // Keep every line that contains the term; Trim also drops a trailing '\r'.
        if (l.Contains(Term)) { r.Add(l.Trim()); }
    }
    r.TrimExcess();
    return r;
}
public static global::System.Collections.Generic.List<string> SearchFor(string Text, string Term, bool TextIsUri = false)
{
    // When TextIsUri is true, Text is treated as an address and replaced by the downloaded content.
    if (TextIsUri)
    {
        using (global::System.Net.WebClient w = new global::System.Net.WebClient())
        {
            Text = w.DownloadString(new global::System.Uri(Text));
        }
    }
    return SearchFor(Text.Split(new char[] { '\n' }, global::System.StringSplitOptions.RemoveEmptyEntries), Term);
}
Surely you can also add some safety measures, such as testing whether a string is empty, but I left them out so you can see the simplest code and improve on it.

Get string between strings in c#

I am trying to get string between same strings:
The texts starts here ** Get This String ** Some other text ongoing here.....
I am wondering how to get the string between the stars. Should I use a regex or some other function?

You can try Split:
string source =
"The texts starts here** Get This String **Some other text ongoing here.....";
// 3: we need 3 chunks and we'll take the middle (1) one
string result = source.Split(new string[] { "**" }, 3, StringSplitOptions.None)[1];

You can use IndexOf to do the same without regular expressions.
This one will return the first occurrence of a string between two "**" markers, with surrounding whitespace trimmed. It also handles the case where no matching string exists.
public string FindTextBetween(string text, string left, string right)
{
    // TODO: Validate input arguments
    int beginIndex = text.IndexOf(left); // find occurrence of left delimiter
    if (beginIndex == -1)
        return string.Empty; // or throw exception?
    beginIndex += left.Length;
    int endIndex = text.IndexOf(right, beginIndex); // find occurrence of right delimiter
    if (endIndex == -1)
        return string.Empty; // or throw exception?
    return text.Substring(beginIndex, endIndex - beginIndex).Trim();
}
string str = "The texts starts here ** Get This String ** Some other text ongoing here.....";
string result = FindTextBetween(str, "**", "**");
I usually prefer to not use regex whenever possible.

If you want to use regex, this could do:
.*\*\*(.*)\*\*.*
The first and only capture has the text between stars.
Another option would be using IndexOf to find the position of the first star, check if the following character is a star too and then repeat that for the second set. Substring the part between those indexes.
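As a rough illustration of the regex option, the pattern above can be applied with Regex.Match. The helper name BetweenStars is hypothetical. Because the .* at both ends is greedy, the capture is the text between the last two ** markers, which is the same thing when the input contains exactly one pair.

```csharp
using System.Text.RegularExpressions;

public static class StarText
{
    // Returns the trimmed text between "**" markers,
    // or an empty string when the input has no such pair.
    public static string BetweenStars(string source)
    {
        Match match = Regex.Match(source, @".*\*\*(.*)\*\*.*");
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }
}
```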

If you can have multiple pieces of text to find in one string, you can use following regex:
\*\*(.*?)\*\*
Sample code:
string data = "The texts starts here ** Get This String ** Some other text ongoing here..... ** Some more text to find** ...";
Regex regex = new Regex(@"\*\*(.*?)\*\*");
MatchCollection matches = regex.Matches(data);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[1].Value);
}

You could use split but this would only work if there is 1 occurrence of the word.
Example:
string output = "";
string input = "The texts starts here **Get This String **Some other text ongoing here..";
var splits = input.Split( new string[] { "**" }, StringSplitOptions.None );
// Check that the index is available:
// if there is no "**" in the string, index [1] does not exist
if ( splits.Length >= 2 )
    output = splits[1];
Console.Write( output );
Console.ReadKey();

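You can use Substring for this:
string s = "The texts starts here ** Get This String ** Some other text ongoing here";
s = s.Substring(s.IndexOf("**") + 2); // drop everything up to and including the first "**"
s = s.Substring(0, s.IndexOf("**")); // keep everything before the next "**"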

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, Substring, etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
and keep the rest of the string intact. I am wondering whether String.Replace or Regex.Replace may help.
What I've tried (without much error checking):
var pwd_index = argv.IndexOf("-password=");
string converted;
if (pwd_index >= 0)
{
    var leftPart = argv.Substring(0, pwd_index);
    var pwdStr = argv.Substring(pwd_index);
    var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
    converted = leftPart + "-password=********\n" + rightPart;
}
else
    converted = argv;
Console.WriteLine(converted);

Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1, and then keeps matching until a \n is reached.
This always yields a constant number of *'s. But revealing how many characters a password has might already convey too much information to attackers anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
(            // Store the following match in capture group $1.
  password=  // Match "password=" literally.
)
[            // Match one character from a set.
  ^          // Negate the set (i.e., match anything not
             // contained in the following set).
  \n         // The character set: consists only of the newline character.
]
*            // Repeat the preceding character set 0 to n times.

This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
    match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
You can also drop the new String() part and use a string constant instead.
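Note that the pattern above only matches when a \n follows the password, so a password on the last line would be left unmasked. Here is a minimal sketch that keeps the length-preserving masking without that requirement. The helper name MaskPassword is hypothetical, and the sketch assumes a lookbehind in place of the first capture group.

```csharp
using System.Text.RegularExpressions;

public static class ArgMasker
{
    // Replaces each character of the password value with '*'.
    public static string MaskPassword(string argv)
    {
        // (?<=password=) asserts that "password=" precedes the match without
        // consuming it; [^\n]* then takes the value up to the end of the line.
        return Regex.Replace(
            argv,
            @"(?<=password=)[^\n]*",
            m => new string('*', m.Value.Length));
    }
}
```

For a fixed-width mask, the lambda can simply return a constant such as "********".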

Regex isn't parsing the email body correctly

I have regex code to parse the following format out of my email body.
Building: {building number} // new line
Level: {level of building} // new line
Phase: {phase or room number} // new line
Request: {your request}
Example:
Building: 1
Level: 2
Phase: 20
Request: Get 4 chairs
Here's my regex:
string re1 = "(Building)"; // Word 1
string re2 = "(:)"; // Any Single Character 1
string re3 = "(\\s+)"; // White Space 1
string re4 = "(\\d)"; // Any Single Digit 1
string re5 = "(\\n)"; // White Space 2
string re6 = "(Level)"; // Word 2
string re7 = "(:)"; // Any Single Character 2
string re8 = "(\\s+)"; // White Space 3
string re9 = "(\\d)"; // Any Single Digit 2
string re10 = "(\\n)"; // White Space 4
string re11 = "(Phase)"; // Word 3
string re12 = "(:)"; // Any Single Character 3
string re13 = "(\\s+)"; // White Space 5
string re14 = "(\\d+)"; // Integer Number 1
string re15 = "(\\n)"; // White Space 6
string re16 = "(Request)"; // Word 4
string re17 = "(:)"; // Any Single Character 4
string re18 = "(\\s+)"; // White Space 7
string re19 = "(\\s+)"; // Match Any
Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9 + re10 + re11 + re12 + re13 + re14 + re15 + re16 + re17 + re18 + re19, RegexOptions.Multiline);
Match m = r.Match(body);
if (m.Success) {
blah blah blah
} else {
blah blah
}
The problem is that even when the email body is in the correct format, it still doesn't match my regex, so nothing is stored in my database.
Is my regex correct?

First, there are some needless complications that prevent the pattern from matching. In particular, the final group (\s+) matches only whitespace, so it can never match request text such as "Get 4 chairs". This answer sums up the suggestions made in the comments to improve your regexp.
Then, your regexp puts everything into groups because of the parentheses. While this is not especially problematic, it serves no purpose. If you want, though, you can use groups to capture just the values passed in the mail, but this is entirely optional. This would be the resulting regex:
Building:\s(\d)\s*Level:\s(\d)\s*Phase:\s(\d+)\s*Request:\s(.*)
You can try it at Regex101 and see the grouping results of the regular expression.
If you want to retrieve the values, you can use a Matcher.
The resulting Java code, with escaped characters, would be the following:
String regex = "Building:\\s(\\d)\\s*Level:\\s(\\d)\\s*Phase:\\s(\\d+)\\s*Request:\\s(.*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(body);
if (matcher.matches()) {
    // There could be exceptions here at runtime if values in the mail
    // are not numbers, handle it any way you want
    Integer building = Integer.valueOf(matcher.group(1));
    Integer level = Integer.valueOf(matcher.group(2));
    Integer phase = Integer.valueOf(matcher.group(3));
    String request = matcher.group(4);
}
I would STRONGLY recommend to be very careful with the last input to avoid any kind of SQL injection.
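Since the question is about C#, here is a rough C# counterpart of the Java snippet above. RequestParser and TryParse are hypothetical names chosen for this sketch. The pattern is the one from this answer with \d narrowed to [0-9], on the assumption that only ASCII digits should be accepted, so int.Parse never sees anything else.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RequestParser
{
    static readonly Regex Pattern = new Regex(
        @"Building:\s([0-9])\s*Level:\s([0-9])\s*Phase:\s([0-9]+)\s*Request:\s(.*)");

    // Returns false when the body does not follow the expected format.
    public static bool TryParse(string body, out int building, out int level,
                                out int phase, out string request)
    {
        building = level = phase = 0;
        request = null;

        Match m = Pattern.Match(body);
        if (!m.Success) return false;

        building = int.Parse(m.Groups[1].Value);
        level = int.Parse(m.Groups[2].Value);
        phase = int.Parse(m.Groups[3].Value);
        // "." does not match '\n' but does match '\r', so trim the capture.
        request = m.Groups[4].Value.Trim();
        return true;
    }
}
```

Unlike Java's matches(), Regex.Match succeeds when the pattern is found anywhere in the body, so surrounding text such as a greeting or signature does not prevent a match.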

Make words from string matching with string bold

I want to make the words in a string that match the search term bold. I am using jQuery autocomplete with ASP.NET MVC. My following code works only if the search string is a single word.
label = p.Name.Replace(termToSearch.ToLower(),"<b>" + termToSearch.ToLower() + "</b>"),
But it doesn't work when I have 2 matching words at arbitrary positions.
E.g. when I search for Gemini Oil,
my result should be Gemini Sunflower Oil, with Gemini and Oil in bold.
Any ideas?

A single line of Regex can do just that:
String term = "Gemini Oil";
String input = "Gemini Sunflower Oil.";
String result = Regex.Replace(input, String.Join("|", term.Split(' ')), @"<b>$&</b>");
Console.Out.WriteLine(result);
<b>Gemini</b> Sunflower <b>Oil</b>.

You could just split the search term on each space character and then run the replace multiple times:
var terms = termToSearch.Split(' ');
var name = p.Name;
foreach (var term in terms) {
    name = name.Replace(term.ToLower(), "<b>" + term.ToLower() + "</b>");
}
label = name;
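The two approaches above can be combined into a case-insensitive variant, sketched below under a few assumptions. Highlighter and BoldTerms are hypothetical names. Each search word is passed through Regex.Escape so that input such as "C++" is matched literally, and RegexOptions.IgnoreCase replaces the ToLower calls so the original casing of the name is kept in the output.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Wraps every occurrence of any search word in <b>...</b>.
    public static string BoldTerms(string name, string termToSearch)
    {
        var words = termToSearch
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Regex.Escape(w));

        string pattern = string.Join("|", words);
        if (pattern.Length == 0) return name; // nothing to highlight

        // $& in the replacement stands for the text that was matched.
        return Regex.Replace(name, pattern, "<b>$&</b>", RegexOptions.IgnoreCase);
    }
}
```

Because all words are handled in a single pass, a later word cannot accidentally match inside the <b> tags inserted for an earlier one, which can happen when Replace is run repeatedly.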
