Parsing rules file, c# regex not seeing pattern - c#

I ported a regex from Python to C#. I get no errors, but nothing is printed, and in the debugger the output variable is never populated with any data. Here is the snippet:
StringWriter strwriter = new StringWriter();
rule = sr.ReadLine();
do
{
Regex action = new Regex(@"^#\w+(?<action>(alert)\\s+(tcp|udp)\\s+(.*?)\\('*}))");
Regex message = new Regex("(?<msg>[\\s(]*\\((.*)\\)[^)]*$)", RegexOptions.IgnorePatternWhitespace);
Regex content = new Regex("(?<content>[\\s(]*\\((.*)\\)[^)]*$)", RegexOptions.IgnorePatternWhitespace);
Match result = action.Match(rule);
//String repl = Regex.Replace(rule, "[\\;]", ",");
//Match mat = action.Match(repl);
Console.WriteLine(result.Groups["action"].Value);
//writer.WriteLine(result.Groups["action"].Value + "," + result.Groups["msg"].Value + "," + result.Groups["content"].Value + "," + result.Groups["flow"].Value + "," + result.Groups["ct"].Value + "," + result.Groups["pcre"].Value + "," + result.Groups["sid"].Value);
} while (rule != null);
result is always empty. What have I missed? These patterns are almost the same as the ones that work in my Python script.

Since the first regex is a verbatim string literal (@"..."), don't double the backslashes!
^#\w+(?<action>(alert)\\s+(tcp|udp)\\s+(.*?)\\('*}))
^ ^ ^
=>
^#\w+(?<action>(alert)\s+(tcp|udp)\s+(.*?)\('*}))
Given your input, a couple more things were wrong with the regex. This one should point you in the right direction:
^#\s*(?<action>alert\s+(?:tcp|udp)\s+(.*?)\([^)]*\))
If you don't want the part within parentheses, you can omit the last part:
^#\s*(?<action>alert\s+(?:tcp|udp)\s+(.*?)\()
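Note also that the loop in the question calls ReadLine only once, before the do block, so the same line is tested over and over. A minimal sketch of the corrected pattern inside a loop that advances through the input is shown here. The sample rule text and the class name RuleActionDemo are made up for illustration, and a StringReader stands in for the question's StreamReader sr.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class RuleActionDemo
{
    // Pattern from the answer above, written as a verbatim string so "\s" needs no doubling.
    private static readonly Regex Action = new Regex(
        @"^#\s*(?<action>alert\s+(?:tcp|udp)\s+(.*?)\([^)]*\))");

    public static void Main()
    {
        // Hypothetical sample input; a real run would read the rules file instead.
        string sample =
            "# alert tcp any any -> any 80 (msg:\"example\"; sid:1;)\n" +
            "# this comment line is not a rule\n";

        using (TextReader reader = new StringReader(sample))
        {
            string rule;
            // Reading in the loop condition visits every line and stops at end of input.
            while ((rule = reader.ReadLine()) != null)
            {
                Match result = Action.Match(rule);
                if (result.Success)
                {
                    Console.WriteLine(result.Groups["action"].Value);
                }
            }
        }
    }
}
```

Building the Regex once, outside the loop, also avoids re-parsing the pattern for every line.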


How to obtain a regex-matched string's file path?

I have successfully regex-matched multiple strings in a folder of .txt files using StreamReader, but I also need the file path of each matched string.
How can I obtain the matched strings' file paths?
static void abnormalitiescheck()
{
int count = 0;
Regex regex = new Regex(@"(#####)");
DirectoryInfo di = new DirectoryInfo(txtpath);
Console.WriteLine("No" + "\t" + "Name and location of file" + "\t" + "||" +" " + "Abnormal Text Detected");
Console.WriteLine("=" + "\t" + "=========================" + "\t" + "||" + " " + "=======================");
foreach (string files in Directory.GetFiles(txtpath, "*.txt"))
{
using (StreamReader reader = new StreamReader(files))
{
string line;
while ((line = reader.ReadLine()) != null)
{
Match match = regex.Match(line);
if (match.Success)
{
count++;
Console.WriteLine(count + "\t\t\t\t\t" + match.Value + "\n");
}
}
}
}
}
If possible, I want the output to include each matched string's file path as well.
For e.g.,
C:/..../email_4.txt
C:/..../email_7.txt
C:/..../email_8.txt
C:/..../email_9.txt
As you already have the DirectoryInfo, you could get the FullName property.
Inside the loop you also have the file name in the variable files. To get the name and location of the file, you could use Path.Combine.
Your updated code could look like:
Console.WriteLine(count + "\t" + Path.Combine(di.FullName , Path.GetFileName(files)) + "\t" + match.Value + "\n");
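Since Directory.GetFiles returns each entry with the directory you passed in already prepended, the loop variable can also be printed directly. The sketch below assumes a hypothetical temp folder and sample file so that it runs on its own. The class name and the sample content are illustrative only, not taken from the question.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class MatchWithPathDemo
{
    public static void Main()
    {
        // Hypothetical folder and sample file; substitute the real txtpath.
        string txtpath = Path.Combine(Path.GetTempPath(), "match-with-path-demo");
        Directory.CreateDirectory(txtpath);
        File.WriteAllText(Path.Combine(txtpath, "email_4.txt"), "hello\n##### suspicious\n");

        Regex regex = new Regex(@"#####");
        int count = 0;

        foreach (string file in Directory.GetFiles(txtpath, "*.txt"))
        {
            foreach (string line in File.ReadLines(file))
            {
                Match match = regex.Match(line);
                if (match.Success)
                {
                    count++;
                    // GetFullPath makes the path absolute in case txtpath was relative.
                    Console.WriteLine(count + "\t" + Path.GetFullPath(file) + "\t" + match.Value);
                }
            }
        }
    }
}
```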
I'm guessing you may only want to match .txt file paths. If so, start with a simple expression that captures everything from the start of each input line up to .txt, with .txt as the right boundary:
^(.+?)(.txt)$
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"^(.+?)(.txt)$";
string input = @"C:/..../email_4.txt
C:/..../email_7.txt
C:/..../email_8.txt
C:/..../email_9.txt";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}

C# creating an HTML line with escaping

I'm creating a loop in which each line is a pretty long HTML line on the page. I've tried various combinations of @ and """ but I just can't seem to get the hang of it.
This is what I've got now, but the single quotes are giving me problems on the page, so I want to change all the single quotes to double quotes, just like a normal HTML line would use them for properties in the elements:
sOutput += "<div class='item link-item " + starOrBullet + "'><a href='" + appSet + linkID + "&TabID=" + tabID + "' target=’_blank’>" + linkText + "</a></div>";
variables are:
starOrBullet
appSet
LinkID
tabID (NOT $TabID=)
linkText
BTW, appSet="http://linktracker.swmed.org:8020/LinkTracker/Default.aspx?LinkID="
Can someone help me here?
You have to escape the double quotes (") with \"
For your case:
sOutput += "<div class=\"item link-item " + starOrBullet + "\"><a href=\"" + appSet + linkID + "&TabID=" + tabID + "\" target=\"_blank\">" + linkText + "</a></div>";
If you concatenate many strings, you should use StringBuilder for performance reasons.
You can use a verbatim string and escape a double quote with a double quote. So it will be a double double quote.
string mystring = @"This is \t a ""verbatim"" string";
You can also make your string shorter by doing the following:
Method 1
string mystring = #"First Line
Second Line
Third Line";
Method 2
string mystring = "First Line \n" +
"Second Line \n" +
"Third Line \n";
Method 3
var mystring = String.Join(
Environment.NewLine,
"First Line",
"Second Line",
"Third Line");
Make a habit of generating HTML with C# classes instead of string concatenation. The code below generates the HTML using C#.
See these links for more information:
https://dejanstojanovic.net/aspnet/2014/june/generating-html-string-in-c/
https://learn.microsoft.com/en-us/dotnet/api/system.web.ui.htmltextwriter
Code for your question:
protected void Page_Load(object sender, EventArgs e)
{
string starOrBullet = "star-link";
string appSet = "http://linktracker.swmed.org:8020/LinkTracker/Default.aspx?LinkID=";
string LinkID = "2";
string tabID = "1";
string linkText = "linkText_Here";
string sOutput = string.Empty;
StringBuilder sbControlHtml = new StringBuilder();
using (StringWriter stringWriter = new StringWriter())
{
using (HtmlTextWriter htmlWriter = new HtmlTextWriter(stringWriter))
{
//Generate container div control
HtmlGenericControl divControl = new HtmlGenericControl("div");
divControl.Attributes.Add("class", string.Format("item link-item {0}",starOrBullet));
//Generate link control
HtmlGenericControl linkControl = new HtmlGenericControl("a");
linkControl.Attributes.Add("href", string.Format("{0}{1}&TabID={2}",appSet,LinkID,tabID));
linkControl.Attributes.Add("target", "_blank");
linkControl.InnerText = linkText;
//Add linkControl to container div
divControl.Controls.Add(linkControl);
//Generate HTML string and dispose object
divControl.RenderControl(htmlWriter);
sbControlHtml.Append(stringWriter.ToString());
divControl.Dispose();
}
}
sOutput = sbControlHtml.ToString();
}
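For comparison, the verbatim-string idea from the earlier answer can be combined with string interpolation (C# 6 or later), so that the attribute quotes are written as "" and the variables appear in braces. This is only a sketch: apart from appSet, the variable values are made-up samples, and HTML-encoding the link text with WebUtility.HtmlEncode is an added precaution that the question did not ask for.

```csharp
using System;
using System.Net;

public static class LinkHtmlDemo
{
    public static void Main()
    {
        // Sample values; only appSet comes from the question.
        string starOrBullet = "star";
        string appSet = "http://linktracker.swmed.org:8020/LinkTracker/Default.aspx?LinkID=";
        string linkID = "2";
        string tabID = "1";
        string linkText = "Example link";

        // $@ is an interpolated verbatim string: "" is a literal double quote,
        // and {name} inserts the value of an expression.
        string line =
            $@"<div class=""item link-item {starOrBullet}""><a href=""{appSet}{linkID}&TabID={tabID}"" target=""_blank"">{WebUtility.HtmlEncode(linkText)}</a></div>";

        Console.WriteLine(line);
    }
}
```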

Parse XML With Additional String

I need to support parsing XML that is inside an email body but has extra text at the beginning and the end.
I've tried the HTML Agility Pack, but it does not remove the non-XML text.
So how do I cleanse a string that contains a complete XML document mixed with other text around it?
var bodyXmlPart= @"Hi please see below client <?xml version=""1.0"" encoding=""UTF-8""?>" +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
//How do I clean up the body xml part before loading into xml
//This will fail:
var xDoc = XDocument.Parse(bodyXmlPart);
If you mean that the body can contain any XML and not just ac_application, you can use the following code:
var bodyXmlPart = @"Hi please see below client " +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
StringBuilder pattern = new StringBuilder();
Regex regex = new Regex(@"<\?xml.*\?>", RegexOptions.Singleline);
var match = regex.Match(bodyXmlPart);
if (match.Success) // There is an xml declaration
{
pattern.Append(@"<\?xml.*");
}
Regex regexFirstTag = new Regex(@"\s*<(\w+:)?(\w+)>", RegexOptions.Singleline);
var match1 = regexFirstTag.Match(bodyXmlPart);
if (match1.Success) // xml has body and we got the first tag
{
pattern.Append(match1.Value.Trim().Replace(">",@"\>" + ".*"));
string firstTag = match1.Value.Trim();
Regex regexFullXmlBody = new Regex(pattern.ToString() + @"<\/" + firstTag.Trim('<','>') + @"\>", RegexOptions.None);
var matchBody = regexFullXmlBody.Match(bodyXmlPart);
if (matchBody.Success)
{
string xml = matchBody.Value;
}
}
This code can extract any XML, not just ac_application.
The assumption is that the body always contains an XML declaration.
The code looks for the XML declaration and then finds the first tag immediately following it. This first tag is treated as the root tag and used to extract the entire XML.
I'd probably do something like this...
using System.Diagnostics;
using System.Text.RegularExpressions;
namespace Test {
class Program {
static void Main(string[] args) {
var bodyXmlPart = @"Hi please see below client <?xml version=""1.0"" encoding=""UTF-8""?>" +
"<ac_application>" +
" <primary_applicant_data>" +
" <first_name>Ross</first_name>" +
" <middle_name></middle_name>" +
" <last_name>Geller</last_name>" +
" <ssn>123456789</ssn>" +
" </primary_applicant_data>" +
"</ac_application> thank you, \n john ";
Regex regex = new Regex(@"(?<pre>.*)(?<xml>\<\?xml.*</ac_application\>)(?<post>.*)", RegexOptions.Singleline);
var match = regex.Match(bodyXmlPart);
if (match.Success) {
Debug.WriteLine($"pre={match.Groups["pre"].Value}");
Debug.WriteLine($"xml={match.Groups["xml"].Value}");
Debug.WriteLine($"post={match.Groups["post"].Value}");
}
}
}
}
This outputs...
pre=Hi please see below client
xml=<?xml version="1.0" encoding="UTF-8"?><ac_application> <primary_applicant_data> <first_name>Ross</first_name> <middle_name></middle_name> <last_name>Geller</last_name> <ssn>123456789</ssn> </primary_applicant_data></ac_application>
post= thank you,
john
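A regex-free variant is also possible when the surrounding mail text is known to contain no angle brackets. That is an assumption, and this sketch will mis-slice the body if it does not hold. It cuts from the first "<?xml" (or the first '<' when there is no declaration) to the last '>' and hands the slice to XDocument.Parse. ExtractXml and XmlInMailDemo are hypothetical names, and the sample body is shortened from the question.

```csharp
using System;
using System.Xml.Linq;

public static class XmlInMailDemo
{
    // Returns the text from the first "<?xml" (or the first '<' if absent) to the last '>'.
    public static string ExtractXml(string body)
    {
        int start = body.IndexOf("<?xml", StringComparison.Ordinal);
        if (start < 0)
        {
            start = body.IndexOf('<');
        }
        int end = body.LastIndexOf('>');
        if (start < 0 || end < start)
        {
            return null; // nothing that looks like XML was found
        }
        return body.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        string body = @"Hi please see below client <?xml version=""1.0"" encoding=""UTF-8""?>" +
            "<ac_application><primary_applicant_data><first_name>Ross</first_name>" +
            "<last_name>Geller</last_name></primary_applicant_data></ac_application> thank you, \n john ";

        string xml = ExtractXml(body);
        if (xml != null)
        {
            // Parse throws XmlException if the slice is not well-formed XML.
            XDocument xDoc = XDocument.Parse(xml);
            Console.WriteLine(xDoc.Root.Name);
        }
    }
}
```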

Regex, Remove function in a string

I'm trying to extract the DoDialogWizardWithArguments function call from a string using Regex:
string:
var a = 1 + 2;DoDialogWizardWithArguments('CopyGroup', '&act=enviarcliente', 96487, (Q.getBody().$.innerWidth()/4)*3, Q.getBody().$.innerHeight(), new Function("if(localStorage.getItem('atualizaPgsParaCli')){{Q.window.close();Q.window.proxy.reload();}}localStorage.removeItem('atualizaPgsParaCli');return true;"), false);p = q.getBOdy();
actual Regex (pattern):
DoDialogWizardWithArguments\((.*\$?)\)
Result expected:
DoDialogWizardWithArguments('CopyGroup', '&act=enviarcliente', 96487, (Q.getBody().$.innerWidth()/4)*3, Q.getBody().$.innerHeight(), new Function("if(localStorage.getItem('atualizaPgsParaCli')){{Q.window.close();Q.window.proxy.reload();}}localStorage.removeItem('atualizaPgsParaCli');return true;"), false)
The problem:
If there is another closing parenthesis ")" after the one that closes the DoDialogWizardWithArguments call, the regex captures that too.
How can I get only the function call, from its opening parenthesis to its matching closing parenthesis?
If this is not possible with Regex, what is a better option?
Example regex link:https://regex101.com/r/kP2bQ4/1
Try this one as regex: https://regex101.com/r/kP2bQ4/2
DoDialogWizardWithArguments\(((?:[^()]|\((?1)\))*+)\)
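The pattern above relies on recursion ((?1)) and a possessive quantifier (*+), which are PCRE features. The .NET regex engine does not support them, but it offers balancing groups that can express the same nesting check. The following is a sketch under that substitution, not the answer's own pattern. The input is a shortened, made-up version of the question's string, and the group name depth is arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedCallDemo
{
    public static void Main()
    {
        // Shortened, hypothetical input with nested parentheses inside the call.
        string src = "var a = 1 + 2;DoDialogWizardWithArguments('CopyGroup', (w()/4)*3, h(), false);p = q.getBody();";

        // (?<depth>) pushes onto a stack for every "(", (?<-depth>) pops for every ")".
        // (?(depth)(?!)) fails the match if any "(" is still unclosed.
        Regex call = new Regex(
            @"DoDialogWizardWithArguments\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)");

        Match m = call.Match(src);
        if (m.Success)
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

Like the recursive version, this counts every parenthesis, including those inside string literals, so an unbalanced parenthesis inside a quoted argument would still throw it off.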
I'd probably try to simplify it like this:
var str = @"var a = 1 + 2;DoDialogWizardWithArguments('CopyGroup', '&act=enviarcliente', 96487, (Q.getBody().$.innerWidth()/4)*3, Q.getBody().$.innerHeight(), new Function(""if(localStorage.getItem('atualizaPgsParaCli')){{Q.window.close();Q.window.proxy.reload();}}localStorage.removeItem('atualizaPgsParaCli');return true;""), false);p = q.getBOdy();";
var lines = str.Split(';');
foreach(var line in lines)
{
if(line.Contains("DoDialogWizardWithArguments")){
int startPos = line.IndexOf("(");
int endPos = line.IndexOf(")");
return line.Substring(startPos+1, endPos - startPos - 1);
}
}
return "Not found";
If you don't need to check that the DoDialogWizardWithArguments call is well formed and just want to extract it, try "DoDialogWizardWithArguments([^,]*,[^,]*,[^,]*,([^,]*),.+);".
Example:
String src = @"xdasadsdDoDialogWizardWithArguments('CopyGroup', '&act=enviarcliente', 96487, (Q.getBody().$.innerWidth()/4)*3, Q.getBody().$.innerHeight(), new Function(" + "\""
+ "if(localStorage.getItem('atualizaPgsParaCli')){{Q.window.close();Q.window.proxy.reload();}}localStorage.removeItem('atualizaPgsParaCli');return true;"
+ "\"" + "), false);p"; //An example of what you asked for
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"DoDialogWizardWithArguments([^,]*,[^,]*,[^,]*,([^,]*),.+);"); //This is your function
MessageBox.Show(r.Match(src).Value);
if (r.IsMatch(src))
MessageBox.Show("Yeah, it's DoDialog");
else MessageBox.Show("Nope, Nope, Nope");

Help with simplifying a couple of regexes

Below I have two regexes that operate on some text:
assume key = "old" and value = "new"
text = Regex.Replace(text,
"\\." + change.Key + ",",
"." + change.Value + ","
);
text = Regex.Replace(text,
"\\." + change.Key + ";",
"." + change.Value + ";"
);
So, ".old," and ".old;" would change to ".new," and ".new;", respectively.
I'm sure this could be shortened to one regex. How can I do this so that the string only changes when the comma and semicolon are at the end of the variable? For example, I don't want ".oldQ" to change to ".newQ". Thanks!
.NET uses $ for backreferences in the replacement string:
text = Regex.Replace(text,
@"\." + change.Key + "([,;])",
"." + change.Value + "$1");
Off the top of my head:
text = Regex.Replace(text, @"\.(old|new),",@"\.\1;");
You want to just change the middle part, so:
text = Regex.Replace(text,
"\\." + change.Key + "(,|;)", // mark a group using "()" for substitution...
"." + change.Value + "$1" // use the group ("$1")
);
I like using \b, like this:
text = Regex.Replace(text, @"\." + change.Key + @"\b", "." + change.Value);
It would match on keywords followed by other delimiters, not just "," and ";", but it may still work in your case.
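The single-regex approach with a captured delimiter can be wrapped in a small helper. The sketch below adds two precautions that are assumptions beyond the answers: Regex.Escape on the key, in case it contains regex metacharacters, and doubling any "$" in the value so it stays literal in the replacement string. Rename and ReplaceMemberDemo are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceMemberDemo
{
    public static string Rename(string text, string key, string value)
    {
        // Regex.Escape guards against regex metacharacters in the key.
        string pattern = @"\." + Regex.Escape(key) + "([,;])";
        // "$$" is a literal "$" in a replacement string; ${1} re-inserts the captured
        // delimiter and stays unambiguous even when the value ends in a digit.
        string replacement = "." + value.Replace("$", "$$") + "${1}";
        return Regex.Replace(text, pattern, replacement);
    }

    public static void Main()
    {
        // ".oldQ" is left untouched because no "," or ";" follows "old" there.
        Console.WriteLine(Rename("a.old, b.old; c.oldQ", "old", "new"));
    }
}
```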
