I'm stuck with regular expressions. The program is a console application written in C# that accepts a few commands. I want to check that the arguments are valid first. I thought it would be easy with Regex, but I couldn't get it to work:
var strArgs = "";
foreach (var x in args)
{
strArgs += x + " ";
}
if (!Regex.IsMatch(strArgs, @"(-\?|-help|-c|-continuous|-l|-log|-ip|)* .{1,}"))
{
Console.WriteLine("Command arrangement is wrong. Use \"-?\" or \"-help\" to see help.");
return;
}
Usage is:
program.exe [-options] [domains]
The problem is that the program accepts every command line. I also need to check that the "-" prefixed options come before the domains. I don't think the problem is difficult to solve.
Thanks...
Since you will end up writing a switch statement to process the options anyway, you would be better off doing the checking there:
switch(args[i])
{
case "-?": ...
case "-help": ...
...
default:
if (args[i][0] == '-')
throw new Exception("Unrecognised option: " + args[i]);
break; // C# does not allow falling out of a switch section
}
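The same idea can also cover the "options before domains" requirement from the question. Below is a minimal sketch, not a definitive implementation: the class name ArgumentChecker and the method Validate are made up for illustration, the option names are copied from the question, and anything that does not start with "-" is assumed to be a domain.

```csharp
using System;
using System.Collections.Generic;

static class ArgumentChecker
{
    // Option names copied from the question; adjust to the real program.
    private static readonly HashSet<string> KnownOptions = new HashSet<string>
    {
        "-?", "-help", "-c", "-continuous", "-l", "-log", "-ip"
    };

    // Returns null when the arguments are well formed, otherwise an error message.
    public static string Validate(string[] args)
    {
        bool seenDomain = false;
        foreach (string arg in args)
        {
            if (arg.StartsWith("-", StringComparison.Ordinal))
            {
                // Options are only allowed before the first domain.
                if (seenDomain)
                    return "Option after domain: " + arg;
                if (!KnownOptions.Contains(arg))
                    return "Unrecognised option: " + arg;
            }
            else
            {
                // Assumption: anything without a leading dash is a domain.
                seenDomain = true;
            }
        }
        return null;
    }
}
```

In Main you would call ArgumentChecker.Validate(args) and print the message if the result is not null. The sketch does not require at least one domain; add that check if the program needs it.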
First, don't use regular expressions to parse command-line arguments. Here is a related question that I think you should look at:
Best way to parse command line arguments in C#?
But for your specific problem with your regular expression: the options are optional, and then you match a space followed by anything at all, where "anything" can include invalid domains and/or invalid options. So, for example, this is valid according to your regular expression:
program.exe -c -invalid
One way to improve this is to be more precise about the characters allowed in a domain rather than just matching anything.
Another problem with your regular expression is that you don't allow spaces between the switches. To handle that you probably want something like this:
(?:(?:-\?|-help|-c|-continuous|-l|-log|-ip) +)*
I'd also like to point out that you should use string.Join instead of the loop you are currently using.
string strArgs = string.Join(" ", args);
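Putting these points together, here is a rough sketch of an anchored pattern applied to the joined arguments. The class name ArgumentPattern and the method IsWellFormed are hypothetical, and a "domain" is assumed to be any run of non-space characters that does not start with a dash, which is looser than a real domain check. Arguments that themselves contain spaces are not handled, because joining with " " loses the argument boundaries.

```csharp
using System;
using System.Text.RegularExpressions;

static class ArgumentPattern
{
    // ^ and $ force the whole string to match:
    //   zero or more known options, each followed by spaces,
    //   then one or more "domains" that do not start with a dash.
    private static readonly Regex Layout = new Regex(
        @"^(?:(?:-\?|-help|-c|-continuous|-l|-log|-ip) +)*[^-\s]\S*(?: +[^-\s]\S*)*$");

    public static bool IsWellFormed(string[] args)
    {
        return Layout.IsMatch(string.Join(" ", args));
    }
}
```

With this pattern "-c -log example.com" is accepted, while "-c -invalid" and "example.com -c" are rejected, because an unknown or misplaced option can no longer be swallowed by a "match anything" tail.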
Don't reinvent the wheel; handling command-line arguments is a solved problem.
I've gotten good use out of the Command Line Parser Library for .Net.
Actually, the easiest way to get command-line argument parsing is to create a PowerShell cmdlet. That gives you a really nice way to work with arguments.
I have been using this function with success... perhaps it will be useful for someone else...
First, define your variables:
private string myVariable1;
private string myVariable2;
private Boolean debugEnabled = false;
Then, execute the function:
loadArgs();
and add the function to your code:
private void loadArgs()
{
const string namedArgsPattern = "^(/|-)(?<name>\\w+)(?:\\:(?<value>.+)$|\\:$|$)";
System.Text.RegularExpressions.Regex argRegEx = new System.Text.RegularExpressions.Regex(namedArgsPattern, System.Text.RegularExpressions.RegexOptions.Compiled);
foreach (string arg in Environment.GetCommandLineArgs())
{
System.Text.RegularExpressions.Match namedArg = argRegEx.Match(arg);
if (namedArg.Success)
{
// The name is lower-cased here, so the case labels must be lower case too.
switch (namedArg.Groups["name"].ToString().ToLower())
{
case "myarg1":
myVariable1 = namedArg.Groups["value"].ToString();
break;
case "myarg2":
myVariable2 = namedArg.Groups["value"].ToString();
break;
case "debug":
debugEnabled = true;
break;
default:
break;
}
}
}
}
and to use it you can use the command syntax with either a forward slash "/" or a dash "-":
myappname.exe /myArg1:Hello /myArg2:Chris -debug
This regex parses the command line arguments into matches and groups so that you can build a parser based on this regex.
((?:|^\b|\s+)--(?<option_1>.+?)(?:\s|=|$)(?!-)(?<value_1>[\"\'].+?[\"\']|.+?(?:\s|$))?|(?:|^\b)-(?<option_2>.)(?:\s|=|$)(?!-)(?<value_2>[\"\'].+?[\"\']|.+?(?:\s|$))?|(?<arg>[\"\'].+?[\"\']|.+?(?:\s|$)))
This regex parses the following inputs and works in almost all languages:
--in-argument hello --out-stdout false positional -x
--in-argument 'hello world"
"filename"
--in-argument="hello world'
--in-argument='hello'
--in-argument hello
"hello"
helloworld
--flag-off
-v
-x="hello"
-u positive
C:\serverfile
--in-arg1='abc' --in-arg2=hello world c:\\test
Try on Regex101
Related
I have a code snippet written in JScript.NET for FiddlerScript and am trying to rewrite it in C# for use as a Fiddler extension. I'm not familiar with C# syntax, so I need help.
Here's the existing code snippet in JScript.NET, which I'm trying to convert into C#:
var sHostname = oSession.hostname;
switch(sHostname) {
case /example1.com/i.test(sHostname) && sHostname:
case /example2.com/i.test(sHostname) && sHostname:
case /example3.com/i.test(sHostname) && sHostname:
MessageBox.Show("Matched: " + sHostname);
default:
FiddlerApplication.Log.LogString("No match for hostname.");
}
Here's something I tried in C#, but this is very primitive:
var sHostname = oSession.hostname;
string[] patterns = { @"[a-z]", @"[0-9]", @"example[0-9]", @"[a-z0-9.]", @"\w+" }; // a collection of about 20K+ pattern entries, yes 20K+, and stored in separate class
IList<string> patternList = new ReadOnlyCollection<string>(patterns);
var status = false;
var pattern = "";
foreach (string p in patternList)
{
if (Regex.IsMatch(sHostname, p, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace))
{
pattern = p;
status = true;
break;
}
}
if (status)
{
System.Diagnostics.Debug.WriteLine("Matched: " + pattern);
// Other code
} else
{
System.Diagnostics.Debug.WriteLine("No match found");
// Other code
}
As a newbie to C#, I came across many things:
Using a compiled Regex is better than creating a new object every time. I'm not sure whether specifying RegexOptions.Compiled | RegexOptions.IgnoreCase is the same as creating the list once and reusing it across iterations. Any hint on this would be helpful.
Usage of LINQ instead of boilerplate code.
And many more good things in C# world
All I'm interested in is whether there is a match or not (a boolean true/false value), like Any(), without performing any match extraction, split, etc. Also, this code needs to run several hundred times per second in Fiddler, so it needs to be efficient enough to match a given input string against a list of 20K+ patterns. I know it sounds a bit crazy, so I need help from experts.
FYI, I'm using Visual Studio 2013, so not sure if I can use C# 7 syntax. I'd be very much interested if better things can be done in C# 7 or higher VS versions.
Thank you in advance!
I have a project to demonstrate a program similar to the "echo" command in the MS-DOS Command Line. Here is the code in C#:
using System;
namespace arguments
{
class Program
{
static void Main(string[] args)
{
try
{
switch (args[0])
{
case "/?":
string location = System.Reflection.Assembly.GetEntryAssembly().Location;
string name = System.IO.Path.GetFileName(location);
Console.WriteLine("Displays messages\nSyntax: {0} [message]", name);
Environment.Exit(0);
break;
}
if (args.Length >= 0)
{
string x = "";
foreach (var item in args)
{
x += item.ToString() + " ";
}
Console.WriteLine(Convert.ToString(x)); // this should eliminate vulnerabilities.
}
}
catch
{
string location = System.Reflection.Assembly.GetEntryAssembly().Location;
string name = System.IO.Path.GetFileName(location);
Console.WriteLine("Displays messages\nSyntax: {0} [message]", name);
}
}
}
}
This does a pretty efficient job at doing what it's supposed to do. Then I got into trying to exploit it in any way I could.
In command prompt, I ran arguments.exe ", this is supposed to print out ". But that's not really what happened. I then tried the same with the echo command by running echo ", and it, like it's supposed to, printed out ". This is mind boggling because I wouldn't have even thought this would be a problem. I couldn't get it to pose a great threat, just confused me for a minute.
My question is, is there any way to pass the quotation mark (") as argument to this console application?
Here is a picture to demonstrate it a little bit better: http://prntscr.com/cm9yal
void Main(string[] args)
args array here contains the arguments which have been passed to your application. Because arguments may have spaces they can be surrounded by quotes.
For this reason you won't get back the exact string you passed as the argument. You will also lose any runs of spaces between quoted parameters.
If you need the raw command line string, use:
string cmdline = System.Environment.CommandLine;
To get the lone quote character, you'll need to bypass the default parsing performed by the CLR when it populates the args array. You can do this by examining Environment.CommandLine, which in the case you describe above will return something along the lines of:
ConsoleApplication1.exe \"
Note, the argument I passed was simply " (not the escaped variant shown).
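If you go the Environment.CommandLine route, you still have to drop the executable name yourself. Here is a minimal sketch of that step. The class RawCommandLine and the method StripExecutable are names invented for this example, and it assumes the executable is the first token of the raw string and is either unquoted without spaces or wrapped in double quotes.

```csharp
using System;

static class RawCommandLine
{
    // Returns everything after the first token (the executable) of a raw command line.
    public static string StripExecutable(string commandLine)
    {
        int i = 0;

        // Skip leading whitespace.
        while (i < commandLine.Length && char.IsWhiteSpace(commandLine[i]))
            i++;

        if (i < commandLine.Length && commandLine[i] == '"')
        {
            // Quoted executable: jump past the closing quote (or to the end if there is none).
            int close = commandLine.IndexOf('"', i + 1);
            i = close < 0 ? commandLine.Length : close + 1;
        }
        else
        {
            // Unquoted executable: it ends at the first whitespace character.
            while (i < commandLine.Length && !char.IsWhiteSpace(commandLine[i]))
                i++;
        }

        // Drop the separator between the executable and its arguments.
        return commandLine.Substring(i).TrimStart();
    }
}
```

Calling RawCommandLine.StripExecutable(Environment.CommandLine) then gives you the argument text as typed, including a lone quotation mark. Note that TrimStart also discards any extra leading spaces, which may or may not matter for an echo-style program.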
I've been doing some googling and did not find any solution. The most common case of a path-argument combo has quotes like
"C:\Program Files\example.exe" -argument --argument -argument "argument argument"
"C:\Program Files\example.exe" /argument /argument /argument "argument argument"
They simply go through the entire thing, look for the second quote, then treat everything after that as an argument.
The second solution I found (see here) works without quotes yet only works for paths without spaces. See below.
This works: C:\Windows\System32\Sample.exe -args -args -args "argument argument"
This does not work: C:\Program Files\Sample.exe -argument "arg arg" --arg-arg
This works in the same manner. They look for the first space then treat everything after it as an argument, which will not work with some/most programs (the program files folder name has a space).
Is there a solution to this? I've tried to use and tweak numerous snippets and even tried to make my own regex statement yet they all failed. Code snippets or even a library would come in handy.
Thanks in advance!
EDIT: The snippets I found as per request
Snippet 1:
char* lpCmdLine = ...;
char* lpArgs = lpCmdLine;
// skip leading spaces
while(isspace(*lpArgs))
lpArgs++;
if(*lpArgs == '\"')
{
// executable is quoted; skip to first space after matching quote
lpArgs++;
int quotes = 1;
while(*lpArgs)
{
if(isspace(*lpArgs) && !quotes)
break;
if(*lpArgs == '\"')
quotes = !quotes;
lpArgs++; // advance to the next character; without this the loop never terminates
}
}
else
{
// executable is not quoted; skip to first space
while(*lpArgs && !isspace(*lpArgs))
lpArgs++;
}
// TODO: skip any spaces before the first arg
Source 2: almost everything in here
Source 3: Various shady blogs
You could try a CSV parser, such as the only one built into .NET, the VisualBasic.TextFieldParser:
List<string[]> allLineFields = new List<string[]>();
var textStream = new System.IO.StringReader(text);
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(textStream))
{
parser.Delimiters = new string[] { " " };
parser.HasFieldsEnclosedInQuotes = true; // <--- !!!
string[] fields;
while ((fields = parser.ReadFields()) != null)
{
allLineFields.Add(fields);
}
}
With a single string the list contains one String[], the first is the path, the rest are args.
Update: this works with all but your last string, because the path is C:\Program Files\Sample.exe. You have to wrap it in quotes, otherwise the space in Program Files splits it into two parts, but that is a known issue with Windows paths and scripts.
I created a little console program that will search text files and return all string lines that matches a variable entered by a user. One issue I ran into is, say I want to look up "1234" which represents a location code, but there is also a phone number that has "555-1234" in the string line, I get that one back too. I am thinking if I input the delimiter (ex: ",") with the variable (",1234,") then maybe I can ensure search is accurate. Am I on the right track, or is there a better way? This is where I am at so far:
string[] file = File.ReadAllLines(sPath);
foreach (string s in file)
{
using (StreamWriter sw = File.AppendText(rPath))
{
if (sFound = Regex.IsMatch(s, string.Format(@"\b{0}\b",
Regex.Escape(searchVariable))))
{
sw.WriteLine(s);
}
}
}
I'd say you are on the right track.
I'd suggest changing the regular expression so that it uses a negative lookbehind to match "searchVariable" only when it is not preceded by "-", so "1234" in "555-1234" wouldn't be matched, but ",1234" (for instance) would.
You will only need to use "Regex.Escape()" if you want to include special regular expression characters in your search, which from your question you don't want to do.
You could change the code to something like this (it's late so I haven't tested this!):
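var lines = File.ReadAllLines(sPath);
// Verbatim string: in a regular string literal "\b" is a backspace character, not a word boundary.
var regex = new Regex(String.Format(@"(?<!-){0}\b", searchVariable));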
if (lines.Any())
{
using (var streamWriter = File.AppendText(rPath))
{
foreach (var line in lines)
{
if (regex.IsMatch(line))
{
streamWriter.WriteLine(line);
}
}
}
}
A great website for testing these (often tricky!) regular expressions is Regex Hero.
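To see the lookaround idea in isolation, here is a small sketch. It is a variation on the pattern above rather than the exact same one: it refuses a match when the code is directly preceded or followed by a word character or a dash, and it escapes the search text in case it ever contains regex metacharacters. The class CodeSearch and the method BuildPattern are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class CodeSearch
{
    // Builds a regex that matches searchVariable only when it is not glued
    // to a word character or a dash on either side.
    public static Regex BuildPattern(string searchVariable)
    {
        return new Regex(
            @"(?<![\w-])" + Regex.Escape(searchVariable) + @"(?![\w-])");
    }
}
```

With CodeSearch.BuildPattern("1234"), a line such as "Smith,1234,Boston" matches, while "Smith,555-1234,Boston" and "Smith,51234,Boston" do not.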
Use Linq to CSV and make your life easier. Just go to Nuget and search Linq to CSV.
I have some site content that contains abbreviations. I have a list of recognised abbreviations for the site, along with their explanations. I want to create a regular expression which will allow me to replace all of the recognised abbreviations found in the content with some markup.
For example:
content: This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.
abbreviations: memb = Member; deb = Debut;
result: This is just a little test of the [a title="Member"]memb[/a] to see if it gets picked up.
[a title="Debut"]Deb[/a] of course should also be caught here.
(This is just example markup for simplicity).
Thanks.
EDIT:
CraigD's answer is nearly there, but there are issues. I only want to match whole words. I also want to keep the correct capitalisation of each word replaced, so that deb is still deb, and Deb is still Deb as per the original text. For example, this input:
This is just a little test of the memb.
And another memb, but not amemba.
Deb of course should also be caught here.deb!
First you would need to Regex.Escape() all the input strings.
Then you can look for them in the string, and iteratively replace them by the markup you have in mind:
string abbr = "memb";
string word = "Member";
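// Verbatim string: in a regular string literal "\b" is a backspace character, not a word boundary.
string pattern = String.Format(@"\b{0}\b", Regex.Escape(abbr));
string substitute = String.Format("[a title=\"{0}\"]{1}[/a]", word, abbr);
string output = Regex.Replace(input, pattern, substitute);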
EDIT: I asked if a simple String.Replace() wouldn't be enough - but I can see why regex is desirable: you can use it to enforce "whole word" replacements only by making a pattern that uses word boundary anchors.
You can go as far as building a single pattern from all your escaped input strings, like this:
\b(?:{abbr_1}|{abbr_2}|{abbr_3}|{abbr_n})\b
and then using a match evaluator to find the right replacement. This way you can avoid iterating the input string more than once.
Not sure how well this will scale to a big word list, but I think it should give the output you want (although in your question the 'result' seems identical to 'content')?
Anyway, let me know if this is what you're after
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var input = @"This is just a little test of the memb to see if it gets picked up.
Deb of course should also be caught here.";
var dictionary = new Dictionary<string,string>
{
{"memb", "Member"}
,{"deb","Debut"}
};
var regex = "(" + String.Join(")|(", dictionary.Keys.ToArray()) + ")";
foreach (Match metamatch in Regex.Matches(input
, regex /*@"(memb)|(deb)"*/
, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
{
input = input.Replace(metamatch.Value, dictionary[metamatch.Value.ToLower()]);
}
Console.Write (input);
Console.ReadLine();
}
}
}
For anyone interested, here is my final solution. It is for a .NET user control. It uses a single pattern with a match evaluator, as suggested by Tomalak, so there is no foreach loop. It's an elegant solution, and it gives me the correct output for the sample input while preserving correct casing for matched strings.
public partial class Abbreviations : System.Web.UI.UserControl
{
private Dictionary<String, String> dictionary = DataHelper.GetAbbreviations();
protected void Page_Load(object sender, EventArgs e)
{
string input = "This is just a little test of the memb. And another memb, but not amemba to see if it gets picked up. Deb of course should also be caught here.deb!";
var regex = "\\b(?:" + String.Join("|", dictionary.Keys.ToArray()) + ")\\b";
MatchEvaluator myEvaluator = new MatchEvaluator(GetExplanationMarkup);
input = Regex.Replace(input, regex, myEvaluator, RegexOptions.IgnoreCase);
litContent.Text = input;
}
private string GetExplanationMarkup(Match m)
{
return string.Format("<b title='{0}'>{1}</b>", dictionary[m.Value.ToLower()], m.Value);
}
}
The output looks like this (below). Note that it only matches full words, and that the casing is preserved from the original string:
This is just a little test of the <b title='Member'>memb</b>. And another <b title='Member'>memb</b>, but not amemba to see if it gets picked up. <b title='Debut'>Deb</b> of course should also be caught here.<b title='Debut'>deb</b>!
I doubt it will perform better than a plain string.Replace, so if performance is critical, measure it (after refactoring a bit to use a compiled regex). You can do the regex version as:
var abbrsWithPipes = "(abbr1|abbr2)";
var regex = new Regex(abbrsWithPipes);
return regex.Replace(html, m => GetReplaceForAbbr(m.Value));
You need to implement GetReplaceForAbbr, which receives the specific abbr being matched.
I'm doing pretty much exactly what you're looking for in my application, and this works for me.
The parameter str is your content:
public static string GetGlossaryString(string str)
{
List<string> glossaryWords = GetGlossaryItems();//this collection would contain your abbreviations; you could just make it a Dictionary so you can have the abbreviation-full term pairs and use them in the loop below
str = string.Format(" {0} ", str);//quick and dirty way to also search the first and last word in the content.
foreach (string word in glossaryWords)
str = Regex.Replace(str, "([\\W])(" + word + ")([\\W])", "$1<span class='glossaryItem'>$2</span>$3", RegexOptions.IgnoreCase);
return str.Trim();
}