NEsper issue with regexp - c#

I have been stuck here for a good while and seem to have narrowed the problem down to incorrect NEsper behaviour with regular expressions. I wrote a simple project to reproduce the issue; it is available on GitHub.
In a nutshell, NEsper allows me to pump messages (events) through a set of rules (SQL-like). If an event matches a rule, NEsper fires an alert. In my application I need to use a regular expression and this doesn't seem to work.
Problem
I tried both ways of creating statements, createPattern and createEPL, and neither fires a match event, even though the .NET Regex class confirms that the regular expression matches the input. If I pass a matching literal value ("127.0.0.5") to the statement instead of the regex ("\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"), the event fires successfully.
INPUT
127.0.0.5
==RULE FAIL==
every (Id123=TestDummy(Value regexp '\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'))
// and I want this to pass
==RULE PASS==
every (Id123=TestDummy(Value regexp '127.0.0.5'))
Question
Could anyone help me out with a sample of NEsper regular expression matching? Or perhaps point to my dumb mistake in the code.
Code
This is my NEsper demo wrapper class
public class NesperAdapter
{
public MatchEventSubscrtiber Subscriber { get; set; }
internal EPServiceProvider Engine { get; private set; }
public NesperAdapter()
{
//This call internally depend on log4net,
//will throw an error if log4net cannot be loaded
EPServiceProviderManager.PurgeDefaultProvider();
//config
var configuration = new Configuration();
configuration.AddEventType("TestDummy", typeof(TestDummy).FullName);
configuration.EngineDefaults.Threading.IsInternalTimerEnabled = false;
configuration.EngineDefaults.Logging.IsEnableExecutionDebug = false;
configuration.EngineDefaults.Logging.IsEnableTimerDebug = false;
//engine
Engine = EPServiceProviderManager.GetDefaultProvider(configuration);
Engine.EPRuntime.SendEvent(new TimerControlEvent(TimerControlEvent.ClockTypeEnum.CLOCK_EXTERNAL));
Engine.Initialize();
Engine.EPRuntime.UnmatchedEvent += OnUnmatchedEvent;
}
public void AddStatementFromRegExp(string regExp)
{
const string pattern = "every (Id123=TestDummy(Value regexp '{0}'))";
string formattedPattern = String.Format(pattern, regExp);
EPStatement statement = Engine.EPAdministrator.CreatePattern(formattedPattern);
//this is subscription
Subscriber = new MatchEventSubscrtiber();
statement.Subscriber = Subscriber;
}
internal void OnUnmatchedEvent(object sender, UnmatchedEventArgs e)
{
Console.WriteLine(@"Unmatched event");
Console.WriteLine(e.Event);
}
public void SendEvent(object someEvent)
{
Engine.EPRuntime.SendEvent(someEvent);
}
}
Then subscriber and a DummyType
public class MatchEventSubscrtiber
{
public bool HasEventFired { get; set; }
public MatchEventSubscrtiber()
{
HasEventFired = false;
}
public void Update(IDictionary<string, object> rows)
{
Console.WriteLine("Match event fired");
Console.WriteLine(rows);
HasEventFired = true;
}
}
public class TestDummy
{
public string Value { get; set; }
}
And the NUnit test. If one comments out the nesper.AddStatementFromRegExp(regexp); line and uncomments the //nesper.AddStatementFromRegExp(input); line, the test passes. However, I need a regular expression there.
//Match any IP address
[TestFixture(@"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", "127.0.0.5")]
public class WhenValidRegexpPassedAndRuleCreatedAndPropagated
{
private NesperAdapter nesper;
//Setup
public WhenValidRegexpPassedAndRuleCreatedAndPropagated(string regexp, string input)
{
//check it is valid regexp in .NET
var r = new Regex(regexp);
var match = r.Match(input);
Assert.IsTrue(match.Success, "Regexp validation failed in .NET");
//create and start engine
nesper = new NesperAdapter();
//Add a rule, this fails with a correct regexp and a matching input
//PROBLEM IS HERE
nesper.AddStatementFromRegExp(regexp);
//PROBLEM IS HERE
//This works, but it is just input self-matching
//nesper.AddStatementFromRegExp(input);
var oneEvent = new TestDummy
{
Value = input
};
nesper.SendEvent(oneEvent);
}
[Test]
public void ThenNesperFiresMatchEvent()
{
//wait till nesper process the event
Thread.Sleep(100);
//Check if subscriber has received the event
Assert.IsTrue(nesper.Subscriber.HasEventFired,
"Event didn't fire");
}
}

I was debugging this issue for some time now and found that NEsper incorrectly handles
WHERE regexp 'foobar' statement
So if I have
SELECT * FROM MyType WHERE PropertyA regexp 'some valid regexp'
NEsper performs string formatting and validation on 'some valid regexp' and strips significant (and valid) symbols from it. This is how I fixed it for myself; I am not sure it is a recommended approach.
File: com.espertech.esper.epl.expression.ExprRegexpNode
Reason: I think it is up to the user how the regexp is constructed; the framework should not decide this.
// Inside this method
public object Evaluate(EventBean[] eventsPerStream, bool isNewData, ExprEvaluatorContext exprEvaluatorContext){...}
// Find two occurrences of
_pattern = new Regex(String.Format("^{0}$", patternText));
// And change to
_pattern = new Regex(patternText);
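To see what the ^{0}$ wrapping changes, here is a minimal sketch in plain .NET, independent of NEsper. The class and method names (AnchoringDemo, MatchesAnchored, MatchesUnanchored) are made up for illustration; only the Regex class is real API. It suggests that the wrapping forces the whole input to match, while the unwrapped pattern also matches inside a longer string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoringDemo
{
    // The IP pattern from the question.
    public const string IpPattern = @"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b";

    // Mimics the wrapping shown above: the whole input must match the pattern.
    public static bool MatchesAnchored(string input, string patternText)
    {
        return new Regex(String.Format("^{0}$", patternText)).IsMatch(input);
    }

    // Mimics the changed line: the pattern may match anywhere in the input.
    public static bool MatchesUnanchored(string input, string patternText)
    {
        return new Regex(patternText).IsMatch(input);
    }
}
```

With this sketch, "127.0.0.5" matches either way, while "host 127.0.0.5" matches only the unanchored version. Whether you want the anchored or the unanchored behaviour is a design choice; the change above simply hands that choice to whoever writes the regexp.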
File: com.espertech.esper.epl.parse.ASTConstantHelper
Reason: requireUnescape for all strings, but skip regexp, as unescaping breaks a valid regexp and removes some valid symbols from it.
// Inside this method
public static Object Parse(ITree node){...}
// Find one occurrence of
case EsperEPL2GrammarParser.STRING_TYPE:
{
return StringValue.ParseString(node.Text);
}
// And change to
case EsperEPL2GrammarParser.STRING_TYPE:
{
bool requireUnescape = true;
if (node.Parent != null)
{
if (!String.IsNullOrEmpty(node.Parent.Text))
{
if (node.Parent.Text == "regexp")
{
requireUnescape = false;
}
}
}
return StringValue.ParseString(node.Text, requireUnescape);
}
File: com.espertech.esper.type.StringValue
Reason: unescape all strings, but the regexp value.
// Inside this method
public static String ParseString(String value){...}
// Change from
public static String ParseString(String value)
{
if ((value.StartsWith("\"")) & (value.EndsWith("\"")) || (value.StartsWith("'")) & (value.EndsWith("'")))
{
if (value.Length > 1)
{
if (value.IndexOf('\\') != -1)
{
return Unescape(value.Substring(1, value.Length - 2));
}
return value.Substring(1, value.Length - 2);
}
}
throw new ArgumentException("String value of '" + value + "' cannot be parsed");
}
// Change to
public static String ParseString(String value, bool requireUnescape = true)
{
if ((value.StartsWith("\"")) & (value.EndsWith("\"")) || (value.StartsWith("'")) & (value.EndsWith("'")))
{
if (value.Length > 1)
{
if (requireUnescape)
{
if (value.IndexOf('\\') != -1)
{
return Unescape(value.Substring(1, value.Length - 2));
}
}
return value.Substring(1, value.Length - 2);
}
}
throw new ArgumentException("String value of '" + value + "' cannot be parsed");
}
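Why unescaping hurts a regexp can be shown outside NEsper. The sketch below uses a hypothetical NaiveUnescape helper that simply drops each backslash and keeps the following character. This is an assumption made for illustration and is not NEsper's actual Unescape implementation, but it shows the kind of damage described above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical stand-in for an unescape step (NOT NEsper's code):
    // drops each backslash and keeps the character that follows it.
    public static string NaiveUnescape(string value)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < value.Length; i++)
        {
            if (value[i] == '\\' && i + 1 < value.Length)
            {
                i++; // skip the backslash, keep the escaped character
            }
            sb.Append(value[i]);
        }
        return sb.ToString();
    }

    // True if the pattern still matches the input after the naive unescape.
    public static bool MatchesAfterUnescape(string input, string pattern)
    {
        return Regex.IsMatch(input, NaiveUnescape(pattern));
    }
}
```

Under that assumption, \b\d{1,3} turns into bd{1,3}, which looks for the literal letters b and d, so an input such as 127.0.0.5 can no longer match.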


Using "IndexOf" on a List<T> of objects

In my Visual Studio 2019 C# console program (named 'codeTester') I have this object:
public class ipData
{
private string ip;
private string region;
private string country;
public ipData(string ip, string region, string country)
{
this.ip = ip;
this.region = region;
this.country = country;
}
public string Ip
{
get { return ip; }
set { ip = value; }
}
public string Region
{
get { return region; }
set { region = value; }
}
public string Country
{
get { return country; }
set { country = value; }
}
}
and I created a List of this object and added some data:
List<ipData> ipInfo = new List<ipData>();
ipInfo.Add(new ipData("192.168.0.199", "UT", "USA"));
ipInfo.Add(new ipData("251.168.0.963", "NB", "CAN"));
Now I want to search the list on one of its fields so I ask the user for the data to search for:
Console.WriteLine("Enter search criteria: ");
string searchparam = Console.ReadLine();
Next I want the index of the found item, if any:
int x = ipInfo.IndexOf(searchparam);
but this statement produces a compile-time error which says:
"Argument 1: cannot convert from 'string' to 'codeTester.Program.ipData'"
So I've been stuck at this point for hours and all my searches have not yielded anything pertinent. Where am I going wrong?
It does not work because IndexOf expects an argument of the same type as the list's elements, in this case ipData.
You could use FindIndex which accepts a lambda expression as parameter:
int x =
ipInfo.FindIndex(ip => ip.Region == searchparam || ip.Country == searchparam);
if (x >= 0) {
Console.WriteLine($"The IP address is {ipInfo[x].Ip}");
} else {
Console.WriteLine("not found");
}
or you can use LINQ like this:
string ipAddress = ipInfo
.FirstOrDefault(ip => ip.Region == searchparam || ip.Country == searchparam)?.Ip;
This will return a null string if no entry was found.
You can also have it return the whole record instead:
ipData data = ipInfo
.FirstOrDefault(ip => ip.Region == searchparam || ip.Country == searchparam);
if (data != null) {
Console.WriteLine(
$"Country = {data.Country}, Region = {data.Region}, IP = {data.Ip}");
}
LINQ also allows you to return more than one result. E.g. you can return all data corresponding to one country like this:
var result = ipInfo.Where(ip => ip.Country == "USA");
foreach (ipData data in result) {
Console.WriteLine(
$"Country = {data.Country}, Region = {data.Region}, IP = {data.Ip}");
}
The C# naming conventions state that class names should be written in PascalCase. Another convention says that acronyms of up to two characters are written in all upper case (IP). According to these conventions, the class name should be IPData.
See also:
Capitalization Conventions (Microsoft Docs)
C# Coding Standards and Naming Conventions
That is because IndexOf searches for the exact object in list. For it to work, you would have to pass ipData to it instead.
For this purpose, you should use Find or FindIndex.
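A minimal sketch of both calls, assuming you search by the IP string itself. The IpEntry class and the IpSearch helper are hypothetical stand-ins for the ipData class in the question; List<T>.FindIndex and List<T>.Find are the real framework methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the ipData class from the question.
public class IpEntry
{
    public string Ip { get; set; }
    public string Region { get; set; }
    public string Country { get; set; }
}

public static class IpSearch
{
    // Returns the index of the first entry with the given IP, or -1 if there is none.
    public static int IndexOfIp(List<IpEntry> entries, string ip)
    {
        return entries.FindIndex(e => e.Ip == ip);
    }

    // Returns the first entry for the given country, or null if there is none.
    public static IpEntry FirstInCountry(List<IpEntry> entries, string country)
    {
        return entries.Find(e => e.Country == country);
    }
}
```

FindIndex reports a miss with -1 and Find with null (for reference types), so check the result before using it.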

Check command line arguments for input value

I am using command line arguments, with if conditions to check the input values, but it does not look good. Can I change it to a switch? I have no idea how to change it. My code is:
if (args.Length == 4)
{
string programName = args[0];
string file1= args[2];
string file2= args[3];
bool flag = false;
int num= 0;
bool isNum = Int32.TryParse(args[1].ToString(), out num);
if (!(programName.Equals("Army")))
{
Console.WriteLine("Error");
}
if (!isNum)
{
Console.WriteLine("value should be a number");
}
if (!File.Exists(file1))
{
Console.WriteLine("file 1 does not exist");
}
if (!File.Exists(file2))
{
Console.WriteLine("file 2 does not exist");
}
}
A switch statement isn't really called for here. That's useful when you have a single value and need to select from a series of possible mutually-exclusive steps based on that value. But that's not what you're doing here. These aren't a chain of if/else if statements keying off a value, these are more like guard clauses. All of them need to run in order to determine all of the output to show to the user.
You can shorten the code by removing the curly braces:
if (!(programName.Equals("Army")))
Console.WriteLine("Error");
if (!Int32.TryParse(args[1], out num))
Console.WriteLine("value should be a number");
if (!File.Exists(file1))
Console.WriteLine("file 1 does not exist");
if (!File.Exists(file2))
Console.WriteLine("file 2 does not exist");
You could also extract these lines of code into their own method, which would make the Main method a little cleaner. You could even extract the conditional checks themselves into very small methods to make it more prose-like for readability. But the conditional structure itself doesn't really need to change.
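As a rough sketch of that extraction, the checks could move into one method that returns every message. The names ArgumentChecks, Validate, IsExpectedProgram and IsNumber, the "expected exactly 4 arguments" message, and the fileExists delegate (there so the file check can be swapped out) are all assumptions made for this example, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class ArgumentChecks
{
    // Runs every guard clause and returns all messages, so the user sees every problem at once.
    public static List<string> Validate(string[] args, Func<string, bool> fileExists)
    {
        var errors = new List<string>();
        if (args.Length != 4)
        {
            errors.Add("expected exactly 4 arguments"); // hypothetical message
            return errors;
        }
        if (!IsExpectedProgram(args[0])) errors.Add("Error");
        if (!IsNumber(args[1])) errors.Add("value should be a number");
        if (!fileExists(args[2])) errors.Add("file 1 does not exist");
        if (!fileExists(args[3])) errors.Add("file 2 does not exist");
        return errors;
    }

    // Small, prose-like condition helpers.
    private static bool IsExpectedProgram(string name)
    {
        return name == "Army";
    }

    private static bool IsNumber(string text)
    {
        int ignored;
        return Int32.TryParse(text, out ignored);
    }
}
```

In Main this could be called as ArgumentChecks.Validate(args, System.IO.File.Exists), printing each returned message.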
You can create a class that is responsible for retrieving and checking your application arguments. For example, if your application is named Zorg, you can create the following class:
public class ZorgConfiguration
{
private string num;
private string programName;
private string file1;
private string file2;
public static ZorgConfiguration InitializeFrom(string[] args)
{
if (args.Length < 4)
throw new ZorgConfigurationException("At least 4 arguments required");
return new ZorgConfiguration {
ProgramName = args[0],
Num = args[1],
File1 = args[2],
File2 = args[3]
};
}
// to be continued
}
As you can see, its responsibility is to hold application settings. It has a static method for creating a configuration instance from the args array. This method checks that the argument count is correct and then initializes each property of the configuration class with the appropriate argument. Checking the argument values is moved into the properties:
public string ProgramName
{
get { return programName; }
private set {
if (value != "Army")
throw new ZorgConfigurationException("Error");
programName = value;
}
}
public string Num
{
get { return num; }
private set {
int i;
if (!Int32.TryParse(value, out i))
throw new ZorgConfigurationException("value should be a number");
num = value;
}
}
public string File1
{
get { return file1; }
private set {
if (!File.Exists(value))
throw new ZorgConfigurationException("file 1 does not exist");
file1 = value;
}
}
Each property is responsible for verifying the corresponding argument value. If the value is invalid, a custom ZorgConfigurationException (simply a class inherited from Exception) is thrown.
Now the main application code looks very clean:
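The exception type itself is not shown in this answer. A minimal sketch of what it might look like follows; the second constructor is an optional addition that is not required by the code above.

```csharp
using System;

// Custom exception used to report invalid command line arguments.
public class ZorgConfigurationException : Exception
{
    public ZorgConfigurationException(string message)
        : base(message)
    {
    }

    // Optional: lets callers keep the original cause when wrapping another exception.
    public ZorgConfigurationException(string message, Exception innerException)
        : base(message, innerException)
    {
    }
}
```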
try
{
var config = ZorgConfiguration.InitializeFrom(args);
// you can use config.File1 etc
}
catch (ZorgConfigurationException e)
{
Console.WriteLine(e.Message);
// exit application
}
I use this class to parse command line arguments. I found it somewhere, but I can't remember where:
public class Arguments
{
// Variables
private StringDictionary Parameters;
// Constructor
public Arguments(string[] Args)
{
Parameters = new StringDictionary();
Regex Spliter = new Regex(@"^-{1,2}|^/|=|:",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
Regex Remover = new Regex(@"^['""]?(.*?)['""]?$",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
string Parameter = null;
string[] Parts;
// Valid parameters forms:
// {-,/,--}param{ ,=,:}((",')value(",'))
// Examples:
// -param1 value1 --param2 /param3:"Test-:-work"
// /param4=happy -param5 '--=nice=--'
foreach (string Txt in Args)
{
// Look for new parameters (-,/ or --) and a
// possible enclosed value (=,:)
Parts = Spliter.Split(Txt, 3);
switch (Parts.Length)
{
// Found a value (for the last parameter
// found (space separator))
case 1:
if (Parameter != null)
{
if (!Parameters.ContainsKey(Parameter))
{
Parts[0] =
Remover.Replace(Parts[0], "$1");
Parameters.Add(Parameter, Parts[0]);
}
Parameter = null;
}
// else Error: no parameter waiting for a value (skipped)
break;
// Found just a parameter
case 2:
// The last parameter is still waiting.
// With no value, set it to true.
if (Parameter != null)
{
if (!Parameters.ContainsKey(Parameter))
Parameters.Add(Parameter, "true");
}
Parameter = Parts[1];
break;
// Parameter with enclosed value
case 3:
// The last parameter is still waiting.
// With no value, set it to true.
if (Parameter != null)
{
if (!Parameters.ContainsKey(Parameter))
Parameters.Add(Parameter, "true");
}
Parameter = Parts[1];
// Remove possible enclosing characters (",')
if (!Parameters.ContainsKey(Parameter))
{
Parts[2] = Remover.Replace(Parts[2], "$1");
Parameters.Add(Parameter, Parts[2]);
}
Parameter = null;
break;
}
}
// In case a parameter is still waiting
if (Parameter != null)
{
if (!Parameters.ContainsKey(Parameter))
Parameters.Add(Parameter, "true");
}
}
// Retrieve a parameter value if it exists
// (overriding C# indexer property)
public string this[string Param]
{
get
{
return (Parameters[Param]);
}
}
}
I use it this way:
var cmdParams = new Arguments(args);
if (cmdParams["File"] != null && cmdParams["File"] == "Filename.txt") { }
Hope it helps!
Command line arguments can get complicated when there are different functions and arguments.
The best way is to tokenize your arguments. Function switches look like /p, /a, -h, -g and so on; your argument parser looks for these tokens (patterns), and once one is found you know which command it is. Use a switch/case or any other mechanism to dispatch on it. Tokenize the remaining data arguments as well. You then have two sets of arguments, which is easy to manage.
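One way to read that suggestion is sketched below: every argument that starts with - or / is treated as a switch, and everything else as a data argument. TokenizedArguments and ArgumentTokenizer are hypothetical names, and the prefix rule is an assumption (on Unix-like systems a leading / is more likely a path).

```csharp
using System.Collections.Generic;

// Holds the two sets of arguments produced by the tokenizer.
public class TokenizedArguments
{
    public List<string> Switches = new List<string>();
    public List<string> Values = new List<string>();
}

public static class ArgumentTokenizer
{
    // Splits args into switches (prefix stripped) and plain data values, preserving order.
    public static TokenizedArguments Tokenize(string[] args)
    {
        var result = new TokenizedArguments();
        foreach (string arg in args)
        {
            bool isSwitch = arg.Length > 1 && (arg[0] == '-' || arg[0] == '/');
            if (isSwitch)
                result.Switches.Add(arg.TrimStart('-', '/'));
            else
                result.Values.Add(arg);
        }
        return result;
    }
}
```

The switch names can then drive a switch statement, while the values are validated separately.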

Regular expression can't handle rogue square brackets

Thanks to the smarties on here in the past I have this amazing recursive regular expression that helps me to transform custom BBCode-style tags in a block of text.
/// <summary>
/// Static class containing common regular expression strings.
/// </summary>
public static class RegularExpressions
{
/// <summary>
/// Expression to find all root-level BBCode tags. Use this expression recursively to obtain nested tags.
/// </summary>
public static string BBCodeTags
{
get
{
return @"
(?>
\[ (?<tag>[^][/=\s]+) \s*
(?: = \s* (?<val>[^][]*) \s*)?
]
)
(?<content>
(?>
\[(?<innertag>[^][/=\s]+)[^][]*]
|
\[/(?<-innertag>\k<innertag>)]
|
[^][]+
)*
(?(innertag)(?!))
)
\[/\k<tag>]
";
}
}
}
This regex works beautifully, recursively matching on all tags. Like this:
[code]
some code
[b]some text [url=http://www.google.com]some link[/url][/b]
[/code]
The regex does exactly what I want and matches the [code] tag. It breaks it up into three groups: tag, optional value, and content. Tag being the tag name ("code" in this case). Optional value being a value after the equals(=) sign if there is one. And content being everything between the opening and closing tag.
The regex can be used recursively to match nested tags. So after matching on [code] I would run it again against the content group and it would match the [b] tag. If I ran it again on the next content group it would then match the [url] tag.
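The recursive use described above could be sketched roughly as follows. BBCodeWalker and CollectTagNames are hypothetical names, and the pattern is repeated inside the sketch only to keep it self-contained; note that it needs RegexOptions.IgnorePatternWhitespace because of the layout whitespace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BBCodeWalker
{
    // Same expression as the BBCodeTags property, copied here for self-containment.
    private const string Pattern = @"
        (?>
          \[ (?<tag>[^][/=\s]+) \s*
          (?: = \s* (?<val>[^][]*) \s*)?
          ]
        )
        (?<content>
          (?>
            \[(?<innertag>[^][/=\s]+)[^][]*]
            |
            \[/(?<-innertag>\k<innertag>)]
            |
            [^][]+
          )*
          (?(innertag)(?!))
        )
        \[/\k<tag>]";

    private static readonly Regex TagRegex =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Returns tag names depth-first: each root-level tag, then the tags nested in its content.
    public static List<string> CollectTagNames(string text)
    {
        var names = new List<string>();
        Collect(text, names);
        return names;
    }

    private static void Collect(string text, List<string> names)
    {
        foreach (Match m in TagRegex.Matches(text))
        {
            names.Add(m.Groups["tag"].Value);
            Collect(m.Groups["content"].Value, names); // recurse into the content group
        }
    }
}
```

For the [code] sample above this should yield code, b and url, in that order.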
All of that is wonderful and delicious but it hiccups on one issue. It can't handle rogue square brackets.
[code]This is a successful match.[/code]
[code]This is an [ unsuccessful match.[/code]
[code]This is also an [unsuccessful] match.[/code]
I really suck at regular expressions but if anyone knows how I might tweak this regex to correctly ignore rogue brackets (brackets that do not make up an opening tag and/or do not have a matching closing tag) so that it still matches the surrounding tags, I would be very appreciative :D
Thanks in advance!
Edit
If you are interested in seeing the method where I use this expression you are welcome to.
I wrote a program that can parse your strings in a debuggable, developer-friendly way. It is not as small as those regexes, but it has an upside: you can debug it and fine-tune it as you need.
The implementation is a recursive descent parser, and if you need some kind of contextual data, you can place it all inside the ParseContext class.
It is quite long, but I consider it better than a regex-based solution.
To test it, create a console application, and replace all the code inside Program.cs with the following code:
using System.Collections.Generic;
namespace q7922337
{
static class Program
{
static void Main(string[] args)
{
var result1 = Match.ParseList<TagsGroup>("[code]This is a successful match.[/code]");
var result2 = Match.ParseList<TagsGroup>("[code]This is an [ unsuccessful match.[/code]");
var result3 = Match.ParseList<TagsGroup>("[code]This is also an [unsuccessful] match.[/code]");
var result4 = Match.ParseList<TagsGroup>(@"
[code]
some code
[b]some text [url=http://www.google.com]some link[/url][/b]
[/code]");
}
class ParseContext
{
public string Source { get; set; }
public int Position { get; set; }
}
abstract class Match
{
public override string ToString()
{
return this.Text;
}
public string Source { get; set; }
public int Start { get; set; }
public int Length { get; set; }
public string Text { get { return this.Source.Substring(this.Start, this.Length); } }
protected abstract bool ParseInternal(ParseContext context);
public bool Parse(ParseContext context)
{
var result = this.ParseInternal(context);
this.Length = context.Position - this.Start;
return result;
}
public bool MarkBeginAndParse(ParseContext context)
{
this.Start = context.Position;
var result = this.ParseInternal(context);
this.Length = context.Position - this.Start;
return result;
}
public static List<T> ParseList<T>(string source)
where T : Match, new()
{
var context = new ParseContext
{
Position = 0,
Source = source
};
var result = new List<T>();
while (true)
{
var item = new T { Source = source, Start = context.Position };
if (!item.Parse(context))
break;
result.Add(item);
}
return result;
}
public static T ParseSingle<T>(string source)
where T : Match, new()
{
var context = new ParseContext
{
Position = 0,
Source = source
};
var result = new T { Source = source, Start = context.Position };
if (result.Parse(context))
return result;
return null;
}
protected List<T> ReadList<T>(ParseContext context)
where T : Match, new()
{
var result = new List<T>();
while (true)
{
var item = new T { Source = this.Source, Start = context.Position };
if (!item.Parse(context))
break;
result.Add(item);
}
return result;
}
protected T ReadSingle<T>(ParseContext context)
where T : Match, new()
{
var result = new T { Source = this.Source, Start = context.Position };
if (result.Parse(context))
return result;
return null;
}
protected int ReadSpaces(ParseContext context)
{
int startPos = context.Position;
int cnt = 0;
while (true)
{
if (startPos + cnt >= context.Source.Length)
break;
if (!char.IsWhiteSpace(context.Source[context.Position + cnt]))
break;
cnt++;
}
context.Position = startPos + cnt;
return cnt;
}
protected bool ReadChar(ParseContext context, char p)
{
int startPos = context.Position;
if (startPos >= context.Source.Length)
return false;
if (context.Source[startPos] == p)
{
context.Position = startPos + 1;
return true;
}
return false;
}
}
class Tag : Match
{
protected override bool ParseInternal(ParseContext context)
{
int startPos = context.Position;
if (!this.ReadChar(context, '['))
return false;
this.ReadSpaces(context);
if (this.ReadChar(context, '/'))
this.IsEndTag = true;
this.ReadSpaces(context);
var validName = this.ReadValidName(context);
if (validName != null)
this.Name = validName;
this.ReadSpaces(context);
if (this.ReadChar(context, ']'))
return true;
context.Position = startPos;
return false;
}
protected string ReadValidName(ParseContext context)
{
int startPos = context.Position;
int endPos = startPos;
while (endPos < context.Source.Length && char.IsLetter(context.Source[endPos]))
endPos++;
if (endPos == startPos) return null;
context.Position = endPos;
return context.Source.Substring(startPos, endPos - startPos);
}
public bool IsEndTag { get; set; }
public string Name { get; set; }
}
class TagsGroup : Match
{
public TagsGroup()
{
}
protected TagsGroup(Tag openTag)
{
this.Start = openTag.Start;
this.Source = openTag.Source;
this.OpenTag = openTag;
}
protected override bool ParseInternal(ParseContext context)
{
var startPos = context.Position;
if (this.OpenTag == null)
{
this.ReadSpaces(context);
this.OpenTag = this.ReadSingle<Tag>(context);
}
if (this.OpenTag != null)
{
int textStart = context.Position;
int textLength = 0;
while (true)
{
Tag tag = new Tag { Source = this.Source, Start = context.Position };
while (!tag.MarkBeginAndParse(context))
{
if (context.Position >= context.Source.Length)
{
context.Position = startPos;
return false;
}
context.Position++;
textLength++;
}
if (!tag.IsEndTag)
{
var tagGrpStart = context.Position;
var tagGrup = new TagsGroup(tag);
if (tagGrup.Parse(context))
{
if (textLength > 0)
{
if (this.Contents == null) this.Contents = new List<Match>();
this.Contents.Add(new Text { Source = this.Source, Start = textStart, Length = textLength });
textStart = context.Position;
textLength = 0;
}
if (this.Contents == null) this.Contents = new List<Match>();
this.Contents.Add(tagGrup);
}
else
{
textLength += tag.Length;
}
}
else
{
if (tag.Name == this.OpenTag.Name)
{
if (textLength > 0)
{
if (this.Contents == null) this.Contents = new List<Match>();
this.Contents.Add(new Text { Source = this.Source, Start = textStart, Length = textLength });
textStart = context.Position;
textLength = 0;
}
this.CloseTag = tag;
return true;
}
else
{
textLength += tag.Length;
}
}
}
}
context.Position = startPos;
return false;
}
public Tag OpenTag { get; set; }
public Tag CloseTag { get; set; }
public List<Match> Contents { get; set; }
}
class Text : Match
{
protected override bool ParseInternal(ParseContext context)
{
return true;
}
}
}
}
If you use this code, and someday find that you need optimizations because the parser has become ambiguous, then try using a dictionary in the ParseContext, take a look here for more info: http://en.wikipedia.org/wiki/Top-down_parsing in the topic Time and space complexity of top-down parsing. I find it very interesting.
The first change is pretty simple: change [^][]+, which is responsible for matching the free text, to a single dot (.). This seems a little crazy, perhaps, but it's actually safe, because you are using an atomic group (?> ), so all the valid tags will be matched by the first alternation - \[(?<innertag>[^][/=\s]+)[^][]*] - and the engine cannot backtrack into them and break the tags.
(Remember to enable the Singleline flag, so . matches newlines)
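A quick way to check the effect of the flag, as a small sketch (SinglelineDemo and DotSpansLines are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Tests whether "a", any single character, then "b" matches the whole input.
    public static bool DotSpansLines(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, "^a.b$", options);
    }
}
```

For the input "a\nb", the call with RegexOptions.None fails because . does not match the newline, while the call with RegexOptions.Singleline succeeds.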
The second requirement, [unsuccessful], seems to go against your goal. The whole idea from the very start is not to match these unclosed tags. If you allow unclosed tags, all matches of the form \[(.*?)\].*?\[/\1\] become valid. Not good. At best, you can try to whitelist a few tags that should never be matched as opening tags.
An example of both changes:
(?>
\[ (?<tag>[^][/=\s]+) \s*
(?: = \s* (?<val>[^][]*) \s*)?
\]
)
(?<content>
(?>
\[(?:unsuccessful)\] # self closing
|
\[(?<innertag>[^][/=\s]+)[^][]*]
|
\[/(?<-innertag>\k<innertag>)]
|
.
)*
(?(innertag)(?!))
)
\[/\k<tag>\]
Working example on Regex Hero
Ok. Here's another attempt. This one is a little more complicated.
The idea is to match the whole text from start to end, and parse it into a single Match. While rarely used as such, .NET balancing groups allow you to fine-tune your captures, remembering all positions and captures exactly the way you want them.
The pattern I came up with is:
\A
(?<StartContentPosition>)
(?:
# Open tag
(?<Content-StartContentPosition>) # capture the content between tags
(?<StartTagPosition>) # Keep the starting position of the tag
(?>\[(?<TagName>[^][/=\s]+)[^\]\[]*\]) # opening tag
(?<StartContentPosition>) # start another content capture
|
# Close tag
(?<Content-StartContentPosition>) # capture the content in the tag
\[/\k<TagName>\](?<Tag-StartTagPosition>) # closing tag, keep the content in the <tag> group
(?<-TagName>)
(?<StartContentPosition>) # start another content capture
|
. # just match anything. The tags are first, so it should match
# a few if it can. (?(TagName)(?!)) keeps this in line, so
# unmatched tags will not mess with the result
)*
(?<Content-StartContentPosition>) # capture the content after the last tag
\Z
(?(TagName)(?!))
Remember - the balancing group (?<A-B>) captures into A all text since B was last captured (and pops that position from B's stack).
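A tiny sketch of that behaviour, separate from the BBCode pattern (BalancingDemo and InnerText are hypothetical names): the open group records where "(" was seen, and (?<inner-open>\)) stores everything between that point and the closing ")" in inner.

```csharp
using System.Text.RegularExpressions;

public static class BalancingDemo
{
    // Returns the text between the first "(" and its following ")", or null if there is no match.
    public static string InnerText(string input)
    {
        Match m = Regex.Match(input, @"(?<open>\()[^()]*(?<inner-open>\))");
        return m.Success ? m.Groups["inner"].Value : null;
    }
}
```

For the input "f(abc)" the inner group should hold "abc"; the pattern above applies the same mechanism with empty position-marker groups instead of a literal bracket.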
Now you can parse the string using:
Match match = Regex.Match(sample, pattern, RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);
Your interesting data will be in match.Groups["Tag"].Captures, which contains all tags (some of them are contained in others), and match.Groups["Content"].Captures, which contains the tags' contents and the contents between tags. For example, without all blanks, it contains:
some code
some text
This is also an successful match.
This is also an [ unsuccessful match.
This is also an [unsuccessful] match.
This is pretty close to a full parsed document, but you'll still have to play with indices and length to figure out the exact order and structure of the document (though it isn't more complex than sorting all captures)
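Sorting the captures could look roughly like this. CaptureOrdering and InDocumentOrder are hypothetical names, and the sketch sorts by Index only; for the pattern above you would probably also need a tie-breaker (for example Length) when a tag and its content start at the same position.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureOrdering
{
    // Merges the captures of the named groups and returns them as "group:value", ordered by position.
    public static List<string> InDocumentOrder(Match match, params string[] groupNames)
    {
        return groupNames
            .SelectMany(name => match.Groups[name].Captures.Cast<Capture>()
                .Select(c => new { Name = name, Capture = c }))
            .OrderBy(x => x.Capture.Index)
            .Select(x => x.Name + ":" + x.Capture.Value)
            .ToList();
    }
}
```

It would be called as CaptureOrdering.InDocumentOrder(match, "Tag", "Content").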
At this point I'll state what others have said - it may be a good time to write a parser, this pattern isn't pretty...

String Replacements for Word Merge

Using ASP.NET 4.
We do a lot of Word merges at work. Rather than using Word's complicated conditional statements, I want to embed my own syntax, something like:
Dear Mr. { select lastname from users where userid = 7 },
Your invoice for this quarter is: ${ select amount from invoices where userid = 7 }.
......
Ideally, I'd like this to be turned into:
string.Format("Dear Mr. {0}, Your invoice for this quarter is: ${1}", sqlEval[0], sqlEval[1]);
Any ideas?
Well, I don't really recommend rolling your own solution for this, however I will answer the question as asked.
First, you need to process the text and extract the SQL statements. For that you'll need a simple parser:
/// <summary>Parses the input string and extracts a unique list of all placeholders.</summary>
/// <remarks>
/// This method does not handle escaping of delimiters
/// </remarks>
public static IList<string> Parse(string input)
{
const char placeholderDelimStart = '{';
const char placeholderDelimEnd = '}';
var characters = input.ToCharArray();
var placeHolders = new List<string>();
string currentPlaceHolder = string.Empty;
bool inPlaceHolder = false;
for (int i = 0; i < characters.Length; i++)
{
var currentChar = characters[i];
// Start of a placeholder
if (!inPlaceHolder && currentChar == placeholderDelimStart)
{
currentPlaceHolder = string.Empty;
inPlaceHolder = true;
continue;
}
// Start of a placeholder when we already have one
if (inPlaceHolder && currentChar == placeholderDelimStart)
throw new InvalidOperationException("Unexpected character detected at position " + i);
// We found the end marker while in a placeholder - we're done with this placeholder
if (inPlaceHolder && currentChar == placeholderDelimEnd)
{
if (!placeHolders.Contains(currentPlaceHolder))
placeHolders.Add(currentPlaceHolder);
inPlaceHolder = false;
continue;
}
// End of a placeholder with no matching start
if (!inPlaceHolder && currentChar == placeholderDelimEnd)
throw new InvalidOperationException("Unexpected character detected at position " + i);
if (inPlaceHolder)
currentPlaceHolder += currentChar;
}
return placeHolders;
}
Okay, so that will get you a list of SQL statements extracted from the input text. You'll probably want to tweak it to use properly typed parser exceptions and some input guards (which I elided for clarity).
Now you just need to replace those placeholders with the results of the evaluated SQL:
// Sample input
var input = "Hello Mr. {select firstname from users where userid=7}";
string output = input;
var extractedStatements = Parse(input);
foreach (var statement in extractedStatements)
{
// Execute the SQL statement
var result = Evaluate(statement);
// Update the output with the result of the SQL statement
output = output.Replace("{" + statement + "}", result);
}
This is obviously not the most efficient way to do this, but I think it sufficiently demonstrates the concept without muddying the waters.
You'll need to define the Evaluate(string) method. This will handle executing the SQL.
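As a compact variation on the same idea, the sketch below swaps the hand-written parser for Regex.Replace and passes the evaluator in as a delegate, so it can be tried with a fake before any database code exists. MergeSketch and Merge are hypothetical names; a real evaluate delegate would execute the SQL and should only ever be given trusted template text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MergeSketch
{
    // Replaces every {...} placeholder with the result of evaluate(placeholder text).
    // Like the parser above, this does not handle escaped or nested braces.
    public static string Merge(string input, Func<string, string> evaluate)
    {
        return Regex.Replace(
            input,
            @"\{([^{}]*)\}",
            m => evaluate(m.Groups[1].Value.Trim()));
    }
}
```

Using a match evaluator rather than a replacement string means characters such as $ in the evaluated result are inserted literally.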
I just finished building a proprietary solution like this for a law firm here.
I evaluated a product called Windward Reports. It's a tad pricey, especially if you need a lot of copies, but for one user it's not bad.
It can pull from XML or SQL data sources (or more, if I remember correctly).
Might be worth a look (and no I don't work for 'em, just evaluated their stuff)
You might want to check out the RazorEngine project on CodePlex:
http://razorengine.codeplex.com/
Using SQL etc within your template looks like a bad idea. I'd suggest you make a ViewModel for each template.
The Razor thing is really easy to use. Just add a reference, import the namespace, and call the Parse method like so:
(VB guy so excuse syntax!)
MyViewModel myModel = new MyViewModel("Bob",150.00); //set properties
string myTemplate = "Dear Mr. @Model.FirstName, Your invoice for this quarter is: @Model.InvoiceAmount";
string myOutput = Razor.Parse(myTemplate, myModel);
Your string can come from anywhere - I use this with my templates stored in a database, you could equally load it from files or whatever. It's very powerful as a view engine, you can do conditional stuff, loops, etc etc.
I ended up rolling my own solution, but thanks. I really dislike the if statements and will need to refactor them out. Here it is:
var mailingMergeString = new MailingMergeString(input);
var output = mailingMergeString.ParseMailingMergeString();
public class MailingMergeString
{
private string _input;
public MailingMergeString(string input)
{
_input = input;
}
public string ParseMailingMergeString()
{
IList<SqlReplaceCommand> sqlCommands = new List<SqlReplaceCommand>();
var i = 0;
const string openBrace = "{";
const string closeBrace = "}";
while (string.IsNullOrWhiteSpace(_input) == false)
{
var sqlReplaceCommand = new SqlReplaceCommand();
var open = _input.IndexOf(openBrace) + 1;
var close = _input.IndexOf(closeBrace);
var length = close != -1 ? close - open : _input.Length;
var newInput = _input.Substring(close + 1);
var nextClose = newInput.Contains(openBrace) ? newInput.IndexOf(openBrace) : newInput.Length;
if (i == 0 && open > 0)
{
sqlReplaceCommand.Text = _input.Substring(0, open - 1);
_input = _input.Substring(open - 1);
}
else
{
sqlReplaceCommand.Command = _input.Substring(open, length);
sqlReplaceCommand.PlaceHolder = openBrace + i + closeBrace;
sqlReplaceCommand.Text = _input.Substring(close + 1, nextClose);
sqlReplaceCommand.NewInput = _input.Substring(close + 1);
_input = newInput.Contains(openBrace) ? sqlReplaceCommand.NewInput : string.Empty;
}
sqlCommands.Add(sqlReplaceCommand);
i++;
}
return sqlCommands.GetParsedString();
}
internal class SqlReplaceCommand
{
public string Command { get; set; }
public string SqlResult { get; set; }
public string PlaceHolder { get; set; }
public string Text { get; set; }
protected internal string NewInput { get; set; }
}
}
internal static class SqlReplaceExtensions
{
public static string GetParsedString(this IEnumerable<MailingMergeString.SqlReplaceCommand> sqlCommands)
{
return sqlCommands.Aggregate("", (current, replaceCommand) => current + (replaceCommand.PlaceHolder + replaceCommand.Text));
}
}

Best practice for parsing and validating mobile number

I wonder what the best practice is for parsing and validating a mobile number before sending a text. I've got code that works, but I'd like to find better ways of doing it (as with my last question, this is part of my early New Year's resolution to write better-quality code!).
At the moment we are very forgiving when the user enters the number on the form: they can enter things like "+44 123 4567890", "00441234567890", "0123456789", "+44(0)123456789", "012-345-6789" or even "haven't got a phone".
However, to send the text the format must be 44xxxxxxxxxx (this is for UK mobiles only), so we need to parse and validate it before we can send. Below is the code that I have for now (C#, ASP.NET). It would be great if anyone had any ideas on how to improve it.
Thanks,
Annelie
private bool IsMobileNumberValid(string mobileNumber)
{
// parse the number
_mobileNumber = ParsedMobileNumber(mobileNumber);
// check if it's the right length
if (_mobileNumber.Length != 12)
{
return false;
}
// check if it contains non-numeric characters
if(!Regex.IsMatch(_mobileNumber, @"^[-+]?[0-9]*\.?[0-9]+$"))
{
return false;
}
return true;
}
private string ParsedMobileNumber(string number)
{
number = number.Replace("+", "");
number = number.Replace(".", "");
number = number.Replace(" ", "");
number = number.Replace("-", "");
number = number.Replace("/", "");
number = number.Replace("(", "");
number = number.Replace(")", "");
number = number.Trim(new char[] { '0' });
if (!number.StartsWith("44"))
{
number = "44" + number;
}
return number;
}
EDIT
Here's what I ended up with:
private bool IsMobileNumberValid(string mobileNumber)
{
// remove all non-numeric characters
_mobileNumber = CleanNumber(mobileNumber);
// trim any leading zeros
_mobileNumber = _mobileNumber.TrimStart(new char[] { '0' });
// check for this in case they've entered 44 (0)xxxxxxxxx or similar
if (_mobileNumber.StartsWith("440"))
{
_mobileNumber = _mobileNumber.Remove(2, 1);
}
// add country code if they haven't entered it
if (!_mobileNumber.StartsWith("44"))
{
_mobileNumber = "44" + _mobileNumber;
}
// check if it's the right length
if (_mobileNumber.Length != 12)
{
return false;
}
return true;
}
private string CleanNumber(string phone)
{
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(phone, "");
}
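The same steps can also be written as a pure function that returns the normalized number instead of storing it in a field, which makes the rules easy to try against sample inputs. This is a minimal sketch, assuming the 12-digit 44xxxxxxxxxx target format described above. The UkMobileNumber class and TryNormalize method names are made up for illustration, and the length check is the only validation performed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UkMobileNumber
{
    // Returns true and the 44xxxxxxxxxx form when the input normalizes to 12 digits.
    // On failure, 'normalized' is set to an empty string.
    public static bool TryNormalize(string input, out string normalized)
    {
        // Remove every character that is not 0-9, then drop leading zeros.
        string digits = Regex.Replace(input ?? string.Empty, @"[^0-9]", "").TrimStart('0');

        // Covers "44 (0)xxxxxxxxxx": drop the zero that follows the country code.
        if (digits.StartsWith("440", StringComparison.Ordinal))
        {
            digits = digits.Remove(2, 1);
        }

        // Add the country code if it was not entered.
        if (!digits.StartsWith("44", StringComparison.Ordinal))
        {
            digits = "44" + digits;
        }

        bool isValid = digits.Length == 12;
        normalized = isValid ? digits : string.Empty;
        return isValid;
    }
}
```

With this sketch, "+44 (0)7123 456789" and "07123456789" both normalize to "447123456789", while "haven't got a phone" is rejected because only the added "44" remains after cleaning.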
Use a regular expression to remove any non-numeric characters instead of trying to guess how a person will enter their number - this will remove all your Replace() and Trim() methods, unless you really need to trim a leading zero.
string CleanPhone(string phone)
{
Regex digitsOnly = new Regex(@"[^\d]");
return digitsOnly.Replace(phone, "");
}
Alternatively, I would recommend you use a masked textbox to collect the # (there are many options available) to allow only numeric input, and display the input with whatever format you'd like. This way you're guaranteeing that the value received will be all numeric characters.
Check out QAS, it's a commercial solution.
They have email, phone and address validations.
http://www.qas.com/phone-number-validation-web-service.htm
We use their services for Address and Email (not phone) and have been satisfied with it.
@annelie, maybe you can update your regular expression to a more powerful one. Check out this site here. It contains many expressions, but I think one of the top two on the site should suit you.
public class PhoneNumber
{
public PhoneNumber(string value)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentNullException("value", Properties.Resources.PhoneNumberIsNullOrEmpty);
var match = new Regex(@"\+(\w+) \((\w+)\) (\w+)", RegexOptions.Compiled).Match(value);
if (match.Success)
{
ushort countryCode = 0;
ushort localCode = 0;
int number = 0;
if (UInt16.TryParse(match.Result("$1"), out countryCode) &&
UInt16.TryParse(match.Result("$2"), out localCode) &&
Int32.TryParse(match.Result("$3"), out number))
{
this.CountryCode = countryCode;
this.LocalCode = localCode;
this.Number = number;
}
}
else
{
// The value is present but malformed, so ArgumentException fits better than ArgumentNullException
throw new ArgumentException(Properties.Resources.PhoneNumberInvalid, "value");
}
}
public PhoneNumber(int countryCode, int localCode, int number)
{
if (countryCode == 0)
throw new ArgumentOutOfRangeException("countryCode", Properties.Resources.PhoneNumberIsNullOrEmpty);
else if (localCode == 0)
throw new ArgumentOutOfRangeException("localCode", Properties.Resources.PhoneNumberIsNullOrEmpty);
else if (number == 0)
throw new ArgumentOutOfRangeException("number", Properties.Resources.PhoneNumberIsNullOrEmpty);
this.CountryCode = countryCode;
this.LocalCode = localCode;
this.Number = number;
}
public int CountryCode { get; set; }
public int LocalCode { get; set; }
public int Number { get; set; }
public override string ToString()
{
return String.Format(System.Globalization.CultureInfo.CurrentCulture, "+{0} ({1}) {2}", CountryCode, LocalCode, Number);
}
public static bool Validate(string value)
{
return new Regex(@"\+\w+ \(\w+\) \w+", RegexOptions.Compiled).IsMatch(value);
}
public static bool Validate(string countryCode, string localCode, string number, out PhoneNumber phoneNumber)
{
var valid = false;
phoneNumber = null;
try
{
ushort uCountryCode = 0;
ushort uLocalCode = 0;
int iNumber = 0;
// match only if all three numbers have been parsed successfully
valid = UInt16.TryParse(countryCode, out uCountryCode) &&
UInt16.TryParse(localCode, out uLocalCode) &&
Int32.TryParse(number, out iNumber);
if (valid)
phoneNumber = new PhoneNumber(uCountryCode, uLocalCode, iNumber);
}
catch (ArgumentException)
{
// still not match
}
return valid;
}
}
