C#: Regular Expression

I have a set of row data as follows:
List<String> l_lstRowData = new List<string> { "Data 1 32:01805043*0FFFFFFF",
"Data 3, 20.0e-3",
"Data 2, 1.0e-3 172:?:CRC" ,
"Data 6"
};
and two lists, "KeyList" and "ValueList":
List<string> KeyList = new List<string>();
List<string> ValueList = new List<string>();
I need to fill the two List<String> objects from the data in l_lstRowData using pattern matching.
Here is my pattern for this:
String l_strPattern = @"(?<KEY>(Data|data|DATA)\s[0-9]*[,]?[ ]?[0-9e.-]*)[ \t\r\n]*(?<Value>[0-9A-Za-z:?*!. \t\r\n\-]*)";
Regex CompiledPattern=new Regex(l_strPattern,RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
In the end, the two lists should contain:
KeyList
{ "Data 1" }
{ "Data 3, 20.0e-3" }
{ "Data 2, 1.0e-3" }
{ "Data 6" }
ValueList
{ "32:01805043*0FFFFFFF" }
{ "" }
{ "172:?:CRC" }
{ "" }
Scenario:
The group KEY in the pattern should match the word "Data" followed by an integer value and, if a comma (,) follows, the next string as well, i.e. a double value.
The group Value in the pattern should match the string after the whitespace. In the first string it should match 32:01805043*0FFFFFFF, but in the third 172:?:CRC.
Here is my sample code:
for (int i = 0; i < l_lstRowData.Count; i++)
{
MatchCollection M = CompiledPattern.Matches(l_lstRowData[i], 0);
KeyList.Add(M[0].Groups["KEY"].Value);
ValueList.Add(M[0].Groups["Value"].Value);
}
But my pattern does not work in this situation.
EDIT
My code produces this result:
KeyList
{ "Data 1 32" } // 32 should be in the next list
{ "Data 3, 20.0e-3" }
{ "Data 2, 1.0e-3" }
{ "Data 6" }
ValueList
{ ":01805043*0FFFFFFF" }
{ "" }
{ "172:?:CRC" }
{ "" }
Please help me to rewrite my Pattern.

Your code works for me, so please define what's not working.
Also:
start your regexp with ^ and end it with $
use regex.Match() instead of Matches() because you know it'll only match once
I don't see why you use IgnorePatternWhitespace
use a simple comma instead of [,], a simple space instead of [ ]
use \s instead of [ \t\r\n]
if you specify IgnoreCase, then no need for Data|data|DATA or [A-Za-z]
And if you clean this up, maybe you can solve it alone :)

I think a simpler regex would work: (?<key>data \d(?:, [\d.e-]+)?)(?<value>.*) will match your keys and values, providing you use the RegexOptions.IgnoreCase flag too.
You can see the results at this Rubular example link.
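As a rough sketch of how that simpler pattern could be applied to the rows from the question: the class name RowDataParser and the small tweaks (\d+ instead of \d, \s* before the value, and the ^ and $ anchors) are assumptions of this example, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RowDataParser
{
    // key:   "data", an integer, and optionally ", <number>"
    // value: whatever follows the separating whitespace (may be empty)
    private static readonly Regex RowPattern = new Regex(
        @"^(?<key>data\s+\d+(?:,\s*[\d.e+-]+)?)\s*(?<value>.*)$",
        RegexOptions.IgnoreCase);

    public static void Parse(IEnumerable<string> rows, List<string> keys, List<string> values)
    {
        foreach (string row in rows)
        {
            Match m = RowPattern.Match(row);
            if (!m.Success)
                continue; // skip rows that do not look like "Data N ..."

            keys.Add(m.Groups["key"].Value);
            values.Add(m.Groups["value"].Value.Trim());
        }
    }
}
```

Because the comma and the number are grouped together as one optional unit, "32" in the first row can no longer be absorbed into the key.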

Related

How to Find List Item that is Present in a String

I need to find out whether a string contains one of the exact words present in a list.
Eg:
List<string> KeyWords = new List<string>(){"Test","Re Test","ACK"};
String s1 = "Please give the Test";
String s2 = "Please give Re Test";
String s3 = "Acknowledge my work";
Now,
When I use KeyWords.Where(x=>x.Contains(s1)) it gives me a match, which is correct. But for s3 it should not match.
Is there any workaround for this?
Split the string on spaces and compare the resulting words against the keywords.
I hope that works.
How about using regular expressions?
public static class Program
{
public static void Main(string[] args)
{
var keywords = new List<string>() { "Test", "Re Test", "ACK" };
var targets = new[] {
"Please give the Test",
"Please give Re Test",
"Acknowledge my work"
};
foreach (var target in targets)
{
Console.WriteLine($"{target}: {AnyMatches(target, keywords)}");
}
Console.ReadKey();
}
private static bool AnyMatches(string target, IEnumerable<string> keywords)
{
foreach (var keyword in keywords)
{
var regex = new Regex($"\\b{Regex.Escape(keyword)}\\b", RegexOptions.IgnoreCase);
if (regex.IsMatch(target))
return true;
}
return false;
}
}
Creating the regular expression on the fly every time is probably not the best option in production, so consider building a list of Regex objects from your keywords once instead of storing only the keywords in a plain string list.
A slightly different solution:
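A minimal sketch of that idea, assuming a hypothetical KeywordMatcher class (the name and shape are this example's own) that builds one Regex per keyword in its constructor and reuses them for every lookup:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper, for illustration only.
public sealed class KeywordMatcher
{
    private readonly List<Regex> _patterns;

    public KeywordMatcher(IEnumerable<string> keywords)
    {
        // Build each whole-word pattern once, up front.
        _patterns = keywords
            .Select(k => new Regex(@"\b" + Regex.Escape(k) + @"\b", RegexOptions.IgnoreCase))
            .ToList();
    }

    // True if any keyword occurs in the target as a whole word (or phrase).
    public bool AnyMatches(string target)
    {
        return _patterns.Any(p => p.IsMatch(target));
    }
}
```

With the keywords from the question, new KeywordMatcher(new[] { "Test", "Re Test", "ACK" }) would be created once and its AnyMatches method called for each sentence.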
void Main()
{
var KeyWords = new List<string>(){ "Test","Re Test","ACK" };
var array = new string[] {
"Please give the Test",
"Please give Re Test",
"Acknowledge my work"
};
foreach(var c in array)
{
Contains(c,KeyWords); // Your result.
}
}
private bool Contains(string sentence, List<string> keywords) {
var result = keywords.Select(keyWord=>{
var parts3 = Regex.Split(sentence, keyWord, RegexOptions.IgnoreCase).Where(x=>!string.IsNullOrWhiteSpace(x)).First().Split((char[])null); // Split by the keywords and get the rest of the words splitted by empty space
var splitted = sentence.Split((char[])null); // split the original string.
return parts3.Where(t=>!string.IsNullOrWhiteSpace(t)).All(x=>splitted.Any(t=>t.Trim().Equals(x.Trim(),StringComparison.InvariantCultureIgnoreCase)));
}); // Check if all remaining words from parts3 are inside the existing splitted string, thus verifying if full words.
return result.All(x=>x);// if everything matches then it was a match on full word.
}
The idea is to split the sentence by the keyword you are looking for (e.g. split by ACK) and then check whether the remaining words also appear as whole words in the original string. If they do, the keyword matched a whole word and the result is true. If the split cut a word apart, meaning the keyword was only a substring of it, the leftover fragments won't match any whole word and the result is false.
Your usage of Contains is backwards:
var foundKW = KeyWords.Where(kw => s1.Contains(kw)).ToList();
How about using a regex?
In a pattern such as \bthe\b, \b represents a word boundary.
List<string> KeyWords = new List<string>(){"Test","Re Test","ACK"};
String s1 = "Please give the Test";
String s2 = "Please give Re Test";
String s3 = "Acknowledge my work";
bool result = false;
foreach(string str in KeyWords)
{
// Use verbatim strings on both sides: in a regular string literal "\b" is a backspace, not a word boundary.
result = Regex.IsMatch(s1, @"\b" + Regex.Escape(str) + @"\b");
if(result)
break;
}

Check U.K Postcode for Validation [duplicate]

This question already has answers here:
RegEx for matching UK Postcodes
(33 answers)
Closed 3 years ago.
I need to check the U.K postcode against a list.
The U.K postcode is of a standard format but the list only contains the outward section that I need to check against.
The list contains a series of outward postcode with also some data relating to this outward postcode, so for example
AL St Albans
B Birmingham
BT Belfast
TR Taunton
TR21 Taunton X
TR22 Taunton Y
My aim is that when I get a postcode, for example B20 7TP, I can search and find Birmingham.
Any ideas??
The question is different to the ones referred to as possible answers, but in my case I need to check a full postcode against just the outward postcode.
If you have the whole postcode and only want to use the outcode, remove the last three characters and use what remains. All postcodes end with the pattern digit-alpha-alpha, so removing those characters will give the outcode; any string that does not fit that pattern or that does not give a valid outcode after removing that substring is not a valid postcode. (Source)
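A small sketch of this approach, under a few assumptions that are this example's own: the helper name OutcodeLookup, a dictionary keyed by upper-case area or outcode prefixes like the list in the question, and a "longest prefix wins" lookup so that TR21 is preferred over TR.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class OutcodeLookup
{
    // The incode: optional whitespace, then digit-alpha-alpha at the very end.
    private static readonly Regex Incode = new Regex(@"\s*\d[A-Za-z]{2}$");

    // Returns the outcode in upper case, or null if the input does not end in an incode.
    public static string GetOutcode(string postcode)
    {
        string trimmed = postcode.Trim();
        Match m = Incode.Match(trimmed);
        return m.Success ? trimmed.Substring(0, m.Index).ToUpperInvariant() : null;
    }

    // Tries the full outcode first, then ever shorter prefixes of it.
    public static string FindRegion(string postcode, IDictionary<string, string> regions)
    {
        string outcode = GetOutcode(postcode);
        if (outcode == null)
            return null;

        for (int len = outcode.Length; len > 0; len--)
        {
            string name;
            if (regions.TryGetValue(outcode.Substring(0, len), out name))
                return name;
        }
        return null;
    }
}
```

This only checks the shape of the incode; it does not validate that the remaining outcode is a real one beyond finding it (or a prefix of it) in the supplied list.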
If you're willing to take on an external (and Internet-based) dependency, you could look at using something like https://postcodes.io, in particular the outcodes section of that API. I have no affiliation with postcodes.io; I just found it after a Google.
Per the documentation, /outcodes will return
the outcode
the eastings
the northings
the administrative counties under the code
the district/unitary authorities under the code
the administrative/electoral areas under the code
the WGS84 longitude
the WGS84 latitude
the countries included in the code
the parish/communities in the code
For reference, a call to /outcodes/TA1 returns:
{
"status": 200,
"result": {
"outcode": "TA1",
"longitude": -3.10297767924529,
"latitude": 51.0133987332761,
"northings": 124359,
"eastings": 322721,
"admin_district": [
"Taunton Deane"
],
"parish": [
"Taunton Deane, unparished area",
"Bishop's Hull",
"West Monkton",
"Trull",
"Comeytrowe"
],
"admin_county": [
"Somerset"
],
"admin_ward": [
"Taunton Halcon",
"Bishop's Hull",
"Taunton Lyngford",
"Taunton Eastgate",
"West Monkton",
"Taunton Manor and Wilton",
"Taunton Fairwater",
"Taunton Killams and Mountfield",
"Trull",
"Comeytrowe",
"Taunton Blackbrook and Holway"
],
"country": [
"England"
]
}
}
If you have the whole postcode, the /postcodes endpoint will return considerably more detailed information which I will not include here, but it does include the outcode and the incode as separate fields.
I would, of course, recommend caching the results of any call to a remote API.
Build a regular expression from the list of known codes. Note that the order of the codes in the regular expression matters: longer codes must come before shorter ones.
private void button1_Click(object sender, EventArgs e)
{
textBoxLog.Clear();
var regionList = BuildList();
var regex = BuildRegex(regionList.Keys);
TryMatch("B20 7TP", regionList, regex);
TryMatch("BT1 1AB", regionList, regex);
TryMatch("TR21 1AB", regionList, regex);
TryMatch("TR0 00", regionList, regex);
TryMatch("XX123", regionList, regex);
}
private static IReadOnlyDictionary<string, string> BuildList()
{
Dictionary<string, string> result = new Dictionary<string, string>();
result.Add("AL", "St Albans");
result.Add("B", "Birmingham");
result.Add("BT", "Belfast");
result.Add("TR", "Taunton");
result.Add("TR21", "Taunton X");
result.Add("TR22", "Taunton Y");
return result;
}
private static Regex BuildRegex(IEnumerable<string> codes)
{
// Sort the code by length descending so that for example TR21 is sorted before TR and is found by regex engine
// before the shorter match
codes = from code in codes
orderby code.Length descending
select code;
// Escape the codes to be used in the regex
codes = from code in codes
select Regex.Escape(code);
// create Regex Alternatives
string codesAlternatives = string.Join("|", codes.ToArray());
// A regex that starts with any of the codes and then has any data following
string lRegExSource = "^(" + codesAlternatives + ").*";
return new Regex(lRegExSource, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
/// <summary>
/// Try to match the postcode to a region
/// </summary>
private bool CheckPostCode(string postCode, out string identifiedRegion, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
// Check whether we have any match at all
Match match = regex.Match(postCode);
bool result = match.Success;
if (result)
{
// Take region code from first match group
// and use it in dictionary to get region name
string regionCode = match.Groups[1].Value;
identifiedRegion = regionList[regionCode];
}
else
{
identifiedRegion = "";
}
return result;
}
private void TryMatch(string code, IReadOnlyDictionary<string, string> regionList, Regex regex)
{
string region;
if (CheckPostCode(code, out region, regionList, regex))
{
AppendLog(code + ": " + region);
}
else
{
AppendLog(code + ": NO MATCH");
}
}
private void AppendLog(string log)
{
textBoxLog.AppendText(log + Environment.NewLine);
}
Produces this output:
B20 7TP: Birmingham
BT1 1AB: Belfast
TR21 1AB: Taunton X
TR0 00: Taunton
XX123: NO MATCH
For your information, the regex built here is ^(TR21|TR22|AL|BT|TR|B).*

Replacing anchor/link in text

I'm having issues doing a find/replace type of action in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor will be dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally was, with the traditional anchor format: <a href="link">anchor
Can anyone see what I have done wrong?
Regards
If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].
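One thing worth noting is that the pattern above stops before the closing tag, so a trailing </a> would stay in the output. A hedged variant that also consumes it could look like the following sketch; the class name AnchorRewriter and the added </a> are assumptions of this example, and it still only handles anchors whose sole attribute is a double-quoted href.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class AnchorRewriter
{
    // Group 1: the href value, Group 2: the anchor text; the closing </a> is matched too.
    private static readonly Regex Anchor = new Regex(
        @"<a\s+href=""([^""]+)"">([^<]+)</a>",
        RegexOptions.IgnoreCase);

    // Rewrites every matching anchor as [link anchor].
    public static string ToWikiLinks(string html)
    {
        return Anchor.Replace(html, "[$1 $2]");
    }
}
```

For anything beyond simple, well-formed anchors, an HTML parser remains the more robust choice.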
Updated (corrected the regex to support whitespace and newlines)
You can try this expression:
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is this long because it allows whitespace between the tags and their values, and because C# regexes do not support subroutines, so the [\s\n]* part has to be repeated several times.
You can see a working sample at dotnetfiddle
You can use it in your example like this.
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}

Need to split something dynamically from a string

I have a string which is somewhat like this:
string data = "I have a {apple} and a {orange}";
I need to extract the content inside each {}, of which there may be, say, 10 occurrences.
I tried this
string[] split = data.Split(new char[] { '{', '}' }, StringSplitOptions.RemoveEmptyEntries);
The problem is that my data is going to be dynamic and I won't know where the {<>} would be present. It can also be something like this:
Give {Pen} {Pencil}
I guess the above method wouldn't work, so I would really like to know a dynamic way to do this. Any input would be really helpful.
Thanks and Regards
Try this:
string data = "I have a {apple} and a {orange}";
Regex rx = new Regex("{(.*?)}");
foreach (Match item in rx.Matches(data))
{
Console.WriteLine(item.Groups[1].Value);
}
You need to use Regex to get all values you need.
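If a reusable helper is preferred, the same idea can be wrapped as in the sketch below. The class name PlaceholderExtractor is made up for this example, and the pattern \{([^{}]*)\} is an assumption that braces are never nested.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PlaceholderExtractor
{
    // Captures everything between a '{' and the next '}' (no nested braces).
    private static readonly Regex Placeholder = new Regex(@"\{([^{}]*)\}");

    // Returns the contents of every {...} in the order they appear.
    public static List<string> Extract(string data)
    {
        return Placeholder.Matches(data)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

It works regardless of how many placeholders there are or where they sit in the string, including adjacent ones such as "Give {Pen} {Pencil}".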
If the string between {} does not contain nested {} you can use a regex to perform this task:
string data = "I have a {apple} and a {orange}";
Regex reg = new Regex(@"\{(?<Name>[A-Za-z0-9]*)\}");
var matches = reg.Matches(data);
foreach (var m in matches.OfType<Match>())
{
Console.WriteLine($"Found {m.Groups["Name"].Value} at {m.Index}");
}
To replace the strings between {} you can use Regex.Replace:
reg.Replace(data, m => m.Groups["Name"].Value + "_")
// Will produce "I have a apple_ and a orange_"
To get the rest of the string, you can use Regex.Split:
Regex reg2 = new Regex(@"\{[A-Za-z0-9]*\}");
var result = reg2.Split(data);
// will contain "I have a ", " and a ", "", you might want to remove ""
As I understand, you want to split that string into parts like this:
I have a
{apple}
and a
{orange}
And then you want to go over those parts and do something with them, and that something is different depending on whether part is enclosed in {} or not. If so - you need Regex.Split:
string data = "I have a {apple} and a {orange}";
var parts = Regex.Split(data, @"({.*?})");
foreach (var part in parts) {
if (part.StartsWith("{") && part.EndsWith("}")) {
var trimmed = part.TrimStart('{').TrimEnd('}');
// "apple" and "orange" go here
// do something with {} part
}
else {
// "I have a " and " and a " go here
// do something with other part
}
}

How to read a string that is enclosed in specific tags and replace it with a different string?

I have a string in which I need to find multiple substrings enclosed in <%%> tags and replace each of them with an appropriate value. If there were only one occurrence of the tags, I could use the IndexOf method to find the string and then use the Replace method. How can I do it with multiple occurrences of the tags? Thanks for any suggestions.
Example:
Read text1 <%GetName%> Read text2 <%GetID%> Read tex3 <%GetNumber%> and more
The output should be
Read text1 Value1 Read text2 Value2 Read text3 Value3
You could consider using regular expressions - specifically the Regex.Replace method
The regex you would require would be something like:
<%([^%]+)%>
Using a MatchEvaluator, you can replace the whole string with something specific based on the match:
var newText = Regex.Replace(textToCheck, "<%([^%]+)%>", (m) => {
switch (m.Groups[1].Value)
{
case "GetName":
return "New value";
...
}
});
You can use regex and a dictionary to map the values.....
var toReplace = new Dictionary<string, string>()
{
{"GetName", "Value1" },
{"GetID", "Value2" },
{"GetNumber", "Value3" },
};
string input = @"Read text1 <%GetName%> Read text2 <%GetID%> Read tex3 <%GetNumber%> and more";
var output = Regex.Replace(input, @"<%(.+?)%>", m => toReplace[m.Groups[1].Value]);
OUTPUT:
Read text1 Value1 Read text2 Value2 Read tex3 Value3 and more
There's an alternative to regular expressions using a dictionary to map template parameters to values to replace them in a given string template:
public static class StringTemplatingExtensions
{
public static string ParseTemplate(this string template, IDictionary<string, object> valueMap)
{
foreach(var pair in valueMap)
{
template = template.Replace($"<%{pair.Key}%>", pair.Value.ToString());
}
return template;
}
}
So it can be used as follows:
var template = "Read text1 <%GetName%> Read text2 <%GetID%> Read tex3 <%GetNumber%>";
var parsed = template.ParseTemplate(new Dictionary<string, object> {
{ "GetName", "Matías" },
{ "GetID", "114894" },
{ "GetNumber", "282893" }
});
Note that this solution is less flexible than the others: it won't support <% VARIABLE %> or <%VARIABLE %>, only <%VARIABLE%> (without spaces). Still, it is a very simple yet effective way of implementing your requirement.
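If tolerance for spaces inside the tags matters, the regex and dictionary approaches can be combined. The following is only a sketch under assumptions of its own: the class name TemplateRenderer is hypothetical, tag names are assumed to consist of word characters, and unknown tags are deliberately left untouched rather than causing an exception.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TemplateRenderer
{
    // Allows optional whitespace around the tag name: <%Name%>, <% Name %>, <%Name %>.
    private static readonly Regex Tag = new Regex(@"<%\s*(\w+)\s*%>");

    public static string Render(string template, IDictionary<string, string> values)
    {
        return Tag.Replace(template, m =>
        {
            string replacement;
            // Keep the original tag text when no value is registered for it.
            return values.TryGetValue(m.Groups[1].Value, out replacement)
                ? replacement
                : m.Value;
        });
    }
}
```

Because each tag is replaced in a single pass, a replacement value that itself happens to contain <%...%> is not expanded again.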
