Dynamic Regex replaces single backslash with double backslash in C#

In my code I save the regular expression for validating a UK mobile number, i.e. "^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$", in a SQL Server database.
When I retrieve the expression to validate a mobile number, each "\" appears to be replaced with "\\". This is a serious issue, because the check reports the mobile number as invalid even though it is valid. I tried replacing the double backslash with a single one, and even replacing the backslash with special characters in the database.
If I supply the regex statically, it works fine for me:
C# code:
bool isPhoneNumber = Regex.IsMatch(sColumnValue, @"^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$");
if (isPhoneNumber == true)
{
//Do something...
}
else
{
//Do something...
}
But the backslashes are doubled when I read the regex string stored in the database. I have replaced "\" in the database with the special characters "###", i.e.
"^(###+44###s?7###d{3}|###(?07###d{3}###)?)###s?###d{3}###s?###d{3}$"
C# code:
string sRegxE = Context.Fields.Where(s => s.Name == sColumnName).Select(s => s.ExpressionValue).FirstOrDefault();
string sExpression = sRegxE.Replace("###", @"\");
bool isPhoneNumber = Regex.IsMatch(sColumnValue, sExpression);
if (isPhoneNumber == true)
{
//Do something...
}
else
{
//Do something...
}
This doesn't work for me: I still get a double backslash instead of a single one, which breaks the regex validation.
Can anybody help me prevent the single backslash from being replaced in C#? Cheers!

You can simply use the .Replace() function, like below:
string temp = "^(###+44###s?7###d{3}|###(?07###d{3}###)?)###s?###d{3}###s?###d{3}$";
temp = temp.Replace("###", @"\");
Or
temp = temp.Replace("###", "\\");
And you can use it like this:
bool isPhoneNumber = Regex.IsMatch(sColumnValue, temp);
if (isPhoneNumber == true)
{
//Do something...
}
else
{
//Do something...
}
Edit:
You can also use Regex.Unescape().
Have a look at the link below for more details:
Many regular expressions contain escaped characters. Sometimes you want to unescape these characters to get their original representation.
https://www.dotnetperls.com/unescape

I tried the following code and it worked:
string data = "+447712345678";
string pattern = @"^(###+44###s?7###d{3}|###(?07###d{3}###)?)###s?###d{3}###s?###d{3}$";
pattern = pattern.Replace(@"###", @"\");
if (Regex.IsMatch(data, pattern))
{
//Do something...
}
else
{
//Do something...
}

Finally I solved it.
Instead of calling "Regex.IsMatch()" directly, I create a Regex object, pass it the dynamic value from the SQL Server database, i.e. ^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$, and then check whether the input matches.
Code:
string sRegxE = dbContext.GetFields.Where(s => s.Name == sColumnName).Select(s => s.ExpressionValue).FirstOrDefault();
Regex RgxM = new Regex(sRegxE);
Match isPhoneNumber = RgxM.Match(sColumnValue);
if (isPhoneNumber.Success)
{
//Do something...
}
Reference:
Capture variable string in a regular expression?
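A likely explanation for the original symptom, offered here as an assumption rather than something confirmed in the thread, is that the Visual Studio debugger displays strings in escaped C# literal form, so a single backslash is shown as "\\" even though the string itself is unchanged. The sketch below uses a hard-coded stand-in for the value read from the database (the class, constant, and method names are hypothetical) and passes it straight to Regex.IsMatch:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhonePatternDemo
{
    // Hypothetical stand-in for the ExpressionValue column read from the database.
    // The verbatim literal holds single backslashes, exactly as a database string would.
    public const string PatternFromDb = @"^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$";

    // Validates the input against a pattern supplied at run time.
    public static bool IsUkMobile(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }
}
```

If this behaves as expected with the stored value, no replacement of backslashes or placeholder characters should be needed.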

Related

How to get all files ending with the extension "_<fileNum>of<totalFileNum>" and sometimes without? [duplicate]

A user specifies a file name that can be either in the form "<name>_<fileNum>of<fileNumTotal>" or simply "<name>". I need to somehow extract the "<name>" part from the full file name.
Basically, I am looking for an implementation of the method "ExtractName()" in the following example:
string fileName = "example_File"; // This var is specified by user
string extractedName = ExtractName(fileName); // Must return "example_File"
fileName = "example_File2_1of5";
extractedName = ExtractName(fileName); // Must return "example_File2"
fileName = "examp_File_3of15";
extractedName = ExtractName(fileName); // Must return "examp_File"
fileName = "example_12of15";
extractedName = ExtractName(fileName); // Must return "example"
Edit: Here's what I've tried so far:
string ExtractName(string fullName)
{
return fullName.Substring(0, fullName.LastIndexOf('_'));
}
But this clearly does not work for the case where the full name is just "<name>".
Thanks
This would be easier to parse using Regex, because you don't know how many digits either number will have.
var inputs = new[]
{
"example_File",
"example_File2_1of5",
"examp_File_3of15",
"example_12of15"
};
var pattern = new Regex(@"^(.+)(_\d+of\d+)$");
foreach (var input in inputs)
{
var match = pattern.Match(input);
if (!match.Success)
{
// file doesn't end with "#of#", so use the whole input
Console.WriteLine(input);
}
else
{
// it does end with "#of#", so use the first capture group
Console.WriteLine(match.Groups[1].Value);
}
}
This code returns:
example_File
example_File2
examp_File
example
The Regex pattern has three parts:
^ and $ are anchors to ensure you capture the entire string, not just a subset of characters.
(.+) - match everything, be as greedy as possible.
(_\d+of\d+) - match "_#of#", where "#" can be any number of consecutive digits.
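The same pattern can be wrapped into the ExtractName() method the question asks for. This is a minimal sketch; the class name FileNameHelper and the field name PartSuffix are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class FileNameHelper
{
    // Same pattern as in the answer: group 1 is the name, group 2 the "_#of#" suffix.
    private static readonly Regex PartSuffix = new Regex(@"^(.+)(_\d+of\d+)$");

    // Returns the name without a trailing "_<fileNum>of<fileNumTotal>" suffix,
    // or the input unchanged when no such suffix is present.
    public static string ExtractName(string fileName)
    {
        Match match = PartSuffix.Match(fileName);
        return match.Success ? match.Groups[1].Value : fileName;
    }
}
```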

Removing Escape Characters for a string

I am having a bit of a problem with escape characters in a string that I am reading from a txt file.
They cause an error later in my program, so they need to be removed, but I can't seem to filter them out.
public static List<string> loadData(string type)
{
List<string> dataList = new List<string>();
try
{
string path = Path.Combine(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location), "Data");
string text = File.ReadAllText(path + type);
string[] dataArray = text.Split(',');
foreach (var data in dataArray)
{
string dataUnescaped = Regex.Unescape(data);
if (!string.IsNullOrEmpty(dataUnescaped) && (!dataUnescaped.Contains(@"\r") || (!dataUnescaped.Contains(@"\n"))))
{
dataList.Add(data);
}
}
return dataList;
}
catch(Exception e)
{
Console.WriteLine(e);
return dataList;
}
}
I have tried text.Replace(@"\r\n", "")
and an if statement, but I just can't seem to remove them from my string.
Any ideas will be appreciated.
If you add the @ sign before a string literal, you get a verbatim string, in which backslashes are not treated as escape characters.
So if you wanted a path without @ you would need to do this:
string s = "c:\\myfolder\\myfile.txt";
But if you add the @ before "\r\n", then instead of the escape sequence for a Windows newline you get the literal four-character text \r\n (backslash, r, backslash, n).
So the following removes all occurrences of that literal text, instead of the newlines you want to remove:
text.Replace(@"\r\n", string.Empty)
To fix that you would need to use:
text = text.Replace(Environment.NewLine, string.Empty);
You can use Environment.NewLine instead of \r and \n, because Environment knows which OS you are currently on and changes the replaced characters depending on that.
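One caveat worth sketching: the file may have been written on a different OS than the one running the program, in which case Environment.NewLine might not match its line endings. Assuming the goal is simply to drop every line break, a small helper (the names NewlineHelper and RemoveLineBreaks are hypothetical) can remove both characters individually:

```csharp
public static class NewlineHelper
{
    // Removes carriage returns and line feeds regardless of which OS wrote the text.
    // Note the regular (non-verbatim) literals: "\r" and "\n" are single control characters here.
    public static string RemoveLineBreaks(string text)
    {
        return text.Replace("\r", string.Empty).Replace("\n", string.Empty);
    }
}
```

The literal text produced by a verbatim string such as @"\r\n" is left untouched, because it contains backslashes and letters rather than control characters.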

Replacing anchor/link in text

I'm having issues doing a find/replace type of action in my function. I'm extracting the <a href="link">anchor</a> from an article and replacing it with this format: [link anchor]. The link and anchor will be dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
string theString = string.Empty;
switch (articleWikiCheck) {
case "id|wpTextbox1":
StringBuilder newHtml = new StringBuilder(articleBody);
Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
string final = string.Empty;
foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
{
string text = match.Groups[2].Value;
string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
newHtml.Insert(match.Groups[1].Index, newHref);
}
theString = newHtml.ToString();
break;
default:
theString = articleBody;
break;
}
Helpers.ReturnMessage(theString);
return theString;
}
Currently, it just returns the article as it originally is, with the traditional anchor format: <a href="link">anchor</a>
Can anyone see what I have done wrong?
Regards
If your input is HTML, you should consider using a corresponding parser, HtmlAgilityPack being really helpful.
As for the current code, it looks too verbose. You may use a single Regex.Replace to perform the search and replace in one pass:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
}
else
{
// Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
return articleBody;
}
}
See the regex demo.
The <a\s+href="([^"]+)">([^<]+) regex matches <a, 1 or more whitespaces, href=", then captures into Group 1 any one or more chars other than ", then matches "> and then captures into Group 2 any one or more chars other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].
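Note that the pattern above stops before the closing tag, so a trailing </a> would remain in the output. The following sketch is a variant, not the answer's exact code: it also consumes </a>, and it assumes simple anchors whose only attribute is a double-quoted href. The names AnchorDemo and ToWikiLinks are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Rewrites <a href="link">anchor</a> as [link anchor].
    // $1 is the href value and $2 is the anchor text captured by the two groups.
    public static string ToWikiLinks(string html)
    {
        return Regex.Replace(html, @"<a\s+href=""([^""]+)"">([^<]+)</a>", "[$1 $2]");
    }
}
```

For anything beyond such simple markup, an HTML parser remains the safer choice, as the answer suggests.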
Updated (Corrected regex to support whitespaces and new lines)
You can try this expression
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is so long because it allows whitespace between the tags and their values, and because C# does not support subroutines, so the [\s\n]* part has to be repeated multiple times.
You can see a working sample at dotnetfiddle
You can use it in your example like this.
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
if (articleWikiCheck == "id|wpTextbox1")
{
return Regex.Replace(articleBody,
@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
"[${link} ${anchor}]");
}
else
{
return articleBody;
}
}

Decode HTML string in c# [duplicate]

How do I decode the string 'Sch\u00f6nen' (@"Sch\u00f6nen") in C#? I've tried HttpUtility, but it doesn't give me the result I need, which is "Schönen".
Regex.Unescape did the trick:
System.Text.RegularExpressions.Regex.Unescape(@"Sch\u00f6nen");
Note that you need to be careful when testing your variants or writing unit tests: "Sch\u00f6nen" is already "Schönen". You need @ in front of the string to treat \u00f6 as literal characters in the string.
If you landed on this question because you see "Sch\u00f6nen" (or similar \uXXXX values in a string constant), it is not an encoding. It is a way to represent Unicode characters as escape sequences, similar to how a string represents a newline by \n and a carriage return by \r.
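The difference between the two literals can be sketched as follows. The wrapper class UnicodeEscapeDemo and its Decode method are hypothetical; Regex.Unescape is the call used in the answer above.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeEscapeDemo
{
    // Converts literal \uXXXX sequences (backslash, 'u', four hex digits) into the characters they denote.
    public static string Decode(string raw)
    {
        return Regex.Unescape(raw);
    }
}
```

Here @"Sch\u00f6nen" is a 12-character string that still contains the backslash, while "Sch\u00f6nen" is already the 7-character string "Schönen" because the compiler resolves the escape.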
I don't think you have to decode.
string unicodestring = "Sch\u00f6nen";
Console.WriteLine(unicodestring);
This prints Schönen.
I wrote some code that converts Unicode escape sequences to actual chars (but the best answer in this topic works fine and is less complex).
string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
try
{
if (s.Length == 4)
{
var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
outString += decoded;
}
else
{
outString += s;
}
}
catch (Exception e)
{
outString += s;
}
}

Single Quote Escape Character Formatting for JavaScript

I have a question that I feel will be simple to answer: I have the code
function ApplicantNameMatchedInitialPayment() {
var applicantName = '<%= ViewData["ApplicantName"] %>';
var fullName = applicantName.split(' ');
if (fullName.length == 2)
{
var firstName = fullName[0].toLowerCase();
var lastName = fullName[1].toLowerCase();
var nameOnCard = $("#name-on-card").val().toLowerCase();
if (nameOnCard.includes(firstName) && nameOnCard.includes(lastName))
{
return true;
}
}
return false;
}
I am trying to handle a case where my user enters their name with an apostrophe. When the ViewData Object is filled during live execution, the customer's name will show up in the 'applicantName' variable. The problem is that if I enter a name like "De'Leon", a JS error is thrown in the console because of an incorrect escape sequence.. and the string will not be read correctly. I want to take any string that is passed in from my C# Viewdata object and handle the apostrophes dynamically so that no errors are thrown and so that my javascript understands that everything should just be one string. A little help with the string formatting and escape character?
If you want to just escape apostrophes in JavaScript you could try to simply replace them with \':
s = s.replace("'", "\'");
It won’t affect your further work with this string so if you write it to the console it will output a result without backslash:
var s = "De'Leon";
s = s.replace("'", "\'");
console.log(s); // > De'Leon
If you're using .NET version 4 or later, you can use HttpUtility.JavaScriptStringEncode.
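A minimal sketch of that call on the server side follows. The wrapper class JsStringDemo and its method name are hypothetical; HttpUtility.JavaScriptStringEncode lives in System.Web.

```csharp
using System.Web;

public static class JsStringDemo
{
    // Encodes a value so it can be placed between quotes in generated JavaScript
    // without the apostrophe terminating the string literal early.
    public static string ToJsLiteralBody(string value)
    {
        return HttpUtility.JavaScriptStringEncode(value);
    }
}
```

In the view from the question, this would presumably be used as '<%= HttpUtility.JavaScriptStringEncode((string)ViewData["ApplicantName"]) %>'; the cast is an assumption about how the value is stored.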
