Regular Expression + Options in MongoDB (C# Driver) - c#

I'm working with the MongoDB driver for C# and I'm querying a collection for the list of items whose field matches a given name.
I am using the BsonRegularExpression object with the following expression:
"/.*" + summonerName + "/"
This would be the equivalent of LIKE in SQL, but the problem comes when I want it to be case-insensitive.
To do so, many sites suggest writing it like this:
BsonRegularExpression expReg = new BsonRegularExpression("/.*" + summonerName + "/", "i");
When I pass the options parameter "i" after the expression like this, the query doesn't return anything.
Here is the entire code:
public static List<Summoner> GetSummonerByName(String summonerName)
{
    try
    {
        var database = dbClient.GetDatabase(databaseName);
        var collection = database.GetCollection<Summoner>(collectionSummoner);
        String nombreRegex = "";
        var nombreChar = summonerName.ToCharArray();
        foreach (Char caracter in nombreChar)
        {
            nombreRegex += "*";
            nombreRegex += caracter;
        }
        //var filter = Builders<Summoner>.Filter.Regex(u => u.name, new BsonRegularExpression("/.*C*o*r*b*a*n/"));
        BsonRegularExpression expReg = new BsonRegularExpression("/.*" + summonerName + "/");
        var filter = Builders<Summoner>.Filter.Regex(u => u.name, expReg);
        var resultado = collection.Find(filter).ToList();
        if (resultado.Count == 0)
        {
            // Not stored yet: fetch it from the Riot API, insert it and query again.
            // Return the second lookup; otherwise the caller still gets the empty first result.
            InsertSummoner(RiotApiConnectorService.GetSummonerByName(summonerName));
            return GetSummonerByName(summonerName);
        }
        return resultado;
    }
    catch (Exception)
    {
        return null;
    }
}
Does anyone have experience using regular expressions with the MongoDB C# driver?
Thanks in advance!

You should define it as
BsonRegularExpression expReg = new BsonRegularExpression(Regex.Escape(summonerName), "i");
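The point here is that when you pass the options separately, the pattern must not include regex delimiters (/ in your case). The two-argument BsonRegularExpression(pattern, options) constructor uses the pattern as-is, so the delimiters are not stripped: they are treated as literal / characters in the pattern. There are no matches because your data have no slashes. (The single-argument constructor, by contrast, accepts a /pattern/options string and strips the delimiters, which is why your query without "i" worked.)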
Next, you do not need .* here: a regex search finds a match anywhere in the input text and does not require a full-string match, as the LIKE operator does.
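MongoDB evaluates the regex on the server with its own engine, but for plain patterns like these .NET's Regex behaves the same way, so both points (no .* needed, slashes inside the pattern are literal) can be checked locally. The class and method names below are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class RegexSearchDemo
{
    // True when `pattern` matches anywhere in `input`, ignoring case.
    // Regex.IsMatch looks for a match at any position, so no leading or
    // trailing ".*" is needed for a "contains" search.
    public static bool ContainsMatch(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
}
```

For example, ContainsMatch("Corban", "orb") and ContainsMatch("Corban", "CORB") return true, while ContainsMatch("Corban", "/.*orb/") returns false, because the slashes would have to appear literally in the name.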
Note that Regex.Escape(summonerName) is used in case summonerName contains special regex metacharacters; without it, the search may fail or match the wrong documents.
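Put together, the fix can be wrapped in a small helper. This is a sketch that assumes the MongoDB.Driver package; the class name RegexFilters and both method names are hypothetical, and Summoner/u.name come from the question.

```csharp
using System;
using System.Linq.Expressions;
using System.Text.RegularExpressions;
using MongoDB.Bson;
using MongoDB.Driver;

public static class RegexFilters
{
    // Case-insensitive "contains" regex: no /.../ delimiters, because the
    // "i" option is passed as a separate constructor argument, and
    // Regex.Escape turns metacharacters in the user input into literals.
    public static BsonRegularExpression ContainsIgnoreCase(string text) =>
        new BsonRegularExpression(Regex.Escape(text), "i");

    // Wraps that regex in a filter on any field of T.
    public static FilterDefinition<T> Contains<T>(Expression<Func<T, object>> field, string text) =>
        Builders<T>.Filter.Regex(field, ContainsIgnoreCase(text));
}
```

In the question's method this would be used as collection.Find(RegexFilters.Contains<Summoner>(u => u.name, summonerName)).ToList().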


How to Extract Domain name from string with Regex in C#?

I want to extract top-level domain names and country-code top-level domain names from a string with a regex. I tested many regexes, like this one:
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match m = linkParser.Match(Url);
Console.WriteLine(m.Value);
But none of them worked properly.
The text entered by the user can take any of the following forms:
jonasjohn.com
http://www.jonasjohn.de/snippets/csharp/
jonasjohn.de
www.jonasjohn.de/snippets/csharp/
http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people
http://www.apple.com
https://www.cnn.com.au
http://www.downloads.news.com.au
https://ftp.android.co.nz
http://global.news.ca
https://www.apple.com/
https://ftp.android.co.nz/
http://global.news.ca/
https://www.apple.com/
https://johnsmith.eu
ftp://johnsmith.eu
johnsmith.gov.ae
johnsmith.eu
www.jonasjohn.de
www.jonasjohn.ac.ir/snippets/csharp
http://www.jonasjohn.de/
ftp://www.jonasjohn.de/
https://subdomain.abc.def.jonasjohn.de/test.htm
The regexes I tested:
^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)
\b(?:https?://|www\.)\S+\b
://(?<host>([a-z\\d][-a-z\\d]*[a-z\\d]\\.)*[a-z][-a-z\\d]+[a-z])
and many others.
I just need the domain name; I don't need the protocol or any subdomain.
Like:
Domainname.gTLD or DomainName.ccTLD or DomainName.xyz.ccTLD
I got the list of suffixes from the Public Suffix List.
Of course, I've seen a lot of posts on stackoverflow.com, but none of them answered my question.
You don't need a Regex to parse a URL. If you have a valid URL, you can use one of the Uri constructors or Uri.TryCreate to parse it:
if (Uri.TryCreate("http://google.com/asdfs", UriKind.RelativeOrAbsolute, out var uri))
{
    Console.WriteLine(uri.Host);
}
www.jonasjohn.de/snippets/csharp/ and jonasjohn.de/snippets/csharp/ aren't valid absolute URLs, though. TryCreate can still parse them as relative URLs, but reading Host then throws System.InvalidOperationException: This operation is not supported for a relative URI.
In that case you can use the UriBuilder class to parse and modify the URL, e.g.:
var bld=new UriBuilder("jonasjohn.com");
Console.WriteLine(bld.Host);
This prints
jonasjohn.com
Setting the Scheme property produces a valid, complete URL:
bld.Scheme="https";
Console.WriteLine(bld.Uri);
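This produces:
https://jonasjohn.com:80/
The port stays 80, the default of the http scheme UriBuilder assumed; also setting bld.Port = -1 makes it use the default port of the new scheme, so no port is written.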
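Combining both ideas, a small helper can get the host from any of the inputs in the question. This is a sketch that assumes any input without "://" should be treated as http; HostParser and GetHost are hypothetical names. It returns the full host, subdomains included.

```csharp
using System;

public static class HostParser
{
    // Returns the host part of a URL-like string, or null if it cannot be parsed.
    // Inputs without a scheme (e.g. "jonasjohn.de/snippets/") get "http://"
    // prepended so Uri parses them as absolute rather than relative URLs.
    public static string GetHost(string input)
    {
        string candidate = input.Contains("://") ? input : "http://" + input;
        return Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri) ? uri.Host : null;
    }
}
```

For example, GetHost("www.jonasjohn.de/snippets/csharp/") gives "www.jonasjohn.de" and GetHost("ftp://johnsmith.eu") gives "johnsmith.eu".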
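Based on Lidqy's answer, I wrote this function, which I think handles most inputs; if the input falls outside these cases, you can throw an exception instead. Note that its dot-count rule keeps up to three labels, so a host such as global.news.ca is returned with its subdomain.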
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static string ExtractDomainName(string Url)
{
    // Strip an optional scheme and "www.", then capture everything up to the first "/"
    var regex = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
    Match match = regex.Match(Url);
    if (!match.Success)
    {
        return String.Empty;
    }
    string domain = match.Groups["domain"].Value;
    // Drop leading labels until at most two dots remain (e.g. a.b.co.nz -> b.co.nz)
    int freq = domain.Count(x => x == '.');
    while (freq > 2)
    {
        domain = domain.Split('.', 2)[1];
        freq = domain.Count(x => x == '.');
    }
    return domain;
}
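A lookup against public suffixes avoids guessing from the number of dots. The sketch below returns the public suffix plus one label to its left. The suffix set is a hypothetical, hand-picked subset for illustration; a real implementation would load the full Public Suffix List, whose wildcard and exception rules this sketch does not handle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RegistrableDomain
{
    // Hypothetical subset of public suffixes (illustration only).
    private static readonly HashSet<string> Suffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "com", "de", "eu", "ca", "ae", "ir", "au", "nz",
            "com.au", "co.nz", "gov.ae", "ac.ir"
        };

    // "downloads.news.com.au" -> "news.com.au". Returns the host unchanged
    // when no known suffix matches.
    public static string FromHost(string host)
    {
        string[] labels = host.ToLowerInvariant().Split('.');
        // start = 1 tries the longest candidate suffix; each step drops a label,
        // so the first hit is the longest matching suffix.
        for (int start = 1; start < labels.Length; start++)
        {
            string suffix = string.Join(".", labels.Skip(start));
            if (Suffixes.Contains(suffix))
                return string.Join(".", labels.Skip(start - 1));
        }
        return host;
    }
}
```

It expects a bare host, so it can be fed the domain group captured by the regex above.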
var rx = new Regex(@"^((https?|ftp)://)?(www\.)?(?<domain>[^/]+)(/|$)");
var data = new[] {
    "jonasjohn.com",
    "http://www.jonasjohn.de/snippets/csharp/",
    "jonasjohn.de",
    "www.jonasjohn.de/snippets/csharp/",
    "http://www.answers.com/article/1194427/8-habits-of-extraordinarily-likeable-people",
    "http://www.apple.com",
    "https://www.cnn.com.au",
    "http://www.downloads.news.com.au",
    "https://ftp.android.co.nz",
    "http://global.news.ca",
    "https://www.apple.com/",
    "https://ftp.android.co.nz/",
    "http://global.news.ca/",
    "https://www.apple.com/",
    "https://johnsmith.eu",
    "ftp://johnsmith.eu",
    "johnsmith.gov.ae",
    "johnsmith.eu",
    "www.jonasjohn.de",
    "www.jonasjohn.ac.ir/snippets/csharp",
    "http://www.jonasjohn.de/",
    "ftp://www.jonasjohn.de/",
    "https://subdomain.abc.def.jonasjohn.de/test.htm"
};
foreach (var dat in data)
{
    var match = rx.Match(dat);
    if (match.Success)
        Console.WriteLine("{0} => {1}", dat, match.Groups["domain"].Value);
    else
        Console.WriteLine("{0} => NO MATCH", dat);
}

Replacing anchor/link in text

I'm having issues doing a find/replace in my function. I'm extracting the <a href="link">anchor from an article and replacing it with this format: [link anchor]. The link and anchor are dynamic, so I can't hard-code the values. What I have so far is:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
    string theString = string.Empty;
    switch (articleWikiCheck) {
        case "id|wpTextbox1":
            StringBuilder newHtml = new StringBuilder(articleBody);
            Regex r = new Regex(@"\<a href=\""([^\""]+)\"">([^<]+)");
            string final = string.Empty;
            foreach (var match in r.Matches(theString).Cast<Match>().OrderByDescending(m => m.Index))
            {
                string text = match.Groups[2].Value;
                string newHref = "[" + match.Groups[1].Index + " " + match.Groups[1].Index + "]";
                newHtml.Remove(match.Groups[1].Index, match.Groups[1].Length);
                newHtml.Insert(match.Groups[1].Index, newHref);
            }
            theString = newHtml.ToString();
            break;
        default:
            theString = articleBody;
            break;
    }
    Helpers.ReturnMessage(theString);
    return theString;
}
Currently, it just returns the article unchanged, with the anchors still in the traditional format: <a href="link">anchor
Can anyone see what I have done wrong?
Regards
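If your input is HTML, you should consider using a proper HTML parser; HtmlAgilityPack is really helpful here.
As for the current code, nothing is replaced because r.Matches(theString) searches theString, which is still empty at that point, instead of articleBody (and newHref is built from Groups[1].Index rather than from the captured values). The code is also more verbose than it needs to be: a single Regex.Replace can perform the search and replace in one pass: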
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
    if (articleWikiCheck == "id|wpTextbox1")
    {
        return Regex.Replace(articleBody, @"<a\s+href=""([^""]+)"">([^<]+)", "[$1 $2]");
    }
    else
    {
        // Helpers.ReturnMessage(articleBody); // Uncomment if it is necessary
        return articleBody;
    }
}
The <a\s+href="([^"]+)">([^<]+) regex matches <a and one or more whitespace characters, then href=", then captures into Group 1 one or more characters other than ", then matches ">, and then captures into Group 2 one or more characters other than <.
The [$1 $2] replacement replaces the matched text with [, Group 1 contents, space, Group 2 contents and a ].
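The pattern above stops right after the anchor text, so a closing </a> in the input would stay behind (giving [link anchor]</a>). A minimal variant, assuming simple anchors whose only attribute is href, also consumes an optional </a>; AnchorToWiki is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class AnchorToWiki
{
    // Same groups as the answer's pattern, plus an optional closing </a>.
    // [^"]+ and [^<]+ cannot run past the current anchor, so several
    // anchors on one line are converted independently.
    private static readonly Regex Anchor =
        new Regex(@"<a\s+href=""([^""]+)"">([^<]+)(?:</a>)?", RegexOptions.IgnoreCase);

    public static string Convert(string html) => Anchor.Replace(html, "[$1 $2]");
}
```

For instance, "See <a href="http://example.com">Example</a> here." becomes "See [http://example.com Example] here.".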
Update: corrected the regex to support whitespace and newlines.
You can try this expression:
Regex r = new Regex(@"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>");
It will match your anchors even if they are split across multiple lines. It is this long because it allows whitespace between the tags and their values, and .NET regex does not support subroutines, so the [\s\n]* part has to be repeated several times. Note that the greedy (?<link>.*) can run into a later anchor when several anchors share one line.
You can use it in your example like this:
public static string GetAndFixAnchor(string articleBody, string articleWikiCheck) {
    if (articleWikiCheck == "id|wpTextbox1")
    {
        return Regex.Replace(articleBody,
            @"<[\s\n]*a[\s\n]*(([^\s]+\s*[ ]*=*[ ]*[\s|\n*]*('|"").*\3)[\s\n]*)*href[ ]*=[ ]*('|"")(?<link>.*)\4[.\n]*>(?<anchor>[\s\S]*?)[\s\n]*<\/[\s\n]*a>",
            "[${link} ${anchor}]");
    }
    else
    {
        return articleBody;
    }
}

Extract ID and replace everything in `Example HTML`

I'm new to regular expressions. I have the following text in my HTML and would like to replace it with something else.
Example HTML:
{{Object id='foo'}}
Extract the id into a variable like this:
string strId = "foo";
So far I have the following Regular Expression code that will capture the Example HTML:
string strStart = "Object";
string strFind = "{{(" + strStart + ".*?)}}";
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase);
Match matchRegExp = regExp.Match(html);
while (matchRegExp.Success)
{
    //At this point, I have this variable:
    //{{Object id='foo'}}
    //I can find the id='foo' (see below)
    //but not sure how to extract 'foo' and use it
    string strFindInner = "id='(.*?)'"; //"{{Slider";
    Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
    Match matchRegExpInner = regExpInner.Match(matchRegExp.Value.ToString());
    //Do something with 'foo'
    matchRegExp = matchRegExp.NextMatch();
}
I understand there might be a simple solution. I am hoping to learn more about regular expressions, but more importantly, I would like a suggestion on how to approach this more cleanly and efficiently.
Thank you
Edit:
Is this an example that I could potentially use: c# regex replace
While I didn't solve my initial question with regular expressions, I moved to a simpler solution using Substring, IndexOf and string.Split for the time being. I know my code needs to be cleaned up, but I thought I would post the answer I have so far.
using System.Collections.Generic;

string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
string strObject = "Object"; //Example (on the real site, e.g. "Slider")
//When found, this will contain "{{Object id='foo'}}"
string strCode = "";
//ie: "id='foo'"
string strCodeInner = "";
//Tags will be a list, but in this example, only "id='foo'"
string[] tags = { };
//Looking for the following "{{Object "
string strFindStart = "{{" + strObject + " ";
int intFindStart = html.IndexOf(strFindStart);
//Then ending in the following; search only after the start, and keep -1 if it is missing
string strFindEnd = "}}";
int intFindEnd = intFindStart == -1 ? -1 : html.IndexOf(strFindEnd, intFindStart);
//Must find both Start and End conditions
if (intFindStart != -1 && intFindEnd != -1)
{
    intFindEnd += strFindEnd.Length;
    strCode = html.Substring(intFindStart, intFindEnd - intFindStart);
    //Remove Start and End
    strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, "");
    //Split by spaces, this needs to be improved if more than IDs are to be used
    //but for proof of concept this is perfect
    tags = strCodeInner.Split(new char[] { ' ' });
}
Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
    string[] tagSplit = tag.Split(new char[] { '=' });
    dictTags.Add(tagSplit[0], tagSplit[1].Replace("'", "").Replace("\"", ""));
}
//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
//get the object (ie: Slider) and return the html for slider with the ID of foo
if (strCode != "")
{
    //Placeholder for the HTML that the database lookup would return
    string strView = "<p id=\"" + dictTags["id"] + "\">This is the replacement text</p>";
    html = html.Replace(strCode, strView);
}
/*
"html" variable may contain:
<p>Start of Example</p>
<p id="foo">This is the replacement text</p>
<p>End of example</p>
*/
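For the regex route the question originally asked about, one Regex.Replace with a MatchEvaluator can capture the id and substitute the generated HTML in a single pass. This is a sketch: the render callback is a hypothetical stand-in for the database lookup the question does not show, and the pattern assumes the tag carries only an id attribute, quoted with ' or ".

```csharp
using System;
using System.Text.RegularExpressions;

public static class ObjectTags
{
    // Matches {{Object id='foo'}} (or id="foo") and captures the id as "id".
    private static readonly Regex Tag =
        new Regex(@"\{\{Object\s+id=['""](?<id>[^'""]*)['""]\s*\}\}", RegexOptions.IgnoreCase);

    // The id alone, e.g. to set strId = "foo"; empty string if there is no tag.
    public static string FirstId(string html) => Tag.Match(html).Groups["id"].Value;

    // Replaces every tag with whatever `render` returns for its id.
    public static string Replace(string html, Func<string, string> render) =>
        Tag.Replace(html, m => render(m.Groups["id"].Value));
}
```

With render = id => "<p id=\"" + id + "\">This is the replacement text</p>", the example HTML becomes the output shown in the comment above.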

Replace string with regular expression and my own parameter

In my HTML I have several tokens like this:
{PROP_1_1}, {PROP_1_2}, {PROP_37871_1} ...
Currently I replace each token with the following code:
htmlBuffer = htmlBuffer.Replace("{PROP_" + prop.PropertyID + "_1}", prop.PropertyDefaultHtml);
where prop is a custom object. But this only affects the tokens ending in '_1'. I would like to extend this logic to all the others, ending in '_X' where X is a number.
How could I implement a regexp pattern to achieve this?
You can use Regex.Replace():
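// Verbatim string so that \d reaches the regex engine (a plain "\d" literal does not compile)
Regex rgx = new Regex("{PROP_" + prop.PropertyID + @"_\d+}");
// A MatchEvaluator inserts the HTML literally, so any "$" in it is not treated as a substitution
htmlBuffer = rgx.Replace(htmlBuffer, m => prop.PropertyDefaultHtml);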
You can do even better: capture both identifiers in the regular expression. That way you loop only over the references that actually exist in the string and look up the properties for those, instead of looping over all your properties and checking whether the string references each one.
Example:
htmlBuffer = Regex.Replace(htmlBuffer, @"{PROP_(\d+)_(\d+)}", m =>
{
    int id = Int32.Parse(m.Groups[1].Value);
    int suffix = Int32.Parse(m.Groups[2].Value);
    return properties[id].GetValue(suffix);
});
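A self-contained version of the same idea follows. Since properties and GetValue are not shown in the question, this sketch assumes a dictionary keyed by (id, suffix) as a hypothetical stand-in, and leaves tokens without an entry unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PropTokens
{
    // Replaces every {PROP_<id>_<suffix>} token with the value stored for
    // (id, suffix). Unknown tokens are kept as they are (m.Value).
    public static string Expand(string html, IDictionary<(int Id, int Suffix), string> values) =>
        Regex.Replace(html, @"\{PROP_(\d+)_(\d+)\}", m =>
        {
            int id = int.Parse(m.Groups[1].Value);
            int suffix = int.Parse(m.Groups[2].Value);
            return values.TryGetValue((id, suffix), out string value) ? value : m.Value;
        });
}
```

Because the evaluator's return value is inserted as-is, HTML containing "$" is safe here, unlike a replacement string.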

Changing the middle of a string

I have a list of strings; each one looks similar to this:
"\n\t\"BLOCK\",\"HEADER-\"\r\n\t\t\"NAME\",\"147430\"\r\n\t\t\"REVISION\",\"0000\"\r\n\t\t\"DATE\",\"11/11/10\"\r\n\t\t\"TIME\",\"10:03:47\"\r\n\t\t\"PMABAR\",\"\"\r\n\t\t\"COMMENT\",\"\"\r\n\t\t\"PTPNAME\",\"0805C\"\r\n\t\t\"CMPNAME\",\"0805C\"\r\n\t\"BLOCK\",\"PRTIDDT-\"\r\n\t\t\"PMAPP\",1\r\n\t\t\"PMADC\",0\r\n\t\t\"ComponentQty\",4\r\n\t\"BLOCK\",\"PRTFORM-\"\r\n\t\t\....(more)...."
What I am trying to do is keep the entire string but replace the DATE, TIME and ComponentQty values.
I want to put the date variable I have set into DATE, DateTime.Now.ToString("HH:mm:ss") into TIME, and dictionary[part] into ComponentQty. The values would be replaced like so:
"DATE","11/11/10" with "DATE","12/06/11"
"TIME","10:03:47" with "TIME","10:30:10"
"ComponentQty",4 with "ComponentQty", 8
or something similar...
so the new string would look like this:
"\n\t\"BLOCK\",\"HEADER-\"\r\n\t\t\"NAME\",\"147430\"\r\n\t\t\"REVISION\",\"0000\"\r\n\t\t\"DATE\",\"12/06/11\"\r\n\t\t\"TIME\",\"10:30:10\"\r\n\t\t\"PMABAR\",\"\"\r\n\t\t\"COMMENT\",\"\"\r\n\t\t\"PTPNAME\",\"0805C\"\r\n\t\t\"CMPNAME\",\"0805C\"\r\n\t\"BLOCK\",\"PRTIDDT-\"\r\n\t\t\"PMAPP\",1\r\n\t\t\"PMADC\",0\r\n\t\t\"ComponentQty\",8\r\n\t\"BLOCK\",\"PRTFORM-\"\r\n\t\t\....(more)...."
What is the quickest way to do such a thing? I was thinking Regex but I am not too sure on how to go about doing this. Can anyone help?
EDIT:
I used a normal string replace to do it, but the data being replaced will not always be the static date, time and quantity hard-coded below (11/11/10, 10:03:47, 4). I need a way to avoid hard-coding that part, with a regex I assume:
var newDate = "DATE\",\"" + date + "\"";
var newTime = "TIME\",\"" + DateTime.Now.ToString("HH:mm:ss") + "\"";
var newCompQTY = "ComponentQty\"," + dictionary[part];
trimmedDataBasePart = trimmedDataBasePart.ToUpper().Replace("DATE\",\"11/11/10", newDate);
trimmedDataBasePart = trimmedDataBasePart.ToUpper().Replace("TIME\",\"10:03:47", newTime);
trimmedDataBasePart = trimmedDataBasePart.ToUpper().Replace("COMPONENTQTY\",4", newCompQTY);
I am trying to use a Regex match as the value to replace and am not sure how to do so. This is what I was trying, but it obviously does not work because regexedDate is a Match, not a string. Any suggestions?
var newDate = "DATE\",\"" + date + "\"";
var regexedDate = Regex.Match(trimmedDataBasePart, "DATE\",[0-9]+/[0-9]+/[0-9]+");
trimmedDataBasePart = trimmedDataBasePart.ToUpper().Replace(regexedDate, newDate);
Try this:
resultString = Regex.Replace(subjectString, @"(.*\bDATE\b\D*).*?(\\.*\bTIME\b\D*).*?(\\.*\bComponentQty\b\D*)\d+(.*)", "$1NEW_DATE$2NEW_TIME$3NEW_QTY", RegexOptions.Singleline);
Where NEW_DATE should be replaced by your date, NEW_TIME by your time, and NEW_QTY by your new qty.
You can create the replacement string from other variables as you please :)
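A caveat about .NET replacement strings: if the inserted value starts with a digit, "$1" followed by that digit becomes something like "$11", which .NET reads as a reference to group 11; since that group does not exist, group 1 is not substituted. Writing ${1} delimits the group number. Also, RegexBuddy had a bug that produced the wrong regex above (it matches literal backslashes). The version below is tested and works!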
string subjectString = "\n\t\"BLOCK\",\"HEADER-\"\r\n\t\t\"NAME\",\"147430\"\r\n\t\t\"REVISION\",\"0000\"\r\n\t\t\"DATE\",\"11/11/10\"\r\n\t\t\"TIME\",\"10:03:47\"\r\n\t\t\"PMABAR\",\"\"\r\n\t\t\"COMMENT\",\"\"\r\n\t\t\"PTPNAME\",\"0805C\"\r\n\t\t\"CMPNAME\",\"0805C\"\r\n\t\"BLOCK\",\"PRTIDDT-\"\r\n\t\t\"PMAPP\",1\r\n\t\t\"PMADC\",0\r\n\t\t\"ComponentQty\",4\r\n\t\"BLOCK\",\"PRTFORM-\"\r\n\t\t....(more)....";
Regex regexObj = new Regex(@"^(.*\bDATE\b\D*).*?(\"".*?\bTIME\b\D*).*?(\"".*?\bComponentQty\b\D*)\d+(.*)$", RegexOptions.Singleline);
StringBuilder myResult = new StringBuilder();
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success)
{
    for (int i = 1; i < matchResults.Groups.Count; i++)
    {
        Group groupObj = matchResults.Groups[i];
        if (groupObj.Success)
        {
            myResult.Append(groupObj.Value);
            switch (i)
            {
                case 1:
                    myResult.Append("NEW_DATE");
                    break;
                case 2:
                    myResult.Append("NEW_TIME");
                    break;
                case 3:
                    myResult.Append("NEW QTY");
                    break;
            }
        }
    }
    matchResults = matchResults.NextMatch();
}
Console.WriteLine("Final Result : \n\n\n{0}", myResult.ToString());
Output:
Final Result :
"BLOCK","HEADER-"
"NAME","147430"
"REVISION","0000"
"DATE","NEW_DATE"
"TIME","NEW_TIME"
"PMABAR",""
"COMMENT",""
"PTPNAME","0805C"
"CMPNAME","0805C"
"BLOCK","PRTIDDT-"
"PMAPP",1
"PMADC",0
"ComponentQty",NEW QTY
"BLOCK","PRTFORM-"
....(more)....
By the way, your input string contains an incorrectly escaped dot (\.). Cheers and have fun! :)
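A shorter alternative to the single large regex is one Regex.Replace per field with a MatchEvaluator. This sketch assumes each value follows its key exactly as in the sample ("DATE","…", "TIME","…", "ComponentQty",N); RecordEditor and Update are hypothetical names. Because the new values are returned from the evaluator, they are inserted literally and the "$11" problem cannot occur.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordEditor
{
    // Keeps the key part (group 1) and swaps only the value after it.
    public static string Update(string record, string date, string time, int qty)
    {
        // "DATE","<anything up to the next quote>
        record = Regex.Replace(record, "(\"DATE\",\")[^\"]*", m => m.Groups[1].Value + date);
        // "TIME","<anything up to the next quote>
        record = Regex.Replace(record, "(\"TIME\",\")[^\"]*", m => m.Groups[1].Value + time);
        // "ComponentQty",<digits>
        record = Regex.Replace(record, "(\"ComponentQty\",)\\d+", m => m.Groups[1].Value + qty);
        return record;
    }
}
```

Usage would be along the lines of RecordEditor.Update(trimmedDataBasePart, date, DateTime.Now.ToString("HH:mm:ss"), dictionary[part]), assuming dictionary[part] is an int.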
If you can change the way your source string looks, I would use String.Format:
string s = String.Format("Date={0}, Name={1}, Quantity={2}", date, name, quantity);
The placeholders {0}, {1}, {2} are replaced with the specified arguments which follow.
To make it cleaner, I would write one function that parses that string list and another that builds such a string list, instead of using regexes. I think this will make your code easier to maintain.
Dictionary<string, string> Parse(List<string> data)
{
    ...
}
List<string> CreateStringList(Dictionary<string, string> values)
{
    ...
}
List<string> SetValues(List<string> data)
{
    Dictionary<string, string> values = Parse(data);
    values["DATE"] = "12/06/11";
    values["TIME"] = "10:30:10";
    values["ComponentQty"] = "4";
    return CreateStringList(values);
}
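One way to fill in that outline is sketched below. It deviates from the outline in one respect: keys such as "BLOCK" occur several times in the sample record, so it keeps an ordered list of key/value pairs instead of a Dictionary. It also assumes every relevant line has the form "KEY",value, skips lines without a comma, and does not preserve the original indentation. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordFields
{
    // Parses "KEY",value lines into ordered (key, rawValue) pairs.
    public static List<KeyValuePair<string, string>> Parse(string record) =>
        record.Split('\n')
              .Select(line => line.Trim())           // drops tabs and the trailing '\r'
              .Where(line => line.IndexOf(',') > 0)  // skips blank or comma-less lines
              .Select(line =>
              {
                  int comma = line.IndexOf(',');
                  return new KeyValuePair<string, string>(
                      line.Substring(0, comma).Trim('"'),  // key without quotes
                      line.Substring(comma + 1));          // raw value, quotes kept
              })
              .ToList();

    // Returns a copy in which every field named `key` gets the new raw value.
    public static List<KeyValuePair<string, string>> Set(
        IEnumerable<KeyValuePair<string, string>> fields, string key, string rawValue) =>
        fields.Select(f => f.Key == key ? new KeyValuePair<string, string>(key, rawValue) : f)
              .ToList();

    // Rebuilds the text with one "KEY",value line per field, each ending in "\r\n".
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields) =>
        string.Concat(fields.Select(f => "\"" + f.Key + "\"," + f.Value + "\r\n"));
}
```

A value that is quoted in the file has to be passed with its quotes, e.g. Set(fields, "DATE", "\"12/06/11\""), while ComponentQty takes a bare number.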
