Replace single group via RegEx in all matches - c#

I have a text containing HTML elements, where hyperlinks contain not URLs but IDs of the item the hyperlink should open. Now I'm trying to get all those IDs and replace them with new IDs. The scenario is that all IDs have changed; I have a dictionary mapping "oldId -> newId" and need to apply it to the text.
This input
Some text some text <a href = "##1234"> stuff stuff stuff <a href="##9999"> xxxx
With this Dictionary mapping
1234 -> 100025
9999 -> 100026
Should generate this output
Some text some text <a href = "##100025"> stuff stuff stuff <a href="##100026"> xxxx
So far I have this:
var textContent = "...";
var regex = new Regex(@"<\s*a\s+href\s*=\s*""##(?<RefId>\d+)""\s*\\?\s*>");
var matches = regex.Matches(textContent);
foreach (var match in matches.Cast<Match>())
{
var id = -1;
if (Int32.TryParse(match.Groups["RefId"].Value, out id))
{
int newId;
// idDictionary contains the mapping from old id to new id
if (idDictionary.TryGetValue(id, out newId))
{
// Now replace the id of the current match with the new id
}
}
}
How do I replace the IDs now?

Don't parse HTML with regular expressions.
But if you must, if you're trying to perform a replacement, use the Replace method.
var updatedContent = regex.Replace(textContent, match =>
{
var id = -1;
if (Int32.TryParse(match.Groups["RefId"].Value, out id))
{
int newId;
// idDictionary contains the mapping from old id to new id
if (idDictionary.TryGetValue(id, out newId))
{
// Now replace the id of the current match with the new id
return newId.ToString();
}
}
// No change
return match.Value;
});
Edit: As you've pointed out, this replaces the entire match. Whoops.
Firstly, change your regular expression so the thing you'll be replacing is the entire match:
@"(?<=<\s*a\s+href\s*=\s*""##)(?<RefId>\d+)(?=""\s*\\?\s*>)"
This matches just a string of digits, but ensures it has the HTML tag before and after it.
It should now do what you want, but for tidiness you can replace (?<RefId>\d+) with just \d+ (as you don't need the group any more) and match.Groups["RefId"].Value with just match.Value.
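Putting the lookaround pattern and the callback together, a minimal sketch could look like the following. The class name IdRewriter and the method Rewrite are hypothetical names chosen for illustration, and the mapping is assumed to be an IDictionary<int, int> as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class IdRewriter
{
    // The lookbehind and lookahead require the surrounding <a href="##..."> tag
    // but keep it out of the match, so the match itself is only the digits.
    private static readonly Regex RefIdRegex =
        new Regex(@"(?<=<\s*a\s+href\s*=\s*""##)\d+(?=""\s*\\?\s*>)");

    public static string Rewrite(string text, IDictionary<int, int> idMap)
    {
        return RefIdRegex.Replace(text, match =>
            // Replace the digits only when they parse and a mapping exists;
            // otherwise return the match unchanged.
            int.TryParse(match.Value, out var oldId) && idMap.TryGetValue(oldId, out var newId)
                ? newId.ToString()
                : match.Value);
    }
}
```

Called with the sample input and the mapping 1234 -> 100025, 9999 -> 100026, this should yield the output shown in the question; IDs without a mapping are left as they are.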

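Just use a callback in Replace.
textContent = regex.Replace(textContent, delegate(Match m) {
int id, newId;
Group g = m.Groups["RefId"];
if (Int32.TryParse(g.Value, out id) && idDictionary.TryGetValue(id, out newId)) {
// splice the new id into the match so the rest of the tag is kept
int start = g.Index - m.Index;
return m.Value.Substring(0, start) + newId + m.Value.Substring(start + g.Length);
}
return m.Value; // no mapping found: keep the match unchanged
});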

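Unless you are pulling the new IDs from the HTML as well, I don't see why you can't just use a direct String.Replace here. Be aware that this also rewrites any longer ID that merely starts with an old one (##1234 inside ##12345), and that an ID written in one pass can be replaced again in a later pass if it equals another old ID.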
var html = "Some text some text <a href = '##1234'> stuff stuff stuff <a href='##9999'> xxxx";
var mappings = new Dictionary<string, string>()
{
{ "1234", "100025" },
{ "9999", "100026" },
...
};
foreach (var map in mappings)
{
html = html.Replace("##" + map.Key, "##" + map.Value);
}

Related

Check if a particular string is contained in a list of strings

I'm trying to search a string to see if it contains any of the strings from a list:
var s = driver.FindElement(By.Id("list"));
var innerHtml = s.GetAttribute("innerHTML");
innerHtml is the string I want to search for any of the strings in a list I provide, for example:
var list = new List<string> { "One", "Two", "Three" };
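So if, say, innerHtml contains "One", the output should be Match: One.
You can do this in the following way:
int result = list.IndexOf(innerHtml);
It returns the index of the list item that is equal to innerHtml, or -1 if there is none. Note that this is an exact comparison, so it only finds a match when innerHtml is itself one of the list entries, not when it merely contains one.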
If you want a string output, as mentioned in the question, you may do something like:
if (result != -1)
Console.WriteLine(list[result] + " matched.");
else
Console.WriteLine("No match found");
Another simple way to do this is:
string matchedElement = list.Find(x => x.Equals(innerHtml));
This returns the matched element if there is a match; otherwise it returns null.
See docs for more details.
You can do it with LINQ by applying Contains to innerHtml for each of the items on the list:
var matches = list.Where(item => innerHtml.Contains(item)).ToList();
Variable matches would contain a subset of strings from the list which are matched inside innerHtml.
Note: This approach does not match at word boundaries, which means that you would find a match of "One" when innerHtml contains "Onerous".
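If matching at word boundaries is needed, one possible sketch is to build a regular expression per list item with \b anchors. The names WholeWordMatcher and FindWholeWords are hypothetical, and the approach assumes every list item begins and ends with a word character (letter, digit or underscore), since \b only matches between a word and a non-word character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class WholeWordMatcher
{
    public static List<string> FindWholeWords(string text, IEnumerable<string> candidates)
    {
        return candidates
            // Regex.Escape makes sure characters such as '.' or '?' in a
            // candidate are treated literally rather than as regex syntax.
            .Where(word => Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b"))
            .ToList();
    }
}
```

With this, "One" is reported for a text such as "<li>One</li>" but not for a text that only contains "Onerous".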
foreach(var str in list)
{
if (innerHtml.Contains(str))
{
// match found, do your stuff.
}
}
String.Contains documentation
For those who want to search for any of a list of characters within another list of strings:
// caution: the empty string "" is contained in every string, so that entry always matches
List<string> WildCard = new() { "", "%", "?" };
List<string> PlateNo = new() { "13eer", "rt4444", "45566" };
if (WildCard.Any(x => PlateNo.Any(y => y.Contains(x))))
Console.WriteLine("Plate has wildcard");

How to get Last Index Of '\' or '//', whichever comes last?

I want to get the last index of a separator in a URL that comes from the database; the separator may be '\' or '//'.
For example, I have strings like these:
Administration\Masters\EmployeePulseDetailsMaster.aspx
Administration/Masters/SearchKnowYourCollegues.aspx
Administration//SMS//PushSMS.aspx
I am using this code:
foreach (var item in SessionClass.UserDetails.SubModules)
{
if (Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('\\') + 1))
|| Request.RawUrl.Contains(item.PageURL.Substring(item.PageURL.LastIndexOf('/') + 1)))
{
Response.RedirectPermanent("~/Login.aspx");
}
}
You can use a regular expression to find the last occurrence of any character in a group by constructing a regular expression that looks like this:
[target-group][^target-group]*$
In your case, the target group is [/\\], so the search would look like this:
var match = Regex.Match(s, @"[/\\][^/\\]*$");
Here is a running example:
var data = new[] {
@"quick/brown/fox"
, @"jumps\over\the\lazy\dog"
, @"Administration\Masters\EmployeePulseDetailsMaster.aspx"
, @"Administration/Masters/SearchKnowYourCollegues.aspx"
, @"Administration//SMS//PushSMS.aspx"
};
foreach (var s in data) {
var m = Regex.Match(s, @"[/\\][^/\\]*$");
if (m.Success) {
Console.WriteLine(s.Substring(m.Index+1));
}
}
This prints
fox
dog
EmployeePulseDetailsMaster.aspx
SearchKnowYourCollegues.aspx
PushSMS.aspx
I guess you want to determine whether the name of the current page is in the list of SessionClass.UserDetails.SubModules. Then I'd use Request.Url.Segments.Last() to get only the name of the current page (e.g. PushSMS.aspx) and System.IO.Path.GetFileName to get the name of each URL. On Windows, GetFileName works with / or \:
string pageName = Request.Url.Segments.Last();
bool anyMatch = SessionClass.UserDetails.SubModules
.Any(module => pageName == System.IO.Path.GetFileName(module.PageURL));
if(anyMatch) Response.RedirectPermanent("~/Login.aspx");
You need to add using System.Linq; for Enumerable.Any.
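For the literal question of finding whichever separator comes last, a regex-free sketch is also possible with String.LastIndexOfAny, which returns the last position of any character in a given set. The names UrlHelper and LastSegment below are hypothetical and only serve the illustration.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class UrlHelper
{
    private static readonly char[] Separators = { '/', '\\' };

    public static string LastSegment(string url)
    {
        // LastIndexOfAny returns -1 when no separator is present,
        // in which case Substring(0) returns the whole string.
        int index = url.LastIndexOfAny(Separators);
        return url.Substring(index + 1);
    }
}
```

Because only the last separator matters, doubled separators such as // need no special handling.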

Custom Regex for Parsing Custom Fields in HTML String

I am sending some HTML in a hidden field, and on the server side I parse it with a regex. Currently I am able to parse
<div id="4059">asd</div>
and the code below gives me "id" in match.Groups[2] and "4059" in match.Groups[4], "div" comes at first index and 3rd comes empty.
string regex2 = @"<(?<Tag_Name>(a)|div)\b[^>]*?\b(?<URL_Type>(?(1)id))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')";
var matches = Regex.Matches(myDiv, regex2, RegexOptions.IgnoreCase | RegexOptions.Singleline);
var links = new List<string>();
foreach (Match item in matches)
{
if (item.Groups[2].Value == "div")
{
employee.ID = item.Groups[4].Value;
}
}
Can someone please edit this regex,
<(?<Tag_Name>(a)|div)\b[^>]*?\b(?<URL_Type>(?(1)id))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)')
so that I could parse
<div id="5094" fieldA="asd" fieldB="def" fieldC="ghi"></div>
and the fields could be added too.
I should also mention here that I am working on a custom control and I CAN NOT USE HTML AGILITY PACK as the assemblies conflict as I add this in my project.
If you already know that the string contains only <div field="value" field="value" ...></div> (i.e. there's nothing but this div in the string), then just simplify your regex to pick out the field and value, and run it in a loop:
string regstr = @"\s+(?<field>[^\s=]+)\s*=\s*""(?<value>[^""]+)""";
var reg = new Regex(regstr);
var m = reg.Match(myDiv);
while (m.Success)
{
// m.Groups["field"] and m.Groups["value"] hold your field and value
// get the next match
m = m.NextMatch();
}
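As a rough sketch of how the loop above could collect the fields, the following gathers every attribute into a dictionary. AttributeParser and Parse are hypothetical names; the pattern is the one from this answer, except that it uses * instead of + for the value so that empty attributes are also captured. It still assumes double-quoted values and a string that holds nothing but the single element.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class AttributeParser
{
    private static readonly Regex AttributeRegex =
        new Regex(@"\s+(?<field>[^\s=]+)\s*=\s*""(?<value>[^""]*)""");

    public static Dictionary<string, string> Parse(string element)
    {
        // Attribute names are compared case-insensitively here (an assumption).
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in AttributeRegex.Matches(element))
        {
            // A repeated attribute name overwrites the earlier value.
            result[m.Groups["field"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

New fields added to the div later are picked up without changing the pattern, for example Parse(myDiv)["id"] for the ID.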

extracting the common prefixes from a list of strings

I have a list of strings, such as:
{ abc001, abc002, abc003, cdef001, cdef002, cdef004, ghi002, ghi001 }
I want to get all the common unique prefixes; for example, for the above list:
{ abc, cdef, ghi }
How do I do that?
var list = new List<String> {
"abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001"
};
var prefixes = list.Select(x => Regex.Match(x, @"^[^\d]+").Value).Distinct();
It may be a good idea to write a helper class to represent your data. For example:
public class PrefixedNumber
{
private static Regex parser = new Regex(@"^(\p{L}+)(\d+)$");
public PrefixedNumber(string source) // you may want a static Parse method.
{
Match parsed = parser.Match(source); // think about an error here when it doesn't match
Prefix = parsed.Groups[1].Value;
Index = parsed.Groups[2].Value;
}
public string Prefix { get; set; }
public string Index { get; set; }
}
You need to come up with a better name, of course, and better access modifiers.
Now the task is quite easy:
List<string> data = new List<string> { "abc001", "abc002", "abc003", "cdef001",
"cdef002", "cdef004", "ghi002", "ghi001" };
var groups = data.Select(str => new PrefixedNumber(str))
.GroupBy(prefixed => prefixed.Prefix);
The result is all data, parsed, and grouped by the prefix.
You can achieve that by using a regular expression to select the text part, and then adding that text part to a HashSet<string> so that no duplicates are stored:
using System.Text.RegularExpressions;
//simulate your real list
List<string> myList = new List<string>(new string[] { "abc001", "abc002", "cdef001" });
string pattern = @"^(\D*)\d+$";
// \D* matches any non-digit characters, and \d+ means followed by at least one digit.
// Note: if you also want to capture strings like "abc" alone, not followed by numbers,
// then the pattern will be "^(\D*)$"
Regex regex = new Regex(pattern);
HashSet<string> matchesStrings = new HashSet<string>();
foreach (string item in myList)
{
var match = regex.Match(item);
// Groups.Count reflects the pattern, not the result, so check Success instead
if (match.Success)
{
matchesStrings.Add(match.Groups[1].Value);
}
}
result:
abc, cdef
Assuming that your prefix is all alpha characters and terminated by the first non-alpha character, you could use the following LINQ expression:
List<string> listOfStrings = new List<String>()
{ "abc001d", "abc002", "abc003", "cdef001", "cdef002", "cdef004", "ghi002", "ghi001" };
var prefixes = (from s in listOfStrings
select new string(s.TakeWhile(c => char.IsLetter(c)).ToArray())).Distinct();

Find the matching word C#

I have 3 strings, and I would like to find the ones that match:
http://www.vkeong.com/2011/food-drink/heng-bak-kut-teh-delights-taman-kepong/#comments
http://www.vkeong.com/2009/food-drink/sen-kee-duck-satay-taman-desa-jaya-kepong/
http://www.vkeong.com/2008/food-drink/nasi-lemak-wai-sik-kai-kepong-baru/
for each link above=="nasi-lemak"
{
found!
}
If you're just looking to see if a longer string contains a specific shorter string, use String.Contains.
For your example:
string[] urlStrings = new string[]
{
@"http://www.vkeong.com/2011/food-drink/heng-bak-kut-teh-delights-taman-kepong/#comments",
@"http://www.vkeong.com/2009/food-drink/sen-kee-duck-satay-taman-desa-jaya-kepong",
@"http://www.vkeong.com/2008/food-drink/nasi-lemak-wai-sik-kai-kepong-baru/"
};
foreach(String url in urlStrings)
{
if(url.Contains("nasi-lemak"))
{
//Your code to handle a match here.
}
}
You want the String.IndexOf method.
foreach(string url in url_list)
{
if(url.IndexOf("nasi-lemak") != -1)
{
// Found!
}
}
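One practical reason to reach for IndexOf is that it has an overload taking a StringComparison, which allows a case-insensitive search. The sketch below assumes that is desirable for URLs; UrlSearch and FindContaining are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class UrlSearch
{
    public static List<string> FindContaining(IEnumerable<string> urls, string term)
    {
        return urls
            // IndexOf returns -1 when the term is absent; OrdinalIgnoreCase
            // makes "Nasi-Lemak" and "nasi-lemak" compare as equal.
            .Where(url => url.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

Passing the three URLs from the question together with "nasi-lemak" should return only the last one.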
Surely we also need a LINQ answer :)
var matches = urlStrings.Where(s => s.Contains("nasi-lemak"));
// or if you prefer query form. This is really the same as above
var matches2 = from url in urlStrings
where url.Contains("nasi-lemak")
select url;
// Now you can use matches or matches2 in a foreach loop
foreach (var matchingUrl in matches)
DoStuff(matchingUrl);
