Regex.Replace is replacing more than the matched attribute - C#

I'm trying to match a pattern in an XML string, where the pattern covers the various types of attributes that can be present on any given XML element.
So if the XML string was:
<TEST xlmns="https://www.test.com">
<XXX>Foo</XXX>
<YYY>Bar</YYY>
</TEST>
I want to remove the namespaces above using the pattern .*?(?:[a-z][a-z0-9_]*).*?((?:[a-z][a-z0-9_]*))(=)(\".*?\") in the code below:
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var txt = "<TEST xlmns=\"https://www.test.com\"> <XXX>Foo</XXX> <YYY>Bar</YYY> </TEST>";
const string pattern = ".*?(?:[a-z][a-z0-9_]*).*?((?:[a-z][a-z0-9_]*))(=)(\".*?\")";
var r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
var m = r.Match(txt);
if (m.Success)
{
String var1 = m.Groups[1].ToString();
String c1 = m.Groups[2].ToString();
String string1 = m.Groups[3].ToString();
Console.Write( var1.ToString() + c1.ToString() + string1.ToString() + "\n");
Console.WriteLine(RegExReplace(txt,pattern,""));
}
Console.ReadLine();
}
static String RegExReplace(String input, String pattern, String replacement)
{
if (string.IsNullOrEmpty(input))
return input;
return Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);
}
}
}
But where it matches, in this case <TEST xlmns="https://www.test.com">, the tag is turned into > when it should have been <TEST>.
What have I done wrong in the replace method?

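Your pattern begins with .*?, so the match starts at the beginning of the string and includes <TEST as well as the attribute; Regex.Replace then removes the whole match, leaving only >.
If you just want to remove the namespace, change your regex to:
const string pattern = "xlmns=\".*\"";
If you want to remove all attributes, use this regex (note the doubled backslash; "\w" is not a valid escape in a regular C# string):
const string pattern = "\\w+=\"[^\"]*\"";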
Full code:
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
var txt = "<TEST xlmns=\"https://www.test.com\"> <XXX>Foo</XXX> <YYY>Bar</YYY> </TEST>";
const string pattern = "\\w+=\"[^\"]*\"";
var r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
var m = r.Match(txt);
if (m.Success)
{
// This pattern has no capture groups, so print the whole match instead of Groups[1..3].
Console.Write(m.Value + "\n");
Console.WriteLine(RegExReplace(txt,pattern,""));
}
Console.ReadLine();
}
static String RegExReplace(String input, String pattern, String replacement)
{
if (string.IsNullOrEmpty(input))
return input;
return Regex.Replace(input, pattern, replacement, RegexOptions.IgnoreCase);
}
}
}
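As a minimal sketch of the all-attributes case (not the answerer's code): the hypothetical helper StripAttributes below also consumes the whitespace in front of each attribute, so no stray space is left inside the tag, and it uses [^"]* so a single match cannot run across several quoted values. It assumes double-quoted attribute values and would also remove anything in element text that happens to look like name="value"; for real XML, an XML parser is the safer tool.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeStripper
{
    // Matches: leading whitespace, an attribute name, '=', and a double-quoted value.
    private const string AttributePattern = "\\s+[\\w:.-]+\\s*=\\s*\"[^\"]*\"";

    // Returns the input with every name="value" attribute removed.
    // Only the attribute text is part of the match, so the tag name survives.
    public static string StripAttributes(string xml)
    {
        if (string.IsNullOrEmpty(xml))
            return xml;
        return Regex.Replace(xml, AttributePattern, string.Empty);
    }
}
```

With the txt string from the question, AttributeStripper.StripAttributes(txt) returns <TEST> <XXX>Foo</XXX> <YYY>Bar</YYY> </TEST>.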

Related

C#: count how many times a specific word occurs in a txt file

static void Main(string[] args)
{
StreamReader oReader;
if (File.Exists(@"C:\cmd.txt"))
{
Console.WriteLine("IMAGE");
string cSearforSomething = Console.ReadLine().Trim();
oReader = new StreamReader(@"C:\cmd.txt");
string cColl = oReader.ReadToEnd();
string cCriteria = @"\b" + cSearforSomething + @"\b";
System.Text.RegularExpressions.Regex oRegex = new System.Text.RegularExpressions.Regex(cCriteria, RegexOptions.IgnoreCase);
int count = oRegex.Matches(cColl).Count;
Console.WriteLine(count.ToString());
}
Console.ReadLine();
}
I can't count how many times the string "IMAGE" occurs in my file. Is my code wrong?
Try this code
public static void Main()
{
var str = File.ReadAllText(@"C:\cmd.txt");
var searchTerm = "IMAGE";
var matches = Regex.Matches(str, @"\b" + searchTerm + @"\b", RegexOptions.IgnoreCase);
Console.WriteLine(matches.Count);
Console.ReadLine();
}
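According to https://msdn.microsoft.com/en-us/library/az24scfc.aspx, \b matches a backspace only inside a character class; anywhere else in a pattern it is a word boundary. In a regular (non-verbatim) C# string literal, however, "\b" is a backspace character, so the pattern has to be written as a verbatim string (@"\b") or with the backslash escaped ("\\b"). That might explain why the regular expression isn't matching.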
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication8
{
class Program
{
static void Main(string[] args)
{
string cColl = System.IO.File.ReadAllText(@"C:\some.txt");
//string cColl = "This is similar, similar, similar, similar, similar, similar";
Console.WriteLine(cColl);
string cCriteria = @"\b" + "similar" + @"\b";
Regex oRegex = new Regex(cCriteria, RegexOptions.IgnoreCase);
int count = oRegex.Matches(cColl).Count;
Console.WriteLine(count.ToString());
Console.ReadLine();
}
}
}
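One thing the snippets above leave open is what happens when the search word comes from user input and contains regex metacharacters such as . or +. A hedged sketch, with the hypothetical helper name CountWord, that escapes the word before wrapping it in word boundaries:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Counts whole-word, case-insensitive occurrences of 'word' in 'text'.
    // Regex.Escape makes the word match literally; the verbatim strings keep
    // \b as the regex word-boundary anchor rather than a backspace character.
    public static int CountWord(string text, string word)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(word))
            return 0;
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

For a file, this could be called as WordCounter.CountWord(File.ReadAllText(path), "IMAGE"), where path is whatever file you are searching.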

Remove accents from a text file

I have issues removing accents from a text file: the program replaces characters that have diacritics with ?. Here is my code:
private void button3_Click(object sender, EventArgs e)
{
if (radioButton3.Checked)
{
byte[] tmp;
tmp = System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(richTextBox1.Text);
richTextBox2.Text = System.Text.Encoding.UTF8.GetString(tmp);
}
}
Taken from here: https://stackoverflow.com/a/249126/3047078
static string RemoveDiacritics(string text)
{
var normalizedString = text.Normalize(NormalizationForm.FormD);
var stringBuilder = new StringBuilder();
foreach (var c in normalizedString)
{
var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
if (unicodeCategory != UnicodeCategory.NonSpacingMark)
{
stringBuilder.Append(c);
}
}
return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}
usage:
string result = RemoveDiacritics("včľťšľžšžščýščýťčáčáčťáčáťýčťž");
results in vcltslzszscyscytcacactacatyctz
richTextBox1.Text = "včľťšľžšžščýščýťčáčáčťáčáťýčťž";
string text1 = richTextBox1.Text.Normalize(NormalizationForm.FormD);
string pattern = @"\p{M}";
string text2 = Regex.Replace(text1, pattern, "");
richTextBox2.Text = text2;
First normalize the string.
Then with a regular expression replace all diacritics. Pattern \p{M} is Unicode Category - All diacritic marks.
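The two steps just described can be wrapped in one method. This is only a sketch: the name StripMarks is made up here, and the final recomposition to FormC is borrowed from the RemoveDiacritics answer above rather than from this one.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class DiacriticStripper
{
    // Step 1: decompose each accented character into base letter + combining marks (FormD).
    // Step 2: delete every character in Unicode category M (marks).
    // Step 3: recompose whatever is left (FormC).
    public static string StripMarks(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;
        string decomposed = text.Normalize(NormalizationForm.FormD);
        string withoutMarks = Regex.Replace(decomposed, @"\p{M}", string.Empty);
        return withoutMarks.Normalize(NormalizationForm.FormC);
    }
}
```

Letters that are not a base letter plus a combining mark, such as ø or ß, pass through unchanged, because normalization does not decompose them.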

How to check if a string contains substring with wildcard? like abc*xyz

When I parse lines in text file, I want to check if a line contains abc*xyz, where * is a wildcard. abc*xyz is a user input format.
You can generate a Regex from the wildcard pattern and match with it:
string searchPattern = "abc*xyz";
string inputText = "SomeTextAndabc*xyz";
public bool Contains(string searchPattern,string inputText)
{
string regexText = WildcardToRegex(searchPattern);
Regex regex = new Regex(regexText , RegexOptions.IgnoreCase);
if (regex.IsMatch(inputText ))
{
return true;
}
return false;
}
public static string WildcardToRegex(string pattern)
{
// No ^ and $ anchors: the pattern may match anywhere in the line ("contains").
return Regex.Escape(pattern)
.Replace(@"\*", ".*")
.Replace(@"\?", ".");
}
Here is the source
and Here is a similar issue
If asterisk is the only wildcard character that you wish to allow, you could replace all asterisks with .*?, and use regular expressions:
var filter = "[quick*jumps*lazy dog]";
var parts = filter.Split('*').Select(s => Regex.Escape(s)).ToArray();
var regex = string.Join(".*?", parts);
This produces \[quick.*?jumps.*?lazy\ dog] regex, suitable for matching inputs.
Demo.
Use Regex
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string prefix = "abc";
string suffix = "xyz";
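// Escape the literal parts so that regex metacharacters in them are matched literally.
string pattern = string.Format("{0}.*{1}", Regex.Escape(prefix), Regex.Escape(suffix));
string input = "abc123456789xyz";
bool results = Regex.IsMatch(input, pattern);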
}
}
}

Replace string after at with another string

I have two strings.
First string:
"31882757623"<sip:+31882757623#asklync.nl;user=phone>;epid=5440626C04;tag=daa784a738
Second string:vandrielfinance.nl
I want to replace asklync.nl, the part of the first string that follows the #, with the second string (vandrielfinance.nl). Everything else stays the same.
So the new string will be:
"31882757623"<sip:+31882757623#vandrielfinance.nl;user=phone>;epid=5440626C04;tag=daa784a738
Here is what I have so far:
static string ReplaceSuffix(string orginal, string newString)
{
string TobeObserved = "#";
orginal = "\"31882757623\"<sip:+31882757623#asklync.nl;user=phone>;epid=5440626C04;tag=daa784a738";
string second = "vandrielfinance.nl";
string pattern = second.Substring(0, second.LastIndexOf("#") + 1);
string code = orginal.Substring(orginal.IndexOf(TobeObserved) + TobeObserved.Length);
//newString = Regex.Replace(code,second, pattern);
newString = Regex.Replace(second, orginal, pattern);
string hallo = orginal.Replace(newString, second);
Console.Write("Original String: {0}", orginal);
Console.Write("\nReplacement String: \n{0}", newString);
Console.WriteLine("\n" + code);
return newString;
}
Why not string.Replace?
string s = "\"31882757623\"<sip:+31882757623#asklync.nl;user=phone>;epid=5440626C04;tag=daa784a738";
string t = "vandrielfinance.nl";
string u = s.Replace("asklync.nl", t);
Console.WriteLine(u);
I'm not really a fan of string.Split(), but it made for quick work in this case:
static string ReplaceSuffix(string original, string newString)
{
var segments = original.Split(";".ToCharArray());
var segments2 = segments[0].Split("#".ToCharArray());
segments2[1] = newString;
segments[0] = string.Join("#", segments2);
var result = string.Join(";", segments);
Console.WriteLine("Original String:\n{0}\nReplacement String:\n{1}", original, result);
return result;
}
If the original domain will really always be asklync.nl, you may even be able to just do this:
static string ReplaceSuffix(string original)
{
var oldDomain = "asklync.nl";
var newDomain = "vandrielfinance.nl";
var result = original.Replace(oldDomain, newDomain);
Console.WriteLine("Original String:\n{0}\nReplacement String:\n{1}", original, result);
return result;
}
This should work
var orginal = "\"31882757623\"<sip:+31882757623#asklync.nl;user=phone>;epid=5440626C04;tag=daa784a738";
string second = "vandrielfinance.nl";
var returnValue = string.Empty;
var split = orginal.Split('#');
if (split.Length > 1)
{
var findFirstSemi = split[1].IndexOf(";");
var restOfString = split[1].Substring(findFirstSemi, split[1].Length - findFirstSemi);
returnValue = split[0] + "#" + second + restOfString;
}
Console.WriteLine("Original String:");
Console.WriteLine("{0}", orginal);
Console.WriteLine("Replacement String:");
Console.WriteLine("{0}", returnValue);
//return returnValue;
I'm not a huge fan of RegEx or string.Split, especially when a string function already exists to replace a portion of a string.
string orginal = "\"31882757623\"<sip:+31882757623#asklync.nl;user=phone>;epid=5440626C04;tag=daa784a738";
string second = "vandrielfinance.nl";
int start = orginal.IndexOf("#") + 1; // first character after the '#', so the '#' itself is kept
int end = orginal.IndexOf(";", start);
string newString = orginal.Replace(orginal.Substring(start, end - start), second);
Console.WriteLine(orginal);
Console.WriteLine(newString);
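A variation on the index-based idea, sketched under the assumption that the part to replace always runs from the first # to the next ; (or to the end of the string when there is no ;). The helper name ReplaceDomain and its marker parameter are made up for this example. It rebuilds the string from the pieces around the domain instead of calling Replace, so a second occurrence of the old domain elsewhere in the string would stay untouched, and it returns the input unchanged when the marker is missing.

```csharp
using System;

public static class SipDomain
{
    // Replaces the text between the first 'marker' and the following ';'.
    public static string ReplaceDomain(string original, string newDomain, char marker = '#')
    {
        if (string.IsNullOrEmpty(original))
            return original;

        int markerIndex = original.IndexOf(marker);
        if (markerIndex < 0)
            return original; // nothing to replace

        int start = markerIndex + 1;            // first character of the old domain
        int end = original.IndexOf(';', start); // first ';' after the marker
        if (end < 0)
            end = original.Length;              // domain runs to the end of the string

        return original.Substring(0, start) + newDomain + original.Substring(end);
    }
}
```

Calling SipDomain.ReplaceDomain(orginal, second) with the two strings from the question yields the expected result string shown there.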

Remove words from string c#

I am working on an ASP.NET 4.0 web application. Its main job is to go to the URL in the MyURL variable, read the page from top to bottom, find all lines that start with "description", and keep only those while removing all HTML tags. What I want to do next is remove the "description" text from the results afterwards, so that only my device names are left. How would I do this?
protected void parseButton_Click(object sender, EventArgs e)
{
MyURL = deviceCombo.Text;
WebRequest objRequest = HttpWebRequest.Create(MyURL);
objRequest.Credentials = CredentialCache.DefaultCredentials;
using (StreamReader objReader = new StreamReader(objRequest.GetResponse().GetResponseStream()))
{
originalText.Text = objReader.ReadToEnd();
}
//Read all lines of file
String[] crString = { "<BR> " };
String[] aLines = originalText.Text.Split(crString, StringSplitOptions.RemoveEmptyEntries);
String noHtml = String.Empty;
for (int x = 0; x < aLines.Length; x++)
{
if (aLines[x].Contains(filterCombo.SelectedValue))
{
noHtml += (RemoveHTML(aLines[x]) + "\r\n");
}
}
//Print results to textbox
resultsBox.Text = String.Join(Environment.NewLine, noHtml);
}
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
Ok so I figured out how to remove the words through one of my existing functions:
public static string RemoveHTML(string text)
{
text = text.Replace("&nbsp;", " ").Replace("<br>", "\n").Replace("description", "").Replace("INFRA:CORE:", "")
.Replace("RESERVED", "")
.Replace(":", "")
.Replace(";", "")
.Replace("-0/3/0", "");
var oRegEx = new System.Text.RegularExpressions.Regex("<[^>]+>");
return oRegEx.Replace(text, string.Empty);
}
public static void Main(String[] args)
{
string str = "He is driving a red car.";
Console.WriteLine(str.Replace("red", "").Replace("  ", " "));
}
Output:
He is driving a car.
Note: In the second Replace, the first argument is a double space.
Link : https://i.stack.imgur.com/rbluf.png
Try this. It will remove all occurrences of the word you want to remove.
Try something like this, using LINQ:
List<string> lines = new List<string>{
"Hello world",
"Description: foo",
"Garbage:baz",
"description purple"};
//now add all your lines from your html doc.
if (aLines[x].Contains(filterCombo.SelectedValue))
{
lines.Add(RemoveHTML(aLines[x]) + "\r\n");
}
var myDescriptions = lines.Where(x=>x.ToLower().StartsWith("description"))
.Select(x=> x.ToLower().Replace("description",string.Empty)
.Trim());
// you now have "foo" and "purple", and anything else.
You may have to adjust for colons, etc.
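One possible way to handle the colon adjustment mentioned above, as a sketch rather than a drop-in replacement: the hypothetical helper ExtractDeviceNames keeps the original casing (no ToLower), compares the prefix case-insensitively, and trims a leading colon and spaces from what remains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DescriptionFilter
{
    // Keeps only lines that start with "description" (any casing) and returns
    // the rest of each line with the prefix, a leading ':' and spaces removed.
    public static List<string> ExtractDeviceNames(IEnumerable<string> lines)
    {
        const string prefix = "description";
        return lines
            .Select(line => line.Trim())
            .Where(line => line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Select(line => line.Substring(prefix.Length).TrimStart(':', ' ').Trim())
            .ToList();
    }
}
```

For the four sample lines in the list above, this returns "foo" and "purple".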
void Main()
{
string test = "<html>wowzers description: none <div>description:a1fj391</div></html>";
IEnumerable<string> results = getDescriptions(test);
foreach (string result in results)
{
Console.WriteLine(result);
}
//result: none
// a1fj391
}
static Regex MyRegex = new Regex(
"description:\\s*(?<value>[\\d\\w]+)",
RegexOptions.Compiled);
IEnumerable<string> getDescriptions(string html)
{
foreach(Match match in MyRegex.Matches(html))
{
yield return match.Groups["value"].Value;
}
}
Adapted From Code Project
string value = "ABC - UPDATED";
int index = value.IndexOf(" - UPDATED");
if (index != -1)
{
value = value.Remove(index);
}
This leaves value as ABC, without - UPDATED.
