C# Regex Replace Greek Letters - c#

I'm getting a string like "thetaetaA" (theta eta A).
I need to turn the received string into {\theta}{\eta}A
// C# code with regex to match greek letters
string gl = "alpha|beta|delta|theta|eta";
string received = "thetaetaA";
var greekLetters = Regex.Matches(received, gl);
Could someone please tell me how to produce the required text
{\theta}{\eta}A
If I loop over the letters and do a replace for each one, it generates the following output:
{\th{\eta}}{\eta}A
because "theta" contains "eta".

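Regex.Matches() doesn't replace anything; use Regex.Replace(). Capture the word and reference the capture in the replacement, adding the special characters around it. Putting superstrings before substrings in the alternation is not required here: the engine scans left to right, so "theta" is matched and consumed at position 0 before the "eta" inside it can be matched on its own. Order only matters when one alternative is a prefix of another, because .NET tries the alternatives in the order written rather than picking the longest.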
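using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string gl = "alpha|beta|delta|theta|eta";
        string received = "thetaetaA";
        // $1 refers to the captured letter name; the backslash and braces are literal.
        string texified = Regex.Replace(received, $"({gl})", @"{\$1}");
        Console.WriteLine(texified);
        Console.ReadKey();
    }
}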
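If the list of names ever grows to include one that is a prefix of another, sorting the alternatives longest-first keeps the match unambiguous. The following is only a sketch: the class name GreekTexifier and the method Texify are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreekTexifier
{
    // Hypothetical helper: wraps every listed name found in the input as {\name}.
    public static string Texify(string input, params string[] names)
    {
        // Longest names first, so a name that is a prefix of another cannot shadow it.
        string alternation = string.Join("|",
            names.OrderByDescending(n => n.Length).Select(n => Regex.Escape(n)));
        // In a .NET replacement pattern only $ is special; the backslash stays literal.
        return Regex.Replace(input, "(" + alternation + ")", @"{\$1}");
    }

    public static void Main()
    {
        Console.WriteLine(Texify("thetaetaA", "alpha", "beta", "delta", "theta", "eta"));
        // prints {\theta}{\eta}A
    }
}
```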

Related

Replace all non-supported chars with a space

I need to accomplish the following. I have a list of allowed chars (this is for QB Issues with special characters in QBO API v3 .NET SDK)
var goodChars = "ABCD...abcd...~_-...";
string Sanitize(string input)
{
// TODO: Need to take input and replace all chars not included in "goodChars" with a space
}
I know how to find bad chars with a regex, but this is the reverse problem: I don't need to look at the matches. I need to look at what does not match and replace only those characters.
string Sanitize(string input)
{
return new string(input.Select(x => goodChars.Contains(x)?x:' ').ToArray());
}
And as vc 74 suggests, it's better to keep goodChars in a HashSet<char> instead of a string for faster lookups.
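A minimal sketch of the HashSet<char> variant follows. The allow-list here is a short placeholder (the real list is elided in the question), and the class name Sanitizer is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sanitizer
{
    // Placeholder allow-list for illustration only; substitute the real set of good chars.
    private static readonly HashSet<char> GoodChars = new HashSet<char>("ABCDabcd~_-");

    public static string Sanitize(string input)
    {
        // Keep each allowed char, replace every other char with a space.
        return new string(input.Select(c => GoodChars.Contains(c) ? c : ' ').ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("Ab*c!d")); // prints "Ab c d"
    }
}
```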
You can use a Regex with a negative pattern
const string pattern = "[^A-Za-z~_-]";
var regex = new Regex(pattern);
string sanitized = regex.Replace(input, " ");
Note that if this code is used frequently, you can store the regex in a static member to avoid recreating (and recompiling) for each invocation.
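One way to store the regex in a static member might look like this. It is a sketch, assuming the same pattern as above; the class name RegexSanitizer is hypothetical, and RegexOptions.Compiled is optional and only tends to pay off for heavily used patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSanitizer
{
    // Built once per process instead of once per call.
    private static readonly Regex BadChars =
        new Regex("[^A-Za-z~_-]", RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        return BadChars.Replace(input, " ");
    }

    public static void Main()
    {
        Console.WriteLine(Sanitize("a1b$c")); // prints "a b c"
    }
}
```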

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
I'm trying to extract the B002755TC0 value, but the string varies in length, so I cannot simply use C#'s Substring method with fixed offsets to extract that value.
Instead, I was wondering whether there is a clever way to do this, perhaps by matching the part of the string that precedes what I'm looking for?
For example, I know for a fact that each href has the structure I've shown, so I would simply match this keyword:
offer-listing/
So I would find this keyword and then extract the part of the string that follows it (B002755TC0) up to the next "/" sign?
Can someone help me out with this?
This is a perfect job for a regular expression:
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation: we just match the exact pattern you need:
'offer-listing/'
followed by one or more 'word characters' (letters, digits and underscore; note that \w does not match a hyphen),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you also want to extract from this: /dp/B01KRHBT9Q/
then you could use this pattern:
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous one. The $ stands for the end of the string, so this literally means:
capture the characters in between the last two slashes of the string
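When the href might not have the expected shape, it is worth checking Match.Success before reading the group. Here is a small sketch using the second pattern; the class name AsinExtractor and the method ExtractLastSegment are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsinExtractor
{
    private static readonly Regex LastSegment = new Regex(@"/(\w+)/$");

    // Returns the text between the last two slashes, or null when the href does not fit.
    public static string ExtractLastSegment(string href)
    {
        Match m = LastSegment.Match(href);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractLastSegment("gp/offer-listing/B002755TC0/")); // B002755TC0
        Console.WriteLine(ExtractLastSegment("/dp/B01KRHBT9Q/"));              // B01KRHBT9Q
    }
}
```

Without the Success check, match.Groups[1].Value is simply an empty string on a failed match, which is easy to mistake for a real (empty) value.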
Though there is already an accepted answer, I thought I'd share another solution that doesn't use Regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int beginning = input.IndexOf(pat) + pat.Length;
int end = input.IndexOf("/", beginning);
string result = input.Substring(beginning, end - beginning);
If your desired output is always the last piece, you could also split and take the last non-empty piece (Last() requires using System.Linq):
string result2 = input.Split(new string[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
    .Last();

Replace exact matching words containing special characters

I came across How to search and replace exact matching strings only. However, it doesn't work when a word starts with #. My fiddle is here: https://dotnetfiddle.net/9kgW4h
string textToFind = string.Format(@"\b{0}\b", "#bob");
Console.WriteLine(Regex.Replace("#bob!", textToFind, "me"));// "#bob!" instead of "me!"
In addition, if a word starts with \# (for example \#myname) and I try to find and replace #myname, the replacement should not happen.
I suggest replacing the leading and trailing word boundaries with unambiguous lookaround-based boundaries that will require whitespace chars or start/end of string on both ends of the search word, (?<!\S) and (?!\S). Besides, you need to use $$ in the replacement pattern to replace with a literal $.
I suggest:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string text = @"It is #google.com or #google w#google \#google \\#google";
        string result = SafeReplace(text, "#google", "some domain", true);
        Console.WriteLine(result);
    }
    public static string SafeReplace(string input, string find, string replace, bool matchWholeWord)
    {
        // Whole-word mode: require whitespace or start/end of string on both sides.
        string textToFind = matchWholeWord ? string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(find)) : find;
        // Double every $ so the replacement is inserted literally.
        return Regex.Replace(input, textToFind, replace.Replace("$", "$$"));
    }
}
See the C# demo.
The Regex.Escape(find) is only necessary if you expect special regex metacharacters in the find variable value.
The regex demo is available at regexstorm.net.
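The reason \b fails is that a word boundary needs a word character on exactly one side, and neither a space (or the start of the string) nor # is a word character. The sketch below contrasts the two boundary styles; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // \b on both sides: never matches a term that begins with a non-word char after whitespace.
    public static string ReplaceWithWordBoundary(string input, string find, string replacement)
    {
        return Regex.Replace(input, @"\b" + Regex.Escape(find) + @"\b",
            replacement.Replace("$", "$$"));
    }

    // Whitespace-or-edge on both sides, as suggested in the answer above.
    public static string ReplaceWithWhitespaceBoundary(string input, string find, string replacement)
    {
        return Regex.Replace(input, @"(?<!\S)" + Regex.Escape(find) + @"(?!\S)",
            replacement.Replace("$", "$$"));
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceWithWordBoundary("say #bob now", "#bob", "me"));        // say #bob now
        Console.WriteLine(ReplaceWithWhitespaceBoundary("say #bob now", "#bob", "me"));  // say me now
        Console.WriteLine(ReplaceWithWhitespaceBoundary(@"say \#bob now", "#bob", "me")); // say \#bob now
    }
}
```

Note that the whitespace-based boundaries also leave "#bob!" untouched, because the trailing "!" is a non-whitespace character.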

Simple regex question C#

I need to match the string that is shown in the window displayed below :
8% of setup_av_free.exe from software-files-l.cnet.com Completed
98% of test.zip from 65.55.72.119 Completed
[numeric]%of[filename]from[hostname | IP address]Completed
I have written the regex pattern halfway
if (Regex.IsMatch(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s]"))
MessageBox.Show(text);
and I now need to integrate the following regex into my code above
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
ValidHostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
The two regexes were taken from this link. They work well when I use Regex.IsMatch to match "123.123.123.123" and "software-files-l.cnet.com". However, I cannot get them to work when I integrate both of them into my existing regex code. I tried several variants without success. Can someone guide me on integrating the two regexes into my existing code? Thanks in advance.
You can certainly combine all these regular expressions into one, but I'd recommend against it. Consider this method: first it checks whether your input text has the correct overall form, then it checks whether the "from" part is an IP address or a hostname.
bool CheckString(string text) {
    const string ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
    const string ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
    // Group 3 (\S+) captures the address that follows "from".
    var match = Regex.Match(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s](\S+)");
    if(!match.Success)
        return false;
    string address = match.Groups[3].Value;
    return Regex.IsMatch(address, ValidIpAddressRegex) ||
           Regex.IsMatch(address, ValidHostnameRegex);
}
It does what you want and is much more readable than a single monster-sized regular expression. Unless you are going to call this method millions of times in a loop, there is no reason to worry about it being less performant than a single regex.
Also, in case you aren't aware, the brackets around \d and \s aren't necessary.
The reason those two regexes do not match your string is that they start with ^ and end with $.
^ matches the start of the string (or of a line if the m modifier, RegexOptions.Multiline in .NET, is active)
$ matches the end of the string (or of a line if the m modifier is active)
When you test them on an address by itself, the address is the whole string, so they match; in your real text the address sits in the middle of the string, so they do not.
Try removing the ^ at the very beginning and the $ at the very end.
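The effect of the anchors can be checked directly. This sketch reuses the IP pattern from the question with and without ^ and $; the class name AnchorDemo and its members are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Body of the IP address regex from the question, without ^ and $.
    public const string IpBody =
        @"(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    public static bool MatchesAnchored(string text)
    {
        return Regex.IsMatch(text, "^" + IpBody + "$");
    }

    public static bool MatchesUnanchored(string text)
    {
        return Regex.IsMatch(text, IpBody);
    }

    public static void Main()
    {
        string line = "98% of test.zip from 65.55.72.119 Completed";
        Console.WriteLine(MatchesAnchored("65.55.72.119")); // True: the address is the whole string
        Console.WriteLine(MatchesAnchored(line));           // False: the address is in the middle
        Console.WriteLine(MatchesUnanchored(line));         // True: anchors removed
    }
}
```

Once the anchors are gone, the surrounding pattern (the whitespace before the address and " Completed" after it) is what keeps the address from matching only part of a longer token.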
Here you go.
^\d+%\s+of\s+(.+?)(\.[^.]*)\s+from\s+((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])|((([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])))\s+Completed
Remove the ^ and $ characters from the ValidIpAddressRegex and ValidHostnameRegex samples above, and add them separated by the or character (|) enclosed by parentheses.
You could use this; it should work for all cases. I might've accidentally deleted a character while formatting, so let me know if it doesn't work.
string captureString = "8% of setup_av_free.exe from software-files-l.cnet.com Completed";
Regex reg = new Regex(@"(?<perc>\d+)% of (?<file>\w+\.\w+) from (?<host>" +
    @"(\d+\.\d+\.\d+\.\d+)|(((https?|ftp|gopher|telnet|file|notes|ms-help):" +
    @"((//)|(\\\\))+)?[\w\d:#@%/;$()~_?\+-=\\\.&]*)) Completed");
Match m = reg.Match(captureString);
string perc = m.Groups["perc"].Value;
string file = m.Groups["file"].Value;
string host = m.Groups["host"].Value;

Convert ASCII hex codes to character in "mixed" string

Is there a method, or a way to convert a string with a mix of characters and ASCII hex codes to a string of just characters?
e.g. if I give it the input Hello\x26\x2347\x3bWorld it will return Hello/World?
Thanks
Quick and dirty:
static void Main(string[] args)
{
    // Two hex digits after \x; [0-9] alone would miss codes such as \x3b.
    Regex regex = new Regex(@"\\x[0-9A-Fa-f]{2}");
    string s = @"Hello\x26\x2347\x3bWorld";
    var matches = regex.Matches(s);
    foreach(Match match in matches)
    {
        s = s.Replace(match.Value, ((char)Convert.ToByte(match.Value.Replace(@"\x", ""), 16)).ToString());
    }
    Console.WriteLine(s); // Hello&#47;World
    Console.Read();
}
And use HttpUtility.HtmlDecode to decode the resulting string.
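Both steps can be combined with a match evaluator, which avoids the separate loop. This is a sketch rather than the answerer's code: it uses System.Net.WebUtility.HtmlDecode so that no reference to System.Web is needed, and the class name HexEscapeDecoder is an assumption.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HexEscapeDecoder
{
    // \x followed by exactly two hex digits; the digits are captured in group 1.
    private static readonly Regex HexEscape = new Regex(@"\\x([0-9A-Fa-f]{2})");

    public static string Decode(string input)
    {
        // Step 1: turn each \xNN into the character with that code.
        string unescaped = HexEscape.Replace(input,
            m => ((char)Convert.ToByte(m.Groups[1].Value, 16)).ToString());
        // Step 2: decode HTML entities such as &#47; in the result.
        return WebUtility.HtmlDecode(unescaped);
    }

    public static void Main()
    {
        Console.WriteLine(Decode(@"Hello\x26\x2347\x3bWorld")); // Hello/World
    }
}
```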
I'm not sure about those specific character codes but you might be able to do some kind of regex to find all the character codes and convert only them. Though if the character codes can be varying lengths it might be difficult to make sure that they don't get mixed up with any normal numbers/digits in the string.
