Convert ASCII hex codes to character in "mixed" string - c#

Is there a method, or a way to convert a string with a mix of characters and ASCII hex codes to a string of just characters?
e.g. if I give it the input Hello\x26\x2347\x3bWorld it will return Hello/World?
Thanks

Quick and dirty:
static void Main(string[] args)
{
Regex regex = new Regex(#"\\x[0-9]{2}");
string s = #"Hello\x26\x2347World";
var matches = regex.Matches(s);
foreach(Match match in matches)
{
s = s.Replace(match.Value, ((char)Convert.ToByte(match.Value.Replace(#"\x", ""), 16)).ToString());
}
Console.WriteLine(s);
Console.Read();
}
And use HttpUtility.HtmlDecode to decode the resulting string.

I'm not sure about those specific character codes but you might be able to do some kind of regex to find all the character codes and convert only them. Though if the character codes can be varying lengths it might be difficult to make sure that they don't get mixed up with any normal numbers/digits in the string.

Related

Convert special character in string into a char

I have this string:
string specialCharacterString = #"\n";
where "\n" is the new line special character.
Is it possible convert/assign that string (of two characters) into a (single) char. How do I do something like:
char specialCharacter = Parse(specialCharacterString);
Where specialCharacter value would be equal to \n
Is there anything in dotnet that would parse the string for me or must I use if or switch the string (the string can contain any special character) to accomplish what I want. Note that char.Parse(string) cannot handle special characters and thinks the string above is actually two characters.
Maybe I am oversimplifying but can't you just do the following:
txtString.Replace("\n", "$");
It is technically a string to string replacement but would be string to char...
You can always cast it to a char since you know what char you are replacing the string with.
Not sure, what business need it is, but if you need parsing C# in C# you can use some tools like Antlr, which supports C# grammar (https://github.com/antlr/grammars-v4/)
I don't think there is any ready tool designed just for strings
Try use Regex.Unescape(specialCharacterString);
It will return the new string with escape characters.
For example:
var literalStringWithEscapeCharacters = #"Hello\tWorld";
var stringWithEscapeCharacters = Regex.Unescape(literalStringWithEscapeCharacters);
Console.WriteLine(stringWithEscapeCharacters);
Will print: Hello World
Instead of: Hello\tWorld
Then you can find escape characters in stringWithEscapeCharacters like this:
var escapeChars= new [] { '\n' };
var characters = stringWithEscapeCharacters.Where(c => escapeChars.Contains(c)).ToList();
All escape characters described here:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#string-escape-sequences

C# Regex Replace Greek Letters

I'm getting string like "thetaetaA" (theta eta A)
I need to replace the recived string like {\theta}{\eta}A
// C# code with regex to match greek letters
string gl = "alpha|beta|delata|theta|eta";
string recived = "thetaetaA";
var greekLetters = Regex.Matches(recived,gl);
could someone please tell how can I create the required text
{\theta}{\eta}A
if I use loop and do a replace it generate following out put
{\th{\eta}}{\eta}A
because theta included eta
Regex.Matches() doesn't replace anything. Use Regex.Replace(). Capture the words and reference the capture in the replacement adding the special characters around it. (And possibly have the superstrings before the substrings in the alternation. Though it works either way for me. Supposedly it's a greedy match anyway.)
class Program
{
static void Main(string[] args)
{
string gl = "alpha|beta|delta|theta|eta";
string received = "thetaetaA";
string texified = Regex.Replace(received, $"({gl})", #"{\$1}");
Console.WriteLine(texified);
Console.ReadKey();
}
}

get an special Substring in c#

I need to extract a substring from an existing string. This String starts with uninteresting characters (include "," "space" and numbers) and ends with ", 123," or ", 57," or something like this where the numbers can change. I only need the Numbers.
Thanks
public static void Main(string[] args)
{
string input = "This is 2 much junk, 123,";
var match = Regex.Match(input, #"(\d*),$"); // Ends with at least one digit
// followed by comma,
// grab the digits.
if(match.Success)
Console.WriteLine(match.Groups[1]); // Prints '123'
}
Regex to match numbers: Regex regex = new Regex(#"\d+");
Source (slightly modified): Regex for numbers only
I think this is what you're looking for:
Remove all non numeric characters from a string using Regex
using System.Text.RegularExpressions;
...
string newString = Regex.Replace(oldString, "[^.0-9]", "");
(If you don't want to allow the decimal delimiter in the final result, remove the . from the regular expression above).
Try something like this :
String numbers = new String(yourString.TakeWhile(x => char.IsNumber(x)).ToArray());
You can use \d+ to match all digits within a given string
So your code would be
var lst=Regex.Matches(inp,reg)
.Cast<Match>()
.Select(x=x.Value);
lst now contain all the numbers
But if your input would be same as provided in your question you don't need regex
input.Substring(input.LastIndexOf(", "),input.LastIndexOf(","));

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(#"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;
class Program
{
public static void Main()
{
Regex regex = new Regex("\"\\p{Lu}+\"");
string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
Match match = regex.Match(text);
Console.WriteLine(match.Success); // True
Console.WriteLine(match.Value); // "ASRQ"
}
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"

Regular expressions match not working accurately with semicolon

I have a string of codes like:
0926;0941;0917;0930;094D;
I want to search for: 0930;094D; in the above string. I am using this code to find a string fragment:
static bool ExactMatch(string input, string match)
{
return Regex.IsMatch(input, string.Format(#"\b{0}\b", Regex.Escape(match)));
}
The problem is that sometimes the code works and sometimes not. If I match a single code for example: 0930; , it works but when I add 094D; , it skips match.
How to refine the code to work accurately with semicolons?
Try this, I have tested..
string val = "0926;0941;0917;0930;094D;";
string match = "0930;094D;"; // or match = "0930;" both found
if (Regex.IsMatch(val,match))
Console.Write("Found");
else Console.Write("Not Found");
"\b" denotes a word boundary, which is in between a word and a non-word character. Unfortunately, a semi-colon is not a word character. There is no "\b" at the end of "0926;0941;0917;0930;094D;" thus the Regex shows no match.
Why not just remove the last "\b" in your Regex?
Perhaps I'm not understanding your situation correctly; but if you're looking for an exact match within the string, couldn't you simply avoid regex and use string.Contains:
static bool ExactMatch(string input, string match)
{
return input.Contains(match);
}

Categories