Replacing a double backslash with a single backslash - C#

In my C# application I want to convert Unicode escape sequences in a string to the characters they represent.
My input string is "G\u00f6teborg" and I want the output to be Göteborg.
I am using the code below:
string name = "G\\u00f6teborg";
StringBuilder sb = new StringBuilder(name);
sb = sb.Replace(@"\\", @"\");
string name1 = System.Web.HttpUtility.HtmlDecode(sb.ToString());
Console.WriteLine(name1);
In the above code the double backslash stays the same; it is not replaced by a single backslash, so after decoding I get the output G\u00f6teborg.
Please help me find a solution for this.
Thanks in advance.

string name = "G\\u00f6teborg";
Just remove one of the backslashes:
string name = "G\u00f6teborg";
If you got the input from a user then you need to do more: it’s not enough to replace a backslash because that’s not how the characters are stored internally, the \uXXXX is an escape sequence representing a Unicode code point.
If you want to replace a user input escape sequence by a Unicode code point you need to parse the user input properly. You can use a regular expression for that:
MatchEvaluator replacer = m => ((char) int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)).ToString();
string result = Regex.Replace(name, @"\\u([a-fA-F0-9]{4})", replacer);
This matches each escape group (\u followed by four hex digits), extracts the hex digits, parses them and translates them to a character.
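A minimal, self-contained sketch of the approach above. The class name UnicodeEscapes and the method name Decode are made up for illustration; the pattern and the parsing logic are the ones from the answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // Matches a literal backslash, the letter 'u', and exactly four hex digits.
    private static readonly Regex EscapePattern =
        new Regex(@"\\u([a-fA-F0-9]{4})");

    // Hypothetical helper: replaces each \uXXXX escape in the input
    // with the UTF-16 code unit that the four hex digits name.
    public static string Decode(string input)
    {
        return EscapePattern.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)).ToString());
    }
}
```

Calling UnicodeEscapes.Decode(@"G\u00f6teborg") should give Göteborg. Text that contains no \uXXXX escapes is returned unchanged.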

Related

Convert special character in string into a char

I have this string:
string specialCharacterString = @"\n";
where "\n" is the new line special character.
Is it possible to convert that string (of two characters) into a single char? How do I do something like:
char specialCharacter = Parse(specialCharacterString);
Where specialCharacter value would be equal to \n
Is there anything in .NET that would parse the string for me, or must I use if or switch on the string (which can contain any special character) to accomplish what I want? Note that char.Parse(string) cannot handle special characters and treats the string above as two characters.
Maybe I am oversimplifying but can't you just do the following:
txtString.Replace("\n", "$");
It is technically a string to string replacement but would be string to char...
You can always cast it to a char since you know what char you are replacing the string with.
Not sure, what business need it is, but if you need parsing C# in C# you can use some tools like Antlr, which supports C# grammar (https://github.com/antlr/grammars-v4/)
I don't think there is any ready tool designed just for strings
Try using Regex.Unescape(specialCharacterString);
It returns a new string in which each escape sequence has been replaced by the character it represents.
For example:
var literalStringWithEscapeCharacters = @"Hello\tWorld";
var stringWithEscapeCharacters = Regex.Unescape(literalStringWithEscapeCharacters);
Console.WriteLine(stringWithEscapeCharacters);
Will print: Hello World
Instead of: Hello\tWorld
Then you can find escape characters in stringWithEscapeCharacters like this:
var escapeChars= new [] { '\n' };
var characters = stringWithEscapeCharacters.Where(c => escapeChars.Contains(c)).ToList();
All escape characters described here:
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/strings/#string-escape-sequences
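To get the single char the question asks for, the result of Regex.Unescape can be indexed. A small sketch, assuming the input holds exactly one escape sequence; the names EscapeParser and ParseEscapedChar are hypothetical. Note that Regex.Unescape follows regular-expression escape rules, which overlap with C# escape sequences but are not identical to them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeParser
{
    // Hypothetical helper: turns the text of one escape sequence
    // (for example the two characters \ and n) into the char it names.
    public static char ParseEscapedChar(string escaped)
    {
        string unescaped = Regex.Unescape(escaped);
        if (unescaped.Length != 1)
        {
            throw new ArgumentException(
                "Expected exactly one escaped character.", nameof(escaped));
        }
        return unescaped[0];
    }
}
```

With this, EscapeParser.ParseEscapedChar(@"\n") should return the line feed character '\n'.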

Regex expression to replace text between start and end characters, but keep characters too

How can I use a C# regex to find text that starts with "<" and ends with ">", keep the start and end characters, and surround each match with {} brackets?
All occurrences in text should be replaced.
For example:
This is <my> long <text> should become
This is {<my>} long {<text>}.
Thomas is correct -- in this case, you do not need a regular expression. However, if you insist on using one (or want to expand this logic in the future to handle a range of characters), here it is:
var inputString = "This is <my> long <text>";
var newInputString = Regex.Replace(inputString, "(<[^>]+>)", "{$1}");
This regex assumes you are capturing at least one character between the angled brackets.
Why don't you just use Replace:
string text = "This is <my> long <text>";
var replacedText = text.Replace("<", "{<").Replace(">", ">}");
If you have encoded text, you can decode it first;
string text = "This is &lt;my&gt; long &lt;text&gt;";
var replacedText = WebUtility.HtmlDecode(text).Replace("<", "{<").Replace(">", ">}");
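The two approaches agree on well-formed input but differ when a bracket appears on its own. A short sketch comparing them; the class and method names are invented for this comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketWrapper
{
    // Wraps only complete <...> pairs that contain at least one character.
    public static string WrapWithRegex(string text) =>
        Regex.Replace(text, "(<[^>]+>)", "{$1}");

    // Wraps every '<' and every '>' independently, paired or not.
    public static string WrapWithReplace(string text) =>
        text.Replace("<", "{<").Replace(">", ">}");
}
```

For "This is <my> long <text>" both should return "This is {<my>} long {<text>}". For "1 > 0 and <tag>" the regex version leaves the lone ">" alone, while the Replace version turns it into ">}".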

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(@"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\"");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        Regex regex = new Regex("\"\\p{Lu}+\"");
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
        Match match = regex.Match(text);
        Console.WriteLine(match.Success); // True
        Console.WriteLine(match.Value); // "ASRQ"
    }
}
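For the variant that also accepts digits, a sketch that collects every match instead of only the first might look like this. The names QuotedCaps and FindAll are hypothetical, and the capture group is added here so that the quotes are left out of the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedCaps
{
    // A quote, one or more upper-case letters or digits, then a quote.
    // Group 1 captures the text between the quotes.
    private static readonly Regex Pattern =
        new Regex(@"""([\p{Lu}\d]+)""");

    // Hypothetical helper: returns the contents of every matching quoted run.
    public static List<string> FindAll(string text)
    {
        var results = new List<string>();
        foreach (Match match in Pattern.Matches(text))
        {
            results.Add(match.Groups[1].Value);
        }
        return results;
    }
}
```

For the input oqr"awr"q q"ASRQ" asd "qIKQWIR" "IO8917AS" this should return ASRQ and IO8917AS. The pattern does not track which quote opens and which closes, so upper-case text sitting between two separate quoted strings would also match.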
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"

What does it mean when I enclose a C# string in @" "? [duplicate]

Possible Duplicate:
What does @ mean at the start of a string in C#?
Sorry but I can't find this on Google. I guess it maybe is not accepting my search string when I do a search.
Can someone tell me what this means in C#
var a = @"abc";
what's the meaning of the @?
It is a verbatim string literal, which basically means it will take any character except ", including new lines. To write out a ", use "".
The advantage of @-quoting is that escape sequences are not processed,
which makes it easy to write, for example, a fully qualified file
name:
@"c:\Docs\Source\a.txt" // rather than "c:\\Docs\\Source\\a.txt"
It means it's a verbatim (literal) string.
Without it, a \ in a string makes the next character part of an escape sequence, such as \n for new line. With an @ in front, the \ is treated literally.
In the example you've given, there is no difference in the output.
This says that the characters inside the double quotation marks should be interpreted exactly as they are.
You can see that the backslash is treated as a character and not an
escape sequence when the @ is used. The C# compiler also allows you to
use real newlines in verbatim literals. You must encode quotation
marks with double quotes.
string fileLocation = "C:\\CSharpProjects";
string fileLocation = @"C:\CSharpProjects";
Look at here for examples.
C# supports two forms of string literals: regular string literals and verbatim string literals.
A regular string literal consists of zero or more characters enclosed
in double quotes, as in "hello", and may include both simple escape
sequences (such as \t for the tab character) and hexadecimal and
Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a
double-quote character, zero or more characters, and a closing
double-quote character. A simple example is @"hello". In a verbatim
string literal, the characters between the delimiters are interpreted
verbatim, the only exception being a quote-escape-sequence. In
particular, simple escape sequences and hexadecimal and Unicode
escape sequences are not processed in verbatim string literals. A
verbatim string literal may span multiple lines.
Code Example
string a = "hello, world"; // hello, world
string b = @"hello, world"; // hello, world
string c = "hello \t world"; // hello world
string d = @"hello \t world"; // hello \t world
string e = "Joe said \"Hello\" to me"; // Joe said "Hello" to me
string f = @"Joe said ""Hello"" to me"; // Joe said "Hello" to me
string g = "\\\\server\\share\\file.txt"; // \\server\share\file.txt
string h = @"\\server\share\file.txt"; // \\server\share\file.txt
string i = "one\r\ntwo\r\nthree";
string j = @"one
two
three";
Reference link: MSDN
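One way to convince yourself of the rules above is to compare a regular and a verbatim literal directly. A small sketch; the class and method names are made up for the demonstration.

```csharp
using System;

public static class VerbatimDemo
{
    // Both literals denote the same sequence of characters.
    public static bool SamePath() =>
        "C:\\CSharpProjects" == @"C:\CSharpProjects";

    // Regular literal: \n is processed into one line feed character.
    public static int RegularLength() => "\n".Length;

    // Verbatim literal: the backslash and the 'n' are kept as two characters.
    public static int VerbatimLength() => @"\n".Length;

    // Inside a verbatim literal a double quote is written as "".
    public static string Quoted() => @"Joe said ""Hello""";
}
```

Here SamePath() should be true, RegularLength() 1, VerbatimLength() 2, and Quoted() the text Joe said "Hello".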

Only keep the legal chars in a text using a .NET Regex

I have a list of legal characters and I want to remove all others chars from text.
// my legal chars. a-Z, numbers, space, _, - and percentage
string legalChars = @"[\p{L}\p{Nd}_\- %]*";
string text = "[update], Text with {illegal} chars such as: !? {}";
I do find a lot of examples for removing illegal chars. I want to do the opposite.
How about:
String trimmed = Regex.Replace(input, @"[^\p{L}\p{Nd}_\- %]", "");
Or:
private static readonly Regex RemovalPattern
    = new Regex(@"[^\p{L}\p{Nd}_\- %]");
...
string trimmed = RemovalPattern.Replace(input, "");
Note that your regex of legal characters currently doesn't include space, contrary to the comment.
Why not loop through the string yourself, check whether each character is a legal char, and if it is, append it to a new string (for example with a StringBuilder)?
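The loop suggested above could be sketched as follows. This assumes that char.IsLetter and char.IsDigit are acceptable stand-ins for \p{L} and \p{Nd}; the names LegalCharFilter and KeepLegal are hypothetical.

```csharp
using System;
using System.Text;

public static class LegalCharFilter
{
    // Hypothetical helper: keeps letters, decimal digits, '_', '-',
    // space and '%', and drops every other character.
    public static string KeepLegal(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool legal = char.IsLetter(c) || char.IsDigit(c)
                || c == '_' || c == '-' || c == ' ' || c == '%';
            if (legal)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, LegalCharFilter.KeepLegal("a-b_c %1!?{}") should return "a-b_c %1". The condition in the loop is the place to extend if the set of legal characters grows.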
