I want to convert a string like "123" to a string like "\u0031\u0032\u0033".
How can I do this in .NET?
For example, the reverse conversion:
Encoding enc = Encoding.GetEncoding("us-ascii",
new EncoderExceptionFallback(),
new DecoderExceptionFallback());
byte[] by = enc.GetBytes(s);
string ans = enc.GetString(by);
return ans;
Strings in .NET already are Unicode, so there's no need to convert them from Unicode to Unicode.
If you want to output a unicode escaped string, then try this:
string ans = string.Concat(s.Select(c => string.Format("\\u{0:x4}", (int)c)).ToArray());
Result:
\u0031\u0032\u0033
See it working online: ideone
In .NET 4.0 you can omit the call to ToArray.
string ans = Regex.Replace(s, ".", m => String.Format(@"\u{0:x4}", (int)m.Value[0]));
Related
I have a string from a text file that contains Unicode escape sequences, and I want to display the real characters.
For example:
\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b
When I read this string from the text file using StreamReader.ReadLine(), the \ appears escaped as '\\', as in "\\u8ba1", which is not what I want.
The string is displayed exactly as it appears in the file, but I want to display the real characters.
How can I change "\\u8ba1" to "\u8ba1" in the result string?
Or should I use another reader to read the string?
If you have a string like
var input1 = "\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";
// input1 == "计算机•网络•技术类"
you don't need to unescape anything. It's just the string literal that contains the escape sequences, not the string itself.
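The difference between the two kinds of literal can be checked directly. The following is a minimal sketch (the class and field names are made up for illustration): a regular literal lets the compiler turn \u8ba1 into one character, while a verbatim literal keeps the backslash and the five characters after it.

```csharp
using System;

public static class LiteralDemo
{
    // Regular literal: the compiler replaces \u8ba1 with a single char.
    public static readonly string Regular = "\u8ba1";

    // Verbatim literal: the backslash is kept, so the string holds
    // the six characters \, u, 8, b, a, 1.
    public static readonly string Verbatim = @"\u8ba1";

    public static void Main()
    {
        Console.WriteLine(Regular.Length);       // 1
        Console.WriteLine(Verbatim.Length);      // 6
        Console.WriteLine(Verbatim[0] == '\\');  // True
    }
}
```

A line read from a text file behaves like the verbatim case: the characters are already in the string, and the doubled backslash is only how a debugger displays it.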
If you have a string like
var input2 = @"\u8ba1\u7b97\u673a\u2022\u7f51\u7edc\u2022\u6280\u672f\u7c7b";
you can unescape it using the following regex:
var result = Regex.Replace(
input2,
@"\\[Uu]([0-9A-Fa-f]{4})",
m => char.ToString(
(char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));
// result == "计算机•网络•技术类"
This question came out in the first result when googling, but I thought there should be a simpler way... this is what I ended up using:
using System.Text.RegularExpressions;
//...
var str = "Ingl\\u00e9s";
var converted = Regex.Unescape(str);
Console.WriteLine($"{converted} {str != converted}"); // Inglés True
Does anyone know how to encode the ISO-8859-2 charset in C#? The following example does not work:
String name = "Filipović";
String encoded = WebUtility.HtmlEncode(name);
The resulting string should be
"Filipovi&#263;"
Thanks
After reading your comments (you also need to support Chinese names using ASCII characters only), I think you shouldn't stick to the ISO-8859-2 encoding.
Solution 1
Use UTF-7 encoding for such names. UTF-7 is designed to use only ASCII characters for any Unicode string.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Encoding.ASCII.GetString(Encoding.UTF7.GetBytes(value));
Console.WriteLine(encoded); // Filipovi+AQc- with Unicode symbol: +2Dzf7w-
var decoded = Encoding.UTF7.GetString(Encoding.ASCII.GetBytes(encoded));
Solution 2
Alternatively, you can use base64 encoding, too. But in this case the pure ASCII strings will not be human-readable anymore.
string value = "Filipović with Unicode symbol: 🏯";
var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
Console.WriteLine(encoded); // RmlsaXBvdmnEhyB3aXRoIFVuaWNvZGUgc3ltYm9sOiDwn4+v
var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
Solution 3
If you really want to stick to HTML entity encoding, you can achieve it like this:
string value = "Filipović with Unicode symbol: 🏯";
var result = new StringBuilder();
for (int i = 0; i < value.Length; i++)
{
if (Char.IsHighSurrogate(value[i]))
{
result.Append($"&#{Char.ConvertToUtf32(value[i], value[i + 1])};");
i++;
}
else if (value[i] > 127)
result.Append($"&#{(int)value[i]};");
else
result.Append(value[i]);
}
Console.WriteLine(result); // Filipovi&#263; with Unicode symbol: &#127983;
If you don't have a strict requirement for HTML encoding, I'd recommend using URL (%) encoding, which encodes all non-ASCII characters:
String name = "Filipović";
String encoded = WebUtility.UrlEncode(name); // Filipovi%C4%87
If you must have a string in which all non-ASCII characters are HTML-encoded consistently, your best bet is to use the &#xNNNN; or &#NNNN; format to encode every character above 127. Unfortunately there is no way to convince HtmlEncode to encode all characters, so you need to do it yourself, similarly to how it is done in "Convert a Unicode string to an escaped ASCII string". You can continue using HtmlDecode to read the values back, as it handles &#xNNNN; just fine.
Non optimal sample:
var name = "Filipović";
var result = String.Join("",
name.Select(x => x <= 127 ? x.ToString() : String.Format("&#x{0:X4};", (int)x)) // note the closing ';' of the entity
);
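To check that such hand-made entities really can be read back, here is a minimal round-trip sketch. It assumes WebUtility from System.Net is available; EncodeNonAscii and HtmlEntityDemo are hypothetical names introduced only for this example, and surrogate pairs are written as a single code point.

```csharp
using System;
using System.Net;
using System.Text;

public static class HtmlEntityDemo
{
    // Hypothetical helper: writes every character above 127 as &#xNNNN;.
    public static string EncodeNonAscii(string value)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (char.IsHighSurrogate(c) && i + 1 < value.Length
                && char.IsLowSurrogate(value[i + 1]))
            {
                // Combine the surrogate pair into one code point.
                int codePoint = char.ConvertToUtf32(c, value[i + 1]);
                sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
                i++; // skip the low surrogate
            }
            else if (c > 127)
            {
                sb.Append("&#x").Append(((int)c).ToString("X4")).Append(';');
            }
            else
            {
                sb.Append(c); // plain ASCII is copied unchanged
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string encoded = EncodeNonAscii("Filipović");
        Console.WriteLine(encoded); // Filipovi&#x0107;
        Console.WriteLine(WebUtility.HtmlDecode(encoded) == "Filipović"); // True
    }
}
```

Note that this sketch leaves characters such as < and & untouched; if the text may contain markup characters, they would still need the usual HtmlEncode treatment.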
I have value like below
string value = "11,.Ad23";
int n;
bool isNumeric = int.TryParse(value, out n);
I check whether the string is numeric. If it is not, I need to extract the non-numeric characters.
The result must be:
,.Ad
How can I do this in C#?
If it doesn't matter if the non-digits are consecutive, it's simple:
string nonNumericValue = string.Concat(value.Where(c => !Char.IsDigit(c)));
Online Demo: http://ideone.com/croMht
If you use .NET 3.5, as mentioned in the comments, there is no overload of String.Concat (or String.Join, as in Dmytri's answer) that takes an IEnumerable<T>, so you need to project each character to a string and create an array:
string nonNumericValue = string.Concat(value.Where(c => !Char.IsDigit(c)).Select(c => c.ToString()).ToArray());
That takes all non-digits. If you instead want to take only the middle part, that is, skip the leading digits and then take everything up to the next digit:
string nonNumericValue = string.Concat(value.SkipWhile(Char.IsDigit)
.TakeWhile(c => !Char.IsDigit(c)));
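The two variants only differ when non-digits appear in more than one place. A small sketch, assuming .NET 4 or later (for String.Concat over a character sequence) and an input made up for illustration with a trailing "x":

```csharp
using System;
using System.Linq;

public static class NonDigitDemo
{
    // Keeps every non-digit, wherever it occurs.
    public static string AllNonDigits(string value)
    {
        return string.Concat(value.Where(c => !char.IsDigit(c)));
    }

    // Skips the leading digits, then keeps characters up to the next digit.
    public static string FirstNonDigitRun(string value)
    {
        return string.Concat(value
            .SkipWhile(c => char.IsDigit(c))
            .TakeWhile(c => !char.IsDigit(c)));
    }

    public static void Main()
    {
        Console.WriteLine(AllNonDigits("11,.Ad23x"));     // ,.Adx
        Console.WriteLine(FirstNonDigitRun("11,.Ad23x")); // ,.Ad
    }
}
```

For the input from the question, "11,.Ad23", both methods return ",.Ad".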
Regular expression solution (glue together all non-numeric values):
String source = "11,.Ad23";
String result = String.Join("", Regex
.Matches(source, @"\D{1}")
.OfType<Match>()
.Select(item => item.Value));
Edit: it seems that you use an old version of .NET; in that case you can use straightforward code without Regex, LINQ, etc.:
String source = "11,.Ad23";
StringBuilder sb = new StringBuilder(source.Length);
foreach (Char ch in source)
if (!Char.IsDigit(ch))
sb.Append(ch);
String result = sb.ToString();
Although I like the solutions proposed, I think a more efficient way would be to use a regular expression such as
[^\D]
(a negated non-digit class, so it matches the same characters as \d) and to remove every match:
var regex = new Regex(@"[^\D]");
var nonNumeric = regex.Replace("11,.Ad23", "");
Which returns:
,.Ad
Would a LINQ solution work for you?
string value = "11,.Ad23";
var result = new string(value.Where(x => !char.IsDigit(x)).ToArray());
I have the following string:
string s = @"a=q\x26T=1";
I want to unescape this to:
"a=q&T=1"
How do I do this in C# other than by just replacing the characters? There are various other escaped characters, so I'm not sure what encoding to use.
This works:
var decodedString = Regex.Unescape(@"source=s_q\x26hl=en");
but this works even better:
var regex = new Regex(@"\\x([a-fA-F0-9]{2})");
json = regex.Replace(json, match => char.ConvertFromUtf32(Int32.Parse(match.Groups[1].Value, System.Globalization.NumberStyles.HexNumber)));
I have a string and need the letters from said string.
string s = "EMA123_33"; // I need "EMA"
string s = "EMADRR123_33"; // I need "EMADRR"
I am using C# in Visual Studio 2008.
You can try this:
var myString = "EMA123_33";
var onlyLetters = new String(myString.Where(Char.IsLetter).ToArray());
Please note: this version will find "e" just like "E". If you need only upper-case letters, then do something like this:
var myString = "EMA123_33";
var onlyLetters = new String(myString.Where(c => Char.IsLetter(c) && Char.IsUpper(c)).ToArray());
You can use a regular expression to replace all non-letters:
string s2 = Regex.Replace(s, @"[^A-Z]+", String.Empty);
If you're just after the initial letters, i.e. those at the start of the string (your examples are a bit unclear in that I don't know what would happen to letters at the end of the string), you can use a different Regex:
string s2 = Regex.Replace(s, @"(\p{L}+).*", "$1");
Regex MyRegex = new Regex("[^a-z]", RegexOptions.IgnoreCase);
string s = MyRegex.Replace(@"your 76% strings &*81 gose _ here and collect you want_{ (7 438 ?. !`", @"");
Console.WriteLine(s);
output
yourstringsgosehereandcollectyouwant