Different character for the same Unicode Escaped character on different machines

On my local machine, this:
Trace.Warn("\u012b");
Outputs this (which is wrong):
ī
Yet on another machine (e.g. here: http://www.volatileread.com/UtilityLibrary/SnippetCompiler), this:
Console.WriteLine("\u012b");
Outputs:
i
What's happening?
EDIT: I'm using this function from the question "Any libraries to convert number Pinyin to Pinyin with tone markings?":
public static string ConvertNumericalPinYinToAccented(string input)
{
Dictionary<int, string> PinyinToneMark = new Dictionary<int, string>
{
{0, "aoeiuv\u00fc"},
{1, "\u0101\u014d\u0113\u012b\u016b\u01d6\u01d6"},
{2, "\u00e1\u00f3\u00e9\u00ed\u00fa\u01d8\u01d8"},
{3, "\u01ce\u01d2\u011b\u01d0\u01d4\u01da\u01da"},
{4, "\u00e0\u00f2\u00e8\u00ec\u00f9\u01dc\u01dc"}
};
string[] words = input.Split(' ');
string accented = "";
string t = "";
foreach (string pinyin in words)
{
foreach (char c in pinyin)
{
if (c >= 'a' && c <= 'z')
{
t += c;
}
else if (c == ':')
{
if (t.Length > 0 && t[t.Length - 1] == 'u')
{
t = t.Substring(0, t.Length - 1) + "\u00fc"; // replace only the trailing 'u' with 'ü'
}
}
else
{
if (c >= '0' && c <= '5')
{
int tone = (int)Char.GetNumericValue(c) % 5;
if (tone != 0)
{
Match match = Regex.Match(t, "[aoeiuv\u00fc]+");
if (!match.Success)
{
t += c;
}
else if (match.Groups[0].Length == 1)
{
t = t.Substring(0, match.Groups[0].Index) +
PinyinToneMark[tone][PinyinToneMark[0].IndexOf(match.Groups[0].Value[0])]
+ t.Substring(match.Groups[0].Index + match.Groups[0].Length);
}
else
{
if (t.Contains("a"))
{
t = t.Replace("a", PinyinToneMark[tone][0].ToString());
}
else if (t.Contains("o"))
{
t = t.Replace("o", PinyinToneMark[tone][1].ToString());
}
else if (t.Contains("e"))
{
t = t.Replace("e", PinyinToneMark[tone][2].ToString());
}
else if (t.Contains("ui"))
{
t = t.Replace("i", PinyinToneMark[tone][3].ToString());
}
else if (t.Contains("iu"))
{
t = t.Replace("u", PinyinToneMark[tone][4].ToString());
}
else
{
t += "!";
}
}
}
}
accented += t;
t = "";
}
}
accented += t + " ";
}
accented = accented.TrimEnd();
return accented;
}
E.g.: ConvertNumericalPinYinToAccented("ba2itia1n");
Working version: http://volatileread.com/utilitylibrary/snippetcompiler?id=22734

This link has an answer that might be useful to you:
https://superuser.com/questions/412986/unicode-support-between-different-os-and-browsers
How Unicode text is interpreted and displayed depends on the browser you use and on the OS the server is running on. Knowing that, it is normal for small differences to appear.
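The escape \u012b is resolved by the C# compiler, so the string holds the same character (U+012B, LATIN SMALL LETTER I WITH MACRON, ī) on every machine; only the way it is output and rendered can differ. A small diagnostic sketch (EscapeCheck and CodeUnits are hypothetical names, not part of any library) prints the code units instead of the glyph, so you can compare what is in the string with what is displayed:

```csharp
using System;
using System.Text;

public static class EscapeCheck
{
    // Lists the UTF-16 code units of a string, e.g. "U+012B".
    // Characters outside the BMP would show up as two surrogate code units.
    public static string CodeUnits(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

If both machines report U+012B for Console.WriteLine(EscapeCheck.CodeUnits("\u012b")), the string is identical and the difference comes from how the console, trace page or browser encodes and renders it.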

Related

Find 2 letter or 3 letters from wordlist

I have a list of 256 words, each 8 characters long, like "DDUUDDUU", "DDDDUUUU", "DDUUUUUU", and I am having a hard time matching any combination of 2 or 3 consecutive identical letters, like "UUDDUUUU" or "DDDUUDDD".
foreach (var eachWord in AAAA.Values) {
int iCountU = 0;
int iCountD = 0;
char iLastChar = (char)106;
foreach (char letter in eachWord) {
if (letter == 'D') {
if (iCountD < 3) {
if (letter != iLastChar) {
iLastChar = letter;
iCountD = 1;
} else {
iCountD += 1;
}
}
}
if (letter == 'U') {
if (iCountU < 3) {
if (letter != iLastChar) {
iLastChar = letter;
iCountU = 1;
} else {
iCountU += 1;
}
}
}
}
if (iCountU > 2 && iCountD > 2) {
BBBB[eachWord] = eachWord;
}
}

Since this implementation records the longest run for each character, in the second sample word the U count keeps growing past 3 to 4, so the word evaluates to {2,4} instead of {2,3} and doesn't count as a match.
int matches = 0;
var wordList = new string[] { "DDUUUDUD", "DDUUUUDU" };
foreach (string word in wordList)
{
char? previous = null;
int count = 0;
var results = new Dictionary<char, int>();
foreach (char letter in word)
{
if (letter == previous)
results[letter] = Math.Max(results.ContainsKey(letter) ? results[letter] : 0, ++count);
else
count = 1;
previous = letter;
}
if (results.Values.SequenceEqual(new int[] {2,3}) || results.Values.SequenceEqual(new int[] {3,2}))
matches++;
}
Console.WriteLine(matches);

I don't have a C# compiler with me, so here is some Python code (written in a C#-like style):
AAAA=["DDUUDDUU", "DDDDUUUU"]
for word in AAAA:
    isFirst=True
    maxCon=0
    currCon=1
    for c in word:
        if isFirst:
            isFirst=False
        else:
            if c==prev:
                currCon+=1
                maxCon=max(maxCon,currCon)
            else:
                currCon=1
        prev=c
    if maxCon in (2,3):
        print(word,maxCon)

This is surprisingly simple with a Regular Expression:
static bool testString(string test)
{
return Regex.Matches(test, @"([a-zA-Z])\1+").Any(x => x.Length == 2 || x.Length == 3);
}
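The main trick is the backreference: ([a-zA-Z]) captures a single letter and \1+ matches one or more further repetitions of that same letter, so each match is a complete run of two or more identical letters and its Length is the length of that run.
Note: on older .NET versions (such as .NET Framework, where MatchCollection does not implement IEnumerable<Match>), you may need to use Cast<Match>(), as in Regex.Matches(test, @"([a-zA-Z])\1+").Cast<Match>().Any(x => x.Length == 2 || x.Length == 3)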
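To try the regex approach end to end on the words from the question, here is a self-contained sketch (RunFilter and HasRunOfTwoOrThree are hypothetical names). It uses Cast<Match>() so it compiles on both .NET Framework and newer .NET, and it follows this answer's interpretation: a word matches if it has at least one run of exactly 2 or 3 identical letters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RunFilter
{
    // True if the word contains at least one run of exactly 2 or 3 identical
    // letters. \1+ is greedy, so "DDDD" is matched as a single run of length 4.
    public static bool HasRunOfTwoOrThree(string word) =>
        Regex.Matches(word, @"([a-zA-Z])\1+")
             .Cast<Match>()
             .Any(m => m.Length == 2 || m.Length == 3);
}
```

Of the five sample words in the question, only "DDDDUUUU" is rejected, because both of its runs have length 4.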

English to PigLatin c#

Just had this Pig Latin problem as "homework". The conditions I have been given are:
For words that begin with consonant sounds, all letters before the initial vowel are placed at the end of the word sequence. Then, ay is added.
For words that begin with vowel sounds move the initial vowel(s) along with the first consonant or consonant cluster to the end of the word and add ay.
For words that have no consonant add way.
Tested with:
Write a method that will convert an English sentence into Pig Latin
That turned into
itewray away ethodmay atthay illway onvertcay anay ishenglay entencesay ointay igpay atinlay
It does what it should, with one exception that is not in the rules but that I thought about, and I have no idea how to implement it. The method I created does exactly what the problem asks, but if I try to convert a word made only of consonants into Pig Latin, it does not work. For example, grrrrr in Pig Latin should be grrrrray.
public static string ToPigLatin(string sentencetext)
{
string vowels = "AEIOUaeiou";
//string cons = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";
List<string> newWords = new List<string>();
foreach (string word in sentencetext.Split(' '))
{
if (word.Length == 1)
{
newWords.Add(word + "way");
}
if (word.Length == 2 && vowels.Contains(word[0]))
{
newWords.Add(word + "ay");
}
if (word.Length == 2 && vowels.Contains(word[1]) && !vowels.Contains(word[0]))
{
newWords.Add(word.Substring(1) + word.Substring(0, 1) + "ay");
}
if (word.Length == 2 && !vowels.Contains(word[1]) && !vowels.Contains(word[0]))
{
newWords.Add(word + "ay");
}
for (int i = 1; i < word.Length; i++)
{
if (vowels.Contains(word[i]) && (vowels.Contains(word[0])))
{
newWords.Add(word.Substring(i) + word.Substring(0, i) + "ay");
break;
}
}
for (int i = 0; i < word.Length; i++)
{
if (vowels.Contains(word[i]) && !(vowels.Contains(word[0])) && word.Length > 2)
{
newWords.Add(word.Substring(i) + word.Substring(0, i) + "ay");
break;
}
}
}
return string.Join(" ", newWords);
}
static void Main(string[] args)
{
//Console.WriteLine("Enter a sentence to convert to PigLatin:");
// string sentencetext = Console.ReadLine();
string pigLatin = ToPigLatin("Write a method that will convert an English sentence into Pig Latin");
Console.WriteLine(pigLatin);
Console.ReadKey();
}

Give this a go:
public static string ToPigLatin(string sentencetext)
{
string vowels = "AEIOUaeiou";
string cons = "bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ";
Func<string, string> toPigLatin = word =>
{
word = word.ToLower();
var result = word;
Func<string, string, (string, string)> split = (w, l) =>
{
var prefix = new string(w.ToArray().TakeWhile(x => l.Contains(x)).ToArray());
return (prefix, w.Substring(prefix.Length));
};
if (!word.Any(w => cons.Contains(w)))
{
result = word + "way";
}
else
{
var (s, e) = split(word, vowels);
var (s2, e2) = split(e, cons);
result = e2 + s + s2 + "ay";
}
return result;
};
return string.Join(" ", sentencetext.Split(' ').Select(x => toPigLatin(x)));
}
The code:
string pigLatin = ToPigLatin("Grrrr Write a method that will convert an English sentence into Pig Latin");
Console.WriteLine(pigLatin);
gives:
grrrray itewray away ethodmay atthay illway onvertcay anay ishenglay entencesay ointay igpay atinlay
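The same split (leading vowels, then the consonant cluster after them) can also be written with two index scans instead of TakeWhile and tuples. A minimal sketch, assuming words are separated by spaces and lower-casing is acceptable, as in the expected output; PigLatin, Word and Translate are hypothetical names:

```csharp
using System;
using System.Linq;

public static class PigLatin
{
    private const string Vowels = "aeiou";

    private static bool IsVowel(char c) => Vowels.IndexOf(c) >= 0;

    // Converts a single, already lower-cased word.
    public static string Word(string word)
    {
        // No consonants at all (e.g. "a"): just add "way".
        if (word.All(IsVowel)) return word + "way";

        // v = length of the leading vowel run (0 for consonant-initial words).
        int v = 0;
        while (v < word.Length && IsVowel(word[v])) v++;

        // c = end of the consonant cluster that follows the leading vowels.
        int c = v;
        while (c < word.Length && !IsVowel(word[c])) c++;

        // Move word[0..c) to the end and add "ay". For an all-consonant word
        // such as "grrrrr", c == word.Length, so the result is word + "ay".
        return word.Substring(c) + word.Substring(0, c) + "ay";
    }

    public static string Translate(string sentence) =>
        string.Join(" ", sentence
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Word(w.ToLowerInvariant())));
}
```

For the test sentence above, this sketch produces the same output as the answer's code.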

C# How to convert a list of strings with not a single character to char?

I have a list of strings with items such as "65", "66" and so on.
When I try to use Convert.ToChar(item), I get an error saying that the string must be exactly one character long. What I don't understand is what I can use instead of Convert.ToChar to turn "65" (a string) into the char 'A'.
static void Main(string[] args)
{
List<char> Alphabet = new List<char>();
List<string> Message = new List<string>();
for (int i = 65; i < 91; i++)
{
Alphabet.Add((char)i);
}
var stroke = Console.ReadLine().ToString();
foreach (var letter in stroke)
{
if (Alphabet[Alphabet.Count - 1] == letter)
{
Message.Add((66.ToString()));
}
if (Alphabet[Alphabet.Count - 2] == letter)
{
Message.Add((65.ToString()));
}
if (Convert.ToChar(" ") == letter)
{
Message.Add(" ");
}
foreach (var item in Alphabet)
{
if (item == letter && letter != Alphabet[Alphabet.Count -1] && letter != Alphabet[Alphabet.Count - 2])
{
Message.Add((item + 2).ToString());
}
}
}
foreach (var item in Message)
{
if (item != " ")
{
Console.Write(Convert.ToChar(Convert.ToInt16(item)));
}
else
{
Console.Write(" ");
}
}
Console.ReadLine();}
The code above is the final working version, for those who may wonder why I need this kind of conversion: it's a Caesar cipher encrypter. It only works for upper-case letters, but it works now thanks to everyone who answered.

If the items of your list are integer values such as 65, 66, ... that look like character codes, you can try:
Convert.ToChar(Convert.ToInt32(item))
For example:
var str="65";
var chr= Convert.ToChar(Convert.ToInt32(str));
//The output is A
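To convert the whole list at once, the same idea can be applied to every item. A minimal sketch, assuming every item is a decimal character code (CharCodes and FromCodes are hypothetical names); it uses a (char) cast, which for values in the char range (0 to 65535) gives the same result as Convert.ToChar:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CharCodes
{
    // Turns decimal character codes stored as strings ("72", "105") into text.
    // int.Parse throws FormatException for items that are not integers,
    // so items such as " " must be handled separately.
    public static string FromCodes(IEnumerable<string> codes) =>
        new string(codes.Select(code => (char)int.Parse(code)).ToArray());
}
```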

Encryption:
var stroke = Console.ReadLine();
var enc = new String(stroke.ToCharArray().Select(c=>Convert.ToChar(c+2)).ToArray());
Console.WriteLine(enc);
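The one-liner above adds 2 to every character code, so 'Y' and 'Z' become '[' and '\' rather than wrapping to 'A' and 'B' as the question's code does, and spaces are shifted too. A minimal sketch with wrap-around, assuming only upper-case A to Z should be shifted (Caesar and Encrypt are hypothetical names):

```csharp
public static class Caesar
{
    // Shifts upper-case A-Z by `shift` places, wrapping Z around to A;
    // every other character (spaces, digits, ...) is copied unchanged.
    public static string Encrypt(string text, int shift)
    {
        shift = ((shift % 26) + 26) % 26; // normalise; also handles negative shifts
        var result = new char[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            result[i] = c >= 'A' && c <= 'Z'
                ? (char)('A' + (c - 'A' + shift) % 26)
                : c;
        }
        return new string(result);
    }
}
```

Decrypting is the same call with the negated shift, e.g. Encrypt(cipherText, -2).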

Low complexity algorithm to remove/replace special characters [duplicate]

This question already has answers here: A faster way of doing multiple string replacements
I want to replace some invalid characters in the name of a file uploaded to my application.
I searched the internet and found some complex algorithms to do it; here's one:
public static string RemoverAcentuacao(string palavra)
{
string palavraSemAcento = null;
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
if (!String.IsNullOrEmpty(palavra))
{
for (int i = 0; i < palavra.Length; i++)
{
if (caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1))) >= 0)
{
int car = caracterComAcento.IndexOf(Convert.ToChar(palavra.Substring(i, 1)));
palavraSemAcento += caracterSemAcento.Substring(car, 1);
}
else
{
palavraSemAcento += palavra.Substring(i, 1);
}
}
string[] cEspeciais = { "#39", "---", "--", "'", "#", "\r\n", "\n", "\r" };
for (int q = 0; q < cEspeciais.Length; q++)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[q], "-");
}
for (int x = (cEspeciais.Length - 1); x > -1; x--)
{
palavraSemAcento = palavraSemAcento.Replace(cEspeciais[x], "-");
}
palavraSemAcento = palavraSemAcento.Replace("+", "-").Replace(Environment.NewLine, "").TrimStart('-').TrimEnd('-').Replace("<i>", "-").Replace("<-i>", "-").Replace("<br>", "").Replace("--", "-");
}
else
{
palavraSemAcento = "indefinido";
}
return palavraSemAcento.ToLower();
}
Is there a way to do it with a less complex algorithm?
I think this algorithm is too complex for something this simple, but I can't think of anything different.

I want to replace some invalid characters in the name of a file
If this is really what you want, then it is easy:
string ToLegalFileName(string s)
{
var invalidChars = new HashSet<char>(Path.GetInvalidFileNameChars());
return String.Join("", s.Select(c => invalidChars.Contains(c) ? '_' : c));
}
If your intent is to replace accented chars with their ASCII counterparts, then:
string RemoverAcentuacao(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
}
And this third version removes the accents and then replaces every remaining character that is not an ASCII letter or digit with '_':
string RemoverAcentuacao2(string s)
{
return String.Join("",
s.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
.Select(c => char.IsLetterOrDigit(c) ? c : '_')
.Select(c => (int)c < 128 ? c : '_'));
}

A solution using regular expressions:
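string ReplaceSpecial(string input, string replace, char replacewith)
{
    char[] back = input.ToCharArray();
    // Build a character class such as "[?&:]". Regex.Escape handles most
    // metacharacters; ']' and '-' also need escaping inside a class.
    string pattern = "[" + Regex.Escape(replace).Replace("]", @"\]").Replace("-", @"\-") + "]";
    // Note the argument order: Regex.Matches(input, pattern).
    foreach (Match m in Regex.Matches(input, pattern))
        back[m.Index] = replacewith;
    return new string(back);
}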
A somewhat simpler solution using String.Replace:
string ReplaceSpecial(string input, char[] replace, char replacewith)
{
string back = input;
foreach (char i in replace)
back = back.Replace(i, replacewith); // strings are immutable, so keep the result
return back;
}

static string RemoverAcentuacao(string s)
{
string caracterComAcento = "áàãâäéèêëíìîïóòõôöúùûüçáàãâÄéèêëíìîïóòõÖôúùûÜç, ?&:/!;ºª%‘’()\"”“";
string caracterSemAcento = "aaaaaeeeeiiiiooooouuuucAAAAAEEEEIIIIOOOOOUUUUC___________________";
return new String(s.Select(c =>
{
int i = caracterComAcento.IndexOf(c);
return (i == -1) ? c : caracterSemAcento[i];
}).ToArray());
}
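The answer above still calls IndexOf, which scans caracterComAcento for every input character. A variant that builds a Dictionary once gives a constant-time lookup per character and a single pass over the input. A sketch, where CharMapper and Map are hypothetical names and the two mapping strings are assumed to have equal length:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharMapper
{
    // Replaces each character found in `from` with the character at the same
    // position in `to`, in a single pass over the input.
    public static string Map(string input, string from, string to)
    {
        if (from.Length != to.Length)
            throw new ArgumentException("from and to must have the same length");

        // Build the lookup table; the first occurrence of a character wins,
        // matching what IndexOf would return.
        var table = new Dictionary<char, char>();
        for (int i = 0; i < from.Length; i++)
            if (!table.ContainsKey(from[i]))
                table[from[i]] = to[i];

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
            sb.Append(table.TryGetValue(c, out char mapped) ? mapped : c);
        return sb.ToString();
    }
}
```

In real use you would build the table once (for example in a static field) instead of on every call.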

Here is a really simple method that I've used recently.
I hope it meets your requirements. To be honest, your code is a bit difficult to read because the variable names are not in English.
static readonly List<char> InvalidCharacters = new List<char>() { 'a', 'b', 'c' };
static string StripInvalidCharactersFromField(string field)
{
    for (int i = 0; i < field.Length; i++)
    {
        if (InvalidCharacters.Contains(field[i]))
        {
            field = field.Remove(i, 1);
            i--; // re-examine the character that moved into position i
        }
    }
    return field;
}

How to separate character and number part from string

E.g., I would like to separate:
OS234 to OS and 234
AA4230 to AA and 4230
I have used the following trivial solution, but I am quite sure there is a more efficient and robust one:
private void demo()
{
    string cell = "ABCD4321";
    int a = getIndexofNumber(cell);
    string Numberpart = cell.Substring(a, cell.Length - a);
    int row = Convert.ToInt32(Numberpart);
    string Stringpart = cell.Substring(0, a);
}
private int getIndexofNumber(string cell)
{
int a = -1, indexofNum = 10000;
a = cell.IndexOf("0"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("1"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("2"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("3"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("4"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("5"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("6"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("7"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("8"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
a = cell.IndexOf("9"); if (a > -1) { if (indexofNum > a) { indexofNum = a; } }
if (indexofNum != 10000)
{ return indexofNum; }
else
{ return 0; }
}

Regular Expressions are best suited for this kind of work:
using System.Text.RegularExpressions;
Regex re = new Regex(@"([a-zA-Z]+)(\d+)");
Match result = re.Match(input);
string alphaPart = result.Groups[1].Value;
string numberPart = result.Groups[2].Value;
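A self-contained variant of the regex above, assuming the whole input must be letters followed by digits (hence the ^ and $ anchors) and that the number fits in an int; CellSplitter and TrySplit are hypothetical names. It uses [0-9] rather than \d because in .NET \d also matches non-ASCII Unicode digits:

```csharp
using System.Text.RegularExpressions;

public static class CellSplitter
{
    // The whole input must be ASCII letters followed by ASCII digits.
    private static readonly Regex Pattern = new Regex(@"^([A-Za-z]+)([0-9]+)$");

    public static bool TrySplit(string input, out string alphaPart, out int numberPart)
    {
        alphaPart = null;
        numberPart = 0;
        Match m = Pattern.Match(input);
        if (!m.Success) return false;

        alphaPart = m.Groups[1].Value;
        // TryParse returns false if the digits do not fit in an int.
        return int.TryParse(m.Groups[2].Value, out numberPart);
    }
}
```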

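Use LINQ to do this:
using System;
using System.Linq;
string str = "OS234";
var digits = from c in str
             where Char.IsDigit(c)
             select c;
var alphas = from c in str
             where !Char.IsDigit(c)
             select c;
string numberPart = new string(digits.ToArray()); // "234"
string alphaPart = new string(alphas.ToArray());  // "OS"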

Everyone and their mother will give you a solution using regex, so here's one that is not:
// s is a string of the form [A-Za-z]*[0-9]+ (IndexOfAny returns -1 if it has no digits)
int index = s.IndexOfAny(new char[] { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' });
string chars = s.Substring(0, index);
int num = Int32.Parse(s.Substring(index));

I really like jason's answer. Let's improve it a bit: we don't need regex here. My solution handles input like "H1N1":
public static IEnumerable<string> SplitAlpha(string input)
{
var words = new List<string> { string.Empty };
for (var i = 0; i < input.Length; i++)
{
words[words.Count-1] += input[i];
if (i + 1 < input.Length && char.IsLetter(input[i]) != char.IsLetter(input[i + 1]))
{
words.Add(string.Empty);
}
}
return words;
}
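The loop makes a single pass over the input, so it performs O(n) iterations (the += on strings copies each token as it grows; the StringBuilder version below avoids that).
Output: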
"H1N1" -> ["H", "1", "N", "1"]
"H" -> ["H"]
"GH1N12" -> ["GH", "1", "N", "12"]
"OS234" -> ["OS", "234"]
The same solution with a StringBuilder:
public static IEnumerable<string> SplitAlpha(string input)
{
var words = new List<StringBuilder>{new StringBuilder()};
for (var i = 0; i < input.Length; i++)
{
words[words.Count - 1].Append(input[i]);
if (i + 1 < input.Length && char.IsLetter(input[i]) != char.IsLetter(input[i + 1]))
{
words.Add(new StringBuilder());
}
}
return words.Select(x => x.ToString());
}

If you want to split at every place where a letter is followed by a digit (or vice versa), you can use:
private string SplitCharsAndNums(string text)
{
var sb = new StringBuilder();
for (var i = 0; i < text.Length - 1; i++)
{
if ((char.IsLetter(text[i]) && char.IsDigit(text[i+1])) ||
(char.IsDigit(text[i]) && char.IsLetter(text[i+1])))
{
sb.Append(text[i]);
sb.Append(" ");
}
else
{
sb.Append(text[i]);
}
}
sb.Append(text[text.Length-1]);
return sb.ToString();
}
And then
var text = SplitCharsAndNums("asd1 asas4gr5 6ssfd");
var tokens = text.Split(' ');

Are you doing this for sorting purposes? If so, keep in mind that Regex can kill performance for large lists. I frequently use an AlphanumComparer that's a general solution to this problem (can handle any sequence of letters and numbers in any order). I believe that I adapted it from this page.
Even if you're not sorting on it, using the character-by-character approach (if you have variable lengths) or simple substring/parse (if they're fixed) will be a lot more efficient and easier to test than a Regex.
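The AlphanumComparer mentioned in this answer isn't reproduced here. To give a rough idea of the character-by-character approach it describes, here is a simplified sketch of such a comparer (SimpleAlphanumComparer is a hypothetical name); it compares runs of ASCII digits by numeric value and all other runs ordinally:

```csharp
using System;
using System.Collections.Generic;

public sealed class SimpleAlphanumComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            // Take the next run of digits or non-digits from each string.
            string a = NextRun(x, ref i);
            string b = NextRun(y, ref j);

            int result;
            if (IsDigit(a[0]) && IsDigit(b[0]))
            {
                // Numeric comparison without parsing (so no overflow): ignore
                // leading zeros, then the longer number is the larger one.
                string ta = a.TrimStart('0');
                string tb = b.TrimStart('0');
                result = ta.Length != tb.Length
                    ? ta.Length.CompareTo(tb.Length)
                    : string.CompareOrdinal(ta, tb);
            }
            else
            {
                result = string.CompareOrdinal(a, b);
            }
            if (result != 0) return result;
        }

        // Every run compared equal so far: the string with characters left
        // over sorts last. (Strings such as "a01" and "a1" compare as equal.)
        return (x.Length - i).CompareTo(y.Length - j);
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    // Returns the run starting at pos and advances pos past it.
    private static string NextRun(string s, ref int pos)
    {
        int start = pos;
        bool digit = IsDigit(s[pos]);
        while (pos < s.Length && IsDigit(s[pos]) == digit) pos++;
        return s.Substring(start, pos - start);
    }
}
```

Usage would be list.Sort(new SimpleAlphanumComparer()), which orders "OS2" before "OS10" where a plain ordinal sort would not.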

I used bniwredyc's answer to get an improved version of my routine:
private void demo()
{
string cell = "ABCD4321";
int row, a = getIndexofNumber(cell);
string Numberpart = cell.Substring(a, cell.Length - a);
row = Convert.ToInt32(Numberpart);
string Stringpart = cell.Substring(0, a);
}
private int getIndexofNumber(string cell)
{
int indexofNum=-1;
foreach (char c in cell)
{
indexofNum++;
if (Char.IsDigit(c))
{
return indexofNum;
}
}
return indexofNum;
}

.NET 2.0 compatible, without regex
public class Result
{
private string _StringPart;
public string StringPart
{
get { return _StringPart; }
}
private int _IntPart;
public int IntPart
{
get { return _IntPart; }
}
public Result(string stringPart, int intPart)
{
_StringPart = stringPart;
_IntPart = intPart;
}
}
class Program
{
public static Result GetResult(string source)
{
string stringPart = String.Empty;
int intPart;
var buffer = new StringBuilder();
foreach (char c in source)
{
if (Char.IsDigit(c))
{
if (stringPart == String.Empty)
{
stringPart = buffer.ToString();
buffer.Remove(0, buffer.Length);
}
}
buffer.Append(c);
}
if (!int.TryParse(buffer.ToString(), out intPart))
{
return null;
}
return new Result(stringPart, intPart);
}
static void Main(string[] args)
{
Result result = GetResult("OS234");
Console.WriteLine("String part: {0} int part: {1}", result.StringPart, result.IntPart);
result = GetResult("AA4230 ");
Console.WriteLine("String part: {0} int part: {1}", result.StringPart, result.IntPart);
result = GetResult("ABCD4321");
Console.WriteLine("String part: {0} int part: {1}", result.StringPart, result.IntPart);
Console.ReadKey();
}
}

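Just use the substring function and pass the positions in the parentheses. The snippet below is Java, where substring(3, 6) takes a start index and an exclusive end index; the C# equivalent is id.Substring(3, 3), whose second argument is a length.
String id = "DON123";
System.out.println("Id number is : "+id.substring(3,6));
Output:
Id number is : 123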

Use Split to separate a string on a delimiter such as tab (\t) or space:
string s = "sometext\tsometext\tsometext";
string[] split = s.Split('\t');
Now you have the array of strings you want. Easy.
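The snippet above splits on tabs only. To split on both tabs and spaces, Split accepts an array of separators; a small sketch (Tokenizer and SplitOnWhitespace are hypothetical names) that also drops the empty entries that consecutive separators would otherwise produce:

```csharp
using System;

public static class Tokenizer
{
    // Splits on tabs and spaces; RemoveEmptyEntries drops the empty strings
    // produced by two separators in a row.
    public static string[] SplitOnWhitespace(string s) =>
        s.Split(new[] { '\t', ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```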
