How to convert Turkish chars to English chars in a string?

string strTurkish = "ÜST";
How do I make the value of strTurkish "UST"?

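using System.Globalization;
using System.Linq;
using System.Text;

var text = "ÜST";
// Decompose "Ü" into "U" + a combining diaeresis, then drop the combining marks.
var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD)
.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));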

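You can use the following method to solve your problem. The other methods do not convert the Turkish lowercase dotless i (\u0131) correctly.
// Requires: using System.Globalization; using System.Text;
// On .NET Core and .NET 5+, Encoding.GetEncoding(1252) throws unless you first call
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) (System.Text.Encoding.CodePages package).
public static string RemoveDiacritics(string text)
{
Encoding srcEncoding = Encoding.UTF8;
Encoding destEncoding = Encoding.GetEncoding(1252); // Latin alphabet (Windows-1252)
// Round-trip through Windows-1252: characters it cannot represent, such as ı, are replaced
// by the encoder's default fallback (on .NET Framework, a best-fit Latin letter).
text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));
// Decompose the remaining accented letters (e.g. Ü -> U + combining diaeresis).
string normalizedString = text.Normalize(NormalizationForm.FormD);
StringBuilder result = new StringBuilder();
for (int i = 0; i < normalizedString.Length; i++)
{
// Keep everything except the combining (non-spacing) marks.
if (!CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]).Equals(UnicodeCategory.NonSpacingMark))
{
result.Append(normalizedString[i]);
}
}
return result.ToString();
}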

I'm not an expert on this sort of thing, but I think you can use string.Normalize to do it, by decomposing the value and then effectively removing any non-ASCII characters:
using System;
using System.Linq;
using System.Text;
class Test
{
static void Main()
{
string text = "\u00DCST";
string normalized = text.Normalize(NormalizationForm.FormD);
string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
Console.WriteLine(asciiOnly);
}
}
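It's entirely possible that this does horrible things in some cases, though. For example, the Turkish dotless ı (\u0131) has no Unicode decomposition, so it is dropped entirely instead of becoming i.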
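The normalization idea and an explicit mapping can be combined. A minimal sketch, assuming the other Turkish letters are handled well enough by FormD decomposition: strip the combining marks, and map the dotless ı (U+0131) by hand because it has no decomposition. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TurkishAscii
{
    // Hypothetical helper: removes diacritics via FormD decomposition and maps the
    // dotless ı (U+0131) explicitly, since Unicode gives it no decomposition.
    public static string ToAsciiLetters(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            // Skip combining marks: diaeresis (ü, ö), cedilla (ç, ş), breve (ğ), dot above (İ).
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            sb.Append(c == '\u0131' ? 'i' : c);
        }
        return sb.ToString();
    }
}
```

With this, "ÜST" becomes "UST", and all twelve Turkish letters map to the same English counterparts as the lookup-table answers.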

public string TurkishCharacterToEnglish(string text)
{
char[] turkishChars = {'ı', 'ğ', 'İ', 'Ğ', 'ç', 'Ç', 'ş', 'Ş', 'ö', 'Ö', 'ü', 'Ü'};
char[] englishChars = {'i', 'g', 'I', 'G', 'c', 'C', 's', 'S', 'o', 'O', 'u', 'U'};
// Match chars
for (int i = 0; i < turkishChars.Length; i++)
text = text.Replace(turkishChars[i], englishChars[i]);
return text;
}

This is not a problem that requires a general solution. There are only 12 special characters in the Turkish alphabet that have to be normalized: ı, İ, ö, Ö, ç, Ç, ü, Ü, ğ, Ğ, ş, Ş. You can write 12 rules to replace them with their English counterparts: i, I, o, O, c, C, u, U, g, G, s, S.

Public Function Ceng(ByVal _String As String) As String
Dim Source As String = "ığüşöçĞÜŞİÖÇ"
Dim Destination As String = "igusocGUSIOC"
For i As Integer = 0 To Source.Length - 1
_String = _String.Replace(Source(i), Destination(i))
Next
Return _String
End Function

// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
// Extension methods must be declared in a non-nested static class.
public static class TurkishStringExtensions
{
public static string TurkishChrToEnglishChr(this string text)
{
if (string.IsNullOrEmpty(text)) return text;
Dictionary<char, char> TurkishChToEnglishChDic = new Dictionary<char, char>()
{
{'ç','c'},
{'Ç','C'},
{'ğ','g'},
{'Ğ','G'},
{'ı','i'},
{'İ','I'},
{'ş','s'},
{'Ş','S'},
{'ö','o'},
{'Ö','O'},
{'ü','u'},
{'Ü','U'}
};
// Copy each character, replacing the Turkish ones with their English counterparts.
return text.Aggregate(new StringBuilder(), (sb, chr) =>
sb.Append(TurkishChToEnglishChDic.TryGetValue(chr, out char english) ? english : chr)
).ToString();
}
}

Related

Extracting Formula from String

I have to extract all variables from a formula.
For example, for (FB+AB+ESI) / 12
the output should be {FB,AB,ESI}.
Code written so far:
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char> { '+', '-', '*', '/', ')', '(', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };
int count = 0;
string character = string.Empty;
for (int i = 0; i < length; i++)
{
if (!operators.Contains(formula[i]))
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
return variables;
The output of the method is {FB,AB,ESI}, which is correct.
My problem is when a variable contains digits, e.g.
(FB1+AB1)/100
Expected output: {FB1,AB1}
But my method returns {FB,AB}.

If variable names must start with
a letter A..Z, a..z
and can contain
letters A..Z, a..z
digits 0..9
underscores _
you can use regular expressions:
String source = "(FB2+a_3B+EsI) / 12";
String pattern = @"([A-Z]|[a-z])+([A-Z]|[a-z]|\d|_)*";
// output will be "{FB2,a_3B,EsI}"
String output = "{" + String.Join(",",
Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)) + "}";
In case you need a collection, say an array of variable names, just modify the LINQ:
String[] names = Regex.Matches(source, pattern)
.OfType<Match>()
.Select(item => item.Value)
.ToArray();
However, this is only a naive tokenizer: you would still have to separate the variable names found from function names and class names, check whether they are commented out, etc.
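Put together as a reusable method, the regex approach might look like the sketch below. It assumes the naming rules listed above (a letter followed by letters, digits or underscores) and uses plain ASCII ranges instead of [A-z], which would also match characters such as [ and ^. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FormulaVariables
{
    // A name is one ASCII letter followed by any number of letters, digits or underscores.
    private static readonly Regex NamePattern = new Regex(@"[A-Za-z][A-Za-z0-9_]*");

    // Returns the matched names in the order they appear in the formula.
    public static List<string> Extract(string formula) =>
        NamePattern.Matches(formula)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
}
```

Numeric literals such as 100 are never matched because a name must start with a letter. Note that a malformed token such as 2FB would still yield FB.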

I have changed your code to do what you asked, but I'm not sure about the approach, since brackets and operator precedence are not taken into consideration.
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string formula = "AB1+FB+100";
var length = formula.Length;
List<string> variables = new List<string>();
List<char> operators = new List<char>{'+', '-', '*', '/', ')', '('};
List<char> numerals = new List<char>{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
int count = 0;
string character = string.Empty;
char prev_char = '\0';
for (int i = 0; i < length; i++)
{
bool is_operator = operators.Contains(formula[i]);
bool is_numeral = numerals.Contains(formula[i]);
bool is_variable = !(is_operator || is_numeral);
bool was_variable = character.Contains(prev_char);
if (is_variable || (was_variable && is_numeral) )
character += formula[i];
else
{
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
character = string.Empty;
count = i;
}
prev_char = formula[i];
}
if (!string.IsNullOrWhiteSpace(character))
variables.Add(character);
foreach (var item in variables)
Console.WriteLine(item);
Console.WriteLine();
Console.WriteLine();
}
}
Maybe also consider a library such as Math-Expression-Evaluator (on NuGet).

Here is how you could do it with regular expressions.
Regex regex = new Regex(@"([A-Z])\w+");
List<string> matchedStrings = new List<string>();
foreach (Match match in regex.Matches("(FB1+AB1)/100"))
{
matchedStrings.Add(match.Value);
}
This will create a list of all the matched strings. Note that this pattern only matches names that start with an uppercase letter and are at least two characters long.

Without regex, you can split on the actual operators (not numbers), and then remove any items that begin with a digit:
public static List<string> GetVariables(string formula)
{
if (string.IsNullOrWhiteSpace(formula)) return new List<string>();
// Spaces are separators too; otherwise "(FB+AB) / 12" would yield " " and " 12" as operands.
var separators = new[] { '+', '-', '*', '/', '^', '%', '(', ')', ' ' };
return formula
.Split(separators, StringSplitOptions.RemoveEmptyEntries)
.Where(operand => !char.IsDigit(operand[0]))
.ToList();
}

You can do it this way; it only handles formulas shaped like (X+Y+...) / N, so adapt and optimize the code as needed.
string ss = "(FB+AB+ESI) / 12";
string[] spl = ss.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
string final = spl[0].Replace("(", "").Replace(")", "").Trim();
string[] entries = final.Split(new char[] {'+'}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sbFinal = new StringBuilder();
sbFinal.Append("{");
foreach(string en in entries)
{
sbFinal.Append(en + ",");
}
string finalString = sbFinal.ToString().TrimEnd(',');
finalString += "}";

What you are trying to write is an interpreter.
I can't give you the whole code, but I can give you a head start (it will require a lot of coding).
First, learn about Reverse Polish notation.
Second, learn about stacks.
Third, apply both to interpret your formulas.
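To make the first two steps concrete, here is a minimal sketch of Dijkstra's shunting-yard algorithm, which uses a stack of operators to convert an infix formula into Reverse Polish notation. It assumes only binary + - * /, parentheses, identifiers and non-negative integer literals (no unary minus, no functions), and silently skips any other character such as spaces; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Rpn
{
    // Binding strength of the supported binary operators (all left-associative).
    private static readonly Dictionary<string, int> Precedence = new Dictionary<string, int>
    {
        ["+"] = 1, ["-"] = 1, ["*"] = 2, ["/"] = 2
    };

    // Operands go straight to the output; operators wait on a stack until an operator
    // of lower precedence, a closing parenthesis or the end of input forces them out.
    public static List<string> ToRpn(string formula)
    {
        var output = new List<string>();
        var ops = new Stack<string>();
        // Tokens: identifiers (FB1), integer literals (100), operators and parentheses.
        foreach (Match m in Regex.Matches(formula, @"[A-Za-z_]\w*|\d+|[-+*/()]"))
        {
            string tok = m.Value;
            if (Precedence.TryGetValue(tok, out int prec))
            {
                // Left associativity: pop operators of higher or equal precedence first.
                while (ops.Count > 0 && Precedence.TryGetValue(ops.Peek(), out int top) && top >= prec)
                    output.Add(ops.Pop());
                ops.Push(tok);
            }
            else if (tok == "(")
            {
                ops.Push(tok);
            }
            else if (tok == ")")
            {
                while (ops.Count > 0 && ops.Peek() != "(")
                    output.Add(ops.Pop());
                if (ops.Count == 0) throw new FormatException("Unbalanced ')'");
                ops.Pop(); // discard the matching "("
            }
            else
            {
                output.Add(tok); // identifier or number
            }
        }
        while (ops.Count > 0)
        {
            string op = ops.Pop();
            if (op == "(") throw new FormatException("Unbalanced '('");
            output.Add(op);
        }
        return output;
    }
}
```

For (FB1+AB1)/100 this produces FB1 AB1 + 100 /. Once the formula is in RPN, the variables are simply the tokens that start with a letter, and evaluating it is a single left-to-right pass that pushes operands onto a value stack and applies each operator to the top two values.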

C# Pig Latin with Regex Replace

First off: this is a homework problem. Just getting that out there. I'm trying to build a Pig Latin translator in C#. We have to use Regex replace, but I'm having some issues. We are not allowed to use the Split method to obtain an array of words; we have to use the static method Replace of type Regex. Whitespace, punctuation, line breaks, etc. should be preserved. Capitalized words should remain capitalized. For those unfamiliar with the rules of Pig Latin:
If the string begins with a vowel, add "way" to the string. (vowels are a,e,i,o,u)
Examples: Pig-Latin for "orange" is "orangeway", Pig-Latin for “eating” is “eatingway”
Otherwise, find the first occurrence of a vowel, move all the characters before the vowel to the end of the word, and add "ay".
(in the middle of the word ‘y’ also counts as a vowel, but NOT at the beginning)
Examples: Pig-Latin for "story" is "orystay" since the characters "st" occur before the first vowel; Pig-Latin for "crystal" is "ystalcray", but Pig-Latin for "yellow" is "ellowyay".
If there are no vowels, add "ay". Examples: Pig-Latin for "mph" is "mphay", Pig-Latin for "RPM" is "RPMay".
I've got a ton of commented out code, so I'll remove that for reading ease.
My test sentence is "Eat monkey poo." I'm getting "Ewayaayt moaynkeayy poayoay."
I know Regex is 'greedy', but I can't figure out how to get it to stop at the first vowel it finds. I'm also using TextBoxes for input and output.
namespace AssignmentPigLatin
{
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
OriginalTb.Text = "Eat monkey poo.";
}
private void translateButton_Click(object sender, RoutedEventArgs e)
{
string vowels = "[AEIOUaeiou]";
var regex = new Regex(vowels);
var translation = regex.Replace(OriginalTb.Text, TranslateToPigLatin);
PigLatinTb.Text = translation;
}
private void ClearButton_Click(object sender, RoutedEventArgs e)
{
OriginalTb.Text = "";
PigLatinTb.Text = "";
}
static string TranslateToPigLatin(Match match)
{
string word = match.ToString();
string firstLetters = word.Substring(0, match.Length);
string restLetters = word.Substring(firstLetters.Length - 1, word.Length-1);
string newWord;
if (match.Index == 0)
{
return word + "way";
}
else
{
return restLetters + firstLetters + "ay";
}
}
}
}

The question was interesting to answer. Don't forget to attribute me ;)
Add this method to your MainWindow class (in the AssignmentPigLatin namespace):
private string PigLatinTranslator(string s)
{
// Words that start with a vowel: append "way".
s = Regex.Replace(s, @"(\b[aeiou]\w+)", "$1way", RegexOptions.IgnoreCase);
List<string> words = new List<string>();
foreach (Match v in Regex.Matches(s, @"\w+"))
{
string result;
if (!v.Value.EndsWith("way"))
{
// Move the consonants before the first vowel to the end and append "ay".
result = Regex.Replace(v.Value, @"([^aeiou]*)([aeiou])(\w+)", "$2$3$1ay", RegexOptions.IgnoreCase);
words.Add(result);
}
else { words.Add(v.Value); }
}
s = string.Join(" ", words);
words.Clear();
foreach (Match v in Regex.Matches(s, @"\w+"))
{
// Words without any vowel: append "ay".
string result = Regex.Replace(v.Value, @"\b([^aeiou]+)\b", "$1ay", RegexOptions.IgnoreCase);
words.Add(result);
}
s = string.Join(" ", words);
return s;
}
Call it like this:
string test = "MPH Eat monkey poo."; // Added MPH so you can test whether the method works.
string result = PigLatinTranslator(test);
Console.WriteLine(result); // MPHay Eatway onkeymay oopay (the final period is lost because the words are re-joined with spaces)

An easier and clearer solution is to use Regex.Replace with a lambda.
static string TranslateToPigLatin(string input)
{
char[] vowels = new[] { 'A', 'E', 'I', 'O', 'U', 'a', 'e', 'i', 'o', 'u' };
char[] vowelsExtended = vowels.Concat(new[] { 'Y', 'y' }).ToArray();
string output = Regex.Replace(input, @"\w+", m =>
{
string word = m.Value;
if (vowels.Contains(word[0]))
return word + "way";
else
{
int indexOfVowel = word.IndexOfAny(vowelsExtended, 1);
if (indexOfVowel == -1)
return word + "ay";
else
return word.Substring(indexOfVowel) + word.Substring(0, indexOfVowel) + "ay";
}
});
return output;
}
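The lambda version above only rewrites \w+ runs, so spacing and punctuation survive, but a capitalized word such as "Monkey" comes out as "onkeyMay". A sketch that also keeps title-case words capitalized, assuming "capitalized" means first letter uppercase and the rest lowercase (all-caps words such as RPM are left alone); the class and method names are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PigLatin
{
    private static readonly char[] Vowels = "AEIOUaeiou".ToCharArray();
    // Inside a word, y also counts as a vowel (but not as the first letter).
    private static readonly char[] VowelsWithY = "AEIOUYaeiouy".ToCharArray();

    // Only runs of letters are rewritten; spaces, punctuation and line breaks pass through unchanged.
    public static string Translate(string input) =>
        Regex.Replace(input, "[A-Za-z]+", m => TranslateWord(m.Value));

    private static string TranslateWord(string word)
    {
        if (Vowels.Contains(word[0]))
            return word + "way";

        int v = word.IndexOfAny(VowelsWithY, 1);
        if (v == -1)
            return word + "ay"; // no vowel at all: "mph" -> "mphay", "RPM" -> "RPMay"

        string moved = word.Substring(v) + word.Substring(0, v) + "ay";
        // Keep a title-case word title-case: "Monkey" -> "Onkeymay".
        bool titleCase = char.IsUpper(word[0]) && word.Skip(1).All(char.IsLower);
        return titleCase
            ? char.ToUpperInvariant(moved[0]) + moved.Substring(1).ToLowerInvariant()
            : moved;
    }
}
```

For the test sentence, Translate("Eat monkey poo.") gives "Eatway onkeymay oopay." with the period kept in place.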

How do I make letters to uppercase after each of a set of specific characters

I have a collection of characters (',', '.', '/', '-', ' ') and a collection of about 500 strings.
What I want to do, as fast as possible, is make the letter after each of these characters uppercase.
I want the first letter capitalized as well, and many of the strings are all uppercase to begin with.
EDIT:
I modified tdragon's answer to this final result:
public static String CapitalizeAndStuff(string startingString)
{
startingString = startingString.ToLower();
char[] chars = new[] { '-', ',', '/', ' ', '.'};
StringBuilder result = new StringBuilder(startingString.Length);
bool makeUpper = true;
foreach (var c in startingString)
{
if (makeUpper)
{
result.Append(Char.ToUpper(c));
makeUpper = false;
}
else
{
result.Append(c);
}
if (chars.Contains(c))
{
makeUpper = true;
}
}
return result.ToString();
}
Then I call this method for all my strings.

string a = "fef-aw-fase-fes-fes,fes-,fse--sgr";
char[] chars = new[] { '-', ',' };
StringBuilder result = new StringBuilder(a.Length);
bool makeUpper = true;
foreach (var c in a)
{
if (makeUpper)
{
result.Append(Char.ToUpper(c));
makeUpper = false;
}
else
{
result.Append(c);
}
if (chars.Contains(c))
{
makeUpper = true;
}
}

public static string Capitalise(string text, string targets, CultureInfo culture)
{
bool capitalise = true;
var result = new StringBuilder(text.Length);
foreach (char c in text)
{
result.Append(capitalise ? char.ToUpper(c, culture) : c);
// Capitalise the next character whenever this one is a target,
// including when two targets appear in a row (e.g. "--").
capitalise = targets.IndexOf(c) >= 0;
}
return result.ToString();
}
Use it like this:
string targets = ",./- ";
string text = "one,two.three/four-five six";
Console.WriteLine(Capitalise(text, targets, CultureInfo.InvariantCulture));
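Since the goal is speed over about 500 strings, here is a sketch of the same rule that writes into a char array instead of a StringBuilder. It assumes, as in the edited question, that each string is lowercased first and that invariant-culture casing is acceptable. The names are hypothetical, and whether it is measurably faster than the StringBuilder versions is untested, so benchmark before relying on it.

```csharp
using System;

public static class CapitalizeAfter
{
    // Hypothetical helper: lowercases the string, then uppercases the first character
    // and every character that directly follows one of the separator characters.
    public static string Apply(string text, string separators)
    {
        if (string.IsNullOrEmpty(text)) return text;
        char[] buffer = text.ToLowerInvariant().ToCharArray();
        buffer[0] = char.ToUpperInvariant(buffer[0]);
        for (int i = 1; i < buffer.Length; i++)
        {
            // Uppercasing a non-letter separator leaves it unchanged,
            // so buffer[i - 1] can be compared against the separators directly.
            if (separators.IndexOf(buffer[i - 1]) >= 0)
                buffer[i] = char.ToUpperInvariant(buffer[i]);
        }
        return new string(buffer);
    }
}
```

For example, Apply("FEF-AW,FASE/FES.FES FES", ",./- ") returns "Fef-Aw,Fase/Fes.Fes Fes".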

char[] chars = new char[] { ',', '.', '/', '-', ' ' };
string input = "Foo bar bar foo, foo, bar,foo-bar.bar_foo zz-";
// Skip(1) shifts the indices by one, so input[i] is the character before c.
string result = char.ToUpper(input[0]) + new string(input.Skip(1).Select((c, i) =>
chars.Contains(input[i]) ? char.ToUpper(c) : c
).ToArray());
Console.WriteLine(result);

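Or you can use a simple regular expression (note that \b[a-z] also capitalizes a lowercase letter after any other word boundary, such as an apostrophe, not only after the listed characters):
var result = Regex.Replace(str, @"([.,-][a-z]|\b[a-z])", m => m.Value.ToUpper());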

You can split your whole string multiple times, once for each separator character (rinse and repeat), and then uppercase each block.
char[] tempChar = {',', '-'};
List<string> tempList = new List<string>();
tempList.Add(yourstring);
foreach (var currentChar in tempChar)
{
List<string> tempSecondList = new List<string>();
foreach (var tempString in tempList)
{
foreach (var tempSecondString in tempString.Split(currentChar))
{
tempSecondList.Add(tempSecondString);
}
}
tempList = tempSecondList;
}
I hope I counted correctly. Anyway, afterwards make the first letter of every entry in tempList uppercase. Note that Split discards the separator characters, so you have to put them back when rebuilding the string.
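Because String.Split discards the separators, here is a sketch of the same split-then-capitalize idea that keeps them, swapping the repeated String.Split calls for a single Regex.Split with a capturing group. It assumes a non-empty separator list and lowercases the input first, as the question's own final version does; the names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitCapitalize
{
    // Hypothetical helper. Assumes separators is non-empty.
    public static string Apply(string text, string separators)
    {
        // Build an alternation such as "(,|\.|-)"; escaping each character on its own
        // avoids '-' being read as a range, as it could be inside a character class.
        string pattern = "(" + string.Join("|", separators.Select(c => Regex.Escape(c.ToString()))) + ")";
        // Because the pattern has a capturing group, Regex.Split keeps the separators in the result.
        string[] pieces = Regex.Split(text.ToLowerInvariant(), pattern);
        return string.Concat(pieces.Select(p =>
            p.Length == 0 ? p : char.ToUpperInvariant(p[0]) + p.Substring(1)));
    }
}
```

Adjacent separators produce empty pieces, which are passed through unchanged, so "fse--sgr" becomes "Fse--Sgr".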

Inserting my own illegal characters into Path.GetInvalidFileNameChars() in C#

How can I extend Path.GetInvalidFileNameChars to include my own set of characters that are illegal in my application?
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
If I wanted to add the '&' as an illegal character, could I do that?

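One hack is to overwrite the private field behind Path.GetInvalidFileNameChars() via reflection (requires using System.Reflection;). This relies on a private field of the .NET Framework implementation and is not guaranteed to work on other runtimes:
typeof(Path).GetField("InvalidFileNameChars", BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, new[] { 'o', 'v', 'e', 'r', '9', '0', '0', '0' });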

Try this:
var invalid = Path.GetInvalidFileNameChars().Concat(new [] { '&' });
This will yield an IEnumerable<char> with all invalid characters, including yours.
Here is a full example:
using System.IO;
using System.Linq;
class Program
{
static void Main()
{
// This is the sequence of characters
var invalid = Path.GetInvalidFileNameChars().Concat(new[] { '&' });
// If you want them as an array you can do this
var invalid2 = invalid.ToArray();
// If you want them as a string you can do this
var invalid3 = new string(invalid.ToArray());
}
}

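You can't modify an existing function, but you can write a wrapper function that returns the result of Path.GetInvalidFileNameChars() plus your own illegal characters.
// MY_INVALID_FILENAME_CHARS is a placeholder for your application's extra characters.
private static readonly char[] MY_INVALID_FILENAME_CHARS = { '&' };
public static char[] GetInvalidFileNameChars() {
return Path.GetInvalidFileNameChars().Concat(MY_INVALID_FILENAME_CHARS).ToArray();
}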

An extension method is your best bet here.
public static class Extensions
{
public static char[] GetApplicationInvalidChars(this char[] input)
{
//Your list of invalid characters goes below.
var invalidChars = new [] { '%', '#', 't' };
// Enumerable.Concat (System.Linq) joins the two arrays element by element.
return input.Concat(invalidChars).ToArray();
}
}
Then use it as follows:
char[] invalid = Path.GetInvalidFileNameChars().GetApplicationInvalidChars();
It will append your invalid characters to the ones that are already there.
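Building on the Concat approach, here is a sketch of how the combined set might be used to validate a name, assuming a HashSet<char> is acceptable for lookups. The class and method names are hypothetical, and '&' is the example character from the question. Keep in mind that Path.GetInvalidFileNameChars() returns different sets on Windows and on Unix-like systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class AppFileNames
{
    // The platform's invalid file name characters plus the application's own ('&' here).
    private static readonly HashSet<char> InvalidChars =
        new HashSet<char>(Path.GetInvalidFileNameChars().Concat(new[] { '&' }));

    // True when the name is non-empty and contains none of the invalid characters.
    public static bool IsValidFileName(string name) =>
        !string.IsNullOrEmpty(name) && !name.Any(InvalidChars.Contains);
}
```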

First create a helper class "SanitizeFileName.cs"
public class SanitizeFileName
{
public static string ReplaceInvalidFileNameChars(string fileName, char? replacement = null)
{
if (fileName != null && fileName.Length != 0)
{
var sb = new StringBuilder();
var badChars = new[] { ',', ' ', '^', '°' };
var inValidChars = Path.GetInvalidFileNameChars().Concat(badChars).ToList();
foreach (var @char in fileName)
{
if (inValidChars.Contains(@char))
{
if (replacement.HasValue)
{
sb.Append(replacement.Value);
}
continue;
}
sb.Append(@char);
}
return sb.ToString();
}
return null;
}
}
Then, use it like this:
var validFileName = SanitizeFileName.ReplaceInvalidFileNameChars(filename, '_');
In my case, I had to clean up the "filename" in the "Content-Disposition" response header in a C# download method.
Response.AddHeader("Content-Disposition", "attachment;filename=" + validFileName);

How do I remove all non alphanumeric characters from a string except dash?

How do I remove all non alphanumeric characters from a string except dash and space characters?

Replace [^a-zA-Z0-9 -] with an empty string.
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");

I could have used a regex; regexes can provide an elegant solution, but they can cause performance issues. Here is one solution:
char[] arr = str.ToCharArray();
arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c)
|| char.IsWhiteSpace(c)
|| c == '-')));
str = new string(arr);
When using the .NET Compact Framework (which doesn't have FindAll),
replace FindAll with this:¹
char[] arr = str.Where(c => (char.IsLetterOrDigit(c) ||
char.IsWhiteSpace(c) ||
c == '-')).ToArray();
str = new string(arr);
¹ Comment by ShawnFeatherly

You can try:
string s1 = Regex.Replace(s, "[^A-Za-z0-9 -]", "");
Where s is your string.

Using System.Linq
string withOutSpecialCharacters = new string(stringWithSpecialCharacters.Where(c =>char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-').ToArray());

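The regex is [^\w\s\-]*:
\s is better to use than a literal space ( ), because there might be a tab in the text. Note that \w also matches the underscore and non-ASCII letters and digits, so those are kept as well.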

Based on the answers to this question, I created a static class and added these extension methods. I thought they might be useful for some people.
public static class RegexConvert
{
public static string ToAlphaNumericOnly(this string input)
{
Regex rgx = new Regex("[^a-zA-Z0-9]");
return rgx.Replace(input, "");
}
public static string ToAlphaOnly(this string input)
{
Regex rgx = new Regex("[^a-zA-Z]");
return rgx.Replace(input, "");
}
public static string ToNumericOnly(this string input)
{
Regex rgx = new Regex("[^0-9]");
return rgx.Replace(input, "");
}
}
Then the methods can be used as:
string example = "asdf1234!##$";
string alphanumeric = example.ToAlphaNumericOnly();
string alpha = example.ToAlphaOnly();
string numeric = example.ToNumericOnly();

Want something quick?
public static class StringExtensions
{
public static string ToAlphaNumeric(this string self,
params char[] allowedCharacters)
{
return new string(Array.FindAll(self.ToCharArray(),
c => char.IsLetterOrDigit(c) ||
allowedCharacters.Contains(c)));
}
}
This will allow you to specify which characters you want to allow as well.

Here is a fast, non-regex solution that avoids heap allocations, which was what I was looking for.
Unsafe edition:
public static unsafe void ToAlphaNumeric(ref string input)
{
fixed (char* p = input)
{
int offset = 0;
for (int i = 0; i < input.Length; i++)
{
if (char.IsLetterOrDigit(p[i]))
{
p[offset] = input[i];
offset++;
}
}
((int*)p)[-1] = offset; // Changes the length of the string
p[offset] = '\0';
}
}
And for those who don't want to use unsafe code or don't trust the string length hack:
public static string ToAlphaNumeric(string input)
{
int j = 0;
char[] newCharArr = new char[input.Length];
for (int i = 0; i < input.Length; i++)
{
if (char.IsLetterOrDigit(input[i]))
{
newCharArr[j] = input[i];
j++;
}
}
Array.Resize(ref newCharArr, j);
return new string(newCharArr);
}
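If you are on .NET Core 2.1 or later, here is a sketch of a safe alternative that allocates only the result string: count the surviving characters first, then fill the new string's buffer directly with string.Create. It keeps the non-regex idea without unsafe code. The method name and the allowed parameter are hypothetical, added so that dash and space can be kept as the question asks.

```csharp
using System;

public static class AlphaNumericFilter
{
    // Keeps letters, digits and any character listed in `allowed` (hypothetical parameter).
    // Requires .NET Core 2.1+ for string.Create.
    public static string KeepAlphaNumeric(string input, string allowed = "")
    {
        // First pass: count the characters that survive.
        int count = 0;
        foreach (char c in input)
            if (char.IsLetterOrDigit(c) || allowed.IndexOf(c) >= 0) count++;

        if (count == input.Length) return input; // nothing to remove, no allocation

        // Second pass: write the survivors straight into the new string's buffer.
        return string.Create(count, (input, allowed), (span, state) =>
        {
            int j = 0;
            foreach (char ch in state.Item1)
                if (char.IsLetterOrDigit(ch) || state.Item2.IndexOf(ch) >= 0) span[j++] = ch;
        });
    }
}
```

For the original question, KeepAlphaNumeric(str, "- ") keeps letters, digits, dashes and spaces.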

I've made a different solution, by eliminating the control characters, which was my original problem.
It is better than putting all the "special but good" characters in a list:
char[] arr = str.Where(c => !char.IsControl(c)).ToArray();
str = new string(arr);
It's simpler, so I think it's better!

Here's an extension method using @ata's answer as inspiration.
"hello-world123, 456".MakeAlphaNumeric(new char[]{'-'});// yields "hello-world123456"
or if you require additional characters other than hyphen...
"hello-world123, 456!?".MakeAlphaNumeric(new char[]{'-','!'});// yields "hello-world123456!"
public static class StringExtensions
{
public static string MakeAlphaNumeric(this string input, params char[] exceptions)
{
var charArray = input.ToCharArray();
var alphaNumeric = Array.FindAll<char>(charArray, (c => char.IsLetterOrDigit(c)|| exceptions?.Contains(c) == true));
return new string(alphaNumeric);
}
}

If you are working in JS, here is a very terse version
myString = myString.replace(/[^A-Za-z0-9 -]/g, "");

I use a variation of one of the answers here. I want to replace spaces with "-" so the result is SEO-friendly, and also make it lowercase. I also didn't want to reference System.Web from my services layer.
private string MakeUrlString(string input)
{
var array = input.ToCharArray();
array = Array.FindAll<char>(array, c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-');
var newString = new string(array).Replace(" ", "-").ToLower();
return newString;
}

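If you only need to keep digits, there is a much easier way with Regex ([\D] matches every non-digit, so letters, dashes and spaces are removed as well):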
private string FixString(string str)
{
return string.IsNullOrEmpty(str) ? str : Regex.Replace(str, "[\\D]", "");
}
