I have a requirement, I have a textfield which I'm fetching the data from database. It may be in Arabic or in English. I want to differentiate it dynamically and change alignment accordingly. ie, if the text is in Arabic it should be from right to left else left to right.
You can said that the text is in arabic, if count of arabic characters is more than count of english characters.
You can determine it using character classes in regular expression
public bool IsArabic(this string input)
{
var isArabic = Regex.Matches(input, "\\p{IsArabic}");
var isLatin = Regex.Matches(input, "\\p{IsBasicLatin}");
if (isArabic == null)
return false;
if (isLatin == null)
return true; //suggest that there is no another character types
if (isArabic.Count > isLatin.Count)
return true;
return false;
}
If text contains RTL mark then Windows does it for you.
Otherwise you may simply check for characters (don't forget System.Char represents a code-unit, not a code-point but in this case it's not an issue) in Unicode Arabic Code block:
public bool IsArabic(string text)
{
return Regex.IsMatch(text, "[\u06000-\u06FF]")
}
Related
I Have this text Grou00dfbeerenstrau00dfe and I need to convert it to Großbeerenstraße
also Eichstu00e4tt to Eichstätt
But I don't completely understand and solve this because of these reasons:
ONLY some characters (special characters) are converted, not the whole text
Unicoded texts usually have Escape characters like \u00df instead of u00df
Could you please help me to convert correctly back to its original states?
Basically, how can I convert when there is no escape character?
NOTE: If you must know, I'm sending some special charactered strings into some system. I cannot touch this system but when I request back the same string from that system, it converts Großbeerenstraße to Grou00dfbeerenstrau00dfe and so on.
Based on David's idea of looking for u and checking if the following 4 characters are valid hex numbers, it would look something like this:
public string FixGermanUnicode(string input) {
var output = new StringBuilder();
for (var i = 0; i < input.Length; i++) {
if (i < input.Length - 4 && input[i] == 'u' && input[i + 1] == '0'
&& int.TryParse(input.Substring(i + 1, 4), NumberStyles.HexNumber, null, out var code)) {
try {
output.Append(char.ConvertFromUtf32(code));
i += 4;
} catch (ArgumentOutOfRangeException) {
//not a valid unicode character
output.Append(input[i]);
}
} else {
output.Append(input[i]);
}
}
return output.ToString();
}
Console.WriteLine(FixGermanUnicode("Grou00dfbeerenstrau00dfe"));
Really, it checks for u0 to prevent cases where the next 4 characters are valid unicode, but should not have been replaced. That will work for German at least, since all the special characters in German have unicode codes starting with 0.
This will also catch scenarios where the follow 4 digits are valid hex numbers, but the resulting hex number is not a valid unicode character.
While I completely agree with #Gabriel Luci's answer, I would like to point out a more concise implementation of the same idea (it needs the ' System.Text.RegularExpression' namespace):
readonly static string unicodePattern = #"u0[0-9a-fA-F]{3}";
public static string FixGermanUnicode(string input)
{
return Regex.Replace(input, unicodePattern, match =>
{
var digits = match.Value.Substring(1);
try
{
return char.ConvertFromUtf32(int.Parse(digits, System.Globalization.NumberStyles.AllowHexSpecifier)).ToString();
}
catch (ArgumentOutOfRangeException)
{
//not a valid unicode character
return match.Value;
}
});
}
Is there a way to check if text is in cyrillics or latin using C#?
Use a Regex and check for \p{IsCyrillic}, for example:
if (Regex.IsMatch(stringToCheck, #"\p{IsCyrillic}"))
{
// there is at least one cyrillic character in the string
}
This would be true for the string "abcабв" because it contains at least one cyrillic character. If you want it to be false if there are non cyrillic characters in the string, use:
if (!Regex.IsMatch(stringToCheck, #"\P{IsCyrillic}"))
{
// there are only cyrillic characters in the string
}
This would be false for the string "abcабв", but true for "абв".
To check what the IsCyrillic named block or other named blocks contain, have a look at this http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks
How about this ?
string pattern = #"\p{IsCyrillic}";
if ( Regex.Matches(textInput, pattern).Count > 0)
{
// contains cyrillics' characters.
}
If you want to check that contains cyrillics characters more than x characters Change the right hand numeric value.
Our system recieved spam email that contains cyrillics' characters roughly 30% of
full of text;so, couldn't decide whether 100% or 0%
Here is another solution for this problem
public bool isCyrillic(string textInput)
{
bool rezultat=true;
string pattern = #"[абвгдѓежзѕијклљмнњопрстќуфхцчџш]";
char[] textArray = textInput.ToCharArray();
for (int i = 0; i < textArray.Length; i++)
{
if (!Regex.IsMatch(textArray[i].ToString(),pattern))
{
rezultat = false;
break;
}
}
return rezultat;
}
I have strings like this:
var a = "abcdefg";
var b = "xxxxxxxx";
The strings are always longer than five characters.
Now I need to trim off the last 3 characters. Is there some simple way that I can do this with C#?
In the trivial case you can just use
result = s.Substring(0, s.Length-3);
to remove the last three characters from the string.
Or as Jason suggested Remove is an alternative:
result = s.Remove(s.Length-3)
Unfortunately for unicode strings there can be a few problems:
A unicode codepoint can consist of multiple chars since the encoding of string is UTF-16 (See Surrogate pairs). This happens only for characters outside the basic plane, i.e. which have a code-point >2^16. This is relevant if you want to support Chinese.
A glyph (graphical symbol) can consist of multiple codepoints. For example ä can be written as a followed by a combining ¨.
Behavior with right-to-left writing might not be what you want either
You want String.Remove(Int32)
Deletes all the characters from this string beginning at a specified
position and continuing through the last position.
If you want to perform validation, along the lines of druttka's answer, I would suggest creating an extension method
public static class MyStringExtensions
{
public static string SafeRemove(this string s, int numCharactersToRemove)
{
if (numCharactersToRemove > s.Length)
{
throw new ArgumentException("numCharactersToRemove");
}
// other validation here
return s.Remove(s.Length - numCharactersToRemove);
}
}
var s = "123456";
var r = s.SafeRemove(3); //r = "123"
var t = s.SafeRemove(7); //throws ArgumentException
string a = "abcdefg";
a = a.Remove(a.Length - 3);
string newString = oldString.Substring(0, oldString.Length - 4);
If you really only need to trim off the last 3 characters, you can do this
string a = "abcdefg";
if (a.Length > 3)
{
a = a.Substring(0, a.Length-3);
}
else
{
a = String.Empty;
}
I found this question but it removes all valid utf-8 characters also (returns me a blank string, while there are valid utf-8 characters plus control characters). As I read about utf-8, there's not a specific range for control characters and each character set has its own control characters.
How can I modify above solution to only remove control characters ?
This is how I roll:
Regex.Replace(evilWeirdoText, #"[\u0000-\u001F]", string.Empty)
This strips out all the first 31 control characters. The next hex value up from \u001F is \u0020 AKA the space. Everything before space is all the line feed and null nonsense.
To believe me on the characters: http://donsnotes.com/tech/charsets/ascii.html
I think the following code will work for you:
public static string RemoveControlCharacters(string inString)
{
if (inString == null) return null;
StringBuilder newString = new StringBuilder();
char ch;
for (int i = 0; i < inString.Length; i++)
{
ch = inString[i];
if (!char.IsControl(ch))
{
newString.Append(ch);
}
}
return newString.ToString();
}
If you plan to use the string as a query string, you should consider using the Uri.EscapeUriString() or Uri.EscapeDataString() before sending it out.
Note: You might still need to pull out anything from char.IsControl() first?
how do I make a boolean statement to only allow text? I have this code for only allowing the user to input numbers but ca'nt figure out how to do text.
bool Dest = double.TryParse(this.xTripDestinationTextBox.Text, out Miles);
bool MilesGal = double.TryParse(this.xTripMpgTextBox.Text, out Mpg);
bool PriceGal = double.TryParse(this.xTripPricepgTextBox.Text, out Price);
Update: looking at your comment I would advise you to read this article:
User Input Validation in Windows Forms
Original answer: The simplest way, at least if you are using .NET 3.5, is to use LINQ:
bool isAllLetters = s.All(c => char.IsLetter(c));
In older .NET versions you can create a method to do this for you:
bool isAllLetters(string s)
{
foreach (char c in s)
{
if (!char.IsLetter(c))
{
return false;
}
}
return true;
}
You can also use a regular expression. If you want to allow any letter as defined in Unicode then you can use this regular expression:
bool isOnlyLetters = Regex.IsMatch(s, #"^\p{L}+$");
If you want to restrict to A-Z then you can use this:
bool isOnlyLetters = Regex.IsMatch(s, #"^[a-zA-Z]+$");
You can use following code in the KeyPress event:
if (!char.IsLetter(e.KeyChar)) {
e.Handled = true;
}
You can use regular expression, browse here: http://regexlib.com/
You can do one of a few things.
User Regular Expression to check that the string matches your concept of 'Text' only
Write some code that checks each character against a list of valid characters. Or even use char.IsLetter()
Use the MaskedTextBox and set a custom mask to limit input to text only characters
I found hooking up into the text changed event of a textbox and accepting/rejecting the changes an acceptable solution. To "only allow text" is a rather vague definition, but you may as well check if your newly added text (the next char) is a number or not and simply reject all numbers/disallowed characters. This will make users feel they can only enter characters and special characters (like dots, question marks etc.).
private void UTextBox_TextChanged(object sender, EventArgs e)
{
string lastCharacter = this.Text[this.Text.Length-1].ToString();
MatchCollection matches = Regex.Matches(lastCharacter, "[0-9]", RegexOptions.None);
if (matches.Count > 0) //character is a number, reject it.
{
this.Text = Text.Substring(0, Text.Length-1);
}
}