Preventing Cyrillic/Greek/Chinese in a string - C# 4.0

We have a system (ASP.NET, C# 4.0) that supports Greek, Cyrillic, and Chinese characters, but a third-party system doesn't seem to handle them correctly. To avoid issues when entering data for this third-party system, I want to limit the text fields to accept only English or accented characters and return a validation error for any other characters.
How can I accomplish this? It seems I could use a regex along the lines of \p{Latin}, but C# doesn't seem to support this: I get an "Unknown property 'Latin'" error.

.NET regular expressions don't support Unicode script names such as \p{Latin}. They do support named Unicode blocks, which must be written with an Is prefix:
[\p{IsGreek}\p{IsCyrillic}...]
A pattern like this would detect all offending characters in your case. If you just want to exclude everything but Latin, you could do something like:
[^\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]
This covers all code points up to U+024F.
For a list of supported block names, see MSDN.
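A minimal server-side validation sketch built on the negated pattern above (the class and method names are illustrative, not from the answer). It rejects the input as soon as any character outside U+0000..U+024F is found:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LatinValidator
{
    // Matches any single character outside Basic Latin .. Latin Extended-B (U+0000..U+024F).
    private static readonly Regex NonLatin = new Regex(
        @"[^\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]");

    // True when the string contains only characters from the four Latin blocks.
    public static bool IsLatinOnly(string input)
    {
        return !NonLatin.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsLatinOnly("Caf\u00E9 cr\u00E8me"));                 // accented Latin: accepted
        Console.WriteLine(IsLatinOnly("\u041F\u0440\u0438\u0432\u0435\u0442")); // Cyrillic: rejected
        Console.WriteLine(IsLatinOnly("\u4F60\u597D"));                         // Chinese: rejected
    }
}
```

Because .NET strings are UTF-16, characters outside the Basic Multilingual Plane show up as surrogate code units (U+D800..U+DFFF), which also fall outside the allowed blocks and are therefore rejected.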

Related

How to output text in C# so that line breaks don't fall in the middle of words

I’ve been searching on here without luck. I’m developing a chatbot with various responses, some of which are fairly long strings. Is there any way to make sure that line breaks don’t occur mid-word in the output? Would I have to handle it in my code before every response, or define it once before the method begins? Or is this just not possible? Thanks.
There is no built-in way to word-wrap a string to fit the console.
You need to decide on the criteria for your line-breaking algorithm and implement it yourself.
Notes
You need to recalculate the line breaks every time you render the text (assuming you show some sort of history), because the window size can change: console windows can be resized like any other window, which changes their width in characters.
Depending on the language, finding word boundaries ranges from trivial to implement ("just split on spaces") to a multi-year research project for languages that don't use spaces (a range similar to xkcd: Tasks :)).
If you have the option, I'd recommend rendering to HTML instead of the console, since word breaking is already done for you there (along with much more, such as proper emoji support, which you will have a hard time with in a console app).
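For the simple case of a space-separated language, a greedy wrap is enough. This is a minimal sketch under that assumption (class and method names are illustrative); it re-reads the console width on each render, as the notes above recommend:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ConsoleWrapper
{
    // Greedy word wrap that splits on spaces only (assumes a space-delimited language).
    // A single word longer than `width` gets its own line and may still be cut by the console.
    public static List<string> Wrap(string text, int width)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Start a new line if adding " word" would exceed the width.
            if (current.Length > 0 && current.Length + 1 + word.Length > width)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0)
                current.Append(' ');
            current.Append(word);
        }
        if (current.Length > 0)
            lines.Add(current.ToString());
        return lines;
    }

    // One less than the window width: on Windows, a line that fills the whole row
    // moves the cursor down, and the following newline then leaves a blank line.
    private static int GetConsoleWidth()
    {
        try
        {
            int width = Console.WindowWidth;
            return width > 1 ? width - 1 : 80;
        }
        catch (IOException)
        {
            return 80; // no console window, e.g. output redirected to a file
        }
    }

    public static void Main()
    {
        string response = "Some chatbot responses are long strings, and they should wrap at word boundaries instead of being cut mid-word.";
        // Re-read the width on every render, because the user can resize the window.
        foreach (string line in Wrap(response, GetConsoleWidth()))
            Console.WriteLine(line);
    }
}
```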

C# program stops working after the language setting in control panel is changed (say, from English to German)

I have software developed in C#, which is a purely scientific application. However, German users found that the software stops working from time to time when it is installed on German computers. The temporary workaround is to change the language setting in Control Panel: after we change it from German to English, it works fine. This is just engineering software and has nothing to do with the German or English language. Also, as suggested in other posts on MSDN, I have checked the code in "InitializeComponent()" several times, and there is no strange code in that function.
When you change locale, you change the meaning of ',' (comma) and '.' (full-stop) when used in numbers. Could it be that you are trying to parse text containing these characters into numbers?
Does your program attempt to initialize numeric fields with formatted numbers, perhaps?
You need to make sure that your code is sensitive to the user's culture when parsing and formatting text. You also need to make sure you use a consistent culture (e.g. the InvariantCulture) when reading data stored to file or sent over a network.
If you are using .NET Framework 4.5, you might be interested in reading about the CultureInfo.DefaultThreadCurrentCulture property:
In the .NET Framework 4 and previous versions, by default, the culture of all threads is set to the Windows system culture. For applications whose current culture differs from the default system culture, this behavior is often undesirable.
The examples and their explanations on the page could be quite helpful for your issue.
Also, as a side note, try{...}catch{...} blocks are always welcome.
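A small sketch of the failure mode described above, plus the "use a consistent culture for stored data" fix. To keep it independent of which cultures are installed, it uses a hand-built NumberFormatInfo (an assumption for the demo) that mimics German conventions; CultureInfo.GetCultureInfo("de-DE") uses the same separators:

```csharp
using System;
using System.Globalization;

public static class CultureParsingDemo
{
    // Mimics German number formatting: ',' is the decimal separator, '.' the group separator.
    public static readonly NumberFormatInfo GermanLike = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    // Values written to files or sent over the network: always the invariant culture.
    public static string Save(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    public static double Load(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        string stored = Save(1.5);
        Console.WriteLine(stored);

        // Round-trips correctly no matter what the user's culture is.
        Console.WriteLine(Load(stored).ToString(CultureInfo.InvariantCulture));

        // The likely bug: parsing the stored text with German-style settings.
        // '.' is accepted as a thousands separator, so 1.5 silently becomes 15.
        Console.WriteLine(double.Parse(stored, GermanLike).ToString(CultureInfo.InvariantCulture));
    }
}
```

Note that the German-style parse does not throw; it returns a wrong value, which is why this kind of bug can look like the program "stops working" only on some machines.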

C# console font

I can't find out which font a console app uses by default. Is it guaranteed that everyone has that font (when running this .NET app)? I want to display some Unicode characters and need to be sure they are present in that font.
Thanks
I strongly recommend avoiding the Console if you want to use Unicode characters. There are many issues with trying to get the Console to display Unicode correctly.
Unicode is not directly supported in console output. The best option is typically to set the console's code page, either through the Console.OutputEncoding property (available since .NET 2.0) or via P/Invoke to SetConsoleOutputCP.
That being said, a GUI solves all of these issues, in a much nicer fashion. If you need Unicode output, I'd recommend a simple GUI.
You can tell what font is being used by reading the registry value "0" from this key:
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
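A minimal sketch combining both suggestions: switch the output encoding to UTF-8, then read the registry value "0" from the key given above (Windows only; the class and method names are illustrative). Even with UTF-8 output, characters missing from the console font still render as boxes:

```csharp
using System;
using System.Text;
using Microsoft.Win32;

public static class ConsoleFontInfo
{
    // Reads value "0" under the TrueTypeFont key from the answer above.
    // Returns null on non-Windows systems or if the key or value is missing.
    public static string TryReadConsoleFontName()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
            return null;

        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(
            @"Software\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"))
        {
            return key == null ? null : key.GetValue("0") as string;
        }
    }

    public static void Main()
    {
        // Sets the console code page to UTF-8; the glyphs must still exist in the font.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("Console font: " + (TryReadConsoleFontName() ?? "(unknown)"));
        Console.WriteLine("\u00E9 \u03A9 \u2713");
    }
}
```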

How do I validate for language? ASP.NET

In a textbox in the application, I need to validate that the user enters only English text. I know some languages, such as Spanish, use the same alphabet as English. How do I validate text to make sure it's:
Only in English language
Supports only languages that use the English character set (Spanish, etc.)
Thanks
EDIT: Sorry for not being clear enough. This app is in production, and when I check the SQL database where the text is stored, there are a lot of rows containing "??? ?????". On further investigation, it appears that this happens when non-English text is saved to the database. For example, go to Google News, select Google Korea from the dropdown, copy some Korean text, and save it to a SQL Server database.
Anyone?
By "English character set", I guess you are referring to the ASCII character set.
You can iterate through each character and see whether it lies in the ASCII range.
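The per-character ASCII check can be sketched like this (names are illustrative). Note that it also rejects accented letters such as 'ñ', so it does not satisfy the "Spanish etc." requirement on its own; raising the bound to '\u00FF' (Latin-1) would be one way to allow them:

```csharp
using System;

public static class AsciiCheck
{
    // True if every UTF-16 code unit lies in the ASCII range U+0000..U+007F.
    public static bool IsAscii(string text)
    {
        foreach (char c in text)
        {
            if (c > '\u007F')
                return false;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAscii("Hello, world"));        // plain English: accepted
        Console.WriteLine(IsAscii("Se\u00F1or"));          // 'ñ' is outside ASCII: rejected
        Console.WriteLine(IsAscii("\uD55C\uAD6D\uC5B4"));  // Korean: rejected
    }
}
```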
You could check whether most of the words are recognized by an English dictionary (e.g. OpenOffice has a dictionary you may be able to use for free, though I'm not sure about that).
You could also do some kind of text analysis and check the frequency of each character or short sequence, such as 'th'. Each language has characteristic character frequencies, which can help you determine what language the text is written in.
I would not prohibit specific characters, because special characters occur quite often, at least in names.
I hope you got an idea of some possibilities.
Best Regards, Oliver Hanappi
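The "most words are in the dictionary" idea can be sketched as a ratio check. The tiny word list and the 0.8 threshold below are hypothetical placeholders; a real word list would be loaded from a dictionary file such as the OpenOffice one mentioned above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DictionaryCheck
{
    // Fraction of letter-only words in `text` that appear in `dictionary`.
    // Case sensitivity depends on the comparer the HashSet was built with.
    public static double RecognizedRatio(string text, HashSet<string> dictionary)
    {
        List<string> words = Regex.Matches(text, @"\p{L}+")
                                  .Cast<Match>()
                                  .Select(m => m.Value)
                                  .ToList();
        if (words.Count == 0)
            return 0.0;
        int known = words.Count(w => dictionary.Contains(w));
        return (double)known / words.Count;
    }

    public static void Main()
    {
        // Hypothetical word list, case-insensitive.
        var dictionary = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "the", "cat", "sat", "on", "mat"
        };
        double ratio = RecognizedRatio("The cat sat on the mat", dictionary);
        // Hypothetical threshold: tune it against real data.
        Console.WriteLine(ratio >= 0.8 ? "Probably English" : "Probably not English");
    }
}
```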
If this is for a moderately small amount of text, you could try finding an English dictionary web service and looking up the words. If a lookup fails, you most likely have either a typo or something from another language. I haven't found one that accepts large blocks of text, but there is a web service that operates off of the dict.org database:
DictService
One way is to use an English-language dictionary / spell checker to check whether each word is a valid English (or Spanish) word.
A very good sample is this one:
NetSpell Sample - Spell Checker for .NET
It is as simple as this:
// MyTextBox is the TextBox being validated
NetSpell.SpellChecker.Spelling spellChecker = new NetSpell.SpellChecker.Spelling();
spellChecker.Text = MyTextBox.Text;
spellChecker.SpellCheck();
NetSpell Home Page: http://www.loresoft.com/NetSpell

Phone number normalization: Any pre-existing libraries?

I have a system which is using phone numbers as unique identifiers. For this reason, I want to format all phone numbers as they come in using a normalized format. Because I have no control over my source data, I need to parse out these numbers myself and format them before adding them to my DB.
I'm about to write a parser that can read phone numbers in and output a normalized phone format, but before I do I was wondering if anyone knew of any pre-existing libraries I could use to format phone numbers.
If there are no pre-existing libraries out there, what things should I be keeping in mind when creating this feature that may not be obvious?
Although my system is only dealing with US numbers right now, I plan to try to include support for international numbers just in case since there is a chance it will be needed.
Edit: I forgot to mention I'm using C# / .NET 2.0.
You could use libphonenumber from Google. Here's a blog post:
http://blog.appharbor.com/2012/02/03/net-phone-number-validation-with-google-libphonenumber
Parsing numbers is as easy as installing the NuGet package and then doing this:
using PhoneNumbers;
var util = PhoneNumberUtil.GetInstance();
var number = util.Parse("555-555-5555", "US"); // throws NumberParseException on unparseable input
You can then format the number like this:
util.Format(number, PhoneNumberFormat.E164);
libphonenumber supports several formats other than E.164.
I'm currently involved in the OpenMoko project, which is developing a completely open source cell phone (including hardware). There has been a lot of trouble around normalizing phone numbers. I don't know if anyone has come up with a good solution yet. The biggest problem seems to be with US phone numbers, since sometimes they come in with a 1 on the front and sometimes not. Depending on what you have stored in your contacts list, it may or may not display the caller ID info correctly. I'd recommend stripping off the 1 on the phone number (though I'd expect most people wouldn't enter it in the first place). You may also need to look for a plus sign or country code on the front of international numbers.
You can check around the OpenMoko website, mailing list, and source control to see if they've solved this bug yet.
perl and rails examples
http://validates-as-phone.googlecode.com/svn/trunk/README
http://www.perlmonks.org/?node_id=159645
Just strip out any non-digits, possibly using a RegEx: [^\d]
The only exceptions might be if you want to handle extensions (for example, to distinguish a number without an area code from one with a 3-digit extension) or if you need to handle international numbers.
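A minimal US-only sketch of the strip-non-digits approach, combined with the OpenMoko answer's suggestion of dropping a leading 1 (names are illustrative; extensions and international numbers are not handled). It uses [^0-9] rather than [^\d], because in .NET \d also matches non-ASCII digits such as Arabic-Indic ones:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsPhoneNormalizer
{
    // Matches anything that is not an ASCII digit.
    private static readonly Regex NonDigits = new Regex("[^0-9]");

    // Returns the 10-digit US number, or null if the input can't be normalized.
    public static string Normalize(string input)
    {
        string digits = NonDigits.Replace(input, "");
        // Drop a leading country code 1, e.g. "+1 555 555 5555" or "1-555-555-5555".
        if (digits.Length == 11 && digits[0] == '1')
            digits = digits.Substring(1);
        return digits.Length == 10 ? digits : null;
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("(555) 555-5555") ?? "invalid");
        Console.WriteLine(Normalize("+1 555.555.5555") ?? "invalid");
        Console.WriteLine(Normalize("555-5555") ?? "invalid");
    }
}
```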
What you need is a list of all country codes. Match the first few characters of your string against that list to make sure the country code is correct; then make sure the rest of the number is all digits and of a proper length, which usually varies from 5 to 10 digits.
To check against country codes, you can install the NGeoNames NuGet package, which uses data from www.geonames.org, to get a list of all country codes to match against.
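The matching step can be sketched as follows. The hard-coded set of calling codes is a hypothetical sample standing in for the full list (e.g. one loaded via NGeoNames); because country calling codes are 1 to 3 digits and no code is a prefix of another, the first prefix that matches is the only possible match:

```csharp
using System;
using System.Collections.Generic;

public static class CountryCodeMatcher
{
    // Hypothetical sample of calling codes: US/Canada, UK, Germany, China, Ireland.
    private static readonly HashSet<string> CountryCodes = new HashSet<string>
    {
        "1", "44", "49", "86", "353"
    };

    // Splits "+<country code><number>" if the code is known and the rest is 5-10 digits.
    public static bool TrySplit(string number, out string countryCode, out string subscriber)
    {
        countryCode = null;
        subscriber = null;
        string digits = number.Length > 0 && number[0] == '+' ? number.Substring(1) : number;

        for (int len = 1; len <= 3 && len < digits.Length; len++)
        {
            string prefix = digits.Substring(0, len);
            if (!CountryCodes.Contains(prefix))
                continue;

            string rest = digits.Substring(len);
            if (rest.Length < 5 || rest.Length > 10)
                return false;
            foreach (char c in rest)
            {
                if (c < '0' || c > '9')
                    return false;
            }
            countryCode = prefix;
            subscriber = rest;
            return true;
        }
        return false;
    }

    public static void Main()
    {
        string cc, rest;
        if (TrySplit("+4930123456", out cc, out rest))
            Console.WriteLine("Country code " + cc + ", number " + rest);
        else
            Console.WriteLine("Unrecognized number");
    }
}
```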
