What is wrong with ToLowerInvariant()? - c#

I have the following line of code:
var connectionString = configItems.
Find(item => item.Name.ToLowerInvariant() == "connectionstring");
VS 2010 code analysis is telling me the following:
Warning 7 CA1308 : Microsoft.Globalization : In method ... replace the call to 'string.ToLowerInvariant()' with String.ToUpperInvariant().
Does this mean ToUpperInvariant() is more reliable?

Google gives a hint pointing to CA1308: Normalize strings to uppercase
It says:
Strings should be normalized to uppercase. A small group of characters, when they are converted to lowercase, cannot make a round trip. To make a round trip means to convert the characters from one locale to another locale that represents character data differently, and then to accurately retrieve the original characters from the converted characters.
So, yes - ToUpper is more reliable than ToLower.
In the future I suggest googling first - I do that for all those FxCop warnings I get thrown around ;) Helps a lot to read the corresponding documentation ;)

Besides what TomTom says, .NET is optimized for string comparison in upper case, so ToUpperInvariant() is theoretically faster than ToLowerInvariant().
This is indeed stated in CLR via C#, as pointed out in the comments.
I'm not sure whether this is really true, though, since I could find nothing on MSDN about this topic. The string comparison guide on MSDN treats ToUpperInvariant and ToLowerInvariant as equivalent and does not prefer the former.
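For the lookup in the question, one way to avoid the upper/lower question altogether is to skip normalization and compare case-insensitively. The following is only a sketch: ConfigItem and ConfigLookup are hypothetical stand-ins for the asker's types, which the question does not show.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's configuration item type.
class ConfigItem
{
    public string Name { get; set; }
    public string Value { get; set; }
}

static class ConfigLookup
{
    // Compares names without creating a normalized copy of each string,
    // so neither ToLowerInvariant() nor ToUpperInvariant() is involved.
    public static ConfigItem FindByName(List<ConfigItem> items, string name)
    {
        return items.Find(item =>
            string.Equals(item.Name, name, StringComparison.OrdinalIgnoreCase));
    }
}
```

With this, ConfigLookup.FindByName(configItems, "connectionstring") would match an item named "ConnectionString", and an item whose Name is null simply does not match.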

Related

String interpolation C#: Documentation of colon and semicolon functionality

I found this codegolf answer for the FizzBuzz test, and after examining it a bit I realized I had no idea how it actually worked, so I started investigating:
for(int i=1; i<101;i++)
System.Console.Write($"{(i%3*i%5<1?0:i):#}{i%3:;;Fizz}{i%5:;;Buzz}\n");
I put it into dotnetfiddle and established the 1st part works as follows:
{(BOOL?0:i):#}
When BOOL is true, then the conditional expression returns 0 otherwise the number.
However, the number isn't printed unless it's <> 0. I'm guessing this is the job of the :# characters. I can't find any documentation on how :# works. Can anyone explain the colon/hash or point me in the right direction?
Second part:
{VALUE:;;Fizz}
When VALUE = 0 then nothing is printed. I assume this is determined by the first ; character [end statement]. The second ; character determines 'if VALUE <> 0 then print what's after me.'
Again, does anyone have documentation on the use of a semicolon in string interpolation, as I can't find anything useful.
This is all covered in the String Interpolation documentation, especially the section on the Structure of an Interpolated String, which includes this:
{<interpolatedExpression>[,<alignment>][:<formatString>]}
along with a more detailed description for each of those three sections.
The format string portion of that structure is defined on separate pages, where you can use standard and custom formats for numeric types as well as standard and custom formats for date and time types. There are also options for Enum values, and you can even create your own custom format provider.
It's worth taking a look at the custom format provider documentation just because it will also lead you to the FormattableString type. This isn't well-covered by the documentation, but my understanding is this type may in theory allow you to avoid re-parsing the interpolated string for each iteration when used in a loop, thus potentially improving performance (though in practice, there's no difference at this time). I've written about this before, and my conclusion is MS needs to build this into the framework in a better way.
Thanks to all the commenters! Fast response.
The # is defined here (Custom specifier)
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--custom-specifier
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string. Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
The ; is defined here (Section Separator):
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--section-separator
The semicolon (;) is a conditional format specifier that applies
different formatting to a number depending on whether its value is
positive, negative, or zero. To produce this behavior, a custom format
string can contain up to three sections separated by semicolons...
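To see the two specifiers working outside of an interpolated string, here is a small sketch that spells out the codegolf line with ToString calls. SectionFormatDemo and FizzBuzzLine are names made up for this illustration, and the invariant culture is passed only to keep the output independent of the machine's settings.

```csharp
using System;
using System.Globalization;

static class SectionFormatDemo
{
    public static string FizzBuzzLine(int i)
    {
        var inv = CultureInfo.InvariantCulture;

        // "#" never prints a non-significant zero, so the value 0 becomes "".
        string number = (i % 3 == 0 || i % 5 == 0 ? 0 : i).ToString("#", inv);

        // ";;Fizz" has three sections: positive, negative, zero.
        // The first two are empty, so only a remainder of 0 prints "Fizz".
        string fizz = (i % 3).ToString(";;Fizz", inv);
        string buzz = (i % 5).ToString(";;Buzz", inv);

        return number + fizz + buzz;
    }
}
```

So the third section is the one that applies to zero: FizzBuzzLine(3) gives "Fizz", FizzBuzzLine(15) gives "FizzBuzz", and FizzBuzzLine(7) gives "7".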

C# casing convention for hexadecimal literals?

I like unity and I want to keep my code style guidelines strictly consistent.
So consider this ulong literal:
var x = 0xFFUL;
vs.
var x = 0xffUL;
It might be a stupid question, but I hate it when my code is not consistent even in these negligible things, so I'd like to know which hex casing the C# project team uses...
To date, the most popular format for representing hexadecimal literals is 0 to 9 and A to F.
You can confirm that the use of uppercase characters is most popular by referring to the hexadecimal Wikipedia entry here.
Hence follow the "wisdom of the crowd" and use 0-9, A-F.
I'm actually looking at Microsoft's code base and find inconsistencies there too.
Here it's lowercase while here it's uppercase.
So I believe the true answer is that there is no convention about that, and one should pick one's favorite casing.
Also searching here I'm not finding anything about hex casing.

string to double validation c# please answer

OK, so I am building a program in WPF.
As you know, WPF inputs are usually strings. To turn them into doubles, I first need to validate that the strings fit and then convert them.
The problem is in the validation. I have done the part that checks string.IsNullOrEmpty, but what I could not do is validate whether the input is completely unconvertible. Let me show an example, because some strings that are not completely numeric should still be accepted:
"sadasdaasd" - not accepted (obviously)
"8945a4554" - not accepted (there is an 'a' in the middle)
"3519" - accepted
"12.55" - accepted
"-3/4" - accepted, and the value should be converted to a double as (-3) divided by (4). So '/' is accepted: it splits the string in two, and the result is first part / second part.
I have been trying to do this validation all day and still have not succeeded. I searched the web for input validation, and some said that I need to use double.TryParse(string, out double), but this function does not work with the '/' split that I wanted. So please help me!
I would start by parsing your string via regex (q: is "-3*4" acceptable as -3 times 4?). Basically you're looking for a match on a regex which is kind of like this (this works on -3/4, you'd want to test it further and modify if multiplication is allowed): -?\d+[/]\d+
If you find that match, parse out your string with string.Split('/') which will give you an array of strings. TryParse each of those and do the math.
If there is not a match, use TryParse (as recommended previously). That will either succeed (3519, 12.55 in your examples) or fail (sadasdaasd, 8945a4554 in your examples).
Note: you could also use string.Contains('/'), but then you have to check whether the string holds more than one slash (unless such a thing is allowed, in which case you'll need to revisit that regex).
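The steps above could look roughly like the sketch below. It takes the Split route rather than the regex, and several things in it are assumptions rather than requirements from the question: the FractionParser and TryParseNumber names, the use of the invariant culture (so "." is always the decimal separator), allowing decimals on either side of the slash, and rejecting a zero denominator.

```csharp
using System;
using System.Globalization;

static class FractionParser
{
    // Accepts plain numbers ("3519", "12.55") and simple fractions ("-3/4").
    public static bool TryParseNumber(string input, out double value)
    {
        value = 0;
        if (string.IsNullOrEmpty(input)) return false;

        string[] parts = input.Split('/');
        if (parts.Length == 1) return TryParsePlain(parts[0], out value);
        if (parts.Length != 2) return false;   // more than one slash

        double numerator, denominator;
        if (!TryParsePlain(parts[0], out numerator)) return false;
        if (!TryParsePlain(parts[1], out denominator)) return false;
        if (denominator == 0) return false;    // assumption: treat as invalid

        value = numerator / denominator;
        return true;
    }

    // Digits with an optional leading sign and decimal point; no whitespace,
    // thousands separators, or exponents.
    private static bool TryParsePlain(string s, out double result)
    {
        return double.TryParse(
            s,
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture,
            out result);
    }
}
```

With the examples from the question, "sadasdaasd" and "8945a4554" are rejected, "3519" and "12.55" parse as-is, and "-3/4" yields -0.75. If the WPF input should follow the user's regional settings, CultureInfo.CurrentCulture would be the alternative to try.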

C# string.IndexOf() returns unexpected value

This question applies to C#, .net Compact Framework 2 and Windows CE 5 devices.
I encountered a bug in a .net DLL which was in use on very different CE devices for years, without showing any problems. Suddenly, on a new Windows CE 5.0 device, this bug appeared in the following code:
string s = "Print revenue receipt"; // has only single space chars
int i = s.IndexOf("  "); // two space chars
I expect i to be -1; however, this was only true until today, when IndexOf suddenly returned 5.
Since this behaviour doesn't occur when using
int i = s.IndexOf("  ", StringComparison.Ordinal);
, I'm quite sure that this is a culture-based phenomenon, but I can't recognize the difference this new device makes. It is a mostly identical version of a known device (just a faster CPU and new board).
Both devices:
run Windows CE 5.0 with identical localization
System.Environment.Version reports '2.0.7045.0'
CultureInfo.CurrentUICulture and CultureInfo.CurrentCulture report 'en-GB' (also tested with 'de-DE')
'all' related registry keys are equal.
The new device had the CF 3.5 preinstalled, whose GAC files I experimentally renamed, with no change in the described behaviour. Since at runtime always Version 2.0.7045.0 is reported, I assume these assemblies have no effect.
Although this is not difficult to fix, I cannot stand it when things seem that magical. Any hints as to what I was missing?
Edit: it is getting stranger and stranger (the two screenshots posted here are not preserved).
I believe you already have the answer using an ordinal search
int i = s.IndexOf("  ", StringComparison.Ordinal);
You can read a small section in the documentation for the String Class which has this to say on the subject:
String search methods, such as String.StartsWith and String.IndexOf, also can perform culture-sensitive or ordinal string comparisons. The following example illustrates the differences between ordinal and culture-sensitive comparisons using the IndexOf method. A culture-sensitive search in which the current culture is English (United States) considers the substring "oe" to match the ligature "œ". Because a soft hyphen (U+00AD) is a zero-width character, the search treats the soft hyphen as equivalent to Empty and finds a match at the beginning of the string. An ordinal search, on the other hand, does not find a match in either case.
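The difference described in that passage can be tried with a couple of one-line helpers. This is only a sketch: IndexOfDemo and its method names are made up here, and the culture-sensitive result is deliberately not annotated because it depends on the runtime and the platform's globalization data.

```csharp
using System;

static class IndexOfDemo
{
    // Ordinal: compares raw UTF-16 code units, so no character is ever ignored.
    public static int OrdinalIndex(string text, string value)
    {
        return text.IndexOf(value, StringComparison.Ordinal);
    }

    // Culture-sensitive: linguistic rules apply, including ignorable characters.
    // StringComparison.InvariantCulture is also a linguistic comparison.
    public static int CultureIndex(string text, string value)
    {
        return text.IndexOf(value, StringComparison.CurrentCulture);
    }
}
```

For the string from the question, IndexOfDemo.OrdinalIndex("Print revenue receipt", "  ") is -1 because two adjacent spaces never occur, and IndexOfDemo.OrdinalIndex("co\u00ADoperative", "\u00AD") is 2, the actual position of the soft hyphen. Running CultureIndex with the same arguments on both devices would be one way to compare their behaviour directly.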
Culture stuff can really appear to be quite magical on some systems. What I came to do after years of pain is to always set the culture information manually to InvariantCulture wherever I do not explicitly want different behaviour for different cultures. So my suggestion would be: make that IndexOf check always use the same culture information, like so:
int i = s.IndexOf("  ", StringComparison.InvariantCulture);
The reference at http://msdn.microsoft.com/en-us/library/k8b1470s.aspx states:
"Character sets include ignorable characters, which are characters that are not considered when performing a linguistic or culture-sensitive comparison. In a culture-sensitive search, if value contains an ignorable character, the result is equivalent to searching with that character removed."
This is from the 4.5 reference; the references for previous versions contain nothing like that.
So let me take a guess: they have changed the rules from 4.0 to 4.5, and now the second space of a two-space sequence is considered to be an "ignorable character", at least if the engine recognizes your string as English text (like in your example string s), otherwise not.
And somehow on your new device, a 4.5 dll is used instead of the expected 2.0 dll.
A wild guess, I know :)

How to detect a C++ identifier string?

E.g:
isValidCppIdentifier("_foo") // returns true
isValidCppIdentifier("9bar") // returns false
isValidCppIdentifier("var'") // returns false
I wrote some quick code but it fails:
my regex is "[a-zA-Z_$][a-zA-Z0-9_$]*"
and I simply do regex.IsMatch(inputString).
Thanks..
It should work with some added anchoring:
"^[a-zA-Z_][a-zA-Z0-9_]*$"
If you really need to support ludicrous identifiers using Unicode, feel free to read one of the various versions of the standard and add all the ranges into your regexp (for example, pages 713 and 714 of http://www-d0.fnal.gov/~dladams/cxx_standard.pdf)
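As a sketch of the anchored pattern in C# (the language the question's regex.IsMatch call suggests), the following wraps it in a helper. CppIdentifier is a made-up class name, the pattern uses \A and \z instead of ^ and $ because $ in .NET also matches just before a trailing newline, and like the pattern above it covers only ASCII identifiers and does not reject C++ keywords.

```csharp
using System;
using System.Text.RegularExpressions;

static class CppIdentifier
{
    // \A and \z anchor the match to the whole string, so a valid prefix
    // followed by other characters (such as "var'") is rejected.
    private static readonly Regex Pattern =
        new Regex(@"\A[a-zA-Z_][a-zA-Z0-9_]*\z", RegexOptions.CultureInvariant);

    public static bool IsValidCppIdentifier(string candidate)
    {
        return candidate != null && Pattern.IsMatch(candidate);
    }
}
```

With the examples from the question, "_foo" is accepted while "9bar" and "var'" are rejected. The unanchored original failed because IsMatch succeeds as soon as any substring matches.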
Matti's answer will work to sanitize identifiers before inserting into C++ code, but won't handle C++ code as input very well. It will be annoying to separate things like L"wchar_t string", where L is not an identifier. And there's Unicode.
Clang, the LLVM project's compiler front end (heavily backed by Apple), is built on a philosophy of modularity and provides a set of tokenizer functions. It looks like you would want clang_createTranslationUnitFromSourceFile and clang_tokenize.
I didn't check to see if it handles \Uxxxx or anything. Can't make any kind of guarantees. Last time I used LLVM was five years ago and it wasn't the greatest experience… but not the worst either.
On the other hand, GCC certainly has it, although you have to figure out how to use cpp_lex_direct.
