I like unity and I want to keep my code style guidelines strictly consistent.
So consider this ulong literal:
var x = 0xFFUL;
vs.
var x = 0xffUL;
It might be a stupid question but I hate when my code is not consistent even in these negligible things, so I'd like to know what's hex numbers like for the C# project team...
To date, the most popular format for representing hexadecimal literals is 0 to 9 and A to F.
You can confirm that the use of uppercase characters is most popular by referring to the hexadecimal Wikipedia entry here.
Hence follow the "wisdom of the crowd" and use 0-9, A-F.
I'm actually looking at Microsoft's code base and find inconsistencies there too.
Here it's lowercase while here it's uppercase.
So I believe the true answer is there is no convention about that, and one should pick his favorite naming.
Also searching here I'm not finding anything about hex casing.
Related
I found this codegolf answer for the FizzBuzz test, and after examining it a bit I realized I had no idea how it actually worked, so I started investigating:
for(int i=1; i<101;i++)
System.Console.Write($"{(i%3*i%5<1?0:i):#}{i%3:;;Fizz}{i%5:;;Buzz}\n");
I put it into dotnetfiddle and established the 1st part works as follows:
{(BOOL?0:i):#}
When BOOL is true, then the conditional expression returns 0 otherwise the number.
However the number isn't returned unless it's <> 0. I'm guessing this is the job the of :# characters. I can't find any documentation on the :# characters workings. Can anyone explain the colon/hash or point me in the right direction?
Second part:
{VALUE:;;Fizz}
When VALUE = 0 then nothing is printed. I assume this is determined by the first ; character [end statement]. The second ; character determines 'if VALUE <> 0 then print what's after me.'
Again, does anyone have documentation on the use of a semicolon in string interpolation, as I can't find anything useful.
This is all covered in the String Interpolation documentation, especially the section on the Structure of an Interpolated String, which includes this:
{<interpolatedExpression>[,<alignment>][:<formatString>]}
along with a more detailed description for each of those three sections.
The format string portion of that structure is defined on separate pages, where you can use standard and custom formats for numeric types as well as standard and custom formats for date and time types. There are also options for Enum values, and you can even create your own custom format provider.
It's worth taking a look at the custom format provider documentation just because it will also lead you to the FormattableString type. This isn't well-covered by the documentation, but my understanding is this type may in theory allow you to avoid re-parsing the interpolated string for each iteration when used in a loop, thus potentially improving performance (though in practice, there's no difference at this time). I've written about this before, and my conclusion is MS needs to build this into the framework in a better way.
Thanks to all the commenters! Fast response.
The # is defined here (Custom specifier)
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--custom-specifier
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string. Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
The ; is defined here (Section Seperator):
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--section-separator
The semicolon (;) is a conditional format specifier that applies
different formatting to a number depending on whether its value is
positive, negative, or zero. To produce this behavior, a custom format
string can contain up to three sections separated by semicolons...
In my answer to this question, I mentioned that we used UpperCamelCase parsing to get a description of an enum constant not decorated with a Description attribute, but it was naive, and it didn't work in all cases. I revisited it, and this is what I came up with:
var result = Regex.Replace(camelCasedString,
#"(?<a>(?<!^)[A-Z][a-z])", #" ${a}");
result = Regex.Replace(result,
#"(?<a>[a-z])(?<b>[A-Z0-9])", #"${a} ${b}");
The first Replace looks for an uppercase letter, followed by a lowercase letter, EXCEPT where the uppercase letter is the start of the string (to avoid having to go back and trim), and adds a preceding space. It handles your basic UpperCamelCase identifiers, and leading all-upper acronyms like FDICInsured.
The second Replace looks for a lowercase letter followed by an uppercase letter or a number, and inserts a space between the two. This is to handle special but common cases of middle or trailing acronyms, or numbers in an identifier (except leading numbers, which are usually prohibited in C-style languages anyway).
Running some basic unit tests, the combination of these two correctly separated all of the following identifiers: NoDescription, HasLotsOfWords, AAANoDescription, ThisHasTheAcronymABCInTheMiddle, MyTrailingAcronymID, TheNumber3, IDo3Things, IAmAValueWithSingleLetterWords, and Basic (which didn't have any spaces added).
So, I'm posting this first to share it with others who may find it useful, and second to ask two questions:
Anyone see a case that would follow common CamelCase-ish conventions, that WOULDN'T be correctly separated into a friendly string this way? I know it won't separate adjacent acronyms (FDICFCUAInsured), recapitalize "properly" camelCased acronyms like FdicInsured, or capitalize the first letter of a lowerCamelCased identifier (but that one's easy to add - result = Regex.Replace(result, "^[a-z]", m=>m.ToString().ToUpper());). Anything else?
Can anyone see a way to make this one statement, or more elegant? I was looking to combine the Replace calls, but as they do two different things to their matches it can't be done with these two strings. They could be combined into a method chain with a RegexReplace extension method on String, but can anyone think of better?
So while I agree with Hans Passant here, I have to say that I had to try my hand at making it one regex as an armchair regex user.
(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))
Is what I came up with. It seems to pass all the tests you put forward in the question.
So
var result = Regex.Replace(camelCasedString, #"(?<a>(?<!^)((?:[A-Z][a-z])|(?:(?<!^[A-Z]+)[A-Z0-9]+(?:(?=[A-Z][a-z])|$))|(?:[0-9]+)))", #" ${a}");
Does it in one pass.
not that this directly answers the question, but why not test by taking the standard C# API and converting each class into a friendly name? It'd take some manual verification, but it'd give you a good list of standard names to test.
Let's say every case you come across works with this (you're asking us for examples that won't and then giving us some, so you don't even have a question left).
This still binds UI to programmatic identifiers in a way that will make both programming and UI changes brittle.
It still assumes your program will only be used in one language. Either your potential market it so small that just indexing an array of names would be scalable enough (e.g. a one-client bespoke or in-house project), or you are assuming you will never be successful enough to need to be available to other languages or other dialects of your first-chosen language.
Does "well, it'll work as long as we're a failure" sound like a passing grade in balancing designs?
Either code it to use resources, or else code it to pass the enum name blindly or use an array of names, as that at least will be modifiable afterwards.
E.g:
isValidCppIdentifier("_foo") // returns true
isValidCppIdentifier("9bar") // returns false
isValidCppIdentifier("var'") // returns false
I wrote some quick code but it fails:
my regex is "[a-zA-Z_$][a-zA-Z0-9_$]*"
and I simply do regex.IsMatch(inputString).
Thanks..
It should work with some added anchoring:
"^[a-zA-Z_][a-zA-Z0-9_]*$"
If you really need to support ludicrous identifiers using Unicode, feel free to read one of the various versions of the standard and add all the ranges into your regexp (for example, pages 713 and 714 of http://www-d0.fnal.gov/~dladams/cxx_standard.pdf)
Matti's answer will work to sanitize identifiers before inserting into C++ code, but won't handle C++ code as input very well. It will be annoying to separate things like L"wchar_t string", where L is not an identifier. And there's Unicode.
Clang, Apple's compiler which is built on a philosophy of modularity, provides a set of tokenizer functions. It looks like you would want clang_createTranslationUnitFromSourceFile and clang_tokenize.
I didn't check to see if it handles \Uxxxx or anything. Can't make any kind of gurarantees. Last time I used LLVM was five years ago and it wasn't the greatest experience… but not the worst either.
On the other hand, GCC certainly has it, although you have to figure out how to use cpp_lex_direct.
I have the following line of code:
var connectionString = configItems.
Find(item => item.Name.ToLowerInvariant() == "connectionstring");
VS 2010 code analysis is telling me the following:
Warning 7 CA1308 : Microsoft.Globalization : In method ... replace the call to 'string.ToLowerInvariant()' with String.ToUpperInvariant().
Does this mean ToUpperInvariant() is more reliable?
Google gives a hint pointing to CA1308: Normalize strings to uppercase
It says:
Strings should be normalized to uppercase. A small group of characters, when they are converted to lowercase, cannot make a round trip. To make a round trip means to convert the characters from one locale to another locale that represents character data differently, and then to accurately retrieve the original characters from the converted characters.
So, yes - ToUpper is more reliable than ToLower.
In the future I suggest googling first - I do that for all those FxCop warnings I get thrown around ;) Helps a lot to read the corresponding documentation ;)
Besides what TomTom says, .net is optimized for string comparison in upper case. So using upper invariant is theoretically faster than lowerinvariant.
This is indeed stated in CLR via C# as pointed out in the comments.
Im not sure if this is of course really true since there is nothing to be found on MSDN about this topic. The string comparison guide on msdn mentions that toupperinvariant and tolowerinvariant are equal and does not prefer the former.
I have a Excel Spreadsheet with lab data which looks like this:
µg/L (ppb)
I want to test for the presence of the Greek letter "µ" and if found I need to do something special.
Normally, I would write something like this:
if ( cell.StartsWith(matchSequence) ) {
//.. <-- universal symbol for "magic" :)
}
I know there is an Encoding API in the Framework, but should I use it for just this one edge-case or just copy the Greek micro symbol from the character map?
How would I test for the presence of a this unicode character? The character map seems like a "cheap" fix that will bite me later (I work for a company which is multinational).
I want to do something that is maintainable and not just some crazy math-voodoo conversion that only works for this edge case.
I guess I'm asking for best practice advice here.
Thanks!
You need to work out the unicode character you're interested in, then you can represent it with in code with an escape sequence.
For example, µ is U+00B5, so you just need:
if (text.Contains("\u00b5"))
You can find out the Unicode value from charmap or from the Unicode code charts.
The Unicode code point for micro µ is U+00B5 and is different from the "Greek letter mu" µ, which is at U+03BC. So you can use "\u00b5" to find it, and possibly also look for "\u03bc" as well - they look the same, so whoever created the spreadsheet could have used the wrong one!
You can create a Char from the numeric equivelent shown to you in the Character Map (displays as U+0050 for 'P'). To do this simply check the contains:
string value;
if (value.Contains(Char.ConvertFromUtf32(0x0050)))
;
C# code files are usually encoded in utf8, since the language is using this encoding. All strings and strign literals in c# (and other .NET languages) are encoded in utf16. So you can safely copy the micro character from the character map.
You can also use its integer value as unicode literal like 0x1234.