How to refactor C# interpolated strings to string.Format(...) automatically?

C# 6 recently introduced a new piece of syntactic sugar called string interpolation.
However, after a few minutes of enjoying the sweet taste of this syntax, it quickly turns out that interpolated strings (which are semantically still string literals) cannot be moved out to a resource, because the embedded variables exist only in the scope where the interpolated string is defined.
These scope-locked string literals cannot, for example, be localized. And regardless of any localization need, some code quality checkers regard string literals embedded in code as a code smell.
Working with a huge enterprise code base, I expect more and more interpolated strings to appear, so the problem will quickly turn from theoretical to practical. I would like both of the following:
A code quality checker rule that bans this practice, just like string literals in the middle of the code. (I can manage this by defining custom rules in the standard quality tools, although StyleCop currently does not even recognize interpolated strings and fails with an internal error, so this will not be as easy as it sounds.)
A refactoring tool that can convert string interpolation to string.Format, so that the string can then easily be moved out to a standard .NET resource.
Any suggestions?

Enabling code analysis flags interpolated strings (warning CA1305), because an interpolated string used directly as a string gives you no way to specify the locale (unlike String.Format). So, while somewhat awkward, this is a possible solution for your particular case.
Also, R# (ReSharper) can quickly convert one format to the other. So, while not fully automated, the combination of Code Analysis and R# would let you quickly find and partially correct all the cases.

Localisation is possible with string interpolation.
Consider this:
System.FormattableString s = $"Hello, {name}";
s.ToString() will use the current culture to produce the string.
float flt = 3.141f;
System.IFormattable f = $"{flt}";
With IFormattable you can render numeric formats for a specific culture by calling f.ToString(null, culture).
So it is possible, just not directly.
source:
https://msdn.microsoft.com/nl-nl/library/dn961160.aspx
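As a minimal sketch of the point above (the class name InterpolationDemo and the helper Build are made up for illustration), assigning an interpolated string to FormattableString keeps the string.Format-style template and its arguments separate, so the culture can be chosen when the text is rendered:

```csharp
using System;
using System.Globalization;

public static class InterpolationDemo
{
    // Hypothetical helper: because the return type is FormattableString,
    // the compiler keeps the template and the arguments apart.
    public static FormattableString Build(double value) => $"Value: {value}";

    public static void Main()
    {
        FormattableString fs = Build(3.141);

        Console.WriteLine(fs.Format);         // Value: {0}
        Console.WriteLine(fs.ArgumentCount);  // 1

        // The culture is chosen at rendering time, not where the string is written.
        Console.WriteLine(fs.ToString(CultureInfo.InvariantCulture)); // Value: 3.141
        Console.WriteLine(fs.ToString(new CultureInfo("de-DE")));     // formatted with German conventions
    }
}
```

The Format property exposes the same composite template that string.Format would take, which is the part a resource file could hold.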

Related

Convert string.Format method call to C# interpolated string

I know that I should probably leave the codebase alone (principle: if it ain't broke...) but the one in question has a lot of calls to string.Format() and I'm intrigued by the new interpolated string support in C# 6.0, as it might make the {0} substitutions less error prone. My aim is to detect and remove execution paths that might have an unused parameter to string.Format() or, worse, a reference to a parameter that doesn't exist (e.g. {1} if there's only one parameter).
Do any tools or code snippets exist that might assist in converting all string literals from one format to the other? I'd settle for *.cs files to start; the XAML binding replacements could come later.
Would this exercise be relatively safe (i.e. is there some glaringly obvious flaw with interpolated strings that I haven't considered)?
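To make the two failure modes concrete, here is a small sketch (FormatPitfalls and its members are illustrative names, not part of any library). An out-of-range placeholder only fails at run time with a FormatException, an unused argument is silently ignored, and the interpolated form has no index that could drift:

```csharp
using System;

public static class FormatPitfalls
{
    // Returns true when string.Format rejects the template at run time.
    public static bool ThrowsFormatException(string template, params object[] args)
    {
        try
        {
            string.Format(template, args);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }

    // Interpolated equivalent: the compiler checks that 'name' exists.
    public static string Greeting(string name) => $"Hello, {name}!";

    public static void Main()
    {
        // {1} has no matching argument: fails only when executed.
        Console.WriteLine(ThrowsFormatException("{0} and {1}", "only one")); // True

        // The extra argument is ignored without any warning.
        Console.WriteLine(ThrowsFormatException("{0}", "used", "unused"));   // False

        Console.WriteLine(Greeting("World")); // Hello, World!
    }
}
```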

Localization alternative to Resx file

Why I don't want to use Resx files:
I am looking for an alternative for resx files to offer multilanguage support for my project, due to the following reasons:
I don't like having to specify a "messageId" when writing messages. It is more effort, and it interrupts my flow: I can't see what the log message would actually say, and I need to open another tab to edit the message.
Sometimes I use inline expressions because I don't want to create new variables for trivial steps (e.g. Log.Info("Iterated {i+1} times");). Using variables or doing simple calculations inline sometimes makes the code clearer than adding extra lines.
What I could imagine instead:
An external application which crawls a compiled exe for all strings, giving you the opportunity to ignore or add strings which should be translated. It could then also create an XML or JSON file for all languages. It would replace all strings with a hash/ID so that a lookup for strings in all languages is still possible.
Am I the only one who is not happy with the commonly used Resx / centralized string DB solution? Am I missing reasons why this wouldn't be a good idea?

One reason for relying on established approaches instead of implementing your own format is translation. It really depends on how your resources are translated: if it is done by volunteers with a technical background who don't mind working in a plain text editor, then you are free to come up with your own resource format. If on the other hand you send out your resources to professional translators who are not very technical and who prefer to work in a translation environment with integrated terminology management, translation memory, spelling and quality checks etc. it is quite likely that this environment will not be able to handle your homemade resource format.
Since I already mentioned professional translation environments: some of these tools rely on IDs to figure out which strings are old and which are new. If you use the approach where the text is the ID, every fixed typo in your source language creates a new string that needs to be translated, and paid for. With a stable ID, by contrast, the translator sees that the source text for a string has changed, can have a look at the change, notice that a typo has been fixed, decide that the translation is still OK and sign the string off, without extra translation cost.
By the way, if you want good localizations for strings like Log.Info("Iterated {i+1} times"); you have to find some way of dealing with plural forms correctly. Some languages have different grammatical rules for different numbers (see the Unicode Language Plural Rules for an overview). Just because something is easy to do in code does not mean that it is easy to localize, I'm afraid.
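A rough sketch of what "dealing with plural forms" means in code, assuming non-negative whole numbers only (PluralDemo and its methods are made-up names, and a real project would more likely rely on a localization library than hand-written rules). English needs two categories, while Polish needs three:

```csharp
using System;

public static class PluralDemo
{
    // English cardinal rule for whole numbers: "one" for 1, otherwise "other".
    public static string EnglishCategory(int n) => n == 1 ? "one" : "other";

    // Polish cardinal rule for whole numbers: one / few / many.
    public static string PolishCategory(int n)
    {
        if (n == 1) return "one";
        int mod10 = n % 10, mod100 = n % 100;
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) return "few";
        return "many";
    }

    // Picks the English template by category, then fills in the count.
    public static string IteratedMessage(int n)
    {
        string template = EnglishCategory(n) == "one"
            ? "Iterated {0} time"
            : "Iterated {0} times";
        return string.Format(template, n);
    }

    public static void Main()
    {
        Console.WriteLine(IteratedMessage(1));  // Iterated 1 time
        Console.WriteLine(IteratedMessage(3));  // Iterated 3 times
        Console.WriteLine(PolishCategory(22));  // few
        Console.WriteLine(PolishCategory(12));  // many
    }
}
```

The single inline string "Iterated {i+1} times" has no place to carry these per-language variants, which is why the translator needs one template per plural category.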
To sum this up: if you want to create your own resource format, talk with your translators. Ask them which formats they can handle. Think about translation-related limitations that come with your format: for example, are there any characters that the translators should not use because they break your strings? Apostrophes and quotes are prime candidates here because they are often used as string delimiters in resource files, or < and & if you decide to go the XML way. Think about a conversion to XLIFF and back: most translation environments can handle XLIFF.

How do I explain when to use string and when String [duplicate]

Possible Duplicate:
What is the difference between String and string
I'm using string for a variable, like this:
string myString = "test";
And I'm using String if I want to use some static methods of the String class:
String.Format...
I think this looks better. But some people write things like
String myString;
string.Format...
It works, but I don't like it. How can I tell them to stop? Is there a "C# rule" for this kind of thing? The same goes for int/Int32, char/Char, ...

string is a C# alias to System.String.
If writing C#, you should be using the alias, so string.Format and string myString.
Both end up compiled to the same IL and mean the same thing, but C# has its idioms and using the type alias is part of them - in the same way that you would use int and not System.Int32.
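A quick way to convince a sceptical colleague is to compare the types directly. This is only an illustrative sketch (AliasDemo is a made-up name):

```csharp
using System;

public static class AliasDemo
{
    // The alias and the framework type name refer to the very same type.
    public static bool StringIsSameType() => typeof(string) == typeof(String);
    public static bool IntIsSameType() => typeof(int) == typeof(Int32);

    public static void Main()
    {
        Console.WriteLine(StringIsSameType()); // True
        Console.WriteLine(IntIsSameType());    // True

        // Static members are reachable through either spelling.
        Console.WriteLine(string.Format("{0}-{1}", 1, 2) == String.Format("{0}-{1}", 1, 2)); // True
    }
}
```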

Coding guidelines are important if you are working on bigger projects: they ensure that you can easily read code written by any developer on your team.
Unfortunately there isn't any official guideline on how you should format your C# code. Microsoft itself encountered the problem while developing the .NET framework and developed an internal set of style guidelines, which grew into a full-fledged program called StyleCop that has a default rule set with sensible settings.
According to these rules you should always use string instead of String:
string xyz;
string.Format();
The rule is the following:
SA1121 - UseBuiltInTypeAlias - Readability Rules
The code uses one of the basic C# types, but does not use the built-in alias for the type.
Rather than using the type name or the fully-qualified type name, the built-in aliases for these types should always be used: bool, byte, char, decimal, double, short, int, long, object, sbyte, float, string, ushort, uint, ulong.
Recommended reading is the history of StyleCop, which you can find here:
http://stylecop.codeplex.com/wikipage?title=A%20Brief%20History%20of%20CSharp%20Style&referringTitle=Documentation
It explains some of the problems you encounter with different people from different backgrounds working on the same code base and how they developed the rule set.
We recently implemented StyleCop in our own project and although it is a lot of work to really follow all the rules, the resulting code is much more readable. It also has a fairly good ReSharper integration which allows you to do many fixes automatically if you use ReSharper.

I usually express this as, "If there is an identical primitive type in the language, prefer it." In your specific case, which style to use is a preference rather than something that has a major impact on functionality. What is important is that whatever style is chosen is consistently applied by everyone on your team.

How to work with textual formats in otherwise procedural code?

This question sounds trivial but let me explain my scenario.
I am working in an object-oriented programming language (C#) and most of the actual execution code is procedural, i.e. a series of statements, sometimes branches and loops. Fairly standard.
Now I am presented with a task to deal with a textual format (PGN, but it could be anything else, such as vCard or some custom format). At least for me, the "standard" way to work with it would be to use a mix of:
regular expressions
if / switch statements
for-loops
storing regexp matches into some custom structure and / or outputting it to some result format
However, I don't like this procedural approach at all - regular expressions are prone to errors, the code is usually quite hard to understand and debug, it usually tends to have quite a high cyclomatic complexity etc.
Simply put, I'd like it to be declarative but I don't know what tools or libraries to use.
I remember that when I saw demos of the "M" language I thought that it was exactly what I was looking for. There was a simple way to declare the syntax of my textual format, and the tool would then automatically parse an input string into an in-memory representation of the textual DSL. I think it was also possible to transform the format into another one, etc.
I have also been in touch with the people behind JetBrains MPS, which is another tool for working with DSLs, but my scenario doesn't seem to be a perfect match for what they are trying to provide.
So if anyone has any idea about how to elegantly deal with textual formats in otherwise procedural code base, I'd be happy to learn about the options.

Check out my open source project meta#. I think it sounds like exactly what you're looking for.

Translation and localization issue

Does Microsoft's implementation of the C# runtime offer some localization mechanism to translate common strings like Overflow, Stack overflow, Underflow, etc.?
See the code below. It is part of Mono, and Mono itself has a Locale.GetText routine for making such translations.
// Added to avoid possible integer overflow.
if (inputOffset > inputBuffer.Length - inputCount)
    throw new ArgumentException("inputOffset" +
        Locale.GetText("Overflow"));
Now, how is this done in Microsoft's version of the runtime, and how can I use it, for example, to get the localized equivalent of Overflow without adding resource files?

.NET provides a framework that makes it easy to localize your content (ResourceManager) and while it internally maintains some translations for its own purpose (for example DateTime.ToString gives you a textual representation for the date/time that is locally appropriate, which includes the translated month and day names), it does not provide you with any ready-made translations, be they common strings or not. It could hardly do this reliably anyway, as there is a plethora of human languages out there and words can have different translations depending on context etc.
In your example, I would say that you are OK with untranslated exception messages. Although Microsoft recommends that you localize exception descriptions, and they do localize their own (at least for major languages), this advice seems ill thought out: not only is it a waste of effort to translate all this text that users probably should never see, but it can also make debugging a nightmare.

Yes, it does and it's a terrible idea. It makes debugging so much harder.

without adding resource files
What do you have against resource files? Resources are the prescribed way to provide localized and localizable strings, images, and other data for a .NET app or assembly.
Note that single word substitution as shown in your example code will result in poor quality translations. Different languages have different sentence structure and word order which your single word substitution won't accommodate. Non-English languages often involve genders for nouns and declension of words to properly reflect their role and number in a phrase. Single word substitution fails miserably at this.
Your non-English customers will most likely prefer that you not butcher their language by attempting to partially translate text a word here and a word there. If you're going to go to the trouble of supporting localizable messages, do it right and allow the entire string to be translated so that word ordering and declension can be done properly by translators. In cases where the content is variable, make the format string a resource so that the translator can set off the variable data using the conventions of the language.
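As a minimal sketch of "make the format string a resource", the dictionary below stands in for real resource files (a real app would load the templates through ResourceManager). The class name, the keys and both templates are invented for illustration. Because the whole template is the translatable unit, a translator is free to reorder the placeholders:

```csharp
using System;
using System.Collections.Generic;

public static class ResourceFormatDemo
{
    // Stand-in for localized resources; both templates are illustrative only.
    static readonly Dictionary<string, string> Templates = new Dictionary<string, string>
    {
        ["subject-first"] = "{0} copied {1} files",
        ["count-first"] = "{1} files were copied by {0}",
    };

    // The calling code always passes the same arguments in the same order;
    // only the template decides where they appear in the sentence.
    public static string Render(string key, string user, int count)
        => string.Format(Templates[key], user, count);

    public static void Main()
    {
        Console.WriteLine(Render("subject-first", "Ann", 3)); // Ann copied 3 files
        Console.WriteLine(Render("count-first", "Ann", 3));   // 3 files were copied by Ann
    }
}
```

This is the kind of freedom that word-by-word substitution, as in the Locale.GetText("Overflow") example, cannot offer.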
