C# inline string options for embedded double quotes

I use the @ prefix with my inline strings quite often, to support multi-line strings or to make strings with quotes a little more readable. Having to double up the embedded quotes is still somewhat of a pain, so I wondered whether .NET offers another option that lets strings keep their double quotes without any form of escaping. Something like a CDATA section in XML? I've searched a bit and didn't find anything, but thought I'd ask here in case I've overlooked some .NET feature (perhaps even a recent one in version 4 or 4.5).
Update: I've found that VB.NET has "XML Literals" that allow defining XML snippets directly inline with the source. This looks pretty close to what I'd like C# to do...

If there were something that did what you want, then we wouldn't need to "escape" double quotes.
I like to use @ when writing dynamic HTML in code. But static strings belong in resources, even ones that have dynamic values, for example "Application error. Error Message: {0}". Then you use string.Format to form the output.
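A minimal sketch of the two points above, written as C# 9 top-level statements so it can run as a single file. The variable names and the sample values are made up for illustration, and the constant only stands in for a string that would normally be loaded from a resource file.

```csharp
using System;

// Regular literal: embedded double quotes are escaped with a backslash.
string escaped = "<a href=\"index.html\">Home</a>";

// Verbatim literal (@ prefix): backslashes are taken literally and the
// literal may span several lines, but each embedded double quote must
// still be doubled.
string verbatim = @"<a href=""index.html"">Home</a>";

// Both literals produce exactly the same characters.
Console.WriteLine(escaped == verbatim);

// Hypothetical stand-in for a format string kept in resources.
const string ErrorFormat = "Application error. Error Message: {0}";
string message = string.Format(ErrorFormat, "disk full");
Console.WriteLine(message);
```

The doubled quotes only exist in the source text; the resulting string contains single double-quote characters.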

Related

How to preserve newlines in LeMP output

I've been using LeMP to great effect to auto-generate some code that is identical across variants but for the type of the arguments. However, the classes I'm working on also contain methods authored "by hand", with no LeMP involvement. The challenge is that LeMP seems to throw away many of the newlines in the original code, making the generated C# much harder to read (which I still need to do for use with a debugger, etc).
There seem to be two cases:
DllImport method prototypes just lose their newlines altogether -- looking in a hex editor, the newlines are transformed into spaces.
Methods with actual function bodies retain some newlines, but there is, for instance, no newline between the closing curly brace and the 'public T ...' of the next method.
Some methods seem to be untouched, which is what I'd like to see for everything that isn't generated by a macro.
What's the best way to get LeMP's output to retain as much of the original formatting in the code as possible?
Ok, it seems pretty likely that this is caused by a bug. I've filed 2 issues with some more details.

Localization alternative to Resx file

Why I don't want to use Resx files:
I am looking for an alternative for resx files to offer multilanguage support for my project, due to the following reasons:
I don't like to specify a "messageId" when writing messages. It is more effort and it interrupts my flow: I don't see what the log message would actually say, and I would need to open another tab to edit the message.
Sometimes I use inline code because I don't want to create new variables for very simple steps (e.g. Log.Info("Iterated {i+1} times");). Using variables or doing simple calculations inline sometimes makes the code clearer than adding extra lines.
What I could imagine instead:
An external application which crawls a compiled exe for all strings, giving you the opportunity to ignore/add strings which should be translated. It could then create an XML or JSON file for all languages as well. It would replace all strings with a hash/ID so that a lookup for strings in all languages is still possible.
Am I the only one who is not happy with the commonly used Resx / centralized string DB solution? Am I missing reasons why this wouldn't be a good idea?
One reason for relying on established approaches instead of implementing your own format is translation. It really depends on how your resources are translated: if it is done by volunteers with a technical background who don't mind working in a plain text editor, then you are free to come up with your own resource format. If on the other hand you send out your resources to professional translators who are not very technical and who prefer to work in a translation environment with integrated terminology management, translation memory, spelling and quality checks etc. it is quite likely that this environment will not be able to handle your homemade resource format.
Since I already mentioned professional translation environments: some of these tools rely on IDs to figure out which strings are old and which are new. If you use the approach where the text is the ID, every fixed typo in your source language means that you create a new string that needs to be translated, and paid for. With separate IDs, a translator who sees that the source text for a string has changed can have a look at the change, notice that a typo has been fixed, decide that the translation is still OK and sign the string off, without extra translation cost.
By the way, if you want good localizations for strings like Log.Info("Iterated {i+1} times"); you have to find some way of dealing with plural forms correctly. Some languages have different grammatical rules for different numbers (see the Unicode Language Plural Rules for an overview). Just because something is easy to do in code does not mean that it is easy to localize, I'm afraid.
To sum this up: if you want to create your own resource format, talk with your translators. Ask them which formats they can handle. Think about translation-related limitations that come with your format. For example, are there any characters that the translators should not use because they break your strings? Apostrophes and quotes are prime candidates here because they are often used as string delimiters in resource files, or < and & if you decide to go the XML way. Think about a conversion to XLIFF and back: most translation environments can handle XLIFF.

Outlaw string in C# using IntelliSense

I often type string in C# when I actually want to type String.
I know that string is an alias of String and that I am really just being pedantic, but I wish to outlaw string to force myself to write String.
Can this be done in either Visual Studio IntelliSense or ReSharper, and how?
I've not seen it done before, but you may be able to achieve this with an IntelliSense extension. A good place to start would be to look at the source for this extension on CodePlex.
Would be good to hear if you have any success with this.
I have always read in "Best Coding Practices" for C# to prefer string, int, float, double over String, Int32, Single, Double. I think it is mostly to make C# look less like VB.NET and more like C, but it works for me.
Also, you can go the other way, and add the following on top of every file
using S = System.String;
..
S msg = @"I don't like string.";
You may laugh at this, but I have found it invaluable when I have two similar source files with different underlying data types. I usually have using num=System.Single; or using num=System.Double; and the rest of the code is identical, so I can copy and paste from one file to the other and keep the single-precision and double-precision libraries in sync.
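A rough sketch of that alias trick, written as C# 9 top-level statements so it can run as a single file. The Average helper and the sample values are invented for illustration; only the using alias itself comes from the description above.

```csharp
using System;
// Change this one line to System.Double for the double-precision variant;
// everything below stays the same.
using num = System.Single;

// Hypothetical helper: nothing in its body names a concrete floating-point type.
num Average(num[] values)
{
    num sum = 0;
    foreach (num value in values)
    {
        sum += value;
    }
    return sum / values.Length;
}

// Casts through the alias keep the literals valid for both variants.
num[] samples = { (num)1.5, (num)2.5, (num)5.0 };

Console.WriteLine(typeof(num).Name);
Console.WriteLine(Average(samples));
```

The alias is only visible in the file that declares it, which is why it has to be repeated at the top of every file.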
I think ReSharper can do this!
Here is an extract from the documentation:
ReSharper 5 provides Structural Search and Replace to find custom code constructs and replace them with other code constructs. What's even more exciting is that it's able to continuously monitor your solution for your search patterns, highlight code that matches them, and provide quick-fixes to replace the code according to your replace patterns. That essentially means that you can extend ReSharper's own 900+ code inspections with your custom inspections. For example, if you're migrating to a newer version of a framework, you can create search patterns to find usages of its older API and replace patterns to introduce an updated API.
cheers,
Chris

Streaming structured text input

I'd like to parse formatted basic values and a few custom strings from a TextReader - essentially like scanf allows.
My input might not have line breaks, so ReadLine+Regex isn't an option. I could chunk the text input some other way, but I don't know the delimiter at compile time (which makes that tricky), and the delimiter might be locale-dependent. For instance, a float followed by a comma might be "1.5," or "1,5,", but in both cases the attempt to parse the float should be "greedy".
To be safe, I'd like to assume my input is actively hostile (say, streaming in from a network stream): i.e. intentionally missing chunking delimiters.
I'd like to avoid custom regexes: int.Parse and double.Parse work well and are localization-aware. Don't get me started on DateTime values. I might need a few custom patterns anyhow, but writing regexes to cover that scenario doesn't sound like fun.
For a concrete example, let's say I have a TextReader and that I know the next value should be a double - how can I extract that double and possibly a limited amount of lookahead without reading the entire stream and without manually writing a localizable double-parser?
Similar Questions
There's a previous question "Looking for C# equivalent of scanf" which sounds similar but the Q+A focus on readline+regex (which I'd like to avoid). How can I use Regex against a TextReader? didn't find an answer (beyond chunking), and in any case I'd like to avoid writing my own Regexes.
Based on that lack of answers, and still not having found anything myself, it seems that:
There is no means to use localized parsing directly from Streams (or TextReaders) in .NET, nor is there a way to know how much of the stream corresponds to a parseable prefix in a systematic way.
There is no means to apply regular expressions to Streams (or TextReaders) in .NET, so there's no easy way of implementing something like this yourself.
If you really need something like this, the easiest option is a full-fledged parser generator. ANTLR works well for this; it has a lot of existing grammars you can copy-paste for the basics, and it comes with a GUI to help understand your grammar and makes parsers for .NET, java, C and a host of other languages. It's developer friendly, fast... ...but way too powerful and flexible for what I need; like shooting a bug with a shotgun - I'm not thrilled with this solution.

BBCode to HTML transformation rules

Background
I have written a very simple BBCode parser in C# which transforms BBCode to HTML. Currently it supports only the [b], [i] and [u] tags. I know that BBCode is always considered valid, regardless of what the user has typed. I cannot find a strict specification of how to transform BBCode to HTML.
Question
Does a standard "BBCode to HTML" specification exist?
How should I handle "[b][b][/b][/b]"? For now the parser yields "<b>[b][/b]</b>".
How should I handle "[b][i][u]zzz[/b][/i][/u]" input? Currently my parser is smart enough to produce "<b><i><u>zzz</u></i></b>" output for such a case, but I wonder whether this approach is "too smart".
More details
I have found some ready-to-use BBCode parser implementations, but they are too heavy/complex for me and, what is worse, use tons of regular expressions and do not produce the markup I expect. Ideally, I want to receive XHTML as output. For inferring "BBCode to HTML" transformation rules I am using this online parser: http://www.bbcode.org/playground.php. It produces HTML that is intuitively correct in my opinion. The only thing I dislike is that it does not produce XHTML. For example "[b][i]zzz[/b][/i]" is transformed to "<b><i>zzz</b></i>" (note the order of the closing tags). FireBug of course shows this as "<b><i>zzz</i></b><i></i>". As I understand it, browsers fix such cases of wrongly ordered closing tags, but I am in doubt:
Should I rely on this browser feature and not try to produce XHTML?
Maybe "[b][i]zzz[/b]ccc[/i]" must be understood as "<b>[i]zzz</b>ccc[/i]". That looks logical for such improper formatting, but conflicts with the output of popular forums' BBCode (bold italic "zzz" followed by italic "ccc", not bold "[i]zzz" followed by plain "ccc[/i]").
Thanks.
On your first question, I don't think that relying on browsers to correct any kind of mistake is a good idea, regardless of the scope of your project (well, maybe except when you're actually running bug tests on the browser itself). Some browsers might do an awesome job of that while others might fail miserably. The best way to make sure the output syntax is correct (or at least as correct as possible) is to send it to the browser with correct syntax in the first place.
Regarding your second question, since you're trying to have correct BBCode converted to correct HTML, if your input is [b][i]zzz[/b]ccc[/i], its correct HTML equivalent would be <i><b>zzz</b>ccc</i> and not <b>[i]zzz</b>ccc[/i]. And this is where things get complicated, as you would not be writing just a converter anymore, but also a syntax checker/corrector. I have written a similar script in PHP for a rather weird game engine scripting language, but the logic could easily be applied to your case. Basically, I had a flag set for each opening tag and checked whether the closing tag was in the right position. Of course, this gives limited functionality, but for what I needed it did the trick. If you need more advanced search patterns, I think you're stuck with regex.
If you're only going to implement B, I and U, which aren't terribly important tags, why not simply have a counter for each of those tags: +1 each time it is opened, and -1 each time it's closed.
At the end of a forum post (or whatever) if there are still-open tags, simply close them. If the user puts in invalid bbcode, it may look strange for the duration of their post, but it won't be disastrous.
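Here is a minimal sketch of that counter idea, written as C# 9 top-level statements so it can run as a single file. The names ConvertBBCode and StartsAt are made up for illustration, tag matching is case-sensitive, and two choices go beyond what is described above: a closing tag without a matching open tag is printed literally, and all other text is HTML-encoded.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// True if 'value' occurs in 's' exactly at position 'index'.
bool StartsAt(string s, int index, string value)
{
    return string.CompareOrdinal(s, index, value, 0, value.Length) == 0;
}

// Converts [b], [i] and [u] using one open-tag counter per tag.
string ConvertBBCode(string input)
{
    string[] tags = { "b", "i", "u" };
    var openCount = new Dictionary<string, int>();
    foreach (string tag in tags) openCount[tag] = 0;

    var html = new StringBuilder();
    int pos = 0;
    int textStart = 0; // start of the pending run of literal text

    while (pos < input.Length)
    {
        string htmlTag = "";
        int consumed = 0;
        foreach (string tag in tags)
        {
            string opening = "[" + tag + "]";
            string closing = "[/" + tag + "]";
            if (StartsAt(input, pos, opening))
            {
                openCount[tag]++; // +1 each time the tag is opened
                htmlTag = "<" + tag + ">";
                consumed = opening.Length;
                break;
            }
            if (openCount[tag] > 0 && StartsAt(input, pos, closing))
            {
                openCount[tag]--; // -1 each time it is closed
                htmlTag = "</" + tag + ">";
                consumed = closing.Length;
                break;
            }
        }

        if (htmlTag.Length == 0)
        {
            pos++; // ordinary character: it stays part of the literal run
            continue;
        }

        // Flush the literal text collected so far, then emit the HTML tag.
        html.Append(WebUtility.HtmlEncode(input.Substring(textStart, pos - textStart)));
        html.Append(htmlTag);
        pos += consumed;
        textStart = pos;
    }
    html.Append(WebUtility.HtmlEncode(input.Substring(textStart)));

    // At the end of the post, close whatever is still open.
    foreach (string tag in tags)
    {
        for (; openCount[tag] > 0; openCount[tag]--)
        {
            html.Append("</").Append(tag).Append('>');
        }
    }
    return html.ToString();
}

Console.WriteLine(ConvertBBCode("[b]unclosed [i]text"));
```

Counters only balance the number of tags; they do not repair nesting. An input such as "[b][i]zzz[/b][/i]" still comes out as "<b><i>zzz</b></i>", and leftover tags are closed in a fixed b, i, u order, so well-formed XHTML would need a stack of open tags instead.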
Regarding invalid user-submitted markup, you have at least three options:
Strip it out
Print it literally, i.e. don't convert it to HTML
Attempt to fix it.
I don't recommend 3. It gets really tricky really fast. 1 and 2 are both reasonable options.
As for how to parse BBCode, I strongly recommend against using regex. BBCode is actually a fairly complex language. Most significantly, it supports nesting of tags. Regex can't handle arbitrary nesting. That's one of the fundamental limitations of regex. That makes it a bad choice for parsing languages like HTML and BBCode.
For my own project, rbbcode, I use a parsing expression grammar (PEG). I recommend using something similar. In general, these types of tools are called "compiler compilers," "compiler generators," or "parser generators." Using one of these is probably the sanest approach, as it allows you to specify the grammar of BBCode in a clean, readable format. You'll have fewer bugs this way than if you use regex or attempt to build your own state machine.
