Collapse control characters in text file (pseudoconsole) - c#

I have a text file that includes various control characters, including backspace \b and carriage return \r. For example:
100\r101\r102\r103
£\b$\b%
I'd like to 'collapse' these control characters to leave me with the text one would see in a console that understood these control characters:
103
%
I don't know what this is called. If it has a name, please share it, so I can search for it.

I guess you can simply loop over the text and replace the characters (backspace \b and carriage return \r) with ones that the console can understand.
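The process the question describes is essentially what a terminal emulator does when it replays output. Here is a minimal sketch of such a loop. It assumes only \r, \b and \n need interpreting: no tabs, no ANSI escape sequences, no line wrapping, and every other character is printed as-is. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ConsoleCollapse
{
    // Replays text the way a simple console would: '\r' moves the cursor to
    // column 0, '\b' moves it one column left, '\n' ends the current line,
    // and any other character overwrites whatever is under the cursor.
    public static string Collapse(string input)
    {
        var output = new StringBuilder();
        var line = new List<char>(); // characters currently on the line
        int cursor = 0;              // column where the next character lands

        foreach (char c in input)
        {
            switch (c)
            {
                case '\r':
                    cursor = 0;
                    break;
                case '\b':
                    if (cursor > 0) cursor--; // backspace at column 0 does nothing
                    break;
                case '\n':
                    output.Append(line.ToArray()).Append('\n');
                    line.Clear();
                    cursor = 0;
                    break;
                default:
                    if (cursor < line.Count) line[cursor] = c; // overwrite
                    else line.Add(c);                          // extend the line
                    cursor++;
                    break;
            }
        }

        output.Append(line.ToArray()); // flush the last (unterminated) line
        return output.ToString();
    }
}
```

To process a file, something like `ConsoleCollapse.Collapse(File.ReadAllText(path))` should work (File lives in System.IO). Because characters overwrite rather than erase, "100\r12" collapses to "120", which is what a console would show.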

Related

Char.IsControl method does not recognize some characters as control

I noticed that the C# Char.IsControl method doesn't recognize some characters as control characters. For example, the following code outputs false for both values:
char pilcrow = '\u00B6';
char softHyphen = '\u00AD';
Console.Write("{0},{1}",char.IsControl(pilcrow), char.IsControl(softHyphen)); // -> 'false,false'
Is this an expected behavior? I need to escape such characters in my code.
Those aren't control characters. One is the pilcrow sign ¶, which belongs to the Punctuation, Other [Po] category; the other is the soft hyphen, an invisible formatting character (Unicode category Format [Cf]) that affects how text gets hyphenated.
There's nothing special about them; in fact, you have probably used the soft hyphen when writing a paragraph in Word to control how some words are hyphenated. Word uses ¶ as the paragraph mark, a visualization of a paragraph's end. It doesn't affect formatting; it's just the common way to denote the end of a paragraph. In that respect it's no different from ², ³, §, ¤, ¦, °, ±, ½, ¬ (characters you can type on some keyboard layouts just by holding Right Alt and pressing keys).
.NET strings use Unicode, so there's no need to escape these characters. You can just type them directly.
There's no problem printing them either; those characters are used in document processing, after all. The soft hyphen controls how the UI or the print engine lays out text when rendering it to the screen or paper.
If someone doesn't want those characters to be printed, a simple string.Replace (for example s.Replace("\u00AD", "")) will do the job. Re­moving the hyphen can affect how text is printed though, with long words moving to the next line. I added that hyphen to Removing in the previous sentence to force hyphenation. Without it, Removing would have moved to the next line.
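Char.IsControl returns true only for characters in the Control (Cc) category. You can confirm this by asking for each character's Unicode category. The sketch below does that. EscapeInvisible is a hypothetical helper for cases where you do want invisible characters made visible, such as in logs. Treating Control and Format as "invisible" is an assumption, not a standard definition.

```csharp
using System.Globalization;
using System.Text;

public static class CharCategoryDemo
{
    // Char.IsControl is true only for UnicodeCategory.Control (Cc); the pilcrow
    // is punctuation and the soft hyphen is a Format (Cf) character.
    public static UnicodeCategory CategoryOf(char c) =>
        CharUnicodeInfo.GetUnicodeCategory(c);

    // Hypothetical helper: writes Control and Format characters as \uXXXX
    // so they show up in output instead of being invisible.
    public static string EscapeInvisible(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            if (cat == UnicodeCategory.Control || cat == UnicodeCategory.Format)
                sb.Append("\\u").Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // string.Replace removes every soft hyphen by value; string.Remove works by
    // position (startIndex, count), so it is not the right tool here.
    public static string RemoveSoftHyphens(string s) => s.Replace("\u00AD", "");
}
```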

Crystal Reports Improperly Encoding Hyphens On Export to Word

I'm having a problem with a generated Word document (coming from the Crystal Reports engine via a C# .NET application). Initially the hyphens are visible, but if the text is copied and pasted with the "Keep Text Only" or "Remove Formatting" option, each hyphen is changed to a blank space " ".
I'm quite sure this is an issue with the encoding of the character; it is probably encoded as a soft hyphen. Is there any way to resolve this via the Crystal Reports engine?
I have already checked and confirmed that the source text is an actual hyphen character in the database.
It seems that the common ASCII hyphen, U+002D HYPHEN-MINUS in Unicode, gets converted to a code that Word treats as a Nonbreaking Hyphen. A comment says that in “Show All” mode in Word, it looks like a hyphen, but longer, which suggests it looks like an en dash “–”. Internally, it is byte 1E hexadecimal, so we could say that it is the control character U+001E. But it is unaffected by Alt+X. And if you copy text containing it and paste it as plain text, it gets replaced by U+0020 SPACE, so it’s really treated as a special code and not as a character.
It is not at all the same as NON-BREAKING HYPHEN U+2011 in Unicode; instead, it is Microsoft’s own idea of handling a situation where you want a hyphen to appear but don’t want Word to break a string into two lines after a hyphen (e.g., in the string “F-1”, where such a break would look ridiculous).
So you could try to find out how this happens in the report engine and prevent it. But it may be something more complicated than just replacing “-” with the byte 1E.
If you need to deal with the issue in Word, you can use Find and Replace, where the special characters menu contains “Nonbreaking Hyphen”. You could replace it by the common hyphen “-”, losing the non-breakability.
You could alternatively replace it with NON-BREAKING HYPHEN U+2011, which would preserve that property. But this might cause problems if the text is transferred to other programs, or because of font issues (not all fonts contain that character). The font problem can be tricky: Word automatically switches to another font when needed and does not tell you, so your text might contain characters in different fonts, which can cause problems of many kinds (such as uneven line spacing). Moreover, when U+2011 is present, it may look different from the common ASCII hyphen; it more or less should. Typographically, if you use U+2011, your normal (breaking) hyphens should be U+2010 HYPHEN.
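If text containing the Unicode hyphen variants later reaches C# code, and the other programs mentioned above cannot cope with them, you can map them back to the ASCII hyphen-minus. Here is a minimal sketch. It assumes U+2010 and U+2011 are the only variants present. Word's internal nonbreaking hyphen does not survive as a character in plain text, so it is not handled. The class name is hypothetical.

```csharp
public static class HyphenNormalizer
{
    // Maps U+2010 HYPHEN and U+2011 NON-BREAKING HYPHEN to U+002D HYPHEN-MINUS.
    // This deliberately drops the non-breaking behavior of U+2011.
    public static string ToAsciiHyphens(string s) =>
        s.Replace('\u2010', '-')
         .Replace('\u2011', '-');
}
```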

clean up word pasted apostrophe and other special characters

I've looked around and have not found a solid solution to this. Is there a way to replace Word-pasted apostrophes and other special characters coming from a text editor (RadEditor)? When I attempt to send them in an email, these characters are replaced with a question mark (?). I would prefer not to manually replace every possible special character, unless there is a known list somewhere.
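The thread has no answer here, but one common approach is to map Word's "smart" punctuation to ASCII before sending. Here is a minimal sketch. The mapping table is an assumption covering the usual suspects, not an authoritative list. The class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SmartPunctuation
{
    // Hypothetical mapping of common Word "smart" characters to ASCII.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['\u2018'] = "'",   // left single quotation mark
        ['\u2019'] = "'",   // right single quotation mark (Word's apostrophe)
        ['\u201C'] = "\"",  // left double quotation mark
        ['\u201D'] = "\"",  // right double quotation mark
        ['\u2013'] = "-",   // en dash
        ['\u2014'] = "--",  // em dash
        ['\u2026'] = "...", // horizontal ellipsis
        ['\u00A0'] = " ",   // no-break space
    };

    // Copies the string, substituting any character found in the map.
    public static string ToAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (Map.TryGetValue(c, out string replacement)) sb.Append(replacement);
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Replacing characters loses information. If the ? substitution actually comes from the mail encoding, sending the body as UTF-8 (System.Net.Mail's MailMessage has a BodyEncoding property) may be the better fix. The question doesn't say which component does the substitution.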

C# RegularExpressionValidator Trim and Count

I have a textbox with a RegularExpressionValidator. I want to require the user to enter at least n characters. I'd also like to remove whitespace both at the start and end of the textbox. I'd still like to allow spaces within the textbox, I just want to remove the excess at the beginning and end.
I basically don't know how to combine the trim regex and the count together for use in a REV.
trim: ^\s*((?:[\S\s]*\S)?)\s*$
count: .{10}.*
I basically want to know if the input, after leading and trailing whitespace is removed, is greater than n characters.
You could use word boundaries to skip the leading whitespace, accept 10 characters, and then end with another word boundary, using a pattern like this:
\b.{10}\b
Be sure to also use a RequiredFieldValidator to cover empty input, since the RegularExpressionValidator does not validate empty values.
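The word-boundary pattern has a catch worth checking. As far as I know, RegularExpressionValidator only passes when the match covers the entire value. With leading spaces, \b.{10}\b matches somewhere after them, so the input fails. Also, {10} there means exactly ten characters. The sketch below is an alternative that requires the trimmed text to be at least 10 characters long. IsValid is a hypothetical helper that mimics the assumed whole-value rule.

```csharp
using System.Text.RegularExpressions;

public static class TrimmedLengthCheck
{
    // Optional leading whitespace, then a non-space character, at least 8 more
    // characters, a final non-space character, and optional trailing whitespace.
    // The trimmed text is therefore at least 1 + 8 + 1 = 10 characters long.
    public const string AtLeastTen = @"^\s*\S[\s\S]{8,}\S\s*$";

    // Assumed validator rule: the match must start at index 0 and span the input.
    public static bool IsValid(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }
}
```

For a different minimum n (n >= 2), change 8 to n - 2. \s, \S and [\s\S] behave the same in JavaScript, so the pattern should also work for client-side validation.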

string replace on escape characters

Today I found out that putting strings in a resource file causes them to be treated as literals; i.e., putting "Text for first line \n Text for second line" causes the escape character itself to be escaped, so what's stored is "Text for first line \\n Text for second line", and then these sequences show up in the display instead of my carriage returns and tabs.
So what I'd like to do is use string.Replace to turn \\ into \, but this doesn't seem to work.
s.Replace("\\\\", "\\");
doesn't change the string at all because the string thinks there's only 1 backslash
s.Replace("\\", "");
removes all the backslashes and leaves me with just n instead of \n.
Also, using @ (a verbatim string) with half as many \ characters, or the Regex.Replace method, gives the same result.
Does anyone know of a good way to do this without looping through the string character by character?
Since \n is actually a single character, you cannot achieve this by simply replacing the backslashes in the string. You will need to replace each pair of \ and the following character with the corresponding escaped character, and assign the result, because Replace returns a new string:
s = s.Replace("\\n", "\n");
s = s.Replace("\\t", "\t");
etc.
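Chained Replace calls work for simple cases, but they can misread an escaped backslash followed by n (the text \\n). Here is a single-pass alternative using Regex.Replace with a MatchEvaluator, which also avoids a hand-written loop. It assumes only \n, \r, \t and \\ need decoding and leaves any other sequence untouched. The class name is hypothetical. Regex.Unescape also exists, but it follows regex escape rules and may throw on sequences it doesn't recognize.

```csharp
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // A backslash followed by one of: backslash, n, r, t.
    private static readonly Regex EscapePattern = new Regex(@"\\([\\nrt])");

    // Regex.Replace scans left to right with non-overlapping matches, so "\\n"
    // (escaped backslash, then n) decodes to a backslash followed by the
    // letter n, not to a newline.
    public static string Decode(string s) =>
        EscapePattern.Replace(s, m => Translate(m.Groups[1].Value[0]));

    private static string Translate(char c)
    {
        switch (c)
        {
            case 'n': return "\n";
            case 'r': return "\r";
            case 't': return "\t";
            default:  return "\\"; // escaped backslash becomes one backslash
        }
    }
}
```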
You'd be better served adjusting the resx files themselves. Line breaks can be entered via two mechanisms: You can edit the resx file as XML (right-click in Solution Explorer, choose "Open As," and choose XML), or you can do it in the designer.
If you do it in the XML, simply hit Enter, backspace to the beginning of the newline you've created, and you're done. You could do this with Search and Replace, as well, though it will be tricky.
If you use the GUI resx editor, holding down SHIFT while pressing ENTER will give you a line break.
You could do the run-time replacement, but as you are discovering, it's tricky to get working, and in my mind it constitutes a code smell. (You can also make a performance argument, but that would depend on how often string resources are accessed and the scale of your app in general.)
I'm actually going with John's solution and editing the XML directly as that's the better solution for the project, but codelogic answered the question that was driving me insane.
