I need to put RTF-formatted text into a RichTextBox. I assign it through the richtextbox.Rtf property, but the string contains special characters and the RichTextBox does not display all of it correctly. The string and the code I am using:
String (TextString):
╔═══This is only an example, the special characters may change═══╗
C# Code:
String TextString = System.Text.Encoding.UTF8.GetString(TextBytes);
String TextRTF = @"{\rtf1\ansi " + TextString + "}";
richtextbox1.Rtf = TextRTF;
With this code, the RichTextBox shows "+---This is only an example, the special characters may change---+" and in some cases "??????".
How can I solve this problem? If I change \rtf1\ansi to \rtf1\utf-8, I see no change.
You can simply use the Text property:
richTextBox1.Text = "╔═══This is only an example, the special characters may change═══╗";
If you want to use the RTF property:
Take a look at this question: How to output unicode string to RTF (using C#)
You need to use something like this to convert the special characters to rtf format:
static string GetRtfUnicodeEscapedString(string s)
{
var sb = new StringBuilder();
foreach (var c in s)
{
if(c == '\\' || c == '{' || c == '}')
sb.Append(@"\" + c);
else if (c <= 0x7f)
sb.Append(c);
else
sb.Append("\\u" + Convert.ToUInt32(c) + "?");
}
return sb.ToString();
}
Then use:
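// The escaped text still has to be wrapped in an RTF group; the Rtf property rejects plain text
richtextbox1.Rtf = @"{\rtf1\ansi " + GetRtfUnicodeEscapedString(TextString) + "}";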
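One caveat on the \u escape: as far as I know, the RTF specification treats the number after \u as a signed 16-bit value, so code units above 32767 should be written as negative numbers. Below is a minimal sketch of that variant. The class name RtfEscaper and the method name Escape are made up for illustration and are not part of any library.

```csharp
using System.Globalization;
using System.Text;

static class RtfEscaper
{
    // Hypothetical helper: escapes plain text for use inside an RTF group.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);   // RTF control characters get a backslash
            }
            else if (c <= 0x7f)
            {
                sb.Append(c);                // plain ASCII passes through
            }
            else
            {
                // Reinterpret the UTF-16 code unit as a signed 16-bit number,
                // so values above 32767 come out negative.
                short signed = unchecked((short)c);
                sb.Append("\\u")
                  .Append(signed.ToString(CultureInfo.InvariantCulture))
                  .Append('?');              // '?' is the fallback for readers without Unicode support
            }
        }
        return sb.ToString();
    }
}
```

It would be used the same way as the method above: richtextbox1.Rtf = @"{\rtf1\ansi " + RtfEscaper.Escape(TextString) + "}";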
I want to replace the delimiter comma with tabs in a CSV file
Input
Output
Note that commas shouldn't be replaced for words enclosed by quotes. Also in the output, we want to omit the double quotes
I tried the following, but the code also replaces commas for words enclosed by quotes
public void Replace_comma_with_tabs(string path)
{
var file = File
.ReadLines(path)
.SkipWhile(line => string.IsNullOrWhiteSpace(line)) // To be on the safe side
.Select((line, index) => line.Replace(',', '\t')) // replace ',' with '\t'
.ToList(); // Materialization, since we write into the same file
File.WriteAllLines(path, file);
}
How can I skip commas for the words enclosed by quotes?
Here is one way of doing it. It uses the flag quotesStarted to decide whether a comma should be treated as a delimiter or as part of the text in a column. I also used StringBuilder, since that class performs well for string concatenation. The code reads the lines and, for each line, iterates through its characters and checks for those with special meaning (comma, double quote, tab, comma between double quotes):
static void Main(string[] args)
{
var path = "data.txt";
var file = File.ReadLines(path).ToArray();
StringBuilder sbFile = new StringBuilder();
foreach (string line in file)
{
if (String.IsNullOrWhiteSpace(line) == false)
{
bool quotesStarted = false;
StringBuilder sbLine = new StringBuilder();
foreach (char currentChar in line)
{
if (currentChar == '"')
{
quotesStarted = !quotesStarted;
sbLine.Append(currentChar);
}
else if (currentChar == ',')
{
if (quotesStarted)
sbLine.Append(currentChar);
else
sbLine.Append("\t");
}
else if (currentChar == '\t')
throw new Exception("Tab found");
else
sbLine.Append(currentChar);
}
sbFile.AppendLine(sbLine.ToString());
}
}
File.WriteAllText("Result-" + path, sbFile.ToString());
}
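The answer above keeps the double quotes in the output, while the question asks for them to be omitted. A minimal sketch of a per-line converter that drops them is shown here. It assumes fields never span multiple lines and that a doubled quote inside a quoted field stands for one literal quote. The names CsvToTsv and ConvertLine are hypothetical.

```csharp
using System.Text;

static class CsvToTsv
{
    // Hypothetical helper: converts one CSV line to tab-separated text,
    // leaving commas inside quoted fields alone and dropping the quotes.
    public static string ConvertLine(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    sb.Append('"');        // doubled quote inside a quoted field: keep one
                    i++;                   // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes;  // opening or closing quote: toggle and drop it
                }
            }
            else if (c == ',' && !inQuotes)
            {
                sb.Append('\t');           // delimiter comma becomes a tab
            }
            else
            {
                sb.Append(c);              // everything else is copied unchanged
            }
        }
        return sb.ToString();
    }
}
```

In the question's code this would replace the line.Replace(',', '\t') call inside Select. Tabs already present in the data are copied unchanged, so they would still need a policy of their own.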
There's a lot of ways to do this but here's one. This only includes the code to transform a string that has comma delimited text with quoted text. You'd use "ToTabs" instead of "Replace" inside your Select statement. You'll have to harden this to add some error checking.
This will handle escaped quotes inside of quoted fields and it transforms existing tabs to spaces, but it's not a full blown CSV parser.
static class CsvHelper
{
public static string ToTabs(this string source)
{
Func<char,char> getState = NotInQuotes;
char last = ' ';
char InQuotes(char ch)
{
if ('"' == ch && last != '"')
getState = NotInQuotes;
else if ('\t' == ch)
ch = ' ';
last = ch;
return ch;
}
char NotInQuotes(char ch)
{
last = ch;
if ('"' == ch)
getState = InQuotes;
else if (',' == ch)
return '\t';
else if ('\t' == ch)
ch = ' ';
return ch;
}
return string.Create(source.Length, getState, (buffer,_) =>
{
for (int i = 0; i < source.Length; ++i)
{
buffer[i] = getState(source[i]);
}
});
}
}
static void Main(string[] _)
{
const string Source = "a,string,with,commas,\"field,with,\"\"commas\", and, another";
var withTabs = Source.ToTabs();
Console.WriteLine(Source);
Console.WriteLine(withTabs);
}
To change commas in a string to tabs, use Replace method.
Example (the second argument is the tab character, written as the escape sequence \t):
string str = "Lucy, John, Mark, Grace";
string str2 = str.Replace(",", "\t");
I need to check whether a string contains a + inside a span delimited by single quotes.
Example: string str = "'Name + R405'".
However, the string may contain more than one such span.
Example: string str = "'Name + R405' + '(Name)'". In this case, the second + has a special function in my code (it is outside the single quotes).
In other words, I need to identify only the + characters that are within single quotes. If there is another way to do this, please explain it to me.
Update:
The text within single quotes (the text I need) may itself contain other single quotes. Therefore, I cannot simply look for the beginning and end of a pair of single quotes.
Update 2:
I have a problem that might be a little complicated. My system has functions that take certain strings, and those strings are manipulated according to certain rules:
Text in single quotes is not altered / manipulated;
A + is used to separate one text from another;
My string must accept any character (this is a problem, I know).
For example: "'Name' + On + 'Sector'". In strings like this, only the part "On" is manipulated by these methods. However, I have strings like "'Name + Code' + On + 'Sector'" or "'Name'+Code '+ On +'Sector'". The "Name + Code"/"Name'+Code" parts should not be manipulated. The methods get "confused" by this kind of text because they act on the + and single quotes inside text that should not be changed. I cannot change the methods, so I must preprocess the string before passing it to them.
You can do this by iterating through the characters and keeping track of the single quotes you have seen.
public static bool HasPlusBetweenSingleQuotes(string str)
{
bool inSingleQuotes = false;
foreach (char c in str)
{
if (c == '\'')
{
inSingleQuotes = !inSingleQuotes;
}
else if (c == '+' && inSingleQuotes)
{
return true;
}
}
return false;
}
If you need the indexes of the plus signs within single quotes you can do the following.
public static IEnumerable<int> PlusBetweenSingleQuotesIndexes(string str)
{
bool inSingleQuotes = false;
for(int i=0;i<str.Length;i++)
{
if (str[i] == '\'')
{
inSingleQuotes = !inSingleQuotes;
}
else if (str[i] == '+' && inSingleQuotes)
{
yield return i;
}
}
}
Note that these methods do not verify that every opening single quote has a closing single quote.
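As a quick check against the strings from the question, here is a small sketch that repeats the index method inside a hypothetical QuoteScanner class (the class and method names are made up for this example). It also shows the limitation mentioned above: with the question's second kind of string, where a quoted text contains a stray single quote, simple toggling reports positions the asker probably did not intend.

```csharp
using System.Collections.Generic;

static class QuoteScanner
{
    // Same toggling logic as the index method above, under a hypothetical name.
    public static IEnumerable<int> PlusIndexes(string str)
    {
        bool inSingleQuotes = false;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == '\'')
            {
                inSingleQuotes = !inSingleQuotes;  // every quote flips the state
            }
            else if (str[i] == '+' && inSingleQuotes)
            {
                yield return i;                    // a plus seen while inside quotes
            }
        }
    }
}
```

For "'Name + R405' + '(Name)'" this yields only index 6, the plus inside the first quoted span. For "'Name'+Code '+ On +'Sector'" it yields 13 and 18, because the quote after Name is taken as a closing quote; the plus at index 6, which the asker treats as part of the quoted text, is missed.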
EDIT
If the quotes inside the text are escaped, you just check whether the previous character is the escape character, such as \.
public static bool HasPlusBetweenSingleQuotes(string str)
{
bool inSingleQuotes = false;
char previous = ' '; // just defaulting to a space.
foreach (char c in str)
{
if (c == '\'' && previous != '\\')
{
inSingleQuotes = !inSingleQuotes;
}
else if (c == '+' && inSingleQuotes)
{
return true;
}
previous = c;
}
return false;
}
I'm not sure if this can be done with a regular expression (It might be possible?). It would be easier just to do this with a loop of characters and track if you are in or outside of quotes.
bool inBlock = false;
foreach (char aChar in mySentence)
{
// A single quote toggles whether we are inside a quoted block
inBlock = (aChar == '\'') ? !inBlock : inBlock;
if (inBlock && aChar == '+')
{
// do stuff here
}
}
As a note, the code might not work, I didn't test it.
Why not invert the logic here and use the "concatenation sequences" as the structure for the pattern? These can be described as a sequence of + or +On+ (with optional spaces) that are in between single quoted (possibly non-balanced) strings. Match the "glue" sequence bookended by a lookbehind for a ' and a lookahead for a ', and you can parse the string into "single quoted strings" and "glue" tokens:
var strings = new string[]
{"'Name'+Code '+ On +'Sector'",
"'Name + R405' + '(Name)'",
"'Name + Code' + On + 'Sector'",
"'Name''+'Sector'"
};
const string pattern = @"(?<=')(\s*\+\s*|\s*\+\s*On\s*\+\s*)(?=')";
foreach (string s in strings)
{
Console.WriteLine("input:"+s);
string[] tokens = Regex.Split(s, pattern);
foreach (string token in tokens)
{
Console.WriteLine("token:->{0}<-", token);
}
//tokens.Where((x, i) => i % 2 == 0) //single quoted strings
//tokens.Where((x, i) => i % 2 != 0) //glue sequences
}
I have a multi-language application in ASP.NET C#. Here I have to create a zip file and use some items from the database to construct the file name. I strip out special characters from the file name. However, if the language is German, for example, my trimming algorithm removes some German characters such as umlauts.
Could someone provide me with a language-adaptable trimming algorithm?
Here is my code:
private string RemoveSpecialCharacters(string str)
{
StringBuilder sb = new StringBuilder();
foreach (char c in str)
{
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == '_' || c == ' ' || c == '+')
{
sb.Append(c);
}
}
return sb.ToString();
}
thanks
Assuming you mean the name of the ZIP file, instead of the names inside the ZIP file, you probably want to check if the character is valid for a filename, which will allow you to use more than just letters or digits:
char[] invalid = System.IO.Path.GetInvalidFileNameChars();
string s = "abcöü*/";
var newstr = new String(s.Where(c => !invalid.Contains(c)).ToArray());
string s = "abcöü*/";
var newstr = new String( s.Where(Char.IsLetterOrDigit).ToArray() );
A more versatile variant that will mangle the string less is:
public static string RemoveDiacritics(this string s)
{
// decompose accented characters into a base character plus combining marks
IEnumerable<char> chars = s.Normalize(NormalizationForm.FormD);
// remove all non-ASCII characters – i.e. the accents
return new string(chars.Where(c => c < 0x7f && !char.IsControl(c)).ToArray());
}
This should remove most problematic characters while still preserving most of the text. (If you're creating filenames, you might also want to replace newlines and tabs with the space character.)
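A common variation on this idea filters by Unicode category instead of by the ASCII range, so that letters with no decomposition (ß, ø, Cyrillic and so on) survive while the accents are still dropped. The sketch below is an alternative to the method above rather than that author's code. The names DiacriticsSketch and StripMarks are made up.

```csharp
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper: removes combining marks, keeps everything else.
    public static string StripMarks(string s)
    {
        // FormD splits "ä" into "a" followed by a combining diaeresis
        string decomposed = s.Normalize(NormalizationForm.FormD);

        // Drop only the non-spacing marks (the accents themselves)
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        // Recompose whatever is left
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }
}
```

With this, "Häuser" becomes "Hauser" while "ß" is left as it is. Characters that are invalid in file names would still need the GetInvalidFileNameChars check shown earlier.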
One-liner, assuming ASCII where non-printable are essentially all chars before the space:
var safeString = new string(str.Select(c=>c<' '?'_':c).ToArray());
Question
How do I convert the string "Européen" to the RTF-formatted string "Europ\'e9en"?
[TestMethod]
public void Convert_A_Word_To_Rtf()
{
// Arrange
string word = "Européen";
string expected = @"Europ\'e9en"; // verbatim string, so the backslash is kept literally
string actual = string.Empty;
// Act
// actual = ... // How?
// Assert
Assert.AreEqual(expected, actual);
}
What I have found so far
RichTextBox
RichTextBox can be used for certain things. Example:
RichTextBox richTextBox = new RichTextBox();
richTextBox.Text = "Européen";
string rtfFormattedString = richTextBox.Rtf;
But then rtfFormattedString turns out to be the entire RTF-formatted document, not just the string "Europ\'e9en".
Stackoverflow
Insert string with special characters into RTF
How to output unicode string to RTF (using C#)
Output RTF special characters to Unicode
Convert Special Characters for RTF (iPhone)
Google
I've also found a bunch of other resources on the web, but nothing quite solved my problem.
Answer
Brad Christie's answer
Had to add Trim() to remove the preceding space in the result. Other than that, Brad Christie's solution seems to work.
I'll run with this solution for now even though I have a bad gut feeling since we have to SubString and Trim the heck out of RichTextBox to get a RTF-formatted string.
Test case:
[TestMethod]
public void Test_To_Verify_Brad_Christies_Stackoverflow_Answer()
{
Assert.AreEqual(@"Europ\'e9en", "Européen".ConvertToRtf());
Assert.AreEqual(@"d\'e9finitif", "définitif".ConvertToRtf());
Assert.AreEqual(@"\'e0", "à".ConvertToRtf());
Assert.AreEqual(@"H\'e4user", "Häuser".ConvertToRtf());
Assert.AreEqual(@"T\'fcren", "Türen".ConvertToRtf());
Assert.AreEqual(@"B\'f6den", "Böden".ConvertToRtf());
}
Logic as an extension method:
public static class StringExtensions
{
public static string ConvertToRtf(this string value)
{
RichTextBox richTextBox = new RichTextBox();
richTextBox.Text = value;
int offset = richTextBox.Rtf.IndexOf(@"\f0\fs17") + 8; // offset = 118;
int len = richTextBox.Rtf.LastIndexOf(@"\par") - offset;
string result = richTextBox.Rtf.Substring(offset, len).Trim();
return result;
}
}
Doesn't RichTextBox always have the same header/footer? You could just read the content based on off-set location, and continue using it to parse. (I think? please correct me if I'm wrong)
There are libraries available, but I've never had good luck with them personally (though I always found another method before fully exhausting the possibilities). In addition, most of the better ones usually carry a nominal fee.
EDIT
Kind of a hack, but this should get you through what you need to get through (I hope):
RichTextBox rich = new RichTextBox();
Console.Write(rich.Rtf);
String[] words = { "Européen", "Apple", "Carrot", "Touché", "Résumé", "A Européen eating an apple while writing his Résumé, Touché!" };
foreach (String word in words)
{
rich.Text = word;
Int32 offset = rich.Rtf.IndexOf(@"\f0\fs17") + 8;
Int32 len = rich.Rtf.LastIndexOf(@"\par") - offset;
Console.WriteLine("{0,-15} : {1}", word, rich.Rtf.Substring(offset, len).Trim());
}
EDIT 2
The RTF control codes break down as follows:
Header
\f0 - Use the 0-index font (first font in the list, which is typically Microsoft Sans Serif (noted in the font table in the header: {\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}))
\fs17 - Font formatting, specify the size is 17 (17 being in half-points)
Footer
\par is specifying that it's the end of a paragraph.
Hopefully that clears some things up. ;-)
I found a nice solution that actually uses the RichTextBox itself to do the conversion:
private static string FormatAsRTF(string DirtyText)
{
System.Windows.Forms.RichTextBox rtf = new System.Windows.Forms.RichTextBox();
rtf.Text = DirtyText;
return rtf.Rtf;
}
http://www.baltimoreconsulting.com/blog/development/easily-convert-a-string-to-rtf-in-net/
This is how I went:
private string ConvertString2RTF(string input)
{
//first take care of special RTF chars
StringBuilder backslashed = new StringBuilder(input);
backslashed.Replace(@"\", @"\\");
backslashed.Replace(@"{", @"\{");
backslashed.Replace(@"}", @"\}");
//then convert the string char by char
StringBuilder sb = new StringBuilder();
foreach (char character in backslashed.ToString())
{
if (character <= 0x7f)
sb.Append(character);
else
sb.Append("\\u" + Convert.ToUInt32(character) + "?");
}
return sb.ToString();
}
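The method above emits \u escapes, while the question asks for the \'hh form (Europ\'e9en). For characters in the range U+00A0 to U+00FF, the Windows-1252 byte value equals the Unicode code point, so the hex escape can be written directly without a RichTextBox. The sketch below relies on that, assumes the input is in composed form and that the surrounding document declares code page 1252, and falls back to \u for everything else. RtfHexEscaper and Escape are hypothetical names.

```csharp
using System.Globalization;
using System.Text;

static class RtfHexEscaper
{
    // Hypothetical helper: emits \'hh for U+00A0..U+00FF and \uN? for the rest.
    public static string Escape(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);                        // RTF control characters
            }
            else if (c <= 0x7f)
            {
                sb.Append(c);                                     // plain ASCII
            }
            else if (c >= 0xa0 && c <= 0xff)
            {
                // Same value in Windows-1252 and in Unicode, e.g. 'é' is e9
                sb.Append("\\'").Append(((int)c).ToString("x2", CultureInfo.InvariantCulture));
            }
            else
            {
                // Anything else: Unicode escape as a signed 16-bit number
                short signed = unchecked((short)c);
                sb.Append("\\u").Append(signed.ToString(CultureInfo.InvariantCulture)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

RtfHexEscaper.Escape("Européen") returns Europ\'e9en, which is the string the question's test expects.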
I think using a RichTextBox is:
1) overkill
2) I don't like RichTextBox after spending days of trying to make it work with an RTF document created in Word.
Below is an ugly example of converting a string to an RTF string:
class Program
{
static RichTextBox generalRTF = new RichTextBox();
static void Main()
{
string foo = @"Européen";
string output = ToRtf(foo);
Trace.WriteLine(output);
}
private static string ToRtf(string foo)
{
string bar = string.Format("!!##!!{0}!!##!!", foo);
generalRTF.Text = bar;
int pos1 = generalRTF.Rtf.IndexOf("!!##!!");
int pos2 = generalRTF.Rtf.LastIndexOf("!!##!!");
if (pos1 != -1 && pos2 != -1 && pos2 > pos1 + "!!##!!".Length)
{
pos1 += "!!##!!".Length;
return generalRTF.Rtf.Substring(pos1, pos2 - pos1);
}
throw new Exception("Not sure how this happened...");
}
}
I know it has been a while, hope this helps..
This code is working for me after trying every conversion code I could put my hands on:
titleText and contentText are simple text filled in a regular TextBox
var rtb = new RichTextBox();
rtb.AppendText(titleText);
rtb.AppendText(Environment.NewLine);
rtb.AppendText(contentText);
rtb.Refresh();
rtb.Rtf now holds the RTF text.
The following code saves the RTF text to a file, which you can open, edit, and then load back into a RichTextBox:
rtb.SaveFile(path, RichTextBoxStreamType.RichText);
Here's an improved version of @Vladislav Zalesak's answer:
public static string ConvertToRtf(string text)
{
// using default template from wiki
StringBuilder sb = new StringBuilder(@"{\rtf1\ansi\ansicpg1250\deff0{\fonttbl\f0\fswiss Helvetica;}\f0\pard ");
foreach (char character in text)
{
if (character <= 0x7f)
{
// escaping rtf characters
switch (character)
{
case '\\':
case '{':
case '}':
sb.Append('\\');
break;
case '\r':
sb.Append("\\par");
break;
}
sb.Append(character);
}
// converting special characters
else
{
sb.Append("\\u" + Convert.ToUInt32(character) + "?");
}
}
sb.Append("}");
return sb.ToString();
}
Not the most elegant, but quite optimal and fast method:
public static string PlainTextToRtf(string plainText)
{
if (string.IsNullOrEmpty(plainText))
return "";
string escapedPlainText = plainText.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
escapedPlainText = EncodeCharacters(escapedPlainText);
string rtf = @"{\rtf1\ansi\ansicpg1250\deff0{\fonttbl\f0\fswiss Helvetica;}\f0\pard ";
rtf += escapedPlainText.Replace(Environment.NewLine, "\\par\r\n ");
rtf += " }";
return rtf;
}
Encode characters (Polish ones) method:
private static string EncodeCharacters(string text)
{
if (string.IsNullOrEmpty(text))
return "";
return text
.Replace("ą", @"\'b9")
.Replace("ć", @"\'e6")
.Replace("ę", @"\'ea")
.Replace("ł", @"\'b3")
.Replace("ń", @"\'f1")
.Replace("ó", @"\'f3")
.Replace("ś", @"\'9c")
.Replace("ź", @"\'9f")
.Replace("ż", @"\'bf")
.Replace("Ą", @"\'a5")
.Replace("Ć", @"\'c6")
.Replace("Ę", @"\'ca")
.Replace("Ł", @"\'a3")
.Replace("Ń", @"\'d1")
.Replace("Ó", @"\'d3")
.Replace("Ś", @"\'8c")
.Replace("Ź", @"\'8f")
.Replace("Ż", @"\'af");
}
I have a C# routine that imports data from a CSV file, matches it against a database and then rewrites it to a file. The source file seems to have a few non-ASCII characters that are fouling up the processing routine.
I already have a static method that I run each input field through but it performs basic checks like removing commas and quotes. Does anybody know how I could add functionality that removes non-ASCII characters too?
Here a simple solution:
public static bool IsASCII(this string value)
{
// ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there
return Encoding.UTF8.GetByteCount(value) == value.Length;
}
source: http://snipplr.com/view/35806/
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
Do it all at once
public string ReturnCleanASCII(string s)
{
StringBuilder sb = new StringBuilder(s.Length);
foreach(char c in s)
{
if((int)c > 127) // you probably don't want 127 either
continue;
if((int)c < 32) // I bet you don't want control characters
continue;
if(c == ',')
continue;
if(c == '"')
continue;
sb.Append(c);
}
return sb.ToString();
}
If you wanted to test a specific character, you could use
if ((int)myChar <= 127)
Just getting the ASCII encoding of the string will not tell you that a specific character was non-ASCII to begin with (if you care about that). See MSDN.
Here's an improvement upon the accepted answer:
string fallbackStr = "";
Encoding enc = Encoding.GetEncoding(Encoding.ASCII.CodePage,
new EncoderReplacementFallback(fallbackStr),
new DecoderReplacementFallback(fallbackStr));
string cleanStr = enc.GetString(enc.GetBytes(inputStr));
This method will replace unknown characters with the value of fallbackStr, or if fallbackStr is empty, leave them out entirely. (Note that enc can be defined outside the scope of a function.)
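To make the effect of fallbackStr concrete, here is a small sketch that wraps the same calls in a hypothetical AsciiCleaner.Clean helper (both names are made up) with the replacement passed in as a parameter. An empty replacement drops each non-ASCII character, while "?" substitutes it.

```csharp
using System.Text;

static class AsciiCleaner
{
    // Hypothetical wrapper around the encoding round trip shown above.
    public static string Clean(string input, string replacement)
    {
        // ASCII encoding whose fallbacks use the given replacement string
        Encoding enc = Encoding.GetEncoding(
            Encoding.ASCII.CodePage,
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));

        // Encoding applies the fallback to every character outside ASCII
        return enc.GetString(enc.GetBytes(input));
    }
}
```

AsciiCleaner.Clean("Européen", "") gives "Europen", and AsciiCleaner.Clean("Européen", "?") gives "Europ?en". Creating the Encoding once and reusing it, as the note above suggests, avoids rebuilding it on every call.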
It sounds kind of strange that it's accepted to drop the non-ASCII.
Also I always recommend the excellent FileHelpers library for parsing CSV-files.
strText = Regex.Replace(strText, @"[^\u0020-\u007E]", string.Empty);
public string RunCharacterCheckASCII(string s)
{
string str = s;
bool is_find = false;
char ch;
int ich = 0;
try
{
char[] schar = str.ToCharArray();
for (int i = 0; i < schar.Length; i++)
{
ch = schar[i];
ich = (int)ch;
if (ich > 127) // not ascii or extended ascii
{
is_find = true;
schar[i] = '?';
}
}
if (is_find)
str = new string(schar);
}
catch (Exception ex)
{
}
return str;
}