WPF RichTextBox: AppendText, TextRange and the trailing newline - c#

I am reading a text file's contents into a RichTextBox like this:
string contents = File.ReadAllText("MyFile.txt");
myRichTextBox.Document.Blocks.Clear();
myRichTextBox.AppendText(contents);
I am using the RichTextBox to automatically apply some syntax highlighting of sorts. When I read the unformatted text back (as described in the Microsoft documentation) to save it to the file, two things happen:
A newline (\r\n) is added to the back of the file, which I don't want unless the user explicitly adds this newline.
When I load the file again, the newline is not displayed in the RichTextBox, even if it is present in the file.
How can I change this, so that the RichTextBox displays and returns exactly the contents of the text file?

The newline \r\n (CR/LF) is part of the text formatting in the RichTextBox control. When the content is converted to text, \r\n is appended to each paragraph.
This means that when a user presses the ENTER key, a new paragraph ending in \r\n is added to the RichTextBox control. When the StringFromRichTextBox() method described in the Microsoft documentation is used to extract the text content from a RichTextBox, it returns a string in which every paragraph is terminated by \r\n.
Regarding the points raised in the question:
A newline (\r\n) is added to the back of the file, which I don't want unless the user explicitly adds this newline.
A newline \r\n is added to the end of the file only because it is part of every paragraph ending.
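Since the final \r\n comes from the last paragraph ending rather than from the user, one way to get the text without it is to strip exactly one trailing line break after extraction. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes the paragraph break appears as Environment.NewLine (\r\n on Windows). It cannot tell whether the original file ended with a newline, so that information would have to be tracked separately.

```csharp
using System;
using System.Windows.Controls;
using System.Windows.Documents;

// Hypothetical helper, not part of WPF.
public static class RichTextBoxText
{
    // Returns the plain text of the document without the line break
    // that TextRange appends for the last paragraph.
    public static string GetPlainText(RichTextBox rtb)
    {
        var range = new TextRange(rtb.Document.ContentStart, rtb.Document.ContentEnd);
        string text = range.Text;

        // Assumption: the paragraph break is Environment.NewLine.
        string newline = Environment.NewLine;
        if (text.EndsWith(newline, StringComparison.Ordinal))
        {
            // Remove only one trailing break, so blank lines typed
            // by the user before it are kept.
            text = text.Substring(0, text.Length - newline.Length);
        }
        return text;
    }
}
```

The result could then be written with File.WriteAllText("MyFile.txt", RichTextBoxText.GetPlainText(myRichTextBox)).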
NOTE: If the document needs to be saved and loaded again later, the TextRange.Save() and TextRange.Load() methods can be used:
// Requires System.IO, System.Windows, System.Windows.Controls and System.Windows.Documents.
public void SaveRtf(RichTextBox rtb, string file)
{
    var range = new TextRange(rtb.Document.ContentStart, rtb.Document.ContentEnd);
    // Create (or overwrite) the file and write the document content as RTF.
    using (var stream = new FileStream(file, FileMode.Create))
    {
        range.Save(stream, DataFormats.Rtf);
    }
}
public void LoadRtf(RichTextBox rtb, string file)
{
    var range = new TextRange(rtb.Document.ContentStart, rtb.Document.ContentEnd);
    // Open the existing file and replace the document content with its RTF.
    using (var stream = new FileStream(file, FileMode.Open))
    {
        range.Load(stream, DataFormats.Rtf);
    }
}
If new TextRange(rtb.Document.ContentStart, rtb.Document.ContentEnd).Text is used to save the whole RichTextBox content instead, all text formatting will be lost when the content is restored.

Could this work? contents.Replace("\r\n", "\n");

Related

How to disable mailto: when clicking an Email like filename?

Consider this simple block of code that reproduces a simple text with a web hyperlink to a file located in the user's file-system.
Document doc = new Document();
Page page1 = doc.Pages.Add();
TextFragment textFragment = new TextFragment();
TextSegment textSegment = new TextSegment("foo@boo.net");
textSegment.Hyperlink = new Aspose.Pdf.WebHyperlink("Images/foo@boo.net");
textFragment.Segments.Add(textSegment);
page1.Paragraphs.Add(textFragment);
doc.Save(dataDir);
The file name is foo@boo.net, and therefore the PDF reader recognizes it as an email address and automatically adds a mailto: prefix. Instead of opening the file (with some default program), it is opened with Outlook.
This question follows up on my previous question regarding this issue. I made many attempts, such as using new Aspose.Pdf.FileHyperlink("Images/foo@boo.net"), but then none of the files open, whatever their names are.
Is it possible to add a TextSegment whose text looks like a valid email address without the PDF reader adding the mailto: prefix?
For example (tested in Chrome):
Solved by escaping the text with a zero-width space (U+200B), like this:
internal static string SafeEscapeMailtoPrefix(this string value)
{
    if (string.IsNullOrEmpty(value))
        return value;
    // Insert a zero-width space before '@' so the text no longer looks like an email address.
    return value.Replace("@", "\u200B@");
}
So the TextSegment will be created like this:
TextSegment textSegment = new TextSegment("foo@boo.net".SafeEscapeMailtoPrefix());

iTextSharp How to read Table in PDF file

I am working on converting PDF to text. I can extract the text from the PDF correctly, but it gets complicated with table structures. I know PDF doesn't support table structure, but I think there is a way to get the cells correctly. For example:
I want to convert to text like this:
> This is first example.
> This is second example.
But when I convert the PDF to text, the data looks like this:
> This is This is
> first example. second example.
How can I get values correctly?
--EDIT:
Here is how I convert the PDF to text:
OpenFileDialog ofd = new OpenFileDialog();
string filepath;
ofd.Filter = "PDF Files(*.PDF)|*.PDF|All Files(*.*)|*.*";
if (ofd.ShowDialog() == DialogResult.OK)
{
filepath = ofd.FileName.ToString();
string strText = string.Empty;
try
{
PdfReader reader = new PdfReader(filepath);
for (int page = 1; page <= reader.NumberOfPages; page++) // pages are 1-based; <= includes the last page
{
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, page, its);
s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
strText += s;
}
reader.Close();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
To make my comment an actual answer...
You use the LocationTextExtractionStrategy for text extraction:
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
string s = PdfTextExtractor.GetTextFromPage(reader, page, its);
This strategy arranges all text it finds in left-to-right lines from top to bottom (actually also taking the text line angle into account). Thus, it clearly is not what you need to extract text from tables with cells with multi-line content.
Depending on the document in question there are different approaches one can take:
Use the iText SimpleTextExtractionStrategy if the text drawing operations in the document in question already are in the order one wants for text extraction.
Use a custom text extraction strategy which makes use of tagging information if the document tables are properly tagged.
Use a complex custom text extraction strategy which tries to get hints from text arrangements, line paths, or background colors to guess the table cell structure and extract text cell by cell.
In this case, the OP commented that he replaced LocationTextExtractionStrategy with SimpleTextExtractionStrategy, and then it worked.
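The change the OP describes could look roughly like the sketch below. It assumes iTextSharp 5.x and a document whose text drawing operations are already in the desired reading order. The class and method names are hypothetical.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper for illustration only.
public static class PdfTextDump
{
    // Extracts the text of every page in content-stream order.
    public static string ExtractText(string path)
    {
        var sb = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            // Page numbers are 1-based, so the loop includes NumberOfPages.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page: it keeps the text in the order
                // in which the drawing operations occur in the page content.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }
}
```

Whether the cells come out in the right order still depends on how the PDF producer wrote the table, so the result should be checked against the actual document.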

Extract text from pdf by format

I am trying to extract the headlines from PDFs.
Until now I tried to read the plain text and take the first line (which didn't work because in plain text the headlines were not at the beginning) and just read the text from a region (which didn't work, because the regions are not always the same).
In my opinion, the easiest way to do this is to read only text with a particular format (font, font size, etc.).
Is there a way to do this?
You can enumerate all text objects on a PDF page using the Docotic.Pdf library. For each text object, information about its font and size is available. Below is a sample:
public static void listTextObjects(string inputPdf)
{
using (PdfDocument pdf = new PdfDocument(inputPdf))
{
string format = "{0}\n{1}, {2}px at {3}";
foreach (PdfPage page in pdf.Pages)
{
foreach (PdfPageObject obj in page.GetObjects())
{
if (obj.Type != PdfPageObjectType.Text)
continue;
PdfTextData text = (PdfTextData)obj;
string message = string.Format(format, text.Text, text.Font.Name,
text.Size.Height, text.Position);
Console.WriteLine(message);
}
}
}
}
The code will output lines like the following for each text object on each page of the input PDF file.
FACTUUR
Helvetica-BoldOblique, 19.04px at { X=51.12; Y=45.54 }
You can use the retrieved information to find the largest text, bold text, or text with whatever other properties are used to format the headline.
If your PDF is guaranteed to have the headline as the topmost text on a page, then you can use an even simpler approach:
public static void printText(string inputPdf)
{
using (PdfDocument pdf = new PdfDocument(inputPdf))
{
foreach (PdfPage page in pdf.Pages)
{
string text = page.GetTextWithFormatting();
Console.WriteLine(text);
}
}
}
The GetTextWithFormatting method returns text in reading order (i.e. from the top left to the bottom right).
Disclaimer: I am one of the developers of the library.

Replace found strings with new strings?

I have an open file dialog that opens an XML file. A regular expression finds every string between > and < and writes each one on a new line in the rich text box.
private void button1_Click(object sender, EventArgs e)
{
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
StreamReader sr = new StreamReader(openFileDialog1.FileName);
string s = sr.ReadToEnd();
richTextBox1.Text = s;
}
string txt = richTextBox1.Text;
var foundWords = Regex.Matches(txt, @"(?<=>)([\w ]+?)(?=<)");
richTextBox1.Text = string.Join("\n", foundWords.Cast<Match>().Select(x => x.Value).ToArray());
}
Then I can change those strings. But how can I write the changed strings back to the original XML file, in the same places?
You could try to replace these strings inside a file, but once you replace something with a different length, it would be simpler to just write the entire file instead.
It looks like the user is able to modify these strings, and that is your challenge: you will have to keep track of where each word was in the original file in order to put the replacements back into the data. Furthermore, the user is able to remove or add lines in the text box. What would your application do in that case?
It would be easier to process the XML file using XDocument and keep the XElements that contain the original values. XDocument allows you to replace these values and save the file.
Note that since you're not explicitly closing the StreamReader, the file may still be in use when you try to write it. Simply put the StreamReader in a using block to prevent this.
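The XDocument approach could be sketched as follows. This is an illustration under assumptions, not a drop-in replacement for the regex: it treats every element without child elements and with non-blank text as an editable value, and the class name, method name and edit callback are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class XmlTextEditor
{
    // Applies 'edit' to the text of every leaf element and saves the file.
    public static void ReplaceLeafTexts(string path, Func<string, string> edit)
    {
        // Load reads the whole file and releases it, so it can be overwritten below.
        XDocument doc = XDocument.Load(path);

        // ToList() takes a snapshot so the tree is not modified while it is enumerated.
        var leaves = doc.Descendants()
                        .Where(e => !e.HasElements && !string.IsNullOrWhiteSpace(e.Value))
                        .ToList();

        foreach (XElement element in leaves)
        {
            // Each value stays attached to its element, so its position is preserved.
            element.Value = edit(element.Value);
        }

        doc.Save(path);
    }
}
```

For example, XmlTextEditor.ReplaceLeafTexts(openFileDialog1.FileName, s => s.Trim()) would trim every value in place. With the default load options, insignificant whitespace is not preserved, so the saved file may be re-indented.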

RichTextBox getting literal text

I'm trying to save the text from my RichTextBox to a text file. However, upon doing so, the new lines in the RTB aren't considered "new lines" and hence a \n isn't appended in the text file, so if I have:
This is a test
line of content
It will write to the text file as:
This is a testline of content
I'm saving the text the following way:
File.WriteAllText(currentFile, richTextBox1.Text);
I'm just wondering if there's any solution for this. Thanks.
Try this:
StreamWriter sw = File.CreateText(currentFile);
for (int i = 0; i < richTextBox1.Lines.Length; i++)
{
sw.WriteLine(richTextBox1.Lines[i]);
}
sw.Flush();
sw.Close();
or use the rtb.SaveFile() method: http://msdn.microsoft.com/en-us/library/system.windows.forms.richtextbox.savefile%28VS.71%29.aspx
Use the built-in SaveFile method instead.
richTextBox1.SaveFile(currentFile, RichTextBoxStreamType.PlainText);
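As a further variation on the loop shown above, File.WriteAllLines can write the Lines array in one call. A minimal sketch, with a hypothetical helper name, assuming a Windows Forms RichTextBox:

```csharp
using System.IO;
using System.Windows.Forms;

// Hypothetical helper for illustration only.
public static class RichTextBoxExport
{
    // Writes each line of the control followed by the platform line terminator.
    public static void SaveAsPlainText(RichTextBox box, string path)
    {
        // Lines splits the control's text at its internal line breaks;
        // WriteAllLines terminates each entry with Environment.NewLine.
        File.WriteAllLines(path, box.Lines);
    }
}
```

Note that this also terminates the last line with a line break, just as the WriteLine loop does.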
