What's the best way to save a RichTextFile in C#?

I'm trying to create a Notepad/WordPad clone. I want to save the document in .rtf format so that it can be read by WordPad. How do I do this in C#?

Assuming you are doing this yourself for learning purposes and don't want a library to do it for you:
Basically you need to write a program which holds in memory either a string with RTF markup, or a DOM-like data tree. Using the RTF specification, you should write a flat (text) file marked up properly and with a .rtf extension. Just the same as if you were writing an HTML file.
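As a rough illustration of the "string with RTF markup" approach, here is a minimal sketch. The class and method names (RtfWriter, Escape, BuildDocument) are made up for this example, the header uses a single assumed font (Arial, 12 pt), and only the most basic escaping is handled; anything beyond that should be checked against the RTF specification.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of any library.
static class RtfWriter
{
    // Escapes plain text so it can be placed inside an RTF document body.
    public static string Escape(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);      // RTF control characters need a backslash
            else if (c == '\n')
                sb.Append("\\par ");            // line break becomes a paragraph break
            else if (c == '\r')
                continue;                        // "\r\n" is handled by the '\n' branch
            else if (c > 127)
                sb.Append("\\u").Append((short)c).Append('?'); // signed 16-bit Unicode escape, '?' as fallback
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Wraps the escaped text in a minimal RTF header (assumed font: Arial, 12 pt).
    public static string BuildDocument(string plainText)
    {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 "
             + Escape(plainText) + "}";
    }
}
```

Writing the file is then a single call such as File.WriteAllText("note.rtf", RtfWriter.BuildDocument(text), Encoding.ASCII); the output is plain ASCII because non-ASCII characters are escaped.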

Correct me if I'm wrong, but if you're using the RichTextBox control, you can just use the RichTextBox.SaveFile method to accomplish this. I'm guessing, though, that you mean doing it without using that control.
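For the control-based route, a minimal sketch might look like the following. It assumes a Windows Forms project; the wrapper class RtfSaveExample is only illustrative, while SaveFile and RichTextBoxStreamType.RichText are the framework members mentioned above.

```csharp
using System.Windows.Forms;

// Illustrative wrapper; only the SaveFile call comes from the framework.
static class RtfSaveExample
{
    // Writes the current contents of the control to disk as RTF.
    public static void SaveAsRtf(RichTextBox box, string path)
    {
        box.SaveFile(path, RichTextBoxStreamType.RichText);
    }
}
```

The resulting file should open in WordPad, since the control writes the RTF markup itself.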

RTF Spec Link
Generate your markup according to the spec and your app's files will be compatible with WordPad, Word, etc.

Related

Cross platform rich text in C#

I come from a world (iOS and OSX) where saving a rich text string (NSAttributedString) to a file is easy (NSAttributedString can serialise itself as RTF or HTML etc.)
I'm now porting my app to Windows/C# and trying to figure out how to load/save RTF (so that my apps can share a file format)
It looks like I need to create a control (RichTextBox) in order to import RTF - which is OK for testing, but I need to read in thousands of rich text snippets so this is (probably) going to be a bit of a bottleneck.
How is this type of cross platform rich text format best achieved? I'm starting to think I will need to copy the .docx approach and create XML with textRun elements etc? This strikes me as a problem that must have been solved already - but google is not forthcoming!!
Thanks

How to create a text file from an existing Pdf document in C#.net

I have a PDF document containing tabular data, and I would like to convert it into a text file that keeps the same structure, including the margins and the spacing between pieces of text in the PDF.
You need to write your own PDF tool then. Which is not exactly an easy task. Honestly, 3rd party tools make your job much easier, why don't you want to use one?
If you change your mind, I can suggest iTextSharp. I've used it in the past with great success. Here are some examples to get you going:
http://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C
P.S. Three different tools are used in that article.

Reading Data From PDF File And Write It Into Word File?

How can we read data from a PDF file and write it to a Word file using ASP.NET C# code?
You can use the IFilter capabilities built into Windows. Here's an article with some example code:
Using-IFilter-in-C
The issue with PDF files is that even if you're able to extract the plaintext of the PDF in readable form (which is not a guarantee by any stretch), the text will be completely unformatted. Even simple things like line breaks will be lost in many cases.

C# PDFSharp: Examples of how to strip text from PDF?

I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".
Ideally, I would prefer to avoid any sort of re-compression of the image contents but if it's not possible, it's ok too.
Are there examples of how to do it?
Thanks!
Extracting text from a PDF file with PDFsharp is not a simple task.
It was discussed recently in this thread:
https://stackoverflow.com/a/9161732/162529
Extracting text from a PDF with PdfSharp can actually be very easy, depending on the document type and what you intend to do with it. If the text is in the document as text, and not an image, and you don't care about the position or format, then it's quite simple. This code gets all of the text of the first page in the PDFs I'm working with:
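using PdfSharp.Pdf.IO; // PdfReader lives in this namespace
var doc = PdfReader.Open(docPath);
// Raw content stream of the first content element on the first page
string pageText = doc.Pages[0].Contents.Elements.GetDictionary(0).Stream.ToString();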
doc.Pages.Count gives you the total number of pages, and you access each one through the doc.Pages array by index. I don't recommend using foreach and Linq here, as the interfaces aren't implemented well. The index passed into GetDictionary selects which PDF document element to read; this may vary based on how the documents are produced. If you don't get the text you're looking for, try looping through all of the elements.
The text that this produces will be full of various PDF formatting codes. If all you need to do is extract strings, though, you can find the ones you want using Regex or any other appropriate string searching code. If you need to do anything with the formatting or positioning, then good luck - from what I can tell, you'll need it.
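To make the regex step concrete, here is a small sketch of pulling literal strings out of such a raw content stream. It assumes the stream is uncompressed and that the text appears as PDF literal strings in parentheses, as in "(Hello) Tj"; hex strings, nested unescaped parentheses, and custom font encodings are not handled. The class and method names are made up for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
static class PdfStreamText
{
    // Matches a PDF literal string such as (Hello \(world\)).
    private static readonly Regex Literal =
        new Regex(@"\((?<s>(?:\\.|[^\\()])*)\)", RegexOptions.Singleline);

    // Returns the contents of every literal string found in the content stream.
    public static List<string> ExtractLiterals(string contentStream)
    {
        var result = new List<string>();
        foreach (Match m in Literal.Matches(contentStream))
        {
            // Undo the most common escapes: \(  \)  \\
            string s = Regex.Replace(m.Groups["s"].Value, @"\\([()\\])", "$1");
            result.Add(s);
        }
        return result;
    }
}
```

Calling PdfStreamText.ExtractLiterals(pageText) on the string obtained above gives the individual text fragments in stream order; joining them back into readable lines is left to the caller, since the positioning operators are ignored here.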
Example of using the PDFSharp library to extract images from a .pdf file:
link
library
EDIT:
Then, if you want to extract text from an image, you have to use an OCR library.
There are two good OCR options: tessnet and MODI.
Link to thread on stack
I can fully recommend MODI, which I am using now. There is some sample code on CodeProject.
EDIT 2 :
If you don't want to read text from the extracted images, you should write a new PDF document and put all of the images into it. For writing PDFs I use MigraDoc. That library is not difficult to use.

How does one parse and convert AutoCAD MText entity to raw text?

I would like to parse AutoCAD's MText entity and extract the raw text. I see a pattern in the way the text is formatted. If this has already been solved, then I would not need to reinvent the wheel. I have searched online, but have not found sufficient information.
I am searching for any links or references on this subject.
Edit:
To further clarify, we are using the ODA (Open Design Alliance) libraries to access the DWG files. I am not familiar with this library. Another developer is using the library and extracting information from the files, including MText entities. I am then provided with a file containing the MText formatted text, which is what I have access to and am working with in C#.
Questions:
I asked the other developer if the ODA library provided a means to extract the raw, unformatted text. His response was that it could, but that doing so would also result in the entity getting written back to the DWG file. I am interested in the raw text without affecting the original DWG file. Does ODA provide a way of extracting the raw text without altering the file?
I am interested in any documentation on the formatting rules of MText, so that I can consider writing a parser myself if necessary.
Is there anything out there to convert MText to RTF? I realize that RTF would not completely satisfy all formatting rules, but this could provide a satisfactory means of displaying the formatted text in a WinForms app. Given RTF I could also obtain the raw text.
This forum thread includes a VB program to strip the control characters from the MText. The code indicates what should be done to strip each control character, so it should be straightforward to write something similar in C#.
Additionally, the documentation of the format codes is available in the AutoCAD documentation.
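As a starting point for such a C# version, here is a rough sketch. It is not a port of the VB program; it is based on my understanding of the commonly documented format codes (\P for a paragraph break, \L \l \O \o \K \k as toggles, codes such as \f \H \C \T \Q \W \A \p taking an argument terminated by ';', \S for stacked text, and braces for grouping). The list may be incomplete, so check it against the AutoCAD documentation. The class name MTextStripper is made up for this example.

```csharp
using System.Text;

// Hypothetical helper; the handled codes are an assumption, not an exhaustive list.
static class MTextStripper
{
    // Codes assumed to carry an argument that ends with ';' (font, height, color, ...).
    private const string ArgCodes = "fFHCcTQWAp";

    public static string ToPlainText(string mtext)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < mtext.Length)
        {
            char c = mtext[i];
            if (c == '\\' && i + 1 < mtext.Length)
            {
                char code = mtext[i + 1];
                i += 2;
                if (code == '\\' || code == '{' || code == '}')
                    sb.Append(code);                 // escaped literal character
                else if (code == 'P')
                    sb.Append('\n');                 // paragraph break
                else if (code == '~')
                    sb.Append(' ');                  // non-breaking space
                else if (code == 'S')
                {
                    // Stacked text: keep the content, using '/' as the separator.
                    int end = mtext.IndexOf(';', i);
                    if (end < 0) end = mtext.Length;
                    sb.Append(mtext.Substring(i, end - i).Replace('^', '/').Replace('#', '/'));
                    i = end + 1;
                }
                else if (ArgCodes.IndexOf(code) >= 0)
                {
                    // Skip the argument up to and including the terminating ';'.
                    int end = mtext.IndexOf(';', i);
                    i = end < 0 ? mtext.Length : end + 1;
                }
                // Any other code (underline, overline, strikethrough toggles) is dropped.
            }
            else if (c == '{' || c == '}')
            {
                i++;                                 // grouping braces carry no text
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Because this works on the formatted string alone, it never touches the DWG file, which matches the situation where the MText contents are handed over in a separate file.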
If you are using C# and the .NET interface, the Text property of the MText object provides the raw text:
MText mt;
...
string rawText = mt.Text;
If you want the formatting as well, the solution is different.
If you are parsing an AutoCAD file without AutoCAD, you need to specify what file type you are parsing. However, this question is basically a subset of the following questions:
Are there any libraries for parsing AutoCAD files?
Open source cad drawing (dwg) library in C#
.Net CAD component that can read/write dxf/ dwg files
Reading .DXF files
For DWG, the basic options are Open Design Alliance and AutoCAD RealDWG.
If this doesn't help, please provide more details as to exactly what you are trying to do.
If you are using C#, give the netDXF library a try.
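The pseudocode should look something like this:
using netDxf; // assumes the netDxf package is referenced
DxfDocument dxf = DxfDocument.Load(openFileDialog1.FileName); // load your file
// Extract the raw text of the first MText object.
// Assumption: PlainText is a method in the netDxf versions I am aware of; adjust if your version differs.
string rawText = dxf.MTexts[0].PlainText();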
