Retaining Tabs in String When Rendering MigraDoc PDF - c#

I was able to generate a multi-page PDF (from a string read from a .txt file) using the MigraDoc sample here:
http://www.pdfsharp.net/wiki/MigraDocHelloWorld-sample.ashx
The problem is that the original text contains tabs and runs of whitespace that produce centered text, as well as sections of text separated by spaces. It looks this way in Notepad as well as in the string viewer in Visual Studio.
When the PDF is generated, everything is left-justified and all of the tabs and extra whitespace have been removed.
Given the sample in the above link, how can I keep the original whitespace? If Notepad can render it correctly and the string is read correctly in C#, it seems the generated PDF should look exactly the same.

MigraDoc is like HTML: multiple spaces get merged into one space.
To get centred text with MigraDoc, just set the paragraph to centre alignment. IMHO that is the best way.
To retain multiple spaces in pre-formatted text, just replace the spaces with non-breaking spaces (U+00A0).
See also:
https://stackoverflow.com/a/19301602/1015447
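A minimal sketch of the non-breaking-space replacement described above. The class name, method name, and the tab width of 8 are assumptions for illustration, not part of MigraDoc; the helper only transforms the string before it is handed to the PDF library.

```csharp
using System;
using System.Text;

public static class WhitespacePreserver
{
    // Non-breaking space (U+00A0). Unlike a normal space, it is not
    // merged with neighbouring spaces.
    private const char Nbsp = '\u00A0';

    // Expands each tab to the next tab stop and replaces every space with
    // a non-breaking space. tabWidth is an assumption (8 is a common default).
    public static string Preserve(string line, int tabWidth = 8)
    {
        var sb = new StringBuilder();
        foreach (char c in line)
        {
            if (c == '\t')
            {
                // Pad up to the next multiple of tabWidth.
                int pad = tabWidth - (sb.Length % tabWidth);
                sb.Append(Nbsp, pad);
            }
            else
            {
                sb.Append(c == ' ' ? Nbsp : c);
            }
        }
        return sb.ToString();
    }
}
```

Each line of the text file would be passed through `Preserve` and then added as its own paragraph. Column alignment only survives if the paragraph uses a monospaced font such as Courier New.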

Related

text file: Reading file line by line

After reading the file, the second line of the output does not have the same spacing as in the file. Why?
THE industrial area layout date 11/12/2020
Head office page no
Below is my code:
string[] lines = File.ReadAllLines(path, Encoding.UTF8);
text file:
THE industrial area layout date 11/12/2020
Head office page no
After Reading:
THE industrial area layout date 11/12/2020
Head office page no
How would I be able to do this? Thanks in advance.
Make sure you are comparing the content in the same font. More specifically, make sure you use a monospace font, otherwise the spacing can be misleading.
I would suggest Courier New.
The debugger's text view doesn't use a monospace font by default, so it is unreliable for checking the number of spaces in a line, especially when there is other content besides spaces.
Another way to check the number of spaces is to replace them with some other character so that you can count them visibly.
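The replacement trick could look something like the sketch below. `SpaceInspector`, the marker characters, and the method names are made up for illustration; the point is that `File.ReadAllLines` itself does not alter spaces, which a check like this can confirm.

```csharp
using System;
using System.Linq;

public static class SpaceInspector
{
    // Replaces each space with a visible marker (and each tab with an arrow)
    // so that runs of whitespace can be counted by eye in any font.
    public static string ShowSpaces(string line, char marker = '·')
    {
        return line.Replace(' ', marker).Replace('\t', '→');
    }

    // Counts the space characters in a line.
    public static int CountSpaces(string line)
    {
        return line.Count(c => c == ' ');
    }
}
```

Printing `SpaceInspector.ShowSpaces(lines[1])` for the second line of the file shows whether the spaces are still in the string and only look different because of the font.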

Add Fixed Width file to PDF

I have a client who is asking me to add a fixed-width (510-character) header record to a PDF file. They have asked that I create a new page (no problem) on which I write this fixed-width header record.
I can do this, and see the header record as page 1, followed by the original PDF. The problem is whitespace. The 510-character fixed-width header is about 60% whitespace, and every way I've tried of generating the PDF causes it to be truncated. There are also line breaks where the text wraps. The client wants to use some OCR software they have purchased to read this header record from page 1.
I know very little about the PDF file format. I've tried using ABCpdf and PDFsharp, and I've also created an RDLC, bound it to this header string, and generated a PDF from that. All three gave the same result.
Let me say I know how crazy this sounds, but it's what the client is requesting. I proposed several other ways in which we could solve their problem, but this (right now) is the only one they are comfortable with. They are not comfortable with me just appending the 510 characters to the byte array and having them separate it out programmatically.
Are you looking to have a single page displaying the long header? You can create a PDF page of any size (for example, print to PDF with a custom page size of 20" wide by 6" tall; weird but possible).
Once that page is created, it can be inserted into another document of regular letter-size pages.
Or are you looking for consecutive pages displaying chunks of the header?
Using OCR to read content that you put in yourself is overkill. Instead of rendering the 500-character header as text, render it as single-character form fields. That way it is easy to access those form fields by name and retrieve the values using the same PDF library that created the PDFs.
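One way the single-character-field idea might be prepared is sketched below. It only builds the (name, value) pairs and rebuilds the header from them; creating and reading the actual form fields depends on the PDF library and is not shown. The `H001`-style field names and the `HeaderFields` class are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeaderFields
{
    // Maps every character of the fixed-width header to a field name such
    // as "H001". The "H" prefix and three-digit numbering are hypothetical.
    public static List<KeyValuePair<string, char>> ToFields(string header, int width = 510)
    {
        // Pad so trailing spaces are kept; PadRight leaves longer strings untouched.
        string padded = header.PadRight(width);
        var fields = new List<KeyValuePair<string, char>>(padded.Length);
        for (int i = 0; i < padded.Length; i++)
        {
            fields.Add(new KeyValuePair<string, char>("H" + (i + 1).ToString("D3"), padded[i]));
        }
        return fields;
    }

    // Rebuilds the header from the field values, in list order.
    public static string FromFields(IEnumerable<KeyValuePair<string, char>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            sb.Append(field.Value);
        }
        return sb.ToString();
    }
}
```

Because every position has its own field, whitespace cannot be collapsed or wrapped away: a space is simply a field whose value is a space.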

PDF doesn't wrap text lines automatically & respect line position

I'm trying to generate a PDF via code because not all current .NET PDF libraries support the new Windows Runtime for Windows/Windows Phone 8.1.
The PDF is saved correctly, apart from a bug in the stream position count that I can fix easily, but, as you can see, the text doesn't wrap if a line is too long.
I tried the PDF newline character (\n), but C# automatically converts it in the input string.
I also can't work out how to position lines or objects in the document. The online guide I am following describes reversed axes (x for height and y for width), but I don't seem to have grasped the method. In my code I use a constant left position of 40 and a top value that decreases from 600; I am not yet handling multiple pages when the value drops below 0.
This is the code of PDF generated:
http://pastebin.com/ZkZmbJdM
(Sorry for using Pastebin, but the code formatting in this editor mangles the special characters in the code.)
What am I doing wrong?
PDF is a graphical format trying to make you think it's a document format. But nope, it's just like drawing with GDI+ for instance. This is the reason why it can achieve the same rendered output across many platforms/programs/etc as opposed to text flow formats like doc/docx. And also, this is why it can really render anything.
So, as opposed to document formats, it is the responsibility of the program that generates the PDF to get the layout right. Think of it just as if you'd draw with GDI+.
In documents like docx or HTML, it's the rendering program that has to do the layout work. With such documents, you just write text and the viewer takes care of laying it out.
Your PDF library almost certainly has code to measure the width of text. It may even provide some layout capabilities. You'll have to use these functions to do the layout yourself.
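As a rough illustration of doing the layout yourself, here is a greedy word-wrap sketch. The `measure` delegate stands in for whatever text-measuring function the PDF code provides (an assumption; the example call below simply uses the character count), and `TextWrapper` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class TextWrapper
{
    // Greedy word wrap: keeps adding words to the current line until the
    // measured width would exceed maxWidth, then starts a new line.
    // measure must return widths in the same unit as maxWidth.
    // A single word wider than maxWidth is placed on its own line and overflows.
    public static List<string> Wrap(string text, double maxWidth, Func<string, double> measure)
    {
        var lines = new List<string>();
        string current = "";
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = current.Length == 0 ? word : current + " " + word;
            if (current.Length > 0 && measure(candidate) > maxWidth)
            {
                lines.Add(current);
                current = word;
            }
            else
            {
                current = candidate;
            }
        }
        if (current.Length > 0)
        {
            lines.Add(current);
        }
        return lines;
    }
}
```

For example, `TextWrapper.Wrap(text, 40, s => s.Length)` wraps at 40 characters for a monospaced font. Each resulting line is then drawn at its own position: in PDF's default coordinate system the origin is the bottom-left corner and y grows upward, so the y value decreases by one line height per line, and a new page is needed once it reaches the bottom margin.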

Replace text in PDF

I'm trying to replace a section of a PDF with different text. From research on all major PDF libraries for .NET, it seems this is complicated and not a trivial task. I think it may be easier to convert the PDF to an image, replace the text (always in the same place), then convert it back to a PDF (or leave it as an image if converting back isn't possible). Is it possible to extract an image from a PDF page with .NET?
If your text is in a known location, you can simply cover it with a rectangle filled with the background color, and then draw your text over top.
Note that the text will still be there, it simply won't be visible. Someone selecting text will still pick up the old stuff. If that's acceptable, it's quite trivial.
If the PDF was created from an image, you can import it into Photoshop and edit it as a graphic. Or you can use a screenshot program like "Snagit" to capture the PDF page as an image and use Snagit's editor to erase the old text and add the new text.
The drawback of this method is that the newly added text may not use the same font as the text around it. Personally, I use a PDF editor to replace text in a PDF, since the added text is automatically matched to the original font and size.

Recognizing text file Form Feeds in MigraDoc

Is there a way for MigraDoc to recognize form feed characters that are already embedded in an ASCII text file? Or does one have to process the text file line by line to catch them?
I don't think that "form feeds" are treated as page breaks by MigraDoc automatically.
The output probably looks better if you process the text file anyway (e.g. treat single CR/LF as spaces only, treat empty lines as new paragraphs, etc.).
It could even make sense to ignore the form feeds - the text files probably weren't made for proportional fonts. If you use proportional fonts with MigraDoc (e.g. Arial), the documents could fit on a smaller number of pages.
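The preprocessing suggested above might be sketched as follows: split at form feeds into pages, treat empty lines as paragraph boundaries, and join single line breaks with a space. `FormFeedSplitter` is a hypothetical helper that only works on the string; nothing here is MigraDoc API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormFeedSplitter
{
    // Splits text into pages at form feeds ('\f'), then into paragraphs at
    // empty lines. Single line breaks inside a paragraph become spaces.
    public static List<List<string>> Split(string text)
    {
        var pages = new List<List<string>>();
        foreach (string page in text.Split('\f'))
        {
            var paragraphs = new List<string>();
            var current = new StringBuilder();
            foreach (string raw in page.Replace("\r\n", "\n").Split('\n'))
            {
                string line = raw.Trim();
                if (line.Length == 0)
                {
                    // An empty line closes the current paragraph, if any.
                    if (current.Length > 0)
                    {
                        paragraphs.Add(current.ToString());
                        current.Clear();
                    }
                }
                else
                {
                    if (current.Length > 0) current.Append(' ');
                    current.Append(line);
                }
            }
            if (current.Length > 0) paragraphs.Add(current.ToString());
            pages.Add(paragraphs);
        }
        return pages;
    }
}
```

Each inner list can then be added paragraph by paragraph, with a page break between the outer entries if the form feeds should be honoured, or without one if they are to be ignored.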
