I have some problems with Telerik reports.
Feels like i have missed something...
I wanna create a list of reports, and then write them to ONE file.
But when i write it out i only get one page.
The writer writer over page 1 all foreach, so it just write one page.
But i want several pages... in this case 10.
Have tried write with FileStream, File and more...
Does anyone have a good idea?
public void WriteToFile()
{
string path = #"C:\";
string test = "test";
var report = new Report2();
var procceser = new ReportProcessor();
var list = new List<RenderingResult>();
for (int i = 0; i < 10; i++)
{
var res = procceser.RenderReport("PDF", report, null);
list.Add(res);
}
string filePath = Path.Combine(path, test);
var Writer = new BinaryWriter(File.Create(filePath));
foreach (var renderingResult in list)
{
Writer.Write(renderingResult.DocumentBytes);
}
Writer.Flush();
Writer.Close();
}
Related
I am trying to create a process in .NET to convert a PDF and all it's pages + attachments to PNGs. I am evaluating libraries and came across PDFiumSharp but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
{
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
When I run this code, I get this exception:
screenshot of exception
Does anyone know how to fix this? Also does PDFiumSharp support extracting PDF attachments? If not, does anyone have any other ideas on how to achieve my goal?
PDFium does not look like it supports extracting PDF attachments. If you want to achieve your goal, then you can take a look at another library that supports both extracting PDF attachments as well as converting PDFs to PNGs.
I am an employee of the LEADTOOLS PDF SDK which you can try out via these 2 nuget packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that will convert a PDF + all attachments in the PDF to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
documents.Add(document);
foreach (var attachment in document.Attachments)
documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
using var converter = new DocumentConverter();
using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);
converter.SetOcrEngineInstance(ocrEngine, false);
converter.SetDocumentWriterInstance(new DocumentWriter());
foreach (var document in documents)
{
var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
int count = 1;
while (File.Exists(outputFile))
outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
var jobData = new DocumentConverterJobData
{
Document = document,
Cache = cache,
DocumentFormat = DocumentFormat.User,
RasterImageFormat = imageFormat,
RasterImageBitsPerPixel = 0,
OutputDocumentFileName = outputFile,
};
var job = converter.Jobs.CreateJob(jobData);
converter.Jobs.RunJob(job);
}
}
I'm trying to write verification code for our PDF generating routines, and I'm having difficulty getting PDFsharp to extract text from files created with MigraDoc. The ExtractText code works with other PDFs, but not with the PDFs that I generate with MigraDoc (see code below.)
Any tips on what I'm doing wrong?
//Create the Doc
var doc = new MigraDoc.DocumentObjectModel.Document();
doc.Info.Title = "VerifyReadWrite";
var section = doc.AddSection();
section.AddParagraph("ABCDEF abcdef");
//Render the PDF
var renderer = new PdfDocumentRenderer(true);
var pdf = new PdfDocument();
renderer.PdfDocument = pdf;
renderer.Document = doc;
renderer.RenderDocument();
var msOut = new MemoryStream();
pdf.Save(msOut, true);
var pdfBytes = msOut.ToArray();
//Read the PDF into PdfSharp
var ms = new MemoryStream(pdfBytes);
var pdfRead = PdfSharp.Pdf.IO.PdfReader.Open(ms, PdfDocumentOpenMode.ReadOnly);
var segments = pdfRead.Pages[0].ExtractText().ToList();
Results in the following:
segments[0] = "\0$\0%\0&\0'\0(\0)"
segments[1] = "\0D\0E\0F\0G\0H\0I"
I'd expect to see:
segments[0] = "ABCDEF"
segments[1] = "abcdef"
I'm using the ExtractText code from here:
C# Extract text from PDF using PdfSharp
and it works very well for all but PDFs generated with MigraDoc.
public static IEnumerable<string> ExtractText(this PdfPage page)
{
var content = ContentReader.ReadContent(page);
var text = content.ExtractText();
return text.Select(x => x.Trim());
}
public static IEnumerable<string> ExtractText(this CObject cObject)
{
if (cObject is COperator)
{
var cOperator = (COperator) cObject;
if (cOperator.OpCode.Name == OpCodeName.Tj.ToString() ||
cOperator.OpCode.Name == OpCodeName.TJ.ToString())
{
foreach (var cOperand in cOperator.Operands)
foreach (var txt in ExtractText(cOperand))
yield return txt;
}
}
else
{
var sequence = cObject as CSequence;
if (sequence != null)
{
var cSequence = sequence;
foreach (var element in cSequence)
foreach (var txt in ExtractText(element))
yield return txt;
}
else if (cObject is CString)
{
var cString = (CString) cObject;
yield return cString.Value;
}
}
}
It seems the code used to extract text does not support all cases.
Try new PdfDocumentRenderer(false) (instead of 'true'). AFAIK this will lead to a different encoding and the text extraction might work.
I am writing a pdf to word converter which works perfectly fine for me. But I want to be able to convert more than one file.
What happens now is that it read the first file and does the convert process.
public static void PdfToImage()
{
try
{
Application application = null;
application = new Application();
var doc = application.Documents.Add();
string path = #"C:\Users\Test\Desktop\pdfToWord\";
foreach (string file in Directory.EnumerateFiles(path, "*.pdf"))
{
using (var document = PdfiumViewer.PdfDocument.Load(file))
{
int pagecount = document.PageCount;
for (int index = 0; index < pagecount; index++)
{
var image = document.Render(index, 200, 200, true);
image.Save(#"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png", ImageFormat.Png);
application.Selection.InlineShapes.AddPicture(#"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png");
}
string getFileName = file.Substring(file.LastIndexOf("\\"));
string getFileWithoutExtras = Regex.Replace(getFileName, #"\\", "");
string getFileWihtoutExtension = Regex.Replace(getFileWithoutExtras, #".pdf", "");
string fileName = #"C:\Users\Test\Desktop\pdfToWord\" + getFileWihtoutExtension;
doc.PageSetup.PaperSize = WdPaperSize.wdPaperA4;
foreach (Microsoft.Office.Interop.Word.InlineShape inline in doc.InlineShapes)
{
if (inline.Height > inline.Width)
{
inline.ScaleWidth = 250;
inline.ScaleHeight = 250;
}
}
doc.PageSetup.TopMargin = 28.29f;
doc.PageSetup.LeftMargin = 28.29f;
doc.PageSetup.RightMargin = 30.29f;
doc.PageSetup.BottomMargin = 28.29f;
application.ActiveDocument.SaveAs(fileName, WdSaveFormat.wdFormatDocument);
doc.Close();
}
}
I thought that with my foreach that problem should not occur. And yes there are more than one pdf in this folder
The line
var doc = application.Documents.Add();
is outside the foreach loop. So you only create a single word document for all your *.pdf files.
Move the above line inside the foreach loop to add a new word document for each *.pdf file.
So I am having an issue with my output being written to a CSV file. The output to this code is in the correct format when being written to the CSV file but it is only entering a single row in the file. There should be much more. Around 150 lines. Current out put is:
(859.85 7N830127 185)
Which is correct, but there should be more of these. It seems like to me that it is only writing the first line of the parsed EDI file and then stopping. I need to find a way to make sure it writes all data that is being parsed can anyone help me?
static void Main(string[] args)
{
StreamReader sr = new StreamReader("edifile.txt");
string[] ediMapTemp = sr.ReadLine().Split('|');
List<string[]> ediMap = new List<string[]>();
List<object[]> outputMatrix = new List<object[]>();
foreach (var line in ediMapTemp)
{
ediMap.Add(line.Split('~'));
}
DetailNode node = new DetailNode(0, null, 0);
int hierarchicalDepth = 0;
int hierarchicalIdNumber;
int hierarchicalParentIdNumber;
int hierarchicalLevelCode;
int hierarchicalChildCode = 0;
for (int i = 0; i < ediMap.Count; i++)
{
string segmentHeader = ediMap[i][0];
if (segmentHeader == "HL")
{
hierarchicalIdNumber = Convert.ToInt32(ediMap[i][1]);
hierarchicalParentIdNumber = Convert.ToInt32(ediMap[i][2]);
hierarchicalLevelCode = Convert.ToInt32(ediMap[i][3]);
hierarchicalChildCode = Convert.ToInt32(ediMap[i][4]);
List<string[]> levelDetails = new List<string[]>();
for (int v = i + 1; v < ediMap.Count; v++)
{
if (ediMap[v][0] == "HL") break;
levelDetails.Add(ediMap[v]);
}
DetailNode getNode = node.Find(node, hierarchicalParentIdNumber);
getNode.headList.Add(new DetailNode(hierarchicalIdNumber, levelDetails, getNode.depth + 1));
}
}
node.Traversal(new VID(), node);
foreach (var vid in VIDList.vidList)
using (StreamWriter writer = new StreamWriter("Import.csv"))
{
//probably a loop here
writer.WriteLine(String.Join(",", vid.totalCurrentCharges, vid.assetId, vid.componentName, vid.recurringCharge));
}
}
Quick Review, should the code be the following:
using (StreamWriter writer = new StreamWriter("Import.csv"))
{
//a loop here
foreach (var vid in VIDList.vidList)
{
writer.WriteLine(String.Join(",", vid.totalCurrentCharges, vid.assetId, vid.componentName, vid.recurringCharge));
}
}
You would open the file once and then loop thru your collection, writing each one.
It looks to me like you're re-opening the output file for each line you're trying to write, so you're overwriting the output file with a new file for each line. That would mean only the last entry remains in the file. Try moving this line
using (StreamWriter writer = new StreamWriter("Import.csv"))
Outside the foreach loop.
How to use the ABCPdf.NET tool to extract the content texts from a PDF file?
I tried the GetText method but doesn't extract the contents:
var doc = new Doc();
var url = #".../FileName.pdf";
doc.Read(url);
string xmlContents = doc.GetText("Text");
Response.Write(xmlContents);
doc.Clear();
doc.Dispose();
My pdf has almost 1000 words but the GetText only returns 4-5 words. I realized it returns only the texts of the first page.
So the question should be "how to extract the text from all pages of a pdf file?" -(changed the Title to make it clearer).
Thanks,
For your benefit, yes you!
public string ExtractTextsFromAllPages(string pdfFileName)
{
var sb = new StringBuilder();
using (var doc = new Doc())
{
doc.Read(pdfFileName);
for (var currentPageNumber = 1; currentPageNumber <= doc.PageCount; currentPageNumber++)
{
doc.PageNumber = currentPageNumber;
sb.Append(doc.GetText("Text"));
}
}
return sb.ToString();
}
if you don't have the url but have the bytes, then:
public string ExtractTextsFromAllPages(Byte[] pdfBytes)
{
var sb = new StringBuilder();
using (var doc = new Doc())
{
doc.Read(pdfBytes);
for (var currentPageNumber = 1; currentPageNumber <= doc.PageCount; currentPageNumber++)
{
doc.PageNumber = currentPageNumber;
sb.Append(doc.GetText("Text"));
}
}
return sb.ToString();
}
Have you tried the GetText method?
doc.Read(.......);
var textOperation = new TextOperation(doc);
textOperation.PageContents.AddPages();
string allText = textOperation.GetText();