Read more than one file


I am writing a PDF-to-Word converter that works perfectly fine for me, but I want to be able to convert more than one file.
What happens now is that it reads only the first file and runs the conversion process on it.
using System;
using System.Drawing.Imaging;
using System.IO;
using System.Text.RegularExpressions;
using Microsoft.Office.Interop.Word;

public static void PdfToImage()
{
    try
    {
        Application application = null;
        application = new Application();
        var doc = application.Documents.Add();
        string path = @"C:\Users\Test\Desktop\pdfToWord\";
        foreach (string file in Directory.EnumerateFiles(path, "*.pdf"))
        {
            using (var document = PdfiumViewer.PdfDocument.Load(file))
            {
                int pagecount = document.PageCount;
                for (int index = 0; index < pagecount; index++)
                {
                    // Render each page to a PNG and insert it into the Word document
                    var image = document.Render(index, 200, 200, true);
                    image.Save(@"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png", ImageFormat.Png);
                    application.Selection.InlineShapes.AddPicture(@"C:\Users\chnikos\Desktop\pdfToWord\output" + index.ToString("000") + ".png");
                }
                // Build the output name from the PDF's file name
                string getFileName = file.Substring(file.LastIndexOf("\\"));
                string getFileWithoutExtras = Regex.Replace(getFileName, @"\\", "");
                string getFileWihtoutExtension = Regex.Replace(getFileWithoutExtras, @".pdf", "");
                string fileName = @"C:\Users\Test\Desktop\pdfToWord\" + getFileWihtoutExtension;
                doc.PageSetup.PaperSize = WdPaperSize.wdPaperA4;
                foreach (Microsoft.Office.Interop.Word.InlineShape inline in doc.InlineShapes)
                {
                    if (inline.Height > inline.Width)
                    {
                        inline.ScaleWidth = 250;
                        inline.ScaleHeight = 250;
                    }
                }
                doc.PageSetup.TopMargin = 28.29f;
                doc.PageSetup.LeftMargin = 28.29f;
                doc.PageSetup.RightMargin = 30.29f;
                doc.PageSetup.BottomMargin = 28.29f;
                application.ActiveDocument.SaveAs(fileName, WdSaveFormat.wdFormatDocument);
                doc.Close();
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Error: " + e);
    }
}
I thought my foreach loop would prevent this problem. And yes, there is more than one PDF in this folder.

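The line
var doc = application.Documents.Add();
is outside the foreach loop, so you create only a single Word document for all your *.pdf files. At the end of the first iteration, doc.Close() closes that document, and nothing creates a new one for the next file.
Move the line above inside the foreach loop to create a new Word document for each *.pdf file.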
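With Documents.Add() inside the loop, each PDF gets its own Word document, and each one needs its own output name. The question builds that name with Substring plus two Regex.Replace calls. Below is a minimal sketch of the same step using only System.IO; the class name PdfBatchPaths, the helper GetWordOutputPath and the ".doc" extension are illustrative assumptions, not part of the original code. It also shows a side issue: in Regex.Replace(name, ".pdf", "") the dot is a regex wildcard, so the pattern can remove more than the extension.

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class PdfBatchPaths
{
    // Hypothetical helper: maps "<any dir>/report.pdf" to "<outputDir>/report.doc".
    // Path.GetFileNameWithoutExtension drops the directory and only the last extension.
    public static string GetWordOutputPath(string pdfPath, string outputDir) =>
        Path.Combine(outputDir, Path.GetFileNameWithoutExtension(pdfPath) + ".doc");

    // The question's approach, for comparison: "." matches any character,
    // so for "mypdf" the match is "ypdf" and only "m" is left.
    public static string RegexStrip(string fileName) =>
        Regex.Replace(fileName, ".pdf", "");
}
```

Inside the loop, string fileName = PdfBatchPaths.GetWordOutputPath(file, path); could replace the four lines that currently build fileName.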

Related

How to extract all pages and attachments from PDF to PNG

I am trying to create a process in .NET to convert a PDF and all its pages and attachments to PNGs. I am evaluating libraries and came across PDFiumSharp, but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        var page = doc.Pages[i];
        using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
        {
            page.Render(bitmap);
            var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
            bitmap.Save(targetFile);
        }
    }
}
When I run this code, I get this exception:
[Screenshot of the exception]
Does anyone know how to fix this? Also, does PDFiumSharp support extracting PDF attachments? If not, does anyone have other ideas on how to achieve my goal?

PDFium does not look like it supports extracting PDF attachments. To achieve your goal, you could look at another library that supports both extracting PDF attachments and converting PDFs to PNGs.
I am an employee of LEADTOOLS. You can try its PDF SDK via these two NuGet packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that converts a PDF and all the attachments in it to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
    Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
    documents.Add(document);
foreach (var attachment in document.Attachments)
    documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
    using var converter = new DocumentConverter();
    using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
    ocrEngine.Startup(null, null, null, null);
    converter.SetOcrEngineInstance(ocrEngine, false);
    converter.SetDocumentWriterInstance(new DocumentWriter());
    foreach (var document in documents)
    {
        var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
        string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
        int count = 1;
        while (File.Exists(outputFile))
            outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
        var jobData = new DocumentConverterJobData
        {
            Document = document,
            Cache = cache,
            DocumentFormat = DocumentFormat.User,
            RasterImageFormat = imageFormat,
            RasterImageBitsPerPixel = 0,
            OutputDocumentFileName = outputFile,
        };
        var job = converter.Jobs.CreateJob(jobData);
        converter.Jobs.RunJob(job);
    }
}
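The while (File.Exists(...)) loop in ConvertDocuments is a general way to avoid overwriting earlier output, for example when two attachments share a name. Here is a sketch of the same idea as a reusable helper using only System.IO; OutputNames and GetUniquePath are hypothetical names, and the name(1), name(2) suffix simply mirrors the answer's code. Checking for a file and then creating it is not atomic, so two processes writing to the same folder could still collide.

```csharp
using System.IO;

static class OutputNames
{
    // Returns dir/name.ext if it does not exist yet, otherwise the first free
    // dir/name(1).ext, dir/name(2).ext, ... (extension passed without the dot).
    public static string GetUniquePath(string dir, string name, string extension)
    {
        string candidate = Path.Combine(dir, $"{name}.{extension}");
        int count = 1;
        while (File.Exists(candidate))
            candidate = Path.Combine(dir, $"{name}({count++}).{extension}");
        return candidate;
    }
}
```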

Efficiently Convert .xlsx to .csv in C#?

As input, I have a set of Excel files, each with several worksheets. I need to export a separate CSV file for each worksheet. Below is my code, which works but is very slow. It builds on the solutions proposed in this previous post. Please keep in mind that I have to run this on rather large .xlsx files (approx. 300 MB).
QUESTION: Is there any way to improve this?
void Main()
{
    string folder = @"\\PATH_TO_FOLDER\";
    var files = Directory.GetFiles(folder, "*.xlsx", SearchOption.TopDirectoryOnly);
    foreach (string file in files)
    {
        ConvertToCsv(file, Directory.GetParent(file) + @"\\output\");
    }
}
public static void ConvertToCsv(string file, string targetFolder)
{
    FileInfo finfo = new FileInfo(file);
    ExcelPackage package = new ExcelPackage(finfo);
    // if targetFolder doesn't exist, create it
    if (!Directory.Exists(targetFolder)) {
        Directory.CreateDirectory(targetFolder);
    }
    var worksheets = package.Workbook.Worksheets;
    int sheetcount = 0;
    foreach (ExcelWorksheet worksheet in worksheets)
    {
        sheetcount++;
        var maxColumnNumber = worksheet.Dimension.End.Column;
        var currentRow = new List<string>(maxColumnNumber);
        var totalRowCount = worksheet.Dimension.End.Row+1;
        var currentRowNum = 1;
        //No need for a memory buffer, writing directly to a file
        //var memory = new MemoryStream();
        string file_name = targetFolder + Path.GetFileNameWithoutExtension(file) + "_" + sheetcount + ".csv";
        using (var writer = new StreamWriter(file_name, false, Encoding.UTF8))
        {
            //the rest of the code remains the same
            for (int i = 1; i < totalRowCount; i++)
            {
                i.Dump();
                // build the line with semicolon separators
                string line = "";
                for (int j = 1; j < worksheet.Dimension.End.Column+1; j++)
                {
                    if (worksheet.Cells[i, j].Value != null)
                    {
                        string cell = worksheet.Cells[i, j].Value.ToString() + ";";
                        line += cell;
                    }
                }
                // write line
                writer.WriteLine(line);
            }
        }
    }
}
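In the inner loop, line += cell copies the whole line on every cell, and every worksheet.Cells[i, j] access goes through the EPPlus range indexer twice; across a 300 MB workbook that likely adds up. The i.Dump() call, which sends every row number to LINQPad's output, is probably worth removing as well. Below is a sketch of the writing half, assuming the sheet's values have already been read into an object[,] in one call (in EPPlus, reading .Value on a multi-cell range such as worksheet.Cells[1, 1, rows, cols] returns an object[,]; check this against your version). CsvSketch and WriteCsv are hypothetical names. Unlike the question's code, it keeps empty cells as empty fields so columns stay aligned, writes no trailing separator, and quotes fields that contain the separator, a quote or a line break.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

static class CsvSketch
{
    // Writes a zero-based 2-D block of cell values as separator-delimited lines,
    // reusing one StringBuilder instead of concatenating strings per cell.
    public static void WriteCsv(object[,] values, TextWriter writer, char separator = ';')
    {
        int rows = values.GetLength(0);
        int cols = values.GetLength(1);
        var line = new StringBuilder();
        for (int r = 0; r < rows; r++)
        {
            line.Clear();
            for (int c = 0; c < cols; c++)
            {
                if (c > 0) line.Append(separator);
                line.Append(Escape(values[r, c], separator)); // null -> empty field
            }
            writer.WriteLine(line.ToString());
        }
    }

    // Quotes a field when it contains the separator, a quote, or a line break,
    // doubling embedded quotes (the common CSV convention).
    static string Escape(object value, char separator)
    {
        string text = Convert.ToString(value, CultureInfo.InvariantCulture) ?? "";
        bool needsQuotes = text.IndexOf(separator) >= 0 || text.IndexOf('"') >= 0
                           || text.IndexOf('\n') >= 0 || text.IndexOf('\r') >= 0;
        return needsQuotes ? "\"" + text.Replace("\"", "\"\"") + "\"" : text;
    }
}
```

Inside the worksheet loop this might be used as var values = (object[,])worksheet.Cells[1, 1, totalRowCount - 1, maxColumnNumber].Value; followed by CsvSketch.WriteCsv(values, writer);. For a sheet with a single cell, .Value is not an array, so that case would need a separate check.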

How to ignore protected PDFs?

I am working on my PDF-to-Word converter and just received a really strange exception that makes no sense to me.
Error:PdfiumViewer.PdfException:{"Unsupported security scheme"}
It's the first time such an exception has appeared, but to be honest, I had never tried to convert more than 3-4 files from PDF to Word before, and right now I am converting more than 100 files.
Here is my code. I am sorry if it's too long, but I simply do not know on which line the error occurs.
public static void PdfToImage()
{
    try
    {
        Application application = null;
        application = new Application();
        string path = @"C:\Users\chnikos\Desktop\Test\Batch1\";
        foreach (string file in Directory.EnumerateFiles(path, "*.pdf"))
        {
            var doc = application.Documents.Add();
            using (var document = PdfiumViewer.PdfDocument.Load(file))
            {
                int pagecount = document.PageCount;
                for (int index = 0; index < pagecount; index++)
                {
                    var image = document.Render(index, 200, 200, true);
                    image.Save(@"C:\Users\chnikos\Desktop\Test\Batch1\output" + index.ToString("000") + ".png", ImageFormat.Png);
                    application.Selection.InlineShapes.AddPicture(@"C:\Users\chnikos\Desktop\Test\Batch1\output" + index.ToString("000") + ".png");
                }
                string getFileName = file.Substring(file.LastIndexOf("\\"));
                string getFileWithoutExtras = Regex.Replace(getFileName, @"\\", "");
                string getFileWihtoutExtension = Regex.Replace(getFileWithoutExtras, @".pdf", "");
                string fileName = @"C:\Users\chnikos\Desktop\Test\Batch1\" + getFileWihtoutExtension;
                doc.PageSetup.PaperSize = WdPaperSize.wdPaperA4;
                foreach (Microsoft.Office.Interop.Word.InlineShape inline in doc.InlineShapes)
                {
                    .....
                }
                doc.PageSetup.TopMargin = 28.29f;
                doc.PageSetup.LeftMargin = 28.29f;
                doc.PageSetup.RightMargin = 30.29f;
                doc.PageSetup.BottomMargin = 28.29f;
                application.ActiveDocument.SaveAs(fileName, WdSaveFormat.wdFormatDocument);
                doc.Close();
                // Delete the temporary page images
                string imagePath = @"C:\Users\chnikos\Desktop\Test\Batch1\";
                Array.ForEach(Directory.GetFiles(imagePath, "*.png"), delegate(string deletePath) { File.Delete(deletePath); });
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Error: " + e);
    }
}
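Because the try/catch wraps the whole foreach, the first file that throws (presumably a protected PDF, as the title suggests, which PdfiumViewer rejects with "Unsupported security scheme") ends the entire batch. One way to skip such files is to move the try/catch inside the loop. Below is a BCL-only sketch of that structure; BatchRunner, RunAll and the convert/isSkippable delegates are hypothetical names, with convert standing in for the per-file body of PdfToImage.

```csharp
using System;
using System.Collections.Generic;

static class BatchRunner
{
    // Calls convert(file) for every file. Exceptions accepted by isSkippable are
    // recorded and the loop moves on to the next file; any other exception propagates.
    public static List<string> RunAll(
        IEnumerable<string> files,
        Action<string> convert,
        Func<Exception, bool> isSkippable)
    {
        var skipped = new List<string>();
        foreach (string file in files)
        {
            try
            {
                convert(file);
            }
            catch (Exception ex) when (isSkippable(ex))
            {
                skipped.Add(file);
            }
        }
        return skipped;
    }
}
```

With PdfiumViewer, the predicate could be ex => ex is PdfiumViewer.PdfException, which would also skip other files PdfiumViewer cannot open. Since the question's loop calls Documents.Add() before PdfDocument.Load, the per-file body should close that Word document in a finally block; otherwise every skipped file leaves an open document behind.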

Download and save image using C#

I retrieved three pictures from the Wikipedia API. Now I want to store them in my current directory. With the following code I can successfully create a folder with the given name, but it saves only one image, the last one. I have tried a lot, but I could not figure out how to save all three images.
public static void Load_Image1(string name1, string name2, string name3, string LocationName)
{
    var startPath = Application.StartupPath;
    string Imagefolder = Path.Combine(startPath, "Image");
    string subImageFolder = Path.Combine(Imagefolder, LocationName);
    System.IO.Directory.CreateDirectory(subImageFolder);
    //string Jpeg = Path.Combine(Environment.CurrentDirectory, subImageFolder);
    List<PictureBox> pictureBoxes = new List<PictureBox>();
    pictureBoxes.Add(Image1);
    pictureBoxes.Add(Image2);
    pictureBoxes.Add(Image3);
    using (var wc = new System.Net.WebClient())
    {
        var uri = ("https://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=url&iiurlwidth=400&titles="+name1+"|"+name2+"|"+name3);
        var response = wc.DownloadString(new Uri(uri));
        var responseJson = JsonConvert.DeserializeObject<RootObject>(response);
        List<string> urls = new List<string>();
        foreach (KeyValuePair<string, Pageval> entry in responseJson.query.pages)
        {
            var url = entry.Value.imageinfo.First().thumburl;
            urls.Add(url);
            var hash = uri.GetHashCode();
            string Jpeg = Path.Combine(Environment.CurrentDirectory, subImageFolder);
            var path = Path.Combine(Jpeg, hash.ToString("X") + ".jpg");
            wc.DownloadFile(url, path);
        }
        for (int i = 0; i < pictureBoxes.Count; i++)
        {
            Image1.SizeMode = PictureBoxSizeMode.StretchImage;
            Image2.SizeMode = PictureBoxSizeMode.StretchImage;
            Image3.SizeMode = PictureBoxSizeMode.StretchImage;
            pictureBoxes[i].Load(urls[i]);
            var hash = uri.GetHashCode();
            string Jpeg = Path.Combine(Environment.CurrentDirectory, subImageFolder);
            var path = Path.Combine(Jpeg, hash.ToString("X") + ".jpg");
            wc.DownloadFile(urls[i], path);
        }
    }
}

You are downloading all images to the same file name on disk, so the first two images are overwritten by the last one.
The problem is that your base file name is derived from
var hash = uri.GetHashCode();
This returns the same value for all three images because uri is the single API query URL, not the URL of each individual image.
Change it to:
var hash = url.GetHashCode();
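url.GetHashCode() removes the collision, but string hash codes make fragile file names: on .NET Core and .NET 5 and later, string hashing is randomized per process, so the same image gets a different name on every run, and two different URLs can still share a hash code. Below is a sketch of a more predictable alternative that combines the image's position with the last segment of its thumbnail URL. ImageNames and ImagePathFor are hypothetical names, and the Wikimedia-style URL in the check is only an example of the shape.

```csharp
using System;
using System.IO;

static class ImageNames
{
    // Hypothetical helper: "<folder>/<index>_<last URL path segment>".
    // The index keeps names distinct even if two URLs end in the same segment.
    public static string ImagePathFor(string folder, string thumbUrl, int index)
    {
        string lastSegment = Path.GetFileName(new Uri(thumbUrl).AbsolutePath);
        return Path.Combine(folder, index.ToString("00") + "_" + lastSegment);
    }
}
```

In the foreach over responseJson.query.pages, a counter declared before the loop could be used as wc.DownloadFile(url, ImageNames.ImagePathFor(subImageFolder, url, i++));.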

You actually save all the pictures, but under the same name, so only the last one remains on disk (each download overwrites the previous image). Use a unique identifier in your path variable that distinguishes the images, so they are saved under different names and nothing is overwritten.

public static void Load_Image1(string name1, string name2, string name3, string LocationName)
{
    var startPath = Application.StartupPath;
    string Imagefolder = Path.Combine(startPath, "Image");
    string subImageFolder = Path.Combine(Imagefolder, LocationName);
    System.IO.Directory.CreateDirectory(subImageFolder);
    //string Jpeg = Path.Combine(Environment.CurrentDirectory, subImageFolder);
    List<PictureBox> pictureBoxes = new List<PictureBox>();
    pictureBoxes.Add(Image1);
    pictureBoxes.Add(Image2);
    pictureBoxes.Add(Image3);
    using (var wc = new System.Net.WebClient())
    {
        var uri = ("https://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&format=json&iiprop=url&iiurlwidth=400&titles="+name1+"|"+name2+"|"+name3);
        var response = wc.DownloadString(new Uri(uri));
        var responseJson = JsonConvert.DeserializeObject<RootObject>(response);
        List<string> urls = new List<string>();
        foreach (KeyValuePair<string, Pageval> entry in responseJson.query.pages)
        {
            var url = entry.Value.imageinfo.First().thumburl;
            urls.Add(url);
            // Hash the individual image URL, not the shared query URL
            var hash = url.GetHashCode();
            string Jpeg = Path.Combine(Environment.CurrentDirectory, subImageFolder);
            var path = Path.Combine(Jpeg, hash.ToString("X") + ".jpg");
            wc.DownloadFile(url, path);
        }
        for (int i = 0; i < pictureBoxes.Count; i++)
        {
            Image1.SizeMode = PictureBoxSizeMode.StretchImage;
            Image2.SizeMode = PictureBoxSizeMode.StretchImage;
            Image3.SizeMode = PictureBoxSizeMode.StretchImage;
            pictureBoxes[i].Load(urls[i]);
        }
    }
}

Telerik RenderReport

I have some problems with Telerik Reporting, and it feels like I have missed something.
I want to render a list of reports and then write them all to ONE file.
But when I write the file out, I only get one page.
The writer seems to write over page 1 on every pass of the foreach, so the file contains just one page.
But I want several pages, in this case 10.
I have tried writing with FileStream, File and more.
Does anyone have a good idea?
public void WriteToFile()
{
    string path = @"C:\";
    string test = "test";
    var report = new Report2();
    var procceser = new ReportProcessor();
    var list = new List<RenderingResult>();
    for (int i = 0; i < 10; i++)
    {
        var res = procceser.RenderReport("PDF", report, null);
        list.Add(res);
    }
    string filePath = Path.Combine(path, test);
    var Writer = new BinaryWriter(File.Create(filePath));
    foreach (var renderingResult in list)
    {
        Writer.Write(renderingResult.DocumentBytes);
    }
    Writer.Flush();
    Writer.Close();
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
