Batch converting files with C# .NET

In the code snippet below, I ask the user for the path of a single .pdf file to convert. However, I would like to convert a batch of .pdf files at once. Say the user has 100 .pdf files in one directory, each with a different file name. What is the best way to change my code so that it converts all of them in one run?
Console.WriteLine("PDF to Excel conversion requires a user directory path");
Console.WriteLine(@"c:\Users\username\Desktop\FolderName\FileName.pdf");
Console.WriteLine("Your Directory Path: ");
var userPath = Console.ReadLine();
string pathToPdf = userPath;
string pathToExcel = Path.ChangeExtension(pathToPdf, ".xls");
// Converting PDF to Excel file
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
// 'true' = convert data to spreadsheet (tabular and textual)
// 'false' = skip textual data and convert only tabular (tables)
f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = true;
// 'true' = preserve the original page layout
// 'false' = place tables before text
f.ExcelOptions.PreservePageLayout = true;
f.OpenPdf(pathToPdf);
if (f.PageCount > 0)
{
    int result = f.ToExcel(pathToExcel);
    // open an excel workbook
    if (result == 0)
    {
        System.Diagnostics.Process.Start(pathToExcel);
    }
}
Edit: Below is my attempt at the program, using the Directory.EnumerateFiles approach from Bradley's answer.
static void Main(string[] args)
{
    Console.WriteLine("Welcome. I am Textron's PDF to Excel converter.");
    Console.WriteLine("\n - Create a folder with all your .pdf files to be converted");
    Console.WriteLine("\n - You must define your directory path");
    Console.WriteLine(@" For Example ==> c:\Users\Username\Desktop\YourFolder");
    Console.WriteLine("\n Your directory: ");
    var userPath = Console.ReadLine();
    foreach (string file in Directory.EnumerateFiles(userPath, "*.pdf"))
    {
        // derive the output path from the current file, not from the directory
        string excelPath = Path.ChangeExtension(file, ".xls");
        // Converting PDF to Excel filetype
        SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
        // 'true' = convert data to spreadsheet (tabular and textual)
        // 'false' = skip textual data and convert only tabular (tables)
        f.ExcelOptions.ConvertNonTabularDataToSpreadsheet = true;
        // open the current file; userPath is the folder, not a PDF
        f.OpenPdf(file);
        if (f.PageCount > 0)
        {
            int result = f.ToExcel(excelPath);
            // open an excel workbook
            if (result == 0)
            {
                System.Diagnostics.Process.Start(excelPath);
            }
        }
    }
}

To get all files in a directory, use Directory.EnumerateFiles (see MSDN). In your case:
foreach (string file in Directory.EnumerateFiles(directoryPath, "*.pdf"))
{
    // PDF code, probably extracted to its own method!
}
In this specific case GetFiles would also work, but EnumerateFiles is the better choice if you only want to process a subset, because it evaluates lazily.
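To make the loop concrete, here is a minimal sketch of the batch step on its own. BatchConverter and ConvertAll are hypothetical names, and the convert delegate is a stand-in for the PdfFocus calls from the question (it is assumed to return 0 on success, the way the question treats the result of ToExcel), so the sketch does not depend on the SautinSoft library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchConverter
{
    // Runs convert(inputPdfPath, outputPath) for every .pdf in the directory.
    // Returns the output paths of the conversions that reported success (0).
    public static List<string> ConvertAll(string directory, Func<string, string, int> convert)
    {
        var converted = new List<string>();
        foreach (string pdfPath in Directory.EnumerateFiles(directory, "*.pdf"))
        {
            // Build the output name from the current file, not from the directory.
            string excelPath = Path.ChangeExtension(pdfPath, ".xls");
            if (convert(pdfPath, excelPath) == 0)
            {
                converted.Add(excelPath);
            }
        }
        return converted;
    }
}
```

In the question's program, the delegate would wrap the existing OpenPdf, PageCount and ToExcel calls for one file. Opening each result with Process.Start is probably best left out of a 100-file run.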

Related

Convert word document in to pdf at run time (at the time of uploading)

In my application a user can upload a business case document, which is a Word document, and other users (admins and super admins) can view that business case and approve it.
Currently admins and super admins have to download the Word document, which I don't want; I want them to see the document as a PDF.
I could restrict users to uploading only PDF files, but that is not what I want. I want the user to upload a Word document, and at the time the file is saved to the server I want it saved in PDF format.
The current code works fine, but it saves the file as a Word file; I want it saved as a PDF.
int count = 0;
foreach (HttpPostedFileBase file in emailasign.Files)
{
    var filename = "";
    //Checking file is available to save.
    if (file != null)
    {
        var random = new Random();
        filename = random.Next(111111, 999999).ToString() + Path.GetExtension(file.FileName);
        // pass the folder and the file name as separate arguments to Path.Combine
        var ServerSavePath = Path.Combine(Server.MapPath("~/UploadedFiles/"), filename);
        //Save file to server folder
        file.SaveAs(ServerSavePath);
        count++;
    }
    if (count > 1)
    {
        TempData["OrignalFile"] += "," + file.FileName;
        TempData["FileName"] += "," + filename;
    }
    else if (count == 1)
    {
        TempData["OrignalFile"] = file.FileName;
        TempData["FileName"] = filename;
    }
}
Note: Right now it accepts all formats, but I will restrict it to Word documents only. So at the time of upload, I want to convert the document to PDF and then save it on the server.

Mass file download - Drive API .NET

Question:
How can I tell my backup tool to download all the files it recorded in fileids?
I'm using the C#/.NET example from https://developers.google.com/drive/v3/web/manage-downloads#examples
I'll spare the boring details and say that part of my program logs in once as each user (well, using the Apps Service API), grabs all their files' fileIds and records them to a flat text file. My program then opens that flat text file and begins downloading each fileId recorded for that user. The problem is that it's very slow, because it opens a new connection for a file, waits for the file to finish, then gets a new fileId and starts the whole process over again. It's not very efficient.
Google's example, which I copied pretty much verbatim (I modified the vars a little by immediately grabbing and exporting their mimetype, so the first 3 lines are moot):
var fileId = "0BwwA4oUTeiV1UVNwOHItT0xfa2M";
var request = driveService.Files.Get(fileId);
var stream = new System.IO.MemoryStream();
// Add a handler which will be notified on progress changes.
// It will notify on each chunk download and when the
// download is completed or failed.
request.MediaDownloader.ProgressChanged +=
    (IDownloadProgress progress) =>
    {
        switch(progress.Status)
        {
            case DownloadStatus.Downloading:
                {
                    Console.WriteLine(progress.BytesDownloaded);
                    break;
                }
            case DownloadStatus.Completed:
                {
                    Console.WriteLine("Download complete.");
                    break;
                }
            case DownloadStatus.Failed:
                {
                    Console.WriteLine("Download failed.");
                    break;
                }
        }
    };
request.Download(stream);
Is there any way I can streamline this so that my program downloads all the files it knows about for the user in one big handshake, instead of reading a single fileId, opening a session, exporting, downloading, closing, and then doing exactly the same thing for the next file? I hope this makes sense.
Thank you for any help ahead of time!
--Mike
---EDIT---
I wanted to add more details so that hopefully what I'm looking to do makes more sense:
So what's happening in the following code is: I create a "request" that lets me export the file, using the file ID (which I have from the flat text file as fileId[0]) and the "mimetype" (which is in the array as fileId[1]).
What's killing the speed of the program is having to call "BuildService" again for each file.
foreach (var file in deltafiles)
{
    try
    {
        if (bgW.CancellationPending)
        {
            stripLabel.Text = "Backup canceled!";
            e.Cancel = true;
            break;
        }
        DateTime dt = DateTime.Now;
        string[] foldervalues = File.ReadAllLines(savelocation + "folderlog.txt");
        cnttototal++;
        bgW.ReportProgress(cnttototal);
        // Our file is a CSV. Column 1 = file ID, Column 2 = File name
        var values = file.Split(',');
        string fileId = values[0];
        string fileName = values[1];
        string mimetype = values[2];
        mimetype = mimetype.Replace(",", "_");
        string folder = values[3];
        int foundmatch = 0;
        int folderfilelen = foldervalues.Count();
        fileName = fileName.Replace('\\', '_').Replace('/', '_').Replace(':', '_').Replace('!', '_').Replace('\'', '_').Replace('*', '_').Replace('#', '_').Replace('[', '_').Replace(']', '_');
        var request = CreateService.BuildService(user).Files.Export(fileId, mimetype);
        //Default extensions for files. Not sure what this should be, so we'll null it for now.
        string ext = null;
        // Things get sloppy here. The reason we're checking MimeTypes
        // is because we have to export the files from Google's format
        // to a format that is readable by a desktop computer program
        // So for example, the google-apps.spreadsheet will become an MS Excel file.
        if (mimetype == mimeSheet || mimetype == mimeSheetRitz || mimetype == mimeSheetml)
        {
            request = CreateService.BuildService(user).Files.Export(fileId, exportSheet);
            ext = ".xls";
        }
        if (mimetype == mimeDoc || mimetype == mimeDocKix || mimetype == mimeDocWord)
        {
            request = CreateService.BuildService(user).Files.Export(fileId, exportDoc);
            ext = ".docx";
        }
        if (mimetype == mimePres || mimetype == mimePresPunch)
        {
            request = CreateService.BuildService(user).Files.Export(fileId, exportPres);
            ext = ".ppt";
        }
        if (mimetype == mimeForm || mimetype == mimeFormfb || mimetype == mimeFormDrawing)
        {
            request = CreateService.BuildService(user).Files.Export(fileId, exportForm);
            ext = ".docx";
        }
        // Any other file type, assume we know what it is (which in our case, will be a txt file)
        // apply the mime type and carry on.
        string dest = Path.Combine(savelocation, fileName + ext);
        var stream = new System.IO.FileStream(dest, FileMode.Create, FileAccess.ReadWrite);
        int oops = 0;
        // Add a handler which will be notified on progress changes.
        // It will notify on each chunk download and when the
        // download is completed or failed.
        request.MediaDownloader.ProgressChanged +=
            (IDownloadProgress progress) =>
            {
                switch (progress.Status)
                {
                    case DownloadStatus.Downloading:
                        {
                            throw new Exception("File may be corrupted.");
                            break;
                        }
                    case DownloadStatus.Completed:
                        {
                            Console.WriteLine("Download complete.");
                            break;
                        }
                    case DownloadStatus.Failed:
                        {
                            oops = 1;
                            logFile.WriteLine(fileName + " could not be downloaded. Possible Google draw/form OR bad name.\n");
                            break;
                        }
                }
            };
        request.Download(stream);
        stream.Close();
        stream.Dispose();
Is there any way I could streamline this process so I don't have to build the Drive service every time I want to download a file? The flat text file the program reads looks similar to
FILEID,ACTUAL FILE NAME,MIMETYPE
So is there any way I could cut out the middle man and feed the request.Download method directly, without the foreach loop having to set up the export to a file-system-readable format again for every file? (Good grief, sorry, I know this sounds like a lot.)
Any pointers would be great!!

You might want to try the tutorial "Google Drive API with C# .net – Download", which has much simpler code for downloading a file. Other factors, such as an intermittent internet connection, may also affect how long a download takes.
Code sample:
/// <summary>
/// Download a file
/// Documentation: https://developers.google.com/drive/v2/reference/files/get
/// </summary>
/// <param name="_service">a Valid authenticated DriveService</param>
/// <param name="_fileResource">File resource of the file to download</param>
/// <param name="_saveTo">location of where to save the file including the file name to save it as.</param>
/// <returns></returns>
public static Boolean downloadFile(DriveService _service, File _fileResource, string _saveTo)
{
    if (!String.IsNullOrEmpty(_fileResource.DownloadUrl))
    {
        try
        {
            var x = _service.HttpClient.GetByteArrayAsync(_fileResource.DownloadUrl );
            byte[] arrBytes = x.Result;
            System.IO.File.WriteAllBytes(_saveTo, arrBytes);
            return true;
        }
        catch (Exception e)
        {
            Console.WriteLine("An error occurred: " + e.Message);
            return false;
        }
    }
    else
    {
        // The file doesn't have any content stored on Drive.
        return false;
    }
}
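Using _service.HttpClient.GetByteArrayAsync, we can pass in the download URL of the file we would like to download. Once the file is downloaded, it's a simple matter of writing the file to disk. Note that this sample is written against the v2 API (see the documentation link in its comments), while the question uses v3.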
Hope this helps!

This isn't an answer so much as a workaround, and even then it's only half the answer (for now). I took my gloves off and played dirty.
First, I updated the Google API NuGet packages in my VS project to the latest version available today. Then I went to https://github.com/google/google-api-dotnet-client, forked and cloned it, and changed the Google.Apis.Drive.v3.cs file (which compiles to google.apis.drive.v3.dll) so that the mime type is no longer read-only (it now has both get; and set;, whereas by default it only allowed get;).
Because I already know the mime types, I can now force-assign the mime type on the request and get on with my life, instead of having to build the client service and connect, only to export to a file type that I already know.
It's not pretty, and it's not how it should be done, but this was really bothering me!
Going back to @Mr.Rebot, thank you again for your help and research! :-)
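A less invasive option that follows from the question itself is to build the service once per user and reuse it for every request, rather than calling CreateService.BuildService(user) inside each branch of the loop. The sketch below shows only that caching idea; ServiceCache and For are hypothetical names, and the type parameter stands in for DriveService so the example compiles without the Google libraries. Whether this removes most of the delay has not been measured here, since every export is still its own HTTP request.

```csharp
using System;
using System.Collections.Generic;

// Keeps one service object per user so the expensive build step runs once per user.
public sealed class ServiceCache<TService>
{
    private readonly Dictionary<string, TService> _services = new Dictionary<string, TService>();
    private readonly Func<string, TService> _build;

    // build: the factory to call the first time a user is seen
    // (in the question this role is played by CreateService.BuildService).
    public ServiceCache(Func<string, TService> build)
    {
        _build = build;
    }

    public TService For(string user)
    {
        TService service;
        if (!_services.TryGetValue(user, out service))
        {
            service = _build(user);
            _services[user] = service;
        }
        return service;
    }
}
```

Inside the foreach loop, each CreateService.BuildService(user).Files.Export(...) call would then become cache.For(user).Files.Export(...), with the cache created once before the loop.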

c# zip file - Extract file last

Quick question: I need to extract a zip file and have a certain file extracted last.
More info: I know how to extract a zip file with C# (.NET Framework 4.5).
The problem is that the zip file always contains a file named (for example) "myFlag.xml" plus a few more files.
Since I need to support some old applications that listen to the folder I'm extracting to, I want to make sure that the XML file is always extracted last.
Is there something like an "exclude" option for the zip function, so that I can extract everything except a certain file and then extract that file on its own?
Thanks.

You could loop over the entries of the ZipArchive, extract everything except the file that must come last, and then extract that file once the loop is done.
Something like this:
private void TestUnzip_Foreach()
{
    using (ZipArchive z = ZipFile.Open("zipfile.zip", ZipArchiveMode.Read))
    {
        string LastFile = "lastFileName.ext";
        int curPos = 0;
        int lastFilePosition = -1; // -1 means the file was not found in the archive
        foreach (ZipArchiveEntry entry in z.Entries)
        {
            if (entry.Name != LastFile)
            {
                entry.ExtractToFile(@"C:\somewhere\" + entry.FullName);
            }
            else
            {
                // remember where the deferred file is instead of extracting it now
                lastFilePosition = curPos;
            }
            curPos++;
        }
        // extract the deferred file only if the archive actually contained it
        if (lastFilePosition >= 0)
        {
            z.Entries[lastFilePosition].ExtractToFile(@"C:\somewhere_else\" + LastFile);
        }
    }
}
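The same idea can be written as a small reusable helper. This is a sketch under assumptions: OrderedUnzip and ExtractWithFileLast are hypothetical names, the archive is trusted (entry names are not checked for path traversal), and everything is extracted into one target folder, which is what the question describes. It returns the entry names in the order they were written so the ordering can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class OrderedUnzip
{
    // Extracts all entries into targetDir, writing the entry named lastFileName after all others.
    // Returns the entry names in extraction order.
    public static List<string> ExtractWithFileLast(string zipPath, string targetDir, string lastFileName)
    {
        var order = new List<string>();
        Directory.CreateDirectory(targetDir);
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry deferred = null;
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (string.IsNullOrEmpty(entry.Name))
                {
                    continue; // directory entry, nothing to write
                }
                if (string.Equals(entry.Name, lastFileName, StringComparison.OrdinalIgnoreCase))
                {
                    deferred = entry; // hold this one back until the loop is done
                    continue;
                }
                Extract(entry, targetDir, order);
            }
            if (deferred != null)
            {
                Extract(deferred, targetDir, order);
            }
        }
        return order;
    }

    private static void Extract(ZipArchiveEntry entry, string targetDir, List<string> order)
    {
        string destination = Path.Combine(targetDir, entry.FullName);
        // Entries inside subfolders need their folder to exist first.
        Directory.CreateDirectory(Path.GetDirectoryName(destination));
        entry.ExtractToFile(destination, true);
        order.Add(entry.Name);
    }
}
```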

Removing the file with same name, extension doesn't matter

I have a "~/Content/Documents" folder that holds every uploaded file. In my case each user can only upload one file.
I have done the upload part, where the user uploads his file.
if (file.ContentLength > 0)
{
    var fileName = Path.GetFileName(file.FileName);
    var fullpath = System.Web.HttpContext.Current.Server.MapPath("~/Content/Documents");
    file.SaveAs(Path.Combine(fullpath,"document"+Path.GetExtension(fileName)));
}
My problem is:
The user can upload files in ".doc", ".docx", ".xls", ".xlsx", or ".pdf" format.
When the user uploads a ".doc" file, it is saved to the folder. Later the same user can upload a ".pdf" file, which is also saved to the folder. That means the user ends up with two uploaded files.
What I want to do is:
When a specific user uploads his document:
-> check whether a document uploaded by that user is already in the folder, i.e. whether the same file name exists with a different extension.
-> if the file name already exists with a different extension, remove that file and upload the new one.

Try this, just another way, if your file name is "document":
string[] files = System.IO.Directory.GetFiles(fullpath,"document.*");
foreach (string f in files)
{
    System.IO.File.Delete(f);
}
So your code would be:
if (file.ContentLength > 0)
{
    var fileName = Path.GetFileName(file.FileName);
    var fullpath = System.Web.HttpContext.Current.Server.MapPath("~/Content/Documents");
    //deleting code starts here
    string[] files = System.IO.Directory.GetFiles(fullpath,"document.*");
    foreach (string f in files)
    {
        System.IO.File.Delete(f);
    }
    //deleting code ends here
    file.SaveAs(Path.Combine(fullpath,"document"+Path.GetExtension(fileName)));
}

Something like this should do the trick:
var files = new DirectoryInfo(fullpath).GetFiles();
// Path.GetFileNameWithoutExtension also handles names that contain extra dots
var filesNoExtensions = files.Select(a => Path.GetFileNameWithoutExtension(a.Name)).ToList();
//for below: or 'document' if that's what you rename it to be on disk
var fileNameNoExtension = Path.GetFileNameWithoutExtension(fileName);
if (filesNoExtensions.Contains(fileNameNoExtension))
{
    var deleteMe = files.First(f => Path.GetFileNameWithoutExtension(f.Name) == fileNameNoExtension);
    deleteMe.Delete();
}
file.SaveAs(Path.Combine(fullpath,"document"+Path.GetExtension(fileName)));

Get the file name of the new file without its extension, then loop through all the file names in the folder it will be uploaded to and check whether the name already exists. If so, delete the old file and upload; otherwise just upload.
var info = new FileInfo("C:\\MyDoc.docx");
var filename = Path.GetFileNameWithoutExtension(info.Name);
var files = Directory.GetFiles("YOUR_DIRECTORY").Select(f => new FileInfo(f).Name);
// compare whole base names; a substring match would also hit unrelated files
if (files.Any(file => Path.GetFileNameWithoutExtension(file) == filename))
{
    //Delete old file
}
//Upload new file
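The answers above share one step: delete whatever already exists under the same base name, then save the new upload. A minimal sketch of that step follows; UploadCleanup and DeleteByBaseName are hypothetical names, and the helper only touches the file system, so the SaveAs call from the question would still run afterwards.

```csharp
using System;
using System.IO;

public static class UploadCleanup
{
    // Deletes every file in the directory whose name without extension equals baseName.
    // Returns the number of files deleted.
    public static int DeleteByBaseName(string directory, string baseName)
    {
        int deleted = 0;
        foreach (string path in Directory.GetFiles(directory, baseName + ".*"))
        {
            // The wildcard also matches names such as "document.v2.pdf",
            // so compare the base name exactly before deleting.
            if (string.Equals(Path.GetFileNameWithoutExtension(path), baseName,
                              StringComparison.OrdinalIgnoreCase))
            {
                File.Delete(path);
                deleted++;
            }
        }
        return deleted;
    }
}
```

In the question's code this would be called as UploadCleanup.DeleteByBaseName(fullpath, "document") just before file.SaveAs.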

Append to file failure when executable not in same folder as data files

The problem is now solved. It was a mistake of mine that I hadn't spotted before.
I am pretty new to coding in general and am very new to C# so I am probably missing something simple. I wrote a program to pull data from a login website and save that data to files on the local hard drive. The data is power and energy data for solar modules and each module has its own file. On my main workstation I am running Windows Vista and the program works just fine. When I run the program on the machine running Server 2003, instead of the new data being appended to the files, it just overwrites the data originally in the file.
The data I am downloading is csv format text over a span of 7 days at a time. I run the program once a day to pull the new day's data and append it to the local file. Every time I run the program, the local file is a copy of the newly downloaded data with none of the old data. Since the data on the web site is only updated once a day, I have been testing by removing the last day's data in the local file and/or the first day's data in the local file. Any time I change the file and run the program, the file contains the downloaded data and nothing else.
I just tried something new to test why it wasn't working, and I think I have found the source of the error. When I ran on my local machine, the "filePath" variable was set to "". On the server, and now on my local machine, I have changed "filePath" to @"C:\Solar Yard Data\", and on both machines the program catches the file-not-found exception and creates a new file in that directory, which overwrites the original. Does anyone have an idea why this happens?
The code below is the section that downloads each data set and appends any new data to the local file.
int i = 0;
string filePath = "C:/Solar Yard Data/";
string[] filenamesPower = new string[]
{
    "inverter121201321745_power",
    "inverter121201325108_power",
    "inverter121201326383_power",
    "inverter121201326218_power",
    "inverter121201323111_power",
    "inverter121201324916_power",
    "inverter121201326328_power",
    "inverter121201326031_power",
    "inverter121201325003_power",
    "inverter121201326714_power",
    "inverter121201326351_power",
    "inverter121201323205_power",
    "inverter121201325349_power",
    "inverter121201324856_power",
    "inverter121201325047_power",
    "inverter121201324954_power",
};
// download and save every module's power data
foreach (string url in modulesPower)
{
    // create web request and download data
    HttpWebRequest req_csv = (HttpWebRequest)HttpWebRequest.Create(String.Format(url, auth_token));
    req_csv.CookieContainer = cookie_container;
    HttpWebResponse res_csv = (HttpWebResponse)req_csv.GetResponse();
    // save the data to files
    using (StreamReader sr = new StreamReader(res_csv.GetResponseStream()))
    {
        string response = sr.ReadToEnd();
        string fileName = filenamesPower[i] + ".csv";
        // save the new data to file
        try
        {
            int startIndex = 0; // start index for substring to append to file
            int searchResultIndex = 0; // index returned when searching downloaded data for last entry of data on file
            string lastEntry; // will hold the last entry in the current data
            //open existing file and find last entry
            using (StreamReader sr2 = new StreamReader(fileName))
            {
                //get last line of existing data
                string fileContents = sr2.ReadToEnd();
                string nl = System.Environment.NewLine; // newline string
                int nllen = nl.Length; // length of a newline
                if (fileContents.LastIndexOf(nl) == fileContents.Length - nllen)
                {
                    lastEntry = fileContents.Substring(0, fileContents.Length - nllen).Substring(fileContents.Substring(0, fileContents.Length - nllen).LastIndexOf(nl) + nllen);
                }
                else
                {
                    lastEntry = fileContents.Substring(fileContents.LastIndexOf(nl) + 2);
                }
                // search the new data for the last existing line
                searchResultIndex = response.LastIndexOf(lastEntry);
            }
            // if the downloaded data contains the last record on file, append the new data
            if (searchResultIndex != -1)
            {
                startIndex = searchResultIndex + lastEntry.Length;
                File.AppendAllText(filePath + fileName, response.Substring(startIndex+1));
            }
            // else append all the data
            else
            {
                Console.WriteLine("The last entry of the existing data was not found\nin the downloaded data. Appending all data.");
                File.AppendAllText(filePath + fileName, response.Substring(109)); // the 109 index removes the file header from the new data
            }
        }
        // if there is no file for this module, create the first one
        catch (FileNotFoundException e)
        {
            // write data to file
            Console.WriteLine("File does not exist, creating new data file.");
            File.WriteAllText(filePath + fileName, response);
            //Debug.WriteLine(response);
        }
    }
    Console.WriteLine("Power file " + (i + 1) + " finished.");
    //Debug.WriteLine("File " + (i + 1) + " finished.");
    i++;
}
Console.WriteLine("\nPower data finished!\n");

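A couple of suggestions which I think will probably resolve the issue. The existing file is opened with new StreamReader(fileName), which looks in the working directory rather than in filePath, so it is never found once the data lives elsewhere.
First, change your filePath string:
string filePath = @"C:\Solar Yard Data\";
Create a string with the full path:
String fullFilePath = filePath + fileName;
Then check whether the file exists, and create it if it doesn't:
if (!File.Exists(fullFilePath))
    File.Create(fullFilePath).Dispose(); // dispose the returned FileStream so the file is not left locked
Put the full path to the file in your StreamReader:
using (StreamReader sr2 = new StreamReader(fullFilePath))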
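Put together, the fix amounts to using one full path for both the read and the write. The sketch below illustrates that with a simplified, line-based version of the append logic; CsvAppender and AppendNewLines are hypothetical names, and unlike the question's code it does not strip the CSV header when the last stored line cannot be found in the download.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvAppender
{
    // Appends to directory/fileName only the downloaded lines that come after
    // the last line already stored. Returns the number of lines written.
    public static int AppendNewLines(string directory, string fileName, string downloaded)
    {
        // One full path, used for reading and for writing.
        string fullPath = Path.Combine(directory, fileName);
        string[] incoming = downloaded.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        if (!File.Exists(fullPath))
        {
            File.WriteAllLines(fullPath, incoming);
            return incoming.Length;
        }

        string lastExisting = File.ReadLines(fullPath).LastOrDefault(line => line.Length > 0);
        // LastIndexOf returns -1 when the line is not in the download, so start
        // becomes 0 and everything is appended (the header is not stripped here).
        int start = lastExisting == null ? 0 : Array.LastIndexOf(incoming, lastExisting) + 1;
        File.AppendAllLines(fullPath, incoming.Skip(start));
        return incoming.Length - start;
    }
}
```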
