Can I convert multiple Excel workbooks into one PDF (without using iTextSharp)? - C#

I want to convert multiple Excel workbooks (not just individual sheets) into one PDF file. I don't want to use iTextSharp because I would need to purchase a license for commercial use.
Does anybody have any idea?

Well, this is a little complex. What I would suggest is converting the Excel documents to PDF first and then merging them into a single PDF document. What are your thoughts? How is your plan going?
You can refer to the following article; the main idea is to convert the Office files to PDF and then merge them:
http://www.dotnetspider.com/resources/46252-Convert-and-Merge-Office-Files-to-One-PDF-File-in-C.aspx
To get better help, perhaps show more information, as diNN's comment above suggests.
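For the merge step, one option that avoids a commercial purchase is the open-source PDFsharp library (MIT license). A minimal sketch, assuming each workbook has already been exported to its own PDF (MergePdfs is a hypothetical helper name):

using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static void MergePdfs(string[] inputPaths, string outputPath)
{
    var output = new PdfDocument();
    foreach (var path in inputPaths)
    {
        // Import mode is required to copy pages between documents.
        using (var input = PdfReader.Open(path, PdfDocumentOpenMode.Import))
        {
            for (int idx = 0; idx < input.PageCount; idx++)
                output.AddPage(input.Pages[idx]);
        }
    }
    output.Save(outputPath);
}

Note that the Excel-to-PDF step itself still needs Excel automation or a spreadsheet library; PDFsharp only handles the merging.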

Here is what I used:
using System.IO;
using GemBox.Spreadsheet;

public static class ExcelMergeExtension
{
    public static ExcelFile Merge(this ExcelFile destination, string sourcePath)
    {
        var sourceFileName = Path.GetFileNameWithoutExtension(sourcePath);
        var source = ExcelFile.Load(sourcePath);

        // Copy every worksheet into the destination, prefixing each sheet's
        // name with its source file name to avoid duplicate sheet names.
        foreach (var sourceSheet in source.Worksheets)
            destination.Worksheets.AddCopy(
                string.Format("{0}-{1}", sourceFileName, sourceSheet.Name),
                sourceSheet);

        return destination;
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Note: GemBox requires a license key at startup, e.g.
        // SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY") for the free mode.

        // Export the entire merged file (all worksheets) to one PDF.
        var options = new PdfSaveOptions() { SelectionType = SelectionType.EntireFile };

        ExcelFile.Load("Book1.xlsx")
            .Merge("Book2.xlsx")
            .Merge("Book3.xlsx")
            .Save("Books.pdf", options);
    }
}
The code uses the GemBox.Spreadsheet library, which has a free and a commercial version; note, however, that the free one has some size limitations.
Anyway, it worked great for me, and I hope it helps you too.

Related

How to convert many files from doc to docx with multithreading

I have millions of doc files which need to be converted to docx. I am currently using the below method to convert each file in the specified directory. How can I effectively multithread this process?
static void ConvertDocToDocx(string path)
{
    Application word = new Application();
    var sourceFile = new FileInfo(path);
    var document = word.Documents.Open(sourceFile.FullName);
    string newFileName = sourceFile.FullName.Replace(".doc", ".docx");
    document.SaveAs2(newFileName, WdSaveFormat.wdFormatXMLDocument,
        CompatibilityMode: WdCompatibilityMode.wdWord2010);
    word.ActiveDocument.Close();
    word.Quit();
    //File.Delete(path);
}
My current approach is to use Directory.GetFiles to create a list of files which are in my path, then use Parallel.ForEach to convert the files. Here's my code:
string[] filesList = Directory.GetFiles(path);
Parallel.ForEach(filesList, new ParallelOptions { MaxDegreeOfParallelism = 20 }, file =>
{
    if (file.Contains(".doc"))
    {
        ConvertDocToDocx(file);
    }
});
However, this doesn't seem to increase performance. Am I misunderstanding the use of Parallel.ForEach?
You are using Word via automation, which is the equivalent of opening the files manually one by one and saving them. This approach offers one performance improvement: there is no need to create a new Word instance for each file; just reuse the first instance.
...
var wordInstance = new Application();
try
{
    var fileNameList = Directory.GetFiles(path);
    foreach (var fileName in fileNameList)
    {
        if (fileName.Contains(".doc"))
        {
            ConvertDocToDocx(wordInstance, fileName);
        }
    }
}
finally
{
    wordInstance.Quit();
}
...

static void ConvertDocToDocx(Application wordInstance, string path)
{
    var sourceFile = new FileInfo(path);
    var newFileName = sourceFile.FullName.Replace(".doc", ".docx");
    var document = wordInstance.Documents.Open(sourceFile.FullName);
    document.SaveAs2(
        newFileName,
        WdSaveFormat.wdFormatXMLDocument,
        CompatibilityMode: WdCompatibilityMode.wdWord2010);
    // Close the document we opened, not whatever happens to be active.
    document.Close();
    //File.Delete(path);
}
But, as others have already mentioned, that is the limit of this approach.
You should have a look at solutions based on knowledge of the file formats, e.g. NPOI. It is a C# port of the popular Apache POI package, so if you search for "POI convert doc to docx" and find Java code, don't be put off: almost the same code will compile under C# with the NPOI package too; in most cases only minor syntax changes are required.
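If you still want parallelism on top of the reuse idea, one possible middle ground is the thread-local overload of Parallel.ForEach, so each worker thread creates one Word process and reuses it for all its files (calling the ConvertDocToDocx(Application, string) helper above). This is only a sketch: Microsoft explicitly does not support concurrent Office automation, so treat it as an experiment rather than a reliable solution.

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Office.Interop.Word;

static class BatchConverter
{
    public static void ConvertAll(string path)
    {
        Parallel.ForEach(
            Directory.GetFiles(path),
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            // localInit: one Word process per worker thread, created lazily.
            () => new Application(),
            (file, state, word) =>
            {
                if (file.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
                    ConvertDocToDocx(word, file); // reuse this thread's instance
                return word;
            },
            // localFinally: shut each Word instance down when its thread finishes.
            word => word.Quit());
    }
}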

Extracting and sorting data from a PDF using a C# package

I'm working on a project where I have to extract specific text from a PDF so that I can send the information to an Excel file.
At first I tried to convert my PDF into a .txt file, thinking the .txt format would be easier to convert into JSON.
But the result is not at all what I need (a dictionary-style JSON format); instead I get a kind of giant messy string.
The pdf sample looks like this:
Analysis
Some text
Reference Date (Big space) 11/17/2021
Reference Price (Big space) USD 745
Client id (Big space) 4572845
I'd like to have something like this at the end:
{Analysis:Some text, Reference Date:11/17/2021, Reference Price:USD 745, Client id:4572845}
Currently the result mixes all the information together.
Here is my code:
First, I created a "Global" class with a method "Extract_RowInfo_TS" that loads the first page of the document (called a TS, or term sheet), extracts the text from the PDF, and stores it in a txt file called "result.txt":
class Global
{
    public static void Extract_RowInfo_TS(string doc_Type, string docPath, int? nbrPage = null)
    {
        switch (doc_Type)
        {
            case "Pdf":
                Spire.Pdf.PdfDocument doc = new Spire.Pdf.PdfDocument();
                doc.LoadFromFile(docPath);
                StringBuilder buffer = new StringBuilder();
                // Extract text from the first page only.
                Spire.Pdf.PdfPageBase pagefirst = doc.Pages[0];
                buffer.Append(pagefirst.ExtractText());
                doc.Close();
                // Save the text.
                string fileName = @"my_disk:\my_path\result.txt";
                File.WriteAllText(fileName, buffer.ToString());
                // Open the file.
                System.Diagnostics.Process.Start(fileName);
                break;
            case "Excel":
                Spire.Xls.Workbook Wb = new Spire.Xls.Workbook();
                break;
            case "Word":
                Spire.Doc.Document doc_word = new Spire.Doc.Document();
                break;
        }
    }
}
Back in my main form, I call the "Extract_RowInfo_TS" method from the Global class above, and once it has created "result.txt" from the PDF contents, I try to convert "result.txt" into a JSON format:
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void btn_Extract_PDF_Click(object sender, EventArgs e)
    {
        Global.Extract_RowInfo_TS("Pdf", @"my_disk:\my_path\my_doc.pdf");
        Convert_To_Json_Format(@"my_disk:\my_path\result.txt");
    }

    private void Convert_To_Json_Format(string baseTextFile)
    {
        string streamText = new StreamReader(baseTextFile).ReadToEnd();
        // Serialize the JSON data.
        string serializeData = Serialize_into_Json(streamText);
        string newFile = @"my_disk:\my_path\NEW_text_file_2.txt";
        File.WriteAllText(newFile, serializeData);
        System.Diagnostics.Process.Start(newFile);
    }

    private static string Serialize_into_Json(string json)
    {
        string jsonData = JsonConvert.SerializeObject(json);
        return jsonData;
    }
}
I'm stuck here trying to create a properly structured JSON file (or anything similar, really; I just want to group the related pieces of information, maybe build a table first? I don't know...) that I can then send to my Excel file. Any help would be much appreciated! I'm using the free version of the Spire NuGet package v4.3.1, which contains Free Spire.PDF, Spire.Xls, Spire.Doc and more. But maybe there are other solutions out there to achieve what I'm looking for.
Thanks in advance for helping, and have a great day.
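One way to get the dictionary-style output shown above is to parse result.txt line by line, splitting each line on the run of whitespace (the "big space") that separates the label from the value, and only then serialize the resulting dictionary; serializing the raw string is what produces the giant messy string. A minimal sketch, assuming labels and values are separated by two or more consecutive spaces (lines with only single spaces, like "Analysis Some text", would need extra rules):

using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using Newtonsoft.Json;

static string ConvertExtractedTextToJson(string textFilePath)
{
    var values = new Dictionary<string, string>();
    foreach (var rawLine in File.ReadLines(textFilePath))
    {
        var line = rawLine.Trim();
        if (line.Length == 0)
            continue;

        // "Reference Date      11/17/2021" -> ["Reference Date", "11/17/2021"]
        var parts = Regex.Split(line, @"\s{2,}");
        if (parts.Length >= 2)
            values[parts[0]] = string.Join(" ", parts, 1, parts.Length - 1);
    }

    // Serializing the dictionary (not the raw text) yields
    // {"Reference Date":"11/17/2021","Reference Price":"USD 745",...}
    return JsonConvert.SerializeObject(values);
}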

How to Modify Dicom Tags Using fo-dicom

I am creating a console application that will modify DICOM tags. I will load a single DICOM file and update the PatientID tag.
I cannot seem to get anything to modify. I am able to read tags, but updating/adding does not seem to work for me. Previously I used the DICOM ToolKit from PowerShell, and it was very straightforward and easy, but I want to start developing in C#, and so far I am failing.
using System;
using System.IO;
using SpiromicsImporterPrep.FileMethods;
using Dicom;

namespace SpiromicsImporterPrep
{
    class Program
    {
        static void Main(string[] args)
        {
            string filename = @"Z:\SPIROMICS\Human_Scans\Dispatch_Received\NO_BACKUP_DONE_HERE\MIFAR\FORCE\JH114062-FU4\119755500\Non_Con_FRC__0.75__Qr40__5_7094\IM001139";
            var file = DicomFile.Open(filename, readOption: FileReadOption.ReadAll);
            var dicomDataset = file.Dataset;
            dicomDataset.AddOrUpdate(DicomTag.PatientID, "TEST-PATIENT");
        }
    }
}
I expect that, after running the code, when I look at the DICOM header tags for this file with ImageJ or another DICOM reader, the value of the PatientID tag will be "TEST-PATIENT". The code runs with no errors, but nothing seems to be updated or changed when I look at the DICOM header.
You should invoke the DicomFile.Save() method. AddOrUpdate only modifies the dataset in memory; nothing is written back to disk until you save:
string[] files = System.IO.Directory.GetFiles(@"D:\AcquiredImages\20191107\1.2.826.0.1.3680043.2.461.11107149.3266627937\1.2.276.0.7230010.3.1.3.3632557514.6848.1573106796.739");
foreach (var item in files)
{
    DicomFile dicomFile = DicomFile.Open(item, FileReadOption.ReadAll);
    dicomFile.Dataset.AddOrUpdate<string>(DicomTag.PatientName, "abc");
    dicomFile.Save(item);
}
FileReadOption.ReadAll is required so the whole dataset is loaded into memory before you overwrite the same file on disk.
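Applied to the snippet in the question, the missing step would just be a save call at the end (assuming you want to overwrite the original file):

dicomDataset.AddOrUpdate(DicomTag.PatientID, "TEST-PATIENT");
file.Save(filename); // write the modified dataset back to disk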

Removing ShellFile Properties

I ran into a problem this week regarding the Windows Shell Property System when applied to TIFF/TIF files. I'm using Microsoft.WindowsAPICodePack 1.1.0.0 to access the property system.
When adding properties, the file gets corrupted, because the property container seems to get stored where the first IFD pointer would be expected. I'm not sure whether it simply inserts itself at the 5th byte, after the file header (0x49 0x49 0x2A 0x00), or if it overwrites existing data. Additionally, when comparing the hexadecimal of the IFD entry headers, the bytes look different. When I say corrupted, I mean only when programmatically opening the file as a byte stream without knowing whether a property system container has been added: the file opens fine in Windows Image Preview, but not in the software my clients are using to view TIFF files.
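As a diagnostic, it can help to read the 4-byte offset to the first IFD which, per the TIFF 6.0 spec, immediately follows the 4-byte header; comparing it before and after the property write shows whether that pointer was overwritten. A small sketch (the helper name and error handling are mine):

using System.IO;

static long ReadFirstIfdOffset(string path)
{
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        // A little-endian TIFF starts with "II", the magic number 42 (0x2A 0x00),
        // then a 4-byte offset to the first IFD.
        var header = reader.ReadBytes(4);
        if (header.Length < 4 || header[0] != 0x49 || header[1] != 0x49 ||
            header[2] != 0x2A || header[3] != 0x00)
            throw new InvalidDataException("Not a little-endian TIFF.");
        return reader.ReadUInt32();
    }
}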
Here's how I'm adding the properties (as an array of key=value strings).
public void SetFileTag(string fileName, string tagName, string tagValue)
{
    try
    {
        using (var shellFile = ShellFile.FromFilePath(fileName))
        {
            var keywords = shellFile.Properties.System.Keywords.Value;
            var keyValue = string.Concat(tagName, "=", tagValue);
            var list = keywords == null ? new List<string>() : new List<string>(keywords);
            if (list.Contains(keyValue))
            {
                return;
            }
            list.Add(keyValue);
            using (var writer = shellFile.Properties.GetPropertyWriter())
            {
                writer.WriteProperty(shellFile.Properties.System.Keywords, list.ToArray(), true);
                writer.Close();
            }
        }
    }
    finally
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
I looked in the Code Pack for anything available to entirely remove the properties, but I can't find any method to do so; I can only remove the Keywords value. Would anyone have an idea how to accomplish this? It doesn't have to be .NET code; it can just as well be a command-line tool or Win32 code.
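Since a command-line tool is acceptable: ExifTool can delete whole metadata groups, e.g. exiftool -xmp:all= file.tif removes the XMP packet, which I believe is where the shell property handler stores keywords for TIFF. Whether that also repairs the damaged IFD pointer described above is another question, so test on a copy of the file first.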

Scripting.FileSystemObject Write method fails

So in my program I'm using COM Automation (AutomationFactory in Silverlight 4) to create a FileSystemObject, to which I write a string (theContent). theContent in this case is a small UTF-8 XML file, which I serialized into the string using a MemoryStream.
The string is fine, but for some reason whenever I call the FileSystemObject's Write method I get the error HRESULT 0x800A0005 (CTL_E_ILLEGALFUNCTIONCALL, according to Google). The strangest part is that if I pass another simple string, like "hello", it works with no problems.
Any ideas?
Alternatively, if there's a way to expose a file/text stream with FileSystemObject that I could serialize to directly, that would work as well (I can't seem to find anything that isn't VB).
Thanks in advance!
string theContent = System.Text.Encoding.UTF8.GetString(content, 0, content.Length);
string hello = "hello";

using (dynamic fsoCom = AutomationFactory.CreateObject("Scripting.FileSystemObject"))
{
    dynamic file = fsoCom.CreateTextFile("file.xml", true);
    file.Write(theContent);
    file.Write(hello);
    file.Close();
}
I solved the same problem today using ADODB.Stream instead of Scripting.FileSystemObject.
In a Silverlight 4 OOB app (even with elevated trust), you cannot access files in locations outside of 'MyDocuments' and a couple of other user-related special folders. You have to use the 'COM+ Automation' workaround. But the Scripting.FileSystemObject, which works great for text files, cannot handle binary files. Fortunately, you can also use ADODB.Stream there, and it handles binary files just fine. Here is my code, tested with Word templates (.dotx files):
public static void WriteBinaryFile(string fileName, byte[] binary)
{
    const int adTypeBinary = 1;
    const int adSaveCreateOverWrite = 2;
    using (dynamic adoCom = AutomationFactory.CreateObject("ADODB.Stream"))
    {
        adoCom.Type = adTypeBinary;
        adoCom.Open();
        adoCom.Write(binary);
        adoCom.SaveToFile(fileName, adSaveCreateOverWrite);
    }
}
A file read can be done like this:
public static byte[] ReadBinaryFile(string fileName)
{
    const int adTypeBinary = 1;
    using (dynamic adoCom = AutomationFactory.CreateObject("ADODB.Stream"))
    {
        adoCom.Type = adTypeBinary;
        adoCom.Open();
        adoCom.LoadFromFile(fileName);
        return adoCom.Read();
    }
}
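Since the original failure was writing a UTF-8 XML string, the same object also has a text mode with an explicit charset. A sketch along the same lines (adTypeText, Charset, and WriteText are standard ADODB.Stream members, though I have not tested this exact variant from Silverlight):

public static void WriteTextFileUtf8(string fileName, string content)
{
    const int adTypeText = 2;
    const int adSaveCreateOverWrite = 2;
    using (dynamic adoCom = AutomationFactory.CreateObject("ADODB.Stream"))
    {
        adoCom.Type = adTypeText;
        adoCom.Charset = "utf-8"; // write the file as UTF-8
        adoCom.Open();
        adoCom.WriteText(content);
        adoCom.SaveToFile(fileName, adSaveCreateOverWrite);
    }
}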
Why not just:
File.WriteAllText("file.xml", theContent, Encoding.UTF8);
or even
File.WriteAllBytes("file.xml", content);
