Convert pdf file to Excel in C# - c#

I need to convert a PDF file to Excel. I tried iTextSharp and was able to extract the lines, but iTextSharp gives me spaces as the column separator, so I cannot distinguish column-separator spaces from actual spaces in the data.
E.g. I have the following data in the PDF (columns separated by ':' here):
Col1:Col 2:Col 3:Col4
I get,
Col1 Col 2 Col 3 Col4
I need to get something like
Col1{tab}Col 2{tab}Col 3{tab}Col4
Any solution for this?
I am also open to other C# libraries instead of iTextSharp, preferably open source.
Thanks

Related

Disallow Excel to auto format columns while opening a .csv file

I'm exporting some data to a .csv file.
.csv file:
Hotel Name;Street;Postal Code;City;Latitude;Longitude
Hotel X;Street 1;00000;City X;15.000000;15.000000
But if I open it in Excel, Excel formats the Latitude and Longitude automatically, so the values cannot be used for copy and paste.
Ignore the zeros in the postal code; I need to get rid of the 1.000 formatting in Latitude and Longitude.
How can I prevent Excel from doing this?
It should be done in the .csv file and not in Excel.
The Export Code:
foreach(...)
{
StringBuilder_Export.Append(DataRow_Temp[i] + ";");
}
StreamWriter StreamWriter_Export = new StreamWriter(
SaveFileDialog_Geo_Export.FileName,
true,
Encoding.Default
);
StreamWriter_Export.WriteLine(StringBuilder_Export.ToString());
EDIT: I'm primarily searching for a solution to my Latitude and Longitude problem.
Possibly a duplicate of "Stop Excel from automatically converting certain text values to dates", but if you want to generate an Excel file that looks exactly as you want when opened, do not use CSV. Excel will always attempt to guess the data types of CSV columns by looking at the first rows (15, I think?).
I use EPPlus to generate xlsx files from my apps, but there are many libraries you could use
Change its extension to .txt and use the Text Import Wizard. Then use that to tell Excel how the columns should be treated (text, currency, date, etc.). The Text Import Wizard starts automatically when you open a .txt file.
Ensure the file is in UTF-8 format when you save it as a .csv.
Then open Excel, browse to select the file you want, and it will automatically run the prompter.
With any format other than UTF-8, Excel will try to auto-convert.

Format a column to text format in CSV file using C#

I have an application in which I export data from a datagrid to a CSV file. I do this with the following steps:
Create a file:
var myFile = File.Create("test.csv");
myFile.Close();
write the data to a string builder(data)
write the data to the created file.
File.WriteAllText(filepath, data);
This works fine. The resulting CSV file is opened in Excel. I have a column of numbers which may have leading zeros, and when that data is exported to the CSV file the leading zeros are lost. Is it possible to format the column as a text column so the zeros are not lost?
View your file in notepad. The leading zeros are there, intact :-)
You need to tell Excel which format to use when you open the file. Change the file to .txt and use File -> Open in Excel and you should be presented with an import wizard. There you can explicitly tell Excel to treat your column as "text" which will prevent it from stripping leading zeroes.
More info here: http://www.howtogeek.com/howto/microsoft-office/how-to-import-a-csv-file-containing-a-column-with-a-leading-0-into-excel/
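The claim that the zeros survive in the file itself can be checked without Notepad. The following is a minimal sketch, not part of the original answer: the class name `LeadingZeroCheck` and the sample data are made up for illustration. It writes CSV text with `File.WriteAllText` and reads it straight back.

```csharp
using System;
using System.IO;

public static class LeadingZeroCheck
{
    // Writes the CSV text to a temporary file and reads it back unchanged.
    // Hypothetical helper: only meant to show that the file keeps the zeros.
    public static string RoundTrip(string csvText)
    {
        var path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, csvText);
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Calling `LeadingZeroCheck.RoundTrip("Id,Name\n00123,Widget\n")` should return the same text, which suggests the zeros are only dropped when Excel interprets the column as numeric on open.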

How to programmatically set the Excel number format to text for a csv file?

I am writing a CSV file from a program.
I am facing an issue when I open it in Excel: if the value in some column is longer than 256 characters, Excel automatically truncates it.
I learnt that by default Excel uses the number format 'GENERAL'; if I could set it to 'TEXT', it would not truncate anything.
So is there any programmatic way to set the number format from GENERAL to TEXT?
CSV is a plain-text format. You can't embed metadata such as column handling into that file format.
If you don't want Excel to truncate the column, then you'll need to truncate it yourself when you write out the CSV file:
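// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
var csvFileBuffer = new StringBuilder();
var columns = new List<string>();
csvFileBuffer.AppendLine(
    string.Join(
        ",",
        columns.Select(s =>
            // Truncate to 255 characters (change logic as appropriate).
            // The length check avoids an ArgumentOutOfRangeException on short values.
            s.Length > 255 ? s.Substring(0, 255) : s)));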
That said, there may be an option you can set in Excel which will change the default column type for a CSV file. However, that would be a question for SuperUser.com.
The other option is to write out to a native Excel file format, such as xlsx. There are a number of tools you can use to do this, such as the Open XML SDK.

How can I import tab-delimited text into an EPPlus sheet?

I would like to use EPPlus to create an Excel file but I have a problem: my input data is in a tab-delimited format:
Name Code Grade
------------------------
N1 C22 17.6
N2 C09 18.9
N3 C18 20
How can I add this type (tab format) of data using EPPlus package?
EPPlus is a library to read and write xlsx files only, so you cannot directly parse a file in tabular format.
You need to either write a reader for your format or, even easier, use a CSV reader that supports custom delimiters, and set the delimiter to \t. You can use this reader to read each cell of your data and feed it to EPPlus to re-create the datasheet.
I would simply read the text file in the standard old C# way, load each cell into a cell in EPPlus, and then save it. You'd just have to write a few loops and some formatting code.
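The manual approach can be sketched as follows. This is only an illustration using the base class library: the class name `TabDelimitedReader` is hypothetical, and skipping dash-only separator lines is an assumption based on the sample data in the question. The hand-off to EPPlus is shown as a comment rather than executed.

```csharp
using System;
using System.Collections.Generic;

public static class TabDelimitedReader
{
    // Splits tab-delimited text into rows of cells.
    // Blank lines and separator lines made only of dashes are skipped.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        foreach (var rawLine in text.Split('\n'))
        {
            var line = rawLine.TrimEnd('\r');
            var trimmed = line.Trim();
            if (trimmed.Length == 0 || trimmed.Trim('-').Length == 0)
            {
                continue;
            }
            rows.Add(line.Split('\t'));
        }
        return rows;
    }
}

// Assumed EPPlus hand-off (cell indexes in EPPlus are 1-based):
// for (int r = 0; r < rows.Count; r++)
//     for (int c = 0; c < rows[r].Length; c++)
//         sheet.Cells[r + 1, c + 1].Value = rows[r][c];
```

Every cell is kept as a string here; converting values such as grades to numbers before assigning them is left to the caller.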
EPPlus supports importing tab-delimited text. You can see it in Sample9.cs (https://github.com/JanKallman/EPPlus/blob/master/SampleApp/Sample9.cs)
If you are just looking for code,
//Create the format object to describe the text file
var format = new ExcelTextFormat();
format.Delimiter='\t'; //Tab
//Now read the file into the sheet.
Console.WriteLine("Load the text file...");
var csvDir = new DirectoryInfo(AppDomain.CurrentDomain.BaseDirectory + "csv");
var range = sheet.Cells["A1"].LoadFromText(Utils.GetFileInfo(csvDir, "Sample9-2.txt", false), format);
Direct link to the line: https://github.com/JanKallman/EPPlus/blob/master/SampleApp/Sample9.cs#L138

Merging rtf documents in C sharp

I am using NRtfTree to merge two RTF documents. The merged document does not retain the same look as the template, and it looks like a problem with encoding. How do I set the encoding for NRtfTree? Or is there any other component I can use to merge two documents without using Office automation?
Thanks
Consider using the .NET RichTextBox class.
Create two System.Windows.Forms.RichTextBox objects and set their values from the first and the second RTF file. Then create another RichTextBox with the two RTF documents put together.
System.Windows.Forms.RichTextBox.Rtf = fromNrOne + fromNrTwo;
