Excel compare by byte array - c#

I would like to compare Excel sheets by converting them into byte arrays.
My code currently looks like this:
public static Document FileToByteArray(string fileName)
{
    // File.ReadAllBytes opens, reads and closes the file in one call,
    // so no manual FileStream/BinaryReader cleanup is needed.
    byte[] fileContent = System.IO.File.ReadAllBytes(fileName);
    return new Document
    {
        DocContent = fileContent
    };
}
public class Document
{
public byte[] DocContent { get; set; }
}
And finally, the main code:
private static void CompareImportedExportedExcels(string ingredientName, string ingredientsExportFile, AuthorizedLayoutPage authorizedBackofficePage, IngredientsPage ingredientsPage)
{
authorizedBackofficePage.LeftMenuComponent.ChooseLeftSectionOption<IngredientsPage>();
ingredientsPage.FiltersComponent.UseStringFilter(FiltersOptions.IngredientName, ingredientName);
ingredientsPage.ExportIngredientsElement.Click();
var downloadResult = DownloadHelper.WaitUntilDownloadedCompare(ingredientsExportFile);
string ingredientExportExcelFile = DownloadHelper.SeleniumDownloadPath + ingredientsExportFile;
var excelToByteArray1 = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"..\..\..") + @"\TestData\" + "ImportFiles" + @"\IngredientsImport.xlsx");
var excelArray1 = ExcelsExtensions.FileToByteArray(excelToByteArray1);
var excelArray2 = ExcelsExtensions.FileToByteArray(ingredientExportExcelFile);
if (excelArray1.DocContent.Length == excelArray2.DocContent.Length)
{
Console.WriteLine("Excels are equal");
DownloadHelper.CheckFileDownloaded(ingredientsExportFile);
}
else
{
Console.WriteLine("Excels are not equal");
DownloadHelper.CheckFileDownloaded(ingredientsExportFile);
Assert.Fail("Seems that imported and exported excels were not the same! Check it!");
}
}
What's the current status:
The code above works correctly as far as getting .Length and comparing it between two Excel files goes. The problem appears in a different comparison, where the exported Excel file is first placed inside a .zip file. I need to unpack it and then compare. Although the sheets are the same, the .Length values differ and the check fails.
var downloadResult = DownloadHelper.WaitUntilDownloadedCompare(productsExportFile);
string stockProductZIPFile = DownloadHelper.SeleniumDownloadPath + productsExportFile;
string stockProductUnzippedFilePath = DownloadHelper.SeleniumDownloadPath + productsExportFile;
var pathToUnzip = DownloadHelper.SeleniumDownloadPath + productsExportFolderFile;
ZipFile zip = ZipFile.Read(stockProductZIPFile);
zip.ExtractAll(pathToUnzip);
string stockProductExportedExcel = DownloadHelper.SeleniumDownloadPath + "\\ProductsExport" + @"\Stock Products.xlsx";
var excelToByteArray1 = Path.GetFullPath(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"..\..\..") + @"\TestData\" + "ImportFiles" + @"\StockProduct.xlsx");
var excelArray1 = ExcelsExtensions.FileToByteArray(excelToByteArray1);
var excelArray2 = ExcelsExtensions.FileToByteArray(stockProductExportedExcel);
if (excelArray1.DocContent.Length == excelArray2.DocContent.Length)
{
Console.WriteLine("Excels are equal");
DownloadHelper.CheckFileDownloaded(stockProductUnzippedFilePath);
DownloadHelper.CheckFileDownloaded(pathToUnzip);
}
else
{
Console.WriteLine("Excels are not equal");
DownloadHelper.CheckFileDownloaded(stockProductUnzippedFilePath);
DownloadHelper.CheckFileDownloaded(pathToUnzip);
Assert.Fail("Seems that imported and exported excels were not the same! Check it!");
}
Ideas to solve
First of all, I'm not sure whether comparing the two files by .Length is a good idea. It works in one case but not in the other. I'm not sure if this is connected with packing the sheet into .zip format and then unpacking it. In the second (broken) scenario the file sizes really do differ: the original (imported) file is 4 KB and the exported one is 10 KB, even though the data inside is the same.
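Since the question asks whether length comparison is sound: it isn't a reliable equality check on its own (two different files can happen to share a length), and even a full byte-for-byte comparison can legitimately fail for .xlsx files, because .xlsx is a zip container whose compression settings and internal metadata can differ between writers even when the sheet data is identical. Below is a byte-level comparison sketch using LINQ's SequenceEqual (file names are illustrative); for the zip round-trip scenario, comparing cell values with a library such as EPPlus is the more robust route.

```csharp
using System;
using System.IO;
using System.Linq;

public class ExcelByteComparison
{
    // Compares two files byte-for-byte. Note: two .xlsx files can hold
    // identical sheet data yet still differ here, because .xlsx is a zip
    // container whose compression level and metadata vary between writers.
    public static bool FilesAreIdentical(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        return a.Length == b.Length && a.SequenceEqual(b);
    }

    static void Main()
    {
        // Hypothetical file names for illustration only.
        File.WriteAllBytes("a.bin", new byte[] { 1, 2, 3 });
        File.WriteAllBytes("b.bin", new byte[] { 1, 2, 3 });
        File.WriteAllBytes("c.bin", new byte[] { 1, 2, 4 }); // same length, different content
        Console.WriteLine(FilesAreIdentical("a.bin", "b.bin")); // True
        Console.WriteLine(FilesAreIdentical("a.bin", "c.bin")); // False
    }
}
```

SequenceEqual catches the "same length, different bytes" case that a pure .Length check misses; it still cannot tell you that two differently compressed .xlsx files contain the same sheet data.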

Related

How to extract all pages and attachments from PDF to PNG

I am trying to create a process in .NET to convert a PDF and all its pages plus attachments to PNGs. I am evaluating libraries and came across PDFiumSharp, but it is not working for me. Here is my code:
string Inputfile = "input.pdf";
string OutputFolder = "Output";
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PdfDocument doc = new PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new PDFiumBitmap((int)page.Width, (int)page.Height, false))
{
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
When I run this code, I get this exception:
(screenshot of the exception omitted)
Does anyone know how to fix this? Also does PDFiumSharp support extracting PDF attachments? If not, does anyone have any other ideas on how to achieve my goal?
PDFium does not look like it supports extracting PDF attachments. If you want to achieve your goal, then you can take a look at another library that supports both extracting PDF attachments as well as converting PDFs to PNGs.
I am an employee of LEADTOOLS; you can try out our PDF SDK via these two NuGet packages:
https://www.nuget.org/packages/Leadtools.Pdf/
https://www.nuget.org/packages/Leadtools.Document.Sdk/
Here is some code that will convert a PDF + all attachments in the PDF to separate PNGs in an output directory:
SetLicense();
cache = new FileCache { CacheDirectory = "cache" };
List<LEADDocument> documents = new List<LEADDocument>();
if (!Directory.Exists(OutputDir))
Directory.CreateDirectory(OutputDir);
using var document = DocumentFactory.LoadFromFile("attachments.pdf", new LoadDocumentOptions { Cache = cache, LoadAttachmentsMode = DocumentLoadAttachmentsMode.AsAttachments });
if (document.Pages.Count > 0)
documents.Add(document);
foreach (var attachment in document.Attachments)
documents.Add(document.LoadDocumentAttachment(new LoadAttachmentOptions { AttachmentNumber = attachment.AttachmentNumber }));
ConvertDocuments(documents, RasterImageFormat.Png);
And the ConvertDocuments method:
static void ConvertDocuments(IEnumerable<LEADDocument> documents, RasterImageFormat imageFormat)
{
using var converter = new DocumentConverter();
using var ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD);
ocrEngine.Startup(null, null, null, null);
converter.SetOcrEngineInstance(ocrEngine, false);
converter.SetDocumentWriterInstance(new DocumentWriter());
foreach (var document in documents)
{
var name = string.IsNullOrEmpty(document.Name) ? "Attachment" : document.Name;
string outputFile = Path.Combine(OutputDir, $"{name}.{RasterCodecs.GetExtension(imageFormat)}");
int count = 1;
while (File.Exists(outputFile))
outputFile = Path.Combine(OutputDir, $"{name}({count++}).{RasterCodecs.GetExtension(imageFormat)}");
var jobData = new DocumentConverterJobData
{
Document = document,
Cache = cache,
DocumentFormat = DocumentFormat.User,
RasterImageFormat = imageFormat,
RasterImageBitsPerPixel = 0,
OutputDocumentFileName = outputFile,
};
var job = converter.Jobs.CreateJob(jobData);
converter.Jobs.RunJob(job);
}
}

How to write a list of structs into a file in C#?

I'm going to use a List as a data structure to temporarily hold real-time data, and I want to write it into a file.
The program receives millions of records in real time, so I want to reduce latency and overhead as much as possible. At first I just combined the data into strings and saved those into a list, but I've found that using a fixed-size list of structs is better, because combining strings is expensive (this happens while temporarily holding the real-time data, before the file is written).
Now I'm wondering how to efficiently write the structs in the list into a file.
List<struct_string> list_structs = new List<struct_string>(1000000);
using (FileStream fileStream = new FileStream(fileName, FileMode.Append, FileAccess.Write))
using (StreamWriter streamWriter = new StreamWriter(fileStream))
{
    for (int num = 0; num < list_structs.Count; num++)
    {
        streamWriter.Write(list_structs[num].string1 + ", " +
            list_structs[num].string2 + ", " +
            list_structs[num].string3 + ", " +
            list_structs[num].string4 + ", " +
            list_structs[num].string5 + "\r\n");
    }
}
internal struct struct_string
{
public string string1;
public string string2;
public string string3;
public string string4;
public string string5;
public struct_string(string _string1, string _string2, string _string3, string _string4, string _string5)
{
string1 = _string1;
string2 = _string2;
string3 = _string3;
string4 = _string4;
string5 = _string5;
}
}
This is what I could initially think of, but I think there should be built-in functions or better ways to do this.
To read/write them to a binary file do this:
Define the struct:
[Serializable]
public struct X
{
public int N {get; set;}
public string S {get; set;}
}
Read and write it using a BinaryFormatter:
string filename = @"c:\temp\list.bin";
var list = new List<X>();
list.Add(new X { N=1, S="No. 1"});
list.Add(new X { N=2, S="No. 2"});
list.Add(new X { N=3, S="No. 3"});
BinaryFormatter formatter = new BinaryFormatter();
using (System.IO.Stream ms = File.OpenWrite(filename))
{
formatter.Serialize(ms, list);
}
using (FileStream fs = File.Open(filename, FileMode.Open))
{
object obj = formatter.Deserialize(fs);
var newlist = (List<X>)obj;
foreach (X x in newlist)
{
Console.Out.WriteLine($"N={x.N}, S={x.S}");
}
}
The solution relies on the fact that both the List<T> class and the X struct are serializable.
Try using serialization instead; there are libraries for that. With the new System.Text.Json in .NET Core you get really good performance; another popular option, for classic .NET Framework, is Newtonsoft.Json.
I know this isn't a direct answer to your question, but I hope it helps.
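As a sketch of the System.Text.Json route suggested above (the Record type and file name are my own illustrative choices; note that System.Text.Json serializes public properties, not public fields, so a struct with bare fields would come out empty):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public struct Record
{
    // System.Text.Json serializes public properties, not fields.
    public int N { get; set; }
    public string S { get; set; }
}

class JsonExample
{
    static void Main()
    {
        var list = new List<Record>
        {
            new Record { N = 1, S = "No. 1" },
            new Record { N = 2, S = "No. 2" }
        };

        // Serialize the whole list to a file...
        File.WriteAllText("list.json", JsonSerializer.Serialize(list));

        // ...and read it back into a new list.
        var roundTripped = JsonSerializer.Deserialize<List<Record>>(File.ReadAllText("list.json"));
        Console.WriteLine(roundTripped.Count); // 2
    }
}
```

Unlike BinaryFormatter, this produces a human-readable file and does not require the [Serializable] attribute.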

Storing double values in binary file and reading it

I need to store double values in binary files and read them back. My data is stored in an array. I have tried the following code, but apparently I am storing more values than the array size, and I am reading totally wrong data. For example, if I store 0.26 from array[0], the first few bytes I see in the binary file are A4 70 3D... I don't get how 0.26 is converted to these values, and on what basis.
This code is for writing to binary file:
double[] DataCollection_array = new double[10000];
public void store_data()
{
    Binary_filename = folder_path + "\\" + "Binary1.bin";
    stream = new FileStream(Binary_filename, FileMode.Create);
    binary_writer = new BinaryWriter(stream);
    writetoBinary(DataCollection_array.Length);
}
public void writetoBinary(int size)
{
for (int i = 0; i < size; i++)
{
binary_writer.Write(DataCollection_array[i]);
}
}
This code for reading the double values from a folder that contains binary files:
int bytes_counter1 = 0;
List<double> Channels = new List<double>();
public void read_data ()
{
path2 = Directory2.folder_path + "\\" + "Binary" + file_number + ".bin";
file_stream = new FileStream(path2, FileMode.Open, FileAccess.Read);
using (reader = new BinaryReader(file_stream))
{
if (bytes_counter1 < reader.BaseStream.Length)
{
reader.BaseStream.Seek((count + offset1), SeekOrigin.Begin);
Channels.Add((double)reader.ReadByte());
bytes_counter1++;
}
}
}
You are writing doubles:
binary_writer.Write(DataCollection_array[i]);
But you are only reading bytes:
Channels.Add((double)reader.ReadByte()); // Read one byte
Change it to:
Channels.Add(reader.ReadDouble()); // Read one double
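The bytes the asker saw are not wrong data, by the way: A4 70 3D 0A D7 A3 D0 3F is simply the little-endian IEEE-754 encoding of the double 0.26 (bit pattern 0x3FD0A3D70A3D70A4). A minimal round-trip sketch (the file name is illustrative):

```csharp
using System;
using System.IO;

class DoubleRoundTrip
{
    static void Main()
    {
        double[] data = { 0.26, 1.5, -3.75 };

        // BinaryWriter.Write(double) emits the 8-byte IEEE-754 representation.
        using (var writer = new BinaryWriter(File.Create("doubles.bin")))
        {
            foreach (double d in data)
                writer.Write(d);
        }

        // Read the values back 8 bytes at a time with ReadDouble.
        using (var reader = new BinaryReader(File.OpenRead("doubles.bin")))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
                Console.WriteLine(reader.ReadDouble());
        }

        // BitConverter shows why the file starts with A4 70 3D ...:
        // these are the little-endian IEEE-754 bytes of 0.26.
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(0.26)));
        // A4-70-3D-0A-D7-A3-D0-3F on little-endian hardware
    }
}
```

Each Write(double) advances the file by 8 bytes, which is why reading one byte at a time yields ten times as many "values" as were written.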

I read a PDF file and use tempStream.WriteByte to show it on the client, but it comes out written in reverse

By the way, my English is not perfect, sorry.
I use the method below. I want to read a file and then show it as a PDF on the client, and this code can do that.
But when I show the PDF file, the content appears written in reverse in the document. I can't understand what my problem is.
FileStream fileStream = File.OpenRead(fileDirectoryPath + "\\" + pROJEDOKUMANBec.FILENAME);
pROJEDOKUMANBec.SOURCE = ConvertStreamToByteBuffer(fileStream);
public byte[] ConvertStreamToByteBuffer(System.IO.Stream theStream)
{
int b1;
System.IO.MemoryStream tempStream = new System.IO.MemoryStream();
while ((b1 = theStream.ReadByte()) != -1)
{
tempStream.WriteByte(((byte)b1));
}
return tempStream.ToArray();
}
And here is my TS code:
@ViewChild('pdfViewir') pdfViewir: PdfViewer;
private _openPdf(bec: ProjeDokumanBec): void {
this._rptSrc = bec.SOURCE;
this.pdfViewir.Zoom = 1;
this.pdfViewir.OriginalSize = true;
}
And here is my HTML code:
<PdfViewer #pdfViewir Id="pdfViewir" [Src]="_rptSrc"></PdfViewer>

.Net Core: Reading data from CSV & Excel files

Using .net core & c# here.
I have a UI from which a user can upload Excel or CSV files. The upload goes to my Web API, which handles reading the data from these files and returns JSON.
My Api code as:
[HttpPost("upload")]
public async Task<IActionResult> FileUpload(IFormFile file)
{
JArray data = new JArray();
using (ExcelPackage package = new ExcelPackage(file.OpenReadStream()))
{
ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
//Process, read from excel here and populate jarray
}
return Ok(data );
}
In the code above I am using EPPlus for reading the Excel file. For Excel files it works fine, but it cannot read CSV files, which is a limitation of EPPlus.
I searched and found another library, CsvHelper: https://joshclose.github.io/CsvHelper/ The issue with this is that it does the opposite: it can read CSV but not Excel.
Is there any library available that supports reading both?
Or would it be possible to use EPPlus only, but convert the uploaded CSV to Excel on the fly and then read it? (Please note I am not storing the Excel file anywhere, so I can't use Save As to save it as Excel.)
Any inputs, please?
--Updated - Added code for reading data from excel---
int rowCount = worksheet.Dimension.End.Row;
int colCount = worksheet.Dimension.End.Column;
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= colCount; col++)
{
var rowValue = worksheet.Cells[row, col].Value;
}
}
//With the code suggested in the answer rowcount is always 1
You can use EPPlus and a StreamReader to load CSV files into an ExcelPackage without writing to a file. Below is an example. You may have to change some of the parameters based on your CSV file specs.
[HttpPost("upload")]
public async Task<IActionResult> FileUpload(IFormFile file)
{
    var result = string.Empty;
    string worksheetsName = "data";
    bool firstRowIsHeader = false;
    var format = new ExcelTextFormat();
    format.Delimiter = ',';
    format.TextQualifier = '"';
    using (var reader = new System.IO.StreamReader(file.OpenReadStream()))
    using (ExcelPackage package = new ExcelPackage())
    {
        result = reader.ReadToEnd();
        ExcelWorksheet worksheet = package.Workbook.Worksheets.Add(worksheetsName);
        worksheet.Cells["A1"].LoadFromText(result, format, OfficeOpenXml.Table.TableStyles.Medium27, firstRowIsHeader);
        // Process, read from the worksheet here and build your response
    }
    return Ok(result);
}
Here's using Aspose, which is unfortunately not free, but wow it works great. My API is using the streaming capability with Content-Type: multipart/form-data rather than the IFormFile implementation:
[HttpPut]
[DisableFormValueModelBinding]
public async Task<IActionResult> UploadSpreadsheet()
{
if (!MultipartRequestHelper.IsMultipartContentType(Request.ContentType))
{
return BadRequest($"Expected a multipart request, but got {Request.ContentType}");
}
var boundary = MultipartRequestHelper.GetBoundary(MediaTypeHeaderValue.Parse(Request.ContentType), _defaultFormOptions.MultipartBoundaryLengthLimit);
var reader = new MultipartReader(boundary, HttpContext.Request.Body);
var section = (await reader.ReadNextSectionAsync()).AsFileSection();
//If you're doing CSV, you add this line:
LoadOptions loadOptions = new LoadOptions(LoadFormat.CSV);
var workbook = new Workbook(section.FileStream, loadOptions);
Cells cells = workbook.Worksheets[0].Cells;
var rows = cells.Rows.Cast<Row>().Where(x => !x.IsBlank);
//Do whatever else you want here
Please try the code below:
private string uploadCSV(FileUpload fl)
{
string fileName = "";
serverLocation = Request.PhysicalApplicationPath + "ExcelFiles\\";
fileName = fl.PostedFile.FileName;
int FileSize = fl.PostedFile.ContentLength;
string contentType = fl.PostedFile.ContentType;
fl.PostedFile.SaveAs(serverLocation + fileName);
string rpath = string.Empty, dir = string.Empty;
HttpContext context = HttpContext.Current;
string baseUrl = context.Request.Url.Scheme + "://" + context.Request.Url.Authority + context.Request.ApplicationPath.TrimEnd('/') + '/';
try
{
rpath = serverLocation + fileName;//Server.MapPath(dir + fileName);
using (Stream InputStream = fl.PostedFile.InputStream)
{
    // Locking a freshly created local object (as the original code did)
    // has no effect, so the locks are removed here.
    byte[] buffer = new byte[InputStream.Length];
    InputStream.Read(buffer, 0, (int)InputStream.Length);
    File.WriteAllBytes(rpath, buffer);
}
}
catch (Exception ex)
{
lblSOTargetVal.Text = ex.Message.ToString();
}
return rpath;
}
Alternatively, use the Open XML SDK package; it also has working solutions for this.
