Optimized parsing of a text file to then upload to Excel [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
My project is to take large amounts of logs, output as text files, and parse some of the data into Excel format.
There is a lot of unneeded garbage data in between.
This is what one portion of it looks like:
2018-05-17 07:16:57.105>>>>>>
{"status":"success", "statusCode":"0", "statusDesc":"Message Processed Sucessfully", "messageNumber":"451", "payload":{"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}}
When generating the Excel workbook, I will first need to take the timestamp before the "{}", as it differs from the messageTimestamp.
This is how it will look in Excel:
    A                          B          C
1   Overall time stamp         status     statusCode
2   2018-05-17 07:16:57.105    success    0
And so on...
payload has its own section of logs within its "{}", so its section in Excel will look like this:
    F
1   payload
2   {"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}
Its content can all be in one section; that's not an issue.
A friend of mine has done something similar, but it can take a few minutes to generate even one relatively small Excel document.
My Question:
What is the most efficient way to parse the needed data and store it in an array or multidimensional array,
to then push it into an Excel document?

I would try to split the input text on newline characters, then parse the JSON part with Newtonsoft.Json.
I would strongly advise against trying to parse the JSON yourself. The bottleneck here will be disk I/O, not in-memory processing, so make it easy to write correct code and use third-party libraries.
Once you have structured data representing the input, you can write each entry to an output file with only the fields you need.
For an Excel file, is CSV okay or do you need XLSX files? For CSV you can write to a file directly; for XLSX I would recommend the EPPlus library.
https://www.newtonsoft.com/json
https://archive.codeplex.com/?p=epplus
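
A minimal sketch of that approach, assuming the ">>>>>>" marker always separates the timestamp from the JSON line and that CSV output is acceptable (the file names are hypothetical):

    using System;
    using System.IO;
    using Newtonsoft.Json.Linq; // Newtonsoft.Json NuGet package

    class LogToCsv
    {
        static void Main()
        {
            using (var reader = new StreamReader("input.log"))   // hypothetical input
            using (var writer = new StreamWriter("output.csv"))  // hypothetical output
            {
                writer.WriteLine("Overall time stamp,status,statusCode,payload");

                string line, timestamp = null;
                while ((line = reader.ReadLine()) != null)
                {
                    int marker = line.IndexOf(">>>>>>");
                    if (marker > 0)
                    {
                        // "2018-05-17 07:16:57.105>>>>>>" -> keep the part before the marker.
                        timestamp = line.Substring(0, marker);
                    }
                    else if (timestamp != null && line.TrimStart().StartsWith("{"))
                    {
                        JObject entry = JObject.Parse(line);
                        // CSV-quote the payload so its commas and quotes don't break the row.
                        string payload = entry["payload"]?.ToString(Newtonsoft.Json.Formatting.None) ?? "";
                        payload = "\"" + payload.Replace("\"", "\"\"") + "\"";
                        writer.WriteLine(string.Join(",",
                            timestamp, entry["status"], entry["statusCode"], payload));
                        timestamp = null; // anything until the next marker is garbage
                    }
                }
            }
        }
    }

If you need a real .xlsx file rather than CSV, the same loop can write cells through EPPlus (ExcelPackage and Worksheet.Cells) instead of a StreamWriter; the parsing side stays identical.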

Related

How to read 10GB .csv file using c# windows application [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I am trying to read a 10GB .csv file and import it into our DB, but when I read this file via a data adapter to fill a DataSet or DataTable, it gives an out-of-memory exception.
As per the MSDN documentation for the SqlDataAdapter class:
Represents a set of data commands and a database connection that are
used to fill the DataSet and update a SQL Server database. This class
cannot be inherited.
If we were to look at the DataSet documentation:
Represents an in-memory cache of data.
This means that you need enough memory for the entire data set to be stored in it. The question does not provide detail on how the file is actually read, but for this to work you would essentially need to:
Iterate over the file one row at a time; do not load the entire file (say, as a string) into memory. You can use a TextReader to read it.
Then use a SqlConnection to write into the database one line at a time.
This lets you essentially keep pointers to where the data is and where it needs to go, which reduces the amount of data you need to hold in memory.
You may read your file line by line and import it into the database. For parsing the CSV, use CsvHelper.
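
A minimal streaming sketch of the approach described above (the connection string, file name, table, and column names are all hypothetical; only one row is ever held in memory):

    using System.Data;
    using System.Data.SqlClient; // System.Data.SqlClient NuGet package
    using System.IO;

    class CsvImport
    {
        static void Main()
        {
            const string connStr = "Server=.;Database=MyDb;Integrated Security=true"; // hypothetical
            using (var conn = new SqlConnection(connStr))
            using (var reader = new StreamReader("huge.csv")) // hypothetical file
            using (var cmd = new SqlCommand(
                "INSERT INTO MyTable (ColA, ColB) VALUES (@a, @b)", conn)) // hypothetical table
            {
                conn.Open();
                cmd.Parameters.Add("@a", SqlDbType.NVarChar, 255);
                cmd.Parameters.Add("@b", SqlDbType.NVarChar, 255);

                reader.ReadLine(); // skip the header row

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Naive split for illustration; use CsvHelper for quoted fields.
                    string[] fields = line.Split(',');
                    cmd.Parameters["@a"].Value = fields[0];
                    cmd.Parameters["@b"].Value = fields[1];
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }

For 10GB, row-at-a-time inserts will be slow; batching rows inside transactions or feeding SqlBulkCopy from an IDataReader is much faster, but the flat memory profile stays the same either way.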

How to detect a corrupted file before opening it in C# [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have some files which have been corrupted, and I want to detect which file is corrupted before opening it. I used the FileInfo class, but it was useless.
The easiest way to solve this issue is to have your program keep a log file of when it accesses and edits a file. By keeping track of this, if the program exited prematurely, you could easily identify that the save was interrupted. To do this, have the program add a new log entry every time it has completed saving a file, not before the save. When the program tries to open the file, check the time the file was last edited; if the last-edited time is later than the time recorded in the log file, reject it.
Of course, this system will only work on one computer. There are other ways of implementing this, such as having a marker at the end of the file: if the marker does not exist, then you know that the file is corrupt. Open your mind to more ideas and try to think of other ways to solve this issue; this is just one example.
1. Unfortunately, there is no easy way to determine whether a file is corrupt before even rendering it. Usually the problem files have a correct header, so the real causes of corruption lie elsewhere. A PDF file, for example, contains a reference table giving the exact byte offset of each object from the start of the file, so corrupted files most probably have broken offsets or a missing object.
The best way to determine that a file is corrupted is to use a specialized library for that file type, such as a PDF library. There are lots of free and commercial libraries of this kind for .NET; you can simply try to load the file with one of them. iTextSharp would be a good choice.
2. Or, if you want, you can go through this answer:
File Upload Virus Scanning(server side)
You might need to parse the header bytes of the file and make sure they are consistent with the rest of the file body, e.g.:
Reading image header info without loading the entire image
This is how you can read the header of an image and get the image size without fully loading it. In the same way, you should look at the desired file format's header and validate it according to the format's rules. This is not a ready-made solution but may give you an idea.
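
As a sketch of the header-checking idea, here is a minimal signature check against a few well-known magic numbers (the signature list and file name are illustrative; as noted above, a matching header only rules out gross corruption, not a damaged body):

    using System;
    using System.IO;
    using System.Linq;

    class HeaderCheck
    {
        // A few well-known magic numbers; extend for the formats you care about.
        static readonly (string Name, byte[] Magic)[] Signatures =
        {
            ("PDF",      new byte[] { 0x25, 0x50, 0x44, 0x46 }), // "%PDF"
            ("PNG",      new byte[] { 0x89, 0x50, 0x4E, 0x47 }),
            ("JPEG",     new byte[] { 0xFF, 0xD8, 0xFF }),
            ("ZIP/XLSX", new byte[] { 0x50, 0x4B, 0x03, 0x04 }), // "PK" container
        };

        static string DetectFormat(string path)
        {
            byte[] header = new byte[8];
            int read;
            using (var fs = File.OpenRead(path))
                read = fs.Read(header, 0, header.Length);

            foreach (var (name, magic) in Signatures)
                if (read >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                    return name;
            return null; // unknown signature: wrong type or a damaged header
        }

        static void Main()
        {
            Console.WriteLine(DetectFormat("report.pdf") ?? "No known signature"); // hypothetical file
        }
    }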

How can I read after EOF? C# [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'm using C# and I need to read something after EOF. Is it possible using C#? How?
Thanks.
You can't. EOF means end of file; there's nothing actually in the file after that.
You may as well ask how you can get ten gallons of oil from a four-gallon drum. Once it's empty, there's no more to be had.
Since you're talking C#, hence Windows (and based on your comment about data located behind the end-of-file pointer), it's possible that they are referring to "DOS mode" text files, which are (or used to be; I haven't investigated recently) terminated by the CTRL-Z character.
This goes back to the earliest days of the PC revolution: CP/M used an integral number of disk blocks to store a file and recorded only the number of disk blocks rather than the number of bytes, so CTRL-Z was used to indicate end of file when the file wasn't an exact multiple of the disk block size.
If that's the case, it's probably best just to open the file as a binary file, then read up to the first CTRL-Z character (code point 26) - everything beyond that could be considered data beyond EOF if it's truly a text file of that format.
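
A minimal sketch of that last suggestion, reading the file as binary and treating everything past the first CTRL-Z (code point 26) as the "beyond EOF" data (the file name is hypothetical):

    using System;
    using System.IO;
    using System.Text;

    class AfterEof
    {
        static void Main()
        {
            byte[] bytes = File.ReadAllBytes("dosfile.txt"); // hypothetical DOS-style text file

            // 0x1A is CTRL-Z, the CP/M and DOS end-of-file marker.
            int eof = Array.IndexOf(bytes, (byte)0x1A);
            if (eof < 0)
            {
                Console.WriteLine("No CTRL-Z marker; nothing hidden past EOF.");
                return;
            }

            string text = Encoding.ASCII.GetString(bytes, 0, eof);
            int extra = bytes.Length - eof - 1; // bytes located "after EOF"
            Console.WriteLine($"Text part: {text.Length} chars; data past CTRL-Z: {extra} bytes");
        }
    }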

Looking for an alternative to Excel spreadsheets as a data collection means [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Someone wants me to implement a server-side data extraction service to extract data from Microsoft Excel 2010 spreadsheets (xlsx). A spreadsheet must have data in the correct places in order for the extraction to work. Is there a better alternative to using spreadsheets for data collection? I worry that users might produce a spreadsheet that fails the parsing/extraction method even though the displayed spreadsheet is understandable to a human.
For example, a user needs to type out many items, and each item will have several detail lines following it. My program will need to identify the boundary between items and then collect the detail lines that follow each one. If an extraction fails, the user will need clues to help them fix the problem and then re-submit the xlsx file.
Is there a better way? Is there something as portable as an Excel spreadsheet but with structured data that can be easily extracted?
Or perhaps could an Excel spreadsheet prepare the data as structured data, such as a JSON representation, and then store it as part of the Open XML package?
You can improve data collection with Excel by using Named Ranges and adding validation code that runs on data entry to the spreadsheet. The validation code could also add metadata tags to the workbook. Your extraction program can then use the named ranges (and metadata) to find the data.
I would use an Access DB: very portable, and it allows you to password-protect the structure or only allow inserts via a form.
An Access DB can also be read easily via the Jet engine, so extracting data automatically in C# is fairly straightforward.
If I understood your question right, you want to store some custom XML describing your data inside your Open XML Excel file. I think you could use Custom XML Parts for that.
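
A minimal sketch of the Custom XML Parts idea, assuming the DocumentFormat.OpenXml SDK (the workbook path and XML payload are hypothetical):

    using System.IO;
    using System.Text;
    using DocumentFormat.OpenXml.Packaging; // DocumentFormat.OpenXml NuGet package

    class CustomXmlDemo
    {
        static void Main()
        {
            // Open an existing workbook for editing and embed structured data in it.
            using (var doc = SpreadsheetDocument.Open("data.xlsx", true))
            {
                CustomXmlPart part = doc.WorkbookPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
                byte[] xml = Encoding.UTF8.GetBytes("<items><item id=\"1\">example</item></items>");
                using (var ms = new MemoryStream(xml))
                    part.FeedData(ms); // the XML travels inside the .xlsx package
            }
        }
    }

The extraction service can then read the custom XML part back instead of scraping cell layouts, which sidesteps the "human-readable but machine-unparseable" problem.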

Split CSV files into exactly 1GB files or a little less? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Every month we receive an invoice file that is always bigger than 2GB. Our print house has a 1.1GB limitation, and we currently do this whole process by hand.
The first step in this application would be to split those huge 2GB files into 1GB files in a way that won't break any CSV entry, so that each file is readable from start to end without breaking any data.
How could I split the file to meet the above requirements?
Are there any libraries for this sort of processing on CSV files?
How about just copying the first 1GB of data from the source into a new file, then searching backward for the last newline and truncating the new file after it? Then you know how large the first file is, and you repeat the process for a second new file from that point to 1GB later. This seems straightforward in just about any language (you mentioned C#, which I haven't used recently, but it can certainly do the job).
You didn't make it clear whether you need to copy the header line (if any) to each of the resulting files. Again, this should be straightforward: just write it before copying data into each of the files.
You could also take the approach of generically splitting the files using tar on Unix or a Zip-like utility on Windows, then telling your large-file-challenged partner to reconstruct the file from that format. Or maybe simply compressing the CSV file would get you under the limit in practice.
There are just a few things you need to take care of (see the sketch after this list):
Keep the line breaks: split the file on a newline boundary (algorithmically: cut at the last complete line before the point where the 1GB limit, minus the header line size, is reached)
Copy the header to the beginning of each new file, then append the rest
Preserve the encoding.
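
A minimal sketch of that checklist, streaming line by line and starting a new part whenever the next line would push the current part over the limit (the file names are hypothetical, and this assumes no CSV record spans multiple physical lines):

    using System.IO;
    using System.Text;

    class CsvSplitter
    {
        const long MaxBytes = 1_000_000_000; // stay safely under the 1.1GB limit

        static void Main()
        {
            using (var reader = new StreamReader("invoices.csv")) // hypothetical input
            {
                string header = reader.ReadLine() ?? "";
                long headerBytes = Encoding.UTF8.GetByteCount(header) + 2; // + CRLF

                StreamWriter writer = null;
                int part = 0;
                long written = MaxBytes; // forces a new part on the first data line

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    long lineBytes = Encoding.UTF8.GetByteCount(line) + 2;
                    if (written + lineBytes > MaxBytes)
                    {
                        writer?.Dispose();
                        writer = new StreamWriter($"invoices.part{++part}.csv", false, Encoding.UTF8);
                        writer.WriteLine(header); // every part gets the header
                        written = headerBytes;
                    }
                    writer.WriteLine(line);
                    written += lineBytes;
                }
                writer?.Dispose();
            }
        }
    }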
In a bash/terminal prompt, write:
man split
.. then
man wc
.. simply count the number of lines in the file, divide it by X, feed the number to split, and you have X files each less than 1.1GB (where X = filesize / 1.1).
