How to read a 10GB .csv file in a C# Windows application [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
I am trying to read a 10GB .csv file and import it into our DB, but when I read the file via a data adapter to fill a DataSet or DataTable, it throws an OutOfMemoryException.

As per the MSDN documentation for the SqlDataAdapter class:
Represents a set of data commands and a database connection that are
used to fill the DataSet and update a SQL Server database. This class
cannot be inherited.
If we were to look at the DataSet documentation:
Represents an in-memory cache of data.
This means that you need enough memory for the entire data set to be stored in it. The question does not provide detail on how the file is actually read, but for this to work you would essentially need to:
Iterate over the file one row at a time; do not load the entire file into memory, say as a single string. You can use a TextReader to read it.
Then use a SqlConnection to write into the database one line at a time.
This lets you essentially keep pointers to where the data is and where it needs to go, which reduces the amount of data you need to hold in memory.
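The streaming approach above can be sketched as follows. This is a minimal sketch, not a full importer: the column names and batch size are made up, the comma split does not handle quoted fields, and the database write is abstracted into a flush callback (in the real import that callback would open a SqlConnection and write the batch with SqlBulkCopy or parameterized INSERTs).

```csharp
using System;
using System.Data;
using System.IO;

static class CsvImporter
{
    // Streams the file one row at a time and hands off fixed-size batches to
    // a flush callback, so memory use stays bounded regardless of file size.
    // Returns the total number of rows read.
    public static int Import(TextReader reader, Action<DataTable> flush, int batchSize = 5000)
    {
        var batch = new DataTable();
        batch.Columns.Add("Col1", typeof(string)); // hypothetical schema
        batch.Columns.Add("Col2", typeof(string));
        batch.Columns.Add("Col3", typeof(string));

        int total = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Naive split; real CSV with quoted fields needs a proper parser.
            string[] fields = line.Split(',');
            batch.Rows.Add(fields[0], fields[1], fields[2]);
            total++;

            if (batch.Rows.Count >= batchSize)
            {
                flush(batch);
                batch.Clear(); // release the batch before reading more lines
            }
        }
        if (batch.Rows.Count > 0)
            flush(batch); // final partial batch
        return total;
    }
}
```

Called as `CsvImporter.Import(new StreamReader(path), WriteBatchToDb)`, only one batch is ever in memory at a time, which is the whole point for a 10GB input.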

You may read your file line by line and import it into the database. For parsing the CSV, use CsvHelper.

Related

Optimized parsing of a text file for upload to Excel [closed]

Closed 4 years ago.
My project is to take large amounts of logs, output inside text files, and parse some of the data into Excel format.
There is a lot of unneeded garbage data in between.
This is what one portion looks like:
2018-05-17 07:16:57.105>>>>>>
{"status":"success", "statusCode":"0", "statusDesc":"Message Processed Sucessfully", "messageNumber":"451", "payload":{"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}}
I will first need to take the timestamp before the "{}", as it differs from the messageTimestamp, when generating the Excel workbook.
This is how it will look in Excel:
     A                          B         C
1    Overall time stamp         status    statusCode
2    2018-05-17 07:16:57.105    success   0
And so on...
payload has its own section of logs within its "{}",
so its section in Excel will look like this:
     F
1    payload
2    {"messageCode":"SORTRESPONSE","Id":"5L","Id":"28032","messageTimestamp":"2018-05-16 23:16:55"}
Its content can go in one cell; that's not an issue.
A friend of mine has done something similar, but it can take a few minutes to generate even one relatively small Excel document.
My question:
What is the most efficient way to parse the needed data, store it in an array or multidimensional array, and then push it into an Excel document?
I would try to split the input text on newline characters, then parse the JSON part with Newtonsoft.Json.
I would highly advise against trying to parse the JSON yourself. The bottleneck here will be disk I/O, not in-memory processing, so make it easy to write correct code and use third-party libraries.
Once you have structured data representing the input, you can write each entry to an output file with only the fields you need.
For the Excel file, is CSV okay, or do you need XLSX? For CSV you can write to a file directly; for XLSX I would recommend the EPPlus library.
https://www.newtonsoft.com/json
https://archive.codeplex.com/?p=epplus
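The parsing step can be sketched like this. For a dependency-free example the sketch uses the built-in System.Text.Json instead of Newtonsoft.Json (the same shape works with JObject.Parse); the tuple of fields returned is just an illustration of "keep only the columns you need".

```csharp
using System;
using System.Text.Json;

static class LogParser
{
    // Splits one log line of the form "<timestamp>>>>>>>{json}" into the
    // leading timestamp and the fields needed for one spreadsheet row.
    // Assumes every line passed in actually contains a JSON object.
    public static (string Timestamp, string Status, string StatusCode, string Payload) ParseLine(string line)
    {
        int brace = line.IndexOf('{');
        string timestamp = line.Substring(0, brace).TrimEnd('>').Trim();

        using (JsonDocument doc = JsonDocument.Parse(line.Substring(brace)))
        {
            JsonElement root = doc.RootElement;
            return (timestamp,
                    root.GetProperty("status").GetString(),
                    root.GetProperty("statusCode").GetString(),
                    // Keep the payload object as one raw JSON string,
                    // matching the single payload column described above.
                    root.GetProperty("payload").GetRawText());
        }
    }
}
```

Each parsed tuple then becomes one output row: a `string.Join(",", ...)` per line for CSV, or one worksheet row via EPPlus for XLSX.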

Read data from serial port and keep track of most recent lines read [closed]

Closed 6 years ago.
Here is my problem. I have about 20 different types of messages coming in through a port. They can all be identified and arrive at different rates (some multiple times per second, some once every couple of seconds).
I need to keep a log of these items, but only write it out every 30 minutes. I would like to constantly read the port and update an array of some sort; then, when the timer event fires, log the data from the array.
I am doing this in C# .NET 4.5.2.
You can use a dictionary of lists to organize this type of in-memory log. If you create, say,
public enum EventTypes
{
    A,
    B
};
and assuming your data points are integers, you could use
Dictionary<EventTypes, List<int>> inMemoryLog = new Dictionary<EventTypes, List<int>>
{
    { EventTypes.A, new List<int>() },
    { EventTypes.B, new List<int>() }
};
(each type needs its list created up front, or the first Add will throw a KeyNotFoundException). Then you could log like this:
inMemoryLog[EventTypes.A].Add(myDataPoint);
When you're ready to flush the log, persist inMemoryLog to disk, removing each entry that you have written out. A good strategy is to create a copy of inMemoryLog to write to disk (and then re-initialize the inMemoryLog instance you use for logging) so that in-memory logging can continue without interruption.
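That swap-and-flush idea can be sketched like this. It is a minimal sketch: the serial-port read loop and the actual disk write are left out, the enum is re-declared so the snippet stands alone, and the lock is there because the port reader and the 30-minute timer run on different threads.

```csharp
using System;
using System.Collections.Generic;

enum EventTypes { A, B }

static class MessageLog
{
    static readonly object _gate = new object();
    static Dictionary<EventTypes, List<int>> _log = NewLog();

    // One list per message type, created up front so Add never hits a missing key.
    static Dictionary<EventTypes, List<int>> NewLog()
    {
        var d = new Dictionary<EventTypes, List<int>>();
        foreach (EventTypes t in Enum.GetValues(typeof(EventTypes)))
            d[t] = new List<int>();
        return d;
    }

    // Called from the serial-port read loop, possibly many times per second.
    public static void Record(EventTypes type, int dataPoint)
    {
        lock (_gate) { _log[type].Add(dataPoint); }
    }

    // Called from the 30-minute timer: swap in a fresh dictionary so the
    // reader keeps logging without interruption, then persist the old one.
    public static Dictionary<EventTypes, List<int>> Flush()
    {
        Dictionary<EventTypes, List<int>> snapshot;
        lock (_gate)
        {
            snapshot = _log;
            _log = NewLog();
        }
        // ...write `snapshot` to disk here, outside the lock...
        return snapshot;
    }
}
```

Because the swap happens inside the lock but the disk write happens outside it, a slow write never blocks the port reader.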

How to detect a corrupted file before opening it in C# [closed]

Closed 7 years ago.
I have some files which have been corrupted, and I want to detect which files are corrupted before opening them. I tried the FileInfo class, but it was no use.
The easiest way to solve this issue is to have your program keep a log file of when it accesses and edits a file. By keeping track of this, if the program exited prematurely, you could easily identify that the save was interrupted. To do this, have the program add a new log entry every time it has completed saving a file, not before the save. When the program tries to open a file, check the time the file was last edited; if that time is later than the time recorded in the log file, reject it.
Of course, this system will only work on one computer. There are other ways of implementing the idea, such as having a marker at the end of the file: if the marker does not exist, you know that the file is corrupt. Open your mind up to more ideas and try to think of other ways to solve this issue; this is just one example.
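The end-of-file marker variant can be sketched like this. The marker string MYAPP-EOF is made up; any fixed byte sequence your program writes as the very last step of a save will do.

```csharp
using System;
using System.IO;
using System.Text;

static class SaveMarker
{
    // Sentinel appended as the very last bytes of every successful save.
    // "MYAPP-EOF" is a hypothetical marker; pick any fixed byte sequence.
    static readonly byte[] Marker = Encoding.ASCII.GetBytes("MYAPP-EOF");

    public static void WriteWithMarker(string path, byte[] content)
    {
        using (var fs = new FileStream(path, FileMode.Create))
        {
            fs.Write(content, 0, content.Length);
            // The marker goes on only after all the data is out, so an
            // interrupted save leaves a file without it.
            fs.Write(Marker, 0, Marker.Length);
        }
    }

    // If the save was interrupted, the trailing marker will be missing.
    public static bool LooksComplete(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length < Marker.Length) return false;
            fs.Seek(-Marker.Length, SeekOrigin.End);
            var tail = new byte[Marker.Length];
            fs.Read(tail, 0, tail.Length);
            for (int i = 0; i < Marker.Length; i++)
                if (tail[i] != Marker[i]) return false;
            return true;
        }
    }
}
```

Note this only detects truncation (the most common corruption from interrupted saves); it cannot detect bytes flipped in the middle of the file, for which you would store a checksum instead.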
1. Unfortunately, there is no easy way to determine whether a file is corrupt without actually parsing it. Problem files usually have a correct header, so the real causes of corruption lie elsewhere. A PDF file, for example, contains a cross-reference table giving the exact byte offset of each object from the start of the file, so corrupted files most likely have broken offsets or missing objects.
The best way to determine that a file is corrupted is to use a specialized library for that file type, such as a PDF library. There are lots of such libraries for .NET, both free and commercial. You can simply try to load the file with one of them; iTextSharp would be a good choice.
2. Or, if you want, you can go through this answer:
File Upload Virus Scanning (server side)
You might need to parse the header bytes of the file and make sure they are consistent with the rest of the file body, e.g.:
Reading image header info without loading the entire image
That shows how to read an image header and get the image size without loading the whole file. In the same way, you should look at the header for your desired file format and validate it per the format's rules. This is not a ready-made solution, but it may give you an idea.
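As a sketch of that header-validation idea, here is a check of a file's leading "magic" bytes against a known signature. The PNG and PDF signatures shown are the real, documented ones; the helper itself is just an illustration.

```csharp
using System;
using System.IO;
using System.Linq;

static class FileSignature
{
    // Well-known magic numbers: the first bytes of a PNG and of a PDF file.
    public static readonly byte[] PngMagic = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    public static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF"

    // Reads only the first few bytes, so even a huge file is cheap to check.
    public static bool HasSignature(string path, byte[] magic)
    {
        using (var fs = File.OpenRead(path))
        {
            var header = new byte[magic.Length];
            int read = fs.Read(header, 0, header.Length);
            return read == magic.Length && header.SequenceEqual(magic);
        }
    }
}
```

A correct header proves only that the file starts like a PNG or PDF; as noted above, real corruption usually sits deeper in the file, so this check is a cheap first filter, not a guarantee.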

How do I pass a text file as a parameter to a stored procedure from WinForms? [closed]

Closed 8 years ago.
Is it possible to send a text file of CSV values as a parameter to a stored procedure?
How do I send the text file from WinForms to the stored procedure?
How do I create a parameter in the stored procedure to accept that text file?
Thanks.
You can pass the path to the file into the stored procedure as a simple nvarchar(260) parameter, though the SQL Server process will need read access to that file path (i.e., you either need to copy the file from the WinForms app to a share on the SQL Server machine itself, or it needs to be on a network share that the SQL Server service account can access).
Once you have the path in the stored procedure, you can use a Bulk Insert process; see this thread on how to use Bulk Load from within a stored procedure:
Bulk insert using stored procedure
or this thread to load it into a temp table to work with: How to BULK INSERT a file into a *temporary* table where the filename is a variable?.
That being said, a better approach would be to convert the CSV file to XML in the WinForms app, then pass the XML as a string to the stored procedure. SQL Server 2005 and later have good support for XML parameters, enabling you to query them; personally, I would take this latter approach. The xml type is just another parameter type, so you pass it much as you would a string, and working with XML in the stored procedure will be much easier (and better supported) than working with CSV.
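The CSV-to-XML conversion on the WinForms side can be sketched with LINQ to XML. The element names are taken from the CSV header row, and the naive comma split assumes no quoted fields; treat this as a sketch, not a full CSV parser.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class CsvToXml
{
    // Converts CSV text (first line = headers) into an XML document that a
    // stored procedure can receive as an xml-typed parameter and shred with
    // the nodes()/value() methods.
    public static string Convert(string csvText)
    {
        string[] lines = csvText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        string[] headers = lines[0].Split(',');

        var rows = lines.Skip(1).Select(line =>
            new XElement("row",
                // Assumes every data line has the same column count as the header.
                line.Split(',').Select((field, i) => new XElement(headers[i], field))));

        return new XElement("rows", rows).ToString();
    }
}
```

The result would then be attached with something like `cmd.Parameters.Add("@csvData", SqlDbType.Xml).Value = CsvToXml.Convert(text);` (the parameter name is hypothetical), with the stored procedure declaring `@csvData xml`.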

Looking for an alternative to Excel spreadsheets as a means of data collection [closed]

Closed 9 years ago.
Someone wants me to implement a server-side data extraction service to extract data from Microsoft Excel 2010 spreadsheets (xlsx). A spreadsheet must have data in the correct places for the extraction to work. Is there a better alternative to using spreadsheets for data collection? I worry that users might produce a spreadsheet that fails a parsing/extraction method even though the displayed spreadsheet is understandable to a human.
For example, a user needs to type out many items, and each item will have several detail lines following it. My program will need to identify the boundary between items and then collect the detail lines that follow each one. If an extraction fails, the user will need clues to help them fix the problem and then re-submit the xlsx file.
Is there a better way? Is there something as portable as an Excel spreadsheet but with structured data that can be easily extracted?
Or could an Excel spreadsheet prepare the data as structured data, such as a JSON representation, and store it as part of the Open XML package?
You can improve data collection with Excel by using Named Ranges and adding validation code that runs on data entry to the spreadsheet. The validation code could also add metadata tags to the workbook. Your extraction program can then use the Named Ranges (and metadata) to find the data.
I would use an Access DB: very portable, and it allows you to password-protect the structure or only allow inserts via a form.
An Access DB can also be read easily via the Jet engine, so extracting data automatically in C# is fairly straightforward.
If I understood your question right, you want to store some custom XML describing your data inside your Open XML Excel file. I think you could use Custom XML Parts for that.
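Since an .xlsx file is a ZIP package, the idea of stashing structured data inside it can be sketched with nothing but the standard-library ZipArchive. Note the caveat: this adds a plain extra entry, not a true Custom XML Part; a real Custom XML Part also needs content-type and relationship entries, which the DocumentFormat.OpenXml SDK creates for you. The entry name customData/items.xml is made up.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class PackagePart
{
    // Adds an extra XML entry to an existing .xlsx (which is just a ZIP
    // package). Excel ignores entries it does not know about, and the
    // extraction service can read the structured data back without
    // parsing any worksheet.
    public static void AddPart(string xlsxPath, string entryName, string xmlContent)
    {
        using (var zip = ZipFile.Open(xlsxPath, ZipArchiveMode.Update))
        using (var writer = new StreamWriter(zip.CreateEntry(entryName).Open()))
        {
            writer.Write(xmlContent);
        }
    }

    public static string ReadPart(string xlsxPath, string entryName)
    {
        using (var zip = ZipFile.OpenRead(xlsxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry(entryName);
            if (entry == null) return null; // part not present
            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

The server-side extractor then reads only this one well-defined part, sidestepping the "data in the wrong cells" problem entirely.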
