Efficient way to read large tab delimited txt file?


I have a tab-delimited txt file with 500K records. I'm using the code below to read the data into a DataSet. With 50K records it works fine, but with 500K it throws "Exception of type 'System.OutOfMemoryException' was thrown."
What is a more efficient way to read large tab-delimited data, or how can I resolve this issue? Please give me an example.
public DataSet DataToDataSet(string fullpath, string file)
{
string sql = "SELECT * FROM " + file; // Read all the data
OleDbConnection connection = new OleDbConnection // Connection
("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fullpath + ";"
+ "Extended Properties=\"text;HDR=YES;FMT=Delimited\"");
OleDbDataAdapter ole = new OleDbDataAdapter(sql, connection); // Load the data into the adapter
DataSet dataset = new DataSet(); // To hold the data
ole.Fill(dataset); // Fill the dataset with the data from the adapter
connection.Close(); // Close the connection
connection.Dispose(); // Dispose of the connection
ole.Dispose(); // Get rid of the adapter
return dataset;
}

Use a stream approach with TextFieldParser - this way you will not load the whole file into memory in one go.
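A minimal sketch of this suggestion, assuming the file contains no quoted fields. The names TabFileReader and ReadRows are made up for illustration. TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace, so the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO;

public static class TabFileReader
{
    // Hypothetical helper: yields one row at a time, so only the current
    // row is held in memory instead of the whole file.
    public static IEnumerable<string[]> ReadRows(string path)
    {
        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            parser.HasFieldsEnclosedInQuotes = false; // assumption: plain tab-separated values

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

Each row can then be handled inside a foreach loop over TabFileReader.ReadRows(path), for example to insert it into a database in batches rather than filling one large DataSet.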

You really want to enumerate the source file and process it one line at a time. I use the following:
public static IEnumerable<string> EnumerateLines(this FileInfo file)
{
using (var stream = File.Open(file.FullName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var reader = new StreamReader(stream))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
Then you can split each line on tabs and process the file one line at a time. This keeps memory use very low during parsing; you only use memory if the application needs it.
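As a rough sketch of the "split and process" step, assuming the built-in File.ReadLines (which also reads lazily) in place of the extension method above. The names TabLineProcessor, SplitLines and CountRows are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TabLineProcessor
{
    // Lazily splits each line on tabs; nothing beyond the current line is buffered.
    public static IEnumerable<string[]> SplitLines(IEnumerable<string> lines)
    {
        foreach (var line in lines)
        {
            yield return line.Split('\t');
        }
    }

    // Example consumer: counts the rows that have the expected number of columns.
    public static int CountRows(string path, int expectedColumns)
    {
        int count = 0;
        foreach (var fields in SplitLines(File.ReadLines(path)))
        {
            if (fields.Length == expectedColumns)
            {
                count++;
            }
        }
        return count;
    }
}
```

Real processing would replace the counting with whatever the application does per row; the point is only that rows are consumed as they are read.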

Have you tried a TextReader?
using (TextReader tr = File.OpenText(YourFile))
{
string strLine;
string[] arrColumns;
while ((strLine = tr.ReadLine()) != null)
{
arrColumns = strLine.Split('\t');
// Fill your DataSet here, or do whatever else you need with the data
}
// No explicit Close() is needed; the using block disposes the reader.
}

I found FileHelpers
The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.
Maybe it can help.

Related

Read & write a single line from a file without overwrite [duplicate]

I have two text files, Source.txt and Target.txt. The source will never be modified and contains N lines of text. I want to delete a specific line of text in Target.txt and replace it with a specific line of text from Source.txt. I know which line number I need: it is line 2 in both files.
I have something like this:
string line = string.Empty;
int line_number = 1;
int line_to_edit = 2;
using StreamReader reader = new StreamReader(@"C:\target.xml");
using StreamWriter writer = new StreamWriter(@"C:\target.xml");
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
writer.WriteLine(line);
line_number++;
}
But when I open the writer, the target file gets erased. It writes the lines, but when I open the target file it only contains the copied lines; the rest is lost.
What can I do?

The easiest way is:
static void lineChanger(string newText, string fileName, int line_to_edit)
{
string[] arrLine = File.ReadAllLines(fileName);
arrLine[line_to_edit - 1] = newText;
File.WriteAllLines(fileName, arrLine);
}
Usage:
lineChanger("new content for this line" , "sample.text" , 34);

You can't rewrite a line without rewriting the entire file (unless the lines happen to be the same length). If your files are small then reading the entire target file into memory and then writing it out again might make sense. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2; // Warning: 1-based indexing!
string sourceFile = "source.txt";
string destinationFile = "target.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read the old file.
string[] lines = File.ReadAllLines(destinationFile);
// Write the new file over the old file.
using (StreamWriter writer = new StreamWriter(destinationFile))
{
for (int currentLine = 1; currentLine <= lines.Length; ++currentLine)
{
if (currentLine == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(lines[currentLine - 1]);
}
}
}
}
}
If your files are large it would be better to create a new file so that you can read streaming from one file while you write to the other. This means that you don't need to have the whole file in memory at once. You can do that like this:
using System;
using System.IO;
class Program
{
static void Main(string[] args)
{
int line_to_edit = 2;
string sourceFile = "source.txt";
string destinationFile = "target.txt";
string tempFile = "target2.txt";
// Read the appropriate line from the file.
string lineToWrite = null;
using (StreamReader reader = new StreamReader(sourceFile))
{
for (int i = 1; i <= line_to_edit; ++i)
lineToWrite = reader.ReadLine();
}
if (lineToWrite == null)
throw new InvalidDataException("Line does not exist in " + sourceFile);
// Read from the target file and write to a new file.
int line_number = 1;
string line = null;
using (StreamReader reader = new StreamReader(destinationFile))
using (StreamWriter writer = new StreamWriter(tempFile))
{
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
writer.WriteLine(line);
}
line_number++;
}
}
// TODO: Delete the old file and replace it with the new file here.
}
}
You can move the file afterwards, once you are sure that the write operation has succeeded (no exception was thrown and the writer is closed).
Note that in both cases it is a bit confusing that you are using 1-based indexing for your line numbers. It might make more sense in your code to use 0-based indexing. You can keep 1-based indexes in your program's user interface if you wish, but convert them to 0-based before passing them on.
Also, a disadvantage of directly overwriting the old file with the new file is that if it fails halfway through then you might permanently lose whatever data wasn't written. By writing to a third file first you only delete the original data after you are sure that you have another (corrected) copy of it, so you can recover the data if the computer crashes halfway through.
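The replacement step left as a TODO in the streaming example could look roughly like the sketch below. It assumes all three paths are on the same volume, and the names SafeFileSwap and Swap are hypothetical. File.Replace swaps in the new contents and keeps the previous contents in a backup file, which fits the recovery argument above.

```csharp
using System.IO;

public static class SafeFileSwap
{
    // Replaces destinationFile with tempFile. The previous contents of
    // destinationFile are kept in backupFile so they can be recovered.
    public static void Swap(string tempFile, string destinationFile, string backupFile)
    {
        if (File.Exists(destinationFile))
        {
            File.Replace(tempFile, destinationFile, backupFile);
        }
        else
        {
            // Nothing to replace or back up; just move the new file into place.
            File.Move(tempFile, destinationFile);
        }
    }
}
```

Call it only after the using blocks have closed the reader and writer, for example SafeFileSwap.Swap("target2.txt", "target.txt", "target.bak"), where "target.bak" is an assumed backup name.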
A final remark: I noticed that your files had an xml extension. You might want to consider if it makes more sense for you to use an XML parser to modify the contents of the files instead of replacing specific lines.

When you create a StreamWriter this way, it always creates the file from scratch. You will have to create a third file, copy from the target while replacing what you need, and then replace the old file.
But as far as I can see, what you need is XML manipulation; you might want to use XmlDocument and modify your file using XPath.
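A small sketch of the XmlDocument and XPath idea. The element names in the usage note and the helper names XmlNodeEditor and SetNodeText are invented for illustration, since the question does not show the structure of target.xml.

```csharp
using System;
using System.Xml;

public static class XmlNodeEditor
{
    // Loads XML text, sets the inner text of the first node matching the
    // XPath expression, and returns the modified XML.
    public static string SetNodeText(string xml, string xpath, string newText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null)
        {
            throw new InvalidOperationException("No node matches " + xpath);
        }

        node.InnerText = newText;
        return doc.OuterXml;
    }
}
```

For example, SetNodeText("<root><item>old</item><item>keep</item></root>", "/root/item[1]", "new") changes only the first item element. For a file on disk, doc.Load(path) and doc.Save(path) take the place of LoadXml and OuterXml.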

You need to open the output file for write access rather than creating a new StreamWriter, which always overwrites the output file.
FileStream stm = null;
var fi = new FileInfo(@"C:\target.xml");
if (fi.Exists)
stm = fi.OpenWrite();
Of course, you will still have to seek to the correct line in the output file, which will be hard since you can't read from it. Unless you already know the byte offset to seek to, you probably want read/write access:
FileStream stm = fi.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
With this stream, you can read until you get to the point where you want to make changes, then write. Keep in mind that you are writing bytes, not lines, so to overwrite a line you will need to write the same number of bytes as the line you want to change.
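To make the "same number of bytes" constraint concrete, here is a sketch that overwrites one line in place. It assumes a UTF-8 (or ASCII) file without a byte order mark and with \n or \r\n line endings; InPlaceLineWriter and OverwriteLine are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class InPlaceLineWriter
{
    // Overwrites line `lineNumber` (1-based) without rewriting the file.
    // Throws unless the replacement encodes to exactly as many bytes as the old line.
    public static void OverwriteLine(string path, int lineNumber, string newLine)
    {
        byte[] replacement = Encoding.UTF8.GetBytes(newLine);

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
        {
            // Skip lineNumber - 1 newline bytes to reach the start of the target line.
            int current = 1;
            while (current < lineNumber)
            {
                int b = fs.ReadByte();
                if (b == -1) throw new InvalidDataException("Line does not exist in " + path);
                if (b == '\n') current++;
            }
            long start = fs.Position;

            // Measure the existing line, excluding its line terminator.
            long length = 0;
            int next;
            while ((next = fs.ReadByte()) != -1 && next != '\n' && next != '\r')
            {
                length++;
            }

            if (length != replacement.Length)
            {
                throw new InvalidOperationException("Replacement must have the same length in bytes.");
            }

            fs.Seek(start, SeekOrigin.Begin);
            fs.Write(replacement, 0, replacement.Length);
        }
    }
}
```

If the lengths differ, the rest of the file would have to shift, which is why the other answers rewrite the whole file instead.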

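I guess the code below should work (in place of the writer part of your example). Unfortunately I have no build environment at hand, so it is written from memory, but I hope it helps.
// reader, line, line_number, line_to_edit and lineToWrite come from the surrounding code.
// Caveat: StreamReader reads ahead in buffered blocks, so the position of fs
// may not sit on a line boundary at the moment the writer writes.
using (var fs = File.Open(filePath, FileMode.Open, FileAccess.ReadWrite))
{
var destinationReader = new StreamReader(fs);
var writer = new StreamWriter(fs);
while ((line = reader.ReadLine()) != null)
{
if (line_number == line_to_edit)
{
writer.WriteLine(lineToWrite);
}
else
{
destinationReader.ReadLine();
}
line_number++;
}
writer.Flush(); // push buffered output to the stream before it is closed
}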

The solution works fine, but I needed to change a single line when the same text appears in multiple places. To do this, define a trackText, start searching from the line that contains it, and then replace the first line containing oldText with newText.
private int FindLineNumber(string fileName, string trackText, string oldText, string newText)
{
int lineNumber = 0;
bool traced = false; // becomes true once trackText has been seen
string[] textLine = System.IO.File.ReadAllLines(fileName);
for (int i = 0; i < textLine.Length; i++)
{
if (textLine[i].Contains(trackText)) // start looking for oldText from here on
traced = true;
if (traced && textLine[i].Contains(oldText)) // match text
{
textLine[i] = newText; // replace the line with the new text
System.IO.File.WriteAllLines(fileName, textLine);
lineNumber = i;
break; // leave the loop
}
}
return lineNumber;
}

Is there a way for extracting an Excel file from an ado.net query and adding custom columns to it using c#

I have an Excel template file, and I need to extract data from a SQL Server database into an Excel file that matches the template, using C#.
The problem is that I need to add another column to the Excel file in C# so that the extracted file looks like the template.
So I will not extract the Excel file's data from my web-form application directly.
I need to add some additional columns first.

Convert the SQL result to CSV and add your custom column data while writing it. CSV files open in Excel by default.
private void SQLToCSV(string query, string Filename)
{
SqlConnection conn = new SqlConnection(connection);
conn.Open();
SqlCommand cmd = new SqlCommand(query, conn);
SqlDataReader result = cmd.ExecuteReader();
using (System.IO.StreamWriter fs = new System.IO.StreamWriter(Filename))
{
// Loop through the fields and add headers
for (int i = 0; i < result.FieldCount; i++)
{
string colval = result.GetName(i); // GetName returns the column name at ordinal i
if (colval.Contains(","))
colval = "\"" + colval + "\"";
fs.Write(colval + ",");
}
//CONCATENATE THE COLUMNS YOU WANT TO ADD IN RESULT HERE
fs.WriteLine();
// Loop through the rows and output the data
while (result.Read())
{
for (int i = 0; i < result.FieldCount; i++)
{
string value = result[i].ToString();
if (value.Contains(","))
value = "\"" + value + "\"";
fs.Write(value + ",");
}
fs.WriteLine();
}
fs.Close();
}
}
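The quoting above only handles commas. A slightly more defensive sketch is shown below: it also doubles embedded quotes and quotes fields that contain line breaks, and it appends the custom column values at the end of each line. CsvLineBuilder, Escape and BuildLine are hypothetical names, and the extra values stand in for whatever custom columns the template needs.

```csharp
using System.Collections.Generic;

public static class CsvLineBuilder
{
    // Quotes a field when it contains a comma, quote or line break,
    // doubling any embedded quotes.
    public static string Escape(string value)
    {
        if (value == null) return "";
        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }

    // Joins the values read from the database with the extra (custom) column
    // values into one CSV line, without a trailing comma.
    public static string BuildLine(IEnumerable<string> dbValues, params string[] extraValues)
    {
        var fields = new List<string>();
        foreach (var v in dbValues) fields.Add(Escape(v));
        foreach (var v in extraValues) fields.Add(Escape(v));
        return string.Join(",", fields);
    }
}
```

Inside the while (result.Read()) loop, the row values could be collected into a list and written with fs.WriteLine(CsvLineBuilder.BuildLine(values, customValue)).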
You can convert the CSV to Excel:
using Excel = Microsoft.Office.Interop.Excel;
private void Convert_CSV_To_Excel()
{
// Rename .csv to .xls
System.IO.File.Move(@"d:\Test.csv", @"d:\Test.csv.xls");
var _app = new Excel.Application();
var _workbooks = _app.Workbooks;
_workbooks.OpenText("Test.csv.xls",
DataType: Excel.XlTextParsingType.xlDelimited,
TextQualifier: Excel.XlTextQualifier.xlTextQualifierNone,
ConsecutiveDelimiter: true,
Semicolon: true);
// Save in a legacy Excel format (xlExcel5 is Excel 5.0/95; xlExcel8 would be Excel 97-2003)
_workbooks[1].SaveAs("NewTest.xls", Excel.XlFileFormat.xlExcel5);
_workbooks.Close();
}

Certain files created from MemoryStream are corrupt

I have code that passes a list of objects containing the filename and binary data from the db to a loop that creates all the files. The problem that I have is the code below executes and appears to create the files correctly (filename & size is as expected) however, most files are "corrupt" when opened. The file types vary from images (jpg/png) to Word documents, Powerpoint presentations and PDF files. What is strange is that PDF files work perfectly, everything else is "corrupt"
My code is below (attachment is the object in the loop, the path is already created at this stage)
if(Directory.Exists(attachmentPath))
{
string absolutePath = attachmentPath + "\\importfiles\\" + parentfolders + "\\";
// no need to check if it exists as it will ignore if it does
Directory.CreateDirectory(absolutePath);
absolutePath += filename;
try
{
byte[] byteStream = null;
object objSave = null;
objSave = attachment.Image;
BinaryFormatter tmpBinF = new BinaryFormatter();
MemoryStream tmpMemStrm = new MemoryStream();
tmpBinF.Serialize(tmpMemStrm, objSave);
byteStream = tmpMemStrm.ToArray();
// Delete the file if it exists.
if (File.Exists(absolutePath))
{
File.Delete(absolutePath);
}
// Create the file.
using (FileStream fs = File.Create(absolutePath))
{
fs.Write(byteStream, 0, byteStream.Length);
fs.Dispose();
}
}
catch (Exception ex)
{
Exceptions.Text += ex.ToString();
}
}
I've used tips from MSDN and followed this tutorial but can't figure out why this is happening.
Thanks go to Amy for pointing out the issue with my approach. If anyone needs it, here's my updated code, which takes her answer into account. I've also extended it to add a log record to a table in the DB for later use.
if (Directory.Exists(attachmentPath))
{
// build path from the parts
string absolutePath = attachmentPath + "\\importfiles\\" + parentfolders + "\\";
// no need to check if it exists as it will ignore if it does
Directory.CreateDirectory(absolutePath);
absolutePath += filename;
byte[] file = attachment.Image;
try
{
// Delete the file if it exists.
if (File.Exists(absolutePath))
{
File.Delete(absolutePath);
}
// Create the file.
using (FileStream fs = File.Create(absolutePath))
{
fs.Write(file, 0, file.Length);
}
// start logging to the database
// add the Stored procedure
string SP = "sp_add_attachment";
// create the connection & command objects
MySqlConnection myConnection1 = new MySqlConnection(WPConnectionString);
MySqlCommand cmd1;
try
{
// open the connection
myConnection1.Open();
cmd1 = myConnection1.CreateCommand();
// assign the stored procedure string to the command
cmd1.CommandText = SP;
// define the command type
cmd1.CommandType = CommandType.StoredProcedure;
// pass the parameters to the Store Procedure
cmd1.Parameters.AddWithValue("@AttachmentID", attachment.ID);
cmd1.Parameters["@AttachmentID"].Direction = ParameterDirection.Input;
cmd1.Parameters.AddWithValue("@subpath", parentfolders);
cmd1.Parameters["@subpath"].Direction = ParameterDirection.Input;
cmd1.Parameters.AddWithValue("@filename", filename);
cmd1.Parameters["@filename"].Direction = ParameterDirection.Input;
// execute the command
int output = cmd1.ExecuteNonQuery();
// close the connection
myConnection1.Close();
}
catch (Exception ex)
{
Exceptions.Text += "MySQL Exception when logging:" + ex.ToString();
}
}
catch (Exception ex)
{
Exceptions.Text += ex.ToString();
}
}

I don't think using the BinaryFormatter is appropriate. If attachment.Image is a byte array, simply write it to the filestream. Forget the memory stream and the binary formatter entirely.
The BinaryFormatter class is used to serialize a .NET object into a byte array. You already have a byte array, though, so that step is not needed and is the source of your problem: the serialized output wraps your bytes in extra serialization data. Using the BinaryFormatter would be appropriate only if the same formatter had been used to create the blobs in the database. But you're storing files, not .NET objects, so it isn't useful here.
I'm not sure why PDFs would load when other files won't. You'd have to inspect the file using a hex editor to see what changed.
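A minimal sketch of writing the blob straight to disk, assuming attachment.Image is already a byte[] as described in the question. AttachmentWriter and Save are hypothetical names.

```csharp
using System.IO;

public static class AttachmentWriter
{
    // Writes the bytes exactly as stored. File.WriteAllBytes creates the file
    // or overwrites an existing one, so no separate delete step is needed.
    public static void Save(string directory, string fileName, byte[] content)
    {
        Directory.CreateDirectory(directory); // no-op if the directory already exists
        File.WriteAllBytes(Path.Combine(directory, fileName), content);
    }
}
```

Because nothing is added to or removed from the byte array, the file on disk is byte-for-byte identical to what was stored in the database.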

Download a .srt file with the information in a database table

I'm making this website to download subtitles. Right now I have this function to upload:
using (StreamReader sr = new StreamReader(file.InputStream, Encoding.Default, true))
{
string line;
while ((line = sr.ReadLine()) != null)
{
srtContent += line + '\0';
}
}
SubtitleFile item = new SubtitleFile();
UpdateModel(item);
item.state = State.Edit;
item.SubtitleText = srtContent;
item.name = char.ToUpper(item.name[0]) + item.name.Substring(1);
repo.AddSubtitle(item);
repo.Save();
ModelState.Clear();
This uploads srtContent to a column in my database called SubtitleText.
Now I need to be able to download it again.
So far I only have a hyperlink to a view that I call Downloader, and nothing else for the download itself.
What I'm missing is a way to take the given ID, use a StreamWriter or something similar, and put the text back into a new file named something like
Model.name + '.srt'
with all the same text as I originally copied.
Hopefully I made this understandable. All constructive help is appreciated.

Given that the information is stored in a database, we're going to use the System.Data.SqlClient namespace.
string id = "my_id";
string text = null;
string fileName = null;
using (SqlConnection myConnection = new SqlConnection("your connection string"))
using (SqlCommand query = new SqlCommand())
{
myConnection.Open();
// Parameter placeholders are written without quotes; a quoted placeholder is compared as literal text.
query.CommandText = "SELECT FileName, SubtitleText FROM Subtitles WHERE ID = @id";
query.Parameters.AddWithValue("@id", id);
query.Connection = myConnection;
using (SqlDataReader data = query.ExecuteReader())
{
while (data.Read())
{
text = (string)data["SubtitleText"];
fileName = (string)data["FileName"];
}
}
}
// File.WriteAllText creates the file itself, so no separate File.Create call is needed.
if (text != null)
File.WriteAllText(fileName + ".srt", text);
This is rough code, but it gives an idea of what you can do to achieve your goal (as I understood it). If the ID is an int, you can change the type accordingly.
Addendum: English is not my first language so excuse the mistakes.
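One detail from the question: the upload code joins the lines with '\0' rather than a line break, so the stored text needs its line breaks restored before it is written out. A sketch under that assumption follows; SubtitleDownload, RestoreLines and ToFileBytes are hypothetical names.

```csharp
using System;
using System.Text;

public static class SubtitleDownload
{
    // Rebuilds the original text from the stored value, in which every
    // line was terminated with '\0' by the upload code.
    public static string RestoreLines(string stored)
    {
        return stored.Replace("\0", Environment.NewLine);
    }

    // Encodes the restored text as UTF-8 bytes, ready to be sent as a file.
    public static byte[] ToFileBytes(string stored)
    {
        return Encoding.UTF8.GetBytes(RestoreLines(stored));
    }
}
```

In an ASP.NET MVC controller action, these bytes could then be returned with the controller's File(bytes, contentType, fileDownloadName) helper, using something like item.name + ".srt" as the download name.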

Read in xls file as well as current csv file in app

I have the following code -
private void button1_Click(object sender, EventArgs e)
{
string csv = File.ReadAllText("FilePath");
WebService.function res = new WebService.function();
XDocument doc = ConvertCsvToXML(csv, new[] { "," });
I was wondering how I could adjust the code so that it reads not only .csv files but also .xls files.
I created a public XDocument to do this -
public XDocument ConvertCsvToXML(string csvString, string[] separatorField)
{
var sep = new[] { "\n" };
string[] rows = csvString.Split(sep, StringSplitOptions.RemoveEmptyEntries);
var xsurvey = new XDocument(
new XDeclaration("1.0", "UTF-8", "yes"));
var xroot = new XElement("details");

If I understand your question correctly, you are looking to parse an Excel file as text in the same manner that you parse the CSV file. While this is possible, you should consider using the Office Interop interfaces instead. If you want to parse the raw file, you'll need to account for the different formats between Office versions and a whole slew of encoding and serialization details, which is no small task.
Here are some resources to get you started:
Reading Excel from C#
How to Automate Excel from C#

I'm not sure from your question, but if you are asking how to read an Excel file in C#, this will work:
string fileName = [insert path and name];
string connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0;data source={0}; Extended Properties=Excel 8.0;", fileName);
// Create the data adapter pointing to the spreadsheet
var oa = new OleDbDataAdapter("SELECT * FROM [xxx$]", connectionString); // xxx is the tab name
// Create a blank data set
var ds = new DataSet();
// Fill the data set using the adapter
oa.Fill(ds, "table1");
// Create a data table from the data set
DataTable dt1 = ds.Tables["table1"];
foreach (DataRow dr in dt1.Rows)
{
...
}
