Working with large csv files - c#

I'm trying to find the best ways to work with large data files. I have a scenario with several CSV files that I would like to be able to query. I will read one of the CSV files line by line, but I need to query a second CSV file based on a key from the line I'm currently reading. I don't want to (at least I don't think I do) load the entire CSV into an in-memory object, as the files can be millions of lines long and would eat tons of RAM. I've considered writing them to some sort of database file on the fly, but that doesn't seem efficient as you're essentially duplicating the data. Any suggestions?

You can try OleDb: load the data into a DataTable using a data adapter, then run your queries against it.
    // With the text driver, Data Source is the folder and each file acts as a table.
    string conn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Temp\;" +
        @"Extended Properties=""Text;HDR=No;FMT=Delimited""";
    DataTable dt = new DataTable();
    using (OleDbConnection cn = new OleDbConnection(conn))
    using (OleDbCommand cmd = new OleDbCommand("SELECT * FROM [teams.csv]", cn))
    using (OleDbDataAdapter da = new OleDbDataAdapter(cmd))
    {
        cn.Open();
        da.Fill(dt);
    }
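If holding the whole second file in a DataTable is too expensive, one alternative is to keep only an index of key to byte offset in memory and seek into the file on demand. The following is a minimal sketch of that idea, not a tested production reader. It assumes the key is the first column, keys are unique, the file is UTF-8, and no field contains a quoted comma or line break. The names CsvKeyIndex, Build and ReadLineAt are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvKeyIndex
{
    // Scans the stream once and maps the first column of every line
    // to the byte offset at which that line starts.
    public static Dictionary<string, long> Build(Stream stream)
    {
        var index = new Dictionary<string, long>();
        var line = new List<byte>();
        long lineStart = 0, position = 0;
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            position++;
            if (b == '\n')
            {
                AddKey(index, line, lineStart);
                line.Clear();
                lineStart = position;
            }
            else
            {
                line.Add((byte)b);
            }
        }
        AddKey(index, line, lineStart); // last line may lack a trailing newline
        return index;
    }

    private static void AddKey(Dictionary<string, long> index, List<byte> line, long offset)
    {
        string text = Encoding.UTF8.GetString(line.ToArray()).TrimEnd('\r');
        if (text.Length == 0) return; // skip blank lines
        int comma = text.IndexOf(',');
        string key = comma < 0 ? text : text.Substring(0, comma);
        index[key] = offset; // assumption: keys are unique, a repeated key overwrites
    }

    // Seeks to a previously indexed offset and returns that single line.
    public static string ReadLineAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        var bytes = new List<byte>();
        int b;
        while ((b = stream.ReadByte()) != -1 && b != '\n') bytes.Add((byte)b);
        return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r');
    }
}
```

In use, you would build the index once over a FileStream for the second file, then call ReadLineAt with index[key] for each line read from the first file. Memory then grows with the number of keys rather than with the full row contents.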

Related

asp.net oledbcommand return all rows

I am using OleDbConnection to connect to a Microsoft Access database, and OleDbCommand to retrieve some information. I have a query in the database called retrieveInfo, which retrieves 3 rows of data. There are some duplicates in the fields, but that's how it's supposed to be. My data looks like this:
Name  Email
A     A@gmail.com
B     A@gmail.com
B     C@gmail.com
My C# code behind looks like this:
DataTable dt = new DataTable();
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
    string query = "SELECT * FROM retrieveInfo";
    try
    {
        conn.Open();
        DataTable info = new DataTable();
        OleDbCommand command = new OleDbCommand(query, conn);
        OleDbDataAdapter dataAdapter = new OleDbDataAdapter(command);
        info.Clear();
        dataAdapter.Fill(info);
    }
    catch (OleDbException)
    {
        throw; // error handling omitted
    }
}
I ran the query retrieveInfo in MS Access and it returned the 3 rows shown above. However, when I run this command from C# and load the data into a DataTable, it only contains 2 rows: the 1st and the 2nd. I don't know whether this has anything to do with the original table properties, or whether my C# code is wrong. I also tried using a data reader (ExecuteReader with a while loop to read the data), but that also returned only 2 rows.
Any help would be appreciated!
Thank you
Chances are you are using a LIKE clause somewhere in retrieveInfo's definition, either in that query itself or in one of its sub-queries. Access SQL uses * as the LIKE wildcard, while queries run through OleDb expect the ANSI wildcard %, so a pattern written with * does not match the same rows when called from .NET. You need to use ANSI LIKE (see link).
From the linked Stack Overflow explanation:
Now I use
Like "oldmcdonal%" Or Like "oldmcdonal*"
This provides the expected result set regardless of whether it's executed in MS Access or via .NET code.
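When the LIKE pattern is built in .NET code rather than stored in the Access query, a small helper can translate Access-style wildcards before the pattern is passed as a parameter. This is only an illustrative sketch. The helper name is made up, and it assumes the pattern contains no bracketed literals such as [*].

```csharp
using System;

public static class AccessWildcards
{
    // Converts Access-style LIKE wildcards to the ANSI ones used over OLE DB:
    // '*' (any run of characters) becomes '%', '?' (one character) becomes '_'.
    // Assumption: the pattern has no bracket-escaped wildcards.
    public static string ToAnsi92(string pattern)
    {
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));
        return pattern.Replace('*', '%').Replace('?', '_');
    }
}
```

For example, AccessWildcards.ToAnsi92("oldmcdonal*") yields "oldmcdonal%", which is the form the OleDb provider expects.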

Rhino ETL - loading large pipe-delimited files

We have to load large pipe-delimited files. When loading these into a SQL Server DB using Rhino ETL (which relies on FileHelpers), is it mandatory to provide a record class?
We have to load the files into different tables which have dozens of columns, so it might take us a whole day to write the record classes by hand. I guess we could write a small tool to generate the record classes from the SQL Server tables.
Another approach would be to write an IDataReader wrapper for a FileStream and then pass it on to a SqlBulkCopy.
SqlBulkCopy does require column mappings as well, but it allows column ordinals, so that's easy.
Any ideas/suggestions?
Thanks.
I don't know much about Rhino ETL, but FileHelpers has a ClassBuilder which allows you to generate the record class at run time. See the documentation for some examples.
So it would be easy to generate a class with something like the following:
SqlCommand command = new SqlCommand("SELECT TOP 1 * FROM Customers;", connection);
connection.Open();

// get the schema for the customers table
SqlDataReader reader = command.ExecuteReader();
DataTable schemaTable = reader.GetSchemaTable();

// create the FileHelpers record class
// alternatively there is a 'FixedClassBuilder'
DelimitedClassBuilder cb = new DelimitedClassBuilder("Customers", ",");
cb.IgnoreFirstLines = 1;
cb.IgnoreEmptyLines = true;

// populate the fields based on the columns
foreach (DataRow row in schemaTable.Rows)
{
    cb.AddField(row.Field<string>("ColumnName"), row.Field<Type>("DataType"));
    cb.LastField.TrimMode = TrimMode.Both;
}

// load the dynamically created class into a FileHelpers engine
FileHelperEngine engine = new FileHelperEngine(cb.CreateRecordClass());

// import your records
DataTable dt = engine.ReadFileAsDT("testCustomers.txt");
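The other route mentioned in the question, feeding SqlBulkCopy by column ordinal without any record class, might look roughly like the sketch below. Instead of a full IDataReader wrapper, it streams the file as small DataTable batches, which SqlBulkCopy.WriteToServer also accepts. The class name, the all-string column typing, and the "dbo.Target" table in the commented usage are assumptions made for illustration. Fields containing an escaped or quoted pipe are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

public static class PipeFileBatcher
{
    // Streams a pipe-delimited file as DataTables of at most batchSize rows,
    // so only one batch is held in memory at a time.
    public static IEnumerable<DataTable> ReadBatches(TextReader reader, int columnCount, int batchSize)
    {
        DataTable batch = NewTable(columnCount);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            string[] fields = line.Split('|');
            if (fields.Length != columnCount)
                throw new InvalidDataException("Unexpected field count in line: " + line);
            batch.Rows.Add((object[])fields);
            if (batch.Rows.Count == batchSize)
            {
                yield return batch;
                batch = NewTable(columnCount);
            }
        }
        if (batch.Rows.Count > 0) yield return batch; // final partial batch
    }

    // Every column is typed as string; SQL Server converts on insert.
    private static DataTable NewTable(int columnCount)
    {
        var table = new DataTable();
        for (int i = 0; i < columnCount; i++)
            table.Columns.Add("Column" + i, typeof(string));
        return table;
    }
}

// Hypothetical usage with SqlBulkCopy, mapping columns by ordinal:
//
// using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.Target" })
// using (var reader = new StreamReader(path))
// {
//     for (int i = 0; i < columnCount; i++) bulk.ColumnMappings.Add(i, i);
//     foreach (DataTable batch in PipeFileBatcher.ReadBatches(reader, columnCount, 5000))
//         bulk.WriteToServer(batch);
// }
```

Batching keeps memory use bounded for very large files, at the cost of one WriteToServer call per batch.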

Put SqlDataReader data into a DataTable while i process it

I want to query a database and process the results. While I'm processing them, some of them will need to be inserted into another database. Since I cannot run another query with an open SqlDataReader (that I know of), I was thinking about putting the data from the SqlDataReader into a DataTable while I process it. Is there a built-in way to do this, or is there another solution that can accomplish the same idea?
SqlDataReader reader = com.ExecuteReader();
DataTable dt = new DataTable();
dt.Load(reader);
With the rest of the standard set-up and tear-down of SqlCommand objects, of course.
The easiest way would be to use a DataAdapter to fill a datatable. Then process the data and update the database. Filling a dataset does not tie up the connection once the fill is complete.

C# OLEDBConnection to Excel

I am copying an Excel sheet into a Datatable as such:
OleDbCommand command = new OleDbCommand();
command = new OleDbCommand("Select * from [working sheet$]", oleDBConnection);
OleDbDataAdapter dataAdapter = new OleDbDataAdapter();
dataAdapter.SelectCommand = command;
dataAdapter.Fill(dt);
Is there a similar method where I can simply copy the DataTable back to an Excel sheet? The examples I keep finding copy cell by cell, but this can be noticeably slow with large data sets.
Thanks
You're looking for the DataAdapter.Update method, which applies any changes made in the DataTable to the database (or spreadsheet, in this case).
I don't really know much about OleDb with Excel, but since you mentioned a database, I assume this runs on a server? Microsoft actually does not recommend running Excel on a server.
I would use OpenXML for tasks like this. It's a bit more complicated, but it's safe and stable.

White spaces and oledb

I'm reading an Excel file using OleDb on ASP.NET (C#).
All the information is returned OK, and I was surprised to see that even the cell type defined in the Excel file is returned to my code.
The problem is that I have a column where all the cells are of type "General", and since the values are only numbers, Excel assumes the column to be Number.
If there are no spaces, the OleDb driver returns the right value to my code, but if there are spaces it returns ""...
Here's how I'm getting the information:
OleDbConnection oleDbConn = new OleDbConnection(connString);
oleDbConn.Open();
OleDbCommand oleDbComm = new OleDbCommand("SELECT * FROM [Sheet1$]", oleDbConn);
OleDbDataAdapter oleDbDtAdapter = new OleDbDataAdapter();
oleDbDtAdapter.SelectCommand = oleDbComm;
DataSet dtSet = new DataSet();
oleDbDtAdapter.Fill(dtSet, "SMSs");
Object testZeroZero = dtSet.Tables[0].Rows[0][0];
I can't go into Excel and change the cell type to "Text", because the end user should not have to worry about changing this. How can I overcome this?
Regards!
Have you considered your connection string?
"IMEX=1;" tells the driver to always read "intermixed" (numbers, dates, strings etc) data columns as text. Note that this option might affect excel sheet write access negative.
-- http://www.connectionstrings.com/excel
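As a rough illustration of where IMEX=1 goes, the helper below builds a Jet connection string for an .xls workbook. The helper name is made up, and the Excel 8.0 and HDR=Yes settings are assumptions, since the question does not show its connString. Adjust them to match the actual workbook.

```csharp
using System;

public static class ExcelConnection
{
    // Builds a Jet OLE DB connection string for an .xls workbook with IMEX=1,
    // so columns with intermixed data are read as text.
    // Assumptions: Excel 8.0 format, and the first row holds headers (HDR=Yes).
    public static string Build(string workbookPath)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;" +
               "Data Source=" + workbookPath + ";" +
               "Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";";
    }
}
```

The result can be passed straight to the OleDbConnection constructor in place of connString, for example new OleDbConnection(ExcelConnection.Build(path)).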
