How to Search Data From Huge CSV Files (20 GB) in C# ASP.NET

I want to create a program using .NET to read or search data in a 20 GB CSV file.
Is there any way to do it?
My code for the search:
string search = txtBoxSearch.Text;
string pathOnly = Path.GetDirectoryName(csvPath);
string fileName = Path.GetFileName(csvPath);
string sql = @"SELECT F1 AS StringID, F2 AS StringContent FROM [" + fileName + "] WHERE F2 LIKE '%" + search + "%'";
using (OleDbConnection connection = new OleDbConnection(
@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + pathOnly +
";Extended Properties=\"Text;HDR=No\""))
using (OleDbCommand command = new OleDbCommand(sql, connection))
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
{
DataTable dataTable = new DataTable();
adapter.Fill(dataTable);
dataTable.Columns.Add("MatchTimes", typeof(System.Int32));
foreach (DataRow row in dataTable.Rows)
{
row["MatchTimes"] = Regex.Matches(row["StringContent"].ToString(), search).Count;
}
GridViewResult.DataSource = dataTable;
GridViewResult.DataBind();
}
My code for generating the CSV file:
int records = 100000;
File.AppendAllLines(csvPath,
(from r in Enumerable.Range(0, records)
let guid = Guid.NewGuid()
let stringContent = GenerateRandomString(256000)
select $"{guid},{stringContent}"));

This really depends on exactly how you're searching. If you're just doing a single search, you could simply read the file one line at a time and do a string comparison on each line. If you do this, do not load the whole file into memory; read it one line at a time.
If you have access to a full edition of SQL Server, you could do a BULK INSERT. If you don't (e.g. you're using one of the Express editions), you might run into the maximum database size (10 GB per database since SQL Server 2008 R2 Express, 4 GB before that). In that case you could try SQLite; I've never tried this myself, but in theory at least the database can handle multiple terabytes. Be sure to insert a large number of records in each transaction, though; if you commit after each insert, your performance will be absolutely wretched. Also, be sure that you're not creating an in-memory database, or you'll just run out of memory again.
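The line-at-a-time approach can be sketched roughly as follows. This is a minimal sketch, not the asker's code: the class and method names (CsvSearch, Search, CountOccurrences) are made up for illustration, it assumes the two-column "id,content" layout from the generator above, and it uses a case-sensitive ordinal match rather than LIKE or Regex semantics.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvSearch
{
    // Lazily yields (id, matchCount) for every line whose content contains `search`.
    // File.ReadLines streams the file, so only the current line is held in memory.
    public static IEnumerable<KeyValuePair<string, int>> Search(string csvPath, string search)
    {
        foreach (string line in File.ReadLines(csvPath))
        {
            int comma = line.IndexOf(',');
            if (comma < 0) continue; // skip lines without a separator

            string id = line.Substring(0, comma);
            string content = line.Substring(comma + 1);

            int count = CountOccurrences(content, search);
            if (count > 0)
                yield return new KeyValuePair<string, int>(id, count);
        }
    }

    // Counts non-overlapping, case-sensitive occurrences of `search` in `text`.
    public static int CountOccurrences(string text, string search)
    {
        if (string.IsNullOrEmpty(search)) return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(search, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += search.Length;
        }
        return count;
    }
}
```

Because the result is lazy, a caller can stop early (for example with Take(100)) without reading the rest of the file. Binding every hit from a 20 GB file to a grid would still be a problem, so some paging or limit is assumed on the caller's side.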

Related

SQL Bulk Insert in C# not inserting values

I'm completely new to C#, so I'm sure I'm going to get a lot of comments about how my code is formatted - I welcome them. Please feel free to throw any advice or constructive criticisms you might have along the way.
I'm building a very simple Windows Form App that is eventually supposed to take data from an Excel file of varying size, potentially several times per day, and insert it into a table in SQL Server 2005. Thereafter, a stored procedure within the database takes over to perform various update and insert tasks depending on the values inserted into this table.
For this reason, I've decided to use the SQL Bulk Insert method, since I can't know if the user will only insert 10 rows - or 10,000 - at any given execution.
The function I'm using looks like this:
public void BulkImportFromExcel(string excelFilePath)
{
excelApp = new Excel.Application();
excelBook = excelApp.Workbooks.Open(excelFilePath);
excelSheet = excelBook.Worksheets.get_Item(sheetName);
excelRange = excelSheet.UsedRange;
excelBook.Close(0);
try
{
using (SqlConnection sqlConn = new SqlConnection())
{
sqlConn.ConnectionString =
"Data Source=" + serverName + ";" +
"Initial Catalog=" + dbName + ";" +
"User id=" + dbUserName + ";" +
"Password=" + dbPassword + ";";
using (OleDbConnection excelConn = new OleDbConnection())
{
excelQuery = "SELECT InvLakNo FROM [" + sheetName + "$]";
excelConn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFilePath + ";Extended Properties='Excel 8.0;HDR=Yes'";
excelConn.Open();
using (OleDbCommand oleDBCmd = new OleDbCommand(excelQuery, excelConn))
{
OleDbDataReader dataReader = oleDBCmd.ExecuteReader();
using (SqlBulkCopy bulkImport = new SqlBulkCopy(sqlConn.ConnectionString))
{
bulkImport.DestinationTableName = sqlTable;
SqlBulkCopyColumnMapping InvLakNo = new SqlBulkCopyColumnMapping("InvLakNo", "InvLakNo");
bulkImport.ColumnMappings.Add(InvLakNo);
sqlQuery = "IF OBJECT_ID('ImportFromExcel') IS NOT NULL BEGIN SELECT * INTO [" + DateTime.Now.ToString().Replace(" ", "_") + "_ImportFromExcel] FROM ImportFromExcel; DROP TABLE ImportFromExcel; END CREATE TABLE ImportFromExcel (InvLakNo INT);";
using (SqlCommand sqlCmd = new SqlCommand(sqlQuery, sqlConn))
{
sqlConn.Open();
sqlCmd.ExecuteNonQuery();
while (dataReader.Read())
{
bulkImport.WriteToServer(dataReader);
}
}
}
}
}
}
}
catch(Exception ex)
{
MessageBox.Show(ex.ToString());
}
finally
{
excelApp.Quit();
}
}
The function runs without errors or warnings, and if I replace the WriteToServer with manual SQL commands, the rows are inserted; but the bulkImport isn't inserting anything.
NOTE: There is only one field in this example, and in the actual function I'm currently running to test; but in the end there will be dozens and dozens of fields being inserted, and I'll be doing a ColumnMapping for all of them.
Also, as stated, I am aware that my code is probably horrible - please feel free to give me any pointers you deem helpful. I'm ready and willing to learn.
Thanks!
I think it would be a very long and messy answer if I commented on your code and also gave sample code in the same message, so I decided to divide them into two messages. Comments first:
What are you using automation for? As I see it, you already have the sheet name, and worse, you are calling app.Quit() at the end. Remove that automation code completely.
If you need some information from Excel (like sheet names or column names), you can use OleDbConnection's GetOleDbSchemaTable method.
You might do the mapping basically in 2 ways:
Excel column ordinal to SQL table column name
Excel column name to SQL table column name
Either would do. In generic code, assuming the column names are the same in both sources but their ordinals and count may differ, you could get the column names from the OleDbConnection schema table and do the mapping in a loop.
You are dropping and creating a table named "ImportFromExcel" for temporary data insertion, so why not simply create a temporary SQL Server table by using a # prefix in the table name? On the other hand, that piece of code is a little weird: if "ImportFromExcel" exists, it copies its contents into a new table, then drops and recreates it and attempts the bulk import into the new one. On the first run, SqlBulkCopy (SBC) would fill ImportFromExcel; on the next run, its contents would be copied to a table named (DateTime.Now ...) and the table would be emptied via drop and create again. By the way, the naming:
DateTime.Now.ToString().Replace(" ", "_") + "_ImportFromExcel"
doesn't feel right. While it looks tempting, it is not sortable; you would probably want something like this instead:
DateTime.Now.ToString("yyyyMMddHHmmss") + "_ImportFromExcel"
Or better yet:
"ImportFromExcel_" +DateTime.Now.ToString("yyyyMMddHHmmss")
so the table names sort chronologically, and all the imports can be selected with a wildcard or looped over if needed.
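To illustrate why the fixed-width format matters, here is a small sketch. The helper name ImportTableNames.For is hypothetical, and passing CultureInfo.InvariantCulture is an added assumption to keep the digits independent of the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ImportTableNames
{
    // Every field is zero-padded to a fixed width, so ordinal string order
    // is the same as chronological order.
    public static string For(DateTime timestamp)
    {
        return "ImportFromExcel_" +
               timestamp.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }
}
```

With names built this way, sorting the strings with StringComparer.Ordinal lists the imports oldest first, and a pattern such as ImportFromExcel_2015% selects one year's imports.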
Then you are writing to the server inside a reader.Read() loop. That is not how WriteToServer works: it reads the rows itself, so you should not call reader.Read() but simply:
sbc.WriteToServer(reader);
In my next message I will give a simple schema-reading sample and a simple SBC sample from Excel into a temp table, as well as a suggestion for how you should do that instead.
Here is the sample for reading schema information from Excel (here we read the tablenames - sheet names with tables in them):
private IEnumerable<string> GetTablesFromExcel(string dataSource)
{
IEnumerable<string> tables;
using (OleDbConnection con = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;" +
string.Format("Data Source={0};", dataSource) +
"Extended Properties=\"Excel 12.0;HDR=Yes\""))
{
con.Open();
var schemaTable = con.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
tables = schemaTable.AsEnumerable().Select(t => t.Field<string>("TABLE_NAME"));
con.Close();
}
return tables;
}
And here is a sample that does SBC from excel into a temp table:
void Main()
{
string sqlConnectionString = @"server=.\SQLExpress;Trusted_Connection=yes;Database=Test";
string path = @"C:\Users\Cetin\Documents\ExcelFill.xlsx"; // sample excel sheet
string sheetName = "Sheet1$";
using (OleDbConnection cn = new OleDbConnection(
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+path+
";Extended Properties=\"Excel 8.0;HDR=Yes\""))
using (SqlConnection scn = new SqlConnection( sqlConnectionString ))
{
scn.Open();
// create temp SQL server table
new SqlCommand(@"create table #ExcelData
(
[Id] int,
[Barkod] varchar(20)
)", scn).ExecuteNonQuery();
// get data from Excel and write to server via SBC
OleDbCommand cmd = new OleDbCommand(String.Format("select * from [{0}]",sheetName), cn);
SqlBulkCopy sbc = new SqlBulkCopy(scn);
// Mapping sample using column ordinals
sbc.ColumnMappings.Add(0,"[Id]");
sbc.ColumnMappings.Add(1,"[Barkod]");
cn.Open();
OleDbDataReader rdr = cmd.ExecuteReader();
// SqlBulkCopy properties
sbc.DestinationTableName = "#ExcelData";
// write to server via reader
sbc.WriteToServer(rdr);
if (!rdr.IsClosed) { rdr.Close(); }
cn.Close();
// Excel data is now in SQL server temp table
// It might be used to do any internal insert/update
// i.e.: Select into myTable+DateTime.Now
new SqlCommand(string.Format(@"select * into [{0}]
from [#ExcelData]",
"ImportFromExcel_" +DateTime.Now.ToString("yyyyMMddHHmmss")),scn)
.ExecuteNonQuery();
scn.Close();
}
}
While this would work, in the long run you need column names, and the column types may differ. Doing all of this with SBC might be overkill; you might instead do it directly from MS SQL Server with OpenQuery:
SELECT * into ... from OpenQuery(...)
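WriteToServer(IDataReader) calls IDataReader.Read() internally, so you should not call Read() yourself. The copy starts at the reader's next available row, so every Read() you call beforehand skips a row: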
using (SqlCommand sqlCmd = new SqlCommand(sqlQuery, sqlConn))
{
sqlConn.Open();
sqlCmd.ExecuteNonQuery();
bulkImport.WriteToServer(dataReader);
}
You can check the MSDN documentation for that method, which has a working example: https://msdn.microsoft.com/en-us/library/434atets(v=vs.110).aspx

SQL Script Has to Be Changed to Reduce SQL Server Load

A couple of years ago, a script was written to import persons into a temporary table. This script now takes 10 minutes to run, and that sometimes causes problems.
So I decided to check the code for possible optimizations, but I have no clue how to change it.
The current code looks like this:
// Getting attributes from the configfile
string filePath = getAppSetting("filepath");
string fileName = getAppSetting("filename");
string fileBP = filePath + fileName;
if (File.Exists(fileBP))
{
// Truncate Temp-Table
SqlCommand command = new SqlCommand("TRUNCATE TABLE [dbo].[temp_person];", connection);
command.ExecuteNonQuery();
FileStream logFileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
StreamReader logFileReader = new StreamReader(logFileStream, System.Text.Encoding.Default);
while (!logFileReader.EndOfStream)
{
string line = logFileReader.ReadLine();
strActLine = line;
string sql = "INSERT INTO temp_person(per_nummer, per_pid, per_name) "
+ "VALUES(@per_nummer, @per_pid, @per_name)";
SqlCommand cmd = new SqlCommand(sql, connection);
cmd.Parameters.AddWithValue("@per_nummer", isNull(line.Substring(0, 7)));
cmd.Parameters.AddWithValue("@per_pid", isNull(line.Substring(7, 7)));
cmd.Parameters.AddWithValue("@per_name", isNull((line.Substring(14, 20))));
cmd.ExecuteNonQuery();
}
// Clean up
connection.Close();
logFileReader.Close();
logFileStream.Close();
}
In this code I open a new connection for each person, and it makes no sense to do that. Is it possible to change this to a bulk insert or something like that? The file does not have any kind of separator such as ";".
I'm Using
MSSQL 2008 R2,
.Net 4.0 (higher is currently not possible on this server)
Presuming your logFileStream file ("fileName") has all the users you are loading, you are NOT opening a new connection as you think. It is currently using one connection to TRUNCATE the table, then load all entries from the file fileName.
The only way to make this run any faster would be to use SQL Server's BULK INSERT statement, details of which you can find here: https://msdn.microsoft.com/en-us/library/ms188365.aspx
Have you investigated the SqlBulkCopy class? It provides several different ways to bulk-copy data from .NET code into SQL Server. Here's an example using a DataTable to buffer the records:
if (File.Exists(fileName))
{
TruncateTempTable(connection);
DataTable newRecs = new DataTable();
newRecs.Columns.Add("per_nummer", typeof (string));
newRecs.Columns.Add("per_pid", typeof(string));
newRecs.Columns.Add("per_name", typeof(string));
using (TextReader tr = File.OpenText(fileName))
{
while (tr.Peek() >= 0) // Peek returns -1 at end of file
{
string theLine = tr.ReadLine();
DataRow newRow = newRecs.NewRow();
newRow["per_nummer"] = theLine.Substring(0, 7);
newRow["per_pid"] = theLine.Substring(7, 7);
newRow["per_name"] = theLine.Substring(14, 20);
newRecs.Rows.Add(newRow);
}
}
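using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
// WriteToServer throws if no destination table has been set
bulkCopy.DestinationTableName = "dbo.temp_person";
bulkCopy.WriteToServer(newRecs);
}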
}
Much more detail is available at the MSDN link above.
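One thing to watch with the fixed-width layout: Substring(14, 20) throws an ArgumentOutOfRangeException when a line is shorter than 34 characters, for example when trailing spaces were stripped from the name. A minimal sketch of a more forgiving parser follows. The type and method names (PersonRecord, FixedWidth.Parse) are hypothetical, the 7 + 7 + 20 layout is taken from the question, and trimming the fields is an assumption about what the target table expects.

```csharp
using System;

public sealed class PersonRecord
{
    public string Nummer;
    public string Pid;
    public string Name;
}

public static class FixedWidth
{
    private const int RecordWidth = 34; // 7 + 7 + 20 characters, as in the question

    // Pads short lines with spaces before slicing so that Substring cannot
    // run past the end of the line.
    public static PersonRecord Parse(string line)
    {
        string padded = (line ?? string.Empty).PadRight(RecordWidth);
        return new PersonRecord
        {
            Nummer = padded.Substring(0, 7).Trim(),
            Pid = padded.Substring(7, 7).Trim(),
            Name = padded.Substring(14, 20).Trim()
        };
    }
}
```

Each parsed record can then be added as a row to the DataTable that is handed to SqlBulkCopy.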
I hope this will work:
BULK INSERT Reports FROM @ReportFile WITH (FIELDTERMINATOR = '|',ROWTERMINATOR = '\n')
if the file data looks like this:
M1009|20130502|E400969|ARACIL ALONDRA A|2013050220131202201404022014040220140408
M1009|20130502|N1000533|BARRY PATRICIA| 2013050220131202201404022014040220140408
M1009|20130502|N1001263|GRAYSON JOSEPH| 2013050220131202201404022014040220140408
M1009|20130502|N1026710|GANZI LOUIS R.| 2013050220131202201404022014040220140408
This should work with T-SQL.

How to Get MYSQL Query into a TextBox?

Given the following code, how do I get the results of a MySQL query in C# into a textbox? I have tried for hours to figure it out; it should be easy, I guess. Here is my attempt. By the way, coming from procedural PHP, I'm really struggling with MySQL and C#. Should I use different code?
...
using MySql.Data.MySqlClient;
namespace houseDB1
{
public partial class Form2 : Form
{
private string server;
private string database;
private string uid;
private string password;
private MySqlConnection connection;
public Form2(string strTextBox)
{
InitializeComponent();
// richTextBox1.Text = strTextBox;
int num = int.Parse(strTextBox);
server = "localhost";
database = "realestate_db";
uid = "root";
password = "";
string connectionString;
connectionString = "SERVER=" + server + ";" + "DATABASE=" + database + ";" + "UID=" + uid + ";" + "PASSWORD=" + password + ";";
connection = new MySqlConnection(connectionString);
connection.Open();
MySqlDataAdapter mySqlDataAdapter;
mySqlDataAdapter = new MySqlDataAdapter("SELECT `ID`, `lat` , `long` FROM `house` ", connection);
DataSet DS = new DataSet();
mySqlDataAdapter.Fill(DS);
richTextBox1.Text = DS.Tables[0].ToString(); // doesn't work
}
...
Is there a particular reason you're using a TextBox? A GridView would be the normal approach.
For example you could do:
MySqlConnection conn = new MySqlConnection(connectionString);
MySqlCommand cmd = new MySqlCommand("SELECT `ID`, `lat` , `long` FROM `house`;", conn);
conn.Open();
DataTable dataTable = new DataTable();
MySqlDataAdapter da = new MySqlDataAdapter(cmd);
da.Fill(dataTable);
GridVIew.DataSource = dataTable;
GridVIew.DataBind();
Yes, you should definitely move away from doing what you're currently doing...
It's a dated way of doing things, inefficient (due to the time you waste), and can leave you prone to SQL injection attacks if your queries aren't parameterized.
What you need is called Object Relational Mapping (ORM). It's a way of communicating with your database without writing a single line of SQL, or worrying about table names, column names, or whether a column is a nullable type. You only interact with your C# objects, and the ORM persists the data if you want it to; behind the scenes it generates all the SQL that you are currently writing by hand.
Your Customers table would consist of a collection of Customer objects and your Orders table would consist of Order objects. You could retrieve all of any given customer's orders, because the Customer would have a Collection<Order> property that you could navigate to, without writing a single line of SQL.
So how do you do it?
First you need to install Entity Framework from NuGet (EntityFramework is an ORM by Microsoft)...
The shortest way of doing that is pressing CTRL + Q and typing Package Manager and selecting the first option...
Then, type this in to the console....
Install-Package EntityFramework
After that's done... install this extension which you'll use to reverse engineer your entire database and have it represented in C# classes in your project.
DS.Tables[0].Rows[0][0].ToString();// gets the first row, first cell
So you need to loop the table in order to fill the textbox:
string myRichTextTB = "";
for (int i = 0; i < DS.Tables[0].Rows.Count; i++)
{
for (int j = 0; j < 3 /* number of selected columns */; j++)
{
myRichTextTB = myRichTextTB + DS.Tables[0].Rows[i][j].ToString() + " ";
}
myRichTextTB += Environment.NewLine;
}
richTextBox1.Text = myRichTextTB;
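For larger result sets, the same loop can be written with a StringBuilder so the string is not rebuilt on every cell. This is only a sketch: the helper name TableText.Format is made up, and the tab and newline separators are an arbitrary choice.

```csharp
using System;
using System.Data;
using System.Text;

public static class TableText
{
    // Renders each row as one line with tab-separated cells.
    public static string Format(DataTable table)
    {
        var sb = new StringBuilder();
        foreach (DataRow row in table.Rows)
        {
            // string.Join calls ToString() on every cell; DBNull renders as an empty string.
            sb.AppendLine(string.Join("\t", row.ItemArray));
        }
        return sb.ToString();
    }
}
```

In the form from the question it would be used as richTextBox1.Text = TableText.Format(DS.Tables[0]);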

OleDbDataAdapter.Fill method in C# not returning any data or error

I am using a data adapter to pull data from an Access database (see the code below). When I run the SQL in the Access database, I get the expected data. However, when I step through the code, the Fill method produces only the table definition but no rows.
I have used this procedure many times in the past and it still works for those calls.
Again, the SQL in Access returns the correct data, and in C# I don't get ANY error message, but I don't get the data either. Has anyone seen this before?
public void GetQueries(ref DataTable tSQL, String tool, string Filter, OleDbConnection lConn)
{
OleDbDataAdapter dadapt = new OleDbDataAdapter(); //Data Adapter for Access
String lSQL = "";
//assign the connection to the processing mdb
//lAccProcSQL.Connection = lConn;
//Pull the queries to be executed
lSQL = "SELECT * FROM tblSQL WHERE Active = TRUE AND ToolCode = '" +
tool + "' and type not in (" + Filter + ") ORDER BY QueryNum";
//Set the adapter to point to the tblSQL table
dadapt = new OleDbDataAdapter(lSQL, lConn);
//clear tables in case of rerun
tSQL.Clear();
//Fill working queries data table
dadapt.Fill(tSQL);
}
Are you sure that the filter you've defined in the WHERE clause evaluates to true for at least some rows?
Why don't you use parameters instead of string concatenation? Are you sure that Active = True will evaluate to true? As far as I know, True is represented by -1 in Access.
So why don't you try it like this:
var command = new OleDbCommand();
command.Connection = lConn;
command.CommandText = "SELECT * FROM tblSQL WHERE Active = -1 AND ToolCode = @p_toolCode AND type NOT IN (" + Filter + ") ORDER BY QueryNum";
// OleDb binds parameters by position, not by name; OleDbType has no String member, so use VarWChar
command.Parameters.Add("@p_toolCode", OleDbType.VarWChar).Value = tool;
dadapt = new OleDbDataAdapter();
dadapt.SelectCommand = command;
dadapt.Fill(tSQL);

Passing data from one database to another database table

I want to take a backup of my Access database programmatically.
After backing up all the data, I want to delete the data from the source database
(so that querying and filtering through the application does not take much time).
The source database name is Data.mdb.
The destination database name is Backup.mdb.
Both are protected by the same password.
For this purpose I am writing a query in C# like this:
string conString = "Provider=Microsoft.Jet.OLEDB.4.0 ;Data Source=Backup.mdb;Jet OLEDB:Database Password=12345";
OleDbConnection dbconn = new OleDbConnection();
OleDbDataAdapter dAdapter = new OleDbDataAdapter();
OleDbCommand dbcommand = new OleDbCommand();
try
{
if (dbconn.State == ConnectionState.Closed)
dbconn.Open();
string selQuery = "INSERT INTO [Bill_Master] SELECT * FROM [MS Access;DATABASE="+
"\\Data.mdb" + "; Jet OLEDB:Database Password=12345;].[Bill_Master]";
dbcommand.CommandText = selQuery;
dbcommand.CommandType = CommandType.Text;
dbcommand.Connection = dbconn;
int result = dbcommand.ExecuteNonQuery();
}
catch(Exception ex) {}
Everything works fine if I try it with a database file that has no password.
I think the error is in how I pass the password in the query statement.
When I try to execute the statement as an Access query, it says "Invalid argument".
Is there any other programming approach for doing this?
Thanks
prashant
YuvaDeveloper
Are Data.mdb and Backup.mdb identical in structure? If so, I wouldn't bother copying data via SQL but would just copy the whole file.
Try removing the space between the ; and Jet …
So the format would be:
INSERT INTO [Bill_Master] SELECT * FROM [MS Access;DATABASE="+
"\\Data.mdb" + ";Jet OLEDB:Database Password=12345;].[Bill_Master]
You can copy and rename Data.mdb, and then empty all the tables in Data.mdb. That is far easier than trying to copy one table at a time.
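The file-copy approach can be sketched like this. It is a sketch under stated assumptions: the helper name AccessBackup.CopyDatabase and the timestamped file name are made up for illustration, and it assumes no connection is holding the source file open exclusively while it is copied.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class AccessBackup
{
    // Copies the database file into backupDirectory under a timestamped name
    // and returns the path of the copy.
    public static string CopyDatabase(string sourcePath, string backupDirectory, DateTime now)
    {
        Directory.CreateDirectory(backupDirectory); // does nothing if it already exists

        string name = Path.GetFileNameWithoutExtension(sourcePath)
            + "_" + now.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture)
            + Path.GetExtension(sourcePath);
        string target = Path.Combine(backupDirectory, name);

        // overwrite: false, so an existing backup is never silently replaced
        File.Copy(sourcePath, target, false);
        return target;
    }
}
```

A call such as AccessBackup.CopyDatabase("Data.mdb", "Backups", DateTime.Now) would be made after closing all connections to Data.mdb. The password travels with the file, so the copy opens with the same password as the original.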
Don't delete data. That makes analysis or inquiries a lot more difficult in the future. If queries are taking a long time, then review your indexing or consider upsizing to SQL Server. The Express edition is free and can handle databases up to 4 GB (10 GB since SQL Server 2008 R2).
