I'm importing geocode data to our database from a csv file.
I'm using the following library, A Fast CSV Reader, to read the CSV, and then SqlBulkCopy to import the data.
Here's an example of the data I'm importing
"AB10","1BH","L",0,0,,,20
"AB10","1BR","L",39320,80570,57.14214,-2.11400,21
It works fine on good data, but the top line throws an exception because the database is set up not to accept null values.
Is there a way to tell bulk copy to ignore bad data? I've tried to get the CSV reader to ignore bad lines using the built-in properties of the library, like so, but they don't appear to work:
csv.SkipEmptyLines = true;
csv.MissingFieldAction = MissingFieldAction.ParseError;
csv.DefaultParseErrorAction = ParseErrorAction.AdvanceToNextLine;
I guess another option would be to pre-parse the CSV and remove all the offending rows. Perhaps there's a better CSV library out there for .NET?
If you could post your csv reader code then we could help more. But looking at the code on your linked page, you could do something like this:
while (csv.ReadNextRecord())
{
    for (int i = 0; i < fieldCount; i++)
        Console.Write(string.Format("{0} = {1};",
                                    headers[i],
                                    csv[i] ?? "null"));

    Console.WriteLine();
}
See where I have added that null-coalescing operator? This should change your output from:
"AB10","1BH","L",0,0,,,20
to
"AB10","1BH","L",0,0,null,null,20
I used the Microsoft Text Driver to import CSV data for a project, and it worked pretty well. I defined a Schema.ini file to specify the column headers, data types, and number of rows to scan (MaxScanRows=0 scans the whole file).
I haven't tried this, but since the Microsoft Text Driver has you issue a SELECT query to pull the data out of the CSV file, I wonder if you could add criteria to filter out the null records.
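For what it's worth, here is a rough, untested sketch of that idea using the ODBC text driver. The folder path, file name, and the Latitude/Longitude column names (which would have to be defined in Schema.ini) are my assumptions, not something from the question:

using System.Data;
using System.Data.Odbc;

// Dbq points at the folder containing the CSV; the file name acts as the "table" name in the query.
string connStr = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=C:\data\;Extensions=asc,csv,tab,txt;";

var table = new DataTable();
using (var conn = new OdbcConnection(connStr))
using (var cmd = new OdbcCommand(
    "SELECT * FROM [geocodes.csv] WHERE Latitude IS NOT NULL AND Longitude IS NOT NULL", conn))
{
    conn.Open();
    new OdbcDataAdapter(cmd).Fill(table);   // rows with null coordinates never reach the DataTable
}
// 'table' can then be handed to SqlBulkCopy.WriteToServer.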
How to populate IDataReader from .csv for use with SqlBulkCopy.WriteToServer(IDataReader)
http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx
http://www.connectionstrings.com/textfile
Hope this helps.
To deal with the null entries, I ended up parsing the CSV into a DataTable object 1,000 entries at a time and importing each batch as I went.
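For anyone who lands here later, this is roughly what that looks like. It's a sketch rather than my exact code: the destination table name, the column layout, and the rule for deciding which rows are "bad" (empty latitude/longitude) are assumptions based on the sample rows above.

using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;
using LumenWorks.Framework.IO.Csv;   // the "fast CSV reader" library

public static class GeocodeImporter
{
    public static void Import(string csvPath, string connectionString)
    {
        // Hypothetical column layout matching the sample rows above.
        var batch = new DataTable();
        batch.Columns.Add("Outcode", typeof(string));
        batch.Columns.Add("Incode", typeof(string));
        batch.Columns.Add("Quality", typeof(string));
        batch.Columns.Add("Easting", typeof(int));
        batch.Columns.Add("Northing", typeof(int));
        batch.Columns.Add("Latitude", typeof(double));
        batch.Columns.Add("Longitude", typeof(double));
        batch.Columns.Add("Score", typeof(int));

        using (var csv = new CsvReader(new StreamReader(csvPath), false))   // false = no header row
        using (var bulkCopy = new SqlBulkCopy(connectionString) { DestinationTableName = "Geocodes" })
        {
            while (csv.ReadNextRecord())
            {
                // Skip the rows that would violate the NOT NULL constraints.
                if (string.IsNullOrEmpty(csv[5]) || string.IsNullOrEmpty(csv[6]))
                    continue;

                batch.Rows.Add(
                    csv[0], csv[1], csv[2],
                    int.Parse(csv[3]), int.Parse(csv[4]),
                    double.Parse(csv[5], CultureInfo.InvariantCulture),
                    double.Parse(csv[6], CultureInfo.InvariantCulture),
                    int.Parse(csv[7]));

                if (batch.Rows.Count == 1000)       // flush every 1,000 entries
                {
                    bulkCopy.WriteToServer(batch);
                    batch.Clear();
                }
            }

            if (batch.Rows.Count > 0)
                bulkCopy.WriteToServer(batch);      // remaining partial batch
        }
    }
}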
Related
As the question says, I am using the FileHelpers library to generate a CSV file alongside a report file. The report file may have different (but finite) inputs/data structures, so my CSV generation method is not explicitly typed. The CSV contains all of the report data as well as the report's header information. For the headers, I am using the class object properties because they are descriptive enough for my purposes.
My relevant code snippet is below:
// File location, where the .csv goes and gets stored.
string filePath = Path.Combine(destPath, fileName);
// First, write report header details based on header list
Type type = DetermineListType(headerValues);
var headerEngine = new FileHelperEngine(type);
headerEngine.HeaderText = headerEngine.GetFileHeader();
headerEngine.WriteFile(filePath, (IEnumerable<object>)headerValues);
// Next, append the report data below the report header data.
type = DetermineListType(reportData);
var reportDataEngine = new FileHelperEngine(type);
reportDataEngine.HeaderText = reportDataEngine.GetFileHeader();
reportDataEngine.AppendToFile(filePath, (IEnumerable<object>)reportData);
When this is executed, the CSV is successfully generated; however, the .AppendToFile() method does not add the reportDataEngine.HeaderText. I don't see this functionality for .AppendToFile() in the documentation, and I am wondering if anyone has a known work-around, or a suggestion for how to output the headers of two different class objects in a single CSV file using FileHelpers.
The desired output would look something like this, but in a single CSV file (this would be contiguous CSV, obviously, not tables):
Report_Name,Operator,Timestamp
Access Report,User1,14:50:12 28 Dec 2020

UserID,Login_Time,Logout_Time
User4,09:33:23,10:45:34
User2,11:32:11,11:44:11
User4,15:14:22,16:31:09
User1,18:55:32,19:10:10
I have also looked at the MultiRecordEngine in FileHelpers, and while I think it may be helpful, I cannot figure out from the examples how to actually write a multi-record CSV file in the fashion required above, if it is possible at all.
Thank you!
The best way is to merge the columns and make one big table, then make your classes match the columns so you can separate them out again when reading. In CSV, only the first row can define column names, and even that is optional depending on your use case. Look at CsvHelper (https://joshclose.github.io/CsvHelper/); it has a lot of built-in features and plenty of examples. Let me know if you need additional help.
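If you do want both header rows in the same file, here is a rough sketch of what that can look like with CsvHelper. The classes ReportHeader and ReportRow are placeholders you would replace with your own report types:

using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

public class ReportHeader
{
    public string Report_Name { get; set; }
    public string Operator { get; set; }
    public string Timestamp { get; set; }
}

public class ReportRow
{
    public string UserID { get; set; }
    public string Login_Time { get; set; }
    public string Logout_Time { get; set; }
}

public static class ReportCsvWriter
{
    public static void Write(string filePath, ReportHeader header, IEnumerable<ReportRow> rows)
    {
        using (var writer = new StreamWriter(filePath))
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            // First block: the report header record with its own column names.
            csv.WriteHeader<ReportHeader>();
            csv.NextRecord();
            csv.WriteRecord(header);
            csv.NextRecord();

            // Second block: a second header row, then the report data.
            csv.WriteHeader<ReportRow>();
            csv.NextRecord();
            foreach (var row in rows)
            {
                csv.WriteRecord(row);
                csv.NextRecord();
            }
        }
    }
}

The same pattern extends to more than two record types; each block is just a WriteHeader call followed by its records.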
Sorry for my English.
I have a file input where the user uploads an Excel file. First I need to check the file extension (.xlsx or .xls), then read the data from the file and save it to SQL Server.
For checking the extension I have this:
var ext = Path.GetExtension(file.FileName);
var allowedExtensions = new[] { ".xlsx", ".xls" };
if (allowedExtensions.Contains(ext)) { // code }
Now, my biggest question is how to read the file data and send it to SQL Server.
The table has these columns:
ID
Registro
Nome
Ativo
I'm a newbie in this area, so if this is simple, sorry :)
You can read this thread; it explains solidly how to read data from Excel: http://csharp.net-informations.com/excel/csharp-read-excel.htm
As for storing the data in the DB, you can create a model class with the properties you already showed us, fill those properties with the data from Excel, and then insert them into the DB.
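If you don't want to go through Office interop, one common way (just a sketch, untested against your file) is to read the sheet into a DataTable with the ACE OLE DB provider and push it to SQL Server with SqlBulkCopy. The sheet name "Sheet1" and the table name "dbo.Registros" are placeholders:

using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

public static class ExcelImporter
{
    public static void ImportToSql(string excelPath, string sqlConnectionString)
    {
        // Requires the Microsoft ACE OLE DB provider; HDR=YES treats the first row as column headers.
        string excelConnStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
                              ";Extended Properties='Excel 12.0 Xml;HDR=YES'";

        var table = new DataTable();
        using (var conn = new OleDbConnection(excelConnStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
        {
            adapter.Fill(table);   // column names come from the header row (Registro, Nome, Ativo, ...)
        }

        using (var bulkCopy = new SqlBulkCopy(sqlConnectionString))
        {
            bulkCopy.DestinationTableName = "dbo.Registros";       // placeholder table name
            bulkCopy.ColumnMappings.Add("Registro", "Registro");   // ID is assumed to be an identity, so it is not mapped
            bulkCopy.ColumnMappings.Add("Nome", "Nome");
            bulkCopy.ColumnMappings.Add("Ativo", "Ativo");
            bulkCopy.WriteToServer(table);
        }
    }
}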
I recommend using the LinqToExcel package for this. It is easy and takes very little code to get data from an Excel file. Please check this link and try it:
Get all the values from excel file by using linqtoexcel
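A minimal sketch of what LinqToExcel usage can look like (the sheet name and the property names/types are assumptions; the property names should match your column headers):

using System.Collections.Generic;
using System.Linq;
using LinqToExcel;

public class ExcelRow           // hypothetical model matching the worksheet columns
{
    public int ID { get; set; }
    public string Registro { get; set; }
    public string Nome { get; set; }
    public bool Ativo { get; set; }
}

public static class ExcelReader
{
    public static List<ExcelRow> Read(string path)
    {
        var excel = new ExcelQueryFactory(path);
        // "Sheet1" is an assumption; use your actual worksheet name.
        return (from row in excel.Worksheet<ExcelRow>("Sheet1")
                select row).ToList();
    }
}

The returned rows can then be inserted into SQL Server, for example with SqlBulkCopy or an ORM.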
Good luck
You can also use Aspose.Cells from NuGet (https://www.nuget.org/packages/Aspose.Cells/) to extract the cell information and fill your own objects with values, if your intention is not to render the .xls/.xlsx but to extract the information and then add it to the DB.
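A short sketch of that approach (untested here; it assumes the first worksheet has a header row):

using System.Data;
using Aspose.Cells;

public static class AsposeReader
{
    public static DataTable ReadFirstSheet(string path)
    {
        var workbook = new Workbook(path);
        Cells cells = workbook.Worksheets[0].Cells;

        // exportColumnName: true uses the first row as the DataTable column names.
        return cells.ExportDataTable(0, 0, cells.MaxDataRow + 1, cells.MaxDataColumn + 1, true);
    }
}

The resulting DataTable can then go straight into SqlBulkCopy.WriteToServer.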
What is the best approach to storing information gathered locally in .csv files in a C#/.NET SQL database? My reasons for asking are:
1: The data I have to handle is massive (millions of rows in each CSV).
2: The data is extremely precise, since it describes measurements on a nanoscopic scale, and is therefore delicate.
My first thought was to store each row of the CSV in a corresponding row in the database. I did this using the DataTable class. When done, I felt that if something went wrong when parsing the .csv file, I would never notice.
My second thought is to upload the .csv files to the database in their .csv format and later parse each file from the database to the local environment when the user asks for it. If this is even possible in C#/.NET with Visual Studio 2013, how could it be done in an efficient and secure manner?
I used the .NET DataStreams library for CSV reading in my project. It uses the SqlBulkCopy class, although it is not free.
Example:
using (CsvDataReader csvData = new CsvDataReader(path, ',', Encoding.UTF8))
{
    // will read in first record as a header row and
    // name columns based on the values in the header row
    csvData.Settings.HasHeaders = true;

    csvData.Columns.Add("nvarchar");
    csvData.Columns.Add("float"); // etc.

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "DestinationTable";
        bulkCopy.BulkCopyTimeout = 3600;

        // Optionally, you can declare columnmappings using the bulkCopy.ColumnMappings property
        bulkCopy.WriteToServer(csvData);
    }
}
It sounds like you are simply asking whether you should store a copy of the source CSV in the database, so if there was an import error you can check to see what happened after the fact.
In my opinion, this is probably not a great idea. It immediately makes me ask, how would you know that an error had occurred? You certainly shouldn't rely on humans noticing the mistake so you must develop a way to programmatically check for errors. If you have an automated error checking method you should apply that method when the import occurs and avoid the error in the first place. Do you see the circular logic here?
Maybe I'm missing something but I don't see the benefit of storing the CSV.
You should probably use BULK INSERT, with your CSV file as the source. But this will only work if the file is accessible from the machine that is running your SQL Server.
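For reference, this is roughly what issuing it from C# looks like; the table name, file path, and terminators are placeholders for your own setup:

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))   // connectionString assumed to exist
using (var cmd = new SqlCommand(
    @"BULK INSERT dbo.Geocodes
      FROM 'C:\data\geocodes.csv'
      WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');", conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();   // the file path is resolved on the SQL Server machine, not the client
}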
Here you can find a nice solution as well. In short, it looks like this:
StreamReader file = new StreamReader(bulk_data_filename);
CsvReader csv = new CsvReader(file, true,',');
SqlBulkCopy copy = new SqlBulkCopy(conn);
copy.DestinationTableName = tablename;
copy.WriteToServer(csv);
I need to read multiple csv files and merge them. The merged data is used for generating a chart (with the .NET chart control).
So far I've done this with a simple StreamReader and added everything to one DataTable:
while (sr.Peek() > -1)
{
    strLine = sr.ReadLine();
    strLine = strLine.TrimEnd(';');
    strArray = strLine.Split(delimiter);
    dataTableMergedData.Rows.Add(strArray);
}
But now there is a problem: the log files can change, and newer log files have additional columns.
My current procedure doesn't work anymore, and I'm asking for advice on how to handle this. Performance is important, because every log file contains about 1,500 lines and up to 100 columns, and the log files get merged over up to a one-year period (which means 365 files).
I would do it this way: create a DataTable that should contain all the data at the end, and read each log file into a separate DataTable. After each read operation I would add the separate DataTable to the "big" DataTable, check whether the columns have changed, and add the new columns if they did.
But I'm afraid that using DataTables would hurt performance.
Note: I'm doing this with winforms, but I think that doesn't matter anyway.
Edit: I tried CsvReader, but it is about 4 times slower than my current solution.
After hours of testing, I did it the way I described in the question:
First I created a DataTable which should contain all the data at the end. Then I go through all the log files in a foreach loop, and for every log file I create another DataTable and fill it with the CSV data from that file. This table gets added to the first DataTable, and even if they have different columns, the rows get added properly.
This may cost some performance compared to a simple StreamReader, but it is much easier to extend and still faster than the LumenWorks CsvReader.
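The key piece, in case it helps someone, is DataTable.Merge with MissingSchemaAction.Add, which quietly creates any columns that only exist in the newer log files. Roughly like this (ReadLogFileIntoDataTable stands in for my own CSV-reading code, and logFiles for the list of file paths):

using System.Data;

var mergedData = new DataTable();                    // holds everything at the end

foreach (string logFile in logFiles)                 // logFiles: the up-to-365 file paths
{
    DataTable fileTable = ReadLogFileIntoDataTable(logFile);   // placeholder for the StreamReader/Split code

    // preserveChanges: false, MissingSchemaAction.Add: new columns are added to mergedData automatically
    mergedData.Merge(fileTable, false, MissingSchemaAction.Add);
}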
I have a comma-delimited CSV file being loaded into a string value, generated as part of an XSLT transformation in C# (CLR assembly, console application).
I need to get this C# app to output the CSV file into a database table, one line at a time, and as I am rather inexperienced with C#, I have no idea how best to achieve this!
In SQL I managed it by the following SQL Statement:
INSERT INTO CsvData
(ID, sFilename, iLineCount, sData, dDate)
SELECT @ID, @Filename, id, val, CAST(GETDATE() AS smalldatetime)
FROM dbo.split(@CSVFile, char(13))
The dbo.split function takes the @CSVFile and its delimiter (char(13) in this case) and returns a table with one row per line in the CSV file (id = identity, val = line data), which is then used to populate the CsvData table.
I cannot pass @CSVFile as a parameter to a SQL stored proc as it can get VERY large, so I want to keep it all within the C# code.
How would I best achieve this in .NET?
You can use the SqlBulkCopy class from the .NET SqlClient provider. Check this MSDN article for its usage. It provides good performance for bulk inserts. However, you must first read your data from the CSV into a structured format, since SqlBulkCopy requires a DataTable or an IDataReader to work. You have two options:
Load your entire CSV file into a DataTable object, which may not be the best solution if your CSV file is very big.
Create a CsvDataReader as a read-only, forward-only cursor over your CSV files. You can find some implementations on the web.
Since you're saying that your CSV is being loaded into a string value, it wouldn't be a problem to fill a DataTable with the data. Use this DataTable as the argument to the SqlBulkCopy.WriteToServer method.
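A rough sketch of that, assuming the same columns as your CsvData table (I've guessed at the ID type and reused char(13) as the line delimiter):

using System;
using System.Data;
using System.Data.SqlClient;

public static class CsvLineLoader
{
    public static void Load(string csvFile, int id, string fileName, string connectionString)
    {
        // One DataRow per line, mirroring the columns populated by the SQL version.
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));            // assumed type
        table.Columns.Add("sFilename", typeof(string));
        table.Columns.Add("iLineCount", typeof(int));
        table.Columns.Add("sData", typeof(string));
        table.Columns.Add("dDate", typeof(DateTime));

        string[] lines = csvFile.Split('\r');            // same delimiter as char(13) in dbo.split
        for (int i = 0; i < lines.Length; i++)
            table.Rows.Add(id, fileName, i + 1, lines[i], DateTime.Now);

        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.CsvData";
            bulkCopy.WriteToServer(table);
        }
    }
}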
Try LINQ to CSV.
http://blogs.msdn.com/b/ericwhite/archive/2008/09/30/linq-to-text-and-linq-to-csv.aspx
This implementation would imply iterating rather than a bulk insert, though.
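To illustrate what "iterating rather than a bulk insert" would mean here, something along these lines (csvFile, connection, id, and fileName are assumed to already exist, mirroring the parameters of the SQL version):

using System;
using System.Data.SqlClient;
using System.Linq;

var lines = csvFile.Split('\r')
                   .Select((val, index) => new { LineNumber = index + 1, Value = val });

foreach (var line in lines)
{
    using (var cmd = new SqlCommand(
        "INSERT INTO CsvData (ID, sFilename, iLineCount, sData, dDate) " +
        "VALUES (@ID, @Filename, @LineCount, @Data, CAST(GETDATE() AS smalldatetime))", connection))
    {
        cmd.Parameters.AddWithValue("@ID", id);
        cmd.Parameters.AddWithValue("@Filename", fileName);
        cmd.Parameters.AddWithValue("@LineCount", line.LineNumber);
        cmd.Parameters.AddWithValue("@Data", line.Value);
        cmd.ExecuteNonQuery();                            // one round trip per line
    }
}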