CsvHelper - Set the header row and data row - c#

I have sample data that looks like this:
1 This is a random line in the file
2
3 SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2
4
5 Another random line in the file
6
7
8
9
10 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
11 ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
12 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
13 FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
14 GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER
I am unable to save changes to the file, i.e. I can't manipulate or clean the original files before consumption. Any manipulation has to be done on the fly, in memory. This also has to work for large files (I am currently testing with some files that have 5m+ records).
I am using CsvHelper
I have already referred to the following threads for guidance:
CSVHelper to skip record before header
Better way to skip extraneous lines at the start?
How to read a header from a specific line with CsvHelper?
What I would like to do is:
Set row where header is = 3 (I will know where the header is)
Set row where data starts = 10 (I will know where the data starts from)
Load data into data table, to be displayed into datagridview
If I need to perform some stream manipulation before passing the data to CsvHelper, please let me know if that is the missing piece (any help on how to achieve that in a single block of code would be greatly appreciated).
So far I have come up with the below:
string filepath = Path.Combine(txtTst04_File_Location.Text, txtTst04_File_Name.Text);
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
// skip rows to get the header
for (int i = 0; i < 4; i++)
{
csv.Read();
}
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
csv.Configuration.HasHeaderRecord = true;
// how do I set the row where the actual data starts?
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
I get the below result:
Do let me know if you would like me to expand on any point.
thanks!
EDIT:
I created a new linked post that tries to reach the same objective in a different way, but it runs into a new error:
Filestream and datagridview memory issue with CsvHelper

I can get it to work with ShouldSkipRecord. The only problem is it will fail if any of the random lines has a "|" delimiter in it.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.ShouldSkipRecord = row => row.Length == 1;
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
If you know how many columns there are, you could set it to skip any rows that have fewer than that many columns.
csv.Configuration.ShouldSkipRecord = row => row.Length < 6;

I came up with another approach that allows you to skip the lines to the header and then to the records.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
// skip to header
for (int i = 0; i < 3; i++)
{
csv.Read();
}
csv.ReadHeader();
var headers = csv.Context.HeaderRecord;
// skip to records
for (int i = 0; i < 6; i++)
{
csv.Read();
}
var dt = new DataTable();
foreach (var header in headers)
{
dt.Columns.Add(header);
}
while (csv.Read())
{
var row = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
row[i] = csv.GetField(i);
}
dt.Rows.Add(row);
}
}
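If the header and data line numbers are known up front, the same skipping can also be done with a plain TextReader before any CSV library is involved. The following is a minimal sketch of that idea, not CsvHelper's API: the class name OffsetCsvLoader and its parameters are hypothetical, line numbers are 1-based as in the question, and the naive Split assumes no quoted fields contain the delimiter.

```csharp
using System;
using System.Data;
using System.IO;

static class OffsetCsvLoader
{
    // headerLine and dataStartLine are 1-based line numbers (hypothetical parameters).
    public static DataTable Load(TextReader reader, int headerLine, int dataStartLine, char delimiter)
    {
        if (headerLine < 1 || dataStartLine <= headerLine)
            throw new ArgumentException("Expected 1 <= headerLine < dataStartLine.");

        var table = new DataTable();
        string line;
        int lineNumber = 0;

        // Read one line at a time so the raw file is never held in memory as a whole.
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;

            if (lineNumber == headerLine)
            {
                // Build the columns from the header line.
                foreach (var name in line.Split(delimiter))
                    table.Columns.Add(name);
            }
            else if (lineNumber >= dataStartLine && line.Length > 0)
            {
                // Naive split: does not handle quoted fields containing the delimiter.
                var fields = line.Split(delimiter);
                var row = table.NewRow();
                for (int i = 0; i < fields.Length && i < table.Columns.Count; i++)
                    row[i] = fields[i];
                table.Rows.Add(row);
            }
            // Every other line (before the header, or between header and data) is skipped.
        }
        return table;
    }
}
```

For the sample file this would be called as OffsetCsvLoader.Load(new StreamReader(filepath), 3, 10, '|'). The file is streamed, but the resulting DataTable still holds every record in memory, which may matter at 5m+ rows.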

Related

Can multiple zip file entries be active using ZipOutputStream class?

I am trying to use DotNetZip open source library for creating large zip files.
I need to write part of each data row of the data table to each stream writer (see the code below). Another limitation is that I can't do this in memory, because the contents are large (several gigabytes per entry).
The problem I have is that despite writing to each stream separately, the output is all written to the last entry only. The first entry is blank. Does anybody have any idea how to fix this issue?
static void Main(string fileName)
{
var dt = CreateDataTable();
var streamWriters = new StreamWriter[2];
using (var zipOutputStream = new ZipOutputStream(File.Create(fileName)))
{
for (var i = 0; i < 2; i++)
{
var entryName = "file" + i + ".txt";
zipOutputStream.PutNextEntry(entryName);
streamWriters[i] = new StreamWriter(zipOutputStream, Encoding.UTF8);
}
WriteContents(streamWriters[0], streamWriters[1], dt);
zipOutputStream.Close();
}
}
private DataTable CreateDataTable()
{
var dt = new DataTable();
dt.Columns.AddRange(new DataColumn[] { new DataColumn("col1"), new DataColumn("col2"), new DataColumn("col3"), new DataColumn("col4") });
for (int i = 0; i < 100000; i++)
{
var row = dt.NewRow();
for (int j = 0; j < 4; j++)
{
row[j] = j * 1;
}
dt.Rows.Add(row);
}
return dt;
}
private void WriteContents(StreamWriter writer1, StreamWriter writer2, DataTable dt)
{
foreach (DataRow dataRow in dt.Rows)
{
writer1.WriteLine(dataRow[0] + ", " + dataRow[1]);
writer2.WriteLine(dataRow[2] + ", " + dataRow[3]);
}
}
Expected results:
Both file0.txt and file1.txt need to be written.
Actual results:
All content is written to file1.txt only. file0.txt is blank.
This seems to be the expected behaviour according to the docs:
If you don't call Write() between two calls to PutNextEntry(), the first entry is inserted into the zip file as a file of zero size. This may be what you want.
So to me it seems that it is not possible to do what you want through the current API.
Also, as zip file is a continuous sequence of zip entries, it is probably physically impossible to create entries in parallel, as you would have to know the size of each entry before starting a new one.
Perhaps you could just create separate archives and then combine them (if I am not mistaken there was a simple API to do that)
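Since the entries of a zip file are stored one after another, one workaround is to write each entry completely before starting the next, at the cost of iterating over the rows once per entry. The sketch below illustrates this with System.IO.Compression.ZipArchive from the base class library rather than DotNetZip (a deliberate swap, so that the example is self-contained). In Create mode ZipArchive likewise allows only one open entry at a time. The class and method names are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Compression;
using System.Text;

static class SequentialZipSketch
{
    // Writes two entries one after another; each pass streams rows straight into the archive.
    public static void WriteEntries(Stream output, DataTable dt)
    {
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            WriteEntry(archive, "file0.txt", dt, 0, 1); // columns 0 and 1
            WriteEntry(archive, "file1.txt", dt, 2, 3); // columns 2 and 3
        }
    }

    static void WriteEntry(ZipArchive archive, string name, DataTable dt, int colA, int colB)
    {
        ZipArchiveEntry entry = archive.CreateEntry(name);

        // Disposing the writer closes the entry stream, which must happen
        // before the next entry is created.
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
        {
            foreach (DataRow row in dt.Rows)
                writer.WriteLine(row[colA] + ", " + row[colB]);
        }
    }
}
```

Nothing is buffered beyond the compressor's own working memory, so the size of an entry is not limited by available RAM. The trade-off is that the source data has to be enumerable more than once.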

Fill dynamic Column Data in Rows.Add in Datagrid, Winform

I'm trying to fill a Datagrid with the data fetched from a CSV file.
But I wish to add columns dynamically, as the number of columns varies between my CSV files.
I don't want to hard-code the column count in my 'Rows.Add' call as below:
dataTable.Rows.Add(totalData[0], totalData[1], totalData[2], totalData[3]);
I have tried two other approaches, but those don't do the trick either.
Below is my code-
using (var selectFileDialog = new OpenFileDialog())
{
if (selectFileDialog.ShowDialog() == DialogResult.OK)
{
string filePath = selectFileDialog.FileName.ToString();
StreamReader streamReader = new StreamReader(filePath);
string[] totalData = new string[File.ReadAllLines(filePath).Length];
DataTable dataTable = new DataTable();
//Fill DataGrid Column Names
totalData = streamReader.ReadLine().Split(';');
for(int i=0; i< totalData.Length; i++)
{ dataTable.Columns.Add(totalData[i]); }
//Fill DataGrid DATA
while (!streamReader.EndOfStream)
{
totalData = streamReader.ReadLine().Split(';');
//METHOD 1: Need a Replacement for this. Dont want a predefined it.
dataTable.Rows.Add(totalData[0], totalData[1], totalData[2], totalData[3]);
//METHOD 2: Doesn't Work. Fills the entire data in the very first column
for (int i = 0; i < totalData.Length; i++)
{ dataTable.Rows.Add(totalData[i]); }
//METHOD 3: Doesn't Work. Throws a Null Pointer Exception.
dgDataFromCSV.Rows[0].Cells[0].Value = "test";
}
dgDataFromCSV.DataSource = dataTable;
}
}
I'm open to any idea or any other approach to achieve this.
You could use dataTable.NewRow() to create a new empty DataRow, then use a loop to assign the values to the DataRow and finally attach the DataRow to the table with dataTable.Rows.Add(newDataRow).
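A minimal sketch of that suggestion, assuming the first line of the input holds unique column names and that fields never contain the delimiter. The class name DynamicColumnsSketch is hypothetical, and a TextReader is used so the same code works for a file or a string.

```csharp
using System;
using System.Data;
using System.IO;

static class DynamicColumnsSketch
{
    public static DataTable Load(TextReader reader, char delimiter)
    {
        var table = new DataTable();

        string headerLine = reader.ReadLine();
        if (headerLine == null)
            return table; // empty input: no columns, no rows

        // One column per header field, however many there are.
        foreach (var name in headerLine.Split(delimiter))
            table.Columns.Add(name);

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var fields = line.Split(delimiter);

            // Create an empty row, fill it in a loop, then attach it to the table.
            DataRow row = table.NewRow();
            int count = Math.Min(fields.Length, table.Columns.Count);
            for (int i = 0; i < count; i++)
                row[i] = fields[i];
            table.Rows.Add(row);
        }
        return table;
    }
}
```

The Math.Min guard means a short line leaves its trailing cells as DBNull and extra fields on a long line are ignored, instead of throwing.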

C#: Reading a variable structured CSV file into a datatable with a row counter

I am trying to develop a tool that will take a CSV file and import it into a datatable with the first column in the datatable being a row counter.
The CSV files are from different customers and so have different structures. Some have a header line; some have several header lines; some have no header line. They also have varying numbers of columns.
So far, I have the code below.
public void Import_CSV()
{
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter = "CSV Files (*.csv)|*.csv";
bool? result = dialog.ShowDialog();
if (result ?? false)
{
string[] headers;
string CSVFilePathName = dialog.FileName;
string delimSelect = cboDelimiter.Items.GetItemAt(cboDelimiter.SelectedIndex).ToString();
// If user hasn't selected a delimiter, assume comma
if (delimSelect == "")
{
delimSelect = ",";
}
string[] delimiterType = new string[] { delimSelect }; // use delimSelect so the comma default above takes effect
DataTable dt = new DataTable();
// Read first line of file to get number of fields and create columns and column numbers in data table
using (StreamReader sr1 = new StreamReader(CSVFilePathName))
{
headers = sr1.ReadLine().Split(delimiterType, StringSplitOptions.None);
//dt.Columns.Add("ROW", typeof(int));
//dt.Columns["ROW"].AutoIncrement = true;
//dt.Columns["ROW"].AutoIncrementSeed = 1;
//dt.Columns["ROW"].AutoIncrementStep = 1;
int colCount = 1;
foreach (string header in headers)
{
dt.Columns.Add("C" + colCount.ToString());
colCount++;
}
}
using (StreamReader sr = new StreamReader(CSVFilePathName))
{
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(delimiterType, StringSplitOptions.None);
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
dtGrid.ItemsSource = dt.DefaultView;
txtColCount.Text = dtGrid.Columns.Count.ToString();
txtRowCount.Text = dtGrid.Items.Count.ToString();
}
}
This works, in as much as it creates column headers (C1, C2....according to how many there are in the csv file) and then the rows are written in, but I want to add a column at the far left with a row number as the rows are added. In the code, you can see I've got a section commented out that creates an auto-number column, but I'm totally stuck on how the rows are written into the datatable. If I uncomment that section, I get errors as the first column in the csv file tries to write into an int field. I know you can specify which field in each row can go in which column, but that won't help here as the columns are unknown at this point. I just need it to be able to read ANY file in, regardless of the structure, but with the row counter.
Hope that makes sense.
As you note in your question, uncommenting the code that adds the first column leads to errors. This is because of your loop: it starts at 0, but column 0 is now the ROW column you added manually. You just need to skip that column by writing to index i + 1, while still reading the source array from index 0.
So the solution is:
First, uncomment the code that adds the ROW column.
Then, in your loop, introduce an offset to leave the first column untouched:
for (int i = 0; i < headers.Length; i++)
{
dr[i + 1] = rows[i];
}
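Put together, the auto-increment column and the offset loop might look like the sketch below. The class name RowCounterSketch and the array-of-arrays input are assumptions made to keep the example self-contained; in the question the fields come from Split on each line.

```csharp
using System.Data;

static class RowCounterSketch
{
    public static DataTable Build(string[][] records, int fieldCount)
    {
        var dt = new DataTable();

        // Column 0: the row counter, filled in automatically by the DataTable.
        DataColumn counter = dt.Columns.Add("ROW", typeof(int));
        counter.AutoIncrement = true;
        counter.AutoIncrementSeed = 1;
        counter.AutoIncrementStep = 1;

        // Columns 1..fieldCount: C1, C2, ... as in the question.
        for (int c = 1; c <= fieldCount; c++)
            dt.Columns.Add("C" + c);

        foreach (var fields in records)
        {
            DataRow dr = dt.NewRow(); // ROW already has its value here
            for (int i = 0; i < fields.Length && i < fieldCount; i++)
                dr[i + 1] = fields[i]; // offset by one to skip the ROW column
            dt.Rows.Add(dr);
        }
        return dt;
    }
}
```

The extra bound on i also guards against lines that have more fields than the first line, which the original loop does not handle.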

Trouble with parsing CSV files in C#

I'm trying to import a CSV file to my C# site and save it in the database. While doing research I learned about CSV parsing. I've tried to implement this but I've run into some trouble. Here is a portion of my code so far:
string fileext = Path.GetExtension(fupcsv.PostedFile.FileName);
if (fileext == ".csv")
{
string csvPath = Server.MapPath("~/CSVFiles/") + Path.GetFileName(fupcsv.PostedFile.FileName);
fupcsv.SaveAs(csvPath);
// Add Columns to Datatable to bind data
DataTable dtCSV = new DataTable();
dtCSV.Columns.AddRange(new DataColumn[2] { new DataColumn("ModuleId", typeof(int)), new DataColumn("CourseId", typeof(int))});
// Read all the lines of the text file and close it.
string[] csvData = File.ReadAllLines(csvPath);
// iterate over each row and Split it to New line.
foreach (string row in csvData)
{
// Check for is null or empty row record
if (!string.IsNullOrEmpty(row))
{
using (TextFieldParser parser = new TextFieldParser(csvPath))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Process row
string[] fields = parser.ReadFields();
int i = 1;
foreach (char cell in row)
{
dtCSV.Rows[dtCSV.Rows.Count - 1][i] = cell;
i++;
}
}
}
}
}
}
I keep getting the error "There is no row at position -1" at " dtCSV.Rows[dtCSV.Rows.Count - 1][i] = cell;"
Any help would be greatly appreciated, thanks
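You are trying to index rows that you have not created. Instead of
dtCSV.Rows[dtCSV.Rows.Count - 1][i] = cell;
create a row first, fill it, and then add it to the table:
var newRow = dtCSV.NewRow();
newRow[i] = cell;
dtCSV.Rows.Add(newRow);
I also suggest you start indexing i from 0 and not from 1.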
All right so it turns out there were a bunch of errors with your code, so I made some edits.
string fileext = Path.GetExtension(fupcsv.PostedFile.FileName);
if (fileext == ".csv")
{
string csvPath = Server.MapPath("~/CSVFiles/") + Path.GetFileName(fupcsv.PostedFile.FileName);
fupcsv.SaveAs(csvPath);
DataTable dtCSV = new DataTable();
dtCSV.Columns.AddRange(new DataColumn[2] { new DataColumn("ModuleId", typeof(int)), new DataColumn("CourseId", typeof(int))});
var csvData = File.ReadAllLines(csvPath);
bool headersSkipped = false;
foreach (string line in csvData)
{
if (!headersSkipped)
{
headersSkipped = true;
continue;
}
// Check for is null or empty row record
if (!string.IsNullOrEmpty(line))
{
//Process row
int i = 0;
var row = dtCSV.NewRow();
foreach (var cell in line.Split(','))
{
row[i] = Int32.Parse(cell);
i++;
}
dtCSV.Rows.Add(row);
dtCSV.AcceptChanges();
}
}
}
I ditched the TextFieldParser solution solely because I'm not familiar with it, but if you want to stick with it, it shouldn't be hard to reintegrate it.
Here are some of the things you got wrong:
Not calling NewRow() to create a new row, and not adding it to the table with Rows.Add(row)
Iterating through the characters in row instead of the fields you parsed
Not parsing the value of cell: its type is string, and you are trying to put it into an int column
Some other things worth noting (just to improve your code's performance and readability :))
Consider using var when declaring new variables; it saves you from having to spell out exactly what type of variable you are creating
As others said in the comments, use ReadAllLines(); it splits your text file into lines neatly, making it easier to iterate through
Most of the time when working with arrays or lists, you need to index from 0, not from 1
AcceptChanges() commits the changes you've made by resetting each row's state to Unchanged; the rows are already part of the table once Rows.Add(row) has run
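For anyone who prefers to keep TextFieldParser from the question, a sketch of how it could be combined with the fixes above follows. It assumes a reference to the Microsoft.VisualBasic assembly, a two-column ModuleId/CourseId layout as in the question, and that rows that do not parse as integers should simply be skipped. The class name and the hasHeader flag are hypothetical.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class ModuleCourseCsvSketch
{
    public static DataTable Load(TextReader reader, bool hasHeader)
    {
        var table = new DataTable();
        table.Columns.Add("ModuleId", typeof(int));
        table.Columns.Add("CourseId", typeof(int));

        // The parser is opened once for the whole input, not once per line.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");

            if (hasHeader && !parser.EndOfData)
                parser.ReadFields(); // discard the header record

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                int moduleId, courseId;

                // Skip records that are too short or not numeric (an assumption;
                // throwing or logging may be more appropriate in real code).
                if (fields == null || fields.Length < 2
                    || !int.TryParse(fields[0], out moduleId)
                    || !int.TryParse(fields[1], out courseId))
                    continue;

                table.Rows.Add(moduleId, courseId);
            }
        }
        return table;
    }
}
```

Unlike a plain Split(','), the parser copes with quoted fields, and it ignores blank lines on its own.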

CSV, add column in first position and fill the first Column with filename

I have a lot of different CSV files with data in it (including headers).
I can't figure out how to add a column in the first position and fill its cells with the filename value (in each row).
Can anybody help me?
Thanks in advance
If your CSV files are small enough to load into memory:
// #1 Read CSV File
string[] CSVDump = File.ReadAllLines(@"c:\temp.csv");
// #2 Split Data
List<List<string>> CSV = CSVDump.Select(x => x.Split(';').ToList()).ToList();
//#3 Update Data
for (int i = 0; i < CSV.Count; i++)
{
CSV[i].Insert(0, i == 0 ? "Headername" : "Filename");
}
//#4 Write CSV File
File.WriteAllLines(@"c:\temp2.csv", CSV.Select(x => string.Join(";", x)));
Adding a column at first position is a special case which can be implemented without CSV parsing. All you need is to prepend every string with the desired value and a comma. The only exception is the very first line which should be prepended with a header name:
string newColumnHeader = "FileName";
string textToPrepend = @"some\file\name";
long lineNumber = 0;
using (StreamWriter sw = File.CreateText("output.csv"))
foreach (var line in File.ReadAllLines("input.csv"))
sw.WriteLine(
(lineNumber++ == 0 ?
newColumnHeader :
textToPrepend) +
"," + line);
I would do a little more coding. First read the CSV into a DataTable:
public static DataTable ConvertCSVtoDataTable(string strFilePath)
{
    DataTable dtCSV = new DataTable();
    // Dispose the reader when done so the file handle is released
    using (StreamReader csv = new StreamReader(strFilePath))
    {
        string[] headers = csv.ReadLine().Split(',');
        foreach (string header in headers)
        {
            dtCSV.Columns.Add(header);
        }
        while (!csv.EndOfStream)
        {
            string[] rows = csv.ReadLine().Split(',');
            DataRow dr = dtCSV.NewRow();
            for (int i = 0; i < headers.Length; i++)
            {
                dr[i] = rows[i];
            }
            dtCSV.Rows.Add(dr);
        }
    }
    return dtCSV;
}
Then insert the column at the desired location:
DataColumn Col = dtCSV.Columns.Add("FileName", System.Type.GetType("System.String"));
Col.SetOrdinal(0);
Finally, write the table back to CSV:
var lines = new List<string>();
string[] columnNames = dtCSV.Columns.Cast<DataColumn>()
    .Select(column => column.ColumnName)
    .ToArray();
var header = string.Join(",", columnNames);
lines.Add(header);
var valueLines = dtCSV.AsEnumerable()
    .Select(row => string.Join(",", row.ItemArray));
lines.AddRange(valueLines);
File.WriteAllLines("File.csv", lines);
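For files too large to load into memory, the prepend approach described earlier in this thread can be done one line at a time. The sketch below assumes the first line is the header, that no record spans multiple lines, and that the prepended value does not itself contain the delimiter. The class and method names are hypothetical.

```csharp
using System.IO;

static class PrependColumnSketch
{
    // Copies input to output, adding one leading column:
    // headerName on the first line, value on every other line.
    public static void Prepend(TextReader input, TextWriter output,
                               string headerName, string value, string delimiter)
    {
        string line;
        bool isHeader = true;

        while ((line = input.ReadLine()) != null)
        {
            output.Write(isHeader ? headerName : value);
            output.Write(delimiter);
            output.WriteLine(line);
            isHeader = false;
        }
    }
}
```

To process a folder, one could open a StreamReader and a StreamWriter per file and pass Path.GetFileName(path) as the value, so that each output row carries the name of the file it came from.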
