Csvhelper - read / get a single column of all rows? - c#

Hi, I'm using CsvHelper to read in CSV files with a variable number of columns. The first row always contains the headers. The number of columns is unknown in advance: sometimes there are three columns and sometimes there are 30+. The number of rows can be large.
I can read in the CSV file, but how do I address each column of data? I need to do some basic stats on the data (e.g. min, max, stddev), then write them out in a non-CSV format.
Here is my code so far:
try{
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
// read the header line
csvResult.Read();
// read the whole file
dynamic recs = csvResult.GetRecords<dynamic>().ToList();
/* now how do I get a whole column ???
* recs.getColumn ???
* recs.getColumn['headername'] ???
*/
}
}
catch (Exception ex)
{
MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
}
Thanks

I don't think the library can do this directly. You have to read the column field by field, one row at a time, and add each value to a List. This is usually fast because the reader streams through the file. For example, if your desired column is of type string, the code would look like this:
List<string> myStringColumn= new List<string>();
using (var fileReader = File.OpenText(inFile))
using (var csvResult = new CsvHelper.CsvReader(fileReader))
{
while (csvResult.Read())
{
string stringField=csvResult.GetField<string>("Header Name");
myStringColumn.Add(stringField);
}
}
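Once a column has been collected into a list like this, the statistics mentioned in the question (min, max, stddev) can be computed with LINQ. The following is only a minimal sketch: ColumnStats and Summarize are hypothetical names, not part of CsvHelper, and it assumes every non-blank field parses as a number in invariant-culture format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of CsvHelper.
public static class ColumnStats
{
    // Parses the raw field strings of one column and returns the minimum,
    // the maximum and the sample standard deviation (n - 1 denominator).
    public static (double Min, double Max, double StdDev) Summarize(IEnumerable<string> fields)
    {
        // Skip blank fields; parse with the invariant culture so "1.5"
        // always means one and a half, whatever the machine's locale is.
        List<double> values = fields
            .Where(f => !string.IsNullOrWhiteSpace(f))
            .Select(f => double.Parse(f, CultureInfo.InvariantCulture))
            .ToList();

        if (values.Count == 0)
            throw new ArgumentException("Column contains no numeric values.", nameof(fields));

        double mean = values.Average();
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));

        // A single value has no spread, so report 0 instead of dividing by zero.
        double stdDev = values.Count > 1
            ? Math.Sqrt(sumOfSquares / (values.Count - 1))
            : 0.0;

        return (values.Min(), values.Max(), stdDev);
    }
}
```

With the list built above, a call would look like var stats = ColumnStats.Summarize(myStringColumn); and the three results are then available as stats.Min, stats.Max and stats.StdDev.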

using (System.IO.StreamReader file = new System.IO.StreamReader(Server.MapPath(filepath)))
{
//Csv reader reads the stream
CsvReader csvread = new CsvReader(file);
while (csvread.Read())
{
int count = csvread.FieldHeaders.Count();
if (count == 55)
{
DataRow dr = myExcelTable.NewRow();
if (csvread.GetField<string>("FirstName") != null)
{
dr["FirstName"] = csvread.GetField<string>("FirstName");
}
else
{
dr["FirstName"] = "";
}
if (csvread.GetField<string>("LastName") != null)
{
dr["LastName"] = csvread.GetField<string>("LastName");
}
else
{
dr["LastName"] = "";
}
}
else
{
lblMessage.Visible = true;
lblMessage.Text = "Columns are not in specified format.";
lblMessage.ForeColor = System.Drawing.Color.Red;
return;
}
}
}

Related

Why am i receiving a "The process cannot access the file because it is being used by another process."

I'm trying to process a set of files. I have a given number of txt files, which I'm currently joining into one txt file to apply filters to. Creating the one file from the multiple files works great, but I have two questions about an error I can't seem to get around.
1 - I'm getting an error when I try to read the newly created file so I can apply the filters: "The process cannot access the file because it is being used by another process."
2 - Am I approaching this in the correct, or most efficient, way? By that I mean: can the reading and filtering be applied before creating the concatenated file? I still need to create a new file, but it would be nice to apply everything beforehand so that the file is already cleaned and ready for use outside the application.
Here is the current code that has the issue, including the one commented-out line that was my other attempt at releasing the file:
private DataTable processFileData(string fname, string locs2 = "0", string effDate = "0", string items = "0")
{
DataTable dt = new DataTable();
string fullPath = fname;
try
{
using (StreamReader sr = new StreamReader(File.OpenRead(fullPath)))
//using (StreamReader sr = new StreamReader(File.Open(fullPath,FileMode.Open,FileAccess.Read, FileShare.Read)))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
if (!String.IsNullOrWhiteSpace(line))
{
string[] headers = line.ToUpper().Split('|');
while (dt.Columns.Count < headers.Length)
{
dt.Columns.Add();
}
string[] rows = line.ToUpper().Split('|');
DataRow dr = dt.NewRow();
for (int i = 0; i < rows.Count(); i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
//sr.Close();
sr.Dispose();
}
string cls = String.Format("Column6 NOT LIKE ('{0}')", String.Join("','", returnClass()));
dt.DefaultView.RowFilter = cls;
return dt;
}
catch (IOException ex)
{
Console.WriteLine(ex.Message);
return dt;
}
}
Here is the concatenation method:
private void Consolidate(string fileType)
{
string sourceFolder = @"H:\Merchant\Strategy\Signs\BACKUP TAG DATA\Wave 6\" + sfld;
string destinationFile = @"H:\Merchant\Strategy\Signs\BACKUP TAG DATA\Wave 6\" + sfld + @"\"+ sfld + @"_consolidation.txt";
// Specify wildcard search to match TXT files that will be combined
string[] filePaths = Directory.GetFiles(sourceFolder, fileType);
StreamWriter fileDest = new StreamWriter(destinationFile, true);
int i;
for (i = 0; i < filePaths.Length; i++)
{
string file = filePaths[i];
string[] lines = File.ReadAllLines(file);
if (i > 0)
{
lines = lines.Skip(1).ToArray(); // Skip header row for all but first file
}
foreach (string line in lines)
{
fileDest.WriteLine(line);
}
}
if (sfld == "CLR")
{
clrFilter(destinationFile);
}
if (sfld == "UPL")
{
uplFilter(destinationFile);
}
if (sfld == "HD")
{
hdFilter(destinationFile);
}
if (sfld == "PD")
{
pdFilter(destinationFile);
}
fileDest.Close();
fileDest.Dispose();
}
What I'm trying to accomplish is reading a minimum of 2 or 3 txt files, and as many as 13, and applying some filtering. But I'm getting this error:
"The process cannot access the file because it is being used by another process."
You're disposing the stream reader explicitly with the following line:
sr.Dispose();
A using statement already disposes the reader as soon as execution leaves the block, so the explicit call is redundant. Remove the Dispose line.
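Judging only from the Consolidate method as posted, another possible cause is that the filter methods are called with destinationFile while fileDest, the StreamWriter opened on that same path, has not yet been closed. The sketch below shows one way to make sure the writer is released before anything else opens the file. FileConsolidator and Merge are hypothetical names, and unlike the original it overwrites the destination rather than appending to it.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the Consolidate method in the question.
public static class FileConsolidator
{
    // Merges every file in sourceFolder that matches pattern into
    // destinationFile, keeping the header row of the first file only.
    public static void Merge(string sourceFolder, string pattern, string destinationFile)
    {
        string destinationFullPath = Path.GetFullPath(destinationFile);

        // Leave out the output file itself in case it lives in the source
        // folder and matches the pattern; sort for a predictable order.
        string[] inputs = Directory.GetFiles(sourceFolder, pattern)
            .Where(p => !string.Equals(Path.GetFullPath(p), destinationFullPath,
                                       StringComparison.OrdinalIgnoreCase))
            .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
            .ToArray();

        // append: false overwrites any earlier consolidation file.
        using (var writer = new StreamWriter(destinationFullPath, false))
        {
            for (int i = 0; i < inputs.Length; i++)
            {
                // Skip the header row for all but the first file.
                foreach (string line in File.ReadLines(inputs[i]).Skip(i > 0 ? 1 : 0))
                {
                    writer.WriteLine(line);
                }
            }
        } // the writer is flushed and its file handle released here

        // Only after this point should destinationFile be opened again,
        // for example by a filter method that reads it.
    }
}
```

Calling the filter methods after Merge returns, rather than between opening and closing the writer, means the reader never competes with a writer held by the same process.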

how to order the data being written to a text file by score in c#

I am making a quiz application for my computing coursework. I am working on the end screen, which has a method called SaveScore().
SaveScore() is meant to save the user's username, score and time into a text file. It saves the user's details into a text file called scores perfectly, but I want the data to be written to that file in order of descending score, and I can't figure out how to do that.
private void SaveScore()
{
string file = @"..\..\textfiles\scores.txt";
try
{
//
// Create file if not exists
//
if (!File.Exists(file))
{
File.Create(file).Dispose();
}
//
// Create DataTable
//
DataColumn nameColumn = new DataColumn("name", typeof(String));
DataColumn scoreColumn = new DataColumn("score", typeof(int));
DataColumn timeColumn = new DataColumn("time", typeof(long));
DataTable scores = new DataTable();
scores.Columns.Add(nameColumn);
scores.Columns.Add(scoreColumn);
scores.Columns.Add(timeColumn);
//
// Read CSV and populate DataTable
//
using (StreamReader streamReader = new StreamReader(file))
{
streamReader.ReadLine();
while (!streamReader.EndOfStream)
{
String[] row = streamReader.ReadLine().Split(',');
scores.Rows.Add(row);
}
}
Boolean scoreFound = false;
//
// If user exists and new score is higher, update
//
foreach (DataRow score in scores.Rows)
{
if ((String)score["name"] == player.Name)
{
if ((int)score["score"] < player.Score)
{
score["score"] = player.Score;
score["time"] = elapsedtime;
}
scoreFound = true;
break;
}
}
//
// If user doesn't exist then add user/score
//
if (!scoreFound)
{
scores.Rows.Add(player.Name, player.Score, elapsedtime);
}
//
// Write changes to CSV (empty then rewrite)
//
File.WriteAllText(file, string.Empty);
StringBuilder stringBuilder = new StringBuilder();
stringBuilder.AppendLine("name,score,time");
foreach (DataRow score in scores.Rows)
{
stringBuilder.AppendLine(score["name"] + "," + score["score"] + "," + score["time"]);
}
File.WriteAllText(file, stringBuilder.ToString());
}
catch (Exception ex)
{
MessageBox.Show("Error saving high score:\n\n" + ex.ToString(), "Error");
}
}
So if someone could edit my current code to save the user details in descending order of score, that would be fantastic. Thanks in advance.
You can use the DataTable.Select method to achieve that. With Select you can both filter and sort the rows of a table.
Here is the changed foreach statement, which uses that method to sort the data:
foreach (DataRow score in scores.Select(null, "score DESC"))
{
stringBuilder.AppendLine(score["name"] + "," + score["score"] + "," + score["time"]);
}
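The sort expression passed to DataTable.Select can name more than one column, which is handy when two players have the same score. A small sketch, assuming the same name, score and time columns as in the question and assuming a shorter time should win a tie; ScoreSorting and ToSortedCsvLines are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper built around the DataTable from the question.
public static class ScoreSorting
{
    // Returns the CSV lines ordered by score (highest first); rows with
    // equal scores are ordered by time (lowest first).
    public static List<string> ToSortedCsvLines(DataTable scores)
    {
        var lines = new List<string> { "name,score,time" };

        // A null filter keeps every row; the second argument is the sort expression.
        foreach (DataRow row in scores.Select(null, "score DESC, time ASC"))
        {
            lines.Add(row["name"] + "," + row["score"] + "," + row["time"]);
        }
        return lines;
    }
}
```

Because the score column is typed as int, the comparison is numeric, so 10 sorts above 9 rather than below it as it would with string sorting.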

Change the name of headers in CSV file using CSVHelper in C#

I am using the CsvHelper library to produce CSV files for the user to populate and upload into the system. My issue is that the WriteHeader method just writes the property names of a class, such as "PropertyValue", which is not user friendly. Is there a way to make the header text user friendly while still being able to map the file's data back to the class?
My code looks like the following:
public ActionResult UploadPropertyCSV(HttpPostedFileBase file)
{
List<PropertyModel> properties = new List<PropertyModel>();
RIMEDb dbContext = new RIMEDb();
bool success = false;
foreach (string requestFiles in Request.Files)
{
if (file != null && file.ContentLength > 0 && file.FileName.EndsWith(".csv"))
{
using(StreamReader str = new StreamReader(file.InputStream))
{
using(CsvHelper.CsvReader theReader = new CsvHelper.CsvReader(str))
{
while (theReader.Read())
{
RIMUtil.PropertyUploadCSVRowHelper row = new RIMUtil.PropertyUploadCSVRowHelper()
{
UnitNumber = theReader.GetField(0),
StreetNumber = theReader.GetField(1),
StreetName = theReader.GetField(2),
AlternateAddress = theReader.GetField(3),
City = theReader.GetField(4)
};
Property property = new Property();
property.UnitNumber = row.UnitNumber;
property.StreetNumber = row.StreetNumber;
property.StreetName = row.StreetName;
property.AlternateAddress = row.AlternateAddress;
property.City = dbContext.PostalCodes.Where(p => p.PostalCode1 == row.PostalCode).FirstOrDefault().City;
dbContext.Properties.Add(property);
try
{
dbContext.SaveChanges();
success = true;
}
catch(System.Data.Entity.Validation.DbEntityValidationException ex)
{
success = false;
RIMUtil.LogError("Problem validating fields in database. Please check your CSV file for errors.");
}
catch(Exception e)
{
RIMUtil.LogError("Error saving property to database. Please check your CSV file for errors.");
}
}
}
}
}
}
return Json(success);
}
I'm wondering if there's some metadata tag or attribute I can put on top of each property in my PropertyUploadCSVRowHelper class to set the text I want written to the file.
Thanks in advance
Not sure if this existed 2 years ago, but now we can change the property/column name by using the following attribute:
[CsvHelper.Configuration.Attributes.Name("Column/Field Name")]
Full code:
using CsvHelper;
using System.Collections.Generic;
using System.IO;
namespace Test
{
class Program
{
class CsvColumns
{
private string column_01;
[CsvHelper.Configuration.Attributes.Name("Column 01")] // changes header/column name Column_01 to Column 01
public string Column_01 { get => column_01; set => column_01 = value; }
}
static void Main(string[] args)
{
List<CsvColumns> csvOutput = new List<CsvColumns>();
CsvColumns rows = new CsvColumns();
rows.Column_01 = "data1";
csvOutput.Add(rows);
string filename = "test.csv";
using (StreamWriter writer = File.CreateText(filename))
{
CsvWriter csv = new CsvWriter(writer);
csv.WriteRecords(csvOutput);
}
}
}
}
This might not answer your question directly, as you said you wanted to use CsvHelper, but if you're only writing small files, this is a simple function that I use to generate CSV. Note that CsvHelper will be much better for larger files, as this just builds a string and does not stream the data.
Just customise the columns array passed to the code below to suit your needs.
public string GetCsv(string[] columns, List<object[]> data)
{
StringBuilder CsvData = new StringBuilder();
//add column headers
string[] s = new string[columns.Length];
for (Int32 j = 0; j < columns.Length; j++)
{
s[j] = columns[j];
if (s[j].Contains("\"")) //replace " with ""
s[j] = s[j].Replace("\"", "\"\""); // strings are immutable, so the result must be assigned back
if (s[j].Contains("\"") || s[j].Contains(" ")) //add "'s around any string with space or "
s[j] = "\"" + s[j] + "\"";
}
CsvData.AppendLine(string.Join(",", s));
//add rows
foreach (var row in data)
{
for (int j = 0; j < columns.Length; j++)
{
s[j] = row[j] == null ? "" : row[j].ToString();
if (s[j].Contains("\"")) //replace " with ""
s[j] = s[j].Replace("\"", "\"\""); // strings are immutable, so the result must be assigned back
if (s[j].Contains("\"") || s[j].Contains(" ")) //add "'s around any string with space or "
s[j] = "\"" + s[j] + "\"";
}
CsvData.AppendLine(string.Join(",", s));
}
return CsvData.ToString();
}
Here is a fiddle example of how to use it: https://dotnetfiddle.net/2WHf6o
Good luck.
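A hand-rolled writer like the one above also has to cope with fields that contain commas or line breaks, not just spaces and quotes. The following is a minimal sketch of the usual quoting rule (wrap the field in double quotes and double any embedded quote); SimpleCsv, Escape and Build are hypothetical names, and quoting fields with leading or trailing spaces is a defensive choice rather than a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper; for large outputs a streaming library is a better fit.
public static class SimpleCsv
{
    private static readonly char[] SpecialChars = { ',', '"', '\r', '\n' };

    // Returns the field unchanged when it is safe, otherwise wraps it in
    // double quotes and doubles any quote characters inside it.
    public static string Escape(object value)
    {
        string s = value == null ? "" : (value.ToString() ?? "");

        bool needsQuotes = s.IndexOfAny(SpecialChars) >= 0
            || s.StartsWith(" ", StringComparison.Ordinal)
            || s.EndsWith(" ", StringComparison.Ordinal);

        return needsQuotes ? "\"" + s.Replace("\"", "\"\"") + "\"" : s;
    }

    // Builds the whole CSV text in memory: one header line, then one line per row.
    public static string Build(string[] columns, IEnumerable<object[]> data)
    {
        var sb = new StringBuilder();
        sb.AppendLine(string.Join(",", columns.Select(c => Escape(c))));
        foreach (object[] row in data)
        {
            sb.AppendLine(string.Join(",", row.Select(cell => Escape(cell))));
        }
        return sb.ToString();
    }
}
```

Keeping the escaping in one method means the header row and the data rows cannot drift apart, which is how the duplicated logic in a copy-and-paste version tends to break.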

Reading and writing very large text files in C#

I have a very large file, almost 2GB in size. I am trying to write a process to read the file in and write it out without the first row. I pretty much have been only able to read and write one line at a time which takes forever. I can open it, remove the first row and save it faster in TextPad, though that is still very slow.
I use this code to get the number of records in the file:
private long getNumRows(string strFileName)
{
long lngNumRows = 0;
string strMsg;
try
{
lngNumRows = 0;
using (var strReader = File.OpenText(@strFileName))
{
while (strReader.ReadLine() != null)
{
lngNumRows++;
}
strReader.Close();
strReader.Dispose();
}
}
catch (Exception excExcept)
{
strMsg = "The File could not be read: ";
strMsg += excExcept.Message;
System.Windows.MessageBox.Show(strMsg);
//Console.WriteLine("Thee was an error reading the file: ");
//Console.WriteLine(excExcept.Message);
//Console.ReadLine();
}
return lngNumRows;
}
This only takes seconds to run. When I add the following code it takes forever to run. Am I doing something wrong? Why does the write add so much time? Any ideas on how I can make this faster?
private void ProcessTextFiles(string strFileName)
{
string strDataLine;
string strFullOutputFileName;
string strSubFileName;
int intPos;
long lngTotalRows = 0;
long lngCurrNumRows = 0;
long lngModNumber = 0;
double dblProgress = 0;
double dblProgressPct = 0;
string strPrgFileName = "";
string strOutName = "";
string strMsg;
long lngFileNumRows;
try
{
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
lngCurrNumRows++;
if (lngCurrNumRows > 1)
{
WriteDataRow(strDataLine, strFullOutputFileName);
}
}
srStreamRdr.Dispose();
}
}
catch (Exception excExcept)
{
strMsg = "The File could not be read: ";
strMsg += excExcept.Message;
System.Windows.MessageBox.Show(strMsg);
//Console.WriteLine("The File could not be read:");
//Console.WriteLine(excExcept.Message);
}
}
public void WriteDataRow(string strDataRow, string strFullFileName)
{
//using (StreamWriter file = new StreamWriter(@strFullFileName, true, Encoding.GetEncoding("iso-8859-1")))
using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
{
file.WriteLine(strDataRow);
file.Close();
}
}
Not sure how much this will improve the performance, but surely, opening and closing the output file for every line that you want to write is not a good idea.
Instead, open both files just once and then write each line directly:
using (StreamWriter file = new StreamWriter(@strFullFileName, true, System.Text.Encoding.UTF8))
using (StreamReader srStreamRdr = new StreamReader(strFileName))
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
lngCurrNumRows++;
if (lngCurrNumRows > 1)
file.WriteLine(strDataLine); // write the line just read
}
}
You could also remove the check on lngCurrNumRows by simply doing one discarded read before entering the while loop:
strDataLine = srStreamRdr.ReadLine(); // read and discard the first row
if(strDataLine != null)
{
while ((strDataLine = srStreamRdr.ReadLine()) != null)
{
file.WriteLine(strDataLine);
}
}
Depending on how much memory your machine has, you could try the following. My big file was "D:\savegrp.log" (I had a 2 GB file knocking about), and this used about 6 GB of memory when I tried it:
int counter = File.ReadAllLines(@"D:\savegrp.log").Length;
Console.WriteLine(counter);
It does depend on the memory available.
File.WriteAllLines(@"D:\savegrp2.log",File.ReadAllLines(@"D:\savegrp.log").Skip(1));
Console.WriteLine("file saved");
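If memory is the concern, File.ReadLines can stand in for File.ReadAllLines: it returns a lazy sequence, so lines are read and written one after another instead of the whole file being held in an array. A minimal sketch, assuming the source and destination are different files and that UTF-8 output is acceptable; LargeFileTools and CopyWithoutFirstLine are hypothetical names:

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for dropping the first row of a large text file.
public static class LargeFileTools
{
    // Copies sourcePath to destinationPath without its first line.
    // The two paths must differ, because the source is still being read
    // while the destination is being written.
    public static void CopyWithoutFirstLine(string sourcePath, string destinationPath)
    {
        // File.ReadLines enumerates lazily, so only a small buffer is in
        // memory at any time; the output file is opened once, not per line.
        File.WriteAllLines(destinationPath, File.ReadLines(sourcePath).Skip(1), Encoding.UTF8);
    }
}
```

This keeps the one-line convenience of the snippet above while avoiding both the large array and the per-line open and close of the output file from the question.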

How to import large amounts of data from CSV file to DataGridView efficiently

I have 300 CSV files, each containing 18,000 rows and 27 columns.
I want to make a Windows Forms application that imports them, shows them in a DataGridView, and does some mathematical operations later.
But my current approach performs very poorly.
After searching for this problem on Google, I found a solution, "A Fast CSV Reader".
(http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader)
I followed the code step by step, but my DataGridView is still empty.
I don't know how to solve this problem.
Could anyone tell me what to do, or suggest a better way to read CSV files efficiently?
Here is my code:
using System.IO;
using LumenWorks.Framework.IO.Csv;
private void Form1_Load(object sender, EventArgs e)
{
ReadCsv();
}
void ReadCsv()
{
// open the file "data.csv" which is a CSV file with headers
using (CachedCsvReader csv = new
CachedCsvReader(new StreamReader("data.csv"), true))
{
// Field headers will automatically be used as column names
dataGridView1.DataSource = csv;
}
}
Here is my input data:
https://dl.dropboxusercontent.com/u/28540219/20130102.csv
Thanks...
The data you provided contains no headers (the first line is a data line), so I got an ArgumentException (an item with the same key has already been added) when I tried to assign the CSV reader to the DataSource. Setting the hasHeaders parameter of the CachedCsvReader constructor to false did the trick, and the data was added to the DataGridView very quickly.
using (CachedCsvReader csv = new CachedCsvReader(new StreamReader("data.csv"), false))
{
dataGridView1.DataSource = csv;
}
Hope this helps!
You can also do it like this:
private void ReadCsv()
{
string filePath = @"C:\..\20130102.csv";
FileStream fileStream = null;
try
{
fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
}
catch (Exception ex)
{
return;
}
DataTable table = new DataTable();
bool isColumnCreated = false;
using (StringReader reader = new StringReader(new StreamReader(fileStream, Encoding.Default).ReadToEnd()))
{
while (reader.Peek() != -1)
{
string line = reader.ReadLine();
if (line == null || line.Length == 0)
continue;
string[] values = line.Split(',');
if(!isColumnCreated)
{
for(int i=0; i < values.Count(); i++)
{
table.Columns.Add("Column" + i);
}
isColumnCreated = true;
}
DataRow row = table.NewRow();
for(int i=0; i < values.Count(); i++)
{
row[i] = values[i];
}
table.Rows.Add(row);
}
}
dataGridView1.DataSource = table;
}
Depending on your performance requirements, this code can be improved. It is just a working sample for your reference.
I hope this gives you some ideas.
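One possible improvement to the sample above is to stream the file line by line instead of reading it all into one string first, and to wrap the inserts in BeginLoadData and EndLoadData so the table does less bookkeeping per row. This is only a sketch under the same assumptions as that sample (no header row, no quoted fields containing commas); CsvTableLoader and Load are hypothetical names.

```csharp
using System.Data;
using System.IO;

// Hypothetical loader; it splits on commas and does not understand quoted fields.
public static class CsvTableLoader
{
    // Reads a header-less CSV file into a DataTable with string columns
    // named Column0, Column1, and so on.
    public static DataTable Load(string path)
    {
        var table = new DataTable();

        // Turn off notifications, index maintenance and constraint checks while loading.
        table.BeginLoadData();
        try
        {
            // File.ReadLines streams the file instead of loading it whole.
            foreach (string line in File.ReadLines(path))
            {
                if (line.Length == 0)
                    continue;

                string[] values = line.Split(',');

                // Grow the column set if this row is wider than any seen so far.
                while (table.Columns.Count < values.Length)
                    table.Columns.Add("Column" + table.Columns.Count);

                // Copy into an object[] so the row values are passed explicitly.
                object[] cells = new object[values.Length];
                values.CopyTo(cells, 0);
                table.Rows.Add(cells);
            }
        }
        finally
        {
            table.EndLoadData();
        }
        return table;
    }
}
```

The result can be bound in the same way as before, with dataGridView1.DataSource = CsvTableLoader.Load(filePath); in the form code.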
