I need to read from a CSV/Tab delimited file and write to such a file as well from .net.
The difficulty is that I don't know the structure of each file, and I need to read the csv/tab file into a DataTable, which the FileHelpers library doesn't seem to support.
I've already written it for Excel using OLEDB, but can't really see a way to write a tab file for this, so will go back to a library.
Can anyone help with suggestions?
.NET comes with a CSV/tab-delimited file parser called the TextFieldParser class.
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
It supports the full CSV RFC (RFC 4180) and has really good error reporting.
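For example, a minimal sketch of reading a tab-delimited file with it (the file name is just a placeholder):
using Microsoft.VisualBasic.FileIO; // add a reference to the Microsoft.VisualBasic assembly

using (TextFieldParser parser = new TextFieldParser("data.txt"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters("\t"); // or "," for CSV
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // one record, quoting already handled
    }
}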
I used this CsvReader; it is really great and highly configurable. It behaves well with all kinds of escaping for strings and separators. The escaping in other quick-and-dirty implementations was poor, but this library is really great at reading. With a few additional lines of code you can also add a cache if you need to.
Writing is not supported, but it is rather trivial to implement yourself, or you can take inspiration from this code.
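For the writing side, the core of it is the CSV quoting rule; a minimal sketch (assuming a comma separator):
static string ToCsvField(string value)
{
    // quote the field if it contains the separator, a quote, or a line break,
    // doubling any embedded quotes
    if (value.Contains(",") || value.Contains("\"") || value.Contains("\n"))
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    return value;
}
// a row is then: string.Join(",", fields.Select(ToCsvField))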
A simple example with CsvHelper (older API; see the note below):
using (TextWriter writer = new StreamWriter(filePath, false, Encoding.UTF8))
{
    var csvWriter = new CsvWriter(writer);
    csvWriter.Configuration.Delimiter = "\t";
    csvWriter.WriteRecords(exportRecords);
}
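Note: in recent CsvHelper versions the configuration moved to the constructor, so the equivalent is roughly new CsvWriter(writer, new CsvConfiguration(CultureInfo.InvariantCulture) { Delimiter = "\t" }); check the API of the version you are using.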
Here are a couple of CSV reader implementations:
http://www.codeproject.com/KB/database/CsvReader.aspx
http://www.heikniemi.fi/jhlib/ (just one part of the library; includes a CSV writer too)
I doubt there is a standard way to convert CSV to DataTable or database 'automatically', you'll have to write code to do that. How to do that is a separate question.
You'll create your datatable in code, and (presuming a header row) can create columns based on your first line in the file. After that, it will simply be a matter of reading the file and creating new rows based on the data therein.
You could use something like this:
DataTable Tbl = new DataTable();
using (StreamReader sr = new StreamReader(path))
{
    string headerRow = sr.ReadLine();
    string[] headers = headerRow.Split('\t'); // or ','
    foreach (string h in headers)
    {
        Tbl.Columns.Add(new DataColumn(h));
    }
    while (sr.Peek() > -1)
    {
        string data = sr.ReadLine();
        string[] cells = data.Split('\t');
        DataRow row = Tbl.NewRow(); // rows must be created from the table itself
        for (int i = 0; i < cells.Length && i < Tbl.Columns.Count; i++)
        {
            row[i] = cells[i];
        }
        Tbl.Rows.Add(row);
    }
}
The above code has not been compiled, so it may have some errors, but it should get you on the right track.
You can read and write CSV files with the helper class below; it may be helpful for you.
Pass the split character via the separationChar parameter.
Example:
private DataTable dataTable = null;
private bool IsHeader = true;
private string headerLine = string.Empty;
private List<string> AllLines = new List<string>();
private StringBuilder sb = new StringBuilder();
private char separateChar = ',';

public DataTable ReadCSV(string path, bool IsReadHeader, char separationChar)
{
    separateChar = separationChar;
    IsHeader = IsReadHeader;
    using (StreamReader sr = new StreamReader(path, Encoding.Default))
    {
        while (!sr.EndOfStream)
        {
            AllLines.Add(sr.ReadLine());
        }
        createTemplate(AllLines);
    }
    return dataTable;
}

public void WriteCSV(string path, DataTable dtable, char separationChar)
{
    AllLines = new List<string>();
    separateChar = separationChar;
    int colCount = 0;
    sb.Clear(); // the builder is a field, so reset it before reuse
    using (StreamWriter sw = new StreamWriter(path))
    {
        foreach (DataColumn col in dtable.Columns)
        {
            sb.Append(col.ColumnName);
            if (dtable.Columns.Count - 1 > colCount) // use dtable, the table passed in
                sb.Append(separateChar);
            colCount++;
        }
        AllLines.Add(sb.ToString());
        for (int i = 0; i < dtable.Rows.Count; i++)
        {
            sb.Clear();
            for (int j = 0; j < dtable.Columns.Count; j++)
            {
                sb.Append(Convert.ToString(dtable.Rows[i][j]));
                if (dtable.Columns.Count - 1 > j)
                    sb.Append(separateChar);
            }
            AllLines.Add(sb.ToString());
        }
        foreach (string dataline in AllLines)
        {
            sw.WriteLine(dataline);
        }
    }
}

private DataTable createTemplate(List<string> lines)
{
    dataTable = new DataTable();
    if (lines.Count > 0)
    {
        for (int i = 0; i < lines.Count; i++)
        {
            if (i > 0)
            {
                // remaining lines become rows
                DataRow newRow = dataTable.NewRow();
                string[] argLines = lines[i].Split(separateChar);
                for (int b = 0; b < argLines.Length && b < dataTable.Columns.Count; b++)
                {
                    newRow[b] = argLines[b];
                }
                dataTable.Rows.Add(newRow);
            }
            else
            {
                // the first line becomes the columns
                string[] argHeaders = lines[0].Split(separateChar);
                foreach (string c in argHeaders)
                {
                    DataColumn column = new DataColumn(c, typeof(string));
                    dataTable.Columns.Add(column);
                }
            }
        }
    }
    return dataTable;
}
I have found the best solution here:
http://www.codeproject.com/Articles/415732/Reading-and-Writing-CSV-Files-in-Csharp
I just had to rewrite it slightly:
void ReadTest()
{
// Read sample data from CSV file
using (CsvFileReader reader = new CsvFileReader("ReadTest.csv"))
{
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
foreach (string s in row)
{
Console.Write(s);
Console.Write(" ");
}
Console.WriteLine();
row = new CsvRow(); //this line added
}
}
}
Well, there is another library, Cinchoo ETL (an open source one), for reading and writing CSV files.
There are a couple of ways you can read CSV files with it. Given this sample file:
Id, Name
1, Tom
2, Mark
This is how you can use this library to read it
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have POCO object defined to match up with CSV file like below
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
You can parse the same file using this POCO class as below
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
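Since the question here is about getting a DataTable: if I remember the API correctly, the library also exposes an AsDataTable() method, along the lines of var dt = new ChoCSVReader("emp.csv").WithFirstLineHeader().AsDataTable(); treat that as an assumption and check the current documentation.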
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
The data is not displayed correctly in the columns.
The CSV consists of 7 columns; the rows are of different lengths.
I cannot upload a picture. (https://ibb.co/0fnfLW7)
DataTable tblcsv = new DataTable();
tblcsv.Columns.Add("Vorname");
tblcsv.Columns.Add("Nachname");
tblcsv.Columns.Add("RFID");
string csvData = File.ReadAllText(csvPath);
//spliting row after new line
foreach (string csvRow in csvData.Split(';'))
{
if (!string.IsNullOrEmpty(csvRow))
{
//Adding each row into datatable
tblcsv.Rows.Add();
int count = 0;
foreach (string FileRec in csvRow.Split(';'))
{
tblcsv.Rows[tblcsv.Rows.Count - 1][count] = FileRec;
count++;
for(var x=0; x<7; x++)
{
//tblcsv[x][count] = FileRec;
}
count++;
}
}
//Calling Bind Grid Functions
BindgridStaffImport(tblcsv);
}
Your split code is using ";"; should it not be "," if it's a CSV?
The code will look much like this:
foreach (string csvRow in csvData.Split(';'))
{
if (!string.IsNullOrEmpty(csvRow))
{
//Adding each row into datatable
DataRow MyNewRow = tblcsv.NewRow();
int count = 0;
foreach (string FileRec in csvRow.Split(','))
{
MyNewRow[count] = FileRec;
count++;
}
tblcsv.Rows.Add(MyNewRow);
    }
}
//Calling Bind Grid Functions
BindgridStaffImport(tblcsv);
So, use the NewRow method to create a new blank row based on the table. Set the column values, and then add this new row to the Rows collection of the table.
And even better is to use the TextFieldParser class.
So,
using Microsoft.VisualBasic.FileIO;
And then you can even assume, say, that the first row has the column names.
This:
We assume you read the csv file into string sFileLocal.
// process csv file
TextFieldParser MyParse = new TextFieldParser(sFileLocal);
MyParse.TextFieldType = FieldType.Delimited;
MyParse.SetDelimiters(new string[] { "," });
MyParse.HasFieldsEnclosedInQuotes = true;
// add colums to table.
DataTable rstData = new DataTable();
foreach (string cCol in MyParse.ReadFields())
{
rstData.Columns.Add(cCol);
}
while (!MyParse.EndOfData)
{
string[] OneRow = MyParse.ReadFields();
rstData.Rows.Add(OneRow);
}
// Now display results in a grid
GridView1.DataSource = rstData;
GridView1.DataBind();
That is pretty much all you need.
Thank you very much for the support.
I now have a working variant.
However, the goal is to skip the first line, the header.
Are there any ideas on how I can implement this?
List<string> headercolumns = new List<string> { "Vorname", "Nachname", };
DataTable dt = new DataTable();
foreach (string header in headercolumns)
{
dt.Columns.Add(header);
}
List<string> StaffList = new List<string>();
using (StreamReader filereader = new StreamReader(StaffFileUpload.FileContent))
{
string readline = string.Empty;
while ((readline = filereader.ReadLine()) != null)
{
StaffList.Add(readline);
}
if (StaffList != null && StaffList.Count > 0)
{
foreach (string info in StaffList)
{
DataRow row = dt.NewRow();
string[] mitarbeiter = info.Split(';');
for (int i = 0; i < mitarbeiter.Length; i++)
{
row[i] = mitarbeiter[i];
}
dt.Rows.Add(row);
}
}
}
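One simple way to skip the header (a sketch using LINQ's Skip, assuming the first line read is always the header) would be to change the row-building loop:
using System.Linq; // for Skip

foreach (string info in StaffList.Skip(1)) // skip the header line
{
    DataRow row = dt.NewRow();
    string[] mitarbeiter = info.Split(';');
    for (int i = 0; i < mitarbeiter.Length && i < dt.Columns.Count; i++)
    {
        row[i] = mitarbeiter[i];
    }
    dt.Rows.Add(row);
}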
I am trying to use the DotNetZip open source library to create large zip files.
I need to be able to write part of each data row's content (see the code below) from the data table to each stream writer. The other limitation I have is that I can't do this in memory, since the contents are large (several gigabytes per entry).
The problem I have is that despite writing to each stream separately, the output is all written to the last entry only; the first entry is blank. Does anybody have any idea how to fix this issue?
static void Main(string[] args)
{
    var fileName = args[0]; // path of the zip file to create
var dt = CreateDataTable();
var streamWriters = new StreamWriter[2];
using (var zipOutputStream = new ZipOutputStream(File.Create(fileName)))
{
for (var i = 0; i < 2; i++)
{
var entryName = "file" + i + ".txt";
zipOutputStream.PutNextEntry(entryName);
streamWriters[i] = new StreamWriter(zipOutputStream, Encoding.UTF8);
}
WriteContents(streamWriters[0], streamWriters[1], dt);
zipOutputStream.Close();
}
}
private static DataTable CreateDataTable()
{
var dt = new DataTable();
dt.Columns.AddRange(new DataColumn[] { new DataColumn("col1"), new DataColumn("col2"), new DataColumn("col3"), new DataColumn("col4") });
for (int i = 0; i < 100000; i++)
{
var row = dt.NewRow();
for (int j = 0; j < 4; j++)
{
row[j] = j * 1;
}
dt.Rows.Add(row);
}
return dt;
}
private static void WriteContents(StreamWriter writer1, StreamWriter writer2, DataTable dt)
{
foreach (DataRow dataRow in dt.Rows)
{
writer1.WriteLine(dataRow[0] + ", " + dataRow[1]);
writer2.WriteLine(dataRow[2] + ", " + dataRow[3]);
}
}
Expected Results:
Both file0.txt and file1.txt need to be written.
Actual results:
Only file1.txt is written with all the content; file0.txt is blank.
It seems to be the expected behaviour according to the docs:
If you don't call Write() between two calls to PutNextEntry(), the first entry is inserted into the zip file as a file of zero size. This may be what you want.
So to me it seems that it is not possible to do what you want through the current API.
Also, as a zip file is a contiguous sequence of zip entries, it is probably physically impossible to create entries in parallel, as you would have to know the size of each entry before starting a new one.
Perhaps you could just create separate archives and then combine them (if I am not mistaken, there is a simple API to do that).
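Alternatively, a minimal sketch of the sequential approach, reusing the question's dt and fileName and iterating the table once per entry, finishing each entry before starting the next:
using (var zipOutputStream = new ZipOutputStream(File.Create(fileName)))
{
    var writer = new StreamWriter(zipOutputStream, Encoding.UTF8);

    zipOutputStream.PutNextEntry("file0.txt");
    foreach (DataRow row in dt.Rows)
        writer.WriteLine(row[0] + ", " + row[1]);
    writer.Flush(); // flush, but don't close: that would close the zip stream

    zipOutputStream.PutNextEntry("file1.txt");
    foreach (DataRow row in dt.Rows)
        writer.WriteLine(row[2] + ", " + row[3]);
    writer.Flush();
}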
So I am having an issue with my output being written to a CSV file. The output of this code is in the correct format when written to the CSV file, but it only enters a single row in the file. There should be much more, around 150 lines. The current output is:
(859.85 7N830127 185)
Which is correct, but there should be more of these. It seems to me that it is only writing the first line of the parsed EDI file and then stopping. I need to find a way to make sure it writes all the data that is being parsed. Can anyone help me?
static void Main(string[] args)
{
StreamReader sr = new StreamReader("edifile.txt");
string[] ediMapTemp = sr.ReadLine().Split('|');
List<string[]> ediMap = new List<string[]>();
List<object[]> outputMatrix = new List<object[]>();
foreach (var line in ediMapTemp)
{
ediMap.Add(line.Split('~'));
}
DetailNode node = new DetailNode(0, null, 0);
int hierarchicalDepth = 0;
int hierarchicalIdNumber;
int hierarchicalParentIdNumber;
int hierarchicalLevelCode;
int hierarchicalChildCode = 0;
for (int i = 0; i < ediMap.Count; i++)
{
string segmentHeader = ediMap[i][0];
if (segmentHeader == "HL")
{
hierarchicalIdNumber = Convert.ToInt32(ediMap[i][1]);
hierarchicalParentIdNumber = Convert.ToInt32(ediMap[i][2]);
hierarchicalLevelCode = Convert.ToInt32(ediMap[i][3]);
hierarchicalChildCode = Convert.ToInt32(ediMap[i][4]);
List<string[]> levelDetails = new List<string[]>();
for (int v = i + 1; v < ediMap.Count; v++)
{
if (ediMap[v][0] == "HL") break;
levelDetails.Add(ediMap[v]);
}
DetailNode getNode = node.Find(node, hierarchicalParentIdNumber);
getNode.headList.Add(new DetailNode(hierarchicalIdNumber, levelDetails, getNode.depth + 1));
}
}
node.Traversal(new VID(), node);
foreach (var vid in VIDList.vidList)
using (StreamWriter writer = new StreamWriter("Import.csv"))
{
//probably a loop here
writer.WriteLine(String.Join(",", vid.totalCurrentCharges, vid.assetId, vid.componentName, vid.recurringCharge));
}
}
Quick review: should the code be the following?
using (StreamWriter writer = new StreamWriter("Import.csv"))
{
//a loop here
foreach (var vid in VIDList.vidList)
{
writer.WriteLine(String.Join(",", vid.totalCurrentCharges, vid.assetId, vid.componentName, vid.recurringCharge));
}
}
You would open the file once and then loop through your collection, writing each one.
It looks to me like you're re-opening the output file for each line you're trying to write, so you're overwriting the output file with a new file for each line. That would mean only the last entry remains in the file. Try moving this line
using (StreamWriter writer = new StreamWriter("Import.csv"))
outside the foreach loop.
I have 300 csv files that each file contain 18000 rows and 27 columns.
Now, I want to make a windows form application which import them and show in a datagridview and do some mathematical operation later.
But the performance is very inefficient...
After search this problem by google, I found a solution "A Fast CSV Reader".
(http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader)
I followed the code step by step, but my DataGridView is still empty.
I don't know how to solve this problem.
Could anyone tell me what to do, or suggest another, better way to read CSV files efficiently?
Here is my code...
using System.IO;
using LumenWorks.Framework.IO.Csv;
private void Form1_Load(object sender, EventArgs e)
{
ReadCsv();
}
void ReadCsv()
{
// open the file "data.csv" which is a CSV file with headers
using (CachedCsvReader csv = new
CachedCsvReader(new StreamReader("data.csv"), true))
{
// Field headers will automatically be used as column names
dataGridView1.DataSource = csv;
}
}
Here is my input data:
https://dl.dropboxusercontent.com/u/28540219/20130102.csv
Thanks...
The data you provide contains no headers (the first line is a data line), so I got an ArgumentException (item with the same key has already been added) when I tried to add the csv reader to the DataSource. Setting the hasHeaders parameter in the CachedCsvReader constructor to false did the trick, and it added the data to the DataGridView (very fast).
using (CachedCsvReader csv = new CachedCsvReader(new StreamReader("data.csv"), false))
{
dataGridView.DataSource = csv;
}
Hope this helps!
You can also do it like this:
private void ReadCsv()
{
string filePath = @"C:\..\20130102.csv";
FileStream fileStream = null;
try
{
fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
}
catch (Exception ex)
{
return;
}
DataTable table = new DataTable();
bool isColumnCreated = false;
using (StringReader reader = new StringReader(new StreamReader(fileStream, Encoding.Default).ReadToEnd()))
{
while (reader.Peek() != -1)
{
string line = reader.ReadLine();
if (line == null || line.Length == 0)
continue;
string[] values = line.Split(',');
if(!isColumnCreated)
{
for(int i=0; i < values.Length; i++)
{
table.Columns.Add("Column" + i);
}
isColumnCreated = true;
}
DataRow row = table.NewRow();
for(int i=0; i < values.Length; i++)
{
row[i] = values[i];
}
table.Rows.Add(row);
}
}
dataGridView1.DataSource = table;
}
Based on your performance requirements, this code can be improved. It is just a working sample for your reference.
I hope it gives you some ideas.
I want to create a generic text file parser in C# for any kind of text file. Actually, I have 4 applications, all 4 getting their input data in txt file format, but the text files are not homogeneous in nature. I have tried fixed-width delimiting:
private static DataTable FixedWidthDiliminatedTxtRead()
{
string[] fields;
StringBuilder sb = new StringBuilder();
List<StringBuilder> lst = new List<StringBuilder>();
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(testOCC))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(new int[12] { 2,25,8,12,13,5,6,3,10,11,10,24 });
for (int col = 1; col < 13; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count)
dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
but I feel it's a very rigid one and really varies from application to application; how can I make it configurable? Any better way?
tfp.SetFieldWidths(new int[12] { 2,25,8,12,13,5,6,3,10,11,10,24 });
File nature:
It's a report kind of file.
The positions of the columns are very similar.
The row data of each file is different.
I found this as a reference:
http://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F
any other thoughts ?
If the only thing different is the field widths, you could just try sending the field widths in as a parameter:
private static DataTable FixedWidthDiliminatedTxtRead(int[] fieldWidthArray)
{
string[] fields;
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(testOCC))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(fieldWidthArray);
for (int col = 1; col <= fieldWidthArray.Length; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count)
dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
If you will have more logic to grab the data, you might want to consider defining an interface or abstract class for a generic text parser and creating concrete implementations for each file type.
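For example, a rough sketch of such an abstraction (the names are hypothetical; it wraps the TextFieldParser logic above with the widths injected per application):
public interface IGenericTextParser
{
    DataTable Parse(string filename);
}

// one concrete implementation per file format
public class FixedWidthParser : IGenericTextParser
{
    private readonly int[] _fieldWidths;

    public FixedWidthParser(int[] fieldWidths) { _fieldWidths = fieldWidths; }

    public DataTable Parse(string filename)
    {
        DataTable dtable = new DataTable();
        for (int col = 1; col <= _fieldWidths.Length; ++col)
            dtable.Columns.Add("COL" + col);
        using (TextFieldParser tfp = new TextFieldParser(filename))
        {
            tfp.TextFieldType = FieldType.FixedWidth;
            tfp.SetFieldWidths(_fieldWidths);
            while (!tfp.EndOfData)
            {
                string[] fields = tfp.ReadFields();
                if (dtable.Columns.Count == fields.Length)
                    dtable.Rows.Add(fields);
            }
        }
        return dtable;
    }
}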
Hey I made one of these last week.
I did not write it with the intention of other people using it, so I apologize in advance if it's not documented well, but I cleaned it up for you. Also, I grabbed several segments of code from Stack Overflow, so I am not the original author of several pieces of this.
The places you need to edit are the path and pathOut, and the text separators:
char[] delimiters = new char[]
It searches for part of a word and then grabs the whole word. I used a C# console application for this.
Here you go:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace UniqueListofStringFinder
{
class Program
{
static void Main(string[] args)
{
string path = @"c:\Your Path\in.txt";
string pathOut = @"c:\Your Path\out.txt";
string data = "!";
Console.WriteLine("Current Path In is set to: " + path);
Console.WriteLine("Current Path Out is set to: " + pathOut);
Console.WriteLine(Environment.NewLine + Environment.NewLine + "Input String to Search For:");
string input = Console.ReadLine(); // read the search term
// Create the file with some sample text if it does not exist.
if (!File.Exists(path))
{
// Create the file.
using (FileStream fs = File.Create(path))
{
Byte[] info =
new UTF8Encoding(true).GetBytes("This is some text in the file.");
// Add some information to the file.
fs.Write(info, 0, info.Length);
}
}
List<string> Spec = new List<string>();
using (StreamReader file = new StreamReader(path))
{
while (!file.EndOfStream)
{
string s = file.ReadLine();
if (s.Contains(input))
{
char[] delimiters = new char[] { '\r', '\n', '\t', ')', '(', ',', '=', '"', '\'', '<', '>', '$', ' ', '#', '[', ']' };
string[] parts = s.Split(delimiters,
StringSplitOptions.RemoveEmptyEntries);
foreach (string word in parts)
{
if (word.Contains(input))
{
if( word.IndexOf(input) == 0)
{
Spec.Add(word);
}
}
}
}
}
Spec.Sort();
}
Console.WriteLine();
StringBuilder builder = new StringBuilder();
foreach (string s in Spec) // Loop through all strings
{
builder.Append(s).Append(Environment.NewLine); // Append string to StringBuilder
}
string result = builder.ToString(); // Get string from StringBuilder
Program a = new Program();
data = a.uniqueness(result);
int i = a.writeFile(data,pathOut);
}
public string uniqueness(string rawData )
{
if (rawData == "")
{
return "Empty Data Set";
}
List<string> dataVar = new List<string>();
List<string> holdData = new List<string>();
bool testBool = false;
using (StringReader reader = new StringReader(rawData))
{
string line;
while ((line = reader.ReadLine()) != null)
{
foreach (string s in holdData)
{
if (line == s)
{
testBool = true;
}
}
if (testBool == false)
{
holdData.Add(line);
}
testBool = false;
// Do something with the line
}
}
int i = 0;
string dataOut = "";
foreach (string s in holdData)
{
dataOut += s + "\r\n";
i++;
}
// Write the string to a file.
return dataOut;
}
public int writeFile(string dataOut, string pathOut)
{
try
{
System.IO.StreamWriter file = new System.IO.StreamWriter(pathOut);
file.WriteLine(dataOut);
file.Close();
}
catch (Exception ex)
{
dataOut += ex.ToString();
return 1;
}
return 0;
}
}
}
private static DataTable FixedWidthTxtRead(string filename, int[] fieldWidths)
{
string[] fields;
DataTable dtable = new DataTable();
ArrayList aList;
using (TextFieldParser tfp = new TextFieldParser(filename))
{
tfp.TextFieldType = FieldType.FixedWidth;
tfp.SetFieldWidths(fieldWidths);
for (int col = 1; col <= fieldWidths.Length; ++col)
dtable.Columns.Add("COL" + col);
while (!tfp.EndOfData)
{
fields = tfp.ReadFields();
aList = new ArrayList();
for (int i = 0; i < fields.Length; ++i)
aList.Add(fields[i] as string);
if (dtable.Columns.Count == aList.Count) dtable.Rows.Add(aList.ToArray());
}
}
return dtable;
}
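For example, it could be called with the widths from the earlier snippet (file name hypothetical): DataTable table = FixedWidthTxtRead("report.txt", new int[] { 2, 25, 8, 12, 13, 5, 6, 3, 10, 11, 10, 24 });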
Here's what I did:
I built a factory for the type of processor needed (based on file type/format), which abstracted the file reader.
I then built a collection object that contained a set of triggers for each field I was interested in (each trigger also carried the property name for which the field is destined). This settings collection is loaded via an XML configuration file, so all I need to change are the settings, and the base parsing process reacts to how the settings are configured. Finally, I built a reflection wrapper wherein, once a field is parsed, the corresponding property on the model object is set.
As the file flowed through, the triggers for each setting evaluated each line's value. When a trigger found what it was set to find (via pattern matching, or column length values), it fired an event that bubbled up and set a property on the model object. I can show some pseudo code if you're interested. It needs some work for efficiency's sake, but I like the concept.
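Something like this rough sketch of the trigger idea (all names hypothetical; not the original code):
using System.Text.RegularExpressions;

public class FieldTrigger
{
    public string Pattern { get; set; }       // loaded from the XML settings file
    public string PropertyName { get; set; }  // model property this field is destined for

    // returns true (and sets the property) when the trigger fires on this line
    public bool TryApply(string line, object model)
    {
        Match match = Regex.Match(line, Pattern);
        if (!match.Success)
            return false;
        // the reflection wrapper: set the mapped property on the model object
        model.GetType().GetProperty(PropertyName).SetValue(model, match.Value, null);
        return true;
    }
}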