I am trying to use DotNetZip open source library for creating large zip files.
I need to be able to write to each stream writer part of the data row content (see the code below) of the data table. Other limitation I have is that I can't do this in memory due to the contents being large (several giga bytes each entry).
The problem I have is that despite writing to each stream separately, the output is all written to the last entry only. The first entry contains blank. Does anybody have any idea on how to fix this issue?
static void Main(string fileName)
{
var dt = CreateDataTable();
var streamWriters = new StreamWriter[2];
using (var zipOutputStream = new ZipOutputStream(File.Create(fileName)))
{
for (var i = 0; i < 2; i++)
{
var entryName = "file" + i + ".txt";
zipOutputStream.PutNextEntry(entryName);
streamWriters[i] = new StreamWriter(zipOutputStream, Encoding.UTF8);
}
WriteContents(streamWriters[0], streamWriters[1], dt);
zipOutputStream.Close();
}
}
private DataTable CreateDataTable()
{
var dt = new DataTable();
dt.Columns.AddRange(new DataColumn[] { new DataColumn("col1"), new DataColumn("col2"), new DataColumn("col3"), new DataColumn("col4") });
for (int i = 0; i < 100000; i++)
{
var row = dt.NewRow();
for (int j = 0; j < 4; j++)
{
row[j] = j * 1;
}
dt.Rows.Add(row);
}
return dt;
}
private void WriteContents(StreamWriter writer1, StreamWriter writer2, DataTable dt)
{
foreach (DataRow dataRow in dt.Rows)
{
writer1.WriteLine(dataRow[0] + ", " + dataRow[1]);
writer2.WriteLine(dataRow[2] + ", " + dataRow[3]);
}
}
Expected Results:
Both file0.txt and file1.txt need to written.
Actual results:
Only file1.txt file is written all content. file0.txt is blank.
It seems to be the expected behaviour according to the docs
If you don't call Write() between two calls to PutNextEntry(), the first entry is inserted into the zip file as a file of zero size. This may be what you want.
So to me it seems that it is not possible to do what you want through the current API.
Also, as zip file is a continuous sequence of zip entries, it is probably physically impossible to create entries in parallel, as you would have to know the size of each entry before starting a new one.
Perhaps you could just create separate archives and then combine them (if I am not mistaken there was a simple API to do that)
Related
I have sample data that looks like this:
1 This is a random line in the file
2
3 SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2
4
5 Another random line in the file
6
7
8
9
10 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
11 ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
12 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
13 FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
14 GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER
I am unable to save changes to the file. Ie, I can't manipulate/clean the original files before consumption. Any manipulation will need to be done on the fly in memory. But what if the files are large (eg, I am currently testing with some files that are 5m+ records).
I am using CsvHelper
I have already referred to the following threads for guidance:
CSVHelper to skip record before header
Better way to skip extraneous lines at the start?
How to read a header from a specific line with CsvHelper?
What I would like to do is:
Set row where header is = 3 (I will know where the header is)
Set row where data starts = 10 (I will know where the data starts from)
Load data into data table, to be displayed into datagridview
If I need perform a combination of stream manipulation before I pass this into the CsvHelper, then do also let me know if that's the missing piece? (and any assistance on how I can actually achieve that under one block of code with be greatly appreciated)
So far I have come up with the below:
string filepath = Path.Combine(txtTst04_File_Location.Text, txtTst04_File_Name.Text);
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
// skip rows to get the header
for (int i = 0; i < 4; i++)
{
csv.Read();
}
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
csv.Configuration.HasHeaderRecord = true;
// how do I set the row where the actual data starts?
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
I get the below result:
Do let me know if you would like me to expand on any point.
thanks!
EDIT:
New linked post created here trying to resolve the same objective, but in a different way but getting a new error:
Filestream and datagridview memory issue with CsvHelper
I can get it to work with ShouldSkipRecord. The only problem is it will fail if any of the random lines has a "|" delimiter in it.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.ShouldSkipRecord = row => row.Length == 1;
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
If you know how many columns there are, you could set it to skip any rows that have less than that many columns.
csv.Configuration.ShouldSkipRecord = row => row.Length < 6;
I came up with another approach that allows you to skip the lines to the header and then to the records.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
// skip to header
for (int i = 0; i < 3; i++)
{
csv.Read();
}
csv.ReadHeader();
var headers = csv.Context.HeaderRecord;
// skip to records
for (int i = 0; i < 6; i++)
{
csv.Read();
}
var dt = new DataTable();
foreach (var header in headers)
{
dt.Columns.Add(header);
}
while (csv.Read())
{
var row = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
row[i] = csv.GetField(i);
}
dt.Rows.Add(row);
}
}
I am trying to develop a tool that will take a CSV file and import it into a datatable with the first column in the datatable being a row counter.
The CSV files are from different customers and so have different structures. Some have a header line; some have several header lines; some have no header line. They have also have varying columns.
So far, I have the code below.
public void Import_CSV()
{
OpenFileDialog dialog = new OpenFileDialog();
dialog.Filter = "CSV Files (*.csv)|*.csv";
bool? result = dialog.ShowDialog();
if (result ?? false)
{
string[] headers;
string CSVFilePathName = dialog.FileName;
string delimSelect = cboDelimiter.Items.GetItemAt(cboDelimiter.SelectedIndex).ToString();
// If user hasn't selected a delimiter, assume comma
if (delimSelect == "")
{
delimSelect = ",";
}
string[] delimiterType = new string[] {cboDelimiter.Items.GetItemAt(cboDelimiter.SelectedIndex).ToString()};
DataTable dt = new DataTable();
// Read first line of file to get number of fields and create columns and column numbers in data table
using (StreamReader sr1 = new StreamReader(CSVFilePathName))
{
headers = sr1.ReadLine().Split(delimiterType, StringSplitOptions.None);
//dt.Columns.Add("ROW", typeof(int));
//dt.Columns["ROW"].AutoIncrement = true;
//dt.Columns["ROW"].AutoIncrementSeed = 1;
//dt.Columns["ROW"].AutoIncrementStep = 1;
int colCount = 1;
foreach (string header in headers)
{
dt.Columns.Add("C" + colCount.ToString());
colCount++;
}
}
using (StreamReader sr = new StreamReader(CSVFilePathName))
{
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(delimiterType, StringSplitOptions.None);
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
dtGrid.ItemsSource = dt.DefaultView;
txtColCount.Text = dtGrid.Columns.Count.ToString();
txtRowCount.Text = dtGrid.Items.Count.ToString();
}
}
This works, in as much as it creates column headers (C1, C2....according to how many there are in the csv file) and then the rows are written in, but I want to add a column at the far left with a row number as the rows are added. In the code, you can see I've got a section commented out that creates an auto-number column, but I'm totally stuck on how the rows are written into the datatable. If I uncomment that section, I get errors as the first column in the csv file tries to write into an int field. I know you can specify which field in each row can go in which column, but that won't help here as the columns are unknown at this point. I just need it to be able to read ANY file in, regardless of the structure, but with the row counter.
Hope that makes sense.
You write in your question, that uncommenting the code that adds the first column leads to errors. This is because of your loop: it starts at 0, but the 0-th column is the one you have added manually. So you need just to skip it in your loop, starting at 1. However, the source array has to be processed from the 0-th element.
So the solution is:
First, uncomment the row adding code.
Then, in your loop, introduce an offset to leave the first column untouched:
for (int i = 0; i < headers.Length; i++)
{
dr[i + 1] = rows[i];
}
hey guys i been trying to read multi-selected csv files and display them in a wpf data grid but i am having problems with the code. here my code below
OpenFileDialog ofd = new OpenFileDialog();
DataTable dtt = new DataTable();
[DelimitedRecord(",")]
private class myCSVFile
{
public string Supplier;
public string Product;
public string Price;
}
private void btnImport_Click(object sender, RoutedEventArgs e)
{
FileHelperEngine engine = new FileHelperEngine(typeof(myCSVFile));
myCSVFile[] result= new myCSvFile[];
foreach (string filepath in ofd.FileNames)
{
for (int i = 0; i < ofd.FileNames.Lenght; i++)
{
result[i] = File.ReadAllLines(System.IO.Path.ChangeExtension(filepath,".csv"));
}
}
dtt.Columns.Add("Suplier", typeof(string));
dtt.Columns.Add("Supplier Type", typeof(string));
dtt.Columns.Add("Price", typeof(string));
foreach (myCSVFile c in result)
{
Console.WriteLine(c.Supplier + " " + c.Product + " " + c.Price);
dtt.Rows.Add(c.Supplier, c.Product, c.Price);
dataGridv.DataContext = dtt.DefaultView;
}
}
i have a file helper reference that i downloaded online to help read the .csv and this works for a single csv file but not for the multi selected. i used ofd.FileNames to get and array of paths and am trying to use a loop to readAlllines of a particular path but it gives me an error at
result[i] = File.ReadAllLines(System.IO.Path.ChangeExtension(filepath,".csv"));
it says cannot implicity convert type 'string[]' to 'Spurs.Import.myCSVFile' please what am i doing wrong. is there another way to do this please am new to c#
You initialize myCSVFile[] result = null; thus result[i] = ... will fail.
EDIT: In the updated question you initialze myCSVFile[] result = new myCSVFile[]; which is invalid syntax. You must give the size of the array.
You traverse the FileNames collections twice in a nested loop
foreach (string filepath in ofd.FileNames) {
for (int i = 0; i < ofd.FileNames.Length; i++) {
result[i] = //somestuff
}
}
Thus result[i] will be assigned n^2 times, if you selected n files.
File.ReadAllLines() delivers an string[]. Each element of the array is on line in the given file. myresult[i] must be an Instance of myCSVFile. You will have to do some additional parsing of the filecontents to create such an instance.
EDIT: I'm not sure, what to say on the parsing part, without knowing a file structure. But, to be honest, if you don't understand why you cannot assign a string[] to some custom class, we won't be able to help you here.
I successfully imported following file in database but my import method removes double quotes during saving process. but i want to export this file as it is , i.e add quotes to a string which contains delimiter so how to achieve this .
here is my csv file with headers and 1 record.
PTNAME,REGNO/ID,BLOOD GRP,WARD NAME,DOC NAME,XRAY,PATHO,MEDICATION,BLOOD GIVEN
Mr. GHULAVE VASANTRAO PANDURANG,SH1503/00847,,RECOVERY,SHELKE SAMEER,"X RAY PBH RT IT FEMUR FRACTURE POST OP XRAY -ACCEPTABLE WITH IMPLANT IN SITU 2D ECHO MILD CONC LVH GOOD LV SYSTOLIC FUN, ALTERED LV DIASTOLIC FUN.", HB-11.9gm% TLC-8700 PLT COUNT-195000 BSL-173 UREA -23 CREATININE -1.2 SR.ELECTROLYTES-WNR BLD GROUP-B + HIV-NEGATIVE HBsAG-NEGATIVE PT INR -15/15/1.0. ECG SINUS TACHYCARDIA ,IV TAXIMAX 1.5 GM 1-0-1 IV TRAMADOL DRIP 1-0-1 TAB NUSAID SP 1-0-1 TAB ARCOPAN D 1-0-1 CAP BONE C PLUS 1 -0-1 TAB ANXIT 0.5 MG 0-0-1 ANKLE TRACTION 3 KG RT LL ,NOT GIVEN
Here is my method of export:
public void DataExport(string SelectQuery, string fileName)
{
try
{
DataTable dt = new DataTable();
SqlDataAdapter da = new SqlDataAdapter(SelectQuery, con);
da.Fill(dt);
//Sets file path and print Headers
// string filepath = txtreceive.Text + "\\" + fileName;
string filepath = #"C:\Users\Priya\Desktop\R\z.csv";
StreamWriter sw = new StreamWriter(filepath);
int iColCount = dt.Columns.Count;
// First we will write the headers if IsFirstRowColumnNames is true: //
for (int i = 0; i < iColCount; i++)
{
sw.Write(dt.Columns[i]);
if (i < iColCount - 1)
{
sw.Write(',');
}
}
sw.Write(sw.NewLine);
foreach (DataRow dr in dt.Rows) // Now write all the rows.
{
for (int i = 0; i < iColCount; i++)
{
if (!Convert.IsDBNull(dr[i]))
{
sw.Write(dr[i].ToString());
}
if (i < iColCount - 1)
{
sw.Write(',');
}
}
sw.Write(sw.NewLine);
}
sw.Close();
}
catch { }
}
if (myString.Contains(","))
{
myWriter.Write("\"{0}\"", myString);
}
else
{
myWriter.Write(myString);
}
The very simplest thing that you can do is replace the line sw.Write(dr[i].ToString()); with this:
var text = dr[i].ToString();
text = text.Contains(",") ? String.Format("\"{0}\"", text) : text;
sw.Write(text);
However, there are quite a few other issues with your code - most importantly you are opening a lot of disposable resources without disposing them properly.
I'd suggest a bit of a rewrite, like this:
public void DataExport(string SelectQuery, string fileName)
{
using (var dt = new DataTable())
{
using (var da = new SqlDataAdapter(SelectQuery, con))
{
da.Fill(dt);
var header = String.Join(
",",
dt.Columns.Cast<DataColumn>().Select(dc => dc.ColumnName));
var rows =
from dr in dt.Rows.Cast<DataRow>()
select String.Join(
",",
from dc in dt.Columns.Cast<DataColumn>()
let t1 = Convert.IsDBNull(dr[dc]) ? "" : dr[dc].ToString()
let t2 = t1.Contains(",") ? String.Format("\"{0}\"", t1) : t1
select t2);
using (var sw = new StreamWriter(fileName))
{
sw.WriteLine(header);
foreach (var row in rows)
{
sw.WriteLine(row);
}
sw.Close();
}
}
}
}
I've also broken apart the querying of the data from the data adapter from the writing of the data to the stream writer.
And, of course, I'm adding double-quotes to text that contains commas.
The only other thing I was concerned about was the fact that con is clearly a class-level variable and it is being left open. That's bad. Connections should be opened and closed each time they are used. You should probably consider making that change too.
You could also remove the stream write entirely by replacing that block with this:
File.WriteAllLines(fileName, new [] { header }.Concat(rows));
And, finally, wrapping your code in a try { ... } catch { } is just a bad practice. It's like saying "I'm writing some code that could fail, but I don't care and I don't want to be told if it does fail". You should only even catch specific exceptions that you do deal with. In this code you should consider catching file exceptions like running out of hard drive space, or writing to a read-only file, etc.
I need to read from a CSV/Tab delimited file and write to such a file as well from .net.
The difficulty is that I don't know the structure of each file and need to write the cvs/tab file to a datatable, which the FileHelpers library doesn't seem to support.
I've already written it for Excel using OLEDB, but can't really see a way to write a tab file for this, so will go back to a library.
Can anyone help with suggestions?
.NET comes with a CSV/tab delminited file parser called the TextFieldParser class.
http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
It supports the full RFC for CSV files and really good error reporting.
I used this CsvReader, it is really great and well configurable. It behaves well with all kinds of escaping for strings and separators. The escaping in other quick and dirty implementations were poor, but this lib is really great at reading. With a few additional codelines you can also add a cache if you need to.
Writing is not supported but it rather trivial to implement yourself. Or inspire yourself from this code.
Simple example with CsvHelper
using (TextWriter writer = new StreamWriter(filePath)
{
var csvWriter = new CsvWriter(writer);
csvWriter.Configuration.Delimiter = "\t";
csvWriter.Configuration.Encoding = Encoding.UTF8;
csvWriter.WriteRecords(exportRecords);
}
Here are a couple CSV reader implementations:
http://www.codeproject.com/KB/database/CsvReader.aspx
http://www.heikniemi.fi/jhlib/ (just one part of the library; includes a CSV writer too)
I doubt there is a standard way to convert CSV to DataTable or database 'automatically', you'll have to write code to do that. How to do that is a separate question.
You'll create your datatable in code, and (presuming a header row) can create columns based on your first line in the file. After that, it will simply be a matter of reading the file and creating new rows based on the data therein.
You could use something like this:
DataTable Tbl = new DataTable();
using(StreamReader sr = new StreamReader(path))
{
int count = 0;
string headerRow = sr.Read();
string[] headers = headerRow.split("\t") //Or ","
foreach(string h in headers)
{
DataColumn dc = new DataColumn(h);
Tbl.Columns.Add(dc);
count++;
}
while(sr.Peek())
{
string data = sr.Read();
string[] cells = data.Split("\t")
DataRow row = new DataRow();
foreach(string c in cells)
{
row.Columns.Add(c);
}
Tbl.Rows.Add(row);
}
}
The above code has not been compiled, so it may have some errors, but it should get you on the right track.
You can read and write csv file..
This may be helpful for you.
pass split char to this parameter "serparationChar"
Example : -
private DataTable dataTable = null;
private bool IsHeader = true;
private string headerLine = string.Empty;
private List<string> AllLines = new List<string>();
private StringBuilder sb = new StringBuilder();
private char seprateChar = ',';
public DataTable ReadCSV(string path, bool IsReadHeader, char serparationChar)
{
seprateChar = serparationChar;
IsHeader = IsReadHeader;
using (StreamReader sr = new StreamReader(path,Encoding.Default))
{
while (!sr.EndOfStream)
{
AllLines.Add( sr.ReadLine());
}
createTemplate(AllLines);
}
return dataTable;
}
public void WriteCSV(string path,DataTable dtable,char serparationChar)
{
AllLines = new List<string>();
seprateChar = serparationChar;
List<string> StableHeadrs = new List<string>();
int colCount = 0;
using (StreamWriter sw = new StreamWriter(path))
{
foreach (DataColumn col in dtable.Columns)
{
sb.Append(col.ColumnName);
if(dataTable.Columns.Count-1 > colCount)
sb.Append(seprateChar);
colCount++;
}
AllLines.Add(sb.ToString());
for (int i = 0; i < dtable.Rows.Count; i++)
{
sb.Clear();
for (int j = 0; j < dtable.Columns.Count; j++)
{
sb.Append(Convert.ToString(dtable.Rows[i][j]));
if (dataTable.Columns.Count - 1 > j)
sb.Append(seprateChar);
}
AllLines.Add(sb.ToString());
}
foreach (string dataline in AllLines)
{
sw.WriteLine(dataline);
}
}
}
private DataTable createTemplate(List<string> lines)
{
List<string> headers = new List<string>();
dataTable = new DataTable();
if (lines.Count > 0)
{
string[] argHeaders = null;
for (int i = 0; i < lines.Count; i++)
{
if (i > 0)
{
DataRow newRow = dataTable.NewRow();
// others add to rows
string[] argLines = lines[i].Split(seprateChar);
for (int b = 0; b < argLines.Length; b++)
{
newRow[b] = argLines[b];
}
dataTable.Rows.Add(newRow);
}
else
{
// header add to columns
argHeaders = lines[0].Split(seprateChar);
foreach (string c in argHeaders)
{
DataColumn column = new DataColumn(c, typeof(string));
dataTable.Columns.Add(column);
}
}
}
}
return dataTable;
}
I have found best solution
http://www.codeproject.com/Articles/415732/Reading-and-Writing-CSV-Files-in-Csharp
Just I had to re-write
void ReadTest()
{
// Read sample data from CSV file
using (CsvFileReader reader = new CsvFileReader("ReadTest.csv"))
{
CsvRow row = new CsvRow();
while (reader.ReadRow(row))
{
foreach (string s in row)
{
Console.Write(s);
Console.Write(" ");
}
Console.WriteLine();
row = new CsvRow(); //this line added
}
}
}
Well, there is another library Cinchoo ETL - an open source one, for reading and writing CSV files.
Couple of ways you can read CSV files
Id, Name
1, Tom
2, Mark
This is how you can use this library to read it
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have POCO object defined to match up with CSV file like below
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
You can parse the same file using this POCO class as below
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library