I am following a tutorial for an inventory stock management system written in C#.
The original CSV file is a stock list, which contains four columns:
Item Code, Item Description, Item Count, OnOrder
In the tutorial, the code generates a DataTable object, which is then used as the data source for the DataGridView in the application.
Here is the code:
DataTable dataTable = new DataTable();
dataTable.Columns.Add("Item Code");
dataTable.Columns.Add("Item Description");
dataTable.Columns.Add("Current Count");
dataTable.Columns.Add("On Order");
string CSV_FilePath = "C:/Users/xxxxx/Desktop/stocklist.csv";
StreamReader streamReader = new StreamReader(CSV_FilePath);
string[] rawData = new string[File.ReadAllLines(CSV_FilePath).Length];
rawData = streamReader.ReadLine().Split(',');
while(!streamReader.EndOfStream)
{
rawData = streamReader.ReadLine().Split(',');
dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]);
}
dataGridView1.DataSource = dataTable;
I am assuming that rawData = streamReader.ReadLine().Split(','); splits each line of the file into an array like this:
["A0001", "Horse on Wheels","5","No"]
["A0002","Elephant on Wheels","2","No"]
In the while loop, it iterates through each line (each array) and assigns each rawData[x] to the corresponding column.
Is this the right way to understand this code snippet?
Another question is, why do I need to run
rawData = streamReader.ReadLine().Split(',');
in a while loop?
Thanks in advance.
Your code should actually look like this:
DataTable dataTable = new DataTable();
dataTable.Columns.Add("Item Code");
dataTable.Columns.Add("Item Description");
dataTable.Columns.Add("Current Count");
dataTable.Columns.Add("On Order");
string CSV_FilePath = "C:/Users/xxxxx/Desktop/stocklist.csv";
using(StreamReader streamReader = new StreamReader(CSV_FilePath))
{
// Skip the header row
streamReader.ReadLine();
while(!streamReader.EndOfStream)
{
string[] rawData = streamReader.ReadLine().Split(','); // read a row and split it into cells
dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]); // add the elements from each cell as a row in the datatable
}
}
dataGridView1.DataSource = dataTable;
Changes I've made:
We've added a using block around StreamReader to ensure that the file handle is only open for as long as we need to read the file.
We now only read the file once, not twice.
Since we only need the rawData in the scope of the while loop, I've moved it into the loop.
Explaining what's wrong:
The following line reads the entire file, and then counts how many lines are in it. With this information, we initialize an array with as many positions as there are lines in the file. This means for a 500-line file, you can access positions rawData[0], rawData[1], ... rawData[499].
string[] rawData = new string[File.ReadAllLines(CSV_FilePath).Length];
With the next line you discard that array, and instead assign rawData the cells from the top of the file (the headers):
rawData = streamReader.ReadLine().Split(',');
This line says "read a single line from the file, and split it by comma". You then assign that result to rawData, replacing its old value. The reason you need this again in the loop is that you're interested in more than just the first row of the file.
Finally, you loop through each row in the file, replacing rawData with the cells from that row, and add each row to the DataTable:
rawData = streamReader.ReadLine().Split(',');
dataTable.Rows.Add(rawData[0], rawData[1], rawData[2], rawData[3]);
Note that File.ReadAllLines(...) reads the entire file into memory as an array of strings. You're also using StreamReader to read through the file line-by-line, meaning that you are reading the entire file twice. This is not very efficient and you should avoid this where possible. In this case, we didn't need to do that at all.
Also note that your approach to reading a CSV file is fairly naïve. Depending on the software used to create them, some CSV files have cells that span more than one line in the file, some include quoted sections for text, and sometimes those quoted sections include commas which would throw off your split code. Your code also doesn't deal with the possibility of a badly formatted file where a row has fewer cells than expected, or where there is a trailing empty row at the end of the file. Generally it's better to use a dedicated CSV parser such as CsvHelper rather than trying to roll your own.
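If you do go down that route, loading the same file with CsvHelper into a DataTable can look roughly like the sketch below. The exact API varies a little between CsvHelper versions; this assumes a reasonably recent one and reuses the path and grid from your snippet:
using System.Data;
using System.Globalization;
using System.IO;
using CsvHelper;

// CsvDataReader exposes the parsed CSV as an IDataReader, so DataTable.Load
// can consume it directly; quoted fields and embedded commas are handled.
var dataTable = new DataTable();
using (var reader = new StreamReader("C:/Users/xxxxx/Desktop/stocklist.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
using (var dataReader = new CsvDataReader(csv))
{
    dataTable.Load(dataReader); // the header row becomes the column names
}
dataGridView1.DataSource = dataTable;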
Related
I am creating a C# parsing program. One section of the code pulls in a list of rows from an Excel file that do not have a particular value in a particular column. I was then going to use a foreach loop to loop through each of those rows and delete them; however, it is taking quite a long time to cycle through each of those rows, and there are multiple tabs I need to run this on.
So my thought was to turn the list of Excel rows into a range and then just delete that range. Is it possible to convert that list of rows into an Excel range? Below is the code snippet:
XLWorkbook wb = new XLWorkbook(Path.Combine(Destination, fName) + ".xlsx");
IXLWorksheet ws = wb.Worksheet(SheetName);
var range = ws.RangeUsed();
var table = range.AsTable();
var cell = table.HeadersRow().CellsUsed(c => c.Value.ToString() == ColName).FirstOrDefault();
//Gets the column letter for use in next section
string colLetter = cell.WorksheetColumn().ColumnLetter();
//Create list of rows that DO NOT contain the inv number being searched
//This is the list I would like to convert to a range to speed up the delete
List<IXLRow> deleterows = ws
.Column(colLetter)
.CellsUsed(c => c.Value.ToString() != i)
.Select(c => c.WorksheetRow()).ToList();
//Deletes the header row so that isn't removed
deleterows.RemoveAt(0);
foreach (IXLRow x in deleterows)
{
x.Delete();
}
Right now each iteration checks the value of the cell by accessing that cell in the file. That takes a lot of time.
Read all the data on one sheet into an array and do all the iteration over the array. That will be an order of magnitude faster.
Since you want to parse data, I suggest you use that array for any other operation you want to do, and only write it back to a file when you are done. (If your result is stored in Excel again, make sure to write the whole array at once as a Range.)
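As a rough, untested sketch of that idea with ClosedXML (reusing ws, colLetter and i from the question): read the key column once, decide everything in memory, then delete from the bottom up so the remaining row numbers don't shift:
int lastRow = ws.RangeUsed().RowCount();
var rowsToDelete = new List<int>();

// single pass over the column: one cell read per row, no deletes yet
for (int r = 2; r <= lastRow; r++) // row 1 is the header
{
    if (ws.Cell(r, colLetter).GetString() != i)
        rowsToDelete.Add(r);
}

// delete bottom-up so earlier deletions don't shift later row numbers
for (int k = rowsToDelete.Count - 1; k >= 0; k--)
    ws.Row(rowsToDelete[k]).Delete();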
I am presently working on Blue Prism v5.0, using a code block (code stage) in BP. My aim is to get all the file names in a given folder/path, and for this I am using C# code:
String[] str = Directory.GetFiles(inputFolderPath);
I am providing the input folder path through the same code block. The output is a string array by default, but BP doesn't have an array-of-string type, so how do I convert this to a collection?
Any suggestion will be helpful.
Thanks in advance
Well, there is a standard Blue Prism action that does that!
Object: Utility - File Management
Action: Get Files
It's pretty good, because it outputs a lot of columns, including: file name and file path, extension, size, et cetera.
It also allows for filtering the results. An example could be ".*pdf", which forces the action to return only the files with that extension.
Add an output to your code stage of Collection (System.Data.DataTable) type. Then, convert the String[] to a DataTable manually:
outputCollection = new DataTable(); // might not be necessary as Blue Prism will generally instantiate the instance for you; remove this line if you're receiving compiler warnings/errors
// create a column to store your paths
DataColumn col = new DataColumn();
col.DataType = System.Type.GetType("System.String");
col.ColumnName = "Filename";
outputCollection.Columns.Add(col);
// loop through the String[] array and place the values in the DataTable structure
foreach (String el in str) {
DataRow row = outputCollection.NewRow();
row["Filename"] = el;
outputCollection.Rows.Add(row);
}
When this is output back to your Blue Prism workflow, you'll have a single column in your output collection of "Text" data type.
I have a C# program that looks through directories for .txt files and loads each into a DataTable.
static IEnumerable<string> ReadAsLines(string fileName)
{
using (StreamReader reader = new StreamReader(fileName))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
public DataTable GetTxtData()
{
IEnumerable<string> reader = ReadAsLines(this.File);
DataTable txtData = new DataTable();
string[] headers = reader.First().Split('\t');
foreach (string columnName in headers)
txtData.Columns.Add(columnName);
IEnumerable<string> records = reader.Skip(1);
foreach (string rec in records)
txtData.Rows.Add(rec.Split('\t'));
return txtData;
}
This works great for regular tab-delimited files. However, the catch is that not every .txt file in the folders I need to use contains tab-delimited data. Some .txt files are actually SQL queries, notes, etc. that have been saved as plain text files, and I have no way of determining that beforehand. Trying to use the above code on such files clearly won't lead to the expected result.
So my question is this: How can I tell whether a .txt file actually contains tab-delimited data before I try to read it into a DataTable using the above code?
Just searching the file for any tab character won't work because, for example, a SQL query saved as plain text might have tabs for code formatting.
Any guidance here at all would be much appreciated!
If each line should contain the same number of elements, then simply read each line and verify that you get the correct number of fields in each record; if not, error out.
if (headers.Count() != CORRECTNUMBER)
{
// ERROR
}
foreach (string rec in records)
{
string[] recordData = rec.Split('\t');
if (recordData.Count() != headers.Count())
{
// ERROR
}
txtData.Rows.Add(recordData);
}
To do this you need a set of "signature" logic providers which can check a given sample of the file for "signature" content. This is similar to how virus scanners work.
Consider creating a set of classes that each implement a common ISignature interface:
interface ISignature
{
    enumFileType Evaluate(IEnumerable<byte> inputStream);
}

class TSVFile : ISignature
{
    // check a sample of bytes for consistent tab-delimited structure
    enumFileType ISignature.Evaluate(IEnumerable<byte> inputStream) => throw new NotImplementedException();
}

class SQLFile : ISignature
{
    // look for SQL keywords near the start of the stream
    enumFileType ISignature.Evaluate(IEnumerable<byte> inputStream) => throw new NotImplementedException();
}
Each one would read in an appropriate number of bytes and return the known file type, if it can be determined. Each file parser would need its own logic to decide how many bytes to read and on what basis to make its evaluation.
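For the tab-delimited case specifically, the evaluation could be as simple as checking that the first few lines all split into the same number of fields. A minimal sketch (the method name, the sample size and the two-field minimum are all arbitrary choices):
using System.IO;
using System.Linq;

static bool LooksTabDelimited(string fileName, int sampleSize = 5)
{
    var sample = File.ReadLines(fileName).Take(sampleSize).ToList();
    if (sample.Count < 2)
        return false; // not enough data to judge

    int fieldCount = sample[0].Split('\t').Length;
    // require at least two fields, and the same count on every sampled line
    return fieldCount > 1 && sample.All(line => line.Split('\t').Length == fieldCount);
}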
I am trying to retrieve data from an Excel spreadsheet using C#. The data in the spreadsheet has the following characteristics:
no column names are assigned
the rows can have varying column lengths
some rows are metadata, and these rows label the content of the columns in the next row
Therefore, the objects I need to construct will always have their name in the very first column, and their parameters are contained in the next columns. It is important that the parameter names are retrieved from the row above. An example:
row1|---------|FirstName|Surname|
row2|---Person|Bob------|Bloggs-|
row3|---------|---------|-------|
row4|---------|Make-----|Model--|
row5|------Car|Toyota---|Prius--|
So unfortunately the data is heterogeneous, and the only way to determine what rows "belong together" is to check whether the first column in the row is empty. If it is, then read all data in the row, and check which parameter names apply by checking the row above.
At first I thought the straightforward approach would be to simply loop through
1) the dataset containing all sheets, then
2) the datatables (i.e. sheets) and
3) the rows.
However, I found that trying to extract this data with nested loops and if statements results in horrible, unreadable and inflexible code.
Is there a way to do this in LINQ? I had a look at this article to start by filtering the empty rows between data, but didn't really get anywhere. Could someone point me in the right direction with a few code snippets please?
Thanks in advance!
hiro
I see that you've already accepted an answer, but I think a more generic solution is possible, using reflection.
Let's say you get your data as a List<string[]>, where each element in the list is an array of strings holding all cells from the corresponding row.
List<string[]> data = LoadData();
var results = new List<object>();

// headerRow must be initialized here, otherwise the compiler reports a
// definite-assignment error where it is used inside the loop
string[] headerRow = null;

foreach (var row in data)
{
    if (string.IsNullOrEmpty(row[0]))
    {
        // metadata row: remember the parameter names for the rows below
        headerRow = row.Skip(1).ToArray(); // Skip requires using System.Linq
    }
    else
    {
        // data row: the first cell names the type, the rest hold property values
        Type objType = Type.GetType(row[0]);
        object newItem = Activator.CreateInstance(objType);
        for (int i = 0; i < headerRow.Length; i++)
        {
            objType.GetProperty(headerRow[i]).SetValue(newItem, row[i + 1]);
        }
        results.Add(newItem);
    }
}
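One caveat worth adding: Type.GetType(row[0]) only resolves namespace-qualified names (and assembly-qualified names for types in other assemblies). So for the sample data above, the first cell would need to hold something like "MyApp.Person", matching a hypothetical class such as:
namespace MyApp
{
    // hypothetical class matching the "Person" rows in the example data
    public class Person
    {
        public string FirstName { get; set; }
        public string Surname { get; set; }
    }
}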
I have a CSV file I am going to read from disk. I do not know up front how many columns it has or what the columns are named.
Any thoughts on how I should represent the fields? Ideally I want to be able to say something like:
string Val = DataStructure.GetValue(i,ColumnName).
where i is the ith Row.
Oh, just as an aside, I will be parsing it using the TextFieldParser class:
http://msdn.microsoft.com/en-us/library/cakac7e6(v=vs.90).aspx
That sounds as if you need a DataTable, which has Rows and Columns properties.
So you can say:
string Val = table.Rows[i].Field<string>(ColumnName);
A DataTable is a table of in-memory data. It can be used strongly typed (as suggested with the Field method), but actually it stores its data as objects internally.
You could use this parser to convert the csv to a DataTable.
Edit: I've only just seen that you want to use the TextFieldParser. Here's a possible simple approach to convert a CSV to a DataTable:
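// TextFieldParser lives in the Microsoft.VisualBasic.FileIO namespace,
// so add a reference to the Microsoft.VisualBasic assembly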
var table = new DataTable();
using (var parser = new TextFieldParser(File.OpenRead(path)))
{
parser.Delimiters = new[]{","};
parser.HasFieldsEnclosedInQuotes = true;
// load DataColumns from first line
String[] headers = parser.ReadFields();
foreach(var h in headers)
table.Columns.Add(h);
// load all other lines as data
String[] fields;
while ((fields = parser.ReadFields()) != null)
{
table.Rows.Add().ItemArray = fields;
}
}
If the column names are in the first row, read that row and store the names in a Dictionary<string, int> that maps each column name to its column index.
You could then store the remaining rows in a simple structure like List<string[]>.
To get a column for a row you'd do csv[rowIndex][nameToIndex[ColumnName]];
nameToIndex[ColumnName] gets the column index from the name, csv[rowIndex] gets the row (string array) we want.
This could of course be wrapped in a class.
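A minimal sketch of such a wrapper (the class and member names are illustrative):
using System.Collections.Generic;

class CsvData
{
    private readonly Dictionary<string, int> nameToIndex = new Dictionary<string, int>();
    private readonly List<string[]> rows;

    public CsvData(string[] headers, List<string[]> rows)
    {
        this.rows = rows;
        for (int i = 0; i < headers.Length; i++)
            nameToIndex[headers[i]] = i;
    }

    // usage: string val = csvData.GetValue(i, columnName);
    public string GetValue(int rowIndex, string columnName)
    {
        return rows[rowIndex][nameToIndex[columnName]];
    }
}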
Use the CSV parser if you want, but a text parser is something very easy to do yourself if you need customization.
For your need, I would use one (or more) Dictionary: at least one mapping the property string --> column index, and maybe the reverse (column index --> property string) if needed.
When I parse a CSV file, I usually put the result in a list while parsing, and then into an array once complete, for speed reasons (List.ToArray()).
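That pattern might look roughly like this (the delimiter and names are illustrative):
var rows = new List<string[]>();
foreach (var line in File.ReadLines(path))
    rows.Add(line.Split(',')); // collect into a List while parsing

string[][] data = rows.ToArray(); // switch to an array once parsing is complete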