Don't split a string on commas inside double quotes - C#

I have a delimited text file that I need to convert into a DataTable. The text looks something like this:
Name,Contact,Email,Date Of Birth,Address
JOHN,01212121,hehe#yahoo.com,1/12/1987,"mawar rd, shah alam, selangor"
JACKSON,01223323,haha#yahoo.com,1/4/1967,"neelofa rd, sepang, selangor"
DAVID,0151212,hoho#yahoo.com,3/5/1956,"nora danish rd, klang, selangor"
And this is how I read the text file in C#:
DataTable table = new DataTable();
using (StreamReader sr = new StreamReader(path))
{
#region Text to csv
while (!sr.EndOfStream)
{
string[] line = sr.ReadLine().Split(',');
//table.Rows.Add(parts[0], parts[1], parts[2], parts[3], parts[4], parts[5]);
if (IsRowHeader)//Is user want to read first row as the header
{
foreach (string column in line)
{
table.Columns.Add(column);
}
totalColumn = line.Count();
IsRowHeader = false;
}
else
{
if (totalColumn == 0)
{
totalColumn = line.Count();
for (int j = 0; j < totalColumn; j++)
{
table.Columns.Add();
}
}
// create a DataRow using .NewRow()
DataRow row = table.NewRow();
// iterate over all columns to fill the row
for (int i = 0; i < line.Count(); i++)
{
row[i] = line[i];
}
// add the current row to the DataTable
table.Rows.Add(row);
}
}
#endregion
}
The columns are dynamic: the user can add or remove columns in the text file. So I first check how many columns there are and add them to the DataTable; after that I read each line, set the values on a DataRow, and add the row to the table.
If I don't remove the commas inside the double quotes, I get the error "Cannot find column 5", because the first line has only 5 columns (indices 0 to 4) while the split data lines produce more values.
What is the best way to parse delimited text?

Don't try and re-invent the CSV-parsing wheel. Use the parser built into .NET: Microsoft.VisualBasic.FileIO.TextFieldParser
See https://stackoverflow.com/a/3508572/7122.
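As a minimal sketch of that suggestion (not the linked answer's code), the parser can fill a DataTable directly. The class name CsvToDataTable, the method Read, and the firstRowIsHeader parameter are made up for this illustration; the TextFieldParser members used are part of Microsoft.VisualBasic.FileIO, which on .NET Framework needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvToDataTable
{
    // Hypothetical helper: reads comma-delimited text into a DataTable.
    // Quoted fields may contain commas; the parser strips the quotes.
    public static DataTable Read(TextReader reader, bool firstRowIsHeader)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields == null) continue;

                // The first record decides how many columns the table has.
                if (table.Columns.Count == 0)
                {
                    foreach (string field in fields)
                    {
                        if (firstRowIsHeader) table.Columns.Add(field);
                        else table.Columns.Add();
                    }
                    if (firstRowIsHeader) continue; // header is not a data row
                }

                table.Rows.Add(fields);
            }
        }
        return table;
    }

    public static void Main()
    {
        string text =
            "Name,Contact,Address\n" +
            "JOHN,01212121,\"mawar rd, shah alam, selangor\"\n";
        DataTable table = Read(new StringReader(text), true);
        Console.WriteLine(table.Rows[0]["Address"]);
    }
}
```

For a file on disk, pass new StreamReader(path) instead of the StringReader. A data line with more fields than the first record would still throw when the row is added, so a real importer may want to validate the field count first.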

No, just don't. Don't try and write your own CSV parser - there's no reason to do it.
This article explains the problem and recommends using FileHelpers, which is decent enough.
There is also the LumenWorks CSV reader, which is simpler and just as useful.
Finally, you can apparently use DataSets to link to your CSV as described here. I didn't try this one, but it looks interesting, if probably outdated.

I usually go with something like this:
const char separator = ',';
using (var reader = new StreamReader("C:\\sample.txt"))
{
var fields = (reader.ReadLine() ?? "").Split(separator);
// Dynamically add the columns
var table = new DataTable();
table.Columns.AddRange(fields.Select(field => new DataColumn(field)).ToArray());
while (reader.Peek() >= 0)
{
var line = reader.ReadLine() ?? "";
// Split the values considering the quoted field values
var values = Regex.Split(line, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")
.Select((value, current) => value.Trim())
.ToArray()
;
// Add those values directly
table.Rows.Add(values);
}
// Demonstrate the results
foreach (DataRow row in table.Rows)
{
Console.WriteLine();
foreach (DataColumn col in table.Columns)
{
Console.WriteLine("{0}={1}", col.ColumnName, row[col]);
}
}
}

Related

print an entire table in C#

I'm trying to print the content of a DataTable, starting with the column headers, followed by the rows (tuples) of the table.
output.Add($"Table : [{dataTable.TableName}]");
string strColumnNames = "";
foreach (DataColumn col in dataTable.Columns)
{
if (strColumnNames == "")
strColumnNames = col.ColumnName.PadLeft(col.MaxLength - col.ColumnName.Length); // (*)
else strColumnNames = strColumnNames + "|" +
col.ColumnName.PadLeft(col.MaxLength - col.ColumnName.Length); // (*)
}
output.Add($"[{strColumnNames}]");
foreach (DataRow dataRow in dataTable.Rows)
{
string temp = "";
for (int i = 0; i < dataRow.ItemArray.Count(); i++)
{
if (i == 0)
temp = dataRow.ItemArray[i].ToString(); // (**)
else temp += "|" + dataRow.ItemArray[i].ToString(); // (**)
}
output.Add($"[{temp}]");
}
The (*) parts in this code use the MaxLength property of each DataColumn in order to get a column-aligned output.
I would like to do the same in the (**) parts, but I don't know how to access the corresponding DataColumn starting from the dataRow object.
Does anybody have an idea?
Thanks in advance
You already have the dataTable instance available. dataTable.Columns[i] should give you the appropriate DataColumn.
The DataTable has already been instantiated here. If you want the DataColumn for a given cell, use dataTable.Columns[i] to get the matching column.
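A rough sketch of how dataTable.Columns[i] can line up with dataRow.ItemArray[i] when padding cells. This version swaps in a different width rule than the question: it measures the widest value per column instead of using MaxLength, on the assumption that MaxLength is often left at its default of -1. The names TablePrinter and Format are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TablePrinter
{
    // Hypothetical helper: formats a DataTable as aligned, pipe-separated lines.
    public static List<string> Format(DataTable dataTable)
    {
        // Width per column: the longest of the header and every cell in it.
        int[] widths = new int[dataTable.Columns.Count];
        for (int i = 0; i < widths.Length; i++)
        {
            widths[i] = dataTable.Columns[i].ColumnName.Length;
            foreach (DataRow row in dataTable.Rows)
                widths[i] = Math.Max(widths[i], row[i].ToString().Length);
        }

        var output = new List<string>();
        output.Add($"Table : [{dataTable.TableName}]");

        // Header: column i is padded to widths[i].
        output.Add("[" + string.Join("|", dataTable.Columns.Cast<DataColumn>()
            .Select((col, i) => col.ColumnName.PadLeft(widths[i]))) + "]");

        // Rows: ItemArray[i] belongs to dataTable.Columns[i], so it shares widths[i].
        foreach (DataRow row in dataTable.Rows)
        {
            output.Add("[" + string.Join("|", row.ItemArray
                .Select((value, i) => value.ToString().PadLeft(widths[i]))) + "]");
        }
        return output;
    }

    public static void Main()
    {
        var table = new DataTable("People");
        table.Columns.Add("Id");
        table.Columns.Add("Name");
        table.Rows.Add("1", "Alice");
        table.Rows.Add("22", "Bob");
        foreach (string line in Format(table))
            Console.WriteLine(line);
    }
}
```

PadLeft takes the total width of the result, not the number of padding characters, which is why the width is passed as-is rather than with the text length subtracted.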

CSV Helper: Parsing null vs empty cells

I am using CSVHelper to parse the CSV file.
I am having trouble distinguishing a null (empty) cell from a cell that contains some value (i.e. one or more spaces).
The issue is that when the user puts just one space in a cell and uploads the file, CsvHelper trims that cell value, so the value is passed as "".
When the user doesn't type anything in the cell, the value is also passed as "".
So what I want is:
- Nulls should not be allowed to be uploaded.
- One or more spaces in a cell is allowed.
How can I achieve this using CsvHelper? Below is my sample code:
using (TextReader fileReader = new StreamReader(file.OpenReadStream()))
{
var configuration = new Configuration
{
HasHeaderRecord = parameter.HasHeader,
Delimiter = parameter.Delimiter.ToString(),
Quote = parameter.Quote
};
using (var csv = new CsvReader(fileReader, configuration))
{
for (int rowIndex = 0; await csv.ReadAsync(); rowIndex++)
{
var record = csv.GetRecord<dynamic>() as IDictionary<string, object>;
string[] row = record?.Select(i => i.Value as string).ToArray();
for (int i = 0; i < row.Length; i++)
{
//process rows
}
}
}
}
Below is the csv example:
"1"," ","1"
"2","0"," "
"3","","1"
In the above CSV, the first row has one space in its second column, which should be allowed.
The third row has an empty (null) second column, which should not be allowed.
Is anything missing in my code, or is there a workaround to handle this?
Thanks
CsvHelper 15.0.3
With the following code I show spaces where there are spaces and empty where it is empty.
Maybe there is something else going on?
static void Main(string[] args)
{
ProcessRecords();
}
static async void ProcessRecords()
{
using (var reader = new StringReader("\"1\",\" \",\"1\"\n\"2\",\"0\",\" \"\n\"3\",\"\",\"1\""))
{
var configuration = new CsvHelper.Configuration.CsvConfiguration(CultureInfo.InvariantCulture)
{
HasHeaderRecord = false,
Delimiter = ",",
Quote = '"'
};
using (var csv = new CsvReader(reader, configuration))
{
for (int rowIndex = 0; await csv.ReadAsync(); rowIndex++)
{
Console.WriteLine($"Row: {rowIndex}");
var record = csv.GetRecord<dynamic>() as IDictionary<string, object>;
string[] row = record?.Select(i => i.Value as string).ToArray();
for (int i = 0; i < row.Length; i++)
{
if (row[i] == " ")
{
Console.WriteLine("Has a space");
}
if (row[i] == "")
{
Console.WriteLine("Empty value");
}
}
}
Console.ReadKey();
}
}
}
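Once the values arrive untrimmed, as in the test above, the rule from the question (reject empty cells, allow cells made of spaces) comes down to a plain string check. The sketch below is independent of CsvHelper and works on the string[] row built in the loop; CellValidator and FirstEmptyCell are names invented for this illustration.

```csharp
using System;

public static class CellValidator
{
    // Hypothetical helper: returns the index of the first null or zero-length
    // cell, or -1 when every cell holds at least one character.
    // Whitespace-only cells such as " " count as valid.
    public static int FirstEmptyCell(string[] row)
    {
        for (int i = 0; i < row.Length; i++)
        {
            if (string.IsNullOrEmpty(row[i])) return i;
        }
        return -1;
    }

    public static void Main()
    {
        string[][] rows =
        {
            new[] { "1", " ", "1" },
            new[] { "2", "0", " " },
            new[] { "3", "", "1" },
        };
        for (int r = 0; r < rows.Length; r++)
        {
            int bad = FirstEmptyCell(rows[r]);
            Console.WriteLine(bad < 0
                ? $"Row {r}: ok"
                : $"Row {r}: empty cell at column {bad}");
        }
    }
}
```

string.IsNullOrEmpty is used on purpose instead of string.IsNullOrWhiteSpace, since the latter would also reject the cells that contain only spaces.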

Why would there be no data in the visualiser when there is valid data in the DataTable?

I'm trying to build a wrapper for SpreadsheetLight that returns a DataSet from any .xlsx document passed through it. However, I seem to be having a problem with DataRows not being added to a temporary DataTable.
Here's part of the code that parses a worksheet and generates a DataTable from it:
public DataSet ReadToDataSet(string fileName)
{
using (var wb = new SLDocument(fileName))
{
var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
foreach (var wsName in wb.GetWorksheetNames())
{
var ws = wb.SelectWorksheet(wsName);
// Select worksheet returns a bool, so if it comes back false, try the next worksheet instead.
if (!ws) continue;
// Statistics gives indices of the first and last data cells
var stats = wb.GetWorksheetStatistics();
// Create a new DataTable for each worksheet
var dt = new DataTable(wsName);
//var addDataColumns = true;
for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
dt.Columns.Add(colIdx.ToString(), typeof(string));
// Scan each row
for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
{
//dt.Rows.Add();
var newRow = dt.NewRow();
// And each column for data
for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
{
//if (addDataColumns)
// dt.Columns.Add();
newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
//if (colIdx >= stats.EndColumnIndex)
// addDataColumns = false;
}
dt.Rows.Add(newRow);
}
set.Tables.Add(dt);
}
// Debug output
foreach (DataRow row in set.Tables[0].Rows)
{
foreach (var output in row.ItemArray)
{
Console.WriteLine(output.ToString());
}
}
return set;
}
}
Note: SpreadsheetLight indices start from 1 instead of 0.
Now, I've tried replacing dt.Rows.Add() with new object[stats.EndColumnIndex -1];, as well as a temporary variable from var newRow = dt.NewRow(); and then passing them into the DataTable afterwards, but still get the same end result. The row objects are populating correctly, but aren't transferring to the DataTable at the end.
When you explore the object during runtime, it shows the correct number of rows and columns in the relevant properties. But when you open it up in the DataVisualiser you can only see the columns, no rows.
I must be missing something obvious.
Update
I looped through the resulting table and output the values to the console as a test. All the correct values appear, but the visualiser remains empty:
I guess the question now is, why would there be no data in the visualiser when there is valid data in the DataTable?
Update 2
Added the full method for reference, including a simple set of for loops to loop through all rows and columns in the first DataTable. Note: I also experimented with pulling the column creation out of the loop and even setting the datatypes. Made no difference. Commented code shows the original.
OK, it turns out the problem most likely came from the columns being added. Either there were too many columns for the visualiser to handle (1024), which I find hard to believe, or there was a bug in Visual Studio that has randomly corrected itself.
There's also a bug in SpreadsheetLight that lists all columns as having data when you call GetWorksheetStatistics(), so I've used a workaround that takes the total number of cells available or stats.NumberOfColumns, whichever is smaller.
Either way, the below code now functions.
public DataSet ReadToDataSet(string fileName)
{
using (var wb = new SLDocument(fileName))
{
var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
foreach (var wsName in wb.GetWorksheetNames())
{
var ws = wb.SelectWorksheet(wsName);
// Select worksheet returns a bool, so if it comes back false, try the next worksheet instead.
if (!ws) continue;
// Statistics gives indices of the first and last data cells
var stats = wb.GetWorksheetStatistics();
// There is a bug with the stats columns. Take the total number of cells available or the column count from the stats, whichever is smaller
var newColumnIndex = stats.NumberOfCells < stats.NumberOfColumns
? stats.NumberOfCells
: stats.NumberOfColumns;
// Create a new DataTable for each worksheet
var dt = new DataTable(wsName);
var addDataColumns = true;
// Scan each row
for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
{
var newRow = dt.NewRow();
// And each column for data
for (var colIdx = stats.StartColumnIndex; colIdx < newColumnIndex; colIdx++)
{
if (addDataColumns)
dt.Columns.Add();
newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
}
addDataColumns = false;
dt.Rows.Add(newRow);
}
set.Tables.Add(dt);
}
return set;
}
}
Hopefully someone else finds this a useful reference in the future, either for SpreadsheetLight or the DataVisualiser in Visual Studio. If anyone knows of any limits for the visualiser, I'm all ears!

Add specific rows from dataGridView depending on search value

Following issue:
I have a datatable and a list which contains specific values.
routIds = col1Items.Distinct().ToList();
String searchValue = String.Empty;
int rowIndex = -1;
for (int i = 0; i < routIds.Count; i++)
{
searchValue = routIds[i];
foreach (DataGridViewRow row in form1.dataGridView5.Rows)
{
if (row.Cells[form1.routingIdBox.Text].Value != null) // Need to check for null if new row is exposed
{
if (row.Cells[form1.routingIdBox.Text].Value.ToString().Equals(searchValue))
{
rowIndex = row.Index;
foreach (DataGridViewColumn column in form1.dataGridView5.Columns)
dtRout.Columns.Add(column.Name);
for (int k = 0; k < form1.dataGridView5.Rows.Count; k++)
{
dtRout.Rows.Add();
for (int j = 0; j < form1.dataGridView5.Columns.Count; j++)
{
datTable.Rows[k][j] = form1.dataGridView5.Rows[rowIndex].Cells[j].Value;
}
}
}
}
}
}
I want to search the first column of my DataGridView and check whether it matches a specific value (from my list, routIds). If it does, I want to add the whole row to a DataTable, but I don't know exactly how to do this. I have tried a few things but I get exceptions (specific row not found).
Assuming you have a DataTable as your underlying DataSource, I would not iterate through the DataGridViewRows.
DataTable dataSource = dataGridView.DataSource as DataTable; // if it is a DataTable. If not, please specify in your question
// let's make a DataTable, which is a copy of your DataSource
DataTable dataTableFoundIds = new DataTable();
foreach (DataColumn column in dataSource.Columns)
dataTableFoundIds.Columns.Add(column.ColumnName, column.DataType);
// iterate through your routeIds
foreach (int id in routeIds)
{
var row = dataSource.AsEnumerable().FirstOrDefault(item => item["col1"].Equals(id)); // take the first row in your DataSource that matches your routeId
if (row != null)
{
dataTableFoundIds.Rows.Add(row.ItemArray); // if we find something, insert the whole row of our source table
}
}
Hope this helps!
UPDATE: if you want to find all occurrences:
foreach (int id in routeIds)
{
var rows = dataSource.AsEnumerable().Where(item => item["col1"].Equals(id)); // take all rows in your DataSource that match your routeId
foreach(var row in rows)
{
dataTableFoundIds.Rows.Add(row.ItemArray); // if we find something, insert the whole row of our source table
}
}
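A possible variation on the approach above, sketched under a few assumptions: the IDs are strings (as routIds appears to be in the question), the schema is copied with DataTable.Clone() instead of a manual column loop, and matching rows are copied with ImportRow. RowFilter, CopyMatchingRows and the column name "RouteId" are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RowFilter
{
    // Hypothetical helper: returns a new table with the same schema as source,
    // holding every row whose value in columnName is one of the given ids.
    public static DataTable CopyMatchingRows(
        DataTable source, string columnName, IEnumerable<string> ids)
    {
        DataTable result = source.Clone();      // copies columns, not rows
        var wanted = new HashSet<string>(ids);  // fast membership test

        foreach (DataRow row in source.Rows)
        {
            // Compare as text so the column type does not matter.
            if (wanted.Contains(Convert.ToString(row[columnName])))
                result.ImportRow(row);
        }
        return result;
    }

    public static void Main()
    {
        var source = new DataTable();
        source.Columns.Add("RouteId");
        source.Columns.Add("Name");
        source.Rows.Add("A1", "first");
        source.Rows.Add("B2", "second");
        source.Rows.Add("A1", "third");

        DataTable found = CopyMatchingRows(source, "RouteId", new[] { "A1" });
        foreach (DataRow row in found.Rows)
            Console.WriteLine($"{row["RouteId"]} {row["Name"]}");
    }
}
```

This makes a single pass over the source rows rather than one scan per ID, which may matter when the grid is large.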

Find a string in all DataTable columns

I am trying to find a fast way to find a string in all columns of a DataTable.
The following does not work, as I want to search within the values of all columns:
string str = "%whatever%";
foreach (DataRow row in dataTable.Rows)
foreach (DataColumn col in row.ItemArray)
if (row[col].ToString() == str) return true;
You can use LINQ. It wouldn't be any faster, because you still need to look at each cell in case the value is not there, but it will fit in a single line:
return dataTable
.Rows
.Cast<DataRow>()
.Any(r => r.ItemArray.Any(c => c.ToString().Contains("whatever")));
To search for arbitrary text and return an array of the rows that have at least one cell with a case-insensitive match, use this:
var text = "whatever";
return dataTable
.Rows
.Cast<DataRow>()
.Where(r => r.ItemArray.Any(
c => c.ToString().IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0
)).ToArray();
If you want to check every row of every column in your Datatable, try this (it works for me!).
DataTable YourTable = new DataTable();
// Fill your DataTable here with whatever you've got.
foreach (DataRow row in YourTable.Rows)
{
foreach (object item in row.ItemArray)
{
//Do what ya gotta do with that information here!
}
}
Don't forget to typecast object item to whatever you need (string, int etc).
I've stepped through with the debugger and it works a charm. I hope this helps, and good luck!
This can be achieved by filtering. Create a (re-usable) filtering string based on all the columns:
bool UseContains = false;
int colCount = MyDataTable.Columns.Count;
string likeStatement = (UseContains) ? " Like '%{0}%'" : " Like '{0}%'";
// StringBuilder requires "using System.Text;"
StringBuilder query = new StringBuilder();
for (int i = 0; i < colCount; i++)
{
string colName = MyDataTable.Columns[i].ColumnName;
query.Append(string.Concat("Convert(", colName, ", 'System.String')", likeStatement));
if (i != colCount - 1)
query.Append(" OR ");
}
string filterString = query.ToString();
Now you can get the rows where one of the columns matches your searchstring:
string currFilter = string.Format(filterString, searchText);
DataRow[] tmpRows = MyDataTable.Select(currFilter, somethingToOrderBy);
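One thing the filter approach has to watch for is search text containing an apostrophe or a LIKE wildcard. The sketch below is an illustration rather than the answerer's code: it doubles single quotes and wraps *, %, [ and ] in brackets, following the escaping rules documented for DataColumn.Expression, and puts column names in brackets so names with spaces work. FilterBuilder, EscapeLikeValue and BuildContainsFilter are invented names.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text;

public static class FilterBuilder
{
    // Hypothetical helper: escapes text for use inside a LIKE '...' pattern.
    public static string EscapeLikeValue(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '[':
                case ']':
                case '%':
                case '*':
                    sb.Append('[').Append(c).Append(']'); // literal wildcard
                    break;
                case '\'':
                    sb.Append("''");                      // doubled quote
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // Builds "Convert([A], 'System.String') LIKE '%text%' OR ..." for all columns.
    // Assumes column names contain no ']' character.
    public static string BuildContainsFilter(DataTable table, string searchText)
    {
        string escaped = EscapeLikeValue(searchText);
        return string.Join(" OR ", table.Columns.Cast<DataColumn>().Select(
            c => $"Convert([{c.ColumnName}], 'System.String') LIKE '%{escaped}%'"));
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Last Name");
        table.Rows.Add(1, "O'Brien");
        table.Rows.Add(2, "Smith");

        DataRow[] rows = table.Select(BuildContainsFilter(table, "o'b"));
        Console.WriteLine(rows.Length);
    }
}
```

Whether the match ignores case depends on the table's CaseSensitive property, which is false by default.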
You can create a routine of search with an array of strings with the names of the columns, as well:
string[] elems = {"GUID", "CODE", "NAME", "DESCRIPTION"};//Names of the columns
foreach(string column in elems)
{
string expression = string.Format("{0} like '%{1}%'",column,
txtSearch.Text.Trim());//Search Expression
DataRow[] row = data.Select(expression);
if(row.Length > 0) {
// Some code here
} else {
// Other code here
}
}
You can get the names of the columns from the ColumnName property and then use them to search every column in the DataTable. For example, the following code will work:
string str = "whatever";
foreach (DataRow row in dataTable.Rows)
{
foreach (DataColumn column in dataTable.Columns)
{
if (row[column.ColumnName.ToString()].ToString().Contains(str))
{
return true;
}
}
}
You can create a filter expression on the DataTable as well. See this MSDN article. Use LIKE in your filter expression.
string filterExp = "Status = 'Active'";
string sortExp = "City";
DataRow[] drarray;
drarray = dataSet1.Customers.Select(filterExp, sortExp, DataViewRowState.CurrentRows);
for (int i=0; i < drarray.Length; i++)
{
listBox1.Items.Add(drarray[i]["City"].ToString());
}
