Parsing data from CSV file - c#

As the title says, I have a problem parsing data from CSV files. When I choose a file with different formatting, all I get is "Input string was not in a correct format".
My code works with files formatted like this:
16.990750 4.0
17.000250 5.0
17.009750 1.0
17.019250 6.0
But it can't handle files formatted like this one:
Series1 - X;Series1 - Y;
285.75;798
285.79;764
285.84;578
285.88;690
This is the code responsible for reading data from the file and creating a chart from it:
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{
    string cos = File.ReadAllText(openFileDialog1.FileName);
    string[] rows = cos.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
    DataTable table = new DataTable();
    table.Columns.Add("xValue", typeof(decimal));
    table.Columns.Add("yValue", typeof(decimal));
    foreach (string row in rows)
    {
        string[] values = row.Split(' ');
        DataRow ch = table.NewRow();
        ch[0] = Decimal.Parse(values[0], NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture);
        ch[1] = Decimal.Parse(values[1], NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture);
        table.Rows.Add(ch);
    }
    if (seria == false)
    {
        wykres.Series.Add("series");
        wykres.Series["series"].ChartType = System.Windows.Forms.DataVisualization.Charting.SeriesChartType.Line;
        wykres.Series["series"].XValueMember = "xValue";
        wykres.Series["series"].YValueMembers = "yValue";
        wykres.DataSource = table;
        wykres.DataBind();
        seria = true;
    }
}
EDIT
I changed the parsing method to this one:
foreach (string row in rows)
{
    var values = row.Split(';');
    var ch = table.NewRow();
    decimal num = 0;
    if (decimal.TryParse(values[0], out num))
        ch[0] = num;
    if (decimal.TryParse(values[1], out num))
        ch[1] = num;
    table.Rows.Add(ch);
}
It works okay, with one exception: it can't read decimals, only integers, from the CSV file (see the picture below).
View of table in locals
Why is this happening?

I suggest you don't reinvent the wheel, but use a well-tested library to parse the CSV. For example, your implementation does not handle quoted values, and it doesn't allow the separator to appear as part of a value.
And guess what: .NET includes something that can help you: the TextFieldParser class. Don't worry about the Microsoft.VisualBasic.FileIO namespace - it works in C#, too :-)
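As a rough sketch of what that could look like for the semicolon-separated file from the question: the class and method names below (SemicolonCsv, ReadPoints) are made up for illustration, and the sketch assumes the project references the Microsoft.VisualBasic assembly. Rows that are not numeric, such as the header line, are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class SemicolonCsv   // hypothetical helper, not part of the question's code
{
    // Reads "x;y" pairs from any TextReader. Rows whose first two fields are
    // not numeric (e.g. the "Series1 - X;Series1 - Y;" header) are skipped.
    public static List<(decimal X, decimal Y)> ReadPoints(TextReader reader)
    {
        var points = new List<(decimal X, decimal Y)>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields == null || fields.Length < 2) continue;

                // InvariantCulture treats '.' as the decimal separator on every machine.
                if (decimal.TryParse(fields[0], NumberStyles.Float, CultureInfo.InvariantCulture, out decimal x) &&
                    decimal.TryParse(fields[1], NumberStyles.Float, CultureInfo.InvariantCulture, out decimal y))
                {
                    points.Add((x, y));
                }
            }
        }
        return points;
    }
}
```

For a file on disk, something like SemicolonCsv.ReadPoints(new StreamReader(openFileDialog1.FileName)) should work; the resulting pairs can then be added to the DataTable row by row.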

foreach (string row in rows)
{
    var values = row.Split(';');
    var ch = table.NewRow();
    decimal num = 0;
    if (decimal.TryParse(values[0], out num))
        ch[0] = num;
    if (decimal.TryParse(values[1], out num))
        ch[1] = num;
    table.Rows.Add(ch);
}
In the second file format the separator is ';' and the first row contains two strings rather than numbers. Therefore use decimal.TryParse() instead of decimal.Parse() to convert a string to a decimal. TryParse() returns a bool: if it returns true, the string was converted successfully.
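On the remaining question of why only integers come through: the two-argument decimal.TryParse uses the current culture of the machine. If that culture uses a comma as the decimal separator (an assumption about the asker's system, though it would explain the symptom), "285.75" fails to parse while "798" succeeds. A minimal sketch that pins the culture instead; the helper name TryParseInvariant is made up for this example:

```csharp
using System;
using System.Globalization;

public static class InvariantParsing   // hypothetical helper for illustration
{
    // Parses text that uses '.' as the decimal separator, independent of
    // the culture the program happens to run under.
    public static bool TryParseInvariant(string text, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

Inside the loop above, the calls would then read if (InvariantParsing.TryParseInvariant(values[0], out num)) ch[0] = num; and likewise for values[1].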

Related

c# csv count a specified data in file or in datagridview

I have a CSV file and would like to count how many times the second column contains 111.
The CSV file has 46 columns, separated by ; .
"first col" "second col" "....."
abc 111 a
abc 112 b
abc 113 c
abc 111 d
abc 112 e
abc 113 f
I would like to count the occurrences of 111.
First I fill the DataGridView from a DataTable:
dgv.DataSource = dgv_table;
string[] raw_text = File.ReadAllLines("d:\\" + lb_csv.Text);
string[] data_col = null;
int x = 0;
foreach (string text_line in raw_text)
{
    // MessageBox.Show(text_line);
    data_col = text_line.Split(';');
    if (x == 0)
    {
        for (int i = 0; i <= data_col.Count() - 1; i++)
        {
            dgv_table.Columns.Add(data_col[i]);
        }
        //header
        x++;
    }
    else
    {
        //data
        dgv_table.Rows.Add(data_col);
    }
}
I found lots of solutions for counting a specific value (111) in the second column, but every time I ran into problems.
int xCount = dgv.Rows.Cast<DataGridViewRow>().Select(row => row.Cells["second col"].Value).Where(s => s !=null && Equals(111)).Count();
this.lb_qty.Text = xCount.ToString();
But it gives an error for row.Cells["second col"].Value:
An unhandled exception of type 'System.ArgumentException' occurred in System.Windows.Forms.dll
Additional information: Column named second col cannot be found.
Can someone help me solve this problem and get the needed result?
I would suggest skipping the DataGridView and using a counter variable in your loop, as Arkadiusz suggested.
If you still want to work with the DataTable, count the values like this:
int xCount = dgv_table.Rows.Cast<DataRow>().Count(r => r["second col"] != null && r["second col"].ToString() == "111");
I would try to read the file into a DataTable and use it as DataSource for the DataGridView.
DataTable d_Table = new DataTable();
//fill the DataTable
this.dgv_table.DataSource = d_Table;
To count the rows which contain 111 in the second column, you can query the DataTable like this:
DataTable d_Table = new DataTable();
//fill the DataTable
DataRow[] rowCount = d_Table.Select("secondCol = '111'");
this.lb_qty.Text = rowCount.Length.ToString();
Or you can do it in a foreach loop:
int count = 0;
foreach (DataGridViewRow dgr in this.dgv_table.Rows)
{
    // Value can be null (e.g. for the empty "new row" placeholder), hence the ?. operator
    if (dgr.Cells["secondCol"].Value?.ToString() == "111") count++;
}
this.lb_qty.Text = count.ToString();
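A self-contained sketch of the DataTable route, for reference. It addresses the column by index rather than by name, which sidesteps the "Column named ... cannot be found" error when the header text is not exactly what was expected. The names ColumnCounter, FromLines and Count are invented for this example, and it assumes every data line has no more fields than the header.

```csharp
using System;
using System.Data;
using System.Linq;

public static class ColumnCounter   // hypothetical helper for illustration
{
    // Builds a DataTable from semicolon-separated lines; the first line is the header.
    public static DataTable FromLines(string[] lines)
    {
        var table = new DataTable();
        foreach (string header in lines[0].Split(';'))
            table.Columns.Add(header);
        foreach (string line in lines.Skip(1))
            table.Rows.Add(line.Split(';'));
        return table;
    }

    // Counts rows whose value in the given zero-based column equals the searched text.
    public static int Count(DataTable table, int columnIndex, string searched)
    {
        return table.Rows.Cast<DataRow>()
                    .Count(r => string.Equals(r[columnIndex]?.ToString(), searched, StringComparison.Ordinal));
    }
}
```

With the file from the question, ColumnCounter.Count(ColumnCounter.FromLines(File.ReadAllLines(path)), 1, "111") would count the second column; the same table can still be assigned to the grid's DataSource.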
You can use this method to read the CSV into a list of arrays (List<string[]>):
public static List<string[]> readCSV(String filename)
{
    List<string[]> result = new List<string[]>();
    try
    {
        string[] line = File.ReadAllLines(filename);
        foreach (string l in line)
        {
            // split the current line into its fields
            string[] value = l.Split(',');
            result.Add(value);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Error: '{0}'", e);
    }
    return result;
}
Each array represents one row of the file, so you can find the frequency of any value in a column using LINQ or even a loop. Here tmp is the list returned by readCSV, and index 1 selects the second column:
foreach (var item in tmp.Select(row => row[1]).GroupBy(c => c))
{
    Console.WriteLine("{0} : {1}", item.Key, item.Count());
}
int CountValues(string input, string searchedValue, int ColumnNumber, bool skipFirstLine = false)
{
    int numberOfSearchedValue = 0;
    string line;
    using (StreamReader reader = new StreamReader(input))
    {
        if (skipFirstLine)
            reader.ReadLine();
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Split(';')[ColumnNumber] == searchedValue)
                numberOfSearchedValue++;
        }
    }
    return numberOfSearchedValue;
}
Edit:
StreamReader.ReadLine() reads the current line and advances to the next one, which is also how the first line gets skipped. When there are no more lines it returns null, so that is our ending condition. The rest of the code is readable, I think :)
I didn't test this, so be careful :)
It might be necessary to use Trim() or ToUpper() in some places (as usual when you are searching).
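A variant along those lines, as a sketch only: it trims each field, compares case-insensitively with StringComparison.OrdinalIgnoreCase (instead of upper-casing both sides), and skips lines that are too short to contain the requested column. Taking a TextReader rather than a file path is a choice made here so the method can be tried on in-memory text; the class name CsvValueCounter is made up.

```csharp
using System;
using System.IO;

public static class CsvValueCounter   // hypothetical helper for illustration
{
    // Counts lines whose field at the zero-based columnIndex matches searchedValue,
    // ignoring surrounding whitespace and letter case.
    public static int CountValues(TextReader reader, string searchedValue, int columnIndex, bool skipFirstLine = false)
    {
        int count = 0;
        if (skipFirstLine) reader.ReadLine();

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(';');
            if (columnIndex >= fields.Length) continue; // short or empty line

            if (string.Equals(fields[columnIndex].Trim(), searchedValue.Trim(), StringComparison.OrdinalIgnoreCase))
                count++;
        }
        return count;
    }
}
```

For a file, pass new StreamReader(path) inside a using block so the reader is disposed afterwards.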

Summation/Add two values inside foreach loop

I want to add up array items inside a foreach loop.
My RegionalEarn array looks like this:
[0]Region1=25
[1]Region2=50
The final RModel.TAX should be 75 (25+50), but in my case it comes out as 2550.
My code:
string[] RegionalEarn = tickets["EARN"].ToString().Split(',');
foreach (var item in RegionalEarn)
{
    RModel.TAX = RModel.TAX + item.Split('=')[1];
}
You're adding strings, not numbers. You can use the TryParse method on, for example, the Int32 type to try to convert a string to an int. The other number types have a similar TryParse method. If your number comes with extra signs, dots or commas, use the overload that accepts a NumberStyles value and an IFormatProvider (for example a CultureInfo) matching the format the number comes from.
string[] RegionalEarn = tickets["EARN"].ToString().Split(',');
var sum = 0;
foreach (var item in RegionalEarn)
{
    var num = 0;
    if (Int32.TryParse(item.Split('=')[1], out num))
    {
        sum = sum + num;
    }
    else
    {
        // log error, item.Split('=')[1] is not an int
    }
}
RModel.TAX = sum.ToString();
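For amounts that are not plain integers, the overload taking NumberStyles and an IFormatProvider could be used roughly as follows. This is a sketch under assumptions: the class EarnSum and its Sum method are invented here, the input is taken as one "Region1=25,Region2=50" string, and entries that do not parse are silently ignored rather than logged.

```csharp
using System;
using System.Globalization;

public static class EarnSum   // hypothetical helper for illustration
{
    // Sums the numeric parts of entries such as "Region1=25,Region2=50".
    public static decimal Sum(string earn, IFormatProvider provider)
    {
        decimal total = 0m;
        foreach (string item in earn.Split(','))
        {
            string[] parts = item.Split('=');
            if (parts.Length == 2 &&
                decimal.TryParse(parts[1], NumberStyles.Number, provider, out decimal amount))
            {
                total += amount;
            }
        }
        return total;
    }
}
```

EarnSum.Sum("Region1=25,Region2=50", CultureInfo.InvariantCulture) returns 75. Note that because the entries themselves are separated by commas, this only suits number formats that do not use a comma inside the number.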
What you are actually doing is concatenation, because both values are strings. Convert them to int or double first.
string[] RegionalEarn = tickets["EARN"].ToString().Split(',');
foreach (var item in RegionalEarn)
{
    // RModel.TAX appears to be a string, so convert back after adding
    RModel.TAX = (Convert.ToInt32(RModel.TAX) + Convert.ToInt32(item.Split('=')[1])).ToString();
}
This is a simple way to do it.
string[] RegionalEarn = tickets["EARN"].ToString().Split(',');
RModel.TAX = RegionalEarn.Sum(item => int.Parse(item.Split('=')[1])).ToString();
The above parses the number after '=' in each item and adds the values up using the Sum() method. For the example this gives 75.

Casting a String as Integer in a .net DataTable

Disclaimer: This is my very first .net c# project
I am attempting to import a CSV into MSSQL but need to iterate through the CSV values first for sanitization purposes. Some of the columns in the CSV will be integers (they will be used for calculations later) and some are regular varchar.
My script below appears to force all values (that is, row column values) in the DataTable to be strings, which throws an exception later in my application when SQL cannot write a string as an integer.
Here is the method I am using for the getCSVImport, which creates a DataTable and populates it.
What I am thinking is to add another condition that checks whether the value is an integer and then casts it to an integer (this kind of thing is new to me, as PHP does not handle types so strictly), but I fear that won't work, as I am not sure whether I can mix values of various types within a DataTable.
So my question is: is there a way for me to have different values in a DataTable as different types? My code below takes the line as a whole and writes it as strings; I need the values to be assigned either as string or as integer.
/*
 * getCsvData()
 * This method will create a DataTable from the CSV file. We'll take the CSV file as is
 * and collect the data as needed:
 *
 * - Remove the original 4 lines (worthless info)
 * - Line 5 starts with the headers; remove any of the brackets around the values
 * - Iterate through the rest of the fields and sanitize them before we add them to the DataTable
 *
 */
private DataTable getCsvData(string csv_file_path)
{
    // Create a new csvData DataTable object:
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;
            int row = 1;
            while (!csvReader.EndOfData)
            {
                // Read the string and collect the row data
                string[] rowData = csvReader.ReadFields();
                if (row <= 4)
                {
                    // We want to start on row 5 as first rows are nonsense :)
                    // Increment the row so that we can do our magic above
                    row++;
                    continue;
                }
                if (row == 5)
                {
                    // Row 5 is the headers, we need to sanitize and continue:
                    foreach (string column in rowData)
                    {
                        // Remove the [ ] from the values:
                        var col = column.Substring(1, column.Length - 2);
                        DataColumn datecolumn = new DataColumn(col);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }
                    // Increment the row so that we can do our magic above
                    row++;
                }
                else
                {
                    // These are all of the actual rows, sanitize and add the rows:
                    // Making empty value as null
                    for (int i = 0; i < rowData.Length; i++)
                    {
                        // First remove the brackets:
                        if (rowData[i].Substring(0,1) == "[")
                        {
                            rowData[i] = rowData[i].Substring(1, rowData[i].Length - 2);
                        }
                        // Set blank to null:
                        if (rowData[i] == "" || rowData[i] == "-")
                        {
                            rowData[i] = null;
                        }
                        // Lastly, we need to do some calculations:
                    }
                    // Add the sanitized row to the DataTable:
                    csvData.Rows.Add(rowData);
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception("Could not parse the CSV file: " + ex.Message);
    }
    return csvData;
}
You can parse the string to an int:
int j;
bool parsed = Int32.TryParse("-105", out j);
With TryParse you can check whether the conversion succeeded.
Then, when you want to save it to the table again, convert it back to a string. You can simply call <variable>.ToString().
By default, data columns are initialized to a string data type.
There's an overload that allows you to specify the type, so I'd suggest you try that. Since your columns are known beforehand, you can easily handle this in your code.
private DataColumn AddColumn(string columnName, Type columnType)
{
    // Remove the [ ] from the values:
    var col = columnName.Substring(1, columnName.Length - 2);
    DataColumn dataColumn = new DataColumn(col, columnType);
    dataColumn.AllowDBNull = true;
    return dataColumn;
}
if (row == 5)
{
    csvData.Columns.Add(AddColumn(rowData[0], typeof(string)));
    csvData.Columns.Add(AddColumn(rowData[1], typeof(int)));
    csvData.Columns.Add(AddColumn(rowData[2], typeof(DateTime)));
    csvData.Columns.Add(AddColumn(rowData[3], typeof(string)));
    // etc
}
I'm not sure you'll even need to convert the other values before adding them to the DataTable, but if you do, many built-in types have TryParse methods, such as DateTime.TryParse and Int32.TryParse. You can call each of them in succession, and when one of the "tries" succeeds, you'll know your type.
Alternatively, since you know the column types beforehand, you can just cast each value.
csvData.Rows.Add(Convert.ToString(rowData[0]),
Convert.ToInt32(rowData[1]),
Convert.ToDateTime(rowData[2]),
Convert.ToString(rowData[3]));
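The idea of calling the TryParse methods in succession could be sketched like this. The class FieldTypeGuesser, the order of the checks and the use of CultureInfo.InvariantCulture are all assumptions made for the example; a real import would pick the culture and the candidate types that match its data.

```csharp
using System;
using System.Globalization;

public static class FieldTypeGuesser   // hypothetical helper for illustration
{
    // Tries the narrowest type first and falls back to string.
    // Returns null for blank input, meaning "no value".
    public static Type Guess(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return null;
        if (int.TryParse(value, NumberStyles.Integer, CultureInfo.InvariantCulture, out _)) return typeof(int);
        if (decimal.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out _)) return typeof(decimal);
        if (DateTime.TryParse(value, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)) return typeof(DateTime);
        return typeof(string);
    }
}
```

The order matters: "123" would also parse as a decimal, so the integer check has to come first.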
I would use *.TryParse(), ie: With this sample CSV:
*A sample csv file with
*some comment lines at top
-- with different comment
// comment strings.
[charField],[dateField],[intField],[decimalField]
"Sample char data 1",2016/1/2,123,123.45
"Sample char data 2",,2,1.5
"Sample char data 3",,3,
"Sample char data 4",,,
,,,
"Sample char data 6",2016/2/29 10:20,10,20.5
You might use TryParse on those datetime, int, decimal fields:
void Main()
{
    var myData = ReadMyCSV(@"c:\MyPath\MyFile.csv");
    // do whatever with myData
}
public IEnumerable<MyRow> ReadMyCSV(string fileName)
{
    using (TextFieldParser tfp = new TextFieldParser(fileName))
    {
        tfp.HasFieldsEnclosedInQuotes = true;
        tfp.SetDelimiters(new string[] { "," });
        //tfp.CommentTokens = new string[] { "*","--","//" };
        // instead of using comment tokens we are going to skip 4 lines
        for (int j = 0; j < 4; j++)
        {
            tfp.ReadLine();
        }
        // header line.
        tfp.ReadLine();
        DateTime dt;
        int i;
        decimal d;
        while (!tfp.EndOfData)
        {
            var data = tfp.ReadFields();
            yield return new MyRow
            {
                MyCharData = data[0],
                MyDateTime = DateTime.TryParse(data[1], out dt) ? dt : (DateTime?)null,
                MyIntData = int.TryParse(data[2], out i) ? i : 0,
                MyDecimal = decimal.TryParse(data[3], System.Globalization.NumberStyles.Any, null, out d) ? d : 0M
            };
        }
    }
}
public class MyRow
{
    public string MyCharData { get; set; }
    public int MyIntData { get; set; }
    public DateTime? MyDateTime { get; set; }
    public decimal MyDecimal { get; set; }
}
I could further sanitize the data loaded, such as:
myData.Where( d => d.MyIntData != 0 );
Note: I didn't use a DataTable, which I could if I wanted to. For MSSQL loading, I would probably use an intermediate in-memory SQLite instance to save the sanitized data and then push to MSSQL using SqlBulkCopy class. A DataTable is of course an option (I just think it is less flexible).

Passing decimal values from dataTable to dataFrame fails

I have created this code sample to pass an object of type c# DataTable to R.Net dataFrame.
public static DataFrame ConvertDataTableToRDataFrame(DataTable tab)
{
    REngine.SetEnvironmentVariables();
    REngine engine = REngine.GetInstance();
    double?[,] stringData = new double?[tab.Rows.Count, tab.Columns.Count];
    DataFrame df = engine.Evaluate("df=NULL").AsDataFrame();
    int irow = 0;
    foreach (DataRow row in tab.Rows)
    {
        NumericVector x = engine.Evaluate("x=NULL").AsNumeric();
        int icol = 0;
        foreach (DataColumn col in tab.Columns)
        {
            if (row.Field<double?>(col) == null)
            {
                x = engine.Evaluate("x=c(x, NA) ").AsNumeric();
            }
            else { x = engine.Evaluate("x=c(x, " + row.Field<double?>(col) + ") ").AsNumeric(); }
            icol++;
        }
        df = engine.Evaluate("df= as.data.frame(rbind(df,x)) ").AsDataFrame();
        irow++;
    }
    return (df);
}
Everything seems to work fine until I try to inspect the content of the data frame. I found that values like 1.2355 turn into 12355 in the data frame. For some unknown reason it doesn't recognize . as the decimal separator.
-adding @Panagiotis Kanavos' proposition as an answer since it solves the problem-
Decimal numbers don't have any specific decimal separator; they are binary values. A separator is used only when they are converted to strings. The output of row.Field<double?>(col) will be a Nullable<double> by definition, but concatenating it to a string will convert it to a string using the current user's culture.
Use String.Format(CultureInfo.InvariantCulture, "x=c(x, {0})", row.Field<double?>(col)) instead, or "x=c(x, " + row.Field<double?>(col).Value.ToString(CultureInfo.InvariantCulture) + ") ".
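A small sketch of that fix in isolation, with the NA case folded in. The class name RExpression and the method Append are made up for the example; only the string that would be handed to engine.Evaluate is built, so no R.NET is needed to try it.

```csharp
using System;
using System.Globalization;

public static class RExpression   // hypothetical helper for illustration
{
    // Builds the R snippet that appends one value (or NA) to the vector x.
    // InvariantCulture guarantees '.' as the decimal separator in the output.
    public static string Append(double? value)
    {
        return value.HasValue
            ? string.Format(CultureInfo.InvariantCulture, "x=c(x, {0})", value.Value)
            : "x=c(x, NA)";
    }
}
```

RExpression.Append(1.2355) yields "x=c(x, 1.2355)" regardless of the user's regional settings, whereas plain concatenation under a comma-decimal culture would produce "1,2355", which R does not read as the single number 1.2355.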

access csv file with LINQ

I have a C# lab question:
This is my code to add data from the CSV file. When I compile it, I get the error: The name 'rows' does not exist in the current context
foreach (string row in rows)
{
    if (string.IsNullOrEmpty(row)) continue;
    string[] cols = row.Split(',');
    DailyValues v = new DailyValues();
    v.Open = Convert.To*(cols[0]);
    v.High = Convert.To*(cols[1]);
    v.Low = Convert.To*(cols[2]);
    v.Close = Convert.To* (cols[3]);
    v.Volume = Convert.To* (cols[4]);
    v.AdjClose = Convert.To*(cols[5]);
    v.Date = Convert.To*(cols[6]);
    values.Add(v);
    return values;
}
It looks like your CSV file has data which can't be converted into a Decimal. Run it in the debugger, and have a look at row when the exception is thrown.
If you use Decimal.TryParse(), the return value will tell you if the conversion was successful without an exception being thrown.
Edit:
As an example for TryParse:
Decimal _Open, _High;
if (!Decimal.TryParse(cols[0], out _Open))
{
    Debug.Print("Error on row: {0}", row);
    continue;
}
v.Open = _Open;
if (!Decimal.TryParse(cols[1], out _High))
{
    Debug.Print("Error on row: {0}", row);
    continue;
}
v.High = _High;
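Repeating that block for every column gets long, so one option is to parse a whole row at once and let LINQ drop the rows that fail. This is only a sketch: RowParser, TryParseDecimals and ParseAll are invented names, the columns are assumed to be comma-separated with '.' as the decimal separator, and bad rows are dropped silently instead of being printed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RowParser   // hypothetical helper for illustration
{
    // Returns the first `count` columns as decimals, or null if any of them fails to parse.
    public static decimal[] TryParseDecimals(string row, int count)
    {
        string[] cols = row.Split(',');
        if (cols.Length < count) return null;

        var result = new decimal[count];
        for (int i = 0; i < count; i++)
        {
            if (!decimal.TryParse(cols[i], NumberStyles.Number, CultureInfo.InvariantCulture, out result[i]))
                return null;
        }
        return result;
    }

    // Keeps only the rows where every requested column parsed.
    public static List<decimal[]> ParseAll(IEnumerable<string> rows, int count)
    {
        return rows.Where(r => !string.IsNullOrEmpty(r))
                   .Select(r => TryParseDecimals(r, count))
                   .Where(parsed => parsed != null)
                   .ToList();
    }
}
```

Each resulting array could then be mapped onto a DailyValues object (v.Open = parsed[0], v.High = parsed[1], and so on).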
