reading a CSV into a Datatable without knowing the structure


I am trying to read a CSV into a DataTable.
The CSV may have hundreds of columns but only up to 20 rows.
It will look something like this:
+----------+-----------------+-------------+---------+---+
| email1 | email2 | email3 | email4 | … |
+----------+-----------------+-------------+---------+---+
| ccemail1 | anotherccemail1 | 3rdccemail1 | ccemail | |
| ccemail2 | anotherccemail2 | 3rdccemail2 | | |
| ccemail3 | anotherccemail3 | | | |
| ccemail4 | anotherccemail4 | | | |
| ccemail5 | | | | |
| ccemail6 | | | | |
| ccemail7 | | | | |
| … | | | | |
+----------+-----------------+-------------+---------+---+
I am trying to use GenericParser for this; however, I believe it requires you to know the column names.
string strID, strName, strStatus;
using (GenericParser parser = new GenericParser())
{
parser.SetDataSource("MyData.txt");
parser.ColumnDelimiter = "\t".ToCharArray();
parser.FirstRowHasHeader = true;
parser.SkipStartingDataRows = 10;
parser.MaxBufferSize = 4096;
parser.MaxRows = 500;
parser.TextQualifier = '\"';
while (parser.Read())
{
strID = parser["ID"]; //as you can see this requires you to know the column names
strName = parser["Name"];
strStatus = parser["Status"];
// Your code here ...
}
}
Is there a way to read this file into a DataTable without knowing the column names?

It's so simple!
var adapter = new GenericParsing.GenericParserAdapter(filepath);
DataTable dt = adapter.GetDataTable();
This will automatically do everything for you.

I looked at the source code, and you can also access the data by column index, like this:
var firstColumn = parser[0];
Replace the 0 with the column index.
The number of columns can be found using
parser.ColumnCount

I'm not familiar with GenericParser; I would suggest using tools like TextFieldParser, FileHelpers or this CSV-Reader.
But this simple manual approach should also work:
// Requires: using System; using System.Data; using System.IO; using System.Linq;
string[] lines = File.ReadAllLines(filePath);
// StringSplitOptions.None keeps empty fields, so values stay in their own columns.
string[] headers = lines.First().Split(new[]{','}, StringSplitOptions.None);
DataTable tbl = new DataTable();
for (int i = 0; i < headers.Length; i++)
{
tbl.Columns.Add(headers[i]);
}
var data = lines.Skip(1);
foreach(var line in data)
{
var fields = line.Split(new[]{','}, StringSplitOptions.None);
// Assumes no row has more fields than the header; ItemArray throws otherwise.
DataRow newRow = tbl.Rows.Add();
newRow.ItemArray = fields;
}
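The manual approach can be wrapped into a small reusable helper. The following is only a sketch under stated assumptions: the class and method names (`CsvSketch`, `ReadCsv`) are made up for illustration, fields are assumed never to contain quoted commas or embedded line breaks, and header names are assumed to be unique. For real-world CSV files a dedicated parser is the safer choice.

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvSketch
{
    // Reads comma-separated text into a DataTable without knowing the columns up front.
    // Assumption: no quoted fields, no embedded line breaks, unique header names.
    public static DataTable ReadCsv(TextReader reader)
    {
        var table = new DataTable();

        string headerLine = reader.ReadLine();
        if (headerLine == null)
            return table; // empty input: no columns, no rows

        // StringSplitOptions.None keeps empty fields, so columns stay aligned.
        foreach (string name in headerLine.Split(new[] { ',' }, StringSplitOptions.None))
            table.Columns.Add(name.Trim());

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue; // skip blank lines

            string[] fields = line.Split(new[] { ',' }, StringSplitOptions.None);

            // A row wider than the header gets extra auto-named columns.
            while (table.Columns.Count < fields.Length)
                table.Columns.Add();

            // A row narrower than the header leaves the remaining cells as DBNull.
            table.Rows.Add(fields);
        }

        return table;
    }
}
```

It could be called with `CsvSketch.ReadCsv(new StreamReader(filePath))`, or with a `StringReader` when the text is already in memory.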

I used GenericParser to do it.
On the first pass through the loop I get the column names and add the ones I need to a list, then use that list to look up the values.
In my case I have pivoted the data, but here is a code sample in case it helps someone:
bool firstRow = true;
List<string> columnNames = new List<string>();
List<Tuple<string, string, string>> results = new List<Tuple<string, string, string>>();
while (parser.Read())
{
if (firstRow)
{
for (int i = 0; i < parser.ColumnCount; i++)
{
if (parser.GetColumnName(i).Contains("FY"))
{
columnNames.Add(parser.GetColumnName(i));
Console.WriteLine("Column found: {0}", parser.GetColumnName(i));
}
}
firstRow = false;
}
foreach (var col in columnNames)
{
double actualCost = 0;
bool hasValueParsed = Double.TryParse(parser[col], out actualCost);
csvData.Add(new ProjectCost
{
ProjectItem = parser["ProjectItem"],
ActualCosts = actualCost,
ColumnName = col
});
}
}

Related

Replace repeated values in collection with its sum

I have a list of custom class ModeTime, its structure is below:
private class ModeTime
{
public DateTime Date { get; set; }
public string LineName { get; set; }
public string Mode { get; set; }
public TimeSpan Time { get; set; }
}
In this list some items have the same LineName and Mode and appear consecutively. I need to sum the Time property of such items and replace them with a single item holding the summed Time, keeping LineName and Mode unchanged; Date should be taken from the first of the replaced items. Here is an example:
Original:
Date | LineName | Mode | Time
01.09.2018 | Line1 | Auto | 00:30:00
01.09.2018 | Line2 | Auto | 00:10:00
01.09.2018 | Line2 | Auto | 00:05:00
01.09.2018 | Line2 | Manual | 00:02:00
01.09.2018 | Line2 | Auto | 00:08:00
01.09.2018 | Line1 | Manual | 00:25:00
01.09.2018 | Line2 | Auto | 00:05:00
02.09.2018 | Line2 | Auto | 00:12:00
02.09.2018 | Line2 | Auto | 00:07:00
02.09.2018 | Line1 | Auto | 00:05:00
Modified:
Date | LineName | Mode | Time
01.09.2018 | Line1 | Auto | 00:30:00
01.09.2018 | Line2 | Auto | 00:15:00
01.09.2018 | Line2 | Manual | 00:02:00
01.09.2018 | Line2 | Auto | 00:08:00
01.09.2018 | Line1 | Manual | 00:25:00
01.09.2018 | Line2 | Auto | 00:24:00
02.09.2018 | Line1 | Auto | 00:05:00
I have tried to write a method to do it. It partly works, but some unmerged items still remain.
private static List<ModeTime> MergeTime(List<ModeTime> modeTimes)
{
modeTimes = modeTimes.OrderBy(e => e.Date).ToList();
var mergedModeTimes = new List<ModeTime>();
for (var i = 0; i < modeTimes.Count; i++)
{
if (i - 1 != -1)
{
if (modeTimes[i].LineName == modeTimes[i - 1].LineName &&
modeTimes[i].Mode == modeTimes[i - 1].Mode)
{
mergedModeTimes.Add(new ModeTime
{
Date = modeTimes[i - 1].Date,
LineName = modeTimes[i - 1].LineName,
Mode = modeTimes[i - 1].Mode,
Time = modeTimes[i - 1].Time + modeTimes[i].Time
});
i += 2;
}
else
{
mergedModeTimes.Add(modeTimes[i]);
}
}
else
{
mergedModeTimes.Add(modeTimes[i]);
}
}
return mergedModeTimes;
}
I have also tried wrapping the for loop in do {} while() and shrinking the source list modeTimes. Unfortunately, that leads to an endless loop and runaway memory use (I waited until it reached 5 GB).
I hope someone can help. I searched for this problem, and in similar cases people use GroupBy. But I don't think it will work in my case: I must sum items with the same LineName and Mode only if they are adjacent in the list.

The most primitive solution would be something like this:
var items = GetItems();
var sum = TimeSpan.Zero;
for (int index = items.Count - 1; index > 0; index--)
{
var item = items[index];
var previousItem = items[index - 1]; // the loop walks backwards, so this is the preceding element
if (item.LineName == previousItem.LineName && item.Mode == previousItem.Mode)
{
sum += item.Time;
items.RemoveAt(index);
}
else
{
item.Time += sum;
sum = TimeSpan.Zero;
}
}
items.First().Time += sum;
Edit: I originally missed the last line, which adds the leftover sum. It matters only when the collection starts with a run of equal items; without it, the aggregated time would never be assigned to the first element.

You can use LINQ's GroupBy. To group only consecutive elements, this uses a trick. It stores the key values in a tuple together with a group index which is only incremented when LineName or Mode changes.
int i = 0; // Used as group index.
(int Index, string LN, string M) prev = default; // Stores previous key for later comparison.
var modified = original
.GroupBy(mt => {
var ret = (Index: prev.LN == mt.LineName && prev.M == mt.Mode ? i : ++i,
LN: mt.LineName, M: mt.Mode);
prev = (Index: i, LN: mt.LineName, M: mt.Mode);
return ret;
})
.Select(g => new ModeTime {
Date = g.Min(mt => mt.Date),
LineName = g.Key.LN,
Mode = g.Key.M,
Time = new TimeSpan(g.Sum(mt => mt.Time.Ticks))
})
.ToList();
This produces the expected 7 result rows.
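For comparison, the same consecutive-run merge can be written as one explicit pass that builds a new list and leaves the source untouched. This is a sketch, not either answerer's code: `ModeTimeMerger` and `MergeConsecutive` are illustrative names, and `ModeTime` is redeclared as a public class only so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;

public class ModeTime
{
    public DateTime Date { get; set; }
    public string LineName { get; set; }
    public string Mode { get; set; }
    public TimeSpan Time { get; set; }
}

public static class ModeTimeMerger
{
    // Merges adjacent items that share LineName and Mode into one item.
    // The merged item keeps the Date of the first item in the run.
    public static List<ModeTime> MergeConsecutive(IEnumerable<ModeTime> source)
    {
        var merged = new List<ModeTime>();

        foreach (ModeTime item in source)
        {
            ModeTime last = merged.Count > 0 ? merged[merged.Count - 1] : null;

            if (last != null && last.LineName == item.LineName && last.Mode == item.Mode)
            {
                // Same run as the previous output item: accumulate the time.
                last.Time += item.Time;
            }
            else
            {
                // New run: copy the item so the source objects are not modified.
                merged.Add(new ModeTime
                {
                    Date = item.Date,
                    LineName = item.LineName,
                    Mode = item.Mode,
                    Time = item.Time
                });
            }
        }

        return merged;
    }
}
```

Because each input item is compared only with the last output item, non-adjacent items with the same LineName and Mode stay separate, which is the behaviour the question asks for.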

Search multiple strings between two specific strings in txt files and display in a datagrid

I am trying to mine some data from raw .txt files saved in a folder. Each file contains one RegretionModel and multiple PeakPoints.
My raw data file looks something like this:
Model ApprunningVersion="10.4." LastExecution time= ......bla bla bla wr3r43f34f RegretionModel = Linear221....bal bal...
k7878k7 wef34ferf PeakPoints = 11.11.... bal bal
dwedw wf343f4 PeakPoints = 322.11..... bla blaa....
gewwg45gww35w PeakPoints = 6711.11.... bla bla blaaa...
I wanted to extract RegretionModel and all the PeakPoints values into two different RichTextBoxes.
if(all_files.Count>0)
{
var word_1 = "RegretionValue";
var word_2 = "PeakPoints";
foreach (string srd in all_files)
{
using (var sr = new StreamReader(srd))
{
while (!sr.EndOfStream)
{
var line = sr.ReadLine();
if (String.IsNullOrEmpty(line)) continue;
if (line.IndexOf(word_1, StringComparison.CurrentCultureIgnoreCase) >= 0)
{
int startIndex = line.IndexOf("RegretionValue \=") + "RegretionValue \=".Length;
int endIndex = line.IndexOf("\" LAPNum");
string flt_1 = line.Substring(startIndex, endIndex - startIndex);
richTextBox1.Text += flt_1 + "\r";
}
if (line.IndexOf(word_2, StringComparison.CurrentCultureIgnoreCase) >= 0)
{
int count = line.IndexOf(word_2, StringComparison.CurrentCultureIgnoreCase);
int startIndex_1 = line.IndexOf("PeakPoints \=") + "PeakPoints \=".Length;
int flt_2 = line.IndexOf("\" LPPCode");
string newString_1 = line.Substring(startIndex_1, flt_2 - startIndex_1);
richTextBox2.Text += newString_1 + "\r";
counter_1++;
label2.Text = counter_1.ToString() + " of " + matches + " completed";
label4.Text = count.ToString();
}
}
}
}
}
For a single file it gives me this, as I expected:
|---------------------|------------------|
| Linear221 | 11.11 |
|---------------------|------------------|
| | 322.11 |
|---------------------|------------------|
| | 6711.11 |
|---------------------|------------------|
But the issue is that when I read multiple files, everything gets mixed up:
|---------------------|------------------|
| Linear221 | 11.11 |
|---------------------|------------------|
| Linear321 | 322.11 |
|---------------------|------------------|
| | 6711.11 |
|---------------------|------------------|
| | 1.11 |
|---------------------|------------------|
| | 21.11 |
|---------------------|------------------|
What it is actually supposed to be:
|---------------------|------------------|
| Linear221 | 11.11 |
|---------------------|------------------|
| Linear221 | 322.11 |
|---------------------|------------------|
| Linear221 | 6711.11 |
|---------------------|------------------|
| Linear321 | 1.11 |
|---------------------|------------------|
| Linear321 | 21.11 |
|---------------------|------------------|
I know that two RichTextBoxes are not the best option here. So I thought of putting the data into a DataGridView without using a database, but I am stuck on linking each PeakPoints value to its corresponding RegretionModel.
For example, if I read one file I have one RegretionModel name and multiple PeakPoints values; how do I put each PeakPoints value with its corresponding RegretionModel into a DataGridView?
I am a newbie; any help would be appreciated.
Thank you.

C# MySQLDataReader.AffectedRows = -1

In my MySQL console I can see the results of select Price from rates order by id:
mysql> select Price from rates order by id;
+-------+
| Price |
+-------+
| 100 |
| 120 |
| 150 |
| 200 |
| 350 |
| 700 |
| 500 |
| 700 |
| 800 |
| 1300 |
| 1500 |
| 7000 |
| 8000 |
| 15000 |
| 20000 |
+-------+
15 rows in set
but when I run it in this method as the string command;
public List<string[]> ExecuteQuery(string command)
{
com = new MySqlCommand(command, con);
reader = com.ExecuteReader();
if (reader.HasRows)
{
List<string[]> records = new List<string[]>();
while (reader.Read())
{
string[] row = new string[reader.FieldCount];
for (int i = 0; i < reader.RecordsAffected; i++)
row[i] = reader[i].ToString();
records.Add(row);
}
reader.Close();
return records;
}
else
{
reader.Close();
return new List<string[]>();
}
}
reader.RecordsAffected is -1, and it messes up the whole process...
But this one works just fine...
MySqlCommand com = new MySqlCommand("select Access from useraccounts where Username = '" + tbxUsername.Text + "' and Pass = '" + tbxPassword.Text + '\'', d.con);
object result = com.ExecuteScalar();
I am using this connection string: datasource = 192.168.43.191; database = database_name; user = user_name; password = pass_word; in Visual Studio 2015, with XAMPP running Apache and MySQL.
This is the first time I've encountered this problem. I hope you can help.

RecordsAffected is set only when the query is an INSERT, UPDATE or DELETE; for a SELECT it is -1. In your loop it seems that you want to use the FieldCount property instead:
while (reader.Read())
{
string[] row = new string[reader.FieldCount];
for (int i = 0; i < reader.FieldCount; i++)
row[i] = reader[i].ToString();
records.Add(row);
}
You can also change your code to this shorter one
public List<string[]> ExecuteQuery(string command)
{
List<string[]> records = new List<string[]>();
using(com = new MySqlCommand(command, con))
using(reader = com.ExecuteReader())
{
while (reader.Read())
{
string[] row = new string[reader.FieldCount];
for (int i = 0; i < reader.FieldCount; i++)
row[i] = reader[i].ToString();
records.Add(row);
}
}
return records;
}
In general, though, I recommend avoiding these do-it-all methods: they cannot handle, in the most performant way, the many different kinds of queries an application requires.
For example, the method returns a list of string arrays while, in reality, you are returning a single column (no array needed), and the values are probably decimals that are converted to strings and then converted back to decimals when you use them. And we haven't even started talking about dates. Do you see how this method propagates its problems through your whole application?
If you want a general solution, choose a good ORM that abstracts the use of the database and returns data properly converted to object instances. Look at Entity Framework or Dapper (many others exist).

The RecordsAffected property should not be used with SELECT statements, since it is only meaningful for INSERT, UPDATE and DELETE statements. The following should fix your issue:
Int32 fields = reader.FieldCount;
List<String[]> records = new List<String[]>();
while (reader.Read())
{
String[] row = new String[fields];
for (Int32 i = 0; i < fields; ++i)
row[i] = reader.GetString(i);
records.Add(row);
}
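To show the FieldCount-based loop without a MySQL server, here is a minimal sketch written against the generic `IDataReader` interface from System.Data. The names `ReaderSketch` and `ReadAllRows` are illustrative, not part of any library. Values are kept as objects rather than converted to strings, so a decimal column stays a decimal.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ReaderSketch
{
    // Copies every row of a forward-only reader into a list of object arrays.
    // The column count comes from FieldCount, never from RecordsAffected.
    public static List<object[]> ReadAllRows(IDataReader reader)
    {
        var records = new List<object[]>();

        while (reader.Read())
        {
            var row = new object[reader.FieldCount];
            reader.GetValues(row); // fills the array with the current row's values
            records.Add(row);
        }

        return records;
    }
}
```

Since `MySqlDataReader` implements `IDataReader`, the result of `ExecuteReader()` should be usable here directly; for a quick local check, `DataTable.CreateDataReader()` provides an in-memory reader with the same interface.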

how to parse two column in string in one cycle?

I have a string like this. I want to put the second column (3, 9, 10, 11, ...) into one array and the third column (5, 8, 4, 3, ...) into another.
C8| 3| 5| 0| | 0|1|
C8| 9| 8| 0| | 0|1|
C8| 10| 4| 0| | 0|1|
C8| 11| 3| 0| | 0|1|
C8| 12| 0| 0| | 0|1|
C8| 13| 0| 0| | 0|1|
C8| 14| 0| 0| | 0|1|
This method originally parsed numbers by rows; now I have columns.
How can I do this in the Parse method? I have been trying for hours and don't know what to do.
The Add method expects two integers: int secondNumberFinal, int thirdNumberFinal
private Parse(string lines)
{
const int secondColumn = 1;
const int thirdColum = 2;
var secondNumbers = lines[secondColumn].Split('\n'); // i have to split by new line, right?
var thirdNumbers = lines[thirdColum].Split('\n'); // i have to split by new line, right?
var res = new Collection();
for (var i = 0; i < secondNumbers.Length; i++)
{
try
{
var secondNumberFinal = Int32.Parse(secondNumbers[i]);
var thirdNumberFinal = Int32.Parse(thirdNumbers[i]);
res.Add(secondNumberFinal, thirdNumberFinal);
}
catch (Exception ex)
{
log.Error(ex);
}
}
return res;
}
thank you!

The piece of code below should do it for you. The logic is simple: split the string on '\n' (please check whether you need "\r\n" or some other line ending) and then split each line on '|'. Returning the data as an IEnumerable of Tuple gives you both flexibility and lazy execution. You can convert it into a List at the call site, if you so desire, using the Enumerable.ToList extension method.
It uses LINQ (Select) instead of foreach loops because it is more elegant in this situation.
static IEnumerable<Tuple<int, int>> Parse(string lines) {
const int secondColumn = 1;
const int thirdColum = 2;
return lines.Split('\n')
.Select(line => line.Split('|'))
.Select(items => Tuple.Create(int.Parse(items[secondColumn]), int.Parse(items[thirdColum])));
}
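If the two columns are wanted as separate arrays, and malformed lines should be skipped rather than throw, a variant along these lines might work. It is a sketch: `ColumnParser` and `ParseColumns` are illustrative names, and the tuple return type assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnParser
{
    // Collects the second and third '|'-separated columns into two arrays.
    // Lines that are too short or do not contain integers are skipped.
    public static (int[] Second, int[] Third) ParseColumns(string lines)
    {
        const int secondColumn = 1;
        const int thirdColumn = 2;

        var second = new List<int>();
        var third = new List<int>();

        // Splitting on both characters handles "\n" and "\r\n" line endings.
        string[] rows = lines.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string row in rows)
        {
            string[] items = row.Split('|');
            if (items.Length <= thirdColumn)
                continue; // not enough columns on this line

            // int.TryParse tolerates the leading spaces seen in the sample data.
            if (int.TryParse(items[secondColumn], out int a) &&
                int.TryParse(items[thirdColumn], out int b))
            {
                second.Add(a);
                third.Add(b);
            }
        }

        return (second.ToArray(), third.ToArray());
    }
}
```

The two arrays always have the same length, because a line contributes to both or to neither.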

If the original is a single string, split it once on newlines to produce an array of strings. Then parse each line by splitting on | and selecting the second and third values.
Partially rewriting your method for you:
private void Parse(string lines)
{
const int secondColumn = 1;
const int thirdColum = 2;
// Split on both characters so "\n" and "\r\n" line endings work; skip empty lines.
string [] arrlines = lines.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string line in arrlines)
{
string [] numbers = line.Split('|');
var secondNumberFinal = Int32.Parse(numbers[secondColumn]);
var thirdNumberFinal = Int32.Parse(numbers[thirdColum]);
// Whatever you want to do with them here
}
}

TreeView from database with null

I have a problem populating a TreeView in C# from a database. The table looks like this:
| code | description | attached to |
---------------------------------------
| P001 | TEST001 | NULL |
| P0001 | TEST002 | P001 |
| P002 | TEST003 | NULL |
| P00201 | TESTXXX | P002 |
| P00222 | TESTXXX | P002 |
| P002020 | TESTSSS | P00222 |
This does not work.
protected void PopulateTreeView(TreeNodeCollection parentNode, string parentID, DataTable folders)
{
foreach (DataRow folder in folders.Rows)
{
// if (Convert.ToInt32(folder["Attached to"]) == parentID)
if (string.IsNullOrEmpty(folder["Attached to"].ToString()))
{
String key = folder["code"].ToString();
String text = folder["description"].ToString();
TreeNodeCollection newParentNode = parentNode.Add(key, text).Nodes;
//PopulateTreeView(newParentNode, Convert.ToInt32(folder["code"]), folders);
PopulateTreeView(newParentNode, folder["code"].ToString(), folders);
}
}
}

The problem is your if condition: you should search for rows that are attached to parentID. Here is the corrected method with a test:
public void Test_PopulateTreeViews()
{
var rootNode = new TreeNode();
var folders = new DataTable();
folders.Columns.Add("code");
folders.Columns.Add("description");
folders.Columns.Add("Attached to");
folders.Rows.Add("P001", "TEST001", string.Empty);
folders.Rows.Add("P0001", "TEST002", "P001");
folders.Rows.Add("P002", "TEST003", null);
folders.Rows.Add("P00201", "TESTXXX", "P002");
folders.Rows.Add("P00222", "TESTXXX", "P002");
folders.Rows.Add("P002020", "TESTSSS", "P00222");
PopulateTreeView(rootNode, string.Empty, folders);
}
private void PopulateTreeView(TreeNode parentNode, string parentID, DataTable folders)
{
foreach (DataRow folder in folders.Rows)
{
if (folder["Attached to"].ToString() == parentID)
{
String key = folder["code"].ToString();
String text = folder["description"].ToString();
var newParentNode = new TreeNode(key, text);
parentNode.ChildNodes.Add(newParentNode);
PopulateTreeView(newParentNode, folder["code"].ToString(), folders);
}
}
}
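The same parent lookup can be sketched without any UI control, which makes the logic easy to test. Everything here is illustrative: `FolderNode` and `FolderTree` are hypothetical names, the column names are taken from the question's table, and the data is assumed to contain no cycles. Rows are grouped by parent once, instead of scanning the whole table at every level.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public class FolderNode
{
    public string Code { get; set; }
    public string Description { get; set; }
    public List<FolderNode> Children { get; } = new List<FolderNode>();
}

public static class FolderTree
{
    // Builds the tree from rows with "code", "description" and "Attached to" columns.
    // Rows whose parent is NULL or empty become roots.
    public static List<FolderNode> Build(DataTable folders)
    {
        // DBNull.Value.ToString() is "", so NULL parents land under the empty key.
        ILookup<string, DataRow> byParent = folders.Rows
            .Cast<DataRow>()
            .ToLookup(row => row["Attached to"].ToString());

        return BuildChildren(byParent, string.Empty);
    }

    private static List<FolderNode> BuildChildren(ILookup<string, DataRow> byParent, string parentCode)
    {
        var nodes = new List<FolderNode>();

        // A lookup returns an empty sequence for unknown keys, which ends the recursion.
        foreach (DataRow row in byParent[parentCode])
        {
            var node = new FolderNode
            {
                Code = row["code"].ToString(),
                Description = row["description"].ToString()
            };
            node.Children.AddRange(BuildChildren(byParent, node.Code));
            nodes.Add(node);
        }

        return nodes;
    }
}
```

The resulting `FolderNode` list can then be copied into whichever TreeView control is in use, one node per `FolderNode`.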
