Creating multiple data tables programmatically and structuring classes - c#

I'm working on a project where I will end up opening multiple text files.
Each of these text files will then be split into tables of data.
My initial thoughts were that I wanted to keep as much code away from the UI as possible and that I should create a class DataSetClass with a public method OpenFile.
The DataSet would be created as a global (a class-level field) inside this class, and all my other functions for sorting the data or getting information from that particular DataSet would live in that class too. However, since making the class, I believe I can no longer get at that data from outside it, and that I should instead create a new DataSet and pass it to a class that modifies it by adding the tables. But I am worried that this would be feature envy.
In addition to this, I am struggling with how to add multiple DataTables to a DataSet.
In my code below I add the DataTable to a DataSet after I have put all the relevant data into it. However, I end up just adding the exact same DataTable into the DataSet each time and only ever end up with one table.
for (int data_table = 1; data_table < header_location.Count; data_table++) {
DataTable dt = new DataTable();
SaveHeaderInfo(dt, header_location[data_table], split_data_list);
SaveDataUntilStop(dt, start_location[data_table], split_data_list);
dataSet.Tables.Add(dt);
//dt.Reset();
}
When I use the dt.Reset() line, it simply wipes all the data, even though the table has already been added to the dataset.
I presume this is because the Tables.Add method just inserts a reference to the table into the dataset. So how would I create an entirely new DataTable each time?
Thank you
EDIT: Adding Extra Code:
private void SaveDataUntilStop(DataTable dt, int start, List<string[]> split_data_list)
{
string[] row = new string[400];
for (int i = start + 1; i < split_data_list.Count; i++)
{
if (split_data_list[i][0] != "STOP")
{
for (int j = 0; j < split_data_list[i].Length; j++)
row[j] = split_data_list[i][j];
CheckColumnCountAndAdd(dt, row);
}
else
return;
}
}
private void SaveHeaderInfo(DataTable dt, int location, List<string[]> list)
{
CheckColumnCountAndAdd(dt, list[location + 1]);
CheckColumnCountAndAdd(dt, list[location + 2]);
}
private void CheckColumnCountAndAdd(DataTable dt, string[] str_to_add)
{
while (str_to_add.Length > dt.Columns.Count)
{
AddColumn(dt);
}
dt.Rows.Add(str_to_add);
}
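A minimal sketch of the reference behaviour asked about above, not a diagnosis of the posted code. It assumes made-up table names (Table0, Table1, ...) and a single hypothetical column called Value. DataSet.Tables.Add stores a reference to the DataTable rather than a copy, so resetting the same object later also empties the table inside the DataSet, while a new DataTable created inside the loop gives one distinct table per iteration.

```csharp
using System;
using System.Data;

public static class MultipleTablesSketch
{
    // Builds a DataSet that holds one independent DataTable per iteration.
    public static DataSet BuildDataSet(int tableCount)
    {
        var dataSet = new DataSet();
        for (int i = 0; i < tableCount; i++)
        {
            // A fresh instance on every pass; the table name is hypothetical.
            var dt = new DataTable("Table" + i);
            dt.Columns.Add("Value", typeof(string));
            dt.Rows.Add("row for table " + i);

            // Add stores a reference to dt, not a copy of it.
            dataSet.Tables.Add(dt);
        }
        return dataSet;
    }

    public static void Main()
    {
        DataSet ds = BuildDataSet(3);
        Console.WriteLine("Tables: " + ds.Tables.Count);

        // Reset clears the rows and columns of this one table only.
        ds.Tables[0].Reset();
        Console.WriteLine("Columns in table 0: " + ds.Tables[0].Columns.Count);
        Console.WriteLine("Rows in table 1: " + ds.Tables[1].Rows.Count);
    }
}
```

If every table in the DataSet ends up with the same content even though a new DataTable is created per iteration, the input passed to the fill methods is worth checking next.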

Related

Error when trying to duplicate rows in DataTable in c#

I have an existing datatable called _longDataTable containing data. Now, I want to duplicate each row and in each duplicate of the row, I want to set only the value in the SheetCode column according to a value from a different datatable called values, see code below. For example, the values datatable contains 1, 2 and 3, then I want each row of _longDataTable to be duplicated three times and in each of the duplicated rows, I want the Sheet Code column to have values 1, 2 and 3 respectively. My code now looks like below:
foreach (DataRow sheets in _longDataTable.Rows)
{
for(int k = 0; k < number_of_sheets; k++)
{
var newRowSheets = _longDataTable.NewRow();
newRowSheets.ItemArray = sheets.ItemArray;
newRowSheets["SheetCode"] = values.Rows[k]["Sheet Code"];
//add edited row to long datatable
_longDataTable.Rows.Add(newRowSheets);
}
}
However, I get the following error:
Collection was modified; enumeration operation might not execute.
Does anyone know where this error comes from and how to solve my problem?
You get the enumeration error because you are iterating over a collection that changes inside the loop (new rows are added to it).
As you said in the comment, you get an out-of-memory exception because you iterate over _longDataTable while adding rows to it, so the iteration never reaches the end.
I assume this can help you:
//assume _longDataTable has two columns : column1 and SheetCode
var _longDataTable = new DataTable();
var duplicatedData = new DataTable();
duplicatedData.Columns.Add("Column1");
duplicatedData.Columns.Add("SheetCode");
foreach (DataRow sheets in _longDataTable.Rows)
{
for (int k = 0; k < number_of_sheets; k++)
{
var newRowSheets = duplicatedData.NewRow();
newRowSheets.ItemArray = sheets.ItemArray;
newRowSheets["SheetCode"] = values.Rows[k]["Sheet Code"];
newRowSheets["Column1"] = "anything";
//add edited row to the temporary datatable
duplicatedData.Rows.Add(newRowSheets);
}
}
_longDataTable.Merge(duplicatedData);
Do not modify _longDataTable inside the loop. Add the rows to a temporary table with the same schema, and merge the two data tables after the iteration.
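As an alternative to the temporary table, the rows can be copied into an array before the loop, so that the loop never enumerates the live Rows collection. This is a different technique from the one in the answer, shown only as a sketch. The column names SheetCode and Sheet Code come from the question; Column1 and the sample data are assumptions.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DuplicateRowsSketch
{
    // Appends one copy of every original row for each row in `values`.
    public static void DuplicatePerSheet(DataTable table, DataTable values)
    {
        // Snapshot the current rows; rows added below are not part of it.
        DataRow[] originalRows = table.Rows.Cast<DataRow>().ToArray();

        foreach (DataRow original in originalRows)
        {
            foreach (DataRow value in values.Rows)
            {
                DataRow copy = table.NewRow();
                copy.ItemArray = original.ItemArray;
                copy["SheetCode"] = value["Sheet Code"];
                table.Rows.Add(copy);
            }
        }
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Column1");   // hypothetical column
        table.Columns.Add("SheetCode");
        table.Rows.Add("a", "");
        table.Rows.Add("b", "");

        var values = new DataTable();
        values.Columns.Add("Sheet Code");
        values.Rows.Add("1");
        values.Rows.Add("2");
        values.Rows.Add("3");

        DuplicatePerSheet(table, values);
        Console.WriteLine("Rows after duplication: " + table.Rows.Count);
    }
}
```

As with the Merge approach, the original rows stay in the table next to their duplicates.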

How to fill DataTable rows only if primary key is bigger than last primary key?

The problem: inserting rows only if primary key is bigger than existing one when merging source DataTable to actual DataTable (ActualDT.Merge(SourceDT)).
Details of my problem below:
I fill an Actual DataTable, which has an Int64 primary key, from an external server's API after deserializing the JSON into a Source DataTable. Then I write the rows from the DataTable to my database and clean up all rows in the DataTable except the one with the biggest primary key. Later I request new data from the API, and the response often contains rows that I have already written to the database and cleaned out of my DataTable.
If I don't clean up the DataTable rows, performance decreases and it becomes a memory hog. So I leave only the row with the biggest primary key after cleaning.
I don't want to compare every primary key from the Source DataTable before merging; comparing can take a lot of time.
What should I do to prevent merging rows that I have already written to the database and removed from the Actual DataTable? Maybe I can exclude them during deserialization (I use Newtonsoft Json.NET)? Or is there a quick way to prevent merging rows whose primary key is less than the primary key in the Actual DataTable?
Thanks for your answers!
UPDATE: merging code
public class MyData
{
DataTable BlackPairs = new DataTable();
DataTable WhiteTable = new DataTable();
public string _Json {
set
{
DataSet TempDS = JsonConvert.DeserializeObject<DataSet>(value);
try
{
foreach (DataTable table in TempDS.Tables)
{
BlackPairs = table.Copy();
WhiteTable.Merge(BlackPairs);
}
}catch{}
}
}
public MyData()
{ //columns initialization
WhiteTable.Columns.AddRange(new DataColumn[]{ /* columns */ });
WhiteTable.PrimaryKey = new DataColumn[]{tid};
}
}
I have created a custom Merge function based on what we discussed in the comments. This function only works if the primary column is typeof(int), but it can easily be extended to handle all types, or changed to whatever type you need (string, int, bool...).
public Test()
{
InitializeComponent();
DataTable smallerDatatable = new DataTable();
smallerDatatable.Columns.Add("Col1", typeof(int));
smallerDatatable.Columns.Add("Col2", typeof(string));
DataTable biggerDatatable = new DataTable();
biggerDatatable.Columns.Add("Col1", typeof(int));
biggerDatatable.Columns.Add("Col2", typeof(string));
smallerDatatable.Rows.Add(1, "Row1");
smallerDatatable.Rows.Add(2, "Row2");
smallerDatatable.Rows.Add(3, "Row3");
biggerDatatable.Rows.Add(1, "Row1");
biggerDatatable.Rows.Add(2, "Row2");
biggerDatatable.Rows.Add(3, "Row3");
biggerDatatable.Rows.Add(4, "Row4");
biggerDatatable.Rows.Add(5, "Row5");
DataTable mergedTable = MergeOnUniqueColumn(smallerDatatable, biggerDatatable, "Col1");
dataGridView1.DataSource = mergedTable;
}
private DataTable MergeOnUniqueColumn(DataTable smallTable, DataTable bigTable, string uniqueColumn)
{
DataTable m = smallTable;
for(int i = 0; i < bigTable.Rows.Count; i++)
{
if(!(smallTable.AsEnumerable().Any(row => bigTable.Rows[i][uniqueColumn].Equals(row.Field<object>(uniqueColumn)))))
{
smallTable.Rows.Add(bigTable.Rows[i].ItemArray);
}
}
return m;
}
The function above fills smallTable with every row from bigTable whose unique value is missing from smallTable.
If you want to fill smallTable only with the rows from bigTable that come after the last smallTable row, use this function:
private DataTable MergeOnUniqueColumnAfterLastID(DataTable smallTable, DataTable bigTable, string uniqueColumn)
{
DataTable m = smallTable;
//Compute returns DBNull for an empty table, so fall back to int.MinValue
object max = m.Compute("max([" + uniqueColumn + "])", string.Empty);
int maxUnique = max == DBNull.Value ? int.MinValue : Convert.ToInt32(max);
for (int i = 0; i < bigTable.Rows.Count; i++)
{
//only rows whose key is greater than the last key in smallTable are added
if ((int)bigTable.Rows[i][uniqueColumn] > maxUnique)
{
smallTable.Rows.Add(bigTable.Rows[i].ItemArray);
}
}
return m;
}
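The same idea can be sketched for the Int64 key mentioned in the question. This is an illustration under assumptions, not the asker's code: the method name MergeNewer and the sample data are made up, and tid is used as the key column because it appears in the question. It still performs one integer comparison per source row, but no lookup into the target table.

```csharp
using System;
using System.Data;
using System.Linq;

public static class MergeNewerRowsSketch
{
    // Imports only the source rows whose key is greater than the largest
    // key already present in target. Returns the number of imported rows.
    public static int MergeNewer(DataTable target, DataTable source, string keyColumn)
    {
        // An empty target accepts every source row.
        long maxKey = target.Rows.Count == 0
            ? long.MinValue
            : target.AsEnumerable().Max(r => r.Field<long>(keyColumn));

        int imported = 0;
        foreach (DataRow row in source.Rows)
        {
            if (row.Field<long>(keyColumn) > maxKey)
            {
                target.ImportRow(row);
                imported++;
            }
        }
        return imported;
    }

    private static DataTable NewTable()
    {
        var t = new DataTable();
        t.Columns.Add("tid", typeof(long));
        t.Columns.Add("Col2", typeof(string));
        return t;
    }

    public static void Main()
    {
        DataTable actual = NewTable();
        actual.Rows.Add(5L, "kept after cleanup");

        DataTable source = NewTable();
        for (long id = 3; id <= 7; id++)
        {
            source.Rows.Add(id, "Row" + id);
        }

        int imported = MergeNewer(actual, source, "tid");
        Console.WriteLine("Imported: " + imported + ", total rows: " + actual.Rows.Count);
    }
}
```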

Combining two c# datatables into one

I have two datatables in my ASP.NET application that are filled from csv files and I am trying to combine the two into one.
Here's what the interface looks like:
When I click the 'Merge Data' button it should merge the test1.csv and test2.csv which kind of works but looks like this:
So my question is how do I align these two datatables so that all the data is on the same row?
Below is the code for the Merge Data Button:
List<string> filepaths = new List<string>();
List<DataTable> allTables = new List<DataTable>();
DataTable mergedTables = new DataTable();
int rowCount = grdFiles.Rows.Count;
for (int i = 0; i < rowCount; i++)
{
string filename = grdFiles.Rows[i].Cells[0].Text;
filepaths.Add(Server.MapPath("~/Uploads/") + filename);
}
foreach(string path in filepaths)
{
DataTable dt = new DataTable();
//converts csv into datatable
dt = GetDataTableFromCsv(path, true);
//add table to list of tables
allTables.Add(dt);
}
foreach(DataTable datatable in allTables)
{
//Merge each table in the list to the mergedTables datatable
mergedTables.Merge(datatable);
}
csvUploadResults.DataSource = mergedTables;
csvUploadResults.DataBind();
Thanks in advance for any help :)
If your objective is just to merge the data without considering the relationship between the two tables, then you can add two more columns to the first datatable and, in a loop, read the data from the second table and assign it to the new columns of the first datatable. The data is saved in the first datatable in the same order in which it is received.
public DataTable MergeData(DataTable dtFirst,DataTable dtSecond)
{
dtFirst.Columns.Add("LocalAuthority");
dtFirst.Columns.Add("AverageSpeed");
//stop at the shorter table so dtSecond.Rows[i] never goes out of range
int rowCount = Math.Min(dtFirst.Rows.Count, dtSecond.Rows.Count);
for (int i = 0; i < rowCount; i++)
{
dtFirst.Rows[i]["LocalAuthority"] = dtSecond.Rows[i]["LocalAuthority"];
dtFirst.Rows[i]["AverageSpeed"] = dtSecond.Rows[i]["AverageSpeed"];
}
return dtFirst;
}
Now you need to pass the datatables as parameters to the method:
MergeData(allTables.ElementAt(0), allTables.ElementAt(1));
You're going to need a unique key on both datatables and merge them together. You could add the SchoolName to your second datatable and merge the two tables on the postcode. Or more preferably, add an id to both of the datatables and merge the two datatables on the id.
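A minimal sketch of the keyed merge suggested above, assuming both tables share a hypothetical integer Id column. The method name MergeOnKey and the sample data are made up; SchoolName and LocalAuthority are taken from the answers. With the same primary key set on both tables, DataTable.Merge matches rows by key and, under its default schema handling, adds the columns that the first table lacks.

```csharp
using System;
using System.Data;

public static class KeyedMergeSketch
{
    // Merges `second` into `first`, matching rows on keyColumn.
    // Setting PrimaryKey throws if the key values are not unique or are null.
    public static DataTable MergeOnKey(DataTable first, DataTable second, string keyColumn)
    {
        first.PrimaryKey = new[] { first.Columns[keyColumn] };
        second.PrimaryKey = new[] { second.Columns[keyColumn] };

        // Rows with equal keys are combined; missing columns are added.
        first.Merge(second);
        return first;
    }

    public static void Main()
    {
        var schools = new DataTable();
        schools.Columns.Add("Id", typeof(int));          // hypothetical key
        schools.Columns.Add("SchoolName", typeof(string));
        schools.Rows.Add(1, "School A");
        schools.Rows.Add(2, "School B");

        var authorities = new DataTable();
        authorities.Columns.Add("Id", typeof(int));
        authorities.Columns.Add("LocalAuthority", typeof(string));
        authorities.Rows.Add(1, "Authority X");
        authorities.Rows.Add(2, "Authority Y");

        DataTable merged = MergeOnKey(schools, authorities, "Id");
        foreach (DataRow row in merged.Rows)
        {
            Console.WriteLine(row["Id"] + ": " + row["SchoolName"] + ", " + row["LocalAuthority"]);
        }
    }
}
```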

Can I add a row to the middle of a dataset using c#?

Is it possible to add a row to the middle of an existing dataset with c#? I've done a lot of searching and haven't been able to find anything on how to do this. What have I tried? I've tried searching a lot and haven't found anything like an 'insertAt' method for datasets.
Thanks
Mike
The DataSet consists of a collection of DataTable objects, so I assume that you are talking about a DataTable, right? If so, its Rows collection has an InsertAt method:
DataTable dt = dataset.Tables[0]; //Get first datatable from dataset
DataRow row = dt.NewRow();
//fill row
dt.Rows.InsertAt(row,3); //Insert at index 3
DataSet does not have a rows collection, so you can't add a row to it at all.
You can insert a row by index into a DataTable object using DataTable.Rows.InsertAt(row, i). If the table is in a DataSet, the syntax would be DataSet.Tables[i].Rows.InsertAt(row, 0).
In my opinion (though this could take a lot of time), you can create an array or a list, copy all the data from your dataset into it with a for loop or any other loop, and put an if statement inside the loop to check where you want to put your extra data, like this:
//a DataSet cannot be assigned to a list directly; as an assumption, this copies the first column of the first table as strings (needs using System.Linq)
List<string> arrayList = dataset.Tables[0].AsEnumerable().Select(r => Convert.ToString(r[0])).ToList();
List<string> newList = new List<string>(); //a second, temporary list is optional; you could also simply output the data from the loop
//THE LOOP
for (int i = 0; i < arrayList.Count; i++)
{
if (i == x) //x is the index; change this condition as needed
{
//print the data or pass it to newList
newList.Add(arrayList[i]);
}
}
Another way is to customize your query (if you are pulling the data out of a database).
Happy coding :)
Here's a short sample of doing it:
class Program
{
static void Main(string[] args)
{
DataSet ds = new DataSet();
DataTable dt = ds.Tables.Add("Table");
dt.Columns.Add("Id");
for (int i = 0; i < 10; i++)
{
dt.Rows.Add(new object[]{i});
}
var newRow=dt.NewRow();
newRow.ItemArray=new string[]{(dt.Rows.Count/2).ToString()+".middle"};
dt.Rows.InsertAt(newRow, dt.Rows.Count / 2);
}
}

Performance of setting DataRow values in a large DataTable

I have a large DataTable - around 15000 rows and 100 columns - and I need to set the values for some of the columns in every row.
// Creating the DataTable
DataTable dt = new DataTable();
for (int i = 0; i < COLS_NUM; i++)
{
dt.Columns.Add("COL" + i);
}
for (int i = 0; i < ROWS_NUM; i++)
{
dt.Rows.Add(dt.NewRow());
}
// Setting several values in every row
Stopwatch sw2 = new Stopwatch();
sw2.Start();
foreach (DataRow row in dt.Rows)
{
for (int j = 0; j < 15; j++)
{
row["Col" + j] = 5;
}
}
sw2.Stop();
The measured time above is about 4.5 seconds. Is there any simple way to improve this?
Before you populate the data, call the BeginLoadData() method on the DataTable. When you have finished loading the data, call EndLoadData(). This turns off all notifications, index maintenance, and constraints, which will improve performance.
As an alternative, call BeginEdit() before updating each row, and EndEdit() when the editing for that row is complete.
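A minimal sketch of the first suggestion, assuming integer columns and a made-up helper name SetColumns. It only shows where the BeginLoadData and EndLoadData calls go around the update loop; how much time this saves depends on the table and was not measured here.

```csharp
using System;
using System.Data;

public static class BulkUpdateSketch
{
    // Writes `value` into the first `columnCount` columns of every row,
    // with notifications, index maintenance and constraints switched off.
    public static void SetColumns(DataTable dt, int columnCount, object value)
    {
        dt.BeginLoadData();
        try
        {
            foreach (DataRow row in dt.Rows)
            {
                for (int j = 0; j < columnCount; j++)
                {
                    row[j] = value; // access by index, not by name
                }
            }
        }
        finally
        {
            // Always switch notifications and constraints back on.
            dt.EndLoadData();
        }
    }

    public static void Main()
    {
        var dt = new DataTable();
        for (int i = 0; i < 100; i++)
        {
            dt.Columns.Add("COL" + i, typeof(int));
        }
        for (int i = 0; i < 15000; i++)
        {
            dt.Rows.Add(dt.NewRow());
        }

        SetColumns(dt, 15, 5);
        Console.WriteLine("Value at [0][14]: " + dt.Rows[0][14]);
    }
}
```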
Here is a link with more information on improving DataSet performance:
http://www.softwire.com/blog/2011/08/04/dataset-performance-in-net-web-applications/
One improvement that I can think of is editing columns by their indices, rather than their names.
foreach (DataRow row in dt.Rows)
{
for (int j = 0; j < 15; j++)
{
row[j] = 5;
}
}
With an empirical test, your method seems to run in ~1500 milliseconds on my computer, and this index based version runs in ~1100 milliseconds.
Also, see Marc's answer in this post:
Set value for all rows in a datatable without for loop
This depends on your business logic, which is not clear from your question. However, if you want to set the values for some of the columns in every row, try the following:
1. Create separate temporary column(s); you can create them in the same loop that builds the original data table.
2. Fill the new values into these columns.
3. Delete the old column and insert the new one in its place.
This solution makes sense if the new values are predictable, if every row gets the same value (as in your example), or if the values follow some kind of repeating pattern. In that case, adding a new column that already carries its values will be much faster than looping over all rows.
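For the case where every row gets the same value, the replace-the-column idea can be sketched as below. The helper name ReplaceWithConstant and the sample table are assumptions; the sketch relies on a newly added DataColumn filling existing rows with its DefaultValue. Columns.Remove throws if the column is part of a constraint or relation, so this only suits plain columns.

```csharp
using System;
using System.Data;

public static class ConstantColumnSketch
{
    // Replaces a column with a new one whose DefaultValue is `value`,
    // keeping the column name and position.
    public static void ReplaceWithConstant(DataTable dt, string columnName, Type type, object value)
    {
        int ordinal = dt.Columns[columnName].Ordinal;
        dt.Columns.Remove(columnName);

        var replacement = new DataColumn(columnName, type) { DefaultValue = value };
        dt.Columns.Add(replacement);      // existing rows receive the default
        replacement.SetOrdinal(ordinal);  // move it back to the old position
    }

    public static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("A", typeof(int));
        dt.Columns.Add("B", typeof(int));
        dt.Columns.Add("C", typeof(int));
        for (int i = 0; i < 3; i++)
        {
            dt.Rows.Add(dt.NewRow());
        }

        ReplaceWithConstant(dt, "B", typeof(int), 5);
        Console.WriteLine("B in last row: " + dt.Rows[2]["B"]);
    }
}
```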
