DataTable - how to achieve datacolumn expression (computed column) persistence? - c#

I have this DataTable
FName LName Tag1 Tag2 Tag3 ... (not fixed, can be as many)
What I want is
FName LName Tag_All
So, I created a string column Tag_All with an expression built like this
var expression = string.Empty;
// ... other code
// In a loop over all tag columns; add the separator only after the first column
expression = expression.Length == 0
    ? tagColumn
    : expression + " + ',' + " + tagColumn;
// at the end of the loop
dtContact.Columns["Tag_All"].Expression = expression;
So, if I have 3 columns, the expression is like this
"Tag1 + ',' + Tag2 + ',' + Tag3"
For example the data is
FName LName Tag1 Tag2 Tag3
Jeff Atwood test tag other
Matt breeden myTag total last
The resulting DataTable becomes like this
FName LName Tag1 Tag2 Tag3 Tag_All
Jeff Atwood test tag other test, tag, other
Matt breeden myTag total last myTag, total, last
Everything is fine up to this point, but now I would like to remove all the individual Tag columns. I tried
dtContact.Columns.RemoveAt(2), but it throws a System.ArgumentException.
I am guessing that is because the column is used in a computed column expression, is that correct? When I remove column 0 or column 1, it works fine. So, is there a way to remove these Tag columns, given that they are used in a computed column expression? Maybe by somehow making the computed column's values persistent? I searched on Google but couldn't find anything.
Also, as I said, the number of Tag columns is not fixed; they are dynamic, and there can be anywhere from just one (Tag1) up to any number... say Tag88 or whatever.

Try this method:
//Usage
DataTable dtMod = GetModifiedTable(dt);

//Function to return the modified DataTable
public DataTable GetModifiedTable(DataTable dt)
{
    // Names of all the dynamic Tag columns (excluding the combined column itself)
    var columnList = dt.Columns.Cast<DataColumn>()
                       .Where(x => x.ColumnName.StartsWith("Tag") && x.ColumnName != "Tag_All")
                       .Select(x => x.ColumnName)
                       .ToArray();

    DataTable dtNew = new DataTable();
    dtNew.Columns.Add("FName");
    dtNew.Columns.Add("LName");
    dtNew.Columns.Add("Tag_All");

    var results = dt.AsEnumerable().Select(r =>
        dtNew.LoadDataRow(
            new object[] {
                r.Field<string>("FName"),
                r.Field<string>("LName"),
                GetTagValues(r, columnList)
            }, false
        ));

    // LoadDataRow adds each row to dtNew; enumerating the query forces it to run
    results.ToList();
    return dtNew;
}

//Function to return the CSV value of the given column list
public string GetTagValues(DataRow r, string[] columns)
{
    string csv = string.Empty;
    foreach (string column in columns)
    {
        csv += r[column].ToString() + ",";
    }
    return csv.Substring(0, csv.Length - 1);
}

You can't do this. You have to take another approach.
Add the Tag_All column, but not as a computed column. For each row in the DataTable, go through all the TagX columns, concatenate their values, and assign the result to the Tag_All column. When finished, you can delete the TagX columns (a sketch follows below).
Depending on the number of rows, this can actually be quite fast.
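A minimal sketch of that approach (assuming the per-tag columns are the ones whose names start with "Tag", that dtContact is the table from the question, and that System.Linq is available):
// Sketch only, not verbatim from the answer.
var tagColumns = dtContact.Columns.Cast<DataColumn>()
    .Where(c => c.ColumnName.StartsWith("Tag") && c.ColumnName != "Tag_All")
    .ToList();

// Make sure Tag_All is a plain string column, not a computed one
if (!dtContact.Columns.Contains("Tag_All"))
    dtContact.Columns.Add("Tag_All", typeof(string));
else
    dtContact.Columns["Tag_All"].Expression = "";

// Fill Tag_All row by row
foreach (DataRow row in dtContact.Rows)
{
    row["Tag_All"] = string.Join(",", tagColumns.Select(c => row[c].ToString()));
}

// Nothing references the TagX columns any more, so they can be removed
foreach (var col in tagColumns)
    dtContact.Columns.Remove(col);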
However, I'd question whether this is a good idea. If you are databinding the DataTable to some grid, then all you need to do is not bind the TagX columns, or tell the grid to make those columns invisible.
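For example, with a WinForms DataGridView (the grid type isn't specified here, so the grid and column names are only illustrative):
// Hypothetical: hide the per-tag columns after binding instead of removing them
dataGridView1.DataSource = dtContact;
foreach (DataGridViewColumn col in dataGridView1.Columns)
{
    if (col.Name.StartsWith("Tag") && col.Name != "Tag_All")
        col.Visible = false;
}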

While handling a huge amount of data in a DataTable (about 500,000 rows), looping over the rows takes time (even with the dt.AsEnumerable().Select() method). I was searching for a faster method until I found the following workaround:
Clone the DataTable (structure only) to a new table.
Loop over the columns and remove each expression (set it to ""), or just remove the expression from the specific DataColumn.
Merge the old DataTable into the new one.
Now you can delete the original columns without affecting the (formerly) computed column.
Example:
//Assign the expression (built in a loop over the tag columns, as in the question)
var expression = "Tag1 + ',' + Tag2 + ',' + Tag3";
dtContact.Columns["Tag_All"].Expression = expression;

//Clone the DataTable structure only
DataTable dtNew = dtContact.Clone();

//Remove the expression from the cloned column
dtNew.Columns["Tag_All"].Expression = "";

//Merge the data into the new table; Tag_All values are copied as plain data
dtNew.Merge(dtContact);
dtContact.Dispose();

//Now you can remove the columns that were used in the expression
dtNew.Columns.RemoveAt(2);

Check out this code:
private void creatable()
{
    dt.Columns.Add("FName");
    dt.Columns.Add("LName");
    dt.Columns.Add("Tag1");
    dt.Columns.Add("Tag2");
    dt.Columns.Add("Tag3");
    dt.Columns.Add("Tag_All");
}

private void removeColumn()
{
    string temp = null;
    List<string> colToRemove = new List<string>();
    int colcount = dt.Columns.Count;
    for (int i = 0; i < colcount; i++)
    {
        temp = dt.Columns[i].ColumnName;
        if (temp == "Tag1" || temp == "Tag2" || temp == "Tag3")
        {
            colToRemove.Add(temp);
        }
        temp = null;
    }
    foreach (string item in colToRemove)
    {
        dt.Columns.Remove(item);
    }
}
It's working as per your requirements.

Related

How to add rows and columns into datatable in single loop?

I have JSON stored in a database, which I deserialize into a DataTable with the help of Newtonsoft.Json, like this:
string jsonString = "[myJsonfromDB....]";
//Deserialize to DataTable
DataTable dtSerialized = (DataTable)JsonConvert.DeserializeObject(jsonString, (typeof(DataTable)));
This gives me a result like the one in the screenshot (omitted here; other columns are not shown).
Here label holds the column name and value holds the column value. Both of these will be moved to a new DataTable which I'll process further for my operations. My problem is that I want to do this in one loop, whereas I currently do it in two loops: add the columns first (first loop) and then add the column values (second loop). Currently I'm doing it like this:
string colName = string.Empty;
// First loop to add columns
foreach (DataRow dr in dtSerialized.Rows)
{
    if (!string.IsNullOrEmpty(Utility.Instance.ToString(dr["label"])))
    {
        colName = prefix + "_" + Utility.Instance.ToString(dr["label"]).Replace(" ", string.Empty).Replace("/", "_").Replace("-", "_");
        if (!dtResult.Columns.Contains(colName))
            dtResult.Columns.Add(colName, typeof(string));
    }
}

DataRow drSelect = dtResult.NewRow();
// Second loop to add column values
foreach (DataRow dr in dtSerialized.Rows)
{
    if (!string.IsNullOrEmpty(Utility.Instance.ToString(dr["label"])))
    {
        colName = prefix + "_" + Utility.Instance.ToString(dr["label"]).Replace(" ", "").Replace("/", "_").Replace("-", "_");
        drSelect[colName] = dr["value"];
    }
}
dtResult.Rows.Add(drSelect);
dsResult.Tables.Add(dtResult);
After this I have the transposed result (screenshot omitted).
As far as I know, the DataRow schema is built from the DataTable first, and only then can values be added, which is clear in the code above. How can I do it in one loop? Or should I look for an alternative method?
Thanks in advance
I am guessing I am missing something here. This looks like a transpose, and I cannot think of a way to accomplish it without two loops, other than transposing the data as you read it in. Going from what is posted, it appears the label column holds the new DataTable's column names, and the value column holds the first row of data for this new DataTable.
If that is the case, then while you are looping through the rows to get the column names from the label column (index 1), you can also take the corresponding value from the value column (index 0) and put it in a List<string> named valuesList below.
After you have looped through all the rows and set the columns in the new DataTable dtResult, you can add a single row from valuesList by converting the list to a string array, as shown below. This produces the result in your second picture in one loop. Again, I am guessing there is more to it than this simple transpose. Since a DataTable does not have a built-in transpose function, you will have to write your own; I am not sure how you would do that in one loop. Hope this helps.
private DataTable Transpose2ColDT(DataTable dtSource)
{
    string prefix = "DIAP_";
    string colName = "";
    DataTable dtResult = new DataTable();
    List<string> valuesList = new List<string>();
    if (dtSource.Rows.Count > 0)
    {
        foreach (DataRow dr in dtSource.Rows)
        {
            if (!dr.IsNull("Label"))
            {
                if (dr.ItemArray[1].ToString() != "")
                {
                    colName = prefix + "_" + dr.ItemArray[1].ToString();
                    if (!dtResult.Columns.Contains(colName))
                    {
                        dtResult.Columns.Add(colName, typeof(string));
                        valuesList.Add(dr.ItemArray[0].ToString());
                    }
                }
            }
        }
        dtResult.Rows.Add(valuesList.ToArray<string>());
    } // no rows in the original source
    return dtResult;
}

Best way to add unknown number of columns to Datatable and populate with data?

I've looked into DataTable.Merge() and LINQ, but I'm not sure Merge will work in this case, and I'm just now starting to learn LINQ.
In my project I have to join an unknown number of columns to an existing dataset.
I currently query the database to get a record for each column that must be created, and then a table of items and IDs that matches the new columns up with the existing dataset.
So the structure is like this:
foreach (DataRow row in newColumns.Rows)
{
    originialDataTable.Columns.Add("\"Supplier: " + row["Supplier"] + "\r\n" +
                                   "ETA: " + row["ETADate"] + "\r\n" +
                                   "Shipped: " + row["ShipDate"] + "\r\n" +
                                   "Cartons: " + row["ContainerQty"] + "\r\n" +
                                   "Delivery Mode: " + row["ShipType"] + "\r\n" +
                                   "Container number: " + row["ContainerNumber"] + "\"");
    int position = originialDataTable.Columns.Count - 1; // index of the column just added (assumed)
    for (int c = 0; c < cmDataTable.Rows.Count; c++)
    {
        for (int b = 0; b < originialDataTable.Rows.Count; b++)
        {
            string containerItem = cmDataTable.Rows[c]["POnumber"].ToString() + cmDataTable.Rows[c]["ItemNumber"].ToString();
            string poItem = originialDataTable.Rows[b]["ponumber"].ToString() + originialDataTable.Rows[b]["stocknumber"];
            if (containerItem == poItem && cmID == cmDataTable.Rows[c]["CMID"].ToString())
            {
                originialDataTable.Rows[b][position] = cmDataTable.Rows[c][2];
            }
        }
    }
}
I know I should change the for loops to foreach but I'm not sure of the quality of the above code. Thanks for any help in advance.
I'm not completely clear on the requirements, but I tried a few things; check out the following code:
DataTable dt = new DataTable(); // Create DataTable
dt.Columns.Add("A", typeof(int)); // Add a Column
dt.Columns["A"].DefaultValue = 5; // Set Column default value
dt.Rows.Add(3); // Add a DataRow
DataTable dt1 = new DataTable(); // Create DataTable
dt1.Columns.Add("C", typeof(int)); // Add a Column
dt1.Columns["C"].DefaultValue = 5; // Set Column default value
dt1.Rows.Add(3); // Add a DataRow
dt.Merge(dt1); // Merge dt1 into dt
The challenge with this kind of code is that even though you might expect the merged DataTable to contain two columns and one row, it actually contains two rows: for the column that is missing from each source table, the merge fills in the default value (try running the code to see), and if I don't provide the DefaultValue it fills in null. The result of the above code is:
A C
3 5
5 3
My understanding is that this result is not desirable. I then tried a LINQ join (including a full outer join) to see whether that could work, but that also seems unlikely to help, since a join traverses all the rows and, similar to Merge above, adds a default for all the non-existing rows.
The only solution I can think of is the following (a sketch is shown below the list):
Add all the columns to a new DataTable explicitly.
Run a loop for the number of rows you want to insert, taking the rows from either source DataTable, and copy the values inside the loop.
Default the unavailable values for the other DataTable unless both tables have exactly the same number of rows.
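A minimal sketch of those steps, assuming the two tables dt and dt1 from the code above have distinct column names and their rows are paired up by index:
// Sketch only: combine dt and dt1 column-wise, pairing rows by index.
DataTable combined = new DataTable();
foreach (DataColumn c in dt.Columns)
    combined.Columns.Add(c.ColumnName, c.DataType);
foreach (DataColumn c in dt1.Columns)
    combined.Columns.Add(c.ColumnName, c.DataType);

int rowCount = Math.Max(dt.Rows.Count, dt1.Rows.Count);
for (int i = 0; i < rowCount; i++)
{
    DataRow newRow = combined.NewRow();
    if (i < dt.Rows.Count)
    {
        // copy values from the first table
        foreach (DataColumn c in dt.Columns)
            newRow[c.ColumnName] = dt.Rows[i][c];
    }
    if (i < dt1.Rows.Count)
    {
        // copy values from the second table
        foreach (DataColumn c in dt1.Columns)
            newRow[c.ColumnName] = dt1.Rows[i][c];
    }
    combined.Rows.Add(newRow); // columns with no source value stay DBNull
}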

use select statement to get Data from a DataTable

I have a DataTable containing three columns: Name, Date and DialedNumber. I want to get the rows where the DialedNumber column holds a phone number such as 03001234567 ...
I am filling the DataTable from a method whose return type is DataTable:
{
DataTable dt = filldata();
}
The problem is how to use a Select statement to get the rows having the number 03001234567 (or some other telephone number)?
Try this. Suppose you have a variable string str which holds the telephone number you want to look up in the DataTable; then you can use this:
{
DataTable dt = filldata();
DataRow[] result = dt.Select("DialedNumber ='" + str + "'");
}
It will return the rows that have that telephone number in the DialedNumber column.
If you want to filter at the source instead of fetching all the table rows every time, you should adjust your SQL statement:
SELECT * FROM Table WHERE DialedNumber = @dialedNumber
and in C# use SqlCommand.Parameters.AddWithValue(...) to add the @dialedNumber parameter to the query.
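A minimal sketch of that parameterized query (the table name Calls and the connectionString are only placeholders):
using System.Data;
using System.Data.SqlClient;

DataTable dt = new DataTable();
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT Name, [Date], DialedNumber FROM Calls WHERE DialedNumber = @dialedNumber",
    connection))
{
    // The parameter avoids string concatenation and SQL injection
    command.Parameters.AddWithValue("@dialedNumber", "03001234567");
    using (var adapter = new SqlDataAdapter(command))
    {
        adapter.Fill(dt); // dt now contains only the matching rows
    }
}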
Try to use Linq to DataTable like this
var results = from myRow in dt.AsEnumerable()
where myRow.Field<String>("DialedNumber") == "03001234567"
select myRow;
You can use Linq to DataSet:
string number = "03001234567";
var rows = dt.AsEnumerable()
.Where(r => r.Field<string>("DialedNumber").Contains(number));
You can even project the rows into strongly typed objects:
var people = from r in dt.AsEnumerable()
where r.Field<string>("DialedNumber").Contains(number)
select new {
Name = r.Field<string>("Name"),
Date = r.Field<DateTime>("Date"),
DialedNumber = r.Field<string>("DialedNumber")
};
Note: if you want to check for an exact match of the dialed number, then instead of Contains(number) (which is the equivalent of LIKE), use == number.
Try it like this:
private void GetRowsByFilter()
{
    DataTable table = DataSet1.Tables["Table1"];
    // Presuming the DataTable has a column named DialedNumber.
    string expression = "DialedNumber = '03001234567'";
    // Use the Select method to find all rows matching the filter.
    DataRow[] foundRows = table.Select(expression);
    // Print column 0 of each returned row.
    for (int i = 0; i < foundRows.Length; i++)
    {
        Console.WriteLine(foundRows[i][0]);
    }
}
DataTable.Select Method

Using Split and storing in a data table

I am looking at how to split a string and store the info in a DataTable. I can get the split and the store to work correctly, but the issue is in how I am trying to use the split. This is an example of the string I have:
itemid/n3,itemid/n4
itemid is the item's unique ID, and after /n is how many of that item the user has selected; the comma separates the entries.
I have a data table like this:
DataTable table = new DataTable();
table.Columns.Add("id", typeof(int));
table.Columns.Add("count", typeof(int));
I'd like to split the string at the commas and then store each pair of values (split at the /n) in the DataTable so they appear on the same row. Is there an easy way to do this using Split, or am I better off doing it another way?
Yeah, you may split by the comma first and by /n afterwards:
foreach (var row in myString.Split(','))
{
    var fields = row.Split(new string[] { "/n" },
                           StringSplitOptions.None);
    // fields[0] is the ID, fields[1] is the count
}
This still executes in linear time, so it is definitely a viable way to go.
If "/n" and "," are always present for each record, you can use a regular expression split with the expression "(?:/n|\,)" and then loop with x+=2 instead of x++ through the list. X will be the ID, X+1 will be the value.
string Input = "12/nTwelve,13/nThirteen,";
string[] InputSplit = Regex.Split(Input, #"(?:/n|\,)");
for(int i = 0 ; i < ((InputSplit.Length / 2) * 2) ; i+=2){
//Math in the middle helps when there's a trailing comma in the data set
Console.WriteLine(string.Format("{0}\t{1}", InputSplit[i], InputSplit[i+1]));
}
Note that for this example I changed the type of the first column, since in the provided sample string the id is a string.
DataTable table = new DataTable();
table.Columns.Add("id", typeof(string));
table.Columns.Add("count", typeof(int));

var str = "itemid/n3,itemid/n4";
var items = str.Split(',').Select(r => new
{
    Id = r.Split(new[] { "/n" }, StringSplitOptions.RemoveEmptyEntries).First(),
    Count = int.Parse(r.Split(new[] { "/n" }, StringSplitOptions.RemoveEmptyEntries).Last())
});

foreach (var item in items)
{
    var row = table.NewRow();
    row["id"] = item.Id;
    row["count"] = item.Count;
    table.Rows.Add(row); // NewRow only creates the row; it still has to be added
}

Delete Duplicate records from large csv file C# .Net

I have created a solution which reads a large CSV file, currently 20-30 MB in size. I have tried to delete the duplicate rows based on column values that the user chooses at run time, using the usual technique of finding duplicate rows, but it is so slow that it seems the program is not working at all.
What other technique can be applied to remove duplicate records from a CSV file?
Here's the code; I am definitely doing something wrong:
DataTable dtCSV = ReadCsv(file, columns);
// columns is a List<string> of the column names chosen by the user
DataTable dt = RemoveDuplicateRecords(dtCSV, columns);

private DataTable RemoveDuplicateRecords(DataTable dtCSV, List<string> columns)
{
    DataView dv = dtCSV.DefaultView;
    string RowFilter = string.Empty;
    if (dt == null)
        dt = dv.ToTable().Clone();

    foreach (DataRow row in dtCSV.Rows)
    {
        try
        {
            RowFilter = string.Empty;
            foreach (string column in columns)
            {
                string col = column;
                RowFilter += "[" + col + "]" + "='" + row[col].ToString().Replace("'", "''") + "' and ";
            }
            RowFilter = RowFilter.Substring(0, RowFilter.Length - 4);
            dv.RowFilter = RowFilter;
            DataRow dr = dt.NewRow();
            bool result = RowExists(dt, RowFilter);
            if (!result)
            {
                dr.ItemArray = dv.ToTable().Rows[0].ItemArray;
                dt.Rows.Add(dr);
            }
        }
        catch (Exception ex)
        {
        }
    }
    return dt;
}
One way to do this would be to go through the table, building a HashSet<string> that contains the combined column values you're interested in. If you try to add a string that's already there, then you have a duplicate row. Something like:
HashSet<string> ScannedRecords = new HashSet<string>();
foreach (DataRow row in dtCSV.Rows)
{
    // Build a string that contains the combined column values
    StringBuilder sb = new StringBuilder();
    foreach (string col in columns)
    {
        sb.AppendFormat("[{0}={1}]", col, row[col].ToString());
    }
    // Try to add the string to the HashSet.
    // If Add returns false, then there is a prior record with the same values.
    if (!ScannedRecords.Add(sb.ToString()))
    {
        // This record is a duplicate.
    }
}
That should be very fast.
If you've implemented your de-duplication routine as a couple of nested for or foreach loops, you could optimise it by sorting the data by the columns you wish to de-duplicate against, and then simply comparing each row to the last row you looked at.
Posting some code is a sure-fire way to get better answers, though; without an idea of how you've implemented it, anything you get will just be conjecture.
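A rough sketch of that sort-and-compare approach, assuming the dtCSV table and the user-chosen columns list from the question (requires System.Linq):
// Sketch only: sort by the key columns, then keep a row only when its key
// differs from the previous row's key.
string sortOrder = string.Join(", ", columns.Select(c => "[" + c + "]"));
DataTable deduped = dtCSV.Clone();
string previousKey = null;
foreach (DataRow row in dtCSV.Select(string.Empty, sortOrder)) // rows in sorted order
{
    string key = string.Join("|", columns.Select(c => c + "=" + row[c].ToString()));
    if (key != previousKey)
        deduped.ImportRow(row);
    previousKey = key;
}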
Have you tried wrapping the rows in a class and using LINQ?
LINQ will give you options to get distinct values etc.
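A small sketch of that idea: if the key columns are known at compile time, an anonymous type (which has value-based equality) can stand in for the wrapper class, so GroupBy/Distinct just work. The column names FieldA and FieldB are hypothetical:
// Sketch: group rows by the key fields and keep one row per distinct key.
// FieldA/FieldB are placeholder column names; requires System.Linq and
// System.Data.DataSetExtensions.
var dedupedRows = dtCSV.AsEnumerable()
    .GroupBy(r => new
    {
        FieldA = r.Field<string>("FieldA"),
        FieldB = r.Field<string>("FieldB")
    })
    .Select(g => g.First());
DataTable deduped = dedupedRows.CopyToDataTable();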
You're currently creating a string-defined filter condition for each and every row and then running that against the entire table - that is going to be slow.
Much better to take a Linq2Objects approach where you read each row in turn into an instance of a class and then use the Linq Distinct operator to select only unique objects (non-uniques will be thrown away).
The code would look something like:
(from row in inputCSV.Rows select row).Distinct()
If you don't know the fields your CSV file is going to have, then you may have to modify this slightly - possibly using an object which reads the CSV cells into a List or Dictionary for each row.
For reading objects from file using Linq, this article by someone-or-other might help - http://www.developerfusion.com/article/84468/linq-to-log-files/
Based on the new code you've included in your question, I'll provide this second answer - I still prefer the first answer, but if you have to use DataTable and DataRows, then this second answer might help:
class DataRowEqualityComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // Perform the cell-by-cell comparison here, for example:
        return x.ItemArray.SequenceEqual(y.ItemArray);
    }
    public int GetHashCode(DataRow obj)
    {
        // A constant hash code forces Equals to do all the work;
        // fine for a sketch, but slow on large tables.
        return 0;
    }
}
// ...
var comparer = new DataRowEqualityComparer();
var filteredRows = dtCSV.AsEnumerable().Distinct(comparer);
