Finding lookup values in datatable with match and with interpolation - c#

I have a datatable that was imported from a CSV file. There are a multitude of different tables that can be imported.
The datatables have four columns (type=double): LookupColumn, L, M, S.
LookupColumn will usually have a unique name (e.g., Length, Height, Weight). The other column names remain the same. This is immaterial, as you can mostly just use dt.Columns[0]. The lookup column will always be the first column imported.
I need to search the datatable on LookupColumn for a LookupValue passed from the app (from a textbox).
If a LookupValue matches exactly a number in LookupColumn, then, return the values for L, M, S.
If there is no match, then I need to find the rows on either side of where the LookupValue would lie and return the min/max values for each variable in L,M,S.
Once I have those, I can interpolate the values for L, M, S.
For example:
Col_0   L        M      S
45.0    -0.3521  2.441  0.09182
45.5    -0.3521  2.524  0.09153
46.0    -0.3521  2.608  0.09124
46.5    -0.3521  2.691  0.09094
47.0    -0.3521  2.776  0.09065
47.5    -0.3521  2.861  0.09036
48.0    -0.3521  2.948  0.09007
If my LookupValue in Col_0 is 46.5, the program would return L = -0.3521, M = 2.691, S = 0.09094.
These values will be put in textboxes on the form the viewer sees.
If there is no match (assuming LookupValue is within the LookupColumn min/max range), then I need to return the rows on both sides of where the value would lie if it were present (that is, Lmin/Lmax, Mmin/Mmax, Smin/Smax) and use those in the following formula to get the interpolated values (IntVal) at the LookupValue in LookupColumn (Col_0).
For example, if the LookupValue in (Col_0) = 46.8, the returned results (array?, list?) would be the rows where Col_0 = 46.5 and 47.0:
Col_0              L Values         M Values       S Values
LookupMin = 46.5   Lmin = -0.3521   Mmin = 2.691   Smin = 0.09094
LookupMax = 47.0   Lmax = -0.3521   Mmax = 2.776   Smax = 0.09065
Interpolated Value = LMSmin + (LookupValue - LookupMin) * ((LMSmax - LMSmin) / (LookupMax - LookupMin))
Interpolated L = -0.3521 because Lmin = Lmax
Interpolated M = 2.691 + (46.8 - 46.5) * ((2.776 - 2.691) / (47.0 - 46.5))
Interpolated M = 2.742
Interpolated S = 0.09094 + (46.8 - 46.5) * ((0.09065 - 0.09094) / (47.0 - 46.5))
Interpolated S = 0.090766 (about 0.09077)
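The formula above could be sketched in C# like this. It is a minimal sketch, not the asker's code: Interpolation.Lerp is a hypothetical helper name, and it assumes the two bracketing rows have already been found. The important detail is the parentheses, so that the slope is (LMSmax - LMSmin) divided by (LookupMax - LookupMin).

```csharp
using System;

// Hypothetical helper for linear interpolation between two bracketing rows.
public static class Interpolation
{
    // Returns yMin + (x - xMin) * ((yMax - yMin) / (xMax - xMin)).
    public static double Lerp(double x, double xMin, double xMax, double yMin, double yMax)
    {
        if (xMax == xMin)
        {
            // Degenerate bracket (e.g. an exact match): nothing to interpolate.
            return yMin;
        }
        // Slope first, then scale by the distance from the lower lookup value.
        return yMin + (x - xMin) * ((yMax - yMin) / (xMax - xMin));
    }
}
```

For example, Interpolation.Lerp(46.8, 46.5, 47.0, 2.691, 2.776) gives about 2.742, and passing the S values instead gives about 0.090766.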
So, given the Min/Max values for Col_0 and either L, M, or S min/max values, I can interpolate any value the user provides that isn't in the lookup even if the LookupValue has more decimals. The interpolated L,M,S values will be put in textboxes for the user.
I have a bit of code that works when there's a match, but I think there's a better/more concise way using either LINQ or tuples. I realize this isn't the best code and I'm open to suggestions.
I have scoured StackOverflow and found several posts on interpolation and lookup tables. It seems that the best practice for the lookup is to use a tuple, but I'm not very clear on their use.
For the most part, this question is focused on returning the Min/Max values of the lookup when there's no match. Once I have those, I don't think the interpolation is a big feat as I know the formula. Also, I know that the user could enter values out of range--I will account for those issues later.
Any help is appreciated.
private void tbLookup_Leave(object sender, System.EventArgs e)
{
string colName = tmpDT.Columns[0].ColumnName;
string colSearch = colName + " = '" + tbLookup.Text + "'";
if (tbLookup.Text.Length > 0)
{
DataRow[] foundRow = tmpDT.Select(colSearch);
if (foundRow.Length > 0)
{
// Exact match: L, M and S are double columns, so read them as double
DataRow row = foundRow[0];
tbLkupL.Text = row.Field<double>("L").ToString();
tbLkupM.Text = row.Field<double>("M").ToString();
tbLkupS.Text = row.Field<double>("S").ToString();
}
else
{
// No match
// Call interpolation method
}
}
else
{
MessageBox.Show("Please enter a lookup value", "Missing Data");
}
}

You inquired about possibly using LINQ, so I checked my code chest and found something similar that I adapted to your needs.
using System.Linq; // Add this at the top of the Program.cs file.
The extension method returns true if anything was found, plus three output parameters that contain the found indices (or -1 where nothing was found).
// Extension methods must be declared in a top-level (non-nested), non-generic static class.
// Add this class inside the namespace of a console app, alongside the Program class (in the Program.cs file).
public static class ExtensionMethods
{
public static bool GetNearestOrEqual<TSource, TValue>(this System.Collections.Generic.IEnumerable<TSource> source, System.Func<TSource, TValue> valueSelector, TValue referenceValue, out int indexOfLowerMax, out int indexOfEqual, out int indexOfHigherMin)
where TValue : struct, System.IComparable<TValue>
{
using var e = source.GetEnumerator();
var ltCurrent = new TValue?();
var gtCurrent = new TValue?();
indexOfLowerMax = -1;
indexOfEqual = -1;
indexOfHigherMin = -1;
var index = 0;
while (e.MoveNext())
{
var currentValue = valueSelector(e.Current);
switch (currentValue.CompareTo(referenceValue))
{
case int lo when lo < 0:
if (!ltCurrent.HasValue || currentValue.CompareTo(ltCurrent.Value) > 0)
{
indexOfLowerMax = index;
ltCurrent = currentValue;
}
break;
case int hi when hi > 0:
if (!gtCurrent.HasValue || currentValue.CompareTo(gtCurrent.Value) < 0)
{
indexOfHigherMin = index;
gtCurrent = currentValue;
}
break;
default:
indexOfEqual = index;
break;
}
index++;
}
return indexOfLowerMax != -1 || indexOfEqual != -1 || indexOfHigherMin != -1;
}
}
Sample of how you could use it (created a simple console app):
// Replace the Main() inside the Program class of a console app.
static void Main(string[] args)
{
var dt = new System.Data.DataTable();
dt.Columns.Add("Col_0", typeof(double));
dt.Columns.Add("L", typeof(double));
dt.Columns.Add("M", typeof(double));
dt.Columns.Add("S", typeof(double));
dt.Rows.Add(new object[] { 45.0, -0.3521, 2.441, 0.09182 });
dt.Rows.Add(new object[] { 45.5, -0.3521, 2.524, 0.09153 });
dt.Rows.Add(new object[] { 46.0, -0.3521, 2.608, 0.09124 });
dt.Rows.Add(new object[] { 46.5, -0.3521, 2.691, 0.09094 });
dt.Rows.Add(new object[] { 47.0, -0.3521, 2.776, 0.09065 });
dt.Rows.Add(new object[] { 47.5, -0.3521, 2.861, 0.09036 });
dt.Rows.Add(new object[] { 48.0, -0.3521, 2.948, 0.09007 });
var lookupValue = 46.8;
var foundAnything = dt.Rows.Cast<System.Data.DataRow>().GetNearestOrEqual(o => (double)o.ItemArray[0], lookupValue, out var indexOfLowerMax, out var indexOfEqual, out var indexOfHigherMin);
// Assuming example for when both low and high are found...
var dr = dt.NewRow();
var lookuploDiff = lookupValue - (double)dt.Rows[indexOfLowerMax][0];
var hiloDiff = (double)dt.Rows[indexOfHigherMin][0] - (double)dt.Rows[indexOfLowerMax][0];
dr.ItemArray = new object[] {
lookupValue,
(double)dt.Rows[indexOfLowerMax][1] + lookuploDiff * (((double)dt.Rows[indexOfHigherMin][1] - (double)dt.Rows[indexOfLowerMax][1]) / hiloDiff),
(double)dt.Rows[indexOfLowerMax][2] + lookuploDiff * (((double)dt.Rows[indexOfHigherMin][2] - (double)dt.Rows[indexOfLowerMax][2]) / hiloDiff),
(double)dt.Rows[indexOfLowerMax][3] + lookuploDiff * (((double)dt.Rows[indexOfHigherMin][3] - (double)dt.Rows[indexOfLowerMax][3]) / hiloDiff),
};
dt.Rows.InsertAt(dr, indexOfHigherMin);
}
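If the lookup column is guaranteed to be sorted ascending (as in the sample table), Array.BinarySearch is an alternative to a linear scan: it returns the index of an exact match, or the bitwise complement of the index of the next larger element. A hedged sketch, assuming a sorted double column at index 0; LookupSketch and FindBracket are hypothetical names, and on an unsorted column the result is not meaningful.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper: assumes column 0 holds doubles sorted ascending.
public static class LookupSketch
{
    // Returns (Lower, Upper) row indices. Lower == Upper on an exact match;
    // -1 marks a side that falls outside the table's range.
    public static (int Lower, int Upper) FindBracket(DataTable dt, double lookupValue)
    {
        double[] keys = dt.Rows.Cast<DataRow>().Select(r => (double)r[0]).ToArray();
        int index = Array.BinarySearch(keys, lookupValue);
        if (index >= 0)
        {
            return (index, index); // exact match
        }
        int upper = ~index;      // first element greater than lookupValue
        int lower = upper - 1;   // last element smaller than lookupValue (-1 if none)
        return (lower, upper < keys.Length ? upper : -1);
    }
}
```

On the sample table, FindBracket(dt, 46.8) returns (3, 4), i.e. the rows for 46.5 and 47.0, and FindBracket(dt, 46.5) returns (3, 3). The two indices can then feed the interpolation formula from the question.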
As always, if there are any questions, this is the place. :)


C# Constructing a Dynamic Query From DataTable

I'm trying to generate a dynamic LINQ query based on a DataTable returned to me. The column names in the DataTable will change, but I will know which ones I want to total and which ones I want to group by.
I can get this to work with loops and writing the output to a variable, then recasting the parts back into a data table, but I'm hoping there is a more elegant way of doing this.
//C#
DataTable dt = new DataTable();
dt.Columns.Add(DynamicData1);
dt.Columns.Add(DynamicData2);
dt.Columns.Add(DynamicCount);
In this case the columns are LastName, FirstName, Age. I want to total ages by LastName,FirstName columns (yes both in the group by). So one of my parameters would specify group by = LastName, FirstName and another TotalBy = Age. The next query may return different column names.
Sample rows:
LastName, FirstName, Age
Smith, John, 10
Smith, John, 11
Smith, Sarah, 8
Given these different potential column names, I'm looking to generate a LINQ query that creates a generic group-by and total output.
Result:
LastName, FirstName, AgeTotal
Smith, John = 21
Smith, Sarah = 8
If you use a simple converter for Linq you can achieve that easily.
Here is the quick data generation I did for the sample:
// create dummy table
var dt = new DataTable();
dt.Columns.Add("LastName", typeof(string));
dt.Columns.Add("FirstName", typeof(string));
dt.Columns.Add("Age", typeof(int));
// action to create easily the records
var addData = new Action<string, string, int>((ln, fn, age) =>
{
var dr = dt.NewRow();
dr["LastName"] = ln;
dr["FirstName"] = fn;
dr["Age"] = age;
dt.Rows.Add(dr);
});
// add 3 datarows records
addData("Smith", "John", 10);
addData("Smith", "John", 11);
addData("Smith", "Sarah", 8);
This is how to use my simple transformation class:
// create a linq version of the table
var lqTable = new LinqTable(dt);
// make the group by query
var groupByNames = lqTable.Rows.GroupBy(row => row["LastName"].ToString() + "-" + row["FirstName"].ToString()).ToList();
// for each group create a brand new linqRow
var linqRows = groupByNames.Select(grp =>
{
//get all items. so we can use first item for last and first name and sum the age easily at the same time
var items = grp.ToList();
// return a new linq row
return new LinqRow()
{
Fields = new List<LinqField>()
{
new LinqField("LastName",items[0]["LastName"].ToString()),
new LinqField("FirstName",items[0]["FirstName"].ToString()),
new LinqField("Age",items.Sum(item => Convert.ToInt32(item["Age"]))),
}
};
}).ToList();
// create a new LinqTable, since it handles the DataTable format and transforms it directly
var finalTable = new LinqTable() { Rows = linqRows }.AsDataTable();
And finally, here are the custom classes that are used:
public class LinqTable
{
public LinqTable()
{
}
public LinqTable(DataTable sourceTable)
{
LoadFromTable(sourceTable);
}
public List<LinqRow> Rows = new List<LinqRow>();
public List<string> Columns
{
get
{
var columns = new List<string>();
if (Rows != null && Rows.Count > 0)
{
Rows[0].Fields.ForEach(field => columns.Add(field.Name));
}
return columns;
}
}
public void LoadFromTable(DataTable sourceTable)
{
sourceTable.Rows.Cast<DataRow>().ToList().ForEach(row => Rows.Add(new LinqRow(row)));
}
public DataTable AsDataTable()
{
var dt = new DataTable("data");
if (Rows != null && Rows.Count > 0)
{
Rows[0].Fields.ForEach(field =>
{
dt.Columns.Add(field.Name, field.DataType);
});
Rows.ForEach(row =>
{
var dr = dt.NewRow();
row.Fields.ForEach(field => dr[field.Name] = field.Value);
dt.Rows.Add(dr);
});
}
return dt;
}
}
public class LinqRow
{
public List<LinqField> Fields = new List<LinqField>();
public LinqRow()
{
}
public LinqRow(DataRow sourceRow)
{
sourceRow.Table.Columns.Cast<DataColumn>().ToList().ForEach(col => Fields.Add(new LinqField(col.ColumnName, sourceRow[col], col.DataType)));
}
public object this[int index]
{
get
{
return Fields[index].Value;
}
set
{
Fields[index].Value = value;
}
}
public object this[string name]
{
get
{
return Fields.Find(f => f.Name == name).Value;
}
set
{
var fieldIndex = Fields.FindIndex(f => f.Name == name);
if (fieldIndex >= 0)
{
Fields[fieldIndex].Value = value;
}
}
}
public DataTable AsSingleRowDataTable()
{
var dt = new DataTable("data");
if (Fields != null && Fields.Count > 0)
{
Fields.ForEach(field =>
{
dt.Columns.Add(field.Name, field.DataType);
});
var dr = dt.NewRow();
Fields.ForEach(field => dr[field.Name] = field.Value);
dt.Rows.Add(dr);
}
return dt;
}
}
public class LinqField
{
public Type DataType;
public object Value;
public string Name;
public LinqField(string name, object value, Type dataType)
{
DataType = dataType;
Value = value;
Name = name;
}
public LinqField(string name, object value)
{
DataType = value.GetType();
Value = value;
Name = name;
}
public override string ToString()
{
return Value.ToString();
}
}
I think I'd just use a dictionary:
public Dictionary<string, int> GroupTot(DataTable dt, string[] groupBy, string tot){
var d = new Dictionary<string, int>();
foreach(DataRow ro in dt.Rows){
string key = "";
foreach(string col in groupBy)
key += (string)ro[col] + '\n';
if(!d.ContainsKey(key))
d[key] = 0;
d[key]+= (int)ro[tot];
}
return d;
}
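Since the question asked for LINQ, here is a hedged sketch of the same idea with GroupBy, producing a new DataTable with the group-by columns plus a total column. DynamicGrouping.GroupAndSum and the "<column>Total" naming are assumptions for illustration, and the total column is assumed to hold values convertible to int.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper: group by any set of columns and sum one integer column.
public static class DynamicGrouping
{
    public static DataTable GroupAndSum(DataTable source, string[] groupBy, string totalColumn)
    {
        // Result schema: the group-by columns (same types) plus "<totalColumn>Total".
        var result = new DataTable();
        foreach (string col in groupBy)
        {
            result.Columns.Add(col, source.Columns[col].DataType);
        }
        result.Columns.Add(totalColumn + "Total", typeof(int));

        // Composite key: group-by values joined with a separator unlikely to occur in data.
        var groups = source.Rows.Cast<DataRow>()
            .GroupBy(row => string.Join("\u001F", groupBy.Select(col => Convert.ToString(row[col]))));

        foreach (var group in groups)
        {
            DataRow first = group.First();
            object[] values = groupBy.Select(col => first[col])
                .Append((object)group.Sum(row => Convert.ToInt32(row[totalColumn])))
                .ToArray();
            result.Rows.Add(values);
        }
        return result;
    }
}
```

Called as GroupAndSum(dt, new[] { "LastName", "FirstName" }, "Age") on the sample rows, this should yield Smith, John, 21 and Smith, Sarah, 8 in an "AgeTotal" column; GroupBy keeps groups in order of first appearance.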
If you want the total on each row, we could get cute and create a column that is an array of one int instead of an int:
public void GroupTot(DataTable dt, string[] groupBy, string tot){
var d = new Dictionary<string, int[]>();
var dc = dt.Columns.Add("Total_" + tot, typeof(int[]));
foreach(DataRow ro in dt.Rows){
string key = "";
foreach(string col in groupBy)
key += (string)ro[col] + '\n'; //build a grouping key from first and last name
if(!d.ContainsKey(key)) //have we seen this name pair before?
d[key] = new int[1]; //no we haven't, ensure we have a tracker for our total, for this first+last name
d[key][0] += (int)ro[tot]; //add the total
ro[dc] = d[key]; //link the row to the total tracker
}
}
At the end of the operation, every row will have an int array in the "Total_Age" column that represents the total for that first + last name. I used int[] rather than int because int is a value type, whereas int[] is a reference type. As the table is iterated, each row is assigned a reference to an int[], and rows with the same first + last name end up with references pointing to the same array in memory, so incrementing the total for a later row increments it for all the earlier ones too (all "John Smith" rows' total columns hold a reference to the same int[]). If we'd made the column an int type, then every row would hold its own separate counter, because every time we say ro[dc] = d[key] the current value of d[key] would be copied into ro[dc]. Any reference type would do for this trick to work, but value types wouldn't. If you wanted your column to be a value type, you'd have to iterate the table again, or use two dictionaries (one of them mapping DataRow -> total) and iterate its keys, assigning the totals back into the rows.
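A small self-contained demonstration of the reference-sharing behaviour described above, using the sample Smith rows from the question (SharedTotalDemo is a hypothetical name). After the loop, both "John Smith" rows should see the final total of 21 because they hold the same int[] instance.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical demo of sharing one int[] "tracker" across rows of the same group.
public static class SharedTotalDemo
{
    public static DataTable Build()
    {
        var dt = new DataTable();
        dt.Columns.Add("LastName", typeof(string));
        dt.Columns.Add("FirstName", typeof(string));
        dt.Columns.Add("Age", typeof(int));
        dt.Rows.Add("Smith", "John", 10);
        dt.Rows.Add("Smith", "John", 11);
        dt.Rows.Add("Smith", "Sarah", 8);

        var totals = new Dictionary<string, int[]>();
        DataColumn totalColumn = dt.Columns.Add("Total_Age", typeof(int[]));
        foreach (DataRow row in dt.Rows)
        {
            string key = row["LastName"] + "\n" + row["FirstName"];
            if (!totals.TryGetValue(key, out int[] tracker))
            {
                tracker = new int[1]; // first row of this group: create its shared counter
                totals[key] = tracker;
            }
            tracker[0] += (int)row["Age"];
            row[totalColumn] = tracker; // stores a reference to the array, not a copy
        }
        return dt;
    }
}
```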

Merging two arrays in C# [duplicate]

This question already has answers here: How do I concatenate two arrays in C#?; Joining two lists together; How to sort a string array by alphabet?; Sort a list alphabetically.
In my code I have two arrays, and I want to merge them in the right sequence and save the values to a third array. I have tried a lot but could not find a perfect solution.
public void Toolchange_T7()
{
for (int T7 = 0; T7 < sModulename_listofsafetysensor.Length; T7++)
{
if (sModulename_listofsafetysensor[T7] != null && sModulename_listofsafetysensor[T7].Contains("IR") && sModulename_listofsafetysensor[T7].Contains("FS"))
{
sElement_toolchanger[iET7] = sModulename_listofsafetysensor[T7];
iET7++;
}
}
for (int T7 = 0; T7 < sDesignation_toolchanger_t7.Length; T7++)
{
if (sDesignation_toolchanger_t7[T7] != null && sDesignation_toolchanger_t7[T7].Contains("IR") && sDesignation_toolchanger_t7[T7].Contains("FW"))
{
sDesignation_toolchanger[iMT7] = sDesignation_toolchanger_t7[T7];
iMT7++;
}
}
}
sElement_toolchanger contains:
++ST010+IR001+FW001
++ST010+IR002+FW001
++ST010+IR006+FW001
sDesignation_toolchanger contains:
++ST010+IR001.FS001
++ST010+IR001.FS002
++ST010+IR002.FS001
++ST010+IR002.FS002
++ST010+IR006.FS001
++ST010+IR006.FS002
My desired output is:
++ST010+IR001+FW001
++ST010+IR001.FS001
++ST010+IR001.FS002
++ST010+IR002+FW001
++ST010+IR002.FS001
++ST010+IR002.FS002
++ST010+IR006+FW001
++ST010+IR006.FS001
++ST010+IR006.FS002
It would be very helpful if someone knows a perfect solution.
using System.Linq; // Union and OrderBy are LINQ extension methods
var mergedAndSorted = list1.Union(list2).OrderBy(o => o); // note: Union also removes duplicates
Simplest would be to:
Convert the arrays to lists:
var List1 = new List<string>(myArray1);
var List2 = new List<string>(myArray2);
Merge the two lists together:
List1.AddRange(List2);
and sort them.
List1.Sort();
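Note that OrderBy(o => o) and List<string>.Sort() without arguments use the default, culture-sensitive string comparison, and how punctuation such as '+' and '.' is ordered can depend on the culture and platform. If the desired order is simply "the +FW entry first, then its .FS entries", an ordinal comparison may be enough, because it compares UTF-16 code units and '+' (U+002B) is less than '.' (U+002E). A minimal sketch, assuming ordinal order matches the intended rule; OrdinalMerge is a hypothetical name.

```csharp
using System;
using System.Linq;

// Hypothetical helper: merge two arrays and sort them by ordinal (code-unit) order.
public static class OrdinalMerge
{
    public static string[] MergeSorted(string[] first, string[] second)
    {
        // Concat keeps duplicates; use Union instead if duplicates should be dropped.
        return first.Concat(second).OrderBy(s => s, StringComparer.Ordinal).ToArray();
    }
}
```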
According to what you said in the comments, here is a small function that will take one item from the first array, then two from the second array and so on to make a third one.
This code could be improved...
static void Main(string[] args)
{
string[] t1 = new string[] { "a", "b", "c" };
string[] t2 = new string[] { "a1", "a2", "b1", "b2", "c1", "c2" };
List<string> merged = Merge(t1.ToList(), t2.ToList());
foreach (string item in merged)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
private static List<T> Merge<T>(List<T> first, List<T> second)
{
List<T> ret = new List<T>();
for (int indexFirst = 0, indexSecond = 0;
indexFirst < first.Count && indexSecond + 1 < second.Count; // need two items left in "second"
indexFirst++, indexSecond+= 2)
{
ret.Add(first[indexFirst]);
ret.Add(second[indexSecond]);
ret.Add(second[indexSecond + 1]);
}
return ret;
}
From your sample output, it doesn't look alphabetically sorted. In that case, you would need to use a custom comparer along with your OrderBy clause after taking the Union.
For example (since your sorting rule is unknown, I'm assuming one here to illustrate):
var first = new[]{"++ST010+IR001+FW001","++ST010+IR002+FW001","++ST010+IR006+FW001"};
var second = new[]{"++ST010+IR001.FS001",
"++ST010+IR001.FS002",
"++ST010+IR002.FS001",
"++ST010+IR002.FS002",
"++ST010+IR006.FS001",
"++ST010+IR006.FS002"};
var customComparer = new CustomComparer();
var result = first.Union(second).OrderBy(x=>x,customComparer);
Where your custom comparer is defined as
public class CustomComparer : IComparer<string>
{
int IComparer<string>.Compare(string x, string y)
{
var items1 = x.ToCharArray();
var items2 = y.ToCharArray();
var temp = items1.Zip(items2,(a,b)=> new{Item1 = a, Item2=b});
var difference = temp.FirstOrDefault(item=>item.Item1!=item.Item2);
if(difference!=null)
{
if(difference.Item1=='.' && difference.Item2=='+')
return 1;
if(difference.Item1=='+' && difference.Item2=='.')
return -1;
}
return string.Compare(x,y);
}
public int GetHashCode(string obj)
{
return obj.GetHashCode();
}
}
Here's a solution if the two input arrays are already sorted. Since I suspect your comparison function is non-straightforward, I include a separate comparison function (not very efficient, but it will do). The comparison function splits the string into two (the alpha part and the numeric part) and then compares them so that the numeric part is more "important" in the sort.
Note that it doesn't do any sorting; it relies on the inputs being sorted. It just compares the items at the current positions of the two arrays and transfers the lower-valued one to the output array. The result is O(N).
The code makes a single pass through the two arrays (in one loop). When I run this, the output is AA00, AA01, AB01, AB02, AC02, AA03, AA04. I'm sure there are opportunities to make this code cleaner (I just popped it off). However, it should give you ideas to continue:
public class TwoArrays
{
private string[] _array1 = {"AA01", "AB01", "AB02", "AC02"};
private string[] _array2 = {"AA00", "AA03", "AA04"};
public void TryItOut()
{
var result = ConcatenateSorted(_array1, _array2);
var concatenated = string.Join(", ", result);
}
private int Compare(string a, string b)
{
if (a == b)
{
return 0;
}
string a1 = a.Substring(0, 2);
string a2 = a.Substring(2, 2);
string b1 = b.Substring(0, 2);
string b2 = b.Substring(2, 2);
return string.Compare(a2 + a1, b2 + b1);
}
private string[] ConcatenateSorted(string[] a, string[] b)
{
string[] ret = new string[a.Length + b.Length];
int aIndex = 0, bIndex = 0, retIndex = 0;
while (true) //do this forever, until time to "break" and return
{
if (aIndex >= a.Length && bIndex >= b.Length)
{
return ret;
}
if (aIndex >= a.Length)
{
ret[retIndex++] = b[bIndex++];
continue;
}
if (bIndex >= b.Length)
{
ret[retIndex++] = a[aIndex++];
continue;
}
if (Compare(a[aIndex], b[bIndex]) > 0)
{
ret[retIndex++] = b[bIndex++];
}
else
{
ret[retIndex++] = a[aIndex++];
}
}
}
}
To get it to work with your data, you need to change the input arrays and provide your own comparison function (which you could pass in as a delegate).

Casting a String as Integer in a .net DataTable

Disclaimer: This is my very first .NET C# project.
I am attempting to import a CSV into MSSQL but need to iterate through the CSV values first for sanitization purposes. Some of the columns in the CSV will be integers (used for calculations later) and some are regular varchar.
My script below appears to force all values (that is, row column values) in the DataTable to be strings, which throws an exception later in my application when SQL cannot write a string as an integer.
Here is the method I am using for the CSV import (getCsvData), which creates a DataTable and populates it.
What I am thinking is to add another condition that checks whether the value is an integer and then casts it as an integer (this kind of thing is new to me, as PHP does not handle types so strongly), but I fear that won't work, as I am not sure whether I can mix values of different types within a DataTable.
So my question is, is there a way for me to have different values in a datatable as different types? My code below takes the line as a whole and writes it as a string, I need the values to be assigned either as string or as integer.
/*
* getCsvData()
* This method will create a datatable from the CSV file. We'll take the CSV file as is.
* and collect the data as needed:
*
* - Remove those original 4 lines (worthless info)
* - Line 5 starts with the headers, remove any of the brackets around the values
* - Iterate through the rest of the fields and sanitize them before we add it to the datatable
*
*/
private DataTable getCsvData(string csv_file_path)
{
// Create a new csvData table object:
DataTable csvData = new DataTable();
try
{
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
int row = 1;
while (!csvReader.EndOfData)
{
// Read the string and collect the row data
string[] rowData = csvReader.ReadFields();
if (row <= 4)
{
// We want to start on row 5 as first rows are nonsense :)
// Increment the row so that we can do our magic above
row++;
continue;
} if(row == 5)
{
// Row 5 is the headers, we need to sanitize and continue:
foreach (string column in rowData)
{
// Remove the [ ] from the values:
var col = column.Substring(1, column.Length - 2);
DataColumn datecolumn = new DataColumn(col);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
// Increment the row so that we can do our magic above
row++;
} else
{
// These are all of the actual rows, sanitize and add the rows:
//Making empty value as null
for (int i = 0; i < rowData.Length; i++)
{
// First remove the brackets:
if (rowData[i].StartsWith("[")) // Substring(0,1) would throw on an empty field
{
rowData[i] = rowData[i].Substring(1, rowData[i].Length - 2);
}
// Set blank to null:
if (rowData[i] == "" || rowData[i] == "-")
{
rowData[i] = null;
}
// Lastly, we need to do some calculations:
}
// Add the sanitized row to the DataTable:
csvData.Rows.Add(rowData);
}
}
}
}
catch (Exception ex)
{
throw new Exception("Could not parse the CSV file: "+ ex.Message);
}
return csvData;
}
You can parse the string to an int:
int j;
bool parsed = Int32.TryParse("-105", out j);
With TryParse you can check whether it succeeded.
Then, when you want to save it to the table again, convert it back to a string by calling <variable>.ToString().
By default, data columns are initialized to a string data type.
There's a DataColumn constructor overload that allows you to specify the type, so I'd suggest you try that. Since your column types are known beforehand, you can easily handle this in your code.
private DataColumn AddColumn(string columnName, Type columnType)
{
// Remove the [ ] from the values:
var col = columnName.Substring(1, columnName.Length - 2);
DataColumn dataColumn = new DataColumn(col, columnType);
dataColumn.AllowDBNull = true;
return dataColumn;
}
if (row == 5)
{
csvData.Columns.Add(AddColumn(rowData[0], typeof(string)));
csvData.Columns.Add(AddColumn(rowData[1], typeof(int)));
csvData.Columns.Add(AddColumn(rowData[2], typeof(DateTime)));
csvData.Columns.Add(AddColumn(rowData[3], typeof(string)));
// etc
}
I'm not sure you'll even need to convert the other values before adding them to the DataTable, but if you do, many built-in types have TryParse methods, such as DateTime.TryParse and Int32.TryParse. You can call each of them in succession, and once one of the "tries" succeeds, you'll know your type.
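The "try each parser in turn" idea could look like the following. It is a sketch under assumptions: FieldParser.Infer is a hypothetical helper, input is parsed with the invariant culture, and int is tried before decimal because "123" would also parse as a decimal. Inferring a type per value can give different types within one column, so for a DataTable you'd still want to decide each column's type once and use something like this only as a check.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: returns the first successful parse, trying int, decimal, DateTime.
public static class FieldParser
{
    public static object Infer(string field)
    {
        // Mirror the question's sanitizing: blank or "-" becomes NULL.
        if (string.IsNullOrEmpty(field) || field == "-")
        {
            return DBNull.Value;
        }
        if (int.TryParse(field, NumberStyles.Integer, CultureInfo.InvariantCulture, out int i))
        {
            return i;
        }
        if (decimal.TryParse(field, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal d))
        {
            return d;
        }
        if (DateTime.TryParse(field, CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime dt))
        {
            return dt;
        }
        return field; // fall back to the raw string
    }
}
```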
Alternatively, since you know the column types beforehand, you can just convert each value.
csvData.Rows.Add(Convert.ToString(rowData[0]),
Convert.ToInt32(rowData[1]),
Convert.ToDateTime(rowData[2]),
Convert.ToString(rowData[3]));
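One caveat with the Convert approach: Convert.ToInt32((string)null) returns 0 and Convert.ToDateTime((string)null) returns DateTime.MinValue, so the blanks that the question sets to null would turn into real values. A hedged sketch that keeps them as NULL by passing DBNull.Value instead; TypedRows.AddRow and the string/int/DateTime/string column layout are hypothetical, mirroring the example above.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper: add a sanitized CSV row to typed columns, keeping nulls as NULL.
public static class TypedRows
{
    static object IntOrNull(string s) =>
        s == null ? DBNull.Value : (object)int.Parse(s, CultureInfo.InvariantCulture);

    static object DateOrNull(string s) =>
        s == null ? DBNull.Value : (object)DateTime.Parse(s, CultureInfo.InvariantCulture);

    public static DataRow AddRow(DataTable table, string[] rowData)
    {
        // Columns assumed: 0 = string, 1 = int, 2 = DateTime, 3 = string (all AllowDBNull).
        return table.Rows.Add(
            rowData[0] ?? (object)DBNull.Value,
            IntOrNull(rowData[1]),
            DateOrNull(rowData[2]),
            rowData[3] ?? (object)DBNull.Value);
    }
}
```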
I would use the *.TryParse() methods. For example, with this sample CSV:
*A sample csv file with
*some comment lines at top
-- with different comment
// comment strings.
[charField],[dateField],[intField],[decimalField]
"Sample char data 1",2016/1/2,123,123.45
"Sample char data 2",,2,1.5
"Sample char data 3",,3,
"Sample char data 4",,,
,,,
"Sample char data 6",2016/2/29 10:20,10,20.5
You might use TryParse on those datetime, int, decimal fields:
void Main()
{
var myData = ReadMyCSV(@"c:\MyPath\MyFile.csv");
// do whatever with myData
}
public IEnumerable<MyRow> ReadMyCSV(string fileName)
{
using (TextFieldParser tfp = new TextFieldParser(fileName))
{
tfp.HasFieldsEnclosedInQuotes = true;
tfp.SetDelimiters(new string[] { "," });
//tfp.CommentTokens = new string[] { "*","--","//" };
// instead of using comment tokens we are going to skip 4 lines
for (int j = 0; j < 4; j++)
{
tfp.ReadLine();
}
// header line.
tfp.ReadLine();
DateTime dt;
int i;
decimal d;
while (!tfp.EndOfData)
{
var data = tfp.ReadFields();
yield return new MyRow
{
MyCharData = data[0],
MyDateTime = DateTime.TryParse(data[1], out dt) ? dt : (DateTime?)null,
MyIntData = int.TryParse(data[2], out i) ? i : 0,
MyDecimal = decimal.TryParse(data[3], System.Globalization.NumberStyles.Any, null, out d) ? d : 0M
};
}
}
}
public class MyRow
{
public string MyCharData { get; set; }
public int MyIntData { get; set; }
public DateTime? MyDateTime { get; set; }
public decimal MyDecimal { get; set; }
}
I could further sanitize the data loaded, such as:
myData.Where( d => d.MyIntData != 0 );
Note: I didn't use a DataTable, which I could if I wanted to. For MSSQL loading, I would probably use an intermediate in-memory SQLite instance to save the sanitized data and then push to MSSQL using SqlBulkCopy class. A DataTable is of course an option (I just think it is less flexible).

More efficient way of assigning values in DataTable?

I have a DataTable with two columns: JobDetailID and CalculatedID. JobDetailID is not always unique. I want one/the first instance of CalculatedID for a given JobDetailID to be JobDetailID + "A", and when there are multiple rows with the same JobDetailID, I want successive rows to be JobDetailID + "B", "C", etc. There aren't more than four or five rows with the same JobDetailID.
I currently have it implemented as follows, but it's unacceptably slow:
private void AddCalculatedID(DataTable data)
{
var calculatedIDColumn = new DataColumn { DataType = typeof(string), ColumnName = "CalculatedID" };
data.Columns.Add(calculatedIDColumn);
data.Columns["CalculatedID"].SetOrdinal(0);
var enumerableData = data.AsEnumerable();
foreach (DataRow row in data.Rows)
{
var jobDetailID = row["JobDetailID"].ToString();
// Give calculated ID of JobDetailID + A, B, C, etc. for multiple rows with same JobDetailID
int x = 65; // ASCII value for A
string calculatedID = jobDetailID + (char)x;
while (string.IsNullOrEmpty(row["CalculatedID"].ToString()))
{
if ((enumerableData
.Any(r => r.Field<string>("CalculatedID") == calculatedID)))
{
calculatedID = jobDetailID + (char)x;
x++;
}
else
{
row["CalculatedID"] = calculatedID;
break;
}
}
}
}
Assuming I need to adhere to this format of output, how might I improve this performance?
It would be better to add the code for generation of CalculatedID in the place where you are getting the data, but, if that is unavailable, you might want to avoid scanning the entire table each time a duplicate is found. You could use a Dictionary for the used keys, like this:
private void AddCalculatedID(DataTable data)
{
var calculatedIDColumn = new DataColumn { DataType = typeof(string), ColumnName = "CalculatedID" };
data.Columns.Add(calculatedIDColumn);
data.Columns["CalculatedID"].SetOrdinal(0);
Dictionary<string, char> UsedKeyIndex = new Dictionary<string, char>();
foreach (DataRow row in data.Rows)
{
string jobDetailID = row["JobDetailID"].ToString();
string calculatedID;
if (!UsedKeyIndex.ContainsKey(jobDetailID))
{
// First occurrence of this JobDetailID gets suffix 'A'
calculatedID = jobDetailID + 'A';
UsedKeyIndex.Add(jobDetailID, 'A');
}
else
{
// Later occurrences get the letter after the last one used
char nextKey = (char)(UsedKeyIndex[jobDetailID] + 1);
calculatedID = jobDetailID + nextKey;
UsedKeyIndex[jobDetailID] = nextKey;
}
row["CalculatedID"] = calculatedID;
}
}
This will essentially trade memory for speed, as it will cache all used JobDetailIDs along with the last char used for the generated key. If you have lots and lots of these JobDetailIDs, this might get a bit memory intensive, but I doubt that you'll have problems unless you have millions of rows to process.
If I understand your idea about setting CalculatedID for the rows, then the following algorithm would do the trick; apart from the sort, it makes a single linear pass. The most important part is data.Select("", "JobDetailID"), which gives a list of rows sorted by JobDetailID.
I didn't compile it myself, so there could be syntax errors.
private void AddCalculatedID(DataTable data)
{
var calculatedIDColumn = new DataColumn { DataType = typeof(string), ColumnName = "CalculatedID" };
data.Columns.Add(calculatedIDColumn);
data.Columns["CalculatedID"].SetOrdinal(0);
int jobDetailID = -1;
int letter = 65;
foreach (DataRow row in data.Select("", "JobDetailID"))
{
if ((int)row["JobDetailID"] != jobDetailID)
{
// New JobDetailID: restart the suffix at 'A'
jobDetailID = (int)row["JobDetailID"];
letter = 65;
}
// Every row, including the first of each group, gets a suffix
row["CalculatedID"] = row["JobDetailID"].ToString() + (char)letter;
letter++;
}
}
You tagged this as LINQ, but you are using iterative methods. Probably the best way to do this would be to use a combination of both, iterating over each "grouping" and assigning the calculated ID for each row in the grouping.
foreach (var groupRows in data.AsEnumerable().GroupBy(d => d["JobDetailID"].ToString()))
{
if(string.IsNullOrEmpty(groupRows.Key))
continue;
// We now have each "grouping" of duplicate JobDetailIDs.
int x = 65; // ASCII value for A
foreach (var duplicate in groupRows)
{
string calcID = groupRows.Key + ((char)x++);
duplicate["CalculatedID"] = calcID;
//Can also do this and achieve same results.
//duplicate["CalculatedID"] = groupRows.Key + ((char)x++);
}
}
First thing you do is group on the column that's going to have duplicates. You're going to iterate over each of these groupings, and reset the suffix value for every grouping. For every row in the grouping, you're going to get the calculated ID (incrementing the suffix value at the same time) and assign the ID back to the duplicate row. As a side note, we're altering the items we're enumerating here, which is normally a bad thing. However, we're changing data that isn't associated with our enumeration declaration (GroupBy), so it will not alter the behavior of our enumeration.
This method gets the job done in a single pass. You can optimize it further if, for example, "JobDetailID" is an integer instead of a string, or if the DataTable is always receiving the data sorted by "JobDetailID" (you could get rid of the dictionary), but here's a draft:
private static void AddCalculatedID(DataTable data)
{
    data.BeginLoadData();
    try
    {
        var calculatedIDColumn = new DataColumn { DataType = typeof(string), ColumnName = "CalculatedID" };
        data.Columns.Add(calculatedIDColumn);
        data.Columns["CalculatedID"].SetOrdinal(0);
        var jobDetails = new Dictionary<string, int>(data.Rows.Count);
        foreach (DataRow row in data.Rows)
        {
            var jobDetailID = row["JobDetailID"].ToString();
            int lastSuffix;
            if (jobDetails.TryGetValue(jobDetailID, out lastSuffix))
            {
                lastSuffix++;
            }
            else
            {
                // ASCII value for A
                lastSuffix = 65;
            }
            row["CalculatedID"] = jobDetailID + (char)lastSuffix;
            jobDetails[jobDetailID] = lastSuffix;
        }
    }
    finally
    {
        data.EndLoadData();
    }
}
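The answer above notes that the dictionary can go away if the rows always arrive sorted by JobDetailID. A minimal sketch of that variant follows, assuming rows with equal IDs are always adjacent. SortedCalculatedId and AddCalculatedIDSorted are hypothetical names, and unlike the draft this leaves the new column at the end instead of moving it to ordinal 0.

```csharp
using System.Data;

public static class SortedCalculatedId
{
    // Hypothetical helper. ASSUMPTION: rows with the same JobDetailID are adjacent
    // (e.g. the table is sorted by JobDetailID). Only the previous ID is remembered,
    // so no dictionary is needed.
    public static void AddCalculatedIDSorted(DataTable data)
    {
        if (!data.Columns.Contains("CalculatedID"))
            data.Columns.Add("CalculatedID", typeof(string));

        string previousId = null;
        char suffix = 'A';

        data.BeginLoadData();
        try
        {
            foreach (DataRow row in data.Rows)
            {
                string jobDetailID = row["JobDetailID"].ToString();

                // Same ID as the previous row: next letter. New ID: start again at 'A'.
                suffix = jobDetailID == previousId ? (char)(suffix + 1) : 'A';
                row["CalculatedID"] = jobDetailID + suffix;
                previousId = jobDetailID;
            }
        }
        finally
        {
            data.EndLoadData();
        }
    }
}
```

If a 100 row reappears after a 200 row, it starts again at 100A and duplicates the earlier ID, which is why this shortcut only holds for sorted input; the dictionary version has no such restriction.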

System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary

I receive the above error message when running a unit test on a method. I know where the problem is; I just don't know why the key is not present in the dictionary.
Here is the code that builds the dictionary:
var nmDict = xelem.Descendants(plantNS + "Month").ToDictionary(
    k => new Tuple<int, int, string>(
        int.Parse(k.Ancestors(plantNS + "Year").First().Attribute("Year").Value),
        Int32.Parse(k.Attribute("Month1").Value),
        k.Ancestors(plantNS + "Report").First().Attribute("Location").Value.ToString()),
    v => {
        var detail = v.Descendants(plantNS + "Details").First();
        return new HoursContainer
        {
            BaseHours = detail.Attribute("BaseHours").Value,
            OvertimeHours = detail.Attribute("OvertimeHours").Value,
            TotalHours = float.Parse(detail.Attribute("BaseHours").Value) + float.Parse(detail.Attribute("OvertimeHours").Value)
        };
    });
var mergedDict = new Dictionary<Tuple<int, int, string>, HoursContainer>();
foreach (var item in nmDict)
{
    mergedDict.Add(Tuple.Create(item.Key.Item1, item.Key.Item2, "NM"), item.Value);
}
var thDict = xelem.Descendants(plantNS + "Month").ToDictionary(
    k => new Tuple<int, int, string>(
        int.Parse(k.Ancestors(plantNS + "Year").First().Attribute("Year").Value),
        Int32.Parse(k.Attribute("Month1").Value),
        k.Ancestors(plantNS + "Report").First().Attribute("Location").Value.ToString()),
    v => {
        var detail = v.Descendants(plantNS + "Details").First();
        return new HoursContainer
        {
            BaseHours = detail.Attribute("BaseHours").Value,
            OvertimeHours = detail.Attribute("OvertimeHours").Value,
            TotalHours = float.Parse(detail.Attribute("BaseHours").Value) + float.Parse(detail.Attribute("OvertimeHours").Value)
        };
    });
foreach (var item in thDict)
{
    mergedDict.Add(Tuple.Create(item.Key.Item1, item.Key.Item2, "TH"), item.Value);
}
return mergedDict;
And here is the method being tested:
protected IList<DataResults> QueryData(HarvestTargetTimeRangeUTC ranges,
    IDictionary<Tuple<int, int, string>, HoursContainer> mergedDict)
{
    var startDate = new DateTime(ranges.StartTimeUTC.Year, ranges.StartTimeUTC.Month, 1);
    var endDate = new DateTime(ranges.EndTimeUTC.Year, ranges.EndTimeUTC.Month, 1);
    const string IndicatorName = "{6B5B57F6-A9FC-48AB-BA4C-9AB5A16F3745}";
    DataResults endItem = new DataResults();
    List<DataResults> ListOfResults = new List<DataResults>();
    var allData =
        (from vi in context.vDimIncidents
         where vi.IncidentDate >= startDate.AddYears(-3) && vi.IncidentDate <= endDate
         select new
         {
             vi.IncidentDate,
             LocationName = vi.LocationCode,
             GroupingName = vi.Location,
             vi.ThisIncidentIs, vi.Location
         });
    var finalResults =
        (from a in allData
         group a by new { a.IncidentDate.Year, a.IncidentDate.Month, a.LocationName, a.GroupingName, a.ThisIncidentIs, a.Location }
         into groupItem
         select new
         {
             Year = String.Format("{0}", groupItem.Key.Year),
             Month = String.Format("{0:00}", groupItem.Key.Month),
             groupItem.Key.LocationName,
             GroupingName = groupItem.Key.GroupingName,
             Numerator = groupItem.Count(),
             Denominator = mergedDict[Tuple.Create(groupItem.Key.Year, groupItem.Key.Month, groupItem.Key.LocationName)].TotalHours,
             IndicatorName = IndicatorName,
         }).ToList();
    for (int counter = 0; counter < finalResults.Count; counter++)
    {
        var item = finalResults[counter];
        endItem = new DataResults();
        ListOfResults.Add(endItem);
        endItem.IndicatorName = item.IndicatorName;
        endItem.LocationName = item.LocationName;
        endItem.Year = item.Year;
        endItem.Month = item.Month;
        endItem.GroupingName = item.GroupingName;
        endItem.Numerator = item.Numerator;
        endItem.Denominator = item.Denominator;
    }
    foreach (var item in mergedDict)
    {
        if (!ListOfResults.Exists(l => l.Year == item.Key.Item1.ToString() && l.Month == item.Key.Item2.ToString()
            && l.LocationName == item.Key.Item3))
        {
            for (int counter = 0; counter < finalResults.Count; counter++)
            {
                var data = finalResults[counter];
                endItem = new DataResults();
                ListOfResults.Add(endItem);
                endItem.IndicatorName = data.IndicatorName;
                endItem.LocationName = item.Key.Item3;
                endItem.Year = item.Key.Item1.ToString();
                endItem.Month = item.Key.Item2.ToString();
                endItem.GroupingName = data.GroupingName;
                endItem.Numerator = 0;
                endItem.Denominator = item.Value.TotalHours;
            }
        }
    }
    return ListOfResults;
}
The error occurs here:
Denominator = mergedDict[Tuple.Create(groupItem.Key.Year, groupItem.Key.Month, groupItem.Key.LocationName)].TotalHours,
I do not understand why the key is not present. The key consists of an int, an int and a string (year, month, location), and that is what I assigned to it.
I've looked at all of the other threads concerning this error message, but I didn't see anything that applied to my situation.
I was unsure which tags to use, but as I understand it the dictionary was created with LINQ to XML, the query is LINQ to SQL, and it's all part of C#, so I used all of those tags. If that was incorrect, I apologize in advance.
The problem is with comparisons between the keys you are storing in the Dictionary and the keys you are trying to look up.
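One way to see which comparisons fail is to check every lookup key against the dictionary instead of letting the indexer throw. This is a minimal sketch; MissingKeyCheck, FindMissingKeys and GetOrDefault are hypothetical helper names, and building the lookup keys (for example from the Year, Month and LocationName values in allData) is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical diagnostic helpers; the names are illustrative, not from the question.
public static class MissingKeyCheck
{
    // Returns each distinct lookup key that has no entry in the dictionary.
    // These are exactly the keys on which dict[key] would throw KeyNotFoundException.
    public static List<Tuple<int, int, string>> FindMissingKeys<TValue>(
        IDictionary<Tuple<int, int, string>, TValue> dict,
        IEnumerable<Tuple<int, int, string>> lookupKeys)
    {
        return lookupKeys
            .Distinct()                        // Tuple equality is by value, so repeats collapse
            .Where(key => !dict.ContainsKey(key))
            .ToList();
    }

    // Non-throwing alternative to dict[key]: returns fallback when the key is missing.
    public static TValue GetOrDefault<TValue>(
        IDictionary<Tuple<int, int, string>, TValue> dict,
        Tuple<int, int, string> key,
        TValue fallback)
    {
        TValue value;
        return dict.TryGetValue(key, out value) ? value : fallback;
    }
}
```

Printing the result of FindMissingKeys(mergedDict, keys) shows which year, month and location combinations the query produces that the XML never supplied; GetOrDefault is one way to replace the indexer once you decide what a missing month should count as.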
When you add something to a Dictionary or use its indexer, it calls GetHashCode() on the key to pick a bucket and then Equals() to confirm the match. System.Tuple overrides both methods to work on the component values, so mergedDict[Tuple.Create(...)] does find an entry that was stored under a different Tuple instance, as long as all three values are equal. The exception therefore means that no stored key has exactly that year, month and location. In the code above, every key added to mergedDict has "NM" or "TH" as its third item, while the lookup uses groupItem.Key.LocationName (vi.LocationCode), and the query reaches back three years before the start date (startDate.AddYears(-3)). Any LocationCode that is not exactly "NM" or "TH" (for example a different case or trailing spaces), or any year and month with no entry in the XML, will throw.
Creating your own class to use as the key and implementing GetHashCode() and the equality methods on it can make the key easier to read, but it compares values the same way Tuple does, so it will not find a key whose values differ. Either make sure the dictionary contains every (year, month, location) the query can produce, or use TryGetValue() so that a missing combination yields a default instead of an exception.
More:
The reason this is confusing to a lot of people is that equality depends on the key type. String and Int32 return the same hash code for two different instances that have the same value, and Tuple does too, because it combines the hash codes of its components. A class that does not override GetHashCode() and Equals(), such as a plain key class you write yourself, falls back to reference equality, and then only the very same instance will find the stored value.
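The difference between value equality and reference equality can be checked directly. The sketch below stores the same (2012, 5, "NM") key three ways: as a Tuple, as a hypothetical HoursKey class that overrides Equals and GetHashCode, and as a hypothetical PlainKey class that does not, then looks each one up with a second instance holding the same values. The type names and the 160 hours value are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key class with value equality, equivalent to Tuple<int, int, string>.
public sealed class HoursKey : IEquatable<HoursKey>
{
    public int Year { get; }
    public int Month { get; }
    public string Location { get; }

    public HoursKey(int year, int month, string location)
    {
        Year = year;
        Month = month;
        Location = location;
    }

    public bool Equals(HoursKey other) =>
        other != null && Year == other.Year && Month == other.Month &&
        string.Equals(Location, other.Location, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as HoursKey);

    public override int GetHashCode()
    {
        unchecked // combine the component hash codes; overflow is harmless here
        {
            int hash = 17;
            hash = hash * 31 + Year;
            hash = hash * 31 + Month;
            hash = hash * 31 + (Location == null ? 0 : Location.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical key class WITHOUT overrides: inherits reference equality from object.
public sealed class PlainKey
{
    public int Year;
    public int Month;
    public string Location;
}

public static class KeyEqualityDemo
{
    // Each method stores under one instance and looks up with a new, equal-valued one.
    public static bool TupleKeyFound()
    {
        var dict = new Dictionary<Tuple<int, int, string>, float>();
        dict.Add(Tuple.Create(2012, 5, "NM"), 160f);
        return dict.ContainsKey(Tuple.Create(2012, 5, "NM"));   // Tuple compares values
    }

    public static bool HoursKeyFound()
    {
        var dict = new Dictionary<HoursKey, float>();
        dict.Add(new HoursKey(2012, 5, "NM"), 160f);
        return dict.ContainsKey(new HoursKey(2012, 5, "NM"));   // overrides compare values
    }

    public static bool PlainKeyFound()
    {
        var dict = new Dictionary<PlainKey, float>();
        dict.Add(new PlainKey { Year = 2012, Month = 5, Location = "NM" }, 160f);
        // Different instance, so reference equality says "not the same key".
        return dict.ContainsKey(new PlainKey { Year = 2012, Month = 5, Location = "NM" });
    }
}
```

HoursKey uses ordinal string comparison, so "NM" and "nm" are still different keys; if location codes can differ in case, one option is to use StringComparer.OrdinalIgnoreCase for both Equals and GetHashCode.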
