Conversion of DataTable to Dictionary<String,StringBuilder> - c#

The DataTable has the following columns:
Name, Value, Type, Info
I need to convert it to a Dictionary<String,StringBuilder>,
where Name is the key (string) and the other values are appended to the StringBuilder as "Value,Type,Info". The Name column can contain repeated values; in that case each successive set of values is appended to the same StringBuilder, separated by a delimiter such as ?: to mark the start of the next set.
For example, if the DataTable contains:
Name, Value, Type, Info
one 1 a aa
one 11 b bb
two 2 c cc
two 22 dd ddd
two 222 ee eee
then the resulting structure should look like:
Dictionary<String,StringBuilder> detail = new Dictionary<String,StringBuilder>
{
{[one],[1,a,aa?:11,b,bb]},
{[two],[2,c,cc?:22,dd,ddd?:222,ee,eee]}
}
This is easy to do with a for loop, but I wanted to do it with LINQ, so I tried something like:
datatable.AsEnumerable.Select(row =>
{
KeyValuePair<String,StringBuilder> kv = new KeyValuePair<String,StringBuilder>();
kv.key = row[Name];
kv.Value = row[Value]+","+row[Type]+","+row[Info]
return kv;
}).ToDictionary(y=>y.Key,y=>y.Value)
This code doesn't handle repeated keys, so nothing gets appended. I probably need to use SelectMany to flatten the structure, but how would that give me a dictionary that meets the requirements above, so that the delimiter is added to an existing key's value? Any pointer in the right direction would be appreciated.

Edited:
datatable.AsEnumerable()
.GroupBy(r => (string)r["Name"])
.Select(g => new
{
Key = g.Key,
// Preferred Solution
Value = new StringBuilder(
g.Select(r => string.Format("{0}, {1}, {2}",
r["Value"], r["Type"], r["Info"]))
.Aggregate((s1, s2) => s1 + "?:" + s2))
/*
//as proposed by juharr
Value = new StringBuilder(string.Join("?:", g.Select( r => string.Format("{0}, {1}, {2}", r["Value"], r["Type"], r["Info"]))))
*/
})
.ToDictionary(p => p.Key, p => p.Value);

Something like this should work, and it avoids complex LINQ that can be irritating to debug:
public static Dictionary<string, StringBuilder> GetData(DataTable table)
{
const string delimiter = "?:";
var collection = new Dictionary<string, StringBuilder>();
// dotNetFiddle wasn't liking the `.AsEnumerable()` extension
// But you should still be able to use it here
foreach (DataRow row in table.Rows)
{
var key = (string)row["Name"];
var @value = string.Format("{0},{1},{2}",
row["Value"],
row["Type"],
row["Info"]);
StringBuilder existingSb;
if (collection.TryGetValue(key, out existingSb))
{
existingSb.Append(delimiter + @value);
}
else
{
existingSb = new StringBuilder();
existingSb.Append(@value);
collection.Add(key, existingSb);
}
}
return collection;
}
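For comparison, here is a minimal, self-contained sketch of the GroupBy approach from the question's edit, using string.Join for the delimiter. It uses table.Rows.Cast<DataRow>() rather than AsEnumerable() so that it only needs System.Linq. The class name DataTableGrouping, the method names and the BuildSample helper are made up for illustration, and the sketch assumes every column is of type string.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text;

public static class DataTableGrouping
{
    // Hypothetical helper: rebuilds the sample table from the question.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        foreach (var name in new[] { "Name", "Value", "Type", "Info" })
        {
            table.Columns.Add(name, typeof(string));
        }
        table.Rows.Add("one", "1", "a", "aa");
        table.Rows.Add("one", "11", "b", "bb");
        table.Rows.Add("two", "2", "c", "cc");
        table.Rows.Add("two", "22", "dd", "ddd");
        table.Rows.Add("two", "222", "ee", "eee");
        return table;
    }

    // Groups rows by Name and joins each group's "Value,Type,Info"
    // strings with the "?:" delimiter.
    public static Dictionary<string, StringBuilder> ToDetail(DataTable table)
    {
        const string delimiter = "?:";
        return table.Rows.Cast<DataRow>()
            .GroupBy(row => (string)row["Name"])
            .ToDictionary(
                g => g.Key,
                g => new StringBuilder(string.Join(delimiter,
                    g.Select(row => string.Format("{0},{1},{2}",
                        row["Value"], row["Type"], row["Info"])))));
    }
}
```

Calling DataTableGrouping.ToDetail(DataTableGrouping.BuildSample()) should give "1,a,aa?:11,b,bb" for the key one and "2,c,cc?:22,dd,ddd?:222,ee,eee" for the key two.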

Related

LINQ - GroupBy multiple columns and merge the result

I am working with a sizeable set of data (~130.000 records), and I've managed to transform it the way I want (to CSV).
Here is a simplified example of what the list looks like:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2"
"Surname2, Name2;Address2;State2;YES;Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Now I would like to merge the records whose 1st, 2nd AND 3rd columns all match, like so:
Output:
"Surname1, Name1;Address1;State1;YES;Group1"
"Surname2, Name2;Address2;State2;YES;Group2 Group1"
"Surname3, Name3;Address3;State3;NO;Group1"
"Surname1, Name1;Address2;State1;YES;Group1"
Here's what I've got so far:
output.GroupBy(x => new { c1 = x.Split(';')[0], c2 = x.Split(';')[1], c3 = x.Split(';')[2] }).Select(//have no idea what should go here);
First, extract the columns you need by projecting each line into an anonymous type:
var query= from r in output
let columns= r.Split(';')
select new { c1 =columns[0], c2 =columns[1], c3 = columns[2] ,c5=columns[4]};
Then create the groups, this time using the anonymous objects defined in the previous query:
var result= query.GroupBy(e=>new {e.c1, e.c2, e.c3})
.Select(g=> new {Name=g.Key.c1,
Address=g.Key.c2,
State=g.Key.c3,
Groups=String.Join(",",g.Select(e=>e.c5))});
I know I'm missing some columns but I think you can get the idea.
PS: I have split the logic into two queries only for readability. You can compose both into a single query, but that will not change the performance, because LINQ uses deferred execution.
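To make the idea concrete, here is a minimal sketch that groups on the first three columns with an anonymous-type key and rebuilds the merged lines. The RecordMerger class and Merge method are hypothetical names. The sketch assumes every line has exactly five ';'-separated fields, takes the fourth column from the first record of each group, and joins the group names with a space as in the desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordMerger
{
    // Merges lines whose first three columns match; the fifth column
    // of every merged line is collected into a space-separated list.
    public static List<string> Merge(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(';'))
            // Anonymous types compare by value, so they work as group keys.
            .GroupBy(c => new { Name = c[0], Address = c[1], State = c[2] })
            .Select(g => string.Join(";",
                g.Key.Name,
                g.Key.Address,
                g.Key.State,
                g.First()[3],                           // 4th column from the first record
                string.Join(" ", g.Select(c => c[4])))) // merged 5th column
            .ToList();
    }
}
```

With the five sample lines from the question, Merge returns four lines, and the second one is "Surname2, Name2;Address2;State2;YES;Group2 Group1".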
This is how I would do it:
class Program
{
static void Main(string[] args)
{
List<string> input = new List<string> {
"Surname1, Name1;Address1;State1;YES;Group1",
"Surname2, Name2;Address2;State2;YES;Group2",
"Surname2, Name2;Address2;State2;YES;Group1",
"Surname3, Name3;Address3;State3;NO;Group1",
"Surname1, Name1;Address2;State1;YES;Group1",
};
var transformed = input.Select(s => s.Split(';'))
.GroupBy( s => new string[] { s[0], s[1], s[2], s[3] },
(key, elements) => string.Join(";", key) + ";" + string.Join(" ", elements.Select(e => e.Last())),
new MyEqualityComparer())
.ToList();
}
}
internal class MyEqualityComparer : IEqualityComparer<string[]>
{
public bool Equals(string[] x, string[] y)
{
return x[0] == y[0] && x[1] == y[1] && x[2] == y[2];
}
public int GetHashCode(string[] obj)
{
int hashCode = obj[0].GetHashCode();
hashCode = hashCode ^ obj[1].GetHashCode();
hashCode = hashCode ^ obj[2].GetHashCode();
return hashCode;
}
}
Treat the first 4 columns as the grouping key, but only use the first 3 for the comparison (hence the custom IEqualityComparer).
Then, once you have the (key, elements) groups, transform each one by joining the elements of the key with ; (remember, the key consists of the first 4 columns) and appending the last element of every member of the group, joined with a space.

C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

My program creates a .csv file with a person's name and an integer next to it.
Occasionally there are two entries with the same name in the file, but with different times. I only want one row per person.
I would like to take the mean of the two numbers to produce a single row for the name, where the number is the average of the two existing ones.
For example, Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just contains Alex Pitt, 88?
Here is how I am creating my CSV file, in case it is useful for reference.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}
To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}
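Since PaceCalculator isn't shown, here is a small sketch of just the grouping and averaging step that takes the two lists as parameters, so it can be tried without the file. PaceAverager and AverageByName are hypothetical names, and rounding to a whole number with Math.Round is an assumption based on the "Alex Pitt, 88" example in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PaceAverager
{
    // Pairs each name with its pace, groups the paces by name and
    // emits one "name,average" line per distinct name.
    public static List<string> AverageByName(
        IEnumerable<string> names, IEnumerable<int> paces)
    {
        return names
            .Zip(paces, (name, pace) => new { Name = name, Pace = pace })
            .GroupBy(r => r.Name, r => r.Pace)
            .Select(g => string.Format(
                CultureInfo.InvariantCulture,
                "{0},{1}",
                g.Key,
                // Math.Round rounds midpoints to the nearest even number by default.
                Math.Round(g.Average())))
            .ToList();
    }
}
```

For the names "Alex Pitt", "John Doe", "Alex Pitt" ("John Doe" is an invented entry) with paces 105, 90 and 71, this returns "Alex Pitt,88" and "John Doe,90". The resulting list can be passed straight to File.WriteAllLines.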
I would recommend merging the duplicates before you write everything to the CSV file.
Use:
// The list of distinct names
List<string> duplicateChecker = new List<string>();
// Distinct() removes repeated names, so each name appears only once. I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();
Now you can simply iterate through the new list and look up each name in NList. Use a foreach loop that finds the index of the name in NList. After that you can use the index to combine the matching integers from PList with a simple calculation.
// Something like this:
Make a foreach loop over every entry in your duplicateChecker =>
Use Distinct again on duplicateChecker to make sure you don't go through the same name twice =>
Get the value of the current string and search for it in NList =>
Get the index of the current element in NList and look up the same index in PList =>
Get the integer from PList and store it in an array =>
// Make sure your calculation runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is finished with a name, compute the average of the stored integers.
I hope you understand what I am trying to say. However, I would recommend using a unique identifier for your persons. Sooner or later two persons will appear with the same name (as in a huge company).
Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First())
.ToList();

TextBox display closest match string

How can I get the string from a list that best matches a base string, using the Levenshtein distance?
This is my code:
{
string basestring = "Coke 600ml";
List<string> liststr = new List<string>
{
"ccoca cola",
"cola",
"coca cola 1L",
"coca cola 600",
"Coke 600ml",
"coca cola 600ml",
};
Dictionary<string, int> resultset = new Dictionary<string, int>();
foreach(string test in liststr)
{
resultset.Add(test, Ldis.Compute(basestring, test));
}
int minimun = resultset.Min(c => c.Value);
var closest = resultset.Where(c => c.Value == minimun);
Textbox1.Text = closest.ToString();
}
In this example, when I run the code I get 0 changes for string number 5 in the list, so how can I display that string itself in the TextBox?
For example: "Coke 600ml". Right now my TextBox just shows:
System.Linq.Enumerable+WhereEnumerableIterator`1
[System.Collections.Generic.KeyValuePair`2[System.String,System.Int32]]
Thanks.
Try this
var closest = resultset.First(c => c.Value == minimun);
Your existing code is trying to display a whole sequence of items in the textbox. It looks like it should just grab the single item whose Value equals the minimum.
resultset.Where() returns a sequence of matches rather than a single item, so you should use
var closest = resultset.First(c => c.Value == minimun);
to select a single result.
Then the closest is a KeyValuePair<string, int>, so you should use
Textbox1.Text = closest.Key;
to get the string. (You added the string as the Key and the change count as the Value in resultset earlier.)
There is a good solution on CodeProject:
http://www.codeproject.com/Articles/36869/Fuzzy-Search
It can be very much simplified like so:
var res = liststr.Select(x => new {Str = x, Dist = Ldis.Compute(basestring, x)})
.OrderBy(x => x.Dist)
.Select(x => x.Str)
.ToArray();
This will order the list of strings from most similar to least similar.
To only get the most similar one, simply replace ToArray() with First().
Short explanation:
For every string in the list, it creates an anonymous type that contains the original string and its distance, computed using the Ldis class. Then it orders the collection by distance and maps back to the original string, discarding the "extra" information that was only calculated for the ordering.
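The Ldis class is not shown in the question, so the sketch below uses its own textbook dynamic-programming Levenshtein function as a stand-in for Ldis.Compute. ClosestMatch, Levenshtein and Closest are hypothetical names; only the OrderBy(...).First() step mirrors the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosestMatch
{
    // Stand-in for Ldis.Compute: classic edit distance where an insert,
    // a delete or a substitution each cost 1.
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Returns the candidate with the smallest distance to the target.
    public static string Closest(string target, IEnumerable<string> candidates)
    {
        return candidates.OrderBy(c => Levenshtein(target, c)).First();
    }
}
```

With the list from the question, Textbox1.Text = ClosestMatch.Closest(basestring, liststr); would show "Coke 600ml", since that entry has distance 0.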

Add keyvaluepair from file to Dictionary?

I am trying to import values from a .txt file into my dictionary. The .txt file is formatted like this:
Donald Duck, 2010-04-03
And so on... there is one entry like that on each line. My problem comes when I try to add the split strings to the dictionary.
I am trying it like this: scoreList.Add(values[0], values[1]); But it says that the name values doesn't exist in the current context. I hope someone can point me in the right direction...
Thank you!
private void Form1_Load(object sender, EventArgs e)
{
Dictionary<string, DateTime> scoreList = new Dictionary<string, DateTime>();
string path = @"list.txt";
var query = (from line in File.ReadAllLines(path)
let values = line.Split(',')
select new { Key = values[0], Value = values[1] });
foreach (KeyValuePair<string, DateTime> pair in scoreList)
{
scoreList.Add(values[0], values[1]);
}
textBox1.Text = scoreList.Keys.ToString();
}
Your values variable is only in scope within the LINQ query. You need to enumerate the query result and add the values to the dictionary:
foreach (var pair in query)
{
scoreList.Add(pair.Key, pair.Value);
}
That being said, LINQ features a ToDictionary extension method that can help you here. You could replace your loop with:
scoreList = query.ToDictionary(x => x.Key, x => x.Value);
Finally, for the types to be correct, you need to convert the Value to DateTime using, for instance, DateTime.Parse.
First, the loop is wrong: you should add each item from the query result, not values[0] and values[1], which only exist inside the LINQ query.
Dictionary<string, DateTime> scoreList = new Dictionary<string, DateTime>();
string path = @"list.txt";
var query = (from line in File.ReadAllLines(path)
let values = line.Split(',')
select new { Key = values[0], Value = values[1] });
foreach (var item in query) /*changed thing*/
{
scoreList.Add(item.Key, DateTime.Parse(item.Value)); /*changed thing*/
}
textBox1.Text = scoreList.Keys.ToString();
The immediate problem with the code is that values only exists in the query expression... your sequence has an element type which is an anonymous type with Key and Value properties.
The next problem is that you're then iterating over scoreList, which will be empty to start with... and there's also no indication of where you plan to convert from string to DateTime. Oh, and I'm not sure whether Dictionary<,>.Keys.ToString() will give you anything useful.
You can build the dictionary simply enough though:
var scoreList = File.ReadLines(path)
.Select(line => line.Split(','))
.ToDictionary(bits => bits[0], // name
bits => DateTime.ParseExact(bits[1].Trim(), // date; Trim removes the space after the comma
"yyyy-MM-dd",
CultureInfo.InvariantCulture));
Note the use of DateTime.ParseExact instead of just DateTime.Parse - if you know the format of the data, you should use that information.
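Here is the same idea as a self-contained sketch that parses lines already held in memory, so it can be tried without list.txt. ScoreListParser, Parse and DescribeKeys are hypothetical names. The sketch assumes one "name, yyyy-MM-dd" entry per line with no duplicate names, and skips blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ScoreListParser
{
    // Turns lines such as "Donald Duck, 2010-04-03" into a dictionary.
    // ToDictionary throws an ArgumentException if a name occurs twice.
    public static Dictionary<string, DateTime> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .ToDictionary(
                bits => bits[0].Trim(),
                bits => DateTime.ParseExact(
                    bits[1].Trim(),          // drop the space after the comma
                    "yyyy-MM-dd",
                    CultureInfo.InvariantCulture));
    }

    // Keys.ToString() only prints the type name; joining the keys
    // produces text that is usable in a TextBox.
    public static string DescribeKeys(Dictionary<string, DateTime> scores)
    {
        return string.Join(", ", scores.Keys);
    }
}
```

In the form this could be called as ScoreListParser.Parse(File.ReadLines(path)), with the result of DescribeKeys assigned to textBox1.Text.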

Getting a cell from DataTable.Row.ItemArray with Linq

I have the following ItemArray:
dt.Rows[0].ItemArray.. //{0,1,2,3,4,5}
The column headers are: item0, item1, item2, etc.
So far, I have been getting values from the ItemArray by index.
Is there any way to get a value from the ItemArray with a LINQ expression based on the column name?
Thanks
You can also use the column-name to get the field value:
int item1 = row.Field<int>("Item1");
DataRow.Item Property(String)
DataRow.Field Method: Provides strongly-typed access
You could also use LINQ-to-DataSet:
int[] allItems = (from row in dt.AsEnumerable()
select row.Field<int>("Item1")).ToArray();
or in method syntax:
int[] allItems = dt.AsEnumerable().Select(r => r.Field<int>("Item1")).ToArray();
If you use the Item indexer rather than ItemArray, you can access items by column name, regardless of whether you use LINQ or not.
dt.Rows[0]["Column Name"]
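A minimal sketch of name-based access, using only the indexer and System.Linq so it does not depend on the Field<T> extension (which on some targets needs a reference to System.Data.DataSetExtensions). ColumnAccess, BuildSample and the sample values are invented for illustration, and the columns are assumed to be of type int.

```csharp
using System;
using System.Data;
using System.Linq;

public static class ColumnAccess
{
    // Hypothetical sample table with int columns Item0..Item2.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Item0", typeof(int));
        dt.Columns.Add("Item1", typeof(int));
        dt.Columns.Add("Item2", typeof(int));
        dt.Rows.Add(0, 1, 2);
        dt.Rows.Add(10, 11, 12);
        return dt;
    }

    // Reads one cell of the first row by column name.
    public static int FirstRowValue(DataTable dt, string columnName)
    {
        return (int)dt.Rows[0][columnName];
    }

    // Reads the named column from every row.
    public static int[] ColumnValues(DataTable dt, string columnName)
    {
        return dt.Rows.Cast<DataRow>()
            .Select(row => (int)row[columnName])
            .ToArray();
    }
}
```

For the sample table, FirstRowValue(dt, "Item1") gives 1 and ColumnValues(dt, "Item1") gives the array 1, 11.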
Tim Schmelter's answer is probably what you are looking for. Just to add, you can also do it with the Convert class instead of DataRow.Field:
var q = (from row in dataTable.AsEnumerable() select Convert.ToInt16(row["COLUMN1"])).ToArray();
Here's what I came up with today while solving a similar problem. In my case:
(1) I needed to extract the values from columns named Item1, Item2, ... of bool type.
(2) I needed to extract the ordinal number of each ItemN that had a true value.
var itemValues = dataTable.Select().Select(
r => r.ItemArray.Where((c, i) =>
dataTable.Columns[i].ColumnName.StartsWith("Item") && c is bool)
.Select((v, i) => new { Index = i + 1, Value = v.ToString().ToBoolean() }))
.ToList();
if (itemValues.Any())
{
//int[] of indices for true values
var trueIndexArray = itemValues.First().Where(v => v.Value == true)
.Select(v => v.Index).ToArray();
}
I forgot an essential part: I have a .ToBoolean() helper extension method to parse the values:
public static bool ToBoolean(this string s)
{
if (bool.TryParse(s, out bool result))
{
return result;
}
return false;
}
