How can I get the string from a list that best matches a base string, using the Levenshtein distance?
This is my code:
{
string basestring = "Coke 600ml";
List<string> liststr = new List<string>
{
"ccoca cola",
"cola",
"coca cola 1L",
"coca cola 600",
"Coke 600ml",
"coca cola 600ml",
};
Dictionary<string, int> resultset = new Dictionary<string, int>();
foreach(string test in liststr)
{
resultset.Add(test, Ldis.Compute(basestring, test));
}
int minimun = resultset.Min(c => c.Value);
var closest = resultset.Where(c => c.Value == minimun);
Textbox1.Text = closest.ToString();
}
In this example, running the code gives a distance of 0 for string number 5 in the list. How can I display the string itself in the TextBox,
for example "Coke 600ml"? Right now my TextBox just shows:
System.Linq.Enumerable+WhereEnumerableIterator`1
[System.Collections.Generic.KeyValuePair`2[System.String,System.Int32]]
Thanks.
Try this:
var closest = resultset.First(c => c.Value == minimun);
Your existing code is trying to display a whole sequence of items in the textbox. It looks like it should just grab the single item whose Value equals the minimum.
resultset.Where() returns a sequence (an IEnumerable<KeyValuePair<string, int>>), not a single item, so you should use
var closest = resultset.First(c => c.Value == minimun);
to select a single result.
Then closest is a KeyValuePair<string, int>, so you should use
Textbox1.Text = closest.Key;
to get the string. (You added the string as the Key and the change count as the Value to resultset earlier.)
There is a good solution on CodeProject:
http://www.codeproject.com/Articles/36869/Fuzzy-Search
It can be very much simplified like so:
var res = liststr.Select(x => new {Str = x, Dist = Ldis.Compute(basestring, x)})
.OrderBy(x => x.Dist)
.Select(x => x.Str)
.ToArray();
This will order the list of strings from most similar to least similar.
To only get the most similar one, simply replace ToArray() with First().
Short explanation:
For every string in the list, it creates an anonymous type which contains the original string and its distance, computed using the Ldis class. Then it orders the collection by the distance and maps back to the original string, discarding the "extra" information that was only needed for the ordering.
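The snippets above rely on an Ldis.Compute method that the question never shows. Below is a minimal, self-contained sketch of the same idea. The Levenshtein class and its ClosestMatch helper are hypothetical stand-ins written for illustration, not the asker's actual Ldis implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Levenshtein
{
    // Hypothetical stand-in for Ldis.Compute: the classic
    // dynamic-programming edit distance (case-sensitive).
    public static int Compute(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i; // delete every char of s
        for (int j = 0; j <= t.Length; j++) d[0, j] = j; // insert every char of t

        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,      // deletion
                             d[i, j - 1] + 1),     // insertion
                    d[i - 1, j - 1] + cost);       // substitution or match
            }
        }
        return d[s.Length, t.Length];
    }

    // Returns the candidate with the smallest distance to baseString.
    // First() throws if candidates is empty.
    public static string ClosestMatch(string baseString, IEnumerable<string> candidates)
    {
        return candidates
            .OrderBy(c => Compute(baseString, c))
            .First();
    }
}
```

With the list from the question, Textbox1.Text = Levenshtein.ClosestMatch(basestring, liststr); would show "Coke 600ml", because that entry has distance 0. OrderBy is a stable sort, so when several candidates tie, the one that appears first in the list is returned.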
My program creates a .csv file with a person's name and an integer next to it.
Occasionally there are two entries of the same name in the file, but with a different time. I only want one instance of each person.
I would like to take the mean of the two numbers to produce just one row for the name, where the number will be the average of the two existing.
So here Alex Pitt has two numbers. How can I take the mean of 105 and 71 (in this case) to produce a row that just includes Alex Pitt, 88?
Here is how I am creating my CSV file if reference is required.
public void CreateCsvFile()
{
PaceCalculator ListGather = new PaceCalculator();
List<string> NList = ListGather.NameGain();
List<int> PList = ListGather.PaceGain();
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
using (var file = File.CreateText(filepath))
{
foreach (var arr in nAndPList)
{
if (arr == null || arr.Length == 0) continue;
file.Write(arr[0]);
for (int i = 1; i < arr.Length; i++)
{
file.Write(arr[i]);
}
file.WriteLine();
}
}
}
To start with, you can write your current CreateCsvFile much more simply like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => String.Format("{0},{1}", a, b));
File.WriteAllLines(filepath, records);
}
Now, it can easily be changed to work out the average pace if you have duplicate names, like this:
public void CreateCsvFile()
{
var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
var ListGather = new PaceCalculator();
var records =
from record in ListGather.NameGain()
.Zip(ListGather.PaceGain(),
(a, b) => new { Name = a, Pace = b })
group record.Pace by record.Name into grs
select String.Format("{0},{1}", grs.Key, grs.Average());
File.WriteAllLines(filepath, records);
}
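To see the grouping step in isolation, here is a small sketch that works on in-memory lists instead of PaceCalculator and the file system. The PaceAverager class and the "Sam Lee" sample row are made up for illustration. It also formats with the invariant culture, on the assumption that a decimal comma (as some cultures use) would not be wanted inside a comma-separated file.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PaceAverager
{
    // Pairs each name with its pace, groups the paces by name,
    // and emits one "name,average" line per distinct name.
    public static List<string> AverageByName(IEnumerable<string> names, IEnumerable<int> paces)
    {
        return names
            .Zip(paces, (name, pace) => new { Name = name, Pace = pace })
            .GroupBy(r => r.Name, r => r.Pace)
            .Select(g => string.Format(CultureInfo.InvariantCulture,
                                       "{0},{1}", g.Key, g.Average()))
            .ToList();
    }
}
```

For names { "Alex Pitt", "Sam Lee", "Alex Pitt" } and paces { 105, 90, 71 } this produces the two lines "Alex Pitt,88" and "Sam Lee,90". Average() returns a double, so an odd total such as 105 and 70 would give 87.5; round it first if the file must contain integers.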
I would recommend merging the duplicates before you write everything to the CSV file.
Use:
// The list of unique names
List<string> duplicateChecker = new List<string>();
// Distinct() removes repeated names, leaving one entry per person. I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();
Now you can simply iterate through the new list and look up each name in NList. Use a foreach loop that finds the indexes at which the name occurs in NList. After that you can use those indexes to merge the matching integers in PList with a simple math method.
//Something like this:
Make a foreach loop over every entry in duplicateChecker =>
Use Distinct again on duplicateChecker to make sure you won't go through the same name twice =>
Get the value of the current string and search for it in NList =>
Get the index of the current element in NList and look up the same index in PList =>
Get the integer from PList and store it in an array =>
// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is through with the first name, apply the math method.
I hope you understand what I was trying to say. However, I would recommend using a unique identifier for your persons. Sooner or later two people with the same name will appear (as in a huge company).
Change the code below:
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
To
List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b)
.GroupBy(x => x.[The field you want to group by])
.Select(y => y.First()) // keeps the first entry of each group; it does not average
.ToList();
I have a list which contains objects of type Field, i.e. List<Field>, and my Field class is defined as follows:
public class Field
{
public string FieldName { get; set; }
public string FieldValue { get; set; }
}
This list is then converted to a dictionary of type Dictionary<string, List<string>>
Dictionary<string, List<string>> myResult =
myFieldList.Select(m => m)
.Select((c, i) => new { Key = c.FieldName, value = c.FieldValue })
.GroupBy(o => o.Key, o => o.value)
.ToDictionary(grp => grp.Key, grp => grp.ToList());
I would like to use Linq to append the string values contained in the list as a single string, so technically the dictionary defined above should be defined as Dictionary<string, string> but I need a couple of extra steps when appending.
I need to add \r\n in front of each value being appended, and I need to make sure that a value, including its new line, does not get appended if it is empty, i.e.
value += (string.IsNullOrEmpty(newVal) ? string.Empty : "\r\n" + newVal);
Thanks.
T.
Maybe this is what you want:
var myResult = myFieldList.GroupBy(o => o.FieldName, o => o.FieldValue)
.ToDictionary(grp => grp.Key, grp => string.Join("\r\n",
grp.Where(x=>!string.IsNullOrEmpty(x))));
Replace grp.ToList() with the logic that takes a sequence of strings and puts it together in the way you want.
The key methods you need are .Where to ignore items (i.e. the nulls), and string.Join to concatenate the strings together with a custom joiner (i.e. newlines).
Incidentally, consider using Environment.NewLine instead of "\r\n" if the separator should follow the platform's own line-ending convention.
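As a rough, self-contained illustration of the Where plus string.Join approach, the sketch below repeats the Field class from the question and wraps the query in a hypothetical FieldJoiner.JoinValues helper. Note that string.Join puts the separator between values rather than in front of each one, so the result has no leading new line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Field
{
    public string FieldName { get; set; }
    public string FieldValue { get; set; }
}

public static class FieldJoiner
{
    // Groups values by field name, drops null/empty values,
    // and joins the rest with the platform's line separator.
    public static Dictionary<string, string> JoinValues(IEnumerable<Field> fields)
    {
        return fields
            .GroupBy(f => f.FieldName, f => f.FieldValue)
            .ToDictionary(
                g => g.Key,
                g => string.Join(Environment.NewLine,
                                 g.Where(v => !string.IsNullOrEmpty(v))));
    }
}
```

A field whose values are all empty still gets a key in the dictionary, mapped to an empty string, because string.Join over an empty sequence returns "".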
Instead of grp.ToList() in the element selector of ToDictionary, aggregate everything into a single string there (only do this if you have a reasonable number of strings; repeated concatenation over a very large group would hurt performance):
// Replace this
grp.ToList()
// With this
grp
.Where(s => !string.IsNullOrEmpty(s)) // drop the empty values
.DefaultIfEmpty(string.Empty) // Aggregate without a seed throws on an empty sequence
.Aggregate((a, b) => a + "\r\n" + b) // join all elements into a single string, with the separator between each
I am trying to import values from a .txt file into my dictionary. The .txt file is formatted like this:
Donald Duck, 2010-04-03
And so on... there is 1 entry like that on each line. My problem comes when I try to add the split strings into the dictionary.
I am trying it like this: scoreList.Add(values[0], values[1]); But it says that names doesn't exist in the context. I hope someone can point me in the correct direction about this...
Thank you!
private void Form1_Load(object sender, EventArgs e)
{
Dictionary<string, DateTime> scoreList = new Dictionary<string, DateTime>();
string path = @"list.txt";
var query = (from line in File.ReadAllLines(path)
let values = line.Split(',')
select new { Key = values[0], Value = values[1] });
foreach (KeyValuePair<string, DateTime> pair in scoreList)
{
scoreList.Add(values[0], values[1]);
}
textBox1.Text = scoreList.Keys.ToString();
}
Your values variable is only in scope within the LINQ query. You need to enumerate the query result, and add the values to the dictionary:
foreach (var pair in query)
{
scoreList.Add(pair.Key, pair.Value);
}
That being said, LINQ features a ToDictionary extension method that can help you here. You could replace your loop with:
scoreList = query.ToDictionary(x => x.Key, x => x.Value);
Finally, for the types to be correct, you need to convert the Value to a DateTime using, for instance, DateTime.Parse.
First, the loop is wrong: you should add each item from the query result, not values[0] and values[1], which only exist inside the LINQ query.
Dictionary<string, DateTime> scoreList = new Dictionary<string, DateTime>();
string path = @"list.txt";
var query = (from line in File.ReadAllLines(path)
let values = line.Split(',')
select new { Key = values[0], Value = values[1] });
foreach (var item in query) /*changed thing*/
{
scoreList.Add(item.Key, DateTime.Parse(item.Value)); /*changed thing*/
}
textBox1.Text = string.Join(", ", scoreList.Keys); /*Keys.ToString() would only print the type name*/
The immediate problem with the code is that values only exists in the query expression... your sequence has an element type which is an anonymous type with Key and Value properties.
The next problem is that you're then iterating over scoreList, which will be empty to start with... and there's also no indication of where you plan to convert from string to DateTime. Oh, and I'm not sure whether Dictionary<,>.Keys.ToString() will give you anything useful.
You can build the dictionary simply enough though:
var scoreList = File.ReadLines(path)
.Select(line => line.Split(','))
.ToDictionary(bits => bits[0].Trim(), // name
bits => DateTime.ParseExact(bits[1].Trim(), // date; Trim() drops the space after the comma
"yyyy-MM-dd",
CultureInfo.InvariantCulture));
Note the use of DateTime.ParseExact instead of just DateTime.Parse - if you know the format of the data, you should use that information.
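ToDictionary throws on a duplicate key, and ParseExact throws on a line that does not match the format. If the file may contain such lines, a more forgiving variant is possible. The sketch below is one way to do it, assuming that malformed lines should be skipped and that the last entry should win for a repeated name; ScoreParser is a hypothetical helper name, and it takes the lines as a parameter so it can be fed from File.ReadLines(path) or from a test.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ScoreParser
{
    // Parses lines of the form "Name, yyyy-MM-dd" into a dictionary,
    // skipping lines that do not have exactly two parts or a valid date.
    public static Dictionary<string, DateTime> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, DateTime>();
        foreach (var line in lines)
        {
            var bits = line.Split(',');
            if (bits.Length != 2) continue; // malformed line

            DateTime date;
            if (DateTime.TryParseExact(bits[1].Trim(), "yyyy-MM-dd",
                                       CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out date))
            {
                result[bits[0].Trim()] = date; // indexer overwrites duplicates
            }
        }
        return result;
    }
}
```

Usage would be var scoreList = ScoreParser.Parse(File.ReadLines(path)); which needs using System.IO; in the calling file.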
I have the following ItemArray:
dt.Rows[0].ItemArray.. //{0,1,2,3,4,5}
the headers are : item0,item1,item2 etc..
So far, to get a value from the ItemArray I used to call it by an index.
Is there any way to get the value within the ItemArray with a Linq expression based on the column name?
Thanks
You can also use the column-name to get the field value:
int item1 = row.Field<int>("Item1");
See also: the DataRow.Item property (String) and the DataRow.Field method, which provides strongly typed access.
You could also use LINQ-to-DataSet:
int[] allItems = (from row in dt.AsEnumerable()
select row.Field<int>("Item1")).ToArray();
or in method syntax:
int[] allItems = dt.AsEnumerable().Select(r => r.Field<int>("Item1")).ToArray();
If you use the Item indexer rather than ItemArray, you can access items by column name, regardless of whether you use LINQ or not.
dt.Rows[0]["Column Name"]
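For a quick way to try both styles of access, here is a small sketch that builds a two-column table in memory. DataTableDemo, BuildSample and ColumnValues are hypothetical names, and the sample values are invented. On .NET Framework, AsEnumerable and Field<T> need a reference to the System.Data.DataSetExtensions assembly.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableDemo
{
    // Builds a tiny table with two int columns and two rows.
    public static DataTable BuildSample()
    {
        var dt = new DataTable();
        dt.Columns.Add("Item0", typeof(int));
        dt.Columns.Add("Item1", typeof(int));
        dt.Rows.Add(0, 1);
        dt.Rows.Add(10, 11);
        return dt;
    }

    // Reads one column by name across all rows, strongly typed.
    public static int[] ColumnValues(DataTable dt, string columnName)
    {
        return dt.AsEnumerable()
                 .Select(row => row.Field<int>(columnName))
                 .ToArray();
    }
}
```

With this sample, (int)dt.Rows[0]["Item1"] is 1 and DataTableDemo.ColumnValues(dt, "Item1") is { 1, 11 }. The indexer returns object and needs a cast, whereas Field<int> does the conversion for you.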
Tim Schmelter's answer is probably what you are looking for. Just to add another way, using the Convert class instead of DataRow.Field:
var q = (from row in dataTable.AsEnumerable() select Convert.ToInt16(row["COLUMN1"])).ToArray();
Here's what I came up with today while solving a similar problem. In my case:
(1) I needed to extract the values from columns named Item1, Item2, ... of bool type.
(2) I needed to extract the ordinal number of each ItemN that had a true value.
var itemValues = dataTable.Select().Select(
r => r.ItemArray.Where((c, i) =>
dataTable.Columns[i].ColumnName.StartsWith("Item") && c is bool)
.Select((v, i) => new { Index = i + 1, Value = v.ToString().ToBoolean() }))
.ToList();
if (itemValues.Any())
{
//int[] of indices for true values
var trueIndexArray = itemValues.First().Where(v => v.Value == true)
.Select(v => v.Index).ToArray();
}
I forgot an essential part: I have a .ToBoolean() helper extension method to parse the string form of the values:
public static bool ToBoolean(this string s)
{
if (bool.TryParse(s, out bool result))
{
return result;
}
return false;
}
I have a string containing up to 9 unique numbers from 1 to 9 (myString) e.g. "12345"
I have a list of strings {"1"}, {"4"} (myList) .. and so on.
I would like to know how many instances in the string (myString) are contained within the list (myList), in the above example this would return 2.
so something like
count = myList.Count(myList.Contains(myString));
I could change myString to a list if required.
Thanks very much
Joe
I would try the following:
count = myList.Count(s => myString.Contains(s));
It is not perfectly clear what you need, but these are some options that could help:
myList.Where(s => s == myString).Count()
or
myList.Where(s => s.Contains(myString)).Count()
The first would return the number of strings in the list that are the same as yours; the second would return the number of strings that contain yours. If neither works, please make your question clearer.
If myList is just List<string>, then this should work:
int count = myList.Count(x => myString.Contains(x));
If myList is List<List<string>>:
int count = myList.SelectMany(x => x).Count(s => myString.Contains(s));
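The Contains-based answers can be wrapped in a small helper, sketched below under the assumption that myList is a flat list of strings. DigitCounter and CountContained are hypothetical names. The guard against empty entries is there because string.Contains returns true for an empty string, which would otherwise inflate the count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DigitCounter
{
    // Counts how many entries of myList occur somewhere in myString.
    // Null or empty entries are ignored.
    public static int CountContained(string myString, IEnumerable<string> myList)
    {
        return myList.Count(s => !string.IsNullOrEmpty(s) && myString.Contains(s));
    }
}
```

For the example in the question, DigitCounter.CountContained("12345", new[] { "1", "4" }) returns 2. Each list entry is counted at most once, however many times it appears in the string, and multi-character entries such as "12" are matched as substrings.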
Try
count = myList.Count(s => s==myString);
This is one approach, but it's limited to 1 character matches. For your described scenario of numbers from 1-9 this works fine. Notice the s[0] usage which refers to the list items as a character. For example, if you had "12" in your list, it wouldn't work correctly.
string input = "123456123";
var list = new List<string> { "1", "4" };
var query = list.Select(s => new
{
Value = s,
Count = input.Count(c => c == s[0])
});
foreach (var item in query)
{
Console.WriteLine("{0} occurred {1} time(s)", item.Value, item.Count);
}
For multiple character matches, which would correctly count the occurrences of "12", the Regex class comes in handy:
var query = list.Select(s => new
{
Value = s,
Count = Regex.Matches(input, Regex.Escape(s)).Count // Escape treats s as literal text, not as a pattern
});
Try:
var count = myList.Count(x => myString.ToCharArray().Contains(x[0]));
This will only work if each item in myList is a single digit.
Edit: as you probably noticed, this converts myString to a char array once per list item, so it would be better to have
var myStringArray = myString.ToCharArray();
var count = myList.Count(x => myStringArray.Contains(x[0]));