Finding differences in two lists - c#

I am looking for a good way to find the differences between two lists.
Here is the problem:
Both lists contain strings in which the first three '*'-delimited fields form a unique key, followed by the text (String="key1*key2*key3*text").
Here is an example string:
AA1*1D*4*The quick brown fox*****CC*3456321234543~
where "AA1*1D*4" is the unique key
List1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"
List2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"
I need to match the keys in both lists and compare them.
If all 3 key fields of an entry in one list match those of an entry in the other list, I need to store both strings together in a new list.
If a key appears in only one list, I need to store that entry and keep an empty entry for the other side (index4 in the example above).
Finally, the new list is returned.
This is what I have so far, but I am struggling:
List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values(keep differences in lists)
List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values (keep differences in lists)
List<String[]> blocksComparison = new List<String[]>(); //container for non-matching blocks, so we can output them later
//if both reports have same amount of blocks
if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
{
foreach (String S in Result)
{
String[] sArr = S.Split('*');
foreach (String B in Base)
{
String[] bArr = B.Split('*');
if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
{
String[] NA = new String[2]; //keep results
NA[0] = B; //[0] for base
NA[1] = S; //[1] for result
blocksComparison.Add(NA);
break;
}
}
}
}
Could you suggest a good algorithm for this process?
Thank you

You can use a HashSet.
Create a HashSet from List1. Remember that index1*index2*index3 is different from index3*index2*index1.
Now iterate through the second list and check each string against the set:
var hashSet = new HashSet<string>(list1);
var matches = new List<string>();
foreach (string s in list2)
{
if (hashSet.Contains(s))
matches.Add(s); // present in both lists
}

If I understand your question correctly, you'd like to compare the elements by their "key" prefix instead of by the whole string content. If so, implementing a custom equality comparer will allow you to easily leverage the LINQ set algorithms.
This program...
class EqCmp : IEqualityComparer<string> {
public bool Equals(string x, string y) {
return GetKey(x).SequenceEqual(GetKey(y));
}
public int GetHashCode(string obj) {
// Using Sum could cause OverflowException.
return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
}
static IEnumerable<string> GetKey(string line) {
// If we just split to 3 strings, the last one could exceed the key, so we split to 4.
// This is not the most efficient way, but is simple.
return line.Split(new[] { '*' }, 4).Take(3);
}
}
class Program {
static void Main(string[] args) {
var l1 = new List<string> {
"index1*index1*index1*some text",
"index1*index1*index2*some text ** test test test",
"index1*index2*index1*some text",
"index1*index2*index2*some text",
"index2*index1*index1*some text"
};
var l2 = new List<string> {
"index1*index1*index2*some text ** test test test",
"index2*index1*index1*some text",
"index2*index1*index2*some text"
};
var eq = new EqCmp();
Console.WriteLine("Elements that are both in l1 and l2:");
foreach (var line in l1.Intersect(l2, eq))
Console.WriteLine(line);
Console.WriteLine("\nElements that are in l1 but not in l2:");
foreach (var line in l1.Except(l2, eq))
Console.WriteLine(line);
// Etc...
}
}
...prints the following result:
Elements that are both in l1 and l2:
index1*index1*index2*some text ** test test test
index2*index1*index1*some text
Elements that are in l1 but not in l2:
index1*index1*index1*some text
index1*index2*index1*some text
index1*index2*index2*some text

var one = new List<string>();
var two = new List<string>();
var three = new List<string>(); // receives the entries present in both lists
var intersect = new Dictionary<string, int>();
foreach (string index in one)
{
// Count occurrences; TryGetValue leaves count at 0 for a new key.
int count;
intersect.TryGetValue(index, out count);
intersect[index] = count + 1;
}
foreach (string index in two)
{
if (intersect.ContainsKey(index))
{
three.Add(index);
}
}
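For the pairing the question actually asks for (both entries when the key matches, an empty entry when one side is missing), a dictionary keyed on the first three fields is one option. This is only a minimal sketch under some assumptions: KeyedDiff, GetKey and Pair are hypothetical names, keys are assumed to be unique within each list, and comparing keys with StringComparer.OrdinalIgnoreCase is an assumption rather than a stated requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedDiff
{
    // Hypothetical helper: the key is the first three '*'-delimited fields.
    public static string GetKey(string line)
    {
        // Splitting into at most 4 parts keeps any '*' inside the text out of the key.
        return string.Join("*", line.Split(new[] { '*' }, 4).Take(3));
    }

    // Returns one [base, result] pair per key; a missing side is an empty string.
    // Assumption: a key occurs at most once in each list.
    public static List<string[]> Pair(IEnumerable<string> baseList, IEnumerable<string> resultList)
    {
        var pairs = new List<string[]>();
        var positionByKey = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        foreach (string b in baseList)
        {
            positionByKey[GetKey(b)] = pairs.Count;
            pairs.Add(new[] { b, "" });
        }
        foreach (string r in resultList)
        {
            int position;
            if (positionByKey.TryGetValue(GetKey(r), out position))
                pairs[position][1] = r;      // key exists on both sides
            else
                pairs.Add(new[] { "", r });  // key exists only in the result list
        }
        return pairs;
    }
}
```

With List1 and List2 from the question, Pair(List1, List2) should give three pairs with both sides filled and one pair whose base side is empty (for index4*index2*index3). Both lists are walked once, so the nested loop from the question is not needed.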

Related

C# Splitting a list based on items

I have a list with the following items (let's call it 'main list'):
[1,2,2,3,1,2,3,4,4,1,2,2]
My goal is to split it into smaller lists that will contain '1' and the following numbers (until the next '1' occurs).
The end result should look like this:
[1,2,2,3]
[1,2,3,4,4]
[1,2,2]
Some additional info:
small lists can be of different lengths
the main list always starts with '1' (so you don't have to look for the beginning)
the elements of the main list and smaller lists are strings!
the number of smaller lists depends on the number of '1' in the main list
Basically I need to split a list of strings to smaller lists of strings.
The first idea that came to my mind was to create a ('final') list containing smaller lists, so I created a loop:
List<string> templist = new List<string>();
List<List<string>> final = new List<List<string>>();
foreach (string log in mainlist)
{
if (log != "1")
{
templist.Add(log);
}
else
{
final.Add(templist);
templist.Clear();
templist.Add(log);
}
}
final.Add(templist);
It seems to work, but I get a list with duplicates:
[[1,2,2],[1,2,2],[1,2,2]]
You can do this:
When the item is "1", start a new current list that contains this item, and add the current list to the final list in the same step.
Otherwise, keep adding the items to the current list.
List<string> currentList = null;
List<string> mainList = new List<string> { "1", "2", "2", "3", "1", "2", "3", "4", "4", "1", "2", "2" };
List<List<string>> finalList = new List<List<string>>();
foreach (string item in mainList)
{
if (item == "1")
{
currentList = new List<string>() { item };
finalList.Add(currentList);
}
else
{
currentList.Add(item);
}
}
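As for why the original loop produces duplicates: a List is a reference type, so adding the same list instance to the outer list several times and then calling Clear() on it changes every stored entry at once. The sketch below illustrates this with two hypothetical helpers (SharedReferenceDemo, SplitSharingOneList and SplitCopying are names made up for this example); the first reuses one list as in the question, the second stores a copy of each group.

```csharp
using System;
using System.Collections.Generic;

public static class SharedReferenceDemo
{
    // Reuses one list instance: every entry of the result is the same object.
    public static List<List<string>> SplitSharingOneList(IEnumerable<string> items)
    {
        var temp = new List<string>();
        var result = new List<List<string>>();
        foreach (string item in items)
        {
            if (item == "1" && temp.Count > 0)
            {
                result.Add(temp); // stores a reference, not a copy
                temp.Clear();     // also empties the list that was just stored
            }
            temp.Add(item);
        }
        result.Add(temp);
        return result;
    }

    // Same loop, but each finished group is copied before the buffer is reused.
    public static List<List<string>> SplitCopying(IEnumerable<string> items)
    {
        var temp = new List<string>();
        var result = new List<List<string>>();
        foreach (string item in items)
        {
            if (item == "1" && temp.Count > 0)
            {
                result.Add(new List<string>(temp)); // independent copy
                temp.Clear();
            }
            temp.Add(item);
        }
        result.Add(new List<string>(temp));
        return result;
    }
}
```

For the main list from the question, the first method returns three references to one list holding the last group, while the second returns the three expected groups.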
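The else branch can instead create a new list for each group (the final.Add call after the loop is then no longer needed):
else
{
templist = new List<string>(){log};
final.Add(templist);
}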
You can find the indexes of all occurrences of 1 and then use the Skip() and Take() functions with these indexes to separate the lists.
List<string> mainlist = new List<string>{"1","2","2","3","1","2","3","4","4","1","2","2"};
List<List<string>> final = new List<List<string>>();
var numberOfList = mainlist.Select((item, index) => new {val = item,ind = index})
.Where(item => item.val == "1").ToList();
for(int i = 0; i<numberOfList.Count;i++)
{
if(i == numberOfList.Count-1)
final.Add(mainlist.Skip(numberOfList[i].ind)
.Take(mainlist.Count - numberOfList[i].ind).ToList());
else
final.Add(mainlist.Skip(numberOfList[i].ind)
.Take(numberOfList[i + 1].ind - numberOfList[i].ind).ToList());
}
result:
[1,2,2,3]
[1,2,3,4,4]
[1,2,2]

Faster way to find first occurrence of String in list

I have a method that finds the first occurrences of words in a list.
wordSet is the set of words that I need to check.
The pwWords list is a representation of a text, so the words are in the order in which they appear in the text.
So if pwWords has elements such as {This,is,good,boy,and,this,girl,is,bad}
and wordSet has {this,is}, the method should add true only for the first two elements.
My question is: is there a faster way to do this?
If pwWords has over a million elements and wordSet over 10,000, it works pretty slowly.
public List<bool> getFirstOccurances(List<string> pwWords)
{
var firstOccurance = new List<bool>();
var wordSet = new List<String>(WordsWithFDictionary.Keys);
foreach (var pwWord in pwWords)
{
if (wordSet.Contains(pwWord))
{
firstOccurance.Add(true);
wordSet.Remove(pwWord);
}
else
{
firstOccurance.Add(false);
}
}
return firstOccurance;
}
Another approach is using HashSet for wordSet
public List<bool> getFirstOccurances(List<string> pwWords)
{
var wordSet = new HashSet<string>(WordsWithFDictionary.Keys);
return pwWords.Select(word => wordSet.Contains(word)).ToList();
}
HashSet.Contains is an O(1) operation, whereas List.Contains loops over the items until a match is found.
For better performance, create wordSet only once if possible.
public class FirstOccurances
{
private HashSet<string> _wordSet;
public FirstOccurances(IEnumerable<string> wordKeys)
{
_wordSet = new HashSet<string>(wordKeys);
}
public List<bool> GetFor(List<string> words)
{
return words.Select(word => _wordSet.Contains(word)).ToList();
}
}
Then use it
var occurrences = new FirstOccurances(WordsWithFDictionary.Keys);
// Now you can effectively search for occurrences multiple times
var result = occurrences.GetFor(pwWords);
var anotherResult = occurrences.GetFor(anotherPwWords);
Because each item of pwWords can be checked independently, you can also try Parallel LINQ if the order of the items is not important:
public List<bool> GetFor(List<string> words)
{
return words.AsParallel().Select(word => _wordSet.Contains(word)).ToList();
}
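Note that Contains alone reports every occurrence of a word, not only the first one as in the original method. One way to keep the "first occurrence only" behaviour while staying O(1) per word is HashSet.Remove, which returns true only when the item was still in the set. A minimal sketch (FirstOccurrenceSketch and Mark are hypothetical names, and the set is copied so the caller's collection is left untouched):

```csharp
using System.Collections.Generic;

public static class FirstOccurrenceSketch
{
    public static List<bool> Mark(IEnumerable<string> words, IEnumerable<string> wordSet)
    {
        // Work on a copy so the caller's collection is not modified.
        var remaining = new HashSet<string>(wordSet);
        var flags = new List<bool>();
        foreach (string word in words)
        {
            // Remove returns true only the first time a word from the set is seen.
            flags.Add(remaining.Remove(word));
        }
        return flags;
    }
}
```

The comparison here is case-sensitive, like the List-based original. If "This" and "this" should count as the same word, passing a comparer such as StringComparer.OrdinalIgnoreCase to the HashSet constructor would be one option.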

Using List<string>.Any() to find if a string contains an item as well as find the matching item?

I have a list of strings, which can be considered 'filters'.
For example:
List<string> filters = new List<string>();
filters.Add("Apple");
filters.Add("Orange");
filters.Add("Banana");
I have another list of strings, which contains sentences.
Example:
List<string> msgList = new List<string>();
msgList.Add("This sentence contains the word Apple.");
msgList.Add("This doesn't contain any fruits.");
msgList.Add("This does. It's a banana.");
Now I want to find out which items in msgList contain a fruit. For this, I use the following code:
foreach(string msg in msgList)
{
if(filters.Any(msg.Contains))
{
// Do something.
}
}
I'm wondering, is there a way in Linq where I can use something similar to List.Any() where I can check if msgList contains a fruit, and if it does, also get the fruit which matched the inquiry. If I can get the matching index in 'filters' that should be fine. That is, for the first iteration of the loop it should return 0 (index of 'Apple'), for the second iteration return null or something like a negative value, for the third iteration it should return 2 (index of 'Banana').
I checked around in SO as well as Google but couldn't find exactly what I'm looking for.
You want FirstOrDefault instead of Any.
FirstOrDefault will return the first object that matches, if found, or the default value (usually null) if not found.
You could use the List<T>.Find method:
foreach (string msg in msgList)
{
var fruit = filters.Find(msg.Contains);
if (fruit != null)
{
// Do something.
}
}
List<string> filters = new List<string>() { "Apple", "Orange", "Banana" };
string msg = "This sentence contains the word Apple.";
var fruit = Regex.Matches(msg, @"\w+", RegexOptions.IgnoreCase)
.Cast<Match>()
.Select(x=>x.Value)
.FirstOrDefault(s => filters.Contains(s));
A possible approach to return the indexes of the elements
foreach (string msg in msgList)
{
var found = filters.Select((x, i) => new {Key = x, Idx = i})
.FirstOrDefault(x => msg.Contains(x.Key));
Console.WriteLine(found?.Idx);
}
Note also that Contains is case-sensitive, so the banana string is not matched against the Banana one. If you want a case-insensitive match, you could use IndexOf with a StringComparison argument.
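A case-insensitive variant could look roughly like the following. It is only a sketch: FilterMatch and IndexOfFirstMatch are hypothetical names, and StringComparison.OrdinalIgnoreCase is one possible choice of comparison.

```csharp
using System;
using System.Collections.Generic;

public static class FilterMatch
{
    // Returns the index of the first filter found in msg (ignoring case), or -1 if none matches.
    public static int IndexOfFirstMatch(List<string> filters, string msg)
    {
        return filters.FindIndex(
            f => msg.IndexOf(f, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

For the three messages in the question this should return 0, -1 and 2, because "banana" now matches the "Banana" filter.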

Generate permutations using polymorphic method

The instructions :
Please write a piece of code that takes as an input a list in which
each element is another list containing an unknown type and which
returns a list of all possible lists that can be obtained by taking
one element from each of the input lists.
For example:
[[1, 2], [3, 4]], should return: [[1, 3], [1, 4], [2, 3], [2, 4]].
[['1'], ['2'], ['3', '4' ]], should return [['1', '2', '3'], ['1',
'2', '4']].
My code:
public static void Main(string[] args)
{
//Create a list of lists of objects.
var collections = new List<List<object>>();
collections.Add(new List<object> { 1, 5, 3 });
collections.Add(new List<object> { 7, 9 });
collections.Add(new List<object> { "a", "b" });
//Get all the possible permutations
var combinations = GetPermutations(collections);
//Loop through the results and display them in console
foreach (var result in combinations)
{
result.ForEach(item => Console.Write(item + " "));
Console.WriteLine();
}
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
private static List<List<object>> GetPermutations(List<List<object>> collections)
{
List<List<object>> permutations = new List<List<object>>();
//Check if the input list has any data, else return the empty list.
if (collections.Count <= 0)
return permutations;
//Add the values of the first set to the empty List<List<object>>
//permutations list
foreach (var value in collections[0])
permutations.Add(new List<object> { value });
/* Skip the first set of List<List<object>> collections as it was
* already added to the permutations list, and loop through the
* remaining sets. For each set, call the AppendNewValues function
* to append each value in the set to the permutations list.
* */
foreach (var set in collections.Skip(1))
permutations = AppendNewValues(permutations, set);
return permutations;
}
private static List<List<object>> AppendNewValues(List<List<object>> permutations, List<object> set)
{
//Loop through the values in the set and append them to each of the
//list of permutations calculated so far.
var newCombinations = from additional in set
from value in permutations
select new List<object>(value) { additional };
return newCombinations.ToList();
}
How could I make it work with a polymorphic method that returns a generic list?
Please write a piece of code that takes as an input a list in which each element is another list containing an unknown type and which returns a list of all possible lists that can be obtained by taking one element from each of the input lists.
I would have asked for clarification, something like "You mean a generic method then?"
In speaking of polymorphism, they likely meant being able to write just one method and call it for any arbitrary type, something like:
public static IList<IList<T>> GetPermutations<T>(IList<IList<T>> inputLists) {
if (inputLists.Count < 2) {
// special case.
}
return _permutationHelper(0, inputLists);
}
private static IList<IList<T>> _permutationHelper<T>(int i, IList<IList<T>> inputLists) {
IList<IList<T>> returnValue = new List<IList<T>>();
if (i == inputLists.Count) {
returnValue.Add(new List<T>());
} else {
foreach (var t in inputLists[i]) {
foreach (var list in _permutationHelper(i + 1, inputLists)) {
list.Insert(0, t); // prepend so the elements keep the input order (Add would reverse them)
returnValue.Add(list);
}
}
}
return returnValue;
}
It is true that your implementation would allow arbitrary types at run time, but it loses type safety. Given that it's an implementation in C#, type safety being a requirement is a safe guess - but it doesn't hurt to ask either.
Another thing of note - they could have just said they were looking for the Cartesian product of the given lists.
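Seen as a Cartesian product, the whole task can also be written as a fold over the input lists with LINQ. The following is a sketch of that idea, not necessarily what the interviewers had in mind; CartesianSketch and Product are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    public static List<List<T>> Product<T>(IEnumerable<IEnumerable<T>> lists)
    {
        // Start with a single empty combination and extend it once per input list.
        IEnumerable<IEnumerable<T>> seed = new[] { Enumerable.Empty<T>() };
        return lists
            .Aggregate(seed, (acc, list) =>
                from prefix in acc
                from item in list
                select prefix.Concat(new[] { item }))
            .Select(combo => combo.ToList())
            .ToList();
    }
}
```

For [[1, 2], [3, 4]] this yields [1, 3], [1, 4], [2, 3], [2, 4]. One design difference to be aware of: for an empty outer list this sketch returns a single empty combination rather than an empty result.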
All I can think of is that they did not want you to mix different types in the lists (as you implemented). The types of all lists would be the same, and they wanted you to write a generic method that handles the problem for different types of lists, resulting in something like this:
static void Main(string[] args)
{
var intCollections = new List<List<int>>();
intCollections.Add(new List<int> { 1, 5, 3 });
intCollections.Add(new List<int> { 7, 9 });
var stringCollections = new List<List<String>>();
stringCollections.Add(new List<String> { "a", "b" });
stringCollections.Add(new List<String> { "c","d", "e" });
stringCollections.Add(new List<String> { "g", "f" });
//here you would have the "polymorphism", the same signature for different Lists types
var intCombinations = GetPermutations(intCollections);
var stringCombinations = GetPermutations(stringCollections);
foreach (var result in intCombinations)
{
result.ForEach(item => Console.Write(item + " "));
Console.WriteLine();
}
Console.WriteLine();
foreach (var result in stringCombinations)
{
result.ForEach(item => Console.Write(item + " "));
Console.WriteLine();
}
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
//This would be your generic implementation, basically changing from object to T and adding <T> after method
private static List<List<T>> GetPermutations<T>(List<List<T>> collections)
{
List<List<T>> permutations = new List<List<T>>();
//Check if the input list has any data, else return the empty list.
if (collections.Count <= 0)
return permutations;
//Add the values of the first set to the empty List<List<T>>
//permutations list
foreach (var value in collections[0])
permutations.Add(new List<T> { value });
/* Skip the first set of List<List<T>> collections as it was
* already added to the permutations list, and loop through the
* remaining sets. For each set, call the AppendNewValues function
* to append each value in the set to the permutations list.
* */
foreach (var set in collections.Skip(1))
permutations = AppendNewValues(permutations, set);
return permutations;
}
private static List<List<T>> AppendNewValues<T>(List<List<T>> permutations, List<T> set)
{
//Loop through the values in the set and append them to each of the
//list of permutations calculated so far.
var newCombinations = from additional in set
from value in permutations
select new List<T>(value) { additional };
return newCombinations.ToList();
}
Compared to yours, this generic implementation has the advantage of type safety: it makes sure you will not mix different object types.

Find matching KVP from Dictionary<List<enum>,string> where search key is List<enum> and return reverse partial matches

I have a Dictionary where the key is a list of enum values, and the value is a simple string.
What I need to do is find the matching KVP using another list of enum values.
The curveball and reason for posting here is I also need it to return KVP if the list from my test or search list contains all the items (or enum objects) in any key in the dictionary.
example excerpt of code:
public enum fruit { apple, orange, banana, grapes }
public class MyClass
{
public Dictionary<List<fruit>, string> FruitBaskets = new Dictionary<List<fruit>, string>();
public List<fruit> SearchList = new List<fruit> { fruit.orange, fruit.apple, fruit.grapes };
public MyClass()
{
FruitBaskets.Add(new List<fruit> { fruit.apple, fruit.orange }, "Basket 1");
}
}
I need to search the dictionary for SearchList and return "Basket 1".
Note that the matching may be the reverse of what you would expect for such an example: I need the key to match against the search list and not vice versa, so extra items in the search list that are not in the key are OK.
I know I could simply iterate the dict and check one by one but I also need this to be as fast as possible as it resides in a loop that is running fairly fast.
What I am currently using is;
public Dictionary<List<fruit>, string> SearchResults = new Dictionary<List<fruit>, string>();
foreach (KeyValuePair<List<fruit>, string> FruitBasket in FruitBaskets)
{
if (!FruitBasket.Key.Except(SearchList).Any())
SearchResults.Add(FruitBasket.Key, FruitBasket.Value);
}
Wondering if there is a better/faster way.
You need to rethink your choice of keys in the dictionary. There are some major problems with List keys, such as:
You can't use O(1) key lookup with a List (by default a List key is compared by reference, not by content)
Your keys aren't immutable
You can have identical lists as keys without receiving errors. For example, you can have:
var a = new[] { fruit.orange }.ToList();
var b = new[] { fruit.orange }.ToList();
fruitBasket.Add(a, "1");
fruitBasket.Add(b, "2");
But is this dictionary valid? I guess not but it depends on your requirements.
You can change Dictionary keys!
For these reasons, you need to change your dictionary key type. Instead of a List, you can use combined enum values with bitwise operators. For this to work, you need to assign a distinct power of 2 to each enum value:
[Flags]
public enum Fruit
{
Orange = 1,
Apple = 2,
Banana = 4,
Grape = 8
}
You have to combine these enum values to get the desired multi-value enum dictionary key effect:
For [Fruit.Orange, Fruit.Apple] you use Fruit.Orange | Fruit.Apple.
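Here's sample code for combining and decomposing values (it uses the question's lowercase fruit enum, which is assumed to carry power-of-2 values like the ones shown above):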
private static fruit GetKey(IEnumerable<fruit> fruits)
{
return fruits.Aggregate((x, y) => x | y);
}
private static IEnumerable<fruit> GetFruits(fruit combo)
{
return Enum.GetValues(typeof(fruit)).Cast<int>().Where(x => ((int)combo & x) > 0).Cast<fruit>();
}
Now you need a function to get all combinations (the power set) of the SearchList:
private static IEnumerable<fruit> GetCombinations(IEnumerable<fruit> fruits)
{
return Enumerable.Range(0, 1 << fruits.Count())
.Select(mask => fruits.Where((x, i) => (mask & (1 << i)) > 0))
.Where(x=>x.Any())
.Select(x=> GetKey(x));
}
Using these combinations, each dictionary lookup takes O(1) time, although a search list with n items produces 2^n - 1 combinations to look up.
var fruitBaskets = new Dictionary<fruit, string>();
fruitBaskets.Add(GetKey(new List<fruit> { fruit.apple, fruit.orange }), "Basket 1");
List<fruit> SearchList = new List<fruit> { fruit.orange, fruit.apple, fruit.grapes };
foreach (var f in GetCombinations(SearchList))
{
if (fruitBaskets.ContainsKey(f))
Console.WriteLine(fruitBaskets[f]);
}
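With flag keys there is also an alternative to generating the power set: scan the baskets once and test each key with a single bitwise AND, since a key is a subset of the search mask exactly when (key & search) == key. This is only a sketch with hypothetical names (FruitFlags, BasketSearch, Combine, FindSubsets), and it trades the O(1) lookups for a linear pass over the dictionary, which tends to pay off when the search list is long and the number of baskets is moderate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

[Flags]
public enum FruitFlags // hypothetical flags enum for this sketch
{
    None = 0,
    Orange = 1,
    Apple = 2,
    Banana = 4,
    Grapes = 8
}

public static class BasketSearch
{
    // Combines individual fruits into a single bitmask.
    public static FruitFlags Combine(IEnumerable<FruitFlags> fruits)
    {
        return fruits.Aggregate(FruitFlags.None, (acc, f) => acc | f);
    }

    // Returns the baskets whose key contains no fruit outside the search mask.
    public static List<string> FindSubsets(Dictionary<FruitFlags, string> baskets, FruitFlags search)
    {
        return baskets
            .Where(kvp => (kvp.Key & search) == kvp.Key)
            .Select(kvp => kvp.Value)
            .ToList();
    }
}
```

For a dictionary that maps Apple | Orange to "Basket 1", searching with Orange | Apple | Grapes should return "Basket 1".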
Consider storing your data in a different way:
var FruitBaskets = new Dictionary<fruit, List<string>>();
Each entry contains elements that match at least one fruit. Conversion from your structure is as follows:
foreach (var kvp in WobblesFruitBaskets)
{
foreach (var f in kvp.Key)
{
List<string> value;
if (!FruitBaskets.TryGetValue(f, out value))
{
value = new List<string>();
FruitBaskets.Add(f, value);
}
value.Add(kvp.Value);
}
}
Now, the search would look like this: For a composed key searchList you first calculate results for single keys:
var partialResults = new Dictionary<fruit, List<string>>();
foreach (var key in searchList)
{
List<string> r;
if (FruitBaskets.TryGetValue(key, out r))
{
partialResults.Add(key, r);
}
}
Now, what is left is to compose all possible search results. This is the hardest part, which I believe is inherent to your approach: for a key with n elements you have 2^n - 1 possible subkeys. You can use one of subset generating approaches from answers to this question and generate your final result:
var finalResults = new Dictionary<List<fruit>, List<string>>();
foreach (var subkey in GetAllSubsetsOf(searchList))
{
if (!subkey.Any())
{
continue; //I assume you don't want results for an empty key (hence "-1" above)
}
var conjunction = new HashSet<string>(partialResults[subkey.First()]);
foreach (var e in subkey.Skip(1))
{
conjunction.IntersectWith(partialResults[e]);
}
finalResults.Add(subkey, conjunction.ToList());
}
I've changed string to List<string> in result's value part. If there is some invariant in your approach that guarantees there will be always only one result, then it should be easy to fix that.
If you create a Dictionary whose key is a reference type, only the reference (not the value) is stored, so you can't simply use FruitBaskets[XXX] (unless you use the same key instance that was used when the entry was added); you must iterate over all the keys in your dictionary.
I think this function is simple and sufficient for you:
bool Contain(List<fruit> KEY)
{
foreach (var item in FruitBaskets.Keys)
{
if (Enumerable.SequenceEqual<fruit>(KEY,item))
return true;
}
return false;
}
and this,
bool B = Contain(new List<fruit> { fruit.apple, fruit.orange }); //this is True
But if you want to ignore the order of the members, you can use this function:
bool Contain(List<fruit> KEY)
{
HashSet<fruit> Hkey = new HashSet<fruit>(KEY); // build the set once, outside the loop
foreach (var item in FruitBaskets.Keys)
{
if (Hkey.SetEquals(item))
return true;
}
return false;
}
and here's the output:
bool B1 = Contain(new List<fruit> { fruit.orange, fruit.grapes }); // = False
bool B2 = Contain(new List<fruit> { fruit.orange, fruit.apple }); // = True
bool B3 = Contain(new List<fruit> { fruit.apple, fruit.orange }); // = True
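The functions above test for equal keys. For the partial match described in the question (every fruit of a key must appear in the search list, extra search items are fine), the same loop can check a subset relation instead. A rough sketch, keeping the List keys as they are; Fruit, SubsetSearch and Find are hypothetical names used only here:

```csharp
using System.Collections.Generic;
using System.Linq;

public enum Fruit { Apple, Orange, Banana, Grapes } // stand-in for the question's enum

public static class SubsetSearch
{
    public static List<string> Find(Dictionary<List<Fruit>, string> baskets, IEnumerable<Fruit> searchList)
    {
        // Build the lookup set once so each membership test is O(1).
        var search = new HashSet<Fruit>(searchList);
        var matches = new List<string>();
        foreach (var basket in baskets)
        {
            // The key matches when all of its fruits are in the search set.
            if (basket.Key.All(search.Contains))
                matches.Add(basket.Value);
        }
        return matches;
    }
}
```

This still visits every key, but it avoids rebuilding a set per key the way Except does.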
