How to remove duplicate record using linq?

I have a list of lists and want to remove duplicates from it.
The data is stored as an IEnumerable<IEnumerable<string>> named tableData.
If we think of it as a table,
the parent list holds the rows and each child list holds the column values of one row.
Now I want to delete all duplicate rows. In the table below, the value A is duplicated.
List<List<string>> ls = new List<List<string>>();
ls.Add(new List<string>() { "1", "A" });
ls.Add(new List<string>() { "2", "B" });
ls.Add(new List<string>() { "3", "C" });
ls.Add(new List<string>() { "4", "A" });
ls.Add(new List<string>() { "5", "A" });
ls.Add(new List<string>() { "6", "D" });
IEnumerable<IEnumerable<string>> tableData = ls;
var abc = tableData.SelectMany(p => p).Distinct(); // does not work
After the operation, abc should have exactly the same format as tableData and contain:
ls.Add(new List<string>() { "1", "A" });
ls.Add(new List<string>() { "2", "B" });
ls.Add(new List<string>() { "3", "C" });
ls.Add(new List<string>() { "6", "D" });

tableData.GroupBy(q => q.Skip(1).First()).Select(q => q.First())
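The one-liner above groups rows by their second column and keeps the first row of each group. A minimal sketch of the same idea as a reusable helper follows; the names RowDeduplicator and DistinctByColumn are made up for illustration, and skipping rows that are too short is an assumption, not something the question asked for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowDeduplicator
{
    // Keeps the first row seen for each distinct value in the given column.
    // Assumption: rows too short to contain that column are skipped.
    public static List<List<string>> DistinctByColumn(
        IEnumerable<IEnumerable<string>> rows, int columnIndex)
    {
        return rows
            .Select(row => row.ToList())              // materialize each row once
            .Where(row => row.Count > columnIndex)    // guard against short rows
            .GroupBy(row => row[columnIndex])         // one group per column value
            .Select(group => group.First())           // first occurrence wins
            .ToList();
    }
}
```

Calling RowDeduplicator.DistinctByColumn(tableData, 1) on the sample data should keep rows 1, 2, 3 and 6, because GroupBy yields groups in the order their keys first appear.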

You can use the overload of Distinct that takes an IEqualityComparer, assuming you actually have an IEnumerable<IEnumerable<PropertyData>>.
For example:
var items = tableData.SelectMany(x => x).Distinct(new TableComparer());
And the comparer:
public class TableComparer : IEqualityComparer<PropertyData>
{
public bool Equals(PropertyData x, PropertyData y)
{
return x.id == y.id;
}
public int GetHashCode(PropertyData pData)
{
return pData.id.GetHashCode();
}
}
If it's just an IEnumerable<IEnumerable<string>>, you can use Distinct() without the overload:
var items = tableData.SelectMany(x => x).Distinct();
Though your question lacks clarity.

var distinctValues = tableData.SelectMany(x => x).Distinct();
This will flatten your list of lists and select the distinct set of strings.

You can use the code below:
List<List<string>> ls=new List<List<string>>();
ls.Add(new List<string>(){"Hello"});
ls.Add(new List<string>(){"Hello"});
ls.Add(new List<string>() { "He" });
IEnumerable<IEnumerable<string>> tableData = ls;
var abc = tableData.SelectMany(p => p).Distinct();
The output is:
Hello
He

Related

linq query in list<list<T>>

I want to extract only the elements at a specific index from each inner list of a List<List<string>>.
I tried to do this with a LINQ query rather than for or foreach, but had no luck.
I want a result like the one below:
List<List<string>> list = new List<List<string>>();
list.Add(new List<string> { "1", "2", "3" });
list.Add(new List<string> { "A", "B", "C" });
// expected result list for index 1:
// { "2", "B" }

Use Select.
int index = 1; // the position to extract from each inner list
var result = list.Select(l => l[index]).ToList();
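Indexing with l[index] throws if any inner list is shorter than the requested index. A hedged sketch of a variant that skips such lists is shown here; ColumnExtractor and Column are hypothetical names, and silently skipping short lists is an assumption that may not suit every caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnExtractor
{
    // Returns the element at 'index' from every inner list that has one.
    // Assumption: inner lists without that index are skipped, not reported.
    public static List<string> Column(IEnumerable<List<string>> rows, int index)
    {
        return rows
            .Where(row => index >= 0 && index < row.Count)
            .Select(row => row[index])
            .ToList();
    }
}
```

With the lists from the question, ColumnExtractor.Column(list, 1) should return "2" and "B".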

Set subtraction while keeping duplicates

I need to get the set subtraction of two string arrays while considering duplicates.
Ex:
var a = new string[] {"1", "2", "2", "3", "4", "4"};
var b = new string[] {"2", "3"};
(a - b) => expected output => string[] {"1", "2", "4", "4"}
I already tried Enumerable.Except() which returns the unique values after subtract: { "1", "4" } which is not what I'm looking for.
Is there a straightforward way of achieving this without a custom implementation?

You can try GroupBy and work with the groups, e.g.:
var a = new string[] {"1", "2", "2", "3", "4", "4"};
var b = new string[] {"2", "3"};
...
var subtract = b
.GroupBy(item => item)
.ToDictionary(chunk => chunk.Key, chunk => chunk.Count());
var result = a
.GroupBy(item => item)
.Select(chunk => new {
value = chunk.Key,
count = chunk.Count() - (subtract.TryGetValue(chunk.Key, out var v) ? v : 0)
})
.Where(item => item.count > 0)
.SelectMany(item => Enumerable.Repeat(item.value, item.count));
// Let's have a look at the result
Console.Write(string.Join(", ", result));
Outcome:
1, 2, 4, 4

By leveraging the underused Enumerable.ToLookup (which creates a dictionary-like structure with multiple values per key) you can do this quite efficiently. Because indexing an ILookup with a non-existent key returns an empty sequence (rather than null or an error), you can avoid a whole bunch of null checks and TryGet... boilerplate. Because Enumerable.Take with a negative value is equivalent to Enumerable.Take(0), we don't have to check our arithmetic either.
var aLookup = a.ToLookup(x => x);
var bLookup = b.ToLookup(x => x);
var filtered = aLookup
.SelectMany(aItem => aItem.Take(aItem.Count() - bLookup[aItem.Key].Count()));
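For completeness, the lookup-based approach can be wrapped into a self-contained helper. This is only a sketch: MultisetOps and SubtractKeepingDuplicates are illustrative names that do not appear in the answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultisetOps
{
    // Removes one occurrence from 'a' for every occurrence of the same value in 'b'.
    public static List<T> SubtractKeepingDuplicates<T>(IEnumerable<T> a, IEnumerable<T> b)
    {
        var bLookup = b.ToLookup(x => x);
        return a.ToLookup(x => x)
            // A missing key in bLookup yields an empty sequence (count 0);
            // a negative Take count yields nothing.
            .SelectMany(g => g.Take(g.Count() - bLookup[g.Key].Count()))
            .ToList();
    }
}
```

Note that the output is grouped by value in order of first appearance, so it matches the original order of a only when equal items are already adjacent, as they are in the question's example.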

Try the following:
var a = new string[] { "1", "2", "2", "3", "4", "4" }.ToList();
var b = new string[] { "2", "3" };
foreach (var element in b)
{
a.Remove(element);
}
Has been tested.

Insert Values from string[] arrays to an ArrayList of records

I add the values from the string[] arrays to the ArrayList, and I want to access those string values from the ArrayList afterwards.
I tried it this way:
private void Form1_Load()
{
fr = new string[5] { "1", "2", "3", "4", "5" };
bd = new string[5] {"a", "b","c", "d", "e"};
m = new ArrayList();
dosomething();
}
private void dosomething()
{
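for (int i = 0; i < 5; i++)
{
// Allocate a new array per iteration; reusing a single array would make
// every entry in m refer to the same (last) record.
string[] record = new string[3];
record[0] = "1";
record[1] = fr[i];
record[2] = bd[i];
m.Add(record);
}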
}
I don't want to use the for loop. Is there any other way to do this?

I recommend using dictionaries. In my opinion they are the quickest way to store and access data. Also, an ArrayList stores its elements as object, so values have to be cast back at runtime, which costs time.
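The dictionary suggestion comes without code, so here is one possible reading of it, assuming the values in fr are meant to be unique keys for the values in bd. RecordStore and Build are hypothetical names, and the constant "1" from the original records is dropped in this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RecordStore
{
    // Pairs keys[i] with values[i] without an explicit loop.
    // Assumption: keys are unique; ToDictionary throws on a duplicate key.
    public static Dictionary<string, string> Build(string[] keys, string[] values)
    {
        return keys
            .Zip(values, (k, v) => new { Key = k, Value = v })
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

After var records = RecordStore.Build(fr, bd), a lookup such as records["3"] returns "c" with no cast needed.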

You may want to use:
fr = new string[5] { "1", "2", "3", "4", "5" };
bd = new string[5] { "a", "b", "c", "d", "e" };
m = new ArrayList();
fr.ToList().ForEach(_item => m.Add(new String[]{ "1", _item,bd[fr.ToList().IndexOf(_item)]}));
But I would prefer a solution like the one Fares already recommended: use a Dictionary.
Dictionary - MSDN

Not sure why you need an ArrayList. A generic list might be more suitable for you.
var fr = new string[5] { "1", "2", "3", "4", "5" };
var bd = new string[5] {"a", "b","c", "d", "e"};
int i = 1;
var results = fr.Zip<string, string, string[]>(bd, (a,b) => {
var v = new string [3] { i.ToString(), a,b };
i++;
return v;
}).ToList();

Add Range of items to list without duplication

I have a List of List of Strings, and I need to use the AddRange() Function to add set of items to it, but never duplicate items.
I used the following code :
List<List<string>> eList = new List<List<string>>();
List<List<string>> mergedList = new List<List<string>>();
//
// some code here
//
mergedList.AddRange(eList.Where(x => !mergedList.Contains(x)).ToList());
However it does not work.
All Duplicated items are added, so how could I solve that?

A)
If by duplicate you mean that both lists contain the same elements in the same order, then
List<List<string>> eList = new List<List<string>>();
eList.Add(new List<string>() { "a", "b" });
eList.Add(new List<string>() { "a", "c" });
eList.Add(new List<string>() { "a", "b" });
var mergedList = eList.Distinct(new ListComparer()).ToList();
public class ListComparer : IEqualityComparer<List<string>>
{
public bool Equals(List<string> x, List<string> y)
{
return x.SequenceEqual(y);
}
public int GetHashCode(List<string> obj)
{
return obj.Take(5).Aggregate(23, (sum, s) => sum ^ s.GetHashCode());
}
}
B)
If the order of elements in the list is not important, then
List<List<string>> eList = new List<List<string>>();
eList.Add(new List<string>() { "a", "b" }); // <-- same set as the third list
eList.Add(new List<string>() { "a", "c" });
eList.Add(new List<string>() { "b", "a" }); // <-- same set as the first list
var mergedList = eList.Select(x => new HashSet<string>(x))
.Distinct(HashSet<string>.CreateSetComparer()).ToList();
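The original AddRange call fails because List<T>.Contains uses the default equality for List<string>, which is reference equality, so two separate lists with the same contents never match. One way to merge without duplicates, sketched under the "same elements in the same order" interpretation from A), is Enumerable.Union with a comparer. SequenceComparer and ListMerger are illustrative names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class SequenceComparer : IEqualityComparer<List<string>>
{
    public bool Equals(List<string> x, List<string> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<string> obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Order-sensitive hash so it agrees with SequenceEqual.
            int hash = 17;
            foreach (var s in obj)
                hash = hash * 31 + (s == null ? 0 : s.GetHashCode());
            return hash;
        }
    }
}

public static class ListMerger
{
    // Returns the items of mergedList followed by the items of eList
    // that are not already present, comparing lists by content.
    public static List<List<string>> MergeDistinct(
        List<List<string>> mergedList, List<List<string>> eList)
    {
        return mergedList.Union(eList, new SequenceComparer()).ToList();
    }
}
```

Unlike AddRange, this returns a new list, so the caller would write mergedList = ListMerger.MergeDistinct(mergedList, eList). Union also drops duplicates that occur within eList itself.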

Try the following LINQ query:
// Add only the lists from eList that have no content-equal match in mergedList.
mergedList.AddRange(eList.Where(x => !mergedList.Any(y => y.SequenceEqual(x))).ToList());

Zip N IEnumerable<T>s together? Iterate over them simultaneously?

I have:-
IEnumerable<IEnumerable<T>> items;
and I'd like to create:-
IEnumerable<IEnumerable<T>> results;
where the first item in "results" is an IEnumerable of the first item of each of the IEnumerables of "items", the second item in "results" is an IEnumerable of the second item of each of "items", etc.
The IEnumerables aren't necessarily the same lengths. If some of the IEnumerables in items don't have an element at a particular index, then I'd expect the matching IEnumerable in results to have fewer items in it.
For example:-
items = { "1", "2", "3", "4" } , { "a", "b", "c" };
results = { "1", "a" } , { "2", "b" }, { "3", "c" }, { "4" };
Edit: Another example (requested in comments):-
items = { "1", "2", "3", "4" } , { "a", "b", "c" }, { "p", "q", "r", "s", "t" };
results = { "1", "a", "p" } , { "2", "b", "q" }, { "3", "c", "r" }, { "4", "s" }, { "t" };
I don't know in advance how many sequences there are, nor how many elements are in each sequence. I might have 1,000 sequences with 1,000,000 elements in each, and I might only need the first ~10, so I'd like to use the (lazy) enumeration of the source sequences if I can. In particular I don't want to create a new data structure if I can help it.
Is there a built-in method (similar to IEnumerable.Zip) that can do this?
Is there another way?

Now lightly tested and with working disposal.
public static class Extensions
{
public static IEnumerable<IEnumerable<T>> JaggedPivot<T>(
this IEnumerable<IEnumerable<T>> source)
{
List<IEnumerator<T>> originalEnumerators = source
.Select(x => x.GetEnumerator())
.ToList();
try
{
List<IEnumerator<T>> enumerators = originalEnumerators
.Where(x => x.MoveNext()).ToList();
while (enumerators.Any())
{
List<T> result = enumerators.Select(x => x.Current).ToList();
yield return result;
enumerators = enumerators.Where(x => x.MoveNext()).ToList();
}
}
finally
{
originalEnumerators.ForEach(x => x.Dispose());
}
}
}
public class TestExtensions
{
public void Test1()
{
IEnumerable<IEnumerable<int>> myInts = new List<IEnumerable<int>>()
{
Enumerable.Range(1, 20).ToList(),
Enumerable.Range(21, 5).ToList(),
Enumerable.Range(26, 15).ToList()
};
foreach(IEnumerable<int> x in myInts.JaggedPivot().Take(10))
{
foreach(int i in x)
{
Console.Write("{0} ", i);
}
Console.WriteLine();
}
}
}

It's reasonably straightforward to do if you can guarantee how the results are going to be used. However, if the results might be used in an arbitrary order, you may need to buffer everything. Consider this:
var results = MethodToBeImplemented(sequences);
var iterator = results.GetEnumerator();
iterator.MoveNext();
var first = iterator.Current;
iterator.MoveNext();
var second = iterator.Current;
foreach (var x in second)
{
// Do something
}
foreach (var x in first)
{
// Do something
}
In order to get at the items in "second" you'll have to iterate over all of the subsequences, past the first items. If you then want it to be valid to iterate over the items in first you either need to remember the items or be prepared to re-evaluate the subsequences.
Likewise you'll either need to buffer the subsequences as IEnumerable<T> values or reread the whole lot each time.
Basically it's a whole can of worms which is difficult to do elegantly in a way which will work pleasantly for all situations :( If you have a specific situation in mind with appropriate constraints, we may be able to help more.

Based on David B's answer, this code should perform better:
public static IEnumerable<IEnumerable<T>> JaggedPivot<T>(
this IEnumerable<IEnumerable<T>> source)
{
var originalEnumerators = source.Select(x => x.GetEnumerator()).ToList();
try
{
var enumerators =
new List<IEnumerator<T>>(originalEnumerators.Where(x => x.MoveNext()));
while (enumerators.Any())
{
yield return enumerators.Select(x => x.Current).ToList();
enumerators.RemoveAll(x => !x.MoveNext());
}
}
finally
{
originalEnumerators.ForEach(x => x.Dispose());
}
}
The difference is that the enumerators variable isn't re-created all the time.

Here's one that is a bit shorter, but no doubt less efficient:
Enumerable.Range(0,items.Select(x => x.Count()).Max())
.Select(x => items.SelectMany(y => y.Skip(x).Take(1)));

What about this?
List<string[]> items = new List<string[]>()
{
new string[] { "a", "b", "c" },
new string[] { "1", "2", "3" },
new string[] { "x", "y" },
new string[] { "y", "z", "w" }
};
var x = from i in Enumerable.Range(0, items.Max(a => a.Length))
select from z in items
where z.Length > i
select z[i];

You could compose existing operators like this:
IEnumerable<IEnumerable<int>> myInts = new List<IEnumerable<int>>()
{
Enumerable.Range(1, 20).ToList(),
Enumerable.Range(21, 5).ToList(),
Enumerable.Range(26, 15).ToList()
};
myInts.SelectMany(item => item.Select((number, index) => Tuple.Create(index, number)))
.GroupBy(item => item.Item1)
.Select(group => group.Select(tuple => tuple.Item2));
