Removing duplicates from a sorted list c#


I have a list of details about a large number of files. This list contains the file ID, last modified date and the file path. The problem is there are duplicates of the files which are older versions and sometimes have different file paths. I want to only store the newest version of a file regardless of file path. So I created a loop that iterates through the ordered list, checks to see if the ID is unique and if it is, it gets stored in a new unique list.
var ordered = list.OrderBy(x => x.ID).ThenByDescending(x => x.LastModifiedDate);
List<Item> unique = new List<Item>();
string curAssetId = null;
foreach (Item result in ordered)
{
    if (!result.ID.Equals(curAssetId))
    {
        unique.Add(result);
        curAssetId = result.ID;
    }
}
However this is still allowing duplicates into the DB and I can't figure out why this code isn't working as expected. By duplicates I mean, the files have the same ID but different file paths, which like I said before shouldn't be an issue. I just want the latest version regardless of pathway. Can anyone else see what the issue is? Thanks
var ordered = listOfItems.OrderBy(x => x.AssetID).ThenByDescending(x => x.LastModifiedDate);
List<Item> uniqueItems = new List<Item>();
foreach (Item result in ordered)
{
    if (!uniqueItems.Any(x => x.AssetID.Equals(result.AssetID)))
    {
        uniqueItems.Add(result);
    }
}
This is what I have now, and it is still allowing duplicates.

This may be because you are not searching the entire list to check whether the ID is unique; you only compare against the most recently stored ID.
List<Item> unique = new List<Item>();
string curAssetId = null; // here is the problem
foreach (Item result in ordered)
{
    if (!result.ID.Equals(curAssetId)) // here you only compare against the last stored value
    {
        unique.Add(result);
        curAssetId = result.ID; // only the current ID value is remembered
    }
}
To solve this, change the following
if (!result.ID.Equals(curAssetId)) // here you only compare against the last stored value
{
    unique.Add(result);
    curAssetId = result.ID; // only the current ID value is remembered
}
to
if (!unique.Any(x => x.ID.Equals(result.ID)))
{
    unique.Add(result);
}

I don't know if this code is just simplified, but have you considered grouping on ID, sorting on LastModifiedDate, then just taking the first from each group?
Something like:
var unique = list.GroupBy(i => i.ID).Select(x => x.OrderByDescending(y => y.LastModifiedDate).First());
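The grouping idea above can be tried in isolation. The following is a minimal sketch, not the asker's real code: the tuple type stands in for the Item class, and the sample IDs, dates and paths are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (ID, LastModifiedDate, Path) tuples stand in for Item.
var list = new List<(string ID, DateTime LastModifiedDate, string Path)>
{
    ("A1", new DateTime(2020, 1, 1), @"C:\old\a.txt"),
    ("A1", new DateTime(2021, 6, 1), @"C:\new\a.txt"),
    ("B2", new DateTime(2019, 3, 5), @"C:\b.txt"),
};

// One group per ID; keep the most recently modified entry of each group.
var unique = list
    .GroupBy(i => i.ID)
    .Select(g => g.OrderByDescending(i => i.LastModifiedDate).First())
    .ToList();

foreach (var item in unique)
    Console.WriteLine($"{item.ID} {item.LastModifiedDate:yyyy-MM-dd} {item.Path}");
```

If the IDs are strings that might differ only in letter case, which is one possible (unconfirmed) reason duplicates slip through, GroupBy also accepts a key comparer, for example list.GroupBy(i => i.ID, StringComparer.OrdinalIgnoreCase).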

var ordered = list.OrderBy(x => x.ID).ThenByDescending(x => x.LastModifiedDate).Distinct() ??

For this purpose you have to write your own IEqualityComparer<Item>; after that you can use LINQ's Distinct method (see Enumerable.Distinct on MSDN).
Alternatively, you could keep your current code, modified along these lines (as a sample):
var ordered = list.OrderByDescending(x => x.LastModifiedDate);
var unique = new List<Item>();
foreach (Item result in ordered)
{
    if (unique.Any(x => x.ID == result.ID))
        continue;
    unique.Add(result);
}
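The unique.Any(...) check rescans the result list for every item, which grows quadratically on a large file list. A possible variant, sketched here with made-up sample data and a tuple standing in for Item, remembers the IDs already seen in a HashSet<string> so each check is a constant-time lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (ID, LastModifiedDate, Path) tuples stand in for Item.
var list = new List<(string ID, DateTime LastModifiedDate, string Path)>
{
    ("A1", new DateTime(2020, 1, 1), @"C:\old\a.txt"),
    ("B2", new DateTime(2019, 3, 5), @"C:\b.txt"),
    ("A1", new DateTime(2021, 6, 1), @"C:\new\a.txt"),
};

var seenIds = new HashSet<string>();
var unique = new List<(string ID, DateTime LastModifiedDate, string Path)>();

// Newest first, so the first entry met for each ID is the one to keep.
foreach (var result in list.OrderByDescending(x => x.LastModifiedDate))
{
    // HashSet.Add returns false when the ID is already in the set.
    if (seenIds.Add(result.ID))
        unique.Add(result);
}

Console.WriteLine(unique.Count);
```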

List<Item> p = new List<Item>();
var x = p.Select(c => new Item
    {
        AssetID = c.AssetID,
        LastModifiedDate = c.LastModifiedDate.Date
    })
    // Distinct() only removes these copies if Item overrides Equals and GetHashCode
    .OrderBy(y => y.AssetID).ThenByDescending(c => c.LastModifiedDate).Distinct();

Related

How to group a list with Linq

I have a list which I get from a database. The structure looks like (which I'm representing with JSON as it's easier for me to visualise)
{id:1
value:"a"
},
{id:1
value:"b"
},
{id:1
value:"c"
},
{id:2
value:"t"
}
As you can see, I have 2 unique ID's, ID 1 and 2. I want to group by the ID. The end result I'd like is
{id:1,
values:["a","b","c"],
},
{id:2,
values:["t"]
}
Is this possible with Linq? At the moment, I have a massive complex foreach, which first sorts the list (by ID) and then detects if it's already been added etc but this monstrous loop made me realise I'm doing wrong and honestly, it's too embarrassing to share.

You can group by the item Id and have the resulting type be a Dictionary<int, List<string>>
var result = myList.GroupBy(item => item.Id)
.ToDictionary(item => item.Key,
item => item.Select(i => i.Value).ToList());

You can either use the GroupBy method on IEnumerable to create IGrouping objects, each containing a key and its grouped objects, or you can use ToLookup to create exactly what you want as the result:
yourList.ToLookup(m => m.id, m => m.value);
This creates a hashed collection of keys with their values.
For more information please see below post:
https://www.c-sharpcorner.com/UploadFile/d3e4b1/practical-usage-of-using-tolookup-method-in-linq-C-Sharp/

Just a little more detail to emphasize the difference between the ToLookup approach and the GroupBy approach:
// class definition
public class Item
{
public long Id { get; set; }
public string Value { get; set; }
}
// create your list
var items = new List<Item>
{
new Item{Id = 0, Value = "value0a"},
new Item{Id = 0, Value = "value0b"},
new Item{Id = 1, Value = "value1"}
};
// this approach results in a List<string> (a collection of the values)
var lookup = items.ToLookup(i => i.Id, i => i.Value);
var groupOfValues = lookup[0].ToList();
// this approach results in a List<Item> (a collection of the objects)
var itemsGroupedById = items.GroupBy(i => i.Id).ToList();
var groupOfItems = itemsGroupedById[0].ToList();
So, if you want to work with values only after grouping, then you could take the first approach; if you want to work with objects after grouping, you could take the second approach. And, these are just a couple example implementations, there are plenty of ways to accomplish your goal.

First convert to a Lookup then select into a list, like so:
var groups = list
.ToLookup
(
item => item.ID,
item => item.Value
)
.Select
(
item => new
{
ID = item.Key,
Values = item.ToList()
}
)
.ToList();
The resulting JSON looks like this:
[{"ID":1,"Values":["a","b","c"]},{"ID":2,"Values":["t"]}]
Link to working example on DotNetFiddle.

Compare two List elements and replace if id is equals

I have two lists of this class
public class Product
{
    int id;
    string url;
    // etc.
}
I need to compare the old list (10k+ elements) against a new list (10 elements) by ID, and if an ID matches, replace the data in the old list with the data from the new list.
I think LINQ would be a good fit.
Can you help me do this with LINQ, or is there a better library?

Do you need to modify the collection in place or return a new collection?
If you are returning a new collection you could
var query = from x in oldItems
            join y in newItems on x.Id equals y.Id into g
            from z in g.DefaultIfEmpty()
            select z ?? x;
var newList = query.ToList();
This method will ignore entries in newItems that do not exist in oldItems.
If you are going to be modifying the collection in place you would be better off working with a dictionary and referencing that everywhere.
You can create a dictionary from the list by doing
var collection = items.ToDictionary(x => x.Id, x => x);
Note modifying the dictionary doesn't alter the source collection, the idea is to replace your collection with the dictionary object.
If you are using the dictionary you can then iterate over new collection and check the key.
foreach (var item in newItems.Where(x => collection.ContainsKey(x.Id))) {
collection[item.Id] = item;
}
Dictionaries are iterable so you can loop over the Values collection if you need to. Adds and removes are fast because you can reference by key. The only problem I can think you may run into is if you rely on the ordering of the collection.
If you are stuck needing to use the original collection type, you could instead call the ToDictionary method on your newItems collection. Your update code then looks like this.
var converted = newItems.ToDictionary(x => x.Id, x => x);
for (var i = 0; i < oldItems.Count; i++) {
    // Replace the old entry only when a newer item with the same Id exists.
    if (converted.TryGetValue(oldItems[i].Id, out var replacement)) {
        oldItems[i] = replacement;
    }
}
This has the advantage that you only loop over the newItems collection once; from then on it's key lookups, so it's less CPU intensive. The downside is that you've created a new dictionary for newItems, so it consumes more memory.

Here is a sample function that joins the two lists on the id property and then updates the original Product.url with the newer one
void ChangeItems(IList<Product> original, IList<Product> newer){
    original.Join(newer, o => o.id, n => n.id, (o, n) => new { original = o, newer = n })
        .ToList()
        .ForEach(j => j.original.url = j.newer.url);
}

Solution: the LINQ solution you're looking for will be something like this
oldList = oldList.Select(ele => newList.FirstOrDefault(newObj => newObj.id == ele.id) ?? ele).ToList();
Note: here we are rebuilding oldList from newList and oldList, i.e. we are replacing each oldList object with the matching newList object. If you only want some of the newList properties, you can add a Copy method to your class.
For example, with a Copy method:
oldList = oldList.Select(ele => newList.Any(i => i.id == ele.id) ? ele.Copy(newList.First(newObj => newObj.id == ele.id)) : ele).ToList();
// Changes in your class
public Product Copy(Product prod)
{
    // copy whichever properties of prod should replace those of the old object
    this.id = prod.id;
    return this; // returned so the method can be used inside Select
}
Note that iterating over 10k+ elements, even with LINQ, still costs CPU time, because newList is searched once for every element of oldList.

Given a class like this:
public class Product
{
public int id;
public string url;
public string otherData;
public Product(int id, string url, string otherData)
{
this.id = id;
this.url = url;
this.otherData = otherData;
}
public Product ChangeProp(Product newProd)
{
this.url = newProd.url;
this.otherData = newProd.otherData;
return this;
}
}
Note that the data class now has a ChangeProp method. It accepts the new object, copies its properties onto the old object, and returns the modified old object (since you want the old object's data replaced by the new object's data). This keeps the LINQ readable and clean.
You already have oldList with lots of entries and need to replace its data with the data from newList wherever the id matches; you can do it as below.
Suppose the lists hold data like this:
List<Product> oldList = new List<Product>();
for (int i = 0; i < 10000; i++)
{
oldList.Add(new Product(i, "OldData" + i.ToString(), "OldData" + i.ToString() + "-other"));
}
List<Product> newList = new List<Product>();
for (int i = 0; i < 5; i++)
{
newList.Add(new Product(i, "NewData" + i.ToString(), "NewData" + i.ToString() + "-other"));
}
This LINQ will do the work.
oldList.Where(x => newList.Any(y => y.id == x.id))
    .Select(z => z.ChangeProp(newList.First(a => a.id == z.id))).ToList();

foreach (var product in newList)
{
    int index = oldList.FindIndex(x => x.id == product.id);
    if (index != -1)
    {
        oldList[index].url = product.url;
    }
}
This will work, and I think it's a better solution too.
Several of the solutions above create new objects in memory, and creating a new list with 10k+ records is definitely a bad idea.
Please make the fields in Product public, otherwise they won't be accessible.

Get Random item from table without repetition

I am creating an application where I have to display a question from a list without repetition.
public IEnumerable<dynamic> GetQue()
{
var result = obj.tblQuestions
.OrderBy(r => Guid.NewGuid())
.Select(o => new { o.id, o.Question, o.Opt1, o.Opt2, o.Opt3, o.Opt4 })
.Take(1);
return result;
}
Currently I am getting a random question but with repetition. How do I get a record without repetition?

As I said in the comment, you can pick elements one by one at random and remove each selected element from the list. Repeat this until the list is empty.
I am not giving you the exact code for your case; you will still need to adapt it to your classes, but this is the principle it should follow:
var list = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int randomId;
Random rand = new Random();
if (list.Count != 0)
{
randomId = rand.Next(list.Count);
var randomElement = list[randomId];
list.RemoveAt(randomId);
return randomElement;
}
This gets random elements from a list of integers. It assumes the list is kept as data inside a class and is not recreated on every call, of course.
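An alternative to removing one random element at a time is to shuffle the whole list once and then walk through it. The sketch below uses a Fisher-Yates shuffle; the integer IDs are placeholders for question IDs, which in the original question would come from tblQuestions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical question IDs; placeholders for the real table's id column.
var questionIds = Enumerable.Range(1, 10).ToList();
var rand = new Random();

// Fisher-Yates shuffle: swap each position with a random position at or before it.
for (int i = questionIds.Count - 1; i > 0; i--)
{
    int j = rand.Next(i + 1); // 0 <= j <= i
    (questionIds[i], questionIds[j]) = (questionIds[j], questionIds[i]);
}

// Walking the shuffled list front to back yields every ID exactly once.
var queue = new Queue<int>(questionIds);
while (queue.Count > 0)
    Console.WriteLine(queue.Dequeue());
```

Shuffling up front means no question can repeat until the queue is exhausted, and no list of previously shown IDs has to be passed around.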

public ActionResult GetNextQuestion(int[] prevs = null)
{
    var que = GetQue(prevs);
    var ids = new int[] { que.id };
    if (prevs != null)
        ids = ids.Concat(prevs).ToArray(); // Concat returns IEnumerable<int>, so convert back
    ViewBag.list = ids;
    return View(que);
}
public dynamic GetQue(int[] prevs = null)
{
    using (var obj = new Db())
    {
        var query = obj.tblQuestions.AsQueryable();
        if (prevs != null)
            query = query.Where(e => !prevs.Contains(e.id)); // skip questions already shown
        // Guid.NewGuid() gives a random order; new Guid() is always the empty GUID
        return query.OrderBy(r => Guid.NewGuid())
            .Select(o => new { o.id, o.Question, o.Opt1, o.Opt2, o.Opt3, o.Opt4 })
            .First();
    }
}

Source:how to avoid number repeation by using random class in c#?
If you add the items to a list as you cycle through them, you can check the list to see whether an item has already been added. I'm pretty new to this, so I can't really code it out for you, but the idea is there. Make a separate list for the entries you've already cycled through, then use an if statement to check whether the next entry is in the list before executing it.
I would have posted this as a comment, but I don't have 50 rep, so I can't start a comment chain. :/

how do I make this LINQ query faster?

modelData has 100,000 items in the list.
I am doing 2 "Selects" within 2 loops.
Could it be structured differently? It takes a long time (about 10 minutes).
public class ModelData
{
public string name;
public DateTime DT;
public int real;
public int trade;
public int position;
public int dayPnl;
}
List<ModelData> modelData;
var dates = modelData.Select(x => x.DT.Date).Distinct();
var names = modelData.Select(x => x.name).Distinct();
foreach (var aDate in dates)
{
var dateRealTrades = modelData.Select(x => x)
.Where(x => x.DT.Date.Equals(aDate) && x.real.Equals(1));
foreach (var aName in names)
{
var namesRealTrades = dateRealTrades.Select(x => x)
.Where(x => x.name.Equals(aName));
// DO MY PROCESSING
}
}

I believe what you want can be achieved with two queries using group by. One to create a lookup by the date and the other to give you the name-date grouped items.
var data = modelData.Where(x => x.real.Equals(1))
.GroupBy(x => new { x.DT.Date, x.name });
var byDate = modelData.Where(x => x.real.Equals(1))
.ToLookup(x => x.DT.Date);
foreach(var item in data)
{
var aDate = item.Key.Date;
var aName = item.Key.name;
var namesRealTrades = item.ToList();
var dateRealTrades = byDate[aDate].ToList();
// DO MY PROCESSING
}
The first query will give you items grouped by the name and date to iterate over and the second will give you a lookup to get all the items associated with a given date. The second uses a lookup so that the list is iterated once and gives you fast access to the resulting list of items.
This should greatly reduce the number of times you iterate over modelData from what you currently have.

You could rewrite your for loop like this:
foreach (var namesRealTrades in names.Select(aName => dateRealTrades.Where(x => x.name.Equals(aName))))
{
//DO STUFF
}
Depending on your data this could reduce the number of queries you have to make

Did you try to compile your query as suggested on MSDN WebSite?
When you have an application that executes structurally similar
queries many times, you can often increase performance by compiling
the query one time and executing it several times with different
parameters. For example, an application might have to retrieve all the
customers who are in a particular city, where the city is specified at
runtime by the user in a form. LINQ to SQL supports the use of
compiled queries for this purpose.
https://msdn.microsoft.com/en-us/library/bb399335(v=vs.110).aspx

A couple of things:
Use .ToList() to calculate a sequence once, so you can keep it for later.
Use .GroupBy() to avoid re-searching modelData for things you have already found.
// Collections of models having the same Date or Name.
var dates = modelData.GroupBy(x => x.DT.Date);
var names = modelData.GroupBy(x => x.name).ToList();
foreach (var modelsWithDate in dates)
{
    var aDate = modelsWithDate.Key;
    var dateRealTrades = modelsWithDate.Where(x => x.real == 1).ToList();
    foreach (var modelsWithName in names)
    {
        var aName = modelsWithName.Key;
        // Restrict to this date's real trades, as in the original nested query.
        var namesRealTrades = dateRealTrades.Where(x => x.name == aName).ToList();
        // DO MY PROCESSING
    }
}

There are two ways in which the code is inefficient.
names has deferred evaluation. Every time you iterate over it, it has to go through the whole data set to find all the distinct names again. You should save the result.
You find the distinct values in the collection and then go through the collection again for every distinct value to look for its occurrences. You should use grouping.
The rewritten code can look like this
var dates = modelData.GroupBy(x => x.DT.Date);
var names = modelData.Select(x => x.name).Distinct().ToArray();
foreach (var date in dates)
{
var dateRealTrades = date.Where(x => x.real.Equals(1)).ToArray();
var namesRealTradesLookup = dateRealTrades.ToLookup(x => x.name);
foreach (var aName in names)
{
var namesRealTrades = namesRealTradesLookup[aName];
// DO MY PROCESSING
// var aDate = date.Key;
}
}
If you are not interested in date/name combinations that have no real trades, it can be done in a much more straightforward way
var realModelData = modelData.Where(x => x.real.Equals(1));
foreach (var dateRealTrades in realModelData.ToLookup(x => x.DT.Date))
{
foreach (var namesRealTrades in dateRealTrades.ToLookup(x => x.name))
{
// DO MY PROCESSING
//var aDate = dateRealTrades.Key;
//var aName = namesRealTrades.Key;
//foreach(var trade in namesRealTrades) { ...
//foreach(var trade in dateRealTrades) { ...
}
}
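To see the nested-lookup approach run end to end, here is a small self-contained sketch. The tuple type is a cut-down stand-in for ModelData, the sample rows are invented, and counting trades is a placeholder for the real processing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows: a reduced stand-in for ModelData (name, DT, real).
var modelData = new List<(string name, DateTime DT, int real)>
{
    ("alpha", new DateTime(2020, 1, 1, 9, 0, 0), 1),
    ("alpha", new DateTime(2020, 1, 1, 15, 0, 0), 1),
    ("beta",  new DateTime(2020, 1, 1, 10, 0, 0), 0),
    ("beta",  new DateTime(2020, 1, 2, 10, 0, 0), 1),
};

// Placeholder processing: count the real trades per (date, name) pair.
var counts = new Dictionary<(DateTime Date, string Name), int>();

// modelData is scanned once; each date's rows are then split by name once.
foreach (var dateRealTrades in modelData.Where(x => x.real == 1).ToLookup(x => x.DT.Date))
{
    foreach (var namesRealTrades in dateRealTrades.ToLookup(x => x.name))
    {
        counts[(dateRealTrades.Key, namesRealTrades.Key)] = namesRealTrades.Count();
    }
}

foreach (var entry in counts)
    Console.WriteLine($"{entry.Key.Date:yyyy-MM-dd} {entry.Key.Name}: {entry.Value}");
```

Only date/name pairs that actually have real trades appear in the result, which matches the caveat stated above.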

How to get values out of IGrouping?

I have applied IGrouping<> over a list - here's what it looks like:
IEnumerable<IGrouping<TierRequest,PingtreeNode>> Tiers
{
get { return ActiveNodes.GroupBy(x => new TierRequest(x.TierID, x.TierTimeout, x.TierMaxRequests)); }
}
Later in my code I iterate over Tiers. Its simple to get the key data using the Key element, but how do I get the IEnumerable<PingtreeNode> that forms the value part?
Thanks in advance

Tiers.Select(group => group.Select(element => ...));

In a foreach you can get the values like this
foreach (var group in Tiers)
{
    TierRequest key = group.Key;
    PingtreeNode[] values = group.ToArray();
}

The group itself implements IEnumerable<T> and can be iterated over, or used with linq methods.
var firstGroup = Tiers.First();
foreach(var item in firstGroup)
{
item.DoSomething();
}
// or using linq:
firstGroup.Select(item => item.ToString());
// or if you want to iterate over all items at once (kind of unwinds
// the grouping):
var itemNames = Tiers.SelectMany(g => g).Select(item => item.ToString()).ToList();
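The points above can be checked with a tiny runnable sketch. The tuple type and tier data are invented placeholders for PingtreeNode and TierRequest; only GroupBy, Key, Select and SelectMany are taken from the discussion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical nodes: (TierID, Name) tuples stand in for PingtreeNode.
var nodes = new List<(int TierID, string Name)>
{
    (1, "a"),
    (1, "b"),
    (2, "c"),
};

var tiers = nodes.GroupBy(n => n.TierID);

foreach (var tier in tiers)
{
    // tier.Key is the grouping key; tier itself enumerates the grouped elements.
    Console.WriteLine($"Tier {tier.Key}: {string.Join(", ", tier.Select(n => n.Name))}");
}

// SelectMany(t => t) flattens the groups back into a single sequence of elements.
var allNames = tiers.SelectMany(t => t).Select(n => n.Name).ToList();
Console.WriteLine(string.Join(", ", allNames));
```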
