I have this class:
public class Item
{
public int Id { get; set; }
public string Name { get; set; }
public decimal Price { get; set; }
}
I want to store instances of Item in a list and keep them in the order the user has chosen (most likely in a GUI with up/down arrows for the selected Item).
Should I add an order member to my Item class, or is there a specific data structure that can keep an arbitrary user-specified order?
Note: I'm going to use this to keep a list of items in the order a person has seen them while walking through a store.
If you intend to persist the list to a database then you may want to include an Order property in your Item class; databases such as SQL Server do not guarantee the order of the result set.
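A minimal sketch of that idea, assuming a hypothetical SortOrder property (the original class has none): stamp each item with its current list position before saving, and sort by it explicitly after loading, since the database may return rows in any order.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public int SortOrder { get; set; } // hypothetical column that records the user's order
}

public static class ItemOrdering
{
    // Call before saving: the list position becomes the persisted order.
    public static void StampOrder(IList<Item> items)
    {
        for (int i = 0; i < items.Count; i++)
            items[i].SortOrder = i;
    }

    // Call after loading: never rely on the order rows come back in.
    public static List<Item> RestoreOrder(IEnumerable<Item> loaded) =>
        loaded.OrderBy(item => item.SortOrder).ToList();
}
```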
List, array and other collection types are, by definition, ordered sequences of items.
List<Item> is enough to keep items in a particular order. Note that moving a single item to an arbitrary new position is a "slow" O(n) operation in this case. If you mostly just need Add, a regular List<T> is probably the easiest choice, and it does not require any additional fields.
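A sketch of the up/down-arrow operations as hypothetical extension methods on List<T>. A one-step move is just a swap of neighbours, which is O(1); only an arbitrary move (RemoveAt followed by Insert) pays the O(n) cost mentioned above.

```csharp
using System.Collections.Generic;

public static class ListReorderExtensions
{
    // Moves the element at 'index' one slot towards the front; no-op at the top.
    public static void MoveUp<T>(this List<T> list, int index)
    {
        if (index <= 0 || index >= list.Count) return;
        (list[index - 1], list[index]) = (list[index], list[index - 1]);
    }

    // Moves the element at 'index' one slot towards the end; no-op at the bottom.
    public static void MoveDown<T>(this List<T> list, int index)
    {
        if (index < 0 || index >= list.Count - 1) return;
        (list[index + 1], list[index]) = (list[index], list[index + 1]);
    }

    // Arbitrary move (e.g. drag and drop): RemoveAt + Insert, both O(n).
    public static void Move<T>(this List<T> list, int from, int to)
    {
        T item = list[from];
        list.RemoveAt(from);
        list.Insert(to, item);
    }
}
```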
I need to be able to search over a collection of approx 2 million items in C#. Search should be possible over multiple fields. Simple string-matching is good enough.
Using an external dependency like a database is not an option, but using an in-memory database would be OK.
The main goal is to do this memory-efficiently.
The type in the collection is quite simple and has no long strings:
public class Item
{
public string Name { get; set; } // Around 50 chars
public string Category { get; set; } // Around 20 chars
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public IReadOnlyList<string> Tags { get; set; } // 2-3 items
}
Focus and requirements
Clarification of focus and requirements:
No external dependencies (like a database)
Memory-efficient (below 2 GB for 2 million items)
Searchable items in collection (must be performant)
Today's non-optimal solution
Using a simple List<T> of the above type, either as a class or a struct, still requires about 2 GB of memory.
Is there a better way?
The most significant memory hog in your class is the read-only list. Get rid of it and you will reduce the memory footprint by some 60% (tested with three tags):
public class Item
{
public string Name { get; set; }
public string Category { get; set; }
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public string Tags { get; set; } // Semi-colon separated
}
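With tags stored as one semicolon-separated string, tag lookups have to parse that string. A minimal sketch of a hypothetical HasTag helper that matches a whole tag without allocating an array per call (as Split would), assuming no tag ever contains ';':

```csharp
public class Item
{
    public string Name { get; set; }
    public string Tags { get; set; } // semicolon-separated, e.g. "red;sale;new"

    // Exact, ordinal match against one of the semicolon-separated tags.
    public bool HasTag(string tag)
    {
        if (string.IsNullOrEmpty(Tags) || string.IsNullOrEmpty(tag)) return false;
        int start = 0;
        while (start <= Tags.Length)
        {
            int end = Tags.IndexOf(';', start);
            if (end < 0) end = Tags.Length;
            // Same length and same characters means the segment is exactly the tag.
            if (end - start == tag.Length &&
                string.CompareOrdinal(Tags, start, tag, 0, tag.Length) == 0)
                return true;
            start = end + 1;
        }
        return false;
    }
}
```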
Also, consider using DateTime instead of DateTimeOffset. That will reduce the memory footprint by a further 10% or so.
There are many things you can do to reduce the memory footprint of your data, but probably the easiest thing to do with the greatest impact would be to intern all strings, or at least those that you expect to be repeated a lot.
using System;
using System.Collections.Generic;
using System.Linq;
// Rough example (no checks for null values)
public class Item
{
private string _name;
public string Name
{
get { return _name; }
set { _name = String.Intern(value); }
}
private string _category;
public string Category
{
get { return _category; }
set { _category = String.Intern(value); }
}
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
private IReadOnlyList<string> _tags;
public IReadOnlyList<string> Tags
{
get { return _tags; }
set { _tags = Array.AsReadOnly(value.Select(s => String.Intern(s)).ToArray()); }
}
}
Another thing you could do, more difficult and with smaller impact, would be to assign the same IReadOnlyList<string> object to items with identical tags (assuming that many items with identical tags exist in your data).
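A minimal sketch of that sharing, using a hypothetical TagListPool: identical tag sequences are mapped to one read-only list instance. It assumes the separator '\u001F' never appears in a tag and treats ["a","b"] and ["b","a"] as different lists. The pool (and its keys) can be discarded once loading is finished; the shared lists stay alive through the items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hands out one shared read-only list per distinct tag sequence.
public sealed class TagListPool
{
    private readonly Dictionary<string, IReadOnlyList<string>> _lists =
        new Dictionary<string, IReadOnlyList<string>>(StringComparer.Ordinal);

    public IReadOnlyList<string> Get(IEnumerable<string> tags)
    {
        string[] array = tags.ToArray();
        // Key on the joined tags; '\u001F' (unit separator) is assumed never to occur in a tag.
        string key = string.Join("\u001F", array);
        if (!_lists.TryGetValue(key, out var shared))
        {
            shared = Array.AsReadOnly(array);
            _lists.Add(key, shared);
        }
        return shared;
    }
}
```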
Update: Also don't forget to call TrimExcess on the list after you fill it with items, to get rid of the unused capacity. From the documentation:
This method can be used to minimize a collection's memory overhead if no new elements will be added to the collection.
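A sketch of both options (the ListLoading helper is hypothetical). One detail from the same documentation page: TrimExcess does nothing when the list is already above 90 percent of its capacity, so when the final count is known up front, passing it to the constructor is the more direct way to avoid slack.

```csharp
using System.Collections.Generic;

public static class ListLoading
{
    // Count known: allocate exactly once, no grow-and-copy steps, no slack.
    public static List<T> LoadExact<T>(IReadOnlyList<T> source)
    {
        var list = new List<T>(source.Count);
        for (int i = 0; i < source.Count; i++)
            list.Add(source[i]);
        return list;
    }

    // Count unknown: let the list grow (it doubles), then trim the unused capacity.
    // TrimExcess is a no-op when Count is above ~90% of Capacity.
    public static List<T> LoadAndTrim<T>(IEnumerable<T> source)
    {
        var list = new List<T>();
        foreach (var value in source)
            list.Add(value);
        list.TrimExcess();
        return list;
    }
}
```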
With 2 GB (roughly 2 billion bytes) for 2 million items, we have about 1000 bytes per item, which should be more than enough room for an efficient search index.
If I understand your requirements correctly, you have 2 million instances of a complex type, and you want to match complete strings / string prefixes / string infixes in any of their fields. Is that correct? I'm going to assume the hardest case, searching infixes, i.e. any part of any string.
Since you have not provided a requirement that new items be added over time, I am going to assume this is not required.
You will need to consider how you want to compare. Are there cultural requirements? Or is ordinal (i.e. character-by-character) comparison acceptable?
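To make that choice concrete, here is a brute-force infix search over a trimmed-down, hypothetical Item with the comparison passed in explicitly. It is only a baseline: a plain for loop over every item, which is also what any index should be measured against. Ordinal and OrdinalIgnoreCase compare UTF-16 code units and are the cheapest; the culture-aware options apply linguistic rules.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Name { get; set; }
    public string Category { get; set; }
}

public static class ItemSearch
{
    // Linear infix search over Name and Category with an explicit comparison mode.
    public static List<Item> Find(List<Item> items, string query, StringComparison comparison)
    {
        var results = new List<Item>();
        for (int i = 0; i < items.Count; i++) // plain for loop: no LINQ iterator overhead
        {
            Item item = items[i];
            if ((item.Name != null && item.Name.IndexOf(query, comparison) >= 0) ||
                (item.Category != null && item.Category.IndexOf(query, comparison) >= 0))
                results.Add(item);
        }
        return results;
    }
}
```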
With that out of the way, let's get into an answer.
Browsers do efficient in-memory text search for web pages. They use data structures like suffix trees for this. A suffix tree is created once, in time linear in the total length of the text, and then allows searches in time linear in the length of the search word. Although web pages are generally smaller than 2 GB, linear-time construction and searches whose cost does not grow with the size of the data scale very well.
Find or implement a Suffix Tree.
The suffix tree allows you to find substrings in O(m) time, where m is the length of the search word, and get back the original objects they occur in.
Construct the suffix tree once, with the strings of each object pointing back to that object.
Suffix trees compact data nicely if there are many common substrings, which tends to be the case for natural language.
If a suffix tree turns out to be too large (unlikely), you can have an even more compact representation with a Suffix Array. They are harder to implement, however.
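For a feel of how a suffix array answers the same infix queries, here is a deliberately naive sketch (hypothetical SuffixArrayIndex): it builds the array by sorting all suffixes, which is not linear-time and stores two ints per character, so it is far too slow and large for 2 million items as written. Real implementations use a dedicated construction algorithm such as SA-IS. It assumes '\0' never occurs in the data.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Naive suffix array over many strings; Search returns the indices of the strings containing the query.
public sealed class SuffixArrayIndex
{
    private readonly string _text;    // all values joined, each terminated by '\0'
    private readonly int[] _suffixes; // suffix start positions, sorted ordinally
    private readonly int[] _owner;    // _owner[pos] = index of the value that position belongs to

    public SuffixArrayIndex(IReadOnlyList<string> values)
    {
        var sb = new StringBuilder();
        var owner = new List<int>();
        for (int i = 0; i < values.Count; i++)
        {
            sb.Append(values[i]).Append('\0'); // separator keeps matches inside one value
            for (int k = 0; k <= values[i].Length; k++) owner.Add(i);
        }
        _text = sb.ToString();
        _owner = owner.ToArray();
        _suffixes = new int[_text.Length];
        for (int p = 0; p < _suffixes.Length; p++) _suffixes[p] = p;
        // Naive construction: sort positions by comparing the full suffixes.
        Array.Sort(_suffixes, (a, b) => string.CompareOrdinal(_text, a, _text, b, _text.Length));
    }

    // Compares the suffix at 'pos', truncated to the query length, with the query.
    private int ComparePrefix(int pos, string query) =>
        string.CompareOrdinal(_text, pos, query, 0, query.Length);

    public HashSet<int> Search(string query)
    {
        var hits = new HashSet<int>();
        if (string.IsNullOrEmpty(query)) return hits;
        int lo = 0, hi = _suffixes.Length;
        while (lo < hi) // binary search for the first suffix >= query
        {
            int mid = lo + (hi - lo) / 2;
            if (ComparePrefix(_suffixes[mid], query) < 0) lo = mid + 1; else hi = mid;
        }
        // All suffixes that start with the query form one contiguous run from 'lo'.
        for (int i = lo; i < _suffixes.Length && ComparePrefix(_suffixes[i], query) == 0; i++)
            hits.Add(_owner[_suffixes[i]]);
        return hits;
    }
}
```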
Edit: On memory usage
The more common prefixes the data has (e.g. natural language), the closer a suffix tree's memory usage gets to the memory required to simply store the strings themselves.
For example, the words fire and firm will be stored as a parent node fir with two leaf nodes, e and m, thus forming the words. Should the word fish be introduced, the node fir will be split: a parent node fi, with child nodes sh and r, and the r having child nodes e and m. This is how a suffix tree forms a compressed, efficiently searchable representation of many strings.
With no common prefixes, the tree would simply hold each of the strings. But the alphabet limits how many unique prefixes there can be. For example, if we only allow the characters a through z, there can only be 26 unique first letters; a 27th word must share its first letter with an existing word and thus gets compacted. In practice, this can save lots of memory.
The only overhead comes from storing separate substrings and the nodes that represent and connect them.
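The fire/firm/fish example describes a radix (compressed) trie over whole words; a suffix tree applies the same edge splitting to every suffix of every string. A minimal sketch of just the insertion and splitting step, using a hypothetical RadixNode type:

```csharp
using System.Collections.Generic;

// Minimal radix trie: edges carry whole substrings instead of single characters.
public sealed class RadixNode
{
    // Outgoing edges keyed by the first character of their label.
    public readonly SortedDictionary<char, (string Label, RadixNode Child)> Edges =
        new SortedDictionary<char, (string Label, RadixNode Child)>();
    public bool IsWord; // true if a stored word ends at this node

    public void Insert(string word)
    {
        RadixNode node = this;
        while (true)
        {
            if (word.Length == 0) { node.IsWord = true; return; }
            if (!node.Edges.TryGetValue(word[0], out var edge))
            {
                // No edge shares the first character: the rest of the word becomes one leaf edge.
                node.Edges[word[0]] = (word, new RadixNode { IsWord = true });
                return;
            }
            int common = 0;
            while (common < edge.Label.Length && common < word.Length &&
                   edge.Label[common] == word[common])
                common++;
            if (common < edge.Label.Length)
            {
                // Split the edge, e.g. "fir" + "fish" becomes "fi" -> { "r", "sh" }.
                var middle = new RadixNode();
                string tail = edge.Label.Substring(common);
                middle.Edges[tail[0]] = (tail, edge.Child);
                edge = (edge.Label.Substring(0, common), middle);
                node.Edges[word[0]] = edge;
            }
            node = edge.Child;
            word = word.Substring(common);
        }
    }
}
```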
You can work through these points, then see whether there is still trouble:
You can enable gcAllowVeryLargeObjects to allow arrays that are larger than 2 gigabytes in total size (on 64-bit platforms).
Keep the class implementation. When choosing between a class and a struct, performance is not the main factor, and I see no reason to use a struct here. See Choosing Between Class and Struct.
Depending on your search filter, you may need to override GetHashCode and Equals.
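An alternative to overriding Equals/GetHashCode on Item itself is a separate IEqualityComparer<Item>, which lets you choose the rule per lookup. A minimal sketch, assuming a hypothetical rule that two items match when Name and Category are equal ignoring case; both methods use the same comparer so they stay consistent. HashCode.Combine requires .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Name { get; set; }
    public string Category { get; set; }
}

public sealed class NameCategoryComparer : IEqualityComparer<Item>
{
    private static readonly StringComparer Cmp = StringComparer.OrdinalIgnoreCase;

    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return Cmp.Equals(x.Name, y.Name) && Cmp.Equals(x.Category, y.Category);
    }

    // Equal items must produce equal hashes, so hash with the same case-insensitive comparer.
    public int GetHashCode(Item item) =>
        HashCode.Combine(Cmp.GetHashCode(item.Name ?? ""), Cmp.GetHashCode(item.Category ?? ""));
}
```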
Do you need to mutate properties, or just search object in the collection?
If you just want to search, and your property values repeat a lot, many objects can share one property instance.
In this way, the value is stored only once, and each object only stores a reference to it.
You can do this only if you don't want to mutate the property.
For example, if two objects have the same category:
public class Category
{
public string Value { get; }
public Category(string category)
{
Value = category;
}
}
public class Item
{
public string Name { get; set; }
public Category Category { get; set; }
public bool IsActive { get; set; }
public DateTimeOffset CreatedAt { get; set; }
public IReadOnlyList<string> Tags { get; set; }
}
class Program
{
public void Init()
{
Category category = new Category("categoryX");
var obj1 = new Item
{
Category = category
};
var obj2 = new Item
{
Category = category
};
}
}
I would not expect any major memory issues with 2M objects if you are running as a 64-bit process. There is a 2 GB size limit for a list, but a reference is only 8 bytes, so 2M references take about 16 MB, well under this limit. The total memory usage will depend mostly on how large the strings are. There will also be some per-object overhead, but this is difficult to avoid if you need to store multiple strings.
Also, how do you measure memory? The .NET runtime might over-allocate memory, so the actual memory usage of your objects might be significantly lower than the memory reported by Windows. Use a memory profiler to get an exact count.
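Short of a profiler, a rough in-process estimate is possible with GC.GetTotalMemory(true), which forces a full collection before reporting live managed-heap bytes. A sketch with a hypothetical MemoryProbe helper; it ignores native memory and the GC's reserved-but-unused segments, which is exactly the part Task Manager adds on top.

```csharp
using System;

public static class MemoryProbe
{
    // Approximate managed-heap growth caused by building a structure.
    public static long MeasureBytes<T>(Func<T> build)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        T result = build();
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(result); // keep the structure alive until after the second measurement
        return after - before;
    }
}
```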
If strings are duplicated between many objects there might be a major win if you can deduplicate them, making use of the same instance.
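One way to deduplicate is a local pool, sketched here as a hypothetical StringPool. Unlike String.Intern, whose pool lives for the lifetime of the process, this pool can be dropped after loading; the shared strings then live only as long as items reference them. HashSet<T>.TryGetValue requires .NET Core 2.0+ or .NET Framework 4.7.2+.

```csharp
using System;
using System.Collections.Generic;

// Collapses equal strings to one instance while loading.
public sealed class StringPool
{
    private readonly HashSet<string> _strings = new HashSet<string>(StringComparer.Ordinal);

    public string Get(string value)
    {
        if (value == null) return null;
        if (_strings.TryGetValue(value, out var existing)) return existing; // reuse the stored instance
        _strings.Add(value);
        return value;
    }
}
```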
Using a struct instead of a class could avoid some overhead, so I ran some tests:
list of objects using LINQ - 46ms
list of objects using for loop - 16ms
list of structs using for loop - 250ms
list of readonly structs with ref-return using for loop: 180ms
The exact times will depend on what query you are doing, these numbers are mostly for comparison.
Conclusion is that a regular List of objects with a regular for loop is probably the fastest. Also, iterating over all objects is quite fast, so in most cases it should not cause a major performance issue.
If you need better performance, you will need to create some kind of index so you can avoid iterating over all items. The right strategy is difficult to pick without knowing what kinds of queries you are running.
One option could be to use some variant of in-memory database, which could provide most of the indexing functionality. SQLite would be one example.
If the categories can be defined as an enum, you can map them to a small integer type, which would reduce the size considerably: from 20 bytes to, say, 2 bytes (a short). This could save approximately 36 MB for 2M objects.
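As one example of such an index, here is a sketch of an inverted index for exact-match queries on Category and Tags (the TermIndex type and the case-insensitive matching are assumptions): built once after loading, it maps each term to the positions of the items that contain it, so a lookup avoids scanning all items. Infix matching would need a different structure.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Name { get; set; }
    public string Category { get; set; }
    public IReadOnlyList<string> Tags { get; set; }
}

// Term -> positions of the items containing it (a classic inverted index).
public sealed class TermIndex
{
    private readonly Dictionary<string, List<int>> _postings =
        new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

    public TermIndex(IReadOnlyList<Item> items)
    {
        for (int i = 0; i < items.Count; i++)
        {
            AddTerm(items[i].Category, i);
            if (items[i].Tags != null)
                foreach (var tag in items[i].Tags) AddTerm(tag, i);
        }
    }

    private void AddTerm(string term, int position)
    {
        if (string.IsNullOrEmpty(term)) return;
        if (!_postings.TryGetValue(term, out var list))
            _postings[term] = list = new List<int>();
        // Positions arrive in order, so checking the last entry avoids duplicates per item.
        if (list.Count == 0 || list[list.Count - 1] != position) list.Add(position);
    }

    public IReadOnlyList<int> Lookup(string term) =>
        _postings.TryGetValue(term, out var list) ? list : (IReadOnlyList<int>)Array.Empty<int>();
}
```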
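A minimal sketch with hypothetical category names: the enum is backed by short, so each item stores 2 bytes instead of an 8-byte reference (plus the string itself, if it is not shared). Actual per-object savings also depend on field alignment. Incoming text is parsed once at load time; the IsDefined check rejects numeric strings such as "99" that Enum.TryParse would otherwise accept.

```csharp
using System;

public enum Category : short
{
    Unknown = 0,
    Fruit,
    Vegetable,
    Dairy
}

public class Item
{
    public string Name { get; set; }
    public Category Category { get; set; } // 2 bytes instead of a string reference
}

public static class CategoryParser
{
    // Case-insensitive parse; anything unrecognized maps to Unknown.
    public static Category Parse(string text) =>
        Enum.TryParse(text, true, out Category value) && Enum.IsDefined(typeof(Category), value)
            ? value
            : Category.Unknown;
}
```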
In WPF, I want to efficiently store millions of objects with low memory usage and retrieve them very fast. Below is my sample class.
public class CellInfo
{
public int A { get; set; }
public int B { get; set; }
public string C { get; set; }
public object D { get; set; }
public bool E { get; set; }
public double F { get; set; }
public ClassA G { get; set; }
}
I want to store millions of CellInfo objects, and each object has its own identity. I want to retrieve them using that identity. If a property of a CellInfo instance is not defined, it needs to return the default value, which would be stored in a static field.
So I only want to store the properties of a CellInfo object that are defined; the others I don't want to keep in memory, since they can be retrieved from the static variable.
Can anyone suggest the fastest way to store and retrieve millions of objects with low memory usage?
Note: I don't want any additional software installation, database, or external file to store this.
You haven't indicated which field is the 'own identity', so I've assumed a Guid Identity. A Dictionary keyed on this identity should offer fastest retrieval.
Dictionary<Guid, CellInfo> cells = new Dictionary<Guid, CellInfo>();
If you already have the data, you can use .ToDictionary() to project the key/value mappings from an enumerable.
If you need to mutate and access the collection from multiple threads at the same time (or if you intend to make the collection static), you can swap it out for a ConcurrentDictionary to address thread-safety issues.
Before accessing an element, you'll need to determine whether the item exists in the Dictionary via ContainsKey (or TryGetValue, as Amleth suggested). If not, use your default element. So I would suggest hiding the underlying dictionary implementation and forcing consumers through an encapsulating helper that does this check for you.
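A sketch of such a helper, assuming a Guid identity and a trimmed-down CellInfo (hypothetical subset of fields). TryGetValue does the existence check and the fetch in one hash lookup. Since every missing cell returns the same shared Default instance, callers must treat it as read-only.

```csharp
using System;
using System.Collections.Generic;

public class CellInfo
{
    public int A { get; set; }
    public string C { get; set; }
    public double F { get; set; }
}

// Callers never see a missing key: unknown ids fall back to one shared default.
public sealed class CellStore
{
    public static readonly CellInfo Default = new CellInfo { A = 0, C = "", F = 0.0 };
    private readonly Dictionary<Guid, CellInfo> _cells = new Dictionary<Guid, CellInfo>();

    public void Set(Guid id, CellInfo cell) => _cells[id] = cell;

    // One lookup instead of ContainsKey followed by the indexer.
    public CellInfo Get(Guid id) => _cells.TryGetValue(id, out var cell) ? cell : Default;
}
```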
I have orders coming down the wire quite frequently; I need to store them and build an aggregation out of them. Each order has an ID and an associated instrument type. Orders can also have events attached to them: add, update or remove. An update event does not carry an instrument type, but the order ID stays the same. For example, if I have an order for instrument "xyz" with order ID 100, I can later get an event to update order 100 by $20, and that event (order) will not include an instrument type.
Once I receive an order, I need to build an order book for each unique instrument; for example, the order book for instrument "xyz" should contain all received orders for it.
My question is how efficiently can I store this and what kind of data structure should I use for it?
An Order looks something like this:
public class Order
{
public Order(Action add, int id, string instrument, int price)
}
An Orderbook:
public class OrderBook
{
public string Instrument;
public List<Order> AllOrders;
}
Option 1:
Update a Dictionary<int,OrderBook> when I receive an order, with key as order id, and create an order book for the instrument.
Issue: This takes care of the update events: I can check whether the order already exists and then update the order book. However, an instrument should only have one order book, and that condition is violated here, as for instrument "xyz" there could be multiple Add orders coming through; it also makes manipulation difficult.
Option 2:
Update a Dictionary<OrderBook, List<int>>, with the order IDs as values.
Issue: This takes care of the above issue; however, when I get an update event, I'll have to check every list of values (i.e. every list of order IDs) to see whether the order already exists, since the instrument type is going to be empty and I cannot look up by the OrderBook key.
Orders come in in real time, and storing and retrieving has to be efficient (if not O(1), then O(log n)). Is there a better way to structure this?
NOTE: An OrderBook is an aggregation of all orders for an instrument and is unique per instrument. An order is for an instrument at a particular price, and there will be many orders for the same instrument. I get the orders, along with their events, from someone else (a third-party library), and I'm responsible for building the order book.
I see this problem as a combination of two sub-problems:
You are tracking each order ID across the incoming events.
You are maintaining a unique order book per instrument.
In that case, I would suggest maintaining both dictionaries.
Alternatively, you can convert the List<Order> in the order book to a Dictionary<int, Order> to simplify searching within the order book.
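A sketch of the two-dictionary approach, with hypothetical types standing in for the question's (OrderAction for its add/update/remove events, and assuming an update carries the order's new price). One dictionary gives each instrument exactly one book; the other maps order IDs to their book, so updates and removals without an instrument are still O(1).

```csharp
using System.Collections.Generic;

public enum OrderAction { Add, Update, Remove }

public sealed class Order
{
    public int Id { get; }
    public string Instrument { get; }
    public int Price { get; set; }
    public Order(int id, string instrument, int price) { Id = id; Instrument = instrument; Price = price; }
}

public sealed class OrderBook
{
    public string Instrument { get; }
    public Dictionary<int, Order> Orders { get; } = new Dictionary<int, Order>();
    public OrderBook(string instrument) { Instrument = instrument; }
}

public sealed class OrderBookStore
{
    // Exactly one book per instrument.
    private readonly Dictionary<string, OrderBook> _booksByInstrument = new Dictionary<string, OrderBook>();
    // Order id -> the book holding it (a reference to the same book, not a copy).
    private readonly Dictionary<int, OrderBook> _bookByOrderId = new Dictionary<int, OrderBook>();

    public void Apply(OrderAction action, int id, string instrument, int price)
    {
        switch (action)
        {
            case OrderAction.Add:
                if (!_booksByInstrument.TryGetValue(instrument, out var book))
                {
                    book = new OrderBook(instrument);
                    _booksByInstrument.Add(instrument, book);
                }
                book.Orders[id] = new Order(id, instrument, price);
                _bookByOrderId[id] = book;
                break;
            case OrderAction.Update:
                // Update events carry no instrument: find the book through the order id.
                if (_bookByOrderId.TryGetValue(id, out var owner))
                    owner.Orders[id].Price = price;
                break;
            case OrderAction.Remove:
                if (_bookByOrderId.TryGetValue(id, out var existing))
                {
                    existing.Orders.Remove(id);
                    _bookByOrderId.Remove(id);
                }
                break;
        }
    }

    public OrderBook GetBook(string instrument) =>
        _booksByInstrument.TryGetValue(instrument, out var found) ? found : null;
}
```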
For the option 1, you mentioned
However an instrument type should only have one order book, and this
condition is violated here
You are not going to have multiple order books; you maintain references to the same order book across the dictionary entries.
Try this.
using System.Collections.Generic;

// Action here is the question's own event type (add/update/remove), not System.Action.
public class Order
{
public Action Action { get; set; }
public int Id { get; set; }
public int Price { get; set; }
public Order(Action add, int id, int price){
Action = add;
Id = id;
Price = price;
}
}
public class Instrument
{
public string InstrumentName { get; set; }
public Dictionary<int, Order> OrderBook { get; set; }
public Instrument(string instrument)
{
InstrumentName = instrument;
OrderBook = new Dictionary<int, Order>();
}
public void AddOrder(Order order)
{
// The indexer adds a new order or replaces one with the same id
// (Dictionary.Add would throw on a repeated id).
OrderBook[order.Id] = order;
}
}
Then use List<Instrument>
I guess it should work for you. Let me know if there are any issues with it.
Here is my situation. I have two lists of the same type; call them FullList and ElementsRemoved. To avoid a database round trip, any time I delete an element from FullList I add it to ElementsRemoved, in case the user regrets it and wants to revert the deletion.
I was thinking of looping over ElementsRemoved to insert the elements back into FullList at the positions they were originally removed from.
Is there any simple way to do this with List methods?
Something like
FullList.Insert, Add, ..... (x =>
in order to reduce the lines of code and optimize it?
Instead of deleting the item from your database consider using a flag in the table.
For example, consider this Entity table (written in T-SQL):
CREATE TABLE Entity
(
Id INT IDENTITY PRIMARY KEY
,Name NVARCHAR(20) NOT NULL
,IsDelete BIT NOT NULL DEFAULT 0
);
This way you can set the IsDelete bit when the user deletes the entity, which prevents the data from being lost. The data can be pruned by a job during off hours.
This would mean you only need one list instead of keeping track of two.
using System.Collections.Generic;
using System.Linq;
public class Entity
{
public int Id { get; set; }
public string Name { get; set; }
public bool IsDelete { get; set; }
}
public static void UndoDelete(IEnumerable<Entity> fullList, int[] removedIds)
{
foreach(var entity in fullList.Where(e => removedIds.Contains(e.Id)))
{
entity.IsDelete = false;
}
}
If you cannot modify your application in that way, you can simply add the entities back in.
See List(T).AddRange
var entitiesToAdd = new[] { 2, 3, 4 };
var entitiesToInsert = ElementsRemoved.Where(e => entitiesToAdd.Contains(e.Id));
FullList.AddRange(entitiesToInsert);
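AddRange appends at the end. If the elements should go back to the positions they were removed from, as the question asks, one option is to record each index at removal time. A sketch using a hypothetical UndoableList<T>: undoing in reverse (LIFO) order makes every recorded index valid again at the moment it is reinserted.

```csharp
using System.Collections.Generic;

// Remembers where each removed element was so it can be put back in the same place.
public sealed class UndoableList<T>
{
    public List<T> FullList { get; } = new List<T>();
    private readonly Stack<(int Index, T Item)> _removed = new Stack<(int Index, T Item)>();

    public void RemoveAt(int index)
    {
        _removed.Push((index, FullList[index]));
        FullList.RemoveAt(index);
    }

    // Reverse order restores the exact list state each index was recorded against.
    public void UndoAll()
    {
        while (_removed.Count > 0)
        {
            var (index, item) = _removed.Pop();
            FullList.Insert(index, item);
        }
    }
}
```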
In your front end make a class that holds a bool and your object:
public class DelPair<T>{
public bool IsDeleted{get;set;}
public T Item{get;set;}
}
Now, instead of using a list of objects, use a list of DelPair<YourClass> and set IsDeleted = true when deleting.
This pattern will also allow you to track other things, such as IsModified if it comes to that.
Based on the OP's comment that he's using an ENTITY class and needs it to function as such:
One option is to make your DelPair class inherit from ENTITY. Another is to add an implicit conversion operator:
...
// Declared inside DelPair<T>: lets a DelPair<T> be used wherever a T is expected
public static implicit operator T(DelPair<T> pair)
{
return pair.Item;
}
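Put together, a runnable sketch of the DelPair approach (Customer is a hypothetical entity, Visible a hypothetical helper): deleting only flips the flag, undo flips it back, and the implicit operator lets a pair stand in for its item.

```csharp
using System.Collections.Generic;
using System.Linq;

public class DelPair<T>
{
    public bool IsDeleted { get; set; }
    public T Item { get; set; }

    // Lets a DelPair<T> be used wherever a T is expected.
    public static implicit operator T(DelPair<T> pair) => pair.Item;
}

public class Customer // hypothetical entity type
{
    public string Name { get; set; }
}

public static class DelPairDemo
{
    // The items that are not flagged as deleted, e.g. for binding to the UI.
    public static IEnumerable<T> Visible<T>(IEnumerable<DelPair<T>> pairs) =>
        pairs.Where(p => !p.IsDeleted).Select(p => p.Item);
}
```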
Suppose you have an element having a field id which uniquely identifies it.
class Element{public int id;}
In that case you can do this
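var element = ElementsRemoved.FirstOrDefault(e => e.id == id);
if (element != null) FullList.Add(element); // FirstOrDefault returns null when nothing matches
If you want to add all the elements back, use AddRange: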
FullList.AddRange(ElementsRemoved);
You can use the AddRange method
FullList.AddRange(ElementsRemoved);
But consider doing this
public class YourClass
{
public string AnyValue{get;set;}
public bool IsDeleted{get;set;}
}
Then you have a list like List<YourClass> FullList. Whenever the user removes an item, you just set
IsDeleted = true
on the removed item. This lets you keep just one list instead of adding to and removing from two lists.
I have no other developers to ask for advice or "what do you think - I'm thinking this" so please, if you have time, have a read and let me know what you think.
It's easier to show than describe, but the app is essentially like a point of sale app with 3 major parts: Items, OrderItems and the Order.
The item class is the data as it comes from the datastore.
public class Item
: IComparable<OrderItem>, IEquatable<OrderItem>
{
public Int32 ID { get; set; }
public String Description { get; set; }
public decimal Cost { get; set; }
public Item(Int32 id, String description, decimal cost)
{
ID = id;
Description = description;
Cost = cost;
}
// Extraneous Detail Omitted
}
The order item class is an item line on an order.
public class OrderItem
: Item, IBillableItem, IComparable<OrderItem>, IEquatable<OrderItem>
{
// IBillableItem members
public Boolean IsTaxed { get; set; }
public decimal ExtendedCost { get { return Cost * Quantity; } }
public Int32 Quantity { get; set; }
public OrderItem (Item i, Int32 quantity)
: base(i.ID, i.Description, i.Cost)
{
Quantity = quantity;
IsTaxed = false;
}
// Extraneous Detail Omitted
}
Currently when you add fees or discounts to an order it's as simple as:
Order order = new Order();
// Fee
order.Add(new OrderItem(new Item("Admin Fee", 20), 1));
// Discount
order.Add(new OrderItem(new Item("Today's Special", -5), 1));
I like it: it makes sense, and a base class that Order inherits from iterates through the items in the list, calculates appropriate taxes, and allows other Order-type documents (of which there are 2) to inherit from the base class that calculates all of this without re-implementing anything. If an order-type document doesn't have discounts, it's as easy as just not adding a negative-value OrderItem.
The only problem that I'm having is displaying this data. The form(s) that this goes on has a grid where the Sale items (ie. not fees/discounts) should be displayed. Likewise there are textboxes for certain fees and certain discounts. I would very much like to databind those ui elements to the fields in this class so that it's easier on the user (and me).
MY THOUGHT
Have 2 interfaces, IHasFees and IHasDiscounts, and have Order implement them; each would have a single List member. That way, I could access only Sale items, only Fees, or only Discounts (and bind them to controls if need be).
What I don't like about it:
- Now I've got separate add/remove methods for each kind of item in the class (AddItem/AddFee/AddDiscount/Remove...)
- I'm duplicating (triplicating?) functionality as all of them are simply lists of the same type of item, just that each list has a different meaning.
Am I on the right path? I suspect that this is a solved problem to most people (considering that this type of software is very common).
I'll point you to a remark by Rob Conery on an ALT.net podcast I listened to not long ago (I'm not an ALT.net advocate, but the reasoning seemed sound):
Ask what makes sense to a "business user" (if you have any of those around).
As a programmer, you're gonna want to factor in Item, Fee, Discount etc, because they have similar attributes and behaviors.
BUT, they might be two totally separate concepts in terms of the model. And someone is gonna come at a later time, saying "but this makes no sense, they are separate things, I need to report on them separately and I need to apply this specific rule to discounts in that case".
DRY does not mean limiting your model, and you should keep that in sight when factoring behavior via inheritance or anything like that.
The specific example that was used in that case was that of the shopping cart. The programmer's natural idea was to use an order in an uncommitted state. And it makes sense, because they look exactly the same.
Except that they are not. It makes no sense to the client, because they are two separate concepts, and it just makes the design less clear.
It is a matter of practices, taste and opinion though, so don't blindly follow advice posted on a web site :)
And to your specific problem, the system I work with uses items, fees, line-item discount (a property of the item) and a global discount on the order (though it's not an order, it's POS receipt but it does not really matter in that case).
I guess the reason is that, behind those concepts, Items are specific instances of inventoried pieces, they impact stock quantities, they are enumerable and quantifiable.
Fees are not. They do not share most of the attributes.
It might not matter in your case, because your domain seems much more limited than that, but you might want to keep those issues in mind.
Effectively, I'd look at your design in the details and try to figure out where the behaviors lie; then extract any commonalities in those behaviors to a distinct interface and make sure that applies to your design.
To wit: Fees may have validation behaviors associated with them. Let's say you add a Fee to any Order which has 20 items or more (just a random example, run with me on this one). When you add the 20th item, you may want to add that Fee to the Order, but there's a problem: when you remove an item from your order, do you want to have to check every time whether you need to remove that Fee? I doubt it; the implication is that there is a behavior associated with Fees/Discounts that essentially makes them an entirely different class of things.
I'd look at it this way: categorize Fees and Discounts as "Special" things, and then create an "ISpecial" interface that both Fees and Discounts implement. Extract any common functionality to the ISpecial interface (for example, "Validate"). Then have your Order implement the ISpecial (or whatever) interface.
In that way, you can define the specific Fee.Validate() behavior and Discount.Validate() behavior, and have them operate properly thanks to the magic of polymorphism (iterate over m_specialCollection and call Validate on each). In that way, as well, you can easily extend the Special interface for anything else that might be necessary (say, Taxes).
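A minimal sketch of that polymorphism; ISpecial, BulkFee, FlatDiscount and this Order are hypothetical, and the order here simply holds a collection of specials. Each special decides for itself whether it still applies, so removing the 20th item needs no special-case code in the order.

```csharp
using System.Collections.Generic;

public interface ISpecial
{
    string Description { get; }
    decimal Amount { get; }
    bool Validate(Order order); // false when the special no longer applies
}

public sealed class BulkFee : ISpecial
{
    public string Description => "Bulk handling fee";
    public decimal Amount => 20m;
    // The example rule from the text: applies only to orders with 20 or more items.
    public bool Validate(Order order) => order.ItemCount >= 20;
}

public sealed class FlatDiscount : ISpecial
{
    public string Description => "Today's Special";
    public decimal Amount => -5m;
    public bool Validate(Order order) => order.ItemCount > 0;
}

public sealed class Order
{
    private readonly List<ISpecial> _specials = new List<ISpecial>();
    public int ItemCount { get; set; }

    public void AddSpecial(ISpecial special) => _specials.Add(special);

    // Polymorphic dispatch: each special validates itself against the current order.
    public IEnumerable<ISpecial> ApplicableSpecials()
    {
        foreach (var special in _specials)
            if (special.Validate(this))
                yield return special;
    }
}
```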
I think the core of the problem that you're facing here is that you've implemented OrderItem as a subclass of Item, and now you're discovering that this really isn't always appropriate.
Given what you describe, here's how I'd try implementing this:
Create an Order class that implements public properties for every single-valued data element that you want to expose to data binding: order number, date, customer, total fees, total discounts, etc. It sounds like you may be needing to display specific fees/discounts as single values; if so, implement public properties for those.
Create an abstract OrderItem class that implements public properties for every data element that you want to bind to in the grid, and for every data element that you want to sort the items on. (You could also make this an IOrderItem interface; it really depends on whether or not there are going to be methods common to all order items.)
Create subclasses of OrderItem (or classes that implement IOrderItem) for the specific kinds of line item that can appear on an order: ProductOrderItem, FeeOrderItem, DiscountOrderItem, etc.
In your implementation of ProductOrderItem, implement a property of type Item; it'd look something like:
public class ProductOrderItem : OrderItem
{
public Item Item { get; set; }
public string Description { get { return Item.Description; } }
public int Quantity { get; set; }
public decimal Amount { get { return Item.Cost * Quantity; } }
}
Implement a property of type IEnumerable<OrderItem> within Order for storing all of the line items. Implement an AddItem method for adding OrderItems, e.g.:
public void AddItem(OrderItem item)
{
_Items.Add(item); // note that backing field is a List<OrderItem>
}
which you can call pretty simply:
Order o = new Order();
o.AddItem(new ProductOrderItem { Item = GetItem(1), Quantity = 2 });
o.AddItem(new FeeOrderItem { Description = "Special Fee", Amount = 100 });
o.AddItem(new DiscountOrderItem { DiscountAmount = .05m });
Write implementations of those single-valued fields that need to extract values from this list, e.g.:
public decimal TotalFees
{
get
{
return (from OrderItem item in Items
where item is FeeOrderItem
select item.Amount).Sum();
}
}
You can come back later and optimize these properties if necessary (e.g. caching the result after the first computation).
Note that you could also restrict AddItem to adding ProductOrderItems, and use other methods on the Order to add other types of items. For instance, if an order can have only one discount amount:
public void SetDiscountAmount(decimal discountAmount)
{
DiscountOrderItem item = _Items
.OfType<DiscountOrderItem>()
.SingleOrDefault();
if (item == null)
{
item = new DiscountOrderItem();
_Items.Add(item);
}
item.DiscountAmount = discountAmount;
}
You'd use this approach if you wanted to display the discount amount in the appropriate place in the grid of order items, but also wanted an order's discount amount to be a single value. (It's arguable that you might want to make DiscountAmount a property of the Order, create the DiscountOrderItem in its setter, and have DiscountOrderItem get its Amount from Order.DiscountAmount. I think both approaches have their pros and cons.)
One option is to add an ItemType property to OrderItem:
enum ItemType
{
Item,
Fee,
Discount
}
Now you could in your order class have:
public IList<OrderItem> Fees
{
get
{
return _items.FindAll(i => i.ItemType == ItemType.Fee);
}
}
Now you can still keep your single list and avoid the extra interfaces. You could even have a method like IList<OrderItem> GetItems(ItemType type).
One other thought: your current design doesn't allow for a percentage discount ("Today you get 10% off"). This might not be a requirement, but one option that avoids the application having to calculate this is to separate the items from the discounts.
The discounts could even become more like rules: "if I order 10 items, take 5% off".
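A sketch of the single-list version with that GetItems method (the member names are assumptions). Note that List<T>.FindAll returns a new list, so a control bound to Fees will not see items added afterwards unless it re-reads the property.

```csharp
using System.Collections.Generic;

public enum ItemType { Item, Fee, Discount }

public class OrderItem
{
    public string Description { get; set; }
    public decimal Amount { get; set; }
    public ItemType ItemType { get; set; }
}

public class Order
{
    private readonly List<OrderItem> _items = new List<OrderItem>();

    public void Add(OrderItem item) => _items.Add(item);

    // One list, filtered views per type; FindAll returns a snapshot copy.
    public IList<OrderItem> GetItems(ItemType type) => _items.FindAll(i => i.ItemType == type);

    public IList<OrderItem> Fees => GetItems(ItemType.Fee);
    public IList<OrderItem> Discounts => GetItems(ItemType.Discount);
}
```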
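A sketch of discounts as rules, with hypothetical IDiscountRule and OrderLine types: a rule inspects the order lines and returns the discount to apply, so "10 items, 5% off" becomes one small class instead of a negative-value line item. Rounding to 2 decimals uses Math.Round's default (banker's) rounding, which may or may not match your business rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderLine
{
    public decimal UnitPrice { get; set; }
    public int Quantity { get; set; }
}

public interface IDiscountRule
{
    decimal Discount(IReadOnlyList<OrderLine> lines); // amount to subtract, >= 0
}

// "If I order 10 items, take 5% off": a percentage applied once a quantity threshold is met.
public sealed class QuantityPercentRule : IDiscountRule
{
    private readonly int _minQuantity;
    private readonly decimal _percent;

    public QuantityPercentRule(int minQuantity, decimal percent)
    {
        _minQuantity = minQuantity;
        _percent = percent;
    }

    public decimal Discount(IReadOnlyList<OrderLine> lines)
    {
        int quantity = lines.Sum(l => l.Quantity);
        if (quantity < _minQuantity) return 0m;
        decimal subtotal = lines.Sum(l => l.UnitPrice * l.Quantity);
        return Math.Round(subtotal * _percent / 100m, 2);
    }
}
```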