C# Structure that sorts key pairs and is accessible? - c#

Noob here. I've been scouring the internet for days, and cannot find a decent structure that auto-sorts data (like SortedSet), while still allowing that data to be accessible (like List). Here's what I have:
A list containing 100,000 nodes, added and modified regularly.
List<Node> nodes;
The node object, containing data I need to access/change
public class Node { public Node(string name, int index) { /* store name and index */ } public void doSomething() { /* ... */ } }
I don't wish to be vague, but I can't sort the actual list, because the index is a history of when nodes were added. Thus, I want to use a structure that auto-sorts KeyValuePair<string, int> pairs (where the string is the name to sort by, and the int is the node's index in my list of nodes), but I must still be able to access the value. Here's what I want to do:
// Add a node to the list, then to the structure
int index = nodes.Count;
nodes.Add(new Node("someName", index));
someStructure.Add("someName", index);
// Give name to structure, which returns int value for use in finding node
nodes[someStructure.findValueOf("someName")].doSomething();
This would tell the node with the name "someName" to doSomething();
I am positive that I am missing something. I've tried using SortedSet, SortedList, Dictionary, etc. In each case, I can't retrieve the sorted object. What is the purpose of auto-sorting if I can't find out where they are at? Please help my poor life.

You are looking for a SortedDictionary.
As per the documentation: Represents a collection of key/value pairs that are sorted on the key. Although, as some comments say, those 100k objects would be better kept in a database...
Link: https://msdn.microsoft.com/en-us/library/f7fta44c(v=vs.110).aspx
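A minimal sketch of that suggestion, assuming a simplified stand-in for the asker's Node class (the NodeRegistry wrapper and its member names are hypothetical, for illustration only). The List<Node> keeps the history order, while a SortedDictionary<string, int> maps each name to its index in that list and enumerates its keys in sorted order. TryGetValue performs the name lookup without throwing for unknown names.

```csharp
using System.Collections.Generic;

// Simplified, hypothetical stand-in for the asker's Node class.
public class Node
{
    public string Name { get; private set; }
    public int Index { get; private set; }

    public Node(string name, int index)
    {
        Name = name;
        Index = index;
    }
}

// Hypothetical wrapper: nodes stay in insertion (history) order, while
// indexByName keeps the names sorted and maps each one to its list index.
public class NodeRegistry
{
    private readonly List<Node> nodes = new List<Node>();
    private readonly SortedDictionary<string, int> indexByName = new SortedDictionary<string, int>();

    public Node Add(string name)
    {
        int index = nodes.Count;
        var node = new Node(name, index);
        indexByName.Add(name, index); // throws ArgumentException if the name is already present
        nodes.Add(node);              // only reached if the name was new, so both stay in sync
        return node;
    }

    // Returns the node with the given name, or null if no such node exists.
    public Node Find(string name)
    {
        int index;
        return indexByName.TryGetValue(name, out index) ? nodes[index] : null;
    }

    // Names in sorted order, as enumerated by the SortedDictionary.
    public IEnumerable<string> SortedNames
    {
        get { return indexByName.Keys; }
    }
}
```

Lookups and inserts in a SortedDictionary are O(log n). It does not expose sorted positions, though, so if you also need "the i-th name in sorted order" a SortedList is the closer fit.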

You can use SortedList and LINQ:
SortedList<int,string> list = new SortedList<int, string>();
list.Add(1, "name1");
list.Add(2, "name2");
var c = list.FirstOrDefault(x => x.Value == "name2"); // requires using System.Linq; c is a KeyValuePair<int, string> and c.Key == 2
However, I agree with Christopher's comment about using a database.
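The snippet above keys the SortedList on the int, so it is sorted by index rather than by name, and the name lookup is a linear scan. Below is a sketch that keys a SortedList<string, int> on the name instead (the SortedListByName class is a hypothetical illustration). Besides lookup by key, SortedList exposes sorted positions through IndexOfKey, Keys[i] and Values[i].

```csharp
using System.Collections.Generic;

public static class SortedListByName
{
    // Keys on the name, so entries stay alphabetically sorted; the value is the
    // entry's index in the original, history-ordered list.
    public static SortedList<string, int> Build(IList<string> namesInHistoryOrder)
    {
        var byName = new SortedList<string, int>();
        for (int i = 0; i < namesInHistoryOrder.Count; i++)
        {
            byName.Add(namesInHistoryOrder[i], i); // throws ArgumentException on a duplicate name
        }
        return byName;
    }
}
```

For example, suppose the names were added as delta, alpha, charlie. Then byName["alpha"] is 1 (its history index), byName.IndexOfKey("charlie") is 1 (its sorted position), and byName.Keys[0] is "alpha". SortedList is array-backed, so inserting keys in unsorted order costs O(n) per insert. With 100,000 frequently added nodes, SortedDictionary (O(log n) inserts, but no positions) may be the better trade-off.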

Related

Which data structure should I use for having sorted data, where the key can be used on multiple entries

I have some set of data to store where I have a key to find the object related, but the key is not unique and the same key can have multiple pointers.
I would like to be able to throw this data into a structure a bit like SortedList, but SortedList requires unique keys.
Is there any out-of-the-box C# object that will allow me to do this? In the following example I am using a string as the referenced object, but it might just as well be any other object, just like a SortedList can contain any type of object.
eg.
unknownDatastructure<string, string> structure = new unknownDatastructure<string, string>();
structure.add("left","the remainder after a removal");
structure.add("left","a direction opposite of right");
structure.add("right","a direction opposite of left");
structure.add("right","opposite of being wrong");
and I would like to get back either an array of the objects that match the search key, or the first matching entry (and then find the remaining ones via the index).
eg.
List<string> results = structure.FindAll("left");
Where the List would contain the actual referenced objects, in this case "the remainder after a removal" and "a direction opposite of right".
int firstmatch = structure.FindIndex("right");
Where the int would be 2 (0 = first left, 1 = second left, 2 = first right, 3 = second right). I do not care in which order the two lefts or the two rights are sorted; I just need to be able to locate the first entry of a repeated key, and then I can step forward through the entries as required to create the "FindAll" function.
Dictionary<K,V> does not support duplicate keys. You will have to maintain a list of all entries for each key:
Dictionary<string, List<string>>
Just create your own:
Dictionary<TKey, List<TValue>> dictionary;
Then your code would look a little like:
List<TValue> list;
if(!dictionary.TryGetValue(key, out list))
{
list = new List<TValue>();
dictionary[key] = list;
}
list.Add(value); // value: the TValue being stored under key
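The same idea can be taken closer to what the question asks for. Below is a sketch of a hypothetical SortedMultiMap that stores a List<TValue> per key in a SortedDictionary (instead of a Dictionary), so keys come out sorted. It adds FindAll and FindIndex methods shaped like the ones in the question. FindIndex computes the position of a key's first entry in the flattened, key-ordered sequence by walking the keys, so it costs O(number of keys) rather than O(1).

```csharp
using System.Collections.Generic;

// Hypothetical sorted "multi-map": one list of values per key, keys kept sorted.
public class SortedMultiMap<TKey, TValue>
{
    private readonly SortedDictionary<TKey, List<TValue>> map = new SortedDictionary<TKey, List<TValue>>();

    public void Add(TKey key, TValue value)
    {
        List<TValue> list;
        if (!map.TryGetValue(key, out list))
        {
            list = new List<TValue>();
            map[key] = list;
        }
        list.Add(value);
    }

    // Copy of all values stored under key (empty list if the key is absent).
    public List<TValue> FindAll(TKey key)
    {
        List<TValue> list;
        return map.TryGetValue(key, out list) ? new List<TValue>(list) : new List<TValue>();
    }

    // Position of the key's first entry when all entries are laid out in key order,
    // or -1 if the key is absent.
    public int FindIndex(TKey key)
    {
        int position = 0;
        foreach (KeyValuePair<TKey, List<TValue>> pair in map)
        {
            int cmp = map.Comparer.Compare(pair.Key, key);
            if (cmp == 0) return position;
            if (cmp > 0) break;           // keys are sorted, so the key cannot appear later
            position += pair.Value.Count;
        }
        return -1;
    }
}
```

With the question's four entries, FindAll("left") returns both "left" values and FindIndex("right") returns 2.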
You could look into Tuple or Anonymous Types.
Is there any reason you don't want to create a class or struct to contain your data? It would allow you to easily add functionality later on.
Adding either of the above to a List<T> will let you find the element by using a predicate. myList.FindAll(x => x.MyKey == "left");
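Here is a sketch of the class-plus-List<T> approach, assuming a hypothetical Entry class with Key and Value properties. Sorting the list by key puts equal keys next to each other. The built-in List<T>.FindAll and List<T>.FindIndex then answer the question's two lookups.

```csharp
using System.Collections.Generic;

// Hypothetical class holding one key/value pair; a List<Entry> allows duplicate keys.
public class Entry
{
    public string Key { get; private set; }
    public string Value { get; private set; }

    public Entry(string key, string value)
    {
        Key = key;
        Value = value;
    }
}

public static class EntryList
{
    // Sorts entries by key (ordinal comparison) so equal keys end up adjacent.
    // List<T>.Sort is not stable, so the order within one key is unspecified,
    // which the question says is acceptable.
    public static void SortByKey(List<Entry> entries)
    {
        entries.Sort((a, b) => string.CompareOrdinal(a.Key, b.Key));
    }
}
```

After sorting, entries.FindAll(x => x.Key == "left") returns both "left" entries, and entries.FindIndex(x => x.Key == "right") returns the position of the first "right" (2 for the question's four entries). Both are linear scans. List<T>.BinarySearch can find a position in O(log n), but with duplicate keys it may return any one of the matches, not necessarily the first.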

Is there any way to loop through my sql results and store certain name/value pairs elsewhere in C#?

I have a large result set coming from a pretty complex SQL query. Among the values are a string which represents a location (that will later help me determine the page location that the value came from), an int which is a priority number calculated for each row based on other values from the row, and another string which contains a value I must remember for display later.
The problem is that the sql query is so complex (it has UNIONS, JOINS, and complex calculations with aliases) that I can't logically fit anything else into it without messing with the way it works.
Suffice it to say, though, that after the query is done and the calculations are performed, I need something that aggregate functions might solve, but that IS NOT an option, because not all of the columns come from aggregate functions.
I have been racking my brain for days now as to how I can iterate through the results and store a pair of values in a list (or in two separate lists tied together somehow), where one value is a distinct location and the other is the sum of all the priority values for that location. That is, as the results are looped through, no new list item should be created for a location value that has already been seen, HOWEVER, the priority values of all rows with an identical location must still be summed. Also, the results need to be ordered by priority in descending order (hence the problem with using two lists).
EXAMPLE:
EDIT: I forgot, the preserved value should be the value from the row with the highest priority from the sql query.
If I had the following results:
location  priority  value
--------  --------  ----------------
page1     1         some text!
page2     3         more text!
page2     4         even more text!
page3     3         text again
page3     1         text
page3     1         still more text!
page4     6         text
If I was able to do what I wanted I would be able to achieve something like this after iteration (and in this order):
location  priority  value
--------  --------  ----------------
page2     7         even more text!
page4     6         text
page3     5         text again
page1     1         some text!
I have done research after research after research but absolutely nothing really even gets close to solving this dilemma.
Is what I'm asking too tough for even the powerful C# language?
THINGS I HAVE CONSIDERED:
Looping through the SQL results, checking each location for repeats, adding together all the priority values as I go, and storing these two values plus the text value in two or three separate lists.
Why I still need help
I can't use a foreach because the logic didn't pan out, and I can't use a for loop because I can't access an IEnumerable (or whatever type it is that stores what's returned from Database.Open.Query()) by index (which makes sense, of course). Also, I need to sort on priority, but I can't let one list get out of sync with the others.
Using LINQ to select and store what I need
Why I still need help
I don't know LINQ (at all!) mainly because I don't understand lambda expressions (no matter HOW MUCH I read up about it).
Using an instantiated class to store the name/value pairs
Why I still need help
Not only do I expect sorting on this sort of thing to be impossible, but, while I do know how to use .cs files in my C#.NET web pages in the WebMatrix environment, I have mainly only ever used static classes, and I would also need a little refresher course on constructors and how to set this up appropriately.
Somehow fitting this functionality into the already sizeable and complex SQL query
Why I still need help
While this is probably where I would ideally like this functionality to be, I stress again that this IS NOT AN OPTION. I have tried using aggregate functions, but I only get an error saying that not all of the other columns come from aggregate functions.
Making another query based on values from the first query's result set
Why I still need help
I can't select distinct results based on only one column (i.e., location) alone.
Assuming I could get the loop logic correct, storing the values in a three-dimensional array
Why I still need help
I can't declare the array, because I do not know all of its dimensions before I need to use it.
Your post amazed me in a number of ways, such as saying that you are 'mostly using static classes' and that you expect sorting instantiated classes/objects to be impossible... really strange things to say. I can only respond with a quote from Charles Babbage:
I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Anyway, since you say you find lambdas hard, let's trace the problem in the classic 'manual' way.
Let's assume you have a list of ROWS that contains LOCATIONS and PRIORITIES.
List<DataRow> rows = .... ; // datatable, sqldatareader, whatever
You say you need:
- a list of unique locations
- a "list" of locations paired up with their summed-up priorities
Let's start with the first objective.
To gather a list of unique 'values', a HashSet is just perfect:
HashSet<string> locations = new HashSet<string>();
foreach(var row in rows)
locations.Add((string)row["LOCATION"]);
Well, that's all. After that, the locations HashSet will hold exactly the unique locations. Add does not create duplicate elements: the HashSet checks every value put into it and ignores one it already contains. One small tricky thing is that a HashSet has no [index] operator, so you have to enumerate it to get the values:
foreach(string loc in locations)
{
Console.WriteLine(loc);
}
or convert/rewrite it to a list:
List<string> locList = new List<string>(locations);
Console.WriteLine(locList[2]); // of course, assuming there were at least three..
Let's get to the second objective.
To gather a list of values related to some thing behaving like a "logical key", a Dictionary<Key,Val> may be useful. It allows you to store/associate a "value" with some "key", ie:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["mamma"]; // d == 123.45
dict["mamma"] += 101; // possible!
double e = dict["mamma"]; // e == 224.45
However, it has a behavior of happily throwing exceptions when you try to read from an unknown key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["daddy"]; // throws KeyNotFoundException
dict["daddy"] += 101; // would throw too! it tries to read the old/current value first
So you have to be very careful with keys that it does not yet know. Fortunately, you can always ask the dictionary whether it already knows a key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
bool knowIt = dict.ContainsKey("daddy"); // == false
So you can easily check-and-initialize-when-unknown:
Dictionary<string, double> dict = new Dictionary<string, double>();
bool knowIt = dict.ContainsKey("daddy"); // == false
if( !knowIt )
dict["daddy"] = 5;
dict["daddy"] += 101; // now 106
So.. let's try summing up the priorities location-wise:
Dictionary<string, double> prioSums = new Dictionary<string, double>();
foreach(var row in rows)
{
string location = (string)row["LOCATION"];
double priority = Convert.ToDouble(row["PRIORITY"]); // works whether the column holds an int or a double
if( ! prioSums.ContainsKey(location) )
// make sure that dictionary knows the location
prioSums[location] = 0.0;
prioSums[location] += priority;
}
And, really, that's all. Now the prioSums will know all locations and all sums of priorities:
var sss = prioSums["NewYork"]; // 9123, assuming NewYork was some location
However, having to hardcode all the locations would be quite useless. Luckily, you can also ask the dictionary which keys it currently knows:
foreach(string key in prioSums.Keys)
Console.WriteLine(key);
and you can use them immediately:
foreach(string key in prioSums.Keys)
{
Console.WriteLine(key);
Console.WriteLine(prioSums[key]);
}
that should print all locations with all their sums.
You may already have noticed an interesting thing: the dictionary can tell you which keys it has remembered. Hence, you do not need the HashSet from the first objective. Simply by summing up the priorities inside the Dictionary, you get the list of unique locations for free: just ask the dictionary for its keys.
EDIT:
I noticed you've had a few more requests (like sorting in descending order, or finding the value with the highest priority), but I think I'll leave them for now. If you understand how I used a dictionary to collect the priorities, then you can easily build a similar Dictionary<string,string> to collect the highest-ranking value for each location. And the descending order is easy once you take the values out of the dictionary and sort them as, e.g., a List. So I'll skip that for now... this text has gotten far too tl;dr already, I think :)
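Following that suggestion, here is a sketch of the two remaining steps in the same dictionary-based, lambda-free style. It assumes a hypothetical ResultRow class in place of DataRow, with an int priority as described in the question. Two extra dictionaries remember, per location, the highest single priority seen so far and the value from that row. The final list is sorted with a named comparison method instead of a lambda. On a tie for the highest priority within a location, the first row seen wins.

```csharp
using System.Collections.Generic;

// Hypothetical row type standing in for one row of the SQL result.
public class ResultRow
{
    public string Location;
    public int Priority;
    public string Value;

    public ResultRow(string location, int priority, string value)
    {
        Location = location;
        Priority = priority;
        Value = value;
    }
}

public static class ManualSummary
{
    // One row per location: summed priority plus the value from that location's
    // highest-priority row, ordered by summed priority, largest first.
    public static List<ResultRow> Summarize(IEnumerable<ResultRow> rows)
    {
        var prioSums = new Dictionary<string, int>();
        var bestPrio = new Dictionary<string, int>();     // highest single priority per location
        var bestValue = new Dictionary<string, string>(); // value from that highest-priority row

        foreach (ResultRow row in rows)
        {
            if (!prioSums.ContainsKey(row.Location))
            {
                // first time we see this location: initialize all three dictionaries
                prioSums[row.Location] = 0;
                bestPrio[row.Location] = row.Priority;
                bestValue[row.Location] = row.Value;
            }
            prioSums[row.Location] += row.Priority;
            if (row.Priority > bestPrio[row.Location])
            {
                bestPrio[row.Location] = row.Priority;
                bestValue[row.Location] = row.Value;
            }
        }

        var result = new List<ResultRow>();
        foreach (string location in prioSums.Keys)
        {
            result.Add(new ResultRow(location, prioSums[location], bestValue[location]));
        }
        result.Sort(CompareByPriorityDescending);
        return result;
    }

    // Named comparison instead of a lambda: the larger priority sorts first.
    private static int CompareByPriorityDescending(ResultRow a, ResultRow b)
    {
        return b.Priority.CompareTo(a.Priority);
    }
}
```

Fed the question's seven sample rows, this should yield page2/7/"even more text!", page4/6/"text", page3/5/"text again", page1/1/"some text!".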
LINQ is really the tool to use for this kind of problems.
Suppose you have a variable pages, which is an IEnumerable<Page>, where Page is a class with the properties location, priority and value. Then you could do:
var query = from page in pages
group page by page.location into grp
select new { location = grp.Key,
priority = grp.Sum(page => page.priority),
value = grp.OrderByDescending(page => page.priority)
.First().value
};
You say you don't understand LINQ, so let me try to explain this statement.
The rows are grouped by location, which results in 4 groups of pages, with page.location as the key of each group:
location  priority  value
--------  --------  ----------------
page1     1         some text!
page2     3         more text!
          4         even more text!
page3     3         text again
          1         text
          1         still more text!
page4     6         text
The select loops through these 4 groups and for each group it creates an anonymous type with 3 properties:
location: the key of the group
priority: the sum of priorities in one group
value: the first value in one group when its pages are sorted by priority in descending order.
Lambda expressions are a way to express which property a LINQ function like Sum should use. In short, page => page.priority says "transform page into page.priority".
You want these new rows in descending order of priority, so finally you can do
var result = query.OrderByDescending(x => x.priority).ToList();
The x is just an arbitrary placeholder representing one item in the collection in hand, query (likewise in the query above page could have been any word or character).
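Here it is put together as a self-contained sketch. The Page class is an assumption, and it keeps the answer's lowercase member names. The query is the one from the answer, except that it selects into Page rather than an anonymous type, so the method can return a List<Page>. LINQ to Objects keeps the source order inside each group, and OrderByDescending is a stable sort.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shape of Page; lowercase names kept to match the query in the answer.
public class Page
{
    public string location;
    public int priority;
    public string value;
}

public static class LinqSummary
{
    public static List<Page> Summarize(IEnumerable<Page> pages)
    {
        var query = from page in pages
                    group page by page.location into grp
                    select new Page
                    {
                        location = grp.Key,                                      // the group's key
                        priority = grp.Sum(page => page.priority),               // summed priority
                        value = grp.OrderByDescending(page => page.priority)     // highest-priority row's value
                                   .First().value
                    };

        return query.OrderByDescending(x => x.priority).ToList();
    }
}
```

Run on the question's seven sample rows, this should produce page2/7, page4/6, page3/5, page1/1, with the values from the desired output.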

Select an element by index from a .NET HashSet

At the moment I am using a custom class derived from HashSet. There's a point in the code when I select items under certain condition:
var c = clusters.Where(x => x.Label != null && x.Label.Equals(someLabel));
It works fine and I get those elements. But is there a way that I could receive an index of that element within the collection to use with ElementAt method, instead of whole objects?
It would look more or less like this:
var c = select element index in collection under certain condition;
int index = c.ElementAt(0); //get first index
clusters.ElementAt(index).RunObjectMethod();
Is manually iterating over the whole collection a better way? I need to add that it's in a bigger loop, so this Where clause is performed multiple times for different someLabel strings.
Edit
What do I need this for? clusters is a set of clusters over some collection of documents. Documents are grouped into clusters by topic similarity, so one of the last steps of the algorithm is to discover a label for each cluster. But the algorithm is not perfect, and sometimes it creates two or more clusters with the same label. What I want to do is simply merge those clusters into one big cluster.
Sets don't generally have indexes. If position is important to you, you should be using a List<T> instead of (or possibly as well as) a set.
Now SortedSet<T> in .NET 4 is slightly different, in that it maintains a sorted value order. However, it still doesn't implement IList<T>, so access by index with ElementAt is going to be slow.
If you could give more details about why you want this functionality, it would help. Your use case isn't really clear at the moment.
If you hold elements in a HashSet and sometimes need to get elements by index, consider using the ToList() extension method in those situations. That way you use the features of HashSet and can still take advantage of indexes when you need them.
HashSet<T> hashset = new HashSet<T>();
//the special situation where we need index way of getting elements
List<T> list = hashset.ToList();
//doing our special job, for example mapping the elements to EF entities collection (that was my case)
//we can still operate on hashset for example when we still want to keep uniqueness through the elements
There's no such thing as an index in a hash set. One of the ways hash sets gain efficiency in some cases is by not having to maintain one.
I also don't see what the advantage is here. If you were to obtain the index and then use it, this would be less efficient than just obtaining the element (obtaining the index would cost as much as obtaining the element, and then you'd have an extra operation).
If you want to do several operations on the same object, just hold onto that object.
If you want to do something on several objects, do so on the basis of iterating through them (normal foreach or doing foreach on the results of a Where() etc.). If you want to do something on several objects, and then do something else on those several same objects, and you have to do it in such batches, rather than doing all the operations in the same foreach then store the results of the Where() in a List<T>.
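For the merging use case in the question's edit, grouping once by label avoids running a separate Where() for every label. The sketch below assumes a hypothetical Cluster class with a Label and a list of Documents. It uses LINQ's GroupBy and SelectMany and returns new merged clusters rather than modifying the original set in place.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical cluster type: a label plus the documents assigned to it.
public class Cluster
{
    public string Label;
    public List<string> Documents = new List<string>();
}

public static class ClusterMerger
{
    // Merges all clusters that share the same non-null label into one cluster per label
    // (groups appear in order of each label's first occurrence). Clusters with a null
    // label are appended unchanged at the end.
    public static List<Cluster> MergeByLabel(IEnumerable<Cluster> clusters)
    {
        var merged = clusters
            .Where(c => c.Label != null)
            .GroupBy(c => c.Label)
            .Select(g => new Cluster
            {
                Label = g.Key,
                Documents = g.SelectMany(c => c.Documents).ToList()
            })
            .ToList();

        merged.AddRange(clusters.Where(c => c.Label == null));
        return merged;
    }
}
```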
Why not use a dictionary?
Dictionary<string, int> dic = new Dictionary<string, int>();
for (int i = 0; i < 10; i++)
{
dic.Add("value " + i, dic.Count + 1);
}
string find = "value 3";
int position = dic[find];
Console.WriteLine("the position of " + find + " is " + position);

Quickly or concisely determine the longest string per column in a row-based data collection

Judging from the result of my last inquiry, I need to calculate and preset the widths of a set of columns in a table that is being made into an Excel file. Unfortunately, the string data is stored in a row-based format, but the widths must be calculated in a column-based format. The data for the spreadsheets are generated from the following two collections:
var dictFiles = l.Items.Cast<SPListItem>().GroupBy(foo => foo.GetSafeSPValue("Category")).ToDictionary(bar => bar.Key);
StringDictionary dictCols = GetColumnsForItem(l.Title);
Where l is an SPList whose title determines which columns are used. Each SPListItem corresponds to a row of data, and the rows are sorted into separate worksheets based on Category (hence the dictionary). The second line is just a simple StringDictionary that has the column name (A, B, C, etc.) as a key and the corresponding SPListItem field display name as the corresponding value. So for each Category, I enumerate through dictFiles[somekey] to get all the rows in that sheet, and get the particular cell data using SPListItem.Fields[dictCols[colName]].
What I am asking is, is there a quick or concise method, for any one dictFiles[somekey], to retrieve a readout of the longest string in each column provided by dictCols? If it is impossible to get both quickness and conciseness, I can settle for either (since I always have the O(n*m) route of just enumerating the collection and updating an array whenever strCurrent.Length > strLongest.Length). For example, suppose I had a set of 3 items, and dictCols specified the fields Field1, Field2, and Field3. The goal table might look like the following:
Item#  Field1     Field2      Field3
1      Oarfish    Atmosphere  Pretty
2      Raven      Radiation   Adorable
3      Sunflower  Flowers     Cute
I'd like a function which could cleanly take the collection of items 1, 2, and 3 and output in the correct order...
Sunflower, Atmosphere, Adorable
Using .NET 3.5 and C# 3.0.
Unfortunately, searching for an item with the highest value for a specified observable on a set of n items means that each item must be checked. So the complexity is O(n).
This holds if no assumption is made about the collection's order.
Having m collections to scan, the complexity (as you already figured out) is O(m x n).
EDIT [Erik Burigo]: This part of the answer has been removed because it did not address the question's needs.
[omissis]
After having misunderstood the question, I finally caught the point.
I can't see a more compact and elegant syntax than the one I'm proposing below.
var collection =
new List<Dictionary<String, String>>
{
new Dictionary<string, string> {{"Field1", "Oarfish"}, {"Field2", "Atmosphere"}, {"Field3", "Pretty"}},
new Dictionary<string, string> {{"Field1", "Raven"}, {"Field2", "Radiation"}, {"Field3", "Adorable"}},
new Dictionary<string, string> {{"Field1", "Sunflower"}, {"Field2", "Flowers"}, {"Field3", "Cute"}}
};
var fields = new[] {"Field1", "Field2", "Field3"};
var maximums = new List<String>(fields.Length);
foreach (var field in fields)
{
maximums.Add(Field(collection, field).OrderByDescending(fieldItem => fieldItem.Length).First());
}
where
static IEnumerable<String> Field(IEnumerable<Dictionary<String, String>> collection, String field)
{
foreach (var row in collection)
{
yield return row[field];
}
}
Thus relying on an accumulator list.
This solution requires the number of fields not to vary from row to row (which seems to be the case).
However, using an accumulator and an invoked method is not truly compact. What you really need is a streamlined way to transpose your data structure in order to find the longest string for each field. As far as I know there is no shortcut in the framework to do this, so the resulting method (Field(...)) will be tailored to your specific data structure (a collection of string-indexed strings).
Being so, the Field(...) method could be further enhanced by making it provide the longest string, thus shortening the overall invocation statement. So, the more work we put in that specific method, the more the solution approaches what you already had in mind before posting the question.
EDIT [Erik Burigo]: Changed in order to make the collection more similar to the one posted in the question.
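For comparison, here is a more compact LINQ sketch over the same List<Dictionary<string, string>> shape (the LongestPerField name is hypothetical). Instead of sorting each column, Aggregate keeps a running longest string, so each field is scanned once. It uses only Select, Aggregate and ToList, which are available in .NET 3.5. It assumes that every row has every field and that at least one row exists: Aggregate throws InvalidOperationException on an empty sequence, and the indexer throws KeyNotFoundException for a missing field.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LongestPerField
{
    // For each field name, the longest string in that field across all rows,
    // returned in the same order as fields. Still O(n x m) overall, but no sorting.
    // On equal lengths the earlier row's string is kept.
    public static List<string> Find(IEnumerable<Dictionary<string, string>> rows, IEnumerable<string> fields)
    {
        return fields
            .Select(field => rows
                .Select(row => row[field])
                .Aggregate((longest, next) => next.Length > longest.Length ? next : longest))
            .ToList();
    }
}
```

With the collection and fields array from the answer, LongestPerField.Find(collection, fields) should return Sunflower, Atmosphere, Adorable.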

loading a doubly linked list from a SQL database

I have a QueueList table:
{Id, Queue_Instance_ID, Parent_Queue_Instance_ID, Child_Queue_Instance_ID}
What would be the most efficient way to stuff this into a LinkedList<QueueList>? Do you think I can go lower than O(n^2)?
O(n) is better than O(n^2), right? ;)
I assume that the parent and child id are actually the previous and next id in the list.
First put the items in a dictionary so that you easily can look them up on the QueueInstanceId. In this loop you also locate the first item. Then you just add the items to the linked list and use the dictionary to get the next item. Example in C#:
Dictionary<int, QueueList> lookup = new Dictionary<int, QueueList>();
QueueList first = null;
foreach (QueueList item in source) {
lookup.Add(item.QueueInstanceId, item);
if (item.ParentQueueInstanceId == -1) {
first = item;
}
}
LinkedList<QueueList> list = new LinkedList<QueueList>();
do {
list.AddLast(first);
} while (lookup.TryGetValue(first.ChildQueueInstanceId, out first));
Adding items to a dictionary and getting them by key are O(1) operations, and each loop is the length of the source list, so it's all an O(n) operation.
Thinking about this a little bit, you can do it in O(n) time, although it will take some extra stuff that will add some overhead. (Still O(n)).
You could use some sort of universal hashing scheme so that you can build a good hashtable. Essentially, hash by the instance id.
- Take each queue entry and put it into a doubly linked node of some sort.
- Insert each node into the hashtable.
- When you insert, check whether the parent and/or child are already in the hashtable and link accordingly.
- When you are done, start at the head (which you should be able to identify during the first pass), walk the list and add it to your LinkedList, or just use the resulting doubly linked list.
This takes one pass that is O(n), and the hashtable takes O(n) to initialize and inserts are O(1).
The problem left for you is choosing a good hashtable implementation that will give you those results.
Also, you have to consider the memory overhead, as I do not know how many results you expect. Hopefully few enough to keep in memory.
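The hashtable steps above can be sketched as follows. This uses .NET's Dictionary<int, QueueNode> as the hash table rather than a custom universal-hashing scheme. It also assumes hypothetical QueueRow/QueueNode types, and that -1 marks a missing parent or child (the table's actual sentinel, e.g. NULL, isn't stated). Each row is linked at insert time to whichever neighbours have already been inserted, so rows can arrive in any order.

```csharp
using System.Collections.Generic;

// Hypothetical row shape for the QueueList table; -1 (an assumption) means "none".
public class QueueRow
{
    public int QueueInstanceId;
    public int ParentQueueInstanceId;
    public int ChildQueueInstanceId;
}

// A doubly linked node wrapping one row.
public class QueueNode
{
    public QueueRow Row;
    public QueueNode Prev;
    public QueueNode Next;
}

public static class QueueLinker
{
    // Single O(n) pass: wrap each row in a node, store it by instance id, and link it
    // to its parent/child if they were inserted earlier. Returns the head (no parent).
    public static QueueNode Link(IEnumerable<QueueRow> rows)
    {
        var byId = new Dictionary<int, QueueNode>();
        QueueNode head = null;

        foreach (QueueRow row in rows)
        {
            var node = new QueueNode { Row = row };
            byId.Add(row.QueueInstanceId, node);

            QueueNode parent;
            if (byId.TryGetValue(row.ParentQueueInstanceId, out parent))
            {
                parent.Next = node;
                node.Prev = parent;
            }
            QueueNode child;
            if (byId.TryGetValue(row.ChildQueueInstanceId, out child))
            {
                node.Next = child;
                child.Prev = node;
            }
            if (row.ParentQueueInstanceId == -1) head = node;
        }
        return head;
    }

    // Walks from the head and copies the rows into a LinkedList<QueueRow>.
    public static LinkedList<QueueRow> ToLinkedList(QueueNode head)
    {
        var list = new LinkedList<QueueRow>();
        for (QueueNode n = head; n != null; n = n.Next)
        {
            list.AddLast(n.Row);
        }
        return list;
    }
}
```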
I don't know of or think there is an SQL query based solution, but my SQL knowledge is very limited.
Hope this helps!
