Linq to mess up order of an array? - c#

Update: False alarm! The source of the error was elsewhere. (See at the end of the question.)
Is it possible that either Linq or foreach can mess up the order of an array?
One of my testers reported that the order of a list of items he fed in as input didn't match the order of the final list that was saved in the database. More precisely, the first element became the last.
So I reviewed my code step by step, and I have no clue what should change the order. There is a Linq query and a foreach loop however. Is it possible that one of these can mess up the order of an array?
Code, simplified:
List<FooBar> fooBarList = new List<FooBar>();
string[][] theData = new string[][] {
new string[] { "a", "x" },
new string[] { "b", "y" },
new string[] { "c", "z" } };
FooBar[] fooBarArray = theData.Select(
row => new FooBar { Foo = row[0], Bar = row[1] }
).ToArray();
foreach (FooBar item in fooBarArray)
{
int iRank = fooBarList.Count + 1;
item.Ranking = iRank;
fooBarList.Add(item);
}
The array of arrays of strings theData is in fact given as an input. It is transformed into an array of business objects. These are then added to a list and assigned a ranking field. This field is written to the database together with "Foo" and "Bar".
After saving the list in the database, the rank of "a" was 3 in that particular case. For me, however, I cannot reproduce the misbehavior...
Update: I was wrong, the data written to the database was correct. The data I looked at was from a business object that was copied from the original one. When copying, the order was mixed up while reading it from the database, and this wrong order was then persisted in the copy of the object... => Accepted Jon's answer saying "LINQ to Objects generally has a predictable ordering - other providers often don't."

Well, your sample code is only showing LINQ to Objects. How are you inserting the data into the database? If that's using LINQ as well, I strongly suspect that it's the LINQ to SQL (or whatever) side which is causing the issue, not LINQ to Objects.
LINQ to Objects generally has a predictable ordering - other providers often don't.
EDIT: If this happens reproducibly, then you ought to be able to catch it happening in the debugger... that should give you some hints. I suspect that if you try to create a short but complete program which demonstrates the problem, you'll end up finding out what's wrong.
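A short but complete program of the kind suggested here could look like the sketch below. It assumes value tuples in place of the FooBar class (the class itself is not shown in the question) and only checks the in-memory part: Select over an array followed by foreach, with no database involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same input as in the question; tuples stand in for the FooBar class (assumption).
string[][] theData =
{
    new[] { "a", "x" },
    new[] { "b", "y" },
    new[] { "c", "z" },
};

// LINQ to Objects: Select over an array yields elements in source order.
var projected = theData.Select(row => (Foo: row[0], Bar: row[1])).ToArray();

// Assign the ranking exactly as the question does: current count + 1.
var ranked = new List<(string Foo, string Bar, int Ranking)>();
foreach (var item in projected)
{
    ranked.Add((item.Foo, item.Bar, ranked.Count + 1));
}

foreach (var item in ranked)
{
    Console.WriteLine($"{item.Foo} {item.Bar} {item.Ranking}");
}
```

If this prints the rows in input order with rankings 1, 2, 3, the reordering has to come from a later step, such as the save or a subsequent read from the database.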

Related

Textual Mining on the column Cell of Table that remove the Duplicates based on "##" notation

Let's assume I have a table in SQL Server that represents employee information.
I want to do text mining on the Degree column to remove the duplicates based on the "##" notation.
LINQ to SQL
I am using LINQ to SQL, so I am planning to load this data into a C# variable via the data context, perform the operation on the string, and store the result back in the same location.
Rules: I need to either update the data or generate a new table.
Is this the right way of doing it, and is it even possible? I need some suggestions on this approach; alternative suggestions are welcome.
So it looks like you need to break up the string based on the "##" delimiters, take the distinct items, and put them back in -- comma-delimited this time? The String.Split method to break up the string and then LINQ's Distinct extension method should get you just the unique ones.
Assuming you've got the text of the degree in a variable somewhere:
var uniques = degree
.Split(new String[] { "##" }, StringSplitOptions.None)
.Distinct();
String.Split usually works with a single character delimiter, but there's an overload that allows splitting on a larger string, so you'll have to use that one.
Then you can use String.Join to comma-delimit the unique items, or whatever else you need to do.
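Put together, the split, de-duplicate, and join steps could look like this minimal sketch. The sample degree string is made up for illustration; the real column contents are not shown in the question.

```csharp
using System;
using System.Linq;

// Hypothetical column value with a duplicated entry.
string degree = "BSc##MSc##BSc##PhD";

// Split on the two-character delimiter using the string[] overload.
string[] parts = degree.Split(new[] { "##" }, StringSplitOptions.None);

// Drop duplicates, then join the remaining items with commas.
string cleaned = string.Join(",", parts.Distinct());

Console.WriteLine(cleaned);
```

Passing StringSplitOptions.RemoveEmptyEntries instead of None would also drop empty items caused by a leading or trailing "##".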
Edit: Apologies, I thought your original question was more about how to eliminate the duplicates than how to use LINQ to SQL.
Assuming you've got your DataContext and object model set up, you just need to select your object(s) out of the database using LINQ to SQL, make the changes you need to them, and then call SubmitChanges() on the context.
For example:
var degrees = from d in context.GetTable<Employee>() select d;
foreach (var d in degrees)
{
d.Degree = String.Join(",", d.Degree
.Split(new String[] { "##" }, StringSplitOptions.None)
.Distinct());
}
context.SubmitChanges();
If you're new to LINQ to SQL, it may be worthwhile to run through a tutorial or two first. Here's part 1 of a pretty good series:
Lastly, you mentioned in your edit that you have the option of creating a new table after making your changes -- if that's the case, I'd consider storing the individual degrees in a table that links back to the employee record, rather than storing them as comma-separated values. It depends on your needs, of course, but SQL is designed to work in tables and sets, so the less string parsing/processing you can do the better.
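As a rough illustration of the linked-table idea, the sketch below flattens each employee's Degree string into one (EmployeeId, Degree) pair per distinct degree. The sample data and the names Id, EmployeeId, and employeeDegrees are assumptions; it runs in memory and does not touch a database.

```csharp
using System;
using System.Linq;

// Hypothetical employee rows; only the id and the raw Degree string matter here.
var employees = new[]
{
    (Id: 1, Degree: "BSc##MSc##BSc"),
    (Id: 2, Degree: "PhD"),
};

// One output row per employee and distinct degree, ready for a child table.
var employeeDegrees = employees
    .SelectMany(e => e.Degree
        .Split(new[] { "##" }, StringSplitOptions.RemoveEmptyEntries)
        .Distinct()
        .Select(d => (EmployeeId: e.Id, Degree: d)))
    .ToList();

foreach (var row in employeeDegrees)
{
    Console.WriteLine($"{row.EmployeeId}: {row.Degree}");
}
```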
Good luck!

What's the best way to normalize between two lists using Linq?

Given two IEnumerable instances in C# (call them a and b), what's the best way to remove all values in a that are not in b and add all values of b that are not in a?
I know I could just set a = b, normally, but this is ultimately for persisting to the DB via Entity Framework CodeFirst in an MVC application so there's some wonkiness of state to watch out for. In fact, we're talking about updating a record based on stuff posted from the client.
The closest approach that seems to work involves about four foreach loops: one to iterate the 'a' list and populate a collection of 'items to be removed', another to iterate the 'b' list to identify the 'items to be added', and then one each on the 'items to be removed' and 'items to be added' collections to remove and add items, respectively (since you can't modify the 'a' collection while you're iterating on it).
That feels clunky, though; is there a better way?
UPDATE
For clarity, I'll make an example. Let's say I have an entity I fetch from the DB which represents a blog Post (since that example never gets tired...) and said Post has a collection of Tags. From the client, I get a list of Tags that should be now the 'canonical' list of tags, but none of them are entities, it's just an in-memory collection. What I want to do is ensure that Post.Tags matches the tags being posted by the client, without creating duplicate tags in the database.
You can use Intersect, Concat and Except:
a = a.Intersect(b).Concat(b.Except(a));
Intersect returns items that exist in both collections, so a.Intersect(b) will give you all items that are in a and b.
Except returns elements that are in first collection, but not in the other, so b.Except(a) returns elements that are in b but not in a.
Concat concatenates these two collections.
But I don't really get your questions, so I'm not sure it's what you're looking for.
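A small sketch of the one-liner with plain integers (the sample values are made up) shows what each piece contributes. ToList() is added only to materialize the result before printing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<int> a = new[] { 1, 2, 3 };
IEnumerable<int> b = new[] { 1, 3, 4, 5 };

var kept = a.Intersect(b);   // items present in both a and b
var added = b.Except(a);     // items in b that a does not have yet

// Same expression as above, split into named parts and materialized.
a = kept.Concat(added).ToList();

Console.WriteLine(string.Join(", ", a)); // 1, 3, 4, 5
```

Note that this builds a new sequence rather than modifying the original collection, which may matter when the collection is a tracked navigation property.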
It sounds like your enumerable a is actually a list, so I did it this way:
var a = new List<int>() { 1, 2, 3, };
var b = new List<int>() { 1, 3, 4, 5, };
foreach (var x in a.Except(b).ToArray())
{
a.Remove(x);
}
foreach (var x in b.Except(a).ToArray())
{
a.Add(x);
}
At the end a has the same elements as b.
However, you need to be careful if you have duplicates in b.
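The duplicate caveat can be seen in a short sketch with made-up values: Except treats its inputs as sets, so a value that appears twice in b is added to a at most once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new List<int> { 1 };
var b = new List<int> { 1, 1, 2 };   // note the duplicated 1

// Same two loops as above.
foreach (var x in a.Except(b).ToArray())
{
    a.Remove(x);
}
foreach (var x in b.Except(a).ToArray())
{
    a.Add(x);
}

Console.WriteLine(string.Join(", ", a));   // 1, 2
Console.WriteLine(a.Count == b.Count);     // False: the second 1 was not copied
```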

Is there any way to loop through my sql results and store certain name/value pairs elsewhere in C#?

I have a large result set coming from a pretty complex SQL query. Among the values are a string which represents a location (that will later help me determine the page location that the value came from), an int which is a priority number calculated for each row based on other values from the row, and another string which contains a value I must remember for display later.
The problem is that the sql query is so complex (it has UNIONS, JOINS, and complex calculations with aliases) that I can't logically fit anything else into it without messing with the way it works.
Suffice it to say, though, after the query is done and the calculations performed, I need something that perhaps aggregate functions might solve, but that IS NOT an option, as all the columns do not come from other aggregate functions.
I have been wracking my brain for days now as to how I can iterate through the results, store a pair of values in a list (or two separate lists tied together somehow) where one value is the sum of all the priority values for each location and the other value is a distinct location value (i.e., as the results are looped through, it will not create another list item with the same location value that has been used before, HOWEVER, it does still need the sum of all of the other priority values from locations that ARE identical). Also, the results need to be ordered by priority in Descending order (hence the problem with using two lists).
EXAMPLE:
EDIT: I forgot, the preserved value should be the value from the row with the highest priority from the sql query.
If I had the following results:
location    priority    value
--------------------------------------------------------------------------------
page1       1           some text!
page2       3           more text!
page2       4           even more text!
page3       3           text again
page3       1           text
page3       1           still more text!
page4       6           text
If I was able to do what I wanted I would be able to achieve something like this after iteration (and in this order):
location    priority    value
--------------------------------------------------------------------------------
page2       7           even more text!
page4       6           text
page3       5           text again
page1       1           some text!
I have done research after research after research but absolutely nothing really even gets close to solving this dilemma.
Is what I'm asking too tough for even the powerful C# language?
THINGS I HAVE CONSIDERED:
Looping through the sql results and checking each location for repeats, adding together all priority values as I go, and storing these two plus value in two or three separate lists.
Why I still need help
I can't use a foreach because the logic didn't pan out, and I can't use a for loop because I can't access an IEnumerable (or whatever type it is that stores what's returned from Database.Open.Query()) by index (this makes sense, of course). Also, I need to sort on priority, but can't get one list out of sync with the others.
Using LINQ to select and store what I need
Why I still need help
I don't know LINQ (at all!) mainly because I don't understand lambda expressions (no matter HOW MUCH I read up about it).
Using an instantiated class to store the name/value pairs
Why I still need help
I expect sorting on this sort of thing to be impossible. Also, while I do know how to use .cs files in my C#.NET web pages in the WebMatrix environment, I have mainly only ever used static classes and would need a little refresher course on constructors and how to set this up appropriately.
Somehow fitting this functionality into the already sizeable and complex SQL query
Why I still need help
While this is probably where I would ideally like this functionality to be, I stress again that this IS NOT AN OPTION. I have tried using aggregate functions, but only get an error saying how not all the other columns come from aggregate functions.
Making another query based on values from the first query's result set
Why I still need help
I can't select distinct results based on only one column (i.e., location) alone.
Assuming I could get the loop logic correct, storing the values in a 3 dimensional array
Why I still need help
I can't declare the array, because I do not know all of its dimensions before I need to use it.
Your post has amazed me in a number of ways, such as saying that you 'mostly use static classes' and that you 'expect instantiating a class/object to be impossible'. Really strange things to say. I can only respond with a quote from Charles Babbage:
I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Anyways.. As you say you find lambdas hard, let's trace the problem in the classic 'manual' way.
Let's assume you have a list of ROWS that contains LOCATIONS and PRIORITIES.
List<DataRow> rows = .... ; // datatable, sqldatareader, whatever
You say you need:
list of unique locations
a "list" of locations paired up with summed up priorities
Let's start with the first objective.
To gather a list of unique 'values', a HashSet is just perfect:
HashSet<string> locations = new HashSet<string>();
foreach(var row in rows)
    locations.Add( (string)row["LOCATION"] ); // index the current row, not the whole list
And that's all. After that, the locations hashset will hold only the unique locations. The "Add" does not result in duplicate elements. The HashSet checks and "uniquifies" all values that are put inside it. One small tricky thing is that the hashset does not have the [index] operator. You'll have to enumerate the hashset to get the values:
foreach(string loc in locations)
{
Console.WriteLine(loc);
}
or convert/rewrite it to a list:
List<string> locList = new List<string>(locations);
Console.WriteLine(locList[2]); // of course, assuming there were at least three..
Let's get to the second objective.
To gather a list of values related to some thing behaving like a "logical key", a Dictionary<Key,Val> may be useful. It allows you to store/associate a "value" with some "key", ie:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["mamma"]; // d == 123.45
dict["mamma"] += 101; // possible!
double e = dict["mamma"]; // e == 224.45
However, it has a behavior of happily throwing exceptions when you try to read from an unknown key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
double d = dict["daddy"]; // throws KeyNotFoundException
dict["daddy"] += 101; // would throw too! tries to read the old/current value!
So, one has to be very careful with "keys" that the dictionary does not yet know. Fortunately, you can always ask the dictionary if it already knows a key:
Dictionary<string, double> dict = new Dictionary<string, double>();
dict["mamma"] = 123.45;
bool knowIt = dict.ContainsKey("daddy"); // == false
So you can easily check-and-initialize-when-unknown:
Dictionary<string, double> dict = new Dictionary<string, double>();
bool knowIt = dict.ContainsKey("daddy"); // == false
if( !knowIt )
dict["daddy"] = 5;
dict["daddy"] += 101; // now 106
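As an alternative to the ContainsKey check, Dictionary also offers TryGetValue, which looks the key up once and reports whether it was found. A minimal sketch, starting the missing key at zero rather than 5 as an assumption:

```csharp
using System;
using System.Collections.Generic;

var dict = new Dictionary<string, double>();

// TryGetValue returns false for an unknown key and sets 'current' to 0.0,
// so no exception is thrown and the addition starts from zero.
dict.TryGetValue("daddy", out double current);
dict["daddy"] = current + 101;

// Second round: the key is known now, so 'current' receives the stored 101.
dict.TryGetValue("daddy", out current);
dict["daddy"] = current + 101;

Console.WriteLine(dict["daddy"]); // 202
```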
So.. let's try summing up the priorities location-wise:
Dictionary<string, double> prioSums = new Dictionary<string, double>();
foreach(var row in rows)
{
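string location = (string)row["LOCATION"];
// Convert.ToDouble also handles an int column; a direct (double) cast on a boxed int would throw
double priority = Convert.ToDouble(row["PRIORITY"]);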
if( ! prioSums.ContainsKey(location) )
// make sure that dictionary knows the location
prioSums[location] = 0.0;
prioSums[location] += priority;
}
And, really, that's all. Now the prioSums will know all locations and all sums of priorities:
var sss = prioSums["NewYork"]; // 9123, assuming NewYork was some location
However, it would be quite useless to have to hardcode all locations. Hence, you can also ask the dictionary which keys it currently knows:
foreach(string key in prioSums.Keys)
Console.WriteLine(key);
and you can immediately use it:
foreach(string key in prioSums.Keys)
{
Console.WriteLine(key);
Console.WriteLine(prioSums[key]);
}
That should print all locations with all their sums.
You might have already noticed an interesting thing: the dictionary can tell you which keys it has remembered. Hence, you do not need the HashSet from the first objective. Simply by summing up the priorities inside the Dictionary, you get the unique list of locations for free: just ask the dict for its keys.
EDIT:
I noticed you have a few more requirements (such as sorting in descending order and finding the value with the highest priority), but I think I'll leave them for now. If you understand how I used a dictionary to collect the priorities, then you will easily build a similar Dictionary<string,string> to collect the highest-ranking value for each location. And the descending order is easy to get if you take the entries out of the dictionary and sort them as, e.g., a List. So I'll skip that for now. This text has already gotten far too long, I think :)
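For completeness, the skipped parts could be sketched as follows, still without any lambdas. The rows are the sample data from the question held as tuples (an assumption, since the real rows come from a query), priorities are kept as int, and bestPrio, bestValue, and CompareBySumDescending are names invented for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Sample rows from the question as (location, priority, value) tuples.
var rows = new List<(string Location, int Priority, string Value)>
{
    ("page1", 1, "some text!"),
    ("page2", 3, "more text!"),
    ("page2", 4, "even more text!"),
    ("page3", 3, "text again"),
    ("page3", 1, "text"),
    ("page3", 1, "still more text!"),
    ("page4", 6, "text"),
};

var prioSums = new Dictionary<string, int>();     // location -> summed priority
var bestPrio = new Dictionary<string, int>();     // location -> highest single priority
var bestValue = new Dictionary<string, string>(); // location -> value of that row

foreach (var row in rows)
{
    if (!prioSums.ContainsKey(row.Location))
    {
        prioSums[row.Location] = 0;
        bestPrio[row.Location] = int.MinValue;
    }
    prioSums[row.Location] += row.Priority;

    // Remember the value of the highest-priority row seen so far.
    if (row.Priority > bestPrio[row.Location])
    {
        bestPrio[row.Location] = row.Priority;
        bestValue[row.Location] = row.Value;
    }
}

// Copy the (location, sum) pairs into a list and sort it by sum, descending.
var summary = new List<KeyValuePair<string, int>>(prioSums);
summary.Sort(CompareBySumDescending);

foreach (var entry in summary)
{
    Console.WriteLine($"{entry.Key} {entry.Value} {bestValue[entry.Key]}");
}

// Named comparison method, used instead of a lambda expression.
static int CompareBySumDescending(KeyValuePair<string, int> x, KeyValuePair<string, int> y)
{
    return y.Value.CompareTo(x.Value);
}
```

Keeping everything keyed by location avoids the "lists getting out of sync" worry: only one list is sorted, and the value is looked up by key afterwards.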
LINQ is really the tool to use for this kind of problem.
Suppose you have a variable pages which is an IEnumerable<Page>, where Page is a class with properties location, priority and value. Then you could do:
var query = from page in pages
group page by page.location into grp
select new { location = grp.Key,
priority = grp.Sum(page => page.priority),
value = grp.OrderByDescending(page => page.priority)
.First().value
};
You say you don't understand LINQ, so let me try to explain this statement.
The rows are grouped by location, which results in 4 groups of pages, with page.location as the key:
location    priority    value
--------------------------------------
page1       1           some text!
page2       3           more text!
            4           even more text!
page3       1           text
            1           still more text!
            3           text again
page4       6           text
The select loops through these 4 groups and for each group it creates an anonymous type with 3 properties:
location: the key of the group
priority: the sum of priorities in one group
value: the first value in one group when its pages are sorted by priority in descending order.
The lambda expressions are a way to express which property should be used by a LINQ function like Sum. In short, they say "transform page to page.priority": page => page.priority.
You want these new rows in descending order of priority, so finally you can do
var result = query.OrderByDescending(x => x.priority).ToList();
The x is just an arbitrary placeholder representing one item in the collection in hand, query (likewise in the query above page could have been any word or character).
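To try the query end to end, here is a self-contained sketch using the sample rows from the question. It assumes value tuples in place of the Page class and uses p as the lambda placeholder; everything else follows the query above.

```csharp
using System;
using System.Linq;

// Sample rows from the question; tuples stand in for the Page class (assumption).
var pages = new[]
{
    (location: "page1", priority: 1, value: "some text!"),
    (location: "page2", priority: 3, value: "more text!"),
    (location: "page2", priority: 4, value: "even more text!"),
    (location: "page3", priority: 3, value: "text again"),
    (location: "page3", priority: 1, value: "text"),
    (location: "page3", priority: 1, value: "still more text!"),
    (location: "page4", priority: 6, value: "text"),
};

var query = from page in pages
            group page by page.location into grp
            select new
            {
                location = grp.Key,
                priority = grp.Sum(p => p.priority),
                value = grp.OrderByDescending(p => p.priority).First().value
            };

// Sort the grouped rows by summed priority, highest first.
var result = query.OrderByDescending(x => x.priority).ToList();

foreach (var r in result)
{
    Console.WriteLine($"{r.location} {r.priority} {r.value}");
}
```

With the sample data, the sums are 7, 6, 5 and 1, so the rows come out as page2, page4, page3, page1, matching the desired output in the question.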

Help with linq to sql compiled query

I am trying to use a compiled query for one of my LINQ to SQL queries. This query contains 5 to 6 joins. I was able to create the compiled query, but the issue I am facing is that my query needs to check whether the key is within a collection of keys passed as input, and compiled queries do not allow passing a collection (since a collection can have a varying number of items).
For instance, the input to the function is a collection of keys, say List<Guid> InputKeys:
List<SomeClass> output = null;
var compiledQuery = CompiledQuery.Compile<DataContext, List<Guid>, IQueryable<SomeClass>>(
(context, inputKeys) => from a in context.GetTable<A>()
where inputKeys.Contains(a.Key)
select a);
using(var dataContext = new DataContext())
{
output = compiledQuery(dataContext, InputKeys).ToList();
}
return output;
The above query does not compile since it is taking list as one of the inputs. Is there any work around or better way to do the above?
I'm not sure this is possible using only Linq to SQL. I think you'll need to have a stored procedure or function written on the server that lets you pass in a delimited string representing your list, and parses that returning a table, which you can then compare against.
I think the easiest way to accomplish this would be to write (or have your DBA write) the entire thing as a stored procedure, which will still need to take your list as a string for its argument, calling the aforementioned splitter function. The stored procedure will have its execution plan precompiled by the server.
You can easily make your list into a string using Linq with something like
string[] strings = new string[4] { "1", "2", "3", "4" };
string listOfStrings = strings.Aggregate((acc, s) => acc + ", " + s);
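If the list elements already are strings (for example, held as object), you can turn the list into an IEnumerable of strings with Cast<string>(). For other element types, such as Guid, Cast<string>() throws an InvalidCastException, so project with ToString() instead:
IEnumerable<string> strings = list.Select(x => x.ToString());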
You can then add your stored procedure to your dbml file and call it using Linq to SQL.
I seem to recall that Linq to SQL, in order to remain general, doesn't deal with lists of things, and converts all lists you pass into a parameter for each entry in the list.
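For the List<Guid> from the question, building the delimited argument could look like the sketch below. The keys are generated on the spot for illustration, and the parsing step only mirrors in C# what a server-side splitter function would have to do; no stored procedure is called here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input keys, standing in for the question's InputKeys.
var inputKeys = new List<Guid> { Guid.NewGuid(), Guid.NewGuid(), Guid.NewGuid() };

// string.Join calls ToString() on each Guid; the default format contains no commas.
string delimited = string.Join(",", inputKeys);
Console.WriteLine(delimited);

// Splitting and parsing recovers the original keys.
List<Guid> roundTripped = delimited.Split(',').Select(s => Guid.Parse(s)).ToList();
Console.WriteLine(roundTripped.SequenceEqual(inputKeys)); // True
```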

When Where clause is used inside Linq statement produces different results than when used outside

I have the following statement:
List<string> tracks = new List<string> { "ABC", "DEF" };
var items = (from i in Agenda.AgendaSessions
select i).Where(p => p.Tracks.Any(s => tracks.Contains(s.Code)));
This returns all sessions whose tracks contain either ABC or DEF. When I rewrite the statement as follows, it returns all sessions regardless, as if the clause always evaluates to true. Can anyone shed any light on this, please?
var items = from i in Agenda.AgendaSessions
where i.Tracks.Any(s=> tracks.Contains(s.Code))
select i;
Update
if there are other clauses within the where, does that affect the results?
The two code snippets are equivalent, i.e. they should always produce the same results under all circumstances. Of course, that assumes that AgendaSessions, Tracks and .Contains() are what we expect them to be; if they are property getters/methods which have curious side-effects such as modifying the contents of tracks, then anything could happen.
In other words, without knowing what the rest of your code looks like, we cannot help you, because there is no semantic difference between the two code snippets.
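The equivalence is easy to check against in-memory data. The sketch below assumes a simplified model in which each session is a tuple with a Name and an array of track codes (the real AgendaSessions and Tracks types are not shown in the question), and runs both forms side by side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var tracks = new List<string> { "ABC", "DEF" };

// Hypothetical sessions; each track is reduced to its code string.
var sessions = new[]
{
    (Name: "S1", Tracks: new[] { "ABC" }),
    (Name: "S2", Tracks: new[] { "XYZ" }),
    (Name: "S3", Tracks: new[] { "DEF", "XYZ" }),
};

// Form 1: Where applied outside the query expression.
var outside = (from i in sessions
               select i).Where(p => p.Tracks.Any(code => tracks.Contains(code)));

// Form 2: where clause inside the query expression.
var inside = from i in sessions
             where i.Tracks.Any(code => tracks.Contains(code))
             select i;

Console.WriteLine(string.Join(", ", outside.Select(s => s.Name))); // S1, S3
Console.WriteLine(string.Join(", ", inside.Select(s => s.Name)));  // S1, S3
```

If the two forms disagree in the real application, the difference most likely lies in code not shown, for example additional clauses or a query provider that translates them differently.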
