C# Console App - LINQ to find matching values between two IEnumerables

I'm having some issues constructing a LINQ query to find MATCHING values in two IEnumerables that I read from CSV files, and outputting those matching values to another list for some bookkeeping later on in my application.
My classes for the IEnumerables and the related code (using CSVHelper) to read the CSVs into the IEnumerables are below. Any input on where to begin, LINQ query wise, to find those matching values and output to a list? I'm relatively new to LINQ (usually use SQL in the backend) and I'm finding it a bit difficult to do exactly what I want to.
CLASSES:
class StudentSuccessStudents
{
[CsvColumn(Name ="StudentID", FieldIndex = 1)]
public string StudentID { get; set; }
}
class PlacementStudents
{
[CsvColumn(Name = "StudentId", FieldIndex = 1)]
public string StudentId { get; set; }
}
PROGRAM:
CsvFileDescription inputCsvStuSuccess = new CsvFileDescription
{
SeparatorChar = ',',
FirstLineHasColumnNames = true,
EnforceCsvColumnAttribute = false
};
CsvContext ccStuSuccess = new CsvContext();
CsvFileDescription inputCsvStuScores = new CsvFileDescription
{
SeparatorChar = ',',
FirstLineHasColumnNames = false,
EnforceCsvColumnAttribute = true
};
CsvContext ccStuScores = new CsvContext();
IEnumerable<StudentSuccessStudents> students = ccStuSuccess.Read<StudentSuccessStudents>(filePath, inputCsvStuSuccess);
IEnumerable<PlacementStudents> outputStudents = ccStuScores.Read<PlacementStudents>(csvPath, inputCsvStuScores);
Any suggestions on how to get all my "StudentID" fields in the first list that match a "StudentId" in the second one to output to another list with LINQ? I basically need to have that "matching" list so I can safely ignore those values elsewhere.

You could always use LINQ's SQL-like query syntax, as shown below. This way it looks more like something you're used to, and it gets you the matching values you need. It also looks more readable (in my opinion).
var duplicates = from success in students
join placement in outputStudents on success.StudentID equals placement.StudentId
select success.StudentID;
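For comparison, the same join can be written in method syntax. This is only a minimal sketch: in-memory lists stand in for the CSV reads, and the `MatchingIds` helper and the sample IDs are made up for illustration. A join yields one result per matching pair, so `Distinct` is added in case an ID is repeated in either file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class StudentSuccessStudents { public string StudentID { get; set; } }
class PlacementStudents { public string StudentId { get; set; } }

static class JoinSketch
{
    // Hypothetical helper: IDs present in both sequences, each listed once.
    public static List<string> MatchingIds(
        IEnumerable<StudentSuccessStudents> students,
        IEnumerable<PlacementStudents> outputStudents)
    {
        return students
            .Join(outputStudents,
                  success => success.StudentID,      // key from the first sequence
                  placement => placement.StudentId,  // key from the second sequence
                  (success, placement) => success.StudentID)
            .Distinct()                              // collapse repeated matches
            .ToList();
    }

    static void Main()
    {
        var students = new List<StudentSuccessStudents>
        {
            new StudentSuccessStudents { StudentID = "1001" },
            new StudentSuccessStudents { StudentID = "1002" }
        };
        var outputStudents = new List<PlacementStudents>
        {
            new PlacementStudents { StudentId = "1002" },
            new PlacementStudents { StudentId = "1002" }, // repeated on purpose
            new PlacementStudents { StudentId = "1003" }
        };
        Console.WriteLine(string.Join(", ", MatchingIds(students, outputStudents))); // prints "1002"
    }
}
```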

There is a LINQ operator for this called Intersect.
If both sequences were the same type (and that type defined value equality), you could do this
var result = students.Intersect(outputStudents);
Intersect builds a hash set internally, so it should be efficient.
Since they are different types, project both sequences to the ID string first
var result = students.Select(s => s.StudentID).Intersect(outputStudents.Select(p => p.StudentId)).ToList();
Basically you project each sequence to a common type on the fly and intersect those.
This is an example where inheritance or interfaces are powerful. If both classes shared a common base type or interface with suitable equality (or you passed an IEqualityComparer), you could intersect on that type directly.
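One thing to watch: Intersect compares elements with the default equality comparer, which for a class without an Equals override means reference equality, so two lists of separately constructed objects will have nothing in common. A sketch that intersects the ID strings instead is below. The `CommonIds` helper is hypothetical, and the case-insensitive comparer is an assumption for files whose IDs may differ in case; drop it if the IDs must match exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IntersectSketch
{
    // Hypothetical helper: distinct IDs from 'first' that also occur in 'second',
    // compared without regard to case (an assumption, not a requirement).
    public static List<string> CommonIds(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Intersect(second, StringComparer.OrdinalIgnoreCase).ToList();
    }

    static void Main()
    {
        var first = new[] { "a1", "B2", "x9" };
        var second = new[] { "A1", "b2" };
        Console.WriteLine(string.Join(", ", CommonIds(first, second))); // prints "a1, B2"
    }
}
```

With the classes from the question, the call would look like `IntersectSketch.CommonIds(students.Select(s => s.StudentID), outputStudents.Select(p => p.StudentId))`.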

Related

Sorting out Nodetypes after yield in Neo4jClient

I have the following query in Cypher
Match(n1: Red)
Where n1.Id = "someId"
Call apoc.path.subgraphAll(n1,{ minLevel: 0,maxLevel: 100,relationshipFilter: "link",labelFilter: "+Red|Blue"})
Yield nodes, relationships
Return nodes, relationships
The graph I query has roughly a structure of "Red -> Blue -> Red" where all the edges are of the type "link".
The query yields exactly the expected result in the browser client.
My C# looks like this:
string subgraphAll = "apoc.path.subgraphAll";
object optionsObj = new {
minLevel = 0,
maxLevel = 100,
relationshipFilter = $"{link}",
labelFilter = $"+{Red}|{Blue}",
beginSequenceAtStart = "true",
bfs = true,
filterStartNode = false,
limit = -1,
//endNodes = null,
//terminatorNodes = null,
//whitelistNodes = null,
//blacklistNodes = null,
};
string options = JObject.FromObject(optionsObj).ToString();
var query = client.Cypher
.Match($"(n1:{"Red"})")
.Where((Red n1) => n1.Id == "someId")
.Call($"{subgraphAll}(n1, {options})")
.Yield($"nodes,relationships")
//FigureOut what to do
.Return<Object>("");
var result = query.ResultsAsync.Result;
My question is: How would I write that in C# with the Neo4J client and how do I get typesafe lists at the end (something like List<Red>, List<Blue>, List<Relationship>).
As Red and Blue are different types in C#, I don't see how I can deserialize the mixed "nodes" list from the query.
Note that my examples are a bit simplified. The Nodetypes are not strings but come from Enums in my application to have a safe way to know what node types exist and there are real models behind those types.
I tried to break out the whole parametrization of the stored proc, but the code is untested and I don't know if there is a better solution to do this yet. If there is a better way, please advise on that too.
I am new to cypher, so I need a little help here.
My idea was to split the nodes list into two lists (a Red list and a Blue list) and then output the three lists as properties of an anonymous object (as in the examples). Unfortunately my Cypher isn't good enough to figure it out yet, and translating to the C# syntax at the same time doesn't help either.
My main concern is that once I deserialize into a list of untyped objects, it will be hell to parse them back into my models. So I want the query to do that sorting out for me.

In my view, if you want to go down the route of parsing the outputs into Red/Blue classes, it's going to be easier to do it in C# than in Cypher.
Unfortunately, also in this case - I think it'll be easier to execute the query using the Neo4j.Driver driver instead of Neo4jClient - and that's because at the moment, Neo4jClient seems to remove the id (etc) properties you'd need to be able to rebuild the graph properly.
With 4.0.3 of the Client you can access the Driver by doing this:
((BoltGraphClient)client).Driver
I have used a 'Movie/Person' example, as it's a dataset I had to hand, but the principles are the same, something like:
var queryStr = @"
Match(n1: Movie)
Where n1.title = 'The Matrix'
Call apoc.path.subgraphAll(n1,{ minLevel: 0,maxLevel: 2,relationshipFilter: 'ACTED_IN',labelFilter: '+Movie|Person'})
Yield nodes, relationships
Return nodes, relationships
";
var movies = new List<Movie>();
var people = new List<People>();
var session = client.Driver.AsyncSession();
var res = await session.RunAsync(queryStr);
await res.FetchAsync();
foreach (var node in res.Current.Values["nodes"].As<List<INode>>())
{
//Assumption of one label per node.
switch(node.Labels.Single().ToLowerInvariant()){
case "movie":
movies.Add(new Movie(node));
break;
case "person":
/* similar to above */
break;
default:
throw new ArgumentOutOfRangeException("node", node.Labels.Single(), "Unknown node type");
}
}
With Movie etc defined as:
public class Movie {
public long Id {get;set;}
public string Title {get;set;}
public Movie(){}
public Movie(INode node){
Id = node.Id;
Title = node.Properties["title"].As<string>();
}
}
The client not returning ids (and similar metadata) is something I need to look into fixing, but short of that, this is the quickest way to get where you want to be.
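If the switch grows as more node types are added, grouping by label first is one alternative. The sketch below does not use the Neo4j driver at all: `NodeStub` is a hypothetical stand-in that models only an id and labels, and `Red` and `Blue` are placeholder models. It keeps the same one-label-per-node assumption as the loop above, but unlike that loop it silently ignores unknown labels rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a driver node; not a Neo4j type.
class NodeStub
{
    public long Id { get; set; }
    public List<string> Labels { get; set; } = new List<string>();
}

class Red { public long Id { get; set; } }
class Blue { public long Id { get; set; } }

static class NodeSorter
{
    // Splits a mixed node list into typed lists, assuming exactly one label per node.
    public static (List<Red> Reds, List<Blue> Blues) Split(IEnumerable<NodeStub> nodes)
    {
        // A lookup returns an empty sequence for a label that never occurred.
        var byLabel = nodes.ToLookup(n => n.Labels.Single(), StringComparer.OrdinalIgnoreCase);
        var reds = byLabel["red"].Select(n => new Red { Id = n.Id }).ToList();
        var blues = byLabel["blue"].Select(n => new Blue { Id = n.Id }).ToList();
        return (reds, blues);
    }

    static void Main()
    {
        var nodes = new List<NodeStub>
        {
            new NodeStub { Id = 1, Labels = { "Red" } },
            new NodeStub { Id = 2, Labels = { "Blue" } },
            new NodeStub { Id = 3, Labels = { "Red" } }
        };
        var (reds, blues) = Split(nodes);
        Console.WriteLine($"{reds.Count} red, {blues.Count} blue"); // prints "2 red, 1 blue"
    }
}
```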

Search a List of string array to find a value in matching element and return another element in same array

So I have
List<string[]> listy = new List<string[]>();
listy.Add(new string[] { "a", "1", "blue" });
listy.Add(new string[] { "b", "2", "yellow" });
And I want to search through all of the list to find the index where the array containing "yellow" is, and return the first element value, in this case "b".
Is there a way to do this with built-in functions, or am I going to need to write my own search here?
I'm relatively new to C# and not aware of good practice or all the built-in functions. Lists and arrays I'm OK with, but lists of arrays baffle me somewhat.
Thanks in advance.

As others have already suggested, the easiest way to do this involves a very powerful C# feature called LINQ ("Language INtegrated Query"). It gives you a SQL-like syntax for querying collections of objects (or databases, or XML documents, or JSON documents).
To make LINQ work, you will need to add this at the top of your source code file:
using System.Linq;
Then you can write:
IEnumerable<string> yellowThings =
from stringArray in listy
where stringArray.Contains("yellow")
select stringArray[0];
Or equivalently:
IEnumerable<string> yellowThings =
listy.Where(strings => strings.Contains("yellow"))
.Select(strings => strings[0]);
At this point, yellowThings is an object containing a description of the query that you want to run. You can write other LINQ queries on top of it if you want, and it won't actually perform the search until you ask to see the results.
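To make the deferred part concrete, here is a small sketch (the `DeferredSketch` class and the extra row are made up for illustration). The query is defined once, and each `Count()` call runs it again against whatever the list holds at that moment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSketch
{
    public static (int Before, int After) CountBeforeAndAfterAdd()
    {
        var listy = new List<string[]> { new[] { "b", "2", "yellow" } };

        // Only describes the query; nothing is searched yet.
        IEnumerable<string> yellowThings =
            listy.Where(strings => strings.Contains("yellow"))
                 .Select(strings => strings[0]);

        int before = yellowThings.Count();        // runs the query: 1 match

        listy.Add(new[] { "c", "3", "yellow" });  // change the source afterwards

        int after = yellowThings.Count();         // runs it again: 2 matches
        return (before, after);
    }

    static void Main()
    {
        var counts = CountBeforeAndAfterAdd();
        Console.WriteLine($"{counts.Before} then {counts.After}"); // prints "1 then 2"
    }
}
```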
You now have several options...
Loop over the yellow things:
foreach(string thing in yellowThings)
{
// do something with thing...
}
(Don't do this more than once, otherwise the query will be evaluated repeatedly.)
Get a list or array:
List<string> listOfYellowThings = yellowThings.ToList();
string[] arrayOfYellowThings = yellowThings.ToArray();
If you expect to have exactly one yellow thing:
string result = yellowThings.Single();
// Will throw an exception if the number of matches is zero or greater than 1
If you expect to have either zero or one yellow things:
string result = yellowThings.SingleOrDefault();
// result will be null if there are no matches.
// An exception will be thrown if there is more than one match.
If you expect to have one or more yellow things, but only want the first one:
string result = yellowThings.First();
// Will throw an exception if there are no yellow things
If you expect to have zero or more yellow things, but only want the first one if it exists:
string result = yellowThings.FirstOrDefault();
// result will be null if there are no yellow things.
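If you don't know in advance how many matches to expect, one option is to fetch at most two results and branch on the count, which avoids both the exceptions and a second evaluation of the query. This is just a sketch; the `Describe` helper and its return strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ResultCountSketch
{
    public static string Describe(IEnumerable<string> matches)
    {
        // Two items are enough to tell "none", "exactly one" and "several" apart,
        // and the query is evaluated only once.
        List<string> firstTwo = matches.Take(2).ToList();
        switch (firstTwo.Count)
        {
            case 0: return "none";                        // First() and Single() would throw here
            case 1: return firstTwo[0];                   // Single() would succeed here
            default: return firstTwo[0] + " (and more)";  // Single() and SingleOrDefault() would throw here
        }
    }

    static void Main()
    {
        Console.WriteLine(Describe(new string[0]));               // prints "none"
        Console.WriteLine(Describe(new[] { "b" }));               // prints "b"
        Console.WriteLine(Describe(new[] { "b", "c", "d" }));     // prints "b (and more)"
    }
}
```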

Based on your description of the problem, here is the solution I can suggest.
List<string[]> listy = new List<string[]>();
listy.Add(new string[] { "a", "1", "blue"});
listy.Add(new string[] { "b", "2", "yellow"});
var target = listy.FirstOrDefault(item => item.Contains("yellow"));
if (target != null)
{
Console.WriteLine(target[0]);
}
This should solve your issue. Let me know if I am missing any use case here.

You might consider changing the data structure,
Have a class for your data as follows,
public class Myclas
{
public string name { get; set; }
public int id { get; set; }
public string color { get; set; }
}
And then,
static void Main(string[] args)
{
List<Myclas> listy = new List<Myclas>();
listy.Add(new Myclas { name = "a", id = 1, color = "blue" });
listy.Add(new Myclas { name = "b", id = 2, color = "yellow" });
var result = listy.FirstOrDefault(t => t.color == "yellow");
}

Your current situation is
List<string[]> listy = new List<string[]>();
listy.Add(new string[]{"a","1","blue"});
listy.Add(new string[]{"b","2","yellow"});
LINQ methods can do this directly, so this is what you're trying to do
var result = listy.FirstOrDefault(x => x.Contains("yellow"))?[0];
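The `?[0]` part is the null-conditional index operator: when FirstOrDefault finds no row and returns null, the whole expression evaluates to null instead of throwing. A short sketch wrapping it in a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstElementSketch
{
    // Hypothetical helper: first element of the first row containing 'value',
    // or null when no row contains it.
    public static string FirstElementOfRowContaining(List<string[]> rows, string value)
    {
        return rows.FirstOrDefault(row => row.Contains(value))?[0];
    }

    static void Main()
    {
        var listy = new List<string[]>
        {
            new[] { "a", "1", "blue" },
            new[] { "b", "2", "yellow" }
        };
        Console.WriteLine(FirstElementOfRowContaining(listy, "yellow"));        // prints "b"
        Console.WriteLine(FirstElementOfRowContaining(listy, "green") == null); // prints "True"
    }
}
```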

How do i get the difference in two lists in C#?

Ok so I have two lists in C#
List<Attribute> attributes = new List<Attribute>();
List<string> songs = new List<string>();
One is a list of strings and one is a list of an Attribute object that I created, very simple
class Attribute
{
public string size { get; set; }
public string link { get; set; }
public string name { get; set; }
public Attribute(){}
public Attribute(string s, string l, string n)
{
size = s;
link = l;
name = n;
}
}
I now have to compare to see what songs are not in the attributes name so for example
songs.Add("something");
songs.Add("another");
songs.Add("yet another");
Attribute a = new Attribute("500", "http://google.com", "something" );
attributes.Add(a);
I want a way to return "another" and "yet another" because they are not in the attributes list name
so for pseudocode
difference = songs - attributes.names

var difference = songs.Except(attributes.Select(s=>s.name)).ToList();
edit
Added ToList() to make it a list

It's worth pointing out that the answers posted here will return a list of songs not present in attributes.names, but it won't give you a list of attributes.names not present in songs.
While this is what the OP wanted, the title may be a little misleading, especially if (like me) you came here looking for a way to check whether the contents of two lists differ. If this is what you want, you can use the following:-
var differences = new HashSet<string>(songs);
differences.SymmetricExceptWith(attributes.Select(a => a.name));
if (differences.Any())
{
// The lists differ.
}
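As a quick illustration of what ends up in the set, here is a self-contained sketch with made-up data (the `SymmetricDifference` helper is hypothetical). The result holds every value that appears in exactly one of the two inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ListDiffSketch
{
    // Values that occur in 'left' or in 'right', but not in both.
    public static HashSet<string> SymmetricDifference(IEnumerable<string> left, IEnumerable<string> right)
    {
        var differences = new HashSet<string>(left);
        differences.SymmetricExceptWith(right); // modifies the set in place
        return differences;
    }

    static void Main()
    {
        var songs = new[] { "something", "another" };
        var attributeNames = new[] { "something", "extra" };
        var differences = SymmetricDifference(songs, attributeNames);
        // The set now contains "another" and "extra"; "something" was in both and is gone.
        Console.WriteLine(differences.Count); // prints "2"
    }
}
```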

This is the way to find all the songs which aren't included in attributes names:
var result = songs
.Where(song => !attributes.Select(a => a.name).ToList().Contains(song));
The answer using Except is also perfect and probably more efficient.
EDIT: This syntax has one advantage if you're using it in LINQ to SQL: it translates into a NOT IN SQL predicate. Except is not translated to anything in SQL. So, in that context, all the records would be recovered from the database and excepted on the app side, which is much less efficient.
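For plain in-memory lists (not LINQ to SQL), rebuilding the name list for every song is wasteful. A sketch of the same Where/Contains idea with the names collected once into a HashSet is below; the `SongsWithoutAttribute` helper is hypothetical. One behavioural difference from Except is worth knowing: Except returns distinct values, while this keeps repeated songs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NotInSketch
{
    // Songs whose name does not occur among the attribute names (in-memory only).
    public static List<string> SongsWithoutAttribute(IEnumerable<string> songs, IEnumerable<string> attributeNames)
    {
        var known = new HashSet<string>(attributeNames); // built once, fast lookups
        return songs.Where(song => !known.Contains(song)).ToList();
    }

    static void Main()
    {
        var songs = new[] { "something", "another", "another", "yet another" };
        var names = new[] { "something" };
        Console.WriteLine(string.Join(", ", SongsWithoutAttribute(songs, names)));
        // prints "another, another, yet another"
        Console.WriteLine(string.Join(", ", songs.Except(names)));
        // prints "another, yet another"
    }
}
```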

var diff = songs.Except(attributes.Select(a => a.name)).ToList();

My code is very inefficient for this simple Linq usage

I have the following method that is supposed to parse information from an XML response and return a collection of users.
I've opted into creating a Friend class and returning a List<Friend> to the calling method.
Here's what I have so far, but I noticed that the ids.ToList().Count method parses every single id element to a List, then does it again in the for conditional. It's just super ineffective.
public List<Friend> FindFriends()
{
List<Friend> friendList = new List<Friend>();
var friends = doc.Element("ipb").Element("profile").Element("friends").Elements("user");
var ids = from fr in friends
select fr.Element("id").Value;
var names = from fr in friends
select fr.Element("name").Value;
var urls = from fr in friends
select fr.Element("url").Value;
var photos = from fr in friends
select fr.Element("photo").Value;
if (ids.ToList().Count > 0)
{
for (int i = 0; i < ids.ToList().Count; i++)
{
Friend buddy = new Friend();
buddy.ID = ids.ToList()[i];
buddy.Name = names.ToList()[i];
buddy.URL = urls.ToList()[i];
buddy.Photo = photos.ToList()[i];
friendList.Add(buddy);
}
}
return friendList;
}

First question - do you have to return a List<Friend>? Can you return an IEnumerable<Friend> instead? If so, performance gets a lot better:
IEnumerable<Friend> FindFriends()
{
return doc.Descendants("user").Select(user => new Friend {
ID = user.Element("id").Value,
Name = user.Element("name").Value,
URL = user.Element("url").Value,
Photo = user.Element("photo").Value
});
}
Rather than actually creating new buckets and stuffing values into them, this creates a projection, or a new object that simply contains all of the logic for how to create the new Friend objects without actually creating them. They get created when the caller eventually starts to foreach over the IEnumerable. This is called "deferred execution".
This also makes one assumption - All the <user> nodes in your XML fragment are friends. If that isn't true, the first part of the XML selection might need to be a little more complex.
And as @anon points out, even if you do need to return a List<Friend> for some reason not obvious from the information you've provided, you can just call .ToList() at the end of the return statement. This will just execute the projection I described above straight into a new bucket, so you only ever create one.
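A self-contained version of that projection is sketched below so it can be tried outside the original project. The `Friend` class shape follows the question, but the sample XML values are invented, and the document is passed in as a parameter rather than read from a field. The sketch also casts each element to string instead of reading `.Value`, which is my own substitution: the cast gives null for a missing element where `.Value` would throw a NullReferenceException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Friend
{
    public string ID { get; set; }
    public string Name { get; set; }
    public string URL { get; set; }
    public string Photo { get; set; }
}

static class FriendParser
{
    // Deferred projection: Friend objects are created only when the result is enumerated.
    public static IEnumerable<Friend> FindFriends(XDocument doc)
    {
        return doc.Descendants("user").Select(user => new Friend
        {
            ID = (string)user.Element("id"),       // null if <id> is missing
            Name = (string)user.Element("name"),
            URL = (string)user.Element("url"),
            Photo = (string)user.Element("photo")
        });
    }

    static void Main()
    {
        // Invented sample data using the element names from the question.
        var doc = XDocument.Parse(
            "<ipb><profile><friends>" +
            "<user><id>1</id><name>Ann</name><url>http://example.com/ann</url><photo>ann.png</photo></user>" +
            "<user><id>2</id><name>Bob</name></user>" +
            "</friends></profile></ipb>");

        List<Friend> friends = FindFriends(doc).ToList(); // materialize once
        Console.WriteLine(friends.Count);                  // prints "2"
        Console.WriteLine(friends[1].Photo == null);       // prints "True"
    }
}
```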

Why do you need the separate ids/names/urls/photos variables? Combine it all. You can eliminate the ToList() call if you don't need a List.
List<Friend> friendList = (from f in doc.Element("ipb").Element("profile").Element("friends").Elements("user")
select new Friend() {
ID = f.Element("id").Value,
Name = f.Element("name").Value,
URL = f.Element("url").Value,
Photo = f.Element("photo").Value
}).ToList();
return friendList;

Regex Replace to assist Orderby in LINQ

I'm using LINQ to SQL to pull records from a database, sort them by a string field, then perform some other work on them. Unfortunately the Name field that I'm sorting by comes out of the database like this
Name
ADAPT1
ADAPT10
ADAPT11
...
ADAPT2
ADAPT3
I'd like to sort the Name field in numerical order. Right now I'm using the Regex object to replace "ADAPT1" with "ADAPT01", etc. I then sort the records again using another LINQ query. The code I have for this looks like
var adaptationsUnsorted = from aun in dbContext.Adaptations
where aun.EventID == iep.EventID
select new Adaptation
{
StudentID = aun.StudentID,
EventID = aun.EventID,
Name = Regex.Replace(aun.Name,
@"ADAPT([0-9])$", @"ADAPT0$1"),
Value = aun.Value
};
var adaptationsSorted = from ast in adaptationsUnsorted
orderby ast.Name
select ast;
foreach(Adaptation adaptation in adaptationsSorted)
{
// do real work
}
The problem I have is that the foreach loop throws the exception
System.NotSupportedException was unhandled
Message="Method 'System.String Replace(System.String, System.String,
System.String)' has no supported translation to SQL."
Source="System.Data.Linq"
I'm also wondering if there's a cleaner way to do this with just one LINQ query. Any suggestions would be appreciated.

Force the hydration of the elements by enumerating the query (call ToList). From that point on, your operations will be against in-memory objects and those operations will not be translated into SQL.
List<Adaptation> result =
dbContext.Adaptations
.Where(aun => aun.EventID == iep.EventID)
.ToList();
result.ForEach(aun =>
aun.Name = Regex.Replace(aun.Name,
@"ADAPT([0-9])$", @"ADAPT0$1")
);
result = result.OrderBy(aun => aun.Name).ToList();

Implement an IComparer<string> with your logic:
var adaptationsUnsorted = from aun in dbContext.Adaptations
where aun.EventID == iep.EventID
select new Adaptation
{
StudentID = aun.StudentID,
EventID = aun.EventID,
Name = aun.Name,
Value = aun.Value
};
var adaptationsSorted = adaptationsUnsorted.ToList<Adaptation>().OrderBy(a => a.Name, new AdaptationComparer ());
foreach (Adaptation adaptation in adaptationsSorted)
{
// do real work
}
public class AdaptationComparer : IComparer<string>
{
public int Compare(string x, string y)
{
string x1 = Regex.Replace(x, @"ADAPT([0-9])$", @"ADAPT0$1");
string y1 = Regex.Replace(y, @"ADAPT([0-9])$", @"ADAPT0$1");
return Comparer<string>.Default.Compare(x1, y1);
}
}
I didn't test this code but it should do the job.
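Zero-padding a single digit stops working once the numbers reach three digits (ADAPT100 would sort before ADAPT11). A different technique, sketched below, is to split each name into its text prefix and its numeric suffix and sort on both. The `SortByNumericSuffix` helper is hypothetical, and it assumes the suffix fits in a long. Like the comparer above, it has to run in memory (after ToList), since the regex cannot be translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class NaturalSortSketch
{
    // Group 1: everything before the trailing digits. Group 2: the trailing digits.
    private static readonly Regex TrailingDigits = new Regex(@"^(.*?)([0-9]+)$");

    public static List<string> SortByNumericSuffix(IEnumerable<string> names)
    {
        return names
            .Select(name => new { name, match = TrailingDigits.Match(name) })
            // Sort by the text prefix first (the whole name if there is no numeric suffix)...
            .OrderBy(x => x.match.Success ? x.match.Groups[1].Value : x.name, StringComparer.Ordinal)
            // ...then by the suffix as a number, so 2 comes before 10.
            .ThenBy(x => x.match.Success ? long.Parse(x.match.Groups[2].Value) : -1L)
            .Select(x => x.name)
            .ToList();
    }

    static void Main()
    {
        var names = new[] { "ADAPT10", "ADAPT2", "ADAPT100", "ADAPT1", "ADAPT11" };
        Console.WriteLine(string.Join(", ", SortByNumericSuffix(names)));
        // prints "ADAPT1, ADAPT2, ADAPT10, ADAPT11, ADAPT100"
    }
}
```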

I wonder if you can add a calculated+persisted+indexed field to the database, that does this for you. It would be fairly trivial to write a UDF that gets the value as an integer (just using string values), but then you can sort on this column at the database. This would allow you to use Skip and Take effectively, rather than constantly fetching all the data to the .NET code (which simply doesn't scale).
