Remove duplicates from list using criteria - c#

I want to remove duplicate filenames from a list that contains:
http://www.test.com/download/imagename_A.jpg
http://www.test.com/download/imagename_B.jpg
http://www.test.com/download/imagename_C.jpg
http://fc07.test.net/fs49/f/2009/216/6/f/imagename_A.jpg
http://fc09.test.net/fs49/f/2009/195/d/8/imagename_B.jpg
When two URLs share the SAME file name, I want to keep the one on the .net domain and drop the one on the .com domain, resulting in this final list:
http://fc07.test.net/fs49/f/2009/216/6/f/imagename_A.jpg
http://fc09.test.net/fs49/f/2009/195/d/8/imagename_B.jpg
http://www.test.com/download/imagename_C.jpg
I suspect this can be done with LINQ (I found this article: Find Duplicate in list but with criteria), but I don't know enough about LINQ to make it work for me.

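// Requires: using System; using System.IO; using System.Linq;
var result = urls.GroupBy(url => Path.GetFileName(url))   // one group per file name
    .Select(g => g.OrderByDescending(u => new Uri(u).DnsSafeHost.EndsWith(".net")).First())   // .net hosts sort first
    .ToList();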
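A minimal, self-contained sketch of the query above, wrapped in a hypothetical helper `UrlDedup.PreferNet` so it can be run against the URLs from the question. It relies on two properties of LINQ to Objects: `GroupBy` yields groups in the order their keys first appear, and `OrderByDescending` on a `bool` puts `true` (a .net host) ahead of `false`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UrlDedup
{
    // Keeps one URL per file name, preferring a host that ends in ".net".
    public static List<string> PreferNet(IEnumerable<string> urls)
    {
        return urls
            // Group on the last path segment, e.g. "imagename_A.jpg".
            .GroupBy(url => Path.GetFileName(url))
            // false sorts before true, so descending order puts ".net" hosts first;
            // OrderByDescending is stable, so ties keep their original order.
            .Select(g => g.OrderByDescending(u => new Uri(u).DnsSafeHost.EndsWith(".net")).First())
            .ToList();
    }
}
```

Because groups come out in first-seen order, the result is A, B, C, matching the list the question asks for.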

Alternatively, you can use string.Split('/') to split each URL on "/" and compare the file names by checking the last element of the resulting array. For a URL such as http://www.test.com/download/imagename_A.jpg, the host is the third element (index 2, after "http:" and an empty string), so split that element with Split('.') and check whether its last part is "net" or "com".
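A sketch of that split-based approach, as a hypothetical `UrlSplitDedup.PreferNet` helper. It assumes every URL has the form `scheme://host/.../file`, so the host always lands at index 2 after splitting on '/'. A `Dictionary` keyed by file name takes the place of `GroupBy`, with a separate list preserving first-seen order.

```csharp
using System.Collections.Generic;

public static class UrlSplitDedup
{
    // Assumes "scheme://host/path/file": Split('/') then gives
    // ["scheme:", "", "host", "path", ..., "file"].
    static bool IsNetHost(string url)
    {
        string host = url.Split('/')[2];
        string[] hostParts = host.Split('.');
        return hostParts[hostParts.Length - 1] == "net";
    }

    public static List<string> PreferNet(IEnumerable<string> urls)
    {
        var chosen = new Dictionary<string, string>(); // file name -> URL kept so far
        var order = new List<string>();                // file names in first-seen order

        foreach (string url in urls)
        {
            string[] parts = url.Split('/');
            string fileName = parts[parts.Length - 1];

            if (!chosen.ContainsKey(fileName))
            {
                chosen[fileName] = url;
                order.Add(fileName);
            }
            else if (IsNetHost(url) && !IsNetHost(chosen[fileName]))
            {
                // Replace a kept non-.net URL with the .net one.
                chosen[fileName] = url;
            }
        }

        var result = new List<string>();
        foreach (string name in order)
            result.Add(chosen[name]);
        return result;
    }
}
```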

Related

Splitting large string in c# and adding values in it to a List

I have a string as shown below
string names = "<?startname; Max?><?startname; Alex?><?startname; Rudy?>";
Is there any way I can split this string and add Max, Alex and Rudy to a separate list?
Sure, split on two strings (all that consistently comes before, and all that consistently comes after) and specify that you want Split to remove the empties:
var r = names.Split(new[]{ "<?startname; ", "?>" }, StringSplitOptions.RemoveEmptyEntries);
If you take out RemoveEmptyEntries, you get a clearer idea of how the splitting works: without it, your names come back interspersed with empty strings, because Split finds a delimiter (the <?...) immediately following another (the ?>) with nothing between them.
The String.Split documentation covers this overload in detail (you can change the .NET version in the table of contents); this variant of Split has been available since .NET Framework 2.0.
You also said "add to a separate list", but I didn't see any code for that. Either treat r as that separate list (it's actually an array, but that's probably adequate, and LINQ's ToList() converts it if not), or, if you already have another list of names that really is a List<string>, call thatList.AddRange(r).
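A short sketch contrasting the two options described above, using hypothetical helpers in a `NameSplitDemo` class. With `StringSplitOptions.None`, the empty strings between adjacent delimiters (and at both ends of this input) stay in the result; `AddRange` appends the cleaned names to an existing `List<string>`.

```csharp
using System;
using System.Collections.Generic;

public static class NameSplitDemo
{
    static readonly string[] Delimiters = { "<?startname; ", "?>" };

    // Without RemoveEmptyEntries: an empty string appears wherever two delimiters touch.
    public static string[] SplitKeepingEmpties(string names) =>
        names.Split(Delimiters, StringSplitOptions.None);

    // Appends only the non-empty pieces (the names) to an existing list.
    public static void AddNames(List<string> target, string names) =>
        target.AddRange(names.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries));
}
```

For the question's string, `SplitKeepingEmpties` returns `"", "Max", "", "Alex", "", "Rudy", ""`.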
Another idea is to use a regex. The following pattern should work:
(?<=; )(.*?)(?=\s*\?>)
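A sketch applying that pattern with `Regex.Matches`, in a hypothetical `NameRegexDemo.ExtractNames` helper. The lookbehind `(?<=; )` and the lookahead `(?=\s*\?>)` are not part of the match, so each `Match.Value` is just the name; the lazy `.*?` stops at the first `?>`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameRegexDemo
{
    // Preceded by "; ", shortest run of characters, followed by optional whitespace and "?>".
    static readonly Regex NamePattern = new Regex(@"(?<=; )(.*?)(?=\s*\?>)");

    public static List<string> ExtractNames(string names)
    {
        var result = new List<string>();
        foreach (Match m in NamePattern.Matches(names))
            result.Add(m.Value);
        return result;
    }
}
```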

Linq - Is Exact String in String Array

I'm trying to write a LINQ query that loops through a set of Umbraco nodes and checks whether each node's Document Type Alias is in a string array. I've got something very close:
if (allowedDocTypes != null && allowedDocTypes.Length > 0)
{
allowedDocTypes = allowedDocTypes.Where(x => !string.IsNullOrEmpty(x)).ToArray();
nodes = nodes.Where(x => x.DocumentTypeAlias.ContainsAny(allowedDocTypes));
}
allowedDocTypes is a string array that includes the document types. The first line inside the if statement removes any empty strings from the array. Finally, I'm making use of the ContainsAny method to check if the document type alias is in the string array.
This almost works, in that it checks whether the document type alias contains any of the strings in the string array. However, it also matches partial strings, and I really need exact matches.
For example, the string array contains the value review. ContainsAny appears to pull through all the nodes with a document type alias of review, but it also pulls through any with a document type alias of preview.
Is there a way to easily change this so that review would be an exact match rather than partial?
Thanks,
Ben
All you really need to do is reverse the logic and use Contains on the array, which compares whole strings:
nodes = nodes.Where(x => allowedDocTypes.Contains(x.DocumentTypeAlias));
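A sketch of the reversed check on plain strings, since the Umbraco node type is not needed to show the difference; `DocTypeFilter.ExactMatches` is a hypothetical helper. `allowedDocTypes.Contains(alias)` on an array performs the same whole-string equality test; the `HashSet<string>` here is only an assumption for long lists, giving constant-time lookups. Passing `StringComparer.OrdinalIgnoreCase` as the second `HashSet` constructor argument would make the match case-insensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DocTypeFilter
{
    // Keeps only aliases that exactly equal one of the allowed values (case-sensitive).
    public static List<string> ExactMatches(IEnumerable<string> aliases, string[] allowedDocTypes)
    {
        // Drop empty entries, as in the question, then build a lookup set.
        var allowed = new HashSet<string>(allowedDocTypes.Where(x => !string.IsNullOrEmpty(x)));
        return aliases.Where(alias => allowed.Contains(alias)).ToList();
    }
}
```

With aliases `review`, `preview` and `home` and an allowed list of `review`, only `review` is kept, whereas a substring test would also accept `preview`.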

Find Results with at Least One term from array

I'm writing a search algorithm. For the last portion of it, I want to split their search into individual words and then find any results that have at least one of those words in it. Is there any function that would work something like "ContainsAny" below? Otherwise, how can I make that happen?
string[] splitStr = text.Split();
result = db.Table.Where(x => x.Name.ContainsAny(splitStr)).FirstOrDefault(); // ContainsAny is the method I'm looking for
For example, if they search for "Metal Spoon" both "Metal Chair" and "spoon book" would be valid results because each contains at least one of the search terms.
There is no ContainsAny, but you can combine Any and Contains like this:
var results = db.Table.Where(x => splitStr.Any(s => x.Name.Contains(s)));
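A sketch of the same Any-plus-Contains idea for an in-memory list, in a hypothetical `TermSearch.MatchAny` helper. Two details matter for the "Metal Spoon" example: in LINQ to Objects `string.Contains` is case-sensitive, so "spoon book" would not match "Spoon"; and `text.Split()` with no arguments keeps empty entries for doubled spaces, and since any string contains `""`, an empty term would match every row. Against a database, the `IndexOf(string, StringComparison)` overload may not be translatable to SQL, and case sensitivity there usually depends on the column's collation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermSearch
{
    // Returns names containing at least one whitespace-separated term, ignoring case.
    public static List<string> MatchAny(IEnumerable<string> names, string text)
    {
        // A null separator means "split on whitespace"; drop empty terms.
        string[] terms = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return names
            .Where(n => terms.Any(t => n.IndexOf(t, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}
```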

Search list of objects

I get the security groups a user belongs to as an array of objects. I use DirectoryEntry to read Active Directory properties, and one of those properties is "memberOf" (de.Properties["memberOf"].Value). The returned value is an array of objects, and each element looks something like:
"CN=SITE_MAINTENANCE,OU=CMS,OU=SD,OU=ESM,OU=Engineering Systems,DC=usa,DC=abc,DC=domain,DC=com"
I can loop through the elements, cast each element as "string" and search this way. I just thought there might be an easier way that does not require looping.
I need to be able to find the one(s) with OU=CMS in it.
Thanks.
Loop through the array and use IndexOf or a Regex search for the string "OU=CMS". If it exists in the string, then you've found one with OU=CMS in it.
You can then do whatever you want with the matches, such as adding them to a new list.
list.Where(a=>a.ToString().Contains("OU=CMS")).ToList();
You can use it as follows:
string listString = "CN=SITE_MAINTENANCE,OU=CMS,OU=SD,OU=ESM," +
    "OU=Engineering Systems,DC=usa,DC=abc,DC=domain,DC=com";
Using linq:
listString.Split(',').Contains("OU=CMS")
W/o linq:
Array.IndexOf(listString.Split(','), "OU=CMS") >= 0
You can also search for the required value with a foreach loop.
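A sketch of the foreach approach over the memberOf values, in a hypothetical `GroupFilter.FindByOu` helper. Splitting each DN on ',' and comparing whole components avoids the false positive a substring test gives for something like `OU=CMSADMIN`. It assumes the DNs contain no escaped commas (`\,`), which this naive split would break apart. Also note that `PropertyValueCollection.Value` may return a single string rather than an array when the user is in only one group, so iterating `de.Properties["memberOf"]` directly may be safer.

```csharp
using System;
using System.Collections.Generic;

public static class GroupFilter
{
    // Returns the DNs that have a component exactly equal to `ou` (e.g. "OU=CMS").
    public static List<string> FindByOu(object[] memberOf, string ou)
    {
        var matches = new List<string>();
        foreach (object entry in memberOf)
        {
            string dn = entry as string;
            if (dn == null) continue; // skip anything that is not a string

            foreach (string part in dn.Split(','))
            {
                if (string.Equals(part.Trim(), ou, StringComparison.OrdinalIgnoreCase))
                {
                    matches.Add(dn);
                    break; // one matching component is enough
                }
            }
        }
        return matches;
    }
}
```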

Order the lines in a file by the last character on the line

Can you please help me with this:
I want to build a method in C# that orders the lines of many files by the following rule:
every line contains text, and the last character in every line is an int.
I want to order the lines in each file by that last character, the int.
Thanks
To order ascending by the last character, interpreted as an integer, you could do:
var orderedLines = File.ReadAllLines(@"test.txt")
    // Subtracting '0' gives the digit's value; Convert.ToInt32(char) would return its character code
    .OrderBy(line => line[line.Length - 1] - '0')
    .ToList();
Edit:
With the clarification in your comment (the integer follows a space character and can be more than one digit):
var orderedLines = File.ReadAllLines(@"test.txt")
    // Parse everything after the last space as the sort key
    .OrderBy(line => Convert.ToInt32(line.Substring(line.LastIndexOf(' ') + 1)))
    .ToList();
You could do something like this, where fileName is the name of your file:
// Replace with the actual name of your file
string fileName = "MyFile.txt";
// Read the contents of the file into memory
string[] lines = File.ReadAllLines(fileName);
// Sort the contents of the file based on the number after the last space in each line
var orderedLines = lines.OrderBy(x => Int32.Parse(x.Substring(x.LastIndexOf(' '))));
// Write the lines back to the file
File.WriteAllText(fileName, string.Join(Environment.NewLine, orderedLines));
This is just a rough outline; hopefully it's helpful.
File.WriteAllLines(
pathToWriteTo,
File.ReadLines(pathToReadFrom)
.OrderBy(s => Convert.ToInt32(s.Split(' ').Last()))
);
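If the file is large, this may be inefficient, because OrderBy has to buffer every line in memory before it can return the first one. Also make sure pathToWriteTo is a different file from pathToReadFrom, since the input file is still open for reading when the output file is created.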
Assuming you want integers with more than one digit, and that a separator character (we'll call it splitChar; it can be any character at all) sits between the file name and the rest of the line:
from string str in File.ReadAllLines(fileName)
let split = str.Split(splitChar)
orderby Int32.Parse(split[split.Length - 1])
select str
will get you a sequence of strings in order of the integer value of the last grouping (separated by the split character).
Maybe one of these links can help you by sorting it the natural way:
Natural Sorting in C#
Sorting for Humans : Natural Sort Order
