Compare 2 lists for partial match

C# folks! I have two lists that I want to compare.
Example:
List<string> ONE contains:
A
B
C
List<string> TWO contains:
B
C
I know I can get the elements that are only in ONE if I do:
ONE.Except(TWO);
Results: A
How can I do the same if each element in my lists has a file extension?
List<string> ONE contains:
A.pdf
B.pdf
C.pdf
List<string> TWO contains: (will always have .txt extension)
B.txt
C.txt
Results should = A.pdf
I realized that I need to display the full filename (A.pdf) in a report at the end, so I cannot strip the extension, like I originally did.
Thanks for the help!
EDIT:
This is how I went about it. I am not sure if it is the "best" or "most performant" way to solve it, but it does seem to work:
foreach (string s in ONE)
{
    // since I know TWO will always be .txt
    string temp = Path.GetFileNameWithoutExtension(s) + ".txt";
    if (TWO.Contains(temp))
    {
        // yes it exists, do something
    }
    else
    {
        // no it does not exist, do something
    }
}

This is very straightforward and easy code, but it will need changes if your requirement involves more file extensions:
List<string> lstA = new List<string>() { "A.pdf", "B.pdf", "C.pdf" };
List<string> lstB = new List<string>() { "B.txt", "C.txt" };
foreach (var item in lstA)
{
    // print the items of lstA that have no .txt counterpart in lstB
    if (!lstB.Contains(item.Replace(".pdf", ".txt")))
    {
        Console.WriteLine(item);
    }
}

You can implement a custom equality comparer:
class FileNameComparer: IEqualityComparer<String>
{
    public bool Equals(String b1, String b2)
    {
        return Path.GetFileNameWithoutExtension(b1).Equals(Path.GetFileNameWithoutExtension(b2));
    }
    public int GetHashCode(String a)
    {
        return Path.GetFileNameWithoutExtension(a).GetHashCode();
    }
}
... and pass it to the Except method:
System.Console.WriteLine(string.Join(", ", list1.Except(list2, new FileNameComparer())));
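If the lists are large, the same idea can be sketched without a comparer class by putting the base names of the second list into a HashSet, so each lookup is a hash probe rather than a scan. This is only a sketch: the class and method names are made up for illustration, and the case-insensitive comparison is an assumption (reasonable for Windows file names) that is not stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PartialMatch
{
    // Hypothetical helper: returns the entries of `one` whose base name
    // (file name without extension) does not occur among the base names of `two`.
    public static List<string> MissingByBaseName(IEnumerable<string> one, IEnumerable<string> two)
    {
        // Assumption: names are compared case-insensitively.
        var baseNames = new HashSet<string>(
            two.Select(f => Path.GetFileNameWithoutExtension(f)),
            StringComparer.OrdinalIgnoreCase);

        // The full original name (e.g. "A.pdf") is kept for the report.
        return one
            .Where(f => !baseNames.Contains(Path.GetFileNameWithoutExtension(f)))
            .ToList();
    }
}
```

Called with the lists from the question, `PartialMatch.MissingByBaseName(ONE, TWO)` would keep the full name of each unmatched file, so nothing has to be stripped before the report is written.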

Related

loading TextAsset from file and unable to compare

I am loading a TextAsset of dictionary words from Resources and adding the words to a List. I want to check whether the list contains the word the user typed, but none of the methods I have tried work: the result is always negative. Can anyone help me find out why?
public TextAsset txt;
public List<string> words;
void Awake()
{
    words = new List<string>();
    txt = (TextAsset)Resources.Load("words");
    words = TextAssetExtensionMethods.TextAssetToList(txt);
}
public void Search()
{
    Debug.Log(inputField.text);
    Debug.Log(words.Contains(inputField.text));
    Debug.Log(words.FindAll(s => s.Contains(inputField.text)));
    Debug.Log(words.FindAll(s => s.IndexOf(inputField.text, StringComparison.OrdinalIgnoreCase) >= 0));
    if (words.Contains(inputField.text, StringComparer.CurrentCultureIgnoreCase))
    {
        Debug.Log("Contains");
    }
    else
    {
        Debug.Log("not");
    }
}
public static class TextAssetExtensionMethods {
    public static List<string> TextAssetToList(this TextAsset ta) {
        return new List<string>(ta.text.Split('\n'));
    }
}

I don't know why you created an extension method for the TextAsset class, but now that you have it, you should call it on the TextAsset instance:
words = txt.TextAssetToList();
instead of:
words = TextAssetExtensionMethods.TextAssetToList(txt);
One possible issue is stray whitespace left in your strings, so trim your entries:
Array.ConvertAll(ta.text.Split(','), p => p.Trim()).ToList(); // uses LINQ
assuming your words are separated by commas.
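Since the question splits on '\n', the file may well be one word per line. If it was saved with Windows line endings, every entry would end in an invisible '\r' and Contains would never match. The sketch below assumes that layout; it takes the raw text (what ta.text would hold) rather than a TextAsset so it can run outside Unity, and the class and method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordListParser
{
    // Hypothetical helper: splits on both '\r' and '\n', trims each entry,
    // and drops blanks, so "word\r" and "word " both become "word".
    public static List<string> ParseWords(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim())
            .Where(w => w.Length > 0)
            .ToList();
    }
}
```

Inside the extension method this would amount to returning `WordListParser.ParseWords(ta.text)`; trimming the user's input the same way before calling Contains keeps both sides comparable.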

sort fileinfo list by int within fileinfo name

I am having trouble sorting a List. I need it sorted by the FileInfo.Name property; each name begins with an integer of unknown length, and I need to sort the list by that number.
In my experience it is difficult to compare two strings by a number inside them, so I could use some help.
This is my list:
I need the list to be sorted from this [1,13,2,3,4,5] into this [1,2,3,4,5,13]
Here is what I have tried so far:
infos.Sort((a, b) => a.Split('-')[0].CompareTo(b.Split('-')[0]));
Of course this cannot work, as I am comparing the numbers as strings.
EDIT:
Unfortunately, the solution from Mukund does not work, as shown in this image:

You can use this.
infos.OrderBy(x => Convert.ToInt32(x.Split('-')[0]))
class Program11
{
    static void Main(string[] args)
    {
        var infos = new List<string> { "1-100.jpg", "13-11.jpg", "2-145.jpg", "3-421.jpg", "4-842.jpg", "5-1000.jpg" };
        var orderedList = infos.OrderBy(x => Convert.ToInt32(x.Split('-')[0]));
        foreach (var lstItem in orderedList)
        {
            Console.WriteLine(lstItem);
        }
        Console.ReadKey();
    }
}
Output:
1-100.jpg
2-145.jpg
3-421.jpg
4-842.jpg
5-1000.jpg
13-11.jpg
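Two things can make this appear not to work: OrderBy does not sort the list in place (its result has to be assigned or enumerated), and Convert.ToInt32 throws if a name has no '-' or no numeric prefix. A slightly more defensive sketch is below; the helper names are hypothetical, and sending names without a leading number to the end is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NumericPrefixSort
{
    // Hypothetical helper: parses the run of ASCII digits at the start of `name`.
    // Names without a usable numeric prefix get long.MaxValue so they sort last.
    public static long LeadingNumber(string name)
    {
        int end = 0;
        while (end < name.Length && name[end] >= '0' && name[end] <= '9')
        {
            end++;
        }
        long value;
        if (end > 0 && long.TryParse(name.Substring(0, end), out value))
        {
            return value;
        }
        return long.MaxValue;
    }

    // Returns a new list ordered by the numeric prefix; the input is not modified.
    public static List<string> SortByLeadingNumber(IEnumerable<string> names)
    {
        return names.OrderBy(n => LeadingNumber(n)).ToList();
    }
}
```

For a List of FileInfo the same key can be used directly, for example `infos = infos.OrderBy(f => NumericPrefixSort.LeadingNumber(f.Name)).ToList();`, with the assignment being the part that is easy to forget.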

How to efficiently compare two lists with 500k objects and strings

So I have a main directory with subfolders and around 500k images. I know a lot of these images do not exist in my database, and I want to find out which ones so that I can delete them.
This is the code I have so far:
var listOfAdPictureNames = ImageDB.GetAllAdPictureNames();
var listWithFilesFromImageFolder = ImageDirSearch(adPicturesPath);
var result = listWithFilesFromImageFolder.Where(p => !listOfAdPictureNames.Any(q => p.FileName == q));
var differenceList = result.ToList();
listOfAdPictureNames is of type List<string>.
Here is the model that I am returning from ImageDirSearch:
public class CheckNotUsedAdImagesModel
{
    public List<ImageDirModel> ListWithUnusedAdImages { get; set; }
}
public class ImageDirModel
{
    public string FileName { get; set; }
    public string Path { get; set; }
}
And here is the recursive method to get all images from my folder:
private List<ImageDirModel> ImageDirSearch(string path)
{
    string adPicturesPath = ConfigurationManager.AppSettings["AdPicturesPath"];
    List<ImageDirModel> files = new List<ImageDirModel>();
    try
    {
        foreach (string f in Directory.GetFiles(path))
        {
            var model = new ImageDirModel();
            model.Path = f.ToLower();
            model.FileName = Path.GetFileName(f.ToLower());
            files.Add(model);
        }
        foreach (string d in Directory.GetDirectories(path))
        {
            files.AddRange(ImageDirSearch(d));
        }
    }
    catch (System.Exception excpt)
    {
        throw new Exception(excpt.Message);
    }
    return files;
}
The problem I have is that this line:
var result = listWithFilesFromImageFolder.Where(p => !listOfAdPictureNames.Any(q => p.FileName == q));
takes over an hour to complete. I want to know if there is a better way to check whether my images folder contains images that don't exist in my database.
Here is the method that gets all the image names from my database layer:
public static List<string> GetAllAdPictureNames()
{
    List<string> ListWithAllAdFileNames = new List<string>();
    using (var db = new DatabaseLayer.DBEntities())
    {
        ListWithAllAdFileNames = db.ad_pictures.Select(b => b.filename.ToLower()).ToList();
    }
    if (ListWithAllAdFileNames.Count < 1)
        return new List<string>();
    return ListWithAllAdFileNames;
}

Perhaps Except is what you're looking for. Something like this:
var filesInFolderNotInDb = listWithFilesFromImageFolder.Select(p => p.FileName).Except(listOfAdPictureNames).ToList();
Should give you the files that exist in the folder but not in the database.
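The reason this is fast is that the database names are put into a hash-based set once, so each file name is checked in roughly constant time instead of by scanning the whole list. A minimal sketch of the same idea written out explicitly follows; the class and method names are hypothetical, and the case-insensitive comparer is an assumption that would replace the ToLower calls from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnusedFiles
{
    // Hypothetical helper: returns the file names that have no entry in the database list.
    public static List<string> NotInDatabase(IEnumerable<string> fileNames, IEnumerable<string> dbNames)
    {
        // Built once: expected O(m) for m database names.
        // Assumption: names are compared case-insensitively.
        var known = new HashSet<string>(dbNames, StringComparer.OrdinalIgnoreCase);

        // Each Contains call is an expected O(1) hash lookup, so the pass is O(n).
        return fileNames.Where(f => !known.Contains(f)).ToList();
    }
}
```

Unlike Except, this keeps duplicate file names in the result, which may matter if the same name exists in several subfolders.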

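Instead of repeating a linear search of the second list for every file, sort the second list, listOfAdPictureNames, once (with any n*log(n) sort) and then check for existence by binary search. The current approach is quadratic (every file is compared against every database name), whereas sorting plus binary search needs only on the order of (n + m) * log(m) comparisons. A hash-based lookup such as Except avoids the sort entirely.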
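A sketch of the sort-then-binary-search approach, using List.Sort and List.BinarySearch from the base class library, could look like this. The helper name is made up, and the case-insensitive ordinal comparer is an assumption; whichever comparer is chosen must be the same for the sort and the search.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedLookup
{
    // Hypothetical helper: returns the file names not found in the database list.
    public static List<string> NotInSorted(IEnumerable<string> fileNames, IEnumerable<string> dbNames)
    {
        // Copy and sort once; the caller's list is left untouched.
        var sorted = new List<string>(dbNames);
        sorted.Sort(StringComparer.OrdinalIgnoreCase);

        // BinarySearch returns a negative number when the item is absent.
        return fileNames
            .Where(f => sorted.BinarySearch(f, StringComparer.OrdinalIgnoreCase) < 0)
            .ToList();
    }
}
```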

As I said in my comment, you seem to have recreated the FileInfo class. You don't need to do this, so your ImageDirSearch can become the following:
private IEnumerable<string> ImageDirSearch(string path)
{
    // AllDirectories keeps the recursive behaviour of the original method
    return Directory.EnumerateFiles(path, "*.jpg", SearchOption.AllDirectories);
}
There doesn't seem to be much gained by returning the whole file info when you only need the file name. Note that EnumerateFiles returns full paths, so apply Path.GetFileName before comparing, and that this only finds jpgs, though the pattern can be changed.
The ToLower calls are quite expensive and a bit pointless, and so is the ToList when you are planning on querying again, so in the GetAllAdPictureNames method you can get rid of both and return an IEnumerable instead.
Then your comparison can use Equals and ignore case:
!listOfAdPictureNames.Any(q => p.Equals(q, StringComparison.InvariantCultureIgnoreCase));
One more thing that will probably help is removing items from the list of file names as they are found; this should make each subsequent search quicker, since there is less to iterate through.

Finding differences in two lists

I am looking for a good way to find the differences between two lists.
Here is the problem:
Both lists contain strings in which the first 3 '*'-delimited fields form a unique key, followed by the text (String = "key1*key2*key3*text").
Here is an example string:
AA1*1D*4*The quick brown fox*****CC*3456321234543~
where "AA1*1D*4" is the unique key.
List1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"
List2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"
I need to match the indexes in both lists and compare them.
If all 3 indexes of an entry in one list match the 3 indexes of an entry in the other list, I need to track both string entries in the new list.
If a set of indexes in one list doesn't appear in the other, I need to track that side and keep an empty entry for the other side (#4 in the example above).
Return the list.
This is what I did so far, but I am kind of struggling here:
List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); // keep unique values (the differences between the lists)
List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); // keep unique values (the differences between the lists)
List<String[]> blocksComparison = new List<String[]>(); // container for non-matching blocks, so we can output them later
// if both reports have the same number of blocks
if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
{
    foreach (String S in Result)
    {
        String[] sArr = S.Split('*');
        foreach (String B in Base)
        {
            String[] bArr = B.Split('*');
            if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
            {
                String[] NA = new String[2]; // keep results
                NA[0] = B; // [0] for base
                NA[1] = S; // [1] for result
                blocksComparison.Add(NA);
                break;
            }
        }
    }
}
Could you suggest a good algorithm for this process?
Thank you

You can use a HashSet.
Create a HashSet for List1. Remember that index1*index2*index3 is different from index3*index2*index1.
Now iterate through the second list:
var hashset = new HashSet<string>(list1);
var matches = new List<string>();
foreach (string s in list2)
{
    if (hashset.Contains(s))
        matches.Add(s); // add it to the new list
}

If I understand your question correctly, you'd like to compare the elements by their "key" prefix instead of by the whole string content. If so, implementing a custom equality comparer will let you easily leverage the LINQ set algorithms.
This program...
class EqCmp : IEqualityComparer<string> {
    public bool Equals(string x, string y) {
        return GetKey(x).SequenceEqual(GetKey(y));
    }
    public int GetHashCode(string obj) {
        // Using Sum could cause OverflowException.
        return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
    }
    static IEnumerable<string> GetKey(string line) {
        // If we just split to 3 strings, the last one could exceed the key, so we split to 4.
        // This is not the most efficient way, but is simple.
        return line.Split(new[] { '*' }, 4).Take(3);
    }
}
class Program {
    static void Main(string[] args) {
        var l1 = new List<string> {
            "index1*index1*index1*some text",
            "index1*index1*index2*some text ** test test test",
            "index1*index2*index1*some text",
            "index1*index2*index2*some text",
            "index2*index1*index1*some text"
        };
        var l2 = new List<string> {
            "index1*index1*index2*some text ** test test test",
            "index2*index1*index1*some text",
            "index2*index1*index2*some text"
        };
        var eq = new EqCmp();
        Console.WriteLine("Elements that are both in l1 and l2:");
        foreach (var line in l1.Intersect(l2, eq))
            Console.WriteLine(line);
        Console.WriteLine("\nElements that are in l1 but not in l2:");
        foreach (var line in l1.Except(l2, eq))
            Console.WriteLine(line);
        // Etc...
    }
}
...prints the following result:
Elements that are both in l1 and l2:
index1*index1*index2*some text ** test test test
index2*index1*index1*some text
Elements that are in l1 but not in l2:
index1*index1*index1*some text
index1*index2*index1*some text
index1*index2*index2*some text
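The question also asks for both sides to be tracked together, with an empty entry where a key exists in only one list. That is essentially a full outer join on the key, and one way to sketch it is with a dictionary. The names below are hypothetical, and the sketch assumes each key occurs at most once per list (the question calls the key unique); with duplicate keys, the last entry in the result list would win.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyedDiff
{
    // The key is the first three '*'-delimited fields of the line.
    public static string KeyOf(string line)
    {
        return string.Join("*", line.Split(new[] { '*' }, 4).Take(3));
    }

    // Returns [base, result] pairs; the missing side is an empty string.
    public static List<string[]> Pair(IEnumerable<string> baseList, IEnumerable<string> resultList)
    {
        var resultLines = resultList.ToList();
        var resultByKey = new Dictionary<string, string>();
        foreach (var line in resultLines)
        {
            resultByKey[KeyOf(line)] = line; // assumption: keys are unique per list
        }

        var pairs = new List<string[]>();
        var seenKeys = new HashSet<string>();
        foreach (var line in baseList)
        {
            var key = KeyOf(line);
            string match;
            resultByKey.TryGetValue(key, out match);
            pairs.Add(new[] { line, match ?? "" });
            seenKeys.Add(key);
        }
        // Entries that exist only on the result side.
        foreach (var line in resultLines)
        {
            if (!seenKeys.Contains(KeyOf(line)))
            {
                pairs.Add(new[] { "", line });
            }
        }
        return pairs;
    }
}
```

Each returned array plays the role of the NA array in the question: element 0 is the base entry and element 1 is the result entry.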

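// Java: count the entries of the first list, then collect the entries of the second list that were counted.
List<String> one = new ArrayList<>();
List<String> two = new ArrayList<>();
List<String> three = new ArrayList<>();
Map<String, Integer> intersect = new HashMap<>();
for (String index : one)
{
    intersect.merge(index, 1, Integer::sum); // occurrences in the first list
}
for (String index : two)
{
    if (intersect.containsKey(index))
    {
        three.add(index); // present in both lists
    }
}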

When I add a new item to a List<List<string>> each item of the parent list gets the same values

Apologies if the answer to this is obvious; I'm fairly new to C# and OOP. I've stepped through my code and spent quite some time on Google, but I can't find the answer to my question (quite possibly because I am using the wrong search terms!).
I have the following class that creates a static List<List<string>> and has a method to add items to that list:
public static class WordList
{
    static List<List<string>> _WordList; // Static List instance
    static WordList()
    {
        //
        // Allocate the list.
        //
        _WordList = new List<List<string>>();
    }
    public static void Record(List<string> Words)
    {
        //
        // Record this value in the list.
        //
        _WordList.Add(Words);
    }
}
Elsewhere I create a List<string>, which I pass into the Record() method to be added to _WordList. The problem is that when I add items to WordList, every item in that list ends up with the same value. E.g.:
1st item added contains "Foo" and "bar"
2nd item added contains "Not","Foo" and "bar"
So instead of a list that looks like:
1: "Foo","bar"
2: "Not","Foo","bar"
I end up with:
1: "Not","Foo","bar"
2: "Not","Foo","bar"
I haven't used a List<string[]> instead of a List<List<string>> because the way I am getting the List<string> to add is by reading a text file line by line with a delimiter saying when I should add the List<string> and clear it so I can start again. Therefore I don't know how long an array I need to declare.
Hope this makes some kind of sense! If you need any more of the code posted to help, let me know.
Thanks, in advance.
EDIT
Here is the code for the creation of the List<string> that is passed to the Record() method. I think I see what people are saying about not creating a new instance of the List<string> but I'm not sure how to remedy this in regards to my code. I will have a think about it and post an answer if I come up with one!
public static void LoadWordList(string path)
{
    string line;
    List<string> WordsToAdd = new List<string>();
    StreamReader file = new System.IO.StreamReader(path);
    while ((line = file.ReadLine()) != null)
    {
        if (line.Substring(0, 1) == "$")
        {
            WordList.Record(WordsToAdd);
            WordsToAdd.Clear();
            WordsToAdd.Add(line.Replace("$", ""));
        }
        else
        {
            WordsToAdd.Add(line.Replace("_"," "));
        }
    }
    file.Close();
}

Instead of
WordList.Record(WordsToAdd);
WordsToAdd.Clear();
WordsToAdd.Add(line.Replace("$", ""));
do
WordList.Record(WordsToAdd);
WordsToAdd = new List<string>();
WordsToAdd.Add(line.Replace("$", ""));
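The fix above can also be sketched as a loop that starts a fresh list at every "$" line and adds it to the result straight away, which has the side effect of not losing the last group when the file ends. This is only an illustration under assumptions: it works on lines already read into memory, the class and method names are invented, and lines appearing before the first "$" are ignored here, which differs from the original loader.

```csharp
using System;
using System.Collections.Generic;

public static class WordGroups
{
    // Hypothetical helper: groups lines into lists, starting a new list at each "$" line.
    public static List<List<string>> Group(IEnumerable<string> lines)
    {
        var groups = new List<List<string>>();
        List<string> current = null;
        foreach (var line in lines)
        {
            if (line.StartsWith("$", StringComparison.Ordinal))
            {
                // A new list instance per group, so earlier groups are never modified.
                current = new List<string> { line.Replace("$", "") };
                groups.Add(current);
            }
            else if (current != null)
            {
                current.Add(line.Replace("_", " "));
            }
            // Assumption: lines before the first "$" are skipped.
        }
        return groups;
    }
}
```

Using StartsWith instead of Substring(0, 1) also avoids an exception on empty lines.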

All that your Record method is doing is adding a reference to the List<string> you've passed to it. You then clear that same list, and start adding different strings to it.
Maybe something like:
public static void Record(IEnumerable<string> Words)
{
    _WordList.Add(Words.ToList());
}
This forces a copy to occur; also, by accepting IEnumerable<string>, it puts fewer restrictions on the code that calls it.
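To make the difference concrete, here is a small self-contained sketch that records the same list twice, once by reference and once by copy. The class and method names are made up for the demonstration and are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;

public static class AliasingDemo
{
    // Adds the same List instance twice: both entries alias one object.
    public static List<List<string>> RecordByReference()
    {
        var all = new List<List<string>>();
        var words = new List<string> { "Foo", "bar" };
        all.Add(words);
        words.Clear();
        words.AddRange(new[] { "Not", "Foo", "bar" });
        all.Add(words);
        return all;
    }

    // Adds a copy each time: later changes to `words` do not affect earlier entries.
    public static List<List<string>> RecordByCopy()
    {
        var all = new List<List<string>>();
        var words = new List<string> { "Foo", "bar" };
        all.Add(new List<string>(words));
        words.Clear();
        words.AddRange(new[] { "Not", "Foo", "bar" });
        all.Add(new List<string>(words));
        return all;
    }
}
```

With RecordByReference, `object.ReferenceEquals(all[0], all[1])` is true and both entries show the latest contents; with RecordByCopy the first entry still holds "Foo" and "bar".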

Can you post the code that adds the list? I bet you are doing something like:
create a list l
add it
modify l
add it
This results in a single object (because you created it only once) with multiple references to it, namely from the first value in _WordList, from the second value in _WordList, and from l.
So the right way to do it is:
create list l
add it
create NEW list l
add it
Or in code:
List<string> l = new string[] { "Foo", "bar" }.ToList();
WordList.Record(l);
l = new string[] { "Not", "Foo", "bar" }.ToList();
WordList.Record(l);

You haven't shown how you are adding items to the list. Here's an example which works as expected:
using System;
using System.Collections.Generic;
using System.Linq;
public static class WordList
{
    static List<List<string>> _WordList; // Static List instance
    static WordList()
    {
        _WordList = new List<List<string>>();
    }
    public static void Record(List<string> Words)
    {
        _WordList.Add(Words);
    }
    public static void Print()
    {
        foreach (var item in _WordList)
        {
            Console.WriteLine("-----");
            Console.WriteLine(string.Join(",", item.ToArray()));
        }
    }
}
class Program
{
    static void Main()
    {
        WordList.Record(new[] { "Foo", "bar" }.ToList());
        WordList.Record(new[] { "Not", "Foo", "bar" }.ToList());
        WordList.Print();
    }
}
