LINQ Query Not Selecting Files - c#

I am trying to LINQ query a set of files where I can find the file names with a specific string in them.
I was using:
var docs = Directory.EnumerateFiles(searchFolder, "* " + strNumber + "*", SearchOption.AllDirectories);
That was working fine, but some of my file searches were taking 30+ minutes because one of the directories has over 1 million files. I was hoping to speed up the search with a PLINQ query. However, although the syntax seems fine, I'm not getting the results I expect. It looks like my problem may be in the Where clause.
foreach (var strNumber in strNumbers)
{
    DirectoryInfo searchDirectory = new DirectoryInfo(searchFolder);
    IEnumerable<System.IO.FileInfo> allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories);
    IEnumerable<System.IO.FileInfo> docsToProcess = strNumbers
        .SelectMany(number => allDocs
            .Where(file => file.Name.Contains(number)))
        .Distinct();
}
Any help would be much appreciated.

I would change the order of the work:
Create a list of all files (in memory).
Perform the search over the in-memory list.
Then you can use Parallel.ForEach over the in-memory array, and your disk usage is limited to the initial enumeration.
var searchDirectory = new DirectoryInfo(searchFolder);
var allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories).ToArray();
// For extra points, use Parallel.ForEach here for multi-threaded work
Parallel.ForEach(strNumbers, strNumber =>
{
    // Work on allDocs here; it is already in memory
});
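A fuller sketch of this approach, assuming you want one result list per number. `FileSearch` and `FindByNumbers` are hypothetical names, and the dictionary-of-lists result shape is my assumption, not something the question specifies:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class FileSearch
{
    // Walks the directory tree once, then matches each number against the
    // in-memory array, so the disk is only touched during the enumeration.
    public static Dictionary<string, List<FileInfo>> FindByNumbers(
        string searchFolder, IEnumerable<string> strNumbers)
    {
        FileInfo[] allDocs = new DirectoryInfo(searchFolder)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .ToArray();

        // ConcurrentDictionary because several worker threads write results at once.
        var results = new ConcurrentDictionary<string, List<FileInfo>>();
        Parallel.ForEach(strNumbers, strNumber =>
        {
            // Each worker only reads allDocs, so sharing the array is safe.
            results[strNumber] = allDocs
                .Where(file => file.Name.Contains(strNumber))
                .ToList();
        });

        return new Dictionary<string, List<FileInfo>>(results);
    }
}
```

The enumeration itself is still single-threaded; parallelism only helps the matching step. Since each Contains check is cheap, most of the saving likely comes from walking the large directory once instead of once per number.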

Related

What's the best way to fill List<string>?

I have a method whose input is a list of file paths; I want to open these files and process them. Each path includes the file extension, and I know for sure there are only 3 extensions (txt, xlsx, xls).
In the code, pathWithFilesName is the input list of file paths;
then I want to send them to methods that will open and process them.
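var pathWithFilesName = new List<string>(); // input list of file paths
pathWithFilesName.Add("ds.xlsx");
pathWithFilesName.Add("ds.txt");
var listExcel = new List<string>();
var listTxt = new List<string>();
var validExcelFileTypes = new List<string> { ".xls", ".xlsx" };
foreach (var path in pathWithFilesName)
{
    // Decide once per path, so each file ends up in exactly one list
    bool isExcel = false;
    foreach (var valid in validExcelFileTypes)
    {
        if (path.EndsWith(valid))
        {
            isExcel = true;
            break;
        }
    }
    if (isExcel)
    {
        listExcel.Add(path);
    }
    else
    {
        listTxt.Add(path);
    }
}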
This variant is not optimal at all, but it works.
I know how to get the Excel files with LINQ:
var list = (from path in pathWithFilesName from valid in validExcelFileTypes where path.EndsWith(valid) select path).ToList();
but with this approach I then need to compare the two lists, for example with some kind of Intersect.
What is the best way to do this selection?
Here is a variation using LINQ and lambdas. It should be neither more nor less efficient, but it may be more readable.
listExcel can be found this way:
var listExcel = pathWithFilesName.Where(path => validExcelFileTypes.Any(ext => path.EndsWith(ext))).ToList();
See Enumerable.Any and Enumerable.Where.
If you need both lists in one go, you can group the source on the same condition:
var listGrp = pathWithFilesName.GroupBy(path=>validExcelFileTypes.Any(ext=> path.EndsWith(ext)));
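To pull both lists out of that grouping, one option is ToLookup instead of GroupBy (a swap on my part, not what this answer uses): indexing a lookup with a key that has no items returns an empty sequence, so there is no need to search for the right group. The sketch also compares extensions case-insensitively, which is an added assumption; `PathSplitter` and `Split` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PathSplitter
{
    static readonly string[] ValidExcelFileTypes = { ".xls", ".xlsx" };

    // Splits the paths into Excel files and everything else in a single pass.
    public static (List<string> Excel, List<string> Other) Split(IEnumerable<string> paths)
    {
        // The key is true when the path ends with one of the Excel extensions.
        ILookup<bool, string> byIsExcel = paths.ToLookup(
            path => ValidExcelFileTypes.Any(ext => path.EndsWith(ext, StringComparison.OrdinalIgnoreCase)));

        // A missing key yields an empty sequence, so an empty group is not a problem.
        return (byIsExcel[true].ToList(), byIsExcel[false].ToList());
    }
}
```

Usage: `var (listExcel, listTxt) = PathSplitter.Split(pathWithFilesName);`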
You can use MoreLINQ's Partition ("Partitions a sequence by a predicate, ..."):
var (listExcel, listTxt) = pathWithFilesName
.Partition(p =>
validExcelFileTypes.Any(ext => p.EndsWith(ext))
);
Under the hood it is just a GroupBy (see the MoreLINQ source code), unrolled into a named tuple.

How to search a directory for files that begin with something then get the one that was modified most recently

What I want to do is search/scan a directory for multiple files beginning with something, then get the most recently modified one. For example, I want to search the directory Prefetch for files that begin with "apple", "pear", and "orange". These files may not exist, but if they do, say there are files that begin with apple and files that begin with pear: out of all of those files, I want to get the one that was modified most recently. The code below does this, but it only searches for one prefix.
DirectoryInfo prefetch = new DirectoryInfo("c:\\Windows\\Prefetch");
FileInfo[] apple = prefetch.GetFiles("apple*");
if (apple.Length == 0)
{
    // Do something
}
else
{
    double lastused = DateTime.Now.Subtract(
        apple.OrderByDescending(x => x.LastWriteTime)
             .FirstOrDefault().LastWriteTime).TotalMinutes;
    int final = Convert.ToInt32(lastused);
}
Basically, how can I make that code search 'apple', 'pear' etc. instead of just apple? I don't know if you can modify the code above to do that or if you have to change it completely. I've been trying to figure this out for hours and can't do it.
As explained in my comments, you can't use DirectoryInfo.GetFiles to return a list of FileInfo matching several different patterns; only one pattern is supported per call.
As others have already shown, you can prepare a list of patterns and then call GetFiles for each pattern in a loop.
However, I'd like to show you the same approach done with a single LINQ statement.
List<string> patterns = new List<string> { "apple*", "pear*", "orange*" };
DirectoryInfo prefetch = new DirectoryInfo(@"c:\Windows\Prefetch");
var result = patterns.SelectMany(x => prefetch.GetFiles(x))
.OrderByDescending(k => k.LastWriteTime)
.FirstOrDefault();
Now result is the FileInfo with the most recent update. Of course, if no file matches any of the three patterns, result will be null, so a check before using that variable is mandatory.
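Folding the question's minutes calculation into that single statement could look like the sketch below. `PrefetchCheck` and `MinutesSinceNewest` are hypothetical names; returning null when nothing matches is one way to make the mandatory check explicit:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class PrefetchCheck
{
    // Minutes since the newest file matching any of the patterns was written,
    // or null when no file matches at all.
    public static int? MinutesSinceNewest(DirectoryInfo dir, IEnumerable<string> patterns)
    {
        FileInfo newest = patterns
            .SelectMany(pattern => dir.GetFiles(pattern))
            .OrderByDescending(file => file.LastWriteTime)
            .FirstOrDefault();

        if (newest == null)
        {
            return null;
        }

        return Convert.ToInt32(DateTime.Now.Subtract(newest.LastWriteTime).TotalMinutes);
    }
}
```

Usage: `PrefetchCheck.MinutesSinceNewest(new DirectoryInfo(@"c:\Windows\Prefetch"), new[] { "apple*", "pear*", "orange*" })`.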
You could create a set of files that match the prefixes then check the date of those files, something like (not tested):
List<string> files = new List<string>();
foreach (var str in prefixes)
    files.AddRange(dirInfo.GetFiles(str).Select(f => f.FullName));
return (from name in files
        let d = File.GetLastAccessTime(name)
        orderby d descending
        select d).FirstOrDefault();
prefixes is the list of search patterns, and dirInfo is a DirectoryInfo object.
You can iterate over a list of patterns:
List<string> patterns = new List<string> { "apple*", "pear*", "orange*" };
DirectoryInfo prefetch = new DirectoryInfo("c:\\Windows\\Prefetch");
foreach (var pattern in patterns) {
FileInfo[] files = prefetch.GetFiles(pattern);
var lastAccessed = files.OrderByDescending(x => x.LastAccessTime).FirstOrDefault();
if (lastAccessed != null) {
var minutes = DateTime.Now.Subtract(lastAccessed.LastAccessTime).TotalMinutes;
}
}

Enumerate contents of specific folder DotNetZip, without child folders

Using Ionic.Zip
I want to display the files and folders in a specific folder of the archive. I am using the SelectEntries method, but unfortunately it filters out the folders, which is not what I expected when using '*'.
ICollection<ZipEntry> selectEntries = _zipFile.SelectEntries("*", rootLocation);
If I follow an alternative approach:
IEnumerable<ZipEntry> selectEntries = _zipFile.Entries.Where(e => e.FileName.StartsWith(rootLocation));
I face two problems:
I potentially have to swap '/' for '\'.
I get the entries from all nested subfolders, not just the direct children, which is not desirable.
Anyone know why SelectEntries returns no folders, or am I misusing it?
I found a solution for my particular case. I think something about the way the zip file was constructed made it appear to have folders when no folder entries actually existed; i.e., the following code yielded an empty list:
_zipFile.Entries.Where(e => e.IsDirectory).ToList(); // always empty!
I used the following snippet to achieve what I needed. The regex is not as comprehensive as it should be but worked for all cases I needed.
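var conformedRootLocation = rootLocation.Replace('\\', '/').TrimEnd('/') + "/";
// Anchored at the start so only entries under rootLocation match; the root is escaped in case it contains regex metacharacters
var pattern = string.Format(@"^({0})([a-zA-Z._0-9\s]+)/?", Regex.Escape(conformedRootLocation));
var regex = new Regex(pattern);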
return _zipFile.EntryFileNames.Select(e => regex.Match(e))
.Where(match => match.Success)
.Select(match => match.Groups[2].Value)
.Distinct()
.Select(f => new DirectoryResource
{
Name = f, IsDirectory = !Path.HasExtension(f)
})
.ToList();
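A regex-free alternative, sketched under the assumption that only the entry names matter: it works on the strings from `_zipFile.EntryFileNames` and treats any name that still contains a `/` after the root as a folder, so it does not depend on folder entries existing or on `Path.HasExtension`. `ZipListing` and `DirectChildren` are hypothetical names, and it returns tuples rather than the question's `DirectoryResource` type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ZipListing
{
    // Lists the immediate children of rootLocation from a flat list of entry names.
    public static List<(string Name, bool IsDirectory)> DirectChildren(
        IEnumerable<string> entryFileNames, string rootLocation)
    {
        string prefix = rootLocation.Replace('\\', '/').TrimEnd('/') + "/";

        return entryFileNames
            .Select(name => name.Replace('\\', '/'))
            .Where(name => name.StartsWith(prefix, StringComparison.Ordinal)
                           && name.Length > prefix.Length)
            .Select(name => name.Substring(prefix.Length))
            // "sub/b.txt" -> folder "sub"; "a.txt" -> file "a.txt"; "sub/" -> folder "sub"
            .Select(rest => (Name: rest.Split('/')[0], IsDirectory: rest.IndexOf('/') >= 0))
            .Distinct()
            .ToList();
    }
}
```

Usage: `ZipListing.DirectChildren(_zipFile.EntryFileNames, rootLocation)`.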

C#: Get the 5 newest (last modified) files from a directory

Is there a way I can store the file paths of the 5 most recently modified files from a directory in an array?
I am currently using the code below to get the most recently modified one:
DateTime lastHigh = new DateTime(1900,1,1);
string highDir;
foreach (string subdir in Directory.GetDirectories(path)){
DirectoryInfo fi1 = new DirectoryInfo(subdir);
DateTime created = fi1.LastWriteTime;
if (created > lastHigh){
highDir = subdir;
lastHigh = created;
}
}
I'll be using the array to send multiple files to an email address as attachments.
UPDATE
I am currently using the code below to get the files modified within the last minute:
string myDirectory = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyPictures),
"Test Folder");
var directory = new DirectoryInfo(myDirectory);
DateTime from_date = DateTime.Now.AddMinutes(-1);
DateTime to_date = DateTime.Now;
var files = directory.GetFiles().Where(file => file.LastWriteTime >= from_date && file.LastWriteTime <= to_date);
I want to store the list of file names coming from files.
Here's a general way to do this with LINQ:
Directory.GetFiles(path)
.Select(x => new FileInfo(x))
.OrderByDescending(x => x.LastWriteTime)
.Take(5)
.ToArray()
I suspect this isn't quite what you want, since your code examples seem to be working at different tasks, but in the general case, this would do what the title of your question requests.
It sounds like you want a string array of the full file paths of all the files in a directory.
Given you already have your FileInfo enumerable, you can do this:
var filenames = files.Select(f => f.FullName).ToArray();
If you wanted just the filenames, replace FullName with Name.
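Combining this with the Take(5) answer gives the string array the question asks for. A minimal sketch; `RecentFiles` and `NewestFilePaths` are hypothetical names, and only the top-level folder is searched, as in the question's GetFiles() call:

```csharp
using System.IO;
using System.Linq;

static class RecentFiles
{
    // Full paths of the `count` most recently written files directly in `path`
    // (subfolders are not searched), newest first.
    public static string[] NewestFilePaths(string path, int count)
    {
        return new DirectoryInfo(path)
            .GetFiles()
            .OrderByDescending(file => file.LastWriteTime)
            .Take(count)
            .Select(file => file.FullName)
            .ToArray();
    }
}
```

Usage: `string[] attachments = RecentFiles.NewestFilePaths(myDirectory, 5);`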
While the answer Paul Phillips provided works, it's worth keeping in mind that FileInfo.LastWriteTime and FileInfo.LastAccessTime do not always return up-to-date values. This can depend on how the OS is configured (for example, last-access updates can be disabled), or it could be a caching issue.
.NET FileInfo.LastWriteTime & FileInfo.LastAccessTime are wrong
File.GetLastWriteTime seems to be returning 'out of date' value
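One caching effect that code can control: a FileInfo reads the file's attributes once and keeps them until Refresh() is called. A minimal sketch of a helper that always re-reads; `FreshTimestamps` and `CurrentLastWriteTime` are hypothetical names:

```csharp
using System;
using System.IO;

static class FreshTimestamps
{
    // FileInfo caches its attributes after they are first read; Refresh()
    // re-reads them from the file system before the write time is returned.
    public static DateTime CurrentLastWriteTime(FileInfo file)
    {
        file.Refresh();
        return file.LastWriteTime;
    }
}
```

This does not help with OS-level settings such as disabled last-access updates; it only rules out a stale FileInfo object.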

How to read File names recursively from subfolder using LINQ

How can I read the names of files with a .dll extension from a directory and its subfolders recursively, using LINQ or a lambda expression?
Right now I'm using nested foreach loops to do this.
Is there any way to do this using LINQ or LAMBDA expression?
You don't need to use LINQ to do this - it's built into the framework:
string[] files = Directory.GetFiles(directory, "*.dll",
SearchOption.AllDirectories);
or, if you're using .NET 4 or later:
IEnumerable<string> files = Directory.EnumerateFiles(directory, "*.dll",
SearchOption.AllDirectories);
To be honest, LINQ isn't great in terms of recursion. You'd probably want to write your own general-purpose recursive extension method. Given how often this sort of question is asked, I should really do that myself some time...
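Such a general-purpose helper might look like the sketch below. `Traverse` and `DllNames` are hypothetical names, not framework methods, and `Traverse` walks the tree with an explicit stack rather than true recursion:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TreeExtensions
{
    // Yields root and every descendant, using an explicit stack instead of
    // recursion so deep trees do not nest iterators or overflow the call stack.
    public static IEnumerable<T> Traverse<T>(this T root, Func<T, IEnumerable<T>> children)
    {
        var pending = new Stack<T>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            T current = pending.Pop();
            yield return current;
            foreach (T child in children(current))
            {
                pending.Push(child);
            }
        }
    }

    // Example use: names of all .dll files under a folder, including subfolders.
    public static IEnumerable<string> DllNames(string directory)
    {
        return new DirectoryInfo(directory)
            .Traverse(dir => dir.EnumerateDirectories())
            .SelectMany(dir => dir.EnumerateFiles("*.dll"))
            .Select(file => file.Name);
    }
}
```

Because the traversal is a plain IEnumerable, the usual LINQ operators (Where, Select, Distinct) compose on top of it.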
This returns just the file names with extensions:
DirectoryInfo di = new DirectoryInfo(@"d:\somewhere\");
var q = from i in di.GetFiles("*.dll", SearchOption.AllDirectories)
        select i.Name;
This returns just the file names without extensions:
DirectoryInfo di = new DirectoryInfo(@"d:\somewhere\");
var q = from i in di.GetFiles("*.dll", SearchOption.AllDirectories)
        select System.IO.Path.GetFileNameWithoutExtension(i.Name);
If you really want to do it with a recursive lambda expression, here you go:
Action<string, List<string>> discoverFiles = null;
discoverFiles = new Action<string, List<string>>((dir, list) =>
{
try
{
foreach (var subDir in Directory.GetDirectories(dir))
discoverFiles(subDir, list);
foreach (var dllFile in Directory.GetFiles(dir, "*.dll"))
{
var fileNameOnly = Path.GetFileName(dllFile);
if (!list.Contains(fileNameOnly))
list.Add(fileNameOnly);
}
}
catch (IOException)
{
// decide what to do here
}
});
// usage:
var targetList = new List<string>();
discoverFiles("c:\\MyDirectory", targetList);
foreach (var item in targetList)
Debug.WriteLine(item);
Note: this is probably several times slower (and way harder to read/debug/maintain) than the previous answers, but it does not stop if there is an I/O exception somewhere.
IEnumerable<string> filenames = Directory.GetFiles(searchDirectory, "*.dll",
SearchOption.AllDirectories)
.Select(s => Path.GetFileName(s));
Directory.GetFiles() returns the full paths of the files that match the specified search pattern in the specified directory (and, with SearchOption.AllDirectories, in its subdirectories). Select projects each element of that full-path sequence into a new form: just the file name.
Reading files and directories is usually done with classes in the System.IO namespace. So the first step would be to get all the files you need to read with the Directory.EnumerateFiles method, and then, for each file that matches your search criteria, read its contents using, for example, the File.ReadAllBytes method.
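Those two steps might be sketched as follows, assuming the contents fit in memory and a dictionary keyed by path is a convenient shape. `DllReader` and `ReadMatchingFiles` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DllReader
{
    // Finds every file matching searchPattern under directory (recursively)
    // and reads each one fully into memory, keyed by the path EnumerateFiles returned.
    public static Dictionary<string, byte[]> ReadMatchingFiles(string directory, string searchPattern)
    {
        return Directory
            .EnumerateFiles(directory, searchPattern, SearchOption.AllDirectories)
            .ToDictionary(path => path, path => File.ReadAllBytes(path));
    }
}
```

For very large files, streaming each one (for example with File.OpenRead) would avoid loading everything at once.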
