What's the best way to fill a List<string>? - C#

I have a method whose input is a list of file paths; I want to open these files and process them. Each path includes the file extension, and I know for sure that there are only 3 file extensions (txt, xlsx, xls).
In the code below, pathWithFilesName is the input list of file paths.
I then want to pass the paths to the methods that will open and process them:
pathWithFilesName.Add("ds.xlsx");
pathWithFilesName.Add("ds.txt");
var listExcel=new List<string>();
var listTxt= new List<string>();
var validExcelFileTypes = new List<string>{ ".xls", ".xlsx" };
foreach (var path in pathWithFilesName)
{
    foreach (var valid in validExcelFileTypes)
    {
        if (path.EndsWith(valid))
        {
            listExcel.Add(path);
        }
        else
        {
            listTxt.Add(path);
        }
    }
}
This variant is not optimal at all, but it works :)
I know how to select the Excel files with LINQ:
var list= (from path in pathWithFilesName from valid in validExcelFileTypes where path.EndsWith(valid) select path).ToList();
But with this approach I then need to compare the two lists, for example with some kind of Intersect.
What is the best way to make this selection?

Here is a variation using LINQ and lambdas. It should be neither more nor less efficient, but it may be more readable.
The listExcel can be found this way:
var listExcel = pathWithFilesName.Where(path=>validExcelFileTypes.Any(ext=> path.EndsWith(ext)));
See Enumerable.Any and Enumerable.Where.
If you need both lists in one go, you can group the source on the same condition:
var listGrp = pathWithFilesName.GroupBy(path=>validExcelFileTypes.Any(ext=> path.EndsWith(ext)));
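If you want the two groups as concrete lists, ToLookup is a close cousin of GroupBy that may be easier to index. The snippet below is only a sketch under my own assumptions: it compares extensions with Path.GetExtension and a case-insensitive HashSet (neither is in the question), and the upper-case sample name is made up to show that.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sample input from the question, plus one upper-case name (an added assumption).
var pathWithFilesName = new List<string> { "ds.xlsx", "ds.txt", "REPORT.XLS" };

// Case-insensitive set of Excel extensions (the comparer is an assumption).
var validExcelFileTypes = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".xls", ".xlsx" };

// The key is true for Excel files and false for everything else.
ILookup<bool, string> byKind = pathWithFilesName
    .ToLookup(path => validExcelFileTypes.Contains(Path.GetExtension(path)));

// A lookup returns an empty sequence for a missing key, so both lists can always be built.
List<string> listExcel = byKind[true].ToList();
List<string> listTxt = byKind[false].ToList();

Console.WriteLine(string.Join(", ", listExcel));
Console.WriteLine(string.Join(", ", listTxt));
```

Each path is visited once and lands in exactly one of the two lists, so no Intersect or Except is needed afterwards.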

You can use MoreLINQ's Partition: "Partitions a sequence by a predicate,..".
var (listExcel, listTxt) = pathWithFilesName
.Partition(p =>
validExcelFileTypes.Any(ext => p.EndsWith(ext))
);
Under the hood it's just a GroupBy (see the source code), with the two groups unrolled into a tuple.

Related

LINQ Query Not Selecting Files

I am trying to LINQ query a set of files where I can find the file names with a specific string in them.
I was using:
var docs = Directory.EnumerateFiles(searchFolder, "* " + strNumber + "*", SearchOption.AllDirectories);
That was working fine, but some of my file searches were taking 30+ minutes due to the fact that one of the directories has 1+ million files. I was hoping to speed up the search process with a PLINQ query. However, while my syntax is good, I'm not getting the results I would expect. It looks like my problem may be in the Where statement. Any help would be helpful.
foreach (strNumber in strNumbers)
{
DirectoryInfo searchDirectory = new DirectoryInfo(searchFolder);
IEnumerable<System.IO.FileInfo> allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories);
IEnumerable<System.IO.FileInfo> docsToProcess = strNumbers
.SelectMany(strNumber => allDocs
.Where(file => file.Name.Contains(strNumber)))
.Distinct();
}
Any help would be much appreciated.
I would change the order of the problem.
Create a list of all files (into memory)
Perform the search over the memory list
Then you can use Parallel.ForEach over the in-memory array, and your disk usage is limited to the initial enumeration.
var searchDirectory = new DirectoryInfo(searchFolder);
// Enumerate once and materialize the result (ToArray needs using System.Linq;)
var allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories).ToArray();
// For extra points, use a Parallel.ForEach here for multi-threaded work
Parallel.ForEach(strNumbers, strNumber =>
{
    // Work on allDocs here, it should be in memory
});
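To make the body of that loop concrete, here is a rough sketch of what "work on allDocs" could look like. The scratch folder, the file names, and the ConcurrentDictionary used to collect results are all my assumptions for the sake of a runnable example, not part of the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical inputs: a scratch folder with three files stands in for the real search folder.
string searchFolder = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(searchFolder);
foreach (var name in new[] { "a 1001 x.txt", "b 1002 y.txt", "c 9999 z.txt" })
    File.WriteAllText(Path.Combine(searchFolder, name), "");
string[] strNumbers = { "1001", "1002" };

// Hit the disk once and keep the full paths in memory.
string[] allDocs = Directory
    .EnumerateFiles(searchFolder, "*", SearchOption.AllDirectories)
    .ToArray();

// Thread-safe map from each search string to the files whose name contains it.
var matches = new ConcurrentDictionary<string, string[]>();
Parallel.ForEach(strNumbers, strNumber =>
{
    matches[strNumber] = allDocs
        .Where(path => Path.GetFileName(path).Contains(strNumber))
        .ToArray();
});

foreach (var pair in matches)
    Console.WriteLine($"{pair.Key}: {pair.Value.Length} file(s)");

// Clean up the scratch folder.
Directory.Delete(searchFolder, recursive: true);
```

The parallel part only touches memory, so whether it pays off depends on how many search strings there are; the single enumeration is likely where most of the time goes.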

How to Remove Directories From EnumerateFiles?

So I'm working on a program that will list all the files in a directory. Pretty simple. Basically, when I do this: List<string> dirs = new List<string>(Directory.EnumerateFiles(target));, I don't want it to include the directory and all. Just the file name. When I run my code;
List<string> dirs = new List<string>(Directory.EnumerateFiles(target));
Console.WriteLine($"Folders and files in this directory:\n");
foreach (string i in dirs) {
Console.WriteLine($"> {i}");
}
it gives me the following:
C:\Users\Camden\Desktop\Programming\Visual Studio\C#\DirectoryManager\DirectoryManager\bin\Debug\DirectoryManager.exe
I just want the DirectoryManager.exe part, so I looked it up and I found that you can replace strings inside of strings. Like so: i.Replace(target, "");. However, this isn't doing anything, and it's just running like normal. Why isn't it replacing, and how should I instead do this?
Use methods from the System.IO.Path class.
var fullfile = @"C:\Users\Camden\Desktop\Programming\Visual Studio\C#\DirectoryManager\DirectoryManager\bin\Debug\DirectoryManager.exe";
var fileName = Path.GetFileName(fullfile); // DirectoryManager.exe
var name = Path.GetFileNameWithoutExtension(fullfile); // DirectoryManager
The simplest way is to use the Select IEnumerable extension
(you need to have a using System.Linq; at the top of your source code file)
List<string> files = new List<string>(Directory.EnumerateFiles(target)
.Select(x => Path.GetFileName(x)));
In this way the sequence of files retrieved by Directory.EnumerateFiles is passed, one by one, to the Select method where each fullfile name (x) is passed to Path.GetFileName to produce a new sequence of just filenames.
This sequence is then returned as a parameter to the List constructor.
As for your question about the Replace method: remember that Replace doesn't change the string on which you call it, but returns a new string with the replacement applied. In .NET, strings are immutable.
So if you want to see the result of the replacement you need
string justFileName = i.Replace(target, "");
An alternative to using Directory.EnumerateFiles, would be DirectoryInfo.EnumerateFiles. This method returns an IEnumerable<FileInfo>. You can then make use of the FileInfo.Name property of each of the returned objects. Your code would then become:
var files = new DirectoryInfo(target).EnumerateFiles();
Console.WriteLine("Files in this directory:\n");
foreach (FileInfo i in files) {
Console.WriteLine($"> {i.Name}");
}
For just the list of file names:
List<string> fileNames = new DirectoryInfo(target).EnumerateFiles().Select(f => f.Name).ToList();
Alternatively, if you want both files and directories, you can use EnumerateFileSystemInfos. If you need to know if you have a file vs a directory you can query the Attributes property and compare it to the FileAttributes flags enumeration.
var dirsAndFiles = new DirectoryInfo(target).EnumerateFileSystemInfos();
Console.WriteLine("Folders and files in this directory:\n");
foreach (var i in dirsAndFiles) {
var type = (i.Attributes & FileAttributes.Directory) == FileAttributes.Directory ? "Directory" : "File";
Console.WriteLine($"{type} > {i.Name}");
}
The FileSystemInfo.Name property will return either the file's name (in case of a file) or the last directory in the hierarchy (for a directory)--so just the subdirectory name and not the full path ("sub" instead of "c:\sub").

How to search a directory for files that begin with something then get the one that was modified most recently

What I want to do is search/scan a directory for multiple files beginning with something, then get the file that was modified most recently. For example, I want to search the directory Prefetch for files that begin with "apple", "pear", and "orange". These files may not exist, but if they do, and say there are files that begin with apple and files that begin with pear, out of all of those files, I want to get the one that was modified most recently. The code below allows me to do this, but it searches for only one prefix.
DirectoryInfo prefetch = new DirectoryInfo("c:\\Windows\\Prefetch");
FileInfo[] apple = prefetch.GetFiles("apple*");
if (apple.Length == 0)
// Do something
else
{
double lastused = DateTime.Now.Subtract(
apple.OrderByDescending(x => x.LastWriteTime)
.FirstOrDefault().LastWriteTime).TotalMinutes;
int final = Convert.ToInt32(lastused);
}
Basically, how can I make that code search 'apple', 'pear' etc. instead of just apple? I don't know if you can modify the code above to do that or if you have to change it completely. I've been trying to figure this out for hours and can't do it.
As explained in my comments, you can't use DirectoryInfo.GetFiles to return a list of FileInfo for several different patterns. Only one pattern is supported per call.
As others have already shown, you can prepare a list of patterns and then call GetFiles in a loop, once per pattern.
However, I would like to show you the same approach done as a single LINQ expression.
List<string> patterns = new List<string> { "apple*", "pear*", "orange*" };
DirectoryInfo prefetch = new DirectoryInfo(@"c:\Windows\Prefetch");
var result = patterns.SelectMany(x => prefetch.GetFiles(x))
.OrderByDescending(k => k.LastWriteTime)
.FirstOrDefault();
Now result is the FileInfo with the most recent LastWriteTime. Of course, if no file matches any of the three patterns, then result will be null, so a null check before using that variable is mandatory.
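For completeness, the query plus the null check could be wrapped up roughly like this. FindMostRecent is a hypothetical helper name, and the scratch folder with two back-dated files is only there so the sketch runs anywhere; in the question the directory would be the Prefetch folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: newest file matching any of the patterns, or null when nothing matches.
FileInfo? FindMostRecent(DirectoryInfo directory, IEnumerable<string> patterns) =>
    patterns.SelectMany(pattern => directory.GetFiles(pattern))
            .OrderByDescending(file => file.LastWriteTime)
            .FirstOrDefault();

// Scratch folder standing in for the real directory (an assumption for this sketch).
string folder = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
DirectoryInfo prefetch = Directory.CreateDirectory(folder);
File.WriteAllText(Path.Combine(folder, "apple1.pf"), "");
File.WriteAllText(Path.Combine(folder, "pear1.pf"), "");
File.SetLastWriteTime(Path.Combine(folder, "apple1.pf"), DateTime.Now.AddMinutes(-30));
File.SetLastWriteTime(Path.Combine(folder, "pear1.pf"), DateTime.Now.AddMinutes(-5));

var patterns = new[] { "apple*", "pear*", "orange*" };
FileInfo? result = FindMostRecent(prefetch, patterns);

if (result != null)
{
    // Same "minutes since last write" calculation as in the question.
    int minutes = (int)(DateTime.Now - result.LastWriteTime).TotalMinutes;
    Console.WriteLine($"{result.Name} was modified about {minutes} minute(s) ago");
}
else
{
    Console.WriteLine("No matching file");
}

// Clean up the scratch folder.
Directory.Delete(folder, recursive: true);
```

A pattern that matches nothing (orange* here) simply contributes an empty array to SelectMany, so it needs no special handling.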
You could create a set of files that match the prefixes, then check the date of those files, something like (not tested):
var files = new List<FileInfo>();
foreach (var str in prefixes)
    files.AddRange(dirInfo.GetFiles(str));
// Latest access time among all matches (default(DateTime) if nothing matched)
return (from file in files
        orderby file.LastAccessTime descending
        select file.LastAccessTime).FirstOrDefault();
prefixes is the list of search patterns, and dirInfo is a DirectoryInfo object.
You can iterate over a list
List<string> patterns = new List<string> { "apple*", "pear*", "orange*" };
DirectoryInfo prefetch = new DirectoryInfo("c:\\Windows\\Prefetch");
foreach (var pattern in patterns) {
FileInfo[] files = prefetch.GetFiles(pattern);
var lastAccessed = files.OrderByDescending(x => x.LastAccessTime).FirstOrDefault();
if (lastAccessed != null) {
var minutes = DateTime.Now.Subtract(lastAccessed.LastAccessTime).TotalMinutes;
}
}

How to search for Files using GetFiles method (multiple criteria..)

The code below obviously searches a directory for Files that contain the word "FINAL" but what I'm wondering is can I add to its search criteria? I have a Well_Name and Actual_Date strings that I would like to search for in the File names in addition to the "FINAL" word. Thoughts? Thanks in advance.
DirectoryInfo myDir = new DirectoryInfo("C://DWGs");
var files = myDir.GetFiles("FINAL");
//Can I do something like this to add to my search criteria?
var files = myDir.GetFiles("FINAL" +
drow["Well_Name"].ToString() +
drow["Actual_Date"]);
var files = myDir.GetFiles()
    .Where(f => f.Name.Contains("FINAL") ||
                f.Name.Contains(drow["Well_Name"].ToString()) ||
                f.Name.Contains(drow["Actual_Date"].ToString()));
Since GetFiles() returns an array of FileInfo, which is enumerable, you can just check all of the file names against the criteria that you want.
If you want to get really generic on this you could write a function that looks like this
public IEnumerable<FileInfo> addCriteria(IEnumerable<FileInfo> FileList,
                                         List<String> searchCriteria)
{
    var newFileList = FileList;
    foreach (String criteria in searchCriteria)
    {
        // Each pass narrows the sequence, so a file must contain every criterion
        newFileList = newFileList.Where(f => f.Name.Contains(criteria));
    }
    return newFileList;
}
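The same "must contain every criterion" filter can also be written without the loop. The sketch below is an illustration under my own assumptions: MatchAll is a hypothetical helper, it works on plain file-name strings so it runs without touching the disk (for FileInfo objects you would pass f.Name), the comparison is case-insensitive, and the sample names are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keep only the names that contain every search term (logical AND).
IEnumerable<string> MatchAll(IEnumerable<string> fileNames, IReadOnlyCollection<string> terms) =>
    fileNames.Where(name =>
        terms.All(term => name.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0));

// Invented sample names and criteria, only for illustration.
var names = new[]
{
    "FINAL_WellA_2020-01-05.dwg",
    "FINAL_WellB_2020-01-05.dwg",
    "DRAFT_WellA_2020-01-05.dwg",
};
var criteria = new[] { "FINAL", "WellA", "2020-01-05" };

List<string> matched = MatchAll(names, criteria).ToList();
Console.WriteLine(string.Join(", ", matched));
```

Swapping All for Any would give the "any criterion is enough" behaviour of the || version above.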
The GetFiles method does not support multiple search criteria, but there is a simple way around this limitation: call GetFiles once for each file extension, and then merge the returned arrays into a List<>. Then use the List's ToArray method to convert the List back to an array.
I used this approach, and it works for me.
The code is below (do not forget to add "using System.Collections.Generic;"):
// Get the DirectoryInfo and FileInfo objects for aspx and html files.
FileInfo[] files_aspx = dir.GetFiles("*.aspx");
FileInfo[] files_html = dir.GetFiles("*.html");
List<FileInfo> files = new List<FileInfo>();
files.AddRange(files_aspx);
files.AddRange(files_html);
// ToArray returns a new array, so keep the result
FileInfo[] allFiles = files.ToArray();

How to read File names recursively from subfolder using LINQ

How can I read the names of files with the dll extension from a directory and, recursively, from its subfolders, using LINQ or a lambda expression?
Right now I'm using nested foreach loops to do this.
Is there any way to do this using LINQ or a lambda expression?
You don't need to use LINQ to do this - it's built into the framework:
string[] files = Directory.GetFiles(directory, "*.dll",
SearchOption.AllDirectories);
or if you're using .NET 4:
IEnumerable<string> files = Directory.EnumerateFiles(directory, "*.dll",
SearchOption.AllDirectories);
To be honest, LINQ isn't great in terms of recursion. You'd probably want to write your own general-purpose recursive extension method. Given how often this sort of question is asked, I should really do that myself some time...
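For what it's worth, a general-purpose traversal helper of the kind mentioned above might look roughly like this. It is only a sketch: Traverse is a hypothetical name, it is written as a plain local function rather than an extension method so the snippet stays a single file of top-level statements, it uses an explicit stack instead of true recursion, and the scratch directory tree is made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: yields the root and all of its descendants, depth-first,
// using an explicit stack instead of recursion.
IEnumerable<T> Traverse<T>(T root, Func<T, IEnumerable<T>> getChildren)
{
    var pending = new Stack<T>();
    pending.Push(root);
    while (pending.Count > 0)
    {
        T current = pending.Pop();
        yield return current;
        foreach (T child in getChildren(current))
            pending.Push(child);
    }
}

// Scratch directory tree (an assumption for this sketch).
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "a.dll"), "");
File.WriteAllText(Path.Combine(root, "sub", "b.dll"), "");
File.WriteAllText(Path.Combine(root, "sub", "notes.txt"), "");

// Walk every directory, then pick the dll files in each one.
var dllNames = Traverse(root, dir => Directory.EnumerateDirectories(dir))
    .SelectMany(dir => Directory.EnumerateFiles(dir, "*.dll"))
    .Select(path => Path.GetFileName(path))
    .OrderBy(name => name)
    .ToList();

Console.WriteLine(string.Join(", ", dllNames));

// Clean up the scratch tree.
Directory.Delete(root, recursive: true);
```

For the plain "all dll files under a folder" case the built-in SearchOption.AllDirectories overloads are simpler; a helper like this mainly earns its keep when the tree is not a file system or when you want to decide per directory whether to descend.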
This returns just the file names with extensions:
DirectoryInfo di = new DirectoryInfo(@"d:\somewhere\");
var q = from i in di.GetFiles("*.dll", SearchOption.AllDirectories)
select i.Name;
This returns just the file names without extensions:
DirectoryInfo di = new DirectoryInfo(@"d:\somewhere\");
var q = from i in di.GetFiles("*.dll", SearchOption.AllDirectories)
select System.IO.Path.GetFileNameWithoutExtension(i.Name);
If you really want to do it with a recursive lambda expression here you go:
Action<string, List<string>> discoverFiles = null;
discoverFiles = new Action<string, List<string>>((dir, list) =>
{
try
{
foreach (var subDir in Directory.GetDirectories(dir))
discoverFiles(string.Concat(subDir), list);
foreach (var dllFile in Directory.GetFiles(dir, "*.dll"))
{
var fileNameOnly = Path.GetFileName(dllFile);
if (!list.Contains(fileNameOnly))
list.Add(fileNameOnly);
}
}
catch (IOException)
{
// decide what to do here
}
});
// usage:
var targetList = new List<string>();
discoverFiles("c:\\MyDirectory", targetList);
foreach (var item in targetList)
Debug.WriteLine(item);
Note: this is probably several times slower (and way harder to read/debug/maintain) than the previous answers, but it does not stop if there is an I/O exception somewhere.
IEnumerable<string> filenames = Directory.GetFiles(searchDirectory, "*.dll",
SearchOption.AllDirectories)
.Select(s => Path.GetFileName(s));
Directory.GetFiles() returns the full paths of the files that match the specified search pattern in the specified directory. Select then projects each element of that full-path sequence into a new form: just the file name.
Reading files and directories is usually done with classes in the System.IO namespace. So the first step would consist of getting all the files that you need to read using the Directory.EnumerateFiles method, and then, for each file that matches your search criteria, reading the contents using, for example, the File.ReadAllBytes method.
