I wish to get a list of all the files of a certain extension (recursive), but only the files ending with that extension.
For example, I wish to get all the files with the ".exe" extension. If I have the following files:
file1.exe , file2.txt.exe , file3.exe.txt , file4.txt.exe1 , file5.txt
I expect to get a list of 1 file, which is: file1.exe.
I'm trying to use the following line:
List<string> theList = Directory.GetFiles(@"C:\SearchDir", "*.exe", SearchOption.AllDirectories).ToList();
But what I get is a list of the following three files: file1.exe , file2.txt.exe , file4.txt.exe1
Any ideas?
Try this:
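var exeFiles = Directory.EnumerateFiles(sourceDirectory,
"*", SearchOption.AllDirectories)
// Keep names that end in ".exe" and contain no other dot (e.g. file1.exe).
// Count dots in the file name only, so dots in directory names don't interfere.
.Where(s => s.EndsWith(".exe", StringComparison.OrdinalIgnoreCase) && Path.GetFileName(s).Count(c => c == '.') == 1)
.ToList();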
This is a common issue. Take note of the MSDN documentation:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern.
You can't solve it by searching for the .exe extension; you'll need to filter your results one more time in the client code.
Now, one thing to note also is this. The following examples would in fact be considered executable files:
file1.exe
file2.txt.exe
whereas this one wouldn't technically be considered an executable file.
file4.txt.exe1
So the question then becomes, what algorithm do you want? It appears to me you want the following:
Files that have an extension of exe.
Files that don't have multiple extensions.
Have a look at Ahmed's answer for a fantastic approach to getting the algorithm you want.
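The two rules above (extension is exactly exe, no additional extensions) can be sketched as a small predicate applied after the wildcard search. This is only a minimal sketch: ExeFilter, IsPlainExe and FindPlainExes are hypothetical names chosen for illustration, and it assumes "no multiple extensions" means the file name contains exactly one dot.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class ExeFilter
{
    // True when the extension is exactly ".exe" (ignoring case) and the
    // file name contains no other dot, e.g. "file1.exe" but not
    // "file2.txt.exe" or "file4.txt.exe1".
    public static bool IsPlainExe(string path)
    {
        string name = Path.GetFileName(path);
        return string.Equals(Path.GetExtension(name), ".exe", StringComparison.OrdinalIgnoreCase)
            && name.Count(c => c == '.') == 1;
    }

    // The "*.exe" pattern narrows the candidates; the predicate then drops
    // the extra matches that the wildcard search can return.
    public static List<string> FindPlainExes(string root) =>
        Directory.EnumerateFiles(root, "*.exe", SearchOption.AllDirectories)
                 .Where(IsPlainExe)
                 .ToList();
}
```

Keeping the predicate separate from the directory walk makes it easy to check against the file names from the question without touching the disk.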
Related
I have a directory with multiple sub directories that contain .doc files. Example:
C:\Users\user\Documents\testenviroment\Released\test0.doc
C:\Users\user\Documents\testenviroment\Debug\test1.doc
C:\Users\user\Documents1\testenviroment\Debug\test2.doc
C:\Users\user\Documents1\testenviroment\Released\test20.doc
I want to get all the test*.doc files under all Debug folders. I tried:
string[] files = Directory.GetFiles(@"C:\Users\user", "*Debug\\test*.doc",
SearchOption.AllDirectories);
And it gives me an "Illegal characters in path" error.
If I try:
string[] files = Directory.GetFiles(@"C:\Users\user", "\\Debug\\test*.doc",
SearchOption.AllDirectories);
I get a different error: "Could not find a part of the path C:\Users\user\Debug".
You are including a folder within the search pattern which isn't expected. According to the docs:
searchPattern Type: System.String The search string to match against
the names of files in path. This parameter can contain a combination
of valid literal path and wildcard (* and ?) characters (see Remarks),
but doesn't support regular expressions.
With this in mind, try something like this:
String[] files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(file => file.Contains("\\Debug\\"))
.ToArray();
This gets every test*.doc file under the specified directory and then keeps only the ones with \Debug\ in the path. Because the whole tree is scanned, keep the search directory as narrow as possible.
Note:
My original answer included EnumerateFiles, which would work like this (making sure to pass the search option, thanks @CodeCaster):
String[] files = Directory.EnumerateFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(file => file.Contains("\\Debug\\"))
.ToArray();
I've just run a test and the second seems to be slower; however, it might be quicker on a larger folder. Worth keeping in mind.
Edit: Note from @pinkfloydx33
I've actually had that practically take down a system that I had
inherited. It was taking so much time trying to return the array and
killing the memory footprint as well. Problem was diverted converting
over to the enumerable counterparts
So using the second option would be safer for larger directories.
The second parameter, the search pattern, works only for filenames. So you'll need to iterate the directories you want to search, then call Directory.GetFiles(directory, "test*.doc") on each directory.
How to write that code depends on how robust you want it to be and what assumptions you want to make (e.g. "all Debug directories are always two levels into the user's directory" versus "the Debug directory can be at any level into the user's directory").
See How to recursively list all the files in a directory in C#?.
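One way to iterate the directories first could look like the sketch below. It assumes the Debug directory can sit at any level under the root. DebugDocFinder and FindDocsInDebugFolders are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class DebugDocFinder
{
    // Find every directory named "Debug" below root, then list the
    // test*.doc files directly inside each of them.
    public static IEnumerable<string> FindDocsInDebugFolders(string root) =>
        Directory.EnumerateDirectories(root, "Debug", SearchOption.AllDirectories)
                 .SelectMany(dir => Directory.EnumerateFiles(dir, "test*.doc", SearchOption.TopDirectoryOnly));
}
```

This still walks the whole directory tree to locate the Debug folders, but it only lists files inside the folders that match. A recursive walk under a user profile can also throw UnauthorizedAccessException on protected folders, which this sketch does not handle.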
Alternatively, if you want to search all subdirectories and then discard files that don't match your preferences, see Searching for file in directories recursively:
var files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(f => f.IndexOf(@"\debug", StringComparison.OrdinalIgnoreCase) >= 0);
But note that this may be bad for performance, as it'll scan irrelevant directories.
I have some files that have multiple extensions (e.g. D*.P*.C*). I'm building a process to move files with specific extensions like the one above, as well as .csv and .arc files. I'm failing to filter the D*.P*.C* files. Here is the code below. Any help will be greatly appreciated.
var entries =
Directory.GetFileSystemEntries(_sourceLocation_FRUD, "*.*", SearchOption.AllDirectories)
.Where(s => (s.StartsWith("D"))
&& (s.Contains(".P"))
&& (s.EndsWith(".C")));
The second parameter isn't Regex, but it is a form of wildcard search.
Remove the LINQ code and specify your extension pattern in the method call. Since the * means 0 or more characters, you should be able to just use your D*.P*.C* pattern.
var entries =
Directory.GetFileSystemEntries(_sourceLocation_FRUD, "D*.P*.C*", SearchOption.AllDirectories);
Assuming only the extension starts with D, not the full filename, you may have to change your pattern to *.D*.P*.C*
I want a list of all xml files in a folder like this:
foreach (var file in Directory.EnumerateFiles(folderPath, "*.xml"))
{
// add file to a collection
}
However, if for some reason I have any files in folderPath that end with .xmlXXX, where XXX represents any characters, then they will be part of the enumerator.
I can solve it easily by doing something like
foreach (var file in Directory.EnumerateFiles(folderPath, "*.xml").Where(x => x.EndsWith(".xml")))
But it seems a bit odd to me, as I basically have to search for the same thing two times. Is there any way to get the right files directly or am I doing something wrong?
This is the documented/default behaviour of wildcard usage in file searching.
Directory.EnumerateFiles Method (String, String)
If the specified extension is exactly three characters long, the
method returns files with extensions that begin with the specified
extension. For example, "*.xls" returns both "book.xls" and
"book.xlsx".
Your current approach of filtering twice is the right way.
The only improvement you can make is to ignore case in EndsWith, like:
x.EndsWith(".xml", StringComparison.CurrentCultureIgnoreCase)
It seems you can't do this with EnumerateFiles alone for three-character extensions, according to MSDN.
Quote from the article above
When you use the asterisk wildcard character in a searchPattern such as "*.txt", the number of characters in the specified extension affects the search as follows:
If the specified extension is exactly three characters long, the method returns files with extensions that begin with the specified extension. For example, "*.xls" returns both "book.xls" and "book.xlsx".
In all other cases, the method returns files that exactly match the specified extension. For example, "*.ai" returns "file.ai" but not "file.aif".
When you use the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, whereas a search pattern of "file*.txt" returns both files.
Therefore using the .Where extension seems like the best solution to your problem
Yes, and this design is stupid, stupid, stupid! It shouldn't do that. And it's annoying too!
That said, it appears this is what is happening: It actually searches both the long and short filenames. So files with longer extensions will have a short filename with the extension truncated to three characters.
And on newer versions of Windows, the short filenames may be disabled. So the behavior on newer systems will actually be what you would expect, and what it should've been in the first place.
I am using the following line to return specific files...
FileInfo file in nodeDirInfo.GetFiles("*.sbs", option)
But there are other files in the directory with the extension .sbsar, and it is getting them, too. How can I differentiate between .sbs and .sbsar in the search pattern?
The issue you're experiencing is a limitation of the search pattern, in the Win32 API.
A searchPattern with a file extension (for example *.txt) of exactly
three characters returns files having an extension of three or more
characters, where the first three characters match the file extension
specified in the searchPattern.
My solution is to manually filter the results, using LINQ:
nodeDirInfo.GetFiles("*.sbs", option).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Try this, filtered using file extension.
FileInfo[] files = nodeDirInfo.GetFiles("*", SearchOption.TopDirectoryOnly).
Where(f=>f.Extension==".sbs").ToArray<FileInfo>();
That's the behaviour of the Win32 API (FindFirstFile) that is underneath GetFiles() being reflected on to you.
You'll need to do your own filtering if you must use GetFiles(). For instance:
nodeDirInfo.GetFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Or more efficiently:
nodeDirInfo.EnumerateFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Note that I use StringComparison.InvariantCultureIgnoreCase to deal with the fact that Windows file names are case-insensitive.
If performance is an issue, that is if the search has to process directories with large numbers of files, then it is more efficient to perform the filtering twice: once in the call to GetFiles or EnumerateFiles, and once to clean up the unwanted file names. For example:
nodeDirInfo.GetFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
nodeDirInfo.EnumerateFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
It's mentioned in the docs:
When using the asterisk wildcard character in a searchPattern, a
searchPattern with a file extension of exactly three characters
returns files having an extension of three or more characters. When
using the question mark wildcard character, this method returns only
files that match the specified file extension.
I am attempting to retrieve jpeg and jpg files using the following statement:
string[] files = Directory.GetFiles(someDirectoryPath, "*.jp?g");
MSDN's docs for System.IO.Directory.GetFiles(string, string) state that ? represents "Exactly zero or one character.", however the above block selects jpeg files but omits jpg files.
I am currently using the overly-permissive search pattern "*.jp*g" to achieve my results, but it wrinkles my brain because it should work.
From the docs you linked to:
A searchPattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the searchPattern.
I suspect that's the problem. To be honest, I'd probably fetch all the files and then postprocess them in code - it'll make for code which is simpler to reason about than relying on the Windows path-handling oddities.
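A minimal sketch of that "fetch everything, then post-process" approach might look like this. JpegFinder, IsJpeg and FindJpegs are hypothetical names, and the sketch assumes only the .jpg and .jpeg extensions are wanted, compared without regard to case.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answers.
public static class JpegFinder
{
    // Extensions to accept, compared case-insensitively.
    private static readonly HashSet<string> JpegExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg" };

    // True when the path's extension is exactly one of the accepted ones.
    public static bool IsJpeg(string path) =>
        JpegExtensions.Contains(Path.GetExtension(path));

    // List every file in the directory, then keep only the JPEG ones.
    public static string[] FindJpegs(string directory) =>
        Directory.EnumerateFiles(directory, "*").Where(IsJpeg).ToArray();
}
```

The extension check lives in ordinary code rather than in the search pattern, so the result does not depend on how the wildcard matching treats extensions of different lengths.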
You could either use "*" as a pattern and process the result yourself OR use
string[] files = Directory.GetFiles(someDirectoryPath, "*.jpg").Union(Directory.GetFiles(someDirectoryPath, "*.jpeg")).ToArray();
According to the docs, the pattern you use returns only files whose extensions are exactly four characters long.
MSDN reference on Union