Directory.GetFiles - Search pattern for file extensions [duplicate] - c#

This question already has answers here:
Directory.GetFiles of certain extension
(3 answers)
Closed 8 years ago.
I need to get all ASP files in a folder, so I wrote code like this:
string[] files = Directory.GetFiles(@"C:\Folder", "*.asp", SearchOption.AllDirectories);
However, it also returns files with extension "aspx".
Is there a way to specify the end of the extension?
Sorry for my English, and thanks in advance.

Is there a way to specify the end of the extension?
There isn't a way to do this directly. The best option would be to switch to Directory.EnumerateFiles and filter afterwards:
var files = Directory.EnumerateFiles(@"C:\Folder", "*.asp", SearchOption.AllDirectories)
.Where(f => f.EndsWith(".asp", StringComparison.OrdinalIgnoreCase));
This is because the Directory methods have specific behavior which prevents this from working directly. From the docs:
If the specified extension is exactly three characters long, the method returns files with extensions that begin with the specified extension. For example, "*.xls" returns both "book.xls" and "book.xlsx".
This is an exception to the normal search rules and, in your case, it works against you. EnumerateFiles streams the results, and filtering afterwards keeps only the exact matches.
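A minimal sketch of the same pattern-then-filter idea, wrapped in a reusable helper. The class name `ExactExtension` and its methods are hypothetical, not part of .NET. The helper keeps the `*.asp` pattern so the file system does the coarse filtering, then compares `Path.GetExtension` against the wanted extension case-insensitively, which drops `.aspx` whenever the pattern lets it through.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: returns only files whose final extension is exactly `extension`.
public static class ExactExtension
{
    // True when the file's last extension equals `extension` (e.g. ".asp"),
    // ignoring case because Windows file names are case-insensitive.
    public static bool HasExtension(string path, string extension) =>
        string.Equals(Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase);

    // The "*" + extension pattern lets the OS pre-filter; on Windows a three-character
    // extension can still return longer ones (".aspx"), so Where removes them.
    public static IEnumerable<string> Find(string root, string extension) =>
        Directory.EnumerateFiles(root, "*" + extension, SearchOption.AllDirectories)
                 .Where(path => HasExtension(path, extension));
}
```

Usage: `foreach (var file in ExactExtension.Find(@"C:\Folder", ".asp")) { ... }`. Because the result is a lazy `IEnumerable<string>`, call `.ToArray()` only if you really need an array.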

Unfortunately, I don't think there's a built-in way, but
Directory.EnumerateFiles(@"C:\Folder", "*.asp", SearchOption.AllDirectories).Where(f => f.EndsWith(".asp", StringComparison.OrdinalIgnoreCase))
should perform about as well as a direct query would. (Note that EnumerateFiles returns an IEnumerable<string> and is preferable to GetFiles if you don't actually need the files in an array.)

Related

How to know the extension of documents/files? [duplicate]

This question already has answers here:
How to find the extension of a file in C#?
(14 answers)
Closed 6 years ago.
I need to know the extension of different documents and, depending on the extension, make some decisions in my code. For example, I can have these file names:
commands to save.docx
Old Word.doc
rel.txt
test.pdf
I know I could do something with the Contains() method, but I'm afraid Contains() would match a .doc extension even when the extension is .docx, because .doc is a substring of .docx.
I think a regular expression might be a better approach for this. Any suggestions? So far I've done something like this:
if (fileName.Contains(".pdf"))
{
Response.ContentType = "application/pdf";
Response.AddHeader("Content-Disposition", "inline; filename=" + fileName);
}
You can get exactly what you need with Path.GetExtension.
You can use the MimeMapping.GetMimeMapping method to get the MIME type of the document. With that, you don't really need to extract the file extension and write an if condition for every type.
var fileName = Path.GetFileName("SomeFileNameWithLongPath.pdf");
string mimeType = MimeMapping.GetMimeMapping(fileName);
If you really want the extension, you can use the Path.GetExtension method:
var extension = Path.GetExtension("SomeFileNameWithLongPath.pdf");
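A sketch of why Path.GetExtension avoids the .doc/.docx problem: it returns only the last extension, including the dot, so the comparison is exact rather than a substring test. The class `ContentTypes` and its hand-written table are hypothetical and illustrative only; on ASP.NET (System.Web, .NET Framework) MimeMapping.GetMimeMapping already covers this lookup, and the `application/octet-stream` fallback is an assumption, not something from the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical lookup table; the entries are illustrative, not exhaustive.
public static class ContentTypes
{
    private static readonly Dictionary<string, string> ByExtension =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".pdf"] = "application/pdf",
            [".doc"] = "application/msword",
            [".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            [".txt"] = "text/plain",
        };

    // "Old Word.doc" yields ".doc" and "commands to save.docx" yields ".docx",
    // so unlike Contains(".doc") the two never collide. Lookup ignores case.
    public static string For(string fileName) =>
        ByExtension.TryGetValue(Path.GetExtension(fileName) ?? "", out var type)
            ? type
            : "application/octet-stream"; // assumed fallback for unknown extensions
}
```

Usage: `Response.ContentType = ContentTypes.For(fileName);` replaces the chain of `if (fileName.Contains(...))` checks.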

How to filter Directory.EnumerateFiles with specific extension

I want a list of all xml files in a folder like this:
foreach (var file in Directory.EnumerateFiles(folderPath, "*.xml"))
{
// add file to a collection
}
However, if for some reason folderPath contains files whose names end with .xmlXXX, where XXX represents any characters, the enumerator returns them as well.
I can solve it easily by doing something like
foreach (var file in Directory.EnumerateFiles(folderPath, "*.xml").Where(x => x.EndsWith(".xml")))
But it seems a bit odd to me, as I basically have to search for the same thing two times. Is there any way to get the right files directly or am I doing something wrong?
This is the documented default behaviour of wildcards in file searches.
Directory.EnumerateFiles Method (String, String)
If the specified extension is exactly three characters long, the
method returns files with extensions that begin with the specified
extension. For example, "*.xls" returns both "book.xls" and
"book.xlsx".
Your current approach of filtering twice is the right way.
The only improvement you can make is to ignore case in EndsWith, like this:
x.EndsWith(".xml", StringComparison.CurrentCultureIgnoreCase)
According to MSDN, it seems you can't do it with EnumerateFiles for a three-character extension.
Quote from the article:
When you use the asterisk wildcard character in a searchPattern such as "*.txt", the number of characters in the specified extension affects the search as follows:
If the specified extension is exactly three characters long, the method returns files with extensions that begin with the specified extension. For example, "*.xls" returns both "book.xls" and "book.xlsx".
In all other cases, the method returns files that exactly match the specified extension. For example, "*.ai" returns "file.ai" but not "file.aif".
When you use the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, whereas a search pattern of "file*.txt" returns both files.
Therefore, using the .Where extension method seems like the best solution to your problem.
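To make the quoted asterisk rule concrete, here is a simplified model of it next to the exact post-filter. This is an illustration of the documented behavior only, under the assumption that the pattern is always `*.ext`; it is not how Windows implements matching (see the short-file-name explanation in the next answer), and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

// Simplified model of the documented "*.ext" rule, for illustration only.
public static class PatternModel
{
    // `ext` includes the leading dot, e.g. ".xml".
    public static bool DocumentedAsteriskMatch(string fileName, string ext)
    {
        string actual = Path.GetExtension(fileName) ?? "";
        // Exactly three characters after the dot (".xls" has Length 4):
        // prefix match, so "*.xls" also matches "book.xlsx".
        if (ext.Length == 4)
            return actual.StartsWith(ext, StringComparison.OrdinalIgnoreCase);
        // Any other length: exact match, so "*.ai" does not match "file.aif".
        return string.Equals(actual, ext, StringComparison.OrdinalIgnoreCase);
    }

    // The recommended post-filter: the name must really end with the extension.
    public static bool ExactMatch(string fileName, string ext) =>
        fileName.EndsWith(ext, StringComparison.OrdinalIgnoreCase);
}
```

Combining the two (`DocumentedAsteriskMatch(...) && ExactMatch(...)`) is what `EnumerateFiles("*.xml").Where(x => x.EndsWith(".xml"))` does in practice.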
Yes, and this design is stupid, stupid, stupid! It shouldn't do that. And it's annoying too!
That said, this appears to be what is happening: it searches both the long and the short (8.3) file names. A file with a longer extension gets a short file name whose extension is truncated to three characters, and that short name matches the pattern.
On newer versions of Windows, 8.3 short-name generation may be disabled, so on those systems the behavior may actually be what you would expect, and what it should have been in the first place.

How can I make GetFiles() exclude files with extensions that start with the search extension?

I am using the following line to return specific files...
foreach (FileInfo file in nodeDirInfo.GetFiles("*.sbs", option))
But there are other files in the directory with the extension .sbsar, and it is getting them, too. How can I differentiate between .sbs and .sbsar in the search pattern?
The issue you're experiencing is a limitation of the search pattern in the Win32 API.
A searchPattern with a file extension (for example *.txt) of exactly
three characters returns files having an extension of three or more
characters, where the first three characters match the file extension
specified in the searchPattern.
My solution is to manually filter the results, using Linq:
nodeDirInfo.GetFiles("*.sbs", option).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Try this, which filters on the file extension:
FileInfo[] files = nodeDirInfo.GetFiles("*", SearchOption.TopDirectoryOnly)
.Where(f => f.Extension == ".sbs").ToArray();
That's the behaviour of the Win32 API (FindFirstFile) that underlies GetFiles() showing through.
You'll need to do your own filtering if you must use GetFiles(). For instance:
nodeDirInfo.GetFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Or more efficiently:
nodeDirInfo.EnumerateFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Note that I use StringComparison.InvariantCultureIgnoreCase to deal with the fact that Windows file names are case-insensitive.
If performance is an issue, that is, if the search has to process directories with large numbers of files, it is more efficient to filter twice: once in the call to GetFiles or EnumerateFiles, and once to remove the unwanted file names. For example:
nodeDirInfo.GetFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
nodeDirInfo.EnumerateFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
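A self-contained sketch of the "filter twice" version for a DirectoryInfo, as in the question. The class `SbsFiles` and its `Find` method are hypothetical. DirectoryInfo.EnumerateFiles yields FileInfo objects, so the exact check compares FileInfo.Extension instead of calling EndsWith on the FileInfo itself. It uses StringComparison.OrdinalIgnoreCase rather than InvariantCultureIgnoreCase, since file names are identifiers rather than linguistic text; that is a design choice, and either works for plain ASCII names like these.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SbsFiles
{
    // Hypothetical helper: files whose extension is exactly `extension` (e.g. ".sbs").
    public static List<FileInfo> Find(DirectoryInfo dir, string extension, SearchOption option) =>
        dir.EnumerateFiles("*" + extension, option)          // coarse filter: may include ".sbsar"
           .Where(f => string.Equals(f.Extension, extension,
                                     StringComparison.OrdinalIgnoreCase)) // exact filter
           .ToList();
}
```

Usage: `foreach (FileInfo file in SbsFiles.Find(nodeDirInfo, ".sbs", option)) { ... }`.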
It's mentioned in the docs:
When using the asterisk wildcard character in a searchPattern, a
searchPattern with a file extension of exactly three characters
returns files having an extension of three or more characters. When
using the question mark wildcard character, this method returns only
files that match the specified file extension.

Get files of certain extension c#

I wish to get a list of all the files of a certain extension (recursive), but only the files ending with that extension.
For example, if I want all the files with the ".exe" extension and I have the following files:
file1.exe , file2.txt.exe , file3.exe.txt , file4.txt.exe1 , file5.txt
I expect to get a list of 1 file, which is: file1.exe.
I'm trying to use the following line:
List<string> theList = Directory.GetFiles(@"C:\SearchDir", "*.exe", SearchOption.AllDirectories).ToList();
But what I get is a list of the following three files: file1.exe , file2.txt.exe , file4.txt.exe1
Any ideas?
Try this:
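var exeFiles = Directory.EnumerateFiles(sourceDirectory,
"*", SearchOption.AllDirectories)
.Where(s => s.EndsWith(".exe", StringComparison.OrdinalIgnoreCase)
// count dots in the file name only, not in the directory path
&& Path.GetFileName(s).Count(c => c == '.') == 1)
.ToList();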
This is a common issue. Take note of the MSDN documentation:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern.
You can't solve it by searching for the .exe extension; you'll need to filter your results one more time in the client code.
Also note that the following examples would in fact be considered executable files:
file1.exe
file2.txt.exe
whereas this one wouldn't technically be considered an executable file.
file4.txt.exe1
So the question then becomes, what algorithm do you want? It appears to me you want the following:
Files that have an extension of exe.
Files that don't have multiple extensions.
Have a look at Ahmed's answer for a fantastic approach to getting the algorithm you want.
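A minimal sketch of the two-part algorithm listed above (extension is exe, and no multiple extensions), assuming "multiple extensions" means any other dot in the file name. The class `SingleExtension` is hypothetical. Checking the file name rather than the full path keeps dots in directory names from affecting the result.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SingleExtension
{
    // True when the name's last extension is `extension` and nothing before it
    // contains a dot: "file1.exe" passes; "file2.txt.exe" and "file4.txt.exe1" do not.
    // Note: this also rejects names such as "my.app.exe"; adjust if that matters.
    public static bool Matches(string path, string extension)
    {
        string name = Path.GetFileName(path);
        return string.Equals(Path.GetExtension(name), extension, StringComparison.OrdinalIgnoreCase)
            && Path.GetFileNameWithoutExtension(name).IndexOf('.') < 0;
    }

    // Enumerates everything and applies the check; "*" avoids the three-character quirk.
    public static List<string> Find(string root, string extension) =>
        Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                 .Where(p => Matches(p, extension))
                 .ToList();
}
```

With the five example files, `SingleExtension.Find(@"C:\SearchDir", ".exe")` should return only file1.exe.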

System.IO.Directory search pattern not working as expected

I am attempting to retrieve jpeg and jpg files using the following statement:
string[] files = Directory.GetFiles(someDirectoryPath, "*.jp?g");
MSDN's docs for System.IO.Directory.GetFiles(string, string) state that ? represents "Exactly zero or one character."; however, the statement above returns .jpeg files but omits .jpg files.
I am currently using the overly-permissive search pattern "*.jp*g" to achieve my results, but it wrinkles my brain because it should work.
From the docs you linked to:
A searchPattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the searchPattern.
I suspect that's the problem. To be honest, I'd probably fetch all the files and then postprocess them in code - it'll make for code which is simpler to reason about than relying on the Windows path-handling oddities.
You could either use "*" as a pattern and process the result yourself, or use
string[] files = Directory.GetFiles(someDirectoryPath, "*.jpg").Union(Directory.GetFiles(someDirectoryPath, "*.jpeg")).ToArray();
According to the docs, the pattern you use returns only files whose extensions are exactly four characters long.
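A sketch of the first option ("*" plus your own processing), which scans the directory once instead of twice and avoids the Union. The class `MultiExtension` is hypothetical; it compares Path.GetExtension against a case-insensitive set, so ".jpg" and ".jpeg" match exactly while names like ".jpe" do not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MultiExtension
{
    // Hypothetical helper: one scan with "*", then an exact, case-insensitive
    // extension check against a set such as { ".jpg", ".jpeg" }.
    public static string[] Find(string dir, params string[] extensions)
    {
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);
        return Directory.EnumerateFiles(dir, "*")
                        .Where(p => wanted.Contains(Path.GetExtension(p)))
                        .ToArray();
    }
}
```

Usage: `string[] files = MultiExtension.Find(someDirectoryPath, ".jpg", ".jpeg");`.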
