"Illegal characters in path" error using wildcards with Directory.GetFiles - c#

I have a directory with multiple sub directories that contain .doc files. Example:
C:\Users\user\Documents\testenviroment\Released\test0.doc
C:\Users\user\Documents\testenviroment\Debug\test1.doc
C:\Users\user\Documents1\testenviroment\Debug\test2.doc
C:\Users\user\Documents1\testenviroment\Released\test20.doc
I want to get all the test*.doc files under all Debug folders. I tried:
string[] files = Directory.GetFiles(@"C:\Users\user", "*Debug\\test*.doc",
SearchOption.AllDirectories);
And it gives me an "Illegal characters in path" error.
If I try:
string[] files = Directory.GetFiles(@"C:\Users\user", "\\Debug\\test*.doc",
SearchOption.AllDirectories);
I get a different error: "Could not find a part of the path C:\Users\user\Debug".

You are including a folder within the search pattern which isn't expected. According to the docs:
searchPattern Type: System.String The search string to match against
the names of files in path. This parameter can contain a combination
of valid literal path and wildcard (* and ?) characters (see Remarks),
but doesn't support regular expressions.
With this in mind, try something like this:
String[] files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(file => file.Contains("\\Debug\\"))
.ToArray();
This gets every test*.doc file under the specified directory and keeps only those with \Debug\ in their path. Because the whole tree is scanned, keep the search directory as narrow as possible.
Note:
My original answer included EnumerateFiles, which would work like this (making sure to pass the search option, thanks @CodeCaster):
String[] files = Directory.EnumerateFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(file => file.Contains("\\Debug\\"))
.ToArray();
I've just run a test and the second seems to be slower; however, it might be quicker on a larger folder. Worth keeping in mind.
Edit: Note from @pinkfloydx33
I've actually had that practically take down a system that I had
inherited. It was taking so much time trying to return the array and
killing the memory footprint as well. Problem was diverted converting
over to the enumerable counterparts
So using the second option would be safer for larger directories.

The second parameter, the search pattern, works only for filenames. So you'll need to iterate the directories you want to search, then call Directory.GetFiles(directory, "test*.doc") on each directory.
How to write that code depends on how robust you want it to be and what assumptions you want to make (e.g. "all Debug directories are always two levels into the user's directory" versus "the Debug directory can be at any level into the user's directory").
See How to recursively list all the files in a directory in C#?.
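A minimal sketch of the per-directory approach described above, assuming the Debug folder can sit at any depth under the root. The class and method names (DebugFolderSearch, FindInNamedFolders) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DebugFolderSearch
{
    // Hypothetical helper: finds every directory called folderName under root,
    // then lists the files matching filePattern directly inside each of them.
    public static IEnumerable<string> FindInNamedFolders(
        string root, string folderName, string filePattern)
    {
        return Directory
            // The pattern here is matched against directory names only.
            .EnumerateDirectories(root, folderName, SearchOption.AllDirectories)
            // The pattern here is matched against file names only.
            .SelectMany(dir => Directory.EnumerateFiles(dir, filePattern));
    }
}
```

Called as DebugFolderSearch.FindInNamedFolders(@"C:\Users\user", "Debug", "test*.doc"), this never puts a folder name into the file search pattern. It still walks the whole directory tree to find the Debug folders, and a recursive walk of a user profile may throw UnauthorizedAccessException on protected folders, so treat it as a starting point.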
Alternatively, if you want to search all subdirectories and then discard files that don't match your preferences, see Searching for file in directories recursively:
var files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
.Where(f => f.IndexOf(#"\debug", StringComparison.OrdinalIgnoreCase) >= 0);
But note that this may be bad for performance, as it'll scan irrelevant directories.

Related

Get Files and Folders by Folder and File wildcard

I have a pattern like ..\\*\\your_magic*.txt*zip and I'm in Directory "x"
now I would love to get all files and directories that match the above pattern.
For example if I'm in
d:\test\test1
valid results would be: (let's assume the folders and files do exist)
d:\test\test1\your_magic.txt.zip
d:\test\test1\your_magic.txtzip
d:\test\test2\your_magic.txt.zip
d:\test\test1\test3\your_magic.txt.zip
What I'm thinking is that I would need to split the string up into folders and search all of them recursively. Now, I'm not a C# pro and hope that there is a much simpler solution.
See Directory.GetFiles:
string[] files = Directory.GetFiles(@"d:\test", "your_magic*.txt*zip", SearchOption.AllDirectories);

How can I make GetFiles() exclude files with extensions that start with the search extension?

I am using the following line to return specific files...
FileInfo file in nodeDirInfo.GetFiles("*.sbs", option)
But there are other files in the directory with the extension .sbsar, and it is getting them, too. How can I differentiate between .sbs and .sbsar in the search pattern?
The issue you're experiencing is a limitation of the search pattern, in the Win32 API.
A searchPattern with a file extension (for example *.txt) of exactly
three characters returns files having an extension of three or more
characters, where the first three characters match the file extension
specified in the searchPattern.
My solution is to manually filter the results, using Linq:
nodeDirInfo.GetFiles("*.sbs", option).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Try this, filtered using file extension.
FileInfo[] files = nodeDirInfo.GetFiles("*", SearchOption.TopDirectoryOnly).
Where(f=>f.Extension==".sbs").ToArray<FileInfo>();
That's the behaviour of the Win32 API (FindFirstFile) that is underneath GetFiles() being reflected on to you.
You'll need to do your own filtering if you must use GetFiles(). For instance:
GetFiles("*", searchOption).Where(s => s.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Or more efficiently:
EnumerateFiles("*", searchOption).Where(s => s.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Note that I use StringComparison.InvariantCultureIgnoreCase to deal with the fact that Windows file names are case-insensitive.
If performance is an issue, that is if the search has to process directories with large numbers of files, then it is more efficient to perform the filtering twice: once in the call to GetFiles or EnumerateFiles, and once to clean up the unwanted file names. For example:
GetFiles("*.sbs", searchOption).Where(s => s.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
EnumerateFiles("*.sbs", searchOption).Where(s => s.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
It's mentioned in the docs:
When using the asterisk wildcard character in a searchPattern, a
searchPattern with a file extension of exactly three characters
returns files having an extension of three or more characters. When
using the question mark wildcard character, this method returns only
files that match the specified file extension.
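As a rough sketch of the "filter twice" idea from the answers above: let the search pattern narrow the candidates, then compare the real extension exactly. ExtensionFilter, HasExactExtension and EnumerateByExtension are hypothetical names, and the case-insensitive comparison assumes Windows-style file names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionFilter
{
    // True only when the file's extension is exactly `extension` (for example ".sbs"),
    // ignoring case; ".sbsar" does not count as ".sbs".
    public static bool HasExactExtension(string path, string extension)
    {
        return string.Equals(
            Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase);
    }

    // First pass: the search pattern trims the candidate list.
    // Second pass: the exact comparison drops look-alikes such as .sbsar.
    public static IEnumerable<string> EnumerateByExtension(
        string root, string extension, SearchOption option)
    {
        return Directory
            .EnumerateFiles(root, "*" + extension, option)
            .Where(path => HasExactExtension(path, extension));
    }
}
```

With a DirectoryInfo, as in the question, the same check could be written against FileInfo.Extension instead of Path.GetExtension.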

Get files of certain extension c#

I wish to get a list of all the files of a certain extension (recursive), but only the files ending with that extension.
For example, I wish to get all the files with the ".exe" extension, If I have the following files:
file1.exe , file2.txt.exe , file3.exe.txt , file4.txt.exe1 , file5.txt
I expect to get a list of 1 file, which is: file1.exe.
I'm trying to use the following line:
List<string> theList = Directory.GetFiles(@"C:\SearchDir", "*.exe", SearchOption.AllDirectories).ToList();
But what I get is a list of the following three files: file1.exe , file2.txt.exe , file4.txt.exe1
Any ideas?
Try this:
var exeFiles = Directory.EnumerateFiles(sourceDirectory,
"*", SearchOption.AllDirectories)
.Where(s => s.EndsWith(".exe") && Path.GetFileName(s).Count(c => c == '.') == 1)
.ToList();
This is a common issue. Take note of the MSDN documentation:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern.
You can't solve it by searching for the .exe extension; you'll need to filter your results one more time in the client code.
Now, one thing to note also is this. The following examples would in fact be considered executable files:
file1.exe
file2.txt.exe
whereas this one wouldn't technically be considered an executable file.
file4.txt.exe1
So the question then becomes, what algorithm do you want? It appears to me you want the following:
Files that have an extension of exe.
Files that don't have multiple extensions.
Have a look at Ahmed's answer for a fantastic approach to getting the algorithm you want.
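The two rules listed above (the extension is exe, and there is only one extension) could be sketched as a single predicate. SingleExtension and Matches are illustrative names, and "multiple extensions" is assumed to mean "more than one dot in the file name":

```csharp
using System;
using System.IO;

public static class SingleExtension
{
    // True when the file name has exactly one extension and it equals `extension`.
    public static bool Matches(string path, string extension)
    {
        // Rule 1: the final extension must be the requested one (".exe1" fails here).
        bool rightExtension = string.Equals(
            Path.GetExtension(path), extension, StringComparison.OrdinalIgnoreCase);

        // Rule 2: nothing before the final extension may contain another dot
        // ("file2.txt.exe" fails here because "file2.txt" still has a dot).
        bool noSecondExtension =
            !Path.GetFileNameWithoutExtension(path).Contains(".");

        return rightExtension && noSecondExtension;
    }
}
```

Applied as a Where filter over Directory.EnumerateFiles, only file1.exe from the question's list passes: file2.txt.exe fails the second rule, and file3.exe.txt, file4.txt.exe1 and file5.txt fail the first.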

How to scan a directory with wildcard with a specific subdirectory

I was wondering what would be a good way to scan a directory that has characters you are not sure of.
For example, I want to scan
C:\Program\Version2.*\Files
Meaning
The folder is located in C:\Program
Version2.* could be anything like Version2.33, Version2.1, etc.
That folder has a folder named Files in it
I know that I could do something like foreach (directory) if contains("Version2."), but I was wondering if there was a better way of doing so.
Directory.EnumerateDirectories accepts a search pattern. So enumerate the parent level that has the wildcard, and then enumerate the rest:
var directories =
Directory.EnumerateDirectories(@"C:\Program\", "Version2.*")
.SelectMany(parent => Directory.EnumerateDirectories(parent, "Files"));
Note: if the path can contain wildcards at any level, normalize the path, split it by "\", and then collect folders level by level.
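The level-by-level idea from the note might look roughly like this. WildcardPath and ExpandDirectories are invented names, and the sketch assumes the root itself contains no wildcards and that the pattern is relative to it:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WildcardPath
{
    // Expands a relative path whose segments may contain * or ?
    // by matching one directory level at a time.
    public static IEnumerable<string> ExpandDirectories(string root, string relativePattern)
    {
        // Split on either separator so "Version2.*\Files" and "Version2.*/Files" both work.
        string[] segments = relativePattern.Split(
            new[] { '\\', '/' }, StringSplitOptions.RemoveEmptyEntries);

        IEnumerable<string> current = new[] { root };
        foreach (string segment in segments)
        {
            string pattern = segment; // local copy captured by the lambda below
            // Each candidate directory is expanded by the next segment's pattern.
            current = current.SelectMany(
                dir => Directory.EnumerateDirectories(dir, pattern));
        }
        return current;
    }
}
```

For the question's layout, the call would be WildcardPath.ExpandDirectories(@"C:\Program", @"Version2.*\Files"). Only the levels that actually match are visited, so unrelated subtrees are never scanned.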
Try this
var pattern = new Regex(@"C:\\Program\\Version2\.(.*)\\Files(.*)");
var directories = Directory.EnumerateDirectories(@"C:\Program", "*",
SearchOption.AllDirectories)
.Where(d => pattern.IsMatch(d));

How Do I filter out names of folders in C#?

I have the code searching through the directory and picks out all the folders, but I only want it to pick out ones that Start with Data. How would I do that?
Below is the code I have that goes through the Directory:
string[] filePaths = Directory.GetDirectories(defaultPath).Where(Data => !Data.EndsWith(".")).ToArray();
No need to use LINQ; GetDirectories supports search patterns, and will probably be significantly faster since the filtering may be done by the filesystem, before enumerating the results in .NET.
string[] filePaths = Directory.GetDirectories(defaultPath, "Data*");
Note that * is a wildcard which matches zero or more characters.
If by "starts with Data" you just mean that the folder name begins with "Data", this will work:
string[] filePaths = Directory.GetDirectories(defaultPath)
.Where(s => Path.GetFileName(s).StartsWith("Data") && !s.EndsWith(".")).ToArray();
