I have directories arranged as in the picture. I want to edit the files OneA1, OneA2, OneA3, ... and TwoA1, TwoA2, TwoA3, ... (they are XML files, and I want to edit some tags). There are hundreds of files on the C drive. How do I filter the required files in C#? The aim is to find all XML files whose names contain OneA or TwoA.
static void Main(string[] args)
{
    DirectoryInfo directory = new DirectoryInfo(@"C:\Products\MetalicProducts");
}
You can use a search option to include all subdirectories:
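// path is the root folder, e.g. @"C:\Products\MetalicProducts"
// "???A*.xml": any three characters, then "A", then anything, ending in .xml
var prefilteredFiles = Directory.EnumerateFiles(path, "???A*.xml",
    SearchOption.AllDirectories);
// Keep the full path for later use, but test only the bare file name
var filtered = prefilteredFiles
    .Select(f => (full: f, name: Path.GetFileNameWithoutExtension(f)))
    .Where(t => t.name.StartsWith("OneA") || t.name.StartsWith("TwoA"));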
The wildcard pattern ???A*.xml pre-filters the files but is not selective enough. Therefore we use LINQ to refine the search.
The Select creates a tuple holding both the full path (directory, file name, and extension) and the bare file name, so the name can be tested while the full path remains available for opening the file.
Of course you could use Regex if the simple string operations are not precise enough:
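// Requires using System.Text.RegularExpressions;
// The pattern is not anchored, so it matches anywhere in the bare file name
var filtered = prefilteredFiles
    .Where(f =>
        Regex.IsMatch(Path.GetFileNameWithoutExtension(f), "(OneA|TwoA)[1-9]+")
    );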
This also has the advantage that only one test per file is required, which allows us to drop the Select.
You could also use a precompiled regex to speed up the matching; however, file system operations are usually much slower than any in-memory computation, so the gain is likely to be small.
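A minimal sketch of the precompiled variant, assuming the same pattern as above. The class name XmlNameFilter and the method IsTarget are hypothetical; the point is that the Regex instance is created once with RegexOptions.Compiled and reused for every file instead of being parsed on each call.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the one from the answer above.
public static class XmlNameFilter
{
    // Created once and shared: RegexOptions.Compiled costs extra start-up time
    // but makes each subsequent IsMatch call cheaper.
    private static readonly Regex NamePattern =
        new Regex("(OneA|TwoA)[1-9]+", RegexOptions.Compiled);

    // Tests only the bare file name (no directory, no extension).
    public static bool IsTarget(string fullPath) =>
        NamePattern.IsMatch(Path.GetFileNameWithoutExtension(fullPath));
}
```

Usage: var filtered = prefilteredFiles.Where(XmlNameFilter.IsTarget);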
Note that DirectoryInfo also has an EnumerateFiles method with a SearchOption parameter. It returns FileInfo objects instead of just file names.
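A sketch of that variant, reusing the wildcard and name tests from the first answer; FindTargets is a hypothetical name. Working with FileInfo means properties such as Name, FullName, and Length are available directly, and ordinal comparison is used for the name prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class XmlFileFinder
{
    // Hypothetical helper: searches root and all of its subdirectories.
    public static List<FileInfo> FindTargets(DirectoryInfo root) =>
        root.EnumerateFiles("???A*.xml", SearchOption.AllDirectories)
            // FileInfo.Name is the file name with its extension, e.g. "OneA1.xml".
            .Where(fi => fi.Name.StartsWith("OneA", StringComparison.Ordinal)
                      || fi.Name.StartsWith("TwoA", StringComparison.Ordinal))
            .ToList();
}
```

Usage: foreach (FileInfo file in XmlFileFinder.FindTargets(new DirectoryInfo(@"C:\Products\MetalicProducts"))) Console.WriteLine(file.FullName);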
Related
I have a directory with multiple subdirectories that contain .doc files. Example:
C:\Users\user\Documents\testenviroment\Released\test0.doc
C:\Users\user\Documents\testenviroment\Debug\test1.doc
C:\Users\user\Documents1\testenviroment\Debug\test2.doc
C:\Users\user\Documents1\testenviroment\Released\test20.doc
I want to get all the test*.doc files under all Debug folders. I tried:
string[] files = Directory.GetFiles(@"C:\Users\user", "*Debug\\test*.doc",
    SearchOption.AllDirectories);
And it gives me an "Illegal characters in path" error.
If I try:
string[] files = Directory.GetFiles(@"C:\Users\user", "\\Debug\\test*.doc",
    SearchOption.AllDirectories);
I get a different error: "Could not find a part of the path C:\Users\user\Debug".
You are including a folder in the search pattern, which isn't supported. According to the docs:
searchPattern (Type: System.String): The search string to match against the names of files in path. This parameter can contain a combination of valid literal path and wildcard (* and ?) characters (see Remarks), but doesn't support regular expressions.
With this in mind, try something like this:
string[] files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
    .Where(file => file.Contains("\\Debug\\"))
    .ToArray();
This gets every test*.doc file in the specified directory tree and returns only those with a \Debug\ folder in their path. Because the whole tree is scanned, keep the search root as narrow as possible.
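If you prefer to test folder names rather than search for a substring, a sketch like the following checks each directory segment of the path. IsUnderDebug is a hypothetical helper; treating both \ and / as separators and ignoring case are assumptions, the latter chosen because Windows folder names are case-insensitive.

```csharp
using System;
using System.Linq;

public static class PathSegments
{
    // True if any folder in the path (not the file name) is named "Debug",
    // ignoring case. Both '\' and '/' are treated as separators.
    public static bool IsUnderDebug(string filePath)
    {
        string[] parts = filePath.Split(new[] { '\\', '/' });
        // The last part is the file name itself, so leave it out.
        return parts.Take(parts.Length - 1)
                    .Any(p => string.Equals(p, "Debug", StringComparison.OrdinalIgnoreCase));
    }
}
```

It would replace the Contains call in the Where clause: .Where(PathSegments.IsUnderDebug).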
Note:
My original answer used EnumerateFiles, which works like this (make sure to pass the search option; thanks @CodeCaster):
string[] files = Directory.EnumerateFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
    .Where(file => file.Contains("\\Debug\\"))
    .ToArray();
I've just run a test, and the second version seems to be slower; however, it might be quicker on a larger folder, so it's worth keeping in mind.
Edit: Note from @pinkfloydx33:
I've actually had that practically take down a system that I had inherited. It was taking so much time trying to return the array and killing the memory footprint as well. The problem was averted by converting over to the enumerable counterparts.
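So the second option is safer for larger directories: EnumerateFiles streams the paths one at a time, so only the files that pass the Where filter are collected into the array, whereas GetFiles first builds an array of every matching path.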
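To get the full benefit, you can also skip the final ToArray and consume the results as they are found. A minimal sketch with a hypothetical DebugDocs.Find helper; building the \Debug\ check from Path.DirectorySeparatorChar is an assumption so that it also works where the separator is /.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DebugDocs
{
    // Lazily yields matching paths: nothing is buffered, and the caller can stop
    // early (e.g. with First or Take) without scanning the rest of the tree.
    public static IEnumerable<string> Find(string root) =>
        Directory.EnumerateFiles(root, "test*.doc", SearchOption.AllDirectories)
                 .Where(file => file.Contains(
                     Path.DirectorySeparatorChar + "Debug" + Path.DirectorySeparatorChar));
}
```

Usage: foreach (string file in DebugDocs.Find(@"C:\Users\user")) { /* process one file at a time */ }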
The second parameter, the search pattern, works only on file names, so you'll need to iterate over the directories you want to search and then call Directory.GetFiles(directory, "test*.doc") on each of them.
How to write that code depends on how robust you want it to be and what assumptions you want to make (e.g. "all Debug directories are always two levels into the user's directory" versus "the Debug directory can be at any level into the user's directory").
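A sketch of the directory-first approach for the "Debug can be at any level" case; DebugFolderSearch.Find is a hypothetical name. It asks the file system for directories named Debug, then searches only those directories (not their subfolders) for test*.doc. It still walks the whole directory tree to find the Debug folders, but it lists files only inside them.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DebugFolderSearch
{
    // Step 1: find every directory named "Debug" at any depth below root.
    // Step 2: search only those directories (top level only) for test*.doc.
    public static List<string> Find(string root) =>
        Directory.EnumerateDirectories(root, "Debug", SearchOption.AllDirectories)
                 .SelectMany(dir => Directory.EnumerateFiles(dir, "test*.doc"))
                 .ToList();
}
```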
See How to recursively list all the files in a directory in C#?
Alternatively, if you want to search all subdirectories and then discard files that don't match your preferences, see Searching for file in directories recursively:
// Matching "\debug\" (with the trailing separator) avoids hits on folders such as "\DebugLogs"
var files = Directory.GetFiles(@"C:\Users\user", "test*.doc", SearchOption.AllDirectories)
    .Where(f => f.IndexOf(@"\debug\", StringComparison.OrdinalIgnoreCase) >= 0);
But note that this may be bad for performance, as it'll scan irrelevant directories.
I have some files that have multiple extensions (e.g. D*.P*.C*). I'm building a process to move files with specific extensions like the one above, as well as .csv and .arc files. I'm failing to filter the D*.P*.C* files. Here is the code below. Any help will be greatly appreciated.
var entries =
Directory.GetFileSystemEntries(_sourceLocation_FRUD, "*.*", SearchOption.AllDirectories)
.Where(s => (s.StartsWith("D"))
&& (s.Contains(".P"))
&& (s.EndsWith(".C")));
The second parameter isn't a regular expression, but it is a form of wildcard search.
Remove the LINQ code and specify your extension pattern in the method call. Since the * means 0 or more characters, you should be able to just use your D*.P*.C* pattern.
var entries =
    Directory.GetFileSystemEntries(_sourceLocation_FRUD, "D*.P*.C*", SearchOption.AllDirectories);
If only the extension starts with D (not the whole file name), you may have to change your pattern to *.D*.P*.C*.
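Since the original goal is to move D*.P*.C* files together with .csv and .arc files, one option is to run one search per pattern and merge the results. A sketch, assuming a hypothetical MoveCandidates.Find helper; it uses EnumerateFiles rather than GetFileSystemEntries so that directories are not returned, which seems to be what a process that moves files wants.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MoveCandidates
{
    // Each pattern is matched by the file system against file names only.
    // The patterns come from the question; adjust as needed.
    private static readonly string[] Patterns = { "D*.P*.C*", "*.csv", "*.arc" };

    public static List<string> Find(string sourceLocation) =>
        Patterns
            .SelectMany(p => Directory.EnumerateFiles(sourceLocation, p, SearchOption.AllDirectories))
            // A file could match more than one pattern; list it only once.
            .Distinct()
            .ToList();
}
```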
I found several questions on Stack Overflow about Directory.GetFiles(), but all of them explain how to use it to find a specific extension or a set of files matching multiple criteria. In my case, I want a search pattern for Directory.GetFiles() that uses a regular expression and returns all of the files in the directory except the set I specify. In other words, I don't want to declare the set I want, but the set to exclude. For example, I might want all of the files in a directory except the HTML files. I know it could be achieved this way:
var filteredFiles = Directory
    .GetFiles(path, "*.*")
    .Where(file => !file.ToLower().EndsWith("html"))
    .ToList();
But this is not a very reusable solution: if I later want to filter out another kind of file, I have to change the code by adding an || to the Where condition. I'm looking for something that lets me create a regex describing the files I don't want and pass it to Directory.GetFiles(). Then, if I want to filter out more extensions later, I just change the regex.
You don't need a regex if you want to filter extension(s):
// for example a field or property in your class
private HashSet<string> ExtensionBlacklist { get; } =
new HashSet<string>(StringComparer.InvariantCultureIgnoreCase)
{
".html",
".htm"
};
// ...
var filteredFiles = Directory.EnumerateFiles(path, "*.*")
.Where(fn => !ExtensionBlacklist.Contains(System.IO.Path.GetExtension(fn)))
.ToList();
I would recommend against using regex in favor of something like this:
var filteredFiles = Directory
.GetFiles(path, "*.*")
.Where(file => !excludedExtensions.Any<string>((extension) =>
file.EndsWith(extension, StringComparison.CurrentCultureIgnoreCase)))
.ToList();
You can pass it a collection of your excluded extensions, e.g.:
var excludedExtensions = new List<string>(new[] {".html", ".xml"});
The Any will short-circuit as soon as it finds a match on an excluded extension, so I think this is preferable even to excludedExtensions.Contains(). As for the regex, I don't think there's a good reason to use that given the trouble it can buy you. Don't use regex unless it's the only tool for the job.
So essentially you just don't know how to perform a regex match on a string?
There is Regex.IsMatch for that very purpose. However, you could also change the code to look up the extension in a set of extensions to filter, which would also allow you to easily add new filters.
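Directory.GetFiles itself cannot take a regex, but the exclusion can still live in a single, replaceable Regex. A sketch with a hypothetical GetFilesExcept helper: the caller passes whatever exclusion regex it needs, and only the file name (not the directory) is tested against it.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExcludeByRegex
{
    // Returns the files directly in 'path' whose names do NOT match 'excluded'.
    // Changing what is filtered out only requires a different regex.
    public static List<string> GetFilesExcept(string path, Regex excluded) =>
        Directory.EnumerateFiles(path)
                 .Where(file => !excluded.IsMatch(Path.GetFileName(file)))
                 .ToList();
}
```

Usage: var nonHtml = ExcludeByRegex.GetFilesExcept(path, new Regex(@"\.html?$", RegexOptions.IgnoreCase)); and to exclude XML files as well, pass @"\.(html?|xml)$" instead.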
I was wondering what would be a good way to scan a directory whose path contains characters you are not sure of.
For example, I want to scan
C:\Program\Version2.*\Files
Meaning
The folder is located in C:\Program
Version2.* could be anything like Version2.33, Version2.1, etc.
That folder has a folder named Files in it
I know that I could do something like foreach (directory) if contains("Version2."), but I was wondering if there was a better way of doing so.
Directory.EnumerateDirectories accepts a search pattern, so enumerate the parent level that contains the wildcard and then enumerate the rest:
var directories =
    Directory.EnumerateDirectories(@"C:\Program\", "Version2.*")
        .SelectMany(parent => Directory.EnumerateDirectories(parent, "Files"));
Note: if the path can contain wildcards at any level, simply normalize the path and split it on "\", then collect the folders level by level.
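A sketch of that level-by-level approach, assuming an absolute path in which any segment may contain * or ?. ExpandDirectories is a hypothetical helper: wildcard segments are matched by Directory.EnumerateDirectories, and literal segments are kept only if that folder exists.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WildcardPath
{
    private static readonly char[] Wildcards = { '*', '?' };

    // Expands e.g. C:\Program\Version2.*\Files into all existing matching folders.
    public static List<string> ExpandDirectories(string pattern)
    {
        string root = Path.GetPathRoot(pattern);
        if (string.IsNullOrEmpty(root))
            throw new ArgumentException("An absolute path is expected.", nameof(pattern));

        // Split everything after the root (e.g. after C:\) into folder names.
        string[] segments = pattern.Substring(root.Length).Split(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar },
            StringSplitOptions.RemoveEmptyEntries);

        List<string> current = new List<string> { root };
        foreach (string segment in segments)
        {
            current = segment.IndexOfAny(Wildcards) >= 0
                // Wildcard segment: let the file system match it at this level.
                ? current.SelectMany(dir => Directory.EnumerateDirectories(dir, segment)).ToList()
                // Literal segment: keep only parents that actually contain it.
                : current.Select(dir => Path.Combine(dir, segment)).Where(Directory.Exists).ToList();
        }
        return current;
    }
}
```

Usage: var dirs = WildcardPath.ExpandDirectories(@"C:\Program\Version2.*\Files");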
Try this
// Requires using System.Text.RegularExpressions;
var pattern = new Regex(@"C:\\Program\\Version2(.*)\\Files(.*)");
var directories = Directory.EnumerateDirectories(@"C:\Program", "*",
    SearchOption.AllDirectories)
    .Where(d => pattern.IsMatch(d));
I have code that searches through a directory and picks out all the folders, but I only want it to pick out the ones that start with Data. How would I do that?
Below is the code I have that goes through the directory:
string[] filePaths = Directory.GetDirectories(defaultPath).Where(Data => !Data.EndsWith(".")).ToArray();
No need to use LINQ; GetDirectories supports search patterns, and will probably be significantly faster since the filtering may be done by the filesystem, before enumerating the results in .NET.
string[] filePaths = Directory.GetDirectories(defaultPath, "Data*");
Note that * is a wildcard which matches zero or more characters.
If by "starts with Data" you just mean that the folder name begins with "Data", this will work:
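// GetDirectories returns full paths, so test only the last segment (the folder name)
string[] filePaths = Directory.GetDirectories(defaultPath)
    .Where(s => Path.GetFileName(s).StartsWith("Data") && !s.EndsWith("."))
    .ToArray();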