I have some files that have multiple extensions (e.g. D*.P*.C*). I'm building a process to move files with specific extensions, like the one above plus .csv and .arc files, but I'm failing to filter the D*.P*.C* files. Here is the code. Any help will be greatly appreciated.
var entries =
Directory.GetFileSystemEntries(_sourceLocation_FRUD, "*.*", SearchOption.AllDirectories)
.Where(s => (s.StartsWith("D"))
&& (s.Contains(".P"))
&& (s.EndsWith(".C")));
The second parameter isn't Regex, but it is a form of wildcard search.
Remove the LINQ code and specify your extension pattern in the method call. Since the * means 0 or more characters, you should be able to just use your D*.P*.C* pattern.
var entries =
Directory.GetFileSystemEntries(_sourceLocation_FRUD, "D*.P*.C*", SearchOption.AllDirectories);
Assuming only the extension starts with D, not the full filename, you may have to change your pattern to *.D*.P*.C*
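A minimal sketch of the combined filter the question describes, assuming the D*.P*.C* pattern should be tested against the bare file name (not the full path) and that .csv and .arc should match case-insensitively. The names MoveFilter and IsWanted are hypothetical, and the regex is a stricter reading of the wildcard (exactly three dot-separated parts), which is an assumption rather than an exact equivalent.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class MoveFilter
{
    // Assumed reading of D*.P*.C*: three dot-separated parts starting with D, P and C.
    private static readonly Regex MultiExt =
        new Regex(@"^D[^.]*\.P[^.]*\.C[^.]*$", RegexOptions.IgnoreCase);

    // Tests the file name only, so a directory that starts with "D" cannot match by accident.
    public static bool IsWanted(string path)
    {
        string name = Path.GetFileName(path);
        string ext = Path.GetExtension(name);
        return MultiExt.IsMatch(name)
            || ext.Equals(".csv", StringComparison.OrdinalIgnoreCase)
            || ext.Equals(".arc", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical sample paths; forward slashes work on Windows and Unix alike.
        string[] samples = { "in/D01.P02.C03", "in/report.csv", "Data/notes.txt" };
        foreach (string p in samples.Where(IsWanted))
            Console.WriteLine(p);
    }
}
```

In a real run the samples array would be replaced by the result of Directory.EnumerateFiles(_sourceLocation_FRUD, "*", SearchOption.AllDirectories).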
I have directories arranged as in the picture. I want to edit the files OneA1, OneA2, OneA3, ... and TwoA1, TwoA2, TwoA3, ... (they are XML files and I want to edit some tags). There are hundreds of files on the C drive. How do I filter the required files in C#? The aim is to filter all XML files whose file names contain the word OneA and OneB.
static void Main (string[] args)
{
DirectoryInfo directory = new DirectoryInfo(@"C:\Products\MetalicProducts");
}
You can use a search option to include all subdirectories:
var prefilteredFiles = Directory.EnumerateFiles(path, "???A*.xml",
SearchOption.AllDirectories);
var filtered = prefilteredFiles
.Select(f => (full: f, name: Path.GetFileNameWithoutExtension(f)))
.Where(t => t.name.StartsWith("OneA") || t.name.StartsWith("TwoA"));
The wildcard pattern ???A*.xml pre-filters the files but is not selective enough. Therefore we use LINQ to refine the search.
The Select creates a tuple with the full file name including the directory and the extension and the bare file name.
Of course you could use Regex if the simple string operations are not precise enough:
var filtered = prefilteredFiles
.Where(f =>
Regex.IsMatch(Path.GetFileNameWithoutExtension(f), "(OneA|TwoA)[1-9]+")
);
This also has the advantage that only one test per file is required, which allows us to drop the Select.
You also might use a pre-compiled regex to speed up the search; however, file operations are usually very slow compared to any calculations.
Note that DirectoryInfo also has an EnumerateFiles method with a SearchOption parameter. It returns FileInfo objects instead of just file names.
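As a rough illustration of the DirectoryInfo variant, the sketch below filters FileInfo objects by name prefix. FileInfoDemo and FindByPrefix are hypothetical names, and the temporary directory tree exists only so the example can run anywhere.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileInfoDemo
{
    // Returns the XML files under root (recursively) whose names start with one of the prefixes.
    public static List<FileInfo> FindByPrefix(string root, params string[] prefixes)
    {
        return new DirectoryInfo(root)
            .EnumerateFiles("*.xml", SearchOption.AllDirectories)
            .Where(fi => prefixes.Any(p =>
                fi.Name.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .OrderBy(fi => fi.Name, StringComparer.Ordinal)
            .ToList();
    }

    public static void Main()
    {
        // Build a small throwaway tree (assumption: the temp directory is writable).
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllText(Path.Combine(root, "OneA1.xml"), "<a/>");
        File.WriteAllText(Path.Combine(root, "sub", "TwoA2.xml"), "<a/>");
        File.WriteAllText(Path.Combine(root, "sub", "OneB1.xml"), "<a/>");

        // FileInfo exposes Name, FullName, Length and so on without extra Path calls.
        foreach (FileInfo fi in FindByPrefix(root, "OneA", "TwoA"))
            Console.WriteLine($"{fi.Name} ({fi.Length} bytes)");

        Directory.Delete(root, recursive: true);
    }
}
```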
I found several questions on Stack Overflow about Directory.GetFiles(), but all of them explain how to use it to find a specific extension or a set of files matching multiple criteria. In my case, I want a search pattern for Directory.GetFiles(), using regular expressions, that returns all of the files in the directory except the set I specify. In other words, I want to declare not the set I want but its complement: for example, all of the files in a directory except the HTML files. Note that I know this can be achieved this way:
var filteredFiles = Directory
.GetFiles(path, "*.*")
.Where(file => !file.ToLower().EndsWith("html"))
.ToList();
But this is not a very reusable solution: if I later want to filter out another kind of file, I have to change the code by adding an || to the Where condition. I'm looking for something that lets me create a regex describing the files I don't want and pass it to Directory.GetFiles(). Then, if I want to filter out more extensions later, I only have to change the regex.
You don't need a regex if you want to filter extension(s):
// for example a field or property in your class
private HashSet<string> ExtensionBlacklist { get; } =
new HashSet<string>(StringComparer.InvariantCultureIgnoreCase)
{
".html",
".htm"
};
// ...
var filteredFiles = Directory.EnumerateFiles(path, "*.*")
.Where(fn => !ExtensionBlacklist.Contains(System.IO.Path.GetExtension(fn)))
.ToList();
I would recommend against using regex in favor of something like this:
var filteredFiles = Directory
.GetFiles(path, "*.*")
.Where(file => !excludedExtensions.Any<string>((extension) =>
file.EndsWith(extension, StringComparison.CurrentCultureIgnoreCase)))
.ToList();
You can pass it a collection of your excluded extensions, e.g.:
var excludedExtensions = new List<string>(new[] {".html", ".xml"});
The Any will short-circuit as soon as it finds a match on an excluded extension, so I think this is preferable even to excludedExtensions.Contains(). As for the regex, I don't think there's a good reason to use that given the trouble it can buy you. Don't use regex unless it's the only tool for the job.
So essentially you just don't know how to perform a regex match on a string?
There is Regex.IsMatch for that very purpose. However, you could also change the code to look up the extension in a set of extensions to filter, which would also allow you to easily add new filters.
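For readers who still want the regex route, here is a minimal sketch of an exclusion filter built on Regex.IsMatch. ExcludeByRegex and Filter are hypothetical names, and the pattern shown is only one possible way to describe the unwanted extensions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExcludeByRegex
{
    // Keeps every path whose file name does NOT match the exclusion regex.
    public static List<string> Filter(IEnumerable<string> paths, Regex excluded)
    {
        return paths
            .Where(p => !excluded.IsMatch(Path.GetFileName(p)))
            .ToList();
    }

    public static void Main()
    {
        // Assumed exclusion list: .htm, .html and .xml, case-insensitive.
        var excluded = new Regex(@"\.(html?|xml)$", RegexOptions.IgnoreCase);
        var sample = new[] { "index.html", "page.HTM", "data.xml", "notes.txt" };
        Console.WriteLine(string.Join(", ", Filter(sample, excluded)));
    }
}
```

Against a real directory, the sample array would be replaced by Directory.EnumerateFiles(path), and adding another extension means editing only the pattern string.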
I've a directory with tons of files and I want only to get the names of the ones starting with sly_.
If I'm not wrong, the pattern for this is ^sly_.
This is my try using the solution of this question:
string pattern = @"^sly_";
var matches = Directory.GetFiles(@"D:\mypath").Where(path => Regex.Match(path, pattern).Success);
foreach (string file in matches)
Console.Write(file);
Unfortunately, this doesn't list the files matching my pattern. Can someone tell me what's wrong with my code and how I can list the file names starting with sly_?
Thanks in advance.
If you insist on a regular expression, you should test the file name, not the entire path:
string pattern = @"^sly_";
var matches = Directory
.GetFiles(@"D:\mypath")
.Where(path => Regex.IsMatch(Path.GetFileName(path), pattern));
Console.Write(String.Join(Environment.NewLine, matches));
Your actual issue is that Directory.GetFiles returns
An array of the full names (including paths) for the files in the specified directory, or an empty array if no files are found.
Your regex would need to account for the D:\mypath part as well as the sly_ part. Other than that, your expression is correct.
You don't need to use regex at all. This is more readable and efficient:
string[] matches = Directory.GetFiles(@"D:\mypath", "sly_*");
Directory.GetFiles Method (String, String)
* is a wildcard for "zero or more characters in that position", and it is applied only to the file name, not the full path. If you wanted to include the extension:
string[] matches = Directory.GetFiles(@"D:\mypath", "sly_*.txt");
Your regex would also work if you just use the file-name not the full-path:
var matches = Directory.GetFiles(@"D:\mypath")
.Where(path => Regex.Match(Path.GetFileName(path), pattern).Success);
But as mentioned, this is less readable and less efficient. Remember that matches is currently only a LINQ query, not a collection. You need to append, for example, ToArray to get one; otherwise the query is executed every time you enumerate matches.
This is easy using LINQ and the classes DirectoryInfo and FileInfo. A FileInfo has the properties Name and FullName. Usage would be as follows:
IEnumerable<FileInfo> myFiles = new DirectoryInfo(@"D:\mypath")
.EnumerateFiles()
.Where(fileInfo => fileInfo.Name.StartsWith("sly_", StringComparison.OrdinalIgnoreCase));
Use Enumerable.Select to get the sequence with full file names or short file names
Your code does not work because Directory.GetFiles() returns the full paths of the files, like
D:\mypath\sly_yourFile.txt
So the string path doesn't start with sly_ and does not match your regex @"^sly_".
A simpler solution is to pass the search pattern to the GetFiles() method, like
Directory.GetFiles(@"D:\mypath", "sly_*")
I am using the following line to return specific files...
FileInfo file in nodeDirInfo.GetFiles("*.sbs", option)
But there are other files in the directory with the extension .sbsar, and it is getting them, too. How can I differentiate between .sbs and .sbsar in the search pattern?
The issue you're experiencing is a limitation of the search pattern, in the Win32 API.
A searchPattern with a file extension (for example *.txt) of exactly
three characters returns files having an extension of three or more
characters, where the first three characters match the file extension
specified in the searchPattern.
My solution is to manually filter the results, using Linq:
nodeDirInfo.GetFiles("*.sbs", option).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Try this, filtered using file extension.
FileInfo[] files = nodeDirInfo.GetFiles("*", SearchOption.TopDirectoryOnly).
Where(f=>f.Extension==".sbs").ToArray<FileInfo>();
That's the behaviour of the Win32 API (FindFirstFile) that is underneath GetFiles() being reflected on to you.
You'll need to do your own filtering if you must use GetFiles(). For instance:
nodeDirInfo.GetFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Or more efficiently:
nodeDirInfo.EnumerateFiles("*", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
Note that I use StringComparison.InvariantCultureIgnoreCase to deal with the fact that Windows file names are case-insensitive.
If performance is an issue, that is if the search has to process directories with large numbers of files, then it is more efficient to perform the filtering twice: once in the call to GetFiles or EnumerateFiles, and once to clean up the unwanted file names. For example:
nodeDirInfo.GetFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
nodeDirInfo.EnumerateFiles("*.sbs", searchOption).Where(f => f.Name.EndsWith(".sbs",
StringComparison.InvariantCultureIgnoreCase));
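The cleanup step can be isolated into a small predicate. This is a sketch only: ExactExtension and HasExactExtension are hypothetical names, and it compares the result of Path.GetExtension rather than using EndsWith, which is a design choice made here and not something the answers above prescribe.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ExactExtension
{
    // True only when the file's extension equals the given one, ignoring case.
    // ".sbsar" is therefore rejected when ".sbs" is requested.
    public static bool HasExactExtension(string path, string extension)
    {
        return Path.GetExtension(path)
            .Equals(extension, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical file names standing in for the results of a "*.sbs" search.
        string[] candidates = { "wood.sbs", "wood.sbsar", "METAL.SBS", "readme.txt" };
        foreach (string name in candidates.Where(c => HasExactExtension(c, ".sbs")))
            Console.WriteLine(name);
    }
}
```

With FileInfo results, the same check can be written as f.Extension.Equals(".sbs", StringComparison.OrdinalIgnoreCase).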
It's mentioned in the docs:
When using the asterisk wildcard character in a searchPattern, a
searchPattern with a file extension of exactly three characters
returns files having an extension of three or more characters. When
using the question mark wildcard character, this method returns only
files that match the specified file extension.
I wish to get a list of all the files of a certain extension (recursive), but only the files ending with that extension.
For example, I wish to get all the files with the ".exe" extension, If I have the following files:
file1.exe , file2.txt.exe , file3.exe.txt , file4.txt.exe1 , file5.txt
I expect to get a list of 1 file, which is: file1.exe.
I'm trying to use the following line:
List<string> theList = Directory.GetFiles(@"C:\SearchDir", "*.exe", SearchOption.AllDirectories).ToList();
But what I get is a list of the following three files: file1.exe , file2.txt.exe , file4.txt.exe1
Any ideas?
Try this:
var exeFiles = Directory.EnumerateFiles(sourceDirectory,
"*", SearchOption.AllDirectories)
.Where(s => s.EndsWith(".exe", StringComparison.OrdinalIgnoreCase)
&& Path.GetFileName(s).Count(c => c == '.') == 1)
.ToList();
This is a common issue. Take note of the MSDN documentation:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern.
You can't solve it by searching for the .exe extension; you'll need to filter your results one more time in the client code.
One more thing to note: the following examples would in fact be considered executable files:
file1.exe
file2.txt.exe
whereas this one wouldn't technically be considered an executable file.
file4.txt.exe1
So the question then becomes, what algorithm do you want? It appears to me you want the following:
Files that have an extension of exe.
Files that don't have multiple extensions.
Have a look at Ahmed's answer for a fantastic approach to getting the algorithm you want.
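The two requirements listed above can be sketched as a single predicate and checked against the five file names from the question. SingleExtension and HasOnlyExtension are hypothetical names, and "no multiple extensions" is interpreted here as "exactly one dot in the file name", which is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SingleExtension
{
    // True when the file name ends with the extension and contains exactly one dot.
    public static bool HasOnlyExtension(string path, string extension)
    {
        string name = Path.GetFileName(path);
        return name.EndsWith(extension, StringComparison.OrdinalIgnoreCase)
            && name.Count(c => c == '.') == 1;
    }

    public static void Main()
    {
        // The sample names from the question.
        string[] names =
        {
            "file1.exe", "file2.txt.exe", "file3.exe.txt", "file4.txt.exe1", "file5.txt"
        };
        foreach (string n in names.Where(x => HasOnlyExtension(x, ".exe")))
            Console.WriteLine(n);
    }
}
```

Counting dots on the file name rather than the full path keeps directory names that contain dots from skewing the result.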