C# Directory.GetFiles with mask

In C#, I would like to get all files from a specific directory that match the following mask:
prefix is "myfile_"
suffix is a number
file extension is xml
e.g.
myfile_4.xml
myfile_24.xml
the following files should not match the mask:
_myfile_6.xml
myfile_6.xml_
The code should look something like this (maybe a LINQ query can help):
string[] files = Directory.GetFiles(folder, "???");
Thanks

I am not good with regular expressions, but this might help -
var myFiles = from file in System.IO.Directory.GetFiles(folder, "myfile_*.xml")
where Regex.IsMatch(Path.GetFileName(file), @"^myfile_[0-9]+\.xml$", RegexOptions.IgnoreCase) // anchored and tested against the file name only, so myfile_6.xml_ is rejected
select file;

You can try it like:
string[] files = Directory.GetFiles("C:\\test", "myfile_*.xml");
//This will give you all the files with `xml` extension and starting with `myfile_`
//but this will also give you files like `myfile_ABC.xml`
//to filter them out
int temp;
List<string> selectedFiles = new List<string>();
foreach (string str in files)
{
string fileName = Path.GetFileNameWithoutExtension(str);
string[] tempArray = fileName.Split('_');
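// GetFiles can also return names such as myfile_6.xml_, so check the extension explicitly
if (tempArray.Length == 2 && int.TryParse(tempArray[1], out temp)
&& Path.GetExtension(str).Equals(".xml", StringComparison.OrdinalIgnoreCase))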
{
selectedFiles.Add(str);
}
}
So if your Test folder has files:
myfile_24.xml
MyFile_6.xml
MyFile_6.xml_
myfile_ABC.xml
_MyFile_6.xml
Then you will get in selectedFiles
myfile_24.xml
MyFile_6.xml

You can do something like:
Regex reg = new Regex(@"^myfile_\d+\.xml$");
IEnumerable<string> files = Directory.GetFiles("C:\\").Where(fileName => reg.IsMatch(Path.GetFileName(fileName)));
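The answers above all come down to the same two steps: let the wildcard narrow the listing, then let an anchored regular expression decide. A minimal sketch of that idea follows; the class and method names (`MaskFilter`, `IsMatch`, `GetMatchingFiles`) are made up for illustration, and it assumes the match should ignore case, as in the sample output above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class MaskFilter
{
    // Anchored pattern: "myfile_", one or more digits, then exactly ".xml".
    private static readonly Regex Mask =
        new Regex(@"^myfile_[0-9]+\.xml$", RegexOptions.IgnoreCase);

    // Tests only the file-name part, so directory names cannot affect the result.
    public static bool IsMatch(string path)
    {
        return Mask.IsMatch(Path.GetFileName(path));
    }

    // The wildcard narrows the listing; the regex makes the final decision.
    public static List<string> GetMatchingFiles(string folder)
    {
        return Directory.GetFiles(folder, "myfile_*.xml")
            .Where(p => IsMatch(p))
            .ToList();
    }
}
```

Calling `MaskFilter.GetMatchingFiles(folder)` should then return only names such as myfile_4.xml and myfile_24.xml.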

Related

In C# matching all files in a directory using regex

I am currently trying to use the below regular expression in C#
Regex reg = new Regex(@"-(FILENM01P\\.(\\d){3}\\.PGP)$");
var files = Directory.GetFiles(savePath, "*.PGP")
.Where(path => reg.IsMatch(path))
.ToList();
foreach (string file in files)
{
MessageBox.Show(file);
}
To match all files that have this file naming convention in a single directory
FILENM01P.001.PGP
If I just load up all files like this
var files = Directory.GetFiles(savePath, "*.PGP");
foreach (string file in files)
{
MessageBox.Show(file);
}
Then I get strings like this:
C:\Users\User\PGP Files\FILENM01P.001.PGP
There could be many of these files for example
FILENM01P.001.PGP
FILENM01P.002.PGP
FILENM01P.003.PGP
FILENM01P.004.PGP
But there will never be
FILENM01P.000.PGP
FILENM01P.1000.PGP
To clarify, only the 3 numbers together will change and can only be between 001 to 999 (with leading zeros) the rest of the text is static and will never change.
I'm a complete novice when it comes to RegEx so any help would be greatly appreciated.
Essentially my end goal is to find the next number and create the file and if there are no files then it will create one starting at 001 and if it gets to 999 then it returns 1000 so that I know I need to move to a new directory as each directory is limited to 999 sequential files. (I'll deal with this stuff though)
Try this code.
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).Select(x=> Convert.ToInt32(x.Value.Split('.')[1])).ToList();
var nextNumber = (matches.Max() + 1).ToString("D3"); // 3 digit with leading zeros
Also, you might need a check for whether the next number is 1000 and, if so, return 0.
(matches.Max() + 1 > 999? 0:matches.Max() + 1).ToString("D3")
My test case.
List<string> files = new List<string>();
files.Add(@"C:\Users\User\PGP Files\FILENM01P.001.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.002.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.003.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.004.PGP");
The output is
nextNumber = "005";
Regex regex = new Regex(@"FILENM01P\.(\d+)\.", RegexOptions.IgnoreCase);
var fnumbers = Directory.GetFiles(src, "*.PGP", SearchOption.TopDirectoryOnly)
.Select(f=>regex.Match(f))
.Where(m=>m.Success)
.Select(m=>int.Parse(m.Groups[1].Value));
int fileNum = 1 + (fnumbers.Any() ? fnumbers.Max() : 0);
You can do something like this:
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).ToList();
var nextNumber = matches.Any()
? matches.Max(f => int.Parse(f.Groups[1].Value)) + 1
: 1;
Where files is a list of the files to match.
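Putting the pieces from these answers together, here is a rough sketch that also covers the two edge cases from the question: an empty directory (start at 1) and a full one (return 1000). The names `SequenceNumbers`, `NextNumber` and `BuildName` are invented for this example, and ignoring case is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SequenceNumbers
{
    // The name must start the string or follow a path separator, and must end the string.
    private static readonly Regex Name = new Regex(
        @"(?:^|[\\/])FILENM01P\.([0-9]{3})\.PGP$", RegexOptions.IgnoreCase);

    // Returns 1 when nothing matches, otherwise the highest number plus one.
    // A result of 1000 signals that the directory is full.
    public static int NextNumber(IEnumerable<string> paths)
    {
        return paths
            .Select(p => Name.Match(p))
            .Where(m => m.Success)
            .Select(m => int.Parse(m.Groups[1].Value))
            .DefaultIfEmpty(0)
            .Max() + 1;
    }

    // Formats the number with leading zeros, e.g. 5 becomes FILENM01P.005.PGP.
    public static string BuildName(int number)
    {
        return "FILENM01P." + number.ToString("D3") + ".PGP";
    }
}
```

In practice the input would be something like `Directory.GetFiles(savePath, "*.PGP")`, and the caller would check for 1000 before calling `BuildName`.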

Buffer file names in a given directory

I'm trying to find a way to buffer FileNames from a given directory in C#. By this I mean:
Given directory
C:/MyDir
Which contains files:
File1_orig.txt
File1_edited.txt
File2_orig.txt
File2_edited.txt
...
Filen_orig.txt
Filen_edited.txt
I want to store the filenames(not the whole filepath, just the filename, e.g. String[] filename = Filen_orig.txt) into temporary strings and run a simple comparison on them to see if they contain a target string.
I would like to pass the strings into:
while(STILL FILES IN DIRECTORY)
{
string exFileName = {BUFFER FILENAME HERE};
string[] words = exFileName.Split('_');
string toCompare = "edited";
foreach (string word in words)
{
Console.WriteLine(word);
bool result = toCompare.Equals(word, StringComparison.OrdinalIgnoreCase);
if (result)
{
Console.WriteLine("success");
}
}
}
Console.ReadLine();
The goal is to check whether the file being examined is edited (*_edited.txt) or an original (*_orig.txt) and, if it is edited, to process it further.
Does anyone know how to automate a filepath read?
Thank you very much.
If you want to see whether any file names contain the _edited bit, you can use:
bool success = Directory.GetFiles(@"c:\MyDir").Any(p => p.Contains("_edited"));
I'm making a bit of a guess this is what you want because your code isn't very clear (nor is your description)
Edit: to show all edited files:
foreach(var file in Directory.GetFiles(@"c:\MyDir").Where(p => p.Contains("_edited")))
{
Console.WriteLine(" {0}: edited", file);
}
Also, the file needs a "using System.Linq;" directive for Any and Where.
How about DirectoryInfo.GetFiles?
DirectoryInfo di = new DirectoryInfo(@"c:\");
// Get only the files with a .txt extension.
FileInfo[] files = di.GetFiles("*.txt");
foreach (FileInfo fi in files)
{
string exFileName = fi.Name; // file name only, without the directory
...
}
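For the original question (file names only, and only the edited ones), a small sketch along these lines might do. `EditedFiles`, `IsEdited` and `GetEditedNames` are names invented here, and it assumes "edited" means the name ends in _edited just before the extension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class EditedFiles
{
    // True when the name without its extension ends in "_edited", ignoring case.
    public static bool IsEdited(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        return stem.EndsWith("_edited", StringComparison.OrdinalIgnoreCase);
    }

    // Lists the names (not the full paths) of the edited .txt files in a directory.
    public static List<string> GetEditedNames(string directory)
    {
        return Directory.EnumerateFiles(directory, "*.txt")
            .Select(p => Path.GetFileName(p))
            .Where(n => IsEdited(n))
            .ToList();
    }
}
```

Checking the end of the name rather than using Contains avoids false hits on names that merely mention _edited somewhere in the middle.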

Best way to get only certain groupings out of a string

I am getting a list of file names using the following code:
//Set up Datatable
dtUpgradeFileInfo.Columns.Add("BaseFW");
dtUpgradeFileInfo.Columns.Add("ActiveFW");
dtUpgradeFileInfo.Columns.Add("UpgradeFW");
dtUpgradeFileInfo.Columns.Add("FileName");
//Gets Upgrade information and upgrade Files from Upgrade Folder
DirectoryInfo di = new DirectoryInfo(g_strAppPath + "\\Update Files");
FileInfo[] rgFiles = di.GetFiles("*.txt");
foreach (FileInfo fi in rgFiles)
{
test1 = fi.Name.ToString();
}
All file names will be in the form BXXXX_AXXXX_UXXXX, where each X represents a digit 0-9, and I need those three groups of digits so I can put each into its respective column in the DataTable. I was initially intending to pick out the characters of each group and join them back together, but I'm wondering whether there is a better or quicker way than converting the name to a char array. Any suggestions?
Here is a relatively simple way to get the numbers out of test1 (without LINQ):
...
string test1 = Path.GetFileNameWithoutExtension(fi.Name); // drop ".txt" so the last group parses as a number
int baseFW=0;
int activeFW=0;
int upgradeFW=0;
// Break the file name into the three groups
string[] groups=test1.Split('_');
if (groups.Length==3)
{
// Create a numbers array to hold the numbers
int[] nums=new int[groups.Length];
// Parse the numbers out of the strings
int idx=0;
foreach (string s in groups)
nums[idx++]=int.Parse(s.Remove(0,1)); // Convert to num
baseFW=nums[0];
activeFW=nums[1];
upgradeFW=nums[2];
}
else
{
// Error handling...
}
If you want to do this using LINQ, it's even easier:
...
string test1 = Path.GetFileNameWithoutExtension(fi.Name); // drop ".txt" so the last group parses as a number
int baseFW=0;
int activeFW=0;
int upgradeFW=0;
// Extract all numbers
int[] nums=test1.Split('_') // Split on underscores
.Select(s => int.Parse(s.Remove(0,1))) // Convert to ints
.ToArray(); // For random access, below
if (nums.Length==3)
{
baseFW=nums[0];
activeFW=nums[1];
upgradeFW=nums[2];
}
else
{
// Error handling...
}
Using regular expressions allows you to easily parse out the values that you need, and has the added benefit of allowing you to skip over files that end up in the directory that don't match the expected filename format.
Your code would look something like this:
//Gets Upgrade information and upgrade Files from Upgrade Folder
string strRegex = @"^B(?<Base>[0-9]{4})_A(?<Active>[0-9]{4})_U(?<Upgrade>[0-9]{4})\.txt$";
RegexOptions myRegexOptions = RegexOptions.ExplicitCapture | RegexOptions.Compiled;
Regex myRegex = new Regex(strRegex, myRegexOptions);
DirectoryInfo di = new DirectoryInfo(g_strAppPath + "\\Update Files");
FileInfo[] rgFiles = di.GetFiles("*.txt");
foreach (FileInfo fi in rgFiles)
{
string name = fi.Name.ToString();
Match matched = myRegex.Match(name);
if (matched.Success)
{
//do the inserts into the data table here
string baseFw = matched.Groups["Base"].Value;
string activeFw = matched.Groups["Active"].Value;
string upgradeFw = matched.Groups["Upgrade"].Value;
}
}
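If the values are needed as numbers rather than strings, the named groups can be wrapped in a small try-style method. This is only a sketch: `FirmwareName` and `TryParse` are invented names, and it assumes exactly four digits per group and a .txt extension, as in the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FirmwareName
{
    private static readonly Regex Pattern = new Regex(
        @"^B(?<Base>[0-9]{4})_A(?<Active>[0-9]{4})_U(?<Upgrade>[0-9]{4})\.txt$");

    // Returns false (and zeros) for any name that does not fit the expected format.
    public static bool TryParse(string fileName, out int baseFw, out int activeFw, out int upgradeFw)
    {
        baseFw = 0;
        activeFw = 0;
        upgradeFw = 0;

        Match m = Pattern.Match(fileName);
        if (!m.Success)
        {
            return false;
        }

        baseFw = int.Parse(m.Groups["Base"].Value);
        activeFw = int.Parse(m.Groups["Active"].Value);
        upgradeFw = int.Parse(m.Groups["Upgrade"].Value);
        return true;
    }
}
```

Inside the foreach loop, a successful `FirmwareName.TryParse(fi.Name, ...)` would be the point at which a row is added to the DataTable.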

Stop implicit wildcard in Directory.GetFiles()

string[] fileEntries = Directory.GetFiles(pathName, "*.xml");
This also returns files like foo.xml_. Is there a way to force it not to do so, or will I have to write code to filter the returned results?
This is the same behavior as dir *.xml on the command prompt, but different than searching for *.xml in windows explorer.
This behavior is by design. From MSDN (look at the note section and examples given):
A searchPattern with a file extension
of exactly three characters returns
files having an extension of three or
more characters, where the first three
characters match the file extension
specified in the searchPattern.
You could limit it as follows:
C# 2.0:
string[] fileEntries = Array.FindAll(Directory.GetFiles(pathName, "*.xml"),
delegate(string file) {
return String.Compare(Path.GetExtension(file), ".xml", StringComparison.CurrentCultureIgnoreCase) == 0;
});
// or
string[] fileEntries = Array.FindAll(Directory.GetFiles(pathName, "*.xml"),
delegate(string file) {
return Path.GetExtension(file).Length == 4;
});
C# 3.0:
string[] fileEntries = Directory.GetFiles(pathName, "*.xml").Where(file =>
Path.GetExtension(file).Length == 4).ToArray();
// or
string[] fileEntries = Directory.GetFiles(pathName, "*.xml").Where(file =>
String.Compare(Path.GetExtension(file), ".xml",
StringComparison.CurrentCultureIgnoreCase) == 0).ToArray();
It's due to the 8.3 search method of Windows. If you try to search for "*.xm" you'll get 0 results.
You can use this in .NET 2.0:
string[] fileEntries =
Array.FindAll<string>(System.IO.Directory.GetFiles(pathName, "*.xml"),
new Predicate<string>(delegate(string s)
{
return System.IO.Path.GetExtension(s) == ".xml";
}));
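The same post-filter can be written once and reused for any extension. A minimal sketch, with invented names (`ExtensionFilter`, `HasExactExtension`, `GetFilesExact`) and assuming the comparison should ignore case:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class ExtensionFilter
{
    // True only when the extension is exactly the one given, e.g. ".xml" but not ".xml_".
    public static bool HasExactExtension(string path, string extension)
    {
        return string.Equals(Path.GetExtension(path), extension,
            StringComparison.OrdinalIgnoreCase);
    }

    // Uses the wildcard to narrow the listing, then drops the near misses.
    public static string[] GetFilesExact(string directory, string extension)
    {
        return Directory.GetFiles(directory, "*" + extension)
            .Where(p => HasExactExtension(p, extension))
            .ToArray();
    }
}
```

For example, `ExtensionFilter.GetFilesExact(pathName, ".xml")` would be a drop-in replacement for the GetFiles call in the question.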

Filtering file names: getting *.abc without *.abcd, or *.abcde, and so on

Directory.GetFiles(LocalFilePath, searchPattern);
MSDN Notes:
When using the asterisk wildcard character in a searchPattern, such as "*.txt", the matching behavior when the extension is exactly three characters long is different than when the extension is more or less than three characters long. A searchPattern with a file extension of exactly three characters returns files having an extension of three or more characters, where the first three characters match the file extension specified in the searchPattern. A searchPattern with a file extension of one, two, or more than three characters returns only files having extensions of exactly that length that match the file extension specified in the searchPattern. When using the question mark wildcard character, this method returns only files that match the specified file extension. For example, given two files, "file1.txt" and "file1.txtother", in a directory, a search pattern of "file?.txt" returns just the first file, while a search pattern of "file*.txt" returns both files.
The following list shows the behavior of different lengths for the searchPattern parameter:
*.abc returns files having an extension of .abc, .abcd, .abcde, .abcdef, and so on.
*.abcd returns only files having an extension of .abcd.
*.abcde returns only files having an extension of .abcde.
*.abcdef returns only files having an extension of .abcdef.
With the searchPattern parameter set to *.abc, how can I return files having an extension of .abc, not .abcd, .abcde and so on?
Maybe this function will work:
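private static bool StriktMatch(string fileExtension, string searchPattern)
{
bool isStriktMatch = false;
// LastIndexOf returns -1 when the pattern has no '.', which would make Substring throw
int dotIndex = searchPattern.LastIndexOf('.');
string extension = dotIndex >= 0 ? searchPattern.Substring(dotIndex) : String.Empty;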
if (String.IsNullOrEmpty(extension))
{
isStriktMatch = true;
}
else if (extension.IndexOfAny(new char[] { '*', '?' }) != -1)
{
isStriktMatch = true;
}
else if (String.Compare(fileExtension, extension, true) == 0)
{
isStriktMatch = true;
}
else
{
isStriktMatch = false;
}
return isStriktMatch;
}
Test Program:
class Program
{
static void Main(string[] args)
{
string[] fileNames = Directory.GetFiles("C:\\document", "*.abc");
ArrayList al = new ArrayList();
for (int i = 0; i < fileNames.Length; i++)
{
FileInfo file = new FileInfo(fileNames[i]);
if (StriktMatch(file.Extension, "*.abc"))
{
al.Add(fileNames[i]);
}
}
fileNames = (String[])al.ToArray(typeof(String));
foreach (string s in fileNames)
{
Console.WriteLine(s);
}
Console.Read();
}
}
Does anybody have a better solution?
The answer is that you must do post filtering. GetFiles alone cannot do it. Here's an example that will post process your results. With this you can use a search pattern with GetFiles or not - it will work either way.
List<string> fileNames = new List<string>();
// populate all filenames here with a Directory.GetFiles or whatever
string srcDir = "from"; // set this
string destDir = "to"; // set this too
// this filters the names in the list to just those that end with ".doc"
foreach (var f in fileNames.Where(name => name.ToLower().EndsWith(".doc")))
{
try
{
File.Copy(Path.Combine(srcDir, f), Path.Combine(destDir, f));
}
catch { ... }
}
Not a bug, perverse but well-documented behavior. *.doc matches *.docx based on 8.3 fallback lookup.
You will have to manually post-filter the results for ending in doc.
Use LINQ:
string strSomePath = "c:\\SomeFolder";
string strSomePattern = "*.abc";
string[] filez = Directory.GetFiles(strSomePath, strSomePattern);
var filtrd = from f in filez
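where f.EndsWith(strSomePattern.TrimStart('*'), StringComparison.OrdinalIgnoreCase) // compare against ".abc", not the wildcard "*.abc"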
select f;
foreach (string strSomeFileName in filtrd)
{
Console.WriteLine( strSomeFileName );
}
This won't help in the short term, but voting on the MS Connect post for this issue may get things changed in the future.
http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=95415
Since for "*.abc" GetFiles will return extensions of 3 or more, anything with a length of 3 after the "." is an exact match, and anything longer is not.
string[] fileList = Directory.GetFiles(path, "*.abc");
foreach (string file in fileList)
{
FileInfo fInfo = new FileInfo(file);
if (fInfo.Extension.Length == 4) // "." is counted in the length
{
// exact extension match - process the file...
}
}
Not sure of the performance of the above - while it uses simple length comparisons rather than string manipulations, new FileInfo() is called each time around the loop.
