In C# matching all files in a directory using regex - c#

I am currently trying to use the below regular expression in C#
Regex reg = new Regex(@"-(FILENM01P\\.(\\d){3}\\.PGP)$");
var files = Directory.GetFiles(savePath, "*.PGP")
.Where(path => reg.IsMatch(path))
.ToList();
foreach (string file in files)
{
MessageBox.Show(file);
}
The goal is to match all files in a single directory that follow this naming convention:
FILENM01P.001.PGP
If I just load up all files like this:
var files = Directory.GetFiles(savePath, "*.PGP");
foreach (string file in files)
{
MessageBox.Show(file);
}
Then I get strings like this:
C:\Users\User\PGP Files\FILENM01P.001.PGP
There could be many of these files for example
FILENM01P.001.PGP
FILENM01P.002.PGP
FILENM01P.003.PGP
FILENM01P.004.PGP
But there will never be
FILENM01P.000.PGP
FILENM01P.1000.PGP
To clarify, only the three digits change, and they range from 001 to 999 (with leading zeros); the rest of the name is static and will never change.
I'm a complete novice when it comes to regex, so any help would be greatly appreciated.
Essentially, my end goal is to find the next number and create that file. If there are no files, it should create one starting at 001; if it gets to 999, it should return 1000 so that I know I need to move to a new directory, as each directory is limited to 999 sequential files. (I'll deal with that part myself, though.)

Try this code.
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).Select(x=> Convert.ToInt32(x.Value.Split('.')[1])).ToList();
var nextNumber = (matches.Max() + 1).ToString("D3"); // 3 digit with leading zeros
You might also need an if check to see whether the next number is 1000 and, if so, return 0:
(matches.Max() + 1 > 999? 0:matches.Max() + 1).ToString("D3")
My test case.
List<string> files = new List<string>();
files.Add(@"C:\Users\User\PGP Files\FILENM01P.001.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.002.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.003.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.004.PGP");
The output is
nextNumber = "005";

Regex regex = new Regex(@"FILENM01P\.(\d+)\.", RegexOptions.IgnoreCase);
var fnumbers = Directory.GetFiles(src, "*.PGP", SearchOption.TopDirectoryOnly)
.Select(f=>regex.Match(f))
.Where(m=>m.Success)
.Select(m=>int.Parse(m.Groups[1].Value));
int fileNum = 1 + (fnumbers.Any() ? fnumbers.Max() : 0);

You can do something like this:
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).ToList();
var nextNumber = matches.Any()
? matches.Max(f => int.Parse(f.Groups[1].Value)) + 1
: 1;
Where files is a list of the files to match.
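Putting the pieces from the answers above together, here is a minimal sketch of the "next number" logic, written as a C# 9 top-level program. NextSequenceNumber and the sample paths are hypothetical names for illustration; the sketch assumes the caller treats a result of 1000 as "directory full", as described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns 1 when no file matches, otherwise highest number + 1
// (so 1000 once FILENM01P.999.PGP exists).
static int NextSequenceNumber(IEnumerable<string> paths)
{
    // The name must start the string or follow a path separator, and must end the string.
    var pattern = new Regex(@"(?:^|[\\/])FILENM01P\.(\d{3})\.PGP$", RegexOptions.IgnoreCase);

    var numbers = paths
        .Select(p => pattern.Match(p))
        .Where(m => m.Success)
        .Select(m => int.Parse(m.Groups[1].Value))
        .ToList();

    return numbers.Count == 0 ? 1 : numbers.Max() + 1;
}

// Sample data standing in for Directory.GetFiles(savePath, "*.PGP").
var sample = new[]
{
    @"C:\Users\User\PGP Files\FILENM01P.001.PGP",
    @"C:\Users\User\PGP Files\FILENM01P.004.PGP",
};

int next = NextSequenceNumber(sample);
Console.WriteLine(next > 999 ? "directory full" : next.ToString("D3")); // prints 005
```

Checking for an empty list before calling Max() matters, because Max() throws on an empty sequence of int.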

Related

reading in text file and spliting by comma in c#

I have a text file whose format is like this
Number,Name,Age
I want to read "Number", the first column of this text file, into an array to find duplicates. Here are the two ways I tried to read in the file:
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
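As a rough sketch of the uniqueness test without any numeric parsing, the first field can be kept as a string and fed into a HashSet. FindDuplicateNumbers and the sample lines are hypothetical; in real use the lines would come from File.ReadLines(path).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns each first-column value that occurs more than once.
static List<string> FindDuplicateNumbers(IEnumerable<string> lines)
{
    var seen = new HashSet<string>();
    var duplicates = new List<string>();
    foreach (var line in lines)
    {
        if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
        var number = line.Split(',')[0].Trim();
        // HashSet.Add returns false when the value is already present.
        if (!seen.Add(number) && !duplicates.Contains(number))
            duplicates.Add(number);
    }
    return duplicates;
}

// Sample data standing in for File.ReadLines(path).
var sampleLines = new[] { "1,Ann,30", "2,Bob,41", "1,Cid,25" };
Console.WriteLine(string.Join(",", FindDuplicateNumbers(sampleLines))); // prints 1
```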
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.split solution, here is a really simple method of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file in to an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them in to their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Length < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}

Reading file names, respectively?

I have 1000 files in a folder and I want to read their names, but when I do,
the file names are not sorted.
For example, these are my file names:
1-CustomerApp.txt
2-CustomerApp.txt
3-CustomerApp.txt
...
var adddress = @"data\";
str = "";
DirectoryInfo d = new DirectoryInfo(adddress);//Assuming Test is your Folder
FileInfo[] Files = d.GetFiles("*.txt"); //Getting Text files
foreach (FileInfo file in Files)
{
str = str + ", " + file.Name;
}
You could use this to keep a "numerical" order
FileInfo[] files = d.GetFiles("*.txt")
.OrderBy(m => m.Name.PadLeft(200, '0')).ToArray();
200 is quite arbitrary, of course.
PadLeft prepends as many '0' characters as needed to bring each file name up to a length of 200.
To make it a little less arbitrary and brittle, you could do:
var f = d.GetFiles("*.txt");
var maxLength = f.Select(l => l.Name.Length).Max() + 1;
FileInfo[] files = f.OrderBy(m => m.Name.PadLeft(maxLength, '0')).ToArray();
How it works (rough explanation, someone could do better) and why it is far from perfect:
String ordering in C# puts digit characters before letters.
So 0 (or 9) will be before a.
But 10a will be before 2a, as 1 comes before 2.
Comparing character by character:
1 / 2 => 10a first
But if we change
10a and 2a, with PadLeft, to
010a and 002a, we see, character after character
0 / 0 => equal
1 / 0 => 002a first
This "works" in your case, but really depends on your "file naming" logic.
CAUTION: this approach won't work with other file naming schemes.
For example, when the numeric part is not at the start:
f-2-a and f-10-a
Because
00-f-2-a would still be before 0-f-10-a
or the "non-numeric part" is not of the same length.
1-abc and 2-z
Because
01-abc will come after 0002-z
They are sorted alphabetically. I don't know how you concluded that they are not sorted (what you compared against, or where you saw a different result); if you are using a file manager, it may apply its own sorting. In Windows Explorer the result will be the same (try sorting by the Name column).
If you know the template used to create the file names, you can apply your own sorting with a trick. In your example, extract the number at the start (up to the hyphen), pad it with zeros (to the maximum possible number of digits), and then sort on the result.
Or you can pad with zeros when the files are created:
0001-CustomerApp.txt
0002-CustomerApp.txt
...
9999-CustomerApp.txt
This should give you more or less what you want; it skips any characters until it hits a number, then includes all directly following numerical characters to get the number. That number is then used as a key for sorting. Somewhat messy perhaps, but should work (and be less brittle than some other options, since this searches explicitly for the first "whole" number within the filename):
var filename1 = "10-file.txt";
var filename2 = "2-file.txt";
var filenames = new []{filename1, filename2};
var sorted =
filenames.Select(fn => new {
nr = int.Parse(new string(
fn.SkipWhile(c => !Char.IsNumber(c))
.TakeWhile(c => Char.IsNumber(c))
.ToArray())),
name = fn
})
.OrderBy(file => file.nr)
.Select(file => file.name);
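Try alphanumeric sorting and order the file names with it. AlphanumComparatorFast is not part of the .NET Framework, so you have to add that class to your project yourself.
FileInfo[] files = d.GetFiles("*.txt");
// OrderBy does not sort in place, so keep the result.
var sorted = files.OrderBy(file => file.Name, new AlphanumComparatorFast()).ToArray();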
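If no ready-made alphanumeric comparer is at hand, a minimal natural-order comparison could be sketched as below. CompareNatural is a hypothetical helper, not a framework API, and the sketch assumes plain ASCII digits in the file names. It is written as a C# 9 top-level program and wrapped with Comparer<string>.Create so that OrderBy can use it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical natural-order comparison: digit runs compare by numeric value,
// everything else compares character by character (ordinal).
static int CompareNatural(string a, string b)
{
    int i = 0, j = 0;
    while (i < a.Length && j < b.Length)
    {
        if (char.IsDigit(a[i]) && char.IsDigit(b[j]))
        {
            // Find the end of each digit run.
            int si = i, sj = j;
            while (i < a.Length && char.IsDigit(a[i])) i++;
            while (j < b.Length && char.IsDigit(b[j])) j++;
            // Drop leading zeros; after that, a longer run is a larger number.
            string x = a.Substring(si, i - si).TrimStart('0');
            string y = b.Substring(sj, j - sj).TrimStart('0');
            if (x.Length != y.Length) return x.Length.CompareTo(y.Length);
            int c = string.CompareOrdinal(x, y);
            if (c != 0) return c;
        }
        else
        {
            if (a[i] != b[j]) return a[i].CompareTo(b[j]);
            i++;
            j++;
        }
    }
    // Whichever string still has characters left sorts last.
    return (a.Length - i).CompareTo(b.Length - j);
}

// Sample names standing in for d.GetFiles("*.txt").Select(f => f.Name).
var names = new[] { "10-CustomerApp.txt", "2-CustomerApp.txt", "1-CustomerApp.txt" };
var sortedNames = names.OrderBy(n => n, Comparer<string>.Create(CompareNatural)).ToArray();
Console.WriteLine(string.Join(", ", sortedNames));
// prints 1-CustomerApp.txt, 2-CustomerApp.txt, 10-CustomerApp.txt
```

Comparing digit runs by length after trimming zeros avoids int.Parse, so very long numbers cannot overflow.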

Different integers? c#

I'm trying to read and extract specific information from several data files in which the filename format remains the same for every one, the format of my file is, XXXXXX_XXXXXX_PCPDTB_ODT.datafile where X is a random digit.
It represents year, month, day in the first 6 digits and hours, minutes and seconds in the last 6 X's, so 131005_091429_PCPDTB_ODT.datafile would be 2013, 10th month, 5th day and so on, the _PCPDTB_ODT.datafile is always there.
I'm able to gather my desired data (extracting all information after a certain keyword, in this case '#Footer' is my keyword) from a file successfully, but I'm not sure how I'd go about this with lots of files with several changing integers?
Here is my attempt (although it is terrible, since I have very little coding experience), but I only seem to be able to handle 4 digits and no more, which would only allow access to files like XXXX_PCPDTB_ODT.datafile or 1304_PCPDTB_ODT.datafile.
static void Main(string[] args)
{
var path = @"C:\Users\#####\Desktop\";
var ext = "_PCPDTB_ODT.datafile";
var range = Enumerable.Range(0,9);
var filePaths =
from i1 in range
from i2 in range
from i3 in range
from i4 in range
let file = path + i1 + i2 + i3 + i4 + ext
where File.Exists(file)
select File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1);
try
{
Console.WriteLine(String.Join(Environment.NewLine,filePaths.SelectMany(f => f)));
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
Console.Read();
}
}
I've attempted adding more i values (i5, i6, i7, etc.) with "_" to separate the first six digits from the last six, but it doesn't seem to do anything once I go past i4.
Any ideas would help greatly, please keep in mind my coding is most likely rubbish since my knowledge is very little at the moment, thanks.
Instead of trying to loop through every possible valid file name, you should just see what files are there. Here's an example using Directory.GetFiles:
var filePaths = Directory.GetFiles(path, "*" + ext)
.Select(file => File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1));
If you need the date/time also, you can parse that out, too, with DateTime.ParseExact.
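A minimal sketch of that parsing step, assuming the file name always starts with the 13-character stamp described in the question (six date digits, an underscore, six time digits). ParseStamp is a hypothetical helper name; the format string "yyMMdd_HHmmss" treats the underscore as a literal character.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses the leading "yyMMdd_HHmmss" stamp of a file name.
static DateTime ParseStamp(string fileName)
{
    // The first 13 characters hold the date, an underscore, and the time.
    string stamp = fileName.Substring(0, 13);
    return DateTime.ParseExact(stamp, "yyMMdd_HHmmss", CultureInfo.InvariantCulture);
}

var when = ParseStamp("131005_091429_PCPDTB_ODT.datafile");
Console.WriteLine(when.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
// prints 2013-10-05 09:14:29
```

For a full path, pass Path.GetFileName(file) rather than the path itself. ParseExact throws a FormatException on names that do not fit; DateTime.TryParseExact is the non-throwing alternative.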
If you're trying to process all of the files in a directory, use Directory.EnumerateFiles.
For example:
foreach (var filename in Directory.EnumerateFiles(@"c:\mydata\", "*_PCPDTB_ODT.datafile"))
{
var lines = File.ReadLines(filename)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1);
// add the rest of your code . . .
}
it doesn't seem to do anything when using the larger i values past i4
Probably because it's having to iterate through 1,000,000,000,000 different file names?
Why not just get a list of files that match the pattern?
var path = @"C:\Users\#####\Desktop\";
var pattern = "??????_??????_PCPDTB_ODT.datafile";
var filePaths = Directory.GetFiles(path, pattern)
.Select(file => File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1));
Try this
using System;
public class Test
{
public static void Main()
{
string str = "131005_091429_PCPDTB_ODT.datafile ";
int[] date = new int[3];
int[] time = new int[3];
string[] arr = str.Split('_');
for(int i = 0;i<6;i=i+2)
{
date[i/2]=Convert.ToInt32(arr[0].Substring(i,2));
}
for(int i = 0;i<6;i=i+2)
{
time[i/2]=Convert.ToInt32(arr[1].Substring(i,2));
}
}
}

C# Directory.GetFiles with mask

In C#, I would like to get all files from a specific directory that matches the following mask:
prefix is "myfile_"
suffix is a number
file extension is xml
e.g.
myfile_4.xml
myfile_24.xml
The following files should not match the mask:
_myfile_6.xml
myfile_6.xml_
The code should look something like this (maybe a LINQ query can help):
string[] files = Directory.GetFiles(folder, "???");
Thanks
I am not good with regular expressions, but this might help -
var myFiles = from file in System.IO.Directory.GetFiles(folder, "myfile_*.xml")
where Regex.IsMatch(System.IO.Path.GetFileName(file), @"^myfile_[0-9]+\.xml$", RegexOptions.IgnoreCase) // match the file name only; anchored, with the dot escaped
select file;
You can try it like:
string[] files = Directory.GetFiles("C:\\test", "myfile_*.xml");
//This will give you all the files with `xml` extension and starting with `myfile_`
//but this will also give you files like `myfile_ABC.xml`
//to filter them out
int temp;
List<string> selectedFiles = new List<string>();
foreach (string str in files)
{
string fileName = Path.GetFileNameWithoutExtension(str);
string[] tempArray = fileName.Split('_');
if (tempArray.Length == 2 && int.TryParse(tempArray[1], out temp))
{
selectedFiles.Add(str);
}
}
So if your Test folder has files:
myfile_24.xml
MyFile_6.xml
MyFile_6.xml_
myfile_ABC.xml
_MyFile_6.xml
Then you will get in selectedFiles
myfile_24.xml
MyFile_6.xml
You can do something like:
Regex reg = new Regex(@"^myfile_\d+\.xml$");
IEnumerable<string> files = Directory.GetFiles("C:\\").Where(path => reg.IsMatch(Path.GetFileName(path)));
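To illustrate why the anchors and the escaped dot matter for this mask, here is a small sketch that runs a pattern against the file names from the question. The candidates array is sample data standing in for the names returned by Directory.GetFiles; the pattern is applied to file names only, not to full paths.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// ^ and $ reject extra characters on either side; \. matches only a literal dot.
var mask = new Regex(@"^myfile_\d+\.xml$", RegexOptions.IgnoreCase);

// Sample names standing in for Directory.GetFiles(folder).Select(Path.GetFileName).
var candidates = new[]
{
    "myfile_4.xml", "myfile_24.xml", "_myfile_6.xml", "myfile_6.xml_", "myfile_ABC.xml"
};

var matching = candidates.Where(name => mask.IsMatch(name)).ToArray();
Console.WriteLine(string.Join(", ", matching)); // prints myfile_4.xml, myfile_24.xml
```

Without the anchors, "_myfile_6.xml" and "myfile_6.xml_" would both pass, because the pattern would be allowed to match in the middle of the name.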

Best way to get only certain groupings out of a string

I am getting a list of file names using the following code:
//Set up Datatable
dtUpgradeFileInfo.Columns.Add("BaseFW");
dtUpgradeFileInfo.Columns.Add("ActiveFW");
dtUpgradeFileInfo.Columns.Add("UpgradeFW");
dtUpgradeFileInfo.Columns.Add("FileName");
//Gets Upgrade information and upgrade Files from Upgrade Folder
DirectoryInfo di = new DirectoryInfo(g_strAppPath + "\\Update Files");
FileInfo[] rgFiles = di.GetFiles("*.txt");
foreach (FileInfo fi in rgFiles)
{
test1 = fi.Name.ToString();
}
All file names will be in the form BXXXX_AXXXX_UXXXX, where each X represents a digit 0-9, and I need those three groups of digits so I can put each into its respective column in the DataTable. I was initially planning to pick out the characters of each group and join them back together, but I'm wondering whether there is a better or quicker way than converting to a char array. Any suggestions?
Here is a relatively simple way to get the numbers out of test1 (without LINQ):
...
string test1 = Path.GetFileNameWithoutExtension(fi.Name); // drop ".txt" so the last group parses as a number
int baseFW=0;
int activeFW=0;
int upgradeFW=0;
// Break the file name into the three groups
string[] groups=test1.Split('_');
if (groups.Length==3)
{
// Create a numbers array to hold the numbers
int[] nums=new int[groups.Length];
// Parse the numbers out of the strings
int idx=0;
foreach (string s in groups)
nums[idx++]=int.Parse(s.Remove(0,1)); // Convert to num
baseFW=nums[0];
activeFW=nums[1];
upgradeFW=nums[2];
}
else
{
// Error handling...
}
If you want to do this using LINQ, it's even easier:
...
string test1 = Path.GetFileNameWithoutExtension(fi.Name); // drop ".txt" so the last group parses as a number
int baseFW=0;
int activeFW=0;
int upgradeFW=0;
// Extract all numbers
int[] nums=test1.Split('_') // Split on underscores
.Select(s => int.Parse(s.Remove(0,1))) // Convert to ints
.ToArray(); // For random access, below
if (nums.Length==3)
{
baseFW=nums[0];
activeFW=nums[1];
upgradeFW=nums[2];
}
else
{
// Error handling...
}
Using regular expressions allows you to easily parse out the values that you need, and has the added benefit of allowing you to skip over files that end up in the directory that don't match the expected filename format.
Your code would look something like this:
//Gets Upgrade information and upgrade Files from Upgrade Folder
string strRegex = @"^B(?<Base>[0-9]{4})_A(?<Active>[0-9]{4})_U(?<Upgrade>[0-9]{4})\.txt$";
RegexOptions myRegexOptions = RegexOptions.ExplicitCapture | RegexOptions.Compiled;
Regex myRegex = new Regex(strRegex, myRegexOptions);
DirectoryInfo di = new DirectoryInfo(g_strAppPath + "\\Update Files");
FileInfo[] rgFiles = di.GetFiles("*.txt");
foreach (FileInfo fi in rgFiles)
{
string name = fi.Name.ToString();
Match matched = myRegex.Match(name);
if (matched.Success)
{
//do the inserts into the data table here
string baseFw = matched.Groups["Base"].Value;
string activeFw = matched.Groups["Active"].Value;
string upgradeFw = matched.Groups["Upgrade"].Value;
}
}
