Different integers? (C#)

I'm trying to read and extract specific information from several data files whose names all follow the same format: XXXXXX_XXXXXX_PCPDTB_ODT.datafile, where each X is a digit.
The first six digits are the year, month and day, and the last six are the hours, minutes and seconds, so 131005_091429_PCPDTB_ODT.datafile means 2013, 10th month, 5th day, and so on. The _PCPDTB_ODT.datafile part is always there.
I can already extract the data I want from a single file (everything after a certain keyword, in this case '#Footer'), but I'm not sure how to do this for lots of files whose names contain several changing digits.
Here is my attempt (it is terrible, since I have very little coding experience). I only seem to be able to input 4 digits and no more, which would only give access to files like XXXX_PCPDTB_ODT.datafile or 1304_PCPDTB_ODT.datafile.
    static void Main(string[] args)
    {
        var path = @"C:\Users\#####\Desktop\";
        var ext = "_PCPDTB_ODT.datafile";
        var range = Enumerable.Range(0, 10); // digits 0..9; Range(0, 9) would stop at 8
        var filePaths =
            from i1 in range
            from i2 in range
            from i3 in range
            from i4 in range
            let file = path + i1 + i2 + i3 + i4 + ext
            where File.Exists(file)
            select File.ReadLines(file)
                .SkipWhile(line => !line.Contains("#Footer"))
                .Skip(1);
        try
        {
            Console.WriteLine(String.Join(Environment.NewLine, filePaths.SelectMany(f => f)));
        }
        catch (Exception e)
        {
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
            Console.Read();
        }
    }
I've tried adding more i values (i5, i6, i7 and so on), with "_" separating the first six digits from the last six, but it doesn't seem to do anything once I go past i4.
Any ideas would help greatly. Please keep in mind that my code is most likely rubbish, since I know very little at the moment. Thanks.

Instead of trying to loop through every possible valid file name, you should just see what files are there. Here's an example using Directory.GetFiles:
    var filePaths = Directory.GetFiles(path, "*" + ext)
        .Select(file => File.ReadLines(file)
            .SkipWhile(line => !line.Contains("#Footer"))
            .Skip(1));
If you need the date/time also, you can parse that out, too, with DateTime.ParseExact.
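As a rough illustration of the DateTime.ParseExact suggestion, here is a minimal sketch. It assumes the name always starts with six date digits, an underscore, and six time digits, as described in the question. The class and method names (FileNameTimestamp, Parse) are made up for this example, and the two-digit year is resolved by the calendar's default rules, which place 13 in 2013.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileNameTimestamp
{
    // Hypothetical helper: reads the leading "yyMMdd_HHmmss" part of a name
    // such as 131005_091429_PCPDTB_ODT.datafile.
    public static DateTime Parse(string filePath)
    {
        string name = Path.GetFileName(filePath);
        // First 13 characters: six date digits, "_", six time digits.
        string stamp = name.Substring(0, 13);
        return DateTime.ParseExact(stamp, "yyMMdd_HHmmss", CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime when = Parse("131005_091429_PCPDTB_ODT.datafile");
        // Prints 2013-10-05 09:14:29
        Console.WriteLine(when.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
    }
}
```

ParseExact throws a FormatException for names that do not fit the format; DateTime.TryParseExact may be the better choice if other files can be in the folder.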

If you're trying to process all of the files in a directory, use Directory.EnumerateFiles.
For example:
    foreach (var filename in Directory.EnumerateFiles(@"c:\mydata\", "*_PCPDTB_ODT.datafile"))
    {
        var lines = File.ReadLines(filename)
            .SkipWhile(line => !line.Contains("#Footer"))
            .Skip(1);
        // add the rest of your code . . .
    }

"it doesn't seem to do anything when using the larger i values past i4"
Probably because it has to iterate through 1,000,000,000,000 different file names (12 digits with 10 possible values each) and call File.Exists on every one.
Why not just get a list of files that match the pattern?
    var path = @"C:\Users\#####\Desktop\";
    var pattern = "??????_??????_PCPDTB_ODT.datafile";
    var filePaths = Directory.GetFiles(path, pattern)
        .Select(file => File.ReadLines(file)
            .SkipWhile(line => !line.Contains("#Footer"))
            .Skip(1));
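One caveat with the wildcard pattern: ? accepts any character, not only digits. If stray files with letters in those positions are possible, the results can be filtered further with a regular expression. The following is a sketch only; DataFileFilter and Filter are hypothetical names, and the candidate list stands in for whatever Directory.GetFiles returns.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class DataFileFilter
{
    // Exactly six digits, "_", six digits, then the fixed suffix from the question.
    private static readonly Regex NamePattern =
        new Regex(@"^[0-9]{6}_[0-9]{6}_PCPDTB_ODT\.datafile$");

    // Hypothetical helper: keeps only the paths whose file name matches the pattern.
    public static List<string> Filter(IEnumerable<string> paths)
    {
        return paths.Where(p => NamePattern.IsMatch(Path.GetFileName(p))).ToList();
    }

    public static void Main()
    {
        var candidates = new[]
        {
            "131005_091429_PCPDTB_ODT.datafile",
            "abcdef_ghijkl_PCPDTB_ODT.datafile", // fits the ?????? wildcard, but has no digits
        };
        foreach (var p in Filter(candidates))
            Console.WriteLine(p);
    }
}
```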

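Try this:
    using System;
    public class Test
    {
        public static void Main()
        {
            string str = "131005_091429_PCPDTB_ODT.datafile";
            int[] date = new int[3]; // year, month, day (two digits each)
            int[] time = new int[3]; // hours, minutes, seconds
            string[] arr = str.Split('_');
            // arr[0] is "131005": take two characters at a time.
            for (int i = 0; i < 6; i = i + 2)
            {
                date[i / 2] = Convert.ToInt32(arr[0].Substring(i, 2));
            }
            // arr[1] is "091429": same approach for the time.
            for (int i = 0; i < 6; i = i + 2)
            {
                time[i / 2] = Convert.ToInt32(arr[1].Substring(i, 2));
            }
        }
    }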

Related

Enumerate Files

I want to get the files whose names consist of 14 digits.
    foreach (var file_path in Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf"))
    {
    }
I need only the files whose names have exactly 14 digits, for example:
16032021133026
17032021120457
17032021120534
I would go with a regex in which you specify the pattern. You said you want 14 digits, so it will ignore names like
a1603202113302
because that contains a letter. The pattern is therefore
^[0-9]{14}$
and the full code is:
    Regex rx = new Regex("^[0-9]{14}$");
    var matching = Directory
        .EnumerateFiles(@"F:\apinvoice", "*.pdf")
        .Where(x => rx.IsMatch(Path.GetFileNameWithoutExtension(x)));
Assign it to a list and filter on the length of the file name:
    List<string> list = Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf").ToList();
    // Compare the length of the file name, not of the whole path.
    List<string> whatyouwant = list.Where(l => Path.GetFileNameWithoutExtension(l).Length == 14).ToList();
Since these seem to be timestamps, another thing you could do is this:
    foreach (var file_path in Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf"))
    {
        // Parse the file name only; the full path would never match the format.
        var name = Path.GetFileNameWithoutExtension(file_path);
        DateTime dateTimeParsed;
        var dateTimeParsedSuccessfully = DateTime.TryParseExact(name, "ddMMyyyyHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dateTimeParsed);
        if (dateTimeParsedSuccessfully)
        {
            // Got a valid file, add it to a list or something.
        }
    }
Also see:
https://learn.microsoft.com/en-us/dotnet/api/system.datetime.tryparseexact?view=net-5.0
https://learn.microsoft.com/en-us/dotnet/api/system.datetime.parseexact?view=net-5.0
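Of course, the timestamp will often be at the end of a file name, so if there are other characters in front of it, you may want to pass only the last 14 characters of the name (without its extension) to TryParseExact(), e.g. name.Substring(name.Length - 14), where name is Path.GetFileNameWithoutExtension(file_path).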

In C# matching all files in a directory using regex

I am currently trying to use the regular expression below in C#
    Regex reg = new Regex(@"-(FILENM01P\\.(\\d){3}\\.PGP)$");
    var files = Directory.GetFiles(savePath, "*.PGP")
        .Where(path => reg.IsMatch(path))
        .ToList();
    foreach (string file in files)
    {
        MessageBox.Show(file);
    }
to match all files in a single directory that follow this naming convention:
FILENM01P.001.PGP
If I just load all the files like this
    var files = Directory.GetFiles(savePath, "*.PGP");
    foreach (string file in files)
    {
        MessageBox.Show(file);
    }
then I get strings like this:
C:\Users\User\PGP Files\FILENM01P.001.PGP
There could be many of these files for example
FILENM01P.001.PGP
FILENM01P.002.PGP
FILENM01P.003.PGP
FILENM01P.004.PGP
But there will never be
FILENM01P.000.PGP
FILENM01P.1000.PGP
To clarify, only the three digits change, and they can only range from 001 to 999 (with leading zeros). The rest of the text is static and will never change.
I'm a complete novice when it comes to regex, so any help would be greatly appreciated.
Essentially, my end goal is to find the next number and create that file. If there are no files, it should create one starting at 001, and if it gets past 999 it should return 1000 so that I know I need to move to a new directory, as each directory is limited to 999 sequential files. (I'll deal with that part myself, though.)
Try this code.
    var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
    var matches = files.Select(f => reg.Match(f))
        .Where(f => f.Success)
        .Select(x => Convert.ToInt32(x.Value.Split('.')[1]))
        .ToList();
    var nextNumber = (matches.Max() + 1).ToString("D3"); // 3 digits with leading zeros
You might also need a check for whether the next number is 1000 and, if so, return 0:
    (matches.Max() + 1 > 999 ? 0 : matches.Max() + 1).ToString("D3")
My test case:
    List<string> files = new List<string>();
    files.Add(@"C:\Users\User\PGP Files\FILENM01P.001.PGP");
    files.Add(@"C:\Users\User\PGP Files\FILENM01P.002.PGP");
    files.Add(@"C:\Users\User\PGP Files\FILENM01P.003.PGP");
    files.Add(@"C:\Users\User\PGP Files\FILENM01P.004.PGP");
The output is
    nextNumber = "005";
Alternatively:
    Regex regex = new Regex(@"FILENM01P\.(\d+)\.", RegexOptions.IgnoreCase);
    var fnumbers = Directory.GetFiles(src, "*.PGP", SearchOption.TopDirectoryOnly)
        .Select(f => regex.Match(f))
        .Where(m => m.Success)
        .Select(m => int.Parse(m.Groups[1].Value));
    int fileNum = 1 + (fnumbers.Any() ? fnumbers.Max() : 0);
You can do something like this:
    var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
    var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).ToList();
    var nextNumber = matches.Any()
        ? matches.Max(f => int.Parse(f.Groups[1].Value)) + 1
        : 1;
Where files is a list of the files to match.
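Putting the question's requirements together (start at 001 when the directory is empty, and get 1000 back once 999 is taken), a sketch could look like the following. PgpSequence and NextNumber are hypothetical names, and the paths are hard-coded where real code would pass the result of Directory.GetFiles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PgpSequence
{
    // Captures the three digits of FILENM01P.NNN.PGP at the end of a path.
    private static readonly Regex NamePattern =
        new Regex(@"FILENM01P\.(\d{3})\.PGP$", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns 1 when nothing matches, otherwise highest + 1.
    // A result of 1000 signals that the directory is full.
    public static int NextNumber(IEnumerable<string> paths)
    {
        var numbers = paths
            .Select(p => NamePattern.Match(p))
            .Where(m => m.Success)
            .Select(m => int.Parse(m.Groups[1].Value))
            .ToList();
        return numbers.Count == 0 ? 1 : numbers.Max() + 1;
    }

    public static void Main()
    {
        var files = new List<string>
        {
            @"C:\Users\User\PGP Files\FILENM01P.001.PGP",
            @"C:\Users\User\PGP Files\FILENM01P.004.PGP",
        };
        // Prints 005
        Console.WriteLine(NextNumber(files).ToString("D3"));
    }
}
```

Returning the number as an int and formatting it with "D3" only when building the file name keeps the 1000 case easy to test for.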

How do i get the last file from directory by the file name and then create the next file?

This is the loop I'm using today:
    for (int i = 0; i < dateTime.Count; i++)
    {
        string result = dateTime[i].ToString("yyyyMMddHHmm");
        link = "http://www.sat24.com/image2.ashx?region=" + selectedregion + "&time=" + result + "&ir=" +
            infraredorvisual;
        string filePath = Path.Combine(satimagesdir, "SatImage" + (i + last) + ".GIF");
        try
        {
            client1.DownloadFile(link, filePath);
        }
        catch (Exception e)
        {
            DannyGeneral.Logger.Write(e.ToString());
        }
    }
As it stands, I'm not finding the last file, and the variable last is never set, so it is always 0. The loop therefore always creates files 0 to 8 and overwrites the existing ones.
I need to do two things:
Find the last file by the number in its name.
Number the next 9 new files starting after the last file.
So if I know, for example, that the last existing file is SatImage845.gif, then the next file should be SatImage846.gif, and the loop should create SatImage846.gif, SatImage847.gif, and so on up to SatImage854.gif.
That should be the rule: each time, the loop creates the next 9 files based on the last file name.
Try this before your for loop, so that last is the highest number among the files in that directory.
    var regex = new Regex(@"SatImage([0-9]+)\.gif", RegexOptions.IgnoreCase);
    var last =
        Directory.GetFiles(satimagesdir)
            .Select(Path.GetFileName)
            .Select(x => regex.Match(x))
            .Where(x => x.Success)
            .Select(x => int.Parse(x.Groups[1].Value))
            .OrderByDescending(x => x)
            .First(); // throws if no file matches

reading in text file and spliting by comma in c#

I have a text file whose format is like this
Number,Name,Age
I want to read "Number", the first column of this text file, into an array to find duplicates. Here are the two ways I tried to read the file.
    string[] account = File.ReadAllLines(path);
    string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what is to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
    var lines = File.ReadAllLines(path);
    var firstLine = lines[0];
    var fields = firstLine.Split(',');
    var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
If you want all duplicate lines according to the Number:
    var numDuplicates = File.ReadLines(path)
        .Select(l => l.Trim().Split(','))
        .Where(arr => arr.Length >= 3)
        .Select(arr => new {
            Number = arr[0].Trim(),
            Name = arr[1].Trim(),
            Age = arr[2].Trim()
        })
        .GroupBy(x => x.Number)
        .Where(g => g.Count() > 1);
    foreach (var dupNumGroup in numDuplicates)
        Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
            , dupNumGroup.Key
            , string.Join(",", dupNumGroup.Select(x => x.Name))
            , string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.Split solution, here is a really simple way of doing what you want:
    List<int> importedNumbers = new List<int>();
    // Read our file into an array of strings
    var fileContents = System.IO.File.ReadAllLines(path);
    // Iterate over the strings and split them into their respective columns
    foreach (string line in fileContents)
    {
        var fields = line.Split(',');
        if (fields.Length < 3)
            throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
        // You would probably want to be more careful about your int parsing... (use TryParse)
        var number = int.Parse(fields[0]);
        var name = fields[1];
        var age = int.Parse(fields[2]);
        // If we already imported this number, continue on to the next record
        if (importedNumbers.Contains(number))
            continue; // You might also update the existing record at this point instead of just skipping...
        importedNumbers.Add(number); // Keep track of numbers we have imported
    }
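If only the duplicated values of the first column are needed, a HashSet<string> is another option, since its Add method reports whether the value was already present. This is a minimal sketch under that assumption. DuplicateNumbers and Find are hypothetical names, and the sample lines are inlined where real code would use File.ReadLines(path).

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateNumbers
{
    // Hypothetical helper: returns each first-column value that occurs more
    // than once, in the order the first repeat is encountered.
    public static List<string> Find(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines
            string number = line.Split(',')[0].Trim();
            // HashSet.Add returns false when the value is already in the set.
            if (!seen.Add(number) && !duplicates.Contains(number))
                duplicates.Add(number);
        }
        return duplicates;
    }

    public static void Main()
    {
        var lines = new[] { "1,Ann,30", "2,Bob,41", "1,Cid,25", "3,Eve,52" };
        foreach (var n in Find(lines))
            Console.WriteLine(n);
    }
}
```

Comparing the values as strings avoids parsing errors, at the cost of treating "1" and "01" as different numbers.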

Reading file names, respectively?

I have 1000 files in a folder and I want to list their names, but when I do, the file names are not sorted.
For example, these are my file names:
1-CustomerApp.txt
2-CustomerApp.txt
3-CustomerApp.txt
...
    var adddress = @"data\";
    var str = "";
    DirectoryInfo d = new DirectoryInfo(adddress); // assuming this is your folder
    FileInfo[] Files = d.GetFiles("*.txt"); // get the text files
    foreach (FileInfo file in Files)
    {
        str = str + ", " + file.Name;
    }
You could use this to keep a "numerical" order:
    FileInfo[] files = d.GetFiles("*.txt")
        .OrderBy(m => m.Name.PadLeft(200, '0')).ToArray();
200 is quite arbitrary, of course.
This prepends as many 0 characters as needed to bring each file name to a length of 200.
To make it a little less arbitrary and "brittle", you could do
    var f = d.GetFiles("*.txt");
    var maxLength = f.Select(l => l.Name.Length).Max() + 1;
    FileInfo[] files = f.OrderBy(m => m.Name.PadLeft(maxLength, '0')).ToArray();
How does it work (bad explanation, someone could do better), and why is it far from perfect?
String ordering in C# puts numeric characters before alphabetic characters, so 0 (or 9) comes before a.
But 10a comes before 2a, because 1 comes before 2. Comparing character by character:
1 / 2 => 10a first
But if we use PadLeft to change 10a and 2a into 010a and 002a, then, character by character:
0 / 0 => equal
1 / 0 => 002a first
This "works" in your case, but it really depends on your file naming logic.
CAUTION : This solution won't work with other file naming logic
For example, the numeric part is not at the start.
f-2-a and f-10-a
Because
00-f-2-a would still be before 0-f-10-a
or the "non-numeric part" is not of the same length.
1-abc and 2-z
Because
01-abc will come after 0002-z
They are sorted alphabetically. I don't know how you can tell that they are not sorted (what you are comparing against, or where you have seen a different result). If you are using a file manager, perhaps it applies its own sorting. In Windows Explorer the result will be the same (try sorting by the Name column).
If you know the template used to create the file names, you can use a trick to apply your own sorting. In your example, extract the number at the start (up to the minus sign), pad it with zeroes (up to the maximum possible number of digits), and then sort the results.
Or you can pad with zeroes when files are created:
0001-CustomerApp.txt
0002-CustomerApp.txt
...
9999-CustomerApp.txt
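The "extract the number at the start" trick described above can be sketched as follows. Instead of padding with zeroes it parses the leading number and sorts on that, which gives the same order. It assumes every name begins with digits followed by a minus sign, as in the question; LeadingNumberSort, LeadingNumber and Sort are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeadingNumberSort
{
    // Hypothetical helper: parses the digits before the first '-'.
    // Assumes the name has the form "<number>-<rest>"; other names will throw.
    public static int LeadingNumber(string fileName)
    {
        int dash = fileName.IndexOf('-');
        return int.Parse(fileName.Substring(0, dash));
    }

    // Orders names by their leading number rather than character by character.
    public static List<string> Sort(IEnumerable<string> names)
    {
        return names.OrderBy(n => LeadingNumber(n)).ToList();
    }

    public static void Main()
    {
        var names = new[] { "10-CustomerApp.txt", "2-CustomerApp.txt", "1-CustomerApp.txt" };
        foreach (var n in Sort(names))
            Console.WriteLine(n);
    }
}
```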
This should give you more or less what you want; it skips any characters until it hits a number, then includes all directly following numerical characters to get the number. That number is then used as a key for sorting. Somewhat messy perhaps, but should work (and be less brittle than some other options, since this searches explicitly for the first "whole" number within the filename):
    var filename1 = "10-file.txt";
    var filename2 = "2-file.txt";
    var filenames = new[] { filename1, filename2 };
    var sorted =
        filenames.Select(fn => new {
            nr = int.Parse(new string(
                fn.SkipWhile(c => !Char.IsNumber(c))
                  .TakeWhile(c => Char.IsNumber(c))
                  .ToArray())),
            name = fn
        })
        .OrderBy(file => file.nr)
        .Select(file => file.name);
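Try alphanumeric sorting and order the file names with it.
    FileInfo[] files = d.GetFiles("*.txt");
    // AlphanumComparatorFast is not part of .NET; the class has to be added to your project.
    var sorted = files.OrderBy(file => file.Name, new AlphanumComparatorFast()).ToArray();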
