I want to get the files whose file names contain 14 digits.
foreach (var file_path in Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf"))
{
}
I need to get only the files whose names consist of exactly 14 digits, like these:
16032021133026
17032021120457
17032021120534
I would go with a regex where you specify the pattern.
You said you want 14 digits, which means it will ignore names like
a1603202113302
because that contains a letter.
Therefore the pattern is
^[0-9]{14}$
and the full code:
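// Requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
Regex rx = new Regex("^[0-9]{14}$");
var files = Directory
.EnumerateFiles(@"F:\apinvoice", "*.pdf")
.Where(x => rx.IsMatch(Path.GetFileNameWithoutExtension(x)))
.ToList();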
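A minimal sketch that wraps the same regex in a reusable predicate, so the filter can be checked against plain file names without touching the disk. The class and method names (FileNameFilters, HasFourteenDigitName) are hypothetical.

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class FileNameFilters
{
    // Exactly 14 ASCII digits and nothing else (the extension is excluded before matching).
    private static readonly Regex FourteenDigits = new Regex("^[0-9]{14}$");

    // Hypothetical helper: true when the file name without its extension is 14 digits.
    public static bool HasFourteenDigitName(string path)
    {
        return FourteenDigits.IsMatch(Path.GetFileNameWithoutExtension(path));
    }
}
```

It can be plugged straight into the query as a method group, e.g. Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf").Where(FileNameFilters.HasFourteenDigitName).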
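Assign it to a list:
List<string> list = Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf").ToList();
// Compare the length of the file name (without extension), not of the full path.
// Note: this only checks the length; it does not check that all 14 characters are digits.
List<string> whatyouwant = list.Where(l => Path.GetFileNameWithoutExtension(l).Length == 14).ToList();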
Since these seem to be timestamps, another thing you could do is this:
// Requires: using System.Globalization; using System.IO;
foreach (var file_path in Directory.EnumerateFiles(@"F:\apinvoice", "*.pdf"))
{
// Parse the file name without its directory and extension, not the full path.
string name = Path.GetFileNameWithoutExtension(file_path);
DateTime dateTimeParsed;
var dateTimeParsedSuccessfully = DateTime.TryParseExact(name, "ddMMyyyyHHmmss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dateTimeParsed);
if (dateTimeParsedSuccessfully)
{
// Got a valid file, add it to a list or something.
}
}
Also see:
https://learn.microsoft.com/en-us/dotnet/api/system.datetime.tryparseexact?view=net-5.0
https://learn.microsoft.com/en-us/dotnet/api/system.datetime.parseexact?view=net-5.0
Of course, the timestamp will often be at the end of the file name, so if there are other characters in front of it, you may want to pass name.Substring(name.Length - 14) to TryParseExact() instead, where name is the file name without its extension.
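A minimal sketch of that trailing-timestamp idea, assuming the last 14 characters of the name (before the extension) are a ddMMyyyyHHmmss stamp. It adds a length guard, because Substring throws when the name is shorter than 14 characters. The names TimestampNames and TryGetTrailingTimestamp are hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class TimestampNames
{
    private const string Format = "ddMMyyyyHHmmss"; // 14 characters

    // Hypothetical helper: parses the last 14 characters of the file name
    // (extension excluded) as a ddMMyyyyHHmmss timestamp.
    public static bool TryGetTrailingTimestamp(string path, out DateTime timestamp)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        timestamp = default(DateTime);

        // Too short to contain a timestamp; Substring would throw otherwise.
        if (name.Length < Format.Length)
        {
            return false;
        }

        string tail = name.Substring(name.Length - Format.Length);
        return DateTime.TryParseExact(tail, Format, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out timestamp);
    }
}
```

Names with a prefix such as invoice_16032021133026.pdf then parse, while short or non-date names are simply rejected.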
Related
I have an issue searching for files with the Directory class. I have a lot of files with names similar to this:
XXX_YYYYMMDD_HHMMSS.
I want to list only the files whose names contain a date in that format. Directory.GetFiles() supports patterns, but I do not know if there is a pattern that lets me filter for names containing a date in that format. I thought about using the creation date or the modification date, but they are not the same as the date in the name, which is the one I need to use.
Does anyone know how to help me? Thanks!
What about using a regex? You could probably use something like this:
private static readonly Regex DateFileRegex = new Regex(@".*_[0-9]{8}_[\d]{6}.*");
public IEnumerable<string> EnumerateDateFiles(string path)
{
return Directory.EnumerateFiles(path)
.Where(x => this.IsValidDateFile(x));
}
private bool IsValidDateFile(string filename)
{
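// Match the file name only, so directory names in the path cannot cause a false match.
return DateFileRegex.IsMatch(Path.GetFileName(filename));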
}
The pattern:
.*_[0-9]{8}_[\d]{6}.*
matches
AHJDJKA_20180417_113028sad.jpg
for example.
Note: I couldn't test the code right now, but you'll get the idea.
I suggest two-step filtering:
Wildcards (raw filtering): XXX_YYYYMMDD_HHMMSS, where XXX, YYYYMMDD and HHMMSS are arbitrary characters.
Fine filtering (LINQ), where we ensure that YYYYMMDD_HHMMSS is a proper date.
Something like this:
var files = Directory
.EnumerateFiles(@"c:\MyFiles", "*_????????_??????.*") // raw filtering
.Where(file => { // fine filtering
string name = Path.GetFileNameWithoutExtension(file);
// from the second-last underscore '_'
string at = name.Substring(name.LastIndexOf('_', name.LastIndexOf('_') - 1) + 1);
return DateTime.TryParseExact( // is it a proper date?
at,
"yyyyMMdd'_'HHmmss",
CultureInfo.InvariantCulture,
DateTimeStyles.AssumeLocal,
out var _date); })
.ToArray(); // Finally, we want an array
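The fine-filtering lambda can also be pulled out into a named predicate that is easy to test on plain strings. This minimal sketch assumes the name always ends in _yyyyMMdd_HHmmss before the extension and, unlike the code above, takes the fixed 15-character tail instead of searching for the second-last underscore. DateFileNames and HasValidDateSuffix are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DateFileNames
{
    // Hypothetical helper: true when the file name ends in _yyyyMMdd_HHmmss
    // (before the extension) and that part is a real date and time.
    public static bool HasValidDateSuffix(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        const int suffixLength = 15; // "yyyyMMdd_HHmmss" is 8 + 1 + 6 characters

        // Need room for the leading '_' plus the 15-character suffix.
        if (name.Length < suffixLength + 1 || name[name.Length - suffixLength - 1] != '_')
        {
            return false;
        }

        string at = name.Substring(name.Length - suffixLength);
        DateTime parsed;
        return DateTime.TryParseExact(at, "yyyyMMdd'_'HHmmss",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}
```

With it, the query reduces to .Where(DateFileNames.HasValidDateSuffix) after the wildcard step.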
I have a text file whose format is like this:
Number,Name,Age
I want to read the "Number" in the first column of this text file into an array to find duplicates. Here are the two ways I tried to read in the file:
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
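If uniqueness is all you need, a HashSet<string> is enough, since HashSet.Add returns false for a value that is already present. A minimal sketch, assuming each line looks like Number,Name,Age and keeping the number as a string; DuplicateNumbers and FindDuplicates are hypothetical names.

```csharp
using System.Collections.Generic;

public static class DuplicateNumbers
{
    // Hypothetical helper: returns each first-column value that appears more than once,
    // in the order its second occurrence is seen.
    public static List<string> FindDuplicates(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var duplicates = new HashSet<string>();
        var result = new List<string>();

        foreach (string line in lines)
        {
            string number = line.Split(',')[0].Trim();

            // Add returns false when the number was already seen;
            // the second set makes sure each duplicate is reported only once.
            if (!seen.Add(number) && duplicates.Add(number))
            {
                result.Add(number);
            }
        }
        return result;
    }
}
```

Usage would be DuplicateNumbers.FindDuplicates(File.ReadLines(path)).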
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.Split solution, here is a really simple way of doing what you want:
List<int> importedNumbers = new List<int>();
// Read our file in to an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them in to their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Length < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}
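The comments above suggest using TryParse and handling bad lines more gracefully. A minimal sketch of that, assuming a malformed line (fewer than 3 fields, or a non-numeric Number or Age, such as a header row) should simply be skipped; it also swaps List.Contains for a HashSet<int> so lookups do not scan the whole list. NumberImporter and ImportUniqueNumbers are hypothetical names.

```csharp
using System.Collections.Generic;

public static class NumberImporter
{
    // Hypothetical helper: imports the first column as int, skipping malformed
    // lines and numbers that were already imported.
    public static List<int> ImportUniqueNumbers(IEnumerable<string> lines)
    {
        var imported = new HashSet<int>(); // constant-time membership checks
        var ordered = new List<int>();     // keeps the original import order

        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            int number, age;
            if (fields.Length < 3
                || !int.TryParse(fields[0], out number)
                || !int.TryParse(fields[2], out age))
            {
                continue; // malformed line, e.g. a "Number,Name,Age" header
            }

            // Add returns false if this number was imported before.
            if (imported.Add(number))
            {
                ordered.Add(number);
            }
        }
        return ordered;
    }
}
```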
I'm trying to read and extract specific information from several data files whose file name format is the same for every one. The format is XXXXXX_XXXXXX_PCPDTB_ODT.datafile, where each X is a digit.
The first 6 digits represent the year, month and day, and the last 6 represent hours, minutes and seconds, so 131005_091429_PCPDTB_ODT.datafile would be 2013, 10th month, 5th day and so on. The _PCPDTB_ODT.datafile part is always there.
I'm able to gather my desired data from a single file successfully (extracting all information after a certain keyword, in this case '#Footer'), but I'm not sure how to go about this with lots of files whose names contain changing digits.
Here is my attempt (although it is terrible, since I have very little coding experience), but it only seems to handle 4 digits and no more, which only lets me access files like XXXX_PCPDTB_ODT.datafile or 1304_PCPDTB_ODT.datafile.
static void Main(string[] args)
{
var path = @"C:\Users\#####\Desktop\";
var ext = "_PCPDTB_ODT.datafile";
var range = Enumerable.Range(0,9);
var filePaths =
from i1 in range
from i2 in range
from i3 in range
from i4 in range
let file = path + i1 + i2 + i3 + i4 + ext
where File.Exists(file)
select File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1);
try
{
Console.WriteLine(String.Join(Environment.NewLine,filePaths.SelectMany(f => f)));
}
catch (Exception e)
{
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
Console.Read();
}
}
I've tried adding more i values (i5, i6, i7, etc.) with "_" to separate the first six digits from the last six, but it doesn't seem to do anything when using the larger i values past i4.
Any ideas would help greatly. Please keep in mind my code is most likely rubbish, since my knowledge is very limited at the moment. Thanks.
Instead of trying to loop through every possible valid file name, you should just see what files are there. Here's an example using Directory.GetFiles:
var filePaths = Directory.GetFiles(path, "*" + ext)
.Select(file => File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1));
If you need the date/time also, you can parse that out, too, with DateTime.ParseExact.
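A minimal sketch of the ParseExact step, assuming every name starts with the 13-character yyMMdd_HHmmss stamp described in the question. ParseExact throws FormatException for an invalid stamp, and Substring throws if the name is shorter than 13 characters. DataFileNames and ParseTimestamp are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class DataFileNames
{
    // Hypothetical helper: reads the yyMMdd_HHmmss prefix of a name such as
    // 131005_091429_PCPDTB_ODT.datafile.
    public static DateTime ParseTimestamp(string path)
    {
        string name = Path.GetFileName(path);

        // "yyMMdd_HHmmss" is 13 characters: 6 date digits, '_', 6 time digits.
        string stamp = name.Substring(0, 13);

        // Two-digit years are mapped by the invariant culture's calendar (13 -> 2013).
        return DateTime.ParseExact(stamp, "yyMMdd'_'HHmmss", CultureInfo.InvariantCulture);
    }
}
```

For 131005_091429_PCPDTB_ODT.datafile this yields 5 October 2013, 09:14:29.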
If you're trying to process all of the files in a directory, use Directory.EnumerateFiles.
For example:
foreach (var filename in Directory.EnumerateFiles(@"c:\mydata\", "*PCPDTB_ODT.datafile"))
{
var lines = File.ReadLines(filename)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1);
// add the rest of your code . . .
}
it doesn't seem to do anything when using the larger i values past i4
Probably because it has to iterate through 1,000,000,000,000 different file names?
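A quick count of the candidate names the nested loops have to try, assuming each of the 12 digit positions can take any of the 10 values 0 to 9:

```latex
\underbrace{10^{4} = 10\,000}_{\text{4 digits (i1 to i4)}}
\qquad
\underbrace{10^{12} = 1\,000\,000\,000\,000}_{\text{12 digits}}
```

Each extra digit multiplies the work by 10, and every candidate costs a File.Exists call. Note also that Enumerable.Range(0, 9) in the question yields only 0 through 8, so those loops would never produce a name containing a 9.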
Why not just get a list of files that match the pattern?
var path = @"C:\Users\#####\Desktop\";
var pattern = "??????_??????_PCPDTB_ODT.datafile";
var filePaths = Directory.GetFiles(path, pattern)
.Select(file => File.ReadLines(file)
.SkipWhile(line => !line.Contains("#Footer"))
.Skip(1));
Try this:
using System;
public class Test
{
public static void Main()
{
string str = "131005_091429_PCPDTB_ODT.datafile ";
int[] date = new int[3];
int[] time = new int[3];
string[] arr = str.Split('_');
for(int i = 0;i<6;i=i+2)
{
date[i/2]=Convert.ToInt32(arr[0].Substring(i,2));
}
for(int i = 0;i<6;i=i+2)
{
time[i/2]=Convert.ToInt32(arr[1].Substring(i,2));
}
}
}
I have to retrieve a list of file names from a specific directory in numeric order. The file names are a combination of strings and numeric values, but they end with the numeric values.
For example: page_1.png, page_2.png, page_3.png, ..., page_10.png, page_11.png, page_12.png, ...
My C# code is below:
string filePath="D:\\vs-2010projects\\delete_sample\\delete_sample\\myimages\\";
string[] filePaths = Directory.GetFiles(filePath, "*.png");
It returns them in the following order:
page_1.png
page_10.png
page_11.png
page_12.png
page_2.png...
I am expecting to retrieve the list ordered like this:
page_1.png
page_2.png
page_3.png
[...]
page_10.png
page_11.png
page_12.png
Ian Griffiths has a natural sort for C#. It makes no assumptions about where the numbers appear, and even correctly sorts filenames with multiple numeric components, such as app-1.0.2, app-1.0.11.
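Ian Griffiths' code is not reproduced here; the following is an independent, simplified natural-order comparer sketch. It assumes only ASCII digit runs count as numbers, compares them by value (ignoring leading zeros), and compares all other characters ordinally, so it is case-sensitive and may differ from Windows Explorer ordering. NaturalStringComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Simplified natural-order comparer: digit runs compare by numeric value,
// everything else compares character by character (ordinal).
public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsDigit(x[i]) && IsDigit(y[j]))
            {
                // Find the end of each digit run.
                int iEnd = i, jEnd = j;
                while (iEnd < x.Length && IsDigit(x[iEnd])) iEnd++;
                while (jEnd < y.Length && IsDigit(y[jEnd])) jEnd++;

                // Skip leading zeros (keeping at least one digit).
                int iStart = i, jStart = j;
                while (iStart < iEnd - 1 && x[iStart] == '0') iStart++;
                while (jStart < jEnd - 1 && y[jStart] == '0') jStart++;

                // More significant digits means a bigger number.
                int lenX = iEnd - iStart, lenY = jEnd - jStart;
                if (lenX != lenY) return lenX.CompareTo(lenY);

                // Same number of digits: compare them left to right.
                int cmp = string.CompareOrdinal(x, iStart, y, jStart, lenX);
                if (cmp != 0) return cmp;

                i = iEnd;
                j = jEnd;
            }
            else
            {
                int cmp = x[i].CompareTo(y[j]);
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }

        // One string is a prefix of the other: the shorter remainder sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```

Usage: Directory.GetFiles(filePath, "*.png").OrderBy(f => f, new NaturalStringComparer()).ToArray(). Because the numbers are compared by digit count and then digit by digit, it also avoids int overflow on very long digit runs.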
You can try the following code, which sorts your file names by their numeric values. Keep in mind that this logic relies on some conventions, such as the presence of '_'. You are free to modify the code to make it more defensive for your particular business case.
var vv = new DirectoryInfo(@"C:\Image").GetFileSystemInfos("*.bmp").OrderBy(fs=>int.Parse(fs.Name.Split('_')[1].Substring(0, fs.Name.Split('_')[1].Length - fs.Extension.Length)));
First you can extract the number:
static int ExtractNumber(string text)
{
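Match match = Regex.Match(text, @"_(\d+)\.(png)");
// Regex.Match never returns null; check Success instead.
if (!match.Success)
{
return 0;
}
int value;
// Parse only the captured digits (group 1), not the whole "_12.png" match.
if (!int.TryParse(match.Groups[1].Value, out value))
{
return 0;
}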
return value;
}
Then you could sort your list using:
list.Sort((x, y) => ExtractNumber(x).CompareTo(ExtractNumber(y)));
Maybe this?
string[] filePaths = Directory.GetFiles(filePath, "*.png").OrderBy(n => n).ToArray();
EDIT: As Marcelo pointed out, that alone won't work, because OrderBy(n => n) still compares the names as strings. I believe you can get all the file names, extract their numeric part with a regex, and then sort the file names by that number.
This code would do that:
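var dir = @"C:\Pictures";
var sorted = (from fn in Directory.GetFiles(dir)
// Match against the file name only, so digits in the directory path are ignored.
let m = Regex.Match(Path.GetFileName(fn), @"(?<order>\d+)")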
where m.Success
let n = int.Parse(m.Groups["order"].Value)
orderby n
select fn).ToList();
foreach (var fn in sorted) Console.WriteLine(fn);
It also filters out files that do not have a number in their names.
You may want to change the regex pattern to match a more specific file name structure.
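A minimal sketch of a more specific pattern, assuming the number always sits directly before ".png" (optionally after an underscore, as in page_1.png or page3.png). It anchors the match at the end of the name, uses [0-9] rather than \d so long.Parse only ever sees ASCII digits, and breaks ties by name. PageFileSorter and SortByTrailingNumber are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PageFileSorter
{
    // Assumed convention: digits immediately before a trailing ".png".
    private static readonly Regex TrailingNumber =
        new Regex(@"([0-9]+)\.png$", RegexOptions.IgnoreCase);

    // Hypothetical helper: keeps only names ending in <digits>.png and orders
    // them by that number, then by name to break ties.
    public static List<string> SortByTrailingNumber(IEnumerable<string> paths)
    {
        return paths
            .Select(p => new { Original = p, Hit = TrailingNumber.Match(Path.GetFileName(p)) })
            .Where(x => x.Hit.Success)
            .OrderBy(x => long.Parse(x.Hit.Groups[1].Value)) // overflows past 19 digits
            .ThenBy(x => x.Original, StringComparer.OrdinalIgnoreCase)
            .Select(x => x.Original)
            .ToList();
    }
}
```

Usage: PageFileSorter.SortByTrailingNumber(Directory.GetFiles(filePath, "*.png")).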
I recently made a little application to read in a text file of lyrics, then use a Dictionary to calculate how many times each word occurs. However, for some reason I'm finding instances in the output where the same word occurs multiple times with a tally of 1, instead of being added onto the original tally of the word. The code I'm using is as follows:
StreamReader input = new StreamReader(path);
String[] contents = input.ReadToEnd()
.ToLower()
.Replace(",","")
.Replace("(","")
.Replace(")", "")
.Replace(".","")
.Split(' ');
input.Close();
var dict = new Dictionary<string, int>();
foreach (String word in contents)
{
if (dict.ContainsKey(word))
{
dict[word]++;
}else{
dict[word] = 1;
}
}
var ordered = from k in dict.Keys
orderby dict[k] descending
select k;
using (StreamWriter output = new StreamWriter("output.txt"))
{
foreach (String k in ordered)
{
output.WriteLine(String.Format("{0}: {1}", k, dict[k]));
}
output.Close();
timer.Stop();
}
The text file I'm inputting is here: http://pastebin.com/xZBHkjGt (it's the lyrics of the top 15 rap songs, if you're curious)
The output can be found here: http://pastebin.com/DftANNkE
A quick Ctrl-F shows that "girl" occurs at least 13 different times in the output. As far as I can tell, it is the exact same word, unless there's some sort of difference in ASCII values. Yes, there are some instances with odd characters in place of an apostrophe, but I'll worry about those later. My priority is figuring out why the exact same word is being counted 13 different times as different words. Why is this happening, and how do I fix it? Any help is much appreciated!
Another way is to split on non-word characters.
var lyrics = "I fly with the stars in the skies I am no longer tryin' to survive I believe that life is a prize But to live doesn't mean your alive Don't worry bout me and who I fire I get what I desire, It's my empire And yes I call the shots".ToLower();
var contents = Regex.Split(lyrics, @"[^\w'+]");
Also, here's an alternative (and probably more obscure) loop:
int value;
foreach (var word in contents)
{
dict[word] = dict.TryGetValue(word, out value) ? ++value : 1;
}
dict.Remove("");
If you look closely, the repeated occurrences appear on the line following a word that apparently doesn't have a count.
You're not stripping out newlines, so em\r\ngirl is being treated as a different word.
String[] contents = input.ReadToEnd()
.ToLower()
.Replace(",", "")
.Replace("(", "")
.Replace(")", "")
.Replace(".", "")
.Split("\r\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
Works better.
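A minimal self-contained sketch of the same fix, assuming the same punctuation cleanup as the question. Instead of listing the separator characters, it passes a null separator array to String.Split, which makes it split on any white-space character (spaces, tabs, \r, \n), and it counts with TryGetValue. WordCounter and CountWords are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class WordCounter
{
    // Hypothetical helper: lower-cases the text, strips a few punctuation marks,
    // and splits on any white space so line breaks never glue two words together.
    public static Dictionary<string, int> CountWords(string text)
    {
        string cleaned = text.ToLowerInvariant()
            .Replace(",", "")
            .Replace("(", "")
            .Replace(")", "")
            .Replace(".", "");

        // A null separator array tells String.Split to split on white-space characters.
        string[] words = cleaned.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            int current;
            counts.TryGetValue(word, out current); // current stays 0 for a new word
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

With this, "em\r\ngirl" is counted as the two words "em" and "girl".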
Add Trim to each word:
foreach (String word in contents.Select(w => w.Trim()))