Reading file names, respectively?


I have 1000 files in a folder and want to read their names, but when I do,
the file names are not returned in numerical order.
For example, these are my file names:
1-CustomerApp.txt
2-CustomerApp.txt
3-CustomerApp.txt
...
My code:
var address = @"data\";
string str = "";
DirectoryInfo d = new DirectoryInfo(address); // assuming "data" is your folder
FileInfo[] Files = d.GetFiles("*.txt"); // getting text files
foreach (FileInfo file in Files)
{
    str = str + ", " + file.Name;
}

You could use this to keep a "numerical" order
FileInfo[] files = d.GetFiles("*.txt")
.OrderBy(m => m.Name.PadLeft(200, '0')).ToArray();
200 is quite arbitrary, of course.
PadLeft adds as many '0' characters as needed to the left of the file name so that the result is 200 characters long.
To make it a little less arbitrary and "brittle", you could do:
var f = d.GetFiles("*.txt");
var maxLength = f.Select(l => l.Name.Length).Max() + 1;
FileInfo[] files = f.OrderBy(m => m.Name.PadLeft(maxLength, '0')).ToArray();
How does it work (bad explanation, someone could do better), and why is it far from perfect:
Ordering of characters in C# puts numeric characters before alphabetic characters.
So 0 (or 9) will come before a.
But 10a will come before 2a, because 1 comes before 2.
Comparing character by character:
1 / 2 => 10a first
But if we change 10a and 2a, with PadLeft, to 010a and 002a, we see, character by character:
0 / 0 => equal
1 / 0 => 002a first
This "works" in your case, but really depends on your "file naming" logic.
CAUTION: this solution won't work with other file naming logic.
For example, when the numeric part is not at the start:
a-10 and b-2
Because
00b-2 will come before 0a-10 (the padding, not the letters, decides the order)
Or when the "non-numeric part" is not of the same length:
1-abc and 2-z
Because
01-abc will come after 0002-z

They are sorted alphabetically. I don't know how you see that they are not sorted (what you compare them with, or where you have seen a different result); if you are using a file manager, it may apply its own sorting. In Windows Explorer the result will be the same (try sorting by the Name column).
If you know the template used to create the file names, you can use a trick to apply your own sorting. In your example, extract the number at the start (up to the minus sign), pad it with zeroes (up to the maximum possible number of digits), and then sort by the result.
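A minimal sketch of that trick, assuming every name looks like `<number>-<rest>` as in the question and that the number has at most 10 digits; `LeadingNumberSort`, `SortKey` and `SortByLeadingNumber` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeadingNumberSort
{
    // Hypothetical helper: turns "10-CustomerApp.txt" into "0000000010-CustomerApp.txt".
    // Assumes the number is everything before the first '-' and has at most 10 digits.
    public static string SortKey(string fileName)
    {
        int dash = fileName.IndexOf('-');
        if (dash < 0)
            return fileName; // no '-': use the name unchanged

        string number = fileName.Substring(0, dash);
        string rest = fileName.Substring(dash);
        return number.PadLeft(10, '0') + rest;
    }

    // Sorts by the padded key; ordinal comparison compares the characters directly.
    public static List<string> SortByLeadingNumber(IEnumerable<string> fileNames)
    {
        return fileNames.OrderBy(name => SortKey(name), StringComparer.Ordinal).ToList();
    }
}
```

With FileInfo objects the same key can be applied to the name, e.g. d.GetFiles("*.txt").OrderBy(f => LeadingNumberSort.SortKey(f.Name), StringComparer.Ordinal).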
Or you can pad with zeroes when files are created:
0001-CustomerApp.txt
0002-CustomerApp.txt
...
9999-CustomerApp.txt
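If you control how the files are created, a sketch of generating such names; `PaddedNames` and `MakeName` are hypothetical, and the `CustomerApp.txt` suffix is taken from the question. The "D4" format specifier pads with leading zeros to at least four digits, so numbers above 9999 would get more digits and break the ordering again.

```csharp
using System;
using System.Collections.Generic;

public static class PaddedNames
{
    // Formats the counter with at least four digits: 1 -> "0001-CustomerApp.txt".
    public static string MakeName(int number) => $"{number:D4}-CustomerApp.txt";

    // Yields the names 0001-CustomerApp.txt .. count, already in sortable form.
    public static IEnumerable<string> MakeNames(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return MakeName(i);
    }
}
```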

This should give you more or less what you want; it skips any characters until it hits a number, then includes all directly following numerical characters to get the number. That number is then used as a key for sorting. Somewhat messy perhaps, but should work (and be less brittle than some other options, since this searches explicitly for the first "whole" number within the filename):
var filename1 = "10-file.txt";
var filename2 = "2-file.txt";
var filenames = new []{filename1, filename2};
var sorted =
filenames.Select(fn => new {
nr = int.Parse(new string(
fn.SkipWhile(c => !Char.IsNumber(c))
.TakeWhile(c => Char.IsNumber(c))
.ToArray())),
name = fn
})
.OrderBy(file => file.nr)
.Select(file => file.name);

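Try alphanumeric ("natural") sorting and sort the file names according to it. Note that AlphanumComparatorFast is not part of .NET: you have to add an alphanumeric comparer implementing IComparer<string> to your project first. Also, OrderBy returns a new sequence rather than sorting the array in place:
FileInfo[] files = d.GetFiles("*.txt");
FileInfo[] sorted = files.OrderBy(file => file.Name, new AlphanumComparatorFast()).ToArray();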
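A minimal sketch of such a comparer, in case you don't want to pull one in from elsewhere. The class name `NaturalStringComparer` is hypothetical, only ASCII digits are treated as numbers, and non-digit characters are compared case-insensitively by their upper-case form:

```csharp
using System;
using System.Collections.Generic;

public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsAsciiDigit(x[i]) && IsAsciiDigit(y[j]))
            {
                // Find the end of both digit runs.
                int iEnd = i, jEnd = j;
                while (iEnd < x.Length && IsAsciiDigit(x[iEnd])) iEnd++;
                while (jEnd < y.Length && IsAsciiDigit(y[jEnd])) jEnd++;

                // Skip leading zeros; after that, a longer run is a bigger number.
                int iStart = i, jStart = j;
                while (iStart < iEnd - 1 && x[iStart] == '0') iStart++;
                while (jStart < jEnd - 1 && y[jStart] == '0') jStart++;
                int lenX = iEnd - iStart, lenY = jEnd - jStart;
                if (lenX != lenY) return lenX.CompareTo(lenY);

                // Same number of digits: compare them digit by digit (no int overflow).
                int cmp = string.CompareOrdinal(x, iStart, y, jStart, lenX);
                if (cmp != 0) return cmp;

                i = iEnd;
                j = jEnd;
            }
            else
            {
                int cmp = char.ToUpperInvariant(x[i]).CompareTo(char.ToUpperInvariant(y[j]));
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }
        // Everything compared so far is equal: the string with less left over sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

Usage with the question's code would then be d.GetFiles("*.txt").OrderBy(f => f.Name, new NaturalStringComparer()).ToArray(). Unlike the PadLeft trick, this handles f-2-a vs f-10-a and 1-abc vs 2-z.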

Related

Working on huge text file, C#. Modifying the file

Please help me resolve this issue.
I have a huge input.txt. Right now it's 465 MB, but later it will be at least 1 GB.
The user enters a term (not a whole word). Using that term, I need to find each word that contains it, put that word between <strong> tags, and save the contents to output.txt. The term search should be case-insensitive.
This is what I have so far. It works on small texts, but doesn't on bigger ones.
Regex regex = new Regex(" ");
string text = File.ReadAllText("input.txt");
Console.WriteLine("Please, enter a term to search for");
string term = Console.ReadLine();
string[] w = regex.Split(text);
for (int i = 0; i < w.Length; i++)
{
if (Processor.Contains(w[i], term, StringComparison.OrdinalIgnoreCase))
{
w[i] = @"<strong>" + w[i] + @"</strong>";
}
}
string result = null;
result = string.Join(" ", w);
File.WriteAllText("output.txt", result);

Trying to read the entire file in one go is causing your memory exception. Look into reading the file in stages. The FileStream and BufferedStream classes provide ways of doing this:
https://msdn.microsoft.com/en-us/library/system.io.filestream(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.io.bufferedstream.read(v=vs.110).aspx

Try not to load the entire file into memory; avoid huge GB-size arrays, strings, etc. (you may simply not have enough RAM). Can you process the file line by line (i.e. you don't have terms that span multiple lines, do you?)? If that's your case, then
...
var source = File
.ReadLines("input.txt") // Notice absence of "All", not ReadAllLines
.Select(line => line.Split(' ')) // You don't need Regex here, just Split
.Select(items => items
.Select(item => item.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0 // word contains the term
? "<strong>" + item + "</strong>"
: item))
.Select(items => String.Join(" ", items));
File.WriteAllLines("output.txt", source);

Read the file line by line (or buffer more lines). A bit slower but should work.
Also there can be a problem if all the lines match your term. Consider writing results in a temporary file when you find them and then just rename/move the file to the destination folder.
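A sketch of that idea, assuming words are separated by single spaces as in the question's code and that a word is highlighted when it contains the term (case-insensitive). `TermHighlighter`, `HighlightLine` and `HighlightFile` are hypothetical names; only one line is held in memory at a time:

```csharp
using System;
using System.IO;
using System.Linq;

public static class TermHighlighter
{
    // Wraps every space-separated word that contains `term` (case-insensitive) in <strong> tags.
    public static string HighlightLine(string line, string term)
    {
        var words = line.Split(' ')
            .Select(w => w.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
                ? "<strong>" + w + "</strong>"
                : w);
        return string.Join(" ", words);
    }

    // Streams inputPath line by line into a temporary file, then moves it to outputPath.
    public static void HighlightFile(string inputPath, string outputPath, string term)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            using (var reader = new StreamReader(inputPath))
            using (var writer = new StreamWriter(tempPath))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    writer.WriteLine(HighlightLine(line, term));
            }

            // File.Move (two-argument form) fails if the destination already exists.
            if (File.Exists(outputPath))
                File.Delete(outputPath);
            File.Move(tempPath, outputPath);
        }
        finally
        {
            // Removes the temp file if something failed before the move.
            if (File.Exists(tempPath))
                File.Delete(tempPath);
        }
    }
}
```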

In C# matching all files in a directory using regex

I am currently trying to use the below regular expression in C#
Regex reg = new Regex(@"-(FILENM01P\\.(\\d){3}\\.PGP)$");
var files = Directory.GetFiles(savePath, "*.PGP")
.Where(path => reg.IsMatch(path))
.ToList();
foreach (string file in files)
{
MessageBox.Show(file);
}
To match all files in a single directory that follow this file naming convention:
FILENM01P.001.PGP
If I just load up all files like this
var files = Directory.GetFiles(savePath, "*.PGP");
foreach (string file in files)
{
MessageBox.Show(file);
}
Then I get strings like this:
C:\Users\User\PGP Files\FILENM01P.001.PGP
There could be many of these files, for example:
FILENM01P.001.PGP
FILENM01P.002.PGP
FILENM01P.003.PGP
FILENM01P.004.PGP
But there will never be
FILENM01P.000.PGP
FILENM01P.1000.PGP
To clarify, only the three digits in the middle will change, and they can only be between 001 and 999 (with leading zeros); the rest of the text is static and will never change.
I'm a complete novice when it comes to RegEx so any help would be greatly appreciated.
Essentially, my end goal is to find the next number and create the file. If there are no files, it should start at 001, and if the last file is 999, it should return 1000 so that I know I need to move to a new directory, as each directory is limited to 999 sequential files. (I'll deal with that part myself, though.)

Try this code.
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).Select(x=> Convert.ToInt32(x.Value.Split('.')[1])).ToList();
var nextNumber = (matches.Max() + 1).ToString("D3"); // 3 digit with leading zeros
Also, you might need an if check to see whether the next number is 1000, and if so, return 0:
(matches.Max() + 1 > 999? 0:matches.Max() + 1).ToString("D3")
My test case.
List<string> files = new List<string>();
files.Add(@"C:\Users\User\PGP Files\FILENM01P.001.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.002.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.003.PGP");
files.Add(@"C:\Users\User\PGP Files\FILENM01P.004.PGP");
The output is
nextNumber = "005";

Regex regex = new Regex(@"FILENM01P\.(\d+)\.", RegexOptions.IgnoreCase);
var fnumbers = Directory.GetFiles(src, "*.PGP", SearchOption.TopDirectoryOnly)
.Select(f=>regex.Match(f))
.Where(m=>m.Success)
.Select(m=>int.Parse(m.Groups[1].Value));
int fileNum = 1 + (fnumbers.Any() ? fnumbers.Max() : 0);

You can do something like this:
var reg = new Regex(@"FILENM01P\.(\d{3})\.PGP");
var matches = files.Select(f => reg.Match(f)).Where(f => f.Success).ToList();
var nextNumber = matches.Any()
? matches.Max(f => int.Parse(f.Groups[1].Value)) + 1
: 1;
Where files is a list of the files to match.
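Combining the answers above into one sketch that also returns 1000 when the directory is full, as the question asks. The class and method names are hypothetical, and matching against Path.GetFileName with ^ and $ anchors is a choice made here so that other names in the folder (for example FILENM01P.1000.PGP) are ignored:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PgpFileNumbering
{
    // Anchored so that only names of exactly the form FILENM01P.###.PGP match.
    private static readonly Regex Pattern =
        new Regex(@"^FILENM01P\.(\d{3})\.PGP$", RegexOptions.IgnoreCase);

    // Returns 1 if no file matches, otherwise the highest number + 1.
    // A result of 1000 means the directory already holds FILENM01P.999.PGP.
    public static int NextNumber(IEnumerable<string> paths)
    {
        var numbers = paths
            .Select(p => Pattern.Match(Path.GetFileName(p)))
            .Where(m => m.Success)
            .Select(m => int.Parse(m.Groups[1].Value))
            .ToList();
        return numbers.Count == 0 ? 1 : numbers.Max() + 1;
    }

    public static int NextNumberInDirectory(string directory) =>
        NextNumber(Directory.GetFiles(directory, "*.PGP"));
}
```

The file name for a result below 1000 can then be built with "FILENM01P." + next.ToString("D3") + ".PGP".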

reading in text file and splitting by comma in c#

I have a text file whose format is like this
Number,Name,Age
I want to read the "Number" in the first column of this text file into an array to find duplicates. Here are the two ways I tried to read in the file:
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to get just what's to the left of the first comma, I fail. Any ideas? Thanks.

You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.

If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));

If you are looking specifically for a string.split solution, here is a really simple method of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file into an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them into their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Length < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}
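A variation on the loop above, using a HashSet<int> (constant-time lookups, whereas List.Contains scans the whole list) and int.TryParse as the comment suggests. `DuplicateNumbers.FindDuplicates` is a hypothetical helper; any line whose first field is not an integer, such as a header row, is skipped:

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateNumbers
{
    // Returns each Number (first comma-separated field) that appears more than once,
    // in the order in which its second occurrence is found.
    public static List<int> FindDuplicates(IEnumerable<string> lines)
    {
        var seen = new HashSet<int>();
        var duplicates = new HashSet<int>();
        var result = new List<int>();

        foreach (string line in lines)
        {
            string first = line.Split(',')[0].Trim();
            if (!int.TryParse(first, out int number))
                continue; // header or malformed record

            // HashSet.Add returns false when the value is already present.
            if (!seen.Add(number) && duplicates.Add(number))
                result.Add(number);
        }
        return result;
    }
}
```

Called as DuplicateNumbers.FindDuplicates(File.ReadLines(path)), it reads the file lazily instead of loading it all at once.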

IndexOf does not correctly identify if a line starts with a value

How can I remove a whole line from a text file if the first word matches to a variable I have?
What I'm currently trying is:
List<string> lineList = File.ReadAllLines(dir + "textFile.txt").ToList();
lineList = lineList.Where(x => x.IndexOf(user) <= 0).ToList();
File.WriteAllLines(dir + "textFile.txt", lineList.ToArray());
But I can't get it to remove.

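The only mistake you have is the check on IndexOf: you are checking <= 0.
-1 is returned when the string does not contain the searched-for string, and 0 when it starts with it.
<= 0 means either starts with or does not contain
== 0 means starts with; since Where keeps the lines for which the condition is true, use != 0 to remove the lines that start with the value <- This is what you want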

This method will read the file line-by-line instead of all at once. Also note that this implementation is case-sensitive.
It also assumes you aren't subjected to leading spaces.
using (var writer = new StreamWriter("temp.file"))
{
//here I only write back what doesn't match
foreach(var line in File.ReadLines("file").Where(x => !x.StartsWith(user)))
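writer.WriteLine(line); // ReadLines strips line endings, so this does not double-space
}
File.Delete("file"); // the two-argument File.Move throws if the destination exists
File.Move("temp.file", "file");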

You were pretty close, String.StartsWith handles that nicely:
// nb: if you are case SENSITIVE remove the second argument to ll.StartsWith
File.WriteAllLines(
path,
File.ReadAllLines(path)
.Where(ll => !ll.StartsWith(user, StringComparison.OrdinalIgnoreCase))); // keep lines that do NOT start with user
For really large files this may not perform well; instead:
// Write our new data to a temp file and read the old file On The Fly
var temp = Path.GetTempFileName();
try
{
File.WriteAllLines(
temp,
File.ReadLines(path)
.Where(
ll => !ll.StartsWith(user, StringComparison.OrdinalIgnoreCase)));
File.Copy(temp, path, true);
}
finally
{
File.Delete(temp);
}
Another issue noted was that both IndexOf and StartsWith will treat ABC and ABCDEF as matches if the user is ABC:
var matcher = new Regex(
@"^" + Regex.Escape(user) + @"\b", // <-- matches the first "word"
RegexOptions.IgnoreCase);
File.WriteAllLines(
path,
File.ReadAllLines(path)
.Where(ll => !matcher.IsMatch(ll)));
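As an alternative to the regex (a swapped-in technique, not taken from the answers above), a sketch that compares the first whitespace-separated word exactly, which also tolerates leading spaces. `LineFilter`, `FirstWordIs` and `RemoveLinesStartingWith` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineFilter
{
    private static readonly char[] Whitespace = { ' ', '\t' };

    // True when the first word of `line` equals `word` (case-insensitive),
    // so "ABC" matches "ABC 123" and "  abc 1" but not "ABCDEF 123".
    public static bool FirstWordIs(string line, string word)
    {
        string[] parts = line.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);
        return parts.Length > 0 &&
               string.Equals(parts[0], word, StringComparison.OrdinalIgnoreCase);
    }

    // Keeps every line except those whose first word is `word`.
    public static IEnumerable<string> RemoveLinesStartingWith(IEnumerable<string> lines, string word) =>
        lines.Where(line => !FirstWordIs(line, word));
}
```

Usage: File.WriteAllLines(path, LineFilter.RemoveLinesStartingWith(File.ReadAllLines(path), user).ToList()). ReadAllLines is used here because File.ReadLines would still have the file open while WriteAllLines tries to overwrite it.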

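Use `!= 0` instead of `<= 0`: `IndexOf(user) == 0` identifies the lines that start with the value, and Where keeps the lines for which the condition is true.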

Sort FileSystemInfo[] by Name

I have probably spent about 500 hours Googling this and reading MSDN documentation and it still refuses to work the way I want.
I can sort by name for files like this:
01.png
02.png
03.png
04.png
That is, when all the file names have the same length.
The moment there is a file with a longer name, everything goes to hell.
For example in the sequence:
1.png
2.png
3.png
4.png
5.png
10.png
11.png
It reads:
1.png, 2.png then 10.png, 11.png
I don't want this.
My Code:
DirectoryInfo di = new DirectoryInfo(directoryLoc);
FileSystemInfo[] files = di.GetFileSystemInfos("*." + fileExtension);
Array.Sort<FileSystemInfo>(files, new Comparison<FileSystemInfo>(compareFiles));
foreach (FileInfo fri in files)
{
fri.MoveTo(directoryLoc + "\\" + prefix + "{" + operationNumber.ToString() + "}" + (i - 1).ToString("D10") +
"." + fileExtension);
i--;
x++;
progressPB.Value = x * 100 / fileCount; // multiply first: with ints, x / fileCount is 0 until the last file
}
// compare by file name
int compareFiles(FileSystemInfo a, FileSystemInfo b)
{
// return a.LastWriteTime.CompareTo(b.LastWriteTime);
return a.Name.CompareTo(b.Name);
}

It's not a matter of the file length particularly - it's a matter of the names being compared in lexicographic order.
It sounds like in this particular case you want to get the name without the extension, try to parse it as an integer, and compare the two names that way - you could fall back to lexicographic ordering if that fails.
Of course, that won't work if you have "debug1.png,debug2.png,...debug10.png"...you'd need a more sophisticated algorithm in that case.
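A sketch of that suggestion, assuming names like 10.png. `FileNameOrder` and `CompareNames` are hypothetical, and putting numeric names before non-numeric ones is an arbitrary choice made here so the comparison stays consistent:

```csharp
using System;
using System.IO;

public static class FileNameOrder
{
    // Compares numerically when both names (without extension) parse as integers,
    // e.g. "2.png" < "10.png"; otherwise falls back to ordinal string comparison.
    public static int CompareNames(string a, string b)
    {
        bool aIsNumber = long.TryParse(Path.GetFileNameWithoutExtension(a), out long aNum);
        bool bIsNumber = long.TryParse(Path.GetFileNameWithoutExtension(b), out long bNum);

        if (aIsNumber && bIsNumber)
            return aNum.CompareTo(bNum);
        if (aIsNumber != bIsNumber)
            return aIsNumber ? -1 : 1; // numeric names sort before the rest
        return string.CompareOrdinal(a, b);
    }

    // Same signature as the question's compareFiles, usable with Array.Sort.
    public static int CompareFiles(FileSystemInfo a, FileSystemInfo b) =>
        CompareNames(a.Name, b.Name);
}
```

In the question's code this would be Array.Sort(files, FileNameOrder.CompareFiles). Names such as debug10.png still fall back to plain ordinal order, as noted above.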

You're comparing the names as strings, even though (I'm assuming) you want them sorted by number.
This is a well-known problem where "10" comes before "9" because the first character in 10 (1) is less than the first character in 9.
If you know that the files will all consist of numbered names, you can modify your custom sort routine to convert the names to integers and sort them appropriately.

Your code is correct and working as expected, just the sort is performed alphabetically, not numerically.
For instance, the strings "1", "10", "2" are in alphabetical order. Instead if you know your filenames are always just a number plus ".png" you can do the sort numerically. For instance, something like this:
int compareFiles(FileSystemInfo a, FileSystemInfo b)
{
// Given an input 10.png, parses the filename as integer to return 10
int first = int.Parse(Path.GetFileNameWithoutExtension(a.Name));
int second = int.Parse(Path.GetFileNameWithoutExtension(b.Name));
// Performs the comparison on the integer part of the filename
return first.CompareTo(second);
}

I ran into this same issue, but instead of sorting the list myself, I changed the file names to use a 6-digit, zero-padded key.
My list now looks like this:
000001.jpg
000002.jpg
000003.jpg
...
000010.jpg
But, if you can't change the filenames, you're going to have to implement your own sorting routine to deal with the alpha sort.

How about a bit of linq and regex to fix the ordering?
var orderedFileSysInfos =
new DirectoryInfo(directoryLoc)
.GetFileSystemInfos("*." + fileExtension)
//regex below grabs the first bunch of consecutive digits in file name
//you might want something different
.Select(fsi => new { fsi, match = Regex.Match(fsi.Name, @"\d+") })
//filter away names without digits
.Where(x => x.match.Success)
//parse the digits to int
.Select(x => new { x.fsi, order = int.Parse(x.match.Value) })
//use this value to perform ordering
.OrderBy(x => x.order)
//select original FileSystemInfo
.Select(x => x.fsi)
.ToArray(); // materialize as FileSystemInfo[]
