c# - Remove old Files from FileInfo list - c#

I have a File-Info-List of more than 200 log-files from a directory.
Most of the files need to be in the list, but there are a few files that should be ignored.
Here is an example of the File-List:
A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink.log
B200a101_ContentLink_20221208_115905.log
B200a101_ContentLink_20221208_115907.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log
Explanation of the file name:
The first characters refer to a room (e.g. room A300 or A1). A room can have any description, e.g. B200, CXS2 or only CDD. The next part is the device name (e.g. device a1 or device a20). Each device name starts with a, followed by 1-3 digits. The last part of each file name is "_ContentLink".
All files with a further suffix, like _20221208_115905, are duplicates of older versions that are needed by other programs, but not in my list.
My problem is that I only need the newest file of each log file in my File-Info-List.
I initialized a FileInfo[] allFiles that contains all of the files of the directory.
Next I initialized a new FileInfo[] in which I would like to store only the newest version of each file.
My first attempt was to compare the LastWrite time
FileInfo currentFile = allFiles[0];
foreach (FileInfo file in allFiles)
{
if (file.LastWriteTime > currentFile.LastWriteTime)
{
currentFile = file;
}
}
But this only gives me the latest file of the whole folder.
Now I am thinking about using regular expressions instead of .LastWriteTime, to exclude all files that have a suffix after ContentLink.
But I don't know how to do that, or how to remove the outdated files from the list of all files (or transfer only the relevant ones to a new FileInfo[] list).
Thank you in advance for your ideas.

You can use a LINQ query to:
extract the name and time part from each file name
group the files by name and
select the latest (maximum) file by time
Something like :
var regex=new Regex(@"^(.*?)_ContentLink(.*?)\.log$");
var latest=allFiles.Select(f=>{
var parts=regex.Match(f.Name);
return new {
File=f,
Name=parts.Groups[1].ToString(),
Date=parts.Groups[2].ToString()
};
})
.GroupBy(f=>f.Name)
.Select(g=>g.MaxBy(f=>f.Date).File)
.ToArray();
foreach(var file in latest)
{
Console.WriteLine(file.Name);
}
This produces
A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log
MaxBy was added in .NET 6. Before that you can use the equivalent method from the MoreLINQ library.
The regular expression captures the smallest possible string before _ContentLink in the first group (.*?) and the smallest possible date part in the second group.
You could get a bit fancier and use different regular expressions to capture the name and time part. Combined with local functions, this results in a somewhat cleaner query:
var nameRex=new Regex(@"^(.*?)_ContentLink.*\.log$");
var timeRex=new Regex(@"^.*_ContentLink(.*?)\.log$");
string NamePart(FileInfo f)
{
return nameRex.Match(f.Name).Groups[1].ToString();
}
string TimePart(FileInfo f)
{
return timeRex.Match(f.Name).Groups[1].ToString();
}
var latest=allFiles
.GroupBy(NamePart)
.Select(g=>g.MaxBy(TimePart))
.ToArray();
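If, as the question describes, the timestamped copies are the outdated ones and the file without a suffix is the current log, filtering on the name alone may be enough. The following is a minimal sketch under that assumption; KeepCurrent and the sample names are illustrative and not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Matches names that end directly in "_ContentLink.log", i.e. without a timestamp suffix.
var currentLog = new Regex(@"_ContentLink\.log$");

// Hypothetical helper: keeps only the files whose name has no suffix after _ContentLink.
FileInfo[] KeepCurrent(IEnumerable<FileInfo> files) =>
    files.Where(f => currentLog.IsMatch(f.Name)).ToArray();

// FileInfo objects can be created for paths that do not exist; only Name is read here.
var allFiles = new[]
{
    "A300a1_ContentLink.log",
    "B200a101_ContentLink.log",
    "B200a101_ContentLink_20221208_115905.log",
    "B200a101_ContentLink_20221208_120647.log",
    "B201a1_ContentLink.log",
}.Select(name => new FileInfo(name)).ToArray();

foreach (FileInfo file in KeepCurrent(allFiles))
{
    Console.WriteLine(file.Name);
}
```

With a real directory, allFiles would come from something like new DirectoryInfo(path).GetFiles("*.log") instead of the hard-coded sample names.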

Related

Search for file in c#

I want to search in C# for files whose names begin with a given string.
I followed this code from the internet:
string[] dirs = Directory.GetFiles(@"c:\", "c*");
But instead of finding names that start with "c", I want to find files whose names contain a string I created (for example, contain.txt and contain.pdf both contain "contain"). Here is my code:
string filetofind;
string[] dirs = Directory.GetFiles(@"c:\", filetofind + "*");
But it just doesn't work. Is there any other way?
If "I want to find files contains a string i created" means you want to check the file's content (not its name), you have to load the file, e.g. (assuming stringToFind doesn't contain line breaks)
string[] dirs = Directory
.EnumerateFiles(@"c:\", "*.txt") // all txt files (put the right wildcard)
.Where(file => File
.ReadLines(file) // with at least one line
.Any(line => line.Contains(stringToFind))) // which contains stringToFind
.ToArray();
Edit: In case you want files' names which contain c, e.g. "mycode.txt", "Constraints.dat" etc. (but not "demo.com" since c is in the file's extension); you can try *c*.* wild card: file name contains c with any extension:
string[] dirs = Directory
.GetFiles(@"c:\", $"*{filetofind}*.*");
For a more elaborate condition, when a standard wildcard is not enough, just add Where:
string[] dirs = Directory
.EnumerateFiles(@"c:\", "*.*")
.Where(path => Your_Condition(Path.GetFileNameWithoutExtension(path)))
.ToArray();
For instance, let's test file name for small (not capital) letter c
string[] dirs = Directory
.EnumerateFiles(@"c:\", "*.*")
.Where(path => Path.GetFileNameWithoutExtension(path).Contains('c'))
.ToArray();
To find files where the file name contains "foo", use
var files = Directory.EnumerateFiles("C:\\dir", "*foo*", SearchOption.AllDirectories);
To find files where the text content contains "foo" use:
var files = Directory.EnumerateFiles("C:\\dir", "*", SearchOption.AllDirectories)
.Where(f => File.ReadAllText(f).Contains("foo"));
This should work, but it will read the entire file as text until you stop enumerating the list of files, so you might want to filter the file list search pattern before reading them. You could also write your own method to inspect each file rather than reading the entire thing into memory for every file.
Replace SearchOption.AllDirectories with SearchOption.TopDirectoryOnly if you only want to search that directory and not recursively search its subdirectories.
If the name of the file you want to find starts with the value of filetofind, then your code is correct. But if that value can appear anywhere within the file name, your code must change to
string filetofind;
string[] dirs = Directory.GetFiles(@"c:\", "*" + filetofind + "*");

C# Split File Name Beginner Exercise

I have a directory filled with multiple excel files that I would like to rename. The names all have leading integers and a '-'. For example: 0123456-Test_01. I would like to rename all of the files within this directory by removing this prefix. 0123456-Test_01 should just be Test_01. I can rename a hard coded instance of a string, but am having trouble getting the files and renaming all of them.
My code is below. Any help is appreciated, as I am clearly new to C#.
public static void Main()
{
//Successfully splits hardcoded string
var temp = "0005689-Test_01".Split('-');
Console.WriteLine(temp[1]);
Console.ReadLine();
//Unsuccessful renaming of all files within directory
List<string> files = System.IO.Directory.GetFiles(@"C:\Users\acars\Desktop\B", "*").ToList();
System.IO.File.Move(@"C:\Users\acars\Desktop\B\", @"C:\Users\acars\Desktop\B\".Split('-'));
foreach (string file in files)
{
var temp = files.Split('-');
return temp[1];
};
}
There are some errors to fix in your code.
The first one is the wrong usage of the variable files. This is the full list of files, not the single file that you want to split and move. As explained in the comments, you should use the loop variable file instead.
The most important problem is that the File.Move method throws an exception if the destination file exists. After removing the first part of your file name, you cannot be sure that the resulting name is unique in your directory.
So a check for the existence of the file before the Move is mandatory.
Finally, it is better to use Directory.EnumerateFiles, because this method lets your moving code start executing without first loading all file names into a list in memory. (In a folder full of files this could make a noticeable difference in speed.)
public static void Main()
{
string workPath = @"C:\Users\acars\Desktop\B";
foreach (string file in Directory.EnumerateFiles(workPath))
{
string[] temp = file.Split('-');
if(temp.Length > 1)
{
string newName = Path.Combine(workPath, temp[1]);
if(!File.Exists(newName))
File.Move(file, newName);
}
}
}
Also pay attention to the comment below from CodeNotFound. You are using a hard-coded path, so the problem doesn't actually arise here, but if the directory name contains a "-" then you should use something like this to get the last element of the split array
string newName = Path.Combine(workPath, temp[temp.Length-1]);
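Another way to sidestep dashes in the directory name is to split only the file name and cut at the first "-". The following is a rough sketch of that idea; StripPrefix and the sample path are hypothetical, and the actual move (with the existence check shown above) is left out:

```csharp
using System;
using System.IO;

// Hypothetical helper: returns the path with everything up to and including
// the first '-' removed from the file name. The directory part is untouched.
// If the name has no '-' (or nothing after it), the path is returned unchanged.
string StripPrefix(string path)
{
    string directory = Path.GetDirectoryName(path) ?? "";
    string name = Path.GetFileName(path);
    int dash = name.IndexOf('-');
    if (dash < 0 || dash == name.Length - 1)
    {
        return path;
    }
    return Path.Combine(directory, name.Substring(dash + 1));
}

// The directory name contains a '-' on purpose; only the file name is changed.
string sample = Path.Combine("my-folder", "0123456-Test_01.xlsx");
Console.WriteLine(StripPrefix(sample));
```

Cutting at the first dash keeps names such as 0123456-Test-01 intact after the prefix, which taking the last element of the split array would not.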

How to match two paths pointing to the same file?

I have two lists containing paths to a directory of music files and I want to determine which of these files are stored on both lists and which are only stored on one. The problem lies in that the format of the paths differ between the two lists.
Format example:
List1: file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3
List2: \\FILE\Musik\30 Seconds To Mars.mp3
How do I go about comparing these two file paths and matching them to the same source?
The answer depends on your notion of "same file". If you merely want to check if the file is equal, but not the very same file, you could simply generate a hash over the file's content and compare that. If the hashes are equal (please use a strong hash, like SHA-256), you can be confident that the files are also. Likewise you could of course also compare the files byte by byte.
If you really want to figure that the two files are actually the same file, i.e. just addressed via different means (like file-URL or UNC path), you have a little more work to do.
First you need to find out the true file system path for each of the addresses. For example, you need to find the file system path behind the UNC path and/or file URL (which typically is the URL itself). In the case of UNC paths that are shares on a remote computer, you might not even be able to do so.
Also, even if you have the local path figured out somehow, you also need to deal with different redirection mechanisms for local paths (on Windows junctions/reparse points/links; on UNIX symbolic or hard links). For example, you could have a share using file system link as source, while the file URL uses the true source path. So to the casual observer they still look like different files.
Having all that said, the "algorithm" would be something like this:
Figure out the source path for the URLs, UNC paths/shares, etc. you have
Figure out the local source path from those paths (considering links/junctions, subst.exe, etc.)
Normalize those paths, if necessary (i.e. a/b/../c is actually a/c)
Compare the resulting paths.
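The last two steps (normalize, then compare) could look roughly like the sketch below. It assumes both inputs are already plain file system paths, it does not resolve links or junctions, and the case-insensitive comparison is an assumption that suits Windows-style file systems; SamePath is an illustrative name:

```csharp
using System;
using System.IO;

// Hypothetical helper: normalizes both paths (resolving "." and ".." segments
// relative to the current directory) and compares the results ignoring case.
bool SamePath(string left, string right) =>
    string.Equals(
        Path.GetFullPath(left),
        Path.GetFullPath(right),
        StringComparison.OrdinalIgnoreCase);

// a/b/../c and a/c name the same location once normalized.
string withDots = Path.Combine("a", "b", "..", "c");
string direct = Path.Combine("a", "c");
Console.WriteLine(SamePath(withDots, direct));
```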
I think the best way to do it is by temporarily converting one of the paths to the other one's format. I would suggest you change the first to match the second.
string List1 = "file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3";
string List2 = @"\\FILE\Musik\30 Seconds To Mars.mp3";
I would recommend you use Replace()-method.
Get rid of "file://localhost":
var tempStr = List1.Replace("file://localhost", "");
Change all '%20' into spaces:
tempStr = tempStr.Replace("%20", " ");
Change all '/' into '\':
tempStr = tempStr.Replace("/", "\\");
Voilà! Two strings in matching format!
Use python: you can easily compare the two files like this
>>> import filecmp
>>> filecmp.cmp('file1.txt', 'file1.txt')
True
>>> filecmp.cmp('file1.txt', 'file2.txt')
False
to open the files with the file:// syntax use URLLIB
>>> import urllib
>>> file1 = urllib.urlopen('file://localhost/tmp/test')
for the normal files path use the standard file open.
>>> file2 = open('/pathtofile','r')
I agree completely with Christian, you should re-think structure of the lists, but the below should get you going.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication5
{
class Program
{
public static List<string> SanitiseList(List<string> list)
{
List<string> sanitisedList = new List<string>();
foreach (string filename in list)
{
String sanitisedFilename = String.Empty;
if (!String.IsNullOrEmpty(filename))
{
sanitisedFilename = filename;
// get rid of the URL encoding
sanitisedFilename = Uri.UnescapeDataString(sanitisedFilename);
// first of all change all back-slashes to forward slashes
sanitisedFilename = sanitisedFilename.Replace(@"\", "/");
// if we have two forward slashes at the beginning, assume it's localhost
if (sanitisedFilename.StartsWith("//"))
{
// remove these first double slashes and stick in localhost
sanitisedFilename = sanitisedFilename.TrimStart('/');
sanitisedFilename = "//localhost" + "/" + sanitisedFilename;
}
// remove the file: scheme
sanitisedFilename = sanitisedFilename.Replace("file://", "//");
// remove double back-slashes
sanitisedFilename = sanitisedFilename.Replace(@"\\", @"\");
// remove double forward-slashes (but not the first two)
sanitisedFilename = sanitisedFilename.Substring(0,2) + sanitisedFilename.Substring(2).Replace("//", "/");
}
if (!String.IsNullOrEmpty(sanitisedFilename))
{
sanitisedList.Add(sanitisedFilename);
}
}
return sanitisedList;
}
static void Main(string[] args)
{
List<string> listA = new List<string>();
List<string> listB = new List<string>();
listA.Add("file://localhost//FILE/Musik/BritneySpears.mp3");
listA.Add("file://localhost//FILE/Musik/30%20Seconds%20To%20Mars.mp3");
listB.Add("file://localhost//FILE/Musik/120%20Seconds%20To%20Mars.mp3");
listB.Add(#"\\FILE\Musik\30 Seconds To Mars.mp3");
listB.Add(#"\\FILE\Musik\5 Seconds To Mars.mp3");
listA = SanitiseList(listA);
listB = SanitiseList(listB);
List<string> missingFromA = listB.Except(listA).ToList();
List<string> missingFromB = listA.Except(listB).ToList();
}
}
}

finding a number in filename using regex

I don't have much experience with regexes and I wanted to rectify that. I decided to build an application that takes a directory name and scans all files in it. The files all have an increasing serial number but differ subtly in their file names (example: episode01.mp4, episode_02.mp4, episod03.mp4, episode04.rmvb etc.).
The application should scan the directory, find the number in each file name and rename the file, along with the extension, to a common format (episode01.mp4, episode02.mp4, episode03.mp4, episode04.rmvb etc.).
I have the following code:
Dictionary<string, string> renameDictionary = new Dictionary<string,string>();
DirectoryInfo dInfo = new DirectoryInfo(path);
string newFormat = "Episode{0}.{1}";
Regex regex = new Regex(@".*?(?<no>\d+).*?\.(?<ext>.*)"); // look for a number (before the dot) and the extension
foreach (var file in dInfo.GetFiles())
{
string fileName = file.Name;
var match = regex.Match(fileName);
if (match != null)
{
GroupCollection gc = match.Groups;
//Console.WriteLine("Number : {0}, Extension : {2} found in {1}.", gc["no"], fileName,gc["ext"]);
renameDictionary[fileName] = string.Format(newFormat, gc["no"], gc["ext"]);
}
}
foreach (var renamePair in renameDictionary)
{
Console.WriteLine("{0} will be renamed to {1}.", renamePair.Key, renamePair.Value);
//stuff for renaming here
}
One problem in this code is that it also includes files which don't have numbers in the renameDictionary. It would also be helpful if you could point out any other gotchas that I should be careful about.
PS: I am assuming that the filenames will only contain numbers corresponding to serial (nothing like cam7_0001.jpg)
The simplest solution is probably to use Path.GetFileNameWithoutExtension to get the file name, and then the regex \d+$ to get the number at its end (or Path.GetExtension and \d+ to get the number anywhere).
You can also achieve this in a single replace:
Regex.Replace(fileName, @".*?(\d+).*(\.[^.]+)$", "Episode$1$2")
This regex is a bit better, in that it forces the extension not to contain dots.
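As for files without numbers ending up in the dictionary: Regex.Match never returns null, so the null check in the question always passes, and Match.Success is the property to test. The following is a small sketch of the dictionary-building step with that check; PlanRenames is a hypothetical name and the sample file names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same idea as the single-replace pattern, but with named groups.
var episodeRegex = new Regex(@"^.*?(?<no>\d+).*(?<ext>\.[^.]+)$");

// Hypothetical helper: maps each matching file name to its new name and
// skips names for which the pattern does not match.
Dictionary<string, string> PlanRenames(IEnumerable<string> fileNames)
{
    var plan = new Dictionary<string, string>();
    foreach (string fileName in fileNames)
    {
        Match match = episodeRegex.Match(fileName);
        if (!match.Success)
        {
            continue; // no number before the extension: leave this file alone
        }
        string number = match.Groups["no"].Value;
        string extension = match.Groups["ext"].Value; // includes the leading dot
        plan[fileName] = "Episode" + number + extension;
    }
    return plan;
}

var renamePlan = PlanRenames(new[] { "episode_02.mp4", "episod03.mp4", "episode04.rmvb", "notes.txt" });
foreach (var pair in renamePlan)
{
    Console.WriteLine("{0} will be renamed to {1}.", pair.Key, pair.Value);
}
```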

Regex to parse out filename and partial path, conditionally

I have a C# app that uses the search functions to find all files in a directory, then shows them in a list. I need to be able to filter the files based on extension (possible using the search function) and directory (eg, block any in the "test" or "debug" directories from showing up).
My current code is something like:
Regex filter = new Regex(@"^docs\\(?!debug\\)(?'display'.*)\.(txt|rtf)");
String[] filelist = Directory.GetFiles("docs\\", "*", SearchOption.AllDirectories);
foreach ( String file in filelist )
{
Match m = filter.Match(file);
if ( m.Success )
{
listControl.Items.Add(m.Groups["display"]);
}
}
(that's somewhat simplified and consolidated, the actual regex is created from a string read from a file and I do more error checking in between.)
I need to be able to pick out a section (usually a relative path and filename) to be used as the display name, while ignoring any files with a particular foldername as a section of their path. For example, for these files, only ones with +s should match:
+ docs\info.txt
- docs\data.dat
- docs\debug\info.txt
+ docs\world\info.txt
+ docs\world\pictures.rtf
- docs\world\debug\symbols.rtf
My regex works for most of those, except I'm not sure how to make it fail on the last file. Any suggestions on how to make this work?
Try Directory.GetFiles. This should do what you want.
Example:
// Only get files that end in ".txt"
string[] dirs = Directory.GetFiles(@"c:\", "*.txt", SearchOption.AllDirectories);
Console.WriteLine("The number of files ending with .txt is {0}.", dirs.Length);
foreach (string dir in dirs)
{
Console.WriteLine(dir);
}
^docs\\(?:(?!\bdebug\\).)*\.(?:txt|rtf)$
will match a string that
starts with docs\,
does not contain debug\ anywhere (the \b anchor ensures that we match debug as an entire word), and
ends with .txt or .rtf.
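To keep the display name from the question, the same pattern can be wrapped in a named group. The following is a sketch that runs against the sample paths from the question as plain strings; DisplayNames is an illustrative helper, and the group covers everything between docs\ and the extension:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The pattern from above, with the middle part captured as "display".
var filter = new Regex(@"^docs\\(?<display>(?:(?!\bdebug\\).)*)\.(?:txt|rtf)$");

// Hypothetical helper: returns the display names of all paths that pass the filter.
List<string> DisplayNames(IEnumerable<string> paths)
{
    var names = new List<string>();
    foreach (string path in paths)
    {
        Match m = filter.Match(path);
        if (m.Success)
        {
            names.Add(m.Groups["display"].Value);
        }
    }
    return names;
}

var shown = DisplayNames(new[]
{
    @"docs\info.txt",
    @"docs\data.dat",
    @"docs\debug\info.txt",
    @"docs\world\info.txt",
    @"docs\world\pictures.rtf",
    @"docs\world\debug\symbols.rtf",
});
foreach (string name in shown)
{
    Console.WriteLine(name);
}
```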
