Better way to detect file differences between 2 directories? - c#

I made some C# functions to roughly "diff" 2 directories, similar to KDiff3.
First this function compares file names between directories. Any difference in file names implies a file has been added to dir1:
public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
List<string> dir1FileNames = Directory
.EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir1 + "\\", ""))
.ToList();
List<string> dir2FileNames = Directory
.EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir2 + "\\", ""))
.ToList();
List<string> diffs = dir1FileNames.Except(dir2FileNames).Distinct().ToList();
return diffs;
}
Second this function compares file sizes for file names which exist in both directories. Any difference in file size implies some edit has been made:
public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
//Get list of file paths, relative to the base dir1/dir2 directories
List<string> dir1FileNames = Directory
.EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir1 + "\\", ""))
.ToList();
List<string> dir2FileNames = Directory
.EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir2 + "\\", ""))
.ToList();
List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).Distinct().ToList();
//Get list of file sizes corresponding to file paths
List<long> dir1FileSizes = sharedFileNames
.Select(s =>
new FileInfo(dir1 + "\\" + s) //Create the full file path as required for FileInfo objects
.Length).ToList();
List<long> dir2FileSizes = sharedFileNames
.Select(s =>
new FileInfo(dir2 + "\\" + s) //Create the full file path as required for FileInfo objects
.Length).ToList();
List<string> changedFiles = new List<string>();
for (int i = 0; i < sharedFileNames.Count; i++)
{
//If file sizes are different, there must have been a change made to one of the files.
if (dir1FileSizes[i] != dir2FileSizes[i])
{
changedFiles.Add(sharedFileNames[i]);
}
}
return changedFiles;
}
Lastly combining the results gives a list of all files which have been added/edited between the directories:
List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();
This approach generally works but feels sloppy and also would fail for the "binary equal" case where a file is modified but still has the same size. Any suggestions on a better way?

You could use System.Security.Cryptography.MD5 to calculate an MD5 hash for each file and compare these.
E.g. using this Method:
public static string GetMd5Hash(string path)
{
using (var md5 = MD5.Create())
{
using (var stream = File.OpenRead(path))
{
var hash = md5.ComputeHash(stream);
return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
}
}
}
This may take a little more time than getting values from FileInfo (depending on the number of files to compare), but equal hashes tell you that the files are, for practical purposes, binary identical.
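Hashing every file can get expensive on large trees, so one option is to compare sizes first and hash only when the sizes match. The following is a minimal sketch of that idea, not a definitive implementation. The names DirectoryDiff, HashFile, RelativeFiles and AddedOrChanged are hypothetical, and it assumes Path.GetRelativePath is available (.NET Core 2.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DirectoryDiff
{
    // Hex-encoded MD5 of a file's contents (same idea as GetMd5Hash above).
    private static string HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream));
        }
    }

    // Paths of all files below root, relative to root.
    private static HashSet<string> RelativeFiles(string root)
    {
        return new HashSet<string>(
            Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                     .Select(f => Path.GetRelativePath(root, f)));
    }

    // Files that exist only in dir1, plus files present in both whose contents differ.
    public static List<string> AddedOrChanged(string dir1, string dir2)
    {
        var files1 = RelativeFiles(dir1);
        var files2 = RelativeFiles(dir2);
        var result = files1.Except(files2).ToList();

        foreach (var rel in files1.Intersect(files2))
        {
            var a = Path.Combine(dir1, rel);
            var b = Path.Combine(dir2, rel);
            // Different sizes already prove a change; hash only when sizes match.
            if (new FileInfo(a).Length != new FileInfo(b).Length || HashFile(a) != HashFile(b))
            {
                result.Add(rel);
            }
        }
        return result;
    }
}
```

Calling DirectoryDiff.AddedOrChanged(dir1, dir2) returns relative paths, so the result can be combined with either base directory afterwards.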

Related

Sorting files according to creation time and storing in string array is possible?

I am trying to sort the files in a specific directory by their creation time and store them in a string array. But I am getting the error "Cannot implicitly convert type 'System.IO.FileInfo[]' to 'string[]'". Is it not possible to store the data in a string array?
Here is my code:
string[] getFiles(string path, string text, string fileExtension)
{
try
{
string searchingText = text;
searchingText = "*" + text + "*";
string[] filesArray = Directory.GetFiles(path, searchingText, SearchOption.TopDirectoryOnly).Select(f => new FileInfo(f)).OrderBy(f => f.CreationTime).ToArray();
//filesArray = filesArray.OrderBy(s => s.).ToArray();
//string[] filesArray2 = Array.Sort(filesArray);
List<string> filesList = new List<string>(filesArray);
List<string> newFilesList = new List<string>();
foreach(string file in filesList)
{
if ( file.Contains(fileExtension) == true)
{
newFilesList.Add(file);
}
}
string[] files = newFilesList.ToArray();
return files;
}
catch
{
string[] files = new string[0];
return files;
}
}
Where are you getting that error? The error is pretty self-explanatory: your function returns an array of strings, and you cannot simply cast a string to a FileInfo object. In order to get a FileInfo object, use:
var fi = new FileInfo(fileName);
However, if all you want is to get a sorted list, I'd go about this differently, for example:
var folder = @"C:\temp";
var files = Directory
.EnumerateFiles(folder)
.OrderBy(x => File.GetCreationTime(x))
.ToArray();
This will give you a list of strings holding file names, sorted by their creation date.
Edit:
If you'd only want files with a given extension, you could expand the LINQ query:
var files = Directory
.EnumerateFiles(folder)
.Where(x => x.EndsWith(".ext"))
.OrderBy(x => File.GetCreationTime(x))
.ToArray();
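Note that EndsWith(".ext") is case-sensitive, so a file named report.EXT would be skipped. As a rough sketch of one alternative, the extension can be compared case-insensitively through Path.GetExtension. The class and method names here (FileSorting, ByCreationTime) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileSorting
{
    // Paths of files in folder with the given extension (e.g. ".txt"), oldest first.
    public static string[] ByCreationTime(string folder, string extension)
    {
        return Directory
            .EnumerateFiles(folder)
            // Compare the real extension, ignoring case
            .Where(f => string.Equals(Path.GetExtension(f), extension, StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => File.GetCreationTimeUtc(f))
            .ToArray();
    }
}
```

The result is still a string[] of full paths, so it can be returned directly from a method like getFiles in the question.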

C# detect extraneous files

I have folder with these files:
image1.png
image2.png
image3.png
image4.png
image5.png
I need to check whether any extraneous files exist in this folder. For example, if I create example.file.css, I need to raise an error: only the files listed above may be present. So I've created an array of the allowed file names:
string[] only_these_files = {
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
Now I need to search for extraneous files, but how? Thanks in advance.
Use Directory.GetFiles:
https://msdn.microsoft.com/en-us/library/07wt70x2(v=vs.110).aspx
And compare with your list of allowed files.
string[] only_these_files = {
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
string[] fileEntries = Directory.GetFiles(targetDirectory);
List<String> badFiles = new List<string>();
foreach (string fileName in fileEntries)
// Directory.GetFiles returns full paths, so compare only the file name
if (!only_these_files.Contains(Path.GetFileName(fileName)))
{
badFiles.Add(fileName);
}
This would be my implementation with the use of a lil' LINQ
var onlyAllowedFiles = new List<string>
{
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
var path = "";
var files = Directory.GetFiles(path);
// GetFiles returns full paths, so compare only the file name
var nonAllowedFiles = files.Where(f => onlyAllowedFiles.Contains(Path.GetFileName(f)) == false);
Or alternatively if you wish to only detect the presence of illegal files.
var errorState = files.Any(f => onlyAllowedFiles.Contains(Path.GetFileName(f)) == false);
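Both answers come down to comparing file names against an allow-list. A hedged sketch of the same check with a case-insensitive HashSet follows; the names FolderCheck and ExtraneousFiles are hypothetical, and ignoring case is an assumption based on how Windows treats file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FolderCheck
{
    // Names of files in folder that are not in the allowed list.
    public static List<string> ExtraneousFiles(string folder, IEnumerable<string> allowedNames)
    {
        // Case-insensitive set, assuming Windows-style file name matching
        var allowed = new HashSet<string>(allowedNames, StringComparer.OrdinalIgnoreCase);
        return Directory
            .EnumerateFiles(folder)
            .Select(f => Path.GetFileName(f))   // strip the directory part before comparing
            .Where(name => !allowed.Contains(name))
            .ToList();
    }
}
```

An empty result means the folder contains nothing outside the list; any entries are the files to report.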

Copy files from the latest sub directory in c#

I have been struggling with this for a few days now and cannot figure it out.
I need to copy files from the last created sub directory in a directory, the sub directory has a few sub directories as well to navigate before I get to the files and that is where the problem comes in.
I hope I made this clear, I will give an example of the directories below, thanks in advance for the help.
C:\ProgramFiles\BuildOutput\mmh\LongTerm\**49**\release\MarketMessageHandler\Service\
The number highlighted in bold is the sub directory that I need to find the latest one and in the services folder is where I need to copy the files from...
Here is my code I tried
string sourceDir = @"\sttbedbsd001\BuildOutput\mmh\LongTerm\51\release\MarketMessageHandler\Service";
string target = @"C:\Users\gwessels\Desktop\test\";
string[] sDirFiles = Directory.GetFiles(sourceDir, "*", SearchOption.TopDirectoryOnly);
string targetDir;
if (sDirFiles.Length > 0)
{
foreach (string file in sDirFiles)
{
string[] splitFile = file.Split('\\');
string copyFile = Path.GetFileName(file);
string source = sourceDir + "\\" + copyFile;
targetDir = target + copyFile;
try
{
if (File.Exists(targetDir))
{
File.Delete(targetDir);
File.Copy(source, targetDir);
}
else
{
File.Copy(source, targetDir);
}
}
catch (Exception e)
{
Console.WriteLine(e);
}
}
}
Assuming that the LongTerm directory is known because it is stored somewhere (e.g. in the application settings):
string longTermDirectory = Properties.Settings.Default.LongTermDirectory;
DirectoryInfo dir = new DirectoryInfo(longTermDirectory);
dir.Create(); // does nothing if it already exists
int Number = int.MinValue;
DirectoryInfo latestFolder = dir.EnumerateDirectories("*.*", SearchOption.AllDirectories)
.Where(d => int.TryParse(d.Name, out Number))
.Select(Directory => new { Directory, Number })
.OrderByDescending(x => x.Number)
.Select(x => x.Directory)
.First();
DirectoryInfo.EnumerateDirectories with SearchOption.AllDirectories enumerates all directories recursively. Enumerable.OrderByDescending on the number parsed from the directory name orders them numerically, highest first (so 50 before 49 and 100 before 99).
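The query above relies on a single out variable shared between Where and Select. As a sketch of a variant that parses each name into its own local and then copies the files, something like the following could work. LatestBuild, FindLatest and CopyFiles are hypothetical names, and it assumes the numbered folders sit directly under the LongTerm directory, as in the example path from the question.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LatestBuild
{
    // Highest-numbered direct subdirectory of root, or null if none has a numeric name.
    public static DirectoryInfo FindLatest(string root)
    {
        return new DirectoryInfo(root)
            .EnumerateDirectories()
            .Select(d =>
            {
                int n;
                bool ok = int.TryParse(d.Name, out n);
                return new { Dir = d, IsNumber = ok, Number = n };
            })
            .Where(x => x.IsNumber)
            .OrderByDescending(x => x.Number)
            .Select(x => x.Dir)
            .FirstOrDefault();
    }

    // Copy every file directly inside sourceDir to targetDir, overwriting existing copies.
    public static void CopyFiles(string sourceDir, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        foreach (var file in Directory.EnumerateFiles(sourceDir))
        {
            File.Copy(file, Path.Combine(targetDir, Path.GetFileName(file)), overwrite: true);
        }
    }
}
```

With the layout from the question, the source folder would be Path.Combine(latest.FullName, "release", "MarketMessageHandler", "Service"), where latest is the result of FindLatest. Passing overwrite: true to File.Copy removes the need for the File.Exists and File.Delete pair.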

Fast FileSize Compare with Linq

I have two file directories and I want to be sure both are identical. To do that, I wrote a query that puts all files into one FileInfo array. I group the files by file name and now want to compare the two files in each group by their 'LastWriteAccess' and 'Length'.
But, to be honest, the way I do this is far too slow. Any idea how I could use LINQ to compare the lengths of the files within a group and do something if they differ?
...
FileInfo[] fiArrOri5 = d5ori.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
FileInfo[] fiArrNew5 = d5new.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
FileInfo[] AllResults = new FileInfo[fiArrNew5.Length+fiArrOri5.Length];
fiArrNew5.CopyTo(AllResults, 0);
fiArrOri5.CopyTo(AllResults, fiArrNew5.Length);
var duplicateGroups = AllResults.GroupBy(file => file.Name);
foreach (var group in duplicateGroups)
{
AnzahlElemente = group.Count();
if (AnzahlElemente == 2)
{
if (group.ElementAt(0).Length != group.ElementAt(1).Length)
{
// do sth
}
}
...
}
EDIT:
if i run only the following snippet, it runs super fast. (~00:00:00:0005156)
Console.WriteLine(group.ElementAt(0).LastWriteTime);
if i run only the following snippet, it runs super slow. (~00:00:00:0750000)
Console.WriteLine(group.ElementAt(1).LastWriteTime);
Any Idea why ?
I'm not sure this will be faster - but this is how I would have done this:
var folderPathOne = "FolderPath1";
var folderPathTwo = "FolderPath2";
//Get all the filenames from dir 1
var directoryOne = Directory
.EnumerateFiles(folderPathOne, "*.*", SearchOption.TopDirectoryOnly)
.Select(Path.GetFileName);
//Get all the filenames from dir 2
var directoryTwo = Directory
.EnumerateFiles(folderPathTwo, "*.*", SearchOption.TopDirectoryOnly)
.Select(Path.GetFileName);
//Get only the files that appear in both directories
var filesToCheck = directoryOne.Intersect(directoryTwo);
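// Path.Combine inserts the directory separator that plain concatenation would miss
var differentFiles = filesToCheck.Where(f => new FileInfo(Path.Combine(folderPathOne, f)).Length != new FileInfo(Path.Combine(folderPathTwo, f)).Length);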
foreach(var differentFile in differentFiles)
{
//Do something
}
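Another way to express the same comparison is to index one directory by file name and look each file of the other directory up in that index, reusing the FileInfo objects that the enumeration already returns. This is only a sketch and has not been measured against the versions above; SizeCompare and DifferentSizes are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SizeCompare
{
    // Names of files that exist in both directories but have different lengths.
    public static List<string> DifferentSizes(string dirOne, string dirTwo)
    {
        // Index the second directory: file name -> length
        var second = new DirectoryInfo(dirTwo)
            .EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            .ToDictionary(f => f.Name, f => f.Length);

        return new DirectoryInfo(dirOne)
            .EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            .Where(f => second.TryGetValue(f.Name, out long otherLength) && otherLength != f.Length)
            .Select(f => f.Name)
            .ToList();
    }
}
```

Files that exist in only one of the two directories are ignored here, matching the Intersect step in the answer.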

C#:Getting all image files in folder

I am trying to get all images from a folder, but this folder also includes subfolders, like /photos/person1/ and /photos/person2/. I can get the photos in a single folder like this:
path= System.IO.Directory.GetCurrentDirectory() + "/photo/" + groupNO + "/";
public List<String> GetImagesPath(String folderName)
{
DirectoryInfo Folder;
FileInfo[] Images;
Folder = new DirectoryInfo(folderName);
Images = Folder.GetFiles();
List<String> imagesList = new List<String>();
for (int i = 0; i < Images.Length; i++)
{
imagesList.Add(String.Format(@"{0}/{1}", folderName, Images[i].Name));
// Console.WriteLine(String.Format(@"{0}/{1}", folderName, Images[i].Name));
}
return imagesList;
}
But how can I get all photos in all sub folders? I mean I want to get all photos in /photo/ directory at once.
Have a look at the DirectoryInfo.GetFiles overload that takes a SearchOption argument and pass SearchOption.AllDirectories to get the files including all sub-directories.
Another option is to use Directory.GetFiles which has an overload that takes a SearchOption argument as well:
return Directory.GetFiles(folderName, "*.*", SearchOption.AllDirectories)
.ToList();
I'm using GetFiles wrapped in method like below:
public static String[] GetFilesFrom(String searchFolder, String[] filters, bool isRecursive)
{
List<String> filesFound = new List<String>();
var searchOption = isRecursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
foreach (var filter in filters)
{
filesFound.AddRange(Directory.GetFiles(searchFolder, String.Format("*.{0}", filter), searchOption));
}
return filesFound.ToArray();
}
It's easy to use:
String searchFolder = @"C:\MyFolderWithImages";
var filters = new String[] { "jpg", "jpeg", "png", "gif", "tiff", "bmp", "svg" };
var files = GetFilesFrom(searchFolder, filters, false);
There's a good one-liner solution for this on a similar thread:
get all files recursively then filter file extensions with LINQ
Or if LINQ cannot be used, then use a RegEx to filter file extensions:
var files = Directory.GetFiles("C:\\path", "*.*", SearchOption.AllDirectories);
List<string> imageFiles = new List<string>();
foreach (string filename in files)
{
if (Regex.IsMatch(filename, @"\.jpg$|\.png$|\.gif$"))
imageFiles.Add(filename);
}
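The LINQ variant mentioned in this answer is only linked, not shown. One way it might look is sketched below: enumerate everything recursively, then keep the paths whose extension is in a case-insensitive set. ImageSearch, FindImages and the particular extension list are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ImageSearch
{
    // Extensions treated as images; extend as needed.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    // Full paths of all image files below root, including subfolders.
    public static List<string> FindImages(string root)
    {
        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(f => ImageExtensions.Contains(Path.GetExtension(f)))
            .ToList();
    }
}
```

Unlike the regular expression above, this also matches upper-case extensions such as .JPG.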
I found a solution that might work:
foreach (string img in Directory.GetFiles(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),"*.bmp" + "*.jpg" + "SO ON"))
You need the recursive form of GetFiles:
DirectoryInfo.GetFiles(pattern, searchOption);
(specify AllDirectories as the SearchOption)
Here's a link for more information:
MSDN: DirectoryInfo.GetFiles
This allows you to use the same syntax and functionality as Directory.GetFiles(path, pattern, options); except with an array of patterns instead of just one.
So you can also use it to do tasks like find all files that contain the word "taxes" that you may have used to keep records over the past year (xlsx, xls, odf, csv, tsv, doc, docx, pdf, txt...).
public static class CustomDirectoryTools {
public static string[] GetFiles(string path, string[] patterns = null, SearchOption options = SearchOption.TopDirectoryOnly) {
if(patterns == null || patterns.Length == 0)
return Directory.GetFiles(path, "*", options);
if(patterns.Length == 1)
return Directory.GetFiles(path, patterns[0], options);
return patterns.SelectMany(pattern => Directory.GetFiles(path, pattern, options)).Distinct().ToArray();
}
}
In order to get all image files on your c drive you would implement it like this.
string path = @"C:\";
string[] patterns = new[] {"*.jpg", "*.jpeg", "*.jpe", "*.jif", "*.jfif", "*.jfi", "*.webp", "*.gif", "*.png", "*.apng", "*.bmp", "*.dib", "*.tiff", "*.tif", "*.svg", "*.svgz", "*.ico", "*.xbm"};
string[] images = CustomDirectoryTools.GetFiles(path, patterns, SearchOption.AllDirectories);
You can use GetFiles
GetFiles("*.jpg", SearchOption.AllDirectories)
GetFiles("*.jpg", SearchOption.AllDirectories) has a problem on Windows 7. If you set the directory to c:\users\user\documents\, it throws an exception: for compatibility with Windows XP, Windows 7 has links like Music and Pictures in the Documents folder, but these folders don't really exist, so accessing them throws. It is better to recurse manually with try..catch.
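On newer runtimes there is also an overload of Directory.EnumerateFiles that takes an EnumerationOptions object, which can skip inaccessible entries instead of throwing. The sketch below assumes .NET Core 2.1 or later (the type does not exist in .NET Framework); SafeSearch and FindJpegs are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SafeSearch
{
    // Full paths of all *.jpg files below root, skipping folders that cannot be read.
    public static List<string> FindJpegs(string root)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,  // descend into subfolders
            IgnoreInaccessible = true      // skip entries that would cause access-denied errors
        };
        return Directory.EnumerateFiles(root, "*.jpg", options).ToList();
    }
}
```

By default EnumerationOptions also skips hidden and system entries (its AttributesToSkip property), which may or may not be what you want.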
This will get a list of all images from a folder and its subfolders, and it also takes care of the long file name exception on Windows.
// To handle long folder names, the external Pri.LongPath library is used.
// Source https://github.com/peteraritchie/LongPath
using Directory = Pri.LongPath.Directory;
using DirectoryInfo = Pri.LongPath.DirectoryInfo;
using File = Pri.LongPath.File;
using FileInfo = Pri.LongPath.FileInfo;
using Path = Pri.LongPath.Path;
// Recursively collect the full paths of files matching searchname under dr
public List<string> DirectoryTree(DirectoryInfo dr, string searchname)
{
    var allFiles = new List<string>();
    try
    {
        foreach (FileInfo fi in dr.GetFiles(searchname))
        {
            allFiles.Add(fi.DirectoryName + "\\" + fi.Name);
        }
        foreach (DirectoryInfo di in dr.GetDirectories())
        {
            allFiles.AddRange(DirectoryTree(di, searchname));
        }
    }
    catch (Exception ex)
    {
        // Skip a directory that cannot be read and keep the results gathered so far
        Console.WriteLine(ex.Message);
    }
    return allFiles;
}
public List<String> GetImagesPath(String folderName)
{
    var dr = new DirectoryInfo(folderName);
    string ImagesExtensions = "jpg,jpeg,jpe,jfif,png,gif,bmp,dib,tif,tiff";
    string[] imageValues = ImagesExtensions.Split(',');
    List<String> imagesList = new List<String>();
    foreach (var type in imageValues)
    {
        if (!string.IsNullOrEmpty(type.Trim()))
        {
            // Add every match for this extension to the output list
            imagesList.AddRange(DirectoryTree(dr, "*." + type.Trim()));
        }
    }
    return imagesList;
}
var files = new DirectoryInfo(path).GetFiles("File")
.OrderByDescending(f => f.LastWriteTime).First();
This should give you the file that matches the search pattern and was modified most recently.
