Fast FileSize Compare with Linq - c#

I have two file directories and I want to be sure both are identical. To do that, I created a query that puts all files into one FileInfo array. I grouped the files by file name and now want to compare, for every group, both files by their 'LastWriteTime' and 'Length'.
But, to be honest, the way I do this is far too slow. Any idea how I could use LINQ to compare the files within a group by their Length and do something if they are different?
...
FileInfo[] fiArrOri5 = d5ori.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
FileInfo[] fiArrNew5 = d5new.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
FileInfo[] AllResults = new FileInfo[fiArrNew5.Length+fiArrOri5.Length];
fiArrNew5.CopyTo(AllResults, 0);
fiArrOri5.CopyTo(AllResults, fiArrNew5.Length);
var duplicateGroups = AllResults.GroupBy(file => file.Name);
foreach (var group in duplicateGroups)
{
AnzahlElemente = group.Count();
if (AnzahlElemente == 2)
{
if (group.ElementAt(0).Length != group.ElementAt(1).Length)
{
// do sth
}
}
...
}
EDIT:
If I run only the following snippet, it runs super fast (~00:00:00:0005156):
Console.WriteLine(group.ElementAt(0).LastWriteTime);
If I run only the following snippet, it runs super slow (~00:00:00:0750000):
Console.WriteLine(group.ElementAt(1).LastWriteTime);
Any idea why?

I'm not sure this will be faster - but this is how I would have done this:
var folderPathOne = "FolderPath1";
var folderPathTwo = "FolderPath2";
//Get all the filenames from dir 1
var directoryOne = Directory
.EnumerateFiles(folderPathOne, "*.*", SearchOption.TopDirectoryOnly)
.Select(Path.GetFileName);
//Get all the filenames from dir 2
var directoryTwo = Directory
.EnumerateFiles(folderPathTwo, "*.*", SearchOption.TopDirectoryOnly)
.Select(Path.GetFileName);
//Get only the files that appear in both directories
var filesToCheck = directoryOne.Intersect(directoryTwo);
var differentFiles = filesToCheck.Where(f => new FileInfo(Path.Combine(folderPathOne, f)).Length != new FileInfo(Path.Combine(folderPathTwo, f)).Length);
foreach(var differentFile in differentFiles)
{
//Do something
}
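A variation on the same idea, as a rough sketch: index one directory by file name in a dictionary, then look up each file of the other directory and compare both Length and LastWriteTimeUtc, which the question also asks about. The names DirectoryCompare and FindChangedFiles are made up for illustration, and the case-insensitive comparer assumes Windows-style file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectoryCompare
{
    // Hypothetical helper: returns the names of files that exist in both
    // directories but differ in Length or LastWriteTimeUtc.
    public static List<string> FindChangedFiles(string originalDir, string newDir)
    {
        // Index the first directory by file name for constant-time lookups.
        // Assumption: names are unique when compared case-insensitively.
        Dictionary<string, FileInfo> originals = new DirectoryInfo(originalDir)
            .EnumerateFiles("*", SearchOption.TopDirectoryOnly)
            .ToDictionary(f => f.Name, StringComparer.OrdinalIgnoreCase);

        var changed = new List<string>();
        foreach (FileInfo candidate in new DirectoryInfo(newDir)
            .EnumerateFiles("*", SearchOption.TopDirectoryOnly))
        {
            // Only files present in both directories are compared.
            if (originals.TryGetValue(candidate.Name, out FileInfo original)
                && (original.Length != candidate.Length
                    || original.LastWriteTimeUtc != candidate.LastWriteTimeUtc))
            {
                changed.Add(candidate.Name);
            }
        }
        return changed;
    }
}
```

Each directory is enumerated only once here, and no grouping or ElementAt calls are needed.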

Related

Better way to detect file differences between 2 directories?

I made some C# functions to roughly "diff" 2 directories, similar to KDiff3.
First this function compares file names between directories. Any difference in file names implies a file has been added to dir1:
public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
List<string> dir1FileNames = Directory
.EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir1 + "\\", ""))
.ToList();
List<string> dir2FileNames = Directory
.EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir2 + "\\", ""))
.ToList();
List<string> diffs = dir1FileNames.Except(dir2FileNames).Distinct().ToList();
return diffs;
}
Second this function compares file sizes for file names which exist in both directories. Any difference in file size implies some edit has been made:
public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
//Get list of file paths, relative to the base dir1/dir2 directories
List<string> dir1FileNames = Directory
.EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir1 + "\\", ""))
.ToList();
List<string> dir2FileNames = Directory
.EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
.Select(Path.GetFullPath)
.Select(entry => entry.Replace(dir2 + "\\", ""))
.ToList();
List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).Distinct().ToList();
//Get list of file sizes corresponding to file paths
List<long> dir1FileSizes = sharedFileNames
.Select(s =>
new FileInfo(dir1 + "\\" + s) //Create the full file path as required for FileInfo objects
.Length).ToList();
List<long> dir2FileSizes = sharedFileNames
.Select(s =>
new FileInfo(dir2 + "\\" + s) //Create the full file path as required for FileInfo objects
.Length).ToList();
List<string> changedFiles = new List<string>();
for (int i = 0; i < sharedFileNames.Count; i++)
{
//If file sizes are different, there must have been a change made to one of the files.
if (dir1FileSizes[i] != dir2FileSizes[i])
{
changedFiles.Add(sharedFileNames[i]);
}
}
return changedFiles;
}
Lastly combining the results gives a list of all files which have been added/edited between the directories:
List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();
This approach generally works but feels sloppy and also would fail for the "binary equal" case where a file is modified but still has the same size. Any suggestions on a better way?
You could use System.Security.Cryptography.MD5 to calculate an MD5 hash for each file and compare these.
E.g. using this Method:
public static string GetMd5Hash(string path)
{
using (var md5 = MD5.Create())
{
using (var stream = File.OpenRead(path))
{
var hash = md5.ComputeHash(stream);
return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
}
}
}
This may take a little more time than getting values from FileInfo (depending on the number of files to compare), but you can be sure whether the files are binary identical.
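To keep the cost down, the size check and the hash check can be combined: hash only when the sizes already match. This is just a sketch under that assumption; FileContentCompare and HaveSameContent are illustrative names, not part of any library.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FileContentCompare
{
    // Hypothetical helper: true when both files have the same size and the same MD5 hash.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        // Files of different sizes can never be binary identical, so skip hashing.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
        {
            return false;
        }
        return ComputeMd5(pathA).SequenceEqual(ComputeMd5(pathB));
    }

    private static byte[] ComputeMd5(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }
}
```

With this, a file that was edited but kept its size is still detected, while files with different sizes are rejected without being read.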

C# detect extraneous files

I have folder with these files:
image1.png
image2.png
image3.png
image4.png
image5.png
I need to check whether extraneous files exist in this folder. For example, if I create example.file.css, I need to raise an error; the folder must contain only the files listed above. So I've created an array of the required file names:
string[] only_these_files = {
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
Now I need to search for extraneous files, but how? Thanks in advance.
Use Directory.GetFiles:
https://msdn.microsoft.com/en-us/library/07wt70x2(v=vs.110).aspx
And compare with your list of allowed files.
string[] only_these_files = {
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
string[] fileEntries = Directory.GetFiles(targetDirectory);
List<String> badFiles = new List<string>();
foreach (string filePath in fileEntries)
// Directory.GetFiles returns full paths, so compare only the file name
if (!only_these_files.Contains(Path.GetFileName(filePath)))
{
badFiles.Add(filePath);
}
This would be my implementation with the use of a lil' LINQ
var onlyAllowedFiles = new List<string>
{
"image1.png",
"image2.png",
"image3.png",
"image4.png",
"image5.png"
};
var path = "";
var files = Directory.GetFiles(path);
var nonAllowedFiles = files.Where(f => !onlyAllowedFiles.Contains(Path.GetFileName(f)));
Or alternatively if you wish to only detect the presence of illegal files.
var errorState = files.Any(f => !onlyAllowedFiles.Contains(Path.GetFileName(f)));
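Since Directory.GetFiles returns full paths while the allowed list holds bare names, the comparison has to go through Path.GetFileName. A small sketch of that check with a HashSet for the lookups follows; ExtraneousFileCheck and FindExtraneousFiles are hypothetical names, and the case-insensitive comparison is an assumption that fits Windows file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtraneousFileCheck
{
    // Hypothetical helper: returns the full paths of files whose names are not in the allowed list.
    public static List<string> FindExtraneousFiles(string directory, IEnumerable<string> allowedNames)
    {
        // Assumption: file names are compared case-insensitively.
        var allowed = new HashSet<string>(allowedNames, StringComparer.OrdinalIgnoreCase);
        return Directory.EnumerateFiles(directory)
            .Where(path => !allowed.Contains(Path.GetFileName(path)))
            .ToList();
    }
}
```

An empty result means the folder contains nothing beyond the allowed files.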

List<string>.contains with wildcards c#

I'm trying to match a file name to a partial string in a List. The list will have something like 192 in it and the matching file will be xxx192.dat. Using .Contains does not match the file name to the string in the List. Can anyone tell me how to get this done or how to use wildcard chars in the contains?
Code below.
// use this to get a specific list of files
private List<string> getFiles(string path, List<string> filenames)
{
List<string> temp = new List<string>();
string mapPath = Server.MapPath(path);
//DirectoryInfo Di = new DirectoryInfo(mapPath);
DirectoryInfo Di = new DirectoryInfo(@"C:\inetpub\wwwroot\Distribution\" + path); // for testing locally
FileInfo[] Fi = Di.GetFiles();
foreach (FileInfo f in Fi)
{
if (filenames.Contains(f.Name)) // *** this never matches
temp.Add(f.FullName);
}
return temp;
}
I've changed the code to try the suggestions, but it's still not working. I've added the data as comments, as I see it when stepping through the code.
// use this to get a specific list of files
private List<string> getFiles(string path, List<string> filenames)
{
List<string> temp = new List<string>();
string mapPath = Server.MapPath(path);
//DirectoryInfo Di = new DirectoryInfo(mapPath);
DirectoryInfo Di = new DirectoryInfo(@"C:\inetpub\wwwroot\Distribution\" + path); // for testing locally
foreach (string s in filenames) // list has 228,91,151,184 in it
{
FileInfo[] Fi = Di.GetFiles(s); // s = 228: Fi = {System.IO.FileInfo[0]}
foreach (FileInfo f in Fi) //Fi = {System.IO.FileInfo[0]}
{
temp.Add(f.FullName);
}
}
return temp;
}
When I look at the directory where these files are I can see:
pbset228.dat
pbmrc228.dat
pbput228.dat
pbext228.dat
pbget228.dat
pbmsg228.dat
This is working now. It may not be the most efficient way to do this, but it gets the job done. Maybe someone can post a sample that does the same thing in a better way.
// use this to get a specific list of files
private List<string> getFiles(string path, List<string> filenames)
{
List<string> temp = new List<string>();
string mapPath = Server.MapPath(path);
//DirectoryInfo Di = new DirectoryInfo(mapPath);
DirectoryInfo Di = new DirectoryInfo(@"C:\inetpub\wwwroot\Distribution\" + path); // for testing locally
FileInfo[] Fi = Di.GetFiles();
foreach (FileInfo f in Fi)
{
foreach (string s in filenames)
{
if (f.Name.Contains(s))
temp.Add(f.FullName);
}
}
return temp;
}
You can use the Any() LINQ extension:
filenames.Any(s => f.Name.Contains(s));
This will return True if any element in the enumeration returns true for the given function.
For anything more complex, you could use a regular expression to match:
filenames.Any(s => Regex.IsMatch(s, "pattern"));
Use the static Directory.GetFiles method, which lets you include wildcards and will be more efficient than retrieving all the files and then having to iterate through them.
Or you can even use DirectoryInfo.GetFiles and pass your search string to that.
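As a sketch of that suggestion, each entry of the list can be wrapped in wildcards and passed as the search pattern. WildcardSearch and GetFilesContaining are made-up names, and the pattern "*" + fragment + "*" is an assumption about how loosely the names should match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WildcardSearch
{
    // Hypothetical helper: returns the full paths of files whose names contain any of the fragments.
    public static List<string> GetFilesContaining(string directory, IEnumerable<string> fragments)
    {
        var di = new DirectoryInfo(directory);
        return fragments
            // "228" becomes "*228*", which matches pbset228.dat, pbmrc228.dat, ...
            .SelectMany(fragment => di.GetFiles("*" + fragment + "*"))
            .Select(f => f.FullName)
            // A file may match more than one fragment; report it once.
            .Distinct()
            .ToList();
    }
}
```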
Change this
foreach (FileInfo f in Fi)
{
if (filenames.Contains(f.Name)) // *** this never matches
temp.Add(f.FullName);
}
return temp;
Into something like this
temp = filenames.FindAll(file => file.Contains(someNameYoureLookingFor));

C# Searching for files and folders except in certain folders

Is there any way to exclude certain directories from SearchOption using a LINQ command like this?
string path = @"C:\SomeFolder";
var s1 = Directory.GetFiles(path , "*.*", SearchOption.AllDirectories);
var s2 = Directory.GetDirectories(path , "*.*", SearchOption.AllDirectories);
The path consists of Sub1 and Sub2 Folders with certain files in it. I need to exclude them from directory search.
Thanks
This Worked:
string[] exceptions = new string[] { "c:\\SomeFolder\\sub1",
"c:\\SomeFolder\\sub2" };
var s1 = Directory.GetFiles("c:\\x86", "*.*",
SearchOption.AllDirectories).Where(d => exceptions.All(e =>
!d.StartsWith(e)));
This helped with Exceptions
No there isn't as far as I know. But you could use very simple LINQ to do that in a single line.
var s1 = Directory.GetFiles(path , "*.*", SearchOption.AllDirectories).Where(d => !d.StartsWith("<EXCLUDE_DIR_PATH>")).ToArray();
You can easily combine multiple exclude DIRs too.
You can't do exactly what you want with simple LINQ methods. You will need to write a recursive routine instead of using SearchOption.AllDirectories. The reason is that you want to filter directories not files.
You could use the following static method to achieve what you want:
public static IEnumerable<string> GetFiles(
string rootDirectory,
Func<string, bool> directoryFilter,
string filePattern)
{
foreach (string matchedFile in Directory.GetFiles(rootDirectory, filePattern, SearchOption.TopDirectoryOnly))
{
yield return matchedFile;
}
var matchedDirectories = Directory.GetDirectories(rootDirectory, "*.*", SearchOption.TopDirectoryOnly)
.Where(directoryFilter);
foreach (var dir in matchedDirectories)
{
foreach (var file in GetFiles(dir, directoryFilter, filePattern))
{
yield return file;
}
}
}
You would use it like this:
var files = GetFiles("C:\\SearchDirectory", d => !d.Contains("AvoidMe", StringComparison.OrdinalIgnoreCase), "*.*");
Why the added complexity? This method completely avoids looking inside directories you're not interested in. The SearchOption.AllDirectories will, as the name suggests, search within all directories.
If you're not familiar with iterator methods (the yield return syntax), this can be written differently: just ask!
Alternative
This has almost the same effect. However, it still finds files within subdirectories of the directories you want to ignore. Maybe that's OK for you; the code is easier to follow.
public static IEnumerable<string> GetFilesLinq(
string root,
Func<string, bool> directoryFilter,
string filePattern)
{
var directories = Directory.GetDirectories(root, "*.*", SearchOption.AllDirectories)
.Where(directoryFilter);
List<string> results = new List<string>();
foreach (var d in directories)
{
results.AddRange(Directory.GetFiles(d, filePattern, SearchOption.TopDirectoryOnly));
}
return results;
}
try this
var s2 = Directory.GetDirectories(dirPath, "*", SearchOption.AllDirectories)
.Where(directory => !directory.Contains("DirectoryName"));
// Used to load file and folder information for a directory and its subdirectories
private void button1_Click(object sender, EventArgs e)
{
FileInfo[] fileInfoArr;
StringBuilder sbr=new StringBuilder();
StringBuilder sbrfname = new StringBuilder();
string strpathName = @"C:\Users\prasad\Desktop\Dll";
DirectoryInfo dir = new DirectoryInfo(strpathName);
fileInfoArr = dir.GetFiles("*.dll");
//Load Files From RootFolder
foreach (FileInfo f in fileInfoArr)
{
sbrfname.AppendLine(f.FullName);
}
DirectoryInfo[] dirInfos = dir.GetDirectories("*.*");
//Load Files from each subfolder
foreach (DirectoryInfo d in dirInfos)
{
fileInfoArr = d.GetFiles("*.dll");
foreach (FileInfo f in fileInfoArr)
{
sbrfname.AppendLine(f.FullName);
}
sbr.AppendLine(d.ToString());
}
richTextBox1.Text = sbr.ToString();
richTextBox2.Text = sbrfname.ToString();
}

C#:Getting all image files in folder

I am trying to get all images from a folder, but this folder also includes subfolders, like /photos/person1/ and /photos/person2/. I can get the photos in a single folder like this:
path= System.IO.Directory.GetCurrentDirectory() + "/photo/" + groupNO + "/";
public List<String> GetImagesPath(String folderName)
{
DirectoryInfo Folder;
FileInfo[] Images;
Folder = new DirectoryInfo(folderName);
Images = Folder.GetFiles();
List<String> imagesList = new List<String>();
for (int i = 0; i < Images.Length; i++)
{
imagesList.Add(String.Format(@"{0}/{1}", folderName, Images[i].Name));
// Console.WriteLine(String.Format(@"{0}/{1}", folderName, Images[i].Name));
}
return imagesList;
}
But how can I get all photos in all sub folders? I mean I want to get all photos in /photo/ directory at once.
Have a look at the DirectoryInfo.GetFiles overload that takes a SearchOption argument and pass SearchOption.AllDirectories to get the files including all sub-directories.
Another option is to use Directory.GetFiles which has an overload that takes a SearchOption argument as well:
return Directory.GetFiles(folderName, "*.*", SearchOption.AllDirectories)
.ToList();
I'm using GetFiles wrapped in method like below:
public static String[] GetFilesFrom(String searchFolder, String[] filters, bool isRecursive)
{
List<String> filesFound = new List<String>();
var searchOption = isRecursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
foreach (var filter in filters)
{
filesFound.AddRange(Directory.GetFiles(searchFolder, String.Format("*.{0}", filter), searchOption));
}
return filesFound.ToArray();
}
It's easy to use:
String searchFolder = @"C:\MyFolderWithImages";
var filters = new String[] { "jpg", "jpeg", "png", "gif", "tiff", "bmp", "svg" };
var files = GetFilesFrom(searchFolder, filters, false);
There's a good one-liner solution for this on a similar thread:
get all files recursively then filter file extensions with LINQ
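The linked thread is not reproduced here, but the idea can be sketched roughly as follows: enumerate everything recursively, then keep only the paths whose extension is in a set. ImageSearch, GetImages and the extension list are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ImageSearch
{
    // Assumption: these are the extensions of interest; extend the set as needed.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".png", ".gif" };

    // Hypothetical helper: returns all image paths below root, including subfolders.
    public static List<string> GetImages(string root)
    {
        return Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
            // Path.GetExtension includes the leading dot, e.g. ".jpg".
            .Where(path => ImageExtensions.Contains(Path.GetExtension(path)))
            .ToList();
    }
}
```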
Or if LINQ cannot be used, then use a RegEx to filter file extensions:
var files = Directory.GetFiles("C:\\path", "*.*", SearchOption.AllDirectories);
List<string> imageFiles = new List<string>();
foreach (string filename in files)
{
if (Regex.IsMatch(filename, @"\.jpg$|\.png$|\.gif$"))
imageFiles.Add(filename);
}
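I found a solution that might work. GetFiles accepts only one search pattern per call, so each extension is queried separately:
foreach (string img in new[] { "*.bmp", "*.jpg" /* and so on */ }.SelectMany(p => Directory.GetFiles(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), p)))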
You need the recursive form of GetFiles:
DirectoryInfo.GetFiles(pattern, searchOption);
(specify AllDirectories as the SearchOption)
Here's a link for more information:
MSDN: DirectoryInfo.GetFiles
This allows you to use the same syntax and functionality as Directory.GetFiles(path, pattern, options); except with an array of patterns instead of just one.
So you can also use it to do tasks like find all files that contain the word "taxes" that you may have used to keep records over the past year (xlsx, xls, odf, csv, tsv, doc, docx, pdf, txt...).
public static class CustomDirectoryTools {
public static string[] GetFiles(string path, string[] patterns = null, SearchOption options = SearchOption.TopDirectoryOnly) {
if(patterns == null || patterns.Length == 0)
return Directory.GetFiles(path, "*", options);
if(patterns.Length == 1)
return Directory.GetFiles(path, patterns[0], options);
return patterns.SelectMany(pattern => Directory.GetFiles(path, pattern, options)).Distinct().ToArray();
}
}
In order to get all image files on your c drive you would implement it like this.
string path = @"C:\";
string[] patterns = new[] {"*.jpg", "*.jpeg", "*.jpe", "*.jif", "*.jfif", "*.jfi", "*.webp", "*.gif", "*.png", "*.apng", "*.bmp", "*.dib", "*.tiff", "*.tif", "*.svg", "*.svgz", "*.ico", "*.xbm"};
string[] images = CustomDirectoryTools.GetFiles(path, patterns, SearchOption.AllDirectories);
You can use GetFiles
GetFiles("*.jpg", SearchOption.AllDirectories)
GetFiles("*.jpg", SearchOption.AllDirectories) has a problem on Windows 7. If you set the directory to c:\users\user\documents\, it throws an exception: for compatibility with Windows XP, Windows 7 has links such as Music and Pictures in the Documents folder, but these folders don't really exist there, so accessing them throws. It is better to recurse manually with try..catch.
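A minimal sketch of such a manual recursion, assuming that unreadable or vanished folders should simply be skipped; SafeFileSearch and GetFilesSafe are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeFileSearch
{
    // Hypothetical helper: collects matching files below root and skips folders that cannot be read.
    public static List<string> GetFilesSafe(string root, string pattern)
    {
        var results = new List<string>();
        Collect(root, pattern, results);
        return results;
    }

    private static void Collect(string directory, string pattern, List<string> results)
    {
        try
        {
            results.AddRange(Directory.GetFiles(directory, pattern));
            foreach (string sub in Directory.GetDirectories(directory))
            {
                // Each subfolder gets its own try/catch through the recursive call.
                Collect(sub, pattern, results);
            }
        }
        catch (UnauthorizedAccessException)
        {
            // Assumption: folders we may not read are silently skipped.
        }
        catch (DirectoryNotFoundException)
        {
            // The folder is a dangling link or disappeared during the scan.
        }
    }
}
```

Because the exception is caught per folder, one inaccessible folder no longer aborts the whole search.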
This gets a list of all images from a folder and its subfolders, and it also takes care of the long file name exception on Windows.
// The external Pri.LongPath library is used to handle long folder names.
// Source https://github.com/peteraritchie/LongPath
using Directory = Pri.LongPath.Directory;
using DirectoryInfo = Pri.LongPath.DirectoryInfo;
using File = Pri.LongPath.File;
using FileInfo = Pri.LongPath.FileInfo;
using Path = Pri.LongPath.Path;
// Directory and sub directory search function
public List<String> DirectoryTree(DirectoryInfo dr, string searchname)
{
var allFiles = new List<String>();
try
{
foreach (FileInfo fi in dr.GetFiles(searchname))
{
// Store the full path of every match
allFiles.Add(fi.DirectoryName + "\\" + fi.Name);
}
foreach (DirectoryInfo di in dr.GetDirectories())
{
// Recurse and collect the matches from each subdirectory
allFiles.AddRange(DirectoryTree(di, searchname));
}
}
catch (Exception ex)
{
// Skip folders that cannot be read and keep going
Console.WriteLine(ex.Message);
}
return allFiles;
}
public List<String> GetImagesPath(String folderName)
{
var dr = new DirectoryInfo(folderName);
string ImagesExtensions = "jpg,jpeg,jpe,jfif,png,gif,bmp,dib,tif,tiff";
string[] imageValues = ImagesExtensions.Split(',');
List<String> imagesList = new List<String>();
foreach (var type in imageValues)
{
if (!string.IsNullOrEmpty(type.Trim()))
{
// Add the matches for this extension to the output list
imagesList.AddRange(DirectoryTree(dr, "*." + type.Trim()));
}
}
return imagesList;
}
var files = new DirectoryInfo(path).GetFiles("File")
.OrderByDescending(f => f.LastWriteTime).First();
This gives you the most recently modified file among those matching the search pattern.
