Copy only new or modified files/directories in C#

I am trying to create a simple "directory/file copy" console application in C#. What I need is to copy all folders and files (keeping the original hierarchy) from one drive to another, for example from C:\Data to E:\Data.
However, I only want it to copy any NEW or MODIFIED files from the source to the destination.
If the file on the destination drive is newer than the one on the source drive, then it does not copy.
(the problem)
In the code I have, it's comparing file "abc.pdf" in the source with file "xyz.pdf" in the destination and thus is overwriting the destination file with whatever is in the source even though the destination file is newer. I am trying to figure out how to make it compare "abc.pdf" in the source to "abc.pdf" in the destination.
This works if I drill the source and destination down to a specific file, but when I back out to the folder level, it overwrites the destination file with the source file, even though the destination file is newer.
(my solutions – that didn’t work)
I thought that by putting the “if (file.LastWriteTime > destination.LastWriteTime)” check after the “foreach” statement, it would compare the files in the two folders one-to-one (File1 in the source against File1 in the destination), but it doesn't.
It seems I'm missing something in either the “FileInfo[]”, “foreach” or “if” statements to make this a one-to-one comparison. I think maybe some reference to the “Path.Combine” statement or a “SearchOption.AllDirectories”, but I'm not sure.
Any suggestions?
As you can see from my basic code sample, I'm new to C# so please put your answer in simple terms.
Thank you.
Here is the code I have tried, but it’s not working.
class Copy
{
    public static void CopyDirectory(DirectoryInfo source, DirectoryInfo destination)
    {
        if (!destination.Exists)
        {
            destination.Create();
        }

        // Copy files.
        FileInfo[] files = source.GetFiles();
        FileInfo[] destFiles = destination.GetFiles();
        foreach (FileInfo file in files)
            foreach (FileInfo fileD in destFiles)
                // Copy only modified files
                if (file.LastWriteTime > fileD.LastWriteTime)
                {
                    file.CopyTo(Path.Combine(destination.FullName, file.Name), true);
                }
                // Copy all new files
                else if (!fileD.Exists)
                {
                    file.CopyTo(Path.Combine(destination.FullName, file.Name), true);
                }

        // Process subdirectories.
        DirectoryInfo[] dirs = source.GetDirectories();
        foreach (DirectoryInfo dir in dirs)
        {
            // Get destination directory.
            string destinationDir = Path.Combine(destination.FullName, dir.Name);

            // Call CopyDirectory() recursively.
            CopyDirectory(dir, new DirectoryInfo(destinationDir));
        }
    }
}

You can just take the array of files in "source" and check for a matching name in "destination":
/// <summary>
/// Checks whether the backup copy of a file needs an update
/// (if it doesn't exist in the backup directory, it needs one).
/// </summary>
public static bool NeedsUpdate(FileInfo localFile, DirectoryInfo backUpDir)
{
    string backUpPath = Path.Combine(backUpDir.FullName, localFile.Name);
    if (!File.Exists(backUpPath))
    {
        return true;
    }

    FileInfo backUpFile = new FileInfo(backUpPath);
    DateTime lastBackUp = backUpFile.LastWriteTimeUtc;
    DateTime lastChange = localFile.LastWriteTimeUtc;

    // Equal timestamps mean no change, so no update is needed.
    return lastChange != lastBackUp;
}
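For context, here is a minimal, untested sketch of how this helper could drive a recursive copy; the BackUpDirectory name is made up:

// Hypothetical wrapper: copy every source file whose backup is missing
// or out of date, then recurse into subdirectories.
public static void BackUpDirectory(DirectoryInfo source, DirectoryInfo backUpDir)
{
    if (!backUpDir.Exists)
    {
        backUpDir.Create();
    }

    foreach (FileInfo file in source.GetFiles())
    {
        if (NeedsUpdate(file, backUpDir))
        {
            file.CopyTo(Path.Combine(backUpDir.FullName, file.Name), true);
        }
    }

    foreach (DirectoryInfo dir in source.GetDirectories())
    {
        BackUpDirectory(dir, new DirectoryInfo(Path.Combine(backUpDir.FullName, dir.Name)));
    }
}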

Update:
I modified my code with the suggestions above and all went well. It did exactly as I expected.
However, the problem I ran into was the amount of time it took to run the application on a large folder (containing 6,000 files in 5 sub-folders).
On a small folder (28 files in 5 sub-folders) it only took a few seconds to run, but on the larger folder it took 35 minutes to process only 1,300 files.
Solution:
The code below will do the same thing but much faster. This new version processed 6,000 files in about 10 seconds. It processed 40,000 files in about 1 minute and 50 seconds.
What this new code does (and doesn’t do)
If the destination folder is empty, copy all from the source to the destination.
If the destination has some or all of the same files / folders as the source, compare and copy any new or modified files from the source to the destination.
If the destination file is newer than the source, don’t copy.
So, here’s the code to make it happen. Enjoy and share.
Thanks to everyone who helped me get a better understanding of this.
using System;
using System.IO;

namespace VSU1vFileCopy
{
    class Program
    {
        static void Main(string[] args)
        {
            const string Src_FOLDER = @"C:\Data";
            const string Dest_FOLDER = @"E:\Data";

            string[] originalFiles = Directory.GetFiles(Src_FOLDER, "*", SearchOption.AllDirectories);

            Array.ForEach(originalFiles, (originalFileLocation) =>
            {
                FileInfo originalFile = new FileInfo(originalFileLocation);
                FileInfo destFile = new FileInfo(originalFileLocation.Replace(Src_FOLDER, Dest_FOLDER));

                if (destFile.Exists)
                {
                    // Copy only if the source file was modified more recently
                    // than the destination copy.
                    if (originalFile.LastWriteTimeUtc > destFile.LastWriteTimeUtc)
                    {
                        originalFile.CopyTo(destFile.FullName, true);
                    }
                }
                else
                {
                    // New file: make sure the destination folder exists, then copy.
                    Directory.CreateDirectory(destFile.DirectoryName);
                    originalFile.CopyTo(destFile.FullName, false);
                }
            });
        }
    }
}

Related

Application: Application Launcher, can't Move directory, it's being used by another process

I'm writing an application launcher as a Windows application in C#, VS 2017. Currently I'm having a problem with this piece of code:
if (System.IO.Directory.Exists(extractPath))
{
    string[] files = System.IO.Directory.GetFiles(extractPath);
    string[] dirs = Directory.GetDirectories(extractPath);

    // Move the files and overwrite destination files if they already exist.
    foreach (string s in files)
    {
        // Use static Path methods to extract only the file name from the path.
        var fileName = System.IO.Path.GetFileName(s);
        var destFile = System.IO.Path.Combine(oldPath, fileName);
        System.IO.File.Move(s, destFile);
    }

    foreach (string dir in dirs)
    {
        //var dirSplit = dir.Split('\\');
        //var last = dirSplit.Last();
        //if (last != "Resources")
        //{
        var fileName = System.IO.Path.GetFileName(dir);
        var destFile = System.IO.Path.Combine(oldPath, fileName);
        System.IO.Directory.Move(dir, destFile);
        //}
    }
}
I'm getting the well-known error:
"The process cannot access the file 'XXX' because it is being used by another process."
I was looking for a solution and found several on MSDN and Stack Overflow, but my problem is quite specific: there is only one directory I cannot move, the Resources folder of my main application.
Here is why I think the problem is specific:
The error occurs only when the loop reaches the /Resources directory; I'm not having any issues moving other files from the parent directory.
At first, I thought it was being held open by the VS instance in which I had the main app open, but nothing changed after closing VS and killing the process.
I've copied and moved the whole project to another directory, and never opened it in VS nor started it via the *.exe file, to make sure that none of the files in the new, copied directory were used by any process.
Finally, I've restarted the PC.
I know that this error is pretty common when you try to delete/move files, but in my case I'm sure it's only used by my launcher app. Here is a slightly longer code sample to show what file operations I'm actually doing:
private void RozpakujRepo()
{
    string oldPath = @"path\Debug Kopia\Old";
    string extractPath = @"path\Debug Kopia";
    var tempPath = @"path\ZipRepo\RexTempRepo.zip";

    if (System.IO.File.Exists(tempPath) == true)
    {
        System.IO.File.Delete(tempPath);
    }
    System.IO.Compression.ZipFile.CreateFromDirectory(extractPath, tempPath);

    if (System.IO.Directory.Exists(oldPath))
    {
        DeleteDirectory(oldPath);
    }
    if (!System.IO.Directory.Exists(oldPath))
    {
        System.IO.Directory.CreateDirectory(oldPath);
    }

    if (System.IO.Directory.Exists(extractPath))
    {
        string[] files = System.IO.Directory.GetFiles(extractPath);
        string[] dirs = Directory.GetDirectories(extractPath);

        // Move the files and overwrite destination files if they already exist.
        foreach (string s in files)
        {
            // Use static Path methods to extract only the file name from the path.
            var fileName = System.IO.Path.GetFileName(s);
            var destFile = System.IO.Path.Combine(oldPath, fileName);
            System.IO.File.Move(s, destFile);
        }

        foreach (string dir in dirs)
        {
            //var dirSplit = dir.Split('\\');
            //var last = dirSplit.Last();
            //if (last != "Resources")
            //{
            var fileName = System.IO.Path.GetFileName(dir);
            var destFile = System.IO.Path.Combine(oldPath, fileName);
            System.IO.Directory.Move(dir, destFile);
            //}
        }
    }

    string zipPath = @"path\ZipRepo\RexRepo.zip";
    ZipFile.ExtractToDirectory(zipPath, extractPath);
}
And now, my questions:
Can it be related to the file types (.png, .ico, .bmp)?
Can it be related to the fact that those resource files are in use, for example as the .exe file icon of my main application? Or just because they are resource files?
Is there anything else I'm missing that could cause the error?
EDIT:
To clarify, there are 2 apps:
Main Application
Launcher Application (to launch the Main Application)
The Resources folder is Main Application/Resources, and I'm moving it while doing an application version update.
It turned out that the problem wasn't in the /Resources directory after all. The actual problem was the /Old directory, which caused infinite recursion.

Fastest way to delete millions of files

I have a list of strings which are relative paths. I also have a string which contains the root path for those files. Now I am deleting them like this:
foreach (var rawDocumentPath in documents.Select(x => x.RawDocumentPath))
{
    if (string.IsNullOrEmpty(rawDocumentPath))
    {
        continue;
    }

    string fileName = Path.Combine(storagePath, rawDocumentPath);
    File.Delete(fileName);
}
The problem is that I call Path.Combine for every file, and it's quite slow.
How can I speed up this code? I can't delete whole folders, and I can't change the current directory (because that affects the whole program).
I need something like a class that can quickly delete several files in a specified directory.
If your disk can handle it, parallelizing should help a lot:
documents.AsParallel().ForAll(document =>
{
    if (!string.IsNullOrEmpty(document.RawDocumentPath))
    {
        string fileName = Path.Combine(storagePath, document.RawDocumentPath);
        File.Delete(fileName);
    }
});
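If unbounded parallelism thrashes the disk, one variant is to cap the degree of parallelism; here is a small sketch using Parallel.ForEach, where the value 4 is an arbitrary assumption to tune for your hardware:

using System.IO;
using System.Threading.Tasks;

// Sketch: bound the number of concurrent deletes so a spinning disk
// isn't overwhelmed by random I/O. The value 4 is an arbitrary guess.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(documents, options, document =>
{
    if (!string.IsNullOrEmpty(document.RawDocumentPath))
    {
        File.Delete(Path.Combine(storagePath, document.RawDocumentPath));
    }
});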

Need elegant way to move orphan files from folder after paired files are moved with FileInfo class

I have a folder from which I'm moving pairs of related files (xml paired with pdf). Additional files could be deposited into this folder at any time, but the utility runs every 10 minutes or so. We could use the FileSystemWatcher class, but for internal reasons we don't for this utility.
I'm using the System.IO.FileInfo class to read all the files in the folder (they will only be xml and pdf) during each run. Once I have the files in the FileInfo array, I iterate through them, moving matches to a working folder. Once that is done, I want to move any files that are in the FileInfo array but were not paired to a failure folder.
Since I can't seem to remove items from the FileInfo array (or I am missing something), would it be easier to (1) use a string array from the Directory class's GetFiles, (2) create a Dictionary from the FileInfo array and remove values from that during iteration, or (3) is there a more elegant approach using LINQ or something else?
Here is the code so far:
internal static bool CompareXMLandPDFFileNames(FileInfo[] xmlFiles, FileInfo[] pdfFiles, string xmlFilePath)
{
    string workingFilePath = xmlFilePath + @"\WORKING";

    if (xmlFiles.Length > 0)
    {
        foreach (var xmlFile in xmlFiles)
        {
            string xfn = xmlFile.Name; // xml file name
            string pdfName = xfn.Substring(0, xfn.IndexOf('_')) + ".pdf"; // parsed pdf file name contained in xml file name

            foreach (var pdfFile in pdfFiles)
            {
                string pfn = pdfFile.Name; // pdf file name
                if (pfn == pdfName)
                {
                    // move xml and pdf files to working folder...
                    FileInfo xmlInfo = new FileInfo(xmlFilePath + xfn);
                    FileInfo pdfInfo = new FileInfo(xmlFilePath + pfn);

                    if (!File.Exists(workingFilePath + xfn))
                    {
                        xmlInfo.MoveTo(workingFilePath + xfn);
                    }
                    if (!File.Exists(workingFilePath + pfn))
                    {
                        pdfInfo.MoveTo(workingFilePath + pfn);
                    }
                }
            }
        }

        // all files in the file objects should now be moved to the working folder; if not, fix orphans...
    }

    return true;
}
To be honest, I think the question is a bit poor: the problem is stated in a very complicated fashion. I think the workflow should be designed to be more robust and deterministic. (e.g. why not upload file pairs as zipped sets in the first place?)
(And no "Someone" most likely "must not have been here before")
Here are some random improvements:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace O
{
    static class X
    {
        private static readonly Regex _xml2pdf = new Regex("(_.*).xml$", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        internal static void MoveFileGroups(string uploadFolder)
        {
            string workingFilePath = Path.Combine(uploadFolder, "PROGRESS");

            var groups = new DirectoryInfo(uploadFolder)
                .GetFiles()
                .GroupBy(fi => _xml2pdf.Replace(fi.Name, ".pdf"), StringComparer.InvariantCultureIgnoreCase)
                .Where(group => group.Count() > 1);

            foreach (var group in groups)
            {
                if (!group.Any(fi => File.Exists(Path.Combine(workingFilePath, fi.Name))))
                    foreach (var file in group)
                        file.MoveTo(Path.Combine(workingFilePath, file.Name));
            }
        }

        public static void Main(string[] args)
        {
        }
    }
}
Use readable names (say what you mean).
IndexOf returns -1 if the filename contains no "_"; random upload filenames could make the procedure fail.
Handle filenames case-insensitively on Windows.
Don't do the path concatenation manually (you could accidentally manufacture UNC paths, and your code is less portable).
Don't assume one xml will map to one pdf: the naming scheme implies that many xmls may map to the same pdf name. This implementation allows that (or you could detect the situation by rejecting groups.Where(g => g.Count() > 2)).
Move groups atomically only (!): if any one of the files in a group exists in the target dir, don't move any, or you will have a race condition where part of a group gets moved before the last file has been (completely) uploaded, and the group will never be moved because it is no longer detected.
Other items (todo):
Don't pass redundant parameters. You might pass a FileInfo[] instead of the raw GetFiles() call if you want filtering.
Do error handling, notably:
handle IO exceptions
locking errors are to be expected while uploads are in progress (test it, or end up with corrupted files); you need to handle these atomically, i.e. not move any file in a group unless all could be moved (this will be somewhat tricky; see the sketch below)
test your code (none of my sample was tested; it just compiled on Linux with Mono)
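As a rough illustration of that atomic-move point, here is an untested sketch of a best-effort group move with rollback on failure; the MoveGroupAtomically name is made up, and a truly race-free move across processes needs more than this:

using System.Collections.Generic;
using System.IO;

// Sketch: move all files of a group, rolling back the ones already
// moved if any single move fails (e.g. because a file is still being
// uploaded and is locked). Best-effort only, not truly atomic.
static void MoveGroupAtomically(IEnumerable<FileInfo> group, string targetDir)
{
    var moved = new List<(string from, string to)>();
    try
    {
        foreach (var file in group)
        {
            string source = file.FullName; // capture before MoveTo changes it
            string target = Path.Combine(targetDir, file.Name);
            file.MoveTo(target);
            moved.Add((source, target));
        }
    }
    catch (IOException)
    {
        // Roll back the files that were already moved.
        foreach (var (from, to) in moved)
            File.Move(to, from);
        throw;
    }
}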

C# How to loop a large set of folders and files recursively without using a huge amount of memory

I want to index all my music files and store them in a database.
I have this function that I call recursively, starting from the root of my music drive, i.e.:
start > ReadFiles(C:\music\);

ReadFiles(path) {
    foreach (file)
        save to index;
    foreach (directory)
        ReadFiles(directory);
}
This works fine, but while the program runs the amount of memory used grows and grows until my system finally runs out of memory.
Does anyone have a better approach that doesn't need 4 GB of RAM to complete this task?
Best Regards, Tys
Alxandr's queue-based solution should work fine.
If you're using .NET 4.0, you can also take advantage of the new Directory.EnumerateFiles method, which enumerates files lazily, without loading them all into memory:
void ReadFiles(string path)
{
    IEnumerable<string> files =
        Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories); // search recursively

    foreach (string file in files)
        SaveToIndex(file);
}
Did you check for the . and .. entries that show up in every directory except the root?
If you don't skip those, you'll have an infinite loop.
You can implement this with a queue. I think (but I'm not sure) that this will save memory; at least it will free up your stack. Whenever you find a folder you add it to the queue, and whenever you find a file you just read it. This prevents recursion.
Something like this:
Queue<string> dirs = new Queue<string>();
dirs.Enqueue("basedir");
while (dirs.Count > 0)
{
    string current = dirs.Dequeue();

    // Enqueue the subdirectories of the current directory...
    foreach (string directory in Directory.GetDirectories(current))
        dirs.Enqueue(directory);

    // ...and index the files it contains (non-recursively).
    ReadFiles(current);
}
Beware, though, that EnumerateFiles() will stop running if you don't have access to a file, if a path is too long, or if some other exception occurs. This is what I use for the moment to solve those problems:
public static List<string> getFiles(string path, List<string> files)
{
    IEnumerable<string> fileInfo = null;
    IEnumerable<string> folderInfo = null;
    try
    {
        fileInfo = Directory.EnumerateFiles(path);
    }
    catch
    {
        // no access to this directory; skip it
    }

    if (fileInfo != null)
    {
        files.AddRange(fileInfo);

        // recurse through the subfolders
        folderInfo = Directory.EnumerateDirectories(path);
        foreach (string s in folderInfo)
        {
            try
            {
                getFiles(s, files);
            }
            catch
            {
                // skip subfolders we cannot enumerate
            }
        }
    }
    return files;
}
Example use:
List<string> files = new List<string>();
files = folder.getFiles(path, files);
My solution is based on the code at this page: http://msdn.microsoft.com/en-us/library/vstudio/bb513869.aspx.
Update: A MUCH faster method to get files recursively can be found at http://social.msdn.microsoft.com/Forums/vstudio/en-US/ae61e5a6-97f9-4eaa-9f1a-856541c6dcce/directorygetfiles-gives-me-access-denied?forum=csharpgeneral. Using a Stack is new to me (I didn't even know it existed), but the method seems to work. At least it listed all the files on my C and D partitions with no errors.
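For reference, a minimal sketch of the Stack-based, non-recursive traversal that the linked thread describes (an approximation, not the linked code):

using System;
using System.Collections.Generic;
using System.IO;

// Sketch: iterative traversal with an explicit stack instead of
// recursion; directories we cannot read are skipped instead of
// aborting the whole walk.
public static List<string> GetFilesWithStack(string root)
{
    var files = new List<string>();
    var dirs = new Stack<string>();
    dirs.Push(root);

    while (dirs.Count > 0)
    {
        string current = dirs.Pop();
        try
        {
            files.AddRange(Directory.EnumerateFiles(current));
            foreach (string sub in Directory.EnumerateDirectories(current))
                dirs.Push(sub);
        }
        catch (UnauthorizedAccessException) { /* skip protected directories */ }
        catch (PathTooLongException) { /* skip overly long paths */ }
    }
    return files;
}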
It could be junction folders, which lead to an infinite loop when doing recursion, but I am not sure; check this out and see for yourself. Link: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/mklink
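If junctions are indeed the cause, one hedge is to skip reparse points while walking the tree; a minimal sketch of the check:

using System.IO;

// Sketch: junctions and symbolic links show up as reparse points,
// so skipping them avoids cycles during traversal.
static bool IsReparsePoint(string dir)
{
    var info = new DirectoryInfo(dir);
    return (info.Attributes & FileAttributes.ReparsePoint) != 0;
}

Combined with a stack- or queue-based walk, you would simply not push directories for which IsReparsePoint returns true.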

How do I compare one collection of files to another in c#?

I am just learning C# (I've been fiddling with it for about 2 days now) and I've decided that, for learning purposes, I will rebuild an old app I made in VB6 for syncing files (generally across a network).
When I wrote the code in VB 6, it worked approximately like this:
Create a Scripting.FileSystemObject
Create directory objects for the source and destination
Create file listing objects for the source and destination
Iterate through the source object, and check to see if it exists in the destination
if not, create it
if so, check to see if the source version is newer/larger, and if so, overwrite the other
So far, this is what I have:
private bool syncFiles(string sourcePath, string destPath)
{
    DirectoryInfo source = new DirectoryInfo(sourcePath);
    DirectoryInfo dest = new DirectoryInfo(destPath);

    if (!source.Exists)
    {
        LogLine("Source Folder Not Found!");
        return false;
    }
    if (!dest.Exists)
    {
        LogLine("Destination Folder Not Found!");
        return false;
    }

    FileInfo[] sourceFiles = source.GetFiles();
    FileInfo[] destFiles = dest.GetFiles();

    foreach (FileInfo file in sourceFiles)
    {
        // check exists on file
    }

    if (optRecursive.Checked)
    {
        foreach (DirectoryInfo subDir in source.GetDirectories())
        {
            // create-if-not-exists destination subdirectory
            syncFiles(sourcePath + subDir.Name, destPath + subDir.Name);
        }
    }
    return true;
}
I have read examples that seem to advocate using the FileInfo or DirectoryInfo objects to do checks with the "Exists" property, but I am specifically looking for a way to search an existing collection/list of files, not to make a live check against the file system for each file, since I will be doing this across the network and constantly going back to a multi-thousand-file directory is slow, slow, slow.
Thanks in Advance.
The GetFiles() method will only return files that do exist; it doesn't make up random files that don't exist. So all you have to do is check whether each file exists in the other list.
Something along these lines could work:
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();

foreach (var file in sourceFiles)
{
    // Any() requires "using System.Linq;"
    if (!destFiles.Any(x => x.Name == file.Name))
    {
        // Do whatever
    }
}
Note: You have of course no guarantee that something hasn't changed after you made the calls to GetFiles(). For example, a file could have been deleted or renamed by the time you try to copy it.
This could perhaps be done more nicely by using the Except method or something similar. For example, something like this:
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();
var sourceFilesMissingInDestination = sourceFiles.Except(destFiles, new FileNameComparer());

foreach (var file in sourceFilesMissingInDestination)
{
    // Do whatever
}
Where the FileNameComparer is implemented like so:
public class FileNameComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        return Equals(x.Name, y.Name);
    }

    public int GetHashCode(FileInfo obj)
    {
        return obj.Name.GetHashCode();
    }
}
Untested though :p
One little detail: instead of
sourcePath + subDir.Name
I would use
System.IO.Path.Combine(sourcePath, subDir.Name)
Path performs reliable, OS-independent operations on file and folder names.
Also, I notice optRecursive.Checked popping out of nowhere. As a matter of good design, make that a parameter:
bool syncFiles(string sourcePath, string destPath, bool checkRecursive)
And since you mention it may be used for large numbers of files, keep an eye out for .NET 4: it has an IEnumerable replacement for GetFiles() that will let you process this in a streaming fashion, as sketched below.
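A minimal sketch of that streaming approach, assuming .NET 4 or later (variable names are illustrative):

using System.IO;

// Sketch: stream file paths one at a time instead of materializing
// a FileInfo[] for the whole (possibly huge) directory tree.
foreach (string path in Directory.EnumerateFiles(sourcePath, "*", SearchOption.AllDirectories))
{
    var file = new FileInfo(path);
    // compare against the destination here, one file at a time
}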
