How to make a program search for a file quickly - c#

I wrote a program that searches the computer for a specific file, but it is slow and suffers delays because of the large number of files on the machine.
This is the function that collects all the files:
void Get_Files(DirectoryInfo D)
{
    FileInfo[] Files;
    try
    {
        Files = D.GetFiles("*.*");
        foreach (FileInfo File_Name in Files)
            listBox3.Items.Add(File_Name.FullName);
    }
    catch { }

    DirectoryInfo[] Dirs;
    try
    {
        Dirs = D.GetDirectories();
        foreach (DirectoryInfo Dir in Dirs)
        {
            if (!(Dir.ToString().Equals("$RECYCLE.BIN")) && !(Dir.ToString().Equals("System Volume Information")))
                Get_Files(Dir);
        }
    }
    catch { }
}
Is there a faster way to get all the files on the computer?

Use a profiler to find out which operation is the slowest, then think about how to make it faster. Otherwise you can waste time optimizing something that is not the bottleneck and will not bring the expected speed-up.
In your case, you will probably find that when you call this function for the first time (when the directory structure is not yet in the file-system cache), most of the time is spent in the GetDirectories() and GetFiles() calls. You can pre-cache the list of all files in memory (or in a database) and use FileSystemWatcher to monitor file-system changes and keep that list up to date. Or you can use existing services, such as the Windows Indexing Service, but these may not be available on every computer.
The second bottleneck could be adding the files to the ListBox. If the number of added items is large, temporarily disable redrawing of the list box with ListBox.BeginUpdate and, when you are finished, enable it again with ListBox.EndUpdate. This can sometimes lead to a huge speed-up.
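For example, a minimal sketch of the BeginUpdate/EndUpdate pattern (it reuses listBox3 and the Get_Files call from the question):
// Suspend repainting while many items are added, then repaint once at the end.
listBox3.BeginUpdate();
try
{
    Get_Files(new DirectoryInfo(@"C:\"));   // fills listBox3, as in the question
}
finally
{
    listBox3.EndUpdate();
}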

The answer generally depends on your operating system. In any case you will want to build and maintain your own database of files; an explicit search like the one in your example will be too costly and slow.
A standard solution on Linux (and Mac OS X, if I'm not mistaken) is to maintain a locatedb file, which the system updates on a regular basis. If run on those systems, your program could query this database.
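On such a system, one way to query it from C# (a rough sketch; it assumes the locate command is installed and on the PATH) is to shell out to locate and read its output:
// Query the system's locate database for paths matching a pattern.
using System.Diagnostics;

var psi = new ProcessStartInfo("locate", "myfile.txt")
{
    RedirectStandardOutput = true,
    UseShellExecute = false
};
using (var process = Process.Start(psi))
{
    string output = process.StandardOutput.ReadToEnd();
    process.WaitForExit();
    Console.WriteLine(output);
}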

Part of the problem is that the GetFiles method doesn't return until it has collected all the files in the folder, and if you are performing a recursive search, each subfolder you recurse into adds to that wait.
Look into using DirectoryInfo.EnumerateFiles or DirectoryInfo.EnumerateFileSystemInfos instead.
From the docs:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned; when you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
The same is true of EnumerateFileSystemInfos.
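As a rough illustration (a sketch only, reusing the listBox3 control and the skip-list from the question), the recursive method could stream results as it finds them:
// Sketch: stream file names as they are found instead of waiting for a full array.
void Get_Files(DirectoryInfo dir)
{
    try
    {
        foreach (FileInfo file in dir.EnumerateFiles("*.*"))
            listBox3.Items.Add(file.FullName);

        foreach (DirectoryInfo sub in dir.EnumerateDirectories())
        {
            if (sub.Name != "$RECYCLE.BIN" && sub.Name != "System Volume Information")
                Get_Files(sub);
        }
    }
    catch (UnauthorizedAccessException) { /* skip folders we cannot read */ }
}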
You can also look into querying the Indexing Service (if it is installed and running). See this article on CodeProject:
http://www.codeproject.com/Articles/19540/Microsoft-Indexing-Service-How-To
I found this by Googling "How to query MS file system index"

You can enumerate all files once and store the list.
But if you can't do that, this is basically as good as it gets. You can do two small things:
Try using threads. This will perform much better on an SSD but might hurt on a rotating disk.
Use Directory.GetFileSystemEntries (or DirectoryInfo.GetFileSystemInfos). This returns files and directories in one efficient call.
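For instance, a small sketch (rootPath is a hypothetical starting folder):
// One call returns both files and directories as plain path strings.
string rootPath = @"C:\SomePath";
foreach (string entry in Directory.GetFileSystemEntries(rootPath))
{
    bool isDirectory = Directory.Exists(entry);
    Console.WriteLine((isDirectory ? "[dir]  " : "[file] ") + entry);
}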

You will find much better performance using Directory.GetFiles(), because the FileInfo and DirectoryInfo classes fetch extra information from the file system, which is much slower than simply returning the string-based file names.
Here is a code example that should yield much better results and separates retrieving the files from displaying them in a list box.
static void Main(string[] args)
{
    // listBox3 is assumed to be in scope, as in the question.
    var fileFinder = new FileFinder(@"c:\SomePath");
    listBox3.Items.AddRange(fileFinder.Files.ToArray());
}

/// <summary>
/// SOLID: This class is responsible for recursing a directory to return the list of files
/// which are not in a predefined set of folder exclusions.
/// </summary>
internal class FileFinder
{
    private readonly string _rootPath;
    private List<string> _fileNames;
    private readonly IEnumerable<string> _doNotSearchFolders = new[] { "System Volume Information", "$RECYCLE.BIN" };

    internal FileFinder(string rootPath)
    {
        _rootPath = rootPath;
    }

    internal IEnumerable<string> Files
    {
        get
        {
            if (_fileNames == null)
            {
                _fileNames = new List<string>();
                GetFiles(_rootPath);
            }
            return _fileNames;
        }
    }

    private void GetFiles(string path)
    {
        _fileNames.AddRange(Directory.GetFiles(path, "*.*"));

        // Recurse into sub-directories that are not in the exclusion list.
        foreach (var recursivePath in Directory.GetDirectories(path)
            .Where(dir => !_doNotSearchFolders.Contains(Path.GetFileName(dir))))
        {
            GetFiles(recursivePath);
        }
    }
}

Related

Is there a method in C# to copy multiple files/folders as one task?

I am using .NET 6 (net6.0-windows).
I want to copy a large number of files/folders at once.
The problem with an approach like the following is that it:
- is noticeably slower for a large number of files than copying with Windows File Explorer. I tested it by copying roughly 500 images; the code below needed over a minute, while File Explorer finished in just a few seconds.
- does not show a progress bar for all files as a whole, the way File Explorer does.
foreach (string filePath in paths)
{
    FileSystem.CopyFile(filePath, destination, UIOption.AllDialogs);
}
The problem is that it hands over one copying task after the other to the operating system instead of one task for all of the files.
Is there any library or built-in method that achieves this (something like FileSystem.CopyMultipleFiles(arrayOfPaths, destination, UIOption.AllDialogs))? Or do I have to use native Windows APIs, and if so, which?
As far as I understood your question, you want to copy the files in parallel. You can use a convenient multithreading abstraction, the System.Threading.Tasks.Parallel class. The static method Parallel.ForEach takes a collection and an Action&lt;TCollectionElement&gt;, and runs the action for each item in the collection, passing it the item. The actions (if there are not too many of them) run in parallel. The order is not specified, but that is not a problem in your case.
Here's an example for your case:
using System.Threading.Tasks;
// ...
Parallel.ForEach(paths, p =>
{
    FileSystem.CopyFile(p, destination, UIOption.AllDialogs);
});
By the way, if each action advances your progress bar by a fixed share of the whole, the overall effect is a single bar that represents the progress of the entire operation. That fixed share should be 1/x of the bar, where x is the number of files.
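A minimal sketch of that idea (progressBar1 is a hypothetical WinForms ProgressBar, paths is assumed to be a string[] of source files, and File.Copy is used instead of the dialog-based copy so the bar is the only progress UI):
// Each completed copy advances the shared bar by one step (1/x of the whole).
progressBar1.Minimum = 0;
progressBar1.Maximum = paths.Length;
progressBar1.Value = 0;

Parallel.ForEach(paths, p =>
{
    File.Copy(p, Path.Combine(destination, Path.GetFileName(p)), true);

    // Marshal the UI update back onto the UI thread.
    progressBar1.Invoke((Action)(() => progressBar1.Value++));
});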
Yes, in C# you can use the System.IO namespace to copy multiple files and folders as one task. One way to accomplish this is by using the Directory.GetFiles() method to get a list of files in a directory, and then using the File.Copy() method to copy each file to a different location. Similarly, you can use the Directory.GetDirectories() method to get a list of subdirectories, and then use the Directory.CreateDirectory() method to create those subdirectories in the new location.
You can also use the File.Move() method to move the files and folders.
You could use a loop to iterate through the files and folders and copy them one by one.
Here is an example of copying multiple files and folders in one task:
string sourcePath = @"C:\example\source";
string targetPath = @"C:\example\target";

// Copy all files from source to target
foreach (string file in Directory.GetFiles(sourcePath))
{
    string targetFile = Path.Combine(targetPath, Path.GetFileName(file));
    File.Copy(file, targetFile, true);
}

// Copy all subdirectories from source to target
foreach (string dir in Directory.GetDirectories(sourcePath))
{
    string targetDir = Path.Combine(targetPath, Path.GetFileName(dir));
    Directory.CreateDirectory(targetDir);
    CopyAll(dir, targetDir);
}
This example uses a helper method CopyAll() that recursively copies all the files and folders from source to target.
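CopyAll is not shown in the answer; a minimal sketch of such a helper (my own assumption of what it might look like) could be:
// Copy every file in the directory, then recurse into each subdirectory.
static void CopyAll(string sourceDir, string targetDir)
{
    Directory.CreateDirectory(targetDir);

    foreach (string file in Directory.GetFiles(sourceDir))
    {
        File.Copy(file, Path.Combine(targetDir, Path.GetFileName(file)), true);
    }

    foreach (string dir in Directory.GetDirectories(sourceDir))
    {
        CopyAll(dir, Path.Combine(targetDir, Path.GetFileName(dir)));
    }
}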
You can also use Microsoft's System.IO.Compression namespace to zip and unzip files and folders.

How to recursively list all the files in a directory and get the files "on going"

I'm writing a program that needs to search a directory and all its subdirectories for files that have a certain extension. This is going to be used both on a local and a network drive, so performance is a bit of an issue.
I know I can use this kind of option:
foreach (string file in Directory.EnumerateFiles(
    path, "*.*", SearchOption.AllDirectories))
{
    ///
}
but my folders are going to contain a lot of files, so I wondered how to implement this kind of search so that it returns the files "on going" instead of waiting until the whole search finishes (something like a queue).
If you mean that your own method should return the files one by one (note that EnumerateFiles() already does this), use yield return:
public IEnumerable<string> Foo(string path)
{
    foreach (string file in Directory.EnumerateFiles(
        path, "*.*", SearchOption.AllDirectories))
    {
        // Add additional logic if you need it here
        yield return file;
    }
}
That way, if you iterate over your method with foreach, you'll get one file at a time and can add additional logic inside the method.

Better Search for a string in all files using C# [closed]

Closed. This question is off-topic and is not currently accepting answers. Closed 10 years ago.
After referring to many blogs and articles, I have arrived at the following code for searching for a string in all files inside a folder. It works fine in my tests.
QUESTIONS
Is there a faster approach for this (using C#)?
Is there any scenario that will fail with this code?
Note: I tested with very small files, and only a small number of them.
CODE
static void Main()
{
    string sourceFolder = @"C:\Test";
    string searchWord = ".class1";
    List<string> allFiles = new List<string>();
    AddFileNamesToList(sourceFolder, allFiles);
    foreach (string fileName in allFiles)
    {
        string contents = File.ReadAllText(fileName);
        if (contents.Contains(searchWord))
        {
            Console.WriteLine(fileName);
        }
    }
    Console.WriteLine(" ");
    System.Console.ReadKey();
}
public static void AddFileNamesToList(string sourceDir, List<string> allFiles)
{
    string[] fileEntries = Directory.GetFiles(sourceDir);
    foreach (string fileName in fileEntries)
    {
        allFiles.Add(fileName);
    }

    //Recursion
    string[] subdirectoryEntries = Directory.GetDirectories(sourceDir);
    foreach (string item in subdirectoryEntries)
    {
        // Avoid "reparse points"
        if ((File.GetAttributes(item) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
        {
            AddFileNamesToList(item, allFiles);
        }
    }
}
REFERENCE
Using StreamReader to check if a file contains a string
Splitting a String with two criteria
C# detect folder junctions in a path
Detect Symbolic Links, Junction Points, Mount Points and Hard Links
FolderBrowserDialog SelectedPath with reparse points
C# - High Quality Byte Array Conversion of Images
Instead of File.ReadAllText(), better use
File.ReadLines(@"C:\file.txt");
It returns an IEnumerable<string> (lazily, via yield), so you will not have to read the whole file if your string is found before the last line of the text file is reached.
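For example, a small sketch reusing fileName and searchWord from the question's code (Any() requires using System.Linq):
// Stop reading the file as soon as a line contains the search word.
bool found = File.ReadLines(fileName).Any(line => line.Contains(searchWord));
if (found)
{
    Console.WriteLine(fileName);
}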
I wrote something very similar; here are a couple of changes I would recommend.
Use Directory.EnumerateDirectories instead of GetDirectories; it returns immediately with an IEnumerable, so you don't need to wait for it to finish reading all of the directories before processing.
Use ReadLines instead of ReadAllText; this loads only one line at a time into memory, which is a big deal if you hit a large file.
If you are using a new enough version of .NET, use Parallel.ForEach; this allows you to search multiple files at once.
You may not be able to open a file; check for read permissions or add to the manifest that your program requires administrative privileges (you should still check, though).
I was creating a binary search tool; here are some snippets of what I wrote to give you a hand.
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    Parallel.ForEach(Directory.EnumerateFiles(_folder, _filter, SearchOption.AllDirectories), Search);
}

//_array contains the binary pattern I am searching for.
private void Search(string filePath)
{
    if (Contains(filePath, _array))
    {
        //filePath points at a match.
    }
}

private static bool Contains(string path, byte[] search)
{
    //I am doing ReadAllBytes due to the fact that I am doing a binary search, not a text search.
    //There are no "Lines" to separate out on.
    var file = File.ReadAllBytes(path);
    var result = Parallel.For(0, file.Length - search.Length, (i, loopState) =>
    {
        if (file[i] == search[0])
        {
            byte[] localCache = new byte[search.Length];
            Array.Copy(file, i, localCache, 0, search.Length);
            if (Enumerable.SequenceEqual(localCache, search))
                loopState.Stop();
        }
    });
    return result.IsCompleted == false;
}
This uses two nested parallel loops. The design is terribly inefficient and could be greatly improved by using the Boyer-Moore search algorithm, but I could not find a binary implementation and did not have the time, when I originally wrote it, to implement the algorithm myself.
The main problem here is that you are searching all the files in real time for every search. There is also the possibility of file-access conflicts if two or more users are searching at the same time.
To dramatically improve performance, I would index the files ahead of time, and again as they are edited/saved. Store the index using something like Lucene.NET, then query the index (again using Lucene.NET) and return the matching file names to the user, so the user never queries the files directly.
If you follow the links in this SO post, you may get a head start on implementing the indexing. I didn't follow the links, but it's worth a look.
Just a heads up, this will be a big shift from your current approach and will require:
a service to monitor/index the files
the UI project
I think your code will fail with an exception if you lack permission to open a file.
Compare it with the code here: http://bgrep.codeplex.com/releases/view/36186
That latter code supports regular-expression search and filters for file extensions, which are things you should probably consider.
Instead of Contains, better to use the Boyer-Moore search algorithm.
Fail scenario: a file without read permission.

Get attributes of all files under a directory while accessing the directory only

I'm trying to write a function in C# that gets a directory path as parameter and returns a dictionary where the keys are the files directly under that directory and the values are their last modification time.
This is easy to do with Directory.GetFiles() and then File.GetLastWriteTime(). However, this means that every file must be accessed, which is too slow for my needs.
Is there a way to do this while accessing just the directory? Does the file system even support this kind of requirement?
Edit, after reading some answers:
Thank you guys, you are all saying pretty much the same - use FileInfo object. Still, it is just as slow to use Directory.GetFiles() (or Directory.EnumerateFiles()) to get those objects, and I suspect that getting them requires access to every file. If the file system keeps last modification time of its files in the files themselves only, there can't be a way to extract that info without file access. Is this the case here? Do GetFiles() and EnumerateFiles() of DirectoryInfo access every file or get their info from the directory entry? I know that if I would have wanted to get just the file names, I could do this with the Directory class without accessing every file. But getting attributes seems trickier...
Edit, following henk's response:
It seems that it really is faster to use the FileInfo object. I created the following test:
static void Main(string[] args)
{
    Console.WriteLine(DateTime.Now);

    foreach (string file in Directory.GetFiles(@"\\169.254.78.161\dir"))
    {
        DateTime x = File.GetLastWriteTime(file);
    }
    Console.WriteLine(DateTime.Now);

    DirectoryInfo dirInfo2 = new DirectoryInfo(@"\\169.254.78.161\dir");
    var files2 = from f in dirInfo2.EnumerateFiles()
                 select f;
    foreach (FileInfo file in files2)
    {
        DateTime x = file.LastWriteTime;
    }
    Console.WriteLine(DateTime.Now);
}
For about 800 files, I usually get something like:
31/08/2011 17:14:48
31/08/2011 17:14:51
31/08/2011 17:14:52
I didn't do any timings but your best bet is:
DirectoryInfo di = new DirectoryInfo(myPath);
FileInfo[] files = di.GetFiles();
I think all the FileInfo attributes are available in the directory's file records, so this should (could) require the minimum I/O.
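Building the dictionary the question asks for from those FileInfo objects might look like this (a sketch only; GetLastWriteTimes is a hypothetical helper name):
// Map each file directly under the directory to its last write time,
// using the FileInfo objects returned by one pass over the directory.
static Dictionary<string, DateTime> GetLastWriteTimes(string myPath)
{
    var result = new Dictionary<string, DateTime>();
    foreach (FileInfo file in new DirectoryInfo(myPath).EnumerateFiles())
    {
        result[file.FullName] = file.LastWriteTime;
    }
    return result;
}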
The only other thing I can think of is using the FileInfo class. As far as I can see, this might help you, or it might end up reading the file as well (read permissions are required).

Moving files on different volumes in .NET

Apparently I can't move files on different volumes using Directory.Move.
I have read that I have to copy each file individually to the destination, then delete the source directory.
Do I have any other option?
Regardless of whether Directory.Move (or any other function) performed the move between volumes, it would essentially be doing a copy and delete underneath anyway, so a speed increase is not going to happen. I think the best solution is to write your own reusable move function, which gets the volume root (C:, D:) from the source and destination paths and then either performs a move or a copy+delete, as necessary.
To my knowledge there is no other way; however, deleting a directory has a catch: read-only files might cause an UnauthorizedAccessException when deleting a directory and all of its contents.
This recurses through a directory and clears all the read-only flags. Call it before Directory.Delete:
public void removeReadOnlyDeep(string directory)
{
    string[] files = Directory.GetFiles(directory);
    foreach (string file in files)
    {
        FileAttributes attributes = File.GetAttributes(file);
        if ((attributes & FileAttributes.ReadOnly) != 0)
        {
            // Clear only the ReadOnly bit, leaving the other attributes untouched.
            File.SetAttributes(file, attributes & ~FileAttributes.ReadOnly);
        }
    }

    string[] dirs = Directory.GetDirectories(directory);
    foreach (string dir in dirs)
    {
        removeReadOnlyDeep(dir);
    }
}
An easier option would be to add a reference to the Microsoft.VisualBasic assembly and use the MoveDirectory method, which can move across volumes.
Microsoft.VisualBasic.FileIO.FileSystem.MoveDirectory(sourceDirName, destDirName);
Try to use this:
public static void RobustMove(string source, string destination)
{
    // Move if both directories are on the same volume
    if (Path.GetPathRoot(source) == Path.GetPathRoot(destination))
    {
        Directory.Move(source, destination);
    }
    else
    {
        CopyDirectoryRecursive(source, destination);
        Directory.Delete(source, true);
    }
}
You will find the CopyDirectoryRecursive function here:
This should work unless you use spanned volumes or symbolic links to another physical disk.
To be even more robust, you can improve this function to try Move first and switch to copying and deleting when a System.IO.IOException is thrown.
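A minimal sketch of what a CopyDirectoryRecursive helper might look like (my own assumption, not the linked code):
// Copy every file in the source directory, then recurse into each subdirectory.
private static void CopyDirectoryRecursive(string source, string destination)
{
    Directory.CreateDirectory(destination);

    foreach (string file in Directory.GetFiles(source))
    {
        File.Copy(file, Path.Combine(destination, Path.GetFileName(file)), true);
    }

    foreach (string dir in Directory.GetDirectories(source))
    {
        CopyDirectoryRecursive(dir, Path.Combine(destination, Path.GetFileName(dir)));
    }
}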
