I have some functions that let a user search multiple directories for files of a certain type; the path of each matching file is then added to a listbox. Right now this is done with nested foreach statements. It will be retrieving hundreds of thousands of file paths, so I'm curious what more efficient approaches there are.
Also, I know adding that many items to a listbox sounds dumb; I'm only doing what I was told to do. I suspect I'll be asked to remove the listbox in the future, but the file paths will still have to be stored in a list somewhere.
Note: I'm using the WindowsAPICodePack to get a dialog box that allows multiple directory selection.
List<string> selectedDirectories = new List<string>();
/// <summary>
/// Adds the paths of the directories chosen by the user into a list
/// </summary>
public void AddFilesToList()
{
selectedDirectories.Clear(); //make sure list is empty
var dlg = new CommonOpenFileDialog();
dlg.IsFolderPicker = true;
dlg.AddToMostRecentlyUsedList = false;
dlg.AllowNonFileSystemItems = false;
dlg.EnsureFileExists = true;
dlg.EnsurePathExists = true;
dlg.EnsureReadOnly = false;
dlg.EnsureValidNames = true;
dlg.Multiselect = true;
dlg.ShowPlacesList = true;
if (dlg.ShowDialog() == CommonFileDialogResult.Ok)
{
selectedDirectories = dlg.FileNames.ToList(); //add paths of selected directories to list
}
}
/// <summary>
/// Populates a listbox with all the filepaths of the selected type of file the user has chosen
/// </summary>
public void PopulateListBox()
{
foreach (string directoryPath in selectedDirectories) //for each directory in list
{
foreach (string ext in (dynamic)ImageCB.SelectedValue) //for each file type selected in dropdown
{
foreach (string imagePath in Directory.GetFiles(directoryPath, ext, SearchOption.AllDirectories)) //for each file in specified directory w/ specified format(s)
{
ListBox1.Items.Add(imagePath); //add file path to listbox
}
}
}
}
Edit: Not sure if it makes a difference, but I'm using the WPF listbox, not winforms.
One way to begin refactoring this outside of learning Linq would be to use the AddRange method. A good explanation as to its performance advantages over a for loop:
https://stackoverflow.com/a/9836512/4846465
There's probably no one answer to this question however.
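var imagePaths = new List<string>();
foreach (var directoryPath in selectedDirectories)
{
foreach (string ext in (dynamic)ImageCB.SelectedValue)
{
// GetFiles already returns string[], so no ToArray() is needed
imagePaths.AddRange(Directory.GetFiles(directoryPath, ext, SearchOption.AllDirectories));
}
}
// WPF's ItemCollection has no AddRange, so bind the finished list once
ListBox1.ItemsSource = imagePaths;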
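As a hedged sketch of where this can go with LINQ: the helper below streams matches with Directory.EnumerateFiles instead of building one array per call, and flattens the directory and pattern loops with SelectMany. The names FileFinder, FindFiles, directories, and patterns are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileFinder
{
    // Hypothetical helper: returns every file under the given directories
    // that matches any of the given search patterns (e.g. "*.jpg").
    public static List<string> FindFiles(IEnumerable<string> directories,
                                         IEnumerable<string> patterns)
    {
        // Materialize the patterns once so they can be re-enumerated per directory.
        var patternList = patterns.ToList();

        return directories
            .SelectMany(dir => patternList.SelectMany(pattern =>
                Directory.EnumerateFiles(dir, pattern, SearchOption.AllDirectories)))
            // Overlapping patterns or nested selections can yield the same path twice.
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

The result can then be assigned to the listbox in one step (for example through ItemsSource in WPF) rather than item by item. Like the loops above, this still throws if any subdirectory is inaccessible.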
You can refactor it, or you can leave it as it is.
If you refactor it:
Your code will be more readable, understandable, and reusable.
You only need to write a couple of methods.
Those methods can be reused elsewhere, not just for the current task.
And it works.
If you leave it as it is:
Your code works, but it is hard to read and understand, and hard to debug if something goes wrong.
But it works.
Related
I need to convert images (such as .jpg) to PDF files for a school assignment. I have a ListBox that holds the pages of the PDF file, so the user can reorder the list and convert the files in that order.
I keep the files in a temporary folder and read them from there to convert them to PDF.
My problem is: how do I convert the files in the order the user has chosen?
I already searched, and I tried writing a class with the strings ID and Name so that I can get the ID from the item in the ListBox and change it in a new list. After that, I think I run a foreach() loop where I get the files from the temporary folder and merge them into a new PDF file. But to do it in the order I want, I think I have to compare the name of each file with the name in the list and, if it matches, convert and add it; if not, move on to the next file.
But I don't know how to do it.
Can someone please help me get this right?
Thanks in advance!
Here is my code too:
//the open files button
private void proc2_Click(object sender, EventArgs e)
{
OpenFileDialog dialogo = new OpenFileDialog();
dialogo.Title = "Search files";
dialogo.InitialDirectory = @"E:\";
dialogo.Filter = "Images (.bmp,.jpg,.png,.tiff,.tif)|*.bmp;*.jpg;*.png;*.tiff;*.tif|All of the files (*.*)|*.*";
DialogResult resposta = dialogo.ShowDialog();
if (resposta == DialogResult.OK)
{
string caminhoCompleto = dialogo.FileName;
caminho2 = dialogo.SafeFileName;
caminhotb2.Text = caminhoCompleto;
string fish = "";
string path = @"C:\temporario";
if(Directory.Exists(path))
{
fish=Path.Combine(path, caminho2);
}
else
{
Directory.CreateDirectory(path);
fish = Path.Combine(path, caminho2);
}
File.Copy(caminhoCompleto, fish, true); // copy the chosen image; File.Create would leave an empty, still-open file
listaimg.Items.Add(caminho2);
}
}
public string[] GetFilesImg4() //jpg files
{
if (!Directory.Exists(@"C:\temporario"))
{
Directory.CreateDirectory(@"C:\temporario");
}
DirectoryInfo dirInfo = new DirectoryInfo(@"C:\temporario");
FileInfo[] fileInfos4 = dirInfo.GetFiles("*.jpg");
foreach (FileInfo info in fileInfos4)
{
if (info.Name.IndexOf("protected") == -1)
list4.Add(info.FullName);
}
return (string[])list4.ToArray(typeof(string));
}
If both actions happen in the same process, you can just store the list of file names in memory (and you already do add them to listaimg):
public string[] GetFilesImg4() //jpg files
{
string tempPath = @"C:\temporario";
if (Directory.Exists(tempPath))
{
foreach (string filename in listaimg.Items) // items are already in the user's order
{
if (!filename.Contains("protected"))
list4.Add(Path.Combine(tempPath, filename));
}
}
return (string[])list4.ToArray(typeof(string));
}
If these are different processes, you can dump the contents of listaimg to a file at some point and then read the order back from that file. In the example below I store it in a file named "order.txt" in the same directory, but the logic may be more complicated, such as merging several files with a timestamp.
// somewhere after all files have been selected
File.WriteAllLines(@"c:\temporario\order.txt", listaimg.Items.Cast<object>().Select(t => t.ToString()));
public string[] GetFilesImg4() //jpg files
{
string tempPath = @"C:\temporario";
if (Directory.Exists(tempPath))
{
var orderedFilenames = File.ReadAllLines(Path.Combine(tempPath, "order.txt")); // list of files loaded in order
foreach (string filename in orderedFilenames)
{
if (!filename.Contains("protected"))
list4.Add(Path.Combine(tempPath, filename));
}
}
return (string[])list4.ToArray(typeof(string));
}
It's also a good idea to examine the methods available on a class. In this case, string.IndexOf(s) == -1 is equivalent to !string.Contains(s), and the latter is much more readable, at least for an English-speaking person.
I also noticed that your users have to select documents one by one, but open-file dialogs allow selecting multiple files at a time (set Multiselect to true), and I believe the order of selection is preserved as well.
If the order of selection is important and the dialog doesn't preserve it, or users find it hard to follow, you can still use a multi-select open dialog and then let users reorder the listaimg list box to get the order right.
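To make the ordering idea concrete, here is a minimal sketch that maps the names in list-box order to full paths in the temporary folder, so the conversion loop can walk the result directly instead of comparing directory listings against the list. PageOrder and ToOrderedPaths are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PageOrder
{
    // Hypothetical helper: turns the names shown in the list box (already in
    // the user's order) into full paths, skipping names with no file on disk.
    public static List<string> ToOrderedPaths(IEnumerable<string> orderedNames,
                                              string folder)
    {
        return orderedNames
            .Select(name => Path.Combine(folder, name))
            .Where(path => File.Exists(path))
            .ToList();
    }
}
```

With a WinForms ListBox, the names could be supplied as listaimg.Items.Cast<object>().Select(i => i.ToString()); the order of the returned list follows the order of the list box, not the order in which the file system lists the folder.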
I am new to C#. I have a text box where I enter the file to search for and a 'search' button. On click of search I want it to list the matching files in the folder, but I get the above error. Below is my code:
string[] directories = Directory.GetDirectories(@"d:\",
"*",
SearchOption.AllDirectories);
string file = textBox1.Text;
DataGrid dg = new DataGrid();
{
var files = new List<string>();
foreach (DriveInfo d in DriveInfo.GetDrives().Where(x => x.IsReady))
{
try
{
files.AddRange(Directory.GetFiles(d.RootDirectory.FullName, file , SearchOption.AllDirectories));
}
catch(Exception ex)
{
MessageBox.Show("the exception is " + ex.ToString());
//Logger.Log(e.Message); // Log it and move on
}
}
Please help me resolve it. Thanks.
The most important rule when searching a folder that may contain inaccessible subfolders is:
Do NOT use SearchOption.AllDirectories!
Use SearchOption.TopDirectoryOnly instead, combined with a recursive search over all the accessible directories.
With SearchOption.AllDirectories, a single access-denied error (UnauthorizedAccessException) aborts the whole call before any file or directory is returned. With SearchOption.TopDirectoryOnly, you skip only what is inaccessible.
A more difficult alternative is to call Directory.GetAccessControl() on each child directory to check beforehand whether you have access to it (this is rather hard, though; I don't recommend it unless you know exactly how the access system works).
For recursive search, I have this code implemented for my own use:
public static List<string> GetAllAccessibleDirectories(string path, string searchPattern) {
List<string> dirPathList = new List<string>();
try {
List<string> childDirPathList = Directory.GetDirectories(path, searchPattern, SearchOption.TopDirectoryOnly).ToList(); //use TopDirectoryOnly
if (childDirPathList == null || childDirPathList.Count <= 0) //this directory has no child
return null;
foreach (string childDirPath in childDirPathList) { //foreach child directory, do recursive search
dirPathList.Add(childDirPath); //add the path
List<string> grandChildDirPath = GetAllAccessibleDirectories(childDirPath, searchPattern);
if (grandChildDirPath != null && grandChildDirPath.Count > 0) //this child directory has children and nothing has gone wrong
dirPathList.AddRange(grandChildDirPath.ToArray()); //add the grandchildren to the list
}
return dirPathList; //return the whole list found at this level
} catch {
return null; //something has gone wrong, return null
}
}
This is how you call it
List<string> accessibleDirs = GetAllAccessibleDirectories(myrootpath, "*");
Then, you only need to search/add the files among all accessible directories.
Note: this is a classic question, and I believe there are other, better solutions out there too.
If there are directories you particularly want to avoid, you can filter the resulting list with LINQ after you get all your accessible directories, using part of the directory's name as a keyword (e.g. $Recycle.Bin).
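The same "top directory only, then descend" idea can also be written without recursion. The sketch below is one possible variant, not the code from the answer above: it keeps a stack of directories still to visit and skips any directory that throws on access. SafeSearch and EnumerateAccessibleFiles are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeSearch
{
    // Hypothetical helper: lazily yields matching files under root, skipping
    // any directory that cannot be read instead of aborting the whole search.
    public static IEnumerable<string> EnumerateAccessibleFiles(string root, string pattern)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                // Both calls look at the top level of 'current' only.
                files = Directory.GetFiles(current, pattern);
                subDirs = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException) { continue; } // no permission: skip
            catch (IOException) { continue; }                 // vanished, path too long, ...

            foreach (string file in files)
                yield return file;
            foreach (string dir in subDirs)
                pending.Push(dir);
        }
    }
}
```

Because the method is an iterator, callers can start processing results before the whole tree has been walked, for example with foreach (var f in SafeSearch.EnumerateAccessibleFiles(myrootpath, "*.jpg")).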
As Ian specified in his post, do not use recursive file listing (Directory.GetFiles(path, searchPattern, SearchOption.AllDirectories)) in a case like yours, since the first exception will stop further processing.
Also, to somewhat alleviate such issues and get better results in general, you should run this program as an administrator. To do so, right-click your application in Windows Explorer, choose Properties, and check the "Run this program as an administrator" option on the Compatibility tab.
Also, use code like the following for your search, so that intermediate exceptions do not stop further searching.
static void Main(string[] args) {
string fileToFind = "*.jpg";
var files = new List<string>();
foreach (DriveInfo d in DriveInfo.GetDrives().Where(x => x.IsReady))
files.AddRange(FindDirectory(fileToFind, d.RootDirectory.FullName));
}
/// <summary>
/// This function returns the full file path of the matches it finds.
/// 1. It does not do any parameter validation
/// 2. It searches recursively
/// 3. It eats up any error that occurs when requesting files and directories within the specified path
/// 4. Supports specifying wildcards in the fileToFind parameter.
/// </summary>
/// <param name="fileToFind">Name of the file to search, without the path</param>
/// <param name="path">The path under which the file needs to be searched</param>
/// <returns>Enumeration of all valid full file paths matching the file</returns>
public static IEnumerable<string> FindDirectory(string fileToFind, string path) {
// Check if "path" directly contains "fileToFind"
string[] files = null;
try {
files = Directory.GetFiles(path, fileToFind);
} catch { }
if (files != null) {
foreach (var file in files)
yield return file;
}
// Check all sub-directories of "path" to see if they contain "fileToFInd"
string[] subDirs = null;
try {
subDirs = Directory.GetDirectories(path);
} catch { }
if (subDirs == null)
yield break;
foreach (var subDir in subDirs)
foreach (var foundFile in FindDirectory(fileToFind, subDir))
yield return foundFile;
}
I am trying to build a recursive search function for a web service that returns a list of files and folders. I created two methods that together act as a recursive search: getFiles first gets the top-level contents, then adds any files to fileList and any subfolders to subFolderList. We pass in the access level (in our case root) and the path whose contents we want. If any folders were found, it removes the first folder from the list because the search for that folder is about to begin, then calls processDirectories, which passes the new path back to getFiles, starting the process over again. For testing, my folder structure is shown below. When it goes to add the second file (profilepic.png) to the list, I get the error "Collection was modified; enumeration operation may not execute." What is causing this error?
Photos
picture1.png
TestFolder
profilepic.png
my code:
public static List<string> fileList = new List<string>();
public static List<string> subFolderList = new List<string>();
static void processDirectories(string access, string Folder)
{
getFiles(access, Folder);
}
static void getFiles(string access, string Folder)
{
var accessToken = new OAuthToken(token, secret);
var api = new DssAPI(ConsumerKey, ConsumerSecret, accessToken);
var folder = api.GetContents(access, Folder);//Get list from WebService
foreach (var item in folder.Contents)//Contents is an IEnumerable
{
if (item.IsDirectory == true)
subFolderList.Add(item.Path);
else
fileList.Add(item.Path);
}
foreach (var subFolder in subFolderList)
{
subFolderList.RemoveAt(0);
processDirectories(root, subFolder);
}
}
Assuming you're not writing this as an academic exercise, you can use Directory.EnumerateFiles and avoid implementing this yourself.
foreach(var png in Directory.EnumerateFiles(sourceDirectory, "*.png", SearchOption.AllDirectories))
{
// do something with the png file
}
Change that:
foreach (var subFolder in subFolderList)
{
subFolderList.RemoveAt(0);
processDirectories(root, subFolder);
}
To:
while (subFolderList.Count > 0)
{
var subFolder = subFolderList[0];
subFolderList.RemoveAt(0);
processDirectories(root, subFolder);
}
A collection cannot be modified while iterating through it, so when you're foreach-ing it and removing items from it inside the iteration, it causes trouble. The workaround is usually using a for loop and manipulating the loop-variable appropriately, but in your case a while loop is simpler.
the problem is here
foreach (var subFolder in subFolderList)
{
subFolderList.RemoveAt(0);
processDirectories(root, subFolder);
}
You're iterating over subFolderList and removing items from it at the same time. The enumerator can't handle that, so it throws.
What I would suggest in this case is a regular for loop.
Try this,
public static void GetFilesLocal(string path)
{
foreach (string f in Directory.GetFiles( path))
{
// Add f to fileList.
}
foreach (string d in Directory.GetDirectories( path))
{
GetFilesLocal( d );
}
}
You cannot iterate over a collection and modify it at the same time, as the error message says. For example, the lower foreach iterates subFolderList and then you remove the first item; after that, the iterator is no longer valid.
If you want to modify the collection, you should use a for loop, but then you have to remember to decrease the index variable when you delete an item at or before the current index.
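Another way to sidestep the problem is to treat the pending folders as a work queue that is only touched through Enqueue and Dequeue, never inside a foreach over the queue itself. The sketch below assumes a dictionary as a stand-in for the web-service call (api.GetContents in the question); FolderWalker, CollectFiles, and contentsByFolder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class FolderWalker
{
    // contentsByFolder is a stand-in for the web service: it maps a folder
    // path to its child paths. Paths without an entry are treated as files.
    public static List<string> CollectFiles(
        string root, IDictionary<string, string[]> contentsByFolder)
    {
        var files = new List<string>();
        var pending = new Queue<string>();
        pending.Enqueue(root);

        // The loop condition re-checks Count each time, so adding folders
        // while working through the queue is safe.
        while (pending.Count > 0)
        {
            string folder = pending.Dequeue();
            foreach (string item in contentsByFolder[folder])
            {
                if (contentsByFolder.ContainsKey(item))
                    pending.Enqueue(item); // sub-folder: visit later
                else
                    files.Add(item);       // file: record it
            }
        }
        return files;
    }
}
```

The inner foreach iterates the contents of one folder, which is never modified, while the queue grows and shrinks only outside any enumeration of it.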
I am just learning C# (I have been fiddling with it for about 2 days now), and I've decided that, for learning purposes, I will rebuild an old app I made in VB6 for syncing files (generally across a network).
When I wrote the code in VB 6, it worked approximately like this:
Create a Scripting.FileSystemObject
Create directory objects for the source and destination
Create file listing objects for the source and destination
Iterate through the source object, and check to see if it exists in the destination
if not, create it
if so, check to see if the source version is newer/larger, and if so, overwrite the other
So far, this is what I have:
private bool syncFiles(string sourcePath, string destPath) {
DirectoryInfo source = new DirectoryInfo(sourcePath);
DirectoryInfo dest = new DirectoryInfo(destPath);
if (!source.Exists) {
LogLine("Source Folder Not Found!");
return false;
}
if (!dest.Exists) {
LogLine("Destination Folder Not Found!");
return false;
}
FileInfo[] sourceFiles = source.GetFiles();
FileInfo[] destFiles = dest.GetFiles();
foreach (FileInfo file in sourceFiles) {
// check exists on file
}
if (optRecursive.Checked) {
foreach (DirectoryInfo subDir in source.GetDirectories()) {
// create-if-not-exists destination subdirectory
syncFiles(sourcePath + subDir.Name, destPath + subDir.Name);
}
}
return true;
}
I have read examples that seem to advocate using the FileInfo or DirectoryInfo objects to do checks with the "Exists" property, but I am specifically looking for a way to search an existing collection/list of files, and not live checks to the file system for each file, since I will be doing so across the network and constantly going back to a multi-thousand-file directory is slow slow slow.
Thanks in Advance.
The GetFiles() method only returns files that exist; it doesn't make up files that don't. So all you have to do is check whether each file exists in the other list.
Something along the lines of this could work:
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();
foreach (var file in sourceFiles)
{
if(!destFiles.Any(x => x.Name == file.Name))
{
// Do whatever
}
}
Note: You have of course no guarantee that something hasn't changed after you have done the calls to GetFiles(). For example, a file could have been deleted or renamed if you try to copy it later.
This could perhaps be done more neatly with the Except method or something similar. For example:
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();
var sourceFilesMissingInDestination = sourceFiles.Except(destFiles, new FileNameComparer());
foreach (var file in sourceFilesMissingInDestination)
{
// Do whatever
}
Where the FileNameComparer is implemented like so:
public class FileNameComparer : IEqualityComparer<FileInfo>
{
public bool Equals(FileInfo x, FileInfo y)
{
return Equals(x.Name, y.Name);
}
public int GetHashCode(FileInfo obj)
{
return obj.Name.GetHashCode();
}
}
Untested though :p
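Since the question asks for lookups against an already-fetched listing, and destFiles.Any(...) rescans the destination array for every source file, a dictionary keyed by file name is one possible refinement. The sketch below is an assumption-laden illustration rather than a drop-in replacement: SyncPlanner and FilesToCopy are invented names, the "newer or larger" rule is taken from the question, and the case-insensitive key assumes Windows-style file names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SyncPlanner
{
    // Hypothetical helper: returns the source files that are missing from the
    // destination listing, or that are newer or larger than their counterpart.
    public static List<FileInfo> FilesToCopy(FileInfo[] sourceFiles, FileInfo[] destFiles)
    {
        // One pass builds the index; each later lookup is O(1) on average.
        // Assumes names are unique when compared case-insensitively.
        var destByName = destFiles.ToDictionary(
            f => f.Name, StringComparer.OrdinalIgnoreCase);

        var result = new List<FileInfo>();
        foreach (FileInfo src in sourceFiles)
        {
            FileInfo dst;
            if (!destByName.TryGetValue(src.Name, out dst)
                || src.LastWriteTimeUtc > dst.LastWriteTimeUtc
                || src.Length > dst.Length)
            {
                result.Add(src);
            }
        }
        return result;
    }
}
```

It would be called as SyncPlanner.FilesToCopy(source.GetFiles(), dest.GetFiles()). As noted above, the listings are snapshots, so a file may still change between the comparison and the copy.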
One little detail, instead of
sourcePath + subDir.Name
I would use
System.IO.Path.Combine(sourcePath, subDir.Name)
Path performs reliable, OS-independent operations on file and folder names.
I also notice optRecursive.Checked popping out of nowhere. As a matter of good design, make that a parameter:
bool syncFiles(string sourcePath, string destPath, bool checkRecursive)
And since you mention it may be used for large numbers of files, keep an eye out for .NET 4: it has an IEnumerable-based replacement for GetFiles() (EnumerateFiles) that lets you process the files in a streaming fashion.
I have a job that runs every night to pull xml files from a directory that has over 20,000 subfolders under the root. Here is what the structure looks like:
rootFolder/someFolder/someSubFolder/xml/myFile.xml
rootFolder/someFolder/someSubFolder1/xml/myFile1.xml
rootFolder/someFolder/someSubFolderN/xml/myFile2.xml
rootFolder/someFolder1
rootFolder/someFolderN
So looking at the above, the structure is always the same - a root folder, then two subfolders, then an xml directory, and then the xml file.
Only the name of the rootFolder and the xml directory are known to me.
The code below traverses through all the directories and is extremely slow. Any recommendations on how I can optimize the search especially if the directory structure is known?
string[] files = Directory.GetFiles(@"\\somenetworkpath\rootFolder", "*.xml", SearchOption.AllDirectories);
Rather than using GetFiles for a brute-force search, you could use GetDirectories: first get the list of first-level subfolders, loop through those, repeat the process for the second-level subfolders, then look for the xml folder in each, and finally search it for .xml files.
Performance will vary, but searching for directories first and THEN getting the files should help a lot!
Update
Ok, I did a quick bit of testing and you can actually optimize it much further than I thought.
The following code snippet will search a directory structure and find ALL "xml" folders inside the entire directory tree.
string startPath = @"C:\Testing\Testing\bin\Debug";
string[] oDirectories = Directory.GetDirectories(startPath, "xml", SearchOption.AllDirectories);
Console.WriteLine(oDirectories.Length.ToString());
foreach (string oCurrent in oDirectories)
Console.WriteLine(oCurrent);
Console.ReadLine();
If you drop that into a test console app you will see it output the results.
Now, once you have this, just look in each of the found directories for your .xml files.
I created a recursive method GetFolders using a Parallel.ForEach to find all the folders named as the variable yourKeyword
List<string> returnFolders = new List<string>();
object locker = new object();
Parallel.ForEach(subFolders, subFolder =>
{
if (subFolder.ToUpper().EndsWith(yourKeyword))
{
lock (locker)
{
returnFolders.Add(subFolder);
}
}
else
{
lock (locker)
{
returnFolders.AddRange(GetFolders(Directory.GetDirectories(subFolder)));
}
}
});
return returnFolders;
Are there additional directories at the same level as the xml folder? If so, you could probably speed up the search if you do it yourself and eliminate that level from searching.
System.IO.DirectoryInfo root = new System.IO.DirectoryInfo(rootPath);
List<System.IO.FileInfo> xmlFiles=new List<System.IO.FileInfo>();
foreach (System.IO.DirectoryInfo subDir1 in root.GetDirectories())
{
foreach (System.IO.DirectoryInfo subDir2 in subDir1.GetDirectories())
{
System.IO.DirectoryInfo xmlDir = new System.IO.DirectoryInfo(System.IO.Path.Combine(subDir2.FullName, "xml"));
if (xmlDir.Exists)
{
xmlFiles.AddRange(xmlDir.GetFiles("*.xml"));
}
}
}
I can't think of anything faster in C#, but do you have indexing turned on for that file system?
The only thing I can see making much difference is switching from a brute-force hunt to a third-party or OS indexing routine to speed up the lookup; that way the search is done offline from your app.
But I would also suggest you should look at better ways to structure that data if at all possible.
Use P/Invoke on FindFirstFile/FindNextFile/FindClose and avoid overhead of creating lots of FileInfo instances.
But this will be hard work to get right (you will have to handle file vs. directory and recursion yourself). So start with something simple (Directory.GetFiles(), Directory.GetDirectories()) and get things working. If it is too slow, look at alternatives (but always measure; it is too easy to make it slower).
Depending on your needs and configuration, you could utilize the Windows Search Index: https://msdn.microsoft.com/en-us/library/windows/desktop/bb266517(v=vs.85).aspx
Depending on your configuration this could increase performance greatly.
For file and directory searches, I would suggest a multithreaded .NET library that offers a wide range of search options.
All information about the library is on GitHub: https://github.com/VladPVS/FastSearchLibrary
You can download it here: https://github.com/VladPVS/FastSearchLibrary/releases
If you have any questions, please ask.
It works really fast. Check it yourself!
Here is one example of how you can use it:
class Searcher
{
private static object locker = new object();
private FileSearcher searcher;
List<FileInfo> files;
public Searcher()
{
files = new List<FileInfo>();
}
public void Startsearch()
{
CancellationTokenSource tokenSource = new CancellationTokenSource();
searcher = new FileSearcher(@"C:\", (f) =>
{
return Regex.IsMatch(f.Name, @".*[Dd]ragon.*\.jpg$"); // dot escaped so it matches a literal "."
}, tokenSource);
searcher.FilesFound += (sender, arg) =>
{
lock (locker) // using a lock is mandatory
{
arg.Files.ForEach((f) =>
{
files.Add(f);
Console.WriteLine($"File location: {f.FullName}, \nCreation.Time: {f.CreationTime}");
});
if (files.Count >= 10)
searcher.StopSearch();
}
};
searcher.SearchCompleted += (sender, arg) =>
{
if (arg.IsCanceled)
Console.WriteLine("Search stopped.");
else
Console.WriteLine("Search completed.");
Console.WriteLine($"Quantity of files: {files.Count}");
};
searcher.StartSearchAsync();
}
}
Here is part of another example:
***
List<string> folders = new List<string>
{
@"C:\Users\Public",
@"C:\Windows\System32",
@"D:\Program Files",
@"D:\Program Files (x86)"
}; // list of search directories
List<string> keywords = new List<string> { "word1", "word2", "word3" }; // list of search keywords
FileSearcherMultiple multipleSearcher = new FileSearcherMultiple(folders, (f) =>
{
if (f.CreationTime >= new DateTime(2015, 3, 15) &&
(f.Extension == ".cs" || f.Extension == ".sln"))
foreach (var keyword in keywords)
if (f.Name.Contains(keyword))
return true;
return false;
}, tokenSource, ExecuteHandlers.InCurrentTask, true);
***
Moreover, you can use a simple static method:
List<FileInfo> files = FileSearcher.GetFilesFast(@"C:\Users", "*.xml");
Note that the methods of this library DO NOT throw UnauthorizedAccessException, unlike the standard .NET search methods.
Furthermore, on a multicore processor the fast methods of this library run at least 2 times faster than a simple single-threaded recursive algorithm.
For those of you who want to search for a single file and know your root directory, I suggest keeping it as simple as possible. This approach worked for me.
private void btnSearch_Click(object sender, EventArgs e)
{
string userinput = txtInput.Text;
string sourceFolder = @"C:\mytestDir\";
string searchWord = txtInput.Text + ".pdf";
string filePresentCK = sourceFolder + searchWord;
if (File.Exists(filePresentCK))
{
pdfViewer1.LoadFromFile(sourceFolder+searchWord);
}
else if(! File.Exists(filePresentCK))
{
MessageBox.Show("Unable to Find file :" + searchWord);
}
txtInput.Clear();
}// end of btnSearch method