Selecting entries according to running total - c#

I would like to select from a list of files only so many files that their total size does not exceed a threshold (i.e. the amount of free space on the target drive).
I understand that I could do this by adding up file sizes in a loop until I hit the threshold and then use that number to select files from the list. However, is it possible to do that with a LINQ-query instead?

This could work (files is a List<FileInfo>):
var availableSpace = DriveInfo.GetDrives()
.First(d => d.Name == @"C:\").AvailableFreeSpace;
long usedSpace = 0;
var availableFiles = files
.TakeWhile(f => (usedSpace += f.Length) < availableSpace);
foreach (FileInfo file in availableFiles)
{
Console.WriteLine(file.Name);
}

You can achieve that by using a closure:
var directory = new DirectoryInfo(@"c:\temp");
var files = directory.GetFiles();
long maxTotalSize = 2000000;
long aggregatedSize = 0;
var result = files.TakeWhile(fileInfo =>
{
aggregatedSize += fileInfo.Length;
return aggregatedSize <= maxTotalSize;
});
There's a caveat, though: the variable aggregatedSize lives outside the lambda, so it may be modified elsewhere after the query has been defined.
You could wrap that in an extension method, though. The lambda still captures aggregatedSize, but the variable is then local to the method and cannot be modified from outside:
public static IEnumerable<FileInfo> GetWithMaxAggregatedSize(this IEnumerable<FileInfo> files, long maxTotalSize)
{
long aggregatedSize = 0;
return files.TakeWhile(fileInfo =>
{
aggregatedSize += fileInfo.Length;
return aggregatedSize <= maxTotalSize;
});
}
You finally use the method like this:
var directory = new DirectoryInfo(@"c:\temp");
var files = directory.GetFiles().GetWithMaxAggregatedSize(2000000);
EDIT: I replaced the Where-method with the TakeWhile-method. The TakeWhile-extension will stop once the threshold has been reached, while the Where-extension will continue. Credits for bringing up the TakeWhile-extension go to Tim Schmelter.
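One thing to keep in mind with the TakeWhile-based extension method: TakeWhile is evaluated lazily and aggregatedSize is created once per call, so enumerating the returned sequence a second time continues from the previous total. A minimal sketch of an iterator-based variant that starts from zero on every enumeration follows. The name TakeWhileTotalAtMost and its selector parameter are hypothetical, not part of the framework or of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class RunningTotalExtensions
{
    // Hypothetical helper: yields items while the running total of
    // selector(item) stays at or below maxTotal.
    public static IEnumerable<T> TakeWhileTotalAtMost<T>(
        this IEnumerable<T> source, Func<T, long> selector, long maxTotal)
    {
        long total = 0; // re-initialized each time the sequence is enumerated
        foreach (T item in source)
        {
            total += selector(item);
            if (total > maxTotal)
                yield break; // threshold exceeded: stop without yielding this item
            yield return item;
        }
    }
}
```

For the file case this would be called as directory.GetFiles().TakeWhileTotalAtMost(f => f.Length, 2000000).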

Related

Count how many files starts with the same first characters c#

I want to write a function that counts how many distinct groups of files in a selected folder start with the same 10 characters.
For example, if the folder contains files named File1, File2, File3, the count will be 1, because all 3 files start with the same characters "File". If the folder contains
File1,File2,File3,Docs1,Docs2,pdfs1,pdfs2,pdfs3,pdfs4
the count will be 3, because there are 3 unique values for fileName.Substring(0, 4).
I've tried something like this, but it gives the total number of files in the folder.
int count = 0;
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
if (fileName.Substring(0, 10) == fileName.Substring(0, 10))
{
count++;
}
}
Any idea how to count this?
You can try querying the directory with the help of LINQ:
using System.IO;
using System.Linq;
...
int n = 10;
int count = Directory
.EnumerateFiles(folderLocation, "*.*")
.Select(file => Path.GetFileNameWithoutExtension(file))
.Select(file => file.Length > n ? file.Substring(0, n) : file)
.GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
.OrderByDescending(group => group.Count())
.FirstOrDefault()
?.Count() ?? 0;
You could keep a list of the unique prefixes seen so far, and check whether each file's prefix is already in that list:
int count = 0;
int length = 4; // number of leading characters to compare
List<string> list = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
    bool inKnown = false;
    string fileName = Path.GetFileName(file);
    // Names shorter than the prefix length are compared as a whole
    string prefix = fileName.Length > length ? fileName.Substring(0, length) : fileName;
    foreach (string s in list)
    {
        if (s == prefix)
        {
            inKnown = true;
            break;
        }
    }
    if (!inKnown)
    {
        // First time this prefix is seen: count it and remember it
        count++;
        list.Add(prefix);
    }
}
The limitation here is that you are asking if the first ten characters are the same, but your examples given showed the first 4, so just adjust the length variable according to how many characters you would like to check for.
@acornTime gave me the idea. His solution didn't work, but this did. Thanks for the help!
List<string> list = new List<string>();
foreach (string file in Directory.GetFiles(folderLocation))
{
string fileName = Path.GetFileName(file);
list.Add(fileName.Substring(0, 10));
}
list = list.Distinct().ToList();
//count how many items are in list
int count = list.Count;
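Note that Substring(0, 10) throws an ArgumentOutOfRangeException for any file name shorter than 10 characters. A hedged sketch of the same distinct-prefix count that tolerates short names is shown here. The class and method names (PrefixCounter, CountDistinctPrefixes, CountDistinctPrefixesInFolder) are made up for illustration, and the case-insensitive comparison is an assumption that may not suit every file system.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PrefixCounter
{
    // Hypothetical helper: counts the distinct leading prefixes of up to
    // prefixLength characters among the given file names.
    public static int CountDistinctPrefixes(IEnumerable<string> fileNames, int prefixLength)
    {
        return fileNames
            // Names shorter than prefixLength are used as a whole
            .Select(name => name.Length > prefixLength ? name.Substring(0, prefixLength) : name)
            // Assumption: prefixes that differ only by case count as the same
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Count();
    }

    // Convenience wrapper that reads the file names from a folder.
    public static int CountDistinctPrefixesInFolder(string folderLocation, int prefixLength)
    {
        var names = Directory.EnumerateFiles(folderLocation)
            .Select(path => Path.GetFileName(path));
        return CountDistinctPrefixes(names, prefixLength);
    }
}
```

With the names from the question (File1, File2, File3, Docs1, Docs2, pdfs1 to pdfs4) and a prefix length of 4, the prefixes are File, Docs and pdfs.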

What is better way to delete file with condition

I want to build a Windows service (no UI) in C# whose only job is to run over a list of directories and delete files that are larger than X KB.
I want the best performance. What is the best way to do this?
There is no truly asynchronous method for deleting a file, so if I want to use async/await I can wrap the call like this:
public static class FileExtensions {
public static Task DeleteAsync(this FileInfo fi) {
return Task.Factory.StartNew(() => fi.Delete() );
}
}
and call it like this:
FileInfo fi = new FileInfo(fileName);
await fi.DeleteAsync();
I am thinking of running it like this:
foreach file on ListOfDirectories
{
if(file.Length>1000)
await file.DeleteAsync
}
But with this option the files are deleted one by one (and every DeleteAsync call uses a thread from the thread pool), so I gain nothing from async. I might as well delete them one by one synchronously.
Maybe I could collect X files in a list and then delete them with AsParallel.
Please help me find the best way.
You can use Directory.GetFiles("DirectoryPath").Where(x => new FileInfo(x).Length > 1000); to get the files that are larger than 1 KB (1000 bytes).
Then use Parallel.ForEach to iterate over that collection like this:
var collectionOfFiles = Directory.GetFiles("DirectoryPath")
.Where(x => new FileInfo(x).Length > 1000);
Parallel.ForEach(collectionOfFiles, File.Delete);
It could be argued that you should use:
Parallel.ForEach(collectionOfFiles, currentFile =>
{
File.Delete(currentFile);
});
to improve the readability of the code.
MSDN has a simple example on how to use Parallel.ForEach()
If you are wondering about the FileInfo object, here is the documentation
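In a long-running service, a single locked or protected file should probably not abort the whole loop, because an exception thrown inside Parallel.ForEach surfaces as an AggregateException. The following is a rough sketch of a parallel delete that skips such files and reports how many were removed. LargeFileCleaner and DeleteFilesLargerThan are hypothetical names, and silently skipping failures is an assumption about what the service should do.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LargeFileCleaner
{
    // Hypothetical helper: deletes the files directly inside directoryPath
    // that are larger than maxBytes and returns how many were deleted.
    public static int DeleteFilesLargerThan(string directoryPath, long maxBytes)
    {
        var largeFiles = new DirectoryInfo(directoryPath)
            .EnumerateFiles()
            .Where(f => f.Length > maxBytes)
            .ToList(); // materialize the list before deleting anything

        int deleted = 0;
        Parallel.ForEach(largeFiles, file =>
        {
            try
            {
                file.Delete();
                Interlocked.Increment(ref deleted); // thread-safe counter
            }
            catch (IOException) { /* file is in use: skip it */ }
            catch (UnauthorizedAccessException) { /* no permission: skip it */ }
        });
        return deleted;
    }
}
```

Whether the parallel version is actually faster than a plain loop depends on the storage device, so it is worth measuring both.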
Maybe this can help you:
public static class FileExtensions
{
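public static Task<int> DeleteAsync(this IEnumerable<FileInfo> files)
{
// Materialize once so the source is not enumerated twice (Count + ForEach)
var fileList = files.ToList();
Parallel.ForEach(fileList, (f) =>
{
f.Delete();
});
// Parallel.ForEach blocks until all deletes are done, so the task returned here is already completed
return Task.FromResult(fileList.Count);
}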
public static async Task<int> DeleteAsync(this DirectoryInfo directory, Func<FileInfo, bool> predicate)
{
return await directory.EnumerateFiles().Where(predicate).DeleteAsync();
}
public static async Task<int> DeleteAsync(this IEnumerable<FileInfo> files, Func<FileInfo, bool> predicate)
{
return await files.Where(predicate).DeleteAsync();
}
}
var _byte = 1;
var _kb = _byte * 1000;
var _mb = _kb * 1000;
var _gb = _mb * 1000;
DirectoryInfo d = new DirectoryInfo(@"C:\testDirectory");
var deletedFileCount = await d.DeleteAsync(f => f.Length > _mb * 1);
Debug.WriteLine("{0} Files larger than 1 megabyte deleted", deletedFileCount);
// => 7 Files larger than 1 megabyte deleted
deletedFileCount = await d.GetFiles("*.*",SearchOption.AllDirectories)
.Where(f => f.Length > _kb * 10).DeleteAsync();
Debug.WriteLine("{0} Files larger than 10 kilobytes deleted", deletedFileCount);
// => 11 Files larger than 10 kilobytes deleted

C# : How to save a zip file every X files

I have a program written in C# which should save a zip file every n records (like 500).
My idea was to use the modulo operator (%) and write the file whenever the result is zero. That works, but what if I have 520 records? I should write 500 files into the first zip and then 20 files into the second one.
Here is the code:
using (ZipFile zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.Level8;
zip.CompressionMethod = CompressionMethod.Deflate;
int indexrow = 0;
foreach(DataRow row in in_dt.Rows)
{
zip.AddFile(row["Path"].ToString(),"prova123");
if(indexrow % 500 == 0)
{
using (var myZipFile = new FileStream("c:\\tmp\\partial_"+indexrow.ToString()+".zip", FileMode.Create))
{
zip.Save(myZipFile);
}
indexrow = indexrow++;
}
}
}
}
in_dt is a datatable which contains all the file paths on filesystem.
zip object is an object based on the dotnetzip library.
I'd use LINQ for this problem:
// Define the group size
const int GROUP_SIZE = 500;
// Select a new object type that encapsulates the base item
// and a new property called "Grouping" that will group the
// objects based on their index relative to the group size
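var groups = in_dt
// DataTable.AsEnumerable() comes from System.Data.DataSetExtensions
.AsEnumerable()
.Select(
(item, index) => new {
Item = item,
Index = index,
// Integer division already rounds down, so Math.Floor is not needed
Grouping = index / GROUP_SIZE
}
)
.GroupBy(item => item.Grouping)
;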
// Loop through the groups
foreach (var group in groups) {
// Generate a zip file for each group of files
}
For files 0 through 499, the Grouping property is 0.
For files 500 through 519 (the remaining 20 of 520), the Grouping property is 1.
What you probably want to do is something like this:
void ZipFiles(string[] files, int maxFilesInZip)
{
    int parts = files.Length / maxFilesInZip;      // number of full zips
    int remaining = files.Length % maxFilesInZip;  // files left over for the last zip
    for (int i = 0; i < parts; i++)
    {
        // New zip
        for (int u = 0; u < maxFilesInZip; u++)
        {
            // Add files[i * maxFilesInZip + u]
        }
    }
    if (remaining > 0)
    {
        // New zip
        // Add the last 'remaining' files, starting at files[parts * maxFilesInZip]
    }
}
This way, if you run the function with 520 files and a maximum of 250 per zip, you get 2 zips of 250 files each and 1 zip with the remaining 20. Integer division already rounds parts down, so no explicit floor or ceiling is needed.
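Both answers come down to splitting the list into consecutive batches of at most N items, with the last batch holding the remainder. A small sketch of just that splitting step is below. Batching and SplitIntoBatches are hypothetical names, and writing each batch to its own zip with DotNetZip is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Hypothetical helper: splits items into consecutive batches of at most
    // batchSize elements; the last batch holds whatever is left over.
    public static List<List<T>> SplitIntoBatches<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        for (int start = 0; start < items.Count; start += batchSize)
        {
            // Take() stops at the end of the list, so the final batch may be shorter
            batches.Add(items.Skip(start).Take(batchSize).ToList());
        }
        return batches;
    }
}
```

For 520 paths and a batch size of 500 this gives two batches, of 500 and 20 paths. Each batch would then be saved with a fresh ZipFile instance so that earlier files are not written into later archives.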

How to process directory files in Task parallel library?

I have a scenario in which I have to process multiple files (e.g. 30) in parallel, based on the number of processor cores. I have to assign these files to separate tasks according to the number of cores, but I don't know how to give each task a start and end index so that each task knows which files it has to process.
private void ProcessFiles(object e)
{
try
{
var diectoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var FilePaths = Directory.EnumerateFiles(diectoryPath);
int numCores = System.Environment.ProcessorCount;
int NoOfTasks = FilePaths.Count() > numCores ? (FilePaths.Count()/ numCores) : FilePaths.Count();
for (int i = 0; i < NoOfTasks; i++)
{
Task.Factory.StartNew(
() =>
{
int startIndex = 0, endIndex = 0;
for (int Count = startIndex; Count < endIndex; Count++)
{
this.ProcessFile(FilePaths);
}
});
}
}
catch (Exception ex)
{
throw;
}
}
For problems such as yours, .NET provides concurrent data structures. You want to use a BlockingCollection and store all the file names in it.
Calculating the number of tasks from the number of files per core is not a good idea. Why? Because ProcessFile() may not take the same time for each file. It is better to start as many tasks as you have cores, and let each task take file names one by one from the BlockingCollection and process them until the collection is empty.
try
{
var directoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var filePaths = CreateBlockingCollection(directoryPath);
//Start the same #tasks as the #cores (Assuming that #files > #cores)
int taskCount = System.Environment.ProcessorCount;
for (int i = 0; i < taskCount; i++)
{
Task.Factory.StartNew(
() =>
{
string fileName;
while (!filePaths.IsCompleted)
{
if (!filePaths.TryTake(out fileName)) continue;
this.ProcessFile(fileName);
}
});
}
}
And the CreateBlockingCollection() would be as follows:
private BlockingCollection<string> CreateBlockingCollection(string path)
{
var allFiles = Directory.EnumerateFiles(path);
// No bounded capacity: all names are added up front, and IEnumerable has no Count property
var filePaths = new BlockingCollection<string>();
foreach(var fileName in allFiles)
{
filePaths.Add(fileName);
}
filePaths.CompleteAdding();
return filePaths;
}
You will have to modify your ProcessFile() to receive a file name now instead of taking all the file paths and processing its chunk.
The advantage of this approach is that now your CPU won't be over or under subscribed and the load will be evenly balanced too.
I haven't run the code myself, so there might be some syntax error in my code. Feel free to correct the error, if you come across any.
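As an alternative to polling with IsCompleted and TryTake, BlockingCollection offers GetConsumingEnumerable(), which blocks until an item is available and ends once the collection is marked complete and empty. The sketch below uses it and also waits for the workers to finish. FileWorkQueue, ProcessAll and the processFile delegate are names assumed for illustration and stand in for the asker's ProcessFile method.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FileWorkQueue
{
    // Hypothetical helper: runs processFile for every path, using
    // workerCount tasks that pull work from a shared queue.
    public static void ProcessAll(
        IEnumerable<string> paths, int workerCount, Action<string> processFile)
    {
        using (var queue = new BlockingCollection<string>())
        {
            foreach (string path in paths)
                queue.Add(path);
            queue.CompleteAdding(); // no more items will arrive

            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    // Ends when the queue is both completed and empty
                    foreach (string path in queue.GetConsumingEnumerable())
                        processFile(path);
                });
            }
            Task.WaitAll(workers); // rethrows worker exceptions as AggregateException
        }
    }
}
```

A call might look like FileWorkQueue.ProcessAll(Directory.EnumerateFiles(directoryPath), Environment.ProcessorCount, ProcessFile).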
Based on my admittedly limited understanding of the TPL, I think your code could be rewritten as such:
private void ProcessFiles(object e)
{
try
{
var diectoryPath = _Configurations.Descendants().SingleOrDefault(Pr => Pr.Name == "DirectoryPath").Value;
var FilePaths = Directory.EnumerateFiles(diectoryPath);
Parallel.ForEach(FilePaths, path => this.ProcessFile(path));
}
catch (Exception ex)
{
throw;
}
}
regards
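If the degree of parallelism should be tied to the core count, as the question asks, Parallel.ForEach accepts a ParallelOptions argument. A minimal sketch, in which ParallelFileProcessor, ProcessAll and the processFile delegate are illustrative names rather than anything from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelFileProcessor
{
    // Hypothetical helper: processes every path with at most one
    // concurrent worker per logical processor.
    public static void ProcessAll(IEnumerable<string> paths, Action<string> processFile)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        // Blocks until all paths have been processed
        Parallel.ForEach(paths, options, processFile);
    }
}
```

Parallel.ForEach partitions the input itself, so no start or end index has to be computed per task.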

How to remove Folders whose contents are empty

This question may seem a bit absurd but here goes..
I have a directory structure: It has 8 levels. So, for example this is 1 path:
C:\Root\Catalogue\000EC902F17F\2\2013\11\15\13
The '2' is an index for a webcam. I have 4 in total. so..
C:\Root\Catalogue\000EC902F17F\1\2013\11\15\13
C:\Root\Catalogue\000EC902F17F\2\2013\11\15\13
C:\Root\Catalogue\000EC902F17F\3\2013\11\15\13
C:\Root\Catalogue\000EC902F17F\4\2013\11\15\13
The '000EC902F17F' is my own uuid for my webcam.
The '2013' is the year.
The '11' is the month.
The '15' is the day.
The '13' is the hour.
When I capture motion the jpegs are saved in a directory that signifies when that image was captured.
I have a timer that goes through each directory and creates a video file from the images. The images are then deleted.
Now, I want to have another timer that will go through each directory to check for empty directories. If a directory is empty, it is deleted.
This tidy-up timer will only look at directories that are older than the day on which it runs.
I presently have this:
private List<string> GetFoldersToDelete()
{
DateTime to_date = DateTime.Now.AddDays(-1);
List<string> paths = Directory.EnumerateDirectories(@"C:\MotionWise\Catalogue\" + Shared.ActiveMac, "*", SearchOption.AllDirectories)
.Where(path =>
{
DateTime lastWriteTime = File.GetLastWriteTime(path);
return lastWriteTime <= to_date;
})
.ToList();
return paths;
}
called by:
List<string> _deleteMe = new List<string>();
List<string> _folders2Delete = GetFoldersToDelete();
foreach (string _folder in _folders2Delete)
{
List<string> _folderContents = Directory.EnumerateFiles(_folder).ToList();
if (_folderContents.Count == 0)
{
// Collect into the separate list; adding to _folders2Delete here would modify the collection being enumerated
_deleteMe.Add(_folder);
}
}
for (int _index = 0; _index < _deleteMe.Count; _index++)
{
Directory.Delete(_deleteMe[_index]);
}
Is there a better way to achieve what I want?
Something like this?
private void KillFolders()
{
DateTime to_date = DateTime.Now.AddDays(-1);
List<string> paths = Directory.EnumerateDirectories(@"C:\MotionWise\Catalogue\" + Shared.ActiveMac, "*", SearchOption.TopDirectoryOnly)
.Where(path =>
{
DateTime lastWriteTime = File.GetLastWriteTime(path);
return lastWriteTime <= to_date;
})
.ToList();
foreach (var path in paths)
{
cleanDirs(path);
}
}
private static void cleanDirs(string startLocation)
{
foreach (var directory in Directory.GetDirectories(startLocation))
{
cleanDirs(directory);
if (Directory.GetFiles(directory).Length == 0 && Directory.GetDirectories(directory).Length == 0)
{
Directory.Delete(directory, false);
}
}
}
Note: this won't take the subdirectories' last write time into account. It just starts from the top directory, takes the folders whose last write time is more than a day old, and cleans out their empty subdirectories.
And if your goal is simply to clean empty folders in a target directory, the cleanDirs function works standalone.
A slightly different take, for comparison:
public static void DeleteEmptyFolders(string rootFolder)
{
foreach (string subFolder in Directory.EnumerateDirectories(rootFolder))
DeleteEmptyFolders(subFolder);
DeleteFolderIfEmpty(rootFolder);
}
public static void DeleteFolderIfEmpty(string folder)
{
if (!Directory.EnumerateFileSystemEntries(folder).Any())
Directory.Delete(folder);
}
(I find this slightly more readable.)
Here's a quick piece of code:
static void Main(string[] args)
{
var baseDirectory = ".";
DeleteEmptyDirectory(baseDirectory);
}
static bool DeleteEmptyDirectory(string directory)
{
var subDirs = Directory.GetDirectories(directory);
var canDelete = true;
if (subDirs.Any())
foreach (var dir in subDirs)
canDelete = DeleteEmptyDirectory(dir) && canDelete;
if (canDelete && !Directory.GetFiles(directory).Any())
{
Directory.Delete(directory);
return true;
}
else
return false;
}
This will delete all empty folders and leave anything with any files in it intact.
Regarding the comment you made about recursion... I wouldn't worry about it, unless you have crazy symlinks creating an infinite directory structure. ;)
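If symbolic links or junctions are a concern, one option is to skip any directory flagged as a reparse point before recursing into it. The following is a sketch under that assumption. EmptyFolderCleaner and DeleteEmptySubfolders are hypothetical names, and unlike the answers above this variant never deletes the root folder itself.

```csharp
using System.IO;
using System.Linq;

public static class EmptyFolderCleaner
{
    // Hypothetical helper: removes empty folders beneath rootFolder
    // (rootFolder itself is kept) and does not follow links.
    public static void DeleteEmptySubfolders(string rootFolder)
    {
        foreach (string subFolder in Directory.GetDirectories(rootFolder))
        {
            // Skip symbolic links and junctions so a link cycle cannot recurse forever
            if ((File.GetAttributes(subFolder) & FileAttributes.ReparsePoint) != 0)
                continue;

            // Clean the children first, so a folder that only contained
            // empty folders becomes empty itself
            DeleteEmptySubfolders(subFolder);

            if (!Directory.EnumerateFileSystemEntries(subFolder).Any())
                Directory.Delete(subFolder);
        }
    }
}
```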
