I have a 20-terabyte drive. I want to find the oldest folder on the drive and move it to another location. A linear search is making my service quite slow. Is there any faster way to get the oldest folder on the drive?
Thanks
You might be better off asking this at serverfault.com (check which tools already exist for this).
stackoverflow.com is not a programming service.
That said, here are a few things you should ask yourself and consider:
What exactly do you mean by "oldest folder"? The timestamp of the folder itself is useless in most cases. In my experience, the best you can do is enumerate all FILES and check their modification timestamps.
What I can recommend:
Write a tool which recursively enumerates all files on the drive and writes the information (path, file size, time stamps) to a text file.
Later you can read this file and create lists, e.g.
how many files / GB of data on the drive are older than 2 years, which folder holds the oldest files, how many files ending in .xls are older than 6 months...
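As a rough sketch of that two-step idea (in Python here, but any language works): one function walks the drive once and writes an index file, and a second reads the index back to answer a question. The function names and the definition of "oldest folder" (the folder whose most recently modified direct file is the oldest) are my own assumptions, not something the question specified.

```python
import csv
import os
from collections import defaultdict


def write_index(root, index_path):
    """Walk root once and record path, size and mtime of every file."""
    with open(index_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "size", "mtime"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                writer.writerow([full, st.st_size, st.st_mtime])


def oldest_folder(index_path):
    """Return (folder, mtime) for the folder whose newest file is the oldest.

    Assumption: only files directly inside a folder count towards its age.
    """
    newest = defaultdict(float)
    with open(index_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            folder = os.path.dirname(row["path"])
            newest[folder] = max(newest[folder], float(row["mtime"]))
    if not newest:
        return None
    return min(newest.items(), key=lambda item: item[1])
```

The scan is still linear, but it runs once (or on a schedule) instead of inside the service, and every later query only reads the index.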
Linear search is making my service quite slow.
There is no API "query the oldest folder". You have to scan the drive in one way or another.
So my answer is more of a non-answer, sorry...
Related
I need to copy files from my local hard drive to an external hard drive, and I only want to copy the files that do not already exist there. I am sure there is a much easier way to do this, but this is where my mind went first.
My thoughts on how to accomplish this:
1) Get a list of all files on my C: drive and write to a text file
2) Get a list of all files on my L: drive (backup) and write to a text file
3) Compare C: drive text file to L: drive text file to find the files that do not exist
4) Write results of the files that do not exist to an array
5) Iterate through the newly created array and copy the files to the L: drive
Is there a more effective/time efficient way to accomplish this task?
For sure you don't want to create text files listing file names, and then compare them. That will be inefficient and clunky. The way to do this is to walk through the source directories looking for all the files. As you go, you'll be creating a matching destination path for each file. Just before you copy the file you need to decide whether or not to copy it. If a file exists at the destination path, skip copying.
Some enhancements on that might include skipping copying only if the file exists and the last modified date/time and file size match. And so on, I'm sure you can imagine variants on this.
One thing that you might not want to do is build a list of all the files first, and then start copying. It may very well be more efficient to copy files as you are iterating over the source directory. For example you could use Directory.EnumerateFiles to do this in an efficient way.
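Here is a minimal sketch of that walk-and-copy loop. It is written in Python rather than with Directory.EnumerateFiles, and the names sync_tree and needs_copy are made up for illustration. The 2-second tolerance is my assumption, to allow for coarse timestamp resolution on FAT-formatted external drives.

```python
import os
import shutil


def needs_copy(src, dst):
    """Copy when dst is missing or differs in size or modification time."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # Allow 2 seconds of slack for filesystems with coarse timestamps.
    return s.st_size != d.st_size or abs(s.st_mtime - d.st_mtime) > 2


def sync_tree(src_root, dst_root):
    """Copy new or changed files while walking src_root; return copied paths."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            if needs_copy(src, dst):
                os.makedirs(dst_dir, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also carries over the mtime
                copied.append(dst)
    return copied
```

Because each file is checked and copied as it is encountered, no intermediate list of the whole drive is ever built.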
Of course, you don't need to write a program to do this. Thousands already exist, some of which are quite effective.
As part of our installer build, we have to zip thousands of large data files into about ten or twenty 'packages' with a few hundred (or even thousands of) files in each which are all dependent on being kept with the other files in the package. (They are versioned together if you will.)
Then during the actual install, the user selects which packages they want included on their system. This also lets them download updates to the packages from our site as one large, versioned file rather than asking them to download thousands of individual ones which could also lead to them being out of sync with others in the same package.
Since these are data files, some of them change regularly during the design and coding stages, meaning we then have to re-compress all files in that particular zip package, even if only one file has changed. This makes the packaging step of our installer build take well over an hour each time, with most of that going to re-compressing things that we haven't touched.
We've looked into leaving the zip packages alone, then replacing specific files inside them, but inserting and removing large files from the middle of a zip doesn't give us that much of a performance boost. (A little, but not enough that it's worth it.)
I'm wondering if it's possible to pre-process files down into a cached raw 'compressed state' that matches how it would be written to the zip package, but only the data itself, not the zip header info, etc.
My thinking is that, if that is possible, during our build step we would first look for any data file that doesn't have a compressed cache associated with it, compress that file, and write the result to the cache.
Next we would simply append all of the caches together in a file stream, adding any appropriate zip header needed for the files.
This would mean we are still recreating the entire zip during each build, but we are only recompressing data that has changed. The rest would just be written as-is which is very fast since it is a straight write-to-disk. And if a data file changes, its cache is destroyed, so next build-pass it would be recreated.
However, I'm not sure such a thing is possible. Is it, and if so, is there any documentation to show how one would go about attempting this?
Yes, that's possible. The most straightforward approach would be to zip each file individually into its own associated zip archive with one entry. When any file is modified, you replace its associated zip file to keep all of those up to date. Then you can write a simple program to take a set of those single entry zip files and merge them into a single zip file. You will need to refer to the documentation in the PKZip appnote. Take a look at that.
Now that you've read the appnote, what you need to do is use the local header, data, and central header from each individual zip file, write the local header and data as is sequentially to the new zip file, and save the central header and the offsets of the local headers in the new file. Then at the end of the new file save the current offset, write a new central directory using the central headers you saved, updating the offsets appropriately, and ending with a new end of central directory record with the offset of the start of the central directory.
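To make those steps concrete, here is a small sketch in Python. This is not the tool mentioned in the update below, just an illustration. It assumes plain archives with no zip64 records, no prepended data, and no archive comment containing the end-of-record signature. The names merge_zips and make_single_zip are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_FMT = "<4sHHHHIIH"   # end of central directory record, 22 bytes
CDH_SIG = b"PK\x01\x02"
CDH_FIXED = 46            # fixed-size part of a central directory header


def merge_zips(archives):
    """Merge zip archives (as bytes) without recompressing any entry."""
    out = bytearray()
    central = bytearray()
    total = 0
    for data in archives:
        eocd_pos = data.rfind(EOCD_SIG)
        if eocd_pos < 0:
            raise ValueError("not a zip archive")
        fields = struct.unpack_from(EOCD_FMT, data, eocd_pos)
        count, cd_size, cd_offset = fields[4], fields[5], fields[6]
        base = len(out)                # where this archive's entries will land
        out += data[:cd_offset]        # local headers + compressed data, as is
        cd = bytearray(data[cd_offset:cd_offset + cd_size])
        pos = 0
        for _ in range(count):
            if cd[pos:pos + 4] != CDH_SIG:
                raise ValueError("bad central directory header")
            name_len, extra_len, comment_len = struct.unpack_from("<HHH", cd, pos + 28)
            (old_offset,) = struct.unpack_from("<I", cd, pos + 42)
            struct.pack_into("<I", cd, pos + 42, old_offset + base)  # fix offset
            pos += CDH_FIXED + name_len + extra_len + comment_len
        central += cd
        total += count
    cd_start = len(out)
    out += central
    out += struct.pack(EOCD_FMT, EOCD_SIG, 0, 0, total, total, len(central), cd_start, 0)
    return bytes(out)


def make_single_zip(name, payload):
    """Build a one-entry zip in memory (stands in for a cached per-file zip)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()
```

The only bytes that change are the local-header offsets in the central directory and the final end-of-central-directory record. The compressed data is copied untouched, which is where the time saving comes from.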
Update:
I decided this was a useful enough thing to write. You can get it here.
You could zip each file beforehand, and then "zip" them together with no compression at the end to quickly aggregate them into a distributable package. It won't be as efficient as compressing all the data at once, but it should be faster when you need to make modifications.
I cannot seem to locate an actual exe that implements this type of functionality. It appears that most of the existing tools I've tried that can merge/update will reprocess (compress) the data stream, as you have already stated you saw.
However it seems what you describe can be done if you or someone wants to write it. If you take a look at this link for the ZIP file format specification, you can get an overview of the structure you would have to parse out and process. It looks like you can pretty quickly go from file to file gathering up and discarding the files of interest, then merging in your new/updated files. You would still need to rebuild a new central directory (refer to section 4.3.6 of the above linked document) within your new destination archive.
After a little more digging, the DotNetZip Library forum has a message asking about the same type of functionality which also gives a description just like I described above. It also links to this document which seems to indicate that support for that may be added to the DotNetZip library for you to further experiment with.
My program produces a log of info every hour that the system is running, containing various data like access times, data transfers and any faults/warnings experienced. Unfortunately these log files can be anywhere from 10,000 KB to 25,000 KB in size, so I've begun zipping them individually once they're at least 24 hours old. This way my system has only 24 unzipped log files at any one time.
The issue I need to resolve is that part of this software is a 'Diagnostics' window, where the user can load up log files from a selected date range based on file's creation time and view their contents in an easy to read format. I understand that in order for the files to show up in their search there must be an exception allowing .zip to be checked as well, but I cannot access any of the file's data to see if said .zip files fall into the date range.
My question is: is there a way for me to access the zipped file's information (and, to a further extent, its contents) without having to unzip the files, do the search, and re-zip the files? That seems like too much work, unzipping one hundred or more files if only 1 or 2 fall in your date range.
You should add a timestamp to the filename of each zipped file.
In general, when you zip a file you're putting the actual data of the file into a format that is unreadable. Most zipping algorithms (keep in mind that there are very many) work on a very bit-hacky level, which is why you really need to unzip the files to get your original data out. (There's no such thing as a free lunch.)
Luckily though, a file is not just a file! Because you're totally right, having to read a file to do things with it would be terrible! Imagine having to search a file system if you had to read each file to figure out where in the directory it was.
There are a number of ways to access the metadata associated with your file, depending on what exact system you're on. For instance, on Unix-style machines the command ls -l shows the last-modified time.
That said, log files usually have names that start with a timestamp for this exact reason. If you want to keep your filenames pretty though, going through the last-edited date is probably the way to go.
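A small sketch of the timestamp-in-the-filename approach, so a date-range search never has to open the archives. The stamp layout (YYYYMMDD-HHMMSS) and the helper names are just one possible choice, not anything the question prescribes.

```python
from datetime import datetime

STAMP = "%Y%m%d-%H%M%S"   # always 15 characters, and sorts chronologically


def log_name(created, suffix=".log.zip"):
    """Build a file name that starts with the log's creation time."""
    return created.strftime(STAMP) + suffix


def created_from_name(filename):
    """Recover the creation time from the first 15 characters of the name."""
    return datetime.strptime(filename[:15], STAMP)


def in_range(filenames, start, end):
    """Return the names whose embedded timestamp lies in [start, end]."""
    return [f for f in filenames if start <= created_from_name(f) <= end]
```

With names like these, the Diagnostics window can filter a directory listing by date and only unzip the one or two files that match.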
A good zip library (e.g. SharpZipLib) ought to allow you to iterate over the files contained in the archive without extracting them. This will allow you to query the associated file dates. For example, using the aforementioned SharpZipLib, you would just need to inspect the DateTime property of the ZipEntry objects contained in the archive.
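The same idea, sketched with Python's standard zipfile module instead of SharpZipLib: the central directory is read, but no entry is decompressed. Note that a zip entry stores a last-modified time, not a creation time, so this assumes the two are close enough for the search. The function name is hypothetical.

```python
import io
import zipfile
from datetime import datetime


def entries_in_range(archive, start, end):
    """Return names of entries whose stored timestamp lies in [start, end].

    archive may be a path or a file-like object; nothing is extracted.
    """
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist()
                if start <= datetime(*info.date_time) <= end]
```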
I'm trying to find some lost .jpg pictures. Here's a .bat file to set up a simplified version of my situation:
md TestSetup
cd TestSetup
md a
cd a
echo "Can we find this later?" > a.abc
del a.abc
cd..
rd a
What code would be needed to open the text file again? I'm actually looking for .jpeg files that were treated in a similar manner.
More details: I'm trying to recover picture files from a previous one-touch backup where the directories and files have been deleted and everything was saved in the backup with a single character name and every file has the same 3 letter extension. There is a current backup but they need to view the previous deleted ones (or at least the .jpg files).
Here's how I was trying to approach it: C# code
To the best of my knowledge, most file recovery tools actually read the low-level filesystem format on the disk and try to piece together deleted files. This works because, at least in FAT, a deleted file still resides in the sector specifying the directory (just with a different first character to identify it as "deleted"). New files may overwrite these deleted entries and therefore make the file unrecoverable. That's just a little bit of theory.
There is a current backup but they need to view the previous deleted ones (or at least the .jpg files).
Unless there's a backup for that file at the time that you want to restore from, I believe you're going to have a hard time getting that file without resorting to a low-level filesystem read. And even then, you may be out of luck if enough revisions have been made (or it's not a trivial filesystem like FAT).
I've got a project which requires a fairly complicated process and I want to make sure I know the best way to do this. I'm using ASP.net C# with Adobe Flex 3. The app server is Mosso (cloud server) and the file storage server is Amazon S3. The existing site can be viewed at NoiseTrade.com.
I need to do this:
1) Allow users to upload MP3 files to an album "widget"
2) After the user has uploaded their album/widget, I need to automatically zip the mp3 (for other users to download) and upload the zip along with the mp3 tracks to Amazon S3
I actually have this working already (using client side processing in Flex) but this no longer works because of Adobe's flash 10 "security" update. So now I need to implement this server-side.
The way I am thinking of doing this is:
1) Store the mp3 in a temporary folder on the app server
2) When the artist "publishes" create a zip of the files in that folder using a c# library
3) Start the amazon S3 upload process (zip and mp3s) and email the user when it is finished (as well as deleting the temporary folder)
The major problem I see with this approach is that if a user deletes or adds a track later on I'll have to update the zip file, but the temporary files will no longer exist.
I'm at a loss as to the best way to do this and would appreciate any advice you might have.
Thanks!
The bit about updating the zip but not having the temporary files if the user adds or removes a track leads me to suspect that you want to build zips containing multiple tracks, possibly complete albums. If this is incorrect and you're just putting a single mp3 into each zip, then StingyJack is right and you'll probably end up making the file (slightly) larger rather than smaller by zipping it.
If my interpretation is correct, then you're in luck. Command-line zip tools frequently have flags which can be used to add files to or delete files from an existing zip archive. You have not stated which library or other method you're using to do the zipping, but I expect that it probably has this capability as well.
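As an illustration of updating an existing archive without the original temporary files, here is a sketch using Python's zipfile module in place of whichever C# library ends up being used. Adding uses append mode. Removal is done by rewriting the archive without the unwanted entry, because this sketch does not rely on any in-place delete call. The function names are hypothetical.

```python
import os
import zipfile


def add_track(archive_path, name, data):
    """Append one entry to an existing archive; mp3 data is stored as is."""
    with zipfile.ZipFile(archive_path, "a", zipfile.ZIP_STORED) as zf:
        zf.writestr(name, data)


def remove_track(archive_path, name):
    """Rewrite the archive into a temp file, skipping the named entry."""
    tmp_path = archive_path + ".tmp"
    with zipfile.ZipFile(archive_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            if info.filename != name:
                dst.writestr(info, src.read(info))
    os.replace(tmp_path, archive_path)  # swap in the rewritten archive
```

Either way, the only inputs are the existing zip and the new track, so the temporary upload folder can be deleted after the first publish.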
MP3s are already compressed. Why bother zipping them?
I would say it is not necessary to zip a compressed file format; you are only going to get about a five percent reduction in file size, give or take a little. MP3s don't really zip up well because, by their nature, they have already compressed most of the compressible data.
DotNetZip can zip up files from C#/ASP.NET. I concur with the prior posters regarding compressibility of MP3s. DotNetZip will automatically skip compression on MP3, and just store the file, just for this reason. It still may be interesting to use a zip as a packaging/archive container, aside from the compression.
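A quick way to see the "container, not compression" point is sketched below with Python's zipfile rather than DotNetZip. Random bytes stand in for already-compressed MP3 audio, which is an assumption for the sake of a self-contained example. Real MP3s may shrink by a few percent, as mentioned above.

```python
import io
import os
import zipfile


def package(files, compression):
    """Build a zip in memory from a {name: bytes} mapping; return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Random bytes stand in for MP3 data that has no redundancy left to squeeze.
tracks = {"track1.mp3": os.urandom(200_000)}
stored = package(tracks, zipfile.ZIP_STORED)      # packaging only
deflated = package(tracks, zipfile.ZIP_DEFLATED)  # tries to compress
print(len(stored), len(deflated))                 # the two sizes are nearly equal
```

Storing skips the CPU cost of a compression pass that buys almost nothing, while still giving users a single file to download.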
If you change the zip file later (user adds a track), you could grab the .zip file from S3, and just update it. DotNetZip can update zip files, too. But in this case you would have to pay for the transfer cost into and out of S3.
DotNetZip can do all of this with in-memory handling of the zips - though that may not be feasible for large archives with lots of MP3s and lots of concurrent users.