Does anyone know how to (natively) get the maximum allowed file size for a given drive/folder/directory? As in: for FAT16 it is ~2 GB, FAT32 it was 4 GB as far as I remember, and for the newer NTFS versions it is something way beyond that... let alone Mono and the underlying OSes.
Is there anything I can read out / retrieve that might give me a hint on that? Basically I -know- my app will produce single files bigger than 2 GB, and I want to check for that when the user sets the corresponding output path(s)...
Cheers & thanks,
-J
This may not be the ideal solution, but I will suggest the following anyway:
// Returns the maximum file size in bytes on the filesystem type of the specified drive,
// or -1 if the filesystem is not recognized.
long GetMaximumFileSize(string drive)
{
    var driveInfo = new System.IO.DriveInfo(drive);
    switch (driveInfo.DriveFormat)
    {
        case "FAT16":
            return 1000; // replace with actual limit
        case "FAT32":
            return 1000; // replace with actual limit
        case "NTFS":
            return 1000; // replace with actual limit
        default:
            return -1;   // unknown filesystem
    }
}
// Examples:
var maxFileSize1 = GetMaximumFileSize("C"); // for the C drive
var maxFileSize2 = GetMaximumFileSize(absolutePath.Substring(0, 1)); // for whichever drive the given absolute path refers to
This page on Wikipedia contains a pretty comprehensive list of the maximum file sizes for various filesystems. Depending on how many filesystems you want to handle in the GetMaximumFileSize function, you may want to use a Dictionary or even a simple data file rather than a switch statement.
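For instance, a minimal sketch of the dictionary variant might look like this; the limits are the commonly cited ones (please verify them against the Wikipedia table), the long.MaxValue entry just means "treat as effectively unlimited for this check", and FileSizeLimits/GetMaximumFileSize are only illustrative names:
using System;
using System.Collections.Generic;
using System.IO;

static class FileSizeLimits
{
    // Commonly cited per-filesystem maximum file sizes, in bytes.
    // Verify these against the Wikipedia comparison table before relying on them.
    static readonly Dictionary<string, long> MaxFileSizeByFormat =
        new Dictionary<string, long>(StringComparer.OrdinalIgnoreCase)
        {
            { "FAT16", 2147483647L },   // 2 GiB - 1 (typical implementation limit)
            { "FAT32", 4294967295L },   // 4 GiB - 1
            { "NTFS",  long.MaxValue }, // far beyond any practical single-file size
        };

    // Returns the maximum file size for the drive containing the given path,
    // or -1 if the filesystem is not in the lookup table.
    public static long GetMaximumFileSize(string path)
    {
        var driveInfo = new DriveInfo(Path.GetPathRoot(Path.GetFullPath(path)));
        long limit;
        return MaxFileSizeByFormat.TryGetValue(driveInfo.DriveFormat, out limit)
            ? limit
            : -1; // unknown filesystem; decide how you want to treat it
    }
}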
Now, you may be able to retrieve the maximum file size directly using WMI or perhaps even the Windows API, but these solutions will of course only work on Windows (i.e. no luck with Mono/Linux). However, I would consider this a reasonably nice, purely managed solution, despite the use of a lookup table, and it has the bonus of working reliably on all OSes.
Hope that helps.
How about using System.IO.DriveInfo.DriveFormat to retrieve the drive's file system (NTFS, FAT, etc.)? That ought to give you at least some idea of the supported file sizes.
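For example, a rough sketch (outputPath stands for whatever path the user picked):
using System;
using System.IO;

// Resolve the drive/mount that contains the chosen output path and read its filesystem name.
var drive = new DriveInfo(Path.GetPathRoot(Path.GetFullPath(outputPath)));
Console.WriteLine(drive.DriveFormat); // e.g. "NTFS" or "FAT32"; on Mono/Linux you may see names like "ext4"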
I am currently trying to read the text from a few images, and it seems that the Google API is skipping some 0's.
Here is the code:
Google.Cloud.Vision.V1.Image image = Google.Cloud.Vision.V1.Image.FromFile(imagepath);
ImageAnnotatorClient client = ImageAnnotatorClient.Create();
IReadOnlyList<EntityAnnotation> response = client.DetectText(image);
string test = string.Empty;
foreach (EntityAnnotation annotation in response)
{
if (annotation.Description != null)
{
Console.WriteLine(annotation.Description);
test += Environment.NewLine + annotation.Description;
}
}
Here are the images it is attempting: Attempt 1, Attempt 2, Attempt 3
Are there settings I need to change to make it accept 0's?
Also here is the output from each attempt:
Attempt 1: https://pastebin.com/dNxRt7QK
Attempt 2: https://pastebin.com/XVZzmtTg
Attempt 3: https://pastebin.com/2kQMiC8h
It's really good at reading everything else, but it really struggles with 0's.
The Deaths values specifically, in Attempts 2 and 3.
Edit:
Adding in a few results showing this from the google drag-n-drop testing:
Attempt 1
Attempt 2
In order to get better results, it is recommended not to use lossy formats (JPEG, for example). Compressing or reducing the file size of such lossy formats may degrade image quality and, hence, Vision API accuracy.
The recommended image size for the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION features is 1024 x 768. As an additional note:
The Vision API requires images to be a sufficient size so that important features within the request can be easily distinguished. Sizes smaller or larger than these recommended sizes may work. However, smaller sizes may result in lower accuracy, while larger sizes may increase processing time and bandwidth usage without providing comparable benefits in accuracy. Image size should not exceed 75M pixels (length x width) for OCR analysis.
The items discussed above can be found in this article.
With the code you are using, you can alternatively use the DOCUMENT_TEXT_DETECTION feature and pick whichever gives you better results. I see that you are using the code in this link for TEXT_DETECTION; try the code in this link for DOCUMENT_TEXT_DETECTION.
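If you want to try the DOCUMENT_TEXT_DETECTION route with minimal changes, a rough sketch (reusing the image variable from your snippet) could look like this; DetectDocumentText returns a single TextAnnotation whose Text property holds the full extracted text:
using System;
using Google.Cloud.Vision.V1;

// Same image as before, but run through DOCUMENT_TEXT_DETECTION instead of
// TEXT_DETECTION, so the two results can be compared for the missing 0's.
ImageAnnotatorClient client = ImageAnnotatorClient.Create();
TextAnnotation docText = client.DetectDocumentText(image);
Console.WriteLine(docText.Text);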
If the issue still persists after the suggested actions, I recommend that you contact Google Cloud Platform Support or create a public issue via this link.
I want a fast way in C# to remove blocks of bytes at different places from a binary file of between 500 MB and 1 GB in size. The start offsets and the lengths of the bytes to be removed are stored in arrays:
int[] rdiDataOffset= {511,15423,21047};
int[] rdiDataSize={102400,7168,512};
EDIT:
This is a piece of my code, and it does not work correctly unless I set the buffer size to 1:
while(true){
if (rdiDataOffset.Contains((int)fsr.Position))
{
int idxval = Array.IndexOf(rdiDataOffset, (int)fsr.Position, 0, rdiDataOffset.Length);
int oldRFSRPosition = (int)fsr.Position;
size = rdiDataSize[idxval];
fsr.Seek(size, SeekOrigin.Current);
}
int bufferSize = size == 0 ? 2048 : size;
if ((size>0) && (bufferSize > (size))) bufferSize = (size);
if (bufferSize > (fsr.Length - fsr.Position)) bufferSize = (int)(fsr.Length - fsr.Position);
byte[] buffer = new byte[bufferSize];
int nofbytes = fsr.Read(buffer, 0, buffer.Length);
fsr.Flush();
if (nofbytes < 1)
{
break;
}
}
No common file system provides an efficient way to remove chunks from the middle of an existing file (you can only truncate from the end). You'll have to copy all the data after each removed chunk back to the appropriate new location.
Here is a simple algorithm for doing this using a temp file (it could be done in place as well, but that is riskier if something goes wrong):
1. Create a new file and call SetLength to set the stream size (if this is too slow you can Interop to SetFileValidData). This ensures that you have room for your temp file while you are doing the copy.
2. Sort your removal list in ascending order.
3. Read from the current location (starting at 0) to the first removal point. The source file should be opened without granting Write share permissions (you don't want someone mucking with it while you are editing it).
4. Write that content to the new file (you will likely need to do this in chunks).
5. Skip over the data not being copied.
6. Repeat from #3 until done.
You now have two files - the old one and the new one ... replace as necessary. If this is really critical data you might want to look at a transactional approach (either one you implement or using something like NTFS transactions).
Consider a new design. If this is something you need to do frequently then it might make more sense to have an index in the file (or near the file) which contains a list of inactive blocks - then when necessary you can compress the file by actually removing blocks ... or maybe this IS that process.
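If it helps, here is a rough, untested sketch of steps 1-6 above, using the rdiDataOffset/rdiDataSize arrays from the question and a hypothetical destination path:
using System;
using System.IO;
using System.Linq;

static class FileBlockRemover
{
    // Copies sourcePath to destPath, skipping the ranges described by
    // rdiDataOffset/rdiDataSize (start offsets and lengths of blocks to drop).
    public static void CopyWithoutBlocks(string sourcePath, string destPath,
                                         int[] rdiDataOffset, int[] rdiDataSize)
    {
        // Step 2: pair up offsets and sizes and sort by offset.
        var removals = rdiDataOffset
            .Select((offset, i) => new { Offset = (long)offset, Size = (long)rdiDataSize[i] })
            .OrderBy(r => r.Offset)
            .ToList();

        var buffer = new byte[64 * 1024];

        using (var src = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var dst = new FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // Step 1: pre-size the destination file.
            dst.SetLength(src.Length - removals.Sum(r => r.Size));

            long position = 0;
            foreach (var removal in removals)
            {
                CopyRange(src, dst, buffer, removal.Offset - position); // steps 3-4
                src.Seek(removal.Size, SeekOrigin.Current);             // step 5
                position = removal.Offset + removal.Size;
            }
            CopyRange(src, dst, buffer, src.Length - position);         // copy the tail
        }
    }

    static void CopyRange(Stream src, Stream dst, byte[] buffer, long count)
    {
        while (count > 0)
        {
            int read = src.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0) break;
            dst.Write(buffer, 0, read);
            count -= read;
        }
    }
}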
If you're on the NTFS file system (most Windows deployments are) and you don't mind doing p/invoke methods, then there is a way, way faster way of deleting chunks from a file. You can make the file sparse. With sparse files, you can eliminate a large chunk of the file with a single call.
When you do this, the file is not rewritten. Instead, NTFS updates metadata about the extents of zeroed-out data. The beauty of sparse files is that consumers of your file don't have to be aware of the file's sparseness. That is, when you read from a FileStream over a sparse file, zeroed-out extents are transparently skipped.
NTFS uses such files for its own bookkeeping. The USN journal, for example, is a very large sparse memory-mapped file.
The way you make a file sparse and zero out sections of that file is to use the DeviceIoControl Windows API. It is arcane and requires P/Invoke, but if you go this route, you'll surely want to hide the ugliness behind nice, pretty function calls.
There are some issues to be aware of. For example, if the file is moved to a non-NTFS volume and then back, the sparseness of the file can disappear - so you should program defensively.
Also, a sparse file can appear to be larger than it really is - complicating tasks involving disk provisioning. A 5 GB sparse file that has been completely zeroed out still counts 5 GB towards a user's disk quota.
If a sparse file accumulates a lot of holes, you might want to occasionally rewrite the file in a maintenance window. I haven't seen any real performance troubles occur, but I can at least imagine that the metadata for a swiss-cheesy sparse file might accrue some performance degradation.
Here's a link to some doc if you're into the idea.
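If you want to experiment with that route, here is a rough P/Invoke sketch of the two DeviceIoControl calls involved (FSCTL_SET_SPARSE to mark the file sparse, FSCTL_SET_ZERO_DATA to punch a hole). Treat it as a starting point rather than production code; the FileStream must be opened with write access:
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class SparseFile
{
    const uint FSCTL_SET_SPARSE    = 0x000900C4;
    const uint FSCTL_SET_ZERO_DATA = 0x000980C8;

    [StructLayout(LayoutKind.Sequential)]
    struct FILE_ZERO_DATA_INFORMATION
    {
        public long FileOffset;      // start of the range to zero
        public long BeyondFinalZero; // first byte AFTER the zeroed range
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, int nInBufferSize, IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
        ref FILE_ZERO_DATA_INFORMATION lpInBuffer, int nInBufferSize,
        IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    // Marks the file as sparse, then deallocates the range [offset, offset + length).
    public static void ZeroRange(FileStream fs, long offset, long length)
    {
        int bytesReturned;
        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                IntPtr.Zero, 0, IntPtr.Zero, 0, out bytesReturned, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());

        var zeroData = new FILE_ZERO_DATA_INFORMATION
        {
            FileOffset = offset,
            BeyondFinalZero = offset + length
        };
        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_ZERO_DATA,
                ref zeroData, Marshal.SizeOf(typeof(FILE_ZERO_DATA_INFORMATION)),
                IntPtr.Zero, 0, out bytesReturned, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}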
I've got a pretty unusual request:
I would like to load all files from specific folder (so far easy). I need something with very small memory footprint.
Now it gets complicated (at least for me). I DON'T need to store or use the content of the files - I just need to force block-level caching mechanism to cache all the blocks that are used by that specific folder.
I know there are many different methods (BinaryReader, StreamReader etc.), but my case is quite special, since I don't care about the content...
Any idea what would be the best way how to achieve this?
Should I use a small buffer? But since it would fill up quickly, wouldn't flushing the buffer actually slow down the operation?
Thanks,
Martin
I would perhaps memory map the files and then loop around accessing an element of each file at regular (block-spaced) intervals.
Assuming of course that you are able to use .Net 4.0.
In pseudo code you'd do something like:
long length = new FileInfo(path).Length;
using (var mmf = MemoryMappedFile.CreateFromFile(path))
{
    // blockSize is your chosen stride, e.g. the volume's cluster/page size
    for (long offset = 0; offset < length; offset += blockSize)
    {
        // Map a single byte at each block boundary; reading it pulls the page in
        using (var acc = mmf.CreateViewAccessor(offset, 1))
        {
            acc.ReadByte(0); // the offset is relative to the start of the view
        }
    }
}
But at the end of the day, each method will have different performance characteristics so you might have to use a bit of trial and error to find out which is the most performant.
I would simply read those files. When you do that, CacheManager in NTFS caches these files automatically, and you don't have to care about anything else - that's exactly the role of CacheManager, and by reading these files, you give it a hint that these files should be cached.
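A minimal sketch of that idea: stream every file in the folder through one small, reused buffer and discard the bytes, which is enough to pull the blocks into the OS cache (folderPath stands for the directory you want warmed):
using System.IO;

// Read every file in the folder sequentially, discarding the data.
// The point is only to touch the blocks so the OS caches them.
var buffer = new byte[64 * 1024]; // small, reused buffer keeps the memory footprint tiny
foreach (var file in Directory.EnumerateFiles(folderPath))
{
    using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, buffer.Length, FileOptions.SequentialScan))
    {
        while (fs.Read(buffer, 0, buffer.Length) > 0)
        {
            // intentionally empty: we don't care about the content
        }
    }
}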
I'm looking for a way to get the maximum supported screen resolution.
I need to find this without any drivers installed.
I have already tried using WMI and EnumDisplaySettings.
Is it possible to get this information directly from the hardware, or do I need to look it up online? If online, which information do I need in order to look it up?
EnumDisplaySettings gives you all the screen resolutions in a loop. It is up to you to choose which one is the "maximum" (the widest or the tallest?).
I've done it in C++:
DEVMODE vimodetmp;
for (int i = 0; ; i++)
{
    memset(&vimodetmp, 0, sizeof vimodetmp);
    vimodetmp.dmSize = sizeof vimodetmp;
    if (!EnumDisplaySettings(DisplayDevice.DeviceName, i, &vimodetmp))
    {
        break;
    }
    // store the mode in an array
}
// you can then choose from the array
Hope that can help you.
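If you'd rather stay in C#, a rough P/Invoke sketch of the same loop might look like this; the DEVMODE declaration is the commonly used managed mirror of the native struct (trimmed to what is needed here), and passing null as the device name means the current display:
using System;
using System.Runtime.InteropServices;

class MaxResolutionFinder
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
    struct DEVMODE
    {
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmDeviceName;
        public short dmSpecVersion, dmDriverVersion, dmSize, dmDriverExtra;
        public int dmFields;
        public int dmPositionX, dmPositionY, dmDisplayOrientation, dmDisplayFixedOutput;
        public short dmColor, dmDuplex, dmYResolution, dmTTOption, dmCollate;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] public string dmFormName;
        public short dmLogPixels;
        public int dmBitsPerPel, dmPelsWidth, dmPelsHeight, dmDisplayFlags, dmDisplayFrequency;
        public int dmICMMethod, dmICMIntent, dmMediaType, dmDitherType,
                   dmReserved1, dmReserved2, dmPanningWidth, dmPanningHeight;
    }

    [DllImport("user32.dll", CharSet = CharSet.Ansi)]
    static extern bool EnumDisplaySettings(string lpszDeviceName, int iModeNum, ref DEVMODE lpDevMode);

    static void Main()
    {
        var dm = new DEVMODE();
        dm.dmSize = (short)Marshal.SizeOf(typeof(DEVMODE));

        int maxWidth = 0, maxHeight = 0;
        // Walk every mode the current display device reports and keep the largest area.
        for (int i = 0; EnumDisplaySettings(null, i, ref dm); i++)
        {
            if ((long)dm.dmPelsWidth * dm.dmPelsHeight > (long)maxWidth * maxHeight)
            {
                maxWidth = dm.dmPelsWidth;
                maxHeight = dm.dmPelsHeight;
            }
        }
        Console.WriteLine("Largest mode reported: {0} x {1}", maxWidth, maxHeight);
    }
}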
I have a physical directory structure as follows:
Root directory (X) -> many subdirectories inside the root (1, 2, 3, 4, ...) -> many files inside each subdirectory.
Photos(Root)
----
123456789(Child One)
----
1234567891_w.jpg (Child two)
1234567891_w1.jpg(Child two)
1234567891_w2.jpg(Child two)
1234567892_w.jpg (Child two)
1234567892_w1.jpg(Child two)
1234567892_w2.jpg(Child two)
1234567893_w.jpg(Child two)
1234567893_w1.jpg(Child two)
1234567893_w2.jpg(Child two)
-----Cont
232344343(Child One)
323233434(Child One)
232323242(Child One)
232324242(Child One)
----Cont..
In the database I have one table holding a huge number of names of the form "1234567891_w.jpg".
NOTE: Both the number of records in the database and the number of photos are in the lakhs (hundreds of thousands).
I need an effective and fast way to check for the presence of each name from the database table in the physical directory structure.
For example: is any file named "1234567891_w.jpg" present in a physical folder inside Photos (the root)?
Please let me know if I miss any information to be given here.
Update:
I know how to check whether a file name exists in a directory, but I am looking for an efficient way, as it would be far too resource-consuming to check the existence of each filename (from lakhs of records) against more than 40 GB of data.
You can try to group the data from the database based on the directory they are in. Sort them somehow (by filename, for instance) and then get the array of files within that directory with string[] filePaths = Directory.GetFiles(@"c:\MyDir\");. Now you only have to compare strings.
It might sound funny, or maybe I was unclear or did not provide enough information, but from the directory pattern I found one nice way to handle it:
Since the file name can exist in only one location, namely:
Root/SubDir/filename
I should be using:
File.Exists(Root/SubDir/filename);
i.e. Photos/123456789/1234567891_w.jpg
And I think this will be O(1).
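A small sketch of that idea, assuming (as in the example tree above) that the subdirectory name is simply the first nine characters of the file name; adjust the prefix length if your naming scheme differs:
using System.IO;

static class PhotoLookup
{
    // Derive the subdirectory from the file name and do a single existence check.
    // Assumes names like "1234567891_w.jpg" live under "Photos/123456789/".
    public static bool PhotoExists(string rootDir, string fileName)
    {
        string subDir = fileName.Substring(0, 9); // hypothetical prefix length
        return File.Exists(Path.Combine(rootDir, subDir, fileName));
    }
}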
It would seem the files are uniquely named. If that's the case, you can do something like this:
var fileNames = GetAllFileNamesFromDb();
var physicalFiles = Directory.GetFiles(rootDir, "*", SearchOption.AllDirectories)
                             .Select(f => Path.GetFileName(f));
var setOfFiles = new HashSet<string>(physicalFiles);
var notPresent = from name in fileNames
                 where !setOfFiles.Contains(name)
                 select name;
First get all the names of the files from the database.
Then search for all the files at once, starting from the root and including all subdirectories, to get all the physical files.
Create a HashSet for fast lookup.
Then match the fileNames against the set; those not in the set are selected.
The HashSet is basically just a set, that is, a collection that can only include an item once (i.e. there are no duplicates). Equality in the HashSet is based on the hash code, and the lookup to determine whether an item is in the set is O(1).
This approach requires you to store a potentially huge HashSet in memory, and depending on the size of that set it might affect the system to the point where it no longer speeds up the application but passes the optimum instead.
As is the case with most optimizations, they are all trade-offs, and the key is finding the balance between them in the context of the value the application is producing for the end user.
Unfortunately there is no magic bullet you could use to improve your performance. As always, it will be a trade-off between speed and memory. Also, there are two sides that could be lacking in performance: the database side and the HDD I/O speed.
So to gain speed, I would in a first step improve the performance of the database query to ensure that it can return the names for searching fast enough. So ensure that your query is fast and maybe also (in the MS SQL case) use keywords like READ SEQUENTIAL; in that case you will already receive the first results while the query is still running, and you don't have to wait until the query has finished and handed you the names as one big block.
On the HDD side you can call Directory.GetFiles(), but this call blocks until it has iterated over all files and gives you back a big array containing all filenames. This is the memory-consuming path and takes a while for the first search, but if you afterwards only work on that array you get speed improvements for all consecutive searches. Another approach would be to call Directory.EnumerateFiles(), which searches the drive on the fly with every call and so may gain speed for the first search, but nothing is kept in memory for the next search, which improves the memory footprint but costs speed, since there is no array in your memory that could be searched. On the other hand, the OS will also do some caching if it detects that you iterate over the same files over and over again, and some caching happens at a lower level anyway.
So for the check on the HDD side, use Directory.GetFiles() if the returned array won't blow your memory and do all your searches on it (maybe put it into a HashSet to further improve performance; whether you store the filename only or the full path depends on what you get from your database). Otherwise use Directory.EnumerateFiles() and hope for the best regarding the caching done by the OS.
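For the Directory.EnumerateFiles() variant, a minimal sketch could look like this (dbNames is assumed to be a HashSet<string> built from the database results, rootDir the Photos root):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Stream the directory tree once and mark every database name that is found on disk.
var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (var path in Directory.EnumerateFiles(rootDir, "*", SearchOption.AllDirectories))
{
    var name = Path.GetFileName(path);
    if (dbNames.Contains(name))
        found.Add(name);
}
var missing = dbNames.Except(found).ToList(); // names with no matching file on disk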
Update
After re-reading your question and comments, as far as I understand you have a name like 1234567891_w.jpg and you don't know which part of the name represents the directory part. So in this case you need to make an explicit search, because iterating through all directories simply takes too much time. Here is some sample code, which should give you an idea of how to solve this as a first shot:
string rootDir = @"D:\RootDir";
// Iterate over all files reported from the database
foreach (var filename in databaseResults)
{
var fullPath = Path.Combine(rootDir, filename);
// Check if the file exists within the root directory
if (File.Exists(Path.Combine(rootDir, filename)))
{
// Report that the file exists.
DoFileFound(fullPath);
// Fast exit to continue with next file.
continue;
}
var directoryFound = false;
// Use the filename as a directory
var directoryCandidate = Path.GetFileNameWithoutExtension(filename);
do
{
    // Recompute the candidate path on each pass, since directoryCandidate is shortened below.
    fullPath = Path.Combine(rootDir, directoryCandidate);
// Check if a directory with the given name exists
if (Directory.Exists(fullPath))
{
// Check if the filename within this directory exists
if (File.Exists(Path.Combine(fullPath, filename)))
{
// Report that the file exists.
DoFileFound(fullPath);
directoryFound = true;
}
// Fast exit, cause we looked into the directory.
break;
}
// Is it possible that a shorter directory name
// exists where this file exists??
// If yes, we have to continue the search ...
// (Alternative code to the above one)
////// Check if a directory with the given name exists
////if (Directory.Exists(fullPath))
////{
//// // Check if the filename within this directory exists
//// if (File.Exists(Path.Combine(fullPath, filename)))
//// {
//// // Report that the file exists.
//// DoFileFound(fullPath);
//// // Fast exit, cause we found the file.
//// directoryFound = true;
//// break;
//// }
////}
// Shorten the directory name for the next candidate
directoryCandidate = directoryCandidate.Substring(0, directoryCandidate.Length - 1);
} while (!directoryFound
&& !String.IsNullOrEmpty(directoryCandidate));
// We did our best but we found nothing.
if (!directoryFound)
DoFileNotAvailable(filename);
}
The only further performance improvement I could think of would be putting the directories found into a HashSet and, before checking with Directory.Exists(), using this set to check for an existing directory. But maybe this wouldn't gain anything, because the OS already does some caching of directory lookups and would then be nearly as fast as your local cache. For these things you simply have to measure your concrete problem.
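If you do want to try that directory cache, a minimal sketch would be one pass over the subdirectories of the root, followed by set lookups instead of Directory.Exists() calls (rootDir and directoryCandidate refer to the sample code above):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Build the set of existing subdirectory names once, up front.
var existingDirs = new HashSet<string>(
    Directory.EnumerateDirectories(rootDir).Select(Path.GetFileName),
    StringComparer.OrdinalIgnoreCase);

// Later, instead of Directory.Exists(Path.Combine(rootDir, directoryCandidate)):
bool directoryExists = existingDirs.Contains(directoryCandidate);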