I want to trim some volume off my text log file if its size exceeds a maximum:
FileInfo f = new FileInfo(filename);
if (f.Length > 30 * 1024 * 1024)
{
    var lines = File.ReadLines(filename).Skip(10000);
    File.WriteAllLines(filename, lines);
}
But I get an exception:
System.IO.IOException: The process cannot access the file '<path>' because it is being used by another process.
Questions:
Do I need to close the FileInfo object before doing further work with the file?
Is there a more adequate way to rotate logs? (For example, an efficient way to obtain the number of lines instead of the byte size?)
File.ReadLines keeps the file open until you dispose of the returned IEnumerable<string>.
So this has nothing to do with FileInfo.
If you need to write it back to the same file, fully enumerate the contents:
var lines = File.ReadLines(filename).Skip(10000).ToList();
You mention "rotating logs", have you considered rotating files instead? ie. write to a fixed file, when it gets "full" (by whatever criteria you deem full, like 1GB in size, one days worth of log entries, 100.000 lines, etc.), you rename the file and create a new, empty, one.
You would probably want to rename existing rotated files as well, so as to keep the number of rotated files low.
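In case it helps, here is a minimal sketch of that rotation-by-renaming scheme; the three-file cap, the ".1"/".2"/".3" suffixes and the 30 MB threshold are illustrative choices, not anything prescribed above:

using System.IO;

static void RotateIfNeeded(string filename)
{
    const long maxSize = 30 * 1024 * 1024; // rotate once the live log exceeds 30 MB
    const int maxRotated = 3;              // keep at most 3 rotated files

    if (!File.Exists(filename) || new FileInfo(filename).Length <= maxSize)
        return;

    // Drop the oldest rotated file, then shift the others up:
    // log.txt.2 -> log.txt.3, log.txt.1 -> log.txt.2
    string oldest = filename + "." + maxRotated;
    if (File.Exists(oldest))
        File.Delete(oldest);
    for (int i = maxRotated - 1; i >= 1; i--)
    {
        string src = filename + "." + i;
        if (File.Exists(src))
            File.Move(src, filename + "." + (i + 1));
    }

    // The current log becomes "<filename>.1"; the logger creates a fresh file next time it writes.
    File.Move(filename, filename + ".1");
}

This never reads or rewrites the log's contents, so it stays cheap no matter how big the file gets.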
In order to update a progress bar with the number of files to extract, my program goes over a list of ZIP files and collects the number of files in them. The combined number is approximately 22,000 files.
The code I am using:
foreach (string filepath in zipFiles)
{
    ZipArchive zip = ZipFile.OpenRead(filepath);
    archives.Add(zip);
    filesCounter += zip.Entries.Count;
}
However, it looks like zip.Entries.Count does some kind of traversal, and it takes ages for this count to complete (several minutes, and much, much more if the internet connection is not great).
To get a notion of how much this could improve, I compared the above to the performance of 7-Zip.
I took one of the ZIP files that contains ~11,000 files and folders:
2 seconds to open the archive in 7-Zip.
1 second to get the file properties.
In the properties I can see 10,016 files + 882 folders, meaning it takes 7-Zip ~3 seconds to know there are 10,898 entries in the ZIP file.
Any idea, suggestion, or alternative method that quickly counts the number of files will be appreciated.
Using DotNetZip to count is actually much faster, but due to some internal bureaucratic issues, I can't use it.
I need a solution that doesn't involve third-party libraries; I can still use the standard Microsoft libraries.
My progress bar issue is solved by taking a new approach to the matter.
I simply accumulate all the ZIP files' sizes, which serves as the maximum. Then, for each individual file that is extracted, I add its compressed size to the progress. This way the progress bar does not show the number of files; it shows progress by size (e.g. if, in total, I have 4 GB to extract, then when the progress bar is 1/4 green I know I have extracted 1 GB). It looks like a better representation of reality.
foreach (string filepath in zipFiles)
{
    ZipArchive zip = ZipFile.OpenRead(filepath);
    archives.Add(zip);
    // Accumulating the ZIP files' sizes.
    filesCounter += new FileInfo(filepath).Length;
}

// To utilize multiple processors it is possible to run this loop
// in a thread for each ZipArchive -> currentZip!
// :
// :
foreach (ZipArchiveEntry entry in currentZip.Entries)
{
    // Doing my extract code here.
    // :
    // :
    // Accumulate the compressed size of each file.
    compressedFileSize += entry.CompressedLength;
    // Doing other stuff
    // :
    // :
}
So the issue of improving the performance of zip.Entries.Count is still open, and I am still interested in knowing how to solve this specific issue (what does 7-Zip do to be so quick? Maybe it uses DotNetZip or other C++ libraries).
I want a fast way in C# to remove blocks of bytes at different places from a binary file of size between 500 MB and 1 GB. The start offsets and lengths of the bytes to be removed are saved in arrays:
int[] rdiDataOffset = { 511, 15423, 21047 };
int[] rdiDataSize = { 102400, 7168, 512 };
EDIT:
This is a piece of my code, and it will not work correctly unless I set the buffer size to 1:
while (true)
{
    if (rdiDataOffset.Contains((int)fsr.Position))
    {
        int idxval = Array.IndexOf(rdiDataOffset, (int)fsr.Position, 0, rdiDataOffset.Length);
        int oldRFSRPosition = (int)fsr.Position;
        size = rdiDataSize[idxval];
        fsr.Seek(size, SeekOrigin.Current);
    }
    int bufferSize = size == 0 ? 2048 : size;
    if ((size > 0) && (bufferSize > size)) bufferSize = size;
    if (bufferSize > (fsr.Length - fsr.Position)) bufferSize = (int)(fsr.Length - fsr.Position);
    byte[] buffer = new byte[bufferSize];
    int nofbytes = fsr.Read(buffer, 0, buffer.Length);
    fsr.Flush();
    if (nofbytes < 1)
    {
        break;
    }
}
No common file system provides an efficient way to remove chunks from the middle of an existing file (only truncate from the end). You'll have to copy all the data after the removal back to the appropriate new location.
A simple algorithm for doing this uses a temp file (it could be done in-place as well, but that is riskier in case things go wrong); a sketch follows after the steps:
1. Create a new file and call SetLength to set the stream size (if this is too slow you can interop to SetFileValidData). This ensures that you have room for your temp file while you are doing the copy.
2. Sort your removal list in ascending order.
3. Read from the current location (starting at 0) to the first removal point. The source file should be opened without granting write share permissions (you don't want someone mucking with it while you are editing it).
4. Write that content to the new file (you will likely need to do this in chunks).
5. Skip over the data not being copied.
6. Repeat from step 3 until done.
You now have two files, the old one and the new one; replace as necessary. If this is really critical data, you might want to look at a transactional approach (either one you implement yourself, or something like NTFS transactions).
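A minimal sketch of the steps above, assuming the offset and size arrays are sorted ascending and non-overlapping (as per step 2); the SetLength pre-allocation from step 1 is left out for brevity:

using System;
using System.IO;

static void RemoveBlocks(string path, int[] offsets, int[] sizes)
{
    string tempPath = path + ".tmp";
    byte[] buffer = new byte[64 * 1024];

    using (var src = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var dst = new FileStream(tempPath, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        long pos = 0;
        for (int i = 0; i <= offsets.Length; i++)
        {
            // Copy everything up to the next removal point (or to the end of the file).
            long copyUntil = i < offsets.Length ? offsets[i] : src.Length;
            while (pos < copyUntil)
            {
                int toRead = (int)Math.Min(buffer.Length, copyUntil - pos);
                int read = src.Read(buffer, 0, toRead);
                if (read == 0) break;
                dst.Write(buffer, 0, read);
                pos += read;
            }

            // Skip the block that is being removed.
            if (i < offsets.Length)
            {
                pos = (long)offsets[i] + sizes[i];
                src.Seek(pos, SeekOrigin.Begin);
            }
        }
    }

    // Replace the original with the compacted copy.
    File.Delete(path);
    File.Move(tempPath, path);
}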
Consider a new design. If this is something you need to do frequently then it might make more sense to have an index in the file (or near the file) which contains a list of inactive blocks - then when necessary you can compress the file by actually removing blocks ... or maybe this IS that process.
If you're on the NTFS file system (most Windows deployments are) and you don't mind doing p/invoke methods, then there is a way, way faster way of deleting chunks from a file. You can make the file sparse. With sparse files, you can eliminate a large chunk of the file with a single call.
When you do this, the file is not rewritten. Instead, NTFS updates metadata about the extents of zeroed-out data. The beauty of sparse files is that consumers of your file don't have to be aware of the file's sparseness. That is, when you read from a FileStream over a sparse file, zeroed-out extents are handled transparently and simply read back as zeros.
NTFS uses such files for its own bookkeeping. The USN journal, for example, is a very large sparse memory-mapped file.
The way you make a file sparse and zero out sections of that file is to use the DeviceIoControl Windows API. It is arcane and requires P/Invoke, but if you go this route you'll surely hide the ugliness behind nice, pretty function calls.
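For illustration, a minimal P/Invoke sketch along those lines. The control codes and struct layout are taken from the Windows SDK headers as I understand them, and error handling is kept to a bare minimum, so treat it as a starting point rather than the definitive implementation (the FileStream must be opened with write access):

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class SparseFile
{
    // Control codes as defined in winioctl.h.
    const uint FSCTL_SET_SPARSE    = 0x000900C4;
    const uint FSCTL_SET_ZERO_DATA = 0x000980C8;

    [StructLayout(LayoutKind.Sequential)]
    struct FILE_ZERO_DATA_INFORMATION
    {
        public long FileOffset;      // first byte of the range to zero
        public long BeyondFinalZero; // first byte after the range
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, int nInBufferSize,
        IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        ref FILE_ZERO_DATA_INFORMATION lpInBuffer, int nInBufferSize,
        IntPtr lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    // Marks the file as sparse, then "punches a hole" over [offset, offset + length).
    public static void ZeroRange(FileStream fs, long offset, long length)
    {
        int bytesReturned;

        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                IntPtr.Zero, 0, IntPtr.Zero, 0, out bytesReturned, IntPtr.Zero))
            throw new IOException("FSCTL_SET_SPARSE failed", Marshal.GetLastWin32Error());

        var range = new FILE_ZERO_DATA_INFORMATION
        {
            FileOffset = offset,
            BeyondFinalZero = offset + length
        };

        if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_ZERO_DATA,
                ref range, Marshal.SizeOf(typeof(FILE_ZERO_DATA_INFORMATION)),
                IntPtr.Zero, 0, out bytesReturned, IntPtr.Zero))
            throw new IOException("FSCTL_SET_ZERO_DATA failed", Marshal.GetLastWin32Error());
    }
}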
There are some issues to be aware of. For example, if the file is moved to a non-NTFS volume and then back, the sparseness of the file can disappear, so you should program defensively.
Also, a sparse file can appear to be larger than it really is, complicating tasks involving disk provisioning. A 5 GB sparse file that has been completely zeroed out still counts 5 GB towards a user's disk quota.
If a sparse file accumulates a lot of holes, you might want to occasionally rewrite the file in a maintenance window. I haven't seen any real performance troubles occur, but I can at least imagine that the metadata for a swiss-cheesy sparse file might accrue some performance degradation.
Here's a link to some doc if you're into the idea.
EDIT 1:
I'm building a torrent application that downloads from different clients simultaneously. Each download represents a portion of my file, and different clients hold different portions.
After a download completes, I need to know which portion to fetch next by finding "empty" portions in my file.
One way to create a file with a fixed size:
File.WriteAllBytes(@"C:\upload\BigFile.rar", new byte[bigSize]);
My BitArray that represents my file as portions:
BitArray TorrentPartsState = new BitArray(10);
For example:
File size is 100.
TorrentPartsState[0] = true;  // means that in my file, from position 0 to 9, I **don't** need to fill in any information.
TorrentPartsState[1] = false; // means that in my file, from position 10 to 19, I **need** to fill in some information.
I'm searching for an effective way to save what the BitArray contains even if the computer/application shuts down. One way I thought of is an XML file, updated each time a portion completes.
I don't think that's a smart and effective solution. Any ideas for another one?
It sounds like you know the following when you start a transfer:
The size of the final file.
The (maximum) number of streams you intend to use for the file.
Create the output file and allocate the required space.
Create a second "control" file with a related filename, e.g. add you own extension. In that file maintain an array of stream status structures corresponding to the network streams. Each status consists of the starting offset and number of bytes transferred. Periodically flush the stream buffers and then update the control file to reflect the progress made and committed.
Variations on the theme:
The control file can define segments to be transferred, e.g. 16MB chunks, and treated as a work queue by threads that look for an incomplete segment and a suitable server from which to retrieve it.
The control file could be a separate fork within the result file. (Who am I kidding?)
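A minimal sketch of the control-file idea, assuming one bit per fixed-size segment persisted to a sidecar file; the ".parts" extension and the bit-per-segment layout are choices made for the example, not anything from the question:

using System.Collections;
using System.IO;

class TransferState
{
    readonly string controlPath;
    readonly BitArray completed;

    public TransferState(string dataPath, int segmentCount)
    {
        controlPath = dataPath + ".parts"; // sidecar control file next to the data file
        completed = File.Exists(controlPath)
            ? new BitArray(File.ReadAllBytes(controlPath))
            : new BitArray(segmentCount);
    }

    public bool IsCompleted(int segment)
    {
        return completed[segment];
    }

    // Call this only after the segment's data has been flushed to the output file,
    // so the control file never claims more than is actually on disk.
    public void MarkCompleted(int segment)
    {
        completed[segment] = true;
        byte[] bytes = new byte[(completed.Length + 7) / 8];
        completed.CopyTo(bytes, 0);
        File.WriteAllBytes(controlPath, bytes);
    }
}

Note that reloading from disk rounds the BitArray length up to a whole number of bytes, which is harmless as long as you track the real segment count separately.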
You could use a BitArray (in System.Collections).
Then, when you visit an offset in the file, you can set the BitArray at that offset to true.
So for your 10,000 byte file:
BitArray ba = new BitArray(10000);
// Visited offset, mark in the BitArray
ba[4] = true;
Implement a file system (like on a disk) inside your file. Just use something simple; there should be something available in the FOSS arena.
There's a strange problem with DotNetZip that I can't seem to find a solution to.
I've searched for a few hours now and I just can't find anything on this, so here goes.
var ms = new MemoryStream();
using (var archive = new Ionic.Zip.ZipFile()) {
    foreach (var file in files) {
        // file.Name is a string, file.Data is a byte[]
        var entry = archive.AddEntry(file.Name, file.Data);
        entry.ModifiedTime = DateTime.Now.AddYears(10); // Just for testing
    }
    archive.Save(ms);
}
return ms.GetBuffer();
I need to add the modified time, which is rather crucial, but right now I just have a dummy timestamp.
When I open the file with WinRAR, it says "Unexpected end of archive". Each individual file has checksum 00000000, and WinRAR says "The archive is either in unknown format or damaged". I can repair it, which brings it down 20% in size and makes everything OK, but that's not really useful.
When I set a breakpoint after adding all the entries, I can see in zip.Entries that all the entries have that same bad CRC, but all the data seems to be there.
So it shouldn't be the way I save the archive that's the problem.
I use my file collection elsewhere without problems, which adds to DotNetZip being weird. Well either that or I misunderstand something :)
GetBuffer is certainly wrong. It returns the internal buffer of the MemoryStream, which is often bigger than the actual content.
To return an array that only contains the actual content, use ToArray().
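For example, the saving code from the question with just that change:

using (var ms = new MemoryStream())
using (var archive = new Ionic.Zip.ZipFile())
{
    foreach (var file in files)
    {
        var entry = archive.AddEntry(file.Name, file.Data);
        entry.ModifiedTime = DateTime.Now.AddYears(10);
    }
    archive.Save(ms);
    return ms.ToArray(); // copies exactly the bytes that were written, nothing more
}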
Or you could carefully handle the incompletely filled buffer in the consuming code. This would reduce GC pressure, since you don't need to allocate a whole new array for the return value.
If the zip-archive is large, I'd also consider saving to a file directly, instead of assembling the archive in-memory.
I have a single MemoryStream to which I know I sent multiple files (for example, 5 files). Is it possible to read from this MemoryStream and break it apart file by file?
My gut is telling me no, since when we Read, we are reading byte by byte... Any help and a possible snippet would be great. I haven't been able to find anything on Google or here :(
You can't directly, not if you don't delimit the files in some way or know the exact size of each file as it was put into the buffer.
You can use a compressed file such as a zip file to transfer multiple files instead.
A stream is just a sequence of bytes. If you put the files next to each other in the stream, you need to know how to separate them. That means you must know the length of the files, or you should have used some separator. Some (most) file types have a kind of header, but looking for such headers in an entire stream may not be foolproof either, since the header of one file could just as well be data in another file.
So, if you need to write files to such a stream, it is wise to add some extra information. For instance, start with a version number, then write the size of the first file, then the file itself, then the size of the next file, and so on.
By starting with a version number, you can make alterations to this format later. In the future you may decide you need to store the file name as well. In that case, you can increase the version number, make up a new format, and still be able to read streams that you created earlier.
This is of course especially useful if you store these streams too.
Since you're sending them, you'll have to send them into the stream in such a way that you'll know how to pull them out. The most common way of doing this is to use a length prefix (a sketch of both directions follows after the read steps below). For example, to write the files to the stream:
write an integer to the stream to indicate the number of files
Then for each file,
write an integer (or a long if the files are large) to indicate the number of bytes in the file
write the file
To read the files back,
read an integer (n) to determine the number of files in the stream
Then, iterating n times,
read an integer (or long if that's what you chose) to determine the number of bytes in the file
read the file
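A minimal sketch of that length-prefixed layout, assuming each file's content is available as a byte[] (method and variable names are illustrative):

using System.Collections.Generic;
using System.IO;
using System.Text;

static void WriteFiles(Stream stream, IList<byte[]> files)
{
    using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
    {
        writer.Write(files.Count);     // number of files
        foreach (byte[] file in files)
        {
            writer.Write(file.Length); // number of bytes in this file
            writer.Write(file);        // the file itself
        }
    }
}

static List<byte[]> ReadFiles(Stream stream)
{
    var files = new List<byte[]>();
    using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
    {
        int count = reader.ReadInt32();          // number of files
        for (int i = 0; i < count; i++)
        {
            int length = reader.ReadInt32();     // number of bytes in this file
            files.Add(reader.ReadBytes(length)); // the file itself
        }
    }
    return files;
}

If the files were written to a MemoryStream, remember to rewind it (ms.Position = 0) before calling ReadFiles.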
You could use an IEnumerable<Stream> instead.
You need to implement this yourself. What you want to do is write some sort of 'delimiter' into the stream. As you're reading, look for that delimiter, and you'll know when you have hit a new file.
Here's a quick and dirty example:
byte[] delimiter = System.Text.Encoding.UTF8.GetBytes("++MyDelimiter++");

// Writing: put the delimiter between files.
ms.Write(myFirstFile, 0, myFirstFile.Length);
ms.Write(delimiter, 0, delimiter.Length);
ms.Write(mySecondFile, 0, mySecondFile.Length);
....

// Reading (still quick and dirty): scan the whole buffer for the delimiter.
byte[] data = ms.ToArray();
int start = 0;
for (int i = 0; i <= data.Length - delimiter.Length; i++)
{
    bool match = true;
    for (int j = 0; j < delimiter.Length && match; j++)
        if (data[i + j] != delimiter[j]) match = false;

    if (match)
    {
        // data[start .. i) is one file: close the current output and open a new one here.
        start = i + delimiter.Length;
        i = start - 1;
    }
}
// data[start .. data.Length) is the last file.