I'm writing a little tape writer application in C#, using the class from this article:
http://www.codeproject.com/Articles/15487/Magnetic-Tape-Data-Storage-Part-1-Tape-Drive-IO-Co
It works very well, but writes a lot more data to tape than the original file contains.
Practical example:
My test file is 160 MB.
Writing it to tape occupies about 300 MB.
With hardware compression enabled it takes about 250 MB.
If I read the just-written raw data back from the tape I get a file of about 170 MB (which is acceptable), and the backed-up file always works fine.
I tried other programs: Microsoft NTBackup uses just 170 MB (!!) with compression enabled, and other commercial and free programs use from 200 to 300 MB.
But ALL of these programs read the backup back correctly (same MD5 and SHA-1 on the recovered file!).
What's going on? How can I improve my application? I really can't understand this.
Here is my "write" function, which uses a modified Write in the class (this works only if you write a single file):
private void Write(string path)
{
    int BlockCounter = 0;
    int BytesRead = 0;
    Byte[] Temp = new Byte[BUFFER_SIZE];
    using (System.IO.FileStream InputStream = System.IO.File.OpenRead(path))
    {
        TapeOperator TapeOp = new TapeOperator();
        TapeOp.Load("\\\\.\\Tape0", 0);
        TapeOp.SetTapePosition(0);
        BytesRead = InputStream.Read(Temp, 0, BUFFER_SIZE);
        while (BytesRead > 0)
        {
            TapeOp.Write(BlockCounter, Temp);
            BlockCounter++;
            BytesRead = InputStream.Read(Temp, 0, BUFFER_SIZE);
        }
        TapeOp.TapeMark(1, 1, 1); // TapeMark is a custom function to write a FileMark
        BlockCounter++;
        TapeOp.Close();
    }
}
Modified Write from the class:
public void Write(long startPos, byte[] stream)
{
    m_stream.Write(stream, 0, stream.Length);
    m_stream.Flush();
}
My take on it: the tape's block size is greater than your BUFFER_SIZE, so you are not filling the tape blocks all the way.
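If that is the case, one option is to make BUFFER_SIZE equal to (or a multiple of) the block size the drive reports. A rough sketch of querying it with the Win32 GetTapeParameters API via P/Invoke follows; how you reach the tape's SafeFileHandle depends on the TapeOperator class (the article keeps the handle in a private field, so you may have to expose it yourself), and TapeInfo/GetMediaBlockSize are made-up names:
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class TapeInfo
{
    const int GET_TAPE_MEDIA_INFORMATION = 0;

    [StructLayout(LayoutKind.Sequential)]
    struct TAPE_GET_MEDIA_PARAMETERS
    {
        public long Capacity;
        public long Remaining;
        public uint BlockSize;       // current block size of the loaded media, in bytes
        public uint PartitionCount;
        [MarshalAs(UnmanagedType.I1)]
        public bool WriteProtected;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern int GetTapeParameters(SafeFileHandle hDevice, int dwOperation,
        ref int lpdwSize, ref TAPE_GET_MEDIA_PARAMETERS lpTapeInformation);

    // Returns the media block size reported by the drive, or 0 if the call fails.
    public static uint GetMediaBlockSize(SafeFileHandle tapeHandle)
    {
        var mediaParams = new TAPE_GET_MEDIA_PARAMETERS();
        int size = Marshal.SizeOf(typeof(TAPE_GET_MEDIA_PARAMETERS));
        int result = GetTapeParameters(tapeHandle, GET_TAPE_MEDIA_INFORMATION, ref size, ref mediaParams);
        return result == 0 ? mediaParams.BlockSize : 0; // 0 == NO_ERROR
    }
}
A reported BlockSize of 0 means the drive is in variable block mode. Once you know the fixed block size, sizing your buffer to it (or a multiple of it) should avoid the padding described above.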
The following code does the following:
Read all bytes from an input file
Keep only part of the file in outbytes
Write the extracted bytes to outfile
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But there is a limitation of ~2GB of data for each step.
Edit: The size of the extracted bytes can also be greater than 2GB.
How can I handle big files? What is the best way to proceed with good performance, regardless of size?
Thanks!
Example using FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024];
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
    for (int i = 0; i < 3000; i++) // 3000 reads of one megabyte ≈ 3 GB
    {
        int bytesRead = readFS.Read(buffer, 0, buffer.Length);
        writeFS.Write(buffer, 0, bytesRead);
    }
}
It's not production-grade code; Read might not return a full megabyte, so you'd end up with less than 3 GB. It's more to demonstrate the concept of using two FileStreams, reading repeatedly from one and writing repeatedly to the other. I'm sure you can modify it so that it copies an exact number of bytes, by keeping track of the total of all the bytesRead values in the loop and stopping when you have read enough (see the sketch below).
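A minimal sketch of that modification, under the same assumptions as above (the hypothetical pathToBigFile/pathToNewFile paths, a 1 GB starting offset and a 3 GB copy length):
long remaining = 3L * 1024 * 1024 * 1024;    // how much we still want to copy
byte[] buffer = new byte[1024 * 1024];

using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
    readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // skip the first 1 GB
    while (remaining > 0)
    {
        // never read past the requested length
        int bytesRead = readFS.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
        if (bytesRead == 0) break; // end of file reached early
        writeFS.Write(buffer, 0, bytesRead);
        remaining -= bytesRead;
    }
}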
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
    // Open the files as streams
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        // seek to the start position
        inStream.Seek(startPosition, SeekOrigin.Begin);
        // Create a variable to track how much more to copy
        // and a buffer to temporarily store a section of the file
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // Read the smaller of 81920 or remaining, and break out of the loop if we've already reached the end of the file
            int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }
            // Write the buffered bytes to the output file
            outStream.Write(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should be functionally equivalent to your current method, without the overhead of reading the entire file into memory, regardless of its size.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
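For reference, a minimal async sketch of the same method with just those changes applied (it assumes System.IO and System.Threading.Tasks are already in scope):
public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
{
    using (var inStream = File.OpenRead(inFile))
    using (var outStream = File.OpenWrite(outFile))
    {
        inStream.Seek(startPosition, SeekOrigin.Begin);
        long remaining = size;
        byte[] buffer = new byte[81920];
        do
        {
            // same logic as the synchronous version, but the I/O calls are awaited
            int bytesRead = await inStream.ReadAsync(buffer, 0, (int)Math.Min(buffer.Length, remaining));
            if (bytesRead == 0) { break; }
            await outStream.WriteAsync(buffer, 0, bytesRead);
            remaining -= bytesRead;
        }
        while (remaining > 0);
    }
}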
I hope someone here will be able to help me out with this.
What I'm trying to do is decompress a zlib-compressed file in C# using ZlibNet. (I've also tried DotNetZip and SharpZipLib.)
The problem I'm having is that it decompresses only the first 256 KB, or rather the first 262144 bytes.
Here's my Decompress method, taken from here:
public static byte[] Decompress(byte[] gzip)
{
    using (var stream = new Ionic.Zlib.ZlibStream(new MemoryStream(gzip), Ionic.Zlib.CompressionMode.Decompress))
    {
        var outStream = new MemoryStream();
        const int size = 999999; // Playing around with various sizes didn't help
        byte[] buffer = new byte[size];
        int read;
        while ((read = stream.Read(buffer, 0, size)) > 0)
        {
            outStream.Write(buffer, 0, read);
            read = 0;
        }
        return outStream.ToArray();
    }
}
Basically, read gets set to 262144 on the first pass of the while loop, the data is written, and on the next pass read gets set to 0, so the loop exits and the function returns outStream as an array (even though there are still bytes left to be read!).
Thanks in advance to anyone who could help with this!
Upon further inspection of the originally packed data, it turns out that the script responsible for (de)compressing the data in the original application splits the zlib stream of a file into chunks of 262144 bytes each.
This is why the various libraries I tested always stopped at 262144 bytes: it was the end of the zlib stream, but not the end of the file it was supposed to extract. (Each zlib stream was also separated by a 32-bit unsigned int that indicated the number of bytes the next zlib stream would contain.)
My only guess is that they did this so that a very large file would not have to be loaded into memory all at once for decompression. (But that's just a guess.)
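Based on that layout, a rough sketch of reading the chunked container could look like the following; the exact meaning of the length field and its endianness are assumptions about the format described above, and DecompressChunked is just an illustrative name:
public static byte[] DecompressChunked(string path)
{
    using (var output = new MemoryStream())
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            // Length prefix for the next zlib chunk (assumed to be a little-endian uint32)
            uint chunkLength = reader.ReadUInt32();
            byte[] chunk = reader.ReadBytes((int)chunkLength);

            // Decompress this chunk and append it to the combined output
            using (var zlib = new Ionic.Zlib.ZlibStream(new MemoryStream(chunk), Ionic.Zlib.CompressionMode.Decompress))
            {
                zlib.CopyTo(output);
            }
        }
        return output.ToArray();
    }
}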
Looking at the different ways to upload a file in .NET, e.g. HttpPostedFile, and using an HttpHandler, I'm trying to understand how the process works in a bit more detail.
Specifically, how it writes the information to a file.
Say I have the following:
HttpPostedFile file = context.Request.Files[0];
file.SaveAs(@"c:\temp\file.zip");
The actual file does not seem to get created until the full stream has been processed.
Similarly:
using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
using (Stream input = file.InputStream)
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
I would have thought that this would write the file "progressively" as it reads the stream, but looking at the file system it does not seem to do this at all. If I set a breakpoint inside the while loop, it does, though.
What I'm trying to do is upload a file (using a JavaScript uploader) and poll alongside it, where the polling AJAX request tries to get the FileInfo (file size) of the uploaded file every second. However, it always returns 0 until the upload is complete.
Vimeo seems to be able to do this kind of thing (for IE)? Is this a .NET limitation, or is there a way to progressively write the file from the stream?
Two points:
First, in Windows, the displayed size of a file is not updated constantly. The file might indeed be growing continually, but its reported size is only updated once in a while.
Second (more likely in this case), the stream might not be flushing to the disk. You could force it to by adding output.Flush() after the call to output.Write(). You might not want to do that, though, since it will probably have a negative impact on performance.
Perhaps you could poll the Length property of the output stream directly, instead of going through the file system.
EDIT:
To make the Length property of the stream accessible to other threads, you could have a field in your class and update it with each read/write:
private long _uploadedByteCount;

void SomeMethod()
{
    using (Stream output = File.OpenWrite(@"c:\temp\file.zip"))
    using (Stream input = file.InputStream)
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
            Interlocked.Add(ref _uploadedByteCount, bytesRead);
        }
    }
}

public long GetUploadedByteCount()
{
    // Interlocked.Read avoids a torn read of the 64-bit counter on 32-bit platforms
    return Interlocked.Read(ref _uploadedByteCount);
}
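A rough sketch of what the polling side could look like, assuming the counter above is reachable from other requests (here through a static helper called UploadState, which is a made-up name; real per-upload tracking would need a key such as a session or upload id) and that the handler is registered in web.config:
public class UploadProgressHandler : IHttpHandler
{
    public bool IsReusable
    {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context)
    {
        // Return the current byte count as plain text for the AJAX poll to display
        context.Response.ContentType = "text/plain";
        context.Response.Write(UploadState.GetUploadedByteCount().ToString());
    }
}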
Hi
Does anyone have workable sample code that copies files and supports resuming the copy if the source has been disconnected?
In this example I am copying video files. If the source is disconnected, i.e. the USB drive was unplugged, how can I support resuming the copy? I have tried some code from Stack Overflow, but after resuming, the video files seem to be corrupted. Is FileStream the best solution for video transfers/resume?
Any other pointers or tips are welcome.
private void CreateNewCopyTo(FileStream source, FileStream dest) {
    int size = (source.CanSeek) ? Math.Min((int)(source.Length - source.Position), 0x2000) : 0x2000;
    byte[] buffer = new byte[size];
    int read;
    long fileLength = source.Length;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0) {
        dest.Write(buffer, 0, read);
        dest.Close();
        dest = new FileStream(dest.Name, FileMode.Append, FileAccess.Write);
    }
    dest.Close();
}
private void ResumeCopyTo(FileStream source, FileStream dest) {
    int size = (source.CanSeek) ? Math.Min((int)(source.Length - source.Position), 0x2000) : 0x2000;
    byte[] buffer = new byte[size];
    long TempPos = source.Position;
    while (true) {
        int read = source.Read(buffer, 0, buffer.Length);
        if (read <= 0)
            return;
        dest.Write(buffer, 0, read);
        dest.Close();
        dest = new FileStream(dest.Name, FileMode.Append, FileAccess.Write);
    }
}
Try inserting source.Seek(dest.Length, SeekOrigin.Begin); as the first line of the ResumeCopyTo method.
Create a file that is the same size as the file you're copying, and if the connection is broken, write a "bookmark" into the file: some arbitrary string at the end of the file that tells you where you left off.
It's best to get a hash of the original file to determine whether it has changed since the last copy attempt. (If the file is different, then you should check the data in the destination against the data in the source, perhaps with another hash of the data.)
The most important part, though, is to seek in the source and the destination to the exact same point in both files before the resume attempt.
You will want to make sure that your seek gives you the next byte in the source file, not the same byte you left off at. Otherwise you will be left with a single duplicated character in your data, which will corrupt it.
(E.g. if the source file is 12345678901234567890 and your destination made it to 123456, you don't want to resume at 6, you want to resume at 7; otherwise your destination would be 123456678901234567890.)
Assuming that the file does not change over time, we can use the functions that you have written. However, they should be slightly modified and simplified into one function.
private void ResumeCopy(FileStream source, FileStream dest)
{
    byte[] buffer = new byte[0x10000]; // use a 64 KB buffer
    int read;
    source.Seek(dest.Length, SeekOrigin.Begin);
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        dest.Write(buffer, 0, read);
        dest.Flush();
    }
}
We can create an additional function:
private void ResumeCopyTo(string sourceFilePath, string destFilePath)
{
    FileStream input = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read);
    FileStream output = new FileStream(destFilePath, FileMode.Append, FileAccess.Write);
    ResumeCopy(input, output);
    output.Close();
    input.Close();
}
Ideas for a production environment:
add error handling
check whether the file has changed (use a checksum; see the sketch below)
open the source file in exclusive mode
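For the checksum idea, here is a minimal sketch (hypothetical helper names; SequenceEqual needs System.Linq): it hashes only the first dest.Length bytes of the source and compares that with a hash of the destination, so a changed source is detected before appending to a stale partial copy.
private static bool CopiedPrefixMatches(string sourcePath, string destPath)
{
    long copied = new FileInfo(destPath).Length;
    using (var sha = System.Security.Cryptography.SHA256.Create())
    using (var source = File.OpenRead(sourcePath))
    using (var dest = File.OpenRead(destPath))
    {
        byte[] sourceHash = HashFirstBytes(sha, source, copied);
        sha.Initialize(); // reset so the same instance can hash the destination
        byte[] destHash = HashFirstBytes(sha, dest, copied);
        return sourceHash.SequenceEqual(destHash);
    }
}

private static byte[] HashFirstBytes(System.Security.Cryptography.HashAlgorithm hash, Stream stream, long count)
{
    byte[] buffer = new byte[0x10000];
    while (count > 0)
    {
        int read = stream.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
        if (read == 0) break;
        hash.TransformBlock(buffer, 0, read, null, 0);
        count -= read;
    }
    hash.TransformFinalBlock(buffer, 0, 0);
    return hash.Hash;
}
If the prefixes do not match, fall back to copying from scratch instead of resuming.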
BTW:
In this way, I copied a 15 GB file in 9 hours over a network connection of really poor quality.
I have a huge file where I have to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file?
Filesystems do not support "inserting" data in the middle of a file. If you really have a need for a file that can be written to in a sorted kind of way, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
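As a rough illustration of that last point, here is a minimal sketch (InsertIntoFile is a made-up helper) that rewrites only the part after the insertion point, parking the tail in a temporary file:
static void InsertIntoFile(string path, long position, byte[] insertBytes)
{
    string tailPath = Path.GetTempFileName();
    try
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            // Save everything after the insertion point to a temporary file
            using (var tail = File.Create(tailPath))
            {
                file.Position = position;
                file.CopyTo(tail);
            }
            // Write the new bytes at the insertion point...
            file.Position = position;
            file.Write(insertBytes, 0, insertBytes.Length);
            // ...and re-append the saved tail after them
            using (var tail = File.OpenRead(tailPath))
            {
                tail.CopyTo(file);
            }
        }
    }
    finally
    {
        File.Delete(tailPath);
    }
}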
I would look at the FileStream class to do random I/O in C#.
You will probably need to rewrite the file from the point where you insert the changes to the end. You might be best off always writing to the end of the file and using tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
There is no way to insert characters into a file without rewriting them. In C# it can be done with any of the Stream classes. If the files are huge, I would recommend using the GNU core utilities from your C# code; they are the fastest. I used to handle very large text files (4 GB, 8 GB or more) with the core utilities. Commands like head, tail, split, csplit, cat, shuf, shred, and uniq really help a lot with text manipulation.
For example, if you need to put some characters into a 2 GB file, you can use split -b BYTECOUNT, put the output into a file, append the new text to it, and then append the rest of the content. This should supposedly be faster than any other way.
Hope it works. Give it a try.
You can use random access to write to specific locations of a file, but you won't be able to do it in text format; you'll have to work with bytes directly.
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
using (BinaryWriter bw = new BinaryWriter(File.Open(strFile, FileMode.Open)))
{
    string strNewData = "this is some new data";
    byte[] byteNewData = new byte[strNewData.Length];
    // copy contents of string to byte array
    for (var i = 0; i < strNewData.Length; i++)
    {
        byteNewData[i] = Convert.ToByte(strNewData[i]);
    }
    // write new data to file
    bw.Seek(15, SeekOrigin.Begin); // seek to position 15
    bw.Write(byteNewData, 0, byteNewData.Length);
}
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
// this.Stream is the stream in which you insert data
{
    long position = this.Stream.Position;
    long length = this.Stream.Length;
    MemoryStream ms = new MemoryStream();
    this.Stream.Position = 0;
    DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
    ms.Write(data, 0, data.Length);
    this.Stream.Position = position;
    DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
    this.Stream = ms;
}
#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion
DIUtils.cs
public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
    long totalsize = input.Length;
    long byteswritten = 0;
    const int size = 32768;
    byte[] buffer = new byte[size];
    int read;
    int readlen = length < size ? (int)length : size;
    while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
    {
        output.Write(buffer, 0, read);
        byteswritten += read;
        length -= read;
        readlen = length < size ? (int)length : size;
        if (callback != null)
            callback(byteswritten, totalsize);
    }
}
Depending on the scope of your project, you may want to insert each line of text from your file into a table data structure, much like a database table. That way you can insert at a specific location at any given moment without having to read in, modify, and write out the entire text file each time. This is given that your data is "huge", as you put it. You would still recreate the file, but at least you create a scalable solution this way.
It may be "possible", depending on how the filesystem stores files, to quickly insert (i.e., add additional) bytes in the middle. If it is remotely possible, it is probably only feasible a full block at a time, and only by doing low-level modification of the filesystem itself or by using a filesystem-specific interface.
Filesystems are not generally designed for this operation. If you need to do inserts quickly, you really need a more general database.
Depending on your application, a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty (see the sketch below).
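A minimal sketch of that batching idea (Insertion and ApplyInsertions are made-up names; it needs System.Linq and System.Collections.Generic): collect the pending insertions, then stream the original file to a new one in a single pass, splicing each insert in at its offset.
class Insertion
{
    public long Offset;   // position in the *original* file where the bytes go
    public byte[] Bytes;
}

static void ApplyInsertions(string sourcePath, string destPath, IEnumerable<Insertion> insertions)
{
    using (var source = File.OpenRead(sourcePath))
    using (var dest = File.Create(destPath))
    {
        byte[] buffer = new byte[81920];
        long copied = 0;
        foreach (var insertion in insertions.OrderBy(i => i.Offset))
        {
            // Copy the original bytes up to the next insertion point
            long toCopy = insertion.Offset - copied;
            while (toCopy > 0)
            {
                int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, toCopy));
                if (read == 0) break;
                dest.Write(buffer, 0, read);
                copied += read;
                toCopy -= read;
            }
            // Then write the inserted bytes
            dest.Write(insertion.Bytes, 0, insertion.Bytes.Length);
        }
        // Copy whatever remains after the last insertion
        source.CopyTo(dest);
    }
}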
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case, there is no function that directly supports "insert into file", but the following code can do it accurately.
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";

// create
var fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();

// insert
fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] b = new byte[262144];
long target = 10, offset = fs.Length - b.Length;
while (offset != 0)
{
    if (offset < 0)
    {
        offset = b.Length - target;
        b = new byte[offset];
    }
    fs.Position = offset; fs.Read(b, 0, b.Length);
    fs.Position = offset + target; fs.Write(b, 0, b.Length);
    offset -= b.Length;
}
fs.Position = target; fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
To get better performance for file I/O, play with "magic" power-of-two numbers like in the code above. The creation of the file uses a buffer of 262144 bytes (256 KB), which does not help at all, but the same buffer size used for the insertion does the "performance job", as you can see from the StopWatch results if you run the code. A draft test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Why don't you put a pointer to the end of the file (literally, four bytes above the current size of the file) and then, at the end of the file, write the length of the inserted data, followed by the data you want to insert itself? For example, if you have a string in the middle of the file and you want to insert a few characters in the middle of that string, you can overwrite four characters of the string with a pointer to the end of the file, and then write those four characters at the end together with the characters you originally wanted to insert. It's all about ordering data. Of course, you can only do this if you are writing the whole file yourself, i.e. you are not using other codecs.