Writing bytes to a file in C#

I have a BLOB from an Oracle database. In .NET it is of type OracleLob, which has, among others, the methods Read and ReadByte:
int OracleLob.Read(byte[] buffer, int offset, int count)
int OracleLob.ReadByte()
So the Read method reads a sequence of bytes and ReadByte reads a single byte at a time. Here is my code:
OracleLob ol = (OracleLob) cmd.Parameters[1].Value; // here is the file!
BinaryWriter binWriter = new BinaryWriter(File.Open(@"D:\wordfile.DOCX", FileMode.Create));
int currentByte = ol.ReadByte();
while (currentByte != -1)
{
binWriter.Write(currentByte);
currentByte = ol.ReadByte();
}
binWriter.Close();
But when I open wordfile.DOCX in Word, it says that the file is corrupt and cannot be opened. What am I doing wrong?

BinaryWriter will output the int in some form of serialized manner. It won't write just a single byte. Use FileStream for this purpose, and use the byte[] versions of the read/write methods, since byte-at-a-time streaming is very slow.

A docx file is an Open XML file, which is basically a set of XML files zipped together and given the .docx extension. You can't just take arbitrary output from a database, write it to a file and magically make it into a docx file.
Are you sure it's a docx file you are trying to make here? The only way I can see this working is if you serialized a docx file into the database, but then you have to make sure that it's deserialized in exactly the same way on "the way out", otherwise the underlying zip file will be corrupt and the file cannot be opened.

What's wrong with the code is that it's using an int value when writing the byte data to the BinaryWriter. It's using the overload that is writing an int instead of the one writing a byte, so each byte from the source will be written as four bytes. If you check the file size you see that it's four times as large as it should be.
Cast the value to byte so that the correct overload of the Write method is used:
binWriter.Write((byte)currentByte);
To do this more efficiently, you can use a buffer to read blocks of bytes instead of a single byte at a time:
using (FileStream stream = File.Open(@"D:\wordfile.DOCX", FileMode.Create)) {
byte[] buffer = new byte[4096];
int len = ol.Read(buffer, 0, buffer.Length);
while (len > 0) {
stream.Write(buffer, 0, len);
len = ol.Read(buffer, 0, buffer.Length);
}
}

currentByte is declared as an int, so the binary writer is writing 4 bytes for each write.
You need to cast currentByte as an actual byte:
binWriter.Write((byte) currentByte);
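The difference between the two overloads can be checked with a minimal sketch like the one below. It writes to a MemoryStream instead of a file; the class name WriteOverloadDemo and the sample value are only for illustration.

```csharp
using System;
using System.IO;

class WriteOverloadDemo
{
    static void Main()
    {
        int currentByte = 0x50; // a value such as ReadByte() would return

        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(currentByte);        // Write(int): four bytes, little-endian
            writer.Flush();
            Console.WriteLine(ms.Length);     // prints 4

            writer.Write((byte)currentByte);  // Write(byte): a single byte
            writer.Flush();
            Console.WriteLine(ms.Length);     // prints 5
        }
    }
}
```

The three extra bytes written per source byte by the int overload are what make the resulting file four times too large.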

Related

Decompress file using zlib - stuck at 256kb limit

I hope someone here will be able to help me out with this.
What I'm trying to do is decompress a zlib compressed file in C# using ZlibNet. (I've also tried DotNetZip and SharpZipLib)
The problem that I'm having is that it'll decompress only the first 256kb, or rather the first 262144 bytes.
Here's my Decompress method, taken from here:
public static byte[] Decompress(byte[] gzip)
{
using (var stream = new Ionic.Zlib.ZlibStream(new MemoryStream(gzip), Ionic.Zlib.CompressionMode.Decompress))
{
var outStream = new MemoryStream();
const int size = 999999; //Playing around with various sizes didn't help
byte[] buffer = new byte[size];
int read;
while ((read = stream.Read(buffer, 0, size)) > 0)
{
outStream.Write(buffer, 0, read);
read = 0;
}
return outStream.ToArray();
}
}
Basically, the int (read) gets set to 262144 the first time the while loop executes, it writes, and then on the next pass of the while loop read gets set to 0, making the loop exit and the function return the outStream as an array. (Even though there are still bytes left to be read!)
Thanks in advance to anyone who could help with this!
Upon further inspection of the originally packed data, it turns out that the script responsible for (de)compressing the data in the original application would split the zlib stream of a file into chunks of 262144 bytes each.
This is why the various libraries I tested always stopped at 262144 bytes: it was the end of the zlib stream, but not the end of the file it was supposed to extract. (Each zlib stream was also separated by a 32-bit unsigned int that indicated the number of bytes the next zlib stream would contain.)
My only guess is that they did this so that if they had a very large file, they wouldn't need to load all of it into memory for decompression. (But that's just a guess.)
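A container like the one described could be unpacked roughly as follows. This is only a sketch under several assumptions that the description above does not confirm: every chunk, including the first, is preceded by its length; the length is stored little-endian; and the length counts compressed bytes. It also uses System.IO.Compression.ZLibStream (available from .NET 6) rather than the Ionic library from the question, and the class name ChunkedZlib is made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ChunkedZlib
{
    // Assumed layout: [uint32 length][zlib stream] repeated until the end of the input.
    public static byte[] Decompress(byte[] packed)
    {
        using (var input = new MemoryStream(packed))
        using (var reader = new BinaryReader(input))
        using (var output = new MemoryStream())
        {
            while (input.Position < input.Length)
            {
                // BinaryReader reads the length as little-endian (an assumption about the format)
                uint chunkLength = reader.ReadUInt32();
                byte[] chunk = reader.ReadBytes((int)chunkLength);
                if (chunk.Length != chunkLength)
                    throw new InvalidDataException("Truncated chunk");

                // Each chunk is a complete zlib stream, so it gets its own decompressor
                using (var zlib = new ZLibStream(new MemoryStream(chunk), CompressionMode.Decompress))
                {
                    zlib.CopyTo(output);
                }
            }
            return output.ToArray();
        }
    }
}
```

If the real format stores the lengths big-endian or has a header before the first chunk, the loop needs to be adjusted accordingly.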

How to find the no. of bytes of a text file without reading it?

I have C# code reading a text file and printing it out which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; //is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
label1.Text += b.ToString("x") + " ";
}
Is there any way I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance so that in the Read function, I can simply pass in buffer.Length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
To find the file size, the system only has to read the file's metadata from disk, so the example above does touch the disk but does not read the file's contents.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that that still doesn't mean you should just call Read and assume that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
data = new byte[(int) stream.Length];
int offset = 0;
while (offset < data.Length)
{
int chunk = stream.Read(data, offset, data.Length - offset);
if (chunk == 0)
{
// Or handle this some other way
throw new IOException("File has shrunk while reading");
}
offset += chunk;
}
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
You can use the FileInfo.Length property.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes In chunks
If it is a large file, then reading it in chunks might help.
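For files that fit comfortably in memory, a simpler route is File.ReadAllBytes, which allocates an array of exactly the file's length. The sketch below also builds the hex text with a StringBuilder instead of appending to the label in a loop; the names HexDump and ToHex are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class HexDump
{
    public static string ToHex(string path)
    {
        // ReadAllBytes sizes the array to the file's length and reads all of it
        byte[] data = File.ReadAllBytes(path);

        var sb = new StringBuilder(data.Length * 3);
        foreach (byte b in data)
        {
            sb.Append(b.ToString("x")).Append(' ');
        }
        return sb.ToString();
    }
}
```

In the code from the question this would be used as label1.Text = HexDump.ToHex(ofd.FileName). For very large files, reading in chunks remains the safer choice.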

How to save FileStream as PDF file

I am using a third party tool to get the scanned content from the scanner. On button click it executes the code and gives the content as a FileStream. Now I need to save this FileStream content as a pdf file in to a specified folder.
After saving I need to open the file in browser. How can I save the FileStream as a PDF file?
You can write the stream directly to the output buffer of the response.
So if you're at the point in your code where you have the FileStream from the scanner, simply read bytes from it and write them to Response.OutputStream.
Set the ContentType to application/pdf.
Make sure you return nothing else. The user's browser will then do whatever it is configured to do, either save to disk or show the file in the browser. You can also save to disk on the server at this point in case you want a backup.
I'm assuming your file stream is already a PDF; otherwise you'll need to use something like iTextSharp to create the PDF.
Edit
Here's some rough and ready code to do it. You'll want to tidy this up, for example by adding error handling.
public void SaveToOutput(Stream dataStream)
{
    dataStream.Seek(0, SeekOrigin.Begin);
    // Set the content type before any data is written to the response
    HttpContext.Current.Response.ContentType = "application/pdf";
    using (FileStream fileout = File.Create("somepath/file.pdf"))
    {
        byte[] buffer = new byte[512];
        int bytesread;
        // Read may return fewer bytes than requested, so loop until it returns 0
        while ((bytesread = dataStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            HttpContext.Current.Response.OutputStream.Write(buffer, 0, bytesread);
            fileout.Write(buffer, 0, bytesread);
        }
    }
}
Simon
You might want to take a look at the C# PDF Library on SourceForge: http://sourceforge.net/projects/pdflibrary/
If I'm understanding you correctly, the third party library is handing you a stream containing the data for the scanned document and you need to write it to a file? If that's the case you need to look up file I/O in C#. Here's a link and an example:
Stream sourceStream = scanner.GetOutput(); // wherever the source stream is
FileStream targetStream = File.Open(filename, FileMode.Create);
int bytesRead = 0;
byte[] buffer = new byte[2048];
while (true) {
    bytesRead = sourceStream.Read(buffer, 0, buffer.Length);
    if (bytesRead == 0)
        break;
    targetStream.Write(buffer, 0, bytesRead);
}
sourceStream.Close();
targetStream.Close();
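On .NET Framework 4 and later, the same read/write loop is available as Stream.CopyTo, so the copy can be reduced to a few lines. A minimal sketch; the names StreamSaver and SaveToFile are invented for illustration:

```csharp
using System.IO;

static class StreamSaver
{
    // Copies everything from the current position of source into a new file at path.
    public static void SaveToFile(Stream source, string path)
    {
        using (FileStream target = File.Create(path))
        {
            source.CopyTo(target); // loops internally until Read returns 0
        }
    }
}
```

The using block closes the target file even if the copy throws, which the explicit Close calls do not guarantee.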
not sure, but maybe check this
http://sourceforge.net/projects/itextsharp/
iTextSharp + FileStream = Corrupt PDF file
Another prominent PDF library (which I have used in the past as well) is iTextSharp. You can take a look at this tutorial on how to convert your Stream to PDF then have the user download it.

How to store an IStream to a file via C#?

I'm working with a 3rd party component that returns an IStream object (System.Runtime.InteropServices.ComTypes.IStream). I need to take the data in that IStream and write it to a file. I've managed to get that done, but I'm not really happy with the code.
With "strm" being my IStream, here's my test code...
// access the structure containing statistical info about the stream
System.Runtime.InteropServices.ComTypes.STATSTG stat;
strm.Stat(out stat, 0);
System.IntPtr myPtr = (IntPtr)0;
// get the "cbSize" member from the stat structure
// this is the size (in bytes) of our stream.
int strmSize = (int)stat.cbSize; // *** DANGEROUS *** (long to int cast)
byte[] strmInfo = new byte[strmSize];
strm.Read(strmInfo, strmSize, myPtr);
string outFile = @"c:\test.db3";
File.WriteAllBytes(outFile, strmInfo);
At the very least, I don't like the long to int cast as commented above, but I wonder if there's not a better way to get the original stream length than the above? I'm somewhat new to C#, so thanks for any pointers.
You don't need to do that cast, as you can read data from IStream source in chunks.
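// ...
// pcbRead must point to unmanaged memory that receives the number of bytes read
// (Marshal lives in System.Runtime.InteropServices)
IntPtr pcbRead = Marshal.AllocCoTaskMem(sizeof(int));
try
{
    using (FileStream fs = new FileStream(@"c:\test.db3", FileMode.Create))
    {
        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            strm.Read(buffer, buffer.Length, pcbRead);
            bytesRead = Marshal.ReadInt32(pcbRead);
            fs.Write(buffer, 0, bytesRead);
        } while (bytesRead > 0);
    }
}
finally
{
    Marshal.FreeCoTaskMem(pcbRead);
}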
This approach (if it works) is more memory-efficient, since it uses only a small buffer to transfer data between the streams.
System.Runtime.InteropServices.ComTypes.IStream is a wrapper for ISequentialStream.
From MSDN: http://msdn.microsoft.com/en-us/library/aa380011(VS.85).aspx
The actual number of bytes read can be less than the number of bytes requested if an error occurs or if the end of the stream is reached during the read operation. The number of bytes returned should always be compared to the number of bytes requested. If the number of bytes returned is less than the number of bytes requested, it usually means the Read method attempted to read past the end of the stream.
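This documentation says that you can keep reading in a loop as long as the number of bytes returned (pcbRead) equals the number requested (cb); a smaller value usually means the end of the stream has been reached.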

How to insert characters to a file using C#

I have a huge file, where I have to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file again?
Filesystems do not support "inserting" data in the middle of a file. If you really have a need for a file that can be written to in a sorted kind of way, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
I would look at the FileStream class to do random I/O in C#.
You will probably need to rewrite the file from the point you insert the changes to the end. You might be best always writing to the end of the file and use tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
There is no way to insert characters into a file without rewriting it, at least from the insertion point onwards. In C# this can be done with any of the Stream classes. If the files are huge, I would recommend calling the GNU Core Utils from your C# code; in my experience they are the fastest. I used to handle very large text files with the core utils (sizes of 4GB, 8GB or more). Commands like head, tail, split, csplit, cat, shuf, shred and uniq really help a lot in text manipulation.
For example, if you need to put some chars into a 2GB file, you can use split -b BYTECOUNT to cut off the first part into a file, append the new text to it, and then append the rest of the content. This should supposedly be faster than any other way.
Hope it works. Give it a try.
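The same head + new data + tail idea can also be sketched in plain C# with a temporary file. This is only an illustration, not a tuned implementation: the names FileInsert, InsertBytes and CopyBytes and the ".tmp" suffix are made up, and the approach needs free disk space for a second copy of the file.

```csharp
using System;
using System.IO;

static class FileInsert
{
    public static void InsertBytes(string path, long position, byte[] data)
    {
        string tempPath = path + ".tmp"; // hypothetical temp file name
        using (FileStream source = File.OpenRead(path))
        using (FileStream target = File.Create(tempPath))
        {
            CopyBytes(source, target, position);   // everything before the insertion point
            target.Write(data, 0, data.Length);    // the inserted data
            source.CopyTo(target);                 // the rest of the original file
        }
        // Replace the original with the rewritten copy
        File.Delete(path);
        File.Move(tempPath, path);
    }

    // Copies exactly count bytes from input to output.
    static void CopyBytes(Stream input, Stream output, long count)
    {
        byte[] buffer = new byte[81920];
        while (count > 0)
        {
            int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0)
                throw new EndOfStreamException("Insert position is past the end of the file");
            output.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

A call such as FileInsert.InsertBytes(path, 15, bytes) leaves the first 15 bytes in place, followed by the new bytes and then the original remainder.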
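You can use random access to write to specific locations of a file, but you won't be able to do it in text format; you'll have to work with bytes directly. Note that writing this way overwrites the bytes already at that position rather than inserting before them.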
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
using (BinaryWriter bw = new BinaryWriter (File.Open (strFile, FileMode.Open)))
{
string strNewData = "this is some new data";
byte[] byteNewData = new byte[strNewData.Length];
// copy contents of string to byte array
for (var i = 0; i < strNewData.Length; i++)
{
byteNewData[i] = Convert.ToByte (strNewData[i]);
}
// write new data to file
bw.Seek (15, SeekOrigin.Begin); // seek to position 15
bw.Write (byteNewData, 0, byteNewData.Length);
}
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
// this.Stream is the stream in which you insert data
{
long position = this.Stream.Position;
long length = this.Stream.Length;
MemoryStream ms = new MemoryStream();
this.Stream.Position = 0;
DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
ms.Write(data, 0, data.Length);
this.Stream.Position = position;
DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
this.Stream = ms;
}
#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion
DIUtils.cs
public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
long totalsize = input.Length;
long byteswritten = 0;
const int size = 32768;
byte[] buffer = new byte[size];
int read;
int readlen = length < size ? (int)length : size;
while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
{
output.Write(buffer, 0, read);
byteswritten += read;
length -= read;
readlen = length < size ? (int)length : size;
if (callback != null)
callback(byteswritten, totalsize);
}
}
Depending on the scope of your project, you may want to decide to insert each line of text with your file in a table datastructure. Sort of like a database table, that way you can insert to a specific location at any given moment, and not have to read-in, modify, and output the entire text file each time. This is given the fact that your data is "huge" as you put it. You would still recreate the file, but at least you create a scalable solution in this manner.
It may be "possible" depending on how the filesystem stores files to quickly insert (ie, add additional) bytes in the middle. If it is remotely possible it may only be feasible to do so a full block at a time, and only by either doing low level modification of the filesystem itself or by using a filesystem specific interface.
Filesystems are not generally designed for this operation. If you need to quickly do inserts you really need a more general database.
Depending on your application a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty.
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case there is no function to directly support "insert to file". But the following code can do it accurately.
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";
// create
var fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();
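// insert
byte[] data = ASCIIEncoding.ASCII.GetBytes(ab);
fs = new FileStream(@"d:\test.txt", FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] b = new byte[262144];
long target = 10;      // position at which the new data is inserted
long end = fs.Length;  // end of the part of the tail that has not been moved yet
// Move the tail towards the end of the file, last block first,
// so that nothing is overwritten before it has been copied.
while (end > target)
{
    int count = (int)Math.Min(b.Length, end - target);
    long offset = end - count;
    fs.Position = offset;
    int read = 0;
    while (read < count)
    {
        int n = fs.Read(b, read, count - read);
        if (n == 0) throw new IOException("Unexpected end of file");
        read += n;
    }
    fs.Position = offset + data.Length; fs.Write(b, 0, count);
    end = offset;
}
fs.Position = target; fs.Write(data, 0, data.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();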
To get better file I/O performance, experiment with power-of-two buffer sizes like the one in the code above. For the creation of the file, the 262144-byte (256 KB) buffer does not help at all. For the insertion, the same buffer size makes the difference, as you can see from the Stopwatch results if you run the code. A rough test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Why don't you put a pointer to the end of the file (literally, four bytes above the current size of the file) and then, at the end of the file, write the length of the inserted data and finally the data you want to insert itself? For example, if you have a string in the middle of the file and you want to insert a few characters in the middle of the string, you can write a pointer to the end of the file over some four characters in the string, and then write those four characters to the end together with the characters you originally wanted to insert. It's all about ordering data. Of course, you can do this only if you are writing the whole file by yourself, I mean you are not using other codecs.
