Read a zip entry and save it in a string - c#

Ack. I am trying to open a specific entry in a zip file archive and store the contents in a string, instead of saving it to a file. I cannot use disk space for this per the client.
Here's what I have:
string scontents = "";
byte[] abbuffer = null;
MemoryStream oms = new MemoryStream();
try
{
//get the file contents
ozipentry.Open().CopyTo(oms);
int length = (int)oms.Length; // get file length
abbuffer = new byte[length]; // create buffer
int icount; // actual number of bytes read
int isum = 0; // total number of bytes read
// read until Read method returns 0 (end of the stream has been reached)
while ((icount = oms.Read(abbuffer, isum, length - isum)) > 0)
{
isum += icount; // sum is a buffer offset for next reading
}
scontents = BytesToString(abbuffer); <----abbuffer is filled with Ascii 0
}
finally
{
oms.Close();
}
The variable abbuffer is supposed to hold that contents of the stream, but all it holds is a bunch of ascii zeros, which I guess means it didn't read (or copy) the stream! But I do not get any error messages or anything. Can someone tell me how to get this working?
I've looked everywhere on stack and on the web, and no where does anyone answer this question specifically for ASP.NET 4.5 ZipArchive library. I cannot use any other library, so if you offer an answer in that, while it would be educational, won't help me at all in this instance. Thanks so much for any help!
One more thing. 'ozipentry' is of type ZipArchiveEntry and is an element in a ZipArchive Entries array. (ie ozipentry = oziparchive.Entries[i])
Oops. One more thing! The function 'BytesToString' is not included, because it is irrelevant. Before the function is called, the abbuffer array is already filled with 0's

Ok. Sorry for being so dense. I realized I was overthinking this. I changed to function to do this:
osr = new StreamReader(ozipentry.Open(), Encoding.Default);
scontents = osr.ReadToEnd();
And it worked fine! Didn't even have to worry about Encoding...

Related

How can i read the end of a file in c#?

i searched in stackoverflow and got one way but this method only let me to write word by word in the console. My goal is to get the end of my file but get the complete result not char by char.
This code only show me char by char the end of my file:
using (var reader = new StreamReader("file.dll")
{
if (reader.BaseStream.Length > 1024)
{
reader.BaseStream.Seek(-1024, SeekOrigin.End);
}
string line;
while ((line = reader.ReadLine()) != null)
{
Console.WriteLine(line);
Console.ReadKey();
}
}
I was trying to get something like this, it's c++ but i was trying to get the same result in c#.
QFile *archivo;
archivo = new QFile();
archivo->setFileName("file.dll");
archivo->open(QFile::ReadOnly);
archivo->seek(archivo->size() - 1024);
trama = archivo->read(1024);
It's possible to get the complete result of the end of my file in c#?
If the file is line-delimited text file, you can use ReadAllLines.
string[] lines = System.IO.File.ReadAllLines("file.txt");
If it's a binary file, you can use ReadAllBytes. Shocker, I know.
byte[] data = System.IO.File.ReadAllBytes("file.dll");
if you want to be able to seek first (e.g. if you want only the last 1024 bytes of the file) you can use the stream's Read method. Again, crazy.
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var chars = new char[1024];
reader.Read(chars, 0, 1024);
And before you ask, you can convert the characters to a string by passing them to the constructor:
char[] chars = new char[1024];
string s = new string(chars);
Console.WriteLine(s);
Not sure what it'll look like, since you're reading characters from a binary file, but good luck. My guess is you should be reading bytes instead though:
reader.BaseStream.Seek(-1024, SeekOrigin.End);
var bytes = new byte[1024];
reader.BaseStream.Read(bytes, 0, 1024);
(Notice you don't even need the StreamReader, since the FileStream (your base stream) exposes the Read method you need).

Trouble with replacing bytes in binary files

There are two programs involved. The first one has a string like "##########". The second one is a config tool to find "##########" and replace this string with user input from a textbox.
Now I have trouble in the replacing part. Here is the code.
//This is code from first program:
string myIP = "####################";
string myPort = "%%%%%%%%";
int port = Int32.Parse(myIP );
tcpClient.Connect(myIP , port);
//This is code from second program:
//Get bytes from textbox:
byte[] byte_IP = new byte[60];
byte_IP = System.Text.Encoding.ASCII.GetBytes(textBox1_ip.Text);
//Get all bytes in the first program:
byte[] buffer = File.ReadAllBytes(#"before.exe");
//Replace string with textbox input, 0x1c00 is where the "#" starts:
Buffer.BlockCopy( byte_IP, 0, buffer, 0x1c00, byte_IP.Length);
//Build a new exe:
File.WriteAllBytes(#"after.exe", buffer);
However, I get "127.0.0.1#.#.#.#.#.#." in the new exe. But I need "1.2.7...0...0...1........." to process as a valid host.
First I'd like to reiterate what has already been said in the comments: there are simpler ways to handle this stuff. That's what config files are for, or registry settings.
But if you absolutely must...
First, you have to match the encoding that the framework expects. Is the string stored as UTF8? UTF16? ASCII? Writing data in the wrong encoding will turn it into pure garbage, almost every time. Generally for strings in code like you're looking for you'll be wanting to use Encoding.UNICODE.
Next, you need some way to deal with strings of different lengths. The buffer you define needs to be large enough to contain the widest string you want to be able to set - 15 bytes for dotted numeric IPv4 addresses - but you have to allow for the minimum of 7 characters. Padding the remainder and removing that padding before using the value will probably suffice.
The minimum program I could think to use for testing this was:
class Program
{
static void Main(string[] args)
{
var addr = "###.###.###.###".TrimEnd();
Console.WriteLine("Address: [{0}]", addr);
}
}
Now in your patcher you will need to locate the starting position in the file and overwrite the bytes with the new string's bytes. Here's a Patch method, which calls a FindString method that you will have to write yourself:
static void PatchFile(string filename, string searchString, string replaceString)
{
// Open the file
using (var file = File.Open(filename, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
{
// Locate the search string in the file (needs to be implemented)
long pos = FindString(file, searchString);
if (pos < 0)
return;
// Pad and limit replacement string, then convert to bytes
string rep = string.Format("{0,-" + searchString.Length + "}", replaceString).Substring(0, searchString.Length);
byte[] replaceBytes = Encoding.Unicode.GetBytes(rep);
// Overwrite the located bytes with the replacement
file.Position = pos;
file.Write(replaceBytes, 0, replaceBytes.Length);
}
}
Hopefully it makes sense.

How to find the no. of bytes of a text file without reading it?

I have c# code reading a text file and printing it out which looks like this:
StreamReader sr = new StreamReader(File.OpenRead(ofd.FileName));
byte[] buffer = new byte[100]; //is there a way to simply specify the length of this to be the number of bytes in the file?
sr.BaseStream.Read(buffer, 0, buffer.Length);
foreach (byte b in buffer)
{
label1.Text += b.ToString("x") + " ";
}
Is there anyway I can know how many bytes my file has?
I want to know the length of the byte[] buffer in advance so that in the Read function, I can simply pass in buffer.length as the third argument.
System.IO.FileInfo fi = new System.IO.FileInfo("myfile.exe");
long size = fi.Length;
In order to find the file size, the system has to read from the disk. So, the above example performs data read from disk but does not read file content.
It's not clear why you're using StreamReader at all if you're going to read binary data. Just use FileStream instead. You can use the Length property to find the length of the file.
Note, however, that that still doesn't mean you should just call Read and *assume` that a single call will read all the data. You should loop until you've read everything:
byte[] data;
using (var stream = File.OpenRead(...))
{
data = new byte[(int) stream.Length];
int offset = 0;
while (offset < data.Length)
{
int chunk = stream.Read(data, offset, data.Length - offset);
if (chunk == 0)
{
// Or handle this some other way
throw new IOException("File has shrunk while reading");
}
offset += chunk;
}
}
Note that this is assuming you do want to read the data. If you don't want to even open the stream, use FileInfo.Length as other answers have shown. Note that both FileStream.Length and FileInfo.Length have a type of long, whereas arrays are limited to 32-bit lengths. What do you want to happen with a file which is bigger than 2 gigs?
You can use the FileInfo.Length method.
Take a look at the example given in the link.
I would imagine something in here should help.
I doubt you can preemptively guess the size of a file without reading it...
How do I use File.ReadAllBytes In chunks
If it is a large file; then reading in chunks should might help

How to insert characters to a file using C#

I have a huge file, where I have to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file again.
Filesystems do not support "inserting" data in the middle of a file. If you really have a need for a file that can be written to in a sorted kind of way, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
I would look at the FileStream class to do random I/O in C#.
You will probably need to rewrite the file from the point you insert the changes to the end. You might be best always writing to the end of the file and use tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
There is no way to insert characters in to a file without rewriting them. With C# it can be done with any Stream classes. If the files are huge, I would recommend you to use GNU Core Utils inside C# code. They are the fastest. I used to handle very large text files with the core utils ( of sizes 4GB, 8GB or more etc ). Commands like head, tail, split, csplit, cat, shuf, shred, uniq really help a lot in text manipulation.
For example if you need to put some chars in a 2GB file, you can use split -b BYTECOUNT, put the ouptut in to a file, append the new text to it, and get the rest of the content and add to it. This should supposedly be faster than any other way.
Hope it works. Give it a try.
You can use random access to write to specific locations of a file, but you won't be able to do it in text format, you'll have to work with bytes directly.
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
using (BinaryWriter bw = new BinaryWriter (File.Open (strFile, FileMode.Open)))
{
string strNewData = "this is some new data";
byte[] byteNewData = new byte[strNewData.Length];
// copy contents of string to byte array
for (var i = 0; i < strNewData.Length; i++)
{
byteNewData[i] = Convert.ToByte (strNewData[i]);
}
// write new data to file
bw.Seek (15, SeekOrigin.Begin); // seek to position 15
bw.Write (byteNewData, 0, byteNewData.Length);
}
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
// this.Stream is the stream in which you insert data
{
long position = this.Stream.Position;
long length = this.Stream.Length;
MemoryStream ms = new MemoryStream();
this.Stream.Position = 0;
DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
ms.Write(data, 0, data.Length);
this.Stream.Position = position;
DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
this.Stream = ms;
}
#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion
DIUtils.cs
public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
long totalsize = input.Length;
long byteswritten = 0;
const int size = 32768;
byte[] buffer = new byte[size];
int read;
int readlen = length < size ? (int)length : size;
while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
{
output.Write(buffer, 0, read);
byteswritten += read;
length -= read;
readlen = length < size ? (int)length : size;
if (callback != null)
callback(byteswritten, totalsize);
}
}
Depending on the scope of your project, you may want to decide to insert each line of text with your file in a table datastructure. Sort of like a database table, that way you can insert to a specific location at any given moment, and not have to read-in, modify, and output the entire text file each time. This is given the fact that your data is "huge" as you put it. You would still recreate the file, but at least you create a scalable solution in this manner.
It may be "possible" depending on how the filesystem stores files to quickly insert (ie, add additional) bytes in the middle. If it is remotely possible it may only be feasible to do so a full block at a time, and only by either doing low level modification of the filesystem itself or by using a filesystem specific interface.
Filesystems are not generally designed for this operation. If you need to quickly do inserts you really need a more general database.
Depending on your application a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty.
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case there is no function to directly support "insert to file". But the following code can do it accurately.
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";
// create
var fs = new FileStream(#"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();
// insert
fs = new FileStream(#"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] b = new byte[262144];
long target = 10, offset = fs.Length - b.Length;
while (offset != 0)
{
if (offset < 0)
{
offset = b.Length - target;
b = new byte[offset];
}
fs.Position = offset; fs.Read(b, 0, b.Length);
fs.Position = offset + target; fs.Write(b, 0, b.Length);
offset -= b.Length;
}
fs.Position = target; fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
To gain better performance for file IO, play with "magic two powered numbers" like in the code above. The creation of the file uses a buffer of 262144 bytes (256KB) that does not help at all. The same buffer for the insertion does the "performance job" as you can see by the StopWatch results if you run the code. A draft test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Why don't you put a pointer to the end of the file (literally, four bytes above the current size of the file) and then, on the end of file write the length of inserted data, and finally the data you want to insert itself. For example, if you have a string in the middle of the file, and you want to insert few characters in the middle of the string, you can write a pointer to the end of file over some four characters in the string, and then write that four characters to the end together with the characters you firstly wanted to insert. It's all about ordering data. Of course, you can do this only if you are writing the whole file by yourself, I mean you are not using other codecs.

Bytes consumed by StreamReader

Is there a way to know how many bytes of a stream have been used by StreamReader?
I have a project where we need to read a file that has a text header followed by the start of the binary data. My initial attempt to read this file was something like this:
private int _dataOffset;
void ReadHeader(string path)
{
using (FileStream stream = File.OpenRead(path))
{
StreamReader textReader = new StreamReader(stream);
do
{
string line = textReader.ReadLine();
handleHeaderLine(line);
} while(line != "DATA") // Yes, they used "DATA" to mark the end of the header
_dataOffset = stream.Position;
}
}
private byte[] ReadDataFrame(string path, int frameNum)
{
using (FileStream stream = File.OpenRead(path))
{
stream.Seek(_dataOffset + frameNum * cbFrame, SeekOrigin.Begin);
byte[] data = new byte[cbFrame];
stream.Read(data, 0, cbFrame);
return data;
}
return null;
}
The problem is that when I set _dataOffset to stream.Position, I get the position that the StreamReader has read to, not the end of the header. As soon as I thought about it this made sense, but I still need to be able to know where the end of the header is and I'm not sure if there's a way to do it and still take advantage of StreamReader.
You can find out how many bytes the StreamReader has actually returned (as opposed to read from the stream) in a number of ways, none of them too straightforward I'm afraid.
Get the result of textReader.CurrentEncoding.GetByteCount(totalLengthOfAllTextRead) and then seek to this position in the stream.
Use some reflection hackery to retrieve the value of the private variable of the StreamReader object that corresponds to the current byte position within the internal buffer (different from that with the stream - usually behind, but no more than equal to of course). Judging by .NET Reflector, the this variable seems to be named bytePos.
Don't bother using a StreamReader at all but instead implement your custom ReadLine function built on top of the Stream or BinaryReader even (BinaryReader is guaranteed never to read further ahead than what you request). This custom function must read from the stream char by char, so you'd actually have to use the low-level Decoder object (unless the encoding is ASCII/ANSI, in which case things are a bit simpler due to single-byte encoding).
Option 1 is going to be the least efficient I would imagine (since you're effectively re-encoding text you just decoded), and option 3 the hardest to implement, though perhaps the most elegant. I'd probably recommend against using the ugly reflection hack (option 2), even though it's looks tempting, being the most direct solution and only taking a couple of lines. (To be quite honest, the StreamReader class really ought to expose this variable via a public property, but alas it does not.) So in the end, it's up to you, but either method 1 or 3 should do the job nicely enough...
Hope that helps.
So the data is utf8 (the default encoding for StreamReader). This is a multibyte encoding, so IndexOf would be inadvisable. You could:
Encoding.UTF8.GetByteCount(string)
on your data so far, adding 1 or 2 bytes for the missing line ending.
If you're needing to count bytes, I'd go with the BinaryReader. You can take the results and cast them about as needed, but I find its idea of its current position to be more reliable (in that since it reads in binary, its immune to character-set problems).
So your last line contains 'DATA' + an unknown amount of data bytes. You could extract the position by using IndexOf() with your last read line. Then readjust the stream.Position.
But I am not sure if you should use ReadLine() at all in this case. Maybe it would be better to read byte by byte until you reach the 'DATA' mark.
The line breaks are easily identifiable without needing to decode the stream first (except for some encodings rarely used for text files like EBCDIC, UTF-16, UTF-32), so you can just read each line as bytes and then decode the entire line:
using (FileStream stream = File.OpenRead(path)) {
List<byte> buffer = new List<byte>();
bool hasCr = false;
bool done = false;
while (!done) {
int b = stream.ReadByte();
if (b == -1) throw new IOException("End of file reached in header.");
if (b == 13) {
hasCr = true;
} else if (b == 10 && hasCr) {
string line = Encoding.UTF8.GetString(buffer.ToArray(), 0, buffer.Count);
if (line == "DATA") {
done = true;
} else {
HandleHeaderLine(line);
}
buffer.Clear();
hasCr = false;
} else {
if (hasCr) buffer.Add(13);
hasCr = false;
buffer.Add((byte)b);
}
}
_dataOffset = stream.Position;
}
Instead of closing the stream and open it again, you could of course just keep on reading the data.

Categories