I have a question about the safety of a cast from long to int. I fear that the method I wrote might fail at this cast. Could you please take a look at the code below and tell me whether there is a way to avoid a possible failure?
Thank you in advance.
public static string ReadDecrypted(string fileFullPath)
{
    string result = string.Empty;
    using (FileStream fs = new FileStream(fileFullPath, FileMode.Open, FileAccess.Read))
    {
        int fsLength = (int)fs.Length;
        byte[] decrypted;
        byte[] read = new byte[fsLength];
        if (fs.CanRead)
        {
            fs.Read(read, 0, fsLength);
            decrypted = ProtectedData.Unprotect(read, CreateEntropy(), DataProtectionScope.CurrentUser);
            result = Utils.AppDefaultEncoding.GetString(decrypted, 0, decrypted.Length);
        }
    }
    return result;
}
The short answer is: yes, this way you will have problems with any file of length >= 2 GB!
If you don't expect any files that big, then you can insert this directly at the start of the using block:
if (((int)fs.Length) != fs.Length) throw new Exception("too big");
Otherwise you should NOT cast to int, but change byte[] read = new byte[fsLength];
to byte[] read = new byte[fs.Length]; and use a loop to read the file content in "chunks" of at most 2 GB per chunk.
Another alternative (available since .NET 4) is to use MemoryMappedFile (see http://msdn.microsoft.com/en-us/library/dd997372.aspx) - this way you don't need to call Read at all :-)
Well, int is 32-bit and long is 64-bit, so there's always the possibility of losing some data with the cast if you're opening up 2GB files; on the other hand, that allocation of a byte array of fsLength would seem to indicate you're not expecting files that big. Put a check in to make sure that fs.Length isn't greater than 2,147,483,647, and you should be fine.
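One way to express that check is a checked cast, which throws OverflowException instead of silently truncating (a minimal sketch; "data.bin" is a placeholder file name):

```csharp
using System;
using System.IO;

class SafeLength
{
    static void Main()
    {
        using (FileStream fs = new FileStream("data.bin", FileMode.Open, FileAccess.Read))
        {
            // checked() turns silent truncation into an OverflowException
            // for files longer than int.MaxValue (2,147,483,647) bytes.
            int length = checked((int)fs.Length);
            byte[] buffer = new byte[length];
        }
    }
}
```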
Related
I have the following situation in C#:
ZipFile zip1 = ZipFile.Read("f1.zip");
ZipFile zip2 = ZipFile.Read("f2.zip");
MemoryStream ms1 = new MemoryStream();
MemoryStream ms2 = new MemoryStream();
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry2 = zip2["f2.dll"];
zipentry1.Extract(ms1);
zipentry2.Extract(ms2);
byte[] b1 = new byte[ms1.Length];
byte[] b2 = new byte[ms2.Length];
ms1.Seek(0, SeekOrigin.Begin);
ms2.Seek(0, SeekOrigin.Begin);
What I have done here is open two zip files, f1.zip and f2.zip. Then I extract one file from each (f1.dll and f2.dll from f1.zip and f2.zip respectively) into the MemoryStream objects. I now want to compare the files and find out whether they are the same. I had 2 ways in mind:
1) Read the memory streams byte by byte and compare them.
For this I would use
ms1.BeginRead(b1, 0, (int) ms1.Length, null, null);
ms2.BeginRead(b2, 0, (int) ms2.Length, null, null);
and then run a for loop and compare each byte in b1 and b2.
2) Get the string values for both the memory streams and then do a string compare. For this I would use
string str1 = Encoding.UTF8.GetString(ms1.GetBuffer(), 0, (int)ms1.Length);
string str2 = Encoding.UTF8.GetString(ms2.GetBuffer(), 0, (int)ms2.Length);
and then do a simple string compare.
Now, I know comparing byte by byte will always give me a correct result. But the problem is that it will take a lot of time, as I have to do this for thousands of files. That is why I am considering the string-compare method, which looks like it would tell me whether the files are equal much more quickly. But I am not sure string compare will give me the correct result, since the files are DLLs, media files, etc., and will certainly contain special characters.
Can anyone tell me whether the string-compare method will work correctly?
Thanks in advance.
P.S.: I am using the DotNetZip library.
The baseline for this question is the native way to compare arrays: Enumerable.SequenceEqual. You should use that unless you have good reason to do otherwise.
If you care about speed, you could attempt to p/invoke to memcmp in msvcrt.dll and compare the byte arrays that way. I find it hard to imagine that could be beaten. Obviously you'd do a comparison of the lengths first and only call memcmp if the two byte arrays had the same length.
The p/invoke looks like this:
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] lhs, byte[] rhs, UIntPtr count);
But you should only contemplate this if you really do care about speed, and the pure managed alternatives are too slow for you. So, do some timings to make sure you are not optimising prematurely. Well, even to make sure that you are optimising at all.
I'd be very surprised if converting to string was fast; I'd expect it to be slow. Worse, I'd expect it to give wrong answers, because there's no reason for your byte arrays to be valid UTF-8: invalid sequences are silently replaced during decoding, so two different byte arrays can decode to the same string. Just forget you ever had that idea!
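A minimal sketch of wrapping that p/invoke behind a length check (the wrapper name is illustrative; note msvcrt.dll makes this Windows-only):

```csharp
using System;
using System.Runtime.InteropServices;

static class FastCompare
{
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(byte[] lhs, byte[] rhs, UIntPtr count);

    // Compare lengths first; only call memcmp when they match.
    public static bool BytesEqual(byte[] a, byte[] b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null || a.Length != b.Length) return false;
        return memcmp(a, b, (UIntPtr)a.Length) == 0;
    }
}
```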
Compare ZipEntry.Crc and ZipEntry.UncompressedSize of the two files, and only if they match, uncompress and do the byte comparison. If the two files are the same, their CRC and size will be the same too. This strategy will save you a ton of CPU cycles.
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry2 = zip2["f2.dll"];
if (zipentry1.Crc == zipentry2.Crc && zipentry1.UncompressedSize == zipentry2.UncompressedSize)
{
    // uncompress
    zipentry1.Extract(ms1);
    zipentry2.Extract(ms2);
    byte[] b1 = new byte[ms1.Length];
    byte[] b2 = new byte[ms2.Length];
    ms1.Seek(0, SeekOrigin.Begin);
    ms2.Seek(0, SeekOrigin.Begin);
    // use synchronous Read: BeginRead without EndRead may not have completed
    ms1.Read(b1, 0, b1.Length);
    ms2.Read(b2, 0, b2.Length);
    // perform a byte comparison
    if (Enumerable.SequenceEqual(b1, b2)) // or a simple for loop
    {
        // files are the same
    }
    else
    {
        // files are different
    }
}
else
{
    // files are different
}
I have some tables in SQL Server with VarBinary(MAX) columns, and I want to upload files to them. I need the Dbml to make the field byte[], but instead I get System.Data.Linq.Binary.
Why is that, and how do I make the default byte[]? Thanks.
I read the file in my MVC 3 controller action like this (resourceFile is an HttpPostedFileBase and newFile has the byte[]):
newFile.FileContent = new byte[resourceFile.ContentLength];
resourceFile.InputStream.Read(newFile.FileContent, 0, resourceFile.ContentLength);
The System.Data.Linq.Binary type is a wrapper for a byte array that makes the wrapped byte array immutable. See the MSDN page.
Sample code to illustrate immutability:
byte[] goesIn = new byte[] { 0xff };
System.Data.Linq.Binary theBinary = goesIn; // implicit conversion from byte[]
byte[] comesOut1 = theBinary.ToArray();
comesOut1[0] = 0xfe;                        // mutate the copy we got back
byte[] comesOut2 = theBinary.ToArray();     // still { 0xff }: the wrapped array is untouched
theBinary = comesOut1;                      // assign a whole new array
byte[] comesOut3 = theBinary.ToArray();     // now { 0xfe }
The immutability can be seen after changing the value of the first byte in comesOut1. The byte[] wrapped in theBinary is not changed. Only after you assign the whole byte[] to theBinary does it change.
Anyway for your purpose you can use the Binary field. To assign a new value to it do as Dennis wrote in his answer. To get the byte array out of it, use the .ToArray() method.
You can use Binary to store byte[] as well, because Binary has implicit conversion from byte[]:
myEntity.MyBinaryProperty = File.ReadAllBytes("foo.txt");
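A small sketch of the round trip in both directions (the entity and property names are illustrative; this assumes a reference to the System.Data.Linq assembly):

```csharp
using System.Data.Linq;
using System.IO;

class MyEntity
{
    public Binary MyBinaryProperty { get; set; }
}

class Demo
{
    static void Main()
    {
        var myEntity = new MyEntity();
        // byte[] -> Binary via the implicit conversion
        myEntity.MyBinaryProperty = File.ReadAllBytes("foo.txt");
        // Binary -> byte[] via ToArray() (returns a copy)
        byte[] roundTripped = myEntity.MyBinaryProperty.ToArray();
    }
}
```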
I use the following code to read big-endian data using BinaryReader, but I'm not sure it is an efficient way of doing it. Is there a better solution?
Here is my code:
// some code to initialize the stream value
// set the length value to the Int32 size (4 bytes)
BinaryReader reader = new BinaryReader(stream);
byte[] bytes = reader.ReadBytes(length);
Array.Reverse(bytes);
int result = System.BitConverter.ToInt32(bytes, 0);
BitConverter.ToInt32 isn't very fast in the first place. I'd simply use
public static int ToInt32BigEndian(byte[] buf, int i)
{
return (buf[i]<<24) | (buf[i+1]<<16) | (buf[i+2]<<8) | buf[i+3];
}
You could also consider reading more than 4 bytes at a time.
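Combining both suggestions, a sketch that reads several big-endian Int32 values with a single ReadBytes call (the helper names are illustrative):

```csharp
using System.IO;

static class BigEndian
{
    public static int ToInt32BigEndian(byte[] buf, int i)
    {
        return (buf[i] << 24) | (buf[i + 1] << 16) | (buf[i + 2] << 8) | buf[i + 3];
    }

    // Read n big-endian Int32 values in one I/O call, then decode in memory.
    public static int[] ReadInt32s(BinaryReader reader, int n)
    {
        byte[] buf = reader.ReadBytes(n * 4);
        int[] result = new int[n];
        for (int i = 0; i < n; i++)
            result[i] = ToInt32BigEndian(buf, i * 4);
        return result;
    }
}
```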
As of 2019 (in fact, since .NET Core 2.1), there is now BinaryPrimitives.ReadInt32BigEndian in the System.Buffers.Binary namespace:
byte[] buffer = ...;
int result = BinaryPrimitives.ReadInt32BigEndian(buffer.AsSpan());
You could use IPAddress.NetworkToHostOrder, but I have no idea if it's actually more efficient. You'd have to profile it.
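IPAddress.NetworkToHostOrder operates on integers rather than byte arrays, so it is combined with BitConverter. A sketch (on a little-endian host this swaps the bytes; on a big-endian host it is a no-op):

```csharp
using System;
using System.Net;

class Demo
{
    static void Main()
    {
        byte[] bytes = { 0x00, 0x00, 0x00, 0x2A }; // big-endian (network order) 42
        // BitConverter assumes host byte order; NetworkToHostOrder corrects it.
        int value = IPAddress.NetworkToHostOrder(BitConverter.ToInt32(bytes, 0));
        Console.WriteLine(value); // 42 on little-endian hosts
    }
}
```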
Assume that I have a MemoryStream and a function that operates on bytes.
Current code is something like this:
void caller()
{
    MemoryStream ms = // not important
    func(ms.GetBuffer(), 0, (int)ms.Length);
}

void func(byte[] buffer, int offset, int length)
{
    // not important
}
I cannot change func, but I would like to minimize the possibility of the stream data being changed from within func.
How could / should I rewrite the code to be sure that the stream data won't be changed?
Or can't this be done?
EDIT:
I am sorry, I didn't mention that I would like to avoid making copies of the data.
Call .ToArray.
func(ms.GetBuffer().ToArray(), 0, (int)ms.Length);
From MSDN (emphasis mine):
"Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory."
Ideally you would change func to take an IEnumerable<byte>. Once a method has the array, you're trusting they won't modify the data if you don't want them to. If the contract was to provide IEnumerable<byte>, the implementer would have to decide if they need a copy to edit or not.
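If the signature could change, a copy-free read-only view could be sketched like this (the class and method names are illustrative; GetBuffer requires the stream's buffer to be exposable):

```csharp
using System.Collections.Generic;
using System.IO;

static class ReadOnlyView
{
    // Lazily yields the live buffer's bytes without copying; the caller
    // receives byte values only and cannot write back through the sequence.
    public static IEnumerable<byte> Bytes(MemoryStream ms)
    {
        byte[] buf = ms.GetBuffer();
        int length = (int)ms.Length;
        for (int i = 0; i < length; i++)
            yield return buf[i];
    }
}
```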
If you can't make a copy (ToArray, as suggested in other answers) and can't change the signature of the func function, the only thing left is to try to validate that the function did not change the data.
You could compute some sort of hash before/after the call and check that it is the same. That will not guarantee that func did not change the underlying data (due to hash collisions), but it will at least give you a good chance of knowing if it happened. May be useful for non-production code...
The real solution is to either provide a copy of the data to untrusted code OR pass some wrapper interface/object that does not allow data changes (which requires signature changes/a rewrite for func).
Copy the data out of the stream by using ms.ToArray(). Obviously, there'll be a performance hit.
You cannot pass only a 'slice' of an array to a method. Either you pass a copy of the array to the method and copy the result back:
byte[] slice = new byte[length];
Buffer.BlockCopy(bytes, offset, slice, 0, length);
func(slice, 0, length);
Buffer.BlockCopy(slice, 0, bytes, offset, length);
or, if you can change the method, you pass some kind of proxy object that wraps the array and checks for each access if it's within the allowed range:
class ArrayView<T>
{
    private T[] array;
    private int offset;
    private int length;

    public ArrayView(T[] array, int offset, int length)
    {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    public T this[int index]
    {
        get
        {
            if (index < offset || index >= offset + length)
                throw new ArgumentOutOfRangeException("index");
            return array[index];
        }
        set
        {
            if (index < offset || index >= offset + length)
                throw new ArgumentOutOfRangeException("index");
            array[index] = value;
        }
    }
}
Are you trying to make sure that func() is never actually able to change the memory stream, or is it enough if your code can throw an exception if something is changed? Sounds like you want to do something like:
void caller()
{
    MemoryStream ms = // not important
    var checksum = CalculateMyChecksum(ms);
    func(ms.GetBuffer(), 0, (int)ms.Length);
    if (checksum != CalculateMyChecksum(ms))
    {
        throw new Exception("Hey! Someone has been fiddling with my memory!");
    }
}
I would not feel comfortable recommending this for anything important / critical though. Could you give some more information? Maybe there is a better solution to your problem, and a way to avoid this issue completely.
For certain reasons, I have to create a 1024 kb .txt file.
Below is my current code:
int size = 1024000; // 1024 kb..
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
    bit = 0;
}
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
    tobewritten += bit.ToString();
}
// newPath is the local directory where I store the created file
using (System.IO.StreamWriter sw = File.CreateText(newPath))
{
    sw.WriteLine(tobewritten);
}
I have to wait at least 30 minutes for this piece of code to finish, which I consider far too long.
Now, I would like to ask for advice on how to achieve my objective effectively. Are there alternatives for this task? Am I writing bad code? Any help is appreciated.
There are several misunderstandings in the code you provided:
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
bit = 0;
}
You seem to think that you are initializing each byte in your array bytearray with zero. Instead you just set the loop variable bit (unfortunate naming) to zero, size times. Actually, this code wouldn't even compile, since you cannot assign to the foreach iteration variable.
Also, you don't need the initialization here in the first place: byte array elements are automatically initialized to 0.
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
tobewritten += bit.ToString();
}
You want to append the string representation of each byte in your array to the string variable tobewritten. Since strings are immutable, you create a new string for each element, which has to be garbage collected along with the string you created for bit. This is relatively expensive, especially when you create 2,048,000 of them - use a StringBuilder instead.
Lastly, none of that is needed anyway - it seems you just want to write a bunch of "0" characters to a text file. If you are not worried about creating a single large string of zeros (whether this makes sense depends on the value of size), you can just create the string directly and do this in one go - or alternatively write a smaller string directly to the stream a number of times.
using (var file = File.CreateText(newPath))
{
    file.WriteLine(new string('0', size));
}
Replace the string with a pre-sized StringBuilder to avoid the unnecessary allocations.
Or, better yet, write each piece directly to the StreamWriter instead of pointlessly building a megabyte-scale in-memory string first.
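The chunked-write suggestion could be sketched like this (the 4 KB chunk size and the output file name are illustrative choices):

```csharp
using System;
using System.IO;

class ChunkedWriter
{
    static void Main()
    {
        const int size = 1024000; // 1024 kb, as in the question
        const int chunk = 4096;
        char[] piece = new string('0', chunk).ToCharArray();

        using (StreamWriter sw = File.CreateText("output.txt"))
        {
            // write the same small chunk repeatedly instead of building
            // one large in-memory string first
            for (int written = 0; written < size; written += chunk)
            {
                int count = Math.Min(chunk, size - written);
                sw.Write(piece, 0, count);
            }
            sw.WriteLine();
        }
    }
}
```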