I have a method that is expected to return a string, but the string can be very large, possibly gigabytes. Currently it runs into an OutOfMemoryException.
I was thinking I would write to a file (random filename), then read it, delete the file, and send back the response. I tried MemoryStream and StreamWriter, but that too ran into an OutOfMemoryException.
I did not want to initialize the StringBuilder with a capacity since the size was not known, and I believe it needs a contiguous block of memory.
What is the best course of action to solve this problem?
public string Result()
{
    StringBuilder response = new StringBuilder();
    for (int i = 1; i <= int.MaxValue; i++)
    {
        response.AppendFormat("keep adding some text {0}", i);
        response.Append(Environment.NewLine);
    }
    return response.ToString();
}
Even if you could solve the memory problem on the StringBuilder, the resulting string by calling response.ToString() would be larger than the maximum length of a string in .NET, see: https://stackoverflow.com/a/140749/3997704
So you will have to make your function store its result elsewhere, like in a file.
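For illustration, a minimal sketch of that approach (the temp-file naming and the return-the-path convention are assumptions on my part, not part of the original question) could look like this:
// Sketch only: stream the generated text straight to a temporary file
// instead of accumulating a multi-gigabyte string in memory.
public string WriteResultToFile()
{
    string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    using (StreamWriter writer = new StreamWriter(path))
    {
        for (int i = 1; i <= int.MaxValue; i++)
        {
            writer.Write("keep adding some text {0}", i);
            writer.WriteLine();
        }
    }
    return path; // hand back the file path (or open a FileStream for the caller) rather than the contents
}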
Ack. I am trying to open a specific entry in a zip file archive and store the contents in a string, instead of saving it to a file. I cannot use disk space for this per the client.
Here's what I have:
string scontents = "";
byte[] abbuffer = null;
MemoryStream oms = new MemoryStream();
try
{
    //get the file contents
    ozipentry.Open().CopyTo(oms);

    int length = (int)oms.Length; // get file length
    abbuffer = new byte[length];  // create buffer
    int icount;                   // actual number of bytes read
    int isum = 0;                 // total number of bytes read

    // read until Read method returns 0 (end of the stream has been reached)
    while ((icount = oms.Read(abbuffer, isum, length - isum)) > 0)
    {
        isum += icount; // sum is a buffer offset for next reading
    }

    scontents = BytesToString(abbuffer); // <---- abbuffer is filled with ASCII 0
}
finally
{
    oms.Close();
}
The variable abbuffer is supposed to hold the contents of the stream, but all it holds is a bunch of ASCII zeros, which I guess means it didn't read (or copy) the stream! But I do not get any error messages or anything. Can someone tell me how to get this working?
I've looked everywhere on Stack Overflow and on the web, and nowhere does anyone answer this question specifically for the ASP.NET 4.5 ZipArchive library. I cannot use any other library, so an answer based on a different one, while educational, won't help me in this instance. Thanks so much for any help!
One more thing: 'ozipentry' is of type ZipArchiveEntry and is an element of a ZipArchive Entries collection (i.e. ozipentry = oziparchive.Entries[i]).
Oops, one more thing! The function 'BytesToString' is not included because it is irrelevant; before it is even called, the abbuffer array is already filled with zeros.
Ok, sorry for being so dense. I realized I was overthinking this. I changed the function to do this:
osr = new StreamReader(ozipentry.Open(), Encoding.Default);
scontents = osr.ReadToEnd();
And it worked fine! Didn't even have to worry about Encoding...
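For completeness, a likely reason the original MemoryStream version returned all zeros (this is my reading of the code, not something stated in the thread): CopyTo leaves oms positioned at the end of the stream, so every subsequent oms.Read call returns 0 and abbuffer is never filled. Rewinding the stream first would also have worked, as in this sketch, which reuses the variable names from the original snippet:
// Sketch of the MemoryStream variant with the position reset.
ozipentry.Open().CopyTo(oms);
oms.Position = 0;                // rewind: CopyTo left the stream at its end
abbuffer = oms.ToArray();        // MemoryStream.ToArray copies the full contents
scontents = BytesToString(abbuffer);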
I am using Chrome's Native Messaging API to pass the DOM of a page to my host. When I try passing a small string from my extension to my host, everything works, but when I try to pass the entire DOM (which isn't that large...only around 260KB), everything runs much slower and I eventually get a Native host has exited error preventing the host from responding.
My main question: Why does it take so long to pass a 250KB - 350KB message from the extension to the host?
According to the developer's site:
Chrome starts each native messaging host in a separate process and communicates with it using standard input (stdin) and standard output (stdout). The same format is used to send messages in both directions: each message is serialized using JSON, UTF-8 encoded and is preceded with 32-bit message length in native byte order. The maximum size of a single message from the native messaging host is 1 MB, mainly to protect Chrome from misbehaving native applications. The maximum size of the message sent to the native messaging host is 4 GB.
The pages whose DOMs I'm interested in sending to my host are no more than 260KB (occasionally 300KB), well below the imposed 4GB maximum.
popup.js
document.addEventListener('DOMContentLoaded', function() {
    var downloadButton = document.getElementById('download_button');
    downloadButton.addEventListener('click', function() {
        chrome.tabs.query({currentWindow: true, active: true}, function (tabs) {
            chrome.tabs.executeScript(tabs[0].id, {file: "getDOM.js"}, function (data) {
                chrome.runtime.sendNativeMessage('com.google.example', {"text": data[0]}, function (response) {
                    if (chrome.runtime.lastError) {
                        console.log("Error: " + chrome.runtime.lastError.message);
                    } else {
                        console.log("Response: " + response);
                    }
                });
            });
        });
    });
});
host.exe
private static string StandardOutputStreamIn() {
    Stream stdin = Console.OpenStandardInput();
    int length = 0;
    byte[] bytes = new byte[4];
    stdin.Read(bytes, 0, 4);
    length = System.BitConverter.ToInt32(bytes, 0);

    string input = "";
    for (int i = 0; i < length; i++)
        input += (char)stdin.ReadByte();

    return input;
}
Please note, I found the above method from this question.
For the moment, I'm just trying to write the string to a .txt file:
public static void Main(string[] args) {
    string msg = StandardOutputStreamIn();
    System.IO.File.WriteAllText(@"path_to_file.txt", msg);
}
Writing the string to the file takes a long time (~4 seconds, and sometimes up to 10 seconds).
Originally, the amount of text that was actually written varied, but it was never more than the top document declaration and a few IE comment tags; with the fix below, all the text now shows up.
Likewise, the file with barely any text was 649KB even though the actual document should be only 205KB (when I download it). The file is still slightly larger than it should be (216KB when it should be 205KB).
I've tested my getDOM.js function by just downloading the file, and the entire process is almost instantaneous.
I'm not sure why this process is taking such a long time, why the file is so huge, or why barely any of the message is actually being sent.
I'm not sure if this has something to do with deserializing the message in a specific way, if I should create a port instead of using the chrome.runtime.sendNativeMessage(...); method, or if there's something else entirely that I'm missing.
All help is very much appreciated! Thank you!
EDIT
Although my message is correctly sending FROM the extension TO the host, I am now receiving a Native host has exited error before the extension receives the host's message.
This question is essentially asking, "How can I efficiently and quickly read information from the standard input?"
In the above code, the problem is not between the Chrome extension and the host, but rather between the standard input and the method that reads from the standard input stream, namely StandardOutputStreamIn().
The way the method works in the OP's code is that a loop runs through the standard input stream and continuously concatenates the input string with a new string (i.e. the character it reads from the byte stream). This is an expensive operation, and we can get around this by creating a StreamReader object to just grab the entire stream at once (especially since we know the length information contained in the first 4 bytes). So, we fix the speed issue with:
public static string OpenStandardStreamIn()
{
    // Read 4 bytes of length information
    System.IO.Stream stdin = Console.OpenStandardInput();
    byte[] bytes = new byte[4];
    stdin.Read(bytes, 0, 4);
    int length = System.BitConverter.ToInt32(bytes, 0);

    // Read the rest of the message in one go instead of byte-by-byte concatenation
    char[] buffer = new char[length];
    using (System.IO.StreamReader sr = new System.IO.StreamReader(stdin))
    {
        int read = 0;
        while (read < buffer.Length && sr.Peek() >= 0)
        {
            read += sr.Read(buffer, read, buffer.Length - read);
        }
    }

    string input = new string(buffer);
    return input;
}
While this fixes the speed problem, I am unsure why the extension is throwing a Native host has exited error.
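One possibility worth checking (an assumption on my part, not something established in this thread): Chrome expects the host's reply to use the same framing quoted above, a 32-bit length prefix in native byte order followed by UTF-8 encoded JSON on stdout, and a host that exits without writing a well-formed reply can trigger that error. A minimal sketch of writing such a reply from the C# host might be the following; the WriteMessageOut name is illustrative only:
// Sketch only: frame a reply the way the native messaging protocol describes
// (32-bit length prefix in native byte order, then UTF-8 JSON on stdout).
public static void WriteMessageOut(string json)
{
    byte[] payload = System.Text.Encoding.UTF8.GetBytes(json);
    byte[] lengthPrefix = System.BitConverter.GetBytes(payload.Length);

    Stream stdout = Console.OpenStandardOutput();
    stdout.Write(lengthPrefix, 0, lengthPrefix.Length);
    stdout.Write(payload, 0, payload.Length);
    stdout.Flush();
}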
I have created a StringBuilder of length 132370292; when I try to get the string using the ToString() method, it throws an OutOfMemoryException.
StringBuilder SB = new StringBuilder();
for (int i = 0; i <= 5000; i++)
{
    SB.Append("Some Junk Data for testing. My Actual Data is created from different sources by Appending to the String Builder.");
}

try
{
    string str = SB.ToString(); // Throws OOM mostly
    Console.WriteLine("String Created Successfully");
}
catch (OutOfMemoryException ex)
{
    StreamWriter sw = new StreamWriter(@"c:\memo.txt", true);
    sw.Write(SB.ToString()); // Always writes to the file without any error
    Console.WriteLine("Written to File Successfully");
}
What is the reason for the OOM while creating the new string, and why doesn't it throw an OOM while writing to a file?
Machine Details: 64-bit, Windows-7, 2GB RAM, .NET version 2.0
What is the reason for the OOM while creating a new string
Because you're running out of memory - or at least, the CLR can't allocate an object with the size you've requested. It's really that simple. If you want to avoid the errors, don't try to create strings that don't fit into memory. Note that even if you have a lot of memory, and even if you're running a 64-bit CLR, there are limits to the size of objects that can be created.
and why it doesn't throw OOM while writing to a file ?
Because you have more disk space than memory.
I'm pretty sure the code isn't exactly as you're describing though. This line would fail to compile:
sw.write(SB.ToString());
... because the method is Write rather than write. And if you're actually calling SB.ToString(), then that's just as likely to fail as str = SB.ToString().
It seems more likely that you're actually writing to the file in a streaming fashion, e.g.
using (var writer = File.CreateText(...))
{
for (int i = 0; i < 5000; i++)
{
writer.Write(mytext);
}
}
That way you never need to have huge amounts of text in memory - it just writes it to disk as it goes, possibly with some buffering, but not enough to cause memory issues.
Workaround: if you want to write a big string stored in a StringBuilder to a StreamWriter, write it in chunks like this to avoid the OOM exception from SB.ToString(). But if the OOM exception comes from building up the StringBuilder's content itself, you will need to address that separately.
public const int CHUNK_STRING_LENGTH = 30000;

while (SB.Length > CHUNK_STRING_LENGTH)
{
    sw.Write(SB.ToString(0, CHUNK_STRING_LENGTH));
    SB.Remove(0, CHUNK_STRING_LENGTH);
}
sw.Write(SB.ToString()); // the remaining content is now smaller than the chunk size
You have to remember that strings in .NET are stored in memory as UTF-16, i.e. two bytes per character. This means a string of length 132370292 will require about 260MB of RAM.
Furthermore, while executing
string str = SB.ToString();
you are creating a COPY of your string (another 260MB).
Keep in mind that each process has its own RAM limit, so an OutOfMemoryException can be thrown even if you have some free RAM left.
This might help someone: if your logic needs large objects, you can change your application to 64-bit and also change your app.config by adding this section:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
gcAllowVeryLargeObjects: on 64-bit platforms, this enables arrays that are greater than 2 gigabytes (GB) in total size.
string m_filename = @"c:\temp\myfile.xml";
StreamWriter sw = new StreamWriter(m_filename);

while (sb.Length > 0)
{
    int writelen = Math.Min(sb.Length, 30000);
    sw.Write(sb.ToString(0, writelen));
    sb.Remove(0, writelen);
}

sw.Flush();
sw.Close();
sw = null;
OK, I made a C# WinForms app, a File_Splitter_Joiner.
You just give it a file and it splits it into the number of pieces you specify.
The splitting is done in a separate thread.
Everything was working pretty fine until I sliced a 1 GB file!
In the task manager, I saw that my program started consuming 1 gigabyte of memory and my computer almost died!
Not just that: when slicing finished, the memory consumption didn't change!
(Dunno if this means that the garbage collector isn't working, although I'm pretty sure that I lost all references to what was holding the big data chunks, so it should work.)
Here's the Splitter constructor (just to give you a better idea):
public FileSplitter(string FileToSplitPath, string PiecesFolder, int NumberOfPieces, int PieceSize, SplittingMethod Method)
{
FileToSplitInfo = new FileInfo(FileToSplitPath);
this.FileToSplitPath = FileToSplitPath;
this.PiecesFolder = PiecesFolder;
this.NumberOfPieces = NumberOfPieces;
this.PieceSize = PieceSize;
this.Method = Method;
SplitterThread = new Thread(Split);
}
And here is the method that did the actual splitting:
(I'm still a newbie, so what you're about to see 'may not' be done in the best way ever possible, I'm just learning here)
private void Split()
{
    int remainingSize = 0;
    int remainingPos = -1;
    bool isNumberOfPiecesEqualInSize = true;

    int fileSize = (int)FileToSplitInfo.Length; // FileToSplitInfo is a FileInfo object
    if (fileSize % PieceSize != 0)
    {
        remainingSize = fileSize % PieceSize;
        remainingPos = fileSize - remainingSize;
        isNumberOfPiecesEqualInSize = false;
    }

    byte[] fileBytes = new byte[fileSize];
    var _fs = File.Open(FileToSplitPath, FileMode.Open);
    BinaryReader br = new BinaryReader(_fs);
    br.Read(fileBytes, 0, fileSize);
    br.Close();
    _fs.Close();

    for (int i = 0, index = 0; i < NumberOfPieces; i++, index += PieceSize)
    {
        var fs = File.Create(PiecesFolder + "\\" + Path.GetFileName(FileToSplitPath) + "." + (i + 1).ToString());
        var bw = new BinaryWriter(fs);
        bw.Write(fileBytes, index, PieceSize);
        if (i == NumberOfPieces - 1 && !isNumberOfPiecesEqualInSize && Method == SplittingMethod.NumberOfPieces)
            bw.Write(fileBytes, remainingPos, remainingSize);
        bw.Close();
        fs.Close();
    }

    MessageBox.Show("File has been splitted successfully!");
    SplitterThread.Abort();
}
Now, instead of reading the bytes of the file via a BinaryReader, I was first reading them via the File.ReadAllBytes method. It worked fine with small files, but I got a System.OutOfMemoryException when I dealt with our big guy; dunno why I didn't get that exception when I read the bytes via a BinaryReader.
(That was an in-between question.)
So, the main question is: how can I load big files (gigabytes, I mean) in a way that doesn't consume so much memory? I mean, how can I make my program not consume all that memory?
And how can I free the used memory after the splitting is done?
(I actually used
bw.Dispose(); fs.Dispose();
instead of
bw.Close(); fs.Close();
and it was the same.)
I know the question might not make sense, cuz when we load something it goes into our memory, not somewhere else, but the reason I asked it like that is that I used another splitting/joining program (not written by me) just to see if it had the same problem: I loaded the file, the program consumed about 5 MB of RAM, and when I started splitting it used about 10 MB!!
Now that is a VERY big difference .. probably that app was in C/C++ ..
So to sum up, who sucks? Is it my code, and if so how can I fix it? Or is it C# when it comes to performance?
Thank you SOOO much for anything you could hook me up with :)
The following two lines will kill you:
int fileSize = (int)FileToSplitInfo.Length; // a FileInfo object
...
byte[] fileBytes = new byte[fileSize];
Your code will fail when the size is over Int32.MaxValue. The cast is unnecessary; just use long fileSize = FileToSplitInfo.Length;
This corrected code will fail when there is not enough contiguous memory. Fragmentation (of the LOH) will bring you down sooner or later.
You allocate memory for the entire file, but you only need PieceSize bytes at a time.
You don't even need to know the fileSize, just
byte[] pieceBuffer = new byte[PieceSize];
while (true)
{
    // br is the BinaryReader over the source file, as in the code above
    int nBytes = br.Read(pieceBuffer, 0, pieceBuffer.Length);
    if (nBytes == 0)
        break;

    // write this piece, the length is nBytes
}
There are different aspects that can be made better:
If you are working with a big file, why first read it all into an array and afterwards write it into another file? Just write into the new file while reading from the other.
Use using to guarantee disposal of the streams in any case, whether there is an exception or not.
If you begin to work with really big files, like 1GB or even more, I would recommend looking at memory-mapped files. You will get a significant memory consumption benefit at some increased performance cost; see the sketch after this list.
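To illustrate the memory-mapped file suggestion, here is a minimal sketch; the WritePiece method, its parameters, and the read-only access mode are illustrative assumptions, not code from the thread:
using System.IO;
using System.IO.MemoryMappedFiles;

// Sketch only: copy one piece of a large file to its own output file
// without ever loading the whole source file into memory.
static void WritePiece(string sourcePath, string piecePath, long offset, long pieceSize)
{
    using (var mmf = MemoryMappedFile.CreateFromFile(sourcePath, FileMode.Open))
    using (var view = mmf.CreateViewStream(offset, pieceSize, MemoryMappedFileAccess.Read))
    using (var output = File.Create(piecePath))
    {
        view.CopyTo(output); // streams the mapped region to disk piece by piece
    }
}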
I am trying to empower users to upload large files. Before I upload a file, I want to chunk it up. Each chunk needs to be a C# object. The reason is for logging purposes. It's a long story, but I need to create actual C# objects that represent each file chunk. Regardless, I'm trying the following approach:
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
    List<FileChunk> chunks = new List<FileChunk>();
    if (fileBytes.Length > 0)
    {
        FileChunk chunk = new FileChunk();
        for (int i = 0; i < (fileBytes.Length / 512); i++)
        {
            chunk.Number = (i + 1);
            chunk.Offset = (i * 512);
            chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();

            chunks.Add(chunk);
            chunk = new FileChunk();
        }
    }
    return chunks;
}
Unfortunately, this approach seems to be incredibly slow. Does anyone know how I can improve the performance while still creating objects for each chunk?
thank you
I suspect this is going to hurt a little:
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
Try this instead:
byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;
(Code not tested)
And the reason why this code would likely be slow is because Skip doesn't do anything special for arrays (though it could). This means that every pass through your loop is iterating the first 512*n items in the array, which results in O(n^2) performance, where you should just be seeing O(n).
Try something like this (untested code):
public static List<FileChunk> GetAllForFile(string fileName)
{
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open))
    {
        int i = 0;
        while (stream.Position < stream.Length)
        {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = new byte[512];
            int bytesRead = stream.Read(chunk.Bytes, 0, 512);
            if (bytesRead < 512)
                Array.Resize(ref chunk.Bytes, bytesRead); // last chunk may be shorter
            chunks.Add(chunk);
            i++;
        }
    }
    return chunks;
}
The above code skips several steps in your process, preferring to read the bytes from the file directly.
Note that, if the file is not an even multiple of 512, the last chunk will contain less than 512 bytes.
Same as Robert Harvey's answer, but using a BinaryReader, that way I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk.
public static List<FileChunk> GetAllForFile(string fileName) {
    var chunks = new List<FileChunk>();
    using (FileStream stream = new FileStream(fileName, FileMode.Open)) {
        BinaryReader reader = new BinaryReader(stream);
        int i = 0;
        bool eof = false;
        while (!eof) {
            var chunk = new FileChunk();
            chunk.Number = i;
            chunk.Offset = (i * 512);
            chunk.Bytes = reader.ReadBytes(512);
            chunks.Add(chunk);
            i++;
            if (chunk.Bytes.Length < 512) { eof = true; }
        }
    }
    return chunks;
}
Have you thought about what you're going to do to compensate for packet loss and data corruption?
Since you mentioned that the load is taking a long time then I would use asynchronous file reading in order to speed up the loading process. The hard disk is the slowest component of a computer. Google does asynchronous reads and writes on Google Chrome to improve their load times. I had to do something like this in C# in a previous job.
The idea would be to spawn several asynchronous requests over different parts of the file. Then when a request comes in, take the byte array and create your FileChunk objects taking 512 bytes at a time. There are several benefits to this:
If you have this run in a separate thread, then you won't have the whole program waiting to load the large file you have.
You can process a byte array, creating FileChunk objects, while the hard disk is still trying to fulfill read requests on other parts of the file.
You will save on RAM space if you limit the number of pending read requests you can have. This causes fewer page faults to the hard disk and uses the RAM and CPU cache more efficiently, which speeds up processing further.
You would want to use the following methods in the FileStream class.
[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
byte[] buffer,
int offset,
int count,
AsyncCallback callback,
Object state
)
public virtual int EndRead(
IAsyncResult asyncResult
)
Also this is what you will get in the asyncResult:
// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;
// Get the result
Int32 bytesRead = fs.EndRead(ar);
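Putting those pieces together, a minimal sketch of a self-driving asynchronous reader might look like the following; the AsyncChunkReader class, the 512 KB slab size, and the constructor parameters are illustrative assumptions rather than code from this answer:
using System;
using System.IO;

// Sketch only: issue BeginRead, process the slab in the callback, then issue the next read.
class AsyncChunkReader
{
    private readonly FileStream _fs;
    private readonly byte[] _buffer = new byte[512 * 1024]; // one 512 KB slab at a time

    public AsyncChunkReader(string path)
    {
        _fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                             FileShare.Read, 4096, useAsync: true);
    }

    public void Start()
    {
        _fs.BeginRead(_buffer, 0, _buffer.Length, OnReadCompleted, _fs);
    }

    private void OnReadCompleted(IAsyncResult ar)
    {
        FileStream fs = (FileStream)ar.AsyncState;
        int bytesRead = fs.EndRead(ar);
        if (bytesRead == 0) { fs.Dispose(); return; } // end of file

        // ... slice _buffer[0 .. bytesRead) into 512-byte FileChunk objects here ...

        fs.BeginRead(_buffer, 0, _buffer.Length, OnReadCompleted, fs); // next slab
    }
}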
Here is some reference material for you to read.
This is a code sample of working with Asynchronous File I/O Models.
This is an MS documentation reference for Asynchronous File I/O.