I am having a problem with a console job that runs all day and creates a daily log file that I archive at midnight.
The archive step creates a blank log file for the next day plus an archived file with yesterday's date in the name and the contents of the old file, so I can debug issues I may not have noticed until the day after.
However, since I cranked up the BOT's job I have been hitting System.OutOfMemoryException errors when I try to archive the file.
At first I was not able to get an archived file at all; then I worked out a way to get at least the last 100,000 lines, which is not nearly enough.
I wrap everything in three try/catch blocks: IOException, OutOfMemoryException, and a plain Exception. However, it is always the OutOfMemoryException that I get, e.g.:
System.OutOfMemoryException Error: Exception of type 'System.OutOfMemoryException' was thrown.
To give you an idea of size: 100,000 lines of log is about an 11 MB file, and a standard full log file can be anything from half a GB to 2 GB.
What I need to know is this:
a) what size of a standard text file will throw an out-of-memory error when trying to use File.ReadAllText or a custom StreamReader function I call ReadFileString, e.g.:
public static string ReadFileString(string path)
{
    // Use StreamReader to consume the entire text file.
    using (StreamReader reader = new StreamReader(path))
    {
        return reader.ReadToEnd();
    }
}
b) is it my computer's memory (I have 16 GB of RAM, about 8 GB in use at the time of copying) or the objects I am using in C# that are failing when opening and copying the files?
When archiving, I first try my custom ReadFileString function (see above); if that returns 0 bytes of content I try File.ReadAllText, and if that fails I try a custom function to get the last 100,000 lines, which is really not enough for debugging errors from earlier in the day.
The log file starts at midnight when a new one is created and logs all day. I never used to have out-of-memory errors, but since I have turned up the frequency of method calls the logging has expanded, which means the file sizes have as well.
This is my custom function for getting the last 100,000 lines. I am wondering how many lines I could get without it throwing an out-of-memory error and me not getting any of the contents of the last day's log file at all.
What do people suggest as the maximum file size for the various methods, or the memory needed to hold X lines, and what is the best method for obtaining as much of the log file as possible?
E.g. some way of looping line by line until an exception is hit and then saving what I have (a rough sketch of what I mean comes after my current method below).
This is my GetHundredThousandLines method and it logs to a very small debug file so I can see what errors happened during the archive process.
private bool GetHundredThousandLines(string logpath, string archivepath)
{
    bool success = false;
    int numberOfLines = 100000;
    if (!File.Exists(logpath))
    {
        this.LogDebug("GetHundredThousandLines - Cannot find path " + logpath + " to archive " + numberOfLines.ToString() + " lines");
        return false;
    }
    var queue = new Queue<string>(numberOfLines);
    using (FileStream fs = File.Open(logpath, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BufferedStream bs = new BufferedStream(fs)) // May not make much difference.
    using (StreamReader sr = new StreamReader(bs))
    {
        while (!sr.EndOfStream)
        {
            if (queue.Count == numberOfLines)
            {
                queue.Dequeue();
            }
            queue.Enqueue(sr.ReadLine() + "\r\n");
        }
    }
    // The queue now has our set of lines. So print to console, save to another file, etc.
    try
    {
        do
        {
            File.AppendAllText(archivepath, queue.Dequeue(), Encoding.UTF8);
        } while (queue.Count > 0);
    }
    catch (IOException exception)
    {
        this.LogDebug("GetHundredThousandLines - I/O Error accessing daily log file with ReadFileString: " + exception.Message.ToString());
    }
    catch (System.OutOfMemoryException exception)
    {
        this.LogDebug("GetHundredThousandLines - Out of Memory Error accessing daily log file with ReadFileString: " + exception.Message.ToString());
    }
    catch (Exception exception)
    {
        this.LogDebug("GetHundredThousandLines - Exception accessing daily log file with ReadFileString: " + exception.Message.ToString());
    }
    if (File.Exists(archivepath))
    {
        this.LogDebug("GetHundredThousandLines - Log file exists at " + archivepath);
        success = true;
    }
    else
    {
        this.LogDebug("GetHundredThousandLines - Log file DOES NOT exist at " + archivepath);
    }
    return success;
}
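Something along these lines is the kind of line-by-line loop I have in mind (just a rough, untested sketch; the StreamCopyLog name is made up):

// Rough sketch of the line-by-line idea: copy as much of the log as possible
// to the archive without ever holding the whole file in memory, so whatever
// was written before a failure is already safely on disk.
private bool StreamCopyLog(string logpath, string archivepath)
{
    long linesCopied = 0;
    try
    {
        using (var reader = new StreamReader(logpath))
        using (var writer = new StreamWriter(archivepath, false, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line);
                linesCopied++;
            }
        }
        return true;
    }
    catch (Exception exception)
    {
        // The using blocks have already flushed whatever was written so far.
        this.LogDebug("StreamCopyLog - stopped after " + linesCopied.ToString()
            + " lines: " + exception.Message);
        return false;
    }
}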
Any help would be much appreciated.
Thanks
Try: keep the queue and the stream position in class scope, call GC.Collect() when you get an out-of-memory exception, then call the function again, seek the stream to the last position, and continue. A very rough sketch of that idea is below.
Or: use a database such as SQLite and keep the newest 100,000 records in each table.
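Here is that rough sketch (the field and method names are made up, and tracking an exact resume offset is the tricky part, because StreamReader reads ahead of what it has actually returned):

private long lastPosition = 0;
private Queue<string> lineQueue = new Queue<string>();

private void ReadTailWithResume(string logpath, int maxLines)
{
    int attempts = 0;
    bool finished = false;
    while (!finished)
    {
        try
        {
            using (var fs = File.Open(logpath, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var sr = new StreamReader(fs))
            {
                fs.Seek(lastPosition, SeekOrigin.Begin);   // skip what was already read
                string line;
                while ((line = sr.ReadLine()) != null)
                {
                    if (lineQueue.Count == maxLines)
                        lineQueue.Dequeue();               // keep only the newest maxLines lines
                    lineQueue.Enqueue(line);
                }
                lastPosition = fs.Length;                  // the whole file has been consumed
            }
            finished = true;
        }
        catch (OutOfMemoryException)
        {
            if (++attempts > 3) throw;                     // give up after a few tries
            GC.Collect();                                  // free what we can, then loop again
        }
    }
}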
Related
I have 369 files that need to be formatted and consolidated into 5-8 files before being submitted to the server. I can't submit the 369 files because that would overwhelm the metadata tables in our database (they can handle it, but it'd be 369 rows for what was essentially one file, which would make querying and utilizing those tables a nightmare) and I can't handle it as one file because the total of 3.6 GB is too much for SSIS to handle on our servers.
I wrote the following script to fix the issue:
static void PrepPAIDCLAIMSFiles()
{
    const string HEADER = "some long header text, trimmed for SO question";
    const string FOOTER = "some long footer text, trimmed for SO question";
    //path is defined as a static member of the containing class
    string[] files = Directory.GetFiles(path + @"split\");
    int splitFileCount = 0, finalFileCount = 0;
    List<string> newFileContents = new List<string>();
    foreach (string file in files)
    {
        try
        {
            var contents = File.ReadAllLines(file).ToList();
            var fs = File.OpenRead(file);
            if (splitFileCount == 0)
            {
                //Grab everything except the header
                contents = contents.GetRange(1, contents.Count - 1);
            }
            else if (splitFileCount == files.Length - 1)
            {
                //Grab everything except the footer
                contents = contents.GetRange(0, contents.Count - 1);
            }
            if (!Directory.Exists(path + @"split\formatted"))
            {
                Directory.CreateDirectory(path + @"split\formatted");
            }
            newFileContents.AddRange(contents);
            if (splitFileCount % 50 == 0 || splitFileCount >= files.Length)
            {
                Console.WriteLine($"{splitFileCount} {finalFileCount}");
                var sb = new StringBuilder(HEADER);
                foreach (var row in newFileContents)
                {
                    sb.Append(row);
                }
                sb.Append(FOOTER);
                newFileContents = new List<string>();
                GC.Collect();
                string fileName = file.Split('\\').Last();
                string baseFileName = fileName.Split('.')[0];
                DateTime currentTime = DateTime.Now;
                baseFileName += "." + COMPANY_NAME_SetHHMMSS(currentTime, finalFileCount) + ".TXT";
                File.WriteAllText(path + @"split\formatted\" + baseFileName, sb.ToString());
                finalFileCount += 1;
            }
            splitFileCount += 1;
        }
        catch (OutOfMemoryException OOM)
        {
            Console.WriteLine(file);
            Console.WriteLine(OOM.Message);
            break;
        }
    }
}
The way this works is that it reads each split file and puts its rows into a string builder; every time it gets to a multiple of 50 files, it writes the string builder to a new file and starts over. The COMPANY_NAME_SetHHMMSS() method ensures the file has a unique name, so it's not writing to the same file over and over (and I can verify this by looking at the output; it writes two files before exploding).
It breaks when it gets to the 81st file, with a System.OutOfMemoryException on var contents = File.ReadAllLines(file).ToList();. There's nothing special about the 81st file; it's exactly the same size as all the others (~10 MB). The files this function delivers are about 500 MB each. It also has no trouble reading and processing all the files up to, but not including, the 81st, so I don't think it's running out of memory reading the file, but running out of memory doing something else, and the 81st is simply where memory runs out.
The newFileContents list should be getting emptied by overwriting it with a new list, right? It shouldn't be growing with every iteration of this function. GC.Collect() was sort of a last-ditch effort.
The original file that the 369 splits come from has been a headache for a few days now, causing UltraEdit to crash, SSIS to crash, C# to crash, etc. Splitting it via 7-Zip seemed to be the only option that worked, and splitting it into 369 files seemed to be the only option 7-Zip had that didn't also reformat or somehow compress the file in an undesirable way.
Is there something that I'm missing? Something in my code that keeps growing in memory? I know File.ReadAllLines() opens and closes the file, so it should be disposed after it's called, right? newFileContents gets overwritten every 50th file, as does the string builder. What else could I be doing?
One thing that jumps out at me is that you are opening a FileStream, never using it, and never disposing of it. With 300+ file streams this may be contributing to your issue.
var fs = File.OpenRead(file);
Another thing that caught my eye is that you said 3.6 GB. Make sure you are compiling for a 64-bit architecture.
Finally, stuffing gigabytes into a string builder may cause you grief. Maybe create a staging file: every time you open a new input file, write its contents out to the staging file and close the input, rather than depending on stuffing everything into memory.
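If in doubt about the architecture, a quick diagnostic like this (not part of the original code) will confirm how the process is actually running; Environment.Is64BitProcess is false when the build is x86 or "Prefer 32-bit" is ticked:

// Quick runtime check of OS and process bitness.
Console.WriteLine("64-bit OS:      " + Environment.Is64BitOperatingSystem);
Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);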
You should just be looping over the rows in your source files and appending them to a new file. You're holding the contents of up to 50 of those 10 MB files in memory at once, plus anything else you're doing. This may be partly because you're compiling for x86 instead of x64, but there isn't any reason this should use anywhere near that much memory. Something like the following:
var files = Directory.GetFiles(System.IO.Path.Combine(path, "split")).ToList();
//since you were skipping the first and last file
files.Remove(files.FirstOrDefault());
files.Remove(files.LastOrDefault());
string combined_file_path = "<whatever you want to call this>";
System.IO.StreamWriter combined_file_writer = null;
try
{
    foreach (var file in files)
    {
        //if multiple of 50, write footer, dispose of stream, and make a new stream
        if ((files.IndexOf(file)) % 50 == 0)
        {
            combined_file_writer?.WriteLine(FOOTER);
            combined_file_writer?.Dispose();
            combined_file_writer = new System.IO.StreamWriter(combined_file_path + "_1"); //increment the name somehow
            combined_file_writer.WriteLine(HEADER);
        }
        using (var file_reader = new System.IO.StreamReader(file))
        {
            while (!file_reader.EndOfStream)
            {
                combined_file_writer.WriteLine(file_reader.ReadLine());
            }
        }
    }
    //finish out the last file
    combined_file_writer?.WriteLine(FOOTER);
}
finally
{
    //dispose of last file
    combined_file_writer?.Dispose();
}
I have a solution that acts as an interface between two systems, reading files that were dropped on an FTP site and importing any orders/products/etc. into the target system.
When a file is picked up, it is moved to a temp file in the same location, and then the contents are read into an XmlDocument.
string[] files = Directory.GetFiles(pickupFolder, fileFilter, SearchOption.TopDirectoryOnly);
foreach (string pathToFile in files)
{
    FileInfo srcFile = new FileInfo(pathToFile);
    string tmpFilename = Path.Combine(srcFile.DirectoryName, $"~{Path.GetFileNameWithoutExtension(srcFile.Name)}.tmp");
    srcFile.MoveTo(tmpFilename);

    XmlDocument srcXml = new XmlDocument();
    try
    {
        using (FileStream fs = srcFile.Open(FileMode.Open, FileAccess.Read))
        {
            srcXml.Load(fs);
        }
    }
    catch (XmlException ex)
    {
        throw new FileException($"Invalid XML in {srcFile.Name}.", ex);
    }
}
Very, very occasionally, the interface will attempt to open the file so that it can be loaded into the XmlDocument while the move has not yet completed, throwing an IOException. Is there some way to prevent this from happening?
What is the best way to create something like this that needs to iterate through and process files?
The file move operation will throw an exception when the FTP server still holds a lock on the file. That can happen when the file is still being uploaded and is not yet complete, but is already "visible" on disk. Such collisions are rare, but they happen.
Start by checking your FTP server's settings and features to see whether it can hide incomplete files during upload. Alternatively, if you control the system that uploads the files, you could upload them with a special "do not download" extension and rename them once the upload is complete (an atomic operation). Finally, as others have pointed out, you could simply catch this specific exception and retry with a delay.
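To illustrate the rename idea on the uploader's side (plain file operations here stand in for whatever upload mechanism is actually in use, and the .part extension is just an example):

static void UploadAtomically(string sourceFile, string finalPath)
{
    // Write under a temporary extension that the downloader is configured to ignore.
    string tempPath = finalPath + ".part";
    File.Copy(sourceFile, tempPath, true);

    // Rename only once the contents are complete; on the same volume the rename is
    // effectively atomic, so the reader never sees a half-written file under the final name.
    File.Move(tempPath, finalPath);
}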
As others have pointed out, if the process runs periodically, you can simply wrap the move in a try/catch block:
try
{
    srcFile.MoveTo(tmpFilename);
}
catch (Exception ex)
{
    // Write a log entry if required
    continue;
}
If it's a one-off process, then you'll need to periodically attempt MoveTo until the file is released and can be moved. Something like this may work:
int maxRetries = 60;
int retries = 0;
bool success = false;
while (retries < maxRetries)
{
    try
    {
        retries++;
        srcFile.MoveTo(tmpFilename);
        success = true;
        break;
    }
    catch (Exception ex)
    {
        // Log the error if required
        Thread.Sleep(1000); // Wait 1 second
    }
}
if (success == false)
{
    // Log the error
    continue; // Skip the file if it's still not released
}
The code tries to access the file every second for up to a minute. If it still fails, the program skips this file and continues to the next.
I am working on an application which reads the paths of all the text files in a folder into a list. It reads each file, creates a temporary output file, overwrites the original file with the temporary output file, and then deletes the temporary output file.
Following is my code:
foreach (string lF in multipleFiles)
{
    int lineNumber = 0;
    using (StreamReader sr = new StreamReader(lF))
    {
        using (StreamWriter sw = new StreamWriter(lF + "Output"))
        {
            while (!sr.EndOfStream)
            {
                //LOGIC
                sw.WriteLine(line);
            }
        }
    }
    File.Copy(lF + "Output", lF, true);
    //File.Delete(lF + "Output");
    try
    {
        File.Delete(lF + "Output"); // <--- ERROR HERE
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}
I am unable to delete the temporary output file due to the following error:
{"The process cannot access the file '' because it is being
used by another process."}
The error does not occur for every file but only a few. None of the files are open or being used by any other application.
How can the temporary file be deleted?
UPDATE: Referred to Does FileStream.Dispose close the file immediately?
Added Thread.Sleep(1) before File.Delete(); the issue still exists. Tried increasing the sleep value to 5. No luck.
You always run the risk that a virus scanner or some other driver in the stack is still holding on to that file or its directory entry. Use a retry mechanism, but even that doesn't guarantee you'll be able to remove the file, because file operations are not atomic, so any process can open the file between your calls that try to delete it.
var path = lF + "Output";
// we iterate a couple of times (10 in this case, increase if needed)
for (var i = 0; i < 10; i++)
{
    try
    {
        File.Delete(path);
        // this is success, so break out of the loop
        break;
    }
    catch (Exception exc)
    {
        Trace.WriteLine(String.Format("failed delete #{0} with error {1}", i, exc.Message));
        // allow other waiting threads to do some work first
        // http://blogs.msmvps.com/peterritchie/2007/04/26/thread-sleep-is-a-sign-of-a-poorly-designed-program/
        Thread.Sleep(0);
        // we don't throw, we just iterate again
    }
}
if (File.Exists(path))
{
    // deletion still hasn't happened
    // this is beyond what the code can handle
    // possible options:
    // store the filepath to be deleted on startup
    // throw an exception
    // format the disk (only joking)
}
Code slightly adapted from my answer here but that was in a different context.
I have an SSIS script task, written in C#, that zips a file.
I have a problem when zipping an approximately 1 GB file.
I tried to implement this code and still get a 'System.OutOfMemoryException' error:
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at ST_4cb59661fb81431abcf503766697a1db.ScriptMain.AddFileToZipUsingStream(String sZipFile, String sFilePath, String sFileName, String sBackupFolder, String sPrefixFolder) in c:\Users\dtmp857\AppData\Local\Temp\vsta\84bef43d323b439ba25df47c365b5a29\ScriptMain.cs:line 333
at ST_4cb59661fb81431abcf503766697a1db.ScriptMain.Main() in c:\Users\dtmp857\AppData\Local\Temp\vsta\84bef43d323b439ba25df47c365b5a29\ScriptMain.cs:line 131
This is the snippet of code when zipping file:
protected bool AddFileToZipUsingStream(string sZipFile, string sFilePath, string sFileName, string sBackupFolder, string sPrefixFolder)
{
    bool bIsSuccess = false;
    try
    {
        if (File.Exists(sZipFile))
        {
            using (ZipArchive addFile = ZipFile.Open(sZipFile, ZipArchiveMode.Update))
            {
                addFile.CreateEntryFromFile(sFilePath, sFileName);
                //Move File after zipping it
                BackupFile(sFilePath, sBackupFolder, sPrefixFolder);
            }
        }
        else
        {
            //from https://stackoverflow.com/questions/28360775/adding-large-files-to-io-compression-ziparchiveentry-throws-outofmemoryexception
            using (var zipFile = ZipFile.Open(sZipFile, ZipArchiveMode.Update))
            {
                var zipEntry = zipFile.CreateEntry(sFileName);
                using (var writer = new BinaryWriter(zipEntry.Open()))
                using (FileStream fs = File.Open(sFilePath, FileMode.Open))
                {
                    var buffer = new byte[16 * 1024];
                    using (var data = new BinaryReader(fs))
                    {
                        int read;
                        while ((read = data.Read(buffer, 0, buffer.Length)) > 0)
                            writer.Write(buffer, 0, read);
                    }
                }
            }
            //Move File after zipping it
            BackupFile(sFilePath, sBackupFolder, sPrefixFolder);
        }
        bIsSuccess = true;
    }
    catch (Exception ex)
    {
        throw ex;
    }
    return bIsSuccess;
}
What am I missing? Please give me a suggestion, maybe a tutorial or a best practice for handling this problem.
I know this is an old post but what can I say, it helped me sort out some stuff and still comes up as a top hit on Google.
So there is definitely something wrong with the System.IO.Compression library!
First and Foremost...
You must make sure to turn off "Prefer 32-bit". Having this set (in my case with a build for "AnyCPU") causes so many inconsistent issues.
Now with that said, I took some demo files (several less than 500MB, one at 500MB, and one at 1GB), and created a sample program with 3 buttons that made use of the 3 methods.
Button 1 - ZipFile.CreateFromDirectory(AbsolutePath, TargetFile);
Button 2 - ZipArchive.CreateEntryFromFile(AbsolutePath, RelativePath);
Button 3 - Using the [16 * 1024] Byte Buffer method from above
Now here is where it gets interesting. (Assuming that the program is built as "AnyCPU" and with the "Prefer 32-bit" box NOT checked)... all 3 methods worked on a 64-bit Windows OS, regardless of how much memory it had.
However, as soon as I ran the same test on a 32-Bit OS, regardless of how much memory it had, ONLY method 1 worked!
Methods 2 and 3 blew up with the out-of-memory error, and to add salt to the wound, method 3 (the preferred chunking method) actually corrupted more files than method 2!
By corrupted, I mean that of my files, the 500 MB and the 1 GB files ended up in the zipped archive but at a size smaller than the original (they were basically truncated).
So I dunno... since there are not many 32-bit OS around anymore, I guess maybe it is a moot point.
But seems like there are some bugs in the System.IO.Compression Framework!
Here is my code:
public static TextWriter twLog = null;
private int fileNo = 1;
private string line = null;
TextReader tr = new StreamReader("file_no.txt");
TextWriter tw = new StreamWriter("file_no.txt");
line = tr.ReadLine();
if(line != null){
fileNo = int.Parse(line);
twLog = new StreamWriter("log_" + line + ".txt");
}else{
twLog = new StreamWriter("log_" + fileNo.toString() + ".txt");
}
System.IO.File.WriteAllText("file_no.txt",string.Empty);
tw.WriteLine((fileNo++).ToString());
tr.Close();
tw.Close();
twLog.Close();
It throws this error:
IOException: Sharing violation on path C:\Users\Water Simulation\file_no.txt
What I'm trying to do is open a file named log_x.txt, taking the "x" from the file_no.txt file. If file_no.txt is empty, the log file's name should be log_1.txt, and "fileNo + 1" is written to file_no.txt. After the program starts again, the new log file's name must be log_2.txt. But I'm getting this error and I can't understand what I'm doing wrong. Thanks for any help.
Well, you're trying to open the file file_no.txt for reading and for writing using separate streams. This may not work as the file will be locked by the reading stream, so the writing stream can't be created and you get the exception.
One solution would be to read the file first, close the stream and then write the file after increasing the fileNo. That way the file is only opened once at a time.
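A minimal sketch of that read-then-write sequence, following the intent described in the question (the names and the increment are my reading of it):

int fileNo = 1;

// Read and close first, so no stream is left holding file_no.txt open.
if (File.Exists("file_no.txt"))
{
    string line = File.ReadAllText("file_no.txt").Trim();
    if (line.Length > 0)
        fileNo = int.Parse(line);
}

// Only now reopen the file, this time for writing only.
File.WriteAllText("file_no.txt", (fileNo + 1).ToString());

twLog = new StreamWriter("log_" + fileNo.ToString() + ".txt"); // closed later, as in the question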
Another way would be to create a file stream for both read and write access like that:
FileStream fileStream = new FileStream(@"file_no.txt",
                                       FileMode.OpenOrCreate,
                                       FileAccess.ReadWrite,
                                       FileShare.None);
The accepted answer to this question seems to have a good solution also, even though I assume you do not want to allow shared reads.
Possible alternate solution
I understand you want to create unique log files when your program starts. Another way to do so would be this:
int logFileNo = 1;
string fileName = String.Format("log_{0}.txt", logFileNo);
while (File.Exists(fileName))
{
logFileNo++;
fileName = String.Format("log_{0}.txt", logFileNo);
}
This increases the number until it finds a file number where the log file doesn't exist. Drawback: If you have log_1.txt and log_5.txt, the next file won't be log_6.txt but log_2.txt.
To overcome this, you could enumerate all the files in your directory with mask log_*.txt and find the greatest number by performing some string manipulation.
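For instance, a rough sketch of that enumeration (assuming the log files live in the current directory):

// Find the highest existing log number and use the next one.
int highest = 0;
foreach (string file in Directory.GetFiles(".", "log_*.txt"))
{
    string name = Path.GetFileNameWithoutExtension(file); // e.g. "log_12"
    int number;
    if (int.TryParse(name.Substring("log_".Length), out number) && number > highest)
        highest = number;
}
string nextFileName = String.Format("log_{0}.txt", highest + 1);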
The possibilities are endless :-D
Well, this may be old, but the accepted answer didn't work for me. This error is caused when you try to read or write a file you just created from a separate stream. Solving it is very simple: just dispose of the FileStream you used to create the file, and then you can access it freely.
if (!File.Exists(myfile))
{
    var fs = new FileStream(myfile, FileMode.Create);
    fs.Dispose();
    string text = File.ReadAllText(myfile);
}
var stream = new System.IO.FileStream(filePath, System.IO.FileMode.Create);
resizedBitmap.Compress(Bitmap.CompressFormat.Png, 200, stream); //problem here
stream.Close();
return resizedBitmap;
In the Compress method, I was passing the quality parameter as 200, but it sadly doesn't allow values outside the range 0-100.
I changed the quality back to 100 and the issue was fixed.
None of the proposed options helped me, but I found a solution: in my case the problem was the anti-virus. With intensive writing to a file, the anti-virus would start scanning it, and at that moment writing to the file would fail.