I have a CSV file (e.g. Directories.csv) which contains a huge list of directories. I am looping through the directories from the CSV using a StreamReader and performing some task on each one. I add each completed directory to a dictionary, and I am stuck at this step.
Ask: I want to record in the same CSV which directories are complete as the loop runs, so that if the application crashes or the server reboots I don't have to re-iterate over the directories that already finished. (Or) Delete the completed directories' rows from the CSV.
I checked online for suggestions, and the advice was to create a temp file and then move a copy of it into place. Would that survive a server reboot or application crash? Please suggest how I can take this forward.
My code:
Dictionary<string, string> directoryDictionary = new Dictionary<string, string>();
string directoryLine;

using (FileStream fileStreamDirectory = File.Open(Path.Combine(outputdir, "Directories.csv"), FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (BufferedStream bufferStreamDirectory = new BufferedStream(fileStreamDirectory))
using (StreamReader streamReaderDirectory = new StreamReader(bufferStreamDirectory))
{
    while ((directoryLine = streamReaderDirectory.ReadLine()) != null)
    {
        // Doing the task here
        directoryDictionary.Add(directoryLine, "Completed");
    }
}
You can't really insert data into the middle of a text file (unless it is a fixed-width format, which is not the case for CSV).
Two options:
read into memory, update the in-memory data, and rewrite the whole table back to the file (you may want to keep the previous version around in case of write failures; see the sketch after this list)
use a database that satisfies your criteria and import the CSV there to work with
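For the first option, here is a minimal sketch (the trailing ",Completed" column and the .tmp/.bak file names are assumptions, not part of the original code; it needs System.IO and System.Linq). It rewrites the CSV through a temp file so a crash mid-write never leaves Directories.csv half-written:
string csvPath = Path.Combine(outputdir, "Directories.csv");
string tempPath = csvPath + ".tmp";

// Mark every directory that finished in this run, leave the rest untouched.
var updatedLines = File.ReadAllLines(csvPath)
    .Select(line => directoryDictionary.ContainsKey(line) ? line + ",Completed" : line)
    .ToArray();

File.WriteAllLines(tempPath, updatedLines);

// Swap the temp file in; the previous version survives as a .bak if the swap fails.
File.Replace(tempPath, csvPath, csvPath + ".bak");
On the next run you can skip any row that already ends in ",Completed".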
Related
I have to read many existing CSV files on an external drive and combine them in sequence (the sequencing is critical), with a restore point, writing to Output.csv on the same external drive in a different path. For example:
A.CSV, B.CSV and so on are appended to Output.csv. I am always appending to Output.csv, but there is a high probability that the IO operation fails. For example, when writing B.CSV after A.CSV: if B.CSV has lines A to Z and an IO exception happens after writing M, then when I rerun the program it should reprocess B.CSV and append only the remaining lines (N to Z) to Output.csv. In my business case Output.csv is going to be a very big file (GBs), though each source file will be 3-5 MB at most, so I do not want to reprocess it from the start but rather resume writing where it failed. I am keeping the file names in a database table and updating the status to "Processing" and then "Processed". Thanks, and looking for your input.
using var fs = new FileStream(file, FileMode.Open, FileAccess.Read);
using var reader = new StreamReader(fs, Encoding.Default);

// _filewriter is the StreamWriter kept open on Output.csv for appending
((StreamWriter)_filewriter).Write(Environment.NewLine + reader.ReadToEnd());
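One way to get a restore point out of that setup (a sketch only; RecordOffset, MarkProcessed, GetRecordedOffset, failedSourceFile and outputCsvPath are hypothetical names backed by the status table you already keep) is to remember how long Output.csv was before each append, and on a rerun truncate it back to that length before reprocessing the failed source file:
// Before appending a source file, remember where Output.csv currently ends.
long restorePoint = new FileInfo(outputCsvPath).Length;
RecordOffset(currentSourceFile, restorePoint);           // hypothetical: store next to the "Processing" status

using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
using (var reader = new StreamReader(fs, Encoding.Default))
{
    ((StreamWriter)_filewriter).Write(Environment.NewLine + reader.ReadToEnd());
    ((StreamWriter)_filewriter).Flush();
}
MarkProcessed(currentSourceFile);                         // hypothetical status update

// On startup, if the previous run died while a file was still "Processing",
// roll Output.csv back to the recorded offset and reprocess that file in full.
using (var output = new FileStream(outputCsvPath, FileMode.Open, FileAccess.Write))
{
    output.SetLength(GetRecordedOffset(failedSourceFile));   // hypothetical lookup
}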
I want to be able to open an Excel file (or create it if it doesn't exist) and add data to it asynchronously. I have the async component working quite well using a BlockingCollection, but if I try to save on every iteration of my while loop I keep running into issues.
I either get file corruption, or the data never saves at all. Sometimes only the first or second data segment of my two-part test is saved.
I have the following code to show a similar cut down version of my issue:
BlockingCollection<Excel_Data> collection = new BlockingCollection<Excel_Data>(); // filled elsewhere by the async producer

FileStream fs = new FileStream(this.path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.Read);
ExcelPackage excel = new ExcelPackage(fs);

int i = 0;
while (true)
{
    // ---- do some async operations
    Excel_Data dict_item = collection.Take();

    excel.Workbook.Worksheets.Add("sheet" + i.ToString());

    //excel.Save();
    excel.SaveAs(fs);   // saving to the same stream on every pass is where the corruption shows up

    if (++i == 2)
    {
        break;
    }
}
fs.Close();
In the above example, after simply creating 2 sheets the file already becomes corrupted, and I am unsure how to fix this without going purely with FileInfo instead of FileStream. But then I will never be able to lock my file for writing for the duration of my app.
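For what it's worth, a sketch of the usual workaround (using the same collection and Excel_Data as above): stop reusing one FileStream for every save and let EPPlus open and rewrite the file itself via a FileInfo. Calling SaveAs(fs) repeatedly writes another copy of the package into the same, never-reset stream, which is most likely what produces the corrupt file. The trade-off is exactly the one mentioned: the file is no longer held locked for the lifetime of the app.
var excelFile = new FileInfo(this.path);
int i = 0;

while (true)
{
    // ---- do some async operations
    Excel_Data dict_item = collection.Take();

    // Re-open the current file, add the sheet, and let EPPlus rewrite it in full.
    using (var package = new ExcelPackage(excelFile))
    {
        package.Workbook.Worksheets.Add("sheet" + i.ToString());
        package.Save();
    }

    if (++i == 2)
    {
        break;
    }
}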
I am trying to create a function that will retrieve all the uploaded files (which are stored as byte arrays in the database) and download them as a single zip file. I currently have 6000 files to download (and the number could grow).
The functionality is already working (from retrieval to download) if I limit the number of files being downloaded; otherwise, I get an OutOfMemoryException in the foreach loop.
Here's some pseudo code (the files variable is a list of byte arrays and file names):
var files = getAllFilesFromDb();
foreach (var file in files)
{
    var tempFilePath = Path.Combine(path, file.filename);
    using (FileStream stream = new FileStream(tempFilePath, FileMode.Create, FileAccess.ReadWrite))
    {
        stream.Write(file.fileData, 0, file.fileData.Length);
    }
}
private readonly IEntityRepository<File> fileRepository;
IEnumerable<FileModel> getAllFilesFromDb()
{
return fileRepository.Select(f => new FileModel(){ fileData = f.byteArray, filename = f.fileName});
}
My question is, is there any other way to do this to avoid getting such errors?
To avoid this problem, you could avoid loading the contents of all the files in one go. Most likely you will need to split your database call into two calls:
Retrieve a list of all the files without their contents but with some identifier - like the PK of the table.
A method which retrieves the contents of an individual file.
Then your (pseudo)code becomes
get list of all files
for each file
get the file contents
write the file to disk
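In C#, that pseudocode might look roughly like this (GetFileContentById and the Id property are hypothetical; the point is that only one file's bytes are in memory at a time):
// First call: ids and names only, no blobs.
var fileList = fileRepository.Select(f => new { f.Id, f.fileName }).ToList();

foreach (var item in fileList)
{
    // Second call: load the contents of just this one file.
    byte[] contents = GetFileContentById(item.Id);

    var tempFilePath = Path.Combine(path, item.fileName);
    using (var stream = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write))
    {
        stream.Write(contents, 0, contents.Length);
    }
}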
Another possibility is to alter the way your query currently works so that it uses deferred execution - this means it will not actually load all the files at once, but will stream them one at a time from the database - but without seeing more code from your repository implementation, I cannot/will not guess the right solution for you.
I have a strange problem. My code flow is as follows:
1. The exe takes some data from the user.
2. It calls a web service to write the data (creating a CSV file) at a particular network location (say \\some-server\some-directory). Although the web service is hosted on the same machine as that folder (i.e. I can also change the path to c:\some-directory), it returns only after writing the file.
3. The exe checks that the file exists; if it exists, processing continues, otherwise it quits with an error.
The problem I am having is at step 3. When I try to read the file immediately after it has been written, I always get a FileNotFoundException (but the file is present). I do not get this exception when I am debugging (because stepping through the code introduces a delay) or when I put a Thread.Sleep(3000) before reading the file.
This is really strange, because I close the StreamWriter before the call returns to the exe, and according to the documentation, Close should force a flush of the stream. It is also not related to the size of the file. I am not doing any async thread calls for writing and reading the file; they run serially in the same thread, one after another (only the writing is done by a web service and the reading by the exe, but the calls are still serial).
I do not know, but it feels like there is some time difference between when the file actually gets written to disk and when Close() returns. However, this is baffling because it is not related to size at all; it happens for every file size. I have tried this with files of 10, 50, 100 and 200 lines of data.
Another thing I suspected was that, since I was writing this file to a network location, Windows might be optimizing the call by writing first to a cache and then to the network location. So I went ahead and changed the code to write to a local drive (i.e. use c:\some-directory) rather than the network location, but it resulted in the same error.
There is no error in the code (for reading or writing). As explained earlier, it starts working fine when I put in a delay. Some other useful information:
The exe targets .NET Framework 3.5
Windows Server 2008 (64-bit, 4 GB RAM)
Edit 1
File.AppendAllText() is not the correct solution, as it creates a new file if one does not exist.
Edit 2
code for writing
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
    using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
    {
        writer.WriteLine(someString);
    }
}
code for reading
StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath));
string header = rdr.ReadLine();
rdr.Close();
Edit 3
Used a TextWriter, same error:
using (TextWriter writer = File.CreateText(outFileName))
{
}
Edit 4
Finally, as suggested by some users, I am checking for the file in a while loop a certain number of times before throwing the file-not-found exception.
int i = 1;
while (i++ < 10)
{
    bool fileExists = File.Exists(CsvFilePath);
    if (!fileExists)
        System.Threading.Thread.Sleep(500);   // wait and retry, for a few seconds in total
    else
        break;
}
So you are writing a stream to a file, then reading the file back to a stream? Do you need to write the file then post process it, or can you not just use the source stream directly?
If you need the file, I would use a loop that keeps checking if the file exists every second until it appears (or a silly amount of time has passed) - the writer would give you an error if you couldn't write the file, so you know it will turn up eventually.
Since you're writing over a network, the most reliable approach would be to save your file on the local system first and then copy it to the network location. This way you avoid network connection problems and also have a backup in case of a network failure.
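Something like this (the local temp path is just an assumption):
string localPath = Path.Combine(Path.GetTempPath(), Path.GetFileName(outFileName));

// Write locally first; this cannot fail because of the network.
File.WriteAllText(localPath, someString, Encoding.Unicode);

// Only after the local write has completed, copy the finished file to the share.
File.Copy(localPath, outFileName, true);   // true = overwrite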
Based on your update, try this instead:
File.WriteAllText(outFileName, someString);
string header = null;
using(StreamReader reader = new StreamReader(CsvFilePath)) {
header = reader.ReadLine();
}
Have you tried reading the file only after the writer's FileStream has been disposed?
Like this:
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
    using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
    {
        writer.WriteLine(someString);
    }
}
using (StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath)))
{
string header = rdr.ReadLine();
}
I was just wondering if I missed anything in the documentation that would let me get the number of lines contained in a file at a certain revision (or even the number of lines changed from a SvnChangeItem; that would be nice too) without having to export the file to the filesystem and parse through it, counting each line.
Any help would be appreciated. Thanks.
Nope, you're stuck with exactly the solution you named: export to a temp file, count the lines, delete the file. It's a fairly expensive operation if you're doing this file by file. If you need to line-count every file, it may be better to fetch the entire repo once and reuse the working directory for future runs.
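With SharpSvn that is roughly the following (urlToFile and revision are placeholders):
using (var client = new SvnClient())
{
    // Export the single file at the given revision to a throwaway local path.
    string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
    client.Export(new SvnUriTarget(urlToFile, revision), tempPath);

    int lineCount = File.ReadAllLines(tempPath).Length;
    File.Delete(tempPath);
}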
The metadata (such as the current line count) is not stored in the repository, but you can get the file contents without resorting to messy temp files.
For brevity, the code to iterate over revisions etc. is excluded.
using (var client = new SvnClient())
using (var memoryStream = new MemoryStream())
{
    // Stream the file contents at the target URL straight into memory.
    client.Write(new SvnUriTarget(urlToFile), memoryStream);
    memoryStream.Position = 0;

    using (var streamReader = new StreamReader(memoryStream))
    {
        int lineCount = 0;
        while (streamReader.ReadLine() != null)
        {
            lineCount++;
        }
    }
}