I am a .NET developer and recently started learning Ruby with ruby_koans. Some of Ruby's syntax is amazing, and one example is the way it handles "sandwich" code.
The following is Ruby sandwich code.
def file_sandwich(file_name)
  file = open(file_name)
  yield(file)
ensure
  file.close if file
end

def count_lines2(file_name)
  file_sandwich(file_name) do |file|
    count = 0
    while line = file.gets
      count += 1
    end
    count
  end
end

def test_counting_lines2
  assert_equal 4, count_lines2("example_file.txt")
end
I am fascinated that I can get rid of the cumbersome file open/close code each time I access a file, but I cannot think of any equivalent C# code. Maybe I could use an IoC container's dynamic proxy to do the same thing, but is there any way I can do it purely with C#?
Many thanks in advance.
You certainly don't need anything IoC-related here. How about:
public T ActOnFile<T>(string filename, Func<Stream, T> func)
{
    using (Stream stream = File.OpenRead(filename))
    {
        return func(stream);
    }
}
public int CountLines(string filename)
{
    return ActOnFile(filename, stream =>
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            int count = 0;
            while (reader.ReadLine() != null)
            {
                count++;
            }
            return count;
        }
    });
}
In this case it doesn't help very much, as the using statement already does most of what you want... but the general principle holds. Indeed, that's how LINQ is so flexible. If you haven't looked at LINQ yet, I strongly recommend that you do.
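For instance, here's a hedged sketch of reusing the same ActOnFile helper for a different job, just to show that the open/close "sandwich" stays put while the filling changes (ReadFirstLine is an invented name for this example):
public string ReadFirstLine(string filename)
{
    // Same open/close wrapper as CountLines above; only the middle changes.
    return ActOnFile(filename, stream =>
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadLine();
        }
    });
}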
Here's the actual CountLines method I'd use:
public int CountLines(string filename)
{
    return File.ReadLines(filename).Count();
}
Note that this will still only read a line at a time... but the Count extension method acts on the returned sequence.
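Because File.ReadLines is lazy, you can compose other operators on it without pulling the whole file into memory; for example, a hedged sketch that counts only non-blank lines:
public int CountNonBlankLines(string filename)
{
    // Streams the file a line at a time; only the running count is kept in memory.
    return File.ReadLines(filename).Count(line => line.Trim().Length > 0);
}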
In .NET 3.5 it would be:
public int CountLines(string filename)
{
    using (var reader = File.OpenText(filename))
    {
        int count = 0;
        while (reader.ReadLine() != null)
        {
            count++;
        }
        return count;
    }
}
... still pretty simple.
Are you just looking for something that opens and closes the stream for you?
public IEnumerable<string> GetFileLines(string path)
{
    // The using() statement will open, close, and dispose your stream for you:
    using (FileStream fs = new FileStream(path, FileMode.Open))
    using (var reader = new StreamReader(fs))
    {
        // Do stuff here - for example, read and yield each line:
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}
Is yield return what you're looking for?
using will call Dispose() and Close() when it reaches the closing brace, but I think the question is how to achieve this particular structure of code.
Edit: Just realized that this isn't exactly what you're looking for, but I'll leave this answer here since a lot of people aren't aware of this technique.
static IEnumerable<string> GetLines(string filename)
{
    using (var r = new StreamReader(filename))
    {
        string line;
        while ((line = r.ReadLine()) != null)
            yield return line;
    }
}

static void Main(string[] args)
{
    Console.WriteLine(GetLines("file.txt").Count());

    // Or, similarly:
    int count = 0;
    foreach (var l in GetLines("file.txt"))
        count++;
    Console.WriteLine(count);
}
Related
I have the code below, which reads a .json stream line by line. Since it will be a lengthy process, I have decided to take 100 lines at a time before calling my main function, and the code below works fine for that. However, it also has an issue: if the number of lines is less than 100, my main function is never called. How can I change the code to handle both scenarios, i.e. read at most 100 lines at a time and pass them to the main function, or read all the lines if there are fewer than 100?
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 1;
            List<string> lines = new List<string>();
            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());
                if (counter == 100)
                {
                    counter = 1;
                    // call main function with line
                    lines.Clear();
                }
                counter++;
            }
        }
    }
}
I feel that what you are trying to do is wrong. How will you parse 100 lines? Do you want to rebuild a JSON deserializer from scratch? And what will happen if some piece of JSON is split between line 100 and line 101?
But in the end, you asked for something, so I'll give you what you asked for.
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            List<string> lines = new List<string>();
            string line;
            while ((line = await streamReader.ReadLineAsync()) != null)
            {
                lines.Add(line);
                if (lines.Count == 100)
                {
                    // call main function with lines
                    lines.Clear();
                }
            }
            if (lines.Count != 0)
            {
                // call main function with lines
                lines.Clear(); // useless
            }
        }
    }
}
As others noted, you forgot an additional call to // call main function with lines after the loop; I've added it in the code above. You don't need .Peek(): .ReadLine() returns null at the end of the input stream. And since you already made your method async, you can make it fully asynchronous by using .ReadLineAsync().
Note that the JsonSerializer of Json.NET already has a Deserialize method that accepts a TextReader (and a StreamReader is a TextReader), and that method will read the file a piece at a time rather than preloading it before parsing.
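For illustration, here's a rough sketch of that kind of streaming deserialization - not the poster's code; it assumes the top-level JSON value is an array of objects, and Item is a placeholder for whatever type the elements map to:
// Requires Json.NET (JsonSerializer, JsonTextReader, JsonToken).
static void ReadItems(string path)
{
    var serializer = new JsonSerializer();
    using (var streamReader = File.OpenText(path))
    using (var jsonReader = new JsonTextReader(streamReader))
    {
        while (jsonReader.Read())
        {
            // Each StartObject token begins one element of the top-level array.
            if (jsonReader.TokenType == JsonToken.StartObject)
            {
                Item item = serializer.Deserialize<Item>(jsonReader);
                // ... process the item, batch items, call the main function, etc.
            }
        }
    }
}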
Add a check after the while loop. If the lines list is not empty, call main.
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 1;
            List<string> lines = new List<string>();
            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());
                if (counter == 100)
                {
                    counter = 1;
                    // call main function with lines
                    lines.Clear();
                }
                counter++;
            }
            if (lines.Count > 0)
            {
                // call main function with lines
            }
        }
    }
}
I need to process a very large text file (6-8 GB). I wrote the code attached below. Unfortunately, every time the output file (created next to the source file) reaches ~2 GB, I observe a sudden jump in memory consumption (from ~100 MB to a few GB) and, as a result, an out-of-memory exception.
The debugger indicates that the OOM occurs at while ((tempLine = streamReader.ReadLine()) != null).
I am targeting .NET 4.7 and the x64 architecture only.
A single line is at most 50 characters long.
I could work around this by splitting the original file into smaller parts to avoid the problem while processing, and then merging the results back into one file at the end, but I would prefer not to.
Code:
public async Task PerformDecodeAsync(string sourcePath, string targetPath)
{
    var allLines = CountLines(sourcePath);
    long processedlines = default;
    using (File.Create(targetPath));
    var streamWriter = File.AppendText(targetPath);
    var decoderBlockingCollection = new BlockingCollection<string>(1000);
    var writerBlockingCollection = new BlockingCollection<string>(1000);

    var producer = Task.Factory.StartNew(() =>
    {
        using (var streamReader = new StreamReader(File.OpenRead(sourcePath), Encoding.Default, true))
        {
            string tempLine;
            while ((tempLine = streamReader.ReadLine()) != null)
            {
                decoderBlockingCollection.Add(tempLine);
            }
            decoderBlockingCollection.CompleteAdding();
        }
    });

    var consumer1 = Task.Factory.StartNew(() =>
    {
        foreach (var line in decoderBlockingCollection.GetConsumingEnumerable())
        {
            short decodeCounter = 0;
            StringBuilder builder = new StringBuilder();
            foreach (var singleChar in line)
            {
                var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
                if (positionInDecodeKey > 0)
                    builder.Append(model.Substring(positionInDecodeKey, 1));
                else
                    builder.Append(singleChar);
                if (decodeCounter > 18)
                    decodeCounter = 0;
                else ++decodeCounter;
            }
            writerBlockingCollection.TryAdd(builder.ToString());
            Interlocked.Increment(ref processedlines);
            if (processedlines == (long)allLines)
                writerBlockingCollection.CompleteAdding();
        }
    });

    var writer = Task.Factory.StartNew(() =>
    {
        foreach (var line in writerBlockingCollection.GetConsumingEnumerable())
        {
            streamWriter.WriteLine(line);
        }
    });

    Task.WaitAll(producer, consumer1, writer);
}
Solutions, as well as advice on how to optimize it a little more, are greatly appreciated.
Like I said, I'd probably go for something simpler first, unless or until it's demonstrated that it's not performing well. As Adi said in their answer, this work appears to be I/O bound - so there seems little benefit in creating multiple tasks for it.
public void PerformDecode(string sourcePath, string targetPath)
{
    File.WriteAllLines(targetPath, File.ReadLines(sourcePath).Select(line =>
    {
        short decodeCounter = 0;
        StringBuilder builder = new StringBuilder();
        foreach (var singleChar in line)
        {
            var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
            if (positionInDecodeKey > 0)
                builder.Append(model.Substring(positionInDecodeKey, 1));
            else
                builder.Append(singleChar);
            if (decodeCounter > 18)
                decodeCounter = 0;
            else ++decodeCounter;
        }
        return builder.ToString();
    }));
}
Now, of course, this code actually blocks until it's done, which is why I've not marked it async. But then, so did yours, and the compiler should already have been warning you about that.
(You could try using PLINQ instead of LINQ for the Select portion, but honestly, the amount of processing we're doing here looks trivial; profile first before applying any such change.)
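If you did want to try it, a hedged sketch of that change might look like the following; DecodeLine is a hypothetical helper holding the per-character loop from the method above, and AsOrdered keeps the output lines in input order:
public void PerformDecodeParallel(string sourcePath, string targetPath)
{
    // Sketch only: same pipeline, but the Select work is spread across cores.
    File.WriteAllLines(targetPath,
        File.ReadLines(sourcePath)
            .AsParallel()
            .AsOrdered()                // preserve the original line order
            .Select(line => DecodeLine(line)));
}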
As the work you are doing is mostly I/O bound, you aren't really gaining anything from parallelization. It also looks to me (correct me if I'm wrong) like your transformation algorithm doesn't depend on reading the file line by line, so I would recommend doing something like this instead:
void Main()
{
    // Setup streams for testing
    using (var inputStream = new MemoryStream())
    using (var outputStream = new MemoryStream())
    using (var inputWriter = new StreamWriter(inputStream))
    using (var outputReader = new StreamReader(outputStream))
    {
        // Write test string and rewind stream
        inputWriter.Write("abcdefghijklmnop");
        inputWriter.Flush();
        inputStream.Seek(0, SeekOrigin.Begin);

        var inputBuffer = new byte[5];
        var outputBuffer = new byte[5];
        int inputLength;
        while ((inputLength = inputStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
        {
            for (var i = 0; i < inputLength; i++)
            {
                // transform each character
                outputBuffer[i] = ++inputBuffer[i];
            }
            // Write to output
            outputStream.Write(outputBuffer, 0, inputLength);
        }

        // Read for testing
        outputStream.Seek(0, SeekOrigin.Begin);
        var output = outputReader.ReadToEnd();
        Console.WriteLine(output);
        // Outputs: "bcdefghijklmnopq"
    }
}
Obviously, you would be using FileStreams instead of MemoryStreams, and you could increase the buffer length to something much larger (this was just a demonstrative example). Also, as your original method is async, you would use the async variants of Stream.Write and Stream.Read.
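A hedged sketch of that variant (the buffer size is an arbitrary choice, and the byte increment is just a stand-in for the real transform):
public async Task TransformFileAsync(string sourcePath, string targetPath)
{
    var buffer = new byte[81920]; // assumption: a much larger buffer than the 5-byte demo
    using (var input = File.OpenRead(sourcePath))
    using (var output = File.Create(targetPath))
    {
        int read;
        while ((read = await input.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            for (var i = 0; i < read; i++)
            {
                buffer[i] = (byte)(buffer[i] + 1); // placeholder transform
            }
            await output.WriteAsync(buffer, 0, read);
        }
    }
}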
In my code I have two methods: the main method and 'MethodA'. In the main method I do all the displaying and other functions using what MethodA returns. In MethodA I am attempting to read several files into an array and return them to the main method so they can be used. The questions I am asking are (I have condensed my main method down for this question; it includes other items that aren't relevant here):
1: Is it worth reading the files into an array?
2: Should I just have the 'ReadFiles' in the main, instead of another method?
static void Main(string[] args) // THE MAIN INCLUDES MAINLY ONLY THE WRITELINES AND WHAT IS SEEN
{
    string value = MethodA();
    Console.WriteLine(value);
    Console.ReadKey();
}

public string[] MethodA() // FILE METHOD - READS IN THE FILES THEN RETURNS THEM TO MAIN METHOD
{
    StreamReader dayFile = new StreamReader("c:..\\Files\\Day.txt");
    StreamReader dateFile = new StreamReader("c:..\\Files\\Date.txt");
    StreamReader sh1Open = new StreamReader("c:..\\Files\\SH1_Open.txt");
    StreamReader sh1Close = new StreamReader("c:..\\Files\\SH1_Close.txt");
    StreamReader sh1Volume = new StreamReader("c:..\\Files\\SH1_Volume.txt");
    StreamReader sh1Diff = new StreamReader("c:..\\Files\\SH1_Diff.txt");
    StreamReader sh2Open = new StreamReader("c:..\\Files\\SH2_Open.txt");
    StreamReader sh2Close = new StreamReader("c:..\\Files\\SH2_Close.txt");
    StreamReader sh2Volume = new StreamReader("c:..\\Files\\SH2_Volume.txt");
    StreamReader sh2Diff = new StreamReader("c:..\\Files\\SH2_Diff.txt");

    string dayString = dayFile.ReadToEnd();
    string dateString = dateFile.ReadToEnd();
    string Sh1OpenString = sh1Open.ReadToEnd();
    string Sh1CloseString = sh1Close.ReadToEnd();
    string Sh1VolumeString = sh1Volume.ReadToEnd();
    string Sh1DiffString = sh1Diff.ReadToEnd();
    string Sh2OpenString = sh2Open.ReadToEnd();
    string Sh2CloseString = sh2Close.ReadToEnd();
    string Sh2VolumeString = sh2Volume.ReadToEnd();
    string Sh2DiffString = sh2Diff.ReadToEnd();

    string[] fileArray = new string[] { dayString, dateString, Sh1OpenString, Sh1CloseString, Sh1VolumeString, Sh1DiffString, Sh2OpenString, Sh2CloseString, Sh2VolumeString, Sh2DiffString };
    return fileArray;
}
I think it is fine to read them all in - unless you know the files are massive.
I'd do it this way:
public string[] MethodA()
{
    var filenames = new []
    {
        "Date.txt",
        "Day.txt",
        "SH1_Close.txt",
        "SH1_Diff.txt",
        "SH1_Open.txt",
        "SH1_Volume.txt",
        "SH2_Close.txt",
        "SH2_Diff.txt",
        "SH2_Open.txt",
        "SH2_Volume.txt",
    };
    return
        filenames
            .Select(fn => File.ReadAllText("c:..\\Files\\" + fn))
            .ToArray();
}
It depends on whether the files tend to be rather small or not. If they are several gigabytes big, you are going to have a problem. In that case, you should instead read the files sequentially.
public static IEnumerable<string> ReadLines(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
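For example, a hedged usage sketch (the path is borrowed from the question); the reader, and with it the underlying stream, is disposed when the foreach completes or is exited:
foreach (var line in ReadLines(File.OpenRead("c:..\\Files\\Day.txt")))
{
    Console.WriteLine(line);
}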
BTW: StreamReader should always be used with the "using" keyword, as otherwise it leaves files open until the garbage collector has finalized the instances that have gone out of scope.
I'm designing an API. Currently I'm trying to safely handle the condition where we run out of disk space. Basically, we have a series of files holding some data. When the disk is full and we go to write another data file, it will of course throw an error. At that point, we delete a single file (looping through the file list from oldest to newest and retrying after we successfully delete one). Then we retry writing the file, and repeat that process until the file is written without error.
Now the fun part: all of this happens concurrently. At some point there are 8 threads doing this at once. This makes things extra interesting, and has led to an odd error.
Here is the code:
public void Save(string text, string id)
{
    using (var store = IsolatedStorageFile.GetUserStoreForApplication())
    {
        var existing = store.GetFileNames(string.Format(Prefix + "/*-{0}.dat", id));
        if (existing.Any()) return; // it already is saved

        string name = string.Format(Prefix + "/{0}-{1}.dat", DateTime.UtcNow.ToString("yyyyMMddHHmmssfffffff"), id);
    tryagain:
        bool doover = false;
        try
        {
            AttemptFileWrite(store, name, text);
        }
        catch (IOException)
        {
            doover = true;
        }
        catch (IsolatedStorageException) // THIS LINE
        {
            doover = true;
        }
        if (doover)
        {
            Attempt(() => store.DeleteFile(name)); // because apparently this can also fail.
            var files = store.GetFileNames(Path.Combine(Prefix, "*.dat"));
            foreach (var file in files.OrderBy(x => x))
            {
                try
                {
                    store.DeleteFile(Path.Combine(Prefix, file));
                }
                catch
                {
                    continue;
                }
                break;
            }
            goto tryagain; // prepare the velociraptor shield!
        }
    }
}

void AttemptFileWrite(IsolatedStorageFile store, string name, string text)
{
    using (var file = store.OpenFile(
        name,
        FileMode.Create,
        FileAccess.ReadWrite,
        FileShare.None | FileShare.Delete
        ))
    {
        using (var writer = new StreamWriter(file))
        {
            writer.Write(text);
            writer.Flush();
            writer.Close();
        }
        file.Close();
    }
}

static void Attempt(Action func)
{
    try
    {
        func();
    }
    catch
    {
    }
}

static T Attempt<T>(Func<T> func)
{
    try
    {
        return func();
    }
    catch
    {
    }
    return default(T);
}
public string GetSaved()
{
    string content = null;
    using (var store = IsolatedStorageFile.GetUserStoreForApplication())
    {
        var files = store.GetFileNames(Path.Combine(Prefix, "*.dat")).OrderBy(x => x);
        if (!files.Any()) return null; // nothing saved yet
        foreach (var filename in files)
        {
            IsolatedStorageFileStream file = null;
            try
            {
                file = Attempt(() =>
                    store.OpenFile(Path.Combine(Prefix, filename), FileMode.Open, FileAccess.ReadWrite, FileShare.None | FileShare.Delete));
                if (file == null)
                {
                    continue; // couldn't open. assume locked or some such
                }
                file.Seek(0L, SeekOrigin.Begin);
                using (var reader = new StreamReader(file))
                {
                    content = reader.ReadToEnd();
                }
                // Take note here. We delete the file while we still have it open!
                // This is done because having the file open prevents other readers, but if we close it first,
                // then there is a race condition that right after closing the stream, another reader could pick it up and
                // open exclusively. It looks weird, but it's right. Trust me.
                store.DeleteFile(Path.Combine(Prefix, filename));
                if (!string.IsNullOrEmpty(content))
                {
                    break;
                }
            }
            finally
            {
                if (file != null) file.Close();
            }
        }
    }
    return content;
}
The line marked THIS LINE is what I'm talking about. When calling AttemptFileWrite, I can look at store.AvailableSpace and see that there is enough room to fit the data, but upon trying to open the file, it throws this IsolatedStorageException with the description "Operation Not Permitted". Aside from this weird case, in all other cases it's just an IOException thrown with a message about the disk being full.
I'm trying to figure out whether I have some odd race condition, or whether this is an error I just have to deal with, or what?
Why does this error occur?
This is a simplified example, to isolate the purpose of the question. In my actual scenario, the ColumnReader returned by GetColumnReader will do more work than merely ReadLine.
If I run the following program, I will get an error when I try to call Reader(), because of course the TextReader has already been disposed by the using statement.
public class Play{
    delegate string ColumnReader();

    static ColumnReader GetColumnReader(string filename){
        using (TextReader reader = new StreamReader(filename)){
            var headers = reader.ReadLine();
            return () => reader.ReadLine();
        }
    }

    public static void Main(string[] args){
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader());
    }
}
Alternatively, I can remove the "using" and directly declare the TextReader, which would function, but now we no longer have a guarantee that the TextReader will be eventually closed.
Is there a way to add a "destructor" to the returned lambda function where I might be able to Dispose of the TextReader as soon as the lambda function goes out of scope (no more references)?
I also welcome other suggestions but wish to keep the basic closure structure (that is, fits into the scope of the question).
If you did not require a lambda expression, you could create an enumerable instead.
Potentially, moving the using inside the => {} may work in your real code... still probably not what you are looking for:
static ColumnReader GetColumnReader(string filename) {
    return () => {
        using (TextReader reader = new StreamReader(filename)) {
            var headers = reader.ReadLine();
            return reader.ReadLine();
        }
    };
}
Version with IEnumerable (if you always finish the iteration):
static IEnumerable<string> GetColumnReader(string filename) {
    using (TextReader reader = new StreamReader(filename)) {
        var headers = reader.ReadLine();
        string line;
        while ((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}
You'll need to create a custom IDisposable iterator if you want to support iterating only partway through the enumeration. See how foreach handles iterators that implement IDisposable to handle such cases.
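For reference, here's a rough sketch of what the compiler generates for a foreach over such an iterator; the try/finally is what guarantees the enumerator (and therefore the using block inside the iterator) is cleaned up even if you stop early:
// Roughly what "foreach (var line in GetColumnReader(filename))" expands to.
IEnumerator<string> e = GetColumnReader(filename).GetEnumerator();
try
{
    while (e.MoveNext())
    {
        string line = e.Current;
        // ... use line; breaking out of the loop still reaches the finally below.
    }
}
finally
{
    e.Dispose(); // runs the iterator's pending finally blocks, closing the reader
}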
Essentially you need the scope of the disposable element to be outside the delegate itself. In these situations I would make the delegate accept the disposable instance (i.e. the TextReader) rather than a file name.
public class Play {
    delegate string ColumnReader();

    static ColumnReader GetColumnReader(string filename) {
        return () => {
            using (TextReader reader = new StreamReader(filename)) {
                var headers = reader.ReadLine();
                return reader.ReadLine();
            }
        };
    }

    public static void Main(string[] args) {
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader());
    }
}
Obviously, this will open / read one line / close the file every time you call the returned delegate.
If you need to open it once, and then keep it open while reading through several lines, you'll be better off with an iterator block, similar to this:
public class Play {
    static IEnumerable<string> ReadLines(string filename) {
        using (TextReader reader = new StreamReader(filename)) {
            var headers = reader.ReadLine(); // I'm guessing you want to ignore this??
            while (true) {
                string line = reader.ReadLine();
                if (line == null)
                    yield break;
                yield return line;
            }
        }
    }

    public static void Main(string[] args) {
        foreach (string line in ReadLines("Input.tsv"))
            Console.WriteLine(line);
    }
}
If you really want to preserve the closure semantics, you will need to add an argument for it. Something like below, but you have to take care of calling the dispose command yourself.
public class Play {
    enum ReaderCommand {
        Read,
        Close
    }

    delegate string ColumnReader(ReaderCommand cmd);

    static ColumnReader GetColumnReader(string filename) {
        TextReader reader = new StreamReader(filename);
        var headers = reader.ReadLine();
        return (ReaderCommand cmd) => {
            switch (cmd) {
                case ReaderCommand.Read:
                    return reader.ReadLine();
                case ReaderCommand.Close:
                    reader.Dispose();
                    return null;
            }
            return null;
        };
    }

    public static void Main(string[] args) {
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader(ReaderCommand.Read));
        Console.WriteLine(Reader(ReaderCommand.Read));
        Reader(ReaderCommand.Close);
        Console.ReadKey();
    }
}
How is this any easier than simply returning the TextReader? Seems to me you're making things much more complicated just to achieve a particular coding style.
The onus will always be on the caller to dispose of whatever is returned correctly.
I'm sure your project will give you plenty of opportunity to flex your muscles - this time just keep it simple!
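As a hedged sketch of that simpler approach (not the poster's code; it reuses the question's names), the caller owns the reader and disposes it with using:
static TextReader GetColumnReader(string filename)
{
    var reader = new StreamReader(filename);
    reader.ReadLine(); // skip the header line, as in the question
    return reader;     // the caller is responsible for disposing it
}

public static void Main(string[] args)
{
    using (TextReader reader = GetColumnReader("Input.tsv"))
    {
        Console.WriteLine(reader.ReadLine()); // first data line
    }
}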
I really like the yield solution. I have written a simple piece of code; it shows that it works well and that the resource can be disposed after the client exits the foreach.
static void Main(string[] args)
{
    using (Resource resource = new Resource())
    {
        foreach (var number in resource.GetNumbers())
        {
            if (number > 2)
                break;
            Console.WriteLine(number);
        }
    }
    Console.Read();
}

public class Resource : IDisposable
{
    private List<int> _numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

    public IEnumerable<int> GetNumbers()
    {
        foreach (var number in _numbers)
            yield return number;
    }

    public void Dispose()
    {
        Console.WriteLine("Resource::Dispose()...");
    }
}