Read lines batch-wise in C#

I have the below code, which reads a .json stream line by line. Since it will be a lengthy process, I have decided to take 100 lines at a time before calling my main function, and the code below works fine. But it also gives me an issue if the number of lines is less than 100: in that case my main function will never be called. How can I optimize the code below to handle both scenarios, i.e. read at most 100 lines at a time and pass them to the main function, or read all the lines if there are fewer than 100?
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 1;
            List<string> lines = new List<string>();
            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());
                if (counter == 100)
                {
                    counter = 1;
                    // call main function with lines
                    lines.Clear();
                }
                counter++;
            }
        }
    }
}

I feel what you are trying to do is wrong. How will you parse 100 lines? Do you want to rebuild a JSON deserializer from scratch? And what will happen if some piece of JSON is split between line 100 and line 101?
But in the end, you asked for something, so I'll give you what you asked for.
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            List<string> lines = new List<string>();
            string line;
            while ((line = await streamReader.ReadLineAsync()) != null)
            {
                lines.Add(line);
                if (lines.Count == 100)
                {
                    // call main function with lines
                    lines.Clear();
                }
            }
            if (lines.Count != 0)
            {
                // call main function with lines
                lines.Clear(); // useless
            }
        }
    }
}
As others noted, you forgot the "additional" call to // call main function with lines at the end of the cycle. I've modified the code a bit further: you don't need .Peek(), since .ReadLine() returns null at the end of the input stream. And you made your method async, so you could make it fully asynchronous by using .ReadLineAsync().
Note that the JsonSerializer of Json.NET already has a Deserialize method that accepts a TextReader (and a StreamReader is a TextReader); that method reads the file a piece at a time and won't preload it before parsing.
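For illustration, a minimal sketch of that approach, assuming Json.NET and a hypothetical MyItem element type:

using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
using (var streamReader = new StreamReader(data, Encoding.UTF8))
using (var jsonReader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    // Deserialize pulls from the reader incrementally; the file is
    // never loaded into memory as one big string.
    var items = serializer.Deserialize<List<MyItem>>(jsonReader);
}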

Add a check after the while loop. If the lines list is not empty, call main.
public async void ReadJsonStream()
{
    JsonSerializer serializer = new JsonSerializer();
    using (Stream data = await manager.DownloadBlob(null, "TestMultipleLines.json", null))
    {
        using (StreamReader streamReader = new StreamReader(data, Encoding.UTF8))
        {
            int counter = 1;
            List<string> lines = new List<string>();
            while (streamReader.Peek() >= 0)
            {
                lines.Add(streamReader.ReadLine());
                if (counter == 100)
                {
                    counter = 1;
                    // call main function with lines
                    lines.Clear();
                }
                counter++;
            }
            if (lines.Count > 0)
            {
                // call main function with lines
            }
        }
    }
}

Related

IOException: The process cannot access the file 'fileName/textFile.txt' because it is being used by another process

I saw other threads about this problem, and none of them seems to solve my exact problem.
static void RecordUpdater(string username, int points, string term) // Updates Record File with New Records.
{
    int minPoints = 0;
    StreamWriter streamWriter = new StreamWriter($@"Record\{term}");
    Player playersRecord = new Player(points, username);
    List<Player> allRecords = new List<Player>();
    StreamReader reader = new StreamReader($@"Record\{term}");
    while (!reader.EndOfStream)
    {
        string[] splitText = reader.ReadLine().Split(',');
        Player record = new Player(Convert.ToInt32(splitText[0]), splitText[1]);
        allRecords.Add(record);
    }
    reader.Close();
    foreach (var playerpoint in allRecords)
    {
        if (minPoints > playerpoint.points)
            minPoints = playerpoint.points;
    }
    if (points > minPoints)
    {
        allRecords.Add(playersRecord);
        allRecords.Remove(allRecords.Min());
    }
    allRecords.Sort();
    allRecords.Reverse();
    streamWriter.Flush();
    foreach (var player in allRecords)
    {
        streamWriter.WriteLine(player.points + "," + player.username);
    }
}
So after I run the program and get to that point in code I get an error message:
"The process cannot access the file 'fileName/textFile.txt' because it is being used by another process."
You should use the using statement around disposable objects like streams. This ensures that the objects release any unmanaged resources they hold. And don't open the writer until you need it: it makes no sense to open the writer when the first thing you need to do is read the records.
static void RecordUpdater(string username, int points, string term)
{
    Player playersRecord = new Player(points, username);
    List<Player> allRecords = new List<Player>();
    int minPoints = 0;
    try
    {
        using (StreamReader reader = new StreamReader($@"Record\{term}"))
        {
            while (!reader.EndOfStream)
            {
                // ... load your data line by line
            }
        }
        // ... process your data ...
        using (StreamWriter streamWriter = new StreamWriter($@"Record\{term}"))
        {
            // ... write your data ...
        }
    }
    catch (Exception ex)
    {
        // ... show a message about ex.Message, or just log everything
        // to a file for later analysis
    }
}
Also, you should consider that working with files is one of the most likely contexts in which you will receive an exception due to external events over which your program has no control.
It is better to enclose everything in a try/catch block with proper handling of the exception.
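Filling in the elided parts with the question's own logic (and assuming the question's Player class, which apparently implements IComparable given the calls to Sort() and Min()), a complete sketch might look like this:

static void RecordUpdater(string username, int points, string term)
{
    Player playersRecord = new Player(points, username);
    List<Player> allRecords = new List<Player>();
    int minPoints = 0;
    try
    {
        // Read the existing records first; using closes the file afterwards.
        using (StreamReader reader = new StreamReader($@"Record\{term}"))
        {
            while (!reader.EndOfStream)
            {
                string[] splitText = reader.ReadLine().Split(',');
                allRecords.Add(new Player(Convert.ToInt32(splitText[0]), splitText[1]));
            }
        }
        // Process the data as in the question.
        foreach (var playerpoint in allRecords)
        {
            if (minPoints > playerpoint.points)
                minPoints = playerpoint.points;
        }
        if (points > minPoints)
        {
            allRecords.Add(playersRecord);
            allRecords.Remove(allRecords.Min());
        }
        allRecords.Sort();
        allRecords.Reverse();
        // Only now open the writer; the reader is already closed.
        using (StreamWriter streamWriter = new StreamWriter($@"Record\{term}"))
        {
            foreach (var player in allRecords)
            {
                streamWriter.WriteLine(player.points + "," + player.username);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message); // or log everything for later analysis
    }
}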

Sudden memory consumption jump resulting in out of memory exception while processing huge text file

I need to process a very large text file (6-8 GB). I wrote the code attached below. Unfortunately, every time the output file (created next to the source file) reaches ~2 GB, I observe a sudden jump in memory consumption (from ~100 MB to a few GB) and, as a result, an out-of-memory exception.
Debugger indicates that OOM occurs at while ((tempLine = streamReader.ReadLine()) != null)
I am targeting .NET 4.7 and x64 architecture only.
A single line is at most 50 characters long.
I can work around this by splitting the original file into smaller parts, processing them separately, and merging the results back into one file at the end, but I would rather not do that.
Code:
public async Task PerformDecodeAsync(string sourcePath, string targetPath)
{
    var allLines = CountLines(sourcePath);
    long processedlines = default;
    using (File.Create(targetPath));
    var streamWriter = File.AppendText(targetPath);
    var decoderBlockingCollection = new BlockingCollection<string>(1000);
    var writerBlockingCollection = new BlockingCollection<string>(1000);
    var producer = Task.Factory.StartNew(() =>
    {
        using (var streamReader = new StreamReader(File.OpenRead(sourcePath), Encoding.Default, true))
        {
            string tempLine;
            while ((tempLine = streamReader.ReadLine()) != null)
            {
                decoderBlockingCollection.Add(tempLine);
            }
            decoderBlockingCollection.CompleteAdding();
        }
    });
    var consumer1 = Task.Factory.StartNew(() =>
    {
        foreach (var line in decoderBlockingCollection.GetConsumingEnumerable())
        {
            short decodeCounter = 0;
            StringBuilder builder = new StringBuilder();
            foreach (var singleChar in line)
            {
                var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
                if (positionInDecodeKey > 0)
                    builder.Append(model.Substring(positionInDecodeKey, 1));
                else
                    builder.Append(singleChar);
                if (decodeCounter > 18)
                    decodeCounter = 0;
                else
                    ++decodeCounter;
            }
            writerBlockingCollection.TryAdd(builder.ToString());
            Interlocked.Increment(ref processedlines);
            if (processedlines == (long)allLines)
                writerBlockingCollection.CompleteAdding();
        }
    });
    var writer = Task.Factory.StartNew(() =>
    {
        foreach (var line in writerBlockingCollection.GetConsumingEnumerable())
        {
            streamWriter.WriteLine(line);
        }
    });
    Task.WaitAll(producer, consumer1, writer);
}
Solutions, as well as advice on how to optimize it a little more, are greatly appreciated.
Like I said, I'd probably go for something simpler first, unless or until it's demonstrated that it's not performing well. As Adi said in their answer, this work appears to be I/O bound - so there seems little benefit in creating multiple tasks for it.
public void PerformDecode(string sourcePath, string targetPath)
{
    File.WriteAllLines(targetPath, File.ReadLines(sourcePath).Select(line =>
    {
        short decodeCounter = 0;
        StringBuilder builder = new StringBuilder();
        foreach (var singleChar in line)
        {
            var positionInDecodeKey = decodingKeysList[decodeCounter].IndexOf(singleChar);
            if (positionInDecodeKey > 0)
                builder.Append(model.Substring(positionInDecodeKey, 1));
            else
                builder.Append(singleChar);
            if (decodeCounter > 18)
                decodeCounter = 0;
            else
                ++decodeCounter;
        }
        return builder.ToString();
    }));
}
Now, of course, this code actually blocks until it's done, which is why I've not marked it async. But then, so did yours, and the compiler should have been warning you about that already.
(You could try using PLINQ instead of LINQ for the Select portion, but honestly, the amount of processing we're doing here looks trivial; profile first before applying any such change.)
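For reference, a hypothetical PLINQ variant of that Select might look like the sketch below, with the decoding loop factored into a Decode helper (a name assumed here, not from the original code):

// Sketch only; AsOrdered keeps the output lines in input order.
File.WriteAllLines(targetPath,
    File.ReadLines(sourcePath)
        .AsParallel()
        .AsOrdered()
        .Select(line => Decode(line))); // Decode = the loop body above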
As the work you are doing is mostly I/O bound, you aren't really gaining anything from parallelization. It also looks to me (correct me if I'm wrong) like your transformation algorithm doesn't depend on reading the file line by line, so I would recommend instead doing something like this:
void Main()
{
    // Setup streams for testing
    using (var inputStream = new MemoryStream())
    using (var outputStream = new MemoryStream())
    using (var inputWriter = new StreamWriter(inputStream))
    using (var outputReader = new StreamReader(outputStream))
    {
        // Write test string and rewind stream
        inputWriter.Write("abcdefghijklmnop");
        inputWriter.Flush();
        inputStream.Seek(0, SeekOrigin.Begin);
        var inputBuffer = new byte[5];
        var outputBuffer = new byte[5];
        int inputLength;
        while ((inputLength = inputStream.Read(inputBuffer, 0, inputBuffer.Length)) > 0)
        {
            for (var i = 0; i < inputLength; i++)
            {
                // transform each character
                outputBuffer[i] = ++inputBuffer[i];
            }
            // Write to output
            outputStream.Write(outputBuffer, 0, inputLength);
        }
        // Read for testing
        outputStream.Seek(0, SeekOrigin.Begin);
        var output = outputReader.ReadToEnd();
        Console.WriteLine(output);
        // Outputs: "bcdefghijklmnopq"
    }
}
Obviously, you would be using FileStreams instead of MemoryStreams, and you can increase the buffer length to something much larger (this was just a demonstrative example). Also, since your original method is async, you could use the async variants of Stream.Write and Stream.Read.
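A rough sketch of that same loop against files with the asynchronous Stream APIs might look like this (TransformByte stands in for whatever per-byte transform you need; it is not from the original post):

public async Task TransformAsync(string sourcePath, string targetPath)
{
    using (var input = File.OpenRead(sourcePath))
    using (var output = File.Create(targetPath))
    {
        var buffer = new byte[81920]; // much larger than the demo's 5 bytes
        int read;
        while ((read = await input.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            for (var i = 0; i < read; i++)
            {
                buffer[i] = TransformByte(buffer[i]); // hypothetical transform
            }
            await output.WriteAsync(buffer, 0, read);
        }
    }
}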

Wait until string.contains() condition is met

I'm reading a file into my C# application and decompressing a tile_data BLOB using a GZip stream. I'm currently accessing the BLOB data through this method:
SQLiteCommand command = new SQLiteCommand(query, DbConn);
SQLiteDataReader reader = command.ExecuteReader();
bool isMet = false;
try
{
    while (reader.Read())
    {
        using (var file = reader.GetStream(0))
        using (var unzip = new GZipStream(file, CompressionMode.Decompress))
        using (var fileReader = new StreamReader(unzip))
        {
            var line = fileReader.ReadLine();
            while (!fileReader.EndOfStream)
            {
            }
            Console.WriteLine("End of tile_data");
        }
    }
    reader.Close();
    Console.WriteLine("Reader closed");
    Console.ReadKey();
}
catch (Exception e)
{
    Console.Write(e.StackTrace);
    Console.ReadKey();
}
I'm looking to wait until the fileReader detects "tertiary" (a string) and then print all the data that comes after it. I attempted to use a bool and a nested while loop, but that produced an infinite loop, hence the question.
The code that I used (and failed with):
if (line.Contains("tertiary"))
{
    isMet = true;
}
while (!fileReader.EndOfStream && isMet)
{
    Console.WriteLine(line);
}
How can I perform an operation only with my fileReader once a condition has been met?
Your fileReader.EndOfStream loop will only work if your stream has a single line. The problem is that you're only reading from the stream once, so unless you've already read the whole thing, you're stuck in an endless loop.
Instead, do something like this:
string line;
while ((line = fileReader.ReadLine()) != null)
{
    if (line.Contains("...")) break; // Or whatever else you want to do
}
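To do what the question actually asks (echo everything after the first line containing "tertiary"), a small variation on the same loop should work; a sketch:

bool isMet = false;
string line;
while ((line = fileReader.ReadLine()) != null)
{
    if (isMet)
        Console.WriteLine(line);          // print everything after the match
    else if (line.Contains("tertiary"))
        isMet = true;                     // start printing from the next line
}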
You can use this code. It matches if the line contains any character of the search string. It is working for me.
string matchStr = "tertiary";
if (line.Any(matchStr.Contains))
{
    isMet = true;
}
while (!fileReader.EndOfStream && isMet)
{
    Console.WriteLine(line);
}
Shouldn't the line var line = fileReader.ReadLine(); be inside the while loop?
while (!fileReader.EndOfStream)
{
    var line = fileReader.ReadLine();
    // do some other stuff
}
Try something like this:
string yourval;
while ((yourval = fileReader.ReadLine()) != null)
{
    if (yourval.Contains("your string here"))
    {
        break;
    }
}

Is it possible or worth sending an array to another method in C#?

In my code I have two methods: the main method and 'MethodA'. In the main method I do all the displaying and use the other functionality from MethodA. However, in MethodA I am attempting to read several files into an array and return them to the main method so they can be used. The questions I am asking are (I have condensed my main method down for this question; it includes other items that aren't linked to this):
1: Is it worth reading the files into an array?
2: Should I just have the 'ReadFiles' in the main, instead of another method?
static void Main(string[] args) // THE MAIN INCLUDES MAINLY ONLY THE WRITELINES AND WHAT IS SEEN
{
    string[] values = MethodA();
    Console.WriteLine(string.Join(Environment.NewLine, values));
    Console.ReadKey();
}

static string[] MethodA() // FILE METHOD - READS IN THE FILES THEN RETURNS THEM TO MAIN METHOD
{
    StreamReader dayFile = new StreamReader("c:..\\Files\\Day.txt");
    StreamReader dateFile = new StreamReader("c:..\\Files\\Date.txt");
    StreamReader sh1Open = new StreamReader("c:..\\Files\\SH1_Open.txt");
    StreamReader sh1Close = new StreamReader("c:..\\Files\\SH1_Close.txt");
    StreamReader sh1Volume = new StreamReader("c:..\\Files\\SH1_Volume.txt");
    StreamReader sh1Diff = new StreamReader("c:..\\Files\\SH1_Diff.txt");
    StreamReader sh2Open = new StreamReader("c:..\\Files\\SH2_Open.txt");
    StreamReader sh2Close = new StreamReader("c:..\\Files\\SH2_Close.txt");
    StreamReader sh2Volume = new StreamReader("c:..\\Files\\SH2_Volume.txt");
    StreamReader sh2Diff = new StreamReader("c:..\\Files\\SH2_Diff.txt");
    string dayString = dayFile.ReadToEnd();
    string dateString = dateFile.ReadToEnd();
    string Sh1OpenString = sh1Open.ReadToEnd();
    string Sh1CloseString = sh1Close.ReadToEnd();
    string Sh1VolumeString = sh1Volume.ReadToEnd();
    string Sh1DiffString = sh1Diff.ReadToEnd();
    string Sh2OpenString = sh2Open.ReadToEnd();
    string Sh2CloseString = sh2Close.ReadToEnd();
    string Sh2VolumeString = sh2Volume.ReadToEnd();
    string Sh2DiffString = sh2Diff.ReadToEnd();
    string[] fileArray = new string[] { dayString, dateString, Sh1OpenString, Sh1CloseString, Sh1VolumeString, Sh1DiffString, Sh2OpenString, Sh2CloseString, Sh2VolumeString, Sh2DiffString };
    return fileArray;
}
I think it is fine to read them all in, unless you know the files are massive.
I'd do it this way:
public string[] MethodA()
{
    var filenames = new[]
    {
        "Date.txt",
        "Day.txt",
        "SH1_Close.txt",
        "SH1_Diff.txt",
        "SH1_Open.txt",
        "SH1_Volume.txt",
        "SH2_Close.txt",
        "SH2_Diff.txt",
        "SH2_Open.txt",
        "SH2_Volume.txt",
    };
    return
        filenames
            .Select(fn => File.ReadAllText("c:..\\Files\\" + fn))
            .ToArray();
}
It depends on whether the files tend to be rather small or not. If they are several gigabytes big, you are going to have a problem. In that case, you should instead read the files sequentially.
public static IEnumerable<string> ReadLines(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
BTW: StreamReader should always be used with the using keyword, as it otherwise leaves files open until the garbage collector has finalized the instances that have gone out of scope.

What is a good way to capture a TextReader in a closure but still dispose of it properly?

This is a simplified example, to isolate the purpose of the question. In my actual scenario, the ColumnReader returned by GetColumnReader will do more work than merely ReadLine.
If I run the following program, I will get an error when I try to call Reader(), because of course the TextReader has already been disposed by the using statement.
public class Play {
    delegate string ColumnReader();
    static ColumnReader GetColumnReader(string filename) {
        using (TextReader reader = new StreamReader(filename)) {
            var headers = reader.ReadLine();
            return () => reader.ReadLine();
        }
    }
    public static void Main(string[] args) {
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader());
    }
}
Alternatively, I can remove the "using" and directly declare the TextReader, which would function, but now we no longer have a guarantee that the TextReader will be eventually closed.
Is there a way to add a "destructor" to the returned lambda function where I might be able to Dispose of the TextReader as soon as the lambda function goes out of scope (no more references)?
I also welcome other suggestions but wish to keep the basic closure structure (that is, fits into the scope of the question).
If you didn't require a lambda expression, you could create an enumerable instead.
Moving the using inside the () => {} may work in your real code; still, it's probably not what you are looking for:
static ColumnReader GetColumnReader(string filename) {
    return () => {
        using (TextReader reader = new StreamReader(filename)) {
            var headers = reader.ReadLine();
            return reader.ReadLine();
        }
    };
}
Version with IEnumerable (if you always finish the iteration):
static IEnumerable<string> GetColumnReader(string filename) {
    using (TextReader reader = new StreamReader(filename)) {
        var headers = reader.ReadLine();
        yield return reader.ReadLine();
    }
}
You'll need to create a custom IDisposable iterator if you want to support stopping in the middle of the enumeration. See how foreach handles iterators that implement IDisposable to handle such cases.
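For illustration, foreach already disposes enumerators that implement IDisposable, so with the iterator version above, breaking out of the loop early still runs the pending using block and closes the reader; a small sketch:

foreach (var line in GetColumnReader("Input.tsv")) {
    Console.WriteLine(line);
    break; // the compiler-generated Dispose still runs here, closing the file
}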
Essentially you need the scope of the disposable element outside of the delegate itself. In these situations I would make the delegate accept the disposable instance (i.e. the TextReader) rather than a file name.
public class Play {
    delegate string ColumnReader();
    static ColumnReader GetColumnReader(string filename) {
        return () => {
            using (TextReader reader = new StreamReader(filename)) {
                var headers = reader.ReadLine();
                return reader.ReadLine();
            }
        };
    }
    public static void Main(string[] args) {
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader());
    }
}
Obviously, this will open the file, read one line, and close the file every time you call the returned delegate.
If you need to open it once and then keep it open while reading through several lines, you'll be better off with an iterator block, similar to this:
public class Play {
    static IEnumerable<string> ReadLines(string filename) {
        using (TextReader reader = new StreamReader(filename)) {
            var headers = reader.ReadLine(); // I'm guessing you want to ignore this??
            while (true) {
                string line = reader.ReadLine();
                if (line == null)
                    yield break;
                yield return line;
            }
        }
    }
    public static void Main(string[] args) {
        foreach (string line in ReadLines("Input.tsv"))
            Console.WriteLine(line);
    }
}
If you really want to preserve the closure semantics, you will need to add an argument for it. Something like below, but you have to take care of calling the dispose command yourself.
public class Play {
    enum ReaderCommand {
        Read,
        Close
    }
    delegate string ColumnReader(ReaderCommand cmd);
    static ColumnReader GetColumnReader(string filename) {
        TextReader reader = new StreamReader(filename);
        var headers = reader.ReadLine();
        return (ReaderCommand cmd) => {
            switch (cmd) {
                case ReaderCommand.Read:
                    return reader.ReadLine();
                case ReaderCommand.Close:
                    reader.Dispose();
                    return null;
            }
            return null;
        };
    }
    public static void Main(string[] args) {
        var Reader = GetColumnReader("Input.tsv");
        Console.WriteLine(Reader(ReaderCommand.Read));
        Console.WriteLine(Reader(ReaderCommand.Read));
        Reader(ReaderCommand.Close);
        Console.ReadKey();
    }
}
How is this any easier than simply returning the TextReader? It seems to me you're making things much more complicated just to achieve a particular coding style.
The onus will always be on the caller to dispose of whatever is returned correctly.
I'm sure your project will give you plenty of opportunity to flex your muscles; this time, just keep it simple!
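In other words, the simple version might just return the reader and let the caller own its lifetime; a sketch (OpenColumns is a hypothetical name, and the header-skipping mirrors the question's code):

static TextReader OpenColumns(string filename) {
    var reader = new StreamReader(filename);
    reader.ReadLine(); // skip the headers line, as in the question
    return reader;
}

public static void Main(string[] args) {
    using (var reader = OpenColumns("Input.tsv")) {
        Console.WriteLine(reader.ReadLine());
    }
}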
I really like the yield solution. I have written some simple code; it shows that this works well, and that the resource can be disposed after the client leaves the foreach.
static void Main(string[] args)
{
    using (Resource resource = new Resource())
    {
        foreach (var number in resource.GetNumbers())
        {
            if (number > 2)
                break;
            Console.WriteLine(number);
        }
    }
    Console.Read();
}

public class Resource : IDisposable
{
    private List<int> _numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
    public IEnumerable<int> GetNumbers()
    {
        foreach (var number in _numbers)
            yield return number;
    }
    public void Dispose()
    {
        Console.WriteLine("Resource::Dispose()...");
    }
}
