Count all characters in file during reading CSV - c#

I just want to ask if there is any possibility to get the number of all characters in a file while reading the CSV file. I don't want to load the file into memory twice (once for parsing, a second time for counting).
I need to parse a CSV file, but I also need to get the number of all characters in this file (including delimiters). Does anyone have an idea how to do that in the most efficient way?
using (TextReader stream = new StreamReader(file.OpenReadStream()))
{
    CsvReader reader = new CsvReader(stream, GetCsvReaderOptions());
    while (reader.Read())
    {
        //parsing
    }
}
One option is to iterate through all the fields in the current reader row and, at the end, increase the length by the number of delimiters (one delimiter between each pair of fields).
I also had the idea of counting characters on the parsed objects via reflection (getting all property values from each object).
I don't think these options will be efficient.
Thanks in Advance

You can use reader.Context.RawRecord and remove the line endings (assuming you don't want to count those):
using (TextReader stream = new StreamReader(file.OpenReadStream()))
{
    var count = 0;
    CsvReader reader = new CsvReader(stream, GetCsvReaderOptions());
    while (reader.Read())
    {
        count += reader.Context.RawRecord.Replace("\n", "").Replace("\r", "").Length;
        //parsing
    }
}
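If you'd rather not depend on RawRecord (its exact location has moved between CsvHelper versions), the field-summing idea from the question might look like the sketch below. GetField(int) is a real CsvReader method, but the way the field count is obtained here (reader.Parser.Count) is an assumption based on recent CsvHelper versions, so adjust it to your version:
using (TextReader stream = new StreamReader(file.OpenReadStream()))
{
    var count = 0;
    var reader = new CsvReader(stream, GetCsvReaderOptions());
    while (reader.Read())
    {
        // Parser.Count is the field count in recent CsvHelper versions;
        // older versions expose the fields differently.
        int fields = reader.Parser.Count;
        for (int i = 0; i < fields; i++)
        {
            count += reader.GetField(i).Length;
        }
        count += fields - 1; // one delimiter between each pair of fields
        //parsing
    }
}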

The basic way of doing this could be the following:
using (TextReader stream = new StreamReader(file.OpenReadStream()))
{
    var content = stream.ReadToEnd();
    var length = content.Length;
}
The variable length will then contain the count of all characters in the passed file.
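If holding the file contents in memory once is acceptable, you could reuse that same string for parsing, so the file is still only read from the stream a single time. A sketch, reusing the CsvReader and GetCsvReaderOptions() from the question:
using (TextReader stream = new StreamReader(file.OpenReadStream()))
{
    var content = stream.ReadToEnd();
    var length = content.Length;   // all characters, including delimiters and line endings

    using (var stringReader = new StringReader(content))
    using (var reader = new CsvReader(stringReader, GetCsvReaderOptions()))
    {
        while (reader.Read())
        {
            //parsing
        }
    }
}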

Related

C# StreamReader read value of type [duplicate]

This is something that should be very simple. I just want to read numbers and words from a text file that consists of tokens separated by white space. How do you do this in C#? For example, in C++, the following code would work to read an integer, float, and word. I don't want to have to use a regex or write any special parsing code.
ifstream in("file.txt");
int int_val;
float float_val;
string string_val;
in >> int_val >> float_val >> string_val;
in.close();
Also, whenever a token is read, no more than one character beyond the token should be read in. This allows further file reading to depend on the value of the token that was read. As a concrete example, consider
string decider;
int size;
string name;
in >> decider;
if (decider == "name")
    in >> name;
else if (decider == "size")
    in >> size;
else if (!decider.empty() && decider[0] == '#')
    read_remainder_of_line(in);
Parsing a binary PNM file is also a good example of why you would like to stop reading a file as soon as a full token is read in.
Brannon's answer explains how to read binary data. If you want to read text data, you should be reading strings and then parsing them - for which there are built-in methods, of course.
For example, to read a file with data:
10
10.5
hello
You might use:
using (TextReader reader = File.OpenText("test.txt"))
{
    int x = int.Parse(reader.ReadLine());
    double y = double.Parse(reader.ReadLine());
    string z = reader.ReadLine();
}
Note that this has no error handling. In particular, it will throw an exception if the file doesn't exist, the first two lines have inappropriate data, or there are fewer than two lines. It will leave a value of null in z if the file only has two lines.
For a more robust solution which can fail more gracefully, you would want to check whether reader.ReadLine() returned null (indicating the end of the file) and use int.TryParse and double.TryParse instead of the Parse methods.
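A minimal sketch of that more robust version (still assuming the same three-line layout):
using (TextReader reader = File.OpenText("test.txt"))
{
    string firstLine = reader.ReadLine();
    string secondLine = reader.ReadLine();
    string z = reader.ReadLine();

    if (firstLine == null || secondLine == null)
    {
        Console.WriteLine("The file has fewer lines than expected.");
    }
    else if (int.TryParse(firstLine, out int x) && double.TryParse(secondLine, out double y))
    {
        Console.WriteLine($"Read {x}, {y} and \"{z}\"");
    }
    else
    {
        Console.WriteLine("The first two lines do not contain valid numbers.");
    }
}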
That's assuming there's a line separator between values. If you actually want to read a string like this:
10 10.5 hello
then the code would be very similar:
using (TextReader reader = File.OpenText("test.txt"))
{
    string text = reader.ReadLine();
    string[] bits = text.Split(' ');
    int x = int.Parse(bits[0]);
    double y = double.Parse(bits[1]);
    string z = bits[2];
}
Again, you'd want to perform appropriate error detection and handling. Note that if the file really just consisted of a single line, you may want to use File.ReadAllText instead, to make it slightly simpler. There's also File.ReadAllLines which reads the whole file into a string array of lines.
EDIT: If you need to split by any whitespace, then you'd probably be best off reading the whole file with File.ReadAllText and then using a regular expression to split it. At that point I do wonder how you represent a string containing a space.
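For example, a sketch along those lines (assuming no token itself contains whitespace, which is exactly the caveat above):
// requires System.IO and System.Text.RegularExpressions
string text = File.ReadAllText("test.txt");
string[] tokens = Regex.Split(text.Trim(), @"\s+");
int x = int.Parse(tokens[0]);
double y = double.Parse(tokens[1]);
string z = tokens[2];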
In my experience you generally know more about the format than this - whether there will be a line separator, or multiple values in the same line separated by spaces, etc.
I'd also add that mixed binary/text formats are generally unpleasant to deal with. Simple and efficient text handling tends to read into a buffer, which becomes problematic if there's binary data as well. If you need a text section in a binary file, it's generally best to include a length prefix so that just that piece of data can be decoded.
using (FileStream fs = File.OpenRead("file.txt"))
{
    BinaryReader reader = new BinaryReader(fs);
    int intVal = reader.ReadInt32();
    float floatVal = reader.ReadSingle();
    string stringVal = reader.ReadString();
}
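Note that BinaryReader.ReadString expects the length-prefixed format produced by BinaryWriter.Write(string), so a file for the snippet above would typically be written like this:
using (FileStream fs = File.Create("file.txt"))
using (BinaryWriter writer = new BinaryWriter(fs))
{
    writer.Write(10);       // Int32
    writer.Write(10.5f);    // Single
    writer.Write("hello");  // length-prefixed string
}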
I like using the StreamReader for quick and easy file access. Something like....
string file = "data_file.txt";
using (StreamReader dataStream = new StreamReader(file))
{
    string datasample;
    while ((datasample = dataStream.ReadLine()) != null)
    {
        // datasample has the current line of text - write it to the console.
        Console.WriteLine(datasample);
    }
}
Not exactly the answer to your question, but just an idea to consider if you are new to C#: If you are using a custom text file to read some configuration parameters, you might want to check XML serialization topics in .NET.
XML serialization provides a simple way to write and read XML formatted files. For example, if you have a configuration class like this:
public class Configuration
{
    public int intVal { get; set; }
    public float floatVal { get; set; }
    public string stringVal { get; set; }
}
you can simply save it and load it using the XmlSerializer class:
public void Save(Configuration config, string fileName)
{
    XmlSerializer xml = new XmlSerializer(typeof(Configuration));
    using (StreamWriter sw = new StreamWriter(fileName))
    {
        xml.Serialize(sw, config);
    }
}

public Configuration Load(string fileName)
{
    XmlSerializer xml = new XmlSerializer(typeof(Configuration));
    using (StreamReader sr = new StreamReader(fileName))
    {
        return (Configuration)xml.Deserialize(sr);
    }
}
The Save method as defined above will create a file with contents along these lines (the actual output also includes an XML declaration and namespace attributes on the root element):
<Configuration>
  <intVal>0</intVal>
  <floatVal>0.0</floatVal>
  <stringVal></stringVal>
</Configuration>
A good thing about this approach is that you don't need to change the Save and Load methods if your Configuration class changes.
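Usage might then look like this (the file name is just an example):
var config = new Configuration { intVal = 10, floatVal = 10.5f, stringVal = "hello" };
Save(config, "config.xml");

Configuration loaded = Load("config.xml");
Console.WriteLine(loaded.stringVal); // "hello"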
C# doesn't seem to have formatted stream readers like C++ (I would be happy to be corrected), so Jon Skeet's approach of reading the contents as strings and parsing them into the desired types would be the best.
Try something like this:
http://stevedonovan.blogspot.com/2005/04/reading-numbers-from-file-in-c.html
IMHO, maybe reading a C# tutorial first would be really useful, so you have the whole picture in mind before asking.
Here is my code to read numbers from a text file. It demonstrates the concept of reading numbers from a text file such as "2 3 5 7 ...":
public class NumberReader
{
    StreamReader reader;

    public NumberReader(StreamReader reader)
    {
        this.reader = reader;
    }

    public UInt64 ReadUInt64()
    {
        UInt64 result = 0;
        while (!reader.EndOfStream)
        {
            int c = reader.Read();
            if (char.IsDigit((char)c))
            {
                result = 10 * result + (UInt64)(c - '0');
            }
            else
            {
                break;
            }
        }
        return result;
    }
}
Here is sample code to use this class:
using (StreamReader reader = File.OpenText("numbers.txt"))
{
    NumberReader numbers = new NumberReader(reader);
    while (!reader.EndOfStream)
    {
        ulong lastNumber = numbers.ReadUInt64();
    }
}

Read CSV line by line in c#

I am reading a CSV file and want to read it line by line. The code below does not produce any error, but when I execute it, it reads from the middle of the CSV: it just prints the last four lines of the CSV, but I need the whole CSV data as output. Please assist with what I am missing in my code.
I want to achieve this using StreamReader only, not a parser.
using (StreamReader rd = new StreamReader(@"C:\Test.csv"))
{
    while (!rd.EndOfStream)
    {
        string splits = rd.ReadLine();
        string[] value = splits.Split(',');
        foreach (var test in value)
        {
            Console.WriteLine(test);
        }
    }
}
Test.csv
TEST Value ,13:00,,,14:00,,,15:00,,,
"Location","Time1","Transaction1","Transaction2","Tim2",
"Pune","1.07","-","-","0.99",
"Mumbai","0.55","-","-","0.59",
"Delhi","1.00","-","-","1.08",
"Chennai","0.52","-","-","0.50",
There is already a Stack Overflow article about this.
That article also provides a much better way to do the same thing:
using (TextFieldParser parser = new TextFieldParser(@"c:\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        // Processing row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            //TODO: Process field
        }
    }
}
I believe there is something wrong with your CSV file; it may contain some unexpected characters. Several things you can try:
You can let the StreamReader class detect the correct encoding of your CSV:
new StreamReader(@"C:\Test.csv", System.Text.Encoding.Default, true)
You can force your StreamReader class to read your CSV from the beginning.
rd.DiscardBufferedData();
rd.BaseStream.Seek(0, SeekOrigin.Begin);
rd.BaseStream.Position = 0;
You can try to fix your CSV file, for example by cleaning out null characters and converting Unix newlines to Windows newlines.
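A sketch of that kind of cleanup (it loads the whole file into memory and writes a copy to an example path, so only do this if the file is small enough):
string text = File.ReadAllText(@"C:\Test.csv");
text = text.Replace("\0", string.Empty)   // remove null characters
           .Replace("\r\n", "\n")         // normalize existing line endings first
           .Replace("\n", "\r\n");        // then convert everything to Windows newlines
File.WriteAllText(@"C:\Test_clean.csv", text);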

What's the best way to get all the content in between two tagged lines of a file so that you can deserialize it?

I've been noticing that the following segment of code does not scale well for large files (I think that appending to the paneContent string is slow):
string paneContent = String.Empty;
bool lineFound = false;

foreach (string line in File.ReadAllLines(path))
{
    if (line.Contains(tag))
    {
        lineFound = !lineFound;
    }
    else
    {
        if (lineFound)
        {
            paneContent += line;
        }
    }
}

using (TextReader reader = new StringReader(paneContent))
{
    data = (PaneData)(serializer.Deserialize(reader));
}
What's the best way to speed this all up? I have a file that looks like this (so I wanna get all the content in between the two different tags and then deserialize all that content):
A line with some tag
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with content I want to get into a single stream or string
A line with some tag
Note: These tags are not XML tags.
You could use a StringBuilder instead of a string; that is what StringBuilder is for. Some example code is below:
var paneContent = new StringBuilder();
bool lineFound = false;

foreach (string line in File.ReadLines(path))
{
    if (line.Contains(tag))
    {
        lineFound = !lineFound;
    }
    else
    {
        if (lineFound)
        {
            paneContent.Append(line);
        }
    }
}

using (TextReader reader = new StringReader(paneContent.ToString()))
{
    data = (PaneData)(serializer.Deserialize(reader));
}
As mentioned in this answer, a StringBuilder is preferred to a string when you are concatenating in a loop, which is the case here.
Here is an example of how to use groups with regexes and retrieve their contents afterwards.
What you want is a regex that will match your tags, capture the content in between as a group, and then retrieve the data of that group, as in the sketch below.
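Something along these lines might work. It is only a sketch, reusing the path, tag, data and serializer variables from the question, and assuming the tag text never appears inside the content you want to keep:
string fileText = File.ReadAllText(path);
string pattern =
    Regex.Escape(tag) + @"[^\n]*\n" +      // the rest of the opening tag line
    @"(?<pane>.*?)" +                      // the content, captured as a named group
    @"\r?\n[^\n]*" + Regex.Escape(tag);    // the closing tag line
Match match = Regex.Match(fileText, pattern, RegexOptions.Singleline);
if (match.Success)
{
    using (TextReader reader = new StringReader(match.Groups["pane"].Value))
    {
        data = (PaneData)(serializer.Deserialize(reader));
    }
}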
Use a StringBuilder to build your data string (paneContent). It's much faster because concatenating strings results in new memory allocations. StringBuilder pre-allocates memory (if you expect large data strings, you can customize the initial allocation).
It's a good idea to read your input file line-by-line so you can avoid loading the whole file into memory if you expect files with many lines of text.
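For example, if you expect the pane content to run to roughly a million characters, you could pre-allocate the builder up front (the figure is only an illustration):
var paneContent = new StringBuilder(1024 * 1024); // initial capacity of ~1 million characters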

How to read a csv file one line at a time and replace/edit certain lines as you go?

I have a 60GB CSV file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to do.
How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?
The process would be something like this:
1. Open a StreamWriter to a temporary file.
2. Open a StreamReader to the target file.
3. For each line:
   3.1. Split the text into columns based on a delimiter.
   3.2. Check the columns for the values you want to replace, and replace them.
   3.3. Join the column values back together using your delimiter.
   3.4. Write the line to the temporary file.
4. When you are finished, delete the target file, and move the temporary file to the target file path.
Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
Basic example just using StreamReader and StreamWriter:
var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;

var tempPath = Path.GetTempFileName();
var lineNumber = 0;
var splitExpression = new Regex(@"(" + delimiter + @")(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;

    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;

        if (string.IsNullOrEmpty(line)) return; // file is empty;

        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
        writer.WriteLine(line); // write the original header to the temp file.
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];
        if (columns.Length != headers.Length)
            throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/':
        // if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }
}

File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files.
Read here: http://msdn.microsoft.com/en-us/library/dd997372.aspx
or google "memory-mapped files".
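A minimal sketch of opening a file as a memory-mapped file and reading a slice of it (hypothetical path and sizes). Keep in mind this gives byte-level, fixed-length access: replacing values with ones of a different length still effectively means rewriting the rest of the file, so for that case the StreamReader/StreamWriter approach above is usually simpler:
// requires System.IO.MemoryMappedFiles and System.Text
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\data.csv", FileMode.Open))
using (var accessor = mmf.CreateViewAccessor(0, 4096))   // map the first 4 KB
{
    var buffer = new byte[4096];
    accessor.ReadArray(0, buffer, 0, buffer.Length);
    string firstChunk = Encoding.UTF8.GetString(buffer);
    // same-length, in-place edits are possible via accessor.Write(...)
}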
Just read the file, line by line, with StreamReader, and then use regex! The most amazing tool in the world.
using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff
    }
}
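The "// do stuff" part could then be a regex replacement per line, for example (the values here are hypothetical):
// replace a whole-word value anywhere in the current line
line = Regex.Replace(line, @"\bOldValue\b", "NewValue");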

Explaining codes

Hi, can anyone explain these lines of code? I need to understand how they work in order to proceed with what I am doing.
if (e.Error == null)
{
    Stream responseStream = e.Result;
    StreamReader responseReader = new StreamReader(responseStream);
    string response = responseReader.ReadToEnd();

    string[] split1 = Regex.Split(response, "},{");
    List<string> pri1 = new List<string>(split1);
    pri1.RemoveAt(0);
    string last = pri1[pri1.Count() - 1];
    pri1.Remove(last);
}
// Check if there was no error
if (e.Error == null)
{
// Streams are a way to read/write information from/to somewhere
// without having to manage buffer allocation and such
Stream responseStream = e.Result;
// StreamReader is a class making it easier to read from a stream
StreamReader responseReader = new StreamReader(responseStream);
// read everything that was written to a stream and convert it to a string using
// the character encoding that was specified for the stream/reader.
string response = responseReader.ReadToEnd();
// create an array of the string by using "},{" as delimiter
// string.Split would be more efficient and more straightforward.
string[] split1 = Regex.Split(response, "},{");
// create a list of the array. Lists makes it easier to work with arrays
// since you do not have to move elements manually or take care of allocations
List<string> pri1 = new List<string>(split1);
pri1.RemoveAt(0);
// get the last item in the array. It would be more efficient to use .Length instead
// of Count()
string last = pri1[pri1.Count() - 1];
// remove the last item
pri1.Remove(last);
}
I would use a LinkedList instead of List if the only thing to do was to remove the first and last elements.
It's reading the response stream as a string, making the assumption that the string consists of sequences "{...}" separated by commas, e.g.:
{X},{Y},{Z}
then splits the string on "},{", giving an array of
{X
Y
Z}
then removes the first element of the list (which still carries the leading brace, {X) and the last element (which still carries the trailing brace, Z}), leaving only the middle values.
From what I can see, it is reading from a stream that could have come from TCP.
It reads the whole chunk of data, then separates the chunk using the delimiter },{.
So if you have something like abc},{dec, it will be placed into the split1 array as 2 values: split1[0] = abc, split1[1] = dec.
After that, it basically removes the first and the last elements.
It is processing an error output.
It receives a stream from e (I guess it is an exception), and reads it.
It looks something like:
""{DDD},{I failed},{Because},{There was no signal}{ENDCODE}
It splits it into different strings, and removes the first and last entries (DDD, ENDCODE).
