I am loading a file that has some encrypted text in it. It uses a custom character table. How can I load that table from an external file, or put the character table in the code?
Thank you.
Start by going over the file and counting the lines so you can allocate an array. You could just use a list here but arrays have much better performance and you have a significant amount of items which you'll have to loop over a lot (once for each encoded char in the file) so I think you should use an array instead.
int lines = 0;
try
{
using (StreamReader sr = new StreamReader("Encoding.txt"))
{
string line;
while ((line = sr.ReadLine()) != null)
{
lines++;
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
Now we're going to allocate an array of tuples:
Tuple<string, string>[] tuples = new Tuple<string, string>[lines];
After that we'll loop over the file again adding each key-value pair as a tuple.
try
{
using (StreamReader sr = new StreamReader("Encoding.txt"))
{
string line;
for (int i =0; i < lines; i++)
{
line = sr.ReadLine();
if (line.Length > 0 && !line.StartsWith("#")) // ignore blank lines and comments
{
string[] tokens = line.Split('='); // split into key and value
// Tuple items are read-only, so build the tuple in one step.
// Trim() returns a new string, so use its result directly.
tuples[i] = Tuple.Create(tokens[0].Trim(), tokens[1].Trim());
}
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
I've given you a lot of code, although it may take a little tinkering to make it work. I didn't bother to write the second loop in a compiler, and I'm too lazy to look up things like System.String.Trim and make sure I'm using them correctly. I'll leave those things to you. This has the core logic to do it. If you would rather use a list, move the logic inside the for loop into the while loop where I count the lines.
To decode the file you're reading, you'll have to loop over this array and compare the keys or values until you have a match.
One other thing: your array of tuples is going to have some empty indexes (the array is of length lines, while the file actually contains lines - (comments + blankLines) entries). You'll need some check to make sure you're not accessing these indexes when you try to match characters. Alternatively, you could enhance the file reading so it doesn't count blank lines or comments, or remove those lines from the file you read from. The best solution would be to enhance the file reading, but that's also the most work.
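A minimal sketch of the "enhance the file reading" option, assuming the table file uses key=value lines with # comments as in the code above. CharTable.Load is a hypothetical helper name, and it swaps the tuple array for a Dictionary<string, string>, so there are no empty slots and decoding becomes a lookup instead of a loop.

```csharp
using System;
using System.Collections.Generic;

public static class CharTable
{
    // Builds the lookup table from "key=value" lines,
    // skipping blank lines and lines that start with '#'.
    public static Dictionary<string, string> Load(IEnumerable<string> lines)
    {
        var table = new Dictionary<string, string>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#", StringComparison.Ordinal))
                continue;

            // Split on the first '=' only, so the value may itself contain '='.
            int eq = line.IndexOf('=');
            if (eq < 0)
                continue; // not a key=value line; ignore it

            table[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return table;
    }
}
```

It could be called as CharTable.Load(File.ReadLines("Encoding.txt")), after which table.TryGetValue(encoded, out decoded) replaces the matching loop. If you need to look up in both directions, build a second dictionary with keys and values swapped.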
I want to get the line containing a certain word that is never repeated, such as a profile ID, without writing a loop that reads each line separately. If the word I am looking for is in the last line of the text file, this will take a lot of time to get it, and if the search process is for more than one word and extract the line that contains it, I think it will take a lot of time.
Example for line text file
name,id,image,age,place,link
string word = "13215646";
string output = string.Empty;
using (var fileStream = File.OpenRead(FileName))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8))
{
String line;
while ((line = streamReader.ReadLine()) != null)
{
string[] strList = line.Split(',');
if (word == strList[1]) // check if word = id
{
output = line;
break;
}
}
}
You can use this to search the file:
var output = File.ReadLines(FileName).
Where(line => line.Split(',')[1] == word).
FirstOrDefault();
But it won't solve this:
if the word I am looking for is in the last line of the text file, this will take a lot of time to get it, and if the search process is for more than one word and extract the line that contains it, I think it will take a lot of time.
There's not a practical way to avoid this for a basic file.
The only ways to avoid actually reading through the file are maintaining an index, which requires absolute control over everything that might write to the file, or guaranteeing that the file is already sorted by the columns that matter, in which case you can do something like a binary search.
But neither is likely for a random csv file. This is one of the reasons people use databases.
However, we also need to stop and check whether this is really a problem for you. I'd expect the code above to handle files up to a couple hundred MB in around 1 to 2 seconds on modern hardware, even if you need to look through the whole file.
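If the same file is searched many times and does not change between searches, a single pass can build an in-memory index so that later lookups avoid rescanning. A rough sketch, assuming the id is the second comma-separated field and is unique; IdIndex.Build is a hypothetical name, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class IdIndex
{
    // Reads every line once and maps the id (second field) to its full line.
    public static Dictionary<string, string> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, string>();
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');
            if (fields.Length < 2)
                continue; // malformed line: no id column

            // Keep the first occurrence if an id is unexpectedly repeated.
            if (!index.ContainsKey(fields[1]))
                index.Add(fields[1], line);
        }
        return index;
    }
}
```

Build it once with IdIndex.Build(File.ReadLines(FileName)) and then use index.TryGetValue(word, out output) for each search. The trade-off is that every line stays in memory, and the index is stale as soon as something else writes to the file.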
You can optimise the code. Here are a few ideas:
var ids = new[] { "13215646", "113" };
foreach(var line in File.ReadLines(FileName))
{
var id = line.Split(',', count: 3)[1]; // Optimization 1: Use: `count: 3`
if (ids.Contains(id)) // Optimization 2: Search for multiple ids
{
//Do what you need with the line
}
}
I am new to C# and am attempting to read in a .csv file and put each line of text into a separate list item so I can sort it later.
The .csv file is organised like so:
1;"final60";"United Kingdom";"2013-12-06 15:48:16";
2;"donnyr8";"Netherlands";"2013-12-06 15:54:32";
etc
This is my first attempt, which doesn't work. It shows no errors in Visual Studio 2010, but when I run the console program it displays the following exception instead of the list:
Exception of type 'System.OutOfMemoryException' was thrown. This is bizarre, because the .csv file only contains a small list.
try
{
// load csv file
using (StreamReader file = new StreamReader("file.csv"))
{
string line = file.ReadLine();
List<string> fileList = new List<string>();
// Do something with the lines from the file until the end of
// the file is reached.
while (line != null)
{
fileList.Add(line);
}
foreach (string fileListLine in fileList)
{
Console.WriteLine(fileListLine);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
So am I approaching this the correct way?
If the file you are loading isn't really big then you can use File.ReadAllLines:
List<string> list = File.ReadAllLines("file.csv").ToList();
As Servy pointed out in comment it would be better to use File.ReadLines method.
File.ReadLines - MSDN
The ReadLines and ReadAllLines methods differ as follows: When you use
ReadLines, you can start enumerating the collection of strings before
the whole collection is returned; when you use ReadAllLines, you must
wait for the whole array of strings be returned before you can access
the array. Therefore, when you are working with very large files,
ReadLines can be more efficient.
If you need a List<string> then you can do:
List<string> list = File.ReadLines("file.csv").ToList();
You are not updating the line variable inside the loop, so line is never null. The result is an infinite loop that keeps adding to the list until an OutOfMemoryException is thrown. Read the next line at the end of each iteration:
try
{
// load csv file
using (StreamReader file = new StreamReader("file.csv"))
{
string line = file.ReadLine();
List<string> fileList = new List<string>();
// Do something with the lines from the file until the end of
// the file is reached.
while (line != null)
{
fileList.Add(line);
line = file.ReadLine();
}
foreach (string fileListLine in fileList)
{
Console.WriteLine(fileListLine);
}
}
}
but the better approach would be
List<string> list = File.ReadLines("file.csv").ToList();
which is better than File.ReadAllLines for the following reason
From MSDN:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned;
You should use File.ReadAllLines() and then parse the strings in the array.
For extremely large files this might not be feasible and you'll have to stream the single lines in and process them one by one.
But this is something you can only decide AFTER you have seen this quick approach failing miserably. Until then, stick to the quick and dirty.
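As a rough illustration of the "parse the strings" step, here is a sketch for the semicolon-separated sample in the question. It assumes no field contains a ';' itself; CsvRows.Parse is a hypothetical helper name, and a CSV library is the safer choice once the data gets less regular.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvRows
{
    // Splits each non-blank line on ';' and strips the surrounding quotes
    // from every field. The trailing ';' is removed first so that it does
    // not produce an empty last field.
    public static List<string[]> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => line.Trim().Length > 0)
            .Select(line => line.Trim().TrimEnd(';')
                                .Split(';')
                                .Select(field => field.Trim('"'))
                                .ToArray())
            .ToList();
    }
}
```

With the rows parsed, sorting is one LINQ call, for example CsvRows.Parse(File.ReadAllLines("file.csv")).OrderBy(row => row[2]) to sort by the country column.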
I have a problem with the StreamReader. I want to read just one line from a text file.
I want a specific line, like the seventh line, and I don't know how to do that.
Is there a function or something like that? Like file.ReadLine(number 7)?
The simplest approach would probably be to use LINQ combined with File.ReadLines:
string line = File.ReadLines("foo.txt").ElementAt(6); // 0-based
You could use File.ReadAllLines instead, but that would read the whole file even if you only want an early one. If you need various different lines of course, it means you can read them in one go. You could write a method to read multiple specific lines efficiently (i.e. in one pass, but no more than one line at a time) reasonably easily, but it would be overkill if you only want one line.
Note that this will throw an exception if there aren't enough lines - you could use ElementAtOrDefault if you want to handle that without any exceptions.
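The one-pass method for several specific lines mentioned above could look something like the following sketch. LineReader.Pick is a made-up name; it takes any line sequence (such as File.ReadLines) plus the 0-based line numbers you want, and stops reading as soon as all of them have been found.

```csharp
using System;
using System.Collections.Generic;

public static class LineReader
{
    // Returns the requested 0-based lines, keyed by line number.
    // Line numbers past the end of the input are simply absent from the result.
    public static Dictionary<int, string> Pick(IEnumerable<string> lines, IEnumerable<int> wanted)
    {
        var remaining = new HashSet<int>(wanted);
        var found = new Dictionary<int, string>();
        if (remaining.Count == 0)
            return found;

        int number = 0;
        foreach (string line in lines)
        {
            if (remaining.Remove(number))
            {
                found[number] = line;
                if (remaining.Count == 0)
                    break; // everything found: stop reading early
            }
            number++;
        }
        return found;
    }
}
```

For example, LineReader.Pick(File.ReadLines("foo.txt"), new[] { 6, 20 }) holds only one line in memory at a time and never reads past line 20.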
If you want to read line by number it's better to use
string line = File.ReadLines(fileName).Skip(N).FirstOrDefault();
Thus you will avoid reading all lines from file, and you'll read lines only until you get line you need. If you need several lines, then it's better to read all lines to array, and then get your lines from that array:
string[] lines = File.ReadAllLines(fileName);
if (lines.Count() > N)
line = lines[N];
If you want to read a specific line using StreamReader:
Suppose the text file contains the lines Line1, Line2, Line3, Line4.
Every call to the ReadLine method advances the reader by one line.
That means you can write your own function and pass the line number to it as a parameter.
You can do it like this:
string l1, l2, l3, l4;
StreamReader sr = new StreamReader(sourcePath);
l1 = sr.ReadLine(); // Line 1
l2 = sr.ReadLine(); // Line 2
l3 = sr.ReadLine(); // Line 3
public string StreamReadLine(string sourcepath, int lineNum)
{
string strLine = "N/A";
try
{
// The using block closes the file when we are done.
using (StreamReader sr = new StreamReader(sourcepath))
{
// lineNum is 0-based: keep reading until we reach that line.
for (var i = 0; i <= lineNum; i++)
{
strLine = sr.ReadLine(); // null if the file has fewer lines
}
}
}
catch (Exception ex)
{
strLine = ex.ToString();
}
return strLine;
}
How would it be possible to search for a string e.g. #Test1 in a text file and then output the line below it as a string e.g.
Test.txt
#Test1
86/100
#Test2
99/100
#Test3
13/100
So if #Test2 was the search keyword, "99/100" would be returned as a string.
Parse the file once, store the results in a dictionary. Then lookup in the dictionary.
var dictionary = new Dictionary<string, string>();
var lines = File.ReadLines("testScores.txt");
var e = lines.GetEnumerator();
while(e.MoveNext()) {
if(e.Current.StartsWith("#Test")) {
string test = e.Current;
if(e.MoveNext()) {
dictionary.Add(test, e.Current);
}
else {
throw new Exception("File not in expected format.");
}
}
}
Now you can just say
Console.WriteLine(dictionary["#Test1"]);
etc.
Also, long-term, I recommend moving to a database.
Use readline and search for the string (ex. #Test1) and then use the next line as input.
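A compact way to express that idea, as a sketch: skip lines until the marker is found, skip the marker itself, and take whatever comes next. NextLine.After is a hypothetical helper name, and it assumes the marker sits on a line of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NextLine
{
    // Returns the line that follows the first line equal to marker,
    // or null if the marker is missing or is the last line.
    public static string After(IEnumerable<string> lines, string marker)
    {
        return lines
            .SkipWhile(line => line.Trim() != marker)
            .Skip(1)
            .FirstOrDefault();
    }
}
```

Called as NextLine.After(File.ReadLines("Test.txt"), "#Test2"), it stops reading the file as soon as the score line has been returned.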
If the file format is exactly as shown above, you can do this:
1. Read all lines until EOF into an array.
2. Loop over the array and skip any empty strings.
Keep the remaining values in another array or list.
Now the items come in pairs, one after the other, so when you loop and read items [i] and [i+1],
you get the test name and its score.
Hope this helps.
How about RegularExpressions? here's a good example
This should do it for you:
int lineCounter = 0;
StreamReader strReader = new StreamReader(path);
while (!strReader.EndOfStream)
{
string fileLine = strReader.ReadLine();
if (Regex.IsMatch(fileLine,pattern))
{
Console.WriteLine(pattern + " found in line " + lineCounter);
}
lineCounter++;
}
I have an Excel spreadsheet being converted into a CSV file in C#, but am having a problem dealing with line breaks. For instance:
"John","23","555-5555"
"Peter","24","555-5
555"
"Mary,"21","555-5555"
When I read the CSV file, if a record does not start with a double quote ("), then a line break is there by mistake and I have to remove it. I have some CSV reader classes from the internet, but I am concerned that they will fail on the line breaks.
How should I handle these line breaks?
Thanks everybody very much for your help.
Here is what I've done so far. My records have a fixed format and all start with
JTW;...;....;...;
JTW;...;...;....
JTW;....;...;..
..;...;... (wrong record, line break inserted)
JTW;...;...
So I check for the ; in the [3] position of each line. If it is there, I write the line as a new record; if not, I append the line to the previous one (removing the line break).
I'm having problems now because I'm saving the file as a txt.
By the way, I am converting the Excel spreadsheet to csv by saving as csv in Excel. But I'm not sure if the client is doing that.
So the file as a TXT is perfect. I've checked the records and totals. But now I have to convert it back to csv, and I would really like to do it in the program. Does anybody know how?
Here is my code:
namespace EditorCSV
{
class Program
{
static void Main(string[] args)
{
ReadFromFile("c:\\source.csv");
}
static void ReadFromFile(string filename)
{
StreamReader SR;
StreamWriter SW;
SW = File.CreateText("c:\\target.csv");
string S;
char C='a';
int i=0;
SR=File.OpenText(filename);
S=SR.ReadLine();
SW.Write(S);
S = SR.ReadLine();
while(S!=null)
{
try { C = S[3]; }
catch (IndexOutOfRangeException exception){
bool t = false;
while (t == false)
{
t = true;
S = SR.ReadLine();
try { C = S[3]; }
catch (IndexOutOfRangeException ex) { S = SR.ReadLine(); t = false; }
}
}
if( C.Equals(';'))
{
SW.Write("\r\n" + S);
i = i + 1;
}
else
{
SW.Write(S);
}
S=SR.ReadLine();
}
SR.Close();
SW.Close();
Console.WriteLine("Records Processed: " + i.ToString() + " .");
Console.WriteLine("File Created Successfully");
Console.ReadKey();
}
}
}
CSV has predefined ways of handling that. This site provides an easy to read explanation of the standard way to handle all the caveats of CSV.
Nevertheless, there is really no reason to not use a solid, open source library for reading and writing CSV files to avoid making non-standard mistakes. LINQtoCSV is my favorite library for this. It supports reading and writing in a clean and simple way.
Alternatively, this SO question on CSV libraries will give you the list of the most popular choices.
Rather than check if the current line is missing the (") as the first character, check instead to see if the last character is a ("). If it is not, you know you have a line break, and you can read the next line and merge it together.
I am assuming your example data was accurate - fields were wrapped in quotes. If quotes might not delimit a text field (or new-lines are somehow found in non-text data), then all bets are off!
There is a built-in method for reading CSV files in .NET (requires Microsoft.VisualBasic assembly reference added):
public static IEnumerable<string[]> ReadSV(TextReader reader, params string[] separators)
{
var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader);
parser.SetDelimiters(separators);
while (!parser.EndOfData)
yield return parser.ReadFields();
}
If you're dealing with really large files this CSV reader claims to be the fastest one you'll find: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
I've used this piece of code recently to parse rows from a CSV file (this is a simplified version):
private void Parse(TextReader reader)
{
var row = new List<string>();
var isStringBlock = false;
var sb = new StringBuilder();
long charIndex = 0;
int currentLineCount = 0;
while (reader.Peek() != -1)
{
charIndex++;
char c = (char)reader.Read();
if (c == '"')
isStringBlock = !isStringBlock;
if (c == separator && !isStringBlock) //end of word
{
row.Add(sb.ToString().Trim()); //add word
sb.Length = 0;
}
else if (c == '\n' && !isStringBlock) //end of line
{
row.Add(sb.ToString().Trim()); //add last word in line
sb.Length = 0;
//DO SOMETHING WITH row HERE!
currentLineCount++;
row = new List<string>();
}
else
{
if (c != '"' && c != '\r') sb.Append(c == '\n' ? ' ' : c);
}
}
row.Add(sb.ToString().Trim()); //add last word
//DO SOMETHING WITH LAST row HERE!
}
Try CsvHelper (a library I maintain). It ignores empty rows. I believe there is a flag you can set in FastCsvReader to have it handle empty rows also.
Heed the advice from the experts and don't roll your own CSV parser.
Your first thought is, "How do I handle new line breaks?"
Your next thought is, "I need to handle commas inside of quotes."
Your next thought will be, "Oh, crap, I need to handle quotes inside of quotes. Escaped quotes. Double quotes. Single quotes..."
It's a road to madness. Don't write your own. Find a library with an extensive unit test coverage that hits all the hard parts and has gone through hell for you. For .NET, use the free CsvHelper library.
Maybe you could count the double quotes (") in each line as you call ReadLine(). If the count is odd, raise a flag. You could then either ignore those lines, or read the following line, merge the two, and drop the "\n" between them.
What I usually do is read the text in character by character, as opposed to line by line, because of this very problem.
As you're reading each character, you should be able to figure out where each cell starts and stops, and also tell a line break that ends a row from one inside a cell: if I remember correctly, for Excel-generated files anyway, rows end with \r\n, and newlines inside cells are only \n.
There is an example parser in C# that seems to handle your case correctly. Then you can read your data in and purge the line breaks out of it post-read.
Part 2 is the parser, and there is a Part 1 that covers the writer portion.
Read the line.
Split into columns(fields).
If you have enough columns expected for each line, then process.
If not, read the next line, and capture the remaining columns until you get what you need.
Repeat.
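The steps above can be sketched as follows. This is only an illustration under strong assumptions: the separator never appears inside a field, and a stray line break never falls inside the last column (a record that already has enough columns is treated as complete). RecordJoiner.Join is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class RecordJoiner
{
    // Glues physical lines together until each record has at least
    // expectedColumns fields, then emits it as one logical record.
    public static List<string> Join(IEnumerable<string> lines, char separator, int expectedColumns)
    {
        var records = new List<string>();
        string pending = null;
        foreach (string line in lines)
        {
            // Append without the line break that split the record.
            pending = pending == null ? line : pending + line;
            if (pending.Split(separator).Length >= expectedColumns)
            {
                records.Add(pending);
                pending = null;
            }
        }
        if (pending != null)
            records.Add(pending); // trailing incomplete record, kept as-is
        return records;
    }
}
```

Because a break inside the last column goes undetected, counting quotes or using a real CSV parser is more reliable when fields are quoted.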
A somewhat simple regular expression could be used on each line. When it matches, you process each field from the match. When it doesn't find a match, you skip that line.
The regular expression could look something like this.
Match match = Regex.Match(line, @"^(?:,?(?:(?<q>['""])(?<field>.*?)\k<q>|(?<field>[^,]*)))+$");
if (match.Success)
{
foreach (Capture capture in match.Groups["field"].Captures)
{
string fieldValue = capture.Value;
// Use the value.
}
}
Have a look at FileHelpers Library
It supports reading\writing CSV with line breaks as well as reading\writing to excel
The LINQy solution:
string csvText = File.ReadAllText("C:\\Test.txt");
var query = csvText
.Replace(Environment.NewLine, string.Empty)
.Replace("\"\"", "\",\"").Split(',')
.Select((i, n) => new { i, n }).GroupBy(a => a.n / 3);
You might also check out my CSV parser SoftCircuits.CsvParser on NuGet. It will not only parse a CSV file but--if wanted--can also automatically map column values to your class properties. And it runs nearly four times faster than CsvHelper.
For a line break to exist in a CSV, there must be an open double quote that's not closed.
Assuming that every CSV cell opens and closes with a double quote, just check whether the line has an odd number of quotation marks
my_string.Count(c => c == '"') % 2 == 1
and if that's the case, continue reading until you have the even number.
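Put together, the quote-counting idea might look like this sketch. QuoteJoiner.Join is a hypothetical helper; it keeps appending physical lines until the accumulated text has an even number of double quotes, and it drops the stray line break as the question asks. Escaped quotes written as "" add two to the count, so they do not disturb the parity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuoteJoiner
{
    // Merges physical lines into logical records: a record is complete
    // once it contains an even number of double quotes.
    public static List<string> Join(IEnumerable<string> lines)
    {
        var records = new List<string>();
        string pending = null;
        foreach (string line in lines)
        {
            pending = pending == null ? line : pending + line;
            if (pending.Count(c => c == '"') % 2 == 0)
            {
                records.Add(pending);
                pending = null;
            }
        }
        if (pending != null)
            records.Add(pending); // quotes still unbalanced at end of file
        return records;
    }
}
```

Feeding it File.ReadLines on the exported file yields one string per record, which can then be handed to whichever CSV parser you use.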