I have a C# console application which parses a .txt file. The txt file has 4 values on each line. So here are a couple of samples:
c:\ecpg\myfolder\no_space.cfm 20160803 01:09:54 1574
c:\ecpg\myfolder\file with space.cfm 20160803 01:09:54 1574
c:\myfolder\.project 20170221 07:54:10 265
I am using the following to split based on white spaces in each row:
while ((line = file.ReadLine()) != null)
{
string[] parts = line.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
}
Problem is that, in case of row 2, there is a space in the file name and so that's failing the parsing because now I have 5 values instead of 4. How can I prevent this? Maybe some way to detect if there is a . (dot) soon after the space?
Thank you!
You can use Regex to split your string, it will give you better output. Please check my code:
while ((line = file.ReadLine()) != null)
{
string[] parts = Regex.Split(line, @"(\s+\s+)");
}
I've also put it on DotNetFiddle, where you can check it.
EDIT: I've edited the code so that it covers all of your scenarios. New Solution Fiddle
while ((line = file.ReadLine()) != null)
{
string partOne = Regex.Match(line, @"[a-z](.*)[a-z]").Value;
//string[] parts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
string[] parts;
if (!string.IsNullOrEmpty(partOne))
{
parts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
}
else
{
parts = Regex.Split(line, @"(\s+)");
}
}
}
Final Code:
List<string> parts = new List<string>();
while ((line = file.ReadLine()) != null)
{
parts = new List<string>();
//string partOne = Regex.Match(line, @"[A-Za-z](.*)[A-Za-z]").Value;
// Updated regex to handle numeric values in part one.
string partOne = Regex.Match(line, @"[A-Za-z](.*)([A-Za-z]|([A-Za-z]{1}[0-9]))(.*?)\s").Value.Trim();
parts.Add(partOne);
string[] finalParts;
if (!string.IsNullOrEmpty(partOne))
{
finalParts = Regex.Split(line.Replace(partOne, ""), @"(\s+)");
}
else
{
finalParts = Regex.Split(line, @"(\s+)");
}
foreach (string part in finalParts)
{
if (!string.IsNullOrEmpty(part.Trim()))
{
parts.Add(part);
}
}
Console.WriteLine(parts[0] + " " + parts[1] + " " + parts[2] + " " + parts[3]);
}
This method is manual but works. It supports filenames with any number of spaces.
It works by locating spaces from the end of the string, retrieving three fields in the loop and finally the filename. There's plenty of room for optimization here if you're parsing large files.
while ((line = file.ReadLine()) != null)
{
string[] parts = new string[4];
int n = -1;
for (int idx = 0; idx < 3; idx++)
{
n = line.LastIndexOf(' ');
parts[3-idx] = line.Substring(n + 1);
line = line.Substring(0, n).TrimEnd();
}
parts[0] = line; // filename
}
If one or more of the fields are missing you can do simple pattern checks. In your file the first parameter is the filename, the second an 8-digit date, the third the time of day and the fourth (probably) the file size. In this case this code should be more robust (I didn't try compiling it so it might contain typos):
while ((line = file.ReadLine()) != null)
{
string[] parts = new string[4];
int n = -1;
for (int idx = 0; idx < 3; idx++)
{
n = line.LastIndexOf(' ');
if (n == -1 || n == 0) break;
string part = line.Substring(n + 1);
if (part.IndexOf(':') > 0) parts[2] = part;
else if (part.Length == 8) parts[1] = part;
else parts[3] = part; // assuming you don't have 8-digit filesizes
line = line.Substring(0, n).TrimEnd();
}
parts[0] = line.TrimEnd(); // filename
}
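The same "read the three trailing fields, keep the rest as the filename" idea can also be sketched with a single regular expression. This is only a sketch under the assumption that every line ends with an 8-digit date, an hh:mm:ss time and a numeric size, as in the samples; ListingParser, LinePattern and Parse are hypothetical names, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class ListingParser
{
    // Assumed layout: <path> <8-digit date> <hh:mm:ss> <size>.
    // The path group is lazy, but the pattern is anchored at the end of the line,
    // so spaces inside the path end up in the path group.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<path>.+?)\s+(?<date>\d{8})\s+(?<time>\d{2}:\d{2}:\d{2})\s+(?<size>\d+)\s*$");

    // Returns { path, date, time, size }, or null when the line does not fit the assumed layout.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups["path"].Value,
            m.Groups["date"].Value,
            m.Groups["time"].Value,
            m.Groups["size"].Value
        };
    }
}
```

Inside the ReadLine loop you would call ListingParser.Parse(line) and skip (or log) the lines for which it returns null.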
Split on the period instead. That will give you two separate strings: the file and the rest. Split only the second string on space. The very first element of the second string split is your file extension:
while ((line = file.ReadLine()) != null)
{
string[] parts = line.Split('.');
string[] secondSplit = parts[1].Split(' ');
// put together the file path
string filePath = parts[0] + "." + secondSplit[0];
// Do something here with the rest of the second split: secondSplit
}
Related
I'd like to know on how I can check if the cell inside my CSV file has a comma, so that I can then replace it with white space. Below is the image of my CSV file:
My CSV file
Here is my C# code:
if (System.IO.File.Exists(fName))
{
System.IO.StreamReader objReader = new System.IO.StreamReader(fName);
do
{
textLine = objReader.ReadLine();
if (textLine != "")
{
splitLine = textLine.Split(',');
if (splitLine[0] != "" || splitLine[1] != "")
{
dataGridView1.Rows.Add(splitLine);
}
}
} while (objReader.Peek() != -1);
}
return true;
Below is the result of the above code.
Result
Instead of Blk 32, Lot 3, the Lot 3 got separated due to the comma after the Blk 32. I want them to become one as Blk 32, Lot 3.
I think for your case, it's better to add quotation marks around your columns when exporting to csv in Excel. You then can use regex to get those matching column data.
http://www.lenashore.com/2012/04/how-to-add-quotes-to-your-cells-in-excel-automatically/
This happens because of the comma inside a cell, which the CSV file marks by enclosing the cell in quotation marks. It can be solved by tracking these quoted terms and merging them like this:
splitLine = textLine.Split(',');
List<string> tmp = new List<string>();
for (i = 0; i < splitLine.Length; i++)
{
// Merge two adjacent pieces when the first opens a quote and the second closes it.
if (i + 1 < splitLine.Length && splitLine[i] != "" && splitLine[i + 1] != "" && splitLine[i][0] == '"' && splitLine[i + 1][splitLine[i + 1].Length - 1] == '"')
{
tmp.Add(splitLine[i] + "," + splitLine[i + 1]);
i++;
}
else
tmp.Add(splitLine[i]);
}
splitLine = tmp.ToArray();
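If a quoted cell can contain more than one comma, merging pairs is not enough. A small character-by-character splitter is one way around that. The following is a minimal sketch, assuming that cells containing commas are wrapped in double quotes, that a doubled quote ("") inside a quoted cell means a literal quote, and that no cell spans several lines; CsvLine and Split are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Splits one CSV line on commas that are outside double quotes.
    // The surrounding quotes are dropped from the returned fields.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote inside a quoted cell
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);   // commas are kept while inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());  // last field has no trailing comma
        return fields;
    }
}
```

With this, the row could be added as dataGridView1.Rows.Add(CsvLine.Split(textLine).ToArray()); instead of using textLine.Split(',').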
I am trying to search through a text file for a string, once I have found this string I need to display this line and then also display the 6 preceding lines i.e. which will contain the details about the error message in the string. I have been searching for similar code and have found the following code but it doesn’t meet my requirements, just wondering if it's possible to do this.
Thanks,
John.
private static void Main(string[] args)
{
string cacheline = "";
string line;
System.IO.StreamReader file = new
System.IO.StreamReader(@"D:\Temp\AccessOutlook.txt");
List<string> lines = new List<string>();
while ((line = file.ReadLine()) != null)
{
if (line.Contains("errors"))
{
lines.Add(cacheline);
}
cacheline = line;
}
file.Close();
foreach (var l in lines)
{
Console.WriteLine(l);
}
}
}
This is probably what you want:
static void Main(string[] args)
{
Queue<string> lines = new Queue<string>();
using (var reader = new StreamReader(args[0]))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.Contains("error"))
{
Console.WriteLine("----- ERROR -----");
foreach (var errLine in lines)
Console.WriteLine(errLine);
Console.WriteLine(line);
Console.WriteLine("-----------------");
}
lines.Enqueue(line);
while (lines.Count > 6)
lines.Dequeue();
}
}
}
You can keep caching the lines until you find the line you are looking for:
using(var file = new StreamReader(@"D:\Temp\AccessOutlook.txt"))
{
List<string> lines = new List<string>();
while ((line = file.ReadLine()) != null)
{
if (!line.Contains(myString))
{
lines.Add(line);
}
else
{
Console.WriteLine(string.Join(Environment.NewLine, lines.Concat(new[] { line })));
}
if(lines.Count > 6) lines.RemoveAt(0);
}
}
string filename = "filename"; // Put your own filename here.
string target = "target"; // Put your target string here.
int numLinesToShow = 7;
var lines = File.ReadAllLines(filename);
int index = Array.FindIndex(lines, element => element.Contains(target));
if (index >= 0)
{
int start = Math.Max(0, index - numLinesToShow + 1);
var result = lines.Skip(start).Take(numLinesToShow).ToList();
// Use result.
}
The code below will open the file, search for the line you want, and then write the 6 preceding lines to the Console.
var lines = File.ReadAllLines(filePath);
int lineIndex;
for (lineIndex = 0; lineIndex < lines.Length; lineIndex++)
{
if (lines[lineIndex] == textToFind)
{
break;
}
}
// lineIndex == lines.Length means the text was not found, so print nothing.
if (lineIndex < lines.Length)
{
var startLine = Math.Max(0, lineIndex - 6);
for (int i = startLine; i < lineIndex; i++)
{
Console.WriteLine(lines[i]);
}
}
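If the file can contain several matches, the sliding-window idea from the Queue answer can be wrapped in a reusable method. This is just a sketch: ContextSearch and Find are hypothetical names, and it assumes a match is any line that contains the target text.

```csharp
using System.Collections.Generic;

static class ContextSearch
{
    // Yields one block per matching line: up to `before` preceding lines, then the match itself.
    public static IEnumerable<List<string>> Find(IEnumerable<string> lines, string target, int before)
    {
        var window = new Queue<string>(); // holds at most `before` recent lines
        foreach (string line in lines)
        {
            if (line.Contains(target))
            {
                var block = new List<string>(window);
                block.Add(line);
                yield return block;
            }

            window.Enqueue(line);
            if (window.Count > before)
                window.Dequeue();
        }
    }
}
```

Called as ContextSearch.Find(File.ReadLines(path), "error", 6), it reads the file lazily, so only the last few lines are held in memory at any time.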
I'm attempting to parse a text file containing data that is being used on a remote FTP server. The data is delimited by an equals sign (=) and I'm attempting to load each row into two columns in a DataGridView. The code I have written works fine except when an equals character appears in the second column's value. When this happens, the row is ignored, even though I specify a maximum count of 2. I'd prefer not to change the delimiter if possible.
Here is the code that is being problematic:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
StreamReader fileReader = new StreamReader(fileLocation);
String fileData = fileReader.ReadToEnd();
String[] rows = fileData.Split("\n".ToCharArray());
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
String[] items = str.Split(new char[] { delimiter }, 2, StringSplitOptions.RemoveEmptyEntries);
if (items.Length == 2)
{
dataGrid_FileContents.Rows.Add(items[0], items[1]);
}
}
fileReader.Close();
And an example of the file being loaded:
boats=123
cats=234-f
cars==1
It works as intended for the first two rows and then ignores the last row as it ends up creating a String[] with 1 element and two String[]s with zero elements.
Try the following. It will capture the value before and after the first '=', correctly parsing the cars==1 scenario.
String[] items = str.Split(new char[] { delimiter }, 2, StringSplitOptions.None);
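To see what that call does with the sample rows, here is a small sketch. KeyValueLine and Parse are hypothetical names; the TrimEnd('\r') is an extra assumption on my part, added because the question splits the file on "\n" only, which leaves a trailing "\r" on each row if the file uses Windows line endings.

```csharp
using System;

static class KeyValueLine
{
    // Splits "key=value" on the first '=' only.
    // Returns null when the row contains no '=' at all.
    public static string[] Parse(string row)
    {
        string[] items = row.TrimEnd('\r').Split(new[] { '=' }, 2, StringSplitOptions.None);
        return items.Length == 2 ? items : null;
    }
}
```

For "boats=123" this gives "boats" and "123"; for "cars==1" it gives "cars" and "=1", so the second equals sign stays in the value.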
A different solution, if you want everything after the first equals then you could approach this problem using string.IndexOf
for(int i = 0; i < rows.Length; i++)
{
String str = rows[i];
int pos = str.IndexOf(delimiter);
if (pos != -1)
{
string first = str.Substring(0, pos); // pos is the length of the text before the delimiter
string second = str.Substring(pos + 1);
dataGrid_FileContents.Rows.Add(first, second);
}
}
Just read all the items delimited by '=' in each row.
Then iterate over the items, skip the empty ones, and use the remaining data to fill the grid.
Here is a snippet that illustrates this:
http://dotnetfiddle.net/msVho2
Your snippet can be transformed into something like the one below:
dataGrid_FileContents.Rows.Clear();
char delimiter = '=';
using(StreamReader fileReader = new StreamReader(fileLocation))
{
string[] data = new string[2];
while(true)
{
string row = fileReader.ReadLine();
if(row == null)
break;
string[] items = row.Split(delimiter);
int data_index = 0;
foreach(string item in items)
{
if(data_index >= data.Length)
{
//TODO: log warning
break;
}
if(!string.IsNullOrWhiteSpace(item))
{
data[data_index++] = item;
}
}
if(data_index < data.Length)
{
//TODO: log error, only 1 item in row
continue;
}
dataGrid_FileContents.Rows.Add(data[0], data[1]);
}
}
In a Windows Forms C# app, I have a textbox where users paste log data, and it sorts it. I need to check each line individually, so I split the input by the new line, but if there are a lot of lines, greater than 100,000 or so, it throws an OutOfMemoryException.
My code looks like this:
StringSplitOptions splitOptions = new StringSplitOptions();
if(removeEmptyLines_CB.Checked)
splitOptions = StringSplitOptions.RemoveEmptyEntries;
else
splitOptions = StringSplitOptions.None;
List<string> outputLines = new List<string>();
foreach(string line in input_TB.Text.Split(new string[] { "\r\n", "\n" }, splitOptions))
{
if(line.Contains(inputCompare_TB.Text))
outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);
The problem comes from when I split the textbox text by line, here input_TB.Text.Split(new string[] { "\r\n", "\n" }
Is there a better way to do this? I've thought about taking the first X amount of text, truncating at a new line and repeat until everything has been read, but this seems tedious. Or is there a way to allocate more memory for it?
Thanks,
Garrett
Update
Thanks to Attila, I came up with this and it seems to work. Thanks
StringReader reader = new StringReader(input_TB.Text);
string line;
while((line = reader.ReadLine()) != null)
{
if(line.Contains(inputCompare_TB.Text))
outputLines.Add(line);
}
output_TB.Text = string.Join(Environment.NewLine, outputLines);
The better way to do this would be to extract and process one line at a time, and use a StringBuilder to create the result:
StringBuilder outputTxt = new StringBuilder();
string txt = input_TB.Text;
int txtIndex = 0;
while (txtIndex < txt.Length) {
int startLineIndex = txtIndex;
GetMore:
while (txtIndex < txt.Length && txt[txtIndex] != '\r' && txt[txtIndex] != '\n') {
txtIndex++;
}
if (txtIndex < txt.Length && txt[txtIndex] == '\r' && (txtIndex == txt.Length-1 || txt[txtIndex+1] != '\n')) {
txtIndex++;
goto GetMore;
}
string line = txt.Substring(startLineIndex, txtIndex-startLineIndex);
if (line.Contains(inputCompare_TB.Text)) {
if (outputTxt.Length > 0)
outputTxt.Append(Environment.NewLine);
outputTxt.Append(line);
}
txtIndex++;
}
output_TB.Text = outputTxt.ToString();
Pre-emptive comment: someone will object to the goto - but it is what's needed here, the alternatives are much more complex (reg exp for example), or fake the goto with another loop and continue or break
Using a StringReader to split the lines is a much cleaner solution, and its ReadLine method treats \r\n, \n, and a lone \r as line terminators:
StringReader reader = new StringReader(input_TB.Text);
StringBuilder outputTxt = new StringBuilder();
string compareTxt = inputCompare_TB.Text;
string line;
while((line = reader.ReadLine()) != null) {
if (line.Contains(compareTxt)) {
if (outputTxt.Length > 0)
outputTxt.Append(Environment.NewLine);
outputTxt.Append(line);
}
}
output_TB.Text = outputTxt.ToString();
Split will have to duplicate the memory need of the original text, plus overhead of string objects for each line. If this causes memory issues, a reliable way of processing the input is to parse one line at a time.
I guess the only way to do this on large text files is to open the file manually and use a StreamReader. Here is an example how to do this.
You can avoid creating strings for all lines and the array by creating the string for each line one at a time:
var eol = new[] { '\r', '\n' };
var pos = 0;
while (pos < input.Length)
{
var i = input.IndexOfAny(eol, pos);
if (i < 0)
{
i = input.Length;
}
if (i != pos)
{
var line = input.Substring(pos, i - pos);
// process line
}
pos = i + 1;
}
On the other hand, this article argues that the Split method itself is implemented poorly. Read it and draw your own conclusions.
Like Attila said, you have to parse line by line.
I have a large string separated by newline character. This string contains 100 lines. I want to split these line into small chunks say chunk of 20 also based on newline character.
Let's say the string variable is like this,
Line1
This is line2
Line3 is here
I am Line4
Now I want to split this large string variable into small chunks of 2. The result should be 2 strings as,
Line1
This is line2

Line3 is here
I am Line4
Using Split function, I am not getting the expected results. Please help me in achieving this.
Thanks in advance,
Vijay
The simple approach (Split on Environment.NewLine, then loop and append):
public static List<string> GetStringSegments(string originalString, int linesPerSegment)
{
List<string> segments = new List<string>();
string[] allLines = originalString.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
int linesProcessed = 0;
for (int i = 0; i < allLines.Length; i++)
{
sb.AppendLine(allLines[i]);
linesProcessed++;
if (linesProcessed == linesPerSegment
|| i == allLines.Length-1)
{
segments.Add(sb.ToString());
sb.Clear();
linesProcessed = 0;
}
}
return segments;
}
The above approach is slightly inefficient since it requires splitting the string first into individual lines, which creates unnecessary strings. A string of 1000 lines will create an array of 1000 strings. We can improve this if we just scan the string and search for \n:
public static List<string> GetStringSegments(string original, int linesPerSegment)
{
List<string> segments = new List<string>();
int startIndex = 0;
int newLinesEncountered = 0;
for (int i = 0; i < original.Length; i++)
{
if (original[i] == '\n')
{
newLinesEncountered++;
}
if (newLinesEncountered == linesPerSegment
|| i == original.Length - 1)
{
segments.Add(original.Substring(startIndex, (i - startIndex + 1)));
startIndex = i + 1;
newLinesEncountered = 0;
}
}
return segments;
}
You can use something like the batch operator from http://www.make-awesome.com/2010/08/batch-or-partition-a-collection-with-linq
string s = "[YOUR DATA]";
var lines = s.Split(new[]{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
foreach(var batch in lines.Batch(20))
{
foreach(var batchLine in batch)
{
Console.WriteLine(batchLine);
}
}
static class LinqEx
{
// from http://www.make-awesome.com/2010/08/batch-or-partition-a-collection-with-linq
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection,
int batchSize)
{
List<T> nextbatch = new List<T>(batchSize);
foreach (T item in collection)
{
nextbatch.Add(item);
if (nextbatch.Count == batchSize)
{
yield return nextbatch;
nextbatch = new List<T>(batchSize);
}
}
if (nextbatch.Count > 0)
yield return nextbatch;
}
}
As several people mentioned, using string.Split will split the whole string into memory, which might be an allocation-heavy operation. This is why we have the TextReader class and its descendants, which should provide better memory performance, and might also be clearer, logically:
var newStringCollection = new List<StringBuilder>();
using (var reader = new StringReader(myString))
{
StringWriter newStringWriter = null;
int lineCounter = 0;
string line;
while ((line = reader.ReadLine()) != null)
{
if (string.IsNullOrEmpty(line))
continue; // skip blank lines
if (lineCounter % 20 == 0)
{
// start a new chunk every 20 lines
var newString = new StringBuilder();
newStringWriter = new StringWriter(newString);
newStringCollection.Add(newString);
}
newStringWriter.WriteLine(line);
lineCounter++;
}
}
We're using the StringReader to read our big string, one line at a time. And the corresponding StringWriter writes those lines to the new string, one line at a time. After every 20 lines, we start a new StringBuilder (and the appropriate StringWriter wrapper).
Split the string by newline.
Then merge the resulting lines back together in groups of the size you need.
string s = "Line1\nThis is line2 \nLine3 is here\nI am Line4";
string [] str = s.Split('\n');
List<String> str1 = new List<String>();
for(int i=0; i<str.Length; i+=2)
{
string ss = str[i];
if(i+1 <str.Length)
ss += '\n' + str[i+1];
str1.Add(ss);
}
str = str1.ToArray();
The if condition is checked inside the loop because the length of str may be odd.
var strAray = myLongString.Split('\n').ToList();
var skip=0;
var take=20;
var chunk = strAray.Skip(skip).Take(take).ToList();
while(chunk.Count >0)
{
foreach(var line in chunk)
{
// use line string
}
skip += take; // advance by a whole chunk, not by a single line
chunk = strAray.Skip(skip).Take(take).ToList();
}