I have a text file that is divided into many sections, each about 10 or so lines long. I'm reading the file into an array with File.ReadAllLines, one line per element, and I'm then trying to parse each section of the file to bring back just some of the data. I'm storing the results in a list, and hoping to export the list to CSV ultimately.
My for loop is giving me trouble: it loops the right number of times, but it only pulls the data from the first section of the text file each time, rather than pulling the data from the first section and then moving on to pull the data from the next section. I'm sure I'm doing something wrong in either my for loop or my foreach loop. Any clues to help me solve this would be much appreciated! Thanks
David
My code so far:
namespace ParseAndExport
{
class Program
{
static readonly string sourcefile = @"Path";
static void Main(string[] args)
{
string[] readInLines = File.ReadAllLines(sourcefile);
int counter = 0;
int holderCPStart = counter + 3;//Changed Paths will be a different number of lines each time, but will always start 3 lines after the startDiv
/*Need to find the start of the section and the end of the section and parse the bit in between.
* Also need to identify the blank line that occurs in each section as it is essentially a divider too.*/
int startDiv = Array.FindIndex(readInLines, counter, hyphens72);
int blankLine = Array.FindIndex(readInLines, startDiv, emptyElement);
int endDiv = Array.FindIndex(readInLines, counter + 1, hyphens72);
List<string> results = new List<string>();
//Test to see if FindIndexes work. Results should be 0, 7, 9 for 1st section of sourcefile
/*Console.WriteLine(startDiv);
Console.WriteLine(blankLine);
Console.WriteLine(endDiv);*/
//Check how long the file is so that for testing we know how long the while loop should run for
//Console.WriteLine(readInLines.Length);
//sourcefile has 5255 lines (elements) in the array
for (int i = 0; i <= readInLines.Length; i++)
{
if (i == startDiv)
{
results = (readInLines[i + 1].Split('|').Select(p => p.Trim()).ToList());
string holderCP = string.Join(Environment.NewLine, readInLines, holderCPStart, (blankLine - holderCPStart - 1)).Trim();
results.Add(holderCP);
string comment = string.Join(" ", readInLines, blankLine + 1, (endDiv - (blankLine + 1)));//in case the comment is more than one line long
results.Add(comment);
i = i + 1;
}
else
{
i = i + 1;
}
foreach (string result in results)
{
Console.WriteLine(result);
}
//csvcontent.AppendLine("Revision Number, Author, Date, Time, Count of Lines, Changed Paths, Comments");
/* foreach (string result in results)
{
for (int x = 0; x <= results.Count(); x++)
{
StringBuilder csvcontent = new StringBuilder();
csvcontent.AppendLine(results[x] + "," + results[x + 1] + "," + results[x + 2] + "," + results[x + 3] + "," + results[x + 4] + "," + results[x + 5]);
x = x + 6;
string csvpath = @"addressforcsvfile";
File.AppendAllText(csvpath, csvcontent.ToString());
}
}*/
}
Console.ReadKey();
}
private static bool hyphens72(String h)
{
if (h == "------------------------------------------------------------------------")
{
return true;
}
else
{
return false;
}
}
private static bool emptyElement(String ee)
{
if (ee == "")
{
return true;
}
else
{
return false;
}
}
}
}
It looks like you are trying to grab all of the lines in a file that are not "------" and put them into a list of strings.
You can try this:
var lineswithoutdashes = readInLines.Where(x => !hyphens72(x)).ToList();
Now you can take this list and do the split with a '|' to extract the fields you wanted
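For example (a rough sketch; it assumes every remaining line really does contain '|'-separated fields):
var fieldsPerLine = lineswithoutdashes
    .Select(line => line.Split('|').Select(f => f.Trim()).ToList())
    .ToList();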
The logic seems wrong, and there are issues with the code itself as well. I am unsure what precisely you're trying to do. Anyway, a few hints that I hope will help:
The if (i == startDiv) checks whether i equals startDiv. I assume the logic that runs when this condition is met is what you refer to as "pulls the data from the first section". That's expected, given you only run this code when i equals startDiv.
You increase the counter i inside the for loop, while the for loop's own i++ already increases it, so i is bumped twice per iteration.
Even if the issue in 2 didn't exist, I'd suggest not doing the same operation "i = i + 1" in both the true and false branches of the if (i == startDiv).
Given this file might actually be massive, it's probably a good idea not to store it all in memory, but just to read the file line by line and process each line as you go. There's currently no obvious reason to consume this amount of memory, other than the convenience of the File.ReadAllLines(sourcefile) API. I wouldn't be too scared to read the file like this:
using (var reader = new StreamReader(sourcefile))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the line.
    }
}
You can skip the lines until you've passed where the line equals hyphens72.
Then for each line, you process the line with the code you provided in the true case of (i == startDiv), or at least, from what you described, this is what I assume you are trying to do.
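A rough sketch of that skip-and-process idea, reusing the hyphens72 and emptyElement helpers from the question:
bool insideSection = false;
List<string> results = new List<string>();
foreach (string line in File.ReadLines(sourcefile))
{
    if (hyphens72(line))
    {
        insideSection = true; // a divider row marks the start of the next section
        continue;
    }
    if (insideSection && !emptyElement(line))
    {
        // do whatever per-line processing you need, e.g. keep the trimmed '|' fields
        results.AddRange(line.Split('|').Select(p => p.Trim()));
    }
}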
int startDiv will return the line number that contains hyphens72.
So your current for loop will only copy to results for the single line that matches the calculated line number.
I guess you want to search the position of startDiv in the current line?
const string hyphens72 = "------------------------------------------------------------------------";
// loop over lines
for (var lineNumber = 0; lineNumber < readInLines.Length; lineNumber++) {
string currentLine = readInLines[lineNumber];
int startDiv = currentLine.IndexOf(hyphens72);
// loop over characters in line
for (var charIndex = 0; charIndex < currentLine.Length; charIndex++) {
if (charIndex == startDiv) {
var currentCharacter = currentLine[charIndex];
// write to result ...
}
else {
continue; // skip this character
}
}
}
There are several things which could be improved.
I would use File.ReadLines over File.ReadAllLines, because ReadAllLines reads all the lines at once, while ReadLines streams them.
With the line results = (readInLines[i + 1].Split('|').Select(p => p.Trim()).ToList()); you're overwriting the previous results list. You'd better use results.AddRange() to add new results.
for (int i = 0; i <= readInLines.Length; i++) means that when the length is 10 it will do 11 iterations (one too many); remove the =.
Array.FindIndex(readInLines, counter, hyphens72); does a linear scan. On large files it will take ages to read and search them over and over; try to touch each line only once.
I cannot test what you are doing, but here's a hint:
IEnumerable<string> readInLines = File.ReadLines(sourcefile);
bool started = false;
List<string> results = new List<string>();
foreach(var line in readInLines)
{
// skip empty lines
if(emptyElement(line))
continue;
// when dashes are found, flip a boolean to activate the reading mode.
if(hyphens72(line))
{
// flip state.. (start/end)
started = !started;
}
if(started)
{
// I don't know what you are doing here precisely, do what you gotta do. ;-)
results.AddRange((line.Split('|').Select(p => p.Trim()).ToList()));
string holderCP = string.Join(Environment.NewLine, readInLines, holderCPStart, (blankLine - holderCPStart - 1)).Trim();
results.Add(holderCP);
string comment = string.Join(" ", readInLines, blankLine + 1, (endDiv - (blankLine + 1)));//in case the comment is more than one line long
results.Add(comment);
}
}
foreach (string result in results)
{
Console.WriteLine(result);
}
You might want to start with a class like this. I don't know whether each section begins with a row of hyphens, or if it's just in between. This should handle either scenario.
What this is going to do is take your giant list of strings (the lines in the file) and break it into chunks - each chunk is a set of lines (10 or so lines, according to your OP.)
The reason is that it's unnecessarily complicated to try to read the file, looking for the hyphens, and process the contents of the file at the same time. Instead, one class takes the input and breaks it into chunks. That's all it does.
Another class might read the file and pass its contents to this class to break them up. Then the output is the individual chunks of text.
Another class can then process those individual sections of 10 or so lines without having to worry about hyphens or what separates one chunk from another.
Now that each of these classes is doing its own thing, it's easier to write unit tests for each of them separately. You can test that your "processing" class receives an array of 10 or so lines and does whatever it's supposed to do with them.
public class TextSectionsParser
{
private readonly string _delimiter;
public TextSectionsParser(string delimiter)
{
_delimiter = delimiter;
}
public IEnumerable<IEnumerable<string>> ParseSections(IEnumerable<string> lines)
{
var result = new List<List<string>>();
var currentList = new List<string>();
foreach (var line in lines)
{
if (line == _delimiter)
{
if(currentList.Any())
result.Add(currentList);
currentList = new List<string>();
}
else
{
currentList.Add(line);
}
}
if (currentList.Any() && !result.Contains(currentList))
{
result.Add(currentList);
}
return result;
}
}
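A possible usage sketch (the delimiter passed in is assumed to be the same 72-hyphen row as in your file):
var parser = new TextSectionsParser(new string('-', 72));
var sections = parser.ParseSections(File.ReadLines(sourcefile));
foreach (var section in sections)
{
    // each 'section' is the 10 or so lines that sit between two delimiter rows
    Console.WriteLine(string.Join(Environment.NewLine, section));
}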
My code is quite long, but this is the part that I am stuck on. I have my try statement at the beginning, before my loops. Now I want to take the information that is in the ListBox and send it to a text file that I already have in my debug folder. I would like to just take my outputs from the list box and write them to the population.txt file.
//variable user input, store in userInput
try
{
if (double.TryParse(startTextbox.Text, out start))
{
if (double.TryParse(averageTextbox.Text, out average))
{
if (double.TryParse(daysTextbox.Text, out days))
{
//process
int count = 1;
while (count <= days)
{
//calculation
double output;
output = start * Math.Pow((1 + average / 100), count - 1);
//display the results in the listbox
populationListBox.Items.Add("The approximate population for " +
count + " day(s) is " + output.ToString("n2"));
//count the days
count = count + 1;
}
//used to text statement
//populationListBox.Items.Add("End of while loop");
count = 1;
do
{
//calculation
double output;
output = start * Math.Pow((1 + average / 100), count - 1);
//display the results in the listbox
populationListBox.Items.Add("The approximate population for " +
count + " day(s) is " + output.ToString("n2"));
//count the days
count = count + 1;
} while (count <= days);
//used to text statement
//populationListBox.Items.Add("End of do-while loop");
//int count;
for (count = 1; count <= days; )
{
//calculation
double output;
output = start * Math.Pow((1 + average / 100), count - 1);
//display the results in the listbox
populationListBox.Items.Add("The approximate population for " +
count + " day(s) is " + output.ToString("n2"));
//count the days
count = count + 1;
}
//used to text statement
//populationListBox.Items.Add("End of for loop");
}
else
{
//error message for input
MessageBox.Show("Invalid input for number of days to multiply.");
}
}
else
{
//error message for input
MessageBox.Show("Invalid input for average daily increase.");
}
}
else
{
//error message for input
MessageBox.Show("Invalid input for starting days");
}
StreamWriter outputFile;
outputFile = File.CreateText("population.txt");
outputFile.WriteLine("Approximate population: ");
outputFile.WriteLine(populationListBox.Items);
outputFile.ToString();
outputFile.Close();
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
You can use a library like FileHelpers to do the job. It is open source and free. If you want to use just the file I/O from the .NET Framework, you can do that too:
using (StreamWriter sr = File.CreateText("population.txt"))
{
foreach (string s in listBox.Items)
{
sr.WriteLine(s);
}
}
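If the items are plain strings, the same thing can be collapsed into two calls (a sketch, assuming populationListBox only ever holds the strings you added above):
// write the header, then every line currently shown in the list box
// (Cast<string>() needs a using System.Linq; directive)
File.WriteAllText("population.txt", "Approximate population: " + Environment.NewLine);
File.AppendAllLines("population.txt", populationListBox.Items.Cast<string>());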
I have thousands of images and more than 100,000 lines of log files. I need to check whether there is an image associated with each Unix time in the log file. To do this, I first read through all the images and stored the time information in an array. Then I read through all the lines of the log file, split out each piece of information (lat, long, time) and stored them in an array. Finally, I take one time element at a time and check whether it matches the image time array. If no match is found, I get the time from the log, get lat and long from the same array and write them to a text file. But the overall process takes a very long time, and I am looking for ways to make it faster.
var fileList = Directory.GetFiles(imageLocation, "*.jpg");
//Array that will store all the time information obtained from image property
double[] imgTimeInfo = new double[fileList.Length];
int imgTimeCounter =0;
foreach (var fileName in fileList)
{
x++;
string fileNameShort = fileName.Substring(fileName.LastIndexOf('\\') + 1);
richTextBox1.AppendText("Getting time information from image " + x + " of " + fileList.Length + " : " + fileNameShort + Environment.NewLine);
richTextBox1.Refresh();
using (var fs = File.OpenRead(fileName))
{
//create an instance of a bitmap image
var image = new Bitmap(fs);
//get the date/time image property of the image
PropertyItem property = image.GetPropertyItem(36867);
System.Text.Encoding encoding = new System.Text.ASCIIEncoding();
string valueFrmProperty = encoding.GetString(property.Value);
//Format the value obtained to convert it into unix equivalent for comparison
string valueCorrected = valueFrmProperty.Split(' ')[0].Replace(":", "/") + " " + valueFrmProperty.Split(' ')[1];
var unixTime = ConvertToUnixTimeStamp(DateTime.Parse(valueCorrected));
imgTimeInfo[imgTimeCounter] = unixTime;
imgTimeCounter++;
//It is very important to dispose the image resource before trying to read the property of another image. image.dispose frees the resources or else we get
//outofmemoryexception.
image.Dispose();
}
}
MessageBox.Show("Images done.");
richTextBox1.AppendText("Fetching time information from log files..."+Environment.NewLine);
richTextBox1.Refresh();
int counter4Time = contentBathy.Length / 6;
//assign counter for lat,long and time
int timeCounter = 3;
for (int i = 0; i < counter4Time; i++)
{
richTextBox1.AppendText("Searching time match with image files..." + Environment.NewLine);
richTextBox1.Refresh();
double timeValue = Int32.Parse(contentBathy[timeCounter]);
//Looks for values that is +- 3 seconds different in the image file.
if (Array.Exists(imgTimeInfo, a => a == timeValue || a == timeValue + 1 || a == timeValue + 2|| a == timeValue+3
||a == timeValue-1|| a== timeValue-2||a == timeValue-3))
{
File.AppendAllText(@"c:\temp\matched.txt", "Lat : " + contentBathy[timeCounter - 3] + " Log : " + contentBathy[timeCounter - 2] + Environment.NewLine);
richTextBox1.AppendText("Image with same time information found. Looking for another match."+ Environment.NewLine);
}
else
{
//richTextBox1.AppendText("Time did not match...Writing GPX cordinates..." + Environment.NewLine);
//richTextBox1.Refresh();
File.AppendAllText(gpxLocation, "Lat : " + contentBathy[timeCounter - 3] + " Log : " + contentBathy[timeCounter - 2] + Environment.NewLine);
}
if(timeCounter < contentBathy.Length-3)
timeCounter += 6;
}
}
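One direction worth trying (a sketch, not a drop-in replacement): Array.Exists scans the whole imgTimeInfo array for every log entry, so the cost grows as lines × images. Loading the image times into a HashSet<double> once makes each of the seven probes a constant-time lookup:
// build once, before looping over the log entries
var imgTimes = new HashSet<double>(imgTimeInfo);
// inside the loop, instead of Array.Exists:
bool matchFound = false;
for (int offset = -3; offset <= 3 && !matchFound; offset++)
{
    matchFound = imgTimes.Contains(timeValue + offset);
}
Writing the output with File.AppendAllText once per log line also hurts, since the file is reopened on every call; keeping a single StreamWriter open for the whole loop (or batching the lines in a StringBuilder) should help as well.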
In my application, I am gathering data regarding the performance of the system, where I need to find:
% Free Space
% Disk Time
% Disk Read Time
% Disk Write Time
% Idle Time
% Usage
% Usage Peak
using the function below:
private void CollectnPopulatePerfCounters()
{
try
{
foreach (var pc in System.Diagnostics.PerformanceCounterCategory.GetCategories())
{
if (pc.CategoryName == "LogicalDisk" || pc.CategoryName == "Paging File" || pc.CategoryName == "ProcessorPerformance")
{
try
{
foreach (var insta in pc.GetInstanceNames())
{
try
{
foreach (PerformanceCounter cntr in pc.GetCounters(insta))
{
using (System.IO.StreamWriter sw = new System.IO.StreamWriter("C:\\amit.txt", true))
{
sw.WriteLine("--------------------------------------------------------------------");
sw.WriteLine("Category Name : " + pc.CategoryName);
sw.WriteLine("Counter Name : " + cntr.CounterName);
sw.WriteLine("Explain Text : " + cntr.CounterHelp);
sw.WriteLine("Instance Name: " + cntr.InstanceName);
sw.WriteLine("Value : " + Convert.ToString(cntr.RawValue)); //TODO:
sw.WriteLine("Counter Type : " + cntr.CounterType);
sw.WriteLine("--------------------------------------------------------------------");
}
}
}
catch (Exception) { }
}
}
catch (Exception) { }
}
}
}
catch (Exception) { }
}
When the code is executed, the data is generated. While observing it, I found that the values for the counters listed above [i.e. % Free Space, % Disk Time, etc.] are not in the correct form.
On my machine the value for
% Disk Read Time = 44553438000 for C Drive
% Usage Peak = 48386 for \??\C:\pagefile.sys
actually the value should be in the percent form [i.e within the range of 0 to 100 %]
Is there any way to get the exact value for all of these (except % Free Space, which I have already calculated)?
Or does anyone know how to calculate the rest of them?
Use the following:
sw.WriteLine("Value : " + Convert.ToString(Math.Round(cntr.NextValue(),2)) + "%");
More info at:
Why the cpu performance counter kept reporting 0% cpu usage?
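Note that for rate-based counters such as % Disk Time, NextValue() usually returns 0 on the first call because it needs two samples to compute a rate. Something along these lines (a sketch) gives a usable reading:
cntr.NextValue();                       // first sample, typically 0 for rate counters
System.Threading.Thread.Sleep(1000);    // wait so the second sample covers an interval
sw.WriteLine("Value : " + Math.Round(cntr.NextValue(), 2) + "%");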
All the best!
Don't forget to vote :-D
I've been working for the last few days on a method to compress the 144 million tile representation for my XNA game down to a very small size when saved. Having managed to pull that off, I now find myself stumped on how to go about getting the tiles back from the file in chunks.
In the file I have:
An integer (it gets compressed to bytes using the 7BitEncodedInt method)
A byte
The compressed integer represents the number of tiles and the byte that follows determines what type the tiles are. This is all well and good and works really well. Most importantly, it shrinks the file size down to just 50 MB on average.
The problem is that I am currently reading back the entire file.
From the file I'm getting this.
The index value of each tile (just a basic iteration as I grab the tiles)
The type for each tile as a byte value
A byte value representing a texture for that tile (this is hard to explain but its necessary on a per tile basis)
The end result of all this is that I'm managing to save the file and only use about 50 MB. But by loading the whole thing back in, it expands out to nearly 1.5 GB of RAM. I can't really afford to sacrifice any more tile info, so I need a way to only load portions of the map based on the player's location. The goal is to be in the 100-200 MB range.
I have been looking at memory-mapping the file, using quadtrees, pretty much anything I could find for loading files in chunks. While these options all seem pretty good, I'm not sure which is best or whether, given the situation, there may be another even better one. The other problem with all this is that these solutions all seem very involved (especially since this is my first time using them), and while I'm not against devoting myself to some lengthy coding, I'd like to know that it's going to do what I need it to beforehand.
My question is: given how I have to process the file as I pull it in, and the fact that it needs to be done based on the player's location, what would be the best way to do this? I'm just looking for some direction here. Code is always welcome but not required.
You want to have fixed-length variables in your Tile class and implement something like this:
This is an example of a collection class (People) that can get a value based on index from a collection that was serialised into a file.
Person is the class that is the base of the People collection.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace FileStreamDatabaseTest
{
class Program
{
static void Main(string[] args)
{
People.OpenCollection();
People.Test_WillOverwriteData();
People.CloseCollection();
Console.ReadLine();
}
}
public class Person
{
// define maxium variable sizes for serialisation
protected static int pMaxLength_FirstName = 64;
protected static int pMaxLength_Age = 10;
public static int MaxObjectSize
{
get
{
// return the sum of all the maxlegnth variables to define the entire object size for serialisation
return pMaxLength_FirstName + pMaxLength_Age;
}
}
// define each object that will be serialised as follows
protected string pFirstName;
public string Firstname
{
set
{
// ensure the new value is not over max variable size
if (value.Length > pMaxLength_FirstName)
throw new Exception("the length of the value is to long.");
pFirstName = value;
}
get
{
return pFirstName;
}
}
protected int pAge;
public int Age
{
get
{
return pAge;
}
set
{
pAge = value;
}
}
public byte[] Serialise()
{
// Output string builder
StringBuilder Output = new StringBuilder();
// Append firstname value
Output.Append(Firstname);
// Add extra spaces to end of string until max length is reached
if (Firstname.Length < pMaxLength_FirstName)
for (int i = Firstname.Length; i < pMaxLength_FirstName; i++)
Output.Append(" ");
// Append age value as string
Output.Append(Age.ToString());
// Add extra spaces to end of string until max length is reached
int AgeLength = Age.ToString().Length;
if (AgeLength < pMaxLength_Age)
for (int i = AgeLength; i < pMaxLength_Age; i++)
Output.Append(" ");
// Return the output string as bytes using ascii encoding
return System.Text.Encoding.ASCII.GetBytes(Output.ToString());
}
public void Deserialise(byte[] SerialisedData)
{
string Values = System.Text.Encoding.ASCII.GetString(SerialisedData);
pFirstName = Values.Substring(0, pMaxLength_FirstName).Trim();
pAge = int.Parse(Values.Substring(pMaxLength_FirstName, pMaxLength_Age).Trim());
}
}
public static class People
{
private static string tileDatasource = @"c:\test.dat";
private static System.IO.FileStream FileStream;
public static void OpenCollection()
{
FileStream = new System.IO.FileStream(tileDatasource, System.IO.FileMode.OpenOrCreate, System.IO.FileAccess.ReadWrite, System.IO.FileShare.None);
}
public static void CloseCollection()
{
FileStream.Close();
FileStream.Dispose();
FileStream = null;
}
public static void SaveCollection(Person[] People)
{
FileStream.SetLength(People.Length * Person.MaxObjectSize);
FileStream.Position = 0;
foreach (Person PersonToWrite in People)
{
// call serialise to get bytes
byte[] OutputBytes = PersonToWrite.Serialise();
// write the output buffer
// note: this will always be the same size as each variable should
// append spaces until its max size is reached
FileStream.Write(OutputBytes, 0, OutputBytes.Length);
}
}
public static Person GetValue(int Index)
{
// set the stream position to read the object by multiplying the requested index with the max object size
FileStream.Position = Index * Person.MaxObjectSize;
// read the data
byte[] InputBytes = new byte[Person.MaxObjectSize];
FileStream.Read(InputBytes, 0, Person.MaxObjectSize);
// deserialise
Person PersonToReturn = new Person();
PersonToReturn.Deserialise(InputBytes);
// retun the person
return PersonToReturn;
}
public static void Test_WillOverwriteData()
{
long StartTime;
long EndTime;
TimeSpan TimeTaken;
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Creating 2,000,000 test people... ");
StartTime = DateTime.Now.Ticks;
Person[] People = new Person[2000000];
for (int i = 0; i < 2000000; i++)
{
People[i] = new Person();
People[i].Firstname = "TestName." + i;
People[i].Age = i;
}
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Serialising Collection to disk... ");
StartTime = DateTime.Now.Ticks;
SaveCollection(People);
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Redundancy Test... ");
StartTime = DateTime.Now.Ticks;
bool Parsed = true;
int FailedCount = 0;
for (int i = 0; i < 2000000; i++)
{
if (GetValue(i).Age != i)
{
Parsed = false;
FailedCount++;
}
}
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> " + (Parsed ? "PARSED" : "FAILED (" + FailedCount + " failed index's"));
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Reading 10,000 index's at once... ");
StartTime = DateTime.Now.Ticks;
Person[] ChunkOfPeople = new Person[10000];
for (int i = 0; i < 10000; i++)
ChunkOfPeople[i] = GetValue(i);
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Reading 100,000 index's at once... ");
StartTime = DateTime.Now.Ticks;
ChunkOfPeople = new Person[100000];
for (int i = 0; i < 100000; i++)
ChunkOfPeople[i] = GetValue(i);
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
Console.WriteLine("-------------------------------------------------------------------");
Console.WriteLine("*** Reading 1,000,000 index's at once... ");
StartTime = DateTime.Now.Ticks;
ChunkOfPeople = new Person[1000000];
for (int i = 0; i < 1000000; i++)
ChunkOfPeople[i] = GetValue(i);
EndTime = DateTime.Now.Ticks;
TimeTaken = new TimeSpan(EndTime - StartTime);
Console.WriteLine("-> Completed in " + TimeTaken.TotalSeconds + " seconds");
}
}
}
There are a number of options, not all of them may be appropriate for your particular project:
Don't use a single file for all the data. Divide the map in smaller "rooms" and store each one in its own file. Load only the "room" the player starts in and preemptively load neighboring "rooms" and unload old ones.
Reduce the number of tiles you need to store. Use procedural generation to create an area's layout.
If you have a 10x10 room with the floor made of a single tile type, then don't store 100 separate tiles but instead use a specific marker that says "this area has a 10x10 floor with this tile". If it's a wall, then save the start and end positions and the texture type. If you have a multi-tile doodad in the middle of an open field and its position is not relevant to the story, then position it randomly in the field (and save the seed for the random number generator in the map file so next time it will appear in the same place).
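For the first option, a minimal sketch of how room-based loading might look (RoomSize, the rooms/x_y.room naming scheme and the RoomLoader class are assumptions for illustration, not part of your existing code):
static class RoomLoader
{
    public const int RoomSize = 512; // tiles per room edge; pick whatever fits your map
    // Maps the player's tile position to the file that holds that room.
    public static string RoomPathFor(int playerTileX, int playerTileY)
    {
        int roomX = playerTileX / RoomSize;
        int roomY = playerTileY / RoomSize;
        return System.IO.Path.Combine("rooms", roomX + "_" + roomY + ".room");
    }
}
On a room change you would decompress just that file with your existing count-plus-type format, optionally pre-loading the neighbouring rooms in the background and dropping the ones the player has moved away from.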