Display unique number before array beginning C# - c#

I was working on a small program that basically reads from a txt multiple arrays and writes them to another file, but, additionally, it should generate a unique number and place it just before the information. I got the first part working with no problems but the second part is causing me problems even though it should work.
public static void Main(string[] args)
{
StreamReader vehiclereader = new StreamReader(@"C:\Users\Admin\Desktop\Program\vehicles.txt");
string line = vehiclereader.ReadToEnd();
string ID;
string category;
string Type;
string Brand;
string Model;
string Year;
string Colour;
while (line != null)
{
var parts = line.Split(',');
Type = parts[0];
Brand = parts[1];
Model = parts[2];
Year = parts[3];
Colour = parts[4];
Console.WriteLine(line);
string[] lines = { line };
System.IO.File.WriteAllLines(@"C:\Users\Admin\Desktop\Program\vehicles2.txt", lines);
List<string> categories = new List<string>();
categories.Add(Type);
int count = categories.Where(x => x.Equals(Type)).Count();
ID = Type.Substring(0, 4) + count.ToString("00");
Console.ReadKey();
}
}
}
Currently, this code reads from a txt file, displays it into the console and writes it back to another txt file. This is fine, the part that is not willing to work is the unique number generator.
Last 5 lines of code are supposed to add a unique number just before 'Type' data and would always start from 001. If the 'Type' data is identical, then the number just grows in ascending order. Otherwise, the unique number should reset for new types and start counting from 001 and should keep growing for identical types. (e.g. For all lightweight vehicles the counter should be the same and for heavyweight vehicles the counter should be different but count all of the heavy vehicles)
I'm open to any help or suggestions!

There are a variety of issues with this code, plus a few suggestions; allow me to list them before providing a corrected version:
StreamReader is disposable, so put it in a "using" block.
ReadToEnd reads the entire file into a single string, whereas your code structure is expecting it to return a line at a time, so you want the "ReadLine" method.
The value of line does not get modified within your loop, so you will get an infinite loop (program that never ends).
(Suggestion) Use lower case letters at the start of your variable names, it will help you spot what things are variables and what are classes/methods.
(Suggestion) The local variables are declared in a wider scope than they are needed. There is no performance hit to only declaring them within the loop, and it makes your program easier to read.
"string[] lines = { line };" The naming implies that you think this will split the single line into an array of lines. But actually, it will just create an array with one item in it (which we've already established is the entire contents of the file).
"category" is an unused variable; but actually, you don't use Brand, Model, Year or Colour either.
It would have helped if the question had a couple of lines as an example of input and output.
Since you're processing a line at a time, we might as well write the output file a line at a time, rather than hold the entire file in memory at once.
The ID is unused, and that code is after the line writing the output file, so there is no way it will appear in there.
"int count = categories.Where(x => x.Equals(type)).Count();" is more verbose than it needs to be, since Count accepts the predicate directly: prefer "int count = categories.Count(x => x.Equals(type));"
Removed the "Console.Write", since the output goes to a file.
Is that "Console.ReadKey" meant to be within the loop, or after it? I put it outside.
I created a class to be responsible for the counting, to demonstrate how it is possible to "separate concerns".
Clearly I don't have your files, so I don't know whether this will work.
using System;
using System.Collections.Generic;
using System.IO;
class Program
{
public static void Main(string[] args)
{
var typeCounter = new TypeCounter();
using (StreamWriter vehicleWriter = new StreamWriter(@"C:\Users\Admin\Desktop\Program\vehicles2.txt"))
using (StreamReader vehicleReader = new StreamReader(@"C:\Users\Admin\Desktop\Program\vehicles.txt"))
{
{
string line;
while ((line = vehicleReader.ReadLine()) != null)
{
var parts = line.Split(',');
string type = parts[0].Substring(0, 4); // not sure why you're using substring, I'm just matching what you did
var identifier = typeCounter.GetIdentifier(type);
vehicleWriter.WriteLine($"{identifier},{line}");
}
}
Console.ReadKey();
}
}
public class TypeCounter
{
private IDictionary<string, int> _typeCount = new Dictionary<string, int>();
public string GetIdentifier(string type)
{
int number;
if (_typeCount.ContainsKey(type))
{
number = ++_typeCount[type];
}
else
{
number = 1;
_typeCount.Add(type, number);
}
return $"{type}{number:00}"; // feel free to use more zeros
}
}
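Since the question has no sample data, here is a minimal sketch of the same counting idea run against in-memory lines instead of files. The vehicle rows and the names VehicleIds and PrefixWithIds are made up for illustration; the four-character prefix and the two-digit counter follow the code above, with an added guard for types shorter than four characters.

```csharp
using System;
using System.Collections.Generic;

public static class VehicleIds
{
    // Prefixes each CSV line with an ID built from the first four characters
    // of its first field plus a per-type running counter (01, 02, ...).
    public static List<string> PrefixWithIds(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        var result = new List<string>();
        foreach (string line in lines)
        {
            string type = line.Split(',')[0];
            // Guard: Substring(0, 4) would throw for types shorter than four characters.
            string key = type.Length > 4 ? type.Substring(0, 4) : type;
            counts.TryGetValue(key, out int seen); // seen is 0 for a key not yet in the dictionary
            counts[key] = seen + 1;
            result.Add($"{key}{seen + 1:00},{line}");
        }
        return result;
    }
}
```

With the hypothetical input lines "Light,Ford,Fiesta,2010,Red", "Heavy,Volvo,FH,2015,Blue" and "Light,Fiat,Panda,2012,White", the returned lines start with Ligh01, Heav01 and Ligh02 respectively.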

Related

How do i make a global array

I am trying to create a global array using data from a file and use that array for calculations in different functions, e.g. a button click handler. When I use my calculation button it says 'Value cannot be null'.
public partial class Form1 : Form
{
double size = 0;
double[] temperture;
public Form1()
{
InitializeComponent();
OpenFileDialog fileDialog = new OpenFileDialog();
if (fileDialog.ShowDialog() == DialogResult.OK)
{
StreamReader sr = new StreamReader(fileDialog.OpenFile());
string line = sr.ReadLine();
double size = Convert.ToDouble(line);
//create array
double[] temperture = new double[(int)size];
for (int i = 0; i < size; i++)
{
line = sr.ReadLine();
//convert line to double and store in the array
temperture[i] = Convert.ToDouble(line);
}
}
}
private void calculateAverageTempertureToolStripMenuItem1_Click(object sender, System.EventArgs e)
{
double sum = temperture.Sum();
double average = ((double)sum) / temperture.Length;
textBox1.Text = "Average Temperature = " + average;
}
}
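The constructor declares a new local variable named temperture, which hides the field of the same name (the same applies to size). The field is never assigned, so it is still null when the click handler calls temperture.Sum(), hence 'Value cannot be null'. Don't redeclare temperture as a local:
double[] temperture = new double[(int)size];
Assign to the field you already declared on the class instead:
temperture = new double[(int)size];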
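To make the difference concrete, here is a small sketch outside of WinForms. The class ShadowingDemo and its members are hypothetical names chosen for illustration; the only point is that a declaration with a type name creates a new local, while a plain assignment targets the field.

```csharp
using System;

public class ShadowingDemo
{
    private double[] readings; // field, null until something assigns it

    public void LoadWithLocal()
    {
        // Declares a NEW local that hides the field; the field stays null.
        double[] readings = new double[3];
        readings[0] = 1.5;
    }

    public void LoadIntoField()
    {
        // No type name in front: this assigns the field itself.
        readings = new double[3];
        readings[0] = 1.5;
    }

    public bool FieldIsNull => readings == null;
}
```

After calling LoadWithLocal the field is still null; after LoadIntoField it is not.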
You are almost done!
Try not to put everything into one procedure.
Google for the SOLID principle and remember the S!
In your Form1 class you have fields: temperture and size. Before you display your form, you want to give these fields a value.
Ask operator for a file name
To calculate these values, you need a filename. You decided to ask the operator for the filename. So let's create small procedures in your Form1 class that will do this: one to ask for the filename, one to read the contents of the file and one to put this all together:
private string SelectFileName()
{
using (OpenFileDialog dlg = new OpenFileDialog())
{
// Set properties before showing the dialog, for example:
dlg.Title = "Please select a file";
dlg.CheckFileExists = true;
dlg.InitialDirectory = ...
// etc. Google: OpenFileDialog class
// show the dlg, and if user presses OK, return the filename, otherwise null
var dlgResult = dlg.ShowDialog(this);
if (dlgResult == DialogResult.OK)
return dlg.FileName;
else
return null;
}
}
OpenFileDialog is Disposable. Remember to always put IDisposable objects inside a using statement, so you can be certain that it is disposed after use, even if an exception is thrown.
For properties see OpenFileDialog and the base class.
Read the file with Temperatures
After the operator has selected a file, you can read it. Apparently the first line of the file contains the number of temperatures as a string, and each following line contains one temperature in string format.
You decided to use two separate fields: size and temperture. Know that every array has a Length property that holds the number of elements in the array, so you won't need the size field.
Furthermore, my advice would be to use List<double> instead of double[]. Only use arrays if you know the size of the array before you create it, and if you are certain that you will never have to change the length. With Lists you won't have to care about the number of elements in it, if you add items, the list automatically changes size.
private List<double> Temperatures {get; set;}
(or, if you prefer, make it a field: private List<double> temperatures;)
private List<double> ReadTemperatureFile(string fileName)
{
// TODO: decide what to do if fileName null, or if file does not exist
// return empty list? or throw ArgumentNullException and FileNotFoundException?
using (var textReader = File.OpenText(fileName))
{
// first line is the expected number of Temperatures in this file
string firstLine = textReader.ReadLine();
if (firstLine == null)
{
// There is no first line
// Todo: return empty array or throw DataNotFoundException?
}
// if here: the first line has been read
int expectedNumberOfTemperatures = Int32.Parse(firstLine);
List<double> temperatures = new List<double>(expectedNumberOfTemperatures);
Actually, to create a List, you don't need to know how long it will be. Passing the expected size to the constructor only improves performance.
If you are free to change the format of the temperature files, consider removing the line with the number of elements and keeping only the temperatures. This way, you avoid the problem that the first line says that there are 100 temperatures, but the file only contains 50 temperature values.
By the way, only use Int32.Parse(firstLine) if you are absolutely certain that the first line can be parsed to an int. If you want to handle invalid file formats correctly, consider using:
if (!Int32.TryParse(firstLine, out int expectedNumberOfTemperatures))
{
// TODO: handle invalid file format; return empty array?
// throw InvalidDataException?
}
List<double> temperatures = new List<double>(expectedNumberOfTemperatures);
By the way: the StreamReader that OpenText returns is IDisposable, so I wrapped it in a using statement.
Continuing reading the file:
string temperatureText = textReader.ReadLine();
while (temperatureText != null)
{
// a line has been read, convert to double and add to the list
double temperature = Double.Parse(temperatureText);
temperatures.Add(temperature);
temperatureText = textReader.ReadLine();
}
return temperatures;
}
}
Did you see that I didn't care about the actual number of temperatures in the file? If the first line said that 100 temperatures are expected, but in fact there are 200 temperatures, I just read them all, and add them to the List. The List grows automatically when needed.
Of course, if you only want to read 100 temperatures, even if there are still temperatures left, then use a counter and stop when the expectedNumberOfTemperatures are read.
Only use Double.Parse if you are certain that the file contains only valid temperatures; otherwise use Double.TryParse and decide what to do if an invalid line has been read.
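As a rough sketch of the TryParse approach, the method below converts a sequence of lines and simply skips the ones that are not valid numbers. The names TemperatureParser and ParseTemperatures are hypothetical, and the sketch assumes the file uses a decimal point (invariant culture); skipping bad lines is just one possible policy.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TemperatureParser
{
    // Converts each line to a double; lines that are not valid numbers are skipped.
    public static List<double> ParseTemperatures(IEnumerable<string> lines)
    {
        var temperatures = new List<double>();
        foreach (string line in lines)
        {
            // Assumption: values are written with a decimal point, e.g. "21.5".
            if (double.TryParse(line, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
            {
                temperatures.Add(value);
            }
            // else: decide here whether to skip, log, or throw
        }
        return temperatures;
    }
}
```

It could be fed from a file with TemperatureParser.ParseTemperatures(File.ReadLines(fileName)).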
By the way, if you use LINQ, your procedure will be much smaller. Most programmers will immediately know what happens:
(assuming that you can change the file such that it only contains temperatures.)
private List<double> ReadTemperatureFile(string fileName)
{
return System.IO.File.ReadAllLines(fileName)
.Select(line => double.Parse(line))
.ToList();
}
In words: read all lines that are in the text file with fileName. Parse every read line into a double and convert the sequence of parsed doubles to a List.
If you really want to return double[], replace the terminating ToList() with a ToArray()
Or if the first line has to hold the expected number of temperatures and you want to return all temperatures, skip the first line before you convert to doubles:
return System.IO.File.ReadAllLines(fileName)
.Skip(1)
.Select(line => double.Parse(line))
.ToList();
Calculate Sum and Average of every sequence of doubles
To make this procedure more reusable, I won't make it only for Lists, or for Arrays, but for every sequence of doubles:
private double CalculateAverageTemperature(IEnumerable<double> temperatures)
{
// using LINQ makes this a one-liner:
return temperatures.Average();
}
In fact, I wouldn't even bother to create a procedure for this.
Put it all together:
private void FillTemperatures()
{
string fileName = this.SelectFileName();
if (fileName == null)
return; // the operator cancelled the dialog
this.Temperatures = this.ReadTemperatureFile(fileName);
double averageTemperature = this.Temperatures.Average();
this.textBoxAverage.Text = averageTemperature.ToString();
}
And finally your event handler:
private void OnMenuItemCalculateAverage(object sender, ...)
{
this.FillTemperatures();
}
I separated the event handler from the actual data processing. If you decide to use a button to select a file and calculate an average, changes will be minimal.
Conclusion
Because you separated your code into smaller procedures that each have only one task, it is much easier for readers to understand what every procedure should do. For you it is much easier to unit test each procedure. If slight changes are needed, for instance you want a button instead of a menu item, or users type the name of the file in a text box, or you want to support OpenFileDialog as well as a text box with the filename, changes are minimal.
Furthermore: always wrap disposables in a using statement, use List<...> instead of an array, and consider using LINQ to process sequences of similar items.

How do you return an array in a procedure back to the main procedure in C#?

static string readfileName(string[] name)
{
using (StreamReader file = new StreamReader("StudentMarks.txt"))
{
int counter = 0;
string ln;
while ((ln = file.ReadLine()) != null)
{
if (ln.Length > 4)
{
name[counter] = ln;
counter++;
}
}
file.Close();
return name;
}
}
This is the procedure from which I'm currently trying to return the array name[50], but I can't fix the compile-time error, which states:
"Error CS0029 Cannot implicitly convert type 'string[]' to 'string' "
You don't need to. Your main method passed the array to this method, and this method filled it. It doesn't need to hand it back, because the object pointed to by your 'name' variable is the same object as pointed to by the original variable in the main method; your main method already has all the array data:
static void Main(){
var x = new string[10];
MyMethod(x);
Console.Write(x[0]); //prints "Hello"
}
static void MyMethod(string[] y){
y[0] = "Hello";
}
In this demo code above we start out with an array of size 10 that is referred to by a variable x. In memory it looks like:
x --refers to--> arraydata
When you call MyMethod and pass x in, C# will create another reference y that points to the same data:
x --refers to--> arraydata <--refers to-- y
Now, because both references point to the same area of memory, anything that you do with y will also affect what x sees. You put a string (like I did with Hello) in slot 0, and both x and y see it. When MyMethod finishes, the reference y is thrown away, but x survives and sees all the changes you made when working with y.
The only thing you can't do is point y itself to another different array object somewhere else in memory. That won't change x. You can't do this:
static void MyMethod(string[] y){
y = new string[20];
}
If you do this your useful reference of x and y pointing to the same area of memory:
x ---> array10 <--- y
Will change to:
x ---> array10 y ---> array20
And then the whole array20 and the y reference will be thrown away when MyMethod finishes.
The same rule applies if you call a method that supplies you an array:
static void MyMethod(string[] y){
y = File.ReadAllLines("some path"); //this also points y away to a new array made by ReadAllLines
}
It doesn't matter how or who makes the new array. Just remember that you can fiddle with the contents of an object pointed to by y all you like and the changes will be seen by x, but you can't swap out the entire object pointed to by y and hope x will see it.
In that case you WOULD have to pass it back when you're done:
static string[] MyMethod(string[] y){
y = new ...
return y;
}
And the main method would have to capture the change:
Main(...){
string[] x = new string[10];
string[] result = MyMethod(x);
}
Now, while I'm giving this mini lesson of "pass by reference" and "pass by value" (which should have been called "pass by original reference" and "pass by copy of reference") it would be useful to note that there is a way to change things so MyMethod can swap y out for a whole new object and x will see the change too.
We rarely use it; there is rarely any need to. Just about the only place you commonly meet something like it is in methods such as int.TryParse, which use the closely related out keyword. I'm telling you for completeness of education, so that if you encounter it you understand it, but you should always prefer a "change the contents but not the whole object" or an "if you make a new object, pass it back" approach.
By marking the y argument with the ref keyword, C# won't make a copy of the reference when calling the method; it will use the original reference and temporarily allow you to call it y:
static void MyMethod(ref string[] y){
y = new string[20];
}
Our diagram:
x ---> array10data
Temporarily becomes:
x a.k.a y ---> array10data
So if you point y to a new array, x experiences the change too, because they're the same reference; y is no longer a different reference to the same data:
x a.k.a y ---> array20data
Like I say, don't use it; we always seek to avoid it for various reasons.
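The three cases discussed so far can be put side by side in one small sketch. The class ArrayPassingDemo and its method names are hypothetical, introduced only to contrast the behaviours.

```csharp
using System;

public static class ArrayPassingDemo
{
    // Changes the contents of the caller's array: visible to the caller.
    public static void FillFirst(string[] y)
    {
        y[0] = "Hello";
    }

    // Repoints only the method's own copy of the reference: invisible to the caller.
    public static void Reassign(string[] y)
    {
        y = new string[20];
    }

    // With ref, y is an alias for the caller's variable, so the caller sees the new array.
    public static void ReassignByRef(ref string[] y)
    {
        y = new string[20];
    }
}
```

Starting from var x = new string[10];, calling FillFirst(x) leaves x[0] set to "Hello", Reassign(x) leaves x.Length at 10, and ReassignByRef(ref x) changes x.Length to 20.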
Now, I said at the start "you don't need to" - by that, and for the reasons above, I meant you don't need to return anything from this method
Your method receives the array it shall fill (from the file) as a parameter; it doesn't make a new array anywhere, so there isn't any need to return the array when done. It will just put any line longer than 4 chars into an array slot. It could then finish without returning anything, and the method that called it will see the changes it made in the array. This is just like my code, where MyMethod changes slot 0 of the array: MyMethod was declared as void so it didn't need a return statement, and my Main method could still see the Hello that I put in the array. In the same vein, your Main method will see all those lines from the file if you make your ReadFileName method void (it should perhaps be called FillArray, because it fills the array called name).
The most useful thing your method could return is actually an integer saying how many lines were read; the array passed in is of a fixed size. You can't resize it because that entails making a new array which won't work for all those reasons I talked about above. If you were to make a new array and return it there wouldn't be any point in passing an array in.
There are thus several ways we could improve this code but to my mind they come down to two:
don't pass an array in; let this method make a new array and return it. The new array passed back can be exactly sized to fit
keep with the "pass an array in" idea and return an integer of how many lines were actually read instead
For the second idea (which is the simplest to implement) you have to change the return type to int:
static int ReadFileName(string[] name)
And you have to return that variable you use to track which slot to put the next thing in, counter. Because counter starts at 0 and is incremented after each line is stored, it always equals the number of things you've stored so:
return counter;
Your calling method can now look like:
string[] fileData = new string[10000]; //needs to be big enough to hold the whole file!
int numberOfLinesRead = ReadFileName(fileData);
Can you see now why ReadFileName is a bad name for the method? Calling it FillArrayFromFile would be better. This last line of code doesn't read like a book, it doesn't make sense from a natural language perspective. Why would something that looks like it reads a file name (if that even makes sense) take an array and return an int - calling it ReadFileName makes it sound more like it searches an array for a filename and returns the slot number it was found in. Here ends the "name your methods appropriately 101"
So the other idea was to have the Read method make its own array and return it. While we are at it, let's call it ReadFileNamed, and have it take a file path in so it's not hard coded to reading just that one file. And we will have it return an array
static string[] ReadFileNamed(string filepath)
       ^^^^^^^^               ^^^^^^^^^^^^^^^
       the return type        the argument passed in
Make it so the first thing it does is declare an array big enough to hold the file (there are still problems with this idea, but this is programming 101; I'll let them go. Can't fix everything using stuff you haven't been taught yet)
Put this somewhere sensible:
string[] lines = new string[10000];
And change all your occurrences of "name" to be "lines" instead. Again, we name our variables well, just like we name our methods sensibly.
Change the line that reads the fixed filename to use the variable name we pass in:
using (StreamReader file = new StreamReader(filepath))
At the end of the method, the only thing left to do is size the array accurately before we return it. For a 49 line file in which every line is kept, counter will be 49, so let's make an array that is 49 big and then fill it using a loop (I doubt you've been shown Array.Copy)
string[] toReturn = new string[counter];
for(int x = 0; x < toReturn.Length; x++)
toReturn[x] = lines[x];
return toReturn;
And now call it like this:
string[] fileLines = ReadFileNamed("StudentMarks.txt");
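Put together, the second idea might look roughly like the sketch below. It is a variation rather than the exact method described: the names MarksReader and ReadLongLines are hypothetical, it takes a TextReader and a capacity so it can be tried without a file, and it uses Array.Copy in place of the hand-written loop. Note that counter starts at 0 and is incremented after each store, so it ends up equal to the number of lines stored.

```csharp
using System;
using System.IO;

public static class MarksReader
{
    // Reads lines longer than 4 characters into an oversized buffer,
    // then returns a copy trimmed to the number of lines actually stored.
    public static string[] ReadLongLines(TextReader reader, int capacity)
    {
        string[] lines = new string[capacity];
        int counter = 0;
        string ln;
        // Stop at end of input, or when the buffer is full.
        while (counter < capacity && (ln = reader.ReadLine()) != null)
        {
            if (ln.Length > 4)
            {
                lines[counter] = ln;
                counter++;
            }
        }
        string[] toReturn = new string[counter]; // counter == number of lines stored
        Array.Copy(lines, toReturn, counter);
        return toReturn;
    }
}
```

For a real file it could be called as: using (var file = new StreamReader("StudentMarks.txt")) { string[] fileLines = MarksReader.ReadLongLines(file, 10000); }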
If you're looking to return name[50] and you know that will be populated, why not go with:
static string readfileName(string[] name)
{
using (StreamReader file = new StreamReader("StudentMarks.txt"))
{
int counter = 0;
string ln;
while ((ln = file.ReadLine()) != null)
{
if (ln.Length > 4)
{
name[counter] = ln;
counter++;
}
}
file.Close();
return name[50];
}
}
You're getting the error because your method signature indicates that you're going to return a string, but you're defining name as a string[] in the argument. If you simply select a single index of your array in the return statement, you'll only return a string.
You have defined your method to return a string, yet the code inside is returning name, which is a string[]. If you want it to return a string[], then change the signature to specify that:
static string[] ReadFileName(string[] name)
However, since your method is only populating the array that was passed in, it's not really necessary to return the array, since the caller already has a reference to the array we're modifying (they passed it to our method in the first place).
There is a potential problem here, though
We're expecting the caller to pass us an array of the appropriate length to hold all the valid lines from the file, yet that number is unknown until we read the file. We could return an array of the size they specified with either empty indexes at the end if it was too big, or incomplete data if it was too small, but instead we should probably just return them a new array, and not require them to pass one to us.
Note that it's easier to use a List<string> instead of a string[], since lists don't require any knowledge of their size at instantiation (they can grow dynamically). Also, we no longer need a counter variable (since we're using the Add method of the list to add new items), and we can remove the file.Close() call since the using block will call that automatically (one of the cool things about them):
static string[] ReadFileName()
{
List<string> validLines = new List<string>();
using (StreamReader file = new StreamReader("StudentMarks.txt"))
{
string ln;
while ((ln = file.ReadLine()) != null)
{
if (ln.Length > 4)
{
validLines.Add(ln);
}
}
}
return validLines.ToArray();
}
And we can simplify the code even more if we use some static methods of the System.IO.File class:
static string[] ReadFileName()
{
return File.ReadLines("StudentMarks.txt").Where(line => line.Length > 4).ToArray();
}
We could also make the method a little more robust by allowing the caller to specify the file name as well as the minimum line length requirement:
static string[] ReadFileName(string fileName, int minLineLength)
{
return File.ReadLines(fileName)
.Where(line => line.Length >= minLineLength).ToArray();
}
Well, you are trying to do several things in one method:
Read the "StudentMarks.txt" file
Put the first lines into the existing name array (what if the file has too few lines?)
Return the item at index 50 (magic number!)
If you insist on such implementation:
using System.Linq;
...
static string readfileName(string[] name)
{
var data = File
.ReadLines("StudentMarks.txt")
.Where(line => line.Length > 4)
.Take(name.Length);
int counter = 0;
foreach (var item in data)
if (counter < name.Length)
name[counter++] = item;
return name.Length > 50 ? name[50] : "";
}
However, I suggest doing all things separately:
// Reading file lines, materialize them into string[] name
string[] name = File
.ReadLines("StudentMarks.txt")
.Where(line => line.Length > 4)
// .Take(51) // uncomment, if you want at most 51 items
.ToArray();
...
// 50th item of string[] name if any
string item50 = name.Length > 50 ? name[50] : "";
Edit: Splitting a single record (name and score) into different collections (name[] and score[]?) is often a bad idea;
the criterion itself (line.Length > 4) is dubious as well (what if we have Lee, a 3-letter name, with a score of 187?).
Let's implement a finite state machine with 2 states (reading a name or reading a score) and read (name, score) pairs:
var data = File
.ReadLines("StudentMarks.txt")
.Select(line => line.Trim())
.Where(line => !string.IsNullOrEmpty(line));
List<(string name, int score)> namesAndScores = new List<(string name, int score)>();
string currentName = null;
foreach (string item in data) {
if (null == currentName)
currentName = item;
else {
namesAndScores.Add((currentName, int.Parse(item)));
currentName = null;
}
}
Now it's easy to deal with namesAndScores:
// 26th student (index 25) and his/her score:
if (namesAndScores.Count > 25)
Console.Write($"{namesAndScores[25].name} achieved {namesAndScores[25].score}");

How to add more than one element to my list/array in C# while reading a file, so I can print all on one line per line of the text?

Help me find out how to add the full array I created into one line. Changing the int to double seems to cause big errors, so for now I just want to try to add everything on the whole line.
It seems that when I try to add an array it prints out the name of the array, when I try to print the array instance name it prints the file path, and when I add splitArray[0] it successfully prints the first element of the array. How can I add the whole array and not just the first element?
This is what the text looks like:
regular,bread,2.00,2
regular,milk,2.00,3
This is what I want it to look like after coded
regular,bread,2.00,2,(the result of 2*2*GST)
regular,milk,2.00,3,(the result of 2*3*GST)
This is what I get it(dont need to show regular item cost string):
System.Collections.Generic.List`1[System.String]
RegularItemCost:
4.4
This is the code I have got for reading and the method and constructors for calculations:
public List<string> readFile()
{
string line = "";
StreamReader reader = new StreamReader("groceries.txt"); //variable reader to read file
while ((line = reader.ReadLine()) != null) //reader reads each line while the lines is not blank, line is assigned value of reader
{
line = line.Trim(); //gets rid of any spaces on each iteration within the line
if (line.Length > 0) //during each line the below actions are performed
{
string[] splitArray = line.Split(new char[] { ',' }); //creates a array called splitArray which splits each line into an array and a new char
type = splitArray[0]; // type is assigned for each line at position [0] on
name = splitArray[1]; //name is assigned at position [1]
//<<<-------food cost calculation methods initialized-------->>>>
RegularItem purchasedItem = new RegularItem(splitArray); //purchased Item is the each line to be printed
FreshItem freshItem = new FreshItem(splitArray);
double regCost = purchasedItem.getRegularCost(); //regCost will multiply array at position [2] with [3]
double freshCost = freshItem.getFreshItemCost();
string[] arrayList = { Convert.ToString(regCost), Convert.ToString(freshCost) };
List<string> newArray = new List<string>(splitArray);
newArray.AddRange(arrayList);
if (type == "regular")
{
// items.InsertRange(4, (arrayList)); //first write a line in the list with the each line written
items.Add(Convert.ToString(newArray));
items.Add("RegularItemCost:");
items.Add(Convert.ToString(regCost)); //next add the regCost method to write a line with the cost of that item
}
else if (type == "fresh")
{
items.Add(Convert.ToString(freshItem)); //first write a line in the list with the each line written
items.Add("FreshItemCost:");
items.Add(Convert.ToString(freshCost)); //next add the fresh method to write another line with the cost of that item
}
}
}
return items;
}
//constrctor and method
public class RegularItem : GroceryItem //inheriting properties from class GroceryItem
{
private string[] splitArray;
public RegularItem()
{
}
public RegularItem(string[] splitArray) //enables constructor for RegularItem to split into array
{
this.type = splitArray[0];
this.name = splitArray[1];
this.price = double.Parse(splitArray[2]); //each line at position 4 is a double
this.quantity = double.Parse(splitArray[3]); //each line at position 3 is parsed to an integer
}
public double getRegularCost() //method from cost of regular
{
return this.price * this.quantity * 1.1; //workout out cost for purchases including GST
}
}
OK, several things. First, while it's not wrong to use Convert.ToString(), I think it's better to just call .ToString(). Remember that all types inherit from object, so every object has that method.
If you want all the values of a collection "joined" into one string, use string.Join(); it lets you specify a separator to put between the values. If you call Convert.ToString() directly on the list, it just prints information about the list object itself, not the values inside it.
Next, if you use ToString or Convert.ToString with a built-in type like int or double, it returns the number as a string, but if you do it with your own class, or with a more complex object like List, it just returns the type name. To solve this in your own classes (like RegularItem, for instance), override the ToString() method and put there whatever you want printed when the method gets called. For example, you could override it to print the cost.
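Here is a minimal sketch of both suggestions together: string.Join to build the line, and a ToString override so the object prints something useful. GroceryLine is a hypothetical stand-in for the question's RegularItem class, not its actual definition; the 1.1 factor is the GST multiplier from the question, and decimal with a fixed "0.00" format is my own choice to keep the printed cost predictable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class GroceryLine
{
    private readonly string[] fields;

    public GroceryLine(string line)
    {
        fields = line.Split(',');
    }

    // Price * quantity * 1.1 (the GST factor used in the question).
    public decimal Cost =>
        decimal.Parse(fields[2], CultureInfo.InvariantCulture) *
        decimal.Parse(fields[3], CultureInfo.InvariantCulture) * 1.1m;

    // Used by Console.WriteLine, string interpolation and Convert.ToString.
    public override string ToString()
    {
        var parts = new List<string>(fields);
        parts.Add(Cost.ToString("0.00", CultureInfo.InvariantCulture));
        return string.Join(",", parts); // joins the values, not the type name
    }
}
```

With this in place, items.Add(new GroceryLine("regular,bread,2.00,2").ToString()) adds the text regular,bread,2.00,2,4.40 to the list.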

Method adds not a necessary line to a list when it should not do so

I'm a beginner in c# and I am working with text exercises. I made a method to filter vehicle's plate numbers. It should consist of 3 letters and 3 integers ( AAA:152 ). My method sends the wrong plate numbers to a file, but also it adds that bad number to a good ones list.
private static string[] InvalidPlates(string[] csvLines, int fieldToCorrect)
{
var toReturn = new List<string>();
var toSend = new List<string>();
int wrongCount = 0;
for (int i = 0; i < csvLines.Length; i++)
{
string[] stringFields = csvLines[i].Split(csvSeparator[0]);
string[] values = stringFields[fieldToCorrect].Split(':');
if(Regex.IsMatch(values[0], @"^[a-zA-Z]+$") && Regex.IsMatch(values[1], "^[0-9]+$"))
{
toReturn.Add(string.Join(csvSeparator, stringFields));
}
else
{
toSend.Add(string.Join(csvSeparator, stringFields));
wrongCount++;
}
}
WriteLinesToFile(OutputFile, toSend.ToArray(), wrongCount);
return toReturn.ToArray();
}
Can somebody help me to fix that?
You need to constrain the possible length using quantifiers:
^[a-zA-Z]{3}\:\d{3}$
which literally means the following, in the strict order:
the string begins with exactly 3 lowercase or uppercase English letters, continues with a colon (:), and ends with exactly three digits
Remember that \ must be escaped in a regular C# string literal ("\\d"); in a verbatim string (@"...") it is written once.
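A quick way to check the pattern is a small helper like the sketch below. PlateValidator and IsValid are hypothetical names; the pattern is the one above, except that it uses [0-9] rather than \d, because \d in .NET also matches digits outside the ASCII range.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlateValidator
{
    // Exactly three ASCII letters, a colon, then exactly three ASCII digits.
    private static readonly Regex PlatePattern = new Regex(@"^[a-zA-Z]{3}:[0-9]{3}$");

    public static bool IsValid(string plate)
    {
        return plate != null && PlatePattern.IsMatch(plate);
    }
}
```

PlateValidator.IsValid("AAA:152") returns true, while "AAAA:152", "AAA:15" and "AA1:152" are all rejected.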
Also, there is no need to join stringFields back into a string, when you can use non-splitted csvLines[i]:
if (Regex.IsMatch(stringFields[fieldToCorrect], @"^[a-zA-Z]{3}:\d{3}$"))
{
toReturn.Add(csvLines[i]);
}
else
{
toSend.Add(csvLines[i]);
wrongCount++;
}
Another important point is that your method does more than its name suggests, which goes against the single-responsibility principle. It is not obvious that a method called InvalidPlates will also save something to a file. That may confuse you, or other developers, later on. There should be no "hidden" functionality; each method should do one thing.
Here is how I would do this using LINQ:
private static bool IsACorrectPlate(string p) => Regex.IsMatch(p, @"^[a-zA-Z]{3}:\d{3}$");
private static void SortPlatesOut(string[] csvLines, int column, out string[] correct, out string[] incorrect)
{
// ToLookup returns an empty sequence for a missing key, so this also works
// when every plate is correct or every plate is incorrect.
var isCorrect = csvLines.ToLookup(l => IsACorrectPlate(l.Split(';')[column]));
correct = isCorrect[true].ToArray();
incorrect = isCorrect[false].ToArray();
}
// Usage:
string[] incorrect, correct;
SortPlatesOut(csvLines, 1, out correct, out incorrect);
File.WriteAllLines("", incorrect);
// do whatever you need with correct
Now the SortPlatesOut method has predictable behavior with no side effects. The code is also about half as long, and to me it looks more readable. If it does not look readable to you, you can unpack the LINQ and split things up.

Search for string in multiple text files of size 150 MB each C#

I have multiple .txt files of 150MB size each. Using C# I need to retrieve all the lines containing the string pattern from each file and then write those lines to a newly created file.
I already looked into similar questions but none of their suggested answers could give me the fastest way of fetching results. I tried regular expressions, linq query, contains method, searching with byte arrays but all of them are taking more than 30 minutes to read and compare the file content.
My test files don't have any specific format; it's raw data which we can't split based on a delimiter and filter based on DataViews. Below is a sample of the line format in that file.
Sample.txt
LTYY;;0,0,;123456789;;;;;;;20121002 02:00;;
ptgh;;0,0,;123456789;;;;;;;20121002 02:00;;
HYTF;;0,0,;846234863;;;;;;;20121002 02:00;;
Multiple records......
My Code
using (StreamWriter SW = new StreamWriter(newFile))
{
using(StreamReader sr = new StreamReader(sourceFilePath))
{
while (sr.Peek() >= 0)
{
if (sr.ReadLine().Contains(stringToSearch))
SW.WriteLine(sr.ReadLine().ToString());
}
}
}
I want a sample code which would take less than a minute to search for 123456789 from the Sample.txt. Let me know if my requirement is not clear. Thanks in advance!
Edit
I found the root cause: the files reside on a remote server, and that is what consumes most of the time when reading them. When I copied the files to my local machine, all comparison methods completed very quickly, so the issue isn't the way we read or compare content; they all took more or less the same time.
But how do I address this now? I can't copy all those files to my machine for comparison, and I get OutOfMemory exceptions.
The fastest way to search is the Boyer–Moore string search algorithm, because it does not need to examine every byte of the file; it does, however, require random access to the bytes. You could also try the Rabin–Karp algorithm.
Or you can try something like the following code, from this answer:
public static int FindInFile(string fileName, string value)
{ // returns complement of number of characters in file if not found
// else returns index where value found
int index = 0;
using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName))
{
if (String.IsNullOrEmpty(value))
return 0;
StringSearch valueSearch = new StringSearch(value);
int readChar;
while ((readChar = reader.Read()) >= 0)
{
++index;
if (valueSearch.Found(readChar))
return index - value.Length;
}
}
return ~index;
}
public class StringSearch
{ // Call Found one character at a time until string found
private readonly string value;
private readonly List<int> indexList = new List<int>();
public StringSearch(string value)
{
this.value = value;
}
public bool Found(int nextChar)
{
for (int index = 0; index < indexList.Count; )
{
int valueIndex = indexList[index];
if (value[valueIndex] == nextChar)
{
++valueIndex;
if (valueIndex == value.Length)
{
indexList[index] = indexList[indexList.Count - 1];
indexList.RemoveAt(indexList.Count - 1);
return true;
}
else
{
indexList[index] = valueIndex;
++index;
}
}
else
{ // next char does not match
indexList[index] = indexList[indexList.Count - 1];
indexList.RemoveAt(indexList.Count - 1);
}
}
if (value[0] == nextChar)
{
if (value.Length == 1)
return true;
indexList.Add(1);
}
return false;
}
public void Reset()
{
indexList.Clear();
}
}
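For comparison with the Boyer–Moore suggestion in this answer, here is a minimal sketch of the simplified Boyer–Moore–Horspool variant, which keeps only the bad-character shift table. The HorspoolSearch class and IndexOf method are names made up for this example, and it searches an in-memory string rather than a file, so it illustrates the skipping idea and is not a drop-in replacement for the code above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating Boyer-Moore-Horspool (a simplified Boyer-Moore).
public static class HorspoolSearch
{
    // Returns the index of the first ordinal occurrence of pattern in text, or -1.
    public static int IndexOf(string text, string pattern)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (pattern == null) throw new ArgumentNullException(nameof(pattern));

        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > text.Length) return -1;

        // For every pattern character except the last, remember how far the
        // window may jump when that character is under the window's last position.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
            shift[pattern[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= text.Length - m)
        {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // Characters that do not occur in the pattern allow a full-length jump.
            char last = text[pos + m - 1];
            pos += shift.TryGetValue(last, out int s) ? s : m;
        }
        return -1;
    }
}

// Usage:
// HorspoolSearch.IndexOf("LTYY;;0,0,;123456789;;;;;;;20121002 02:00;;", "123456789")  // 11
```

The jumps are what let the algorithm skip characters, and they are also why it needs random access to the text instead of reading it one character at a time.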
I don't know how long this will take to run, but here are some improvements:
using (StreamWriter SW = new StreamWriter(newFile))
{
using (StreamReader sr = new StreamReader(sourceFilePath))
{
while (!sr.EndOfStream)
{
var line = sr.ReadLine();
if (line.Contains(stringToSearch))
SW.WriteLine(line);
}
}
}
Note that you don't need Peek, EndOfStream will give you what you want. You were calling ReadLine twice (probably not what you had intended). And there's no need to call ToString() on a string.
As I said already, you should have a database, but whatever.
The fastest, shortest and nicest way to do it (even one-lined) is this:
File.AppendAllLines("b.txt", File.ReadLines("a.txt")
.Where(x => x.Contains("123456789")));
But fast? 150MB is 150MB. It's gonna take a while.
You can replace the Contains method with your own, for faster comparison, but that's a whole different question.
Other possible solution...
var sb = new StringBuilder();
foreach (var x in File.ReadLines("a.txt").Where(x => x.Contains("123456789")))
{
sb.AppendLine(x);
}
File.WriteAllText("b.txt", sb.ToString()); // That is one heavy operation there...
I tested it with a 150 MB file, and it found all results within 3 seconds. The thing that takes time is writing the results into the 2nd file (in case there are many results).
150MB is 150MB. If you have one thread going through the entire 150MB, line by line (a "line" being terminated by a newline character/group or by an EOF), your process must read in and spin through all 150MB of the data (not all at once, and it doesn't have to hold all of it at the same time). A linear search through 157,286,400 characters is, very simply, going to take time, and you say you have many such files.
First thing; you're reading the line out of the stream twice. This will, in most cases, actually cause you to read two lines whenever there's a match; what's written to the new file will be the line AFTER the one containing the search string. This is probably not what you want (then again, it may be). If you want to write the line actually containing the search string, read it into a variable before performing the Contains check.
Second, String.Contains() will, by necessity, perform a linear search. With a naive substring search the worst case actually approaches N×M comparisons (N being the length of the text and M the length of the search string), because when searching for a string within a string, the first character must be found, and where it is, each character is then matched one by one to subsequent characters until all characters in the search string have matched or a non-matching character is found; when a non-match occurs, the algorithm must go back to the character after the initial match to avoid skipping a possible match, meaning it can test the same character many times when checking for a long string against a longer one with many partial matches. This strategy is therefore technically a "brute force" solution. Unfortunately, when you don't know where to look (such as in unsorted data files), there is no way around scanning all of the data.
The only possible speedup I could suggest, other than being able to sort the files' data and then perform an indexed search, is to multithread the solution; if you're only running this method on one thread that looks through every file, not only is only one thread doing the job, but that thread is constantly waiting for the hard drive to serve up the data it needs. Having 5 or 10 threads each working through one file at a time will not only leverage the true power of modern multi-core CPUs more efficiently, but while one thread is waiting on the hard drive, another thread whose data has been loaded can execute, further increasing the efficiency of this approach. Remember, the further away the data is from the CPU, the longer it takes for the CPU to get it, and when your CPU can do between 2 and 4 billion things per second, having to wait even a few milliseconds for the hard drive means you're losing out on millions of potential instructions per second.
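The one-thread-per-file idea could be sketched roughly as follows, using Parallel.ForEach from the Task Parallel Library. The ParallelFileSearch class, the Search method and its maxThreads parameter are names assumed for this example. Whether it actually helps depends on where the bottleneck is: if a slow disk or network share is the limit, extra threads may gain little.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; names and signature are assumptions for illustration.
public static class ParallelFileSearch
{
    // Scans each file on its own worker and returns the matching lines per file.
    public static IDictionary<string, List<string>> Search(
        IEnumerable<string> paths, string needle, int maxThreads)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(paths, options, path =>
        {
            // File.ReadLines streams the file line by line, so a worker never
            // holds the whole 150 MB in memory, only the lines that match.
            List<string> matches = File.ReadLines(path)
                .Where(line => line.Contains(needle))
                .ToList();
            results[path] = matches;
        });

        return results;
    }
}

// Usage:
// var hits = ParallelFileSearch.Search(Directory.GetFiles(folder, "*.txt"), "123456789", 5);
// foreach (var pair in hits)
//     File.AppendAllLines(outputPath, pair.Value);
```

Collecting the matches per file and writing them out afterwards keeps the workers from contending for a single output stream.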
I'm not giving you sample code, but have you tried sorting the content of your files?
Trying to search for a string in 150 MB worth of files is going to take some time any way you slice it, and if regex takes too long for you, then I'd suggest sorting the content of your files so that you know roughly where "123456789" will occur before you actually search. That way you won't have to search the unimportant parts.
Do not read and write at the same time. Search first, save the list of matching lines, and write it to the file at the end.
using System;
using System.Collections.Generic;
using System.IO;
...
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader("input.txt")) {
string line;
while ((line = reader.ReadLine()) != null) {
if (line.Contains(stringToSearch)) {
list.Add(line); // Add to list.
}
}
}
using (StreamWriter writer = new StreamWriter("output.txt")) {
foreach (string line in list) {
writer.WriteLine(line);
}
}
Your approaches are going to run into performance problems because they block on input from these files while doing the string comparisons.
But Windows has a pretty high performance GREP-like tool for doing string searches of text files called FINDSTR that might be fast enough. You could simply call it as a shell command or redirect the results of the command to your output file.
Either preprocessing (sort) or loading your large files into a database will be faster, but I'm assuming that you already have existing files you need to search.
