I have a .csv file that looks like this:
Index,Time,value
1,0,20
2,1,30
What I want to do is to open the .csv file in C# and then read it by getting row = 0, column = 0, which would give me the value 1 in the case above. Much like this:
public double GetVal(int row, int column)
{
...
return val;
}
I have looked around for a solution, for example this one: Remove values from rows under specific columns in csv file
But I need to be able to specify both the column and row in the function to get the specific value.
In case the CSV file is a simple one (no quotations), you can try LINQ:
using System.IO;
using System.Linq;
...
public double GetVal(int row, int column) {
return File
.ReadLines(@"c:\MyFile.csv") //TODO: put the right name here
.Where(line => !string.IsNullOrWhiteSpace(line)) // to be on the safe side
.Skip(1) // Skip line with Titles
.Skip(row)
.Select(line => double.Parse(line.Split(',')[column]))
.First();
}
Note that this code re-reads the file on every call; you may want to read the file only once:
string[][] m_Data = File
.ReadLines(@"c:\MyFile.csv")
.Where(line => !string.IsNullOrWhiteSpace(line))
.Skip(1)
.Select(line => line.Split(','))
.ToArray();
...
public double GetVal(int row, int column) => double.Parse(m_Data[row][column]);
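For example, the cached version could live in a small wrapper class (a hypothetical sketch; CsvData is just an illustrative name, not a library type):

public class CsvData
{
    private readonly string[][] m_Data;

    public CsvData(string path)
    {
        m_Data = File
            .ReadLines(path)
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Skip(1) // skip the line with titles
            .Select(line => line.Split(','))
            .ToArray();
    }

    public double GetVal(int row, int column) => double.Parse(m_Data[row][column]);
}

// var data = new CsvData(@"c:\MyFile.csv");
// double first = data.GetVal(0, 0); // 1 for the sample file above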
Although you can write simple code yourself, I would suggest using a dedicated CSV library for this, like https://www.nuget.org/packages/LumenWorksCsvReader/
There are tons of edge cases in the CSV file format, like values escaped with double quotes and multiline values.
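For what it's worth, a rough sketch with LumenWorksCsvReader might look like this (I'm assuming the package's usual CsvReader API here; the second constructor argument tells it the file has a header row):

using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;
...
public double GetVal(int row, int column)
{
    using (var csv = new CsvReader(new StreamReader(@"c:\MyFile.csv"), true))
    {
        int current = 0;
        while (csv.ReadNextRecord())
        {
            if (current == row)
                return double.Parse(csv[column]); // the indexer returns the field as a string
            current++;
        }
    }
    throw new ArgumentOutOfRangeException(nameof(row));
}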
But if you totally control your files and they are small, you can read all lines at once and parse them yourself. Something like this gets all lines from the file, and then you split every line on the ',' character:
var lines = File.ReadAllLines("your file");
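A minimal sketch of that idea, assuming a simple file with no quoted or multiline values:

public double GetVal(int row, int column)
{
    var lines = File.ReadAllLines("your file"); // TODO: put the real path here
    var fields = lines[row + 1].Split(',');     // +1 skips the header row
    return double.Parse(fields[column]);
}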
Quick answer, without considering performance issues (e.g. the file should really be read only once) or other issues like index checking to avoid overflows:
public double GetVal(int row, int column)
{
    using (var reader = new StreamReader("filename.csv"))
    {
        reader.ReadLine(); // skip the line with the titles
        int m = 0;
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine(); // read a line on every pass so the loop advances
            if (m == row)
            {
                var splits = line.Split(',');
                // You need to check the index to avoid an overflow
                return double.Parse(splits[column]);
            }
            m++;
        }
    }
    throw new ArgumentOutOfRangeException(nameof(row), "The file does not contain that many rows.");
}
I have created a text file with some random float numbers ranging from 743.6 to 1500.4.
I am figuring out a way to read the text file (which I have done), include a number range, let's say (743.6 <= x <= 800), remove the numbers which are outside the range, and eventually store the final values in a text file.
I have managed to write some code to read the text file, so that when I compile it shows the numbers in the text file. Now I do not know how to progress further. Here is my code, which compiles and runs and is able to read the text file:
743.6
742.8
744.7
743.2
1000
1768.6
1750
1767
1780
1500
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace ReadTextFile
{
class Program
{
static void Main(string[] args)
{
string filePath = "C:\\Users\\Student\\Desktop\\ConsoleApp1\\ConsoleApp1\\Data\\TextFile.txt"; // File directory
List<string> lines = File.ReadAllLines(filePath).ToList();
foreach (string line in lines)
{
Console.WriteLine(line);
}
Console.ReadLine();
}
}
}
This will read the file into memory, parse it, filter it, and then overwrite the existing file with the new data.
File.WriteAllLines(filePath,
File.ReadAllLines(filePath)
.Select(x => double.Parse(x))
.Where(x => x >= 800.5 && x <= 850.5)
.Select(x => x.ToString()));
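A variant closer to the range in the question (743.6 <= x <= 800) that also skips any lines which don't parse as numbers (filePath is the variable from the question's code):

double min = 743.6, max = 800;
File.WriteAllLines(filePath,
    File.ReadAllLines(filePath)
        .Where(line => double.TryParse(line, out var value) && value >= min && value <= max));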
Here's my solution w/ basic error detection and some robustness thanks to the use of regular expressions.
As a foreword: Using regular expressions can be quite expensive and they are not always the way to go.
In this case I think they're okay, because you're handling user-generated input (possibly).
Regular expressions can be optimised by pre-compiling them!
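For instance, pre-compiling the separator pattern used below could look like this (just a sketch):

// Declared once, e.g. as a static field, and reused for every file
private static readonly Regex Separator = new Regex(@"[;\s:,]", RegexOptions.Compiled);
// ...later: var numberStrings = Separator.Split(File.ReadAllText(fileInfo.FullName));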
/*
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Diagnostics; // for Debug.WriteLine below
*/
void ReadFile(string filePath) {
var fileInfo = default(FileInfo);
var separator = @"[;\s:,]"; // This is a simple RegEx, can be done otherwise. This allows for a little more robustness IMO
// VERY rudimentary error detection
if (string.IsNullOrEmpty(filePath))
throw new ArgumentNullException(nameof(filePath), "The path to the file must not be null or empty!");
try {
fileInfo = new FileInfo(filePath);
} catch {
throw new ArgumentException("A valid path must be given!", nameof(filePath));
}
if (!fileInfo.Exists) {
throw new IOException(string.Format("The file {0} does not exist!", filePath));
}
// END VERY rudimentary error checking
var numberStrings = Regex.Split(File.ReadAllText(fileInfo.FullName), separator);
// numberStrings is now an array of strings
foreach (var numString in numberStrings) {
if (decimal.TryParse(numString, out var myDecimal)) {
// Do something w/ number
} else {
Debug.WriteLine("{0} is NaN!", numString);
}
}
}
Here's what the code does (written off the top of my head, please don't just C&P it. Test it first):
At first we're defining the regular expression. This matches any single character in the set between the brackets.
Then we're performing very basic error checking:
If the argument passed is null or empty throw an exception
If we couldn't parse the argument to a FileInfo object, the path is likely invalid. Throw an exception.
If the file doesn't exist, throw an exception.
Next we're reading the entire text file into memory (not on a per-line basis!) and using the regular expression we've defined to split the entire string into an array of strings.
At last we're looping through our array of strings and parsing each number to a float (that's what you wanted. I personally would use a double or decimal for more precision. See this video from Tom Scott.).
If the string doesn't parse, then you can handle the error accordingly. Otherwise do what you need to with the variable myDecimal.
EDIT:
I thought I read you wanting to use floats. My mistake; I changed the datatype to decimal.
You need to read all the lines and replace all values outside your min and max values with an empty string:
float min = 800.5F, max = 850.5F;
float currentValue;
var lines = File.ReadAllLines(usersPath);
var separator = ';'; // Change this according to which separator you're using between your values (if any)
for (int i = 0; i < lines.Length; i++)
{
    foreach (string word in lines[i].Trim().Split(separator))
    {
        if (float.TryParse(word.Trim(), out currentValue))
        {
            if (currentValue < min || currentValue > max)
            {
                // Strings are immutable, so the result of Replace must be assigned back
                lines[i] = lines[i].Replace(word, "");
            }
        }
    }
}
File.WriteAllLines(usersPath, lines);
I am making a console app that reads CSV files, using LINQ to load every line of the file into an IEnumerable.
var lines = from rawLine in File.ReadLines(readFolderFile, Encoding.Default)
            where !string.IsNullOrEmpty(rawLine) && !string.IsNullOrEmpty(rawLine.Trim(';'))
            select rawLine;
Now I need to check how many semicolons every line has compared to the first line, and if a line has more semicolons than the first one it will be added to an errorList.
So my question: is there any easy way to just count the occurrences of a specific symbol per line?
The outcome should be that I can say, after my source file has been processed with this app, that every row has an identical number of columns.
Remember that every string can be treated as an IEnumerable<char>, so:
using System.Linq;
...
"my;string;from;csv;file".Count(c => c.Equals(';')); // = 4;
...
You can use this:
int count = source.Count(f => f == ';');
where source is string variable.
So in your case it will look like:
foreach (var line in lines)
{
if (line.Count(f => f == ';') != firstLineCount)
{
//your logic here
}
}
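Putting it together, a minimal sketch (lines is the IEnumerable from the question; errorList is a hypothetical List<string>):

var allLines = lines.ToList();
int firstLineCount = allLines[0].Count(c => c == ';');
var errorList = new List<string>();

foreach (var line in allLines.Skip(1))
{
    if (line.Count(c => c == ';') != firstLineCount)
        errorList.Add(line); // this row does not have the same number of columns as the first one
}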
I have a text file whose format is like this
Number,Name,Age
I want to read "Number" in the first column of this text file into an array to find duplicates. Here are the two ways I tried to read in the file.
string[] account = File.ReadAllLines(path);
string readtext = File.ReadAllText(path);
But every time I try to split the array to just get what's to the left of the first comma, I fail. Any ideas? Thanks.
You need to explicitly split the data to access its various parts. How would your program otherwise be able to decide that it is separated by commas?
The easiest approach to access the number that comes to my mind goes something like this:
var lines = File.ReadAllLines(path);
var firstLine = lines[0];
var fields = firstLine.Split(',');
var number = fields[0]; // Voilà!
You could go further by parsing the number as an int or another numeric type (if it really is a number). On the other hand, if you just want to test for uniqueness, this is not really necessary.
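If you do want to check for duplicates, a minimal sketch could use a HashSet to remember which numbers have already been seen (path is the variable from the question):

var seen = new HashSet<string>();
var duplicates = new List<string>();

foreach (var line in File.ReadAllLines(path))
{
    var number = line.Split(',')[0];
    if (!seen.Add(number)) // Add returns false if the value was already present
        duplicates.Add(number);
}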
If you want all duplicate lines according to the Number:
var numDuplicates = File.ReadLines(path)
.Select(l => l.Trim().Split(','))
.Where(arr => arr.Length >= 3)
.Select(arr => new {
Number = arr[0].Trim(),
Name = arr[1].Trim(),
Age = arr[2].Trim()
})
.GroupBy(x => x.Number)
.Where(g => g.Count() > 1);
foreach(var dupNumGroup in numDuplicates)
Console.WriteLine("Number:{0} Names:{1} Ages:{2}"
, dupNumGroup.Key
, string.Join(",", dupNumGroup.Select(x => x.Name))
, string.Join(",", dupNumGroup.Select(x => x.Age)));
If you are looking specifically for a string.split solution, here is a really simple method of doing what you are looking for:
List<int> importedNumbers = new List<int>();
// Read our file in to an array of strings
var fileContents = System.IO.File.ReadAllLines(path);
// Iterate over the strings and split them in to their respective columns
foreach (string line in fileContents)
{
var fields = line.Split(',');
if (fields.Count() < 3)
throw new Exception("We need at least 3 fields per line."); // You would REALLY do something else here...
// You would probably want to be more careful about your int parsing... (use TryParse)
var number = int.Parse(fields[0]);
var name = fields[1];
var age = int.Parse(fields[2]);
// if we already imported this number, continue on to the next record
if (importedNumbers.Contains(number))
continue; // You might also update the existing record at this point instead of just skipping...
importedNumbers.Add(number); // Keep track of numbers we have imported
}
Here is just an example of the data I need to format.
The first column is simple; the problem is the second column.
What would be the best approach to format multiple data fields in one column?
How to parse this data?
Important*: The second column needs to contain multiple values, like in an example below
Name Details
Alex Age:25
Height:6
Hair:Brown
Eyes:Hazel
A csv should probably look like this:
Name,Age,Height,Hair,Eyes
Alex,25,6,Brown,Hazel
Each cell should be separated by exactly one comma from its neighbor.
You can reformat it into that shape by using a simple regex which replaces certain newline and non-newline whitespace with commas (you can easily find each block because it has values in both columns).
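A rough sketch of that idea, assuming one block at a time and that the values themselves contain no whitespace or colons:

using System;
using System.Text.RegularExpressions;
...
// One block of the source data (the header row already removed)
string block = "Alex Age:25\nHeight:6\nHair:Brown\nEyes:Hazel";

// Drop the "Key:" labels, then collapse the remaining whitespace/newlines into commas
string noLabels = Regex.Replace(block, @"\w+:", "");
string csvRow = Regex.Replace(noLabels.Trim(), @"\s+", ",");

Console.WriteLine(csvRow); // Alex,25,6,Brown,Hazel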
A CSV file is normally defined using commas as field separators and a line break (CR/LF) as the row separator. You are using line breaks within your second column, and this will cause problems. You'll need to reformat your second column to use some other form of separator between multiple values. A common alternate separator is the | (pipe) character.
Your format would then look like:
Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel
In your parsing, you would first parse the comma separated fields (which would return two values), and then parse the second field as pipe separated.
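A minimal sketch of that two-step parse, assuming the reformatted line above:

string line = "Alex,Age:25|Height:6|Hair:Brown|Eyes:Hazel";

var fields = line.Split(',');    // ["Alex", "Age:25|Height:6|Hair:Brown|Eyes:Hazel"]
string name = fields[0];

foreach (var pair in fields[1].Split('|'))
{
    var parts = pair.Split(':');  // e.g. ["Age", "25"]
    Console.WriteLine("{0}: {1}", parts[0], parts[1]);
}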
This is an interesting one - it can be quite difficult to parse files in a custom format, which is why people often write specific classes to deal with them. More conventional file formats, like CSV or other delimited formats, are easier to read because they are structured in a predictable way.
A problem like the above can be addressed in the following way:
1) What should the output look like?
In your instance, and this is just a guess, but I believe you are aiming for the following:
Name, Age, Height, Hair, Eyes
Alex, 25, 6, Brown, Hazel
In which case, you have to parse out this information based on the structure above. If it's repeated blocks of text like the above then we can say the following:
a. Every person is in a block starting with Name Details
b. The name value is the first text after Details, with the other columns being delimited in the format Column:Value
However, you might also have sections with additional attributes, or attributes that are missing if the original input was optional, so tracking the column and ordinal would be useful too.
So one approach might look like the following:
public void ParseFile(){
String currentLine;
bool newSection = false;
//Store the column names and ordinal position here.
List<String> nameOrdinals = new List<String>();
nameOrdinals.Add("Name"); //IndexOf == 0
Dictionary<Int32, List<String>> nameValues = new Dictionary<Int32 ,List<string>>(); //Use this to store each person's details
Int32 rowNumber = 0;
using (TextReader reader = File.OpenText("D:\\temp\\test.txt"))
{
while ((currentLine = reader.ReadLine()) != null) //This will read the file one row at a time until there are no more rows to read
{
string[] lineSegments = currentLine.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries);
if (lineSegments.Length == 2 && String.Compare(lineSegments[0], "Name", StringComparison.InvariantCultureIgnoreCase) == 0
&& String.Compare(lineSegments[1], "Details", StringComparison.InvariantCultureIgnoreCase) == 0) //Looking for a Name Details Line - Start of a new section
{
rowNumber++;
newSection = true;
continue;
}
if (newSection && lineSegments.Length > 1) //We can start adding a new person's details - we know that
{
nameValues.Add(rowNumber, new List<String>());
nameValues[rowNumber].Insert(nameOrdinals.IndexOf("Name"), lineSegments[0]);
//Get the first column:value item
ParseColonSeparatedItem(lineSegments[1], nameOrdinals, nameValues, rowNumber);
newSection = false;
continue;
}
if (lineSegments.Length > 0 && lineSegments[0] != String.Empty) //Ignore empty lines
{
ParseColonSeparatedItem(lineSegments[0], nameOrdinals, nameValues, rowNumber);
}
}
}
//At this point we should have collected a big list of items. We can then write out the CSV. We can use a StringBuilder for now, although your requirements will
//be dependent upon how big the source files are.
//Write out the columns
StringBuilder builder = new StringBuilder();
for (int i = 0; i < nameOrdinals.Count; i++)
{
if(i == nameOrdinals.Count - 1)
{
builder.Append(nameOrdinals[i]);
}
else
{
builder.AppendFormat("{0},", nameOrdinals[i]);
}
}
builder.Append(Environment.NewLine);
foreach (int key in nameValues.Keys)
{
List<String> values = nameValues[key];
for (int i = 0; i < values.Count; i++)
{
if (i == values.Count - 1)
{
builder.Append(values[i]);
}
else
{
builder.AppendFormat("{0},", values[i]);
}
}
builder.Append(Environment.NewLine);
}
//At this point you now have a StringBuilder containing the CSV data you can write to a file or similar
}
private void ParseColonSeparatedItem(string textToSeparate, List<String> columns, Dictionary<Int32, List<String>> outputStorage, int outputKey)
{
if (String.IsNullOrWhiteSpace(textToSeparate)) { return; }
string[] colVals = textToSeparate.Split(new[] { ":" }, StringSplitOptions.RemoveEmptyEntries);
List<String> outputValues = outputStorage[outputKey];
if (!columns.Contains(colVals[0]))
{
//Add the column to the list of expected columns. The index of the column determines its index in the output
columns.Add(colVals[0]);
}
if (outputValues.Count < columns.Count)
{
outputValues.Add(colVals[1]);
}
else
{
outputStorage[outputKey].Insert(columns.IndexOf(colVals[0]), colVals[1]); //We append the value to the list at the place where the column index expects it to be. That way we can miss values in certain sections yet still have the expected output
}
}
After running this against your file, the string builder contains:
"Name,Age,Height,Hair,Eyes\r\nAlex,25,6,Brown,Hazel\r\n"
Which matches the above (\r\n is effectively the Windows new line marker)
This approach demonstrates how a custom parser might work - it's purposefully over verbose as there is plenty of refactoring that could take place here, and is just an example.
Improvements would include:
1) This function assumes there are no spaces in the actual text items themselves. This is a pretty big assumption and, if wrong, would require a different approach to parsing out the line segments. However, this only needs to change in one place - as you read a line at a time, you could apply a regex, or just read in characters and assume that everything after the first "column:" section is a value, for example.
2) No exception handling
3) Text output is not quoted. You could test each value to see if it's a date or number - if not, wrap it in quotes, as then other programs (like Excel) will attempt to preserve the underlying datatypes more effectively (see the sketch after this list).
4) Assumes no column names are repeated. If they are, then you have to check whether a column item has already been added, and then create a ColName2 column in the parsing section.
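For improvement 3, a quoting helper might look like this (a hypothetical sketch; embedded quotes are escaped by doubling them, as the CSV convention requires):

private static string QuoteIfText(string value)
{
    // Leave numbers and dates unquoted; wrap everything else in double quotes
    if (double.TryParse(value, out _) || DateTime.TryParse(value, out _))
        return value;
    return "\"" + value.Replace("\"", "\"\"") + "\"";
}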
I have a .txt file which has about 500k entries, each separated by new line. The file size is about 13MB and the format of each line is the following:
SomeText<tab>Value<tab>AnotherValue<tab>
My problem is to find a certain "string" with the input from the program, from the first column in the file, and get the corresponding Value and AnotherValue from the two columns.
The first column is not sorted, but the second and third column values in the file are actually sorted. But, this sorting is of no good use to me.
The file is static and does not change. I was thinking to use the Regex.IsMatch() here but I am not sure if that's the best approach here to go line by line.
If the lookup time would increase drastically, I could probably go for rearranging the first column (and hence un-sorting the second & third column). Any suggestions on how to implement this approach or the above approach if required?
After locating the string, how should I fetch those two column values?
EDIT
I realized that there will be quite a few searches in the file for at least one request by the user. If I have an array of values to be found, how can I return some kind of dictionary having the corresponding values of the found matches?
Maybe with this code:
var myLine = File.ReadAllLines("filename")
    .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    .Single(s => s[0] == "string to find");
myLine is an array of strings that represents the matching row. You may also use the .AsParallel() extension method for better performance.
How many times do you need to do this search?
Is the cost of some pre-processing on startup worth it if you save time on each search?
Is loading all the data into memory at startup feasible?
Parse the file into objects and stick the results into a hashtable?
I don't think Regex will help you more than any of the standard string options. You are looking for a fixed string value, not a pattern, but I stand to be corrected on that.
Update
Presuming that the "SomeText" is unique, you can use a dictionary like this
Data represents the values coming in from the file.
MyData is a class to hold them in memory.
public IEnumerable<string> Data = new List<string>() {
"Text1\tValue1\tAnotherValue1\t",
"Text2\tValue2\tAnotherValue2\t",
"Text3\tValue3\tAnotherValue3\t",
"Text4\tValue4\tAnotherValue4\t",
"Text5\tValue5\tAnotherValue5\t",
"Text6\tValue6\tAnotherValue6\t",
"Text7\tValue7\tAnotherValue7\t",
"Text8\tValue8\tAnotherValue8\t"
};
public class MyData {
public String SomeText { get; set; }
public String Value { get; set; }
public String AnotherValue { get; set; }
}
[TestMethod]
public void ParseAndFind() {
var dictionary = Data.Select(line =>
{
var pieces = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
return new MyData {
SomeText = pieces[0],
Value = pieces[1],
AnotherValue = pieces[2],
};
}).ToDictionary<MyData, string>(dat =>dat.SomeText);
Assert.AreEqual("AnotherValue3", dictionary["Text3"].AnotherValue);
Assert.AreEqual("Value7", dictionary["Text7"].Value);
}
hth,
Alan
var firstFoundLine = File.ReadLines("filename").FirstOrDefault(s => s.StartsWith("string"));
if (!string.IsNullOrEmpty(firstFoundLine)) // FirstOrDefault returns null when nothing matches
{
char yourColumnDelimiter = '\t';
var columnValues = firstFoundLine.Split(new []{yourColumnDelimiter});
var secondColumn = columnValues[1];
var thirdColumn = columnValues[2];
}
File.ReadLines is better than File.ReadAllLines here because you won't need to read the whole file -- only up until the matching string is found: http://msdn.microsoft.com/en-us/library/dd383503.aspx
Parse this monstrosity into some sort of database.
SQL Server/MySQL would be preferable, but if you can't use them for various reasons, SQLite or even Access or Excel could work.
Doing that a single time is not hard.
After you are done with that, searching will become easy and fast.
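For example, a one-time import into SQLite and the subsequent lookup might look roughly like this (a sketch assuming the Microsoft.Data.Sqlite NuGet package and a tab-separated input file; lookup.db and data.txt are placeholder names):

using System;
using System.IO;
using Microsoft.Data.Sqlite;
...
using (var conn = new SqliteConnection("Data Source=lookup.db"))
{
    conn.Open();

    using (var create = conn.CreateCommand())
    {
        create.CommandText = "CREATE TABLE IF NOT EXISTS Entries (SomeText TEXT PRIMARY KEY, Value TEXT, AnotherValue TEXT)";
        create.ExecuteNonQuery();
    }

    // One-time import of the 500k-line file
    foreach (var line in File.ReadLines("data.txt"))
    {
        var parts = line.Split('\t');
        if (parts.Length < 3) continue;

        using (var insert = conn.CreateCommand())
        {
            insert.CommandText = "INSERT OR REPLACE INTO Entries VALUES ($t, $v, $a)";
            insert.Parameters.AddWithValue("$t", parts[0]);
            insert.Parameters.AddWithValue("$v", parts[1]);
            insert.Parameters.AddWithValue("$a", parts[2]);
            insert.ExecuteNonQuery();
        }
    }

    // Afterwards, each search is a fast indexed query
    using (var query = conn.CreateCommand())
    {
        query.CommandText = "SELECT Value, AnotherValue FROM Entries WHERE SomeText = $t";
        query.Parameters.AddWithValue("$t", "string to find");
        using (var reader = query.ExecuteReader())
        {
            if (reader.Read())
                Console.WriteLine("{0} / {1}", reader.GetString(0), reader.GetString(1));
        }
    }
}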
GetLines(inputPath).FirstOrDefault(p => p.Split('\t')[0] == "SearchText")
private static IEnumerable<string> GetLines(string inputFile)
{
string filePath = Path.Combine(Directory.GetCurrentDirectory(),inputFile);
return File.ReadLines(filePath);
}