I have a csv file
Date,Open,High,Low,Close,Volume,Adj Close
2011-09-23,24.90,25.15,24.69,25.06,64768100,25.06
2011-09-22,25.30,25.65,24.60,25.06,96278300,25.06
...
and I have a class StockQuote with fields
Date, Open, High...
How can I make a list of StockQuote objects from the CSV file using LINQ?
I'm trying something like this:

string[] Data = parser.ReadFields();
var query = from d in Data
            where !String.IsNullOrWhiteSpace(d)
            let data = d.Split(',')
            select new StockQuote()
            {
                Date = data[0],
                Open = double.Parse(data[1]),
                ...
You can do something like this:

var yourData = File.ReadAllLines("yourFile.csv")
    .Skip(1)
    .Select(x => x.Split(','))
    .Select(x => new
    {
        Date = x[0],
        Open = double.Parse(x[1]),
        High = double.Parse(x[2]),
        Low = double.Parse(x[3]),
        Close = double.Parse(x[4]),
        Volume = double.Parse(x[5]),
        AdjClose = double.Parse(x[6])
    });
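If you want a List&lt;StockQuote&gt; rather than anonymous objects, the same shape works with the class from the question. A sketch, with assumptions: Date is kept as a string and Volume is a long, since the question doesn't show the field types.

```csharp
List<StockQuote> quotes = File.ReadAllLines("yourFile.csv")
    .Skip(1)                                        // skip the header row
    .Where(line => !string.IsNullOrWhiteSpace(line))
    .Select(line => line.Split(','))
    .Select(x => new StockQuote
    {
        Date = x[0],                                // assumed string; use DateTime.Parse if Date is a DateTime
        Open = double.Parse(x[1]),
        High = double.Parse(x[2]),
        Low = double.Parse(x[3]),
        Close = double.Parse(x[4]),
        Volume = long.Parse(x[5]),                  // assumed long; volumes are whole numbers
        AdjClose = double.Parse(x[6])
    })
    .ToList();
```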
You should not be using LINQ, Regex or the like for CSV parsing. For CSV parsing, use a CSV parser.
LINQ and Regex will work exactly until you run into an escaped control character, multiline fields or something of the sort. Then they will plainly break, and probably be unfixable.
Take a look at this question:
Parsing CSV files in C#, with header
The answer mentioning the .NET integrated CSV parser seems fine.
And no, you don't need LINQ for this.
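For reference, the .NET integrated parser is TextFieldParser from the Microsoft.VisualBasic assembly; it handles quoted and multiline fields correctly. A minimal sketch for the file above (field types assumed, as in the question):

```csharp
using Microsoft.VisualBasic.FileIO; // add a reference to Microsoft.VisualBasic

var quotes = new List<StockQuote>();
using (var parser = new TextFieldParser("yourFile.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    parser.ReadLine(); // skip the header row
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // correctly unquotes and unescapes each field
        quotes.Add(new StockQuote
        {
            Date = fields[0],
            Open = double.Parse(fields[1])
            // ... remaining fields as in the question
        });
    }
}
```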
I have a method whose input is a list of file paths; I want to open these files and process them. Each path contains the file extension, and I know for sure that there are 3 file extensions (txt, xlsx, xls).
In the code, pathWithFilesName is the input list of file paths;
then I want to send them to methods that will open and process them:
pathWithFilesName.Add("ds.xlsx");
pathWithFilesName.Add("ds.txt");

var listExcel = new List<string>();
var listTxt = new List<string>();
var validExcelFileTypes = new List<string> { ".xls", ".xlsx" };

foreach (var path in pathWithFilesName)
{
    var isExcel = false;
    foreach (var valid in validExcelFileTypes)
    {
        if (path.EndsWith(valid))
        {
            isExcel = true;
            break;
        }
    }
    // decide once per path; adding inside the inner loop would add
    // a duplicate to listTxt for every non-matching extension
    if (isExcel)
        listExcel.Add(path);
    else
        listTxt.Add(path);
}
This variant is not optimal at all, but it works.
I know how to take the Excel files with LINQ:

var list = (from path in pathWithFilesName
            from valid in validExcelFileTypes
            where path.EndsWith(valid)
            select path).ToList();

but with this approach I then need to compare the two lists, for example with some kind of Intersect.
What is the best way to do this?
Here is a variation using LINQ and lambdas. It should not be more or less efficient, nor better or worse; it may be more readable.
listExcel can be found this way:
var listExcel = pathWithFilesName.Where(path => validExcelFileTypes.Any(ext => path.EndsWith(ext)));

See Enumerable.Any and Enumerable.Where.
If you need both lists in one go, you can group the source on the same condition:

var listGrp = pathWithFilesName.GroupBy(path => validExcelFileTypes.Any(ext => path.EndsWith(ext)));
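To get the two lists back out of that grouping, a sketch (the group keyed true holds the Excel paths, false the rest):

```csharp
var listGrp = pathWithFilesName
    .GroupBy(path => validExcelFileTypes.Any(ext => path.EndsWith(ext)))
    .ToList(); // materialize so the source is only enumerated once

var listExcel = listGrp.Where(g => g.Key).SelectMany(g => g).ToList();
var listTxt   = listGrp.Where(g => !g.Key).SelectMany(g => g).ToList();
```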
You can use MoreLINQ's Partition: "Partitions a sequence by a predicate, ...".

var (listExcel, listTxt) = pathWithFilesName
    .Partition(p => validExcelFileTypes.Any(ext => p.EndsWith(ext)));

Under the hood it's just a GroupBy (see the source code), unrolled into a named tuple.
Hi, I have a text file that contains 3 columns, something like this:
contract1;pdf1;63
contract1;pdf2;5
contract1;pdf3;2
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf2;30
contract2;pdf3;5
contract2;pdf4;80
Now I want to write this information into another text file, ordered so that, within each contract, the records whose last column is 2 or 5 come first, something like this:
contract1;pdf3;2
contract1;pdf2;5
contract1;pdf1;63
contract1;pdf4;00
contract2;pdf1;2
contract2;pdf3;5
contract2;pdf2;30
contract2;pdf4;80
How can I do this?
Thanks.
You can use LINQ to group and sort the lines after reading, then put them back together:

var output = File.ReadAllLines(@"path-to-file")
    .Select(line => line.Split(';'))
    .GroupBy(parts => parts[0])
    .SelectMany(g => g
        // "2" sorts first, then "5"; OrderBy is stable, so all other
        // records keep their original relative order
        .OrderBy(parts => parts[2] == "2" ? 0 : parts[2] == "5" ? 1 : 2)
        .Select(parts => string.Join(";", parts)));
Then just write them to a file.
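For example (the output path here is hypothetical):

```csharp
File.WriteAllLines(@"path-to-output", output);
```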
I'm not going to write your program for you, but I would recommend this library for reading and writing delimited files:
https://joshclose.github.io/CsvHelper/getting-started/
When you new up the reader make sure to specify your semi-colon delimiter:
using (var reader = new StreamReader("path\\to\\input_file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = ";";
    csv.Configuration.HasHeaderRecord = false;      // the file has no header row
    var records = csv.GetRecords<Row>().ToList();   // materialize before the reader is disposed
    // manipulate the data as needed here
}
Your "Row" class (choose a more appropriate name for clarity) will specify the schema of the flat file. It sounds like you don't have headers? If not, you can specify the Index of each field (CsvHelper's Index attribute is 0-based):
public class Row
{
    [Index(0)]
    public string MyValue1 { get; set; }
    [Index(1)]
    public string MyValue2 { get; set; }
    [Index(2)]
    public string MyValue3 { get; set; }
}
After reading the data in, you can manipulate it as needed. If the output format differs from the input format, convert the input class into an output class. You could use the AutoMapper library for this; however, for a simple project I would suggest just converting the input class into the output class manually.
Lastly, write the data back out:
using (var writer = new StreamWriter("path\\to\\output_file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.Configuration.Delimiter = ";"; // keep the semicolon delimiter on output as well
    csv.WriteRecords(records);
}
I'm pretty new to programming in C# and I have some problems processing a lot of data from several CSV files into one XML file.
The csv files I have look like the following:
"ID","NODE","PROCESS_STATE","TIME_STAMP","PREV_TIME_STAMP","CALCULATED"
206609474,2175,47,31.03.2015 00:01:25,31.03.2015 00:01:24,1
206609475,2175,47,31.03.2015 00:02:25,31.03.2015 00:01:25,1
206609476,2175,47,31.03.2015 00:03:25,31.03.2015 00:02:25,1
In a first step I remove all entries that aren't important for my calculations (e.g. entries that don't contain specific dates) and then save each file again.
The second step is to merge all those prepared files (~100) into one big CSV file.
Up to here everything works pretty well and fast.
The last step is to convert the CSV file into an XML file of the following format:
<data-set>
<PDA_DATA>
<ID>484261933</ID>
<NODE>2190</NODE>
<PROCESS_STATE>18</PROCESS_STATE>
<PREV_TIME_STAMP>05.05.2016 22:53:41</PREV_TIME_STAMP>
</PDA_DATA>
<PDA_DATA>
<ID>484261935</ID>
<NODE>2190</NODE>
<PROCESS_STATE>47</PROCESS_STATE>
<PREV_TIME_STAMP>06.05.2016 00:44:17</PREV_TIME_STAMP>
</PDA_DATA>
</data-set>
As you can see, I remove the elements "TIME_STAMP" and "CALCULATED", and furthermore I also remove all entries where "TIME_STAMP" is equal to "PREV_TIME_STAMP". I'm doing this with the following code:
string[] csvlines = File.ReadAllLines("All_Machines.csv");
XElement xml = new XElement("data-set",
    from str in csvlines.Skip(1) // skip the header row
    let columns = str.Split(',')
    select new XElement("PDA_DATA",
        new XElement("ID", columns[0]),
        new XElement("NODE", columns[1]),
        new XElement("PROCESS_STATE", columns[2]),
        new XElement("TIME_STAMP", columns[3]),
        new XElement("PREV_TIME_STAMP", columns[4]),
        new XElement("CALCULATED", columns[5])));
// Remove unnecessary elements
xml.Elements("PDA_DATA")
    .Where(e =>
        e.Element("TIME_STAMP").Value.Equals(e.Element("PREV_TIME_STAMP").Value))
    .Remove(); // Remove entries with duration = 0
xml.Elements("PDA_DATA").Elements("TIME_STAMP").Remove();
xml.Elements("PDA_DATA").Elements("CALCULATED").Remove();
xml.Save("All_Machines.xml");
And here is my problem: if I exclude the line where I remove elements whose TIME_STAMP equals PREV_TIME_STAMP, everything works pretty well and fast.
But with this command it takes a lot of time and only works with small CSV files.
I have no knowledge about resource-efficient programming, so I'd be really glad if someone could tell me where the problem is or how to do this better.
This works out much faster:

string[] csvlines = File.ReadAllLines("All_Machines.csv");
XElement xml = new XElement("data-set",
    from str in csvlines.Skip(1) // skip the header row
    let columns = str.Split(',')
    select new XElement("PDA_DATA",
        new XElement("ID", columns[0]),
        new XElement("NODE", columns[1]),
        new XElement("PROCESS_STATE", columns[2]),
        new XElement("TIME_STAMP", columns[3]),
        new XElement("PREV_TIME_STAMP", columns[4]),
        new XElement("CALCULATED", columns[5])));
// Remove unnecessary elements: build a new tree keeping only the
// elements we need, instead of removing from the old tree in place
XElement xml2 = new XElement("data-set",
    from el in xml.Elements()
    where el.Element("TIME_STAMP").Value != el.Element("PREV_TIME_STAMP").Value
    select el);
xml2.Elements("PDA_DATA").Elements("TIME_STAMP").Remove();
xml2.Elements("PDA_DATA").Elements("CALCULATED").Remove();
xml2.Save("All_Machines.xml");
Still not perfect for CSV file sizes over 150 MB. Any better suggestions?
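For files that don't fit comfortably in memory, a streaming approach avoids building any in-memory tree at all. A sketch with XmlWriter and File.ReadLines, assuming the file name and column layout shown above:

```csharp
using System.IO;
using System.Linq;
using System.Xml;

using (var writer = XmlWriter.Create("All_Machines.xml",
    new XmlWriterSettings { Indent = true }))
{
    writer.WriteStartElement("data-set");
    // File.ReadLines streams one line at a time instead of loading the whole file
    foreach (var line in File.ReadLines("All_Machines.csv").Skip(1))
    {
        var columns = line.Split(',');
        if (columns[3] == columns[4]) continue; // skip entries with duration = 0
        writer.WriteStartElement("PDA_DATA");
        writer.WriteElementString("ID", columns[0]);
        writer.WriteElementString("NODE", columns[1]);
        writer.WriteElementString("PROCESS_STATE", columns[2]);
        writer.WriteElementString("PREV_TIME_STAMP", columns[4]);
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
}
```

Memory use stays roughly constant regardless of file size, since each row is written out and discarded before the next is read.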
With Cinchoo ETL, an open source framework, you can read and write large CSV/XML files quickly with a few lines of code, as below:
using (var csv = new ChoCSVReader("NodeData.csv").WithFirstLineHeader(true)
.WithFields("ID", "NODE", "PROCESS_STATE", "PREV_TIME_STAMP"))
{
using (var xml = new ChoXmlWriter("NodeData.xml").WithXPath("data-set/PDA_DATA"))
xml.Write(csv);
}
The output XML looks like:
<data-set>
<PDA_DATA>
<ID>206609474</ID>
<NODE>2175</NODE>
<PROCESS_STATE>47</PROCESS_STATE>
<PREV_TIME_STAMP>31.03.2015 00:01:25</PREV_TIME_STAMP>
</PDA_DATA>
<PDA_DATA>
<ID>206609475</ID>
<NODE>2175</NODE>
<PROCESS_STATE>47</PROCESS_STATE>
<PREV_TIME_STAMP>31.03.2015 00:02:25</PREV_TIME_STAMP>
</PDA_DATA>
<PDA_DATA>
<ID>206609476</ID>
<NODE>2175</NODE>
<PROCESS_STATE>47</PROCESS_STATE>
<PREV_TIME_STAMP>31.03.2015 00:03:25</PREV_TIME_STAMP>
</PDA_DATA>
</data-set>
Disclosure: I'm the author of this library
I have 3 CSV files which I download and read into a List of a class that matches the CSV file. Now I do that using a LINQ query. Code:
var ListOfCSV = CsvString
    .Remove(CsvString.LastIndexOf(Environment.NewLine, StringComparison.Ordinal))
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .Skip(1)
    .Select(columns => columns.Split(';'))
    .Select(columns => new MyClass
    {
        argument1 = columns[0],
        argument2 = columns[1],
        argument3 = columns[2],
        argument4 = columns[3],
        argument5 = columns[4],
        argument6 = columns[5],
        argument7 = columns[6],
        argument8 = columns[7],
    });
I do that 3 times, once for each CSV file (as they map to different classes).
Is there a way to shorten this, or maybe make it faster? It is certainly not slow, but I would like to make it as fast and well-performing as possible.
Thanks!
Check this out, it might spare you the effort:
FileHelpers
It supports reading delimited CSV files:
Read Delimited File
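If you prefer to stay with plain LINQ, the repeated query can also be factored into one generic helper, so each file only supplies its own column mapping. A sketch reusing the names from the question (CsvString, MyClass, argument1, ...):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static List<T> ParseCsv<T>(string csvString, Func<string[], T> map)
{
    return csvString
        .Remove(csvString.LastIndexOf(Environment.NewLine, StringComparison.Ordinal))
        .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
        .Skip(1)                        // skip the header row
        .Select(line => line.Split(';'))
        .Select(map)                    // per-file mapping from columns to object
        .ToList();
}

// usage, one line per file type:
var myList = ParseCsv(CsvString, columns => new MyClass
{
    argument1 = columns[0],
    argument2 = columns[1],
    // ... remaining columns as in the question
});
```

This removes the triplicated boilerplate; it will not be meaningfully faster, since the work per line is the same.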
I'm writing a function that will parse a file similar to an XML file from a legacy system.
....
<prod pid="5" cat='gov'>bla bla</prod>
.....
<prod cat='chi'>etc etc</prod>
....
.....
I currently have this code:
buf = Regex.Replace(entry, "<prod(?:.*?)>(.*?)</prod>", "<span class='prod'>$1</span>");
Which was working fine until it was decided that we also wanted to show the categories.
The problem is, categories are optional and I need to run the category abbreviation through a SQL query to retrieve the category's full name.
eg:
SELECT * FROM cats WHERE abbr='gov'
The final output should be:
<span class='prod'>bla bla</span><span class='cat'>Government</span>
Any idea on how I could do this?
Note1: The function is done already (except this part) and working fine.
Note2: Cannot use XML libraries, regex has to be used
Regex.Replace has an overload that takes a MatchEvaluator, which is basically a Func<Match, string>. So, you can dynamically generate a replacement string.
buf = Regex.Replace(entry, @"<prod(?<attr>.*?)>(?<text>.*?)</prod>", match => {
    var attrText = match.Groups["attr"].Value;
    var text = match.Groups["text"].Value;
    // Now, parse your attributes
    var attributes = Regex.Matches(attrText, @"(?<name>\w+)\s*=\s*(['""])(?<value>.*?)\1")
        .Cast<Match>()
        .ToDictionary(
            m => m.Groups["name"].Value,
            m => m.Groups["value"].Value);
    string category;
    if (attributes.TryGetValue("cat", out category))
    {
        // Your SQL here etc...
        var label = GetLabelForCategory(category);
        return String.Format("<span class='prod'>{0}</span><span class='cat'>{1}</span>",
            WebUtility.HtmlEncode(text), WebUtility.HtmlEncode(label));
    }
    // Generate the result string
    return String.Format("<span class='prod'>{0}</span>", WebUtility.HtmlEncode(text));
});
This should get you started.