Reading a CSV file and extracting specific data [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
So I have a .CSV file which has possibly several million, maybe even billions of, lines of data. The data is in the format below:
1,,5,6,7,82,4,6
1,4,4,5,6,33,4,
2,6,3,,6,32,6,7
,,,2,5,45,,6
,4,5,6,,33,5,6
What I am trying to achieve is this: let's assume each line of data is an "event". Now let's say a user asks to see all events where the 6th value is 33. As you can see above, the 6th data element is a two-digit number, so for that request the output would be:
1,4,4,5,6,33,4,
,4,5,6,,33,5,6
Also, as you can see, the data can have blanks or holes where data is missing. I don't need help reading a .CSV file; I just can't wrap my mind around how I would access the 6th data element. I would also prefer the output to be represented in a collection of some sort. I'm new to C#, so I don't have much knowledge about the built-in classes. Any help will be appreciated!

I suggest that instead of the term "event" you call this data structure by the customary "rows and columns", and use the C# String.Split() method to create a 2D array (string[,] or int[,]) in which each element is conveniently accessible by its row/column index, so you can apply whatever business logic you need to those elements.
A possible implementation of the CSV file reader (reading line by line, with each line stored in the List<string> listRows) is shown below (re: Reading CSV file and storing values into an array):
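using System.Collections.Generic;
using System.IO;
static void Main(string[] args)
{
    List<string> listRows = new List<string>();
    // the using block disposes the reader (and closes the file) when done
    using (var reader = new StreamReader(File.OpenRead(@"C:\YourFile.csv")))
    {
        while (!reader.EndOfStream)
        {
            listRows.Add(reader.ReadLine());
        }
    }
}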
Then apply Split(',') to each row stored in listRows to compose a 2D array string[,], and use the int.TryParse() method to convert elements to int where necessary (optional).
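A minimal sketch of that approach, assuming the rows are already in listRows (an in-memory sample stands in for the file here) and that every row has as many columns as the first one. It builds the string[,] with Split(','), then uses int.TryParse on column index 5 (the 6th value), so blank cells are simply skipped instead of throwing:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the rows read from the file (assumption: in-memory sample data).
var listRows = new List<string>
{
    "1,,5,6,7,82,4,6",
    "1,4,4,5,6,33,4,",
    "2,6,3,,6,32,6,7",
    ",,,2,5,45,,6",
    ",4,5,6,,33,5,6",
};

// Assumption: every row has the same number of columns as the first row.
int columnCount = listRows[0].Split(',').Length;
var grid = new string[listRows.Count, columnCount];
for (int row = 0; row < listRows.Count; row++)
{
    string[] cells = listRows[row].Split(',');
    for (int col = 0; col < columnCount && col < cells.Length; col++)
    {
        grid[row, col] = cells[col];
    }
}

// Collect the rows whose 6th value (column index 5) parses to 33.
// TryParse returns false for blank or missing cells, so holes are skipped.
var matches = new List<string>();
for (int row = 0; row < grid.GetLength(0); row++)
{
    if (int.TryParse(grid[row, 5], out int value) && value == 33)
    {
        matches.Add(listRows[row]);
    }
}

foreach (string match in matches)
{
    Console.WriteLine(match);
}
```

For the sample rows this prints the two lines from the question whose 6th value is 33.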
Alternatively, this could be implemented with LINQ, which I don't recommend because it needlessly widens the technology surface area and may degrade performance (a LINQ solution is expected to be somewhat slower than the direct processing suggested here).
Hope this may help.

Using LINQ it is pretty easy to achieve. I'm posting a sample from LINQPad. All you need to do is replace 33 with a parameter:
void Main()
{
    // LINQPad imports System.IO and System.Linq automatically
    string csvFile = @"C:\Temp\TestData.csv";
    string[] lines = File.ReadAllLines(csvFile);
    var values = lines.Select(s => new { myRow = s.Split(',') });
    // and here is your collection representing results
    List<string[]> results = new List<string[]>();
    foreach (var value in values)
    {
        // matches "33" in any column; use value.myRow[5] == "33" to test only the 6th value
        if (value.myRow.Contains("33"))
        {
            results.Add(value.myRow);
        }
    }
    results.Dump(); // Dump() is LINQPad's output method
}
Or, if you want, you can have it all in one shot by doing this:
string csvFile = @"C:\Temp\TestData.csv";
string[] lines = File.ReadAllLines(csvFile);
var values = lines.Select(s => s.Split(','))
    .Select(fields => new
    {
        // 1-based position of the first field equal to "33" (0 if there is none)
        Position = Array.FindIndex(fields, a => a == "33") + 1,
        myRow = fields
    });
So the final result will have both the position of your search value (33) and the complete string[] of items.
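Both snippets above look for 33 in any column. To match only the 6th value, a LINQ filter can test index 5 of each split row directly. The sketch below is one way to do that with a hypothetical FindRows helper. For a file with millions of lines, File.ReadLines (which enumerates lines lazily) is probably a better fit than File.ReadAllLines, which loads the whole file into memory first; an in-memory array stands in for the file here:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the rows whose value at columnIndex (0-based)
// equals searchValue. Rows too short to have that column are skipped.
static List<string[]> FindRows(IEnumerable<string> lines, int columnIndex, string searchValue)
{
    return lines
        .Select(line => line.Split(','))
        .Where(fields => fields.Length > columnIndex && fields[columnIndex] == searchValue)
        .ToList();
}

// In real use this could be File.ReadLines(@"C:\Temp\TestData.csv"),
// which streams the file line by line.
string[] sample =
{
    "1,,5,6,7,82,4,6",
    "1,4,4,5,6,33,4,",
    "2,6,3,,6,32,6,7",
    ",,,2,5,45,,6",
    ",4,5,6,,33,5,6",
};

// Column index 5 is the 6th value.
List<string[]> results = FindRows(sample, columnIndex: 5, searchValue: "33");
foreach (string[] row in results)
{
    Console.WriteLine(string.Join(",", row));
}
```

Passing the column index and value as parameters makes the query work for whatever the user asks for, not just "6th value is 33".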

Create a class EventEntity. In this class create a List<int> with a constructor that initializes the list. Here is a class example:
public class EventEntity
{
public EventEntity()
{
EventList = new List<int>();
}
public List<int> EventList { get; set; }
}
From there loop through each row of data. Example:
public class EventEntityRepo
{
    public EventEntity GetEventEntityByCsvDataRow(string[] csvRow)
    {
        EventEntity events = new EventEntity();
        foreach (string csvCell in csvRow)
        {
            // empty or non-numeric cells are stored as -1
            int eventId;
            if (string.IsNullOrWhiteSpace(csvCell) || !int.TryParse(csvCell.Trim(), out eventId))
            {
                eventId = -1;
            }
            events.EventList.Add(eventId);
        }
        return events;
    }
}
Then you can reference the items whenever you want:
var repo = new EventEntityRepo();
EventEntity eventEntity = repo.GetEventEntityByCsvDataRow(csvDataRow);
int eventEntitySixthElement = eventEntity.EventList[5]; // index 5 is the 6th element

So your question is how to access the 6th data element. It's not too hard if you have the right data structure representing your CSV.
Basically, this CSV document can in abstract terms be described as IEnumerable<IEnumerable<String>>, or maybe IEnumerable<IEnumerable<int?>>. Having implemented the CSV parsing logic, you will access the 6th element of each row by executing:
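var csvRepresentation = ParseCsv(@"D:/file.csv");
foreach (var row in csvRepresentation)
{
    // ElementAt is zero-based, so index 5 is the 6th element of the row
    var element = row.ElementAt(5);
    if (element == "33")
    {
        // do smth
    }
}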
With this approach you will also be able to execute LINQ statements on it.
Now the question is how to implement ParseCsv():
public IEnumerable<IEnumerable<String>> ParseCsv(string path)
{
return File.ReadAllLines(path).Select(row => row.Split(','));
}
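For the IEnumerable<IEnumerable<int?>> variant mentioned above, one possible sketch (ParseCsvValues is a hypothetical name, and the input is an in-memory array rather than a file) turns blank cells into null with int.TryParse, so rows with holes still line up by position:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parses CSV lines into rows of nullable ints.
// Blank (or non-numeric) cells become null instead of a placeholder value.
static IEnumerable<IEnumerable<int?>> ParseCsvValues(IEnumerable<string> lines)
{
    return lines.Select(line => line
        .Split(',')
        .Select(cell => int.TryParse(cell, out int n) ? n : (int?)null)
        .ToList());
}

string[] sample =
{
    "1,,5,6,7,82,4,6",
    "1,4,4,5,6,33,4,",
    "2,6,3,,6,32,6,7",
    ",,,2,5,45,,6",
    ",4,5,6,,33,5,6",
};

// Keep rows whose 6th value (zero-based index 5) is 33; null never equals 33.
List<IEnumerable<int?>> matching = ParseCsvValues(sample)
    .Where(fields => fields.Count() > 5 && fields.ElementAt(5) == 33)
    .ToList();

foreach (var row in matching)
{
    Console.WriteLine(string.Join(",", row.Select(v => v.HasValue ? v.ToString() : "(blank)")));
}
```

Using nullable ints keeps "blank" distinct from every real value, which a sentinel such as -1 cannot guarantee.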

Related

How to extract from a list based on user input

The code contains two methods.
Main, which prompts the user for input and prints a sublist based on that input.
The Extract method takes the query from user input and adds all matching entries to dbQueryList, which is then returned and printed.
How does one add to a List based on user input?
The primary issue is the if statement, which contains the condition i.Substring(0, query.Length) = query. This is meant to test the condition 'if part of the query exists in any index in dbListing, add elements to dbQueryList'.
I originally wrote this in Python and it worked perfectly fine. I'm learning C# and am not sure how to change that if condition. I considered changing the code to use LINQ in the foreach loop but am not entirely clear on how to implement that.
Looking forward to community feedback! :)
//**************************************************
// File Name: autocomplete.cs
// Version: 1.0
// Description: Create a method that functions like an autocomplete
// API and truncates search to 5 results.
// Last Modified: 12/19/2018
//**************************************************
using System;
using System.Collections.Generic;
namespace autocomplete
{
class Program
{
private static string[] database;
private static string input;
private static string query;
static void Main(string[] args)
{
// user input to pass as query
Console.Write("Enter your query: ");
string query = Console.ReadLine();
// dynamic list comprised of 'database' array
List<string> dbListing = new List<string>();
string[] database = new string[] { "abracadara", "al", "alice", "alicia", "allen", "alter","altercation", "bob", "element", "ello", "eve", "evening", "event", "eventually", "mallory" };
dbListing.AddRange(database);
// write results based on user query
Console.WriteLine("Your results: " + Extract(Program.query));
// keep console window open after displaying results
Console.ReadLine();
}
// extract method passing query, return dbQueryList as query
public static List<string> Extract(string query)
{
// empty list is initiated
List<string> dbQueryList = new List<string>();
// foreach assesses all strings in database in main
// then, appends all indices of list equal to given query
foreach (string i in database)
{
// compares query (from index 0 to length of) to all strings in database
if (i.Substring(0, query.Length) = query)
{
// add to list above based on query
dbQueryList.Add(i);
}
// if statement truncates dbQueryList to 5 results
if (dbQueryList.Capacity >= 5)
break;
}
return dbQueryList;
}
}
UPDATE: 1/3/2019 18:30
I made the following changes to the Extract(query) and it worked!
foreach (string i in database)
{
    // compares query (from index 0 to length of) to all strings in database
    if (i.StartsWith(query))
    {
        // add to list above based on query
        dbQueryList.Add(i);
        Console.WriteLine(i);
    }
    // truncates dbQueryList to 5 results (Count is the number of items;
    // Capacity is the internal buffer size)
    if (dbQueryList.Count >= 5)
        break;
}
return dbQueryList;
Very excited that I got this to work! Please let me know if there is any further feedback about how to improve and clean up this code! Cheers, everyone!
The problem is you are using = instead of == in the if statement.
In C#, the = operator is for assignment, so what you are doing is trying to assign query to the expression on the left side, which is not possible. Use the == operator, which is for comparison, instead.
There is also a more suitable method: use i.StartsWith(query) to check whether the string starts with the given query. The current Substring solution works only as long as i is at least query.Length characters long; otherwise Substring throws an ArgumentOutOfRangeException.
if (i.StartsWith(query))
{
...
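Putting the fix together, here is a sketch of Extract that uses StartsWith plus LINQ's Take(5) to cap the results, assuming the list and query are passed in as parameters. (In the question's code as posted, Main declares local query and database variables that hide the static fields of the same name, so Extract would otherwise see the never-assigned static fields.) The query is hard-coded here in place of Console.ReadLine():

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] database = { "abracadara", "al", "alice", "alicia", "allen", "alter",
    "altercation", "bob", "element", "ello", "eve", "evening", "event",
    "eventually", "mallory" };

// Returns at most maxResults entries that start with the query.
// Ordinal comparison matches characters exactly, independent of culture.
static List<string> Extract(IEnumerable<string> source, string query, int maxResults = 5)
{
    return source
        .Where(entry => entry.StartsWith(query, StringComparison.Ordinal))
        .Take(maxResults)
        .ToList();
}

// In the real program this would come from Console.ReadLine().
string userQuery = "al";
List<string> results = Extract(database, userQuery);
Console.WriteLine("Your results: " + string.Join(", ", results));
```

Take(5) stops the enumeration as soon as five matches are found, which replaces the manual break.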

C# - Excel Export a List

Hi, I have this code to export a List to Excel:
private DataTable ListaDatiReportQuietanzamento(List<DatiReportQuietanzamento> datiReportQuietanzamento)
{
DataTable dt = new DataTable("DatiReportQuietanzamento");
dt.Columns.Add("Polizza");
dt.Columns.Add("Posizione");
dt.Columns.Add("Codice Frazionamento");
var result = datiReportQuietanzamento.ToDataTable().AsEnumerable().Select(p =>
new
{
n_polizza = p.Field<long>("n_polizza"),
n_posizione = p.Field<byte>("n_posizione"),
c_frazionamento = p.Field<string>("c_frazionamento")
}).Distinct().ToList();
foreach (var item in result)
{
dt.Rows.Add(item.n_polizza, item.n_posizione, item.c_frazionamento);
}
return dt;
}
This method works with Lists that do not contain many items, but when the list is very large, the method takes too much time.
Is there a way to avoid the foreach and add the items to the rows directly? Maybe with a lambda expression?
Thank you.
While you have not specified how the data is ultimately to be supplied to Excel, it is generally supplied as a CSV (Comma Separated Values) file for easy import.
So, this being the case, you can eliminate your DataTable conversion entirely and create a list of strings as follows:
private List<string> ListaDatiReportQuietanzamento(List<DatiReportQuietanzamento> datiReportQuietanzamento)
{
    var result = new List<string>();
    foreach (var item in datiReportQuietanzamento)
    {
        // one CSV line per item (List<string> has Add, not AppendLine)
        result.Add($"{item.n_polizza},{item.n_posizione},{item.c_frazionamento}");
    }
    return result;
}
The one simplification I have made is to ignore escaping: strings such as item.c_frazionamento should really be escaped (for example, quoted when they contain a comma, a quote or a line break).
Instead of doing all this yourself, I suggest you have a look at a NuGet package such as CsvHelper, which will help you create CSV files and take the hassle of escaping out of the equation. It can also take a list of objects directly and convert it into a CSV file for you; see specifically the first example at https://joshclose.github.io/CsvHelper/writing#writing-all-records
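If you do want to write the CSV by hand, a minimal escaping helper could look like the sketch below. EscapeCsvField and ToCsvLine are hypothetical names, and the sample values are made up; the rule they implement (quote a field that contains a comma, a double quote or a line break, and double any embedded quotes) is the convention described in RFC 4180:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Quotes a field if it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes. Null becomes an empty field.
static string EscapeCsvField(string field)
{
    if (field == null)
    {
        return string.Empty;
    }
    bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Joins any values into one CSV line, escaping each of them.
static string ToCsvLine(params object[] values) =>
    string.Join(",", values.Select(v => EscapeCsvField(v?.ToString())));

Console.WriteLine(ToCsvLine(12345L, (byte)1, "A1"));
Console.WriteLine(ToCsvLine(12345L, (byte)2, "Rate, \"special\""));
```

The second line comes out as 12345,2,"Rate, ""special""", which Excel reads back as a single cell.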

How to do this kind of search in ASP.net MVC?

I have an ASP.NET MVC web application.
The SQL table has one column ProdNum and it contains data such as 4892-34-456-2311.
The user needs a form to search the database that includes this field.
The problem is that the user wants 4 separate fields in the Razor view, where each field should match one of the 4 parts of the data above separated by -.
For example, the ProdNum1, ProdNum2, ProdNum3 and ProdNum4 fields should match 4892, 34, 456 and 2311 respectively.
Since the entire search form contains many fields, including these 4, the search logic is based on a predicate built with the PredicateBuilder class.
Something like this:
// ...other fields to be filtered
if (!string.IsNullOrEmpty(ProdNum1))
{
    predicate = predicate.And(
        t => t.ProdNum.ToString().Split('-')[0].Contains(ProdNum1));
}
// ...other fields to be filtered
But the above code throws a run-time error:
The LINQ expression node type 'ArrayIndex' is not supported in LINQ to Entities
Does anybody know how to resolve this issue?
Thanks a lot for all the responses; I finally found an easy way to resolve it.
Instead of rebuilding the models and changing the database tables, I just add the extra "-" delimiters to the search strings to match the search criteria. Since the data format is always 4892-34-456-2311, I use StartsWith(ProdNum1) to search the first field, Contains("-" + ProdNum2 + "-") to search the second and third fields (with ProdNum3 in place of ProdNum2 for the third), and EndsWith("-" + ProdNum4) to search the fourth. This way I don't need to change anything else; it is simple.
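A sketch of this delimiter-matching idea, run against a hypothetical in-memory list (in the real application the same conditions would go into the predicate; StartsWith, Contains and EndsWith avoid the unsupported Split and array indexing). Two assumptions differ from the description above: the Search helper and sample data are made up, and the sketch appends "-" to the first field so that a partial first segment such as 48 does not match 4892. One caveat of the approach shows up in the data: Contains("-" + value + "-") matches either middle segment, so a second-field search for 456 also matches rows whose third segment is 456.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the ProdNum column.
var prodNums = new List<string> { "4892-34-456-2311", "4892-35-456-2311", "1234-34-456-9999" };

// Hypothetical helper: applies one filter per non-empty search field.
// Ordinal comparison is used in this in-memory version.
static List<string> Search(IEnumerable<string> source, string p1, string p2, string p3, string p4)
{
    IEnumerable<string> query = source;
    // "-" is appended so that "48" does not match the first segment "4892".
    if (!string.IsNullOrEmpty(p1)) query = query.Where(s => s.StartsWith(p1 + "-", StringComparison.Ordinal));
    // Middle segments are matched with a "-" on both sides.
    if (!string.IsNullOrEmpty(p2)) query = query.Where(s => s.Contains("-" + p2 + "-"));
    if (!string.IsNullOrEmpty(p3)) query = query.Where(s => s.Contains("-" + p3 + "-"));
    if (!string.IsNullOrEmpty(p4)) query = query.Where(s => s.EndsWith("-" + p4, StringComparison.Ordinal));
    return query.ToList();
}

foreach (string match in Search(prodNums, "4892", "34", "", ""))
{
    Console.WriteLine(match);
}
```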
Again, thanks a lot for all responses, much appreciated.
If I understand this correctly, you have one column that you want to act like 4 different columns? This isn't worth it. For that, you need to split each row's column data, create a class to handle the split data, and finally use a List. That's a useless workaround; I would rather suggest you use 4 columns instead.
But if you still want to go with your existing approach, you first need to split the data as I mentioned earlier. Here's an example:
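using System.Data.SqlClient;
public void test(SqlCommand command) // command: your existing SELECT command (assumed)
{
    // SqlDataReader has no public constructor; it comes from ExecuteReader()
    using (SqlDataReader datareader = command.ExecuteReader())
    {
        while (datareader.Read())
        {
            string[] parts = datareader[1].ToString().Split('-');
            string part1 = parts[0]; // the 1st part of your column data
            string part2 = parts[1]; // the 2nd part of your column data
        }
    }
}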
Now, as mentioned in the comments, you could instead use a class to hold all the data. For example, let's call it MyData:
public class MyData
{
    public string Part1;
    public string Part2;
    public string Part3;
    public string Part4;
}
Now, within the while loop of the SqlDataReader, declare a new instance of this class and pass the values to it. An example:
public void test(SqlCommand command) // command: your existing SELECT command (assumed)
{
    using (SqlDataReader datareader = command.ExecuteReader())
    {
        while (datareader.Read())
        {
            string[] parts = datareader[1].ToString().Split('-');
            MyData allData = new MyData();
            allData.Part1 = parts[0];
            allData.Part2 = parts[1];
        }
    }
}
Create a list of the class at class level:
public class MyForm
{
    List<MyData> storedData = new List<MyData>();
}
Within the while loop of the SqlDataReader, add this at the end:
storedData.Add(allData);
So finally, you have a list of all the split data, and you can write your filtering logic easily :)
As already mentioned in a comment, the error means that accessing data by index (see [0]) is not supported when your expression is translated to SQL. Split('-') is not supported either, so you have to resort to the supported functions Substring() and IndexOf().
You could do something like the following to first transform the string into 4 number strings ...
.Select(t => new {
t.ProdNum,
FirstNumber = t.ProdNum.Substring(0, t.ProdNum.IndexOf("-")),
Remainder = t.ProdNum.Substring(t.ProdNum.IndexOf("-") + 1)
})
.Select(t => new {
t.ProdNum,
t.FirstNumber,
SecondNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
Remainder = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
.Select(t => new {
t.ProdNum,
t.FirstNumber,
t.SecondNumber,
ThirdNumber = t.Remainder.Substring(0, t.Remainder.IndexOf("-")),
FourthNumber = t.Remainder.Substring(t.Remainder.IndexOf("-") + 1)
})
... and then you could simply write something like
if (!string.IsNullOrEmpty(ProdNum3))
{
    predicate = predicate.And(t => t.ThirdNumber.Contains(ProdNum3));
}

C# Index was out of range. Must be non-negative and less than the size of the collection [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am currently working on a project to develop an editor.
Why does a C# List allow me to use the Add method, but not let me assign a value by index?
Can anybody help?
List<List<String>> datalist = new List<List<String>>();
....
datalist[tag.Count-1]=datasublist;
The line datalist[tag.Count-1]=datasublist; is a problem because you're trying to assign to an element that doesn't exist. A List isn't like an array, where you declare it with a specific size and can use any part of what you made. Since you've declared datalist as a new List with nothing in it, there isn't anything in it to change. You need to use:
datalist.Add(datasublist);
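A small demonstration of the difference, using the question's variable names with made-up contents: assigning by index to an empty List<T> throws ArgumentOutOfRangeException, while Add appends an element and makes index 0 available:

```csharp
using System;
using System.Collections.Generic;

var datalist = new List<List<string>>();
var datasublist = new List<string> { "a", "b" };
bool threw = false;

try
{
    // The list is empty, so index 0 does not exist yet.
    datalist[0] = datasublist;
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
    Console.WriteLine("Assigning by index to an empty List<T> throws.");
}

// Add appends a new element, growing Count by one.
datalist.Add(datasublist);
// Now index 0 exists and can be read or replaced.
datalist[0] = new List<string> { "c" };
Console.WriteLine(datalist.Count); // prints 1
```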
As an unrelated aside, as I noted in the comments, you can replace
line.Substring(26, EndOfIndex)
With
line.Substring(26)
By default it returns the rest of the string. That also lets you remove the EndOfIndex variable.
I'm assuming the error is on the line:
datalist[tag.Count-1]=datasublist;
At first glance it seemed to me that you want a Dictionary<int, List<string>>, or better yet maybe a Dictionary<string, List<string>>, in which case you could use either the value tag.Count - 1 or the value of the tag just added as the key. But on second thought, the line above is repeated in a loop, and it looks like you're just trying to add a bunch of strings to a correlated list.
So, I recommend using a class to store tag names and their associated data together:
class TagInfo
{
public string TagName {get; set;}
private readonly List<string> data = new List<string>();
public List<string> Data {get {return data;}}
}
Which will then allow you to do:
List<TagInfo> tags = new List<TagInfo>();
while (line != null)
{
    // StartsWith avoids the exception Substring(0, 26) throws on short lines
    if (line.StartsWith("CRDI-CONTROL %%LINES-BEGIN"))
    {
        string tagName = line.Substring(26);
        TagInfo tag = new TagInfo {TagName = tagName};
        tags.Add(tag);
        line = reader.ReadLine();
        // collect data lines until the END marker (or the end of the file)
        while (line != null && !line.StartsWith("CRDI-CONTROL %%LINES-END"))
        {
            tag.Data.Add(line.Replace(" ", String.Empty));
            line = reader.ReadLine();
        }
    }
    line = reader.ReadLine();
}
You could be more advanced and use a Dictionary<string, TagInfo> if you need to be looking up tags by name later. Just store the tag name as the key. You could probably clean the code up more by adding a constructor that takes a tag name, or even creating your own TagInfoCollection if you desired.
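A sketch of the dictionary idea, keyed by tag name. To keep it short it maps each name straight to a List<string> instead of a TagInfo, reads from a StringReader over made-up sample text rather than the question's file, uses StartsWith so short lines cannot make Substring throw, and trims the tag name (an assumption about the format):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Made-up sample input in the BEGIN/END marker format from the question.
string input = string.Join("\n",
    "CRDI-CONTROL %%LINES-BEGIN HEADER",
    "A B C",
    "CRDI-CONTROL %%LINES-END",
    "CRDI-CONTROL %%LINES-BEGIN BODY",
    "1 2",
    "3 4",
    "CRDI-CONTROL %%LINES-END");

const string BeginMarker = "CRDI-CONTROL %%LINES-BEGIN";
const string EndMarker = "CRDI-CONTROL %%LINES-END";

// Maps each tag name to the data lines between its BEGIN and END markers.
// A repeated tag name replaces the earlier entry.
var tags = new Dictionary<string, List<string>>();
using (var reader = new StringReader(input))
{
    string line = reader.ReadLine();
    while (line != null)
    {
        if (line.StartsWith(BeginMarker, StringComparison.Ordinal))
        {
            var data = new List<string>();
            tags[line.Substring(BeginMarker.Length).Trim()] = data;
            line = reader.ReadLine();
            while (line != null && !line.StartsWith(EndMarker, StringComparison.Ordinal))
            {
                data.Add(line.Replace(" ", string.Empty));
                line = reader.ReadLine();
            }
        }
        line = reader.ReadLine();
    }
}

foreach (var pair in tags)
{
    Console.WriteLine(pair.Key + ": " + string.Join("|", pair.Value));
}
```

Looking up a tag is then a single tags["BODY"] instead of a search through a list.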

Arrays/Array Lists

I am fairly new to C#.
I am trying to retrieve some information from an external data source and store it in an array; once it is in an array, I wish to sort it by time.
I know how to do this for just one column in a row, however the information I require has multiple columns.
For example:
foreach (Appointment Appoint in fapts)
{
// Store Appoint.Subject, Appoint.Start, Appoint.Organiser.Name.ToString(), Appoint.Location in an array
}
// Sort my array by Appoint.Start
foreach ( item in myNewArray )
{
//print out Appoint.Subject - Appoint.Start, Appoint.Organiser.Name.ToString() and Appoint.location
}
Many thanks for your help.
EDIT:
I have multiple data sources which pull in this:
foreach (Appointment Appoint in fapts)
{
// Store Appoint.Subject, Appoint.Start, Appoint.Organiser.Name.ToString(), Appoint.Location in an array
}
Hence the need to sort the items in a new array. I know this isn't very efficient, but there is no other way of getting the information I need.
You can sort a list using the LINQ sorting operators OrderBy and ThenBy, as shown below.
using System.Linq;
and then...
var appointments = new List<Appointment>(); // populate this from your data source first
var sortedAppointments = appointments.OrderBy(l => l.Subject).ThenBy(l => l.Name).ToList();
This will create a new list of appointments, sorted by subject and then by name.
It's unclear what your final aim is but:
Use a generic List instead of an array:
See this SO question for more information on why a List is preferred.
List<Appointment> appointments = new List<Appointment>();
foreach (Appointment Appoint in fapts)
{
appointments.Add(Appoint);
}
foreach (var item in appointments)
{
Console.WriteLine(item.Subject);
Console.WriteLine(item.Foo);
// Here you could override ToString() on Appointment to print everything in one Console.WriteLine
}
If the aim of your code is to order by time, try the following:
var sortedAppointments = fapts.OrderBy(a => a.Start); // assuming Start is a DateTime property of `Appointment`.
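A runnable sketch of sorting by start time. Since the Appointment type isn't shown, a tuple with the four fields the question prints stands in for it, and the data is made up. Note that OrderBy returns a new sorted sequence and leaves the source collection untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Appointment objects (assumption: a tuple with the four printed fields).
var fapts = new List<(string Subject, DateTime Start, string Organiser, string Location)>
{
    ("Review", new DateTime(2024, 1, 10, 14, 0, 0), "Alice", "Room 1"),
    ("Stand-up", new DateTime(2024, 1, 10, 9, 0, 0), "Bob", "Room 2"),
    ("Lunch", new DateTime(2024, 1, 10, 12, 30, 0), "Eve", "Cafe"),
};

// OrderBy returns a new sorted sequence; fapts itself keeps its original order.
var sortedAppointments = fapts.OrderBy(a => a.Start).ToList();

foreach (var appt in sortedAppointments)
{
    Console.WriteLine($"{appt.Subject} - {appt.Start:HH:mm}, {appt.Organiser}, {appt.Location}");
}
```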
Consider a Dictionary Object instead of an array if the data is conceptually one row multiple columns.
foreach(KeyValuePair<string, string> entry in MyDic)
{
// do something with entry.Value or entry.Key
}
You already have a list of objects in fapts, so you can sort it directly; OrderBy returns a new sequence, so keep the result:
var sorted = fapts.OrderBy(x => x.Subject).ThenBy(x => x.Location).ToList();
LINQ is your friend here.
fapts appears to already be a collection so you could just operate on it.
var myNewArray = fapts.OrderBy(Appoint => Appoint.Start).ToArray();
I've used the ToArray() call to force immediate evaluation; it means that myNewArray is already sorted, so if you use it more than once you don't have to re-evaluate the sort.
Alternatively, if you are only using this once, you can just as easily leave out the ToArray() call; execution of the sort will then be deferred until you enumerate myNewArray.
This solution puts the source objects into the array, but if you only want to store the specific fields you mention, you will need to use Select. You have two choices for the array item type: an anonymous type, which causes difficulties if you are returning the array from a method, or a class you define.
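A small sketch of the deferred-versus-immediate difference, with integers standing in for the appointment start times: the deferred query reflects a change made to the source afterwards, while the ToArray() snapshot does not:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var starts = new List<int> { 3, 1, 2 };

// Deferred: the sort runs each time the query is enumerated.
IEnumerable<int> deferred = starts.OrderBy(x => x);
// Immediate: ToArray() sorts once and stores the result.
int[] snapshot = starts.OrderBy(x => x).ToArray();

starts.Add(0); // change the source after both were created

Console.WriteLine(string.Join(",", deferred)); // prints 0,1,2,3
Console.WriteLine(string.Join(",", snapshot)); // prints 1,2,3
```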
For anonymous:
var myNewArray = fapts.OrderBy(Appoint => Appoint.Start)
.Select(Appoint => new {
Start = Appoint.Start,
Organiser = Appoint.Organiser.Name.ToString(),
Location = Appoint.Location
}).ToArray();
For a named class, assuming the class is MyClass:
var myNewArray = fapts.OrderBy(Appoint => Appoint.Start)
.Select(Appoint => new MyClass {
Start = Appoint.Start,
Organiser = Appoint.Organiser.Name.ToString(),
Location = Appoint.Location
}).ToArray();
You have a wide range of options. The 2 most common are:
1) Create a class, then define an array or list of that class, and populate that
2) Create a structure that matches the data format and create an array or list of that
Of course, you could put the data into an XML format or dataset, but that's probably more work than you need.
public List<foo> appointments = new List<foo>();
public struct foo
{
public string subject ;
public DateTime start ;
public string name ;
public string location ;
}
public void foo1()
{
// parse the file
while (!File.eof())
{
// Read the next line...
var myRecord = new foo() ;
myRecord.subject = data.subject ;
myRecord.start = data.Start ;
myRecord.name = data.Name ;
//...
appointments.Add(myRecord);
}
}
Enjoy
(Since I can't comment and reply to the comment: it wasn't clear whether he had a class, etc., or was just showing us what he wanted to do. I assumed it was just for demonstration purposes, since there wasn't any info on how the data was being read. If he could already put it into a class, then the first answer applied anyway. I just tossed the last 2 in there because they were options for getting the data first.)
