Insert huge files (2 GB) into MongoDB - C#

I have nine 2 GB files, each containing approximately 12M string records, and I want to insert each record as a document into a local MongoDB instance (Windows).
Currently I'm reading line by line and inserting every second line (the first of each pair is an unnecessary header), like this:
bool readingFlag = false;
foreach (var line in File.ReadLines(file))
{
    if (readingFlag)
    {
        string document = "{'read':'" + line + "'}";
        var bsonDocument = new BsonDocument(
            MongoDB
                .Bson
                .Serialization
                .BsonSerializer
                .Deserialize<BsonDocument>(document));
        await collection.InsertOneAsync(bsonDocument);
        readingFlag = false;
    }
    else
    {
        readingFlag = true;
    }
}
This method works, but not as fast as I expected. I'm currently in the middle of the first file and I estimate it will take about 4 hours per file (roughly 40 hours for all my data).
I think my bottleneck is the file reading, but since the files are so large I can't load one fully into memory (I get an out of memory exception).
Is there another approach I'm missing here?

I think we could make use of the following:
- Collect a batch of lines and insert them together with InsertMany
- Insert data on a separate thread, since we don't need to wait for each insert to finish
- Use a typed class, TextData, to push the serialization work to that other thread
You can play with the batch size (the limit processed at once), since the best value depends on how much data is read from the file.
public class TextData
{
    public ObjectId _id { get; set; }
    public string read { get; set; }
}

public class Processor
{
    public async Task ProcessData()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var database = client.GetDatabase("test");
        var collection = database.GetCollection<TextData>("Yogevnn");
        var readingFlag = false;
        var listOfDocument = new List<TextData>();
        var limitAtOnce = 100;
        var current = 0;

        foreach (var line in File.ReadLines(@"E:\file.txt"))
        {
            if (readingFlag)
            {
                var dataToInsert = new TextData { read = line };
                listOfDocument.Add(dataToInsert);
                readingFlag = false;
                Console.WriteLine($"Current position: {current}");

                if (++current == limitAtOnce)
                {
                    current = 0;
                    Console.WriteLine("Inserting data");
                    var listToInsert = listOfDocument;
                    // Fire-and-forget: we don't wait for the batch insert to finish
                    var t = new Task(() =>
                    {
                        Console.WriteLine("Inserting data START");
                        collection.InsertManyAsync(listToInsert);
                        Console.WriteLine("Inserting data FINISH");
                    });
                    t.Start();
                    listOfDocument = new List<TextData>();
                }
            }
            else
            {
                readingFlag = true;
            }
        }

        // insert the remainder
        await collection.InsertManyAsync(listOfDocument);
    }
}
Any comments welcome!

In my experiments I found Parallel.ForEach(File.ReadLines("path")) to be the fastest.
The file size was about 42 GB. I also tried batching sets of 100 lines and saving each batch, but that was slower than Parallel.ForEach.
Another example: Read large txt file multithreaded?
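A minimal sketch of that approach, reusing the TextData class and the database/collection names from the earlier answer (the file path and the every-other-line filter are assumptions to match the original question):
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var collection = client.GetDatabase("test").GetCollection<TextData>("Yogevnn");

// Keep only every second line (the odd indices), matching the readingFlag logic above.
var dataLines = File.ReadLines(@"E:\file.txt").Where((line, index) => index % 2 == 1);

Parallel.ForEach(dataLines, line =>
{
    // The synchronous InsertOne keeps the lambda simple; batching per thread with
    // InsertMany would reduce the number of round trips further.
    collection.InsertOne(new TextData { read = line });
});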

Related

How to add data from Firebase to DataGridView using FireSharp

I just want to retrieve data from Firebase into a DataGridView. The code I have is already retrieving data; however, it writes everything into the same row instead of creating a new one. I'm a beginner at coding, so I really need help with this.
I read online that Firebase doesn't "count" data, so a counter has to be created and updated each time I add or delete data. I did that and it's working. I created a method to load the data:
private async Task firebaseData()
{
    int i = 0;
    FirebaseResponse firebaseResponse = await client.GetAsync("Counter/node");
    Counter_class counter = firebaseResponse.ResultAs<Counter_class>();
    int foodCount = Convert.ToInt32(counter.food_count);
    while (true)
    {
        if (i == foodCount)
        {
            break;
        }
        i++;
        try
        {
            FirebaseResponse response2 = await client.GetAsync("Foods/0" + i);
            Foods foods = response2.ResultAs<Foods>();
            this.dtGProductsList.Rows[0].Cells[0].Value = foods.menuId;
            this.dtGProductsList.Rows[0].Cells[1].Value = foods.name;
            this.dtGProductsList.Rows[0].Cells[2].Value = foods.image;
            this.dtGProductsList.Rows[0].Cells[3].Value = foods.price;
            this.dtGProductsList.Rows[0].Cells[4].Value = foods.discount;
            this.dtGProductsList.Rows[0].Cells[5].Value = foods.description;
        }
        catch
        {
        }
    }
    MessageBox.Show("Done");
}
Note: A DataTable already exists (dataTable), and there's a DataGridView with columns (ID, Name, Image, Price, Discount, Description) that match the number and order of the .Cells[x] indices. When the form loads, dtGProductsList.DataSource = dataTable;. I tried replacing [0] with [i].
I expect the data being retrieved to be written to a new row each time, not to the same one, and to not skip rows. I'm sorry if it's too simple, but I can't see a way out.
I faced the same problem and here is my solution (XClass stands in for your own data class, e.g. Foods):
FirebaseResponse firebaseResponse = await client.GetAsync("Counter/node");
string JsTxt = firebaseResponse.Body;
if (JsTxt == "null")
{
    return;
}
dynamic data = JsonConvert.DeserializeObject<dynamic>(JsTxt);
var list = new List<XClass>();
foreach (var itemDynamic in data)
{
    list.Add(JsonConvert.DeserializeObject<XClass>(
        ((JProperty)itemDynamic).Value.ToString()));
}
// Now you have a list you can loop through to put the data into any suitable
// visual control
foreach (XClass _Xcls in list)
{
    Invoke((MethodInvoker)delegate
    {
        DataGridViewRow row = (DataGridViewRow)dg.Rows[0].Clone();
        row.Cells[0].Value = _Xcls...
        row.Cells[1].Value = _Xcls...
        row.Cells[2].Value = _Xcls...
        ......
        dg.Rows.Insert(0, row);
    });
}

C# MVC Loop through list and update each record efficiently

I have a list of 'Sites' that are stored in my database. The list is VERY big and contains around 50,000+ records.
I am trying to loop through each record and update it. This takes ages; is there a better, more efficient way of doing this?
using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
{
    var allsites = DB.Sites.ToList();
    foreach (var sitedata in allsites)
    {
        var siterecord = DB.Sites.Find(sitedata.Id);
        siterecord.CabinOOB = "Test";
        siterecord.TowerOOB = "Test";
        siterecord.ManagedOOB = "Test";
        siterecord.IssueDescription = "Test";
        siterecord.TargetResolutionDate = "Test";
        DB.Entry(siterecord).State = EntityState.Modified;
    }
    DB.SaveChanges();
}
I have cut stuff out of the code above to get to the point. The full function I'm using basically pulls a list from Excel, matches the records against the sites list and updates each matching record accordingly. The DB.Find is slowing the loop down dramatically:
[HttpPost]
public ActionResult UploadUpdateOOBList()
{
    CheckPermissions("UpdateOOBList");
    string[] typesallowed = new string[] { ".xls", ".xlsx" };
    HttpPostedFileBase file = Request.Files[0];
    var fname = file.FileName;
    if (!typesallowed.Any(fname.Contains))
    {
        return Json("NotAllowed");
    }
    file.SaveAs(Server.MapPath("~/Uploads/OOB List/") + fname);

    //Create empty OOB data list
    List<OOBList.OOBDetails> oob_data = new List<OOBList.OOBDetails>();

    //Using ClosedXML rather than Interop Excel....
    //Interop Excel: 30 seconds for 750 rows
    //ClosedXML: 3 seconds for 750 rows
    string fileName = Server.MapPath("~/Uploads/OOB List/") + fname;
    using (var excelWorkbook = new XLWorkbook(fileName))
    {
        var nonEmptyDataRows = excelWorkbook.Worksheet(2).RowsUsed();
        foreach (var dataRow in nonEmptyDataRows)
        {
            //for row number check
            if (dataRow.RowNumber() >= 4)
            {
                string siteno = dataRow.Cell(1).GetValue<string>();
                string sitename = dataRow.Cell(2).GetValue<string>();
                string description = dataRow.Cell(4).GetValue<string>();
                string cabinoob = dataRow.Cell(5).GetValue<string>();
                string toweroob = dataRow.Cell(6).GetValue<string>();
                string manageoob = dataRow.Cell(7).GetValue<string>();
                string resolutiondate = dataRow.Cell(8).GetValue<string>();
                string resolutiondate_converted = resolutiondate.Substring(resolutiondate.Length - 9);

                oob_data.Add(new OOBList.OOBDetails
                {
                    SiteNo = siteno,
                    SiteName = sitename,
                    Description = description,
                    CabinOOB = cabinoob,
                    TowerOOB = toweroob,
                    ManageOOB = manageoob,
                    TargetResolutionDate = resolutiondate_converted
                });
            }
        }
    }

    //Now delete file.
    System.IO.File.Delete(Server.MapPath("~/Uploads/OOB List/") + fname);
    Debug.Write("DOWNLOADING LIST ETC....\n");

    using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
    {
        var allsites = DB.Sites.ToList();
        //Loop through sites and the OOB list and if they match then tell us
        foreach (var oobdata in oob_data)
        {
            foreach (var sitedata in allsites)
            {
                var indexof = sitedata.SiteName.IndexOf(' ');
                if (indexof > 0)
                {
                    var OOBNo = oobdata.SiteNo;
                    var OOBName = oobdata.SiteName;
                    var SiteNo = sitedata.SiteName;
                    var split = SiteNo.Substring(0, indexof);
                    if (OOBNo == split && SiteNo.Contains(OOBName))
                    {
                        var siterecord = DB.Sites.Find(sitedata.Id);
                        siterecord.CabinOOB = oobdata.CabinOOB;
                        siterecord.TowerOOB = oobdata.TowerOOB;
                        siterecord.ManagedOOB = oobdata.ManageOOB;
                        siterecord.IssueDescription = oobdata.Description;
                        siterecord.TargetResolutionDate = oobdata.TargetResolutionDate;
                        DB.Entry(siterecord).State = EntityState.Modified;
                        Debug.Write("Updated Site ID/Name Record: " + sitedata.Id + "/" + sitedata.SiteName);
                    }
                }
            }
        }
        DB.SaveChanges();
    }

    var nowdate = DateTime.Now.ToString("dd/MM/yyyy");
    System.IO.File.WriteAllText(Server.MapPath("~/Uploads/OOB List/lastupdated.txt"), nowdate);
    return Json("Success");
}
Looks like you are using Entity Framework (6 or Core). In either case both
var siterecord = DB.Sites.Find(sitedata.Id);
and
DB.Entry(siterecord).State = EntityState.Modified;
are redundant, because the sitedata variable is coming from
var allsites = DB.Sites.ToList();
This not only loads the whole Site table into memory, but the EF change tracker also keeps a reference to every object in that list. You can easily verify that with
var siterecord = DB.Sites.Find(sitedata.Id);
Debug.Assert(siterecord == sitedata);
The Find (when the data is already in memory) and Entry methods themselves are fast. But the problem is that, by default, they trigger an automatic DetectChanges, which leads to quadratic time complexity - in simple words, very slow.
With that being said, simply remove them:
if (OOBNo == split && SiteNo.Contains(OOBName))
{
    sitedata.CabinOOB = oobdata.CabinOOB;
    sitedata.TowerOOB = oobdata.TowerOOB;
    sitedata.ManagedOOB = oobdata.ManageOOB;
    sitedata.IssueDescription = oobdata.Description;
    sitedata.TargetResolutionDate = oobdata.TargetResolutionDate;
    Debug.Write("Updated Site ID/Name Record: " + sitedata.Id + "/" + sitedata.SiteName);
}
This way EF will detect changes just once (inside SaveChanges) and will also update only the fields that were actually modified.
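If you do need to keep Find or Entry in the loop for some other reason, another common mitigation is to switch off automatic change detection while modifying the entities and re-enable it before saving. A minimal sketch, assuming the EF6 DbContext API (in EF Core the equivalent flag is ChangeTracker.AutoDetectChangesEnabled):
DB.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var oobdata in oob_data)
    {
        // ... same matching loop and property assignments as above ...
    }
}
finally
{
    // Re-enable so SaveChanges performs its single DetectChanges pass.
    DB.Configuration.AutoDetectChangesEnabled = true;
}
DB.SaveChanges();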
I have followed Ivan Stoev's suggestion and changed the code by removing the DB.Find and the EntityState.Modified line. It now takes about a minute and a half, compared to 15 minutes beforehand. Very surprising, as I didn't know you don't actually need those to update the records. Clever. The code is now:
using (IRISInSiteLiveEntities DB = new IRISInSiteLiveEntities())
{
    var allsites = DB.Sites.ToList();
    Debug.Write("Starting Site Update loop...");
    //Loop through sites and the OOB list and if they match then tell us
    //750 records takes around 15-20 minutes.
    foreach (var oobdata in oob_data)
    {
        foreach (var sitedata in allsites)
        {
            var indexof = sitedata.SiteName.IndexOf(' ');
            if (indexof > 0)
            {
                var OOBNo = oobdata.SiteNo;
                var OOBName = oobdata.SiteName;
                var SiteNo = sitedata.SiteName;
                var split = SiteNo.Substring(0, indexof);
                if (OOBNo == split && SiteNo.Contains(OOBName))
                {
                    sitedata.CabinOOB = oobdata.CabinOOB;
                    sitedata.TowerOOB = oobdata.TowerOOB;
                    sitedata.ManagedOOB = oobdata.ManageOOB;
                    sitedata.IssueDescription = oobdata.Description;
                    sitedata.TargetResolutionDate = oobdata.TargetResolutionDate;
                    Debug.Write("Thank you, next: " + sitedata.Id + "\n");
                }
            }
        }
    }
    DB.SaveChanges();
}
So first of all, you should turn your HttpPost action into an async function;
more info: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/
What you should do then is create the tasks and add them to a list, and then wait for them to complete (if you want/need to) by calling Task.WaitAll():
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.waitall?view=netframework-4.7.2
This will allow your code to run in parallel on multiple threads, which already improves performance quite a bit.
You can also use LINQ to, for example, reduce the size of allsites beforehand by doing something that will roughly look like this:
var sitedataWithCorrectNames = allsites.Where(x => /* evaluate your condition here */);
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/ef/language-reference/supported-and-unsupported-linq-methods-linq-to-entities
and then, inside your foreach (var oobdata) loop, iterate with foreach (var sitedata in sitedataWithCorrectNames) instead.
Same goes for SiteNo.Contains(OOBName)
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/getting-started-with-linq
P.S. Most DB SDKs also provide asynchronous functions, so use those as well.
P.P.S. I didn't have an IDE at hand, so I eyeballed the code, but the links should provide you with plenty of samples. Reply if you need more help.
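A minimal sketch of the async shape, keeping the controller and context names from the question (SaveChangesAsync is EF's asynchronous counterpart to SaveChanges; the body is elided to the parts that change):
[HttpPost]
public async Task<ActionResult> UploadUpdateOOBList()
{
    // ... read the Excel file and build oob_data exactly as before ...
    using (var DB = new IRISInSiteLiveEntities())
    {
        var allsites = DB.Sites.ToList();
        // ... same matching loop assigning the OOB fields as before ...
        await DB.SaveChangesAsync();
    }
    return Json("Success");
}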

Takes 1 GB of RAM to parse a JSON object and gives System.OutOfMemoryException after performing any other filter [duplicate]

This question already has answers here:
How to parse huge JSON file as stream in Json.NET?
(5 answers)
Closed 4 years ago.
public void ReadJsonFile()
{
    try
    {
        string json = string.Empty;
        using (StreamReader r = new StreamReader(val))
        {
            json = r.ReadToEnd();
            var test = JObject.Parse(json);
            JArray items = (JArray)test["locations"];
            int length = items.Count;
            data = new List<Info>();
            for (int i = 0; i < items.Count; i++)
            {
                var d = test["locations"][i]["timestampMs"];
                double dTimeSpan = Convert.ToDouble(d);
                DateTime dtReturn = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc).AddSeconds(Math.Round(dTimeSpan / 1000d)).ToLocalTime();
                string printDate = dtReturn.DayOfWeek.ToString() + "," + " " + dtReturn.ToShortDateString() + " " + dtReturn.ToShortTimeString();
                day = dtReturn.DayOfWeek.ToString();
                date = dtReturn.ToShortDateString();
                time = dtReturn.ToShortTimeString();
                var e = test["locations"][i]["latitudeE7"];
                var f = test["locations"][i]["longitudeE7"];
                var n = test["locations"][i]["accuracy"];
                accuracy = n.ToString();
                // getLocationByGeoLocation(e.ToString(), f.ToString());
                var g = test["locations"][i]["activity"] != null;
                if (g == true)
                {
                    JArray items1 = (JArray)test["locations"][i]["activity"];
                    int length1 = items1.Count;
                    while (j < items1.Count)
                    {
                        if (j == 0)
                        {
                            var h = test["locations"][i]["activity"][j]["activity"][j]["type"];
                            type = h.ToString();
                            j = 1;
                        }
                        else { }
                        j++;
                    }
                    j = 0;
                }
                else { }
                Info ddm = new Info(day, date, time, lat, longi, address, accuracy, type);
                data.Add(ddm);
                type = "";
            }
        }
        return;
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
I am trying to parse a JSON file; val is the name of the file to parse. Using StreamReader I read the whole content, but when I try to parse it with JObject it takes around 1 GB of memory and gives me a System.OutOfMemoryException. How can I parse it using less memory?
Please help me with this; I don't have much experience with JSON.
Please read about JSON thoroughly. Newtonsoft.Json is a very well-known library and it is well documented. Let's get back to your problem. As mentioned in the comments, you have lots of unnecessary intermediate steps while parsing your file. Moreover, you are trying to parse a big file in one go! First things first, this is the layout for your JSON:
public partial class Data
{
    [JsonProperty("locations")]
    public Location[] Locations { get; set; }
}

public partial class Location
{
    [JsonProperty("timestampMs")]
    public string TimestampMs { get; set; }

    [JsonProperty("latitudeE7")]
    public long LatitudeE7 { get; set; }

    [JsonProperty("longitudeE7")]
    public long LongitudeE7 { get; set; }

    [JsonProperty("accuracy")]
    public long Accuracy { get; set; }
}
And while you are deserializing, you should do it object by object, not all at once.
The following assumes that your stream is made up of Data objects; if it is made up of Location objects instead, you have to change it accordingly:
using (StreamReader streamReader = new StreamReader(val))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    reader.SupportMultipleContent = true;
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            var data = serializer.Deserialize<Data>(reader);
            //data.Locations etc etc..
        }
    }
}
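If the file actually has the single {"locations": [...]} wrapper from the question, a minimal sketch (my own variation on the reader loop above, reusing the property name and the Location class already shown) is to walk the reader to the array and deserialize one element at a time:
using (var streamReader = new StreamReader(val))
using (var reader = new JsonTextReader(streamReader))
{
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.PropertyName && (string)reader.Value == "locations")
        {
            reader.Read(); // advance onto the StartArray token
            while (reader.Read() && reader.TokenType != JsonToken.EndArray)
            {
                // The reader is positioned on the StartObject of one array element,
                // so only that single Location is materialized at a time.
                Location location = serializer.Deserialize<Location>(reader);
                // process location here
            }
        }
    }
}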
I could fix the System.OutOfMemoryException with the following steps:
If you are not using the Visual Studio hosting process:
Uncheck the option:
Project->Properties->Debug->Enable the Visual Studio hosting process
If the problem still remains:
Go to Project->Properties->Build Events->Post-build event command line and paste these 2 lines:
call "$(DevEnvDir)..\..\vc\vcvarsall.bat" x86
"$(DevEnvDir)..\..\vc\bin\EditBin.exe" "$(TargetPath)" /LARGEADDRESSAWARE
Now, build the project

Array to List<T> in C#

My current code works and the output is correct. I am pulling data from a data.txt file and have successfully read it into an array using TextFieldParser. Is there a way to convert my code to use a List, and how? If converting is not an option, are there any recommendations on where to start with the code? Basically I'm trying to go from an array to a List collection.
public partial class EmployeeInfoGeneratorForm : Form
{
    public EmployeeInfoGeneratorForm()
    {
        InitializeComponent();
    }

    // button event handler
    private void GenerateButton_Click(object sender, EventArgs e)
    {
        string[] parts;
        if (File.Exists("..\\data.txt"))
        {
            TextFieldParser parser = new TextFieldParser("..\\data.txt");
            parser.Delimiters = new string[] { "," };
            while (true)
            {
                parts = parser.ReadFields();
                if (parts == null)
                {
                    break;
                }
                this.nameheadtxt.Text = parts[0];
                this.addressheadtxt.Text = parts[1];
                this.ageheadtxt.Text = parts[2];
                this.payheadtxt.Text = parts[3];
                this.idheadtxt.Text = parts[4];
                this.devtypeheadtxt.Text = parts[5];
                this.taxheadtxt.Text = parts[6];
                this.emp1nametxt.Text = parts[7];
                this.emp1addresstxt.Text = parts[8];
                this.emp1agetxt.Text = parts[9];
                this.emp1paytxt.Text = parts[10];
                this.emp1idtxt.Text = parts[11];
                this.emp1typetxt.Text = parts[12];
                this.emp1taxtxt.Text = parts[13];
                this.emp2nametxt.Text = parts[14];
                this.emp2addresstxt.Text = parts[15];
                this.emp2agetxt.Text = parts[16];
                this.emp2paytxt.Text = parts[17];
                this.emp2idtxt.Text = parts[18];
                this.emp2typetxt.Text = parts[19];
                this.emp2taxtxt.Text = parts[20];
                this.emp3nametxt.Text = parts[21];
                this.emp3addresstxt.Text = parts[22];
                this.emp3agetxt.Text = parts[23];
                this.emp3paytxt.Text = parts[24];
                this.emp3idtxt.Text = parts[25];
                this.emp3typetxt.Text = parts[26];
                this.emp3taxtxt.Text = parts[27];
            }
        }
        else //Error Message for if File isn't found
        {
            lblError.Text = "File Not Found";
        }
    }
}
In your code example there are two arrays.
First example
parser.Delimiters = new string[] { "," };
Since parser is a TextFieldParser, I can see that Delimiters must be set to a string array. So you cannot change it.
Second example
string[] parts;
parts = parser.ReadFields();
This array accepts the result of parser.ReadFields(). The output of that function is a string array, so this code can't be changed without breaking the call.
However, you can immediately convert it to a list afterward:
var parts = parser.ReadFields().ToList();
There isn't much point to this either.
An array is just as good as a list when the size of the array/list doesn't change after it is created. Making it into a list will just add overhead.
There are a number of problems here. I'd be inclined to write your code like this:
public static IEnumerable<List<string>> ParseFields(string file)
{
    // Use "using" to clean up the parser.
    using (var parser = new TextFieldParser(file))
    {
        parser.Delimiters = new string[] { "," };
        // Use end-of-data, not checks for null.
        while (!parser.EndOfData)
            yield return parser.ReadFields().ToList();
    }
}
I'd refactor your code to put the UI updates in one method:
private void UpdateText(List<string> parts) { ... }
You only do something with the last element in the sequence; all your previous edits are lost. So be explicit about that:
private void GenerateButton_Click(object sender, EventArgs e)
{
    // Use a named constant for constant strings used in several places
    const string data = "..\\data.txt";
    if (!File.Exists(data))
    {
        lblError.Text = "File Not Found";
    }
    else
    {
        var parts = ParseFields(data).LastOrDefault();
        if (parts != null)
            UpdateText(parts);
    }
}
See how much cleaner that logic looks when you break it up into smaller parts? It's very pleasant to have methods that fit easily onto a page.
A direct answer to your question:
Use the List<T> constructor that takes an IEnumerable<T> parameter.
With that said, I would read Mr. Lippert's answer until you fully understand it.
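For example, a tiny sketch assuming parts holds the string[] returned by parser.ReadFields():
string[] parts = parser.ReadFields();
List<string> partsList = new List<string>(parts);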

Pull separate columns from a .csv into separate arrays in C#

Background on this project: it started as a simple homework assignment that required me to store 5 zip codes and their corresponding cities. When a user puts a zip code in a textbox, the corresponding city is returned, and likewise the opposite can be done. I wrote the code to return these values, but then I decided I wanted to store ALL zip codes and their corresponding cities in an external .csv, load those values into arrays and run the code off that, because if it's worth doing, it's worth overdoing! To clarify, this is no longer for homework, just to learn more about using external files in C#.
In the following code I open the file successfully; now I just need help figuring out how to pull the data stored in two separate columns (one for city, one for zip code) into two arrays that the for loop can operate on. Here is the code I have now. You can see how I have previously stored the other values in arrays and pulled them out:
public partial class Form1 : Form
{
    public Form1()
    {
        InitializeComponent();
    }

    private void btnConvert2City_Click(object sender, EventArgs e)
    {
        try
        {
            string dir = System.IO.Path.GetDirectoryName(
                System.Reflection.Assembly.GetExecutingAssembly().Location);
            string path = dir + @"\zip_code_database_edited.csv";
            var open = new StreamReader(File.OpenRead(path));

            int EnteredZipcode = Convert.ToInt32(txtZipcode.Text.Trim());
            string result = "No Cities Found";
            string[] Cities = new String[5] { "FLINTSTONE", "JAMAICA", "SCHENECTADY", "COTTONDALE", "CINCINNATI" };
            int[] Zipcode = new int[5] { 30725, 11432, 12345, 35453, 45263 };
            for (int i = 0; i <= Zipcode.Length - 1; i++)
            {
                if (Zipcode[i] == EnteredZipcode)
                {
                    result = Cities[i];
                    break;
                }
            }
            string DisplayState = result;
            txtCity.Text = DisplayState;
        }
        catch (FormatException)
        {
            MessageBox.Show("Input must be numeric value.");
        }
        catch (OverflowException)
        {
            MessageBox.Show("Zipcode too long. Please re-enter");
        }
    }

    private void btnConvert2Zipcode_Click(object sender, EventArgs e)
    {
        string dir = System.IO.Path.GetDirectoryName(
            System.Reflection.Assembly.GetExecutingAssembly().Location);
        string path = dir + @"\zip_code_database_edited.csv";
        var open = new StreamReader(File.OpenRead(path));

        String EnteredCity = txtCity.Text.ToUpper();
        string result = "No Zipcode Found";
        string[] Cities = new String[5] { "FLINTSTONE", "JAMAICA", "SCHENECTADY", "COTTONDALE", "CINCINNATI" };
        int[] Zipcode = new int[5] { 30725, 11432, 12345, 35453, 45263 };
        for (int i = 0; i <= Cities.Length - 1; i++)
        {
            if (Cities[i] == EnteredCity)
            {
                result = Convert.ToString(Zipcode[i]);
                break;
            }
        }
        string DisplayZip = result;
        txtZipcode.Text = DisplayZip;
    }
}
The following is a snippet of what the data in my .csv file looks like:
zip,primary_city
44273,Seville
44274,Sharon Center
44275,Spencer
44276,Sterling
44278,Tallmadge
44280,Valley City
44281,Wadsworth
44282,Wadsworth
44285,Wayland
And so on for about 46,000 rows.
How can I pull the zip and the primary_city into two separate arrays (I'm guessing with some .Split(",") line) that my for loop can operate on?
Also, if there are better ways to go about this, please let me know (but be sure to leave an explanation as I want to understand where you are coming from).
Don't create two separate arrays. Create a separate class for a city:
class City
{
    public string Name { get; set; }
    public int ZipCode { get; set; }
}
Now, to read the data from that csv file:
List<City> cities = File.ReadAllLines(path)
    .Select(x => new City
    {
        ZipCode = int.Parse(x.Split(',')[0]),
        Name = x.Split(',')[1]
    }).ToList();
Or you can do this
List<City> cities = new List<City>();
foreach (String s in File.ReadAllLines(path))
{
    City temp = new City();
    temp.ZipCode = int.Parse(s.Split(',')[0]);
    temp.Name = s.Split(',')[1];
    cities.Add(temp);
}
You can try this:
string dir = System.IO.Path.GetDirectoryName(
    System.Reflection.Assembly.GetExecutingAssembly().Location);
string path = dir + @"\zip_code_database_edited.csv";
var open = new StreamReader(File.OpenRead(path));

var cities = new List<string>();
var zipCodes = new List<int>();
string line = string.Empty;

using (open)
{
    while ((line = open.ReadLine()) != null)
    {
        var zipAndCity = line.Split(',');
        zipCodes.Add(int.Parse(zipAndCity[0]));
        cities.Add(zipAndCity[1]);
    }
}
I am posting this answer having learned much more about C# since I posted this question. When reading a CSV, there are better options than String.Split().
The .NET Framework already has a built-in dedicated CSV parser called TextFieldParser.
It's located in the Microsoft.VisualBasic.FileIO namespace.
Not only are there many edge cases (such as quoted fields) that String.Split() is not properly equipped to handle; using StreamReader this way is also much slower.
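A minimal sketch of what that looks like for the zip/city file in this question (the path variable and the header row are assumptions carried over from the code above; the project needs a reference to Microsoft.VisualBasic):
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO;

var zipCodes = new List<int>();
var cities = new List<string>();
using (var parser = new TextFieldParser(path))
{
    parser.Delimiters = new[] { "," };
    parser.ReadLine(); // skip the "zip,primary_city" header row
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        zipCodes.Add(int.Parse(fields[0]));
        cities.Add(fields[1]);
    }
}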
