I have loads of .csv files I need to convert to .xlsx after applying some formatting.
A file containing approx. 20 000 rows and 7 columns takes 12 minutes to convert.
If the file contains more than 100 000 rows, it runs for more than an hour.
This is unfortunately not acceptable for me.
Code snippet:
var format = new ExcelTextFormat();
format.Delimiter = ';';
format.Encoding = new UTF7Encoding();
format.Culture = new CultureInfo(System.Threading.Thread.CurrentThread.CurrentCulture.ToString());
format.Culture.DateTimeFormat.ShortDatePattern = "dd.MM.yyyy"; // "MM" is months; lowercase "mm" would be minutes
using (ExcelPackage package = new ExcelPackage(new FileInfo(file.Name)))
{
    ExcelWorksheet worksheet = package.Workbook.Worksheets.Add(Path.GetFileNameWithoutExtension(file.Name));
    worksheet.Cells["A1"].LoadFromText(new FileInfo(file.FullName), format);
    package.Save(); // persist the workbook
}
I have verified that it is the LoadFromText call that consumes the time.
Is there a way to speed things up?
I have tried without the "format" parameter, but the load time was the same.
What load times are you experiencing?
My suggestion here is to read the file yourself and then use the library to create the output file.
The code to read the CSV could be as simple as:
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader("file.csv"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        lines.Add(line);
    }
}
// Now you have all lines of your CSV.
// Create your file with EPPlus.
foreach (string line in lines)
{
    var values = line.Split(';');
    foreach (string value in values)
    {
        // use the EPPlus library to fill your file (see the sketch below)
    }
}
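For that inner loop, the EPPlus side could look roughly like this (an untested sketch; the output file name and the sheet name are placeholders):
using (var package = new ExcelPackage(new FileInfo("output.xlsx")))
{
    var worksheet = package.Workbook.Worksheets.Add("Sheet1");
    int row = 1;
    foreach (string line in lines)
    {
        var values = line.Split(';');
        for (int col = 0; col < values.Length; col++)
        {
            // EPPlus cell indexes are 1-based.
            worksheet.Cells[row, col + 1].Value = values[col];
        }
        row++;
    }
    package.Save();
}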
I ran into a very similar problem with LoadFromCollection. EPPlus has to account for all situations in its methods to load data generically like that, so there is a good deal of overhead. I ended up narrowing the bottleneck down to that method and just manually converting the data from the collection to Excel cell objects in EPPlus. That probably saved several minutes in my exports.
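The manual conversion amounts to something like this (a sketch; myCollection and its properties are stand-ins for your own types):
int row = 1;
foreach (var item in myCollection)
{
    // Write each property straight into a cell instead of calling LoadFromCollection.
    worksheet.Cells[row, 1].Value = item.Id;
    worksheet.Cells[row, 2].Value = item.Name;
    row++;
}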
Plenty of examples on how to read csv data:
C# Read a particular value from CSV file
I have a CSV file on my Windows 10 machine. This file contains a list of rows, e.g.:
John, 5656, Phil, Simon,,Jude, Helen, Andy
Conor, 5656, Phil, Simon,,Jude, Helen, Andy
I am an automated tester using C#, Selenium and Visual Studio. In the application I am testing, there is an upload button which imports the CSV file.
How do I automatically change the second number at random, so that it would be, for example, 1234 on the first row and 4444 on the second row? I think I would need a random number generator for this.
Any advice or snippets of code would be appreciated.
Do you want to modify the CSV file before it's uploaded to the program, or after? Either way it would look something like this:
public void updateFile(string filePath)
{
    var modifiedNames = new List<string>();
    using (StreamReader sr = File.OpenText(filePath))
    {
        string s;
        while ((s = sr.ReadLine()) != null)
        {
            // randomlyGeneratedSuffix() is a placeholder for your own random logic
            s = s + randomlyGeneratedSuffix();
            modifiedNames.Add(s);
        }
    }
    using (StreamWriter sw = new StreamWriter("names.txt"))
    {
        foreach (string s in modifiedNames)
        {
            sw.WriteLine(s);
        }
    }
    // return new file?
}
Reading the file before uploading, changing the numbers in the second position of the CSV, and writing it back to disk should work. Here is a very simple approach to help you get started:
var fileLines = File.ReadAllLines("file.csv");
var randomGenerator = new Random();
var newFileLines = new List<string>();
foreach (var fileLine in fileLines)
{
    var lineValues = fileLine.Split(',');
    lineValues[1] = randomGenerator.Next(1000, int.MaxValue).ToString();
    var newLine = string.Join(",", lineValues);
    newFileLines.Add(newLine);
}
File.WriteAllLines("file.csv", newFileLines);
Instead of updating an existing CSV file for testing, I would generate a new one from code.
There are a lot of code examples online showing how to create a CSV file in C#, for example: Writing data into CSV file in C#
For random numbers you can use the random class: https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.7.2
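A minimal sketch of that idea (the file path and the row layout are placeholders based on the rows in the question):
var random = new Random();
var lines = new List<string>();
for (int i = 0; i < 10; i++)
{
    // The second column gets a fresh random number on every run.
    lines.Add($"Name{i},{random.Next(1000, 10000)},Phil,Simon,,Jude,Helen,Andy");
}
File.WriteAllLines(@"C:\Users\Public\test.csv", lines);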
I am currently programming a "search engine" in C# for a game, for which I get very large (3 GB and more!) .csv and .json(l) files. I need to parse them, but it takes up very large amounts of RAM. What are good ways to parse them? (I need all the data for transferring it into a DB.)
example csv:
id,station_id,commodity_id,supply,buy_price,sell_price,demand,collected_at
1,1,5,0,0,315,532,1486247405
2,1,6,0,0,6795,38,1486247405
3,1,7,0,0,527,318,1486247405
Unfortunately I have no JSON example, but it is an array of objects which hold the data.
I used Microsoft.VisualBasic.FileIO.TextFieldParser and it was fast enough for a 2 GB .CSV file.
using (TextFieldParser sr = new TextFieldParser(datapath)
{
    Delimiters = new[] { "," },
    HasFieldsEnclosedInQuotes = true
})
{
    while (!sr.EndOfData)
    {
        string[] values = sr.ReadFields();
        // ....
    }
}
Hope it helps.
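For the JSON side, the same streaming idea applies. Here is a rough sketch using Newtonsoft.Json (my assumption; the question doesn't name a library, and MyRecord is a hypothetical type matching your objects):
using (var file = File.OpenText(@"C:\data\huge.json"))
using (var reader = new JsonTextReader(file))
{
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        // Deserialize one object at a time instead of loading the whole array.
        if (reader.TokenType == JsonToken.StartObject)
        {
            MyRecord record = serializer.Deserialize<MyRecord>(reader);
            // insert 'record' into the database here
        }
    }
}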
So I have this application that I inherited from someone who is long gone. The gist of the application is that it reads in a .csv file that has about 5800 lines in it and copies it over to another .csv, which it creates new each time, after stripping out a few characters: #, ', &. Everything worked great until about a month ago. When I started checking into it, I found that about 131 items are missing from the spreadsheet. Now I read someplace that the maximum amount of data a string can hold is over 1,000,000,000 characters, and my spreadsheet is way under that, around 800,000 characters, but the only thing I can think of that could be doing it is the string object.
So anyway, here is the code in question. This piece appears to both read in from the existing file and output to the new file:
StreamReader s = new StreamReader(File);
//Read the rest of the data in the file.
string AllData = s.ReadToEnd();
//Split off each row at the Carriage Return/Line Feed
//Default line ending in most windows exports.
//You may have to edit this to match your particular file.
//This will work for Excel, Access, etc. default exports.
string[] rows = AllData.Split("\r\n".ToCharArray(), System.StringSplitOptions.RemoveEmptyEntries);
//Now add each row to the DataSet
foreach (string r in rows)
{
//Split the row at the delimiter.
string[] items = r.Split(delimiter.ToCharArray());
//Add the item
result.Rows.Add(items);
}
If anyone can help me I would really appreciate it. I either need to figure out how to split the data better, or I need to figure out why it is dropping the last 131 lines when copying from the existing file to the new file.
One easier way to do this, since you're using "\r\n" for lines, would be to just use the built-in line reading method: File.ReadLines(path)
foreach (var line in File.ReadLines(path))
{
    var items = line.Split(',');
    result.Rows.Add(items);
}
You may want to check out the TextFieldParser class, which is part of the Microsoft.VisualBasic.FileIO namespace (yes, you can use this with C# code).
Something along the lines of:
using (var reader = new TextFieldParser("c:\\path\\to\\file"))
{
    // configure for a delimited file
    reader.TextFieldType = FieldType.Delimited;
    // configure the delimiter character (comma)
    reader.Delimiters = new[] { "," };
    while (!reader.EndOfData)
    {
        string[] row = reader.ReadFields();
        // do stuff
    }
}
This class can help with some of the issues of splitting a line into its fields when a field may contain the delimiter; for example, the line "Doe, John",25 has a comma inside quotes that a naive Split would break on.
I would like to insert text from one text file to another.
So for example I have a text file at C:\Users\Public\Test1.txt
first
second
third
forth
And i have a second text file at C:\Users\Public\Test2.txt
1
2
3
4
I want to insert Test2.txt into Test1.txt
The end result should be:
first
second
1
2
3
4
third
forth
It should be inserted at the third line.
So far I have this:
string strTextFileName = @"C:\Users\Public\test1.txt";
int iInsertAtLineNumber = 2;
string strTextToInsert = @"C:\Users\Public\test2.txt";
ArrayList lines = new ArrayList();
StreamReader rdr = new StreamReader(strTextFileName);
string line;
while ((line = rdr.ReadLine()) != null)
    lines.Add(line);
rdr.Close();
if (lines.Count > iInsertAtLineNumber)
    lines.Insert(iInsertAtLineNumber, strTextToInsert);
else
    lines.Add(strTextToInsert);
StreamWriter wrtr = new StreamWriter(strTextFileName);
foreach (string strNewLine in lines)
    wrtr.WriteLine(strNewLine);
wrtr.Close();
However, I get this when I run it:
first
second
C:\Users\Public\test2.txt
third
forth
Thanks in advance!
Instead of using StreamReaders/Writers, you can use methods from the File helper class.
const string textFileName = @"C:\Users\Public\test1.txt";
const string textToInsertFileName = @"C:\Users\Public\test2.txt";
const int insertAtLineNumber = 2;

List<string> fileContent = File.ReadAllLines(textFileName).ToList();
fileContent.InsertRange(insertAtLineNumber, File.ReadAllLines(textToInsertFileName));
File.WriteAllLines(textFileName, fileContent);
A List<string> is way more convenient than an ArrayList. I also renamed a couple of your variables (most notably textToInsertFileName), removed the prefixes cluttering your declarations (any modern IDE will tell you the data type if you hover for half a second), and declared your constants with const.
Your original problem was that you never read from strTextToInsert; it looks like you thought it already contained the text to insert, when it's actually the file name.
Without changing your structure or types around too much, you could create a method to read the lines:
public ArrayList GetFileLines(string fileName)
{
    var lines = new ArrayList();
    using (var rdr = new StreamReader(fileName))
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
            lines.Add(line);
    }
    return lines;
}
In the initial question you were not reading the second file. In the following example it is a little easier to see when each file is read:
string strTextFileName = @"C:\Users\Public\test1.txt";
int iInsertAtLineNumber = 2;
string strTextToInsert = @"C:\Users\Public\test2.txt";

ArrayList lines = new ArrayList();
lines.AddRange(GetFileLines(strTextFileName));
lines.InsertRange(iInsertAtLineNumber, GetFileLines(strTextToInsert));

using (var wrtr = new StreamWriter(strTextFileName))
{
    foreach (string strNewLine in lines)
        wrtr.WriteLine(strNewLine);
}
NOTE: if you wrap a reader or writer in a using statement, it will be closed automatically.
I haven't tested this and it could be done better, but hopefully it gets you pointed in the right direction. This solution completely rewrites the first file.
I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the files data, but I don't want to regenerate the data in that file because it took 4 days to do.
How can I read the file, line by line (not loading it all into memory!), and make edits to those lines as I go, replacing certain values etc.?
The process would be something like this:
1. Open a StreamWriter to a temporary file.
2. Open a StreamReader to the target file.
3. For each line:
   1. Split the text into columns based on a delimiter.
   2. Check the columns for the values you want to replace, and replace them.
   3. Join the column values back together using your delimiter.
   4. Write the line to the temporary file.
4. When you are finished, delete the target file, and move the temporary file to the target file path.
Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.
Basic example just using StreamReader and StreamWriter:
var sourcePath = @"C:\data.csv";
var delimiter = ",";
var firstLineContainsHeaders = true;

var tempPath = Path.GetTempFileName();
var lineNumber = 0;
var splitExpression = new Regex(@"(" + delimiter + @")(?=(?:[^""]|""[^""]*"")*$)");

using (var writer = new StreamWriter(tempPath))
using (var reader = new StreamReader(sourcePath))
{
    string line = null;
    string[] headers = null;

    if (firstLineContainsHeaders)
    {
        line = reader.ReadLine();
        lineNumber++;
        if (string.IsNullOrEmpty(line)) return; // file is empty

        headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
        writer.WriteLine(line); // write the original header to the temp file
    }

    while ((line = reader.ReadLine()) != null)
    {
        lineNumber++;

        var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();

        // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
        if (headers == null) headers = new string[columns.Length];
        if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));

        // TODO: search and replace in columns
        // example: replace 'v' in the first column with '\/':
        // if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");

        writer.WriteLine(string.Join(delimiter, columns));
    }
}

File.Delete(sourcePath);
File.Move(tempPath, sourcePath);
Memory-mapped files are a feature introduced in .NET Framework 4 that can be used to edit large files.
Read about them here: http://msdn.microsoft.com/en-us/library/dd997372.aspx
or google "memory-mapped files".
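A minimal sketch of opening a file as a memory-mapped view (the path is a placeholder, and the types come from System.IO.MemoryMappedFiles). Note that a view lets you overwrite bytes in place, but it cannot insert or remove bytes, so it suits fixed-width edits:
using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\temp\file.csv", FileMode.Open))
using (var accessor = mmf.CreateViewAccessor())
{
    // Read the first 16 bytes without pulling the whole file into memory.
    var buffer = new byte[16];
    accessor.ReadArray(0, buffer, 0, buffer.Length);
    Console.WriteLine(Encoding.ASCII.GetString(buffer));
}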
Just read the file, line by line, with a StreamReader, and then use regex! The most amazing tool in the world.
using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // do stuff
    }
}
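For the regex part, a pattern like this can split on commas that fall outside quoted fields (a rough sketch; it assumes well-formed quoting):
var splitRegex = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");
// e.g. a,"b,c",d  ->  a | "b,c" | d
string[] fields = splitRegex.Split(line);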