I needed to import a large number of text files and found little research material for my particular problem, so I decided to post the solution here. I believe it will help someone else.
My files have 3,000,000 records or more. I tried to read them line by line with StreamReader.ReadLine(), but it was impractical. Moreover, the files are too large to load into memory whole.
The solution was to load the files into memory in blocks (buffers) using StreamReader.ReadBlock().
The difficulty I had was that ReadBlock() reads a fixed number of characters, so a buffer can end in the middle of a line and the first line of the next buffer is then incomplete. To correct this, I keep the leftover in a string (resto) and concatenate it with the first line (primeiraLinha) of the next buffer.
Another important detail concerns Split: in most examples the resulting values are passed through Trim() to eliminate whitespace. In this case I do not use it, because the code relies on the trailing '\r' to recognize a complete line, and trimming made it join the first and second lines of the buffer.
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
    class Program
    {
        static void Main()
        {
            const string arquivo = "Arquivo1.txt";
            const int deslocamento = 1000; // buffer size, in characters
            using (var streamReader = new StreamReader(arquivo))
            {
                char[] buffer = new char[deslocamento];
                string resto = ""; // incomplete line left over from the previous buffer
                int lidos;         // number of characters ReadBlock actually read
                // ReadBlock returns 0 at the end of the file, which ends the loop.
                while ((lidos = streamReader.ReadBlock(buffer, 0, buffer.Length)) > 0)
                {
                    // Use only the characters that were read; the last block is usually shorter.
                    var bufferString = new String(buffer, 0, lidos);
                    string[] bufferSplit = bufferString.Split(new char[] { '\n' });
                    foreach (var bs in bufferSplit)
                    {
                        if (bs == "")
                            continue;
                        // Join the leftover of the previous buffer with this piece.
                        string primeiraLinha = resto + bs;
                        if (primeiraLinha.Contains('\r'))
                        {
                            // A '\r' marks a complete line (this assumes "\r\n" line endings).
                            Console.WriteLine(primeiraLinha);
                            resto = "";
                        }
                        else
                        {
                            // No '\r' yet: the line continues in the next buffer.
                            resto = primeiraLinha;
                        }
                    }
                }
                // A final line without a line break is still pending here.
                if (resto != "")
                    Console.WriteLine(resto);
            }
        }
    }
}
I had great help from my friend from training, Gabriel Gustaf, in solving this problem.
If anyone has any suggestions to further improve the performance, or to make any comments, feel free.
.NET has a class designed for working with large files: MemoryMappedFile. It's simple and I think it could help you.
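A minimal sketch of what that suggestion could look like, assuming the file is not empty and only needs to be read line by line. The class and method names (MappedFileLines, CountLines) are made up for illustration; MemoryMappedFile.CreateFromFile and CreateViewStream are the real .NET calls.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedFileLines
{
    // Counts the lines of a non-empty file through a memory-mapped view.
    public static long CountLines(string path)
    {
        // Pass the exact file length: a view created with size 0 (whole file)
        // can be rounded up to the page size and end in padding bytes.
        long length = new FileInfo(path).Length;
        long count = 0;

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewStream(0, length))
        using (var reader = new StreamReader(view))
        {
            while (reader.ReadLine() != null)
            {
                count++; // replace with the real per-line processing
            }
        }
        return count;
    }
}
```

Whether this is faster than buffered reading depends on the file and the machine, so it is worth measuring both before switching.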
I'm trying to read a string with StreamReader, but I don't know how to read it.
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
namespace
{
class Program
{
static void Main(string[] args)
{
string itemCostsInput = "25.34\n10.99\n250.22\n21.87\n50.24\n15";
string payerCountInput = "8\n";
string individualCostInput = "52.24\n";
double individualCost = RestaurantBillCalculator.CalculateIndividualCost(reader2, totalCost);
Debug.Assert(individualCost == 54.14);
uint payerCount = RestaurantBillCalculator.CalculatePayerCount(reader3, totalCost);
Debug.Assert(payerCount == 9);
}
}
}
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
namespace as
{
public static class RestaurantBillCalculator
{
public static double CalculateTotalCost(StreamReader input)
{
// I want to read the input (not "System.IO.StreamReader"):
// 25.34
// 10.99
// 250.22
// 21.87
// 50.24
// 15
// Below is what I tried.
int[] numbers = new int[6];
for (int i = 0; i < 5; i++)
{
numbers[int.Parse(input.ReadLine())]++;
}
for (int i = 0; i < 5; i++)
{
Console.WriteLine(numbers[i]);
}
return 0;
}
public static double CalculateIndividualCost(StreamReader input, double totalCost)
{
return 0;
}
public static uint CalculatePayerCount(StreamReader input, double totalCost)
{
return 0;
}
}
}
Even when I googled it, that phrase only brought up file input/output.
I want to get a simple string and read it.
int[] numbers = new int[6]; // The number at the index number
// take the given numbers
for (int i = 0; i < n; i++)
{
numbers[int.Parse(sr.ReadLine())]++;
}
I tried the above method, but it didn't work.
I just want to read the contents of itemCostsInput as they are. If I just call Console.WriteLine on the reader, the output is the string System.IO.StreamReader.
I want to read and store each of the values in itemCostsInput.
I'm sorry, my English is not good.
I expected the input to be read as
25.34
10.99
250.22
21.87
50.24
15
but the console prints System.IO.StreamReader
These lines are the ones causing the most trouble, I think:
for (int i = 0; i < 5; i++)
{
numbers[int.Parse(input.ReadLine())]++;
}
Should be
for (int i = 0; i < 5; i++)
{
numbers[i] = int.Parse(input.ReadLine());
}
But since your input is decimal (in string form, because it comes from the StreamReader), maybe numbers should be an array of decimals.
There is also more to say about the use of StreamReader: if the input has fewer than 5 lines, your program will also break. I'll leave this here hoping it clarifies something for you.
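Both remarks can be sketched together: read until ReadLine() returns null instead of assuming 5 lines, and parse each line as a decimal. The names CostReader and ReadCosts are hypothetical. The parameter is typed as TextReader, the base class of StreamReader, so the same method also accepts a StringReader over a plain string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class CostReader
{
    // Reads one decimal number per line until the reader runs out of lines.
    public static List<decimal> ReadCosts(TextReader input)
    {
        var costs = new List<decimal>();
        string line;
        while ((line = input.ReadLine()) != null)
        {
            decimal cost;
            // InvariantCulture makes "." the decimal separator on every machine.
            if (decimal.TryParse(line, NumberStyles.Number, CultureInfo.InvariantCulture, out cost))
            {
                costs.Add(cost); // lines that are not numbers are skipped
            }
        }
        return costs;
    }
}
```

For example, CostReader.ReadCosts(new StringReader(itemCostsInput)) would return the six values from the question without going through a stream at all.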
Your code does not make sense in its current state.
Please read up on Streams.
Usually you'd get a stream from a file or from a network connection but not from a string.
You are confusing integer and double.
The double data type represents floating point numbers.
It seems to me that you just started programming and are missing out on most of the fundamentals.
First, convert your string input into a stream:
static System.IO.Stream GetStream(string input)
{
Stream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(input);
writer.Flush();
stream.Position = 0;
return stream;
}
Now you can convert your input to a stream like this:
// ... code ...
string itemCostsInput = "25.34\n10.99\n250.22\n21.87\n50.24\n15";
var dataStream = GetStream(itemCostsInput);
// ... code ...
Now that you have converted your string input into a stream, you can start to parse your data and extract the numbers:
// Requires: using System.Collections.Generic; using System.Globalization; using System.IO;
static List<double> GetDoubleFromStream(Stream stream)
{
if (stream == null) {
return new List<double>();
}
const char NEWLINE = '\n';
List<double> result = new List<double>();
using (var reader = new StreamReader(stream))
{
// Continue until end of stream has been reached.
while (reader.Peek() > -1)
{
string temp = string.Empty;
// Read while not end of stream and char is not new line.
while (reader.Peek() != NEWLINE && reader.Peek() > -1) {
temp += (char)reader.Read();
}
// Perform another read operation
// to skip the current new line character
// and continue reading.
reader.Read();
// Parse data to double if valid.
if (!(string.IsNullOrEmpty(temp)))
{
double d;
// Allow decimal points and ignore culture.
if (double.TryParse(
temp,
NumberStyles.AllowDecimalPoint,
CultureInfo.InvariantCulture,
out d))
{
result.Add(d);
}
}
}
}
return result;
}
Putting both steps together, this would be your intermediate result:
// ... code ...
string itemCostsInput = "25.34\n10.99\n250.22\n21.87\n50.24\n15";
var dataStream = GetStream(itemCostsInput);
var result = GetDoubleFromStream(dataStream);
// ... code ...
Hello, I am trying to create a program that reads 125,000 IDs and usernames from an Excel file, splits them on a delimiter, creates tokens based on them and then creates confirmation links. The problem is that at a certain point (after more than 30,000 iterations) the index goes out of range for no apparent reason.
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using ExcelDataReader;
using Excel = Microsoft.Office.Interop.Excel;
namespace CoRegApp
{
public class Excel_Creation
{
static string[] tokens;
string emailConfirmationLink;
public List<string> EmailsList = new List<string>();
public List<string> userList = new List<string>();
//Create a variable of the token date
public Excel_Creation() { }
public void readExcel()
{
string filepath = "batch2.xlsx";
var tokenDate = DateTime.Now.AddDays(4).Date;
using (FileStream stream = File.Open(filepath, FileMode.Open, FileAccess.Read))
{
using (IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream))
{
DataSet result = excelReader.AsDataSet();
DataTable firstTable = result.Tables[0];
//StringBuilder sb = new StringBuilder();
foreach (DataRow dr in firstTable.Rows)
{
object[] arr = dr.ItemArray;
for (int i = 0; i < arr.Length; i++)
{
string input = ((Convert.ToString(arr[i])));
if (!string.IsNullOrWhiteSpace(input))
{
tokens = input.Split(';');
for (i = 0; i < 1; i++)
{
string token = EncryptionHelper.CreateToken(tokens[0], tokens[1], tokenDate);
emailConfirmationLink = "https://blablaconfirmation/Validate?token=" + token + "&blablavalidation2";
EmailsList.Add(emailConfirmationLink);
userList.Add((Convert.ToString(arr[i])));
Console.WriteLine(emailConfirmationLink);
}
}
//tokens =().Split(';'));
}
}
excelReader.Close();
}
}
}
public void MapToExcel()
{
//start excel
Excel.Application excapp = new Microsoft.Office.Interop.Excel.Application();
//if you want to make excel visible
excapp.Visible = true;
//create a blank workbook
var workbook = excapp.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet);
//Not done yet. You have to work on a specific sheet - note the cast
//You may not have any sheets at all. Then you have to add one with NsExcel.Worksheet.Add()
var sheet = (Excel.Worksheet)workbook.Sheets[1]; //indexing starts from 1
//now the list
string cellName;
int counter = 1;
foreach (string item in EmailsList)
{
cellName = "B" + counter.ToString();
var range = sheet.get_Range(cellName, cellName);
range.Value2 = item.ToString();
++counter;
}
string cellName2;
int counterB = 1;
foreach (string item in userList)
{
cellName2 = "A" + counterB.ToString();
var range = sheet.get_Range(cellName2, cellName2);
range.Value2 = item.ToString();
++counterB;
}
}//end of mapping method
}
}
You have reused the looping variable "i" within the loop of "i":
for (int i = 0; i < arr.Length; i++)
{
...
for (i = 0; i < 1; i++)
The clue is that you didn't have to declare the variable for the inner loop. You should always expect to declare your looping variable in the for (or foreach); or you're probably doing something wrong.
In this case, execution enters the outer loop, sets "i" to zero and checks that i is less than arr.Length. It does some other work and, if the conditions are right, enters the inner loop, which resets "i" to zero, checks that it is less than 1, runs the body, increments i (because of the inner loop) and drops out of that loop. Execution then reaches the end of the outer loop, increments i again (so now it's 2), and checks it against arr.Length before possibly going round again.
That inner loop is effectively pointless because it will always do it once and only once, so I'd suggest removing that loop, and fixing the references to "i" within it to either be 0, or to stay as "i"; depending on what your intent was (because it's ambiguous which "i" you were trying to refer to).
If I can suggest that you always give your variables meaningful names, you may find that it not only helps prevent you from doing this, but also makes your code more readable.
If it helps, you can think of a for-loop as being like a while loop that is coded like this...
int i = 0;
while(i < arr.Length)
{
...
i++;
}
But you have tinkered with "i" in the "..." part.
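The effect can be reproduced in isolation. The sketch below (LoopVariableDemo and its methods are invented for this illustration) records which outer indexes are visited when the inner loop reuses "i" and when it declares its own variable. The safety cap is there because, for arrays longer than two elements, the shared-variable version keeps landing on index 2 forever.

```csharp
using System;
using System.Collections.Generic;

public static class LoopVariableDemo
{
    // Outer indexes visited when the inner loop reuses the outer variable "i".
    public static List<int> VisitedWithSharedVariable(int length)
    {
        var visited = new List<int>();
        for (int i = 0; i < length; i++)
        {
            visited.Add(i);
            if (visited.Count >= 10) break; // safety cap against an endless loop
            for (i = 0; i < 1; i++) { }     // resets i to 0 and leaves it at 1
        }
        return visited;
    }

    // Outer indexes visited when the inner loop declares its own variable "j".
    public static List<int> VisitedWithOwnVariable(int length)
    {
        var visited = new List<int>();
        for (int i = 0; i < length; i++)
        {
            visited.Add(i);
            for (int j = 0; j < 1; j++) { }
        }
        return visited;
    }
}
```

With a length of 2, the shared-variable version visits only index 0, while the version with its own variable visits 0 and 1.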
EDIT: additional:
tokens = input.Split(';');
...
string token = EncryptionHelper.CreateToken(tokens[0], tokens[1], tokenDate);
but there is no check of how many items are in tokens before using the indexers.
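A small sketch of such a check, assuming each cell is expected to hold "id;username". TokenInput and TrySplit are hypothetical names, not part of the question's code. A cell with fewer than two parts is reported to the caller instead of throwing an IndexOutOfRangeException.

```csharp
using System;

public static class TokenInput
{
    // Splits "id;username" and reports whether both parts were present.
    public static bool TrySplit(string input, out string id, out string userName)
    {
        id = null;
        userName = null;
        if (string.IsNullOrWhiteSpace(input))
            return false;

        string[] parts = input.Split(';');
        if (parts.Length < 2)
            return false; // no ';' in the cell, so parts[1] would not exist

        id = parts[0].Trim();
        userName = parts[1].Trim();
        return true;
    }
}
```

The caller can then log or skip the rows for which TrySplit returns false, which also shows which of the 125,000 rows is malformed.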
Thanks for your help.
Actually the issue was kind of dumb. Even though my array is always reset to contain only two items, at index [0] and [1], hardcoding those indexes seems to make the loop go out of bounds at some point. I don't really know why. Thus simply replacing the
string token = EncryptionHelper.CreateToken(tokens[0], tokens[1], tokenDate);
with
int counter1 = counter;
counter1--;
string token = EncryptionHelper.CreateToken(tokens[counter1], tokens[counter], tokenDate);
solved the issue
I'm trying to split an audio file into several pieces.
The fact is: I have a byte array and I would like to split the WAV file into some random pieces (3, for example).
Of course, I know that I can't do something like this. But does anyone have an idea on how to do it?
byte[] result = stream.ToArray();
byte[] testing = new byte[44];
for (int ix = 0; ix < testing.Length; ix++)
{
testing[ix] = result[ix];
}
System.IO.File.WriteAllBytes("yourfilepath_" + System.Guid.NewGuid() + ".wav", testing);
I would like to build this solution in C#, but I heard that there is a library called SoX that can split on silence gaps like this:
sox in.wav out.wav silence 1 0.5 1% 1 5.0 1% : newfile : restart
But every time I run this command, only one file is generated. (The audio file lasts 5 seconds, and each split file should last around 1 second.)
What is the best way to do this?
Thank you very much!
EDIT
With SOX:
string sox = @"C:\Program Files (x86)\sox-14-4-1\sox.exe";
string inputFile = @"D:\Brothers Vibe - Rainforest.mp3";
string outputDirectory = @"D:\splittest";
string outputPrefix = "split";
int[] segments = { 10, 15, 30 };
IEnumerable<string> enumerable = segments.Select(s => "trim 0 " + s.ToString(CultureInfo.InvariantCulture));
string @join = string.Join(" : newfile : ", enumerable);
string cmdline = string.Format("\"{0}\" \"{1}%1n.wav" + "\" {2}", inputFile,
Path.Combine(outputDirectory, outputPrefix), @join);
var processStartInfo = new ProcessStartInfo(sox, cmdline);
Process start = System.Diagnostics.Process.Start(processStartInfo);
If SOX complains about libmad (for MP3) : copy DLLs next to it, see here
Alternatively you can use FFMPEG in the same manner :
ffmpeg -ss 0 -t 30 -i "Brothers Vibe - Rainforest.mp3" "Brothers Vibe - Rainforest.wav"
(see the docs for all the details)
You can do that easily with BASS.NET :
For the code below you pass in :
input file name
desired duration for each segment
output directory
prefix to use for each segment file
The method checks whether the file is long enough for the specified segments; if so, it cuts the file into WAVs with the same sample rate, channels and bit depth.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;
using Un4seen.Bass;
using Un4seen.Bass.Misc;
namespace WindowsFormsApplication2
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
if (!Bass.BASS_Init(-1, 44100, BASSInit.BASS_DEVICE_DEFAULT, IntPtr.Zero))
throw new InvalidOperationException("Couldn't initialize BASS");
string fileName = @"D:\Brothers Vibe - Rainforest.mp3";
var segments = new double[] {30, 15, 20};
string[] splitAudio = SplitAudio(fileName, segments, "output", @"D:\split");
}
private static string[] SplitAudio(string fileName, double[] segments, string prefix, string outputDirectory)
{
if (fileName == null) throw new ArgumentNullException("fileName");
if (segments == null) throw new ArgumentNullException("segments");
if (prefix == null) throw new ArgumentNullException("prefix");
if (outputDirectory == null) throw new ArgumentNullException("outputDirectory");
int i = Bass.BASS_StreamCreateFile(fileName, 0, 0,
BASSFlag.BASS_STREAM_PRESCAN | BASSFlag.BASS_STREAM_DECODE);
if (i == 0)
throw new InvalidOperationException("Couldn't create stream");
double sum = segments.Sum();
long length = Bass.BASS_ChannelGetLength(i);
double seconds = Bass.BASS_ChannelBytes2Seconds(i, length);
if (sum > seconds)
throw new ArgumentOutOfRangeException("segments", "Required segments exceed file duration");
BASS_CHANNELINFO info = Bass.BASS_ChannelGetInfo(i);
if (!Directory.Exists(outputDirectory)) Directory.CreateDirectory(outputDirectory);
int index = 0;
var list = new List<string>();
foreach (double segment in segments)
{
double d = segment;
long seconds2Bytes = Bass.BASS_ChannelSeconds2Bytes(i, d);
var buffer = new byte[seconds2Bytes];
int getData = Bass.BASS_ChannelGetData(i, buffer, buffer.Length);
string name = string.Format("{0}_{1}.wav", prefix, index);
string combine = Path.Combine(outputDirectory, name);
int bitsPerSample = info.Is8bit ? 8 : info.Is32bit ? 32 : 16;
var waveWriter = new WaveWriter(combine, info.chans, info.freq, bitsPerSample, true);
waveWriter.WriteNoConvert(buffer, buffer.Length);
waveWriter.Close();
list.Add(combine);
index++;
}
bool free = Bass.BASS_StreamFree(i);
return list.ToArray();
}
}
}
TODO
The extraction is not optimized. If you are concerned about memory usage, the function should be enhanced to read each segment in smaller parts and write them progressively to the WaveWriter.
Notes
BASS.NET has a nag screen, but you can request a free registration serial on their website.
Note: install BASS.NET, then make sure to copy bass.dll from the base package next to your EXE. Also, you can use pretty much any audio format; see their website for format plugins and how to load them (BASS_PluginLoad).
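For uncompressed PCM WAV data that is already in a byte array, as in the question, the split can also be sketched without any external library. The important point is that every piece needs its own header with corrected sizes, not just a copy of the bytes. This sketch assumes the canonical 44-byte header with no extra chunks and a little-endian machine; WavSplitter is a hypothetical helper, and files with additional chunks (LIST, fact and so on) would need real chunk parsing.

```csharp
using System;
using System.Collections.Generic;

public static class WavSplitter
{
    private const int HeaderSize = 44; // assumption: canonical PCM header

    // Splits a canonical PCM WAV byte array into equal pieces on frame boundaries.
    public static List<byte[]> Split(byte[] wav, int pieces)
    {
        if (wav == null) throw new ArgumentNullException("wav");
        if (pieces < 1) throw new ArgumentOutOfRangeException("pieces");
        if (wav.Length < HeaderSize) throw new ArgumentException("Header too short.", "wav");

        int blockAlign = BitConverter.ToUInt16(wav, 32); // bytes per sample frame
        if (blockAlign == 0) throw new ArgumentException("Invalid block align.", "wav");

        int totalFrames = (wav.Length - HeaderSize) / blockAlign;
        int framesPerPiece = totalFrames / pieces;
        var result = new List<byte[]>();

        for (int p = 0; p < pieces; p++)
        {
            int startFrame = p * framesPerPiece;
            // The last piece takes whatever frames remain.
            int frames = (p == pieces - 1) ? totalFrames - startFrame : framesPerPiece;
            int bytes = frames * blockAlign;

            var piece = new byte[HeaderSize + bytes];
            Buffer.BlockCopy(wav, 0, piece, 0, HeaderSize);
            Buffer.BlockCopy(wav, HeaderSize + startFrame * blockAlign, piece, HeaderSize, bytes);

            // Patch the RIFF chunk size (offset 4) and the data chunk size (offset 40).
            Buffer.BlockCopy(BitConverter.GetBytes(36 + bytes), 0, piece, 4, 4);
            Buffer.BlockCopy(BitConverter.GetBytes(bytes), 0, piece, 40, 4);
            result.Add(piece);
        }
        return result;
    }
}
```

Each returned array can be written with File.WriteAllBytes. Splitting on silence, as in the SoX command, would additionally require inspecting the sample values, which this sketch does not attempt.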
I wrote a program that computes the difference between two strings, that is, the Hamming distance.
I ran it in debug mode and saw that the first character of the string first is missing, while the string second is fine.
When I tested it, the lengths of first and second were equal.
For example:
I typed this: 00011
and in debug mode its value is only: 0011
Or I typed "this", and in debug mode the actual value is only "his".
Can somebody explain why the first character of the string is missing?
The code:
while (Console.Read() != 'X')
{
string first = Console.ReadLine();
string second = Console.ReadLine();
int distance = 0;
for (int i = 0; i < first.Length; i++)
{
if (first[i]!= second[i])
{
++distance;
}
}
Console.WriteLine("Hamming distance is {0}.", distance);
}
I tried modifying the iteration, for example changing the loop to ++i or using first[i-1], but that didn't solve my problem.
Console.Read() reads the first character from the input buffer, so that character is not included in what ReadLine() returns.
I would personally find a better way to end your program, such as checking whether first == "quit", or some other syntactic means.
You consume the first char with Console.Read() so it will not appear in first:
string first = Console.ReadLine();
while ((first != null) && !first.StartsWith("X")) // StartsWith is safe on an empty line, first[0] is not
{
string second = Console.ReadLine();
int distance = 0;
for (int i = 0; i < first.Length; i++)
{
if (first[i]!= second[i])
{
++distance;
}
}
Console.WriteLine("Hamming distance is {0}.", distance);
first = Console.ReadLine();
}
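The comparison itself can be pulled out of the input loop, which makes it easier to guard against the remaining failure mode: the loop indexes second[i] up to first.Length, so inputs of different lengths throw. A minimal sketch, with Hamming.Distance as a made-up name:

```csharp
using System;

public static class Hamming
{
    // Number of positions at which two equally long strings differ.
    public static int Distance(string first, string second)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (first.Length != second.Length)
            throw new ArgumentException("Both strings must have the same length.");

        int distance = 0;
        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != second[i])
            {
                ++distance;
            }
        }
        return distance;
    }
}
```

The loop above would then only read the two lines and call Hamming.Distance(first, second).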
I had the same problem in VB.NET and found out that it was caused by Console.ReadKey(). Only one call should be reading from the console at a time. Check whether you have several read calls active at once,
such as ReadKey() in Main() and ReadLine() on a background thread.
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
namespace File_Handling
{
class Program
{
static void Main(string[] args)
{
string path = @"E:\File.txt";
StreamReader r1 = new StreamReader(path);
string m = r1.ReadToEnd();
Console.WriteLine(m);
Console.ReadKey();
r1.Close();
StreamWriter wr = File.AppendText(path);
string na = Convert.ToString(Console.ReadLine());
wr.WriteLine(na);
wr.Close();
Console.WriteLine(na);
Console.ReadKey();
StreamReader rd = new StreamReader(path);
string val = rd.ReadToEnd();
Console.WriteLine(val);
rd.Close();
Console.ReadKey();
}
}
}
The values are comma separated, so I am using a StringBuilder to build up the values and then write them to the appropriate buffer. I noticed considerable time being spent in builder.ToString() and the Parse functions. Do I have to write unsafe code to overcome this problem? And what's the best way to achieve what I want?
private static void ReadSecondBySecondFileToEndBytes(FileInfo file, SafeDictionary<short, SafeDictionary<string, SafeDictionary<int, decimal>>> dayData)
{
string name = file.Name.Split('.')[0];
int result = 0;
int index = result;
int length = 1*1024; //1 kb
char[] buffer = new char[length];
StringBuilder builder = new StringBuilder();
bool pendingTick = true;
bool pendingSymbol = true;
bool pendingValue = false;
string characterString = string.Empty;
short symbol = 0;
int tick = 0;
decimal value;
using (StreamReader streamReader = (new StreamReader(file.FullName)))
{
while ((result = streamReader.Read(buffer, 0, length)) > 0)
{
int i = 0;
while (i < result)
{
if (buffer[i] == '\r' || buffer[i] == '\n')
{
pendingTick = true;
if (pendingValue)
{
value = decimal.Parse(builder.ToString());
pendingSymbol = true;
pendingValue = false;
dayData[symbol][name][tick] = value;
builder.Clear();
}
}
else if (buffer[i] == ',') // new value to capture
{
if (pendingTick)
{
tick = int.Parse(builder.ToString());
pendingTick = false;
}
else if (pendingSymbol)
{
symbol = short.Parse(builder.ToString());
pendingValue = true;
pendingSymbol = false;
}
else if (pendingValue)
{
value = decimal.Parse(builder.ToString());
pendingSymbol = true;
pendingValue = false;
dayData[symbol][name][tick] = value;
}
builder.Clear();
}
else
builder.Append(buffer[i]);
i++;
}
}
}
}
My suggestion would be to not try to parse the majority of the file as you are doing now, but go for something like this:
using (var reader = File.OpenText("<< filename >>"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(',');
// Process the different parts of the line here.
}
}
The main difference here is that you are not parsing line ends and comma separators yourself. The advantage is that when you use high-level methods like ReadLine(), the StreamReader (which File.OpenText() returns) can optimize for reading the file line by line. The same goes for String.Split().
Using these high-level methods will almost always be faster than parsing the buffer yourself.
With the approach above, you don't have to use the StringBuilder anymore and can just get your values like this:
tick = int.Parse(parts[0]);
symbol = short.Parse(parts[1]);
value = decimal.Parse(parts[2]);
dayData[symbol][name][tick] = value;
I have not verified the above snippet; please verify that these lines are correct, or correct them for your business logic.
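A fuller sketch of the same idea follows. It assumes, based on the state machine in the question, that each line has the form tick,symbol,value and may continue with further symbol,value pairs. The names (TickFileParser, Parse) and the plain Dictionary types are stand-ins for the question's SafeDictionary structure, and the per-file name level is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class TickFileParser
{
    // Parses lines of the form "tick,symbol,value[,symbol,value...]"
    // into symbol -> (tick -> value).
    public static Dictionary<short, Dictionary<int, decimal>> Parse(TextReader reader)
    {
        var data = new Dictionary<short, Dictionary<int, decimal>>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines

            string[] parts = line.Split(',');
            int tick = int.Parse(parts[0], CultureInfo.InvariantCulture);

            // Walk the remaining fields two at a time: symbol, then value.
            for (int p = 1; p + 1 < parts.Length; p += 2)
            {
                short symbol = short.Parse(parts[p], CultureInfo.InvariantCulture);
                decimal value = decimal.Parse(parts[p + 1], CultureInfo.InvariantCulture);

                Dictionary<int, decimal> perTick;
                if (!data.TryGetValue(symbol, out perTick))
                {
                    perTick = new Dictionary<int, decimal>();
                    data[symbol] = perTick;
                }
                perTick[tick] = value;
            }
        }
        return data;
    }
}
```

Passing File.OpenText(file.FullName) as the reader gives the file-based behaviour; a StringReader makes the parser easy to try on a few sample lines.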
You got the wrong impression. Yes, while you are testing your program, you'll indeed see most of the time being spent inside Parse() and builder.ToString(), because that is the only code that does any real work.
But it won't be that way in production. There, all the time will be spent in the StreamReader, because the file won't be present in the file system cache as it is when you run your program over and over again on your dev machine. In production, the file has to be read off a disk drive, and that is glacially slow: disk I/O is the true bottleneck of your program. Making the parsing twice as fast will only make your program a few percent faster, if at all.
Don't compromise the reliability or maintainability of your code for such a small gain.