FFmpeg.AutoGen example of how to split audio file - c#

I want to use the FFmpeg.AutoGen project from here: https://github.com/Ruslan-B/FFmpeg.AutoGen
I'm not familiar with the FFmpeg API, so I'd like an example of how to split an audio file into smaller files. For example, the audio file is about 2 hours long (it can be MP3, OGG, WAV, etc.) and I want to split it into several smaller files of x minutes each. Each split should be cut from the main audio file using from and to timestamps: for example, from = 00:25:00 (25 minutes) and to = 00:31:12 (31 minutes, 12 seconds). The output should be that section of the main audio file, which is 00:06:12 (6 minutes, 12 seconds) long.
How can this task be achieved with the project? A C example would also help; I would try to convert it to the framework.
Thanks for your responses.

FFmpeg.AutoGen
What I think you need to do:
Decode audio file
Extract audio samples from the desired start timestamp until desired end timestamp
Encode the extracted samples
Write new audio file
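Step 2 boils down to turning the from/to timestamps into a range of sample frames at the decoder's sample rate. The following is a minimal C sketch of that arithmetic only (no FFmpeg calls). It assumes timestamps in HH:MM:SS form and a known sample rate; `parse_timestamp` and `time_range_to_frames` are hypothetical helper names.

```c
#include <stdio.h>

/* Parse "HH:MM:SS" into whole seconds; returns -1 if the string is malformed. */
long parse_timestamp(const char *ts)
{
    int h, m, s;
    if (sscanf(ts, "%d:%d:%d", &h, &m, &s) != 3 || h < 0 || m < 0 || m > 59 || s < 0 || s > 59)
        return -1;
    return (long)h * 3600 + m * 60 + s;
}

/* Convert a [from, to) time range into a range of sample frames
   (one frame = one sample per channel) at the given sample rate.
   Returns 0 on success, -1 on invalid input. */
int time_range_to_frames(const char *from, const char *to, int sample_rate,
                         long long *first_frame, long long *frame_count)
{
    long start = parse_timestamp(from);
    long end = parse_timestamp(to);
    if (start < 0 || end <= start || sample_rate <= 0)
        return -1;
    *first_frame = (long long)start * sample_rate;
    *frame_count = (long long)(end - start) * sample_rate;
    return 0;
}
```

With the example from the question at 44100 Hz, frames 66150000 up to (but not including) 66150000 + 16405200 would be kept, which is 372 seconds (00:06:12). While decoding, you would compare each frame's position against this range and pass only the matching samples to the encoder.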
Here are some sources in C which might help you to put it all together:
Read any audio file type: Libavcodec tutorial: decode virtually any audio file in C/C++. Note: avcodec_decode_audio4 is already deprecated (see New AVCodec API); you should use avcodec_send_packet / avcodec_receive_frame instead.
Encode / decode audio: FFmpeg examples - especially decode_audio.c / encode_audio.c
I think the effort of using FFmpeg.AutoGen is too high for your use case, so I propose two alternatives: NAudio, or FFmpeg via the command line.
NAudio
This code reads an MP3, extracts a defined segment and writes it to a WAV file. It is based on the blog post Concatenating Segments of an Audio File with NAudio and can be tweaked quite easily.
using NAudio.Wave;
using System;
namespace NAudioSegments
{
class SegmentProvider : IWaveProvider
{
private readonly WaveStream sourceStream;
private int segmentStart, segmentDuration;
public SegmentProvider(WaveStream sourceStream)
{
this.sourceStream = sourceStream;
}
public WaveFormat WaveFormat => sourceStream.WaveFormat;
public void DefineSegment(TimeSpan start, TimeSpan duration)
{
if (start + duration > sourceStream.TotalTime)
throw new ArgumentOutOfRangeException(nameof(duration), "Segment goes beyond end of input");
segmentStart = TimeSpanToOffset(start);
segmentDuration = TimeSpanToOffset(duration);
sourceStream.Position = segmentStart;
}
public int TimeSpanToOffset(TimeSpan ts)
{
var bytes = (int)(WaveFormat.AverageBytesPerSecond * ts.TotalSeconds);
bytes -= (bytes % WaveFormat.BlockAlign);
return bytes;
}
public int Read(byte[] buffer, int offset, int count)
{
int totalBytesRead = 0;
int bytesRead = 0;
do
{
bytesRead = ReadFromSegment(buffer, offset + totalBytesRead, count - totalBytesRead);
totalBytesRead += bytesRead;
} while (totalBytesRead < count && bytesRead != 0);
return totalBytesRead;
}
private int ReadFromSegment(byte[] buffer, int offset, int count)
{
var bytesAvailable = (int)(segmentStart + segmentDuration - sourceStream.Position);
var bytesRequired = Math.Min(bytesAvailable, count);
return sourceStream.Read(buffer, offset, bytesRequired);
}
}
class Program
{
static void Main(string[] args)
{
using (var source = new Mp3FileReader(@"<input-path>"))
{
var segmentProvider = new SegmentProvider(source);
// Add desired splitting e.g. start at 2 seconds, duration 1 second
segmentProvider.DefineSegment(TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(1));
WaveFileWriter.CreateWaveFile(@"<output-path>", segmentProvider);
}
}
}
}
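The core of SegmentProvider is byte arithmetic on interleaved PCM. A time becomes a byte offset that is rounded down to BlockAlign, so it never splits a sample frame. Here is a minimal C sketch of the same arithmetic on an in-memory buffer. `seconds_to_offset` and `copy_segment` are hypothetical names, and the sketch assumes the audio is already decoded to PCM (Mp3FileReader does that for you in the C# version) and that times are non-negative.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of a time position in interleaved PCM, rounded down to a whole
   block (one sample for every channel), like TimeSpanToOffset above. */
size_t seconds_to_offset(double seconds, int avg_bytes_per_sec, int block_align)
{
    size_t bytes = (size_t)(avg_bytes_per_sec * seconds);
    return bytes - bytes % (size_t)block_align;
}

/* Copy the segment [start, start + duration) of src into dst.
   Returns the number of bytes copied, or 0 if the segment runs past the end
   of the input (where the C# code throws) or dst is too small. */
size_t copy_segment(const unsigned char *src, size_t src_len,
                    double start, double duration,
                    int avg_bytes_per_sec, int block_align,
                    unsigned char *dst, size_t dst_cap)
{
    size_t begin = seconds_to_offset(start, avg_bytes_per_sec, block_align);
    size_t len = seconds_to_offset(duration, avg_bytes_per_sec, block_align);
    if (begin + len > src_len || len > dst_cap)
        return 0;
    memcpy(dst, src + begin, len);
    return len;
}
```

For 16-bit stereo at 44.1 kHz, AverageBytesPerSecond is 176400 and BlockAlign is 4, so 2 seconds maps to byte 352800.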
FFmpeg via command line
You could invoke ffmpeg directly from your C# code via the System.Diagnostics.Process class (see e.g. this SO question) instead of using FFmpeg.AutoGen.
You could then use the following command line to split the audio file automatically into segments of equal length, starting at 00:00:00:
ffmpeg -i in.m4a -f segment -segment_time 300 -c copy out%03d.m4a
Alternatively, you can set the start time with the -ss parameter (replace <start-time> with the number of seconds) and the duration with -t. You would need to repeat this for every segment.
ffmpeg -ss <start-time> -i in.m4a -c copy -t 300 out.m4a
Source on superuser
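Since the -ss/-t command has to be repeated for every segment, the loop that produces those command lines is simple. Below is a C sketch that builds one command per segment, using the same placeholder file names as above. `build_segment_commands` is a hypothetical helper; in C# you would pass each line's arguments to Process instead. Also note that with -c copy, cut points may not be sample-accurate.

```c
#include <stdio.h>
#include <string.h>

/* Write one "ffmpeg -ss <start> -i <in> -c copy -t <len> outNNN.<ext>" line per
   segment into buf, splitting total_seconds into pieces of segment_seconds
   (the last piece may be shorter). Returns the number of segments, or -1 if
   the arguments are invalid or buf is too small. */
int build_segment_commands(const char *in, const char *ext, int total_seconds,
                           int segment_seconds, char *buf, size_t cap)
{
    int count = 0;
    size_t used = 0;
    if (segment_seconds <= 0 || cap == 0)
        return -1;
    buf[0] = '\0';
    for (int start = 0; start < total_seconds; start += segment_seconds, count++) {
        int remaining = total_seconds - start;
        int len = remaining < segment_seconds ? remaining : segment_seconds;
        int n = snprintf(buf + used, cap - used,
                         "ffmpeg -ss %d -i %s -c copy -t %d out%03d.%s\n",
                         start, in, len, count, ext);
        if (n < 0 || (size_t)n >= cap - used)
            return -1; /* output would be truncated */
        used += (size_t)n;
    }
    return count;
}
```

For a 2-hour file (7200 seconds) split into 300-second pieces, this produces 24 commands.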

Related

Reading a MemoryStream containing opus audio with NAudio

I'm trying to play Opus audio files from the web, which I try to buffer with a MemoryStream. I'm aware that NAudio can take URLs, but I need to set cookies and a user agent before I access the file, which rules that option out.
My latest approach was to buffer about 30 seconds of the stream, feed it to StreamMediaFoundationReader and write to the same MemoryStream when needed. However, NAudio plays the initially buffered segment and stops once it is finished. What would be the correct approach?
Here's my current code, if needed. (I have no idea how audio streaming works, so please go easy on me.)
bufstr.setReadStream(httpreq.GetResponse().GetResponseStream()); //bufstr is a custom class which creates a memorystream I can write to.
StreamMediaFoundationReader streamread = new StreamMediaFoundationReader(bufstr.getStream());
bufstr.readToStream(); //get the initial 30~ seconds of content
waveOut.Init(streamread);
waveOut.Play();
int seconds = 0;
while(waveOut.PlaybackState == PlaybackState.Playing) {
Thread.Sleep(1000);
seconds++;
if (seconds % 30 > 15) bufstr.readToStream();
}
bufstr's readToStream method
public void readToStream() {
int prevbufcount = totalbuffered; //I keep track of how many bytes I fetched from remote url.
while (stream.CanRead && prevbufcount + (30 * (this.bitrate / 8)) > totalbuffered && totalbuffered != contentlength) { //read around 30 seconds of content;
Console.Write($"Caching {prevbufcount + (30 * (this.bitrate / 8))}/{totalbuffered}\r");
byte[] buf = new byte[1024];
bufferedcount = stream.Read(buf, 0, 1024);
totalbuffered += bufferedcount;
memorystream.Write(buf, 0, bufferedcount);
}
}
While debugging, I found out that the content length I get from the server does not match the actual size of the stream, so I ended up calculating the content length from other details the server sends.
The issue might also be a race condition, given that playback stopped after I started manually keeping track of where I write in the memory stream.

Get waveform data from audio file using FFMPEG

I am writing an application that needs to get the raw waveform data of an audio file so I can render it in an application (C#/.NET). I am using ffmpeg to offload this task but it looks like ffmpeg can only output the waveform data as a png or as a stream to gnuplot.
I have looked at other libraries for this (NAudio/CSCore), but they require Windows / Microsoft Media Foundation, and since this app is going to be deployed to Azure as a web app, I cannot use them.
My strategy was to read the waveform data from the PNG itself, but this seems hacky and over the top. The ideal output would be a fixed-length series of peaks in an array, where each value is the peak value (ranging from 1 to 100 or similar).
Hello,
I was going to write about the manual way to get the waveform (section 2 below), but to show you a working example, I found this code, which does what you want (or at least you can learn something from it).
1) Use FFmpeg to get an array of samples
Try the example code shown here: http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
Experiment with it and try the tweaks suggested in the FFmpeg manual. In that code, just change the path string to point to your own file. Then edit the proc.StartInfo.Arguments line so that its last part looks like this:
proc.StartInfo.Arguments = "-i \"" + path + "\" -vn -ac 1 -filter:a aresample=myNum -map 0:a -c:a pcm_s16le -f data -";
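The myNum in aresample=myNum is the output sample rate. ProcessBuffer below turns every 16-bit sample into one horizontal slice, so for one sample per pixel:
myNum = WaveForm Width / total Seconds
(aresample expects a whole-number rate, so round up and ignore any surplus samples.)
Finally, use the ProcessBuffer function with this logic: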
static void ProcessBuffer(byte[] buffer, int length)
{
float val; //amplitude value of a sample
int index = 0; //position within sample bytes
int slicePos = 0; //horizontal (X-axis) position for pixels of next slice
while (index + 1 < length) // each 16-bit sample needs two bytes
{
val = BitConverter.ToInt16(buffer, index);
index += sizeof(short);
// use the number in val to do something...
// eg: Draw a line on canvas for part of waveform's pixels
// eg: myBitmap.SetPixel(slicePos, y, Color.Green); where y is val scaled to the bitmap height
slicePos++;
}
}
If you want to do it manually without FFmpeg, you could try the following.
2) Decode audio to PCM
You could load the audio file (e.g. MP3) into your app and first decode it to PCM (i.e. raw digital audio), then read just the PCM values to build the waveform. Don't read values directly from the compressed bytes of a format like MP3.
These PCM data values (about audio amplitudes) go into a byte array. If your sound is 16-bit then you extract the PCM value by reading each sample as a short (ie: getting value of two consecutive bytes at once since 16 bits == 2 bytes length).
Basically when you have 16-bit audio PCM inside a byte array, every two bytes represents an audio sample's amplitude value. This value becomes your height (loudness) at each slice. A slice is a 1-pixel vertical line from a time in the waveform.
Sample rate means how many samples there are per second, usually 44100 (44.1 kHz). Using 44,100 pixels to represent one second of sound is not feasible, so divide the total number of samples (sample rate × total seconds) by the required waveform width to get the number of samples per slice. Multiply that by 2 (two bytes per 16-bit sample), and that is how far you jump in the byte array between slices as you build the waveform. Do this in a while loop.
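The jump-and-sample procedure can be sketched in C. This version takes the peak (largest absolute value) over each slice rather than a single sample, since the question asks for peaks, and scales it to 0 to 100. It assumes mono, 16-bit little-endian PCM, such as the output of the -ac 1 -c:a pcm_s16le command above. `pcm16_peaks` is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

/* Reduce mono 16-bit little-endian PCM to `width` peak values in 0..100.
   Slice x covers samples [total*x/width, total*(x+1)/width), i.e. roughly
   total_samples / width samples per slice; the peak is the largest absolute
   amplitude in the slice, scaled so that 32768 maps to 100. */
void pcm16_peaks(const unsigned char *pcm, size_t byte_len, int *peaks, size_t width)
{
    size_t total_samples = byte_len / 2;
    for (size_t x = 0; x < width; x++) {
        size_t first = total_samples * x / width;
        size_t last = total_samples * (x + 1) / width; /* exclusive */
        int peak = 0;
        for (size_t i = first; i < last; i++) {
            /* two consecutive bytes, low byte first, form one signed sample */
            int v = pcm[2 * i] | (pcm[2 * i + 1] << 8);
            if (v >= 32768)
                v -= 65536;
            if (abs(v) > peak)
                peak = abs(v);
        }
        peaks[x] = peak * 100 / 32768;
    }
}
```

Taking the maximum over the whole slice (instead of one sample per jump) means short transients are not skipped when the waveform is narrow.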
You can use the function described in this tutorial to get the raw data decoded from an audio file as an array of double values.
Summarizing from the link:
The function decode_audio_file takes 4 parameters:
path: the path of the file to decode
sample_rate: the desired sample rate for the output data
data: a pointer to a pointer to double precision values, where the extracted data will be stored
size: a pointer to the length of the final extracted values array (number of samples)
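It returns 0 on success and -1 on failure, in which case an error message is written to stderr.
The function code is below. Note that it uses the legacy API (av_register_all(), AVStream::codec and avcodec_decode_audio4()), which was deprecated and then removed in FFmpeg 5.0, so it only builds against older FFmpeg releases; newer code should use avcodec_parameters_to_context() and avcodec_send_packet() / avcodec_receive_frame().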
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // memcpy
#include <libavutil/opt.h>
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libswresample/swresample.h>
int decode_audio_file(const char* path, const int sample_rate, double** data, int* size) {
// initialize all muxers, demuxers and protocols for libavformat
// (does nothing if called twice during the course of one program execution)
av_register_all();
// get format from audio file
AVFormatContext* format = avformat_alloc_context();
if (avformat_open_input(&format, path, NULL, NULL) != 0) {
fprintf(stderr, "Could not open file '%s'\n", path);
return -1;
}
if (avformat_find_stream_info(format, NULL) < 0) {
fprintf(stderr, "Could not retrieve stream info from file '%s'\n", path);
return -1;
}
// Find the index of the first audio stream
int stream_index = -1;
for (int i=0; i<format->nb_streams; i++) {
if (format->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO) {
stream_index = i;
break;
}
}
if (stream_index == -1) {
fprintf(stderr, "Could not retrieve audio stream from file '%s'\n", path);
return -1;
}
AVStream* stream = format->streams[stream_index];
// find & open codec
AVCodecContext* codec = stream->codec;
if (avcodec_open2(codec, avcodec_find_decoder(codec->codec_id), NULL) < 0) {
fprintf(stderr, "Failed to open decoder for stream #%u in file '%s'\n", stream_index, path);
return -1;
}
// prepare resampler
struct SwrContext* swr = swr_alloc();
av_opt_set_int(swr, "in_channel_count", codec->channels, 0);
av_opt_set_int(swr, "out_channel_count", 1, 0);
av_opt_set_int(swr, "in_channel_layout", codec->channel_layout, 0);
av_opt_set_int(swr, "out_channel_layout", AV_CH_LAYOUT_MONO, 0);
av_opt_set_int(swr, "in_sample_rate", codec->sample_rate, 0);
av_opt_set_int(swr, "out_sample_rate", sample_rate, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", codec->sample_fmt, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_DBL, 0);
swr_init(swr);
if (!swr_is_initialized(swr)) {
fprintf(stderr, "Resampler has not been properly initialized\n");
return -1;
}
// prepare to read data
AVPacket packet;
av_init_packet(&packet);
AVFrame* frame = av_frame_alloc();
if (!frame) {
fprintf(stderr, "Error allocating the frame\n");
return -1;
}
// iterate through frames
*data = NULL;
*size = 0;
while (av_read_frame(format, &packet) >= 0) {
// skip packets that belong to other streams (video, cover art, ...)
if (packet.stream_index != stream_index) {
av_packet_unref(&packet);
continue;
}
// decode one frame
int gotFrame;
int ret = avcodec_decode_audio4(codec, frame, &gotFrame, &packet);
av_packet_unref(&packet); // release the packet's data before the next read
if (ret < 0) {
break;
}
if (!gotFrame) {
continue;
}
// resample frames
double* buffer;
av_samples_alloc((uint8_t**) &buffer, NULL, 1, frame->nb_samples, AV_SAMPLE_FMT_DBL, 0);
int frame_count = swr_convert(swr, (uint8_t**) &buffer, frame->nb_samples, (const uint8_t**) frame->data, frame->nb_samples);
// append resampled frames to data
*data = (double*) realloc(*data, (*size + frame->nb_samples) * sizeof(double));
memcpy(*data + *size, buffer, frame_count * sizeof(double));
*size += frame_count;
// free the temporary resample buffer
av_freep(&buffer);
}
// clean up
av_frame_free(&frame);
swr_free(&swr);
avcodec_close(codec);
avformat_close_input(&format); // closes the input opened by avformat_open_input and frees the context
// success
return 0;
}
To compile a program that uses this function, you will need the following linker flags: -lavcodec-ffmpeg -lavformat-ffmpeg -lavutil -lswresample
Depending on your system and installation, they could also be: -lavcodec -lavformat -lavutil -lswresample
Example usage:
int main(int argc, char const *argv[]) {
// check parameters
if (argc < 2) {
fprintf(stderr, "Please provide the path to an audio file as first command-line argument.\n");
return -1;
}
// decode data
int sample_rate = 44100;
double* data;
int size;
if (decode_audio_file(argv[1], sample_rate, &data, &size) != 0) {
return -1;
}
// sum data
double sum = 0.0;
for (int i=0; i<size; ++i) {
sum += data[i];
}
// display result and exit cleanly
printf("sum is %f", sum);
free(data);
return 0;
}
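To see what the SwrContext configuration above does to the samples (signed 16-bit input, mono double output), here is a stdlib-only C sketch of the format and channel conversion alone. It averages the channels and performs no resampling. The real libswresample mixing coefficients may differ, and `s16_to_mono_double` is a hypothetical name.

```c
#include <stddef.h>

/* Convert interleaved signed 16-bit PCM to mono doubles in [-1, 1) by
   scaling each sample by 1/32768 and averaging the channels of each frame.
   Returns the number of frames written to out. */
size_t s16_to_mono_double(const short *in, size_t frames, int channels, double *out)
{
    for (size_t f = 0; f < frames; f++) {
        double sum = 0.0;
        for (int c = 0; c < channels; c++)
            sum += in[f * channels + c] / 32768.0;
        out[f] = sum / channels;
    }
    return frames;
}
```

Summing such values, as the main() above does, gives a number close to zero for typical audio, because the samples oscillate around zero.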

C# parsing of Freebase RDF dump yields only 11.5 million N-Triples instead of 1.9 billion

I'm working on building a C# program to read the RDF data in the Google Freebase data dump. To start out, I've written a simple loop to read the file and count the triples. However, instead of the 1.9 billion count stated on the documentation page, my program counts only about 11.5 million and then exits. The relevant portion of the source code is given below (it takes about 30 seconds to run).
What am I missing here?
// Simple reading through the gz file
try
{
using (FileStream fileToDecompress = File.Open(@"C:\Users\Krishna\Downloads\freebase-rdf-2014-02-16-00-00.gz", FileMode.Open))
{
int tupleCount = 0;
string readLine = "";
using (GZipStream decompressionStream = new GZipStream(fileToDecompress, CompressionMode.Decompress))
{
StreamReader sr = new StreamReader(decompressionStream, detectEncodingFromByteOrderMarks: true);
while (true)
{
readLine = sr.ReadLine();
if (readLine != null)
{
tupleCount++;
if (tupleCount % 1000000 == 0)
{ Console.WriteLine(DateTime.Now.ToShortTimeString() + ": " + tupleCount.ToString()); }
}
else
{ break; }
}
Console.WriteLine("Tuples: " + tupleCount.ToString());
}
}
}
catch (Exception ex)
{ Console.WriteLine(ex.Message); }
(I tried using GZippedNTriplesParser in dotNetRdf to read the data by building on this recommendation, but that seems to be choking on an RdfParseException right at the beginning (Tab delimiters? UTF-8??). So, for the moment, trying to roll my own).
The Freebase RDF dumps are built by a map/reduce job that outputs 200 individual Gzip files. Those 200 files are then concatenated into one final Gzip file. According to the Gzip spec, concatenating the raw bytes from multiple Gzip files will produce a valid Gzip file. A library that adheres to the spec should produce a single file with concatenated content of each input file when uncompressing that file.
Based on the number of triples that you're seeing, I'm guessing that your code is only uncompressing the first chunk of the file and ignoring the other 199. I'm not much of a C# programmer but from reading another Stackoverflow answer it seems like switching to DotNetZip will solve this problem.
I use DotNetZip and created a decorator class, GzipDecorator, as a workaround for the "gzipped chunks".
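using System;
using System.IO;
using Ionic.Zlib; // DotNetZip: this GZipStream exposes TotalIn/TotalOut
// Forward-only, read-only stream that keeps decompressing across concatenated gzip members.
sealed class GzipDecorator : Stream
{
private readonly Stream _readStream;
private GZipStream _gzip;
private long _totalIn;
private long _totalOut;
public GzipDecorator(Stream readStream)
{
if (readStream == null) throw new ArgumentNullException(nameof(readStream));
_readStream = readStream;
_gzip = new GZipStream(_readStream, CompressionMode.Decompress, true);
}
public override int Read(byte[] buffer, int offset, int count)
{
var bytesRead = _gzip.Read(buffer, offset, count);
if (bytesRead <= 0 && _readStream.Position < _readStream.Length)
{
// Current member is exhausted: skip past it and start a new decompressor.
// The 18 presumably covers a minimal 10-byte gzip header plus the 8-byte trailer.
_totalIn += _gzip.TotalIn + 18;
_totalOut += _gzip.TotalOut;
_gzip.Dispose();
_readStream.Position = _totalIn;
_gzip = new GZipStream(_readStream, CompressionMode.Decompress, true);
bytesRead = _gzip.Read(buffer, offset, count);
}
return bytesRead;
}
// Remaining abstract Stream members: this decorator only supports reading.
public override bool CanRead => true;
public override bool CanSeek => false;
public override bool CanWrite => false;
public override long Length => throw new NotSupportedException();
public override long Position
{
get => throw new NotSupportedException();
set => throw new NotSupportedException();
}
public override void Flush() { }
public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
public override void SetLength(long value) => throw new NotSupportedException();
public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
protected override void Dispose(bool disposing)
{
if (disposing) _gzip.Dispose(); // the underlying stream stays open (leaveOpen: true)
base.Dispose(disposing);
}
}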
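The fixed `+ 18` in GzipDecorator.Read presumably skips a minimal gzip member header (10 bytes) and trailer (8 bytes: CRC32 and ISIZE), as laid out in RFC 1952. If a member carries optional fields such as the original file name, its header is longer and a fixed offset would land in the wrong place. The C sketch below computes the real header size from the flag byte. `gzip_header_size` is a hypothetical helper, and it only parses the header; it does not decompress anything.

```c
#include <stddef.h>

/* Size in bytes of the gzip member header at buf (RFC 1952), or 0 if buf does
   not start with a complete deflate-compressed gzip header. */
size_t gzip_header_size(const unsigned char *buf, size_t len)
{
    enum { FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10 };
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return 0;
    unsigned char flg = buf[3];
    size_t pos = 10; /* ID1 ID2 CM FLG MTIME(4) XFL OS */
    if (flg & FEXTRA) {
        if (pos + 2 > len)
            return 0;
        size_t xlen = buf[pos] | (size_t)buf[pos + 1] << 8; /* little-endian */
        pos += 2 + xlen;
    }
    if (flg & FNAME) {    /* zero-terminated original file name */
        while (pos < len && buf[pos] != 0) pos++;
        pos++;
    }
    if (flg & FCOMMENT) { /* zero-terminated comment */
        while (pos < len && buf[pos] != 0) pos++;
        pos++;
    }
    if (flg & FHCRC)      /* CRC16 of the header */
        pos += 2;
    return pos <= len ? pos : 0;
}
```

A member written with a stored file name such as "a.txt" has a 16-byte header, so the next member would start 6 bytes later than a fixed 18-byte skip assumes.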
I managed to solve the problem by repacking the dump with the 7-Zip archiver. Maybe that helps you.

Play dynamically-created simple sounds in C# without external libraries

I need to be able to generate dynamically a waveform and play it, using C#, without any external libraries, and without having to store sound files on the hard disk. Latency isn't an issue; the sounds would be generated well before they are needed by the application.
Actually the Console.Beep() method might meet my needs if it weren't for the fact that Microsoft says it isn't supported in 64-bit versions of Windows.
If I generate my own sound dynamically, I can get something fancier than a simple beep. For example, I could make a waveform from a triangle wave that increases in frequency from 2 kHz to 4 kHz while decaying in volume. I don't need fancy 16-bit stereo; 8-bit mono is fine. I don't need dynamic control over volume and pitch; I basically just want to generate a sound file in memory and play it without storing it.
Last time I needed to generate sounds was many years ago, on the Apple II, on HP workstations, and on my old Amiga computer. I haven't needed to do it since then, and it seems that something as simple as what I describe has gotten a lot more complicated. I am having trouble believing that something so simple seems so hard. Most of the answers I see refer to NAudio or similar libraries, and that isn't an option for this project (aside from the fact that pulling in an entire library just to play a tone seems like a waste).
Based on one of the links in the answers I received, and some other pages I found about .wav header formats, here is my working code for a little class that generates an 8-bit "ding!" sound with a user-specified frequency and duration. It's basically a beep that decays linearly to zero in amplitude during the specified duration.
using System;
using System.IO;
using System.Media;
public class AlertDing : IDisposable {
private SoundPlayer player = null;
private BinaryWriter writer = null;
/// <summary>
/// Dynamically generate a "ding" sound and save it to a memory stream
/// </summary>
/// <param name="freq">Frequency in Hertz, e.g. 880</param>
/// <param name="tenthseconds">Duration in multiple of 1/10 second</param>
public AlertDing(double freq, uint tenthseconds) {
string header_GroupID = "RIFF"; // RIFF
uint header_FileLength = 0; // total file length minus 8, which is taken up by RIFF
string header_RiffType = "WAVE"; // always WAVE
string fmt_ChunkID = "fmt "; // Four bytes: "fmt "
uint fmt_ChunkSize = 16; // Length of header in bytes
ushort fmt_FormatTag = 1; // 1 for PCM
ushort fmt_Channels = 1; // Number of channels, 2=stereo
uint fmt_SamplesPerSec = 14000; // sample rate, e.g. CD=44100
ushort fmt_BitsPerSample = 8; // bits per sample
ushort fmt_BlockAlign =
(ushort)(fmt_Channels * (fmt_BitsPerSample / 8)); // sample frame size, in bytes
uint fmt_AvgBytesPerSec =
fmt_SamplesPerSec * fmt_BlockAlign; // for estimating RAM allocation
string data_ChunkID = "data"; // "data"
uint data_ChunkSize; // Length of sample data in bytes
byte [] data_ByteArray;
// Fill the data array with sample data
// Number of samples = sample rate * channels * duration in seconds (one byte per sample for 8-bit audio)
uint numSamples = fmt_SamplesPerSec * fmt_Channels * tenthseconds / 10;
data_ByteArray = new byte[numSamples];
//int amplitude = 32760, offset=0; // for 16-bit audio
int amplitude = 127, offset = 128; // for 8-audio
double period = (2.0*Math.PI*freq) / (fmt_SamplesPerSec * fmt_Channels);
double amp;
for (uint i = 0; i < numSamples - 1; i += fmt_Channels) {
amp = amplitude * (double)(numSamples - i) / numSamples; // amplitude decay
// Fill with a waveform on each channel with amplitude decay
for (int channel = 0; channel < fmt_Channels; channel++) {
data_ByteArray[i+channel] = Convert.ToByte(amp * Math.Sin(i*period) + offset);
}
}
// Calculate file and data chunk size in bytes
data_ChunkSize = (uint)(data_ByteArray.Length * (fmt_BitsPerSample / 8));
header_FileLength = 4 + (8 + fmt_ChunkSize) + (8 + data_ChunkSize);
// write data to a MemoryStream with BinaryWriter
MemoryStream audioStream = new MemoryStream();
writer = new BinaryWriter(audioStream); // assign the field so that Dispose() can close it
// Write the header
writer.Write(header_GroupID.ToCharArray());
writer.Write(header_FileLength);
writer.Write(header_RiffType.ToCharArray());
// Write the format chunk
writer.Write(fmt_ChunkID.ToCharArray());
writer.Write(fmt_ChunkSize);
writer.Write(fmt_FormatTag);
writer.Write(fmt_Channels);
writer.Write(fmt_SamplesPerSec);
writer.Write(fmt_AvgBytesPerSec);
writer.Write(fmt_BlockAlign);
writer.Write(fmt_BitsPerSample);
// Write the data chunk
writer.Write(data_ChunkID.ToCharArray());
writer.Write(data_ChunkSize);
foreach (byte dataPoint in data_ByteArray) {
writer.Write(dataPoint);
}
player = new SoundPlayer(audioStream);
}
/// <summary>
/// Call this to clean up when program is done using this sound
/// </summary>
public void Dispose() {
if (writer != null) writer.Close();
if (player != null) player.Dispose();
writer = null;
player = null;
}
/// <summary>
/// Play "ding" sound
/// </summary>
public void Play() {
if (player != null) {
player.Stream.Seek(0, SeekOrigin.Begin); // rewind stream
player.Play();
}
}
}
Hopefully this should help others who are trying to produce a simple alert sound dynamically without needing a sound file.
The following article explains how a *.wav file can be generated and played using SoundPlayer. Be aware that SoundPlayer can take a stream as an argument, so you can generate the WAV file contents in a MemoryStream and avoid saving to a file.
http://blogs.msdn.com/b/dawate/archive/2009/06/24/intro-to-audio-programming-part-3-synthesizing-simple-wave-audio-using-c.aspx
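The part that is easiest to get wrong is the 44-byte header. Below is a C sketch of the same layout AlertDing writes with BinaryWriter: "RIFF", the file length minus 8, "WAVE", a 16-byte "fmt " chunk, and the "data" chunk header, all little-endian. `wav_header`, `put_u16` and `put_u32` are hypothetical names. The unsigned 8-bit samples would follow the header directly.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Store little-endian integers into a byte buffer. */
static void put_u16(unsigned char *p, uint16_t v) { p[0] = v & 0xff; p[1] = v >> 8; }
static void put_u32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (v >> (8 * i)) & 0xff;
}

/* Fill the canonical 44-byte header of a PCM WAV file. */
void wav_header(unsigned char hdr[44], uint32_t sample_rate, uint16_t channels,
                uint16_t bits_per_sample, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    memcpy(hdr, "RIFF", 4);
    put_u32(hdr + 4, 4 + (8 + 16) + (8 + data_bytes)); /* file length minus 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_u32(hdr + 16, 16);                        /* fmt chunk size */
    put_u16(hdr + 20, 1);                         /* format tag: 1 = PCM */
    put_u16(hdr + 22, channels);
    put_u32(hdr + 24, sample_rate);
    put_u32(hdr + 28, sample_rate * block_align); /* average bytes per second */
    put_u16(hdr + 32, block_align);
    put_u16(hdr + 34, bits_per_sample);
    memcpy(hdr + 36, "data", 4);
    put_u32(hdr + 40, data_bytes);
}
```

For AlertDing's settings (14000 Hz, mono, 8-bit) and a one-second sound, data_bytes is 14000 and the RIFF length field is 14036.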
I tried out the code snippet from Anachronist (2012-10), and it works for me.
The biggest hurdle for me was getting rid of the systematic clicking noise at the end of the "AlertDing" WAV.
It is caused by a subtle bug in the snippet:
for (uint i = 0; i < numSamples - 1; i += fmt_Channels)
needs to change to
for (uint i = 0; i < numSamples; i += fmt_Channels)
If it is not changed, the last byte of the sample array is never written and stays 0 at the end of each playback. In unsigned 8-bit PCM, 0 is the minimum value (silence is 128), so the amplitude jumps from the middle to the minimum and back, causing a sharp click.
The original question implies "without clicking noise", of course :)

How to insert characters to a file using C#

I have a huge file where I have to insert certain characters at a specific location. What is the easiest way to do that in C# without rewriting the whole file?
Filesystems do not support "inserting" data in the middle of a file. If you really have a need for a file that can be written to in a sorted kind of way, I suggest you look into using an embedded database.
You might want to take a look at SQLite or BerkeleyDB.
Then again, you might be working with a text file or a legacy binary file. In that case your only option is to rewrite the file, at least from the insertion point up to the end.
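Rewriting from the insertion point to the end can be done in place. You grow the file by moving the tail forward, starting with the last chunk so that no byte is overwritten before it has been copied. Here is a minimal C sketch using only stdio. `insert_bytes` is a hypothetical helper, the 4096-byte chunk size is arbitrary, and it assumes a binary file opened for update ("r+b") whose size fits in a long.

```c
#include <stdio.h>
#include <string.h>

/* Insert len bytes at offset pos, rewriting only the tail [pos, end).
   Returns 0 on success, -1 on error. */
int insert_bytes(FILE *f, long pos, const void *data, size_t len)
{
    unsigned char buf[4096];
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long end = ftell(f);
    if (end < 0 || pos < 0 || pos > end)
        return -1;
    while (end > pos) { /* move the tail backwards, last chunk first */
        size_t remaining = (size_t)(end - pos);
        size_t chunk = remaining < sizeof buf ? remaining : sizeof buf;
        long start = end - (long)chunk;
        if (fseek(f, start, SEEK_SET) != 0 || fread(buf, 1, chunk, f) != chunk)
            return -1;
        if (fseek(f, start + (long)len, SEEK_SET) != 0 || fwrite(buf, 1, chunk, f) != chunk)
            return -1;
        end = start;
    }
    if (fseek(f, pos, SEEK_SET) != 0 || fwrite(data, 1, len, f) != len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

The cost grows with the size of the tail, not the whole file, so inserting near the end is cheap and inserting near the start rewrites almost everything.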
I would look at the FileStream class to do random I/O in C#.
You will probably need to rewrite the file from the point you insert the changes to the end. You might be best always writing to the end of the file and use tools such as sort and grep to get the data out in the desired order. I am assuming you are talking about a text file here, not a binary file.
There is no way to insert characters into a file without rewriting the data after them. In C# it can be done with any of the Stream classes. If the files are huge, I would recommend using GNU Core Utils from your C# code; they are the fastest. I used to handle very large text files (4 GB, 8 GB or more) with the core utils. Commands like head, tail, split, csplit, cat, shuf, shred and uniq really help a lot in text manipulation.
For example, if you need to put some characters in a 2 GB file, you can use split -b BYTECOUNT so that the first output file ends at the insertion point, append the new text to it, and then append the rest of the content. This is supposedly faster than any other way.
Hope it works. Give it a try.
You can use random access to write to specific locations in a file, but you won't be able to do it in text mode; you'll have to work with bytes directly. Note that this overwrites the existing bytes at that position rather than inserting new ones.
If you know the specific location to which you want to write the new data, use the BinaryWriter class:
using (BinaryWriter bw = new BinaryWriter (File.Open (strFile, FileMode.Open)))
{
string strNewData = "this is some new data";
// convert the string to bytes (UTF-8; identical to the original characters for ASCII text)
byte[] byteNewData = System.Text.Encoding.UTF8.GetBytes(strNewData);
// write new data to file
bw.Seek (15, SeekOrigin.Begin); // seek to position 15
bw.Write (byteNewData, 0, byteNewData.Length);
}
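To make the overwrite-versus-insert distinction concrete, here is a small C equivalent of the Seek + Write above using stdio. `overwrite_at` is a hypothetical name. Writing at an offset replaces the bytes already there: nothing after them moves, and the file only grows if the write runs past the old end.

```c
#include <stdio.h>

/* Write len bytes at offset pos, replacing whatever is stored there.
   Returns 0 on success, -1 on error. */
int overwrite_at(FILE *f, long pos, const void *data, size_t len)
{
    if (fseek(f, pos, SEEK_SET) != 0 || fwrite(data, 1, len, f) != len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

Writing "!!" at offset 5 of "HelloWorld" yields "Hello!!rld", still 10 bytes long.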
You may take a look at this project:
Win Data Inspector
Basically, the code is the following:
// this.Stream is the stream in which you insert data
{
long position = this.Stream.Position;
long length = this.Stream.Length;
MemoryStream ms = new MemoryStream();
this.Stream.Position = 0;
DIUtils.CopyStream(this.Stream, ms, position, progressCallback);
ms.Write(data, 0, data.Length);
this.Stream.Position = position;
DIUtils.CopyStream(this.Stream, ms, this.Stream.Length - position, progressCallback);
this.Stream = ms;
}
#region Delegates
public delegate void ProgressCallback(long position, long total);
#endregion
DIUtils.cs
public static void CopyStream(Stream input, Stream output, long length, DataInspector.ProgressCallback callback)
{
long totalsize = input.Length;
long byteswritten = 0;
const int size = 32768;
byte[] buffer = new byte[size];
int read;
int readlen = length < size ? (int)length : size;
while (length > 0 && (read = input.Read(buffer, 0, readlen)) > 0)
{
output.Write(buffer, 0, read);
byteswritten += read;
length -= read;
readlen = length < size ? (int)length : size;
if (callback != null)
callback(byteswritten, totalsize);
}
}
Depending on the scope of your project, you may want to store each line of text from your file in a table-like data structure, much like a database table. That way you can insert at a specific location at any time without having to read in, modify and write out the entire text file each time, which matters since your data is "huge", as you put it. You would still recreate the file, but at least this is a scalable solution.
It may be "possible" depending on how the filesystem stores files to quickly insert (ie, add additional) bytes in the middle. If it is remotely possible it may only be feasible to do so a full block at a time, and only by either doing low level modification of the filesystem itself or by using a filesystem specific interface.
Filesystems are not generally designed for this operation. If you need to quickly do inserts you really need a more general database.
Depending on your application a middle ground would be to bunch your inserts together, so you only do one rewrite of the file rather than twenty.
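Batching inserts means sorting them by offset and copying the original content only once, writing each new piece when its offset is reached. Below is a C sketch on in-memory buffers for brevity; a file version would stream from the original file into a new one in the same order. `struct insertion` and `apply_insertions` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One pending insertion: text goes in front of the byte at offset
   in the original content. */
struct insertion {
    size_t offset;
    const char *text;
};

/* Apply insertions sorted by ascending offset in a single pass over src.
   Every original byte is copied exactly once. Returns the output length,
   or 0 if dst is too small or the insertions are unsorted/out of range. */
size_t apply_insertions(const char *src, size_t src_len,
                        const struct insertion *ins, size_t n_ins,
                        char *dst, size_t dst_cap)
{
    size_t in_pos = 0, out_len = 0;
    for (size_t k = 0; k < n_ins; k++) {
        size_t text_len = strlen(ins[k].text);
        if (ins[k].offset < in_pos || ins[k].offset > src_len)
            return 0;
        size_t copy = ins[k].offset - in_pos;
        if (out_len + copy + text_len > dst_cap)
            return 0;
        memcpy(dst + out_len, src + in_pos, copy);    /* original bytes up to the insertion point */
        out_len += copy;
        memcpy(dst + out_len, ins[k].text, text_len); /* the inserted text */
        out_len += text_len;
        in_pos = ins[k].offset;
    }
    if (out_len + (src_len - in_pos) > dst_cap)
        return 0;
    memcpy(dst + out_len, src + in_pos, src_len - in_pos); /* remaining tail */
    return out_len + (src_len - in_pos);
}
```

With twenty inserts, the original content is still copied once instead of twenty times.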
You will always have to rewrite the remaining bytes from the insertion point. If this point is at 0, then you will rewrite the whole file. If it is 10 bytes before the last byte, then you will rewrite the last 10 bytes.
In any case there is no function to directly support "insert to file". But the following code can do it accurately.
var sw = new Stopwatch();
var ab = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ";
// create
var fs = new FileStream(@"d:\test.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
fs.Seek(0, SeekOrigin.Begin);
for (var i = 0; i < 40000000; i++) fs.Write(ASCIIEncoding.ASCII.GetBytes(ab), 0, ab.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();
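// insert: shift everything from 'target' to the end forward by data.Length bytes,
// moving chunks from the end backwards so no byte is overwritten before it is copied
fs = new FileStream(@"d:\test.txt", FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite, 262144, FileOptions.None);
sw.Restart();
byte[] data = ASCIIEncoding.ASCII.GetBytes(ab); // bytes to insert
byte[] b = new byte[262144];
long target = 10; // insertion point
long end = fs.Length; // end of the region still to be moved
while (end > target)
{
int chunk = (int)Math.Min(b.Length, end - target);
long start = end - chunk;
fs.Position = start;
int read = 0;
while (read < chunk)
{
int n = fs.Read(b, read, chunk - read);
if (n == 0) throw new EndOfStreamException();
read += n;
}
fs.Position = start + data.Length; fs.Write(b, 0, chunk);
end = start;
}
fs.Position = target; fs.Write(data, 0, data.Length);
sw.Stop();
Console.WriteLine("{0} ms", sw.Elapsed.TotalMilliseconds);
fs.Dispose();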
To get better file I/O performance, experiment with power-of-two buffer sizes like the ones in the code above. The file creation uses a 262144-byte (256 KB) buffer, which does not help at all. The same buffer size for the insertion does the "performance job", as you can see from the Stopwatch results if you run the code. A rough test on my PC gave the following results:
13628.8 ms for creation and 3597.0971 ms for insertion.
Note that the target byte for insertion is 10, meaning that almost the whole file was rewritten.
Why not put a pointer at the insertion point that refers to the end of the file (a four-byte value holding the current file size), then write the length of the inserted data at the end of the file, followed by the data itself? For example, if you have a string in the middle of the file and want to insert a few characters into it, you can overwrite four characters of the string with a pointer to the end of the file, and then write those four characters at the end together with the characters you originally wanted to insert. It's all about how you organize the data. Of course, you can do this only if you define the whole file format yourself, i.e. you are not using other codecs.
