C# Combine Archive Divided Into One File

Code:
public void mergeFiles(string dir)
{
    for (int i = 0; i < parts; i++)
    {
        if (!File.Exists(dir))
        {
            File.Create(dir).Close();
        }
        var output = File.Open(dir, FileMode.Open);
        var input = File.Open(dir + ".part" + (i + 1), FileMode.Open);
        input.CopyTo(output);
        output.Close();
        input.Close();
        File.Delete(dir + ".part" + (i + 1));
    }
}
The dir variable is, for example, /path/file.txt.gz.
I have a file packed into a .gz archive. The archive is divided into, say, 8 parts, and I want to recover the original file.
The problem is that I don't know how to combine these files ("file.gz.part1", ...) so that I can extract them later.
When I use the above function, the archive is corrupted.
I have been struggling with this for a week and searching the Internet, but this is the best solution I have found, and it does not work.
Does anyone have any advice on how to combine the archive parts into one file?

Your code has a few problems. If you look at the documentation for System.IO.Stream.Close you will see the following remark (emphasis mine):
Closes the current stream and releases any resources (such as sockets and file handles) associated with the current stream. Instead of calling this method, ensure that the stream is properly disposed.
So, per the docs, you want to dispose your streams rather than calling Close directly (I'll come back to that in a second). Ignoring that, your main problem lies here:
var output = File.Open(dir, FileMode.Open);
You're using FileMode.Open for your output file. Again from the docs:
Specifies that the operating system should open an existing file. The ability to open the file is dependent on the value specified by the FileAccess enumeration. A FileNotFoundException exception is thrown if the file does not exist.
That's opening a stream at the beginning of the file. So, you're writing each partial file over the beginning of your output file repeatedly. I'm sure you noticed that your combined file size was only as large as the largest partial file. Take a look at FileMode.Append on the other hand:
Opens the file if it exists and seeks to the end of the file, or creates a new file. This requires Append permission. FileMode.Append can be used only in conjunction with FileAccess.Write. Trying to seek to a position before the end of the file throws an IOException exception, and any attempt to read fails and throws a NotSupportedException exception.
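To make the difference concrete, here is a minimal sketch (the output.bin name is hypothetical) showing how FileMode.Append accumulates writes across repeated opens, where FileMode.Open would overwrite from position zero each time:

using System.IO;
using System.Text;

class AppendDemo
{
    static void Main()
    {
        // Hypothetical file name, purely for illustration.
        const string path = "output.bin";
        File.Delete(path);

        for (int i = 0; i < 3; i++)
        {
            // FileMode.Append creates the file if needed and seeks to the end,
            // so each iteration's bytes land after the previous ones.
            using (var output = File.Open(path, FileMode.Append, FileAccess.Write))
            {
                byte[] chunk = Encoding.UTF8.GetBytes("part" + i);
                output.Write(chunk, 0, chunk.Length);
            }
        }

        // Prints "part0part1part2"; with FileMode.Open it would print "part2",
        // because each open would rewrite the same bytes at the start.
        System.Console.WriteLine(File.ReadAllText(path));
    }
}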
OK - but backing up even a step further, this:
if (!File.Exists(dir))
{
    File.Create(dir).Close();
}
var output = File.Open(dir, FileMode.Open);
... is inefficient. Why check for the file's existence n times, and open and close it n times? We can just create the file as the first step, and leave that output stream open until we have appended all of our data to it.
So, how would we refactor your code to use IDisposable while fixing your bug? Check out the using statement. Putting all of this together, your code might look like this:
public void mergeFiles(string dir)
{
    using (FileStream combinedFile = File.Create(dir))
    {
        for (int i = 0; i < parts; i++)
        {
            // Since this string is referenced more than once, capture it as a
            // variable to lower the risk of copy/paste errors.
            var splitFileName = dir + ".part" + (i + 1);
            using (FileStream filePart = File.Open(splitFileName, FileMode.Open))
            {
                filePart.CopyTo(combinedFile);
            }
            // Note that it's safe to delete the file now, because our filePart
            // stream has been disposed as it is out of scope.
            File.Delete(splitFileName);
        }
    }
}
Give that a try. And here's an entire working program, with a contrived example, that you can paste into a new console app and run:
using System.IO;
using System.Text;

namespace temp_test
{
    class Program
    {
        static int parts = 10;

        static void Main(string[] args)
        {
            // First we will generate some dummy files.
            generateFiles();
            // Next, open files and combine.
            combineFiles();
        }

        /// <summary>
        /// A contrived example to generate some files.
        /// </summary>
        static void generateFiles()
        {
            for (int i = 0; i < parts; i++)
            {
                using (FileStream newFile = File.Create("splitfile.part" + i))
                {
                    byte[] info = new UTF8Encoding(true).GetBytes($"This is File #{i}");
                    newFile.Write(info, 0, info.Length);
                }
            }
        }

        /// <summary>
        /// A contrived example to combine our files.
        /// </summary>
        static void combineFiles()
        {
            using (FileStream combinedFile = File.Create("combined"))
            {
                for (int i = 0; i < parts; i++)
                {
                    var splitFileName = "splitfile.part" + i;
                    using (FileStream filePart = File.Open(splitFileName, FileMode.Open))
                    {
                        filePart.CopyTo(combinedFile);
                    }
                    // Note that it's safe to delete the file now, because our filePart
                    // stream has been disposed as it is out of scope.
                    File.Delete(splitFileName);
                }
            }
        }
    }
}
Good luck and welcome to StackOverflow!

Related

Change the size of a file without opening the file

With std::filesystem::resize_file in C++, it is possible to change the size of a file without opening the file.
Is there any similar function in C#, which allows changing the size of a file without opening it?
I think opening a file as a FileStream and saving it again with a new size will be slower.
Using FileStream.SetLength() will be about as fast as you can make it.
It ends up calling the Windows API to set the length of the file, the same as std::filesystem::resize_file() does.
So you just need to do something like this, and it will be fast enough:
using (var file = File.Open(myFilePath, FileMode.Open))
{
    file.SetLength(myRequiredFileSize);
}
The implementation of FileStream.SetLength() is:
private void SetLengthCore(long value)
{
    Contract.Assert(value >= 0, "value >= 0");
    long origPos = _pos;

    if (_exposedHandle)
        VerifyOSHandlePosition();
    if (_pos != value)
        SeekCore(value, SeekOrigin.Begin);
    if (!Win32Native.SetEndOfFile(_handle)) {
        int hr = Marshal.GetLastWin32Error();
        if (hr == __Error.ERROR_INVALID_PARAMETER)
            throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_FileLengthTooBig"));
        __Error.WinIOError(hr, String.Empty);
    }
    // Return file pointer to where it was before setting length
    if (origPos != value) {
        if (origPos < value)
            SeekCore(origPos, SeekOrigin.Begin);
        else
            SeekCore(0, SeekOrigin.End);
    }
}
(Note that SeekCore() just calls the Windows API SetFilePointer() function.)
Doing this does NOT read the file into memory.
Also, the Windows API function SetEndOfFile() does not write to the extended region, so it is fast. The documentation states: "If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined." This is a result of data not being written to the extended region.
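As an aside, although the contents of the extended region are formally undefined, on NTFS they read back as zeros (NTFS zero-fills reads beyond the file's valid data length). A quick sketch, using a hypothetical test.bin path, can confirm this:

using System;
using System.IO;

class ExtendDemo
{
    static void Main()
    {
        // Hypothetical path, purely for illustration.
        const string path = "test.bin";
        File.WriteAllBytes(path, new byte[0]);

        using (var file = File.Open(path, FileMode.Open))
        {
            file.SetLength(4096); // extend without writing anything
        }

        // On NTFS the bytes in the extended region read back as zeros,
        // even though SetEndOfFile never wrote them to disk.
        byte[] contents = File.ReadAllBytes(path);
        Console.WriteLine(Array.TrueForAll(contents, b => b == 0)); // True
    }
}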
As a test, I tried the following code:
using System;
using System.Diagnostics;
using System.IO;

namespace Demo
{
    public class Program
    {
        public static void Main()
        {
            string filename = @"e:\tmp\test.bin";
            File.WriteAllBytes(filename, new byte[0]); // Create empty file.
            var sw = Stopwatch.StartNew();

            using (var file = File.Open(filename, FileMode.Open))
            {
                file.SetLength(1024 * 1024 * 1024);
            }

            Console.WriteLine(sw.Elapsed);
        }
    }
}
My E:\ drive is a hard drive, not an SSD.
The output was: 00:00:00.0003574
So it took less than a millisecond to extend the file to 1 GB in size.

.dat* filename handling in .Net and Windows

While troubleshooting a performance problem, I came across an issue in Windows 8 which relates to file names containing .dat (e.g. file.dat, file.data.txt).
I found that it takes over 6x as long to create them as any file with any other extension.
The same issue occurs in Windows Explorer, where copying folders containing .dat* files takes significantly longer.
I have created some sample code to illustrate the issue.
using System;
using System.Diagnostics;
using System.IO;

internal class DatExtnIssue
{
    internal static void Run()
    {
        CreateFiles("txt");
        CreateFiles("dat");
        CreateFiles("dat2");
        CreateFiles("doc");
    }

    internal static void CreateFiles(string extension)
    {
        var folder = Path.Combine(@"c:\temp\FileTests", extension);
        if (!Directory.Exists(folder))
            Directory.CreateDirectory(folder);

        var sw = new Stopwatch();
        sw.Start();
        for (var n = 0; n < 500; n++)
        {
            var fileName = Path.Combine(folder, string.Format("File-{0:0000}.{1}", n, extension));
            using (var fileStream = File.Create(fileName))
            {
                // Left empty to show the problem is due to creation alone.
                // The same issue occurs regardless of writing, closing or flushing.
            }
        }
        sw.Stop();
        Console.WriteLine(".{0} = {1,6:0.000}secs", extension, sw.ElapsedMilliseconds / 1000.0);
    }
}
Results from creating 500 files with each of the following extensions:
.txt = 0.847secs
.dat = 5.200secs
.dat2 = 5.493secs
.doc = 0.806secs
I got similar results using:
using (var fileStream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
{ }
and:
File.WriteAllText(fileName, "a");
This caused a problem as I had a batch application which was taking far too long to run. I finally tracked it down to this.
Does anyone have any idea why this would be happening? Is this by design? I hope not, as it could cause problems for high-volume applications creating .dat files.
It could be something on my PC, but I have checked the Windows registry and found no unusual extension settings.
If all else fails, try a kludge: write all the files out as .txt and then rename *.txt to *.dat. Maybe it will be faster :)
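For what it's worth, a minimal sketch of that kludge, assuming the same folder layout as the benchmark above (the helper name and paths are made up): create the file with a .txt extension, then rename it with File.Move:

using System.IO;

class RenameKludge
{
    // Hypothetical helper: create the file with a .txt extension, then rename
    // it to .dat, to test whether creation speed follows the extension used
    // at create time or the final extension.
    static void CreateDatViaTxt(string finalDatPath)
    {
        string tempTxtPath = Path.ChangeExtension(finalDatPath, ".txt");
        using (File.Create(tempTxtPath))
        {
            // Created empty, matching the benchmark in the question.
        }
        File.Move(tempTxtPath, finalDatPath);
    }

    static void Main()
    {
        string folder = @"c:\temp\FileTests\kludge";
        Directory.CreateDirectory(folder); // no-op if it already exists
        CreateDatViaTxt(Path.Combine(folder, "File-0000.dat"));
    }
}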

Issues with line endings when writing multiple files into one file with C#

I'm trying to write 4 sets of 15 txt files into 4 large txt files in order to make it easier to import into another app.
Here's my code:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace AggregateMultipleFiles
{
    class AggMultiFilestoOneFile
    {
        /* This program can reduce multiple input files and grouping results
           into one file for easier app loading. */
        static void Main(string[] args)
        {
            TextWriter writer = new StreamWriter("G:/user/data/yr2009/fy09_filtered.txt");
            int linelen = 495;
            char[] buf = new char[linelen];
            int line_num = 1;

            for (int i = 1; i <= 15; i++)
            {
                TextReader reader = File.OpenText("G:/user/data/yr2009/fy09_filtered" + i + ".txt");
                while (true)
                {
                    int nin = reader.Read(buf, 0, buf.Length);
                    if (nin == 0)
                    {
                        Console.WriteLine("File ended");
                        break;
                    }
                    writer.Write(new String(buf));
                    line_num++;
                }
                reader.Close();
            }

            Console.WriteLine("done");
            Console.WriteLine(DateTime.Now);
            Console.ReadLine();
            writer.Close();
        }
    }
}
My problem is somewhere in handling the end of a file. It doesn't finish writing the last line of a file, and then starts writing the first line of the next file halfway through the last line of the previous one.
This is throwing off all of my columns and data in the app it imports into.
Someone suggested that perhaps I need to pad the end of each line of each of the 15 files with carriage and line return, \r\n.
Why doesn't what I have work?
Would padding work instead? How would I write that?
Thank you!
I strongly suspect this is the problem:
writer.Write(new String(buf));
You're always creating a string from all of buf, rather than just the first nin characters. If any of your files are short, you may end up with "null" Unicode characters (i.e. U+0000) which may be seen as string terminators in some apps.
There's no need even to create a string - just use:
writer.Write(buf, 0, nin);
(I would also strongly suggest using using statements instead of manually calling Close, by the way.)
It's also worth noting that there's nothing to guarantee that you're really reading a line at a time. You might as well increase your buffer size to something like 32K in order to read the files in potentially fewer chunks.
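Putting those two fixes together, here is a minimal sketch of the loop with a 32K buffer, writing only the nin characters each Read actually returns (the file names are placeholders):

using System.IO;

class ChunkedConcat
{
    static void Main()
    {
        // Placeholder file names, purely for illustration.
        using (var writer = new StreamWriter("combined.txt"))
        {
            char[] buf = new char[32 * 1024]; // 32K chunks: fewer reads, same result
            for (int i = 1; i <= 15; i++)
            {
                using (var reader = File.OpenText("input" + i + ".txt"))
                {
                    int nin;
                    // Read returns the number of characters actually read;
                    // write exactly that many, never the whole buffer.
                    while ((nin = reader.Read(buf, 0, buf.Length)) > 0)
                    {
                        writer.Write(buf, 0, nin);
                    }
                }
            }
        }
    }
}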
Additionally, if the files are small enough, you could read each one into memory completely, which would make your code simpler:
using (var writer = File.CreateText("G:/user/data/yr2009/fy09_filtered.txt"))
{
    for (int i = 1; i <= 15; i++)
    {
        string inputName = "G:/user/data/yr2009/fy09_filtered" + i + ".txt";
        writer.Write(File.ReadAllText(inputName));
    }
}

How can I split a big text file into smaller files?

I have a big file with some text, and I want to split it into smaller files.
In this example, what I do is:

1. I open a text file, let's say with 10,000 lines in it.
2. I set a package size, here package = 300, which is the small-file limit; once a small file has 300 lines in it, I close it and open a new file for writing (for example, package 2).
3. Same as step 2.
4. You already know.

Here is the code from my function that should do that. The idea (what I don't know how to do) is how to close a file and open a new one once it has reached the 300-line limit (in our case here).
Let me show you what I'm talking about:
int nr = 1;
package = textBox1.Text; // how many lines/file (small file)
string packnr = nr.ToString();
string filer = package + "Pack-" + packnr + "+_" + date2 + ".txt"; // name of small file/s
int packtester = 0;
int package = 300;
StreamReader freader = new StreamReader("bigfile.txt");
StreamWriter pak = new StreamWriter(filer);

while ((line = freader.ReadLine()) != null)
{
    if (packtester < package)
    {
        pak.WriteLine(line); // writing line to small file
        packtester++; // increasing the lines of small file
    }
    else if (packtester == package) // in this example, checking if the lines written got to 300
    {
        packtester = 0;
        pak.Close(); // closing the file
        nr++; // nr++ -> just for file name to be Pack-2;
        packnr = nr.ToString();
        StreamWriter pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");
    }
}
I get these errors:
Cannot use local variable 'pak' before it is declared
A local variable named 'pak' cannot be declared in this scope because it would give a different meaning to 'pak', which is already used in a 'parent or current' scope to denote something else
Try this:
public void SplitFile()
{
    int nr = 1;
    int package = 300;
    DateTime date2 = DateTime.Now;
    int packtester = 0;

    using (var freader = new StreamReader("bigfile.txt"))
    {
        StreamWriter pak = null;
        try
        {
            pak = new StreamWriter(GetPackFilename(package, nr, date2), false);
            string line;
            while ((line = freader.ReadLine()) != null)
            {
                if (packtester < package)
                {
                    pak.WriteLine(line); // writing line to small file
                    packtester++; // increasing the lines of small file
                }
                else
                {
                    pak.Flush();
                    pak.Close(); // closing the file
                    nr++; // nr++ -> just for file name to be Pack-2
                    pak = new StreamWriter(GetPackFilename(package, nr, date2), false);
                    // Don't drop the line that triggered the rollover:
                    // write it as the first line of the new file.
                    pak.WriteLine(line);
                    packtester = 1;
                }
            }
        }
        finally
        {
            if (pak != null)
            {
                pak.Dispose();
            }
        }
    }
}
private string GetPackFilename(int package, int nr, DateTime date2)
{
    // Format the date explicitly: the default DateTime.ToString() output
    // contains characters (such as ':') that are invalid in Windows file names.
    return string.Format("{0}Pack-{1}+_{2:yyyy-MM-dd_HH-mm-ss}.txt", package, nr, date2);
}
Logrotate can do this automatically for you. Years have been put into it and it's what people trust to handle their sometimes very large webserver logs.
Note that the code, as written, will not compile because you define the variable pak more than once. It should otherwise function, though it has some room for improvement.
When working with files, my suggestion and the general norm is to wrap your code in a using block, which is basically syntactic sugar built on top of a try/finally:
using (var stream = File.Open(@"C:\hi.txt", FileMode.Open))
{
    // Write your code here. When this block is exited, stream will be disposed.
}
Is equivalent to:
var stream = File.Open(@"C:\hi.txt", FileMode.Open);
try
{
    // Write your code here.
}
finally
{
    stream.Dispose();
}
In addition, when working with files, always prefer opening file streams using very specific permissions and modes as opposed to using the more sparse constructors that assume some default options. For example:
var stream = new StreamWriter(File.Open(#"c:\hi.txt", FileMode.CreateNew, FileAccess.ReadWrite, FileShare.Read));
This guarantees, for example, that files are not overwritten; instead, we assume that the file we want to open doesn't exist yet.
Oh, and instead of using the check you perform, I suggest using the EndOfStream property of the StreamReader object.
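For instance, here is a rough sketch of the read loop using EndOfStream instead of the null check (the file name is a placeholder):

using System.IO;

class EndOfStreamDemo
{
    static void Main()
    {
        // Placeholder file name, purely for illustration.
        using (var freader = new StreamReader("bigfile.txt"))
        {
            while (!freader.EndOfStream)
            {
                string line = freader.ReadLine();
                // Process the line here, e.g. write it to the current pack file.
                System.Console.WriteLine(line);
            }
        }
    }
}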
This code looks like it closes the stream and re-opens a new stream when you hit 300 lines. What exactly doesn't work in this code?
One thing you'll want to add is a final close (probably with a check so it doesn't try to close an already closed stream) in case you don't have an even multiple of 300 lines.
EDIT:
Due to your edit I see your problem. You don't need to redeclare pak in the last line of code; simply reassign it to another StreamWriter.
(StreamWriter is disposable, so close or dispose the old instance before creating a new one.)
StreamWriter pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");
becomes
pak = new StreamWriter(package + "Pack-" + packnr + "+_" + date2 + ".txt");

Reading a particular line of a text file in WP7 based on a starting key and replacing (overwriting) it

I am able to do read/write/append operations on a text file stored in isolated storage in a WP7 application.
My scenario is that I am storing space-separated values in a text file inside isolated storage.
So if I have to find a particular line with some starting key, how do I overwrite the value for that key without affecting the lines before and after it?
Example:

Key     Value   SomeOtherValue
status  read    good
status1 unread  bad
status2 null    cantsay
So if I have to change the whole second line, keeping the same key, to:
status1 read good
How can I achieve this?
There are a number of ways you could do this, and the method you choose should be best suited to the size and complexity of the data file.
One option to get you started is to use the string.Replace() method. This is crude, but if your file is only small then there is nothing wrong with it.
using System;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("status read good");
        sb.AppendLine("status1 unread bad");
        sb.AppendLine("status2 null cantsay");
        string input = sb.ToString();

        // Find the span of the line that starts with "status1" and
        // replace that whole line.
        var startPos = input.IndexOf("status1");
        var endPos = input.IndexOf(Environment.NewLine, startPos);
        var modifiedInput = input.Replace(input.Substring(startPos, endPos - startPos), "status1 read good");

        Console.WriteLine(modifiedInput);
        Console.ReadKey();
    }
}
If you store this information in text files then there won't be a way around replacing whole files. The following code does exactly this and might even be what you are doing right now.
// Replace a given line in a given text file with a given replacement line.
private void ReplaceLine(string fileName, int lineNrToBeReplaced, string newLine)
{
    using (IsolatedStorageFile isf = IsolatedStorageFile.GetUserStoreForApplication())
    {
        // The memory writer will hold the read and modified lines.
        using (StreamWriter memWriter = new StreamWriter(new MemoryStream()))
        {
            // This is for reading lines from the source file.
            using (StreamReader fileReader = new StreamReader(new IsolatedStorageFileStream(fileName, System.IO.FileMode.Open, isf)))
            {
                int lineCount = 0;
                // Iterate the file and read lines.
                while (!fileReader.EndOfStream)
                {
                    string line = fileReader.ReadLine();
                    // Check if this is the line which should be replaced; the check is
                    // done by line number but could also be based on content.
                    if (lineCount++ != lineNrToBeReplaced)
                    {
                        // Just copy the line from the file.
                        memWriter.WriteLine(line);
                    }
                    else
                    {
                        // Replace the line from the file.
                        memWriter.WriteLine(newLine);
                    }
                }
            }
            memWriter.Flush();
            memWriter.BaseStream.Position = 0;

            // Re-create the file and save all lines from memory to this file.
            using (IsolatedStorageFileStream fileStream = new IsolatedStorageFileStream(fileName, System.IO.FileMode.Create, isf))
            {
                memWriter.BaseStream.CopyTo(fileStream);
            }
        }
    }
}

private void button1_Click(object sender, RoutedEventArgs e)
{
    ReplaceLine("test.txt", 1, "status1 read good");
}
And I agree with slugster: using SQLCE database might be a solution with better performance.
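For reference, here is a rough sketch of what that alternative could look like with the WP7 (Mango) local database via LINQ to SQL; the entity and table names are hypothetical, and this is just one way to structure it:

using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical entity mapping one row of the space-separated file.
[Table]
public class StatusEntry
{
    [Column(IsPrimaryKey = true)]
    public string Key { get; set; }

    [Column]
    public string Value { get; set; }

    [Column]
    public string SomeOtherValue { get; set; }
}

public class StatusDataContext : DataContext
{
    // "isostore:/" places the database file in isolated storage.
    public StatusDataContext() : base("Data Source=isostore:/Status.sdf") { }

    public Table<StatusEntry> Statuses;
}

public static class StatusStore
{
    // Update a single row in place instead of rewriting a whole text file.
    public static void UpdateStatus(string key, string value, string someOtherValue)
    {
        using (var db = new StatusDataContext())
        {
            if (!db.DatabaseExists())
                db.CreateDatabase();

            var entry = db.Statuses.FirstOrDefault(s => s.Key == key);
            if (entry == null)
            {
                entry = new StatusEntry { Key = key };
                db.Statuses.InsertOnSubmit(entry);
            }
            entry.Value = value;
            entry.SomeOtherValue = someOtherValue;
            db.SubmitChanges();
        }
    }
}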
