I have a process I made that has been working well for several months now. The process recursively zips up all files and folders in a given directory and then uploads the zip file to an FTP server. It's been working, but now the zip file is exceeding 2GB and it's erroring out. Can someone please help me figure out how to get around this 2GB limit? I commented the offending line in the code. Here is the code:
class Program
{
// Location of upload directory
private const string SourceFolder = @"C:\MyDirectory";
// FTP server
private const string FtpSite = "10.0.0.1";
// FTP User Name
private const string FtpUserName = "myUserName";
// FTP Password
private const string FtpPassword = "myPassword";
static void Main(string[] args)
{
try
{
// Zip everything up using SharpZipLib
string tmpFile = Path.GetTempFileName();
var zip = new ZipOutputStream(File.Create(tmpFile));
zip.SetLevel(8);
ZipFolder(SourceFolder, SourceFolder, zip);
zip.Finish();
zip.Close();
// Upload the zip file
UploadFile(tmpFile);
// Delete the zip file
File.Delete(tmpFile);
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
}
private static void UploadFile(string fileName)
{
string remoteFileName = "/ImagesUpload_" + DateTime.Now.ToString("MMddyyyyHHmmss") + ".zip";
var request = (FtpWebRequest)WebRequest.Create("ftp://" + FtpSite + remoteFileName);
request.Credentials = new NetworkCredential(FtpUserName, FtpPassword);
request.Method = WebRequestMethods.Ftp.UploadFile;
request.KeepAlive = false;
request.Timeout = -1;
request.UsePassive = true;
request.UseBinary = true;
// Error occurs in the next line!!!
byte[] b = File.ReadAllBytes(fileName);
using (Stream s = request.GetRequestStream())
{
s.Write(b, 0, b.Length);
}
using (var resp = (FtpWebResponse)request.GetResponse())
{
}
}
private static void ZipFolder(string rootFolder, string currentFolder, ZipOutputStream zStream)
{
string[] subFolders = Directory.GetDirectories(currentFolder);
foreach (string folder in subFolders)
ZipFolder(rootFolder, folder, zStream);
string relativePath = currentFolder.Substring(rootFolder.Length) + "/";
if (relativePath.Length > 1)
{
var dirEntry = new ZipEntry(relativePath) {DateTime = DateTime.Now};
zStream.PutNextEntry(dirEntry); // without this the directory entry is never actually written
}
foreach (string file in Directory.GetFiles(currentFolder))
{
AddFileToZip(zStream, relativePath, file);
}
}
private static void AddFileToZip(ZipOutputStream zStream, string relativePath, string file)
{
var buffer = new byte[4096];
var fi = new FileInfo(file);
string fileRelativePath = (relativePath.Length > 1 ? relativePath : string.Empty) + Path.GetFileName(file);
var entry = new ZipEntry(fileRelativePath) {DateTime = DateTime.Now, Size = fi.Length};
zStream.PutNextEntry(entry);
using (FileStream fs = File.OpenRead(file))
{
int sourceBytes;
do
{
sourceBytes = fs.Read(buffer, 0, buffer.Length);
zStream.Write(buffer, 0, sourceBytes);
} while (sourceBytes > 0);
}
}
}
You are trying to allocate an array with more than 2 billion elements. .NET limits the maximum size of an array to System.Int32.MaxValue, i.e. 2GB is the upper bound.
You're better off reading the file in pieces and uploading it in pieces, e.g. using a loop like:
int buflen = 128 * 1024;
byte[] buf = new byte[buflen];
using (FileStream source = new FileStream(fileName, FileMode.Open))
using (Stream dest = request.GetRequestStream())
{
    while (true)
    {
        int bytesRead = source.Read(buf, 0, buflen);
        if (bytesRead == 0) break;
        dest.Write(buf, 0, bytesRead);
    }
}
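Alternatively, if you're on .NET 4 or later, Stream.CopyTo does the same chunked copy for you; a minimal sketch, assuming the same request and fileName as above:
using (FileStream source = File.OpenRead(fileName))
using (Stream dest = request.GetRequestStream())
{
    // CopyTo reads and writes in fixed-size chunks (about 80KB by default),
    // so the whole file is never held in memory at once.
    source.CopyTo(dest);
}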
The problem isn't in the zip, but in the File.ReadAllBytes call, which returns an array, and arrays have a default size limit of 2GB.
It is possible to disable this limit, as detailed here. I'm assuming you're already compiling this specifically for 64-bit to handle these kinds of file sizes. Enabling this option switches .NET over to using 64-bit addresses for arrays instead of the default 32-bit addresses.
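For reference, that switch is the gcAllowVeryLargeObjects element in the application's config file; it only takes effect in a 64-bit process:
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>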
It would probably be better to split the archive into parts and upload them separately, however. As far as I know, the built-in ZipFile class doesn't support multi-part archives, but several of the third-party libraries do.
Edit: I was thinking about the resulting zip output, rather than the input. To load a huge amount of data INTO the ZipFile, you should use the buffer-based approach suggested by Petesh and philip.
Related
I am trying to upload a file to a bucket using the Forge .NET SDK. It works most of the time, but occasionally fails with {error: overlapping ranges}. Here is the code snippet.
private string uploadFileToBucket(Configuration configuration, string bucketKey, string filePath)
{
ObjectsApi objectsApi = new ObjectsApi(configuration);
string fileName = Path.GetFileName(filePath);
string base64EncodedUrn, objectKey;
using (FileStream fileStream = File.Open(filePath, FileMode.Open))
{
long contentLength = fileStream.Length;
string content_range = "bytes 0-" + (contentLength - 1) + "/" + contentLength;
dynamic result = objectsApi.UploadChunk(bucketKey, fileName, (int)fileStream.Length, content_range,
"12313", fileStream);
DynamicJsonResponse dynamicJsonResponse = (DynamicJsonResponse)result;
JObject json = dynamicJsonResponse.ToJson();
JToken urn = json.GetValue("objectId");
string urnStr = urn.ToString();
base64EncodedUrn = ApiClient.encodeToSafeBase64(urnStr);
objectKey = fileName;
}
return base64EncodedUrn;
}
Before uploading, the file content has to be read into memory; otherwise, the FileStream object in your code snippet is empty.
However, I would like to advise you to use PUT buckets/:bucketKey/objects/:objectName instead if you want to upload the whole file in a single chunk. Here is my test code. Hope it helps.
private static TwoLeggedApi oauth2TwoLegged;
private static dynamic twoLeggedCredentials;
private static ObjectsApi objectsApi = new ObjectsApi(); // used below; declared here so the snippet compiles
private static Random random = new Random();
public static string RandomString(int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
// Initialize the 2-legged OAuth 2.0 client, and optionally set specific scopes.
private static void initializeOAuth()
{
// You must provide at least one valid scope
Scope[] scopes = new Scope[] { Scope.DataRead, Scope.DataWrite, Scope.BucketCreate, Scope.BucketRead };
oauth2TwoLegged = new TwoLeggedApi();
twoLeggedCredentials = oauth2TwoLegged.Authenticate(FORGE_CLIENT_ID, FORGE_CLIENT_SECRET, oAuthConstants.CLIENT_CREDENTIALS, scopes);
objectsApi.Configuration.AccessToken = twoLeggedCredentials.access_token;
}
private static void uploadFileToBucket(string bucketKey, string filePath)
{
Console.WriteLine("*****Start uploading file to the OSS");
string path = filePath;
//File Total size
var info = new System.IO.FileInfo(path);
long fileSize = info.Length;
using (FileStream fileStream = File.Open(filePath, FileMode.Open))
{
string sessionId = RandomString(12);
Console.WriteLine(string.Format("sessionId: {0}", sessionId));
long contentLength = fileSize;
string content_range = "bytes 0-" + (contentLength - 1) + "/" + contentLength;
Console.WriteLine("Uploading range: " + content_range);
byte[] buffer = new byte[contentLength];
MemoryStream memoryStream = new MemoryStream(buffer);
int nb = fileStream.Read(buffer, 0, (int)contentLength);
memoryStream.Write(buffer, 0, nb);
memoryStream.Position = 0;
dynamic response = objectsApi.UploadChunk(bucketKey, info.Name, (int)contentLength, content_range,
sessionId, memoryStream);
Console.WriteLine(response);
}
}
static void Main(string[] args)
{
initializeOAuth();
uploadFileToBucket(BUCKET_KEY, FILE_PATH);
}
I have an application which reads a file and copies its contents into another file.
I am using a buffer to read the file and write into the other file.
The application takes too long when there are many files.
Is there a specific optimal buffer size I can use to make the application more efficient?
I have used 256KB as the maximum buffer size.
The Upload method below is called within a Parallel.ForEach loop.
Here is the code:
private bool Upload(string address, string uploadFile, string user, string password, string clientLogFile)
{
// Get the object used to communicate with the server.
FtpWebRequest request = null;
try
{
request = (FtpWebRequest)WebRequest.Create(address);
request.Credentials = new NetworkCredential(user, password);
request.Method = WebRequestMethods.Ftp.UploadFile;
request.KeepAlive = false;
request.Timeout = Convert.ToInt32(ConfigurationSettings.AppSettings["timeout"]);
request.UsePassive = Convert.ToBoolean(ConfigurationSettings.AppSettings["ftpMode"]);
// _fileBufferSize = 256kb
byte[] buffer = new byte[_fileBufferSize];
using (FileStream fs = new FileStream(uploadFile, FileMode.Open))
{
long dataLength = (long)fs.Length;
long bytesRead = 0;
int bytesDownloaded = 0;
using (Stream requestStream = request.GetRequestStream())
{
while (bytesRead < dataLength)
{
bytesDownloaded = fs.Read(buffer, 0, buffer.Length);
bytesRead = bytesRead + bytesDownloaded;
requestStream.Write(buffer, 0, bytesDownloaded);
}
requestStream.Close();
}
}
return true;
}
catch (Exception ex)
{
// Catch exception
}
finally
{
request = null;
}
return false;
}
All suggestions are welcome.
Consolidate your files to upload into multiple tasks, then execute several tasks in parallel.
Read this older guide from Microsoft on parallel tasks. A modern version might look like the code below.
public void UploadAllFiles(IEnumerable<FileUploadParameters> files) {
var tasks = new List<Task>();
foreach (var file in files) {
var task = Task.Run(() => {
UploadFile(file);
});
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
I have some zip files on an FTP server that I want to download through code, but every time I download them and try to open them, they are corrupt.
FtpClient conn = new FtpClient();
conn.Host = ftpFtpServerAddress;
conn.Credentials = new NetworkCredential(ftpSrcUsername, ftpSrcPwd);
var files = conn.GetListing(ftpSrcFolder, FtpListOption.Modify | FtpListOption.Size);
foreach (var file in files)
{
conn.BeginOpenRead(file.FullName,
new AsyncCallback(BeginOpenReadCallback), new AsyncArguments()
{
Client = conn,
FileName = file.Name
});
}
private void BeginOpenReadCallback(IAsyncResult ar)
{
AsyncArguments args = (AsyncArguments)ar.AsyncState;
FtpClient conn = args.Client;
Stream istream = conn.EndOpenRead(ar);
using (System.IO.FileStream fs = System.IO.File.Create(@"C:\temp\" + args.FileName))
{
byte[] bytes = new byte[istream.Length + 10];
int numBytesToRead = (int)istream.Length;
fs.Write(bytes, 0, numBytesToRead);
}
}
Now with stream.Read. While this method is OK for small files, you shouldn't use it for very large files, because it would fill up your memory very quickly, and the byte size of a file larger than 2GB can't be stored as an Int32.
private void BeginOpenReadCallback(IAsyncResult ar)
{
AsyncObject args = (AsyncObject)ar.AsyncState;
FtpClient conn = args.conn;
Stream istream = conn.EndOpenRead(ar);
using (System.IO.FileStream fs = System.IO.File.Create(@"C:\temp\" + args.filename))
{
byte[] bytes = new byte[istream.Length];
int bytesread = istream.Read(bytes, 0, bytes.Length);
fs.Write(bytes, 0, bytesread);
}
}
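For large files, a sketch of the same callback using Stream.CopyTo instead, which copies in fixed-size chunks and never needs the full length up front:
private void BeginOpenReadCallback(IAsyncResult ar)
{
    AsyncObject args = (AsyncObject)ar.AsyncState;
    FtpClient conn = args.conn;
    using (Stream istream = conn.EndOpenRead(ar))
    using (System.IO.FileStream fs = System.IO.File.Create(@"C:\temp\" + args.filename))
    {
        istream.CopyTo(fs); // chunked copy; memory use stays constant regardless of file size
    }
}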
I'm creating a simple self-extracting archive, using a magic number to mark the beginning of the content.
For now it is a text file:
MAGICNUMBER .... content of the text file
Next, the text file is copied to the end of the executable:
copy programm.exe /b + textfile.txt /b sfx.exe
I'm trying to find the second occurrence of the magic number (the first one would be a hardcoded constant obviously) using the following code:
string my_filename = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
StreamReader file = new StreamReader(my_filename);
const int block_size = 1024;
const string magic = "MAGICNUMBER";
char[] buffer = new Char[block_size];
Int64 count = 0;
Int64 glob_pos = 0;
bool flag = false;
while (file.ReadBlock(buffer, 0, block_size) > 0)
{
var rel_pos = buffer.ToString().IndexOf(magic);
if ((rel_pos > -1) & (!flag))
{
flag = true;
continue;
}
if ((rel_pos > -1) & (flag == true))
{
glob_pos = block_size * count + rel_pos;
break;
}
count++;
}
using (FileStream fs = new FileStream(my_filename, FileMode.Open, FileAccess.Read))
{
byte[] b = new byte[fs.Length - glob_pos];
fs.Seek(glob_pos, SeekOrigin.Begin);
fs.Read(b, 0, (int)(fs.Length - glob_pos));
File.WriteAllBytes("c:/output.txt", b);
}
but for some reason I'm copying almost the entire file, not the last few kilobytes. Is it because of compiler optimization inlining the magic constant in the while loop, or something similar?
How should I do self-extraction archive properly?
I guessed I should read the file backwards to avoid the problem of the compiler inlining the magic constant multiple times.
So I've modified my code in the following way:
string my_filename = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
StreamReader file = new StreamReader(my_filename);
const int block_size = 1024;
const string magic = "MAGIC";
char[] buffer = new Char[block_size];
Int64 count = 0;
Int64 glob_pos = 0;
while (file.ReadBlock(buffer, 0, block_size) > 0)
{
var rel_pos = buffer.ToString().IndexOf(magic);
if (rel_pos > -1)
{
glob_pos = block_size * count + rel_pos;
}
count++;
}
using (FileStream fs = new FileStream(my_filename, FileMode.Open, FileAccess.Read))
{
byte[] b = new byte[fs.Length - glob_pos];
fs.Seek(glob_pos, SeekOrigin.Begin);
fs.Read(b, 0, (int)(fs.Length - glob_pos));
File.WriteAllBytes("c:/output.txt", b);
}
So I've scanned the whole file once, found what I thought would be the last occurrence of the magic number, and copied from there to the end of the file. While the file created by this procedure seems smaller than in the previous attempt, it is in no way the same file I attached to my "self-extracting" archive. Why?
My guess is that the position calculation for the beginning of the attached file is wrong due to the conversion from binary to string. If so, how should I modify my position calculation to make it correct?
Also, how should I choose a magic number when working with real files, PDFs for example? I won't be able to easily modify PDFs to include a predefined magic number.
Try this out. Some C# Stream IO 101:
public static void Main()
{
String path = @"c:\here is your path";
// Method A: Read all information into a Byte Stream
Byte[] data = System.IO.File.ReadAllBytes(path);
String[] lines = System.IO.File.ReadAllLines(path);
// Method B: Use a stream to do essentially the same thing. (More powerful)
// Using block essentially means 'close when we're done'. See 'using block' or 'IDisposable'.
using (FileStream stream = File.OpenRead(path))
using (StreamReader reader = new StreamReader(stream))
{
// This will read all the data as a single string
String allData = reader.ReadToEnd();
}
String outputPath = @"C:\where I'm writing to";
// Copy from one file-stream to another
using (FileStream inputStream = File.OpenRead(path))
using (FileStream outputStream = File.Create(outputPath))
{
inputStream.CopyTo(outputStream);
// Again, this will close both streams when done.
}
// Copy to an in-memory stream
using (FileStream inputStream = File.OpenRead(path))
using (MemoryStream outputStream = new MemoryStream())
{
inputStream.CopyTo(outputStream);
// Again, this will close both streams when done.
// If you want to hold the data in memory, just don't wrap your
// memory stream in a using block.
}
// Use serialization to store data.
var serializer = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
// We'll serialize a person to the memory stream.
MemoryStream memoryStream = new MemoryStream();
serializer.Serialize(memoryStream, new Person() { Name = "Sam", Age = 20 });
// Now the person is stored in the memory stream (just as easy to write to disk using a
// file stream as well).
// Now lets reset the stream to the beginning:
memoryStream.Seek(0, SeekOrigin.Begin);
// And deserialize the person
Person deserializedPerson = (Person)serializer.Deserialize(memoryStream);
Console.WriteLine(deserializedPerson.Name); // Should print Sam
}
// Mark Serializable stuff as serializable.
// This means that C# will automatically format this to be put in a stream
[Serializable]
class Person
{
public String Name { get; set; }
public Int32 Age { get; set; }
}
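One caution about the serialization example above: BinaryFormatter has since been marked obsolete for security reasons, so in current .NET a JSON serializer is the safer equivalent. A minimal sketch with System.Text.Json, round-tripping the same Person:
string json = System.Text.Json.JsonSerializer.Serialize(new Person() { Name = "Sam", Age = 20 });
Person roundTripped = System.Text.Json.JsonSerializer.Deserialize<Person>(json);
Console.WriteLine(roundTripped.Name); // prints Sam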
The easiest solution is to replace
const string magic = "MAGICNUMBER";
with
static string magic = "magicnumber".ToUpper();
But there are more problems with the whole magic-string approach. What if the file contains the magic string? I think the best solution is to put the file size after the file. Extraction is much easier that way: read the length from the last bytes, then read that many bytes from the end of the file.
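A minimal sketch of that length-footer idea, under the assumption of an 8-byte little-endian length (the file names are just the ones from the question):
// Build: append the payload, then its length, to a copy of the executable.
byte[] payload = File.ReadAllBytes("textfile.txt");
using (FileStream output = new FileStream("sfx.exe", FileMode.Append))
{
    output.Write(payload, 0, payload.Length);
    output.Write(BitConverter.GetBytes((long)payload.Length), 0, 8);
}

// Extract: the trailing 8 bytes say how far back the payload starts.
using (FileStream fs = File.OpenRead("sfx.exe"))
{
    byte[] lenBytes = new byte[8];
    fs.Seek(-8, SeekOrigin.End);
    fs.Read(lenBytes, 0, 8);
    long len = BitConverter.ToInt64(lenBytes, 0);
    fs.Seek(-8 - len, SeekOrigin.End);
    byte[] extracted = new byte[len];
    fs.Read(extracted, 0, (int)len);
    File.WriteAllBytes("output.txt", extracted);
}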
Update: This should work unless your files are very big (you'd need to use a revolving pair of buffers in that case, to read the file in small blocks):
string inputFilename = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
string outputFilename = inputFilename + ".secret";
string magic = "magic".ToUpper();
byte[] data = File.ReadAllBytes(inputFilename);
byte[] magicData = Encoding.ASCII.GetBytes(magic);
for (int idx = magicData.Length - 1; idx < data.Length; idx++) {
bool found = true;
for (int magicIdx = 0; magicIdx < magicData.Length; magicIdx++) {
if (data[idx - magicData.Length + 1 + magicIdx] != magicData[magicIdx]) {
found = false;
break;
}
}
if (found) {
using (FileStream output = new FileStream(outputFilename, FileMode.Create)) {
output.Write(data, idx + 1, data.Length - idx - 1);
}
}
}
Update 2: This should be much faster, use little memory, and work on files of any size, but your program must be a proper executable (the size being a multiple of 512 bytes matters because reading in 512-byte blocks then guarantees the appended marker lands at the start of a block, so only each block's leading bytes need checking):
string inputFilename = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
string outputFilename = inputFilename + ".secret";
string marker = "magic".ToUpper();
byte[] data = File.ReadAllBytes(inputFilename);
byte[] markerData = Encoding.ASCII.GetBytes(marker);
int markerLength = markerData.Length;
const int blockSize = 512; //important!
using(FileStream input = File.OpenRead(inputFilename)) {
long lastPosition = 0;
byte[] buffer = new byte[blockSize];
while (input.Read(buffer, 0, blockSize) >= markerLength) {
bool found = true;
for (int idx = 0; idx < markerLength; idx++) {
if (buffer[idx] != markerData[idx]) {
found = false;
break;
}
}
if (found) {
input.Position = lastPosition + markerLength;
using (FileStream output = File.OpenWrite(outputFilename)) {
input.CopyTo(output);
}
}
lastPosition = input.Position;
}
}
Read about some approaches here: http://www.strchr.com/creating_self-extracting_executables
You can add the compressed file as a resource to the project itself:
Project > Properties
Set the property of this resource to Binary.
You can then retrieve the resource with
byte[] resource = Properties.Resources.NameOfYourResource;
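To complete the self-extractor, writing those bytes back out is a one-liner (NameOfYourResource and the output path are placeholders):
byte[] resource = Properties.Resources.NameOfYourResource;
File.WriteAllBytes(@"C:\temp\extracted.zip", resource);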
Search backwards rather than forwards (assuming your file won't contain said magic number).
Or append your (text) file and then lastly its length (or the length of the original exe), so you only need read the last DWORD / few bytes to see how long the file is - then no magic number is required.
More robustly, store the file as an additional data section within the executable file. This is more fiddly without external tools as it requires knowledge of the PE file format used for NT executables, q.v. http://msdn.microsoft.com/en-us/library/ms809762.aspx
I have used many existing code samples and tried to zip the folder in several ways, but I still have a problem with the time it takes and with the resulting size (the zip is still approximately the same size as the folder).
This code is from the library's own source and still doesn't give the wanted result:
static void Main(string[] args)
{
//copyDirectory(@"C:\x", @"D:\1");
ZipOutputStream zip = new ZipOutputStream(File.Create(@"d:\2.zip"));
zip.SetLevel(9);
string folder = @"D:\music";
ZipFolder(folder, folder, zip);
zip.Finish();
zip.Close();
}
public static void ZipFolder(string RootFolder, string CurrentFolder, ZipOutputStream zStream)
{
string[] SubFolders = Directory.GetDirectories(CurrentFolder);
foreach (string Folder in SubFolders)
ZipFolder(RootFolder, Folder, zStream);
string relativePath = CurrentFolder.Substring(RootFolder.Length) + "/";
if (relativePath.Length > 1)
{
ZipEntry dirEntry;
dirEntry = new ZipEntry(relativePath);
dirEntry.DateTime = DateTime.Now;
zStream.PutNextEntry(dirEntry); // the entry must be written to the stream, or the directory is silently skipped
}
foreach (string file in Directory.GetFiles(CurrentFolder))
{
AddFileToZip(zStream, relativePath, file);
}
}
private static void AddFileToZip(ZipOutputStream zStream, string relativePath, string file)
{
byte[] buffer = new byte[4096];
string fileRelativePath = (relativePath.Length > 1 ? relativePath : string.Empty) + Path.GetFileName(file);
ZipEntry entry = new ZipEntry(fileRelativePath);
entry.DateTime = DateTime.Now;
zStream.PutNextEntry(entry);
using (FileStream fs = File.OpenRead(file))
{
int sourceBytes;
do
{
sourceBytes = fs.Read(buffer, 0, buffer.Length);
zStream.Write(buffer, 0, sourceBytes);
} while (sourceBytes > 0);
}
}
string folder = @"D:\music";
If you're trying to zip MP3 files, you're not going to see much shrinking: MP3 is already compressed.
There are limits to how much a compression algorithm can do, and more compression always takes more time.
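If the goal is just bundling MP3s into one file rather than shrinking them, you can skip the compression work entirely; level 0 means "store", which should be dramatically faster on data that won't compress anyway:
ZipOutputStream zip = new ZipOutputStream(File.Create(@"d:\2.zip"));
zip.SetLevel(0); // 0 = store (no compression effort), 9 = smallest output but slowest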