How to read a file in chunks in Web API when the file is large - C#

I have a big file, and I want to send it to a Web API which will forward it to Amazon. Since the file is big, I want to send it to Amazon in chunks.
So if I have a 1 GB file, I want my API to receive it in, say, 20 MB chunks so that I can send each chunk to Amazon and then receive the next 20 MB. How is this doable? Below is my attempt.
    public async Task<bool> Upload()
    {
        var fileuploadPath = ConfigurationManager.AppSettings["FileUploadLocation"];
        var provider = new MultipartFormDataStreamProvider(fileuploadPath);
        var content = new StreamContent(HttpContext.Current.Request.GetBufferlessInputStream(true));
        // The line below writes to a folder, but I want to make sure I read each chunk as soon as I receive it
        await content.ReadAsMultipartAsync(provider);
        return true;
    }
Pseudo code:
    while (await content.ReadAsMultipartAsync(provider) == 20 MB chunk)
    {
        // Do something with this chunk
        // Then do something with the rest of the chunks, and so on.
    }
The file can be as large as 1 GB.
As of now the entire file is consumed by this one line of code:
    await content.ReadAsMultipartAsync(provider);
I am lost here, please help. All I want is to receive the file in small chunks and process them.
P.S.: I am sending the file as multipart/form-data from Postman to test.
Attempt No 2:
    var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
    foreach (var content in filesReadToProvider.Contents)
    {
        var stream = await content.ReadAsStreamAsync();
        using (StreamReader sr = new StreamReader(stream))
        {
            string line = "";
            while ((line = sr.ReadLine()) != null)
            {
                using (MemoryStream outputStream = new MemoryStream())
                using (StreamWriter sw = new StreamWriter(outputStream))
                {
                    sw.WriteLine(line);
                    sw.Flush();
                    // Do something
                }
            }
        }
    }

I haven't had time to test this, but the StreamReader.ReadBlock method seems to be what you want to use.
It should look something like the code below, which assumes all your other code is good and you just needed some help with the buffering. ReadBlock is a "blocking" read operation, but there is also a ReadBlockAsync method which returns a Task.
    const int bufferSize = 1024;
    var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
    foreach (var content in filesReadToProvider.Contents)
    {
        var stream = await content.ReadAsStreamAsync();
        using (StreamReader sr = new StreamReader(stream))
        {
            int charsRead;
            char[] buffer = new char[bufferSize];
            while ((charsRead = sr.ReadBlock(buffer, 0, bufferSize)) > 0)
            {
                // Do something with the first <charsRead> elements of buffer,
                // not with <bufferSize>: ReadBlock returns the number of
                // characters actually read, which may be less than the buffer size.
            }
        }
    }
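Since the upload is an arbitrary binary file that you want to forward to Amazon, you may prefer to read raw bytes from each part's stream rather than characters through a StreamReader. A minimal sketch along the lines of Attempt No 2, assuming a 20 MB chunk size and a hypothetical ProcessChunkAsync placeholder for whatever forwards the chunk to Amazon:
    const int chunkSize = 20 * 1024 * 1024; // 20 MB, matching the question
    var multipart = await Request.Content.ReadAsMultipartAsync();
    foreach (var part in multipart.Contents)
    {
        using (var partStream = await part.ReadAsStreamAsync())
        {
            var buffer = new byte[chunkSize];
            int bytesRead;
            // ReadAsync may return fewer bytes than requested, so only the
            // first <bytesRead> bytes of the buffer are valid on each pass.
            while ((bytesRead = await partStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // await ProcessChunkAsync(buffer, bytesRead); // hypothetical placeholder
            }
        }
    }
Bear in mind that ReadAsMultipartAsync without a custom stream provider buffers the parts in memory first, so this mostly controls how you consume the data, not how it arrives at the server.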

Related

Read Bytes from Request Content

    var bytes = Request.Content.ReadAsByteArrayAsync().Result;
Hello, is there any other method that reads bytes from the content, except this one?
You can also stream the contents into a StreamReader:
https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader?view=net-6.0
This code is from the Microsoft website. You can replace "TestFile.txt" with a stream.
    // Create an instance of StreamReader to read from a file.
    // The using statement also closes the StreamReader.
    using (StreamReader sr = new StreamReader("TestFile.txt"))
    {
        string line;
        // Read and display lines from the file until the end of
        // the file is reached.
        while ((line = sr.ReadLine()) != null)
        {
            Console.WriteLine(line);
        }
    }
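As a rough sketch of the same pattern applied to the request content (assuming a text payload and an async ASP.NET Web API controller action), the file path can be swapped for the content stream:
    // Sketch: wrap the request content stream in a StreamReader instead of a
    // file path. Suitable for text payloads; binary data should be read as bytes.
    var stream = await Request.Content.ReadAsStreamAsync();
    using (var sr = new StreamReader(stream))
    {
        string line;
        while ((line = await sr.ReadLineAsync()) != null)
        {
            // Process each line as it is read, without buffering the whole body.
        }
    }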
If you want to read the content as a string, the following code will help you.
    public async Task<string> FormatRequest(HttpRequest request)
    {
        // This line allows us to set the reader for the request back to the beginning of its stream.
        request.EnableRewind();
        var body = request.Body;
        // We now need to read the request stream. First, we create a new byte[] with the same length as the request stream...
        var buffer = new byte[Convert.ToInt32(request.ContentLength)];
        // ...then we copy the entire request stream into the new buffer.
        await request.Body.ReadAsync(buffer, 0, buffer.Length);
        // We convert the byte[] into a string using UTF8 encoding...
        var bodyAsText = Encoding.UTF8.GetString(buffer);
        // ...and finally, assign the read body back to the request body, which is allowed because of EnableRewind().
        request.Body.Seek(0, SeekOrigin.Begin);
        request.Body = body;
        return bodyAsText;
    }
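A minimal alternative sketch, assuming the same ASP.NET Core request type: let a StreamReader do the buffering instead of allocating a single ContentLength-sized array, with leaveOpen so the rewound body stays readable downstream. (EnableRewind comes from older ASP.NET Core versions; newer ones expose EnableBuffering instead.)
    public async Task<string> ReadBodyAsync(HttpRequest request)
    {
        request.EnableRewind(); // use EnableBuffering on newer ASP.NET Core versions

        // leaveOpen: true so the body stream can still be rewound and re-read later.
        using (var reader = new StreamReader(request.Body, Encoding.UTF8,
            detectEncodingFromByteOrderMarks: false, bufferSize: 8192, leaveOpen: true))
        {
            var bodyAsText = await reader.ReadToEndAsync();
            request.Body.Seek(0, SeekOrigin.Begin);
            return bodyAsText;
        }
    }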

Read last line from website without saving file on disk

I have a website with many large CSV files (up to 100,000 lines each). From each CSV file, I need to read the last line in the file. I know how to solve the problem when I save the file on disk before reading its content:
var url = "http://data.cocorahs.org/cocorahs/export/exportreports.aspx?ReportType=Daily&Format=csv&Date=1/1/2000&Station=UT-UT-24"
var client = new System.Net.WebClient();
var tempFile = System.IO.Path.GetTempFileName();
client.DownloadFile(url, tempFile);
var lastLine = System.IO.File.ReadLines(tempFile).Last();
Is there any way to get the last line without saving a temporary file on disk?
I tried:
    using (var stream = client.OpenRead(seriesUrl))
    {
        using (var reader = new StreamReader(stream))
        {
            var lastLine = reader.ReadLines("file.txt").Last();
        }
    }
but the StreamReader class does not have a ReadLines method ...
StreamReader does not have a ReadLines method, but it does have a ReadLine method to read the next line from the stream. You can use it to read the last line from the remote resource like this:
    using (var stream = client.OpenRead(seriesUrl))
    {
        using (var reader = new StreamReader(stream))
        {
            string line;
            string lastLine = null;
            while ((line = reader.ReadLine()) != null)
            {
                lastLine = line;
            }
            // lastLine now contains the very last line from reader
            // (or null if the stream was empty).
        }
    }
Reading one line at a time with ReadLine will use less memory compared to StreamReader.ReadToEnd, which will read the entire stream into memory as a string. For CSV files with 100,000 lines this could be a significant amount of memory.
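If you would rather avoid WebClient, a rough equivalent sketch with HttpClient (assuming the same url variable and an async context) streams the response with ResponseHeadersRead so the body is never buffered as a whole:
    using (var client = new HttpClient())
    using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
    using (var stream = await response.Content.ReadAsStreamAsync())
    using (var reader = new StreamReader(stream))
    {
        string line, lastLine = null;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            lastLine = line;
        }
        // lastLine holds the final line, or null if the response body was empty.
    }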
This worked for me, though the service only returned the CSV headers and no data:
    public void TestMethod1()
    {
        var url = "http://data.cocorahs.org/cocorahs/export/exportreports.aspx?ReportType=Daily&Format=csv&Date=1/1/2000&Station=UT-UT-24";
        var client = new System.Net.WebClient();
        using (var stream = client.OpenRead(url))
        {
            using (var reader = new StreamReader(stream))
            {
                var str = reader.ReadToEnd().Split('\n').Where(x => !string.IsNullOrEmpty(x)).LastOrDefault();
                Debug.WriteLine(str);
                Assert.IsNotEmpty(str);
            }
        }
    }

How to upload the Stream from an HttpContent result to Azure File Storage

I am attempting to download a list of files from URLs stored in my database, and then upload them to my Azure File Storage account. I am successfully downloading the files and can turn them into files on my local storage or convert them to text and upload them. However, I lose data when converting something like a PDF to text, and I do not want to have to store the files on the Azure app that this endpoint is hosted on, as I do not need to manipulate the files in any way.
I have attempted to upload the files from the Stream I get from the HttpContent object using the UploadFromStream method on the CloudFile. Whenever this command is run I get an InvalidOperationException with the message "Operation is not valid due to the current state of the object."
I've tried converting the original Stream to a MemoryStream as well, but this just writes a blank file to the File Storage account, even if I set the position to the beginning of the MemoryStream. My code is below; if anyone could point out what I am missing to make this work, I would appreciate it.
    public DownloadFileResponse DownloadFile(FileLink fileLink)
    {
        string fileName = string.Format("{0}{1}{2}", fileLink.ExpectedFileName, ".", fileLink.ExpectedFileType);
        HttpStatusCode status;
        string hash = "";
        using (var client = new HttpClient())
        {
            client.Timeout = TimeSpan.FromSeconds(10); // candidate for .config setting
            client.DefaultRequestHeaders.Add("User-Agent", USER_AGENT);
            var request = new HttpRequestMessage(HttpMethod.Get, fileLink.ExpectedURL);
            var sendTask = client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
            var response = sendTask.Result; // not ensuring success here, going to handle error codes without exceptions
            status = response.StatusCode;
            if (status == HttpStatusCode.OK)
            {
                var httpStream = response.Content.ReadAsStreamAsync().Result;
                fileStorage.WriteFile(fileLink.ExpectedFileType, fileName, httpStream);
                hash = HashGenerator.GetMD5HashFromStream(httpStream);
            }
        }
        return new DownloadFileResponse(status, fileName, hash);
    }

    public void WriteFile(string targetDirectory, string targetFilePath, Stream fileStream)
    {
        var options = SetOptions();
        var newFile = GetTargetCloudFile(targetDirectory, targetFilePath);
        newFile.UploadFromStream(fileStream, options: options);
    }

    public FileRequestOptions SetOptions()
    {
        FileRequestOptions options = new FileRequestOptions();
        options.ServerTimeout = TimeSpan.FromSeconds(10);
        options.RetryPolicy = new NoRetry();
        return options;
    }

    public CloudFile GetTargetCloudFile(string targetDirectory, string targetFilePath)
    {
        if (!shareConnector.share.Exists())
        {
            throw new Exception("Cannot access Azure File Storage share");
        }
        CloudFileDirectory rootDirectory = shareConnector.share.GetRootDirectoryReference();
        CloudFileDirectory directory = rootDirectory.GetDirectoryReference(targetDirectory);
        if (!directory.Exists())
        {
            throw new Exception("Target Directory does not exist");
        }
        CloudFile newFile = directory.GetFileReference(targetFilePath);
        return newFile;
    }
I had the same problem; the only way I got it to work was by reading the incoming stream (in your case httpStream in the DownloadFile(FileLink fileLink) method) into a byte array and using UploadFromByteArray(byte[] buffer, int index, int count) instead of UploadFromStream.
So your WriteFile method will look like:
    public void WriteFile(string targetDirectory, string targetFilePath, Stream fileStream)
    {
        var options = SetOptions();
        var newFile = GetTargetCloudFile(targetDirectory, targetFilePath);
        const int bufferLength = 600; // buffer to read from the stream; this size is just an example
        byte[] buffer = new byte[bufferLength];
        List<byte> byteArrayFile = new List<byte>(); // all your file will be here
        int count = 0;
        try
        {
            while ((count = fileStream.Read(buffer, 0, bufferLength)) > 0)
            {
                // Only add the bytes actually read, not the whole buffer.
                byteArrayFile.AddRange(buffer.Take(count));
            }
            fileStream.Close();
        }
        catch (Exception ex)
        {
            throw; // you need to change this
        }
        newFile.UploadFromByteArray(byteArrayFile.ToArray(), 0, byteArrayFile.Count);
    }
Based on your description and code, I suggest using Stream.CopyTo to copy the stream to a local MemoryStream first, then uploading the MemoryStream to Azure File Storage.
For more details, refer to the code below.
I just changed the DownloadFile method to test it.
    HttpStatusCode status;
    using (var client = new HttpClient())
    {
        client.Timeout = TimeSpan.FromSeconds(10); // candidate for .config setting
        // client.DefaultRequestHeaders.Add("User-Agent", USER_AGENT);
        // here I use my blob file to test it
        var request = new HttpRequestMessage(HttpMethod.Get, "https://xxxxxxxxxx.blob.core.windows.net/media/secondblobtest-eypt.txt");
        var sendTask = client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
        var response = sendTask.Result; // not ensuring success here, going to handle error codes without exceptions
        status = response.StatusCode;
        if (status == HttpStatusCode.OK)
        {
            MemoryStream ms = new MemoryStream();
            var httpStream = response.Content.ReadAsStreamAsync().Result;
            httpStream.CopyTo(ms);
            ms.Position = 0;
            WriteFile("aaa", "testaa", ms);
            // hash = HashGenerator.GetMD5HashFromStream(httpStream);
        }
    }
I had a similar problem and found out that the UploadFromStream method only works with buffered streams. Nevertheless, I was able to successfully upload files to Azure Storage by using a MemoryStream. I don't think this is a very good solution, as you use up memory by copying the content of the file stream into memory before handing it to the Azure stream. What I have come up with instead is a way of writing directly to an Azure stream: use the OpenWriteAsync method to create the stream, then a simple CopyToAsync from the source stream.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse("YourAzureStorageConnectionString");
    CloudFileClient fileClient = storageAccount.CreateCloudFileClient();
    CloudFileShare share = fileClient.GetShareReference("YourShareName");
    CloudFileDirectory root = share.GetRootDirectoryReference();
    CloudFile file = root.GetFileReference("TheFileName");
    using (CloudFileStream fileWriteStream = await file.OpenWriteAsync(fileMetadata.FileSize, new AccessCondition(),
        new FileRequestOptions { StoreFileContentMD5 = true },
        new OperationContext()))
    {
        await fileContent.CopyToAsync(fileWriteStream, 128 * 1024);
    }
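As far as I understand, OpenWriteAsync needs the final file size up front (fileMetadata.FileSize in the snippet above) because Azure Files creates the file at that size, and the second argument to CopyToAsync is just the copy buffer size (128 KB here), not a limit on how much is copied.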

Passing bytes through Web API: why are these files different?

I'm storing bitmap images in an Azure blob store and delivering them to a .Net Micro Framework device. Because of memory limitations on the device I need to break the files into chunks and deliver them to the device where they are to be recombined onto the device's microSD card. I am having trouble with byte fidelity and am struggling to understand this pared down test.
I have a simple bitmap on azure: https://filebytetest9845.blob.core.windows.net/files/helloworld.bmp It is just a black and white bitmap of the words "Hello World".
Here's some test code I've written to sit in an ASP.NET Web API and read the bytes, ready for breaking into chunks. But to test, I just store the bytes in a local file.
[Route("api/testbytes/")]
[AcceptVerbs("GET", "POST")]
public void TestBytes()
{
var url = "https://filebytetest9845.blob.core.windows.net/files/helloworld.bmp";
var fileRequest = (HttpWebRequest) WebRequest.Create(url);
var fileResponse = (HttpWebResponse) fileRequest.GetResponse();
if (fileResponse.StatusCode == HttpStatusCode.OK)
{
if (fileResponse.ContentLength > 0)
{
var responseStream = fileResponse.GetResponseStream();
if (responseStream != null)
{
var contents = new byte[fileResponse.ContentLength];
responseStream.Read(contents, 0, (int) fileResponse.ContentLength);
if (!Directory.Exists(#"C:\Temp\Bytes\")) Directory.CreateDirectory(#"C:\Temp\Bytes\");
using (var fs = System.IO.File.Create(#"C:\Temp\Bytes\helloworldbytes.bmp"))
{
fs.Write(contents, 0, (int) fileResponse.ContentLength);
}
}
}
}
}
Here's the original bitmap: (image in the original post)
And here's the version saved to disk: (image in the original post)
As you can see they are different, but my code should just be saving a byte-for-byte copy. Why are they different?
Try this:
    var contents = new byte[fileResponse.ContentLength];
    int totalRead = 0;
    while (totalRead < fileResponse.ContentLength)
    {
        totalRead += responseStream.Read(contents, totalRead, (int)fileResponse.ContentLength - totalRead);
    }
It looks like it can't download the whole image in a single Read call, so you have to call Read again until the whole image is downloaded.
Atomosk is right - a single Read call can't read the whole response. If you are using .NET 4+, you can use this code to read the full response stream:
    var fileResponse = (HttpWebResponse)fileRequest.GetResponse();
    if (fileResponse.StatusCode == HttpStatusCode.OK)
    {
        var responseStream = fileResponse.GetResponseStream();
        if (responseStream != null)
        {
            using (var ms = new MemoryStream())
            {
                responseStream.CopyTo(ms);
                ms.Position = 0;
                using (var fs = System.IO.File.Create(@"C:\Temp\Bytes\helloworldbytes.bmp"))
                {
                    ms.CopyTo(fs);
                }
            }
        }
    }
Using this code you don't need to know the Content-Length, since it is not always available.

Displaying the contents of a Zip archive in WinRT

I want to iterate through the contents of a zipped archive and, where the contents are readable, display them. I can do this for text based files, but can't seem to work out how to pull out binary data from things like images. Here's what I have:
    var zipArchive = new System.IO.Compression.ZipArchive(stream);
    foreach (var entry in zipArchive.Entries)
    {
        using (var entryStream = entry.Open())
        {
            if (IsFileBinary(entry.Name))
            {
                using (BinaryReader br = new BinaryReader(entryStream))
                {
                    //var fileSize = await reader.LoadAsync((uint)entryStream.Length);
                    var fileSize = br.BaseStream.Length;
                    byte[] read = br.ReadBytes((int)fileSize);
                    binaryContent = read;
                }
            }
        }
    }
I can see inside the zip file, but calls to Length result in an OperationNotSupported error. Also, given that I'm getting a long and then having to cast to an integer, it feels like I'm missing something quite fundamental about how this should work.
I think the stream will decompress the data as it is read, which means that the stream cannot know the decompressed length. Calling entry.Length should return the correct size value that you can use. You can also call entry.CompressedLength to get the compressed size.
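A minimal sketch of that suggestion, reusing the question's own IsFileBinary helper and sizing the read from entry.Length rather than the entry stream:
    foreach (var entry in zipArchive.Entries)
    {
        if (!IsFileBinary(entry.Name)) continue;

        using (var entryStream = entry.Open())
        using (var br = new BinaryReader(entryStream))
        {
            // entry.Length is the uncompressed size; the entry stream itself
            // does not support Length, which is what caused the exception.
            byte[] binaryContent = br.ReadBytes((int)entry.Length);
            // binaryContent now holds the decompressed bytes of this entry.
        }
    }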
Just copy the stream into a file or another stream:
    using (var fs = await file.OpenStreamForWriteAsync())
    {
        using (var src = entry.Open())
        {
            var buffLen = 1024;
            var buff = new byte[buffLen];
            int read;
            while ((read = await src.ReadAsync(buff, 0, buffLen)) > 0)
            {
                await fs.WriteAsync(buff, 0, read);
                await fs.FlushAsync();
            }
        }
    }
