Read last line from website without saving file on disk - c#

I have a website with many large CSV files (up to 100,000 lines each). From each CSV file, I need to read the last line in the file. I know how to solve the problem when I save the file on disk before reading its content:
var url = "http://data.cocorahs.org/cocorahs/export/exportreports.aspx?ReportType=Daily&Format=csv&Date=1/1/2000&Station=UT-UT-24";
var client = new System.Net.WebClient();
var tempFile = System.IO.Path.GetTempFileName();
client.DownloadFile(url, tempFile);
var lastLine = System.IO.File.ReadLines(tempFile).Last();
Is there any way to get the last line without saving a temporary file on disk?
I tried:
using (var stream = client.OpenRead(seriesUrl))
{
using (var reader = new StreamReader(stream))
{
var lastLine = reader.ReadLines("file.txt").Last();
}
}
but the StreamReader class does not have a ReadLines method ...

StreamReader does not have a ReadLines method, but it does have a ReadLine method to read the next line from the stream. You can use it to read the last line from the remote resource like this:
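using (var stream = client.OpenRead(seriesUrl))
{
    using (var reader = new StreamReader(stream))
    {
        string lastLine = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Remember the most recent line; the loop variable itself is null once the loop ends
            lastLine = line;
        }
        // lastLine now contains the very last line from reader (null if the stream was empty)
    }
}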
Reading one line at a time with ReadLine will use less memory compared to StreamReader.ReadToEnd, which will read the entire stream into memory as a string. For CSV files with 100,000 lines this could be a significant amount of memory.
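If you prefer the File.ReadLines style from the question, a small iterator over any TextReader gives similar lazy behaviour. The following is only a sketch: LineReaderExtensions and ReadLines are hypothetical names (not part of the framework), and the StringReader stands in for the StreamReader wrapped around the download stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LineReaderExtensions
{
    // Hypothetical helper: yields lines one at a time, so only the
    // current line has to be held in memory.
    public static IEnumerable<string> ReadLines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

static class Program
{
    static void Main()
    {
        // StringReader stands in for new StreamReader(client.OpenRead(url)).
        using (var reader = new StringReader("header\nrow 1\nrow 2\n"))
        {
            // LastOrDefault returns null instead of throwing on an empty stream.
            string lastLine = reader.ReadLines().LastOrDefault();
            Console.WriteLine(lastLine); // prints "row 2"
        }
    }
}
```

The whole response still has to be downloaded, because HTTP gives no way to jump to the last line without server support; the saving is in memory, not in network traffic.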

This worked for me, though the service did not return data (Headers of CSV only):
public void TestMethod1()
{
var url = "http://data.cocorahs.org/cocorahs/export/exportreports.aspx?ReportType=Daily&Format=csv&Date=1/1/2000&Station=UT-UT-24";
var client = new System.Net.WebClient();
using (var stream = client.OpenRead(url))
{
using (var reader = new StreamReader(stream))
{
var str = reader.ReadToEnd().Split('\n').Where(x => !string.IsNullOrEmpty(x)).LastOrDefault();
Debug.WriteLine(str);
Assert.IsNotEmpty(str);
}
}
}

Related

Read an Excel File from an Amazon S3 Bucket using c#

I'm trying to read an Excel file from my S3 bucket. In the response stream, I am getting values like "PK\u0003\u0004\n\0\0\0\0\0�N0\0\0\0\0\t\0\0\0docProps". Could anyone help me map the stream to a data table or convert it to a string? Also, when I inspect the stream in Quick Watch, the Read and Write Timeout properties show errors.
using (var _client = new AmazonS3Client(accKey, secKey, Amazon.RegionEndpoint.USEast1))
using (var response1 = await _client.GetObjectAsync("rrrrr","mmm.xls"))
using (var responseStream = response1.ResponseStream)
using (var reader = new StreamReader(responseStream))
{
var title = response1.Metadata["x-amz-meta-title"];
var contentType = response1.Headers["Content-Type"];
responseBody = reader.ReadToEnd();
string line;
string[] columns = null;
// Here the reader.ReadLine receiving only null values
while ((line = reader.ReadLine()) != null)
{
columns = line.Split(',');
string col1 = columns[0]; }
}

How to read chunk of file in WebAPI when file is large

I have a big file that I want to send to a Web API, which will forward it to Amazon. Since the file is big, I want to send it to Amazon in chunks.
So if I have a 1 GB file, I want my API to receive it in, say, 20 MB chunks, so that I can send each chunk to Amazon and then receive the next 20 MB. How can this be done? Below is my attempt.
public async Task<bool> Upload()
{
var fileuploadPath = ConfigurationManager.AppSettings["FileUploadLocation"];
var provider = new MultipartFormDataStreamProvider(fileuploadPath);
var content = new StreamContent(HttpContext.Current.Request.GetBufferlessInputStream(true));
// Now code below writes to a folder, but I want to make sure I read it as soon as I receive some chunk
await content.ReadAsMultipartAsync(provider);
return true;
}
Pseudo Code:
While (await content.ReadAsMultipartAsync(provider) == 20 MB chunk)
{
//Do something
// Then again do something with rest of chunk and so on.
}
The file can be as large as 1 GB.
At the moment the entire file is read at once by this line of code:
await content.ReadAsMultipartAsync(provider);
I am lost here, please help. All I want is to receive the file in small chunks and process each one.
P.S.: I am sending the file as multipart/form-data from Postman to test.
Attempt No 2:
var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
foreach (var content in filesReadToProvider.Contents)
{
var stream = await content.ReadAsStreamAsync();
using (StreamReader sr = new StreamReader(stream))
{
string line = "";
while ((line = sr.ReadLine()) != null)
{
using (MemoryStream outputStream = new MemoryStream())
using (StreamWriter sw = new StreamWriter(outputStream))
{
sw.WriteLine(line);
sw.Flush();
// Do Something
}
}
}
}
No time to test this, but the ReadBlock method seems to be what you want to use.
Should look something like what I have below, but it assumes all your other code is good and you just needed some help with the buffering. This is a "blocking" read operation, but there is also a ReadBlockAsync method which returns a Task.
const int bufferSize = 1024;
var filesReadToProvider = await Request.Content.ReadAsMultipartAsync();
foreach (var content in filesReadToProvider.Contents)
{
    var stream = await content.ReadAsStreamAsync();
    using (StreamReader sr = new StreamReader(stream))
    {
        int charsRead;
        char[] buffer = new char[bufferSize];
        while ((charsRead = sr.ReadBlock(buffer, 0, bufferSize)) > 0)
        {
            // Do something with the first <charsRead> characters of buffer,
            // not with <bufferSize>: ReadBlock returns the number of
            // characters actually read, which can be smaller on the last call
        }
    }
}
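Note that StreamReader.ReadBlock decodes the data as text and counts characters, which may not be what you want for a binary upload. Reading bytes straight from the Stream is probably a closer fit. Here is a rough sketch of that idea; ChunkReader and ReadChunks are hypothetical names, the MemoryStream stands in for the uploaded file's stream, and the chunk size is tiny for demonstration (20 MB would be 20 * 1024 * 1024).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkReader
{
    // Hypothetical helper: yields the stream's content in chunks of at most chunkSize bytes.
    public static IEnumerable<byte[]> ReadChunks(Stream stream, int chunkSize)
    {
        var buffer = new byte[chunkSize];
        while (true)
        {
            // Stream.Read may return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends.
            int filled = 0;
            int read;
            while (filled < chunkSize &&
                   (read = stream.Read(buffer, filled, chunkSize - filled)) > 0)
            {
                filled += read;
            }
            if (filled == 0) yield break; // end of stream

            // Copy out exactly the bytes that were read.
            var chunk = new byte[filled];
            Buffer.BlockCopy(buffer, 0, chunk, 0, filled);
            yield return chunk;

            if (filled < chunkSize) yield break; // that was the last, partial chunk
        }
    }

    static void Main()
    {
        // MemoryStream stands in for the uploaded file's stream.
        using (var stream = new MemoryStream(new byte[50]))
        {
            foreach (byte[] chunk in ReadChunks(stream, 20))
            {
                Console.WriteLine(chunk.Length); // prints 20, 20, 10
            }
        }
    }
}
```

Each chunk is a fresh array, so it can be handed to an uploader without being overwritten by the next read. Whether the request body actually arrives unbuffered depends on how the stream is obtained, which this sketch does not address.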

Download zip file from the server and parsing it

I am trying to download a zipped file from the server and show the content of each file in the zipped folder in the view.
I wrote separate code for the case where the file is on my laptop. It loops over each file and displays its content:
static void Main(string[] args)
{
string filePath = "C:\\ACL Data\\New folder\\files.zip";
var zip= new ZipInputStream(File.OpenRead(filePath));
var filestream=new FileStream(filePath,FileMode.Open,FileAccess.Read);
ZipFile zipfile = new ZipFile(filestream);
ZipEntry item;
while ((item = zip.GetNextEntry()) != null)
{
Console.WriteLine(item.Name);
using (StreamReader s = new StreamReader(zipfile.GetInputStream(item)))
{
Console.WriteLine(s.ReadToEnd());
}
}
Console.Read();
}
I am using the SharpZipLib library to implement this.
That covers the case where the zip file is stored locally. My next task is to handle a zip file located on the server. I am trying to figure out how to implement it; below is the code I assume should work:
static void Main(string[] args)
{
string url = "https://test/code/304fd9c6-7e53-42a2-845a-624608bfd2ce.zip";
WebRequest webRequest = WebRequest.Create(url);
webRequest.Method = "GET";
WebResponse webResponse = webRequest.GetResponse();
var zip = new ZipInputStream(webResponse.GetResponseStream());
ZipEntry item1;
//var zip= new ZipInputStream(File.OpenRead(filePath));
var filestream = new FileStream(filepath, FileMode.Open, FileAccess.Read);
ZipFile zipfile = new ZipFile(filestream);
ZipEntry item;
while ((item = zip.GetNextEntry()) != null)
{
Console.WriteLine(item.Name);
using (StreamReader s = new StreamReader(zipfile.GetInputStream(item)))
{
Console.WriteLine(s.ReadToEnd());
}
}
Console.Read();
}
I am stuck at this part: var filestream = new FileStream(filepath, FileMode.Open, FileAccess.Read);
It expects the first parameter to be the path of the zip file, but in the new scenario the zip file is located remotely on the server. What should the parameter be in this case?
Your original code opens the file twice on the following lines, which I think is causing some confusion:
var zip= new ZipInputStream(File.OpenRead(filePath));
var filestream=new FileStream(filePath,FileMode.Open,FileAccess.Read);
There is an overload of the ZipFile constructor that takes any Stream rather than specifically a FileStream, which you, unsurprisingly, can only create for files.
However, you cannot use the stream returned by GetResponseStream directly, because its CanSeek property is false. It is a network stream, which can only be read once from beginning to end, while SharpZipLib's ZipFile needs random access to read the file contents.
Depending on the size of the ZIP file, loading it into memory may be an option. If you expect large files, writing it to a temporary file may be better.
This should do the trick. It enumerates the entries through ZipFile alone instead of using both ZipInputStream and ZipFile:
string url = "https://test/code/304fd9c6-7e53-42a2-845a-624608bfd2ce.zip";
WebRequest webRequest = WebRequest.Create(url);
webRequest.Method = "GET";
WebResponse webResponse = webRequest.GetResponse();
using (var responseStream = webResponse.GetResponseStream())
using (var ms = new MemoryStream())
{
// Copy entire file into memory. Use a file if you expect a lot of data
responseStream.CopyTo(ms);
var zipFile = new ZipFile(ms);
foreach (ZipEntry item in zipFile)
{
Console.WriteLine(item.Name);
using (var s = new StreamReader(zipFile.GetInputStream(item)))
{
Console.WriteLine(s.ReadToEnd());
}
}
}
Console.Read();
PS: Starting with .NET 4.5, support for ZIP files is built in. See the ZipArchive class.
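As a rough illustration of the built-in route, the sketch below reads entries with ZipArchive from a MemoryStream. To keep it self-contained it first builds a tiny archive in memory; the entry name hello.txt and its content are made up for the example, and in the download scenario the MemoryStream would instead be filled with responseStream.CopyTo(ms). On .NET Framework this assumes a reference to the System.IO.Compression assembly.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZipArchiveDemo
{
    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Build a small archive in memory (stand-in for the downloaded ZIP).
            // leaveOpen: true keeps ms usable after the archive is disposed.
            using (var writeArchive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = writeArchive.CreateEntry("hello.txt");
                using (var writer = new StreamWriter(entry.Open()))
                {
                    writer.Write("Hello from the archive");
                }
            }

            ms.Position = 0; // rewind before reading

            using (var archive = new ZipArchive(ms, ZipArchiveMode.Read))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    Console.WriteLine(entry.FullName); // prints "hello.txt"
                    using (var reader = new StreamReader(entry.Open()))
                    {
                        Console.WriteLine(reader.ReadToEnd()); // prints "Hello from the archive"
                    }
                }
            }
        }
    }
}
```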

Read the content of an xml file within a zip package

I need to read the contents of an .xml file using a Stream (the xml file is inside a zip package). In the code below, I need to get the file path at run time (here I have hardcoded the path for reference). Please let me know how to read the file path at run time.
I have tried string s = entry.FullName.ToString(); but I get the error "Could not find the Path". I have also tried hardcoding the path as shown below, but I get the same FileNotFound error.
string metaDataContents;
using (var zipStream = new FileStream(@"C:\OB10LinuxShare\TEST1\Temp" + "\\" + zipFileName + ".zip", FileMode.Open))
using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
{
foreach (var entry in archive.Entries)
{
if (entry.Name.EndsWith(".xml"))
{
FileInfo metadataFileInfo = new FileInfo(entry.Name);
string metadataFileName = metadataFileInfo.Name.Replace(metadataFileInfo.Extension, String.Empty);
if (String.Compare(zipFileName, metadataFileName, true) == 0)
{
using (var stream = entry.Open())
using (var reader = new StreamReader(stream))
{
metaDataContents = reader.ReadToEnd();
clientProcessLogWriter.WriteToLog(LogWriter.LogLevel.DEBUG, "metaDataContents : " + metaDataContents);
}
}
}
}
}
I have also tried to get the contents of the .xml file using a Stream object as shown below, but here I get the error "Stream was not readable".
Stream metaDataStream = null;
string metaDataContent = string.Empty;
using (Stream stream = entry.Open())
{
metaDataStream = stream;
}
using (var reader = new StreamReader(metaDataStream))
{
metaDataContent = reader.ReadToEnd();
}
Please suggest how to read the contents of the xml within a zip file using Stream and StreamReader, specifying the file path at run time.
Your second code snippet is failing because when you reach the end of the first using statement:
using (Stream stream = entry.Open())
{
metaDataStream = stream;
}
... the stream will be disposed. That's the point of a using statement. The same sort of code works fine as long as you load the XML file while the stream is still open:
XDocument doc;
using (Stream stream = entry.Open())
{
doc = XDocument.Load(stream);
}
That's to load it as XML... if you really just want the text, you could use:
string text;
using (Stream stream = entry.Open())
{
using (StreamReader reader = new StreamReader(stream))
{
text = reader.ReadToEnd();
}
}
Again, note how this is reading before it hits the end of either using statement.
Here is a sample of how to read a zip file using .NET 4.5:
private void readZipFile(String filePath)
{
    String fileContents = "";
    if (System.IO.File.Exists(filePath))
    {
        // Dispose the archive when done so the file handle is released
        using (System.IO.Compression.ZipArchive apcZipFile = System.IO.Compression.ZipFile.Open(filePath, System.IO.Compression.ZipArchiveMode.Read))
        {
            foreach (System.IO.Compression.ZipArchiveEntry entry in apcZipFile.Entries)
            {
                if (entry.Name.ToUpper().EndsWith(".XML"))
                {
                    // Open the entry directly; looking it up again with GetEntry(entry.Name)
                    // would fail for entries inside sub-folders, since GetEntry expects the full name
                    using (System.IO.StreamReader sr = new System.IO.StreamReader(entry.Open()))
                    {
                        // read the contents into a string
                        fileContents = sr.ReadToEnd();
                    }
                }
            }
        }
    }
}

Reading from large text files in c# causing memory leak

I am trying to read from a large text file with one word on each line and put all the values into an SQL database. With a small text file this works fine, but with a larger text file, say 300,000 lines, I run out of memory.
What is the best way to avoid this? Is there a way to read only a portion of the file, add it to the database, release it from memory, and move on to the next portion?
Here is my code so far:
string path = Server.MapPath("~/content/wordlist.txt");
StreamReader word_stream = new StreamReader(path);
string wordlist = word_stream.ReadToEnd();
string[] all_words = wordlist.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
I then loop through the array, adding each value to the database, but when the file is too large it simply doesn't work.
Do it like this:
// Choose the size of the buffer according
// to your requirements and/or available memory.
int bufferSize = 256 * 1024 * 1024;
string path = Server.MapPath("~/content/wordlist.txt");
using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
using (BufferedStream bufferedStream = new BufferedStream(stream, bufferSize))
using (StreamReader reader = new StreamReader(bufferedStream))
{
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        // ... put line into DB ...
    }
}
Also, do not forget exception handling.
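Try it with yield return:
// yield return must live inside an iterator method; the method name is illustrative
static IEnumerable<string> ReadWords(string path)
{
    using (StreamReader r = new StreamReader(path))
    {
        string line;
        while ((line = r.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
You could, for example, read ten lines, yield return them, write them to the database, and then move on to the next portion.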
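The "read a portion, write it, move on" idea could look roughly like this. It is only a sketch: BatchDemo and InBatches are hypothetical names, a temporary file stands in for wordlist.txt, and Console.WriteLine marks the spot where the database insert would go.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchDemo
{
    // Hypothetical helper: groups a lazy sequence into lists of at most batchSize items.
    static IEnumerable<List<string>> InBatches(IEnumerable<string> lines, int batchSize)
    {
        var batch = new List<string>(batchSize);
        foreach (string line in lines)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize); // start a fresh batch
            }
        }
        if (batch.Count > 0) yield return batch; // remaining lines
    }

    static void Main()
    {
        // A temporary file stands in for wordlist.txt.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "beta", "gamma", "delta", "epsilon" });

        // File.ReadLines streams the file lazily instead of loading it all at once.
        foreach (List<string> batch in InBatches(File.ReadLines(path), 2))
        {
            // Placeholder for inserting one batch into the database.
            Console.WriteLine(string.Join(",", batch));
        }
        // Prints "alpha,beta", then "gamma,delta", then "epsilon"

        File.Delete(path);
    }
}
```

Only one batch is alive at a time, so memory use depends on the batch size rather than on the size of the file.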
