How to handle a download stream - C#

I need to download an XML file from a secure link and store its content in a database.
Can I use a TextReader, or do I need to save the file to my local file system first, then read the content back from the file system and store it in the database?
HttpWebRequest downloadRequest = (HttpWebRequest)WebRequest.Create("https://line-to-xml-file.xml");
string Content;
downloadRequest.Credentials = new NetworkCredential()
{
    UserName = this._userCredentials.UserName,
    Password = this._userCredentials.Password
};
downloadRequest.PreAuthenticate = true;

using (HttpWebResponse downloadHTTPResponse = (HttpWebResponse)downloadRequest.GetResponse())
using (Stream downloadResponseStream = downloadHTTPResponse.GetResponseStream())
using (TextReader tReader = new StreamReader(downloadResponseStream))
{
    Content = tReader.ReadToEnd();
}
return Content;
Since the remote file is huge (up to 100 MB), I can't inspect the content in the debugger.
And when I try to save it:
using (TransactionScope trans = new TransactionScope()) // <--- when execution reaches this line, an exception is thrown...
{
    // perform the update, save the content into the database
    // send a notification message to the message bus, indicating the content has been updated
}
It complains that the MSDTC transaction timed out / was cancelled.

Reading it into a stream should be fine... if you have any problem with that specific stream you could use a MemoryStream instead of a FileStream and use it the same way, but I doubt that is your issue.
Also make sure you open the connection RIGHT BEFORE you are going to save the stream, once you have it completely loaded... You can also adjust the command's CommandTimeout property if the save is taking really, really long; a value of 0 indicates no limit, but that should be avoided.
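A minimal sketch of that idea for SQL Server, assuming a hypothetical Documents table with an xml column (the table, column and connection string are illustrative, not from the question):
// Requires System.Data and System.Data.SqlClient.
using (SqlConnection conn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(
    "INSERT INTO Documents (Content) VALUES (@content)", conn))
{
    cmd.CommandTimeout = 300; // seconds; the default of 30 is easily exceeded by ~100 MB of XML
    cmd.Parameters.Add("@content", SqlDbType.Xml).Value = Content;

    conn.Open();              // open as late as possible, once Content is fully loaded
    cmd.ExecuteNonQuery();
}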

There is an XmlReader, but if the XML is already well-formed and you don't need to parse it (because your database is just taking it as a blob, or the database engine is going to parse it), then any reader will do.
It depends on your database schema.
It sounds like the database is having trouble inserting it, but more info is needed.

100 MB of text is a lot to cram into a table. Your statement is almost certainly timing out. Check your context and your SQL command object (if you have one) and increase the timeout value.

You will probably need to set a longer timeout to solve the timeout problem, and make sure you call the Complete method:
using (TransactionScope trans = new TransactionScope(
    TransactionScopeOption.Required,
    new TimeSpan(1, 4, 3))) // sets the timeout to 1 hour, 4 minutes and 3 seconds
{
    // perform the update, save the content into the database
    // send a notification message to the message bus, indicating the content has been updated
    trans.Complete();
}
As for reading the file, you could possibly use WebClient. WebClient lets you monitor the download progress, although the progress event is only raised by the asynchronous download methods.
WebClient wc = new WebClient();
wc.Credentials = new NetworkCredential()
{
    UserName = this._userCredentials.UserName,
    Password = this._userCredentials.Password
};
wc.DownloadProgressChanged += new DownloadProgressChangedEventHandler(wc_DownloadProgressChanged);
// DownloadProgressChanged only fires for the async methods, so use DownloadFileAsync;
// subscribe to DownloadFileCompleted if you need to know when the download finishes.
wc.DownloadFileAsync(new Uri("https://line-to-xml-file.xml"), "C:\\local.xml");
The handler can log or display the progress if necessary:
void wc_DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    // Log or show the current progress (e.ProgressPercentage or e.BytesReceived)
}
You could use DownloadString instead of DownloadFile if you want the string directly, without having to read it back from the file.
wc.DownloadString("https://line-to-xml-file.xml");

How about using WebClient and its DownloadFile method? Just save the file locally, use it, and delete it when you're done.
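A minimal sketch of that approach (the temp path and the SaveToDatabase call are placeholders, not part of the original post):
WebClient wc = new WebClient();
wc.Credentials = new NetworkCredential()
{
    UserName = this._userCredentials.UserName,
    Password = this._userCredentials.Password
};

string tempPath = Path.Combine(Path.GetTempPath(), "download.xml");
try
{
    wc.DownloadFile("https://line-to-xml-file.xml", tempPath); // save it locally
    string content = File.ReadAllText(tempPath);               // use it
    // SaveToDatabase(content);  // hypothetical: store the content however your schema requires
}
finally
{
    File.Delete(tempPath);                                     // delete upon completion of use
}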

Related

Download Large File from Azure Blob Storage, process it and send back to the Client

I have the following request flow where the customer can request to download a CSV file from the Server. The issue is that the blob file is too large and the customer has to wait a lot longer before the actual download starts (the customer thinks that there is some issue and closes the browser). How can the download be made more efficient using streams?
Current sequence is as below:
Request Sequence:
Client clicks the download button from the browser.
Backend receives the request.
Backend Server Downloads the Blob from the Azure Storage Account.
There is some custom processing that needs to be done.
Once the processing is completed, start sending the response back to the client.
Now the issue is that when using the DownloadTo(Stream) function of BlobBaseClient, the file is entirely downloaded to memory before I can do anything with it.
How can I download the blob file in chunks, do the processing and start sending it to the customer?
Part of Download Controller:
var contentDisposition = new ContentDispositionHeaderValue("attachment")
{
    FileName = "customer-file.csv",
    CreationDate = DateTimeOffset.UtcNow
};
Response.Headers.Add("Content-Disposition", contentDisposition.ToString());

var result = blobService.DownloadAndProcessContent();
foreach (var line in result)
{
    yield return line;
}
Response.BodyWriter.FlushAsync();
Part of DownloadAndProcessContent Function:
var stream = new MemoryStream();
var blob = container.GetAppendBlobClient(blobName);
blob.DownloadTo(stream);
stream.Position = 0; // rewind before reading; DownloadTo leaves the position at the end

// Processing is done on the blob data
var streamReader = new StreamReader(stream);
while (!streamReader.EndOfStream)
{
    string currentLine = streamReader.ReadLine();
    // process the line
    string processDataLine = ProcessData(currentLine);
    yield return processDataLine;
}
Did you consider using the built-in OpenRead method, so you can apply the StreamReader directly to the blob stream without needing a MemoryStream in the middle? This lets you process line by line, just as you do in your loop, without buffering the whole blob first.
Also note that it's recommended to take the async-await approach all the way through, so your controller code (made async) would be much more scalable: it won't block on I/O and turn the .NET thread pool into a bottleneck when handling concurrent requests to your API.
This answer doesn't address returning an HTTP response with streaming; that is separate from streaming a downloaded blob.
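A minimal sketch of that suggestion, assuming the Azure.Storage.Blobs v12 client already used in the question (ProcessData, container and blobName come from the question; the async iterator signature is illustrative and needs C# 8 / IAsyncEnumerable support):
public async IAsyncEnumerable<string> DownloadAndProcessContentAsync()
{
    var blob = container.GetAppendBlobClient(blobName);

    // OpenReadAsync returns a stream that fetches the blob in ranged chunks,
    // so the whole file never has to be buffered in a MemoryStream.
    using (Stream blobStream = await blob.OpenReadAsync())
    using (var reader = new StreamReader(blobStream))
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            yield return ProcessData(line); // same per-line processing as in the question
        }
    }
}
An async controller action can then await foreach over this sequence and write each processed line to the response as it is produced, instead of materialising the whole file first.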

Caching posted data and fall-backs

I'm currently working on a project that has an external site posting XML data to a specified URL on our site. My initial thought was to first save the XML data to a physical file on our server as a backup. I then insert the data into the cache, and from then on all requests for the data are served from the cache instead of the physical file.
At the moment I have the following:
[HttpPost]
public void MyHandler()
{
    // filePath = path to my xml file
    // Delete the previous file
    if (File.Exists(filePath))
        File.Delete(filePath);

    using (Stream output = File.OpenWrite(filePath))
    using (Stream input = Request.InputStream)
    {
        input.CopyTo(output);
    }

    // Deserialize and save the data to the cache.
    // Dispose the reader, otherwise it keeps a handle on the file and the next
    // POST fails because the file "is in use".
    using (var xml = new XmlTextReader(filePath))
    {
        var serializer = new XmlSerializer(typeof(MyClass));
        var myClass = (MyClass)serializer.Deserialize(xml);
        HttpContext.Current.Cache.Insert(myKey,
                                         myClass,
                                         null,
                                         myTimespan,
                                         Cache.NoSlidingExpiration,
                                         CacheItemPriority.Default, null);
    }
}
The issue I have is that I keep getting exceptions because the file I'm saving to "is in use" when a second post arrives to update the data.
A colleague suggested using a Mutex just before I left work on Friday, so I wonder if that is the correct approach here?
Basically I'm just trying to sanity-check that this is a good way of managing the data. I can see there's clearly an issue with how I'm writing the data to a file, but aside from that, does my approach make sense?
Thanks

Reading file after writing it

I have a strange problem. My code flow is as follows:
The exe takes some data from the user.
It calls a web service to write the file (and create a CSV from the data) at a particular network location (say \\some-server\some-directory). The web service is hosted on the same machine as that folder (i.e. I can also change the path to c:\some-directory). It returns after writing the file.
The exe checks whether the file exists; if it does, processing continues, otherwise it quits with an error.
The problem I am having is at step 3. When I try to read the file immediately after it has been written, I always get a file-not-found exception (but the file is present). I do not get this exception when I am debugging (because stepping through the code introduces a delay) or when I put a Thread.Sleep(3000) before reading the file.
This is really strange because I close the StreamWriter before the call returns to the exe, and according to the documentation, Close forces a flush of the stream. It is also not related to the size of the file, and I am not making any async calls for writing or reading; they run serially, one after the other (only the writing is done by the web service and the reading by the exe, but the calls are still serial).
It feels like there is some delay between calling Close() and the file actually being written to disk. This is baffling because it is not related to size at all; it happens for every file size. I have tried files with 10, 50, 100 and 200 lines of data.
Another thing I suspected was that, since I was writing the file to a network location, Windows might be optimizing the call by writing to a cache first and to the network location later. So I changed the code to write to a local drive (i.e. c:\some-directory) rather than the network location, but that resulted in the same error.
There is no error in the reading or writing code; as explained above, adding a delay makes it work fine. Some other useful information:
The exe is .Net Framework 3.5
Windows Server 2008(64 bit, 4 GB Ram)
Edit 1
File.AppendAllText() is not the correct solution, as it creates a new file if one does not exist.
Edit 2
code for writing
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
    using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
    {
        writer.WriteLine(someString);
    }
}
code for reading
StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath));
string header = rdr.ReadLine();
rdr.Close();
Edit 3
Used a TextWriter, same error:
using (TextWriter writer = File.CreateText(outFileName))
{
}
Edit 4
Finally, as suggested by some users, I now check for the file in a while loop a certain number of times before throwing the file-not-found exception.
int i = 1;
while (i++ < 10)
{
    bool fileExists = File.Exists(CsvFilePath);
    if (!fileExists)
        System.Threading.Thread.Sleep(500);
    else
        break;
}
So you are writing a stream to a file, then reading the file back to a stream? Do you need to write the file then post process it, or can you not just use the source stream directly?
If you need the file, I would use a loop that keeps checking if the file exists every second until it appears (or a silly amount of time has passed) - the writer would give you an error if you couldn't write the file, so you know it will turn up eventually.
Since you're writing over a network, the best option would be to save the file on the local system first and then copy it to the network location. This way you avoid network connection problems, and you also have a backup in case of a network failure; see the sketch below.
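A minimal sketch of that approach (the paths are placeholders, not from the original post):
// Write the file locally first, then copy the finished file to the network share.
string localPath = Path.Combine(Path.GetTempPath(), "out.csv");  // hypothetical local path
string networkPath = @"\\some-server\some-directory\out.csv";    // hypothetical UNC path

File.WriteAllText(localPath, someString, Encoding.Unicode);
File.Copy(localPath, networkPath, overwrite: true);              // only a complete file crosses the network
// The local copy also serves as a backup if the network copy fails.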
Based on your update, Try this instead:
File.WriteAllText(outFileName, someString);

string header = null;
using (StreamReader reader = new StreamReader(CsvFilePath))
{
    header = reader.ReadLine();
}
Have you tried reading only after the writer's FileStream has been disposed?
Like this:
using (FileStream fs = new FileStream(outFileName, FileMode.Create))
{
    using (StreamWriter writer = new StreamWriter(fs, Encoding.Unicode))
    {
        writer.WriteLine(someString);
    }
}

using (StreamReader rdr = new StreamReader(File.OpenRead(CsvFilePath)))
{
    string header = rdr.ReadLine();
}

Read only the title and/or META tag of HTML file, without loading complete HTML file

Scenario :
I need to parse millions of HTML files/pages (as fast as I can), read only the title and/or META part of each, and dump that to a database.
What I am doing is using the System.Net.WebClient class's DownloadString(url_path) to download, and then saving the result to the database via LINQ to SQL.
But DownloadString gives me the complete HTML source; I only need the title and META tag parts.
Any ideas on how to download only that much content?
I think you can open a stream for this URL and read only the first x bytes from it. I can't tell you the exact number, but you can set it to a reasonable value that covers the title and the description.
HttpWebRequest fileToDownload = (HttpWebRequest)HttpWebRequest.Create("YourURL");
using (WebResponse fileDownloadResponse = fileToDownload.GetResponse())
using (Stream fileStream = fileDownloadResponse.GetResponseStream())
using (StreamReader fileStreamReader = new StreamReader(fileStream))
{
    char[] x = new char[Number];
    int charsRead = fileStreamReader.Read(x, 0, Number);
    string data = new string(x, 0, charsRead); // only the characters actually read
}
I suspect that WebClient will try to download the whole page first, in which case you'd probably want a raw client socket. Send the appropriate HTTP request (manually, since you're using raw sockets), start reading the response (which will not arrive all at once) and kill the connection when you've read enough. However, the rest will probably already have been sent from the server and be winging its way to your PC whether you want it or not, so you might not save much bandwidth, if any.
Depending on what you want it for, many half decent websites have a custom 404 page which is a lot simpler than a known page. Whether that has the information you're after is another matter.
You can use the verb "HEAD" in an HttpWebRequest to return only the response headers (not the HTML head element). To get the title and the meta tags themselves you'll need to download the page and parse out the parts you want.
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "HEAD";
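Continuing that sketch, reading the headers back might look like this (uri is assumed to hold the page address; no body is transferred):
using (var response = (HttpWebResponse)request.GetResponse())
{
    Console.WriteLine(response.StatusCode);
    Console.WriteLine(response.ContentType);   // e.g. text/html; charset=utf-8
    Console.WriteLine(response.ContentLength); // size of the body you did not download
}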

Is there any issue with the below code?

I'm creating a Windows application which reads an XML file from a given server. This application is installed on about 30 clients, and they may call this function at the same time.
My question:
Will any problems occur if several users call this method at the same time?
public string GetXmlInnerText()
{
    FtpWebRequest tmpReq = null;
    System.Net.WebResponse tmpRes = null;
    try
    {
        if (Settings.Default.Internal)
            tmpReq = (FtpWebRequest)FtpWebRequest.Create("ftp://<IPhere>/XMLData.xml");
        else
            tmpReq = (FtpWebRequest)FtpWebRequest.Create("ftp://<IPhere>/XMLData.xml");

        tmpReq.Credentials = new System.Net.NetworkCredential("userName", "password");
        tmpReq.KeepAlive = false;
        tmpRes = tmpReq.GetResponse();
    }
    catch (Exception ex)
    {
        //------
    }

    string fileContents = null;
    using (System.IO.Stream tmpStream = tmpRes.GetResponseStream())
    {
        using (System.IO.TextReader tmpReader = new System.IO.StreamReader(tmpStream))
        {
            fileContents = tmpReader.ReadToEnd();
        }
    }
    return fileContents;
}
thanks
One problem - you're not disposing of the WebResponse. It implements IDisposable, so you should use a using statement. With your current structuring, that's not terribly easy to do - you should consider restructuring your try/catch blocks appropriately.
Further, StreamReader uses UTF-8 by default - if your XML documents aren't encoded in UTF-8, you could have problems. If it's an XML document, why not load it via XmlReader.Create(Stream) or something similar? That will handle the encoding for you.
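A restructured sketch along those lines (the URL, credentials and FTP details are carried over from the question; returning the document's OuterXml is just one option):
// Dispose the response and stream with using blocks, and let XmlReader
// pick up the encoding from the XML declaration instead of assuming UTF-8.
public string GetXmlInnerText()
{
    var tmpReq = (FtpWebRequest)WebRequest.Create("ftp://<IPhere>/XMLData.xml");
    tmpReq.Credentials = new NetworkCredential("userName", "password");
    tmpReq.KeepAlive = false;

    using (WebResponse tmpRes = tmpReq.GetResponse())
    using (Stream tmpStream = tmpRes.GetResponseStream())
    using (XmlReader reader = XmlReader.Create(tmpStream))
    {
        var doc = new XmlDocument();
        doc.Load(reader);
        return doc.OuterXml; // or hand the XmlDocument to the caller directly
    }
}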
I don't see a problem, as you are only reading from a file. The only issue could be the server prohibiting concurrent access for a single user account and restricting how many connections are allowed at the same time. In that case, you might be better off with a web service or script (e.g. PHP) delivering the XML via HTTP rather than FTP.
If you're wondering about multiple clients accessing the FTP server at once, it will depend on how the FTP server is set up.
Some will be set up to only allow 2 or 3 clients at once, whereas some will allow (almost) as many as you could ever need.
If the FTP server causes troubles, you could serve it through a HTTP server instead.
