Unable to upload large files to Google Docs - c#

I am uploading documents on Google Docs as:
DocumentsService myService = new DocumentsService("");
myService.setUserCredentials("username@domain.com", password);
DocumentEntry newEntry = myService.UploadDocument(@"C:\Sample.txt", "Sample.txt");
But when I try to upload a file of 3 MB I get an exception:
An unhandled exception of type
'Google.GData.Client.GDataRequestException'
occurred in Google.GData.Client.dll
Additional information: Execution of
request failed:
http://docs.google.com/feeds/documents/private/full
How can I upload large files to Google Docs?
I am using Google API ver 2.

In your code you are attempting to upload a .txt file.
This will be converted to a Google Docs "Document" file.
Files that can be converted are limited to 1 million characters (or 2MB in size).
If you change the extension to something that is not recognised by Google Docs (for example .log), it will allow you to upload a file of up to 10GB, although the free account only has 1GB of storage for files.
This will allow you to store and retrieve files via your application, but the user will not be able to directly modify them with the Google Docs interface, although they can still download them.
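A minimal sketch of that workaround, reusing the UploadDocument call from the question (the temporary .log copy is an assumption, just one way to present an unrecognised extension):
// Requires System.IO in addition to the Google.GData namespaces from the question.
// Copy to an extension Google Docs won't convert, so the file is stored as-is.
string uploadPath = Path.ChangeExtension(@"C:\Sample.txt", ".log");
File.Copy(@"C:\Sample.txt", uploadPath, true);
try
{
    DocumentEntry entry = myService.UploadDocument(uploadPath, Path.GetFileName(uploadPath));
}
finally
{
    File.Delete(uploadPath); // clean up the temporary copy
}
The upload code stays identical; only the extension seen by Google Docs changes.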

There is a limit on the size of files being uploaded:
http://docs.google.com/support/bin/answer.py?hl=en&answer=37603
Note that the document is converted to HTML, and the post-conversion size is what the limit applies to.
If you could post some more specifics I could probably come up with a creative solution. What comes to mind so far:
Break the document up into smaller documents and link them together, either in the file name or via an actual link in the document.
Pre-process the document into streamlined text (not sure what kind of files you need to upload).
Upload as stored files and maybe have a Google Doc that loads the content in an iframe or something similar.
But yeah, if you give me more details, I can think it out if you like.

Check the terms and conditions to find out whether they support larger files. Also, there will be timeouts set in the library; see if you can increase the timeout values in your GData.Client.
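For example, the timeout on the underlying request factory can be raised. A minimal sketch, assuming GDataRequestFactory exposes its Timeout in milliseconds (verify against your library version):
DocumentsService myService = new DocumentsService("");
GDataRequestFactory factory = (GDataRequestFactory)myService.RequestFactory;
factory.Timeout = 10 * 60 * 1000; // ten minutes instead of the default, to survive a slow 3MB upload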

You have a file limit of 500Kb for documents to be converted.
http://docs.google.com/support/bin/answer.py?hl=en&answer=37603

Related

Download only the first few bytes of an Azure Blob

I need to search many millions of jpeg files stored in Azure Blob Storage to find ones that are corrupt. It is a specific type of corruption where all the bytes in the file are 0. I should be able to tell if the file is corrupt by inspecting the header, which is in the first several bytes of the file. I don't want to have to download the entire file since it will cost money and time to do so.
I'm using the Microsoft.Azure.Storage.Blob v11.1.2 NuGet package and have seen a few methods that looked promising, such as CloudBlockBlob.DownloadToByteArrayAsync and CloudBlockBlob.DownloadToStreamAsync, but they appear to download the entire file (well, DownloadToByteArrayAsync threw an exception when I tried giving it a small array).
Any help is appreciated.
See DownloadRangeToStreamAsync and DownloadRangeToByteArrayAsync. "Range" is the key term here, as it refers to the HTTP Range header, which broadly captures the notion of only downloading part of a resource. See here for how the SDK works under the hood with the Blob REST API.
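A minimal sketch of that approach with the v11 package (the helper name and the 16-byte header length are illustrative assumptions):
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob;

static class BlobChecks
{
    // Downloads only the first few bytes of the blob via an HTTP Range request
    // and reports whether they are all zero.
    public static async Task<bool> IsHeaderAllZeroAsync(CloudBlockBlob blob, int headerLength = 16)
    {
        var buffer = new byte[headerLength];
        // (target, index, blobOffset, length) - only headerLength bytes come over the wire.
        int bytesRead = await blob.DownloadRangeToByteArrayAsync(buffer, 0, 0, headerLength);
        for (int i = 0; i < bytesRead; i++)
        {
            if (buffer[i] != 0)
                return false; // found a non-zero byte, so not the all-zero corruption
        }
        return true;
    }
}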

Temporary files without saving to HDD

Here is my case:
I'm using ABCPDF to generate an HTML document from a .DOCX file that I need to show on the web.
When you export to HTML from ABCPDF, you generate an HTML file and a folder with support files (.css, .js, .png).
Now these HTML files may contain quite sensitive data, so immediately after generating the files, I move them to a password-protected .zip file (from which I fetch them later).
The problem is that this leaves the files unencrypted on the HDD for a few seconds, and even longer if I'm (for some reason) unable to delete them at once.
I'd like suggestions for another way of doing this. I've looked into a RAM drive, but I'm not happy with installing such drivers on my servers. (AND the RAM drive would still be accessible from the OS.)
The cause of the problem here might be that ABCPDF can only export HTML as files (since it's multiple files) and not as a stream.
Any ideas?
I'm using .NET 4.6.x and c#
Since all the files except the .HTML are anonymous, you can use the suggested way of writing the HTML to a stream; only the other files will be stored to the file system.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
When saving to a Stream the format can be indicated using a Doc.SaveOptions.FileExtension property such as ".htm" or ".xps". For HTML you must provide a sensible value for the Doc.SaveOptions.Folder property.
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/xsaveoptions/2-properties/folder.htm
This property specifies the folder where additional data such as images and fonts are stored. It is only used when exporting documents to HTML; it is ignored otherwise.
For a start, try using a simple MemoryStream to hold the sensitive data. If you get large files or high traffic, open an encrypted stream to a file on your system.
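A minimal sketch of that combination, based on the Save and Folder documentation quoted above (the helper shape is an assumption, and the ABCpdf namespace varies by version):
using System.IO;
using WebSupergoo.ABCpdf11; // adjust to your ABCpdf version

static class HtmlExport
{
    // Writes the sensitive HTML to memory; only the anonymous support files
    // (.css, .js, .png) are written to supportFolder on disk.
    public static byte[] SaveHtmlToMemory(Doc doc, string supportFolder)
    {
        doc.SaveOptions.FileExtension = ".htm"; // tells Save() which format to produce
        doc.SaveOptions.Folder = supportFolder; // where images, fonts etc. are stored
        using (var ms = new MemoryStream())
        {
            doc.Save(ms);        // the HTML itself never touches the HDD
            return ms.ToArray(); // e.g. feed this straight into the password-protected zip
        }
    }
}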

Upload large files to sitecore media library

I have a media-centric website that requires us to upload large images and videos to the media library.
I have the default values for the following settings in web.config:
Media.MaxSizeInDatabase (20MB)
httpRuntime maxRequestLength
I do not want to increase MaxSizeInDatabase limit on the production server for security reasons.
Also, Media.UploadAsFiles is set to false.
So, my question is - Is there a way to configure sitecore such that if the file being uploaded is less than 20MB, it gets stored in the database and the files larger than 20MB get stored on the file system?
As Martijn says, there is nothing built in to automatically detect this, but if you know that the file is going to be large (or the upload fails due to the large size) then you can manually "force it" to save to file on a per upload basis.
You need to use the Advanced Upload option and select the "Upload as Files" option.
EDIT: If you are able to use YouTube, then consider the following modules, which are nicely/tightly integrated with Sitecore. There are a couple of other ways of achieving the same thing for different providers.
YouTube Integration
YouTube Uploader
No, not that I know of. At least not automatically. Uploaded files are either stored in the DB or on the filesystem, based on your setting.
You might want to create an override upload method which could automatically handle this for you or use the manual checkbox in the Advanced Media Upload method as Jammykam says.
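As a rough sketch of such an override, a uiUpload pipeline processor could flip the upload to file-based storage whenever a file exceeds the database limit (the FileOnly member and the exact pipeline wiring are assumptions; verify against your Sitecore version):
using System.Web;
using Sitecore.Pipelines.Upload;

// Hypothetical processor, patched into the uiUpload pipeline before the Save step:
// large files are routed to the file system, smaller ones stay in the database.
public class RouteLargeUploadsToDisk
{
    private const long MaxDatabaseSize = 20 * 1024 * 1024; // mirror Media.MaxSizeInDatabase

    public void Process(UploadArgs args)
    {
        foreach (string key in args.Files)
        {
            HttpPostedFile file = args.Files[key];
            if (file != null && file.ContentLength > MaxDatabaseSize)
            {
                args.FileOnly = true; // same effect as the Advanced Upload "Upload as Files" checkbox
                return;
            }
        }
    }
}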

Limit the length of a video uploaded to YouTube via C#

I'm working on a project that will use the .NET wrapper for the YouTube API. We will provide a form to users where they can upload a video and it will get posted to a specific page on YouTube. We'd like to limit the length of videos that are uploaded to 60 seconds. Is it possible to set such a length limit at the C#-level in the upload code? I was unable to find anything specific about this in the API docs.
I suspect that this cannot be done as you need to upload the actual video first to determine its length.
You will have to resort to saving the file locally on the server before transmitting it to YouTube. You would then have to use a Media Library to load the video and confirm its length before doing any further processing.
See this for an example.
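One way to do that local check is a media metadata library such as TagLib# (an assumption; any library that reads video duration would work):
using System;

static class VideoChecks
{
    // Reads the duration from the locally saved file before transmitting it to YouTube.
    public static bool IsWithinLimit(string path, int maxSeconds = 60)
    {
        var file = TagLib.File.Create(path); // TagLib# infers the container format from the file
        TimeSpan duration = file.Properties.Duration;
        return duration.TotalSeconds <= maxSeconds; // reject anything over the limit
    }
}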
I haven't used the YouTube API, but an alternative may be to upload the video, check its length via YouTube, and remove it if it violates your limits.
You are correct. You would have to upload and then check the file attributes to determine length. Theoretically, you can query this while streaming, as you can look at the metadata in the header. I have never queried video, so I am not sure how this is formatted. If you head this direction, you can abort the stream if the header has a length attribute greater than 60 seconds.
A possible issue here is certain types of media files don't contain the length attribute. I am not sure about the types one can upload to YouTube, however.

how to find the timestamp of an online pdf file using c#?

I am writing an application that would download and replace a pdf file only if the timestamp is newer than that of the already existing one ...
I know it's possible to read the timestamp of a file on a local computer via the line of code below:
MessageBox.Show(File.GetCreationTime("C:\\test.pdf").ToString());
Is it possible to read the timestamp of a file that is online, without downloading it?
Unless the directory containing the file on the site is configured to show raw file listings there's no way to get a timestamp for a file via HTTP. Even with raw listings you'd need to parse the HTML yourself to get at the timestamp.
If you had FTP access to the files then you could do this. If just using the basic FTP capabilities built into the .NET Framework you'd still need to parse the directory listing to get at the date. However there are third party FTP libraries that fill in the gaps such as editFTPnet where you get a FTPFile class.
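For a single known file, the framework's FtpWebRequest can also issue the FTP MDTM command directly, which avoids parsing a listing; a minimal sketch (URL and credentials are placeholders):
using System;
using System.Net;

static class FtpTimestamps
{
    // Asks the FTP server for the modification timestamp of one file.
    public static DateTime GetRemoteTimestamp(string ftpUrl, string user, string password)
    {
        var request = (FtpWebRequest)WebRequest.Create(ftpUrl); // e.g. "ftp://example.com/test.pdf"
        request.Method = WebRequestMethods.Ftp.GetDateTimestamp;
        request.Credentials = new NetworkCredential(user, password);
        using (var response = (FtpWebResponse)request.GetResponse())
        {
            return response.LastModified; // the server's modification time for the file
        }
    }
}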
Updated:
Per comment:
If I were to set up a simple HTML file with the dates and filenames written manually, I could simply read that to find out which files have actually been updated and download just the required files. Is that a feasible solution?
That would be one approach. Or, if you have scripting available (ASP.NET, ASP, PHP, Perl, etc.), you could automate this and have the script get the timestamp of the file(s) and render them for you. Or you could write a very simple web service that returns a JSON or XML blob containing the timestamps for the files, which would be less hassle to parse than some HTML.
It's only possible if the web server explicitly serves that data to you. The creation date for a file is part of the file system. However, when you're downloading something over HTTP it's not part of a file system at that point.
HTTP doesn't have a concept of "files" in the way people generally think. Instead, what would otherwise be a "file" is transferred as response data with a response header that gives information about the data. The header can specify the type of the data (such as a PDF "file") and even specify a default name to use if the client decides to save the data as a file on the client's local file system.
However, even when saving that, it's a new file on the client's local file system. It has no knowledge of the original file which produced the data that was served by the web server.
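When the server does serve that data, the usual vehicle is the Last-Modified response header, which a HEAD request can fetch without downloading the body; a minimal sketch (whether the header is present depends entirely on the server):
using System;
using System.Net;

static class HttpTimestamps
{
    // Returns the Last-Modified header for a URL, or null if the server omits it.
    public static DateTime? TryGetLastModified(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url); // e.g. "http://example.com/test.pdf"
        request.Method = "HEAD"; // headers only, no response body
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            string header = response.Headers["Last-Modified"];
            return header != null ? response.LastModified : (DateTime?)null;
        }
    }
}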
