.NET WebClient.DownloadData get file type?

In order to handle cases of downloading data from a url that has no file extension,
I need to know what the file type is.
For example, how can the WebClient.DownloadData method reveal that it downloaded a png [edit: jpeg] image using the URL below?
https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcTw4P3HxyHR8wumE3lY3TOlGworijj2U2DawhY9wnmcPKnbmGHg
I did not find anything in the documentation that describes how to do this.

If you trust the header information, this is possible to do using WebClient—you don't need to use HttpClient:
var webClient = new WebClient();
var result = webClient.DownloadData(url);
var contentType = webClient.ResponseHeaders["Content-Type"];
if (contentType != null &&
contentType.StartsWith("image", StringComparison.OrdinalIgnoreCase))
{
// it's probably an image
}

It can't, directly.
If you trust the headers the web server sends back, you could use a different HTTP client (e.g. WebRequest or HttpClient) to make the entire response available rather than just the body. You can then look at the Content-Type header.
Other than that, you'll need to look at the content itself. Various file types have "magic numbers" which you could use to identify the file - they're typically at the start of the file, and if you only have a limited set of file types to look for, this may well be a viable approach. It won't be able to identify all file types though.
As an example, the first four bytes of the image you've linked to are ff d8 ff e0. That reveals that it's actually a JPEG image, not a PNG. As it happens, the server response also included a header of content-type: image/jpeg.
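The magic-number approach can be sketched as a small lookup over the leading bytes returned by DownloadData. This is a minimal sketch, not a complete detector: the class and method names are made up for illustration, and only three common image signatures are covered.

```csharp
using System;

public static class ImageSniffer
{
    // Well-known leading bytes ("magic numbers") for a few image formats.
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Gif = { 0x47, 0x49, 0x46, 0x38 }; // "GIF8"

    // Returns a media type guess, or null when no known signature matches.
    public static string GuessMediaType(byte[] data)
    {
        if (StartsWith(data, Jpeg)) return "image/jpeg";
        if (StartsWith(data, Png)) return "image/png";
        if (StartsWith(data, Gif)) return "image/gif";
        return null;
    }

    // True when data begins with every byte of the signature, in order.
    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

With the bytes quoted above (ff d8 ff e0), ImageSniffer.GuessMediaType returns "image/jpeg", which agrees with the server's Content-Type header.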

You may use HttpClient for doing this GET request.
Sample code:
HttpClient client = new HttpClient();
var response = await client.GetAsync("https://encrypted-tbn2.gstatic.com/images?q=tbn%3aANd9GcTw4P3HxyHR8wumE3lY3TOlGworijj2U2DawhY9wnmcPKnbmGHg");
var filetype = response.Content.Headers.ContentType.MediaType;
var imageArray = await response.Content.ReadAsByteArrayAsync();
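In the code above, the filetype variable holds the media type reported by the server, such as image/jpeg or image/png. It is a media type, not a file extension, and ContentType is null when the server sends no Content-Type header.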
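If the goal is to pick a file extension for a URL that has none, the media type still has to be mapped to one. A minimal sketch, assuming a small hand-written table; the class name, the table contents, and the ".bin" fallback are all illustrative choices, not part of any framework API:

```csharp
using System;
using System.Collections.Generic;

public static class MediaTypeExtensions
{
    // Hypothetical lookup table; extend it with the types you expect to receive.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "image/jpeg", ".jpg" },
            { "image/png", ".png" },
            { "image/gif", ".gif" },
            { "application/pdf", ".pdf" },
        };

    // Returns the extension for a media type, or the fallback when it is null or unknown.
    public static string ToExtension(string mediaType, string fallback = ".bin")
    {
        string extension;
        if (mediaType != null && Map.TryGetValue(mediaType, out extension))
            return extension;
        return fallback;
    }
}
```

It could be called as MediaTypeExtensions.ToExtension(response.Content.Headers.ContentType?.MediaType), which also covers the case where the header is missing.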

You can try using FindMimeFromData API. Here is the snippet. It may help you.
WebClient webClient = new WebClient();
var result = webClient.DownloadData(new Uri("url"));
IntPtr mimeout;
// Never report more bytes than the buffer actually holds.
int result2 = FindMimeFromData(IntPtr.Zero, "sample", result, Math.Min(result.Length, 4096), null, 0, out mimeout, 0);
if (result2 != 0)
throw Marshal.GetExceptionForHR(result2);
string mime = Marshal.PtrToStringUni(mimeout);
Marshal.FreeCoTaskMem(mimeout);
Console.WriteLine(mime);
And here is the API declaration. (Copied from here)
[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
static extern int FindMimeFromData(IntPtr pBC, [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl, [MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)]
byte[] pBuffer, int cbSize, [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed, int dwMimeFlags, out IntPtr ppwzMimeOut, int dwReserved);

Related

Can't open network drive (smb / samba) programatically with C# and .NET 4.8

I have problems opening a network drive programmatically. I have used the approach described here: Access a Remote Directory from C#. This is the proposed solution in many other Stack Overflow posts and on other sites, and I have not seen any other way to do it.
The proposed solution gives an exception: system error 67 "The network name cannot be found".
The problem is that the network name is correct and can be found.
If I first make the connection in windows file explorer typing the same network name, username and password, the files can be accessed from the code.
Code as I want to make it following the proposed solution, which is not working:
var SambaSharePath = @"\\images.eksempel.dk\archive\public";
var SambaUsername = @"net\username";
var SambaPassword = ConfigurationManager.AppSettings["SambaPassword"];
networkCredential = new NetworkCredential(SambaUsername, SambaPassword);
string filename = Path.Combine(SambaSharePath, imagePath);
MemoryStream image = new MemoryStream();
using (var x = new NetworkConnection(SambaSharePath, networkCredential))
{
var stream = new FileStream(filename, FileMode.Open);
filename = Path.GetFileName(filename);
stream.CopyTo(image);
stream.Close();
}
return image;
To see how NetworkConnection works, use the link above.
Code which is working after the connection to the share is established in file explorer (but not before):
...
string filename = Path.Combine(SambaSharePath, imagePath);
var stream = new FileStream(filename, FileMode.Open);
...
I have tried every possible combination of forward and backward slashes etc. for the share name and username to get the code working, but nothing helps. So I have ruled out misspelling and formatting errors.
I have tried googling the exception, but none of the explanations made sense in this context.
Does anybody have an idea of how to get the code working and what the cause of the error is?
The workaround is to first establish the connection with File Explorer, but that doesn't happen automatically after a server restart.

The problem was solved using this post: Why do these DLLs have two apparently identical entry points?
I had to make a change in the original solution.
[DllImport("mpr.dll")]
private static extern int WNetAddConnection2(NetResource netResource,
string password,
string username,
int flags);
Must be modified to:
[DllImport("mpr.dll", EntryPoint = "WNetAddConnection2", SetLastError = true, CharSet = CharSet.Auto)]
private static extern int WNetAddConnection2(NetResource netResource,
string password, string username, int flags);
It came down to how the path name and username strings are handled: as ANSI or as Unicode.
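To see why the character set matters, it helps to look at what a narrow (char*) reader makes of a UTF-16 buffer. The sketch below only illustrates the general ANSI/Unicode mismatch; it is not a confirmed trace of what mpr.dll did in this case, and the class and method names are made up for the example.

```csharp
using System;
using System.Text;

public static class CharSetMismatchDemo
{
    // Interprets a buffer the way a native char* reader would:
    // as single-byte characters up to the first zero byte.
    public static string ReadAsNarrowCString(byte[] buffer)
    {
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0) length = buffer.Length;
        return Encoding.ASCII.GetString(buffer, 0, length);
    }
}
```

Encoding.Unicode.GetBytes(@"\\images.eksempel.dk\archive\public") starts with the bytes 5c 00, so ReadAsNarrowCString stops after the first backslash. A function expecting one encoding and receiving the other sees a network name that really does not exist.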

MarshalAs(UnmanagedType.LPStr) - how does this convert utf-8 strings to char*

The question title is basically what I'd like to ask:
[MarshalAs(UnmanagedType.LPStr)] - how does this convert utf-8 strings to char* ?
I use the above line when I attempt to communicate between c# and c++ dlls;
more specifically, between:
somefunction(char *string) [c++ dll]
somefunction([MarshalAs(UnmanagedType.LPStr)] string text) [c#]
When I send my utf-8 text (scintilla.Text) through c# and into my c++ dll,
I'm shown in my VS 10 debugger that:
the c# string was successfully converted to char*
the resulting char* properly reflects the corresponding utf-8 chars (including the bit in Korean) in the watch window.
Here's a screenshot (with more details):
As you can see, initialScriptText[0] returns the single byte(char): 'B' and the contents of char* initialScriptText are displayed properly (including Korean) in the VS watch window.
Going through the char pointer, it seems that English is saved as one byte per char, while Korean seems to be saved as two bytes per char. (the Korean word in the screenshot is 3 letters, hence saved in 6 bytes)
This seems to show that each 'letter' isn't saved in equal size containers, but differs depending on language. (possible hint on type?)
I'm trying to achieve the same result in pure c++: reading in utf-8 files and saving the result as char*.
Here's an example of my attempt to read a utf-8 file and convert to char* in c++:
observations:
loss in visual when converting from wchar_t* to char*
since result, s8 displays the string properly, I know I've converted the utf-8 file content in wchar_t* successfully to char*
since 'result' retains the bytes I've taken directly from the file, but I'm getting a different result from what I had through c# (I've used the same file), I've concluded that the c# marshal has put the file contents through some other procedure to further mutate the text to char*.
(the screenshot also shows my terrible failure in using wcstombs)
note: I'm using the utf8 header from (http://utfcpp.sourceforge.net/)
Please correct me on any mistakes in my code/observations.
I'd like to be able to mimic the result I'm getting through the c# marshal and I've realised after going through all this that I'm completely stuck. Any ideas?

[MarshalAs(UnmanagedType.LPStr)] - how does this convert utf-8 strings to char* ?
It doesn't. There is no such thing as a "utf-8 string" in managed code, strings are always encoded in utf-16. The marshaling from and to an LPStr is done with the default system code page. Which makes it fairly remarkable that you see Korean glyphs in the debugger, unless you use code page 949.
If interop with utf-8 is a hard requirement then you need to use a byte[] in the pinvoke declaration. And convert back and forth yourself with System.Text.Encoding.UTF8. Use its GetString() method to convert the byte[] to a string, its GetBytes() method to convert a string to byte[]. Avoid all this if possible by using wchar_t[] in the native code.
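A minimal sketch of the byte[] approach described above. The helper names are hypothetical, and the native function is assumed to be declared with a byte[] parameter (for example, static extern void somefunction(byte[] text)) and to expect a zero-terminated buffer.

```csharp
using System;
using System.Text;

public static class Utf8Interop
{
    // Encodes a managed (UTF-16) string as zero-terminated UTF-8 bytes,
    // suitable for a byte[] parameter that native code reads as char*.
    public static byte[] ToNullTerminatedUtf8(string text)
    {
        int byteCount = Encoding.UTF8.GetByteCount(text);
        byte[] buffer = new byte[byteCount + 1]; // the last byte stays 0
        Encoding.UTF8.GetBytes(text, 0, text.Length, buffer, 0);
        return buffer;
    }

    // Decodes UTF-8 bytes up to the first zero byte (or the whole buffer if there is none).
    public static string FromNullTerminatedUtf8(byte[] buffer)
    {
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0) length = buffer.Length;
        return Encoding.UTF8.GetString(buffer, 0, length);
    }
}
```

Note that a Hangul syllable takes three bytes in UTF-8. The two bytes per Korean character observed in the question are what code page 949 produces, which fits the explanation that LPStr marshals through the system code page.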

While the other answers are correct, there has been a major development in .NET Framework 4.7. There is now an option that does exactly what UTF-8 needs: UnmanagedType.LPUTF8Str. I tried it and it works like a Swiss chronometer, doing exactly what it sounds like.
In fact, I even used MarshalAs(UnmanagedType.LPUTF8Str) in one parameter and MarshalAs(UnmanagedType.LPStr) in another. Also works. Here is my method (takes in string parameters and returns a string via a parameter):
[DllImport("mylib.dll", ExactSpelling = true, CallingConvention = CallingConvention.StdCall)]
public static extern void ProcessContent([MarshalAs(UnmanagedType.LPUTF8Str)]string content,
[MarshalAs(UnmanagedType.LPUTF8Str), Out]StringBuilder outputBuffer,[MarshalAs(UnmanagedType.LPStr)]string settings);
Thanks, Microsoft! Another nuisance is gone.

ICustomMarshaler can be used, in case of using .NET Framework earlier than 4.7.
class UTF8StringCodec : ICustomMarshaler
{
public static ICustomMarshaler GetInstance(string cookie) => new UTF8StringCodec();
public void CleanUpManagedData(object ManagedObj)
{
// nop
}
public void CleanUpNativeData(IntPtr pNativeData)
{
Marshal.FreeCoTaskMem(pNativeData);
}
public int GetNativeDataSize()
{
throw new NotImplementedException();
}
public IntPtr MarshalManagedToNative(object ManagedObj)
{
var text = $"{ManagedObj}";
var bytes = Encoding.UTF8.GetBytes(text);
var ptr = Marshal.AllocCoTaskMem(bytes.Length + 1);
Marshal.Copy(bytes, 0, ptr, bytes.Length);
Marshal.WriteByte(ptr, bytes.Length, 0);
return ptr;
}
public object MarshalNativeToManaged(IntPtr pNativeData)
{
if (pNativeData == IntPtr.Zero)
{
return null;
}
var bytes = new MemoryStream();
var ofs = 0;
while (true)
{
var byt = Marshal.ReadByte(pNativeData, ofs);
if (byt == 0)
{
break;
}
bytes.WriteByte(byt);
ofs++;
}
return Encoding.UTF8.GetString(bytes.ToArray());
}
}
P/Invoke declaration:
[DllImport("native.dll", CallingConvention = CallingConvention.Cdecl)]
private extern static int NativeFunc(
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8StringCodec))] string path
);
Usage inside callback:
[StructLayout(LayoutKind.Sequential)]
struct Options
{
[MarshalAs(UnmanagedType.FunctionPtr)]
public CallbackFunc callback;
}
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate int CallbackFunc(
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8StringCodec))] string path
);

If you need to marshal a UTF-8 string, do it manually.
Define the function with an IntPtr instead of a string:
somefunction(IntPtr text)
Then convert the text to a zero-terminated UTF-8 array of bytes and copy it to unmanaged memory:
byte[] retArray = Encoding.UTF8.GetBytes(text);
byte[] retArrayZ = new byte[retArray.Length + 1];
Array.Copy(retArray, retArrayZ, retArray.Length);
IntPtr retPtr = Marshal.AllocHGlobal(retArrayZ.Length);
Marshal.Copy(retArrayZ, 0, retPtr, retArrayZ.Length);
somefunction(retPtr);
Marshal.FreeHGlobal(retPtr); // free the buffer once the native call has returned
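When marshalling by hand, the unmanaged buffer must be released even if the native call throws. One way to arrange that is a small wrapper with try/finally. This is a sketch under the same assumptions as the snippet above; the Utf8Native class and its method names are invented for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class Utf8Native
{
    // Copies text into unmanaged memory as zero-terminated UTF-8.
    // The caller must release the result with Marshal.FreeHGlobal.
    public static IntPtr AllocUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, ptr, bytes.Length);
        Marshal.WriteByte(ptr, bytes.Length, 0); // terminator
        return ptr;
    }

    // Runs an action with a temporary UTF-8 pointer and always frees it afterwards.
    public static void WithUtf8(string text, Action<IntPtr> action)
    {
        IntPtr ptr = AllocUtf8(text);
        try
        {
            action(ptr);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

The native call then becomes Utf8Native.WithUtf8(text, ptr => somefunction(ptr)). The pointer must not be kept by the native side after the call returns, because the buffer is freed at that point.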

Check type of uploaded file

How do I check the file type of a file uploaded using FileUploader control in an ASP.NET C# webpage?
I tried checking file extension, but it obviously fails when a JPEG image (e.g. Leonardo.jpg) is renamed to have a PDF's extension (e.g. Leonardo.pdf).
I tried
FileUpload1.PostedFile.ContentType.ToLower().Equals("application/pdf")
but this fails as the above code behaves the same way as the first did.
Is there any other way to check the actual file type, not just the extension?
I looked at ASP.NET how to check type of the file type irrespective of extension.
Edit: I tried the code below from one of the posts on Stack Overflow, but it doesn't work. Any ideas?
/// <summary>
/// This class allows access to the internal MimeMapping-Class in System.Web
/// </summary>
class MimeMappingWrapper
{
static MethodInfo getMimeMappingMethod;
static MimeMappingWrapper() {
// dirty trick - Assembly.LoadWithPartialName has been deprecated
Assembly ass = Assembly.LoadWithPartialName("System.Web");
Type t = ass.GetType("System.Web.MimeMapping");
getMimeMappingMethod = t.GetMethod("GetMimeMapping", BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.Public);
}
/// <summary>
/// Returns a MIME type depending on the passed files extension
/// </summary>
/// <param name="fileName">File to get a MIME type for</param>
/// <returns>MIME type according to the files extension</returns>
public static string GetMimeMapping(string fileName) {
return (string)getMimeMappingMethod.Invoke(null, new[] { fileName });
}
}

Don't use file extensions to work out MIME types; instead use "Winista" for binary analysis.
Say someone renames an exe with a jpg extension. You can still determine the real file format. It doesn't detect SWF or FLV files, but it handles pretty much every other well-known format, and you can use a hex editor to find the signatures of more files for it to detect.
Download Winista from my GitHub: https://github.com/MeaningOfLights/MimeDetect.
Where Winista fails to detect the real file format, I fall back to the URLMon method:
public class urlmonMimeDetect
{
    // Pointer-sized parameters are declared as IntPtr so the call also works in 64-bit processes,
    // and the strings are marshalled as wide strings to match the pwz* parameters.
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved
    );
    public string GetMimeFromFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException(filename + " not found");
        byte[] buffer = new byte[256];
        int bytesRead;
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            // Read up to 256 bytes; shorter files simply yield fewer bytes.
            bytesRead = fs.Read(buffer, 0, buffer.Length);
        }
        try
        {
            IntPtr mimeTypePtr;
            int hr = FindMimeFromData(IntPtr.Zero, null, buffer, bytesRead, null, 0, out mimeTypePtr, 0);
            if (hr != 0 || mimeTypePtr == IntPtr.Zero)
                return "unknown/unknown";
            string mime = Marshal.PtrToStringUni(mimeTypePtr);
            Marshal.FreeCoTaskMem(mimeTypePtr);
            return mime;
        }
        catch (Exception)
        {
            return "unknown/unknown";
        }
    }
}
From inside the Winista method, I fall back on the URLMon here:
public MimeType GetMimeTypeFromFile(string filePath)
{
sbyte[] fileData = null;
using (FileStream srcFile = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
byte[] data = new byte[srcFile.Length];
srcFile.Read(data, 0, (Int32)srcFile.Length);
fileData = Winista.Mime.SupportUtil.ToSByteArray(data);
}
MimeType oMimeType = GetMimeType(fileData);
if (oMimeType != null) return oMimeType;
//We haven't found the file using Magic (eg a text/plain file)
//so instead use URLMon to try and get the files format
Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect urlmonMimeDetect = new Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect();
string urlmonMimeType = urlmonMimeDetect.GetMimeFromFile(filePath);
if (!string.IsNullOrEmpty(urlmonMimeType))
{
foreach (MimeType mimeType in types)
{
if (mimeType.Name == urlmonMimeType)
{
return mimeType;
}
}
}
return oMimeType;
}
Update:
To work out more files using magic here is a FILE SIGNATURES TABLE

Checking the name or extension is in no way reliable. The only way to be sure is to actually read the content of the file.
For example, if you want to check whether a file is an image, try loading it as an image; if that fails, you can be sure that it is not an image file. This can be done easily using GDI objects.
The same is true for PDF files.
In conclusion, don't rely on the user-supplied name or extension.
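As a cheap first step before fully parsing the content, the leading bytes of the upload can be compared with the format's signature. The sketch below checks for the "%PDF-" header. It assumes a seekable stream (for instance FileUpload1.PostedFile.InputStream), the UploadSniffer name is made up, and a file that passes may still be a malformed PDF.

```csharp
using System;
using System.IO;

public static class UploadSniffer
{
    // PDF files begin with the ASCII bytes "%PDF-".
    private static readonly byte[] PdfSignature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Returns true when the stream starts with the PDF signature.
    // The stream must be seekable; its position is restored afterwards.
    public static bool LooksLikePdf(Stream input)
    {
        long start = input.Position;
        try
        {
            byte[] header = new byte[PdfSignature.Length];
            int read = 0;
            while (read < header.Length)
            {
                int n = input.Read(header, read, header.Length - read);
                if (n == 0) return false; // stream is shorter than the signature
                read += n;
            }
            for (int i = 0; i < header.Length; i++)
            {
                if (header[i] != PdfSignature[i]) return false;
            }
            return true;
        }
        finally
        {
            input.Position = start;
        }
    }
}
```

A JPEG renamed to Leonardo.pdf starts with ff d8 ff, so LooksLikePdf returns false for it regardless of the name.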

You can check the file type in the FileUpload control with a RegularExpressionValidator:
ValidationExpression="^.+\.(([pP][dD][fF])|([jJ][pP][gG])|([pP][nN][gG]))$"
For example, you can add ([rR][aA][rR]) for the RAR file type, and so on.
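The same extension filter can be applied on the server. The sketch below is an equivalent pattern using a case-insensitive option instead of the [pP] character classes; the ExtensionFilter class is a made-up name. Like the validator, it inspects only the file name, so it cannot tell a real PDF from a renamed JPEG.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtensionFilter
{
    // Same idea as the ValidationExpression above, written with a case-insensitive option.
    private static readonly Regex Allowed =
        new Regex(@"^.+\.(pdf|jpg|png)$", RegexOptions.IgnoreCase);

    // True when the file name ends in one of the allowed extensions.
    public static bool HasAllowedExtension(string fileName)
    {
        return fileName != null && Allowed.IsMatch(fileName);
    }
}
```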

File upload asp.net

What is the best way to validate the file format in file-upload control in ASP.NET?
I want users to be able to upload only files in specific formats. I currently validate by checking the file name, but I am looking for a more reliable solution.

Try the following code, which reads the first 256 bytes of the file and returns its MIME type using a Windows system DLL (urlmon.dll). Then compare the returned MIME type with the type you expect.
using System.Runtime.InteropServices; ...
// Pointer-sized parameters are declared as IntPtr so the call also works in 64-bit processes,
// and the strings are marshalled as wide strings to match the pwz* parameters.
[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
private static extern int FindMimeFromData(
    IntPtr pBC,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
    [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
    int cbSize,
    [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
    int dwMimeFlags,
    out IntPtr ppwzMimeOut,
    int dwReserved
);
public string getMimeFromFile(string filename)
{
    if (!File.Exists(filename))
        throw new FileNotFoundException(filename + " not found");
    byte[] buffer = new byte[256];
    int bytesRead;
    using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        // Read up to 256 bytes; shorter files simply yield fewer bytes.
        bytesRead = fs.Read(buffer, 0, buffer.Length);
    }
    try
    {
        IntPtr mimeTypePtr;
        int hr = FindMimeFromData(IntPtr.Zero, null, buffer, bytesRead, null, 0, out mimeTypePtr, 0);
        if (hr != 0 || mimeTypePtr == IntPtr.Zero)
            return "unknown/unknown";
        string mime = Marshal.PtrToStringUni(mimeTypePtr);
        Marshal.FreeCoTaskMem(mimeTypePtr);
        return mime;
    }
    catch (Exception)
    {
        return "unknown/unknown";
    }
}
But check the type in different browsers, since the mimetype might be different in different browsers.
Also this will give the exact mimetype even if you changed the extension by editing the name of the file.
Hope this helps you...

The only way to be sure is to actually parse the whole file according to the specification of the file format and check that everything fits.
If you want to do just basic check, most binary file formats have some form of header or magic number at their beginning, that you can check for.

You can use a component like Uploadify that limits which types of files the user can choose before uploading.

Using HttpWebRequest with dynamic URI causes "parameter is not valid" in Image.FromStream

I'm trying to obtain an image to encode to a WordML document. The original version of this function used files, but I needed to change it to get images created on the fly with an aspx page. I've adapted the code to use HttpWebRequest instead of a WebClient. The problem is that I don't think the page request is getting resolved and so the image stream is invalid, generating the error "parameter is not valid" when I invoke Image.FromStream.
public string RenderCitationTableImage(string citation_table_id)
{
string image_content = "";
string _strBaseURL = String.Format("http://{0}",
HttpContext.Current.Request.Url.GetComponents(UriComponents.HostAndPort, UriFormat.Unescaped));
string _strPageURL = String.Format("{0}{1}", _strBaseURL,
ResolveUrl("~/Publication/render_citation_chart.aspx"));
string _staticURL = String.Format("{0}{1}", _strBaseURL,
ResolveUrl("~/Images/table.gif"));
string _fullURL = String.Format("{0}?publication_id={1}&citation_table_layout_id={2}",
_strPageURL, publication_id, citation_table_id);
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(_fullURL);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream image_stream = response.GetResponseStream();
// Read the image data
MemoryStream ms = new MemoryStream();
int num_read;
byte[] crlf = System.Text.Encoding.Default.GetBytes("\r\n");
byte[] buffer = new byte[1024];
for (num_read = image_stream.Read(buffer, 0, 1024); num_read > 0; num_read = image_stream.Read(buffer, 0, 1024))
{
ms.Write(buffer, 0, num_read);
}
// Base 64 Encode the image data
byte[] image_bytes = ms.ToArray();
string encodedImage = Convert.ToBase64String(image_bytes);
ms.Position = 0;
System.Drawing.Image image_original = System.Drawing.Image.FromStream(ms); // <---error here: parameter is not valid
image_stream.Close();
image_content = string.Format("<w:p>{4}<w:r><w:pict><w:binData w:name=\"wordml://{0}\">{1}</w:binData>" +
"<v:shape style=\"width:{2}px;height:{3}px\">" +
"<v:imagedata src=\"wordml://{0}\"/>" +
"</v:shape>" +
"</w:pict></w:r></w:p>", _word_image_id, encodedImage, 800, 400, alignment.center);
image_content = "<w:br w:type=\"text-wrapping\"/>" + image_content + "<w:br w:type=\"text-wrapping\"/>";
}
catch (Exception ex)
{
return ex.ToString();
}
return image_content;
}
Using a static URI it works fine. If I replace "_staticURL" with "_fullURL" in the WebRequest.Create method I get the error. Any ideas as to why the page request doesn't fully resolve?
And yes, the full URL resolves fine and shows an image if I post it in the address bar.

UPDATE:
Just read your updated question. Since you're running into login issues, try doing this before you execute the request:
request.Credentials = CredentialCache.DefaultCredentials;
If this doesn't work, then perhaps the problem is that authentication is not being enforced on static files, but is being enforced on dynamic files. In this case, you'll need to log in first (using your client code) and retain the login cookie (using HttpWebRequest.CookieContainer on the login request as well as on the second request) or turn off authentication on the page you're trying to access.
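Retaining the login cookie comes down to giving both requests the same CookieContainer instance. A minimal sketch follows; the SessionRequests helper is hypothetical, and the login page and the way credentials are posted depend on the site, so they are left out.

```csharp
using System;
using System.Net;

public static class SessionRequests
{
    // Creates a request that stores and sends cookies through the shared container.
    public static HttpWebRequest Create(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }
}
```

The idea is to create one CookieContainer, use it for the login request, read that response so the authentication cookie is stored, and then pass the same container when creating the request for _fullURL so the cookie is sent along with it.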
ORIGINAL:
Since it works with one HTTP URL and doesn't work with another, the place to start diagnosing this is figuring out what's different between the two requests, at the HTTP level, which accounts for the difference in behavior in your code.
To figure out the difference, I'd use Fiddler (http://fiddlertool.com) to compare the two requests. Compare the HTTP headers. Are they the same? In particular, are they the same HTTP content type? If not, that's likely the source of your problem.
If headers are the same, make sure both the static and dynamic image are exactly the same content and file type on the server. (e.g. use File...Save As to save the image in a browser to your disk). Then use Fiddler's Hex View to compare the image content. Can you see any obvious differences?
Finally, I'm sure you've already checked this, but just making sure: /Publication/render_citation_chart.aspx refers to an actual image file, not an HTML wrapper around an IMG element, right? This would account for the behavior you're seeing, where a browser renders the image OK but your code doesn't.
