c# Regex for file extension embedded in a string

I need to extract the file extension that is embedded in a connection string. Is it possible to do this with a regex? Any other solution would also be a great help. The connection string looks similar to this:
"Provider=MSDASQL.1;Persist Security Info=False;Extended Properties="DSN=Excel Files; DBQ=<Assume_Location>FILE_NAME.xls; DriverId=1046;MaxBufferSize=2048;PageTimeout=5;""
In this example the file has the extension ".xls", but it can be anything, e.g. xlsx, dbf, mdb, accdb, etc. We don't have control over how the connection string is generated.
The data source key (DBQ in this example) may also differ between connection strings, e.g. Network Address, SourceDB, Server, Hostname, etc.

This will work assuming:
The file name does not contain ;
There is a period between the filename and the extension
There is a ; before the filename
There is a ; after the extension
string pattern = @";(?<filename>[^;]*)[.]{1,1}.*;";
string fileName = Regex.Match(connectionString, pattern).Groups["filename"].Value;
Or, if <Assume_Location> comes before the file name instead of ;, you can use this pattern:
string pattern = @"<Assume_Location>(?<filename>[^;]*)[.]{1,1}.*;";
Edit 1 - In response to comment.
One of these should work if you need the extension as well:
string pattern = @";(?<filename>[^;]*[.]{1,1}[^;]*);";
string pattern = @"<Assume_Location>(?<filename>[^;]*[.]{1,1}[^;]*);";
Edit 2 - In response to more comments.
This should absolutely work (unless it does not, in which case I will edit this to tone down the confidence level).
string pattern = @";[^=]*=<[^>]*>(?<filename>[^;]*[.][^;]*);";
or this, if < and > are not replaced with angle brackets:
string pattern = @";[^=]*=<.*>(?<filename>[^;]*[.][^;]*);";
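Under the same assumptions listed in this answer (the file name sits between two ; characters and the extension follows its last period), a minimal sketch can capture just the extension in a named group. The class and method names are hypothetical, and, as the code comment notes, any other ;-delimited value containing a period would also match:

```csharp
using System.Text.RegularExpressions;

public static class ConnectionStringExtension
{
    // Same assumptions as the answer above: the file name sits between two ';'
    // characters and its extension follows the last '.' in that segment.
    // Caveat: any other ';'-delimited value containing a dot (e.g. a host name
    // such as "Server=db.example.com;") would also match.
    private static readonly Regex ExtensionPattern =
        new Regex(@";[^;]*\.(?<ext>\w+)\s*;");

    // Returns the extension without the leading dot, or null if nothing matches.
    public static string GetEmbeddedExtension(string connectionString)
    {
        Match m = ExtensionPattern.Match(connectionString);
        return m.Success ? m.Groups["ext"].Value : null;
    }
}
```

For the sample string in the question, with a real path such as C:\Data\ in place of <Assume_Location>, this returns "xls".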

Well, I am not 100% sure, but it seems like the file is the only value that is not assigned to a key, is it? If so, you can use the following (not tested):
(;|^)(?<File>[^=]+)(;|$)

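If the file name always starts with <Assume_Location>, then you can use:
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.IO;

// Returns the extension (e.g. ".xls") of the Extended Properties value
// that starts with <Assume_Location>, or null if there is none.
public static string GetAssumeLocation(string connectionString)
{
    const string assumeLocation = "<Assume_Location>";
    // First pass: parse the outer connection string
    var builder1 = new DbConnectionStringBuilder();
    builder1.ConnectionString = connectionString;
    object extended;
    if (builder1.TryGetValue("Extended Properties", out extended) && extended is string)
    {
        // Second pass: Extended Properties is itself a connection string
        var builder2 = new DbConnectionStringBuilder();
        builder2.ConnectionString = (string)extended;
        foreach (KeyValuePair<string, object> kv in builder2)
        {
            var value = kv.Value as string;
            if (value != null && value.StartsWith(assumeLocation, StringComparison.Ordinal))
            {
                return Path.GetExtension(value.Substring(assumeLocation.Length));
            }
        }
    }
    return null;
}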
Note that I'm using the DbConnectionStringBuilder to "split" the connection string, and I'm looking only in the Extended Properties.
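Since the question says the key can vary (DBQ, Data Source, Server, ...), one possible variation of this idea is to ignore key names altogether: parse with DbConnectionStringBuilder, recurse into any value that is itself a connection string, and return the first value that ends with an extension from a whitelist. This is a sketch, not a drop-in replacement; the whitelist and the names are assumptions:

```csharp
using System;
using System.Data.Common;

public static class EmbeddedFileFinder
{
    // Hypothetical whitelist: adjust it to the file types you expect.
    private static readonly string[] KnownExtensions =
        { ".xlsx", ".xls", ".dbf", ".mdb", ".accdb" };

    // Looks at every value of the connection string, whatever its key
    // (DBQ, Data Source, ...), and recurses into values that are themselves
    // connection strings (such as Extended Properties). Returns the matching
    // whitelist entry, or null if no value ends with a known extension.
    public static string FindExtension(string connectionString)
    {
        var builder = new DbConnectionStringBuilder();
        try
        {
            builder.ConnectionString = connectionString;
        }
        catch (ArgumentException)
        {
            return null; // not a parsable connection string
        }

        foreach (string key in builder.Keys)
        {
            var value = builder[key] as string;
            if (string.IsNullOrEmpty(value)) continue;

            string trimmed = value.Trim();
            foreach (string ext in KnownExtensions)
            {
                if (trimmed.EndsWith(ext, StringComparison.OrdinalIgnoreCase))
                    return ext;
            }

            // A value containing '=' may be a nested connection string.
            if (trimmed.Contains("="))
            {
                string nested = FindExtension(trimmed);
                if (nested != null) return nested;
            }
        }
        return null;
    }
}
```

Matching against a whitelist avoids treating values like MSDASQL.1 as file names, at the cost of having to know the possible extensions up front.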

Related

How can I get a part/subdomain of my URL in C#?

I have a URL like the following
http://yellowcpd.testpace.net
How can I get yellowcpd from this? I know I can do that with string parsing, but is there a built-in way in C#?

Assuming your URLs will always be testpace.net, try this:
var subdomain = Request.Url.Host.Replace("testpace.net", "").TrimEnd('.');
It'll just give you the non-testpace.net part of the Host. If you don't have Request.Url.Host, you can do new Uri(myString).Host instead.
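One caveat: Replace removes "testpace.net" wherever it occurs in the host, not just at the end. A slightly stricter sketch, still assuming the base domain is known in advance, strips only the suffix (the helper name is hypothetical):

```csharp
using System;

public static class SubdomainHelper
{
    // Returns the part of the host in front of baseDomain ("" if the host is
    // baseDomain itself). baseDomain, e.g. "testpace.net", must be known in advance.
    public static string GetSubdomain(Uri uri, string baseDomain)
    {
        string host = uri.Host;
        if (string.Equals(host, baseDomain, StringComparison.OrdinalIgnoreCase))
            return string.Empty;

        string suffix = "." + baseDomain;
        if (!host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Host is not under " + baseDomain, nameof(uri));

        // Only the trailing ".<baseDomain>" is removed, never an occurrence elsewhere.
        return host.Substring(0, host.Length - suffix.Length);
    }
}
```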

Try this:
string host = Request.Url.Host;
var myValues = host.Split('.'); // for yellowcpd.testpace.net, myValues[0] is "yellowcpd"

How can I get yellowcpd from this? I know I can do that with string parsing, but is there a built-in way in C#?
.NET doesn't provide a built-in feature to extract specific parts from Uri.Host. You will have to use string manipulation or a regular expression yourself.
The only constant part of the domain string is the TLD. The TLD is the very last bit of the domain string, e.g. .com, .net, .uk. Everything under it depends on the particular TLD for its position, so you can't assume the next-to-last part is the "domain name": for .co.uk it would be .co.

This fits the bill.
Split over two lines:
string host = Request.Url.Host;
string subdomain = host.Split('.')[0]; // "yellowcpd" for yellowcpd.testpace.net
Or over one:
string subdomain = Request.Url.Host.Split('.')[0];

The simple answer to your question is no, there isn't a built-in method to extract JUST the subdomain. With that said, this is the solution that I use...
using System;
using System.Collections.Generic;

public enum GetSubDomainOption
{
    ExcludeWWW,
    IncludeWWW
}
public static class Extensions
{
    public static string GetSubDomain(this Uri uri,
        GetSubDomainOption getSubDomainOption = GetSubDomainOption.IncludeWWW)
    {
        // Split the host once; the last two labels are assumed to be the domain (e.g. testpace.net)
        string[] labels = uri.Host.Split('.');
        var subdomainLabels = new List<string>();
        for (var i = 0; i < labels.Length - 2; i++)
        {
            // Ignore any www labels if the ExcludeWWW option is set
            if (getSubDomainOption == GetSubDomainOption.ExcludeWWW &&
                labels[i].ToLowerInvariant() == "www") continue;
            subdomainLabels.Add(labels[i]);
        }
        // Joining with "." avoids stray leading, trailing or missing dots
        return string.Join(".", subdomainLabels);
    }
}
USAGE:
var subDomain = Request.Url.GetSubDomain(GetSubDomainOption.ExcludeWWW);
or
var subDomain = Request.Url.GetSubDomain();
I currently have the default set to include the WWW. You could easily reverse this by switching the default value of the optional parameter in the GetSubDomain() method.
In my opinion this gives you an option that looks nice in code and, without digging in, appears to be 'built-in' to C#. Just to confirm your expectations: I tested the three values below, and with the exclude flag set this method returns just "yellowcpd" for each of them.
www.yellowcpd.testpace.net
yellowcpd.testpace.net
www.yellowcpd.www.testpace.net
One assumption that I make is that splitting the host name on "." always leaves the domain as the last two values (i.e. something.com).

As others have pointed out, you can do something like this:
var req = new HttpRequest(filename: "search", url: "http://www.yellowcpd.testpace.net", queryString: "q=alaska");
var host = req.Url.Host;
var yellow = host.Split('.')[1];
The portion of the URL you want is part of the host name. You may hope to find some method that directly addresses that portion of the name, e.g. "the subdomain (yellowcpd) within testpace.net", but this is probably not possible, because the rules for valid host names allow any number of labels, separated by periods (see Valid Host Names). You will have to add additional restrictions to get what you want, e.g. "Separate the host name into labels, discard www if present and take the next label".
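The rule quoted at the end of this answer translates fairly directly into code. A minimal sketch (hypothetical helper name), which assumes at most one leading www label:

```csharp
using System;

public static class HostLabels
{
    // Split the host name into labels, discard a leading "www" if present,
    // and take the next label.
    public static string FirstNonWwwLabel(string host)
    {
        string[] labels = host.Split('.');
        bool startsWithWww = labels.Length > 1 &&
            string.Equals(labels[0], "www", StringComparison.OrdinalIgnoreCase);
        return startsWithWww ? labels[1] : labels[0];
    }
}
```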

How to make filenames web safe using c#

This is not about encoding URLs; it's more to do with a problem I noticed: you can have a valid file name on IIS, such as "test & test.jpg", that cannot be downloaded because the & causes an error. There are other characters that behave like this too, which are valid in Windows but not on the web.
My quick solution is to change the filename before saving using a regex below...
public static string MakeFileNameWebSafe(string fileNameIn)
{
string pattern = @"[^A-Za-z0-9. ]";
string safeFilename = System.Text.RegularExpressions.Regex.Replace(fileNameIn, pattern, string.Empty);
if (safeFilename.StartsWith(".")) safeFilename = "noname" + safeFilename;
return safeFilename;
}
but I was wondering if there were any better built in ways of doing this.

I don't know of a built-in way.
What you can do is, as you say, scan the original file name and generate a Web-safe version of it.
For such Web-safe versions, you can make them look like the slugs used for blog posts and blog categories (which are search-engine optimized):
Only lowercase characters
Numbers are allowed
Dashes are allowed
Spaces are replaced by dashes
Nothing else is allowed
Possibly you could replace "&" with "-and-"
So "test & test.jpg" would translate to "test-and-test.jpg".
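A minimal sketch of these slug rules, assuming the period is also kept so the extension survives (the list doesn't mention it, but the "test-and-test.jpg" example keeps it) and that runs of dashes are collapsed and trimmed; the class and method names are made up:

```csharp
using System.Text;

public static class WebSafeNames
{
    // Lowercase letters, digits, '-' and '.' are kept; spaces become '-';
    // '&' becomes "-and-"; every other character is dropped.
    public static string ToSlug(string fileName)
    {
        var sb = new StringBuilder();
        foreach (char c in fileName.ToLowerInvariant())
        {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '-' || c == '.')
                sb.Append(c);
            else if (c == ' ')
                sb.Append('-');
            else if (c == '&')
                sb.Append("-and-");
        }

        // " & " produces "--and--", so collapse repeated dashes, then trim the ends.
        string result = sb.ToString();
        while (result.Contains("--"))
            result = result.Replace("--", "-");
        return result.Trim('-');
    }
}
```

With this sketch, WebSafeNames.ToSlug("test & test.jpg") returns "test-and-test.jpg".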

Just looking back at this question since it's fairly popular. I thought I would post my current solution here, with various overloads, for anyone who wants it:
using System.Text.RegularExpressions;
using System.Web;

public static string MakeSafeFilename(string filename, string spaceReplace)
{
    return MakeSafeFilename(filename, spaceReplace, false, false);
}
public static string MakeSafeUrlSegment(string text)
{
    return MakeSafeUrlSegment(text, "-");
}
public static string MakeSafeUrlSegment(string text, string spaceReplace)
{
    return MakeSafeFilename(text, spaceReplace, false, true);
}
public static string MakeSafeFilename(string filename, string spaceReplace, bool htmlDecode, bool forUrlSegment)
{
    if (htmlDecode)
        filename = HttpUtility.HtmlDecode(filename);
    // URL segments drop '.', while file names keep it for the extension
    string pattern = forUrlSegment ? @"[^A-Za-z0-9_\- ]" : @"[^A-Za-z0-9._\- ]";
    string safeFilename = Regex.Replace(filename, pattern, string.Empty);
    safeFilename = safeFilename.Replace(" ", spaceReplace);
    return safeFilename;
}

I think you are referring to the "A potentially dangerous Request.Path value was detected from the client (%)" error, which ASP.NET throws for paths that include characters which might indicate cross-site scripting attempts.
There is a good article on how to work around this:
http://www.hanselman.com/blog/ExperimentsInWackinessAllowingPercentsAnglebracketsAndOtherNaughtyThingsInTheASPNETIISRequestURL.aspx

Here's the one I use:
public static string MakeFileNameWebSafe(string path, string replace, string other)
{
    var folder = System.IO.Path.GetDirectoryName(path);
    var name = System.IO.Path.GetFileNameWithoutExtension(path);
    var ext = System.IO.Path.GetExtension(path);
    if (name == null) return path;
    // Allowed characters: letters, digits, the replacement string and any extras in 'other'
    var allowed = @"a-zA-Z0-9" + replace + (other ?? string.Empty);
    // Replace every disallowed character, then collapse runs of the replacement
    name = System.Text.RegularExpressions.Regex.Replace(name.Trim(), @"[^" + allowed + "]", replace);
    name = System.Text.RegularExpressions.Regex.Replace(name, @"[" + replace + "]+", replace);
    if (name.EndsWith(replace)) name = name.Substring(0, name.Length - replace.Length);
    // Path.Combine inserts the separator that plain concatenation would drop
    return System.IO.Path.Combine(folder ?? string.Empty, name + ext);
}

If you don't need to keep the original name, perhaps you could just replace it with a GUID?
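If the GUID route is acceptable, a sketch that keeps only the original extension (so IIS can still map the file to a MIME type) might look like this; the names are hypothetical:

```csharp
using System;
using System.IO;

public static class GuidFileNames
{
    // Discards the original name entirely and keeps only its extension:
    // the result is 32 hex digits followed by e.g. ".jpg".
    public static string CreateName(string originalFileName)
    {
        return Guid.NewGuid().ToString("N") + Path.GetExtension(originalFileName);
    }
}
```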

Path functions for URL

I want to use the functions of the Path class (GetDirectoryName, GetFileName, Combine, etc.) with paths in URL format, which use a forward slash (/).
Example of my path:
"xxx://server/folder1/folder2/file"
I tried to do the job with Path functions and in the end just replaced the separator.
I've found that the GetDirectoryName function does not correctly replace the slashes:
Path.GetDirectoryName(@"xxx://server/folder/file") -> @"xxx:\server\folder"
As you can see, one slash is lost.
How can I cause the Path functions to use the 'alternative' separator?
Can I use another class with the same functionality?

I'm afraid GetDirectoryName, GetFileName, Combine, etc. use Path.DirectorySeparatorChar internally, whereas you want Path.AltDirectorySeparatorChar.
And since Path is a static (and therefore sealed) class, I think the only way to go about it is string replacement. You can replace Path.DirectorySeparatorChar ('\') with Path.AltDirectorySeparatorChar ('/') and Path.VolumeSeparatorChar (':') with ":/".

For GetDirectoryName(), you can use
pageRoot = uri.Remove(uri.LastIndexOf('/') + 1);

Have you considered using a combination of System.Uri, System.UriBuilder, and (if necessary) custom System.UriParser subclass(es)?
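As a sketch of that suggestion for http(s) URLs, System.Uri already covers the three functions mentioned in the question: Segments gives the file name, resolving "." against the URI gives the directory, and the Uri(Uri, string) constructor combines. The helper names are hypothetical, and a custom scheme like xxx:// may not be parsed the same way as http:

```csharp
using System;

public static class UrlPath
{
    // Counterpart of Path.GetFileName: the last path segment ("" for a trailing '/').
    // Segments come back in escaped form (e.g. "%20" for a space).
    public static string GetFileName(Uri uri)
    {
        string[] segments = uri.Segments;
        string last = segments[segments.Length - 1];
        return last.EndsWith("/") ? string.Empty : last;
    }

    // Counterpart of Path.GetDirectoryName: resolving "." against the URI
    // drops the last segment and keeps the trailing '/'.
    public static Uri GetDirectory(Uri uri)
    {
        return new Uri(uri, ".");
    }

    // Counterpart of Path.Combine for a directory URI ending in '/'.
    public static Uri Combine(Uri directory, string relativePath)
    {
        return new Uri(directory, relativePath);
    }
}
```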

If the URI is a local file URI of the form file://whatever, then you can call string path = new Uri(whatever).LocalPath and call the Path methods on it. If you cannot guarantee the Uri refers to a local path, you cannot guarantee that components of the Uri correspond to machines, folders, files, extensions, user directories, separator characters, or anything else.
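A minimal sketch of the file:// case, which refuses anything that isn't a file URI (the helper name is hypothetical):

```csharp
using System;
using System.IO;

public static class FileUriPaths
{
    // Converts a file: URI to a local path and applies Path.GetDirectoryName to it.
    // Non-file URIs are rejected, since their segments need not map to folders.
    public static string GetDirectoryOfFileUri(string uriString)
    {
        var uri = new Uri(uriString);
        if (!uri.IsFile)
            throw new ArgumentException("Not a file: URI", nameof(uriString));
        return Path.GetDirectoryName(uri.LocalPath);
    }
}
```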

A long time after... I was looking for a solution and found this topic, so I decided to write my own (very simple) code:
string dirRootUpdate = string.Empty;
string fileNameupdate = string.Empty;
string pathToGetUpdate = string.Empty;
string[] _f = Properties.Settings.Default.AutoUpdateServerUrl.Split('/');
for (int i = 0; i < _f.Length - 1; i++)
{
    dirRootUpdate += _f[i];
    if (i == 0) // is the first one
    {
        dirRootUpdate += "/";
    }
    else if (i != _f.Length - 2) // not the last one?
    {
        dirRootUpdate += "/";
    }
}
fileNameupdate = _f[_f.Length - 1];
The setting Properties.Settings.Default.AutoUpdateServerUrl contains the URL being processed.
It works fine, though it may need some refinement to look better.
Hope it helps someone.
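For URLs like the one in that setting, the same directory/file split can also be written with LastIndexOf, much like the earlier one-line GetDirectoryName answer. A sketch with hypothetical names:

```csharp
using System;

public static class UrlSplitter
{
    // Splits e.g. "http://server/updates/setup.exe" into the part before the
    // last '/' ("http://server/updates") and the part after it ("setup.exe").
    public static void Split(string url, out string directory, out string fileName)
    {
        int lastSlash = url.LastIndexOf('/');
        if (lastSlash < 0)
            throw new ArgumentException("URL contains no '/'", nameof(url));
        directory = url.Substring(0, lastSlash);
        fileName = url.Substring(lastSlash + 1);
    }
}
```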

extract query string from a URL string

I am reading from the history, and whenever I come across a Google query I want to extract the query string. I am not using Request or HttpUtility, since I am simply parsing a string. However, when I come across URLs like this one, my program fails to parse them properly:
http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google
What I was trying to do is get the index of q= and the index of &, and take the text in between. But in this case the index of & is smaller than the index of q=, so I get errors.
Any suggestions?
Thanks for your answers, they all seem good :) P.S. I couldn't use HttpUtility; it's not that I didn't want to. When I add a reference to System.Web, HttpUtility isn't included! It's only included in an ASP.NET application. Thanks again

It's not clear why you don't want to use HttpUtility. You could always add a reference to System.Web and use it:
var parsedQuery = HttpUtility.ParseQueryString(input);
Console.WriteLine(parsedQuery["q"]);
If that's not an option then perhaps this approach will help:
var query = input.Split('&')
.Single(s => s.StartsWith("q="))
.Substring(2);
Console.WriteLine(query);
It splits on & and looks for the single split result that begins with "q=" and takes the substring at position 2 to return everything after the = sign. The assumption is that there will be a single match, which seems reasonable for this case, otherwise an exception will be thrown. If that's not the case then replace Single with Where, loop over the results and perform the same substring operation in the loop.
EDIT: to cover the scenario mentioned in the comments this updated version can be used:
int index = input.IndexOf('?');
var query = input.Substring(index + 1)
.Split('&')
.SingleOrDefault(s => s.StartsWith("q="));
if (query != null)
Console.WriteLine(query.Substring(2));

If you don't want to use System.Web.HttpUtility (and thus be able to target the Client Profile), you can still use Mono's HttpUtility.cs, which is a single independent .cs file that you can embed in your application. Then you can simply use the ParseQueryString method inside that class to parse the query string properly.

Here is the solution:
string GetQueryString(string url, string key)
{
    var uri = new Uri(url);
    // Parse only the query part (after '?') of the URL
    var parsedQuery = HttpUtility.ParseQueryString(uri.Query);
    // The indexer returns null when the key is absent; return an empty string instead
    return parsedQuery[key] ?? string.Empty;
}

Why don't you write code that returns the string from q= up to the next &?
For example:
string s = url.Substring(url.IndexOf("q="));
int newIndex = s.IndexOf("&");
// If q= is the last parameter there is no trailing &, so take the rest
string newString = newIndex >= 0 ? s.Substring(0, newIndex) : s;
Cheers

Use the tools available:
String UrlStr = "http://www.google.com.mt/search?client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&channel=s&hl=mt&source=hp&biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
NameValueCollection Items = HttpUtility.ParseQueryString(UrlStr);
String QValue = Items["q"];

If you really need to do the parsing yourself, and are only interested in the value for 'q' then the following would work:
string url = @"http://www.google.com.mt/search?" +
    "client=firefox-a&rls=org.mozilla%3Aen-" +
    "US%3Aofficial&channel=s&hl=mt&source=hp&" +
    "biw=986&bih=663&q=hotmail&meta=&btnG=Fittex+bil-Google";
int question = url.IndexOf("?");
if(question>-1)
{
int qindex = url.IndexOf("q=", question);
if (qindex > -1)
{
int ampersand = url.IndexOf('&', qindex);
string token = null;
if (ampersand > -1)
token = url.Substring(qindex+2, ampersand - qindex - 2);
else
token = url.Substring(qindex+2);
Console.WriteLine(token);
}
}
But do try to look at using a proper URL parser, it will save you a lot of hassle in the future.
(Amended this answer to include a check for the '?' token and to support a 'q' value at the end of the query string, without a trailing '&'.)

And that's why you should use Uri and HttpUtility.ParseQueryString.
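Since the asker couldn't reference HttpUtility, here is a minimal dependency-free sketch that combines Uri with a hand-rolled parser. It is a stand-in under stated assumptions, not a full replacement for ParseQueryString: it decodes + and %XX escapes and keeps only the first value of a repeated key (ParseQueryString's indexer joins repeated values with commas). The names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class QueryStringParser
{
    // Parses the query part of an absolute URL into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string url)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string query = new Uri(url).Query.TrimStart('?');
        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = pair.IndexOf('=');
            string key = Decode(eq < 0 ? pair : pair.Substring(0, eq));
            string value = eq < 0 ? string.Empty : Decode(pair.Substring(eq + 1));
            if (!result.ContainsKey(key)) result[key] = value; // first value wins
        }
        return result;
    }

    // '+' means space in form-encoded queries; replace it before unescaping %XX.
    private static string Decode(string s) => Uri.UnescapeDataString(s.Replace('+', ' '));
}
```

For the Google URL in the question, Parse(url)["q"] returns "hotmail" and Parse(url)["btnG"] returns "Fittex bil-Google".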

HttpUtility is fine for the .NET Framework. However, that class is not available for WinRT apps. If you want to get the parameters from a URL in a Windows Store app, you need to use WwwFormUrlDecoder. You create an object of this class from the query string you want to get the parameters from; the object is enumerable, so it also works with LINQ lambda expressions.
Here's an example
var stringUrl = "http://localhost/?name=Jonathan&lastName=Morales";
var decoder = new WwwFormUrlDecoder(stringUrl);
//Using GetFirstByName method
string nameValue = decoder.GetFirstByName("name");
//nameValue has "Jonathan"
//Using Lambda Expressions
var parameter = decoder.FirstOrDefault(p => p.Name.Contains("last")); //IWwwFormUrlDecoderEntry variable type
string parameterName = parameter.Name; //lastName
string parameterValue = parameter.Value; //Morales
You can also see http://www.dzhang.com/blog/2012/08/21/parsing-uri-query-strings-in-windows-8-metro-style-apps

C# file upload: no groups from reg ex?

This code was working fine until this morning. Can anyone spot my mistake? It's probably really silly, but it has me stumped!
I use a form to submit a file (field name 'fileUpEx'), and then I wrote a class to upload it (like I said, it's been working for ages)....
(If I write 'filepath' to the page, it is 'Test copy.pdf'.)
My class returns 'no groups'!!!
Very odd, can anyone please help?
string filepath = fileUpEx.PostedFile.FileName;
string pat = @"\\(?:.+)\\(.+)\.(.+)";
Regex r = new Regex(pat);
Match m = r.Match(filepath);
if (m.Groups[0].Captures.Count != 0)
{
//blaa blaa blaa
}
else
{
return "no Groups";
}
Thanks in advance,
Vauneen

Your regular expression requires that the file path contains a backslash, which it doesn't. You could perhaps make the directory part optional, for example:
@"^(?:.*\\)?(.+)\.(.+)$"
Alternatively, you could use the methods available in System.IO.Path:
string extension = Path.GetExtension(filepath);
string filename = Path.GetFileNameWithoutExtension(filepath);
