I am trying to extract the ID of a YouTube video from the embed URL that YouTube supplies.
I am currently using this as an example:
<iframe width="560" height="315" src="http://www.youtube.com/embed/aSVpBqOsC7o" frameborder="0" allowfullscreen></iframe>
So far my code looks like this:
else if (TB_VideoLink.Text.Trim().Contains("http://www.youtube.com/embed/"))
{
youtube_url = TB_VideoLink.Text.Trim();
int Count = youtube_url.IndexOf("/embed/", 7);
string cutid = youtube_url.Substring(Count,youtube_url.IndexOf("\" frameborder"));
LB_VideoCodeLink.Text = cutid;
}
I seem to be getting there; however, the code falls over on cutid and I am not sure why.
Cheers
I always find it much easier to use regular expressions for this sort of thing. Substring and IndexOf always seem dated to me, but that's just my personal opinion.
Here is how I would solve this problem.
Regex regexPattern = new Regex(@"src=\""\S+/embed/(?<videoId>\w+)");
Match videoIdMatch = regexPattern.Match(TB_VideoLink.Text);
if (videoIdMatch.Success)
{
LB_VideoCodeLink.Text = videoIdMatch.Groups["videoId"].Value;
}
This will perform a regular expression match, locating src=", ignoring all characters up until /embed/ then extracting all the word characters after it as a named group.
You can then get the value of this named group. The advantage is, this will work even if frameborder does not occur directly after the src.
Hope this is useful,
Luke
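To illustrate the point about attribute order, here is a minimal sketch of the same named-group idea wrapped in a helper. The names YouTubeEmbed and ExtractId are hypothetical, and the character class [\w-] is my own assumption: YouTube IDs can contain a hyphen, which \w alone does not match.

```csharp
using System.Text.RegularExpressions;

public static class YouTubeEmbed
{
    // Assumption: an ID is made of letters, digits, '_' and '-'.
    private static readonly Regex EmbedId =
        new Regex(@"/embed/(?<videoId>[\w-]+)", RegexOptions.Compiled);

    // Returns the video ID, or null when the input has no /embed/ URL.
    public static string ExtractId(string embedHtml)
    {
        Match match = EmbedId.Match(embedHtml ?? string.Empty);
        return match.Success ? match.Groups["videoId"].Value : null;
    }
}
```

Because the pattern only looks for /embed/ and the characters after it, it does not matter whether frameborder, width or any other attribute comes before or after src.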
The second parameter of the Substring method is a length, not an end index. Subtract the start index from the end index to get the required length, and start just past "/embed/" so that the marker itself is not included in the result.
else if (TB_VideoLink.Text.Trim().Contains("http://www.youtube.com/embed/"))
{
youtube_url = TB_VideoLink.Text.Trim();
// Find the start of the ID, just past "/embed/"
int Count = youtube_url.IndexOf("/embed/", 7) + "/embed/".Length;
// From the start of the ID, search for the next "
int endIndex = youtube_url.IndexOf("\"", Count);
// The ID starts at 'Count' and runs for the next (endIndex - Count) characters
string cutid = youtube_url.Substring(Count, endIndex - Count);
LB_VideoCodeLink.Text = cutid;
}
You probably should have some more exception handling for when either of the two test strings does not exist.
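As a rough sketch of that extra checking, the helper below returns false instead of throwing when either marker is missing. The names EmbedParser and TryGetVideoId are hypothetical and not part of the original code.

```csharp
using System;

public static class EmbedParser
{
    // Returns true and sets videoId only when both "/embed/" and a closing quote are found.
    public static bool TryGetVideoId(string html, out string videoId)
    {
        const string marker = "/embed/";
        videoId = null;
        if (string.IsNullOrEmpty(html)) return false;

        int start = html.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return false;      // "/embed/" is not present
        start += marker.Length;           // skip past the marker itself

        int end = html.IndexOf('"', start);
        if (end < 0) return false;        // no closing quote after the ID

        videoId = html.Substring(start, end - start);
        return videoId.Length > 0;
    }
}
```

The caller can then write something like: if (EmbedParser.TryGetVideoId(TB_VideoLink.Text.Trim(), out string id)) LB_VideoCodeLink.Text = id;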
Similar to answer above, but was beaten to it.. doh
//Regex with YouTube Url and Group () any Word character a-z0-9 and expect 1 or more characters +
var youTubeIdRegex = new Regex(@"http://www.youtube.com/embed/(?<videoId>\w+)",RegexOptions.IgnoreCase|RegexOptions.Compiled);
var youTubeUrl = TB_VideoLink.Text.Trim();
var match = youTubeIdRegex.Match(youTubeUrl);
var youTubeId = match.Groups["videoId"].Value; //Group[1] is (\w+) -- first group ()
LB_VideoCodeLink.Text = youTubeId;
I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
You could even specify the maximum number of substrings using an overload of Split. The count must be 2, because a count of 1 returns the whole string unsplit.
stringin.Split(new[] { '-' }, 2)[0].Trim();
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
Which throws an ArgumentOutOfRangeException: with no hyphen to find, IndexOf returns -1, so the requested length becomes -2.
Make a simple change to your regex and it works with or without a hyphen: ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
The only change is the quantifier after the hyphen: -? becomes -+.
It seems that you want the substring after the dash ('-') if the original string contains one, or the original string if it doesn't have a dash.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you; it makes the solution much more complex than it needs to be.
For a string that I am sure has no more than a couple of dashes I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it creates a large number of strings even though you are only looking for the first one. In that case the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1 ? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, then you can clean it up from there.
This is also a great place to start writing unit tests. They are very good for stuff like this.
Here's what the code could look like:
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
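On the unit-test suggestion, here is a small framework-free sketch of what such checks might look like. HyphenStripper, BeforeHyphen and RunChecks are hypothetical names, and the Trim() call is my assumption, based on the question's expected output of "test" for "test - 9894".

```csharp
using System;

public static class HyphenStripper
{
    // Returns the text before the first hyphen, trimmed; the whole input when there is no hyphen.
    public static string BeforeHyphen(string input)
    {
        int hyphenIndex = input.IndexOf('-');
        string head = hyphenIndex > -1 ? input.Substring(0, hyphenIndex) : input;
        return head.Trim();
    }

    // Runs a handful of cases and throws on the first one that fails.
    public static void RunChecks()
    {
        Check("test - 9894", "test");
        Check("test", "test");          // no hyphen at all
        Check("hsh4a - 8989", "hsh4a");
        Check("-leading", "");          // hyphen at index 0
        Check("a-b-c", "a");            // only the first hyphen matters
    }

    private static void Check(string input, string expected)
    {
        string actual = BeforeHyphen(input);
        if (actual != expected)
            throw new InvalidOperationException(
                "BeforeHyphen(\"" + input + "\") returned \"" + actual + "\", expected \"" + expected + "\"");
    }
}
```

In a real project the same cases would normally live in a test framework of your choice; the point is only that the no-hyphen and leading-hyphen inputs are covered.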
I have a file called file_test1.txt and I want to extract just test1 from the name and place it in a string. What's the best way of doing this?
E.g.
string fullfile = @"C:\file_test1.txt";
string section = [test1] from fullfile; // <- expected result
I want to be able to split on 'file_' and '.txt' as the 'test1' section could be larger or smaller however the 'file_' and '.txt' will always be the same.
Try Path.GetFileNameWithoutExtension(fullfile).Substring(5) (or Substring("TEMPLATE_PREFIX".Length))
You can try Split:
var test = Path.GetFileNameWithoutExtension(fullfile).Split('_')[1];
Try the following:
string fullfile = @"C:\file_test1.txt";
var name = fullfile.Substring(8, fullfile.Length - 12);
As C:\file_ and .txt are fixed, you can take a Substring starting at index 8 (skipping the leading name) with a length of the total string length minus 12 (12 is the combined length of the leading name and the trailing extension).
Thought I'd give a solution that uses Split and handles files with multiple underscores:
string.Join("_", Path.GetFileNameWithoutExtension(file).Split('_').Skip(1));
String.Split() works quite well for my uses:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx
Obviously many ways to accomplish this. Here's yet another approach:
string fullfile = @"C:\file_test1.txt";
int index1 = fullfile.LastIndexOf("file_");
if (index1 != -1)
{
int index2 = fullfile.IndexOf(".", index1);
if (index2 != -1)
{
string section = fullfile.Substring(index1 + 5, index2 - index1 - 5);
}
}
You could also get "test1", or any subsequent filename (assuming your file naming convention remains constant!) using this regular expression:
var defaultRegex = new Regex(@"(?<=_).*(?=\.txt)");
var matches = defaultRegex.Matches(fullfile);
var match = matches[0].Value;
The regular expression:
(?<=_).*(?=\.txt)
uses positive lookbehind to find text preceded by '_', and also positive lookahead to find text which has '.txt' ahead of it. The dot is escaped so that it matches a literal '.' rather than any character.
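A slightly stricter variant of the lookaround idea, as a sketch: it pins the lookbehind to the full "file_" prefix and the lookahead to ".txt" at the end of the input. FileSection and Extract are hypothetical names, and returning null on no match is my own choice.

```csharp
using System.Text.RegularExpressions;

public static class FileSection
{
    // (?<=file_) : the match must be preceded by "file_"
    // .+?        : the section itself, as short as possible
    // (?=\.txt$) : the match must be followed by ".txt" at the end of the input
    private static readonly Regex Section =
        new Regex(@"(?<=file_).+?(?=\.txt$)", RegexOptions.IgnoreCase);

    // Returns the section between "file_" and ".txt", or null if the name does not fit.
    public static string Extract(string path)
    {
        Match match = Section.Match(path);
        return match.Success ? match.Value : null;
    }
}
```

Checking match.Success first avoids the exception that matches[0] would throw on a file name that does not follow the convention.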
I have a mental block and can't seem to figure this out; I'm sure it's pretty easy 0_o
I have the following string: "5555S1"
String can contain any number of digits, followed by a Letter(A-Z), followed by numbers again.
How do I get the index of the Letter(S), so that I can use Substring to get everything from the Letter onwards?
Ie: 5555S1
Should return S1
Cheers
You could also check if the integer representation of the character is >= 65 && <=90.
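A minimal C# sketch of that code-point check, assuming only upper-case A-Z counts as a letter (as in the question). LetterFinder and its method names are hypothetical.

```csharp
public static class LetterFinder
{
    // Index of the first character in 'A'..'Z' (code points 65..90), or -1 if there is none.
    public static int IndexOfFirstUpperLetter(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            int code = text[i];                      // implicit char -> int conversion
            if (code >= 65 && code <= 90) return i;
        }
        return -1;
    }

    // Everything from the first upper-case letter onwards, or "" if there is no letter.
    public static string FromFirstLetter(string text)
    {
        int index = IndexOfFirstUpperLetter(text);
        return index >= 0 ? text.Substring(index) : string.Empty;
    }
}
```

For the question's input "5555S1" the index is 4 and the substring is "S1".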
Simple Python:
test = '5555Z187456764587368457638'
for i in range(0, len(test)):
    if test[i].isalpha():
        break
print(test[i:])
Yields: Z187456764587368457638
Given that you didn't say what language you're using, I'm going to pick the one I want to answer in: C#.
String.IndexOf, see http://msdn.microsoft.com/en-us/library/system.string.indexof.aspx for more.
For good measure, here it is in Java: string.indexOf
One way could be to loop through the string until you find a letter.
while (!isAlpha(s[i]))
i++;
or something like that should work.
This doesn't answer your question but it does solve your problem.
(Although you can use it to work out the index)
Your problem is a good candidate for Regular Expressions (regex)
Here is one I prepared earlier:
String code = "1234A0987";
//timeout optional but needed for security (so bad guys dont overload your server)
TimeSpan timeout = TimeSpan.FromMilliseconds(150);
//Magic here:
//Pattern == (Block of 1 or more numbers)(block of 1 or more not numbers)(Block of 1 or more numbers)
String regexPattern = @"^(?<firstNum>\d+)(?<notNumber>\D+)(?<SecondNum>\d+)?";
Regex r = new Regex(regexPattern, RegexOptions.None, timeout);
Match m = r.Match(code);
if (m.Success)//We got a match!
{
Console.WriteLine("SecondNumber: {0}", m.Result("${SecondNum}"));
Console.WriteLine("All data (formatted): {0}", m.Result("${firstNum}-${notNumber}-${SecondNum}"));
Console.WriteLine("Offset length (not that you need it now): {0}", m.Result("${firstNum}").Length);
}
Output:
SecondNumber: 0987
All data (formatted): 1234-A-0987
Offset length (not that you need it now): 4
Further info on this example here.
So there you go, you can even work out what that index was.
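If the index itself is what you are after, a named group can report its own position through the Index property of the captured group. This is a sketch with hypothetical names (CodeSplitter, LetterIndex, Tail) and a simplified pattern that assumes exactly one upper-case letter between the digits.

```csharp
using System.Text.RegularExpressions;

public static class CodeSplitter
{
    // digits, then one letter A-Z (named "letter"), then optional digits
    private static readonly Regex Pattern =
        new Regex(@"^\d+(?<letter>[A-Z])\d*$");

    // Position of the letter in the input, or -1 when the input does not fit the pattern.
    public static int LetterIndex(string code)
    {
        Match m = Pattern.Match(code);
        return m.Success ? m.Groups["letter"].Index : -1;
    }

    // Everything from the letter onwards, or "" when the input does not fit the pattern.
    public static string Tail(string code)
    {
        int index = LetterIndex(code);
        return index >= 0 ? code.Substring(index) : string.Empty;
    }
}
```

This avoids measuring the length of the first group by hand, since the group already knows where it starts.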
Regex cheat sheet
The code below is designed to take a string in and remove any of a set of arbitrary words that are considered non-essential to a search phrase.
I didn't write the code, but need to incorporate it into something else. It works, and that's good, but it just feels wrong to me. However, I can't seem to get my head outside the box that this method has created to think of another approach.
Maybe I'm just making it more complicated than it needs to be, but I feel like this might be cleaner with a different technique, perhaps by using LINQ.
I would welcome any suggestions; including the suggestion that I'm over thinking it and that the existing code is perfectly clear, concise and performant.
So, here's the code:
private string RemoveNonEssentialWords(string phrase)
{
//This array is being created manually for demo purposes. In production code it's passed in from elsewhere.
string[] nonessentials = {"left", "right", "acute", "chronic", "excessive", "extensive",
"upper", "lower", "complete", "partial", "subacute", "severe",
"moderate", "total", "small", "large", "minor", "multiple", "early",
"major", "bilateral", "progressive"};
int index = -1;
for (int i = 0; i < nonessentials.Length; i++)
{
index = phrase.ToLower().IndexOf(nonessentials[i]);
while (index >= 0)
{
phrase = phrase.Remove(index, nonessentials[i].Length);
phrase = phrase.Trim().Replace("  ", " ");
index = phrase.IndexOf(nonessentials[i]);
}
}
return phrase;
}
Thanks in advance for your help.
Cheers,
Steve
This appears to be an algorithm for removing stop words from a search phrase.
Here's one thought: If this is in fact being used for a search, do you need the resulting phrase to be a perfect representation of the original (with all original whitespace intact), but with stop words removed, or can it be "close enough" so that the results are still effectively the same?
One approach would be to tokenize the phrase (using the approach of your choice - could be a regex, I'll use a simple split) and then reassemble it with the stop words removed. Example:
public static string RemoveStopWords(string phrase, IEnumerable<string> stop)
{
var tokens = Tokenize(phrase);
var filteredTokens = tokens.Where(s => !stop.Contains(s));
return string.Join(" ", filteredTokens.ToArray());
}
public static IEnumerable<string> Tokenize(string phrase)
{
return phrase.Split(' ');
// Or use a regex, such as:
// return Regex.Split(phrase, @"\W+");
}
This won't give you exactly the same result, but I'll bet that it's close enough and it will definitely run a lot more efficiently. Actual search engines use an approach similar to this, since everything is indexed and searched at the word level, not the character level.
I guess your code is not doing what you want it to do anyway. "moderated" would be converted to "d" if I'm right. To get a good solution you have to specify your requirements in a bit more detail. I would probably use Replace or regular expressions.
I would use a regular expression (created inside the function) for this task. I think it would be capable of doing all the processing at once without having to make multiple passes through the string or having to create multiple intermediate strings.
private string RemoveNonEssentialWords(string phrase)
{
return Regex.Replace(phrase, // input
@"\b(" + String.Join("|", nonessentials) + @")\b", // pattern
"", // replacement
RegexOptions.IgnoreCase)
.Replace("  ", " ");
}
The \b at the beginning and end of the pattern makes sure that the match is on a boundary between alphanumeric and non-alphanumeric characters. In other words, it will not match just part of the word, like your sample code does.
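To make the word-boundary behaviour concrete, here is a self-contained sketch along the same lines. StopWordRegex and Remove are hypothetical names; passing the word list as a parameter, escaping each word with Regex.Escape, and collapsing whitespace with \s+ are my own additions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRegex
{
    public static string Remove(string phrase, IEnumerable<string> words)
    {
        // Escape each word so characters such as '.' or '+' are treated literally.
        string alternation = string.Join("|", words.Select(Regex.Escape));

        // \b ... \b only matches whole words, so "moderate" does not touch "moderated".
        string stripped = Regex.Replace(
            phrase, @"\b(?:" + alternation + @")\b", "", RegexOptions.IgnoreCase);

        // Collapse the gaps left behind by the removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

With the words "left" and "severe", the phrase "Severe left knee pain" comes back as "knee pain", while "moderated swelling" is left alone when the only stop word is "moderate".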
Yeah, that smells.
I like little state machines for parsing, they can be self-contained inside a method using lists of delegates, looping through the characters in the input and sending each one through the state functions (which I have return the next state function based on the examined character).
For performance I would flush out whole words to a string builder after I've hit a separating character and checked the word against the list (might use a hash set for that).
I would create a hash table of the words to remove, then parse each word of the phrase and drop it if it is in the hash. That takes only one pass through the array, and I believe that creating a hash table is O(n).
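A rough sketch of the hash-set idea, assuming words are separated by spaces and that matching should ignore case. StopWordFilter and Remove are hypothetical names, and a simple Split stands in for the character-by-character state machine described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StopWordFilter
{
    public static string Remove(string phrase, IEnumerable<string> stopWords)
    {
        // Building the set is linear in the number of stop words; lookups are constant time on average.
        var stop = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        var result = new StringBuilder(phrase.Length);

        // Single pass over the words of the phrase.
        foreach (string word in phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (stop.Contains(word)) continue;          // drop non-essential words
            if (result.Length > 0) result.Append(' ');  // single space between kept words
            result.Append(word);
        }
        return result.ToString();
    }
}
```

Because whole words are compared, "moderated" is kept even when "moderate" is in the list, which the original IndexOf/Remove loop does not guarantee.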
How does this look?
foreach (string nonEssent in nonessentials)
{
// Strings are immutable: Replace returns a new string, so assign the result
phrase = phrase.Replace(nonEssent, String.Empty);
}
phrase = phrase.Replace("  ", " ");
If you want to go the Regex route, you could do it like this. If you're going for speed it's worth a try and you can compare/contrast with other methods:
Start by creating a Regex from the array input. Something like:
var regexString = "\\b(" + string.Join("|", nonessentials) + ")\\b";
That will result in something like:
\b(left|right|chronic)\b
Then create a Regex object to do the find/replace:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(regexString, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Then you can just do a Replace like so:
string fixedPhrase = regex.Replace(phrase, "");
I was wondering which is the best way to turn a string (e.g. a post title) into a descriptive URL.
The simplest way that comes to mind is by using a regex, such as:
public static Regex regex = new Regex(
"\\W+",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
string result = regex.Replace(InputText,"_");
which turns
"my first (yet not so bad) cupcake!! :) .//\."
into
my_first_yet_not_so_bad_cupcake_
Then I can strip the last "_", check the result against my DB and see if it's already present. In that case I would add a trailing number to make it unique and recheck.
I could use it in, say
http://myblogsite.xom/posts/my_first_yet_not_so_bad_cupcake
But is this way safe? Should I check other things (like the length of the string)?
Is there any other, better method you prefer?
thanks
Here's what I do. regStripNonAlpha removes every character that is not a word character, whitespace, or "-". Trim() removes trailing and leading spaces (so we don't end up with dashes on either side). regSpaceToDash converts spaces (or runs of spaces) into a single dash. This has worked well for me.
static Regex regStripNonAlpha = new Regex(@"[^\w\s\-]+", RegexOptions.Compiled);
static Regex regSpaceToDash = new Regex(@"[\s]+", RegexOptions.Compiled);
public static string MakeUrlCompatible(string title)
{
return regSpaceToDash.Replace(
regStripNonAlpha.Replace(title, string.Empty).Trim(), "-");
}
string result = regex.Replace(InputText,"-");
Use a hyphen (-) instead of an underscore; that gives an added advantage with the Google search engine.
See the post below for more details:
http://www.mattcutts.com/blog/dashes-vs-underscores/
Here's a method I wrote not too long ago that takes a string and formats it to a permalink.
private string FormatPermalink(string title)
{
StringBuilder result = new StringBuilder();
title = title.Trim();
bool lastOneChanged = false;
for (int i = 0; i < title.Length; i++)
{
char c = title[i];
if (!char.IsLetterOrDigit(c))
{
c = '_';
if (lastOneChanged)
{
continue;
}
lastOneChanged = true;
}
else
{
lastOneChanged = false;
}
result.Append(c);
}
if (result.Length > 0 && result[result.Length - 1] == '_') //if last one is underscore, remove (the length check guards against an empty title)
{
result = result.Remove(result.Length - 1, 1);
}
return result.ToString();
}
This handles special characters as well: any run of characters that are not letters or digits is collapsed into a single underscore, and a trailing underscore is removed.
You could look into a URL re-writing HTTPModule. There are many examples on the net.
Once implemented in your web.config you simply specify the regular expression to map to the "real" page using the SEO friendly name
<!-- Rule 1: example... "/admin/somepage" redirects to..."/UI/Forms/Admin/frmPage.aspx" -->
<add key="^/admin/(.*)" value="/UI/Forms/Admin/frm$1.aspx" />
If you want to avoid doing this yourself, an HttpModule like http://urlrewriter.net/
could help. It's pretty good but requires a bit of setting up.
Personally, I'd couple your special character removing with a date so your example would look like:
http://myblogsite.xom/posts/2009/04/03/my_first_yet_not_so_bad_cupcake
That way, if you have content with the same title, it gets differentiated by date too. I see this often on some blogs I visit where they use "Five Random Things Make A Post" a lot (but not within the same day).
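A minimal sketch of the dated form, reusing the \W+ replacement from the question. Permalink and Build are hypothetical names, and the "/posts/" prefix is simply taken from the example URL above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class Permalink
{
    public static string Build(DateTime published, string title)
    {
        // Replace each run of non-word characters with "_" and drop leading/trailing underscores.
        string slug = Regex.Replace(title, @"\W+", "_").Trim('_');

        // Quoted slashes keep the separator fixed regardless of the current culture.
        string datePart = published.ToString("yyyy'/'MM'/'dd", CultureInfo.InvariantCulture);

        return "/posts/" + datePart + "/" + slug;
    }
}
```

Two posts with the same title then only collide if they are published on the same day, so the uniqueness check against the database still applies, just far less often.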