I had trouble downloading a file from Mediafire. I found out that I have to use their API. I found another SO question: "Get direct download link and file site from Mediafire.com"
With the help of the shown functions I created the following class:
class Program
{
static void Main(string[] args)
{
Mediafireclass mf = new Mediafireclass();
WebClient webClient = new WebClient();
mf.Mediafiredownload("somemediafirelink/test.txt");
webClient.DownloadFileAsync(new Uri("somemediafirelink/test.txt"), @"location to save/test.txt");
}
}
and used the function by T3KBAU5 like this:
internal class Mediafireclass
{
public string Mediafiredownload(string download)
{
HttpWebRequest req;
HttpWebResponse res;
string str = "";
req = (HttpWebRequest)WebRequest.Create(download);
res = (HttpWebResponse)req.GetResponse();
str = new StreamReader(res.GetResponseStream()).ReadToEnd();
int indexurl = str.IndexOf("http://download");
int indexend = GetNextIndexOf('"', str, indexurl);
string direct = str.Substring(indexurl, indexend - indexurl);
return direct;
}
private int GetNextIndexOf(char c, string source, int start)
{
if (start < 0 || start > source.Length - 1)
{
throw new ArgumentOutOfRangeException();
}
for (int i = start; i < source.Length; i++)
{
if (source[i] == c)
{
return i;
}
}
return -1;
}
}
But when I run it this error pops up:
Screenshot of the Error
What can I do to solve the problem, and can you explain what this error means?
Firstly, the Mediafiredownload method returns a string, the direct download link, which you are not using. Your code should resemble:
Mediafireclass mf = new Mediafireclass();
WebClient webClient = new WebClient();
string directLink = mf.Mediafiredownload("somemediafirelink/test.txt");
webClient.DownloadFileAsync(new Uri(directLink), @"location to save/test.txt");
As for the exception it's firing, it's important to understand what the GetNextIndexOf method is doing: iterating through a string, source, to find the index of a character, c, at or after a certain position, start. The first line in that method checks that start lies within the source string, and throws an ArgumentOutOfRangeException if it does not. Note that str.IndexOf("http://download") returns -1 when the text is not found, and -1 fails that check. You need to set a breakpoint on this line:
int indexend = GetNextIndexOf('"', str, indexurl);
And look at the values of str and indexurl using the locals window. This will reveal the problem.
Also, the code you are using is almost 5 years old and I expect this problem is more to do with the fact that Mediafire will have changed the URL structure since then. Your code relies on the fact that the url contains "http://download" which may not be the case any more.
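A minimal sketch of the guard this answer implies, assuming the page may no longer contain the "http://download" marker. The names LinkExtractor and ExtractFrom are hypothetical and not part of the original code; the built-in string.IndexOf(char, int) stands in for GetNextIndexOf. It returns null instead of throwing when either the marker or the closing quote is missing:

```csharp
using System;

static class LinkExtractor
{
    // Hypothetical helper: returns the text that starts at `marker` and ends
    // just before the next `terminator`, or null when either one is missing.
    public static string ExtractFrom(string source, string marker, char terminator)
    {
        if (string.IsNullOrEmpty(source)) return null;

        // IndexOf returns -1 when the marker is absent; check before using it.
        int start = source.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null;

        // Look for the terminator at or after the marker position.
        int end = source.IndexOf(terminator, start);
        if (end < 0) return null;

        return source.Substring(start, end - start);
    }
}
```

A caller could then test the result for null before passing it to new Uri(...), and report that the page layout has probably changed.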
The easiest way is to use a DLL such as DirektDownloadLinkCatcher.
Alternatively, query for the div with the "download_link" class and then read the href of the tag it contains. That is how I solved it in that DLL.
Or use the API from MediaFire.
I hope this helps.
I solved the problem. It must be like this:
int baslangic = Kodlar.IndexOf("<img src=") + 3;
int bitis = Kodlar.Substring(baslangic).IndexOf(">");
I'm trying to parse HTML with a StreamReader.
My goal is to get all image links.
My code is:
string site;
site = $"http://tr.socialll.net/search?name={isim}+{soyad}&location={sehir}&gender=both";
WebRequest talep = HttpWebRequest.Create(site);
WebResponse cevap = talep.GetResponse();
StreamReader oku = new StreamReader(cevap.GetResponseStream());
string Kodlar = oku.ReadToEnd();
int start = Kodlar.IndexOf("<img>") + 4;
int finish = Kodlar.Substring(start).IndexOf("</img>");
Console.WriteLine(Kodlar.Substring(start, finish));
Console.Read();
I want to extract this:
<img src="https://iasdai.net/img/user/128x128/116a38953-MWOVJ4aS250K5U.jpg" onerror="this.src='http://tr.socialll.net/img/alternative.png';" alt="">
But I get an error message like this:
An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in mscorlib.dll
What should I do?
You could use the HtmlDocument class and get all links by their tags through the predefined method GetElementsByTagName(String)
One problem I spotted is how you are searching for the img element:
int start = Kodlar.IndexOf("<img>") + 4;
int finish = Kodlar.Substring(start).IndexOf("</img>");
Compare this to the actual image element: it begins with <img src=" and ends with >. There is no <img> or </img> anywhere in it, so neither search found anything and IndexOf returned -1 both times. That left start at 3 (-1 + 4) and finish at -1, and calling Substring(start, finish) with a negative length threw the out of range error.
So what you would probably want to do is adjust your start and finish definitions to something like this:
int start = Kodlar.IndexOf("<img ") + 4;
int finish = Kodlar.Substring(start).IndexOf(">");
You may need to double check the values to verify.
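Since the stated goal is to get all image links, one way to extend the corrected search is to loop over the page and check every IndexOf result for -1 before calling Substring. This is only a sketch: ImgTagFinder and FindImgTags are hypothetical names, and it assumes no attribute value contains a '>' character. For real-world pages an HTML parser is likely the more robust choice.

```csharp
using System;
using System.Collections.Generic;

static class ImgTagFinder
{
    // Hypothetical helper: collects every "<img ...>" tag found in the markup.
    public static List<string> FindImgTags(string html)
    {
        var tags = new List<string>();
        int pos = 0;

        while (pos < html.Length)
        {
            // Find the next opening "<img " at or after the current position.
            int start = html.IndexOf("<img ", pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;   // no more image tags

            // Find the '>' that closes this tag.
            int end = html.IndexOf('>', start);
            if (end < 0) break;     // unterminated tag, stop searching

            tags.Add(html.Substring(start, end - start + 1));
            pos = end + 1;          // continue after this tag
        }

        return tags;
    }
}
```

Calling ImgTagFinder.FindImgTags(Kodlar) on the downloaded page would return the tags as strings; the src attribute can then be pulled out of each one with the same IndexOf pattern.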
I'm trying to write a C# program that gets a line from a website and uses it.
Unfortunately, I don't know the full line on the site. I only know that it starts with "steam://joinlobby/730/"; what comes after "/730/" is always different.
So I need help getting the part that comes after it.
What I've got:
public void Main()
{
WebClient web = new WebClient();
// This is the page I want to download and read text from.
string result = web.DownloadString("http://steamcommunity.com/id/peppahtank");
if (result.Contains("steam://joinlobby/730/"))
{
//get the part after /730/
}
}
I can tell you that it always ends with "xxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxx"
so: steam://joinlobby/730/xxxxxxxxx/xxxxxxxx.
What's to prevent you from just splitting the string on '/730/'?
result.Split(new[] { "/730/" }, StringSplitOptions.None)[1]
https://msdn.microsoft.com/en-us/library/system.string.split(v=vs.110).aspx
The easiest method for this particular case would be to take the first part, and then just skip that many characters
const string Prefix = @"steam://joinlobby/730/";
//...
if(result.StartsWith(Prefix))
{
var otherPart = result.Substring(Prefix.Length);
// TODO: Process other part
}
Make sure your result is not null and begins with steam://joinlobby/730/
if(!string.IsNullOrWhiteSpace(result) && result.StartsWith("steam://joinlobby/730/"))
{
string rest = result.Substring(("steam://joinlobby/730/").Length);
}
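Because result holds the entire downloaded page, the link is unlikely to sit at the very start of the string, so StartsWith may never match. A hedged alternative is to locate the prefix with IndexOf and read up to the next delimiter. LobbyLink and FindLobbySuffix are hypothetical names, and the set of delimiter characters is an assumption about how the link is embedded in the page:

```csharp
using System;

static class LobbyLink
{
    const string Prefix = "steam://joinlobby/730/";

    // Hypothetical helper: returns the text following the prefix, up to the
    // next quote, angle bracket or whitespace, or null if the prefix is absent.
    public static string FindLobbySuffix(string page)
    {
        if (string.IsNullOrEmpty(page)) return null;

        int start = page.IndexOf(Prefix, StringComparison.Ordinal);
        if (start < 0) return null;
        start += Prefix.Length;

        // Assumed delimiters: the link sits in an attribute or plain text.
        int end = page.IndexOfAny(new[] { '"', '\'', '<', ' ', '\r', '\n' }, start);
        if (end < 0) end = page.Length;   // link runs to the end of the page

        return page.Substring(start, end - start);
    }
}
```

The full link is then Prefix plus the returned value.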
I have the following method for replacing emoticons in a string using C#
public static string Emotify(string inputText)
{
var emoticonFolder = EmoticonFolder;
var emoticons = new Hashtable(100)
{
{":)", "facebook-smiley-face-for-comments.png"},
{":D", "big-smile-emoticon-for-facebook.png"},
{":(", "facebook-frown-emoticon.png"},
{":'(", "facebook-cry-emoticon-crying-symbol.png"},
{":P", "facebook-tongue-out-emoticon.png"},
{"O:)", "angel-emoticon.png"},
{"3:)", "devil-emoticon.png"},
{":/", "unsure-emoticon.png"},
{">:O", "angry-emoticon.png"},
{":O", "surprised-emoticon.png"},
{"-_-", "squinting-emoticon.png"},
{":*", "kiss-emoticon.png"},
{"^_^", "kiki-emoticon.png"},
{">:(", "grumpy-emoticon.png"},
{":v", "pacman-emoticon.png"},
{":3", "curly-lips-emoticon.png"},
{"o.O", "confused-emoticon-wtf-symbol-for-facebook.png"},
{";)", "wink-emoticon.png"},
{"8-)", "glasses-emoticon.png"},
{"8| B|", "sunglasses-emoticon.png"}
};
var sb = new StringBuilder(inputText.Length);
for (var i = 0; i < inputText.Length; i++)
{
var strEmote = string.Empty;
foreach (string emote in emoticons.Keys)
{
if (inputText.Length - i >= emote.Length && emote.Equals(inputText.Substring(i, emote.Length), StringComparison.InvariantCultureIgnoreCase))
{
strEmote = emote;
break;
}
}
if (strEmote.Length != 0)
{
sb.AppendFormat("<img src=\"{0}{1}\" alt=\"\" class=\"emoticon\" />", emoticonFolder, emoticons[strEmote]);
i += strEmote.Length - 1;
}
else
{
sb.Append(inputText[i]);
}
}
return sb.ToString();
}
It works great and seems pretty fast, but I noticed a problem with HTML.
The method breaks pages that contain a link because of the :/ emoticon: it breaks the http:// by sticking an image in the middle. I'm trying to figure out a way to adapt this method so that it takes links into account and ignores them, without sacrificing performance.
Any help or pointers greatly appreciated.
HTML Agility Pack and regex will be your friends here. You could have a decorator where your decorations build up the src. Can we have an example of the src that causes the issue? :)
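One regex-based way to leave links alone, sketched under some assumptions: a URL is taken to be http:// or https:// followed by non-whitespace characters, only three entries of the emoticon table are shown, and matching is case-sensitive, unlike the original. UrlAwareEmotifier is a hypothetical name. The pattern tries the URL alternative first, so a :/ inside a link is consumed as part of the URL and written back unchanged:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class UrlAwareEmotifier
{
    // Illustrative subset of the emoticon table from the question.
    static readonly Dictionary<string, string> Emoticons = new Dictionary<string, string>
    {
        { ":)", "facebook-smiley-face-for-comments.png" },
        { ":/", "unsure-emoticon.png" },
        { ":D", "big-smile-emoticon-for-facebook.png" }
    };

    // URLs are listed first so they win over emoticons at the same position.
    // Emoticons are escaped and sorted longest-first so longer ones match first.
    static readonly Regex Pattern = new Regex(
        @"(?<url>https?://\S+)|(?<emote>" +
        string.Join("|", Emoticons.Keys.OrderByDescending(k => k.Length).Select(Regex.Escape)) +
        ")",
        RegexOptions.Compiled);

    public static string Emotify(string inputText, string emoticonFolder)
    {
        return Pattern.Replace(inputText, m =>
            m.Groups["url"].Success
                ? m.Value   // leave links untouched
                : string.Format("<img src=\"{0}{1}\" alt=\"\" class=\"emoticon\" />",
                                emoticonFolder, Emoticons[m.Value]));
    }
}
```

Whether this is as fast as the original loop would need measuring. Note also that an emoticon typed directly after a link with no space in between is treated as part of the link.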
I've been trying to get either an <object> or an <embed> tag using:
HtmlNode videoObjectNode = doc.DocumentNode.SelectSingleNode("//object");
HtmlNode videoEmbedNode = doc.DocumentNode.SelectSingleNode("//embed");
This doesn't seem to work.
Can anyone please tell me how to get these tags and their InnerHtml?
A YouTube embedded video looks like this:
<embed height="385" width="640" type="application/x-shockwave-flash"
src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player" flashvars="..."
allowscriptaccess="always" allowfullscreen="true" bgcolor="#000000">
I got a feeling the JavaScript might stop the swf player from working, hope not...
Cheers
Update 2010-08-26 (in response to OP's comment):
I think you're thinking about it the wrong way, Alex. Suppose I wrote some C# code that looked like this:
string codeBlock = "if (x == 1) Console.WriteLine(\"Hello, World!\");";
Now, if I wrote a C# parser, should it recognize the contents of the string literal above as C# code and highlight it (or whatever) as such? No, because in the context of a well-formed C# file, that text represents a string to which the codeBlock variable is being assigned.
Similarly, in the HTML on YouTube's pages, the <object> and <embed> elements are not really elements at all in the context of the current HTML document. They are the contents of string values residing within JavaScript code.
In fact, if HtmlAgilityPack did ignore this fact and attempted to recognize all portions of text that could be HTML, it still wouldn't succeed with these elements because, being inside JavaScript, they're heavily escaped with \ characters (notice the precarious Unescape method in the code I posted to get around this issue).
I'm not saying my hacky solution below is the right way to approach this problem; I'm just explaining why obtaining these elements isn't as straightforward as grabbing them with HtmlAgilityPack.
YouTubeScraper
OK, Alex: you asked for it, so here it is. Some truly hacky code to extract your precious <object> and <embed> elements out from that sea of JavaScript.
class YouTubeScraper
{
public HtmlNode FindObjectElement(string url)
{
HtmlNodeCollection scriptNodes = FindScriptNodes(url);
for (int i = 0; i < scriptNodes.Count; ++i)
{
HtmlNode scriptNode = scriptNodes[i];
string javascript = scriptNode.InnerHtml;
int objectNodeLocation = javascript.IndexOf("<object");
if (objectNodeLocation != -1)
{
string htmlStart = javascript.Substring(objectNodeLocation);
int objectNodeEndLocation = htmlStart.IndexOf(">\" :");
if (objectNodeEndLocation != -1)
{
string finalEscapedHtml = htmlStart.Substring(0, objectNodeEndLocation + 1);
string unescaped = Unescape(finalEscapedHtml);
var objectDoc = new HtmlDocument();
objectDoc.LoadHtml(unescaped);
HtmlNode objectNode = objectDoc.GetElementbyId("movie_player");
return objectNode;
}
}
}
return null;
}
public HtmlNode FindEmbedElement(string url)
{
HtmlNodeCollection scriptNodes = FindScriptNodes(url);
for (int i = 0; i < scriptNodes.Count; ++i)
{
HtmlNode scriptNode = scriptNodes[i];
string javascript = scriptNode.InnerHtml;
int approxEmbedNodeLocation = javascript.IndexOf("<\\/object>\" : \"<embed");
if (approxEmbedNodeLocation != -1)
{
string htmlStart = javascript.Substring(approxEmbedNodeLocation + 15);
int embedNodeEndLocation = htmlStart.IndexOf(">\";");
if (embedNodeEndLocation != -1)
{
string finalEscapedHtml = htmlStart.Substring(0, embedNodeEndLocation + 1);
string unescaped = Unescape(finalEscapedHtml);
var embedDoc = new HtmlDocument();
embedDoc.LoadHtml(unescaped);
HtmlNode videoEmbedNode = embedDoc.GetElementbyId("movie_player");
return videoEmbedNode;
}
}
}
return null;
}
protected HtmlNodeCollection FindScriptNodes(string url)
{
var doc = new HtmlDocument();
WebRequest request = WebRequest.Create(url);
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
{
doc.Load(stream);
}
HtmlNode root = doc.DocumentNode;
HtmlNodeCollection scriptNodes = root.SelectNodes("//script");
return scriptNodes;
}
static string Unescape(string htmlFromJavascript)
{
// The JavaScript has escaped all of its HTML using backslashes. We need
// to reverse this.
// DISCLAIMER: I am a TOTAL Regex n00b; I make no claims as to the robustness
// of this code. If you could improve it, please, I beg of you to do so. Personally,
// I tested it on a grand total of three inputs. It worked for those, at least.
return Regex.Replace(htmlFromJavascript, @"\\(.)", UnescapeFromBeginning);
}
static string UnescapeFromBeginning(Match match)
{
string text = match.ToString();
if (text.StartsWith("\\"))
{
return text.Substring(1);
}
return text;
}
}
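For what it's worth, the Unescape and UnescapeFromBeginning pair above can probably be reduced to a single Regex.Replace call with a "$1" substitution, since every match of the pattern starts with a backslash anyway. A sketch (JsUnescape is a hypothetical name); like the original, it only strips the backslash and does not interpret escapes such as \n or \uXXXX:

```csharp
using System.Text.RegularExpressions;

static class JsUnescape
{
    // Replaces each backslash-plus-character pair with just the character.
    // "$1" refers to the first capture group, i.e. the escaped character.
    public static string Unescape(string htmlFromJavascript)
    {
        return Regex.Replace(htmlFromJavascript, @"\\(.)", "$1");
    }
}
```

Because . does not match a newline by default, a backslash at the very end of a line is left as it is, which is also what the original code does.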
And in case you're interested, here's a little demo I threw together (super fancy, I know):
class Program
{
static void Main(string[] args)
{
var scraper = new YouTubeScraper();
HtmlNode davidAfterDentistEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=txqiwrbYGrs");
Console.WriteLine("David After Dentist:");
Console.WriteLine(davidAfterDentistEmbedNode.OuterHtml);
Console.WriteLine();
HtmlNode drunkHistoryObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=jL68NyCSi8o");
Console.WriteLine("Drunk History:");
Console.WriteLine(drunkHistoryObjectNode.OuterHtml);
Console.WriteLine();
HtmlNode jessicaDailyAffirmationEmbedNode = scraper.FindEmbedElement("http://www.youtube.com/watch?v=qR3rK0kZFkg");
Console.WriteLine("Jessica's Daily Affirmation:");
Console.WriteLine(jessicaDailyAffirmationEmbedNode.OuterHtml);
Console.WriteLine();
HtmlNode jazzerciseObjectNode = scraper.FindObjectElement("http://www.youtube.com/watch?v=VGOO8ZhWFR4");
Console.WriteLine("Jazzercise - Move your Boogie Body:");
Console.WriteLine(jazzerciseObjectNode.OuterHtml);
Console.WriteLine();
Console.Write("Finished! Hit Enter to quit.");
Console.ReadLine();
}
}
Original Answer
Why not try using the element's Id instead?
HtmlNode videoEmbedNode = doc.GetElementbyId("movie_player");
Update: Oh man, you're searching for HTML tags that are themselves within JavaScript? That's definitely why this isn't working. (They aren't really tags to be parsed from the perspective of HtmlAgilityPack; all of that JavaScript is really one big string inside a <script> tag.) Maybe there's some way you can parse the <script> tag's inner text itself as HTML and go from there.
Given an absolute URI/URL, I want to get a URI/URL which doesn't contain the leaf portion. For example: given http://foo.com/bar/baz.html, I should get http://foo.com/bar/.
The code which I could come up with seems a bit lengthy, so I'm wondering if there is a better way.
static string GetParentUriString(Uri uri)
{
StringBuilder parentName = new StringBuilder();
// Append the scheme: http, ftp etc.
parentName.Append(uri.Scheme);
// Append the '://' after the http, ftp etc.
parentName.Append("://");
// Append the host name www.foo.com
parentName.Append(uri.Host);
// Append each segment except the last one. The last one is the
// leaf and we will ignore it.
for (int i = 0; i < uri.Segments.Length - 1; i++)
{
parentName.Append(uri.Segments[i]);
}
return parentName.ToString();
}
One would use the function something like this:
static void Main(string[] args)
{
Uri uri = new Uri("http://foo.com/bar/baz.html");
// Should return http://foo.com/bar/
string parentName = GetParentUriString(uri);
}
Thanks,
Rohit
Did you try this? Seems simple enough.
Uri parent = new Uri(uri, "..");
This is the shortest I can come up with:
static string GetParentUriString(Uri uri)
{
return uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments.Last().Length);
}
If you want to use the Last() method, you will have to include System.Linq.
There must be an easier way to do this with the built-in Uri methods, but here is my twist on @unknown (yahoo)'s suggestion.
In this version you don't need System.Linq and it also handles URIs with query strings:
private static string GetParentUriString(Uri uri)
{
return uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments[uri.Segments.Length -1].Length - uri.Query.Length);
}
Quick and dirty
int pos = uriString.LastIndexOf('/');
if (pos > 0) { uriString = uriString.Substring(0, pos); }
Shortest way I found:
static Uri GetParent(Uri uri) {
return new Uri(uri, Path.GetDirectoryName(uri.LocalPath) + "/");
}
PapyRef's answer is incorrect, UriPartial.Path includes the filename.
new Uri(uri, ".").ToString()
seems to be cleanest/simplest implementation of the function requested.
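To illustrate the difference between the relative references "." and "..", here is a small sketch; UriParentDemo and its method names are hypothetical. For the question's example, "." resolves to the containing directory, while ".." goes one level further up:

```csharp
using System;

static class UriParentDemo
{
    // Resolves "." against the URI: drops the leaf and keeps the trailing slash.
    public static Uri GetParent(Uri uri)
    {
        return new Uri(uri, ".");
    }

    // Resolves ".." against the URI: drops the leaf and one more directory level.
    public static Uri GetGrandparent(Uri uri)
    {
        return new Uri(uri, "..");
    }
}
```

With http://foo.com/bar/baz.html as input, GetParent should give http://foo.com/bar/ and GetGrandparent should give http://foo.com/. A URI that already ends in a slash comes back unchanged from GetParent, which may be why some answers here trim the trailing slash first.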
I read many answers here but didn't find one that I liked because they break in some cases.
So, I am using this:
public Uri GetParentUri(Uri uri) {
var withoutQuery = new Uri(uri.GetComponents(UriComponents.Scheme |
UriComponents.UserInfo |
UriComponents.Host |
UriComponents.Port |
UriComponents.Path, UriFormat.UriEscaped));
var trimmed = new Uri(withoutQuery.AbsoluteUri.TrimEnd('/'));
var result = new Uri(trimmed, ".");
return result;
}
Note: It removes the Query and the Fragment intentionally.
new Uri(uri.AbsoluteUri + "/../")
Get the segments of the URL:
url="http://localhost:9572/School/Common/Admin/Default.aspx"
Dim name() As String = HttpContext.Current.Request.Url.Segments
Now simply use a for loop or an index to get the parent directory name:
code = name(2).Remove(name(2).IndexOf("/"))
This returns "Common".
Thought I'd chime in; despite it being almost 10 years, with the advent of the cloud, getting the parent Uri is a fairly common (and IMO more valuable) scenario, so combining some of the answers here you would simply use (extended) Uri semantics:
public static Uri Parent(this Uri uri)
{
return new Uri(uri.AbsoluteUri.Remove(uri.AbsoluteUri.Length - uri.Segments.Last().Length - uri.Query.Length).TrimEnd('/'));
}
var source = new Uri("https://foo.azure.com/bar/source/baz.html?q=1");
var parent = source.Parent(); // https://foo.azure.com/bar/source
var folder = parent.Segments.Last(); // source
I can't say I've tested every scenario, so caution advised.