BitTorrent tracker announce problem - C#

I've been throwing a little bit of spare time at writing a BitTorrent client, mostly out of curiosity but partly out of a desire to improve my C# skills.
I've been using the theory wiki as my guide. I've built up a library of classes for handling BEncoding, which I'm quite confident in; basically because the sanity check is to regenerate the original .torrent file from my internal representation immediately after parsing, then hash and compare.
The next stage is to get tracker announces working. Here I hit a stumbling block, because trackers reject my requests without terribly useful error messages.
Take, for instance, the latest stack overflow database dump. My code generates the following announce URI:
http://208.106.250.207:8192/announce?info_hash=-%CA8%C1%C9rDb%ADL%ED%B4%2A%15i%80Z%B8%F%C&peer_id=01234567890123456789&port=6881&uploaded=0&downloaded=0&left=0&compact=0&no_peer_id=0&event=started
The tracker's response to my code:
d14:failure reason32:invalid info hash and/or peer ide
The tracker's response to that string dropped into Chrome's address bar:
d8:completei2e11:external ip13:168.7.249.11110:incompletei0e8:intervali600e5:peerslee
The peer_id is (valid) garbage, but changing it to something sensible (impersonating a widely used client) doesn't change anything.
Like I said, I'm pretty sure I'm pulling the info dictionary out properly and hashing (SHA1) like I should, and the peer id is well formed.
My guess is I'm doing some minor thing stupidly wrong, and would appreciate any help in spotting what it is exactly.
It's kind of hard to guess what code would be pertinent (and there's far too much to just post). However, I'll try and post anything asked for.
EDIT
I wasn't hex encoding the info_hash, which sort of helps.
This is the code that takes the generated URI and tries to fetch a response:
// uri is the announce URI built above
WebRequest req = WebRequest.Create(uri);
WebResponse resp = req.GetResponse();        // the tracker's bencoded dictionary comes back here
Stream stream = resp.GetResponseStream();    // read and BDecode the response from this stream

MonoTorrent is a BitTorrent implementation that comes with Mono.
In the HTTPTracker class there is a CreateAnnounceString method.
Maybe you can compare your implementation with how that method is doing it?
(You probably need to hunt down where the AnnounceParameters instance is created.)

This isn't an answer to your problem, but it may help for testing.
There are open-source PHP-based torrent trackers out there. They are incredibly inefficient (I know, I wrote a caching mechanism for one back in the day), but you could set up your own local tracker and modify the PHP code to help debug your client as it communicates with the tracker. Having a local client-server setup would make troubleshooting a lot easier.

What exactly are you hashing? You should only hash the info section, not the whole torrent file... So basically, decode the file, reencode the info section, hash that.
i.e. for the torrent posted, all you should be hashing is:
d6:lengthi241671490e4:name20:so-export-2009-07.7z12:piece lengthi262144e6:pieces18440:<lots of binary data>e
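For concreteness, here is a minimal sketch of that flow in C#. BDecoder, BEncoder and BDictionary are stand-ins for whatever your BEncoding library exposes, not real types:

using System.IO;
using System.Security.Cryptography;

// Hypothetical BEncoding API (BDecoder/BEncoder/BDictionary are placeholders
// for your own library): decode the file, re-encode only the "info" value,
// and SHA1-hash those bytes.
byte[] torrentBytes = File.ReadAllBytes("so-export-2009-07.7z.torrent");
BDictionary torrent = BDecoder.Decode(torrentBytes);
byte[] infoBytes = BEncoder.Encode(torrent["info"]);

byte[] infoHash;
using (SHA1 sha1 = SHA1.Create())
{
    infoHash = sha1.ComputeHash(infoBytes);   // 20 raw bytes, not a hex string
}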

There is an error in the URL %-encoding of the info_hash. The leading zeros in the last two bytes of the info_hash have been removed.
It is: info_hash=-%CA8%C1%C9rDb%ADL%ED%B4%2A%15i%80Z%B8%F%C
Should be: info_hash=-%CA8%C1%C9rDb%ADL%ED%B4%2A%15i%80Z%B8%0F%0C
When the announce string is dropped into Chrome's address bar it's probably auto-corrected by the browser.
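A minimal sketch of one safe way to build that parameter, escaping every byte with two hex digits so 0x0F comes out as %0F (the method name is mine, not from the question):

using System.Text;

// Escape every byte of the 20-byte SHA1 digest as %XX. Always emitting two hex
// digits avoids the dropped-leading-zero problem; leaving unreserved ASCII
// characters unescaped (as the example URI does) is also valid, but optional.
static string UrlEncodeInfoHash(byte[] infoHash)
{
    var sb = new StringBuilder(infoHash.Length * 3);
    foreach (byte b in infoHash)
    {
        sb.AppendFormat("%{0:X2}", b);
    }
    return sb.ToString();
}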

Related

Using XmlDocument.Load vs using HTTP GET for loading XML documents

So I've been reading around and I can't seem to find a simple enough answer. I've been doing a bit of work with web services and XML documents being sent around, but now I'm looking to understand something a little better.
xmlDocument.Load(url) and myHttpWebRequest = (HttpWebRequest) HttpWebRequest.Create(inURL);
Now, there is obviously a little more code to each of these, but I'm just giving a brief idea of both so we're all on the same page.
I have used both, and they both work perfectly well; I just don't want to sell myself short when using one over the other (.Load(url) has WAY less code to it).
In my instance (testing at the moment) I am using the former to get tiny amounts of data from my web service and the latter to post a fair bit of information back to my web service.
So my question is not really which is better, but when would it be desirable to use one over the other?
Does it make a big difference, or are they just two ways to do the same thing without any negatives?
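For reference, a rough sketch of the two approaches side by side (the URL is a placeholder):

using System.Net;
using System.Xml;

// 1) Let XmlDocument issue the HTTP GET itself:
var doc = new XmlDocument();
doc.Load("http://example.com/service/data.xml");

// 2) Issue the GET yourself, then parse the response stream:
var request = (HttpWebRequest)WebRequest.Create("http://example.com/service/data.xml");
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
{
    var doc2 = new XmlDocument();
    doc2.Load(stream);
}

The second form gives you control over headers, timeouts, credentials and the HTTP method, which matters more for the POST side than for a simple GET.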

Any tips on how I would go about extracting Pandora likes and putting them on a spreadsheet? (C++/C#)

I'm fairly new to coding and I want a project to work on that could help me advance my skills. I'm not sure what language would be best for this sort of undertaking, but I would definitely prefer to use C++ or C#.
For the first part of the program I basically would like to take all my Pandora likes and put them on a spreadsheet, with song name in one column and artist in the other. I don't see the formatting being too hard once I actually get the data I need, but I'm not really sure how to communicate with a server at all at this point. I'm guessing I probably won't be able to grab a raw list of likes, so I'm thinking my best course of action will be to first expand the likes list all the way, and then read the text on the screen or in the source code.
For the first step, expanding my likes, I found the HTML source code that actually does this:
<div class="show_more tracklike" data-nextLikeStartIndex="0" data-nextThumbStartIndex="5">Show more</div>
I'm not sure if this is something I can work with, but I was thinking that if I could set data-nextThumbStartIndex="5" to be equal to the number of likes minus 5 (the amount it shows by default), it would be fairly easy to expand the list. If not, I would probably have to click the "show more" link repeatedly until I have all the likes on the page.
For the next step, getting the data I want, I think my best option would be to basically just grab the text that I physically see on the screen and worry about filtering and manipulating the data afterwards. The other option is looking at the source code, where I actually found the pieces of code holding the info I want. If I could retrieve the page's source code, I think it would be relatively easy to pick out the data I actually want.
So yeah, that's about it. I know I'm pretty noob at the moment and what I'm saying is probably wrong and/or much more complicated than I think, but I'm a pretty quick learner, and at the very least if someone could point me in the right direction to communicate with a server, that would be much appreciated.
This question is quite "wide" (and I have absolutely no knowledge of Pandora itself - can't access it from where I live).
In general, there are several different ways to solve this type of problem:
Screen scraping - basically access the website as if you were a web browser, and from the HTML string that comes back, dig out the information you need. The problem here is that the data is not very suitable for "machine reading", as it often has no distinct markers for the "reader" to find the relevant information, and it's difficult to separate the data from the "chaff".
AJAX API - "Asynchronous JavaScript and XML", where the provider of the website has an interface to fetch certain data into the web browser - and of course you can "pretend" to be the web browser and request the same type of information. You are relying on the website having such an interface, but if it exists, the data is generally in a form more suitable for machine reading (typically XML, but not always).
JSON API - "JavaScript Object Notation" is a similar solution to AJAX; like XML, JSON is a "human and machine readable format".
The latter two are definitely preferable, as the data coming back is meant for machine reading. The drawback is that you need to have "server side cooperation". The good thing here is that Pandora does have a JSON API. The bad thing is that it seems to be hard to use... Here's one discussion on the subject:
Making JSON calls to Unoffical Pandora API
The main principle here is that you send some stuff to the webserver, and receive a reply with the requested information. Exactly how this is done depends on the language/programming environment. A popular C++ solution is libcurl.
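In C# (one of the languages asked about), the request/reply step might look something like this sketch; the URL is a made-up placeholder, not a documented Pandora endpoint:

using System;
using System.Net;

// Fetch a page (or an API endpoint) and get the raw text back for parsing.
using (var client = new WebClient())
{
    client.Headers[HttpRequestHeader.UserAgent] = "likes-exporter-experiment";
    string body = client.DownloadString("http://www.pandora.com/profile/likes/example-user");
    Console.WriteLine(body.Length);   // then screen-scrape or JSON-parse 'body'
}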
There is a Ruby Client here, using the JSON interface
https://github.com/nixme/pandora_client
A C# implementation to interface with Pandora is here:
http://pandoraunleashed.googlecode.com/svn/trunk/PandoraUnleashed/Pandora.cs
Unfortunately, I can't find any direct reference to "listing likes".

Is MonoTouch or the iOS web stack eating my HTTP DELETE request body?

I am using MonoTouch to call a remote web service from an iOS app. I use HttpWebRequest and it works great for me for GET, PUT, and POST requests. However, when I try to make a DELETE request, I get some odd behavior: the entity body that I send gets truncated and the server receives an empty body (Content-Length: 0).
The identical code works perfectly when run on a Windows Phone with the WP7.1 implementation of System.Net.HttpWebRequest.
I know that there is some debate on whether RFC 2616 allows an entity body in a DELETE request (e.g. Phil Haack's question). This question isn't about that - it is about why the body does not make it to the server.
Now to the question :-) Is this issue in MonoTouch's implementation of HttpWebRequest (i.e. Mono enforces a Content-Length of 0 for the body of a DELETE request)? Or does Mono implement HWR on top of an Apple framework that is responsible for this behavior? The reason for the question, of course, is to better understand whether I can work around the issue and/or implore Miguel to allow DELETE bodies, or whether I need to change my wire format.
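For reference, a minimal sketch of the kind of request involved (URL and payload are placeholders); on Windows Phone this pattern reportedly sends the body, while under MonoTouch the server sees Content-Length: 0:

using System.IO;
using System.Net;
using System.Text;

var request = (HttpWebRequest)WebRequest.Create("https://example.com/api/items/42");
request.Method = "DELETE";
request.ContentType = "application/json";

byte[] body = Encoding.UTF8.GetBytes("{\"reason\":\"cleanup\"}");
request.ContentLength = body.Length;
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(body, 0, body.Length);
}

using (var response = (HttpWebResponse)request.GetResponse())
{
    // The entity body written above is what arrives empty when run on MonoTouch.
}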
This looks like a bug in Mono; after a (very) quick look in the source code, I found this, which seems to be the culprit.
You should file a bug with a test case so it can be fixed (even better: provide a patch as well, in which case it shouldn't take long to get it fixed).

Compressing parameters in the URL

The URLs on my site can become very long, and it's my understanding that URLs are transmitted with the HTTP requests, so the idea came up to compress the string in the URL.
From my searching on the internet, I found suggestions to use short URLs and then link them to the long ones. I'd prefer not to use this solution because I would have to do an extra database lookup to convert between the long and short URL.
That leaves, in my head, 3 options:
Hashing - I don't think this is an option. If you want a safe hashing algorithm, it's going to be long.
Compressing the URL string - basically having the server decompress the string when it gets the URL parameters.
Changing the URL so it's not descriptive - this is bad because it would make development harder for me (this is a one-man project).
Considering the vast number of OS/browser combinations out there, I figured I'd ask if anyone else has tried this or has some clever suggestions.
If it matters, the URL parameters can reach 100+ chars.
Example:
mysite.com/Reports/Ability.aspx?PlayerID=7737&GuildID=132&AbilityID=1140&EventID=1609&EncounterID=-1&ServerID=17&IsPlayer=True
EDIT:
Let me clarify: at the moment this is NOT breaking the site. It's more about me learning to find a good solution (I'm well aware this is micro-optimization; my site is very fast at the moment) and making my site even faster (to challenge myself, and to become a better coder).
There is also a cosmetic issue; I personally think that a URL longer than the address bar looks bad.
You have some conflicting requirements as you want to shorten/compress the url without making it less descriptive. By the very nature of shortening the URL, you will, to a certain extent, make it less descriptive.
As I understand it, your goal is to optimise by sending less over the request. You mention 100+ characters, instead of 1000+ which I assume means they don't get that big? In which case, I'd see this as an unnecessary micro-optimisation.
To add to previous suggestions of using POST, a simple thing would be to just shorten the keys instead of using full names if you don't want to do full url shortening e.g.:
mysite.com/Reports/Ability.aspx?pid=7737&GID=132&AID=1140&EID=1609&EnID=-1&SID=17&IsP=True
These are obviously less descriptive.
But like I said, are you having a real problem with having long URLs?
I'm not sure I understand what your problem with long URLs is. Generally I'd try to avoid them, but if they're necessary then you won't depend on the user remembering them anyway, so why go through all the compressing trouble? Even with a URL of 1000 chars (~2KB) the page request won't be slow.
I would, however, consider using POST instead of GET if possible, to prettify the URL, but that of course depends on your implementation / environment.
It is recommended a few times here to use POST instead of GET. I would strongly recommend AGAINST picking your HTTP action by what the URL looks like. There is more to this choice than how it is displayed in the browser.
A quick overview:
http://www.w3.org/2001/tag/doc/whenToUseGet.html#checklist
A few options to add to the other answers:
Using a subclassed LinkButton for your navigation. This holds the extra data (PlayerId for example) inside its viewstate as properties. This won't be much help though if you're giving URLs to people via emails.
Use the MVC routing engine to produce slightly improved URLs - no keys for the querystring, e.g. mysite.com/Reports/Ability/7737/132/1140/1609/-1/17/True (see the sketch after this list).
Create your own URL shortener like tinyurl.com. Store the url in the database along with each of the querystring values to lookup.
Simply setup some friendly URLs for the most popular reports, for example mysite.com/Reports/JanuaryReport. You can also do this using the MVC routing engine.
The MVC routing engine is stand alone and can work without your site being an MVC site.
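If that routing suggestion appeals, here is a hedged sketch of what the registration might look like using System.Web.Routing's MapPageRoute (the route name and method are mine, not from the answer):

using System.Web.Routing;

public static void RegisterRoutes(RouteCollection routes)
{
    // Maps /Reports/Ability/7737/132/1140/... onto the existing WebForms page;
    // inside Ability.aspx the values come back via Page.RouteData.Values["PlayerID"], etc.
    routes.MapPageRoute(
        "AbilityReport",
        "Reports/Ability/{PlayerID}/{GuildID}/{AbilityID}/{EventID}/{EncounterID}/{ServerID}/{IsPlayer}",
        "~/Reports/Ability.aspx");
}

Call it from Application_Start with RouteTable.Routes; note this shortens and tidies the URL but does not compress it.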
With my scheme, one could encode the params section of a URL as a base64 string which is ~50% shorter than a direct base64 representation. So for your case you get:
~50% shorter params section
a base 64 string which hides a lot of the detail
see http://blog.alivate.com.au/packed-url/
Most browsers can handle up to 2048 characters in the URL; if you don't want to use a long parameter list, you can always pass parameters through POST requests.
There are theoretical problems with extended URLs. The exact limit varies across browsers (roughly 2k in some versions of IE) and servers (4-8k in Apache, varying with version and configuration), and isn't officially specified in any RFC that I am aware of.
I would agree with synhershko, and replace the URL with form POST parameters instead if you are concerned that your URLs are growing too long.
I've encountered similar situations in the past, although my reasons for optimisation were SEO. To me it depends on what you're doing with the page URL variables - are they being appended on all/most pages? If they are, then to me there is almost always a much better way, although if you're far down the development path it's probably too late now.
I like being able to 'read' a URL; especially when I drop into an unknown site two or more layers deep in the navigation and the site is designed poorly, it's often the easiest and fastest way for an advanced user to find where they are on the site.
If you're interested in it from an SEO point of view, it's normally best to have a hierarchy which only contains: / - _
Search engines will try to read URLs; see this video by Matt Cutts (can't remember how far into the video he mentions it, but it's a good watch anyway...)
Any form of compression of the URL (hashing, compressing, non-descriptive) is going to:
make the URLs harder to read, remember and type in correctly
have a performance impact, as you will have to decrypt/decompress/convert the URL before you can work with it.
Also, hashing is usually considered to be non-reversible - given a hashed value you shouldn't be able to work out what generated it, but you could use it to look up a value in a database, which gets you back to your first issue of short-long lookups.
You could easily just remove the redundant "ID" at the end of each parameter, and possibly strip out vowels or similar to "shorten" the url without losing too much from the semantics of the request.
But to be honest, the length of your URL is one of the least things to worry about in terms of performance - look at the size of any cookies you're sending back and forth between the browser and the server, and the page size you're sending back.

Sending a binary stream through SOAP

I have a "simple" task. I have an existing project with a web service written in C# which has a method that will send a huge XML file to the client. (This is a backup file of data stored on the server that needs to be sent somewhere else.) This service also had some additional authentication/authorization set up.
And I have an existing Delphi 2007 application for WIN32 which calls the web service to extract the XML data for further processing. It's a legacy system that runs without a .NET installation.
Only problem: the XML file is huge (at least 5 MB) and needs to be sent as a whole. Due to system requirements I cannot just split this up into multiple parts. And I'm not allowed to make major changes to either the C# or the Delphi code. (I can only change the method call on both client and server.) And I'm not allowed to spend more than 8 (work) hours to come up with a better solution or else things will just stay unchanged.
The modification I want to add is to compress the XML data (which reduces it to about 100 KB) and then send it to the client as a binary stream. The Delphi code should then accept this incoming stream and decompress the XML data again. Now, with a minimum of changes to the existing code, how should this be done?
(And yes, I wrote the original client and server in the past and it was never meant to send that much data at once. Unfortunately, the developer who took it over from me had other ideas, made several dumb changes, did more damage and left the company before my steel-tipped boot could connect to his behind so now I need to fix a few things. Fixing this web service has a very low priority compared to the other damage that needs to be restored.)
The server code is based on legacy ASMX stuff, the client code is the result of the Delphi SOAP import with some additional modifications.
The XML is a daily update for the 3000+ users which happens to be huge in its current design. We're working on this, but that takes time. There are more important items that need to be fixed first, but as I said, there's a small amount of time available to fix this problem quickly.
This sounds like a good candidate for an HttpHandler
My good links are on my work computer (I'll add them when I get to work), but you can look to see if it will be a good fit.
-- edit --
Here are the links...
http://www.ddj.com/windows/184416694
http://visualstudiomagazine.com/articles/2006/08/01/create-dedicated-service-handlers.aspx?sc_lang=en&sc_mode=edit
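As a rough illustration of the HttpHandler idea (the file path, handler name and registration are placeholders, not from the original project), the handler could gzip the backup on the way out:

using System.IO;
using System.IO.Compression;
using System.Web;

public class BackupXmlHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        context.Response.ContentType = "text/xml";
        context.Response.AppendHeader("Content-Encoding", "gzip");

        // Stream the backup file through a GZipStream straight into the response.
        using (var gzip = new GZipStream(context.Response.OutputStream, CompressionMode.Compress))
        using (var xml = File.OpenRead(context.Server.MapPath("~/App_Data/backup.xml")))
        {
            byte[] buffer = new byte[64 * 1024];
            int read;
            while ((read = xml.Read(buffer, 0, buffer.Length)) > 0)
            {
                gzip.Write(buffer, 0, read);
            }
        }
    }
}

The handler would still need to be registered in web.config, and the Delphi side would have to gunzip the response, so this only fits if those changes stay inside the allowed eight hours.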
What is the problem with a 5 MB file in a SOAP message? I have written a document server that runs over SOAP and this server has no problem with large files.
If the size is a problem for you I would just compress and decompress the xml data. This can easily be done with one of the many (free) available components for compression of a TStream descendant.
If you get that kind of compression, merely convert each byte to its hex equivalent, which will only double the size, and send this. Then do the opposite on the other end. Or am I missing something?
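A tiny sketch of that hex round trip in C# (variable names are placeholders):

// Encode: compressed bytes -> hex string (exactly 2x the size), safe in any SOAP/XML payload.
string hex = BitConverter.ToString(compressedBytes).Replace("-", "");

// Decode on the other end: hex string -> bytes.
byte[] roundTripped = new byte[hex.Length / 2];
for (int i = 0; i < roundTripped.Length; i++)
{
    roundTripped[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}

Base64 (Convert.ToBase64String) would add only about a third instead of doubling the size, and is what ASMX already uses when a web method returns a byte[].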
I would agree with Brad Bruce: an HttpHandler would be fast, and using GZIP or Deflate compression - which (I might be wrong) browsers support natively - you can get great compression on text-based data for cheap CPU time.
// wrap the stream the compressed output should be written to; GZipStream's constructor takes a Stream, not a string
System.IO.Compression.GZipStream gzipStream = new System.IO.Compression.GZipStream(outputStream, System.IO.Compression.CompressionMode.Compress);
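Spelled out a bit more, a hedged sketch of compressing the XML bytes so the web method can hand back a byte[] (names are placeholders):

using System.IO;
using System.IO.Compression;

static byte[] CompressXml(byte[] xmlBytes)
{
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            // Write the uncompressed XML into the GZipStream; the compressed
            // bytes accumulate in 'output'.
            gzip.Write(xmlBytes, 0, xmlBytes.Length);
        }
        return output.ToArray();   // only complete after the GZipStream is closed
    }
}

The Delphi client then just has to gunzip the byte[] it receives before parsing the XML.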
