| Author: | Niall O'Higgins <niallo@p2presearch.com> |
|---|---|
| Date: | July 10th 2008 |
bittorrent security
tit-for-tat
limitations
third parties
Because BitTorrent itself lacks mechanisms to solve some essential problems, third-party technologies and sites have arisen to fill the gaps.
HTTP and RSS
You could say that the "BitTorrent Network" consists more of HTTP sites and RSS feeds than of BitTorrent protocol traffic itself!
For this reason, the vast majority of useful analysis can be conducted using nothing more than HTTP requests and RSS parsers.
No BitTorrent needed at all!
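As a rough illustration of how little is needed, here is a minimal sketch that pulls torrent entries out of an RSS 2.0 feed using only the standard library. The feed text, the `tracker.example` URLs and the `parse_feed` helper are made up for this example; a real crawler would download the feed over HTTP first.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents, standing in for a downloaded RSS document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example aggregator</title>
    <item>
      <title>example-file-1</title>
      <link>http://tracker.example/torrents/1.torrent</link>
      <category>Audio</category>
    </item>
    <item>
      <title>example-file-2</title>
      <link>http://tracker.example/torrents/2.torrent</link>
      <category>Video</category>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "category": item.findtext("category", default=""),
        })
    return items


entries = parse_feed(SAMPLE_FEED)
```

Each entry carries the link to a .torrent file plus whatever the site publishes about it, which is already enough for counting and categorising.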
Python rocks
Python is of course an excellent choice both for HTTP and RSS operations.
threading
Crawling BitTorrent aggregators consists of making lots of HTTP requests and parsing lots of RSS feeds.
The threading module provides the concurrency. The GIL doesn't matter much for I/O-bound work.
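A minimal sketch of that pattern: a small pool of threads draining a queue of URLs. The `crawl` and `fake_fetch` names are invented for this example, and `fake_fetch` stands in for a real HTTP call so the snippet runs without network access.

```python
import queue
import threading


def crawl(urls, fetch, num_workers=4):
    """Fetch every URL with a pool of threads; return {url: result}."""
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread exit
            try:
                body = fetch(url)
            except Exception as exc:
                body = exc  # record the failure instead of killing the thread
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch(url):
    # Hypothetical stand-in for something like urllib.request.urlopen(url).read()
    return "contents of " + url


pages = crawl(["http://a.example/rss", "http://b.example/rss"], fake_fetch)
```

While one thread waits on a socket, the others keep running, which is why the GIL is rarely the bottleneck for this kind of workload.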
batteries included
C and Python are friends
While Python rocks, some things are worth writing in C. In our case, that is metadata (.torrent file) parsing.
We have around 300,000 metadata files. These can be over a megabyte in size. Parsers in C can be quite fast.
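For a sense of what the parser has to do: .torrent files are bencoded, a small format with integers, byte strings, lists and dictionaries. The sketch below is a plain Python decoder written for illustration only; it is not the C parser described here, and the sample data and `bdecode` name are made up.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes; return (value, next_position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                      # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                      # list: l<values>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            value, pos = bdecode(data, pos)
            items.append(value)
        return items, pos + 1
    if lead == b"d":                      # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                    # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at offset %d" % pos)


# Hypothetical, heavily trimmed metadata for demonstration.
SAMPLE = b"d8:announce18:http://t.example/a4:infod6:lengthi1024e4:name8:file.isoee"
meta, end = bdecode(SAMPLE)
```

Every value costs at least one Python function call here, which gives some idea of why a C parser pays off across hundreds of thousands of files.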
Also, C gives you a bit more control over some things than the Python stdlib. For example, Python's mmap module lacks an 'offset' parameter, which is very useful for P2P apps.
But of course this could be fixed.
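For reference, current Python releases do accept an `offset` argument in `mmap.mmap`. The sketch below maps only a window of a file, the way a P2P app might map a single piece of a large download. The `map_window` helper and the temporary demo file are invented for this example.

```python
import mmap
import os
import tempfile


def map_window(path, offset, length):
    """Return bytes [offset, offset + length) of a file via a partial mapping."""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran     # the mmap offset must be aligned
    delta = offset - aligned              # distance from aligned start to our data
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + length]


# Build a throwaway file with a recognisable "piece" past the first page.
piece_offset = mmap.ALLOCATIONGRANULARITY + 10
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * piece_offset + b"piece" + b"\0" * 100)
    demo_path = tmp.name

window = map_window(demo_path, piece_offset, 5)
os.remove(demo_path)
```

The offset has to be a multiple of `mmap.ALLOCATIONGRANULARITY`, so the helper rounds down and skips the extra leading bytes.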
bittorrent content
Analysis of over 320,000 torrents.
Is anything here surprising?
content supply rate
The blue line shows the number of individual torrents added to the BitTorrent network over the past 7 days.
supply by category
An additive graph of content added over the past 7 days.
the end
More info at our blog, http://blog.p2presearch.com