Easing into BitTorrent

In a previous post the comments were all over the map, but I figure that’s because I didn’t explain well enough what I was looking to do. So I’m going to break the project up into bite-size chunks. Just play along with me, and help if you can. Okay? Thanks.

The first little project is to figure out what’s inside a Torrent file. Honestly, I hadn’t looked until yesterday. As I was opening it in TextEdit I was kind of hoping it would be XML. Apparently it’s a binary format: lots of junky-looking characters. So I did a couple of quick searches trying to find a document that explains the file format, something like the RSS 2.0 spec. I didn’t find it. So that’s question #1.

1. Are there any docs for the BitTorrent file format? If so, pointers please. Thank you very much.

My goal is pretty simple: I want to write a script that creates a Torrent from an MP3, and I don’t want to run Python.

7 responses to this post.

  1. However, I did find that Azureus will create an XML file from a Torrent. Example. It goes both ways. Now I need to find out if this is documented somewhere, and if BT clients will accept the XML version in place of the non-XML format.

  2. Posted by Ryan Greene on January 14, 2006 at 12:32 pm

    More info here. Hopefully that and the link from Guan above will be enough to get you rolling.

  3. Here’s a brief re-interpretation of the spec:

    First you need to implement the bencode data format. The data structures you need to support are dictionaries, lists, strings (which can be binary) and integers. (Kind of like XML-RPC.)
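
    To make the four data types concrete, here is a minimal sketch of an encoder and decoder. It is in Python only for brevity (the same logic ports to any scripting language), and the function names bencode and bdecode are made up for this illustration. It assumes the usual bencode rules: integers look like i42e, strings are length-prefixed like 4:spam, lists are wrapped in l...e, and dictionaries in d...e with keys in sorted order.

```python
def bencode(value):
    """Encode ints, strings, lists and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings are length-prefixed, so they can hold raw binary data.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are strings and are written in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


def bdecode(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key] = val
        return result, pos + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % pos)
```

    Decoded strings come back as raw bytes rather than text, since a bencoded string may be binary.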

    The torrent file itself is bencoded. At the top level it’s a dictionary with certain keys that are described in the wiki.theory.org spec document. The most important is the info data structure, which is itself a dictionary and contains keys such as length (integer), name (string), piece length (integer) and pieces (string).

    Pieces is a string containing a concatenation of the SHA-1 hash of each piece.
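
    As a rough sketch of how that string is built, the code below splits a file's bytes into fixed-size pieces, hashes each with SHA-1 (20 bytes per hash), and joins the results. The helper names piece_hashes and build_info and the 256 KiB default piece length are assumptions for illustration, not something the spec requires. It only covers the single-file keys listed above, and it takes the whole file in memory for simplicity.

```python
import hashlib


def piece_hashes(data, piece_length):
    """Concatenate the 20-byte SHA-1 digest of each piece of data."""
    digests = []
    for start in range(0, len(data), piece_length):
        piece = data[start:start + piece_length]  # the last piece may be shorter
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


def build_info(name, data, piece_length=262144):
    """Build the info dictionary for a single file (illustrative only)."""
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": piece_hashes(data, piece_length),
    }
```

    The resulting dictionary still has to be bencoded and placed under the info key of the top-level dictionary described in the spec.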

    When the client communicates with the tracker, bencoded messages are exchanged. The format (a dictionary at the top level) is described in the spec.

  4. Posted by Gary Lerhaupt on January 15, 2006 at 6:09 pm

    Continuing over to this thread… you can read more about the extra metadata that I’ve been using inside the torrents that I create. Currently included are a license URL and a dictionary of tags:

    http://www.torrentocracy.com/blog/archives/2005/03/adding_more_met_1.shtml

  5. It’s called BEncoding. .torrent files consist primarily (by volume) of hashes of each piece of the file. To avoid using Base64 encoding and wasting space, BEncoding is used. In fact, BitTorrent uses BEncoding for all messages in the protocol. Using XML in some places would have made the protocol inconsistent. Using XML across the board would make it 20% slower than it needs to be. BEncoding is actually a very useful standard for documents that are primarily binary data. There are free encoder/decoder implementations in Python and in Java (the Azureus project).
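
    The space argument is easy to check with a small sketch. The function name encoded_sizes and the dummy piece data are made up for this illustration. It compares the size of a run of SHA-1 hashes stored as one length-prefixed bencoded string against the same bytes in Base64, which emits 4 characters for every 3 bytes of input.

```python
import base64
import hashlib


def encoded_sizes(num_pieces):
    """Return (bencoded size, Base64 size) for num_pieces SHA-1 hashes."""
    # Dummy pieces: hash a single byte per piece just to get 20-byte digests.
    hashes = b"".join(hashlib.sha1(bytes([i % 256])).digest()
                      for i in range(num_pieces))
    # Bencoded string: decimal length, a colon, then the raw bytes.
    bencoded = str(len(hashes)).encode("ascii") + b":" + hashes
    # Base64 text, as an XML format would need in order to carry binary data.
    b64 = base64.b64encode(hashes)
    return len(bencoded), len(b64)
```

    For 1,000 pieces this gives 20,006 bytes bencoded against 26,668 bytes of Base64, roughly a third larger before any XML markup is added.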

    Others have posted some good links, particularly: http://wiki.theory.org/BitTorrentSpecification
