gnutella protocol
I've started work on a program which utilizes the gnutella protocol
(as of version 0.48). Basically, you connect a SOCK_STREAM (TCP) socket
to any other gnutella server, send GNUTELLA CONNECT/0.4[lf][lf] and
expect back GNUTELLA OK[lf][lf]. At this point the server expects
you to identify yourself. You send a type 0x00 message to whoever
you just connected to, and the server responds with how many files it
is sharing and the total size of those files (in KB). You'll also get a
response from everybody connected to the machine you connected to, and
so on, until the TTL on the message expires.
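A minimal Python sketch of the connect handshake described above. The function name gnutella_handshake and the 64-byte reply limit are hypothetical; a real client would open the socket with something like socket.create_connection((host, port)) for a host taken from its host catcher.

```python
import socket

CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"  # [lf][lf] ends the request
CONNECT_REPLY = b"GNUTELLA OK\n\n"


def gnutella_handshake(sock: socket.socket, max_reply: int = 64) -> bool:
    """Send the 0.4 connect string and check for the OK reply."""
    sock.sendall(CONNECT_REQUEST)
    reply = b""
    # Read one byte at a time up to the blank line, so we don't swallow
    # the first binary message that may follow the reply.
    while not reply.endswith(b"\n\n") and len(reply) < max_reply:
        chunk = sock.recv(1)
        if not chunk:
            break  # peer closed the connection
        reply += chunk
    return reply == CONNECT_REPLY
```

After a successful handshake, the next thing to send is the type 0x00 message that identifies you.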
At this point, the server will start bombarding you with information
about other servers (which fills the host catcher and the gnutellanet
stats). You'll also get search requests. You're supposed to decrement
the TTL and pass each one on to any other servers you're connected to
(if TTL > 0). If you have no matching files, you don't need to reply;
otherwise you should build a query response to that message and send it
back the way it came.
The header is fixed for all messages and ends with the size of the
data area which follows. The header contains a Microsoft GUID
(Globally Unique Identifier, for you non-Winblows people) which is the
message identifier. My crystal ball reports that "the GUIDs only have
to be unique on the client", which means that you can really put
anything here, as long as you keep track of it (a client won't respond
to you if it sees the same message ID again). If you're responding to
a message, be sure you haven't seen the message ID (from that host) before,
copy their message ID into your response, and send it on its way.
That message ID is followed by a function ID (one byte), which looks to
be a bitmask. The function ID indicates what to do with the packet
(search request, search response, server info, etc.). The next field is
a one-byte TTL. For every packet you receive, you should dec (or -- for
the C guys) the TTL and pass the packet on if the TTL is still > 0
(i.e. if (--hdr.TTL) { [pass on] }, god I love C). You should also inc
the hop count. Seems redundant? Well, some people have smaller TTLs,
and you have the right to drop any message you want to based on its hop
count. The header finishes up by telling us how large the
function-dependent data that follows is.
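A minimal Python sketch of packing and unpacking the 23-byte header (16-byte message ID, function ID, TTL, hops, 4-byte data length). Little-endian byte order for the data length is an assumption (the original client ran on x86), and uuid4() stands in for CoCreateGUID(); the names Header, pack_header, unpack_header and new_message_id are hypothetical.

```python
import struct
import uuid
from typing import NamedTuple

# message ID, function ID, TTL, hops, data length (little-endian assumed)
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 bytes


class Header(NamedTuple):
    message_id: bytes
    function_id: int
    ttl: int
    hops: int
    data_length: int


def pack_header(h: Header) -> bytes:
    return struct.pack(HEADER_FORMAT, *h)


def unpack_header(raw: bytes) -> Header:
    return Header(*struct.unpack(HEADER_FORMAT, raw[:HEADER_SIZE]))


def new_message_id() -> bytes:
    # Any 16 bytes will do, as long as they are unique on this client.
    return uuid.uuid4().bytes
```

Under these assumptions, a ping is just pack_header(Header(new_message_id(), 0x00, 5, 0, 0)) with no data after it.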
Searches: Easy, just build a type 0x80 packet, add a WORD
for the minimum connection speed (in kbps), then the NULL-terminated
search string. There isn't a response from people who have no match,
but a result will come back as a type 0x81 message. There will be a
gnutella_query_response_hdr followed by N gnutella_query_response_rec_hdr
records, each with a double-NULL-terminated filename. To finish this up,
there's a gnutella_query_response_ftr with the full 128-bit (16-byte)
client ID of the server that found the result.
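A minimal Python sketch of building a complete 0x80 search packet. It assumes little-endian byte order for the WORD and the data length, a default TTL of 5, and Latin-1 for the query text; build_search is a hypothetical name.

```python
import struct
import uuid


def build_search(query: str, min_speed_kbps: int = 0, ttl: int = 5) -> bytes:
    # Function-dependent data: WORD minimum speed, then the NULL-terminated query.
    payload = struct.pack("<H", min_speed_kbps) + query.encode("latin-1") + b"\x00"
    message_id = uuid.uuid4().bytes  # a new message gets a fresh 16-byte ID
    header = struct.pack("<16sBBBI", message_id, 0x80, ttl, 0, len(payload))
    return header + payload
```

Keep the message ID of each search you send, since the 0x81 results come back carrying that same ID.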
Downloads and Uploads: These are a POC (piece of cake). If you want a
file from a server, you connect to the server and send an HTTP request
for it. The URL is of the form /get/[file_id]/[filename]. The file ID
was returned with the search result. The gnutella HTTP server also
supports resuming a transfer via the Content-range: HTTP header. If
you're just curious, the User-Agent is gnutella. You can actually load
up Netscape and get a file from a Gnutella server. Pretty cool, eh?
Here's a dump of what an HTTP request looks like:
GET /get/293/rhubarb_pie.rcp HTTP/1.0
User-Agent: gnutella
Yes, the User-Agent header and HTTP version are required. When gnutella
requests a file, it does NOT URL-encode the filename, but it will accept
requests which are URL-encoded.
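A minimal Python sketch that produces the request shown in the dump. The CRLF line endings are an assumption (standard HTTP), and build_get_request is a hypothetical name; like gnutella itself, it does not URL-encode the filename.

```python
def build_get_request(file_index: int, filename: str) -> bytes:
    # Request line plus the required User-Agent header, ended by a blank line.
    lines = [
        f"GET /get/{file_index}/{filename} HTTP/1.0",
        "User-Agent: gnutella",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("latin-1")
```

Since the server also accepts encoded names, passing the filename through urllib.parse.quote first is an option for names containing spaces.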
If the server is behind a firewall which does not allow incoming
connections, the client can negotiate a push connection. This is a
function ID 0x40 packet. It contains the ClientID128 (GUID) of the
server, followed by the File ID requested, and the IP address and
port of the client.
Ok, you're all wondering: sure, the verbal theory is great, but
why not just give us the code! Here's what I've got so far. I know,
it's in Pascal, but what the heck, I'm a Delphi programmer 18 hours a
day (and a beer drinker the rest). This is a component you can add to
your palette which handles connecting to gnutellanet: simply drop it on
your form, link up some event handlers, and call ConnectNewHost to get
started. BeginNewSearch() is also something you might be interested in.
So, what have I got running? I've got a client which will connect
to multiple servers, host catch, and do search requests (and get the
results). Of course the search monitor works too. Here are some
screenshots. It's written in Delphi, and the code is up above.
Here's a sample log file which shows the protocol in action.
Awlright, some people have read this document and still keep asking
me things like "What's in the first 10 bytes of the header?", so
for people who can't figure out what the syntax of a Delphi record
looks like, or can't read English so well, here are some nice tables:
gnutella_header
Byte Pos | Name | Notes
0 - 15 | Message ID | A message ID is generated on the client for each new message it creates. The 16-byte value is created with the Windows API call CoCreateGUID(), which in theory will generate a new globally unique value every time you call it. See the text above for a comment about this value's uniqueness.
16 | Function ID | What message type the packet is. See the table of message types below for descriptions of the types.
17 | TTL Remaining | How many hops the packet has left before it should be dropped.
18 | Hops taken | How many hops this packet has already taken. Set the TTL on response messages to this value!
19 - 22 | Data Length | The length of the function-dependent data which follows. There has been some discussion as to whether this value is actually only 2 bytes and the last 2 bytes are something else. Seems to work with 4 for me. There is also a question as to signed or unsigned integers. Don't know that either; I can't get gnutella to try and send a 2^31 + 1 byte packet :)
Function IDs
ID | Description
0x00 | Ping - An empty message (data length = 0) sent by a client requesting a 0x01 from everyone on the network. This message type should be responded to with a 0x01 and passed on.
0x01 | Ping response - Sent in response to a 0x00, this message contains the host IP and port, how many files the host is sharing, and their total size.
0x40 | Client push request - For serverants behind a firewall, where the client cannot reach the server directly, a push request message is sent, asking the server to connect out to the client and perform an upload.
0x80 | Search - This is a search message and contains the query string as well as the minimum speed.
0x81 | Search Results - These are results of a 0x80 search request. They contain the IP address, port, and speed of the serverant, followed by a list of file sizes and names, and the ClientID128 of the serverant which found the files. ClientID128 is another 16-byte GUID. However, this GUID was created once when the client was installed, is stored in the gnutella.ini, and never changes.
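The function IDs come in request/response pairs that differ only in the low bit, which is why the ID looks like a bitmask. A minimal Python sketch of the table as an enum; the FunctionID class and its member names are hypothetical.

```python
from enum import IntEnum


class FunctionID(IntEnum):
    PING = 0x00
    PING_RESPONSE = 0x01
    PUSH_REQUEST = 0x40
    SEARCH = 0x80
    SEARCH_RESULTS = 0x81

    @property
    def is_response(self) -> bool:
        # Low bit 0 = request (broadcast), low bit 1 = response (routed back).
        return bool(self & 1)
```

FunctionID(raw_byte) raises ValueError for any ID not in the table, which is a convenient point to drop unknown packets.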
gnutella_ping_response
Byte Pos | Name | Notes
23 - 24 | Host port | The TCP port number of the listening host.
25 - 28 | Host IP | The IP address of the listening host, in network byte order.
29 - 32 | File Count | An integer value indicating the number of files shared by the host. No idea if this is a signed or unsigned value.
33 - 36 | Files Total Size | An integer value indicating the total size of files shared by the host, in kilobytes (KB). No idea if this is a signed or unsigned value.
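A minimal Python sketch that decodes the 14-byte data area of a 0x01 message (offset 0 here is byte 23 of the packet). Little-endian, unsigned port and counts are assumptions; the IP is read in network byte order as the table says. parse_ping_response is a hypothetical name.

```python
import socket
import struct


def parse_ping_response(payload: bytes) -> dict:
    port, = struct.unpack_from("<H", payload, 0)
    ip = socket.inet_ntoa(payload[2:6])  # network byte order
    file_count, total_kb = struct.unpack_from("<II", payload, 6)
    return {"port": port, "ip": ip, "files": file_count, "total_kb": total_kb}
```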
gnutella_query_hdr
Byte Pos | Name | Notes
23 - 24 | Minimum speed | The minimum speed of serverants which should perform the search. This is entered by the user in the "Minimum connection speed" edit box.
25 + | Search query | A NULL-terminated character string which contains the search request.
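On the receiving side, a minimal Python sketch that splits a 0x80 data area into its two fields. A little-endian WORD and Latin-1 text are assumptions; parse_query is a hypothetical name.

```python
import struct
from typing import Tuple


def parse_query(payload: bytes) -> Tuple[int, str]:
    """Return (minimum speed, query string) from a 0x80 data area."""
    min_speed, = struct.unpack_from("<H", payload, 0)
    end = payload.index(b"\x00", 2)  # the query is NULL-terminated
    return min_speed, payload[2:end].decode("latin-1")
```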
gnutella_query_response_hdr
Byte Pos | Name | Notes
23 | Num recs | Number of gnutella_query_response_recs which follow this header.
24 - 25 | Host port | The listening port number of the host which found the results.
26 - 29 | Host IP | The IP address of the host which found the results, in network byte order.
30 - 33 | Host Speed | The speed of the host which found the results. This may be incorrect. I would assume that only 2 bytes would be needed for this; the last 2 bytes may be used to indicate something else.
34 + | Array of gnutella_query_response_recs | A gnutella_query_response_rec for each result found.
Last 16 bytes | gnutella_query_response_ftr | The clientID128 of the host which found the results. This value is stored in the gnutella.ini and is a GUID created with CoCreateGUID() the first time gnutella is started.
gnutella_query_response_rec
Byte Pos | Name | Notes
+0 offset from start of rec | File Index | Each file indexed on the server has an integer value associated with it. When gnutella scans the hard drive on the server, a sequential number is given to each file as it is found. This is the file index.
+4 offset from start of rec | File Size | The size of the file (in bytes).
+8 offset from start of rec | File Name | The name of the file found. No path information is sent, just the file's name. The filename field is double-NULL terminated.
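Putting the last two tables together, a minimal Python sketch that walks a 0x81 data area (offset 0 here is byte 23 of the packet). It treats the speed as a 4-byte value and reads all integers other than the IP as little-endian and unsigned, which are assumptions; QueryHit and parse_query_response are hypothetical names.

```python
import socket
import struct
from typing import List, NamedTuple, Tuple


class QueryHit(NamedTuple):
    file_index: int
    file_size: int
    file_name: str


def parse_query_response(payload: bytes) -> Tuple[int, str, int, List[QueryHit], bytes]:
    num_recs = payload[0]
    port, = struct.unpack_from("<H", payload, 1)
    ip = socket.inet_ntoa(payload[3:7])  # network byte order
    speed, = struct.unpack_from("<I", payload, 7)
    hits: List[QueryHit] = []
    pos = 11  # first gnutella_query_response_rec (byte 34 of the packet)
    for _ in range(num_recs):
        file_index, file_size = struct.unpack_from("<II", payload, pos)
        pos += 8
        end = payload.index(b"\x00\x00", pos)  # double NULL ends the name
        hits.append(QueryHit(file_index, file_size, payload[pos:end].decode("latin-1")))
        pos = end + 2
    client_id = payload[-16:]  # gnutella_query_response_ftr (ClientID128)
    return port, ip, speed, hits, client_id
```

The file_index and file_name of a hit are exactly what goes into the /get/[file_id]/[filename] download URL.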
gnutella_push_req
Byte Pos | Name | Notes
23 - 38 | ClientID128 | The ClientID128 GUID of the server the client wishes the push from.
39 - 42 | File Index | Index of the file requested. See query_response_rec for more info.
43 - 46 | Requester IP | IP address of the host requesting the push, in network byte order.
47 - 48 | Requester Port | Port number of the host requesting the push.
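A minimal Python sketch that builds a whole 0x40 push request matching the byte positions above (49 bytes in total). Little-endian file index, port and data length, plus a default TTL of 5, are assumptions; build_push is a hypothetical name.

```python
import socket
import struct
import uuid


def build_push(server_id: bytes, file_index: int, my_ip: str,
               my_port: int, ttl: int = 5) -> bytes:
    if len(server_id) != 16:
        raise ValueError("ClientID128 must be 16 bytes")
    payload = (server_id                          # bytes 23 - 38
               + struct.pack("<I", file_index)    # bytes 39 - 42
               + socket.inet_aton(my_ip)          # bytes 43 - 46, network order
               + struct.pack("<H", my_port))      # bytes 47 - 48
    header = struct.pack("<16sBBBI", uuid.uuid4().bytes, 0x40, ttl, 0, len(payload))
    return header + payload
```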
Routing
An issue everyone wants to ask me about nowadays is routing. "Do I
forward every packet I see to every connected host?" Holy Jesus no!
That would swamp the network with duplicate packets (which it already
is). Here's the secret. For simplicity's sake, TTL is not discussed in
this section.
(Forgive the non-straight lines, but the internet's like that)
Imagine yourself as node 1 in the above diagram. You have direct
gnutellanet (physical socket) connections to nodes 2, 3, 4, and 5.
You have reachable hosts at nodes 6 thru 13.
- You get a ping message (function 0x00) from 2 with a message ID of x.
- Look up [message x, socket ???] in your message routing table.
- Not there? Save [message x, socket 2] in the list.
- Respond with a Ping Response (0x01), message ID x, to node 2.
- Send the function 0x00 message to nodes 3, 4, and 5 (not 2!!).
- Node 3 will respond with a Ping Response (0x01), message ID x.
- Forward that response to the socket saved in the list for
  [message x, socket ???] (here, socket 2). Since this packet is being
  routed and not broadcast, there is no need to check whether it is a
  duplicate, as routed messages don't make loops.
- Do the same thing with responses from 4 and 5.
- Since 3 thru 5 will also pass the message on to 8 thru 13,
  you'll get a 0x01 from them too.
- Problem: Node 3 is connected to 10, who is connected to 4, who is
  connected to you! It's OK! You look up [message x, socket ???] in
  your route list ... it's already there! You drop the message, do not
  respond to 4, and do not forward it to anyone!
Here are the basic mechanics, described in the example above:
- If the low bit of the function ID (f) is 0, look for
  [message x, socket ???]. If it's already there, drop the message.
  If it isn't, add it as [message x, socket s], respond to socket s,
  and forward to all connected clients except socket s (the one you
  got the message from).
- If the low bit of the function ID (f) is 1, look for the socket which
  matches [message x, socket ???] and forward the message to that
  connection only.
- If the low bit of the function ID (f) is not 0 or 1, you need to
  stop letting an infinite number of monkeys use your machine
  while they work on their Hamlet script.
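The two rules above fit in a few lines. A minimal Python sketch, assuming connections are represented by any hashable handle (names here); the Router class is hypothetical, answering a request locally is left out, and the table is never pruned, which a real client would need to do.

```python
from typing import Dict, Hashable, List


class Router:
    def __init__(self) -> None:
        # message ID -> connection the request first arrived on
        self.routes: Dict[bytes, Hashable] = {}

    def route(self, message_id: bytes, function_id: int,
              came_from: Hashable, connections: List[Hashable]) -> List[Hashable]:
        """Return the connections to send the message to ([] means drop)."""
        if function_id & 1 == 0:
            # Request: drop duplicates, otherwise remember the source
            # and broadcast to everyone except the source.
            if message_id in self.routes:
                return []
            self.routes[message_id] = came_from
            return [c for c in connections if c != came_from]
        # Response: send it back only along the path the request came in on.
        back = self.routes.get(message_id)
        return [back] if back is not None else []
```

Replaying the node 1 example: the ping from 2 goes out to 3, 4 and 5; the looped copy arriving from 4 is dropped; and the 0x01 from 3 is routed back to 2 only.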
TTL and Hops
"How many computers the packet can go through before it will stop
being passed around like a whore" - Nouser (#gnutella on efnet)
TTL: anyone who knows anything about TCP/IP will tell you that TTL stands
for Time To Live. Basically, when a packet (or message, in our case) is
sent out, it is stamped with a TTL, and each host that receives the
packet decrements the TTL. If the TTL is zero, the packet is dropped;
otherwise it is routed to the next host in the route. Gnutella TTLs
work similarly. When a NEW message is sent from your host, the TTL is
set to whatever you have in your Config | TTL | My TTL setting.
When the packet is received by the next host in line, the TTL is
decremented. Then that TTL is checked against that host's Config |
TTL | Max TTL setting. The lower of the two numbers is placed in the
outgoing TTL field. If the outgoing TTL is zero, the packet is dropped.
[Capn's Note: I'm not positive about this next part]
Then the Hops field of the message is incremented and checked.
If this number is greater than the Max TTL setting,
the packet is dropped. [End Capn's Note] This method means that
even if you set your TTL to 255 (maximum value), odds are the TTL will
be set to the default (5) by the next host in your chain.
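The receive-side rules above, as a minimal Python sketch. MAX_TTL stands in for the Config | TTL | Max TTL setting (using the default of 5 mentioned above), next_hop_fields is a hypothetical name, and the hops check implements the part flagged as uncertain in the note.

```python
from typing import Optional, Tuple

MAX_TTL = 5  # stand-in for Config | TTL | Max TTL (default 5)


def next_hop_fields(ttl: int, hops: int,
                    max_ttl: int = MAX_TTL) -> Optional[Tuple[int, int]]:
    """Return the (ttl, hops) to forward with, or None to drop the packet."""
    ttl = min(ttl - 1, max_ttl)  # decrement, then take the lower of the two
    if ttl <= 0:
        return None
    hops += 1
    if hops > max_ttl:  # the uncertain hops check
        return None
    return ttl, hops
```

With these rules, a message sent with TTL 255 leaves the next host with TTL 5 and 1 hop, which is the clamping effect described above.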