*Cap'n Bry's gnutella search*

gnutella protocol

I've started work on a program which uses the gnutella protocol (as of version 0.48). Basically, you connect a SOCK_STREAM (TCP) socket to any other gnutella server, send GNUTELLA CONNECT/0.4[lf][lf], and expect back GNUTELLA OK[lf][lf]. At this point the server expects you to identify yourself. You send a type 0x00 message to whoever you just connected to, and the server responds with a 0x01 message giving how many files it is sharing and the total size of those files (in KB). You'll also get a response from everybody connected to the machine you connected to, and so on, until the TTL on the message expires.
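
A minimal sketch of that handshake in Python, assuming a plain blocking socket and a peer that answers with exactly the GNUTELLA OK string shown above. The helper names (`recv_exact`, `client_handshake`, `server_handshake`) are made up for illustration; they are not part of any gnutella client.

```python
import socket

# [lf] in the text is a line feed, so each string ends in two of them.
CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
CONNECT_REPLY = b"GNUTELLA OK\n\n"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer hangs up first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def client_handshake(sock: socket.socket) -> bool:
    """Send the connect string and report whether the reply was OK."""
    sock.sendall(CONNECT_REQUEST)
    return recv_exact(sock, len(CONNECT_REPLY)) == CONNECT_REPLY


def server_handshake(sock: socket.socket) -> bool:
    """Answer a well-formed connect string with OK; refuse anything else."""
    if recv_exact(sock, len(CONNECT_REQUEST)) != CONNECT_REQUEST:
        return False
    sock.sendall(CONNECT_REPLY)
    return True
```

In use you'd open the connection with `socket.create_connection((host, port))`, call `client_handshake()` on it, and send your 0x00 once it returns True.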

At this point, the server will start bombarding you with information about other servers (which fills the host catcher and the gnutellanet stats). You'll also get search requests. You're supposed to decrement the TTL and pass each one on to any other servers you're connected to (if TTL > 0). If you have no matching files you can simply discard the packet; otherwise you should build a query response to that message and send it back the way it came.
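
Deciding whether to answer an incoming search might look like the sketch below. It assumes the 0x80 payload layout from the gnutella_query_hdr table further down (a 2-byte minimum speed, taken here as little-endian, then a NULL-terminated string). The matching rule, every query word appearing somewhere in the filename regardless of case, is my own placeholder; nothing here says how gnutella itself matches names. `shared` is a hypothetical map of file index to filename, and forwarding the search onward is a separate step.

```python
import struct


def parse_search(payload: bytes) -> tuple[int, str]:
    """Split a 0x80 payload into (minimum speed, query string)."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    query = payload[2:].split(b"\0", 1)[0].decode("latin-1")
    return min_speed, query


def answer_search(payload: bytes, my_speed: int,
                  shared: dict[int, str]) -> list[int]:
    """Return indexes of matching files; an empty list means discard."""
    min_speed, query = parse_search(payload)
    if my_speed < min_speed:
        return []  # we are slower than the searcher asked for
    words = query.lower().split()
    if not words:
        return []
    return [index for index, name in shared.items()
            if all(word in name.lower() for word in words)]
```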

The header is fixed for all messages and ends with the size of the data area which follows. The header contains a Microsoft GUID (Globally Unique Identifier, for you non-Winblows people) which is the message identifier. My crystal ball reports that "the GUIDs only have to be unique on the client", which means that you can really put anything here, as long as you keep track of it (a client won't respond to you if it sees the same message ID again). If you're responding to a message, be sure you haven't seen the message ID (from that host) before, copy their message ID into your response, and send it on its way. The message ID is followed by a function ID (one byte), which looks to be a bitmask. The function ID indicates what to do with the packet (search request, search response, server info, etc). The next field is a one-byte TTL. For every packet you receive you should dec (or -- for the C guys) the TTL and pass the packet on if the TTL is still > 0 (i.e. if (--hdr.TTL) { [pass on] }, god I love C). You should also inc the hop count. Seems redundant? Well, some people have smaller TTLs, and you have the right to drop any message you want to based on its hop count. The header finishes up by telling us how large the function-dependent data that follows is.
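
Since the ID only has to be unique as far as your own client is concerned, any source of 16 fresh bytes will do. This sketch uses Python's `uuid.uuid4()` as a stand-in for CoCreateGUID() (a swap, not what gnutella itself calls) and remembers the IDs it hands out, so replies to your own messages can be recognized. The class name `MessageIds` is hypothetical.

```python
import uuid


class MessageIds:
    """Hands out fresh 16-byte message IDs and remembers the ones we made."""

    def __init__(self) -> None:
        self._ours: set[bytes] = set()

    def new_id(self) -> bytes:
        # A random (version 4) UUID, used here in place of CoCreateGUID().
        msg_id = uuid.uuid4().bytes
        self._ours.add(msg_id)
        return msg_id

    def is_ours(self, msg_id: bytes) -> bool:
        """True if a reply carries an ID that this client originated."""
        return msg_id in self._ours
```

A reply copies the request's ID unchanged, so `is_ours()` on an incoming 0x01 or 0x81 tells you whether the answer is meant for you or has to be routed on to someone else.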

Searches: Easy, just build a type 0x80 packet, add a WORD for the minimum connection speed (in kbps), then the NULL-terminated query string. There isn't a response from people who have no match, but a result will come back as a type 0x81 message. There will be a gnutella_query_response_hdr followed by N gnutella_query_response_recs, each ending in a double-NULL-terminated filename. To finish this up, there's a gnutella_query_response_ftr with the full 128-bit (16-byte) client ID of the server that found the result.
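
Putting a complete search message together, as a sketch: the 23-byte header from the tables below, then the speed WORD and the query. It assumes little-endian integers (the natural layout of a Delphi record on x86), a random message ID, and a starting TTL of 5, the default mentioned in the TTL section. `build_search` is a hypothetical helper.

```python
import struct
import uuid

SEARCH = 0x80


def build_search(query: str, min_speed: int, ttl: int = 5) -> bytes:
    """Return a full 0x80 message: header followed by the query payload."""
    payload = struct.pack("<H", min_speed) + query.encode("latin-1") + b"\0"
    header = uuid.uuid4().bytes + struct.pack(
        "<BBBI", SEARCH, ttl, 0, len(payload))  # hops start at 0
    return header + payload
```

Hang on to the first 16 bytes: the 0x81 results for this search come back carrying the same message ID.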

Downloads and Uploads: These are a POC. If you want a file from a server, you connect to the server and send an HTTP request for it. The URL is of the form /get/[file_id]/[filename]. The file ID was returned with the search result. The gnutella HTTP server also supports resuming a transfer via the Content-range: HTTP header. If you're just curious, the User-Agent is gnutella. You can actually load up Netscape and get a file from a gnutella server. Pretty cool, eh? Here's a dump of what an HTTP request looks like:


      GET /get/293/rhubarb_pie.rcp HTTP/1.0
      User-Agent: gnutella

Yes, the User-Agent header and HTTP version are required. When gnutella requests a file, it does NOT URL-encode the filename, but it will accept requests which are URL-encoded. If the server is behind a firewall which does not allow incoming connections, the client can negotiate a push connection. This is a function ID 0x40 packet. It contains the ClientID128 (GUID) of the server, followed by the File ID requested, and the IP address and port of the client.
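
The download request can be built as below. This sketch copies the request from the dump above, uses the CRLF line endings HTTP requires, and ends the headers with an empty line. URL-encoding is optional since the server accepts either form; `urllib.parse.quote` does it here. Resuming is left out because the exact range header gnutella expects isn't spelled out above. `build_get_request` is a hypothetical name.

```python
from urllib.parse import quote


def build_get_request(file_index: int, filename: str,
                      url_encode: bool = False) -> bytes:
    """Build the HTTP/1.0 GET that fetches /get/file_index/filename."""
    name = quote(filename) if url_encode else filename
    lines = [
        f"GET /get/{file_index}/{name} HTTP/1.0",
        "User-Agent: gnutella",  # required, per the text above
        "",  # an empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("latin-1")
```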

Ok, you all are wondering: sure, the verbal theory is great, but why not just give us the code! Here's my source code, or what I've got of it so far. I know, it's in Pascal, but what the heck, I'm a Delphi programmer 18 hours a day (and a beer drinker the rest). It's a component you can add to your palette which handles connecting to gnutellanet: simply drop it on your form, link up some event handlers, and call ConnectNewHost to get started. BeginNewSearch() is also something you might be interested in.

So, what have I got running? I've got a client which will connect to multiple servers, host catch, do search requests (and get the results). Of course the search monitor works too. Here are some screenshots. It's written in Delphi, and the code is up above.

Here's a sample log file which shows the protocol in action.

Awlright, some people have read this document and still keep asking me things like "What's in the first 10 bytes of the header?", so for people who can't figure out what the syntax of a Delphi record looks like, or can't read English so well, here are some nice tables:

gnutella_header

Byte Pos	Name	Notes
0 - 15	Message ID	A message ID is generated on the client for each new message it creates. The 16-byte value is created with the Windows API call CoCreateGUID(), which in theory will generate a new globally unique value every time you call it. See the text above for a comment about this value's uniqueness.
16	Function ID	What message type the packet is. See the table of message types below for descriptions of the types
17	TTL Remaining	How many hops the packet has left before it should be dropped
18	Hops taken	How many hops this packet has already taken. Set the TTL on response messages to this value!
19 - 22	Data Length	The length of the function-dependent data which follows. There has been some discussion as to whether this value is actually only 2 bytes, with the last 2 bytes being something else. Seems to work with 4 for me. There is also a question as to whether it is a signed or unsigned integer. Don't know that either, I can't get gnutella to try and send a 2^31 + 1 byte packet :)
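
The same layout as a Python `struct`, as a sketch. It takes the 4-byte Data Length reading from the table and assumes little-endian, unsigned integers; both points are open questions above, so treat the format string as a guess that matches the 4-byte behavior described. Reading a message off a socket is then: read 23 bytes, unpack, and read `data_length` more.

```python
import struct
from dataclasses import dataclass

# 16-byte message ID, function ID, TTL, hops, 4-byte data length.
# "<" means little-endian with no padding, so the header is exactly 23 bytes.
HEADER = struct.Struct("<16sBBBI")


@dataclass
class GnutellaHeader:
    message_id: bytes
    function_id: int
    ttl: int
    hops: int
    data_length: int

    def pack(self) -> bytes:
        return HEADER.pack(self.message_id, self.function_id,
                           self.ttl, self.hops, self.data_length)

    @classmethod
    def unpack(cls, raw: bytes) -> "GnutellaHeader":
        """Decode the first 23 bytes of raw."""
        return cls(*HEADER.unpack_from(raw, 0))
```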

Function IDs

0x00	Ping - An empty message (datalen = 0) sent by a client requesting an 0x01 from everyone on the network. This message type should be responded to with a 0x01 and passed on.
0x01	Ping response - Sent in response to a 0x00, this message contains the host ip and port, how many files the host is sharing and their total size.
0x40	Client push request - For serverants behind a firewall, where the client cannot reach the server directly, a push request message is sent, asking the server to connect out to the client and perform an upload.
0x80	Search - This is a search message and contains the query string as well as the minimum speed.
0x81	Search Results - These are results of a 0x80 search request. They contain the IP address, port, and speed of the serverant, followed by a list of file sizes and names, and the ClientID128 of the serverant which found the files. ClientID128 is another 16-byte GUID. However, this GUID was created once when the client was installed, is stored in the gnutella.ini, and never changes.

gnutella_ping_response

Byte Pos	Name	Notes
23 - 24	Host port	The TCP port number of the listening host
25 - 28	Host IP	The IP address of the listening host, in network byte order.
29 - 32	File Count	An integer value indicating the number of files shared by the host. No idea if this is a signed or unsigned value.
33 - 36	Files Total Size	An integer value indicating the total size of files shared by the host, in kilobytes (KB). No idea if this is a signed or unsigned value.
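
Packing and unpacking that 14-byte payload, as a sketch. Only the IP's byte order is stated above; the little-endian unsigned port and counts are my assumption. The function names are hypothetical.

```python
import socket
import struct


def pack_ping_response(ip: str, port: int, files: int, kbytes: int) -> bytes:
    """Build a 0x01 payload: port, IP (network order), file count, total KB."""
    return (struct.pack("<H", port) + socket.inet_aton(ip)
            + struct.pack("<II", files, kbytes))


def unpack_ping_response(payload: bytes) -> tuple[str, int, int, int]:
    """Return (ip, port, file count, total KB) from a 0x01 payload."""
    (port,) = struct.unpack_from("<H", payload, 0)
    ip = socket.inet_ntoa(payload[2:6])
    files, kbytes = struct.unpack_from("<II", payload, 6)
    return ip, port, files, kbytes
```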

gnutella_query_hdr

Byte Pos	Name	Notes
23 - 24	Minimum speed	The minimum speed of serverants which should perform the search. This is entered by the user in the "Minimum connection speed" edit box.
25 +	Search query	A NULL-terminated character string which contains the search request.

gnutella_query_response_hdr

Byte Pos	Name	Notes
23	Num recs	Number of gnutella_query_response_recs which follow this header
24 - 25	Host port	The listening port number of the host which found the results
26 - 29	Host IP	The ip address of the host which found the results. In network byte order.
30 - 33	Host Speed	The speed of the host which found the results. This may be incorrect; I would assume that only 2 bytes would be needed for this, and the last 2 bytes may be used to indicate something else.
34 +	Array of gnutella_query_response_recs	One gnutella_query_response_rec for each result found.
Last 16 bytes	gnutella_query_response_ftr	The clientID128 of the host which found the results. This value is stored in the gnutella.ini and is a GUID created with CoCreateGUID() the first time gnutella is started.

gnutella_query_response_rec

Byte Pos	Name	Notes
+0 offset from start of rec	File Index	Each file indexed on the server has an integer value associated with it. When gnutella scans the hard drive on the server a sequential number is given to each file as it is found. This is the file index.
+4 offset from start of rec	File Size	The size of the file (in bytes).
+8 offset from start of rec	File Name	The name of the file found. No path information is sent, just the file's name. The filename field is double-NULL terminated.
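
A sketch of walking a 0x81 payload with the three tables above. Offsets are relative to the start of the payload (byte 23 of the message). It assumes little-endian, unsigned integers and treats the 4-byte speed as a single number, which the notes above say may be wrong. The names are hypothetical.

```python
import socket
import struct
from dataclasses import dataclass


@dataclass
class QueryHit:
    file_index: int
    file_size: int
    file_name: str


def parse_query_response(payload: bytes) -> tuple[str, int, int, list[QueryHit], bytes]:
    """Return (ip, port, speed, hits, client_id128) from a 0x81 payload."""
    count = payload[0]
    (port,) = struct.unpack_from("<H", payload, 1)
    ip = socket.inet_ntoa(payload[3:7])
    (speed,) = struct.unpack_from("<I", payload, 7)
    hits = []
    pos = 11  # first gnutella_query_response_rec
    for _ in range(count):
        index, size = struct.unpack_from("<II", payload, pos)
        end = payload.index(b"\0\0", pos + 8)  # double-NULL terminator
        name = payload[pos + 8:end].decode("latin-1")
        hits.append(QueryHit(index, size, name))
        pos = end + 2
    client_id = payload[-16:]  # gnutella_query_response_ftr
    return ip, port, speed, hits, client_id
```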

gnutella_push_req

Byte Pos	Name	Notes
23 - 38	ClientID128	The ClientID128 GUID of the server the client wishes the push from.
39 - 42	File Index	Index of file requested. See query_response_rec for more info
43 - 46	Requester IP	IP Address of the host requesting the push. Network byte order
47 - 48	Requester Port	Port number of the host requesting the push.
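
The 26-byte push payload, as a sketch. The IP is in network byte order per the table; the little-endian file index and port are assumptions. `pack_push_request` is a hypothetical name.

```python
import socket
import struct


def pack_push_request(client_id: bytes, file_index: int,
                      ip: str, port: int) -> bytes:
    """Build a 0x40 payload: ClientID128, file index, requester IP and port."""
    if len(client_id) != 16:
        raise ValueError("ClientID128 must be 16 bytes")
    return (client_id + struct.pack("<I", file_index)
            + socket.inet_aton(ip) + struct.pack("<H", port))
```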

Routing

An issue everyone wants to ask me about nowadays is routing. "Do I forward every packet I see to every connected host?" Holy Jesus no! That would swamp the network with duplicate packets (which it already is). Here's the secret. For simplicity's sake, TTL is not discussed in this section.

[Diagram: node 1 with direct connections to nodes 2, 3, 4, and 5; nodes 6 through 13 are reachable beyond them.]

Imagine yourself as node 1 in the above diagram. You have direct gnutellanet (physical socket) connections to nodes 2, 3, 4, and 5. You have reachable hosts at nodes 6 thru 13.

You get a ping message (function 0x00) from 2 with a message ID of x.
Look up [message x, socket ???] in your message routing table.
Not there? Save [message x, socket 2] in the list.
Respond with a Ping Response (0x01), message ID x, to node 2.
Send the function 0x00 message to nodes 3, 4, and 5 (not 2!!).
Node 3 will respond with a Ping Response (0x01), message ID x.
Forward the message to whoever in the list has [message x, socket ???]. Since this packet is being routed and not broadcast, there is no need to check whether it is a duplicate, as routed messages don't make loops.
Do the same thing with responses from 4 and 5.
Since 3 thru 5 will also pass the message on to 8 thru 13, you'll get 0x01s from them too.
Problem: Node 3 is connected to 10, who is connected to 4, who is connected to you! It's OK! You look up [message x, socket ???] in your route list... It's already there! You drop the message, do not respond to 4, and do not forward it to anyone!

Here's the basic mechanics, described in the example above:

If the low bit of the function ID (f) is 0, look for [message x, socket ???]. If it's already there, drop the message. If it isn't, add it as [message x, socket s], respond to socket s, and forward to all connected clients except socket s (the one you got the message from).
If the low bit of the function ID (f) is 1, look for the socket which matches [message x, socket ???] and forward the message to that connection only.
If the low bit of the function ID (f) is not 0 or 1, you need to stop letting an infinite number of monkeys use your machine while they work on their Hamlet script.
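
The two rules fit in a small routing table. This sketch follows the low-bit rule literally, ignores TTL just as this section does, and only decides where to forward; sending your own 0x01 or 0x81 reply back to socket s is left to the caller. Connections can be any handle you like (plain node numbers in the example that follows). `Router` is a hypothetical name.

```python
class Router:
    """Maps each broadcast message ID to the connection it first came in on."""

    def __init__(self) -> None:
        self.routes: dict[bytes, object] = {}

    def targets(self, msg_id: bytes, function_id: int,
                came_from: object, connections: list) -> list:
        """Return the connections this message should be forwarded to."""
        if function_id & 1 == 0:  # low bit 0: broadcast
            if msg_id in self.routes:
                return []  # seen before: drop, don't respond, don't forward
            self.routes[msg_id] = came_from
            return [c for c in connections if c != came_from]
        # low bit 1: a reply, routed back the way the request came
        back = self.routes.get(msg_id)
        return [back] if back is not None else []
```

Replaying the node 1 example with connections [2, 3, 4, 5]: the ping from 2 goes to [3, 4, 5], the 0x01 from 3 goes to [2], and the looped ping arriving from 4 gets []. An empty list for a reply means either it answers one of your own messages or you never saw the request.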

TTL and Hops

"How many computers the packet can go through before it will stop being passed around like a whore" - Nouser (#gnutella on efnet)

TTL: anyone who knows anything about TCP/IP will tell you that TTL stands for Time To Live. Basically, when a packet (or message, in our case) is sent out, it is stamped with a TTL, and each host that receives the packet decrements the TTL. If the TTL is zero, the packet is dropped; otherwise it is routed to the next host in the route. Gnutella TTLs work similarly. When a NEW message is sent from your host, the TTL is set to whatever you have set in your Config | TTL | My TTL setting. When the packet is received by the next host in line, the TTL is decremented. Then that TTL is checked against that host's Config | TTL | Max TTL setting, and the lower of the two numbers is placed in the outgoing TTL field. If the outgoing TTL is zero, the packet is dropped. [Capn's Note: I'm not positive about this next part] Then the Hops field of the message is incremented and checked. If this number is greater than the Max TTL setting, the packet is dropped. [End Capn's Note] This method means that even if you set your TTL to 255 (the maximum value), odds are the TTL will be set to the default (5) by the next host in your chain.
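
The receive-side TTL handling described above, as a sketch. It includes the hops check the Cap'n marks as uncertain, and uses 5 as the Max TTL only because that is the default mentioned; `next_ttl_hops` is a hypothetical name.

```python
def next_ttl_hops(ttl: int, hops: int, max_ttl: int = 5):
    """Return the (ttl, hops) to forward with, or None to drop the message."""
    ttl = min(ttl - 1, max_ttl)  # decrement, then clamp to our Max TTL
    if ttl <= 0:
        return None
    hops += 1
    if hops > max_ttl:  # the part the Cap'n isn't positive about
        return None
    return ttl, hops
```

With these rules, `next_ttl_hops(255, 0)` gives `(5, 1)`, which is why a TTL of 255 rarely survives past the first host.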

CapnBry's PHP Gnutella search v0.4 - See source code for licensing information