Subject: Re: Attachment "Box" (covers MIME)
Sun Sep 27 18:14:31 1998

> I would like to set up an email form to allow an attachment to also be
> attached. Does anyone know how to do this?

basically, it's a MIME issue. to upload files from a form, you need to
use more or less the following:
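
    <form method="post"
          enctype="multipart/form-data"
          action="...">
      Select a File: <input type="file" name="...">
    </form>

the two items of note being the 'enctype="multipart/form-data"' line in the <form>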
declaration, and the <input> item of type 'file'. the first tells the
browser (and indirectly, the server) that the data sent will be
formatted in a different way than is normal, one which is vastly more
convenient for handling files. the second is the specific markup which
creates a file upload widget.

the standard encoding method for form data is URL-encoding
(application/x-www-form-urlencoded), which is familiar to anyone who
deals with forms regularly:

    input_01=item+1&input_02=item+2

the multipart/form-data encoding is a bit bulkier, transmitting each
input item as a separate 'entity', without encoding the items
themselves:

    -----------------------------19198123266180
    Content-Disposition: form-data; name="input_01"

    item 1
    -----------------------------19198123266180
    Content-Disposition: form-data; name="input_02"

    item 2
    -----------------------------19198123266180--

and putting a randomly-generated separator line between entities. the
idea behind the separator is to generate a pattern of characters which
is, shall we say, unlikely to appear by accident as part of the data
being uploaded.

in my experience, it's best to be a bit stodgy when handling file
uploads on the server side. it's tempting to read the whole input
stream directly into memory and split it up there, and this technique
is in fact given as an example in various CGI programming books. it
happens to suck rocks when you're sucking down 30MB graphics files,
though, which is the first practical application i ever had for file
uploads.

in general, it's much easier on the system to write the entire input
stream to a file, then go back and process the pieces from there. for
the sake of speed, it's best to write the data in blocks of uniform
size, generally a multiple of 4K.
that tends to be the buffer size most OSes use for file i/o:

    # unique temp file name: timestamp plus this script's process ID
    $tmpfile = sprintf ("/tmp/upload_%d.%d", time(), $$);
    open (TMP, "+>$tmpfile") or die qq(can't write to "$tmpfile": $!);

    # copy the whole request body to disk, 4K at a time
    while (read (STDIN, $block, 4096)) {
        print TMP $block;
    }

    # rewind so the second pass starts at the top of the file
    seek (TMP, 0, 0);

in the code above, a unique filename is generated using the current
timestamp and the process ID of the script. unix systems guarantee
that no two processes running at the same time can have the same PID,
so that's a quick & easy way to keep multiple versions from stepping on
each others' toes.

the read() function returns zero when there's no more data (and undef
on an error, which also ends the loop), and a positive value otherwise,
so it makes an adequate control item for the while() loop. the seek()
function takes you back to the beginning of the file so you can step
through it and process the data.

it's a whole lot easier to chop up the data this way, in the long run.
reading multipart data from a stream is tricky because there's an
inherent chicken-and-egg scenario to it. you don't know that you're
done with one part until you start reading the beginning of the next
part, and then you can't back up to start the next part properly.

there are buffering techniques you can use to solve that problem, but
they're subtle. you need at least two buffers, because there's a
chance you'll get part of the separator string in one read(), and the
rest in the next. then you have to rotate the buffers to make sure
you're keeping the correct "last chunk", paste the buffers together in
the right order to see if the separator really is in there somewhere,
deal with the possibility of multiple separators in the same 2-buffer
chunk before reading again (because most of the non-file data will be
very small), etc, etc, etc.

all the problems are solvable, but the resulting system is quite
complex and has to be handled carefully. dumping everything to a file
and making a second pass is less elegant, but a lot more robust.
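
for what it's worth, here's a rough sketch of what that second pass
might look like. it's an illustration under assumptions, not code from
any particular script: index_parts() and read_part() are names made up
for the example, and it assumes each separator sits on a line of its
own and that lines end in CRLF or a bare LF. rather than copying data
around, it just records the byte offset and length of each part in the
temp file, so a 30MB upload never has to sit in memory:

```perl
use strict;
use warnings;

# Second pass over the spooled upload: note where each part's data
# starts and how long it is.  index_parts() and the hash keys
# (name, filename, start, length) are invented for this sketch.
sub index_parts {
    my ($fh, $boundary) = @_;
    my $delim = "--$boundary";      # separator line as it appears in the data
    my (@parts, $part);
    my ($pos, $prev_eol) = (0, 0);

    binmode($fh);
    seek($fh, 0, 0);
    local $/ = "\n";
    while (defined(my $line = <$fh>)) {
        my $line_start = $pos;      # offset where this line began
        $pos = tell($fh);
        (my $bare = $line) =~ s/\r?\n\z//;

        if ($bare ne $delim and $bare ne "$delim--") {
            # ordinary data line: remember how long its line ending was
            $prev_eol = length($line) - length($bare);
            next;
        }

        if ($part) {
            # the line break just before a separator belongs to the
            # separator, not to the data
            my $len = $line_start - $prev_eol - $part->{start};
            $part->{length} = $len > 0 ? $len : 0;
            push @parts, $part;
            $part = undef;
        }
        last if $bare eq "$delim--";    # terminating separator

        # read this part's headers, up to the blank line
        $part = {};
        while (defined(my $header = <$fh>)) {
            $header =~ s/\r?\n\z//;
            last if $header eq '';
            next unless $header =~ /^Content-Disposition:/i;
            ($part->{name})     = $header =~ /\bname="([^"]*)"/;
            ($part->{filename}) = $header =~ /\bfilename="([^"]*)"/;
        }
        $part->{start} = $pos = tell($fh);
        $prev_eol = 0;
    }
    return @parts;
}

# Pull one part's data back out of the file.  Fine for small form
# fields; a big file would be copied out in blocks instead.
sub read_part {
    my ($fh, $part) = @_;
    my $data = '';
    seek($fh, $part->{start}, 0);
    read($fh, $data, $part->{length}) if $part->{length};
    return $data;
}
```

with the TMP handle from the earlier snippet you'd call something like
index_parts(\*TMP, $boundary), where $boundary is whatever follows
"boundary=" in the request's Content-Type. parts that carry a filename
are the uploaded files; the rest are ordinary form fields.
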

if you want to write a script which *sends* files as attachments, you
just have to run the same process in reverse. here's the full text of
a message with an attachment:

    From webgeek@yawp.com Sun Sep 27 17:48:43 1998
    Received: from [208.229.121.27] ([208.229.121.27])
        by gw.yawp.com (8.8.8/8.8.8) with ESMTP id RAA12973
        for ; Sun, 27 Sep 1998 17:48:42 -0500 (CDT)
    X-Sender: webgeek@204.71.106.52
    Message-Id:
    Mime-Version: 1.0
    Content-Type: multipart/mixed; boundary="============_-1305185946==_============"
    Date: Sun, 27 Sep 1998 17:49:07 -0500
    To: mike@yawp.com
    From: "Michael A. Stone"

    --============_-1305185946==_============
    Content-Type: text/plain; charset="us-ascii"

    test message, with attachments.

    --============_-1305185946==_============
    Content-Type: text/plain; name="file_01.txt"; charset="us-ascii"
    Content-Disposition: attachment; filename="file_01.txt"

    1 2 3 4 5 6 7 8 9 10

    --============_-1305185946==_============
    Content-Type: text/plain; name="file_02.txt"; charset="us-ascii"
    Content-Disposition: attachment; filename="file_02.txt"

    a b c d e f g h i j k l m n o p q r s t u v w x y z

    --============_-1305185946==_============
    Content-Type: text/plain; charset="us-ascii"

    mike stone 'net geek.. been there, done that, have network, will travel.
    --============_-1305185946==_============--

which has pretty much the same structure as the previous one. the
separator is different in appearance, but does exactly the same thing.
the items of importance here are:

  - the "Content-Type" line, which tells the client to expect attached
    data, and defines the separator. technically, that whole thing
    should be on one line, with a semicolon between the pieces.

  - the headers for each attachment, showing the type and disposition
    of the entity's data.

  - the final two dashes after the last item. those are the
    termination signal which says the message is done. if you forget
    those, you'll really make the mail client unhappy.

note that the main body of the message and the sig line are in different places, but don't have explicitly defined dispositions. that being the case, they'll both be displayed by the client as part of the message body. the two pieces in the middle, which are explicitly defined as attachments, will be written to disk using the filenames given on the disposition line.
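
as a rough sketch of running the process in reverse, here's one way a
script might assemble a message like the one above. the function names
make_boundary() and build_message(), the argument layout, and the
example.com addresses in the usage note are all invented for the
illustration. it only handles plain-text attachments; anything binary
would also need a Content-Transfer-Encoding such as base64, which is
left out here:

```perl
use strict;
use warnings;

# Generate a separator that's unlikely to show up in the data.
# (A careful script would also check the data and try again on a hit.)
sub make_boundary {
    return sprintf("============_%d_%d_%06d==", time(), $$, int(rand(1_000_000)));
}

# Assemble a multipart/mixed message as one string.
#   $head     - hash ref with 'to', 'from' and 'subject' (hypothetical layout)
#   $text     - the visible message body
#   $boundary - separator string, e.g. from make_boundary()
#   @files    - hash refs with 'name' and 'data' (plain text only)
sub build_message {
    my ($head, $text, $boundary, @files) = @_;

    my $msg = "To: $head->{to}\n"
            . "From: $head->{from}\n"
            . "Subject: $head->{subject}\n"
            . "Mime-Version: 1.0\n"
            . qq(Content-Type: multipart/mixed; boundary="$boundary"\n)
            . "\n";

    # main body: no disposition given, so the client shows it inline
    $msg .= "--$boundary\n"
          . qq(Content-Type: text/plain; charset="us-ascii"\n)
          . "\n"
          . "$text\n";

    # one entity per attachment, each with its own headers
    for my $file (@files) {
        $msg .= "--$boundary\n"
              . qq(Content-Type: text/plain; name="$file->{name}"; charset="us-ascii"\n)
              . qq(Content-Disposition: attachment; filename="$file->{name}"\n)
              . "\n"
              . "$file->{data}\n";
    }

    # the trailing two dashes say the message is done
    $msg .= "--$boundary--\n";
    return $msg;
}
```

a call along the lines of build_message({ to => 'someone@example.com',
from => 'script@example.com', subject => 'test' }, "test message, with
attachments.", make_boundary(), { name => 'file_01.txt', data => "1 2 3"
}) returns the whole message as a string, which can then be handed to
whatever the server uses to send mail.
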