(09:54:30 PM) sonny_ep [n=sonny@70.58.166.119] entered the room.
(09:54:42 PM) racke1: hello sonny_ep
(09:55:51 PM) sonny_ep: Hello. How're things?
(09:56:50 PM) racke1: busy as always and pretty late :-)
(10:01:30 PM) racke1: ding dong who is up for Interchange IRC meeting 8-)
(10:01:42 PM) __justin__: i'm awake
(10:02:27 PM) sonny_ep: that's what I dropped in for
(10:02:37 PM) racke1: *thumbs up*
(10:02:44 PM) ezekiel: and Rene has been waiting for hours :)
(10:02:54 PM) racke1: hello ezekiel
(10:03:06 PM) ezekiel: hello!
(10:03:09 PM) racke1: jon_jensen said he would drop in as well
(10:03:12 PM) ***matjones is here but not awake yet
(10:03:40 PM) ***Rene is still sitting by the irc-bonfire
(10:04:08 PM) ezekiel: popcorn gone?
(10:04:10 PM) matjones: with marshmallows?
(10:04:29 PM) jon_jensen: hello everyone
(10:04:35 PM) racke1: hello jon_jensen
(10:04:38 PM) matjones: hi jon
(10:04:50 PM) ***racke1 puts some UTF-8 into the bonfire
(10:04:51 PM) jon_jensen: thanks for the nice agenda, racke
(10:04:56 PM) jon_jensen: (by email)
(10:04:59 PM) ***endpoint_david here
(10:05:33 PM) ***racke1 just rebooted the mail server, let's check ;-)
(10:05:49 PM) racke1: first bullet point is binary uploads and UTF8
(10:07:47 PM) jon_jensen: racke1: anything you mean beyond the patch you committed for that a few days ago?
(10:07:56 PM) racke1: yes
(10:08:17 PM) racke1: as you could upload text content which might be treated as UTF-8
(10:08:43 PM) ***Rene hates marshmallows
(10:08:45 PM) jon_jensen: if it's text content, it *should* be treated as UTF-8, shouldn't it? unless browser specifies another encoding ...
(10:08:51 PM) racke1: right
(10:09:33 PM) jon_jensen: oh, I see, you didn't check for MIME text/* to make an exception
(10:10:16 PM) endpoint_david: racke and I discussed adding a $CGI::encoding counterpart to $CGI::file
(10:11:57 PM) racke1: I just made a bandaid for the usual case
(10:12:08 PM) racke1: where is this code with $CGI::file?
(10:12:23 PM) endpoint_david: Vend::Server has the initial setting of it, I believe
(10:12:40 PM) endpoint_david: :485
(10:13:04 PM) racke1: yeah
(10:13:40 PM) racke1: in theory this code could make the correct thing?
(10:13:57 PM) endpoint_david: define correct thing?
(10:14:22 PM) endpoint_david: we'd need to decode the text to internal format if it's text, otherwise we'd need to leave it raw
(10:14:30 PM) racke1: yes
(10:14:37 PM) endpoint_david: either way on the output we'd need to know if we're storing it raw or as text
(10:14:44 PM) endpoint_david: i.e., in the call to write_file
(10:15:10 PM) racke1: hm yes
(10:15:53 PM) endpoint_david: which raises the question, do we want to always coerce text to the default encoding when writing out, or do we want to preserve the charset of the uploaded file?
(10:16:36 PM) jon_jensen: I would coerce text to the default encoding
(10:16:45 PM) jon_jensen: otherwise you'll never get it right on subsequent requests when that info is lost
(10:16:49 PM) endpoint_david: I presume the default when !MV_UTF8 is provided is to leave it raw and write it out raw
(10:17:12 PM) endpoint_david: jon_jensen: that's my inclination as well, just talking through to see if there are any side-effects
(10:17:16 PM) racke1: and the update_data "interface" sucks enough anyway
(10:17:20 PM) sonny_ep: I expect you want to "coerce" it to whatever the receiver of the text (file system, database, etc) wants it in.
(10:17:29 PM) jon_jensen: sonny_ep: yeah
(10:17:42 PM) endpoint_david: write_file already has the ability to set the encoding explicitly, it's what you do by default
(10:17:50 PM) racke1: file system doesn't know about encoding
(10:18:14 PM) sonny_ep: the OS does, though (generally)
(10:18:16 PM) endpoint_david: sonny_ep: you're talking in the !UTF8 case?
(10:18:29 PM) jon_jensen: sonny_ep: "the OS" is kind of a broad space :)
(10:18:36 PM) racke1: how does the OS know?
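
The upload handling discussed above (decode text/* content to an internal string, leave everything else raw, and coerce text to the default encoding when writing out) can be sketched roughly as follows. This is an illustrative Python sketch, not Interchange's Perl code: `decode_upload`, `encode_for_write`, and `DEFAULT_CHARSET` are hypothetical names, and falling back to the default charset when the browser declares none is an assumption.

```python
from typing import Optional, Union

# Assumption: stands in for whatever the catalog's default charset is.
DEFAULT_CHARSET = "utf-8"


def decode_upload(raw: bytes, content_type: str,
                  charset: Optional[str] = None) -> Union[str, bytes]:
    """Decode text/* uploads to an internal string; leave other types raw."""
    if content_type.lower().startswith("text/"):
        # Use the declared charset if there is one, else the default.
        return raw.decode(charset or DEFAULT_CHARSET)
    return raw


def encode_for_write(data: Union[str, bytes]) -> bytes:
    """Coerce text to the default charset on output; pass raw bytes through."""
    if isinstance(data, str):
        return data.encode(DEFAULT_CHARSET)
    return data
```

With this shape, a Latin-1 text upload is written back out in the default charset, while a binary upload never passes through a decode step at all.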
(10:19:22 PM) racke1: ok, so in this space in Server.pm we can add $CGI::file_encoding with the proper value and use it in update_data?
(10:20:15 PM) racke1: and tag_value_extended etc
(10:20:17 PM) endpoint_david: do we need to distinguish between browser-provided charset and inheriting the default_charset() ?
(10:21:08 PM) racke1: we convert to our charset in this function, so I guess we don't bother about that for now
(10:21:56 PM) racke1: jon_jensen: just noticed your commit and have a question about UPGRADE
(10:22:09 PM) racke1: ## Create the makefile
(10:22:09 PM) racke1: perl Makefile.PL prefix=/usr/local/interchange
(10:22:26 PM) sonny_ep: fair question racke1, but you have the locale, which isn't ironclad, but you can expect the text files on a system to generally be in the system's locale
(10:22:42 PM) racke1: does that work, I noticed that it annoys the user with a question about the location ...
(10:22:58 PM) racke1: sonny_ep: expecting that's what Windows make mess up
(10:23:04 PM) jon_jensen: racke1: what about UPGRADE?
(10:23:21 PM) racke1: INSTALLING INTERCHANGE IN THE SAME LOCATION, 4.
(10:23:28 PM) jon_jensen: oh, I see
(10:23:39 PM) jon_jensen: I don't know ... I didn't write that :)
(10:23:55 PM) jon_jensen: we could leave the prefix= off, and just let it ask, right?
(10:24:17 PM) racke1: or teach Interchange sane behaviour in that case
(10:24:25 PM) racke1: just keeping the prefix
(10:24:37 PM) jon_jensen: by "Interchange" you mean MakeMaker's Makefile.PL?
(10:24:52 PM) racke1: no the stupid question
(10:25:07 PM) racke1: if you use --force it works as expected, at least I think so
(10:25:24 PM) racke1: env $(PERL) Makefile.PL force=1 \
(10:25:24 PM) racke1: INTERCHANGE_USER=interchange \
(10:25:24 PM) racke1: PREFIX=/usr/lib/interchange
(10:25:31 PM) racke1: that's what .deb does
(10:26:34 PM) racke1: and I don't see the point in ignoring the prefix
(10:26:59 PM) jon_jensen: wait, is it just because prefix= is not PREFIX= ?!
(10:27:07 PM) jon_jensen: I think that matters
(10:27:27 PM) racke1: I don't think so
(10:27:42 PM) racke1: or maybe ... hm
(10:27:56 PM) ***endpoint_david going afk for a bit, back in 30
(10:28:16 PM) jon_jensen: yep, that's it, racke1
(10:28:23 PM) jon_jensen: when I do PREFIX, the question has the correct default
(10:28:35 PM) racke1: acke@erebus:~/interchange/edge$ perl Makefile.PL PREFIX=/usr/lib/interchange
(10:28:35 PM) racke1: Interchange V5.7.3
(10:28:35 PM) racke1:
(10:28:35 PM) racke1: Copyright (C) 2002-2009 Interchange Development Group.
(10:28:35 PM) racke1: Copyright (C) 1996-2002 Red Hat, Inc.
(10:28:35 PM) racke1: Interchange is free under the terms of the GNU General Public License.
(10:28:35 PM) racke1: http://www.icdevgroup.org/
(10:28:35 PM) racke1: Where is your Interchange to be installed? /usr/lib/interchange
(10:28:37 PM) jon_jensen: though in UPGRADE instructions I don't see the point of specifying on the cmd line
(10:29:02 PM) racke1: right, but if I provide a prefix, I don't see the point of the question
(10:29:13 PM) jon_jensen: I really just don't care about this. :)
(10:29:14 PM) racke1: questions are obstacles
(10:29:20 PM) jon_jensen: I'm going to remove the prefix= thing from UPGRADE.
(10:29:41 PM) racke1: why don't use at least PREFIX ?
(10:29:47 PM) jon_jensen: why?
(10:29:52 PM) jon_jensen: it asks the question anyway
(10:29:55 PM) jon_jensen: the user can answer that
(10:30:26 PM) racke1: well, upgrade UPGRADE and annoy me
(10:30:29 PM) racke1: back to UTF-8
(10:32:21 PM) fenic [n=richard@74.80.40.12] entered the room.
(10:35:02 PM) racke1: hello fenic
(10:35:27 PM) racke1: in this parse_multipart we don't know yet which catalog we are serving or do we?
(10:44:43 PM) jon_jensen: ssssh
(10:45:45 PM) racke1: yeah that's UTF-8 fun :-)
(10:46:06 PM) racke1: just added a comment to the ticket
(10:46:09 PM) jon_jensen: I'm working on RT #321 for something different.
(10:46:17 PM) sonny_ep: I think david is still afk.
Is anyone else on that knows this area?
(10:46:33 PM) racke1: I think we should delay file contents conversion until we are in catalog code
(10:49:08 PM) sonny_ep: I think (it's been too long since I looked at that code) that it depends on where parse_multipart is being called from. It may or may not know about the catalog yet.
(10:52:31 PM) racke1: jon_jensen: cool thanks :-)
(10:53:38 PM) racke1: speaking of testing: how could we set up a test for file uploads?
(10:54:04 PM) k2b3 left the room.
(10:56:27 PM) endpoint_david: .me back
(10:56:31 PM) ***endpoint_david back
(10:57:19 PM) racke1: endpoint_david: added a comment to the binary upload ticket
(10:58:02 PM) endpoint_david: oh, nice, we're still in request parsing, so we don't know the catalog, so we don't know the utf8edness status
(10:58:29 PM) endpoint_david: that seems to point to a "store encoding" and punt until later approach
(10:58:36 PM) jeff_b left the room (quit: "Leaving.").
(10:58:53 PM) endpoint_david: i.e., the current behavior is broken everywhere, unless all catalogs are all the same utf8/not utf8
(10:58:54 PM) racke1: yep, but I suppose we can leave the data alone and convert when we know about the catalog :-)
(11:00:40 PM) racke1: and it's premature to do this conversion in parse_multipart in terms of resources
(11:01:06 PM) endpoint_david: this is #268?
(11:01:17 PM) racke1: yes
(11:05:57 PM) racke1: do you agree with my suggestions?
(11:07:30 PM) endpoint_david: is there a single well-defined entry point for the catalog? what's your proposal?
(11:08:02 PM) racke1: open_cat ?
(11:11:21 PM) endpoint_david: so like right before the open_database() call in Vend::Dispatch?
:1157
(11:13:43 PM) racke1: ok
(11:15:18 PM) endpoint_david: looks like open_cat gets called in two places, once for jobs and once for the normal dispatch
(11:16:39 PM) racke1: yes
(11:24:03 PM) endpoint_david: is there any compelling reason why a content-type without a declared charset would be interpreted as whatever our user's default_charset() is set to? seems like there's probably a standard out there which defines it (guessing latin-1)
(11:25:00 PM) sonny_ep: Interesting you should ask...you want the long painful answer or the short one (ie take my word for it)?
(11:26:04 PM) racke1: short one first :-
(11:26:06 PM) endpoint_david: both, short first? :-)
(11:28:18 PM) sonny_ep: you generally are *not* going to get a charset with your content-type, but (again, generally) you are going to get requests in the same charset that your site's pages are encoded in (ie the default_charset).
(11:28:47 PM) endpoint_david: is that just because things are not usually sent as multipart?
(11:28:56 PM) endpoint_david: i.e., your usual page request?
(11:30:05 PM) endpoint_david: I suppose the encoding doesn't come into play without an HTTP request body, which would effectively be POST and PUT
(11:30:55 PM) racke1: what about a GET with UTF-8 in the CGI parameters ? *evil grin*
(11:31:00 PM) endpoint_david: GET is US-ASCII, which when URL-decoded is implicit UTF8 encoding (or so I read)
(11:31:06 PM) sonny_ep: even then, you generally don't get told what the encoding is unless you get multipart stuff or the encoding is different from your page
(11:31:13 PM) racke1: ok
(11:31:20 PM) sonny_ep: racke1: you can just forget about that :)
(11:32:10 PM) sonny_ep: actually, you can encode utf8 in the parameters just fine, but...there is no sane way to determine that they are *for sure* encoded that way
(11:33:18 PM) sonny_ep: this stuff is all dead simple compared to the next item on your agenda (#255)
(11:35:32 PM) racke1: ok why?
(11:37:10 PM) sonny_ep: Oh, I was being snarky.
I recall that it was a real minefield of nastiness when I last looked at it. What's the actual issue?
(11:37:37 PM) racke1: david said it's really simple while I had problems with the generated emails
(11:37:42 PM) endpoint_david: seems to me we wouldn't have to support arbitrary header encoding, just basically running the decoded string through encode('MIMEB');
(11:37:48 PM) racke1: david blamed Thunderbird :-)
(11:37:50 PM) endpoint_david: or whatever the specific header encoding is
(11:37:58 PM) endpoint_david: there's B and Q encoding
(11:38:09 PM) endpoint_david: it's a noop on US-ASCII
(11:38:31 PM) racke1: how do we know whether to B or Q ?
(11:38:36 PM) endpoint_david: although it would cause issues where MV_UTF8 is off
(11:38:44 PM) endpoint_david: assuming mail client support, wouldn't matter
(11:38:51 PM) endpoint_david: Q is basically quoted-printable
(11:39:02 PM) endpoint_david: B is binary w/base85 encoding or something similarly esoteric
(11:39:11 PM) endpoint_david: either can encode the source charset and the data stream
(11:39:23 PM) endpoint_david: (*that's* where I remember seeing that before)
(11:39:37 PM) racke1: if MV_UTF8 is off, we just leave it alone, right?
(11:39:48 PM) endpoint_david: we...ll
(11:39:58 PM) endpoint_david: arbitrary binary output would definitely be broken
(11:40:21 PM) endpoint_david: at best, you'd encode quoted-printable in the wrong charset (latin1, probably)
(11:40:23 PM) racke1: we are talking about the headers not the mail body
(11:40:27 PM) endpoint_david: yeah
(11:40:33 PM) endpoint_david: From, etc
(11:41:05 PM) endpoint_david: that's why you get those spam messages with crap characters in them
(11:41:27 PM) endpoint_david: (which annoys me more than spam which can manage to get their charsets right...
:-D)
(11:42:11 PM) endpoint_david: again, that's contingent on mail client support, but I believe that's a pretty well-established standard
(11:43:05 PM) racke1: ok so in send_mail we apply encode('MIMEB') to mail headers?
(11:43:15 PM) racke1: and what is a good mail client to test :-)
(11:43:32 PM) racke1: other than spamassassin ;-)
(11:44:39 PM) sonny_ep: my previous experience with thunderbird is that it is perfectly happy to display email with 8bit chars, so that is not a good test. would mutt work?
(11:45:42 PM) endpoint_david: ssibly
(11:45:47 PM) endpoint_david: *possibly
(11:45:58 PM) endpoint_david: I'd guess it would
(11:48:08 PM) endpoint_david: B encoding is defined in RFC 1521
(11:49:33 PM) endpoint_david: looks like the Encode.pm module defines MIME-B and MIME-Q, as well as MIME-Header, which is an alias for MIME-B
(11:51:18 PM) sonny_ep: If you are adding this for email support, you may want to look into refactoring parse_multipart, which has to handle mime headers encoded either of these ways (as well as the body).
(11:51:51 PM) racke1: interesting:
(11:51:56 PM) racke1: http://src.chromium.org/viewvc/chrome/branches/WebKit/BugsSite/Bugzilla/Mailer.pm?view=markup&pathrev=30348
(11:52:20 PM) endpoint_david: sonny_ep: not following why we'd need to modify parse_multipart
(11:52:25 PM) racke1: I think email and CGI are quite different in regard to UTF-8
(11:52:26 PM) endpoint_david: this is for outgoing email, no?
(11:53:14 PM) racke1: yes, the same thing we need, right?
(11:53:29 PM) racke1: for [email] and friends
(11:53:41 PM) sonny_ep: the encoding for mail headers and for mime-multipart headers is the same
(11:54:06 PM) racke1: but we are decoding CGI and encoding Email
(11:56:42 PM) sonny_ep: yup, I was guessing that it might support encoding and decoding together...if we are going to add support for another library
(11:57:06 PM) racke1: we already use Encode
(11:57:19 PM) racke1: unless running in Mike Heins mode :-)
(11:57:24 PM) sonny_ep: oh
(12:05:14 AM) endpoint_david: back later
(12:05:15 AM) endpoint_david left the room (quit: ).
(12:06:14 AM) racke1: ok, I'm off for sleep now, at least I gather a bit more insight into UTF-8 :-)
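
For reference, the header encoding discussed above is the "encoded-word" form from RFC 2047, where B is base64 and Q is a variant of quoted-printable. The sketch below illustrates the idea using Python's standard `email.header` module rather than Perl's Encode, so it is not what Interchange's send_mail does; `encode_header_value` and `decode_header_value` are hypothetical helper names. Pure US-ASCII values are passed through unchanged, matching the "noop on US-ASCII" behaviour mentioned in the discussion.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(value: str, charset: str = "utf-8") -> str:
    """Return a header value safe for a mail header (RFC 2047 if needed)."""
    try:
        value.encode("ascii")
        return value  # pure US-ASCII: nothing to encode
    except UnicodeEncodeError:
        # The library chooses B or Q itself; both are valid encoded-words.
        return Header(value, charset).encode()


def decode_header_value(raw: str) -> str:
    """Decode B- or Q-encoded words back into a plain string."""
    return str(make_header(decode_header(raw)))
```

On the question of whether to pick B or Q: a conforming reader has to accept both, so `decode_header_value("=?utf-8?b?Wm/Dqw==?=")` and `decode_header_value("=?utf-8?q?Zo=C3=AB?=")` yield the same string. The choice mostly affects length and how readable the raw header is.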