Read This Before You Write a Newsreader, News Transport System, etc.
By Tom Limoncelli
Version 1.12. Updated 1995-03-30
This document is not a FAQ. A FAQ implies that someone asked questions
and someone else answered said questions. That's not what this
document is about. This document is written because of all the people
that didn't ask, or didn't know to ask, and got in trouble because of
it. People constantly post to the net that they are writing some kind
of software and in the process of asking other questions they reveal
that they are doing something else that is, well, stupid. This
document attempts to point out common stupid mistakes so that you can
avoid them. Wow! How amazingly useful! Isn't that nice of Tom?
Well, actually, it's a sad statement about how poorly netnews
technology is understood that these stupid mistakes are so common. Very common.
Common enough to make this document useful. So, this is not a FAQ,
this is a warning.
Point #1
"I think I'll write a newsreader!"
Stop. Stop right there. I have a suggestion that will save you a lot
of time: Go to a movie. Rent a video. Volunteer with a local
first-aid squad. Feed the homeless. Make a sandwich, walk
onto the street and when you see a homeless person say, "HERE!"
Just do anything other than write a newsreader. There are enough
already. We should have started neutering newsreader authors a long
time ago.
"...but I'm going to write one that does something nobody has done
before!"
Yeah, right. Before you say that, learn all the features of nn, trn,
strn, gnus, and tin. Now tell me you've thought of something new. Can
you add this new feature to one of the old newsreaders? Yes you can.
Plus, you'll save a million sysadmins grief when they have to go and
install yet another newsreader and figure out how to build it. It's
much easier to install an updated version of an old newsreader.
"...well I could, but I really want to make my system
stand-alone."
Then go stand alone in a corner until you've changed your mind. If you
don't, you'll spend all of your time writing silly parts like user
interfaces, figuring out your command structure, getting it to compile
on every kind of Unix in the world, etc. Heck, you'll waste most of
your time just writing the install script. Really! Just add your
feature to some other newsreader. All the boring parts are written
for you.
For example, a scoring system for articles could be the basis for an
entire newsreader. However, by adding this feature to trn (creating
strn), the author was able to build an entire newsreader with all those
great trn features, but concentrate on what he considered "fun"
(i.e. the scoring system).
Here's another example: TMNN was an attempt to make an entirely new
netnews system where things would be a lot more hypertext-ish. Rather
than just add these brilliant ideas to a newsreader (or to C News and
modify a newsreader to take advantage of the new data), the author
tried to re-invent the entire news transport. The project was never
completed. I'm sure the author didn't get to spend much time on the
part that really interested him, either.
Point #2
If you are writing a newsreader or transport from scratch, here's what
I think the areas of interesting research would be:
Wait a second, you aren't sure what's a transport and what's a
newsreader? Well, that's a sure sign that you shouldn't be writing
this software just yet. Much of the work of writing any software that
interacts with netnews can be avoided if you KNOW THE CURRENT
TECHNOLOGY FIRST. Read all the RFCs (822, 977, 1036) and install
C News or INN once or twice. Install tin, trn, nn, and readnews.
Read the O'Reilly book. If nothing else, read the two Usenix papers
about C News (same place you'll find the C News code), Rich's Usenix
paper on INN (stored where INN is stored), and Kurt Lidl's paper on
MUSE (distributing netnews via MBONE multicast).
Heck, they're just plain good reading for anyone that writes software.
Point #2.5
If you are writing news software (transport or reader) for a non-Unix
system then it's still important to have experience with the Unix
systems that are available. In fact, considering the horrible I/O
throughput on most Intel-based computers, you have 2x the reason to
study the papers about C News and INN, because those systems keep I/O
to a minimum. If you are writing a newsreader you have 10x the reason
to learn a number of Unix newsreaders. There are over 100 years of
combined software engineering
experience in those readers. Where else do you have the opportunity to
learn from that much experience? (Maybe while writing accounting
software but who would want to do that?) Also, PC news posting agents
are notorious for not following the RFCs and making the mistakes listed
here. So, go slowly, read the resources, do your homework, get lots of
advice, and then go for it.
Point #3
If you are writing a newsreader or transport from scratch, here's what
I think the areas of interesting research would be:
- Advanced user interfaces.
- I don't mean Athena Widgets vs. Motif vs.
OpenWindows, etc. All those damn GUI-based newsreaders add absolutely
nothing to the state of the art... except maybe letting the
mouse generation access the technically elite Usenet (which, being a
"More Power To The People" kind of guy, I feel is a good thing).
However, I think an advanced UI is something that reads news for you.
Read the Usenix papers on RightPages or Ferret. Or how about
something that lets me post articles in a way that lets me communicate
better (for some definition of "better")? Or something that
cross-references articles more usefully than anything currently
available?
- Disconnected Mode.
- Every pissant BBS user uses QWK and has all sorts
of fancy QWK newsreaders. Nobody has invented something as nice as
this for netnews. On the other hand, QWK sucks. Boy does it suck!
It is the embodiment of bad software design. It is the best example of
everything wrong with the way PC software authors create systems. Then
there is SOUP; the spec is available as soup12.zip from all good
SimTel mirrors. I haven't read it yet, but the author tells me it is
much better than QWK. Offline-reader software can be found by looking
in the FAQs for comp.os.msdos.mail-news and alt.usenet.offline-reader.
Investigate this before you go reinventing the wheel.
- Better posting.
- Wanna be famous? Make a seriously amazing MIME
posting tool. You could be responsible for the next explosion of
netnews bandwidth as everyone uses your MIME authoring tool to make
megabyte posts full of text, sounds, and graphics. Or, make an
idiot-proof, GUI-based, bullet-proof posting system that doesn't lock
you into just the standard headers. Want real fame? Separate the
posting mechanism from the newsreader. Define an interface between
newsreaders and newsposters and then make a couple of newsposters. Try
to get every newsreader to add support for your newsposter. Then we'll
hear things like, "I use trn with FreshPost" and "Oh, I use trn too but
I use MagicPost, it has better MIME capabilities". Fame and fortune
await you!
- New storage systems.
- Everyone talks about storing netnews in a
compressed form, as a database, as a flat-file, on a special "netnewsfs"
filesystem, etc. etc. Nobody actually implements it. What's stopping
you? How about a storage system that makes expires happen blindingly
fast? How about a storage system that makes reading the "next article"
a fast operation (note: the next article is not numerically next if you
are using a threaded newsreader). INN has hooks for these kinds of
things and all INN utilities use these hooks. Make the change once in
the right place and all (or most) INN code supports it. Time after time
people have suggested using an SQL database to store articles, the
history file, the kill file, the X-Files, etc. Why not actually
implement it and see if these are good ideas?
- HypertextNews.
- Why store quoted text? Why not just store a code
which specifies the quoted article and which lines? Newsreaders that
support it could let you click on the quoted text and view the lines
specified, the whole article, or whatever! Make it backward
compatible, or figure out how to expire such a news database, and
you'll win the Nobel Prize!
There are plenty of ideas where those came from. Please don't just
write yet another newsreader.
Point #4
Never re-invent the wheel. Why write a text editor when you can just
call $EDITOR? (Unless your amazing new feature is a better editor...
in which case you shouldn't be writing a newsreader; you should be
writing something that all newsreaders could call as $EDITOR.)
Why get bogged down writing tons of NNTP code when you can link to a
pre-written client library? NNTP-t5 and INN both generate ready-to-use
client libraries that anyone can link to and they do all the work for
you. Best of all, they are similar enough that you can write your code
so that it works when linked to either. It would be nice if someone
wrote a library with all the same calls that read everything off the
disk instead of via NNTP. Then you could link an NNTP-based newsreader
to this library and turn it into a non-NNTP-based newsreader. Why not
write a new library that checks a flag and reads news via NNTP, the
file system, a special compressed store, or smoke signals?
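A minimal sketch of that "checks a flag" library, written in Python for brevity rather than as a C library. The names NewsSource, SpoolSource, NNTPSource and open_source are hypothetical. The spool backend assumes the traditional layout (dots in the newsgroup name become directory slashes and each article is a file named by its number); the NNTP backend uses plain GROUP and ARTICLE commands and undoes dot-stuffing. Text is assumed to be 8-bit Latin-1 with LF line endings.

```python
import os
import socket
from abc import ABC, abstractmethod


class NewsSource(ABC):
    """Hypothetical common interface: callers never know where articles live."""

    @abstractmethod
    def article(self, group: str, number: int) -> str:
        """Return one article as text: headers, blank line, body."""


class SpoolSource(NewsSource):
    """Reads a traditional spool: comp.os.linux #42 -> comp/os/linux/42."""

    def __init__(self, spool_dir: str = "/var/spool/news"):
        self.spool_dir = spool_dir

    def article(self, group: str, number: int) -> str:
        path = os.path.join(self.spool_dir, *group.split("."), str(number))
        with open(path, encoding="latin-1", newline="") as f:
            return f.read()


class NNTPSource(NewsSource):
    """Fetches the same article over NNTP (GROUP, then ARTICLE)."""

    def __init__(self, host: str, port: int = 119):
        self.sock = socket.create_connection((host, port))
        self.rfile = self.sock.makefile("rb")
        self._response()                 # discard the greeting (200 or 201)
        self._command("MODE READER")     # see Point #5; a 500 reply is harmless

    def _response(self) -> str:
        return self.rfile.readline().decode("latin-1").rstrip("\r\n")

    def _command(self, line: str) -> str:
        self.sock.sendall(line.encode("latin-1") + b"\r\n")
        return self._response()

    def article(self, group: str, number: int) -> str:
        if not self._command(f"GROUP {group}").startswith("211"):
            raise LookupError(f"no such group: {group}")
        if not self._command(f"ARTICLE {number}").startswith("220"):
            raise LookupError(f"no article {number} in {group}")
        lines = []
        while True:
            line = self._response()
            if line == ".":              # a lone dot ends the response
                break
            if line.startswith(".."):    # undo dot-stuffing
                line = line[1:]
            lines.append(line)
        return "\n".join(lines) + "\n"


def open_source(kind: str, **kwargs) -> NewsSource:
    """The 'flag' from the text: pick a backend by name."""
    backends = {"spool": SpoolSource, "nntp": NNTPSource}
    return backends[kind](**kwargs)
```

A reader written against NewsSource works unchanged whether it is handed `open_source("spool")` or `open_source("nntp", host="news.example.com")`; adding the compressed store (or the smoke signals) means adding one more class.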
Point #5
THINGS TO DO OR NOT TO DO
- Do use MODE READER.
- When you talk to an NNTP port, the first thing
you should do is send the command MODE READER. Pay attention to the
response code. 500 means "I don't know that command" (proceed as
normal), 200 means "good", and 201 means "good, but you may not post".
Anything else and you don't want to talk to this server.
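The rule above fits in one small function. This Python sketch assumes the caller has already sent `MODE READER` and read the single status line back; the function name is hypothetical. It also treats 201 ("you can't post"), which servers such as INN send for read-only access, as a normal answer.

```python
from typing import Optional, Tuple


def classify_mode_reader(status_line: str) -> Tuple[bool, Optional[bool]]:
    """Return (proceed, posting_allowed) for the reply to MODE READER.

    posting_allowed is None when the server did not say (500); then the
    code in the connection greeting still applies.
    """
    code = status_line[:3]
    if code == "200":        # good, and you may post
        return True, True
    if code == "201":        # good, but posting is not allowed
        return True, False
    if code == "500":        # unknown command: proceed as normal
        return True, None
    return False, None       # anything else: don't talk to this server
```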
- Do use a pre-written database.
- Don't use your own database; use NOV.
Link to the NOV library so you don't have to implement any of it. It
does all the work for you. Kill your sysadmin if they want to install
tin's database, trn's database, nn's database, etc. (unless you get
your hard disks for free).
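To show how little work NOV leaves you, here is a Python sketch of reading one overview line. It assumes the standard field order (article number, Subject, From, Date, Message-ID, References, byte count, line count, then optional extras such as Xref), separated by tabs. The Overview type and parse_nov_line are hypothetical names, not the NOV library's API; in a real newsreader you would link to that library instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Overview:
    number: int
    subject: str
    author: str
    date: str
    message_id: str
    references: List[str]
    byte_count: int
    line_count: int
    extra: List[str] = field(default_factory=list)   # e.g. "Xref: ..."


def parse_nov_line(line: str) -> Overview:
    """Split one tab-separated overview line into its standard fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"short overview line: {line!r}")
    number, subject, author, date, msgid, refs, nbytes, nlines = fields[:8]
    return Overview(
        number=int(number),
        subject=subject,
        author=author,
        date=date,
        message_id=msgid,
        references=refs.split(),          # threading comes from here
        byte_count=int(nbytes) if nbytes else 0,
        line_count=int(nlines) if nlines else 0,
        extra=fields[8:],
    )
```

Everything a threaded article list needs (subject, author, References) is already in that one line, so there is no reason to build a private database next to it.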
- Posting: Don't validate headers.
- When the user wants to post an
article, give them an editor with the minimum headers and accept
whatever you get back. If any changes were made, send everything
verbatim to inews -h. inews's job is to validate the headers,
insert missing ones, silently delete certain ones, etc. Don't try to
do all this work in the newsreader. Sysadmins often hack their inews
to add some special feature... don't undo their work or require them to
re-add this hack to every newsreader they install! (The NNTP POST
command is the same as piping to inews -h, except that you must
include a From header.)
- Posting: Don't work too hard #1.
- The inews -h command requires only
two valid headers: Subject and Newsgroups. Don't send it
anything else (unless the user inserts it himself or herself). For
example, why figure out how to format the date properly? The format is
very specific and if you get it wrong, the transport silently drops the
article. Why try at all when you know that inews -h generates a
perfect one for you? Also, if the user inserts a Reply-To: foo@bar,
let them. Don't try to validate it; if they put in a non-functioning
address, it's not your job to care.
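Taken together, the two posting items above describe a very short posting path. This Python sketch (compose_and_post is a hypothetical name) writes a template holding only Newsgroups and Subject, runs the user's $EDITOR on it, and, only if the text changed, pipes it verbatim to `inews -h`. The editor and inews command lines are parameters so they can be swapped out; the defaults assume `vi` and an `inews` on the PATH.

```python
import os
import shlex
import subprocess
import tempfile
from typing import List, Optional


def compose_and_post(newsgroups: str, subject: str,
                     editor: Optional[List[str]] = None,
                     inews: Optional[List[str]] = None) -> Optional[str]:
    """Let the user edit a minimal article, then hand it to inews untouched.

    Returns the text that was posted, or None if nothing was changed.
    """
    if editor is None:
        editor = shlex.split(os.environ.get("EDITOR", "vi"))
    if inews is None:
        inews = ["inews", "-h"]

    # Only the two headers inews -h needs; it generates Date, Path, etc.
    template = f"Newsgroups: {newsgroups}\nSubject: {subject}\n\n"
    fd, path = tempfile.mkstemp(suffix=".article")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(template)
        subprocess.run(editor + [path], check=True)
        with open(path) as f:
            article = f.read()
    finally:
        os.unlink(path)

    if article == template:
        return None              # user made no changes: don't post
    # No header validation here on purpose: that is inews's job.
    subprocess.run(inews, input=article, text=True, check=True)
    return article
```

Note what is missing: no Date, no Path, no From, no attempt to check the Reply-To the user typed in.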
- Posting: Don't work too hard #2.
- The NNTP POST command requires only
3 valid headers: Subject, Newsgroups, and From. It will
generate the rest if they are left out. Don't do the work yourself.
RFC977 says that you must generate all required headers, but, as
implementors learned, that isn't a good idea. That's why it is
important to educate yourself about the RFCs, as well as how they were
actually implemented.
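Here is a Python sketch of the client side of NNTP POST, sending only the three headers the text mentions and letting the server fill in the rest. It follows the RFC 977 exchange: send POST, expect 340, send the article with CRLF line endings and dot-stuffing, end with a lone ".", expect 240. The function name and the two-stream interface (for example `sock.makefile("rb")` and `sock.makefile("wb")`) are assumptions, as is 8-bit Latin-1 text.

```python
from typing import BinaryIO


def nntp_post(rfile: BinaryIO, wfile: BinaryIO, article: str) -> str:
    """Post an article (Newsgroups, Subject, From, blank line, body).

    Returns the server's final status line; raises on refusal.
    """
    def send(line: str) -> None:
        wfile.write(line.encode("latin-1") + b"\r\n")

    def reply() -> str:
        return rfile.readline().decode("latin-1").rstrip("\r\n")

    send("POST")
    wfile.flush()
    status = reply()
    if not status.startswith("340"):     # e.g. 440: posting not allowed
        raise RuntimeError(f"server refused POST: {status}")
    for line in article.splitlines():
        if line.startswith("."):
            line = "." + line            # dot-stuffing
        send(line)
    send(".")                            # end of article
    wfile.flush()
    status = reply()
    if not status.startswith("240"):     # e.g. 441: posting failed
        raise RuntimeError(f"posting failed: {status}")
    return status
```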
- Posting: Don't generate a Path header.
- Don't generate a Path
header. Period. With networks changing so often, it is impossible to
generate one that is correct for all sites. Let inews or NNTP's
POST command generate it for you. They will generate it properly
because they were installed (and maybe modified) by someone that
understands the site's special configuration. The newsreader, on the
other hand, is often installed by someone else, frequently Joe Loser
who thinks netnews was invented 3 months ago when he first
discovered alt.sex.
- Posting: More headers not to generate.
- When generating a post's
headers, don't insert a Date header, munge the Sender or From
headers, etc. That is inews's job. inews's only purpose in life is
to take the crap that the user input, add the missing required headers,
check and fix obvious errors, and reject what it can't fix. inews will
send it to the spool or post it via NNTP. Why does everyone think they
can out-do inews by doing the work themselves?
- Posting: The importance of the Date header.
- The Date header is
critical to the news transport because this makes it possible to expire
netnews. Therefore, the Date header has to be one of a couple very
specific formats so that transport software authors aren't chasing a
moving target. Since every site that touches an article must re-parse
this date (and it is slow to parse), C News and INN have optimized on
one particular date format. The other formats are handled in a manner
that isn't as fast. So, if you must output a Date header, use the
format C News does. Better yet, if you are a posting agent, do not
generate the Date header at all; let inews (or the NNTP POST command)
generate it for you!
- Date header mania.
- The Date header that you generate should always
give your local time with your timezone's offset from GMT. However, if you want to be a
really cool newsreader author, make sure your program displays that
header in the local timezone of the person reading the message. (i.e.
convert the header to the local time when displaying it). Remember to
provide a way to see the original header (i.e. the "show all headers"
command shouldn't do the conversion).
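The display half of that advice is a few lines in Python using the standard library's RFC 822 date parser. display_date is a hypothetical helper: it converts for display only, and the raw header string is what "show all headers" should print. Treating a "-0000" zone (which the parser returns as a naive datetime) as UTC is an assumption of this sketch.

```python
from datetime import timezone, tzinfo
from email.utils import parsedate_to_datetime
from typing import Optional


def display_date(header_value: str, tz: Optional[tzinfo] = None) -> str:
    """Render a Date header in the reader's timezone (local time by default).

    The original header text is not modified; keep showing it verbatim
    when the user asks for all headers.
    """
    when = parsedate_to_datetime(header_value)
    if when.tzinfo is None:
        # "-0000" parses as a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return when.astimezone(tz).strftime("%a, %d %b %Y %H:%M:%S %z")
```

For example, a reader in GMT would see `Thu, 30 Mar 1995 12:00:00 -0500` displayed as `Thu, 30 Mar 1995 17:00:00 +0000`.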
- NNTP posting: Don't use IHAVE.
- Use the NNTP POST command, not the
NNTP IHAVE command. If you use NNTP's IHAVE command then you have
spent about a week duplicating all the work that inews (or NNTP's POST
command) does, wasted another week of programming time to get
everything "just right"... and when someone installs your software on
an INN server, they'll find that it doesn't work. Duuuh!
- When your user is idle, don't generate traffic.
- If your user isn't
typing, mouse'ing, clicking, etc. your newsreader shouldn't be
generating work for the server. Imagine 5,000 users all leaving your
program running when they leave for lunch. EXCEPTION: If you are
implementing some fancy read-ahead model, but then you shouldn't be
reading too far ahead if the user seems to have walked away from their
terminal, eh?
- If you lose your connection, handle it transparently.
- Write your code
so that if your NNTP connection closes you handle it gracefully. You
shouldn't go into an infinite loop, or spin in an open->error->open
loop chewing up CPU time. If you can't re-open the connection to the
server, tell the user, but don't core-dump.
- When your connection is closed, reconnect gracefully.
- Write your
code so that you reconnect without the user being warned a zillion
times. Maybe put "Reconnecting to server" in a status line, but
don't require the user to click on "OK". Give the user the feeling
that they always have a connection, even when they are talking to
a server that disconnects after 30 seconds of idle time.
- When your connection is closed, don't reconnect until you have to.
- If
your connection closed it was for a good reason. Either you closed it
because your user was idle, or the server closed it because it felt
your user was idle, or maybe the server went down. Don't reconnect
until you need to issue your next NNTP command. Example: If a server
has 400 connections when it reboots, you don't want 400 clients all
pummeling it with packets trying to start new connections while it is
trying to come up. Plus, when the service is operational again, only
those connections that are actively used should be reconnecting
anyway. If you delay reconnecting until the user needs it, the load on
the server will be smoothed out since everyone won't be connecting at
the same time.
- When your user is idle for a long time, disconnect.
- If your user is idle
for more than 5 minutes, why not close the NNTP connection? If you
followed the above advice, the reconnect will be seamless and the
users will not notice.
- When your client is idle for a long time, disconnect.
- NNTP servers should
disconnect if a connection hasn't seen traffic for 5-10 minutes.
Let the newsadmin set this time limit, and let them disable
this feature if they need to. In a perfect world, all newsreaders
would disconnect after 5 minutes of idle time, all servers would
disconnect after 5 minutes of idle time, and all re-connects would be
transparent to the user. However, since we don't live in a perfect
world, we each have to do our share.
- Disconnect every 4 hours.
- Whether idle or not, disconnect from the
server every 4 hours. This lets any file handle leaks on the server
get flushed out. If you followed the above advice about reconnecting,
your users won't notice.
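The connection rules above (no traffic while idle, connect only when a command needs it, reconnect quietly, drop after 5 idle minutes and after 4 hours regardless) fit in one small wrapper. This Python sketch is hypothetical: LazyNNTP, the Connection protocol and its command()/quit() methods are invented here, and the connect function and clock are injected so the policy can be tested without a server. Retrying exactly once avoids the open->error->open spin; a second failure goes to the caller, which should tell the user.

```python
import time
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    def command(self, line: str) -> str: ...
    def quit(self) -> None: ...


class LazyNNTP:
    """Hypothetical wrapper that connects only when a command needs it."""

    def __init__(self, connect: Callable[[], Connection],
                 idle_limit: float = 5 * 60, max_age: float = 4 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._connect = connect
        self.idle_limit = idle_limit
        self.max_age = max_age
        self._clock = clock
        self._conn: Optional[Connection] = None
        self._opened_at = 0.0
        self._last_used = 0.0

    def tick(self) -> None:
        """Call from the UI loop. Sends nothing except a QUIT at a limit."""
        if self._conn is None:
            return
        now = self._clock()
        if (now - self._last_used > self.idle_limit
                or now - self._opened_at > self.max_age):
            self.close()

    def close(self) -> None:
        if self._conn is not None:
            try:
                self._conn.quit()
            except OSError:
                pass                     # already gone; nothing to report
            self._conn = None

    def command(self, line: str) -> str:
        self.tick()                      # enforce the limits before reuse
        try:
            return self._send(line)
        except OSError:
            self._conn = None            # connection lost: retry once
            return self._send(line)      # a second failure goes to the caller

    def _send(self, line: str) -> str:
        if self._conn is None:
            self._conn = self._connect()     # reconnect only on demand
            self._opened_at = self._clock()
        reply = self._conn.command(line)
        self._last_used = self._clock()
        return reply
```

Because reconnection happens inside command(), the user never sees it, and a rebooted server only hears from the clients whose users actually do something.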
- Don't disconnect between every command.
- I hate to embarrass anyone but
the authors of Netscape
made the mistake in a beta version (the current
one is fixed) where they closed the connection after every single
article. You could just hear your system performance die as your
kernel locks out everything trying to fork() fast enough to keep up
with the Netscape users.
- Don't connect if you don't have to.
- nnpost (part of NN) connects to the NNTP port,
then puts you in the editor. 15 minutes later, you have
completed writing your post and the server has disconnected you because
your connection was idle. Now it has to re-connect to do the actual
posting. The opposite is just as bad: /bin/rnews -U (part of the
INN distribution) connects to the server every time it runs, even if it
doesn't need to send anything. (This actually triggers a bug in
certain operating systems. Someone forgot to test the OS to see how it
handled a connection being created then closed, with no read()s or
write()s on it in between.)
- Disconnect correctly.
- If you drop the NNTP connection, drop it
gracefully. Send QUIT\r\n on the socket, then close it. When might
you want to do this? For example, if a user cancels any kind of
operation while a transaction is in progress with the news server, you
may want to abort the news stream. Don't just disconnect the stream!
Ungraceful disconnects annoy news administrators because they show up
in logs.
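A Python sketch of the graceful version (disconnect is a hypothetical helper name). It sends QUIT, then keeps reading until the server closes its end, which swallows both the tail of any aborted transfer and the server's goodbye, and only then closes the socket. The 5-second timeout is an arbitrary choice for this sketch; if it expires or the connection is already dead, the socket is simply closed.

```python
import socket


def disconnect(sock: socket.socket, timeout: float = 5.0) -> None:
    """Drop an NNTP connection politely: QUIT, drain, then close."""
    try:
        sock.settimeout(timeout)
        sock.sendall(b"QUIT\r\n")
        # Read whatever is still arriving (leftovers of an aborted
        # transfer, then the goodbye line) until the server hangs up.
        while sock.recv(4096):
            pass
    except OSError:
        pass        # timed out or already dead: just close it
    finally:
        sock.close()
```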
- If your connection closes for good, don't go crazy.
- Sometimes a
connection dies because a machine is down or undergoing maintenance, or maybe
the permission file just changed and you no longer have permission to
talk to that server. All NNTP-based newsreaders should handle this
gracefully.
- Don't generate vanity headers.
- Don't include a header that identifies
what newsreader the user is using.
Son-of-RFC1036
explicitly states
that this is A Bad Thing. If you haven't seen this header before it
basically looks like: X-Newsreader: this was posted by a user that
uses FooReader v33.1 which is the software that I wrote and I put this
header in because I'm boring and immature and think that I can make
myself famous by adding this header when it really just shows how
shallow I am. Well, you're not completely shallow, but you should
watch out for the neutering patrol (see Point #1 above).
- Trash certain headers.
- News posting agents shouldn't generate
NNTP-Posting-Host or Path headers. News transports that receive
posts (i.e. the NNTP POST command or non-NNTP inews commands) should
notice attempts by users to supply their own NNTP-Posting-Host or
Path headers and delete them. Of course, the transport should add
replacement headers. My point is that if a user tries to supply a
NNTP-Posting-Host or Path header, they should be silently replaced by
the transport (or the mechanism that accepts posts).
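On the transport side, the "silently replace" rule could look like this Python sketch. replace_transport_headers is a hypothetical name; the path and posting_host values are whatever the transport itself computes, and the sketch assumes LF line endings. It drops any user-supplied Path or NNTP-Posting-Host header, including folded continuation lines, and then adds the transport's own.

```python
from typing import List

TRANSPORT_ONLY = {"path", "nntp-posting-host"}


def replace_transport_headers(article: str, path: str, posting_host: str) -> str:
    """Silently drop user-supplied Path/NNTP-Posting-Host, then add our own."""
    head, _, body = article.partition("\n\n")
    kept: List[str] = []
    dropping = False
    for line in head.rstrip("\n").split("\n"):
        if line[:1] in (" ", "\t"):          # continuation of previous header
            if not dropping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        dropping = name in TRANSPORT_ONLY
        if not dropping:
            kept.append(line)
    kept.insert(0, f"Path: {path}")
    kept.append(f"NNTP-Posting-Host: {posting_host}")
    return "\n".join(kept) + "\n\n" + body
```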
- Educate yourself.
- Reading RFC1036 and
RFC977 in one sitting in a quiet
library was the best investment I ever made. Read
RFC822
too, but it
might put you to sleep so read it in bed. (Certainly do not read it
while operating heavy machinery.) You learn all sorts of requirements
you may not have known of and they explain many issues. They also tell
you certain things that were tried one way but failed, and (therefore)
why it is standard practice to not do those things.
- Tips when replying and doing a follow-up post.
- Don't reply to the user
listed in the Path header. The Path header is just informational as
far as you are concerned, unless you are a news transport (C News or
INN). Don't ignore the Reply-To header when doing a reply, and
don't invent some data-structure that will prevent you from using the
Reply-To header. FIDO sites have a From but no Reply-To field. So,
FIDO gateway software just drops the Reply-To header or promotes the
Reply-To header to replace the From header "so that replies work
right". Well, Mr. Snoop-FIDO-Dog, first of all, to be grammatically
correct it's "work correctly." Secondly, you've just broken the spec.
There are other pairs of headers that work like this. For example,
Followup-To/Newsgroups is similar to Reply-To/From. However, don't
forget to implement the RFC1036 requirement that Followup-To: poster
(yes, the string p-o-s-t-e-r) means that if the user tries to do a
Followup, do a Reply instead. If this happens, don't forget to check
for the existence of a Reply-To header!
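The header pairs above reduce to two small lookups. In this Python sketch, reply_address and followup_target are hypothetical names, and headers are assumed to be in a dict keyed by lowercase header name. Reply-To wins over From, Followup-To wins over Newsgroups, Followup-To: poster turns a follow-up into a reply (which again honors Reply-To), and Path is never consulted.

```python
from typing import Dict, Tuple


def reply_address(headers: Dict[str, str]) -> str:
    """Where a mailed reply goes: Reply-To if present, else From."""
    return headers.get("reply-to") or headers["from"]


def followup_target(headers: Dict[str, str]) -> Tuple[str, str]:
    """Return ("post", newsgroups) or ("mail", address) for a follow-up."""
    followup_to = headers.get("followup-to", "").strip()
    if followup_to == "poster":               # RFC 1036: reply instead
        return "mail", reply_address(headers)
    return "post", followup_to or headers["newsgroups"]
```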
- Don't confuse the header and the body.
- Between the header and the body
of an article is one blank line. It doesn't have anything on it. No
spaces, no tabs, no nothing. After that blank line, don't fuss with
what you find. I've seen FIDO software that finds headers inside the
body and treats them like real headers. For example, such broken
software would find 3 "headers" in the above paragraph
("TIPS WHEN REPLYING...")
and try to process them.
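Here is a Python sketch of splitting an article the right way (split_article is a hypothetical name, and LF line endings are assumed). The separator is the first completely empty line; a line holding only spaces or tabs is treated as a folded continuation, not as the separator. Everything after the separator is returned as body text and is never scanned for headers, however much it looks like one.

```python
from typing import List, Tuple


def split_article(raw: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split at the first empty line; parse headers only above it."""
    lines = raw.split("\n")
    headers: List[Tuple[str, str]] = []
    for i, line in enumerate(lines):
        if line == "":                       # the one truly blank line
            return headers, "\n".join(lines[i + 1:])
        if line[0] in " \t" and headers:     # folded continuation line
            name, value = headers[-1]
            headers[-1] = (name, (value + " " + line.strip()).rstrip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers, ""                       # headers only, no body
```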
(to be continued...)
[ A related note: The Good Net-Keeping Seal of Approval attempts to
establish a standard for newsreader behavior on Usenet. For more
information on the aims and requirements of the Good Net-Keeping Seal
of Approval, see
http://www.cybercom.net/~rnewman/Good_Netkeeping_Seal
]
--
Tom Limoncelli -- tal@plts.org (home) -- tal@big.att.com (work)
"Would you compare your system administrator
to `Indiana Jones' or `Tank Girl'?" "Both!"