KILL file popularized by rn). Kill files are normally very limited
in capability, often being unable to do more than mark articles with a
given subject, or from a given author, as having been
"read." Even in the more powerful newsreaders, a filtering
action (e.g. junking or selecting) can depend on only a single
condition being met, making it impossible to write a filtering rule
which depends on two or more independent criteria (e.g. a rule which
is invoked only when the Subject and From
headers match certain patterns). The scoring
newsreaders which have recently become popular allow users
considerably more power, by assigning numeric scores to articles that
match particular patterns. However, even these powerful scoring
interfaces cannot perform extremely precise filtering, due to
limitations in the way scoring commands are designed and
implemented.

There are advantages to making the scoring engine completely independent of the newsreader, as Brad Templeton's NewsClip[tm] package does. Notably, an external filter permits more power and flexibility to the user than existing interfaces offer; it permits cross-platform compatibility for filtering systems; and it reduces the complexity of the newsreader's internal logic. The NewsClip software falls short on the first point, in that its filters must be written in an idiosyncratic compiled language: though this makes it a more powerful interface, it also makes it less flexible, because the user is bound to a single language and limited to the platforms for which that language is implemented. This proposal does not specify the use of any particular language, thus sacrificing some power but gaining flexibility and simplicity. Such an interface has already been implemented in trn and slrn, and work is proceeding for other newsreaders.
bye message) when the newsreader exits,
and not deliberately destroyed at any time during the course of a
newsreading session. At the very least, if the filter should ever
need to be restarted or reinitialized, the newsreader should do so
transparently, rather than require the user to take some explicit
action.

In principle, the newsreader does not even need to spawn a filtering process itself: instead, it could simply open a communications channel with some already-running process, a filtering daemon which would then calculate and return a set of scores. Such a daemon would not even have to run locally, but could be reached remotely over a network. However, for the sake of convenience, this proposal keys articles on their "article numbers," which are unique to a particular news server, rather than on Message-IDs, which are persistent across sites. Thus, the filtering agent must be local to the newsreader's site. Implementors are encouraged to experiment in this area.
All communication between the newsreader and the filter uses only
7-bit ASCII, in accordance with existing news conventions and standards. Each message is
a single line of data terminated with a CRLF pair (octal 015 followed
by octal 012).
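The framing rule can be sketched in a few lines. The following is a minimal illustration in Python (chosen only because the protocol is not tied to any language); the function names `encode_message` and `decode_message` are hypothetical and are not part of the proposal.

```python
CRLF = b"\r\n"  # octal 015 (CR) followed by octal 012 (LF)


def encode_message(text: str) -> bytes:
    """Frame one message as a single CRLF-terminated 7-bit ASCII line."""
    if "\r" in text or "\n" in text:
        raise ValueError("a message must fit on a single line")
    # Encoding as ASCII raises UnicodeEncodeError for any non-7-bit character.
    return text.encode("ascii") + CRLF


def decode_message(line: bytes) -> str:
    """Strip the CRLF terminator from a received line and decode it."""
    if not line.endswith(CRLF):
        raise ValueError("message is not terminated by CRLF")
    # Decoding as ASCII raises UnicodeDecodeError for any octet above 127.
    return line[:-2].decode("ascii")
```

Rejecting non-ASCII data at the framing layer, rather than silently transcoding it, keeps both ends of the channel honest about the 7-bit restriction.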
Every message sent by the newsreader is one of three commands:
newsgroup, art, and scores.
newsgroup
art
XOVER NNTP command. Nearly
every message received from the newsreader will be an
art command.
scores
skip, article
scores and done.
skip
skip command is sent in response to a newsgroup
command when there are no external filters to be run for that newsgroup.
done
(The protocol is designed this way specifically to accommodate trn, which in its current incarnation requires an immediate response each time an article score is requested. It is not considered a feature.)
Newsreader authors also should note that it is unwise to request scores before any overview data has been sent, lest the snake bite its own tail.
NOTE: It would be useful here to present some statistics about the speed and efficiency of different scoring interfaces.
Filtering packages have already been written in Perl and Tcl. Some details of the Perl implementation are described here, merely as an example of a useful filtering package. It is instructive to note that the same package can be implemented, with little difficulty, in any one of many high-level interpreted languages.
When the Perl article filter is invoked, it first loads a library of functions to assist with article scoring. The most interesting ones (from the user's point of view) are those which actually assign scores to articles:
score_art
select_art
score_art for this article may alter this score.
junk_art
select_art,
but assigns an extremely high negative score to article.
global, if
found. This code defines a global filtering subroutine,
global_score, which will be used to calculate a score for
every article received, regardless of the newsgroup in which it was
found. An example of a global library that may be of interest to many
users:
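# global
sub global_score {
    my ($a) = @_;
    # tr/ // counts the spaces in the Xref data without modifying it
    my $ngs = $a->{xref} =~ tr/ //;
    # subtract 5 points for each newsgroup counted
    score_art($a, $ngs * -5);
}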
The tr command counts the number of spaces found in a
variable, in this case the contents of the article's Xref
header. Each time the global_score function is called to
calculate a score for an article, it will assign that article -5
points for each newsgroup it is crossposted to. The more newsgroups
to which an article is crossposted, the lower the score it receives.

Since the user may not wish to apply such draconian measures to articles that appear in news.answers or its sibling groups, the penalty can be made conditional:
unless ($a->{xref} =~ /answers/) {
    my $ngs = $a->{xref} =~ tr/ //;
    score_art($a, $ngs * -5);
}
These are merely examples, intended to illustrate how easy it is to
write a script which filters articles through a very fine sieve.
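To underline that the rule itself is not specific to Perl, the same crosspost penalty can be sketched in Python. This is an illustration only: the function name `crosspost_penalty` is hypothetical, and the sketch assumes the Xref data has the form "host group:number group:number ..." with no leading header name, so that each space precedes exactly one newsgroup.

```python
import re


def crosspost_penalty(xref: str, per_group: int = -5) -> int:
    """Return the score adjustment for an article, given its Xref data."""
    # Exempt anything that appears in news.answers or a sibling *.answers group.
    if re.search(r"answers", xref):
        return 0
    # One space per newsgroup entry, mirroring the Perl tr/ // count.
    return xref.count(" ") * per_group
```

For instance, `crosspost_penalty("news.example.org misc.test:12 alt.test:7")` yields -10 under these assumptions, while any Xref data mentioning an answers group yields 0.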
After loading the global library, the filter enters a loop which
responds continuously to commands sent by the newsreader (on standard
input). When a newsgroup command is received, the filter
looks for a Perl library of the same name as the newsgroup selected,
and loads it. For example, when the filter receives the message
newsgroup news.admin.misc, it looks in its search path
for a file called news.admin.misc which contains Perl
code. This library, if it exists, should define a subroutine named
local_score, which subsequently will be used for scoring
articles in that newsgroup.
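The library lookup described above might be sketched as follows. This is a Python analogue of the idea, not the Perl package's actual code: the function name `load_local_score` and the `search_path` argument are hypothetical, and the sketch assumes each newsgroup library is a file of source code, named after the newsgroup, that defines a function called `local_score`.

```python
import os


def load_local_score(newsgroup, search_path):
    """Find a library named after the newsgroup and return its local_score.

    Returns None if no library exists or if it defines no local_score.
    """
    for directory in search_path:
        candidate = os.path.join(directory, newsgroup)
        if os.path.isfile(candidate):
            namespace = {}
            with open(candidate, encoding="ascii") as handle:
                # Run the library in a fresh namespace so that each
                # newsgroup's local_score replaces the previous one.
                exec(handle.read(), namespace)
            return namespace.get("local_score")
    return None
```

A production filter would also want to reject newsgroup names containing path separators before building the file name; that check is omitted here for brevity.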
When an art command containing an overview record is received,
the filter parses the overview data into an associative array. A
reference to this structure is passed to the global_score
subroutine and the local_score subroutine in turn, if either
of them is defined. Because the global_score function is
defined exactly once when the filter is invoked, it will remain
constant as long as the filter process remains alive. Since the
local_score
function is defined differently in each newsgroup-specific library, it
is redefined each time the filter receives a newsgroup
command and will calculate a score for each article according to the
rules for the newsgroup the user is currently reading.
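The parsing step can be illustrated with a short Python sketch. It assumes the overview record is tab-separated in the usual XOVER field order (article number, Subject, From, Date, Message-ID, References, byte count, line count), with any further fields carrying their own header name, as in "Xref: host group:number". The key names, and the choice to have each scoring function return a number that is summed, are simplifications made for this example rather than details of the Perl package.

```python
OVERVIEW_FIELDS = ["number", "Subject", "From", "Date",
                   "Message-ID", "References", "Bytes", "Lines"]


def parse_overview(record: str) -> dict:
    """Split a tab-separated overview record into a dictionary."""
    parts = record.split("\t")
    art = dict(zip(OVERVIEW_FIELDS, parts))
    # Optional trailing fields name themselves, e.g. "Xref: host group:1".
    for extra in parts[len(OVERVIEW_FIELDS):]:
        name, sep, value = extra.partition(":")
        if sep:
            art[name] = value.strip()
    return art


def total_score(art, global_score=None, local_score=None):
    """Apply the global and then the local scoring function, if defined."""
    total = 0
    for scorer in (global_score, local_score):
        if scorer is not None:
            total += scorer(art)
    return total
```

Because the scoring functions are passed in as arguments, swapping in a different `local_score` when the newsgroup changes requires no change to the parsing code.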
When the filter receives a scores command, it returns
each score it has calculated (together with the number of the article
possessing that score, of course). These lines are printed on
standard output, followed by done.
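The command loop as a whole might look like the following Python sketch. Several details here are assumptions made for illustration and are not taken from the proposal: that a command word is separated from its argument by a single space, that the article number is the first tab-separated field of the overview record, that each score is reported as a line of the form "number score", that nothing is sent in reply to a newsgroup command when a filter does exist, and that scores are discarded once reported. The class name `Filter` is likewise hypothetical.

```python
class Filter:
    """Respond to newsgroup, art and scores commands, one line at a time."""

    def __init__(self, scorers_by_group):
        # Maps a newsgroup name to a function taking an overview record.
        self.scorers_by_group = scorers_by_group
        self.scorer = None
        self.scores = {}

    def handle(self, line: str) -> list:
        """Process one command line and return the reply lines, if any."""
        command, _, rest = line.partition(" ")
        if command == "newsgroup":
            self.scorer = self.scorers_by_group.get(rest)
            self.scores = {}
            # No filter for this group: tell the newsreader to skip it.
            return ["skip"] if self.scorer is None else []
        if command == "art":
            number = rest.partition("\t")[0]
            if self.scorer is not None:
                self.scores[number] = self.scorer(rest)
            return []
        if command == "scores":
            replies = [f"{n} {s}" for n, s in self.scores.items()]
            self.scores = {}
            return replies + ["done"]
        return []  # unknown commands are ignored in this sketch
```

A driver would read CRLF-terminated lines from standard input, pass each to `handle`, and print the returned lines on standard output.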
All of this bookkeeping is handled automatically by the script. All
the user need do is write the global_score function and
each newsgroup's local_score function, assigning scores
to articles by means of score_art and related functions.
However, one of the strengths of this system is that users who find
one filtering package unsatisfactory may write a completely new
package with comparative ease. The default package is provided simply
as a convenience for users who do not have such needs.
An example of what one user's news.admin.net-abuse.misc filter might look like:
# news.admin.net-abuse.misc
sub local_score {
    my ($a) = @_;
    junk_art($a) if ($a->{Subject} =~ /forge/i and
                     $a->{From} =~ /boursy/i);
}
When news.admin.net-abuse.misc is selected, the newsreader sends a
newsgroup command to the filter, which loads the
news.admin.net-abuse.misc library, overriding any
previous definition of local_score.

Conventional newsreaders do not allow the user to make a particular rule contingent upon two independent headers. For example, it is difficult in some scoring newsreaders to write a rule that junks an article if the article's subject contains the word "forge" and the author's name is "boursy", but not if either condition is met by itself. Generally the only way of doing this is to assign a large negative score to articles whose subject contains "forge" and another large negative score to articles written by "boursy", junking articles whose aggregate score is sufficiently negative. This clumsy solution quickly proves impractical if the user wishes to write a large number of rules which depend on such specific conditions, as the independent scoring rules soon start to interfere with one another. By contrast, a system which implements filtering by applying commands written in a high-level language makes such specific filtering almost a trivial matter.
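The interference problem can be made concrete with a small Python sketch. The point values, the junk threshold, and the third rule (a penalty for an author matching "spammer") are all hypothetical, chosen only to show how independent additive rules collide; the article is represented as a dictionary with "Subject" and "From" keys.

```python
import re

JUNK_THRESHOLD = -150  # hypothetical: junk anything scoring this low or lower


def additive_score(art: dict) -> int:
    """Emulate independent scoring rules that can only add points."""
    score = 0
    if re.search(r"forge", art["Subject"], re.I):
        score -= 100
    if re.search(r"boursy", art["From"], re.I):
        score -= 100
    # A third, unrelated rule added later by the same user.
    if re.search(r"spammer", art["From"], re.I):
        score -= 100
    return score


def additive_junk(art: dict) -> bool:
    return additive_score(art) <= JUNK_THRESHOLD


def conjunctive_junk(art: dict) -> bool:
    """Junk only when both the Subject and the From conditions hold."""
    return bool(re.search(r"forge", art["Subject"], re.I)
                and re.search(r"boursy", art["From"], re.I))
```

Under these assumptions, an article with a "forge" subject from an author matching "spammer" is junked by the additive rules, even though the user never asked for that combination, while the conjunctive rule leaves it alone.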
want command which permits the filter to specify the
headers on which to sort. Filters written according to this protocol
should continue to work under trn, but trn filters which make use of
this extension may not work with other newsreaders.