0.30 Is here! (finally)
[Sat Jan 12, 2002] After a long wait, FilterProxy 0.30 is finally here.
This version has the change from Parse::ePerl to HTML::Mason. If you tried to
install FilterProxy before and were unable because of issues with Parse::ePerl
or perl 5.6, you should have no trouble with this version. Other exciting changes
in this version include a view-source-like feature that marks up the pieces
of the document that were filtered. Since this is a little difficult to
explain, it's best to just see it. With one click you can view the marked-up
source, and also edit the configuration for that page. An XSLT module has
been contributed by Mario Lang. XSLT will let you transform XML/HTML by examining
the file's structure and writing an XML stylesheet. For more info on XSLT,
take a look at this
What is FilterProxy?
FilterProxy is a generic http proxy with the capability to modify proxied
content on the fly. It has a modular system of filters which can modify
web pages. The modular system means that many filters can be applied in
succession to a web page, and configuration is easy and flexible. FilterProxy
can proxy any data served by the HTTP protocol (i.e. anything off the web), and
filter any recognizable MIME type. All configuration is done via web-based
forms, or by editing a configuration file. It was created to fix some of the
annoyances of poor web design by rewriting the offending pages. It can also
improve the web for you, in both speed (Compress) and quality (Rewrite/XSLT).
After ads (and their graphics) are stripped out, and HTML is compressed,
surfing over a modem is much
faster. Compare to Muffin (a similar
project in java), and WebCleaner (a similar project in
python) in purpose and functionality. FilterProxy is written in perl, and is
(NEW!) Also check out my list of ways to fix web/Netscape
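The idea of applying several filters in succession, chosen by MIME type, can be sketched in a few lines. This is a minimal illustration in Python, not FilterProxy's actual Perl code: the filter functions, the `FILTERS_BY_MIME` table and `apply_filters` are all hypothetical names invented for the example.

```python
from typing import Callable, Dict, List

# A filter takes the body of a response and returns a (possibly modified) body.
Filter = Callable[[str], str]


def strip_blink(body: str) -> str:
    """Toy filter: drop <blink> and </blink> tags, keep the enclosed text."""
    return body.replace("<blink>", "").replace("</blink>", "")


def collapse_blank_lines(body: str) -> str:
    """Toy filter: remove lines that contain only whitespace."""
    return "\n".join(line for line in body.splitlines() if line.strip())


# Hypothetical configuration: MIME type -> ordered list of filters.
FILTERS_BY_MIME: Dict[str, List[Filter]] = {
    "text/html": [strip_blink, collapse_blank_lines],
}


def apply_filters(mime_type: str, body: str) -> str:
    """Run every filter configured for this MIME type, in order."""
    for current_filter in FILTERS_BY_MIME.get(mime_type, []):
        body = current_filter(body)
    return body
```

Content whose MIME type has no configured filters passes through unchanged, which is what you would want for data the proxy does not recognize.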
Ok, ok, now what the hell does it really do?
Modules that are currently written are:
- Allows web pages to be rewritten in arbitrary ways. This means that
advertisements can be removed from really complex pages. It also
will let you reformat the layout. It will allow you to remove
tags, modify the attributes of tags, or remove or change entire
sections. Practically, this means it can remove <blink>:
    strip tag <blink>
change <font size=1> to something larger and more readable:
    rewrite attrib <font size=1> as size="-1"
(Ideally, remove all absolute font sizes, and replace them with
relative ones. Why do so many web pages do this to me?) It also
removes web bugs:
    rewrite tag <img /width/=1 /height/=1> add encloser </(no)?script/> add alternate as <spacer width=1 height=1>
which are usually 1x1 gifs that advertisers use to track you (and
slow down your browsing severely when over a modem!) For a good
description of web bugs, check out this Washington Post article. (It
will say it can't find the article...just hit reload and it will show
up). Most importantly, it can strip scripts that reference known ad
servers:
    strip regex #(ads\.freecity\.de|flycast\.com|/RealMedia/ads/)# inside tagblock <script> add alternate add balanced
These are rewrite rules, and just a hint of the power with which you can
rewrite web pages you visit.
- XSLT stands for Extensible Stylesheet Language Transformations, and it
transforms one XML document into another XML document. With the XSLT
module you can apply XSL transformations to HTML. Here is a
tutorial on XSLT. Basically, you can rewrite HTML documents by
examining the structure of the document. But XSLT does not have
the matching power of regular expressions, so it is complementary to
the Rewrite module.
- Compresses web pages. This can lead to a 5x speed improvement if you
are surfing over a modem and can arrange to have FilterProxy running on a
server with a direct net connection.
- Filter HTTP headers in arbitrary ways. This means it can anonymize
your requests (removing User-Agent, Referer), and filter cookies by domain.
i.e. don't accept or send cookies to any known advertiser's domain. It can
remove any header (including regexp matching of header names), and add
- De-animates animated gifs, and removes other "extension blocks", generally
making them smaller as well as de-animating them.
- Example module (heavily commented) for people interested in extending
FilterProxy by writing new modules.
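To get a feel for what three of the Rewrite rules listed above do (strip `<blink>`, turn `<font size=1>` into `size="-1"`, replace 1x1 images with a spacer), here is a rough approximation using Python regular expressions. It is only a sketch: it does not parse FilterProxy's rule syntax, it ignores the `add encloser` and `add alternate` options, and every name in it (`rewrite`, `SPACER`, the pattern constants) is made up for illustration.

```python
import re

# <blink> opening or closing tag, case-insensitive.
BLINK_TAG = re.compile(r"</?blink\b[^>]*>", re.IGNORECASE)

# size=1 (optionally quoted) inside a <font ...> tag; size=12 must not match.
FONT_SIZE_1 = re.compile(
    r"(<font\b[^>]*?\bsize=)[\"']?1[\"']?(?=[\s>])", re.IGNORECASE
)

# Any <img ...> tag, plus checks for width=1 and height=1 inside it.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
WIDTH_1 = re.compile(r"\bwidth=[\"']?1[\"']?(?=[\s/>])", re.IGNORECASE)
HEIGHT_1 = re.compile(r"\bheight=[\"']?1[\"']?(?=[\s/>])", re.IGNORECASE)

# Replacement used for web bugs, mirroring the "as <spacer ...>" part of the rule.
SPACER = "<spacer width=1 height=1>"


def _replace_web_bug(match: "re.Match[str]") -> str:
    """Swap a 1x1 <img> for a spacer; leave every other image alone."""
    tag = match.group(0)
    if WIDTH_1.search(tag) and HEIGHT_1.search(tag):
        return SPACER
    return tag


def rewrite(html: str) -> str:
    """Apply the three approximated rules in order."""
    html = BLINK_TAG.sub("", html)                # strip tag <blink>
    html = FONT_SIZE_1.sub(r'\g<1>"-1"', html)    # font size=1 -> size="-1"
    html = IMG_TAG.sub(_replace_web_bug, html)    # 1x1 images -> spacer
    return html
```

Regular expressions are a blunt tool for HTML (attributes can contain `>`, tags can span lines), so treat this as a demonstration of the idea rather than something to run against the real web.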
Modules that may be coming, as the author has time (or
- Keep a "cookie jar". Rather than Netscape keeping all your cookies, FilterCookie
will take care of it instead. It will also allow you to easily view your cookie jar
and remove cookies from it. It will be able to filter cookies on a site-by-site
- Map URLs to other URLs. This might be useful to get "printer-friendly"
versions of articles from various news sites, and to block requests for
images from known advertisers' domains (in case they slip through Rewrite).
- Cache a local copy of images from sites you visit often. Possibly
rewrite img references to be local (i.e. http://... ->
- Acts as a Netscape roaming profile server, so that it stores bookmarks
and Netscape preferences.
- Allows web-based access to a user's bookmarks (so friends can see them).
- Has a "search engine" which indexes content on bookmarked pages (and
pages linked to from a bookmarked page, on the same server), so that you
can find things in your bookmarks by a search through this module's interface.
- Allows "classification" of bookmarks in a more sophisticated manner
(preferably by keyword, rather than tree), and then can generate
yahoo-like indexes by keyword. (or by searches for a keyword).
For instance, I might bookmark the homepage for xmms (http://www.xmms.org/)
which I would then classify by adding the keywords (mp3, linux, audio,
music, eyecandy, earcandy, X11, software). Then when I do a search for
"software" using this module's interface, I get all items which have the
keyword, including xmms. If I search for "linux software", I get all things
with these keywords, etc. You get the idea. (You could make a yahoo-like
index, or filesystem-like path, by joining keywords: "/linux/software/mp3".
Note this is the same as "/software/linux/mp3".) (Does anyone else but me
have thousands of bookmarks, and occasionally think "I saw a piece of
software that does X", and then spend 2 hours manually searching your
For bonus points, add a web spider that will search documents linked from
the bookmarked page, and add them to the search engine's database. (This
way you could find info by searching that you've never seen, but is closely
related to something you've bookmarked).
For bonus bonus points, add the capability for the spider to use Netscape's
"What's Related" (or similar) interface to find things similar to the page
bookmarked, and index them too.
For bonus bonus bonus points, make sure this doesn't get exploited by
This could be an entire thesis project on software agents. Any takers?
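The keyword classification idea is easy to prototype. The sketch below stores a set of keywords per bookmark and treats a query as a set too, so word order (or the order of path components) does not matter. None of this exists in FilterProxy; `BookmarkIndex` and its methods are hypothetical names for illustration.

```python
from typing import Dict, Iterable, List, Set


class BookmarkIndex:
    """Bookmarks classified by keyword sets rather than by a folder tree."""

    def __init__(self) -> None:
        self._keywords: Dict[str, Set[str]] = {}

    def add(self, url: str, keywords: Iterable[str]) -> None:
        """Attach keywords to a bookmark (case-insensitive)."""
        self._keywords.setdefault(url, set()).update(k.lower() for k in keywords)

    def search(self, query: str) -> List[str]:
        """Return bookmarks carrying every keyword in the query.

        The query may be space-separated ("linux software") or a
        filesystem-like path ("/linux/software/mp3"); order is irrelevant.
        """
        wanted = {word.lower() for word in query.replace("/", " ").split()}
        return sorted(
            url for url, kws in self._keywords.items() if wanted <= kws
        )


index = BookmarkIndex()
index.add(
    "http://www.xmms.org/",
    ["mp3", "linux", "audio", "music", "eyecandy", "earcandy", "X11", "software"],
)
```

Because the match is a subset test, "/linux/software/mp3" and "/software/linux/mp3" return the same bookmarks. An empty query matches everything, which may or may not be what a real interface wants.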
Ok, enough rambling, how do I use it?
Well, first download it. It requires perl,
and several modules from CPAN (See the INSTALL file).
After getting it running, tell your browser to use the proxy. Under
Netscape, select the menu item Edit->Preferences. Then, in the preferences
dialog box, select Advanced->Proxies. (You may have to click the little
arrow next to Advanced to get Netscape to expand the menu). Then select "Manual
proxy configuration", and put in the "HTTP Proxy" field the host and port on
which you ran FilterProxy. If you haven't edited FilterProxy.pl, this should be
'localhost' and '8888'.
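If you want to check the proxy without touching browser settings, a short script can send a request through it. This is a sketch using Python's standard library, assuming FilterProxy is listening on the default 'localhost' and '8888' mentioned above; the function names and the test URL are placeholders.

```python
import urllib.request


def proxy_url(host: str = "localhost", port: int = 8888) -> str:
    """Build the proxy address; the defaults match an unedited FilterProxy.pl."""
    return f"http://{host}:{port}"


def build_opener_via_proxy(
    host: str = "localhost", port: int = 8888
) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain HTTP requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url(host, port)})
    return urllib.request.build_opener(handler)


def fetch_status(url: str, host: str = "localhost", port: int = 8888) -> int:
    """Fetch a URL through the proxy and return the HTTP status code.

    This performs a real network request, so the proxy must be running.
    """
    opener = build_opener_via_proxy(host, port)
    with opener.open(url, timeout=10) as response:
        return response.status
```

Calling `fetch_status("http://example.com/")` with the proxy running should return a normal status code; if the proxy is not running, the call raises `urllib.error.URLError` instead.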
So now what?
- Read the README and LICENSE files.
- Send mail to
me (Bob McElrath),
and let me know if you got it working, and detail any problems you have.
- Browse the web.
- Configure FilterProxy. Configuration is done via web forms (you may
also edit FilterProxy.conf if you are familiar with perl -- I suggest
you use the forms to add entries first, so you know the structure).
- If you are interested in writing a module or fixing a bug, please contact
me, and read the TODO file.
What do I do if something goes wrong?
- Read the file BUGS carefully, to make sure this is not a known bug.
- If the bug is a web page that was badly mangled by FilterProxy, please
send the URL of the page, and the (unmangled) page itself, if possible,
to me (email@example.com). Please include as
much information as possible, including the version of FilterProxy you
are using, your FilterProxy.conf file, and any other information you
feel is relevant. The more detailed your report, the more likely it is
to get fixed. If you think you can fix it yourself, please do so, and
mail me a patch. In the meantime, configure FilterProxy to not filter
FilterProxy and this page are © Copyright Bob McElrath. Last
modified Friday July 20 22:07:00 CDT 2001