Practical ECMAScript
Problem: Fighting Off the Email Scrapers
Constructing a Web "spider" to walk through a Web site is not really hard any more. Simple libraries exist to make this a fairly trivial task. Adding a "scraper" to the code so that a "spider" extracts email addresses from the pages it encounters is also not really hard. You simply tell it to look for anchor tags on the pages it encounters, and then to check for the presence of the string "mailto:" within the href attribute, and you have an email address scraper. This is how many spam systems gather up email addresses.
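To see why a plain mailto: link is so easy to harvest, here is a minimal sketch of the scraping step just described. It assumes the page's HTML is already available as a string; the function name findMailtoAddresses and the sample markup are hypothetical, and a real scraper would more likely use a proper HTML parser than a regular expression.

```javascript
// Hypothetical sketch: collect addresses from href="mailto:..." attributes on anchors.
function findMailtoAddresses(html) {
  // Match an opening <a ...> tag whose href value starts with "mailto:",
  // capturing the address up to the closing quote or a "?" query part.
  const pattern = /<a\b[^>]*\bhref\s*=\s*["']mailto:([^"'?]+)/gi;
  const found = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Made-up sample markup for illustration only.
const plainLink = '<a href="mailto:someone@example.com">write to us</a>';
const ordinaryLink = '<a href="/contact.html">contact page</a>';

console.log(findMailtoAddresses(plainLink));    // logs an array holding someone@example.com
console.log(findMailtoAddresses(ordinaryLink)); // logs an empty array
```

Anything that keeps the literal string "mailto:" plus the address out of the href attribute is invisible to a scraper of this kind.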
So, how do Webmasters defeat these systems while continuing to post email links on Web pages? The solution is a simple one, if you know a little JavaScript (aka ECMAScript).
Solution: Use ECMAScript
JavaScript is a standard scripting language supported by all current and reasonably recent Web browsers, and it is turned on by default when someone installs a Web browser. Early browsers shipped incompatible dialects of the language, so the standards body ECMA (originally the European Computer Manufacturers Association, now Ecma International) published a standard version which browsers now support. This standard version of JavaScript is known as ECMAScript.
The Script Tag
To include an ECMAScript file in your Web page, you simply add a script element (an ordinary HTML element, not a meta tag) to the head element. It might look like this:
<script src="http://www.it.rit.edu/~jxs/javascript/genericEmail.js" type="text/javascript" charset="utf-8"></script>
The Function
The following function is a simple example of using ECMAScript to add dynamic behavior to a Web page. Note that it is simply a function that takes three arguments:
- user name
- email server name
- and a Subject header for the resulting email
If you use a call to this function instead of a standard mailto: URI, the complete address never appears in the page's markup, so a scraper that only looks for "mailto:" in href attributes will not find it. Here is what the function might look like:
function genericEmail( user, target, subject ) {
  // Encode the subject so spaces and punctuation survive inside the URI.
  window.location = "mailto:" + user + "@" + target + "?Subject=" + encodeURIComponent( subject );
}
Note the use of window.location in the ECMAScript function. What does that mean? (hint: do a Google search on window.location and see what you find)
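The string-building half of the function can be tried outside a browser, where window.location is not available. The sketch below separates that step into a helper so its result can be inspected directly; buildMailtoUri is a hypothetical name, not part of the original script, and the percent-encoding of the subject is an added precaution for subjects that contain spaces or characters such as "&".

```javascript
// Hypothetical helper: assemble the mailto: URI without navigating anywhere.
function buildMailtoUri(user, target, subject) {
  // encodeURIComponent turns a space into %20, "&" into %26, and so on.
  return "mailto:" + user + "@" + target + "?Subject=" + encodeURIComponent(subject);
}

// In a browser, genericEmail could then be reduced to:
//   function genericEmail(user, target, subject) {
//     window.location = buildMailtoUri(user, target, subject);
//   }

console.log(buildMailtoUri("jeffs", "it.rit.edu", "test email"));
// prints "mailto:jeffs@it.rit.edu?Subject=test%20email"
```

Keeping the helper free of window.location is a design choice for testability; the behavior in the page is the same either way.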
The Function Call
If you add something like the following code into a block-level container within the body of your HTML, then the function will get called when the user clicks on the anchor:
<a href="#" title="send the instructor email" onclick="genericEmail( 'jeffs', 'it.rit.edu', 'test email' ); return false;">test email script</a>
Note the use of a dummy href value. This ensures the browser stays on the current page, although a bare "#" can scroll the display back to the top of the page unless the onclick handler returns false. Note also the use of the three function arguments:
- user name: jeffs
- email server name: it.rit.edu
- Subject header string: test email
The function is called when the user clicks on the anchor. The function concatenates the three arguments to construct a complete mailto: URI. Note the use of single quotes within the double quotes surrounding the value of the onclick attribute. Note also how the function call ends with a semicolon.
A Full-Featured Version
Here is a full-featured version using a span, including a cursor change, underlining, and a highlight on hover. To make it easier to follow, the style instructions are inline.
Code
<span title="send the instructor email" onclick="genericEmail( 'jeffs', 'it.rit.edu', 'test email' );" style="font: italic 100%/1.0 Georgia, serif; text-decoration: underline; color: #0000ff; background: #ffffff;" onmouseover="this.style.color='#0000ff'; this.style.background='#ffff00'; this.style.cursor='pointer';" onmouseout="this.style.color='#0000ff'; this.style.background='#ffffff'; this.style.cursor='default';">test email script</span>
Try It
test email script
An Improved Version
Of course, the demo code above violates the principle of separation of concerns. It puts dynamic style instructions directly in the HTML, rather than isolating things where they belong. Remember our discussion of how HTML, CSS, and JavaScript roughly follow the Model-View-Controller paradigm. Let's improve the above by moving the dynamic parts into the JavaScript code, making our HTML cleaner and easier to read.
A Slight Digression: the Keyword this
The keyword this is a very valuable one to understand and to use. If we pass this as an argument from an event-handler call in the HTML, like onmouseover or onmouseout or onclick, then the keyword this refers to the element which the event-handler is servicing (see this article for more details and a longer explanation). That way you can use the keyword this to pass a reference to a specific HTML element on to a JavaScript function. The JavaScript function does not need to do any extra work to find that element, and you don't even have to give the element a unique id. This is quite handy, and keeps your HTML from getting all cluttered up.
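This behavior can be imitated outside a browser. In the sketch below, a plain object stands in for an HTML element, and Function.prototype.call plays the part of the browser invoking an event handler with this bound to that element. The names highlight, fakeSpan, and inlineHandler are hypothetical and used only for illustration.

```javascript
// A function that receives an element reference, like one called from an event handler.
function highlight(what) {
  what.style.background = "#ffff00";
  what.style.cursor = "pointer";
}

// Stand-in for an HTML element: only the style property matters here.
const fakeSpan = { style: {} };

// Roughly what an attribute such as onmouseover="highlight( this );" amounts to:
// a function whose body is the attribute text.
function inlineHandler() {
  highlight(this);
}

// The browser runs the handler with this set to the element that received the event.
inlineHandler.call(fakeSpan);

console.log(fakeSpan.style.background); // prints "#ffff00"
```

Because the element arrives as an argument, highlight never has to look it up by id.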
Code
The HTML
<span class="pseudo_anchor" title="send the instructor email" onclick="genericEmail( 'jeffs', 'it.rit.edu', 'test email' );" onmouseover="mouseEnter( this );" onmouseout="mouseExit( this );">test cleaner email script</span>
The CSS
.pseudo_anchor { font: italic 100%/1.0 Georgia, serif; text-decoration: underline; color: #0000ff; background: #ffffff; }
The JavaScript
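// Build a mailto: URI from its parts and hand it to the browser.
function genericEmail( user, target, subject ) {
  // Encode the subject so spaces and punctuation survive inside the URI.
  window.location = "mailto:" + user + "@" + target + "?Subject=" + encodeURIComponent( subject );
}
// Highlight the element while the mouse pointer is over it.
function mouseEnter( what ) {
  what.style.color = "#0000ff";
  what.style.background = "#ffff00";
  what.style.cursor = "pointer";
}
// Restore the element's normal appearance when the pointer leaves.
function mouseExit( what ) {
  what.style.color = "#0000ff";
  what.style.background = "#ffffff";
  what.style.cursor = "default";
}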