Internet Security - Building a Bad Bot Trap for IIS

Introduction

There are hundreds of thousands of automated web crawlers, or 'spiders', roaming the web, following links and sucking up content. A majority of them are poisonous and not there to do you any good.

These bad bots range from content stealers and spam harvesters to bots looking for poorly implemented open source software to abuse and exploit.

They all have one thing in common: they don't care about you or your website, and they will break all the rules to get what they want. That is how we'll catch them.

Principles

Two primary conditions must be met to catch automated robots.

  • An invisible hyperlink to the bot trap page, invisible so that no human user can click on it. We'll use CSS to hide the link.
  • Every possible method of telling robots to stay away from that page. We'll use robots.txt and in-page instructions.

In theory, all spiders must check for the 'robots.txt' file before starting to spider your site. If you don't have one, create it. The HTML attribute rel="nofollow" instructs bots and spiders to ignore a link and not follow it.

Building the trap

We'll start off by building the landing page for the hidden link. This is where rule-breaking spiders and bots will arrive if they do not adhere to the rules.

In the root of the main website, create a new folder; I named it 'Trap'. Then add a new aspx page; I named it 'Bots.aspx'. Open the aspx file and place this code in the head section:
<meta name="robots" content="noindex, nofollow" />
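For context, the head section of the trap page might end up looking something like this (the title text is purely illustrative):

```html
<head runat="server">
    <title>Nothing to see here</title>
    <meta name="robots" content="noindex, nofollow" />
</head>
```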

In the code-behind, place the following code (VB), and don't forget to add the Imports System.IO directive at the top of the file:


 Protected Sub Page_Load(sender As Object, e As System.EventArgs) Handles Me.Load

     '// Register that the page has been accessed in a log file.
     Dim MyUID As String = Now.Ticks.ToString()
     Dim LogPath As String = Server.MapPath("~") & "\Trap"

     Try
         If Not Directory.Exists(LogPath) Then
             Directory.CreateDirectory(LogPath)
         End If

         '// One log file per hit, named after the current tick count.
         '// FileMode.Create truncates any existing file, so no seek is needed.
         Using tfs As New FileStream(LogPath & "\BOT_" & MyUID & ".txt", FileMode.Create, FileAccess.Write), s As New StreamWriter(tfs)
             '// Log the raw request headers and the caller's IP address.
             s.WriteLine(Request.ServerVariables("ALL_RAW") & vbCrLf & Request.UserHostAddress)
         End Using

     Catch ex As Exception
         '// Do error handling here
     End Try

 End Sub

The same logic in C#:

protected void Page_Load(object sender, System.EventArgs e)
{
    // Register that the page has been accessed in a log file.
    string MyUID = DateTime.Now.Ticks.ToString();
    string LogPath = Server.MapPath("~") + "\\Trap";

    try
    {
        if (!Directory.Exists(LogPath))
        {
            Directory.CreateDirectory(LogPath);
        }

        // One log file per hit; FileMode.Create truncates any existing file.
        using (FileStream tfs = new FileStream(LogPath + "\\BOT_" + MyUID + ".txt", FileMode.Create, FileAccess.Write))
        using (StreamWriter s = new StreamWriter(tfs))
        {
            // Log the raw request headers and the caller's IP address.
            s.WriteLine(Request.ServerVariables["ALL_RAW"] + Environment.NewLine + Request.UserHostAddress);
        }
    }
    catch (Exception)
    {
        // Do error handling here
    }
}

You may want to add some warning text to the page, or be mean and nasty and build a random email address generator that puts a few hundred email addresses on the page to foul up spam harvesters' databases. It's up to you. With the code above, we have a bad bot logger ready to run.
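The email-address-generator idea is left as an exercise in the text, but a minimal sketch might look like this (the class name, length ranges, and the reserved '.example' top-level domain are all illustrative assumptions, not part of the original article):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: generates throwaway addresses at domains that can
// never resolve, to pollute a spam harvester's database with junk.
public static class FakeEmailGenerator
{
    private static readonly Random Rng = new Random();
    private const string Letters = "abcdefghijklmnopqrstuvwxyz";

    public static List<string> Generate(int count)
    {
        var emails = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            var sb = new StringBuilder();

            // Random local part, 5-11 characters.
            int localLen = Rng.Next(5, 12);
            for (int j = 0; j < localLen; j++)
                sb.Append(Letters[Rng.Next(Letters.Length)]);

            sb.Append('@');

            // Random domain label, 4-9 characters.
            int domainLen = Rng.Next(4, 10);
            for (int j = 0; j < domainLen; j++)
                sb.Append(Letters[Rng.Next(Letters.Length)]);

            // '.example' is reserved (RFC 2606) and will never resolve.
            sb.Append(".example");

            emails.Add(sb.ToString());
        }
        return emails;
    }
}
```

On the trap page you could bind the result of Generate(200) to a Repeater or write it into a Literal control, so harvesters scrape a few hundred dead addresses.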

Configuring and setting the trap

Create a CSS style sheet (I named mine 'bots.css'), or add a new class to an existing style sheet file, to hide the link:
.bots{display:none;}

Add an instruction to the robots.txt file to tell bots and spiders to avoid the trap:
Disallow: /Trap/Bots.aspx
If you created a different folder and/or named your trap file differently, change this path to point to your bot trap page.
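Note that a Disallow rule must sit inside a User-agent group to be valid. A minimal complete robots.txt for this setup (assuming the default folder and page names used above) would look like:

```
User-agent: *
Disallow: /Trap/Bots.aspx
```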

The last step is deciding where to put the trap link. In my case, I placed it on the master page of the site; this ensures it is potentially exposed at every entry point the spiders may choose, meaning I'll catch them on their initial attempt. Remember to link the 'bots.css' style sheet to the page you use, otherwise the link will be visible for users to click on.
Place the link to the trap in the html file(s) of your choice:
<div class="bots"><a href="/Trap/Bots.aspx" rel="nofollow">Bots not welcome here</a></div>

That's it, you're done. Keep an eye on the '/Trap' folder: log files will appear there whenever bots that do not follow your instructions go snooping where you told them not to.
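One caveat: the log files are written into a web-accessible folder. On IIS 7 and later, a web.config dropped into the 'Trap' folder itself can stop the .txt logs from being downloaded; this is a sketch using request filtering, an assumption to adapt to your own IIS version and pipeline mode:

```xml
<?xml version="1.0"?>
<configuration>
  <system.webServer>
    <security>
      <requestFiltering>
        <!-- Refuse direct requests for the BOT_*.txt log files in this folder -->
        <fileExtensions>
          <add fileExtension=".txt" allowed="false" />
        </fileExtensions>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```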

Remarks

  • Do not automate the blocking of IP addresses detected using this method. We regularly see Microsoft, Google and Verisign breaking the very same rules they claim to uphold. Double-check the caught IP addresses before banishing them from your website.
  • Hiding things in plain sight (invisible links) is something search engines such as Google do not like, and you could get penalized. How much? I don't care; minus five points is a price worth paying to find and block bad business.
  • Using this routine we have found bots and other automated contacts that do not have your interests at heart. Needless to say, these IP addresses have been refused access to this system.