Attackers normally use a Spambot to send spam. A Spambot is an automated program used by attackers to send spam emails to users, post automated messages to forums, or spam social networking sites like Twitter.
Spambots also crawl websites for malicious purposes and waste a website's bandwidth unnecessarily. So, websites use Spider Traps as a countermeasure against those Spambots.
How does a Spider Trap work?
Spambots request webpages from a webserver many times within a short duration. To counter them, a Spider Trap identifies such clients and makes them run in an infinite loop, wasting the bot's resources instead of the site's.
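The detection side of this can be sketched as a simple sliding-window rate check. This is a minimal illustration, not a production filter; the window length and request threshold below are assumed values chosen for the example.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # sliding window length (assumed threshold)
MAX_REQUESTS = 20     # requests allowed per window (assumed threshold)

# Timestamps of recent requests, keyed by client IP.
recent = defaultdict(deque)

def looks_like_spambot(ip, now=None):
    """Return True if this IP exceeded the request-rate threshold."""
    now = time.time() if now is None else now
    q = recent[ip]
    q.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_REQUESTS
```

A client flagged by a check like this could then be redirected into the trap pages described below, while normal visitors never see them.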
There are a number of common techniques used to make Spambots run in an infinite loop. To name a few:
- Sometimes, a cyclic directory structure is used, for example /path/to/directory/again/path/to/directory. Every level of the cycle serves a valid page that links one level deeper, so a Spambot that crawls the website keeps following links forever.
- Some websites serve an effectively unbounded number of dynamically generated pages, for example algorithmically generated poetry, or a calendar that always links to the next month.
- Webpages filled with an enormous number of characters, which can overwhelm or even crash a crawler's lexical analyzer when it tries to parse them.
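The first technique above, a cyclic structure of trap pages, can be sketched in a few lines. This is a hypothetical illustration: the page simply links to a copy of itself one path level deeper, so any crawler that blindly follows links descends without end.

```python
def trap_page(path):
    """Generate a trap page whose only link goes one level deeper.

    A crawler that blindly follows links descends forever:
    /trap/a -> /trap/a/a -> /trap/a/a/a -> ...
    (the /trap path name is hypothetical.)
    """
    deeper = path.rstrip("/") + "/a"
    return (
        "<html><body>"
        f'<a href="{deeper}">more</a>'
        "</body></html>"
    )
```

Since every URL under the trap prefix resolves to the same kind of page, the server spends almost nothing while the bot burns time and bandwidth.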
Disadvantage of using a Spider Trap
Not all web crawlers are Spambots. Polite web crawlers, such as search-engine bots, crawl websites for indexing purposes. So, a website cannot trap every crawler it encounters; it needs to differentiate between Spambots and legitimate web crawlers.
How to prevent legitimate web crawlers from falling into a Spider Trap
Polite web crawlers alternate requests between different hosts and do not request webpages from the same server more than once within a short time frame, so Spider Traps normally do not affect them much. Moreover, a website with a Spider Trap can publish a robots.txt file that tells legitimate web crawlers which paths to avoid, so they do not fall into the trap.
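Such a robots.txt might look like the following. This is an illustrative sketch; the /trap/ path is a hypothetical name for wherever the site keeps its trap pages.

```text
# Tell well-behaved crawlers to stay out of the trap directory.
User-agent: *
Disallow: /trap/

# Optionally slow down crawlers that honor the non-standard
# Crawl-delay directive (seconds between requests).
Crawl-delay: 10
```

Spambots typically ignore robots.txt, which is exactly what makes this work: only the bots that disobey the file ever reach the trap.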
This was a short informative article on Spider Trap. Hope you enjoyed it.