List of Filtered Robots/Spiders (Regular Expressions)
This list contains the regular expressions used to identify the robots/spiders whose requests we exclude.
- robot
- crawl
- spider
- acme\.spider
- [^a]fish
- alexa
- AllenTrack
- Alexandria(\s|\+)prototype(\s|\+)project
- almaden
- appie
- Arachmo
- archive\.org_bot
- arks
- asterias
- atomz
- autoemailspider
- awbot
- baiduspider
- bbot
- biadu
- biglotron
- bloglines
- blogpulse
- boitho\.com\-dc
- bookmark\-manager
- [+:,\.\;\/\\-]bot
- bot[+:,\.\;\/\\-]
- Brutus\/AET
- bspider
- bwh3_user_agent
- cfnetwork
- checkbot
- China\sLocal\sBrowse\s2\.6
- combine
- commons\-httpclient
- ContentSmartz
- core
- cursor
- custo
- DataCha0s\/2\.0
- Demo\sBot
- docomo
- DSurf
- dtSearchSpider
- dumbot
- easydl
- EmailSiphon
- EmailWolf
- exabot
- fast-webcrawler
- favorg
- FDM(\s|\+)1
- feedburner
- feedfetcher\-google
- Fetch(\s|\+)API(\s|\+)Request
- findlinks
- gaisbot
- GetRight
- geturl
- gigabot
- girafabot
- gnodspider
- Goldfire(\s|\+)Server
- Googlebot
- grub
- heritrix
- hl_ftien_spider
- holmes
- htdig
- htmlparser
- httpget\-5\.2\.2
- httrack
- HTTrack
- ia_archiver
- ichiro
- iktomi
- ilse
- internetseer
- iSiloX
- java
- jeeves
- jobo
- larbin
- libwww\-perl
- linkbot
- linkchecker
- linkscan
- linkwalker
- livejournal\.com
- lmspider
- LOCKSS
- lwp\-request
- LWP\:\:Simple
- lwp\-tivial
- lwp\-trivial
- lycos
- mediapartners\-google
- megite
- Microsoft(\s|\+)URL(\s|\+)Control
- milbot
- mj12bot
- mnogosearch
- mojeekbot
- momspider
- motor
- msiecrawler
- msnbot
- MuscatFerre
- myweb
- nagios
- NABOT
- NaverBot
- netcraft
- netluchs
- ng\/2\.
- no_user_agent
- nutch
- ocelli
- Offline(\s|\+)Navigator
- OurBrowser
- perman
- pioneer
- playmusic\.com
- powermarks
- psbot
- python
- qihoobot
- rambler
- Readpaper
- redalert
- robozilla
- scan4mail
- scooter
- seekbot
- seznambot
- shoutcast
- slurp
- sogou
- speedy
- spiderman
- spiderview
- Strider
- sunrise
- superbot
- surveybot
- tailrank
- technoratibot
- Teleport(\s|\+)Pro
- Teoma
- T\-H\-U\-N\-D\-E\-R\-S\-T\-O\-N\-E
- titan
- turnitinbot
- twiceler
- ucsd
- ultraseek
- urlaliasbuilder
- voila
- Wanadoo
- w3c\-checklink
- WebCloner
- webcollage
- WebCopier
- Webmetrics
- webmirror
- Webinator
- WebReaper
- Web(\s|\+)Downloader
- WebStripper
- WebZIP
- Wget
- wordpress
- worm
- Xenu(\s|\+)Link(\s|\+)Sleuth
- yacy
- yahoofeedseeker
- yahoo\-mmcrawler
- yahooseeker
- yandex
- y!j
- yodaobot
- zealbot
- zeus
- zyborg
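The following Python sketch shows one way a list like this might be applied. It parses the `- pattern` lines, checks that each entry compiles, and combines the entries into a single regular expression for testing `User-Agent` strings. It rests on several assumptions not stated above. Matching is assumed to be case-insensitive by default, and a pattern is assumed to count as a hit when it matches anywhere in the header. The `PATTERN_TEXT` excerpt and the function names `parse_pattern_list`, `find_invalid_patterns`, `compile_robot_filter`, and `is_robot` are hypothetical.

```python
import re

# Hypothetical excerpt of the list above, kept in the same "- pattern" form.
PATTERN_TEXT = r"""
- robot
- crawl
- spider
- [+:,\.\;\/\\-]bot
- bot[+:,\.\;\/\\-]
- Googlebot
- Microsoft(\s|\+)URL(\s|\+)Control
- Wget
"""


def parse_pattern_list(text):
    """Return the patterns from lines of the form '- pattern'."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            patterns.append(line[2:].strip())
    return patterns


def find_invalid_patterns(patterns):
    """Return the patterns that Python's re module refuses to compile."""
    invalid = []
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error:
            invalid.append(pattern)
    return invalid


def compile_robot_filter(patterns, ignore_case=True):
    """Combine all patterns into one alternation.

    Each entry is wrapped in a non-capturing group as a defensive measure.
    Because '|' has the lowest precedence, a plain join would match the
    same strings. Case-insensitive matching is an assumption and can be
    switched off with ignore_case=False.
    """
    combined = "|".join(f"(?:{p})" for p in patterns)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(combined, flags)


def is_robot(user_agent, robot_filter):
    """True if any pattern matches anywhere in the User-Agent string."""
    return robot_filter.search(user_agent) is not None


if __name__ == "__main__":
    robot_filter = compile_robot_filter(parse_pattern_list(PATTERN_TEXT))
    for ua in [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Wget/1.21.3",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    ]:
        print(is_robot(ua, robot_filter), ua)
```

Validating each entry on its own before combining them makes a malformed entry easy to locate. For example, `(\s|+)` has nothing for `+` to repeat, and `re.compile` would otherwise raise `re.error` for the whole combined expression. Because `search` matches substrings anywhere, very short entries such as `core` or `arks` can also match inside unrelated words. It may be worth checking the full list against a sample of ordinary browser `User-Agent` strings.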