Author Topic: [ADDON] robots.txt for sNews  (Read 5393 times)

albert

  • Sr. Member
  • ****
  • Karma: 0
  • Posts: 405
    • http://www.oswt.co.uk/
[ADDON] robots.txt for sNews
« on: January 22, 2006, 05:17:11 am »

Hi

Here is the code for the robots.txt for sNews

Code: [Select]
##################################
# robots.txt
# sNews
# http://www.awddesign.co.uk/
# updated 2006-01-21 (disallow rss.xml links)
# don't let search engines see the RSS feed, it's just confusing.
#

User-agent: Googlebot # let him see
User-agent: InfoNaviRobot
User-agent: TV33_Mercator
User-agent: AVSearch
User-agent: Mercator
User-agent: Scooter
User-agent: Slurp
User-agent: SearchengineLicenceSheep
User-agent: shadow
User-agent: MultiText
User-agent: FAST-WebCrawler
User-agent: Lycos_Spider
User-agent: Atomz
User-agent: htdig
User-agent: spider00.logika.net
User-agent: NetMechanic
User-agent: libwww-perl
User-agent: Teleport Pro
Disallow: /rss.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /css
Disallow: /inc
Disallow: /img
Disallow: /images
Disallow: /?action=login
Disallow: /index.php?action=login

#Disallow: /add more here..
#Disallow: /add more here..
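
For anyone who wants to check what a file like this actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule set is a shortened copy of the one above, and `SomeOtherBot` is a made-up crawler name used only for illustration. Other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above (illustrative, not the full file).
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
Disallow: /rss.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /css
Disallow: /img
"""

def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Googlebot is in the named group, so the feed is off limits to it.
print(parser.can_fetch("Googlebot", "/rss.xml"))           # False
# A crawler that matches a named group uses only that group,
# so the "*" rules (such as /css) do not apply to it.
print(parser.can_fetch("Googlebot", "/css/style.css"))     # True
# Crawlers not named anywhere fall back to the "*" group.
print(parser.can_fetch("SomeOtherBot", "/css/style.css"))  # False
print(parser.can_fetch("SomeOtherBot", "/rss.xml"))        # True
```

Note the second result: the crawlers listed by name are kept away from the feed but are not covered by the `/css`, `/img` and similar rules in the `*` group.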
Albert
Logged
Albert
http://snews.awddesign.co.uk/snews/ site: v1.3
http://snews.awddesign.co.uk/           site: v1.2 http://www.awddesign.co.uk/
Putting together the largest collection of sNews 1.5 designs. Coming very soon :)

dexter

  • Newbie
  • *
  • Karma: 0
  • Posts: 14
    • http://www.prodvinu.com/
[ADDON] robots.txt for sNews
« Reply #1 on: January 22, 2006, 07:30:53 am »

I work in site promotion (search engine optimization).
Your robots.txt is bad.

Here is my robots.txt:

User-agent: *
Disallow: /cgi-bin/
Disallow: /css
Disallow: /inc
Disallow: /img
Disallow: /images
Disallow: /?action=login
Disallow: /index.php?action=login
Logged
Sorry for my bad English

Mario

  • Newbie
  • *
  • Karma: 0
  • Posts: 43
[ADDON] robots.txt for sNews
« Reply #2 on: February 06, 2006, 09:44:46 am »

Added some user agents:

Quote
##################################
# robots.txt
# sNews
# http://www.awddesign.co.uk/
# updated 2006-01-21 (disallow rss.xml links)
# don't let search engines see the RSS feed, it's just confusing.
#

User-agent: Googlebot # let him see
User-agent: AVSearch
User-agent: Atomz
User-agent: BlackWidow
User-agent: EirGrabber
User-agent: EmailSiphon
User-agent: Express WebPictures
User-agent: ExtractorPro
User-agent: EyeNetIE
User-agent: FAST-WebCrawler
User-agent: Faxobot
User-agent: FlashGet
User-agent: GetRight
User-agent: Go!Zilla
User-agent: Go-Ahead-Got-It
User-agent: GrabNet
User-agent: Grafula
User-agent: InfoNaviRobot
User-agent: JOC Web Spider
User-agent: LeechFTP
User-agent: LinkWalker
User-agent: Lycos_Spider
User-agent: Mass Downloader
User-agent: Mercator
User-agent: Missigua
User-agent: MultiText
User-agent: Net Vampire
User-agent: NetAnts
User-agent: NetMechanic
User-agent: NetSpider
User-agent: NetZIP
User-agent: Octopus
User-agent: Offline Explorer
User-agent: Offline Navigator
User-agent: Scooter
User-agent: SearchengineLicenceSheep
User-agent: Slurp
User-agent: SurveyBot
User-agent: TV33_Mercator
User-agent: Teleport
User-agent: Teleport Pro
User-agent: WebStripper
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Wget
User-agent: Widow
User-agent: Xaldon WebSpider
User-agent: Zeus
User-agent: htdig
User-agent: larbin
User-agent: libwww-perl
User-agent: psbot
User-agent: shadow
User-agent: spider00.logika.net
Disallow: /rss.xml

User-agent: *
Disallow: /cgi-bin/
Disallow: /css
Disallow: /inc
Disallow: /img
Disallow: /images
Disallow: /?action=login
Disallow: /index.php?action=login

#Disallow: /add more here..
#Disallow: /add more here.
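
A caveat for long User-agent lists like this one: some parsers treat a blank line as the end of a record, so a blank line between the last `User-agent` line and the first `Disallow` line can leave the rules attached to nobody. The sketch below shows this with Python's standard `urllib.robotparser`. The two-agent files are cut-down illustrations, and other crawlers may be more forgiving than this parser.

```python
from urllib.robotparser import RobotFileParser

def blocks_feed(robots_txt, agent="Wget"):
    """Return True if the given robots.txt text keeps `agent` away from /rss.xml."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/rss.xml")

# Blank line between the User-agent lines and the rule.
WITH_GAP = """\
User-agent: Wget
User-agent: larbin

Disallow: /rss.xml
"""

# Same content with the rule directly under the User-agent lines.
WITHOUT_GAP = """\
User-agent: Wget
User-agent: larbin
Disallow: /rss.xml
"""

print(blocks_feed(WITH_GAP))     # False: the rule is dropped by this parser
print(blocks_feed(WITHOUT_GAP))  # True
```

Keeping the `Disallow` lines directly under the `User-agent` lines avoids the problem regardless of which parser reads the file.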


Please ignore my post: by mistake I added some of the user agents that are listed and denied through my .htaccess file...
Logged

George Antoniadis

  • Sr. Member
  • ****
  • Karma: 0
  • Posts: 479
[ADDON] robots.txt for sNews
« Reply #3 on: February 06, 2006, 03:24:57 pm »

Why not just use User-agent: * ? I don't get it.
Logged
How I feel like I'm starless, I'm ready to fade now.
And how I feel like I'm starless, I'm hopeless and greyed out.

Mario

  • Newbie
  • *
  • Karma: 0
  • Posts: 43
[ADDON] robots.txt for sNews
« Reply #4 on: February 06, 2006, 06:10:42 pm »

Quote from: dexter
I work in site promotion (search engine optimization).
Your robots.txt is bad.

Here is my robots.txt:

User-agent: *
Disallow: /cgi-bin/
Disallow: /css
Disallow: /inc
Disallow: /img
Disallow: /images
Disallow: /?action=login
Disallow: /index.php?action=login

Quote
Disallow: /img
Disallow: /images
Wouldn't this prevent Google from caching the site/pages? Just wondering.

Code: [Select]
Disallow: /?action=login
Disallow: /index.php?action=login

I wouldn't do this. Well-behaved bots have no reason to go there anyway. Badly behaved bots, or people, learn exactly where to look as soon as they read your robots.txt.
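
To see how little effort that takes, here is a minimal sketch in Python that pulls every `Disallow` path out of a robots.txt file. The function name `disallowed_paths` and the sample text are illustrative. Anyone can run the same kind of thing against a site's public robots.txt.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow value from robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /?action=login
Disallow: /index.php?action=login
"""

print(disallowed_paths(SAMPLE))
# ['/cgi-bin/', '/?action=login', '/index.php?action=login']
```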

Maybe a hint:
If you want to change the default login page on your 1.4beta install from www.yourdomain.com/login/
to something less obvious, all you have to do is edit two lines of code in sNews.php:

Code: [Select]
case "login":
login();

to:

Code: [Select]
case "mysecretlogin":
mysecretlogin();


Code: [Select]
function login() {
to:
Code: [Select]
function mysecretlogin() {
and one line in your .htaccess file:

Code: [Select]
RewriteRule ^login/$ index.php?category=login [L]
to:
Code: [Select]
RewriteRule ^mysecretlogin/$ index.php?category=mysecretlogin [L]
Now you can log in through: http://www.yourdomain.com/mysecretlogin/
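
As a rough illustration of what the changed rewrite rule does, here is a small Python sketch that mimics the pattern match. It is only a model of the idea, not how Apache evaluates rules, and the `rewrite` helper and `REWRITE_RULES` list are made up for this example. It assumes the rule sits in an .htaccess file in the site root, where the pattern is matched against the path without its leading slash.

```python
import re

# (pattern, target) pairs modelled on the RewriteRule above.
REWRITE_RULES = [
    (re.compile(r"^mysecretlogin/$"), "index.php?category=mysecretlogin"),
]

def rewrite(path):
    """Return the rewritten target for a request path, or None if no rule matches."""
    relative = path.lstrip("/")  # per-directory rules see no leading slash
    for pattern, target in REWRITE_RULES:
        if pattern.search(relative):
            return target
    return None

print(rewrite("/mysecretlogin/"))  # index.php?category=mysecretlogin
print(rewrite("/login/"))          # None: the old address no longer matches this rule
print(rewrite("/mysecretlogin"))   # None: the pattern requires the trailing slash
```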

Logged

bryn

  • Hero Member
  • *****
  • Karma: 2
  • Posts: 934
    • http://www.cssugly.com
[ADDON] robots.txt for sNews
« Reply #5 on: February 06, 2006, 07:37:36 pm »

Nice information and code examples in your post, Mario. Thanks for that, very handy to know! ;D
Logged
Over 1,000 posts of joy, sNews is not only brilliant, but fun too! thanks guys :D

George Antoniadis

  • Sr. Member
  • ****
  • Karma: 0
  • Posts: 479
[ADDON] robots.txt for sNews
« Reply #6 on: February 06, 2006, 11:19:01 pm »

Mario, why change the function name?
Just change the case... the function name has nothing to do with it, and renaming it might break something else...
Logged
How I feel like I'm starless, I'm ready to fade now.
And how I feel like I'm starless, I'm hopeless and greyed out.

Mario

  • Newbie
  • *
  • Karma: 0
  • Posts: 43
[ADDON] robots.txt for sNews
« Reply #7 on: February 07, 2006, 05:17:41 pm »

Quote from: analyzerx
Mario, why change the function name?
Just change the case... the function name has nothing to do with it, and renaming it might break something else...

I did a search and replace in my editor... one replacement too many, indeed.
Logged

Jochum Meester

  • Sr. Member
  • ****
  • Karma: 1
  • Posts: 309
    • JochumMeester.com
[ADDON] robots.txt for sNews
« Reply #8 on: February 07, 2006, 05:20:15 pm »

What are all those user agents?
Logged

Mario

  • Newbie
  • *
  • Karma: 0
  • Posts: 43
[ADDON] robots.txt for sNews
« Reply #9 on: February 07, 2006, 05:25:28 pm »

Quote from: JM
What are all those user agents?

The ones I pasted in by mistake are known email harvesters and hostile robots.
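
Robots like these tend to ignore robots.txt, which is why they are denied on the server side instead. The .htaccess rules themselves are not shown in this thread, so the following is only a Python sketch of the general idea: match the request's User-Agent header against a blocklist. The `is_blocked` helper and the short list (a few names taken from the list earlier in the thread) are illustrative.

```python
# A few names from the list earlier in the thread, for illustration only.
BLOCKED_AGENTS = ("EmailSiphon", "ExtractorPro", "WebZIP", "WebStripper")

def is_blocked(user_agent):
    """Return True if the User-Agent header contains a blocked name (case-insensitive)."""
    ua = user_agent.lower()
    return any(name.lower() in ua for name in BLOCKED_AGENTS)

print(is_blocked("EmailSiphon"))                                # True
print(is_blocked("Mozilla/5.0 (compatible; Googlebot/2.1)"))    # False
```

A User-Agent header is easy to fake, so a check like this only stops robots that identify themselves honestly.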
Logged