
Author Topic: [MOD] Avoiding duplicate content (trailing slash issue)  (Read 14883 times)

Armen

  • Sr. Member
  • ****
  • Karma: 41
  • Posts: 338
    • http://www.funnydays.ru
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #15 on: July 29, 2008, 03:27:38 PM »

Here's the solution I'm using (it's site-wide):

Just after:
Code:
#RewriteBase /PATH
Add
Code:
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]

And ALL pages and categories will receive a trailing slash if none is provided.

Joost, thank you again for bringing up another important issue.

It should be added into the default package.
« Last Edit: July 29, 2008, 03:29:23 PM by Armen »
Now ogres, oh, they're much worse. They'll make a suit from your freshly peeled skin. They'll shave your liver, squeeze the jelly from your eyes... Actually, it's quite good on toast.

Sven

  • ULTIMATE member
  • ******
  • Karma: 88
  • Posts: 2029
  • Chasing MY bugs!
    • hiseo.fr - rédacteur Web
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #16 on: July 29, 2008, 03:44:22 PM »

Waiter, please?
1 karma for the ogre and for all.

Joost

  • Guest
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #17 on: July 29, 2008, 05:02:15 PM »

Armen,

Your code doesn't work great on my localhost with the additional tests.
An unslashed, real subdirectory does get a slash, but instead of redirecting to
http://localhost/sNews16/testmap/
it redirects to
http://localhost/testmap/

I added your code like this:

RewriteEngine On
RewriteBase /sNews16
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
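
A note on the missing /sNews16 prefix: in a .htaccess file, mod_rewrite matches the RewriteRule pattern against the path with the directory prefix already stripped, so for /sNews16/testmap the $1 backreference holds only testmap, and the redirect target loses the folder. One possible way around this, sketched under the assumption that the .htaccess file lives in /sNews16/ as in the test above, is to build the target from %{REQUEST_URI}, which still carries the full path:

```apache
RewriteEngine On
RewriteBase /sNews16

# Same condition as above: no dot in the path, no trailing slash yet.
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
# %{REQUEST_URI} keeps the /sNews16 prefix that $1 would drop in .htaccess context.
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI}/ [R=301,L]
```

This is only a sketch; it may still behave differently depending on the server configuration.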

Armen

  • Sr. Member
  • ****
  • Karma: 41
  • Posts: 338
    • http://www.funnydays.ru
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #18 on: August 01, 2008, 12:34:33 PM »

Hm. No problems on my side, Joost. BTW, I didn't apply the fix from your first post. Just those lines I specified.

Here's my final solution (it also removes "index.php" from the URL if specified):

Code:

#RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://%{HTTP_HOST}/ [R=301,L]
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f..........
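
For installs that do not live in the web root, the index.php rule above would need the folder name in both the condition and the target. A minimal sketch, assuming a hypothetical install folder /sNews16 with the .htaccess file placed inside it:

```apache
# THE_REQUEST holds the raw request line (for example "GET /sNews16/index.php HTTP/1.1"),
# so only a literal client request for index.php matches; internal rewrites
# to index.php do not, which avoids a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sNews16/index\.php\ HTTP/
# In .htaccess context the pattern sees the path without the /sNews16/ prefix.
RewriteRule ^index\.php$ http://%{HTTP_HOST}/sNews16/ [R=301,L]
```

The folder name is an assumption for illustration; adjust it to the actual install path.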

Sven

  • ULTIMATE member
  • ******
  • Karma: 88
  • Posts: 2029
  • Chasing MY bugs!
    • hiseo.fr - rédacteur Web
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #19 on: August 01, 2008, 01:46:35 PM »

No problem for me: online it works really well.
And another karma for my ogre who has thought of the index.php question.
Bravo!

Ken Dahlin

  • Full Member
  • ***
  • Karma: 30
  • Posts: 139
    • http://www.kendahlin.com/
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #20 on: August 01, 2008, 04:42:39 PM »

You know, I honestly didn't think this was a big deal until I checked Google PageRank for certain URLs with and without the trailing slash... and to my shock found that the ones indexed without the slash were PR 0, compared to the same pages with slashes at PR 4, 5 or 6!

Sadly, this may also explain some inconsistent performance in indexing sites with my dynamic sitemap generator mod, which does not use trailing slashes in the URIs. My guess is that before I added sitemaps to my site, Google had already indexed the site and now sees the sitemap as duplicate content!

Thanks for bringing this to my attention.

Ken

Keyrocks

  • Doug
  • ULTIMATE member
  • ******
  • Karma: 449
  • Posts: 6019
  • Semantically Challenged
    • snews.ca
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #21 on: August 01, 2008, 06:45:12 PM »

@ Joost, Ken and Armen... would this be another "Bug" that should be posted in the "Bugs" board and then patched into the official 1.6 download?
Do it now... later may not come.
-------------------------------------------------------------------------------------------------
sNews 1.6 MESU | sNews 1.6 MEMU

Ken Dahlin

  • Full Member
  • ***
  • Karma: 30
  • Posts: 139
    • http://www.kendahlin.com/
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #22 on: August 01, 2008, 09:38:42 PM »

Quote
@ Joost, Ken and Armen... would this be another "Bug" that should be posted in the "Bugs" board and then patched into the official 1.6 download?

That it's a "bug" is debatable as sNews probably still works as intended. But the project claims that "Search Engines dig sNews. Really." and so I'd say this little htaccess rule is a step in the right direction for the official release.

Armen

  • Sr. Member
  • ****
  • Karma: 41
  • Posts: 338
    • http://www.funnydays.ru
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #23 on: August 01, 2008, 10:31:17 PM »

Yes, Keyrocks. I think it should be packed into the default package as soon as possible, because sNews is being promoted as an SEO-friendly CMS.

With the latest .htaccess fixes applied: yes, very, very friendly.

Joost

  • Guest
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #24 on: August 02, 2008, 02:08:33 AM »

Let's not get carried away here.
There are two options presented in the first post.
The oldest code (the second option) is strictly SE friendly:
In sNews all uris end with a slash. If no slash is appended, Google and visitors get a 404 message.

The new code (Armen's code didn't work in my config and is therefore disregarded for now) is more or less user friendly:
If no slash is found at the end of a uri, a slash is appended and the request is redirected to the right location.
However, this approach assumes that the uri actually exists. If it doesn't exist, it invokes a cascaded response, something like:
- HTTP/1.x 301 Moved Permanently (new location, slash added)
- HTTP/1.x 302 Found (new location 404/)
- HTTP/1.x 404 Not Found

I am not sure how Google handles this. For that reason I opt for the second.
PS: Of course, the newest code might be convenient when you have an important (but misspelled) inbound link.
That, you can repair with a specific redirect rule.
« Last Edit: August 02, 2008, 02:14:36 AM by Joost »

Armen

  • Sr. Member
  • ****
  • Karma: 41
  • Posts: 338
    • http://www.funnydays.ru
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #25 on: August 02, 2008, 09:40:50 AM »

Regarding errors and "cascaded responses"

You mentioned something about a redirection to 404/ page?

I've already made sNews send "NOT FOUND" headers without redirecting the user to the "404/" page and thus without changing the URL. The default sNews method (I mean a 302 redirect to 404/) is not SEO-friendly at all, because when a search engine stumbles upon a nonexistent page it first receives a "302 FOUND" redirect header and only then "404 NOT FOUND".

That's abnormal. Google doesn't like this, because the 404 header must be sent in the first place. Some time ago in the webmaster.google.com lounge I received multiple error reports that the engine returns "FOUND" when there must be a "NOT FOUND" header. It cost me some time, but I made the CMS recognize nonexistent pages and immediately send a "404" header.

But even when I downloaded and used the vanilla 1.6 version of sNews, I still received no errors. No "looping" whatsoever, no unlimited redirects, no wrong redirects. I tried everything and still can't find any fault in the rules.

And, BTW, Google knows about the non-slashed-to-slashed URL 301 redirect. It's considered normal to use this method. It's safe.

Quote
In sNews all uris end with a slash. If no slash is appended, Google and visitors get a 404 message.

Why should one receive a 404 error when he can be redirected to the right URL? That's not human-friendly for sure.

Update: BTW, I tried your updated code, and whenever I call a category without a trailing slash, it gives me a 404 message. Is that how it's supposed to be? I hope not.

RULE-TEST

You can test my solution here: http://sapehelp.ru/

Try http://sapehelp.ru/money/ and http://sapehelp.ru/money

You'll be conveniently redirected. Try http://sapehelp.ru/index.php and you'll receive http://sapehelp.ru/

(If you want to see modified 404 engine try anything else, like http://sapehelp.ru/sfewf/ or even http://sapehelp.ru/sfewf (without slash). See? No loops whatsoever...)

Feel free to try out my solution and say what you think.

Another mystery from your first post:

Quote
You will have to edit #RewriteBase /sNews16 as usual.

Why edit a comment?  :D  :D  :D

Why? It's a commented line, FGSake.

BTW, everything works great even without specifying "RewriteBase", when rules are well-written. At least with sNews.
« Last Edit: August 02, 2008, 10:03:36 AM by Armen »

Joost

  • Guest
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #26 on: August 02, 2008, 07:42:28 PM »

There seems to be some misunderstanding here.
- Your code works on your system. However, not on mine. We are looking for (core) code that can live on the majority of servers. If code from the first post behaves badly (403, 500 headers or 404 on existing directories etc.), then that code should be tossed out. For me there is no competence issue. It is about server configuration.
FYI: Your code invokes a redirect before a 404 header on http://sapehelp.ru/sfewf as well.

- There is a difference between Apache error handling and sNews (PHP) error handling:
 Apache can determine whether a file or directory exists. But it does not know whether a database-generated page exists.
  Before appending a slash and redirecting, Apache has verified the existence of that directory. This is similar to the way domain names are handled: a slash is appended only if the domain name exists (there's some logic here).
 sNews can determine whether a database-generated page exists. To do so, it must work together with Apache. The htaccess file tells Apache: "If no such file exists, let sNews handle it".
That's when the cascaded error handling starts. Cooperation of the two is not flawless and sNews suffers from a bug.
Bug description and fix:
Remove /404/
So unless this fix is applied, a typo like http;//site.com/categor.. instead of http;//site.com/category/
will invoke a cascaded error handling.
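
The "verify first, then append the slash" behaviour described above can also be written out in mod_rewrite. A minimal sketch, assuming a .htaccess context and plain http as in the rules earlier in this thread; it is an illustration of the idea, not code from the first post:

```apache
RewriteEngine On

# Only act when the request maps to a real directory on disk...
RewriteCond %{REQUEST_FILENAME} -d
# ...and the requested path does not already end with a slash.
RewriteCond %{REQUEST_URI} !/$
# Append the slash with a single 301; database-generated pages are left to sNews.
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI}/ [R=301,L]
```

With rules like these, a slash-less request for a nonexistent category is not redirected by Apache at all, so whatever sNews answers (ideally a direct 404) is the first response the client sees.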


 
Quote
You will have to edit #RewriteBase /sNews16 as usual.

Why edit a comment?  :D  :D  :D

Why? It's a commented line, FGSake.
BTW, everything works great even without specifying "RewriteBase", when rules are well-written. At least with sNews.

Well, I tried to be as clear as possible, not for God's sake, but for newbies' sake. For ogre's sake ;) : Not all sNews installs live in the root. Some of us might have to uncomment that line and change the folder name.
RewriteBase: Without it there seems to be an improvement (on localhost, that is) for this code as well:
Quote
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) $1 [L]
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ index.php?category=$1 [L]

Consider:
- The fact that not all server configurations are equal (GoDaddy is a real pain).
- 'Slash handling' might be needed for different reasons and need different solutions. The code in this post might be great for Google, while bad for existing, 'slash-less' but valuable inbound links.
- Typos like this don't occur that often.

I see no all-purpose solution here that works flawlessly on all servers. I prefer targeting issues as they occur, unless someone has figured out an all-purpose solution.

Armen

  • Sr. Member
  • ****
  • Karma: 41
  • Posts: 338
    • http://www.funnydays.ru
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #27 on: August 03, 2008, 05:17:17 AM »

A well-configured server will swallow anything that's supposed to work according to the manuals.

Too bad your server (for a mysterious reason) doesn't process my directives.

BTW, Sven, any problems on your side with the updated htaccess rules?

Joost

  • Guest
Re: [MOD] Avoiding duplicate content (trailing slash issue)
« Reply #28 on: August 03, 2008, 05:29:35 AM »

Sorry Armen,

But it is not about (any of) my server(s). I don't need a solution. Didn't you read or understand my post?  ???
And there is no such thing as one good configuration. People have all kinds of hosting, each with its own particular server configuration.

« Last Edit: August 03, 2008, 05:31:48 AM by Joost »

Sven

  • ULTIMATE member
  • ******
  • Karma: 88
  • Posts: 2029
  • Chasing MY bugs!
    • hiseo.fr - rédacteur Web