sNews Forum

Previous sNews versions => sNews 1.5 Final => Suggestions => Topic started by: ki11 on April 23, 2007, 05:50:29 pm

Title: sNews and Duplicate Content
Post by: ki11 on April 23, 2007, 05:50:29 pm
I just wanted to know if you care about duplicate content.

The standard sNews produces a lot of duplicate content, I think. The articles are accessible through the category, through pagination inside the category, and through their SEF.

Therefore I changed my navigation to the sitemap and tweaked it a little. I made the categories non-clickable, so the articles should only be accessible via their SEF. That should help, I hope.

Any other ideas to make sNews more search-engine friendly?
Post by: Joost on April 23, 2007, 07:07:46 pm
Quote from: ki11
I just wanted to know if you care about duplicate content.

The standard sNews produces a lot of duplicate content, I think. The articles are accessible through the category, through pagination inside the category, and through their SEF.

Therefore I changed my navigation to the sitemap and tweaked it a little. I made the categories non-clickable, so the articles should only be accessible via their SEF. That should help, I hope.

Any other ideas to make sNews more search-engine friendly?
There is no duplicate content. The fact that you can access the content through different links doesn't make it duplicated, unless it is located on different URLs. I don't think that is the case. Only snippets of content can be found in different locations.
Post by: ki11 on April 23, 2007, 08:30:00 pm
Quote from: Joost
There is no duplicate content. The fact that you can access the content through different links doesn't make it duplicated, unless it is located on different URLs.
A different link means a different URL. A search engine can't decide which link to take as the original, so you produce duplicate content.
Post by: Mika on April 23, 2007, 09:10:56 pm
Quote
The fact that you can access the content through different links doesn't make it duplicated, unless it is located on different URLs.
Word! ;)

In fact, you can navigate from several directions (category listing, archive listing, search listing, etc.) and still fetch the content through one unique URL of the form domain/unique-category-SEF/unique-article-SEF/. The engine itself prohibits non-unique category and article names.
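A minimal sketch of the URL scheme described above, written in Python rather than sNews's own PHP. It assumes a simple in-memory store and that article SEFs are unique site-wide; the names `Site`, `add_category`, `add_article` and `canonical_url` are hypothetical and not part of sNews. Each article gets exactly one URL built from its category SEF and its own SEF, and duplicate SEFs are rejected when they are created.

```python
class Site:
    """Hypothetical in-memory model of categories and articles (not sNews code)."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.categories: set[str] = set()
        self.articles: dict[str, str] = {}  # article SEF -> category SEF

    def add_category(self, sef: str) -> None:
        # Reject non-unique category names.
        if sef in self.categories:
            raise ValueError(f"category SEF already in use: {sef}")
        self.categories.add(sef)

    def add_article(self, sef: str, category: str) -> None:
        if category not in self.categories:
            raise ValueError(f"unknown category: {category}")
        # Assumption: article SEFs must be unique across the whole site.
        if sef in self.articles:
            raise ValueError(f"article SEF already in use: {sef}")
        self.articles[sef] = category

    def canonical_url(self, article_sef: str) -> str:
        # The single URL under which the article should be published.
        return f"{self.domain}/{self.articles[article_sef]}/{article_sef}/"


site = Site("http://example.com")
site.add_category("templates")
site.add_article("new-free-template", "templates")
print(site.canonical_url("new-free-template"))
```

Uniqueness guarantees that one canonical URL exists per article; it does not by itself stop the site from answering on other URLs, which is the point raised in the following posts.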

Why don't you post a link to your site so we can examine that duplicate content issue of yours?
Post by: codetwist on April 23, 2007, 09:26:38 pm
There is one side effect of the way sNews handles article URLs.

For example, if the proper URL for an article is 'domain/unique-category-SEF/unique-article-SEF/', then all of the following URLs will display it as well:
 - domain/home/unique-article-SEF/
 - domain//unique-article-SEF/
 - domain/blah/unique-article-SEF/

Basically any string will do for the category SEF, including the SEFs of unrelated existing categories.
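To make this side effect concrete, here is a small Python sketch (not the actual sNews PHP code; `ARTICLES`, `loose_lookup` and `strict_lookup` are hypothetical names) contrasting a lookup that matches on the article SEF alone with one that also checks that the category segment really belongs to the article.

```python
# Hypothetical data: article SEF -> the category SEF it belongs to.
ARTICLES = {"unique-article-SEF": "unique-category-SEF"}


def loose_lookup(category_sef: str, article_sef: str) -> bool:
    """Mimics the behaviour described above: the category part is ignored."""
    return article_sef in ARTICLES


def strict_lookup(category_sef: str, article_sef: str) -> bool:
    """Accepts the request only under the article's own category SEF."""
    return ARTICLES.get(article_sef) == category_sef


for category in ("unique-category-SEF", "home", "", "blah"):
    print(repr(category),
          loose_lookup(category, "unique-article-SEF"),
          strict_lookup(category, "unique-article-SEF"))
```

With the loose version every one of the four category segments is accepted; with the strict version only `unique-category-SEF` is, so the other URLs can be answered with a redirect or an error instead of a copy of the article.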
Post by: Fred K on April 23, 2007, 09:38:16 pm
There's an added twist (no pun intended) to article URLs: Pages.
Pages are treated as articles, so the URL of any Page is always mydomain/home/mypage/.
It would be nice to have it as mydomain/mypage/ by default. And even nicer if Pages were made separate from Articles, as a top-level item, with the possibility to attach one or more categories to them...
(I can dream, can't I? :) )
But that's for a whole 'nother topic, I know.
Post by: ki11 on April 23, 2007, 10:54:07 pm
Your content is accessible through at least

domain/category-SEF/

and

domain/category-SEF/article-SEF

And both of these links are used: the category link on the front page and the article link in the sitemap. They are two different URLs, so it's duplicate content in my opinion.
Post by: Fred K on April 24, 2007, 12:01:45 am
That can't be URL duplication, IMHO.
mydomain/cat-sef points to the category index. If you have more than one article in any given category, they will all be listed there.
On the other hand, mydomain/cat-sef/article-sef points to one specific article.
As was said before, there are many ways leading to the same place, but no single place (ehm... exception below) is duplicated. And many ways of getting to one place can only be good. Look at how popular Rome is... ;)

The only real duplication going on, as I see it, is Home. Home is treated as both a category and a page. That can be seen as a form of duplication, or of crossed wires, perhaps. Personally I'd love to see Home be just a page, so I typically remove it from the category list. But that's me; others may see it differently.
Post by: ki11 on April 24, 2007, 12:10:56 am
domain/category-SEF/ is not the same destination as domain/category-SEF/article-SEF, as you already mentioned. So it's not two ways to Rome, but one way to Rome and one to Venice, except that in our case the two cities are identical...
Post by: Joost on April 24, 2007, 07:27:36 am
There is some truth in what ki11 says: when there is no more than one article on the category index and no comment has been added, it could be considered duplicate content. For each category, there can be only one duplicate page. However, Google won't punish you for having some duplicate content on your site. If that is what you (ki11) are referring to, I wouldn't bother.
Post by: ki11 on April 24, 2007, 04:05:33 pm
Quote from: Joost
There is some truth in what ki11 says: when there is no more than one article on the category index and no comment has been added, it could be considered duplicate content. For each category, there can be only one duplicate page. However, Google won't punish you for having some duplicate content on your site. If that is what you (ki11) are referring to, I wouldn't bother.
Almost. It doesn't matter how many articles are shown in the category. If there are two articles, you have created 50% duplicate content, which doesn't make it any better... The comments are, in most cases, just a fraction of the size of the article. So they also don't make your duplicate content problem any better.

If you show only two articles in your category, your other articles are accessible through pagination. Again: duplicate content.

I didn't want to start such a big discussion on duplicate content; it was just a topic that bothered me, and I wanted to hear whether you had thought about it. You could also solve the problem with a noindex tag. I like my approach with the tweaked sitemap, even if it's far from perfect.
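The noindex alternative could look roughly like this: a Python sketch (sNews itself is PHP) that picks the robots meta tag for the page being rendered. The page-type names and the `robots_meta` helper are hypothetical; the `noindex,follow` value is the standard robots meta directive for "don't index this page, but do follow its links", so listings and paginated pages stay out of the index while crawlers still reach the full articles through them.

```python
# Hypothetical page types that repeat article text from elsewhere.
LISTING_TYPES = {"category", "archive", "search"}


def robots_meta(page_type: str, page_number: int = 1) -> str:
    """Return the robots meta tag for a page (hypothetical helper).

    Listings and any paginated page (page 2, 3, ...) repeat article text,
    so they are kept out of the index but their links are still followed.
    Full article pages stay indexable.
    """
    if page_type in LISTING_TYPES or page_number > 1:
        content = "noindex,follow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta("article"))
print(robots_meta("category"))
print(robots_meta("home", page_number=6))
```

The tag would go in the page's `<head>`. Unlike a reworked sitemap, this leaves the category navigation clickable for visitors.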
Post by: Joost on April 24, 2007, 04:33:45 pm
Quote from: ki11
So they also don't make your duplicate content problem any better.
Could you clarify this problem? I still assume you are referring to bad SEO, or do you mean something else? And what effects will it cause?

Regards,

Joost
Post by: ki11 on April 24, 2007, 04:48:39 pm
Quote from: Joost
I still assume you are referring to bad SEO, or do you mean something else? And what effects will it cause?
Yes, I'm talking about bad SEO in general, and about "duplicate content" in particular.

Here's what Matt Cutts said:

Quote
In general, if you think you might be having problems, your best guess is probably to make sure your pages are quite different from each other, because we do do a lot of different duplicate detection... to crawl less, and to provide better results and more diversity.
source: http://blog.outer-court.com/archive/2006-08-02-n60.html

Duplicate content could keep some of your pages from ranking well in Google or any other search engine.
Post by: Joost on April 24, 2007, 05:33:08 pm
I was reading Matt Cutts' blog (http://www.mattcutts.com/blog/) while you were writing your post (not a real coincidence, by the way, just a quick update of my knowledge). ;)
Google could decide there are 'near duplicates' on an sNews site (I forgot to mention 'title' and 'meta tags'). At worst, Google would choose to rank one page and neglect the other. I see no problem in Google doing that. I don't think Google will see this as bad SEO practice and ban or punish the sNews user, unless you have hundreds of 'near duplicates' on your site.
Most of the time, Matt Cutts addresses his posts to people in the SEO industry who walk the thin line between good and bad SEO practice. Keep that in mind.

Regards.
Post by: ki11 on April 24, 2007, 09:32:24 pm
I don't think that Google bans pages because of duplicate content issues; rather, it devalues them. If you have some near duplicates on your site, you never know which of them will rank and which will be devalued. If you try to build up links, you may be collecting links to the "wrong" pages.

But it can get worse, I think. Let's assume you show 2 articles per category, and you have articles A and B in this category. Google treats article A as a near duplicate of the category, because this exact article is already published there. But Google also treats the category as a near duplicate of B, because B is published both in the category and on its own page. So the only article that ranks normally is B. Okay... this may already sound a little paranoid, but it could really happen this way.
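Search engines don't publish how their duplicate detection works, so the following is only an illustration of one generic, textbook near-duplicate measure (comparing sets of overlapping word "shingles"), not of Google's method. It assumes a category page that shows the full text of articles A and B, which are reduced here to made-up one-line texts. The helper names are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(part: str, whole: str, k: int = 3) -> float:
    """Fraction of `part`'s shingles that also occur in `whole`."""
    a = shingles(part, k)
    return len(a & shingles(whole, k)) / len(a) if a else 0.0


def jaccard(x: str, y: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles of both texts."""
    a, b = shingles(x, k), shingles(y, k)
    return len(a & b) / len(a | b) if a | b else 0.0


# Made-up stand-ins for two full articles shown on one category page.
article_a = "sNews stores every article in a single database table"
article_b = "the category page lists the newest articles with full text"
category_page = article_a + " " + article_b

for name, text in (("A", article_a), ("B", article_b)):
    print(name, containment(text, category_page), jaccard(text, category_page))
```

Each article is fully contained in the category page (containment 1.0), while its overall similarity to the page is lower because the page also carries the other article. Which of the two signals a real search engine weighs, if either, is exactly the open question in this discussion.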
Post by: Joost on April 24, 2007, 10:59:25 pm
@ki11
I don't see any reason for Google to work this way; it would not benefit the search engine user. Why would they degrade quality, relevant content?
Post by: 4dd1ct on April 25, 2007, 11:48:38 pm
Duplicate content is bad all round since it breaks the one document per URI model of the web (which is what search engines expect, and after all, is the way the web was intended to work). I can go into why this is harmful to both users and search engines in more depth, if requested.

AFAIK sNews currently has two major problems that create duplicate content in a way that is harmful.

There is no validation of categories, so ...

domain/category-SEF/article-SEF

domain/any-characters/article-SEF

...result in the same article, but different URIs. Try http://www.solucija.com/ghjghj/new-free-template-internet-corporation/

The other issue is the broken missing article code, so any deleted, renamed or mistyped URI results in the accidental creation of a valid URI (discussed already on these forums - http://www.solucija.com/forum/viewtopic.php?id=3822). Try http://www.solucija.com/ghjghj/fghfhghgfhghg/
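Both problems could be handled in one place when a /category/article/ request is resolved. Below is a Python sketch (not sNews's PHP; the `resolve` function and the `ARTICLES` table are hypothetical, and `templates` is a made-up category SEF, since the real one for that article isn't given here). An unknown article gets a real 404, and a known article under the wrong category segment gets a permanent redirect to its single canonical URI instead of a second copy.

```python
from typing import Optional

# Hypothetical data: article SEF -> category SEF (not the sNews schema).
ARTICLES = {"new-free-template-internet-corporation": "templates"}


def resolve(category_sef: str, article_sef: str) -> tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for /category/article/."""
    category = ARTICLES.get(article_sef)
    if category is None:
        # Deleted, renamed or mistyped article: answer 404, not a 200 page.
        return 404, None
    if category != category_sef:
        # Right article, wrong category segment: point to the one canonical URI.
        return 301, f"/{category}/{article_sef}/"
    return 200, None


print(resolve("ghjghj", "new-free-template-internet-corporation"))
print(resolve("ghjghj", "fghfhghgfhghg"))
```

Redirecting rather than returning 404 for a wrong category is a design choice: it keeps links that used an old or mistyped category working while still leaving only one URI with the content. A plain 404 would remove the duplicates just as well.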
Post by: Joost on April 26, 2007, 01:26:50 am
Quote from: 4dd1ct
Duplicate content is bad all round since it breaks the one document per URI model of the web (which is what search engines expect, and after all, is the way the web was intended to work). I can go into why this is harmful to both users and search engines in more depth, if requested.

AFAIK sNews currently has two major problems that create duplicate content in a way that is harmful.
OK, sNews is bad for the Internet :( , but how harmful are near duplicates for the sNews user? And yes, I am looking forward to an in-depth explanation, if you would like to make the effort.
Quote from: 4dd1ct
There is no validation of categories, so ...

domain/category-SEF/article-SEF

domain/any-characters/article-SEF

...result in the same article, but different URIs. Try http://www.solucija.com/ghjghj/new-free-template-internet-corporation/
No well-designed web crawler had ever looked for http://www.solucija.com/ghjghj/new-free-template-internet-corporation/ until today. :D
Quote from: 4dd1ct
The other issue is the broken missing article code, so any deleted, renamed or mistyped URI results in the accidental creation of a valid URI (discussed already on these forums - http://www.solucija.com/forum/viewtopic.php?id=3822). Try http://www.solucija.com/ghjghj/fghfhghgfhghg/
There is a mod available for this issue; you can find it here (http://www.solucija.com/forum/viewtopic.php?id=2348). It doesn't look like the mod is finished.
The basic idea is to send an ErrorDocument 404 header.
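What crawlers act on is the HTTP status code in the response; Apache's `ErrorDocument` directive only chooses which page is shown for an error. In PHP that usually means sending the status with `header()` (for example `header("HTTP/1.0 404 Not Found")`) before any output. As a language-neutral sketch, here is a minimal Python WSGI app that does the same thing; the `PUBLISHED` set and the `status_for` helper are hypothetical, and the helper only calls the app directly so the status line can be inspected without running a server.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of published article paths.
PUBLISHED = {"/templates/new-free-template/"}


def app(environ, start_response):
    """Minimal WSGI app: 200 for known paths, a real 404 status otherwise."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in PUBLISHED:
        start_response("200 OK", headers)
        return [b"<h1>Article</h1>"]
    # The status line is what crawlers look at; the friendly body is optional.
    start_response("404 Not Found", headers)
    return [b"<h1>Sorry, that article does not exist.</h1>"]


def status_for(path: str) -> str:
    """Call the app directly (no server) and return the status line it sends."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = []
    app(environ, lambda status, response_headers: captured.append(status))
    return captured[0]


print(status_for("/templates/new-free-template/"))
print(status_for("/ghjghj/fghfhghgfhghg/"))
```

A visitor can still get a helpful "not found" page; the difference is that the page no longer claims to be a valid document.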

Regards,

Joost
Post by: quaffapint on April 26, 2007, 01:32:18 am
I have had pages of mine end up in the supplemental index due to the very reasons described: the robot says, "I can get to it via main/page6, so why do I also need to list the-real-article-link/?" Taking some of the actions ki11 mentioned would probably be a good idea, just to make sure the 'real' page link gets into the index instead of ending up in the supplemental index.
Post by: codetwist on April 26, 2007, 01:21:10 pm
Quote from: Joost
...
No well-designed web crawler had ever looked for http://www.solucija.com/ghjghj/new-free-template-internet-corporation/ until today. :D
...
There is a mod available for this issue; you can find it here (http://www.solucija.com/forum/viewtopic.php?id=2348). It doesn't look like the mod is finished.
The basic idea is to send an ErrorDocument 404 header.
...
Well... it causes more problems when applying/writing mods: it's easy to create a crappy (as in complete nonsense) URI that still gives access to the article. And everything will look dandy to the user in this case ;)

As for that 404 page, it's still not finished, so there is no mod yet.
Post by: iatbm on April 26, 2007, 02:04:29 pm
There are no duplicate content issues with sNews. I can confirm that, running 20+ sites with sNews... it is all fine.
Post by: codetwist on April 26, 2007, 02:23:10 pm
From the posts in this thread, I'd say that duplicate content could be an issue only if the site is set up in a way that almost exactly the same stuff is shown on different URIs. But this definitely isn't a problem with the sNews code, just not-so-good site configuration.

Loose category handling is a slightly different story, but it still doesn't qualify as a bug that breaks things. And of course, this isn't a problem for those who mod their sNews code anyway ;)

And that 404: yet another feature request in the queue.
Post by: Joost on April 26, 2007, 02:35:43 pm
Quote from: codetwist
Quote from: Joost
...
No well-designed web crawler had ever looked for http://www.solucija.com/ghjghj/new-free-template-internet-corporation/ until today. :D
...
There is a mod available for this issue; you can find it here (http://www.solucija.com/forum/viewtopic.php?id=2348). It doesn't look like the mod is finished.
The basic idea is to send an ErrorDocument 404 header.
...
Well... it causes more problems when applying/writing mods: it's easy to create a crappy (as in complete nonsense) URI that still gives access to the article. And everything will look dandy to the user in this case ;)

As for that 404 page, it's still not finished, so there is no mod yet.
Nice way of quoting, codetwist. You can make me say anything this way. :/

Anyway, ki11 started a very interesting discussion here, one which (I think) should take place elsewhere on the forum, and more often. It started as an SEO issue, then 'the one document per URI model of the web' was mentioned, and now we are talking about implementing the Hypertext Transfer Protocol, HTTP/1.1.
Post by: codetwist on April 26, 2007, 05:32:49 pm
Quote from: Joost
Quote from: 4dd1ct
Duplicate content is bad all round since it breaks the one document per URI model of the web (which is what search engines expect, and after all, is the way the web was intended to work). I can go into why this is harmful to both users and search engines in more depth, if requested.

AFAIK sNews currently has two major problems that create duplicate content in a way that is harmful.
OK, sNews is bad for the Internet :( , but how harmful are near duplicates for the sNews user? And yes, I am looking forward to an in-depth explanation, if you would like to make the effort.
Quote from: 4dd1ct
There is no validation of categories, so ...

domain/category-SEF/article-SEF

domain/any-characters/article-SEF

...result in the same article, but different URIs. Try http://www.solucija.com/ghjghj/new-free-template-internet-corporation/
No well-designed web crawler had ever looked for http://www.solucija.com/ghjghj/new-free-template-internet-corporation/ until today. :D
Quote from: 4dd1ct
The other issue is the broken missing article code, so any deleted, renamed or mistyped URI results in the accidental creation of a valid URI (discussed already on these forums - http://www.solucija.com/forum/viewtopic.php?id=3822). Try http://www.solucija.com/ghjghj/fghfhghgfhghg/
There is a mod available for this issue; you can find it here (http://www.solucija.com/forum/viewtopic.php?id=2348). It doesn't look like the mod is finished.
The basic idea is to send an ErrorDocument 404 header.

Regards,

Joost
Well... it causes more problems when applying/writing mods: it's easy to create a crappy (as in complete nonsense) URI that still gives access to the article. And everything will look dandy to the user in this case ;)

As for that 404 page, it's still not finished, so there is no mod yet.

P.S. OK, Joost, here is the full quote. I thought I hadn't changed the meaning; sorry. I only hope the quoted post is still the same; I'm not checking that.
Post by: Joost on April 26, 2007, 05:48:13 pm
Very considerate of you, codetwist. It was not such a big deal, but I thought these two lines together could be misinterpreted. So I used the :/ icon and not the :mad: icon.

Regards
Post by: piXelatedEmpire on April 27, 2007, 02:02:30 am
Quote from: Joost
Quote from: codetwist
Quote from: Joost
...
No well-designed web crawler had ever looked for http://www.solucija.com/ghjghj/new-free-template-internet-corporation/ until today. :D
...
There is a mod available for this issue; you can find it here (http://www.solucija.com/forum/viewtopic.php?id=2348). It doesn't look like the mod is finished.
The basic idea is to send an ErrorDocument 404 header.
...
Well... it causes more problems when applying/writing mods: it's easy to create a crappy (as in complete nonsense) URI that still gives access to the article. And everything will look dandy to the user in this case ;)

As for that 404 page, it's still not finished, so there is no mod yet.
Nice way of quoting, codetwist. You can make me say anything this way. :/
Actually, this way of quoting is much cleaner, as you can edit out anything that is irrelevant and keep post sizes smaller.

Now, back on topic, lads :D
Post by: Joost on April 27, 2007, 02:22:06 am
Quote from: piXelatedEmpire
Quote from: Joost
Quote from: codetwist
Well... it causes more problems when applying/writing mods: it's easy to create a crappy (as in complete nonsense) URI that still gives access to the article. And everything will look dandy to the user in this case ;)

As for that 404 page, it's still not finished, so there is no mod yet.
Nice way of quoting, codetwist. You can make me say anything this way. :/
Actually, this way of quoting is much cleaner, as you can edit out anything that is irrelevant and keep post sizes smaller.

Now, back on topic, lads :D
OK :P
Post by: piXelatedEmpire on May 02, 2007, 02:33:32 am
Guys, a heads up... this issue is being addressed in the next version of sNews. Stay tuned! :cool:
Post by: Joost on May 02, 2007, 02:57:31 am
Quote from: piXelatedEmpire
Guys, a heads up... this issue is being addressed in the next version of sNews. Stay tuned! :cool:
Yes, less quoting = less duplicated content :lol:  :lol:  :lol: