Pages: 1 2 [3] 4

Author Topic: Restructuring the articles table  (Read 21113 times)

centered

  • Guest
Re: Restructuring the articles table
« Reply #30 on: February 13, 2008, 05:22:19 AM »

Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it won't use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #31 on: February 13, 2008, 05:24:20 AM »

Quote
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...
except if someone chooses not to [break] an article (because it's rather short, maybe), in which case only a link will be shown (will it??), as there is no summary field content (error handling may be required here as well). (Currently, if there is no [break], the full, short article is displayed.)
What would be desirable??

not if we use the existing snews parser: if no break tag is found, grab the first N characters as summary

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #32 on: February 13, 2008, 05:28:20 AM »

Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it won't use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...

select title, seftitle, date, category from articles ... should perform better than select * IMHO
« Last Edit: February 13, 2008, 05:30:25 AM by invarbrass »

centered

  • Guest
Re: Restructuring the articles table
« Reply #33 on: February 13, 2008, 05:28:52 AM »

here's a comparison that comes to mind:
separate summary and rest of the article:
search: the database has to perform search on two separate fields... possible overhead?
article page: retrieve both summary and atext fields, concatenate them to get the full text... overhead here also
search: the search has to search 3 fields currently...
article page: if the current function selects * from articles then the overhead is null since the db already pulled this data. I would assume a decision could be made to have the summary only in the category or home page and have the full article in the article page, or vice versa.
category or home page: db pulls full summary, no overhead

Quote
break-less article:
search: works right out of the box, no overhead
article page: retrieve the atext field only, no need for processing... no overhead

how is a break-less article different from an article with a separate summary?  I assume you mean an article with a break in it
THEN:
search: works right out of the box, no overhead
articlepage: retrieve the atext field only, no need for processing... no overhead
category page or home page: Overhead: must process partial request, awaiting signal to use the remainder.

centered

  • Guest
Re: Restructuring the articles table
« Reply #34 on: February 13, 2008, 05:29:51 AM »

Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it won't use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...

select title, seftitle, date, category from articles ... should perform better
select title, seftitle, date, category from articles ...

I think I said that before... lol

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #35 on: February 13, 2008, 05:32:54 AM »

Quote
here's a comparison that comes to mind:
separate summary and rest of the article:
search: the database has to perform search on two separate fields... possible overhead?
article page: retrieve both summary and atext fields, concatenate them to get the full text... overhead here also
search: the search has to search 3 fields currently...
article page: if the current function selects * from articles then the overhead is null since the db already pulled this data. I would assume a decision could be made to have the summary only in the category or home page and have the full article in the article page, or vice versa.
category or home page: db pulls full summary, no overhead

Quote
break-less article:
search: works right out of the box, no overhead
article page: retrieve the atext field only, no need for processing... no overhead

how is a break-less article different from an article with a separate summary?  I assume you mean an article with a break in it
THEN:
search: works right out of the box, no overhead
articlepage: retrieve the atext field only, no need for processing... no overhead
category page or home page: Overhead: must process partial request, awaiting signal to use the remainder.

no, break-less means the article without a break tag. we strip the break tag from atext when saving.
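
A minimal sketch of that save-time step, assuming the duplicate-summary layout discussed in this thread. The function name split_article and the returned array shape are hypothetical and not taken from snews; only the [break] tag and the summary/atext field names come from the discussion.

```php
<?php
// Hypothetical helper (not part of snews): derive both fields when saving.
// 'summary' holds the text before [break]; 'atext' holds the full article
// with the tag stripped, so article pages and search only need 'atext'.
function split_article(string $text): array
{
    $tag = '[break]';
    $pos = strpos($text, $tag);

    // No break tag: leave the summary empty and store the text unchanged.
    if ($pos === false) {
        return ['summary' => '', 'atext' => $text];
    }

    $summary = substr($text, 0, $pos);
    $rest    = substr($text, $pos + strlen($tag));

    return ['summary' => $summary, 'atext' => $summary . $rest];
}

$fields = split_article('<p>Intro.</p>[break]<p>Rest of the article.</p>');
echo $fields['summary'], "\n"; // <p>Intro.</p>
echo $fields['atext'], "\n";   // <p>Intro.</p><p>Rest of the article.</p>
```

With this layout the editor would still need the tag position back when an article is reopened for editing, which the sketch does not cover.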

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #36 on: February 13, 2008, 05:33:29 AM »

Quote
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...
except if someone chooses not to [break] an article (because it's rather short, maybe), in which case only a link will be shown (will it??), as there is no summary field content (error handling may be required here as well). (Currently, if there is no [break], the full, short article is displayed.)
What would be desirable??
not if we use the existing snews parser: if no break tag is found, grab the first N characters as summary

The existing parser displays the whole article if no [break] is found, which means a new method will be needed if some sort of summary is to be shown. In that case it should be done when the article is saved, unless the article is of a type that shouldn't be summarised: pages(), extra().
Which then introduces the conundrum of selecting validating text from the deliberately un[break]en text, and having a setting somewhere that allows the altering of the length.
Of all the things I have lost, it is my mind that I miss the most.

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #37 on: February 13, 2008, 05:41:37 AM »

Quote
The existing parser displays the whole article if no [break] is found, which means a new method will be needed if some sort of summary is to be shown. In that case it should be done when the article is saved, unless the article is of a type that shouldn't be summarised: pages(), extra().
Which then introduces the conundrum of selecting validating text from the deliberately un[break]en text, and having a setting somewhere that allows the altering of the length.

something like echo $r['summary'];  :)
yes, we need to reconstruct the full article before editing. this is a trade-off for the speed gain.

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #38 on: February 13, 2008, 06:01:37 AM »

Quote
The existing parser displays the whole article if no [break] is found, which means a new method will be needed if some sort of summary is to be shown. In that case it should be done when the article is saved, unless the article is of a type that shouldn't be summarised: pages(), extra().
Which then introduces the conundrum of selecting validating text from the deliberately un[break]en text, and having a setting somewhere that allows the altering of the length.
something like echo $r['summary'];  :)
yes, we need to reconstruct the full article before editing. this is a trade-off for the speed gain.

If $r['summary'] is empty, there is nothing to display.
If we then select the first N chars, we may break into an HTML tag, e.g.
<p>yada yada <strong>YADA</strong> yada yada yada yada </p>
if N is set to 20, then we will select
<p>yada yada <strong
which will be displayed as the summary like
yada yada <strong
which is not desirable at all, plus the </p> is missing, unless a certain mod is applied ;)
So, auto selecting the summary should be tossed out.
If no [break] is present when editing, no summary should be generated.
If no summary present when viewing the article listing, either the whole article is to be retrieved, or nothing is to be displayed under the link in the listing.
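
The truncation problem described above can be reproduced in a few lines. This is only an illustration using the sample markup from this post; the strip_tags() step is an assumed workaround that nobody in the thread proposed, and it discards all formatting from the summary.

```php
<?php
$html = '<p>yada yada <strong>YADA</strong> yada yada yada yada </p>';
$n = 20;

// A naive cut on the raw markup lands inside the <strong> tag.
$naive = substr($html, 0, $n);
echo $naive, "\n"; // <p>yada yada <strong

// Assumed workaround: drop the markup first, then cut the plain text.
$plain = rtrim(substr(strip_tags($html), 0, $n));
echo $plain, "\n"; // yada yada YADA yada
```

The plain-text variant can still cut in the middle of a word, so it avoids broken tags but does not by itself make a good summary.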


invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #39 on: February 13, 2008, 06:58:28 AM »

Quote
If $r['summary'] is empty, there is nothing to display.
If we then select the first N chars, we may break into an HTML tag, e.g.
<p>yada yada <strong>YADA</strong> yada yada yada yada </p>
if N is set to 20, then we will select
<p>yada yada <strong
which will be displayed as the summary like
yada yada <strong
which is not desirable at all, plus the </p> is missing, unless a certain mod is applied ;)
So, auto selecting the summary should be tossed out.
If no [break] is present when editing, no summary should be generated.
If no summary present when viewing the article listing, either the whole article is to be retrieved, or nothing is to be displayed under the link in the listing.

here's a snippet from snews.php:
Code: [Select]
$short_display = strpos($text, '[break]');
$shorten = $short_display == 0 ? 9999000 : $short_display;

the condition you describe can also happen with the default snews if the admin places the break tag incorrectly, e.g. before closing an HTML tag.
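
One detail of the quoted snippet is worth spelling out: strpos() returns false when the tag is missing and 0 when the tag is the very first thing in the text, and the loose == 0 test treats both cases the same way. The sketch below wraps the two quoted lines in a function to show that; shorten_loose and shorten_strict are hypothetical names, and the strict variant is an assumption about what one might want, not code from snews.

```php
<?php
// The two lines quoted from snews.php, wrapped in a function for testing.
function shorten_loose(string $text): int
{
    $short_display = strpos($text, '[break]');
    return $short_display == 0 ? 9999000 : $short_display;
}

// Hypothetical strict variant: only a missing tag falls back to the large value.
function shorten_strict(string $text): int
{
    $short_display = strpos($text, '[break]');
    return $short_display === false ? 9999000 : $short_display;
}

var_dump(shorten_loose('no tag here'));      // int(9999000), because false == 0
var_dump(shorten_loose('[break]rest'));      // int(9999000), position 0 also matches
var_dump(shorten_strict('[break]rest'));     // int(0)
var_dump(shorten_loose('intro[break]rest')); // int(5)
```

In practice a [break] at position 0 would mean an empty summary anyway, so the loose test may well be intentional.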
« Last Edit: February 13, 2008, 07:06:48 AM by invarbrass »

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #40 on: February 13, 2008, 08:05:11 AM »

Quote
the condition you describe can also happen with the default snews if the admin places the break tag incorrectly, e.g. before closing an HTML tag.

That is correct, but at least the admin can repair such an error

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #41 on: February 14, 2008, 11:19:14 AM »

Quote
the condition you describe can also happen with the default snews if the admin places the break tag incorrectly, e.g. before closing an HTML tag.
That is correct, but at least the admin can repair such an error

I agree, but the admin can fix the errors in this case also, by adding a break tag in the proper place. In the end, it doesn't make much difference whether the error occurs because the proposed system automatically crops the text or because the author inserts an improper tag: the end result is the same, and both cases are correctable by the site admin.

Anyways, my personal preference is:
Quote
no break tag found --> summary = empty --> only the article title is displayed in the index page

Of course we could implement an HTML tag-aware summary extraction mechanism, but it'd be too awkward and resource-hungry IMHO.

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #42 on: February 14, 2008, 11:53:37 AM »

Quote
Anyways, my personal preference is:
Quote
no break tag found --> summary = empty --> only the article title is displayed in the index page
Of course we could implement an HTML tag-aware summary extraction mechanism, but it'd be too awkward and resource-hungry IMHO.

Agree with all that... but

I still reckon it would be better to avoid duplicating the 'summary' text. When a link is selected you are already accessing that article's entry, so getting the 2 text fields and joining them will have very little overhead.
Granted, pulling the whole text might save a nanosecond or 2, but it is the duplication that irks me...

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #43 on: February 14, 2008, 01:19:56 PM »

Quote
Agree with all that... but

I still reckon it would be better to avoid duplicating the 'summary' text. When a link is selected you are already accessing that article's entry, so getting the 2 text fields and joining them will have very little overhead.
Granted, pulling the whole text might save a nanosecond or 2, but it is the duplication that irks me...

Yup, I agree with you. However, if you take scalability into consideration, it could be an issue. Let's see the pros and cons of both approaches. The following scenario applies to article page generation only:

No duplicate version, separate summary and rest of the article:
Cons:
1. We need to grab the summary field in addition to the atext field <<-- slight overhead
Code: [Select]
SELECT summary, atext, .... FROM articles WHERE ...
2. We have to concatenate the 2 strings:
Code: [Select]
$full_text = $r['summary'] . ' ' . $r['atext'];
I am not very well acquainted with PHP, but from my past experience with compiled languages, string operations are among the most expensive operations, which you had better avoid. Visit any assembly language newsgroup and you'll see many heavily optimized assembly-language versions of RTL functions. I don't know exactly how PHP handles concatenation, but suffice it to say you'd freak out if you saw the number of opcodes it executes in the background to carry out that simple-looking statement. This could be a bottleneck.
Pros:
The database will be a wee bit smaller.
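
Whether the concatenation is a real bottleneck is something that can be measured rather than guessed. The following is a rough micro-benchmark sketch; the field sizes and the iteration count are assumptions picked for illustration, and the timings will vary with PHP version and hardware.

```php
<?php
// Rough micro-benchmark sketch, not a rigorous measurement.
$r = [
    'summary' => str_repeat('s', 250),  // assumed summary size in bytes
    'atext'   => str_repeat('a', 5000), // assumed article body size in bytes
];
$iterations = 100000;

// Non-duplicate layout: join the two fields on every article page view.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $full_text = $r['summary'] . ' ' . $r['atext'];
}
$concat_seconds = microtime(true) - $start;

// Duplicate layout: use the atext field as it is.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $full_text = $r['atext'];
}
$plain_seconds = microtime(true) - $start;

printf(
    "concatenate: %.4f s, single field: %.4f s over %d iterations\n",
    $concat_seconds,
    $plain_seconds,
    $iterations
);
```

Dividing either total by the iteration count gives a per-request cost that can be compared against the time the database query itself takes.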

The duplicate version, duplicate summary + full article:
Pros:
1. Only the atext field is needed:
Code: [Select]
SELECT atext, .... FROM articles WHERE ...
2. No need for expensive string concatenations:
Code: [Select]
echo $r['atext'];
Cons:
The database file will be slightly larger... how big? Somewhere between 700 KB and 2.5 MB for 10,000 articles.
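
As a quick check of what that estimate implies per article, the arithmetic can be written out. This assumes 1 KB = 1024 bytes and an even spread over the 10,000 articles.

```php
<?php
// Back-of-the-envelope check of the size estimate (1 KB = 1024 bytes assumed).
$articles   = 10000;
$low_bytes  = 700 * 1024;        // 700 KB
$high_bytes = 2.5 * 1024 * 1024; // 2.5 MB

printf(
    "%.1f to %.1f bytes of duplicated summary per article\n",
    $low_bytes / $articles,
    $high_bytes / $articles
);
// prints: 71.7 to 262.1 bytes of duplicated summary per article
```

So the estimate corresponds to an average summary of very roughly 70 to 260 bytes.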

Taking both scenarios into consideration, it seems quite clear to me that the second approach will scale better. While low-traffic sites will never see any difference between the two approaches, super-busy sites should see better performance, at the cost of a few megabytes of disk space.

Besides, the summary field is non-indexed. So there won't be any negative effect if we put duplicate content in it.

Also keep in mind that with the non-duplicate approach, we actually have to perform the search on two fields, both summary and atext, instead of only the atext field as with the duplicate approach.

Future modifications to snews may also be affected with the no-duplicate approach. I don't know if snews utilizes MySQL's FTS for searching, but if we split the content into summary and atext, we might have to create indexes on both those fields. With the duplicate approach, only the atext field needs to be indexed.

However, the most important thing about them is this: simplistically speaking, with the duplicate method we simply introduce an extra field; no core algorithm is changed. The rest of the script will work right out of the box. But if we split the summary and article, the core logic of the program will have to be changed. We risk breaking the code in many places. Future development of snews will also be affected.

I proposed the summary field only because of scalability. There are many ways to achieve this, but the duplicate-content approach seems to be the most efficient to me, unless I am missing something.

There may be other issues (can't think of anything right now).

More discussion may be needed on this topic. What do you think?
« Last Edit: February 14, 2008, 01:25:36 PM by invarbrass »

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #44 on: February 15, 2008, 09:00:46 PM »

Quote
More discussion may be needed on this topic. What do you think?
I think I might be flogging a dying horse :)

Still, the discussion is worth it (maybe not for the horse, but anyway).

What overhead is involved with the MySQL CONCAT()?
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat