sNews Forum

sNews 1.6 (previous version) => Exploring sNews Versatility => Topic started by: invarbrass on February 10, 2008, 02:40:08 pm

Title: Restructuring the articles table
Post by: invarbrass on February 10, 2008, 02:40:08 pm
The current version of sNews.php stores the summary and the full article in the same field ('text'). The summary is extracted from the full text by parsing the document until the [break] tag is found. This could lead to some performance degradation on a very busy site.

Alternatively, we could introduce another field e.g. summary to store only the summary of the article. That way we can simply display the summary and avoid searching for the break tag.

We can modify the admin article page in two ways:

1. We use two separate textareas for the summary and the full text on the add/edit article page.
2. A better solution would be to keep the same article admin page as before, with a single textarea for the article body. When the article is posted, we extract the summary section, remove the break tag from the full text, and save the data to the appropriate fields.

This approach will make the database slightly bigger, since we are duplicating part of the text field. What do you think?
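
The second option could be sketched roughly as follows. This is only a minimal illustration, not sNews code: the helper name split_article_text and the 'summary'/'text' keys are assumptions made for the example.

```php
<?php
// Hypothetical helper: split the editor text at the first [break] tag.
// Returns the summary plus the break-less full text, one value per column.
function split_article_text($text, $tag = '[break]')
{
    $pos = strpos($text, $tag);
    if ($pos === false) {
        // No break tag: no summary, full text is stored unchanged.
        return array('summary' => '', 'text' => $text);
    }
    $summary = substr($text, 0, $pos);
    $rest    = substr($text, $pos + strlen($tag));
    // The full text keeps the summary (duplicated) but drops the tag.
    return array('summary' => $summary, 'text' => $summary . $rest);
}

$fields = split_article_text('<p>Intro.</p>[break]<p>Body.</p>');
// $fields['summary'] is '<p>Intro.</p>'
// $fields['text']    is '<p>Intro.</p><p>Body.</p>'
```

With this layout the index page only needs the summary column and the article page only needs the text column, at the cost of storing the summary twice.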
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 03:05:27 am
Why duplicate?
If you are going to collect and store the summary separately, remove it from the text, but reattach it when the article is viewed in full. Since you are already querying the table, surely the overhead of retrieving the extra field won't be excessive... would it? I mean, would $fulltext = $r['summary'].$r['all_the_rest']; be as 'resource hungry' as what we have now?
Note: when calling the article up for editing, you will have to rejoin the parts and put the [break] back.

Putting the [break] back with your proposal may be rather awkward if you have stripped it on save, as you are suggesting. You would have to count the characters in the summary field, and then use that result to put the [break] back into the whole text.

Certainly, removing the parsing for [break] from the public access script would definitely have benefits. How that is achieved in the admin access script wouldn't matter much to the result.

As for separate text fields, I wouldn't go along with that, but I would wholeheartedly support any improvement on what is in use now.
Title: Re: Restructuring the articles table
Post by: Joost on February 13, 2008, 03:30:31 am
Replacing [break] is not that awkward. On update, the text is split into two elements of an array; the function explode can be used, or the method used now. In the opposite direction, [break] is used to glue the intro text to the rest, using the function implode. Only a few lines of code are needed.

As for the performance issue: there is one, especially when the newest articles are displayed on the home page. I have not yet determined whether the query or the function plays the major role.
 

Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 03:37:19 am
Quote
Replacing [break] is not that awkward. On update the text is split into two elements of an array. function explode can be used, or the method used now. In opposite direction [break] is used to glue the introtext to the rest, using function implode. Just a few lines of code needed.

As for the performance issue. There is one, especially newest articles displayed on home. I have not yet determined whether it is the query or the function which plays a major role.

Understand that, it's just that invar has proposed stripping [break] from the stored full text. The only way to recover it is to use the stored summary to work out where it is supposed to go.
Title: Re: Restructuring the articles table
Post by: Joost on February 13, 2008, 03:47:33 am
As I understand it, everything above [break] is stored in the field 'shorttext', and everything below [break] is stored in the field 'rest-of-the-text'. So the break is inserted after the end of 'shorttext' when the full article is requested.
Something like this:

implode("[break]", $array_all_text);

When newest articles are displayed on the frontpage, only 'shorttext' is retrieved. That way, you won't have to calculate where [break] is and what to leave out after read more.
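
A rough sketch of that explode/implode round trip, assuming two non-overlapping fields. The helper names and the 'shorttext'/'resttext' keys are made up for illustration and are not part of sNews.

```php
<?php
// Hypothetical helpers for the two-field layout; names are illustrative only.

// Split editor text at the first [break]; the limit of 2 keeps any
// later tag inside the second part.
function split_at_break($text)
{
    $parts = explode('[break]', $text, 2);
    return array(
        'shorttext' => $parts[0],
        'resttext'  => isset($parts[1]) ? $parts[1] : '',
    );
}

// Rebuild the editor text; articles saved without a tag get none added.
function join_for_editor($shorttext, $resttext)
{
    if ($resttext === '') {
        return $shorttext;
    }
    return implode('[break]', array($shorttext, $resttext));
}

$fields   = split_at_break('<p>Intro.</p>[break]<p>Body.</p>');
$editable = join_for_editor($fields['shorttext'], $fields['resttext']);
// $editable equals the original editor text again.
```

One caveat of this sketch: a [break] placed at the very end of the text leaves an empty second part, so the tag is not restored on edit.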
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 03:55:38 am
no no no.
invar is proposing to replicate the intro/summary (bit before [break]) and store in new table field, and remove [break] from the full article.
His aim is to remove the requirement to parse the full article to remove the [break] when full article is viewed, and to remove the requirement to parse the article and select only the summary when the shorttext is required.
Which, if achieved would reduce work of script significantly where a blog style listing of articles is on display... especially if the setting is set to 10 or more .


Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 03:57:14 am
Quote
why duplicate.
If you are going to collect and store the summary seperate, remove it from the text, but reattach when article is viewed in full. -- since you are querying the table, surely the overhead of retrieving the extra field won't be excessive... would it? I mean, would $fulltext = $r['summary'].$r['all_the_rest'];  be as 'resource hungry' as what is now?

I am not proposing that we reattach the full text. Instead, we extract the summary from the full text when adding/editing the article. The article field contains the full text (the summary plus the rest of the article). So when a user requests an index page, we just dump the contents of the summary field; no parsing is needed. When the user wants to view the full article, we just dump the contents of the article field; again, no parsing is necessary.

Quote
note... when calling article for edit, you will have to rejoin and replace the [break].
to replace the [break] with your proposal may be rather awkward  if you have stripped it on save as you are suggesting. You would have to access and count the chars in the summary field, and then use that result to put the [break] back in the whole text.

Certainly, removing the parsing for [break] from the public access script would definitly have benefits. However it is achieved by admin access script wouldn't be too important in regard to the result.

As for seperate textfields, that, I wouldn't go along with, but any improvement with what is in use now I would wholeheardtedly support.

This is right. We will need to reconstruct the text by re-inserting the [break] tag. It IS a little cumbersome to rebuild the article using the strlen($summary) you mentioned. But this is how it has to be done if you do not want to add an extra summary textarea to the admin panel.

If we had an extra text box for the summary, we could safely skip this step. This is the trade-off you pay if you want to retain the old interface.
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:00:47 am
Quote
As I understand it, everything above [break] is stored in field 'shorttext' everything below [break] is stored in field 'rest-of-the-text'. So break is inserted after the end of 'shorttext' when the full article is requested.
Something like this:

implode("[break]", $array_all_text);

When newest articles are displayed on the frontpage, only 'shorttext' is retrieved. That way, you won't have to calculate where [break] is and what to leave out after read more.

This is also a possibility, if you don't want to alter the fulltext field. We'll still need to remove the [break] tag while displaying the full article. At least the index pages will load faster.

However, we need to consider the cost/benefit ratio. Do you want to parse the same article 100000 times just to avoid reconstructing the text in case the author wants to edit it in the future? Most articles are never updated.
Title: Re: Restructuring the articles table
Post by: Joost on February 13, 2008, 04:01:19 am
Quote
no no no.
invar is proposing to replicate the intro/summary (bit before [break]) and store in new table field, and remove [break] from the full article.
His aim is to remove the requirement to parse the full article to remove the [break] when full article is viewed, and to remove the requirement to parse the article and select only the summary when the shorttext is required.

I see it now. You are right. Sorry for the misunderstanding.
Then my idea is original. I actually like that idea.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 04:03:58 am
Quote
If we had an extra text-box for summary, we can safely skip this step. This is a trade-off you'll have to pay if you want to retain the old interface.

I am paying attention to this topic, since my summary mod was never completed; feel free to use it if you wish to finish your idea, invar:
http://snewscms.com/forum/index.php?topic=6482.0

Many news sites have the structure to display the summary of an article, then display the full text in the full page.  That is how I think sNews should be.
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 04:04:30 am
@invar.
If you separate the intro from the full article and store it, and store the rest of the article (without the intro), then when calling for full display you just need to join them together:
$fulltext = $r['summary'].$r['all_the_rest'];

When sending to the editor, you join them with the [break] sandwiched between:
$fulltext = $r['summary'].'[break]'.$r['all_the_rest'];
No need for counting/strlen etc.
Title: Re: Restructuring the articles table
Post by: Joost on February 13, 2008, 04:07:08 am
Quote
This is also a possibility, if you don't want to alter the fulltext field. We'll still need to remove the [break] tag while displaying the full article. At least the index pages will load faster.

No, no. I was thinking of two separate fields, with no redundancy:
field sometext and field othertext

In a category listing, only sometext is shown. When the actual article is requested or edited, sometext and othertext are glued together, using [break].

Update: This is not how I understand invarbrass's proposition.
This is my proposition. Sorry for the confusion.
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:10:08 am
Quote
@invar.
If you seperate intro from full article and store it, and store the rest of thearticle (without the intro), then when calling for full display, just need to join them together
$fulltext = $r['summary'].$r['all_the_rest'];

when sending to editor, you join them with the [break] sandwiched between...
$fulltext = $r['summary'].'[break]'.$r['all_the_rest'];

I am not proposing to remove the summary from the article field. Only the break tag is removed, and nothing else. We duplicate the summary from the article field.
So when viewing the index page:
Code: [Select]
echo $r['summary'];
When viewing the article page:
Code: [Select]
echo $r['article'];
When sending to the editor:
Code: [Select]
$len = strlen($r['summary']);
// Copy the rest of the text from the article, starting at offset $len
$rest_of_the_text = substr($r['article'], $len);
$editable_text = $r['summary'] . '[break]' . $rest_of_the_text;
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:15:53 am
Quote
I am paying attention to this article, since my summary mod was never completed, but utilize it, if you wish to finish your idea invar
http://snewscms.com/forum/index.php?topic=6482.0

Many news sites have the structure to display the summary of an article, then display the full text in the full page.  That is how I think sNews should be.
Thanks quilini, I am taking a look at it. Let's see if we can do away with an extra summary text-box.
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 04:18:59 am
@invar.
I acknowledge that you weren't going to separate the items. I am suggesting that this is what could be done.
Then you will not be storing an extra 80?? - 250?? characters in the db.
These characters (which is the summary text) are duplicated in your approach, making the db significantly larger in a site with masses of articles. Take H.A.C's site of apparently over 10,000 articles. If each article has a 250 character summary, that's over 2,500,000 surplus chars that are replicated.
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:27:22 am
Quote
@invar.
I acknowledge that you weren't going to seperate the items. I am suggesting that is what could be done.
Then you will not be storing an extra 80?? - 250?? characters in the db.
These characters (which is the summary text) are duplicated in your approach, making the db significantly larger in a site with masses of articles. Take H.A.C's site of apparently over 10,000 articles. If each article has a 250 character summary, that's over 2,500,000 surplus chars that are replicated.

I agree, all three methods mentioned so far have their pros and cons. We need to select the best approach for our purpose.

The DB will be larger with the duplicate-summary method, but I think the size increase won't be significant. Taking HAC's site as an example, this approach will increase the DB by only about 2.4 MB (for a summary size of 250 chars; about 780 KB if the summary is 80 characters), provided he does have 10,000 articles. So even for the most extreme case, the trade-off isn't that big. And looking at his site (and others), most of the articles will never be edited in the future. So we could save a whole lot of processing if we just separated the summary from the full article. This will benefit large sites like HAC's.
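
The size estimate can be checked with a few lines of arithmetic. This sketch assumes one byte per character (a single-byte encoding) and ignores any per-row storage overhead; the function name is made up for the example.

```php
<?php
// Extra bytes stored when every article repeats its summary,
// assuming one byte per character (a single-byte encoding).
function duplicate_overhead_bytes($articles, $summary_chars)
{
    return $articles * $summary_chars;
}

foreach (array(80, 250) as $chars) {
    $bytes = duplicate_overhead_bytes(10000, $chars);
    printf("%3d chars: %d bytes, about %.2f MB\n", $chars, $bytes, $bytes / (1024 * 1024));
}
```

For 10,000 articles this works out to 800,000 bytes (about 0.76 MB) for 80-character summaries and 2,500,000 bytes (about 2.38 MB) for 250-character summaries. A multi-byte encoding such as UTF-8 could raise these figures.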
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:36:30 am
I forgot to mention, if you separate the summary and article, you will need to alter the search code as well. But for the "duplication approach", the default search method will work out of the box.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 04:38:22 am
Quote
Let's see if we can do away with an extra summary text-box.

Why? Isn't that the current functionality? I would like to see the summary and the full article separated. The db doesn't have to access the same information twice, and it will be a few lines less code to deal with.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 04:39:51 am
Quote
I forgot to mention, if you separate the summary and article, you will need to alter the search code as well. But for the "duplication approach", the default search method will work out of the box.

summary LIKE '%$keywords[$i]%' ?? Not too hard
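
To illustrate the extra LIKE condition, here is a hedged sketch of building a search clause over several columns. It is not the sNews search code: the helper name, the use of ? placeholders, and the AND between keywords are all assumptions for the example.

```php
<?php
// Hypothetical helper: build a WHERE fragment that requires every keyword
// to match at least one of the listed columns. Values are returned
// separately so they can be bound as parameters instead of interpolated.
function build_search_where(array $keywords, array $columns)
{
    $clauses = array();
    $params  = array();
    foreach ($keywords as $keyword) {
        $ors = array();
        foreach ($columns as $column) {
            $ors[]    = "$column LIKE ?";
            $params[] = '%' . $keyword . '%';
        }
        $clauses[] = '(' . implode(' OR ', $ors) . ')';
    }
    return array(implode(' AND ', $clauses), $params);
}

list($where, $params) = build_search_where(array('php'), array('summary', 'atext'));
// $where  is '(summary LIKE ? OR atext LIKE ?)'
// $params is array('%php%', '%php%')
```

Note that % and _ inside a keyword still act as LIKE wildcards here; the sketch does not escape them.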
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:44:36 am
Quote
Let's see if we can do away with an extra summary text-box.

Why? Isn't that current functionality?  I would like to see the summary and the full article seperated.  The db doesn't have to access the same information twice and it will be a few lines less code to deal with.

I agree. However, while the "2-editor" approach will be easier (and cleaner) to implement, it's also slightly cumbersome on the user's end. Think about it: the user needs to copy-paste the content from the article editor. With the old-style "single-editor" approach, the user only needs to click on the break button.
Title: Re: Restructuring the articles table
Post by: Joost on February 13, 2008, 04:45:22 am
@Philmozz!!!!

New avatar? How is it hanging? ;D

Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 04:46:14 am
Quote
I forgot to mention, if you separate the summary and article, you will need to alter the search code as well. But for the "duplication approach", the default search method will work out of the box.

summary LIKE '%$keywords[$i]%' ?? Not too hard

Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;) Not to mention snews' default search function isn't exactly top-notch.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 04:56:16 am
Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)

or to retrieve a field it wont use (select *)?
select title, seftitle, date, category from articles ...

Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 04:56:44 am
search... damn, that one always bites me. :)
... and just had to try searching for [break]... it is not invisible to search.
Just did this on HAC's site... 17590 results were found for query [break].

ok, assuming searching the 'separated' summary can be easily overcome, what would the overhead be when concatenating summary and rest as I put forward, as opposed to pulling the full ([break]less) article from the db?

hmmm, aaaand, [break] detection determines if short article is or is not shown.
So, that detection would need to be done using the new articles.summary field. (ie:- empty or not)
If the admin removes the [break], (converting to page type, or extra type article), then emptying the summary field in both cases is required.
Also need to look at what else is affected if the [break] tag is not actually stored with the text.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 04:57:28 am
Quote
Not to mention snews' default search function isn't exactly top-notch.

Not the first one to mention that... care to elaborate?

Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 05:00:01 am
Quote
ok, assuming searching the 'seperated' summary can be easily overcome, what would the overhead be when concatenating summary and rest as I put forward, as opposed to pulling full ([break]less) article from db?

I would assume the overhead would be less than the db pulling the entire article and doing nothing with the remainder after the break function...
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 05:06:54 am
@equilni
I think we can all agree on that .
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:07:23 am
Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it wont use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:14:51 am
Quote
search... damn, that one always bites me. :)
... and just had to try searching for [break]... it is not invisible to search.
Just did this on HAC's site... 17590 results were found for query [break].

ok, assuming searching the 'seperated' summary can be easily overcome, what would the overhead be when concatenating summary and rest as I put forward, as opposed to pulling full ([break]less) article from db?

Here's a comparison that comes to mind:

Separate summary and rest of the article:
search: the database has to search two separate fields... possible overhead?
article page: retrieve both the summary and atext fields, then concatenate them to get the full text... overhead here also

Break-less article:
search: works right out of the box, no overhead
article page: retrieve the atext field only, no need for processing... no overhead
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 05:18:47 am
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

Except if someone chooses not to [break] an article (because it's rather short, maybe), in which case only a link will be shown (will it??), as there is no summary field content [[error handling required here as well, maybe]]. (Currently, if there is no [break], the full (but short in total length) article is displayed.)
What would be desirable??
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 05:22:19 am
Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it wont use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:24:20 am
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...
except if someone chooses not to [break] an article (because it's rather short, maybe) in which case only a link will be shown (will it??) as there is no summary field content[[error handling required here as well maybe]]. (currently, if no [break], full (but short in total length) article is displayed.)
What would be desirable??

Not if we use the existing snews parser: if no break tag is found, grab the first N characters as the summary.
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:28:20 am
Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it wont use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...

select title, seftitle, date, category from articles ... should perform better than select * IMHO
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 05:28:52 am
here's a comparison that comes to mind:
separate summary and rest of the article:
search: the database has to perform search on two separate fields... possible overhead?
article page: retrieve both summary and atext fields, contatenate them to get the full text... overhead here also
search: the search has to search 3 fields currently...
article page: if the current function selects * from articles, then the overhead is nil, since the db has already pulled this data. I would assume a decision could be made to show only the summary on the category or home page and the full article on the article page, or vice versa.
category or home page: db pulls full summary, no overhead

Quote
break-less article:
search: works right out of the box, no overhead
article page: retrieve the atext field only, no need for processing... no overhead

How is a break-less article different from a separate summary and article? I assume you mean an article with a break in it.
THEN:
search: works right out of the box, no overhead
articlepage: retrieve the atext field only, no need for processing... no overhead
category page or home page: Overhead: must process partial request, awaiting signal to use the remainder.
Title: Re: Restructuring the articles table
Post by: centered on February 13, 2008, 05:29:51 am
Quote
Not too hard for me either, but it certainly won't be too easy for the database engine which has to search an extra field...  ;)
or to retrieve a field it wont use (select *)?
select title, seftitle, date, category from articles ...
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...

 I meant for the search function...

select title, seftitle, date, category from articles ... should perform better
select title, seftitle, date, category from articles ...

I think i said that before... lol
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:32:54 am
here's a comparison that comes to mind:
separate summary and rest of the article:
search: the database has to perform search on two separate fields... possible overhead?
article page: retrieve both summary and atext fields, contatenate them to get the full text... overhead here also
search: the search has to search 3 fields currently...
article page: if the current function selects * from articles then the overhead is null since the db already pulled this data. I wold assume a decision could be made to have the summary only in the category or home page and have the full article in the article page, or vice versa.
category or home page: db pulls full summary, no overhead

Quote
break-less article:
search: works right out of the box, no overhead
article page: retrieve the atext field only, no need for processing... no overhead

how is a breakless article different from a seperate summary an article?  I assume you mean a article with a break in it
THEN:
search: works right out of the box, no overhead
articlepage: retrieve the atext field only, no need for processing... no overhead
category page or home page: Overhead: must process partial request, awaiting signal to use the remainder.

no, break-less means the article without a break tag. we strip the break tag from atext when saving.
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 05:33:29 am
Exactly, for index pages:
select title, id, summary...
And the article pages:
select title, id, atext...
except if someone chooses not to [break] an article (because it's rather short, maybe) in which case only a link will be shown (will it??) as there is no summary field content[[error handling required here as well maybe]]. (currently, if no [break], full (but short in total length) article is displayed.)
What would be desirable??
not if we we use the existing snews parser: if no break tag is found, grab the first N characters as summary

The existing parser displays the whole article if no [break] is found, which means that a new method will need to be included if some sort of summary is to be shown. In that case, it should be done when the article is saved, except if it is of a type that shouldn't be summarised -- pages(), extra().
Which then introduces the conundrum of selecting valid text from the deliberately un[break]en text, and of having a setting somewhere that allows the length to be altered.
Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 05:41:37 am
existing parser displays whole article if no [break] is found, which means that a new method will need to be included if some sort of summary is to be shown, in which case, it should be done when article is saved, except if it is of the type that shouldn't be summarised -- pages(),extra().
Which then introduces the conundrum of selecting validating text from the deliberately un[break]en text, and having a setting somewhere that allows the altering of the length.
something like echo $r['summary'];  :)
Yes, we need to reconstruct the full article before editing. This is a trade-off for the speed gain.
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 06:01:37 am
existing parser displays whole article if no [break] is found, which means that a new method will need to be included if some sort of summary is to be shown, in which case, it should be done when article is saved, except if it is of the type that shouldn't be summarised -- pages(),extra().
Which then introduces the conundrum of selecting validating text from the deliberately un[break]en text, and having a setting somewhere that allows the altering of the length.
something like echo $r['summary'];  :)
yes, we need to reconstruct the full article before editing. this is a trade-off for the the speed gain.

If $r['summary'] is empty, there is nothing to display.
If we then select the first N chars, we may break into an HTML tag, e.g.
<p>yada yada <strong>YADA</strong> yada yada yada yada </p>
if N is set to 20, then we will select
<p>yada yada <strong
which will be displayed as the summary like
yada yada <strong
which is not desirable at all, plus the </p> is missing, unless a certain mod is applied ;)
So, auto selecting the summary should be tossed out.
If no [break] is present when editing, no summary should be generated.
If no summary present when viewing the article listing, either the whole article is to be retrieved, or nothing is to be displayed under the link in the listing.
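
The truncation problem described above can be reproduced in a few lines. The check function below is only an illustrative assumption (a naive test for an unclosed tag), not something sNews provides.

```php
<?php
// True when a fragment ends inside an unclosed HTML tag, e.g. '... <strong'.
// Naive check: the last '<' comes after the last '>' (or there is no '>').
function ends_inside_tag($fragment)
{
    $open  = strrpos($fragment, '<');
    $close = strrpos($fragment, '>');
    return $open !== false && ($close === false || $open > $close);
}

$html = '<p>yada yada <strong>YADA</strong> yada yada yada yada </p>';
$cut  = substr($html, 0, 20);      // '<p>yada yada <strong'
var_dump(ends_inside_tag($cut));   // bool(true)
```

Detecting the cut is the easy part; repairing it (backing up to before the tag and closing any open elements) is what makes automatic summary selection awkward.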

Title: Re: Restructuring the articles table
Post by: invarbrass on February 13, 2008, 06:58:28 am
if $r['summary']; is empty, there is nothing to display.
If we then select the first N chars, we may break into a html tag.. eg.
<p>yada yada <strong>YADA</strong> yada yada yada yada </p>
if N is set to 20, then we will select
<p>yada yada <strong
which will be displayed as the summary like
yada yada <strong
which is not desirable at all, plus the </p> is missing, unless a certain mod is applied ;)
So, auto selecting the summary should be tossed out.
If no [break] is present when editing, no summary should be generated.
If no summary present when viewing the article listing, either the whole article is to be retrieved, or nothing is to be displayed under the link in the listing.

here's a snippet from snews.php:
Code: [Select]
$short_display = strpos($text, '[break]');
$shorten = $short_display == 0 ? 9999000 : $short_display;

The condition you describe can also happen with the default snews, if the admin places the break tag incorrectly, i.e. before closing an HTML tag.
Title: Re: Restructuring the articles table
Post by: philmoz on February 13, 2008, 08:05:11 am
the condition you describe can also happen with the default snews. if the admin places the break tag incorrectly, i.e. before closing a HTML tag.
That is correct, but at least the admin can repair such an error
Title: Re: Restructuring the articles table
Post by: invarbrass on February 14, 2008, 11:19:14 am
the condition you describe can also happen with the default snews. if the admin places the break tag incorrectly, i.e. before closing a HTML tag.
That is correct, but at least the admin can repair such an error

I agree, but admin can fix the errors in this case also... by adding a break tag in the proper place. In the end, it doesn't make much difference whether the error occurs due to the proposed system automatically cropping the text, or the author inserts an improper tag... the end result is the same, and both cases are correctable by the site-admin.

Anyways, my personal preference is:
Quote
no break tag found --> summary = empty --> only the article title is displayed in the index page

Of course we could implement an HTML tag-aware summary extraction mechanism, but it'd be too awkward and resource-hungry IMHO.
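
That preference (empty summary means title only) might look something like this on the index page. The function name, the row keys, and the h2 markup are assumptions for the sketch, not the actual sNews output.

```php
<?php
// Hypothetical index-page renderer: always show the title, and show the
// stored summary only when one was saved (i.e. a break tag was present).
function render_index_entry(array $row)
{
    $html = '<h2>' . htmlspecialchars($row['title']) . '</h2>';
    if ($row['summary'] !== '') {
        // The summary is assumed to be stored as ready-to-print HTML.
        $html .= $row['summary'];
    }
    return $html;
}

echo render_index_entry(array('title' => 'Short note', 'summary' => ''));
// prints only the heading for an article saved without a break tag
```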
Title: Re: Restructuring the articles table
Post by: philmoz on February 14, 2008, 11:53:37 am
Anyways, my personal preference is:
Quote
no break tag found --> summary = empty --> only the article title is displayed in the index page
Of course we could implement an HTML tag-aware summary extraction mechanism, but it'd be too awkward and resource-hungry IMHO.
Agree with all that... but

I still reckon it would be better to avoid duplication of the 'summary' text. When a link is selected, you are already accessing that article's entry, so getting the 2 text fields and joining them will have very little overhead.
Granted, pulling the whole text might save a nanosecond or 2, but it is the duplication that irks me...
Title: Re: Restructuring the articles table
Post by: invarbrass on February 14, 2008, 01:19:56 pm
Agree with all that... but

I still reckon it would be better to avoid duplication of the 'summary' text, as when a link is selected, you are already accessing that article's entry, getting the 2 text fields and joining them will have very little overhead.
Granted, pulling the whole text might save a nano second or 2, but it is the duplication that irks me...

Yup, I agree with you. However, if you take scalability into consideration, it could be an issue. Let's see the pros and cons of both approach. The following scenario applies to article page generation only:

No duplicate version, separate summary and rest of the article:
Cons:
1. we need to grab the summary field in addition to the atext field <<-- slight overhead
Code:
SELECT summary, atext, .... FROM articles WHERE ...
2. We have to concatenate the 2 strings:
Code:
$full_text = $r['summary'] . ' ' . $r['atext'];
I am not very well acquainted with PHP, but from my past experience with compiled languages, string operations are among the most expensive operations, and you had better avoid them. Visit any assembly language newsgroup and you'll see many heavily optimized assembly-language versions of RTL functions. I don't know exactly how PHP handles concatenation, but suffice it to say you'd freak out if you saw the number of opcodes it executes in the background to carry out that simple-looking statement. This could be a bottleneck.
Pros:
The database will be a wee bit smaller.

The duplicate version, duplicate summary + full article:
Pros:
1. Only the atext field is needed:
Code:
SELECT atext, .... FROM articles WHERE ...
2. No need for expensive string concatenations:
Code:
echo $r['atext'];
Cons:
The database file will be slightly larger. How big? Somewhere between 700 KB and 2.5 MB for 10,000 articles.

Taking both scenarios into consideration, it seems quite obvious that the second approach will scale better. While low-traffic sites will never see any difference between the two approaches, super-busy sites will get better performance at the cost of a few megabytes of disk space.
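
One way to check the concatenation concern is simply to time it. The sketch below is a rough measurement, not a benchmark of sNews itself: the sample sizes (a 250-byte summary and a 5 KB body) and the iteration count are assumptions picked for illustration, and the numbers will vary by machine and PHP version.

```php
<?php
// Rough timing sketch with made-up sample data; results vary by machine.
$r = [
    'summary' => str_repeat('s', 250),   // assumed ~250-byte summary
    'atext'   => str_repeat('a', 5000),  // assumed ~5 KB article body
];

$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    // The statement under discussion.
    $full_text = $r['summary'] . ' ' . $r['atext'];
}
$elapsed = microtime(true) - $start;

printf(
    "%d concatenations: %.4f s total, %.3f microseconds each\n",
    $iterations,
    $elapsed,
    $elapsed / $iterations * 1e6
);
```

Comparing the per-concatenation figure with the time of a single database query on the same machine gives a sense of whether the string join matters for a given site.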

Besides, the summary field is non-indexed, so there won't be any negative effect if we put duplicate content in it.

Also keep in mind that with the non-duplicate approach, we actually have to search two fields, both summary and atext, instead of only the atext field with the duplicate approach.

Future modifications to snews may also be affected by the no-duplicate approach. I don't know if snews utilizes MySQL's FTS for searching, but if we split the content into summary and atext, we might have to create indexes on both those fields. With the duplicate approach, only the atext field needs to be indexed.

However, the most important point is this: simply put, with the duplicate method we just introduce an extra field, and no core algorithm is changed. The rest of the script will work right out of the box. But if we split the summary and article, the core logic of the program will have to be changed. We risk breaking the code in many places. Future development of snews will also be affected.

I proposed the summary field only because of scalability. There are many ways to achieve this, but the duplicate-content approach seems to be the most efficient to me, unless I am missing something.

There may be other issues (can't think of anything right now).

More discussion may be needed on this topic. What do you think?
Title: Re: Restructuring the articles table
Post by: philmoz on February 15, 2008, 09:00:46 pm
Quote
More discussion may be needed on this topic. What do you think?
I think I might be flogging a dying horse :)

Still, the discussion is worth it (maybe not for the horse, but anyway).

What overhead is involved with the MySQL CONCAT()?
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat
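
For comparison, doing the join in the query rather than in PHP might look like the sketch below. The column names follow this thread; the `id` column and the WHERE clause are assumptions about the table layout. One thing worth knowing is that MySQL's CONCAT() returns NULL if any argument is NULL, so the sketch wraps summary in COALESCE().

```php
<?php
// Sketch only: `summary` and `atext` are the columns named in this thread;
// the `id` column and the lookup by id are assumptions.
// COALESCE guards against a NULL summary, since CONCAT() returns NULL
// when any of its arguments is NULL.
$sql = "SELECT CONCAT(COALESCE(summary, ''), ' ', atext) AS full_text FROM articles WHERE id = ?";

echo $sql, "\n";
```

The string would be passed to a prepared statement with the article id bound to the placeholder. This does not answer the overhead question by itself; it only moves the concatenation from PHP to the database server.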
Title: Re: Restructuring the articles table
Post by: funlw65 on March 18, 2008, 06:43:16 am
1. Two text areas in the article create/edit form.
  - The first part named: 'The Beginning part of article'.
  - The second part named: 'The Rest of article (if you want to split it for Continue reading)'
The article can't be submitted if the first text area is empty. This way, the summary table field is NEVER empty.

Edit: Also, maybe a blogger wants the entire article on the first page (I saw this on many blogs), so you can't limit $summary to a fixed number of characters. This is another reason why the summary can't be empty. Look at Drupal CMS.

2. Why join them? You just echo($summary) on the home page. For the article page, echo(summary) and echo(restOfText). You can even skip testing whether restOfText (I don't remember how you named the second field...) is empty, because if it is empty, nothing is echoed.

This way, break is history.

EDIT: Also a good RSS feed, with more options (title, short or entire article, no verification needed).

3. I don't know about search, but I think it doesn't matter much if the search function is a little bit slow... not every visitor uses it as much as viewing articles.
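
Points 1 and 2 above can be sketched as follows. This is only an illustration of the two-field idea: the function names and the `rest` key for the second field are hypothetical, not taken from sNews.

```php
<?php
// Hypothetical helpers; `rest` is an assumed name for the second field.

// Point 1: refuse the article when the first text area is empty.
function can_submit(string $summary): bool
{
    return trim($summary) !== '';
}

// Home page: only the summary is shown.
function show_home(array $r): void
{
    echo $r['summary'];
}

// Article page: echo both parts, no join and no [break] handling.
function show_article(array $r): void
{
    echo $r['summary'];
    // Echoing an empty string prints nothing, so no emptiness test is needed.
    echo $r['rest'];
}

$r = ['summary' => 'Beginning of the article. ', 'rest' => 'The rest of it.'];
show_article($r);
echo "\n";
```

An RSS feed could reuse the same two functions to offer either the short or the full version.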
Title: Re: Restructuring the articles table
Post by: jackp on March 26, 2008, 02:01:28 am
As a complete newbie, please don't thrash me if I get this wrong. I know nothing about which method is best for machine load.

I have been thinking about this for a while, because I would like to implement some sort of "easy" mod/s to improve both RSS feeds and article summaries in one go.

First I thought of a split panel in admin to write the summary and the article.

Second I thought you could just select the first x (200) characters of each article.

Now I am thinking after reading your discussion maybe a new set of tags [summary] [/summary] can be used in text editor. This way you can use them anywhere in article and the text inside the tag could be used for the rss feed also.

No extra fields in db need be created, only 20 bytes extra to store the summary tags. I don't know whether it would be difficult to add code to parse for extra tags or not. The [break] tag just isn't quite enough for me.
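
Parsing for such a tag pair would not need much code. A minimal sketch, assuming the tags are written literally as [summary] and [/summary] and that only the first pair counts (both function names are made up for this example):

```php
<?php
// Hypothetical helpers for the proposed [summary]...[/summary] tags.

// Returns the text inside the first tag pair, or '' when there is none.
function extract_summary(string $text): string
{
    // Non-greedy match; the `s` modifier lets the summary span newlines.
    if (preg_match('/\[summary\](.*?)\[\/summary\]/s', $text, $m)) {
        return trim($m[1]);
    }
    return '';
}

// Removes the tags themselves so the full article displays cleanly.
function strip_summary_tags(string $text): string
{
    return str_replace(['[summary]', '[/summary]'], '', $text);
}

$text = 'Intro. [summary]Short version.[/summary] More detail.';
echo extract_summary($text), "\n";    // Short version.
echo strip_summary_tags($text), "\n"; // Intro. Short version. More detail.
```

The extracted text could feed both the index page and the RSS feed, while the article page would use the stripped version.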

Jackp
Title: Re: Restructuring the articles table
Post by: Joost on March 26, 2008, 02:07:12 am
As a general remark: Why not use the meta description instead?