Author Topic: Restructuring the articles table  (Read 21101 times)

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Restructuring the articles table
« on: February 10, 2008, 02:40:08 PM »

The current version of sNews.php stores the summary and the full article in the same field ('text'). The summary is extracted from the full text by parsing the document until the [break] tag is found. This could lead to some performance degradation on a very busy site.

Alternatively, we could introduce another field, e.g. 'summary', to store only the summary of the article. That way we can simply display the summary and avoid searching for the break tag.

We can modify the admin article page in two ways:

1. We use two separate textareas for the summary and the full text on the add/edit article page.
2. A better solution would be to keep the same article admin page as before, with a single textarea for the article body. When the article is posted, we extract the summary section, remove the break tag from the full text, and save the data to the appropriate fields.

This approach makes the database slightly bigger, since we are duplicating part of the text field. What do you think?
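
A minimal sketch of the save-time split described in option 2, for illustration only. The helper name split_article() and the field names 'summary' and 'article' are assumptions, not existing sNews code, and treating an article without a [break] tag as its own summary is just one possible convention.

```php
<?php
// Hypothetical helper, not part of sNews: split the posted text once, on save.
const BREAK_TAG = '[break]';

function split_article(string $text): array
{
    $pos = strpos($text, BREAK_TAG);
    if ($pos === false) {
        // Assumption: with no break tag, the whole text doubles as the summary.
        return ['summary' => $text, 'article' => $text];
    }
    $summary = substr($text, 0, $pos);
    $rest = substr($text, $pos + strlen(BREAK_TAG));
    // 'article' keeps the full text with only the tag removed,
    // so the summary is stored twice (the duplication discussed above).
    return ['summary' => $summary, 'article' => $summary . $rest];
}

$fields = split_article('Intro paragraph.[break]Body of the article.');
echo $fields['summary'], "\n";
echo $fields['article'], "\n";
```

With this in place, the public pages would only echo one stored field and never search for the tag.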
« Last Edit: February 10, 2008, 07:35:38 PM by invarbrass »
Logged

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #1 on: February 13, 2008, 03:05:27 AM »

Why duplicate?
If you are going to collect and store the summary separately, remove it from the text, but reattach it when the article is viewed in full. Since you are querying the table anyway, surely the overhead of retrieving the extra field won't be excessive... would it? I mean, would $fulltext = $r['summary'].$r['all_the_rest']; be as 'resource hungry' as what we have now?
Note... when calling the article up for editing, you will have to rejoin the parts and put the [break] back.

Putting the [break] back with your proposal may be rather awkward if you have stripped it on save as you are suggesting. You would have to count the chars in the summary field, and then use that result to put the [break] back into the whole text.

Certainly, removing the parsing for [break] from the public access script would definitely have benefits. How it is achieved by the admin access script wouldn't be too important with regard to the result.


As for separate text fields, that I wouldn't go along with, but any improvement on what is in use now I would wholeheartedly support.
« Last Edit: February 13, 2008, 03:13:29 AM by philmoz »
Logged
Of all the things I have lost, it is my mind that I miss the most.

Joost

  • Guest
Re: Restructuring the articles table
« Reply #2 on: February 13, 2008, 03:30:31 AM »

Replacing [break] is not that awkward. On update, the text is split into two elements of an array. The function explode can be used, or the method used now. In the opposite direction, [break] is used to glue the intro text to the rest, using the function implode. Just a few lines of code are needed.
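
Roughly, the round trip could look like the sketch below. The sample text and variable names are made up for illustration; the limit of 2 passed to explode is my own addition, so that a second [break] further down the text would not create a third element.

```php
<?php
// Illustrative sample text, not taken from a real article.
$text = 'Intro.[break]Rest of the article.';

// On update: split into intro and rest. The limit of 2 keeps any
// later "[break]" inside the second element.
$parts = explode('[break]', $text, 2);

// For editing: glue the two pieces back together with the tag.
$rejoined = implode('[break]', $parts);

echo $parts[0], "\n";
echo $parts[1], "\n";
echo $rejoined, "\n";
```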

As for the performance issue: there is one, especially when the newest articles are displayed on the home page. I have not yet determined whether it is the query or the function that plays the major role.
 

Logged

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #3 on: February 13, 2008, 03:37:19 AM »

Quote from: Joost
Replacing [break] is not that awkward. On update, the text is split into two elements of an array. The function explode can be used, or the method used now. In the opposite direction, [break] is used to glue the intro text to the rest, using the function implode. Just a few lines of code are needed.
As for the performance issue: there is one, especially when the newest articles are displayed on the home page. I have not yet determined whether it is the query or the function that plays the major role.

I understand that; it's just that invar has proposed stripping [break] from the stored full text. The only way to recover its position is to use the stored summary to work out where it is supposed to go.
Logged
Of all the things I have lost, it is my mind that I miss the most.

Joost

  • Guest
Re: Restructuring the articles table
« Reply #4 on: February 13, 2008, 03:47:33 AM »

As I understand it, everything above [break] is stored in field 'shorttext' and everything below [break] is stored in field 'rest-of-the-text'. So the break is inserted after the end of 'shorttext' when the full article is requested.
Something like this:

implode("[break]", $array_all_text);

When newest articles are displayed on the frontpage, only 'shorttext' is retrieved. That way, you won't have to calculate where [break] is and what to leave out after read more.
« Last Edit: February 13, 2008, 03:54:40 AM by Joost »
Logged

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #5 on: February 13, 2008, 03:55:38 AM »

No, no, no.
invar is proposing to replicate the intro/summary (the bit before [break]), store it in a new table field, and remove [break] from the full article.
His aim is to remove the need to parse the full article to strip the [break] when the full article is viewed, and to remove the need to parse the article and select only the summary when the short text is required.
If achieved, that would significantly reduce the work of the script wherever a blog-style listing of articles is on display... especially if the setting is 10 or more.


« Last Edit: February 13, 2008, 03:58:44 AM by philmoz »
Logged
Of all the things I have lost, it is my mind that I miss the most.

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #6 on: February 13, 2008, 03:57:14 AM »

Quote from: philmoz
Why duplicate?
If you are going to collect and store the summary separately, remove it from the text, but reattach it when the article is viewed in full. Since you are querying the table anyway, surely the overhead of retrieving the extra field won't be excessive... would it? I mean, would $fulltext = $r['summary'].$r['all_the_rest']; be as 'resource hungry' as what we have now?

I am not proposing we reattach the full text. Instead, we extract the summary from the full text when adding/editing the article. The article field contains the full text (the summary plus the rest of the article). So when a user requests an index page, we just dump the contents of the summary field; no parsing is needed. When the user wants to view the full article, we just dump the contents of the article field; again, no parsing is necessary.

Quote from: philmoz
Note... when calling the article up for editing, you will have to rejoin the parts and put the [break] back.
Putting the [break] back with your proposal may be rather awkward if you have stripped it on save as you are suggesting. You would have to count the chars in the summary field, and then use that result to put the [break] back into the whole text.
Certainly, removing the parsing for [break] from the public access script would definitely have benefits. How it is achieved by the admin access script wouldn't be too important with regard to the result.
As for separate text fields, that I wouldn't go along with, but any improvement on what is in use now I would wholeheartedly support.

This is right. We will need to reconstruct the text by re-inserting the [break] tag. It IS a little cumbersome to rebuild the article using the strlen($summary) you mentioned. But this is how it has to be done if you do not want to add an extra summary textarea to the admin panel.

If we had an extra text box for the summary, we could safely skip this step. This is the price you'll have to pay if you want to retain the old interface.
Logged

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #7 on: February 13, 2008, 04:00:47 AM »

Quote from: Joost
As I understand it, everything above [break] is stored in field 'shorttext' and everything below [break] is stored in field 'rest-of-the-text'. So the break is inserted after the end of 'shorttext' when the full article is requested.
Something like this:
implode("[break]", $array_all_text);
When newest articles are displayed on the frontpage, only 'shorttext' is retrieved. That way, you won't have to calculate where [break] is and what to leave out after read more.

This is also a possibility, if you don't want to alter the fulltext field. We'll still need to remove the [break] tag when displaying the full article. At least the index pages will load faster.

However, we need to consider the cost/benefit ratio. Do you want to parse the same article 100000 times just to avoid reconstructing the text in case the author wants to edit it in the future? Most articles are never updated.
« Last Edit: February 13, 2008, 04:04:38 AM by invarbrass »
Logged

Joost

  • Guest
Re: Restructuring the articles table
« Reply #8 on: February 13, 2008, 04:01:19 AM »

Quote from: philmoz
No, no, no.
invar is proposing to replicate the intro/summary (the bit before [break]), store it in a new table field, and remove [break] from the full article.
His aim is to remove the need to parse the full article to strip the [break] when the full article is viewed, and to remove the need to parse the article and select only the summary when the short text is required.

I see it now. You are right. Sorry for the misunderstanding.
Then my idea is original. I actually like that idea.
Logged

centered

  • Guest
Re: Restructuring the articles table
« Reply #9 on: February 13, 2008, 04:03:58 AM »

Quote from: invarbrass
If we had an extra text box for the summary, we could safely skip this step. This is the price you'll have to pay if you want to retain the old interface.

I am following this topic, since my summary mod was never completed. Feel free to use it if you wish to finish your idea, invar:
http://snewscms.com/forum/index.php?topic=6482.0

Many news sites are structured to display the summary of an article in listings and the full text on the full page. That is how I think sNews should be.
Logged

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #10 on: February 13, 2008, 04:04:30 AM »

@invar.
If you separate the intro from the full article and store it, and store the rest of the article (without the intro), then when calling for full display you just need to join them together:
$fulltext = $r['summary'].$r['all_the_rest'];

When sending to the editor, you join them with the [break] sandwiched between:
$fulltext = $r['summary'].'[break]'.$r['all_the_rest'];
No need for counting/strlen etc.
Logged
Of all the things I have lost, it is my mind that I miss the most.

Joost

  • Guest
Re: Restructuring the articles table
« Reply #11 on: February 13, 2008, 04:07:08 AM »

Quote from: invarbrass
This is also a possibility, if you don't want to alter the fulltext field. We'll still need to remove the [break] tag when displaying the full article. At least the index pages will load faster.

No, no. I was thinking of two separate fields, with no redundancy:
field sometext and field othertext

In a category listing, only sometext is shown. When the actual article is requested or edited, sometext and othertext are glued together, using [break].

Update: This is not how I understand invarbrass's proposition.
This is my proposition. Sorry for the confusion.
« Last Edit: February 13, 2008, 04:15:31 AM by Joost »
Logged

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #12 on: February 13, 2008, 04:10:08 AM »

Quote from: philmoz
@invar.
If you separate the intro from the full article and store it, and store the rest of the article (without the intro), then when calling for full display you just need to join them together:
$fulltext = $r['summary'].$r['all_the_rest'];
When sending to the editor, you join them with the [break] sandwiched between:
$fulltext = $r['summary'].'[break]'.$r['all_the_rest'];

I am not proposing to remove the summary from the article field. Only the break tag is removed, and nothing else. We duplicate the summary from the article field.
So when viewing the index page:
Code:
echo $r['summary'];
When viewing the article page:
Code:
echo $r['article'];
When sending to the editor:
Code:
$len = strlen($r['summary']);
// Copy the rest of the text from the article, starting at offset $len
$rest_of_the_text = substr($r['article'], $len);
$editable_text = $r['summary'] . '[break]' . $rest_of_the_text;
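
The rebuild step only works if the stored summary is still an exact prefix of the stored article. A minimal sketch with that check made explicit follows; rebuild_editable_text() is a hypothetical helper, not sNews code, and returning the article unchanged when the check fails (or when the summary covers the whole article) is an assumed fallback.

```php
<?php
// Hypothetical helper: rebuild the text shown in the admin editor
// from a row holding 'summary' and 'article' (field names assumed).
function rebuild_editable_text(array $r): string
{
    $summary = $r['summary'];
    $article = $r['article'];
    $len = strlen($summary); // byte length, matching substr() below

    // Assumed fallback: leave the text alone when there is no summary,
    // when the summary is the whole article (no tag was present on save),
    // or when the summary is no longer a prefix of the article.
    if ($len === 0 || $summary === $article
        || strncmp($article, $summary, $len) !== 0) {
        return $article;
    }
    return $summary . '[break]' . substr($article, $len);
}

echo rebuild_editable_text([
    'summary' => 'Intro.',
    'article' => 'Intro.Body.',
]), "\n";
```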
« Last Edit: February 13, 2008, 04:18:35 AM by invarbrass »
Logged

invarbrass

  • Full Member
  • ***
  • Karma: 18
  • Posts: 117
    • http://snews.extremebittorrent.com
Re: Restructuring the articles table
« Reply #13 on: February 13, 2008, 04:15:53 AM »

Quote from: centered
I am following this topic, since my summary mod was never completed. Feel free to use it if you wish to finish your idea, invar:
http://snewscms.com/forum/index.php?topic=6482.0
Many news sites are structured to display the summary of an article in listings and the full text on the full page. That is how I think sNews should be.

Thanks quilini, I am taking a look at it. Let's see if we can do away with an extra summary text box.
Logged

philmoz

  • High flyer
  • ULTIMATE member
  • ******
  • Karma: 161
  • Posts: 1988
    • fiddle 'n fly
Re: Restructuring the articles table
« Reply #14 on: February 13, 2008, 04:18:59 AM »

@invar.
I acknowledge that you weren't going to separate the items. I am suggesting that this is what could be done.
Then you will not be storing an extra 80 to 250 or so characters per article in the db.
These characters (the summary text) are duplicated in your approach, making the db significantly larger on a site with masses of articles. Take H.A.C's site of apparently over 10,000 articles. If each article has a 250 character summary, that's over 2,500,000 surplus chars that are replicated.
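
The estimate can be checked quickly. The figures are the ones quoted above; the variable names are only for illustration, and the actual storage cost in bytes would depend on the column encoding.

```php
<?php
// Figures taken from the post; names are illustrative.
$articles = 10000;       // articles on the site
$summary_chars = 250;    // assumed summary length per article

$duplicated = $articles * $summary_chars;
echo number_format($duplicated), " duplicated characters\n";
```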
Logged
Of all the things I have lost, it is my mind that I miss the most.