Wednesday, April 18, 2007

DB Opt

Here is a link to my complete report.
gabrielcorotan.com/sql.doc

I have summariezed it in this blog entry.


All about indexes – what I learned

1. Index – creates a separate table with the keys for numbers
a. Don’t need to full table search, used index to sort

b. Makes it easier when joining

c. They are generally smaller that the tables they were created from, because they contain only the fields of interest. If you are not searching on a column, don’t include it. That also makes for faster queries.

d. Three types
i. Unique – all of the values in the index need to be unique, ensures original columns.
ii. Full-text – can compare a word to the root of the word, or even find words that are similar to each other.
iii. Spatial – used for determining difference between points on space, must be used with a spatial database and spatial queries

e. B tree – sometimes known as a balanced tree. It is balanced because all nodes have the same number of jumps back to the root. It is a data structure that allows for insertions and deletions, and makes adjustments to the structure of the tree when these occur. Adjustments don’t always occur, which means the tree isn’t always in balance, but when it needs, it reconfigures, and balances itself out. It is used because is saves time because it doesn’t have to search all of the nodes in the tree, because they are organized. It simply has to grab the correct node.

f. Hash – A table of hashes is created. The foreign keys are used to create the hash, which is a smaller, easier-to-compare element. Instead of looking for the matching keyword, it creates a hash, and compares that hash to the hash table (index). One major drawback that I see is that there needs to be an encryption happening, at least the first time through, to create the hash, and encryption takes time and processing power. I think B tree sounds faster.

1. Split the set of queries into two groups: those that likely need no further optimization than simply adding indexes, and those that are more complex.
a. Here is a link for a full list of the queries separated into groups.
gabrielcorotan.com/mpr.doc

2. Recommend indexes that should be added to the stripped-down schema I've given you.
a. Index all of the primary keys in all of the tables

b. Hashes are good for reducing storage space, but consume a lot of memory, and use resources to encrypt

i. They are better for dynamic tables, and for tables that have large amounts of data, and they are modified often. B trees are not for dynamic tables because it may require rearranging the structure. Here are the tables in the database that should use a hashed index.

'activity'
'adjacentcampground'
'agentstake'
'amenity'
'announcement'
'assignedto'
'camp'
'campground'
'camppicture'
'camprule'
'camprulecategory'
'campsite' table
'codes'
'direction'
'facility'
'form'
'generalsite'
'glossary'
'nextpriorityschedule'
'priorityschedule'
'rulecategory'
'ruletable'
'stake'
'ward'

c. B trees are good for tables that need to be displayed, and are more static

d. Below a list of the tables in the database and the type of index that should be created for it:
i. Some table should use a b tree index because it is more static. There will be occasional additions and modifications to the table, but for the most part, it will be accessed and displayed more than it is modified. The b tree will allow faster search time. Also, this will be one of the most viewed tables on the site.

'faq'
'reservation'
'reservationdate'
'reservationforcampsite'
'unavailabledate'
'usercamp'
'usertable'


3. From the complex set, pick three queries to tune. Modify the queries and/or the database to optimize those queries.


SELECT r.reservid, r.reservedBy, r.stake, r.ward, r.groupType,
CAST(r.startDate AS DateTime) AS arrival,
CAST(r.endDate AS DateTime) AS departure,
(SELECT stakeName FROM Stake s
WHERE s.stakeID=r.stake) AS stakeName,
(SELECT wardName FROM Ward w
WHERE CAST(w.wardID AS NVARCHAR)=r.ward) AS wardName
FROM Reservation r
INNER JOIN ReservationForCampsite rfc ON (r.reservid=rfc.reservid)
WHERE ((CAST(r.startDate AS DateTime) >= 'firstDate' AND
CAST(r.startDate AS DateTime) <= 'lastDate') OR (CAST(r.endDate AS DateTime) >= 'firstDate' AND
CAST(r.endDate AS DateTime) <= 'lastDate')) AND r.campID=cid AND rfc.siteID=sid ORDER BY departure SELECT DISTINCT category, crc.catID FROM CampRuleCategory crc, CampRule cr WHERE crc.catID=cr.catID AND campID=cid SELECT r.reservid, r.reservedBy, r.stake, r.ward, r.groupType, CAST(r.startDate AS DateTime) AS arrival, CAST(r.endDate AS DateTime) AS departure, (SELECT stakeName FROM Stake s WHERE s.stakeID=r.stake) AS stakeName, (SELECT wardName FROM Ward w WHERE CAST(w.wardID AS NVARCHAR)=r.ward) AS wardName FROM Reservation r INNER JOIN ReservationForCampsite rfc ON (r.reservid=rfc.reservid) WHERE ((CAST(r.startDate AS DateTime) >= 'firstDate' AND
CAST(r.startDate AS DateTime) <= 'lastDate') OR (CAST(r.endDate AS DateTime) >= 'firstDate' AND
CAST(r.endDate AS DateTime) <= 'lastDate')) AND r.campID=cid AND rfc.siteID=sid
ORDER BY departure

4. Are there any database schema changes you recommend?
a. Every table should be in MyISAM - it is faster and uses much more overhead

b. I would be sure that on all fixed length values, such as zip codes or yes/no fields, I would use the appropriate data type. For example, yes/no only needs a Boolean, and that will save space compared to the char data type.

c. Eliminate the varchars for the same reason as above. This will tighten up the schema, and reduce the size on disk and improve query speeds.

d. I would change some of the larger tables by splitting the table. This will help when the table is large, and is accessed frequently, but there are only a few fields that are ever searched on. That will allow the query to run much faster because it won’t be messing with searching and joining fields that it doesn’t need.

No comments: