
Thursday, December 13, 2007

SQLServer: What am I running anyway?

Microsoft has enough crazy patches and service packs to drive a DBA nuts! (That said, Oracle, MySQL, and most other databases have their fair share as well.)

The question is, though: what version are we running anyway? For those that don't come from the Microsoftie world, the whole daily build number thing in the version is a little out of control.

For example:

Server A:
1> SELECT @@VERSION
2> go
Microsoft SQL Server 2005 - 9.00.2164.00 (X64)
Jun 14 2006 12:39:07
Copyright (c) 1988-2005 Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 1)
Server B:
1> SELECT @@VERSION
2> go

Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86)
Apr 14 2006 01:12:25
Copyright (c) 1988-2005 Microsoft Corporation
Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
So, what is the difference between these two servers?

Besides the obvious X86 vs X64, the only real difference is the build number. They are both actually SP1.

So we should really be checking this way:

Server A:
1> SELECT SERVERPROPERTY('productversion'),SERVERPROPERTY('productlevel')
2> GO

9.00.2164.00 SP1
Server B:
1> SELECT SERVERPROPERTY('productversion'),SERVERPROPERTY('productlevel')
2> GO

9.00.2047.00 SP1
Just another dorky pet peeve of mine :)

Thursday, November 29, 2007

SQLServer 2005: Full Text Issue

In my opinion, there is security, and then there is absurd paranoia. Microsoft tends to ride this line a little too much lately. Trying to be the end-all of security, they forget that we have firewalls, access lists, etc., that should be considered when locking things down.

The interesting issue I've run into with Full Text searches is, somewhat unsurprisingly, a known Microsoft Bug / Feature.

The question I'd like to pose Microsoft is this:

If the word breaker files I've installed came from the CD / Microsoft bits that I got from Microsoft, shouldn't I be able to assume that they've verified where they came from? And for that matter, if I install a third-party word breaker file after my SQLServer Full Text install, shouldn't it be my responsibility to verify the validity of those files?

Why should Microsoft automatically be checking these for me? (And why should they do it every time the word breaker info is loaded into memory?)

BTW - if you are having the problem, it presents itself something like this:

Running SQLServer 2005 (or possibly other servers) with Full Text Indexing, you notice that the initial call to the query engine is slow (surprisingly, 45 seconds consistently). Subsequent queries perform as expected. Then after 10 or 15 minutes, the query processing engine appears to 'go to sleep' and you have another 45 second response time for a query to the indexing service.
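
If what you're seeing is the signature verification behavior, the workaround I've seen cited (test it yourself before trusting me) is to tell the full text service to skip the signature check when it loads the word breakers:

EXEC sp_fulltext_service 'verify_signature', 0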

Friday, November 16, 2007

SQLServer 2005: Full Text Search Monitoring

Seems more and more lately that a DBA can't get out of being the Full Text query expert, whether that is an in-database technology such as Full Text / interMedia Text, or an external technology such as Google appliances / FAST appliances. The typical DBA is always stuck somewhere in between, and it looks like we have to understand at least how to monitor and troubleshoot these indexing technologies.

Here are a few important views/selects for monitoring Full Text Indexes within SQLServer 2005:

Important dm views:

sys.dm_fts_active_catalogs /* info about the active ft catalogs */
sys.dm_fts_index_population /* population types for the ft indexes */
sys.dm_fts_population_ranges /* detailed info about the population of ft indexes */

Important system procedures:

sp_help_fulltext_catalogs /* info on names and locations of ft indexes */
sp_help_fulltext_tables /* info on tables and indexes used by ft catalogs */
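
Both procedures can be run with no parameters to dump everything for the current database, for example:

EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables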

Example Queries:

/* Get Status of Full Text Indexes Currently Being Populated */

SELECT DISTINCT DB_NAME (population.database_id),
       OBJECT_NAME (population.table_id)
FROM sys.dm_fts_index_population population
JOIN sys.dm_fts_population_ranges ranges
  ON (population.memory_address = ranges.parent_memory_address)
WHERE ranges.session_id <> 0

/* Get List of Full Text Index Population Counts as Compared with Table Counts */

SELECT catalog.name,
OBJECT_NAME (population.table_id) AS table_name,
catalog.is_paused,
catalog.status_description,
catalog.row_count_in_thousands AS ft_index_rows_in_thousands,
tables.row_count / 1000 AS table_rows_in_thousands,
population.population_type_description AS population_type,
population.status_description AS status,
population.start_time
FROM sys.dm_fts_active_catalogs catalog
CROSS JOIN sys.dm_fts_index_population population
INNER JOIN sys.dm_db_partition_stats tables ON population.table_id = tables.object_id
WHERE catalog.database_id = population.database_id
AND catalog.catalog_id = population.catalog_id
AND catalog.database_id =
(SELECT dbid FROM sys.sysdatabases WHERE name = DB_NAME () )
AND tables.index_id = 1;

Friday, October 19, 2007

MySQL : the beauty in beta

Beta or not, here we come. Yea, you'd think I'd learn to check out the known bugs list before letting developers/QA have at the new database servers running in the lab with MySQL 5.1.21-b. I mean, how bad could beta really be?

Well, as we should all know, beta is called beta for a reason, but sometimes we open source fanatics tend to jump on board, dealing with issues along the way. This sometimes ends up causing a lot of heartache tracking down issues that turn out to be 'known bugs' in a beta version.

Lesson learned (again) :

Yesterday I received a note from one of our lead developers, that went something like this:


It seems that our test mysql instance is very picky about honoring desc versus asc queries (qa confirmed it).

If you run this query it will return in asc regardless of the order by.

select id
from registry_service.ALIAS
where (owner_uri_path_id = 700 and owner_uri_ref='Account/3760')
order by id desc;
----data returned ---
107538
107539
107540
107541
107542

Now if you change the column that you are ordering by, it seems to work…

select name
from registry_service.ALIAS persistent0_
where (owner_uri_path_id = 700 and owner_uri_ref='Account/3760')
order by name desc;
---- data returned----
'7SortAliasTest.20071018.141531980.141'
'7SortAliasTest.20071018.141531980.131'
'3SortAliasTest.20071018.141531980.151'
'0SortAliasTest.20071018.141531980.161'
'0SortAliasTest.20071018.141531980.121'

I’m very confused ; can you help me understand what’s going on?



Well it took all of 2 minutes and 1 Google query to get an answer.

Known beta bug ( http://bugs.mysql.com/bug.php?id=31001 ).

A little foresight on my part (evaluating the known bugs list before letting dev and QA have at the new MySQL instance) could have saved development and test a lot of hair pulling and wasted time. They could have known that this issue was there, and that it would be fixed in the 5.1.22-rc release, which I already have loaded on our new development servers.

Shoulda, Woulda, Coulda...


Wednesday, October 17, 2007

MySQL innodb plugin

Recently I've spent hours compiling MySQL 5.1.22 and trying to get my CentOS box configured for InnoDB and partitioning. I tried every imaginable way to run configure, use different plugins, etc.

I messed with this forever, then finally looked at the config options, and sure enough, ./configure expects innobase, not innodb, as the plugin name:

from: http://www.innodb.com/support/tips

On MySQL 5.1, ./configure expects a different parameter:

$ ./configure --with-plugins=innobase

Monday, October 15, 2007

SQLServer: Image Replication - Oops

Yea, it's always something, huh?

Better mind your default configuration options with SQLServer 2005 if you plan on transactional replication with LOBs. It's nice that they only replicate the appropriate changes, but you'll get an error if you are using images with sizes greater than the default maximum replication size allowed.

See the Microsoft article for more info.
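
For reference, the configuration option in play here (as far as I can tell) is 'max text repl size', which defaults to 65536 bytes. Bumping it to its maximum looks like this:

EXEC sp_configure 'max text repl size', 2147483647
RECONFIGURE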

Fun... :)

Monday, October 01, 2007

SQLServer: Transactional Replication Issues

After seeing a few friends at the SQLPass 2007 Summit in Colorado and discussing some of the finer 'features' of peer-to-peer transactional replication and large database systems (my friends work on the replication team at Microsoft), I quickly ran into one of my big gripes with P2P replication and SQLServer 2005.

I had noted that being able to set up P2P from a 3rd party backup tool is essential with large databases (> 100GB) because creating a backup with SQLServer 2005 just takes way too long. The problem, however, is that SQLServer must know which LSN the backup was at when it was restored to the replicated server, so that any transactions that have occurred on the main database since the backup was taken can be applied in the correct order.

The only way for SQLServer to get that is by reading the header in the backup file, which can't be done via a third party backup tool (SQLLightspeed, SQLBackup, etc, etc). My friends implied that if I knew the LSN, it could be done. I'm not quite there yet.
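
For comparison, here's roughly what the supported path looks like when you do have a native backup: sp_addsubscription with the 'initialize with backup' sync type, which is where SQLServer reads that LSN out of the backup header for you (all names and paths below are made up):

EXEC sp_addsubscription
    @publication = N'my_publication',
    @subscriber = N'PEER2',
    @destination_db = N'my_database',
    @subscription_type = N'push',
    @sync_type = N'initialize with backup',
    @backupdevicetype = N'disk',
    @backupdevicename = N'X:\backups\my_database.bak'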

However, it wouldn't be that hard for Microsoft to allow a third party backup tool to call a procedure letting replication know what the current LSN is, thus fixing the problem. Hmm...

Maybe one day. For now, we'll have to deal with 5 hour backups and 2 hour restores in order to set up P2P transactional replication from backup, at least until SQLServer 2008 comes around and does the backups for us with its supposedly blazing fast compression, etc.

I'm not holding my breath....

Friday, September 28, 2007

MySQL Migration Toolkit

I've been evaluating the latest version of the MySQL Migration Toolkit to work on migrations from SQLServer 2005 to MySQL 5.1.

Here are some of the things I found so far:

  1. Ease of use:
    • Very simple and fast to get up and running - I was able to get multiple databases migrated from SQLServer to MySQL without any special setup or configuration changes. The GUI is straightforward and intuitive.
    • It took a little work to figure out how to 'customize' the migration, but nothing crazy.
  2. Speed:
    • Seems relatively fast. I was able to transfer about 6 million rows in about 15 minutes.
    • Would probably be more efficient if I wasn't using an intermediary stopping point (my laptop)
    • Would benefit from bulk loading options, but I wasn't able to figure that out right out of the box.
  3. Issues:
    • Needs a 'Migrate Data Only' option for just moving data
    • Better options to handle case sensitivity issues (had to modify each table/object manually for that)
    • The handling for stored connections was not that great, and I had to keep re-creating the stored connections.
    • Should have an option to just do a bulk load from a datafile if necessary
All in all, a great shortcut to porting databases over to MySQL and moving data from one data center/location to another (Prod -> Test -> Dev, etc).
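
On the 'bulk load from a datafile' item above, the manual fallback on the MySQL side is LOAD DATA INFILE; a rough sketch (file path and table names are made up):

LOAD DATA INFILE '/tmp/example_sales_data.csv'
INTO TABLE example_db.example_sales_data
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';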

Very Slick....

Wednesday, September 05, 2007

SQLServer: Tracking Down Stat Issues

The old debate of auto update statistics. I've had issues with it in the past, and coming from an old UNIX / Oracle background, I've always been of the opinion that I know best ( probably not, but it's fun to believe that).

The advantage of using sp_updatestats over UPDATE STATISTICS is that it looks at sys.sysindexes for the status of indexes and whether or not they actually need to be updated, before the tables/indexes are scanned to update stats.

Given this, we can then get a look at distribution changes in order to determine how often we should be updating stats on different tables and indexes.

The following script gives a look at how many changes have been done since statistics on an object were last updated:

SELECT a.rowcnt AS total_rows,
a.rowmodctr AS rows_updated,
object_name (a.id) AS table_name,
a.name AS stats_object
FROM sys.sysindexes a
WHERE object_name (a.id) IN ( [table name],[table name],...)
ORDER BY 2 DESC,1 DESC


Then just keep track of how fast the rows_updated grows, and use that to get an idea of how often stats should be updated.
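
Once you've decided the stats on a table are stale, the actual update is the easy part - either hit the one table (the table name below is just a placeholder), or let sp_updatestats sweep the whole database:

UPDATE STATISTICS dbo.my_big_table WITH FULLSCAN

EXEC sp_updatestats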

Friday, August 03, 2007

SQLServer: Setting up Access from UNIX

These are the detailed instructions for getting set up to run T-SQL from UNIX.

This assumes that you have Perl 5.x installed, and CPAN as well....

  1. Update CPAN to the current available version
    1. cpan > install Bundle::CPAN
  2. Use CPAN to install the DBI::Bundle
    1. cpan > install Bundle::DBI
  3. Install Freetds
    1. Download files from here
    2. Gunzip and untar the freetds tar.gz file
    3. Read the README!
    4. Run configure tool to customize for your system
      1. ./configure
    5. Install freetds
      1. make && make install
    6. Add valid SQLServer info to /usr/local/etc/freetds.conf

      [{server_connection_name}]
        host = {server dns name or ip}
        port = {1433}
        tds version = 8.0

    7. Use CPAN to install the DBD::Sybase
      1. Set SYBASE environment variable to /usr/local
      2. cpan > force install DBD::Sybase
    8. Test the install with a perl connection script
Here's a test script to use (note: the password is passed in clear text):




#!/usr/bin/perl

#
# test the perl freetds dbi driver
#

use DBI ;

$user = '{dbuser}' ;
$passwd = '{dbpass}' ;


$dbh = DBI->connect('DBI:Sybase:server={servername}',
$user, $passwd);
$dbh->do("use {dbname}");

$action = $dbh->prepare("select count(*) from master..sysprocesses with (nolock)") ;
$action->execute ;
$rows = $action->rows ;
print "rows is $rows\n";

while ( @first = $action->fetchrow_array ) {
foreach $field ( @first ) {
print "$field\t";
}
print "\n";
}

exit(0);



Monday, July 09, 2007

SQLServer: The problem with SQLServer is the OS it runs on :)

Yea,

No matter which way you slice it, the real problem I have with SQLServer is the operating system it runs on. Seems time after time my SQLServer issues are more related to the OS than anything else.

The latest issue surrounds dual core processors and parallelism.

By default, SQLServer's optimizer seems to look at the number of processors on a system, multiply it by 2, and use that for the number of parallel threads when running DML in parallel. This was all fine and dandy until the OS reported 16 processors on an 8 proc / dual core machine.

Yea, cause then SQLServer says, ohh - I can do this in parallel.... let's see... 16 x 2 = 32 threads! Yea baby!

Uh... great, now my query takes 2x as long because I'm waiting in a line at the door to get in.

Figures.

Take this into account when doing bulk loads, large queries, things that the optimizer might want to parallelize. Either set the max degree of parallelism at the server level, or use MAXDOP at the query level.
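
Here's roughly what both options look like (the value of 8 is just an example - pick whatever makes sense for your box, and the table name is made up):

-- server wide
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 8
RECONFIGURE

-- per query
SELECT store_id, SUM (trans_amt)
FROM my_big_table
GROUP BY store_id
OPTION (MAXDOP 8)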

Friday, June 22, 2007

SQLServer 2005: Capturing Processes Waiting / Blocked

OK - it's out of order (this is supposed to be a follow-up on partitioning), but I'm always writing some code and then later losing it and re-creating it, so I decided to put this up today...

Every once in a while we've had short 'blocking / waiting' episodes where things back up in the system. It's kind of a pain in the you-know-what when it disappears after a few minutes, so I wrote some code to grab all the SQL that's waiting and stick it in a temp table for later use. Obviously a working table in a separate database might be more practical, and maybe a few more options on the stored procedure, but for now, here goes:

1. Create a procedure to check for waiting SQL:


CREATE PROCEDURE tp_blocked_queries
    @thresh DECIMAL (5,2) = 1.0,
    @dbname VARCHAR (128) = NULL
AS
DECLARE
    @dbstart INT,
    @dbend INT
BEGIN
    IF @dbname IS NOT NULL
    BEGIN
        SET @dbstart = DB_ID (@dbname)
        SET @dbend = @dbstart + 1
    END
    ELSE
    BEGIN
        SET @dbstart = 4    -- skip the system databases
        SET @dbend = 999
    END

    SELECT DB_NAME (b.database_id) AS [database],
           x.host_name AS blocked_host,
           a.session_id AS sid,
           a.wait_type,
           ROUND (CAST (a.wait_duration_ms / 1000.0 AS DECIMAL (5,2)), 2, 1) AS wait_time_sec,
           (SELECT SUBSTRING (text, b.statement_start_offset / 2 + 1,
                   (CASE WHEN b.statement_end_offset = -1
                         THEN LEN (CONVERT (nvarchar(max), text)) * 2
                         ELSE b.statement_end_offset
                    END - b.statement_start_offset) / 2)
              FROM sys.dm_exec_sql_text (b.sql_handle)) AS blocked_query,
           a.blocking_session_id AS blocking_sid,
           (SELECT SUBSTRING (text, c.statement_start_offset / 2 + 1,
                   (CASE WHEN c.statement_end_offset = -1
                         THEN LEN (CONVERT (nvarchar(max), text)) * 2
                         ELSE c.statement_end_offset
                    END - c.statement_start_offset) / 2)
              FROM sys.dm_exec_sql_text (c.sql_handle)) AS blocking_query
      FROM sys.dm_os_waiting_tasks a
      JOIN sys.dm_exec_sessions x ON a.session_id = x.session_id
      JOIN sys.dm_exec_requests b ON a.session_id = b.session_id
      LEFT OUTER JOIN sys.dm_exec_requests c ON a.blocking_session_id = c.session_id
     WHERE b.database_id >= @dbstart
       AND b.database_id < @dbend
       AND a.wait_duration_ms / 1000.00 > @thresh
END

2. Now Create the Temporary Table:

create table #blocked_queries
(
database_name varchar(20),
blocked_host varchar(50),
sid int,
wait_type varchar(20),
wait_time_sec decimal (5,2),
blocked_query varchar(4000),
blocking_sid int,
blocking_query varchar(4000),
lock_time datetime default getdate()
)

3. Last, Use the Stored Procedure to Populate the Temporary Table:

declare
    @blocked_count int,
    @done int
begin
    select @blocked_count = count(*) from #blocked_queries
    set @done = 10000

    while @blocked_count <= @done
    begin
        insert into #blocked_queries
        (
            database_name, blocked_host, sid, wait_type, wait_time_sec,
            blocked_query, blocking_sid, blocking_query
        )
        exec tp_blocked_queries 1.0    -- optionally pass a database name as the second parameter

        waitfor delay '00:00:01'       -- throttle the polling a little

        select @blocked_count = count(*) from #blocked_queries
    end
end


Now all you have to do is hope you catch the culprit in action, and that it won't be too difficult to fix, right? :)



Friday, June 08, 2007

SQLServer 2005: Automated Partitioning Part 1

Partitioning is great, but it has to be automated if you want to do anything regularly with it. And as it typically goes, if you can do a task in repeatable order with T-SQL commands, you ought to be able to automate it, right?

In the next few weeks I'll lay out a hopefully straightforward way to automate your partitioning in SQLServer 2005. Though the examples will be done with date range partitioning, the basic methodology should be transferable to other types of partitioning.

This week we'll be looking at setting up our partitioning functions, schemes, tables, and initial switching of the current partition(s).

In later weeks we'll look at automating the merging and/or splitting of partitions as data rolls out of relevance from a time perspective (i.e. X number of months then rolls up into quarters / years as time goes on).

Example: Creating the Partition Function monthly_partition_function
We're going to start with a partition function that partitions by month for one year, then partitions by quarter for one year, then partitions by year. We'll assume that we'll start with 2008-01-01 as our highest defined partition, and 2005-01-01 as our oldest/lowest defined partition.

CREATE PARTITION FUNCTION monthly_partition_function (DATETIME)
AS RANGE RIGHT FOR VALUES
(
'2005-01-01', -- partition by year
'2006-01-01', -- partition by year
'2006-04-01', -- partition by q1
'2006-07-01', -- partition by q2
'2006-10-01', -- partition by q3
'2007-01-01', -- partition by q4
'2007-02-01', -- partition by month
'2007-03-01', -- partition by month
'2007-04-01', -- partition by month
'2007-05-01', -- partition by month
'2007-06-01', -- partition by month
'2007-07-01', -- partition by month
'2007-08-01', -- partition by month
'2007-09-01', -- partition by month
'2007-10-01', -- partition by month
'2007-11-01', -- partition by month
'2007-12-01', -- partition by month
'2008-01-01' -- partition by month
)

Example: Creating the Partition Scheme monthly_partition_scheme
Next create our partition scheme that lays out partitions onto filegroups. For our purposes, we'll put all partitions in the primary filegroup (in reality, you should put them in different filegroups based on your disk subsystem design, for performance and manageability reasons).

CREATE PARTITION SCHEME monthly_partition_scheme
AS PARTITION monthly_partition_function
ALL TO ([PRIMARY])

Example: Creating the Base Table example_sales_data
Now create our base fact table ( example_sales_data ) that will be partitioned according to the rules defined by our partition function and partition scheme:

CREATE TABLE example_sales_data
(
    trans_date DATETIME NOT NULL,
    trans_amt DECIMAL (5,2) NOT NULL,
    trans_type INT,
    sales_id INT,
    store_id INT,
    cust_id INT
) ON monthly_partition_scheme (trans_date)

Example: Creating the Load Table example_sales_data_load
Next create our load table ( example_sales_data_load ) that will be similarly partitioned according to the rules defined by our partition function and partition scheme, to ensure partition switching will be seamless:


CREATE TABLE example_sales_data_load
(
    trans_date DATETIME NOT NULL,
    trans_amt DECIMAL (5,2) NOT NULL,
    trans_type INT,
    sales_id INT,
    store_id INT,
    cust_id INT
) ON monthly_partition_scheme (trans_date)

Example: Auto Load Seed Data Into Table example_sales_data
Next auto load our fact table ( example_sales_data) with seed data from previous years and months. We'll load through Feb of 2007 to start with.

DECLARE
    @my_trans_date DATETIME,
    @my_trans_amt DECIMAL (5,2),
    @my_trans_type INT,
    @my_sales_id INT,
    @my_store_id INT,
    @my_cust_id INT,
    @my_cur_date DATETIME
BEGIN
SET @my_cur_date = GETDATE() - 1200
SET @my_trans_date = @my_cur_date
SET @my_trans_amt = 10.00
SET @my_trans_type = 1
SET @my_sales_id = 1
SET @my_store_id = 1
SET @my_cust_id = 1

WHILE CAST ('2007-03-01' AS DATETIME) > @my_cur_date
BEGIN
WHILE @my_trans_type <= 3
BEGIN

WHILE @my_sales_id <= 5
BEGIN

WHILE @my_store_id <= 6
BEGIN

WHILE @my_cust_id <= 4
BEGIN
WHILE @my_trans_amt <= 500.00
BEGIN

INSERT INTO example_sales_data
VALUES
(
@my_trans_date,
@my_trans_amt,
@my_trans_type,
@my_sales_id,
@my_store_id,
@my_cust_id
)

SET @my_trans_amt = @my_trans_amt * 1.25
END -- @my_trans_amt
SET @my_trans_amt = @my_trans_amt / 10.0
SET @my_cust_id = @my_cust_id + 1
END -- @my_cust_id
SET @my_cust_id = 1
SET @my_store_id = @my_store_id + 1
END -- @my_store_id
SET @my_store_id = 1
SET @my_sales_id = @my_sales_id + 1
END -- @my_sales_id
SET @my_sales_id = 1
SET @my_trans_type = @my_trans_type + 1
END --@my_trans_type
SET @my_trans_type = 1
SET @my_cur_date = @my_cur_date + 7
SET @my_trans_date = @my_cur_date
END --@my_cur_date
END -- Base Table Load
Example: Verify Seed Data Via Partition Counts example_sales_data
Next check that rows were distributed appropriately to table partitions via the sys.dm_db_partition_stats dynamic management object.

SELECT OBJECT_NAME ([object_id]),partition_number,row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('example_sales_data')

Example: Auto Load New Data Into Load Table example_sales_data_load
Next auto load our load table ( example_sales_data_load) with new data from the current month. (This is auto-generated here, but in reality it would come from whatever ETL process you use.)

DECLARE
    @my_trans_date DATETIME,
    @my_trans_amt DECIMAL (5,2),
    @my_trans_type INT,
    @my_sales_id INT,
    @my_store_id INT,
    @my_cust_id INT,
    @my_cur_date DATETIME
BEGIN
SET @my_cur_date = CAST ('2007-03-01' AS DATETIME)
SET @my_trans_date = @my_cur_date
SET @my_trans_amt = 10.00
SET @my_trans_type = 1
SET @my_sales_id = 1
SET @my_store_id = 1
SET @my_cust_id = 1

WHILE CAST ('2007-04-01' AS DATETIME) > @my_cur_date
BEGIN
WHILE @my_trans_type <= 3
BEGIN

WHILE @my_sales_id <= 5
BEGIN

WHILE @my_store_id <= 6
BEGIN

WHILE @my_cust_id <= 4
BEGIN
WHILE @my_trans_amt <= 500.00
BEGIN

INSERT INTO example_sales_data_load
VALUES
(
@my_trans_date,
@my_trans_amt,
@my_trans_type,
@my_sales_id,
@my_store_id,
@my_cust_id
)

SET @my_trans_amt = @my_trans_amt * 1.25
END -- @my_trans_amt
SET @my_trans_amt = @my_trans_amt / 10.0
SET @my_cust_id = @my_cust_id + 1
END -- @my_cust_id
SET @my_cust_id = 1
SET @my_store_id = @my_store_id + 1
END -- @my_store_id
SET @my_store_id = 1
SET @my_sales_id = @my_sales_id + 1
END -- @my_sales_id
SET @my_sales_id = 1
SET @my_trans_type = @my_trans_type + 1
END --@my_trans_type
SET @my_trans_type = 1
SET @my_cur_date = @my_cur_date + 7
SET @my_trans_date = @my_cur_date
END --@my_cur_date
END -- Load Table Load
Example: Verify Seed Data Via Partition Counts example_sales_data_load
Next check that rows were distributed appropriately to table partitions via the sys.dm_db_partition_stats dynamic management object. All rows should reside in partition 9 now.

SELECT OBJECT_NAME ([object_id]),partition_number,row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('example_sales_data_load')
Example: Switch partition from LOAD table to MAIN table
Now we can switch partition 9 of the load table with the main table. Let's auto determine which partition to switch though!

To manually get the partition number to switch:

SELECT $PARTITION.monthly_partition_function
( ( SELECT TOP 1 (trans_date) FROM example_sales_data_load ) )


To auto generate the partition number and switch:

DECLARE
@switch_str VARCHAR (512),
@target_partition INT

BEGIN SELECT @target_partition = $PARTITION.monthly_partition_function
((
SELECT TOP 1 (trans_date)
FROM example_sales_data_load
))

SET @switch_str = 'ALTER TABLE example_sales_data_load '
SET @switch_str = @switch_str + 'SWITCH PARTITION ('
SET @switch_str = @switch_str + CAST (@target_partition AS VARCHAR)
SET @switch_str = @switch_str + ') TO example_sales_data'
SET @switch_str = @switch_str + ' PARTITION ('
SET @switch_str = @switch_str + CAST (@target_partition AS VARCHAR)
SET @switch_str = @switch_str + ')'

EXEC (@switch_str);
END


OK - It's a lot in one session, but the scripts work, and now you have automated loading of your monthly partitions. Next time we'll look at automating the quarterly partition merges....

Thursday, May 31, 2007

TUNING: Taking Advantage of New Features

Wouldn't it be great if we always were able to design things from scratch? So many things could be re-hashed and re-designed. One of the real reasons we (DBAs, that is) are frustrated all the time with poorly scalable products is that we don't proactively encourage evolving architectures in a meaningful, planned way.

It's always those developers throwing things over the wall, or the feature requests from product managers - never giving us the time to design it right the first time. There may be some thread of truth to those statements, but if we take advantage of new features, bug fixes, and maintenance windows, we could do a lot for ourselves in terms of performance tuning.

Here are a couple of examples to get you thinking:

A new feature:

  1. Does it require new tables?
    • Have we thought about partitioning should the data in those tables scale up?
    • Have we thought about primary keys and which columns should or should not be included given usage and future modifications?
    • Have we thought about foreign keys and associated indexes?
    • Have we thought about normalization?
  2. Is the new feature heavy read or heavy write?
    • Should we be considering additional data files and/or data stores?
    • Are there certain types of indexes that might be more prudent in the long term?
    • How does this fit in with our HA / disaster recovery scenarios?
  3. Does the new feature have any long term reporting benefits?
    • Are changes easily monitored?
    • Is the data easily transformed to dimensions / facts?
    • Should there be some archival process?
Those are just a few questions that might be pondered, from a database perspective, when new features or bug fixes come across in our applications. Many times we rely on developers who are under tight schedules and deadlines to think these things through, when they are really focused on usability and functionality.

If we want to make our lives easier, we need to be the ones asking those questions, and in such a way that helps the development team see the benefit in making those decisions now, and not a year from now when we've added a new customer with 500,000 new transactions a day...

As DBAs we need to strive to get on the front end of the development cycle to make it as easy as possible for our development teams to take advantage of features and designs that will allow scalability and manageability in our databases - whatever those databases might be.

If you're not doing that, then in my opinion, you might as well outsource your database administration to some company that's just gonna monitor alerts and back up your data....

Friday, May 25, 2007

MySQL: getting started for newbies

When MySQL first came out and started popping up all over the web in the e-commerce arena, if you were like me you looked at it as a wannabe. 'Call me in a few years when you grow up.' Yea, sometimes the Oracle in us goes to our heads a little too fast...

MySQL has really gone mainstream in the last few years. Companies that never would have considered MySQL in the past for critical database systems now have it at the forefront of their minds when evaluating different database technologies. Clustering, replication, and monitoring are all well used and well tested with MySQL now.

Well, if I'm not using it now, how do I get on board?

Don't wait for your boss to evaluate it, that's for sure. Go get your MySQL VM set up on your laptop and stop thinking about it! It took me about half an hour to get the MySQL VM installed and up and running, with security settings in place, and the latest version of MySQL Administrator ready to play with.

Most likely you have various programs that are already using MySQL as a backend - you just might not be maintaining them or playing with them because they are in your corporate environment instead of your production environment. This is the perfect opportunity to get started with MySQL.

Start by downloading the MySQL VM and getting it running on your PC or laptop.

Then download the MySQL Administrator and set up your security and connections.

Next get the MySQL Migration Toolkit and migrate one of your smaller SQLServer or Oracle databases to it.

Now, get familiar with the tools, features, backup and recovery, etc, etc. Try setting up replication; try out clustering, try out the partitioning....

Lastly, admit your Oracle snobbery and get on board :)




Friday, May 18, 2007

VMWare: The greatest thing since sliced bread

So if you like to try out new things, and you're not using VMWare yet, get with the program!

Really, though - it's so easy now (and the VMServer is free). Just download and install.

VMWare lets you run mini virtualized server environments within your current operating system. Yea, there is a little overhead, but for testing and trying out new databases, configurations, applications, it's really cool.

The best part of it all is the fact that you can download a host of pre-built appliances to run within the VMServer you just installed. Just start up the server and, voila... a running MySQL database on Slackware on your Dell Windows laptop. Yea, they have an Oracle 10g RAC appliance, as well as a SQLServer 2005 appliance.

Sweet. If you haven't checked it out, do so now...

Monday, May 14, 2007

SQLServer 2005: DMV's and determining fragmentation

Dynamic Management Views (which are not all views; some are functions) are very useful, and if you're a DBA working with SQLServer, you should know them inside and out and have your list of scripts ready at hand.

Here's a couple that work well for determining fragmentation issues:

*Note: Though these can be run online, they do use CPU and heavy read I/O - be sure to know your application and how that affects performance, and carefully track performance while they are running the first few times. (Use the DMV options to control how much I/O is done - in the example we use the 'LIMITED' option.)


SELECT substring (b.name,1,40) + '.' +
substring (c.name,1,50) AS Object,
a.avg_fragmentation_in_percent AS fragmentation_pct
FROM sys.dm_db_index_physical_stats
(
db_id ('{DBNAME}'),
object_id ('{TABLE/OBJECT NAME}'),
NULL,
NULL,
'LIMITED'
) a
JOIN sysobjects b
ON a.object_id = b.id
LEFT OUTER JOIN sysindexes c
ON a.object_id = c.id
AND a.index_id = c.indid
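
Once you've found the fragmented objects, the usual follow-up in 2005 is ALTER INDEX; something like this (same placeholder names as above, and pick your own fragmentation thresholds):

-- light fragmentation: reorganize
ALTER INDEX ALL ON {TABLE/OBJECT NAME} REORGANIZE

-- heavy fragmentation: rebuild
ALTER INDEX ALL ON {TABLE/OBJECT NAME} REBUILD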

Wednesday, April 18, 2007

SQLServer 2000: Denormalizing ?

Ever get one of those requests to denormalize a table into a view (sometimes referred to as pivot tables)?

Here's some code to use a function and view in SQLServer to denormalize data from a single table to force a pivot table based on a key:

create table test_table
(
key_field int ,
cat_field varchar (512)
)

insert into test_table values (1,'John')
insert into test_table values (1,'R')
insert into test_table values (1,'Smith')
insert into test_table values (2,'John')
insert into test_table values (2,'C')
insert into test_table values (2,'Joe')

select * from test_table

key_field cat_field
---------- -------------------------------------------------------------
1 John
1 R
1 Smith
2 John
2 C
2 Joe

create function denorm (@key int)
returns varchar(4000)
as
BEGIN
declare
@mystr varchar(4000),
@tmpstr varchar(4000);

declare mystrcur cursor for
select cat_field
from test_table
WHERE key_field = @key;

open mystrcur;

fetch next from mystrcur into @tmpstr;

set @mystr = @tmpstr;

while @@fetch_status = 0
BEGIN
fetch next from mystrcur into @tmpstr;
if @@fetch_status = 0
set @mystr = @mystr + ',' + @tmpstr;
END

close mystrcur;
deallocate mystrcur;

return (@mystr);
end

create view test_view
as
select distinct key_field, dbo.denorm (key_field) as denorm_list
from test_table

select * from test_view

key_field denorm_list
----------- -----------------------------------------------------------------------
1 John,R,Smith
2 John,C,Joe

Tuesday, March 13, 2007

SQLServer 2005: Working with partitions

Now that SQLServer finally supports something the database world would really consider a decent first attempt at partitioning, I think it's high time we all start putting it into practice.

Partitioning in the Oracle world is almost taken for granted. It's one of the most straight forward ways to scale a database in terms of size and performance, without looking at additional hardware and/or heavy software changes in terms of data flows and use patterns.

Partitioning does not need to be restricted to data warehousing either. It has very practical applications for OLTP databases. When it comes to SQLServer, in my mind it has even more impact on performance than it does with Oracle because of SQLServer's implementation of locking (although, technically, with SQLServer 2005 it is possible to implement a readers-don't-block-writers transaction scheme).

This is because partitioning allows us to have more control over which rows/pages get locked during inserts and selects, if we design it correctly. The next few posts will focus on ways to partition an OLTP application, and my workarounds for unimplemented schemes like hash partitioning. For most of my initial on-the-job training I give credit to Frank Bommarito, an incredibly talented Oracle architect.

Monday, March 05, 2007

SQLServer 2000: Memory Grants Pending

I tell you -

Finding out anything about memory grants pending is virtually impossible. Yea, there are a few references to it out there in the Google world, but most just refer to the fact that it's not much more than the title implies: SQLServer has memory grants for processes queued up.

At surface value, you would expect it would mean 'get more memory', which is what most of the links imply.

As it turns out - you need to dig a little further. We noticed that some applications we run were slowing down our whole server due to memory grants pending. So add memory, of course....

Nope - throw on a trace and look for deadlocks - sure enough, the real culprit was some out of control application code that was causing deadlocks in a cascading manner, which in turn caused the memory grants queue to jump way up and slow SQLServer to a crawl. Fix the deadlocks, fix the memory grants queue.

Anyway, haven't yet looked at 2005 for these issues and possible explanations, but maybe there will be more info....
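
For what it's worth, on SQLServer 2000 you can watch the counter from T-SQL via master..sysperfinfo instead of perfmon; something like this (double check the counter names on your build):

SELECT object_name, counter_name, cntr_value
FROM master..sysperfinfo
WHERE object_name LIKE '%Memory Manager%'
AND counter_name LIKE 'Memory Grants%'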

Tuesday, February 13, 2007

SQLServer: Joining Views in Queries

Two Words: Bad Idea!

The problem with this is that the SQLServer optimizer has to determine the best cost for the set of joins, and then base that on a view that has additional joins underneath it.

It's almost always more efficient to rewrite the query to join the actual tables, and the optimizer is much faster at finding the cost. We were also able to determine that when evaluating joins to a view, the optimizer doesn't honor hints within the view. So if your view is supposed to use a NOLOCK hint (SQLServer 2000), but you join it to a couple of tables outside the view, you end up getting shared locks anyway.
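
Here's a quick made-up example of the kind of rewrite I mean (all table and view names are hypothetical). The first query joins a view that uses NOLOCK internally, which per the above still gets you shared locks; the second hints the base tables directly:

-- joining a view that uses WITH (NOLOCK) inside its definition
SELECT o.order_id, v.customer_name
FROM orders o WITH (NOLOCK)
JOIN v_customer_details v ON v.customer_id = o.customer_id

-- rewritten against the base tables, hinting each one explicitly
SELECT o.order_id, c.customer_name
FROM orders o WITH (NOLOCK)
JOIN customers c WITH (NOLOCK) ON c.customer_id = o.customer_id
JOIN addresses a WITH (NOLOCK) ON a.customer_id = c.customer_id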

Bummin.

Bad Idea!