Issue
Is there a document that describes the SQL terms used as part of I/NET Seven's File Equalization feature?
Product Line
TAC INET
Environment
- I/NET Seven File Equalization
- SQL 2000 all editions
Cause
An overview of SQL Server Systems which can be used to describe the SQL terms used within I/NET Seven's File equalization feature.
Resolution
SQL Server Systems Overview (I/NET Seven)
A SQL Server system can be implemented as a client/server system or as a stand-alone desktop system. The type of system you design will depend on the number of users that will be accessing the database simultaneously and on the kind of work they will be doing.
Generally the most typical environment is a tier 2 configuration consisting of a SQL Server 2000 software configured on a PC with client connections over ethernet
Desktop System
SQL Server can also be used as a stand-alone database server that runs on a desktop computer or a laptop. This is referred to as a desktop system. The client applications run on the same computer that stores the SQL Server engine and the databases. Only one computer is involved in this system. Therefore, no network connection is made from client to server; the client makes a local connection to its local installation of SQL Server.
The desktop system is useful in cases in which a single user accesses the database or in which a few users who share a computer access the database at different times. For example, in a small store with one computer, several employees might access the store’s database to insert customer and sales information, but they must take turns accessing the system. Also, as in this example, desktop systems are useful when the database or databases are small.
Designing a Microsoft SQL Server System
Before you even begin loading the operating system and Microsoft SQL Server, you should have a good idea how you want to design your SQL Server system. By carefully designing your SQL Server system, you can avoid costly downtime caused by having to either rebuild the system or reinstall SQL Server with different options. By comparison, when we talk in general about a system, we mean all of the hardware and software that make up all of the computers working on behalf of the user in order to access data from one or more SQL Server databases.
Security
Today’s computing environments have differing security requirements. Windows 2000 allows you to customize the level of security to meet your needs. The following features assist you in securing both your computer and network access:
Windows NT file system (NTFS) The Windows NT file system is the core security technology in Windows 2000. NTFS provides file security at a group or user level. Required for I/NET Seven Installations.
Windows NT security model This feature permits only authorized users to access system resources. This model controls which user or users can access objects, such as files and printers, as well as which actions individuals can take on an object. In addition, you can enable auditing to track and monitor actions taken, as well as to track actions that a user attempted that would have violated a security policy.
Encrypting file system (EFS) This feature encrypts files with a randomly generated key. The encryption and decryption processes are transparent to the user. EFS requires your disks to be formatted with NTFS.
IP Security (IPSec) support IP Security helps protect data transmitted across a network. This is an integral component for providing security for virtual private networks (VPNs), which allow organizations to transmit data securely across the Internet. Figure 2-1 displays the IP Security configuration dialog box.
Kerberos support This feature provides an industry-standard, highly secure authentication method for single-logon support for Windows 2000—based networks. The Kerberos protocol is an Internet standard and is highly effective when you are integrating Windows 2000 systems into an environment with a different operating system, such as UNIX.
Microsoft SQL Server Replication: Overview and Snapshot Replication
Microsoft SQL Server database replication technology is designed to help you distribute data and stored procedures among the servers in your company. Replication allows you to configure systems to copy data to other systems automatically. Using database replication, you can copy as much or as little data as you want, and you can allocate data among as many systems as you want. Because the replication process is automatic and because during replication a database stores data about the state of the replication as well as the replicated data, there is no danger in losing data. If a replication procedure is interrupted (say, due to a power failure), replication resumes from the point of failure as soon as the systems are running normally again.
What Is Database Replication?
Database replication is the act of copying, or replicating, data from one table or database to another table or database. Using this technology, you can distribute copies of an entire database to multiple systems throughout your company, or you can distribute selected pieces of the database. When SQL Server replication technology is used, the task of copying and distributing data is automated. No user intervention is needed to replicate data once replication has been set up and configured. Because the data replication and processing is done from within a SQL Server database, there is additional stability and recoverability. If a failure occurs during replication (or while any other SQL Server transaction is being performed), operations resume at the point of failure once the problem is fixed. Because of this, many people prefer replication to other methods of moving data between systems.
Replication Components
Microsoft SQL Server 2000 replication is based on the publish-and-subscribe metaphor first used to implement replication in SQL Server 6. This metaphor consists of three main concepts: publishers, distributors, and subscribers. A publisher is a database system that makes data available for replication. A distributor is the database system that contains the distribution database, or pseudodata, used to maintain and manage the replication. A subscriber is a database system that receives replicated data and stores the replicated database.
Publishers
The publisher consists of a Microsoft Windows system hosting a SQL Server database. This database provides data to be replicated to other systems. In addition, the SQL Server database keeps track of which data has changed so that it can be effectively replicated. The publisher also maintains information about which data is configured for replication. Depending on the type of replication that is chosen, the publisher does some work or little work during the replication process. This will be explained in further detail later in this chapter.
A replicated environment can contain multiple subscribers, but any given set of data that is configured for replication, called an article, can have only one publisher. Having only one publisher for a particular set of data does not mean that the publisher is the only component that can modify the data—the subscriber can also modify and even republish the data.
Distributors
In addition to containing the distribution database, servers acting as distributors store metadata, history data, and other information. In many cases, the distributor is also responsible for distributing the replication data to subscribers. The publisher and the distributor are not required to be on the same server. In fact, you will likely use a dedicated server as a distributor. Each publisher must be assigned a distributor when it is created, and a publisher can have only one distributor.
NOTE
Metadata is data about data. Metadata is used in replication to keep track of the state of replication operations. It is also the data that is propagated by the distributor to other members of the replication set and includes information about the structure of data and the properties of data, such as the type of data in a column (numeric, text, and so on) or the length of a column.
Subscribers
As mentioned, subscribers are the database servers that store the replicated data and receive updates. Subscribers can also make updates and serve as publishers to other systems. For a subscriber to receive replicated data, it must subscribe to that data. Subscribing to replication involves configuring the subscriber to receive that data. A subscription is the database information to which you are subscribing.
Types of Replication
SQL Server offers three types of replication: snapshot, transactional, and merge. These replication types offer varying degrees of data consistency within the replicated database, and they require different levels of overhead.
Snapshot Replication
Snapshot replication is the simplest replication type. With snapshot replication, a picture, or snapshot, of the database is taken periodically and propagated to subscribers. The main advantage of snapshot replication is that it does not involve continuous overhead on publishers and subscribers. That is, it does not require publishers to continuously monitor data changes, and it doesn’t require the continuous transmission of data to subscribers. The main disadvantage is that the database on a subscriber is only as current as the last snapshot.
In many cases, as you will see later in this chapter, snapshot replication is sufficient and appropriate—for example, when source data is modified only occasionally. Information such as phone lists, price lists, and item descriptions can easily be handled by using snapshot replication.
Transactional Replication – ( Is not applicable to I/NET Seven)
Transactional replication can be used to replicate changes to the database. With transactional replication, any changes made to articles (a set of data configured for replication) are immediately captured from the transaction log and propagated to the distributors. Using transactional replication, you can keep a publisher and its subscribers in almost exactly the same state, depending on how you configure the replication.
Transactional replication should be used when it is important to keep all of the replicated systems current. Transactional replication uses more system overhead than snapshot replication because it individually applies each transaction that changes data in the system to the replicated systems.
Merge Replication
Merge replication is similar to transactional replication in that it keeps track of the changes made to articles. However, instead of individually propagating transactions that make changes, merge replication periodically transmits a batch of changes. Because merge replication transmits data in batches, it is also similar to snapshot replication.
Merge replication differs from transactional replication in that it is inherently multidirectional. With merge replication, publishers and subscribers can update the publication equally. Transactional replication also allows subscribers to update the publication, but the two replication types function quite differently.
Introduction to Merge Replication
Merge replication performs multidirectional replication between the publisher and one or more subscribers. This allows multiple systems to have updatable copies of the publication and to modify their own copies. A modification on the publisher will be replicated to the subscribers. A modification on a subscriber will be replicated to the publisher and then replicated to the other subscribers.
Unlike transactional replication, merge replication works by installing triggers on the publisher and on the subscribers. Whenever a change is made to the publication or a copy of it, the appropriate trigger is fired, which causes a replication command to be queued up to be sent to the distribution database. This command is eventually sent to the distribution database and then sent to participating systems. Because merge replication operates this way, it requires much more overhead, especially on the publisher, than does transactional replication.
As you will learn in this chapter, the key components involved in the merge replication system are the Merge Agent and the distribution database. The Merge Agent reconciles (merges) incremental changes that have occurred since the last reconciliation. When you use merge replication, no Distribution Agent is used—the Merge Agent communicates with both the publisher and the distributor. The Snapshot Agent is used only to create the initial database. The Merge Agent performs the following tasks.
- The Merge Agent uploads all changes from the subscriber.
- All of the rows without a conflict (rows not modified on both the publisher and the subscriber) are uploaded immediately; those with a conflict (rows modified on both systems) are sent to the conflict resolver. The resolver is a module that is used to resolve conflicts in merge replication. You can configure this module to resolve conflicts based on your needs.
- All changes are applied to the publisher.
- The Merge Agent uploads all changes from the publisher.
- All of the rows without a conflict are uploaded immediately; those with a conflict are sent to the conflict resolver.
- All changes are applied to the subscriber.
This process will repeat as scheduled. With push subscriptions, the Merge Agent runs on the distributor. With pull subscriptions, the Merge Agent runs on the subscriber. Each merge publication has its own Merge Agent.
Publications
A publication is a set of articles grouped together as a unit. Publications provide the means to replicate a logical grouping of articles as one replication object. For example, you can create a publication to be used to replicate a database consisting of multiple tables, each of which is defined as an article. It is more efficient to replicate a database by replicating the entire database in one publication than by replicating tables individually.
A publication can consist of a single article, but it almost always contains more than one article. However, a subscriber can subscribe only to publications, not to articles. Therefore, if you want to subscribe to a single article, you must configure a publication that contains only that article and then subscribe to that publication.
Push and Pull Subscriptions
Replicated data can be propagated in a number of ways. All propagation methods are based on either push subscriptions or pull subscriptions. A subscriber can support a mixture of push and pull subscriptions simultaneously.
Push Subscriptions
The distributor is responsible for providing updates to the subscribers. Updates are initiated without any request from the subscriber. A push subscription is useful when centralized administration is desired because the distributor, rather than multiple subscribers, controls and administers replication. In other words, the initiation and the scheduling of the replication are handled on the distributor.
Pull Subscriptions
Pull subscriptions allow subscribers to initiate replication. Replication can be initiated either via a scheduled task or manually. Pull subscriptions are useful if you have a large number of subscribers and if the subscribers are not always attached to the network. Because subscribers initiate pull subscriptions, subscribers not always connected to the network can periodically connect and request replication data. This can also be useful in reducing the number of connection errors reported on the distributor. If the distributor tries to initiate replication to a subscriber that does not respond, an error will be reported. Thus, if the replication is initiated on the subscriber only when it is attached, no errors will be reported.
Replication Agents
Several agents are used to perform the actions necessary to move the replicated data from the publisher to the distributor and finally to the subscriber: the Snapshot Agent, the Log Reader Agent, the Distribution Agent, the Merge Agent, and the Queue Reader Agent.
Snapshot Agent
The Snapshot Agent is used for creating and propagating the snapshots from the publisher to the distributor (or snapshot location). The Snapshot Agent creates the replication data (the snapshot) and creates the information that is used by the Distribution Agent to propagate that data (the metadata). The Snapshot Agent stores the snapshot on the distributor (or anywhere that you specify). The Snapshot Agent is also responsible for maintaining information about the synchronization status of the replication objects; this information is stored in the distribution database.
The Snapshot Agent is dormant most of the time and might periodically activate, based on the schedule that you have configured, and perform its tasks. Each time the Snapshot Agent runs, it performs the following tasks:
- The Snapshot Agent establishes a connection from the distributor to the publisher. If a connection is not available, the Snapshot Agent will not proceed with creating the snapshot. Once the connection has been established, the Snapshot Agent locks all of the articles involved in the replication to ensure that the snapshot is a consistent view of the data. The Snapshot Agent establishes a connection from the publisher to the distributor. Once this connection has been established, the Snapshot Agent engineers a copy of the schema for each article and stores that information in the distribution database. This data is considered metadata.
- The Snapshot Agent takes a snapshot of the actual data on the publisher and writes it to a file at the snapshot location. The snapshot location does not necessarily need to be on the distributor. If all systems involved in the replication are SQL Server systems, the file is stored as a native bulk copy program. If mixed types of systems are involved in the replication, the data is stored in text files. At this point, synchronization information is set by the Snapshot Agent.
- After the data has been copied, the Snapshot Agent updates information in the distribution database.
- The Snapshot Agent releases the locks that it has held on the articles and logs the snapshot into the history file.
As you can see, the Snapshot Agent is responsible for only creating the snapshot; it does not distribute it to subscribers. Other agents perform this task.
Distribution Agent
The Distribution Agent propagates snapshots and transactions from the distribution database to subscribers. Each publication has its own Distribution Agent.
Merge Agent
The Merge Agent is used in merge replication to reconcile (merge) incremental changes that have occurred since the last reconciliation. When you use merge replication, the Distribution Agent and the Snapshot Agent aren’t used—the Merge Agent communicates with both the publisher and the distributor.
Backing Up Microsoft SQL Server
Backing up the database is one of the DBA’s most important tasks. Having backup files and carefully planning for disaster recovery enable the DBA to restore the system in the event of a failure. The DBA is responsible for keeping the system up and running as much as possible and, in the event of a system failure, for restoring service as quickly as possible. Downtime can be both inconvenient and extremely expensive. Getting the database back up and running as soon as possible is essential.
Backup Terminology
Before we look at backup techniques, let’s review some terminology. In this section, you’ll learn some basic facts about backup, restore, and recovery operations.
Backup and Restore
Backup and restore operations are related and involve saving data from the database for later use, similar to the backup and restore operations that can be performed by the operating system. During the backup, data is copied from the database and saved in another location. The difference between an operating system backup and a database backup is that the operating system backup can save individual files, whereas the database backup saves the entire database. Usually, a database is shared by many users, whereas many operating system files belong to individual users. Thus, a database backup backs up all of the user’s data at once. Because SQL Server is designed for maximum uptime, the backup process is designed to work while the database is up and running, and even while users are accessing the database.
During the restore, the backed up data is copied back to the database. (Don’t confuse restore with recovery; these are two separate operations.) Unlike the backup process, the restore process cannot be done while SQL Server is up and running. In addition, a table cannot be restored separately.
Recovery
Recovery involves the ability of the relational database management system (RDBMS) to survive a system failure and replay (recover) transactions.
Because of the delay in writing changes to disk, a system failure might leave the database in a corrupted state, because some changes made to the database might not have been written to disk or changes written to disk might not have been committed. To maintain the integrity of the database, SQL Server logs all changes in a transaction log. When SQL Server restarts after a system failure, it uses the transaction log to roll forward transactions that were committed but not written to disk and to roll back transactions that were not committed at the time of the failure. In this manner, data accuracy is guaranteed.
SQL Server must be prepared to handle several types of transactions during recovery, including the following:
Transactions that are queries only No recovery is necessary.
Transactions that changed data in the database and were committed but were not written to disk During recovery, SQL Server reads the data pages from disk, reapplies the changes, and then rewrites the pages to disk.
Transactions that changed data in the database, were committed, and were written to disk During recovery, SQL Server determines that the changes were written to disk. No other intervention is required.
Transactions that changed data in the database and were not committed During recovery, SQL Server uses the transaction log to undo any changes that were made to data pages and restores the database to the state it was in before the transactions started.
When SQL Server restarts from a system failure, the recovery mechanism starts automatically. The recovery mechanism uses the transaction log to determine which transactions need to be recovered and which do not. Many of the transactions will not need recovery, but SQL Server must read the transaction log to determine which transactions do require recovery. SQL Server starts reading the transaction log at the point where the last checkpoint occurred.
System Failure
You might be wondering whether backups are really necessary if you use technologies such as Microsoft Cluster Services and RAID fault tolerance. The answer is a resounding “yes.” Your system can fail in a number of ways, and those methods of fault tolerance and fault recovery will help keep your system functioning properly through only some of them. Some system failures can be mild; others can be devastating. To understand why backups are so important, you need to know about the three main categories of failures: hardware failures, software failures, and human error.
Hardware Failures
Hardware failures are probably the most common type of failure you will encounter. Although these failures are becoming less frequent as computer hardware becomes more reliable, components will still wear out over time. Typical hardware failures include the following:
CPU, memory, or bus failure These failures usually result in a system crash. After you replace the faulty component and restart the system, SQL Server automatically performs a database recovery. The database itself is intact, so it does not need to be restored—SQL Server needs simply to replay the lost transactions.
Disk failure If you’re using RAID fault tolerance, this failure type will probably not affect the state of the database at all. You must simply repair the RAID array. If you are not using RAID fault tolerance or if an entire RAID array fails, your only alternative is to restore the database from the backup and use the transaction log backups to recover the database.
Catastrophic system failure or permanent loss of server If the entire system is destroyed in a fire or some other disaster, you might have to start over from scratch. The hardware will need to be reassembled, the database restored from the backup, and the database recovered by means of the data and transaction log backups.
Software Failures
Software failures are rare, and your system probably will never experience them. However, a software failure is usually more disastrous than a hardware failure because software has built-in features that minimize the effect of hardware failures, and without these protective features, the system is vulnerable to disaster if a hardware failure occurs. The transaction log is an example of a software feature designed to help systems recover from hardware failures. Typical software failures include the following:
Operating system failure If a failure of this type occurs in the I/O subsystem, data on disk can be corrupted. If no database corruption occurs, only recovery is necessary. If database corruption occurs, your only option is to restore the database from a backup.
RDBMS failure SQL Server itself can fail. If this type of failure causes corruption to occur, the database must be restored from a backup and recovered. If no corruption occurs, only the automatic recovery is needed to return the system to the state it was in at the point of failure.
Application failure Applications can fail, which can cause data corruption. Like an RDBMS failure, if this type of failure causes corruption to occur, the database must be restored from a backup. If no corruption occurs, no restore is necessary; the automatic recovery will return the system to the state it was in at the point of failure. You might also need to obtain a patch from your application vendor to prevent this type of failure from recurring.
Human Error
The third main category of failure is human error. Human errors can occur at any time and without notice. They can be mild or severe. Unfortunately, these types of errors can go unnoticed for days or even weeks, which can make recovery more difficult. By establishing a good relationship (including good communication) with your users, you can help make recovery from user errors easier and faster. Users should not be afraid to come to you immediately to report a mistake. The earlier you find out about an error, the better. The following failures can be caused by human error:
Database server loss Human errors that can cause the server to fail include accidentally shutting off the power or shutting down the server without first shutting down SQL Server. Recovery is automatic when SQL Server is restarted, but it might take some time. Because the database is intact on disk, a restore is not necessary.
Data loss This type of loss can be caused by someone’s accidentally deleting a data file, for example, thus causing loss of the database. Restore and recovery operations must be performed to return the database to its prefailure state.
Table loss or corrupted data If a table is dropped by mistake or its data is somehow incorrectly changed, you can use backup and recovery to return the table to its original state. Recovery from this type of failure can be quite complex because a single table or a small set of data that is lost cannot simply be recovered from a backup.
Data Base Backups
All SQL Server backups are performed for a specific database. To completely back up your system, you should back up all databases in the system and their transaction logs. Don’t forget to back up the master database as well. And remember, without good backups, you might not be able to restore your data in the event of a failure.
Full Backups
As mentioned, a full backup involves backing up an entire database. All of the filegroups and data files that are part of this database are backed up. If you have multiple databases, you should back up all of them. A full backup is probably the most common technique for backing up small- to medium-size databases. Depending on how large the databases are, this process can be quite time consuming, so if time is an issue, you might consider performing differential backups or filegroup backups, as described next. Once you start a backup, you cannot pause it—the backup will continue until the entire database is backed up.
Conclusion
It is very important to understand when using SQL Server 2000 full version that this application requires more specific knowledge in deployment and configuration of this software program.