Discussion: can a database really guarantee consistency absolutely by relying on the log alone?

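Quote:
The original poster should really have split these two questions into two threads:
1. the FS cache
2. the effect of the storage cache on data integrity

For an Oracle database, data consistency cannot be guaranteed in these two extreme cases.

So:
1. Oracle recommends using DIO (direct I/O) to bypass the FS buffer cache.
2. For performance, enabling the storage cache is recommended; safety is then left to the storage to guarantee.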
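Veritas VxFS and Oracle worked together to design something called ODM:
a) It does not need the FS cache, because the database already has its own cache;
b) It does not need the FS RW lock, because the database already has record locks;
c) For multiple Oracle instances, it reduces the handling of file descriptors;
As a result, it brought huge improvements in performance (comparable to raw devices), in data consistency, and in the file system management capabilities available to the database.

So the database cache and the FS cache should be designed together, to ensure better performance and integrity.

As for the storage device's cache, that depends on how strict the site's data consistency requirements are! In itself it is just a performance-optimization feature.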


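Quote:
Originally posted by rechardluo on 2007-9-11 19:23
A journaling file system journals only the file system's metadata
A question for rechard: then what do you call a file system that journals both the metadata and the actual data?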
My blog
Pre-order link for 《大话存储》: http://www.china-pub.com/301645


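Quote:
Originally posted by 冬瓜头 on 2007-9-11 19:33

A question for rechard: then what do you call a file system that journals both the metadata and the actual data?
"A journaling file system journals only the file system's metadata"
Brother, I did not write that sentence; it should be from fengwy's post. In my reply it was marked with three > (>>>).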


Oops. Look at your formatting, though: it clearly reads as if you said it yourself, haha.

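Quote:
Originally posted by rechardluo on 2007-9-11 19:38

"A journaling file system journals only the file system's metadata"
Brother, I did not write that sentence; it should be from fengwy's post. In my reply it was marked with three > (>>>).
On behalf of the Party Central Committee, I hereby issue a correction: rechard did not say this.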

Quote:
Originally posted by fengwy on 2007-9-11 19:22
How exactly do these two extreme cases fail to guarantee data consistency? Could you explain in detail?
Most file system I/O is buffered by the operating system in its file system buffer cache. The idea of buffering is that if a process attempts to read data that is already in the cache, then that data can be returned immediately without waiting for a physical I/O operation. This is called a cache hit. The opposite is called a cache miss. When a cache miss occurs, the data is read from disk and placed into the cache. Old data may have to be removed from the cache to make room for the new data. If so, buffers are reused according to a least-recently-used (LRU) algorithm in an attempt to maximize the number of cache hits.
The file system buffer cache is also used for write operations. When a process writes data, the modified buffer goes into the cache. If the process has explicitly requested the synchronous completion of writes (synchronous writes) then the data is written to disk immediately and the process waits until the operation has completed. However, by default delayed writes are used. User processes do not wait for delayed writes to complete. The data is just copied into the buffer cache, and the operating system has a background task that periodically flushes delayed write buffers to disk. Delayed writes allow multiple changes to hot blocks to be combined into fewer physical writes, and they allow physical writes to be optimally ordered and grouped.
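
To make the difference between a delayed write and a synchronous write concrete, here is a minimal Python sketch (an illustration added in editing, not part of the quoted text). The names `write_delayed` and `write_durable` are hypothetical. It assumes a POSIX-like system where `os.fsync` asks the kernel to flush the file's buffered data to the device; whether the device's own cache is also flushed depends on the platform and the storage.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Call os.write until every byte has been handed to the kernel."""
    written = 0
    while written < len(data):
        written += os.write(fd, data[written:])


def write_delayed(path: str, data: bytes) -> None:
    """Default (delayed) write: returns once the data is in the OS buffer cache.

    The kernel flushes it to disk later, so a crash before that flush
    can lose the data.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def write_durable(path: str, data: bytes) -> None:
    """Synchronous completion: block until the kernel reports the flush done."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # wait for the buffered data to be written out
    finally:
        os.close(fd)
```

Both functions leave the same bytes in the file when nothing goes wrong; the difference only shows up if the system fails between the `write` and the background flush.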

Delayed writes can be lost if a system failure occurs while some delayed writes are still pending. Some file systems support a write behind mount option that minimizes the delay before the flushing of delayed write buffers begins. This minimizes the risk of data loss, but reduces the benefit of delayed write caching. It also reduces the risk and severity of delayed write backlogs.

Because delayed writes involve a risk of data loss, Oracle never uses them. Oracle insists on the synchronous completion of writes for all buffered I/O to database files. This is done by using the O_DSYNC flag when opening database files. This means that the data itself must be written synchronously, but that delayed writes may be used for updates to the file access and modification times recorded by the file system.
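
As a rough illustration of what opening a file with `O_DSYNC` looks like from a program's point of view, here is a small Python sketch. It is not Oracle's code: `open_datafile`, `write_block`, and the 8192-byte `BLOCK_SIZE` are hypothetical names and values chosen for the example, and the fallback to `O_SYNC` on platforms where Python does not expose `os.O_DSYNC` is an assumption of this sketch.

```python
import os

# Hypothetical block size for the example; real databases make this configurable.
BLOCK_SIZE = 8192

# O_DSYNC: each write returns only after the data (not necessarily the
# access/modification times) has been written. Fall back to O_SYNC where
# the platform does not define O_DSYNC (an assumption of this sketch).
SYNC_FLAG = getattr(os, "O_DSYNC", None) or getattr(os, "O_SYNC", 0)


def open_datafile(path: str) -> int:
    """Open (or create) a data file so that every write is synchronous."""
    return os.open(path, os.O_RDWR | os.O_CREAT | SYNC_FLAG, 0o600)


def write_block(fd: int, block_no: int, block: bytes) -> None:
    """Write one fixed-size block at its offset; returns after it is written."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly BLOCK_SIZE bytes")
    offset = block_no * BLOCK_SIZE
    written = 0
    while written < BLOCK_SIZE:
        written += os.pwrite(fd, block[written:], offset + written)
```

With the flag set at open time, the caller does not need a separate `fsync` after each write; the price is that every `write_block` call waits for the physical write.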

Do not confuse delayed writes with asynchronous writes. User processes do not wait for the completion of either type of write. However, they are notified of, or otherwise learn of, the completion of asynchronous writes, whereas there is no notification of the completion of delayed writes; it is simply assumed that delayed writes will be completed. Thus delayed writes involve a risk of data loss, but asynchronous writes do not.
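
The key property of an asynchronous write, that the caller does not wait but does eventually learn the outcome, can be sketched in Python as follows. This is only an emulation using a worker thread and a `Future`; it is not kernel asynchronous I/O, and `submit_async_write` is a hypothetical name introduced for the example.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# Worker threads stand in for the I/O subsystem in this emulation.
_pool = ThreadPoolExecutor(max_workers=2)


def _write_and_sync(path: str, data: bytes) -> int:
    """Write the data, force it to disk, and return the byte count."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
        os.fsync(fd)  # completion is reported only after the flush
        return written
    finally:
        os.close(fd)


def submit_async_write(path: str, data: bytes) -> Future:
    """Start the write and return immediately.

    The returned Future is the completion notification: the caller can
    poll done(), block on result(), or attach add_done_callback().
    A failed write surfaces as an exception from result().
    """
    return _pool.submit(_write_and_sync, path, data)
```

A caller would keep working after `submit_async_write(...)` and treat the data as safe only once the future has completed without error. A delayed write offers no such signal.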

[ Last edited by shahand on 2007-9-12 11:12 ]


Also:

In UNIX you can control whether a file system uses buffered or unbuffered IO. With Oracle the use of a buffered filesystem is redundant and dangerous. An example of the dangers of a buffered filesystem with Oracle is when power is lost. The buffer in a buffered filesystem depends on the cache battery to provide enough power to allow the buffer to be written to disk before the disk spins down. However, many shops fail to monitor the cache battery lifetime limitations or fail to change the batteries at all. This can result in loss of data in a buffered filesystem on loss of power.

You can turn off buffered writes in several ways (buffered reads aren’t an issue, but you should always use write-through caching). One is to mount the filesystems used with Oracle files as non-buffered using such options as:

AIX: “dio”, “rbrw”, “nointegrity”  

SUN: “delaylog”, “mincache=direct”, “convosync=direct”, “nodatainlog”

LINUX: “async”, “noatime”

HP: Use VxFS with: “delaylog”, “nodatainlog”, “mincache=direct”, “convosync=direct”

[ Last edited by shahand on 2007-9-12 11:35 ]


Also:

A Quantitative Comparison between Raw Devices and File Systems for implementing Oracle Databases
Oracle/HP White Paper

www.oracle.com/technology/deploy/performance/pdf/TWP_Oracle_HP_files.pdf
.......

We also evaluated the case where the log option is used instead of the delaylog. This measures the trade-off between no data loss and a potential data loss in case of system crash. We should remind the reader that the delaylog option offers a similar guarantee as the traditional UNIX file system. Log and delaylog options are only meant for file system metadata, but they can influence the amount of data loss during system crash. Our tests showed that the transactional throughput difference between log and delaylog option is between 8% (medium workload) and 10% (heavy workload).

......


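Oracle always uses synchronous I/O? Then the latency must be very high. Judging from this passage, asynchronous I/O is the more balanced compromise, so why doesn't Oracle use asynchronous I/O?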

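Quote:
Originally posted by 冬瓜头 on 2007-9-11 19:30

When Oracle performs redo after a crash, it works entirely by comparing the SCNs currently on the physical disk against the SCNs in the log. In the end-to-end consistency we are talking about, the data that finally lands on disk is one end and the end customer is the other. Oracle scans the SCNs from the post-crash physical disk to do the redo; the idea ...
What if the redo itself was cached (whether in the FS cache or in the array's cache),

and then part of the redo data (records) was lost when the system crashed?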
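
The concern raised here can be shown with a toy model in Python. This is a deliberately simplified sketch and not Oracle's actual recovery algorithm: `RedoRecord`, `recover`, the block names, and the SCN values are all hypothetical. Recovery replays only the redo records that actually reached disk, applying a record when its SCN is newer than the SCN stored with the block.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RedoRecord:
    scn: int      # change number of this redo record
    block: str    # which data block it changes
    value: str    # the new contents of that block


def recover(disk_blocks: Dict[str, Tuple[int, str]],
            durable_redo: List[RedoRecord]) -> Dict[str, Tuple[int, str]]:
    """Roll data blocks forward using only the redo that survived the crash.

    disk_blocks maps block name -> (scn on disk, contents on disk).
    A redo record is applied only if it is newer than the block on disk.
    """
    recovered = dict(disk_blocks)
    for rec in sorted(durable_redo, key=lambda r: r.scn):
        block_scn, _ = recovered.get(rec.block, (0, ""))
        if rec.scn > block_scn:
            recovered[rec.block] = (rec.scn, rec.value)
    return recovered


# State of the data file on disk at crash time.
disk = {"A": (10, "a10")}

# Redo generated before the crash; suppose both changes were acknowledged.
all_redo = [RedoRecord(11, "A", "a11"), RedoRecord(12, "B", "b12")]

# Case 1: every redo record reached disk, so both changes are recovered.
full = recover(disk, all_redo)

# Case 2: the last record was still in a cache and was lost in the crash.
# Recovery cannot know SCN 12 ever existed, so that change is silently gone.
partial = recover(disk, all_redo[:1])
```

In this model, SCN comparison can only reconcile the data files with the redo that is physically present. If an acknowledged redo record never left a volatile cache, no amount of comparison brings it back, which is why the redo write itself has to be durable before it is acknowledged.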
