Details of Block Size values in creating tables (OpenInsight 32-Bit)
At 07 SEP 2007 08:39:13PM Martin Drenovac wrote:
I'd like some details about the use of the various parameters in the Create Table process.
What is the difference in performance in creating a file with 1024 vs 4096 byte blocks?
How do we use the modulo parameter - do we need to set it when the file manager manipulates file sizing dynamically?
At 08 SEP 2007 12:07PM Richard Hunt wrote:
I can tell you about frame size (byte blocks). My hard drive reads data in blocks of 4096 bytes, so I have set all my tables with a minimum frame size of 4096. I have not literally bench tested any settings for "CREATE_TABLE".
It does make a difference for me, although I really do not think you will notice the difference unless you are selecting from a large table or a table with large rows. The tested table has 1,017,750 rows at 323,754,795 bytes.
My personal belief is that the frame size defaults to 1024 because, a long time ago, hard drives read data in blocks of 1024 bytes.
Also, I have noticed that all the computers that I use and have tested software on seem to cache hard drive reads on the megabyte scale. What I mean is that I can select from a large table and you can actually see the hard drive light lit. Do the same select again a second time and the hard drive light does not light up. And the second time the select is done, it runs about 40 times faster.
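If you want to check what allocation unit (block) size your own volume actually uses before settling on a frame size, a rough sketch like this works on Windows - it just wraps the Win32 GetDiskFreeSpaceW call via Python's ctypes. The drive letter is only an example, and this is not an OpenInsight call.
[code]
import ctypes

def cluster_size(root="C:\\"):
    """Return the allocation unit (cluster) size in bytes for a Windows volume."""
    sectors_per_cluster = ctypes.c_ulong(0)
    bytes_per_sector = ctypes.c_ulong(0)
    free_clusters = ctypes.c_ulong(0)
    total_clusters = ctypes.c_ulong(0)
    ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
        ctypes.c_wchar_p(root),
        ctypes.byref(sectors_per_cluster),
        ctypes.byref(bytes_per_sector),
        ctypes.byref(free_clusters),
        ctypes.byref(total_clusters),
    )
    if not ok:
        raise ctypes.WinError()
    return sectors_per_cluster.value * bytes_per_sector.value

if __name__ == "__main__":
    print("Cluster size:", cluster_size(), "bytes")  # typically 4096 on NTFS
[/code]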
At 08 SEP 2007 08:31PM Martin Drenovac wrote:
Richard - thanks for the note. Regarding create table, I don't mean benchmarking that process; rather, why would we choose 1024 vs 4096 when creating a table? I'm assuming OI does something better, smarter, faster if we were to create the tables with block sizes of 4096 - and faster is what we're currently in need of for some of our tables, especially the larger ones.
OI, whilst a great environment, would be greater if there were more information readily available to the developer community.
At 08 SEP 2007 11:26PM Richard Bright wrote:
Martin,
I believe there was discussion at Oz Rev University about frame size. Richard Hunt correctly points to the historical basis for the default value - it related to a mix of hard drive behaviour and, more particularly, network packet size.
The world has changed significantly in the last 15 years. The benchmark data now points to 2048 (or larger) possibly delivering much better performance. The problem is that simply re-sizing the existing table does not work - you will need to create a new empty table, set the frame size, then copy in your data.
More info on frame size - and why to play around with it - can be found in the Native Tables documentation and, I think, in the old Arev Tech Bulletins.
At 10 SEP 2007 04:57AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
Frame sizing is as much an art as it is a science. Richard is very much correct that the disk block size should play an important part in your frame size, but record size is also an important part of the puzzle. As a historical note, one of the main reasons for the original frame size was that IPX frames were 1500 bytes or 1492 (or something like that), so by the time you took the frame header and added it to the 1K frame, you pretty much had a single frame per network packet. Also, 1024 fits very nicely into a single DUMP window.
If you have a 4K disk block but your records are under 1K in size, then using a 4K frame will actually slow you down, on top of the wasted disk space. That's because you'll fit more of your records into a single frame and they'll all have to be parsed.
The basic idea is to have as few records per frame as possible, and optimize that with your disk block size. Assuming you are using a network product (NLM or NT Service), then packet size is no longer part of the picture, since only the record is sent across the wire. In addition to frame size, you can manipulate record access speed and frame load through the threshold settings.
In the end, there's a trade-off between disk space, wasted disk space and access speed. It's up to you to decide which one is most important.
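As a rough way of weighing those trade-offs for your own data, here's a back-of-the-envelope sketch in Python - it ignores LH frame headers and the threshold settings entirely, and the 300-byte average record size is just a placeholder:
[code]
def frame_stats(frame_size, avg_record_size, disk_block=4096):
    records_per_frame = frame_size // avg_record_size          # records parsed on each frame read
    slack = frame_size - records_per_frame * avg_record_size   # leftover space per frame
    disk_blocks = -(-frame_size // disk_block)                  # ceiling division: blocks per frame read
    return records_per_frame, slack, disk_blocks

for size in (1024, 2048, 4096):
    recs, slack, blocks = frame_stats(size, avg_record_size=300)
    print(f"{size:>5}-byte frame: ~{recs} records to parse, "
          f"~{slack} bytes slack, {blocks} disk block(s) per read")
[/code]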
Since this seems to be a generally hot topic, if there's enough demand, with permission from Revelation, Sprezzatura would be willing to re-present our Linear Hash talk from the Seattle Conference earlier this year at the Las Vegas conference next year.
World leaders in all things RevSoft
At 10 SEP 2007 06:14AM Martin Drenovac wrote:
Thanks very much all for the info. What I'm really looking for is information on a comment that Richard refers to: as I understood it, Rev had found that if you resize the OI files to use 4K blocks (irrespective of the path), we can expect "Substantial" throughput improvement. I'm trying to find out whether anyone has had that experience, so that we can qualify what benchmarking we'll do in the next couple of days. We have very large record sizes and very large files, so you can understand how the ears prick up when you're told that changing the structure of a file will give "Substantial" throughput gains. I don't want to quote the factor until we do the benchmarks ourselves.
At 10 SEP 2007 01:11PM dbakke@srpcs.com's Don Bakke wrote:
…if there's enough demand, with permission from Revelation, Sprezzatura would be willing to re-present our Linear Hash talk from the Seattle Conference earlier this year at the Las Vegas conference next year.
I missed out on some Sprezz topics in Seattle due to scheduling conflicts. I, for one, would be interested in this topic being replayed.
dbakke@srpcs.com
At 10 SEP 2007 06:55PM Richard Bright wrote:
Martin,
Kevin may have some draft benchmark data on the effect of changing LH frame size in modern IP networks that he can share.
Just want to pick up on Sprez's point about the key reason for the 1024 default frame size - to fit efficiently in an IPX packet. In those days a Novell IPX / 802.2 network was the best platform. Now most of us have migrated to Windows TCP/IP, and LAN speed has changed from 10 Mbit/sec to Gigabit/sec - so the rate-limiting point is possibly different. Under TCP/IP, larger packets of data can be pushed down the wire but may be fragmented into smaller packets (with a performance hit) while passing through a router / bridge etc. On the file server side we have the characteristics of the hard disk / data array and caching to consider. So we have a number of variables - some are characteristic of the particular network and hardware.
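As a rough illustration of the packet arithmetic only - assuming a standard 1500-byte Ethernet MTU and plain 20-byte IP and TCP headers, with no jumbo frames or VPN overhead:
[code]
MTU = 1500                  # standard Ethernet MTU
HEADERS = 40                # 20-byte IP + 20-byte TCP, no options
PAYLOAD = MTU - HEADERS     # usable bytes per packet

for size in (1024, 2048, 4096):
    packets = -(-size // PAYLOAD)   # ceiling division
    print(f"{size} bytes -> {packets} packet(s) of up to {PAYLOAD} bytes payload")
[/code]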
Regardless, it would be interesting to see some representative data to give some substance to the suggestion of 4K being a better frame size for large files.
At 11 SEP 2007 05:00AM [url=http://www.sprezzatura.com]The Sprezzatura Group[/url] wrote:
We're still back to the factors of record size and parsing speed. If you have 50 byte records and a 4K frame, you'll have about 80 records per frame. When trying to find the last record in a frame, the system will need to check all 80 records to determine whether your record is there. If you assume three overflow frames (not unreasonable), then you have a potential 245 records to check. Dropping this down to a 1K frame gives 20 and 61 records. So, disk read speed being equal for both frame sizes, in this particular instance a 1K frame size should be faster.
I'm not saying that 4K wouldn't be faster. On a ~4K record it would be, since with a 1K frame there would be four reads to get the entire record, as opposed to one read with the 4K frame, assuming one record per frame.
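To put rough numbers on both effects, a simple sketch - it assumes records-per-frame is just frame size divided by record size, so it won't reproduce the 245 and 61 figures above exactly (real LH frames carry header overhead):
[code]
def worst_case_checks(frame_size, record_size, overflow_frames=3):
    per_frame = frame_size // record_size
    return per_frame * (1 + overflow_frames)   # records scanned across primary + overflow frames

def reads_per_record(frame_size, record_size):
    return -(-record_size // frame_size)       # frames read to assemble one record

# Small (50-byte) records: more records per frame means more parsing per lookup.
print(worst_case_checks(4096, 50), "vs", worst_case_checks(1024, 50))    # 324 vs 80
# Large (~4K) records: a 1K frame needs four reads where a 4K frame needs one.
print(reads_per_record(1024, 4000), "vs", reads_per_record(4096, 4000))  # 4 vs 1
[/code]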
I can prove to you a 1K frame is faster.
I can prove to you a 4K frame is faster.
Schrödinger's hash.
World leaders in all things RevSoft