Monday, July 26, 2010

Indexing - the Need

Objects are stored in discs just like they are stored in memory. Every object has a memory pointer at which it has been stored. Like wise for example a database row, has a location pointer at which is has been stored in the disc.

As for as the arrays are concerned, if the stored memory is contigeous, it is easy to randomly fetch that element. Let as assume we have an integer array with length 10. In java, integer occupies 4 bytes. We know the startIndex of this array while defining this array. Lets say its 1000. Now if we want to access 4th element, we shall directly go to 1000 + 3 * 4 = 1012 and read next 4 bytes. This is one kind of indexing.

This principle can be applied to database too but with many complexities involved. For example having multiple columns and variable string length etc. If we assume constant maximum length, then we might end up in wasting some space but with the performance benefits.

Normally rows are ordered based on primary key. Hence by default rows are indexed based on primary key. On accesssing the primary key value in the disc, user get to know the location of the complete row and if possible size of that row. If you want a specific column to be searched frequently, we shall index that column. By indexing means here, that values of that column are stored in a sorted manner in a separate place.

Now the users query might be little more complex, like checking whether value 5 exist or not or get the row values when my column value is 5. Here we need to scan through the values of the primary key/indexed column. So here sorting can help us to minimize the retrival time. It helps you find the element in log n time.

No comments:

Post a Comment