A SQL Challenge...

Posts   
 
    
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 25-Aug-2005 10:30:21   

Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish smile , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago. simple_smile

Now that you have seen and appreciated the photo... Let's get technical stuck_out_tongue_winking_eye

I've been trying to come up with an efficient way to extract similar information you are seeing on this page from a DB. The page contains an item, in this case, an image whose id is presumably "2211583" and the image belongs to an account "gen". The page looks up the image and account based on the 2 pieces of information, ID and account name, and displays the results...

This is all very well, but what is interesting to me, is the "Gen Kanai's photostream" where you can see the "previous" and "next" images from the account's "photostream". A photostream in the case of flickr is simply all the photos that Gen has ever uploaded in date order. For Gen, this list contains only 50 items, but it could obviously grow very large. What's also interesting is that the previous and next images are personalised for the user visiting the site, meaning that different users will see different previous and next images depending on what permissions they have. Some images are publically viewable, some only by friends, some only by family and some are private to Gen...

So here comes the challenge: "How does flickr find the previous and next images to display based on the ID of the main image?"

This doesn't necessarily apply to images, it could apply to, for example, "reservations": Fetch me the reservation whose ID is 1234 along with the 3 previous and 3 next reservations which were made by a person of the male sex...

At first I was thinking that this was simply a paging problem. In the case of the flickr example the page size is 3 but I'm not sure what the selection criteria would be given you only have the ID of an image which appears on that page...

I have yet to solve this problem in an efficient way (i.e. no linked lists, using a single query etc) and thought I would throw it open to see if anyone knew of a standard way to implement it... simple_smile

Marcus

Fishy avatar
Fishy
User
Posts: 392
Joined: 15-Apr-2004
# Posted on: 25-Aug-2005 16:38:26   

Marcus wrote:

Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish smile , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago. simple_smile

Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.

Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.

Fishy

pilotboba
User
Posts: 434
Joined: 05-Aug-2005
# Posted on: 25-Aug-2005 17:11:08   

Within the database there has to be some way to sequence the entities. Whether it is a reservation date for reservations or a sequence number for photos. Then you can query on the sequencing field in addtion to any other fields.

To get Next SELECT Top 1 * FROM Photos WHERE Seq > @SegofCurrent AND Permission = @Whatever AND ...

To get previous SELECT Top 1 * FROM Photos WHERE Seq < @SegofCurrent AND Permission = @Whatever AND ...

I think that would be the basic idea. But this will only work if you have a field in the database which gives you some type of rownumber equivelent.

This is simmilar to paging. Cause, alot of times in paging you are creating a surrogate sequence number where one doesn't exist based on your filter and sort. Once you have your records numbered you can pull out the specific page of data.

In the case above your paging is 1 item per page, and you are displaying 3 pages at once.

BOb

jeffreygg
User
Posts: 805
Joined: 26-Oct-2003
# Posted on: 25-Aug-2005 18:31:32   

Fishy wrote:

Marcus wrote:

Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish smile , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago. simple_smile

Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.

Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.

Fishy

Lol, Fish. smile

Jeff...

wayne avatar
wayne
User
Posts: 611
Joined: 07-Apr-2004
# Posted on: 25-Aug-2005 21:39:22   

Fishy wrote:

Marcus wrote:

Take a look at http://www.flickr.com/photos/gen/2211583/ and admire the jellyfish smile , which I can assure you, looks exactly like this and they are as big as dinner plates... This is not my photo, but I visited the aquarium in Monterey, California a couple of years ago. simple_smile

Yea, I saw 'Finding Nemo' 3 times so I'm sort of an expert on this jellyfish thingy.

Anyways, I can't help you on the rest of your post, but I just wanted to back you up on the jellyfish part.

Fishy

Fishy smile , You know i couldn't stop gigling about this for about 5 minutes.

Devildog74
User
Posts: 719
Joined: 04-Feb-2004
# Posted on: 26-Aug-2005 02:24:49   

There are various artificial intellegence algorythms that can do this. for example, some games that simulate decision making use feed forward neural networks to this kind of thing. I dont know that a feed forward neural net would be the best fit for your scenario, but retailers do the same types of studies, simply look at amazon.com. They make product upsells using demographics and statistics, etc.

it is feasable that the account id of the person doing the viewing and the id of the image are linked to various categories. the engine would compare tidbits from the account demographics to come up with suitable pictures to display, etc. etc.

Typically you need to have trained data cubes and models before you could tackle this using a standard.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 30-Aug-2005 17:20:56   

Thanks everyone (especially Fishy... Finding Nemo is now on my list of must see dvds simple_smile )!

Bod I've thought about it and agree that there doesn't seem to be a faster way to do without using 3 queries... Devildog, I'm a big fan of neural nets, but I think it would be overkill for this situation.

Thanks everyone once again!

Marcus

davisg avatar
davisg
User
Posts: 113
Joined: 27-Feb-2005
# Posted on: 31-Aug-2005 09:38:51   

Heh, you guys crack me up smile

Rogelio
User
Posts: 221
Joined: 29-Mar-2005
# Posted on: 31-Aug-2005 13:27:52   

Hi,

May be the do the following:

A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 31-Aug-2005 15:25:18   

Rogelio wrote:

A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.

Thanks for your suggestion, however I'm not sure how you could get rows adjacent to the row whose ID you are looking for using this method... SQL only gives "sets" of results and because the where clause is filtering the resultset by an ID (which bears no relation to the ordering of the resultset), you are not able to get rows either side of the target.

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 02-Sep-2005 05:49:36   

I'm assuming SQL Server for this. Oracle and SQL (the two databases I am most familiar with) handle TOP/ROWNUM with very different syntax, so I assume that each database would require a different solution.

If you only need the next and previous item, it's pretty easy to do in a subquery (northwind for all of these):


--select the ordernum, which would be passed in as a param
DECLARE @ordernum int
SELECT @ordernum = 10256

--get the freight of the selected record, plus the freight for
--the chronologically next and previous record for the same employee.
--You would have to add a user rights filter to the subqueries
--which are hard to define without knowing how security works
SELECT  
    (
    SELECT TOP 1 
        freight 
    FROM 
        orders o1 
    WHERE o1.orderdate <  o.orderdate 
          AND o1.EmployeeID = o.EmployeeID
    ORDER BY
        orderdate desc
    
    ) as previous,
    freight,
    (
    SELECT TOP 1 
        freight 
    FROM 
        orders o1 
    WHERE o1.orderdate >  o.orderdate 
          AND o1.EmployeeID = o.EmployeeID
    ORDER BY
        orderdate asc
    
    ) as next
    
FROM
    orders o
WHERE
    orderid = @ordernum


Not a pretty solution, though. The number of items you return for prev and next is hardcoded, plus the resultset is flat.

You can't use use "TOP @x" in SQL Server, where @x is the number of rows you want. You could actually use this method by creating a SQL string and EXECuting it, but I am not a fan of this practice. If you did go this route, you could use three TOP queries UNIONed together.

The only way I can think to do it dynamically and elegantly is to use a temp table:


--select the ordernum, which would be passed in as a param
DECLARE @ordernum int
SELECT @ordernum = 10256
--select the number of rows before and after
declare @PrevNextCount int
SELECT @PrevNextCount = 5
--set the rowcount to limit the resultset for each query
SET ROWCOUNT @PrevNextCount

--explicitly creating the table with IDENTITY here, to avoid
--the case where the selected record has the same date as records before or
--after it, and thus comes back in the final query in the wrong spot
CREATE TABLE #mytemp
    (
        rowId int IDENTITY(1,1),
        orderId int,
        freight money
    )

--insert prev records into temp table
INSERT INTO #mytemp (orderId, freight)
SELECT 
    o.orderid,
    o.freight
FROM
    orders o
WHERE
    o.orderdate < (SELECT orderdate FROM orders where orderId = @ordernum)
    AND o.employeeId = (SELECT employeeid FROM orders WHERE orderid = @ordernum)
ORDER BY 
    orderdate DESC  

--insert "main" record
INSERT INTO #mytemp (orderId, freight) 
SELECT 
    o.orderid,
    o.freight
FROM
    orders o
WHERE
    o.orderId = @orderNum

--insert next records into temp table
INSERT INTO #mytemp (orderId, freight)
SELECT 
    o.orderid,
    o.freight
FROM
    orders o
WHERE
    o.orderdate > (SELECT orderdate FROM orders where orderId = @ordernum)
    AND o.employeeId = (SELECT employeeid FROM orders WHERE orderid = @ordernum)
ORDER BY 
    orderdate ASC   

--reset the rowcount to 0 (no limit)
SET ROWCOUNT 0

--return the contents of the temp table
SELECT orderId, freight FROM #mytemp ORDER BY orderId ASC

--drop the temp table
DROP TABLE #mytemp

This could obviously be cleaned up or supplemented to support the app tier.

Re-reading your original question, I'm wondering if I misunderstood the challange. Did you mean one stored proc, or (literally) one query? flushed flushed flushed

Funny, I thought this was an easy problem—it turned out to be trickier than I thought. wink

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 02-Sep-2005 11:35:07   

Thanks for the reply Phil! smile

psandler wrote:

I'm assuming SQL Server for this. Oracle and SQL (the two databases I am most familiar with) handle TOP/ROWNUM with very different syntax, so I assume that each database would require a different solution.

... code removed...

If you only need the next and previous item, it's pretty easy to do in a subquery

Not a pretty solution, though. The number of items you return for prev and next is hardcoded, plus the resultset is flat.

This would be perfect only the result is flat as you say... In my case I need all the row data. disappointed

psandler wrote:

You can't use use "TOP @x" in SQL Server, where @x is the number of rows you want. You could actually use this method by creating a SQL string and EXECuting it, but I am not a fan of this practice. If you did go this route, you could use three TOP queries UNIONed together.

I believe in SQL Server 2005, parameterised TOP is now supported. simple_smile

psandler wrote:

The only way I can think to do it dynamically and elegantly is to use a temp table:

... code removed...

Re-reading your original question, I'm wondering if I misunderstood the challange. Did you mean one stored proc, or (literally) one query? flushed flushed flushed

Not at all! my question was open to interpretation... I was simply interested in different opinions regarding solving this problem. I think the temp table example gives a neat result but may not be performant given the use of the temp table. It does have the advantage of a single database call however...

The other way to do this would be to have the stored procedure first select the sequence column for the target orderid into a parameter and then have it return 3 separate resultsets for previous, target and next rows... This would be my preferred approach since it would eliminate the temp table and the separate resultsets give me a clean differentiator between the previous, target and next sets of rows making the client side processing easier.

The only problem is that I don't think LLBLGen supports multiple resultsets from a call to a stored procedure... disappointed This thread (http://llblgen.com/tinyforum/Messages.aspx?ThreadID=2506) seems to suggest there is a way... But I'm not sure this method would work for an SP.

Frans - Any comment on this point?

psandler wrote:

Funny, I thought this was an easy problem—it turned out to be trickier than I thought. wink

Me too simple_smile and hence the post!

Thanks again for your input!

Rogelio
User
Posts: 221
Joined: 29-Mar-2005
# Posted on: 02-Sep-2005 13:01:42   

Marcus wrote:

Rogelio wrote:

A unique query to get the list of all the images a user has rights to see, this query retrieve the id of the images only and could be restricted to max. 50 or other number of records, from this query you can get the 3 images records each time using an IN sql predicate that contain the 3 id.

Thanks for your suggestion, however I'm not sure how you could get rows adjacent to the row whose ID you are looking for using this method... SQL only gives "sets" of results and because the where clause is filtering the resultset by an ID (which bears no relation to the ordering of the resultset), you are not able to get rows either side of the target.

Hi,

I thought I have to give more details:

  1. The system know the user id always.
  2. When the user requests to see the pictures, then system retrieve the list of all the pictures the user has rights to see, this list contain the id of the pictures (and may be the route the get picture if the picture is not saved in the database). This list is sorted by id.
  3. From the list is easy to navigate the current picture, the previous and next: 1234567 1235678 1236789 <- previous picture 1246789 <- current picture 2124567 <- next picture 3356378 3455432 4326754 4456555 .......

To get the 3 records from database: Select picture from pictureTable where id IN (1236789,1246789,2124567) order by id

If the pictures are not saved in the database, then you use the route to get each picture.

You only need to define a rule for the first picture to show to the user, that could be the last picture the user saw. Then you save the id of the last picture in the user record.

If the user decides to see the next picture (2124567), then:

1234567 1235678 1236789
1246789 <- previous picture 2124567 <- current picture 3356378 <- next picture 3455432 4326754 4456555 .......

And so on.

If you want performance, then as you have the previous, current [1246789] and next pictures: when the user move to the next picture, then you only has to retrieve one picture [3356378] .

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 02-Sep-2005 14:42:19   

Rogelio wrote:

I thought I have to give more details:

Thanks Rogelio!

There a couple of issues with your approach which would prohibit me being able to use it in my application... simple_smile

Rogelio wrote:

  1. The system know the user id always.

Agreed.

Rogelio wrote:

  1. When the user requests to see the pictures, then system retrieve the list of all the pictures the user has rights to see, this list contain the id of the pictures (and may be the route the get picture if the picture is not saved in the database). This list is sorted by id.

Unfortunately a given user may have rights to see millions of images and so pulling this data to the client is not possible (even if it were just the IDs). Also SQL Server can also sort rows much faster than the client since it may be able to use indexes...

Rogelio wrote:

If you want performance, then as you have the previous, current [1246789] and next pictures: when the user move to the next picture, then you only has to retrieve one picture [3356378] .

While performance is high priority, the application is stateless which means that nothing is kept in memory between page requests. The second request is likely to be served by a different web server in the cluster.

Thanks for you input!

Marcus

ianbanks
User
Posts: 9
Joined: 21-Sep-2005
# Posted on: 21-Sep-2005 17:15:38   

Marcus,

Would this do:

SELECT * FROM sometable WHERE id = (SELECT TOP 1 id FROM sometable WHERE id < @current ORDER BY id DESC) OR id = (SELECT TOP 1 id FROM sometable WHERE id > @current ORDER BY id ASC) OR id = @current ORDER BY id

Assuming you have an index on id, the minimum number of disk pages should be hit (the subqueries are executed once). The query should also be amenable to extra predicates (such as the security level checks).

Regards, Ben

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 21-Sep-2005 19:40:50   

Ben,

Thanks for the suggestion! simple_smile

ianbanks wrote:

Would this do:

SELECT * FROM sometable WHERE id = (SELECT TOP 1 id FROM sometable WHERE id < @current ORDER BY id DESC) OR id = (SELECT TOP 1 id FROM sometable WHERE id > @current ORDER BY id ASC) OR id = @current ORDER BY id

I think this only works if the desired resultset is ordered by "id". If we wanted to order by another field, 'name' for instance (but still target the middle row whose id we know...), then I think we would have a problem... disappointed

ianbanks
User
Posts: 9
Joined: 21-Sep-2005
# Posted on: 21-Sep-2005 20:33:38   

Marcus wrote:

Ben,

...

I think this only works if the desired resultset is ordered by "id". If we wanted to order by another field, 'name' for instance (but still target the middle row whose id we know...), then I think we would have a problem... disappointed

Hmm, how about:

SELECT * FROM SomeTable WHERE Id = (SELECT TOP 1 Id FROM SomeTable WHERE Name < (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) ORDER BY Name DESC) OR Id = (SELECT TOP 1 Id FROM SomeTable WHERE Name > (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) ORDER BY Name ASC) OR Id = @current ORDER BY Name

Of course, you didn't say that Name wasn't unique. grin

Regards, Ben

ianbanks
User
Posts: 9
Joined: 21-Sep-2005
# Posted on: 21-Sep-2005 20:51:53   

Also, replacing the next and before expressions with something like--

Id = (SELECT TOP 1 Id FROM SomeTable
    WHERE Name > (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) 
    OR (Name = (SELECT TOP 1 Name FROM SomeTable WHERE Id = @current) 
        AND Id > @current)
    ORDER BY Name ASC, Id ASC)

--should allow it to work even with duplicate names. (I haven't tested it or checked the query plan, so I'm not sure if it still uses the theoretical minimum number of page reads)...

Regards, Ben