Forum Moderators: open
I haven't thought this through, but I figure there must be a good way to do this.
I have a table which resembles this:
5 ¦ 2 ¦ 4 ¦ 1 ¦ 3
1 ¦ 3 ¦ 2 ¦ 4 ¦ 5
7 ¦ 8 ¦ 1 ¦ 2 ¦ 5
6 ¦ 7 ¦ 8 ¦ 9 ¦ 5
1 ¦ 2 ¦ 3 ¦ 4 ¦ 5
I would like it to look like this:
5 ¦ 4 ¦ 3 ¦ 2 ¦ 1
5 ¦ 4 ¦ 3 ¦ 2 ¦ 1
8 ¦ 7 ¦ 5 ¦ 2 ¦ 1
9 ¦ 8 ¦ 7 ¦ 6 ¦ 5
5 ¦ 4 ¦ 3 ¦ 2 ¦ 1
see - each row has 5 numbers in 5 columns, no particular order.
I would like column 1 to contain the largest number, column 2 to contain the lext-largest... sorted within the row.
My first hunch is that I need to do a bubble sort on each row; swapping values in columns with an UPDATE. I can't immediately think of the SQL syntax for this, but I suspect it's possible. The bubble sort would compare and swap these columns in order:
1-2, 2-3, 3-4, 4-5, 1-2, 2-3, 3-4, 1-2, 2-3, 1-2
Now, my challenge is that I'm doing this to over 14 million records. It's a big denormalization effort to make some of my other queries faster. It's all going to be preprocessed, so time is not a big factor. But simplicity and efficiency is always a good thing.
I could do this pretty easily with a PHP script, using a loop that reads, sorts, and updates each row. Is there a way to do the same thing with SQL?
Is there an easier/better way?
My guess is that it's easiest to read the data, re-sort in PHP, and then update back to the DB. Depending on your DB, you may be able to do the same kind of functionality in a user-defined function in the DB. But my experience has been that DBs don't play as nicely with lists as programming languages do.
I'm not a PHP expert, but I assume it has some built-in functions for manipulating lists. So, read a row, convert the 5 fields into a list, resort the list, then update the table (field1=SortedItem1, field2=SortedItem2, etc.).
This makes the SQL-only solution merely academic. While the PHP is running doing its job, I'll still be wondering if there was a way to do it without stressing out my Apache server.
On a related topic, here's how I often approach large iterative problems like this:
in a kind of pseudo-code,
1) I create a PHP script that performs a SQL action using $_GET['i']
2) the PHP script does its thing to record i
3) the PHP script outputs a basic "trace" or "log" of anything interesting that it does that I might want to monitor
4) script also looks in the database for the next item in the database:
SELECT id as nextid FROM items WHERE id > i LIMIT 15) after the work is done, the script outputs a little JavaScript that will reload the page (this is my "loop"), often with a little setTimeout() so there's a slight delay between reloads:
<script>location.href=thisscript.php?i=<?php echo($nextid);?></script>
once all that is built:
6) I open said PHP page in a browser. I like to use Chrome for jobs like this
7) page renders, does its SQL thing
8) page reloads with ?i=[i+1]. GOTO 7.
9) when there is no "next" record, it stops.
I admit it's a wacky way to do it, but it works... it lets me stop the process at any time and restart it at any point just by opening the page with ?i=n. When you have a process that could take many hours, days, or weeks to complete, it's not practical to put it in a single PHP "while()" loop. The script may timeout or die and ... then what?
I've used this method so many times, I've got a boilerplate version where I can drop in a SQL statement and some additional logic, and let it run.
The downside:
- it hammers my Apache server hard with HTTP requests
- the server log gets full too
- if you run said script for X days or weeks, the browser cache gets maxed out. There may be steps I can take to prevent that, but I haven't bothered to figure it out
- I imagine it's comparatively slow compared to other possible methods
- it's not transactional. I mean, it's not error-tolerant. I might let it run for a while, and if for some reason it crashes, I've got to start off exactly where it quit... not easy if I'm doing calculations that change the data (like, val = val + 1). I can solve this by putting a flag in the database that gets flipped when the process is run, but that means altering the table schema, which I dislike doing.
opinions?
I don't usually log anything (again, it depends on the data), although I may have it e-mail me an alert if it dies.
If it's not something I need to constantly watch and it's going to take a while, I'll open the browser from the server itself and let it run there. Then I don't have to worry about any power outages or network interruptions on my desktop affecting the process.
#!/usr/bin/perl
use DBI;
$sql_login= 'test_usr';
$sql_pass= 'testme';
$db_name= 'test';
$db_host= 'localhost';
$conn_string = "DBI:mysql:$db_name";
if ($db_host) { $conn_string .= ":$db_host"; }
$dbh = DBI->connect("$conn_string",$sql_login,$sql_pass);
$select = qq¦create table if not exists sorttest(id int(11) primary key auto_increment,¦;
$select .= qq¦one int(11),two int(11),three int(11),four int(11),five int(11));¦;
$sth = $dbh->prepare("$select");
$rv = $sth->execute or &error("Cannot create table",$dbh->errstr());
$sth->finish;
@qs=(
'insert into sorttest (one,two,three,four,five) values (5,2,4,1,3);',
'insert into sorttest(one,two,three,four,five) values (1,3,2,4,5);',
'insert into sorttest(one,two,three,four,five) values (7,8,1,2,5);',
'insert into sorttest(one,two,three,four,five) values (6,7,8,9,5);',
'insert into sorttest(one,two,three,four,five) values (1,2,3,4,5);'
);
$sth = $dbh->prepare("delete from sorttest");
$rv = $sth->execute or &error("Cannot delete data",$dbh->errstr());
$sth->finish;
foreach $v (@qs) {
$sth = $dbh->prepare("$v");
$rv = $sth->execute or &error("Cannot insert into table",$dbh->errstr());
$sth->finish;
}
$sth = $dbh->prepare("select * from sorttest");
$rv = $sth->execute or &error("Cannot get data",$dbh->errstr());
while ((@row) = $sth->fetchrow_array) {
$id = shift(@row);
@sort = reverse sort @row;
$select = qq¦update sorttest set one=$sort[0],two=$sort[1],three=$sort[2],four=$sort[3],five=$sort[4] where id=$id;¦;
$sth2 = $dbh->prepare("$select");
$rv2 = $sth2->execute or &error("Cannot update table",$dbh->errstr());
$sth2->finish;
}
$sth->finish;
$sth = $dbh->prepare("select * from sorttest;");
$rv = $sth->execute or &error("Cannot get sorted results",$dbh->errstr());
while (($id,$one,$two,$three,$four,$five) = $sth->fetchrow_array) {
print "$one,$two,$three,$four,$five\n";
}
$sth->finish;
$sth = $dbh->prepare("drop table sorttest;");
$rv = $sth->execute or &error("Cannot drop table",$dbh->errstr());
$sth->finish;
############################################################
sub error {
my ($err,$dberr);
($err,$dberr) = (@_);
print "crap. Error: $err \n DB error: $dberr";
exit 0;
}