La domanda: Perché le prestazioni di MySQL diminuiscono per le query che uniscono tabelle quasi vuote se eseguite in parallelo?Perché le prestazioni di MySQL diminuiscono quando le query vengono eseguite in parallelo?
Di seguito è fornita una spiegazione più dettagliata del problema che sto affrontando. Ho due tabelle in MySQL
CREATE TABLE first (
num int(10) NOT NULL,
UNIQUE KEY key_num (num)
) ENGINE=InnoDB
CREATE TABLE second (
num int(10) NOT NULL,
num2 int(10) NOT NULL,
UNIQUE KEY key_num (num, num2)
) ENGINE=InnoDB
Il primo contiene circa un migliaio di record. Il secondo è vuoto o contiene pochissimi record. Contiene anche un doppio indice che in qualche modo si riferisce al problema: il problema scompare per singolo indice. Ora sto cercando di fare molte query identiche a quelle tabelle in parallelo. Ogni query è simile al seguente:
SELECT first.num
FROM first
LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key
LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key
LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key
LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key
LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key
LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
Il problema che sto ricevendo è che invece di avere un aumento quasi lineare in termini di prestazioni su 8 core macchina Io in realtà sono una goccia. Vale a dire avere un processo, il numero tipico di richieste al secondo che ho è di circa 200. Avendo due processi invece di aumento previsto fino a 300 - 400 query al secondo ho effettivamente un calo fino a 150. Per 10 processi ho solo 70 query al secondo Il codice Perl che sto utilizzando per il test è la seguente:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Parallel::Benchmark;
use SQL::Abstract;
use SQL::Abstract::Plugin::InsertMulti;
my $children_dbh;
foreach my $second_table_row_count (0, 1, 1000) {
print '#' x 80, "\nsecond_table_row_count = $second_table_row_count\n";
create_and_fill_tables(1000, $second_table_row_count);
foreach my $concurrency (1, 2, 3, 4, 6, 8, 10, 20) {
my $bm = Parallel::Benchmark->new(
'benchmark' => sub {
_run_sql();
return 1;
},
'concurrency' => $concurrency,
'time' => 3,
);
my $result = $bm->run();
}
}
sub create_and_fill_tables {
my ($first_table_row_count, $second_table_row_count) = @_;
my $dbh = dbi_connect();
{
$dbh->do(q{DROP TABLE IF EXISTS first});
$dbh->do(q{
CREATE TABLE first (
num int(10) NOT NULL,
UNIQUE KEY key_num (num)
) ENGINE=InnoDB
});
if ($first_table_row_count) {
my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
'first',
['num'],
[map {[$_]} 1 .. $first_table_row_count],
);
$dbh->do($stmt, undef, @bind);
}
}
{
$dbh->do(q{DROP TABLE IF EXISTS second});
$dbh->do(q{
CREATE TABLE second (
num int(10) NOT NULL,
num2 int(10) NOT NULL,
UNIQUE KEY key_num (num, num2)
) ENGINE=InnoDB
});
if ($second_table_row_count) {
my ($stmt, @bind) = SQL::Abstract->new()->insert_multi(
'second',
['num'],
[map {[$_]} 1 .. $second_table_row_count],
);
$dbh->do($stmt, undef, @bind);
}
}
}
sub _run_sql {
$children_dbh ||= dbi_connect();
$children_dbh->selectall_arrayref(q{
SELECT first.num
FROM first
LEFT JOIN second AS second_1 ON second_1.num = -1
LEFT JOIN second AS second_2 ON second_2.num = -2
LEFT JOIN second AS second_3 ON second_3.num = -3
LEFT JOIN second AS second_4 ON second_4.num = -4
LEFT JOIN second AS second_5 ON second_5.num = -5
LEFT JOIN second AS second_6 ON second_6.num = -6
WHERE second_1.num IS NULL
AND second_2.num IS NULL
AND second_3.num IS NULL
AND second_4.num IS NULL
AND second_5.num IS NULL
AND second_6.num IS NULL
});
}
sub dbi_connect {
return DBI->connect(
'dbi:mysql:'
. 'database=tmp'
. ';host=localhost'
. ';port=3306',
'root',
'',
);
}
E per confrontare le query di questo tipo eseguito in concomitanza con l'aumento delle prestazioni:
SELECT first.num
FROM first
LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key
LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key
LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key
LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key
LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key
LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key
WHERE second_1.num IS NOT NULL
AND second_2.num IS NOT NULL
AND second_3.num IS NOT NULL
AND second_4.num IS NOT NULL
AND second_5.num IS NOT NULL
AND second_6.num IS NOT NULL
risultati dei test, misurazioni della CPU e di utilizzo del disco sono qui :
* table `first` have 1000 rows * table `second` have 6 rows: `[1,1],[2,2],..[6,6]` For query: SELECT first.num FROM first LEFT JOIN second AS second_1 ON second_1.num = -1 # non-existent key LEFT JOIN second AS second_2 ON second_2.num = -2 # non-existent key LEFT JOIN second AS second_3 ON second_3.num = -3 # non-existent key LEFT JOIN second AS second_4 ON second_4.num = -4 # non-existent key LEFT JOIN second AS second_5 ON second_5.num = -5 # non-existent key LEFT JOIN second AS second_6 ON second_6.num = -6 # non-existent key WHERE second_1.num IS NULL AND second_2.num IS NULL AND second_3.num IS NULL AND second_4.num IS NULL AND second_5.num IS NULL AND second_6.num IS NULL Results: concurrency: 1, speed: 162.910/sec concurrency: 2, speed: 137.818/sec concurrency: 3, speed: 130.728/sec concurrency: 4, speed: 107.387/sec concurrency: 6, speed: 90.513/sec concurrency: 8, speed: 80.445/sec concurrency: 10, speed: 80.381/sec concurrency: 20, speed: 84.069/sec System usage after for last 60 minutes of running query in 6 processes: $ iostat -cdkx 60 avg-cpu: %user %nice %system %iowait %steal %idle 74.82 0.00 0.08 0.00 0.08 25.02 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda1 0.00 0.00 0.00 0.12 0.00 0.80 13.71 0.00 1.43 1.43 0.02 sdf10 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf4 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 30.00 15.00 0.05 sdm 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf8 0.00 0.00 0.00 0.37 0.00 1.24 6.77 0.00 5.00 3.18 0.12 sdf6 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf9 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 0.00 0.00 0.00 sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf3 0.00 0.00 0.00 0.08 0.00 1.33 32.00 0.00 4.00 4.00 0.03 sdf2 0.00 0.00 0.00 0.17 0.00 1.37 16.50 0.00 3.00 3.00 0.05 sdf15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf1 0.00 0.00 0.00 0.05 0.00 0.40 16.00 0.00 0.00 0.00 0.00 sdf13 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf5 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 50.00 25.00 0.08 sdm2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdm1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf12 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf11 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf7 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 md0 0.00 0.00 0.00 0.97 0.00 13.95 28.86 0.00 0.00 0.00 0.00 ################################################################################ For query: SELECT first.num FROM first LEFT JOIN second AS second_1 ON second_1.num = 1 # existent key LEFT JOIN second AS second_2 ON second_2.num = 2 # existent key LEFT JOIN second AS second_3 ON second_3.num = 3 # existent key LEFT JOIN second AS second_4 ON second_4.num = 4 # existent key LEFT JOIN second AS second_5 ON second_5.num = 5 # existent key LEFT JOIN second AS second_6 ON second_6.num = 6 # existent key WHERE second_1.num IS NOT NULL AND second_2.num IS NOT NULL AND second_3.num IS NOT NULL AND second_4.num IS NOT NULL AND second_5.num IS NOT NULL AND second_6.num IS NOT NULL Results: concurrency: 1, speed: 875.973/sec concurrency: 2, speed: 944.986/sec concurrency: 3, speed: 1256.072/sec concurrency: 4, speed: 1401.657/sec concurrency: 6, speed: 1354.351/sec concurrency: 8, speed: 1110.100/sec concurrency: 10, speed: 1145.251/sec concurrency: 20, speed: 1142.514/sec System usage after for last 60 minutes of running query in 6 processes: $ iostat -cdkx 60 avg-cpu: %user %nice %system %iowait %steal %idle 74.40 0.00 0.53 0.00 0.06 25.01 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda1 0.00 0.00 0.00 0.02 0.00 0.13 16.00 0.00 0.00 0.00 0.00 sdf10 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf4 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdm 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf8 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf6 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 0.00 0.00 0.00 sdf9 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf3 0.00 0.00 0.00 0.13 0.00 2.67 40.00 0.00 3.75 2.50 0.03 sdf2 0.00 0.00 0.00 0.23 0.00 2.72 23.29 0.00 2.14 1.43 0.03 sdf15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf14 0.00 0.00 0.00 0.98 0.00 0.54 1.10 0.00 2.71 2.71 0.27 sdf1 0.00 0.00 0.00 0.08 0.00 1.47 35.20 0.00 8.00 6.00 0.05 sdf13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf5 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 sdm2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdm1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf11 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 0.00 0.00 0.00 sdf7 0.00 0.00 0.00 0.03 0.00 1.07 64.00 0.00 10.00 5.00 0.02 md0 0.00 0.00 0.00 1.70 0.00 15.92 18.74 0.00 0.00 0.00 0.00 ################################################################################ And this server has lots of free memory. Example of top: top - 19:02:59 up 4:23, 4 users, load average: 4.43, 3.03, 2.01 Tasks: 218 total, 1 running, 217 sleeping, 0 stopped, 0 zombie Cpu(s): 72.8%us, 0.7%sy, 0.0%ni, 26.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.1%st Mem: 71701416k total, 22183980k used, 49517436k free, 284k buffers Swap: 0k total, 0k used, 0k free, 1282768k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2506 mysql 20 0 51.7g 17g 5920 S 590 25.8 213:15.12 mysqld 9348 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.45 perl 9349 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.44 perl 9350 topadver 20 0 72256 11m 1428 S 2 0.0 0:01.45 perl 9351 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 9352 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 9353 topadver 20 0 72256 11m 1428 S 1 0.0 0:01.44 perl 9346 topadver 20 0 19340 1504 1064 R 0 0.0 0:01.89 top
Qualcuno ha un'idea del perché le prestazioni sono diminuite per la query con chiavi inesistenti?
perché si utilizza "dove num .... è nullo" se la tabella di creazione ha già una condizione NULL? – jcho360
@ jcho360 i join di sinistra creeranno valori nulli del genere. Sembra una configurazione. Awayka, potresti fornire alcune informazioni sul tuo server MYSQL? Ha più processori? – Twelfth
@Twelfth mysql Ver 14.14 Distrib 5.1.59, per debian-linux-gnu (x86_64) utilizzando readline 5.1 su m2.4xlarge istanza EC2 che ha 8 core – awayka