2012-07-16

I have a query generated by the SQLAlchemy ORM. It is supposed to retrieve the stream_items for a specific course, along with all of their parts (resources, content text blocks, etc.) and the users who posted them. However, this query seems to be extremely slow: it takes minutes on our production database, which has 20,000 users, about 25 stream_items for the course, and a couple of content text blocks per stream_item. Note that there are very few records other than users in the database, because we imported a batch of users but very little content. How can I optimize this query produced by SQLAlchemy?

Edit: note that every object ID is a foreign key into the franklin_object table.

I have tried looking at the query and identified several worrying spots (from the EXPLAIN output):

  1. One of the lookups is 'Using temporary; Using filesort'.
  2. The user table is hit twice with no index.
  3. The content text block table is hit twice with no index.

However, I really don't know what to do about these, especially the last two problems.

Here is the query:

SELECT stream_item.id        AS stream_item_id, 
     franklin_object.id       AS franklin_object_id, 
     franklin_object.type       AS franklin_object_type, 
     franklin_object.uuid       AS franklin_object_uuid, 
     stream_item.parent_id      AS stream_item_parent_id, 
     stream_item.shown_at       AS stream_item_shown_at, 
     stream_item.author_id      AS stream_item_author_id, 
     stream_item.stream_sort_at     AS stream_item_stream_sort_at, 
     stream_item.PUBLIC       AS stream_item_public, 
     stream_item.created_at      AS stream_item_created_at, 
     stream_item.updated_at      AS stream_item_updated_at, 
     anon_1.content_text_block_text    AS anon_1_content_text_block_text, 
     anon_2.resource_id       AS anon_2_resource_id, 
     anon_2.franklin_object_id     AS anon_2_franklin_object_id, 
     anon_2.franklin_object_type     AS anon_2_franklin_object_type, 
     anon_2.franklin_object_uuid     AS anon_2_franklin_object_uuid, 
     anon_2.resource_top_parent_resource   AS anon_2_resource_top_parent_resource, 
     anon_2.resource_top_parent_id    AS anon_2_resource_top_parent_id, 
     anon_2.resource_title      AS anon_2_resource_title, 
     anon_2.resource_url       AS anon_2_resource_url, 
     anon_2.resource_image      AS anon_2_resource_image, 
     anon_2.resource_created_at     AS anon_2_resource_created_at, 
     anon_2.resource_updated_at     AS anon_2_resource_updated_at, 
     franklin_object_1.id       AS franklin_object_1_id, 
     franklin_object_1.type      AS franklin_object_1_type, 
     franklin_object_1.uuid      AS franklin_object_1_uuid, 
     anon_1.content_text_block_id     AS anon_1_content_text_block_id, 
     anon_1.franklin_object_id     AS anon_1_franklin_object_id, 
     anon_1.franklin_object_type     AS anon_1_franklin_object_type, 
     anon_1.franklin_object_uuid     AS anon_1_franklin_object_uuid, 
     anon_1.content_text_block_position   AS anon_1_content_text_block_position, 
     anon_1.content_text_block_franklin_object_id AS anon_1_content_text_block_franklin_object_id, 
     anon_1.content_text_block_created_at   AS anon_1_content_text_block_created_at, 
     anon_1.content_text_block_updated_at   AS anon_1_content_text_block_updated_at, 
     anon_3.user_password       AS anon_3_user_password, 
     anon_3.user_auth_token      AS anon_3_user_auth_token, 
     anon_3.user_id        AS anon_3_user_id, 
     anon_3.franklin_object_id     AS anon_3_franklin_object_id, 
     anon_3.franklin_object_type     AS anon_3_franklin_object_type, 
     anon_3.franklin_object_uuid     AS anon_3_franklin_object_uuid, 
     anon_3.user_email       AS anon_3_user_email, 
     anon_3.user_auth_token_expiration   AS anon_3_user_auth_token_expiration, 
     anon_3.user_active       AS anon_3_user_active, 
     anon_3.user_activation_token     AS anon_3_user_activation_token, 
     anon_3.user_first_name      AS anon_3_user_first_name, 
     anon_3.user_last_name      AS anon_3_user_last_name, 
     anon_3.user_image       AS anon_3_user_image, 
     anon_3.user_bio        AS anon_3_user_bio, 
     anon_3.user_aspirations      AS anon_3_user_aspirations, 
     anon_3.user_website       AS anon_3_user_website, 
     anon_3.user_resume       AS anon_3_user_resume, 
     anon_3.user_resume_name      AS anon_3_user_resume_name, 
     anon_3.user_primary_role      AS anon_3_user_primary_role, 
     anon_3.user_institution_id     AS anon_3_user_institution_id, 
     anon_3.user_birth_date      AS anon_3_user_birth_date, 
     anon_3.user_gender       AS anon_3_user_gender, 
     anon_3.user_graduation_year     AS anon_3_user_graduation_year, 
     anon_3.user_complete       AS anon_3_user_complete, 
     anon_3.user_masthead_y_position    AS anon_3_user_masthead_y_position, 
     anon_3.user_masthead       AS anon_3_user_masthead, 
     anon_3.user_fb_access_token     AS anon_3_user_fb_access_token, 
     anon_3.user_fb_user_id      AS anon_3_user_fb_user_id, 
     anon_3.user_location       AS anon_3_user_location, 
     anon_3.user_created_at      AS anon_3_user_created_at, 
     anon_3.user_updated_at      AS anon_3_user_updated_at, 
     anon_4.content_text_block_text    AS anon_4_content_text_block_text, 
     anon_4.content_text_block_id     AS anon_4_content_text_block_id, 
     anon_4.franklin_object_id     AS anon_4_franklin_object_id, 
     anon_4.franklin_object_type     AS anon_4_franklin_object_type, 
     anon_4.franklin_object_uuid     AS anon_4_franklin_object_uuid, 
     anon_4.content_text_block_position   AS anon_4_content_text_block_position, 
     anon_4.content_text_block_franklin_object_id AS anon_4_content_text_block_franklin_object_id, 
     anon_4.content_text_block_created_at   AS anon_4_content_text_block_created_at, 
     anon_4.content_text_block_updated_at   AS anon_4_content_text_block_updated_at, 
     anon_5.user_password       AS anon_5_user_password, 
     anon_5.user_auth_token      AS anon_5_user_auth_token, 
     anon_5.user_id        AS anon_5_user_id, 
     anon_5.franklin_object_id     AS anon_5_franklin_object_id, 
     anon_5.franklin_object_type     AS anon_5_franklin_object_type, 
     anon_5.franklin_object_uuid     AS anon_5_franklin_object_uuid, 
     anon_5.user_email       AS anon_5_user_email, 
     anon_5.user_auth_token_expiration   AS anon_5_user_auth_token_expiration, 
     anon_5.user_active       AS anon_5_user_active, 
     anon_5.user_activation_token     AS anon_5_user_activation_token, 
     anon_5.user_first_name      AS anon_5_user_first_name, 
     anon_5.user_last_name      AS anon_5_user_last_name, 
     anon_5.user_image       AS anon_5_user_image, 
     anon_5.user_bio        AS anon_5_user_bio, 
     anon_5.user_aspirations      AS anon_5_user_aspirations, 
     anon_5.user_website       AS anon_5_user_website, 
     anon_5.user_resume       AS anon_5_user_resume, 
     anon_5.user_resume_name      AS anon_5_user_resume_name, 
     anon_5.user_primary_role      AS anon_5_user_primary_role, 
     anon_5.user_institution_id     AS anon_5_user_institution_id, 
     anon_5.user_birth_date      AS anon_5_user_birth_date, 
     anon_5.user_gender       AS anon_5_user_gender, 
     anon_5.user_graduation_year     AS anon_5_user_graduation_year, 
     anon_5.user_complete       AS anon_5_user_complete, 
     anon_5.user_masthead_y_position    AS anon_5_user_masthead_y_position, 
     anon_5.user_masthead       AS anon_5_user_masthead, 
     anon_5.user_fb_access_token     AS anon_5_user_fb_access_token, 
     anon_5.user_fb_user_id      AS anon_5_user_fb_user_id, 
     anon_5.user_location       AS anon_5_user_location, 
     anon_5.user_created_at      AS anon_5_user_created_at, 
     anon_5.user_updated_at      AS anon_5_user_updated_at, 
     anon_6.stream_item_id      AS anon_6_stream_item_id, 
     anon_6.franklin_object_id     AS anon_6_franklin_object_id, 
     anon_6.franklin_object_type     AS anon_6_franklin_object_type, 
     anon_6.franklin_object_uuid     AS anon_6_franklin_object_uuid, 
     anon_6.stream_item_parent_id     AS anon_6_stream_item_parent_id, 
     anon_6.stream_item_shown_at     AS anon_6_stream_item_shown_at, 
     anon_6.stream_item_author_id     AS anon_6_stream_item_author_id, 
     anon_6.stream_item_stream_sort_at   AS anon_6_stream_item_stream_sort_at, 
     anon_6.stream_item_public     AS anon_6_stream_item_public, 
     anon_6.stream_item_created_at    AS anon_6_stream_item_created_at, 
     anon_6.stream_item_updated_at    AS anon_6_stream_item_updated_at 
FROM franklin_object 
     INNER JOIN stream_item 
       ON franklin_object.id = stream_item.id 
     INNER JOIN (SELECT franklin_object.id     AS franklin_object_id, 
          franklin_object.type     AS franklin_object_type, 
          franklin_object.uuid     AS franklin_object_uuid, 
          content_text_block.id     AS content_text_block_id, 
          content_text_block.text    AS content_text_block_text, 
          content_text_block.position   AS content_text_block_position, 
          content_text_block.franklin_object_id AS content_text_block_franklin_object_id, 
          content_text_block.created_at   AS content_text_block_created_at, 
          content_text_block.updated_at   AS content_text_block_updated_at 
        FROM franklin_object 
          INNER JOIN content_text_block 
            ON franklin_object.id = content_text_block.id) AS anon_1 
       ON stream_item.id = anon_1.content_text_block_franklin_object_id 
     LEFT OUTER JOIN contents_resources AS contents_resources_1 
        ON anon_1.content_text_block_id = contents_resources_1.content_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type   AS franklin_object_type, 
           franklin_object.uuid   AS franklin_object_uuid, 
           resource.id     AS resource_id, 
           resource.top_parent_resource AS resource_top_parent_resource, 
           resource.top_parent_id  AS resource_top_parent_id, 
           resource.title    AS resource_title, 
           resource.url     AS resource_url, 
           resource.image    AS resource_image, 
           resource.created_at   AS resource_created_at, 
           resource.updated_at   AS resource_updated_at 
         FROM franklin_object 
           INNER JOIN resource 
             ON franklin_object.id = resource.id) AS anon_2 
        ON anon_2.resource_id = contents_resources_1.resource_id 
     LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_1 
        ON anon_1.content_text_block_id = contents_franklin_objects_1.content_id 
     LEFT OUTER JOIN franklin_object AS franklin_object_1 
        ON franklin_object_1.id = contents_franklin_objects_1.franklin_object_id 
     LEFT OUTER JOIN likers AS likers_1 
        ON stream_item.id = likers_1.post_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           USER.id     AS user_id, 
           USER.email     AS user_email, 
           USER.password    AS user_password, 
           USER.auth_token   AS user_auth_token, 
           USER.auth_token_expiration AS user_auth_token_expiration, 
           USER.active    AS user_active, 
           USER.activation_token  AS user_activation_token, 
           USER.first_name   AS user_first_name, 
           USER.last_name    AS user_last_name, 
           USER.image     AS user_image, 
           USER.bio     AS user_bio, 
           USER.aspirations   AS user_aspirations, 
           USER.website    AS user_website, 
           USER.resume    AS user_resume, 
           USER.resume_name   AS user_resume_name, 
           USER.primary_role   AS user_primary_role, 
           USER.institution_id  AS user_institution_id, 
           USER.birth_date   AS user_birth_date, 
           USER.gender    AS user_gender, 
           USER.graduation_year  AS user_graduation_year, 
           USER.complete    AS user_complete, 
           USER.masthead_y_position AS user_masthead_y_position, 
           USER.masthead    AS user_masthead, 
           USER.fb_access_token  AS user_fb_access_token, 
           USER.fb_user_id   AS user_fb_user_id, 
           USER.location    AS user_location, 
           USER.created_at   AS user_created_at, 
           USER.updated_at   AS user_updated_at 
         FROM franklin_object 
           INNER JOIN USER 
             ON franklin_object.id = USER.id) AS anon_3 
        ON anon_3.user_id = likers_1.user_id 
     LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_2 
        ON franklin_object.id = contents_franklin_objects_2.franklin_object_id 
     LEFT OUTER JOIN (SELECT franklin_object.id     AS franklin_object_id, 
           franklin_object.type     AS franklin_object_type, 
           franklin_object.uuid     AS franklin_object_uuid, 
           content_text_block.id     AS content_text_block_id, 
           content_text_block.text    AS content_text_block_text, 
           content_text_block.position   AS content_text_block_position, 
           content_text_block.franklin_object_id AS content_text_block_franklin_object_id, 
           content_text_block.created_at   AS content_text_block_created_at, 
           content_text_block.updated_at   AS content_text_block_updated_at 
         FROM franklin_object 
           INNER JOIN content_text_block 
             ON franklin_object.id = content_text_block.id) AS anon_4 
        ON anon_4.content_text_block_id = contents_franklin_objects_2.content_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           stream_item.id    AS stream_item_id, 
           stream_item.parent_id  AS stream_item_parent_id, 
           stream_item.shown_at  AS stream_item_shown_at, 
           stream_item.author_id  AS stream_item_author_id, 
           stream_item.stream_sort_at AS stream_item_stream_sort_at, 
           stream_item.PUBLIC   AS stream_item_public, 
           stream_item.created_at  AS stream_item_created_at, 
           stream_item.updated_at  AS stream_item_updated_at 
         FROM franklin_object 
           INNER JOIN stream_item 
             ON franklin_object.id = stream_item.id) AS anon_6 
        ON anon_6.stream_item_parent_id = franklin_object.id 
     LEFT OUTER JOIN likers AS likers_2 
        ON anon_6.stream_item_id = likers_2.post_id 
     LEFT OUTER JOIN (SELECT franklin_object.id   AS franklin_object_id, 
           franklin_object.type  AS franklin_object_type, 
           franklin_object.uuid  AS franklin_object_uuid, 
           USER.id     AS user_id, 
           USER.email     AS user_email, 
           USER.password    AS user_password, 
           USER.auth_token   AS user_auth_token, 
           USER.auth_token_expiration AS user_auth_token_expiration, 
           USER.active    AS user_active, 
           USER.activation_token  AS user_activation_token, 
           USER.first_name   AS user_first_name, 
           USER.last_name    AS user_last_name, 
           USER.image     AS user_image, 
           USER.bio     AS user_bio, 
           USER.aspirations   AS user_aspirations, 
           USER.website    AS user_website, 
           USER.resume    AS user_resume, 
           USER.resume_name   AS user_resume_name, 
           USER.primary_role   AS user_primary_role, 
           USER.institution_id  AS user_institution_id, 
           USER.birth_date   AS user_birth_date, 
           USER.gender    AS user_gender, 
           USER.graduation_year  AS user_graduation_year, 
           USER.complete    AS user_complete, 
           USER.masthead_y_position AS user_masthead_y_position, 
           USER.masthead    AS user_masthead, 
           USER.fb_access_token  AS user_fb_access_token, 
           USER.fb_user_id   AS user_fb_user_id, 
           USER.location    AS user_location, 
           USER.created_at   AS user_created_at, 
           USER.updated_at   AS user_updated_at 
         FROM franklin_object 
           INNER JOIN USER 
             ON franklin_object.id = USER.id) AS anon_5 
        ON anon_5.user_id = likers_2.user_id 
WHERE stream_item.parent_id = 11 
ORDER BY stream_item.stream_sort_at DESC, 
      anon_1.content_text_block_position, 
      anon_6.stream_item_stream_sort_at DESC 

And the EXPLAIN output:

ID SELECT_TYPE TABLE TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS EXTRA 
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 599 Using temporary; Using filesort 
1 PRIMARY stream_item eq_ref PRIMARY,parent_id PRIMARY 4 anon_1.content_text_block_franklin_object_id 1 Using where 
1 PRIMARY contents_resources_1 ref content_id content_id 5 anon_1.content_text_block_id 2 
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 7 
1 PRIMARY contents_franklin_objects_1 ref content_id content_id 5 anon_1.content_text_block_id 1 
1 PRIMARY franklin_object eq_ref PRIMARY PRIMARY 4 franklin.stream_item.id 1 Using where 
1 PRIMARY franklin_object_1 eq_ref PRIMARY PRIMARY 4 franklin.contents_franklin_objects_1.franklin_object_id 1 
1 PRIMARY likers_1 ref post_id post_id 5 franklin.stream_item.id 1 
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 136 
1 PRIMARY contents_franklin_objects_2 ref franklin_object_id franklin_object_id 5 franklin.stream_item.id 1 
1 PRIMARY <derived5> ALL NULL NULL NULL NULL 599 
1 PRIMARY <derived6> ALL NULL NULL NULL NULL 608 
1 PRIMARY likers_2 ref post_id post_id 5 anon_6.stream_item_id 1 
1 PRIMARY <derived7> ALL NULL NULL NULL NULL 136 
7 DERIVED user ALL PRIMARY NULL NULL NULL 133 
7 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.user.id 1 
6 DERIVED stream_item ALL PRIMARY NULL NULL NULL 709 
6 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.stream_item.id 1 
5 DERIVED content_text_block ALL PRIMARY NULL NULL NULL 666 
5 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.content_text_block.id 1 
4 DERIVED user ALL PRIMARY NULL NULL NULL 133 
4 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.user.id 1 
3 DERIVED resource ALL PRIMARY NULL NULL NULL 7 
3 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.resource.id 1 
2 DERIVED content_text_block ALL PRIMARY NULL NULL NULL 666 
2 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.content_text_block.id 1 

How can I reduce all of this to something faster? What other ways are there to speed it up?

Is the way the Franklin objects are set up an antipattern? The way it works is that the franklin_object table has two columns: id and type. Then each type is its own table, with a primary key that is a foreign key into franklin_object.
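A minimal sketch of the layout described above, using Python's built-in sqlite3 module instead of MySQL. The base table follows the question's description (id and type); the user table is a hypothetical, trimmed-down stand-in for the real one. It shows why reading any subtype object with its base columns means joining back to franklin_object, which is the shape of every anon_N subquery in the generated SQL.

```python
import sqlite3

def build_schema(conn: sqlite3.Connection) -> None:
    """Create the base table and one subtype table whose primary key is also a foreign key to the base."""
    conn.executescript("""
        CREATE TABLE franklin_object (
            id   INTEGER PRIMARY KEY,
            type TEXT NOT NULL           -- which subtype table holds the rest of the row
        );
        -- Hypothetical, trimmed-down subtype table; the real one has many more columns.
        CREATE TABLE "user" (
            id         INTEGER PRIMARY KEY REFERENCES franklin_object(id),
            first_name TEXT
        );
    """)

def insert_user(conn: sqlite3.Connection, object_id: int, first_name: str) -> None:
    # Every subtype row needs a matching base row with the same id.
    conn.execute("INSERT INTO franklin_object (id, type) VALUES (?, 'user')", (object_id,))
    conn.execute('INSERT INTO "user" (id, first_name) VALUES (?, ?)', (object_id, first_name))

def load_users(conn: sqlite3.Connection) -> list:
    # Loading a full "user" object joins the base table back in.
    return conn.execute("""
        SELECT franklin_object.id, franklin_object.type, "user".first_name
        FROM franklin_object
        JOIN "user" ON franklin_object.id = "user".id
        ORDER BY franklin_object.id
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_schema(conn)
insert_user(conn, 1, "Ada")
insert_user(conn, 2, "Grace")
print(load_users(conn))  # [(1, 'user', 'Ada'), (2, 'user', 'Grace')]
```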

The code that generates the SQL is something along the lines of:

stream_item_query = StreamItem.query.options(db.joinedload('stream_items'),db.joinedload('contents_included_in'),db.joinedload('contents.resources'),db.joinedload('contents.objects'),db.subqueryload('likers'))

stream_items = stream_item_query.filter(StreamItem.parent_id == community_id).order_by(db.desc(StreamItem.stream_sort_at)).all()


Added the ORM code above. –


Are your classes mapped to tables, or to select statements over several tables? The joins are kind of odd: 'l join (select * from a join b) r' instead of what I would expect, 'l join b join a'. – SingleNegationElimination


Every class inherits from franklin_object (joined-table inheritance). –

Answer


Wow, this hurts my brain a bit. Working out what the query is doing, and what all the tables and relationships are, is tedious. If you have a similar experience, let that be the first clue that you are probably trying to do too much in this single query.

My suggestion is to rethink the whole approach.

SQLAlchemy is a very nice tool, and I'm not going to knock it (or the choice of MySQL), but as with most ORM tools you need to weigh the costs of using it. One example is this franklin_object table business. Is it an anti-pattern? Yes and no. It makes sense from a purely OO perspective: you can determine which tables to query by looking up an id in this table. From a relational query perspective, it serves very little purpose. I could remove every instance of franklin_object from the query and lose nothing but... the columns from franklin_object. If that is a viable option, I would do it right away.
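To check that claim on a toy dataset, here is a sketch with sqlite3 (table and column names trimmed down from the question; the data is made up). Because every subtype id must already exist in franklin_object, the inner join to the base table cannot filter anything out, so reading the subtype table on its own returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the base-table foreign key
conn.executescript("""
    CREATE TABLE franklin_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    -- Hypothetical, trimmed-down version of the real content_text_block table.
    CREATE TABLE content_text_block (
        id       INTEGER PRIMARY KEY REFERENCES franklin_object(id),
        text     TEXT,
        position INTEGER
    );
""")
for object_id, text in [(10, "intro"), (11, "body"), (12, "outro")]:
    conn.execute("INSERT INTO franklin_object VALUES (?, 'content_text_block')", (object_id,))
    conn.execute("INSERT INTO content_text_block VALUES (?, ?, ?)", (object_id, text, object_id - 10))

# Shape of the ORM-generated subquery: base table joined to the subtype table.
with_base = conn.execute("""
    SELECT ctb.id, ctb.text
    FROM franklin_object AS fo
    JOIN content_text_block AS ctb ON fo.id = ctb.id
    ORDER BY ctb.id
""").fetchall()

# The same rows straight from the subtype table, without touching franklin_object.
without_base = conn.execute(
    "SELECT id, text FROM content_text_block ORDER BY id"
).fetchall()

print(with_base == without_base)  # True
```

The only thing lost is franklin_object's own columns (type and so on), which is exactly the trade-off described above.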

Let's examine this link to franklin_object further. Looking at the subqueries, they all have the same shape:

SELECT franklin_object.id   AS franklin_object_id, 
     franklin_object.type   AS franklin_object_type, 
     franklin_object.uuid   AS franklin_object_uuid, 
     linked_table.id    AS linked_table_id, 
     linked_table.col2   AS col2 --and more 
    FROM franklin_object 
    INNER JOIN linked_table 
     ON franklin_object.id = linked_table.id) AS anon_n 

That doesn't give the database much to go on when it comes to optimizing this part of the query, statistics aside. Perhaps if franklin_object were restricted by specifying the type in a WHERE clause, the query would do better. Maybe.
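A sketch of what "restrict franklin_object by type" could look like, using sqlite3 with made-up data; the index name franklin_object_type_id is hypothetical. Whether MySQL's optimizer actually benefits from the extra predicate is the open question the answer flags with "maybe"; this only shows that the filtered subquery returns the same rows while handing the planner an indexable condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE franklin_object (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    -- Hypothetical composite index so a type filter can be answered from the index.
    CREATE INDEX franklin_object_type_id ON franklin_object (type, id);
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO franklin_object VALUES (1, 'user'), (2, 'user'), (3, 'stream_item');
    INSERT INTO "user" VALUES (1, 'Ada'), (2, 'Grace');
""")

unfiltered = """
    SELECT fo.id, fo.type, u.first_name
    FROM franklin_object AS fo JOIN "user" AS u ON fo.id = u.id
    ORDER BY fo.id
"""
# Same subquery, but the base table is narrowed to rows of the matching type.
filtered = """
    SELECT fo.id, fo.type, u.first_name
    FROM franklin_object AS fo JOIN "user" AS u ON fo.id = u.id
    WHERE fo.type = 'user'
    ORDER BY fo.id
"""
print(conn.execute(filtered).fetchall())  # [(1, 'user', 'Ada'), (2, 'user', 'Grace')]
```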

This is especially problematic with the USER table, since that table has a lot of records (as you say). Because you are querying most of its columns and the optimizer cannot accurately estimate how many rows will be fetched, it makes sense for it to do a full table scan. In your case, twice.
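One way around the full scans is to join likers straight to the user table by primary key instead of through a derived table built from franklin_object. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the two planners differ (the <derivedN> ALL rows above come from MySQL materializing derived tables without indexes), so treat it only as an illustration. The index name likers_post_id is hypothetical, although the question's EXPLAIN suggests a post_id index already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE likers (post_id INTEGER, user_id INTEGER);
    CREATE INDEX likers_post_id ON likers (post_id);  -- hypothetical index name
""")

query = """
    SELECT u.id, u.first_name, u.last_name
    FROM likers AS l
    JOIN "user" AS u ON u.id = l.user_id   -- primary-key lookup, no derived table
    WHERE l.post_id = ?
"""

# Each plan row ends with a human-readable detail string; keep only that column.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (11,))]
for step in plan:
    print(step)  # both steps should be index lookups (SEARCH ...), not full scans (SCAN ...)
```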

Another aspect is the sheer number of joins involved. Even if we take out all the franklin_object references, there are still 11 joins. That wouldn't be terrible if your data model were more relational, but it isn't. The generated query gives the database very little help in working out the best way to execute it, so it doesn't do a good job. You might mitigate this with optimizer hints and the like, but I bet this will bite you in the long run.

You're using an ORM tool, so really use it. You gain nothing by having one huge query do everything at once; splitting it up a bit could help performance. Use lazy fetches to avoid huge, complicated queries. Just to see how it goes, I'd try doing everything lazily. Performance will probably be OK, I'd say better. Not great, probably not even acceptable, but better than having time to go get a coffee while the database churns.

Then start putting things together in leaner chunks. Join objects that logically belong together, such as resource and contents_resources. Another example: the connection between stream_item, likers and user is duplicated. Make one query for it and let SQLAlchemy do its job.
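As a sketch of a single query for the stream_item/likers/user link: instead of joining likers and user twice inside the big query (once for the top-level items, once for their child items), collect all the stream item ids you need and fetch their likers in one small query, then group the rows in Python. sqlite3 is used to keep it self-contained, the data is made up, and likers_by_post is a hypothetical helper name. Newer SQLAlchemy releases ship a similar IN-based strategy as selectinload().

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE likers (post_id INTEGER, user_id INTEGER);
    CREATE INDEX likers_post_id ON likers (post_id);
    INSERT INTO "user" VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO likers VALUES (5, 1), (5, 2), (7, 3), (8, 1);
""")

def likers_by_post(post_ids):
    """Hypothetical helper: fetch the likers of many stream items in one query, grouped by post id."""
    if not post_ids:
        return {}
    placeholders = ", ".join("?" for _ in post_ids)  # one "?" per id, values bound safely
    rows = conn.execute(
        f"""SELECT l.post_id, u.first_name
            FROM likers AS l JOIN "user" AS u ON u.id = l.user_id
            WHERE l.post_id IN ({placeholders})
            ORDER BY l.post_id, u.id""",
        list(post_ids),
    ).fetchall()
    grouped = defaultdict(list)
    for post_id, first_name in rows:
        grouped[post_id].append(first_name)
    return dict(grouped)

# Top-level items (5) and child items (7) go through the same query.
print(likers_by_post([5, 7]))  # {5: ['Ada', 'Grace'], 7: ['Linus']}
```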

As a last resort, you could implement some kind of caching mechanism, or denormalize the tables somewhere. In a slowly changing, read-heavy system, you could copy this data into another structure where queries are simple and fast; that is, do the processing up front and store the result in a single table.
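A minimal sketch of "do the processing up front and store it in a single table", again with sqlite3 and made-up data. The stream_feed table and the refresh_feed/read_feed functions are hypothetical names. The idea is to rebuild the flattened rows for a course whenever its stream items or content change (rarely, in a read-heavy system), so that page views read one indexed table with no joins. Keeping the copy in sync (triggers, a background job, or application code) is the real cost of this approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE stream_item (id INTEGER PRIMARY KEY, parent_id INTEGER,
                              author_id INTEGER, stream_sort_at INTEGER);
    CREATE TABLE content_text_block (id INTEGER PRIMARY KEY, franklin_object_id INTEGER,
                                     position INTEGER, text TEXT);
    -- Hypothetical read-optimized copy: one row per content block, author already resolved.
    CREATE TABLE stream_feed (parent_id INTEGER, stream_item_id INTEGER, stream_sort_at INTEGER,
                              author_name TEXT, position INTEGER, text TEXT);
    CREATE INDEX stream_feed_parent ON stream_feed (parent_id, stream_sort_at);

    INSERT INTO "user" VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO stream_item VALUES (1, 11, 1, 100), (2, 11, 2, 200);
    INSERT INTO content_text_block VALUES (10, 1, 0, 'hello'), (11, 1, 1, 'world'), (12, 2, 0, 'hi');
""")

def refresh_feed(parent_id):
    """Rebuild the precomputed rows for one course; call this when its content changes."""
    with conn:  # the delete and the insert are committed together as one transaction
        conn.execute("DELETE FROM stream_feed WHERE parent_id = ?", (parent_id,))
        conn.execute("""
            INSERT INTO stream_feed
            SELECT si.parent_id, si.id, si.stream_sort_at, u.first_name, c.position, c.text
            FROM stream_item AS si
            JOIN "user" AS u ON u.id = si.author_id
            JOIN content_text_block AS c ON c.franklin_object_id = si.id
            WHERE si.parent_id = ?
        """, (parent_id,))

def read_feed(parent_id):
    # The hot path: one table, no joins.
    return conn.execute("""
        SELECT stream_item_id, author_name, text FROM stream_feed
        WHERE parent_id = ? ORDER BY stream_sort_at DESC, position
    """, (parent_id,)).fetchall()

refresh_feed(11)
print(read_feed(11))  # [(2, 'Grace', 'hi'), (1, 'Ada', 'hello'), (1, 'Ada', 'world')]
```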

Good Luck
