MongoDB 查询优化

本文总结了 MongoDB 查询的执行过程,如何读懂 Explain 信息、执行统计信息,如何创建索引,如何建立数据归档。

MoongoDB 版本:v4.4

查询的执行过程

Explain 与 执行统计

Mongodb也提供了explain语句,可以获取query语句的查询计划(queryPlanner)、以及执行过程中的统计信息(executionStats)。

land> db.boxes.find({"stopped_at":{"$lt": ISODate("2023-12-07T21:12:00.957+0800")},"status":"running"}).sort({"created_at":1}).explain()
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'land.boxes',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        { status: { '$eq': 'running' } },
        { stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") } }
      ]
    },
    winningPlan: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] }
      }
    },
    rejectedPlans: []
  },
  serverInfo: {
    host: 'mongo-0',
    port: 27017,
    version: '4.2.14',
    gitVersion: '0e6db36e92d82cc81cbd40ffd607eae88dc1f09d'
  },
  ok: 1
}

结果分为四部分:queryPlanner、executionStats、serverInfo、ok,仅关注queryPlanner、executionStats这两部分。

查询计划是按 tree of stages 组织的:

"winningPlan" : {
   "stage" : <STAGE1>,
   ...
   "inputStage" : {
      "stage" : <STAGE2>,
      ...
      "inputStage" : {
         "stage" : <STAGE3>,
         ...
      }
   }
}

每个阶段将其结果(即文档或索引键)传递给父节点。叶节点访问集合或索引。内部节点操作子节点产生的文档或索引键。根节点是MongoDB从中派生结果集的最后阶段。

阶段是对操作的描述,有以下几种类型:

  • COLLSCAN for a collection scan

  • IXSCAN for scanning index keys

  • FETCH for retrieving documents

  • SHARD_MERGE for merging results from shards

  • SHARDING_FILTER for filtering out orphan documents from shards

executionStats 就是执行 queryPlanner.winningPlan 这个计划时的统计信息。在mongo shell中的使用方法是在query语句后面加上.explain('executionStats')。

land> db.boxes.find({"stopped_at":{"$lt": ISODate("2023-12-07T21:12:00.957+0800")},"status":"running"}).sort({"created_at":1}).explain("executionStats")
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'land.boxes',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        { status: { '$eq': 'running' } },
        { stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") } }
      ]
    },
    winningPlan: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] }
      }
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 0,
    executionTimeMillis: 15818,
    totalKeysExamined: 87411,
    totalDocsExamined: 87411,
    executionStages: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      nReturned: 0,
      executionTimeMillisEstimate: 14065,
      works: 87412,
      advanced: 0,
      needTime: 87411,
      needYield: 0,
      saveState: 892,
      restoreState: 892,
      isEOF: 1,
      docsExamined: 87411,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 87411,
        executionTimeMillisEstimate: 846,
        works: 87412,
        advanced: 87411,
        needTime: 0,
        needYield: 0,
        saveState: 892,
        restoreState: 892,
        isEOF: 1,
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] },
        keysExamined: 87411,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  },
  serverInfo: {
    host: 'mongo-0',
    port: 27017,
    version: '4.2.14',
    gitVersion: '0e6db36e92d82cc81cbd40ffd607eae88dc1f09d'
  },
  ok: 1
}

explain.executionStats

winning plan 执行的统计信息

explain.executionStats.nReturned

符合条件的总行数

explain.executionStats.executionTimeMillis

执行时间,毫秒

explain.executionStats.totalKeysExamined

索引扫描条目总数

explain.executionStats.totalDocsExamined

文档扫描条目总数,对应 COLLSCANFETCH

索引操作

查看索引

db.boxes.getIndexes();

创建索引

db.boxes.createIndex({"status:": 1, "stopped_at": -1});

删除索引

db.boxes.dropIndex("created_at_1_status_1_deleted_at_1")

数据归档

将用不到的历史数据分离出来建立归档,也是一种有效减少数据量提升查询性能的手段。

1、备份 collection

 db.boxes.aggregate([{$match: { created_at: { $lte: new Date(2023, 1, 1)} }}, {$out: "boxes_archive"}]);

2、删除历史数据

db.boxes.deleteMany({ created_at: { $lt: new Date(2023, 1, 1)}});

参考

[1] MongoDB查询优化:从 10s 到 10ms
[2] Explain Results