MongoDB 查询优化
本文总结了 MongoDB 查询的执行过程,如何读懂 Explain 信息、执行统计信息,如何创建索引,如何建立数据归档。
MoongoDB 版本:v4.4
查询的执行过程
Explain 与 执行统计
Mongodb也提供了explain语句,可以获取query语句的查询计划(queryPlanner)、以及执行过程中的统计信息(executionStats)。
land> db.boxes.find({"stopped_at":{"$lt": ISODate("2023-12-07T21:12:00.957+0800")},"status":"running"}).sort({"created_at":1}).explain()
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'land.boxes',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        { status: { '$eq': 'running' } },
        { stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") } }
      ]
    },
    winningPlan: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] }
      }
    },
    rejectedPlans: []
  },
  serverInfo: {
    host: 'mongo-0',
    port: 27017,
    version: '4.2.14',
    gitVersion: '0e6db36e92d82cc81cbd40ffd607eae88dc1f09d'
  },
  ok: 1
}
结果分为四部分:queryPlanner、executionStats、serverInfo、ok,仅关注queryPlanner、executionStats这两部分。
查询计划是按 tree of stages 组织的:
"winningPlan" : {
   "stage" : <STAGE1>,
   ...
   "inputStage" : {
      "stage" : <STAGE2>,
      ...
      "inputStage" : {
         "stage" : <STAGE3>,
         ...
      }
   }
}
每个阶段将其结果(即文档或索引键)传递给父节点。叶节点访问集合或索引。内部节点操作子节点产生的文档或索引键。根节点是MongoDB从中派生结果集的最后阶段。
阶段是对操作的描述,有以下几种类型:
- 
COLLSCANfor a collection scan
- 
IXSCANfor scanning index keys
- 
FETCHfor retrieving documents
- 
SHARD_MERGEfor merging results from shards
- 
SHARDING_FILTERfor filtering out orphan documents from shards
executionStats 就是执行 queryPlanner.winningPlan 这个计划时的统计信息。在mongo shell中的使用方法是在query语句后面加上.explain('executionStats')。
land> db.boxes.find({"stopped_at":{"$lt": ISODate("2023-12-07T21:12:00.957+0800")},"status":"running"}).sort({"created_at":1}).explain("executionStats")
{
  queryPlanner: {
    plannerVersion: 1,
    namespace: 'land.boxes',
    indexFilterSet: false,
    parsedQuery: {
      '$and': [
        { status: { '$eq': 'running' } },
        { stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") } }
      ]
    },
    winningPlan: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] }
      }
    },
    rejectedPlans: []
  },
  executionStats: {
    executionSuccess: true,
    nReturned: 0,
    executionTimeMillis: 15818,
    totalKeysExamined: 87411,
    totalDocsExamined: 87411,
    executionStages: {
      stage: 'FETCH',
      filter: {
        '$and': [
          { status: { '$eq': 'running' } },
          {
            stopped_at: { '$lt': ISODate("2023-12-07T13:12:00.957Z") }
          }
        ]
      },
      nReturned: 0,
      executionTimeMillisEstimate: 14065,
      works: 87412,
      advanced: 0,
      needTime: 87411,
      needYield: 0,
      saveState: 892,
      restoreState: 892,
      isEOF: 1,
      docsExamined: 87411,
      alreadyHasObj: 0,
      inputStage: {
        stage: 'IXSCAN',
        nReturned: 87411,
        executionTimeMillisEstimate: 846,
        works: 87412,
        advanced: 87411,
        needTime: 0,
        needYield: 0,
        saveState: 892,
        restoreState: 892,
        isEOF: 1,
        keyPattern: { created_at: 1 },
        indexName: 'created_at_1',
        isMultiKey: false,
        multiKeyPaths: { created_at: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { created_at: [ '[MinKey, MaxKey]' ] },
        keysExamined: 87411,
        seeks: 1,
        dupsTested: 0,
        dupsDropped: 0
      }
    }
  },
  serverInfo: {
    host: 'mongo-0',
    port: 27017,
    version: '4.2.14',
    gitVersion: '0e6db36e92d82cc81cbd40ffd607eae88dc1f09d'
  },
  ok: 1
}
explain.executionStats
winning plan 执行的统计信息
explain.executionStats.nReturned
符合条件的总行数
explain.executionStats.executionTimeMillis
执行时间,毫秒
explain.executionStats.totalKeysExamined
索引扫描条目总数
explain.executionStats.totalDocsExamined
文档扫描条目总数,对应 COLLSCAN 和 FETCH
索引操作
查看索引
db.boxes.getIndexes();
创建索引
db.boxes.createIndex({"status:": 1, "stopped_at": -1});
删除索引
db.boxes.dropIndex("created_at_1_status_1_deleted_at_1")
数据归档
将用不到的历史数据分离出来建立归档,也是一种有效减少数据量提升查询性能的手段。
1、备份 collection
 db.boxes.aggregate([{$match: { created_at: { $lte: new Date(2023, 1, 1)} }}, {$out: "boxes_archive"}]);
2、删除历史数据
db.boxes.deleteMany({ created_at: { $lt: new Date(2023, 1, 1)}});
