English
Sometimes we need to perform analysis operations on data, such as some statistical operations, join table queries, etc. At this time, simple query operations cannot handle these requirements, so we need to use aggregation operations to complete them.
Get the aggregate operation instance of the database collection
db.collection('scores').aggregate()
Notice:
Do not reuse aggregate instances when used in cloud functions, which may cause bugs.
Both of the following are examples of errors:
const db = uniCloud.database()
const collection = db.collection('test')
const aggregate = collection.aggregate() // 云函数实例复用时,此聚合实例也会复用,导致Bug
exports.main = async function(){
const res = await aggregate.match({a:1}).end()
return {res}
}
const db = uniCloud.database()
const collection = db.collection('test')
exports.main = async function(){
const aggregate = collection.aggregate() // 此聚合实例分别在两个请求内使用,导致Bug
const res1 = await aggregate.match({a:1}).end()
const res2 = await aggregate.match({a:2}).end()
return {res1, res2}
}
Correct the above two usages to the correct usage as follows:
const db = uniCloud.database()
const collection = db.collection('test')
exports.main = async function(){
const aggregate = collection.aggregate() // 每次执行云函数会有独立的聚合实例
const res = await aggregate.match({a:1}).end()
return {res}
}
const db = uniCloud.database()
const collection = db.collection('test')
exports.main = async function(){
const res1 = await collection.aggregate().match({a:1}).end() // 两个请求分别调用aggregate方法产生聚合实例
const res2 = await collection.aggregate().match({a:2}).end()
return {res1, res2}
}
Expressions can be field paths, constants, or database operations. Expressions can nest expressions.
Field Path
Expressions use field path notation to specify fields in the record. Field paths are represented by a $
sign followed by the field name or nested field name. Nested field names use dots to connect fields at all levels of nesting. For example, $profile
represents the field path of profile
, and $profile.name
represents the field path of profile.name
(the name
field nested in the profile
field).
For example: the following data is available
[{
"profile": {
"name": "Chloe"
},
"status": 0
}]
// do the following
let res = await db.collection('scores').aggregate()
.addFields({
name: '$profile.name'
})
.end()
// return value
{
"data": [{
"profile": {
"name": "Chloe"
},
"status": 0,
"name": "Chloe"
}]
}
constant
Constants can be of any type. By default, strings starting with $ are treated as field paths. To avoid this behavior, use db.command.aggregate.literal
to declare it as a constant.
Database operation method
Refer to Database Operation Method
Aggregation stage. Add new fields to the output records. After the addFields
aggregation stage, all output records will have the fields specified by addFields
in addition to the fields they were input.
API description
addFields
is equivalent to the project
phase specifying all existing fields and new fields at the same time.
addFields
has the following form:
addFields({
<new field>: <expression>
})
addFields
can specify multiple new fields, the value of each new field is determined by the expression used.
If the specified new field has the same name as the original field, the value of the new field will overwrite the value of the original field. Note that addFields
cannot be used to add elements to array fields.
Example 1: addFields twice in a row
Suppose the set scores has the following records:
{
_id: 1,
student: "Maya",
homework: [ 10, 5, 10 ],
quiz: [ 10, 8 ],
extraCredit: 0
}
{
_id: 2,
student: "Ryan",
homework: [ 5, 6, 5 ],
quiz: [ 8, 8 ],
extraCredit: 8
}
Apply addFields
twice, the first time adding two fields are the sum of homework
and quiz
, the second time adding a field and then calculating the sum based on the previous two sums.
const $ = db.command.aggregate
let res = await db.collection('scores').aggregate()
.addFields({
totalHomework: $.sum('$homework'),
totalQuiz: $.sum('$quiz')
})
.addFields({
totalScore: $.add(['$totalHomework', '$totalQuiz', '$extraCredit'])
})
.end()
The returned results are as follows:
{
"_id" : 1,
"student" : "Maya",
"homework" : [ 10, 5, 10 ],
"quiz" : [ 10, 8 ],
"extraCredit" : 0,
"totalHomework" : 25,
"totalQuiz" : 18,
"totalScore" : 43
}
{
"_id" : 2,
"student" : "Ryan",
"homework" : [ 5, 6, 5 ],
"quiz" : [ 8, 8 ],
"extraCredit" : 8,
"totalHomework" : 16,
"totalQuiz" : 16,
"totalScore" : 40
}
Example 2: Adding fields to nested records
Fields can be added to nested records using dot notation. Suppose the vehicles collection contains the following records:
{ _id: 1, type: "car", specs: { doors: 4, wheels: 4 } }
{ _id: 2, type: "motorcycle", specs: { doors: 0, wheels: 2 } }
{ _id: 3, type: "jet ski" }
A new field fuel_type
can be added under the specs
field by the following operations, and the value is set to the fixed string unleaded
:
let res = await db.collection('vehicles').aggregate()
.addFields({
'specs.fuel_type': 'unleaded'
})
.end()
The returned results are as follows:
{ _id: 1, type: "car",
specs: { doors: 4, wheels: 4, fuel_type: "unleaded" } }
{ _id: 2, type: "motorcycle",
specs: { doors: 0, wheels: 2, fuel_type: "unleaded" } }
{ _id: 3, type: "jet ski",
specs: { fuel_type: "unleaded" } }
Example 3: Set field value to another field
You can set the value of a field to the value of another field by using an expression consisting of $
followed by the field name as the value.
Also using the previous collection example, you can add a field vehicle_type
and set its value to the value of the type
field as follows:
let res = await db.collection('vehicles').aggregate()
.addFields({
vehicle_type: '$type'
})
.end()
The returned results are as follows:
{ _id: 1, type: "car", vehicle_type: "car",
specs: { doors: 4, wheels: 4, fuel_type: "unleaded" } }
{ _id: 2, type: "motorcycle", vehicle_type: "motorcycle",
specs: { doors: 0, wheels: 2, fuel_type: "unleaded" } }
{ _id: 3, type: "jet ski", vehicle_type: "jet ski",
specs: { fuel_type: "unleaded" } }
Aggregation stage. Divide the input records into different groups according to the given conditions and boundaries, each group is a bucket
.
API description
Each group is output as a record, containing a _id
field with a lower bound and a count
field with the number of records in the group. count
is output by default when output
is not specified.
bucket
will only output if there is at least one record in the group.
bucket has the following form:
bucket({
groupBy: <expression>,
boundaries: [<lowerbound1>, <lowerbound2>, ...],
default: <literal>,
output: {
<output1>: <accumulator expr>,
...
<outputN>: <accumulator expr>
}
})
groupBy
is an expression used to determine the grouping applied to each input record. Expressions can be prefixed with $
followed by the path of the field to be used for grouping. Unless a default value is specified with default
, each record needs to contain the specified field, and the field value must be within the range specified by boundaries
.
boundaries
is an array where each element is the lower bound of each group. At least two boundary values must be specified. Array values must be incremented values of the same type.
default
is optional. After specifying, records that do not enter any group will enter a default group. The _id
of this group record is determined by default
. The value of default
must be less than the minimum value or greater than or equal to the maximum value in boundaries
. The value of default
can be different from the value type of the boundaries
element.
output
is optional and is used to determine which fields the output record contains in addition to _id
. The value of each field must be specified with an accumulator expression. When output
is specified, the default count
will not be output by default and must be specified manually:
output: {
count: $.sum(1),
...
<outputN>: <accumulator expr>
}
Using buckets requires at least one of the following conditions, otherwise an error will be thrown:
The value obtained by applying the groupBy expression to each input record must be a value within boundaries
Specify a default value that is outside boundaries, or a different type than the value of the boundaries element.
Example
Suppose the collection items has the following records:
{
_id: "1",
price: 10
}
{
_id: "2",
price: 50
}
{
_id: "3",
price: 20
}
{
_id: "4",
price: 80
}
{
_id: "5",
price: 200
}
To group the above records, group [0, 50) into one group, [50, 100) into one group, and the others into one group:
const $ = db.command.aggregate
let res = await db.collection('items').aggregate()
.bucket({
groupBy: '$price',
boundaries: [0, 50, 100],
default: 'other',
output: {
count: $.sum(1),
ids: $.push('$_id')
}
})
.end()
The returned results are as follows:
[
{
"_id": 0,
"count": 2,
"ids": [
"1",
"3"
]
},
{
"_id": 50,
"count": 2,
"ids": [
"2",
"4"
]
},
{
"_id": "other",
"count": 1,
"ids": [
"5"
]
}
]
Aggregation stage. Divide the input records into different groups according to the given conditions, each group is a bucket
. One of the differences from bucket
is that there is no need to specify boundaries
, bucketAuto
will automatically try to spread the records as evenly as possible into each group.
API description Each group is output as a record, containing an _id field whose value is an object containing the two fields of the maximum and minimum values in the group, and a count field whose value is the number of records in the group. count is output by default when output is not specified.
bucketAuto has the following form:
bucketAuto({
groupBy: <expression>,
buckets: <number>,
granularity: <string>,
output: {
<output1>: <accumulator expr>,
...
<outputN>: <accumulator expr>
}
})
groupBy
is an expression used to determine the grouping applied to each input record. Expressions can be prefixed with $ followed by the field path to group by. Unless a default value is specified with default
, each record needs to contain the specified field, and the field value must be within the range specified by boundaries
.
buckets
is a positive integer specifying the number of buckets to split.
granularity
is an optional enumeration value string used to ensure that the automatically computed bounds conform to the given rules. This field can only be used if all groupBy
values are numbers and there are no NaN
s. Enumerated values include: R5, R10, R20, R40, R80, 1-2-5, E6, E12, E24, E48, E96, E192, POWERSOF2
.
output
is optional and is used to determine which fields the output record contains in addition to _id
. The value of each field must be specified with an accumulator expression. When output
is specified, the default count
will not be output by default and must be specified manually:
output: {
count: $.sum(1),
...
<outputN>: <accumulator expr>
}
The output grouping may be smaller than the given number of groups in the following cases:
The number of input records is less than the number of groups
groupBy
counts fewer unique values than groupsgranularity
has less spacing than the number of packetsgranularity
is not granular enough to be evenly distributed among groupsgranularity details
granularity
is used to guarantee that boundary values belong to a given sequence of numbers.
Renard Sequence
The Renard sequence is a sequence of numbers between 1.0 and 10.0 (10.3 in the case of R80) derived by the 5 / 10 / 20 / 40 / 80 root of 10.
Setting the granularity to R5 / R10 / R20 / R40 / R80 limits the boundary values to within the sequence. If the value of groupBy is not within 1.0 to 10.0 (or 10.3 in case of R80), the serial number is automatically multiplied by 10.
E sequence
The E-sequence is a sequence of numbers between 1.0 and 10.0 with a certain error derived from 10 to the power of 6 / 12 / 24 / 48 / 96 / 192.
1-2-5 sequence
The 1-2-5 sequence behaves the same as the three-valued Renard sequence.
Sequence of powers of 2
A sequence of numbers made up of powers of 2.
Example
Suppose the collection items has the following records:
{
_id: "1",
price: 10.5
}
{
_id: "2",
price: 50.3
}
{
_id: "3",
price: 20.8
}
{
_id: "4",
price: 80.2
}
{
_id: "5",
price: 200.3
}
The above records are automatically grouped into three groups:
const $ = db.command.aggregate
let res = await db.collection('items').aggregate()
.bucketAuto({
groupBy: '$price',
buckets: 3,
})
.end()
The returned results are as follows:
{
"_id": {
"min": 10.5,
"max": 50.3
},
"count": 2
}
{
"_id": {
"min": 50.3,
"max": 200.3
},
"count": 2
}
{
"_id": {
"min": 200.3,
"max": 200.3
},
"count": 1
}
Aggregation stage. Calculates the number of records input to this stage from the previous aggregation stage, and outputs a record where the value of the specified field is the number of records.
API description
count has the following form:
count(<string>)
<string>
is the name of the field to output the number of records, cannot be an empty string, cannot start with $, cannot contain . characters.
The count stage is equivalent to the operation of group + project:
const $ = db.command.aggregate
let res = await db.collection('items').aggregate()
.group({
_id: null,
count: $.sum(1),
})
.project({
_id: 0,
})
.end()
The above operation will output a record with the count field.
Example
Suppose the collection items has the following records:
{
_id: "1",
price: 10.5
}
{
_id: "2",
price: 50.3
}
{
_id: "3",
price: 20.8
}
{
_id: "4",
price: 80.2
}
{
_id: "5",
price: 200.3
}
Find the number of records with a price greater than 50:
const $ = db.command.aggregate
let res = await db.collection('items').aggregate()
.match({
price: $.gt(50)
})
.count('expensiveCount')
.end()
The returned results are as follows:
{
"expensiveCount": 3
}
Aggregation stage. Output the records from near to far from the given point.
property | type | default | required | description |
---|---|---|---|---|
near | GeoPoint | Yes | GeoJSON Point, the point used to determine distance | |
spherical | true | Yes | Required, value is true | |
maxDistance | number | No | Maximum Distance | |
minDistance | number | No | Minimum Distance | |
query | Object | No | requires that records must meet this condition at the same time (same syntax as where) | |
distanceMultiplier | number | No | Multiply the distance by this number on return | |
distanceField | string | Yes | The name of the output field to store the distance, a nested field can be represented by dot notation | |
includeLocs | string | No | Lists the fields to be used for distance calculations, useful if multiple fields in the record are geographic locations | |
key | string | No | Select the geographic index to use. If the collection is indexed by multiple geographic locations, one must be specified, by specifying the corresponding field |
API description
geoNear
must be the first aggregation stagekey
parameter.Example
Suppose the collection attractions has the following records:
{
"_id": "geoNear.0",
"city": "Guangzhou",
"docType": "geoNear",
"location": {
"type": "Point",
"coordinates": [
113.30593,
23.1361155
]
},
"name": "Canton Tower"
},
{
"_id": "geoNear.1",
"city": "Guangzhou",
"docType": "geoNear",
"location": {
"type": "Point",
"coordinates": [
113.306789,
23.1564721
]
},
"name": "Baiyun Mountain"
},
{
"_id": "geoNear.2",
"city": "Beijing",
"docType": "geoNear",
"location": {
"type": "Point",
"coordinates": [
116.3949659,
39.9163447
]
},
"name": "The Palace Museum"
},
{
"_id": "geoNear.3",
"city": "Beijing",
"docType": "geoNear",
"location": {
"type": "Point",
"coordinates": [
116.2328567,
40.242373
]
},
"name": "Great Wall"
}
const $ = db.command.aggregate
let res = await db.collection('attractions').aggregate()
.geoNear({
distanceField: 'distance', // 输出的每个记录中 distance 即是与给定点的距离
spherical: true,
near: new db.Geo.Point(113.3089506, 23.0968251),
query: {
docType: 'geoNear',
},
key: 'location', // 若只有 location 一个地理位置索引的字段,则不需填
key: 'location', // If there is only one field of location index, it is not required to fill in
includeLocs: 'location', // 若只有 location 一个是地理位置,则不需填
includeLocs: 'location', // If only location is a geographic location, you don't need to fill in
})
.end()
The returned results are as follows:
{
"_id": "geoNear.0",
"location": {
"type": "Point",
"coordinates": [
113.30593,
23.1361155
]
},
"docType": "geoNear",
"name": "Canton Tower",
"city": "Guangzhou",
"distance": 4384.68131486958
},
{
"_id": "geoNear.1",
"city": "Guangzhou",
"location": {
"type": "Point",
"coordinates": [
113.306789,
23.1564721
]
},
"docType": "geoNear",
"name": "Baiyun Mountain",
"distance": 6643.521654040738
},
{
"_id": "geoNear.2",
"docType": "geoNear",
"name": "The Palace Museum",
"city": "Beijing",
"location": {
"coordinates": [
116.3949659,
39.9163447
],
"type": "Point"
},
"distance": 1894750.4414538583
},
{
"_id": "geoNear.3",
"docType": "geoNear",
"name": "Great Wall",
"city": "Beijing",
"location": {
"type": "Point",
"coordinates": [
116.2328567,
40.242373
]
},
"distance": 1928300.3308822548
}
Aggregation stage. Group the input records according to the given expression, each record represents a group when outputting, and the _id of each record is the key to distinguish different groups. Output records can also include cumulative values, and setting the output field to cumulative values will calculate cumulative values from that grouping.
Using group can easily implement distinct functions similar to SQL
API description
group has the following form:
group({
_id: <expression>,
<field1>: <accumulator1>,
...
<fieldN>: <accumulatorN>
})
The _id
parameter is required, if you fill in a constant, there is only one set. The other fields are optional and are accumulated values, using accumulators such as $.sum
(const $ = db.command.aggregate
), but other expressions can also be used.
The accumulator must be one of the following operators:
For details, see Accumulator Operator
operator | description | |
---|---|---|
addToSet | Adds a value to the array, does nothing if the value already exists in the array | |
avg | Returns the average value of the data corresponding to the specified field in a set of collections | |
sum | Computes and returns the sum of all values in a set of fields | |
first | Returns the value corresponding to the first record of the specified field in a set. This operation only makes sense if the set of collections is sorted by some definition ( sort ). | |
last | Returns the value corresponding to the last record of the specified field in a set. This operation only makes sense if the set of collections is sorted by some definition ( sort ). | |
max | returns the maximum value of a set of values | |
min | Returns the minimum value of a set of values | |
push | In the group stage, returns an array consisting of the column specified by the expression and the corresponding value | |
stdDevPop | Returns the standard deviation of the corresponding values of a set of fields | |
stdDevSamp | Calculates the sample standard deviation of the input values. If the input values represent the population of data, or do not generalize more data, use db.command.aggregate.stdDevPop | instead |
mergeObjects | Merge multiple documents into a single document |
Memory Limit
There is a 100M memory usage limit at this stage.
Example 1: Group by Field Value
Suppose the collection avatar has the following records:
{
_id: "1",
alias: "john",
region: "asia",
scores: [40, 20, 80],
coins: 100
}
{
_id: "2",
alias: "arthur",
region: "europe",
scores: [60, 90],
coins: 20
}
{
_id: "3",
alias: "george",
region: "europe",
scores: [50, 70, 90],
coins: 50
}
{
_id: "4",
alias: "john",
region: "asia",
scores: [30, 60, 100, 90],
coins: 40
}
{
_id: "5",
alias: "george",
region: "europe",
scores: [20],
coins: 60
}
{
_id: "6",
alias: "john",
region: "asia",
scores: [40, 80, 70],
coins: 120
}
const $ = db.command.aggregate
let res = await db.collection('avatar').aggregate()
.group({
_id: '$alias',
num: $.sum(1)
})
.end()
The returned results are as follows:
{
"_id": "john",
"num": 3
}
{
"_id": "authur",
"num": 1
}
{
"_id": "george",
"num": 2
}
Example 2: Group by Multiple Values
Records can be grouped by multiple values in the way _id is passed in. Or follow the example data above, group by regions with the same highest score (score), and find the total amount of coins in each group:
const $ = db.command.aggregate
let res = await db.collection('avatar').aggregate()
.group({
_id: {
region: '$region',
maxScore: $.max('$scores')
},
totalCoins: $.sum('$coins')
})
.end()
The returned results are as follows:
{
"_id": {
"region": "asia",
"maxScore": 80
},
"totalCoins": 220
}
{
"_id": {
"region": "asia",
"maxScore": 100
},
"totalCoins": 40
}
{
"_id": {
"region": "europe",
"maxScore": 90
},
"totalCoins": 70
}
{
"_id": {
"region": "europe",
"maxScore": 20
},
"totalCoins": 60
}
Aggregation stage. Limit the number of records output to the next stage.
Example
Suppose the collection items has the following records:
{
_id: "1",
price: 10
}
{
_id: "2",
price: 50
}
{
_id: "3",
price: 20
}
{
_id: "4",
price: 80
}
{
_id: "5",
price: 200
}
Return the smallest two records of records with a price greater than 20:
const $ = db.command.aggregate
let res = await db.collection('items').aggregate()
.match({
price: $.gt(20)
})
.sort({
price: 1,
})
.limit(2)
.end()
The returned results are as follows:
{
"_id": "2",
"price": 50
}
{
"_id": "4",
"price": 80
}
Aggregation stage. Joint query. Do left outer join with a specified collection in the same database. For each input record in this stage, lookup will add an array field to the record, and the array is a list of records in the linked table that meet the matching conditions. lookup will output the concatenated result to the next stage.
API description
lookup
can be used in two ways
When matching a field of an input record for equality with a field of the joined collection, the following definitions are used:
lookup({
from: <要连接的集合名>,
from: <collection name to connect to>,
localField: <输入记录的要进行相等匹配的字段>,
localField: <field of the input record to be matched for equality>,
foreignField: <被连接集合的要进行相等匹配的字段>,
foreignField: <the field of the connected collection to be matched for equality>,
as: <输出的数组字段名>
as: <output array field name>
})
Detailed description of parameters
Parameter Fields | Description |
---|---|
from | The name of another collection to join |
localField | The field name of the input record of the current pipeline, which will be used for equality matching with the foreignField of the collection specified by from. If the field is not present in the input record, the value of the field will be treated as null |
foreignField | The field name of the concatenated collection that will be used for equality matching with localField. If the field is not present in the records of the joined collection, the value of the field will be treated as null when matching |
as | Specifies the field name to be stored in the list of records matched by the connection. This array contains the matched records from the from collection. If the field already exists in the input record, the field will be overwritten |
This operation is equivalent to the following pseudo-SQL operation:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT *
FROM <collection to join>
WHERE <foreignField>= <collection.localField>);
example:
If you need to specify join conditions other than equality matching, or specify multiple equality matching conditions, or need to concatenate the subquery results of the joined sets, you can use the following definitions:
lookup({
from: <要连接的集合名>,
from: <collection name to connect to>,
let: { <变量1>: <表达式1>, ..., <变量n>: <表达式n> },
let: { <variable1>: <expression1>, ..., <variablen>: <expressionn> },
pipeline: [ <在要连接的集合上进行的流水线操作> ],
pipeline: [ <pipeline operation on the collection to join> ],
as: <输出的数组字段名>
as: <output array field name>
})
Detailed description of parameters
Parameter Fields | Description |
---|---|
from | The name of another collection to join |
let | Optional. Specify a variable that can be used in the pipeline. The value of the variable can refer to the field of the input record. For example, let: { userName: '$name' } means that the name field of the input record is used as the value of the variable userName. The fields of the input record cannot be directly accessed in the pipeline, and must be accessed after the let definition. The way to access is to use the $$ variable name in the expr operator to access, such as $$userName. |
pipeline | Specifies the aggregation operation to run on the joined collection. If the entire collection is to be returned, this field takes the empty array []. The fields of the input record cannot be directly accessed in the pipeline, and must be accessed after the let definition. The way to access is to use the $$ variable name in the expr operator to access, such as $$userName. |
as | Specifies the field name to be stored in the list of records matched by the connection. This array contains the matched records from the from collection. If the field already exists in the input record, the field will be overwritten |
This operation is equivalent to the following pseudo-SQL statement:
SELECT *, <output array field>
FROM collection
WHERE <output array field> IN (SELECT <documents as determined from the pipeline>
FROM <collection to join>
WHERE <pipeline> );
example
Example
Specify an equality match condition
Suppose the orders collection has the following records:
[
{"_id":4,"book":"novel 1","price":30,"quantity":2},
{"_id":5,"book":"science 1","price":20,"quantity":1},
{"_id":6}
]
The books collection has the following records:
[
{"_id":"book1","author":"author 1","category":"novel","stock":10,"time":1564456048486,"title":"novel 1"},
{"_id":"book3","author":"author 3","category":"science","stock":30,"title":"science 1"},
{"_id":"book4","author":"author 3","category":"science","stock":40,"title":"science 2"},
{"_id":"book2","author":"author 2","category":"novel","stock":20,"title":"novel 2"},
{"_id":"book5","author":"author 4","category":"science","stock":50,"title":null},
{"_id":"book6","author":"author 5","category":"novel","stock":"60"}
]
The following aggregation operations can join the orders
and books
collections with an equality match condition on the book
field of the orders
collection and the title field of the books
collection:
const db = uniCloud.database()
let res = await db.collection('orders').aggregate()
.lookup({
from: 'books',
localField: 'book',
foreignField: 'title',
as: 'bookList',
})
.end()
result:
[
{
"_id": 4,
"book": "novel 1",
"price": 30,
"quantity": 2,
"bookList": [
{
"_id": "book1",
"title": "novel 1",
"author": "author 1",
"category": "novel",
"stock": 10
}
]
},
{
"_id": 5,
"book": "science 1",
"price": 20,
"quantity": 1,
"bookList": [
{
"_id": "book3",
"category": "science",
"title": "science 1",
"author": "author 3",
"stock": 30
}
]
},
{
"_id": 6,
"bookList": [
{
"_id": "book5",
"category": "science",
"author": "author 4",
"stock": 50,
"title": null
},
{
"_id": "book6",
"author": "author 5",
"stock": "60",
"category": "novel"
}
]
}
]
Apply equality matching to array fields Suppose the authors collection has the following records:
[
{"_id": 1, "name": "author 1", "intro": "Two-time best-selling sci-fiction novelist"},
{"_id": 3, "name": "author 3", "intro": "UCB assistant professor"},
{"_id": 4, "name": "author 4", "intro": "major in CS"}
]
The books collection has the following records:
[
{"_id":"book1","authors":["author 1"],"category":"novel","stock":10,"time":1564456048486,"title":"novel 1"},
{"_id":"book3","authors":["author 3", "author 4"],"category":"science","stock":30,"title":"science 1"},
{"_id":"book4","authors":["author 3"],"category":"science","stock":40,"title":"science 2"}
]
The following operation retrieves author information and their respective published books, using the lookup operation to match the name field of the authors collection with the authors array field of the books collection:
const db = cloud.database()
let res = await db.collection('authors').aggregate()
.lookup({
from: 'books',
localField: 'name',
foreignField: 'authors',
as: 'publishedBooks',
})
.end()
result
[
{
"_id": 1,
"intro": "Two-time best-selling sci-fiction novelist",
"name": "author 1",
"publishedBooks": [
{
"_id": "book1",
"title": "novel 1",
"category": "novel",
"stock": 10,
"authors": [
"author 1"
]
}
]
},
{
"_id": 3,
"name": "author 3",
"intro": "UCB assistant professor",
"publishedBooks": [
{
"_id": "book3",
"category": "science",
"title": "science 1",
"stock": 30,
"authors": [
"author 3",
"author 4"
]
},
{
"_id": "book4",
"title": "science 2",
"category": "science",
"stock": 40,
"authors": [
"author 3"
]
}
]
},
{
"_id": 4,
"intro": "major in CS",
"name": "author 4",
"publishedBooks": [
{
"_id": "book3",
"category": "science",
"title": "science 1",
"stock": 30,
"authors": [
"author 3",
"author 4"
]
}
]
}
]
Combining mergeObjects applies equality matching
Suppose the orders
collection has the following records:
[
{"_id":4,"book":"novel 1","price":30,"quantity":2},
{"_id":5,"book":"science 1","price":20,"quantity":1},
{"_id":6}
]
The books
collection has the following records:
[
{"_id":"book1","author":"author 1","category":"novel","stock":10,"time":1564456048486,"title":"novel 1"},
{"_id":"book3","author":"author 3","category":"science","stock":30,"title":"science 1"},
{"_id":"book4","author":"author 3","category":"science","stock":40,"title":"science 2"},
{"_id":"book2","author":"author 2","category":"novel","stock":20,"title":"novel 2"},
{"_id":"book5","author":"author 4","category":"science","stock":50,"title":null},
{"_id":"book6","author":"author 5","category":"novel","stock":"60"}
]
The following operation matches the book field of orders and the title field of books, and merges the books matching result directly into the orders record.
var db = cloud.database()
var $ = db.command.aggregate
let res = await db.collection('orders').aggregate()
.lookup({
from: "books",
localField: "book",
foreignField: "title",
as: "bookList"
})
.replaceRoot({
newRoot: $.mergeObjects([ $.arrayElemAt(['$bookList', 0]), '$$ROOT' ])
})
.project({
bookList: 0
})
.end()
result
[
{
"_id": 4,
"title": "novel 1",
"author": "author 1",
"time": 1564456048486,
"category": "novel",
"stock": 10,
"book": "novel 1",
"price": 30,
"quantity": 2
},
{
"_id": 5,
"category": "science",
"title": "science 1",
"author": "author 3",
"stock": 30,
"book": "science 1",
"price": 20,
"quantity": 1
},
{
"_id": 6,
"category": "science",
"author": "author 4",
"stock": 50,
"title": null
}
]
Specify multiple join conditions
Suppose the orders
collection has the following records:
[
{"_id":4,"book":"novel 1","price":300,"quantity":20},
{"_id":5,"book":"science 1","price":20,"quantity":1}
]
The books
collection has the following records:
[
{"_id":"book1","author":"author 1","category":"novel","stock":10,"time":1564456048486,"title":"novel 1"},
{"_id":"book3","author":"author 3","category":"science","stock":30,"title":"science 1"}
]
The following operation joins the orders
and books
collections, requiring two conditions:
const db = cloud.database()
const dbCmd = db.command
const $ = dbCmd.aggregate
let res = await db.collection('orders').aggregate()
.lookup({
from: 'books',
let: {
order_book: '$book',
order_quantity: '$quantity'
},
pipeline: $.pipeline()
.match(dbCmd.expr($.and([
$.eq(['$title', '$$order_book']),
$.gte(['$stock', '$$order_quantity'])
])))
.project({
_id: 0,
title: 1,
author: 1,
stock: 1
})
.done(),
as: 'bookList',
})
.end()
result:
[
{
"_id": 4,
"book": "novel 1",
"price": 300,
"quantity": 20,
"bookList": []
},
{
"_id": 5,
"book": "science 1",
"price": 20,
"quantity": 1,
"bookList": [
{
"title": "science 1",
"author": "author 3",
"stock": 30
}
]
}
]
Concatenate subqueries of joined collections
Suppose the orders
collection has the following records:
[
{"_id":4,"book":"novel 1","price":30,"quantity":2},
{"_id":5,"book":"science 1","price":20,"quantity":1}
]
The books
collection has the following records:
[
{"_id":"book1","author":"author 1","category":"novel","stock":10,"time":1564456048486,"title":"novel 1"},
{"_id":"book3","author":"author 3","category":"science","stock":30,"title":"science 1"},
{"_id":"book4","author":"author 3","category":"science","stock":40,"title":"science 2"}
]
Add an array field to each output record whose value is the result of a query on the books collection:
const db = cloud.database()
const $ = db.command.aggregate
let res = await db.collection('orders').aggregate()
.lookup({
from: 'books',
let: {
order_book: '$book',
order_quantity: '$quantity'
},
pipeline: $.pipeline()
.match({
author: 'author 3'
})
.project({
_id: 0,
title: 1,
author: 1,
stock: 1
})
.done(),
as: 'bookList',
})
.end()
result
[
{
"_id": 4,
"book": "novel 1",
"price": 30,
"quantity": 20,
"bookList": [
{
"title": "science 1",
"author": "author 3",
"stock": 30
},
{
"title": "science 2",
"author": "author 3",
"stock": 40
}
]
},
{
"_id": 5,
"book": "science 1",
"price": 20,
"quantity": 1,
"bookList": [
{
"title": "science 1",
"author": "author 3",
"stock": 30
},
{
"title": "science 2",
"author": "author 3",
"stock": 40
}
]
}
]
Multiple table join table query
Suppose the orders
collection has the following records:
[
{"_id":4,"book":"book1","price":30,"quantity":2},
{"_id":5,"book":"book2","price":20,"quantity":1}
]
The books
collection has the following records:
[
{"_id":"book1","author":"author1","title":"novel 1"},
{"_id":"book2","author":"author2","title":"science 1"},
]
The authors
collection has the following records:
[
{"_id":"author1","name":"A1"},
{"_id":"author2","author":"A2"}
]
If orders are associated with books, and books are associated with authors query, you can use lookup in the pipeline.
const db = cloud.database()
const dbCmd = db.command
const $ = db.command.aggregate
let res = await db.collection('orders').aggregate()
.lookup({
from: 'books',
let: {
book_id: '$book'
},
pipeline: $.pipeline()
.match(
dbCmd.expr($.eq(['$_id', '$$book_id']))
)
.lookup({
from: 'authors',
let: {
author_id: '$author'
},
pipeline: $.pipeline()
.match(
dbCmd.expr($.eq(['$_id', '$$author_id']))
)
.done(),
as: 'authorList'
})
.done(),
as: 'bookList',
})
.end()
The result is as follows
[
{
"_id":4,
"book":"book1",
"price":30,
"quantity":2,
"bookList": [{
"_id":"book1",
"author":"author1",
"title":"novel 1",
authorList: [{
"_id":"author1",
"name":"A1"
}]
}]
},
{
"_id":5,
"book":"book2",
"price":20,
"quantity":1,
"bookList": [{
"_id":"book2",
"author":"author2",
"title":"science 1",
authorList: [{
"_id":"author2",
"name":"A2"
}]
}]
}
]
Aggregation stage. Filter documents based on a condition and pass those documents that match the condition to the next pipeline stage.
已知问题
支付宝小程序云中,使用match管道过滤时,如果传入的参数是一个对象,将按照字段的值进行相等匹配,包含字段顺序。
例如:过滤内存是8g的计算机商品
// 错误示例
let res = await db.collection('goods')
.aggregate()
.match({
category: 'computer',
type: {
memory: 8,
}
})
.end()
// 正确示例
let res = await db.collection('goods')
.aggregate()
.match({
category: 'computer',
"type.memory": 8
})
.end()
API description
match has the following form:
match(<查询条件>)
match(<query condition>)
The query conditions are the same as those of ordinary queries, and ordinary query operators can be used. Note that the match stage is different from other aggregation stages. Database operations cannot be used, and only query operators can be used.
// use string directly
match({
name: 'Tony Stark'
})
// use operator
const dbCmd = db.command
match({
age: dbCmd.gt(18)
})
Example
Suppose the collection articles has the following records:
{ "_id" : "1", "author" : "stark", "score" : 80 }
{ "_id" : "2", "author" : "stark", "score" : 85 }
{ "_id" : "3", "author" : "bob", "score" : 60 }
{ "_id" : "4", "author" : "li", "score" : 55 }
{ "_id" : "5", "author" : "jimmy", "score" : 60 }
{ "_id" : "6", "author" : "li", "score" : 94 }
{ "_id" : "7", "author" : "justan", "score" : 95 }
match
Here is an example of a direct match:
let res = await db.collection('articles')
.aggregate()
.match({
author: 'stark'
})
.end()
The code here tries to find all articles where the author field is stark , then the match is as follows:
{ "_id" : "1", "author" : "stark", "score" : 80 }
{ "_id" : "2", "author" : "stark", "score" : 85 }
count
After match filters out documents, it can also be used in conjunction with other pipeline stages.
For example, in the following example, we use group to match and calculate the number of documents whose score field is greater than 80:
const dbCmd = db.command
const $ = dbCmd.aggregate
let res = await db.collection('articles')
.aggregate()
.match({
score: dbCmd.gt(80)
})
.group({
_id: null,
count: $.sum(1)
})
.end()
The return value is as follows:
{ "_id" : null, "count" : 3 }
Aggregation stage. Pass the specified field to the next pipeline. The specified field can be an existing field or a new calculated field.
API description
project has the following form:
project({
<expression>
})
Expressions can have the following formats:
Format | Description |
---|---|
Specifies to include an existing field | |
_id: <0 or false> | Discard _id field |
Add a new field, or reset an existing field | |
Discard a field (if you specify to discard a non-_id field, you cannot use other expressions in this project) |
Specify Include Fields
The _id field is included in the output by default, and any other fields must be specified in the project if they want to be reflected in the output; if the specified field contains a field that does not yet exist, the project will ignore this field, will not be added to the output document;
Specify exclusion fields
If you specify to exclude a field in the project, other fields will be reflected in the output; if the specified exclusion is a non-_id field, then in this project, no other expressions can be used;
Add a new field or reset an existing field
You can use some special expressions to add new fields, or reset an existing field.
Multi-level nested fields
Sometimes some fields are at the bottom of multiple levels of nesting, we can use dot notation:
"contact.phone.number": <1 or 0 or 表达式>
"contact.phone.number": <1 or 0 or expression>
Nested formats can also be used directly:
contact: { phone: { number: <1 or 0 or 表达式> } }
contact: { phone: { number: <1 or 0 or expression> } }
Example
Suppose we have an articles collection with the following documents:
{
"_id": 666,
"title": "This is title",
"author": "Nobody",
"isbn": "123456789",
"introduction": "......"
}
Specify to include certain fields
The following code uses project to have the output only contain the _id, title and author fields:
let res = await db.collection('articles')
.aggregate()
.project({
title: 1,
author: 1
})
.end()
The output is as follows:
{ "_id" : 666, "title" : "This is title", "author" : "Nobody" }
Remove _id field from output
_id is included in the output by default, if you don't want it, you can specify to remove it:
let res = await db.collection('articles')
.aggregate()
.project({
_id: 0, // 指定去除 _id 字段
title: 1,
author: 1
})
.end()
The output is as follows:
{ "title" : "This is title", "author" : "Nobody" }
Remove a non-_id field
We can also specify to remove a non-_id field from the output, so that all other fields will be output:
let res = await db.collection('articles')
.aggregate()
.project({
isbn: 0, // 指定去除 isbn 字段
isbn: 0, // Specifies to remove the isbn field
})
.end()
The output is as follows, without the isbn field compared to the input:
{
"_id" : 666,
"title" : "This is title",
"author" : "Nobody",
"introduction": "......"
}
Add calculated new field
Suppose we have a students collection with the following documents:
{
"_id": 1,
"name": "小明",
"scores": {
"chinese": 80,
"math": 90,
"english": 70
}
}
In the code below, we use project and add a new field totalScore to the output:
const { sum } = db.command.aggregate
let res = await db.collection('students')
.aggregate()
.project({
_id: 0,
name: 1,
totalScore: sum([
"$scores.chinese",
"$scores.math",
"$scores.english"
])
})
.end()
The output is:
{ "name": "小明", "totalScore": 240 }
{ "name": "Xiao Ming", "totalScore": 240 }
Add new array field
Suppose we have a points collection containing the following documents:
{ "_id": 1, "x": 1, "y": 1 }
{ "_id": 2, "x": 2, "y": 2 }
{ "_id": 3, "x": 3, "y": 3 }
In the code below, we use project to put the x and y fields into a new array field coordinate :
let res = await db.collection('points')
.aggregate()
.project({
coordinate: ["$x", "$y"]
})
.end()
The output is as follows:
{ "_id": 1, "coordinate": [1, 1] }
{ "_id": 2, "coordinate": [2, 2] }
{ "_id": 3, "coordinate": [3, 3] }
Aggregation stage. Specify an existing field as the root node of the output, or you can specify a new calculated field as the root node.
API description
replaceRoot is used in the following form:
replaceRoot({
newRoot: <表达式>
newRoot: <expression>
})
The expression format is as follows:
Format | Description |
---|---|
Specifies an existing field as the output root node (if the field does not exist, an error is reported) | |
Computes a new field and uses this new field as the root node |
Example
Use existing field as root node
Suppose we have a schools collection with the following content:
{
"_id": 1,
"name": "SFLS",
"teachers": {
"chinese": 22,
"math": 18,
"english": 21,
"other": 123
}
}
The following code uses replaceRoot to output the teachers field as the root node:
let res = await db.collection('schools')
.aggregate()
.replaceRoot({
newRoot: '$teachers'
})
.end()
The output is as follows:
{
"chinese": 22,
"math": 18,
"english": 21,
"other": 123
}
Use calculated new field as root node
Suppose we have a roles collection with the following content:
{ "_id": 1, "first_name": "四郎", "last_name": "黄" }
{ "_id": 1, "first_name": "Shiro", "last_name": "Yellow" }
{ "_id": 2, "first_name": "邦德", "last_name": "马" }
{ "_id": 2, "first_name": "Bond", "last_name": "Horse" }
{ "_id": 3, "first_name": "牧之", "last_name": "张" }
{ "_id": 3, "first_name": "Muzhi", "last_name": "Zhang" }
The following code uses replaceRoot to concatenate first_name and last_name:
const { concat } = db.command.aggregate
let res = await db.collection('roles')
.aggregate()
.replaceRoot({
newRoot: {
full_name: concat(['$last_name', '$first_name'])
}
})
.end()
The output is as follows:
{ "full_name": "黄四郎" }
{ "full_name": "Huang Shilang" }
{ "full_name": "马邦德" }
{ "full_name": "Mabonde" }
{ "full_name": "张牧之" }
{ "full_name": "Zhang Muzhi" }
Notice
sample({size:40}).limit(40)
Aggregation stage. Randomly picks the specified number of records from the document.
API description
sample has the following form:
sample({
size: <正整数>
size: <positive integer>
})
Note: size is a positive integer, otherwise an error occurs.
Example
Suppose the document users has the following records:
{ "name": "a" }
{ "name": "b" }
choose randomly
If there is a lottery now, one lucky user needs to be selected. Then sample is called as follows:
let res = await db.collection('users')
.aggregate()
.sample({
size: 1
})
.end()
The record corresponding to a randomly selected user is returned, and the result is as follows:
{ "_id": "696529e4-7e82-4e7f-812e-5144714edff6", "name": "b" }
Aggregation stage. Specify a positive integer to skip the corresponding number of documents and output the remaining documents.
Example
let res = await db.collection('users')
.aggregate()
.skip(5)
.end()
This code will skip the first 5 documents found and output the remaining documents.
Aggregation stage. Sorts the entered documents according to the specified field.
API description
The form is as follows:
sort({
<fieldname1>: <collation>,
<fieldname2>: <collation>,
})
<collation />
can be the following values:
Example
Ascending/descending order
Suppose we have the collection articles, which contains the following data:
{ "_id": "1", "author": "stark", "score": 80, "age": 18 }
{ "_id": "2", "author": "bob", "score": 60, "age": 18 }
{ "_id": "3", "author": "li", "score": 55, "age": 19 }
{ "_id": "4", "author": "jimmy", "score": 60, "age": 22 }
{ "_id": "5", "author": "justan", "score": 95, "age": 33 }
let res = await db.collection('articles')
.aggregate()
.sort({
age: -1,
score: -1
})
.end()
The above code performs an aggregate search in the students collection and sorts the results, first according to the age field in descending order, and then according to the score field in descending order.
The output is as follows:
{ "_id": "5", "author": "justan", "score": 95, "age": 33 }
{ "_id": "4", "author": "jimmy", "score": 60, "age": 22 }
{ "_id": "3", "author": "li", "score": 55, "age": 19 }
{ "_id": "1", "author": "stark", "score": 80, "age": 18 }
{ "_id": "2", "author": "bob", "score": 60, "age": 18 }
Aggregation stage. Groups the incoming collection according to the incoming expression. Then count the number of distinct groups, and sort the groups by their number, returning the sorted result.
API description
sortByCount is called as follows:
sortByCount(<表达式>)
sortByCount(<expression>)
The form of the expression is: $ + specifies the field. Note: don't leave out the $ sign.
Example
Statistical base type
Suppose the set passages are recorded as follows:
{ "category": "Web" }
{ "category": "Web" }
{ "category": "Life" }
The following code can count the classification information of the article and calculate the number of each classification. That is, perform the sortByCount aggregation operation on the category field.
let res = await db.collection('passages')
.aggregate()
.sortByCount('$category')
.end()
The returned results are as follows: 2 articles in the Web category and 1 article in the Life category.
{ "_id": "Web", "count": 2 }
{ "_id": "Life", "count": 1 }
destructuring array types
Suppose the records of the collection passages are as follows: the value corresponding to the tags field is of type array.
{ "tags": [ "JavaScript", "C#" ] }
{ "tags": [ "Go", "C#" ] }
{ "tags": [ "Go", "Python", "JavaScript" ] }
How to count the tag information of the article and calculate the number of each tag? Because of the array corresponding to the tags field, you need to use the unwind operation to deconstruct the tags field, and then call sortByCount.
The following code implements this functionality:
let res = await db.collection('passages')
.aggregate()
.unwind(`$tags`)
.sortByCount(`$tags`)
.end()
The result returned is as follows:
{ "_id": "Go", "count": 2 }
{ "_id": "C#", "count": 2 }
{ "_id": "JavaScript", "count": 2 }
{ "_id": "Python", "count": 1 }
Aggregation stage. Splits the document using each element in the specified array field. After splitting, the document changes from one to one or more, one for each element of the array.
API description
Splits the document using each element in the specified array field. After splitting, the document changes from one to one or more, one for each element of the array.
unwind can be used in two forms:
The parameter is a field name
unwind(<字段名>)
unwind(<field name>)
The parameter is an object
unwind({
path: <字段名>,
includeArrayIndex: <string>,
preserveNullAndEmptyArrays: <boolean>
})
Field | Type | Description |
---|---|---|
path | string | The field name of the array you want to split, needs to start with $. |
includeArrayIndex | string | Optional, pass in a new field name on which the array index will be stored. New field names cannot start with $. |
preserveNullAndEmptyArrays | boolean | If true, the document will still be output when the field corresponding to path is null, an empty array, or the field does not exist; if false, unwind will not output these documents. Defaults to false. |
Example
split array
Suppose we have a products collection with the following data:
{ "_id": "1", "product": "tshirt", "size": ["S", "M", "L"] }
{ "_id": "2", "product": "pants", "size": [] }
{ "_id": "3", "product": "socks", "size": null }
{ "_id": "4", "product": "trousers", "size": ["S"] }
{ "_id": "5", "product": "sweater", "size": ["M", "L"] }
We split these documents based on the size field
db.collection('products')
.aggregate()
.unwind('$size')
.end()
The output is as follows:
{ "_id": "1", "product": "tshirt", "size": "S" }
{ "_id": "1", "product": "tshirt", "size": "M" }
{ "_id": "1", "product": "tshirt", "size": "L" }
{ "_id": "4", "product": "trousers", "size": "S" }
{ "_id": "5", "product": "sweater", "size": "M" }
{ "_id": "5", "product": "sweater", "size": "L" }
After splitting, keep the index of the original array
After we split the document according to the size field, we want to keep the original array index in the new index field.
let res = await db.collection('products')
.aggregate()
.unwind({
path: '$size',
includeArrayIndex: 'index'
})
.end()
The output is as follows:
{ "_id": "1", "product": "tshirt", "size": "S", "index": 0 }
{ "_id": "1", "product": "tshirt", "size": "M", "index": 1 }
{ "_id": "1", "product": "tshirt", "size": "L", "index": 2 }
{ "_id": "4", "product": "trousers", "size": "S", "index": 0 }
{ "_id": "5", "product": "sweater", "size": "M", "index": 0 }
{ "_id": "5", "product": "sweater", "size": "L", "index": 1 }
Keep documents with empty fields
Notice that we have two special rows of null data in our collection:
...
{ "_id": "2", "product": "pants", "size": [] }
{ "_id": "3", "product": "socks", "size": null }
...
If you want to keep documents whose size is an empty array, null, or the size field does not exist in the output, you can use the preserveNullAndEmptyArrays parameter
let res = await db.collection('products')
.aggregate()
.unwind({
path: '$size',
preserveNullAndEmptyArrays: true
})
.end()
The output is as follows:
{ "_id": "1", "product": "tshirt", "size": "S" }
{ "_id": "1", "product": "tshirt", "size": "M" }
{ "_id": "1", "product": "tshirt", "size": "L" }
{ "_id": "2", "product": "pants", "size": null }
{ "_id": "3", "product": "socks", "size": null }
{ "_id": "4", "product": "trousers", "size": "S" }
{ "_id": "5", "product": "sweater", "size": "M" }
{ "_id": "5", "product": "sweater", "size": "L" }
Indicates that the definition of the aggregation operation is completed, and initiates the actual aggregation operation
return value
Promise.<Object>
property | type | description |
---|---|---|
list | Array.<any> | List of aggregated results |
Sample code
const $ = db.command.aggregate
let res = await db.collection('books').aggregate()
.group({
// group by category field
_id: '$category',
// Let each group of records output have an avgSales field whose value is the average of the sales fields of all records in the group
avgSales: $.avg('$sales')
})
.end()