
Filtered Subset

On this page

  • Introduction
  • Aggregation Task Summary
  • Before You Get Started
  • Tutorial
  • Add a match stage for people who are engineers
  • Add a sort stage to sort from youngest to oldest
  • Add a limit stage to see only three results
  • Add an unset stage to remove unneeded fields
  • Run the aggregation pipeline
  • Interpret results

Introduction

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app. This aggregation performs the following operations:

  • Matches a subset of documents by a field value

  • Formats result documents

Aggregation Task Summary

This tutorial demonstrates how to query a collection for a specific subset of documents. The results contain documents that describe the three youngest people who are engineers.

This example uses one collection, persons, which contains documents describing people. Each document includes a person's name, date of birth, vocation, and other details.

Before You Get Started

Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.

After you set up the app, access the persons collection by adding the following code to the application:

person_coll = agg_db["persons"]

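Delete any existing data in the collection and insert sample data into the persons collection as shown in the following code. The sample documents store birth dates as Python datetime values, so your application must import datetime (from datetime import datetime) if it does not already. The first code example uses the synchronous API and the second uses the asynchronous API: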

Synchronous:

person_coll.delete_many({})
person_data = [
    {
        "person_id": "6392529400",
        "firstname": "Elise",
        "lastname": "Smith",
        "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
        "vocation": "ENGINEER",
        "address": {
            "number": 5625,
            "street": "Tipa Circle",
            "city": "Wojzinmoj",
        }
    },
    {
        "person_id": "1723338115",
        "firstname": "Olive",
        "lastname": "Ranieri",
        "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
        "gender": "FEMALE",
        "vocation": "ENGINEER",
        "address": {
            "number": 9303,
            "street": "Mele Circle",
            "city": "Tobihbo",
        }
    },
    {
        "person_id": "8732762874",
        "firstname": "Toni",
        "lastname": "Jones",
        "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
        "vocation": "POLITICIAN",
        "address": {
            "number": 1,
            "street": "High Street",
            "city": "Upper Abbeywoodington",
        }
    },
    {
        "person_id": "7363629563",
        "firstname": "Bert",
        "lastname": "Gooding",
        "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
        "vocation": "FLORIST",
        "address": {
            "number": 13,
            "street": "Upper Bold Road",
            "city": "Redringtonville",
        }
    },
    {
        "person_id": "1029648329",
        "firstname": "Sophie",
        "lastname": "Celements",
        "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
        "vocation": "ENGINEER",
        "address": {
            "number": 5,
            "street": "Innings Close",
            "city": "Basilbridge",
        }
    },
    {
        "person_id": "7363626383",
        "firstname": "Carl",
        "lastname": "Simmons",
        "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
        "vocation": "ENGINEER",
        "address": {
            "number": 187,
            "street": "Hillside Road",
            "city": "Kenningford",
        }
    }
]
person_coll.insert_many(person_data)

Asynchronous:

await person_coll.delete_many({})
person_data = [
    {
        "person_id": "6392529400",
        "firstname": "Elise",
        "lastname": "Smith",
        "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
        "vocation": "ENGINEER",
        "address": {
            "number": 5625,
            "street": "Tipa Circle",
            "city": "Wojzinmoj",
        }
    },
    {
        "person_id": "1723338115",
        "firstname": "Olive",
        "lastname": "Ranieri",
        "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
        "gender": "FEMALE",
        "vocation": "ENGINEER",
        "address": {
            "number": 9303,
            "street": "Mele Circle",
            "city": "Tobihbo",
        }
    },
    {
        "person_id": "8732762874",
        "firstname": "Toni",
        "lastname": "Jones",
        "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
        "vocation": "POLITICIAN",
        "address": {
            "number": 1,
            "street": "High Street",
            "city": "Upper Abbeywoodington",
        }
    },
    {
        "person_id": "7363629563",
        "firstname": "Bert",
        "lastname": "Gooding",
        "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
        "vocation": "FLORIST",
        "address": {
            "number": 13,
            "street": "Upper Bold Road",
            "city": "Redringtonville",
        }
    },
    {
        "person_id": "1029648329",
        "firstname": "Sophie",
        "lastname": "Celements",
        "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
        "vocation": "ENGINEER",
        "address": {
            "number": 5,
            "street": "Innings Close",
            "city": "Basilbridge",
        }
    },
    {
        "person_id": "7363626383",
        "firstname": "Carl",
        "lastname": "Simmons",
        "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
        "vocation": "ENGINEER",
        "address": {
            "number": 187,
            "street": "Hillside Road",
            "city": "Kenningford",
        }
    }
]
await person_coll.insert_many(person_data)

Tutorial

1. Add a match stage for people who are engineers

First, add a $match stage that finds documents in which the value of the vocation field is "ENGINEER":

pipeline.append({
    "$match": {
        "vocation": "ENGINEER"
    }
})
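To see which sample documents this stage keeps, the following sketch applies the same equality test in plain Python to a trimmed copy of the sample data (only firstname and vocation are kept). The helper name match_equals is hypothetical; it approximates a $match equality filter on a scalar top-level field and is not how the server evaluates queries (for example, it ignores MongoDB's matching against array elements).

```python
# Trimmed copy of the sample data: only the fields this example inspects.
people = [
    {"firstname": "Elise", "vocation": "ENGINEER"},
    {"firstname": "Olive", "vocation": "ENGINEER"},
    {"firstname": "Toni", "vocation": "POLITICIAN"},
    {"firstname": "Bert", "vocation": "FLORIST"},
    {"firstname": "Sophie", "vocation": "ENGINEER"},
    {"firstname": "Carl", "vocation": "ENGINEER"},
]


def match_equals(docs, field, value):
    """Hypothetical helper: keep documents whose top-level `field` equals
    `value`, roughly like {"$match": {field: value}} for scalar values.
    Documents missing the field are dropped because .get() returns None."""
    return [doc for doc in docs if doc.get(field) == value]


engineers = match_equals(people, "vocation", "ENGINEER")
print([doc["firstname"] for doc in engineers])
# ['Elise', 'Olive', 'Sophie', 'Carl']
```

Four of the six sample documents pass the filter, so the later stages operate on those four engineers only.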
2. Add a sort stage to sort from youngest to oldest

Next, add a $sort stage that sorts the documents in descending order by the dateofbirth field to list the youngest people first:

pipeline.append({
    "$sort": {
        "dateofbirth": -1
    }
})
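Because dateofbirth holds datetime values, sorting in descending order places the most recent birth date, and therefore the youngest person, first. The following sketch reproduces that ordering in plain Python for the four engineers in the sample data; sorted(..., reverse=True) stands in for {"$sort": {"dateofbirth": -1}} and is only an approximation of server-side sorting.

```python
from datetime import datetime

# The four documents that pass the $match stage, reduced to the sort key.
engineers = [
    {"firstname": "Elise", "dateofbirth": datetime(1972, 1, 13, 9, 32, 7)},
    {"firstname": "Olive", "dateofbirth": datetime(1985, 5, 12, 23, 14, 30)},
    {"firstname": "Sophie", "dateofbirth": datetime(1959, 7, 6, 17, 35, 45)},
    {"firstname": "Carl", "dateofbirth": datetime(1998, 12, 26, 13, 13, 55)},
]

# Descending order on dateofbirth (the -1 in $sort): later dates come first.
youngest_first = sorted(engineers, key=lambda doc: doc["dateofbirth"], reverse=True)
print([doc["firstname"] for doc in youngest_first])
# ['Carl', 'Olive', 'Elise', 'Sophie']
```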
3. Add a limit stage to see only three results

Next, add a $limit stage to the pipeline to output only the first three documents of the sorted results:

pipeline.append({
    "$limit": 3
})
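$limit keeps the first n documents in the order it receives them, so it must come after $sort to return the three youngest engineers. The sketch below models both stages with Python list operations (sorted and slicing, an approximation rather than the server's implementation) and shows that swapping the two stages changes the result. The helper name by_birth_desc is hypothetical, and the input order is an assumption: without a $sort stage, MongoDB does not guarantee any particular document order.

```python
from datetime import datetime

# Engineers in insertion order, standing in for the unsorted $match output.
engineers = [
    ("Elise", datetime(1972, 1, 13, 9, 32, 7)),
    ("Olive", datetime(1985, 5, 12, 23, 14, 30)),
    ("Sophie", datetime(1959, 7, 6, 17, 35, 45)),
    ("Carl", datetime(1998, 12, 26, 13, 13, 55)),
]


def by_birth_desc(rows):
    """Hypothetical helper approximating {"$sort": {"dateofbirth": -1}}."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


# $sort then $limit: the three youngest engineers.
sort_then_limit = [name for name, _ in by_birth_desc(engineers)[:3]]
# $limit then $sort: whichever three arrived first, then ordered.
limit_then_sort = [name for name, _ in by_birth_desc(engineers[:3])]

print(sort_then_limit)  # ['Carl', 'Olive', 'Elise']
print(limit_then_sort)  # ['Olive', 'Elise', 'Sophie']
```

With the stages in the tutorial's order, Carl, who is the youngest engineer, is kept and Sophie, the oldest, is dropped.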
4. Add an unset stage to remove unneeded fields

Finally, add an $unset stage that removes the _id and address fields from the result documents:

pipeline.append({
    "$unset": [
        "_id",
        "address"
    ]
})
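The effect of this stage on a single document can be sketched in plain Python as a dictionary copy that skips the named fields. The helper name unset_fields is hypothetical and only handles top-level field names; the placeholder _id string is an assumption standing in for the ObjectId the server generates on insert. Like $unset, the helper ignores names that are not present in the document.

```python
from datetime import datetime

# One document as it might reach the $unset stage. The _id is a placeholder
# string; real documents carry server-generated ObjectId values.
doc = {
    "_id": "placeholder-id",
    "person_id": "7363626383",
    "firstname": "Carl",
    "lastname": "Simmons",
    "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
    "vocation": "ENGINEER",
    "address": {"number": 187, "street": "Hillside Road", "city": "Kenningford"},
}


def unset_fields(document, fields):
    """Hypothetical helper: return a copy of `document` without the named
    top-level fields, approximating {"$unset": fields}. Names that are not
    present are simply ignored; the input document is not modified."""
    return {key: value for key, value in document.items() if key not in fields}


print(sorted(unset_fields(doc, ["_id", "address"])))
# ['dateofbirth', 'firstname', 'lastname', 'person_id', 'vocation']
```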

Tip

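Use the $unset stage instead of $project when you only need to remove a few fields. Because $unset names the fields to drop rather than the fields to keep, you don't need to modify the aggregation pipeline if documents with different fields are added to the collection.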
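The sample data already shows this difference: Olive Ranieri's document is the only one with a gender field. The sketch below approximates an inclusion-style $project (list the fields to keep) and $unset (list the fields to drop) in plain Python, using a trimmed copy of her document. The helper names and the inclusion list are illustrative assumptions, not part of the tutorial.

```python
# A trimmed copy of Olive Ranieri's document (placeholder _id, shortened address).
olive = {
    "_id": "placeholder-id",
    "person_id": "1723338115",
    "firstname": "Olive",
    "lastname": "Ranieri",
    "gender": "FEMALE",
    "vocation": "ENGINEER",
    "address": {"city": "Tobihbo"},
}


def project_include(document, keep):
    """Approximates an inclusion $project: only listed fields survive.
    Unlike the real stage, this helper does not keep _id by default."""
    return {k: v for k, v in document.items() if k in keep}


def unset(document, drop):
    """Approximates {"$unset": drop}: every field except the listed ones survives."""
    return {k: v for k, v in document.items() if k not in drop}


# An inclusion list written without knowing that some documents have "gender".
keep = ["person_id", "firstname", "lastname", "dateofbirth", "vocation"]

print("gender" in project_include(olive, keep))      # False
print("gender" in unset(olive, ["_id", "address"]))  # True
```

Only the $unset version keeps gender without any change to the field list.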

5. Run the aggregation pipeline

Add the following code to the end of your application to perform the aggregation on the persons collection. Use the first line with the synchronous API or the second line with the asynchronous API:

Synchronous:

aggregation_result = person_coll.aggregate(pipeline)

Asynchronous:

aggregation_result = await person_coll.aggregate(pipeline)
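If you prefer to define the pipeline in one place instead of appending stages one at a time, the four stages can be collected in a single function. The following is a minimal sketch; the function name build_pipeline and its limit parameter are hypothetical, and the stages are the same ones built above.

```python
def build_pipeline(limit=3):
    """Hypothetical helper returning the tutorial's four stages: keep engineers,
    sort youngest first, keep `limit` documents, and drop _id and address."""
    return [
        {"$match": {"vocation": "ENGINEER"}},
        {"$sort": {"dateofbirth": -1}},
        {"$limit": limit},
        {"$unset": ["_id", "address"]},
    ]


# Usage inside the template app (needs the person_coll defined earlier):
#   for document in person_coll.aggregate(build_pipeline()):
#       print(document)
# With the asynchronous API, await the call and iterate with `async for`:
#   async for document in await person_coll.aggregate(build_pipeline()):
#       print(document)
print(len(build_pipeline()))  # 4
```

Returning a fresh list on each call avoids accidentally appending stages twice to a shared pipeline list if the code runs more than once.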

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py
6. Interpret results

The aggregated result contains three documents. The documents represent the three youngest people with the vocation of "ENGINEER", ordered from youngest to oldest. The results omit the _id and address fields.

{
    'person_id': '7363626383',
    'firstname': 'Carl',
    'lastname': 'Simmons',
    'dateofbirth': datetime.datetime(1998, 12, 26, 13, 13, 55),
    'vocation': 'ENGINEER'
}
{
    'person_id': '1723338115',
    'firstname': 'Olive',
    'lastname': 'Ranieri',
    'dateofbirth': datetime.datetime(1985, 5, 12, 23, 14, 30),
    'gender': 'FEMALE',
    'vocation': 'ENGINEER'
}
{
    'person_id': '6392529400',
    'firstname': 'Elise',
    'lastname': 'Smith',
    'dateofbirth': datetime.datetime(1972, 1, 13, 9, 32, 7),
    'vocation': 'ENGINEER'
}

To view the complete code for this tutorial, see the Completed Filtered Subset App on GitHub.
