MongodB cumulated conditionnal array:using map/reduce or aggregation

aggregatedistinctmongodb

First, I aplogized because I have some problem to formalized my question so any idea on express it correctly will be appreciated and edited.

Last try:
How to get last of a cumulated conditionnal array with distinct value using map/reduce or aggregation

I have a collection of documents and I want to group by (student, dataset, target, stimulus) and chapter to get a rate
where the chapter rate consists of the last element of the cumulation rates for each chapter where => chapter.

Better with illustration

Input :


{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 1, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : null}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 2, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : 0.35}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 3, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : null}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 4, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : 0.75}

{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 1, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 0.10}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 2, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : null}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 3, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 1}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 4, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : null}

Output step 1


{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 1, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : null}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 2, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : 0.35}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 3, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : 0.35}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 4, "CV" : "C", "target" : "l", "stimulus" : "l", "rate" : 0.75}

{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 1, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 0.10}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 2, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 0.10}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 3, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 1}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 4, "CV" : "C", "target" : "l", "stimulus" : "m", "rate" : 1}

Explaination:

For 1st target l, stimulus l and chapter 4 :
rates: [null, 0.35, null, 0.75] >> rate: 0.75

For 1st target l, stimulus l and chapter 3 :
rates: [null, 0.35, null] >> rate: 0.35
For 1st target l, stimulus l and chapter 2 :
rates: [null, 0.35] >> rate: 0.35
For 1st target l, stimulus l and chapter 1 :
rates: [null] >> rate: 0.35

For 2d target l, stimulus m and chapter 4 :
rates: [0.10, null, 1, null] >> rate: 1
For 2d target l, stimulus m and chapter 3 :
rates: [0.10, null, 1] >> rate: 1
For 2d target l, stimulus m and chapter 2 :
rates: [0.10, null] >> rate: 0.10
For 2d target l, stimulus m and chapter 1 :
rates: [0.10] >> rate: 0.10

Simple implementation can be done in a loop

FOR each chapter in chapters
    SELECT records WHERE record.chapter <= chapter

Output step 2


{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 1, "CV" : "C", "matrix" : ["l", ["l", null], ["m", 0.10]]}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 2, "CV" : "C", "matrix" : ["l", ["l", 0.35], ["m", 0.10]]}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 3, "CV" : "C", "matrix" : ["l", ["l", 0.35], ["m", 1]]}
{ "_id" : ObjectId("5dd31e91ffa18a63d2c1777a"), "student" : 2780, "dataset" : "gp", "chapter" : 4, "CV" : "C", "matrix" : ["l", ["l", 0.75], ["m", 1]]}

I implemented it in python but it is quite slow and for ~ 8600000 records reduced to ~13400 take ~ 7 minutes. (Not so bad thought but still not staisfactory)

So I would be curious on a efficient way of using aggregation framework or map/reduce in mongodb.

Implementation in python

for chapter in sorted(db.student_confusion.distinct("chapter", {"dataset": dataset, "student": student}), reverse=True):
    records = sorted([n for n in db.student_confusion.find({"dataset": dataset, "student": student,"CV": CV, "chapter": {
                    "$lte": chapter}})], key=lambda key: (key["target"], key["stimulus"]))
    records_rates = []
    for couple, group in itertools.groupby(records, key=lambda key: (key["target"], key["stimulus"])):
        target, stimulus = couple
        rates = [c["WA_rate"] for c in list(group) if c["WA_rate"] is not None]
        if len(rates) == 0:
            rate = None
        else:
            rate = round(rates[-1],2)
        # itertools.groupby(records, key=lambda key: (key["target"], key["stimulus"]))
        records_rates.append((target, stimulus, rate))
    matrix = [(key, [(n[1],n[2]) for n in list(group)]) for key, group in itertools.groupby(records_rates, key=lambda key: key[0])]
    db.student_cumul_matrix.insert({"student":student, "dataset": dataset, "CV": CV, "chapter": chapter, "matrix":matrix})

Any idea or demonstration of virtuosity will be appreciated and any intents upvoted.

Best Answer

Start using db.student_confusion.aggregate() with something like this:

[
    {
        '$group': {
            '_id': '$chapter',                 
            'ma': {
                '$push': {
                    'target': '$target', 
                    'stimulus': '$stimulus', 
                    'rate': '$rate'
                }
            }, 
            'chapter': {
                '$first': '$chapter'
            }
        }
    }, {
        '$project': {
            '_id': 1,                 
            'chapter': 1, 
            'matrix': {
                '$reduce': {
                    'input': '$ma', 
                    'initialValue': [], 
                    'in': {
                        '$let': {
                            'vars': {
                                'tg': '$$this.target', 
                                'vl': [
                                    [
                                        '$$this.stimulus', '$$this.rate'
                                    ]
                                ], 
                                'doc': {
                                    'target': '$$this.target', 
                                    'v': [
                                        [
                                            '$$this.stimulus', '$$this.rate'
                                        ]
                                    ]
                                }, 
                                'ind': {
                                    '$indexOfArray': [
                                        '$$value.target', '$$this.target'
                                    ]
                                }
                            }, 
                            'in': {
                                '$cond': [
                                    {
                                        '$gt': [
                                            '$$ind', {
                                                '$literal': -1
                                            }
                                        ]
                                    }, {
                                        '$concatArrays': [
                                            {
                                                '$filter': {
                                                    'input': '$$value', 
                                                    'as': 'fi', 
                                                    'cond': {
                                                        '$ne': [
                                                            '$$fi.target', '$$tg'
                                                        ]
                                                    }
                                                }
                                            }, [
                                                {
                                                    'target': '$$tg', 
                                                    'v': {
                                                        '$concatArrays': [
                                                            '$$vl', {
                                                                '$let': {
                                                                    'vars': {
                                                                        'vv': {
                                                                            '$arrayElemAt': [
                                                                                '$$value', '$$ind'
                                                                            ]
                                                                        }
                                                                    }, 
                                                                    'in': '$$vv.v'
                                                                }
                                                            }
                                                        ]
                                                    }
                                                }
                                            ]
                                        ]
                                    }, {
                                        '$concatArrays': [
                                            '$$value', [
                                                '$$doc'
                                            ]
                                        ]
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        }
    }, {
        '$project': {
            'chapter': 1, 
            'matrix': {
                '$map': {
                    'input': '$matrix', 
                    'as': 'mat', 
                    'in': {
                        '$concatArrays': [
                            [
                                '$$mat.target'
                            ], '$$mat.v'
                        ]
                    }
                }
            }
        }
    }
]