Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the child counting into two functions #32

Merged
merged 2 commits into from
Mar 13, 2017

Conversation

puf
Copy link
Contributor

@puf puf commented Mar 13, 2017

This PR splits the child counting into two functions:

  1. tracks add/remove of child nodes and increments/decrements the counter
  2. one function is triggered when the counter is deleted and it recounts the child nodes

The delta counter in #1 is more efficient than counting all children every time, since it only needs
access to the delta. The recounter function in #2 is equally inefficient as before, but is needed
a lot less.

1. tracks add/remove of child nodes and increments/decrements the counter
2. one function is triggered when the counter is deleted and it recounts the child nodes

The delta counter in #1 is more efficient than counting all children every time, since it only needs
access to the delta. The recounter function in #2 is equally inefficient as before, but is needed
a lot less.
@puf puf requested review from nicolasgarnier and jwngr March 13, 2017 03:21
@jwngr
Copy link

jwngr commented Mar 13, 2017

Nifty! I like it. Thanks @puf!

@eltonnobrega
Copy link

eltonnobrega commented Mar 16, 2017

First we must understand that the likes of users should not be putter inside the post, just likes_count should be in the post.

-likes
	-postid
		-uid_user1:true
		-uid_user4:true
		-uid_usern:true
-posts
	-postid
		-likes_count or simply likes(just remember that these likes in posts are actually likes count)
		-...fff...
exports.likesCount = functions.database.ref('/likes/{postid}').onWrite(event => {
	//Insert Like
	if (!event.data.previous.val() && event.data.val()){
		return admin.database().ref('posts/' + postId + '/likes').transaction(function (current) {
			return (current || 0) + 1;
		});
	}
	//Delete Like
	if(event.data.previous.val() && !event.data.val()){
        return admin.database().ref('posts/' + postId + '/likes').transaction(function (current) {
			return (current || 0) - 1;
		});
    }
});

This code is good especially if you are using an external server and not the firebase functions.
Why? Why you will not need to download all the children as in the case of using event.data.numChildren. Assuming a post has 1 million likes then "numChildren" would have to download all to make the count. Assuming that uidUser has 4 bytes and the value true is 1 byte, we will have 5 bytes per like. Then 1 million of like represents 1000000 * 5bytes = 5000000bytes ~> 4.77MB.

Why is this code bad?
If incoherence occurs and the firebase function is not called, this like or deslike will never be counted.

The solution is to actually use "numChildren"
You will only download the 4.77MB when you have 1 million likes, and this will be done within the google network, which is extremely fast. If I am not mistaken this internal traffic is neither paid.
In case of inconsistency, on the next call the function will perform the recount of all the likes and make the correct count.
What's more, the code is cleaner and nicer.

exports.likesCount = functions.database.ref('/likes/{postid}').onWrite(event => {
	return admin.database().ref('posts/'+event.params.postid+'/likes').set(event.data.numChildren());
});

or

exports.likesCount = functions.database.ref('/likes/{postid}').onWrite(event => {
    //Insert Like or Delete Like
    if ((!event.data.previous.val() && event.data.val()) || (event.data.previous.val() && !event.data.val())){
        return admin.database().ref('posts/'+event.params.postid+'/likes').set(event.data.numChildren());
	}
});

@nicolasgarnier
Copy link
Contributor

It is very inefficient to download the whole likes data. You may not pay for data transfer but you will pay for CPU/memory time usage both of which will increase a lot in this scenario.

Imagine a popular picture/post which gets 20 Million likes, this means that you will need to count, in average 10 Million likes 20 million times... I haven't done the math but that is going to be very expensive. (yep 200 Trillion likes downloaded...)

Also it's true to say that data discrepancy can occur with Functions (some events may be double fired in some rare occasions). but typically these real-time "counts" don't need to be very accurate (I'm pretty sure likes on Facebook/instagram are not perfectly accurate and if you pay attention you will see that Likes and views count in Youtube are only real-time up to around 300 likes/views after which they switch to updating them every 24h or so.

@eltonnobrega
Copy link

The multiplication is all wrong, but I understand what you said. It is essential to give preference to perfomace, since the amount of likes can be high. And by weighing that you said, I repeat that the likes can not be inside the post.
The way that structure the database is, if it displays in the view the top 10 posts. Will download all the likes of all users from 10 posts. That is 20 Million each.

What about followers and favorites? Is this methodology correct?

@eltonnobrega
Copy link

Now it's beautiful. Look:

-likes
	-postid
		-uid_user1:true
		-uid_user4:true
		-uid_usern:true
-posts
	-postid
		-likes
		...
exports.likesCount = functions.database.ref('/likes/{postid}/{uid}').onWrite(event => {
    var collectionRef = event.data.ref.parent;
    var counterRef = admin.database().ref('posts/' + event.params.postid + '/likes');
    
    return counterRef.transaction(function (current) {
        //Insert Like
        if (event.data.exists() && !event.data.previous.exists()) {
            return (current || 0) + 1;
        }
        //Remove Like
        else if (!event.data.exists() && event.data.previous.exists()) {
            return (current || 0) - 1;
        }
    }, function(error, committed, snapshot) {
	//Transaction failed or aborted
        if(error || !committed){
            return collectionRef.once('value', function(likes) {
                    return counterRef.set(likes.numChildren());
            });
        }
    });
});

My question now is: is it better to set applyLocally to true?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants