Ubuntu – What data does the Software Centre use to give you recommendations

Tags: software-center, software-recommendation

Just below the What's New section there is a Recommended For You section.

Currently I don't have it enabled, but apart from the list of software, what data does it collect about what you have installed?

  1. Does that list of software include software installed via Synaptic or apt-get?

  2. Does it take data about the time (or order) of installation?

  3. What does it use that data for? (Other than recommendations, if any)

  4. How does it work out what software you might like from that? (For example, does it use just the category, or a more in-depth categorisation?)

  5. If I have A installed and it recommends C, and I then install B, could that make it stop recommending C (say, because B and C are very similar)?


Best Answer

APT, or Advanced Packaging Tool, basically resolves dependency problems and retrieves the requested packages. It works with dpkg, another tool, which handles the actual installation and removal of packages (applications). APT is very powerful, and is primarily used on the command line (console/terminal). There are, however, many GUI/Graphical tools to let you use APT without having to touch the command line.

Synaptic is one of the GUI/Graphical tools for using APT. This is what the Synaptic Package Manager website says about it:

Synaptic is a graphical package management program for apt. It provides the same features as the apt-get command line utility with a GUI front-end based on Gtk+.

So basically both Synaptic and the terminal do the same thing, i.e. use APT, just through a different interface (GUI and CLI respectively).
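Because all of these front-ends end up calling dpkg, the record of what is installed lives in one place: dpkg's status database, normally /var/lib/dpkg/status. As a minimal sketch, assuming the usual format of that file (blank-line-separated stanzas with Package: and Status: fields), this is how a client could collect installed package names without caring which tool installed them. The function name installed_packages is hypothetical; this is not the Software Center's own code.

```python
from pathlib import Path

DPKG_STATUS = Path("/var/lib/dpkg/status")  # default dpkg database on Debian/Ubuntu


def installed_packages(status_text: str) -> set[str]:
    """Return names of packages whose dpkg status is 'install ok installed'.

    The status file is a series of stanzas separated by blank lines; each
    stanza holds 'Field: value' lines such as 'Package:' and 'Status:'.
    """
    installed = set()
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; only top-level fields matter here.
            if line and not line[0].isspace() and ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields.get("Status") == "install ok installed" and "Package" in fields:
            installed.add(fields["Package"])
    return installed


if __name__ == "__main__":
    if DPKG_STATUS.exists():
        text = DPKG_STATUS.read_text(encoding="utf-8", errors="replace")
        print(f"{len(installed_packages(text))} packages installed")
```

Packages installed with Synaptic, apt-get or the Software Center all show up the same way here, which is why the origin of an installation should not matter to a recommender that reads this list.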

This is what the Ubuntu Wiki page on Software Center recommendations states:

Data we can use

The current data we have about people is:

  • what all other people have installed (new recommender service / popcon)

  • what all other people are using (zeitgeist/new recommender service/popcon)

  • what specific apps other people like or dislike (rnr)

The data we have about the users system is:

  • what apps the user has installed

  • what apps the user is using (popcon/zeitgeist)

  • what mimetypes the user is working with (zeitgeist)

  • maybe the SSO ID of the user

  • maybe what apps the user likes (based on his/her reviews)

  • the user's contacts

What is odd here is the use of the word "maybe", which makes things a bit sketchy.

Basically, the recommendation server stores a list of the packages installed on the system. Whether a package was installed by the Software Center, Synaptic or the terminal is immaterial here, since all of them most probably go through APT and dpkg. I can say this because I haven't used the Software Center, yet I still get recommendations based on packages I installed via the terminal. The wiki page describes the server side as follows:

Storage

The server stores the list of each participant's installed packages, and a cache of the recommendations generated for them.
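The storage described above can be pictured as two maps keyed by UUID: one holding each participant's installed packages and one caching their recommendations. The sketch below is a hypothetical in-memory version; the class name RecommendationStore and the toy most_common_missing recommender are illustrative only, since the wiki does not say how recommendations are actually computed.

```python
from collections import Counter
from typing import Callable

# A recommender takes (uuid, everyone's installed packages) and returns package names.
Recommender = Callable[[str, dict[str, set[str]]], list[str]]


class RecommendationStore:
    """Hypothetical in-memory stand-in for the recommender server's storage."""

    def __init__(self, recommender: Recommender) -> None:
        self._recommender = recommender
        self._installed: dict[str, set[str]] = {}  # UUID -> installed packages
        self._cache: dict[str, list[str]] = {}     # UUID -> cached recommendations

    def submit(self, uuid: str, packages: set[str]) -> None:
        """Store (or replace) a participant's package list and drop their stale cache."""
        self._installed[uuid] = set(packages)
        self._cache.pop(uuid, None)

    def recommendations(self, uuid: str) -> list[str]:
        """Return cached recommendations, computing them on a cache miss."""
        if uuid not in self._installed:
            raise KeyError(f"unknown UUID: {uuid}")
        if uuid not in self._cache:
            self._cache[uuid] = self._recommender(uuid, self._installed)
        return self._cache[uuid]


def most_common_missing(uuid: str, installed: dict[str, set[str]], k: int = 3) -> list[str]:
    """Toy recommender: packages most often installed by others that this user lacks."""
    counts = Counter(p for other, pkgs in installed.items() if other != uuid for p in pkgs)
    mine = installed[uuid]
    return [p for p, _ in counts.most_common() if p not in mine][:k]
```

In this sketch the cache is dropped only when that participant resubmits their list; a real service would presumably also refresh it as other participants' data changes.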

Serving

When sent a request containing the UUID, the server returns a JSON list of packages representing the recommendations for that UUID.

There will be a REST API call that involves the UUID and that will return the recommendations in some format that s-c (the Software Center) can understand.
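Serving could then be a single read-only endpoint. In this sketch the URL layout /recommendations/<uuid>, the error codes and the function name handle_get are all assumptions; the wiki only says the server returns a JSON list of packages for a given UUID. The UUID is normalised with Python's uuid module so that upper- and lower-case spellings hit the same cache entry.

```python
import json
import uuid


def handle_get(path: str, cache: dict[str, list[str]]) -> tuple[int, str]:
    """Answer GET /recommendations/<uuid> with (HTTP status, JSON body).

    'cache' maps canonical lower-case UUID strings to recommended packages.
    """
    prefix = "/recommendations/"
    if not path.startswith(prefix):
        return 404, json.dumps({"error": "not found"})
    try:
        # uuid.UUID validates the string; str() gives the canonical lower-case form.
        client_id = str(uuid.UUID(path[len(prefix):]))
    except ValueError:
        return 400, json.dumps({"error": "malformed UUID"})
    if client_id not in cache:
        return 404, json.dumps({"error": "unknown UUID"})
    return 200, json.dumps(cache[client_id])
```

Wiring handle_get into http.server or a web framework is left out to keep the sketch short.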

Ultimately, the Software Center is also a graphical front-end for APT.

However, a few of the points you raised fall under the wiki's Unresolved issues section, and no information is given about exactly how the recommendations work, i.e. the algorithm.

Unresolved issues

  • How do we cater for computers that are used by multiple people? Should we add the local username to the UUID to ensure it's unique (when some or all of the users don't have an SSO account)?

    • It's an interesting question what people would expect here. If I have a dedicated gaming machine and a productivity machine, then we should have two different sets of recommendations. If, on the other hand, I have a laptop and a desktop that I use for the same things, the recommendations should be the same. Hopefully the system can work this out from the context.
  • Does the algorithm take software ratings into account as well as whether software is installed? Is it less effective if someone has never rated software themselves (i.e. users without an SSO account)?

  • What if someone reinstalls Ubuntu?

    • So we should probably send a periodic "ping" with the UUID (even if the system does not install or remove software, a ping tells the server that it's still in use), so that UUIDs that are no longer valid can be removed over time.
  • One interesting point, though, is bootstrapping the dataset, that is, what recommendations to serve until we have a reasonable amount of data on the server. For recommendations based on reviews, we already have a decent number of reviews up there to start review-based recommendations. For recommendations based on installed packages, on the other hand, we'd need to receive data for a while before we can make useful recommendations.
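Two of the ideas floated in the list above, folding the local username into the identifier so that each user on a shared machine gets their own entry, and expiring UUIDs that stop sending periodic pings, could be sketched as follows. Everything here is hypothetical: the namespace UUID, the 30-day timeout, and the names client_id and PingRegistry. The machine identifier might come from something like /etc/machine-id on systemd-based systems. uuid.uuid5 is used because it is deterministic: the same machine and user always map to the same ID.

```python
import time
import uuid
from typing import Optional

# Hypothetical fixed namespace for deriving client IDs (any constant UUID works).
CLIENT_NAMESPACE = uuid.UUID("6f1c2b9e-3a5d-4c7e-9b10-2d4f6a8c0e12")
STALE_AFTER = 30 * 24 * 3600  # assumed timeout: 30 days without a ping, in seconds


def client_id(machine_id: str, username: str) -> str:
    """Derive a stable per-user ID from a machine identifier and local username.

    uuid5 hashes the name deterministically, so the same user on the same
    machine always gets the same ID, while other users get different ones.
    """
    return str(uuid.uuid5(CLIENT_NAMESPACE, f"{machine_id}:{username}"))


class PingRegistry:
    """Server-side record of when each client ID last checked in."""

    def __init__(self, stale_after: float = STALE_AFTER) -> None:
        self.stale_after = stale_after
        self.last_seen: dict[str, float] = {}

    def ping(self, cid: str, now: Optional[float] = None) -> None:
        """Record a heartbeat, even if the client's package list has not changed."""
        self.last_seen[cid] = time.time() if now is None else now

    def expire(self, now: Optional[float] = None) -> list[str]:
        """Remove and return IDs that have not pinged within the timeout."""
        now = time.time() if now is None else now
        stale = [cid for cid, seen in self.last_seen.items() if now - seen > self.stale_after]
        for cid in stale:
            del self.last_seen[cid]
        return stale
```

Assuming a reinstall produces a new machine identifier, the old ID simply stops pinging and is dropped once the timeout passes.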

For the most accurate answer, it is best to contact the developer(s) or the team at Canonical who designed the Software Center recommendations.

That said, I feel that the recommendation system is not very intelligent: it recommends popular packages that many other users have installed, rather than lesser-known but similar packages that may actually be more relevant to the user.
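To make that contrast concrete, here is a hypothetical comparison of a pure popularity ranking with a similarity ranking built from co-installation data. The similarity measure (Jaccard similarity between the sets of users who have each package) is just one common choice for item-to-item recommendations and is not known to be what the Software Center uses; the package names and data are made up for illustration.

```python
from collections import Counter
from itertools import chain


def popular(installs: dict[str, set[str]], mine: set[str], k: int = 3) -> list[str]:
    """Rank packages by how many users have them installed (popularity only)."""
    counts = Counter(chain.from_iterable(installs.values()))
    ranked = sorted(counts, key=lambda p: (-counts[p], p))  # ties broken alphabetically
    return [p for p in ranked if p not in mine][:k]


def similar(installs: dict[str, set[str]], mine: set[str], k: int = 3) -> list[str]:
    """Rank packages by Jaccard similarity to what this user already has.

    Similarity of two packages = |users with both| / |users with either|.
    A candidate's score is the sum of its similarities to the user's packages.
    """
    users_of: dict[str, set[str]] = {}
    for user, pkgs in installs.items():
        for p in pkgs:
            users_of.setdefault(p, set()).add(user)

    def jaccard(a: str, b: str) -> float:
        ua, ub = users_of.get(a, set()), users_of.get(b, set())
        union = ua | ub
        return len(ua & ub) / len(union) if union else 0.0

    scores = {c: sum(jaccard(c, m) for m in mine) for c in users_of if c not in mine}
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: (-scores[c], c))
    return ranked[:k]


if __name__ == "__main__":
    installs = {
        "u1": {"firefox", "vlc"},
        "u2": {"firefox", "vlc"},
        "u3": {"firefox", "vlc"},
        "u4": {"firefox", "darktable", "rawtherapee"},
        "u5": {"darktable", "rawtherapee"},
    }
    mine = {"darktable"}
    print(popular(installs, mine))  # -> ['firefox', 'vlc', 'rawtherapee']
    print(similar(installs, mine))  # -> ['rawtherapee', 'firefox']
```

For a user who only has darktable, the popularity ranking leads with whatever most people install, while the similarity ranking surfaces the niche package that darktable users actually tend to have alongside it.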