
Heterogeneous Groups in Reviews

Product reviews should be improved. Different groups of people have different expectations of a product.

When you work in data engineering, it is no surprise that certain topics make it into lunchtime discussions. Our work is basically in the recommender systems area, so product reviews are not far off. Before I start, I want to clarify that product reviews are useful in their current state and can give you a rough impression of a product, but there are problems the average person is not aware of.

I remember the first time we stated that the product review systems in use, on shopping sites for instance, need a fundamental change to be more transparent and useful for the individual. About two months later we had the same discussion again. Now, I will finally write about it. So what is the status quo? Let us stick to the shopping site example.

Where Do Reviews Come From?

The first question we have to ask is: on what basis do users review products? There might be a well-defined theory, but to keep it simple, let us assume they rate the product based on expectations formed by previous experiences, or by comparison with other products that offer the same functionality. Let us keep that in mind before getting into the well-known problems of product reviews.

The Well-Known Problems

The first problem is that low-price products receive far more reviews than high-price products. If you do not go with the masses, you might end up searching exhaustively for any reviews at all. Further, although it is not legal, certain parties operate sock puppets (fake accounts, often run by computer programs) to write reviews that support their products. When more reviews are aggregated, the sock puppetry might become background noise. Still, high-price products, with their few reviews, are clearly more vulnerable to sock puppetry, so be aware.
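To make the background-noise argument concrete, here is a small sketch of the arithmetic. The function name and all numbers are made up for illustration; it simply mixes a fixed number of fake five-star reviews into a pool of genuine ones and computes the resulting average.

```python
def average_with_fakes(genuine_count, genuine_avg, fake_count, fake_rating=5):
    """Average star rating after fake reviews are mixed into genuine ones.

    All inputs are hypothetical; this is plain weighted-average arithmetic.
    """
    total_stars = genuine_count * genuine_avg + fake_count * fake_rating
    return total_stars / (genuine_count + fake_count)


# The same 10 fake five-star reviews, added to two different products.
popular_product = average_with_fakes(genuine_count=1000, genuine_avg=3.0, fake_count=10)
niche_product = average_with_fakes(genuine_count=10, genuine_avg=3.0, fake_count=10)

print(round(popular_product, 2))  # 3.02
print(niche_product)              # 4.0
```

With a thousand genuine reviews the fake ones barely move the average, while with only ten genuine reviews they lift it by a full star.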

Heterogeneous Groups of Users

Let us focus on a certain product. Who usually buys this product? We normally assume that the product is bought by people who are like ourselves. While there are products which are indeed just for one specific group, lots of products are bought and used by different groups of people, for instance: students, parents, kids and seniors.

A student might have different expectations of a product than, for instance, a mother. I would assume that shops usually do not care about this variability. The shop's goal is to serve products with good reviews, so that the products get bought because of their recommendations. But instead of making you happy as an individual in a certain group of people (parents, for example), shops only target the average person.

An Explicit and Implicit Solution

To address this room for improvement, let us start with a simple solution. The reviews are grouped by audience pools: parents, students and so forth. Every user who writes a review can choose their audience pool on a voluntary basis. On the other end, people reading the reviews can decide which audience pool they belong to.
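A minimal sketch of the explicit approach, assuming each review carries a self-chosen pool label and a star rating. The record layout, pool names and ratings are hypothetical; no real shop API is implied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical review records: (self-chosen audience pool, star rating).
reviews = [
    ("students", 5),
    ("students", 4),
    ("parents", 2),
    ("parents", 3),
    ("seniors", 4),
]


def ratings_by_pool(reviews):
    """Average the star ratings separately for each audience pool."""
    pools = defaultdict(list)
    for pool, stars in reviews:
        pools[pool].append(stars)
    return {pool: mean(stars) for pool, stars in pools.items()}


overall = mean(stars for _, stars in reviews)
per_pool = ratings_by_pool(reviews)

print(overall)               # 3.6
print(per_pool["students"])  # 4.5
print(per_pool["parents"])   # 2.5
```

The overall average of 3.6 hides that students in this toy data are far happier with the product than parents, which is exactly the information a reader from either pool would want.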

A more integrated, implicit approach would be to classify each user based on their shopping behaviour and then use this information to label any review the user writes. Analogously, people reading the reviews are also classified by their shopping behaviour, and the system only lets them see reviews from their audience pool.
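The implicit approach could be sketched as follows. This is a deliberately naive stand-in for a real classifier: a hand-written, hypothetical lookup table maps product categories to pools, and a user is assigned the pool that most of their purchases hint at. A production system would presumably use a learned model instead.

```python
from collections import Counter

# Hypothetical mapping from product category to the pool it hints at.
CATEGORY_HINTS = {
    "textbooks": "students",
    "instant noodles": "students",
    "diapers": "parents",
    "toys": "parents",
    "reading glasses": "seniors",
}


def infer_pool(purchase_history, default="unknown"):
    """Pick the pool that most purchases hint at (simple majority vote)."""
    votes = Counter(
        CATEGORY_HINTS[category]
        for category in purchase_history
        if category in CATEGORY_HINTS
    )
    if not votes:
        return default
    return votes.most_common(1)[0][0]


def visible_reviews(reviews, reader_history):
    """Show a reader only the reviews written by members of their own pool."""
    reader_pool = infer_pool(reader_history)
    return [r for r in reviews if infer_pool(r["author_history"]) == reader_pool]


reviews = [
    {"text": "Survived a semester in my backpack.", "author_history": ["textbooks"]},
    {"text": "Too many small parts for toddlers.", "author_history": ["diapers", "toys"]},
]

shown = visible_reviews(reviews, reader_history=["toys", "diapers", "textbooks"])
print([r["text"] for r in shown])  # ['Too many small parts for toddlers.']
```

Note that the reader never sees which pool they were put into, nor the reviews that were filtered out, which is the transparency cost of this variant.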

While the first, more explicit solution is more transparent and gives the user more freedom, the latter lets the system decide for the user, which further shapes their filter bubble and supports nudging effects.


Giving and receiving recommendations is a good thing, but it is important to know who recommends what, and on which basis. This article names well-known problems such as sample sizes and sock puppetry, as well as the problem of variability among users. Whatever your opinion on which solution is best, naming the hidden problem and being aware of it is a step in the right direction.
