Sigmoid Function: A Mathematical Formula That Turned Out to Be Useful in Data Analytics Work!
Studying Maths for 3.5 years was not a mistake. One of the advantages of studying Mathematics in college is the logic and way of thinking it trains, which is very useful in my current job as a Data Analyst.
Long story short, I have applied many mathematical lessons (especially statistics) in my work. For example:
- Defining business metrics, and deciding whether to use the mean, median, or mode, or whether a business justification is sufficient,
- Writing queries in a sequential, structured way,
- Doing hypothesis testing,
- Running simple to advanced analyses to give insights and recommendations, and
- Many other things.
This time, I will briefly tell you about the sigmoid function. At first I was confused about what it was used for, but now I find it useful in my work.
The sigmoid function is defined by this formula:
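For reference, the standard (logistic) sigmoid is usually written as below. I am assuming this is the form meant here; other S-shaped functions are sometimes called sigmoids too.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}
```

It maps any real number to a value between 0 and 1, with \sigma(0) = 0.5.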
Its graph has this general S shape:
Who would have thought that we could use a sigmoid function for weighting!
Take, for example, the YouTube algorithm that decides whether a piece of content is trending or not. If we look in more detail, a video ranked high on the trending list does not always have higher numbers (total views, for example) than the video ranked below it, as in the image below.
We obviously don’t know exactly how the YouTube algorithm works (but you can read the explanation here). However, looking at it in more detail, it seems possible for newly uploaded content to trend if its number of views suddenly explodes in a short time.
Therefore, we can try to weight the number of views per content, while also paying attention to the time, or recency, of those views. Long story short, we can use data points like this:
From here, we want the number of views in the most recent hours to be the main factor that makes a content trend, while the overall number of views can still count as a factor. Therefore, we can use the inverse sigmoid function to assign weights, where the more recent the time, the higher the weight, and vice versa.
Using the inverse sigmoid function above, we can assign weights to each content as follows:
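A minimal sketch of such a weight function in Python. I am assuming that "inverse" here means a mirrored, decreasing sigmoid (not the logit, which is the mathematical inverse). The names `recency_weight`, `midpoint` and `steepness` are hypothetical, and their default values are only placeholders to be tuned.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def recency_weight(hours_ago: float, midpoint: float = 12.0,
                   steepness: float = 0.5) -> float:
    """Decreasing sigmoid: near 1 for recent hours, near 0 for old ones.

    midpoint  -- assumed age (in hours) at which the weight is exactly 0.5
    steepness -- assumed factor for how fast the weight drops around midpoint
    """
    return 1.0 - sigmoid(steepness * (hours_ago - midpoint))


# Print the weight for a few ages to see how it decays over time.
for hours in (0, 6, 12, 18, 24):
    print(hours, round(recency_weight(hours), 3))
```

With these placeholder values, views from the last few hours keep almost their full weight, views from 12 hours ago count for half, and views from a day ago count for almost nothing.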
From here we can get a real-time score for each content, which is updated dynamically as the content gets views.
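One way to sketch that score is to sum the hourly views, each multiplied by its recency weight. This is only an illustration under my own assumptions: the function names, the parameters, and the view counts below are all made up, and this is certainly not YouTube's real formula.

```python
import math


def recency_weight(hours_ago, midpoint=12.0, steepness=0.5):
    """Decreasing sigmoid weight (hypothetical parameters): recent hours weigh more."""
    return 1.0 / (1.0 + math.exp(steepness * (hours_ago - midpoint)))


def trending_score(views_by_hours_ago):
    """Sum of views per hour bucket, each weighted by how recent the bucket is."""
    return sum(views * recency_weight(hours_ago)
               for hours_ago, views in views_by_hours_ago.items())


# Made-up data: {hours ago: views received in that hour}.
old_video = {1: 100, 24: 50_000}      # many views overall, but a day ago
new_video = {1: 20_000, 2: 15_000}    # fewer views overall, all very recent

print("old video:", round(trending_score(old_video)))
print("new video:", round(trending_score(new_video)))
```

In this made-up example the new video scores higher even though the old one has more total views, which is the behaviour we wanted from the weighting. Recomputing the score whenever a new hour of views comes in keeps it up to date.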
- There are many ways to weight a number. We can use logarithmic functions, or even a business justification (e.g. the number of orders outside Jakarta is weighted 0.6, while everything else gets a smaller weight). We can also use a custom formula, depending on what kind of weight function we need.
- Weighting is simply a method that lets all historical data be considered when deciding (in this case) whether a YouTube content is trending or not. If a content has had no activity at all in the last few hours, it will probably stop trending.
- We need to run simulations, and even experiments, if we want to use weighting.
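The two alternatives from the first point above can be sketched like this. The 0.6 weight comes from the example in the text; the default weight of 0.3, the region names, and the function names are hypothetical and only there to make the sketch runnable.

```python
import math


def log_weighted(value):
    """Logarithmic weighting: dampens large numbers (log1p so that 0 stays 0)."""
    return math.log1p(value)


# Business-justification weights: a fixed weight per region.
REGION_WEIGHTS = {"outside_jakarta": 0.6}
DEFAULT_REGION_WEIGHT = 0.3  # hypothetical "smaller number" for everything else


def weighted_orders(orders_by_region):
    """Sum of orders, each multiplied by the weight of its region."""
    return sum(count * REGION_WEIGHTS.get(region, DEFAULT_REGION_WEIGHT)
               for region, count in orders_by_region.items())


print(log_weighted(1_000), log_weighted(1_000_000))
print(weighted_orders({"outside_jakarta": 10, "jakarta": 10}))
```

Whichever function is chosen, it is worth simulating it on historical data first to check that the resulting ranking matches what the business expects.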