A couple of years ago I worked on the machine learning team of a calendar application, trying to make it smarter. We had many ideas, one of them being that once you indicated you wanted to meet with a group of people, the app would automatically suggest a time slot for the meeting.
We worked on it for several months. Because we wanted to do things like learn every user’s working hours, which could vary based on many factors, we couldn’t just use simple hand-coded rules. In the end, we implemented the feature with a combination of hand-coded rules (to avoid some bad edge cases) and machine learning; a sketch of what such a hybrid might look like follows below. We did lots of testing, both automated and manual, within our team.
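As a rough illustration (not our actual code; the names, the `WORKING_HOURS` table, and the `score_slot` heuristic are all hypothetical stand-ins), a hybrid like this can be as simple as hand-coded rules vetoing the clearly bad slots and a learned score ranking whatever remains:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Slot:
    start: datetime
    end: datetime

# Toy stand-in for what the ML side would learn per user;
# the real feature inferred working hours from behavior.
WORKING_HOURS = {"alice": (9, 17), "bob": (10, 18), "carol": (8, 16)}

def score_slot(slot: Slot, attendees: list[str]) -> float:
    """ML stand-in: the fraction of attendees for whom the slot
    falls inside their (learned) working hours."""
    if not attendees:
        return 0.0
    ok = sum(
        WORKING_HOURS.get(a, (9, 17))[0] <= slot.start.hour
        and slot.end.hour <= WORKING_HOURS.get(a, (9, 17))[1]
        for a in attendees
    )
    return ok / len(attendees)

def violates_rules(slot: Slot) -> bool:
    """Hand-coded veto for edge cases the model might get wrong."""
    return slot.start.weekday() >= 5  # e.g. never suggest weekends

def suggest(candidates: list[Slot], attendees: list[str], k: int = 3) -> list[Slot]:
    # Rules filter first; the learned score ranks what remains.
    allowed = [s for s in candidates if not violates_rules(s)]
    return sorted(allowed, key=lambda s: score_slot(s, attendees), reverse=True)[:k]
```

The appeal of this split is that the rules are easy to reason about and test, while the model handles the fuzzy patterns the rules can’t express.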
Once the UI was ready, we did some user testing: the new prototype was put in front of real users, unrelated to our team, who were recorded while they tried to use it and then asked questions about the product. When the reports came in, the whole team banged their heads against the desk: most users thought we were suggesting times when the meeting couldn’t take place!
What happened? If the invite included many people, or even just one very busy person, there would often be no slot that worked for everyone. So our algorithm would make three suggestions anyway, noting for each one a different person who might not be able to make the meeting.
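Here is a minimal sketch of that behavior, assuming a hypothetical representation of busy calendars (none of this is our production code): it ranks candidate slots by how few attendees they exclude and attaches, to each suggestion, the people who might not make it.

```python
from datetime import datetime

TimeSlot = tuple[datetime, datetime]
# Hypothetical input: each attendee's busy intervals.
BusyCalendars = dict[str, list[TimeSlot]]

def unavailable(slot: TimeSlot, person: str, busy: BusyCalendars) -> bool:
    """True if the slot overlaps one of the person's busy blocks."""
    start, end = slot
    return any(b_start < end and start < b_end
               for b_start, b_end in busy.get(person, []))

def suggest_with_caveats(candidates: list[TimeSlot], attendees: list[str],
                         busy: BusyCalendars, k: int = 3):
    """Rank slots by how few attendees they exclude, returning each
    suggestion together with the people who might not make it."""
    annotated = [
        (slot, [a for a in attendees if unavailable(slot, a, busy)])
        for slot in candidates
    ]
    annotated.sort(key=lambda pair: len(pair[1]))
    return annotated[:k]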
The algorithm was doing something reasonable; the problem was how its caveats came across. In our own testing it was obvious to us what was happening, so we didn’t consider it a big problem. But users who didn’t know the system found it confusing and kept going back to the classic grid to manually find a slot for the meeting.
Lesson: machine learning algorithms are never perfect, and every project needs to be prepared to deal with their mistakes.
How will your machine learning project handle failures? How will you explain the algorithm’s decisions to your end users? If you need help answering these questions, let’s talk.