In the last post, I discussed the steps to bring a data science product to an internal POC with an eye toward deploying a system in production. Those steps are all designed to drive consensus among the various leaders in the organization about the specific expectations they can have of a data science product. And what's the best way to get consensus? I'm sorry to say that no approach I've tried has ever worked better than a meeting, and a long one at that. But wait! It's not just a long meeting for the sake of having a long meeting!
I know that most meetings are the bane of corporate existence. The problem has gotten so bad that meetings have become group therapy sessions in many organizations. Whatever communication method you choose to share your team's results should avoid that trap as much as possible*.
The first step towards a successful data science meeting is to scrap PowerPoint altogether. PowerPoint is not the right tool for presenting data, especially data that contains a great deal of nuance and depth. Just go back to some presentations from six months ago, perhaps even presentations that you yourself gave, and try to figure out just what was said. If you read the deck verbatim, you gave a very boring presentation. If you didn't read the deck verbatim, you cannot hand off the deck to someone new to the organization to help them understand what work was done and why; you will have to be on hand to guide them through your thought process. That approach creates a tremendous burden to onboard new team members and runs the risk of giving different presentations to different audiences.
Another reason to ditch PowerPoint in this context: someone will inevitably interrupt your presentation with a question, generally around slide 2 or 3, looking for an answer you've likely given on slide 6. If you've prepared your presentation well and timed how long you will need to get through everything, congratulations, your flow has just been interrupted and you may never get to the end. I won't say I'm a perfect listener; as an invested participant in the meeting, I find myself wanting to interrupt all the time to get additional clarity on some point that may or may not be germane to the flow of the presentation.
So, without PowerPoint, what medium can you use to present data to the organization? I submit that a written report, like many of us used to make in high school, is the right way to go. It should contain sections for an introduction, materials and methods, results, and conclusions, and I'll go into what each section looks like in more detail shortly. I'm not alone in adopting this approach; Amazon famously uses written reports for their planning sessions. Ultimately, these data science report presentations become planning sessions, albeit on a different timeline and for a different purpose than what I'm told Amazon does internally.
This report performs several vital functions:
It will help you and your team to organize your thoughts. Unlike a PowerPoint presentation, a written report requires complete thoughts, one after another, building toward a set of conclusions clearly and methodically. If you read your report aloud and realize you don't know how you reached a particular conclusion, it is time for a rewrite.
It will create a touchstone for further conversations with people who aren't in the meeting. I often find myself talking with people who just want to know what the team is up to; I happily hand them a bunch of reports and tell them I'd love to schedule a follow-up conversation to cover any questions or concerns they may have. If I have a follow-up conversation, that conversation tends to be far more productive since the other person will have as much context as anyone else.
It will provide foundational documents for patent conversations.
It will be training material for onboarding new team members.
It will provide as detailed and well-documented a record as anyone could expect of how certain decisions came to be. I've been on too many calls where an executive wonders aloud how we "got into this mess" not to want some historical record on hand.
It can serve as the foundation for any external publications. You and your team members may be very interested in publishing your work in any number of ways (conference? blog? presentation?), and having source material on hand turns the task into adapting already-written material rather than starting from scratch.
If anyone on your team does not speak English as their primary language, they may lose the flow of conversation or not understand some points made in a presentation. Writing everything down gives them time to parse the report's contents at their own speed, especially if you send the report out the day before so they can be as prepared as everyone else in the meeting.
But what are the drawbacks?
Time. These reports take time to write. I typically set aside a week to pull everything together, including graphs for the results section, DAGs/code for the methods section, and the writing itself. Some tools can help, especially MLOps tools that capture model training metrics (see the sketch after this list), but in the end, everything needs to be compiled into a single document.
Not everyone's a writer. Most data scientists were trained in math, statistics, computer science, and engineering, and those disciplines tend to be light on writing. If you have team members who've gone to graduate school, chances are they have some experience with this kind of formal writing, but that still does not mean they are good at it. The problem can be ameliorated somewhat with practice and (now) with some generative AI tools.
Resistance. People who know that they don't know how to write well can be resistant to writing, and those who don't want to read can be resistant to reading. I've overcome this hurdle by asking people to trust me for one report cycle; the results are usually so overwhelmingly positive that I don't have to keep begging. I do, however, have to set aside quite a bit of time for editing.
English as a first language (or whatever language you write in as a first language). If your team works primarily in Language A, but the person who did the work mainly speaks Language B, they may not be comfortable writing and communicating deeply technical concepts in Language A. I argue that this discomfort can only be addressed by working directly to increase the contributor's comfort in Language A. To put it another way, practice makes perfect, and this point could be considered a benefit rather than a drawback from the standpoint of developing the skills of the people on your team.
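To make the point about tooling concrete, here is a minimal sketch of what capturing training metrics with an experiment tracker might look like, using MLflow as one example; the experiment name, parameters, and metric are hypothetical placeholders, not a recommendation of any particular stack.

```python
# Minimal sketch, assuming MLflow as the tracker; every name and number below
# is a hypothetical placeholder.
import mlflow

mlflow.set_experiment("fraud-rate-poc")

with mlflow.start_run(run_name="gbm-baseline"):
    # Log the knobs you'll want to cite in Materials and Methods.
    mlflow.log_param("train_test_split", "80/20")
    mlflow.log_param("algorithm", "gradient_boosting")
    mlflow.log_param("n_estimators", 200)

    # ... train and evaluate the model here ...

    # Log the numbers you'll want to cite in Results.
    mlflow.log_metric("precision_at_90pct_recall", 0.75)

# When compiling the report, pull every run for the experiment into one table
# instead of hunting through notebooks for the numbers.
runs = mlflow.search_runs(experiment_names=["fraud-rate-poc"])
print(runs[["run_id", "params.algorithm", "metrics.precision_at_90pct_recall"]])
```

Even a lightweight habit like this turns the results section into a matter of formatting a table rather than reconstructing numbers from memory.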
Assuming you can get some buy-in (even if it's only for one trial meeting), you'll want to know what the report looks like and how the meeting can go.
Report Outline
The basic report outline looks like this:
Goal: a one-sentence summary describing the overall goal of the project. Goals are very broad statements like "lower the fraud rate" or "increase search result relevance," and they are defined by Product.
Hypothesis: a one-sentence summary of the hypothesis to be tested. Hypotheses are of the form "Technology A will be better than Technology B at achieving the goal using Metric M," or "Technology C can achieve a goal when augmented with Features D, E, and F."
Results: a one-sentence summary of the results of the experiment. Putting the results at the beginning helps to set expectations for both the reader who will read the whole document and the reader who just wants the summary. Sentences describing the results are of the form "Technology A performed worse than Technology B, with Metric M showing a decrease of Y%," or "Technology C, when augmented only by Features E and F, was able to improve Metric M by X%, while Feature D provided no additional benefit."
Introduction: This section should provide the business and product context for the experiment, including any information uncovered by previous experiments and subtle nuances brought up by SMEs through interviews or conversations around the project. As a side benefit, I've found that enlisting the Product Manager's help to write this section helps frame the discussion around business and product requirements rather than any neat tech that, while probably interesting to the data science and engineering teams, is ultimately a distraction from the matter at hand.
Materials and Methods: This section should include any and all relevant technical detail. Training/testing splits, data set size, DAGs, algorithm selection, relevant parameters, and so forth are all described in this section (a hypothetical sketch of this level of detail follows the outline). For particularly hairy datasets, I've found that including the SQL used to source the data leads to productive conversations around whether that query is correct and can sometimes result in data engineering shifting their priorities to make cleaner pipelines to feed you the data you need.
Results: This section should include graphs, tables, and the like to depict the work you've done. I find it particularly helpful to build a story with your results: this experiment led to that one, that result prompted the next test, and so on, all the way to the conclusion you reached.
Conclusions: This section generally contains what you learned during your experiment and summarized points of discussion. If any meeting participants expressed particular reservations or some other sentiment, call it out here to explain how their point was-- or was not-- addressed. I've found that creating a bullet point list (but with complete sentences! We're still avoiding the PowerPoint paradigm) helps facilitate discussion, and I also work with Product to order the list based on their perceptions of how the conversation should flow.
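For Materials and Methods, the bar I aim for is that someone else could rerun the work from the report alone. The sketch below shows the kind of detail that makes that possible: the sourcing query reproduced verbatim, the split, and the model parameters. Every table name, feature, and number here is invented for illustration; it is not drawn from a real project.

```python
# Hypothetical illustration of the detail worth pinning down in Materials and
# Methods; all names and numbers are invented.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# The exact query that sourced the data belongs in the report verbatim, so
# SMEs and data engineering can challenge it; shown here only as a string.
SOURCE_QUERY = """
SELECT transaction_id, amount, merchant_category, is_fraud
FROM analytics.transactions
WHERE transaction_date >= '2023-01-01'
"""

# Stand-in data so the sketch runs; in practice it comes from SOURCE_QUERY.
X, y = make_classification(n_samples=10_000, n_features=20, weights=[0.97])

# Record the split and every model parameter exactly as the report states them.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")
```

Whether the code itself goes into the report or just the query and a table of parameters is a matter of taste; the point is that nothing a reviewer would need in order to reproduce the experiment is left implicit.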
Meeting Agenda
The meeting agenda looks like this:
The day before the meeting, send out the report.
Minutes 0-15: everyone reads the report. If you're on a conference call, I generally ask participants to put some emoji in the chat to indicate when they're done. If I can somehow wrangle everyone physically together in a room, I'll wait until I see everyone's put the report down.
Minutes 15-75: discuss the contents of the report. Everyone will have different takes on what is presented, and every time, I get a wide array of questions I did not even think to ask. You'll find a Subject Matter Expert (SME) who will tell you that you got some fundamental principle wrong in a subtle but essential way, you'll get an engineer asking why you didn't examine some use case you didn't know about, or you'll have a businessperson ask why you chose one evaluation method over another. It is crucial to take everyone's questions at face value and assume that they're approaching the conversation in good faith; if you think you've already answered many of these questions, but those questions keep getting asked, then it's likely that you didn't answer them sufficiently in your writing.
Minutes 75-120: assessing what to do next. Sometimes this phase takes 15 minutes, sometimes longer. The Product Manager (aka Product with a capital P) will weigh in as to whether or not the presented report meets the needs of the product (lowercase p). If you've said you can find the needle in the haystack 75% of the time, but they need it at 90%, then there's no need to go into further conversations about moving the model into production. In those situations, I try to guide the discussion to brainstorm how to get that last 15% of performance. If Product is satisfied with the model's performance, then there will be a conversation with Engineering, Product, and Project about how to move the model into production, and with the Business to help them understand when they can expect an ROI (Return On Investment).
And there you have it-- clarity of communication achieved through creating concrete documentation and a single reference point full of complete thoughts from which further conversation can spring. The discussions in these meetings should hopefully align the teams behind a common purpose and set expectations around what can and cannot be delivered, by whom, and by when.
How often should you write this report? Once per sprint: the product of a data science sprint is the report, and the sprint demo is the meeting. Once you get the hang of them, you can start producing them pretty efficiently, and most teams seem to get there by about four or five reports.
* If you want to read more about making meetings less terrible, I highly recommend starting with Death By Meeting by Patrick Lencioni. Many of his books read like Just So Stories to me, but even so, it's a great place to start to fix the culture of having meetings just for the sake of having meetings.