Data Science in the Trenches

If you are working as a data architect or a technical lead of a data team, you are in a bit of a thankless position at the moment. You could be working at, or even founding, one of the many data platform startups right now. Or you could work for one of the many enterprise consultancies that provide “big data solutions”. Both would mean profiting directly from your acquired technical skills. Instead, you are working in a company that actually needs the data you provide but doesn’t care how you get it. There is the old business metaphor of selling shovels to gold diggers instead of digging for gold yourself. I think a closer metaphor is that the other guys are logistics and you are fighting with everybody in the trenches.

The particular trench for me is free-to-play mobile gaming, which is closer to being a figurative battlefield than, say, web or B2B. You either get big or you die. No meeting goes by without people discussing performance metrics, mostly retention and ARPDAU (average revenue per daily active user). That is because the business boils down to a mathematical formula: if you have good retention and good revenue per user and your acquisition costs are low, you make a profit. If any of those is faltering, even just for a couple of days, you don’t. Fortunes can change very quickly. Where metrics are this important, having people who can provide them accurately is key. Hence front-line data science.
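To make that formula a little more concrete, here is a minimal sketch. It assumes a common simplification that is mine, not something stated above: the value of one install is ARPDAU multiplied by the sum of the daily retention rates over a fixed horizon, and profit is that value minus the cost per install. The function names and all numbers are hypothetical and only there for illustration.

```python
def lifetime_value(retention_by_day, arpdau):
    """Expected revenue from one install over the days covered by retention_by_day.

    retention_by_day[d] is the fraction of installed users still active on day d
    (day 0 is the install day, so it is 1.0). Assumes ARPDAU is constant over time.
    """
    return arpdau * sum(retention_by_day)


def profit_per_install(retention_by_day, arpdau, cost_per_install):
    """Lifetime value of one install minus what it cost to acquire that user."""
    return lifetime_value(retention_by_day, arpdau) - cost_per_install


# Hypothetical one-week retention curve and prices, for illustration only.
retention = [1.0, 0.4, 0.3, 0.25, 0.2, 0.18, 0.15]

healthy = profit_per_install(retention, arpdau=0.5, cost_per_install=1.0)

# Same game, but retention on every day drops to 70% of its previous level.
weak_retention = [r * 0.7 for r in retention]
weak = profit_per_install(weak_retention, arpdau=0.5, cost_per_install=1.0)

print(f"healthy retention: {healthy:+.3f} per install")
print(f"weak retention:    {weak:+.3f} per install")
```

With these made-up numbers the healthy curve earns a small profit per install and the weakened one loses money, which is the "fortunes can change quickly" effect in miniature.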

The challenges you face in the trenches are of a different nature. Real-time is very important, as everybody wants to see the impact of, say, an Apple feature right away. At the same time, product managers and game designers want to crunch weeks of data to optimise, say, level difficulty. A Spark Streaming query bugs out late on a Saturday night and your inbox overflows with “What’s going on?” emails. A weekly Hadoop aggregation runs late and a game release might be delayed because an A/B test could not be verified. In the trenches, the meaning within the data is much more important than the technology you throw at it. But the work is also very limited from a data science point of view: you do a bit of significance testing here and some revenue predictions there, but most of the statistical methods are rather simple. Not what everybody was promised when taking up data science.
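To give a sense of how simple that significance testing can be, here is a sketch of a two-proportion z-test for an A/B test on retention, using only the Python standard library. The choice of test, the function name and the user counts are my assumptions for illustration; whether this is the right method depends on the metric and on how the test was set up.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for the difference between two proportions.

    Returns (z, p_value). Uses the pooled proportion for the standard error,
    which is the usual choice under the null hypothesis of no difference.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical day-1 retention counts: users who came back / users in the group.
z, p_value = two_proportion_z_test(
    success_a=4000, total_a=10000,  # control group
    success_b=4200, total_b=10000,  # variant group
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The statistics fit in a dozen lines. The hard part in practice is trusting that the counts going into it are correct and arrived on time.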

What does one gain from being on the front lines? The data actually flows into the product every day, and what you find during data mining is important to the survival of the game or app. Features live or die by your significance test, for which you hopefully picked the correct statistical method. You could be making tools for data scientists or crunching large data sets for reports that one manager might read, maybe. But that would be less chaotic, less rushed and less fun than throwing out some data and actually watching your game go up the charts. Welcome to the trenches.