A Data-Science Point of Entry

About a year ago, I had an interesting conversation with a young Naval Officer who was on his way out of the Navy with a freshly minted degree in Data Science. The question was: what next? It made sense for him to be asking me, as I’m a reasonably old guy who has made something of a career in Data. That conversation put my foot on a path that ultimately led to the establishment of RINGKNOCKER. True story.

So this morning I had a similar conversation. Slightly different scenario: the kid in this case was younger, just about to complete his undergrad studies with degrees in Economics & Data Science. But the question was much the same: how do I make the leap from academia to the real world?

I had some thoughts. They are data-centric thoughts, but then that’s where I live.

Here’s one: don’t count on finding a job right out of college as a capital-letter Data Scientist.

Why not?

Well, you might have noticed that there were a lot of young data scientists standing next to you at your graduation ceremony. It’s a popular field. But when you look out into the industry, what do you find? Not so many lucrative data science jobs are open to you.

There are plenty of data science jobs. Many of them are with startups that expect you to work for peanuts on the outside chance that the “data science” problem they want you to solve—yes, those were air quotes—has a data science solution they can sell to their theoretical customers. Up to you whether you want to risk your rent on that proposition, but play it safe and get a roommate.

Some data science jobs are lucrative. They are with well-established firms that have found customers willing to pay for what their data scientists create. The problem? It only takes one or two strong data scientists to conceive and develop an approach that an army of coders will spend the next three years working out in product releases. Those guys are playing in the Show, and there just aren’t that many of them. Are you such a rockstar rookie data scientist that you can reasonably expect to be recruited into the majors while the ink on your diploma is still wet? Maybe… but you’ll also have to get lucky. So maybe you should have a backup plan.

There are plenty of tough data science problems. Many of the hardest ones employ teams of people secreting rivers of flop-sweat trying everything they can think of to stay on top of them. Those people often make a ton of money, because not solving those problems day in and day out will usually leave their institutional employers dead in the water. And trillion-dollar asset managers don’t LIKE being left dead in the water.

So here’s the funny thing. Very often, the problems those people are (barely) solving are data-science problems. Except those people aren’t data scientists, and they don’t know their problems are data science problems!

It isn’t that the same firms don’t employ one or two major-league data scientists. But these are the kinds of problems that live in Microsoft Access databases and Microsoft Excel spreadsheets, and—as I am sure you were about to remind me—real capital-D Data Scientists never dirty their soft little hands with Microsoft Office products or, G-d forfend, Visual Basic for Applications (VBA). I mean seriously.

And the same institutions also employ a ton of coders who can write VBA, or at least could do so if they were ordered to. But none of those coders are Data Scientists, or even data scientists, so on their own they’re rarely in a position to help those desperate data managers.

This is not the sort of job you can apply for. Nobody’s hiring for a position they didn’t even realize might exist. But if you happen to be on hand doing something related, near where problems like this crop up, then you’re in a position to help. And I mean really help… when you can offer a solution that improves both throughput and precision by three orders of magnitude, you’ve just changed the nature of the game. Welcome to the Majors.

So… how to be on hand for that sort of thing?

I would look into Enterprise Data Management (EDM), or its cousin Master Data Management (MDM). This is the janitorial level of the Data Science ivory tower: it’s all about matching, cleansing, and conforming data, usually at industrial scales, in order to produce datasets that real Data Scientists can actually use without breaking out in hives. It’s the sort of thing that every large institution needs a lot of, and they employ small armies of mid-level coders to get it done.
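To make "matching, cleansing, and conforming" a little more concrete, here is a minimal Python sketch of the kind of thing I mean. Everything in it is an assumption made for illustration: the SUFFIXES table, the conform_name and match_records helpers, and the two tiny sample datasets are all hypothetical, and real EDM work usually runs inside a database or a dedicated matching tool rather than a script like this.

```python
import re

# Hypothetical table mapping legal-form variants to one conformed spelling.
SUFFIXES = {
    "incorporated": "inc",
    "corporation": "corp",
    "limited": "ltd",
    "company": "co",
}


def conform_name(raw: str) -> str:
    """Cleanse and conform a company name into a comparable key."""
    text = raw.lower()
    # Cleanse: replace anything that is not a letter, digit, or space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Conform: collapse whitespace and standardize legal-form suffixes.
    tokens = [SUFFIXES.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def match_records(left, right):
    """Pair up records from two sources that share a conformed name."""
    index = {}
    for rec in right:
        index.setdefault(conform_name(rec["name"]), []).append(rec)
    matches = []
    for rec in left:
        for other in index.get(conform_name(rec["name"]), []):
            matches.append((rec["id"], other["id"]))
    return matches


# Two made-up sources that spell the same company differently.
crm = [
    {"id": "C1", "name": "Acme Corporation"},
    {"id": "C2", "name": "Globex, Inc."},
]
ledger = [
    {"id": "L9", "name": "ACME Corp."},
    {"id": "L7", "name": "Initech Limited"},
]

print(match_records(crm, ledger))
```

Exact matching on a conformed key is the simplest possible approach. The hard problems start when the keys almost match, and that is where the data science comes in.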

If you have a strong background in the nuts and bolts of database query and administration, and if you are a team player and a systems thinker, then you can easily find an entry-level position in Data Management that will put you in the right place to hear about those hard problems and pay you enough to show your roommate the door. Within that context, your data science degree is just icing on the cake. You’ll be competitive right out of the gate.

How to increase your odds? Years ago I wrote an article on LinkedIn that aimed to provide a little clarity to both ends of the EDM interviewing experience. That article is still 100% valid, and if you’re headed in that direction then I would strongly suggest using it as a study guide.

Beyond that, you should be familiar with the basic tools of enterprise software development, including the Agile Framework and SDLC management tools like JIRA. In fact, JIRA is so core to the enterprise software development experience that I would strongly suggest signing up for a cloud account of your own and moving all your personal projects into it. They’ll be more productive, and you’ll gain some essential experience.

Wait… you have some personal projects, right? If this is a serious conversation, then the answer had better be YES. Not even going to bother to explain.

Scratch most any major-league Data Scientist, and underneath you’ll find a minor-league data scientist who happened to be on hand when an interesting problem popped up, and had access to the tools he needed to solve it profitably. Let’s break that down:

  • You need to be on hand. Get out there with the farm team and write some useful code. Once in a while, back-stop the team with some awesome data-science moves, but mostly just get in there and do your part. Show up often enough with your sleeves rolled up and your boots on, and opportunity will find you.
  • You need to have the tools. Python is what professional institutional coders use on weekends. Learn enough VBA to get an Excel spreadsheet to sit up on its hind legs and beg. Learn to encapsulate SQL query functionality into a library of CTEs. Write a Definition of Done for your personal projects and live by it.
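One way to read "a library of CTEs" is sketched below, using Python's built-in sqlite3 module so it runs anywhere. This is only my guess at a workable shape, not a standard pattern: the CTE_LIBRARY dictionary, the with_ctes and net_positions helpers, and the trades table are all hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical "library": each entry is a named, reusable CTE body.
CTE_LIBRARY = {
    "clean_trades": """
        SELECT trade_id, UPPER(TRIM(ticker)) AS ticker, quantity
        FROM trades
        WHERE quantity IS NOT NULL
    """,
    "position_by_ticker": """
        SELECT ticker, SUM(quantity) AS net_quantity
        FROM clean_trades
        GROUP BY ticker
    """,
}


def with_ctes(names, final_select):
    """Build one query from the named CTEs (in order) plus a final SELECT."""
    parts = [f"{name} AS ({CTE_LIBRARY[name]})" for name in names]
    return "WITH " + ",\n".join(parts) + "\n" + final_select


def net_positions(conn):
    """Return (ticker, net_quantity) rows computed through the CTE chain."""
    sql = with_ctes(
        ["clean_trades", "position_by_ticker"],
        "SELECT ticker, net_quantity FROM position_by_ticker ORDER BY ticker",
    )
    return conn.execute(sql).fetchall()


# Made-up sample data with messy tickers and one missing quantity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id INTEGER, ticker TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, " ibm", 100), (2, "IBM ", -40), (3, "msft", 25), (4, "msft", None)],
)

print(net_positions(conn))
```

The point of the design is that each cleansing or aggregation step gets a name and lives in exactly one place, so the next query you write can reuse clean_trades instead of re-deriving it.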

Data science is fun. But you can’t live on candy, and few working data scientists spend all day every day doing data science. The key to a robust start with tons of data science career opportunities is a pretty simple formula: Less science. More data.
