by jnordwick on 9/8/2016, 6:02:22 PM
by mrharrison on 9/8/2016, 6:31:29 PM
We should rename this job position to Data Sanity Engineers.
I have been thrown these projects at work before, where I'm the frontend engineer and I need to make some cool D3 visualization, but lo and behold the data is shit, and I have to help the backend team make the data usable. It's a mind-numbing job that nobody wants, because it sounds like a one-month task to get a good REST API up and working, but it usually takes three months, because you have to go back and forth making sure the data is right, and there are always 10 tricky edge cases that you have to work some magic on. Not only that, but you need to have smart people cleaning the data, so that you don't make some big mistake down the line or end up with a super slow REST API and have to add another couple of weeks or a month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say that looks great but can we also add this, and it goes on and on. It's literally a mind-numbing job that almost nobody wants. I have found that products like Tableau are the best for this; you still have to clean the data, but it helps speed up the process.
Data cleaning is a super golden problem to solve.
by dmatthewson on 9/8/2016, 6:11:45 PM
From the article: "Data engineers are the janitors who keep your data clean and flowing."
Hm, I wonder why he's having problems hiring janitors.
by tom_b on 9/8/2016, 7:02:21 PM
Ignoring the breathless nature of the article, this is a buzzword label for a commodity skill set that pays a commodity salary in tech. It is also the commodity skill set that my employers have all paid me for.
For a long time there has been hype around new technology and labels for business intelligence, data warehousing, big data, and now data engineering/science. I'm not saying there are no roles in this space that return huge value to organizations, but these opportunities are much rarer than the buzz indicates.
I wonder if the perceived shortage is mainly hype as the shift to new cloud technologies makes many of the older ideas a little less useful - if you are plowing data into BigQuery, you probably aren't so worried about your star schema data model for reporting.
I would strongly advise people who look at these types of articles to look at the roles in question and ask "Is this role on the critical path to customers paying us?" My experience has been that the answer is often "No." This is bad. I have also seen businesses that do rely on smart data integration, and that can show they are selling dollar bills for ten cents, still have trouble getting customers on board with spending that ten cents. Business is weird.
by mattnewton on 9/8/2016, 5:30:29 PM
I'm trying to switch careers into "Data Engineering" now, as a full stack developer who is more interested in ML, and I've found almost no traction internally at my company or externally. It looks like I may just accept a full stack position at a good company that does a lot of data science for now, but thought I would ask: where are all these jobs?
by ef5a0b0628 on 9/8/2016, 9:20:39 PM
Every time something comes up on HN about a talent shortage in a field related to software engineering, it hurts. I have been unsuccessfully looking for a full-time position since my last startup (I was not a founder) folded six months ago. I have been on over 25 in-person interviews and gone through untold degrading whiteboard interviews, code tests, trick questions, and take-home projects; all have ended in rejection. This industry has a need to torture candidates because we are all considered to be liars by default. Much is said about combating impostor syndrome in ourselves but we are too eager to engender it in others.
It seems people in this industry refuse to understand that some people are not perfect. I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at whiteboarding answers to algorithm questions off the top of my head in a high-pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.
My personal identity has been shattered, as I thought my ~5-10 year history of success in the industry indicated I was in demand and talented. I saw posts like this and thought that if the worst happened I'd still be able to find a job. The idea that there is a talent shortage is a lie, or candidates like me wouldn't be treated as I have been. I'm not asking for a free job, or a handout. I have had a successful career so far and am capable of doing good work. But I'm not a specialist in Big Data Machine Learning Neural Networks.
I have struggled with bipolar disorder and suicidal ideation most of my life. I've dealt with the death of my beloved grandmother, and of my father, who was instrumental in my choosing to be an engineer, with only minor lapses in control. Nothing has caused me to consider taking my own life as much as the past 6 months. It seems there is no future for me in the only career I have any skill in and which is a huge part of my identity. And to constantly be told that there is such a shortage of engineers only salts the wound.
by rch on 9/8/2016, 6:05:14 PM
I've heard more than one CTO/Sr. Engineer refer to people in these roles as 'data grunts' or something similarly dismissive. Then they're mystified as to why solid engineers are so quick to move up or out, year after year.
by skynetv2 on 9/8/2016, 5:37:31 PM
Anything and everything is marketed as "data science" and "data engineering" these days because this is the buzzword of the day.
I've been dealing with large data since before "big data" was a word, but I don't call myself a "data scientist" or "data engineer". I am still a software engineer working on what benefits my organization.
"Serial Entrepreneur" is the same these days, claimed by anyone who had a lemonade stand as a kid.
by jboggan on 9/8/2016, 8:24:25 PM
It's digital Charlie Work [0], that's why.
I really enjoy that kind of work but it is difficult to articulate your business value in that environment. The best thing is working closely with a data scientist/front-end dev who can deliver products to the analysts and executives that need the data and make sure that you get the credit for enabling new streams of data. But most of the time you are putting out someone else's dumpster fire.
One advantage of data engineering: unlike front-end work, there are few non-technical people who will have an opinion on how you are doing things and burden you with bikeshedding.
[0] - http://www.avclub.com/tvclub/its-always-sunny-philadelphia-c...
by GeneralMayhem on 9/8/2016, 6:07:59 PM
There are 6600 jobs listed and 6500 individuals on LinkedIn with that particular title, and therefore there's a shortage? Seriously?
* How many aren't on LinkedIn?
* Since the whole article is about how the job title is poorly defined and growing in prevalence, why would you assume that people who don't already have such a job would use the term?
* The "growth" charts on the full study are just as bad - how much of that is just from renaming existing generic developer positions, since "data engineer" is clearly a relatively new term?
by binalpatel on 9/8/2016, 8:25:07 PM
The fact that the original, unmodified article referred to data engineers as "janitors" pretty much says it all.
It's very analogous to front-office and back-office work in Investment Banking. "Data Scientists" are the front office, with all the prestige, and "Data Engineers" are the back office, doing a lot of the heavy lifting without nearly as much recognition.
In my opinion there shouldn't be a delineation. You shouldn't be a data scientist if you can't gather, process, and clean up your own data.
by ThePhysicist on 9/8/2016, 6:35:56 PM
Data engineering sounds much better than "data plumbing", but in my experience the latter is a more accurate description of the work of a data engineer: building (and often unclogging) pipes that transport data from A to B, and putting in filters to clean it and extract the useful bits.
So why not change your LinkedIn job title to "data plumber", which is sure to get you some serious recruiter attention ;)
by untilHellbanned on 9/8/2016, 5:38:00 PM
Ahh the ol' write a post about a not well understood distinction and then proceed to not explain the distinction.
Looks like we need more English engineers too.
by cutler on 9/8/2016, 7:02:06 PM
I'm puzzled at the omission of Scala and Spark in this report.
by protomyth on 9/8/2016, 7:19:41 PM
I worked for about 10 years doing exactly what they want, but I ended up having to write a lot of the tools myself, which means I can't check the boxes for some of the tools you require, and that gets me punted by HR.
I'm starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.
by makmanalp on 9/8/2016, 7:05:10 PM
Quick sidenote, anyone know where the databases / distributed systems engineering jobs are at? E.g. if one wanted to not use these tools but also go help build these tools?
I can think of Facebook, Google, Microsoft, IBM (which locations and groups within these companies / where?). I can also think of Confluent, CitusDB, Databricks, etc.
by lifeisstillgood on 9/8/2016, 11:44:02 PM
Weirdly the problem is most hires have it backwards.
Before going out to the market and discovering what talent exists and consequently what salary it will take to get them to join (i.e., negotiating), most organisations decide on a salary range, usually reflecting the current internal structure, not the current external market.
The longer an organisation has existed the more out of whack with the market its internal set up is.
As such companies decide on their price point first, then go looking. Which is of course backwards.
by otto_ortega on 9/8/2016, 6:25:24 PM
Am I the only one who thinks there will be a ton of people changing their job title on LinkedIn to "Data Engineer" as a result of this article?
by realworldview on 9/8/2016, 5:54:09 PM
We surely need data mechanics.
by slantedview on 9/8/2016, 10:31:21 PM
These "shortage" stories always make me roll my eyes, because they're usually about money more than anything. And money is usually about cost of living more than anything.
If you choose to locate your company in one of the highest cost of living regions in the world, then you are complicit in the "shortage". Supply and demand - pay up. Or don't.
by moandcompany on 9/8/2016, 6:56:40 PM
I am a data engineer working on a machine learning team with models actively used as part of our product(s).
From my experiences working in various contexts (applied machine learning, analytics, policy research, academics, etc.), there are several factors that contribute to this shortage: (1) "data engineering" often requires a lot of breadth and knowledge, (2) "data engineering" is often (derisively and naively) referred to as the "janitorial work" of data science, (3) the spectrum of roles and requirements within the "data engineering" domain, in terms of job descriptions, can range from database systems administration, to ETL, to data warehousing, curation of data services / APIs, business intelligence, to the design/deployment/operation of pipelines and distributed data processing and storage systems (these aren't mutually exclusive, but often job descriptions fall into one of these stovepipes).
Some of my quick thoughts and anecdata:
Companies have made large investments in creating 'data science' teams, and many of those companies have trouble realizing value from those investments.
A part of this stems from investments and teams with no tangible vision of how that team will generate value. And there are several other contributing factors…
"Dirty work." People haven't learned how to, and more often don't want to do it. There's a vast number of tutorials and boot camps out there that teach newcomers how to "learn data science" with clean datasets -- this is ideal for learning those basics, but the real world usually does not have clean or ideal datasets -- the dataset may not even exist -- and there are a number of non-ideal constraints.
There are people that wish to call themselves “data scientists” that “don’t want to write code” and would “prefer to do the analysis and storytelling”
Engineering as the application of science with real world constraints: there are a number of factors that we take into account, often acquired through painful experience, that aren’t part of these tutorials, bootcamps, or academic environments.
Many “data scientists” I’ve met have a hard time adapting to and working with these constraints (e.g. we believe that the application of data science would solve/address __ problem, but: how do we know and show that it works and is useful? what are the dependencies, and costs of developing and applying that solution? is it a one-time solution, or is it going to be a recurring application? does the solution require people? who will use it? what are the assumptions or expectations of those operators and users? is it suitable? is it maintainable? is it sustainable? how long will it take? what are the risks involved and how do we manage them? is it re-usable, and can we amortize its costs over time? is it worth doing? This is part of a methodology that comes from experience, versus what is taught in data science)
Larger teams with more people/financial/political resources can specialize and take advantage of these divisions of labor, which helps recognize the process aspects of applying data science and address some of the above
Short story: if you view data engineering as "janitorial work" you're missing the big picture
Anyone else notice that the attributes of a 'unicorn' data scientist include the traits of a 'data engineer?'
by cheriot on 9/8/2016, 8:05:40 PM
It was only 20 years ago that companies hired a "web master" or a generalist to do everything. But pieces of those jobs became specialized. Now we need UX designers, UI programmers, general engineers, devops, data engineers, a data scientist, etc.
And how many companies are still interviewing with fizzbuzz?
by collyw on 9/8/2016, 8:19:04 PM
So I know SQL, Python, Django, Java (though it's been a while), JavaScript, Linux, some cloud computing and a bit of devops. Am I a data engineer? A software engineer with a lot of database background? What makes a data engineer different from a software engineer?
by LawrenceHecht on 9/9/2016, 2:37:15 PM
Just checked, the # of data engineers rose to 9,246 (42%) in the last six months. So, the shortage is at least being addressed by people changing their job titles on LinkedIn.
by wpiel on 9/9/2016, 2:31:53 PM
What I've learned from the comments: If something is valuable, there is a shortage of it.
I'm not even sure if I'm being sarcastic.
by edoceo on 9/8/2016, 11:02:39 PM
We hire only the best! We only hire the top 1% of candidates.
But only 1 out of 100 are qualified :(
Whenever I see these posts I immediately translate them in my head to "we're in the middle of a talent shortage at a price I am willing to pay."
I've worked with very large amounts of data and high performance computing for most of my career; I have mostly had finance-related jobs in the last decade or so. I have most of the skills you want, including some you don't know you want. However, when salary comes up, that is where we start to part ways. If you are really serious about a shortage, you should be really serious about making competitive offers, but I keep seeing the same $150k offers. That isn't a "shortage" kind of offer.