
I'm hoping someone can give me some advice around how to move into the field of data science.

My background is in physics, in which I achieved a first-class master's degree in 2006, but since graduating I have pursued a career in corporate strategy consulting. I decided on strategy because it is about looking beneath what is happening and analyzing trends to help guide decisions.

However, this career path has never fully satisfied my desire to manipulate data, identify patterns and make decisions based on them. Data science, on the other hand, is about doing just this, and I've started exploring options to move into the field. I recognize that I will have to develop new skills to do so, and I am currently assessing academic courses that would help.

In terms of understanding statistical methods, my degree gave me a good grounding in this and my work within consulting has helped develop my skills in bringing data to life. The big gap in my knowledge though is around programming.

Are there any courses available in data science that are widely recognized and could be completed part-time? Or, if there are none, could people suggest the programming languages that I should focus on and any associated qualifications?

Answers to the above, as well as any other advice on how to get into this field, would be gratefully received and much appreciated.



Replies to This Discussion

I'd guess the easiest language for data manipulation is R. It should not be difficult for a person with your qualifications.

I think the three languages you're most likely to encounter in the field are R, Matlab, and Python. I'd recommend learning Python.

R is good for classical stats, and has a number of nice libraries, but its syntax and usage will not give you a good intro to other programming languages - it's kind of its own thing. The biggest issues with R have to do with efficiency. By default, it's got very slow I/O, gobbles RAM like you wouldn't believe, and often doesn't make the most efficient use of your CPU. If you're working with big data sets, you'll run into R's bottlenecks pretty quickly. I've been told there are ways to mitigate some of these issues, but is the hoop-jumping really worth it?

Matlab tends to be pretty prevalent in academia. I'm not a big fan of this language, in part because of its syntax, and in part because it's proprietary, as well as a few other reasons.

Python is the one you'll probably most often encounter in professional environments. It's easy to learn and use, and once you've got a good grip on it, you can easily transition to many other programming languages. Some programmers don't like that whitespace is significant in Python, but it actually helps you produce extremely readable code. Python implementations tend to be very efficient - you sometimes get performance improvements between one and two orders of magnitude using Python over R, for example.

For data science purposes, check out Python's scikit-learn package (a bunch of machine learning/stats stuff), matplotlib (visualization), and pandas (it gives you dataframes, which are perhaps the best feature built into R).
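To give a feel for what dataframes look like in practice, here is a minimal sketch using pandas only. The table contents and the column names (`region`, `units`, `price`, `revenue`) are invented purely for illustration; nothing here comes from a real data set.

```python
import pandas as pd

# Hypothetical toy data, made up for this example.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north"],
        "units": [10, 4, 7, 9, 3],
        "price": [2.5, 3.0, 2.5, 3.0, 2.0],
    }
)

# Column arithmetic is vectorised, so no explicit loop is needed.
sales["revenue"] = sales["units"] * sales["price"]

# Split-apply-combine: total revenue for each region.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(revenue_by_region)
```

Anyone who has used R's data frames should find the group-then-aggregate pattern familiar.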

You'll also want to look up NoSQL databases, MapReduce, and Hadoop.
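As a rough illustration of the idea behind MapReduce, here is a single-machine word count in plain Python. This is only a sketch of the programming pattern, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


# Hypothetical input: each string stands in for one document.
documents = ["big data is big", "data science is fun"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines; here they run in one process just to show the shape of the computation.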

To get some experience, participate in kaggle contests.

I thought R was more popular for data processing than Python, and that it has more libraries for this purpose?

Personally, I prefer Python because it is multi-purpose.

Thanks for the replies both, much appreciated. If you teach yourself these languages, do you know how acceptable that is to potential employers? Or are accredited qualifications required?

It's a given that you should know at least one language (preferably more), and the most commonly used language seems to be Python. However, keep in mind that it's useful to be language-agnostic unless the position ties you to a particular language (e.g. if you're writing Android apps, you'll want to know Java), so to focus on a language, rather than on programming as a skill, is to miss the point. When you're sufficiently experienced in programming, you should be able to pick up a new language to a reasonable level of competence very quickly (i.e. in less than a day), provided it's not a weird or drastic change like going from Python to Lisp or Haskell.

I'd guess that accreditations in this field, short of a university degree, are generally a sham, and I don't think most HR departments will be overly impressed by them. If you've got a Master's in physics, I'm actually very surprised you're not that experienced in programming. In the US, I'd assume most HR departments will take it as a given that you're a good enough programmer if you're applying for a data science position, especially if you've got a degree in physics, but they will probably ask you about your programming skills at a preliminary interview, and you may be asked to sit down and show your skills at some stage of the interviewing process.

But again, programming is just a foundational tool. Your analytical and quantitative skills are what matter. If you're a carpenter, it's assumed that you're sufficiently proficient with a saw, but they're not explicitly concerned with your advanced sawing skills.

I studied Experimental Physics, which focused on the analytical skills and applied theory rather than programming. There was a separate 'Computational Physics' course that concentrated more on things such as Monte Carlo analysis.

I get what you mean about it being more of a skill. So what I should do is become comfortable with one language and then transfer those skills to another programming language, so that I keep developing my understanding of programming in general rather than of just one language?

I'd say starting off by becoming solid in one language and building from there is a good approach. For that first language, I'd really recommend Python, and the Python libraries/modules I mentioned in my first post come up quite often in data science/analysis.

You should learn the Python/scipy/pandas programming stack, as well as R. You could also check out the following Coursera courses (they're free):

https://www.coursera.org/course/compdata

https://www.coursera.org/course/dataanalysis

Thanks Rob, those courses look good and I've signed up.

You are welcome. Good luck.


Hi all,

As a secondary question to this thread, can anyone recommend good places to find Data Scientist jobs or recommend particular companies? I'd love to work for a company that operates in a range of fields such as IT, Finance, Energy and Telecommunications, to develop domain experience. At present, I check LinkedIn regularly and I contact individuals in the field.

In addition, to second the views of people above: R, Python, SQL appear to be sought after, and Coursera comes recommended. I would also say that a deep understanding of the mathematical basis of the models has helped me interpret results.

Hi Carl,

Here is my roadmap for the best ROI:

1- Follow the instructions in the book “Visualize This”, a good starting point

http://www.amazon.com/Visualize-This-FlowingData-Visualization-Stat...

2- Free online courses by SAS

"SAS OnDemand for Academics provides a no-cost online delivery model to professors for teaching and students for learning data management and analytics. By connecting to a SAS server in the cloud, users access the analytical power of SAS software through a user-friendly, point-and-click interface."

http://www.sas.com/govedu/edu/programs/od_academics.html

3- Learn R programming language

4- Download the public version of Tableau and start experimenting

5- Take a look at Alteryx. I believe they recently made their excellent analytics tool available to the public for free

http://www.alteryx.com/

Good luck!


© 2019 Data Science Central ®
