Data science is an emerging field built from a number of well-established ones, so the skills that make for a successful data scientist come from a variety of disciplines, including mathematics, statistics & machine learning, and computer science. Navigating a pathway through these fields can be challenging, since no single resource can (by necessity) provide guidance on all of the tools of a modern data scientist. Many resources have been personally helpful to me in developing skills in this area, so I've compiled a list of computational tools, a collection of (mostly free) resources and educational references, and a checklist of data science concepts. They are meant for anyone who wants structure in a personal data science curriculum, or who wants to brush up on topics and expand their knowledge.

**Data Science Concept Checklist**. Checklist of core and advanced concepts in data science across the three primary disciplines (mathematics, statistics & machine learning, and computer science) organized by topical areas. This can act as a roadmap through which concepts to explore or as a tool for evaluating opportunities for expanding your existing skillset.

**Tools**. Descriptions and links to powerful computational tools and useful packages.

**Resources and References**. A curated collection of educational resources on a wide variety of core data science concepts and some special topics.

There is a vast array of tools that can be used for solving problems in data science. Some are programming languages or environments; others are packages for solving specific problems or for communicating and visualizing your results.

Almost any programming language can be used to solve computational problems, although a few stand out for their built-in packages and user support communities. Most notably, R and Python excel in these respects and are also freely available. MATLAB may have the most detailed documentation of any of the options available, but it is commercial software.

**R**. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and OSX. With the RStudio integrated development environment (IDE), the language can be powerfully wielded for rapid analyses. Additionally, R Shiny can turn R analyses into interactive web applications.

**Python**. Python is a powerful, general-purpose, dynamic programming language that has extensive packages for scientific computation (NumPy, SciPy, Pandas), advanced plotting (matplotlib), and machine learning (scikit-learn). For this sort of scientific computing, using an IDE such as Rodeo or Spyder may speed up the development of analyses.

**MATLAB**. A numerical computing environment and programming language with a wide set of standard toolboxes including those for statistics and machine learning.

**Julia**. A newer programming language designed to meet the needs of mathematical computing.

Almost any data science project worth doing requires many revisions and a good deal of collaboration. These tools provide comprehensive version control, typically paired with a web-based repository. Github is the most popular hosting service, but others offer similar web-based repository services.

**Git**. Open source distributed version control system. Git is often used with a web-based Git repository hosting service such as Github.

**Apache Subversion (SVN)**. A free software versioning and revision control system, based on a centralized concurrent versioning model.

**Jupyter Notebook**. This web application allows you to create and share documents that contain live code, equations, visualizations and explanatory text.

**Github Pages / Github.io**. Github Pages allows you to create a web page from a Github repository and convert plain text into a formatted web document.

**D3.js**. D3 (or Data-Driven Documents) is an open-source JavaScript library for producing dynamic, interactive data visualizations in web browsers. Since it is based in JavaScript, visualizations are entirely customizable, but they do require significant skill to use effectively.

**Tableau**. Proprietary desktop and web-based visualization tools that include many data visualization techniques for the rapid development of professional visualizations.

**MySQL**. An open source relational database management system using SQL.
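To give a feel for what querying a relational database with SQL looks like, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a MySQL server (an assumption made so the example runs without any setup); the `measurements` table and its values are invented for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# A relational table has a fixed set of typed columns (hypothetical schema).
conn.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
)
conn.executemany(
    "INSERT INTO measurements (city, temp_c) VALUES (?, ?)",
    [("Durham", 21.0), ("Durham", 25.0), ("Boston", 15.0)],
)

# Plain SQL aggregation: average temperature per city.
rows = conn.execute(
    "SELECT city, AVG(temp_c) FROM measurements GROUP BY city ORDER BY city"
).fetchall()
conn.close()

print(rows)
```

The `SELECT ... GROUP BY` statement itself is standard SQL, so the same query should carry over to MySQL largely unchanged; only the connection code would differ.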

**Apache Hadoop**. An open source framework for distributed file storage and processing (often associated with “big data”) that uses the Hadoop Distributed File System (HDFS) for storage and the MapReduce algorithm for data processing.
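The MapReduce idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce phases on a single machine, not Hadoop's API; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input records; in Hadoop these would be blocks of files in HDFS.
documents = ["big data big ideas", "data science"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())

print(word_counts)
```

In a real cluster the map and reduce calls run in parallel on different machines, which is what lets the same pattern scale to data that does not fit on one computer.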

**MongoDB**. A document-oriented NoSQL database (non-relational database, which does not rely on tables for storing data) capable of handling a wider variety of data types than traditional SQL relational databases.
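As a rough illustration of what "document-oriented" means, the sketch below models a collection as a list of Python dictionaries. It does not use MongoDB or its drivers; the `people` collection and the `find` helper are hypothetical and exist only to show that documents in one collection can be nested and need not share the same fields.

```python
import json

# Two documents in one hypothetical collection; their fields differ.
people = [
    {"name": "Ada", "skills": ["statistics", "python"], "address": {"city": "London"}},
    {"name": "Grace", "languages": {"primary": "COBOL"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


matches = find(people, name="Ada")
print(json.dumps(matches[0], indent=2))
```

A relational table would need either extra columns full of empty values or several joined tables to hold the same two records.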

Here are some (primarily free) resources for data science. Some of these are personal favorites or recommendations, and many come from the Github awesome-machine-learning repository's list of data science books.

Donoho, David. 2015. 50 Years of Data Science.

Nilsson, Nils. 2010. The Quest for Artificial Intelligence: A History of Ideas and Achievements. A history of machine learning and data science.

Stitch. 2016. The State of Data Science.

Swanstrom, Ryan. 2015. Data Science University Programs.

Corthell, Clare. 2015. Open Data Science Masters Curriculum.

Guichard, David. 2016. Community Calculus.

Hartman, Gregory. 2015. Calculus 1, 2, and 3. 3rd edition.

Marsden, Jerrold, and Alan Weinstein. 1985. Calculus I, II, and III. 2nd edition. New York: Springer.

Strang, Gilbert. 1991. Calculus. MIT Open Courseware.

Stewart, James. 2015. Calculus: Early Transcendentals. 8th edition. Boston, MA, USA: Brooks Cole.

Beezer, Robert Arnold. 2008. A First Course in Linear Algebra.

Hefferon, Jim. 2006. Linear Algebra.

Treil, Sergei. 2004. Linear Algebra Done Wrong.

Vandenberghe, L. 2007. Applied Numerical Computing. Lecture Notes.

Lay, David C. 2006. Linear Algebra and Its Applications. Pearson/Addison-Wesley.

Lebl, Jiří. 2014. Notes on Diffy Qs: Differential Equations for Engineering.

Trench, William. 2013. Elementary Differential Equations.

Ash, Robert B. 1970. Basic Probability Theory. John Wiley and Sons.

Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics. 3rd edition.

Downey, Allen B. 2014. Think Stats: Probability and Statistics for Programmers. O’Reilly Media, Inc.

Ross, Sheldon. 2014. A First Course in Probability.

Barber, David. 2012. Bayesian Reasoning and Machine Learning. Cambridge University Press.

Daumé III, Hal. 2015. A Course in Machine Learning.

Duda, Richard O., Peter E. Hart, and David G. Stork. 2012. Pattern Classification. John Wiley & Sons.

Yee, Stephanie, and Tony Chu. A Visual Introduction to Machine Learning. Data visualizations that guide the reader through core machine learning concepts.

Shalizi, Cosma. Advanced Data Analysis from an Elementary Point of View. A pre-publication pdf draft textbook made available by the author.

Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning.

Downey, Allen. 2013. Think Bayes: Bayesian Statistics Made Simple. O’Reilly Media, Inc.

Kriesel, David. 2007. A Brief Introduction to Neural Networks.

Nielsen, Michael. 2016. Neural Networks and Deep Learning. Free online book.

Smilkov, Daniel and Shan Carter. An Interactive Neural Network Playground. Interactive neural network simulator.

Deep Learning Tutorial. (Stanford)

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. An MIT Press book on deep learning (and basic machine learning).

Sutton, Richard S., and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press.

Severance, Charles. Python for Informatics. A free pdf book on a data-analysis-centered approach to Python coding.

Wickham, Hadley. Advanced R. An online textbook based on a popular print book on R.

Hamilton, Antonia. 2004. Matlab for Psychologists. A MATLAB beginner's pdf tutorial.

Mathworks MATLAB Statistics and Machine Learning Toolbox Tutorial.

Tufte, Edward R. 2001. The Visual Display of Quantitative Information. 2nd edition. Cheshire, Conn: Graphics Pr.

Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.

Cairo, Alberto. 2012. The Functional Art: An Introduction to Information Graphics and Visualization. New Riders.

Murray, Scott. 2013. Interactive Data Visualization for the Web. O’Reilly Media, Inc.

Maclean, Malcolm. 2013. D3 Tips and Tricks. Leanpub.

Skinner, Grant. RegExr. An online tool to learn, build, & test Regular Expressions.

Gertz, M. 2000. Oracle/SQL Tutorial.

MySQL Tutorial. 1997. MySQL 5.1 Reference Manual.