As a complement to my note on R as a data science language, this note lists ten other technologies that you might want to learn to use, or at least monitor, if you are interested in learning data science.
Git is a distributed version control system that is easy to use through platforms like GitHub or GitLab. It is the best tool that I know of to keep track of code and text over long periods of time (meaning: over 24 hours).
Learning the basics of Git will also force you to learn the fundamentals of command-line programming, which is a crucial skill in its own right and a prerequisite for using many other tools.
Learning to structure and style content using HTML and CSS will also make clear to you why you need to equip yourself with a code/plain text editor, and why you need to learn how to use regular expressions as soon as possible.
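To give a rough idea of why regular expressions come up so quickly once you start handling HTML, here is a minimal sketch in Python. The HTML fragment, the URLs and the variable names are all made up for illustration, and the pattern only assumes simple, well-formed links with double-quoted attributes.

```python
import re

# Hypothetical HTML fragment, invented for this example.
html = """
<ul class="links">
  <li><a href="https://example.org/a">First</a></li>
  <li><a href="https://example.org/b" class="ext">Second</a></li>
</ul>
"""

# Capture two groups per <a> element: the href value and the link text.
# Assumes double-quoted attributes and no nested tags inside the link.
link_pattern = re.compile(r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>')

# findall returns one (url, text) tuple per match.
links = link_pattern.findall(html)

for url, text in links:
    print(f"{text}: {url}")
```

A pattern like this is fine for quick clean-up jobs in a text editor or a short script, but it is not a substitute for a real HTML parser when the markup is messy.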
LaTeX is an extremely powerful typesetting language. Unlike Markdown and its variants, such as R Markdown, it requires considerable effort to use; however, it is unbeatable for typesetting high-quality reports, especially when mathematical notation is involved.
Given the complexity of its inner workings and the nice replacements offered by tools such as Markdown and Pandoc, LaTeX is perhaps the least important item to learn on this list.
Apache has released several tools for big data analysis, such as Hadoop, which can be interfaced with R via the RHadoop packages; Spark, which can be interfaced with R via sparklyr; and Arrow, which can be interfaced with R (or with Python) via
Learning to use Google Docs, Google Drive and Google Maps efficiently is very straightforward, and doing so has saved me a lot of time while working on data collection tasks with non-technical users.
threejs packages. For more examples of such interfaces, browse the reverse imports of the
Note that if you want to become well-versed in visualizing geographic data on maps, some basic knowledge of cartography will also come in very handy. Unfortunately, I have no further guidance to provide on the topic.
This note lists a few technologies and tools that I do not use myself, or that I learnt only to complete a specific task, and then later forgot about.
- First published on January 5th, 2017