We break data down into fundamental pieces, spend quite a bit of time
understanding these base components, and then rebuild the analysis
structures using the best tools and techniques available. 'Data
Science' is an incredibly broad subject, but after several years
working in data consulting, we've found that the largest hindrance to
effective analytics is failing to discard the non-essential and letting
complexity overwhelm the end goal.
Our philosophy in Data Science follows advice often attributed to Einstein: *Everything should be made as simple as possible, but no simpler.* By reducing the data and focusing on what is most important, it is much easier to assess accuracy, build effective calculations, and visualize the results in a way that decision makers can understand and act on.
We've included a few different articles on the data cleansing and blending process. This is perhaps one of the most underappreciated, yet most important, parts of any project. Data often comes from many different places, in many different formats, at different levels of granularity. This is the point at which we find out what information we have, determine what information we're missing, clean it, standardize it, and make sure all the information is accurate and reliable.
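As a rough illustration of the cleansing and blending step described above, here is a minimal sketch using only the Python standard library. The records, field names (`region`, `amount`, `target`), and the helper functions `clean_region` and `clean_and_blend` are all hypothetical and invented for this example. It shows one possible way to standardize labels, flag missing values, and combine two sources that sit at different levels of granularity.

```python
from collections import defaultdict

# Hypothetical daily sales rows with inconsistent labels and a missing value.
daily_sales = [
    {"region": " North ", "date": "2024-01-01", "amount": "100.5"},
    {"region": "north", "date": "2024-01-02", "amount": "200"},
    {"region": "South", "date": "2024-01-01", "amount": None},
    {"region": "SOUTH", "date": "2024-01-02", "amount": "50"},
]

# Hypothetical second source, already at region level.
region_targets = [
    {"region": "North", "target": 250.0},
    {"region": "South", "target": 100.0},
    {"region": "East", "target": 75.0},
]


def clean_region(name):
    """Standardize a region label: trim whitespace and use title case."""
    return name.strip().title()


def clean_and_blend(sales, targets):
    """Roll daily sales up to region level and join them to the targets."""
    totals = defaultdict(float)
    missing = defaultdict(int)
    for row in sales:
        region = clean_region(row["region"])
        if row["amount"] is None:
            # Record what we are missing instead of silently guessing a value.
            missing[region] += 1
            continue
        totals[region] += float(row["amount"])

    blended = []
    for row in targets:
        region = clean_region(row["region"])
        blended.append({
            "region": region,
            "total": totals.get(region),  # None when a region has no sales rows
            "target": row["target"],
            "missing_rows": missing.get(region, 0),
        })
    return blended


result = clean_and_blend(daily_sales, region_targets)
for row in result:
    print(row)
```

Keeping a count of missing rows next to each total makes it easier to judge how far a blended figure can be trusted before building calculations on top of it.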
Data exploration is the process of taking the information and trying to find insights. It's a continuous process of delving into relationships, assessing accuracy, and figuring out where the most powerful insights lie. We'll take you through the process with some interesting projects, and give you the tools you need to get the foundations of your project ready.
This is where the presentation of results comes in. While data exploration is just for you, data explanation is for your audience. What did you find? What are the results? What action can you take based on the data? These questions, among others, are an essential part of communicating important information in a way that your audience understands and can act on.
Data visualization is an art form and discipline of its own. It's essential to both the data exploration and data explanation process, and can provide powerful insights that wouldn't be possible by simply reading a data table. You can't do data science without the visualization piece, and we prefer bottom-up tools that give you unlimited options in how you present data. We focus on data visualization using R, Python and Tableau, and have papers and projects discussing the process.
Time series modeling is an integral part of analyzing data that changes over a fixed period of time. There's more complexity in this discipline than is immediately apparent, and we hope to showcase the rich analysis and sophisticated insights that you can glean from time series data.
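One of the simplest time series techniques is smoothing with a moving average, which can make an underlying trend easier to see. The sketch below is only a starting point and not a full modeling approach. The function name `moving_average` and the monthly figures are hypothetical, chosen for illustration.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly unit sales with some month-to-month noise.
monthly_units = [10, 12, 14, 13, 15, 17, 16, 18]

smoothed = moving_average(monthly_units, window=3)
print(smoothed)
```

A wider window smooths more aggressively but reacts more slowly to real changes, so the window size is a judgment call that depends on the data.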
Clustering and classification modeling is the process of finding similarities and relationships between data points that are not immediately evident. These techniques have a strong mathematical component and are foundational to machine learning and artificial intelligence.
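To make the idea of clustering concrete, here is a minimal sketch of k-means, one common clustering algorithm, written with only the Python standard library. K-means is our choice for illustration and is not the only option. The points and starting centroids are hypothetical, and the starting centroids are fixed by hand so the run is repeatable. In practice you would more likely reach for an established library.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means on 2-D points, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the assignment is stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: two loose groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The mathematical component mentioned above shows up in two places: the distance measure used in the assignment step, and the mean used in the update step.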