Data Scientist at Groupon

Excellent R programmer and creator; inquisitive problem solver using data mining and advanced analytics.

Deep expertise in mathematical and statistical data mining using R; also proficient in Python. Skilled in production-grade planning and execution, and in internal consulting.

Operational execution and delivery for key advanced analytics solutions.

About R-Statistics

An educational resource for those seeking knowledge related to machine learning and statistical computing in R. Here you will find quality articles with working R code and examples; the goal is to make #rstats concepts clear and as simple as possible.

It is built with the following readers in mind: statisticians who are new to the R programming language, R programmers without a stats background, analysts who work in SAS or Python, and college grads and developers who are relatively new to both R and stats/ML. If you are completely new to R, this R tutorial is a good place to start.

This is a beginner-friendly guide, suitable for non-programmers, to learning and understanding the R language from scratch. It gives a brief walk-through of the most important parts of the language in plain English and is intended to get you on board fairly quickly. I assume that you have R or RStudio installed and are ready to fo...

How to make any plot in ggplot2?
ggplot2 is the most elegant and aesthetically pleasing graphics framework available in R. It has a nicely planned structure to it. This tutorial focuses on exposing this underlying structure, which you can use to make any ggplot. But the way you make plots in ggplot2 is ve...

Basic tasks
Basic plot setup
    gg <- ggplot(df, aes(x=xcol, y=ycol))
df must be a data frame that contains all the information needed to make the ggplot. The plot will show up only after you add the geom layers.
Scatterplot
    library(ggplot2)
    gg <- ggplot(diamonds, aes(x=carat, y=price))
    gg + geom_point() ...

Linear regression is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The aim is to establish a linear relationship (a mathematical formula) between the predictor variable(s) and the response variable, so that we can use this formula to estimate t...
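As a minimal sketch of the idea, assuming R's built-in cars dataset (stopping distance `dist` versus `speed`) as a stand-in for the article's own data, lm() estimates the linear formula and predict() applies it to a new X value. The speed of 21 below is a hypothetical input chosen for illustration.

```r
# Fit dist = b0 + b1 * speed by ordinary least squares (built-in cars data)
lin_mod <- lm(dist ~ speed, data = cars)

# The estimated formula: intercept and slope
print(coef(lin_mod))

# Use the fitted formula to estimate Y for a hypothetical new speed of 21
new_speed <- data.frame(speed = 21)
predicted_dist <- predict(lin_mod, newdata = new_speed)
print(predicted_dist)
```

The prediction is simply the intercept plus 21 times the slope, which is what "using the formula to estimate Y" amounts to.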

Statistical Tests
This chapter explains the purpose of some of the most commonly used statistical tests and how to implement them in R.
1. One Sample t-Test
Why is it used?
It is a parametric test used to test if the mean of a sample from a normal distribution could reasonably be a spec...
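A short sketch of the one-sample t-test with base R's t.test(), assuming a hypothetical simulated sample and a hypothesized mean of 10. It also recomputes the t statistic by hand to show what the test is measuring: how far the sample mean sits from the hypothesized value, in units of standard error.

```r
set.seed(100)
# Hypothetical sample: 30 draws from a normal distribution with mean 10
x <- rnorm(30, mean = 10, sd = 2)

# H0: the population mean equals 10
res <- t.test(x, mu = 10)
print(res)

# The same t statistic by hand: (sample mean - mu) / (sd / sqrt(n))
t_manual <- (mean(x) - 10) / (sd(x) / sqrt(length(x)))
print(t_manual)
```

A large p-value means the data give no strong evidence against the hypothesized mean; it does not prove the mean is exactly 10.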

Missing Value Treatment
Missing values in data are a common phenomenon in real-world problems. Knowing how to handle missing values effectively is a required step to reduce bias and to produce powerful models. Let's explore various options of how to deal with missing values and how to...
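One of the simplest options is mean imputation. The sketch below assumes R's built-in airquality data, whose Ozone column contains NAs, and is not necessarily the approach the full article recommends: mean imputation keeps every row but understates the variable's spread, so more careful methods are often preferable.

```r
# Built-in airquality data: the Ozone column contains NAs
aq <- airquality
n_missing <- sum(is.na(aq$Ozone))
print(n_missing)

# Simple mean imputation: replace each NA with the mean of the observed values
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)
aq$Ozone[is.na(aq$Ozone)] <- ozone_mean

# No NAs remain, and the column mean is unchanged by construction
print(sum(is.na(aq$Ozone)))
```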

Outliers in data can distort predictions and affect accuracy if you don’t detect and handle them appropriately, especially in regression models.
Why is outlier treatment important?
Because outliers can drastically bias or change the fit estimates and predictions. Let me illustrate this using the cars da...
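A minimal detection sketch, assuming the cars data mentioned above and the common 1.5 × IQR boxplot rule (the full article may use other rules too). boxplot.stats() flags the points, the manual version shows the rule it applies, and refitting without the flagged points shows how much they move the regression estimates.

```r
x <- cars$dist

# Points flagged by the boxplot rule: beyond 1.5 * IQR from the hinges
flagged <- boxplot.stats(x)$out
print(flagged)

# The same rule written out by hand, using Tukey's hinges from fivenum()
hinges <- fivenum(x)[c(2, 4)]
iqr <- diff(hinges)
lower <- hinges[1] - 1.5 * iqr
upper <- hinges[2] + 1.5 * iqr
manual <- x[x < lower | x > upper]

# Compare regression coefficients with and without the flagged points
with_all <- coef(lm(dist ~ speed, data = cars))
without_out <- coef(lm(dist ~ speed, data = cars[!(x %in% flagged), ]))
print(rbind(with_all, without_out))
```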

Finding the most important predictor variables (or features) that explain a major part of the variance of the response variable is key to identifying and building high-performing models.
Import Data
For illustrating the various methods, we will use the ‘Ozone’ data from the ‘mlbench’ package, except for Info...
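The article's own methods are truncated here, so this is only a hedged sketch of one simple, model-agnostic idea: drop-column importance, measured as the loss in R-squared when each predictor is left out. It assumes base R's built-in airquality data as a stand-in, which is not the same dataset as the ‘Ozone’ data in ‘mlbench’.

```r
# Complete cases only, so every model is fit on the same rows
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
predictors <- c("Solar.R", "Wind", "Temp")

full_r2 <- summary(lm(Ozone ~ ., data = aq))$r.squared

# Importance of each predictor = drop in R-squared when it is left out
r2_drop <- sapply(predictors, function(p) {
  reduced <- lm(reformulate(setdiff(predictors, p), response = "Ozone"), data = aq)
  full_r2 - summary(reduced)$r.squared
})
print(sort(r2_drop, decreasing = TRUE))
```

Because the reduced models are nested in the full OLS model on the same rows, every drop is non-negative; correlated predictors can share credit, which this simple measure does not untangle.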

It is possible to build multiple models from a given set of X variables, but building a good-quality model can make all the difference. Here, we explore various approaches to building and evaluating regression models.
Data Prep
Let's prepare the data upon which the various model selection approaches will be...
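One common base-R approach is stepwise selection by AIC with step(). This sketch assumes the built-in mtcars data and backward elimination; it is one of several possible approaches, not necessarily the one the article settles on.

```r
full_mod <- lm(mpg ~ ., data = mtcars)

# Backward stepwise selection by AIC; trace = 0 silences the step-by-step log
best_mod <- step(full_mod, direction = "backward", trace = 0)
print(formula(best_mod))

# extractAIC() returns c(edf, AIC) on the scale that step() optimizes
aic_full <- extractAIC(full_mod)[2]
aic_best <- extractAIC(best_mod)[2]
print(c(full = aic_full, selected = aic_best))
```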

If linear regression serves to predict continuous Y variables, logistic regression is used for binary classification.
If we use linear regression to model a dichotomous variable (as Y), the resulting model might not restrict the predicted Ys within 0 and 1. Besides, other assumptions of linear regres...
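A minimal sketch, assuming the built-in mtcars data with the 0/1 variable am (manual transmission) as Y and weight as the predictor. glm() with a binomial family keeps predicted probabilities inside [0, 1], whereas an ordinary linear model on the same outcome has no such constraint.

```r
# am: 0 = automatic, 1 = manual transmission (built-in mtcars data)
logit_mod <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
print(summary(logit_mod))

# type = "response" returns predicted probabilities, always inside [0, 1]
probs <- predict(logit_mod, type = "response")

# A linear model on the same 0/1 outcome: nothing forces these to stay in [0, 1]
lin_preds <- predict(lm(am ~ wt, data = mtcars))
print(range(lin_preds))
```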

Any metric that is measured over regular time intervals forms a time series. Time series analysis is commercially important because of its industrial relevance, especially for forecasting (demand, sales, supply, etc.).
A time series can be broken down into its components so as to systematically...
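As an illustration of breaking a series into components, this sketch assumes the built-in AirPassengers series and base R's decompose() with a multiplicative model (its seasonal swings grow with the level). The article may use a different method, such as stl().

```r
# AirPassengers: monthly airline passenger totals, 1949-1960 (built-in)
x <- AirPassengers
parts <- decompose(x, type = "multiplicative")

# Components: trend (NA at the ends), repeating seasonal factors, and remainder
print(head(parts$figure, 12))  # one seasonal factor per month

# In a multiplicative decomposition, trend * seasonal * random rebuilds the series
rebuilt <- parts$trend * parts$seasonal * parts$random
print(max(abs(rebuilt - x), na.rm = TRUE))
```

plot(parts) draws the observed series and the three components in stacked panels.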

This is a follow-up to the introduction to time series analysis, focused on forecasting rather than analysis.
Simple Moving Average
A simple moving average can be calculated using ma() from the forecast package:
    library(forecast)
    sm <- ma(ts, order=12)  # 12-month moving average
    lines(sm)  # overlay on the existing plot of ts
Exponential Smoothing
Simple...
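Since the forecast package is not part of base R, here is a hedged base-R stand-in for the same two ideas. It assumes a centred 2x12 moving average built with stats::filter() on AirPassengers, and simple exponential smoothing via HoltWinters() with the trend (beta) and seasonal (gamma) terms switched off, applied to the built-in Nile series.

```r
x <- AirPassengers

# Centred 2x12 moving average: half weight on the two end months, 1/12 elsewhere
ma_weights <- c(0.5, rep(1, 11), 0.5) / 12
sm <- stats::filter(x, filter = ma_weights, sides = 2)  # 6 NAs at each end

# Simple exponential smoothing: no trend, no seasonality
ses_fit <- HoltWinters(Nile, beta = FALSE, gamma = FALSE)
print(ses_fit$alpha)  # smoothing parameter chosen by optimization

# Forecast 5 steps ahead; with no trend or season the forecast is a flat line
ses_fc <- predict(ses_fit, n.ahead = 5)
print(ses_fc)
```

Calling stats::filter explicitly avoids clashing with dplyr::filter if that package is loaded.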

R provides a number of convenient facilities for parallel computing. The following method shows you how to set up and run a parallel process on your current multi-core device, without the need for additional hardware.
Setting up for parallelization
The number of parallel processes you can run simultaneousl...
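A minimal sketch using the parallel package that ships with R, assuming a socket cluster (which works on Windows, macOS and Linux). The squaring task is a hypothetical placeholder for real work, and the worker count is capped at 2 here purely to keep the example light.

```r
library(parallel)

# Leave one core free; detectCores() can return NA on some platforms
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, min(2L, n_cores - 1L))

cl <- makeCluster(n_workers)                    # start worker R processes
res_par <- parLapply(cl, 1:8, function(i) i^2)  # run the task across workers
stopCluster(cl)                                 # always release the workers

# Same computation sequentially, for comparison
res_seq <- lapply(1:8, function(i) i^2)
print(unlist(res_par))
```

For tiny tasks like this one, the overhead of starting workers outweighs any speed-up; parallelism pays off when each task is expensive.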

The for-loop in R can be very slow in its raw, unoptimized form, especially when dealing with larger data sets. There are a number of ways to make your logic run faster, and you may be surprised how fast you can actually go. This chapter shows a number of approaches, including simple twe...
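The chapter's specific tweaks are truncated here; this sketch assumes three widely used ones, comparing a loop that grows its result, a loop into a pre-allocated vector, and a fully vectorized expression. All three give identical results, but the timings (which vary by machine) usually differ sharply.

```r
n <- 1e4

# 1. Growing a vector inside the loop: each append can copy the whole vector
t_grow <- system.time({
  grow <- c()
  for (i in seq_len(n)) grow <- c(grow, i^2)
})

# 2. Pre-allocating the result once, then filling it in place
t_prealloc <- system.time({
  prealloc <- numeric(n)
  for (i in seq_len(n)) prealloc[i] <- i^2
})

# 3. Vectorized: no explicit R-level loop at all
t_vec <- system.time({
  vec <- seq_len(n)^2
})

print(c(grow = t_grow[["elapsed"]],
        prealloc = t_prealloc[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```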

Association mining is commonly used to make product recommendations by identifying products that are frequently bought together. But if you are not careful, the rules can give misleading results in certain cases.
Association mining is usually done on transaction data from a retail market or from an...
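To make the standard rule metrics concrete, this sketch computes support, confidence and lift by hand on five hypothetical baskets (in practice a package such as arules is typically used). Here the rule {bread} => {milk} has a confidence of about 0.67, which looks strong, yet its lift is below 1 because milk is in 80% of all baskets anyway: one common way a rule can mislead, though the article may discuss others.

```r
# Hypothetical transactions: one row per basket, TRUE = item was bought
trans <- rbind(
  c(bread = TRUE,  milk = TRUE,  butter = FALSE),
  c(bread = TRUE,  milk = FALSE, butter = TRUE),
  c(bread = FALSE, milk = TRUE,  butter = FALSE),
  c(bread = TRUE,  milk = TRUE,  butter = TRUE),
  c(bread = FALSE, milk = TRUE,  butter = TRUE)
)

# Support: share of baskets containing every item in `items`
support <- function(items) mean(apply(trans[, items, drop = FALSE], 1, all))

# Rule {bread} => {milk}
confidence <- support(c("bread", "milk")) / support("bread")
lift <- confidence / support("milk")  # lift < 1: bread buyers buy milk less often than average
print(c(confidence = confidence, lift = lift))
```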

If you have multiple features for each observation (row) in a dataset and would like to reduce the number of features so as to visualize which observations are similar, Multi-Dimensional Scaling (MDS) will help.
The Advantage and Disadvantage of MDS
The advantage of MDS is that you can sp...
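A minimal sketch of classical (metric) MDS with base R's cmdscale(), assuming the built-in mtcars data, standardized so that no single feature dominates the distances. The article may use a different MDS variant.

```r
# Standardize mtcars so each feature contributes on the same scale
d <- dist(scale(mtcars))

# Classical MDS: find 2-D coordinates whose distances approximate d
coords <- cmdscale(d, k = 2)
print(head(coords))
```

Plotting the two columns, for example with plot(coords, type = "n") followed by text(coords, labels = rownames(mtcars)), places similar cars near each other.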

The functions in the InformationValue package are broadly divided into the following categories:
1. Diagnostics of predicted probability scores
2. Performance analysis
3. Functions that aid accuracy improvement
First, let's define the meaning of the various terms used in this document.
How to install
install.package...
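The package's own definitions are truncated here, so this is only a hedged base-R sketch of terms commonly used when evaluating predicted probability scores: a cutoff, the confusion-matrix counts, sensitivity, specificity and misclassification error. The actuals, scores and 0.5 cutoff are hypothetical, and none of the package's functions are reproduced.

```r
# Hypothetical actual classes (1 = event) and predicted probability scores
actuals <- c(1, 0, 1, 1, 0, 0, 1, 0)
pred_probs <- c(0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3)
cutoff <- 0.5

# Scores at or above the cutoff are classified as events
predicted <- as.integer(pred_probs >= cutoff)

tp <- sum(predicted == 1 & actuals == 1)  # true positives
tn <- sum(predicted == 0 & actuals == 0)  # true negatives
fp <- sum(predicted == 1 & actuals == 0)  # false positives
fn <- sum(predicted == 0 & actuals == 1)  # false negatives

sensitivity <- tp / (tp + fn)                  # share of actual 1s caught
specificity <- tn / (tn + fp)                  # share of actual 0s correctly rejected
misclass_error <- (fp + fn) / length(actuals)  # share of all cases misclassified
print(c(sensitivity = sensitivity, specificity = specificity, error = misclass_error))
```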

Robust regression can be used in any situation where OLS regression can be applied. When the data contain influential observations, it often gives more accurate estimates than OLS, because it uses a weighting mechanism to down-weight those observations. It is particularly useful when there are no compelling reasons to exclude outliers from your data.
Robust regression can be implemented using the rlm() function in the MASS package. Outliers can be down-weighted differently depending on the psi.huber, psi.hampel or psi.bisquare method specified by the psi argument.
How To Specify A Robust Regression Model
    library(MASS)
    rlm_mod <- rlm(stack.loss ~ ., stackloss)  # robust regression model (default psi = psi.huber; pass psi = psi.bisquare for bisquare weighting)
    summary(rlm_mod)
    #> Call: rlm(formula = stack.loss ~ ., data = stackloss)
    #> Residuals:
    #>      Min       1Q   Median       3Q      Max
    #> -8.91753 -1.73127  0.06187  1.54306  6.50163
    #>
    #> Coefficients:
    #>             Value    Std. Error t value
    #> (Intercept) -41.0265   9.8073   -4.1832
    #> Ai...
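To see the weighting mechanism at work, this sketch fits OLS, Huber and bisquare models to the same stackloss data with MASS (a recommended package shipped with R) and prints the final weights from the bisquare fit. Observations with weights well below 1 are the ones the robust fit has down-weighted; bisquare can give a weight of exactly 0, while Huber weights never reach 0.

```r
library(MASS)

ols_mod   <- lm(stack.loss ~ ., data = stackloss)
rlm_huber <- rlm(stack.loss ~ ., data = stackloss)                      # default psi.huber
rlm_bisq  <- rlm(stack.loss ~ ., data = stackloss, psi = psi.bisquare)  # bisquare weighting

# Side-by-side coefficients
print(cbind(OLS = coef(ols_mod), Huber = coef(rlm_huber), Bisquare = coef(rlm_bisq)))

# Final weights from iteratively reweighted least squares (one per observation)
print(round(rlm_bisq$w, 3))
```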

Probit regression can be used to solve binary classification problems, just like logistic regression.
While logistic regression uses a cumulative logistic function, probit regression uses the standard normal cumulative distribution function for the estimation model. Specifying a probit model is similar to logistic regr...
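A minimal sketch, assuming the built-in mtcars data with am as the binary outcome. The only change from a logistic model is link = "probit", and the check at the end confirms that the predicted probabilities are the standard normal CDF, pnorm(), applied to the linear predictor.

```r
probit_mod <- glm(am ~ wt, family = binomial(link = "probit"), data = mtcars)
print(summary(probit_mod))

# Linear predictor (eta = b0 + b1 * wt) and the predicted probabilities
eta <- predict(probit_mod, type = "link")
p_hat <- predict(probit_mod, type = "response")

# The probit link maps eta through the standard normal CDF
print(max(abs(p_hat - pnorm(eta))))
```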

Multinomial regression is very similar to logistic regression but is applicable when the response variable is a nominal categorical variable with more than 2 levels.
Introduction
Multinomial logistic regression can be implemented with mlogit() from the mlogit package and multinom() from the nnet package. We will use the latter for this example.
Example: Predict Choice of Contraceptive Method
In this example, we will try to predict the choice of contraceptive method preferred by women (1=No-use, 2=Long-term, 3=Short-term). We have age, education, work, religion, number of children, media exposure and standard of living as variables available in the cmc data. In this example, we will model the choice of contraceptive method, cmc, as a function of all these variables.
Import Data
    cmcData <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/cmc/cmc.data", stringsAsFactors=FALSE, header=FALSE)
    colnames(cmcData) <- c("wife_age", "wife_edu", "hus_edu", "num_child", "wife_rel", "wife_work", "hus_oc...
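The cmc download may not be reachable from every environment, so as a self-contained stand-in this sketch fits multinom() from nnet (the function the example uses) to the built-in iris data, whose Species column is a nominal factor with three levels. Each row of predicted probabilities covers all classes and sums to 1.

```r
library(nnet)

# iris$Species is a nominal factor with 3 levels; trace = FALSE hides fitting output
multi_mod <- multinom(Species ~ ., data = iris, trace = FALSE)
print(coef(multi_mod))  # one row of coefficients per non-reference level

# One predicted probability per class for each observation
class_probs <- predict(multi_mod, newdata = iris, type = "probs")
pred_class <- predict(multi_mod, newdata = iris, type = "class")
print(table(Predicted = pred_class, Actual = iris$Species))
```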

2017 © Grroups ALL Rights Reserved. Privacy Policy | Terms of Use