blog.zenggyu.com/en

The Limitation of Accuracy of Classification Models

It didn’t occur to me that even if a classification model can perfectly estimate the probability of an outcome, the accuracy of the prediction can still be low. This post explains the phenomenon. Some useful notes on the code that makes Figure
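The excerpt's point can be reproduced with a short simulation. This is a minimal sketch under an assumed setup, which may not be the one the post uses: every case shares the same true probability `p` of the positive outcome, and the "perfect" model knows `p` exactly. Predicting the more likely class is then the best possible rule, yet accuracy is capped at `max(p, 1 - p)`.

```python
import random


def perfect_model_accuracy(p: float, n: int = 100_000, seed: int = 1) -> float:
    """Simulate the accuracy of a model that knows P(y = 1) = p exactly.

    Every case has the same probability p, so the best rule is to always
    predict the more likely class (threshold at 0.5).
    """
    rng = random.Random(seed)
    prediction = 1 if p >= 0.5 else 0
    correct = 0
    for _ in range(n):
        outcome = 1 if rng.random() < p else 0
        correct += outcome == prediction
    return correct / n


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.9):
        # Accuracy stays near max(p, 1 - p) despite perfect probabilities.
        print(f"p = {p}: accuracy ~ {perfect_model_accuracy(p):.3f}")
```

With p = 0.6, roughly 40% of predictions are wrong even though the probability estimate itself has no error. In this setup, accuracy reflects how predictable the outcome is, not just how good the model is.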


ewen.io

discogger (day one)

Building an R interface to the Discogs


www.robert-hickman.eu

Could an Independent Yorkshire Win the World Cup - LASSOs and Player Positions

The data we’ve scraped only gives a player’s overall ‘ability’ and their abilities on specific skills (e.g. strength, long shots, dribbling…). We want to use this to work out how good each player is at each position


engineering.pivotal.io

How to use 'tag_filter' in 'git' resources in Concourse CI

This blog post will show how this resource type can be used to fetch a specific git tag from the repository


jessesadler.com

One Year Anniversary

In a nice little coincidence this is the 11th post to the blog and pushes my first introductory post to the second page of blog entries


blog.wallaroolabs.com

Stream processing, trending hashtags, and Wallaroo

A prospective Wallaroo user contacted us and asked for an example of chaining state computations together so the output of one could be fed into another to take still further action. In particular, their first step was doing aggregation


www.tidyverse.org/articles

dplyr 0.7.5

The next release involves substantial refactoring of the internals to make hybrid evaluation simpler and less surprising, a new implementation of grouping that better respects levels of factors, and redesign of the grouping metadata to replace the current collection of attributes by a single tidy tibble


blog.zenggyu.com/en

A Brief Introduction to Bagged Trees, Random Forest and Their Applications in R

It should be noted that although the bagged trees are identically distributed, they are not necessarily independent. Since the bootstrap samples used to train each individual tree come from the same data set, it is not surprising that the trees may share some similar
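The consequence of that correlation can be made concrete with the standard result for averaging B identically distributed predictions that have variance σ² and pairwise correlation ρ. The variance of their mean is ρσ² + (1 - ρ)σ²/B, so adding more trees only shrinks the second term. The Python sketch below is illustrative, and its function names are hypothetical. It compares that formula with a simulation in which correlated "tree predictions" are built from a shared component.

```python
import random
import statistics


def bagged_variance(rho: float, sigma2: float, n_trees: int) -> float:
    """Variance of the mean of n_trees identically distributed predictions
    with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


def simulated_variance(rho: float, n_trees: int, reps: int = 10_000,
                       seed: int = 42) -> float:
    """Monte Carlo check with unit-variance predictions.

    Each prediction is sqrt(rho) * shared + sqrt(1 - rho) * own, so any two
    predictions have correlation rho, mimicking trees grown on bootstrap
    samples of the same data set.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # component common to every tree
        preds = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
                 for _ in range(n_trees)]
        means.append(sum(preds) / n_trees)
    return statistics.variance(means)


if __name__ == "__main__":
    for b in (1, 10, 100, 1000):
        print(b, round(bagged_variance(rho=0.3, sigma2=1.0, n_trees=b), 4))
    print("simulated, B = 25:", round(simulated_variance(0.3, 25), 3))
```

With ρ = 0.3, the variance never drops below 0.3σ², however many trees are grown. Random forests address this by lowering ρ itself, decorrelating the trees through random feature subsets.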


aosmith.rbind.io

A closer look at replicate() and purrr

Since I’m going to generate random numbers, I’ll set the seed so anyone following along at home will see the same values. The output below is a list of three vectors


guyabel.com

Animating Changes in Football Kits using R

This bit of code can take a while to execute if there are many frames (see my comments towards the end of the post). I could then run the same code as above to scrape the images, annotate the year and copyright information and build the


sciathlon.github.io

Athlete's foot and its treatment

Hi athletes, today we will be looking at data about a health issue that affects many athletes: athlete’s foot. It’s not a very glamorous subject, but it’s still interesting, and I find fungi really fascinating: they are warrior eukaryotes that survive everywhere! Most of you would probably rather never hear about it…


www.robert-hickman.eu

Could an Independent Yorkshire Win the World Cup - Data & Scraping

In order to calculate how good each county team would be, I needed a measure of the ability of all of the players they could field. For this I turned to the FIFA18 video game which rates players along a variety of scales. Once that’s scraped and bound we can take a peek at the data


www.rostrum.blog

Tid-ye-text with geniusr

Matt Dray ⚠️ Warning: this post contains offensive words. ⚠️ Genius? Kanye West released his latest album, ye, last week[1] after a(nother) pretty turbulent and controversial period of his life[2]


leonawicz.github.io/blog

epubr 0.4.0 CRAN release

E-book formatting is non-standard enough across all literature that no function can curate parsed e-book content across an arbitrary collection of e-books, in completely general form, resulting in a singular, consistently formatted output containing all the same variables


yihui.name/en

Pour Positive Energy into Github Issues

I think it got the largest number of likes among all tweets that ever involved me. It is great to see this many people appreciated the appreciation. Of course, there is no point in just blindly saying good words


fharrell.com

Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine

There are two stages in the understanding and implementation of RM: In most cases one can compute the absolute benefit as a function of (known or unknown) patient baseline risk using simple math, without requiring any data, once the relative efficacy is estimated
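The "simple math" can be illustrated under one common assumption: relative efficacy is a constant odds ratio across baseline risk. That choice is made here for illustration, and the post may work on another scale. Convert the baseline risk to odds, apply the odds ratio, and convert back to get the treated risk. The difference between the two risks is the absolute benefit. The function names are hypothetical.

```python
def treated_risk(baseline_risk: float, odds_ratio: float) -> float:
    """Risk under treatment, assuming a constant odds ratio."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    odds = baseline_risk / (1 - baseline_risk)
    treated_odds = odds * odds_ratio
    return treated_odds / (1 + treated_odds)


def absolute_benefit(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk reduction as a function of baseline risk."""
    return baseline_risk - treated_risk(baseline_risk, odds_ratio)


if __name__ == "__main__":
    # Same relative efficacy (OR = 0.5), very different absolute benefit.
    for risk in (0.01, 0.05, 0.2, 0.5, 0.8):
        print(f"baseline {risk:.2f} -> absolute benefit "
              f"{absolute_benefit(risk, 0.5):.3f}")
```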


djnavarro.net

Day 38: Algorithmic complexity

can be produced with a very short R program: whereas a random-looking string like However, if I use R as the compressing language, there is a very short program that produces it: All of which is by way of background. And calling it… Not surprisingly, complexity increases as a sequence becomes longer, even if it’s the same symbol being
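Kolmogorov complexity is not computable in general. A common practical stand-in, swapped in here because the post itself reasons about short R programs, is the length of a compressed encoding. This minimal Python sketch uses `zlib` to show both effects described in the excerpt:

- A repetitive string compresses far better than a random-looking one.
- Even a single repeated symbol costs more to describe as the sequence gets longer.

```python
import random
import zlib


def compressed_length(s: str) -> int:
    """Bytes needed by zlib (max compression) to encode s: a rough
    upper-bound proxy for the string's algorithmic complexity."""
    return len(zlib.compress(s.encode("ascii"), 9))


if __name__ == "__main__":
    rng = random.Random(0)
    repetitive = "a" * 1000
    random_like = "".join(rng.choice("ab") for _ in range(1000))
    print("repetitive :", compressed_length(repetitive))
    print("random-like:", compressed_length(random_like))
    for n in (10, 1_000, 100_000):
        print(f"'a' * {n}:", compressed_length("a" * n))
```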


rubuntu.netlify.com

June 2nd, 2018 c2d4u Update

Small weekly update to c2d4u


blog.zenggyu.com/en

Setting Up PostgreSQL

This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for


wytham.rbind.io

A note on factors in regression (in R)

Factors terrify me. I can avoid dealing with them most of the time, but they’re immensely useful in a regression when you have a categorical variable with many levels (e.g. “Very Bad”, “Bad”, “Good”, “Very Good”)
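For an unordered factor, R's default contrast is treatment coding (`contr.treatment`). The first level becomes the baseline, which is absorbed into the intercept. Each other level gets a 0/1 indicator column, so its coefficient is a difference from the baseline. Ordered factors default to polynomial contrasts instead. A small Python sketch of the treatment encoding follows; the helper name is hypothetical.

```python
def treatment_code(values, levels):
    """Dummy-code a categorical variable the way R's contr.treatment does.

    The first entry of `levels` is the reference level: it gets no column,
    and every other level gets a 0/1 indicator column.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    others = levels[1:]  # levels[0] is the reference (baseline) level
    return {level: [int(v == level) for v in values] for level in others}


if __name__ == "__main__":
    levels = ["Very Bad", "Bad", "Good", "Very Good"]
    ratings = ["Bad", "Good", "Very Bad", "Very Good"]
    for column, indicator in treatment_code(ratings, levels).items():
        print(f"{column:>9}: {indicator}")
```

A "Very Bad" observation is all zeros, so its predicted value is the intercept itself.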


djnavarro.net

Day 36-37: Concerned DALEX

I was working on a longer post continuing the metaprogramming series, and realised I wasn’t going to get it done this evening. But it’s been a couple of days since I tried out something new, so I resorted to the twitters to find


blog.sellorm.com

First steps with data pipelines

If you’re a data scientist, data engineer or otherwise someone just starting to think about creating data pipelines, you could do a lot worse than check out make. Having a consistent and flexible way of executing your data pipeline should be an essential part of any data professional’s toolkit
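The core rule make applies to each target can be sketched in a few lines: rebuild the target if it does not exist or if any prerequisite was modified more recently. The Python version below shows the idea. Its function name is hypothetical, and it deliberately ignores make's recursion through the dependency graph.

```python
import os


def needs_rebuild(target: str, prerequisites: list[str]) -> bool:
    """Return True if `target` is missing or older than any prerequisite,
    mirroring the timestamp check make performs for each rule.

    A missing prerequisite raises FileNotFoundError.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Because the check compares only timestamps, an unchanged step is skipped on the next run. Touching an upstream file forces everything downstream of it to rebuild.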


blog.zenggyu.com/en

Setting Up R

This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for R. This post will concentrate on user- or project-specific files, so all the files mentioned below should be placed in a user’s home directory or in the working directory of a project. Global Options -> General:


www.tidyverse.org/articles

conflicted

Install conflicted by running: conflicted does not export any functions. To use it, you just need to load it: Loading conflicted creates a new “conflicted” environment that is attached just after the global environment


www.aggieerin.com

New Publications and Updated CV

Hi guys! I just wanted to post that I’ve updated the website to be current with some new publications I wanted to highlight: First up are two papers on psycholinguistics that were undergraduate student projects: Duncan, J., Buchanan, E.M., Marshall, C.Z., & Oberdieck, K. (accepted). But words will never hurt me, Journal of Psychology and Behavioral Sciences, X, XX-XX. PDF Forbes, F.-J., & Buchanan, E.


blog.zenggyu.com/en

Setting Up Ubuntu

This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for Ubuntu. There are two settings that may be of particular interest: The following command can be used to update system time, but note that the program is no longer installed on Ubuntu by


statsbylopez.netlify.com

The within-game evolution of MLB’s strike zone

Next, I averaged across each of the nine innings and each three-inch window of a strike zone grid to identify the likelihood of a pitch in a given part of the zone being called a strike


eliocamp.github.io/codigo-r

Your own smooth in geom_smooth()

Something incredibly satisfying about ggplot2 is how super easy it makes fitting curves to data with geom_smooth()


www.katiescranton.com

Building my website with blogdown

This is my third attempt at building a website, including an (overly?) ambitious idea to document all of the #Rcats and #Rdogs (and #Rchickens Lucy!) on twitter


mouse-imaging-centre.github.io/blog

An overfit representation of ICLR 2018

I was recently extremely fortunate to attend ICLR 2018, albeit as something of an interloper. Accordingly, what follows is surely a rather atypical highlight reel


www.stat.cmu.edu/~ryurko

Bayesian Baby Steps

You’re trying to evaluate a receiver’s ability to catch a football. Let’s pretend you can take the following (completely unrealistic) strategy: you tell your quarterback to repeatedly throw the ball to your receiver in practice, recording each time whether or not they caught the ball
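The receiver example maps naturally onto the conjugate beta-binomial model. That model is assumed here as one standard way to take these first Bayesian steps; the post may proceed differently. Start with a Beta(α, β) prior on the catch probability. After observing k catches in n throws, the posterior is Beta(α + k, β + n - k). The class name and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about a receiver's catch probability."""
    alpha: float
    beta: float

    def update(self, catches: int, throws: int) -> "BetaBelief":
        """Conjugate update after `catches` successes in `throws` attempts."""
        if not 0 <= catches <= throws:
            raise ValueError("need 0 <= catches <= throws")
        return BetaBelief(self.alpha + catches, self.beta + throws - catches)

    def mean(self) -> float:
        """Posterior mean catch probability."""
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    prior = BetaBelief(1, 1)          # uniform prior: no idea yet
    posterior = prior.update(7, 10)   # hypothetical practice session
    print(posterior, round(posterior.mean(), 3))
```

You get the same posterior whether you update after each practice session or all at once. This is part of why the conjugate model makes a convenient first step.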


www.onceupondata.com

#runconf18

The other good things were: Stefanie Butland and rOpenSci people organized everything to make sure everyone is feeling


data-chips.com

Crocheting & plotting

If there’s one thing I’m passionate about, it’s combining my passions. So right now I’m going to visualize crochet circles using R’s ggplot2 package. Rest assured I’m going to keep the crochet jargon to a minimum. First things first, I need to load the ggplot2 package


ndres.me

Machine learning explained with gifs

About style transfer: Pioneered in 2015, style transfer is a technique that uses neural networks to transfer the style of a painting to an existing photograph. The original paper is A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S


jvera.netlify.com

My favourite snippets

A hidden gem in RStudio is its snippets feature. Snippets are a well-known option in other editors (Atom, VS Code, Notepad++…), but they seem to be rarely used by R people. As far as I know, some developers code a full add-in for things that could easily be achieved by just adding a snippet to their RStudio configuration


sciathlon.github.io

Running races and waste

Hi everyone! I am tackling a new topic today, which is: waste generated during races


ramhiser.com

Adding Dask and Jupyter to a Kubernetes Cluster

In this post, we’re going to set up Dask and Jupyter on a Kubernetes cluster running on AWS


bgstieber.github.io

An Introduction to the kmeans Algorithm

This post will provide an R code-heavy, math-light introduction to selecting the k in k-means. It presents the main idea of k-means, demonstrates how to fit a k-means model in R, describes some components of the k-means fit, and displays some methods for selecting k


blog.mgechev.com

Fast, extensible, configurable, and beautiful linter for Go

About a year ago I decided to polish my Go skills. Although the language is pretty small compared to most others that I use on a daily basis, it still has some useful syntax constructs that I didn’t use enough. What better way to brush up your skills in a programming language than building tools with it?


ellocke.github.io

(R) Some Tricks for Blogdown & Hugo (Working Draft)

1 Fix your Table of Contents / TOC (with .Rmd) 1.1 Numbering 1.2 Custom TOC & Numbering CSS 1


rubuntu.netlify.com

c2d4u Update

On the c2d4u PPAs, my goal is to update and add new packages (from CRAN Task Views) on a weekly basis, usually on the weekend. While I was building c2d4u3.5, I put this on hold, as I didn’t want to build new (to the PPA) packages at the same time as checking old ones


www.mytinyshinys.com

epldata Package

I have been collating data from the English Premier League since it began in 1992 and have a complete database of every player’s appearances in league games, details of goals scored and assists made


ellocke.github.io

(R) Troubleshooting Blogdown & Hugo for (Windows) Dummies

1 When blogdown::serve_site() stops working 2 My Problem Space Working Environment 3 How to Debug Hugo


rubuntu.netlify.com

Announcing cran2deb4ubuntu3.5

There are many things to consider before you add this PPA to your Ubuntu machine. The PPA supplies binaries for Trusty (14.04), Xenial (16.04), and Bionic (18.04). If you decide to use this PPA, please let me know if something is not working. There is no way I can test all 3,400+ packages, and there are always little things that I miss


ramhiser.com

Interpreting Machine Learning Algorithms

I’ve had an open tab with an overview piece on interpreting machine learning algorithms for several weeks now


chichacha.netlify.com

Making Calendar with ggplot + Moon Phase Calendar for fun

To make a calendar, I need to extract the weekday, month, day, and week number within the month, so I can use the weekday as the x-axis and the week number within the month as the y-axis, and facet by month. First I just made a simple calendar with the code below. This time, I tried using the geom_tile function to create tiles. I’ve coloured the cells using the fraction (the illuminated fraction of the moon)
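Those calendar coordinates can be computed with the standard library:

- Weekday for the x-axis.
- Week within the month for the y-axis.
- Month for faceting.

This is a minimal Python sketch, assuming weeks start on Sunday; the post may use a different convention. The function name is hypothetical.

```python
from datetime import date


def calendar_cell(d: date) -> tuple[int, int, int]:
    """Return (month, weekday_column, week_of_month) for a calendar grid.

    weekday_column: 0 = Sunday ... 6 = Saturday.
    week_of_month: 1 for the row containing the 1st of the month.
    """
    weekday_column = (d.weekday() + 1) % 7       # date.weekday(): Monday = 0
    first_column = (d.replace(day=1).weekday() + 1) % 7
    week_of_month = (d.day - 1 + first_column) // 7 + 1
    return d.month, weekday_column, week_of_month


if __name__ == "__main__":
    for day in (1, 2, 3, 30):
        d = date(2018, 6, day)
        print(d.isoformat(), calendar_cell(d))
```

Each (weekday_column, week_of_month) pair then becomes one tile position, with the month used for faceting.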


bgstieber.github.io

My First Post

Welcome to my blog! I plan to use this website to present data explorations and analyses in a way that’s understandable to a broad audience. I hope to demonstrate the utility of applying ideas like machine learning, data visualization, and exploratory data analysis to day-to-day life to improve decision-making processes


thestudyofthehousehold.com

Visualizing insect count data — a zero-inflated poisson model

Most ecologists would agree: it’s really hard to predict which animals are going to be where, and how many of them you might find when you look. Lately, there has been lots of interest in using mixed-effects models to make these predictions
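The title names the zero-inflated Poisson model, which mixes two components:

- A point mass at zero with probability π (for example, sites where the insect is simply absent).
- An ordinary Poisson count with rate λ.

The probability mass function below is the standard one. The parameter values in the usage lines are illustrative only.

```python
import math


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with rate lam and
    structural-zero probability pi."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1 - pi) * poisson


if __name__ == "__main__":
    lam, pi = 2.0, 0.3
    for k in range(5):
        print(k, round(zip_pmf(k, lam, pi), 4),
              "vs Poisson", round(zip_pmf(k, lam, 0.0), 4))
```

The extra zeros lower the mean to (1 - π)λ. They also make the variance exceed the mean, which is a common symptom in ecological count data.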


www.rostrum.blog

Cloudy with a chance of pie

Matt Dray The pinnacle of visualisation Great news, everyone: I’ve taken the best of two stellar data visualisations and smashed them together into something that can only be described as perfection. Let me set the scene. There are three things we can agree on: Everyone loves pie charts, particularly when they’re in 3D, exploded and tilted. Word clouds aren’t at all overused


cevo.com.au

Jenkins as a Service

In this session we will work through provisioning Jenkins on AWS ECS from a set of Docker containers that allow individuals or teams to self-service an immutable CI/CD setup


yihui.name/en

One (Perhaps Surprising) Reason Why I may Silently Ignore a Github Issue

What made me hesitate when looking at this issue was the incorrect format of the reproducible example


ropensci.org/technotes

vcr

The first time the above code block is run, real HTTP connections are allowed because the request doesn’t match any previous requests, and the response is cached. The second time the request is made, the cached response is used
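The record-then-replay behaviour described here can be sketched independently of vcr. The following Python toy is hypothetical and is not vcr's API:

- The `Cassette` class keys each request on its method and URL.
- It calls a real sender only on a cache miss.
- It saves responses to a JSON file so that later runs replay them.

```python
import json
from pathlib import Path
from typing import Callable


class Cassette:
    """Toy record/replay store for HTTP responses (hypothetical; not vcr's API).

    The first request for a given method + URL is sent for real and its
    response body is saved; later identical requests replay the saved body.
    """

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self.episodes: dict[str, str] = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def request(self, method: str, url: str,
                send: Callable[[str, str], str]) -> str:
        key = f"{method.upper()} {url}"
        if key not in self.episodes:
            # Cache miss: allow the real request, then record the response.
            self.episodes[key] = send(method, url)
            self.path.write_text(json.dumps(self.episodes))
        return self.episodes[key]
```

In a real test suite, `send` would wrap an HTTP client. Matching only on method and URL is a simplification; recording libraries such as vcr let you configure which parts of a request must match.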


www.rdatagen.net

A little function to help generate ICCs in simple clustered data

In health services research, experiments are often conducted at the provider or site level rather than the patient level. However, we might still be interested in the outcome at the patient level
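As background for the excerpt, suppose the random cluster effect has variance σ_b² and the individual-level variance is σ_w². The ICC is then σ_b² / (σ_b² + σ_w²). Solving for σ_b² gives the between-cluster variance needed to hit a target ICC. The sketch below is a Python illustration of that idea with hypothetical names, not the post's own function. It also recovers the ICC from simulated data using the one-way ANOVA estimator.

```python
import random
import statistics


def between_variance_for_icc(icc: float, within_var: float) -> float:
    """Between-cluster variance that yields the target ICC.

    From icc = b / (b + w) it follows that b = icc * w / (1 - icc).
    """
    if not 0 <= icc < 1:
        raise ValueError("icc must be in [0, 1)")
    return icc * within_var / (1 - icc)


def simulate_clusters(n_clusters: int, cluster_size: int, icc: float,
                      within_var: float = 1.0, seed: int = 0) -> list[list[float]]:
    """Outcomes y_ij = b_j + e_ij with a normal random cluster effect b_j."""
    rng = random.Random(seed)
    sd_between = between_variance_for_icc(icc, within_var) ** 0.5
    sd_within = within_var ** 0.5
    clusters = []
    for _ in range(n_clusters):
        effect = rng.gauss(0.0, sd_between)  # shared by everyone in the cluster
        clusters.append([effect + rng.gauss(0.0, sd_within)
                         for _ in range(cluster_size)])
    return clusters


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the ICC for equal-sized clusters."""
    k, m = len(clusters), len(clusters[0])
    means = [statistics.fmean(c) for c in clusters]
    grand = statistics.fmean(means)  # equal sizes: mean of means = grand mean
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


if __name__ == "__main__":
    data = simulate_clusters(n_clusters=400, cluster_size=20, icc=0.1, seed=1)
    print(round(anova_icc(data), 3))  # should land near the target of 0.1
```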


amateurdatasci.rbind.io

All About Git and Github in RStudio

1 Git Newbie 2 Commit, Push, and Pull 3 Let’s Git It On! 3.1 Create an account in Github 3.2 Once we have an account, we can immediately create a repository. 3.3 Configure Git in R Studio 3.4 Create new project with version control 3.5 Copy Repository URL and Create Project 4 Git Up, Git Down 4.1 R Studio’s Easy Git Interface. 4.2 First Commit 4


sciathlon.github.io

Favorite trail race

Hi everyone! I am continuing my journey to learn awk and I finally managed to process (almost) an entire file today so let’s analyse the 2018 Tencin trail


leonawicz.github.io/blog

trekfont

First use base graphics. Did you ever think you would be annotating your plots in Vulcan and Klingon? Next use ggplot2


rubuntu.netlify.com

Adding jq library to Trusty and Xenial PPAs

One of the advantages of using Launchpad’s PPA system is that it allows you to easily use the work of others backporting packages to older


gcppodcast.com

Decision Intelligence with Cassie Kozyrkov

There are several other episodes that provide insights into data science: As well as case studies on real world problems: How can I secure my Google Cloud Platform account using a


eliocamp.github.io/codigo-r

Making a PowerPoint presentation from rmarkdown

The interface between knitr/markdown users and Word/PowerPoint users remains rough, since it is hard to change your own workflow to accommodate other people’s


www.jessemaegan.com

R4DS June Challenge

No, however you are still encouraged to work through a book or course and share what you’ve learned on Twitter by using the #SoDS18 hashtag


blog.wallaroolabs.com

Real-time Streaming Pattern

Introduction Many of you have been reading our engineering blog and enjoy our deep technical dives


blog.wallaroolabs.com

Streaming with Wallaroo

Introduction Many of you have been reading our engineering blog and enjoy our deep technical dives


cjbarrie.netlify.com

Younger electorates vote independent in Tunisian Municipal Elections

Municipalities with a greater number of younger registered voters saw a higher vote share for independent lists. Similarly, younger electorates were less likely to vote for established parties


ropensci.org/technotes

taxize

We’ve come a long way since May 2011. We’ve added a lot of new functionality and many new contributors


rubuntu.netlify.com

Replacing weatherunderground.com data with...weatherunderground?

Combining these two data sources recreates what weatherunderground.com used to provide. It should be noted that Dark Sky has a slightly different definition of a day for their API, calculating daily averages from 4AM to 4AM, not midnight to midnight. I compared data for previous years and the differences were negligible
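The two day definitions can disagree for readings taken between midnight and 4AM. The small Python sketch below makes the difference explicit. It assumes readings arrive as (local timestamp, value) pairs rather than through the Dark Sky API, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_means(readings, day_start_hour: int = 0) -> dict[date, float]:
    """Average (timestamp, value) readings per day, where a 'day' starts at
    day_start_hour local time (0 = midnight to midnight, 4 = 4AM to 4AM)."""
    shift = timedelta(hours=day_start_hour)
    sums: dict[date, float] = defaultdict(float)
    counts: dict[date, int] = defaultdict(int)
    for timestamp, value in readings:
        # Readings before the cut-off hour count toward the previous day.
        day = (timestamp - shift).date()
        sums[day] += value
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sorted(sums)}


if __name__ == "__main__":
    readings = [
        (datetime(2018, 6, 1, 2), 10.0),   # 2AM
        (datetime(2018, 6, 1, 12), 20.0),  # noon
        (datetime(2018, 6, 2, 3), 30.0),   # 3AM the next day
    ]
    print(daily_means(readings))                    # midnight to midnight
    print(daily_means(readings, day_start_hour=4))  # 4AM to 4AM
```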


www.aggieerin.com

Current Publications with Papaja

Heyo! Frederik, the author of papaja, requested that we update him with papers written with his package. I was like, oh man, like the whole lab?! So, I decided that I could probably make it easy by making a table here


mgb-research.netlify.com

Gaussian Process Imputation/Forecast Models

As a toy problem, I am going to focus on the application of a Gaussian process model to forecasting future monthly passengers. This is not the only way one could try to solve this prediction problem


lenkiefer.com

Pomological Plots

In the real world, when I give talks and use slides I am typically constrained in my aesthetic. Often I’m speaking at a work-related thing and we have a corporate template and color scheme. They serve us well and I’ve found restraint helps focus on the message


www.tidyverse.org/articles

ggplot2 2.3.0 — upcoming release

In addition to highlighting a few features and improvements, we also want to share a bit about our release-preparation process for ggplot2, which has over 2,000 reverse


chichacha.netlify.com

16 Personalities with Circlize

There were definitely some traits that sounded like me, and a lot of statements I can relate to. It’s definitely a fun test to do. There was one part that stated “INFP - comprising just 4% of the population”


eliocamp.github.io/codigo-r

Reproducible art from the Buenos Aires Subte

Last Saturday my girlfriend and I were on the Subte, and as we passed through Ángel Gallardo station on Line B, we noticed it had illustrations on its walls made from lines of varying thickness. We chatted a bit about that style and how it could be replicated in ggplot2 using the ggridges package


chichacha.netlify.com

Daylight in Vancouver (Canada) vs Tokyo (Japan)

I currently live in British Columbia, Canada, so I live above the 49th parallel


vegawidget.rbind.io

Introducing altair, an R interface to the Altair Python Package

Introducing altair, an R package to work with the Python package Altair, which you can use to build and render Vega-Lite chart-specifications: https://vegawidget.github.io/altair Vega-Lite offers an implementation of an interactive grammar of graphics


www.stat.cmu.edu/~ryurko

It's blogging time

Stay tuned for


ramhiser.com

Setting Up a Kubernetes Cluster on AWS in 5 Minutes

Kubernetes is like magic. It is a system for working with containerized applications: deployment, scaling, management, service discovery, magic. Think Docker at scale with little hassle


blog.brianz.bz

The Dark Art of AWS VPC Networking

It’s been quite some time since a blog post went up here, mainly because of my book with Packt Publishing, Serverless Design Patterns and Best Practices. Happily, I can say that it’s published and I can turn my technical attention to other things. In chapters 2 and 3, I walk through setting up serverless REST and GraphQL APIs, respectively


rubuntu.netlify.com

Update on the Move to R 3.5.0

One of the challenges with using Launchpad is that once a package is built, it needs to be published. This takes some time (around 20 minutes). Therefore, you can’t just push a series of packages to Launchpad and walk away. In order to ensure the dependencies are built, you need to wait until they have been published in the PPA


vegawidget.rbind.io

Welcome to vegawidget

The effort to bring Vega-Lite to the R community is collaborative, so it is appropriate that the altair package be hosted by an organization. The altair R package uses the Altair Python package to create Vega-Lite specifications for interactive charts


www.openplantpathology.org

What is going on in OPP? a quick summary of the first five months

To achieve this goal, OPP evolved to: We were surprised by the quick reaction and initial engagement in our Slack workspace, where several channels were created to accommodate smaller groups with specific interests, with #epidemictheory, #phytopathometry, #reproducibility, #teaching and #r-pkg-dev among the more active


roh.engineering

Animating a Monte Carlo Simulation

Oftentimes, I run into difficulty trying to explain some of the concepts of statistical sampling with audiences that either have very limited or no understanding of statistics


matthewsmith.rbind.io

Country Networks and Flags

Recently, I was asked whether I could create an international trade network with flags as nodes. Therefore, I thought I would write a post introducing the ggflags package and how to use it in network visualisation


amateurdatasci.rbind.io

Hideous Progeny

1 The Modern Prometheus 2 Pursuit for Frankenstein Begins 3 Emptiness Filled 4 Destruction and Creation 5 Of Man, Of Life 6 A Big Ending 7 Uncontrollable Feelings 7.1 Waves of Emotions 7.2 Down the Precipice 7.3 Fear the Daemon 8 Ice and Hearts of Fire And now, once again, I bid my hideous progeny go forth and prosper


www.rostrum.blog

Pokeballs in Super Smash Bros

Matt Dray Smash! Super Smash Bros (SSB) is a beat ’em up videogame series featuring characters from various Nintendo franchises and beyond


chichacha.netlify.com

Testing Entry with R Rmarkdown File

Just figuring out how blog posts work with this random set of coffee data! A pie chart can be created using polar


amateurdatasci.rbind.io

The Crusade

1 A View of Mining Lyrics 2 Into the Mouth of R We March 3 Pull Harder on the Strings of Lyrics with geniusR 4 Text Dismantled 5 Anthem (We are the Functions) 6 Album That Spawned the Most Words 7 Tread the Words 8 Torn Between Term Frequency and Inverse Document Frequency 9 Negativity Thrives 10 These Sentiments Can’t Tear Us Apart (Warning: NSFW


www.rladiesnyc.org

Lightning Talks!

6:30-6:55: Food & Networking; 6:55-7:00: Introduction by our host; 7:00-8:30: Lightning talks; 8:30-9:00: Networking. Date: Thursday, June 14, 2018. Time:


cevo.com.au

Watchmen on the Radar

Cevo are thrilled to see the Watchmen project receive recognition in the 2018 ThoughtWorks Technology Radar


cjbarrie.netlify.com

Youth and competition boost turnout in Tunisian Municipal Elections

Municipalities with a greater number of younger registered voters experienced higher turnout. Municipalities with more candidates proportional to the size of the electorate witnessed substantially increased turnout


ropensci.org/technotes

drake's improved high-performance computing power

A typical workflow is a sequence of interdependent data transformations
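A workflow of interdependent transformations is a directed acyclic graph, and any target whose dependencies are all complete can run in parallel with the others. As an illustration only, the graph can be split into waves of independent work. drake itself is an R package; this sketch uses Python's `graphlib` (available from Python 3.9), and the target names are made up.

```python
from graphlib import TopologicalSorter


def parallel_waves(plan: dict[str, set[str]]) -> list[list[str]]:
    """Group targets into waves: every target in a wave depends only on
    targets from earlier waves, so a wave could run in parallel."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    # Hypothetical plan: target -> set of targets it depends on.
    plan = {
        "raw_data": set(),
        "fit_model_a": {"raw_data"},
        "fit_model_b": {"raw_data"},
        "report": {"fit_model_a", "fit_model_b"},
    }
    for i, wave in enumerate(parallel_waves(plan), start=1):
        print(i, wave)
```

Here the two model fits land in the same wave, which is where extra workers would pay off.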


ramhiser.com

I Was on a Machine Learning for Geosciences Podcast

I listen to a lot of podcasts — Tim Ferriss, EconTalk, Rocket, Talking Machines. But I had an opportunity to be on one called Undersampled Radio! It was a lot of fun


ritsokiguess.site/docs

Ken ventures into community ecology

Introduction Somebody mentioned ANOSIM to me, and I had this kind of vague recollection of it, meaning that I didn’t really understand anything of it at all. This prompted me to explore further, which got me into the vegan package


g-tierney.github.io

The Genetics of Magic

Last spring, I took a class on Bayesian statistics at the University of Chicago that had several exercises focused on building a model to classify species based on their genome. The basic setup was that you were given a data set of salmon, their genome sequencing data, and which sub-population they belonged to


eliocamp.github.io/codigo-r

How to make a generic stat in ggplot2

For a while I had been thinking that, although ggplot2 is great and has lots of geoms and stats, it lacked a way to extend it with user-created stats and geoms. Then I learned that ggplot2 has an excellent extension system, and I started creating my own stats


blog-mjay.firebaseapp.com

Design Pattern Tricks for PySpark

Hi there! Apache Spark was originally written in Scala, although Python developers love its wrapper, known as PySpark. One can work with RDDs and DataFrames in Python too. We, the data science team @Talentica, love PySpark and mostly rely on Spark clusters for data processing and other related tasks


sharanry.github.io

Google Summer of Code 2018 with PyMC

I have been selected for Google Summer of Code (GSoC) 2018! :D *All models in PyMC3 are defined using such a class. This blog is one of GSoC’s requirements


eliocamp.github.io/codigo-r

How to make a generic stat in ggplot2

For a while now I’ve been thinking that, yes, ggplot2 is awesome and offers a lot of geoms and stats, but it would be great if it could be extended with new user-generated geoms and stats


gcppodcast.com

SRE vs Devops with Liz Fong-Jones and Seth Vargo

I’m a researcher at a regionally accredited academic institution and I need compute resources


blog.rstudio.com

Applied Machine Learning Workshop

Join Max Kuhn of RStudio for his popular Applied Machine Learning Workshop in Washington D.C.! If you missed his sold-out course at rstudio::conf 2018, now is your chance. This two-day course will provide an overview of using R for supervised learning


www.mytinyshinys.com

EPL Week 38

Match of the Day: Very little to play for, other than stacks of place money. Well, at least we got a nine-goal


blog.wallaroolabs.com

Exploring The GitHub Archive

Note: Wallaroo will be hosting a live webinar stepping through the example in this blog post on Thursday, May 24th at 1 PM EST