blog.zenggyu.com/en
The Limitation of Accuracy of Classification Models
It didn’t occur to me that even if a classification model can perfectly estimate the probability of an outcome, the accuracy of the prediction can still be low. This post explains the phenomenon. Some useful notes on the code that makes Figure…
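A minimal Python sketch of the claim in this excerpt, not the post's own code. It assumes an idealised model that knows each case's true probability of the positive class and always predicts the likelier class; its expected accuracy is then the mean of max(p, 1 - p). The function name and the probabilities are invented for illustration.

```python
def best_expected_accuracy(true_probs):
    """Expected accuracy when every case's true positive-class
    probability is known and the likelier class is always predicted."""
    return sum(max(p, 1 - p) for p in true_probs) / len(true_probs)


# Hypothetical probabilities, invented for illustration.
noisy_outcomes = [0.5, 0.5, 0.5, 0.5]  # every outcome is a coin flip
clear_outcomes = [0.0, 1.0, 0.0, 1.0]  # every outcome is certain

print(best_expected_accuracy(noisy_outcomes))
print(best_expected_accuracy(clear_outcomes))
```

When the true probabilities sit near 0.5, even perfect probability estimates leave accuracy near chance, because the ceiling is set by how noisy the outcome is and not by the quality of the model.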
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - LASSOs and Player Positions
The data we’ve scraped only gives a player’s overall ‘ability’ and their abilities on specific skills (e.g. strength, long shots, dribbling…). We want to use this to work out how good each player is at each position…
engineering.pivotal.io
How to use 'tag_filter' in 'git' resources in Concourse CI
This blog post will show how this resource type can be used to fetch a specific git tag from the repository…
jessesadler.com
One Year Anniversary
In a nice little coincidence this is the 11th post to the blog and pushes my first introductory post to the second page of blog entries…
blog.wallaroolabs.com
Stream processing, trending hashtags, and Wallaroo
A prospective Wallaroo user contacted us and asked for an example of chaining state computations together so the output of one could be fed into another to take still further action. In particular, their first step was doing aggregation…
www.tidyverse.org/articles
dplyr 0.7.5
The next release involves substantial refactoring of the internals to make hybrid evaluation simpler and less surprising, a new implementation of grouping that better respects levels of factors, and redesign of the grouping metadata to replace the current collection of attributes by a single tidy tibble…
blog.zenggyu.com/en
A Brief Introduction to Bagged Trees, Random Forest and Their Applications in R
It should be noted that although the bagged trees are identically distributed, they are not necessarily independent. Since the bootstrap samples used to train each individual tree come from the same data set, it is not surprising that the trees may share some similar…
aosmith.rbind.io
A closer look at replicate() and purrr
Since I’m going to generate random numbers I’ll set the seed so anyone following along at home will see the same values. The output below is a list of three vectors…
guyabel.com
Animating Changes in Football Kits using R
This bit of code can take a while to execute if there are many frames (see my comments towards the end of the post). I could then run the same code as above to scrape the images, annotate the year and copyright information and build the…
sciathlon.github.io
Athlete's foot and its treatment
Hi athletes, today we will be looking at data about a health issue that affects many athletes: athlete’s foot. It’s not a very glamorous subject but it’s still interesting, and I find fungi really fascinating: they are warrior eukaryotes that survive everywhere! Most of you probably would rather never hear about it…
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - Data & Scraping
In order to calculate how good each county team would be, I needed a measure of the ability of all of the players they could field. For this I turned to the FIFA18 video game which rates players along a variety of scales. Once that’s scraped and bound we can take a peek at the data…
www.rostrum.blog
Tid-ye-text with geniusr
Matt Dray ⚠️ Warning: this post contains offensive words. ⚠️ Genius? Kanye West released his latest album – ye – last week after a(nother) pretty turbulent and controversial period of his life…
leonawicz.github.io/blog
epubr 0.4.0 CRAN release
E-book formatting is non-standard enough across all literature that no function can curate parsed e-book content across an arbitrary collection of e-books, in completely general form, resulting in a singular, consistently formatted output containing all the same variables…
yihui.name/en
Pour Positive Energy into Github Issues
I think it got the largest number of likes among all tweets that ever involved me. It is great to see that this many people appreciated the appreciation. Of course, there is no point in just blindly saying good words…
fharrell.com
Viewpoints on Heterogeneity of Treatment Effect and Precision Medicine
There are two stages in the understanding and implementation of RM: In most cases one can compute the absolute benefit as a function of (known or unknown) patient baseline risk using simple math, without requiring any data, once the relative efficacy is estimated…
djnavarro.net
Day 38: Algorithmic complexity
can be produced with a very short R program: whereas a random-looking string like However, if I use R as the compressing language, there is a very short program that produces it: All of which is by way of background. And calling it… Not surprisingly, complexity increases as a sequence becomes longer, even if it’s the same symbol being…
blog.zenggyu.com/en
Setting Up PostgreSQL
This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for…
wytham.rbind.io
A note on factors in regression (in R)
Factors terrify me. I can avoid dealing with them most of the time, but they’re immensely useful in a regression when you have a categorical variable with many levels (e.g. “Very Bad”, “Bad”, “Good”, “Very Good”)…
djnavarro.net
Day 36-37: Concerned DALEX
I was working on a longer post continuing the metaprogramming series, and realised I wasn’t going to get it done this evening. But it’s been a couple of days since I tried out something new, so I resorted to the twitters to find…
blog.sellorm.com
First steps with data pipelines
If you’re a data scientist, data engineer or otherwise someone just starting to think about creating data pipelines, you could do a lot worse than check out make. Having a consistent and flexible way of executing your data pipeline should be an essential part of any data professional’s toolkit…
blog.zenggyu.com/en
Setting Up R
This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for R. This post will concentrate on user- or project-specific files, so all the files mentioned below should be placed in a user’s home directory or in the working directory of a project. Global Options -> General:…
www.tidyverse.org/articles
conflicted
Install conflicted by running: conflicted does not export any functions. To use it, you just need to load it: Loading conflicted creates a new “conflicted” environment that is attached just after the global environment…
www.aggieerin.com
New Publications and Updated CV
Hi guys! I just wanted to post that I’ve updated the website to be current with some new publications I wanted to highlight: First up are two papers on psycholinguistics that were undergraduate student projects: Duncan, J., Buchanan, E.M., Marshall, C.Z., & Oberdieck, K. (accepted). But words will never hurt me, Journal of Psychology and Behavioral Sciences, X, XX-XX. PDF Forbes, F.-J., & Buchanan, E.…
blog.zenggyu.com/en
Setting Up Ubuntu
This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for Ubuntu. There are two settings that may be of particular interest: The following command can be used to update system time, but note that the program is no longer installed on Ubuntu by…
statsbylopez.netlify.com
The within-game evolution of MLB’s strike zone
Next, I averaged across each of the nine innings and each three-inch window of a strike zone grid to identify the likelihood of a pitch in a given part of the zone being called a strike…
eliocamp.github.io/codigo-r
Your own smooth in geom_smooth()
Something incredibly satisfying about ggplot2 is the ability to fit curves to data super easily with geom_smooth()…
www.katiescranton.com
Building my website with blogdown
This is my third attempt at building a website, including an (overly?) ambitious idea to document all of the #Rcats and #Rdogs (and #Rchickens Lucy!) on twitter…
mouse-imaging-centre.github.io/blog
An overfit representation of ICLR 2018
I was recently extremely fortunate to attend ICLR 2018, albeit as something of an interloper. Accordingly, what follows is surely a rather atypical highlight reel…
www.stat.cmu.edu/~ryurko
Bayesian Baby Steps
You’re trying to evaluate a receiver’s ability to catch a football. Let’s pretend you can take the following (completely unrealistic) strategy: you tell your quarterback to repeatedly throw the ball to your receiver in practice, recording each time whether or not they caught the ball…
www.onceupondata.com
#runconf18
The other good things were: Stefanie Butland and rOpenSci people organized everything to make sure everyone is feeling…
data-chips.com
Crocheting & plotting
If there’s one thing I’m passionate about, it’s combining my passions. So right now I’m going to visualize crochet circles using R’s ggplot2 package. Rest assured I’m going to keep the crochet jargon to a minimum. First things first, I need to load the ggplot2 package…
ndres.me
Machine learning explained with gifs
About style transfer: Pioneered in 2015, style transfer is a technique that transfers the style of a painting to an existing photograph using neural networks. The original paper is A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S…
jvera.netlify.com
My favourite snippets
A hidden gem in RStudio is the snippets feature. Although it is a well-known option in other editors (Atom, VS Code, Notepad++…), it seems to be little used among R people. As far as I know, some developers tend to code a full add-in for things that can be achieved easily just by adding a snippet to your RStudio configuration…
sciathlon.github.io
Running races and waste
Hi everyone! I am tackling a new topic today, which is: waste generated during races…
ramhiser.com
Adding Dask and Jupyter to a Kubernetes Cluster
In this post, we’re going to set up Dask and Jupyter on a Kubernetes cluster running on AWS…
bgstieber.github.io
An Introduction to the kmeans Algorithm
This post will provide an R code-heavy, math-light introduction to selecting the k in k-means. It presents the main idea of kmeans, demonstrates how to fit a kmeans in R, provides some components of the kmeans fit, and displays some methods for selecting k…
blog.mgechev.com
Fast, extensible, configurable, and beautiful linter for Go
About a year ago I decided to polish my Go skills. Although the language is pretty small compared to most others that I use on a daily basis, it still has some useful syntax constructs that I didn’t use enough. What better way to brush up your skills in a programming language than building tools with it…
ellocke.github.io
(R) Some Tricks for Blogdown & Hugo (Working Draft)
1 Fix your Table of Contents / TOC (with .Rmd) 1.1 Numbering 1.2 Custom TOC & Numbering CSS 1…
rubuntu.netlify.com
c2d4u Update
On the c2d4u PPAs, my goal is to update and add new packages (from CRAN Task Views) on a weekly basis, usually on the weekend. While I was building c2d4u3.5, I put this on hold, as I didn’t want to build new (to the PPA) packages at the same time as checking old ones…
www.mytinyshinys.com
epldata Package
I have been collating data from the English Premier League since it began in 1992 and have a complete database of every player’s appearances in league games, details of goals scored and assists made…
ellocke.github.io
(R) Troubleshooting Blogdown & Hugo for (Windows) Dummies
1 When blogdown::serve_site() stops working 2 My Problem Space Working Environment 3 How to Debug Hugo…
rubuntu.netlify.com
Announcing cran2deb4ubuntu3.5
Many things to consider before you add this PPA to your Ubuntu machine. The PPA supplies binaries for Trusty (14.04), Xenial (16.04), and Bionic (18.04). If you decide to utilize this PPA, please let me know if something is not working. There is no way I can test all 3,400+ packages and there are always little things that I miss…
ramhiser.com
Interpreting Machine Learning Algorithms
I’ve had an open tab with an overview piece on interpreting machine learning algorithms for several weeks now…
chichacha.netlify.com
Making Calendar with ggplot + Moon Phase Calendar for fun
To make a calendar, I need to extract the weekday, month, day, and week number within a month, so I can use the weekday as the x-axis, the week number within a month as the y-axis, and facet by month. First I just made a simple calendar with the code below. This time, I tried using the geom_tile function to create tiles. I’ve coloured the cells using fraction (illuminated fraction of the moon)…
bgstieber.github.io
My First Post
Welcome to my blog! I plan to use this website to present data explorations and analyses in a way that’s understandable to a broad audience. I hope to demonstrate the utility of applying ideas like machine learning, data visualization, and exploratory data analysis to day-to-day life to improve decision-making processes…
thestudyofthehousehold.com
Visualizing insect count data — a zero-inflated poisson model
Most ecologists would agree: it’s really hard to predict which animals are going to be where, and how many of them you might find when you look. Lately, there has been lots of interest in using mixed-effects models to make these predictions…
www.rostrum.blog
Cloudy with a chance of pie
Matt Dray The pinnacle of visualisation Great news everyone: I’ve taken the best of two stellar data visualisations and smashed them together into something that can only be described as perfection. Let me set the scene. There are three things we can agree on: Everyone loves pie charts, particularly when they’re in 3D, exploded and tilted. Word clouds aren’t at all overused…
cevo.com.au
Jenkins as a Service
In this session we will work through provisioning Jenkins on AWS ECS from a set of Docker containers that allow individuals or teams to self service an immutable CI/CD setup…
yihui.name/en
One (Perhaps Surprising) Reason Why I may Silently Ignore a Github Issue
What made me hesitate when looking at this issue was the incorrect format of the reproducible example…
ropensci.org/technotes
vcr
The first time the above code block is run real HTTP connections are allowed because it doesn’t match any previous requests, and the response is cached. The second time the request is made, the cached response is used…
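The record-then-replay behaviour described in this excerpt can be sketched in Python as below. This is an illustration of the idea only, not the vcr package's API: `Cassette`, `fake_fetch`, and the URL are hypothetical names, and the "real" request is simulated by a plain function so the sketch runs offline.

```python
class Cassette:
    """Stores the first response for each URL and replays it afterwards."""

    def __init__(self, fetch):
        self._fetch = fetch      # function that performs the real request
        self._recorded = {}      # url -> cached response
        self.real_calls = 0      # how many real requests were made

    def get(self, url):
        if url not in self._recorded:
            # No matching previous request: make the real call and record it.
            self.real_calls += 1
            self._recorded[url] = self._fetch(url)
        # A matching request exists: return the recorded response.
        return self._recorded[url]


def fake_fetch(url):
    # Stand-in for a real HTTP request (assumption for illustration).
    return f"response for {url}"


cassette = Cassette(fake_fetch)
first = cassette.get("https://example.org/data")   # real call, recorded
second = cassette.get("https://example.org/data")  # replayed from the cache
print(first == second, cassette.real_calls)
```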
www.rdatagen.net
A little function to help generate ICCs in simple clustered data
In health services research, experiments are often conducted at the provider or site level rather than the patient level. However, we might still be interested in the outcome at the patient level…
amateurdatasci.rbind.io
All About Git and Github in RStudio
1 Git Newbie 2 Commit, Push, and Pull 3 Let’s Git It On! 3.1 Create an account in Github 3.2 Once we have an account, we can immediately create a repository. 3.3 Configure Git in R Studio 3.4 Create new project with version control 3.5 Copy Repository URL and Create Project 4 Git Up, Git Down 4.1 R Studio’s Easy Git Interface. 4.2 First Commit 4…
sciathlon.github.io
Favorite trail race
Hi everyone! I am continuing my journey to learn awk and I finally managed to process (almost) an entire file today so let’s analyse the 2018 Tencin trail…
leonawicz.github.io/blog
trekfont
First use base graphics. Did you ever think you would be annotating your plots in Vulcan and Klingon? Next use ggplot2…
rubuntu.netlify.com
Adding jq library to Trusty and Xenial PPAs
One of the advantages of using Launchpad’s PPA system is that it allows you to easily use the work of others backporting packages to older…
gcppodcast.com
Decision Intelligence with Cassie Kozyrkov
There are several other episodes that provide insights into data science: As well as case studies on real world problems: How can I secure my Google Cloud Platform account using a…
eliocamp.github.io/codigo-r
Making a PowerPoint presentation from rmarkdown
The interface between knitr/markdown users and word/powerpoint users remains rough, since it is hard to change one’s own workflow to accommodate other people’s…
www.jessemaegan.com
R4DS June Challenge
No, however you are still encouraged to work through a book or course and share what you’ve learned on Twitter by using the #SoDS18 hashtag…
blog.wallaroolabs.com
Real-time Streaming Pattern
Introduction Many of you have been reading our engineering blog and enjoy our deep technical dives…
blog.wallaroolabs.com
Streaming with Wallaroo
Introduction Many of you have been reading our engineering blog and enjoy our deep technical dives…
cjbarrie.netlify.com
Younger electorates vote independent in Tunisian Municipal Elections
Municipalities with a greater number of younger registered voters saw a higher vote share for independent lists. Similarly, younger electorates were less likely to vote for established parties…
ropensci.org/technotes
taxize
We’ve come a long way since May 2011. We’ve added a lot of new functionality and many new contributors…
rubuntu.netlify.com
Replacing weatherunderground.com data with...weatherunderground?
Combining these two data sources recreates what weatherunderground.com used to provide. It should be noted that Dark Sky has a slightly different definition of a day for their API, calculating daily averages from 4AM to 4AM, not midnight to midnight. I compared data for previous years and the differences were negligible…
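To make the day-boundary difference concrete, here is a small Python sketch that averages the same readings with a midnight boundary and with a 4AM boundary. It is an illustration only, not Dark Sky's or the post's implementation; the function name and readings are invented, and labelling a 4AM-to-4AM day by its start date is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(readings, day_start_hour=0):
    """Average (timestamp, value) readings per day, where each day runs
    from day_start_hour to the same hour on the next calendar date."""
    shift = timedelta(hours=day_start_hour)
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Shifting back by the start hour assigns readings taken before
        # the boundary to the previous day (assumed labelling convention).
        buckets[(timestamp - shift).date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical readings, invented for illustration.
readings = [
    (datetime(2018, 6, 1, 2, 0), 10.0),
    (datetime(2018, 6, 1, 12, 0), 20.0),
    (datetime(2018, 6, 2, 2, 0), 12.0),
]

midnight_days = daily_means(readings)                   # midnight to midnight
four_am_days = daily_means(readings, day_start_hour=4)  # 4AM to 4AM
print(midnight_days)
print(four_am_days)
```

Readings taken between midnight and 4AM move to the previous day under the 4AM definition, which is the only place the two averages can differ.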
www.aggieerin.com
Current Publications with Papaja
Heyo! Frederik, the author of papaja, requested that we update him with papers written with his package. I was like, oh man, like the whole lab?! So, I decided that I could probably make it easy by making a table here…
mgb-research.netlify.com
Gaussian Process Imputation/Forecast Models
As a toy problem, I am going to focus on the application of a Gaussian process model to forecasting future monthly passengers. This is not the only way one could try to solve this prediction problem…
lenkiefer.com
Pomological Plots
In the real world, when I give talks and use slides I am typically constrained in my aesthetic. Often I’m speaking at a work-related thing and we have a corporate template and color scheme. They serve us well and I’ve found restraint helps focus on the message…
www.tidyverse.org/articles
ggplot2 2.3.0 — upcoming release
In addition to highlighting a few features and improvements, we also want to share a bit about our release-preparation process for ggplot2, which has over 2,000 reverse…
chichacha.netlify.com
16 Personalities with Circlize
There were definitely some traits that sounded like me, and a lot of statements I can relate to for sure. It’s definitely a fun test to do. There was one part that stated “INFP - comprising just 4% of the population”…
eliocamp.github.io/codigo-r
Reproducible art from the Buenos Aires Subte
Last Saturday my girlfriend and I were on the subway, passed through the Ángel Gallardo station on Line B, and noticed that it had illustrations on the walls made from lines of varying thickness. We talked a bit about that style and how it could be replicated with ggplot2 using the ggridges package…
chichacha.netlify.com
Daylight in Vancouver (Canada) vs Tokyo (Japan)
I currently live in British Columbia, Canada. So I live above the 49th parallel…
vegawidget.rbind.io
Introducing altair, an R interface to the Altair Python Package
Introducing altair, an R package to work with the Python package Altair, which you can use to build and render Vega-Lite chart-specifications: https://vegawidget.github.io/altair Vega-Lite offers an implementation of an interactive grammar of graphics…
ramhiser.com
Setting Up a Kubernetes Cluster on AWS in 5 Minutes
Kubernetes is like magic. It is a system for working with containerized applications: deployment, scaling, management, service discovery, magic. Think Docker at scale with little hassle…
blog.brianz.bz
The Dark Art of AWS VPC Networking
It’s been quite some time since a blog post went up here. The reason for this is mainly due to my book with Packt Publishing, Serverless Design Patterns and Best Practices. Happily I can say that it’s published and I can turn my technical attention to other things. In chapters 2 and 3, I walk through setting up serverless REST and GraphQL APIs, respectively…
rubuntu.netlify.com
Update on the Move to R 3.5.0
One of the challenges with using Launchpad is that once a package is built, it needs to be published. This takes some time (around 20 minutes). Therefore, you can’t just push a series of packages to Launchpad and walk away. In order to ensure the dependencies are built, you need to wait until they have been published in the PPA…
vegawidget.rbind.io
Welcome to vegawidget
The effort to bring Vega-Lite to the R community is collaborative, so it is appropriate that the altair package be hosted by an organization. The altair R package uses the Altair Python package to create Vega-Lite specifications for interactive charts…
www.openplantpathology.org
What is going on in OPP? a quick summary of the first five months
To achieve this goal, OPP evolved to: We were surprised by a quick reaction and initial engagement in our Slack workspace where several channels were created to accommodate smaller groups with a specific interest including #epidemictheory, #phytopathometry, #reproducibility, #teaching and #r-pkg-dev, as among the more active…
roh.engineering
Animating a Monte Carlo Simulation
Oftentimes, I run into difficulty trying to explain some of the concepts of statistical sampling with audiences that either have very limited or no understanding of statistics…
matthewsmith.rbind.io
Country Networks and Flags
Recently, I was asked whether I could create an international trade network with flags as nodes. Therefore, I thought I would write a post introducing the ggflags package and how to use it in network visualisation…
amateurdatasci.rbind.io
Hideous Progeny
1 The Modern Prometheus 2 Pursuit for Frankenstein Begins 3 Emptiness Filled 4 Destruction and Creation 5 Of Man, Of Life 6 A Big Ending 7 Uncontrollable Feelings 7.1 Waves of Emotions 7.2 Down the Precipice 7.3 Fear the Daemon 8 Ice and Hearts of Fire And now, once again, I bid my hideous progeny go forth and prosper…
www.rostrum.blog
Pokeballs in Super Smash Bros
Matt Dray Smash! Super Smash Bros (SSB) is a beat ’em up videogame series featuring characters from various Nintendo franchises and beyond…
chichacha.netlify.com
Testing Entry with R Rmarkdown File
Just figuring out how the blog post works with this random set of coffee data! A pie chart can be created using polar…
amateurdatasci.rbind.io
The Crusade
1 A View of Mining Lyrics 2 Into the Mouth of R We March 3 Pull Harder on the Strings of Lyrics with geniusR 4 Text Dismantled 5 Anthem (We are the Functions) 6 Album That Spawned the Most Words 7 Tread the Words 8 Torn Between Term Frequency and Inverse Document Frequency 9 Negativity Thrives 10 These Sentiments Can’t Tear Us Apart (Warning: NSFW…
www.rladiesnyc.org
Lightning Talks!
6:30-6:55: Food & Networking 6:55-7:00: Introduction by our host 7:00-8:30: Lightning talks 8:30-9:00: Networking Date: Thursday, June 14, 2018 Time:…
cevo.com.au
Watchmen on the Radar
Cevo are thrilled to see the Watchmen project receive recognition in the 2018 ThoughtWorks Technology Radar…
cjbarrie.netlify.com
Youth and competition boost turnout in Tunisian Municipal Elections
Municipalities with a greater number of younger registered voters experienced higher turnout. Municipalities with more candidates proportional to the size of the electorate witnessed substantially increased turnout…
ropensci.org/technotes
drake's improved high-performance computing power
A typical workflow is a sequence of interdependent data transformations…
ramhiser.com
I Was on a Machine Learning for Geosciences Podcast
I listen to a lot of podcasts — Tim Ferriss, EconTalk, Rocket, Talking Machines. But I had an opportunity to be on one called Undersampled Radio! It was a lot of fun…
ritsokiguess.site/docs
Ken ventures into community ecology
Introduction Somebody mentioned ANOSIM to me, and I had this kind of vague recollection of it, meaning that I didn’t really understand anything of it at all. This prompted me to explore further, which got me into the vegan package…
g-tierney.github.io
The Genetics of Magic
Last spring, I took a class on Bayesian statistics at the University of Chicago that had several exercises focused on building a model to classify species based on their genome. The basic setup was that you were given a data set of salmon, their genome sequencing data, and which sub-population they belonged to…
eliocamp.github.io/codigo-r
How to make a generic stat in ggplot2
For a while I had been thinking that, although ggplot2 is great and has lots of geoms and stats, it lacked the option to extend it with user-created stats and geoms. Then I learned that ggplot2 has an excellent system for extending it, and I started creating my own stats…
blog-mjay.firebaseapp.com
Design Pattern Tricks for PySpark
Hi there! Apache Spark was originally written in Scala, although Python developers love its wrapper, known as PySpark. One can work with RDDs and dataframes in Python too. We, the data science team @Talentica, love PySpark and mostly rely on Spark clusters for data processing and other related work…
sharanry.github.io
Google Summer of Code 2018 with PyMC
I have been selected for Google Summer of Code (GSoC) 2018! :D *All models in PyMC3 are defined using such a class. This blog is one of GSoC’s requirements…
eliocamp.github.io/codigo-r
How to make a generic stat in ggplot2
For a while now I’ve been thinking that, yes, ggplot2 is awesome and offers a lot of geoms and stats, but it would be great if it could be extended with new user-generated geoms and stats…
gcppodcast.com
SRE vs Devops with Liz Fong-Jones and Seth Vargo
I’m a researcher at a regionally accredited academic institution and I need compute resources…
blog.rstudio.com
Applied Machine Learning Workshop
Join Max Kuhn of RStudio for his popular Applied Machine Learning Workshop in Washington D.C.! If you missed his sold-out course at rstudio::conf 2018, now is your chance. This two-day course will provide an overview of using R for supervised learning…
www.mytinyshinys.com
EPL Week 38
Match of the Day: Very little to play for - other than stacks of place money Well at least we got a nine goal…
blog.wallaroolabs.com
Exploring The GitHub Archive
Note: Wallaroo will be hosting a live webinar stepping through the example in this blog post on Thursday, May 24th at 1 PM EST…