www.tidyverse.org/articles

pillar 1.2.1

In response to user feedback, the following changes were made to the output: Very small numbers are now printed


www.tidyverse.org/articles

rlang 0.2.0

The quirks in the quasiquotation syntax have been much reduced. Quosures gained a much improved printing method with colour support. The performance of quoting, splicing and quosure evaluation was vastly improved. Many bugs have been fixed. Install the latest version of rlang with: The way it prints inlined vectors is ambiguous


mailund.github.io/r-programmer-blog

tailr — Tail Recursion Optimisation

Believe it or not, all the bother with setting up this blog was so that I could write this post more easily than I could on Wordpress


mailund.github.io/r-programmer-blog

First Post

This is my first attempt at a Hugo+Blogdown blog


davemcg.github.io

Let’s Plot 4

The battle that we’ve all been waiting for. Excel vs. R. Bar plot versus a plot that actually shows the data. Yeah, this isn’t a fair fight. Bar plots are terrible. Why? Simple. They don’t show what your data looks like. A bar plot gives you zero idea how many data points there are


www.aggieerin.com

New Publications

Just wanted to do a quick post to say that the Nature Human Behaviour response paper, Justify Your Alpha, is now online at NHB’s website (Springer); it is free to view but not to download. You can download the PDF version on OSF


engineering.pivotal.io

Event Sourcing with Kafka, RabbitMQ and JPA

We recently finished building an event-sourced system for a client. This application is a demo of the architecture we produced. This application was unique in that we implemented the backend with Apache Kafka, MongoDB and MySQL


mlr-blog.netlify.com

Interpretable Machine Learning with iml and mlr

Machine learning models repeatedly outperform interpretable, parametric models like the linear regression model. The gains in performance have a price: The models operate as black boxes which are not interpretable


thestudyofthehousehold.com

Mixed modelling as a foreign language

I’ve never been there, you know. Ah OK, that explains it, then. Andrew: … sorry, but how many words did you just say? I have something like that interaction almost every day. French is an intricate language, and high school studying prepared me well for all the conjugations, rules, and exceptions


gcppodcast.com

Solution Architects with Miles Ward and Grace Mollison

How do I get a Docker image into Minikube without uploading it to an external registry and then downloading it all over again? Is there an easy way to do this


mlr-blog.netlify.com

Training Courses for mlr

The mlr: Machine Learning in R package provides a generic, object-oriented and extensible framework for classification, regression, survival analysis and clustering for the statistical programming language


ritsokiguess.site/docs

Working my way back to you, a re-investigation of rstan

Introduction I learned Stan a while back, when I was fitting some Bayesian models


yihui.name/en

'Bite-sized' Pull Requests

The cognitive load of PR reviewers will also be lighter when PRs are bite-sized. Each small PR may take one minute to review and merge, but if three small changes are placed in a moderate PR, it may take more than three minutes to review and merge (1 + 1 > 2). P.S


yihui.name/en

And They Closed a Valid Question on StackOverflow Again

I still don’t understand the benefit of closing questions on StackOverflow. If you close a question, no one can answer it (you’ll have to wait for five people to vote to re-open it), even if it is a valid question


www.samabbott.co.uk

Exploring Estimates of the Tuberculosis Case Fatality Ratio - with getTBinR

Maps can be a useful first visualisation for summarising spatial data, although they can also be misleading for more complex comparisons


saidejp.rbind.io

Fundamentals of Statistical Inference

Although psychology training offers us several courses on descriptive and inferential statistics, students rarely understand exactly what the subject is about. It is common to associate inference with the application of statistical tests (e.g


blog.wallaroolabs.com

How We Built Wallaroo to Process Millions of Messages/Sec with Microsecond Latencies

When designing Wallaroo, a high-throughput, low-latency data processing framework written in the Pony programming language, we were concerned with designing for performance from the very beginning, with our initial goal being to achieve sub-millisecond latency tails with high throughputs


lenkiefer.com

Employment growth and house price trends

LET US TAKE A LOOK AT HOUSE PRICE AND EMPLOYMENT TRENDS. House prices in the United States have been increasing at a rapid pace, about 7 percent on an annual basis. How does that relate to employment growth? And how do those trends vary by geography? Let’s take a look


cevo.com.au

It's Showtime!

Benefits of putting on a Show(case) I’ve found regular showcases one of the most effective tools in the Agile bag of tricks. Here are some of the things I’ve used showcases for: Sharing your work Well this one is pretty obvious


timtrice.net

Query Stack Overflow for Top 5 Tags and Children Posts

You should give your query a title but it is not necessary. It does help keep track of your edits as you move along. I want to find the top five tags on Stack Overflow along with the top five posts per tag. The Stack Exchange Data Explorer gives us free access to this data. The data itself is slightly out-of-date, but this is fine


purrple.cat/blog

emoji domain names with the puny package

Typical Sunday night, lost in several inception layers of I don’t know how I got here, what I am doing here and what I was looking for in the first place


www.jtimm.net

topic models for synchronic & diachronic corpus exploration

This post outlines a fairly simple workflow from annotated corpus to topic model, with a focus on the exploratory utility of topic


wenlong-liu.github.io

Brief introduction of storm hysteresis effects in solute concentration-stream discharge (C-Q) relationship

To investigate the dynamics of stream discharge and solute concentrations (C-Q relationship) in a watershed, researchers and environmental engineers usually set up monitoring stations in the watershed


www.aggieerin.com

New Publication - Detect Low Quality Data

My coauthor John Scofield and I just had a publication accepted at Behavior Research Methods - you can check out the publication preprint at OSF


ryantravis.netlify.com

Super Learning from Scratch

Introduction Super Learning is a conceptually simple way of combining predictions from different models using cross validation. It simply uses the cross-validated results to form an optimal weighted combination of predictions
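As a rough illustration of the idea in the excerpt, here is a minimal pure-Python sketch of a two-learner super learner. It assumes squared-error loss, two hypothetical candidate learners (a constant "predict the mean" model and a simple least-squares line), and a grid search over convex weights in place of the constrained optimisation a full implementation would use. None of these function names come from the original post.

```python
import random


def kfold(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical candidate 2: ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cv_predictions(fit, xs, ys, folds):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return preds


def super_learner(xs, ys, k=5, grid=101):
    """Choose the convex weight w on the linear learner that minimises the
    cross-validated mean squared error, then refit both learners on all data
    and return (w, cv_loss, ensemble_predict)."""
    folds = kfold(len(xs), k)
    p_lin = cv_predictions(fit_linear, xs, ys, folds)
    p_mean = cv_predictions(fit_mean, xs, ys, folds)
    best_w, best_loss = 0.0, float("inf")
    for j in range(grid):
        w = j / (grid - 1)
        loss = sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(p_lin, p_mean, ys)) / len(ys)
        if loss < best_loss:
            best_w, best_loss = w, loss
    lin, mean = fit_linear(xs, ys), fit_mean(xs, ys)
    return best_w, best_loss, (lambda x: best_w * lin(x) + (1 - best_w) * mean(x))
```

On data that really is linear, the cross-validated loss pushes all the weight onto the linear learner; on noisier data the weight can land in between, which is the point of letting cross-validation choose the combination.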


www.tidyverse.org/articles

usethis 1.3.0

[Jigs are made] to increase productivity through consistency, to do repetitive activities or to do a job more precisely. usethis provides a useful complement to WRE. Each function adds one specific piece of infrastructure to a package or project


www.riinu.me

Converting old Wordpress posts to Hugo

Get all your Wordpress posts into one XML file: WP Admin > Tools > Export. Looks like most of my posts were converted like a charm, with nicely formatted code blocks and images. But there are a few things I noticed that I think I have to fix: Overall I feared a lot worse and am super happy with the conversion experience


www.jessemaegan.com

R4DS March Challenge

We’re so close to March, which for many of you involves a lot of college basketball


www.aggieerin.com

Citations in R Markdown + Papaja

Heyo! I wanted to write a post about some of the quirky things I’ve found with writing manuscripts in R Markdown, as well as provide a solution to a problem that someone else might be having. Update: The csl file I describe below is a specially formatted one, which was shared with me


thestudyofthehousehold.com

Converting Anxiety into wisdom

I can be skeptical, even anxious, about ecology – particularly about methods and data quality. However, “anxiety can be cultivated into Wisdom” (McElreath, 2015). This is my mission for myself, and this post might be one of a series attempting this cultivation


ritsokiguess.site/docs

Making a lot of plots all at once, the tidyverse way

Introduction I was thinking the other day about how you might come up with a bunch of separate-but-related plots, without plotting them one by one, for example to show a class


yihui.name/en

Netlify is Hiring Its First Data Scientist

Netlify is a company that I started to appreciate very much in 2017. I could feel how committed it was to open source, its passion for making it super convenient for people to publish content to the web, and also its amazing customer service


mouse-imaging-centre.github.io/blog

Preferential Spatial Gene Expression in Neuroanatomy

Intro In this post I will demonstrate how to use my package ABIgeneRMINC to download, read and analyze mouse brain gene expression data from the Allen Brain Institute


yihui.name/en

Thanks, Alicia Schep, for Digging into knitr Engines

After figuring out a quick way to do this, I ended up becoming interested in how knitr’s language engines work, and was pleasantly surprised by how accessible the engines are - with a few lines of code you can add a new chunk option to affect the output of a javascript chunk! So thanks, Alicia


mvaugoyeau.netlify.com

How I check the data

Actually, I am analysing data from a thesis that measures urbanisation’s influence on physicochemical characteristics


blog.millerti.me

Skipping CI Jobs on GitLab

In the last year or so I’ve earnestly incorporated Continuous Integration (CI) pipelines into a couple of projects to automate the testing, building, and deployment of various sites and packages


www.tidyverse.org/articles

stringr 1.3.0

Since stringr is loaded with tidyverse, this means that you can now access glue’s functionality without loading another


gcppodcast.com

Google Play Marketing with Dom Elliott and Stewart Bryson

Dom Elliott leads global developer marketing communications for Google Play


yihui.name/en

The #1 Question to Ask Yourself when Designing a Questionnaire

Earlier last month, I took the 2018 Stack Overflow Developer Survey, and I found a few really difficult questions, such as the one that asked me to rank 10 aspects of a job opportunity in order of


www.tidyverse.org/articles

forcats 0.3.0

You can install the latest version with: We needed to make two backward incompatible changes in order to increase consistency across the


www.rdatagen.net

“I have to randomize by cluster. Is it OK if I only have 6 sites?”

Here is the bottom line: if there are differences between clusters that relate to the outcome, there is a good chance that we might confuse those inherent differences for treatment effects
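To make the worry concrete, the sketch below enumerates every way of randomising 3 of 6 sites to treatment when the treatment does nothing at all, using made-up site-level baseline outcomes (hypothetical numbers, not data from the post). Any nonzero difference in means is purely the inherent difference between clusters showing up as an "effect".

```python
from itertools import combinations
from statistics import mean

# Hypothetical site-level baseline outcomes; there is NO treatment effect here.
site_effects = {"A": -2.0, "B": -1.0, "C": 0.0, "D": 0.0, "E": 1.0, "F": 2.0}


def apparent_effects(effects, n_treated):
    """Difference in mean outcome (treated minus control) for every possible
    randomisation of n_treated sites to treatment, when treatment does nothing."""
    sites = sorted(effects)
    diffs = []
    for treated in combinations(sites, n_treated):
        control = [s for s in sites if s not in treated]
        diffs.append(mean(effects[s] for s in treated)
                     - mean(effects[s] for s in control))
    return diffs


diffs = apparent_effects(site_effects, 3)
# Share of randomisations whose spurious "effect" is at least one unit in size.
share_large = sum(abs(d) >= 1.0 for d in diffs) / len(diffs)
```

With these particular numbers, 8 of the 20 possible randomisations (40%) produce an apparent effect of at least one unit, and the worst case is an apparent effect of 2.0, even though the true effect is zero.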


www.blog.rdata.lu

BIKE SERVICES API + SHINY = NICE APP

Hi everyone! The JCDecaux API gives the data in the following format: Hence, our Shiny application gets real-time information on bike stations in 27 cities. This application works better on a computer than on a smartphone because Shiny is not fully smartphone-friendly


blog.wallaroolabs.com

Building low-overhead metrics collection for high-performance systems

Metrics play an integral part in providing confidence in a high-performance software system. Whether you’re dealing with a data processing framework or a web server, metrics provide insight into whether your system is performing as expected


cattleguard.github.io

Competitive Steak Eating and Gender

Before we get too far down the trail on this, I’ll warn readers that this is a pink and blue post. It’s simple prediction using an interesting R package. It’s important to consider the stakes (pun intended), when “enriching” a dataset with information that might introduce bias


blog.wallaroolabs.com

Latency Histograms and Percentile Distributions In Wallaroo Performance Metrics

How We Implemented Wallaroo’s Low Overhead Performance Counters, and the Philosophy Behind Our Choices. This post is based on an internal white paper from May 2016 and follows the basic paper format


jesse.tw

Simulating A/B testing and experiment data

Simulating data is super useful for testing methods and data science interview prep. Simulation is a great way to study statistics. If you’re picking up a method for the first time, (e.g


lbusett.netlify.com

Speeding up spatial analyses by integrating `sf` and `data.table`

However, this starts to have problems over really large datasets, because the total number of comparisons to be done still increases rapidly despite the use of spatial


ropensci.org/technotes

webmockr: mock HTTP requests

But I’ve been making some improvements, so you’ll probably want the dev version: Install some dependencies. Next, you’ll want to think about stubbing requests. Stubbing requests simply refers to the act of saying “I want all HTTP requests that match this pattern to return this thing”
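The stubbing idea quoted above can be sketched as a small in-memory table of (method, URL pattern, canned response) entries. This is a hypothetical Python illustration of the concept only; it is not webmockr's API, and the class and method names are made up.

```python
import re


class StubRegistry:
    """Hypothetical in-memory stub table illustrating request stubbing.
    Not webmockr's API; it only mimics the idea that requests matching
    a pattern should return a canned response."""

    def __init__(self):
        self._stubs = []

    def stub(self, method, url_pattern, response):
        # Register a (method, compiled URL regex, canned response) triple.
        self._stubs.append((method.upper(), re.compile(url_pattern), response))

    def request(self, method, url):
        # Return the canned response of the first stub whose method and URL match.
        for stub_method, pattern, response in self._stubs:
            if stub_method == method.upper() and pattern.fullmatch(url):
                return response
        # Unmatched requests fail loudly instead of reaching the network.
        raise LookupError(f"no stub registered for {method.upper()} {url}")
```

For example, after `reg.stub("get", r"https://example\.com/api/.*", {"status": 200})`, any GET under that prefix returns the canned dict, while a POST to the same URL raises. Failing on unmatched requests is a deliberate choice here: a test that silently hits the real network is exactly what stubbing is meant to prevent.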


cevo.com.au

A Lead...or a Leader?

After discussing hiring processes with friends of mine who work in IT Recruitment, I’ve noticed a common theme; hiring is quite often based on a candidate’s online profile, rather than taking a more holistic


lenkiefer.com

More house price plots

SO TODAY I SPENT SOME TIME WITH THE KIDDOS and contemplated the Enlightenment, so I didn’t have time to write up some code. But I will post a couple images that I think are interesting. I’ve got two plots for you, both using geofacets. See this post on using the geofacet package in R to make plots like these. The first plot shows U.S


jesse.tw

Why bother with covariates in A/B testing?

Motivation I’ll skip the part where I tell you why A/B testing is important. Just look at any data science team in tech, Microsoft, Airbnb, Twitter, Facebook, etc. etc


rmflight.github.io

knitrProgressBar Package

These are pretty easy to set up and


www.aggieerin.com

A Shiny App to Compare Stats

For a recent publication comparing null hypothesis testing p-values to Bayes Factors and Observation Oriented Modeling, we created a Shiny app to graph all of our complex plots


ryantravis.netlify.com

Covariate Adjustment for Binary Outcomes in Randomized Trials

Introduction A common misconception about randomized clinical trials is that the randomization process should balance any particular covariate across the arms of the trial and that therefore there is no benefit to controlling for covariates with a regression model unless a particular covariate happens to be unbalanced by


blog.sellorm.com

I am not a Data Scientist - My R journey

Today is the fifth anniversary of my joining the data science consultancy Mango Solutions. That also means it’s my fifth anniversary of using and working with R


jesse.tw

Open Canada 🇨🇦 Audit with R

Motivation Open data is an important way to get information in the hands of citizens


www.rladiesnyc.org

Parallelization of Simulations with the foreach Package and Missing Data in R

Come out to our March event to hear talks from two great R Ladies! First we’ll learn about parallelization of simulations with the foreach R package, with applications to progression free survival assessed using electronic health records. Then we’ll get an introduction to methods for handling missing data in R


lenkiefer.com

Recent House Price Trends

LAST YEAR WE TOURED recent house price trends in a post. Let’s update the data visualizations with data through December 2017. We are going to show house price trends using data from the publicly available Freddie Mac House Price Index. Animation: Here’s an updated animation showing trends in the top 20 metro areas, based on population


www.seanlnguyen.com

The World’s Most Powerful Rocket

SpaceX launched Falcon Heavy this week and I remembered how Elon Musk noted that it would have twice the thrust of any rocket currently in existence. I was intrigued by this statement and decided to look further and compare the thrusts of other rockets of the past and rockets that are planned in the future


nilsreimer.com

Workshop

I’m teaching a workshop/lecture on data visualisation as part of the Advanced Statistics course for our department’s graduate students. I’ve made all slides and materials available on GitHub. Click here for the repository, or here to download everything. If you haven’t attended the workshop, the slides might lack context


www.jennadallen.com

Football Fans

This was a project that I originally did for my Data Warehousing class in grad school using Microsoft SQL Server and SSIS. I’ve been taking a lot of DataCamp courses lately and wanted to put what I learned about the tidyverse into action


www.aggieerin.com

Learn About MOTE

The APA Task Force on Statistical Inference (Wilkinson & TFSI, 1999) has advocated the inclusion of effect sizes in journal articles as an important source of information. The fifth and current (sixth) editions of the APA publication manual (2001, 2010) emphasized these findings from the task force, along with a requirement to report effect sizes in order to publish in their journals


jesse.tw

Measuring URL health in R

Motivation I’m working on auditing Canada’s open data portal


www.aggieerin.com

Research Statement

Upon arriving at Missouri State University, I founded the Deciphering Outrageous Observations and Modeling (DOOM) lab which has included more than ten graduate and thirty undergraduate students. My research mission has been in two primary domains described in detail below and includes many collaborative efforts throughout the years.


sharanry.github.io

Skip Thought Vectors

Skip-thought vectors are an unsupervised way of representing text that produces task-independent vector representations. Current approaches mostly use compositional operators that map word vectors to sentence vectors using various deep learning methods


www.frankfarach.com

Taking flight with R

Inspired by the current exhibit on ART ∩ MATH at Seattle’s Center on Contemporary Art (COCA), I decided to replicate one of the pieces by the very talented Iranian mathematical artist Hamid Naderi Yeganeh


www.aggieerin.com

Teaching Statement

Overview. My approach to teaching centers on the ideas of accessibility, association, and application. As an effective instructor, I strive to ensure that all students are able to orient to and understand material


yihui.name/en

My Early Career Crisis (2014 - 2015)

So yeah, I’ve always got issues. Sometimes funny, and sometimes not so funny. Sometimes I make good trouble, and sometimes I make really bad trouble. So Dr Xie got a PhD degree in statistics, with an invisible “master” degree of procrastination


blog.davisvaughan.com

Tidying Excel cash flow spreadsheets using R

Below is a typical cash flow statement for 1 year of performance, broken down by month. This does not fit the “tidy” data standards, but is incredibly common in the Financial world
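The core reshaping step, turning one column per month into one row per (line item, month) pair, can be sketched in a few lines. The post itself works in R; this Python version with a made-up two-row table is only a hypothetical illustration of the wide-to-long idea, not the author's code.

```python
# Hypothetical wide cash-flow table: one row per line item, one column per month.
wide = [
    {"item": "Revenue", "Jan": 100.0, "Feb": 120.0, "Mar": 90.0},
    {"item": "Expenses", "Jan": -60.0, "Feb": -70.0, "Mar": -65.0},
]


def to_long(rows, id_col, value_cols):
    """Melt a wide table into tidy records: one (id, month, value) row per cell."""
    return [
        {id_col: row[id_col], "month": col, "value": row[col]}
        for row in rows
        for col in value_cols
    ]


tidy = to_long(wide, "item", ["Jan", "Feb", "Mar"])
```

Once every observation is its own row, per-month totals become a simple filter and sum (here, February nets to 50.0). In R the same reshape is typically done with a tidyr function such as `gather()` or, in newer versions, `pivot_longer()`.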


yihui.name/en

Another R-Podcast with Eric Nantz

Many people don’t understand the point of Markdown, especially those LaTeX users. They don’t understand why we invest so much time in such a simple and weak authoring language. My philosophy is that the pursuit of features can sometimes be harmful for both software developers and users. You may spend too much time on features of software, and even forget what you actually need to do (e.g


mouse-imaging-centre.github.io/blog

Co-Clinical Trials

Why co-clinical trials? Here at the Mouse Imaging Centre, a large portion of our research is related to neurodevelopmental disorders. Ultimately the goal of such research is to improve health outcomes for individuals with these disorders


wytham.rbind.io

Scraping NIH PIs with rvest

Background: I was doing some exploratory work for a potential project looking at intramural investigators at the NIH


cevo.com.au

Another Warrior In Our Midst

We are delighted to announce that Cevo’s Trent Hornibrook has been selected as one of the 2018 AWS Partner Network (APN) Cloud Warriors


adamspannbauer.github.io

Image Classification with Keras in R & Python

This post is a comparison between R & Python for applying the pretrained imagenet VGG19 model shipped with keras. The comparison for using the keras model across the 2 languages will be addressing the classic image classification problem of cats vs dogs


rmflight.github.io

Licensing R Packages that Include Others’ Code

If you include others’ code in your own R package, list them as contributors with comments about what they contributed, and add a license statement in the file that includes their code


gcppodcast.com

Machine Learning Bias and Fairness with Timnit Gebru and Margaret Mitchell

Sample papers on bias and fairness: Additional links: “Is there a gcp service that’s cloud identity-aware proxy except for a static site that you host via cloud


jesse.tw

Parliament's gender problem

(Preface: I’m 🇨🇦 but idk anything about parliament and dk what a hansard was before I started


www.gokhanciflikli.com

Supervised vs. Unsupervised Learning

Outcome Supervision Yesterday I was part of an introductory session on machine learning and unsurprisingly, the issue of supervised vs. unsupervised learning came up


ropensci.org/technotes

Support for hOCR and Tesseract 4 in R

Two major new features are support for hOCR and support for the upcoming Tesseract 4. Every word in the hOCR output includes metadata such as bounding box, confidence metrics, etc


www.blog.rdata.lu

Teaching Luxembourgish to my computer

Today we reveal a project that Kevin and I have been working on for the past 2 months: Liss. Liss is a sentiment analysis artificial intelligence; you can let Liss read single words or whole sentences, and Liss will tell you whether the overall sentiment is positive or negative


blog.schochastics.net

Using UMAP in R with rPython

UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data


thestudyofthehousehold.com

Easily made fitted and predicted values made easy

Earlier this week I wrote out a multilevel model. It had fit well (though slowly) and I spent a happy hour admiring the chains, checking the coefficients, plotting posterior values. Life was good and easy; merrily did I sail before a fair breeze and a clear sky. My next port of call was to plot some smooth fitted lines, aka “counterfactual predictions”. Ah yes


leonawicz.github.io/blog

Mix ggplot2 graphs with your favorite memes. memery 0.4.2 released.

Please do share your data analyst meme creations.


www.ifconfig.it/hugo

NetShot

We live in a time of intent, automation, orchestration and a lot of wonderful tools that promise to make the life of network engineers easier. Sometimes reality is simpler and maybe less fascinating: real problems need to be solved quickly and on a small budget


www.jakekaupp.com

Second star to the right and straight on 'til morning

The strength of R doesn’t lie in a single programming paradigm; it lies within the warm, welcoming and eclectic community of useRs. Like anyone who gets introduced to R, you start to look on the web for other like-minded people


batteriesnotincluded.rbind.io

The FutuRe is Bright

I’ve been a (usually) silent observer of the rstats community via twitter. Occasionally I’ll jump in and share thoughts or retweet something I found particularly helpful or inspiring, but for the most part I just sit back and observe. I’ve always admired the fact that, online, the R community seems helpful, kind, and aware of one another


www.mytinyshinys.com

EPL Week 27

For the remainder of the season, I will be travelling with a backup laptop, so please excuse any shortfall in posts and site updates. Match of the Day: Aguero-a-go-go. With the signing of Jesus last season and the pursuit of Alexis Sanchez in the recent transfer window, it appears as though Sergio Aguero is no longer the favourite son at Manchester


satopirka.com

Encoder-decoder Models and Teacher Forcing, Scheduled Sampling, Professor Forcing

Encoder-decoder models and Teacher Forcing, plus its extensions Scheduled Sampling and Professor


www.riinu.me

Hello world

I wrote my last blog post on Wordpress on 20 October 2017 and promised myself this was the last time. I’ve been blogging on Wordpress since 2014 and the more I used it the more painful it got! This is most likely caused by the fact that I have been drifting further and further away from point-and-click interfaces anyway.


yutani.rbind.io

ICYMI

Then, I did a quick poll (sorry, in Japanese) about how well known is the


jvera.netlify.com

My imposter syndrome

There’s a well-known issue concerning the psychology of most people working in data science


theaknowles.com

Reflection

Sort of. It rolled just a wee bit and then it stopped. I’d say we’ve had a pretty stellar year. Thank you to each of you who have come out to the events, who have stuck around and headed to Milo’s with us after, and have provided input, insight, and encouragement


www.jessemaegan.com

So you’ve been asked to make a reprex

Reprexes are significantly easier to read, as well as copy and paste


livefreeordichotomize.com

The United States of Seasons

I think my favorite detail about this map is the little splotch that is the Smoky Mountains on the western edge of North Carolina


www.jamesuanhoro.com

Using binary regression software to model ordinal data as a multivariate GLM

I have read that the most common model for analyzing ordinal data is the cumulative link logistic model, coupled with the proportional odds assumption. Essentially, you treat the outcome as if it were the categorical manifestation of a continuous latent variable
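For reference, the cumulative link logistic model with the proportional odds assumption described above is usually written as below, with $J$ ordered categories, increasing thresholds $\theta_1 < \dots < \theta_{J-1}$, covariates $x$ and a standard logistic latent error. This is the standard textbook parameterisation, not necessarily the exact one used in the post.

```latex
\begin{aligned}
Y^* &= x^\top\beta + \varepsilon, \qquad \varepsilon \sim \operatorname{Logistic}(0, 1),\\
Y &= j \iff \theta_{j-1} < Y^* \le \theta_j, \qquad \theta_0 = -\infty,\ \theta_J = +\infty,\\
\Pr(Y \le j \mid x) &= \Pr(\varepsilon \le \theta_j - x^\top\beta) = \operatorname{logit}^{-1}(\theta_j - x^\top\beta),\\
\operatorname{logit}\Pr(Y \le j \mid x) &= \theta_j - x^\top\beta, \qquad j = 1, \dots, J-1.
\end{aligned}
```

"Proportional odds" means the same $\beta$ appears in all $J-1$ cumulative logits; only the intercept $\theta_j$ changes. Each cumulative split $\mathbf{1}\{Y \le j\}$ is itself a binary outcome, which is presumably why binary regression software is relevant to the approach in the title.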


yutani.rbind.io

dplyr Doesn't Provide Full Support For S4 (For Now?)

I’ve seen sooo many (duplicated) issues on this topic opened on dplyr’s repo and lubridate’s repo. So, presumably, the content of this post won’t stay useful for long. But, for now, I feel this temporary “known issue” should be well known, at least among those who suffer from it


www.rladiesnyc.org

Creating websites in R

Date: Thursday, February 15, 2018. Time: 6:30pm. Speaker: Emily


www.mytinyshinys.com

EPL Week 26

For the remainder of the season, I will be travelling with a backup laptop, so please excuse any shortfall in posts. This week’s crisis team, Chelsea


www.samatkins.me

Setting up ESLint and Prettier

I’m a big fan of linting, and I love combining the configurability of ESLint with the auto-formatting capabilities of Prettier. It’s been a revelation: learning best practices from ESLint rules and formatting from Prettier, plus no more bikeshedding over coding style in pull requests at work