www.tidyverse.org/articles
pillar 1.2.1
In response to user feedback, the following changes were made to the output: Very small numbers are now printed…
www.tidyverse.org/articles
rlang 0.2.0
The quirks in the quasiquotation syntax have been much reduced. Quosures gained a much improved printing method with colour support. The performance of quoting, splicing and quosure evaluation was vastly improved. Many bugs have been fixed. Install the latest version of rlang with: The way it prints inlined vectors is ambiguous…
mailund.github.io/r-programmer-blog
tailr — Tail Recursion Optimisation
Believe it or not, all the bother with setting up this blog was so that I could write this post more easily than I could on WordPress…
davemcg.github.io
Let’s Plot 4
The battle that we’ve all been waiting for. Excel vs. R. Bar plot versus a plot that actually shows the data. Yeah, this isn’t a fair fight. Bar plots are terrible. Why? Simple. They don’t show what your data looks like. A bar plot gives you zero idea how many data points there are…
www.aggieerin.com
New Publications
Just wanted to do a quick post to say that the Nature Human Behaviour response paper, Justify Your Alpha, is now online at NHB’s website: Springer. It is free to view but not to download. You can download the PDF version on OSF…
engineering.pivotal.io
Event Sourcing with Kafka, RabbitMQ and JPA
We recently finished work on a system for a client in which we built an event-sourced system. This application is a demo of the architecture we produced. This application was unique in that we implemented the backend with Apache Kafka, MongoDB and MySQL…
mlr-blog.netlify.com
Interpretable Machine Learning with iml and mlr
Machine learning models repeatedly outperform interpretable, parametric models like the linear regression model. The gains in performance have a price: The models operate as black boxes which are not interpretable…
thestudyofthehousehold.com
Mixed modelling as a foreign language
I’ve never been there, you know. Ah, OK, that explains everything then. Andrew: … sorry, but how many words did you just say? I have something like that interaction almost every day. French is an intricate language, and studying it in high school prepared me well for all the conjugations, rules, and exceptions…
gcppodcast.com
Solution Architects with Miles Ward and Grace Mollison
How do I get a Docker image into Minikube without uploading it to an external registry and then downloading it all over again? Is there an easy way to do this…
mlr-blog.netlify.com
Training Courses for mlr
The mlr: Machine Learning in R package provides a generic, object-oriented and extensible framework for classification, regression, survival analysis and clustering for the statistical programming language…
ritsokiguess.site/docs
Working my way back to you, a re-investigation of rstan
I learned Stan a while back, when I was fitting some Bayesian models…
yihui.name/en
'Bite-sized' Pull Requests
The cognitive load of PR reviewers will also be lighter when PRs are bite-sized. Each small PR may take one minute to review and merge, but if three small changes are placed in a moderate PR, it may take more than three minutes to review and merge (1 + 1 > 2). P.S…
yihui.name/en
And They Closed a Valid Question on StackOverflow Again
I still don’t understand the benefit of closing questions on StackOverflow. If you close a question, no one can answer it (you’ll have to wait for five people to vote to re-open it), even if it is a valid question…
www.samabbott.co.uk
Exploring Estimates of the Tuberculosis Case Fatality Ratio - with getTBinR
Maps can be a useful first visualisation for summarising spatial data, although they can also be misleading for more complex comparisons…
saidejp.rbind.io
Fundamentals of Statistical Inference
Although psychology training offers us several courses on descriptive and inferential statistics, students rarely understand what exactly the subject is about. It is common to associate inference with the application of statistical tests (e.g…
blog.wallaroolabs.com
How We Built Wallaroo to Process Millions of Messages/Sec with Microsecond Latencies
When designing Wallaroo, a high-throughput, low-latency data processing framework written in the Pony programming language, we were concerned with designing for performance from the very beginning, with our initial goal being to achieve sub-millisecond latency tails with high throughputs…
lenkiefer.com
Employment growth and house price trends
LET US TAKE A LOOK AT HOUSE PRICE AND EMPLOYMENT TRENDS. House prices in the United States have been increasing at a rapid pace, about 7 percent on an annual basis. How does that relate to employment growth? And how do those trends vary by geography? Let’s take a look…
cevo.com.au
It's Showtime!
Benefits of putting on a Show(case). I’ve found regular showcases one of the most effective tools in the Agile bag of tricks. Here are some of the things I’ve used showcases for: Sharing your work. Well, this one is pretty obvious…
timtrice.net
Query Stack Overflow for Top 5 Tags and Children Posts
You should give your query a title but it is not necessary. It does help keep track of your edits as you move along. I want to find the top five tags on Stack Overflow along with the top five posts per tag. The Stack Exchange Data Explorer gives us free access to this data. The data itself is slightly out-of-date, but this is fine…
purrple.cat/blog
emoji domain names with the puny package
Typical Sunday night, lost in several inception layers of I don’t know how I got here, what I am doing here and what I was looking for in the first place…
www.jtimm.net
topic models for synchronic & diachronic corpus exploration
This post outlines a fairly simple workflow from annotated corpus to topic model, with a focus on the exploratory utility of topic…
wenlong-liu.github.io
Brief introduction of storm hysteresis effects in solute concentration-stream discharge (C-Q) relationship
To investigate the dynamics of stream discharge and solute concentrations (the C-Q relationship) in a watershed, researchers and environmental engineers usually set up monitoring stations in the watershed…
www.aggieerin.com
New Publication - Detect Low Quality Data
My coauthor John Scofield and I just had a publication accepted at Behavior Research Methods - you can check out the publication preprint at OSF…
ryantravis.netlify.com
Super Learning from Scratch
Super Learning is a conceptually simple way of combining predictions from different models using cross-validation. It simply uses the cross-validated results to form an optimal weighted combination of predictions…
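The idea in this teaser can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the post’s code: two hypothetical base learners (a mean-only model and ordinary least squares), K-fold out-of-fold predictions, and a convex weight picked by grid search rather than the constrained least-squares meta-learner usually used in practice.

```python
import numpy as np

# Simplified Super Learner sketch (illustrative assumptions, not the post's code):
# two base learners, K-fold out-of-fold predictions, convex weights by grid search.

def fit_mean(X, y):
    return y.mean()

def predict_mean(model, X):
    return np.full(len(X), model)

def _design(X):
    # prepend an intercept column to the feature matrix
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(_design(X), y, rcond=None)
    return beta

def predict_ols(model, X):
    return _design(X) @ model

LEARNERS = [(fit_mean, predict_mean), (fit_ols, predict_ols)]

def cv_predictions(X, y, learners, k=5, seed=0):
    """Column j holds learner j's predictions for rows it was not trained on."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k  # balanced random fold labels 0..k-1
    Z = np.empty((len(y), len(learners)))
    for f in range(k):
        test = folds == f
        for j, (fit, predict) in enumerate(learners):
            model = fit(X[~test], y[~test])
            Z[test, j] = predict(model, X[test])
    return Z

def convex_weights(Z, y, grid=101):
    """Pick w = (a, 1 - a) for two learners, minimising cross-validated squared error."""
    best_a, best_mse = 0.0, np.inf
    for a in np.linspace(0.0, 1.0, grid):
        mse = np.mean((y - (a * Z[:, 0] + (1 - a) * Z[:, 1])) ** 2)
        if mse < best_mse:
            best_a, best_mse = a, mse
    return np.array([best_a, 1.0 - best_a])

def super_learner(X, y, learners=LEARNERS, k=5):
    w = convex_weights(cv_predictions(X, y, learners, k), y)
    models = [fit(X, y) for fit, _ in learners]  # refit each learner on all data
    def predict(X_new):
        P = np.column_stack([pred(m, X_new) for (_, pred), m in zip(learners, models)])
        return P @ w
    return w, predict

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 1))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
    w, predict = super_learner(X, y)
    print("weights (mean, OLS):", w)
```

On data with a clear linear signal, nearly all of the weight should go to the OLS learner, because its cross-validated error is far lower; the weights are learned from held-out predictions, which is what keeps the combination from simply favouring whichever model overfits most.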
www.tidyverse.org/articles
usethis 1.3.0
[Jigs are made] to increase productivity through consistency, to do repetitive activities or to do a job more precisely. usethis provides a useful complement to WRE. Each function adds one specific piece of infrastructure to a package or project…
www.riinu.me
Converting old Wordpress posts to Hugo
Get all your WordPress posts into one XML file: WP Admin - Tools - Export. Looks like most of my posts were converted like a charm, with nicely formatted code blocks and images. But I noticed a few things that I think I have to fix: … Overall, I had feared much worse and am super happy with the conversion experience…
www.jessemaegan.com
R4DS March Challenge
We’re so close to March, which for many of you involves a lot of college basketball…
www.aggieerin.com
Citations in R Markdown + Papaja
Heyo! I wanted to write a post about some of the quirky things I’ve found with writing manuscripts in R Markdown, as well as provide a solution to a problem that someone else might be having. Update: The CSL file I describe below is a specially formatted one, which was shared with me…
thestudyofthehousehold.com
Converting Anxiety into wisdom
I can be skeptical, even anxious, about ecology – particularly about methods and data quality. However, “anxiety can be cultivated into Wisdom” (McElreath, 2015). This is my mission for myself, and this post might be one of a series attempting this cultivation…
ritsokiguess.site/docs
Making a lot of plots all at once, the tidyverse way
I was thinking the other day about how you might come up with a bunch of separate-but-related plots, without plotting them one by one, for example to show a class…
yihui.name/en
Netlify is Hiring Its First Data Scientist
Netlify is a company that I have come to appreciate very much since 2017. I could feel how committed it was to open source, its passion for making it super convenient for people to publish content to the web, and also its amazing customer service…
mouse-imaging-centre.github.io/blog
Preferential Spatial Gene Expression in Neuroanatomy
In this post I will demonstrate how to use my package ABIgeneRMINC to download, read and analyze mouse brain gene expression data from the Allen Brain Institute…
yihui.name/en
Thanks, Alicia Schep, for Digging into knitr Engines
After figuring out a quick way to do this, I ended up becoming interested in how knitr’s language engines work, and was pleasantly surprised by how accessible the engines are - with a few lines of code you can add a new chunk option to affect the output of a javascript chunk! So thanks, Alicia…
mvaugoyeau.netlify.com
How I check the data
Are there outliers? First step: outliers and boxplots. Second step: outliers, means and standard error. Verify the repeatability of the data. Actually, I am analysing data from a thesis that measures urbanisation’s influence on physicochemical characteristics…
blog.millerti.me
Skipping CI Jobs on GitLab
In the last year or so I’ve earnestly incorporated Continuous Integration (CI) pipelines into a couple of projects to automate the testing, building, and deployment of various sites and packages…
www.tidyverse.org/articles
stringr 1.3.0
Since stringr is loaded with tidyverse, this means that you can now access glue’s functionality without loading another…
gcppodcast.com
Google Play Marketing with Dom Elliott and Stewart Bryson
Dom Elliott leads global developer marketing communications for Google Play…
yihui.name/en
The #1 Question to Ask Yourself when Designing a Questionnaire
Earlier last month, I took the 2018 Stack Overflow Developer Survey, and I found a few really difficult questions, such as the one that asked me to rank 10 aspects of a job opportunity in order of…
www.tidyverse.org/articles
forcats 0.3.0
You can install the latest version with: We needed to make two backward incompatible changes in order to increase consistency across the…
www.rdatagen.net
“I have to randomize by cluster. Is it OK if I only have 6 sites?”
Here is the bottom line: if there are differences between clusters that relate to the outcome, there is a good chance that we might confuse those inherent differences for treatment effects…
www.blog.rdata.lu
BIKE SERVICES API + SHINY = NICE APP
Hi everyone, the JCDecaux API gives the data in the following format: Hence, our Shiny application gets real-time information on bike stations in 27 cities. This application works better on a computer than on a smartphone because Shiny is not fully smartphone-friendly…
blog.wallaroolabs.com
Building low-overhead metrics collection for high-performance systems
Metrics play an integral part in providing confidence in a high-performance software system. Whether you’re dealing with a data processing framework or a web server, metrics provide insight into whether your system is performing as expected…
cattleguard.github.io
Competitive Steak Eating and Gender
Before we get too far down the trail on this, I’ll warn readers that this is a pink and blue post. It’s simple prediction using an interesting R package. It’s important to consider the stakes (pun intended), when “enriching” a dataset with information that might introduce bias…
blog.wallaroolabs.com
Latency Histograms and Percentile Distributions In Wallaroo Performance Metrics
How We Implemented Wallaroo’s Low Overhead Performance Counters, and the Philosophy Behind Our Choices. This post is based on an internal white paper from May 2016 and follows the basic paper format…
jesse.tw
Simulating A/B testing and experiment data
Simulating data is super useful for testing methods and data science interview prep. Simulation is a great way to study statistics. If you’re picking up a method for the first time, (e.g…
lbusett.netlify.com
Speeding up spatial analyses by integrating `sf` and `data.table`
However, this starts to have problems with really large datasets, because the total number of comparisons to be done still increases rapidly despite the use of spatial…
ropensci.org/technotes
webmockr: mock HTTP requests
But I’ve been making some improvements, so you’ll probably want the dev version. Install some dependencies. Next, you’ll want to think about stubbing requests. Stubbing requests simply refers to the act of saying “I want all HTTP requests that match this pattern to return this thing”…
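To make the stubbing idea concrete, here is a minimal, language-neutral sketch of “requests matching this pattern return this thing”. It is written in Python and is not webmockr’s API: the `StubRegistry` class, its `add_stub` and `request` methods, and the example URL are all hypothetical names for illustration.

```python
import re

class StubRegistry:
    """Hypothetical, minimal illustration of request stubbing (not webmockr's API).
    Each stub maps an HTTP method and a URL regex to a canned response."""

    def __init__(self):
        self._stubs = []

    def add_stub(self, method, url_pattern, response):
        # "ANY" matches every method; the pattern must match the whole URL
        self._stubs.append((method.upper(), re.compile(url_pattern), response))

    def request(self, method, url):
        # The first registered stub whose method and pattern match wins
        for stub_method, pattern, response in self._stubs:
            if stub_method in ("ANY", method.upper()) and pattern.fullmatch(url):
                return response
        # Unmatched requests fail loudly instead of reaching the network
        raise LookupError(f"no stub registered for {method.upper()} {url}")

if __name__ == "__main__":
    stubs = StubRegistry()
    stubs.add_stub("GET", r"https://api\.example\.org/items/\d+", {"status": 200, "body": "{}"})
    print(stubs.request("get", "https://api.example.org/items/42"))
```

Raising on unmatched requests mirrors the usual motivation for mocking in tests: nothing unexpected slips through to a real server.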
cevo.com.au
A Lead...or a Leader?
After discussing hiring processes with friends of mine who work in IT Recruitment, I’ve noticed a common theme; hiring is quite often based on a candidate’s online profile, rather than taking a more holistic…
lenkiefer.com
More house price plots
SO TODAY I SPENT SOME TIME WITH THE KIDDOS and contemplated the Enlightenment, so I didn’t have time to write up some code. But I will post a couple images that I think are interesting. I’ve got two plots for you, both using geofacets. See this post on using the geofacet package in R to make plots like these. The first plot shows U.S…
jesse.tw
Why bother with covariates in A/B testing?
I’ll skip the part where I tell you why A/B testing is important. Just look at any data science team in tech: Microsoft, Airbnb, Twitter, Facebook, etc…
www.aggieerin.com
A Shiny App to Compare Stats
For a recent publication comparing null hypothesis testing p-values to Bayes Factors and Observation Oriented Modeling, we created a Shiny app to graph all of our complex plots…
ryantravis.netlify.com
Covariate Adjustment for Binary Outcomes in Randomized Trials
A common misconception about randomized clinical trials is that the randomization process should balance any particular covariate across the arms of the trial and that therefore there is no benefit to controlling for covariates with a regression model unless a particular covariate happens to be unbalanced by…
blog.sellorm.com
I am not a Data Scientist - My R journey
Today is the fifth anniversary of my joining the data science consultancy Mango Solutions. That also means it’s my fifth anniversary of using and working with R…
jesse.tw
Open Canada 🇨🇦 Audit with R
Open data is an important way to get information into the hands of citizens…
www.rladiesnyc.org
Parallelization of Simulations with the foreach Package and Missing Data in R
Come out to our March event to hear talks from two great R Ladies! First we’ll learn about parallelization of simulations with the foreach R package, with applications to progression free survival assessed using electronic health records. Then we’ll get an introduction to methods for handling missing data in R…
lenkiefer.com
Recent House Price Trends
LAST YEAR WE TOURED recent house price trends in a post. Let’s update the data visualizations with data through December 2017. We are going to show house price trends using data from the publicly available Freddie Mac House Price Index. Animation: Here’s an updated animation showing trends in the top 20 metro areas, based on population…
www.seanlnguyen.com
The World’s Most Powerful Rocket
SpaceX launched Falcon Heavy this week and I remembered how Elon Musk noted that it would have twice the thrust of any rocket currently in existence. I was intrigued by this statement and decided to look further and compare the thrusts of other rockets of the past and rockets that are planned in the future…
nilsreimer.com
Workshop
I’m teaching a workshop/lecture on data visualisation as part of the Advanced Statistics course for our department’s graduate students. I’ve made all slides and materials available on GitHub. Click here for the repository, or here to download everything. If you haven’t attended the workshop, the slides might lack context…
www.jennadallen.com
Football Fans
This was a project that I originally did for my Data Warehousing class in grad school using Microsoft SQL Server and SSIS. I’ve been taking a lot of DataCamp courses lately and wanted to put what I learned about the tidyverse into action…
www.aggieerin.com
Learn About MOTE
The APA Task Force on Statistical Inference (Wilkinson & TFSI, 1999) has advocated the inclusion of effect sizes in journal articles as an important source of information. The fifth and current editions of the APA publication manual (2001, 2010) emphasized these findings from the task force, along with a requirement to report effect sizes in order to publish in their journals…
www.aggieerin.com
Research Statement
Upon arriving at Missouri State University, I founded the Deciphering Outrageous Observations and Modeling (DOOM) lab which has included more than ten graduate and thirty undergraduate students. My research mission has been in two primary domains described in detail below and includes many collaborative efforts throughout the years.…
sharanry.github.io
Skip Thought Vectors
An unsupervised way of representing text to produce task-independent vector representations. Current approaches mostly use compositional operators that map word vectors to sentence vectors using various deep learning methods…
www.frankfarach.com
Taking flight with R
Inspired by the current exhibit on ART ∩ MATH at Seattle’s Center on Contemporary Art (COCA), I decided to replicate one of the pieces by the very talented Iranian mathematical artist, Hamid Naderi Yeganeh…
www.aggieerin.com
Teaching Statement
Overview. My approach to teaching centers on the ideas of accessibility, association, and application. As an effective instructor, I strive to ensure that all students are able to orient to and understand material…
yihui.name/en
My Early Career Crisis (2014 - 2015)
So yeah, I’ve always got issues. Sometimes funny, and sometimes not so funny. Sometimes I make good trouble, and sometimes I make really bad trouble. So Dr Xie got a PhD degree in statistics, with an invisible “master” degree of procrastination…
blog.davisvaughan.com
Tidying Excel cash flow spreadsheets using R
Below is a typical cash flow statement for 1 year of performance, broken down by month. This does not fit the “tidy” data standards, but is incredibly common in the financial world…
yihui.name/en
Another R-Podcast with Eric Nantz
Many people don’t understand the point of Markdown, especially those LaTeX users. They don’t understand why we invest so much time in such a simple and weak authoring language. My philosophy is that the pursuit of features can sometimes be harmful for both software developers and users. You may spend too much time on features of software, and even forget what you actually need to do (e.g…
mouse-imaging-centre.github.io/blog
Co-Clinical Trials
Why co-clinical trials? Here at the Mouse Imaging Centre, a large portion of our research is related to neurodevelopmental disorders. Ultimately the goal of such research is to improve health outcomes for individuals with these disorders…
wytham.rbind.io
Scraping NIH PIs with rvest
Background: I was doing some exploratory work for a potential project looking at intramural investigators at the NIH…
cevo.com.au
Another Warrior In Our Midst
We are delighted to announce that Cevo’s Trent Hornibrook has been selected as one of the 2018 AWS Partner Network (APN) Cloud Warriors…
adamspannbauer.github.io
Image Classification with Keras in R & Python
This post is a comparison between R & Python for applying the pretrained ImageNet VGG19 model shipped with keras. The comparison across the two languages addresses the classic image classification problem of cats vs. dogs…
rmflight.github.io
Licensing R Packages that Include Others’ Code
If you include other people’s code in your own R package, list them as contributors with comments about what they contributed, and add a license statement in the file that includes their code…
gcppodcast.com
Machine Learning Bias and Fairness with Timnit Gebru and Margaret Mitchell
Sample papers on bias and fairness: … Additional links: … “Is there a GCP service that’s Cloud Identity-Aware Proxy, except for a static site that you host via Cloud…
jesse.tw
Parliament's gender problem
(Preface: I’m 🇨🇦 but idk anything about parliament and dk what a hansard was before I started…
www.gokhanciflikli.com
Supervised vs. Unsupervised Learning
Yesterday I was part of an introductory session on machine learning and, unsurprisingly, the issue of supervised vs. unsupervised learning came up…
ropensci.org/technotes
Support for hOCR and Tesseract 4 in R
Two major new features are support for hOCR and support for the upcoming Tesseract 4. Every word in the hOCR output includes metadata such as bounding box, confidence metrics, etc…
www.blog.rdata.lu
Teaching Luxembourgish to my computer
Today we reveal a project that Kevin and I have been working on for the past 2 months, Liss. Liss is a sentiment analysis artificial intelligence; you can let Liss read single words or whole sentences, and Liss will tell you if the overall sentiment is either positive or negative…
blog.schochastics.net
Using UMAP in R with rPython
UMAP is constructed from a theoretical framework based in Riemannian geometry and algebraic topology. The result is a practical scalable algorithm that applies to real world data…
thestudyofthehousehold.com
Easily made fitted and predicted values made easy
Earlier this week I wrote out a multilevel model. It had fit well (though slowly) and I spent a happy hour admiring the chains, checking the coefficients, plotting posterior values. Life was good and easy; merrily did I sail before a fair breeze and a clear sky. My next port of call was to plot some smooth fitted lines, aka “counterfactual predictions”. Ah yes…
leonawicz.github.io/blog
Mix ggplot2 graphs with your favorite memes. memery 0.4.2 released.
Please do share your data analyst meme creations.…
www.ifconfig.it/hugo
NetShot
We live in a time of intent, automation, orchestration and a lot of wonderful tools that promise to make the life of network engineers easier. Sometimes reality is simpler and maybe less fascinating: real problems need to be solved quickly with a small budget…
www.jakekaupp.com
Second star to the right and straight on 'til morning
The strength of R doesn’t lie in a single programming paradigm; it lies within the warm, welcoming and eclectic community of useRs. Like anyone who gets introduced to R, you start to look on the web for other like-minded people…
batteriesnotincluded.rbind.io
The FutuRe is Bright
I’ve been a (usually) silent observer of the rstats community via twitter. Occasionally I’ll jump in and share thoughts or retweet something I found particularly helpful or inspiring, but for the most part I just sit back and observe. I’ve always admired the fact that, online, the R community seems helpful, kind, and aware of one another…
www.mytinyshinys.com
EPL Week 27
For the remainder of the season, I will be travelling with a backup laptop, so please excuse any shortfall in posts and site updates. Match of the Day. Aguero-a-go-go: With the signing of Jesus last season and the pursuit of Alexis Sanchez in the recent transfer window, it appears as though Sergio Aguero is no longer the favourite son at Manchester…
satopirka.com
Encoder-decoder Models and Teacher Forcing, Scheduled Sampling, Professor Forcing
Encoder-decoder models and Teacher Forcing, Scheduled Sampling (which extends it), Professor…
www.riinu.me
Hello world
I wrote my last blog post on WordPress on 20 October 2017 and promised myself this was the last time. I’ve been blogging on WordPress since 2014 and the more I used it, the more painful it got! This is most likely caused by the fact that I have been drifting further and further away from point-and-click interfaces anyway…
jvera.netlify.com
My imposter syndrome
There’s a well-known issue with the psychology of most people working in data science…
theaknowles.com
Reflection
Sort of. It rolled just a wee bit and then it stopped. I’d say we’ve had a pretty stellar year. Thank you to each of you who have come out to the events, who have stuck around and headed to Milo’s with us after, and have provided input, insight, and encouragement…
www.jessemaegan.com
So you’ve been asked to make a reprex
Reprexes are significantly easier to read, as well as copy and paste…
livefreeordichotomize.com
The United States of Seasons
I think my favorite detail about this map is the little splotch that is the Smoky Mountains on the western edge of North Carolina…
www.jamesuanhoro.com
Using binary regression software to model ordinal data as a multivariate GLM
I have read that the most common model for analyzing ordinal data is the cumulative link logistic model, coupled with the proportional odds assumption. Essentially, you treat the outcome as if it were the categorical manifestation of a continuous latent variable…
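The latent-variable reading of the cumulative link model can be checked numerically. The sketch below is an illustration, not the post’s code; the function names are hypothetical. It uses the standard proportional-odds form P(Y ≤ k) = logistic(θ_k − η), cuts a latent Y* = η + logistic noise at the thresholds θ_k, and builds the binary splits 1{Y ≤ k}, each of which is a logistic regression with its own intercept θ_k but a shared slope.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def category_probs(thresholds, eta):
    """P(Y = k), k = 1..K, under a cumulative logit (proportional odds) model.

    thresholds: increasing cut points theta_1 < ... < theta_{K-1}
    eta: linear predictor x'beta, shared across all cut points
    """
    cum = logistic(np.asarray(thresholds, dtype=float) - eta)  # P(Y <= k), k = 1..K-1
    return np.diff(np.concatenate([[0.0], cum, [1.0]]))

def simulate_latent(thresholds, eta, n, seed=0):
    """Draw Y by cutting a latent Y* = eta + standard logistic noise at the thresholds."""
    rng = np.random.default_rng(seed)
    y_star = eta + rng.logistic(loc=0.0, scale=1.0, size=n)
    # Y = 1 + number of thresholds lying below Y*
    return 1 + np.searchsorted(thresholds, y_star)

def cumulative_splits(y, K):
    """Binary outcomes 1{Y <= k} for k = 1..K-1, one column per split."""
    return np.column_stack([(y <= k).astype(int) for k in range(1, K)])

if __name__ == "__main__":
    theta, eta = [-1.0, 1.0], 0.5
    y = simulate_latent(theta, eta, 100_000)
    print("model:    ", category_probs(theta, eta))
    print("simulated:", np.bincount(y, minlength=4)[1:] / len(y))
```

The simulated category frequencies should match the model probabilities closely, which is the sense in which the ordinal outcome is the “categorical manifestation” of the continuous latent variable.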
yutani.rbind.io
dplyr Doesn't Provide Full Support For S4 (For Now?)
I’ve seen sooo many (duplicated) issues on this topic opened on dplyr’s repo and lubridate’s repo. So the content of this post probably won’t stay useful over time. But, for now, I feel this temporary “known issue” should be well known, at least among those who suffer from it…
www.rladiesnyc.org
Creating websites in R
Date: Thursday, February 15, 2018 Time: 6:30pm Speaker: Emily…
www.mytinyshinys.com
EPL Week 26
For the remainder of the season, I will be travelling with a backup laptop, so please excuse any shortfall in posts. This week’s crisis team, Chelsea…
www.samatkins.me
Setting up ESLint and Prettier
Set up ESLint and Prettier. I’m a big fan of linting, and I love the configurability of ESLint combined with the auto-formatting capabilities of Prettier. It’s been a revelation: learning best practices in terms of ESLint rules and formatting from Prettier, plus no more bikeshedding over coding style in pull requests at work…