www.rostrum.blog

EARL 2018: Crosstalk in memes

Matt Dray: I gave a talk called Crosstalk: Shiny-like without Shiny at the EARL 2018 conference in London. The crosstalk package by Joe Cheng allows HTML widgets – JavaScript visualisations wrapped in R code – to interact with each other


rviews.rstudio.com

GDP Data via API

Let’s make our changes to both goods and services in the data. I’m also going to replace a few other accounts with shorter names, e.g., I will use “Govt” for “Government consumption expenditures and gross investment”. We now have 25 accounts, each with 285 observations


blog.rstudio.com

Getting started with deep learning in R

There are good reasons to get into deep learning: Deep learning has been outperforming the respective “classical” techniques in areas like image recognition and natural language processing for a while now, and it has the potential to bring interesting insights even to the analysis of tabular


gcppodcast.com

Google AI with Jeff Dean

Jeff received a Ph.D. in Computer Science from the University of Washington in 1996, working with Craig Chambers on whole-program optimization techniques for object-oriented languages. He received a B.S


www.williamrchase.com

Is Hadley Wickham a Cat or Dog Person

Based on who Hadley follows, I’m going to give this round to dogs. Sure, Hadley follows a couple of people that mention cats in their descriptions, but I can forgive him. After all, some of my best friends are cat people (don’t worry, I’m working on converting them)


jenrichmond.rbind.io

next up anova

Next I need to learn how to conduct ANOVA in R. The formula: specify which variable is your outcome and which are your grouping variables. The data: which dataframe you are analysing. In a clinical trial where you are looking to see if the drug improved mood scores you might specify..


www.rdatagen.net

Binary, beta, beta-binomial

A couple of interesting things to note here. First is that the coefficient estimates are pretty similar to the beta regression model. However, the standard errors are slightly higher, as they should be, since we are using only observed probabilities and not the true (albeit randomly selected or generated) probabilities


blog.wallaroolabs.com

Converting a Batch Job to Real-time

Often called stream processing, real-time processing allows applications to run computations and filter data at any scale. At Wallaroo Labs, we build and offer support for an event-based stream processing framework called Wallaroo


wenlong-liu.github.io

Generate animated tracking maps for hurricanes and typhoons

Further data cleaning is needed to reformat the datetime and rename a column. We can also save the animation into gif files, instead of embedding the animation. R and related packages are able to conveniently draw both static and animated maps for tracking hurricanes or typhoons


www.ashwinmalshe.com

Homework 2

(7.5 points) B. Using the same data frames as above, recreate the following graph. Take a note of differences in the two graphs. NOTE: The font used here for labeling the bar graphs is “Open Sans”. You may not have the same font available on your computer. In that case, use any alternative font. However, refrain from using the default font OR the font you used in the previous graph. (7


lenkiefer.com

JOLTS update

It’s been a while since I posted here. I’ve got some longer form things in the works, but let’s ease back into it. Let’s take a look at the latest Job Openings and Labor Turnover Survey (JOLTS) data via the U.S. Bureau of Labor Statistics. This post is an update of this post. Per usual we will make our graphics with R


ropensci.org/blog

What have these birds been studied for? Querying science outputs with R

For the sake of simplicity, we shall only use the 50 species observed the most often. We first define a function retrieving the titles and abstracts of works obtained as result when querying one species name. We then apply this function to all 50 species and keep each article only once


jenrichmond.rbind.io

more wrangling tips

It is definitely true that it takes much longer to get your data ready for analysis than it does to actually analyse it. Apparently up to 80% of the data analysis time is spent wrangling data (and cursing and swearing)


cevo.com.au

Disrupting an industry to help the little guys grow

Earlier this year, Simon Bond sat down for a chat with a couple of our long-standing customers, CTO Greg Frye and Head of Development Nish Mahanty from iRexchange


engineering.pivotal.io

Let's Contribute to Golang!

I want to share some particular insights I gained after attending the Contribution Workshop at GopherCon 2018. The purpose of this post is to allow you to be able to contribute to Golang as easily as possible and to provide you with some helpful tips. These tips are coalesced from multiple sources and my own troubleshooting. We are going to cover Gerrit..


www.njtierney.com

New Paper Submission

This is the first full length paper I have written about software, and I am really grateful to have had the guidance of my co-author Di Cook - I’m really proud of this work. I’d also like to share the acknowledgements section of the


yihui.name/en

The First Notebook War

While reading Joel’s critiques on Jupyter notebooks, I couldn’t help thinking whether they apply to R Markdown notebooks, or R Markdown documents in general, so I’ll mention how some of the problems have been addressed in the R Markdown ecosystem in this post, too


nowosad.github.io

sabre

Creating or determining regions is a useful way to describe the world. Regionalization not only allows for a quicker understanding of spatial patterns but can also influence how regions are managed. Regions are created in various disciplines. We can delineate regions based on a single property (e.g. landform regions or climate regions) or several factors (e.g.


www.stevejburr.com

Exploring test cricket boundary rates in R

This past Friday, I was in the pub with a couple of colleagues watching the cricket. As you’d expect for a bunch of people who deal with numbers all day, there was a lot of discussion of various statistics


www.njtierney.com

I graduated!

Some exciting news: I finally walked across the floor and graduated from my PhD in Statistical Sciences from QUT! The graduation was live streamed onto the TVs in our department at QUT - here’s a photo of me at the exact moment my brother yelled out “YEEAAAAHHHH NIIIICCCKK YYEEEAAAAHH” in the acoustically well designed Concert Hall at


energychisquared.com

The odyssey of selling electricity in the SEIEs

The extra-peninsular systems (formerly SEIEs, now SENP) have historically been a headache for lawmakers


www.ashwinmalshe.com

Some ggplot2 Features

Compare the frequency distribution to the scatterplot and notice that you have many more points output in the table. Why? This is due to the overlapping points. A potential solution is to change the transparency of the points


www.stevejburr.com

Tidy Tuesday 04-09-2018

Over the last few months, I’d been taking part in #MakeoverMonday to practice different types of visualisation. I’ve not written these up yet, but plenty of examples can be seen on my Twitter


www.thecrosstab.com

Women, Not a 'Liberal Tea Party,' are Changing the Democratic Party

Here’s the graph showing that ideological differences between the incumbent candidates in either district provide a compelling refutation of the flank-them-from-the-left hypothesis. Pressley’s victory could only be one from the left if Capuano was moderate-ish, like Crowley


www.robert-hickman.eu

sf.chlorodot mini-package

The basic idea of the dot choropleths is to visualise not only the location clustering of each variable but the number of observations (something traditional ‘filled’ choropleths don’t do). More importantly, the maps also just look really really cool


www.stevejburr.com

#SWDchallenge - September 2018

This was the second time that I’ve taken part in the #SWDchallenge. Full details of the challenge can be seen here. The summary is that the goal is to remake this pie chart into something better


www.ashwinmalshe.com

Intuition behind Cross-Validation

Cross-validation error is an estimate of the out-of-sample error. Cross-validation is a great tool for helping modelers select a model with low out-of-sample error. The objective of this note is to show you how to write simple code to carry out cross-validation in R. I will post similar code for SAS later
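
As a rough illustration of the idea in this excerpt, here is a minimal k-fold cross-validation sketch. It is written in Python rather than the R (or SAS) code the post refers to, and every name in it (`k_fold_cv_error`, `fit_mean`, `fit_line`, the toy data) is a hypothetical choice made for this example, not the author's code.

```python
import random

def k_fold_cv_error(xs, ys, k, fit, predict, seed=0):
    """Estimate out-of-sample mean squared error with k-fold cross-validation."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            squared_errors.append((ys[i] - predict(model, xs[i])) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Candidate model 1: always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# Candidate model 2: simple least-squares line.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return (mean_y - slope * mean_x, slope)

def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x

# Toy, noise-free data (an assumption made purely for illustration).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_mean = k_fold_cv_error(xs, ys, 5, fit_mean, predict_mean)
cv_line = k_fold_cv_error(xs, ys, 5, fit_line, predict_line)
```

Comparing `cv_mean` with `cv_line` is the model-selection step: the model with the lower cross-validation error is the one expected to do better out of sample.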


jenrichmond.rbind.io

testing out t-tests

Here is what I learned about t-tests from doing the analysis below. The AFL data that comes with Dani’s book includes attendance and score information for home and away teams over regular and finals games for years and years. Disclaimer: I know nothing about AFL


jenrichmond.rbind.io

using R for analysis

I am feeling more confident about my resolution to get rid of Excel and only use R for data wrangling and visualisation. Next steps… analysis


coolbutuseless.github.io

A stricter `%in%`

I’m not trying to be as elegant as he is, so I’m just going to make something work in isolation


simplystatistics.org

Being at the Center

A mentor once told me that in any large-ish coordinated scientific collaboration there will usually be regular meetings to discuss the data collection, data analysis, or both. Basically, a meeting to discuss data


divingintogeneticsandgenomics.rbind.io

Compute averages/sums on GRanges or equal length bins

Tile the whole genome into 100 bp bins, then compute the binned average for my_var. It turns out that there are functions to convert between metadata columns and RleList


amateurdatasci.rbind.io

Sliding a Ladder and Filling a Bowl

Problem 116, page 142 in Simmons (2016): A ladder 20 ft long is leaning against a wall 12 ft high, with its top projecting over the wall


jenrichmond.rbind.io

creating data using rep()

Some code that is probably going to be useful in the future: To get AAABBB use To get 1 through 8, repeated 3 times use This creates a new variable called Stimulus that grabs the 8th value of CommentName and fills the column with it


jenrichmond.rbind.io

mutate + if else = new conditional variable

Most recently I needed to extract a Stimulus number from a variable called CommentName, and then turn those numbers into levels of Model and Emotion in separate


dusty.phillips.codes

An Order to Learn to Program, Part 4

Part 4: Binary, bits, and bytes. This is part 4 of my series on the order to study topics related to programming


www.ashwinmalshe.com

Celebrating India's Decriminalization of Homosexuality

A few months back, I made a simple t-shirt design using R. That time, it was an R exercise for me and I didn’t share it with many people. This is my small gift to LGBTQ Indians


blog.zenggyu.com/en

Git Objects in a Nutshell

The main purpose of Git as a version control system is to keep track of files. The content of each file at any point in time, as well as other information that is necessary to reproduce the change history, is stored as objects in a Git repository. Therefore, understanding the types of objects and how they relate is essential to understanding how Git works, and hence knowing how to use it
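
To make "file content stored as objects" a little more concrete, the sketch below computes the ID Git assigns to a blob object. It assumes the default SHA-1 object format (not a SHA-256 repository); the function name `git_blob_id` is a hypothetical helper for this example and does not come from the post.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID Git would give a blob with this content.

    Git hashes a small header ("blob <size>" followed by a NUL byte)
    together with the raw file content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty_blob_id = git_blob_id(b"")
hello_blob_id = git_blob_id(b"hello\n")
```

The result should match `git hash-object` run on a file with the same bytes, which is one way to check the sketch against a real repository.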


engineering.pivotal.io

Safely Upgrading PAS 2.2 → 2.3 with NSX-T Load Balancers

When customers with vSphere+NSX-T-based foundations upgrade PAS (Pivotal Application Service) from 2.2 to 2


jenrichmond.rbind.io

Use map to read many csv files

Get list of .csv files called files. The code below looks for files that have


www.tidyverse.org/articles

processx 3.2.0

processx deals with two kinds of external processes: foreground and background. Foreground processes are synchronous: R waits until they finish, and collects the output and the exit code of the process


coolbutuseless.github.io

strict `case_when`

I want to eliminate ways in which errors or oversights can creep in, so I’d like special handling for the following cases: Before starting, let me state clearly that my main use case for this strict version is ensuring that continuous values are correctly turned into categories, when using complicated rules involving multiple


gcppodcast.com

ATLAS with Dr. Mario Lassnig

I am not familiar with Docker or Kubernetes - where can I get started? Docker


jenrichmond.rbind.io

I don’t like cats much

Tom Kelly pointed me towards the @swcarpentry resources. You can use dplyr::bind_rows() instead of reduce(rbind()). BUT if you want them all in one frame at the end you probably just want purrr::map_dfr(), which is a map and bind combo function


ropensci.org/technotes

In praise of Commonmark

In this note I’ll use my local fork of rOpenSci’s website source, and use all the Markdown sources of blog posts as example data. The chunk below is therefore not portable, sorry about that. My fork master branch isn’t entirely synced. It has 202 posts


rviews.rstudio.com

How to Build a Shiny 'Truck'!

I concluded that most people just don’t need to build them that big! So now, I would like to explain why we needed such a large app and how we went about building it. To give you an idea of the scale I am talking about, an automotive metaphor might be useful. A typical Shiny app I see in my daily work has about 50 or even fewer interaction items


visualizingtheleague.com

Manu Ginobili

Manu Ginobili retired last week after a 16-year, 4-championship career that will likely see him land in the HOF and was undoubtedly great


cevo.com.au

Scaling AWS ECS services with Alarms, Target tracking & CloudFormation

It is quite hard to come up with efficient scaling policies for Amazon Elastic Container Service (ECS)


blog.rstudio.com

Shiny Server (Pro) 1.5.8

Upgrade to Node v8.11.3. Added support for listening on IPv6 addresses. X-Powered-By response header now reports “Shiny Server” instead of “Express”


blog.zenggyu.com/en

The Usage of ANSI C Escape Sequences in Various Programming Languages

An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly. Escape sequences are widely used in C and many other languages, such as R, (Postgre)SQL and
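
A small sketch of the definition above, shown in Python as a stand-in (the post itself covers C, R, and (Postgre)SQL). The variable names are illustrative only.

```python
# In an ordinary string literal, "\t" and "\n" are each translated
# into a single character (tab and newline).
escaped = "a\tb\n"

# In a raw string literal, the backslash and the letter both represent
# themselves, so no translation happens.
raw = r"a\tb\n"

# Hexadecimal and octal escapes name a character by its code:
# 0x41 and octal 101 are both 65, the code for "A".
hex_a = "\x41"
octal_a = "\101"

escaped_length = len(escaped)  # a, TAB, b, NEWLINE
raw_length = len(raw)          # a, \, t, b, \, n
```

The length difference is the quickest way to see that an escape sequence "does not represent itself": two typed characters become one stored character.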


ropensci.org/blog

What are these birds? Complement occurrence data with taxonomy and traits information

We will also need these two data.frames later: abundance by species, and dictionary of names. It is rather tricky to automatically get pics from Phylopic since you might not get one for the order itself, maybe one for the subtaxon instead, etc, so we made decisions blindly in the script above


coolbutuseless.github.io

bits and bit reversal

I have a sequence of values in R and I want to reverse the bits in each value. Problem dimensions: For a vector of raw bytes I want: That is, each byte within a vector of values is unmoved, but each byte has its bits reversed
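
The operation described here (bytes stay in place, bits within each byte are reversed) can be sketched as follows. This is a plain Python illustration, not the author's R solution, and the function names are hypothetical.

```python
def reverse_bits_in_byte(value: int) -> int:
    """Reverse the order of the 8 bits in a single byte (0..255)."""
    result = 0
    for _ in range(8):
        # Shift the result left and append the lowest bit of the input.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def reverse_bits(data: bytes) -> bytes:
    """Reverse the bits inside each byte; the byte order is unchanged."""
    return bytes(reverse_bits_in_byte(b) for b in data)

example = reverse_bits(bytes([0x01, 0x0F, 0xFF]))
```

Applying `reverse_bits` twice returns the original bytes, which is a handy sanity check for any implementation.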


yihui.name/en

xfun

It is also common to see code like this in R scripts (install a package if not installed or


mgb-research.netlify.com

Bayesian Multilevel Model with Missing Data Complete Workflow (Part 2 of 3)

Having satisfied myself that there are no lingering convergence issues, I can create some initial plots. First, I need to re-structure the data to make it a bit easier to plot. Okay, now we can plot the results


martakolczynska.com

Age distributions in samples from cross-national survey projects

Cross-national survey projects conduct surveys on representative samples of adult


energychisquared.com

Exploring correlations between futures and CO2

Relationships between variables are the first serious analysis for any market analyst and/or trader


theaknowles.com

Ongoing curated list of useful resources for writing articles/theses in RMarkdown

A list of resources I am finding helpful for preparing to write a dissertation in


roh.engineering

Shiny Gadget

The ‘Fit Distributions’ shiny gadget allows easy automated diagnostics for fitting univariate distributions. It reads in numeric vectors in the global environment, and uses MLE to estimate the parameters of the selected distributions. The visual outputs are GOF statistics, density plot, pp-plot, and qq-plot


ryansafner.com

Test Post Please Don't Ignore

I hope to do my small part to spread word about these useful tools and post examples I use in class or in my research


ritsokiguess.site/docs

Scraping Icelandic soccer results with rvest and selenium

The other day, I wanted to download all of this season’s results in the Icelandic soccer league. I’m sure you often want to do this. Or, more seriously, you want to grab something from a web page, but something is standing in the way of making it simple


favstats.netlify.com

Visualizing Temperature Rise in Stuttgart, Germany over Time

This is a quick use-case of gganimate to visualize the rise of average temperature in my home town, Stuttgart, Germany


roh.engineering

fitur 0.6.1 Release

A shiny gadget for fitting univariate distributions has been added; added a test function for distfun objects; diagnostic plots now have better checks for distfun objects and lists of distfun


yihui.name/en

Using TinyTeX from a Flash Drive

One folder to rule them all. No dependency hell. No waste of disk space. No IT support


rviews.rstudio.com

Slack and Plumber, Part One

Note that this approach is different from APIs that are not being built around a known request or specification


ryantravis.netlify.com

Some books I read in August

October - China Mieville: China Mieville is a very good science fiction writer, so I was intrigued when I saw that he wrote a book about the Russian revolution of


blog.wallaroolabs.com

Wallaroo Up

Distributed data stream processing frameworks can be hard to build and set up


jenrichmond.rbind.io

lesser known stars of the tidyverse

Tibble = modern dataframe. Use instead of printing your dataset to the console. summarise(numberNA = sum(is.na(variable))); map_df(~sum(is.na(.))); na_if(""). When you want help, it is helpful to helpers if you create a minimal reproducible example so that they can see and run the code using your data. www.r4ds.co


rubuntu.netlify.com

August 25th c2d4u Update

64 new or updated packages on c2d4u were uploaded on August 25th. Packages are listed below. Currently on the version 3.5 c2d4u PPA, there are 4059 packages for Bionic, 3674 for Xenial, and 3673 for Trusty


rubuntu.netlify.com

Changes to CRAN Ubuntu webpage regarding apt-secure key

One of the keys is mine, uid “Michael Rutter”. The other key, even though the date suggests otherwise, appears to be new. The uid is “Totally Legit Signing Key”. I am fairly certain that this key was placed there to demonstrate that using the short key ID is flawed, as it is easy to create a key using brute force that matches my key


www.juliapilowsky.com

Creating a scientific manuscript in LaTeX

I also had to add captions and labels to my figures so I could refer to them in the text with their labels instead of their numbers


dicook.org

Getting past the little hiccups to getting plotly animations into slides

The tourr package, elegantly crafted by Hadley Wickham, provides a broad range of tour types, and is easy to run locally on your laptop


aosmith.rbind.io

Getting started simulating data in R

I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R


yihui.name/en

Impact: Depth or Breadth?

My principle is depth first. The broad impact may be a natural by-product. Make one person extremely happy first. Do not aim at making everybody (even mildly) happy


blog.zenggyu.com/en

Setting up Datagrip

This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for Datagrip. Datagrip is available as a snap package on Ubuntu, which can be installed using the following command: Follow these steps: Here is a made-up TNS


visualizingtheleague.com

Draft Combine Measures & Defense, Part One - Basic Relationships

When we talk about the defensive potential of incoming NBA players, we’re usually referring to a set of physical attributes – height, length, physical strength, footspeed, etc


favstats.netlify.com

How does Collinearity Influence Linear Regressions?

This is a short simulation study trying to figure out the impact of collinearity on linear regressions. Load the necessary packages. First, I write a little function to simulate collinearity. Draw data from the function and save it


blog.millerti.me

How to mirror a git repo with large files in its commit history

I was tasked at work recently with mirroring a client’s codebase to our internal GitHub organization


fharrell.com

In Machine Learning Predictions for Health Care the Confusion Matrix is a Matrix of Confusion

Machine Learning (ML) has already transformed e-commerce, web search, advertising, finance, intelligence, media, and more


nowosad.github.io

Moving beyond pattern-based analysis

GeoPAT 2 gives its users a lot of freedom, having a large number of possible workflows: Some of them can consist of only one step, while others require several steps


sarahromanes.github.io

My first gganimate - exploring concepts from first year linear modelling!

Have you ever had one of those moments whilst teaching where the content blows your mind? Today, whilst teaching MATH1005 at the University of Sydney, that exact thing happened to me. This week’s content was focused on teaching the students an introduction to linear modelling


ropensci.org/blog

What's this bird? Classify old natural history drawings with R

In this section, we explain the different elements of our R workflow: preparing images, extracting text, resolving taxonomic names. We get a result! So we see that the image transformation was quite useful


atusy.github.io/blog

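Summary of roxygen2 tags

Information about roxygen2 tags is scattered across several places and is hard to look up, so I compiled it here. This is a very loose translation plus a heavy summary, so if anything looks off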


alaburda.rbind.io

Analysing my university's publications

Hello! I have recently finished my master’s degree and finished my summer projects! With spare time on my hands, I have finally gotten around to analysing the full list of my university’s publications


tiao.io

Approximating the KL Divergence Between Implicit Distributions with Density Ratio Estimation

The Kullback-Leibler (KL) divergence between distributions $p$ and $q$ is defined as $$ \mathcal{D}_{\mathrm{KL}}[p(x) || q(x)] := \mathbb{E}_{p(x)} \left [ \log \left ( \frac{p(x)}{q(x)} \right ) \right ] $$
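
For intuition about this definition, the sketch below evaluates the expectation exactly for two small discrete distributions. This is a deliberate simplification: the post is about implicit distributions and density ratio estimation, which this example does not attempt. The function name and the probabilities are made up for illustration.

```python
import math

def kl_divergence(p, q):
    """KL divergence between two discrete distributions given as lists.

    Computes the expectation under p of log(p(x) / q(x)). Terms where
    p(x) is zero contribute nothing; q(x) is assumed positive wherever
    p(x) is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
```

Comparing `kl_pq` with `kl_qp` shows that the divergence is not symmetric, and `kl_divergence(p, p)` is zero.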


ewen.io

Building open football player transfer data

Collating player transfers to and from football clubs in major European


tiao.io

Density Ratio Estimation for KL Divergence Minimization between Implicit Distributions

The Kullback-Leibler (KL) divergence between distributions $p$ and $q$ is defined as $$ \mathcal{D}_{\mathrm{KL}}[p(x) || q(x)] := \mathbb{E}_{p(x)} \left [ \log \left ( \frac{p(x)}{q(x)} \right ) \right ] $$


rviews.rstudio.com

July 2018: Top 40 New Packages

Below are my “Top 40” picks organized into ten categories: Computational Methods, Data, Econometrics, Machine Learning, Mathematics, Science, Statistics, Time Series, Utilities, and


mathlacome.rbind.io

Readiness or Between-player normalisation

We need to load the right libraries into R - we only need tidyverse to work with the database and openxlsx to load the .xls file where we store the data. I load the data into R and visualize the format of my database


www.tidyverse.org/articles

Save the date

Hadley has promised “the best BBQ in Texas,” so feel free to take that into account


blog.zenggyu.com/en

Setting up a Git Repository

This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for setting up a git


mathlacome.rbind.io

Welcome

I’m Mathieu Lacome, a sports scientist working in elite football with over 10 years of experience in team-sport


cevo.com.au

What's the Difference? Monitoring, Logging & Alerting

Actually, not at all… In this short video we provide a high level, introductory explanation of monitoring, logging and alerting - what each is, and does. We’ll highlight the differences and also outline the benefits the three combined bring to your organisation. With obligatory 1980s flourishes…and a bot named Poy Poy!


jenrichmond.rbind.io

dirty data

I have been doing lots of data wrangling recently and decided I needed a quick rundown of data cleaning in R. Turns out www.DataCamp.com has a course called exactly that. Here are notes on useful things I learned. Histogram: to get an idea of the distribution of data in a particular variable use


energychisquared.com

How to get ESIOS data with its API (part I)

One of the first challenges that analysts in the sector face is automating the input of the


r-tastic.co.uk

Exploring London Crime with R heat maps

Here’s a sweet collection of packages required to run this analysis: First thing


www.openplantpathology.org

OPP Interviews

The tweet below highlights members of the Grünwald Lab teaching a workshop during ICPP 2018, Boston,


martakolczynska.com

Reliability of survey estimates

The growth in cross-national survey projects in the last decades leads to situations where two or more surveys are carried out in the same country and the same year but in different projects, and contain overlapping sets of survey


r-mageddon.netlify.com

Writing an R package from scratch

Anyone who has created their own R package has probably come across Hilary Parker’s awesome blog post that walks you through creating your very first R package


favstats.netlify.com

Analyzing Tweets of the ECPR General Conference 2018

This is a short notebook outlining the code used to scrape tweets related to the ECPR Conference 2018 in Hamburg. Load the necessary packages. Let’s first look at the data structure and column names. Twitter returns over 1,200 unique tweets. The top ten retweeted tweets


masalmon.eu

O'Reilly animals in trouble? Conservation status of book covers

I had a great time webscraping the menagerie, not only thanks to my now reasonable experience doing such things, but also thanks to the webpage having really good structured html with specific classes