www.rostrum.blog
EARL 2018: Crosstalk in memes
Matt Dray. I gave a talk called Crosstalk: Shiny-like without Shiny at the EARL 2018 conference in London. The crosstalk package by Joe Cheng allows HTML widgets – JavaScript visualisations wrapped in R code – to interact with each other…
rviews.rstudio.com
GDP Data via API
Let’s make our changes to both goods and services in the data. I’m also going to replace a few other accounts with shorter names, e.g., I will use “Govt” for “Government consumption expenditures and gross investment”. We now have 25 accounts, each with 285 observations…
blog.rstudio.com
Getting started with deep learning in R
There are good reasons to get into deep learning: Deep learning has been outperforming the respective “classical” techniques in areas like image recognition and natural language processing for a while now, and it has the potential to bring interesting insights even to the analysis of tabular…
gcppodcast.com
Google AI with Jeff Dean
Jeff received a Ph.D. in Computer Science from the University of Washington in 1996, working with Craig Chambers on whole-program optimization techniques for object-oriented languages. He received a B.S…
www.williamrchase.com
Is Hadley Wickham a Cat or Dog Person
Based on who Hadley follows, I’m going to give this round to dogs. Sure Hadley follows a couple of people that mention cats in their descriptions, but I can forgive him. After all, some of my best friends are cat people (don’t worry, I’m working on converting them)…
jenrichmond.rbind.io
next up anova
Next I need to learn how to conduct ANOVA in R. The formula: specify which variable is your outcome and which are your grouping variables. The data: which dataframe you are analysing. In a clinical trial where you are looking to see if the drug improved mood scores you might specify…
www.rdatagen.net
Binary, beta, beta-binomial
A couple of interesting things to note here. First is that the coefficient estimates are pretty similar to the beta regression model. However, the standard errors are slightly higher, as they should be, since we are using only observed probabilities and not the true (albeit randomly selected or generated) probabilities…
blog.wallaroolabs.com
Converting a Batch Job to Real-time
Often called stream processing, real-time processing allows applications to run computations and filter data at any scale. At Wallaroo Labs, we build and offer support for an event-based stream processing framework called Wallaroo…
wenlong-liu.github.io
Generate animated tracking maps for hurricanes and typhoons
Further data cleaning is needed to reformat the datetime and rename a column. We can also save the animation into gif files, instead of embedding the animation. R and related packages are able to conveniently draw both static and animated maps for tracking hurricanes or typhoons…
www.ashwinmalshe.com
Homework 2
(7.5 points) B. Using the same data frames as above, recreate the following graph. Take a note of differences in the two graphs. NOTE: The font used here for labeling the bar graphs is “Open Sans”. You may not have the same font available on your computer. In that case, use any alternative font. However, refrain from using the default font OR the font you used in the previous graph. (7…
lenkiefer.com
JOLTS update
It’s been a while since I posted here. I’ve got some longer form things in the works, but let’s ease back into it. Let’s take a look at the latest Job Openings and Labor Turnover Survey (JOLTS) data via the U.S. Bureau of Labor Statistics. This post is an update of this post. Per usual we will make our graphics with R…
ropensci.org/blog
What have these birds been studied for? Querying science outputs with R
For the sake of simplicity, we shall only use the 50 species observed the most often. We first define a function retrieving the titles and abstracts of works obtained as result when querying one species name. We then apply this function to all 50 species and keep each article only once…
jenrichmond.rbind.io
more wrangling tips
It is definitely true that it takes much longer to get your data ready for analysis than it does to actually analyse it. Apparently up to 80% of the data analysis time is spent wrangling data (and cursing and swearing)…
cevo.com.au
Disrupting an industry to help the little guys grow
Earlier this year, Simon Bond sat down for a chat with a couple of our long standing customers, CTO Greg Frye and Head of Development Nish Mahanty from iRexchange…
engineering.pivotal.io
Let's Contribute to Golang!
I want to share some particular insights I gained after attending the Contribution Workshop at GopherCon 2018. The purpose of this post is to let you contribute to Golang as easily as possible and to provide you with some helpful tips. These tips are gathered from multiple sources and my own troubleshooting. We are going to cover Gerrit…
www.njtierney.com
New Paper Submission
This is the first full length paper I have written about software, and I am really grateful to have had the guidance of my co-author Di Cook - I’m really proud of this work. I’d also like to share the acknowledgements section of the…
yihui.name/en
The First Notebook War
While reading Joel’s critiques on Jupyter notebooks, I couldn’t help thinking whether they apply to R Markdown notebooks, or R Markdown documents in general, so I’ll mention how some of the problems have been addressed in the R Markdown ecosystem in this post, too…
nowosad.github.io
sabre
Creating or determination of regions is a useful way to describe the world. Regionalization does not only allow for a quicker understanding of spatial patterns but also can influence how regions are managed. Regions are created in various disciplines. We can delineate regions based on a single property (e.g. landform regions or climate regions) or several factors (e.g.…
www.stevejburr.com
Exploring test cricket boundary rates in R
This past Friday, I was in the pub with a couple of colleagues watching the cricket. As you’d expect for a bunch of people who deal with numbers all day, there was a lot of discussion of various statistics…
www.njtierney.com
I graduated!
Some exciting news: I finally walked across the floor and graduated from my PhD in Statistical Sciences from QUT! The graduation was live streamed onto the TVs in our department at QUT - here’s a photo of me at the exact moment my brother yelled out “YEEAAAAHHHH NIIIICCCKK YYEEEAAAAHH” in the acoustically well designed Concert Hall at…
energychisquared.com
The odyssey of selling electricity in the SEIEs
The extra-peninsular systems (formerly SEIEs, now SENP) have historically been a headache for legislators…
www.ashwinmalshe.com
Some ggplot2 Features
Compare the frequency distribution to the scatterplot and notice that you have many more points output in the table. Why? This is due to the overlapping points. A potential solution is to change the transparency of the points…
www.stevejburr.com
Tidy Tuesday 04-09-2018
Over the last few months, I’d been taking part in #MakeoverMonday to practice different types of visualisation. I’ve not written these up yet, but plenty of examples can be seen on my Twitter…
www.thecrosstab.com
Women, Not a 'Liberal Tea Party,' are Changing the Democratic Party
Here’s the graph showing that ideological differences between the incumbent candidates in either district provide a compelling refutation of the flank-them-from-the-left hypothesis. Pressley’s victory could only be one from the left if Capuano was moderate-ish, like Crowley…
www.robert-hickman.eu
sf.chlorodot mini-package
The basic idea of the dot choropleths is to visualise not only the location clustering of each variable but also the number of observations (something traditional ‘filled’ choropleths don’t do). More importantly, the maps also just look really really cool…
www.stevejburr.com
#SWDchallenge - September 2018
This was the second time that I’ve taken part in the #SWDchallenge. Full details of the challenge can be seen here. The summary is that the goal is to remake this pie chart into something better…
www.ashwinmalshe.com
Intuition behind Cross-Validation
Cross-validation error is an estimate of the out-of-sample error. Cross-validation is a great tool for helping modelers select a model with low out-of-sample error. The objective of this note is to show you how to write simple code to carry out cross-validation in R. I will post similar code for SAS later…
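The idea in this excerpt can be sketched in a few lines. The original post works in R; the block below is a separate, minimal Python illustration rather than the author's code. The function names `k_fold_indices` and `cv_mse_mean_model` are hypothetical, and the "model" is deliberately trivial (it predicts the mean of the training folds) so that the fold logic stays visible.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_mse_mean_model(y, k):
    """Cross-validated mean squared error of a mean-only model.

    For each fold, the model is "fitted" on the other folds (it just
    takes their mean) and scored on the held-out fold.
    """
    fold_errors = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        prediction = mean(train)  # fit on training folds only
        fold_errors.append(mean((y[i] - prediction) ** 2 for i in test_idx))
    return mean(fold_errors)  # average error across folds


scores = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
print(cv_mse_mean_model(scores, k=3))
```

In practice the observations would usually be shuffled before being split into folds; that step is left out here to keep the sketch deterministic.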
jenrichmond.rbind.io
testing out t-tests
Here is what I learned about t-tests from doing the analysis below. The AFL data that comes with Dani’s book includes attendance and score information for home and away teams over regular and finals games for years and years. Disclaimer- I know nothing about AFL…
jenrichmond.rbind.io
using R for analysis
I am feeling more confident about my resolution to get rid of Excel and only use R for data wrangling and visualisation. Next steps… analysis…
coolbutuseless.github.io
A stricter `%in%`
I’m not trying to be as elegant as he is, so I’m just going to make something work in isolation…
simplystatistics.org
Being at the Center
A mentor once told me that in any large-ish coordinated scientific collaboration there will usually be regular meetings to discuss the data collection, data analysis, or both. Basically, a meeting to discuss data…
divingintogeneticsandgenomics.rbind.io
Compute averages/sums on GRanges or equal length bins
Tile the whole genome into 100 bp bins; compute the binned average for my_var. It turns out that there are functions to convert between metadata columns and RleList…
amateurdatasci.rbind.io
Sliding a Ladder and Filling a Bowl
Sliding Ladder. Problem 116, page 142 in Simmons (2016): A ladder 20 ft long is leaning against a wall 12 ft high, with its top projecting over the wall…
jenrichmond.rbind.io
creating data using rep()
Some code that is probably going to be useful in the future: To get AAABBB use To get 1 through 8, repeated 3 times use This creates a new variable called Stimulus that grabs the 8th value of CommentName and fills the column with it…
jenrichmond.rbind.io
mutate + if else = new conditional variable
Most recently I needed to extract a Stimulus number from a variable called CommentName, and then turn those numbers into levels of Model and Emotion in separate…
dusty.phillips.codes
An Order to Learn to Program, Part 4
Part 4: Binary, bits, and bytes. This is part 4 of my series on the order to study topics related to programming…
www.ashwinmalshe.com
Celebrating India's Decriminalization of Homosexuality
A few months back, I made a simple t-shirt design using R. That time, it was an R exercise for me and I didn’t share it with many people. This is my small gift to LGBTQ Indians…
blog.zenggyu.com/en
Git Objects in a Nutshell
The main purpose of Git as a version control system is to keep track of files. The content of each file at any point in time as well as other information that is necessary to reproduce the changing history are stored as objects in a Git repository. Therefore, understanding the types of objects and how they relate is essential to understanding how Git works, and hence knowing how to use it…
engineering.pivotal.io
Safely Upgrading PAS 2.2 → 2.3 with NSX-T Load Balancers
When customers with vSphere+NSX-T-based foundations upgrade PAS (Pivotal Application Service) from 2.2 to 2…
jenrichmond.rbind.io
Use map to read many csv files
Get list of .csv files called files. The code below looks for files that have …
www.tidyverse.org/articles
processx 3.2.0
processx deals with two kinds of external processes: foreground and background. Foreground processes are synchronous, R waits until they finish, and collects the output and the exit code of the process…
coolbutuseless.github.io
strict `case_when`
I want to eliminate ways in which errors or oversights can creep in, so I’d like special handling for the following cases: Before starting, let me state clearly that my main use case for this strict version is ensuring that continuous values are correctly turned into categories, when using complicated rules involving multiple…
gcppodcast.com
ATLAS with Dr. Mario Lassnig
I am not familiar with Docker or Kubernetes - where can I get started? Docker…
jenrichmond.rbind.io
I don’t like cats much
Tom Kelly pointed me towards the @swcarpentry resources. You can use dplyr::bind_rows() instead of reduce(rbind()). BUT if you want them all in one frame at the end you probably just want purrr::map_dfr(), which is a map and bind combo function…
ropensci.org/technotes
In praise of Commonmark
In this note I’ll use my local fork of rOpenSci’s website source, and use all the Markdown sources of blog posts as example data. The chunk below is therefore not portable, sorry about that. My fork master branch isn’t entirely synced. It has 202 posts…
rviews.rstudio.com
How to Build a Shiny 'Truck'!
I concluded that most people just don’t need to build them that big! So now, I would like to explain why we needed such a large app and how we went about building it. To give you an idea of the scale I am talking about, an automotive metaphor might be useful. A typical Shiny app I see in my daily work has about 50 or even fewer interaction items…
visualizingtheleague.com
Manu Ginobili
Manu Ginobili retired last week after a 16-year, 4-championship career that will likely see him land in the HOF and was undoubtedly great…
cevo.com.au
Scaling AWS ECS services with Alarms, Target tracking & CloudFormation
ECS Autoscaling. It is quite hard to come up with efficient scaling policies for Amazon Elastic Container Service (ECS)…
blog.rstudio.com
Shiny Server (Pro) 1.5.8
Upgrade to Node v8.11.3. Added support for listening on IPv6 addresses. X-Powered-By response header now reports “Shiny Server” instead of “Express”…
blog.zenggyu.com/en
The Usage of ANSI C Escape Sequences in Various Programming Languages
An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly. Escape sequences are widely used in C and many other languages, such as R, (Postgre)SQL and…
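A small illustration of that definition follows. The post names R and (Postgre)SQL; Python is used here only because it follows the same C-style conventions, and the variable names are hypothetical. The sketch compares a string literal in which `\t` and `\n` are translated into single characters with a raw literal in which the backslashes represent themselves.

```python
# In an ordinary literal, "\t" and "\n" are each translated into ONE character.
escaped = "col1\tcol2\n"

# In a raw literal (r"..."), the backslash and the letter stay as TWO characters.
raw = r"col1\tcol2\n"

lengths = (len(escaped), len(raw))
print(lengths)

# "\x41" is a hexadecimal escape for the character with code point 0x41.
print("\x41" == "A")
```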
ropensci.org/blog
What are these birds? Complement occurrence data with taxonomy and traits information
We will also need these two data.frames later: abundance by species, and dictionary of names. It is rather tricky to automatically get pics from Phylopic since you might not get one for the order itself, maybe one for the subtaxon instead, etc, so we made decisions blindly in the script above…
coolbutuseless.github.io
bits and bit reversal
I have a sequence of values in R and I want to reverse the bits in each value. Problem dimensions: For a vector of raw bytes I want: That is, each byte within a vector of values is unmoved, but each byte has its bits reversed…
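The operation described here (bytes stay in place, bits within each byte are mirrored) can be sketched as below. The post itself works in R; this is a separate Python illustration, not the author's implementation, and the function names `reverse_bits_in_byte` and `reverse_bits` are hypothetical.

```python
def reverse_bits_in_byte(b):
    """Mirror the 8 bits of one byte value (0..255)."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (b & 1)  # move the lowest bit of b onto result
        b >>= 1
    return result


def reverse_bits(data):
    """Reverse the bits inside each byte; byte order is left unchanged."""
    return bytes(reverse_bits_in_byte(b) for b in data)


values = bytes([0x01, 0x80, 0xF0])
print(reverse_bits(values).hex())
```

Applying `reverse_bits` twice returns the original bytes, which makes for an easy sanity check.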
yihui.name/en
xfun
It is also common to see code like this in R scripts (install a package if not installed or…
mgb-research.netlify.com
Bayesian Multilevel Model with Missing Data Complete Workflow (Part 2 of 3)
Having satisfied myself that there are no lingering convergence issues I can create some initial plots. First, I need to re-structure the data to make it a bit easier to plot. Okay now we can plot the results…
martakolczynska.com
Age distributions in samples from cross-national survey projects
Cross-national survey projects conduct surveys on representative samples of adult…
energychisquared.com
Exploring correlations of futures with CO2
Relationships between variables are the first serious analysis for any market analyst and/or trader…
theaknowles.com
Ongoing curated list of useful resources for writing articles/theses in RMarkdown
A list of resources I am finding helpful for preparing to write a dissertation in…
roh.engineering
Shiny Gadget
The ‘Fit Distributions’ shiny gadget allows easy automated diagnostics for fitting univariate distributions. It reads in numeric vectors in the global environment, and uses MLE to estimate the parameters of the selected distributions. The visual outputs are GOF statistics, density plot, pp-plot, and qq-plot…
ryansafner.com
Test Post Please Don't Ignore
I hope to do my small part to spread word about these useful tools and post examples I use in class or in my research…
ritsokiguess.site/docs
Scraping Icelandic soccer results with rvest and selenium
The other day, I wanted to download all of this season’s results in the Icelandic soccer league. I’m sure you often want to do this. Or, more seriously, you want to grab something from a web page, but something is standing in the way of making it simple…
favstats.netlify.com
Visualizing Temperature Rise in Stuttgart, Germany over Time
This is a quick use-case of gganimate to visualize the rise of average temperature in my home town, Stuttgart, Germany…
roh.engineering
fitur 0.6.1 Release
A shiny gadget for fitting univariate distributions has been added; a test function for distfun objects has been added; diagnostic plots now have better checks for distfun objects and lists of distfun…
yihui.name/en
Using TinyTeX from a Flash Drive
One folder to rule them all. No dependency hell. No waste of disk space. No IT support…
rviews.rstudio.com
Slack and Plumber, Part One
Note that this approach is different from APIs that are not being built around a known request or specification…
ryantravis.netlify.com
Some books I read in August
October, by China Mieville. China Mieville is a very good science fiction writer, so I was intrigued when I saw that he wrote a book about the Russian revolution of…
blog.wallaroolabs.com
Wallaroo Up
Distributed data stream processing frameworks can be hard to build and set up…
jenrichmond.rbind.io
lesser known stars of the tidyverse
Tibble = modern dataframe. Use instead of printing your dataset to the console. summarise(numberNA = sum(is.na(variable))); map_df(~sum(is.na(.))); na_if(""). When you want help, it is helpful to helpers if you create a minimal reproducible example so that they can see and run the code using your data. www.r4ds.co…
rubuntu.netlify.com
August 25th c2d4u Update
64 new or updated packages on c2d4u were uploaded on August 25th. Packages are listed below. Currently on the version 3.5 c2d4u PPA, there are 4059 packages for Bionic, 3674 for Xenial, and 3673 for Trusty…
rubuntu.netlify.com
Changes to CRAN Ubuntu webpage regarding apt-secure key
One of the keys is mine, uid “Michael Rutter”. The other key, even though the date suggests otherwise, appears to be new. The uid is “Totally Legit Signing Key”. I am fairly certain that this key was placed there to demonstrate that using the short key ID is flawed, as it is easy to create a key using brute force that matches my key…
www.juliapilowsky.com
Creating a scientific manuscript in LaTeX
I also had to add captions and labels to my figures so I could refer to them in the text with their labels instead of their numbers…
dicook.org
Getting past the little hiccups to getting plotly animations into slides
The tourr package, elegantly crafted by Hadley Wickham, provides a broad range of tour types, and is easy to run locally on your laptop…
aosmith.rbind.io
Getting started simulating data in R
I started out thinking I’d talk about doing simulations. But could I do that in 45 minutes? Maybe not. After much pondering I ended up settling on the topic of how we start a simulation: by making data in R…
yihui.name/en
Impact: Depth or Breadth?
My principle is depth first. The broad impact may be a natural by-product. Make one person extremely happy first. Do not aim at making everybody (even mildly) happy…
blog.zenggyu.com/en
Setting up Datagrip
This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for Datagrip. Datagrip is available as a snap package on Ubuntu, which can be installed using the following command: Follow these steps: Here is a made-up TNS…
visualizingtheleague.com
Draft Combine Measures & Defense, Part One - Basic Relationships
When we talk about the defensive potential of incoming NBA players, we’re usually referring to a set of physical attributes – height, length, physical strength, footspeed, etc…
favstats.netlify.com
How does Collinearity Influence Linear Regressions?
This is a short simulation study trying to figure out the impact of collinearity on linear regressions. Load the necessary packages. First, I write a little function to simulate collinearity. Draw data from the function and save it…
blog.millerti.me
How to mirror a git repo with large files in its commit history
I was tasked at work recently with mirroring a client’s codebase to our internal Github organization…
fharrell.com
In Machine Learning Predictions for Health Care the Confusion Matrix is a Matrix of Confusion
Machine Learning (ML) has already transformed e-commerce, web search, advertising, finance, intelligence, media, and more…
nowosad.github.io
Moving beyond pattern-based analysis
GeoPAT 2 gives its users a lot of freedom, having a large number of possible workflows: Some of them can consist of only one step, while others require several steps…
sarahromanes.github.io
My first gganimate - exploring concepts from first year linear modelling!
Have you ever had one of those moments whilst teaching where the content blows your mind? Today, whilst teaching MATH1005 at the University of Sydney, that exact thing happened to me. This week’s content was focused on introducing the students to linear modelling…
ropensci.org/blog
What's this bird? Classify old natural history drawings with R
In this section, we explain the different elements of our R workflow: preparing images, extracting text, resolving taxonomic names. We get a result! So we see that the image transformation was quite useful…
atusy.github.io/blog
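Summary of roxygen2 tags
Information about roxygen2 tags is scattered across several places and hard to look up, so I compiled it here. This is a very loose translation and a heavy summary, so if anything looks off…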
alaburda.rbind.io
Analysing my university's publications
Hello! I have recently finished my master’s degree and finished my summer projects! With spare time on my hands, I have finally gotten around to analysing the full list of my university’s publications…
tiao.io
Approximating the KL Divergence Between Implicit Distributions with Density Ratio Estimation
The Kullback-Leibler (KL) divergence between distributions $p$ and $q$ is defined as $$ \mathcal{D}_{\mathrm{KL}}[p(x) || q(x)] := \mathbb{E}_{p(x)} \left [ \log \left ( \frac{p(x)}{q(x)} \right )…
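As a concrete reading of this definition, the sketch below evaluates the expectation of the log density ratio for two discrete distributions given as probability lists. It illustrates the definition only: the post is about implicit distributions, where the densities are not available and the ratio has to be estimated, which this block does not attempt. The function name `kl_divergence` is hypothetical, and the code assumes `q` is strictly positive wherever `p` is.

```python
import math


def kl_divergence(p, q):
    """KL divergence between two discrete distributions (natural log).

    Terms with p_i == 0 contribute nothing, by the usual convention.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # positive, since p and q differ
print(kl_divergence(p, p))  # zero for identical distributions
```

Note that the divergence is not symmetric: `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ.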
ewen.io
Building open football player transfer data
Collating player transfers to and from football clubs in major European…
tiao.io
Density Ratio Estimation for KL Divergence Minimization between Implicit Distributions
The Kullback-Leibler (KL) divergence between distributions $p$ and $q$ is defined as $$ \mathcal{D}_{\mathrm{KL}}[p(x) || q(x)] := \mathbb{E}_{p(x)} \left [ \log \left ( \frac{p(x)}{q(x)} \right )…
rviews.rstudio.com
July 2018: Top 40 New Packages
Below are my “Top 40” picks organized into ten categories: Computational Methods, Data, Econometrics, Machine Learning, Mathematics, Science, Statistics, Time Series, Utilities, and…
mathlacome.rbind.io
Readiness or Between-player normalisation
We need to load the right libraries into R - we only need tidyverse to work with the data and openxlsx to load the .xls file where we store the data. I load the data into R and look at the format of my database…
www.tidyverse.org/articles
Save the date
Hadley has promised “the best BBQ in Texas,” so feel free to take that into account…
blog.zenggyu.com/en
Setting up a Git Repository
This is one of a series of posts where I document software configurations for personal reference. This post documents the configurations for setting up a git…
mathlacome.rbind.io
Welcome
I’m Mathieu Lacome, a sports scientist working in elite football with over 10 years of experience in team-sport…
cevo.com.au
What's the Difference? Monitoring, Logging & Alerting
Actually, not at all… In this short video we provide a high level, introductory explanation of monitoring, logging and alerting - what each is, and does. We’ll highlight the differences and also outline the benefits the three combined bring to your organisation. With obligatory 1980s flourishes…and a bot named Poy Poy!
jenrichmond.rbind.io
dirty data
I have been doing lots of data wrangling recently and decided I needed a quick rundown of data cleaning in R. Turns out www.DataCamp.com has a course called exactly that. Here are notes on useful things I learned. Histogram: to get an idea of the distribution of data in a particular variable use…
energychisquared.com
How to get ESIOS data with its API (part I)
One of the first challenges facing analysts in the sector is automating the input of the…
r-tastic.co.uk
Exploring London Crime with R heat maps
Here’s a sweet collection of packages required to run this analysis: First thing…
www.openplantpathology.org
OPP Interviews
The tweet below highlights members of the Grünwald Lab teaching a workshop during ICPP 2018, Boston,…
martakolczynska.com
Reliability of survey estimates
The growth in cross-national survey projects in the last decades leads to situations where two or more surveys are carried out in the same country and the same year but in different projects, and contain overlapping sets of survey…
r-mageddon.netlify.com
Writing an R package from scratch
Anyone who has created their own R package has probably come across Hilary Parker’s awesome blog post that walks you through creating your very first R package…
favstats.netlify.com
Analyzing Tweets of the ECPR General Conference 2018
This is a short notebook outlining the code used to scrape tweets related to the ECPR Conference 2018 in Hamburg. Load the necessary packages. Let’s first look at the data structure and column names. Twitter returns over 1,200 unique tweets. The top ten retweeted tweets…
masalmon.eu
O'Reilly animals in trouble? Conservation status of book covers
I had a great time webscraping the menagerie, not only thanks to my now reasonable experience doing such things, but also thanks to the webpage having really good structured html with specific classes…