mgb-research.netlify.com
RMarkdown is the Most Powerful Codebook Maker You Can Find for Your Datasets
Or… You could create a manual that is complete, easy to use, readily tweakable, completely reproducible, and 100% shareable. Having made a couple of manuals in my time as a former camp director and current Ph.D…
yihui.name/en
Reflections on 25+ Years of '50 Years of Data Science'
- “Comfortably Numb” by Pink Floyd (1979) John Tukey: “The Future of Data Analysis” (1962) John Chambers: “Greater or Lesser Statistics: A Choice for Future Research” (1993) Leo Breiman: “Statistical Modeling: The Two Cultures” (2001) William…
lenkiefer.com
State employment dataviz
Today was JOLTS Tuesday, when the U.S. Bureau of Labor Statistics releases updated data from the Job Openings and Labor Turnover Survey. I was talking about it earlier today, but before we get into that… If you care about dataviz, check this out: I saw this on Twitter today via Jon Schwabish. Link to a handy dataviz cheatsheet outlining Jon’s core dataviz principles…
rviews.rstudio.com
Two Big Ideas from JSM 2018
The Joint Statistical Meetings offer an astounding number of talks. It is impossible for an individual to see more than a small portion of what is going on. Even so, a diligent attendee ought to come away with more than a few good ideas…
blog.rstudio.com
What they forgot to teach you about R
Join Jenny Bryan and Jim Hester of RStudio for this two-day hands-on workshop designed for experienced R and RStudio users who want to (re)design their R lifestyle! If you missed this sold-out course at rstudio::conf 2018, now is your chance…
www.tidyverse.org/articles
roxygen2 6.1.0
In this version, we’ve made a number of bug fixes to markdown translation: Code in link text is now properly rendered as…
jvera.netlify.com
Dockerized Spacemacs on Windows
I was talking the other day about setting up Spacemacs on Windows with no privileges. Sometimes it’s not even possible to do so, but you have Cygwin/mintty/babun + Docker engine on your machine for testing purposes…
bgstieber.github.io
Everything I Know About Machine Learning I Learned from Making Soup
Introduction In this post, I’m going to make the claim that we can simplify some parts of the machine learning process by using the analogy of making soup…
jacobbuckman.com
OpenAI Five Takeaways
I’m sure this series will be analyzed by people with far deeper understanding of Dota than me, but in my opinion, OpenAI Five essentially won on the back of its teamfighting ability…
amateurdatasci.rbind.io
Tangent Lines and Non-Existent Ones
Definition of a Tangent Line: Consider a curve $y = f(x)$, and let $P$ be a given fixed point on this curve. Let $Q$ be a second nearby point on the curve, and draw the secant line…
www.tidyverse.org/articles
The tidymodels Package
The number of tidyverse modeling packages continues to grow…
yihui.name/en
Write a Book with bookdown and Publish with Chapman & Hall
I think the typeface should be the only thing you may want to customize. Other things are trivialities and not worth too much time. Don’t be preoccupied with customizing the appearance of your PDF (at least don’t do this too early). I guess the No…
blog.sellorm.com
Automating a simple static website
I started the awesome blogdown list not long after I first heard about the blogdown package for R. I wanted a quick and easy way to showcase websites built with it, so I started a simple “awesome” style README page on GitHub…
lenkiefer.com
Charts within charts
Maybe you are of the opinion that charts should have their y axis extend all the way down to 0, even if the data live far away from zero. I’m not sure if that’s always the right thing to do…
jacobbuckman.com
More on Graph Inspection
The computational graph is not just a nebulous, immaterial abstraction; it is a computational object that exists, and can be inspected…
dusty.phillips.codes
Refund for Contribution?
I accidentally started working on a new personal project for budgeting that I think others might be interested in…
martakolczynska.com
Shiny app for exploring harmonized cross-national survey data (SDR v.1.0)
In the previous post I wrote about downloading and exploring the Survey Data Recycling (SDR), version 1.0 dataset, which consists of selected harmonized variables from 22 survey projects, 1966-2013. The SDR project will develop a website for browsing, subsetting, downloading, and visualizing data from the SDR project…
roh.engineering
fitur 0.6.0 Release
Adding continuous distribution testing functions: Kolmogorov-Smirnov, Anderson-Darling, and Cramer-Von Mises S3 methods have now been added for distfit objects. Code reformatting and…
blog.zenggyu.com/en
Creating User-Defined Functions in PostgreSQL
The following code has been tested with PostgreSQL 10.4 on Ubuntu 18.04. A function can have an arbitrary number of arguments (e.g., 0, 1, 2, …) as input. If there is more than one argument, they should be separated by commas. The arguments need not be named (in which case they should be referenced by positional parameters in the function body), but each must have a type…
evangelinereynolds.netlify.com
National Anthems’ Sentiment Scores, Mapped and Interactive
This post, as indicated in the title, is about an interactive mapping of sentiment scores calculated for national anthems. Text analysis is of growing interest for political researchers, and I count myself among the interested! The interactive plot at the end of the post is, I think, an ideal introduction to sentiment analysis…
ritsokiguess.site/docs
Testing means and medians
Introduction The data set that inspired this post comes from this edition of Mendenhall and Sincich. It comes from an investigation of how you learn people’s names effectively…
www.robert-hickman.eu
The Knowledge 4th August 2018
Some longer chains involving cities happened in the 1920-1921 seasons in the Second Division, but it seems like the scheduling worked differently then and teams played back to back more, so doesn’t really…
malco.io
When interaction is not interaction
If you want to learn more about these methods, you may be interested in this great-looking resource from Maarten van Smeden: Thanks to him for providing…
djnavarro.net
Day 99-100: Small Steps
Not surprisingly, most of the posts (about 75% of them) were written in the first half of the project. That’s partly the inevitable consequence of the novelty wearing off, but there have also been a few other things that have come up along the way… One big thing that interacted with this 100Days project in a positive way is my teaching…
cevo.com.au
DynamoDB Autoscaling with CloudFormation
DynamoDB Autoscaling DynamoDB autoscaling is a feedback-loop based monitoring setup which can dynamically change provisioned capacity for the table or global secondary index…
blog.wallaroolabs.com
Dynamic Keys
Wallaroo is designed to help you build stateful event processing services that scale easily and elastically. State is partitioned across workers in the system and migrates when workers join or leave the cluster…
lenkiefer.com
Global house price trends
In this post I want to share updated plots comparing house price trends around the world. Or at least part of the world. Our view will be somewhat limited, based on data, but will at least allow us to see how U.S. house prices compare to a few other countries…
bayesianbabes.netlify.com
I (Heart Emoji) Statistics
We learned so much from Hamdan Azhar’s awesome Prismoji tutorial after seeing his wonderful talk at the Southern Data Science Conference…
brendanmolin.netlify.com
Introduction to Urban Institute Education Data API
The Urban Institute released a public API that pulls and pre-processes data from various sources of education institution data, including but not limited to the Department of Education. We used their R package to explore the relationship between applicant and enrollment volume. To install the R package, you must have the devtools library installed…
r-mageddon.netlify.com
UK Population Pyramid
On my journey to creating my animated Premier League table in my previous post, I noticed a lot of examples for creating gifs using the magick package. The gist behind the majority of these examples was to create a sequence of snapshots which could be combined together to create animations…
magesblog.com
Use domain knowledge to review prior distributions
The prior predictive distribution shows me how the model behaves before I use my data. Thus, I can check if the model describes the data generating process reasonably well, before I go through the lengthy process of fitting the model…
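A minimal sketch of the idea in the excerpt above: draw parameters from the priors, simulate data from them, and look at what the model implies before any real data is used. The normal likelihood, the prior choices, and the function name `prior_predictive` are all illustrative assumptions, not taken from the post.

```python
import random


def prior_predictive(n_draws, n_obs, seed=0):
    """Simulate datasets from the priors alone (no observed data involved)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        mu = rng.gauss(0.0, 10.0)         # hypothetical prior on the mean
        sigma = abs(rng.gauss(0.0, 5.0))  # hypothetical half-normal prior on the sd
        # One simulated dataset for this parameter draw
        sims.append([rng.gauss(mu, sigma) for _ in range(n_obs)])
    return sims


sims = prior_predictive(n_draws=200, n_obs=20)
lowest = min(min(s) for s in sims)
highest = max(max(s) for s in sims)
print(f"range of simulated data under the priors: {lowest:.1f} to {highest:.1f}")
```

If the simulated range is wildly implausible for the quantity being modeled, domain knowledge suggests tightening the priors before fitting.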
toscano84.github.io
A Leaflet approach to Coffee Chains
This post talks about making interactive visualizations in R with leaflet(). In this example, I’ll map the USA locations of two of the biggest coffee chains, Starbucks and Dunkin’ Donuts. This package allows us to map data and play interactively with it…
ropensci.org/blog
A package for dimensionality reduction of large data
…. My thought is that the ideal would be a package focused on UMAP specifically, implemented in R or Rcpp. Unfortunately I am not at all an expert in this topic or familiar with the mathematics involved, so the best I would be able to do is try to translate the Python implementation into R…
djnavarro.net
Day 95-98: Press any key
I’m getting to the very end of this package tryout exercise, and I suspect this will be the last post (other than perhaps a wrap up on Friday). It’s been a mildly annoying morning: I’ve done something to my foot, I’ve been awake since 4am, and somehow my twitter feed was full of people talking about Jordan Peterson 😒…
www.justadatageek.com
Exploring Burlington County, NJ, Part Two
Preface I share my blogposts on Twitter and LinkedIn. I also let a few friends know via email. The suggestions that I received were welcome. Some were things I had already planned to do and others I had not thought of…
www.granvillematheson.com
Publications
2018 Matheson, GJ (2018). We need to talk about reliability: Making better use of test retest studies for study design and interpretation. bioRxiv, 274894. Matheson, GJ, Plavén-Sigray, P, Louzolo, Anaïs, Borg, J, Farde, L, Petrovic, P & Cervenka, S (2018). Dopamine D1 receptor availability is not associated with delusional ideation measures of psychosis proneness…
www.ddrive.no
Reading vintage magazines with `hocr`
library(tidyverse) library(tesseract) library(pdftools) library(hocr) library(here) library(fs) library(hunspell) library(hrbrthemes) library(patchwork) Challenge: This post is inspired by a recent tweet by Paige Bailey about vintage computer magazines made available for free download on…
dsollberger.netlify.com
Semester Schedule Planner
The convention is that “0” is a Sunday, “1” is a Monday, …, and “5” is a Friday…
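The Sunday-as-zero convention described above can be sketched as follows. This is an illustration only: the helper name `sunday_based_index` is hypothetical, and treating “6” as Saturday is an assumption, since the excerpt stops at Friday.

```python
from datetime import date

# Index 0 is Sunday, as in the excerpt; "Saturday" at index 6 is an assumption.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def sunday_based_index(d: date) -> int:
    """Convert Python's Monday=0 weekday numbering to Sunday=0."""
    return (d.weekday() + 1) % 7


for d in (date(2018, 8, 5), date(2018, 8, 6), date(2018, 8, 10)):
    idx = sunday_based_index(d)
    print(d.isoformat(), idx, DAY_NAMES[idx])
```

The shift is needed because Python's `date.weekday()` numbers Monday as 0 and Sunday as 6.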
www.granvillematheson.com
Who I Am and What I Do
My Present My name is Granville Matheson, currently living and working in Stockholm,…
masalmon.eu
ALLSTATisticians in decline? A polite look at ALLSTAT email Archives
And then it was time to scrape and parse… I created a function getting the metadata out of each archive page. The trickiest points here were: ALLSTAT encourages you to use keywords in emails’ subjects, so many job openings contain some variant of “job”, and that’s the sample on which I shall work…
dusty.phillips.codes
An Order to Learn to Program, Part 1
Learning to program is hard. There are a few reasons this is the case: Programming itself is hard. However, this is less true than most people believe…
lenkiefer.com
House price gif that keeps on giffing
This tweet turned out to be popular: 👀house price trends👀 pic.twitter.com/JXB5P0H84A - Leonard Kiefer (@lenkiefer) August 1, 2018 It’s a remix of a chart we made here, though it uses a different index…
r-mageddon.netlify.com
Interactive Premier League Table
For my inaugural blog post I decided I would step into the world of animated graphics for the first time…
www.jennadallen.com
Text Mining
As a part of the R4DS June Challenge and the “Summer of Data Science” Twitter initiative started by Data Science Renee, I decided to improve my text mining skills by working my way through Tidy Text Mining with R by Julia Silge and David Robinson…
yihui.name/en
Two of My Use Cases of Lazy Evaluation
I’m not an expert on quotation or lazy evaluation. I just happen to have used them occasionally. I’m going to talk about two use cases of lazy evaluation. In two of my talks, I used delayed assignments to execute R code for no good reasons except that I just wanted to confuse the audience…
blog.rstudio.com
rstudio
Learn from and interact with these outstanding invited speakers and R innovators: Find out what RStudio is working on from the people who make the materials and tools you use…
tiao.io
Building Probability Distributions with the TensorFlow Probability Bijector API
The underlying process that generates samples $\tilde{\mathbf{y}} \sim p_{Y}(\mathbf{y})$ is simple to describe, and is of the general form, $$ \tilde{\mathbf{y}} \sim p_{Y}(\mathbf{y}) \quad \Leftrightarrow \quad \tilde{\mathbf{y}} = G(\tilde{\mathbf{x}}), \quad \tilde{\mathbf{x}} \sim…
lcolladotor.github.io
Harassment, diversity in science and inspiration from my grandmother
I actually don’t know much more. She passed away when I was 13 after a years-long battle with disease. Google tells me that she is a co-author of at least three titles in the field of Public Health: I did inherit her souvenirs from her trips (my dad also loves them) and something that is precious to me: a medal with her name…
yihui.name/en
In HTML and the Web I Trust
My blog post is relatively short, and I strongly recommend that you read the full article “LaTeX is dead”…
lenkiefer.com
Beige-ian Statistics
Let’s pick up where we left off yesterday and do some more exploration with text mining. Like yesterday we’ll use the tidytext package for R. And we’ll lean heavily on Julia Silge and David Robinson’s Text Mining with R…
rviews.rstudio.com
June 2018: Top 40 New Packages
Simulate a variety of periodically-collapsing bubble models…
engineering.pivotal.io
Let's use Vault - Part 3
We have now come to the final leg of our journey. We will be integrating Vault with Concourse CI and exploring some tooling that was built specifically to make your lives easier…
djnavarro.net
Day 82-94
So this is a post about how I set up one part of my workflow. I feel nervous about it for two reasons: Yes, I realise that I’m setting myself up to feel bad. I should stop…
engineering.pivotal.io
Let's use Vault - Part 2
This post provides a guide to the simplest commands required to set up Vault locally for your team instead of having to wade through all of Hashicorp’s extensive documentation…
lenkiefer.com
Text Mining Fedspeak
Text mining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See for example Gentzkow, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy…
atusy.github.io/blog
MathJax in blogdown
What is MathJax? With MathJax, you can write mathematical expressions using $\TeX$ notation. To make a block, enter $$\latex$$, which becomes $$ \LaTeX $$.…
ab604.netlify.com
An unmet need for data science training
The aim is to try to define the problem(s) a bit better and also a bit of a cry for help. I appreciate that none of this may be novel, but I needed to get it written down and out of my head…
blog.rstudio.com
Announcing the 1st Bookdown Contest
There are no hard judging criteria for this contest, but in general, we’d prefer these types of applications: We’d also like to see non-English applications, such as books written in CJK (Chinese, Japanese, Korean), right-to-left, or other languages, since there are additional challenges in typesetting with these…
cevo.com.au
Docker on Windows
Since my recently published blog post When Docker meets Make, a few of my mates commented they couldn’t get Docker and GNU Make working on their shiny new Windows PCs…
www.williamrchase.com
Friday Fails #2
So what to do now? Well the truth is that I compared sequences from all three sources, tried to minimize differences between them, and then just sent that off for synthesis. Does the sequence I sent off match any of the sources exactly? No…
research.libd.org/rstatsclub
Hacking our way through UpSetR
First, let’s install the version we used for this post: Next, we did the same (commas to semicolons) for the inputs of the first example. Our club session was out of time, so we decided to continue our project another day and ask for help on twitter…
yihui.name/en
Help Needed
The three components of a software package are equally important in my eyes: source code, documentation, and tests…
yihui.name/en
Quietly Struggling (with Software)
Anyway, if a software package seems to try to turn an average user into a sysadmin, that is probably not a good sign. Ummm… R CMD javareconf? Java 8? 9? JDK 10? sudo? Actually I did figure out how to install it, but it was a long way… I was afraid that I would have to go through this again in the future (like I did a few times in the past), so I chose not to touch it again…
toscano84.github.io
Tuition costs and gdp per capita
This post will explore with R one of the simplest approaches to predict a response of a quantitative nature. This approach is called Linear Regression…
simplystatistics.org
Why I Indent My Code 8 Spaces
In the video version of the talk (not in the slides) Jenny calls out my particular indentation rule, which is to use 8 spaces. In my experience, people tend to find this a rather extreme indentation policy, with maybe 4 spaces being at the outer limit of what they could imagine…
www.rostrum.blog
Engifification in R with gifski
Matt Dray gifski::gifski() You and I both know that the world needs more intergalactic-sloth-pizza gifs. Great news: ‘the fastest gif encoder in the universe’ has been created. The gifski package for R is now on CRAN…
irene.rbind.io
FUNctional programming tricks in httr
httr basics; On with the tricks!; Embrace the backtick; The null-default operator %||%; Check argument inputs with match…
lenkiefer.com
Getting animated about new home sales
Indications are that U.S. housing market activity in the middle part of 2018 has moderated. Home sales estimates for both new home sales and existing home sales declined on a seasonally adjusted basis in June relative to May. House price growth has also moderated recently. Some folks have gotten animated about the recent trends…
dusty.phillips.codes
Hacking Happier
Back in 2012, I wrote a book called Hacking Happy. It was my first self-published work, and I was actually surprised by how well it did without a publisher or marketing behind it. I had plenty of positive feedback including more than one hopefully exaggerated, “This book saved my life…
www.williamrchase.com
How to Phylogeny (Part 0
Hi, in this series of posts, I’ll introduce a general workflow for estimating a phylogenetic tree for a single gene. When learning phylogenetics, I often got lost in the dizzying array of tools and methods available for sequence alignment and tree building…
www.williamrchase.com
How to Phylogeny (Part 1
Hi, in this series of posts, I’ll introduce a general workflow for estimating a phylogenetic tree for a single gene. When learning phylogenetics, I often got lost in the dizzying array of tools and methods available for sequence alignment and tree building…
gcppodcast.com
Next Day 2
Paresh Kharya is Group Product Marketing Manager for data center products at NVIDIA responsible for product marketing of NVIDIA’s Tesla accelerated computing platform…
blog.rstudio.com
RStudio Connect 1.6.6 - Custom Emails
We are excited to announce RStudio Connect 1.6.6! This release caps a series of improvements to RStudio Connect’s ability to deliver your work to others. All customizations are done using code in the underlying R Markdown document…
blog.wallaroolabs.com
Real-time Streaming Pattern
Introduction This week I will continue the series of posts looking at data processing patterns used to build event triggered streaming applications, focusing on joining event streams…
ropensci.org/blog
rOpenSci Educators Collaborative
In previous posts in this series, we identified challenges that individual instructors typically face when teaching science with R, and shared characteristics of effective educational resources to help address these challenges…
dusty.phillips.codes
I'm Back
Hi there, I’m Dusty. Welcome to my resurrected blog. I started a tech blog in 2007 that I maintained with regular posts for several years. While it was well-regarded at the time, I took it down in late 2016 for several reasons…
rviews.rstudio.com
JSM 2018 Itinerary
JSM 2018 is almost here! Usually around this time, I comb through the entire program manually making an itinerary for myself. But this year I decided to try something new – a programmatic way of going through the program, and then building a Shiny app that helps me better navigate the online program…
mlr-blog.netlify.com
Visualization of spatial cross-validation partitioning
Introduction In July mlr got a new feature that extended the support for spatial data: The ability to visualize spatial partitions in cross-validation (CV) 9d4f3…
ropensci.org/blog
rOpenSci Educators Collaborative
In the first post of this series, we sketched out some of the common challenges faced by educators who teach with R across scientific domains…
magesblog.com
Notes from the 1st Insurance Data Science event
The Insurance Data Science conference is a great opportunity to bring together academic and industry leaders, who will explore new developments and applications of cutting-edge techniques in insurance, as well as the bigger picture of how statistical and business practice is transformed with the wide adoption and embedding of digital…
nowosad.github.io
Quantifying temporal change of landscape pattern
Imagine you have two values expressing the world population in 1950 (2.5 billion people) and 2012 (7.1 billion people). How would you compare the change in the world population? The easiest (and correct) approach is just to subtract the past value from the more recent one: We can conclude that the world population between 1950 and 2012 increased by 4…
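The subtraction described above can be worked through directly. This is only an illustration using the two population figures quoted in the excerpt; the variable names are hypothetical.

```python
# World population in billions, as quoted in the excerpt
pop_past = 2.5    # 1950
pop_recent = 7.1  # 2012

# Absolute change: subtract the past value from the more recent one
absolute_change = pop_recent - pop_past
print(f"absolute change: {absolute_change:.1f} billion people")
```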
yihui.name/en
Slowly but Steadily, They Started to Help Me Answer Questions
Some people have been helping me so frequently on GitHub and Stack Overflow that I can easily list their names: Marcel Schilling, Michael Harper, Ralf Stubner, Christophe Dervieux, and TC Zhang (apologies if I omitted other frequent helpers - I’m pretty bad at remembering people’s…
cevo.com.au
You'll always remember your first time...open sourcing
You may have seen our previous posts regarding projects that Cevo has worked on that have gone on to be open sourced. If not, now’s a great time to catch up before we continue! Information about Watchmen can be found here and here…
ropensci.org/blog
rOpenSci Educators Collaborative
This first post aims to summarize the main challenges that educators face, as a tool to help them think through the decisions they make about their course materials…
ropensci.org/technotes
Gifski on CRAN
The R package wraps the Rust crate and can be installed in the usual way from CRAN. One of the major benefits of Rust is that it has no runtime, so the R package has no dependencies. This is the first CRAN package that interfaces a Rust library…
martakolczynska.com
ISA World Congress 2018
The International Sociological Association 19th World Congress of Sociology in Toronto (15-21 July) has received quite some Twitter…
martakolczynska.com
Late start
This blog is going to be mostly about my adventures with R, primarily using survey data, and usually somewhat related to my social science interests; for the fun of it, to share code and hopefully get feedback…
simplystatistics.org
Partitioning the Variation in Data
Understanding which aspects of the variation in your data are fixed is important because often you can collect data on those fixed characteristics and use them directly in any statistical modeling you might do. For example, season is an easy covariate to include because we already know when the seasons begin and end…
rviews.rstudio.com
REST APIs and Plumber
Traditionally, moving this model into production has involved one of two approaches: either running customer data through the model on a batch basis and caching the results in a database, or handing the model definition off to a development team to translate the work done in R into another language, such as Java or Scala…
ryantravis.netlify.com
Some Books I Read in July
The Dilemmas of Lenin: Terrorism, War, Empire, Love, Revolution by Tariq Ali A very interesting biography of Lenin. The book isn’t a traditional biography. Instead, it’s a kind of intellectual biography focused around particular topics…
jvera.netlify.com
Back to Basics (Emacs + ESS + zsh + byobu)
I think I’m a little bit “old school” or maybe sometimes you have to use the right tool for the task. Some time ago, I discovered I feel more productive staying away from the mouse, so terminals and text editors have been my daily working environment since then. If you use Linux, I’m preaching to the choir, and nearly the same if your work involves Mac OS…
lenkiefer.com
House price gifski
I saw today, via rOpenSci, a blog post about a new package for making animated gifs with R called gifski, now available on CRAN. Let’s adapt the code we shared last week to use the gifski package…
evangelinereynolds.netlify.com
Layered Presentation of Graphics, revised
I think it is more straightforward than messing around with alpha. Several folks brought up geom_blank() having looked at the previous implementation, but I didn’t find it necessary in this case if you are using last_plot(), which I think makes sense to do in this context. Still, geom_blank is good to know about…
livefreeordichotomize.com
Shinyviewr
Motivation My package shinysense has been around for more than a year now. It started as a package to add swiping via touch screens to shiny for our app Papr, but then slowly got built to include functions for hearing (shinyearr), movement (shinymovr), and drawing (shinydrawr). However one major sense was missing: vision…
yihui.name/en
The Best Way to Support LaTeX Math in Markdown with MathJax
your math expressions will have a light-gray background, too. It is possible to remove the background color, but it is relatively complicated. Even if you use a pair of backticks, you still have the second problem above…