mgb-research.netlify.com

RMarkdown is the Most Powerful Codebook Maker You Can Find for Your Datasets

Or… You could create a manual that is complete, easy to use, readily tweakable, completely reproducible, and 100% shareable. Having made a couple of manuals in my time as a former camp director and current Ph.D


yihui.name/en

Reflections on 25+ Years of '50 Years of Data Science'

  • “Comfortably Numb” by Pink Floyd (1979) John Tukey: “The Future of Data Analysis” (1962) John Chambers: “Greater or Lesser Statistics: A Choice for Future Research” (1993) Leo Breiman: “Statistical Modeling: The Two Cultures” (2001) William


lenkiefer.com

State employment dataviz

Today was JOLTS Tuesday, when the U.S. Bureau of Labor Statistics releases updated data from the Job Openings and Labor Turnover Survey. I was talking about it earlier today, but before we get into that… If you care about dataviz, check this out: I saw this on Twitter today via Jon Schwabish. Link to a handy dataviz cheatsheet outlining Jon’s core dataviz principles


rviews.rstudio.com

Two Big Ideas from JSM 2018

The Joint Statistical Meetings offer an astounding number of talks. It is impossible for an individual to see more than a small portion of what is going on. Even so, a diligent attendee ought to come away with more than a few good ideas


blog.rstudio.com

What they forgot to teach you about R

Join Jenny Bryan and Jim Hester of RStudio for this two-day hands-on workshop designed for experienced R and RStudio users who want to (re)design their R lifestyle! If you missed this sold-out course at rstudio::conf 2018, now is your chance


www.tidyverse.org/articles

roxygen2 6.1.0

In this version, we’ve made a number of bug fixes to markdown translation: Code in link text is now properly rendered as


jvera.netlify.com

Dockerized Spacemacs on Windows

I was talking the other day about setting up Spacemacs on Windows without admin privileges. Sometimes it’s not even possible to do so, but you have Cygwin/mintty/babun + Docker engine on your machine for testing purposes


bgstieber.github.io

Everything I Know About Machine Learning I Learned from Making Soup

In this post, I’m going to make the claim that we can simplify some parts of the machine learning process by using the analogy of making soup


jacobbuckman.com

OpenAI Five Takeaways

I’m sure this series will be analyzed by people with far deeper understanding of Dota than me, but in my opinion, OpenAI Five essentially won on the back of its teamfighting ability


amateurdatasci.rbind.io

Tangent Lines and Non-Existent Ones

Consider a curve $y = f(x)$, and let $P$ be a given fixed point on this curve. Let $Q$ be a second nearby point on the curve, and draw the secant line


www.tidyverse.org/articles

The tidymodels Package

The number of tidyverse modeling packages continues to grow


yihui.name/en

Write a Book with bookdown and Publish with Chapman & Hall

I think the typeface should be the only thing you may want to customize. Other things are trivialities and not worth too much time. Don’t be preoccupied with customizing the appearance of your PDF (at least don’t do this too early). I guess the No


blog.sellorm.com

Automating a simple static website

I started the awesome blogdown list not long after I first heard about the blogdown package for R. I wanted a quick and easy way to showcase websites built with it, so I started a simple “awesome” style README page on GitHub


lenkiefer.com

Charts within charts

Maybe you are of the opinion that charts should have their y axis extend all the way down to 0, even if the data live far away from zero. I’m not sure if that’s always the right thing to do


jacobbuckman.com

More on Graph Inspection

The computational graph is not just a nebulous, immaterial abstraction; it is a computational object that exists, and can be inspected


dusty.phillips.codes

Refund for Contribution?

I accidentally started working on a new personal project for budgeting that I think others might be interested in


martakolczynska.com

Shiny app for exploring harmonized cross-national survey data (SDR v.1.0)

In the previous post I wrote about downloading and exploring the Survey Data Recycling (SDR), version 1.0 dataset, which consists of selected harmonized variables from 22 survey projects, 1966-2013. The SDR project will develop a website for browsing, subsetting, downloading, and visualizing data from the SDR project


roh.engineering

fitur 0.6.0 Release

Adding continuous distribution testing functions: Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises S3 methods have now been added for distfit objects. Code reformatting and


blog.zenggyu.com/en

Creating User-Defined Functions in PostgreSQL

The following code has been tested with PostgreSQL 10.4 on Ubuntu 18.04. A function can have an arbitrary number of arguments (e.g., 0, 1, 2, …) as input. If there is more than one argument, they should be separated by commas. The arguments need not be named (in which case they should be referenced by positional parameters in the function body), but each must have a type


ritsokiguess.site/docs

Dates and lubridate

Did somebody say dates? Well, actually not those


evangelinereynolds.netlify.com

National Anthems’ Sentiment Scores, Mapped and Interactive

This post, as indicated in the title, is about an interactive mapping of sentiment scores calculated for national anthems. Text analysis is of growing interest for political researchers, and I count myself among the interested! The interactive plot at the end of the post is, I think, an ideal introduction to sentiment analysis


ritsokiguess.site/docs

Testing means and medians

The data set that inspired this post comes from this edition of Mendenhall and Sincich. It comes from an investigation of how you learn people’s names effectively


www.robert-hickman.eu

The Knowledge 4th August 2018

Some longer chains involving cities happened in the 1920-1921 seasons in the Second Division, but it seems like the scheduling worked differently then and teams played back to back more, so doesn’t really


malco.io

When interaction is not interaction

If you want to learn more about these methods, you may be interested in this great-looking resource from Maarten van Smeden: Thanks to him for providing


djnavarro.net

Day 99-100: Small Steps

Not surprisingly, most of the posts (about 75% of them) were written in the first half of the project. That’s partly the inevitable consequence of the novelty wearing off, but it’s also that a few other things have come up along the way… One big thing that interacted with this 100Days project in a positive way is my teaching


cevo.com.au

DynamoDB Autoscaling with CloudFormation

DynamoDB autoscaling is a feedback-loop-based monitoring setup that can dynamically change the provisioned capacity of a table or global secondary index


blog.wallaroolabs.com

Dynamic Keys

Wallaroo is designed to help you build stateful event processing services that scale easily and elastically. State is partitioned across workers in the system and migrates when workers join or leave the cluster


lenkiefer.com

Global house price trends

In this post I want to share updated plots comparing house price trends around the world. Or at least part of the world. Our view will be somewhat limited, based on data, but will at least allow us to see how U.S. house prices compare to a few other countries


bayesianbabes.netlify.com

I (Heart Emoji) Statistics

We learned so much from Hamdan Azhar’s awesome Prismoji tutorial after seeing his wonderful talk at the Southern Data Science Conference


brendanmolin.netlify.com

Introduction to Urban Institute Education Data API

The Urban Institute released a public API that pulls and pre-processes data from various sources of education institution data, including but not limited to the Department of Education. We used their R package to explore the relationship between applicant and enrollment volume. To install the R package, you must have the devtools package installed


r-mageddon.netlify.com

UK Population Pyramid

On my journey to creating my animated Premier League table in my previous post, I noticed a lot of examples for creating gifs using the magick package. The gist behind the majority of these examples was to create a sequence of snapshots which could be combined together to create animations


magesblog.com

Use domain knowledge to review prior distributions

The prior predictive distribution shows me how the model behaves before I use my data. Thus, I can check if the model describes the data generating process reasonably well, before I go through the lengthy process of fitting the model
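To make the idea concrete, here is a minimal Python sketch of a prior predictive check. It is not the model from the post: it assumes a hypothetical normal model with priors mu ~ Normal(0, 10) and sigma ~ Exponential(1), simulates whole datasets using only those priors, and then asks what share of the simulated dataset means falls inside a range that domain knowledge says is plausible. The [-30, 30] range is also an illustrative assumption.

```python
import random
import statistics


def prior_predictive(n_draws, n_obs, seed=0):
    """Simulate datasets from the prior predictive distribution of a
    hypothetical model: mu ~ Normal(0, 10), sigma ~ Exponential(1),
    y_i ~ Normal(mu, sigma). No observed data is used."""
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_draws):
        mu = rng.gauss(0.0, 10.0)       # draw the location from its prior
        sigma = rng.expovariate(1.0)    # draw the scale from its prior
        # simulate one dataset given those prior draws
        datasets.append([rng.gauss(mu, sigma) for _ in range(n_obs)])
    return datasets


sims = prior_predictive(n_draws=1000, n_obs=20)
means = [statistics.mean(y) for y in sims]

# Domain check (hypothetical plausible range): how often do the priors
# produce dataset means we would consider reasonable?
share = sum(-30 <= m <= 30 for m in means) / len(means)
print(f"{share:.1%} of prior predictive means lie in [-30, 30]")
```

If the share were low, that would be a hint to revisit the priors before spending time on the lengthy model fit.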


toscano84.github.io

A Leaflet approach to Coffee Chains

This post talks about making interactive visualizations in R with leaflet(). In this example, I’ll map the USA locations of two of the biggest coffee chains, Starbucks and Dunkin’ Donuts. This package allows us to map data and play interactively with it


ropensci.org/blog

A package for dimensionality reduction of large data

…. My thought is that the ideal would be a package focused on UMAP specifically, implemented in R or Rcpp. Unfortunately I am not at all an expert in this topic or familiar with the mathematics involved, so the best I would be able to do is try to translate the Python implementation into R


djnavarro.net

Day 95-98: Press any key

I’m getting to the very end of this package tryout exercise, and I suspect this will be the last post (other than perhaps a wrap up on Friday). It’s been a mildly annoying morning: I’ve done something to my foot, I’ve been awake since 4am, and somehow my twitter feed was full of people talking about Jordan Peterson 😒


www.justadatageek.com

Exploring Burlington County, NJ, Part Two

I share my blogposts on Twitter and LinkedIn. I also let a few friends know via email. The suggestions that I received were welcome. Some were things I had already planned to do and others I had not thought of


www.niklasjohannes.com

Nothing to see yet

Thanks for popping by my website


www.granvillematheson.com

Publications

2018 Matheson, GJ (2018). We need to talk about reliability: Making better use of test retest studies for study design and interpretation. bioRxiv, 274894. Matheson, GJ, Plavén-Sigray, P, Louzolo, Anaïs, Borg, J, Farde, L, Petrovic, P & Cervenka, S (2018). Dopamine D1 receptor availability is not associated with delusional ideation measures of psychosis proneness


www.ddrive.no

Reading vintage magazines with `hocr`

library(tidyverse)
library(tesseract)
library(pdftools)
library(hocr)
library(here)
library(fs)
library(hunspell)
library(hrbrthemes)
library(patchwork)
Challenge: This post is inspired by a recent tweet by Paige Bailey about vintage computer magazines made available for free download on


dsollberger.netlify.com

Semester Schedule Planner

The convention is that “0” is a Sunday, “1” is a Monday, …, and “5” is a Friday
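Libraries disagree on how to number weekdays, so it helps to convert explicitly. As an illustrative sketch in Python (not code from the post), `date.isoweekday()` runs from 1 = Monday to 7 = Sunday, so taking it modulo 7 recovers the 0 = Sunday convention described above, the same numbering R uses for the `wday` field of `POSIXlt`. The helper name `sunday_zero_weekday` is hypothetical.

```python
from datetime import date


def sunday_zero_weekday(d: date) -> int:
    """Map a date to 0 = Sunday, 1 = Monday, ..., 5 = Friday, 6 = Saturday."""
    # isoweekday() gives 1 = Monday ... 7 = Sunday; mod 7 sends Sunday to 0
    # and leaves Monday..Saturday as 1..6.
    return d.isoweekday() % 7


print(sunday_zero_weekday(date(2018, 8, 5)))   # a Sunday  → 0
print(sunday_zero_weekday(date(2018, 8, 10)))  # a Friday  → 5
```

Avoid `date.weekday()` here: it starts at 0 = Monday, which silently shifts every day by one under this convention.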


www.granvillematheson.com

Who I Am and What I Do

My name is Granville Matheson, currently living and working in Stockholm,


masalmon.eu

ALLSTATisticians in decline? A polite look at ALLSTAT email Archives

And then it was time to scrape and parse… I created a function that gets the metadata out of each archive page. The trickiest points here were: ALLSTAT encourages you to use keywords in emails’ subjects, so many job openings contain some variant of “job”, and that’s the sample on which I shall work


dusty.phillips.codes

An Order to Learn to Program, Part 1

Learning to program is hard. There are a few reasons this is the case: Programming itself is hard. However, this is less true than most people believe


lenkiefer.com

House price gif that keeps on giffing

This tweet turned out to be popular: 👀house price trends👀 pic.twitter.com/JXB5P0H84A - Leonard Kiefer (@lenkiefer) August 1, 2018 It’s a remix of a chart we made here, though it uses a different index


r-mageddon.netlify.com

Interactive Premier League Table

For my inaugural blog post I decided I would step into the world of animated graphics for the first time


atusy.github.io/blog

With R 3.5.x, don't sync your packages with file-syncing software

As the title says, R3


www.jennadallen.com

Text Mining

As a part of the R4DS June Challenge and the “Summer of Data Science” Twitter initiative started by Data Science Renee, I decided to improve my text mining skills by working my way through Tidy Text Mining with R by Julia Silge and David Robinson


yihui.name/en

Two of My Use Cases of Lazy Evaluation

I’m not an expert in quotation or lazy evaluation. I just happen to have used them occasionally. I’m going to talk about two use cases of lazy evaluation. In two of my talks, I used delayed assignments to execute R code for no good reason except that I just wanted to confuse the audience


blog.rstudio.com

rstudio

Learn from and interact with these outstanding invited speakers and R innovators: Find out what RStudio is working on from the people who make the materials and tools you use


tiao.io

Building Probability Distributions with the TensorFlow Probability Bijector API

The underlying process that generates samples $\tilde{\mathbf{y}} \sim p_{Y}(\mathbf{y})$ is simple to describe, and is of the general form, $$ \tilde{\mathbf{y}} \sim p_{Y}(\mathbf{y}) \quad \Leftrightarrow \quad \tilde{\mathbf{y}} = G(\tilde{\mathbf{x}}), \quad \tilde{\mathbf{x}} \sim p_{X}(\mathbf{x}) $$
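The sampling process can be sketched without TensorFlow Probability. This is plain Python, not the Bijector API: it assumes a standard normal base distribution for $\tilde{\mathbf{x}}$ and picks $G = \exp$ purely for illustration, so each draw $\tilde{\mathbf{y}} = G(\tilde{\mathbf{x}})$ is a log-normal sample. The function name `sample_transformed` is hypothetical.

```python
import math
import random


def sample_transformed(n, G, seed=0):
    """Sample y = G(x) where x ~ Normal(0, 1) is the base distribution.

    Returns the base draws and their transformed values.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # draw from the base distribution
    ys = [G(x) for x in xs]                        # push each draw through G
    return xs, ys


# G = exp maps standard normal draws to log-normal draws.
xs, ys = sample_transformed(10_000, math.exp)
print(min(ys) > 0)  # exp is always positive, so every y is positive
```

Sampling only needs the forward map $G$; evaluating the density $p_{Y}$ would additionally need $G^{-1}$ and its Jacobian, which is what a bijector bundles together.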


lcolladotor.github.io

Harassment, diversity in science and inspiration from my grandmother

I actually don’t know much more. She passed away when I was 13 after a years-long battle with disease. Google tells me that she is a co-author of at least three titles in the field of Public Health: I did inherit her souvenirs from her trips (my dad also loves them) and something that is precious to me: a medal with her name


yihui.name/en

In HTML and the Web I Trust

My blog post is relatively short, and I strongly recommend that you read the full article “LaTeX is dead”


lenkiefer.com

Beige-ian Statistics

Let’s pick up where we left off yesterday and do some more exploration with text mining. Like yesterday, we’ll use the tidytext package for R. And we’ll lean heavily on Julia Silge and David Robinson’s Text Mining with R


rviews.rstudio.com

June 2018: Top 40 New Packages

Simulate a variety of periodically-collapsing bubble models


engineering.pivotal.io

Let's use Vault - Part 3

We have now come to the final leg of our journey. We will be integrating Vault with Concourse CI and exploring some tooling that was built specifically to make your lives easier


djnavarro.net

Day 82-94

So this is a post about how I set up one part of my workflow. I feel nervous about it for two reasons: Yes, I realise that I’m setting myself up to feel bad. I should stop


engineering.pivotal.io

Let's use Vault - Part 2

This post provides a guide to the simplest commands required to set up Vault locally for your team, instead of having to wade through all of HashiCorp’s extensive documentation


lenkiefer.com

Text Mining Fedspeak

Text mining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See, for example, Gentzkow, Kelly and Taddy’s Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy


atusy.github.io/blog

MathJax in blogdown

What is MathJax? MathJax lets you write math using $\TeX$ notation. To display it as a block, typing $$\latex$$ gives $$ \LaTeX $$.


ab604.netlify.com

An unmet need for data science training

The aim is to try to define the problem(s) a bit better and also a bit of a cry for help. I appreciate that none of this may be novel, but I needed to get it written down and out of my head


blog.rstudio.com

Announcing the 1st Bookdown Contest

There are no hard judging criteria for this contest, but in general, we’d prefer these types of applications: We’d also like to see non-English applications, such as books written in CJK (Chinese, Japanese, Korean), right-to-left, or other languages, since there are additional challenges in typesetting with these


cevo.com.au

Docker on Windows

Since my recently published blog post When Docker meets Make, a few of my mates commented they couldn’t get Docker and GNU Make working on their shiny new Windows PCs


www.williamrchase.com

Friday Fails #2

So what to do now? Well, the truth is that I compared sequences from all three sources, tried to minimize differences between them, and then just sent that off for synthesis. Does the sequence I sent off match any of the sources exactly? No


research.libd.org/rstatsclub

Hacking our way through UpSetR

First, let’s install the version we used for this post: Next, we did the same (commas to semicolons) for the inputs of the first example. Our club session was out of time, so we decided to continue our project another day and ask for help on twitter


yihui.name/en

Help Needed

The three components of a software package are equally important in my eyes: source code, documentation, and tests


yihui.name/en

Quietly Struggling (with Software)

Anyway, if a software package seems to try to turn an average user into a sysadmin, that is probably not a good sign. Ummm… R CMD javareconf? Java 8? 9? JDK 10? sudo? Actually I did figure out how to install it, but it was a long way… I was afraid that I would have to go through this again in the future (like I did for a few times in the past), so I chose not to touch it again


yihui.name/en

The First Bookdown Contest



toscano84.github.io

Tuition costs and gdp per capita

This post will explore with R one of the simplest approaches to predict a response of a quantitative nature. This approach is called Linear Regression


simplystatistics.org

Why I Indent My Code 8 Spaces

In the video version of the talk (not in the slides) Jenny calls out my particular indentation rule, which is to use 8 spaces. In my experience, people tend to find this a rather extreme indentation policy, with maybe 4 spaces being at the outer limit of what they could imagine


www.rostrum.blog

Engifification in R with gifski

By Matt Dray. You and I both know that the world needs more intergalactic-sloth-pizza gifs. Great news: ‘the fastest gif encoder in the universe’ has been created. The gifski package for R is now on CRAN


irene.rbind.io

FUNctional programming tricks in httr

Sections: httr basics; On with the tricks!; Embrace the backtick; The null-default operator %||%; Check argument inputs with match
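For readers coming from outside R: the null-default operator `%||%` returns its left-hand side unless that value is `NULL`, in which case it returns the right-hand side. A rough Python analogue, treating `None` as `NULL`, might look like this (the name `null_default` is hypothetical, not part of httr):

```python
def null_default(x, y):
    """Return y if x is None, otherwise x (a Python analogue of R's %||%)."""
    return y if x is None else x


print(null_default(None, "default"))  # → default
print(null_default(0, "default"))     # → 0
```

The explicit `is None` test is the point of the design: Python's `x or y` would also replace falsy values such as `0` or `""`, whereas `%||%` only substitutes for a missing (`NULL`) value.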


lenkiefer.com

Getting animated about new home sales

Indications are that U.S. housing market activity in the middle part of 2018 has moderated. Home sales estimates for both new home sales and existing home sales declined on a seasonally adjusted basis in June relative to May. House price growth has also moderated recently. Some folks have gotten animated about the recent trends


dusty.phillips.codes

Hacking Happier

Back in 2012, I wrote a book called Hacking Happy. It was my first self-published work, and I was actually surprised by how well it did without a publisher or marketing behind it. I had plenty of positive feedback including more than one hopefully exaggerated, “This book saved my life


www.williamrchase.com

How to Phylogeny (Part 0

Hi, in this series of posts, I’ll introduce a general workflow for estimating a phylogenetic tree for a single gene. When learning phylogenetics, I often got lost in the dizzying array of tools and methods available for sequence alignment and tree building


www.williamrchase.com

How to Phylogeny (Part 1

Hi, in this series of posts, I’ll introduce a general workflow for estimating a phylogenetic tree for a single gene. When learning phylogenetics, I often got lost in the dizzying array of tools and methods available for sequence alignment and tree building


gcppodcast.com

Next Day 2

Paresh Kharya is Group Product Marketing Manager for data center products at NVIDIA responsible for product marketing of NVIDIA’s Tesla accelerated computing platform


blog.rstudio.com

RStudio Connect 1.6.6 - Custom Emails

We are excited to announce RStudio Connect 1.6.6! This release caps a series of improvements to RStudio Connect’s ability to deliver your work to others. All customizations are done using code in the underlying R Markdown document


blog.wallaroolabs.com

Real-time Streaming Pattern

This week I will continue my series of posts looking at data processing patterns used to build event-triggered streaming applications, focusing on joining event streams


ropensci.org/blog

rOpenSci Educators Collaborative

In previous posts in this series, we identified challenges that individual instructors typically face when teaching science with R, and shared characteristics of effective educational resources to help address these challenges


dusty.phillips.codes

I'm Back

Hi there, I’m Dusty. Welcome to my resurrected blog. I started a tech blog in 2007 that I maintained with regular posts for several years. While it was well-regarded at the time, I took it down in late 2016 for several reasons


rviews.rstudio.com

JSM 2018 Itinerary

JSM 2018 is almost here! Usually around this time, I comb through the entire program manually making an itinerary for myself. But this year I decided to try something new – a programmatic way of going through the program, and then building a Shiny app that helps me better navigate the online program


mlr-blog.netlify.com

Visualization of spatial cross-validation partitioning

In July mlr got a new feature that extended the support for spatial data: the ability to visualize spatial partitions in cross-validation (CV)


ropensci.org/blog

rOpenSci Educators Collaborative

In the first post of this series, we sketched out some of the common challenges faced by educators who teach with R across scientific domains


magesblog.com

Notes from the 1st Insurance Data Science event

The Insurance Data Science conference is a great opportunity to bring together academic and industry leaders, who will explore new developments and applications of cutting-edge techniques in insurance, as well as the bigger picture of how statistical and business practice is transformed with the wide adoption and embedding of digital


nowosad.github.io

Quantifying temporal change of landscape pattern

Imagine you have two values expressing the world population in 1950 (2.5 billion people) and 2012 (7.1 billion people). How would you compare the change in the world population? The easiest (and correct) approach is just to subtract the past value from the more recent one: we can conclude that the world population between 1950 and 2012 increased by 4.6 billion people


yihui.name/en

Slowly but Steadily, They Started to Help Me Answer Questions

Some people have been helping me so frequently on Github and Stack Overflow that I can easily list their names: Marcel Schilling, Michael Harper, Ralf Stubner, Christophe Dervieux, and TC Zhang (apologies if I omitted other frequent helpers - I’m pretty bad at remembering people’s


www.tidyverse.org/articles

Tidy evaluation in ggplot2

We could use this same pattern to make a


cevo.com.au

You'll always remember your first time...open sourcing

You may have seen our previous posts regarding projects that Cevo has worked on that have gone on to be open sourced. If not, now’s a great time to catch up before we continue! Information about Watchmen can be found here and here


ropensci.org/blog

rOpenSci Educators Collaborative

This first post aims to summarize the main challenges that educators face, as a tool to help them think through the decisions they make about their course materials


ropensci.org/technotes

Gifski on CRAN

The R package wraps the Rust crate and can be installed in the usual way from CRAN. One of the major benefits of Rust is that it has no runtime, so the R package has no dependencies. This is the first CRAN package that interfaces a Rust library


martakolczynska.com

ISA World Congress 2018

The International Sociological Association 19th World Congress of Sociology in Toronto (15-21 July) has received quite some Twitter


martakolczynska.com

Late start

This blog is going to be mostly about my adventures with R, primarily using survey data, and usually somewhat related to my social science interests; for the fun of it, to share code and hopefully get feedback


simplystatistics.org

Partitioning the Variation in Data

Understanding which aspects of the variation in your data are fixed is important because often you can collect data on those fixed characteristics and use them directly in any statistical modeling you might do. For example, season is an easy covariate to include because we already know when the seasons begin and end


rviews.rstudio.com

REST APIs and Plumber

Traditionally, moving this model into production has involved one of two approaches: either running customer data through the model on a batch basis and caching the results in a database, or handing the model definition off to a development team to translate the work done in R into another language, such as Java or Scala


ryantravis.netlify.com

Some Books I Read in July

The Dilemmas of Lenin: Terrorism, War, Empire, Love, Revolution by Tariq Ali. A very interesting biography of Lenin. The book isn’t a traditional biography. Instead, it’s a kind of intellectual biography focused around particular topics


jvera.netlify.com

Back to Basics (Emacs + ESS + zsh + byobu)

I think I’m a little bit “old school”, or maybe sometimes you just have to use the right tool for the task. Some time ago, I discovered I feel more productive staying away from the mouse, so since then terminals and text editors have been my daily working environment. If you use Linux, I’m preaching to the choir, and nearly the same goes if your work involves macOS


lenkiefer.com

House price gifski

I saw today, via rOpenSci, a blog post about a new package for making animated gifs with R called gifski, now available on CRAN. Let’s adapt the code we shared last week to use the gifski package


evangelinereynolds.netlify.com

Layered Presentation of Graphics, revised

I think it is more straightforward than messing around with alpha. Several folks brought up geom_blank() after looking at the previous implementation, but I didn’t find it necessary in this case if you are using last_plot(), which I think makes sense to do in this context. Still, geom_blank() is good to know about


livefreeordichotomize.com

Shinyviewr

My package shinysense has been around for more than a year now. It started as a package to add swiping via touch screens to shiny for our app Papr, but then slowly got built to include functions for hearing (shinyearr), movement (shinymovr), and drawing (shinydrawr). However, one major sense was missing: vision


yihui.name/en

The Best Way to Support LaTeX Math in Markdown with MathJax

your math expressions will have a light-gray background, too. It is possible to remove the background color, but it is relatively complicated. Even if you use a pair of backticks, you still have the second problem above