jesse.tw

A data viz that asks questions

I don’t design to answer questions, I design to ask them. There is an incredible range of data visualizations. They’re great, really. Sometimes they are very pretty pictures, which is great, unless the goal wasn’t to draw a pretty picture.


livefreeordichotomize.com

A year as told by fitbit

I managed to wear a Fitbit for the entirety of 2017. This is exciting for a few reasons: one, I have commitment problems, and two, it’s a lot of data that I have to play with.


rmflight.github.io

Custom Deployment Script

Here is the simple script I ended up


www.gokhanciflikli.com

Visualising US Voting Records with shinydashboard

Introducing adavis. My second-ever post on this blog introduced adamap, a Shiny app that maps Americans for Democratic Action voting scores (the so-called Liberal Quotient) between 1947 and 2015. It was built with highcharter, and hence it was nicely interactive but quite slow.


yihui.name/en

library(methods)

Okay, let’s see how terribly “slow” it is today. On my MacBook Pro (I should run this multiple times, but I think 0.04 should be in the ballpark): … It is sad to see that things that exist for historical reasons are so hard to change. P.S


roelandtn.frama.io

saveRDS() vs write_csv() - a newbie introduction

Last week, I came across this blog post from Yihui-down called save() vs saveRDS(). It followed a reply from Jenny Bryan on Twitter. When more experienced people like them advise using a tool, it is for a reason, so I decided to give it a try.
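Below is a minimal sketch of the difference the post explores. It uses a small made-up data frame. Base R's write.csv() and read.csv() stand in for readr::write_csv() so the block has no dependencies. Be aware that readr guesses column types on reading, so its round trip will not behave exactly like this one.

```r
# A small data frame with column types that plain text cannot record.
df <- data.frame(
  id    = 1:3,
  group = factor(c("a", "b", "a")),
  day   = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# saveRDS() serializes the R object itself, attributes included.
saveRDS(df, rds_path)
# write.csv() writes text, so factor levels and the Date class are lost.
write.csv(df, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

identical(df, from_rds)   # TRUE: the RDS copy matches the original exactly
class(from_csv$day)       # not "Date": read.csv() only sees text
```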


www.mytinyshinys.com

EPL Week 19

Match of the Day: Arsenal and Liverpool showing why they are fun to watch but won’t be winning major trophies. Kane is Able: Some eyebrows were raised when Harry Kane was rated by a Guardian poll of 169 experts as the fifth best player in the world for 2017, but he certainly did his best to justify that ranking with a hat-trick at high-flying


www.blog.rdata.lu

Launching your shiny app in 2 clicks

Hello everyone, it’s fast and useful if you work with colleagues who don’t have a clue about R and just want to use your Shiny app. Open a text editor and write the following lines: … If it doesn’t work, check your pandoc location.


ritsokiguess.site/docs

Drawing maps in R with 'ggmap' and 'OpenStreetMap'

Introduction. I have long been interested in drawing maps, and since I discovered how to do it in R, I have tried to add it to my statistical repertoire, including drawing things on maps.


jesse.tw

Friends Title Generator, Part 2

We’re back on the Friends script grind.
titles % select(-director, -writers)
titles
## # A tibble: 236 x 5
##   season episode title                                rating n_ratings
## 1     1.      1. The One Where Monica Gets a Roommate   8.50     4317.
## 2     1.      2. The One with the Sonogram at the End   8.20     3107.
## 3     1.      3. The One with the Thumb                 8


livefreeordichotomize.com

Leveraging uncertainty information from deep neural networks for disease detection - a summary

As a biostatistician in the deep learning world, I have the awkward task of balancing the dogma of statistics (everything is uncertain) with the alluring success of some of the newest crazy complex neural network architectures.


adamspannbauer.github.io

Plotting Christmas Carols in R

In this post we’ll be playing with spacyr & visNetwork to parse and plot the lyrics of the Christmas Carol ‘Santa Claus is Coming to Town’. The spacyr package is a wrapper around the spaCy python module for NLP


jesse.tw

Random sampling

The problem: you want to generate a random collection of letters.
letters %>% sample(size = 4) %>% paste0(collapse = "")
## [1] "dlcz"
Great. Now do it 8 times.
letters %>% sample(size = 4) %>% paste0(collapse = "") %>% rep(8)
## [1] "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj"
😖. Why am I like this? Ok, rep copies the object 8 times. It doesn’t draw 8 samples.
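One base R fix is sketched below. It is not necessarily the post's own solution: replicate() re-evaluates the sampling expression on every iteration instead of copying a single result. The pipe is dropped so the block needs no packages, and the seed value is arbitrary.

```r
# rep() evaluates its argument once and then copies the result,
# so every element here is the same 4-letter string.
one_draw_copied <- rep(paste0(sample(letters, size = 4), collapse = ""), 8)
length(unique(one_draw_copied))  # always 1

# replicate() re-runs the expression each time, giving 8 independent draws.
set.seed(2018)  # fixed seed only so the sketch is reproducible
draws <- replicate(8, paste0(sample(letters, size = 4), collapse = ""))
draws
```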


jesse.tw

Friends Title Generator, Part 1

To fulfill my lifelong desire to write Friends scripts, I’ll start by writing a Friends episode title. By that I mean a script that writes a Friends episode title


www.noahlandesberg.com

Introducing rhymer

For example, I’ve been enjoying rewriting nursery rhymes: The package includes additional functions to find other related words, including: And of course, much thanks


jesse.tw

Thank you blogdown ppl

Hardest problem: understanding Hugo structures / getting a theme I’m happy with. Easiest problems: Netlify, emoji, gifs, actually writing posts (ty blogdown). A list of things I found useful when making this blog


blog.sellorm.com

Learn to Write Command Line Utilities in R - part 5

Check out the first post in this series for an index of all the other posts. Last time, we changed the way our sorting hat command line utility did its sorting


jesse.tw

Post-selection inference on Friends titles in R

Goal: I want to be a Friends scriptwriter. Can I pick a title that makes an episode an automatic classic? If I just include a character’s name in the title, does it make it automatically popular? I assume I should just write “The One Where Rachel is Rachel”


www.jessemaegan.com

R4DS: the next iteration

Like most online learning endeavors, we had a massive surge of interest at the onset, with exponential drop-offs week after week as we progressively worked through each chapter based on an established schedule


eliocamp.github.io/codigo-r

How to calculate the Standardized Precipitation Index in R

The Standardized Precipitation Index (SPI) is an index for assessing drought or excess rainfall. The idea of the SPI is to get a sense of how likely it is to have an amount of rain equal to or less than
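Here is a rough sketch of the SPI idea in base R. It assumes strictly positive totals, a single gamma distribution for the whole series, and method-of-moments parameters. The post may well use a dedicated package, maximum likelihood, per-month fits, or special handling of zero-rain periods; none of that is shown here. Each total is mapped to its gamma cumulative probability, and that probability is then expressed as a standard normal quantile. The rainfall values are hypothetical.

```r
# Sketch of a Standardized Precipitation Index (SPI).
# Assumptions: all totals > 0, one gamma fit for the whole series,
# method-of-moments parameters (operational SPI usually differs).
spi_sketch <- function(precip) {
  stopifnot(all(precip > 0))
  m <- mean(precip)
  v <- var(precip)
  shape <- m^2 / v   # method-of-moments gamma shape
  rate  <- m / v     # method-of-moments gamma rate
  # Probability of getting this much rain or less under the fitted gamma
  p <- pgamma(precip, shape = shape, rate = rate)
  # Express that probability as a standard normal z-score
  qnorm(p)
}

# Hypothetical monthly totals in mm, for illustration only
precip <- c(42, 88, 15, 120, 60, 33, 75, 51, 98, 27, 64, 80)
spi <- spi_sketch(precip)
round(spi, 2)
```

Negative values indicate drier-than-usual periods and positive values wetter ones. Because both transformations are increasing, the ranking of the totals is preserved.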


translatedmedicine.com

Income Along the Boston T (Red Line)

In the first part of this series, I went through my approach for creating a New Yorker-inspired visualization of income along the Orange Line on the Boston T. For this post, we take a look at the Red Line. Below you will see median household income along both lines of the Boston Red Line.


blog.sellorm.com

Learn to Write Command Line Utilities in R - part 4

Check out the first post in this series for an index of all the other posts. In previous posts, we’ve been working on our command line Sorting Hat utility. We started out with a really simple tool that ran on the command line and just output a random Hogwarts house


www.blog.rdata.lu

Skip errors in R loops by not writing loops

You have probably encountered situations similar to this one: … I hope you enjoyed this blog post, and that these functions will make your life
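The post's own code is not reproduced in this excerpt. What follows is only a base R sketch of the idea in the title: replace the loop with an apply-style call, and wrap the risky step so that a failure yields NA instead of stopping the whole run. safe_log() and the inputs are hypothetical. purrr's possibly() and safely() package the same pattern.

```r
# Hypothetical inputs: the third element makes log() fail.
inputs <- list(1, 4, "a", 9)

# Wrap the risky call so an error becomes NA instead of stopping the run.
safe_log <- function(x) {
  tryCatch(log(x), error = function(e) NA_real_)
}

# vapply() replaces the explicit for loop and checks each result is one number.
results <- vapply(inputs, safe_log, numeric(1))
results
```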


aosmith.rbind.io

Using DHARMa for residual checks of unsupported models

One of the difficult things about working with generalized linear models (GLM) and generalized linear mixed models (GLMM) is figuring out how to interpret residual


yihui.name/en

Being Busy vs Being Productive

Even if it is well-documented and fully tested, it may still not be “finished” - you have to maintain it (unless it is as stable as Donald Knuth’s TeX). This has been the biggest challenge to me. Bug reports, feature requests, and questions…


blog.sellorm.com

Learn to Write Command Line Utilities in R - part 3

Check out the first post in this series for an index of all the other posts. Yesterday we modified our simple sorting hat command line utility to accept its first argument, a name. Those of you who’ve been playing along may have noticed that our implementation wasn’t ideal. It’s fine if you run the script with an argument, like ./sortinghat


www.gokhanciflikli.com

Mining Game of Thrones Scripts with R

Quantitative Text Analysis Part II. I meant to showcase the quanteda package in my previous post on the Weinstein Effect, but had to switch to tidytext at the last minute.


blog.wallaroolabs.com

Simplify Stream Processing in Python and Wallaroo using Docker

Distributed data stream processing frameworks can be hard to build and set up.


lenkiefer.com

State population growth and house prices

EARLIER TODAY THE U.S. CENSUS BUREAU released new estimates of population for U.S. states from 2010 through 2017. Let’s see how population trends look compared to recent house price growth


roh.engineering

fitur 0.5.20 Release

- DESCRIPTION summary has been updated
- Examples have been updated for fit_univariate, fit_empirical
- fit_empirical_discrete and fit_empirical_continuous are no longer exported
- Added plot_qq and plot_pp functions for diagnostic plotting of fits
- Added a vignette, Diagnostic Plots for Fitting Distributions
- Introduction vignette has been updated


www.tidyverse.org/articles

styler 1.0.0

styler can style text, single files, packages and entire R source trees with the following functions: A distinguishing feature of styler is its flexibility. The following options are available: We will briefly describe all of them below


yihui.name/en

Another Year, Another R Package, Another Book, and Endless Joy

Originally we were thinking of completely reinventing the wheels of static websites using R, but when I finally started to work on it in October 2016, I realized the wheels were too big for me


giorasimchoni.com

E is for Elephant (The ebayr Package)

It occurred to me I’m not always putting my R powers to good


www.mytinyshinys.com

EPL Week 18

Match of the Day: Four teams win away from home by margins of at least three goals; surprisingly, Huddersfield, West Ham and Crystal Palace join Liverpool. Who is your Talisman: Some players are regarded as more essential to their teams than others; Pogba at Manchester United or Zaha at Crystal Palace spring to


roelandtn.frama.io

French Urban Population growth between 2010 and 2014

Hi all! Sorry this map is in French, but it was a school assessment, so… French it is. Disclaimer: you might encounter a lot of French words and links to French webpages. You are warned. The main goal was to work with PostgreSQL and PostGIS on census data.


aebou.rbind.io

How to follow and engage with the R community

“R is not just a programming language, but it is also an interactive environment for doing data science.” “Investing a little time in learning R each day will pay off handsomely in the long run.” - Hadley Wickham and Garrett Grolemund in R for Data Science


magesblog.com

Insurance Data Science Conference 2018

In 2013, we started with the aim of bringing practitioners from industry and academia together to discuss and exchange ideas and needs from both sides. R was and is a perfect glue between the two groups, a tool which both sides embrace and which has fostered the knowledge transfer between the two.


blog.sellorm.com

Learn to Write Command Line Utilities in R - part 2

In yesterday’s post we took a look at command line utilities in general, some of the reasons why they’re useful, and also made our first bare-bones utility of our own. Today, we’re going to extend our sortinghat.R example, by allowing it to accept ‘arguments’


sharanry.github.io

The Three Giants' Survey

For quite some time, researchers thought backpropagation wouldn’t work because it might get stuck in a local minimum, but analysis showed that this rarely happened. There might be many saddle points at which the algorithm could get stuck, but it doesn’t matter which one, as they all have almost the same objective value.


www.tidyverse.org/articles

testthat 2.0.0

Install the latest version of testthat with: A new default reporter revamps the output to make better use of colour. New setup and teardown tools make it easier to run code before and after each test file, and before and after all tests. New and improved expectations make it easier to test printed output and precisely test conditions (i.e


blog.sellorm.com

Learn to Write Command Line Utilities in R

Other posts in this series:
Part 1 - Getting Started (this post)
Part 2 - Arguments
Part 3 - Argument validation
Part 4 - Improved sorting
Part 5 - Debug logging
Part 6 - Improving look and feel
Introduction. Do you know some R? Have you ever wanted to write your own command line utilities, but didn’t know where to start? Do you like Harry


alison.rbind.io

R-Ladies presentation ninja

The way to use the theme is to update the YAML like so: Some examples:


roelandtn.frama.io

Virtualbox images resizing

I had to resize an OSGeoLive virtual machine because, after a couple of PostgreSQL workshops with quite big datasets (less than 1 GB each, but still), the default size was not


jesse.tw

rvest + imdb -> explore Friends episode titles

I always wanted to be a scriptwriter. But my approach to doing creative things is “find the secret, program it, retire”. So what’s the secret to a successful Friends episode? [Really, I want to write/experience a gentle introduction to rvest, and later tidytext and language data science


blog.schochastics.net

A wild R package appears! Pokemon/Gameboy inspired plots in R

The package is only available via GitHub so far. The package comes with a dataset on 801 pokemon with a rich set of attributes. The package includes three main themes for ggplot. If you want to get nostalgic. If you want to get nostalgic, but not too much, use the Gameboy Advanced theme.


jesse.tw

Functional programming #rstats


maximewack.com

Hello Internet

This is the first post for this blog. I will post here about the tools that I use, mostly Archlinux, vim and R. You can read more about this blog and myself in the About section. Social media contacts are at the bottom of every page. You can browse my (public) repos on my own instance of Gitea with the top menu


adamspannbauer.github.io

Summarizing Web Articles with R using lexRankr

In this post we’ll be recreating the output of a popular summarization bot using the R package lexRankr. If you browse reddit you may have come across /u/autotldr, a popular bot that performs article summarization


yihui.name/en

How To Stop Sexual Harassment Or Other Misconduct At Conferences

Again, I think we could add a text field on the conference registration page to let people report the names of past offenders that they know. The report could be (perhaps should be) anonymous. Anyone who has been reported many times should be banned from the conference.


ritsokiguess.site/docs

Testing for time trend

Introduction. One of the things my environmental science colleagues spend much of their time doing is assessing whether something is changing over time. Most commonly, the depressing conclusion from one of their investigations is “climate change”. One of the studies I was part of concerned temporal trends in sea ice in Hudson Bay.
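One common non-parametric check is sketched below. Treat it as an assumption about the kind of test the post covers, not as the post's method. It computes Kendall's tau between the measurements and time, which is the idea behind the Mann-Kendall trend test, using base R's cor.test(). The yearly series is made up.

```r
# Hypothetical yearly measurements: a downward drift plus a wiggle
year  <- 1990:2009
value <- 100 - 2 * (year - 1990) + 3 * sin(year)

# Kendall's tau between value and time. A negative tau suggests a
# decreasing trend; the p-value tests tau = 0 (no monotone trend).
trend <- cor.test(year, value, method = "kendall")
trend$estimate
trend$p.value
```

A rank-based test like this only asks whether values tend to go up or down over time. It does not assume the trend is linear or the noise is normal.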


www.jtimm.net

from web to annotated corpus

This post demonstrates some methods for building multi-lingual corpora from web-based news content using my R package quicknews, as well as methods for annotating multi-lingual corpora using the cleanNLP (Arnold 2017) and udpipe (Wijffels 2018)


vatlab.github.io/blog

SoS: a cure to pipelineitis

Because of the need to use libraries and tools in different languages and to execute them on different systems such as computer clusters, bioinformaticians write a lot of scripts in different languages and face many challenges in developing, running, managing, sharing, and reproducing bioinformatic data


blog.wallaroolabs.com

Dynamic tracing a Pony + Python program with DTrace

Your application probably has a performance problem. Or your app has a terrible bug. Or both. To find and fix these problems, many software developers use a profiler or a debugger. Profilers and debuggers are (usually) fantastic tools for solving performance and correctness problems


satopirka.com

GTC Japan 2017: Deep Learning Framework Notes

GTC Japan 2017


mlr-blog.netlify.com

Team Rtus wins Munich Re Datathon with mlr

On the weekend of November 17 to 19, five brave data knights from team “Rtus and the knights of the data.table” took on the challenge of competing in a datathon organized by Munich Re in its Munich-based innovation lab.


gcppodcast.com

A Year in Review with Francesc Campoy Flores and Greg Wilson

What were your personal highlights for 2017? It’s the end of the year! So we’ll be taking a break, and returning in January


www.mytinyshinys.com

EPL Week 17

Match of the Day: For the second time recently, Watford have removed Deeney late in a winning position, only to end up with no points. Watford were ahead of Crystal Palace for 85 minutes in Tuesday’s game, 15 minutes more than Palace have led in matches the whole season to date. Yet Another Man City Chart: With


yihui.name/en

One Little Thing To Consider When Naming Things (Software)

Everyone knows that naming is hard (the other hard thing is cache invalidation)


translatedmedicine.com

Income Along the Boston T (Orange Line)

As a native New Yorker, I was recently intrigued by a visualization of household income in New York highlighting inequality (New Yorker). Having lived in Boston for going on 9 years, I’ve started to call this place home. Could I make a similar representation of income in Boston? That was my challenge! PART I: The Orange Line. To start, I used these


www.tidyverse.org/articles

Project-oriented workflow

If the first line of your R script is I will come into your office and SET YOUR COMPUTER ON FIRE 🔥. If the first line of your R script is I will come into your office and SET YOUR COMPUTER ON FIRE 🔥. Caveat: only you can decide how much you care about this


www.mytinyshinys.com

EPL Week 16

Match of the Day: Bizarrely, the top two teams contrive to score three cock-up goals. City dominance: With an eleven-point lead over their nearest rivals, Man City are looking home and hosed for the title.


lenkiefer.com

Plotting U.S. Macroeconomic Trends with FRED and R

LET’S TAKE A LOOK AT RECENT U.S. macroeconomic trends by making a couple of plots with R code. Since we’re going to be looking at U.S. macroeconomic data, the data we’ll need is available in the St. Louis Federal Reserve Bank’s economic database, FRED.


www.rdatagen.net

When there's a fork in the road, take it. Or, taking a look at marginal structural models.

The DAG below is a simple version of how things can get complicated very fast if we have sequential treatments or exposures that both affect and are affected by intermediate factors or conditions. In reality, there are no parallel universes


giorasimchoni.com

Ave Mariah

My friend Nir and I have this habit of sending one another songs in the middle of the day. The other day I sent him this one by Mariah Carey: To which he replied “that’s so gay”, to which I replied “don’t confuse my 12-year-old taste with my gay taste!”


magesblog.com

Changing settlement rate model for paid losses

Glenn used the correlated log-normal chain-ladder model on reported incurred claims data to predict future developments. However, when looking at paid claims data, Glenn suggested changing the model slightly.


asch3tti.netlify.com

Happy Holidays!

A simple white background was not enough. We needed something more appropriate for this period of the year… ah, yes! Snow! The end result is a big flat face fluctuating in the air, like a spooky and ominous reminder of our mortality even in these days of opulence


vatlab.github.io/blog

What's the big deal about backing SoS Notebook with a workflow engine?

After I announced the release of SoS Notebook as a third-party multi-language kernel for Jupyter, I was asked repeatedly (e.g


blog.sellorm.com

When a Tweet Turns Into an R Package

Boy, that escalated quickly. I just wanted to write up a brief post about the power of R and its community, and to tell the story of how actually putting stuff out into the world can have amazing


blog.davisvaughan.com

Writing a paper with RStudio

This semester I had to write a paper for my Financial Econometrics class. My topic was on analyzing the volatility of Bitcoin using GARCH modeling. I’m not particularly interested in Bitcoin, but with all the recent news around it, and with its highly volatile characteristics, I figured it would be a good candidate for analysis


blog.mgechev.com

Redux Anti-Patterns - Part 1. State Management.

For the past year I’ve been working on a project which uses React with TypeScript and Redux. In a few blog posts I’m planning to share lessons learned while combining these technologies. In this article I’ll share a few anti-patterns related to state management that I noticed in our development process


blog.wallaroolabs.com

Stateful Multi-Stream Processing in Python with Wallaroo

Wallaroo is a high-performance, open-source framework for building distributed stateful applications. In an earlier post, we looked at how Wallaroo scales distributed state


ewen.io

Introducing geniusr

I made an R interface to the Genius


gcppodcast.com

New York Times with Deep Kapadia and JP Robinson

What best practices are there for securing a Kubernetes Engine


www.mytinyshinys.com

EPL week 15

Match of the Day: The big-six contest, Arsenal v Man. Utd., illustrated once again that de Gea may be the finest goalkeeper in the world. Club in Crisis - Spurs: After being praised to the heights by the press, things at Spurs are unravelling fairly quickly.


jvera.netlify.com

Great packages for understanding your data

In the first steps of any project, the most common task is to take a glimpse, figure out how our data is distributed and, as fast as possible, be ready for the next steps (wrangling and imputation).


ropensci.org/technotes

Magick 1.6

One issue was that sometimes magick graphics would show a 1px black border around the image. It turned out this is caused by rounding of clipping coordinates. When R calculates clipping area it often ends up at non-whole values


eddjberry.netlify.com

SparkR vs sparklyr for interacting with Spark from R

This post grew out of some notes I was making on the differences between SparkR and sparklyr, two packages that provide an R interface to Spark


cevo.com.au

Cevo - So Fast We're Seeing Double!

Cevo are delighted to have been awarded second place in the CRN Fast50 and 40th in the Financial Review Fast Starters for 2017. The former, announced last week, places us amongst the fastest growing IT solution providers in Australia


jvera.netlify.com

Explainers

When working on machine learning for classification and predictive models, we tend to use well-known packages such as randomForest, caret, xgboost, gbm and the like. The issue arises when the user needs an explanation of how we got these results. The easy part is explaining a tree.


giorasimchoni.com

Snap-Fu

In one of Silicon Valley Season 4’s episodes, Dinesh finds himself in need of penis images in order to make a penis image detection app. I thought about choosing this task for my object detection project. After all, I am no Dinesh; tagging images of penises comes easy to me. But then I could hear my Dad after this is posted, saying “it had to be you


www.blog.rdata.lu

Visualizing box office revenue by genre

In this post, I describe the different steps leading to the treemap. First of all, we read the data. The dataset looks better. As you have seen at the top of this post, we want to design a treemap chart to visualize box-office revenue by genre.


www.gokhanciflikli.com

A Tidytext Analysis of the Weinstein Effect

Quantifying He-Said, She-Said: Newspaper Reporting. I have been meaning to get into quantitative text analysis for a while. I initially planned this post to feature a different package (that I wanted to showcase); however, I ran into some problems with their


roh.engineering

MMC Queues

A queue or waiting line is a natural occurrence in a system when customer demand exceeds the currently available resources that can serve that demand. Queues occur everywhere in our daily lives: at the grocery store, the movie theatre, the emergency room, and restaurants.
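The M/M/c queue in the title has Poisson arrivals at rate lambda, exponential service at rate mu per server, and c servers. For this model, the standard Erlang C formula gives the probability that an arriving customer has to wait, and from that the mean time spent waiting in the queue. This closed-form sketch is added for illustration only; the post may simulate the queue instead. erlang_c() is a hypothetical helper name, and the example rates are made up.

```r
# Erlang C for an M/M/c queue (standard closed form).
# lambda: arrival rate, mu: service rate per server, servers: c.
erlang_c <- function(lambda, mu, servers) {
  a   <- lambda / mu        # offered load
  rho <- a / servers        # per-server utilization
  stopifnot(rho < 1)        # otherwise the queue grows without bound
  k <- 0:(servers - 1)
  top <- a^servers / factorial(servers) / (1 - rho)
  p_wait <- top / (sum(a^k / factorial(k)) + top)
  # Mean time spent waiting before service starts
  wq <- p_wait / (servers * mu - lambda)
  c(p_wait = p_wait, mean_wait = wq)
}

# Hypothetical example: 10 customers/hour, 4/hour per server, 3 servers
erlang_c(lambda = 10, mu = 4, servers = 3)
```

With a single server, the formula reduces to the familiar M/M/1 result, where the probability of waiting equals the utilization.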


ryanestrellado.netlify.com

Comparing Home and Away Wins of Kenny Dalglish’s Managerial Runs (Also, Did It Matter?)

After my first post, @HighlandDataSci on Twitter had a great question: Was Kenny Dalglish’s home and away win odds ratio different during his first run as manager than his


magesblog.com

Correlated log-normal chain-ladder model

The following code allows me to download the data and extract the information for one company, here company 353, which was the example company Glenn used as well. The data shows the historical annual developments of incurred claims for accident years 1988 to 1997


www.ifconfig.it/hugo

Unpatchable?

Quite often cable management is something that starts well when a new IDF is deployed and then gets messier over time


blog.wallaroolabs.com

DDoS Attack Detection with Wallaroo

This post will go through a real-world use case for Wallaroo, our distributed data processing framework for building high-performance streaming data applications


www.datalorax.com

Alluvial Diagrams with ggforce

Today I wanted to quickly share my first real attempt at making an alluvial diagram. For those not familiar (and I wasn’t previously) an alluvial diagram is a type of flow plot that is essentially equivalent to a sankey diagram


www.mytinyshinys.com

EPL week 14

Match of the Day: Allardyce is beaming


www.diegobarneche.com

Making R base plots prettier

A common issue that many researchers face when producing plots in R is consistently placing legends or pictures at the exact same relative position
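One base-graphics way to do this is sketched below; the post may define its own helper instead. grconvertX() and grconvertY() translate a position given as a fraction of the plot region ("npc" units) into the current plot's user coordinates. As a result, the same fractions land in the same relative spot whatever the axis ranges are. pdf(NULL) just opens a throwaway device so the example runs without a screen, and the data are made up.

```r
pdf(NULL)  # throwaway graphics device; nothing is written to disk

x <- 1:10
plot(x, x^2, pch = 19)

# 5% in from the left and 95% up from the bottom of the plot region,
# converted to this plot's data ("user") coordinates.
lx <- grconvertX(0.05, from = "npc", to = "user")
ly <- grconvertY(0.95, from = "npc", to = "user")
legend(lx, ly, legend = "y = x^2", pch = 19, bty = "n")

usr <- par("usr")  # plot region limits in user coordinates
invisible(dev.off())
```

Reusing the same two fractions across plots with very different axis ranges keeps the legend in the same relative position each time.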


gcppodcast.com

Node.js with Myles Borins

Node.js is an open-source JavaScript runtime environment built on Chrome’s V8 JavaScript engine, and Google is a Platinum Member of the Node.js Foundation. Myles Borins is a developer, musician, artist, and maker; he works for Google as a developer advocate serving the Node


vatlab.github.io/blog

SoS Notebook

I started to use IPython, and then Jupyter, more than ten years ago, but despite all the nice features, there was always something missing, something that prevented me from using it as my main working


lenkiefer.com

Housing construction and employment trends

THE UNITED STATES IS NOT building enough homes to meet demand. Be sure to check out my upcoming presentation at Realtor University to learn more about whether or not this could mean a house price bubble. One reason often cited for low levels of construction is a lack of labor


blog.wallaroolabs.com

How to Build a Thriving Open-source Community

Building a community of developers was one of the key motivations that led Wallaroo Labs to open-source our distributed data engine, Wallaroo. But it’s not always easy


www.tidyverse.org/articles

usethis 1.0.0 (and 1.1.0)

Take advantage of these helpers to document your package: If you want to share your code with others, it’s good practice to make the licensing


www.rdatagen.net

Characterizing the variance for clustered data that are Gamma distributed

Way back when I was studying algebra and wrestling with one word problem after another (I think now they call them story problems), I complained to my father. He laughed and told me to get used to it. “Life is one big word problem,” is how he put it


engineering.pivotal.io

All I do is VIM VIM VIM

I assume you understand the difference between insert, visual and normal mode. So there are a few handy moves that I don’t know where to put. They are definitely beyond the basics


cevo.com.au

Podcast - Hannah Browne chats to Wise Up

Cevo’s General Manager Hannah Browne recently chatted with Alexandra Stokes from Wise Up


ryanestrellado.netlify.com

Liverpool FC's Managers

Anfield in Liverpool is one of the legendary stadiums of English football. Fortress Anfield has been the home to many historic games, including Liverpool’s 4-3 win against Newcastle in 1996 and my personal favorite, Liverpool’s 3-1 win against Olympiacos to go through to the Champions League’s last