jesse.tw
A data viz that asks questions
I don’t design to answer questions, I design to ask them. There is an incredible range of data visualizations. They’re great, really. Sometimes they are very pretty pictures. Which is great, unless the goal wasn’t to draw a pretty picture…
livefreeordichotomize.com
A year as told by fitbit
I managed to wear a fitbit the entirety of 2017, this is exciting for a few reasons: one I have commitment problems, and two: it’s a lot of data that I have to play with…
www.gokhanciflikli.com
Visualising US Voting Records with shinydashboard
Introducing adavis: My second ever post on this blog was an introduction to adamap, a Shiny app that maps Americans for Democratic Action voting scores (the so-called Liberal Quotient) between 1947 and 2015. It was built with highcharter, and hence it was nicely interactive but quite slow…
yihui.name/en
library(methods)
Okay, let’s see how terribly “slow” it is today. On my MacBook Pro (I should run this multiple times but I think 0.04 should be in the ballpark): It is sad to see things that exist for historical reasons are so hard to change. P.S…
roelandtn.frama.io
saveRDS() vs write_csv() - a newbie introduction
Last week, I came across this blog post from Yihui-down called save() vs saveRDS(). It was published after a response from Jenny Bryan on Twitter. So when more experienced people like them advise using a tool, it is for a reason. So I decided to give it a try…
www.mytinyshinys.com
EPL Week 19
Match of the Day: Arsenal and Liverpool showing why they are fun to watch but won’t be winning major trophies. Kane is Able: Some eyebrows were raised when Harry Kane was rated by a Guardian poll of 169 experts as the fifth best player in the world for 2017 but he certainly did his best to justify that ranking with a hat-trick at high-flying…
www.blog.rdata.lu
Launching your shiny app in 2 clicks
Hello everyone. It’s fast and useful if you work with colleagues that don’t have a clue about R and just want to use your shiny app. Open a text editor and write the following lines: If it doesn’t work, check your pandoc location…
ritsokiguess.site/docs
Drawing maps in R with 'ggmap' and 'OpenStreetMap'
Introduction: I have long been interested in drawing maps, and since I discovered how to do it in R, I have tried to add it to my statistical repertoire, including drawing things on maps…
jesse.tw
Friends Title Generator, Part 2
We’re back on the Friends script grind. titles % select(-director, -writers) titles ## # A tibble: 236 x 5 ## season episode title rating n_ratings ## ## 1 1. 1. The One Where Monica Gets a Roommate 8.50 4317. ## 2 1. 2. The One with the Sonogram at the End 8.20 3107. ## 3 1. 3. The One with the Thumb 8…
livefreeordichotomize.com
Leveraging uncertainty information from deep neural networks for disease detection - a summary
As a biostatistician in the deep learning world I have the awkward task of balancing the dogma of statistics (everything is uncertain) along with the alluring success of some of the newest crazy complex neural network architectures…
adamspannbauer.github.io
Plotting Christmas Carols in R
In this post we’ll be playing with spacyr & visNetwork to parse and plot the lyrics of the Christmas Carol ‘Santa Claus is Coming to Town’. The spacyr package is a wrapper around the spaCy python module for NLP…
jesse.tw
Random sampling
The problem: you want to generate a random collection of letters. letters %>% sample(size = 4) %>% paste0(collapse = "") ## [1] "dlcz" Great. Now do it 10 times. letters %>% sample(size = 4) %>% paste0(collapse = "") %>% rep(8) ## [1] "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" "rwuj" 😖. Why am I like this. Ok, rep copies the object 8 times. It doesn’t draw 8 samples…
jesse.tw
Friends Title Generator, Part 1
To fulfill my lifelong desire to write Friends scripts, I’ll start by writing a Friends episode title. By that I mean a script that writes a Friends episode title…
www.noahlandesberg.com
Introducing rhymer
For example, I’ve been enjoying rewriting nursery rhymes: The package includes additional functions to find other related words, including: And of course, much thanks…
jesse.tw
Thank you blogdown ppl
Hardest problem: understanding Hugo structures / getting a theme I’m happy with. Easiest problems: Netlify, emoji, gifs, actually writing posts (ty blogdown). A list of things I found useful when making this blog…
blog.sellorm.com
Learn to Write Command Line Utilities in R - part 5
Check out the first post in this series for an index of all the other posts. Last time, we changed the way our sorting hat command line utility did its sorting…
jesse.tw
Post-selection inference on Friends titles in R
Goal: I want to be a Friends scriptwriter. Can I pick a title that makes an episode an automatic classic? If I just include a character’s name in the title, does it make it automatically popular? I assume I should just write “The One Where Rachel is Rachel”…
www.jessemaegan.com
R4DS: the next iteration
Like most online learning endeavors, we had a massive surge of interest at the onset, with exponential drop-offs week after week as we progressively worked through each chapter based on an established schedule…
eliocamp.github.io/codigo-r
How to calculate the Standardized Precipitation Index in R
The Standardized Precipitation Index (SPI) is an index for assessing drought or excess rainfall. The idea behind the SPI is to get a sense of how likely it is to have an amount of rainfall equal to or less than…
translatedmedicine.com
Income Along the Boston T (Red Line)
In the first part of this series, I went through my approach for creating a New Yorker inspired visualization of income along the Orange Line on the Boston T. For this post, we take a look at the Red Line. Below you will see median household income along both lines of the Boston Red Line…
blog.sellorm.com
Learn to Write Command Line Utilities in R - part 4
Check out the first post in this series for an index of all the other posts. In previous posts, we’ve been working on our command line Sorting Hat utility. We started out with a really simple tool that ran on the command line and just output a random Hogwarts house…
www.blog.rdata.lu
Skip errors in R loops by not writing loops
You probably have encountered situations similar to this one: I hope you enjoyed this blog post, and that these functions will make your life…
aosmith.rbind.io
Using DHARMa for residual checks of unsupported models
One of the difficult things about working with generalized linear models (GLM) and generalized linear mixed models (GLMM) is figuring out how to interpret residual…
yihui.name/en
Being Busy vs Being Productive
Even if it is well-documented and fully tested, it may still not be “finished” - you have to maintain it (unless it is as stable as Donald Knuth’s TeX). This has been the biggest challenge for me. Bug reports, feature requests, and questions…
blog.sellorm.com
Learn to Write Command Line Utilities in R - part 3
Check out the first post in this series for an index of all the other posts. Yesterday we modified our simple sorting hat command line utility to accept its first argument, a name. Those of you who’ve been playing along may have noticed that our implementation wasn’t ideal. It’s fine if you run the script with an argument, like ./sortinghat…
www.gokhanciflikli.com
Mining Game of Thrones Scripts with R
Quantitative Text Analysis Part II: I meant to showcase the quanteda package in my previous post on the Weinstein Effect but had to switch to tidytext at the last minute…
blog.wallaroolabs.com
Simplify Stream Processing in Python and Wallaroo using Docker
Distributed data stream processing frameworks can be hard to build and setup…
lenkiefer.com
State population growth and house prices
EARLIER TODAY THE U.S. CENSUS BUREAU released new estimates of population for U.S. states from 2010 through 2017. Let’s see how population trends look compared to recent house price growth…
roh.engineering
fitur 0.5.20 Release
DESCRIPTION summary has been updated; examples have been updated for fit_univariate, fit_empirical; fit_empirical_discrete and fit_empirical_continuous are no longer exported; added plot_qq and plot_pp functions for diagnostic plotting of fits; added a vignette, Diagnostic Plots for Fitting Distributions; Introduction vignette has updated…
www.tidyverse.org/articles
styler 1.0.0
styler can style text, single files, packages and entire R source trees with the following functions: A distinguishing feature of styler is its flexibility. The following options are available: We will briefly describe all of them below…
yihui.name/en
Another Year, Another R Package, Another Book, and Endless Joy
Originally we were thinking of completely reinventing the wheels of static websites using R, but when I finally started to work on it in October 2016, I realized the wheels were too big for me…
giorasimchoni.com
E is for Elephant (The ebayr Package)
It occurred to me I’m not always putting my R powers to good…
www.mytinyshinys.com
EPL Week 18
Match of the Day: Four teams win away from home by margins of at least three goals - surprisingly Huddersfield, West Ham and Crystal Palace join Liverpool. Who is your Talisman: Some players are regarded as more essential to their teams than others - Pogba at Manchester United or Zaha at Crystal Palace spring to…
roelandtn.frama.io
French Urban Population growth between 2010 and 2014
Hi all! Sorry this map is in French but it was a school assessment, so… French it is. Disclaimer: You might encounter a lot of French words and links to French webpages. You are warned. The main goal was to work with PostgreSQL and PostGIS on census data…
aebou.rbind.io
How to follow and engage with the R community
“R is not just a programming language, but it is also an interactive environment for doing data science.” “Investing a little time in learning R each day will pay off handsomely in the long run.” - Hadley Wickham and Garrett Grolemund in R for Data Science…
magesblog.com
Insurance Data Science Conference 2018
In 2013, we started with the aim to bring practitioners of industry and academia together to discuss and exchange ideas and needs from both sides. R was and is a perfect glue between the two groups, a tool which both sides embrace and which has fostered the knowledge transfer between the two…
blog.sellorm.com
Learn to Write Command Line Utilities in R - part 2
In yesterday’s post we took a look at command line utilities in general, some of the reasons why they’re useful, and also made our first bare-bones utility of our own. Today, we’re going to extend our sortinghat.R example, by allowing it to accept ‘arguments’…
sharanry.github.io
The Three Giants' Survey
For quite some time researchers thought backpropagation wouldn’t work as it might get stuck in a local minimum, but analysis showed that this rarely happened. There might be many saddle points at which the algorithm might get stuck, but it doesn’t matter which one as they almost all had the same objective value…
www.tidyverse.org/articles
testthat 2.0.0
Install the latest version of testthat with: A new default reporter revamps the output to make better use of colour. New setup and teardown tools make it easier to run code before and after each test file, and before and after all tests. New and improved expectations make it easier to test printed output and precisely test conditions (i.e…
blog.sellorm.com
Learn to Write Command Line Utilities in R
Other posts in this series: Part 1 - Getting Started (this post); Part 2 - Arguments; Part 3 - Argument validation; Part 4 - Improved sorting; Part 5 - Debug logging; Part 6 - Improving look and feel. Introduction: Do you know some R? Have you ever wanted to write your own command line utilities, but didn’t know where to start? Do you like Harry…
alison.rbind.io
R-Ladies presentation ninja
The way to use the theme is to update the YAML like so: Some examples:…
roelandtn.frama.io
Virtualbox images resizing
I had to resize an OSGeoLive virtual machine because after a couple of postgresql workshops with quite big datasets (less than 1 GB each but still), the default size was not…
jesse.tw
rvest + imdb -> explore Friends episode titles
I always wanted to be a scriptwriter. But my approach to doing creative things is “find the secret, program it, retire”. So what’s the secret to a successful Friends episode? [Really, I want to write/experience a gentle introduction to rvest, and later tidytext and language data science…
blog.schochastics.net
A wild R package appears! Pokemon/Gameboy inspired plots in R
The package is only available via github so far. The package comes with a dataset on 801 pokemon with a rich set of attributes. The package includes three main themes for ggplot. If you want to get nostalgic. If you want to get nostalgic, but not too much, use the Gameboy Advanced theme…
maximewack.com
Hello Internet
This is the first post for this blog. I will post here about the tools that I use, mostly Archlinux, vim and R. You can read more about this blog and myself in the About section. Social media contacts are at the bottom of every page. You can browse my (public) repos on my own instance of Gitea with the top menu…
adamspannbauer.github.io
Summarizing Web Articles with R using lexRankr
In this post we’ll be recreating the output of a popular summarization bot using the R package lexRankr. If you browse reddit you may have come across /u/autotldr, a popular bot that performs article summarization…
yihui.name/en
How To Stop Sexual Harassment Or Other Misconduct At Conferences
Again, I think we could add a text field on the conference registration page to let people report the names of past offenders that they know. The report could be (perhaps should be) anonymous. Anyone who has been reported many times should be banned from the conference…
ritsokiguess.site/docs
Testing for time trend
Introduction: One of the things my environmental science colleagues spend much of their time doing is assessing whether something is changing over time. Most commonly, the depressing conclusion from one of their investigations is “climate change”. One of the studies I was part of concerned temporal trends in sea ice in Hudson Bay…
www.jtimm.net
from web to annotated corpus
This post demonstrates some methods for building multi-lingual corpora from web-based news content using my R package quicknews, as well as methods for annotating multi-lingual corpora using the cleanNLP (Arnold 2017) and udpipe (Wijffels 2018)…
vatlab.github.io/blog
SoS: a cure to pipelineitis
Because of the need to use libraries and tools in different languages and to execute them on different systems such as computer clusters, bioinformaticians write a lot of scripts in different languages and face many challenges in developing, running, managing, sharing, and reproducing bioinformatic data…
blog.wallaroolabs.com
Dynamic tracing a Pony + Python program with DTrace
Your application probably has a performance problem. Or your app has a terrible bug. Or both. To find and fix these problems, many software developers use a profiler or a debugger. Profilers and debuggers are (usually) fantastic tools for solving performance and correctness problems…
mlr-blog.netlify.com
Team Rtus wins Munich Re Datathon with mlr
On the weekend of November 17-19, five brave data-knights from team “Rtus and the knights of the data.table” took on the challenge to compete in a datathon organized by Munich Re in its Munich-based innovation lab…
gcppodcast.com
A Year in Review with Francesc Campoy Flores and Greg Wilson
What were your personal highlights for 2017? It’s the end of the year! So we’ll be taking a break, and returning in January…
www.mytinyshinys.com
EPL Week 17
Match of the Day: For the second time recently, Watford have removed Deeney late in a winning position only to end up with no points. Watford were ahead of Crystal Palace for 85 minutes in Tuesday’s game - 15 minutes more than Palace have led in matches the whole season to date. Yet Another Man City Chart: With…
yihui.name/en
One Little Thing To Consider When Naming Things (Software)
Everyone knows that naming is hard (the other hard thing is cache invalidation)…
translatedmedicine.com
Income Along the Boston T (Orange Line)
As a native New Yorker, I was recently intrigued by a visualization of household income in NY highlighting inequality (New Yorker). Having lived in Boston for going on 9 years, I’ve started to call this place home. Could I make a similar representation of income in Boston? That was my challenge! PART I: The Orange Line. To start I used these…
www.tidyverse.org/articles
Project-oriented workflow
If the first line of your R script is I will come into your office and SET YOUR COMPUTER ON FIRE 🔥. Caveat: only you can decide how much you care about this…
www.mytinyshinys.com
EPL Week 16
Match of the Day: Bizarrely, the top two teams contrive to score three cock-up goals. City dominance: With an eleven point lead over their nearest rivals, Man City are looking home and hosed for the title…
lenkiefer.com
Plotting U.S. Macroeconomic Trends with FRED and R
LET’S TAKE A LOOK AT RECENT U.S. macroeconomic trends by making a couple plots with R code. Since we’re going to be looking at U.S. macroeconomic data, the data we’ll need is available in the St. Louis Federal Reserve Bank Economic Database FRED…
www.rdatagen.net
When there's a fork in the road, take it. Or, taking a look at marginal structural models.
The DAG below is a simple version of how things can get complicated very fast if we have sequential treatments or exposures that both affect and are affected by intermediate factors or conditions. In reality, there are no parallel universes…
giorasimchoni.com
Ave Mariah
My friend Nir and I have this habit of sending one another songs in the middle of the day. The other day I sent him this one by Mariah Carey: To which he replied “that’s so gay”, to which I replied “don’t confuse my 12-year-old taste with my gay taste!”…
magesblog.com
Changing settlement rate model for paid losses
Glenn used the correlated log-normal chain-ladder model on reported incurred claims data to predict future developments. However, when looking at paid claims data, Glenn suggested changing the model slightly…
asch3tti.netlify.com
Happy Holidays!
A simple white background was not enough. We needed something more appropriate for this period of the year… ah, yes! Snow! The end result is a big flat face fluctuating in the air, like a spooky and ominous reminder of our mortality even in these days of opulence…
vatlab.github.io/blog
What's the big deal about backing SoS Notebook with a workflow engine?
After I announced the release of SoS Notebook as a third-party multi-language kernel for Jupyter, I was asked repeatedly (e.g…
blog.sellorm.com
When a Tweet Turns Into an R Package
Boy, that escalated quickly: I just wanted to write up a brief post about the power of R, its community, and tell the story of how actually putting stuff out into the world can have amazing…
blog.davisvaughan.com
Writing a paper with RStudio
This semester I had to write a paper for my Financial Econometrics class. My topic was on analyzing the volatility of Bitcoin using GARCH modeling. I’m not particularly interested in Bitcoin, but with all the recent news around it, and with its highly volatile characteristics, I figured it would be a good candidate for analysis…
blog.mgechev.com
Redux Anti-Patterns - Part 1. State Management.
For the past year I’ve been working on a project which uses React with TypeScript and Redux. In a few blog posts I’m planning to share lessons learned while combining these technologies. In this article I’ll share a few anti-patterns related to state management that I noticed in our development process…
blog.wallaroolabs.com
Stateful Multi-Stream Processing in Python with Wallaroo
Wallaroo is a high-performance, open-source framework for building distributed stateful applications. In an earlier post, we looked at how Wallaroo scales distributed state…
gcppodcast.com
New York Times with Deep Kapadia and JP Robinson
What best practices are there for securing a Kubernetes Engine…
www.mytinyshinys.com
EPL week 15
Match of the Day: The big-six contest, Arsenal v Man. Utd., illustrated once again that de Gea may be the finest goalkeeper in the world. Club in Crisis - Spurs: After being praised to the heights by the press, things at Spurs are unravelling fairly quickly…
jvera.netlify.com
Great packages for understanding your data
In the first steps of any project, the most common task is to take a glimpse, figure out how our data is distributed, and as fast as possible, be ready for the next steps (wrangling and imputation)…
ropensci.org/technotes
Magick 1.6
One issue was that sometimes magick graphics would show a 1px black border around the image. It turned out this is caused by rounding of clipping coordinates. When R calculates clipping area it often ends up at non-whole values…
eddjberry.netlify.com
SparkR vs sparklyr for interacting with Spark from R
This post grew out of some notes I was making on the differences between SparkR and sparklyr, two packages that provide an R interface to Spark…
cevo.com.au
Cevo - So Fast We're Seeing Double!
Cevo are delighted to have been awarded second place in the CRN Fast50 and 40th in the Financial Review Fast Starters for 2017. The former, announced last week, places us amongst the fastest growing IT solution providers in Australia…
jvera.netlify.com
Explainers
When working on Machine Learning for classification and predictive models we tend to use well-known packages such as randomforest, caret, xgboost, gbm and such. The issue is when the user needs an explanation of how we get these results. The easy part is to explain a Tree…
giorasimchoni.com
Snap-Fu
In one of Silicon Valley Season 4’s episodes Dinesh finds himself in need of penis images, in order to make a penis image detection app. I thought about choosing this task for my object detection project. After all I am no Dinesh, tagging images of penises comes easy to me. But then I could hear my Dad after this is posted, saying “it had to be you…
www.blog.rdata.lu
Visualizing box office revenue by genre
In this post, I describe the different steps leading to the treemap: First of all we read the data. The dataset looks better. As you have seen at the top of this post, we want to design a treemap chart to visualize box-office revenue by genre…
www.gokhanciflikli.com
A Tidytext Analysis of the Weinstein Effect
Quantifying He-Said, She-Said: Newspaper Reporting. I have been meaning to get into quantitative text analysis for a while. I initially planned this post to feature a different package (that I wanted to showcase), however I ran into some problems with their …
roh.engineering
MMC Queues
A queue or waiting line is a natural occurrence in a system when the demand of customers exceeds the currently available resources that can serve that demand. Queues occur everywhere in our daily life: at the grocery store, movie theatre, emergency room, and restaurants…
ryanestrellado.netlify.com
Comparing Home and Away Wins of Kenny Dalglish’s Managerial Runs (Also, Did It Matter?)
After my first post, @HighlandDataSci on Twitter had a great question: Was Kenny Dalglish’s home and away win odds ratio different during his first run as manager than his…
magesblog.com
Correlated log-normal chain-ladder model
The following code allows me to download the data and extract the information for one company, here company 353, which was the example company Glenn used as well. The data shows the historical annual developments of incurred claims for accident years 1988 to 1997…
www.ifconfig.it/hugo
Unpatchable?
Quite often cable management is something that starts well when a new IDF is deployed and then gets messier over time…
blog.wallaroolabs.com
DDoS Attack Detection with Wallaroo
This post will go through a real-world use case for Wallaroo, our distributed data processing framework for building high-performance streaming data applications…
www.datalorax.com
Alluvial Diagrams with ggforce
Today I wanted to quickly share my first real attempt at making an alluvial diagram. For those not familiar (and I wasn’t previously) an alluvial diagram is a type of flow plot that is essentially equivalent to a sankey diagram…
www.diegobarneche.com
Making R base plots prettier
A common issue that many researchers face when producing plots in R is consistently placing legends or pictures at the exact same relative position…
gcppodcast.com
Node.js with Myles Borins
Node.js is an open-source JavaScript runtime environment built on Chrome’s V8 JavaScript engine, and Google is a Platinum Member of the Node.js Foundation. Myles Borins is a developer, musician, artist, and maker. He works for Google as a developer advocate serving the Node…
vatlab.github.io/blog
SoS Notebook
I started to use IPython, and then Jupyter, more than ten years ago, but despite all the nice features, there was always something missing, something that prevented me from using it as my main working…
lenkiefer.com
Housing construction and employment trends
THE UNITED STATES IS NOT building enough homes to meet demand. Be sure to check out my upcoming presentation at Realtor University to learn more about whether or not this could mean a house price bubble. One reason often cited for low levels of construction is a lack of labor…
blog.wallaroolabs.com
How to Build a Thriving Open-source Community
Building a community of developers was one of the key motivations that led Wallaroo Labs to open-source our distributed data engine, Wallaroo. But it’s not always easy…
www.tidyverse.org/articles
usethis 1.0.0 (and 1.1.0)
Take advantage of these helpers to document your package: If you want to share your code with others, it’s good practice to make the licensing…
www.rdatagen.net
Characterizing the variance for clustered data that are Gamma distributed
Way back when I was studying algebra and wrestling with one word problem after another (I think now they call them story problems), I complained to my father. He laughed and told me to get used to it. “Life is one big word problem,” is how he put it…
engineering.pivotal.io
All I do is VIM VIM VIM
I assume you understand the difference between insert, visual and normal mode. So there are a few handy moves that I don’t know where to put. They are definitely beyond the basics…
cevo.com.au
Podcast - Hannah Browne chats to Wise Up
Cevo’s General Manager Hannah Browne recently chatted with Alexandra Stokes from Wise Up…
ryanestrellado.netlify.com
Liverpool FC's Managers
Anfield in Liverpool is one of the legendary stadiums of English football. Fortress Anfield has been the home to many historic games, including Liverpool’s 4-3 win against Newcastle in 1996 and my personal favorite, Liverpool’s 3-1 win against Olympiacos to go through to the Champions League’s last…