www.datalorax.com
Sharing some functions from my personal R package
In this post I basically just wanted to share some recent developments that I’ve made to my personal R package {sundry}. All of the recent advancements have been made to work with the tidyverse, so things like group_by should work seamlessly. If you feel like giving the package a whirl, I’d love any feedback you have or bugs you may find…
www.jtimm.net
locating linguistic diversity in the usa
Language data and the census Languages in the US Linguistic diversity as entropy Locating linguistic diversity FIN This post investigates linguistic diversity in the United States utilizing data made available by the US Census…
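To make the "linguistic diversity as entropy" idea concrete, here is a minimal sketch that assumes diversity is measured as the Shannon entropy of the shares of speakers per language. The excerpt does not show the post's own computation, and the speaker counts below are made up for illustration.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    # Skip zero counts: they contribute nothing and log2(0) is undefined.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical speaker counts per language for two made-up areas.
mostly_one_language = [9500, 300, 200]
evenly_mixed = [3400, 3300, 3300]

print(shannon_entropy(mostly_one_language))
print(shannon_entropy(evenly_mixed))
```

Under this measure, an area where speakers are spread evenly across languages scores higher than one dominated by a single language.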
sciathlon.github.io
Biathlon data analysis
The 2018 Winter Olympics finally kicked off! In honor of this, I asked a friend of mine who loves sports what data he would be interested in seeing in the sports that are in the winter Olympics and he answered “biathlon…
mouse-imaging-centre.github.io/blog
Finding and playing with peaks in RMINC
So, peaks. When producing a statistical map, it’s good to get a report of the peaks (i.e. most significant findings). RMINC has had this support for a while now, though it has remained somewhat hidden…
thestudyofthehousehold.com
How I use Rmarkdown
Last week or so, I achieved a wondrous thing. A trivial thing. I achieved a wondrous, trivial thing: I wrote my most popular tweet ever: My new thing is ending every Rmd with a list of links to the forums / SO questions / blogs / github repos that I used to solve the problem #rstats pic.twitter…
adamspannbauer.github.io
Proposals, diamonds, xgboost, & lime
I recently got engaged! While picking out the stone for the ring I played around with the diamonds dataset from ggplot2. The analysis was around what are the main contributors to diamond pricing. (The analysis was also an excuse to play around with the lime package for the first time…
www.carlbfrederick.com
Using R to Create Google Maps
Voila! The new file is only 1.6 MB. Now, all that is left to do is to login to Google Maps, import the widists_sm.kml file and share it…
blog.wallaroolabs.com
A Scikit-learn pipeline in Wallaroo
While it would seem that machine learning is taking over the world, a lot of the attention has been focused towards researching new methods and applications, and how to make a single model faster…
ryanestrellado.netlify.com
California School Dashboards Part 1
This is part one of a three part series where I’ll be working with California School Dashboard data by cleaning, visualizing, and exploring through modeling. Introduction: It’s Ok to Skip Around I’m writing this series for data scientists, public school educators, and data scientists who are also public school…
cevo.com.au
Is there such a thing as too much automation?
A recent discussion at Cevo HQ came onto the topic of satellite navigation and how lost we would be without it, how in the good old days of street directories you were generally only lost the first time you travelled to a new location…
www.njtierney.com
Acknowledgements
Last week (specifically, the 1st Feb) I got “the letter” from QUT that basically says “You, Nicholas Tierney, are a Dr.”. Well, specifically, it says: The Queensland University of Technology’s Registrar has executively approved your degree and you are now entitled to use the title of…
giorasimchoni.com
Book'em Danno!
One of the best talks in my opinion at rstudio::conf 2018 was actually the first keynote by Prof. Di Cook, of Monash University, titled “To the Tidyverse and Beyond: Challenges for the Future in Data Science”. Di talked about how she views the plot of a dataset as simply another statistic, a function of this dataset…
lenkiefer.com
February 2018 housing market update
EARLIER THIS WEEK I TWEETED out a poll asking whether or not folks wanted to see a thread/tweetstorm with slides from an upcoming presentation on the economy and housing markets that I’m giving. Over 90 percent voted for a thread. So I shared it. In this post let me add a little more commentary on the individual slides…
gcppodcast.com
Open Source TensorFlow with Yifei Feng
How do I design identity and access management policies for a…
yihui.name/en
Perhaps My Documentation is not Too Bad
My documentation was poor. The user didn’t read the documentation. It turned out the second possibility was true. What a relief! I’m glad that I didn’t yell “Why don’t you read the documentation?” That said, as software developers, we should always keep the first possibility in mind…
thug-r.life
Prep Your Hugo Blog for R-bloggers
Get Ready for R-Bloggers There are lots of reasons to write a technical blog. Getting practice writing, thinking through ideas, etc. are all fulfilling reasons on their own. But life is always better with an audience. Well, not always. But for a blog it is…
blog.schochastics.net
Sample Entropy with Rcpp
Simple. Problem is, I need to calculate the sample entropy of 150,000 time series. Can the function handle that in reasonable time? This translates to several hours for 150,000 time series, which is kind of not ok. I would prefer it a little faster. Perfect. Now let’s check if we gained some speed up. The speed up is actually ridiculous…
yihui.name/en
Thanks, Marie Dussault, for Reading the Full blogdown Book
Write, as if no one would read…
blog.sellorm.com
dater - a tiny Addin for RStudio
Inserting a date with a keyboard shortcut TLDR: You can find the dater package on github. Update 1: It seems all of this was for nothing… Hi…
blog.earo.me
tsibble? or tibbletime?
So what do these two packages have in common? A common time series analysis task is to aggregate the values to higher-level time periods. For example, it may be interesting to examine average temperature and total precipitation every month. Generally, tsibble defines a time series tibble more strictly than tibbletime…
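The aggregation task described above (average temperature and total precipitation per month) can be sketched in plain Python. This is only an illustration of the idea, not the tsibble or tibbletime API, and the daily records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def monthly_summary(records):
    """Aggregate (day, temperature, precipitation) records by calendar month.

    Returns {(year, month): (mean temperature, total precipitation)}.
    """
    buckets = defaultdict(list)
    for day, temp, precip in records:
        buckets[(day.year, day.month)].append((temp, precip))
    return {
        month: (
            sum(t for t, _ in values) / len(values),  # average temperature
            sum(p for _, p in values),                # total precipitation
        )
        for month, values in sorted(buckets.items())
    }

# Hypothetical daily observations.
records = [
    (date(2018, 1, 1), 10.0, 2.0),
    (date(2018, 1, 2), 20.0, 0.0),
    (date(2018, 2, 1), 5.0, 1.5),
]
print(monthly_summary(records))
```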
lenkiefer.com
Comparing recent periods of mortgage rate increases
THIS MORNING I SAW AN INTERESTING CHART OVER ON BLOOMBERG. In this post they compared recent 10-year Treasury yield movements with the Taper Tantrum in 2013. The chart you can see here was an area chart with overlapping line plots. I thought it would be a fun exercise to remix a similar chart with R…
www.redbandsports.net
How good were the offences in the Super Bowl
Sunday’s Super Bowl set 17 statistical records and tied 12 more, according to research by the Elias Sports Bureau. The 1,151 combined total yards by the Eagles and Patriots obliterated the old record by 222 yards. Below is a histogram showing the combined total yards of the 52 Super Bowls, sorted into bins of 25 yards…
eliocamp.github.io/codigo-r
How to make a shaded relief in R
Spanish version of this post While trying to build a circular colour scale to plot angles and wind direction, I stumbled upon an easy way to make shaded reliefs in…
translatedmedicine.com
Boston Limited English Proficient Population
Inspired by Julia Silge and the tidycensus package by Kyle Walker, I wanted to explore the limited English proficient (LEP) population in Suffolk County (which includes Boston). For those not in the know, LEP refers to people who speak English less than very well. Beyond the overall population, I wanted to get a glimpse into the language diversity of the…
fishsciences.github.io
Visualizing Fish Encounter Histories
Encounter histories are the translation of a fish’s path into a row of ones and zeros, each corresponding to a positive or negative detection record at a receiver location in the acoustic…
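As a rough sketch of that translation, assuming a fixed, ordered list of receiver locations and a set of locations where a fish was detected (the receiver names here are hypothetical):

```python
def encounter_history(detected_at, receivers):
    """Return one row of ones and zeros, one entry per receiver, in order."""
    seen = set(detected_at)
    return [1 if receiver in seen else 0 for receiver in receivers]

# Hypothetical receiver locations, ordered along the fish's route.
receivers = ["release", "river_mid", "river_mouth", "bay"]

# A fish detected at the release site and the river mouth only.
print(encounter_history(["release", "river_mouth"], receivers))
```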
yihui.name/en
What is More Convincing than a CV?
You need to have a Github presence (I’m sure it does not have to be Github, e.g., Gitlab should also be fine). It is hard to imagine that a data scientist does not use version control. You must have given talks at local meetups or conferences. Communication is a key part of data science…
emmavestesson.netlify.com
What should I have for lunch?
Inspiration I was having lunch with some colleagues the other day when they told me about a restaurant spreadsheet that they used to use to randomly pick a place to get lunch from. I of course felt the need to see if I could create something similar in R…
adamspannbauer.github.io
rPackedBar on CRAN
This post is to announce rPackedBar’s release to CRAN and to share a shiny app to visualize twitter interactions using a packed barchart. This post and the package have been updated due to feedback from Xan Gregg.
yihui.name/en
How to Pretend Typing Super Fast in RStudio
Below is a prototype of the function to automatically “type” a character vector into your RStudio source editor: It should give you what you…
sciathlon.github.io
Triathlon pubmed analysis
Today I am using the RISmed package for R to analyze publications about triathlon. It is an amazing package to look through the Pubmed database for what they have on a certain subject. Pubmed is a NIH (USA) funded database which hosts articles about medicine and biology…
www.redbandsports.net
Do Super Bowl QBs get more babies named after them?
On Jan. 23, Sports Illustrated posted the following tweet, which got us thinking. Fourteen years ago, a 7-year-old in Foxboro told Tom Brady he had named his baby brother after him…
shotwell.ca/blog
Flagging toxic comments with Tidytext and Keras
The task here is to try to determine how likely a string is to have a particular set of labels. We can take a look at the data. The first step for looking at the actual text is to split up the strings into words and then remove stop words…
blog.wallaroolabs.com
Idiomatic Python Stream Processing in Wallaroo
We have been working on Wallaroo, our scale-independent event processing system, for a little over two years…
fharrell.com
Is Medicine Mesmerized by Machine Learning?
Avati et al used deep learning on the 13,654 features to achieve a validated c-index of 0.93. To the authors’ credit, they constructed an unbiased calibration curve, although it used binning and is very low resolution…
www.ifconfig.it/hugo
Tech Field Day Extra at Cisco Live Europe 2018
I had the honor and pleasure of being invited again to attend Tech Field Day, this time for an Extra event at Cisco Live Europe in Barcelona…
mouse-imaging-centre.github.io/blog
Bayesian Model Selection with PSIS-LOO
Pitch In this post I’d like to provide an overview of Pareto-Smoothed Importance Sampling (PSIS-LOO) and how it can be used for bayesian model selection…
aosmith.rbind.io
Making many added variable plots with purrr and ggplot2
Last week two of my consulting meetings ended up on the same topic: making added variable plots. In both cases, the student had a linear model of some flavor that had several continuous explanatory variables…
www.cultureofinsight.com/blog
Map your Google Location Data with R Shiny
I Know What You Vizzed Last Summer tl;dr click the image to launch the app I guess I’m of that school of thought, I don’t mind my mobile tracking me…
gcppodcast.com
Percy.io with Mike Fotinakis
I would love a weekly roundup of news about Google Cloud Platform - where can I get…
yihui.name/en
The Large Variance in the Attention Level of Readers
I keep forgetting this, too, and let outliers bias me. For example, I often feel heartbroken when I see users ask questions on Twitter without reading the documentation on which I have spent countless hours. What is worse is that they may get misleading or wrong answers. Perhaps my documentation is just too poor or boring, and perhaps they just didn’t pay attention…
yihui.name/en
Anything that Can Look Like Cats will Look Like Cats
Anything that can look like cats in your eyes, will look like cats in some other people’s eyes. P.S…
timtrice.net
Department Top Three Salaries
Write a SQL query to find employees who earn the top three salaries in each of the departments. For the above tables, your SQL query should return the following rows. (Table: 0 records; columns: Department, Employee, Salary.) Adding the line below to the query above passes the test case…
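The logic of the exercise can be sketched outside SQL. The version below assumes "top three salaries" means the three highest distinct salary values in each department, so ties are all kept. The excerpt does not show the post's actual query, and the rows are hypothetical.

```python
from collections import defaultdict

def top_three_per_department(rows):
    """rows: list of (department, employee, salary) tuples."""
    salaries = defaultdict(set)
    for dept, _, salary in rows:
        salaries[dept].add(salary)
    # The three highest distinct salaries in each department.
    cutoff = {d: sorted(s, reverse=True)[:3] for d, s in salaries.items()}
    return sorted(
        (row for row in rows if row[2] in cutoff[row[0]]),
        key=lambda row: (row[0], -row[2], row[1]),
    )

# Hypothetical data.
rows = [
    ("IT", "Max", 90000),
    ("IT", "Joe", 85000),
    ("IT", "Randy", 85000),
    ("IT", "Will", 70000),
    ("IT", "Janet", 69000),
    ("Sales", "Henry", 80000),
    ("Sales", "Sam", 60000),
]
for row in top_three_per_department(rows):
    print(row)
```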
yihui.name/en
How to Properly Write a URL
You may think it is dead simple to write a URL, and we click links to browse websites every day…
yihui.name/en
Ian Lyttle is the Most Serious Conference Attendee I've Met
I think I got to know Ian Lyttle in early 2014 (about 1.5 years before I left Iowa). Out of nowhere, he started to show up at the weekly ISU graphics working group meetings (led by my PhD advisors Di and Heike) after driving for three boring hours from another city…
magesblog.com
PK/PD reserving models
The dynamical system is no longer autonomous and initially I can’t be bothered to solve it analytically. Hence, I use an ODE solver instead, but I will get back to integrating the differential equations later. Fortunately, an ODE solver is part of the Stan language…
www.gokhanciflikli.com
Scraping Wikipedia Tables from Lists for Visualisation
Get WikiTables from Lists Recently I was asked to submit a short take-home challenge and I thought what better excuse for writing a quick blog post! It was on short notice so initially I stayed within the confines of my comfort zone and went for something safe and bland…
ritsokiguess.site/docs
Tidy simple effects in analysis of variance
Introduction In two-way analysis of variance, the (continuous) response variable depends on two explanatory factors, say A and B…
blog.wallaroolabs.com
Why we wrote our Kafka Client in Pony
At Wallaroo Labs we’ve been working on our stream processing engine, Wallaroo for just under two years now…
www.jtimm.net
a simple framework for corpus-based keyphrase extraction
Defining potential keyphrases Corpus search for potential keyphrases Selecting descriptive keyphrases with the tf-idf statistic Post script - State of the Union Addresses This post outlines a simple framework for identifying and extracting keyphrases from component texts of a…
cevo.com.au
Introduction to R
R is great for doing any kind of slicing and dicing with data. However the barrier to entry can be high, especially for people that come from a non-data background. I know that it took me quite some time to grasp just how R does its magic…
blog.mgechev.com
JavaScript Decorators for Declarative and Readable Code
Decorators in JavaScript are now in stage 2. They allow us to alter the definition of a class, method, or a property. There are already a few neat libraries which provide decorators and make our life easier by allowing us to write more declarative code with better performance characteristics. In this blog post I’ll share a few decorators which I’m using on a daily basis…
asch3tti.netlify.com
My first experience with text mining
The first step was to quantify how often words were used across the 34 chapters of the novel, to have an initial idea of the content. So, I counted the number of occurrences for each word and selected only the most common ones (i.e…
www.rdatagen.net
Have you ever asked yourself, 'how should I approach the classic pre-post analysis?'
I’ve explored various scenarios (i.e. different data generating assumptions) to see if it matters which approach we use. (Of course it does…
fharrell.com
Information Gain From Using Ordinal Instead of Binary Outcomes
The point about the increase in power can also be made by, instead of varying the effect size, varying the effect that can be detected with a fixed power of 0.9 when the degree of granularity in Y is increased. This is all about breaking ties in Y. The more ties there are, the less statistical information is present…
www.sastibe.de
My Motivations for Starting a Blog
Hello world! My name is Sebastian Schweer, and I am a Data Scientist. This job description is increasingly popular, but it is notoriously difficult to describe precisely what that entails…
www.jessemaegan.com
R4DS February Challenge
The challenge is short and sweet this month, and the same for both learners and mentors: Remember: the size of your win isn’t what’s important–everyone’s learning process unfolds at different rates and sizes–what matters is coming together to celebrate everyone’s learning journey within our online…
ryanestrellado.netlify.com
Turning Dataset Codes to Words With R
Note: I include a lot of code in this post so my fellow data scientists can either learn from it or give me feedback about how to make it better. It’s totally ok to skip over all that and just check out the…
livefreeordichotomize.com
Wrangling Data Day Texas Slides
Since twitter threads are excessively cumbersome to navigate, Maëlle asked me to relocate the list of #rstats Data Day Texas slides to a blog post, so here we are! The titles link to the slides 👯 Pilgrim’s Progress: a journey from confusion to contribution Mara Averick Navigating the data science landscape can be…
rsangole.netlify.com
First foray into Shiny
Visualising Distributions Visualising Linear Discriminant Analysis Shiny had interested me for a while for its power to quickly communicate and visualise data and models. I hadn’t delved into it due to lack of time to do so, until now. Two quick visualizations I’ve created as my 1st foray into R Shiny…
emmavestesson.netlify.com
Happy Birthday To Me!
Happy Birthday To Me! Today is my birthday. To celebrate I decided to look at what was in the news on January 27 every year since I was born. Mainly I want to see if the news was positive or negative. Getting the data I start by creating a list of dates…
josiahparry.com
Introducing geniusR
The functions in this package enable easy access of individual song lyrics, album tracklists, and lyrics to whole albums. Load the package: This returns a tidy data frame with three columns: In this example I will extract 3 albums from Kendrick Lamar and Sara Bareilles (two of my favorite musicians)…
jesse.tw
Jalen v. Shaq as baby names
Was Jalen Rose really the first Jalen? He claims his mother was the first to make up the name, a combo of his father’s (the NBA player Jimmie Walker) and uncle’s (Leonard) names…
www.sastibe.de
Setting up a Scalable RStudio Instance in AWS
Obviously, that is the case. In this post, I will show you the steps for setting up such an environment on Amazon Web Services (AWS). The main advantages of using such a set-up: Convinced? Awesome, let’s get started! First a short overview of the main steps covered in this blog post: Ready? Alright, sweet…
www.blog.rdata.lu
Analysis of the Renert - Part 3
Now that we have the data in a nice format, let’s make a frequency plot! First let’s load the data and the packages: Because such a list is not available in Luxembourguish, I have translated it using Google’s translate api…
malco.io
Stochastic Shakespeare
I’m also going to extract the punctuation and assess how many of each there are for when I actually assemble the sonnets later. Now fit the Markov Chain with the vector of words. with white nor my self thou with gentle verse which Let’s try it out…
r-tastic.co.uk
Trump VS Clinton Interpretable Text Classifier
As always, let’s start with loading necessary packages. Quick glimpse on the class balance, which looks very good, BTW. Finally, let’s clean the data a little: select only tweets text and author, change column names to something more readable and remove URLs from text…
adamspannbauer.github.io
YouTube Reaction Face Finder
This post is to share a side project on extracting ‘reaction faces’ from YouTube videos. Example Output Output from video: ‘PLOTCON 2016: Hadley Wickham, New open viz in R’…
thug-r.life
mgsub v1.0 Launched to CRAN
Official CRAN Launch Earlier this week I submitted mgsub to CRAN and after a couple of days it was accepted! Now it’s live! I’m very excited to have published my second package and one that I think is a more valuable contribution than my first. The package represented a few firsts for me…
nowosad.github.io
Geocomputation with R - the intermission
Both chapters apply command-line based geocomputation introduced in chapters 1-6 to the real world, and answer relevant questions in a reproducible manner with the help of open data and…
wenlong-liu.github.io
Hellow world!
I built this website to achieve two goals: helping others know more about my professional achievements, and presenting my latest output (with details) to persons of interest…
saidejp.rbind.io
Introduction to Unsupervised Learning with R
This document gives an introduction to unsupervised learning, which can be understood as a set of statistical techniques for finding patterns or structure in data, without necessarily having hypotheses…
mouse-imaging-centre.github.io/blog
Linear Models
Preamble The purpose of this post is to elucidate some of the concepts associated with statistical linear models…
thug-r.life
One Year of Trump Executive Orders
First Year Less than a week ago marked the end of Trump’s first year in office. Back in August I posted code on analyzing the issuing of Executive Orders. Today I’m just going to provide updated commentary. Notes The Federal Register takes time to actually publish Executive Orders. This window is variable but has a median value of 5 days…
roelandtn.frama.io
Problematic, data source, and variables selection
This is the first part of a series of blog posts about a project I did with two master's degree colleagues. The main entry to this series is here. Today, we will discuss the problem statement, the data, and the selection of variables from those data with respect to the problem statement…
www.cultureofinsight.com/blog
Visualising Intersecting Sets Of Twitter Followers
Twitter Analytics There has been a surge in a lot of great twitter analytics recently in the #rstats world, in part due to Michael Kearney’s excellent rtweet package…
ropensci.org/technotes
nodbi
You can imagine how it is relatively straight-forward to create a common interface to row-column oriented databases, and DBI is great for that. Thus far, we’ve built nodbi around data.frame’s. That is, we’re focusing on the data.frame use case as it’s very common that R users are dealing solely with data.frame’s in their analysis pipelines…
www.blog.rdata.lu
Analysis of the Renert - Part 2
So, let’s unnest the tokens: We can remove these with a couple lines of code: For my Luxembourgish-speaking compatriots, I’d be glad to get help to make this list better! This list is far from perfect, certainly contains typos, or even words that have no reason to be there! Please…
www.redbandsports.net
Can LeBron become the greatest scorer in NBA history?
LeBron James became the seventh player in NBA history to surpass 30,000 points in his career last night when he scored 28 points in Cleveland’s 114-102 loss in San Antonio. The 30,000th point came on a long two-pointer at the end of the first quarter…
eliocamp.github.io/codigo-r
How to make a relief effect in R
(English version) I was trying to make a circular colour guide (one whose ends share the same colour) for plotting angles or wind directions when I discovered an interesting way to create a relief effect on topographic maps…
mvaugoyeau.netlify.com
First post
It is not easy to start, the first step is the hardest… I created this site to explain statistical analyses and uses of R that I have already done. It is also intended to evolve with my future work…
blog.schochastics.net
SOMs and ggplot
We will, however, only use a random sample of the 75,000 players, for computational convenience. We start by computing the SOM for the random sample. There we go! Now we can continue putting the players in the right node. I think you can see more easily how homogeneous the grid nodes are with this plot. This is very much the same code as used in the package…
saidejp.rbind.io
Socioeconomic Factors of Poor Physical and Mental Health
BRFSS is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US…
www.tidyverse.org/articles
tibble 1.4.2
This article shows the effect of each new option based on the following simple tibble…
lenkiefer.com
Me on a podcast
Hey check it out! Me on a podcast: https://policyviz.com/podcast/episode-111-len-kiefer/. We talk about data visualization and how I use it at work. A bit about using R too. I got the opportunity to talk with Jon Schwabish on the Policyviz podcast…
www.cultureofinsight.com/blog
Building a Cryptocurrency Tracker with R
TL;DR - check the tracker out here. As a recent cryptocurrency ‘Investor’ (0…
www.gokhanciflikli.com
Predicting Conflict Duration with (gg)plots using Keras
An Unlikely Pairing Last week, Marc Cohen from Google Cloud was on campus to give a hands-on workshop on image classification using TensorFlow. Consequently, I spent most of my time thinking about how I can incorporate image classifiers in my work…
cevo.com.au
Sending Watchmen into the Open
Open source software is a decentralised development and distribution model that encourages collaboration in the public domain…
www.tidyverse.org/articles
fs 1.0.0
Install the latest version with: Some examples… Filter files by type, permission, size and 15 other attributes. Tabulate and display folder size…
emmavestesson.netlify.com
Please like me
Sundays When I woke up this morning I wrote a long to do list. Instead of working through the list I somehow ended up spending most of my day playing in R. Scraping the web I have been keen to try web scraping in R for a while so I gave rvest a go…
adamspannbauer.github.io
Snake Game Shiny Loader
This post is to share the 🐍snakeLoadR🐍 R package. The package adds the snake game as a loading screen tied to a specific output in a shiny app. This repo has the code used to make the app in the gif. This loader package is more of a novelty than it is anything useful, but it was a fun little project…
ritsokiguess.site/docs
Displaying grouped bar charts in ggplot
Introduction When you have two categorical variables to plot, grouped bar charts are one possible visualization…
sciathlon.github.io
Figure skating athletes' personal best
Today I am writing another piece about figure skating, also another piece about data analysis in this event…
sciathlon.github.io
R figure skating analysis
Analysing medals won per athlete/per country with R Today I am introducing a sneaky little data analysis using R on figure skating in the olympics. I have already written a piece on Figure skating and what I think is going to happen in the upcoming olympics…
yihui.name/en
Back To The DT Package After Two Years
As a standalone HTML page (rendered as a full-page HTML widget), in R Markdown code chunks, and in Shiny. In the RStudio Viewer, and in a normal web browser. In Bootstrap themes, or on normal HTML pages…
www.redbandsports.net
Genie Bouchard and tennis Elo ratings
When the next WTA rankings come out on Jan. 29, Eugenie Bouchard will have a doubles ranking that is higher than her singles ranking. Since she rarely plays doubles, it’s a pretty stark picture of the state of her game…
aosmith.rbind.io
Reversing the order of a ggplot2 legend
It’s always nice to get good questions in a workshop. It can help everybody, including the instructor, get a bit of extra learnin’ in…
mouse-imaging-centre.github.io/blog
StanCon Highlights
Hi readers, Recently I got back from StanCon 2018 Asilomar. I had a little time waiting for one of my flights and I thought I’d reflect on the conference. Last year I was lucky enough to go to the first StanCon and it was nice to be able to see how the conference has grown. This year it was three days of tutorials, talks, and networking…
blog.schochastics.net
Traveling Beerdrinker Problem
Whenever I participate in a Science Slam, I try to work in an analysis of something typical for the respective city. My next gig will be in Munich, so there are two natural options: beer or football. In the end I choose both, but here I will focus on the former…