www.datalorax.com

Sharing some functions from my personal R package

In this post I basically just wanted to share some recent developments that I’ve made to my personal R package {sundry}. All of the recent advancements have been made to work with the tidyverse, so things like group_by should work seamlessly. If you feel like giving the package a whirl, I’d love any feedback you have or bugs you may find


www.jtimm.net

locating linguistic diversity in the usa

Contents: Language data and the census; Languages in the US; Linguistic diversity as entropy; Locating linguistic diversity; FIN. This post investigates linguistic diversity in the United States using data made available by the US Census


sciathlon.github.io

Biathlon data analysis

The 2018 Winter Olympics finally kicked off! In honor of this, I asked a friend of mine who loves sports what data he would be interested in seeing in the sports that are in the winter Olympics and he answered “biathlon


mouse-imaging-centre.github.io/blog

Finding and playing with peaks in RMINC

So, peaks. When producing a statistical map, it’s good to get a report of the peaks (i.e. most significant findings). RMINC has had this support for a while now, though it has remained somewhat hidden


thestudyofthehousehold.com

How I use Rmarkdown

Last week or so, I achieved a wondrous thing. A trivial thing. I achieved a wondrous, trivial thing: I wrote my most popular tweet ever: My new thing is ending every Rmd with a list of links to the forums / SO questions / blogs / github repos that I used to solve the problem #rstats pic.twitter


adamspannbauer.github.io

Proposals, diamonds, xgboost, & lime

I recently got engaged! While picking out the stone for the ring I played around with the diamonds dataset from ggplot2. The analysis looked at the main contributors to diamond pricing. (The analysis was also an excuse to play around with the lime package for the first time


www.benjaminackerman.com

Tidying and Visualizing TV Ratings Data in R

Bless you, rebus 🙏🏻


www.carlbfrederick.com

Using R to Create Google Maps

Voila! The new file is only 1.6 MB. Now, all that is left to do is to login to Google Maps, import the widists_sm.kml file and share it


blog.wallaroolabs.com

A Scikit-learn pipeline in Wallaroo

While it would seem that machine learning is taking over the world, a lot of the attention has been focused towards researching new methods and applications, and how to make a single model faster


ryanestrellado.netlify.com

California School Dashboards Part 1

This is part one of a three-part series where I’ll be working with California School Dashboard data by cleaning, visualizing, and exploring through modeling. Introduction: It’s Ok to Skip Around. I’m writing this series for data scientists, public school educators, and data scientists who are also public school


cevo.com.au

Is there such a thing as too much automation?

A recent discussion at Cevo HQ came onto the topic of satellite navigation and how lost we would be without it, how in the good old days of street directories you were generally only lost the first time you travelled to a new location


www.njtierney.com

Acknowledgements

Last week (specifically, the 1st Feb) I got “the letter” from QUT that basically says “You, Nicholas Tierney, are a Dr.”. Well, specifically, it says: The Queensland University of Technology’s Registrar has executively approved your degree and you are now entitled to use the title of


giorasimchoni.com

Book'em Danno!

One of the best talks in my opinion at rstudio::conf 2018 was actually the first keynote by Prof. Di Cook, of Monash University, titled “To the Tidyverse and Beyond: Challenges for the Future in Data Science”. Di talked about how she views the plot of a dataset as simply another statistic, a function of this dataset


lenkiefer.com

February 2018 housing market update

EARLIER THIS WEEK I TWEETED out a poll asking whether or not folks wanted to see a thread/tweetstorm with slides from an upcoming presentation on the economy and housing markets that I’m giving. Over 90 percent voted for a thread. So I shared it. In this post let me add a little more commentary on the individual slides


jessesadler.com

Introduction to GIS with R

Let’s look at the result


gcppodcast.com

Open Source TensorFlow with Yifei Feng

How do I design identity and access management policies for a


yihui.name/en

Perhaps My Documentation is not Too Bad

My documentation was poor. The user didn’t read the documentation. It turned out the second possibility was true. What a relief! I’m glad that I didn’t yell “Why don’t you read the documentation?” That said, as software developers, we should always keep the first possibility in mind


thug-r.life

Prep Your Hugo Blog for R-bloggers

Get Ready for R-Bloggers: There are lots of reasons to write a technical blog. Getting practice writing, thinking through ideas, etc. are all fulfilling reasons on their own. But life is always better with an audience. Well, not always. But for a blog it is


blog.schochastics.net

Sample Entropy with Rcpp

Simple. Problem is, I need to calculate the sample entropy of 150,000 time series. Can the function handle that in reasonable time? This translates to several hours for 150,000 time series, which is kind of not ok. I would prefer it a little faster. Perfect. Now let’s check if we gained some speed up. The speed up is actually ridiculous


yihui.name/en

Thanks, Marie Dussault, for Reading the Full blogdown Book

Write, as if no one would read


blog.sellorm.com

dater - a tiny Addin for RStudio

Inserting a date with a keyboard shortcut. TLDR: You can find the dater package on github. Update 1: It seems all of this was for nothing… Hi


blog.earo.me

tsibble? or tibbletime?

So what do these two packages have in common? A common time series analysis task is to aggregate the values to higher-level time periods. For example, it may be interesting to examine average temperature and total precipitation every month. Generally, tsibble defines a time series tibble more strictly than tibbletime


lenkiefer.com

Comparing recent periods of mortgage rate increases

THIS MORNING I SAW AN INTERESTING CHART OVER ON BLOOMBERG. In this post they compared recent 10-year Treasury yield movements with the Taper Tantrum in 2013. The chart you can see here was an area chart with overlapping line plots. I thought it would be a fun exercise to remix a similar chart with R


www.redbandsports.net

How good were the offences in the Super Bowl

Sunday’s Super Bowl set 17 statistical records and tied 12 more, according to research by the Elias Sports Bureau. The 1,151 combined total yards by the Eagles and Patriots obliterated the old record by 222 yards. Below is a histogram showing the combined total yards of the 52 Super Bowls, sorted into bins of 25 yards


eliocamp.github.io/codigo-r

How to make a shaded relief in R

(Spanish version of this post) While trying to build a circular colour scale to plot angles and wind direction, I stumbled upon an easy way to make shaded reliefs in


translatedmedicine.com

Boston Limited English Proficient Population

Inspired by Julia Silge and the tidycensus package by Kyle Walker, I wanted to explore the limited English proficient (LEP) population in Suffolk County (which includes Boston). For those not in the know, LEP refers to people who speak English less than very well. Beyond the overall population, I wanted to get a glimpse into the language diversity of the


fishsciences.github.io

Visualizing Fish Encounter Histories

Encounter histories are the translation of a fish’s path into a row of ones and zeros, each corresponding to a positive or negative detection record at a receiver location in the acoustic


yihui.name/en

What is More Convincing than a CV?

You need to have a Github presence (I’m sure it does not have to be Github, e.g., Gitlab should also be fine). It is hard to imagine that a data scientist does not use version control. You must have given talks at local meetups or conferences. Communication is a key part of data science


emmavestesson.netlify.com

What should I have for lunch?

Inspiration: I was having lunch with some colleagues the other day when they told me about a restaurant spreadsheet that they used to use to randomly pick a place to get lunch from. I of course felt the need to see if I could create something similar in R


adamspannbauer.github.io

rPackedBar on CRAN

This post is to announce rPackedBar’s release to CRAN and to share a shiny app to visualize twitter interactions using a packed barchart. This post and the package have been updated due to feedback from Xan Gregg.


yihui.name/en

How to Pretend Typing Super Fast in RStudio

Below is a prototype of the function to automatically “type” a character vector into your RStudio source editor: It should give you what you


sciathlon.github.io

Triathlon pubmed analysis

Today I am using the RISmed package for R to analyze publications about triathlon. It is an amazing package to look through the Pubmed database for what they have on a certain subject. Pubmed is an NIH (USA) funded database which hosts articles about medicine and biology


www.redbandsports.net

Do Super Bowl QBs get more babies named after them?

On Jan. 23, Sports Illustrated posted the following tweet, which got us thinking. Fourteen years ago, a 7-year-old in Foxboro told Tom Brady he had named his baby brother after him


shotwell.ca/blog

Flagging toxic comments with Tidytext and Keras

The task here is to try to determine how likely a string is to have a particular set of labels. We can take a look at the data. The first step for looking at the actual text is to split up the strings into words and then remove stop words


blog.wallaroolabs.com

Idiomatic Python Stream Processing in Wallaroo

We have been working on Wallaroo, our scale-independent event processing system, for a little over two years


fharrell.com

Is Medicine Mesmerized by Machine Learning?

Avati et al used deep learning on the 13,654 features to achieve a validated c-index of 0.93. To the authors’ credit, they constructed an unbiased calibration curve, although it used binning and is very low resolution


www.ifconfig.it/hugo

Tech Field Day Extra at Cisco Live Europe 2018

I had the honor and pleasure of being invited again to attend Tech Field Day, this time for an Extra event at Cisco Live Europe in Barcelona


mouse-imaging-centre.github.io/blog

Bayesian Model Selection with PSIS-LOO

Pitch: In this post I’d like to provide an overview of Pareto-Smoothed Importance Sampling (PSIS-LOO) and how it can be used for Bayesian model selection


aosmith.rbind.io

Making many added variable plots with purrr and ggplot2

Last week two of my consulting meetings ended up on the same topic: making added variable plots. In both cases, the student had a linear model of some flavor that had several continuous explanatory variables


www.cultureofinsight.com/blog

Map your Google Location Data with R Shiny

I Know What You Vizzed Last Summer. tl;dr: click the image to launch the app. I guess I’m of that school of thought: I don’t mind my mobile tracking me


gcppodcast.com

Percy.io with Mike Fotinakis

I would love a weekly roundup of news about Google Cloud Platform - where can I get


yihui.name/en

The Large Variance in the Attention Level of Readers

I keep forgetting this, too, and let outliers bias me. For example, I often feel heartbroken when I see users ask questions on Twitter without reading the documentation on which I have spent countless hours. What is worse is that they may get misleading or wrong answers. Perhaps my documentation is just too poor or boring, and perhaps they just didn’t pay attention


yihui.name/en

Anything that Can Look Like Cats will Look Like Cats

Anything that can look like cats in your eyes, will look like cats in some other people’s eyes. P.S


timtrice.net

Department Top Three Salaries

Write a SQL query to find employees who earn the top three salaries in each department. For the above tables, your SQL query should return the following rows. (Table solution-2: 0 records; columns Department, Employee, Salary.) Adding the line below to the query above passes the test case


yihui.name/en

How to Properly Write a URL

You may think it is dead simple to write a URL, and we click links to browse websites every day


yihui.name/en

Ian Lyttle is the Most Serious Conference Attendee I've Met

I think I got to know Ian Lyttle in early 2014 (about 1.5 years before I left Iowa). Out of nowhere, he started to show up at the weekly ISU graphics working group meetings (led by my PhD advisors Di and Heike) after driving for three boring hours from another city


magesblog.com

PK/PD reserving models

The dynamical system is no longer autonomous and initially I can’t be bothered to solve it analytically. Hence, I use an ODE solver instead, but I will get back to integrating the differential equations later. Fortunately, an ODE solver is part of the Stan language


www.gokhanciflikli.com

Scraping Wikipedia Tables from Lists for Visualisation

Get WikiTables from Lists: Recently I was asked to submit a short take-home challenge and I thought what better excuse for writing a quick blog post! It was on short notice so initially I stayed within the confines of my comfort zone and went for something safe and bland


ritsokiguess.site/docs

Tidy simple effects in analysis of variance

Introduction: In two-way analysis of variance, the (continuous) response variable depends on two explanatory factors, say A and B


blog.wallaroolabs.com

Why we wrote our Kafka Client in Pony

At Wallaroo Labs we’ve been working on our stream processing engine, Wallaroo, for just under two years now


www.jtimm.net

a simple framework for corpus-based keyphrase extraction

Contents: Defining potential keyphrases; Corpus search for potential keyphrases; Selecting descriptive keyphrases with the tf-idf statistic; Post script - State of the Union Addresses. This post outlines a simple framework for identifying and extracting keyphrases from component texts of a


cevo.com.au

Introduction to R

R is great for doing any kind of slicing and dicing with data. However the barrier to entry can be high, especially for people that come from a non-data background. I know that it took me quite some time to grasp just how R does its magic


blog.mgechev.com

JavaScript Decorators for Declarative and Readable Code

Decorators in JavaScript are now in stage 2. They allow us to alter the definition of a class, method, or a property. There are already a few neat libraries which provide decorators and make our life easier by allowing us to write more declarative code with better performance characteristics. In this blog post I’ll share a few decorators which I’m using on a daily basis


asch3tti.netlify.com

My first experience with text mining

The first step was to quantify how often words were used across the 34 chapters of the novel, to have an initial idea of the content. So, I counted the number of occurrences for each word and selected only the most common ones (i.e


www.rdatagen.net

Have you ever asked yourself, 'how should I approach the classic pre-post analysis?'

I’ve explored various scenarios (i.e. different data generating assumptions) to see if it matters which approach we use. (Of course it does


fharrell.com

Information Gain From Using Ordinal Instead of Binary Outcomes

The point about the increase in power can also be made by, instead of varying the effect size, varying the effect that can be detected with a fixed power of 0.9 when the degree of granularity in Y is increased. This is all about breaking ties in Y. The more ties there are, the less statistical information is present


www.sastibe.de

My Motivations for Starting a Blog

Hello world! My name is Sebastian Schweer, and I am a Data Scientist. This job description is increasingly popular, but it is notoriously difficult to describe precisely what it entails


www.jessemaegan.com

R4DS February Challenge

The challenge is short and sweet this month, and the same for both learners and mentors: Remember: the size of your win isn’t what’s important–everyone’s learning process unfolds at different rates and sizes–what matters is coming together to celebrate everyone’s learning journey within our online


ryanestrellado.netlify.com

Turning Dataset Codes to Words With R

Note: I include a lot of code in this post so my fellow data scientists can either learn from it or give me feedback about how to make it better. It’s totally ok to skip over all that and just check out the


livefreeordichotomize.com

Wrangling Data Day Texas Slides

Since twitter threads are excessively cumbersome to navigate, Maëlle asked me to relocate the list of #rstats Data Day Texas slides to a blog post, so here we are! The titles link to the slides 👯 Pilgrim’s Progress: a journey from confusion to contribution Mara Averick Navigating the data science landscape can be


rsangole.netlify.com

First foray into Shiny

Contents: Visualising Distributions; Visualising Linear Discriminant Analysis. Shiny had interested me for a while for its power to quickly communicate and visualise data and models. I hadn’t delved into it due to lack of time, until now. Two quick visualizations I’ve created as my 1st foray into R Shiny


emmavestesson.netlify.com

Happy Birthday To Me!

Today is my birthday. To celebrate, I decided to look at what was in the news on January 27 every year since I was born. Mainly I want to see whether the news was positive or negative. Getting the data: I start by creating a list of dates


josiahparry.com

Introducing geniusR

The functions in this package enable easy access to individual song lyrics, album tracklists, and lyrics to whole albums. Load the package: This returns a tidy data frame with three columns: In this example I will extract 3 albums from Kendrick Lamar and Sara Bareilles (two of my favorite musicians)


jesse.tw

Jalen v. Shaq as baby names

Was Jalen Rose really the first Jalen? He claims his mother was the first to make up the name, a combo of his father’s (the NBA player Jimmie Walker) and uncle’s (Leonard) names


www.sastibe.de

Setting up a Scalable RStudio Instance in AWS

Obviously, that is the case. In this post, I will show you the steps for setting up such an environment on Amazon Web Services (AWS). The main advantages of using such a set-up: Convinced? Awesome, let’s get started! First a short overview of the main steps covered in this blog post: Ready? Alright, sweet


www.blog.rdata.lu

Analysis of the Renert - Part 3

Now that we have the data in a nice format, let’s make a frequency plot! First let’s load the data and the packages: Because such a list is not available in Luxembourgish, I have translated it using Google’s Translate API


malco.io

Stochastic Shakespeare

I’m also going to extract the punctuation and assess how many of each there are for when I actually assemble the sonnets later. Now fit the Markov Chain with the vector of words. with white nor my self thou with gentle verse which Let’s try it out


r-tastic.co.uk

Trump VS Clinton Interpretable Text Classifier

As always, let’s start with loading the necessary packages. A quick glimpse at the class balance, which looks very good, BTW. Finally, let’s clean the data a little: select only the tweets’ text and author, change column names to something more readable and remove URLs from the text


adamspannbauer.github.io

YouTube Reaction Face Finder

This post is to share a side project on extracting ‘reaction faces’ from YouTube videos. Example Output Output from video: ‘PLOTCON 2016: Hadley Wickham, New open viz in R’


thug-r.life

mgsub v1.0 Launched to CRAN

Official CRAN Launch: Earlier this week I submitted mgsub to CRAN and after a couple of days it was accepted! Now it’s live! I’m very excited to have published my second package and one that I think is a more valuable contribution than my first. The package represented a few firsts for me


nowosad.github.io

Geocomputation with R - the intermission

Both chapters apply command-line based geocomputation introduced in chapters 1-6 to the real world, and answer relevant questions in a reproducible manner with the help of open data and


wenlong-liu.github.io

Hellow world!

I built this website to achieve two goals: helping others know more about my professional achievements, and presenting my latest output (with details) to persons of interest


saidejp.rbind.io

Introduction to Unsupervised Learning with R (Introducción al Aprendizaje no Supervisado con R)

This document gives an introduction to unsupervised learning, which can be understood as a set of statistical techniques for finding patterns or structure in data, without necessarily having hypotheses


mouse-imaging-centre.github.io/blog

Linear Models

Preamble: The purpose of this post is to elucidate some of the concepts associated with statistical linear models


thug-r.life

One Year of Trump Executive Orders

First Year: Trump’s first year in office ended less than a week ago. Back in August I posted code on analyzing the issuing of Executive Orders. Today I’m just going to provide updated commentary. Notes: The Federal Register takes time to actually publish Executive Orders. This window is variable but has a median value of 5 days


roelandtn.frama.io

Problematic, data source, and variables selection

This is the first part of a series of blog posts regarding a project I did with two master’s degree colleagues. The main entry to this series is here. Today, we will discuss the research problem, the data, and the selection of variables from those data with respect to that problem


www.cultureofinsight.com/blog

Visualising Intersecting Sets Of Twitter Followers

Twitter Analytics: There has been a surge of great Twitter analytics recently in the #rstats world, in part due to Michael Kearney’s excellent rtweet package


ropensci.org/technotes

nodbi

You can imagine how it is relatively straightforward to create a common interface to row-column oriented databases, and DBI is great for that. Thus far, we’ve built nodbi around data.frames. That is, we’re focusing on the data.frame use case as it’s very common that R users are dealing solely with data.frames in their analysis pipelines


www.blog.rdata.lu

Analysis of the Renert - Part 2

So, let’s unnest the tokens: We can remove these with a couple lines of code: For my Luxembourgish-speaking compatriots, I’d be glad to get help to make this list better! This list is far from perfect, certainly contains typos, or even words that have no reason to be there! Please


www.redbandsports.net

Can LeBron become the greatest scorer in NBA history?

LeBron James became the seventh player in NBA history to surpass 30,000 points in his career last night when he scored 28 points in Cleveland’s 114-102 loss in San Antonio. The 30,000th point came on a long two-pointer at the end of the first quarter


eliocamp.github.io/codigo-r

How to make a relief effect in R (Cómo hacer un efecto de relieve en R)

(English version) I was trying to make a circular colour guide (one whose two ends have the same colour) for plotting angles or wind directions when I discovered an interesting way to create a relief effect on topographic maps


mvaugoyeau.netlify.com

First post

It is not easy to start; the first step is the hardest… I created this site to explain statistical analyses and uses of R that I have already done. It is also intended to evolve with my future work


blog.schochastics.net

SOMs and ggplot

We will, however, only use a random sample of the 75,000 players, for computational convenience. We start by computing the SOM for the random sample. There we go! Now we can continue putting the players in the right node. I think you can see more easily how homogeneous the grid nodes are with this plot. This is very much the same code as used in the package


saidejp.rbind.io

Socioeconomic Factors of Poor Physical and Mental Health

BRFSS is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US


www.tidyverse.org/articles

tibble 1.4.2

This article shows the effect of each new option based on the following simple tibble


lenkiefer.com

Me on a podcast

Hey check it out! Me on a podcast: https://policyviz.com/podcast/episode-111-len-kiefer/. We talk about data visualization and how I use it at work. A bit about using R too. I got the opportunity to talk with Jon Schwabish on the Policyviz podcast


www.cultureofinsight.com/blog

Building a Cryptocurrency Tracker with R

TL;DR - check the tracker out here. As a recent cryptocurrency ‘Investor’ (0


www.gokhanciflikli.com

Predicting Conflict Duration with (gg)plots using Keras

An Unlikely Pairing: Last week, Marc Cohen from Google Cloud was on campus to give a hands-on workshop on image classification using TensorFlow. Consequently, I spent most of my time thinking about how I can incorporate image classifiers in my work


cevo.com.au

Sending Watchmen into the Open

Open source software is a decentralised development and distribution model that encourages collaboration in the public domain


www.tidyverse.org/articles

fs 1.0.0

Install the latest version with: Some examples… Filter files by type, permission, size and 15 other attributes. Tabulate and display folder size


emmavestesson.netlify.com

Please like me

Sundays: When I woke up this morning I wrote a long to-do list. Instead of working through the list I somehow ended up spending most of my day playing in R. Scraping the web: I have been keen to try web scraping in R for a while so I gave rvest a go


adamspannbauer.github.io

Snake Game Shiny Loader

This post is to share the 🐍snakeLoadR🐍 R package. The package adds the snake game as a loading screen tied to a specific output in a shiny app. This repo has the code used to make the app in the gif. This loader package is more of a novelty than it is anything useful, but it was a fun little project


ritsokiguess.site/docs

Displaying grouped bar charts in ggplot

Introduction: When you have two categorical variables to plot, grouped bar charts are one possible visualization


sciathlon.github.io

Figure skating athletes' personal best

Today I am writing another piece about figure skating, also another piece about data analysis in this event


sciathlon.github.io

R figure skating analysis

Analysing medals won per athlete/per country with R: Today I am introducing a sneaky little data analysis using R on figure skating in the Olympics. I have already written a piece on figure skating and what I think is going to happen in the upcoming Olympics


yihui.name/en

Back To The DT Package After Two Years

As a standalone HTML page (rendered as a full-page HTML widget), in R Markdown code chunks, and in Shiny. In the RStudio Viewer, and in a normal web browser. In Bootstrap themes, or on normal HTML pages


www.redbandsports.net

Genie Bouchard and tennis Elo ratings

When the next WTA rankings come out on Jan. 29, Eugenie Bouchard will have a doubles ranking that is higher than her singles ranking. Since she rarely plays doubles, it’s a pretty stark picture of the state of her game


aosmith.rbind.io

Reversing the order of a ggplot2 legend

It’s always nice to get good questions in a workshop. It can help everybody, including the instructor, get a bit of extra learnin’ in


mouse-imaging-centre.github.io/blog

StanCon Highlights

Hi readers, I recently got back from StanCon 2018 at Asilomar. I had a little time waiting for one of my flights and I thought I’d reflect on the conference. Last year I was lucky enough to go to the first StanCon and it was nice to be able to see how the conference has grown. This year it was three days of tutorials, talks, and networking


blog.schochastics.net

Traveling Beerdrinker Problem

Whenever I participate in a Science Slam, I try to work in an analysis of something typical for the respective city. My next gig will be in Munich, so there are two natural options: beer or football. In the end I chose both, but here I will focus on the former