www.mytinyshinys.com
EPL Week 34
For the remainder of the season, I will be travelling with a backup laptop, so please excuse any shortfall in posts and site updates. Match of the Day: Bit of a damp squib with Man. Utd…
gcppodcast.com
Post-Quantum Cryptography with Nick Sullivan and Adam Langley
How to stream realtime…
blog.rstudio.com
Summer Interns
Fanny is a master’s student working with Max Kuhn this summer. Previously she studied Statistics and International Relations at UC Davis. She is particularly interested in privacy and interpretability in machine learning…
www.jakekaupp.com
#TidyTuesday: The Sad Story
This week's #TidyTuesday is all about global mortality rates. I’ve looked at datasets like this before, and while they are a great way to see interesting trends and practise some visualization skills, I think there is another lesson to be learned. The human element of data. These data were people. They meant something to someone…
engineering.pivotal.io
How Pivotal contributes to the Development of PostgreSQL 11
Not only is the PostgreSQL Project an awesome project with a thriving community that is fun to work with, but in the long run this strategy also has several advantages for us…
research.libd.org/rstatsclub
git to know git
Beyond working on projects with collaborators, using GitHub is equally rewarding when used for individual projects…
www.jennadallen.com
A Shiny App to Visualize and Share My Dogs’ Medical History
As a digital nomad traveling with 2 dogs, keeping track of all their medical and vaccine records has been challenging. Especially since one of our dogs has had some recent health issues. I needed a way to organize all the vet visits, test results, vaccine certificates, etc. as well as be able to share them with new vets and our primary vet back in Colorado…
ramhiser.com
Building Scikit-Learn Pipelines With Pandas DataFrames
I’ve used scikit-learn for a number of years now…
www.mytinyshinys.com
Visualizing Networks
Armed with a free month of Data Camp, I have been taking a look at one or two areas I am pretty uninformed about, including social networks. The course is run by fellow ex-pat and soccer/football enthusiast, James…
www.sastibe.de
Don't Worry
I have a personal Google account, complete with Gmail, Google Drive and everything else. I first opened it as a sort of spam address for all kinds of logins, but started to use it more and more due to its convenience…
ryantravis.netlify.com
Predicting NFL Injuries with Stan
Yesterday I wrote a post using Stan to fit simple one parameter models. These are boring, but helpful for learning the basics. Today, I’d like to start building a series of increasingly complicated regression models…
josiahparry.com
Coursera R-Programming
Over the past several weeks I have been helping students, career professionals, and people of other backgrounds learn R…
www.datalorax.com
New Website Theme!
This post has needed to be written for a little while, but I’ve been busy with the actual work of redesigning my website (in fact, I have a number of posts that are backlogged)…
www.stuartlee.org
Rookie mistakes and how to fix them when making plots of data
In this assignment, the focus was to practice data cleaning. Students suggested questions to build a class survey, to get to know the interests of other class members, and then completed the composed survey…
ryantravis.netlify.com
Stan Basics
I attended a great short course on Bayesian workflow using Stan at the New England Statistics Symposium yesterday. If you don’t know, Stan is “a state-of-the-art platform for statistical modeling and high-performance statistical computation”…
blog-mjay.firebaseapp.com
Text Clustering at scale
Hello there! Today we will explore an overview of Databricks clusters and how to run the model using a community account. Prerequisites: a basic understanding of programming in Python or Scala. Knowledge of or experience with Java, SQL, or PySpark can be beneficial but is not essential…
blog.wallaroolabs.com
Choosing Elixir's Phoenix to power a real-time Web UI
Here at WallarooLabs, we’ve been working on Wallaroo, our high-throughput, low-latency, and elastic data processing framework, for nearly two years now…
davemcg.github.io
Let’s Plot 5
It contains cell area size for thousands of cells which have had a drug perturbation, split by wells in a dish. One drug per well. Several wells got the same drugs. So there are multiple plots per drug. How did I know? Because a bunch of the density plots were super wavy - which means (almost always) that the number of counts in that sample is very low…
www.aggieerin.com
New Publication: Texting
One more announcement! We just had a new publication accepted: “Textisms”: The Comfort of the Recipient. This paper was an undergraduate honors thesis that Flora-Jean and I finally got accepted! She did a great job making sure this paper was completed and…
rsangole.netlify.com
Performance Benchmarking for Date-Time conversions
Once more, there was an opportunity at work to optimize code and reduce run-time. The last time was for dummy-variable creation. Upon querying large data from our Hive tables, the returned dataframe contains values of class character…
blog.rstudio.com
RStudio Connect 1.6.0 - A Year in the Making!
We’re pleased to announce RStudio Connect 1.6.0. Connect 1.6.0 caps a year of significant updates and we encourage all customers to upgrade. Connect users continue to share analytics through reports, dashboards, and web applications…
lenkiefer.com
Rate Cloud
TIME FOR A FUN NEW MORTGAGE RATE CHART. We’ll use R to plot a new visualization of mortgage rates. Let’s make it. Data: As we did with our majestic mortgage rate plot post, we’ll plot mortgage rates using the Freddie Mac Primary Mortgage Market Survey…
mvaugoyeau.netlify.com
Start with the data
After verifying that there are no misprints in the data, and before starting the actual statistical analysis, I check the distribution of the data and the relationships between factors. For this blog article, I use the air quality data available in R…
ritsokiguess.site/docs
Tidy chi-squared testing
R has the creaky old functions table and chisq.test for counting up frequencies and doing chi-squared tests for association. They work, but there is nothing very tidyverse or elegant about them…
www.tidyverse.org/articles
Upvoting issues
We’re adding Reactions to conversations today to help people express their feelings more simply and effectively. When we pull the repo, issue, and reaction data together we end up with something that looks like the following figure. It’s a far cry from a fancy report, but it provides another avenue for easy communication among users and developers…
www.samabbott.co.uk
Exploring Tuberculosis Monitoring Indicators in England; Using Dimension Reduction and Clustering
Looking through the other tibbles they all have the same structure - we can write a function using this knowledge to speed up data extraction…
cattleguard.github.io
How to Proxy Go net/http
I started playing around a bit (again) in Go recently and needed to look at some requests I was generating. I wanted to inspect them in ZAP to see how things were working and use that to make adjustments, etc. This post is no big deal, but I didn’t want to forget how to do this. Also, there are several posts that don’t seem to actually work…
ritsokiguess.site/docs
Ordered alternatives in ANOVA
Standard analysis of variance, and nonparametric alternatives to it such as the Kruskal-Wallis test, test a null hypothesis of “all the groups have the same mean” against a vague alternative of “two or more of the groups have different…
gcppodcast.com
Project Jupyter with Jessica Forde, Yuvi Panda and Chris Holdgraf
How did Google’s predictions do during March…
blog.rstudio.com
Shiny Server (Pro) 1.5.7
Upgrade to Node v8.10.0. Dropped support for Ubuntu 12.04 and SLES 11. Don’t color log output if stdout is not a terminal. The above changes, plus: Rename CSRF token cookie from XSRF-TOKEN to SSP-CSRF, so as not to conflict with other Angular apps being served from the same host…
statsbylopez.netlify.com
The Vegas flu looks real — but somehow the Chicago Blackhawks also got sick
But analyzing team strength is just the start of how we can use this type of framework to analyze betting market data in…
www.jakekaupp.com
#TidyTuesday
I do not update this site as much as I would like. Interacting through Twitter and the #R4DS Slack channel has been my main contribution as of late to being more active in the #rstats community. I make plans to get to things like posting, then end up with other work, a neat idea, or just decide to spend time with my family…
www.aggieerin.com
Mediation Moderation Workshop
Hi everyone! I have been super swamped with a bunch of due dates that all hit in…
statsbylopez.netlify.com
A state-space model to evaluate sports teams
Relative to other popular sports like basketball and football, it seemed to us at the time that the best team was winning less often in baseball and hockey. And as fans wanting skilled teams to be rewarded, it was frustrating to so often have well-constructed teams fall short of titles. Team quality in sports is not static. Players get hurt, traded, or decide to retire…
blog.rstudio.com
Building tidy tools workshop
Join RStudio Chief Data Scientist Hadley Wickham for his popular Building tidy tools workshop in San Francisco! If you missed the sold-out course at rstudio::conf 2018, now is your chance. You should take this class if you have some experience programming in R and you want to learn how to tackle larger scale problems…
matthewsmith.rbind.io
Creating R packages, R markdown and blogdown
As this is my very first blog post for this site (created using blogdown), I decided to write some comments and general points on my experience moving from being a general R user making use of functions to writing R packages, using GitHub, and making use of markdown and blogdown…
www.mytinyshinys.com
EPL Week 33
Match of the Day: City’s title celebrations put on ice as the Manchester derby provides 5 goals and 9 bookings, whilst the Liverpool one cannot provide one of either. Rooney has only completed 5 of his 23…
livefreeordichotomize.com
network3d - a 3D network visualization and layout library
What is network3d? network3d is a tiny R package built using the htmlwidgets package that takes network data in the form of node and edge dataframes and performs a physics simulation to determine the optimal layout in three dimensions…
ritsokiguess.site/docs
Name your code chunks!
Blogdown is amazing, but there is one thing that tripped me up, and I just worked out why. I have lots of graphs in my posts, but I was sometimes getting the wrong ones, and I was wondering why that was…
www.onceupondata.com
yelpr Package for Yelp Fusion API
The available functions to search businesses are: Here we can retrieve the first 5 results in ‘New York’ with the term ‘chinese’ The available functions to search events are: So we can get details of featured event in the given location using the longitude and latitude…
www.brodieg.com
Adventures in R and Compiled Code
My programming background is mostly in interpreted languages, so much of what follows will be obvious to experienced C programmers. If you notice any errors, please let me know. I share this in the hope that it may be helpful to others taking the leap into C from R…
www.seanlnguyen.com
Analyzing Sleep Data with R
I’ve been using the iOS Sleep Cycle app to track my sleeping since late 2014 and have accumulated quite a bit of information about my sleeping habits…
thestudyofthehousehold.com
Centering by group means, Part I
I have been thinking a lot lately about the importance of centering variables. This is something that has earned me eyerolls and scorn from my nearest collaborators, because it is so basic…
emitanaka.github.io
Making a Hexagon Sticker
It’s worth noting the copyright of images, since you may want to use images in your hex sticker. If you created the image on your own, then the copyright in general will rest with you and there is no problem…
yihui.name/en
The Trouble of .Rprofile if it Doesn't Have a Trailing Newline
WAT?! Anyway, I always configure my text editors to add a newline to the end of text files (below is a screenshot of my RStudio options), so I probably would never have discovered this issue by…
www.redbandsports.net
Blue Jays attendance is down probably because they were bad last year
A narrative being spun around Toronto is that Blue Jays ticket sales are weak this season because the current front office of president Mark Shapiro and general manager Ross Atkins are being too … something. It’s hard to pin down what the two executives have done wrong because the goal posts move a lot…
engineering.pivotal.io
Create Regression Tests for Greenplum Database
In today’s software development world, testing is a fundamental and necessary part of the entire lifecycle of the product…
cevo.com.au
Get Baking
Accounts everywhere: The team I was working with had responsibility for seven AWS accounts. Some accounts hosted production workloads; others were used for development and staging. As a result, we encountered several painful problems. Onboarding team members: when a new team member started, we had to create IAM users in every account they required access to…
ritsokiguess.site/docs
R, it's OK I guess
On a dare from @dataandme and @littlemissdata, last night I bought the domain ritsokiguess.site, and now this blog lives there (as well as at the old nxskok.github.io, which is the “actual” home for the site)…
blog.wallaroolabs.com
The Snake and the Horse
Welcome to our continuing series on building Wallaroo…
thomasmock.netlify.com
TidyTuesday - A weekly social data project in R
To participate in TidyTuesday, you need to do a few things: We welcome all newcomers, enthusiasts, and experts to participate, but be mindful of a few things: Everyone did such a great job! I’m posting all the ones that I can find through the hashtag, you can always tag me in your post to make sure you get noticed in the future…
www.jessemaegan.com
Kaggle panel recap
Short answer: Twitter. I don’t always tweet about career transitions in data science, but I do keep my Twitter bio section pretty focused on what I do and how I got there: Short answer: sheer dumb luck…
gcppodcast.com
Kontributing to Kubernetes with Paris Pittman and Garrett Rodrigues
Technical Lead of the Contributor Experience SIG for…
www.mytinyshinys.com
EPL Week 32
Match of the Day: Although Stoke and Southampton did themselves no favours, at least there is a relegation…
blog.wallaroolabs.com
How the end-to-end back-pressure mechanism inside Wallaroo works
This is part two of a two-part series on how a Wallaroo system reacts to workload demands that exceed Wallaroo’s capacity, i.e., how Wallaroo reacts to overload…
wytham.rbind.io
Introducing Git + Github
Something helpful that came out of this exercise was that it really made me sit down and think about how exactly version control is useful to applied economists and how it can be made to feel as accessible as possible…
www.frankfarach.com
TidyTuesday
In the first quarter of 2018, I focused my data science education on expanding my R programming skills and setting up this blog in Hugo. I developed my first R package, an API wrapper to the U.S. National Provider Identification (NPI) registry, and created an R-powered Power BI custom visual for a client…
lbusett.netlify.com
A new RStudio addin to facilitate inserting tables in Rmarkdown documents
After struggling a bit due to my rather nonexistent shiny skills, in the end I managed to obtain a “basic but useful” (IMO) addin…
www.riinu.me
Islay distilleries in 3 days
Left Edinburgh at 8am for a 1pm ferry from Kennacraig to Port Askaig (Islay). Edinburgh-Kennacraig should be a 3.5h drive (and it was), but we left early to allow for any delays on the road. Arrived on Islay at 3pm, and our accommodation near Port Ellen (southern Islay, close to Ardbeg, Lagavulin and Laphroaig) was a 40 min drive from the port…
thedatawitch.com
Note to Self
This is the first post in a series where I write to myself regarding the various data science spells I’m learning. Today’s spell: dplyr’s filter function…
www.jessemaegan.com
YMMV: non-profit data science
Feeling inspired by some recent data science collaborations, on Friday I released the following tweet into the wild: Publicly it seemed to garner a good deal of positive attention, although I did also receive some valid criticism via…
ryanestrellado.netlify.com
California School Dashboards Part 3
This is part three of a three-part series where I work with California School Dashboard data by cleaning, visualizing, and exploring through modeling. You can read the first part of this series, which shows one way to clean and prepare the data, and the second part of the series, which shows a way to visualize the…
thedatawitch.com
Plotting multiple lines on the y axis of a ggplot graph
I wanted to plot the yearly sales of three different types of hybrid and electric vehicles on the same graph. The dataset was originally wide with years as columns and the types of cars as rows…
roelandtn.frama.io
September 2018 datascience goals
Last updated: 2018-09-03. I know, I know, my last post was 2 months ago. I’m not very consistent, but a lot has been done on other things (the FOSS4G-fr 2018 program, a bread recipe that I can manage, writing a Python course from scratch, that kind of thing). The next step in my life will come in September, at the end of my Master’s degree…
engineering.pivotal.io
Windows Containers in Cloud Foundry? Here's How We Did It
Here’s a quick diagram showing how these components interact. (Figure: a close-up of Garden on Windows Server 2016.) You’ll notice that the Garden components on a Windows Server 2016 Diego cell are the same as those on a Linux Diego cell…except for the new plugins mentioned above…
mailund.github.io/r-programmer-blog
“Optional” types using pmatch
Some programming languages, e.g. Swift, have special “optional” types. These are types that represent elements that either contain a value of some other type or contain nothing at all…
thestudyofthehousehold.com
Don’t hide your (scientific) weaknesses
What are your weaknesses as a scientist? What should we do when we become aware of those weaknesses, or when someone makes us aware? Backstory: Several months ago I started doing Crossfit here in France. It’s been a fun way to meet new people and get some exercise…
vatlab.github.io/blog
How does SoS compare with other workflow engines
Over 200 workflow systems have been developed to date. Like any other software tools, many workflow systems are actively evolving with new features added from time to time…
www.jtimm.net
building historical socio-demographic profiles
This post demonstrates a simple workflow for building census-based, historical socio-demographic profiles using the R package tidycensus…
gcppodcast.com
Forseti with Nenad Stojanovski and Andrew Hoying
Staff Security Engineer, Spotify. Andrew Hoying is a Senior Security Engineer at Google. His goal is to ensure all services built by Google and running on Google Cloud Platform have the same, or better, security assurances as services running in any other environment…
jessesadler.com
Great Circles with R
At this point, the output is not hugely informative. However, we can confirm that the process of making great circles worked by observing the curvatures of the lines…
blog.wallaroolabs.com
Some Common Mitigation Techniques for Overload in Queueing Networks
Series introduction: overload and how Wallaroo mitigates it. This is the first of a pair of Wallaroo Labs articles about overload. Here’s a sketch of the series…
www.mytinyshinys.com
Tabulating File Information
I have been a bit lazy with regard to deploying this blog and am not sure of the best way of dealing with it. As you may know, with blogdown, the package I use, you create all the files within a public directory, which you can then deploy to your server of…
www.mytinyshinys.com
Geofacetting South African style
Ryan Hafen has recently started what promises to be a series of blog posts about his geofacet package…
magesblog.com
Insurance Data Science 2018
The abstract submission deadline for the Insurance Data Science conference, held at Cass Business School on 16 July 2018, is approaching. You have until the 9th of April to submit your abstract. We would like to see proposals for talks that demonstrate how data science is used in insurance, e.g…
mailund.github.io/r-programmer-blog
Lots of Function Transformations
The last couple of days I’ve been doing a lot of experimenting with a package for function rewriting: foolbox…
fharrell.com
Musings on Multiple Endpoints in RCTs
The purpose of this article is not to discuss ISCHEMIA but to discuss the general study design, endpoint selection, and analysis issues ISCHEMIA raises that apply to a multitude of trials…
statsbylopez.netlify.com
On log-loss and scoring the NCAA tournament
To wit: We are pretty sure we actually mis-typed Ohio State as Ohio during our data wrangling process, thus giving what was a pretty good Ohio State team a fairly mediocre ranking. This helped us immensely when Ohio State lost to Dayton in a first round upset…
www.jessemaegan.com
R4DS April Challenge
Both of these projects are the result of collaborations between Radovan, Burcu, Raul, Rosa, Jake, Thomas, & Ariel, and we’re thrilled to be sharing them with our…
yihui.name/en
Timezones, and Worse, Daylight Saving Time
One nice thing about the time in China is that the whole country shares the same time, from the very east to the very west, even though it means that 8am is still completely dark in the west…
aosmith.rbind.io
Unstandardizing coefficients from a GLMM
Winter term grades are in and I can once again scrape together some time to write blog posts! 🎉 I find this comes up particularly for generalized linear mixed models (GLMM), where models don’t always converge if explanatory variables are left…
blog.rstudio.com
reticulate
Calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session. Translation between R and Python objects (for example, between R and Pandas data frames, or between R matrices and NumPy arrays)…
blog-mjay.firebaseapp.com
Data Fingerprinting with Text Data
Query: What would be the correct methodology to detect whether a statement or a word means a “yes” or a “no” using NLP? Answer: I can’t give the precise solution, since it is the main focus of my research[1]; nonetheless, I can give you some pointers…
www.rdatagen.net
Exploring the underlying theory of the chi-square test through simulation - part 2
To motivate some of the key issues, I talked a bit about recycling. I asked you to imagine a set of bins placed in different locations to collect glass bottles…
ramhiser.com
Feature Selection with a Scikit-Learn Pipeline
I am a big fan of scikit-learn’s pipelines. Why are pipelines useful? They ensure reproducibility, they let you export models to JSON for production, and they enforce structure in preprocessing and hyperparameter search to avoid over-optimistic error estimates. Unfortunately, though, a number of sklearn modules are not well integrated with pipelines…
ritsokiguess.site/docs
Today on Twitter I learned...
Today on Twitter I learned (or was reminded about) two #rstats things: from @pkqstr, about separate_rows from tidyr, which does something like separate followed by gather, but better…
www.gokhanciflikli.com
1st LSE CSS Hackathon! London 17-19 April
It’s Happening, Folks! Whew! Less than a month left and I still haven’t publicised this on my blog. People, we are hosting the 1st Computational Social Science Hackathon at the London School of Economics and you are invited! It’s completely free and open to all…
engineering.pivotal.io
Scaling Machine Learning to Recommend Driving Routes
We built an app to predict the potential earnings of a driver, given his current location, for the next 8 hours in successive time intervals. The app also provided recommendations of the next best pickup locations, ranked based on driver preferences and behavior…
cevo.com.au
Automatically Compliant
While it’s possible to argue that automation can be taken too far, one area where I’m sure many people would appreciate more automation is around compliance. I recently had a great experience working for a client, where the team was sent a list of items that auditors would be focusing on for the next audit…
blog-mjay.firebaseapp.com
Learning NLP
Query: How does one learn natural language processing from scratch? Answer: Learn in a classical way: 1. Introduction to Information Retrieval (Manning) 2…
developer.r-project.org/Blog/public
Maximum Number of DLLs
Some packages contain native code, which is linked to R dynamically in the form of dynamically loaded libraries (DLLs). Recently, R users started loading increasing numbers of packages; “workflow documents” are one source of this pattern…
yihui.name/en
On Adjectives
Among all types of words, I have always found adjectives the most challenging to use in communication (either writing or talking). They are challenging because (1) it is so hard to refrain from using them, and (2) it is hard to be precise…
www.samatkins.me
Udacity React NanoDegree reflections
I completed my third Udacity Nanodegree in January 2018. This time the subject was React and React Native. It was a very rewarding yet demanding course…
thestudyofthehousehold.com
Weekly Reading
Here’s where my reading took me this week. Some of it is ecology, but a lot of it is other, random things that came to me via Twitter or from friends. Perhaps this will become a regular post for me! Favourite paper: Paine et al. on a wonderful idea of integrating demography and traits to predict community dynamics…
purrple.cat/blog
gofast
I’ve used it to make a simple function that gives the names of the functions in a Go file: So following the pattern I’ve used in my previous go adventures, here is another Go function that sits between the real pure go code we’ve seen before, and the R things…
blog.wallaroolabs.com
How We Test the Stateful Autoscaling of Our Stream Processing System
This post discusses how we use end-to-end testing techniques to test Wallaroo’s autoscaling features. Background: Autoscaling in Wallaroo enables adding or removing work capacity from an application that performs partitioned work…
purrple.cat/blog
I’m (not) looking for a job
I was a bright kid, and everybody in the family was surprised when I picked up reading on my own (that’s at least the story I’ve heard many times). After some tests I skipped two classes and things still were easy. I’m not writing this to brag, quite the opposite…
purrple.cat/blog
Quick and dirty branchmark
That’s what happens on my 💻, a pretty decent macbook pro late 2017 equipped with an…
ritsokiguess.site/docs
Ward's method and dissimilarities
I don’t know yet where this post is going. Think of it, for now, as a ramble through cluster analysis. I may eventually figure out what to do with it, but I don’t want to delete what I have written just yet…