livefreeordichotomize.com
Twitter trees
A little over a week ago, Hilary Parker tweeted out a poll about sending calendar invites that generated quite the repartee. Do you like getting Google Calendar invites from your friends for lunches / coffees / etc…
www.mytinyshinys.com
Automated congratulatory tweet to Twitter Friends
On the front page of my premiersoccerstats site, I have a Player Milestones table which highlights players who have reached certain levels in the Premier League’s latest round of games e.g…
jvera.netlify.com
Processing mail using R
After a long time searching for R packages to connect to a remote mailbox (not Gmail), I’ve had to admit that there’s no such feature right now in R. I tested a pair of Python scripts, but they were too convoluted for my needs…
cevo.com.au
Test driven infrastructure with Kitchen and InSpec
For a long time infrastructure was the sort of thing you pulled out of a box, plugged in and then set about configuring and testing. The cycle between needing new equipment and having it ready was measured in weeks, if not months…
ritsokiguess.site/docs
How To Measure The Height of a Tree
In a previous post, I was trying to estimate the volume of wood in a tree from its diameter, and I noted that it would be an advantage to know the height of the tree: for example, we could pretend the tree was cone-shaped, or use a power-law-type relationship in which we estimate the best powers of diameter and height to use to estimate…
tojyouso.github.io
When is the best time to play your wildcard?
I scraped the website at the end of last season to get details on when the top players used their chips. I wanted to see if there was a clear pattern and see if there was a strategy I could learn…
blog.brianz.bz
Structuring Serverless Applications with Python
In spite of my intentions to get more involved in Elixir I’ve been stuck in the Python tractor beam. For all of the issues that may arise in large Python web applications, Python really is a fantastic do-it-all language…
ritsokiguess.site/docs
Summarizing several models using broom and purrr
broom is supposed to be a powerful way to summarize several models at once, and so it is. The trouble is, the examples show how to fit the same model to different subsets of a data set. I had something different in mind: I had one data set, and three different models on that same data…
jvera.netlify.com
file.choose: empowering useRs
Sometimes when sharing your analysis, via Rmarkdown or the brand-new Notebook, the data file is located on the user’s computer, making the default path from your own PC unusable…
jvera.netlify.com
3D chart using rgl library
Iris is one of the most used data sets in R. We’ve seen it in many formats, and it is broadly used for data manipulation. You could say that there’s nothing new to learn if someone uses Iris. I was wondering if there’s something new, something never done to it before…
gcppodcast.com
Broad Institute and Platinum Customers with Lukas Karlsson and Mike Altarace
Mike has been a Strategic Customer Engineer (SCE, pronounced Ski) assigned to the Broad Institute for over a year. He’s been working with Broad on all manner of operating their GCP environment…
mlr-blog.netlify.com
Parameter tuning with mlrHyperopt
Hyperparameter tuning with mlr is rich in options, as there are multiple tuning methods: Simple Random Search, Grid Search, Iterated F-Racing (via irace), and Sequential Model-Based Optimization (via mlrMBO). Also, the search space is easily definable and customizable for each of the 60+ learners of mlr using the ParamSets from the ParamHelpers…
www.mytinyshinys.com
User2017- padr package example
Of course, it is not the same as actually being there, but as a good fall-back the videos of the talks for the useR! 2017 conference are now available on Channel 9. I’ll be dipping into them over the next few weeks and reporting on any I find of interest. Let’s kick off with the padr package from Edwin Thoen…
vuorre.netlify.com
Correlated Psychological Variables, Uncertainty, and Bayesian Estimation
Assessing the correlations between psychological variables, such as abilities and improvements, is one essential goal of psychological science…
livefreeordichotomize.com
The making of 'We R-Ladies'
Last March Maëlle wrote a blog post “Faces of #rstats Twitter”, a great tutorial on scraping twitter photos and compiling them in a collage…
jvera.netlify.com
managing installation and packages in R
I mentioned the useful library pacman yesterday, so a brief comment about it is due. But I’m going to recommend installR for managing updates first (packages and R itself)…
jvera.netlify.com
Easy Rstudio add-in management with addinslist package
When I started using Rstudio (some time ago), I wondered where the Rstudio Addins were located. There’s a menu option, but it was empty on my machine. Seeking an easy way to install some addins, I found “addinslist” install…
livefreeordichotomize.com
Happy World Emoji Day
HAPPY world emoji day! 🌎 🐔 📆 In honor of this momentous occasion, I have decided to analyze the emojis used on rOpenSci’s Slack. library("dplyr") If you’d like to follow along, go fetch yourself a Slack token. token <- "MY_SLACK_API_TOKEN" ## stick your token here We will first use Slack’s reactions…
www.mytinyshinys.com
Weather plots for any US location
There are issues with packages in this post. Here are the author comments. weatherData: “All, yes looks like WU is no longer making it easy to get CSV files without API’s. If anyone figures out a URL for directly fetching CSV’s, I will modify the package…
jvera.netlify.com
fourfoldplot
Working with R, it’s highly likely you’ll end up with a table of dichotomous variables from your datasets, no matter the specific project you’re involved in. I like the confusionMatrix function from the caret package, which calculates a cross-tabulation of observed and predicted classes…
giorasimchoni.com
Playmate of the Month - From Marylin To Ashley
I like working with weird and unexpected datasets. And when they don’t come to me - I go get them myself…
cattleguard.github.io
If You’re Going to Fail To Scale, Don’t - Part II
People hate to wait. Now, if you’re not familiar with ramp metering here’s the gist. A stoplight is placed at the end of an on ramp which regulates how many cars are allowed onto a highway at a given time. The idea being that the number of required slowdowns and wrecks decreases as cars have appropriate distance. Waiting sucks…
jvera.netlify.com
Some essential R packages
For me, there’s a bunch of packages I consider “essential” because, sooner or later, I use them in any project that involves opening RStudio, regardless of the type of issue that I’m trying to…
ritsokiguess.site/docs
Summarizing columns in the tidyverse
I thought summarizing columns in the tidyverse was kind of clunky, at least until a couple of days ago. Let’s read in some data to illustrate what I thought I had to do…
ritsokiguess.site/docs
Tufte-esque
Playing with a new look, thanks to this. One of the main reasons I’m trying this is the possibility of making side comments on the side (look right). Ho ho ho. This is a new thought apparently…
livefreeordichotomize.com
Introducing the tuftesque blogdown theme
This post will serve as a quick tutorial getting you from nothing to a customized blogdown blog using the theme built for this blog: tuftesque…
vuorre.netlify.com
Visualizing varying effects' posteriors with joyplots
However, to make the figure more Unknown Pleasures-y, you’ll need to modify the theme a little bit: Well, there you go…
jvera.netlify.com
First thing first: Thanks!
First things first. It is a question of etiquette, when starting a blog like this, mainly focused on Data Science with R, to acknowledge all the people and teams that made it possible for me to be writing this today. People from CRAN, Rstudio and the R consortium, for pushing forward the best language in the world for data analytics…
www.onceupondata.com
Highlights from UseR! 2017
In the first week of July, the 14th UseR! conference took place in Brussels; it was the biggest UseR! so far. For me, it was the first UseR! and I believe it was a good opportunity to get exposed to different approaches in the data world, see different applications, learn about new packages and meet people in the R community, all in one place…
gcppodcast.com
Istio with Varun Talwar and Sven Mawson
If I want to apply Istio to an existing Kubernetes application, how do I do…
livefreeordichotomize.com
useR!2017 digressions
We both recently attended useR!2017 in Brussels. It was a blast to say the least. We’re going to tag team to cover our favorite things & the lessons we learned while adventuring across the Atlantic. Location Lucy: Brussels was incredible…
purrple.cat/blog
Emojis at #useR2017
I am first there, but that’s not fair because at some point while developing the app I tweeted the list of all the emojis then used so far…
www.mytinyshinys.com
Mapping Eurostat information Part 1
Keeping up with the theme of utilizing official government open data to map via an R package I will now turn to the eurostat package which accesses data - via an API - from the European Commission…
dsnotes.com
Benchmarking different implementations of weighted-ALS matrix factorization
Updated 01/08/2017: added CG solver in reco, adjusted results. As I promised in the last post, I’m going to share a benchmark of different implementations of matrix factorization with Weighted Alternating Least Squares…
emil.tbjerglund.dk
Open Science tools for our research group
I have been considering how to apply this thinking to our research…
www.rdatagen.net
Using simulation for power analysis
Recently, I was helping an investigator plan a stepped wedge cluster randomized trial to study the effects of modifying a physician support system on patient-level diabetes management. While analytic approaches for power calculations do exist in the context of this complex study design, it seemed worth the effort to be explicit about all of the assumptions…
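As a rough illustration of the general idea of estimating power by simulation, here is a minimal Python sketch. It does not reproduce the stepped wedge design described in the post: the two-arm comparison, the z-test, the function name `simulate_power`, and all default values are assumptions chosen only to keep the example short.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(effect, n_per_arm, sd=1.0, alpha=0.05, n_sims=500, seed=1):
    """Estimate power as the share of simulated trials that reject H0."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        # Generate one hypothetical trial under the stated assumptions.
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Standard error of the difference, using the sample variances.
        se = (statistics.variance(control) / n_per_arm
              + statistics.variance(treated) / n_per_arm) ** 0.5
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims
```

Calling `simulate_power(0.5, 40)` returns the fraction of simulated trials in which the assumed effect was detected. The appeal of the approach is that every assumption (effect size, variance, sample size, test) is written out explicitly in the data-generating code.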
giorasimchoni.com
Read My Face
Recently I’ve seen some interesting posts showing how to make ASCII art in R (see here and here). Why limit ourselves to ASCII, I thought. Lincoln’s portrait could be drawn with the Gettysburg Address instead of commas and semicolons. And Trump’s portrait really deserves his tweets…
www.mytinyshinys.com
Useful links for mapping in R
Geography was not my favourite subject as a high-schooler: maybe having a teacher who smoked a pipe in the classroom had something to do with…
www.semidocumentedlife.com
exploring NUFORC sightings
That said, the number of sightings in each state has seen a steady climb since 2000. My guess is that due to prominence of search engines, awareness of NUFORC (and thus, the likelihood of reporting) has increased. You can follow the trend with the boxplots below…
nilsreimer.com
Call for unpublished research for intergroup contact / collective action meta-analysis
Dear Colleagues, Miles Hewstone, Nikhil Sengupta, and I are conducting a meta-analytic review of studies that have examined how intergroup contact affects collective action, perceived discrimination and/or support for reparative policies among members of disadvantaged groups…
gcppodcast.com
Kaggle with Wendy Kan
Kaggle joined the Google family a few months ago, so it’s a great opportunity to know more about the platform and the amazing community behind it…
www.rdatagen.net
simstudy update
Here is the estimated correlation (we would expect an estimate close to 0…
www.gokhanciflikli.com
Mapping ADA Voting Scores 1947-2015
Tracking Legislator Voting Patterns How do US legislators vote once they get elected? Or, perhaps more dynamically, how do they react to external shocks (e.g…
giorasimchoni.com
Auto Emojis
I hate Emojis. I’m sorry, I do. So I decided to make my own. Automatically. Strike a POS! The idea is to take a given piece of text, and replace some words automatically with custom-made emojis, which are basically images. Let’s worry about finding images for our emojis later. Now, suppose you have a text, e.g…
www.mytinyshinys.com
First look at Tidycensus
The whole future of the US census has been coming under scrutiny recently, but, thankfully, we are getting more tools to scrutinise both its decennial data and that of its sister-source, the American Community Survey…
blog.wallaroolabs.com
What's the 'Secret Sauce'?
Hi there! Welcome to the second blog post on our high-performance stream processor Wallaroo. This post assumes that you are familiar with the basics of what Wallaroo is and the features that it provides…
ritsokiguess.site/docs
Cricket: wins by adjusted runs
In cricket, there are two ways to win a one-day game: by runs, if you bat first and score more runs than the other guys, or by wickets, if you bat second and score more runs than the other guys: at the moment where the second team has more runs, the game ends, and the result is given as “won by 6 wickets with 12 balls remaining”, or…
dsnotes.com
Matrix factorization for recommender systems (part 2)
In the previous post I explained the Weighted Alternating Least Squares algorithm for matrix factorization. This post will be more practical: we will build a model which will recommend artists based on the history of track listenings…
cattleguard.github.io
Is Zero-Sum Thinking Affecting Your Risk Decision?
One of the challenges we embrace in my line of work is the attempt to identify risk convergence and opportunities for risk reduction across multiple scenarios. It’s not uncommon for these opportunities to cut across business functions or risk assets…
gcppodcast.com
Public Datasets with Mike Hamberg and Will Curran
Mike works on helping Google teams and partners take raw data from the web and make it look beautiful and usable in BigQuery (and other platforms like Merchant Center)…
www.rdatagen.net
Balancing on multiple factors when the sample is too small to stratify
In this case, we have nine different combinations of the four characteristics, four of which include only a single school (rows 2, 4, 7, and 8). Stratification wouldn’t necessarily work here if our goal was balance across all four characteristics…
giorasimchoni.com
MC RNN
I have been struggling with the understanding of Recurrent Neural Networks (RNN) and Long Short Term Memory (LSTM) for a while. I find that explaining a topic to other people really helps in nailing down just what it is you don’t understand, and eventually “getting it”…
cattleguard.github.io
Tony's Coffee Guide
I drink quite a bit of coffee. It’s true. Occasionally it comes up in conversation. It occurs that someone is bored with their beans and wants to class up their caffeine delivery. Maybe this is you. Well, here are some of my favorites as of late. Check them out. I’ll continue to keep my notes here, but I don’t plan on spamming the RSS with updates…
blog.mgechev.com
WebVR for a Gamified IDE
In the first part of this blog post I discuss the idea of using virtual reality for gamification of manual tasks in the software development process…
jessesadler.com
Thinking about Workflow
In the spring of 2011, I was in the middle of doing research for my dissertation…
gcppodcast.com
Prometheus with Julius Volz
I didn’t put enough log statements in my application, and now things are broken.…
vuorre.netlify.com
Where are all the consciousness scientists?
I first asked if there was, across all the 20 journals in the database, any obvious change in how often the term “consciousness” was mentioned…
www.mytinyshinys.com
Baby Names in the UK and USA
Lost in the realms of time when reshape2 and ggvis were flavour of the month (i…
lenkiefer.com
More on housing affordability
LET US FOLLOW UP ON YESTERDAY’S POST with some more analysis of housing affordability. Per usual, we’ll use R to generate the plots and I’ll share the code below. Measuring affordability First, let’s talk a little bit more about what we are seeing in the plots…
www.rdatagen.net
Copulas and correlated data generation
Here are the results for an auto-regressive (AR-1) correlation structure…
lenkiefer.com
Housing affordability trends
HOW IS YOUR SUMMER GOING? Well okay, it’s not summer yet, but it sure is hot around where I am. Haven’t posted recently, so I’m going to share a couple of visualizations…
www.gokhanciflikli.com
Hello, World!
Hello, and welcome to my new website. I will briefly lay out my MO in this post. The primary reason why I switched from my old academic website in favor of a more functional (modern?) version is one of pure convenience…
giorasimchoni.com
It Gets Better (The yrbss Package)
It’s Pride Month! So I thought, maybe I should perform some cool analyses of data concerning the Gay community…
gcppodcast.com
Cloud Dataflow with Frances Perry
How can I connect all the instances in a Managed Instance Group to CloudSQL securely? Mark is still on vacation - but don’t worry, he’ll be back…
www.juliapilowsky.com
I'm going to Denmark! Here's why.
With my Master’s degree in hand, I’m happy to say that I will be starting a year-long fellowship with the European Doctoral School of Demography (EDSD) in September, at the Max Planck Center for Biodemography in…
alison.rbind.io
Up and running with blogdown
Before you start, I recommend reading the following: Finally, I did not want to learn more about a lot of things! For instance, the nitty gritty of static site generators and how domain names work…
giorasimchoni.com
Deep South Springfield
Recently RStudio (a.k.a my dream job) released a wrapper around keras with TensorFlow backend. Well, I just had to take this baby for a spin. But what to train my first Deep Learning network in R on? I’m neither a Cats nor a Dogs person. Whatever, I do what I want! South Park vs. Simpsons! ATTENTION: THIS IS NOT A DEEP LEARNING LESSON…
ritsokiguess.site/docs
Heritage walk in Kensington Market
This morning’s Heritage Walk was in Kensington Market. I grabbed a few photos. This is the Church of St Stephen-in-the-Fields, on College between Spadina and Bathurst: Though it is now thoroughly of downtown, when it was built (1857), it was literally in the fields…
ritsokiguess.site/docs
Monster Chiller Horror Theatre!
I saw this in the elevator last week: and it immediately made me think of Count Floyd in…
www.rdatagen.net
When marginal and conditional logistic model estimates diverge
My aim is to show this through a couple of data simulations that allow us to see this visually…
ritsokiguess.site/docs
Histograms and bins
Most software, when you ask it to draw you a histogram, will choose a number of intervals (“bins”) for you…
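One common default for that choice is Sturges' rule; base R's `hist()`, for instance, defaults to `breaks = "Sturges"` and then rounds the breakpoints to pretty values. The Python sketch below only computes the suggested bin count; the function name `sturges_bins` is made up for illustration, and the post itself may go on to discuss other rules.

```python
import math


def sturges_bins(n):
    """Number of bins suggested by Sturges' rule for n observations."""
    if n < 1:
        raise ValueError("need at least one observation")
    # Sturges' rule: ceil(log2(n)) + 1 bins.
    return math.ceil(math.log2(n)) + 1
```

For 100 observations this suggests 8 bins. Because the count grows only logarithmically with the sample size, large data sets can end up with fewer bins than their detail would support, which is one reason to override the default.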
ritsokiguess.site/docs
The Designated Hitter
Back in 1973, when the American League introduced the Designated Hitter rule, they were worried (among other things) about their league having fewer runs per game than the rival National League…
ndres.me
Converting a Caffe model to TensorFlow
The Caffe Model Zoo is an extraordinary place where researchers share their models. Caffe is an awesome framework, but you might want to use TensorFlow instead. In this blog post, I’ll show you how to convert the Places 365 model to TensorFlow…
gcppodcast.com
Spinnaker with Steven Kim and Christopher Sanson
Spinnaker is an open-source multi-cloud continuous delivery platform used in production at companies like Netflix, Waze, Target, and Cloudera, plus a new open-source command line interface (CLI) tool called halyard that makes it easy to deploy Spinnaker itself. Steven Kim is an engineering manager at Google based in New York City, focused on build and delivery…
jessesadler.com
By Way of Introduction
Concerning the actual content of this blog, I envision the posts falling into two general categories. In the first place, the blog will be a space for me to discuss the various projects that I am working on, both traditional history projects and those in digital humanities…
lenkiefer.com
Housing supply, population, and house prices
I MADE A LITTLE TABLEAU VISUALIZATION TO ANALYZE TRENDS in population, housing supply and house prices. If you like interactive dataviz, then the best thing might be to jump down below and explore. But I’ll frame the viz with a bit of discussion…
tojyouso.github.io
Monthly Report: May 2017
This is my review of May 2017. I hope to make this a regular occurrence and I want to publish something at the end of each month. I’ve been putting this off for quite some time because it’s not where I want it to be but I’m just going ahead and doing it. You can call this the MVP…
giorasimchoni.com
The One With Friends
I’ve recently stumbled upon this really cool text analysis of Seinfeld scripts, by Michael…
livefreeordichotomize.com
runconf17, an analysis of emoji use
I had such a delightful time at rOpenSci’s unconference last week. 21 📦 were produced! Not only was it extremely productive, but in between the crazy productivity was some epic community building…
vuorre.netlify.com
Quantitative literature review with R
We’ll be working with R, so if you want to follow along on your own computer, fire up R and load up the required R packages: As before, we limit the investigation to Psychonomic Society journals: Let’s begin by looking at the articles’…
ritsokiguess.site/docs
Carter and Guthrie
Carter and Guthrie, in 2004, proposed a method of modelling cricket matches…
ritsokiguess.site/docs
Comments
I seem to have Disqus comments enabled now. The crucial thing appeared to be the disqus.html file written by Yihui Xie. I changed the disqus shortname given there to mine, added my shortname to the disqusShortname in config.toml, and it seems to work…
gcppodcast.com
Container Builder with Christopher Sanson and David Bendory
David Bendory is the Tech Lead for Google Cloud Container Builder. He joined Google on the Container Builder team in April 2015 after more than 20 years in software engineering on Wall Street…
ritsokiguess.site/docs
Odd sums
While waiting for my coffee at work this morning, I was leafing through a recreational mathematics journal. It said, “the numbers 1–9 are arranged at random in a 3 by 3 matrix…
r-tastic.co.uk
Animated Plots As Part Of Exploratory Data Analysis
Next, I only need to append identified files… … and we can now start! Let’s have a look at crime types and their frequencies: And a quick peek into sample sizes…
cattleguard.github.io
If You’re Going to Fail To Scale, Don’t - Part I
Businesses that don’t deliver, don’t survive. Why should your information security program? The organization has decided to spin up an information security program and you’re in charge. How sure are you that you can handle what you’ve built? If your customers are placing orders and you don’t have inventory you’ve got two options…
ewen.io
Tracking London's Pub & Bar Landscape with geofacet
Toying around with geofacet, a ggplot2 extension for geographic small…
www.mytinyshinys.com
Theme Update
You will have noticed a new look to the site, now based on the Hugo Icarus theme. The major reason for introducing this is that the previous theme I used was unable to render certain htmlwidgets I wanted to use to illustrate my…
lenkiefer.com
Housing supply, population, and house prices
'getSymbols' currently uses auto.assign=TRUE by default, but will
use auto.assign=FALSE in 0.5-0. You will still be able to use
'loadSymbols' to automatically load data. getOption("getSymbols.env")
and getOption("getSymbols.auto…
dsnotes.com
Matrix factorization for recommender systems
Generally speaking, the task for a recommender system is not to up-sell. The real task is to keep customers engaged in your service. With loyal customers, you can monetize your service. Recommender systems are a very wide area, but in this post I won’t go into basics…
giorasimchoni.com
The Sounds of Probability
I’ve always wanted to play with Sonification: “… the use of non-speech audio to convey information or perceptualize data.” (Wikipedia, the source of all knowledge) Don’t get me wrong, I am as thrilled with the sight of a nice visualization as the next (data geek) guy…
www.ifconfig.it/hugo
Ansible and IOS quick start
Ansible has been around for a while but I hadn’t had a chance to play with it so far…
lenkiefer.com
Housing market recap
QUITE A LOT OF HOUSING DATA CAME OUT THIS WEEK. Let’s recap with some graphs. Mortgage rates back below 4 percent The 30-year fixed rate mortgage fell back below 4 percent this week. New home sales New home sales data was released and came in weaker than expected for April 2017…
gcppodcast.com
Firebase at I/O 2017 with James Tamplin and Andrew Lee
How do I give one of my Google Cloud Platform Projects to another person? Mark is going on vacation for a few weeks - but don’t worry, he’ll still be on the…
lenkiefer.com
Index starting points and dataviz
SO WE HAVE BEEN PLOTTING A LOT OF INDEX VALUES LATELY. It’s been great. But you have questions. Great questions. I got an interesting response to my house price dot chart over Twitter regarding the house price index we were plotting. User @chrisschnabel
www.mytinyshinys.com
Integrating dplyr with Remote databases
A recent RViews article covers the use of the dplyr package to interact with SQL databases. All the code can be written in R, which dplyr then translates into SQL queries to harness the power of a database. You will probably want to read the article if interested in extending the process to your own data but here is a taster from some of…
ritsokiguess.site/docs
Add-in
I just discovered a couple of things: an R Studio add-in called CRANsearcher that, when you run it, prompts you for search terms and searches the whole of CRAN for anything that matches those search terms. (Thanks to @juliasilge on Twitter for this…