livefreeordichotomize.com

Twitter trees

A little over a week ago, Hilary Parker tweeted out a poll about sending calendar invites that generated quite the repartee. Do you like getting Google calendar invites from your friends for lunches / coffees / etc


www.mytinyshinys.com

Automated congratulatory tweet to Twitter Friends

On the front page of my premiersoccerstats site, I have a Player Milestones table which highlights players who have reached certain levels in the Premier League’s latest round of games e.g


jvera.netlify.com

Processing mail using R

After a long time looking for R packages to connect to a remote mailbox (not Gmail), I’ve had to admit that there’s no such feature right now in R. I tested a pair of Python scripts, but they were too convoluted for my needs


cevo.com.au

Test driven infrastructure with Kitchen and InSpec

For a long time infrastructure was the sort of thing you pulled out of a box, plugged in and then set about configuring and testing. The cycle between needing new equipment and having it ready was measured in weeks, if not months


ritsokiguess.site/docs

How To Measure The Height of a Tree

Introduction: In a previous post, I was trying to estimate the volume of wood in a tree from its diameter, and I noted that it would be an advantage to know the height of the tree: for example, we could pretend the tree was cone-shaped, or use a power-law-type relationship in which we estimate the best powers of diameter and height to use to estimate
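The two options named in this excerpt can be written out as formulas. This is only a sketch, assuming a diameter d measured at the base and a height h; the constants a, b and c are placeholder symbols introduced here, not values from the post.

```latex
V_{\text{cone}} = \frac{1}{3}\,\pi \left(\frac{d}{2}\right)^{2} h = \frac{\pi}{12}\, d^{2} h,
\qquad
V_{\text{power}} = a\, d^{b} h^{c}
\;\Longleftrightarrow\;
\log V = \log a + b \log d + c \log h .
```

The cone is the special case a = π/12, b = 2, c = 1. On the log scale the power law is linear in log d and log h, so the powers could in principle be estimated by ordinary regression.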


tojyouso.github.io

When is the best time to play your wildcard?

I scraped the website at the end of last season to get details on when the top players used their chips. I wanted to see if there was a clear pattern and see if there was a strategy I could learn


blog.brianz.bz

Structuring Serverless Applications with Python

In spite of my intentions to get more involved in Elixir I’ve been stuck in the Python tractor beam. For all of the issues that may arise in large Python web applications, Python really is a fantastic do-it-all language


ritsokiguess.site/docs

Summarizing several models using broom and purrr

Introduction: broom is supposed to be a powerful way to summarize several models at once, and so it is. The trouble is, the examples show how to fit the same model to different subsets of a data set. I had something different in mind: I had one data set, and three different models on that same data


jvera.netlify.com

file.choose: empowering useRs

Sometimes when sharing your analysis, via Rmarkdown or the brand new NoteBook, the data file is located on the user’s computer, which makes the default path from your own PC unusable


jvera.netlify.com

3D chart using rgl library

Iris is one of the most used data sets in R. We’ve seen it in many formats, and it is broadly used for data manipulation. You could say that there’s nothing new to learn if someone uses Iris. I was wondering if there’s something new, something never done to it before


gcppodcast.com

Broad Institute and Platinum Customers with Lukas Karlsson and Mike Altarace

Mike has been a Strategic Customer Engineer (SCE, pronounced Ski) assigned to the Broad Institute for over a year. He’s been working with Broad on all manner of operating their GCP environment


mlr-blog.netlify.com

Parameter tuning with mlrHyperopt

Hyperparameter tuning with mlr is rich in options, as there are multiple tuning methods: Simple Random Search, Grid Search, Iterated F-Racing (via irace), and Sequential Model-Based Optimization (via mlrMBO). Also, the search space is easily definable and customizable for each of the 60+ learners of mlr using the ParamSets from the ParamHelpers
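To make the difference between the first two methods concrete, here is a minimal sketch in plain Python. It does not use mlr or mlrHyperopt; the search space, the parameter names `cost` and `gamma`, and the `toy_score` function are all hypothetical stand-ins for a real learner and a cross-validated performance measure.

```python
import itertools
import random

# Hypothetical search space (illustrative names and values, not from mlr).
search_space = {
    "cost": [0.1, 1.0, 10.0],
    "gamma": [0.01, 0.1, 1.0],
}


def grid_search(space):
    """Enumerate every combination of the listed values."""
    names = sorted(space)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(space[n] for n in names))
    ]


def random_search(space, n_iter, seed=0):
    """Draw n_iter configurations uniformly at random from the listed values."""
    rng = random.Random(seed)
    names = sorted(space)
    return [{n: rng.choice(space[n]) for n in names} for _ in range(n_iter)]


def toy_score(config):
    """Stand-in for a cross-validated score (higher is better)."""
    return -abs(config["cost"] - 1.0) - abs(config["gamma"] - 0.1)


grid_candidates = grid_search(search_space)            # all 3 x 3 combinations
random_candidates = random_search(search_space, n_iter=4)  # a fixed budget of 4 draws
best_grid = max(grid_candidates, key=toy_score)
best_random = max(random_candidates, key=toy_score)
```

Grid search costs one evaluation per combination, so it grows multiplicatively with each added parameter, while random search spends a fixed budget regardless of how many parameters the space has.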


www.mytinyshinys.com

User2017- padr package example

Of course, it is not the same as actually being there, but as a good fall-back the videos of the talks for the R User 2017 conference are now available on channel 9. I’ll be dipping into them over the next few weeks and reporting on any I find of interest. Let’s kick-off with the padr package from Edwin Thoen


vuorre.netlify.com

Correlated Psychological Variables, Uncertainty, and Bayesian Estimation

Assessing the correlations between psychological variables, such as abilities and improvements, is one essential goal of psychological science


livefreeordichotomize.com

The making of 'We R-Ladies'

Last March Maëlle wrote a blog post “Faces of #rstats Twitter”, a great tutorial on scraping twitter photos and compiling them in a collage


jvera.netlify.com

managing installation and packages in R

I mentioned the useful library pacman yesterday, so a brief comment about it is due. But I’m going to recommend installR for managing updates first (packages and R itself)


jvera.netlify.com

Easy Rstudio add-in management with addinslist package

When I started using RStudio (some time ago), I wondered where the RStudio Addins were located. There’s a menu option, but it was empty on my machine. Seeking an easy way to install some addins, I found “addinslist” install


livefreeordichotomize.com

Happy World Emoji Day

HAPPY world emoji day! 🌎 🐔 📆 In honor of this momentous occasion, I have decided to analyze the emojis used on rOpenSci’s Slack. library("dplyr") If you’d like to follow along, go fetch yourself a Slack token. token <- "MY_SLACK_API_TOKEN" ## stick your token here We will first use Slack’s reactions


www.mytinyshinys.com

Weather plots for any US location

There are issues with packages in this post. Here are author comments. weatherData: “All, yes looks like WU is no longer making it easy to get CSV files without API’s. If anyone figures out a URL for directly fetching CSV’s, I will modify the package


jvera.netlify.com

fourfoldplot

Working with R, it’s highly likely you’ll end up with a table of dichotomous variables in your datasets, no matter the specific project you’re involved in. I like the confusionMatrix function from the caret package, which calculates a cross-tabulation of observed and predicted classes


giorasimchoni.com

Playmate of the Month - From Marylin To Ashley

I like working with weird and unexpected datasets. And when they don’t come to me - I go get them myself


cattleguard.github.io

If You’re Going to Fail To Scale, Don’t - Part II

People hate to wait. Now, if you’re not familiar with ramp metering here’s the gist. A stoplight is placed at the end of an on ramp which regulates how many cars are allowed onto a highway at a given time. The idea being that the number of required slowdowns and wrecks decreases as cars have appropriate distance. Waiting sucks


jvera.netlify.com

Some essential R packages

For me, there’s a bunch of packages I consider “essential” because, sooner or later, I use them in any project that involves opening RStudio, regardless of the type of issue that I’m trying to


ritsokiguess.site/docs

Summarizing columns in the tidyverse

Introduction: I thought summarizing columns in the tidyverse was kind of clunky, at least until a couple of days ago. Let’s read in some data to illustrate what I thought I had to do


ritsokiguess.site/docs

Tufte-esque

Playing with a new look, thanks to this. One of the main reasons I’m trying this is the possibility of making side comments on the side (look right). Ho ho ho. This is a new thought apparently


livefreeordichotomize.com

Introducing the tuftesque blogdown theme

This post will serve as a quick tutorial getting you from nothing to a customized blogdown blog using the theme built for this blog: tuftesque


vuorre.netlify.com

Visualizing varying effects' posteriors with joyplots

However, to make the figure more Unknown Pleasures-y, you’ll need to modify the theme a little bit: Well, there you go


jvera.netlify.com

First thing first: Thanks!

First things first. It is a question of etiquette, when starting a blog like this, mainly focused on Data Science with R, to acknowledge all the people and teams that made it possible for me to write this today. People from CRAN, RStudio and the R Consortium, for pushing forward the best language in the world for data analytics


www.onceupondata.com

Highlights from UseR! 2017

In the first week of July, the 14th UseR! conference took place in Brussels, and it was the biggest UseR! yet. For me, it was the first UseR! and I believe it was a good opportunity to get exposed to different approaches in the data world, see different applications, learn about new packages and meet people in the R community, all in one place


gcppodcast.com

Istio with Varun Talwar and Sven Mawson

If I want to apply Istio to an existing Kubernetes application, how do I do


livefreeordichotomize.com

useR!2017 digressions

We both recently attended useR!2017 in Brussels. It was a blast to say the least. We’re going to tag team to cover our favorite things & the lessons we learned while adventuring across the Atlantic. Location Lucy: Brussels was incredible


purrple.cat/blog

Emojis at #useR2017

I am first there, but that’s not fair because at some point while developing the app I tweeted the list of all the emojis then used so far


www.mytinyshinys.com

Mapping Eurostat information Part 1

Keeping up with the theme of utilizing official government open data to map via an R package I will now turn to the eurostat package which accesses data - via an API - from the European Commission


dsnotes.com

Benchmarking different implementations of weighted-ALS matrix factorization

Updated 01/08/2017: added CG solver in reco, adjusted results. As I promised in the last post, I’m going to share a benchmark of different implementations of matrix factorization with Weighted Alternating Least Squares


emil.tbjerglund.dk

Open Science tools for our research group

I have been considering how to apply this thinking to our research


www.rdatagen.net

Using simulation for power analysis

Recently, I was helping an investigator plan a stepped wedge cluster randomized trial to study the effects of modifying a physician support system on patient-level diabetes management. While analytic approaches for power calculations do exist in the context of this complex study design, it seemed worth the effort to be explicit about all of the assumptions


giorasimchoni.com

Read My Face

Recently I’ve seen some interesting posts showing how to make ASCII art in R (see here and here). Why limit ourselves to ASCII, I thought. Lincoln’s portrait could be drawn with the Gettysburg Address instead of commas and semicolons. And Trump’s portrait really deserves his tweets


www.mytinyshinys.com

Useful links for mapping in R

Geography was not my favourite subject as a high-schooler: maybe having a teacher who smoked a pipe in the classroom had something to do with


www.semidocumentedlife.com

exploring NUFORC sightings

That said, the number of sightings in each state has seen a steady climb since 2000. My guess is that due to prominence of search engines, awareness of NUFORC (and thus, the likelihood of reporting) has increased. You can follow the trend with the boxplots below


nilsreimer.com

Call for unpublished research for intergroup contact / collective action meta-analysis

Dear Colleagues, Miles Hewstone, Nikhil Sengupta, and I are conducting a meta-analytic review of studies that have examined how intergroup contact affects collective action, perceived discrimination and/or support for reparative policies among members of disadvantaged groups


gcppodcast.com

Kaggle with Wendy Kan

Kaggle joined the Google family a few months ago, so it’s a great opportunity to know more about the platform and the amazing community behind it


www.rdatagen.net

simstudy update

Here is the estimated correlation (we would expect an estimate close to 0


www.gokhanciflikli.com

Mapping ADA Voting Scores 1947-2015

Tracking Legislator Voting Patterns: How do US legislators vote once they get elected? Or, perhaps more dynamically, how do they react to external shocks (e.g


giorasimchoni.com

Auto Emojis

I hate Emojis. I’m sorry, I do. So I decided to make my own. Automatically. Strike a POS! The idea is to take a given piece of text, and replace some words automatically with custom-made emojis, which are basically images. Let’s worry about finding images for our emojis later. Now, suppose you have a text, e.g


www.mytinyshinys.com

First look at Tidycensus

The whole future of the US census has been coming under scrutiny recently, but, thankfully, we are getting more tools to scrutinise both its decennial data and that of its sister-source, the American Community Survey


blog.wallaroolabs.com

What's the 'Secret Sauce'?

Hi there! Welcome to the second blog post on our high-performance stream processor Wallaroo. This post assumes that you are familiar with the basics of what Wallaroo is and the features that it provides


ritsokiguess.site/docs

Cricket: wins by adjusted runs

In cricket, there are two ways to win a one-day game: by runs, if you bat first and score more runs than the other guys, or by wickets, if you bat second and score more runs than the other guys: at the moment where the second team has more runs, the game ends, and the result is given as “won by 6 wickets with 12 balls remaining”, or


dsnotes.com

Matrix factorization for recommender systems (part 2)

In the previous post I explained the Weighted Alternating Least Squares algorithm for matrix factorization. This post will be more practical - we will build a model which will recommend artists based on track listening history


cattleguard.github.io

Is Zero-Sum Thinking Affecting Your Risk Decision?

One of the challenges we embrace in my line of work is the attempt to identify risk convergence and opportunities for risk reduction across multiple scenarios. It’s not uncommon for these opportunities to cut across business functions or risk assets


gcppodcast.com

Public Datasets with Mike Hamberg and Will Curran

Mike works on helping Google teams and partners take raw data from the web and make it look beautiful and usable in BigQuery (and other platforms like Merchant Center)


www.rdatagen.net

Balancing on multiple factors when the sample is too small to stratify

In this case, we have nine different combinations of the four characteristics, four of which include only a single school (rows 2, 4, 7, and 8). Stratification wouldn’t necessarily work here if our goal was balance across all four characteristics


ewen.io

FPL Mythbusting with fplr

Dissecting fantasy football falsehoods with my first R


giorasimchoni.com

MC RNN

I have been struggling to understand Recurrent Neural Networks (RNN) and Long Short Term Memory (LSTM) for a while. I find that explaining a topic to other people really helps in nailing down just what it is you don’t understand, and eventually “getting it”


cattleguard.github.io

Tony's Coffee Guide

I drink quite a bit of coffee. It’s true. Occasionally it comes up in conversation. It occurs that someone is bored with their beans and wants to class up their caffeine delivery. Maybe this is you. Well, here are some of my favorites as of late. Check them out. I’ll continue to keep my notes here, but I don’t plan on spamming the RSS with updates


blog.mgechev.com

WebVR for a Gamified IDE

In the first part of this blog post I discuss the idea of using virtual reality for gamification of manual tasks in the software development process


jessesadler.com

Thinking about Workflow

In the spring of 2011, I was in the middle of doing research for my dissertation


gcppodcast.com

Prometheus with Julius Volz

I didn’t put enough log statements in my application, and now things are broken.


vuorre.netlify.com

Where are all the consciousness scientists?

I first asked if there was, across all the 20 journals in the database, any obvious change in how often the term “consciousness” was mentioned


www.mytinyshinys.com

Baby Names in the UK and USA

Lost in the realms of time when reshape2 and ggvis were flavour of the month (i


lenkiefer.com

More on housing affordability

LET US FOLLOW UP ON YESTERDAY’S POST with some more analysis of housing affordability. Per usual, we’ll use R to generate the plots and I’ll share the code below. Measuring affordability: First, let’s talk a little bit more about what we are seeing in the plots


www.rdatagen.net

Copulas and correlated data generation

Here are the results for an auto-regressive (AR-1) correlation structure


lenkiefer.com

Housing affordability trends

HOW IS YOUR SUMMER GOING? Well okay, it’s not summer yet, but it sure is hot around where I am. Haven’t posted recently, so I’m going to share a couple of visualizations


www.gokhanciflikli.com

Hello, World!

Introduction: Hello, and welcome to my new website. I will briefly lay out my MO in this post. The primary reason why I switched from my old academic website in favor of a more functional (modern?) version is one of pure convenience


giorasimchoni.com

It Gets Better (The yrbss Package)

It’s Pride Month! So I thought, maybe I should perform some cool analyses of data concerning the Gay community


www.riinu.me

Handling your .bib file (LaTeX bibliography)

To create a


gcppodcast.com

Cloud Dataflow with Frances Perry

How can I connect all the instances in a Managed Instance Group to CloudSQL securely? Mark is still on vacation - but don’t worry, he’ll be back


www.juliapilowsky.com

I'm going to Denmark! Here's why.

With my Master’s degree in hand, I’m happy to say that I will be starting a year-long fellowship with the European Doctoral School of Demography (EDSD) in September, at the Max Planck Center for Biodemography in


www.jakekaupp.com

CEEA Reflections

Yes, I get it. I don’t post often enough


alison.rbind.io

Up and running with blogdown

Before you start, I recommend reading the following: Finally, I did not want to learn more about a lot of things! For instance, the nitty gritty of static site generators and how domain names work


giorasimchoni.com

Deep South Springfield

Recently RStudio (a.k.a my dream job) released a wrapper around keras with TensorFlow backend. Well, I just had to take this baby for a spin. But what to train my first Deep Learning network in R on? I’m neither a Cats nor a Dogs person. Whatever, I do what I want! South Park vs. Simpsons! ATTENTION: THIS IS NOT A DEEP LEARNING LESSON


ritsokiguess.site/docs

Heritage walk in Kensington Market

This morning’s Heritage Walk was in Kensington Market. I grabbed a few photos. This is the Church of St Stephen-in-the-Fields, on College between Spadina and Bathurst: Though it is now thoroughly of downtown, when it was built (1857), it was literally in the fields


ritsokiguess.site/docs

Monster Chiller Horror Theatre!

I saw this in the elevator last week: and it immediately made me think of Count Floyd in


www.rdatagen.net

When marginal and conditional logistic model estimates diverge

My aim is to show this through a couple of data simulations that allow us to see this visually


ritsokiguess.site/docs

Histograms and bins

Most software, when you ask it to draw you a histogram, will choose a number of intervals (“bins”) for you


ritsokiguess.site/docs

The Designated Hitter

Back in 1973, when the American League introduced the Designated Hitter rule, they were worried (among other things) about their league having fewer runs per game than the rival National League


ndres.me

Converting a Caffe model to TensorFlow

The Caffe Model Zoo is an extraordinary place where researchers share their models. Caffe is an awesome framework, but you might want to use TensorFlow instead. In this blog post, I’ll show you how to convert the Places 365 model to TensorFlow


gcppodcast.com

Spinnaker with Steven Kim and Christopher Sanson

Spinnaker is an open-source multi-cloud continuous delivery platform used in production at companies like Netflix, Waze, Target, and Cloudera, plus a new open-source command line interface (CLI) tool called halyard that makes it easy to deploy Spinnaker itself. Steven Kim is an engineering manager at Google based in New York City, focused on build and delivery


jessesadler.com

By Way of Introduction

Concerning the actual content of this blog, I envision the posts falling into two general categories. In the first place, the blog will be a space for me to discuss the various projects that I am working on, both traditional history projects and those in digital humanities


lenkiefer.com

Housing supply, population, and house prices

I MADE A LITTLE TABLEAU VISUALIZATION TO ANALYZE TRENDS in population, housing supply and house prices. If you like interactive dataviz, then the best thing might be to jump down below and explore. But I’ll frame the viz with a bit of discussion


tojyouso.github.io

Monthly Report: May 2017

This is my review of May 2017. I hope to make this a regular occurrence and I want to publish something at the end of each month. I’ve been putting this off for quite some time because it’s not where I want it to be but I’m just going ahead and doing it. You can call this the MVP


giorasimchoni.com

The One With Friends

I’ve recently stumbled upon this really cool text analysis of Seinfeld scripts, by Michael


livefreeordichotomize.com

runconf17, an analysis of emoji use

I had such a delightful time at rOpenSci’s unconference last week. 21 📦 were produced! Not only was it extremely productive, but in between the crazy productivity was some epic community building


vuorre.netlify.com

Quantitative literature review with R

We’ll be working with R, so if you want to follow along on your own computer, fire up R and load up the required R packages: As before, we limit the investigation to Psychonomic Society journals: Let’s begin by looking at the articles’


ritsokiguess.site/docs

Carter and Guthrie

Introduction: Carter and Guthrie, in 2004, proposed a method of modelling cricket matches


ritsokiguess.site/docs

Comments

I seem to have Disqus comments enabled now. The crucial thing appeared to be the disqus.html file written by Yihui Xie. I changed the disqus shortname given there to mine, added my shortname to the disqusShortname in config.toml, and it seems to work


gcppodcast.com

Container Builder with Christopher Sanson and David Bendory

David Bendory is the Tech Lead for Google Cloud Container Builder. He joined Google on the Container Builder team in April 2015 after more than 20 years in software engineering on Wall Street


ritsokiguess.site/docs

Odd sums

While waiting for my coffee at work this morning, I was leafing through a recreational mathematics journal. It said, “the numbers 1–9 are arranged at random in a 3 by 3 matrix


r-tastic.co.uk

Animated Plots As Part Of Exploratory Data Analysis

Next, I only need to append identified files… … and we can now start! Let’s have a look at crime types and their frequencies: And a quick peek into sample sizes..


cattleguard.github.io

If You’re Going to Fail To Scale, Don’t - Part I

Businesses that don’t deliver, don’t survive. Why should your information security program? The organization has decided to spin up an information security program and you’re in charge. How sure are you that you can handle what you’ve built? If your customers are placing orders and you don’t have inventory you’ve got two options


ewen.io

Tracking London's Pub & Bar Landscape with geofacet

Toying around with geofacet, a ggplot2 extension for geographic small


www.mytinyshinys.com

Theme Update

You will have noticed a new look to the site, now based on the Hugo Icarus theme. The major reason for introducing this is that the previous theme I used was unable to render certain htmlwidgets I wanted to use to illustrate my


lenkiefer.com

Housing supply, population, and house prices

'getSymbols' currently uses auto.assign=TRUE by default, but will use auto.assign=FALSE in 0.5-0. You will still be able to use 'loadSymbols' to automatically load data. getOption("getSymbols.env")


dsnotes.com

Matrix factorization for recommender systems

Generally speaking the task for a recommender system is not to make up-sale. The real task is to keep customers engaged in your service. With loyal customers, you can monetize your service. Recommender systems is a very wide area, but in this post I won’t go into basics


giorasimchoni.com

The Sounds of Probability

I’ve always wanted to play with Sonification: “… the use of non-speech audio to convey information or perceptualize data.” (Wikipedia, the source of all knowledge) Don’t get me wrong, I am as thrilled with the sight of a nice visualization as the next (data geek) guy


www.ifconfig.it/hugo

Ansible and IOS quick start

Ansible has been around for a while but I haven’t had a chance to play with it so far


lenkiefer.com

Housing market recap

QUITE A LOT OF HOUSING DATA CAME OUT THIS WEEK. Let’s recap with some graphs. Mortgage rates back below 4 percent: The 30-year fixed rate mortgage fell back below 4 percent this week. New home sales: New home sales data was released and came in weaker than expected for April 2017


gcppodcast.com

Firebase at I/O 2017 with James Tamplin and Andrew Lee

How do I give one of my Google Cloud Platform Projects to another person? Mark is going on vacation for a few weeks - but don’t worry, he’ll still be on the


lenkiefer.com

Index starting points and dataviz

SO WE HAVE BEEN PLOTTING A LOT OF INDEX VALUES LATELY. It’s been great. But you have questions. Great questions. I got an interesting response to my house price dot chart over Twitter regarding the house price index we were plotting. User @chrisschnabel


www.mytinyshinys.com

Integrating dplyr with Remote databases

A recent RViews article covers the use of the dplyr package to interact with SQL databases. All the code can be written in R, which dplyr then translates into SQL queries to harness the power of a database. You will probably want to read the article if interested in extending the process to your own data, but here is a taster from some of


ritsokiguess.site/docs

Add-in

I just discovered a couple of things: an R Studio add-in called CRANsearcher that, when you run it, prompts you for search terms and searches the whole of CRAN for anything that matches those search terms. (Thanks to @juliasilge on Twitter for this