giorasimchoni.com
Billboard Bananas
I was looking for a “mellow” project to make, when I came across this post by Michael Kling, playing in Python with the Billboard Hot 100 charts since 1940, scraped from the Ultimate Music Database site. Now that is something I need to do in R…
lenkiefer.com
Facet my geo!
TIME TO TRY OUT ANOTHER HOUSE PRICE VISUALIZATION. In this post we’ll try out a new way to visualize recent house price trends with R. Just this weekend I saw a new package, geofacet, for organizing ggplot2 facets along a geographic grid…
mdgbeck.netlify.com
Tidytext Analysis of Seinfeld
I then wrote a function that takes the URL for an episode and pulls the necessary data. Unfortunately, as the scripts were submitted to the site by different fans, there is no standard format, making the scraping a little trickier…
ritsokiguess.site/docs
Victoria Day bike ride
I suppose I should have ridden Victoria Park on Victoria Day, but I didn’t: When the wind is blowing from the west, don’t ride out to the east! My new phone now has Human on it, which works better than on the tablet (so I uninstalled it from there)…
wirtel.be
Award At PyCon US 2017
Introduction: In June 2016, I received an email from the Board of the Python Software Foundation informing me that I had received a Community Service Award. Seriously, I was glad to receive this award. I don’t know how to describe this moment. After that, the PSF wrote a blog post about me…
lenkiefer.com
Consumer prices, household debt
LET’S TAKE A LOOK AT RECENT TRENDS IN CONSUMER PRICES AND HOUSEHOLD DEBT. Along the way we’ll refresh some visualizations of consumer prices (see here) and household debt (see here) we made last year, as well as think up some new ones…
cattleguard.github.io
Get Sankey! Sankey diagrams for infosec
Yesterday, a tweet caught my eye. It was something I knew I’d seen before, but what it was called and how it was constructed had somehow escaped my memory. It bugged me enough that I had to track it down. As I am currently neck-deep in writing an annual risk report, it’d be easy for me to agree with the thought…
ritsokiguess.site/docs
Welch analysis of variance
Introduction: The standard analysis of variance based on the F-test has two major assumptions: normally distributed data and equal variance within each group…
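Welch’s analysis of variance drops the equal-variance assumption by weighting each group by n_i / s_i². As a rough illustration (sketched in Python, whereas the post itself works in R; the function name `welch_anova` is hypothetical and this is not the post’s code), the statistic can be computed from the textbook formula like this:

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Welch's one-way ANOVA (hypothetical helper): returns (F, df1, df2).

    Each group is weighted by n_i / s_i^2, so the groups are not
    assumed to share a common variance.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # statistics.variance uses the n - 1 (sample) denominator
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)
    # Precision-weighted grand mean
    grand = sum(w_i * m_i for w_i, m_i in zip(w, m)) / w_sum
    # Weighted between-group term
    a = sum(w_i * (m_i - grand) ** 2 for w_i, m_i in zip(w, m)) / (k - 1)
    # Correction term shared by the denominator and df2
    lam = sum((1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n))
    b = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    f_stat = a / b
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2
```

For example, `welch_anova([1, 2, 3], [4, 5, 6, 7], [10, 12, 11, 13])` returns the statistic and its two degrees of freedom; a p-value would then come from the F distribution with (df1, df2) degrees of freedom, e.g. `scipy.stats.f.sf(f, df1, df2)` if SciPy is available. With only two groups, F reduces to the square of Welch’s t statistic and df2 to the Welch–Satterthwaite degrees of freedom, which makes a handy sanity check.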
lenkiefer.com
State house price growth trends
TIME FOR A NEW HOUSE PRICE VISUALIZATION. In this post I’ll try out a new way to visualize recent house price trends with R. Data: We’re going to be visualizing the Freddie Mac House Price Index. We talked about these data earlier this month; see this post for some other visualizations…
gcppodcast.com
Basecamp Networks with Craig Ganssle
Craig Ganssle is the Founder and CEO of Basecamp Networks…
timtrice.net
RStudio in Docker
When developing some of my applications I wanted the ability to test these applications on different versions of R. But setting this up on Linux proved to be more difficult than on my Windows machine. Docker makes it significantly easier…
cevo.com.au
Evolving team leadership
There is much written about the changing roles of Development and Operations staff when organisations undergo agile/devops transformations…
www.rdatagen.net
It can be easy to explore data generating mechanisms with the simstudy package
I learned statistics and probability by simulating data. Sure, I did the occasional proof, but I never believed the results until I saw it in a simulation. I guess I have it backwards, but that’s just the way I am…
ritsokiguess.site/docs
Making scatterplots against multiple explanatory variables
Introduction An R post here…
mlr-blog.netlify.com
shinyMlr
shinyMlr is a web application, built with the R-package “shiny” that provides a user interface for mlr…
cattleguard.github.io
Just a Test Post From Blogdown
So, Yihui Xie gifted the RStats world with another presentation surface…
blog.davisvaughan.com
RStudio and Shiny Servers with AWS - Part 1
After realizing how fast I can burn through my free 25 hours on shinyapps.io, I decided to repurpose my RStudio Server to also work with Shiny Server…
giorasimchoni.com
Data Porn!
Recently I stumbled upon this treasure of data, called sexualitics.org. I love exploring bizarre datasets and it doesn’t get more bizarre than (brace yourselves) ~786 thousand (!) tagged porn video titles from xhamster.com, the site’s entire inventory from 2007 to 2013…
ritsokiguess.site/docs
Sunday walk - Riverdale park(s)
Today’s walk comes in two parts, with a pause for lunch between. Part 1: From Broadview station, across past the Pizza Pizza, and (carefully) across the entrance ramp to the DVP, and then down the “secret” path: crossing the entrance ramp on a rather dilapidated-looking bridge: I am never quite sure about the next bit…
mdgbeck.netlify.com
Comparison of NBA Draft Classes' Immediate Impact
Curious to know the answer, I looked at each draft class since 1989 (when the draft was changed to two rounds) and their performance in that year’s season. To be clear, this is only looking at immediate contribution, and not long-term success…
dscinomics.com
A first taste of the common workflow language, part 2.
First, we start with the header information: The first two lines are self-explanatory. You have to specify the version of the spec, and that this document refers to a Workflow. Because steps 1 and 2 are essentially the same, I’ll present both below, but only go over the first step…
blog.davisvaughan.com
Amazon RDS + R
Welcome to my first post! To start things off at Data Insights, I’m going to show you how to connect to an AWS RDS instance from R. For those of you who don’t know, RDS is an easy way to create a database in the cloud. In this post, I won’t be showing you how to set up an RDS instance, but I will show you how to connect to it if you have one running…
www.rdatagen.net
Everyone knows that loops in R are to be avoided, but vectorization is not always possible
Again, the specifics of the simulation are not important here. What is important is the notion that the problem requires looking through individual data sequentially, something R is generally not so good at when the sequences get particularly long and must be repeated a large number of times…
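The kind of sequential dependence described here can be illustrated with a toy example. The excerpt does not show the post’s actual simulation, so the two-state Markov chain below is a hypothetical stand-in, sketched in Python rather than R: each new state depends on the previous one, so the steps have to be generated in order and the loop does not map directly onto a single element-wise vectorized operation.

```python
import random


def simulate_markov_chain(n_steps, p_stay, start=0, seed=None):
    """Simulate a hypothetical two-state (0/1) Markov chain.

    p_stay[s] is the probability of remaining in state s at the next step.
    Because each state depends on the one before it, the loop must run
    step by step.
    """
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_steps - 1):
        current = states[-1]
        # Stay with probability p_stay[current], otherwise switch states
        if rng.random() < p_stay[current]:
            states.append(current)
        else:
            states.append(1 - current)
    return states
```

A call such as `simulate_markov_chain(10, (0.9, 0.5), seed=1)` produces one sequence; repeating it many times over long sequences is exactly where an interpreted loop becomes the bottleneck.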
gcppodcast.com
Kubernetes 1.6 with Daniel Smith
Q: Francesc and Mark discuss “What is the first thing you do when creating a Google Cloud Platform…
www.stencilled.me
States to Shapes
I started working on this visualization after coming across Mike Bostock’s shape tweening bl.ock, which was done for one state. The source for this data is the Insurance Institute for Highway Safety (IIHS). The size of the square is based on the motor vehicle deaths per 100,000 people (2015)…
sjfox.github.io
You should make an R package for your paper
R packages are made up of three main things: (1) data, (2) functions, and (3) documentation…
ndres.me
Faster inference in TensorFlow using XLA.
About inference: Using neural networks primarily consists of two phases: training your model and using it. The latter part can also be called inference, forward pass or evaluation. For most researchers, most of the time is spent on training: they have to retrain using different architectures or different parameters…
dscinomics.com
A first taste of the common workflow language, part 1.
If you go to the CWL webpage you are greeted, right at the top, by these two sentences: The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC)…
ritsokiguess.site/docs
Bike Ride
It was finally good enough weather for a bike ride today: Not a very long ride, but enough to get the legs going again (with some huffing and puffing into the…
ritsokiguess.site/docs
Rugby League
Last night, Megan and I went to the first ever home game of the Toronto Wolfpack…
ndres.me
Turn any Jupyter notebook into a REST API
Jupyter Notebooks are pretty awesome. They allow you to prototype and experiment with ease…
giorasimchoni.com
Don't Drink and Gamble
Once upon a time I took a job as a Gaming Analyst and Mathematician at a big Online Gambling company. What can I say. The term “Data Science” did not exist, and this was the highest paying job around…
ritsokiguess.site/docs
A very short walk
An even shorter walk around campus today than yesterday; you will see why from the pictures. The walk around the back of the residences was still passable this afternoon…
gcppodcast.com
Container Engine with Chen Goldberg
Chen has a customer-centered development philosophy and believes open source is the best way to innovate and develop incredible technologies that are accessible and beneficial to everyone…
lenkiefer.com
House price growth and employment trends
IN THIS POST I WANT TO REVIEW RECENT EMPLOYMENT AND HOUSE PRICE TRENDS at the metropolitan statistical area level. No R code here, but you can recreate the graphs we’ll explore today by following the code in this post…
ritsokiguess.site/docs
Images
For my last post, I had a lot of trouble finding out how to make the images appear. It turns out that the secret is to save them in the static folder, and then to treat that folder as the “root” (that is, start its name with a forward slash) when making the R Markdown for the image…
ritsokiguess.site/docs
Round the back of the S wing
A short walk today around the back of the S wing at UTSC. This is the side of the S wing that nobody sees; it has the concrete brutalism of the outside wall (no entrances from this side) up against the forest that leads down to Highland Creek. You don’t meet too many people around here…
shotwell.ca/blog
Why you should work remotely, even if you're not remote
My last job was as a data scientist at Upworthy, which is a 100% remote company. Prior to starting the position I was worried about whether I could be happy and productive on a remote team…
yonicd.netlify.com
sinew
Sinew is an R package that generates a Roxygen skeleton populated with information scraped from the function…
www.stencilled.me
Harry Potter Characters.
Across 7 books and 8 movies, there are so many characters in this series by J.K. Rowling. For this project I got the data from data.world. All files here were combined to create a JSON file with nodes and links. Using d3.js I have visualized the characters connected to each other…
lenkiefer.com
House price visualizations
I AM JUST GOING TO DROP OFF A COUPLE OF housing data animated gifs here. And add a little bit of code for data wrangling. In earlier posts I described how to make these using R in greater detail. We’re going to be visualizing the Freddie Mac House Price Index…
giorasimchoni.com
Federer, Nadal, Djokovic and Murray, Love.
What do people mean when they say “The Big Four”? In tennis, there is a single answer: Roger Federer, Rafael Nadal, Novak Djokovic and Andy Murray. These 4 players have dominated the world of men’s tennis throughout the beginning of the 21st century…
ritsokiguess.site/docs
About this blog
I discovered that one can host a blog on Github, on which I already have an account, via a thing called Github Pages…
ritsokiguess.site/docs
Sunday walk in Scarborough
Actually, a bit more than that, since I started at Kennedy station…
www.stencilled.me
Breweries or Wineries ??
When you think about Texas, the first thing that comes to mind is barbecue, which of course pairs well with a good beer…
nowosad.github.io
Intro to R workshop
Last week, I had the pleasure of conducting a workshop for graduate students and faculty in the Department of Geography and GIS at the University of Cincinnati…
www.ifconfig.it/hugo
TAC Security Workshop - Poland
This week I attended an event organized by Cisco TAC in Krakow. I’ve been to many Cisco events (Live, PVT, Pint, etc.), but it was my first time at a TAC workshop and I was curious about it…
dscinomics.com
R upgrading can be a smooth process
In this post I will cover: On a Mac, by default, this will return two paths (Linux will give you something similar): Just to make sure it all worked well, you might want to run the following: We will use that file we saved above to generate a list of packages. Take the following steps: You are in easy territory then…
giorasimchoni.com
You Better (Net)Work!
RuPaul’s Drag Race happens to be my second-favorite reality TV show. Can you guess the first? Oh, I’m sure I’ll get to that eventually. Anyway…
gcppodcast.com
Cloud Video Intelligence API with Sara Robinson
When she’s not programming she can be found on a spin bike, listening to the Hamilton soundtrack, or finding the best ice cream in New York…
lenkiefer.com
Visualizing uncertainty in housing data
HOUSING DATA ARE OFTEN MEASURED WITH CONSIDERABLE uncertainty. Estimates are usually based on small samples that are subject to sampling variability…
lenkiefer.com
Animate a bivariate choropleth
IN THIS POST I WANT TO BUILD ON yesterday’s post and create an animated bivariate choropleth. We’ll use the same data as yesterday, make a combined scatterplot and bivariate choropleth map, and animate it with R. Let’s get right to it…
giorasimchoni.com
Anne Frank's Diary
A while ago I read through Social Media Mining with R and was fascinated by the subject of Sentiment Analysis…
lenkiefer.com
Bivariate choropleth maps with R
NOTE: After I posted this (like within 5 minutes) I found this post which also constructs bivariate choropleths in R. IN THIS POST I WANT TO REVISIT SOME MAPS I MADE LAST YEAR…
sjfox.github.io
Simple trick to speed up ODE integration in R
Let’s also confirm that both models give the same…
blog.mgechev.com
7 Angular Tools That You Should Consider
In this article we’re going to quickly explore 7 Angular development tools which can make our everyday life easier. The purpose of the list is not to be opinionated architecture-wise…
ewen.io
The Representativeness of UK Parliament
Leveraging open data on politicians to learn more about representation and diversity within UK…
lenkiefer.com
What's that on the horizon? An awesome dataviz!
This post is everything you want it’s everything you need it’s every viz inside of you that you wish you could see it’s all the right viz at exactly the right time but it means nothing to you and you don’t know why LET US MAKE SOME HORIZON CHARTS. What is a horizon chart you ask? That’s exactly what I was thinking earlier this weekend…
lenkiefer.com
Treemapify those pies!
TIME FOR ANOTHER DATAVIZ REMIX. Saw on Twitter that @hrbrmstr posted a remix of a Wall Street Journal visualization over at rud.is. The original WSJ article used pies of various size to compare recent store closings. As we usually do in this space, we’ll use R to create our plots. Let’s mix things up and go remix the remix…
www.blog.rdata.lu
Scraping data from STATEC's public tables
After watching the video, take a look at the code below. This code does two things: first it scrapes the data, and then it puts the data in a tidy format for further processing. As you can see, we got the data in quite a nice format, but it still needs to be cleaned a bit…
lenkiefer.com
Gather round and spread the word
IN THIS POST I WANT TO SHARE SOME R data wrangling strategy and use it to prepare an update to some global house price plots I shared last year. In last year’s post I did some data manipulation by hand and mouse in Excel before getting into R…
gcppodcast.com
Cloud Functions with Bret McGowen
Bret is on the Google Cloud Platform team at Google, focusing on developer-oriented products like Google Cloud Functions, App Engine, Firebase, machine learning APIs, and more. He’s currently an aspiring Node.js developer…
lenkiefer.com
Housing gets off to a good start
IN 2016 HOUSING IN THE UNITED STATES HAD ITS BEST YEAR IN A DECADE (see my review or my flexdashboard remix) and so far 2017 has gotten off to a good start…
lbusett.netlify.com
MODIStsp (v 1.3.2) is on CRAN !
In this case, for example, Land Surface Temperature values in the output rasters will be in kelvin (K), and spectral indices will be floating point values (e.g…
lenkiefer.com
Let's Pixelate America
LET’S PIXELATE AMERICA. This morning I happened across a fun blog post on how to generate Pixel maps with R via R weekly…
lenkiefer.com
Of kernels and beeswarms
BACK IN JANUARY WE LOOKED AT HOUSING microdata from the American Community Survey Public Microdata that we collected from IPUMS. Let’s pick back up and look at these data some more. Glad you could join us. Be sure to check out my earlier post for more discussion of the underlying data…
lenkiefer.com
Mortgage rates after dark
TONIGHT WE VISUALIZE MORTGAGE RATES AFTER DARK. Last year I shared 10 amazing ways to visualize mortgage rates (and more ways and even more ways). In this post I have one more DATA VISUALIZATION (dataviz) for you. I was putting together a presentation using remark…
shotwell.ca/blog
Data Visualization and UI design
While I think the basic idea of the initial app was a good one, the implementation had a lot of problems. The user interface was confusing and there were a lot of counter-intuitive design decisions. Since I was the source of most of these decisions, I thought the release of version 2…
gcppodcast.com
Customer Reliability Engineering with Luke Stone
Luke is defining the customer experience of Google’s new Customer Reliability Engineering (CRE) team. When he joined Google in 2002 he was the first technical support engineer for AdSense. He ran software engineering teams and started building on Google App Engine in 2009…
lenkiefer.com
Plotting house price trends with FRED and R
IN THIS POST I AM going to share some useful code to create some custom plots using the St. Louis Fed’s Federal Reserve Economic Data (FRED). While the FRED page has some nice chart customization options, I’m going to import the data into R with the quantmod package and draw the plots…
www.stencilled.me
Indian Premier League so far.
Indian Premier League (IPL) has had 13 different teams over the past ten years…
blog.mgechev.com
Announcing ngrev - Reverse Engineering Tool for Angular
Have you ever been hired to work on a huge legacy Angular project with thousands of NgModules, components, directives, pipes and services? Neither have I. Angular (2 and above) is still a relatively new framework and there are not many enormous projects out there…
adamspannbauer.github.io
Tutorial
Note: Please let me know if you follow the tutorial and are unable to setup a messenger bot successfully; I’d be happy to update the steps to make the tutorial more useful…
blog.brianz.bz
Elixir for Pythonistas part I
For many years now, my go-to language has been Python…
yonicd.netlify.com
slickR
This tool helps review multiple outputs in an efficient manner and saves much needed space in documents and Shiny applications, while creating a user friendly experience. These carousels can be used directly from the R console, from RStudio, in Shiny apps and R Markdown documents…
gcppodcast.com
Cloud Machine Learning Engine with Yufeng Guo
Cloud Machine Learning Engine offers a managed platform for training and serving TensorFlow models. Kubernetes 1…
tojyouso.github.io
Analysing the words I write in my journal
It’s like having a twitter feed just for yourself. The only negative (and not even a real negative) is the price of the Mac app: But honestly, I think it would be worth it - it’s a one off price, not a subscription. The only reason I haven’t made the commitment is I’m not sure I’ll still be on Mac in a year’s time…
yonicd.netlify.com
ggedit 0.2.0
To install the package you can call the standard R command To install the dev version: ggedit is an R package that is used to facilitate ggplot formatting…
blog.sellorm.com
Customising Shiny Server HTML Pages
This post was originally published on the Mango Solutions blog. At Mango we work with a great many clients using the Shiny framework for R…
mlr-blog.netlify.com
Most Popular Learners in mlr
For the development of mlr, as well as for a “machine learning expert”, it can be handy to know which learners are the most popular. Not necessarily to see which methods perform best, but to see what is used “out there” in the real world…
lenkiefer.com
QR code or dataviz?
TODAY I MADE A KIND OF SILLY DATAVIZ, a tile plot of weekly changes in mortgage rates. A colleague happened by my viz terminal, pointed at my monitor and asked “what is that, a QR code?” Nope, it was a tile plot…
rmflight.github.io
Criticizing a Publication, and Lying About It
Also of note, the proteins with aberrant zinc geometries showed enrichment for different types of enzyme classifications than those with canonical zinc geometries. Finally, no one from RWJ2016 ever contacted our research group to see if the results might be available…
gcppodcast.com
Drone CI with Brad Rydzewski and Jessie Frazelle
What is protocol buffers, and why should we all start using…
lenkiefer.com
Resampling
THIS PAST MONTH HAS BEEN BUSY. People have been traveling, I’ve been traveling, kids have been sick, and we’ve had the March Madness basketball keeping me occupied. Today I wanted to just explore a little analysis I’ve put together on resampling…
www.mytinyshinys.com
UK 2015 Election Mapped
Just providing a quick update to the previous post. Since that was done a few weeks ago, Evan Odell has been doing some great work on enhancing his Hansard package, details of which you can view here…
mlr-blog.netlify.com
Multilabel Classification with mlr
Multilabel classification has lately gained growing interest in the research community. We implemented several methods, which make use of the standardized mlr framework…
mlr-blog.netlify.com
New mlr Logo
We at mlr are currently deciding on a new logo, and in the spirit of open-source, we would like to involve the community in the voting process! You can vote for your favorite logo on GitHub by reacting to the logo with a +1…
mlr-blog.netlify.com
Parallel benchmarking with OpenML and mlr
With this post I want to show you how to benchmark several learners (or learners with different parameter settings) using several data sets in a structured and parallelized fashion. For this we want to use batchtools. The data that we will use here is stored on the open machine learning platform OpenML…
gcppodcast.com
Server Density with David Mytton
Server Density provides an open source logging and monitoring solution running on Google Cloud Platform…
mlr-blog.netlify.com
Use mlrMBO to optimize via command line
Many people who want to apply Bayesian optimization want to use it to optimize an algorithm that is not implemented in R but runs on the command line as a shell script or an executable. We recently published mlrMBO on CRAN…
www.stuartlee.org
Theories of Data Analysis
Diaconis says that magical thinking is the tendency to see patterns in the noise and to persist believing false statements despite evidence to the contrary…
ewen.io
A Sentiment Analysis of Kanye West Records
Using the Genius API and sentiment analysis techniques to explore Ye’s…
www.stencilled.me
Maps and gifs
As I have been using R for a while, one of the things I wanted to do was a time series map. Most of the time series maps I see have sliders to change the years. While looking at how to make time series maps I happened to learn how to make a GIF from a set of images…
blog.wallaroolabs.com
Hello Wallaroo!
We handle the hard infrastructure problems so you don’t have to. Welcome to the Wallaroo Labs Engineering blog. I’m writing today to introduce you to Wallaroo, the product we’ve been working on for a little over a year now…
livefreeordichotomize.com
ENAR in words
I had an absolutely delightful time at ENAR this year. Lots of talk about the intersection between data science & statistics, diversity, and exceptional advancements in statistical methods. “By a small sample we may judge of the whole piece.” I loved it, but let’s see what others were saying! Check out this word cloud of the most commonly tweeted words…
gcppodcast.com
The Home Depot with William Bonnell
Why should I be using Cloud Spanner, rather than Cloud SQL? (Thanks…