www.rladiesnyc.org
Benjana Guraziu
R is an awesome language, but I wouldn’t be this excited about it if I weren’t so excited about the people that are in this community. NYR was a great conference that really highlighted the strength of the R community…
livefreeordichotomize.com
Bringing the family together
My husband’s family throws a family reunion every year and this year we’ve been tasked with co-planning it. We were trying to decide on the best location for everyone, so I embarked on a mission to find the center of all of our residences. library(tidyverse) library(leaflet) Geocoding the locations I began by putting together a quick …
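One way to make "the center of all of our residences" concrete is the geographic midpoint: turn each latitude/longitude pair into a unit vector, average the vectors, and convert the mean back to degrees. The sketch below is an illustration in Python rather than the post's own R code; the function name and the two coordinates are hypothetical, and the post may well use a different definition of "center".

```python
import math

def geographic_center(points):
    """Return the (lat, lon) midpoint in degrees of a list of (lat, lon) pairs in degrees."""
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        # Unit vector on the sphere for this location.
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Convert the averaged vector back to latitude/longitude.
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)

# Hypothetical residences (not from the post).
homes = [(40.0, -80.0), (40.0, -100.0)]
center = geographic_center(homes)
```

Averaging on the sphere avoids the distortion that comes from averaging raw longitudes across large distances.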
gcppodcast.com
ML Kit with Brahim Elbouchikhi and Sachin Kotwani
He holds an MBA from Carnegie Mellon University, and dual bachelor’s degrees in Business Management and Computer Science from the University of Missouri - Columbia. His hobbies include traveling with his family, chasing his daughter around the house, and tinkering with mobile apps and backends…
aosmith.rbind.io
Time after time
I first learned how to check for autocorrelation via autocorrelation function (ACF) plots in R in a class on time series. However, the examples we worked on were all single, long-term time series with no missing values and no…
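For readers who want to see what an ACF plot is drawing, here is a minimal Python sketch of the usual sample autocorrelation estimator for a single complete series. It is not the post's code; the function name is hypothetical, and it assumes evenly spaced observations with no missing values.

```python
def acf(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    # The lag-0 sum of squares normalises every lag, so acf[0] is 1.
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

values = acf([1, 2, 3, 4, 5], max_lag=1)
```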
simplystatistics.org
Creativity in Data Analysis
Missing data are present in almost every dataset and the most important question a data analyst can ask when confronted with missing data is “Why are the data missing?” It’s important to develop some understanding of the mechanism behind what makes the data missing in order to develop an appropriate strategy for dealing with missing data…
djnavarro.net
Day 55-62: R: The Boring Bits
The “random walk on CRAN” project, however, has been on hold for a bit - and in truth today’s post is a bit of a cop out because there’s no package here at all and barely anything resembling code. Instead, it’s some initial thoughts about how to revisit some of my teaching material. For today though, I have a different goal… But still, it’s nice to think about what we might do if we have more…
www.rostrum.blog
Mail merge in 2018 with R
Matt Dray Two-thousand and late Clip art! Fax machines! CD-ROMs! Dial-up modems! The World Wide Web! Mail merge! These exotic terms give me flashbacks to computer class at the turn of the millennium…
magesblog.com
Models are about what changes, and what doesn't
The purpose of most models is to understand change, and yet, considering what doesn’t change and should be kept constant can be equally important. Models are about what changes, and what doesn’t. Some are useful. In mathematics change is often best described with differential equations, and that’s how I will motivate and justify my models today…
emmavestesson.netlify.com
My first hackathon (part 1)
Gender pay gap hackathon Last weekend I went to my first hackathon. It was organised by the AI club for gender minorities, codebar and ellpha. We used data on the gender pay gap available here. I had a great time so I wanted to share my experience. This is the first part of my first hackathon…
www.rdatagen.net
Re-referencing factor levels to estimate standard errors when there is interaction turns out to be a really simple solution
Maybe this should be filed under topics that are so obvious that it is not worth writing about. But, I hate to let a good simulation just sit on my computer…
blog.rstudio.com
Shiny 1.1.0
Without this capability, when Shiny performs long-running calculations or tasks on behalf of one user, it stalls progress for all other Shiny users that are connected to the same process…
masalmon.eu
Storrrify #satRdayCDF 2018
Now, let’s have a look at the day as tweeted by me… I obtained 22…
blog.sellorm.com
Running python in the RStudio IDE
A quick look at running python inside the RStudio IDE. When version 1…
evangelinereynolds.netlify.com
Selection effects
My limited goals: Perhaps the central difference between working in the Stata environment and in R is that in R you always have to be declaring which data frame you are working with. In Stata, you just have one active data frame and then you can refer to the variables by their names alone…
jacobbuckman.com
Tensorflow
This post is my attempt to fill this gap. Rather than focusing on a specific task, I take a more general approach, and explain the fundamental abstractions underpinning Tensorflow. With a good grasp of these concepts, deep learning with Tensorflow becomes intuitive and straightforward…
blog.schochastics.net
Fast Fiedler Vector Computation
While this is easy to implement, it comes with the huge drawback of computing many unnecessary eigenvectors. We just need one, but we calculate all 100 in the example. The bigger the graph, the bigger the overhead from computing all eigenvectors…
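To illustrate the idea of computing only the one eigenvector that is needed, here is a rough Python sketch based on shifted power iteration. This is not necessarily the method the post goes on to use. It assumes a connected, undirected graph given as a hypothetical adjacency dictionary with nodes numbered from 0, and it uses a fixed iteration count instead of a convergence check.

```python
import math

def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector of a connected undirected graph.

    adj maps each node index 0..n-1 to a list of its neighbours.
    """
    n = len(adj)
    degree = [len(adj[i]) for i in range(n)]
    # 2 * max degree is an upper bound on the largest Laplacian eigenvalue,
    # so (shift * I - L) has non-negative eigenvalues.
    shift = 2 * max(degree)
    v = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Multiply by (shift * I - L), where (L v)[i] = deg[i] * v[i] - sum of neighbours.
        w = [
            shift * v[i] - (degree[i] * v[i] - sum(v[j] for j in adj[i]))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical example: a path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
vec = fiedler_vector(path)
```

The signs of the entries split the path into its two halves, which is the typical use of the Fiedler vector for graph partitioning.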
bgstieber.github.io
Golf, Tidy Data, and Using Data Analysis to Guide Strategy
Introduction I’m going to use this post to discuss some of the aspects of data science that interest me most (tidy data as well as using data to guide strategy). I’ll be discussing these topics through the lens of a data analysis of results from a few high school golf tournaments…
lenkiefer.com
Plotting house price and income trends
In this post we will create some plots of house prices and incomes for the United States and individual states. We will also try out the bea.R package to get data from the U.S. Bureau of Economic Analysis. We’ll end up with something like this: Per usual we’ll do it with R and I’ll include code so you can follow along…
wytham.rbind.io
Solution to a frustrating rJava problem
Go to the command line and run: According to the solution at the aforementioned link, this will “create a link to libjvm.dylib inside R’s lib folder”…
rmflight.github.io
Using IRanges for Non-Integer Overlaps
Let’s actually test differences in speed by counting how many overlapping points there…
ropensci.org/blog
Announcing new software review editors
The overall goals of rOpenSci are fully aligned with my interests and passions, both personally and also professionally as a Research Software Engineer, tasked with helping researchers make the most of their code and…
engineering.pivotal.io
Diagnosing Ruby Memory Issues in Cloud Foundry's API Server
Occasionally our end-users will use the platform in ways we might not have predicted, which results in unique and difficult-to-reproduce issues…
divingintogeneticsandgenomics.rbind.io
How to upload files to GEO
I used my google account. soft link does not work for me… After your transfer is complete, you need to tell the NCBI. After file transfer is complete, please e-mail GEO with the following information: - GEO account username (tangming2005@gmail.com); - Names of the directory and files deposited; - Public release date (required - up to 3 years from now - see FAQ)…
yihui.name/en
Ideally, I Hope to Simply Copy and Run Your Example
That is why you should provide a fully reproducible example whenever possible, instead of describing all the steps to create such an example. For step 1, I had to copy it to an Rmd document in my RStudio session. For step 2, I had to copy the code and put it in an R code chunk…
sciathlon.github.io
Multiple Sclerosis and exercise
Hi athletes. I recently went to a Multiple Sclerosis (MS) conference in Paris organized by the ARSEP foundation, mostly to see what people are doing on the immunology side of the disease, and I started looking into the foundation a bit more and that’s how I started doing this article…
ryantravis.netlify.com
Odds ratios and logistic regression basics
Binary outcome variables that only take on two distinct values such as alive vs. not alive are very common in medicine and elsewhere…
yihui.name/en
On Cache Invalidation
First of all, the main purpose of caching is speed. The basic idea is simple: if you know you are going to compute the same thing, you may just load the result saved from the previous run, and skip the computing this time. There are two keywords here: “the same thing”, and “the saved result”…
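The two keywords can be sketched in a few lines of Python: a key that identifies "the same thing" and a store that holds "the saved result". This is only an illustration, not how knitr or any other tool implements its cache; the names are hypothetical, and an in-memory dictionary stands in for results saved on disk.

```python
import hashlib
import pickle

_cache = {}  # stands in for results saved from a previous run

def cached(func, *args):
    """Run func(*args) once per distinct (function name, arguments) pair."""
    # "The same thing": a digest of the function name and its arguments.
    key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
    if key not in _cache:
        # "The saved result": computed once, then reused.
        _cache[key] = func(*args)
    return _cache[key]

calls = []

def slow_square(x):
    calls.append(x)  # records each time the real computation runs
    return x * x

first = cached(slow_square, 12)
second = cached(slow_square, 12)  # served from the cache
third = cached(slow_square, 13)   # new input, so it is recomputed
```

Note that the key here ignores the function body, so editing `slow_square` would not invalidate the cache; deciding what belongs in the key is exactly the hard part.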
sarahromanes.github.io
R-Ladies Sydney Launch!
I thought I would spend all my PhD reading textbooks and learning new R techniques (see below). However, research, admin, and teaching can get in the way! I learned how to develop packages to complement my research and to also improve workflow. Learn from examples! E.g., this presentation was based off Alison Hill’s R Ladies talk about blogdown! Work smart, not hard…
cevo.com.au
Story time
I’ve often been asked what information should be put inside a user story, or what kind of template I use for a user story on a project. In order to provide a useful answer to this question, we must look back and see where user stories came from…
blog.wallaroolabs.com
Implementing Time Windowing in an Evented Streaming System
Hi there! Welcome to the second and final installment of my trending twitter hashtags example series. In part 1, we covered the basic dataflow and logic of the application…
www.njtierney.com
Naming Things
The oldSchool namer generally mixes case in an R package, often capitalising the “R”, or going all in on ALL CAPS. Examples: Although personally I wouldn’t use this style as it can make it difficult to type, they have a certain charm, and are easy to google - provided you spell it…
lcolladotor.github.io
SciLifeLab Prize
Last year I submitted an entry to this competition and I enjoyed the experience, even if it was a bit rushed. The process of joining the competition is relatively straightforward: You don’t need to pay for competing! You already did the very hard part of completing your Ph.D…
www.tidyverse.org/articles
The tidyverse is for EDA, not packages
Because the tidyverse is a set of packages designed for interactive data analysis, this is, in short, a bad idea…
rmflight.github.io
Turn Robert's Beard Purple!
Links: I decided to participate in the Walk To End Alzheimer’s this year, coming up on August 25th here in Lexington…
www.justadatageek.com
Continued Introduction
This is a continuation to my first post just to see how things look using blogdown. So… first, I’ll call one of the built in data sets: library(tidyverse) library(hexSticker) data(cars) summary(cars) ## speed dist ## Min. : 4.0 Min. : 2.00 ## 1st Qu.:12.0 1st Qu.: 26.00 ## Median :15.0 Median : 36.00 ## Mean :15.4 Mean : 42.98 ## 3rd Qu.:19.0 3rd Qu.: 56.00 ## Max. :25…
fharrell.com
Data Methods Discussion Site
I have learned more from Twitter than I ever thought possible, from those I follow and from my followers. Quick pointers to useful resources have been invaluable. I have also gotten involved in longer discussions…
yihui.name/en
HTML Widgets for Non-HTML Output Formats
Screenshotting HTML widgets, Shiny apps, and arbitrary URLs works in any non-HTML output formats: PDF, Word, EPUB, RTF, and PowerPoint, etc…
sharanry.github.io
It's been a Month
It’s been a month since the coding period of GSoC started. My first evaluations are done. I have passed them successfully thanks to my mentor Colin. :) The first few weeks of the coding period were spent familiarizing myself with the basic concepts of Bayesian Inference…
www.datalorax.com
Looking into #KeepFamiliesTogether
This week I’m at the Seattle branch of the Summer Institute on Computational Social Science…
blog.rstudio.com
RStudio Connect v1.6.4
There are a few breaking changes and a handful of new features that are highlighted below. We encourage you to upgrade as soon as possible! Please take note of important breaking changes before upgrading. RStudio Connect includes Pandoc 1 and will now also include Pandoc 2…
evangelinereynolds.netlify.com
The visual taming of a paradox
@drob has posted code to play with on Twitter today. To illustrate what he calls a veridical paradox he’s posted the set up, the code and result of a coin flipping experiment: There are some good and exact explanations in the thread, for this at-first-glance puzzle. But I didn’t see a visualization that might give you quick intuition about what is going on…
lcolladotor.github.io
Using Slack for Academic Departmental Communication
versus This blog post was made possible thanks…
nowosad.github.io
GeoPAT 2
Now, try to do the same with the image below representing a land cover over a part of Eastern…
magesblog.com
Principled Bayesian Workflow
He also gave succinct advice for model calibration and validation…
davemcg.github.io
Quick Guide to Gene Name Conversion
There are several popular naming systems for (human) genes: But, he did not add RefSeq names. So if you need to get RefSeq names into one of the others, you’ll have to do another step…
masalmon.eu
Really not a fish? Scraping my mathematical family tree
From the above I deduced that I was allowed to scrape mathematicians’…
simplystatistics.org
The Role of Resources in Data Analysis
When learning about data analysis in school, you don’t hear much about the role that resources (time, money, and technology) play in the development of analysis. This is a conversation that is often had “in the hallway” when talking to senior faculty or mentors…
aebou.rbind.io
How I Found Myself doing Data science in marketing
Since February 26 I have been working at Seedstars Ivory Coast in a venture named Bora digital, which works mostly in marketing and digital marketing…
watanabesmith.rbind.io
If Ranked-choice voting decided the (second) BEST Black Mirror episode
The data is particularly unique because many users did not make a full ranking of all 19 episodes, with some users ranking just a single episode as their favorite…
www.justadatageek.com
Introductory Post
After a bit of research and testing, I have decided to start using the blogdown package in R in order to continue blogging…
www.matteodefelice.name
The Copernicus toolbox and the role of software in climate services
The best thing about the C3S is that they are trying to foster the creation of an ecosystem of data services and, not surprisingly, software (design, development, architecture) plays a critical role here…
watanabesmith.rbind.io
Behind the Viz
Lots of packages here: The gather() function pivots the data, we name “episodes” as the key (what the column names will be called) and rankings as the values (what the data in those columns will now be called), while telling the function to not mess with the columns user, other, or…
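The pivot described here can be mimicked in plain Python to show what happens to the data. This is a sketch of the wide-to-long idea only, not tidyr's implementation. The `gather` helper, the toy rows, and the episode column names are hypothetical; the kept column `user` and the key/value names follow the excerpt.

```python
def gather(rows, key, value, keep):
    """Pivot wide rows (dicts) to long form, leaving the `keep` columns untouched."""
    long_rows = []
    for row in rows:
        for column, cell in row.items():
            if column in keep:
                continue
            long_row = {k: row[k] for k in keep}
            long_row[key] = column   # the old column name becomes a value
            long_row[value] = cell   # the old cell goes in the value column
            long_rows.append(long_row)
    return long_rows

# Hypothetical rankings: one row per user, one column per episode.
wide = [
    {"user": "a", "ep1": 1, "ep2": 2},
    {"user": "b", "ep1": 2, "ep2": 1},
]
long = gather(wide, key="episodes", value="rankings", keep=["user"])
```

Each wide row turns into one long row per episode column, so two users with two episodes give four rows.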
djnavarro.net
Day 51-52: Kabling
Not wanting to give away the list of things that might appear on bingo cards (or alternatively, not actually having written all the items yet!) I’ll need to find some content to use for this post… Oh yes little network, you had me at “hello”. Here’s what we get: It’s a good start, but it’s all structure and no style! No, wait, that doesn’t work. Why? There we…
evangelinereynolds.netlify.com
Federalist Papers
Every couple of weeks I like to explore data that’s brand new to me. I anticipate a one-hour, one-off project. Usually this turns out to be a beautiful lie, and the projects chew up much more time. Still, this enticing time-line is pulling me into new projects from time to time…
blog.zenggyu.com/en
Multivariate Adaptive Regression Splines in a Nutshell
Like standard linear regression, MARS uses the ordinary least squares (OLS) method to estimate the coefficient of each term. However, instead of an original predictor, each term in a MARS model is a basis function derived from original predictors. A basis function takes one of the following forms: MARS does not treat categorical predictors differently from standard linear…
yihui.name/en
One Little Thing
One of the most frequent topics on which I blog in recent years is Chinese literature (of course, only in my Chinese blog). In particular, I often quote poems in my posts. To quote a poem in Markdown, you have to add two trailing spaces after every line of the poem…
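The manual step described here (two trailing spaces after every line, which Markdown treats as a hard line break) is easy to automate. The following Python helper is a hypothetical illustration, not the solution the post proposes, and the placeholder text is not a real poem.

```python
def add_hard_breaks(poem):
    """Append two trailing spaces to every line so Markdown keeps the line breaks."""
    return "\n".join(line.rstrip() + "  " for line in poem.splitlines())

# Placeholder text standing in for a poem.
quoted = add_hard_breaks("first line\nsecond line")
```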
bayesianbabes.netlify.com
Probability
Say we have a fruit bowl–mmm!–consisting of strawberries, raspberries, blueberries, and blackberries. There are 50 strawberry slices, 20 blueberries, 15 raspberries, and 15 blackberries…
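As a worked example of the arithmetic, the counts above give the probability of drawing each kind of fruit, assuming a single piece is picked uniformly at random (an assumption made here for illustration). The sketch is in Python and uses exact fractions.

```python
from fractions import Fraction

# Counts taken from the fruit bowl described above.
bowl = {"strawberry": 50, "blueberry": 20, "raspberry": 15, "blackberry": 15}
total = sum(bowl.values())

# Probability of each fruit = its count divided by the total number of pieces.
prob = {fruit: Fraction(count, total) for fruit, count in bowl.items()}
```

With 100 pieces in total, a strawberry slice has probability 1/2, a blueberry 1/5, and each of the other two berries 3/20.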
yutani.rbind.io
Re-introduction to gghighlight
But, please forget about that gghighlight; gghighlight has become far more powerful and simple! So, let me re-introduce gghighlight. What do you do when you explore data that is too large to print? OK, good…
cevo.com.au
How to debug AWS Application Load Balancers with minimal colourful vituperations
Introduction From the Temples of Testers, a browser bestowed a 504 gateway timeout in your newly deployed internal facing Application Load Balancer (ALB). There was a gnashing of molars and gurning of visages. Your ALB isn’t responding. Don’t panic and be “oh wow! heavy heavy heavy” like Neil from the Young Ones…
simplystatistics.org
People vs. Institutions in Data Analysis
Invest in businesses any idiot could run because someday one will. A perhaps more detailed version of this sentiment comes from fellow legendary investor Warren Buffett, in his testimony before the U.S…
www.njtierney.com
What I wish I'd known
Something I never tire of hearing is the story of how someone arrived at where they are…
yihui.name/en
Build Binary R Packages for the Homebrew Version of R?
Personally I don’t really care whether a project is “sticky”, and I believe being magnetic is a lot more difficult and valuable than being sticky for an open source project. Okay, that is a bit of a digression for this blog post. If you use the Homebrew version of R, it will be super easy to upgrade or remove R in the future…
evangelinereynolds.netlify.com
Covariance -- A Visual Walk Through
In a previous post, I’ve looked at walking through the calculation of variance and standard deviation, visualizing each step. This post is dedicated to the visualization of another statistic: covariance…
djnavarro.net
Day 47-50: Paletter
Okay! It’s a Thursday evening. Solo parenting is over. My partner is back in town. The kids are in bed. Tina Turner is playing over the wireless. I’m the last one awake. Time for an R post, because that’s just the kind of girl I…
ropensci.org/blog
Exploring European attitudes and behaviours using the European Social Survey
The 4th of March of 2018 I submitted the package to rOpenSci, intimidated but very excited about the peer review process. To my surprise, the process was enriching, respectful and transparent, unlike my previous experience in academic research…
blog.wallaroolabs.com
Real-time Streaming Pattern
Introduction I am starting a series of posts looking at a variety of data processing patterns used to build real-time stream processing applications, the use cases that the patterns relate to, and how you would go about implementing within…
bayesianbabes.netlify.com
Uncertainty and Sample Size
As a hungry botanist I couldn’t think of a more a-peel-ing metaphor than fruit. Let’s say we have a fruit bowl consisting of strawberries, raspberries, blueberries, and blackberries…
brendanmolin.netlify.com
World Cup Club Representation
It’s common knowledge amongst association football fans that the World Cup, while being the highest profile event in the world, isn’t necessarily a display of the best football play in the…
www.rdatagen.net
Late anniversary edition redux
This afternoon, I was looking over some simulations I plan to use in an upcoming lecture on multilevel models. I created these examples a while ago, before I started this blog…
mvaugoyeau.netlify.com
Principal Component Analysis
PCA Number of factors retained by psycho::n_factors() Extraction of the variables In the article about correlations, we saw that the data from airquality were correlated. Sometimes it is necessary to use Principal Component Analysis (PCA) to determine uncorrelated variables in order to analyze data…
ritsokiguess.site/docs
Tidy matched pairs t-test
Introduction The matched pairs t-test is for comparing two measurements obtained on the same individual, such as a before and an after measurement. This is different from the two-sample t-test, which has two independent sets of measurements, one for each experimental condition, with each set collected on different…
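The distinction shows up directly in the computation: a matched pairs test works on the per-individual differences. The Python sketch below computes the t statistic and degrees of freedom from made-up before/after values. It is an illustration only, not the post's tidy R workflow, and it stops short of a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a matched pairs t-test."""
    # One difference per individual; pairing is what makes this a paired test.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1

# Hypothetical measurements on the same four individuals.
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]
t_stat, df = paired_t(before, after)
```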
evangelinereynolds.netlify.com
Eat near the Big Ben? That will cost you...
#MakeoverMonday is a fun data visualization initiative; most participants use Tableau as their preferred visualization tool. But I’ve used R and ggplot() and the organizers and participants have been very welcoming…
www.juliapilowsky.com
My current research, for laypeople
I have struggled a lot to explain my current line of research to people in my life who aren’t scientists. But if I can’t explain my research to everyone, then I can’t claim to really know what I’m doing…
yihui.name/en
The Best Experience in Remote Talks that I Have Given
For someone who usually does not prefer traveling (and cannot travel too far in a couple of years), I have to say this system is just perfect…
ropensci.org/technotes
The ssh Package
Because the ssh package is based on libssh it does not need to shell out. Therefore it works natively on all platforms without any runtime dependencies. Even on Windows…
www.njtierney.com
naniar 0.3.1
There were a few things that changed in this release, some of them big, some small, and some technical, let’s break them down…
ellocke.github.io
(R) Fetching JSON
1 The Setting 2 Inspect a Website’s DOM & HTTP Requests 3 Extract JSON 3.1 Quick: JSON from File (copy & paste) 3.1.1 It’s all about [[1]] 3.1.2 Two Lines, Pt. 1
3.2 Robust: Fetch JSON from API with GET 3.3 Parse JSON
4 From JSON to Tidy 4.1 Tidy Date #1 with lubridate 4.2 Tidy Date #2: type_convert() to the Rescue / Two Lines, Pt…
lcolladotor.github.io
Mindfulness
All these tweets are threads, so you’ll have to open them to see them: click on the blue bird on the right side of each tweet. Lately I’ve been processing some strong feelings related to feeling unwelcome, homesickness and loneliness…
leonawicz.github.io/blog
tiler 0.2.0 CRAN release
Lastly, consider the power of your system before attempting to make a ton of tiles for large images at very high resolutions…
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - Rest of the World/UK
To save time, I’m going to use saved versions of the datasets I built up over the 5 blog posts. I won’t include the functions in this blog post either, but the article uses (at most very slightly modified) functions from the previous 5 posts. We first need to sort the players into either the UK vs…
djnavarro.net
Day 39-46
This series of posts has been on hold for the last few days because I’ve been solo parenting and had a few deadlines at work. I have no idea how single parents…
lenkiefer.com
Kalman Filter for a dynamic linear model in R
As an economist with a background in econometrics and forecasting I recognize that predictions are often (usually?) an exercise in futility. Forecasting, after all, is hard. While non-economists have great fun pointing this futility out, many critics miss out on why it’s so hard. There are at least two reasons why forecasting is hard…
cevo.com.au
When 'Docker' meets 'Make'
Being a DevOps engineer, it’s very common that we use tools like AWS CLI, Docker/ECS, and Ansible to build continuous deployment solutions. It is also common to use tools like JenkinsCI to fully automate the deployment of your applications. Recently I have experienced that, due to some bizarre and varied reasons, you cannot always use CI…
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - Simulate World Cups
Now that we have the teams for each county, we want to work out how well they would do at a world cup. For this, we need to know roughly what their ranking would be compared to actual nations…
yutani.rbind.io
Plot geom_sf() On OpenStreetMap Tiles
Clearly display license attribution. For “Technical Usage Requirements” section, I have to read this more carefully. Let’s look at the requirements one by one. Valid HTTP User-Agent identifying application. Faking another app’s User-Agent WILL get you blocked. If known, a valid HTTP Referer…
ropensci.org/blog
.rprofile: Julia Silge
KO: What is your name, job title, and how long have you been using R? KO: Wow! What were you all about before that? I have a bachelor’s degree in physics, a PhD in astrophysics, and I did a postdoc and research for a while…
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - Picking Teams
First, we need a list of plausible formations, and the positions they contain. There’s a handy list of the default FIFA18 formations online which we’ll scrape…
www.datalorax.com
Peeking behind the curtain with {slidex}
I gave a lightning talk (slides here) this past weekend at the second annual Cascadia R Conference that was focused on creating and contributing new themes to the {xaringan} package, which is essentially a really well thought out and well-organized R Markdown wrapper around the remark…
ropensci.org/blog
Unconf18 projects 4
In the spirit of exploration and experimentation at rOpenSci unconferences, these projects are not necessarily finished products or in scope for rOpenSci packages…
evangelinereynolds.netlify.com
What’s the IGO dataset?
This webpage is meant to provide students and the curious with a visual, explorable introduction to the dataset. The number of IGOs observed over the time period of 1815 to 2005 has dramatically increased. At the beginning of this period there were just a handful, but now they number more than 300…
g-tierney.github.io
Article Round Up June 2018
The first article is quite long, but easily skim-able…
www.robert-hickman.eu
Could an Independent Yorkshire Win the World Cup - Finding British Player's Birthplaces
To select our county teams, we need to know where each British player was born (and thus their ‘county’ nationality)…
peerchristensen.netlify.com
Fair is foul, and foul is fair
We see that the text variable contains one line of text for each row. Given this format, we can create a new data frame with a row for each word token found in the Bing lexicon of sentiment words. By using this lexicon, sentiment words are simply assigned a value of positive or negative. Have a look at the other options with ?get_sentiments…
mgb-research.netlify.com
Power Analyses for an Unconditional Growth Model using {lmer}
We collected measures of these variables at three time points, approximately evenly spaced apart, and, for the purposes of these analyses, I decided to treat the data as if they were collected at precisely the same equally spaced interval for all participants…