Statistics, computational biology and the future. R!

I have been attempting to learn R, along with biostatistics and how to analyze large datasets…not having been traditionally trained in these things (at least not in a super formal way).

It’s going slowly, in fits and starts. @ALoraine205 suggested I document what I’ve been trying to learn. That won’t be pretty: I haven’t been able to devote my full attention to it, so it’s easy to forget what I’ve already learned.

So, why do I want to learn to do all of this? Because I want to push myself, but also to prepare for a future in which statistics, computers and large datasets are what data in biology is all about.

So here’s the inaugural post of Ian learning R.

What do I know now?

1. I can load my data into R and am familiar with RStudio and some packages.

2. I can do basic arithmetic and t-tests, as well as make very basic plots.

3. I know how to do some very, very simple manipulations of data to format it properly.

4. I know what packages are, but am still very unclear on what most of them actually do in any detail.
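To make that list concrete, here is a minimal sketch of the kind of thing I mean. The numbers are made up for illustration (they are not real measurements), the column names `group` and `value` are just names I picked, and the temporary CSV file stands in for whatever file the real data would live in.

```r
# Made-up example values (NOT real data): one measurement per sample, two groups.
expr <- data.frame(
  group = rep(c("control", "treated"), each = 5),
  value = c(1.0, 1.2, 0.9, 1.1, 1.0, 1.8, 2.1, 1.9, 2.2, 2.0)
)

# Stand-in for a real data file: write the table out, then load it back in.
csv_path <- tempfile(fileext = ".csv")
write.csv(expr, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path)

# Basic arithmetic: the mean of each group.
group_means <- tapply(loaded$value, loaded$group, mean)

# A two-sample t-test comparing the groups (R's default is Welch's test).
tt <- t.test(value ~ group, data = loaded)

# A very basic plot: one box per group.
boxplot(value ~ group, data = loaded, main = "Made-up example values")
```

Typing `group_means` or `tt` at the console prints the results. With real data, the `read.csv()` line would point at the actual file instead of a temporary one.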

What are my stumbling blocks? 

Too numerous to count. Main = Main?! Yes, but in R Main = Main…the mains are different, somehow? Huh?! Still don’t fully grasp that one. And the help files for the various functions are gibberish to me. It’s English (my native language), but make no sense words do in help file. Googling things isn’t much more helpful.

Some of this blind spot is a lack of education in statistics and in handling large datasets, but maybe I’m getting there. I know of most of the software tools that people use to analyze these large datasets, at least, but I still have trouble running them myself and am not sure how they all fit together.
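Writing the Main = Main thing out, my best understanding so far (a sketch, not an authoritative explanation) is that the name on the left of the `=` is the function's argument and the name on the right is a variable of my own that happens to share the name. In the example below, `show_title()`, `main` and `plot_title` are all hypothetical names I made up; the only real R piece is that `plot()` has a title argument called `main`.

```r
# A variable of my own that happens to be called `main`.
main <- "My plot title"

# A tiny made-up function whose argument is also called `main`.
show_title <- function(main) {
  paste("Title:", main)
}

# Left of `=`: the function's argument name.
# Right of `=`: my variable, whose value gets handed to the function.
labelled_a <- show_title(main = main)

# The same call with a differently named variable gives the same result.
plot_title <- "My plot title"
labelled_b <- show_title(main = plot_title)

# plot() works the same way: its `main` argument receives my variable's value.
plot(1:10, main = main)
```

So the two mains only look identical; giving my own variable a different name (like `plot_title`) makes the distinction easier to see.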

I am also not a computer programmer, so the command line thing is a little challenging too.

What are my goals?

1. The biggest one is that I want to do all of my statistics and data analysis in R, if possible, and I think it is. Most of what I have are continuous data measurements, qRT-PCR data and other things like that, which R should breeze through. I would also love to be able to better mine published large datasets (microarrays, RNA-seq, etc.) as well as design and analyze my own.

Anyway, I hope someone finds my path to learning enlightening somehow, or can help me along the way. For years, I was terrified of saying I didn’t know something or how to do something (oh, then I’ll just pass this off to someone who does, since taking time to learn something new isn’t something anyone has time for anymore…at least that’s how I sense the culture around me). Admitting it means showing vulnerability, possibly weakness at some level. Hopefully this will help me learn, help teachers figure out where students’ stumbling blocks are (if they’re not aware already), and hopefully we’ll have some fun too. This should become a regular feature on the blog. Let me know if it doesn’t.

Author: Ian Street

Ian is a plant scientist and science writer relating stories of plant science and scientists on his blog, The Quiet Branches, as well as in other outlets. You can find him on Twitter @IHStreet.
