Science

Database soup: GenBank, RefSeq, TPA, and UniProt

In the Letters section of the May issue of Microbe, Tatiana Tatusova at NCBI writes a great summary and comparison of GenBank, RefSeq, UniProt, and Swiss-Prot in an article titled “GenBank, RefSeq, TPA, and UniProt: What’s in a Name?”. It is certainly a useful introduction to these resources.

Science
Web

Electronic Laboratory Notebooks don’t work in the Wet-lab

An interesting discussion has arisen on Electronic Laboratory Notebooks (ELNs) and why “wet lab” biologists don’t use them.

Neil points out several advantages of ELNs, which I will paraphrase:

  • Easily share data with lab members/collaborators
  • Search experiments by date, title, keywords, tags
  • Link experiments to related resources: “protocols, MSDS data, risk assessments, plasmid maps, PubMed entries, tagged data in social networks”

In a comment, Curtis says:

“While nearly every aspect of the modern research enterprise changes quickly, the lab notebook hasn’t changed in over 100 years”,

which he means as an indictment of the paper-based notebook. I would take the opposite tack: if it has worked for 100 years or more, maybe we shouldn’t be so quick to throw paper-based notebooks out.
The problem with ELNs is that they are inconvenient compared to paper notebooks. ELNs require access to a computer to read or write. Also, it is easier to lose digital information than pen-and-paper information (I’ve had far more hard disk crashes than I care to think about).
Most laboratory computers aren’t at the bench; they are at the desk. Since they aren’t located where the work is occurring, there are three main options for writing up experiments.

  1. Write up experiments completely before the experiment occurs, then perform the experiment just as planned.
  2. Try to remember everything and type it up afterwards.
  3. Write things in pen and paper, then transcribe them into the ELN.

The problem with each of these, respectively, is:

  1. Things rarely go exactly as planned.
  2. The longer the gap between observing something and writing it down, the greater the possibility of errors.
  3. Transcribing information is twice the work, and also increases the possibility of errors.

In the real world, a mixture of all of these would be the way to go, but I think it is unlikely to be readily adopted by academic scientists, because of the hassle. Instead, I think we should try to leverage the work people put into their venerable old paper and pen lab notebooks with digital technologies.
One easy path to searchable, shareable notebooks might be Optical Character Recognition, where one scans in the lab notebook pages and lets the computer figure out what’s what, but this approach will be limited by how often a researcher drags the notebook over to the scanner.

An interesting alternative would be the recently announced pen-top computer from Livescribe. Basically, this is a pen that can “remember” what you write, and upload what you have written to a computer. I envision writing in my notebook, then when I (and the pen) get back to the computer, the recorded text is uploaded to the computer and “auto-blogged” into my ELN. Sure, it’s not perfect: any images I paste into my paper notebook wouldn’t be saved, and the formatting won’t be perfect. However, I could have a date- and keyword-searchable archive of the things I have written, from which I can then refer back to my actual notebook.

If ELNs are going to come into common use, there are only two ways it will happen: either they are forced on researchers by the industry/academic hierarchy, or they are so easy and simple to adopt that researchers would be fools not to start using them.

Learn from the iPod: it needs to be easy.

Science
Web

Screencast: Download unfinished genomic sequences at NCBI Entrez

Over at RRResearch, Dr. Redfield is looking to download lots of incomplete H. influenzae genomes. So, I left a comment describing how I would solve the problem. Here, I expand the comment with a screencast (5 minutes) that shows the procedure I use to download 288 nucleotide records from NCBI Entrez.

Updated comment

How to download a concatenated file of all H. influenzae genomes from NCBI Entrez Nucleotide

Note: I don’t know if this is the best way, just a way.

1. Go to NCBI Entrez Nucleotide.

2. Search for “Haemophilus influenzae [Organism] WGS” and get 576 results, 288 each from GenBank and RefSeq. (“WGS” stands for whole genome shotgun, I think, and for some reason does not include completed genomes.)

3. On the “Limits” tab, select either the “GenBank” or “RefSeq” results from the pop-up menu titled “Only from:” to get the 288 GenBank records or the 288 RefSeq records. I am not sure which you should use, but I lean towards GenBank over RefSeq, as it is the submitted form of the record, so my example will proceed using the GenBank results.

4. From the “Display” pop-up, choose “FASTA”, and it will give you the FASTA of the first five results (Sorry, no static link available).

5. Next, choose to display a maximum of 500 results per page, which gives us the FASTA of all 288 results in a web page format (Sorry, no static link available).

6. Finally, from the “Send to” pop-up, choose “file”, and your web browser should start downloading a text file of the results. A .zip archive of the resulting download is available (5 MB).

From here, you could probably use this file to make a BLAST database. However, there are 9 nearly empty FASTA records that should be deleted. You can do this many ways, but I like to do so graphically using the free TextWrangler, by the makers of BBEdit. To locate the empty records, I did a “find all” for “00000” (five zeros). Once you delete these records, you end up with a multi-FASTA file containing genomic sequences from an additional 9 Hin strains:

22.4-21 (44 contigs), 22.1-21 (18 contigs), 3655 (23 contigs), PittAA (40 contigs), PittHH (59 contigs), PittII (25 contigs), R2846 (20 contigs), R2866 (4 contigs), and R3021 (46 contigs). I certainly can’t speak to the quality or completeness of these sequences, but you can download my results (.zip, 5 MB), if they would be useful.
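For anyone who would rather script the cleanup than do it in a text editor, here is a minimal Python sketch. It assumes that “nearly empty” can be approximated as “sequence shorter than some cutoff”; the function names and the `min_length` default are hypothetical and are not part of the original procedure, which searched for runs of zeros instead.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between or inside records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short_records(text, min_length=100):
    """Return multi-FASTA text without records shorter than min_length.

    min_length is an assumed cutoff; pick one that suits your data.
    """
    kept = [
        f">{header}\n{seq}"
        for header, seq in parse_fasta(text)
        if len(seq) >= min_length
    ]
    return "\n".join(kept) + ("\n" if kept else "")
```

Note that this writes each kept sequence on a single line rather than preserving the original line wrapping, which most tools accept but which is worth knowing before comparing files.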

I am sure there is some better way to do this, but I haven’t been able to find an FTP server where I can locate these files. A particular problem with this method is that it tends to slow your web browser to a crawl. I accomplished steps 4, 5, and 6 with a liberal use of the Safari “stop loading page” button. Basically, I let the intermediate step pages begin to load, then stop them before they complete, so I can choose the next setting. With Firefox, I was unable to complete this tutorial, because the browser would become completely non-responsive for at least 10 minutes. Be careful!
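One route that avoids the browser entirely is NCBI’s E-utilities web service. The sketch below is an illustration rather than a tested replacement for the steps above: it uses the ESearch and EFetch endpoints with the history server, the helper names and the output file name are made up for this example, and it does not reproduce the GenBank-versus-RefSeq choice from the “Limits” tab, so the same search term would return both sets of records.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide"):
    """Build an ESearch URL that stores the hits on NCBI's history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_history(xml_text):
    """Pull the WebEnv and QueryKey values out of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return root.findtext("WebEnv"), root.findtext("QueryKey")


def efetch_fasta_url(webenv, query_key, db="nucleotide", retmax=500):
    """Build an EFetch URL that returns the stored hits as FASTA text."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "rettype": "fasta",
        "retmode": "text",
        "retmax": retmax,
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def download_fasta(term, out_path):
    """Run the search, then save the hits as one FASTA file (needs network)."""
    with urllib.request.urlopen(esearch_url(term)) as reply:
        webenv, query_key = parse_history(reply.read())
    with urllib.request.urlopen(efetch_fasta_url(webenv, query_key)) as reply:
        with open(out_path, "wb") as handle:
            handle.write(reply.read())
```

A call such as `download_fasta("Haemophilus influenzae [Organism] WGS", "hin_wgs.fasta")` would then play the role of steps 2 through 6, assuming the service accepts the same query string as the web form.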

Is there a better way to do this?

Science
Screencast
Web

Color Oracle: Make sure your figures are colorblind-friendly

Color Oracle Simulates Deuteranopia at NCBI Entrez

Color Oracle is a free utility, available for Mac OS X (10.3.9 or better), Windows, and Linux, that simulates how colorblind people might see your artwork or figures. A similar utility is Sim Daltonism (Mac OS X 10.2.8 or better).

According to the Wikipedia article, as many as 8% of males and more than 1% of all people have difficulty distinguishing colors. We can make it easier for our audiences to interpret our figures and use our bioinformatics web applications if we give a little forethought and check to make sure that they will be able to discriminate what we identify with color.

Sim Daltonism simulating protanopia at NCBI Entrez

Generally, I don’t use color in figures for journal articles or posters, but I do tend to use color in slide presentations. Now, I can make sure that my work is more accessible to those who might be colorblind.

Both software links above via Daring Fireball.

Update: I re-wrote this article and submitted it to MacResearch.org.

OSX
Science
Web

Make your project “Google-able”

I was looking through Nucleic Acids Research this morning, and I saw an abstract for taveRNA, which I remembered as a graphical design interface for bioinformatic workflows (PMC). But instead, it was a set of web-tools for understanding RNA structure (inteRNA, pRuNA, alteRNA). Both sets of tools share the name “taverna”.
So, my fellow scientists, if you are going to be cute when naming your genes, proteins, databases, or programs, please make sure your term is at least somewhat unique in Google or PubMed. Otherwise, keeping these things straight in my head is difficult.
This, of course, is not meant to disparage the creators of the taveRNA suite. They have put together a useful set of tools for modeling inter- and intra-molecular interactions of RNA molecules. These sorts of tools are just going to be more and more important in the future, as we discover more riboswitches and other RNA-based regulators.

Update: The Wall Street Journal discusses how this is important for people’s names.

Science
Web

5 Ways to Turbocharge your PubMed Searches using MyNCBI

NCBI’s PubMed is essential to the biomedical researcher, and luckily, NCBI offers many interesting ways to increase productivity, especially through MyNCBI.
MyNCBI is a service that allows personalization of NCBI resources to aid your research, saving you time and hassle. Some of the ways you can use it include:
1. Save your search queries
Do you find yourself doing the same PubMed searches every two weeks or monthly? Do you have some complex search queries that take a long time to enter? With MyNCBI, you can save your common search queries for all NCBI databases. This includes PubMed, PubMed Central (PMC), Genome, Protein, Nucleotide, and many more.

When logged into MyNCBI, the words “Save Search” appear after you’ve done a query.

After clicking the link, a pop-up appears, letting you give your search a name:

Save your search with a unique name

You can access your saved searches from your MyNCBI page, which you can get to by clicking the “MyNCBI” link at the top right of any PubMed web page.
The popup also lets you choose to receive email updates to your search, which leads us to Tip #2.
2. Email the results of your saved queries
Save even more time in your searching by having NCBI send you regular updates to your most useful searches. New results of your saved queries can be sent to you by email daily, weekly, or monthly. You can even specify which day of the week you want to receive the results. This is a great way to keep up with topics without constantly reading journal tables of contents.
Receive Email updates to your search
You can modify these settings later by going to your MyNCBI settings page, which you can get to by clicking the “MyNCBI” link at the top right of any PubMed web page.

Continue Reading »

Science
Web

Find Half-Remembered Journal Articles with Single Citation Matcher

Single Citation Matcher in Sidebar

Do you ever remember reading a paper, but can’t remember enough about it to find it easily in a normal PubMed search? Do you have a paper copy of an article that lacks full bibliographic information?

PubMed offers Single Citation Matcher, which provides a great interactive interface for finding that article you need, and it’s available from the sidebar of all PubMed pages.

Basically, you start typing in any bibliographic information you have: the journal name, authors, dates, and/or title words. The interface even has autocompletion for the journal name and author names, based on number of citations.
PubMed even offers a helpful animated tutorial.

Science
Web

10 Bookmarklets to Quickly Search NCBI Resources

Spend a lot of time using the web interface to NCBI BLAST or other NCBI services? Want to minimize the time you spend opening BLAST or PubMed search pages? Bookmarklets are a handy way to quickly search for information. Save these bookmarklets for easy use by dragging any of the links below to your bookmarks bar, as pictured here:
Bookmarklets in the Bookmark Bar

Then, there are two ways to use the bookmarklet:

  • Highlight text within a webpage, then click the bookmarklet, and an NCBI search will be started using the selected text as the search parameters.
  • Click the bookmarklet and a pop-up window will appear, into which you can paste your search term(s) or sequence. Click “OK” and the specified search will load in your current window.

  1. Search PubMed–The starting point for any biomedical scientific literature search.
  2. Search PubMed Central–A great tool to find articles that may only mention your term of interest in the discussion, methods, or figures of the paper.
  3. BLASTN–Find nucleotide matches to your nucleotide sequence.
  4. BLASTP–Find protein matches to your protein.
  5. BLASTX–Find protein matches in GenBank to the six-frame translation of your nucleotide sequence.
  6. Search All Entrez–Find all references to a set of terms in all NCBI Entrez databases.
  7. Search Entrez Protein–Use accession numbers, protein functions, and other keywords to find proteins of interest.
  8. Search Entrez Nucleotide–Use accession numbers, gene identifiers, and other terms to find nucleotide sequences.
  9. Search Entrez Genome–Use a variety of terms to find genomic sequences, integrated genetic and physical maps, and sequence maps.
  10. Search Entrez Gene–Search the NCBI database of genes from RefSeq genomes.
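At heart, each of these bookmarklets takes some text and drops it into a search URL. The bookmarklets themselves are JavaScript, but the idea can be sketched in Python. The URL patterns below are assumptions based on NCBI’s present-day web addresses, not the ones the original bookmarklets used, and `SEARCH_URLS` and `search_url` are names invented for this example.

```python
from urllib.parse import quote_plus

# Assumed present-day search URL patterns; they may change over time.
SEARCH_URLS = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/?term={query}",
    "nucleotide": "https://www.ncbi.nlm.nih.gov/nuccore/?term={query}",
    "protein": "https://www.ncbi.nlm.nih.gov/protein/?term={query}",
    "gene": "https://www.ncbi.nlm.nih.gov/gene/?term={query}",
}


def search_url(resource, text):
    """Turn selected or pasted text into a search URL for one resource."""
    # Collapse runs of whitespace, as highlighted web text is often messy.
    cleaned = " ".join(text.split())
    return SEARCH_URLS[resource].format(query=quote_plus(cleaned))
```

Passing the result to `webbrowser.open()` from the standard library would open the search in the default browser, which is roughly what clicking a bookmarklet does.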

Obviously, NCBI and NLM provide dozens of resources, but hopefully these search tools will be useful to you. I have tested these in Firefox 2.0 and Safari on Mac OS X 10.4.9.

These bookmarklets were adapted from those available at Jesse’s Bookmarklet Site.

OSX
Science
Web

AttachToMyWebSpace Updated

Attach to My WebSpace is an AppleScript Droplet that copies files to a user’s web folder at MyWebSpace at the University of Wisconsin-Madison so that the file is publicly available. The file is renamed in a web-friendly form (spaces removed, etc). Finally, a new message is created in Mail.app, with the URL of the uploaded file in the body of the message. This script works with Mac OS X 10.4 “Tiger” and doesn’t require any other software.
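The droplet itself is AppleScript, but the renaming and URL-building steps can be illustrated in a few lines of Python. This is only a sketch of the idea: the exact renaming rules, the function names, and the URL layout (base address, then NetID, then file name) are all assumptions made for the example, not details taken from the script.

```python
import re
from urllib.parse import quote


def web_friendly_name(filename):
    """Remove spaces and any character outside letters, digits, '.', '_', '-'."""
    name = filename.strip().replace(" ", "")
    return re.sub(r"[^A-Za-z0-9._-]", "", name)


def public_url(base_url, netid, filename):
    """Build the public link, assuming a <base>/<netid>/<file> layout."""
    cleaned = web_friendly_name(filename)
    return f"{base_url.rstrip('/')}/{quote(netid)}/{quote(cleaned)}"
```

For instance, `web_friendly_name("My Lab Notes (v2).pdf")` gives `MyLabNotesv2.pdf`, which can go into an email without any escaping.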

This updated version only asks for your NetID once, and then remembers it thereafter. To reset the stored NetID, double-click on the droplet, and enter the new NetID.

Download Attach to My WebSpace

AttachToMyWebSpace was inspired by Sender, a product of Stairways Software designed to work with Interarchy.

Leave any other questions in the comments section or send me an email.

OSX
Science
Web

Best recent science podcasts No. 2

The End of Free Will?: Has research on our minds removed choice from the marketplace?

University of Wisconsin-Madison Genetics Professor Sean B. Carroll talks about evolution and the human genome. He’s the author of a new book, “The Making of the Fittest: DNA and the Ultimate Forensic Record of Evolution.”

Science Laureates Town Hall: A discussion of the interaction of science and journalism by scientists and science journalists at Purdue University.

Podcast
Science
