Digithead's Lab Notebook: reference

Showing posts with label reference. Show all posts

Wednesday, October 26, 2011

PostgreSQL cheat sheet

A handy cheat-sheet for the PostgreSQL database, for when I'm too lazy to dig through the docs or find another cheat-sheet.

Start and stop server

sudo su postgres -c '/opt/local/lib/postgresql/bin/postgres -D /opt/local/var/db/postgres/defaultdb'
cnt-z
bg

su -c 'pg_ctl start -D /opt/local/var/db/postgres/defaultdb -l postgreslog' postgres

To shutdown

sudo su postgres -c 'pg_ctl stop -D /opt/local/var/db/postgres/defaultdb'

Run client

psql -U postgres

sudo -u postgres psql

Commands

Postgres commands start with a backslash '\' character. Type \l to list databases and \c to connect to a database. \? shows help and \q quits. \d lists tables, views and sequences, \dt lists tables.

Granting access privileges

create database dbname;
create user joe_mamma with password 'password';
grant all privileges on database dbname to joe_mamma;
grant all privileges on all tables in schema public to joe_mamma;
grant all privileges on all sequences in schema public to joe_mamma;

See the docs for GRANT.

SQL dump and restore

pg_dump -U postgres dbname | gzip > dbname.dump.2011.10.24.gz

gunzip < dbname.dump.2011.10.24.gz | sudo -u postgres psql --dbname dbname

For more, see Backup and Restore from the Postgres manual.

Truncate

Delete all data from a table and related tables.

truncate my_table CASCADE;

Sequences

Sequences can be manipulated with currval and setval.

select currval('my_table_id_seq');
select setval('my_table_id_seq',1,false);

Trouble-shooting

If you seen an Ident authentication error...

FATAL:  Ident authentication failed for user "postgres"

... look in your pg_hba.conf file. Ask Postgres where this file is by typing, "show hba_file;".

sudo cat /etc/postgresql/9.0/main/pg_hba.conf

You might see a line that looks like this:

local  all  all      ident

What the 'ident' means is postgres uses your shell account name to log you in. Specifying the user on the command line "psql -U postgres" doesn't help. Either change "ident" in the pg_hba.conf to "md5" or "trust" and restart postgres, or just do what it wants: "sudo -u postgres psql". More on this can be found in “FATAL: Ident authentication failed”, or how cool ideas get bad usage schemas.

Saturday, October 22, 2011

Octave cheat sheet

I'm mucking about with Octave, MATLAB's open source cousin, as part of Stanford's Machine Learning class. Here are a few crib notes to keep me right side up.

The docs for Octave must be served from a Commodore 64 in Siberia judging by the speed, but Matlab's Function Reference is convenient. The Octave WikiBook covers a lot of the basics.

Matrices

Try some matrix operations. Create a 2x3 matrix, and a 3x2 matrix. Multiply them to get a 2x2 matrix. Try indexing.

>> A = [1 2 3; 4 5 6]
A =
   1   2   3
   4   5   6

>> B = 2 * ones(3,2)
B =
   2   2
   2   2
   2   2

>> size(B)
ans =
   3   2

>> A * B  % matrix multiplication
ans =
   12   12
   30   30

>> who    % list variables
A    B    ans

>> A(2,3) % get row 2, column 3
ans =  6

>> A(2,:) % get 2nd row
ans =
   4   5   6

>> A'     % A transpose
ans =
   1   4
   2   5
   3   6

>> A' .* B  % element-wise multiply
ans =
    2    8
    4   10
    6   12

Sum

sum(A,dim) is a little bass-ackwards in that the columns are dimension 1, rows are dimension 2, contrary to R and common sense.

>> sum(A,2)
ans =
    6
   15

Max

The max function operates strangely. There are at least 3 forms of max.

[C,I] = max(A)
C = max(A,B)
[C,I] = max(A,[],dim)

For max(v), if v is a vector, returns the largest element of v. If A is an m x n matrix, max(A) returns a row vector of length n holding the largest element from each column of A. You can also get the indices of the largest values in the I return value.

To get the row maximums, use the third form, with an empty vector as the second parameter. Oddly, setting dim=1 gives you the max of the columns, while dim=2 gives the row maximums.

Navigation and Reading data

Perform file operations with Unix shell type commands: pwd, ls, cd. Import and export data, like this:

>> data = csvread('ex1data1.txt');

>> load binary_file.dat

Printing output

The disp function is Octave's word for 'print'.

disp(sprintf('pi to 5 decimal places: %0.5f', pi))

Histogram

Plot a histogram for some normally distributed random numbers

>> w = -6 + sqrt(10)*(randn(1,10000))  % (mean = 1, var = 2)
>> hist(w,40)

Plotting

t = [0:0.01:0.99];
y1 = sin(2*pi*4*t); 
plot(t,y1);
y2 = cos(2*pi*2*t);
hold on;         % "hold off" to turn off
plot(t,y2,'r');
xlabel('time');
ylabel('value');
legend('sin','cos');
title('my plot');
print -dpng 'myPlot.png'
close;           % or,  "close all" to close all figs

Multiple plots in a grid.

figure(2), clf;  % select figure 2 and clear it
subplot(1,2,1);  % Divide plot into 1x2 grid, access 1st element
plot(t,y1);
subplot(1,2,2);  % Divide plot into 1x2 grid, access 2nd element
plot(t,y2);
axis([0.5 1 -1 1]);  % change axis scale

heatmap

figure;
imagesc(magic(15)), colorbar

These crib notes are based on the Octave tutorial from the ml class by Andrew Ng. Also check out the nice and quick Introduction to GNU Octave. I'm also collecting a few notes on matrix arithmetic.

Defining a function

function ret = test(a)
  ret = a + 1;
end

Peter Acklam's MATLAB array manipulation tips and tricks.
Tips for Making Octave Code Run Faster

Tuesday, October 18, 2011

Python cheat sheet

The most important docs at python.org are the tutorial and library reference.

Pointers to the docs

Important modules: sys, os, os.path, re, math, io

Inspecting objects

>>> help(obj)
>>> dir(obj)

List comprehensions and generators

numbers = [2, 3, 4, 5, 6, 7, 8, 9]
[x * x for x in numbers if x % 2 == 0]

Generators might be thought of as lazy list comprehensions.

def generate_squares():
  i = 1
  while True:
    yield i*i
    i += 1

Main method

def main():
    # do something here

if __name__ == "__main__":
    main()

See Guido's advice on main methods. To parse command line arguments use argparse instead o the older optparse or getopt.

Classes

The tutorial covers classes, but know that there are old-style classes and new-style classes.

class Foo(object):
  'Doc string for class'

  def __init__(self, a, b):
    'Doc string for constructor'
    self.a = a
    self.b = b
  
  def square_plus_a(self, x):
    'Doc string for a useless method'
    return x * x + a
    
  def __str__(self):
    return "Foo: a=%d, b=%d" % (self.a, self.b)

Preoccupation with classes is a bit passé these days. Javascript objects are just bags of properties to which you can add arbitrary properties whenever you feel like it. In Ruby, you might use OpenStruct. It's quite easy in Python. You just have to define your own class. I'll follow the convention I've seen elsewhere of creating an empty class called Object derived from the base object. Why you can't set attributes on an object instance is something I'll leave to the Python gurus.

class Object(object):
    pass

obj = MyEmptyClass()
obj.foo = 123
obj.bar = "A super secret message
dir(obj)
['__doc__', '__module__', 'bar', 'foo']

You can add methods, too, but they act a little funny. Self doesn't seem to work.

Files

Reading text files line by line can be done like so:

with open(filename, 'r') as f:
    for line in f:
        dosomething(line)

Be careful not to mix iteration over lines in a file with readline().

Exceptions

try:
  raise Exception('My spammy exception!', 1234, 'zot')
except Exception as e:
  print type(e)
  print e
finally:
  print "cleanup in finally clause!"

Traceback prints stack traces.

Logging

import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

Conditional Expressions

Finally added in Python 2.5:

x = true_value if condition else false_value

Packages, Libraries, Modules

What do they call them in Python? Here's a quick tip for finding out where installed packages are:

python -c 'import sys, pprint; pprint.pprint(sys.path)'

To find out what packages are installed, open a python shell and type:

help('modules')

Magic methods

Python's magic methods are the source of much confusion. Rafe Kettler's Guide to Python's Magic Methods sorts things out beautifully.

Saturday, October 02, 2010

CouchDB and R

Here are some quick crib notes on getting R talking to CouchDB using Couch's ReSTful HTTP API. We'll do it in two different ways. First, we'll construct HTTP calls with RCurl, then move on to the R4CouchDB package for a higher level interface. I'll assume you've already gotten started with CouchDB and are familiar with the basic ReST actions: GET PUT POST and DELETE.

First install RCurl and RJSONIO. You'll have to download the tar.gz's if you're on a Mac. For the second part, we'll need to install R4CouchDB, which depends on the previous two. I checked it out from GitHub and used R CMD INSTALL.

ReST with RCurl

Ping server

getURL("http://localhost:5984/")
[1] "{\"couchdb\":\"Welcome\",\"version\":\"1.0.1\"}\n"

That's nice, but we want to get the result back as a real R data structure. Try this:

welcome <- fromJSON(getURL("http://localhost:5984/"))
welcome$version
[1] "1.0.1"

Sweet!

PUT

One way to add a new record is with http PUT.

bozo = list(name="Bozo", occupation="clown", shoe.size=100)
getURL("http://localhost:5984/testing123/bozo",
       customrequest="PUT",
       httpheader=c('Content-Type'='application/json'),
       postfields=toJSON(bozo))
[1] "{\"ok\":true,\"id\":\"bozo\",\"rev\":\"1-70f5f59bf227d2d715c214b82330c9e5\"}\n"

Notice that RJSONIO has no high level PUT method, so you have to fake it using the costumrequest parameter. I'd never have figured that out without an example from R4CouchDB's source. The API of libCurl is odd, I have to say, and RCurl mostly just reflects it right into R.

If you don't like the idea of sending a put request with a get function, you could use RCurl's curlPerform. Trouble is, curlPerform returns an integer status code rather than the response body. You're supposed to provide an R function to collect the response body text. Not really worth the bother, unless you're getting into some of the advanced tricks described in the paper, R as a Web Client - the RCurl package.

bim <-  list(
  name="Bim", 
  occupation="clown",
  tricks=c("juggling", "pratfalls", "mocking Bolsheviks"))
reader = basicTextGatherer()
curlPerform(
  url = "http://localhost:5984/testing123/bim",
  httpheader = c('Content-Type'='application/json'),
  customrequest = "PUT",
  postfields = toJSON(bim),
  writefunction = reader$update
)
reader$value()

GET

Now that there's something in there, how do we get it back? That's super easy.

bozo2 <- fromJSON(getURL("http://localhost:5984/testing123/bozo"))
bozo2
$`_id`
[1] "bozo"

$`_rev`
[1] "1-646331b58ee010e8df39b5874b196c02"

$name
[1] "Bozo"

$occupation
[1] "clown"

$shoe.size
[1] 100

PUT again for updating

Updating is done by using PUT on an existing document. For example, let's give Bozo, some mad skillz:

getURL(
  "http://localhost:5984/testing123/bozo",
  customrequest="PUT",
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bozo2))

POST

If you POST to the database, you're adding a document and letting CouchDB assign its _id field.

bender = list(
  name='Bender',
  occupation='bending',
  species='robot')
response <- fromJSON(getURL(
  'http://localhost:5984/testing123/',
  customrequest='POST',
  httpheader=c('Content-Type'='application/json'),
  postfields=toJSON(bender)))
response
$ok
[1] TRUE

$id
[1] "2700b1428455d2d822f855e5fc0013fb"

$rev
[1] "1-d6ab7a690acd3204e0839e1aac01ec7a"

DELETE

For DELETE, you pass the doc's revision number in the query string. Sorry, Bender.

response <- fromJSON(getURL("http://localhost:5984/testing123/2700b1428455d2d822f855e5fc0013fb?rev=1-d6ab7a690acd3204e0839e1aac01ec7a",
  customrequest="DELETE"))

CRUD with R4CouchDB

R4CouchDB provides a layer on top of the techniques we've just described.

R4CouchDB uses a slightly strange idiom. You pass a cdb object, really just a list of parameters, into every R4CouchDB call and every call returns that object again, maybe modified. Results are returned in cdb$res. Maybe, they did this because R uses pass by value. Here's how you would initialize the object.

cdb <- cdbIni()
cdb$serverName <- "localhost"
cdb$port <- 5984
cdb$DBName="testing123"

Create

fake.data <- list(
  state='WA',
  population=6664195,
  state.bird='Lady GaGa')
cdb$dataList <- fake.data
cdb$id <- 'fake.data'  ## optional, otherwise an ID is generated
cdb <- cdbAddDoc(cdb)

cdb$res
$ok
[1] TRUE

$id
[1] "fake.data"

$rev
[1] "1-14bc025a194e310e79ac20127507185f"

Read

cdb$id <- 'bozo'
cdb <- cdbGetDoc(cdb)

bozo <- cdb$res
bozo
$`_id`
[1] "bozo"
... etc.

Update

First we take the document id and rev from the existing document. Then, save our revised document back to the DB.

cdb$id <- bozo$`_id`
cdb$rev <- bozo$`_rev`
bozo = list(
  name="Bozo",
  occupation="assassin",
  shoe.size=100,
  skills=c(
    'pranks',
    'honking nose',
    'kung fu',
    'high explosives',
    'sniper',
    'lock picking',
    'safe cracking'))
cdb <- cdbUpdateDoc(bozo)

Delete

Shortly thereafter, Bozo mysteriously disappeared.

cdb$id = bozo$`_id`
cdb <- cdbDeleteDoc(cdb)

Wednesday, April 14, 2010

HTML CSS and JavaScript References

Here's a place for web and html related reference material.

CSS margin and padding

top, right, bottom, left

HTML Entities

Result	Description	Entity Name	Entity Number
	non-breaking space
<	less than	<	<
>	greater than	>	>
&	ampersand	&	&
"	quotation mark	"	"
'	apostrophe	'	'
“	left double quote	“	&#147 / “
”	right double quote	”	&#148 / ”
×	multiplication	×	×
÷	division	÷	÷
©	copyright	©	©

HTML template

<html>

<head>

<title></title>

<link rel="stylesheet" type="text/css" href="style.css" />

</head>

<body>

<h1>Example Page</h1>
<p></p>

</body>
</html>

Style

<link rel="stylesheet" type="text/css" href="style.css" />

<style type="text/css">
.myborder
{
border:1px solid black;
}
</style>

Table

<h2>Table</h2>

<table>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</table>

List

<h2>List</h2>

<ul>
<li><a href=""></a></li>
<li><a href=""></a></li>
</ul>

Sunday, July 26, 2009

Select operations on R data frames

The R language is weird - particularly for those coming from a typical programmer's background, which likely includes OO languages in the curly-brace family and relational databases using SQL. A key data structure in R, the data.frame, is used something like a table in a relational database. In terms of R's somewhat byzantine type system (which is explained nicely here), a data.frame is a list of vectors of varying types. Each vector is a column in the data.frame making this a column-oriented data structure as opposed to the row-oriented nature of relational databases.

In spite of this difference, we often want to do the same sorts of things to an R data.frame that we would to a SQL table. The R docs confuse the SQL-savvy by using different terminology, so here is a quick crib-sheet for applying SQL concepts to data.frames.

We're going to use a sample data.frame with the following configuration of columns, or schema, if you prefer: (sequence:factor, strand:factor, start:integer, end:integer, common_name:character, value:double) where the type character is a string and a factor is something like an enum. Well, more accurately, value is a vector of type double and so forth. Anyway, our example is motivated by annotation of genome sequences, but the techniques aren't particular to any type of data.

> head(df)
    sequence strand start   end common_name      value
1 chromosome      +  1450  2112        yvrO  0.9542516
2 chromosome      + 41063 41716       graD6  0.2374012
3 chromosome      + 62927 63640       graD3  1.0454790
4 chromosome      + 63881 64807         gmd  1.4383845
5 chromosome      + 71811 72701        moaE -1.8739953
6 chromosome      + 73639 74739        moaA  1.2711058

So, given a data.frame of that schema, how do we do some simple select operations?

Selecting columns by name is easy:

> df[,c('sequence','start','end')]
       sequence   start     end
1    chromosome    1450    2112
2    chromosome   41063   41716
3    chromosome   62927   63640
4    chromosome   63881   64807
5    chromosome   71811   72701
...

As is selecting row names, or both:

> df[566:570,c('sequence','start','end')]
      sequence  start    end
566 chromosome 480999 479860
567 chromosome 481397 480999
568 chromosome 503053 501275
569 chromosome 506476 505712
570 chromosome 515461 514277

Selecting rows that meet certain criteria is a lot like a SQL where clause:

> df[df$value>3.0,]
      sequence strand   start     end common_name    value
199 chromosome      +  907743  909506        hutU 3.158821
321 chromosome      + 1391811 1393337        nadB 3.092771
556 chromosome      -  431600  431037         apt 3.043373
572 chromosome      -  519043  518186        hbd1 3.077040

For extra bonus points, let's find tRNAs.

> df[grep("trna", df$common_name, ignore.case=T),]
      sequence strand   start     end common_name        value
18  chromosome      +  115152  115224    Asn tRNA -0.461038128
19  chromosome      +  115314  115422    Ile tRNA -0.925268307
31  chromosome      +  167315  167388    Tyr tRNA  0.112527023
32  chromosome      +  191112  191196    Ser tRNA  0.986357577
...

Duplicate row names

Row names are not necessarily unique in R, which breaks the method shown above for selecting by row name. Take matrix a:

< a = matrix(1:18, nrow=6, ncol=3)
< rownames(a) <- c('a', 'a', 'a', 'b', 'b', 'b')
< colnames(a) <- c('foo', 'bar', 'bat')
< a
  foo bar bat
a   1   7  13
a   2   8  14
a   3   9  15
b   4  10  16
b   5  11  17
b   6  12  18

It looks to me like trying to index by the row names just returns the first row of a given name:

< a['a',]
foo bar bat 
  1   7  13
< a['b',]
foo bar bat 
  4  10  16

But this works:

< a[rownames(a)=='a',]
  foo bar bat
a   1   7  13
a   2   8  14
a   3   9  15

More Resources:

Help for R, the R language, or the R project is notoriously hard to search for, so I like to stick in a few extra keywords, like R, data frames, data.frames, select subset, subsetting, selecting rows from a data.frame that meet certain criteria, and find.

...more on R.

Wednesday, April 29, 2009

MySQL cheat-sheet

Well, now that MySQL is Oracle's SQL, I dunno how long this information will remain useful. But, here it goes:

Start

If you've installed MySQL by HomeBrew:

mysql.server start

otherwise...

sudo /usr/local/mysql/bin/mysqld_safe
cnt-z, bg

Set root password

mysqladmin -u root -pcurrentpassword password 'newpassword'

Console

mysql -p -u root

Create DB

create database foo;
use foo;

Create User

create user 'bar'@'localhost' identified by 'some_pass';
grant all privileges on foo.* to 'bar'@'localhost';
grant all privileges on foo.* to 'bar'@'localhost' with grant option;

Show Users

select host, user, password from mysql.user;

Shutdown

mysqladmin -p -u root shutdown

Load data from a table

LOAD DATA infile '/temp/myfile.tsv' INTO TABLE my_table IGNORE 1 lines;

You might get ERROR 13 (HY000): Can't get stat of ... caused by permissions. I get around it by giving full permissions to the file and its parent directory. See man stat for more.

Dump and restore data

mysqldump -p -u [user] [dbname] | gzip > [filename]
gunzip < [filename] | mysql -p -u [user] [dbname]

Docs for the mysqladmin tool and other client programs. SQL syntax docs for create table and select, insert, update, and delete.

BTW, where a server isn't needed, I'm starting to like SQLite a lot.

Wednesday, March 25, 2009

User directory structures

Some OS's call them home or user home directories, document directories. Here's a quick guide to how different operating systems typically organize user directories.

Directory Type

Directory path returned for each operating system

User Home

Windows XP:	C:\Documents and Settings\<user>
Windows Vista:	C:\Users\<user>
Mac OSX:	/Users/<user>
Linux:	/home/<user>

User Desktop

Windows XP:	C:\Documents and Settings\<user>\Desktop
Windows Vista:	C:\Users\<user>\Desktop
Mac OSX:	/Users/<user>/Desktop
Linux:	/home/<user>/Desktop

User Documents

Windows XP:	C:\Documents and Settings\<user>\My Documents
Windows Vista:	C:\Users\<user>\Documents
Mac OSX:	/Users/<user>/Documents
Linux:	/home/<user>

User Application Data

Windows XP:	C:\Documents and Settings\<user>\Local Settings\Application Data
Windows Vista:	C:\Users\<user>\AppData\Local
Mac OSX:	/Users/<user>/Library/Application Support
Linux:	/home/<user>

User Preferences

Windows XP:	C:\Documents and Settings\<user>\Local Settings\Application Data
Windows Vista:	C:\Users\<user>\AppData\Local
Mac OSX:	/Users/<user>/Library/Preferences
Linux:	/home/<user>

Public Documents

Windows XP:	C:\Documents and Settings \All Users\Documents
Windows Vista:	C:\Users\Public\Documents
Mac OSX:	/Library/Application Support
Linux:	/usr/local/

Public Application Data

Windows XP:	C:\Documents and Settings\ All Users\Application Data
Windows Vista:	C:\ProgramData
Mac OSX:	/Library/Application Support
Linux:	/usr/local/

Public Preferences

Windows XP:	C:\Documents and Settings\ All Users\Application Data
Windows Vista:	C:\ProgramData
Mac OSX:	/Library/Preferences
Linux:	/etc

System Libraries

Windows XP:	C:\Windows\System32
Windows Vista:	C:\Windows\System32
Mac OSX:	/Library/Frameworks
Linux:	/usr/lib

Application Files

Windows XP:	C:\Program Files
Windows Vista:	C:\Program Files
Mac OSX:	/Applications
Linux:	/usr/local/

Volume Root

Windows XP:	C:\
Windows Vista:	C:\
Mac OSX:	/
Linux:	/

Temp

Windows XP:	C:\Documents and Settings \<user>\Local Settings\Temp
Windows Vista:	C:\Users\<user>\AppData \Local\Temp
Mac OSX:	/private/tmp/folders.501 /TemporaryItems
Linux:	/tmp

Thanks to National Instruments for the data.

See Wikipedia entries Home directory and My Documents. If you're in Java land, values of the os.name property are documented here and here.