In [2]:

import pandas as pd
import sqlite3

9.1 从SQL数据库中读数据¶

目前为止，我们所说的都是从CSV文件中读取数据，这是最常用的方式，不过除此之外，还有其他方式。Pandas能够从 HTML, JSON, SQL, Excel (!!!), HDF5, Stata等等中读取数据。本部分讲的是如何从数据库中读取数据。

你可以用pd.read_sql函数来从SQL数据库中读数据，这个函数会自动将表的列名转换成DataFrame的列名。

read_sql有两个参数，一个是SQL语句，另一个是数据库的连接对象。这样子分离开来意味着你可以从任何数据中读取数据，而并不在乎到底是MySQL，SQLite，PostgreSQL还是其他什么。

本例将解释如何从sqlite中读数据，不过其他数据库应该也差不多的。

In [3]:

con = sqlite3.connect("../data/weather_2012.sqlite")
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con)
df

Out[3]:

	id	date_time	temp
0	1	2012-01-01 00:00:00	-1.8
1	2	2012-01-01 01:00:00	-1.8
2	3	2012-01-01 02:00:00	-1.8

read_sql并不会自动将id列设为索引，你可以传递参数 index_col给 read_sql来指定索引列。

如果你熟悉read_csv的话，你会发现它也有个 index_col 参数，是一个意思。

In [4]:

df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con, index_col='id')
df

Out[4]:

	date_time	temp
id
1	2012-01-01 00:00:00	-1.8
2	2012-01-01 01:00:00	-1.8
3	2012-01-01 02:00:00	-1.8

如果你希望索引列包含多个字段的话，将其作为列表传递给index_col就行：

In [5]:

df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con, 
                 index_col=['id', 'date_time'])
df

Out[5]:

		temp
id	date_time
1	2012-01-01 00:00:00	-1.8
2	2012-01-01 01:00:00	-1.8
3	2012-01-01 02:00:00	-1.8

9.2 向 SQLite 写数据¶

Pandas 有一个write_frame 来写入数据到数据库中。目前还只支持SQLite数据库。我们用它来将2012年的天气数据写入到数据库中去

可以注意到还有个pd.io.sql。里面有一堆的函数用于读和写，值得你去好好花点时间探索)

In [6]:

weather_df = pd.read_csv('../data/weather_2012.csv')
con = sqlite3.connect("../data/test_db.sqlite")
con.execute("DROP TABLE IF EXISTS weather_2012")
weather_df.to_sql("weather_2012", con)

/home/tianjun/anaconda/lib/python2.7/site-packages/pandas/io/sql.py:1309: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  warnings.warn(_SAFE_NAMES_WARNING)

接下来从 test_db.sqlite中的weather_2012 表读出数据：

In [7]:

con = sqlite3.connect("../data/test_db.sqlite")
df = pd.read_sql("SELECT * from weather_2012 LIMIT 3", con)
df

Out[7]:

	index	Date/Time	Temp (C)	Dew Point Temp (C)	Rel Hum (%)	Wind Spd (km/h)	Visibility (km)	Stn Press (kPa)	Weather
0	0	2013-01-01 00:00:00	-1.0	-1.7	95	35	6.4	99.89	Snow
1	1	2013-01-01 01:00:00	-2.0	-5.1	79	35	16.1	99.93	Mainly Clear
2	2	2013-01-01 02:00:00	-2.7	-6.0	78	28	19.3	100.08	Snow

最炫的地方在于，如果你对SQL更熟的话，你可以用各种各样的SQL语句来完成数据操作，比如下面这个例子直接用SQL语句排序好结果并取前三：

In [8]:

con = sqlite3.connect("../data/test_db.sqlite")
df = pd.read_sql("SELECT * from weather_2012 ORDER BY Weather LIMIT 3", con)
df

Out[8]:

	index	Date/Time	Temp (C)	Dew Point Temp (C)	Rel Hum (%)	Wind Spd (km/h)	Visibility (km)	Stn Press (kPa)	Weather
0	3	2013-01-01 03:00:00	-5.6	-11.7	62	30	25	100.21	Clear
1	5	2013-01-01 05:00:00	-9.7	-14.8	66	33	25	100.47	Clear
2	6	2013-01-01 06:00:00	-11.1	-17.0	62	30	25	100.65	Clear

如果你使用的是PostgreSQL或者MySQL，读取数据的方式是几乎一样的。你可以通过psycopg2.connect() 或者 MySQLdb.connect()来创建一个连接，然后用

pd.read_sql("SELECT whatever from your_table", con)

9.3 连接到其他数据库¶

连接到MySQL数据库

注：你需要有一个可用的数据库，并填好正确的 host 和数据库名称以及其他相关信息

连接到PostgreSQL 数据库: