
I'm uploading potentially large files to a web server. Currently I'm doing this:

import urllib2

f = open('somelargefile.zip', 'rb')
request = urllib2.Request(url, f.read())
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

However, this reads the entire file's contents into memory before posting it. How can I have it stream the file to the server?


6 Answers


Reading through the mailing list thread linked to by systempuntoout, I found a clue to the solution.

The mmap module allows you to open a file so that it acts like a string. Parts of the file are loaded into memory on demand.

Here's the code I'm using now:

import urllib2
import mmap

# Open the file as a memory-mapped string. It looks like a string,
# but actually reads the file behind the scenes, on demand.
f = open('somelargefile.zip', 'rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Do the request
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

# Close everything
mmapped_file_as_string.close()
f.close()

1 Comment

Could you please confirm that the line below is correct: request = urllib2.Request(url, mmapped_file_as_string)

The documentation doesn't say you can do this, but the code in urllib2 (and httplib) accepts any object with a read() method as data. So using an open file seems to do the trick.

You'll need to set the Content-Length header yourself. If it's not set, urllib2 will call len() on the data, which file objects don't support.

import os.path
import urllib2

data = open(filename, 'rb')  # binary mode, since this is a raw upload
headers = {'Content-Length': str(os.path.getsize(filename))}
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)

This is the relevant code that handles the data you supply. It's from the HTTPConnection class in httplib.py in Python 2.7:

def send(self, data):
    """Send `data' to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)

2 Comments

Note that urllib2.urlopen() doesn't take headers as a parameter, which is why they're passed through a Request object here. I have provided working code in an answer below.
Is this possible with the requests module? I have to send files in chunks (10 MB), but I don't want to read the whole 10 MB into memory; I want to read a few bytes (8192) at a time and hand them to requests until the full 10 MB is sent.
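
On that last question: requests switches to chunked transfer encoding when you pass a generator as the body, so only one block is ever held in memory. Here is a minimal sketch, assuming the server accepts chunked requests; the URL, filename, and block size are illustrative:

import requests

def read_in_chunks(fp, blocksize=8192):
    # Yield successive blocks from an open file until EOF.
    while True:
        block = fp.read(blocksize)
        if not block:
            break
        yield block

with open('somelargefile.zip', 'rb') as f:
    # A generator has no len(), so requests sends the body with
    # Transfer-Encoding: chunked, one block at a time.
    response = requests.post('http://example.com/upload', data=read_in_chunks(f))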

Have you tried Mechanize?

from mechanize import Browser

br = Browser()
br.open(url)
br.select_form(nr=0)  # a form must be selected before add_file() can be used
br.form.add_file(open('largefile.zip', 'rb'), 'application/zip', 'largefile.zip')
br.submit()

or, if you don't want to use multipart/form-data, check this old post.

It suggests two options:

  1. Use mmap, a memory-mapped file object
  2. Patch httplib.HTTPConnection.send (sketched below)
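
For Pythons older than 2.7, option #2 amounts to replacing httplib.HTTPConnection.send with a version that streams read()able objects, mirroring what 2.7 already does (see the send() listing in the answer above). A rough sketch, not a tested drop-in patch:

from array import array
import httplib

def _streaming_send(self, data):
    # Same connection handling as the stock send().
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise httplib.NotConnected()
    # Stream anything with a read() method in 8192-byte blocks.
    # array objects also have a read() method, hence the exclusion.
    if hasattr(data, 'read') and not isinstance(data, array):
        blocksize = 8192
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)

httplib.HTTPConnection.send = _streaming_send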

2 Comments

I don't want to send the files encoded as multipart/form-data, which this seems to do. I'm just looking for a raw POST.
On Python 2.7, option #2 has already been patched in, and the block size is 8192. I wonder why. What's the norm/standard for this?

Try pycurl. I don't have anything set up that will accept a large file outside of a multipart/form-data POST, but here's a simple example that reads the file as needed.

import os
import pycurl

class FileReader:
    """Feeds an open file to pycurl one block at a time."""
    def __init__(self, fp):
        self.fp = fp
    def read_callback(self, size):
        return self.fp.read(size)

c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.UPLOAD, 1)  # upload mode: the body is pulled from READFUNCTION
c.setopt(pycurl.READFUNCTION, FileReader(open(filename, 'rb')).read_callback)
filesize = os.path.getsize(filename)
c.setopt(pycurl.INFILESIZE, filesize)
c.perform()
c.close()

1 Comment

Thanks JimB. I'd have used this, except a few people use this on Windows, and I don't want them to have to install anything else.

Using the requests library you can do

import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

as mentioned here in their docs

1 Comment

The 8K block size still applies, since httplib.py's send() (L#869) is called.

Below is a working example for both Python 2 and Python 3:

import os

try:
    from urllib2 import urlopen, Request          # Python 2
except ImportError:
    from urllib.request import urlopen, Request   # Python 3

headers = {'Content-Length': str(os.path.getsize(filepath))}
with open(filepath, 'rb') as f:
    req = Request(url, data=f, headers=headers)
    result = urlopen(req).read().decode()

The requests module is great, but sometimes you cannot install any extra modules...

