New upstream version 2.2.0
Gianfranco Costamagna
0 | s3cmd-2.2.0 - 2021-09-27 | |
1 | =============== | |
2 | * Added support for metadata modification of files bigger than 5 GiB | |
3 | * Added support for remote copy of files bigger than 5 GiB using MultiPart copy (Damian Martinez, Florent Viard) | |
4 | * Added progress info output for multipart copy and current-total info in output for cp, mv and modify | |
5 | * Added support for all special/foreign character names in object names to cp/mv/modify | |
6 | * Added support for SSL authentication (Aleksandr Chazov) | |
7 | * Added the HTTP error 429 to the list of retryable errors (#1096) | |
8 | * Added support for listing and resuming of multipart uploads of more than 1000 parts (#346) | |
9 | * Added time based expiration for idle pool connections in order to avoid random broken pipe errors (#1114) | |
10 | * Added support for STS webidentity authentication (i.e. AssumeRole and AssumeRoleWithWebIdentity) (Samskeyti, Florent Viard) | |
11 | * Added support for custom headers to the mb command (#1197) (Sébastien Vajda) | |
12 | * Improved MultiPart copy to preserve acl and metadata of objects | |
13 | * Improved the server errors catching and reporting for cp/mv/modify commands | |
14 | * Improved resiliency against servers sending garbage responses (#1088, #1090, #1093) | |
15 | * Improved remote copy to have consistent copy of metadata in all cases: multipart or not, aws or not | |
16 | * Improved security by revoking public-write acl when private acl is set (#1151) (ruanzitao) | |
17 | * Improved speed when running on an EC2 instance (#1117) (Patrick Allain) | |
18 | * Reduced connection_max_age to 5s to avoid broken pipes as AWS closes https conns after around 6s (#1114) | |
19 | * Ensured that KeyboardInterrupt is always properly raised (#1089) | |
20 | * Changed size of multipart copy chunks to 1 GiB | |
21 | * Fixed ValueError when using more than one ":" inside add_header in config file (#1087) | |
22 | * Fixed extra label issue when stdin used as source of a MultiPart upload | |
23 | * Fixed remote copy to allow changing the mime-type (i.e. content-type) of the new object | |
24 | * Fixed remote_copy to ensure that meta-s3cmd-attrs will be set based on the real source and not on the copy source | |
25 | * Fixed deprecation warnings due to invalid escape sequences (Karthikeyan Singaravelan) | |
26 | * Fixed getbucketinfo that was broken when the bucket lifecycle uses the filter element (Liu Lan) | |
27 | * Fixed RestoreRequest XML namespace URL (#1203) (Akete) | |
28 | * Fixed PARTIAL exit code that was not properly set when needed for object_get (#1190) | |
29 | * Fixed a possible infinite loop when a file is truncated during hashsum or upload (#1125) (Matthew Krokosz, Florent Viard) | |
30 | * Fixed a wrong error in report_exception when the LANG env var was not set (#1113) | |
31 | * Fixed wrong wiki url in error messages (Alec Barrett) | |
32 | * Py3: Fixed an AttributeError when using the "files-from" option | |
33 | * Py3: Fixed compatibility issues due to the removal of getchildren() from ElementTree in python 3.9 (#1146, #1157, #1162, #1182, #1210) (Ondřej Budai) | |
34 | * Py3: Fixed compatibility issues due to the removal of encodestring() in python 3.9 (#1161, #1174) (Kentaro Kaneki) | |
35 | * Fixed a crash when the AWS_ACCESS_KEY env var is set but not AWS_SECRET_KEY (#1201) | |
36 | * Cleanup of check_md5 (Riccardo Magliocchetti) | |
37 | * Removed legacy code for dreamhost that should not be necessary anymore | |
38 | * Migrated CI tests to use github actions (Arnaud Jaffre) | |
39 | * Improved README with a link to INSTALL.md (Sia Karamalegos) | |
40 | * Improved help content (Dmitrii Korostelev, Roland Van Laar) | |
41 | * Improvements for setup and build configurations | |
42 | * Many other bug fixes | |
43 | ||
44 | ||
0 | 45 | s3cmd-2.1.0 - 2020-04-07 |
1 | 46 | =============== |
2 | 47 | * Changed size reporting using k instead of K as it is a multiple of 1024 (#956) |
0 | 0 | Metadata-Version: 1.2 |
1 | 1 | Name: s3cmd |
2 | Version: 2.1.0 | |
2 | Version: 2.2.0 | |
3 | 3 | Summary: Command line tool for managing Amazon S3 and CloudFront services |
4 | 4 | Home-page: http://s3tools.org |
5 | 5 | Author: Michal Ludvig |
19 | 19 | Authors: |
20 | 20 | -------- |
21 | 21 | Florent Viard <florent@sodria.com> |
22 | ||
22 | 23 | Michal Ludvig <michal@logix.cz> |
24 | ||
23 | 25 | Matt Domsch (github.com/mdomsch) |
24 | 26 | |
25 | 27 | Platform: UNKNOWN |
0 | 0 | ## S3cmd tool for Amazon Simple Storage Service (S3) |
1 | 1 | |
2 | [![Build Status](https://travis-ci.org/s3tools/s3cmd.svg?branch=master)](https://travis-ci.org/s3tools/s3cmd) | |
2 | [![Build Status](https://github.com/s3tools/s3cmd/actions/workflows/test.yml/badge.svg)](https://github.com/s3tools/s3cmd/actions/workflows/test.yml) | |
3 | 3 | |
4 | 4 | * Author: Michal Ludvig, michal@logix.cz |
5 | 5 | * [Project homepage](http://s3tools.org) |
14 | 14 | |
15 | 15 | S3cmd requires Python 2.6 or newer. |
16 | 16 | Python 3+ is also supported starting with S3cmd version 2. |
17 | ||
18 | See [installation instructions](https://github.com/s3tools/s3cmd/blob/master/INSTALL.md). | |
17 | 19 | |
18 | 20 | |
19 | 21 | ### What is S3cmd |
196 | 198 | |
197 | 199 | As you can see we didn't have to create the `/somewhere` 'directory'. In fact it's only a filename prefix, not a real directory and it doesn't have to be created in any way beforehand. |
198 | 200 | |
199 | In stead of using `put` with the `--recursive` option, you could also use the `sync` command: | |
201 | Instead of using `put` with the `--recursive` option, you could also use the `sync` command: | |
200 | 202 | |
201 | 203 | ``` |
202 | 204 | $ s3cmd sync dir1 dir2 s3://public.s3tools.org/somewhere/ |
8 | 8 | from __future__ import absolute_import, print_function |
9 | 9 | |
10 | 10 | import sys |
11 | from .Utils import getTreeFromXml, deunicodise, encode_to_s3, decode_from_s3 | |
11 | from .BaseUtils import getTreeFromXml, encode_to_s3, decode_from_s3 | |
12 | from .Utils import deunicodise | |
12 | 13 | |
13 | 14 | try: |
14 | 15 | import xml.etree.ElementTree as ET |
40 | 41 | |
41 | 42 | def isAnonRead(self): |
42 | 43 | return self.isAllUsers() and (self.permission == "READ" or self.permission == "FULL_CONTROL") |
44 | ||
45 | def isAnonWrite(self): | |
46 | return self.isAllUsers() and (self.permission == "WRITE" or self.permission == "FULL_CONTROL") | |
43 | 47 | |
44 | 48 | def getElement(self): |
45 | 49 | el = ET.Element("Grant") |
126 | 130 | return True |
127 | 131 | return False |
128 | 132 | |
133 | def isAnonWrite(self): | |
134 | for grantee in self.grantees: | |
135 | if grantee.isAnonWrite(): | |
136 | return True | |
137 | return False | |
138 | ||
129 | 139 | def grantAnonRead(self): |
130 | 140 | if not self.isAnonRead(): |
131 | 141 | self.appendGrantee(GranteeAnonRead()) |
132 | 142 | |
133 | 143 | def revokeAnonRead(self): |
134 | 144 | self.grantees = [g for g in self.grantees if not g.isAnonRead()] |
145 | ||
146 | def revokeAnonWrite(self): | |
147 | self.grantees = [g for g in self.grantees if not g.isAnonWrite()] | |
135 | 148 | |
136 | 149 | def appendGrantee(self, grantee): |
137 | 150 | self.grantees.append(grantee) |
147 | 160 | elif grantee.permission.upper() == permission: |
148 | 161 | return True |
149 | 162 | |
150 | return False; | |
163 | return False | |
151 | 164 | |
152 | 165 | def grant(self, name, permission): |
153 | 166 | if self.hasGrant(name, permission): |
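
The isAnonWrite()/revokeAnonWrite() pair added above is what lets s3cmd revoke public-write grants when a private ACL is applied (#1151 in the changelog). A minimal standalone sketch of that logic, with a hypothetical SimpleGrantee standing in for s3cmd's real Grantee class:

```
# Minimal sketch of the anonymous-write revocation added in 2.2.0.
# SimpleGrantee/SimpleACL are hypothetical stand-ins, not s3cmd classes.
class SimpleGrantee(object):
    ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"

    def __init__(self, uri, permission):
        self.uri = uri
        self.permission = permission

    def isAllUsers(self):
        return self.uri == self.ALL_USERS_URI

    def isAnonWrite(self):
        # Mirrors Grantee.isAnonWrite() above
        return self.isAllUsers() and self.permission in ("WRITE", "FULL_CONTROL")


class SimpleACL(object):
    def __init__(self, grantees):
        self.grantees = grantees

    def revokeAnonWrite(self):
        # Mirrors ACL.revokeAnonWrite(): drop every public-write grant
        self.grantees = [g for g in self.grantees if not g.isAnonWrite()]


acl = SimpleACL([SimpleGrantee(SimpleGrantee.ALL_USERS_URI, "WRITE"),
                 SimpleGrantee("owner-id", "FULL_CONTROL")])
acl.revokeAnonWrite()
assert len(acl.grantees) == 1  # only the owner grant remains
```
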
11 | 11 | |
12 | 12 | from . import S3Uri |
13 | 13 | from .Exceptions import ParameterError |
14 | from .Utils import getTreeFromXml, decode_from_s3 | |
14 | from .BaseUtils import getTreeFromXml, decode_from_s3 | |
15 | 15 | from .ACL import GranteeAnonRead |
16 | 16 | |
17 | 17 | try: |
0 | # -*- coding: utf-8 -*- | |
1 | ||
2 | ## Amazon S3 manager | |
3 | ## Author: Michal Ludvig <michal@logix.cz> | |
4 | ## http://www.logix.cz/michal | |
5 | ## License: GPL Version 2 | |
6 | ## Copyright: TGRMN Software and contributors | |
7 | ||
8 | from __future__ import absolute_import, division | |
9 | ||
10 | import re | |
11 | import sys | |
12 | ||
13 | from calendar import timegm | |
14 | from logging import debug, warning, error | |
15 | ||
16 | import xml.dom.minidom | |
17 | import xml.etree.ElementTree as ET | |
18 | ||
19 | from .ExitCodes import EX_OSFILE | |
20 | ||
21 | try: | |
22 | import dateutil.parser | |
23 | except ImportError: | |
24 | sys.stderr.write(u""" | |
25 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
26 | ImportError trying to import dateutil.parser. | |
27 | Please install the python dateutil module: | |
28 | $ sudo apt-get install python-dateutil | |
29 | or | |
30 | $ sudo yum install python-dateutil | |
31 | or | |
32 | $ pip install python-dateutil | |
33 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
34 | """) | |
35 | sys.stderr.flush() | |
36 | sys.exit(EX_OSFILE) | |
37 | ||
38 | try: | |
39 | from urllib import quote | |
40 | except ImportError: | |
41 | # python 3 support | |
42 | from urllib.parse import quote | |
43 | ||
44 | try: | |
45 | unicode | |
46 | except NameError: | |
47 | # python 3 support | |
48 | # In python 3, unicode -> str, and str -> bytes | |
49 | unicode = str | |
50 | ||
51 | ||
52 | __all__ = [] | |
53 | ||
54 | ||
55 | RE_S3_DATESTRING = re.compile(r'\.[0-9]*(?:[Z\-\+]*?)') | |
56 | RE_XML_NAMESPACE = re.compile(br'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE) | |
57 | ||
58 | ||
59 | # Date and time helpers | |
60 | ||
61 | ||
62 | def dateS3toPython(date): | |
63 | # Reset milliseconds to 000 | |
64 | date = RE_S3_DATESTRING.sub(".000", date) | |
65 | return dateutil.parser.parse(date, fuzzy=True) | |
66 | __all__.append("dateS3toPython") | |
67 | ||
68 | ||
69 | def dateS3toUnix(date): | |
70 | ## NOTE: This is timezone-aware and returns the timestamp relative to GMT | |
71 | return timegm(dateS3toPython(date).utctimetuple()) | |
72 | __all__.append("dateS3toUnix") | |
73 | ||
74 | ||
75 | def dateRFC822toPython(date): | |
76 | """ | |
77 | Convert a string formatted like '2020-06-27T15:56:34Z' into a python datetime | |
78 | """ | |
79 | return dateutil.parser.parse(date, fuzzy=True) | |
80 | __all__.append("dateRFC822toPython") | |
81 | ||
82 | ||
83 | def dateRFC822toUnix(date): | |
84 | return timegm(dateRFC822toPython(date).utctimetuple()) | |
85 | __all__.append("dateRFC822toUnix") | |
86 | ||
87 | ||
88 | def formatDateTime(s3timestamp): | |
89 | date_obj = dateutil.parser.parse(s3timestamp, fuzzy=True) | |
90 | return date_obj.strftime("%Y-%m-%d %H:%M") | |
91 | __all__.append("formatDateTime") | |
92 | ||
93 | ||
94 | # Encoding / Decoding | |
95 | ||
96 | ||
97 | def base_unicodise(string, encoding='UTF-8', errors='replace', silent=False): | |
98 | """ | |
99 | Convert 'string' to Unicode or raise an exception. | |
100 | """ | |
101 | if type(string) == unicode: | |
102 | return string | |
103 | ||
104 | if not silent: | |
105 | debug("Unicodising %r using %s" % (string, encoding)) | |
106 | try: | |
107 | return unicode(string, encoding, errors) | |
108 | except UnicodeDecodeError: | |
109 | raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) | |
110 | __all__.append("base_unicodise") | |
111 | ||
112 | ||
113 | def base_deunicodise(string, encoding='UTF-8', errors='replace', silent=False): | |
114 | """ | |
115 | Convert unicode 'string' to <type str>, by default replacing | |
116 | all invalid characters with '?' or raise an exception. | |
117 | """ | |
118 | if type(string) != unicode: | |
119 | return string | |
120 | ||
121 | if not silent: | |
122 | debug("DeUnicodising %r using %s" % (string, encoding)) | |
123 | try: | |
124 | return string.encode(encoding, errors) | |
125 | except UnicodeEncodeError: | |
126 | raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) | |
127 | __all__.append("base_deunicodise") | |
128 | ||
129 | ||
130 | def decode_from_s3(string, errors = "replace"): | |
131 | """ | |
132 | Convert S3 UTF-8 'string' to Unicode or raise an exception. | |
133 | """ | |
134 | return base_unicodise(string, "UTF-8", errors, True) | |
135 | __all__.append("decode_from_s3") | |
136 | ||
137 | ||
138 | def encode_to_s3(string, errors='replace'): | |
139 | """ | |
140 | Convert Unicode to S3 UTF-8 'string', by default replacing | |
141 | all invalid characters with '?' or raise an exception. | |
142 | """ | |
143 | return base_deunicodise(string, "UTF-8", errors, True) | |
144 | __all__.append("encode_to_s3") | |
145 | ||
146 | ||
147 | def s3_quote(param, quote_backslashes=True, unicode_output=False): | |
148 | """ | |
149 | URI encode every byte. UriEncode() must enforce the following rules: | |
150 | - URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. | |
151 | - The space character is a reserved character and must be encoded as "%20" (and not as "+"). | |
152 | - Each URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte. | |
153 | - Letters in the hexadecimal value must be uppercase, for example "%1A". | |
154 | - Encode the forward slash character, '/', everywhere except in the object key name. | |
155 | For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded. | |
156 | """ | |
157 | if quote_backslashes: | |
158 | safe_chars = "~" | |
159 | else: | |
160 | safe_chars = "~/" | |
161 | param = encode_to_s3(param) | |
162 | param = quote(param, safe=safe_chars) | |
163 | if unicode_output: | |
164 | param = decode_from_s3(param) | |
165 | else: | |
166 | param = encode_to_s3(param) | |
167 | return param | |
168 | __all__.append("s3_quote") | |
169 | ||
170 | ||
171 | def base_urlencode_string(string, urlencoding_mode = None, unicode_output=False): | |
172 | string = encode_to_s3(string) | |
173 | ||
174 | if urlencoding_mode == "verbatim": | |
175 | ## Don't do any pre-processing | |
176 | return string | |
177 | ||
178 | encoded = quote(string, safe="~/") | |
179 | debug("String '%s' encoded to '%s'" % (string, encoded)) | |
180 | if unicode_output: | |
181 | return decode_from_s3(encoded) | |
182 | else: | |
183 | return encode_to_s3(encoded) | |
184 | __all__.append("base_urlencode_string") | |
185 | ||
186 | ||
187 | def base_replace_nonprintables(string, with_message=False): | |
188 | """ | |
189 | replace_nonprintables(string) | |
190 | ||
191 | Replaces all non-printable characters 'ch' in 'string' | |
192 | where ord(ch) <= 26 with ^@, ^A, ... ^Z | |
193 | """ | |
194 | new_string = "" | |
195 | modified = 0 | |
196 | for c in string: | |
197 | o = ord(c) | |
198 | if (o <= 31): | |
199 | new_string += "^" + chr(ord('@') + o) | |
200 | modified += 1 | |
201 | elif (o == 127): | |
202 | new_string += "^?" | |
203 | modified += 1 | |
204 | else: | |
205 | new_string += c | |
206 | if modified and with_message: | |
207 | warning("%d non-printable characters replaced in: %s" % (modified, new_string)) | |
208 | return new_string | |
209 | __all__.append("base_replace_nonprintables") | |
210 | ||
211 | ||
212 | # XML helpers | |
213 | ||
214 | ||
215 | def parseNodes(nodes): | |
216 | ## WARNING: Ignores text nodes from mixed xml/text. | |
217 | ## For instance <tag1>some text<tag2>other text</tag2></tag1> | |
218 | ## will ignore the "some text" node | |
219 | ## WARNING 2: Any node at first level without children will also be ignored | |
220 | retval = [] | |
221 | for node in nodes: | |
222 | retval_item = {} | |
223 | for child in node: | |
224 | name = decode_from_s3(child.tag) | |
225 | if len(child): | |
226 | retval_item[name] = parseNodes([child]) | |
227 | else: | |
228 | found_text = node.findtext(".//%s" % child.tag) | |
229 | if found_text is not None: | |
230 | retval_item[name] = decode_from_s3(found_text) | |
231 | else: | |
232 | retval_item[name] = None | |
233 | if retval_item: | |
234 | retval.append(retval_item) | |
235 | return retval | |
236 | __all__.append("parseNodes") | |
237 | ||
238 | ||
239 | def getPrettyFromXml(xmlstr): | |
240 | xmlparser = xml.dom.minidom.parseString(xmlstr) | |
241 | return xmlparser.toprettyxml() | |
242 | ||
243 | __all__.append("getPrettyFromXml") | |
244 | ||
245 | ||
246 | def stripNameSpace(xml): | |
247 | """ | |
248 | stripNameSpace(xml) -- remove top-level AWS namespace | |
249 | Operates on a raw byte (utf-8) xml string, not unicode. | |
250 | """ | |
251 | xmlns_match = RE_XML_NAMESPACE.match(xml) | |
252 | if xmlns_match: | |
253 | xmlns = xmlns_match.group(3) | |
254 | xml = RE_XML_NAMESPACE.sub("\\1\\2", xml, 1) | |
255 | else: | |
256 | xmlns = None | |
257 | return xml, xmlns | |
258 | __all__.append("stripNameSpace") | |
259 | ||
260 | ||
261 | def getTreeFromXml(xml): | |
262 | xml, xmlns = stripNameSpace(encode_to_s3(xml)) | |
263 | try: | |
264 | tree = ET.fromstring(xml) | |
265 | if xmlns: | |
266 | tree.attrib['xmlns'] = xmlns | |
267 | return tree | |
268 | except Exception as e: | |
269 | error("Error parsing xml: %s", e) | |
270 | error(xml) | |
271 | raise | |
272 | __all__.append("getTreeFromXml") | |
273 | ||
274 | ||
275 | def getListFromXml(xml, node): | |
276 | tree = getTreeFromXml(xml) | |
277 | nodes = tree.findall('.//%s' % (node)) | |
278 | return parseNodes(nodes) | |
279 | __all__.append("getListFromXml") | |
280 | ||
281 | ||
282 | def getDictFromTree(tree): | |
283 | ret_dict = {} | |
284 | for child in tree: | |
285 | if len(child): | |
286 | ## Complex-type child. Recurse | |
287 | content = getDictFromTree(child) | |
288 | else: | |
289 | content = decode_from_s3(child.text) if child.text is not None else None | |
290 | child_tag = decode_from_s3(child.tag) | |
291 | if child_tag in ret_dict: | |
292 | if not type(ret_dict[child_tag]) == list: | |
293 | ret_dict[child_tag] = [ret_dict[child_tag]] | |
294 | ret_dict[child_tag].append(content or "") | |
295 | else: | |
296 | ret_dict[child_tag] = content or "" | |
297 | return ret_dict | |
298 | __all__.append("getDictFromTree") | |
299 | ||
300 | ||
301 | def getTextFromXml(xml, xpath): | |
302 | tree = getTreeFromXml(xml) | |
303 | if tree.tag.endswith(xpath): | |
304 | return decode_from_s3(tree.text) if tree.text is not None else None | |
305 | else: | |
306 | result = tree.findtext(xpath) | |
307 | return decode_from_s3(result) if result is not None else None | |
308 | __all__.append("getTextFromXml") | |
309 | ||
310 | ||
311 | def getRootTagName(xml): | |
312 | tree = getTreeFromXml(xml) | |
313 | return decode_from_s3(tree.tag) if tree.tag is not None else None | |
314 | __all__.append("getRootTagName") | |
315 | ||
316 | ||
317 | def xmlTextNode(tag_name, text): | |
318 | el = ET.Element(tag_name) | |
319 | el.text = decode_from_s3(text) | |
320 | return el | |
321 | __all__.append("xmlTextNode") | |
322 | ||
323 | ||
324 | def appendXmlTextNode(tag_name, text, parent): | |
325 | """ | |
326 | Creates a new <tag_name> Node and sets | |
327 | its content to 'text'. Then appends the | |
328 | created Node to 'parent' element if given. | |
329 | Returns the newly created Node. | |
330 | """ | |
331 | el = xmlTextNode(tag_name, text) | |
332 | parent.append(el) | |
333 | return el | |
334 | __all__.append("appendXmlTextNode") | |
335 | ||
336 | ||
337 | # vim:et:ts=4:sts=4:ai |
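
BaseUtils.py is new in 2.2.0: the self-contained date, encoding and XML helpers were split out of Utils.py so that low-level modules (e.g. Config) can import them without circular imports. A short usage sketch of the helpers defined above, assuming the S3 package is importable:

```
# Usage sketch for the BaseUtils helpers above; assumes the S3 package
# is on sys.path.
from S3.BaseUtils import dateS3toUnix, s3_quote, getListFromXml

# Dates: S3 timestamps are parsed timezone-aware, relative to GMT
print(dateS3toUnix("2021-09-27T12:00:00.000Z"))

# Quoting: every byte except the unreserved characters is percent-encoded;
# '/' is kept literal only with quote_backslashes=False (object key names)
print(s3_quote("photos/Jan/sample 1.jpg", quote_backslashes=False,
               unicode_output=True))    # photos/Jan/sample%201.jpg

# XML: the top-level namespace is stripped, then nodes become dicts
xml = (b'<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
       b'<Contents><Key>a.txt</Key><Size>12</Size></Contents>'
       b'</ListBucketResult>')
print(getListFromXml(xml, "Contents"))  # [{'Key': 'a.txt', 'Size': '12'}]
```
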
21 | 21 | from .S3 import S3 |
22 | 22 | from .Config import Config |
23 | 23 | from .Exceptions import * |
24 | from .Utils import (getTreeFromXml, appendXmlTextNode, getDictFromTree, | |
25 | dateS3toPython, getBucketFromHostname, | |
26 | getHostnameFromBucket, deunicodise, urlencode_string, | |
27 | convertHeaderTupleListToDict, encode_to_s3, decode_from_s3) | |
24 | from .BaseUtils import (getTreeFromXml, appendXmlTextNode, getDictFromTree, | |
25 | dateS3toPython, encode_to_s3, decode_from_s3) | |
26 | from .Utils import (getBucketFromHostname, getHostnameFromBucket, deunicodise, | |
27 | urlencode_string, convertHeaderTupleListToDict) | |
28 | 28 | from .Crypto import sign_string_v2 |
29 | 29 | from .S3Uri import S3Uri, S3UriS3 |
30 | 30 | from .ConnMan import ConnMan |
482 | 482 | fp.write(deunicodise("\n".join(paths)+"\n")) |
483 | 483 | warning("Request to invalidate %d paths (max 999 supported)" % len(paths)) |
484 | 484 | warning("All the paths are now saved in: %s" % tmp_filename) |
485 | except: | |
485 | except Exception: | |
486 | 486 | pass |
487 | 487 | raise ParameterError("Too many paths to invalidate") |
488 | 488 | |
621 | 621 | # do this since S3 buckets that are set up as websites use custom origins. |
622 | 622 | # Thankfully, the custom origin URLs they use start with the URL of the |
623 | 623 | # S3 bucket. Here, we make use of this naming convention to support this use case.
624 | distListIndex = getBucketFromHostname(d.info['CustomOrigin']['DNSName'])[0]; | |
624 | distListIndex = getBucketFromHostname(d.info['CustomOrigin']['DNSName'])[0] | |
625 | 625 | distListIndex = distListIndex[:len(uri.bucket())] |
626 | 626 | else: |
627 | 627 | # Aral: I'm not sure when this condition will be reached, but keeping it in there. |
802 | 802 | try: |
803 | 803 | for i in inval_list['inval_list'].info['InvalidationSummary']: |
804 | 804 | requests.append("/".join(["cf:/", cfuri.dist_id(), i["Id"]])) |
805 | except: | |
805 | except Exception: | |
806 | 806 | continue |
807 | 807 | for req in requests: |
808 | 808 | cfuri = S3Uri(req) |
8 | 8 | from __future__ import absolute_import |
9 | 9 | |
10 | 10 | import logging |
11 | from logging import debug, warning, error | |
11 | import datetime | |
12 | import locale | |
12 | 13 | import re |
13 | 14 | import os |
14 | 15 | import io |
15 | 16 | import sys |
16 | 17 | import json |
17 | from . import Progress | |
18 | from .SortedDict import SortedDict | |
18 | import time | |
19 | ||
20 | from logging import debug, warning | |
21 | ||
22 | from .ExitCodes import EX_OSFILE | |
23 | ||
24 | try: | |
25 | import dateutil.parser | |
26 | import dateutil.tz | |
27 | except ImportError: | |
28 | sys.stderr.write(u""" | |
29 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
30 | ImportError trying to import dateutil.parser and dateutil.tz. | |
31 | Please install the python dateutil module: | |
32 | $ sudo apt-get install python-dateutil | |
33 | or | |
34 | $ sudo yum install python-dateutil | |
35 | or | |
36 | $ pip install python-dateutil | |
37 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
38 | """) | |
39 | sys.stderr.flush() | |
40 | sys.exit(EX_OSFILE) | |
41 | ||
19 | 42 | try: |
20 | 43 | # python 3 support |
21 | 44 | import httplib |
22 | 45 | except ImportError: |
23 | 46 | import http.client as httplib |
24 | import locale | |
25 | 47 | |
26 | 48 | try: |
27 | from configparser import (NoOptionError, NoSectionError, | |
28 | MissingSectionHeaderError, ParsingError, | |
29 | ConfigParser as PyConfigParser) | |
49 | from configparser import (NoOptionError, NoSectionError, | |
50 | MissingSectionHeaderError, ParsingError, | |
51 | ConfigParser as PyConfigParser) | |
30 | 52 | except ImportError: |
31 | # Python2 fallback code | |
32 | from ConfigParser import (NoOptionError, NoSectionError, | |
33 | MissingSectionHeaderError, ParsingError, | |
34 | ConfigParser as PyConfigParser) | |
53 | # Python2 fallback code | |
54 | from ConfigParser import (NoOptionError, NoSectionError, | |
55 | MissingSectionHeaderError, ParsingError, | |
56 | ConfigParser as PyConfigParser) | |
57 | ||
58 | from . import Progress | |
59 | from .SortedDict import SortedDict | |
60 | from .BaseUtils import (s3_quote, getTreeFromXml, getDictFromTree, | |
61 | base_unicodise, dateRFC822toPython) | |
62 | ||
35 | 63 | |
36 | 64 | try: |
37 | 65 | unicode |
40 | 68 | # In python 3, unicode -> str, and str -> bytes |
41 | 69 | unicode = str |
42 | 70 | |
43 | def config_unicodise(string, encoding = "utf-8", errors = "replace"): | |
44 | """ | |
45 | Convert 'string' to Unicode or raise an exception. | |
46 | Config can't use toolbox from Utils that is itself using Config | |
47 | """ | |
48 | if type(string) == unicode: | |
49 | return string | |
50 | ||
51 | try: | |
52 | return unicode(string, encoding, errors) | |
53 | except UnicodeDecodeError: | |
54 | raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) | |
55 | 71 | |
56 | 72 | def is_bool_true(value): |
57 | 73 | """Check to see if a string is true, yes, on, or 1 |
67 | 83 | else: |
68 | 84 | return False |
69 | 85 | |
86 | ||
70 | 87 | def is_bool_false(value): |
71 | 88 | """Check to see if a string is false, no, off, or 0 |
72 | 89 | |
81 | 98 | else: |
82 | 99 | return False |
83 | 100 | |
101 | ||
84 | 102 | def is_bool(value): |
85 | 103 | """Check a string value to see if it is bool""" |
86 | 104 | return is_bool_true(value) or is_bool_false(value) |
105 | ||
87 | 106 | |
88 | 107 | class Config(object): |
89 | 108 | _instance = None |
93 | 112 | secret_key = u"" |
94 | 113 | access_token = u"" |
95 | 114 | _access_token_refresh = True |
115 | _access_token_expiration = None | |
116 | _access_token_last_update = None | |
96 | 117 | host_base = u"s3.amazonaws.com" |
97 | 118 | host_bucket = u"%(bucket)s.s3.amazonaws.com" |
98 | 119 | kms_key = u"" #can't set this and Server Side Encryption at the same time |
152 | 173 | gpg_decrypt = u"%(gpg_command)s -d --verbose --no-use-agent --batch --yes --passphrase-fd %(passphrase_fd)s -o %(output_file)s %(input_file)s" |
153 | 174 | use_https = True |
154 | 175 | ca_certs_file = u"" |
176 | ssl_client_key_file = u"" | |
177 | ssl_client_cert_file = u"" | |
155 | 178 | check_ssl_certificate = True |
156 | 179 | check_ssl_hostname = True |
157 | 180 | bucket_location = u"US" |
160 | 183 | use_mime_magic = True |
161 | 184 | mime_type = u"" |
162 | 185 | enable_multipart = True |
163 | multipart_chunk_size_mb = 15 # MB | |
164 | multipart_max_chunks = 10000 # Maximum chunks on AWS S3, could be different on other S3-compatible APIs | |
186 | # The chunk size is both the upload chunk size and the multipart threshold | |
187 | multipart_chunk_size_mb = 15 # MiB | |
188 | # Maximum chunk size for s3-to-s3 copy is 5 GiB. | |
189 | # But use a much lower value by default (1 GiB) | |
190 | multipart_copy_chunk_size_mb = 1 * 1024 | |
191 | # Maximum chunks on AWS S3, could be different on other S3-compatible APIs | |
192 | multipart_max_chunks = 10000 | |
165 | 193 | # List of checks to be performed for 'sync' |
166 | 194 | sync_checks = ['size', 'md5'] # 'weak-timestamp' |
167 | 195 | # List of compiled REGEXPs |
210 | 238 | throttle_max = 100 |
211 | 239 | public_url_use_https = False |
212 | 240 | connection_pooling = True |
241 | # How long in seconds a connection can be kept idle in the pool and still | |
242 | # be alive. AWS s3 is supposed to close connections that are idle for 20 | |
243 | # seconds or more, but in practice (undocumented) it closes https conns | |
244 | # after around 6s of inactivity. | |
245 | connection_max_age = 5 | |
213 | 246 | |
214 | 247 | ## Creating a singleton |
215 | 248 | def __new__(self, configfile = None, access_key=None, secret_key=None, access_token=None): |
234 | 267 | # Do not refresh the IAM role when an access token is provided. |
235 | 268 | self._access_token_refresh = False |
236 | 269 | |
237 | if len(self.access_key)==0: | |
270 | if len(self.access_key) == 0: | |
238 | 271 | env_access_key = os.getenv('AWS_ACCESS_KEY') or os.getenv('AWS_ACCESS_KEY_ID') |
239 | 272 | env_secret_key = os.getenv('AWS_SECRET_KEY') or os.getenv('AWS_SECRET_ACCESS_KEY') |
240 | 273 | env_access_token = os.getenv('AWS_SESSION_TOKEN') or os.getenv('AWS_SECURITY_TOKEN') |
241 | 274 | if env_access_key: |
275 | if not env_secret_key: | |
276 | raise ValueError( | |
277 | "AWS_ACCESS_KEY environment variable is used but" | |
278 | " AWS_SECRET_KEY variable is missing" | |
279 | ) | |
242 | 280 | # py3 getenv returns unicode and py2 returns bytes. |
243 | self.access_key = config_unicodise(env_access_key) | |
244 | self.secret_key = config_unicodise(env_secret_key) | |
281 | self.access_key = base_unicodise(env_access_key) | |
282 | self.secret_key = base_unicodise(env_secret_key) | |
245 | 283 | if env_access_token: |
246 | 284 | # Do not refresh the IAM role when an access token is provided. |
247 | 285 | self._access_token_refresh = False |
248 | self.access_token = config_unicodise(env_access_token) | |
286 | self.access_token = base_unicodise(env_access_token) | |
249 | 287 | else: |
250 | 288 | self.role_config() |
251 | 289 | |
257 | 295 | |
258 | 296 | def role_config(self): |
259 | 297 | """ |
260 | Get credentials from IAM authentication | |
298 | Get credentials from IAM authentication and STS AssumeRole | |
261 | 299 | """ |
262 | 300 | try: |
263 | conn = httplib.HTTPConnection(host='169.254.169.254', timeout = 2) | |
264 | conn.request('GET', "/latest/meta-data/iam/security-credentials/") | |
265 | resp = conn.getresponse() | |
266 | files = resp.read() | |
267 | if resp.status == 200 and len(files)>1: | |
268 | conn.request('GET', "/latest/meta-data/iam/security-credentials/%s" % files.decode('utf-8')) | |
269 | resp=conn.getresponse() | |
270 | if resp.status == 200: | |
271 | resp_content = config_unicodise(resp.read()) | |
272 | creds=json.loads(resp_content) | |
273 | Config().update_option('access_key', config_unicodise(creds['AccessKeyId'])) | |
274 | Config().update_option('secret_key', config_unicodise(creds['SecretAccessKey'])) | |
275 | Config().update_option('access_token', config_unicodise(creds['Token'])) | |
301 | role_arn = os.environ.get('AWS_ROLE_ARN') | |
302 | if role_arn: | |
303 | role_session_name = 'role-session-%s' % (int(time.time())) | |
304 | params = { | |
305 | 'Action': 'AssumeRole', | |
306 | 'Version': '2011-06-15', | |
307 | 'RoleArn': role_arn, | |
308 | 'RoleSessionName': role_session_name, | |
309 | } | |
310 | web_identity_token_file = os.environ.get('AWS_WEB_IDENTITY_TOKEN_FILE') | |
311 | if web_identity_token_file: | |
312 | with open(web_identity_token_file) as f: | |
313 | web_identity_token = f.read() | |
314 | params['Action'] = 'AssumeRoleWithWebIdentity' | |
315 | params['WebIdentityToken'] = web_identity_token | |
316 | encoded_params = '&'.join([ | |
317 | '%s=%s' % (k, s3_quote(v, unicode_output=True)) | |
318 | for k, v in params.items() | |
319 | ]) | |
320 | conn = httplib.HTTPSConnection(host='sts.amazonaws.com', | |
321 | timeout=2) | |
322 | conn.request('POST', '/?' + encoded_params) | |
323 | resp = conn.getresponse() | |
324 | resp_content = resp.read() | |
325 | if resp.status == 200 and len(resp_content) > 1: | |
326 | tree = getTreeFromXml(resp_content) | |
327 | result_dict = getDictFromTree(tree) | |
328 | if tree.tag == "AssumeRoleResponse": | |
329 | creds = result_dict['AssumeRoleResult']['Credentials'] | |
330 | elif tree.tag == "AssumeRoleWithWebIdentityResponse": | |
331 | creds = result_dict['AssumeRoleWithWebIdentityResult']['Credentials'] | |
332 | else: | |
333 | raise IOError("Unexpected XML message from STS server: <%s />" % tree.tag) | |
334 | Config().update_option('access_key', creds['AccessKeyId']) | |
335 | Config().update_option('secret_key', creds['SecretAccessKey']) | |
336 | Config().update_option('access_token', creds['SessionToken']) | |
337 | expiration = dateRFC822toPython(base_unicodise(creds['Expiration'])) | |
338 | # Subtract a safety margin so the token is refreshed early even if the EC2 clock is skewed | |
339 | self._access_token_expiration = expiration - datetime.timedelta(minutes=15) | |
340 | # last update date is not provided in STS responses | |
341 | self._access_token_last_update = datetime.datetime.now(dateutil.tz.tzutc()) | |
342 | # Other variables: Code / Type | |
276 | 343 | else: |
277 | 344 | raise IOError |
278 | 345 | else: |
279 | raise IOError | |
346 | conn = httplib.HTTPConnection(host='169.254.169.254', | |
347 | timeout=2) | |
348 | conn.request('GET', "/latest/meta-data/iam/security-credentials/") | |
349 | resp = conn.getresponse() | |
350 | files = resp.read() | |
351 | if resp.status == 200 and len(files) > 1: | |
352 | conn.request('GET', "/latest/meta-data/iam/security-credentials/%s" % files.decode('utf-8')) | |
353 | resp=conn.getresponse() | |
354 | if resp.status == 200: | |
355 | resp_content = base_unicodise(resp.read()) | |
356 | creds=json.loads(resp_content) | |
357 | Config().update_option('access_key', base_unicodise(creds['AccessKeyId'])) | |
358 | Config().update_option('secret_key', base_unicodise(creds['SecretAccessKey'])) | |
359 | Config().update_option('access_token', base_unicodise(creds['Token'])) | |
360 | expiration = dateRFC822toPython(base_unicodise(creds['Expiration'])) | |
361 | # Subtract a safety margin so the token is refreshed early even if the EC2 clock is skewed | |
362 | self._access_token_expiration = expiration - datetime.timedelta(minutes=15) | |
363 | self._access_token_last_update = dateRFC822toPython(base_unicodise(creds['LastUpdated'])) | |
364 | # Other variables: Code / Type | |
365 | else: | |
366 | raise IOError | |
367 | else: | |
368 | raise IOError | |
280 | 369 | except: |
281 | 370 | raise |
282 | 371 | |
283 | 372 | def role_refresh(self): |
284 | 373 | if self._access_token_refresh: |
374 | now = datetime.datetime.now(dateutil.tz.tzutc()) | |
375 | if self._access_token_expiration \ | |
376 | and now < self._access_token_expiration \ | |
377 | and self._access_token_last_update \ | |
378 | and self._access_token_last_update <= now: | |
379 | # current token is still valid. No need to refresh it | |
380 | return | |
285 | 381 | try: |
286 | 382 | self.role_config() |
287 | except: | |
383 | except Exception: | |
288 | 384 | warning("Could not refresh role") |
289 | 385 | |
290 | 386 | def aws_credential_file(self): |
293 | 389 | credential_file_from_env = os.environ.get('AWS_CREDENTIAL_FILE') |
294 | 390 | if credential_file_from_env and \ |
295 | 391 | os.path.isfile(credential_file_from_env): |
296 | aws_credential_file = config_unicodise(credential_file_from_env) | |
392 | aws_credential_file = base_unicodise(credential_file_from_env) | |
297 | 393 | elif not os.path.isfile(aws_credential_file): |
298 | 394 | return |
299 | ||
300 | warning("Errno %d accessing credentials file %s" % (e.errno, aws_credential_file)) | |
301 | 395 | |
302 | 396 | config = PyConfigParser() |
303 | 397 | |
311 | 405 | # but so far readfp it is still available. |
312 | 406 | config.readfp(io.StringIO(config_string)) |
313 | 407 | except MissingSectionHeaderError: |
314 | # if header is missing, this could be deprecated credentials file format | |
315 | # as described here: https://blog.csanchez.org/2011/05/ | |
408 | # if header is missing, this could be deprecated | |
409 | # credentials file format as described here: | |
410 | # https://blog.csanchez.org/2011/05/ | |
316 | 411 | # then do the hacky-hack and add default header |
317 | 412 | # to be able to read the file with PyConfigParser() |
318 | 413 | config_string = u'[default]\n' + config_string |
322 | 417 | "Error reading aws_credential_file " |
323 | 418 | "(%s): %s" % (aws_credential_file, str(exc))) |
324 | 419 | |
325 | profile = config_unicodise(os.environ.get('AWS_PROFILE', "default")) | |
420 | profile = base_unicodise(os.environ.get('AWS_PROFILE', "default")) | |
326 | 421 | debug("Using AWS profile '%s'" % (profile)) |
327 | 422 | |
328 | 423 | # get_key - helper function to read the aws profile credentials |
329 | # including the legacy ones as described here: https://blog.csanchez.org/2011/05/ | |
424 | # including the legacy ones as described here: | |
425 | # https://blog.csanchez.org/2011/05/ | |
330 | 426 | def get_key(profile, key, legacy_key, print_warning=True): |
331 | 427 | result = None |
332 | 428 | |
333 | 429 | try: |
334 | 430 | result = config.get(profile, key) |
335 | 431 | except NoOptionError as e: |
336 | if print_warning: # we may want to skip warning message for optional keys | |
337 | warning("Couldn't find key '%s' for the AWS Profile '%s' in the credentials file '%s'" % (e.option, e.section, aws_credential_file)) | |
338 | if legacy_key: # if the legacy_key defined and original one wasn't found, try read the legacy_key | |
432 | # we may want to skip warning message for optional keys | |
433 | if print_warning: | |
434 | warning("Couldn't find key '%s' for the AWS Profile " | |
435 | "'%s' in the credentials file '%s'", | |
436 | e.option, e.section, aws_credential_file) | |
437 | # if the legacy_key defined and original one wasn't found, | |
438 | # try read the legacy_key | |
439 | if legacy_key: | |
339 | 440 | try: |
340 | 441 | key = legacy_key |
341 | 442 | profile = "default" |
342 | 443 | result = config.get(profile, key) |
343 | 444 | warning( |
344 | "Legacy configuratin key '%s' used, " % (key) + | |
345 | "please use the standardized config format as described here: " + | |
346 | "https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/" | |
347 | ) | |
445 | "Legacy configuratin key '%s' used, please use" | |
446 | " the standardized config format as described " | |
447 | "here: https://aws.amazon.com/blogs/security/a-new-and-standardized-way-to-manage-credentials-in-the-aws-sdks/", | |
448 | key) | |
348 | 449 | except NoOptionError as e: |
349 | 450 | pass |
350 | 451 | |
351 | 452 | if result: |
352 | debug("Found the configuration option '%s' for the AWS Profile '%s' in the credentials file %s" % (key, profile, aws_credential_file)) | |
453 | debug("Found the configuration option '%s' for the AWS " | |
454 | "Profile '%s' in the credentials file %s", | |
455 | key, profile, aws_credential_file) | |
353 | 456 | return result |
354 | 457 | |
355 | profile_access_key = get_key(profile, "aws_access_key_id", "AWSAccessKeyId") | |
458 | profile_access_key = get_key(profile, "aws_access_key_id", | |
459 | "AWSAccessKeyId") | |
356 | 460 | if profile_access_key: |
357 | Config().update_option('access_key', config_unicodise(profile_access_key)) | |
358 | ||
359 | profile_secret_key = get_key(profile, "aws_secret_access_key", "AWSSecretKey") | |
461 | Config().update_option('access_key', | |
462 | base_unicodise(profile_access_key)) | |
463 | ||
464 | profile_secret_key = get_key(profile, "aws_secret_access_key", | |
465 | "AWSSecretKey") | |
360 | 466 | if profile_secret_key: |
361 | Config().update_option('secret_key', config_unicodise(profile_secret_key)) | |
362 | ||
363 | profile_access_token = get_key(profile, "aws_session_token", None, False) | |
467 | Config().update_option('secret_key', | |
468 | base_unicodise(profile_secret_key)) | |
469 | ||
470 | profile_access_token = get_key(profile, "aws_session_token", None, | |
471 | False) | |
364 | 472 | if profile_access_token: |
365 | Config().update_option('access_token', config_unicodise(profile_access_token)) | |
473 | Config().update_option('access_token', | |
474 | base_unicodise(profile_access_token)) | |
366 | 475 | |
367 | 476 | except IOError as e: |
368 | warning("Errno %d accessing credentials file %s" % (e.errno, aws_credential_file)) | |
477 | warning("Errno %d accessing credentials file %s", e.errno, | |
478 | aws_credential_file) | |
369 | 479 | except NoSectionError as e: |
370 | warning("Couldn't find AWS Profile '%s' in the credentials file '%s'" % (profile, aws_credential_file)) | |
480 | warning("Couldn't find AWS Profile '%s' in the credentials file " | |
481 | "'%s'", profile, aws_credential_file) | |
371 | 482 | |
372 | 483 | def option_list(self): |
373 | 484 | retval = [] |
398 | 509 | |
399 | 510 | if cp.get('add_headers'): |
400 | 511 | for option in cp.get('add_headers').split(","): |
401 | (key, value) = option.split(':') | |
512 | (key, value) = option.split(':', 1) | |
402 | 513 | self.extra_headers[key.strip()] = value.strip() |
403 | 514 | |
404 | 515 | self._parsed_files.append(configfile) |
441 | 552 | shift = 0 |
442 | 553 | try: |
443 | 554 | value = shift and int(value[:-1]) << shift or int(value) |
444 | except: | |
555 | except Exception: | |
445 | 556 | raise ValueError("Config: value of option %s must have suffix m, k, or nothing, not '%s'" % (option, value)) |
446 | 557 | |
447 | 558 | ## allow yes/no, true/false, on/off and 1/0 for boolean options |
448 | 559 | ## Some options default to None, if that's the case check the value to see if it is bool |
449 | 560 | elif (type(getattr(Config, option)) is type(True) or # Config is bool |
450 | (getattr(Config, option) is None and is_bool(value))): # Config is None and value is bool | |
561 | (getattr(Config, option) is None and is_bool(value))): # Config is None and value is bool | |
451 | 562 | if is_bool_true(value): |
452 | 563 | value = True |
453 | 564 | elif is_bool_false(value): |
480 | 591 | if type(sections) != type([]): |
481 | 592 | sections = [sections] |
482 | 593 | in_our_section = True |
483 | r_comment = re.compile("^\s*#.*") | |
484 | r_empty = re.compile("^\s*$") | |
485 | r_section = re.compile("^\[([^\]]+)\]") | |
486 | r_data = re.compile("^\s*(?P<key>\w+)\s*=\s*(?P<value>.*)") | |
487 | r_quotes = re.compile("^\"(.*)\"\s*$") | |
594 | r_comment = re.compile(r'^\s*#.*') | |
595 | r_empty = re.compile(r'^\s*$') | |
596 | r_section = re.compile(r'^\[([^\]]+)\]') | |
597 | r_data = re.compile(r'^\s*(?P<key>\w+)\s*=\s*(?P<value>.*)') | |
598 | r_quotes = re.compile(r'^"(.*)"\s*$') | |
488 | 599 | with io.open(file, "r", encoding=self.get('encoding', 'UTF-8')) as fp: |
489 | 600 | for line in fp: |
490 | 601 | if r_comment.match(line) or r_empty.match(line): |
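
The STS branch of role_config() above reduces to a single unauthenticated POST to sts.amazonaws.com whose query string carries the role parameters. A standalone sketch of that request construction; the ARN and token values are placeholders, not real credentials:

```
# Sketch of the AssumeRoleWithWebIdentity request built in role_config();
# the RoleArn and WebIdentityToken below are placeholders.
import time
try:
    from urllib.parse import quote
except ImportError:          # Python 2
    from urllib import quote

params = {
    'Action': 'AssumeRoleWithWebIdentity',
    'Version': '2011-06-15',
    'RoleArn': 'arn:aws:iam::123456789012:role/example',   # placeholder
    'RoleSessionName': 'role-session-%d' % int(time.time()),
    'WebIdentityToken': 'EXAMPLE.TOKEN',                   # placeholder
}
# Same encoding rule as s3_quote(): only unreserved characters stay literal
encoded = '&'.join('%s=%s' % (k, quote(v, safe='~'))
                   for k, v in sorted(params.items()))
print('POST https://sts.amazonaws.com/?' + encoded)
```
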
13 | 13 | else: |
14 | 14 | from .Custom_httplib27 import httplib |
15 | 15 | import ssl |
16 | ||
17 | from logging import debug | |
16 | 18 | from threading import Semaphore |
17 | from logging import debug | |
19 | from time import time | |
18 | 20 | try: |
19 | 21 | # python 3 support |
20 | 22 | from urlparse import urlparse |
60 | 62 | return context |
61 | 63 | |
62 | 64 | @staticmethod |
65 | def _ssl_client_auth_context(certfile, keyfile, check_server_cert, cafile): | |
66 | context = None | |
67 | try: | |
68 | cert_reqs = ssl.CERT_REQUIRED if check_server_cert else ssl.CERT_NONE | |
69 | context = ssl._create_unverified_context(cafile=cafile, | |
70 | keyfile=keyfile, | |
71 | certfile=certfile, | |
72 | cert_reqs=cert_reqs) | |
73 | except AttributeError: # no ssl._create_unverified_context | |
74 | pass | |
75 | return context | |
76 | ||
77 | @staticmethod | |
63 | 78 | def _ssl_context(): |
64 | 79 | if http_connection.context_set: |
65 | 80 | return http_connection.context |
68 | 83 | cafile = cfg.ca_certs_file |
69 | 84 | if cafile == "": |
70 | 85 | cafile = None |
86 | certfile = cfg.ssl_client_cert_file or None | |
87 | keyfile = cfg.ssl_client_key_file or None # the key may be embedded into cert file | |
88 | ||
71 | 89 | debug(u"Using ca_certs_file %s", cafile) |
72 | ||
73 | if cfg.check_ssl_certificate: | |
90 | debug(u"Using ssl_client_cert_file %s", certfile) | |
91 | debug(u"Using ssl_client_key_file %s", keyfile) | |
92 | ||
93 | if certfile is not None: | |
94 | context = http_connection._ssl_client_auth_context(certfile, keyfile, cfg.check_ssl_certificate, cafile) | |
95 | elif cfg.check_ssl_certificate: | |
74 | 96 | context = http_connection._ssl_verified_context(cafile) |
75 | 97 | else: |
76 | 98 | context = http_connection._ssl_unverified_context(cafile) |
217 | 239 | debug(u'proxied HTTPConnection(%s, %s)', cfg.proxy_host, cfg.proxy_port) |
218 | 240 | # No tunnel here for the moment |
219 | 241 | |
242 | self.last_used_time = time() | |
220 | 243 | |
221 | 244 | class ConnMan(object): |
222 | 245 | _CS_REQ_SENT = httplib._CS_REQ_SENT |
223 | 246 | CONTINUE = httplib.CONTINUE |
224 | 247 | conn_pool_sem = Semaphore() |
225 | 248 | conn_pool = {} |
226 | conn_max_counter = 800 ## AWS closes connection after some ~90 requests | |
227 | ||
228 | @staticmethod | |
229 | def get(hostname, ssl = None): | |
249 | conn_max_counter = 800 ## AWS closes connection after some ~90 requests | |
250 | ||
251 | @staticmethod | |
252 | def get(hostname, ssl=None): | |
230 | 253 | cfg = Config() |
231 | if ssl == None: | |
254 | if ssl is None: | |
232 | 255 | ssl = cfg.use_https |
233 | 256 | conn = None |
234 | 257 | if cfg.proxy_host != "": |
240 | 263 | ConnMan.conn_pool_sem.acquire() |
241 | 264 | if conn_id not in ConnMan.conn_pool: |
242 | 265 | ConnMan.conn_pool[conn_id] = [] |
243 | if len(ConnMan.conn_pool[conn_id]): | |
266 | while ConnMan.conn_pool[conn_id]: | |
244 | 267 | conn = ConnMan.conn_pool[conn_id].pop() |
245 | debug("ConnMan.get(): re-using connection: %s#%d" % (conn.id, conn.counter)) | |
268 | cur_time = time() | |
269 | if cur_time < conn.last_used_time + cfg.connection_max_age \ | |
270 | and cur_time >= conn.last_used_time: | |
271 | debug("ConnMan.get(): re-using connection: %s#%d" | |
272 | % (conn.id, conn.counter)) | |
273 | break | |
274 | # Conn is too old or wall clock went back in the past | |
275 | debug("ConnMan.get(): closing expired connection") | |
276 | ConnMan.close(conn) | |
277 | conn = None | |
278 | ||
246 | 279 | ConnMan.conn_pool_sem.release() |
247 | 280 | if not conn: |
248 | 281 | debug("ConnMan.get(): creating new connection: %s" % conn_id) |
257 | 290 | def put(conn): |
258 | 291 | if conn.id.startswith("proxy://"): |
259 | 292 | ConnMan.close(conn) |
260 | debug("ConnMan.put(): closing proxy connection (keep-alive not yet supported)") | |
293 | debug("ConnMan.put(): closing proxy connection (keep-alive not yet" | |
294 | " supported)") | |
261 | 295 | return |
262 | 296 | |
263 | 297 | if conn.counter >= ConnMan.conn_max_counter: |
271 | 305 | debug("ConnMan.put(): closing connection (connection pooling disabled)") |
272 | 306 | return |
273 | 307 | |
308 | # Update timestamp of conn to record when was its last use | |
309 | conn.last_used_time = time() | |
310 | ||
274 | 311 | ConnMan.conn_pool_sem.acquire() |
275 | 312 | ConnMan.conn_pool[conn.id].append(conn) |
276 | 313 | ConnMan.conn_pool_sem.release() |
277 | debug("ConnMan.put(): connection put back to pool (%s#%d)" % (conn.id, conn.counter)) | |
314 | debug("ConnMan.put(): connection put back to pool (%s#%d)" | |
315 | % (conn.id, conn.counter)) | |
278 | 316 | |
279 | 317 | @staticmethod |
280 | 318 | def close(conn): |
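
The expiry test inside ConnMan.get() above is a plain wall-clock comparison: a pooled connection is reused only while its last_used_time is less than connection_max_age seconds in the past and not in the future. A minimal standalone sketch of that check:

```
# Standalone sketch of the idle-expiry test used by ConnMan.get();
# FakeConn is a stand-in exposing only the last_used_time attribute.
from time import time

CONNECTION_MAX_AGE = 5  # seconds; AWS closes idle https conns after ~6s

def is_still_fresh(conn, now=None):
    cur_time = now if now is not None else time()
    # Reject both stale connections and timestamps from the future
    # (the wall clock may have been set back).
    return (cur_time < conn.last_used_time + CONNECTION_MAX_AGE
            and cur_time >= conn.last_used_time)

class FakeConn(object):
    def __init__(self, last_used_time):
        self.last_used_time = last_used_time

now = time()
assert is_still_fresh(FakeConn(now - 2), now)        # 2s idle: reuse
assert not is_still_fresh(FakeConn(now - 10), now)   # 10s idle: expired
assert not is_still_fresh(FakeConn(now + 60), now)   # clock went backwards
```
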
9 | 9 | |
10 | 10 | import sys |
11 | 11 | import hmac |
12 | import base64 | |
12 | try: | |
13 | from base64 import encodebytes as encodestring | |
14 | except ImportError: | |
15 | # Python 2 support | |
16 | from base64 import encodestring | |
13 | 17 | |
14 | 18 | from . import Config |
15 | 19 | from logging import debug |
16 | from .Utils import encode_to_s3, time_to_epoch, deunicodise, decode_from_s3, check_bucket_name_dns_support | |
20 | from .BaseUtils import encode_to_s3, decode_from_s3, s3_quote | |
21 | from .Utils import time_to_epoch, deunicodise, check_bucket_name_dns_support | |
17 | 22 | from .SortedDict import SortedDict |
18 | 23 | |
19 | 24 | import datetime |
20 | try: | |
21 | # python 3 support | |
22 | from urllib import quote | |
23 | except ImportError: | |
24 | from urllib.parse import quote | |
25 | ||
25 | 26 | |
26 | 27 | from hashlib import sha1, sha256 |
27 | 28 | |
65 | 66 | and returned signature will be utf-8 encoded "bytes". |
66 | 67 | """ |
67 | 68 | secret_key = Config.Config().secret_key |
68 | signature = base64.encodestring(hmac.new(encode_to_s3(secret_key), string_to_sign, sha1).digest()).strip() | |
69 | signature = encodestring(hmac.new(encode_to_s3(secret_key), string_to_sign, sha1).digest()).strip() | |
69 | 70 | return signature |
70 | 71 | __all__.append("sign_string_v2") |
71 | 72 | |
249 | 250 | return new_headers |
250 | 251 | __all__.append("sign_request_v4") |
251 | 252 | |
252 | def s3_quote(param, quote_backslashes=True, unicode_output=False): | |
253 | """ | |
254 | URI encode every byte. UriEncode() must enforce the following rules: | |
255 | - URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'. | |
256 | - The space character is a reserved character and must be encoded as "%20" (and not as "+"). | |
257 | - Each URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte. | |
258 | - Letters in the hexadecimal value must be uppercase, for example "%1A". | |
259 | - Encode the forward slash character, '/', everywhere except in the object key name. | |
260 | For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded. | |
261 | """ | |
262 | if quote_backslashes: | |
263 | safe_chars = "~" | |
264 | else: | |
265 | safe_chars = "~/" | |
266 | param = encode_to_s3(param) | |
267 | param = quote(param, safe=safe_chars) | |
268 | if unicode_output: | |
269 | param = decode_from_s3(param) | |
270 | else: | |
271 | param = encode_to_s3(param) | |
272 | return param | |
273 | __all__.append("s3_quote") | |
274 | ||
275 | 253 | def checksum_sha256_file(filename, offset=0, size=None): |
276 | 254 | try: |
277 | 255 | hash = sha256() |
278 | except: | |
256 | except Exception: | |
279 | 257 | # fallback to Crypto SHA256 module |
280 | 258 | hash = sha256.new() |
281 | 259 | with open(deunicodise(filename),'rb') as f: |
287 | 265 | size_left = size |
288 | 266 | while size_left > 0: |
289 | 267 | chunk = f.read(min(8192, size_left)) |
268 | if not chunk: | |
269 | break | |
290 | 270 | size_left -= len(chunk) |
291 | 271 | hash.update(chunk) |
292 | 272 | |
295 | 275 | def checksum_sha256_buffer(buffer, offset=0, size=None): |
296 | 276 | try: |
297 | 277 | hash = sha256() |
298 | except: | |
278 | except Exception: | |
299 | 279 | # fallback to Crypto SHA256 module |
300 | 280 | hash = sha256.new() |
301 | 281 | if size is None: |
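
The encodestring/encodebytes import shim at the top of Crypto.py is the Python 3.9 fix from the changelog (#1161, #1174): base64.encodestring() was removed, so the name is aliased per interpreter. A standalone sketch of the v2 HMAC-SHA1 signing built on that shim; the secret key and string-to-sign are placeholders:

```
# Sketch of the HMAC-SHA1 signing done by sign_string_v2(), using the
# same base64 compatibility shim; values below are placeholders.
import hmac
from hashlib import sha1
try:
    from base64 import encodebytes as encodestring
except ImportError:          # Python 2
    from base64 import encodestring

secret_key = b'placeholder-secret-key'
string_to_sign = b'GET\n\n\n1632744000\n/example-bucket/object.txt'
signature = encodestring(
    hmac.new(secret_key, string_to_sign, sha1).digest()).strip()
print(signature)  # base64-encoded bytes, e.g. b'...='
```
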
11 | 11 | except ImportError: |
12 | 12 | from StringIO import StringIO |
13 | 13 | |
14 | from .Utils import encode_to_s3 | |
14 | from .BaseUtils import encode_to_s3 | |
15 | 15 | |
16 | 16 | |
17 | 17 | _METHODS_EXPECTING_BODY = ['PATCH', 'POST', 'PUT'] |
10 | 10 | |
11 | 11 | from io import StringIO |
12 | 12 | |
13 | from .Utils import encode_to_s3 | |
13 | from .BaseUtils import encode_to_s3 | |
14 | 14 | |
15 | 15 | |
16 | 16 | _METHODS_EXPECTING_BODY = ['PATCH', 'POST', 'PUT'] |
9 | 9 | |
10 | 10 | from logging import debug, error |
11 | 11 | import sys |
12 | import S3.BaseUtils | |
12 | 13 | import S3.Utils |
13 | 14 | from . import ExitCodes |
14 | 15 | |
40 | 41 | ## s3cmd exceptions |
41 | 42 | |
42 | 43 | class S3Exception(Exception): |
43 | def __init__(self, message = ""): | |
44 | def __init__(self, message=""): | |
44 | 45 | self.message = S3.Utils.unicodise(message) |
45 | 46 | |
46 | 47 | def __str__(self): |
57 | 58 | ## (Base)Exception.message has been deprecated in Python 2.6 |
58 | 59 | def _get_message(self): |
59 | 60 | return self._message |
61 | ||
60 | 62 | def _set_message(self, message): |
61 | 63 | self._message = message |
62 | 64 | message = property(_get_message, _set_message) |
67 | 69 | self.status = response["status"] |
68 | 70 | self.reason = response["reason"] |
69 | 71 | self.info = { |
70 | "Code" : "", | |
71 | "Message" : "", | |
72 | "Resource" : "" | |
72 | "Code": "", | |
73 | "Message": "", | |
74 | "Resource": "" | |
73 | 75 | } |
74 | 76 | debug("S3Error: %s (%s)" % (self.status, self.reason)) |
75 | 77 | if "headers" in response: |
77 | 79 | debug("HttpHeader: %s: %s" % (header, response["headers"][header])) |
78 | 80 | if "data" in response and response["data"]: |
79 | 81 | try: |
80 | tree = S3.Utils.getTreeFromXml(response["data"]) | |
82 | tree = S3.BaseUtils.getTreeFromXml(response["data"]) | |
81 | 83 | except XmlParseError: |
82 | 84 | debug("Not an XML response") |
83 | 85 | else: |
113 | 115 | return ExitCodes.EX_PRECONDITION |
114 | 116 | elif self.status == 500: |
115 | 117 | return ExitCodes.EX_SOFTWARE |
116 | elif self.status == 503: | |
118 | elif self.status in [429, 503]: | |
117 | 119 | return ExitCodes.EX_SERVICE |
118 | 120 | else: |
119 | 121 | return ExitCodes.EX_SOFTWARE |
125 | 127 | if not error_node.tag == "Error": |
126 | 128 | error_node = tree.find(".//Error") |
127 | 129 | if error_node is not None: |
128 | for child in error_node.getchildren(): | |
130 | for child in error_node: | |
129 | 131 | if child.text != "": |
130 | 132 | debug("ErrorXML: " + child.tag + ": " + repr(child.text)) |
131 | 133 | info[child.tag] = child.text |
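
Adding 429 beside 503 in the status check above is what makes throttling retryable with a service-level exit code (#1096). A condensed sketch of that mapping; the EX_* values follow BSD sysexits conventions and are placeholders for the real constants in S3/ExitCodes.py:

```
# Condensed sketch of the S3Error status-to-exit-code branch; the EX_*
# values are illustrative placeholders, see S3/ExitCodes.py.
EX_SERVICE = 69    # service unavailable / throttled
EX_SOFTWARE = 70   # internal error

def exit_code_for_status(status):
    if status in (429, 503):   # 429 now retryable alongside 503
        return EX_SERVICE
    return EX_SOFTWARE

assert exit_code_for_status(429) == EX_SERVICE
assert exit_code_for_status(500) == EX_SOFTWARE
```
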
11 | 11 | from .Config import Config |
12 | 12 | from .S3Uri import S3Uri |
13 | 13 | from .FileDict import FileDict |
14 | from .Utils import * | |
14 | from .BaseUtils import dateS3toUnix, dateRFC822toUnix | |
15 | from .Utils import unicodise, deunicodise, deunicodise_s, replace_nonprintables | |
15 | 16 | from .Exceptions import ParameterError |
16 | 17 | from .HashCache import HashCache |
17 | 18 | |
34 | 35 | ''' |
35 | 36 | try: |
36 | 37 | names = os.listdir(deunicodise(top)) |
37 | except: | |
38 | except Exception: | |
38 | 39 | return |
39 | 40 | |
40 | 41 | dirs, nondirs = [], [] |
183 | 184 | |
184 | 185 | # reformat to match os.walk() |
185 | 186 | result = [] |
186 | keys = filelist.keys() | |
187 | keys.sort() | |
188 | for key in keys: | |
187 | for key in sorted(filelist): | |
189 | 188 | values = filelist[key] |
190 | 189 | values.sort() |
191 | 190 | result.append((key, [], values)) |
245 | 244 | try: |
246 | 245 | uid = os.geteuid() |
247 | 246 | gid = os.getegid() |
248 | except: | |
247 | except Exception: | |
249 | 248 | uid = 0 |
250 | 249 | gid = 0 |
251 | 250 | loc_list["-"] = { |
372 | 371 | remote_item.update({ |
373 | 372 | 'size': int(response['headers']['content-length']), |
374 | 373 | 'md5': response['headers']['etag'].strip('"\''), |
375 | 'timestamp' : dateRFC822toUnix(response['headers']['last-modified']) | |
374 | 'timestamp': dateRFC822toUnix(response['headers']['last-modified']) | |
376 | 375 | }) |
377 | 376 | try: |
378 | 377 | md5 = response['s3cmd-attrs']['md5'] |
541 | 540 | try: |
542 | 541 | src_md5 = src_list.get_md5(file) |
543 | 542 | dst_md5 = dst_list.get_md5(file) |
544 | except (IOError,OSError): | |
543 | except (IOError, OSError): | |
545 | 544 | # md5 sum verification failed - ignore that file altogether |
546 | 545 | debug(u"IGNR: %s (disappeared)" % (file)) |
547 | 546 | warning(u"%s: file disappeared, ignoring." % (file)) |
603 | 602 | # Found one, we want to copy |
604 | 603 | dst1 = dst_list.find_md5_one(md5) |
605 | 604 | debug(u"DST COPY src: %s -> %s" % (dst1, relative_file)) |
606 | copy_pairs.append((src_list[relative_file], dst1, relative_file)) | |
605 | copy_pairs.append((src_list[relative_file], dst1, relative_file, md5)) | |
607 | 606 | del(src_list[relative_file]) |
608 | 607 | del(dst_list[relative_file]) |
609 | 608 | else: |
620 | 619 | try: |
621 | 620 | md5 = src_list.get_md5(relative_file) |
622 | 621 | except IOError: |
623 | md5 = None | |
622 | md5 = None | |
624 | 623 | dst1 = dst_list.find_md5_one(md5) |
625 | 624 | if dst1 is not None: |
626 | 625 | # Found one, we want to copy |
627 | 626 | debug(u"DST COPY dst: %s -> %s" % (dst1, relative_file)) |
628 | copy_pairs.append((src_list[relative_file], dst1, relative_file)) | |
627 | copy_pairs.append((src_list[relative_file], dst1, | |
628 | relative_file, md5)) | |
629 | 629 | del(src_list[relative_file]) |
630 | 630 | else: |
631 | 631 | # we don't have this file, and we don't have a copy of this file elsewhere. Get it. |
25 | 25 | d = self.inodes[dev][inode][mtime] |
26 | 26 | if d['size'] != size: |
27 | 27 | return None |
28 | except: | |
28 | except Exception: | |
29 | 29 | return None |
30 | 30 | return d['md5'] |
31 | 31 |
5 | 5 | |
6 | 6 | from __future__ import absolute_import |
7 | 7 | |
8 | import os | |
9 | 8 | import sys |
10 | from stat import ST_SIZE | |
11 | 9 | from logging import debug, info, warning, error |
12 | from .Utils import getTextFromXml, getTreeFromXml, formatSize, unicodise, deunicodise, calculateChecksum, parseNodes, encode_to_s3 | |
10 | from .Exceptions import ParameterError | |
11 | from .S3Uri import S3UriS3 | |
12 | from .BaseUtils import getTextFromXml, getTreeFromXml, s3_quote, parseNodes | |
13 | from .Utils import formatSize, calculateChecksum | |
14 | ||
15 | SIZE_1MB = 1024 * 1024 | |
16 | ||
13 | 17 | |
14 | 18 | class MultiPartUpload(object): |
15 | ||
16 | MIN_CHUNK_SIZE_MB = 5 # 5MB | |
17 | MAX_CHUNK_SIZE_MB = 5120 # 5GB | |
18 | MAX_FILE_SIZE = 42949672960 # 5TB | |
19 | ||
20 | def __init__(self, s3, file_stream, uri, headers_baseline=None): | |
19 | """Supports MultiPartUpload and MultiPartUpload(Copy) operation""" | |
20 | MIN_CHUNK_SIZE_MB = 5 # 5MB | |
21 | MAX_CHUNK_SIZE_MB = 5 * 1024 # 5GB | |
22 | MAX_FILE_SIZE = 5 * 1024 * 1024 # 5TB | |
23 | ||
24 | def __init__(self, s3, src, dst_uri, headers_baseline=None, | |
25 | src_size=None): | |
21 | 26 | self.s3 = s3 |
22 | self.file_stream = file_stream | |
23 | self.uri = uri | |
27 | self.file_stream = None | |
28 | self.src_uri = None | |
29 | self.src_size = src_size | |
30 | self.dst_uri = dst_uri | |
24 | 31 | self.parts = {} |
25 | 32 | self.headers_baseline = headers_baseline or {} |
33 | ||
34 | if isinstance(src, S3UriS3): | |
35 | # Source is the uri of an object for an s3-to-s3 multipart copy. | |
36 | self.src_uri = src | |
37 | if not src_size: | |
38 | raise ParameterError("Source size is missing for " | |
39 | "MultipartUploadCopy operation") | |
40 | c_size = self.s3.config.multipart_copy_chunk_size_mb * SIZE_1MB | |
41 | else: | |
42 | # Source is a file_stream to upload | |
43 | self.file_stream = src | |
44 | c_size = self.s3.config.multipart_chunk_size_mb * SIZE_1MB | |
45 | ||
46 | self.chunk_size = c_size | |
26 | 47 | self.upload_id = self.initiate_multipart_upload() |
27 | 48 | |
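The reworked constructor dispatches on the source type: an S3UriS3 source selects a server-side multipart copy chunked by multipart_copy_chunk_size_mb, while anything else is treated as a stream to upload chunked by multipart_chunk_size_mb. A schematic sketch of that dispatch (the stand-in class and default sizes below are illustrative, not s3cmd's real objects):

    import io

    SIZE_1MB = 1024 * 1024

    class FakeS3Uri(object):
        """Stand-in for S3UriS3 in this sketch."""

    def pick_chunk_size(src, multipart_chunk_size_mb=15,
                        multipart_copy_chunk_size_mb=1024):
        if isinstance(src, FakeS3Uri):
            # s3-to-s3 copy: chunked by the copy-specific setting
            return multipart_copy_chunk_size_mb * SIZE_1MB
        # anything else is a stream to upload
        return multipart_chunk_size_mb * SIZE_1MB

    print(pick_chunk_size(FakeS3Uri()))          # copy chunk size
    print(pick_chunk_size(io.BytesIO(b"data")))  # upload chunk size
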
28 | 49 | def get_parts_information(self, uri, upload_id): |
29 | multipart_response = self.s3.list_multipart(uri, upload_id) | |
30 | tree = getTreeFromXml(multipart_response['data']) | |
50 | part_list = self.s3.list_multipart(uri, upload_id) | |
31 | 51 | |
32 | 52 | parts = dict() |
33 | for elem in parseNodes(tree): | |
53 | for elem in part_list: | |
34 | 54 | try: |
35 | parts[int(elem['PartNumber'])] = {'checksum': elem['ETag'], 'size': elem['Size']} | |
55 | parts[int(elem['PartNumber'])] = { | |
56 | 'checksum': elem['ETag'], | |
57 | 'size': elem['Size'] | |
58 | } | |
36 | 59 | except KeyError: |
37 | 60 | pass |
38 | 61 | |
40 | 63 | |
41 | 64 | def get_unique_upload_id(self, uri): |
42 | 65 | upload_id = "" |
43 | multipart_response = self.s3.get_multipart(uri) | |
44 | tree = getTreeFromXml(multipart_response['data']) | |
45 | for mpupload in parseNodes(tree): | |
66 | multipart_list = self.s3.get_multipart(uri) | |
67 | for mpupload in multipart_list: | |
46 | 68 | try: |
47 | 69 | mp_upload_id = mpupload['UploadId'] |
48 | 70 | mp_path = mpupload['Key'] |
49 | 71 | info("mp_path: %s, object: %s" % (mp_path, uri.object())) |
50 | 72 | if mp_path == uri.object(): |
51 | 73 | if upload_id: |
52 | raise ValueError("More than one UploadId for URI %s. Disable multipart upload, or use\n %s multipart %s\nto list the Ids, then pass a unique --upload-id into the put command." % (uri, sys.argv[0], uri)) | |
74 | raise ValueError( | |
75 | "More than one UploadId for URI %s. Disable " | |
76 | "multipart upload, or use\n %s multipart %s\n" | |
77 | "to list the Ids, then pass a unique --upload-id " | |
78 | "into the put command." % (uri, sys.argv[0], uri)) | |
53 | 79 | upload_id = mp_upload_id |
54 | 80 | except KeyError: |
55 | 81 | pass |
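When put_continue finds more than one in-progress upload for the same key, the ValueError above tells the user how to disambiguate. The workflow it prescribes looks roughly like this (bucket, object and id are made up):

    $ s3cmd multipart s3://mybucket/big.iso     # list the in-progress UploadIds
    $ s3cmd --upload-id <one-of-the-ids> put big.iso s3://mybucket/big.iso
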
64 | 90 | if self.s3.config.upload_id: |
65 | 91 | self.upload_id = self.s3.config.upload_id |
66 | 92 | elif self.s3.config.put_continue: |
67 | self.upload_id = self.get_unique_upload_id(self.uri) | |
93 | self.upload_id = self.get_unique_upload_id(self.dst_uri) | |
68 | 94 | else: |
69 | 95 | self.upload_id = "" |
70 | 96 | |
71 | 97 | if not self.upload_id: |
72 | request = self.s3.create_request("OBJECT_POST", uri = self.uri, | |
73 | headers = self.headers_baseline, | |
74 | uri_params = {'uploads': None}) | |
98 | request = self.s3.create_request("OBJECT_POST", uri=self.dst_uri, | |
99 | headers=self.headers_baseline, | |
100 | uri_params={'uploads': None}) | |
75 | 101 | response = self.s3.send_request(request) |
76 | 102 | data = response["data"] |
77 | 103 | self.upload_id = getTextFromXml(data, "UploadId") |
85 | 111 | TODO use num_processes to thread it |
86 | 112 | """ |
87 | 113 | if not self.upload_id: |
88 | raise RuntimeError("Attempting to use a multipart upload that has not been initiated.") | |
89 | ||
90 | self.chunk_size = self.s3.config.multipart_chunk_size_mb * 1024 * 1024 | |
91 | filename = self.file_stream.stream_name | |
92 | ||
93 | if filename != u"<stdin>": | |
94 | size_left = file_size = os.stat(deunicodise(filename))[ST_SIZE] | |
95 | nr_parts = file_size // self.chunk_size + (file_size % self.chunk_size and 1) | |
96 | debug("MultiPart: Uploading %s in %d parts" % (filename, nr_parts)) | |
114 | raise ParameterError("Attempting to use a multipart upload that " | |
115 | "has not been initiated.") | |
116 | ||
117 | remote_statuses = {} | |
118 | ||
119 | if self.src_uri: | |
120 | filename = self.src_uri.uri() | |
121 | # Continue is not possible with multipart copy | |
97 | 122 | else: |
98 | debug("MultiPart: Uploading from %s" % filename) | |
99 | ||
100 | remote_statuses = dict() | |
123 | filename = self.file_stream.stream_name | |
124 | ||
101 | 125 | if self.s3.config.put_continue: |
102 | remote_statuses = self.get_parts_information(self.uri, self.upload_id) | |
126 | remote_statuses = self.get_parts_information(self.dst_uri, | |
127 | self.upload_id) | |
103 | 128 | |
104 | 129 | if extra_label: |
105 | 130 | extra_label = u' ' + extra_label |
131 | labels = { | |
132 | 'source': filename, | |
133 | 'destination': self.dst_uri.uri(), | |
134 | } | |
135 | ||
106 | 136 | seq = 1 |
107 | if filename != u"<stdin>": | |
137 | ||
138 | if self.src_size: | |
139 | size_left = self.src_size | |
140 | nr_parts = self.src_size // self.chunk_size \ | |
141 | + (self.src_size % self.chunk_size and 1) | |
142 | debug("MultiPart: Uploading %s in %d parts" % (filename, nr_parts)) | |
143 | ||
108 | 144 | while size_left > 0: |
109 | 145 | offset = self.chunk_size * (seq - 1) |
110 | current_chunk_size = min(file_size - offset, self.chunk_size) | |
146 | current_chunk_size = min(self.src_size - offset, | |
147 | self.chunk_size) | |
111 | 148 | size_left -= current_chunk_size |
112 | labels = { | |
113 | 'source' : filename, | |
114 | 'destination' : self.uri.uri(), | |
115 | 'extra' : "[part %d of %d, %s]%s" % (seq, nr_parts, "%d%sB" % formatSize(current_chunk_size, human_readable = True), extra_label) | |
116 | } | |
149 | labels['extra'] = "[part %d of %d, %s]%s" % ( | |
150 | seq, nr_parts, "%d%sB" % formatSize(current_chunk_size, | |
151 | human_readable=True), | |
152 | extra_label) | |
117 | 153 | try: |
118 | self.upload_part(seq, offset, current_chunk_size, labels, remote_status = remote_statuses.get(seq)) | |
154 | if self.file_stream: | |
155 | self.upload_part( | |
156 | seq, offset, current_chunk_size, labels, | |
157 | remote_status=remote_statuses.get(seq)) | |
158 | else: | |
159 | self.copy_part( | |
160 | seq, offset, current_chunk_size, labels, | |
161 | remote_status=remote_statuses.get(seq)) | |
119 | 162 | except: |
120 | error(u"\nUpload of '%s' part %d failed. Use\n %s abortmp %s %s\nto abort the upload, or\n %s --upload-id %s put ...\nto continue the upload." | |
121 | % (filename, seq, sys.argv[0], self.uri, self.upload_id, sys.argv[0], self.upload_id)) | |
163 | error(u"\nUpload of '%s' part %d failed. Use\n " | |
164 | "%s abortmp %s %s\nto abort the upload, or\n " | |
165 | "%s --upload-id %s put ...\nto continue the upload." | |
166 | % (filename, seq, sys.argv[0], self.dst_uri, | |
167 | self.upload_id, sys.argv[0], self.upload_id)) | |
122 | 168 | raise |
123 | 169 | seq += 1 |
124 | else: | |
125 | while True: | |
126 | buffer = self.file_stream.read(self.chunk_size) | |
127 | offset = 0 # send from start of the buffer | |
128 | current_chunk_size = len(buffer) | |
129 | labels = { | |
130 | 'source' : filename, | |
131 | 'destination' : self.uri.uri(), | |
132 | 'extra' : "[part %d, %s]" % (seq, "%d%sB" % formatSize(current_chunk_size, human_readable = True)) | |
133 | } | |
134 | if len(buffer) == 0: # EOF | |
135 | break | |
136 | try: | |
137 | self.upload_part(seq, offset, current_chunk_size, labels, buffer, remote_status = remote_statuses.get(seq)) | |
138 | except: | |
139 | error(u"\nUpload of '%s' part %d failed. Use\n %s abortmp %s %s\nto abort, or\n %s --upload-id %s put ...\nto continue the upload." | |
140 | % (filename, seq, sys.argv[0], self.uri, self.upload_id, sys.argv[0], self.upload_id)) | |
141 | raise | |
142 | seq += 1 | |
170 | ||
171 | debug("MultiPart: Upload finished: %d parts", seq - 1) | |
172 | return | |
173 | ||
174 | ||
175 | # Else -> source is u"<stdin>" | |
176 | debug("MultiPart: Uploading from %s" % filename) | |
177 | while True: | |
178 | buffer = self.file_stream.read(self.chunk_size) | |
179 | offset = 0 # send from start of the buffer | |
180 | current_chunk_size = len(buffer) | |
181 | labels['extra'] = "[part %d of -, %s]%s" % ( | |
182 | seq, "%d%sB" % formatSize(current_chunk_size, | |
183 | human_readable=True), | |
184 | extra_label) | |
185 | if not buffer: | |
186 | # EOF | |
187 | break | |
188 | try: | |
189 | self.upload_part(seq, offset, current_chunk_size, labels, | |
190 | buffer, | |
191 | remote_status=remote_statuses.get(seq)) | |
192 | except: | |
193 | error(u"\nUpload of '%s' part %d failed. Use\n " | |
194 | "%s abortmp %s %s\nto abort, or\n " | |
195 | "%s --upload-id %s put ...\nto continue the upload." | |
196 | % (filename, seq, sys.argv[0], self.dst_uri, | |
197 | self.upload_id, sys.argv[0], self.upload_id)) | |
198 | raise | |
199 | seq += 1 | |
143 | 200 | |
144 | 201 | debug("MultiPart: Upload finished: %d parts", seq - 1) |
145 | 202 | |
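upload_all_parts above computes the part count with the idiom `size // chunk + (size % chunk and 1)`, i.e. ceiling division: one extra part whenever the size is not an exact multiple of the chunk size. A quick check:

    def nr_parts(size, chunk_size):
        # Integer division, plus 1 if there is a remainder.
        return size // chunk_size + (size % chunk_size and 1)

    SIZE_1MB = 1024 * 1024
    chunk = 15 * SIZE_1MB
    assert nr_parts(30 * SIZE_1MB, chunk) == 2      # exact multiple
    assert nr_parts(30 * SIZE_1MB + 1, chunk) == 3  # remainder adds a part
    assert nr_parts(1, chunk) == 1
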
146 | def upload_part(self, seq, offset, chunk_size, labels, buffer = '', remote_status = None): | |
203 | def upload_part(self, seq, offset, chunk_size, labels, buffer='', | |
204 | remote_status=None): | |
147 | 205 | """ |
148 | 206 | Upload a file chunk |
149 | 207 | http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadUploadPart.html |
150 | 208 | """ |
151 | 209 | # TODO implement Content-MD5 |
152 | debug("Uploading part %i of %r (%s bytes)" % (seq, self.upload_id, chunk_size)) | |
210 | debug("Uploading part %i of %r (%s bytes)" % (seq, self.upload_id, | |
211 | chunk_size)) | |
153 | 212 | |
154 | 213 | if remote_status is not None: |
155 | 214 | if int(remote_status['size']) == chunk_size: |
156 | checksum = calculateChecksum(buffer, self.file_stream, offset, chunk_size, self.s3.config.send_chunk) | |
215 | checksum = calculateChecksum(buffer, self.file_stream, offset, | |
216 | chunk_size, | |
217 | self.s3.config.send_chunk) | |
157 | 218 | remote_checksum = remote_status['checksum'].strip('"\'') |
158 | 219 | if remote_checksum == checksum: |
159 | warning("MultiPart: size and md5sum match for %s part %d, skipping." % (self.uri, seq)) | |
220 | warning("MultiPart: size and md5sum match for %s part %d, " | |
221 | "skipping." % (self.dst_uri, seq)) | |
160 | 222 | self.parts[seq] = remote_status['checksum'] |
161 | return | |
223 | return None | |
162 | 224 | else: |
163 | warning("MultiPart: checksum (%s vs %s) does not match for %s part %d, reuploading." | |
164 | % (remote_checksum, checksum, self.uri, seq)) | |
225 | warning("MultiPart: checksum (%s vs %s) does not match for" | |
226 | " %s part %d, reuploading." | |
227 | % (remote_checksum, checksum, self.dst_uri, seq)) | |
165 | 228 | else: |
166 | warning("MultiPart: size (%d vs %d) does not match for %s part %d, reuploading." | |
167 | % (int(remote_status['size']), chunk_size, self.uri, seq)) | |
168 | ||
169 | headers = { "content-length": str(chunk_size) } | |
170 | query_string_params = {'partNumber':'%s' % seq, | |
229 | warning("MultiPart: size (%d vs %d) does not match for %s part" | |
230 | " %d, reuploading." % (int(remote_status['size']), | |
231 | chunk_size, self.dst_uri, seq)) | |
232 | ||
233 | headers = {"content-length": str(chunk_size)} | |
234 | query_string_params = {'partNumber': '%s' % seq, | |
171 | 235 | 'uploadId': self.upload_id} |
172 | request = self.s3.create_request("OBJECT_PUT", uri = self.uri, | |
173 | headers = headers, | |
174 | uri_params = query_string_params) | |
175 | response = self.s3.send_file(request, self.file_stream, labels, buffer, offset = offset, chunk_size = chunk_size) | |
236 | request = self.s3.create_request("OBJECT_PUT", uri=self.dst_uri, | |
237 | headers=headers, | |
238 | uri_params=query_string_params) | |
239 | response = self.s3.send_file(request, self.file_stream, labels, buffer, | |
240 | offset=offset, chunk_size=chunk_size) | |
176 | 241 | self.parts[seq] = response["headers"].get('etag', '').strip('"\'') |
242 | return response | |
243 | ||
244 | def copy_part(self, seq, offset, chunk_size, labels, remote_status=None): | |
245 | """ | |
246 | Copy a remote file chunk | |
247 | http://docs.amazonwebservices.com/AmazonS3/latest/API/index.html?mpUploadUploadPart.html | |
248 | http://docs.amazonwebservices.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html | |
249 | """ | |
250 | debug("Copying part %i of %r (%s bytes)" % (seq, self.upload_id, | |
251 | chunk_size)) | |
252 | ||
253 | # set up headers with copy-params. | |
254 | # Examples: | |
255 | # x-amz-copy-source: /source_bucket/sourceObject | |
256 | # x-amz-copy-source-range:bytes=first-last | |
257 | # x-amz-copy-source-if-match: etag | |
258 | # x-amz-copy-source-if-none-match: etag | |
259 | # x-amz-copy-source-if-unmodified-since: time_stamp | |
260 | # x-amz-copy-source-if-modified-since: time_stamp | |
261 | headers = { | |
262 | "x-amz-copy-source": s3_quote("/%s/%s" % (self.src_uri.bucket(), | |
263 | self.src_uri.object()), | |
264 | quote_backslashes=False, | |
265 | unicode_output=True) | |
266 | } | |
267 | ||
268 | # byte range, with end byte included. A 10 byte file has bytes=0-9 | |
269 | headers["x-amz-copy-source-range"] = \ | |
270 | "bytes=%d-%d" % (offset, (offset + chunk_size - 1)) | |
271 | ||
272 | query_string_params = {'partNumber': '%s' % seq, | |
273 | 'uploadId': self.upload_id} | |
274 | request = self.s3.create_request("OBJECT_PUT", uri=self.dst_uri, | |
275 | headers=headers, | |
276 | uri_params=query_string_params) | |
277 | ||
278 | labels[u'action'] = u'remote copy' | |
279 | response = self.s3.send_request_with_progress(request, labels, | |
280 | chunk_size) | |
281 | ||
282 | # NOTE: Amazon sends whitespace while upload progresses, which | |
283 | # accumulates in response body and seems to confuse XML parser. | |
284 | # Strip newlines to find ETag in XML response data | |
285 | #data = response["data"].replace("\n", '') | |
286 | self.parts[seq] = getTextFromXml(response['data'], "ETag") or '' | |
287 | ||
177 | 288 | return response |
178 | 289 | |
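copy_part above expresses each chunk as an inclusive x-amz-copy-source-range: a part starting at offset with chunk_size bytes spans bytes offset through offset + chunk_size - 1. A standalone sketch of the ranges generated for a tiny example:

    def copy_source_range(offset, chunk_size):
        # Inclusive range, as noted above: a 10-byte file is bytes=0-9.
        return "bytes=%d-%d" % (offset, offset + chunk_size - 1)

    size, chunk, offset, seq = 10, 4, 0, 1
    while offset < size:
        current = min(size - offset, chunk)
        print(seq, copy_source_range(offset, current))
        offset += current
        seq += 1
    # prints: 1 bytes=0-3 / 2 bytes=4-7 / 3 bytes=8-9
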
179 | 290 | def complete_multipart_upload(self): |
187 | 298 | part_xml = "<Part><PartNumber>%i</PartNumber><ETag>%s</ETag></Part>" |
188 | 299 | for seq, etag in self.parts.items(): |
189 | 300 | parts_xml.append(part_xml % (seq, etag)) |
190 | body = "<CompleteMultipartUpload>%s</CompleteMultipartUpload>" % ("".join(parts_xml)) | |
191 | ||
192 | headers = { "content-length": str(len(body)) } | |
193 | request = self.s3.create_request("OBJECT_POST", uri = self.uri, | |
194 | headers = headers, body = body, | |
195 | uri_params = {'uploadId': self.upload_id}) | |
301 | body = "<CompleteMultipartUpload>%s</CompleteMultipartUpload>" \ | |
302 | % "".join(parts_xml) | |
303 | ||
304 | headers = {"content-length": str(len(body))} | |
305 | request = self.s3.create_request( | |
306 | "OBJECT_POST", uri=self.dst_uri, headers=headers, body=body, | |
307 | uri_params={'uploadId': self.upload_id}) | |
196 | 308 | response = self.s3.send_request(request) |
197 | 309 | |
198 | 310 | return response |
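The completion step serializes self.parts into the CompleteMultipartUpload body. A standalone version of that serialization (sorting added here, since S3 expects parts listed in ascending PartNumber order):

    def complete_body(parts):
        # Build the CompleteMultipartUpload XML body; `parts` maps
        # part number -> ETag, as collected by upload_part/copy_part.
        part_xml = "<Part><PartNumber>%i</PartNumber><ETag>%s</ETag></Part>"
        inner = "".join(part_xml % (seq, etag)
                        for seq, etag in sorted(parts.items()))
        return "<CompleteMultipartUpload>%s</CompleteMultipartUpload>" % inner

    print(complete_body({1: 'etag-1', 2: 'etag-2'}))
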
209 | 321 | response = None |
210 | 322 | return response |
211 | 323 | |
324 | ||
212 | 325 | # vim:et:ts=4:sts=4:ai |
6 | 6 | ## Copyright: TGRMN Software and contributors |
7 | 7 | |
8 | 8 | package = "s3cmd" |
9 | version = "2.1.0" | |
9 | version = "2.2.0" | |
10 | 10 | url = "http://s3tools.org" |
11 | 11 | license = "GNU GPL v2+" |
12 | 12 | short_description = "Command line tool for managing Amazon S3 and CloudFront services" |
11 | 11 | import os |
12 | 12 | import time |
13 | 13 | import errno |
14 | import base64 | |
15 | 14 | import mimetypes |
16 | 15 | import io |
17 | 16 | import pprint |
24 | 23 | from urlparse import urlparse |
25 | 24 | except ImportError: |
26 | 25 | from urllib.parse import urlparse |
26 | try: | |
27 | # Python 2 support | |
28 | from base64 import encodestring | |
29 | except ImportError: | |
30 | # Python 3.9.0+ support | |
31 | from base64 import encodebytes as encodestring | |
27 | 32 | |
28 | 33 | import select |
29 | 34 | |
32 | 37 | except ImportError: |
33 | 38 | from md5 import md5 |
34 | 39 | |
35 | from .Utils import * | |
40 | from .BaseUtils import (getListFromXml, getTextFromXml, getRootTagName, | |
41 | decode_from_s3, encode_to_s3, s3_quote) | |
42 | from .Utils import (convertHeaderTupleListToDict, hash_file_md5, unicodise, | |
43 | deunicodise, check_bucket_name, | |
44 | check_bucket_name_dns_support, getHostnameFromBucket, | |
45 | calculateChecksum) | |
36 | 46 | from .SortedDict import SortedDict |
37 | 47 | from .AccessLog import AccessLog |
38 | 48 | from .ACL import ACL, GranteeLogDelivery |
43 | 53 | from .S3Uri import S3Uri |
44 | 54 | from .ConnMan import ConnMan |
45 | 55 | from .Crypto import (sign_request_v2, sign_request_v4, checksum_sha256_file, |
46 | checksum_sha256_buffer, s3_quote, format_param_str) | |
56 | checksum_sha256_buffer, format_param_str) | |
47 | 57 | |
48 | 58 | try: |
49 | 59 | from ctypes import ArgumentError |
122 | 132 | result = (None, None) |
123 | 133 | return result |
124 | 134 | |
135 | ||
125 | 136 | EXPECT_CONTINUE_TIMEOUT = 2 |
126 | ||
137 | SIZE_1MB = 1024 * 1024 | |
127 | 138 | |
128 | 139 | __all__ = [] |
140 | ||
129 | 141 | class S3Request(object): |
130 | 142 | region_map = {} |
131 | 143 | ## S3 sometimes sends HTTP-301, HTTP-307 response |
351 | 363 | num_prefixes = 0 |
352 | 364 | max_keys = limit |
353 | 365 | while truncated: |
354 | response = self.bucket_list_noparse(bucket, prefix, recursive, uri_params, max_keys) | |
366 | response = self.bucket_list_noparse(bucket, prefix, recursive, | |
367 | uri_params, max_keys) | |
355 | 368 | current_list = _get_contents(response["data"]) |
356 | 369 | current_prefixes = _get_common_prefixes(response["data"]) |
357 | 370 | num_objects += len(current_list) |
362 | 375 | if truncated: |
363 | 376 | if limit == -1 or num_objects + num_prefixes < limit: |
364 | 377 | if current_list: |
365 | uri_params['marker'] = _get_next_marker(response["data"], current_list) | |
378 | uri_params['marker'] = \ | |
379 | _get_next_marker(response["data"], current_list) | |
380 | elif current_prefixes: | |
381 | uri_params['marker'] = current_prefixes[-1]["Prefix"] | |
366 | 382 | else: |
367 | uri_params['marker'] = current_prefixes[-1]["Prefix"] | |
383 | # Unexpectedly, the server lied, and so the previous | |
384 | # response was not truncated. So, no new key to get. | |
385 | yield False, current_prefixes, current_list | |
386 | break | |
368 | 387 | debug("Listing continues after '%s'" % uri_params['marker']) |
369 | 388 | else: |
370 | 389 | yield truncated, current_prefixes, current_list |
386 | 405 | #debug(response) |
387 | 406 | return response |
388 | 407 | |
389 | def bucket_create(self, bucket, bucket_location = None): | |
408 | def bucket_create(self, bucket, bucket_location = None, extra_headers = None): | |
390 | 409 | headers = SortedDict(ignore_case = True) |
410 | if extra_headers: | |
411 | headers.update(extra_headers) | |
412 | ||
391 | 413 | body = "" |
392 | 414 | if bucket_location and bucket_location.strip().upper() != "US" and bucket_location.strip().lower() != "us-east-1": |
393 | 415 | bucket_location = bucket_location.strip() |
446 | 468 | return location |
447 | 469 | |
448 | 470 | def get_bucket_requester_pays(self, uri): |
449 | request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), | |
450 | uri_params = {'requestPayment': None}) | |
451 | response = self.send_request(request) | |
452 | payer = getTextFromXml(response['data'], "Payer") | |
471 | request = self.create_request("BUCKET_LIST", bucket=uri.bucket(), | |
472 | uri_params={'requestPayment': None}) | |
473 | response = self.send_request(request) | |
474 | resp_data = response.get('data', '') | |
475 | if resp_data: | |
476 | payer = getTextFromXml(response['data'], "Payer") | |
477 | else: | |
478 | payer = None | |
453 | 479 | return payer |
454 | 480 | |
455 | 481 | def bucket_info(self, uri): |
458 | 484 | try: |
459 | 485 | response['requester-pays'] = self.get_bucket_requester_pays(uri) |
460 | 486 | except S3Error as e: |
461 | response['requester-pays'] = 'none' | |
487 | response['requester-pays'] = None | |
462 | 488 | return response |
463 | 489 | |
464 | 490 | def website_info(self, uri, bucket_location = None): |
515 | 541 | def expiration_info(self, uri, bucket_location = None): |
516 | 542 | bucket = uri.bucket() |
517 | 543 | |
518 | request = self.create_request("BUCKET_LIST", bucket = bucket, | |
519 | uri_params = {'lifecycle': None}) | |
544 | request = self.create_request("BUCKET_LIST", bucket=bucket, | |
545 | uri_params={'lifecycle': None}) | |
520 | 546 | try: |
521 | 547 | response = self.send_request(request) |
522 | response['prefix'] = getTextFromXml(response['data'], ".//Rule//Prefix") | |
523 | response['date'] = getTextFromXml(response['data'], ".//Rule//Expiration//Date") | |
524 | response['days'] = getTextFromXml(response['data'], ".//Rule//Expiration//Days") | |
525 | return response | |
526 | 548 | except S3Error as e: |
527 | 549 | if e.status == 404: |
528 | debug("Could not get /?lifecycle - lifecycle probably not configured for this bucket") | |
550 | debug("Could not get /?lifecycle - lifecycle probably not " | |
551 | "configured for this bucket") | |
529 | 552 | return None |
530 | 553 | elif e.status == 501: |
531 | debug("Could not get /?lifecycle - lifecycle support not implemented by the server") | |
554 | debug("Could not get /?lifecycle - lifecycle support not " | |
555 | "implemented by the server") | |
532 | 556 | return None |
533 | 557 | raise |
558 | ||
559 | root_tag_name = getRootTagName(response['data']) | |
560 | if root_tag_name != "LifecycleConfiguration": | |
561 | debug("Could not get /?lifecycle - unexpected xml response: " | |
562 | "%s", root_tag_name) | |
563 | return None | |
564 | response['prefix'] = getTextFromXml(response['data'], | |
565 | ".//Rule//Prefix") | |
566 | response['date'] = getTextFromXml(response['data'], | |
567 | ".//Rule//Expiration//Date") | |
568 | response['days'] = getTextFromXml(response['data'], | |
569 | ".//Rule//Expiration//Days") | |
570 | return response | |
534 | 571 | |
535 | 572 | def expiration_set(self, uri, bucket_location = None): |
536 | 573 | if self.config.expiry_date and self.config.expiry_days: |
676 | 713 | if not self.config.enable_multipart and filename == "-": |
677 | 714 | raise ParameterError("Multi-part upload is required to upload from stdin") |
678 | 715 | if self.config.enable_multipart: |
679 | if size > self.config.multipart_chunk_size_mb * 1024 * 1024 or filename == "-": | |
716 | if size > self.config.multipart_chunk_size_mb * SIZE_1MB or filename == "-": | |
680 | 717 | multipart = True |
681 | if size > self.config.multipart_max_chunks * self.config.multipart_chunk_size_mb * 1024 * 1024: | |
718 | if size > self.config.multipart_max_chunks * self.config.multipart_chunk_size_mb * SIZE_1MB: | |
682 | 719 | raise ParameterError("Chunk size %d MB results in more than %d chunks. Please increase --multipart-chunk-size-mb" % \ |
683 | 720 | (self.config.multipart_chunk_size_mb, self.config.multipart_max_chunks)) |
684 | 721 | if multipart: |
693 | 730 | # an md5. |
694 | 731 | try: |
695 | 732 | info = self.object_info(uri) |
696 | except: | |
733 | except Exception: | |
697 | 734 | info = None |
698 | 735 | |
699 | 736 | if info is not None: |
775 | 812 | raise ParameterError("You must restore a file for 1 or more days") |
776 | 813 | if self.config.restore_priority not in ['Standard', 'Expedited', 'Bulk']: |
777 | 814 | raise ParameterError("Valid restoration priorities: bulk, standard, expedited") |
778 | body = '<RestoreRequest xmlns="http://s3.amazonaws.com/doc/2006-3-01">' | |
815 | body = '<RestoreRequest xmlns="http://s3.amazonaws.com/doc/2006-03-01/">' | |
779 | 816 | body += (' <Days>%s</Days>' % self.config.restore_days) |
780 | 817 | body += ' <GlacierJobParameters>' |
781 | 818 | body += (' <Tier>%s</Tier>' % self.config.restore_priority) |
803 | 840 | 'server', |
804 | 841 | 'x-amz-id-2', |
805 | 842 | 'x-amz-request-id', |
843 | # Other headers that are not copied by a direct copy | |
844 | 'x-amz-storage-class', | |
845 | ## We should probably also add server-side encryption headers | |
806 | 846 | ] |
807 | 847 | |
808 | 848 | for h in to_remove + self.config.remove_headers: |
810 | 850 | del headers[h.lower()] |
811 | 851 | return headers |
812 | 852 | |
813 | def object_copy(self, src_uri, dst_uri, extra_headers = None): | |
853 | def object_copy(self, src_uri, dst_uri, extra_headers=None, | |
854 | src_size=None, extra_label="", replace_meta=False): | |
855 | """Remote copy an object and eventually set metadata | |
856 | ||
857 | Note: A little memo description of the nightmare for performance here: | |
858 | ** FOR AWS, 2 cases: | |
859 | - COPY will copy the metadata of the source to dest, but you can't | |
860 | modify them. Any additional header will be ignored anyway. | |
861 | - REPLACE will set the additional metadata headers that are provided | |
862 | but will not copy any of the source headers. | |
863 | So, to add to existing meta during copy, you have to do an object_info | |
864 | to get original source headers, then modify, then use REPLACE for the | |
865 | copy operation. | |
866 | ||
867 | ** For Minio and maybe other implementations: | |
868 | - if additional headers are sent, they will be set to the destination | |
869 | on top of the source's original meta in both COPY and REPLACE cases. | |
870 | It is a nice behavior except that it differs from the aws one. | |
871 | ||
872 | As it was still too easy, there is another catch: | |
873 | In all cases, for multipart copies, metadata is never copied | |
874 | from the source. | |
875 | """ | |
814 | 876 | if src_uri.type != "s3": |
815 | 877 | raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type) |
816 | 878 | if dst_uri.type != "s3": |
824 | 886 | if exc.status != 501: |
825 | 887 | raise exc |
826 | 888 | acl = None |
827 | headers = SortedDict(ignore_case = True) | |
828 | headers['x-amz-copy-source'] = "/%s/%s" % (src_uri.bucket(), | |
829 | urlencode_string(src_uri.object(), unicode_output=True)) | |
830 | headers['x-amz-metadata-directive'] = "COPY" | |
889 | ||
890 | multipart = False | |
891 | headers = None | |
892 | ||
893 | if extra_headers or self.config.mime_type: | |
894 | # Force replace, that will force getting meta with object_info() | |
895 | replace_meta = True | |
896 | ||
897 | if replace_meta: | |
898 | src_info = self.object_info(src_uri) | |
899 | headers = src_info['headers'] | |
900 | src_size = int(headers["content-length"]) | |
901 | ||
902 | if self.config.enable_multipart: | |
903 | # Get size of remote source only if multipart is enabled and no | |
904 | # size info was provided | |
905 | src_headers = headers | |
906 | if src_size is None: | |
907 | src_info = self.object_info(src_uri) | |
908 | src_headers = src_info['headers'] | |
909 | src_size = int(src_headers["content-length"]) | |
910 | ||
911 | # If we are over the grand maximum size for a normal copy/modify | |
912 | # (> 5GB) go nuclear and use multipart copy as the only option to | |
913 | # modify an object. | |
914 | # Reason is an aws s3 design bug. See: | |
915 | # https://github.com/aws/aws-sdk-java/issues/367 | |
916 | if src_uri is dst_uri: | |
917 | # optimisation in the case of modify | |
918 | threshold = MultiPartUpload.MAX_CHUNK_SIZE_MB * SIZE_1MB | |
919 | else: | |
920 | threshold = self.config.multipart_copy_chunk_size_mb * SIZE_1MB | |
921 | ||
922 | if src_size > threshold: | |
923 | # Sadly, s3 has a bad logic as metadata will not be copied for | |
924 | # multipart copy unlike what is done for direct copies. | |
925 | # TODO: Optimize by re-using the object_info request done | |
926 | # earlier at the fetch remote stage, and preserve headers. | |
927 | if src_headers is None: | |
928 | src_info = self.object_info(src_uri) | |
929 | src_headers = src_info['headers'] | |
930 | src_size = int(src_headers["content-length"]) | |
931 | headers = src_headers | |
932 | multipart = True | |
933 | ||
934 | if headers: | |
935 | self._sanitize_headers(headers) | |
936 | headers = SortedDict(headers, ignore_case=True) | |
937 | else: | |
938 | headers = SortedDict(ignore_case=True) | |
939 | ||
940 | # The following metadata is updated even in COPY by aws | |
831 | 941 | if self.config.acl_public: |
832 | 942 | headers["x-amz-acl"] = "public-read" |
833 | 943 | |
840 | 950 | ## Set kms headers |
841 | 951 | if self.config.kms_key: |
842 | 952 | headers['x-amz-server-side-encryption'] = 'aws:kms' |
843 | headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key | |
844 | ||
953 | headers['x-amz-server-side-encryption-aws-kms-key-id'] = \ | |
954 | self.config.kms_key | |
955 | ||
956 | # The following metadata is not updated in a simple COPY by aws. | |
845 | 957 | if extra_headers: |
846 | 958 | headers.update(extra_headers) |
847 | 959 | |
848 | request = self.create_request("OBJECT_PUT", uri = dst_uri, headers = headers) | |
849 | response = self.send_request(request) | |
960 | if self.config.mime_type: | |
961 | headers["content-type"] = self.config.mime_type | |
962 | ||
963 | # "COPY" or "REPLACE" | |
964 | if not replace_meta: | |
965 | headers['x-amz-metadata-directive'] = "COPY" | |
966 | else: | |
967 | headers['x-amz-metadata-directive'] = "REPLACE" | |
968 | ||
969 | if multipart: | |
970 | # Multipart decision. Only do multipart copy for remote s3 files | |
971 | # bigger than the multipart copy threshold. | |
972 | ||
973 | # Multipart requests are quite different... delegate | |
974 | response = self.copy_file_multipart(src_uri, dst_uri, src_size, | |
975 | headers, extra_label) | |
976 | else: | |
977 | # Not multipart... direct request | |
978 | headers['x-amz-copy-source'] = s3_quote( | |
979 | "/%s/%s" % (src_uri.bucket(), src_uri.object()), | |
980 | quote_backslashes=False, unicode_output=True) | |
981 | ||
982 | request = self.create_request("OBJECT_PUT", uri=dst_uri, | |
983 | headers=headers) | |
984 | response = self.send_request(request) | |
985 | ||
850 | 986 | if response["data"] and getRootTagName(response["data"]) == "Error": |
851 | #http://doc.s3.amazonaws.com/proposals/copy.html | |
987 | # http://doc.s3.amazonaws.com/proposals/copy.html | |
852 | 988 | # Error during copy, status will be 200, so force error code 500 |
853 | 989 | response["status"] = 500 |
854 | error("Server error during the COPY operation. Overwrite response status to 500") | |
990 | error("Server error during the COPY operation. Overwrite response " | |
991 | "status to 500") | |
855 | 992 | raise S3Error(response) |
856 | 993 | |
857 | 994 | if self.config.acl_public is None and acl: |
864 | 1001 | raise exc |
865 | 1002 | return response |
866 | 1003 | |
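The memo in the object_copy docstring boils down to a small decision: use REPLACE (with headers pre-seeded from object_info) whenever metadata must change, COPY otherwise, and switch to multipart copy once the source exceeds the relevant size threshold. A schematic sketch of that decision, independent of the real S3 class (threshold defaults below are illustrative):

    SIZE_1MB = 1024 * 1024

    def plan_copy(src_size, extra_headers=None, mime_type=None,
                  modify_in_place=False, copy_chunk_mb=1024,
                  max_chunk_mb=5 * 1024):
        replace_meta = bool(extra_headers or mime_type)
        directive = "REPLACE" if replace_meta else "COPY"
        # modify (src is dst) only goes multipart past the 5 GB hard limit;
        # a plain copy switches as soon as the copy chunk size is exceeded.
        threshold = (max_chunk_mb if modify_in_place else copy_chunk_mb) * SIZE_1MB
        return directive, src_size > threshold

    print(plan_copy(6 * 1024 ** 3))                    # ('COPY', True)
    print(plan_copy(10 * SIZE_1MB, mime_type='a/b'))   # ('REPLACE', False)
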
867 | def object_modify(self, src_uri, dst_uri, extra_headers = None): | |
868 | ||
869 | if src_uri.type != "s3": | |
870 | raise ValueError("Expected URI type 's3', got '%s'" % src_uri.type) | |
871 | if dst_uri.type != "s3": | |
872 | raise ValueError("Expected URI type 's3', got '%s'" % dst_uri.type) | |
873 | ||
874 | info_response = self.object_info(src_uri) | |
875 | headers = info_response['headers'] | |
876 | headers = self._sanitize_headers(headers) | |
877 | ||
878 | try: | |
879 | acl = self.get_acl(src_uri) | |
880 | except S3Error as exc: | |
881 | # Ignore the exception and don't fail the modify | |
882 | # if the server doesn't support setting ACLs | |
883 | if exc.status != 501: | |
884 | raise exc | |
885 | acl = None | |
886 | ||
887 | headers['x-amz-copy-source'] = "/%s/%s" % (src_uri.bucket(), | |
888 | urlencode_string(src_uri.object(), unicode_output=True)) | |
889 | headers['x-amz-metadata-directive'] = "REPLACE" | |
890 | ||
891 | # cannot change between standard and reduced redundancy with a REPLACE. | |
892 | ||
893 | ## Set server side encryption | |
894 | if self.config.server_side_encryption: | |
895 | headers["x-amz-server-side-encryption"] = "AES256" | |
896 | ||
897 | ## Set kms headers | |
898 | if self.config.kms_key: | |
899 | headers['x-amz-server-side-encryption'] = 'aws:kms' | |
900 | headers['x-amz-server-side-encryption-aws-kms-key-id'] = self.config.kms_key | |
901 | ||
902 | if extra_headers: | |
903 | headers.update(extra_headers) | |
904 | ||
905 | if self.config.mime_type: | |
906 | headers["content-type"] = self.config.mime_type | |
907 | ||
908 | request = self.create_request("OBJECT_PUT", uri = src_uri, headers = headers) | |
909 | response = self.send_request(request) | |
910 | if response["data"] and getRootTagName(response["data"]) == "Error": | |
911 | #http://doc.s3.amazonaws.com/proposals/copy.html | |
912 | # Error during modify, status will be 200, so force error code 500 | |
913 | response["status"] = 500 | |
914 | error("Server error during the MODIFY operation. Overwrite response status to 500") | |
915 | raise S3Error(response) | |
916 | ||
917 | if acl != None: | |
918 | try: | |
919 | self.set_acl(src_uri, acl) | |
920 | except S3Error as exc: | |
921 | # Ignore the exception and don't fail the modify | |
922 | # if the server doesn't support setting ACLs | |
923 | if exc.status != 501: | |
924 | raise exc | |
925 | ||
926 | return response | |
927 | ||
928 | def object_move(self, src_uri, dst_uri, extra_headers = None): | |
929 | response_copy = self.object_copy(src_uri, dst_uri, extra_headers) | |
1004 | def object_modify(self, src_uri, dst_uri, extra_headers=None, | |
1005 | src_size=None, extra_label=""): | |
1006 | # dst_uri = src_uri: will optimize by using multipart only in the worst case | |
1007 | return self.object_copy(src_uri, src_uri, extra_headers, src_size, | |
1008 | extra_label, replace_meta=True) | |
1009 | ||
1010 | def object_move(self, src_uri, dst_uri, extra_headers=None, | |
1011 | src_size=None, extra_label=""): | |
1012 | response_copy = self.object_copy(src_uri, dst_uri, extra_headers, | |
1013 | src_size, extra_label) | |
930 | 1014 | debug("Object %s copied to %s" % (src_uri, dst_uri)) |
931 | if not response_copy["data"] or getRootTagName(response_copy["data"]) == "CopyObjectResult": | |
1015 | if not response_copy["data"] \ | |
1016 | or getRootTagName(response_copy["data"]) \ | |
1017 | in ["CopyObjectResult", "CompleteMultipartUploadResult"]: | |
932 | 1018 | self.object_delete(src_uri) |
933 | 1019 | debug("Object '%s' deleted", src_uri) |
934 | 1020 | else: |
935 | debug("Object '%s' NOT deleted because of an unexepected response data content.", src_uri) | |
1021 | warning("Object '%s' NOT deleted because of an unexpected " | |
1022 | "response data content.", src_uri) | |
936 | 1023 | return response_copy |
937 | 1024 | |
938 | 1025 | def object_info(self, uri): |
939 | request = self.create_request("OBJECT_HEAD", uri = uri) | |
1026 | request = self.create_request("OBJECT_HEAD", uri=uri) | |
940 | 1027 | response = self.send_request(request) |
941 | 1028 | return response |
942 | 1029 | |
943 | 1030 | def get_acl(self, uri): |
944 | 1031 | if uri.has_object(): |
945 | request = self.create_request("OBJECT_GET", uri = uri, | |
946 | uri_params = {'acl': None}) | |
1032 | request = self.create_request("OBJECT_GET", uri=uri, | |
1033 | uri_params={'acl': None}) | |
947 | 1034 | else: |
948 | request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), | |
949 | uri_params = {'acl': None}) | |
1035 | request = self.create_request("BUCKET_LIST", bucket=uri.bucket(), | |
1036 | uri_params={'acl': None}) | |
950 | 1037 | |
951 | 1038 | response = self.send_request(request) |
952 | 1039 | acl = ACL(response['data']) |
953 | 1040 | return acl |
954 | 1041 | |
955 | 1042 | def set_acl(self, uri, acl): |
956 | # dreamhost doesn't support set_acl properly | |
957 | if 'objects.dreamhost.com' in self.config.host_base: | |
958 | return { 'status' : 501 } # not implemented | |
959 | ||
960 | 1043 | body = u"%s"% acl |
961 | 1044 | debug(u"set_acl(%s): acl-xml: %s" % (uri, body)) |
962 | 1045 | |
1060 | 1143 | response = self.send_request(request) |
1061 | 1144 | return response |
1062 | 1145 | |
1063 | def get_multipart(self, uri): | |
1064 | request = self.create_request("BUCKET_LIST", bucket = uri.bucket(), | |
1065 | uri_params = {'uploads': None}) | |
1146 | def get_multipart(self, uri, uri_params=None, limit=-1): | |
1147 | upload_list = [] | |
1148 | for truncated, uploads in self.get_multipart_streaming(uri, | |
1149 | uri_params, | |
1150 | limit): | |
1151 | upload_list.extend(uploads) | |
1152 | ||
1153 | return upload_list | |
1154 | ||
1155 | def get_multipart_streaming(self, uri, uri_params=None, limit=-1): | |
1156 | uri_params = uri_params and uri_params.copy() or {} | |
1157 | bucket = uri.bucket() | |
1158 | ||
1159 | truncated = True | |
1160 | num_objects = 0 | |
1161 | max_keys = limit | |
1162 | ||
1163 | # It is the "uploads: None" in uri_params that will change the | |
1164 | # behavior of bucket_list to return multiparts instead of keys | |
1165 | uri_params['uploads'] = None | |
1166 | while truncated: | |
1167 | response = self.bucket_list_noparse(bucket, recursive=True, | |
1168 | uri_params=uri_params, | |
1169 | max_keys=max_keys) | |
1170 | ||
1171 | xml_data = response["data"] | |
1172 | # extract the list of info for in-progress uploads | |
1173 | upload_list = getListFromXml(xml_data, "Upload") | |
1174 | num_objects += len(upload_list) | |
1175 | if limit > num_objects: | |
1176 | max_keys = limit - num_objects | |
1177 | ||
1178 | xml_truncated = getTextFromXml(xml_data, ".//IsTruncated") | |
1179 | if not xml_truncated or xml_truncated.lower() == "false": | |
1180 | truncated = False | |
1181 | ||
1182 | if truncated: | |
1183 | if limit == -1 or num_objects < limit: | |
1184 | if upload_list: | |
1185 | next_key = getTextFromXml(xml_data, "NextKeyMarker") | |
1186 | if not next_key: | |
1187 | next_key = upload_list[-1]["Key"] | |
1188 | uri_params['KeyMarker'] = next_key | |
1189 | ||
1190 | upload_id_marker = getTextFromXml( | |
1191 | xml_data, "NextUploadIdMarker") | |
1192 | if upload_id_marker: | |
1193 | uri_params['UploadIdMarker'] = upload_id_marker | |
1194 | elif 'UploadIdMarker' in uri_params: | |
1195 | # Clear any pre-existing value | |
1196 | del uri_params['UploadIdMarker'] | |
1197 | else: | |
1198 | # Unexpectedly, the server lied, and so the previous | |
1199 | # response was not truncated. So, no new key to get. | |
1200 | yield False, upload_list | |
1201 | break | |
1202 | debug("Listing continues after '%s'" % | |
1203 | uri_params['KeyMarker']) | |
1204 | else: | |
1205 | yield truncated, upload_list | |
1206 | break | |
1207 | yield truncated, upload_list | |
1208 | ||
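get_multipart (and list_multipart below) are now thin wrappers that drain their *_streaming counterparts, which page through results via KeyMarker/UploadIdMarker (or part-number-marker) until IsTruncated turns false. A caller can also consume the generator directly to avoid materializing everything; a sketch with a stand-in generator:

    def fake_streaming(pages):
        # Stand-in for get_multipart_streaming: yields (truncated, chunk).
        for i, page in enumerate(pages):
            yield (i + 1 < len(pages)), page

    total = 0
    for truncated, uploads in fake_streaming([[{'Key': 'a'}, {'Key': 'b'}],
                                              [{'Key': 'c'}]]):
        total += len(uploads)   # a real caller could stop early here
    print("in-progress uploads:", total)  # 3
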
1209 | def list_multipart(self, uri, upload_id, uri_params=None, limit=-1): | |
1210 | part_list = [] | |
1211 | for truncated, parts in self.list_multipart_streaming(uri, | |
1212 | upload_id, | |
1213 | uri_params, | |
1214 | limit): | |
1215 | part_list.extend(parts) | |
1216 | ||
1217 | return part_list | |
1218 | ||
1219 | def list_multipart_streaming(self, uri, upload_id, uri_params=None, | |
1220 | limit=-1): | |
1221 | uri_params = uri_params and uri_params.copy() or {} | |
1222 | ||
1223 | truncated = True | |
1224 | num_objects = 0 | |
1225 | max_parts = limit | |
1226 | ||
1227 | while truncated: | |
1228 | response = self.list_multipart_noparse(uri, upload_id, | |
1229 | uri_params, max_parts) | |
1230 | ||
1231 | xml_data = response["data"] | |
1232 | # extract list of multipart upload parts | |
1233 | part_list = getListFromXml(xml_data, "Part") | |
1234 | num_objects += len(part_list) | |
1235 | if limit > num_objects: | |
1236 | max_parts = limit - num_objects | |
1237 | ||
1238 | xml_truncated = getTextFromXml(xml_data, ".//IsTruncated") | |
1239 | if not xml_truncated or xml_truncated.lower() == "false": | |
1240 | truncated = False | |
1241 | ||
1242 | if truncated: | |
1243 | if limit == -1 or num_objects < limit: | |
1244 | if part_list: | |
1245 | next_part_number = getTextFromXml( | |
1246 | xml_data, "NextPartNumberMarker") | |
1247 | if not next_part_number: | |
1248 | next_part_number = part_list[-1]["PartNumber"] | |
1249 | uri_params['part-number-marker'] = next_part_number | |
1250 | else: | |
1251 | # Unexpectedly, the server lied, and so the previous | |
1252 | # response was not truncated. So, no new part to get. | |
1253 | yield False, part_list | |
1254 | break | |
1255 | debug("Listing continues after Part '%s'" % | |
1256 | uri_params['part-number-marker']) | |
1257 | else: | |
1258 | yield truncated, part_list | |
1259 | break | |
1260 | yield truncated, part_list | |
1261 | ||
1262 | def list_multipart_noparse(self, uri, upload_id, uri_params=None, | |
1263 | max_parts=-1): | |
1264 | if uri_params is None: | |
1265 | uri_params = {} | |
1266 | if max_parts != -1: | |
1267 | uri_params['max-parts'] = str(max_parts) | |
1268 | uri_params['uploadId'] = upload_id | |
1269 | request = self.create_request("OBJECT_GET", uri=uri, | |
1270 | uri_params=uri_params) | |
1066 | 1271 | response = self.send_request(request) |
1067 | 1272 | return response |
1068 | 1273 | |
1069 | 1274 | def abort_multipart(self, uri, id): |
1070 | 1275 | request = self.create_request("OBJECT_DELETE", uri = uri, |
1071 | uri_params = {'uploadId': id}) | |
1072 | response = self.send_request(request) | |
1073 | return response | |
1074 | ||
1075 | def list_multipart(self, uri, id): | |
1076 | request = self.create_request("OBJECT_GET", uri = uri, | |
1077 | 1276 | uri_params = {'uploadId': id}) |
1078 | 1277 | response = self.send_request(request) |
1079 | 1278 | return response |
1347 | 1546 | if response["status"] == 405: # Method Not Allowed. Don't retry. |
1348 | 1547 | raise S3Error(response) |
1349 | 1548 | |
1350 | if response["status"] >= 500: | |
1549 | if response["status"] >= 500 or response["status"] == 429: | |
1351 | 1550 | e = S3Error(response) |
1352 | 1551 | |
1353 | 1552 | if response["status"] == 501: |
1364 | 1563 | |
1365 | 1564 | if response["status"] < 200 or response["status"] > 299: |
1366 | 1565 | raise S3Error(response) |
1566 | ||
1567 | return response | |
1568 | ||
1569 | def send_request_with_progress(self, request, labels, operation_size=0): | |
1570 | """Wrapper around send_request for slow requests. | |
1571 | ||
1572 | To be able to show progression for small requests | |
1573 | """ | |
1574 | if not self.config.progress_meter: | |
1575 | info("Sending slow request, please wait...") | |
1576 | return self.send_request(request) | |
1577 | ||
1578 | if 'action' not in labels: | |
1579 | labels[u'action'] = u'request' | |
1580 | progress = self.config.progress_class(labels, operation_size) | |
1581 | ||
1582 | try: | |
1583 | response = self.send_request(request) | |
1584 | except Exception as exc: | |
1585 | progress.done("failed") | |
1586 | raise | |
1587 | ||
1588 | progress.update(current_position=operation_size) | |
1589 | progress.done("done") | |
1367 | 1590 | |
1368 | 1591 | return response |
1369 | 1592 | |
1444 | 1667 | http_response.read() |
1445 | 1668 | conn.c._HTTPConnection__state = ConnMan._CS_REQ_SENT |
1446 | 1669 | |
1447 | while (size_left > 0): | |
1670 | while size_left > 0: | |
1448 | 1671 | #debug("SendFile: Reading up to %d bytes from '%s' - remaining bytes: %s" % (self.config.send_chunk, filename, size_left)) |
1449 | 1672 | l = min(self.config.send_chunk, size_left) |
1450 | 1673 | if buffer == '': |
1451 | 1674 | data = stream.read(l) |
1452 | 1675 | else: |
1453 | 1676 | data = buffer |
1677 | ||
1678 | if not data: | |
1679 | raise InvalidFileError("File smaller than expected. Was the file truncated?") | |
1454 | 1680 | |
1455 | 1681 | if self.config.limitrate > 0: |
1456 | 1682 | start_time = time.time() |
1483 | 1709 | ConnMan.put(conn) |
1484 | 1710 | debug(u"Response:\n" + pprint.pformat(response)) |
1485 | 1711 | except ParameterError as e: |
1712 | raise | |
1713 | except InvalidFileError as e: | |
1714 | if self.config.progress_meter: | |
1715 | progress.done("failed") | |
1486 | 1716 | raise |
1487 | 1717 | except Exception as e: |
1488 | 1718 | if self.config.progress_meter: |
1506 | 1736 | response["data"] = http_response.read() |
1507 | 1737 | response["size"] = size_total |
1508 | 1738 | known_error = True |
1509 | except: | |
1739 | except Exception: | |
1510 | 1740 | error("Cannot retrieve any response status before encountering an EPIPE or ECONNRESET exception") |
1511 | 1741 | if not known_error: |
1512 | 1742 | warning("Upload failed: %s (%s)" % (resource['uri'], e)) |
1564 | 1794 | if response["status"] < 200 or response["status"] > 299: |
1565 | 1795 | try_retry = False |
1566 | 1796 | if response["status"] >= 500: |
1567 | ## AWS internal error - retry | |
1797 | # AWS internal error - retry | |
1568 | 1798 | try_retry = True |
1569 | 1799 | if response["status"] == 503: |
1570 | 1800 | ## SlowDown error |
1571 | 1801 | throttle = throttle and throttle * 5 or 0.01 |
1802 | elif response["status"] == 429: | |
1803 | # Not an AWS error, but a possible error from s3-compatible servers: | |
1804 | # TooManyRequests/Busy/slowdown | |
1805 | try_retry = True | |
1806 | throttle = throttle and throttle * 5 or 0.01 | |
1572 | 1807 | elif response["status"] >= 400: |
1573 | 1808 | err = S3Error(response) |
1574 | 1809 | ## Retriable client error? |
1585 | 1820 | return self.send_file(request, stream, labels, buffer, throttle, |
1586 | 1821 | retries - 1, offset, chunk_size, use_expect_continue) |
1587 | 1822 | else: |
1588 | warning("Too many failures. Giving up on '%s'" % (filename)) | |
1589 | raise S3UploadError | |
1823 | warning("Too many failures. Giving up on '%s'" % filename) | |
1824 | raise S3UploadError("Too many failures. Giving up on '%s'" | |
1825 | % filename) | |
1590 | 1826 | |
1591 | 1827 | ## Non-recoverable error |
1592 | 1828 | raise S3Error(response) |
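On 503, and now on 429 too, the retry path above multiplies the current throttle by 5 starting from 0.01 s, so repeated SlowDown/TooManyRequests responses back off geometrically. The schedule it produces:

    def next_throttle(throttle):
        # Same idiom as above: start at 0.01s, then grow 5x per slowdown.
        return throttle and throttle * 5 or 0.01

    t, delays = 0, []
    for _ in range(5):
        t = next_throttle(t)
        delays.append(t)
    print(delays)  # [0.01, 0.05, 0.25, 1.25, 6.25]
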
1602 | 1838 | retries - 1, offset, chunk_size, use_expect_continue) |
1603 | 1839 | else: |
1604 | 1840 | warning("Too many failures. Giving up on '%s'" % (filename)) |
1605 | raise S3UploadError | |
1606 | ||
1607 | return response | |
1608 | ||
1609 | def send_file_multipart(self, stream, headers, uri, size, extra_label = ""): | |
1841 | raise S3UploadError("Too many failures. Giving up on '%s'" | |
1842 | % filename) | |
1843 | ||
1844 | return response | |
1845 | ||
1846 | def send_file_multipart(self, stream, headers, uri, size, extra_label=""): | |
1610 | 1847 | timestamp_start = time.time() |
1611 | upload = MultiPartUpload(self, stream, uri, headers) | |
1848 | upload = MultiPartUpload(self, stream, uri, headers, size) | |
1612 | 1849 | upload.upload_all_parts(extra_label) |
1613 | 1850 | response = upload.complete_multipart_upload() |
1614 | 1851 | timestamp_end = time.time() |
1621 | 1858 | # raise S3UploadError |
1622 | 1859 | raise S3UploadError(getTextFromXml(response["data"], 'Message')) |
1623 | 1860 | return response |
1861 | ||
1862 | def copy_file_multipart(self, src_uri, dst_uri, size, headers, | |
1863 | extra_label=""): | |
1864 | return self.send_file_multipart(src_uri, headers, dst_uri, size, | |
1865 | extra_label) | |
1624 | 1866 | |
1625 | 1867 | def recv_file(self, request, stream, labels, start_position = 0, retries = _max_retries): |
1626 | 1868 | self.update_region_inner_request(request) |
1827 | 2069 | |
1828 | 2070 | def compute_content_md5(body): |
1829 | 2071 | m = md5(encode_to_s3(body)) |
1830 | base64md5 = base64.encodestring(m.digest()) | |
2072 | base64md5 = encodestring(m.digest()) | |
1831 | 2073 | base64md5 = decode_from_s3(base64md5) |
1832 | 2074 | if base64md5[-1] == '\n': |
1833 | 2075 | base64md5 = base64md5[0:-1] |
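compute_content_md5 pairs the md5 digest with the base64 helper imported near the top of the file (encodestring on Python 2, aliased to encodebytes on Python 3.9+). A self-contained equivalent using the Python 3 name directly:

    from base64 import encodebytes
    from hashlib import md5

    def content_md5(body):
        # Base64 of the raw md5 digest, trailing newline stripped --
        # the value expected in a Content-MD5 header.
        if not isinstance(body, bytes):
            body = body.encode('utf-8')
        b64 = encodebytes(md5(body).digest()).decode('ascii')
        return b64.rstrip('\n')

    print(content_md5(b''))  # 1B2M2Y8AsgTpgAmY7PhCfg==
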
36 | 36 | def keys(self): |
37 | 37 | # TODO fix |
38 | 38 | # Probably not anymore memory efficient on python2 |
39 | # as now 2 copies ok keys to sort them. | |
39 | # as now 2 copies of keys to sort them. | |
40 | 40 | keys = dict.keys(self) |
41 | 41 | if self.ignore_case: |
42 | 42 | # Translation map |
8 | 8 | from __future__ import absolute_import, division |
9 | 9 | |
10 | 10 | import os |
11 | import sys | |
12 | 11 | import time |
13 | 12 | import re |
14 | import string | |
13 | import string as string_mod | |
15 | 14 | import random |
16 | 15 | import errno |
17 | from calendar import timegm | |
18 | from logging import debug, warning, error | |
19 | from .ExitCodes import EX_OSFILE | |
20 | try: | |
21 | import dateutil.parser | |
22 | except ImportError: | |
23 | sys.stderr.write(u""" | |
24 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
25 | ImportError trying to import dateutil.parser. | |
26 | Please install the python dateutil module: | |
27 | $ sudo apt-get install python-dateutil | |
28 | or | |
29 | $ sudo yum install python-dateutil | |
30 | or | |
31 | $ pip install python-dateutil | |
32 | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
33 | """) | |
34 | sys.stderr.flush() | |
35 | sys.exit(EX_OSFILE) | |
36 | ||
37 | try: | |
38 | from urllib import quote | |
39 | except ImportError: | |
40 | # python 3 support | |
41 | from urllib.parse import quote | |
16 | from hashlib import md5 | |
17 | from logging import debug | |
18 | ||
42 | 19 | |
43 | 20 | try: |
44 | 21 | unicode |
47 | 24 | # In python 3, unicode -> str, and str -> bytes |
48 | 25 | unicode = str |
49 | 26 | |
27 | ||
50 | 28 | import S3.Config |
51 | 29 | import S3.Exceptions |
52 | import xml.dom.minidom | |
53 | ||
54 | from hashlib import md5 | |
55 | ||
56 | import xml.etree.ElementTree as ET | |
30 | ||
31 | from S3.BaseUtils import (base_urlencode_string, base_replace_nonprintables, | |
32 | base_unicodise, base_deunicodise) | |
33 | ||
57 | 34 | |
58 | 35 | __all__ = [] |
59 | def parseNodes(nodes): | |
60 | ## WARNING: Ignores text nodes from mixed xml/text. | |
61 | ## For instance <tag1>some text<tag2>other text</tag2></tag1> | |
62 | ## will ignore the "some text" node | |
63 | retval = [] | |
64 | for node in nodes: | |
65 | retval_item = {} | |
66 | for child in node.getchildren(): | |
67 | name = decode_from_s3(child.tag) | |
68 | if child.getchildren(): | |
69 | retval_item[name] = parseNodes([child]) | |
70 | else: | |
71 | found_text = node.findtext(".//%s" % child.tag) | |
72 | if found_text is not None: | |
73 | retval_item[name] = decode_from_s3(found_text) | |
74 | else: | |
75 | retval_item[name] = None | |
76 | retval.append(retval_item) | |
77 | return retval | |
78 | __all__.append("parseNodes") | |
79 | ||
80 | def getPrettyFromXml(xmlstr): | |
81 | xmlparser = xml.dom.minidom.parseString(xmlstr) | |
82 | return xmlparser.toprettyxml() | |
83 | ||
84 | __all__.append("getPrettyFromXml") | |
85 | ||
86 | RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](http://[^\'"]+)[\'"]', re.MULTILINE) | |
87 | ||
88 | def stripNameSpace(xml): | |
89 | """ | |
90 | removeNameSpace(xml) -- remove top-level AWS namespace | |
91 | Operate on raw byte(utf-8) xml string. (Not unicode) | |
92 | """ | |
93 | xmlns_match = RE_XML_NAMESPACE.match(xml) | |
94 | if xmlns_match: | |
95 | xmlns = xmlns_match.group(3) | |
96 | xml = RE_XML_NAMESPACE.sub("\\1\\2", xml, 1) | |
97 | else: | |
98 | xmlns = None | |
99 | return xml, xmlns | |
100 | __all__.append("stripNameSpace") | |
101 | ||
102 | def getTreeFromXml(xml): | |
103 | xml, xmlns = stripNameSpace(encode_to_s3(xml)) | |
104 | try: | |
105 | tree = ET.fromstring(xml) | |
106 | if xmlns: | |
107 | tree.attrib['xmlns'] = xmlns | |
108 | return tree | |
109 | except Exception as e: | |
110 | error("Error parsing xml: %s", e) | |
111 | error(xml) | |
112 | raise | |
113 | ||
114 | __all__.append("getTreeFromXml") | |
115 | ||
116 | def getListFromXml(xml, node): | |
117 | tree = getTreeFromXml(xml) | |
118 | nodes = tree.findall('.//%s' % (node)) | |
119 | return parseNodes(nodes) | |
120 | __all__.append("getListFromXml") | |
121 | ||
122 | def getDictFromTree(tree): | |
123 | ret_dict = {} | |
124 | for child in tree.getchildren(): | |
125 | if child.getchildren(): | |
126 | ## Complex-type child. Recurse | |
127 | content = getDictFromTree(child) | |
128 | else: | |
129 | content = decode_from_s3(child.text) if child.text is not None else None | |
130 | child_tag = decode_from_s3(child.tag) | |
131 | if child_tag in ret_dict: | |
132 | if not type(ret_dict[child_tag]) == list: | |
133 | ret_dict[child_tag] = [ret_dict[child_tag]] | |
134 | ret_dict[child_tag].append(content or "") | |
135 | else: | |
136 | ret_dict[child_tag] = content or "" | |
137 | return ret_dict | |
138 | __all__.append("getDictFromTree") | |
139 | ||
140 | def getTextFromXml(xml, xpath): | |
141 | tree = getTreeFromXml(xml) | |
142 | if tree.tag.endswith(xpath): | |
143 | return decode_from_s3(tree.text) if tree.text is not None else None | |
144 | else: | |
145 | result = tree.findtext(xpath) | |
146 | return decode_from_s3(result) if result is not None else None | |
147 | __all__.append("getTextFromXml") | |
148 | ||
149 | def getRootTagName(xml): | |
150 | tree = getTreeFromXml(xml) | |
151 | return decode_from_s3(tree.tag) if tree.tag is not None else None | |
152 | __all__.append("getRootTagName") | |
153 | ||
154 | def xmlTextNode(tag_name, text): | |
155 | el = ET.Element(tag_name) | |
156 | el.text = decode_from_s3(text) | |
157 | return el | |
158 | __all__.append("xmlTextNode") | |
159 | ||
160 | def appendXmlTextNode(tag_name, text, parent): | |
161 | """ | |
162 | Creates a new <tag_name> Node and sets | |
163 | its content to 'text'. Then appends the | |
164 | created Node to 'parent' element if given. | |
165 | Returns the newly created Node. | |
166 | """ | |
167 | el = xmlTextNode(tag_name, text) | |
168 | parent.append(el) | |
169 | return el | |
170 | __all__.append("appendXmlTextNode") | |
171 | ||
172 | RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)') | |
173 | ||
174 | def dateS3toPython(date): | |
175 | # Reset milliseconds to 000 | |
176 | date = RE_S3_DATESTRING.sub(".000", date) | |
177 | return dateutil.parser.parse(date, fuzzy=True) | |
178 | __all__.append("dateS3toPython") | |
179 | ||
180 | def dateS3toUnix(date): | |
181 | ## NOTE: This is timezone-aware and return the timestamp regarding GMT | |
182 | return timegm(dateS3toPython(date).utctimetuple()) | |
183 | __all__.append("dateS3toUnix") | |
184 | ||
185 | def dateRFC822toPython(date): | |
186 | return dateutil.parser.parse(date, fuzzy=True) | |
187 | __all__.append("dateRFC822toPython") | |
188 | ||
189 | def dateRFC822toUnix(date): | |
190 | return timegm(dateRFC822toPython(date).utctimetuple()) | |
191 | __all__.append("dateRFC822toUnix") | |
192 | ||
193 | def formatSize(size, human_readable = False, floating_point = False): | |
36 | ||
37 | ||
38 | def formatSize(size, human_readable=False, floating_point=False): | |
194 | 39 | size = floating_point and float(size) or int(size) |
195 | 40 | if human_readable: |
196 | 41 | coeffs = ['K', 'M', 'G', 'T'] |
203 | 48 | return (size, "") |
204 | 49 | __all__.append("formatSize") |
205 | 50 | |
206 | def formatDateTime(s3timestamp): | |
207 | date_obj = dateutil.parser.parse(s3timestamp, fuzzy=True) | |
208 | return date_obj.strftime("%Y-%m-%d %H:%M") | |
209 | __all__.append("formatDateTime") | |
210 | 51 | |
211 | 52 | def convertHeaderTupleListToDict(list): |
212 | 53 | """ |
218 | 59 | return retval |
219 | 60 | __all__.append("convertHeaderTupleListToDict") |
220 | 61 | |
221 | _rnd_chars = string.ascii_letters+string.digits | |
62 | _rnd_chars = string_mod.ascii_letters + string_mod.digits | |
222 | 63 | _rnd_chars_len = len(_rnd_chars) |
223 | 64 | def rndstr(len): |
224 | 65 | retval = "" |
227 | 68 | len -= 1 |
228 | 69 | return retval |
229 | 70 | __all__.append("rndstr") |
71 | ||
230 | 72 | |
231 | 73 | def mktmpsomething(prefix, randchars, createfunc): |
232 | 74 | old_umask = os.umask(0o077) |
246 | 88 | return dirname |
247 | 89 | __all__.append("mktmpsomething") |
248 | 90 | |
91 | ||
249 | 92 | def mktmpdir(prefix = os.getenv('TMP','/tmp') + "/tmpdir-", randchars = 10): |
250 | 93 | return mktmpsomething(prefix, randchars, os.mkdir) |
251 | 94 | __all__.append("mktmpdir") |
95 | ||
252 | 96 | |
253 | 97 | def mktmpfile(prefix = os.getenv('TMP','/tmp') + "/tmpfile-", randchars = 20): |
254 | 98 | createfunc = lambda filename : os.close(os.open(deunicodise(filename), os.O_CREAT | os.O_EXCL)) |
255 | 99 | return mktmpsomething(prefix, randchars, createfunc) |
256 | 100 | __all__.append("mktmpfile") |
101 | ||
257 | 102 | |
258 | 103 | def hash_file_md5(filename): |
259 | 104 | h = md5() |
266 | 111 | h.update(data) |
267 | 112 | return h.hexdigest() |
268 | 113 | __all__.append("hash_file_md5") |
114 | ||
269 | 115 | |
270 | 116 | def mkdir_with_parents(dir_name): |
271 | 117 | """ |
294 | 140 | return True |
295 | 141 | __all__.append("mkdir_with_parents") |
296 | 142 | |
297 | def unicodise(string, encoding = None, errors = "replace", silent=False): | |
298 | """ | |
299 | Convert 'string' to Unicode or raise an exception. | |
300 | """ | |
301 | ||
143 | def unicodise(string, encoding=None, errors='replace', silent=False): | |
302 | 144 | if not encoding: |
303 | 145 | encoding = S3.Config.Config().encoding |
304 | ||
305 | if type(string) == unicode: | |
306 | return string | |
307 | ||
308 | if not silent: | |
309 | debug("Unicodising %r using %s" % (string, encoding)) | |
310 | try: | |
311 | return unicode(string, encoding, errors) | |
312 | except UnicodeDecodeError: | |
313 | raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) | |
146 | return base_unicodise(string, encoding, errors, silent) | |
314 | 147 | __all__.append("unicodise") |
315 | 148 | |
316 | def unicodise_s(string, encoding = None, errors = "replace"): | |
149 | ||
150 | def unicodise_s(string, encoding=None, errors='replace'): | |
317 | 151 | """ |
318 | 152 | Alias to silent version of unicodise |
319 | 153 | """ |
320 | 154 | return unicodise(string, encoding, errors, True) |
321 | 155 | __all__.append("unicodise_s") |
322 | 156 | |
323 | def deunicodise(string, encoding = None, errors = "replace", silent=False): | |
324 | """ | |
325 | Convert unicode 'string' to <type str>, by default replacing | |
326 | all invalid characters with '?' or raise an exception. | |
327 | """ | |
328 | ||
157 | ||
158 | def deunicodise(string, encoding=None, errors='replace', silent=False): | |
329 | 159 | if not encoding: |
330 | 160 | encoding = S3.Config.Config().encoding |
331 | ||
332 | if type(string) != unicode: | |
333 | return string | |
334 | ||
335 | if not silent: | |
336 | debug("DeUnicodising %r using %s" % (string, encoding)) | |
337 | try: | |
338 | return string.encode(encoding, errors) | |
339 | except UnicodeEncodeError: | |
340 | raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) | |
161 | return base_deunicodise(string, encoding, errors, silent) | |
341 | 162 | __all__.append("deunicodise") |
342 | 163 | |
343 | def deunicodise_s(string, encoding = None, errors = "replace"): | |
164 | ||
165 | def deunicodise_s(string, encoding=None, errors='replace'): | |
344 | 166 | """ |
345 | 167 | Alias to silent version of deunicodise |
346 | 168 | """ |
347 | 169 | return deunicodise(string, encoding, errors, True) |
348 | 170 | __all__.append("deunicodise_s") |
349 | 171 | |
350 | def unicodise_safe(string, encoding = None): | |
172 | ||
173 | def unicodise_safe(string, encoding=None): | |
351 | 174 | """ |
352 | 175 | Convert 'string' to Unicode according to current encoding |
353 | 176 | and replace all invalid characters with '?' |
356 | 179 | return unicodise(deunicodise(string, encoding), encoding).replace(u'\ufffd', '?') |
357 | 180 | __all__.append("unicodise_safe") |
358 | 181 | |
359 | def decode_from_s3(string, errors = "replace"): | |
360 | """ | |
361 | Convert S3 UTF-8 'string' to Unicode or raise an exception. | |
362 | """ | |
363 | if type(string) == unicode: | |
364 | return string | |
365 | # Be quiet by default | |
366 | #debug("Decoding string from S3: %r" % string) | |
367 | try: | |
368 | return unicode(string, "UTF-8", errors) | |
369 | except UnicodeDecodeError: | |
370 | raise UnicodeDecodeError("Conversion to unicode failed: %r" % string) | |
371 | __all__.append("decode_from_s3") | |
372 | ||
373 | def encode_to_s3(string, errors = "replace"): | |
374 | """ | |
375 | Convert Unicode to S3 UTF-8 'string', by default replacing | |
376 | all invalid characters with '?' or raise an exception. | |
377 | """ | |
378 | if type(string) != unicode: | |
379 | return string | |
380 | # Be quiet by default | |
381 | #debug("Encoding string to S3: %r" % string) | |
382 | try: | |
383 | return string.encode("UTF-8", errors) | |
384 | except UnicodeEncodeError: | |
385 | raise UnicodeEncodeError("Conversion from unicode failed: %r" % string) | |
386 | __all__.append("encode_to_s3") | |
387 | 182 | |
388 | 183 | ## Low level methods |
389 | def urlencode_string(string, urlencoding_mode = None, unicode_output=False): | |
390 | string = encode_to_s3(string) | |
391 | ||
184 | def urlencode_string(string, urlencoding_mode=None, unicode_output=False): | |
392 | 185 | if urlencoding_mode is None: |
393 | 186 | urlencoding_mode = S3.Config.Config().urlencoding_mode |
394 | ||
395 | if urlencoding_mode == "verbatim": | |
396 | ## Don't do any pre-processing | |
397 | return string | |
398 | ||
399 | encoded = quote(string, safe="~/") | |
400 | debug("String '%s' encoded to '%s'" % (string, encoded)) | |
401 | if unicode_output: | |
402 | return decode_from_s3(encoded) | |
403 | else: | |
404 | return encode_to_s3(encoded) | |
187 | return base_urlencode_string(string, urlencoding_mode, unicode_output) | |
405 | 188 | __all__.append("urlencode_string") |
189 | ||
406 | 190 | |
407 | 191 | def replace_nonprintables(string): |
408 | 192 | """ |
411 | 195 | Replaces all non-printable characters 'ch' in 'string' |
412 | 196 | where ord(ch) <= 31 with ^@, ^A, ... ^_, and chr(127) with ^?
413 | 197 | """ |
414 | new_string = "" | |
415 | modified = 0 | |
416 | for c in string: | |
417 | o = ord(c) | |
418 | if (o <= 31): | |
419 | new_string += "^" + chr(ord('@') + o) | |
420 | modified += 1 | |
421 | elif (o == 127): | |
422 | new_string += "^?" | |
423 | modified += 1 | |
424 | else: | |
425 | new_string += c | |
426 | if modified and S3.Config.Config().urlencoding_mode != "fixbucket": | |
427 | warning("%d non-printable characters replaced in: %s" % (modified, new_string)) | |
428 | return new_string | |
198 | warning_message = (S3.Config.Config().urlencoding_mode != "fixbucket") | |
199 | return base_replace_nonprintables(string, warning_message) | |
429 | 200 | __all__.append("replace_nonprintables") |
201 | ||
430 | 202 | |
431 | 203 | def time_to_epoch(t): |
432 | 204 | """Convert time specified in a variety of forms into UNIX epoch time. |
462 | 234 | raise S3.Exceptions.ParameterError('Unable to convert %r to an epoch time. Pass an epoch time. Try `date -d \'now + 1 year\' +%%s` (shell) or time.mktime (Python).' % t) |
463 | 235 | |
464 | 236 | |
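The accepted input forms of time_to_epoch are mostly elided here, but integers clearly pass straight through, and the docstring promises parsing for other representations; a hedged sketch:

    time_to_epoch(1632700800)      # epoch seconds in -> epoch seconds out
    time_to_epoch(u"1632700800")   # numeric strings too (an assumption from the docstring)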
465 | def check_bucket_name(bucket, dns_strict = True): | |
237 | def check_bucket_name(bucket, dns_strict=True): | |
466 | 238 | if dns_strict: |
467 | 239 | invalid = re.search(r"([^a-z0-9\.-])", bucket, re.UNICODE)
468 | 240 | if invalid: |
497 | 269 | return False |
498 | 270 | __all__.append("check_bucket_name_dns_conformity") |
499 | 271 | |
272 | ||
500 | 273 | def check_bucket_name_dns_support(bucket_host, bucket_name): |
501 | 274 | """ |
502 | 275 | Check whether the host_bucket setting supports buckets and
507 | 280 | |
508 | 281 | return check_bucket_name_dns_conformity(bucket_name) |
509 | 282 | __all__.append("check_bucket_name_dns_support") |
283 | ||
510 | 284 | |
511 | 285 | def getBucketFromHostname(hostname): |
512 | 286 | """ |
528 | 302 | return m.group(1), True |
529 | 303 | __all__.append("getBucketFromHostname") |
530 | 304 | |
305 | ||
531 | 306 | def getHostnameFromBucket(bucket): |
532 | 307 | return S3.Config.Config().host_bucket.lower() % { 'bucket' : bucket } |
533 | 308 | __all__.append("getHostnameFromBucket") |
540 | 315 | mfile.seek(offset) |
541 | 316 | while size_left > 0: |
542 | 317 | data = mfile.read(min(send_chunk, size_left)) |
318 | if not data: | |
319 | break | |
543 | 320 | md5_hash.update(data) |
544 | 321 | size_left -= len(data) |
545 | 322 | else: |
575 | 352 | |
576 | 353 | |
577 | 354 | # vim:et:ts=4:sts=4:ai |
578 |
24 | 24 | |
25 | 25 | if sys.version_info < (2, 6): |
26 | 26 | sys.stderr.write(u"ERROR: Python 2.6 or higher required, sorry.\n") |
27 | sys.exit(EX_OSFILE) | |
27 | # 72 == EX_OSFILE | |
28 | sys.exit(72) | |
28 | 29 | |
29 | 30 | PY3 = (sys.version_info >= (3, 0)) |
30 | 31 | |
50 | 51 | |
51 | 52 | try: |
52 | 53 | import htmlentitydefs |
53 | except: | |
54 | except Exception: | |
54 | 55 | # python 3 support |
55 | 56 | import html.entities as htmlentitydefs |
56 | 57 | |
60 | 61 | # python 3 support |
61 | 62 | # In python 3, unicode -> str, and str -> bytes |
62 | 63 | unicode = str |
64 | ||
65 | try: | |
66 | unichr | |
67 | except NameError: | |
68 | # python 3 support | |
69 | # In python 3, unichr was removed as chr can now do the job | |
70 | unichr = chr | |
63 | 71 | |
64 | 72 | try: |
65 | 73 | from shutil import which |
242 | 250 | def cmd_bucket_create(args): |
243 | 251 | cfg = Config() |
244 | 252 | s3 = S3(cfg) |
253 | ||
245 | 254 | for arg in args: |
246 | 255 | uri = S3Uri(arg) |
247 | 256 | if not uri.type == "s3" or not uri.has_bucket() or uri.has_object(): |
248 | 257 | raise ParameterError("Expecting S3 URI with just the bucket name set instead of '%s'" % arg) |
249 | 258 | try: |
250 | response = s3.bucket_create(uri.bucket(), cfg.bucket_location) | |
259 | response = s3.bucket_create(uri.bucket(), cfg.bucket_location, cfg.extra_headers) | |
251 | 260 | output(u"Bucket '%s' created" % uri.uri()) |
252 | 261 | except S3Error as e: |
253 | 262 | if e.info["Code"] in S3.codes: |
420 | 429 | seq += 1 |
421 | 430 | |
422 | 431 | uri_final = S3Uri(local_list[key]['remote_uri']) |
432 | try: | |
433 | src_md5 = local_list.get_md5(key) | |
434 | except IOError: | |
435 | src_md5 = None | |
423 | 436 | |
424 | 437 | extra_headers = copy(cfg.extra_headers) |
425 | 438 | full_name_orig = local_list[key]['full_name'] |
427 | 440 | seq_label = "[%d of %d]" % (seq, local_count) |
428 | 441 | if Config().encrypt: |
429 | 442 | gpg_exitcode, full_name, extra_headers["x-amz-meta-s3tools-gpgenc"] = gpg_encrypt(full_name_orig) |
430 | attr_header = _build_attr_header(local_list, key) | |
443 | attr_header = _build_attr_header(local_list[key], key, src_md5) | |
431 | 444 | debug(u"attr_header: %s" % attr_header) |
432 | 445 | extra_headers.update(attr_header) |
433 | 446 | try: |
529 | 542 | if remote_count > 1: |
530 | 543 | raise ParameterError("Destination must be a directory or stdout when downloading multiple sources.") |
531 | 544 | remote_list[remote_list.keys()[0]]['local_filename'] = destination_base |
532 | elif os.path.isdir(deunicodise(destination_base)): | |
545 | else: | |
533 | 546 | if destination_base[-1] != os.path.sep: |
534 | 547 | destination_base += os.path.sep |
535 | 548 | for key in remote_list: |
537 | 550 | if os.path.sep != "/": |
538 | 551 | local_filename = os.path.sep.join(local_filename.split("/")) |
539 | 552 | remote_list[key]['local_filename'] = local_filename |
540 | else: | |
541 | raise InternalError("WTF? Is it a dir or not? -- %s" % destination_base) | |
542 | 553 | |
543 | 554 | if cfg.dry_run: |
544 | 555 | for key in exclude_list: |
596 | 607 | dst_stream.close() |
597 | 608 | raise ParameterError(u"File %s already exists. Use either of --force / --continue / --skip-existing or give it a new name." % destination) |
598 | 609 | except IOError as e: |
599 | error(u"Skipping %s: %s" % (destination, e.strerror)) | |
610 | error(u"Creation of file '%s' failed (Reason: %s)" | |
611 | % (destination, e.strerror)) | |
612 | if cfg.stop_on_error: | |
613 | error(u"Exiting now because of --stop-on-error") | |
614 | raise | |
615 | ret = EX_PARTIAL | |
600 | 616 | continue |
617 | ||
601 | 618 | try: |
602 | 619 | try: |
603 | 620 | response = s3.object_get(uri, dst_stream, destination, start_position = start_position, extra_label = seq_label) |
604 | 621 | finally: |
605 | 622 | dst_stream.close() |
606 | 623 | except S3DownloadError as e: |
607 | error(u"%s: Skipping that file. This is usually a transient error, please try again later." % e) | |
608 | if not file_exists: # Delete, only if file didn't exist before! | |
624 | error(u"Download of '%s' failed (Reason: %s)" % (destination, e)) | |
625 | # Delete, only if file didn't exist before! | |
626 | if not file_exists: | |
609 | 627 | debug(u"object_get failed for '%s', deleting..." % (destination,)) |
610 | 628 | os.unlink(deunicodise(destination)) |
611 | ret = EX_PARTIAL | |
612 | if cfg.stop_on_error: | |
613 | ret = EX_DATAERR | |
614 | break | |
629 | if cfg.stop_on_error: | |
630 | error(u"Exiting now because of --stop-on-error") | |
631 | raise | |
632 | ret = EX_PARTIAL | |
615 | 633 | continue |
616 | 634 | except S3Error as e: |
635 | error(u"Download of '%s' failed (Reason: %s)" % (destination, e)) | |
617 | 636 | if not file_exists: # Delete, only if file didn't exist before! |
618 | 637 | debug(u"object_get failed for '%s', deleting..." % (destination,)) |
619 | 638 | os.unlink(deunicodise(destination)) |
633 | 652 | if Config().delete_after_fetch: |
634 | 653 | s3.object_delete(uri) |
635 | 654 | output(u"File '%s' removed after fetch" % (uri)) |
636 | return EX_OK | |
655 | return ret | |
637 | 656 | |
638 | 657 | def cmd_object_del(args): |
639 | 658 | cfg = Config() |
830 | 849 | cfg = Config() |
831 | 850 | if action_str == 'modify': |
832 | 851 | if len(args) < 1: |
833 | raise ParameterError("Expecting one or more S3 URIs for " + action_str) | |
852 | raise ParameterError("Expecting one or more S3 URIs for " | |
853 | + action_str) | |
834 | 854 | destination_base = None |
835 | 855 | else: |
836 | 856 | if len(args) < 2: |
837 | raise ParameterError("Expecting two or more S3 URIs for " + action_str) | |
857 | raise ParameterError("Expecting two or more S3 URIs for " | |
858 | + action_str) | |
838 | 859 | dst_base_uri = S3Uri(args.pop()) |
839 | 860 | if dst_base_uri.type != "s3": |
840 | raise ParameterError("Destination must be S3 URI. To download a file use 'get' or 'sync'.") | |
861 | raise ParameterError("Destination must be S3 URI. To download a " | |
862 | "file use 'get' or 'sync'.") | |
841 | 863 | destination_base = dst_base_uri.uri() |
842 | 864 | |
843 | 865 | scoreboard = ExitScoreboard() |
844 | 866 | |
845 | remote_list, exclude_list, remote_total_size = fetch_remote_list(args, require_attribs = False) | |
867 | remote_list, exclude_list, remote_total_size = \ | |
868 | fetch_remote_list(args, require_attribs=False) | |
846 | 869 | |
847 | 870 | remote_count = len(remote_list) |
848 | 871 | |
853 | 876 | # so we don't need to test for it here. |
854 | 877 | if not destination_base.endswith('/') \ |
855 | 878 | and (len(remote_list) > 1 or cfg.recursive): |
856 | raise ParameterError("Destination must be a directory and end with '/' when acting on a folder content or on multiple sources.") | |
879 | raise ParameterError("Destination must be a directory and end with" | |
880 | " '/' when acting on a folder content or on " | |
881 | "multiple sources.") | |
857 | 882 | |
858 | 883 | if cfg.recursive: |
859 | 884 | for key in remote_list: |
872 | 897 | for key in exclude_list: |
873 | 898 | output(u"exclude: %s" % key) |
874 | 899 | for key in remote_list: |
875 | output(u"%s: '%s' -> '%s'" % (action_str, remote_list[key]['object_uri_str'], remote_list[key]['dest_name'])) | |
900 | output(u"%s: '%s' -> '%s'" % (action_str, | |
901 | remote_list[key]['object_uri_str'], | |
902 | remote_list[key]['dest_name'])) | |
876 | 903 | |
877 | 904 | warning(u"Exiting now because of --dry-run") |
878 | 905 | return EX_OK |
885 | 912 | item = remote_list[key] |
886 | 913 | src_uri = S3Uri(item['object_uri_str']) |
887 | 914 | dst_uri = S3Uri(item['dest_name']) |
915 | src_size = item.get('size') | |
888 | 916 | |
889 | 917 | extra_headers = copy(cfg.extra_headers) |
890 | 918 | try: |
891 | response = process_fce(src_uri, dst_uri, extra_headers) | |
892 | output(message % { "src" : src_uri, "dst" : dst_uri }) | |
919 | response = process_fce(src_uri, dst_uri, extra_headers, | |
920 | src_size=src_size, | |
921 | extra_label=seq_label) | |
922 | output(message % {"src": src_uri, "dst": dst_uri, | |
923 | "extra": seq_label}) | |
893 | 924 | if Config().acl_public: |
894 | 925 | info(u"Public URL is: %s" % dst_uri.public_url()) |
895 | 926 | scoreboard.success() |
896 | except S3Error as e: | |
897 | if e.code == "NoSuchKey": | |
927 | except (S3Error, S3UploadError) as exc: | |
928 | if isinstance(exc, S3Error) and exc.code == "NoSuchKey": | |
898 | 929 | scoreboard.notfound() |
899 | 930 | warning(u"Key not found %s" % item['object_uri_str']) |
900 | 931 | else: |
901 | 932 | scoreboard.failed() |
902 | if cfg.stop_on_error: break | |
933 | error(u"Copy failed for: '%s' (%s)", item['object_uri_str'], | |
934 | exc) | |
935 | ||
936 | if cfg.stop_on_error: | |
937 | break | |
903 | 938 | return scoreboard.rc() |
904 | 939 | |
905 | 940 | def cmd_cp(args): |
906 | 941 | s3 = S3(Config()) |
907 | return subcmd_cp_mv(args, s3.object_copy, "copy", u"remote copy: '%(src)s' -> '%(dst)s'") | |
942 | return subcmd_cp_mv(args, s3.object_copy, "copy", | |
943 | u"remote copy: '%(src)s' -> '%(dst)s' %(extra)s") | |
908 | 944 | |
909 | 945 | def cmd_modify(args): |
910 | 946 | s3 = S3(Config()) |
911 | return subcmd_cp_mv(args, s3.object_modify, "modify", u"modify: '%(src)s'") | |
947 | return subcmd_cp_mv(args, s3.object_modify, "modify", | |
948 | u"modify: '%(src)s' %(extra)s") | |
912 | 949 | |
913 | 950 | def cmd_mv(args): |
914 | 951 | s3 = S3(Config()) |
915 | return subcmd_cp_mv(args, s3.object_move, "move", u"move: '%(src)s' -> '%(dst)s'") | |
952 | return subcmd_cp_mv(args, s3.object_move, "move", | |
953 | u"move: '%(src)s' -> '%(dst)s' %(extra)s") | |
916 | 954 | |
917 | 955 | def cmd_info(args): |
918 | 956 | cfg = Config() |
946 | 984 | else: |
947 | 985 | info = s3.bucket_info(uri) |
948 | 986 | output(u"%s (bucket):" % uri.uri()) |
949 | output(u" Location: %s" % info['bucket-location']) | |
950 | output(u" Payer: %s" % info['requester-pays']) | |
987 | output(u" Location: %s" % (info['bucket-location'] | |
988 | or 'none')) | |
989 | output(u" Payer: %s" % (info['requester-pays'] | |
990 | or 'none')) | |
951 | 991 | expiration = s3.expiration_info(uri, cfg.bucket_location) |
952 | if expiration: | |
992 | if expiration and expiration['prefix'] is not None: | |
953 | 993 | expiration_desc = "Expiration Rule: " |
954 | 994 | if expiration['prefix'] == "": |
955 | 995 | expiration_desc += "all objects in this bucket " |
956 | else: | |
996 | elif expiration['prefix'] is not None: | |
957 | 997 | expiration_desc += "objects with key prefix '" + expiration['prefix'] + "' " |
958 | 998 | expiration_desc += "will expire in '" |
959 | 999 | if expiration['days']: |
1108 | 1148 | item = src_list[file] |
1109 | 1149 | src_uri = S3Uri(item['object_uri_str']) |
1110 | 1150 | dst_uri = S3Uri(item['target_uri']) |
1151 | src_size = item.get('size') | |
1111 | 1152 | seq_label = "[%d of %d]" % (seq, src_count) |
1112 | 1153 | extra_headers = copy(cfg.extra_headers) |
1113 | 1154 | try: |
1114 | response = s3.object_copy(src_uri, dst_uri, extra_headers) | |
1115 | output(u"remote copy: '%s' -> '%s'" % (src_uri, dst_uri)) | |
1155 | response = s3.object_copy(src_uri, dst_uri, extra_headers, | |
1156 | src_size=src_size, | |
1157 | extra_label=seq_label) | |
1158 | output(u"remote copy: '%s' -> '%s' %s" % | |
1159 | (src_uri, dst_uri, seq_label)) | |
1116 | 1160 | total_nb_files += 1 |
1117 | 1161 | total_size += item.get(u'size', 0) |
1118 | except S3Error as exc: | |
1162 | except (S3Error, S3UploadError) as exc: | |
1119 | 1163 | ret = EX_PARTIAL |
1120 | 1164 | error(u"File '%s' could not be copied: %s", src_uri, exc) |
1121 | 1165 | if cfg.stop_on_error: |
1135 | 1179 | total_files_copied += nb_files |
1136 | 1180 | total_size_copied += size |
1137 | 1181 | |
1138 | ||
1139 | n_copied, bytes_saved, failed_copy_files = remote_copy(s3, copy_pairs, destination_base, None) | |
1182 | n_copied, bytes_saved, failed_copy_files = remote_copy( | |
1183 | s3, copy_pairs, destination_base, None, False) | |
1140 | 1184 | total_files_copied += n_copied |
1141 | 1185 | total_size_copied += bytes_saved |
1142 | 1186 | |
1143 | 1187 | #process files not copied |
1144 | debug("Process files that were not remote copied") | |
1145 | failed_copy_count = len (failed_copy_files) | |
1188 | debug("Process files that were not remotely copied") | |
1189 | failed_copy_count = len(failed_copy_files) | |
1146 | 1190 | for key in failed_copy_files: |
1147 | 1191 | failed_copy_files[key]['target_uri'] = destination_base + key |
1148 | 1192 | status, seq, nb_files, size = _upload(failed_copy_files, seq, src_count + update_count + failed_copy_count) |
1294 | 1338 | deleted_count, deleted_size = (0, 0) |
1295 | 1339 | |
1296 | 1340 | def _download(remote_list, seq, total, total_size, dir_cache): |
1297 | original_umask = os.umask(0); | |
1298 | os.umask(original_umask); | |
1341 | original_umask = os.umask(0) | |
1342 | os.umask(original_umask) | |
1299 | 1343 | file_list = remote_list.keys() |
1300 | 1344 | file_list.sort() |
1301 | 1345 | ret = EX_OK |
1368 | 1412 | # Try to remove the temp file if it exists |
1369 | 1413 | if chkptfname_b: |
1370 | 1414 | os.unlink(chkptfname_b) |
1371 | except: | |
1415 | except Exception: | |
1372 | 1416 | pass |
1373 | 1417 | |
1374 | 1418 | if allow_partial and not cfg.stop_on_error: |
1411 | 1455 | try: |
1412 | 1456 | # set permissions on destination file |
1413 | 1457 | if not is_empty_directory: # a normal file |
1414 | mode = 0o777 - original_umask; | |
1415 | else: # an empty directory, make them readable/executable | |
1458 | mode = 0o777 - original_umask | |
1459 | else: | |
1460 | # an empty directory, make them readable/executable | |
1416 | 1461 | mode = 0o775 |
1417 | 1462 | debug(u"mode=%s" % oct(mode)) |
1418 | os.chmod(deunicodise(dst_file), mode); | |
1463 | os.chmod(deunicodise(dst_file), mode) | |
1419 | 1464 | except: |
1420 | 1465 | raise |
1421 | 1466 | |
1465 | 1510 | finally: |
1466 | 1511 | try: |
1467 | 1512 | os.remove(chkptfname_b) |
1468 | except: | |
1513 | except Exception: | |
1469 | 1514 | pass |
1470 | 1515 | |
1471 | 1516 | speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) |
1528 | 1573 | # For instance all empty files would become hardlinked together! |
1529 | 1574 | saved_bytes = 0 |
1530 | 1575 | failed_copy_list = FileDict() |
1531 | for (src_obj, dst1, relative_file) in copy_pairs: | |
1576 | for (src_obj, dst1, relative_file, md5) in copy_pairs: | |
1532 | 1577 | src_file = os.path.join(destination_base, dst1) |
1533 | 1578 | dst_file = os.path.join(destination_base, relative_file) |
1534 | 1579 | dst_dir = os.path.dirname(deunicodise(dst_file)) |
1544 | 1589 | failed_copy_list[relative_file] = src_obj |
1545 | 1590 | return len(copy_pairs), saved_bytes, failed_copy_list |
1546 | 1591 | |
1547 | def remote_copy(s3, copy_pairs, destination_base, uploaded_objects_list=None): | |
1592 | def remote_copy(s3, copy_pairs, destination_base, uploaded_objects_list=None, | |
1593 | metadata_update=False): | |
1548 | 1594 | cfg = Config() |
1549 | 1595 | saved_bytes = 0 |
1550 | 1596 | failed_copy_list = FileDict() |
1551 | for (src_obj, dst1, dst2) in copy_pairs: | |
1597 | seq = 0 | |
1598 | src_count = len(copy_pairs) | |
1599 | for (src_obj, dst1, dst2, src_md5) in copy_pairs: | |
1600 | seq += 1 | |
1552 | 1601 | debug(u"Remote Copying from %s to %s" % (dst1, dst2)) |
1553 | 1602 | dst1_uri = S3Uri(destination_base + dst1) |
1554 | 1603 | dst2_uri = S3Uri(destination_base + dst2) |
1604 | src_obj_size = src_obj.get(u'size', 0) | |
1605 | seq_label = "[%d of %d]" % (seq, src_count) | |
1555 | 1606 | extra_headers = copy(cfg.extra_headers) |
1607 | if metadata_update: | |
1608 | # source is a real local file with its own personal metadata | |
1609 | attr_header = _build_attr_header(src_obj, dst2, src_md5) | |
1610 | debug(u"attr_header: %s" % attr_header) | |
1611 | extra_headers.update(attr_header) | |
1612 | extra_headers['content-type'] = \ | |
1613 | s3.content_type(filename=src_obj['full_name']) | |
1556 | 1614 | try: |
1557 | s3.object_copy(dst1_uri, dst2_uri, extra_headers) | |
1558 | saved_bytes += src_obj.get(u'size', 0) | |
1559 | output(u"remote copy: '%s' -> '%s'" % (dst1, dst2)) | |
1615 | s3.object_copy(dst1_uri, dst2_uri, extra_headers, | |
1616 | src_size=src_obj_size, | |
1617 | extra_label=seq_label) | |
1618 | output(u"remote copy: '%s' -> '%s' %s" % (dst1, dst2, seq_label)) | |
1619 | saved_bytes += src_obj_size | |
1560 | 1620 | if uploaded_objects_list is not None: |
1561 | 1621 | uploaded_objects_list.append(dst2) |
1562 | except: | |
1622 | except Exception: | |
1563 | 1623 | warning(u"Unable to remote copy files '%s' -> '%s'" % (dst1_uri, dst2_uri)) |
1564 | 1624 | failed_copy_list[dst2] = src_obj |
1565 | 1625 | return (len(copy_pairs), saved_bytes, failed_copy_list) |
1566 | 1626 | |
1567 | def _build_attr_header(local_list, src): | |
1627 | def _build_attr_header(src_obj, src_relative_name, md5=None): | |
1568 | 1628 | cfg = Config() |
1569 | 1629 | attrs = {} |
1570 | 1630 | if cfg.preserve_attrs: |
1571 | 1631 | for attr in cfg.preserve_attrs_list: |
1632 | val = None | |
1572 | 1633 | if attr == 'uname': |
1573 | 1634 | try: |
1574 | val = Utils.urlencode_string(Utils.getpwuid_username(local_list[src]['uid']), unicode_output=True) | |
1635 | val = Utils.urlencode_string(Utils.getpwuid_username(src_obj['uid']), unicode_output=True) | |
1575 | 1636 | except (KeyError, TypeError): |
1576 | 1637 | attr = "uid" |
1577 | val = local_list[src].get('uid') | |
1638 | val = src_obj.get('uid') | |
1578 | 1639 | if val: |
1579 | warning(u"%s: Owner username not known. Storing UID=%d instead." % (src, val)) | |
1640 | warning(u"%s: Owner username not known. Storing UID=%d instead." % (src_relative_name, val)) | |
1580 | 1641 | elif attr == 'gname': |
1581 | 1642 | try: |
1582 | val = Utils.urlencode_string(Utils.getgrgid_grpname(local_list[src].get('gid')), unicode_output=True) | |
1643 | val = Utils.urlencode_string(Utils.getgrgid_grpname(src_obj.get('gid')), unicode_output=True) | |
1583 | 1644 | except (KeyError, TypeError): |
1584 | 1645 | attr = "gid" |
1585 | val = local_list[src].get('gid') | |
1646 | val = src_obj.get('gid') | |
1586 | 1647 | if val: |
1587 | warning(u"%s: Owner groupname not known. Storing GID=%d instead." % (src, val)) | |
1648 | warning(u"%s: Owner groupname not known. Storing GID=%d instead." % (src_relative_name, val)) | |
1588 | 1649 | elif attr != "md5": |
1589 | 1650 | try: |
1590 | val = getattr(local_list[src]['sr'], 'st_' + attr) | |
1591 | except: | |
1651 | val = getattr(src_obj['sr'], 'st_' + attr) | |
1652 | except Exception: | |
1592 | 1653 | val = None |
1593 | 1654 | if val is not None: |
1594 | 1655 | attrs[attr] = val |
1595 | 1656 | |
1596 | if 'md5' in cfg.preserve_attrs_list: | |
1597 | try: | |
1598 | val = local_list.get_md5(src) | |
1599 | if val is not None: | |
1600 | attrs['md5'] = val | |
1601 | except IOError: | |
1602 | pass | |
1657 | if 'md5' in cfg.preserve_attrs_list and md5: | |
1658 | attrs['md5'] = md5 | |
1603 | 1659 | |
1604 | 1660 | if attrs: |
1605 | result = "" | |
1661 | attr_str_list = [] | |
1606 | 1662 | for k in sorted(attrs.keys()): |
1607 | result += "%s:%s/" % (k, attrs[k]) | |
1608 | return {'x-amz-meta-s3cmd-attrs' : result[:-1]} | |
1663 | attr_str_list.append(u"%s:%s" % (k, attrs[k])) | |
1664 | attr_header = {'x-amz-meta-s3cmd-attrs': u'/'.join(attr_str_list)} | |
1609 | 1665 | else: |
1610 | return {} | |
1611 | ||
1666 | attr_header = {} | |
1667 | ||
1668 | return attr_header | |
1612 | 1669 | |
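The rewritten tail of _build_attr_header joins the sorted "key:value" pairs with '/' instead of hand-trimming a trailing slash; with a hypothetical attrs dict the header value comes out as:

    attrs = {'uid': 1000, 'mode': 33188, 'md5': 'd41d8cd98f00b204e9800998ecf8427e'}
    u'/'.join(u"%s:%s" % (k, attrs[k]) for k in sorted(attrs))
    # -> u'md5:d41d8cd98f00b204e9800998ecf8427e/mode:33188/uid:1000'
    # sent as {'x-amz-meta-s3cmd-attrs': <that string>}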
1613 | 1670 | def cmd_sync_local2remote(args): |
1614 | 1671 | cfg = Config() |
1670 | 1727 | seq += 1 |
1671 | 1728 | item = local_list[file] |
1672 | 1729 | src = item['full_name'] |
1730 | try: | |
1731 | src_md5 = local_list.get_md5(file) | |
1732 | except IOError: | |
1733 | src_md5 = None | |
1673 | 1734 | uri = S3Uri(item['remote_uri']) |
1674 | 1735 | seq_label = "[%d of %d]" % (seq, total) |
1675 | 1736 | extra_headers = copy(cfg.extra_headers) |
1676 | 1737 | try: |
1677 | attr_header = _build_attr_header(local_list, file) | |
1738 | attr_header = _build_attr_header(local_list[file], | |
1739 | file, src_md5) | |
1678 | 1740 | debug(u"attr_header: %s" % attr_header) |
1679 | 1741 | extra_headers.update(attr_header) |
1680 | 1742 | response = s3.object_put(src, uri, extra_headers, extra_label = seq_label) |
1688 | 1750 | continue |
1689 | 1751 | except InvalidFileError as exc: |
1690 | 1752 | error(u"Upload of '%s' is not possible (Reason: %s)" % (item['full_name'], exc)) |
1691 | ret = EX_PARTIAL | |
1692 | 1753 | if cfg.stop_on_error: |
1693 | 1754 | ret = EX_OSFILE |
1694 | 1755 | error(u"Exiting now because of --stop-on-error") |
1695 | 1756 | raise |
1757 | ret = EX_PARTIAL | |
1696 | 1758 | continue |
1697 | 1759 | speed_fmt = formatSize(response["speed"], human_readable = True, floating_point = True) |
1698 | 1760 | if not cfg.progress_meter: |
1761 | 1823 | output(u"upload: '%s' -> '%s'" % (local_list[key]['full_name'], local_list[key]['remote_uri'])) |
1762 | 1824 | for key in update_list: |
1763 | 1825 | output(u"upload: '%s' -> '%s'" % (update_list[key]['full_name'], update_list[key]['remote_uri'])) |
1764 | for (src_obj, dst1, dst2) in copy_pairs: | |
1826 | for (src_obj, dst1, dst2, md5) in copy_pairs: | |
1765 | 1827 | output(u"remote copy: '%s' -> '%s'" % (dst1, dst2)) |
1766 | 1828 | if cfg.delete_removed: |
1767 | 1829 | for key in remote_list: |
1792 | 1854 | # uploaded_objects_list reference is passed so it can be filled with |
1793 | 1855 | # destination objects of successful copies so that they can be
1794 | 1856 | # invalidated by cf |
1795 | n_copies, saved_bytes, failed_copy_files = remote_copy(s3, copy_pairs, | |
1796 | destination_base, | |
1797 | uploaded_objects_list) | |
1857 | n_copies, saved_bytes, failed_copy_files = remote_copy( | |
1858 | s3, copy_pairs, destination_base, uploaded_objects_list, True) | |
1798 | 1859 | |
1799 | 1860 | #upload files that could not be copied
1800 | debug("Process files that were not remote copied") | |
1861 | debug("Process files that were not remotely copied") | |
1801 | 1862 | failed_copy_count = len(failed_copy_files) |
1802 | 1863 | _set_remote_uri(failed_copy_files, destination_base, single_file_local) |
1803 | 1864 | status, n, size_transferred = _upload(failed_copy_files, n, upload_count + failed_copy_count, size_transferred) |
1885 | 1946 | def cmd_sync(args): |
1886 | 1947 | cfg = Config() |
1887 | 1948 | if (len(args) < 2): |
1888 | raise ParameterError("Too few parameters! Expected: %s" % commands['sync']['param']) | |
1949 | syntax_msg = '' | |
1950 | commands_list = get_commands_list() | |
1951 | for cmd in commands_list: | |
1952 | if cmd.get('cmd') == 'sync': | |
1953 | syntax_msg = cmd.get('param', '') | |
1954 | break | |
1955 | raise ParameterError("Too few parameters! Expected: %s" % syntax_msg) | |
1889 | 1956 | if cfg.delay_updates: |
1890 | 1957 | warning(u"`delay-updates` is obsolete.") |
1891 | 1958 | |
2075 | 2142 | #id = '' |
2076 | 2143 | #if(len(args) > 1): id = args[1] |
2077 | 2144 | |
2078 | response = s3.get_multipart(uri) | |
2079 | debug(u"response - %s" % response['status']) | |
2145 | upload_list = s3.get_multipart(uri) | |
2080 | 2146 | output(u"%s" % uri) |
2081 | tree = getTreeFromXml(response['data']) | |
2082 | debug(parseNodes(tree)) | |
2147 | debug(upload_list) | |
2083 | 2148 | output(u"Initiated\tPath\tId") |
2084 | for mpupload in parseNodes(tree): | |
2149 | for mpupload in upload_list: | |
2085 | 2150 | try: |
2086 | output(u"%s\t%s\t%s" % (mpupload['Initiated'], "s3://" + uri.bucket() + "/" + mpupload['Key'], mpupload['UploadId'])) | |
2151 | output(u"%s\t%s\t%s" % ( | |
2152 | mpupload['Initiated'], | |
2153 | "s3://" + uri.bucket() + "/" + mpupload['Key'], | |
2154 | mpupload['UploadId'])) | |
2087 | 2155 | except KeyError: |
2088 | 2156 | pass |
2089 | 2157 | return EX_OK |
2106 | 2174 | uri = S3Uri(args[0]) |
2107 | 2175 | id = args[1] |
2108 | 2176 | |
2109 | response = s3.list_multipart(uri, id) | |
2110 | debug(u"response - %s" % response['status']) | |
2111 | tree = getTreeFromXml(response['data']) | |
2177 | part_list = s3.list_multipart(uri, id) | |
2112 | 2178 | output(u"LastModified\t\t\tPartNumber\tETag\tSize") |
2113 | for mpupload in parseNodes(tree): | |
2179 | for mpupload in part_list: | |
2114 | 2180 | try: |
2115 | output(u"%s\t%s\t%s\t%s" % (mpupload['LastModified'], mpupload['PartNumber'], mpupload['ETag'], mpupload['Size'])) | |
2116 | except: | |
2181 | output(u"%s\t%s\t%s\t%s" % (mpupload['LastModified'], | |
2182 | mpupload['PartNumber'], | |
2183 | mpupload['ETag'], | |
2184 | mpupload['Size'])) | |
2185 | except KeyError: | |
2117 | 2186 | pass |
2118 | 2187 | return EX_OK |
2119 | 2188 | |
2187 | 2256 | pass |
2188 | 2257 | return text # leave as is |
2189 | 2258 | text = text.encode('ascii', 'xmlcharrefreplace') |
2190 | return re.sub("&#?\w+;", _unescape_fixup, text) | |
2259 | return re.sub(r"&#?\w+;", _unescape_fixup, text) | |
2191 | 2260 | |
2192 | 2261 | cfg = Config() |
2193 | 2262 | cfg.urlencoding_mode = "fixbucket" |
2199 | 2268 | if culprit.type != "s3": |
2200 | 2269 | raise ParameterError("Expecting S3Uri instead of: %s" % arg) |
2201 | 2270 | response = s3.bucket_list_noparse(culprit.bucket(), culprit.object(), recursive = True) |
2202 | r_xent = re.compile("&#x[\da-fA-F]+;") | |
2271 | r_xent = re.compile(r"&#x[\da-fA-F]+;") | |
2203 | 2272 | keys = re.findall("<Key>(.*?)</Key>", response['data'], re.MULTILINE | re.UNICODE) |
2204 | 2273 | debug("Keys: %r" % keys) |
2205 | 2274 | for key in keys: |
2304 | 2373 | |
2305 | 2374 | if getattr(cfg, "proxy_host") == "" and os.getenv("http_proxy"): |
2306 | 2375 | autodetected_encoding = locale.getpreferredencoding() or "UTF-8" |
2307 | re_match=re.match("(http://)?([^:]+):(\d+)", | |
2376 | re_match=re.match(r"(http://)?([^:]+):(\d+)", | |
2308 | 2377 | unicodise_s(os.getenv("http_proxy"), autodetected_encoding)) |
2309 | 2378 | if re_match: |
2310 | 2379 | setattr(cfg, "proxy_host", re_match.groups()[1]) |
2448 | 2517 | with open(deunicodise(fname), "rt") as fn: |
2449 | 2518 | for pattern in fn: |
2450 | 2519 | pattern = unicodise(pattern).strip() |
2451 | if re.match("^#", pattern) or re.match("^\s*$", pattern): | |
2520 | if re.match("^#", pattern) or re.match(r"^\s*$", pattern): | |
2452 | 2521 | continue |
2453 | 2522 | debug(u"%s: adding rule: %s" % (fname, pattern)) |
2454 | 2523 | patterns_list.append(pattern) |
2459 | 2528 | return patterns_list |
2460 | 2529 | |
2461 | 2530 | def process_patterns(patterns_list, patterns_from, is_glob, option_txt = ""): |
2462 | """ | |
2531 | r""" | |
2463 | 2532 | process_patterns(patterns, patterns_from, is_glob, option_txt = "") |
2464 | 2533 | Process --exclude / --include GLOB and REGEXP patterns. |
2465 | 2534 | 'option_txt' is 'exclude' / 'include' / 'rexclude' / 'rinclude' |
2503 | 2572 | {"cmd":"rm", "label":"Delete file from bucket (alias for del)", "param":"s3://BUCKET/OBJECT", "func":cmd_object_del, "argc":1}, |
2504 | 2573 | #{"cmd":"mkdir", "label":"Make a virtual S3 directory", "param":"s3://BUCKET/path/to/dir", "func":cmd_mkdir, "argc":1}, |
2505 | 2574 | {"cmd":"restore", "label":"Restore file from Glacier storage", "param":"s3://BUCKET/OBJECT", "func":cmd_object_restore, "argc":1}, |
2506 | {"cmd":"sync", "label":"Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)", "param":"LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR", "func":cmd_sync, "argc":2}, | |
2575 | {"cmd":"sync", "label":"Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)", "param":"LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]", "func":cmd_sync, "argc":2}, | |
2507 | 2576 | {"cmd":"du", "label":"Disk usage by buckets", "param":"[s3://BUCKET[/PREFIX]]", "func":cmd_du, "argc":0}, |
2508 | 2577 | {"cmd":"info", "label":"Get various information about Buckets or Files", "param":"s3://BUCKET[/OBJECT]", "func":cmd_info, "argc":1}, |
2509 | 2578 | {"cmd":"cp", "label":"Copy object", "param":"s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]", "func":cmd_cp, "argc":2}, |
2567 | 2636 | acl.grantAnonRead() |
2568 | 2637 | something_changed = True |
2569 | 2638 | elif cfg.acl_public == False: # we explicitly check for False, because it could be None
2570 | if not acl.isAnonRead(): | |
2639 | if not acl.isAnonRead() and not acl.isAnonWrite(): | |
2571 | 2640 | info(u"%s: already Private, skipping %s" % (uri, seq_label)) |
2572 | 2641 | else: |
2573 | 2642 | acl.revokeAnonRead() |
2643 | acl.revokeAnonWrite() | |
2574 | 2644 | something_changed = True |
2575 | 2645 | |
2576 | 2646 | # update acl with arguments |
2597 | 2667 | output(u"%s: ACL updated" % uri) |
2598 | 2668 | |
2599 | 2669 | class OptionMimeType(Option): |
2600 | def check_mimetype(option, opt, value): | |
2601 | if re.compile("^[a-z0-9]+/[a-z0-9+\.-]+(;.*)?$", re.IGNORECASE).match(value): | |
2670 | def check_mimetype(self, opt, value): | |
2671 | if re.compile(r"^[a-z0-9]+/[a-z0-9+\.-]+(;.*)?$", re.IGNORECASE).match(value): | |
2602 | 2672 | return value |
2603 | 2673 | raise OptionValueError("option %s: invalid MIME-Type format: %r" % (opt, value)) |
2604 | 2674 | |
2605 | 2675 | class OptionS3ACL(Option): |
2606 | def check_s3acl(option, opt, value): | |
2676 | def check_s3acl(self, opt, value): | |
2607 | 2677 | permissions = ('read', 'write', 'read_acp', 'write_acp', 'full_control', 'all') |
2608 | 2678 | try: |
2609 | permission, grantee = re.compile("^(\w+):(.+)$", re.IGNORECASE).match(value).groups() | |
2679 | permission, grantee = re.compile(r"^(\w+):(.+)$", re.IGNORECASE).match(value).groups() | |
2610 | 2680 | if not permission or not grantee: |
2611 | raise | |
2681 | raise OptionValueError("option %s: invalid S3 ACL format: %r" % (opt, value)) | |
2612 | 2682 | if permission in permissions: |
2613 | 2683 | return { 'name' : grantee, 'permission' : permission.upper() } |
2614 | 2684 | else: |
2615 | 2685 | raise OptionValueError("option %s: invalid S3 ACL permission: %s (valid values: %s)" % |
2616 | 2686 | (opt, permission, ", ".join(permissions))) |
2617 | except: | |
2687 | except OptionValueError: | |
2688 | raise | |
2689 | except Exception: | |
2618 | 2690 | raise OptionValueError("option %s: invalid S3 ACL format: %r" % (opt, value)) |
2619 | 2691 | |
2620 | 2692 | class OptionAll(OptionMimeType, OptionS3ACL): |
2647 | 2719 | |
2648 | 2720 | config_file = None |
2649 | 2721 | if os.getenv("S3CMD_CONFIG"): |
2650 | config_file = unicodise_s(os.getenv("S3CMD_CONFIG"), autodetected_encoding) | |
2722 | config_file = unicodise_s(os.getenv("S3CMD_CONFIG"), | |
2723 | autodetected_encoding) | |
2651 | 2724 | elif os.name == "nt" and os.getenv("USERPROFILE"): |
2652 | config_file = os.path.join(unicodise_s(os.getenv("USERPROFILE"), autodetected_encoding), | |
2653 | os.getenv("APPDATA") and unicodise_s(os.getenv("APPDATA"), autodetected_encoding) | |
2654 | or 'Application Data', | |
2655 | "s3cmd.ini") | |
2725 | config_file = os.path.join( | |
2726 | unicodise_s(os.getenv("USERPROFILE"), autodetected_encoding), | |
2727 | os.getenv("APPDATA") | |
2728 | and unicodise_s(os.getenv("APPDATA"), autodetected_encoding) | |
2729 | or 'Application Data', | |
2730 | "s3cmd.ini") | |
2656 | 2731 | else: |
2657 | 2732 | from os.path import expanduser |
2658 | 2733 | config_file = os.path.join(expanduser("~"), ".s3cfg") |
2689 | 2764 | optparser.add_option( "--restore-priority", dest="restore_priority", action="store", choices=['standard', 'expedited', 'bulk'], help="Priority for restoring files from S3 Glacier (only for 'restore' command). Choices available: bulk, standard, expedited") |
2690 | 2765 | |
2691 | 2766 | optparser.add_option( "--delete-removed", dest="delete_removed", action="store_true", help="Delete destination objects with no corresponding source file [sync]") |
2692 | optparser.add_option( "--no-delete-removed", dest="delete_removed", action="store_false", help="Don't delete destination objects.") | |
2767 | optparser.add_option( "--no-delete-removed", dest="delete_removed", action="store_false", help="Don't delete destination objects [sync]") | |
2693 | 2768 | optparser.add_option( "--delete-after", dest="delete_after", action="store_true", help="Perform deletes AFTER new uploads when delete-removed is enabled [sync]") |
2694 | 2769 | optparser.add_option( "--delay-updates", dest="delay_updates", action="store_true", help="*OBSOLETE* Put all updated files into place at end [sync]") # OBSOLETE |
2695 | 2770 | optparser.add_option( "--max-delete", dest="max_delete", action="store", help="Do not delete more than NUM files. [del] and [sync]", metavar="NUM") |
2767 | 2842 | optparser.add_option( "--cache-file", dest="cache_file", action="store", default="", metavar="FILE", help="Cache FILE containing local source MD5 values") |
2768 | 2843 | optparser.add_option("-q", "--quiet", dest="quiet", action="store_true", default=False, help="Silence output on stdout") |
2769 | 2844 | optparser.add_option( "--ca-certs", dest="ca_certs_file", action="store", default=None, help="Path to SSL CA certificate FILE (instead of system default)") |
2845 | optparser.add_option( "--ssl-cert", dest="ssl_client_cert_file", action="store", default=None, help="Path to client own SSL certificate CRT_FILE") | |
2846 | optparser.add_option( "--ssl-key", dest="ssl_client_key_file", action="store", default=None, help="Path to client own SSL certificate private key KEY_FILE") | |
2770 | 2847 | optparser.add_option( "--check-certificate", dest="check_ssl_certificate", action="store_true", help="Check SSL certificate validity") |
2771 | 2848 | optparser.add_option( "--no-check-certificate", dest="check_ssl_certificate", action="store_false", help="Do not check SSL certificate validity") |
2772 | 2849 | optparser.add_option( "--check-hostname", dest="check_ssl_hostname", action="store_true", help="Check SSL certificate hostname validity") |
2872 | 2949 | |
2873 | 2950 | ## Process --(no-)check-md5 |
2874 | 2951 | if options.check_md5 == False: |
2875 | try: | |
2952 | if "md5" in cfg.sync_checks: | |
2876 | 2953 | cfg.sync_checks.remove("md5") |
2954 | if "md5" in cfg.preserve_attrs_list: | |
2877 | 2955 | cfg.preserve_attrs_list.remove("md5") |
2878 | except Exception: | |
2879 | pass | |
2880 | if options.check_md5 == True: | |
2881 | if cfg.sync_checks.count("md5") == 0: | |
2956 | elif options.check_md5 == True: | |
2957 | if "md5" not in cfg.sync_checks: | |
2882 | 2958 | cfg.sync_checks.append("md5") |
2883 | if cfg.preserve_attrs_list.count("md5") == 0: | |
2959 | if "md5" not in cfg.preserve_attrs_list: | |
2884 | 2960 | cfg.preserve_attrs_list.append("md5") |
2885 | 2961 | |
2886 | 2962 | ## Update Config with other parameters |
3040 | 3116 | branch found at: |
3041 | 3117 | https://github.com/s3tools/s3cmd |
3042 | 3118 | and have a look at the known issues list: |
3043 | https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions | |
3119 | https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions-(FAQ) | |
3044 | 3120 | If the error persists, please report the |
3045 | 3121 | %s (removing any private |
3046 | 3122 | info as necessary) to: |
3053 | 3129 | try: |
3054 | 3130 | s = u' '.join([unicodise(a) for a in sys.argv]) |
3055 | 3131 | except NameError: |
3056 | # Error happened before Utils module was yet imported to provide unicodise | |
3132 | # Error happened before the Utils module was imported to provide
3133 | # unicodise | |
3057 | 3134 | try: |
3058 | 3135 | s = u' '.join([(a) for a in sys.argv]) |
3059 | 3136 | except UnicodeDecodeError: |
3065 | 3142 | try: |
3066 | 3143 | sys.stderr.write(u"Problem: %s: %s\n" % (e_class, e)) |
3067 | 3144 | except UnicodeDecodeError: |
3068 | sys.stderr.write(u"Problem: [encoding safe] %r: %r\n" % (e_class, e)) | |
3145 | sys.stderr.write(u"Problem: [encoding safe] %r: %r\n" | |
3146 | % (e_class, e)) | |
3069 | 3147 | try: |
3070 | 3148 | sys.stderr.write(u"S3cmd: %s\n" % PkgInfo.version) |
3071 | 3149 | except NameError: |
3072 | sys.stderr.write(u"S3cmd: unknown version. Module import problem?\n") | |
3150 | sys.stderr.write(u"S3cmd: unknown version." | |
3151 | "Module import problem?\n") | |
3073 | 3152 | sys.stderr.write(u"python: %s\n" % sys.version) |
3074 | 3153 | try: |
3075 | sys.stderr.write(u"environment LANG=%s\n" % unicodise_s(os.getenv("LANG"), 'ascii')) | |
3154 | sys.stderr.write(u"environment LANG=%s\n" | |
3155 | % unicodise_s(os.getenv("LANG", "NOTSET"), | |
3156 | 'ascii')) | |
3076 | 3157 | except NameError: |
3077 | sys.stderr.write(u"environment LANG=%s\n" % os.getenv("LANG")) | |
3158 | # Error happened before the Utils module was imported to provide
3159 | # unicodise | |
3160 | sys.stderr.write(u"environment LANG=%s\n" | |
3161 | % os.getenv("LANG", "NOTSET")) | |
3078 | 3162 | sys.stderr.write(u"\n") |
3079 | 3163 | if type(tb) == unicode: |
3080 | 3164 | sys.stderr.write(tb) |
3086 | 3170 | sys.stderr.write("Your sys.path contains these entries:\n") |
3087 | 3171 | for path in sys.path: |
3088 | 3172 | sys.stderr.write(u"\t%s\n" % path) |
3089 | sys.stderr.write("Now the question is where have the s3cmd modules been installed?\n") | |
3173 | sys.stderr.write("Now the question is where have the s3cmd modules" | |
3174 | " been installed?\n") | |
3090 | 3175 | |
3091 | 3176 | sys.stderr.write(alert_header % (u"above lines", u"")) |
3092 | 3177 | |
3105 | 3190 | from S3.S3Uri import S3Uri |
3106 | 3191 | from S3 import Utils |
3107 | 3192 | from S3 import Crypto |
3108 | from S3.Utils import * | |
3193 | from S3.BaseUtils import (formatDateTime, getPrettyFromXml, | |
3194 | encode_to_s3, decode_from_s3) | |
3195 | from S3.Utils import (formatSize, unicodise_safe, unicodise_s, | |
3196 | unicodise, deunicodise, replace_nonprintables) | |
3109 | 3197 | from S3.Progress import Progress, StatsInfo |
3110 | 3198 | from S3.CloudFront import Cmd as CfCmd |
3111 | 3199 | from S3.CloudFront import CloudFront |
3186 | 3274 | sys.exit(EX_OSERR) |
3187 | 3275 | |
3188 | 3276 | except UnicodeEncodeError as e: |
3189 | lang = unicodise_s(os.getenv("LANG"), 'ascii') | |
3277 | lang = unicodise_s(os.getenv("LANG", "NOTSET"), 'ascii') | |
3190 | 3278 | msg = """ |
3191 | 3279 | You have encountered a UnicodeEncodeError. Your environment |
3192 | 3280 | variable LANG=%s may not specify a Unicode encoding (e.g. UTF-8). |
48 | 48 | s3cmd \fBrestore\fR \fIs3://BUCKET/OBJECT\fR |
49 | 49 | Restore file from Glacier storage |
50 | 50 | .TP |
51 | s3cmd \fBsync\fR \fILOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR\fR | |
51 | s3cmd \fBsync\fR \fILOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR or s3://BUCKET[/PREFIX] s3://BUCKET[/PREFIX]\fR | |
52 | 52 | Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below) |
53 | 53 | .TP |
54 | 54 | s3cmd \fBdu\fR \fI[s3://BUCKET[/PREFIX]]\fR |
275 | 275 | source file [sync] |
276 | 276 | .TP |
277 | 277 | \fB\-\-no\-delete\-removed\fR |
278 | Don't delete destination objects. | |
278 | Don't delete destination objects [sync] | |
279 | 279 | .TP |
280 | 280 | \fB\-\-delete\-after\fR |
281 | 281 | Perform deletes AFTER new uploads when delete-removed |
523 | 523 | Enable debug output. |
524 | 524 | .TP |
525 | 525 | \fB\-\-version\fR |
526 | Show s3cmd version (2.1.0) and exit. | |
526 | Show s3cmd version (2.2.0) and exit. | |
527 | 527 | .TP |
528 | 528 | \fB\-F\fR, \fB\-\-follow\-symlinks\fR |
529 | 529 | Follow symbolic links as if they are regular files |
537 | 537 | \fB\-\-ca\-certs\fR=CA_CERTS_FILE |
538 | 538 | Path to SSL CA certificate FILE (instead of system |
539 | 539 | default) |
540 | .TP | |
541 | \fB\-\-ssl\-cert\fR=SSL_CLIENT_CERT_FILE | |
542 | Path to the client's own SSL certificate CRT_FILE
543 | .TP | |
544 | \fB\-\-ssl\-key\fR=SSL_CLIENT_KEY_FILE | |
545 | Path to the client's own SSL certificate private key
546 | KEY_FILE | |
540 | 547 | .TP |
541 | 548 | \fB\-\-check\-certificate\fR |
542 | 549 | Check SSL certificate validity |
0 | 0 | Metadata-Version: 1.2 |
1 | 1 | Name: s3cmd |
2 | Version: 2.1.0 | |
2 | Version: 2.2.0 | |
3 | 3 | Summary: Command line tool for managing Amazon S3 and CloudFront services |
4 | 4 | Home-page: http://s3tools.org |
5 | 5 | Author: Michal Ludvig |
19 | 19 | Authors: |
20 | 20 | -------- |
21 | 21 | Florent Viard <florent@sodria.com> |
22 | ||
22 | 23 | Michal Ludvig <michal@logix.cz> |
24 | ||
23 | 25 | Matt Domsch (github.com/mdomsch) |
24 | 26 | |
25 | 27 | Platform: UNKNOWN |