Imported Debian patch 0.4-1
Jonathan Wiltshire
14 years ago
0 | 0 | Metadata-Version: 1.0 |
1 | 1 | Name: pyquery |
2 | Version: 0.3.1 | |
2 | Version: 0.4 | |
3 | 3 | Summary: A jquery-like library for python |
4 | 4 | Home-page: http://www.bitbucket.org/olauzanne/pyquery/ |
5 | 5 | Author: Olivier Lauzanne |
25 | 25 | Bitbucket. I have the policy of giving push access to anyone who wants it |
26 | 26 | and then to review what he does. So if you want to contribute just email me. |
27 | 27 | |
28 | The Sphinx documentation is available on `pyquery.org`_. | |
28 | The full documentation is available on `pyquery.org`_. | |
29 | 29 | |
30 | 30 | .. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance |
31 | 31 | .. _project: http://www.bitbucket.org/olauzanne/pyquery/ |
32 | 32 | .. _pyquery.org: http://pyquery.org/ |
33 | 33 | |
34 | .. contents:: | |
35 | ||
36 | Usage | |
37 | ----- | |
34 | Quickstart | |
35 | ========== | |
38 | 36 | |
39 | 37 | You can use the PyQuery class to load an xml document from a string, a lxml |
40 | 38 | document, from a file or from an url:: |
41 | 39 | |
42 | 40 | >>> from pyquery import PyQuery as pq |
43 | 41 | >>> from lxml import etree |
42 | >>> import urllib | |
44 | 43 | >>> d = pq("<html></html>") |
45 | 44 | >>> d = pq(etree.fromstring("<html></html>")) |
46 | 45 | >>> d = pq(url='http://google.com/') |
46 | >>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read()) | |
47 | 47 | >>> d = pq(filename=path_to_html_file) |
48 | 48 | |
49 | 49 | Now d is like the $ in jquery:: |
56 | 56 | >>> p.html("you know <a href='http://python.org/'>Python</a> rocks") |
57 | 57 | [<p#hello.hello>] |
58 | 58 | >>> p.html() |
59 | 'you know <a href="http://python.org/">Python</a> rocks' | |
59 | u'you know <a href="http://python.org/">Python</a> rocks' | |
60 | 60 | >>> p.text() |
61 | 61 | 'you know Python rocks' |
62 | 62 | |
68 | 68 | [<p#hello.hello>] |
69 | 69 | |
70 | 70 | |
71 | Attributes | |
72 | ---------- | |
73 | ||
74 | You can play with the attributes with the jquery API:: | |
75 | ||
76 | >>> p.attr("id") | |
77 | 'hello' | |
78 | >>> p.attr("id", "plop") | |
79 | [<p#plop.hello>] | |
80 | >>> p.attr("id", "hello") | |
81 | [<p#hello.hello>] | |
82 | ||
83 | ||
84 | Or in a more pythonic way:: | |
85 | ||
86 | >>> p.attr.id = "plop" | |
87 | >>> p.attr.id | |
88 | 'plop' | |
89 | >>> p.attr["id"] = "ola" | |
90 | >>> p.attr["id"] | |
91 | 'ola' | |
92 | >>> p.attr(id='hello', class_='hello2') | |
93 | [<p#hello.hello2>] | |
94 | >>> p.attr.class_ | |
95 | 'hello2' | |
96 | >>> p.attr.class_ = 'hello' | |
97 | ||
98 | CSS | |
99 | --- | |
100 | ||
101 | You can also play with css classes:: | |
102 | ||
103 | >>> p.addClass("toto") | |
104 | [<p#hello.toto.hello>] | |
105 | >>> p.toggleClass("titi toto") | |
106 | [<p#hello.titi.hello>] | |
107 | >>> p.removeClass("titi") | |
108 | [<p#hello.hello>] | |
109 | ||
110 | Or the css style:: | |
111 | ||
112 | >>> p.css("font-size", "15px") | |
113 | [<p#hello.hello>] | |
114 | >>> p.attr("style") | |
115 | 'font-size: 15px' | |
116 | >>> p.css({"font-size": "17px"}) | |
117 | [<p#hello.hello>] | |
118 | >>> p.attr("style") | |
119 | 'font-size: 17px' | |
120 | ||
121 | Same thing the pythonic way ('_' characters are translated to '-'):: | |
122 | ||
123 | >>> p.css.font_size = "16px" | |
124 | >>> p.attr.style | |
125 | 'font-size: 16px' | |
126 | >>> p.css['font-size'] = "15px" | |
127 | >>> p.attr.style | |
128 | 'font-size: 15px' | |
129 | >>> p.css(font_size="16px") | |
130 | [<p#hello.hello>] | |
131 | >>> p.attr.style | |
132 | 'font-size: 16px' | |
133 | >>> p.css = {"font-size": "17px"} | |
134 | >>> p.attr.style | |
135 | 'font-size: 17px' | |
136 | ||
137 | Traversing | |
138 | ---------- | |
139 | ||
140 | Some jQuery traversal methods are supported. Here are a few examples. | |
141 | ||
142 | You can filter the selection list using a string selector:: | |
143 | ||
144 | >>> d('p').filter('.hello') | |
145 | [<p#hello.hello>] | |
146 | ||
147 | It is possible to select a single element with eq:: | |
148 | ||
149 | >>> d('p').eq(0) | |
150 | [<p#hello.hello>] | |
151 | ||
152 | You can find nested elements:: | |
153 | ||
154 | >>> d('p').find('a') | |
155 | [<a>, <a>] | |
156 | >>> d('p').eq(1).find('a') | |
157 | [<a>] | |
158 | ||
159 | Breaking out of a level of traversal is also supported using end:: | |
160 | ||
161 | >>> d('p').find('a').end() | |
162 | [<p#hello.hello>, <p#test>] | |
163 | >>> d('p').eq(0).end() | |
164 | [<p#hello.hello>, <p#test>] | |
165 | >>> d('p').filter(lambda i: i == 1).end() | |
166 | [<p#hello.hello>, <p#test>] | |
167 | ||
168 | Manipulating | |
169 | ------------ | |
170 | ||
171 | You can also add content to the end of tags:: | |
172 | ||
173 | >>> d('p').append('check out <a href="http://reddit.com/r/python"><span>reddit</span></a>') | |
174 | [<p#hello.hello>, <p#test>] | |
175 | >>> print d | |
176 | <html> | |
177 | ... | |
178 | <p class="hello" id="hello" style="font-size: 17px">you know <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p><p id="test"> | |
179 | hello <a href="http://python.org">python</a> ! | |
180 | check out <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p> | |
181 | ... | |
182 | ||
183 | Or to the beginning:: | |
184 | ||
185 | >>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>') | |
186 | [<p#hello.hello>] | |
187 | >>> p.html() | |
188 | 'check out <a href="http://reddit.com/r/python">reddit</a>you know ...' | |
189 | ||
190 | Prepend or append an element into another:: | |
191 | ||
192 | >>> p.prependTo(d('#test')) | |
193 | [<p#hello.hello>] | |
194 | >>> d('#test').html() | |
195 | '<p class="hello" ...</p>...hello...python...' | |
196 | ||
197 | Insert an element after another:: | |
198 | ||
199 | >>> p.insertAfter(d('#test')) | |
200 | [<p#hello.hello>] | |
201 | >>> d('#test').html() | |
202 | '<a href="http://python.org">python</a> !...' | |
203 | ||
204 | Or before:: | |
205 | ||
206 | >>> p.insertBefore(d('#test')) | |
207 | [<p#hello.hello>] | |
208 | >>> d('body').html() | |
209 | '\n<p class="hello" id="hello" style="font-size: 17px">...' | |
210 | ||
211 | Doing something for each element:: | |
212 | ||
213 | >>> p.each(lambda e: e.addClass('hello2')) | |
214 | [<p#hello.hello2.hello>] | |
215 | ||
216 | Remove an element:: | |
217 | ||
218 | >>> d.remove('p#id') | |
219 | [<html>] | |
220 | >>> d('p#id') | |
221 | [] | |
222 | ||
223 | Replace an element by another:: | |
224 | ||
225 | >>> p.replaceWith('<p>testing</p>') | |
226 | [<p#hello.hello2.hello>] | |
227 | >>> d('p') | |
228 | [<p>, <p#test>] | |
229 | ||
230 | Or the other way around:: | |
231 | ||
232 | >>> d('<h1>arya stark</h1>').replaceAll('p') | |
233 | [<h1>] | |
234 | >>> d('p') | |
235 | [] | |
236 | >>> d('h1') | |
237 | [<h1>, <h1>] | |
238 | ||
239 | Remove what's inside the selection:: | |
240 | ||
241 | >>> d('h1').empty() | |
242 | [<h1>, <h1>] | |
243 | ||
244 | And you can get back the modified html:: | |
245 | ||
246 | >>> print d | |
247 | <html> | |
248 | <body> | |
249 | <h1/><h1/></body> | |
250 | </html> | |
251 | ||
252 | You can generate html stuff:: | |
253 | ||
254 | >>> from pyquery import PyQuery as pq | |
255 | >>> print pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>') | |
256 | <div class="myclass">Yeah !</div><b>cool</b> | |
257 | ||
258 | ||
259 | AJAX | |
260 | ---- | |
261 | ||
262 | .. fake imports | |
263 | ||
264 | >>> from ajax import PyQuery as pq | |
265 | ||
266 | You can query some wsgi app if `WebOb`_ is installed (it's not a pyquery | |
267 | dependency). In this example the test app returns a simple input at `/` and a | |
268 | submit button at `/submit`:: | |
269 | ||
270 | >>> d = pq('<form></form>', app=input_app) | |
271 | >>> d.append(d.get('/')) | |
272 | [<form>] | |
273 | >>> print d | |
274 | <form><input name="youyou" type="text" value=""/></form> | |
275 | ||
276 | The app is also available in new nodes:: | |
277 | ||
278 | >>> d.get('/').app is d.app is d('form').app | |
279 | True | |
280 | ||
281 | You can also request another path:: | |
282 | ||
283 | >>> d.append(d.get('/submit')) | |
284 | [<form>] | |
285 | >>> print d | |
286 | <form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form> | |
287 | ||
288 | If `Paste`_ is installed, you are able to get url directly with a `Proxy`_ app:: | |
289 | ||
290 | >>> a = d.get('http://pyquery.org/') | |
291 | >>> a | |
292 | [<html>] | |
293 | ||
294 | You can retrieve the app response:: | |
295 | ||
296 | >>> print a.response.status | |
297 | 200 OK | |
298 | ||
299 | The response attribute is a `WebOb`_ `Response`_ | |
300 | ||
301 | .. _webob: http://pythonpaste.org/webob/ | |
302 | .. _response: http://pythonpaste.org/webob/#response | |
303 | .. _paste: http://pythonpaste.org/ | |
304 | .. _proxy: http://pythonpaste.org/modules/proxy.html#paste.proxy.Proxy | |
305 | ||
306 | Making links absolute | |
307 | --------------------- | |
308 | ||
309 | You can make links absolute which can be useful for screen scraping:: | |
310 | ||
311 | >>> d = pq(url='http://www.w3.org/', parser='html') | |
312 | >>> d('a[title="W3C Activities"]').attr('href') | |
313 | '/Consortium/activities' | |
314 | >>> d.make_links_absolute() | |
315 | [<html>] | |
316 | >>> d('a[title="W3C Activities"]').attr('href') | |
317 | 'http://www.w3.org/Consortium/activities' | |
318 | ||
319 | Using different parsers | |
320 | ----------------------- | |
321 | ||
322 | By default pyquery uses the lxml xml parser and then if it doesn't work goes on | |
323 | to try the html parser from lxml.html. The xml parser can sometimes be | |
324 | problematic when parsing xhtml pages because the parser will not raise an error | |
325 | but give an unusable tree (on w3c.org for example). | |
326 | ||
327 | You can also choose which parser to use explicitly:: | |
328 | ||
329 | >>> pq('<html><body><p>toto</p></body></html>', parser='xml') | |
330 | [<html>] | |
331 | >>> pq('<html><body><p>toto</p></body></html>', parser='html') | |
332 | [<html>] | |
333 | >>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments') | |
334 | [<p>] | |
335 | ||
336 | The html and html_fragments parser are the ones from lxml.html. | |
337 | ||
338 | Testing | |
339 | ------- | |
340 | ||
341 | If you want to run the tests that you can see above you should do:: | |
342 | ||
343 | $ hg clone https://bitbucket.org/olauzanne/pyquery/ | |
344 | $ cd pyquery | |
345 | $ python bootstrap.py | |
346 | $ bin/buildout | |
347 | $ bin/test | |
348 | ||
349 | You can build the Sphinx documentation by doing:: | |
350 | ||
351 | $ cd docs | |
352 | $ make html | |
353 | ||
354 | If you don't already have lxml installed use this line:: | |
355 | ||
356 | $ STATIC_DEPS=true bin/buildout | |
357 | ||
358 | More documentation | |
359 | ------------------ | |
360 | ||
361 | First there is the Sphinx documentation `here`_. | |
362 | Then for more documentation about the API you can use the `jquery website`_. | |
363 | The reference I'm now using for the API is ... the `color cheat sheet`_. | |
364 | Then you can always look at the `code`_. | |
365 | ||
366 | .. _jquery website: http://docs.jquery.com/ | |
367 | .. _code: http://www.bitbucket.org/olauzanne/pyquery/src/tip/pyquery/pyquery.py | |
368 | .. _here: http://pyquery.org | |
369 | .. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png | |
370 | ||
371 | TODO | |
372 | ---- | |
373 | ||
374 | - SELECTORS: still missing some jQuery pseudo classes (:radio, :password, ...) | |
375 | - ATTRIBUTES: done | |
376 | - CSS: done | |
377 | - HTML: done | |
378 | - MANIPULATING: missing the wrapAll and wrapInner methods | |
379 | - TRAVERSING: about half done | |
380 | - EVENTS: nothing to do with server side might be used later for automatic ajax | |
381 | - CORE UI EFFECTS: did hide and show; the rest doesn't really make sense on | |
382 | server side | |
383 | - AJAX: some with wsgi app | |
384 | ||
385 | 71 | Keywords: jquery html xml |
386 | 72 | Platform: UNKNOWN |
0 | pyquery (0.4-1) unstable; urgency=low | |
1 | ||
2 | * New upstream release. | |
3 | ||
4 | -- Jonathan Wiltshire <debian@jwiltshire.org.uk> Thu, 28 Jan 2010 11:48:35 +0000 | |
5 | ||
0 | 6 | pyquery (0.3.1-1) unstable; urgency=low |
1 | 7 | |
2 | 8 | * Initial release (Closes: #552001) |
1 | 1 | Section: python |
2 | 2 | Priority: extra |
3 | 3 | Maintainer: Debian Python Modules Team <python-modules-team@lists.alioth.debian.org> |
4 | Uploaders: Jonathan Wiltshire <debian@jwiltshire.org.uk> | |
4 | Uploaders: Jonathan Wiltshire <debian@jwiltshire.org.uk>, | |
5 | TANIGUCHI Takaki <takaki@debian.org> | |
6 | DM-Upload-Allowed: yes | |
5 | 7 | Build-Depends: debhelper (>= 7), python-all (>= 2.5) |
6 | 8 | Build-Depends-Indep: python-support (>= 0.5.3), python-setuptools |
7 | 9 | Standards-Version: 3.8.3 |
17 | 17 | Bitbucket. I have the policy of giving push access to anyone who wants it |
18 | 18 | and then to review what he does. So if you want to contribute just email me. |
19 | 19 | |
20 | The Sphinx documentation is available on `pyquery.org`_. | |
20 | The full documentation is available on `pyquery.org`_. | |
21 | 21 | |
22 | 22 | .. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance |
23 | 23 | .. _project: http://www.bitbucket.org/olauzanne/pyquery/ |
24 | 24 | .. _pyquery.org: http://pyquery.org/ |
25 | 25 | |
26 | .. contents:: | |
27 | ||
28 | Usage | |
29 | ----- | |
26 | Quickstart | |
27 | ========== | |
30 | 28 | |
31 | 29 | You can use the PyQuery class to load an xml document from a string, a lxml |
32 | 30 | document, from a file or from an url:: |
33 | 31 | |
34 | 32 | >>> from pyquery import PyQuery as pq |
35 | 33 | >>> from lxml import etree |
34 | >>> import urllib | |
36 | 35 | >>> d = pq("<html></html>") |
37 | 36 | >>> d = pq(etree.fromstring("<html></html>")) |
38 | 37 | >>> d = pq(url='http://google.com/') |
38 | >>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read()) | |
39 | 39 | >>> d = pq(filename=path_to_html_file) |
40 | 40 | |
41 | 41 | Now d is like the $ in jquery:: |
48 | 48 | >>> p.html("you know <a href='http://python.org/'>Python</a> rocks") |
49 | 49 | [<p#hello.hello>] |
50 | 50 | >>> p.html() |
51 | 'you know <a href="http://python.org/">Python</a> rocks' | |
51 | u'you know <a href="http://python.org/">Python</a> rocks' | |
52 | 52 | >>> p.text() |
53 | 53 | 'you know Python rocks' |
54 | 54 | |
59 | 59 | >>> d('p:first') |
60 | 60 | [<p#hello.hello>] |
61 | 61 | |
62 | ||
63 | Attributes | |
64 | ---------- | |
65 | ||
66 | You can play with the attributes with the jquery API:: | |
67 | ||
68 | >>> p.attr("id") | |
69 | 'hello' | |
70 | >>> p.attr("id", "plop") | |
71 | [<p#plop.hello>] | |
72 | >>> p.attr("id", "hello") | |
73 | [<p#hello.hello>] | |
74 | ||
75 | ||
76 | Or in a more pythonic way:: | |
77 | ||
78 | >>> p.attr.id = "plop" | |
79 | >>> p.attr.id | |
80 | 'plop' | |
81 | >>> p.attr["id"] = "ola" | |
82 | >>> p.attr["id"] | |
83 | 'ola' | |
84 | >>> p.attr(id='hello', class_='hello2') | |
85 | [<p#hello.hello2>] | |
86 | >>> p.attr.class_ | |
87 | 'hello2' | |
88 | >>> p.attr.class_ = 'hello' | |
89 | ||
90 | CSS | |
91 | --- | |
92 | ||
93 | You can also play with css classes:: | |
94 | ||
95 | >>> p.addClass("toto") | |
96 | [<p#hello.toto.hello>] | |
97 | >>> p.toggleClass("titi toto") | |
98 | [<p#hello.titi.hello>] | |
99 | >>> p.removeClass("titi") | |
100 | [<p#hello.hello>] | |
101 | ||
102 | Or the css style:: | |
103 | ||
104 | >>> p.css("font-size", "15px") | |
105 | [<p#hello.hello>] | |
106 | >>> p.attr("style") | |
107 | 'font-size: 15px' | |
108 | >>> p.css({"font-size": "17px"}) | |
109 | [<p#hello.hello>] | |
110 | >>> p.attr("style") | |
111 | 'font-size: 17px' | |
112 | ||
113 | Same thing the pythonic way ('_' characters are translated to '-'):: | |
114 | ||
115 | >>> p.css.font_size = "16px" | |
116 | >>> p.attr.style | |
117 | 'font-size: 16px' | |
118 | >>> p.css['font-size'] = "15px" | |
119 | >>> p.attr.style | |
120 | 'font-size: 15px' | |
121 | >>> p.css(font_size="16px") | |
122 | [<p#hello.hello>] | |
123 | >>> p.attr.style | |
124 | 'font-size: 16px' | |
125 | >>> p.css = {"font-size": "17px"} | |
126 | >>> p.attr.style | |
127 | 'font-size: 17px' | |
128 | ||
129 | Traversing | |
130 | ---------- | |
131 | ||
132 | Some jQuery traversal methods are supported. Here are a few examples. | |
133 | ||
134 | You can filter the selection list using a string selector:: | |
135 | ||
136 | >>> d('p').filter('.hello') | |
137 | [<p#hello.hello>] | |
138 | ||
139 | It is possible to select a single element with eq:: | |
140 | ||
141 | >>> d('p').eq(0) | |
142 | [<p#hello.hello>] | |
143 | ||
144 | You can find nested elements:: | |
145 | ||
146 | >>> d('p').find('a') | |
147 | [<a>, <a>] | |
148 | >>> d('p').eq(1).find('a') | |
149 | [<a>] | |
150 | ||
151 | Breaking out of a level of traversal is also supported using end:: | |
152 | ||
153 | >>> d('p').find('a').end() | |
154 | [<p#hello.hello>, <p#test>] | |
155 | >>> d('p').eq(0).end() | |
156 | [<p#hello.hello>, <p#test>] | |
157 | >>> d('p').filter(lambda i: i == 1).end() | |
158 | [<p#hello.hello>, <p#test>] | |
159 | ||
160 | Manipulating | |
161 | ------------ | |
162 | ||
163 | You can also add content to the end of tags:: | |
164 | ||
165 | >>> d('p').append('check out <a href="http://reddit.com/r/python"><span>reddit</span></a>') | |
166 | [<p#hello.hello>, <p#test>] | |
167 | >>> print d | |
168 | <html> | |
169 | ... | |
170 | <p class="hello" id="hello" style="font-size: 17px">you know <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p><p id="test"> | |
171 | hello <a href="http://python.org">python</a> ! | |
172 | check out <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p> | |
173 | ... | |
174 | ||
175 | Or to the beginning:: | |
176 | ||
177 | >>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>') | |
178 | [<p#hello.hello>] | |
179 | >>> p.html() | |
180 | 'check out <a href="http://reddit.com/r/python">reddit</a>you know ...' | |
181 | ||
182 | Prepend or append an element into another:: | |
183 | ||
184 | >>> p.prependTo(d('#test')) | |
185 | [<p#hello.hello>] | |
186 | >>> d('#test').html() | |
187 | '<p class="hello" ...</p>...hello...python...' | |
188 | ||
189 | Insert an element after another:: | |
190 | ||
191 | >>> p.insertAfter(d('#test')) | |
192 | [<p#hello.hello>] | |
193 | >>> d('#test').html() | |
194 | '<a href="http://python.org">python</a> !...' | |
195 | ||
196 | Or before:: | |
197 | ||
198 | >>> p.insertBefore(d('#test')) | |
199 | [<p#hello.hello>] | |
200 | >>> d('body').html() | |
201 | '\n<p class="hello" id="hello" style="font-size: 17px">...' | |
202 | ||
203 | Doing something for each element:: | |
204 | ||
205 | >>> p.each(lambda e: e.addClass('hello2')) | |
206 | [<p#hello.hello2.hello>] | |
207 | ||
208 | Remove an element:: | |
209 | ||
210 | >>> d.remove('p#id') | |
211 | [<html>] | |
212 | >>> d('p#id') | |
213 | [] | |
214 | ||
215 | Replace an element by another:: | |
216 | ||
217 | >>> p.replaceWith('<p>testing</p>') | |
218 | [<p#hello.hello2.hello>] | |
219 | >>> d('p') | |
220 | [<p>, <p#test>] | |
221 | ||
222 | Or the other way around:: | |
223 | ||
224 | >>> d('<h1>arya stark</h1>').replaceAll('p') | |
225 | [<h1>] | |
226 | >>> d('p') | |
227 | [] | |
228 | >>> d('h1') | |
229 | [<h1>, <h1>] | |
230 | ||
231 | Remove what's inside the selection:: | |
232 | ||
233 | >>> d('h1').empty() | |
234 | [<h1>, <h1>] | |
235 | ||
236 | And you can get back the modified html:: | |
237 | ||
238 | >>> print d | |
239 | <html> | |
240 | <body> | |
241 | <h1/><h1/></body> | |
242 | </html> | |
243 | ||
244 | You can generate html stuff:: | |
245 | ||
246 | >>> from pyquery import PyQuery as pq | |
247 | >>> print pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>') | |
248 | <div class="myclass">Yeah !</div><b>cool</b> | |
249 | ||
250 | ||
251 | AJAX | |
252 | ---- | |
253 | ||
254 | .. fake imports | |
255 | ||
256 | >>> from ajax import PyQuery as pq | |
257 | ||
258 | You can query some wsgi app if `WebOb`_ is installed (it's not a pyquery | |
259 | dependency). In this example the test app returns a simple input at `/` and a | |
260 | submit button at `/submit`:: | |
261 | ||
262 | >>> d = pq('<form></form>', app=input_app) | |
263 | >>> d.append(d.get('/')) | |
264 | [<form>] | |
265 | >>> print d | |
266 | <form><input name="youyou" type="text" value=""/></form> | |
267 | ||
268 | The app is also available in new nodes:: | |
269 | ||
270 | >>> d.get('/').app is d.app is d('form').app | |
271 | True | |
272 | ||
273 | You can also request another path:: | |
274 | ||
275 | >>> d.append(d.get('/submit')) | |
276 | [<form>] | |
277 | >>> print d | |
278 | <form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form> | |
279 | ||
280 | If `Paste`_ is installed, you are able to get url directly with a `Proxy`_ app:: | |
281 | ||
282 | >>> a = d.get('http://pyquery.org/') | |
283 | >>> a | |
284 | [<html>] | |
285 | ||
286 | You can retrieve the app response:: | |
287 | ||
288 | >>> print a.response.status | |
289 | 200 OK | |
290 | ||
291 | The response attribute is a `WebOb`_ `Response`_ | |
292 | ||
293 | .. _webob: http://pythonpaste.org/webob/ | |
294 | .. _response: http://pythonpaste.org/webob/#response | |
295 | .. _paste: http://pythonpaste.org/ | |
296 | .. _proxy: http://pythonpaste.org/modules/proxy.html#paste.proxy.Proxy | |
297 | ||
298 | Making links absolute | |
299 | --------------------- | |
300 | ||
301 | You can make links absolute which can be useful for screen scraping:: | |
302 | ||
303 | >>> d = pq(url='http://www.w3.org/', parser='html') | |
304 | >>> d('a[title="W3C Activities"]').attr('href') | |
305 | '/Consortium/activities' | |
306 | >>> d.make_links_absolute() | |
307 | [<html>] | |
308 | >>> d('a[title="W3C Activities"]').attr('href') | |
309 | 'http://www.w3.org/Consortium/activities' | |
310 | ||
311 | Using different parsers | |
312 | ----------------------- | |
313 | ||
314 | By default pyquery uses the lxml xml parser and then if it doesn't work goes on | |
315 | to try the html parser from lxml.html. The xml parser can sometimes be | |
316 | problematic when parsing xhtml pages because the parser will not raise an error | |
317 | but give an unusable tree (on w3c.org for example). | |
318 | ||
319 | You can also choose which parser to use explicitly:: | |
320 | ||
321 | >>> pq('<html><body><p>toto</p></body></html>', parser='xml') | |
322 | [<html>] | |
323 | >>> pq('<html><body><p>toto</p></body></html>', parser='html') | |
324 | [<html>] | |
325 | >>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments') | |
326 | [<p>] | |
327 | ||
328 | The html and html_fragments parser are the ones from lxml.html. | |
329 | ||
330 | Testing | |
331 | ------- | |
332 | ||
333 | If you want to run the tests that you can see above you should do:: | |
334 | ||
335 | $ hg clone https://bitbucket.org/olauzanne/pyquery/ | |
336 | $ cd pyquery | |
337 | $ python bootstrap.py | |
338 | $ bin/buildout | |
339 | $ bin/test | |
340 | ||
341 | You can build the Sphinx documentation by doing:: | |
342 | ||
343 | $ cd docs | |
344 | $ make html | |
345 | ||
346 | If you don't already have lxml installed use this line:: | |
347 | ||
348 | $ STATIC_DEPS=true bin/buildout | |
349 | ||
350 | More documentation | |
351 | ------------------ | |
352 | ||
353 | First there is the Sphinx documentation `here`_. | |
354 | Then for more documentation about the API you can use the `jquery website`_. | |
355 | The reference I'm now using for the API is ... the `color cheat sheet`_. | |
356 | Then you can always look at the `code`_. | |
357 | ||
358 | .. _jquery website: http://docs.jquery.com/ | |
359 | .. _code: http://www.bitbucket.org/olauzanne/pyquery/src/tip/pyquery/pyquery.py | |
360 | .. _here: http://pyquery.org | |
361 | .. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png | |
362 | ||
363 | TODO | |
364 | ---- | |
365 | ||
366 | - SELECTORS: still missing some jQuery pseudo classes (:radio, :password, ...) | |
367 | - ATTRIBUTES: done | |
368 | - CSS: done | |
369 | - HTML: done | |
370 | - MANIPULATING: missing the wrapAll and wrapInner methods | |
371 | - TRAVERSING: about half done | |
372 | - EVENTS: nothing to do with server side might be used later for automatic ajax | |
373 | - CORE UI EFFECTS: did hide and show; the rest doesn't really make sense on | |
374 | server side | |
375 | - AJAX: some with wsgi app |
2 | 2 | # Copyright (C) 2008 - Olivier Lauzanne <olauzanne@gmail.com> |
3 | 3 | # |
4 | 4 | # Distributed under the BSD license, see LICENSE.txt |
5 | from lxml.cssselect import Pseudo, XPathExpr, XPathExprOr, Function, css_to_xpath | |
5 | from lxml.cssselect import Pseudo, XPathExpr, XPathExprOr, Function, css_to_xpath, Element | |
6 | 6 | from lxml import cssselect |
7 | 7 | |
8 | 8 | class JQueryPseudo(Pseudo): |
54 | 54 | return xpath |
55 | 55 | |
56 | 56 | def _xpath_enabled(self, xpath): |
57 | """Matches all elements that are disabled. | |
57 | """Matches all elements that are enabled. | |
58 | 58 | """ |
59 | 59 | xpath.add_condition("not(@disabled) and name(.) = 'input'") |
60 | 60 | return xpath |
63 | 63 | """Matches all input elements of type file. |
64 | 64 | """ |
65 | 65 | xpath.add_condition("@type = 'file' and name(.) = 'input'") |
66 | return xpath | |
67 | ||
68 | def _xpath_input(self, xpath): | |
69 | """Matches all input elements. | |
70 | """ | |
71 | xpath.add_condition("(name(.) = 'input' or name(.) = 'select') " | |
72 | + "or (name(.) = 'textarea' or name(.) = 'button')") | |
73 | return xpath | |
74 | ||
75 | def _xpath_button(self, xpath): | |
76 | """Matches all button input elements and the button element. | |
77 | """ | |
78 | xpath.add_condition("(@type = 'button' and name(.) = 'input') " | |
79 | + "or name(.) = 'button'") | |
80 | return xpath | |
81 | ||
82 | def _xpath_radio(self, xpath): | |
83 | """Matches all radio input elements. | |
84 | """ | |
85 | xpath.add_condition("@type = 'radio' and name(.) = 'input'") | |
86 | return xpath | |
87 | ||
88 | def _xpath_text(self, xpath): | |
89 | """Matches all text input elements. | |
90 | """ | |
91 | xpath.add_condition("@type = 'text' and name(.) = 'input'") | |
92 | return xpath | |
93 | ||
94 | def _xpath_checkbox(self, xpath): | |
95 | """Matches all checkbox input elements. | |
96 | """ | |
97 | xpath.add_condition("@type = 'checkbox' and name(.) = 'input'") | |
98 | return xpath | |
99 | ||
100 | def _xpath_password(self, xpath): | |
101 | """Matches all password input elements. | |
102 | """ | |
103 | xpath.add_condition("@type = 'password' and name(.) = 'input'") | |
104 | return xpath | |
105 | ||
106 | def _xpath_submit(self, xpath): | |
107 | """Matches all submit input elements. | |
108 | """ | |
109 | xpath.add_condition("@type = 'submit' and name(.) = 'input'") | |
110 | return xpath | |
111 | ||
112 | def _xpath_image(self, xpath): | |
113 | """Matches all image input elements. | |
114 | """ | |
115 | xpath.add_condition("@type = 'image' and name(.) = 'input'") | |
116 | return xpath | |
117 | ||
118 | def _xpath_reset(self, xpath): | |
119 | """Matches all reset input elements. | |
120 | """ | |
121 | xpath.add_condition("@type = 'reset' and name(.) = 'input'") | |
122 | return xpath | |
123 | ||
124 | def _xpath_header(self, xpath): | |
125 | """Matches all header elements (h1, ..., h6) | |
126 | """ | |
127 | # this seems kind of brute-force, is there a better way? | |
128 | xpath.add_condition("(name(.) = 'h1' or name(.) = 'h2' or name (.) = 'h3') " | |
129 | + "or (name(.) = 'h4' or name (.) = 'h5' or name(.) = 'h6')") | |
130 | return xpath | |
131 | ||
132 | def _xpath_parent(self, xpath): | |
133 | """Match all elements that contain other elements | |
134 | """ | |
135 | xpath.add_condition("count(child::*) > 0") | |
136 | return xpath | |
137 | ||
138 | def _xpath_empty(self, xpath): | |
139 | """Match all elements that do not contain other elements | |
140 | """ | |
141 | xpath.add_condition("count(child::*) = 0") | |
66 | 142 | return xpath |
67 | 143 | |
68 | 144 | cssselect.Pseudo = JQueryPseudo |
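Note on the hunk above: the input-type pseudo-classes all add the same condition string with a different `@type` value, and the `:header` comment asks whether there is a less brute-force way. A minimal sketch of generating those strings follows. The helper names `input_type_condition` and `header_condition` are hypothetical and not part of pyquery. The header condition drops the grouping parentheses, which should not change the result because every term is joined by `or`.

```python
def input_type_condition(input_type):
    # Hypothetical helper: same string the _xpath_radio, _xpath_text, ...
    # methods pass to xpath.add_condition() in the diff.
    return "@type = '%s' and name(.) = 'input'" % input_type


def header_condition():
    # Hypothetical helper: builds the h1..h6 test without writing out
    # each term by hand.
    return " or ".join("name(.) = 'h%d'" % level for level in range(1, 7))


# One condition per input-type pseudo-class named in the diff.
INPUT_TYPES = ('file', 'radio', 'text', 'checkbox',
               'password', 'submit', 'image', 'reset')
conditions = {name: input_type_condition(name) for name in INPUT_TYPES}
```

Each generated string could then be handed to `xpath.add_condition(...)` exactly as the hand-written methods do.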
87 | 163 | """Matches all elements with an index below the given one. |
88 | 164 | """ |
89 | 165 | xpath.add_post_condition('position() < %s' % int(expr+1)) |
166 | return xpath | |
167 | ||
168 | def _xpath_contains(self, xpath, expr): | |
169 | """Matches all elements that contain the given text | |
170 | """ | |
171 | xpath.add_post_condition("contains(text(), '%s')" % str(expr)) | |
90 | 172 | return xpath |
91 | 173 | |
92 | 174 | cssselect.Function = JQueryFunction |
133 | 215 | |
134 | 216 | cssselect.XPathExprOr = AdvancedXPathExprOr |
135 | 217 | |
136 | def selector_to_xpath(selector): | |
218 | class JQueryElement(Element): | |
219 | """ | |
220 | Represents namespace|element | |
221 | """ | |
222 | ||
223 | def xpath(self): | |
224 | if self.namespace == '*': | |
225 | el = self.element | |
226 | else: | |
227 | # FIXME: Should we lowercase here? | |
228 | el = '%s:%s' % (self.namespace, self.element) | |
229 | return AdvancedXPathExpr(element=el) | |
230 | ||
231 | cssselect.Element = JQueryElement | |
232 | ||
233 | def selector_to_xpath(selector, prefix='descendant-or-self::'): | |
137 | 234 | """JQuery selector to xpath. |
138 | 235 | """ |
139 | 236 | selector = selector.replace('[@', '[') |
140 | return css_to_xpath(selector) | |
237 | return css_to_xpath(selector, prefix) |
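Note on the new `prefix` argument: the default `'descendant-or-self::'` matches the context node and everything below it, while `'self::'` (used by `_filter_only` later in this patch) tests only the context node. The sketch below illustrates the difference between the two axes using the standard library's `xml.etree.ElementTree` as a stand-in for lxml. The functions `descendant_or_self` and `self_only` are hypothetical illustrations, not pyquery or lxml APIs.

```python
import xml.etree.ElementTree as ET


def descendant_or_self(element, tag):
    # Element.iter() yields the element itself first, then its
    # descendants in document order: the 'descendant-or-self::' axis.
    return list(element.iter(tag))


def self_only(element, tag):
    # Only the context node is tested: the 'self::' axis.
    return [element] if element.tag == tag else []


root = ET.fromstring('<div><p>outer<p>inner</p></p></div>')
outer = root[0]

both = descendant_or_self(outer, 'p')   # outer <p> and the nested <p>
just_outer = self_only(outer, 'p')      # outer <p> only
none_at_root = self_only(root, 'p')     # <div> is not a <p>
```

Filtering an existing selection with the `self::` behaviour keeps the result from picking up nested matches.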
86 | 86 | if 'filename' in kwargs: |
87 | 87 | html = file(kwargs['filename']).read() |
88 | 88 | elif 'url' in kwargs: |
89 | from urllib2 import urlopen | |
90 | url = kwargs['url'] | |
91 | html = urlopen(url).read() | |
89 | url = kwargs.pop('url') | |
90 | if 'opener' in kwargs: | |
91 | opener = kwargs.pop('opener') | |
92 | html = opener(url) | |
93 | else: | |
94 | from urllib2 import urlopen | |
95 | html = urlopen(url).read() | |
92 | 96 | self._base_url = url |
93 | 97 | else: |
94 | 98 | raise ValueError('Invalid keyword arguments %s' % kwargs) |
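Note on the `opener` keyword: the hunk above lets a caller supply a callable that receives the URL and returns the markup, falling back to `urlopen` otherwise. A minimal Python 3 sketch of the same dispatch, outside the class, is shown here. The function `load_html` and the stand-in `fake_opener` are hypothetical names used only for illustration, and `urllib.request.urlopen` replaces the Python 2 `urllib2.urlopen` used in the patch.

```python
from urllib.request import urlopen


def load_html(**kwargs):
    """Return (html, base_url) for a ``url`` keyword, with optional ``opener``."""
    if 'url' not in kwargs:
        raise ValueError('Invalid keyword arguments %s' % kwargs)
    url = kwargs.pop('url')
    if 'opener' in kwargs:
        # Caller-supplied callable: takes the URL, returns the markup.
        opener = kwargs.pop('opener')
        html = opener(url)
    else:
        # Default path: fetch over the network.
        html = urlopen(url).read()
    return html, url


def fake_opener(url):
    # Hypothetical stand-in for a real fetcher, so no network is needed.
    return '<html><body>%s</body></html>' % url


html, base_url = load_html(url='http://example.com/', opener=fake_opener)
```

Injecting the opener this way also makes the URL branch testable without network access.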
157 | 161 | self._extend(other[:]) |
158 | 162 | |
159 | 163 | def __str__(self): |
160 | """html representation of current nodes | |
164 | """xml representation of current nodes:: | |
165 | ||
166 | >>> xml = PyQuery('<script><![[CDATA[ ]></script>', parser='html_fragments') | |
167 | >>> print str(xml) | |
168 | <script><![[CDATA[ ]></script> | |
169 | ||
161 | 170 | """ |
162 | 171 | return ''.join([etree.tostring(e) for e in self]) |
172 | ||
173 | def __html__(self): | |
174 | """html representation of current nodes:: | |
175 | ||
176 | >>> html = PyQuery('<script><![[CDATA[ ]></script>', parser='html_fragments') | |
177 | >>> print html.__html__() | |
178 | <script><![[CDATA[ ]></script> | |
179 | ||
180 | """ | |
181 | return ''.join([lxml.html.tostring(e) for e in self]) | |
163 | 182 | |
164 | 183 | def __repr__(self): |
165 | 184 | r = [] |
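Note on `__str__` versus `__html__`: the patch serializes with `etree.tostring` for the XML form and `lxml.html.tostring` for the HTML form. The practical difference shows up on empty elements. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, so it illustrates the XML/HTML serialization distinction in general rather than lxml's exact output.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring('<div><script></script><br/></div>')

# XML serialization collapses empty elements into self-closing tags.
as_xml = ET.tostring(node, encoding='unicode', method='xml')

# HTML serialization keeps <script></script> and writes void
# elements such as <br> without a closing tag.
as_html = ET.tostring(node, encoding='unicode', method='html')
```

A self-closed `<script />` is misread by HTML parsers in browsers, which is one reason to keep a separate HTML representation.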
179 | 198 | # Traversing # |
180 | 199 | ############## |
181 | 200 | |
201 | def _filter_only(self, selector, elements, reverse=False, unique=False): | |
202 | """Filters the selection set only, as opposed to also including | |
203 | descendants. | |
204 | """ | |
205 | if selector is None: | |
206 | results = elements | |
207 | else: | |
208 | xpath = selector_to_xpath(selector, 'self::') | |
209 | results = [] | |
210 | for tag in elements: | |
211 | results.extend(tag.xpath(xpath)) | |
212 | if reverse: | |
213 | results.reverse() | |
214 | if unique: | |
215 | result_list = results | |
216 | results = [] | |
217 | for item in result_list: | |
218 | if not item in results: | |
219 | results.append(item) | |
220 | return self.__class__(results, **dict(parent=self)) | |
221 | ||
222 | def parent(self, selector=None): | |
223 | return self._filter_only(selector, [e.getparent() for e in self if e.getparent() is not None], unique = True) | |
224 | ||
225 | def prev(self, selector=None): | |
226 | return self._filter_only(selector, [e.getprevious() for e in self if e.getprevious() is not None]) | |
227 | ||
228 | def next(self, selector=None): | |
229 | return self._filter_only(selector, [e.getnext() for e in self if e.getnext() is not None]) | |
230 | ||
231 | def _traverse(self, method): | |
232 | for e in self: | |
233 | current = getattr(e, method)() | |
234 | while current is not None: | |
235 | yield current | |
236 | current = getattr(current, method)() | |
237 | ||
238 | def _traverse_parent_topdown(self): | |
239 | for e in self: | |
240 | this_list = [] | |
241 | current = e.getparent() | |
242 | while current is not None: | |
243 | this_list.append(current) | |
244 | current = current.getparent() | |
245 | this_list.reverse() | |
246 | for j in this_list: | |
247 | yield j | |
248 | ||
249 | def _nextAll(self): | |
250 | return [e for e in self._traverse('getnext')] | |
251 | ||
252 | def nextAll(self, selector=None): | |
253 | """ | |
254 | >>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>') | |
255 | >>> d('p:last').nextAll() | |
256 | [<img>] | |
257 | """ | |
258 | return self._filter_only(selector, self._nextAll()) | |
259 | ||
260 | def _prevAll(self): | |
261 | return [e for e in self._traverse('getprevious')] | |
262 | ||
263 | def prevAll(self, selector=None): | |
264 | """ | |
265 | >>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>') | |
266 | >>> d('p:last').prevAll() | |
267 | [<p.hello>] | |
268 | """ | |
269 | return self._filter_only(selector, self._prevAll(), reverse = True) | |
270 | ||
271 | def siblings(self, selector=None): | |
272 | """ | |
273 | >>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>') | |
274 | >>> d('.hello').siblings() | |
275 | [<p>, <img>] | |
276 | >>> d('.hello').siblings('img') | |
277 | [<img>] | |
278 | """ | |
279 | return self._filter_only(selector, self._prevAll() + self._nextAll()) | |
280 | ||
281 | def parents(self, selector=None): | |
282 | """ | |
283 | >>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>') | |
284 | >>> d('p').parents() | |
285 | [<span>] | |
286 | >>> d('.hello').parents('span') | |
287 | [<span>] | |
288 | >>> d('.hello').parents('p') | |
289 | [] | |
290 | """ | |
291 | return self._filter_only( | |
292 | selector, | |
293 | [e for e in self._traverse_parent_topdown()], | |
294 | unique = True | |
295 | ) | |
296 | ||
297 | def children(self, selector=None): | |
298 | """Filter elements that are direct children of self using optional selector. | |
299 | ||
300 | >>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>') | |
301 | >>> d | |
302 | [<span>] | |
303 | >>> d.children() | |
304 | [<p.hello>, <p>] | |
305 | >>> d.children('.hello') | |
306 | [<p.hello>] | |
307 | """ | |
308 | elements = [child for tag in self for child in tag.getchildren()] | |
309 | return self._filter_only(selector, elements) | |
310 | ||
182 | 311 | def filter(self, selector): |
183 | 312 | """Filter elements in self using selector (string or function). |
184 | 313 | |
193 | 322 | [<p.hello>] |
194 | 323 | """ |
195 | 324 | if not callable(selector): |
196 | return self.__class__(selector, self, **dict(parent=self)) | |
325 | return self._filter_only(selector, self) | |
197 | 326 | else: |
198 | 327 | elements = [] |
199 | 328 | try: |
251 | 380 | [<p.hello>] |
252 | 381 | >>> d('p').eq(1) |
253 | 382 | [<p>] |
254 | """ | |
255 | return self.__class__([self[index]], **dict(parent=self)) | |
383 | >>> d('p').eq(2) | |
384 | [] | |
385 | """ | |
386 | # Use slicing to silently handle out of bounds indexes | |
387 | items = self[index:index+1] | |
388 | return self.__class__(items, **dict(parent=self)) | |
256 | 389 | |
257 | 390 | def each(self, func): |
258 | 391 | """apply func on each nodes |
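The `eq` change above swaps indexing for slicing so that an out-of-range index yields an empty selection instead of raising `IndexError`. The sketch below shows the same idea on a plain list; the standalone `eq` function and the sample items are assumptions for illustration.

```python
def eq(seq, index):
    # Slicing never raises, so an index past the end gives an empty list.
    return seq[index:index + 1]


items = ['p0', 'p1']

first = eq(items, 0)     # one-element list holding the first item
missing = eq(items, 2)   # empty list, no IndexError
last = eq(items, -1)     # also empty: the slice becomes items[-1:0]
```

The last line is plain Python slice behaviour rather than something the patch documents: with `index == -1` the slice is `[-1:0]`, which is empty, so negative indexes do not select from the end with this approach.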
526 | 659 | if not children: |
527 | 660 | return tag.text |
528 | 661 | html = tag.text or '' |
529 | html += ''.join(map(etree.tostring, children)) | |
662 | html += ''.join(map(lambda x: etree.tostring(x, encoding=unicode), children)) | |
530 | 663 | return html |
531 | 664 | else: |
532 | 665 | if isinstance(value, self.__class__): |
823 | 956 | if expr is no_default: |
824 | 957 | for tag in self: |
825 | 958 | parent = tag.getparent() |
826 | parent.remove(tag) | |
959 | if parent is not None: | |
960 | if tag.tail: | |
961 | if not parent.text: | |
962 | parent.text = '' | |
963 | parent.text += ' ' + tag.tail | |
964 | parent.remove(tag) | |
827 | 965 | else: |
828 | 966 | results = self.__class__(expr, self) |
829 | 967 | results.remove() |
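The `remove` hunk above copies an element's tail text to its parent before removing the element, since the tail would otherwise be dropped along with the node. The sketch below reproduces that step with the standard library's `xml.etree.ElementTree` in place of lxml (an assumption made to keep it dependency-free). The markup is taken from `TestManipulating` in this patch.

```python
import xml.etree.ElementTree as ET

link = ET.fromstring('<a href="/toto">Test<img src="myimage"/>My link text</a>')
img = link.find('img')

# Text that follows a child element is stored on the child as .tail,
# so it has to be copied elsewhere before the child is removed.
if img.tail:
    link.text = (link.text or '') + ' ' + img.tail
link.remove(img)

result = link.text
```

Here `result` is `'Test My link text'`, the value `test_remove` asserts. One caveat follows from the code: the tail is always appended to the parent's `text`, so when the removed element has a preceding sibling the preserved text ends up before that sibling rather than after it.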
6 | 6 | from lxml import etree |
7 | 7 | import unittest |
8 | 8 | import doctest |
9 | import httplib | |
10 | import socket | |
9 | 11 | import os |
10 | 12 | |
11 | 13 | import pyquery |
12 | 14 | from pyquery import PyQuery as pq |
13 | 15 | from ajax import PyQuery as pqa |
14 | 16 | |
17 | socket.setdefaulttimeout(1) | |
18 | ||
19 | try: | |
20 | conn = httplib.HTTPConnection("pyquery.org:80") | |
21 | conn.request("GET", "/") | |
22 | response = conn.getresponse() | |
23 | except socket.timeout: | |
24 | GOT_NET=False | |
25 | else: | |
26 | GOT_NET=True | |
27 | ||
28 | def with_net(func): | |
29 | if GOT_NET: | |
30 | return func | |
31 | ||
15 | 32 | dirname = os.path.dirname(os.path.abspath(pyquery.__file__)) |
33 | docs = os.path.join(os.path.dirname(dirname), 'docs') | |
16 | 34 | path_to_html_file = os.path.join(dirname, 'test.html') |
17 | 35 | |
18 | 36 | def input_app(environ, start_response): |
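The `with_net` helper added above returns the function when the network probe succeeded and, implicitly, `None` otherwise, so a decorated test silently disappears when offline. A possible alternative, sketched here for Python 3 as an assumption rather than as what the patch does, is to build the decorator on `unittest.skipUnless` so the test is reported as skipped.

```python
import unittest


def with_net(got_net):
    """Return a decorator that skips the test unless got_net is true.

    Hypothetical variant for illustration; the patched helper instead
    takes the function directly and returns None when offline.
    """
    return unittest.skipUnless(got_net, 'network not available')


class DemoTest(unittest.TestCase):

    @with_net(False)
    def test_needs_network(self):
        self.fail('should have been skipped')

    def test_offline(self):
        self.assertTrue(True)


# Run the demo case and collect the outcome without printing a report.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTest)
result = unittest.TestResult()
suite.run(result)
```

With this variant the skipped test shows up in `result.skipped` with its reason, which makes an offline run easier to tell apart from a run where the test was never defined.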
38 | 56 | def setUp(self): |
39 | 57 | test = self._dt_test |
40 | 58 | test.globs.update(globals()) |
59 | ||
60 | for filename in os.listdir(docs): | |
61 | if filename.endswith('.txt'): | |
62 | if not GOT_NET and filename in ('ajax.txt', 'tips.txt'): | |
63 | continue | |
64 | klass_name = 'Test%s' % filename.replace('.txt', '').title() | |
65 | path = os.path.join(docs, filename) | |
66 | exec '%s = type("%s", (TestReadme,), dict(path=path))' % (klass_name, klass_name) | |
41 | 67 | |
42 | 68 | class TestTests(doctest.DocFileCase): |
43 | 69 | path = os.path.join(dirname, 'tests.txt') |
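The loop added above generates one test class per `.txt` file in the docs directory by passing a formatted string to `exec`. The same effect can be had by calling `type()` directly and storing the class in a namespace dictionary. The sketch below is one way to do that; `make_doc_test_classes`, the stand-in `TestReadme` base class and the sample file names are assumptions for illustration.

```python
import unittest


class TestReadme(unittest.TestCase):
    """Stand-in base class; the patch subclasses its own TestReadme."""
    path = None


def make_doc_test_classes(filenames, namespace, skip=()):
    """Create one TestReadme subclass per .txt file and store it by name."""
    for filename in sorted(filenames):
        if not filename.endswith('.txt') or filename in skip:
            continue
        klass_name = 'Test%s' % filename.replace('.txt', '').title()
        # type(name, bases, attrs) builds the class without exec.
        namespace[klass_name] = type(klass_name, (TestReadme,),
                                     dict(path=filename))


created = {}
make_doc_test_classes(['ajax.txt', 'tips.txt', 'index.rst'], created,
                      skip=('ajax.txt',))
```

In a test module the namespace argument would be `globals()`, so that the test runner discovers the generated classes. The `skip` tuple plays the role of the `GOT_NET` check in the patch.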
94 | 120 | <input name="radio" type="radio" value="one"/> |
95 | 121 | <input name="radio" type="radio" value="two" checked="checked"/> |
96 | 122 | <input name="radio" type="radio" value="three"/> |
123 | <input name="checkbox" type="checkbox" value="a"/> | |
124 | <input name="checkbox" type="checkbox" value="b" checked="checked"/> | |
125 | <input name="checkbox" type="checkbox" value="c"/> | |
126 | <input name="button" type="button" value="button" /> | |
127 | <button>button</button> | |
97 | 128 | </form> |
129 | </body> | |
130 | </html> | |
131 | """ | |
132 | ||
133 | html5 = """ | |
134 | <html> | |
135 | <body> | |
136 | <h1>Heading 1</h1> | |
137 | <h2>Heading 2</h2> | |
138 | <h3>Heading 3</h3> | |
139 | <h4>Heading 4</h4> | |
140 | <h5>Heading 5</h5> | |
141 | <h6>Heading 6</h6> | |
98 | 142 | </body> |
99 | 143 | </html> |
100 | 144 | """ |
141 | 185 | #test on the form |
142 | 186 | e = self.klass(self.html4) |
143 | 187 | assert len(e(':disabled')) == 1 |
144 | assert len(e('input:enabled')) == 5 | |
188 | assert len(e('input:enabled')) == 9 | |
145 | 189 | assert len(e(':selected')) == 1 |
146 | assert len(e(':checked')) == 1 | |
190 | assert len(e(':checked')) == 2 | |
147 | 191 | assert len(e(':file')) == 1 |
192 | assert len(e(':input')) == 12 | |
193 | assert len(e(':button')) == 2 | |
194 | assert len(e(':radio')) == 3 | |
195 | assert len(e(':checkbox')) == 3 | |
196 | ||
197 | #test on other elements | |
198 | e = self.klass(self.html5) | |
199 | assert len(e(":header")) == 6 | |
200 | assert len(e(":parent")) == 2 | |
201 | assert len(e(":empty")) == 6 | |
202 | assert len(e(":contains('Heading')")) == 6 | |
148 | 203 | |
149 | 204 | class TestTraversal(unittest.TestCase): |
150 | 205 | klass = pq |
174 | 229 | assert len(self.klass('#node2', self.html).find('span')) == 2 |
175 | 230 | assert len(self.klass('div', self.html).find('span')) == 3 |
176 | 231 | |
232 | def test_each(self): | |
233 | doc = self.klass(self.html) | |
234 | doc('span').each(lambda e: e.wrap("<em></em>")) | |
235 | assert len(doc('em')) == 3 | |
236 | ||
177 | 237 | def test_map(self): |
178 | 238 | def ids_minus_one(i, elem): |
179 | 239 | return int(self.klass(elem).attr('id')[-1]) - 1 |
182 | 242 | def test_end(self): |
183 | 243 | assert len(self.klass('div', self.html).find('span').end()) == 2 |
184 | 244 | assert len(self.klass('#node2', self.html).find('span').end()) == 1 |
245 | ||
246 | ||
247 | class TestOpener(unittest.TestCase): | |
248 | ||
249 | def test_custom_opener(self): | |
250 | def opener(url): | |
251 | return '<html><body><div class="node"></div>' | |
252 | ||
253 | doc = pq(url='http://example.com', opener=opener) | |
254 | assert len(doc('.node')) == 1, doc | |
185 | 255 | |
186 | 256 | def application(environ, start_response): |
187 | 257 | req = Request(environ) |
200 | 270 | class TestAjaxSelector(TestSelector): |
201 | 271 | klass = pqa |
202 | 272 | |
273 | @with_net | |
203 | 274 | def test_proxy(self): |
204 | 275 | e = self.klass([]) |
205 | 276 | val = e.get('http://pyquery.org/') |
233 | 304 | val = n.post('/') |
234 | 305 | assert len(val('a')) == 1, val |
235 | 306 | |
307 | class TestManipulating(unittest.TestCase): | |
308 | html = ''' | |
309 | <div class="portlet"> | |
310 | <a href="/toto">Test<img src ="myimage" />My link text</a> | |
311 | <a href="/toto2"><img src ="myimage2" />My link text 2</a> | |
312 | </div> | |
313 | ''' | |
314 | ||
315 | def test_remove(self): | |
316 | d = pq(self.html) | |
317 | d('img').remove() | |
318 | val = d('a:first').html() | |
319 | assert val == 'Test My link text', repr(val) | |
320 | val = d('a:last').html() | |
321 | assert val == ' My link text 2', repr(val) | |
322 | ||
236 | 323 | if __name__ == '__main__': |
237 | 324 | fails, total = unittest.main() |
238 | 325 | if fails == 0: |
0 | 0 | Metadata-Version: 1.0 |
1 | 1 | Name: pyquery |
2 | Version: 0.3.1 | |
2 | Version: 0.4 | |
3 | 3 | Summary: A jquery-like library for python |
4 | 4 | Home-page: http://www.bitbucket.org/olauzanne/pyquery/ |
5 | 5 | Author: Olivier Lauzanne |
25 | 25 | Bitbucket. I have the policy of giving push access to anyone who wants it |
26 | 26 | and then to review what he does. So if you want to contribute just email me. |
27 | 27 | |
28 | The Sphinx documentation is available on `pyquery.org`_. | |
28 | The full documentation is available on `pyquery.org`_. | |
29 | 29 | |
30 | 30 | .. _deliverance: http://www.gawel.org/weblog/en/2008/12/skinning-with-pyquery-and-deliverance |
31 | 31 | .. _project: http://www.bitbucket.org/olauzanne/pyquery/ |
32 | 32 | .. _pyquery.org: http://pyquery.org/ |
33 | 33 | |
34 | .. contents:: | |
35 | ||
36 | Usage | |
37 | ----- | |
34 | Quickstart | |
35 | ========== | |
38 | 36 | |
39 | 37 | You can use the PyQuery class to load an xml document from a string, a lxml |
40 | 38 | document, from a file or from an url:: |
41 | 39 | |
42 | 40 | >>> from pyquery import PyQuery as pq |
43 | 41 | >>> from lxml import etree |
42 | >>> import urllib | |
44 | 43 | >>> d = pq("<html></html>") |
45 | 44 | >>> d = pq(etree.fromstring("<html></html>")) |
46 | 45 | >>> d = pq(url='http://google.com/') |
46 | >>> d = pq(url='http://google.com/', opener=lambda url: urllib.urlopen(url).read()) | |
47 | 47 | >>> d = pq(filename=path_to_html_file) |
48 | 48 | |
49 | 49 | Now d is like the $ in jquery:: |
56 | 56 | >>> p.html("you know <a href='http://python.org/'>Python</a> rocks") |
57 | 57 | [<p#hello.hello>] |
58 | 58 | >>> p.html() |
59 | 'you know <a href="http://python.org/">Python</a> rocks' | |
59 | u'you know <a href="http://python.org/">Python</a> rocks' | |
60 | 60 | >>> p.text() |
61 | 61 | 'you know Python rocks' |
62 | 62 | |
68 | 68 | [<p#hello.hello>] |
69 | 69 | |
70 | 70 | |
71 | Attributes | |
72 | ---------- | |
73 | ||
74 | You can play with the attributes with the jquery API:: | |
75 | ||
76 | >>> p.attr("id") | |
77 | 'hello' | |
78 | >>> p.attr("id", "plop") | |
79 | [<p#plop.hello>] | |
80 | >>> p.attr("id", "hello") | |
81 | [<p#hello.hello>] | |
82 | ||
83 | ||
84 | Or in a more pythonic way:: | |
85 | ||
86 | >>> p.attr.id = "plop" | |
87 | >>> p.attr.id | |
88 | 'plop' | |
89 | >>> p.attr["id"] = "ola" | |
90 | >>> p.attr["id"] | |
91 | 'ola' | |
92 | >>> p.attr(id='hello', class_='hello2') | |
93 | [<p#hello.hello2>] | |
94 | >>> p.attr.class_ | |
95 | 'hello2' | |
96 | >>> p.attr.class_ = 'hello' | |
97 | ||
98 | CSS | |
99 | --- | |
100 | ||
101 | You can also play with css classes:: | |
102 | ||
103 | >>> p.addClass("toto") | |
104 | [<p#hello.toto.hello>] | |
105 | >>> p.toggleClass("titi toto") | |
106 | [<p#hello.titi.hello>] | |
107 | >>> p.removeClass("titi") | |
108 | [<p#hello.hello>] | |
109 | ||
110 | Or the css style:: | |
111 | ||
112 | >>> p.css("font-size", "15px") | |
113 | [<p#hello.hello>] | |
114 | >>> p.attr("style") | |
115 | 'font-size: 15px' | |
116 | >>> p.css({"font-size": "17px"}) | |
117 | [<p#hello.hello>] | |
118 | >>> p.attr("style") | |
119 | 'font-size: 17px' | |
120 | ||
121 | Same thing the pythonic way ('_' characters are translated to '-'):: | |
122 | ||
123 | >>> p.css.font_size = "16px" | |
124 | >>> p.attr.style | |
125 | 'font-size: 16px' | |
126 | >>> p.css['font-size'] = "15px" | |
127 | >>> p.attr.style | |
128 | 'font-size: 15px' | |
129 | >>> p.css(font_size="16px") | |
130 | [<p#hello.hello>] | |
131 | >>> p.attr.style | |
132 | 'font-size: 16px' | |
133 | >>> p.css = {"font-size": "17px"} | |
134 | >>> p.attr.style | |
135 | 'font-size: 17px' | |
136 | ||
137 | Traversing | |
138 | ---------- | |
139 | ||
140 | Some jQuery traversal methods are supported. Here are a few examples. | |
141 | ||
142 | You can filter the selection list using a string selector:: | |
143 | ||
144 | >>> d('p').filter('.hello') | |
145 | [<p#hello.hello>] | |
146 | ||
147 | It is possible to select a single element with eq:: | |
148 | ||
149 | >>> d('p').eq(0) | |
150 | [<p#hello.hello>] | |
151 | ||
152 | You can find nested elements:: | |
153 | ||
154 | >>> d('p').find('a') | |
155 | [<a>, <a>] | |
156 | >>> d('p').eq(1).find('a') | |
157 | [<a>] | |
158 | ||
159 | Breaking out of a level of traversal is also supported using end:: | |
160 | ||
161 | >>> d('p').find('a').end() | |
162 | [<p#hello.hello>, <p#test>] | |
163 | >>> d('p').eq(0).end() | |
164 | [<p#hello.hello>, <p#test>] | |
165 | >>> d('p').filter(lambda i: i == 1).end() | |
166 | [<p#hello.hello>, <p#test>] | |
167 | ||
168 | Manipulating | |
169 | ------------ | |
170 | ||
171 | You can also add content to the end of tags:: | |
172 | ||
173 | >>> d('p').append('check out <a href="http://reddit.com/r/python"><span>reddit</span></a>') | |
174 | [<p#hello.hello>, <p#test>] | |
175 | >>> print d | |
176 | <html> | |
177 | ... | |
178 | <p class="hello" id="hello" style="font-size: 17px">you know <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p><p id="test"> | |
179 | hello <a href="http://python.org">python</a> ! | |
180 | check out <a href="http://python.org/">Python</a> rockscheck out <a href="http://reddit.com/r/python"><span>reddit</span></a></p> | |
181 | ... | |
182 | ||
183 | Or to the beginning:: | |
184 | ||
185 | >>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>') | |
186 | [<p#hello.hello>] | |
187 | >>> p.html() | |
188 | 'check out <a href="http://reddit.com/r/python">reddit</a>you know ...' | |
189 | ||
190 | Prepend or append an element into another:: | |
191 | ||
192 | >>> p.prependTo(d('#test')) | |
193 | [<p#hello.hello>] | |
194 | >>> d('#test').html() | |
195 | '<p class="hello" ...</p>...hello...python...' | |
196 | ||
197 | Insert an element after another:: | |
198 | ||
199 | >>> p.insertAfter(d('#test')) | |
200 | [<p#hello.hello>] | |
201 | >>> d('#test').html() | |
202 | '<a href="http://python.org">python</a> !...' | |
203 | ||
204 | Or before:: | |
205 | ||
206 | >>> p.insertBefore(d('#test')) | |
207 | [<p#hello.hello>] | |
208 | >>> d('body').html() | |
209 | '\n<p class="hello" id="hello" style="font-size: 17px">...' | |
210 | ||
211 | Doing something for each element:: | |
212 | ||
213 | >>> p.each(lambda e: e.addClass('hello2')) | |
214 | [<p#hello.hello2.hello>] | |
215 | ||
216 | Remove an element:: | |
217 | ||
218 | >>> d.remove('p#id') | |
219 | [<html>] | |
220 | >>> d('p#id') | |
221 | [] | |
222 | ||
223 | Replace an element by another:: | |
224 | ||
225 | >>> p.replaceWith('<p>testing</p>') | |
226 | [<p#hello.hello2.hello>] | |
227 | >>> d('p') | |
228 | [<p>, <p#test>] | |
229 | ||
230 | Or the other way around:: | |
231 | ||
232 | >>> d('<h1>arya stark</h1>').replaceAll('p') | |
233 | [<h1>] | |
234 | >>> d('p') | |
235 | [] | |
236 | >>> d('h1') | |
237 | [<h1>, <h1>] | |
238 | ||
239 | Remove what's inside the selection:: | |
240 | ||
241 | >>> d('h1').empty() | |
242 | [<h1>, <h1>] | |
243 | ||
244 | And you can get back the modified html:: | |
245 | ||
246 | >>> print d | |
247 | <html> | |
248 | <body> | |
249 | <h1/><h1/></body> | |
250 | </html> | |
251 | ||
252 | You can generate html stuff:: | |
253 | ||
254 | >>> from pyquery import PyQuery as pq | |
255 | >>> print pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>') | |
256 | <div class="myclass">Yeah !</div><b>cool</b> | |
257 | ||
258 | ||
259 | AJAX | |
260 | ---- | |
261 | ||
262 | .. fake imports | |
263 | ||
264 | >>> from ajax import PyQuery as pq | |
265 | ||
266 | You can query some wsgi app if `WebOb`_ is installed (it's not a pyquery | |
267 | dependency). In this example the test app returns a simple input at `/` and a | |
268 | submit button at `/submit`:: | |
269 | ||
270 | >>> d = pq('<form></form>', app=input_app) | |
271 | >>> d.append(d.get('/')) | |
272 | [<form>] | |
273 | >>> print d | |
274 | <form><input name="youyou" type="text" value=""/></form> | |
275 | ||
276 | The app is also available in new nodes:: | |
277 | ||
278 | >>> d.get('/').app is d.app is d('form').app | |
279 | True | |
280 | ||
281 | You can also request another path:: | |
282 | ||
283 | >>> d.append(d.get('/submit')) | |
284 | [<form>] | |
285 | >>> print d | |
286 | <form><input name="youyou" type="text" value=""/><input type="submit" value="OK"/></form> | |
287 | ||
288 | If `Paste`_ is installed, you are able to get url directly with a `Proxy`_ app:: | |
289 | ||
290 | >>> a = d.get('http://pyquery.org/') | |
291 | >>> a | |
292 | [<html>] | |
293 | ||
294 | You can retrieve the app response:: | |
295 | ||
296 | >>> print a.response.status | |
297 | 200 OK | |
298 | ||
299 | The response attribute is a `WebOb`_ `Response`_ | |
300 | ||
301 | .. _webob: http://pythonpaste.org/webob/ | |
302 | .. _response: http://pythonpaste.org/webob/#response | |
303 | .. _paste: http://pythonpaste.org/ | |
304 | .. _proxy: http://pythonpaste.org/modules/proxy.html#paste.proxy.Proxy | |
305 | ||
306 | Making links absolute | |
307 | --------------------- | |
308 | ||
309 | You can make links absolute, which can be useful for screen scraping:: | |
310 | ||
311 | >>> d = pq(url='http://www.w3.org/', parser='html') | |
312 | >>> d('a[title="W3C Activities"]').attr('href') | |
313 | '/Consortium/activities' | |
314 | >>> d.make_links_absolute() | |
315 | [<html>] | |
316 | >>> d('a[title="W3C Activities"]').attr('href') | |
317 | 'http://www.w3.org/Consortium/activities' | |
318 | ||
319 | Using different parsers | |
320 | ----------------------- | |
321 | ||
322 | By default pyquery uses the lxml xml parser and then if it doesn't work goes on | |
323 | to try the html parser from lxml.html. The xml parser can sometimes be | |
324 | problematic when parsing xhtml pages because the parser will not raise an error | |
325 | but give an unusable tree (on w3c.org for example). | |
326 | ||
327 | You can also choose which parser to use explicitly:: | |
328 | ||
329 | >>> pq('<html><body><p>toto</p></body></html>', parser='xml') | |
330 | [<html>] | |
331 | >>> pq('<html><body><p>toto</p></body></html>', parser='html') | |
332 | [<html>] | |
333 | >>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments') | |
334 | [<p>] | |
335 | ||
336 | The html and html_fragments parsers are the ones from lxml.html. | |
337 | ||
338 | Testing | |
339 | ------- | |
340 | ||
341 | If you want to run the tests that you can see above you should do:: | |
342 | ||
343 | $ hg clone https://bitbucket.org/olauzanne/pyquery/ | |
344 | $ cd pyquery | |
345 | $ python bootstrap.py | |
346 | $ bin/buildout | |
347 | $ bin/test | |
348 | ||
349 | You can build the Sphinx documentation by doing:: | |
350 | ||
351 | $ cd docs | |
352 | $ make html | |
353 | ||
354 | If you don't already have lxml installed use this line:: | |
355 | ||
356 | $ STATIC_DEPS=true bin/buildout | |
357 | ||
358 | More documentation | |
359 | ------------------ | |
360 | ||
361 | First there is the Sphinx documentation `here`_. | |
362 | Then for more documentation about the API you can use the `jquery website`_. | |
363 | The reference I'm now using for the API is ... the `color cheat sheet`_. | |
364 | Then you can always look at the `code`_. | |
365 | ||
366 | .. _jquery website: http://docs.jquery.com/ | |
367 | .. _code: http://www.bitbucket.org/olauzanne/pyquery/src/tip/pyquery/pyquery.py | |
368 | .. _here: http://pyquery.org | |
369 | .. _color cheat sheet: http://colorcharge.com/wp-content/uploads/2007/12/jquery12_colorcharge.png | |
370 | ||
371 | TODO | |
372 | ---- | |
373 | ||
374 | - SELECTORS: still missing some jQuery pseudo classes (:radio, :password, ...) | |
375 | - ATTRIBUTES: done | |
376 | - CSS: done | |
377 | - HTML: done | |
378 | - MANIPULATING: missing the wrapAll and wrapInner methods | |
379 | - TRAVERSING: about half done | |
380 | - EVENTS: nothing to do with server side might be used later for automatic ajax | |
381 | - CORE UI EFFECTS: did hide and show; the rest doesn't really make sense on | |
382 | server side | |
383 | - AJAX: some with wsgi app | |
384 | ||
385 | 71 | Keywords: jquery html xml |
386 | 72 | Platform: UNKNOWN |