05 December 2014
R from Python with rpy2 - featuring rOpenSci!
Our last formal meeting of the year!
So you still want to use some R
Sure, we all love Python, but sometimes there's some fancy stats or viz in R that will just get the job done. No worries, rpy2 has your back!
As a demonstration, we'll show how you can use the awesome open and reproducible science libraries from the rOpenSci project from an IPython notebook.
Can it possibly be that easy?!?
Yes! Unless, as per usual, you are on Windows. If you're interested, I'll talk a bit about challenges that remain and what you might do to help.
Fancy teaching Python in the D-Lab Jan 12-16?
It's a great experience, and we'll even pay you (a bit)! If you're interested, please register at this GitHub issue!
The Code
Get it here.
21 November 2014
Tabular Data Smackdown - HDF5 or SQL?
Tabular data?
Tabular data shows up all the time. You've probably seen it in spreadsheets, but if you need more speed, data security, or advanced features, you might be happier using something like HDF5 or SQL.
So, what to use?
It turns out that different tools are good for different jobs. @katyhuff, the organizer of the Berkeley chapter of the Hacker Within, will take us through some examples of when you might use a relational database (using sqlite) vs. an efficient tabular storage format (using pytables).
If there's time, she'll take us through some examples of using pytables. Tutorial material is available here.
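For a quick taste of the relational side, here's a minimal sketch using Python's built-in sqlite3 module with a throwaway in-memory database. The table and column names are made up for illustration; the pytables side is left to the tutorial material.

```python
import sqlite3

def station_means(rows):
    """Load (station, temperature) rows into SQLite and average per station.

    The 'readings' table and its columns are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE readings (station TEXT, temp REAL)")
        # Parameterized inserts: sqlite3 handles quoting for us.
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        # A relational query: group and aggregate in one declarative step.
        cur = conn.execute(
            "SELECT station, AVG(temp) FROM readings "
            "GROUP BY station ORDER BY station"
        )
        return cur.fetchall()
    finally:
        conn.close()

print(station_means([("A", 10.0), ("A", 14.0), ("B", 3.0)]))
# [('A', 12.0), ('B', 3.0)]
```

The declarative GROUP BY is where SQL shines. Fast slicing of big homogeneous numeric tables is more the HDF5/pytables territory.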
Bring your questions and data challenges, and let's see if we can solve your problems.
07 November 2014
Reproducible Science with Dexy and Docker
Documenting complex scientific workflows
@ananelson will give us an overview of using Dexy with Docker. Ana is the primary author of Dexy, so it doesn't get any more authoritative than this!
If there's interest, we may discuss some of the recent moves towards using the Berkeley Common Environment with Docker.
(Updates) Some links and stuff
We're currently keeping track of links and things on this etherpad.
Note that Dexy has a great tutorial.
There is a (currently somewhat out-of-date) Docker example on github.
Follow Ana and Dexy on Twitter.
17 October 2014
Getting Started with Geo-coding
Geocoding with Python
Maybe you want to find something. You have an address, but where is that? Sure, you could use Google Maps. But what if you wanted to do it with a Python script?
Come for this Beginner's Mind session to find out. Or, learn whatever else you're interested in!
Google Doc available at http://tinyurl.com/k5r4ume.
03 October 2014
Blinky Lights
Blinky Lights
Dav is very busy, but is still presenting. It'll be something like this, but less musical and more IPython. Except the IPython part isn't written yet.
26 September 2014
Teaching Python for Big Open Data
This coming Friday – 2014.09.26 4:30-5:30pm (during the off-week for the Python Workers' Party), Raymond Yee (former lecturer at the School of Information), along with Lisa Green and Stephen Merity (of CommonCrawl.org), will lead a discussion on the topic of teaching Python for big data. We (Stephen, Raymond, and Lisa) have been developing training materials for computing on web crawl data as a vehicle for teaching both web science and techniques for handling large amounts of data.
We're actively working on the training materials and would love to get feedback on our work in progress. Some topics we hope to sketch out this Friday are:
- What is a web crawl, what exactly is in the CommonCrawl data sets, and how is the data (housed in AWS S3) structured?
- How Python programmers might be able to process this data with mrjob (Stephen has already developed some materials on this front: cc-mrjob)
- How we might use a combination of Python multiprocessing and/or IPython Parallel + docker + AWS + the IPython notebook to do some exploratory data analysis.
- Use BCE for some computations?
- Figure out how to work in Apache Spark?
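mrjob jobs are written as a mapper that emits key/value pairs and a reducer that combines all the values seen for one key. Below is a library-free sketch of that pattern in plain Python, counting words across lines of text. It is not mrjob's API. With mrjob, the same two functions would become methods on an MRJob subclass and could run locally, on Hadoop, or on Elastic MapReduce.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Emit (word, 1) for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    """Combine all the counts emitted for a single word."""
    return word, sum(counts)

def run_job(lines):
    """Simulate the map -> shuffle -> reduce phases in a single process."""
    grouped = defaultdict(list)
    # Map phase, followed by the "shuffle": group values by key.
    for key, value in chain.from_iterable(mapper(line) for line in lines):
        grouped[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))

print(run_job(["the crawl", "The web crawl"]))
# {'crawl': 2, 'the': 2, 'web': 1}
```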
Everyone is welcome!
19 September 2014
New Meeting Structure?
Two kinds of meeting
I'm proposing that we shift to the following format (which is also reflected now on the D-Lab website):
1st Friday of each month: "New and Exciting" topic. The talks we've had recently about Blaze and Xray are good examples of that format.
3rd Friday of each month: "Beginners Mind". These weeks, folks are encouraged to bring their own problems. Presentations should be accessible to folks who are just getting started, and could focus on issues like project structure, etc.
Lightning talks are always welcome.
I'm also proposing that we start "for real" at 4:30. This week, I'm happy to show up at 4, and anyone else is welcome to do so as well.
05 September 2014
Xray!
Efficient multidimensional arrays?
This week, we'll continue with our exploration of high performance numerical data structures with a presentation on Xray.
And just in case you didn't check it out last week, I promise these python xray images are worth a click!
Other (free) stuff up the hill
Lawrence Berkeley Labs has published the near-final agenda for LabTech and there's still time to register whether you want to attend just some or all of the day's activities on Sept 10th.
Highlights include morning mini-classes, including 3 one hour sessions on getting the most out of Python in scientific computing, a 3 hour Arduino basics class, and, new this year, Intro and Advanced LabVIEW.
The (free) Lunch and Keynote starts at noon, with an overview of what's new (and old!) from IT this year.
22 August 2014
Installation party!
Welcome the new Pythonistas
We've got a group of new Python programmers who've been working hard all week. Please show up, share your knowledge, and help them orient on the path that lets them dig into their research!
Perhaps now is also a good time to have a look at our own learning resources and add your wisdom there as well?
Coming up: Xray
Coming up in two weeks, we'll continue with our exploration of high performance numerical data structures with a presentation on Xray.
And just in case I've deprived you of some of the serendipitous fun you might have found if you googled "python xray", check out these pythons!
08 August 2014
Real Data Science!
Dealing with sort-of-big data
Last meeting I (Dav) was talking smack about Pandas while singing the praises of Blaze. Fortunately, the eminently reasonable Thomas provided counterpoint on the virtues of the admittedly excellent pandas.
But all of that is about to get cleared up by one of the current developers of Blaze:
Blaze is a new project that provides a user interface similar to NumPy and Pandas but hooks out to a variety of data and computation backends like HDF5, SQL, and Spark. By separating the user interface from the computation we enable users to easily experiment with different systems based on their needs. Blaze is experimental and so input both on new backends and on usability is welcome. For a simple usage example see the README on GitHub.
Also, their logo (for now) appears to be a tesseract, which is even cooler than the mathematicians would lead you to believe. Perhaps the same is true of Blaze?
Also, devops
We've just minted version 0.1 of BCE (which might stand for the "Berkeley Common Environment"). As is often the case, you fine people are amongst the first to know! The goal is to provide a standard "data science" VM for campus, serving both as a standardized learning environment, and also a standard reference environment where researchers can ensure their instructions work in at least one place!
Since I've been busy pulling down packages, Aaron from Berkeley Research Computing (BRC) offered to explain to me (in front of all of you) how he uses a caching proxy server to speed up repetitive installs, including his efforts to make this server highly portable with Docker. You may not have heard of Docker, but trust me, they're six ways to crazy about it in the Valley. Er, South Bay.
Help us teach!
We're organizing the next D-Lab training here on this very website! There, you'll see that the inimitable Matt Davis has already submitted a pull request to indicate his availability. Who will be next? How many points will they get?
Only time will tell. (Your questions will be answered on Friday, or just shoot me an email.)
25 July 2014
First meeting after SciPy 2014
What did we learn from SciPy?
It'd be great to reflect on what we learned from SciPy this year.
27 June 2014
Last meeting before SciPy
Some of us are going to SciPy
So, next meeting might be a good time to touch base on what we'll do at SciPy! In particular, if folks want to do a run-through of what they'll be presenting, this will be a great time. If you don't know what SciPy is, there's a website.
We'll also have at least one new visitor, so plenty to do! Look forward to seeing you there!
For those of you that want it, here's a link to the current build of BCE.
17 June 2014
Meetings this Summer
For the rest of the Summer
This summer we'll switch to every other week. In particular, we'll skip June 17!
But feel free to drop us a line on the mailing list, or come in to the D-Lab!
06 June 2014
No Meeting! But lots of other stuff
Go to a concert!
I will be going to this concert on Friday, and I encourage you to do the same!
And since the D-Lab is on reduced hours now (M-Th 1-3pm), it seems right to cancel the Workers' Party this week…
But despair not!
Tomorrow (Wednesday), you can join The Hacker Within at 4pm. It's been pretty chill since the summer hit, and they like you! And need your help / want to help you! They have a webpage on GitHub pages, just like us.
I recommend hatching a plot to overthrow my despotic rule.
And some open science?
If you fancy an evening out with some transparent and open science types, please join us here:
"Data Science Meets Social Science"
Thursday, June 5 | 6.00pm - 7.30pm
David Brower Center
2150 Allston Way | Berkeley, CA
This event also has a website (but not on GitHub).
30 May 2014
Welcome the new members!
We're finishing up another training
The demand for Python training keeps going up! We actually had trouble fitting this training into the D-Lab, and we've got a lot of motivated students (not all university students, though!). Interested in welcoming the new Pythonistas? Come to the Python Workers' Party and say hi! Help your colleagues get going on an exciting new project, or show them what you've been working on.
If you bring snacks, you get points.
Summer plans?
We'll also discuss plans for the summer, and point out that a sister group, The Hacker Within, has also been going strong this semester. Who knew? We should totally collaborate with these folks!
23 May 2014
Training the next generation of Pythonistas!
This week, I'll get your input and kick the tires on my Python Intensive curriculum that I'll be teaching next week in the D-Lab (preceded by a "what is programming?" workshop).
I'd love to get your input, but even more I'd love it if you can assist teaching! I especially need someone for Tuesday: 10am-noon for programming FUNdamentals and 1-4pm for the Python Intensive. But we can also use folks Wed-Fri from 1-4pm.
Updating the Site
Note that I updated last week's event with links to the presentations (and the links to ipynb files are automatically munged w/ nbviewer links via some moderately brittle javascript).
Anyone can update the site via a pull request on GitHub! You'll get points, and can compete with some of these heavy hitters.
That's right - the Python Workers' Party has its own digital currency, and you can even mine your own points using pull requests (and ask if you don't know how!). But please use this power responsibly.
Discussion
We have a lot of strong opinions. We had a lively discussion around how using VMs might save instructional time, but reduce the usage of Python in the long run.
Teaching unit testing is hard. None of us know how to get people to write or even run tests without some sort of authority. I've heard tales of such things, though.
We're thinking it'd be good to have some motivation. Here are some example papers and videos:
- Code and Data
- ADD YOUR OWN!
16 May 2014
Packaging, a discussion / diatribe
Packaging is great!
I mean, packaging is really great. It means you can just install and run someone else's code and have good confidence it'll "just work." You can also share your work with others (or yourself) with a minimum of fuss.
Why is packaging so hard?
The scientific Python community has had a long and difficult path with packaging – largely because we build complex codes that use lots of "foreign" languages like Fortran and C.
Indeed, this week's SF Python meetup is actually discussing this very topic. We've got some scouts heading over, and we'll get a report back.
Pip
Currently, there are two main solutions. The "standard" system endorsed by the Python packaging team is pip, and pip is now available by default in Python 3.4.
If you want to get the straight dirt from the python packaging team, they try to keep this up to date.
And, it turns out that Matthew Brett has graciously built all the scientific packages you're likely to want as wheels… I'm sure he'll tell us about it.
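Part of what makes wheels work (and why someone has to build them for each platform) is that compatibility is spelled out in the filename itself, following PEP 427's `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` convention. Here's a small sketch that pulls those fields apart. The package name in the example is made up purely to show the layout.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 fields.

    Sketch only: real tools (e.g. pip) also normalize names and
    handle compressed tag sets such as 'py2.py3'.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python,      # e.g. cp34 = CPython 3.4
        "abi": abi,
        "platform": platform,  # the OS/architecture the binary targets
    }

# Hypothetical package name, chosen only to show the layout.
print(parse_wheel_filename("example_pkg-0.1-cp34-cp34m-macosx_10_6_intel.whl"))
```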
Conda
The other solution, conda, is offered by Continuum Analytics as part of their Anaconda distribution. You could install it in other Python distributions, but (for now), it seems that few people do. Why are these guys putting energy into a separate effort? Here's what Travis Oliphant (principal author of NumPy) has to say.
Want to know more about this "cross-platform homebrew written in Python"? Docs are online.
Other packaging projects
Here's a video by Roman Shaposhnik who created the Apache BigTop project for packaging up the Hadoop ecosystem that may have some interesting parallels. And the slides to go with it.
And for those of you old enough to remember, we've had a presentation here at the Workers' Party on HashDist and friends (which, it turns out, is moving towards supporting conda).
Honorable mention
Setting up a complete development environment is hard. A team of us here on campus are working on the Berkeley Common Environment (BCE) to facilitate teaching and research using a standardized VM. Maybe by Friday, I'll have cleaned up the documentation at this link.
And, I'm back!
I know I've missed a lot of parties, but I hope you welcome me back. I'm hoping to pull in some folks from the broader community that I met in my travels… I'm looking forward to a great summer of Python!
Updates
09 May 2014
Lots to do!
Recap of PyData?
PyData was great. It was a great reminder of how there's all these brilliant people out there working really hard to provide us with awesome free tools. You might check out the schedule and have a discussion around some of the more interesting topics.
Help a comrade out?
The D-Lab's very own Laura Nelson would like some help multiplying large matrices (on the order of 400k x 20k). I don't know how to do this off the top of my head, but if you could help her out on-list or at the Party, you may get points.
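For a sense of scale, a dense 400,000 × 20,000 matrix of 64-bit floats has 8 × 10^9 entries, which is about 64 GB before any arithmetic happens. Suppose the matrix is mostly zeros (an assumption here, though word-count matrices often are). Then storing only the nonzero entries changes the picture. Below is a minimal pure-Python sketch of sparse multiplication with dictionaries, for illustration only. In practice you'd reach for a library such as scipy.sparse.

```python
from collections import defaultdict

def dense_bytes(rows, cols, bytes_per_entry=8):
    """Memory needed to store a dense matrix of float64 values."""
    return rows * cols * bytes_per_entry

def sparse_matmul(a, b):
    """Multiply sparse matrices stored as {row: {col: value}} dicts.

    Only nonzero entries are stored or visited, so the cost scales with
    the number of nonzeros rather than with rows * cols.
    """
    result = defaultdict(dict)
    for i, row in a.items():
        for k, a_ik in row.items():
            # Row k of b contributes a_ik * b[k][j] to result[i][j].
            for j, b_kj in b.get(k, {}).items():
                result[i][j] = result[i].get(j, 0) + a_ik * b_kj
    return dict(result)

print(dense_bytes(400_000, 20_000) / 1e9)  # 64.0 (gigabytes)
a = {0: {0: 2.0, 2: 1.0}, 1: {1: 3.0}}
b = {0: {0: 1.0}, 1: {0: 4.0}, 2: {1: 5.0}}
print(sparse_matmul(a, b))  # {0: {0: 2.0, 1: 5.0}, 1: {0: 12.0}}
```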
Neat part-time job
I connected with the folks from the Art of Problem Solving at PyData. They want to pay you to give programming feedback to high school kids. Check them out (scroll all the way to the bottom for the "Graders" section).
Carry on in Dav's absence
This brings me to my last point – I will be busy doing something else AGAIN. But I promise, next week, I'll show up and be super helpful and fun. Matthew Brett and I have been kicking around the idea of having a meeting about packaging, which for you academics out there could be central to developing your own CITED code projects.
Pinkies out! …or not.
02 May 2014
A Different Week
Another week without Dav
You're ALWAYS welcome in the D-Lab from 4-6pm on a Friday to dig into Python, and last week's event was certainly evidence that I don't need to be there!
But many of us will be attending PyData this weekend (and there's certainly still time to sign up).
25 April 2014
April 25th, 2014 roundup
While comrade Clark was away…
We had a great meeting with 11 people attending!
Jess Hamrick gave us an awesome overview of what it was like to attend her first PyCon. She told us about the overall format, some of the interesting talks she caught, but most importantly, how great it was to interact with the larger Python community! The videos are already up. Here's Asheesh Laroia's entertaining talk about Python Packaging which Jess recommended.
First-time attendee Martin, who is brand new to Python, asked about reading 55 thousand emails and starting to work with them. I (Paul Ivanov) suggested using the email module, and Moshe suggested OpenRefine.org as an option for munging the data, too.
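Here's a minimal sketch of the standard-library route. It parses a single raw message with the email module and pulls out a couple of headers and the plain-text body. The message is invented. For 55 thousand messages you'd loop over a mail archive instead, for example with the mailbox module (assuming the mail can be exported as mbox).

```python
from email import message_from_string
from email.policy import default

# An invented message, just to have something to parse.
RAW = """\
From: Alice <alice@example.com>
To: Bob <bob@example.com>
Subject: Meeting notes
Date: Fri, 25 Apr 2014 16:30:00 -0700

See you at the Workers' Party.
"""

def summarize(raw_message):
    """Return (sender, subject, body) for one raw RFC 5322 message."""
    # policy=default gives the modern EmailMessage API (Python 3.6+).
    msg = message_from_string(raw_message, policy=default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    return str(msg["From"]), str(msg["Subject"]), text.strip()

print(summarize(RAW))
```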
It was also Matt Rocklin's first time at the Workers' Party. He gave us a quick spiel about PyToolz and CyToolz, two modules which bring even more functional programming ideas to Python.
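One idea toolz is known for is piping a value through a sequence of functions (its `pipe` function). Here's a rough, library-free illustration of the concept using functools.reduce. It is not toolz's implementation, and the `compose` helper below is written by hand for this example.

```python
from functools import reduce

def pipe(data, *funcs):
    """Pass data through each function in turn: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, func: func(acc), funcs, data)

def compose(*funcs):
    """Right-to-left composition: compose(g, f)(x) == g(f(x))."""
    return lambda x: pipe(x, *reversed(funcs))

words = "the quick brown fox jumps over the lazy dog".split()
print(pipe(words, set, sorted, len))  # 8 distinct words
```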
Antony told us about faulthandler, a Python 3.3+ module which allows you to print the stack trace and more information even if your process is dying. Unfortunately, it doesn't really work on Windows, which Antony is forced to use due to proprietary driver software for the microscope he uses in his research, and we discussed a possible workaround.
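faulthandler ships with the standard library, so this sketch needs no extra installs. It enables the crash handler for a script and also dumps the current traceback on demand into a temporary file. The helper function name is just for illustration. There's also `faulthandler.dump_traceback_later()`, which dumps tracebacks after a timeout and is handy for processes that hang rather than crash.

```python
import faulthandler
import tempfile

def snapshot_traceback():
    """Dump this thread's Python stack to a temp file and return it as text."""
    with tempfile.TemporaryFile(mode="w+") as log:
        # faulthandler writes straight to the file descriptor (it has to
        # work even when the interpreter is in a bad state), so rewind
        # and read the text back through the file object.
        faulthandler.dump_traceback(file=log, all_threads=False)
        log.seek(0)
        return log.read()

if __name__ == "__main__":
    # On a fatal error (segfault, abort, ...), print the Python
    # traceback of every thread to stderr before the process dies.
    faulthandler.enable()
    print(snapshot_traceback())
```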
Next week, a bunch of us (Min, Thomas, Matt, and Paul) are giving talks at PyData Silicon Valley. If you haven't registered, you can use the code CU@PyData to get 20% off of registration.
25 April 2014
Remixing Parties
Thanks for your help thinking about data!
Last week we had a great brainstorming session about open data. The results of that session made their way into a "concerns" file. Also, check out the winners of the hackathon – you helped them!
Carry on in Dav's absence
Tomorrow, I will be attending a workshop on Buddhism and Science. So, I won't make the Workers' Party. However, the Python Workers' Party is not about me, so you are welcome to come hang out in the D-Lab from 4-6 anyway.
In particular, there is a fancy Blum center tour around campus that will be hitting the convening room at 5:25 and 5:45. But, the invite list is now closed.
But, you're special, because you already have a reason to be in that space! However, we should relocate the hacking out to the public space (the collaboratory) instead of the classroom ("convening room") that we normally use.
Jobs
I met a fella looking to hire a Python programmer for Reciprocity Labs. Reciprocity Labs sounds AWESOME. It's like literate computing (think IPython notebooks) for corporate governance and risk management (i.e., good citizenship). Worried that companies like Google are spying (or just helping the NSA spy) on the nation? These are the guys that are engineering solutions that can help us be sure that DOESN'T happen.
https://www.reciprocitylabs.com/careers
Should we have a job board? Or point to one? Let me know or submit a pull request!
18 April 2014
Save the Planet (with open code and data)
BERC Cleanweb Hackathon 2.0
Local do-gooders are organizing a hackathon this weekend to engage some great coders in coming up with solutions for opening access to utility data, water conservation, and "devices and prices." And if you don't fit into any of those categories, you're still eligible for the grand prize.
You can register here. And we can certainly use more mentors, even if you can only come for the afternoon!
Towards Open Data with Python
As you know, the "D" in D-Lab stands for "Social Science Data." But we still have some pretty ad hoc methods for organizing this data. The R folks are way ahead of us.
So, I'd like to have some discussion about best practices for Pythonistas to deal with open data - both in terms of libraries and in terms of archives (looking towards something like the rOpenSci project linked above). Let's do this!
11 April 2014
Doing some Hacking
Son and/or Daughter, we need to hack…
Some weeks ago, Chris Holdgraf and I said we wanted to do some hacking on pandas – really easy stuff that will make everyone's life better. But, we didn't get around to it!
Then, Will Stein came by and told us about this awesome IPython extension to do Sage's convenience pre-parsing. Sure, it's the kind of thing that will make some people cringe, but it goes some way towards answering complaints that Python makes you type a lot of stuff.
All we need to do is open a repo with a name (which might be hard!), put that code in there, et voilà, we've got this crazy new extension for everyone. And we'll need to factor out the other dependencies from Sage, but how hard can that be?
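One thing Sage's preparser does is treat `^` as exponentiation. Here's a minimal sketch of that single rule. It uses the standard tokenize module so that a `^` inside a string is left alone. This is not Sage's actual preparser, which does much more. The registration hook, IPython's `input_transformers_cleanup` list set up in `load_ipython_extension`, assumes a modern IPython (7+) rather than the IPython of 2014.

```python
import io
import tokenize

def preparse(source):
    """Rewrite the ^ operator as ** (exponentiation), leaving strings alone."""
    try:
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    except (tokenize.TokenError, SyntaxError):
        return source  # incomplete or invalid input: leave it untouched
    rewritten = [
        tok._replace(string="**")
        if tok.type == tokenize.OP and tok.string == "^"
        else tok
        for tok in tokens
    ]
    # untokenize reuses the original token positions, so spacing is kept.
    return tokenize.untokenize(rewritten)

def sage_caret_transformer(lines):
    """IPython cleanup transformer: takes and returns a list of lines."""
    return preparse("".join(lines)).splitlines(keepends=True)

def load_ipython_extension(ipython):
    # Called by %load_ext; assumes IPython 7+ transformer lists.
    ipython.input_transformers_cleanup.append(sage_caret_transformer)

print(preparse("x = 2^10\n"), end="")  # x = 2**10
```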
UPDATES: Total hijack by Thomas
So, instead of all that, Thomas got us to join in the fun with the first round of the Google Code Jam. It was great fun (for us anyway), and you can still go back and practice on previous contests (though it's too late to join this year). There was much whiteboarding and coding about cookies and minesweeper.
04 April 2014
What is the sound of one hand coding?
Python Koans
This Friday, I propose an exploration of "Koans" for learning python (and potentially other programming languages / frameworks). It turns out there's a whole internet subculture dedicated to this "test-driven learning" idea that I've been talking about with some of you!
As per usual, the Hitchhiker's Guide to Python already knew about this.
Thanks to @ivanov for finding this, and yes, the Koan idea was started by a Rubyist…
The Python Guide suggests the following links to find more koans on GitHub and on BitBucket.
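The koan format is easy to imitate. Each exercise is a failing assertion with a blank (`__` below) that the learner fills in until the test passes. Here's a hypothetical pair, before and after, written in the spirit of the Python koans rather than copied from them.

```python
__ = "FILL ME IN"  # the blank the learner replaces

def koan_string_repetition():
    # Unsolved: this fails until __ is replaced with the right value.
    assert "ab" * 3 == __

def koan_string_repetition_solved():
    assert "ab" * 3 == "ababab"

koan_string_repetition_solved()  # passes silently
try:
    koan_string_repetition()
except AssertionError:
    print("Meditate on string repetition...")
```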
UPDATE: How'd it go?
The Python koans were surprisingly in line with the official Python docs, starting with section 3. The Koans really aren't usable for a beginner without some orientation to these or similar docs (as is recommended by the above-mentioned Hitchhiker's Guide).
The Koans are quite dry, and for a beginning non-programmer, they may wonder why they spend all that time on the minutiae of string syntax. Jess did some work making a strings notebook that has the user go through and test whether various python strings are the same or different (using different syntax to get similar or identical strings). For example, are the following equivalent?
str1 = "This has two lines?\n"
str2 = r"This has two lines?\n"
str3 = """This has two lines?
"""
If you're unsure, your Python interpreter knows! And the above link to section 3 in the tutorial should get you sorted out…
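Here's one way to ask the interpreter, using the same three strings. `str1` and `str3` both end in a real newline character. The raw string `str2` keeps the backslash and the `n` as two separate characters.

```python
str1 = "This has two lines?\n"
str2 = r"This has two lines?\n"
str3 = """This has two lines?
"""

print(str1 == str3)          # True: both end with one newline character
print(str1 == str2)          # False: r"..." keeps the backslash literally
print(len(str1), len(str2))  # 20 21
print(repr(str2[-2:]))       # '\\n'
```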
Also, it turns out that Behavior Driven Development (BDD) as implemented by RSpec and "such" in nose2 make Jess and Mike cringe. How about you?
21 March 2014
Teaching and Intervening
Teaching
We just had another bootcamp! I've been thinking about planning for future curricula in the D-Lab, and would love to chat with interested folks about what we do next.
UPDATE: We talked about different approaches we might take:
William Stein paid us a visit, and mentioned the utility of introducing a surprising gotcha after each topic.
You can (unsurprisingly) find some gotchas on Stack Overflow.
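One classic candidate, offered here only as an example of the genre: default argument values are evaluated once, when the function is defined, so a mutable default is shared between calls.

```python
def append_item(item, bucket=[]):
    """Gotcha: the same list object is reused on every call."""
    bucket.append(item)
    return bucket

def append_item_fixed(item, bucket=None):
    """Fix: use None as a sentinel and create a fresh list per call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_item(1), append_item(2))              # [1, 2] [1, 2]
print(append_item_fixed(1), append_item_fixed(2))  # [1] [2]
```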
Mobile (and other) Interventions
I'll also be fresh from a meeting about how to do mobile interventions and surveys, including technologies like Django. If folks wanted to touch base about that, I'd love to! Look for updates with links after the meeting.
NOTE: We didn't actually get to this.
Other stuff: William Stein / SAGE Math Cloud
William Stein presented SAGE Math Cloud. It's pretty rad, you can use it for free. He also still skates vert.
Fancy photo shoot
Lastly - Angela from the Berkeley Science Review showed up and took some pictures of us. She also said she thought William's presentation was pretty cool. Stay posted for further images.
Spring Break
PLEASE NOTE - we will be taking a break over spring break. You should too! We'll meet Friday, March 21, and then resume in April.
14 March 2014
MOOCs and SciPy
A call for submissions to SciPy 2014
The deadline for SciPy 2014 approaches! It's a great, growing conference, with an emphasis on the following topics (especially the first two this year):
- Education
- Geospatial data
- Astronomy and astrophysics
- Bioinformatics
- Geophysics
- Vision, Visualization, and Imaging
- Computational Social Science and Digital Humanities
- Engineering
There is also a Diversity Goal. Anyone up for organizing something for women, people of color, or other under-represented groups?
Abstract submissions are due this Friday, but if that's a deal-breaker, we will probably be able to accommodate later submissions (feel free to contact me or Katy Huff about it).
MOOCs this week
And now, a selfish topic request this week: the D-Lab is looking at becoming a resource for MOOC (1) data on campus. I'd invite folks who are engaged with online courseware to talk about what they've been doing. This could include things that Raymond is doing with the new bCourses system (our campus course management portal). Or particularly folks who've been working with EdX systems!
(1) MOOC = “Massive Open Online Course.” You've probably heard of some of them, like Khan Academy, EdX (& our local BerkeleyX), Codecademy, Coursera, etc.
A place for beginners
Lastly, please invite folks who are beginners. The idea is for the Python Workers' Party to support the development of our community! People can treat it like a "study hall" to follow a MOOC or book and ask for help if they get stuck, or ask for help with their projects.
What happened
Raymond talked about his experiences with bCourses (which runs on Canvas, which is produced by Instructure). He was initially motivated to do automated grading, but it's been harder to do than to just have his TA do it.
So, he's been working on being able to do something like clickers during his class, although some of the response features of Canvas are still in beta. I.e., students can provide answers to questions during class, and then you can programmatically access what they respond.
There's a REST API, and Raymond's working on a library to work with it. Let him know if you want to join the effort!
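If you're curious what talking to that API looks like, here's a minimal sketch that only builds (does not send) an authenticated request with the standard library. It assumes Canvas's usual conventions of `/api/v1/...` endpoints and an `Authorization: Bearer` access token. The host name, course ID, and token are placeholders, and this is not Raymond's library.

```python
from urllib.parse import urlencode
from urllib.request import Request

def canvas_request(host, path, token, **params):
    """Build a GET request for a Canvas REST endpoint (not sent here).

    host, path, and token are caller-supplied placeholders; nothing is
    hard-coded to a real bCourses instance.
    """
    url = "https://%s/api/v1/%s" % (host, path.lstrip("/"))
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": "Bearer " + token})

# Placeholder values: replace with your own instance, course, and token.
req = canvas_request("canvas.example.edu", "courses/123/assignments",
                     "YOUR-TOKEN", per_page=50)
print(req.full_url)
# Sending it would be urllib.request.urlopen(req), then json.load the response.
```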
07 March 2014
Updates from KBase and open Indian computing training
We had some great presentations today! Stephen Chan from LBNL's KBase team, and Kannan Moudgalya.
KBase
KBase is a project to adapt the IPython notebook to facilitate collaboration between computational and experimental (bench) biologists. But they want to expand. There is a developer preview at demo.kbase.us; you can sign up!
Spoken Tutorial
Spoken Tutorial is a systematic approach to providing materials like Rails Casts. (Is there something like this in Python? Submit a pull-request!)
Check out these statistics!
This program offers vetted, open, downloadable screencasts in hybrid English/native language instruction. Materials span open office software to Scilab and Python.
FOSSEE
The Indian government has also heavily invested in building out free software infrastructure for research and education.
Approximate rationale: “We take a loan from the World Bank, and burn that money on Matlab®”
So, there is now a pipeline to create Scilab and Python materials for over 500 textbooks in science and engineering. All of these materials are available online.
It's hard to get students to use FOSS! But one success is when students realize that Turbo C (you read that correctly) is a barrier to passing a class – then they're willing to switch to gcc on Linux.
Akash Tablet
They apparently also make these tablets, which aim to be the cheapest Linux computers in the world.
28 February 2014
Getting Things Done with Pandas!
Pandas? What's that?
Pandas has become a hub of activity for the development of useful, (relatively) easy tools for doing all kinds of "data science." This functionality is built around a central DataFrame structure (familiar to those of you coming from R), and includes support for oft-neglected stages of science like data cleaning and exploratory plotting.
The plan
We didn't quite get through all the nice ways to use matplotlib last week, so we'll be ready to have a quick look at:
- Seaborn, which makes it easier to plot using pandas and make things look nice.
- Vectorized string methods that make your data-cleaning life easier.
- Dav used stock data using the pandas io module for one example.
But we always welcome short talks from everyone! Particularly if you haven't spoken before!
If there's interest, after the lightning talks, I (Dav) will lead a group on fixing a bug in the string methods linked above, and we'll try submitting a pull request against pandas on github.
21 February 2014
Plotting and Stuff
We're going to PLOT!!!!
There are lots of ways to make your matplotlib experience better (especially looking better):
Then, there are numerous approaches using JavaScript (via python):
And lastly, plotly doesn't quite fit into the above. It's a true web API for plotting.
Actual discussion
Mark talked about matplotlib basics. The defaults for bar plots are poor, but you can fix them up. Scatterplots are great.
Veusz is a neat package to manually fix up your plot, but you can save them manually.
Bokeh can plot matplotlib objects in javascript. It also does awesome interactive plots and mapping. It can plot in the notebook directly, but you need to run a bokeh-server. Jake Vanderplas (author of mpld3) said he was excited about Bokeh too.
Plotly requires you to be online, but looks nice and is interactive / easy to share.
Vincent is an easy way to get from pandas to D3 (but provides a strong abstraction layer around D3 - Vega).
If you're interested in digging into javascript, dimple.js comes recommended as a well-documented, lightweight approach that allows full access to D3 underneath. It doesn't do maps, so if you want that, leaflet and tilemill seem to be the go-to tech these days, with a lot of investment by the folks at Mapbox. This post will probably make you say "whoa, holy map-diffs, GitHub!"
07 February 2014
A very Python Tea Party
Tea?
We had tea for our British friend. But, he didn't show up. We managed to get by, and did a little one-on-one consulting. Then…
Lightning talks
- Dav demo'd rpy2 and rmagic to send data you've cleaned up in Python to R.
- Min just pushed %interact to master! Check it out!
- Paul melted some brains with his "vimception" plugin to enable multi-vim-mode in IPython notebook itself and every cell within.
31 January 2014
A New Format for Python Meetings
The Plan
Going forward, we'll try consolidating to just one meeting format, and do it each week.
Fridays 4-6pm
D-Lab "Convening Room" (the classroom in our main space)
First meeting: Friday 31st January 2014
We hope this time will attract both people who're too busy during the main working day, and people who need to get home soon after work. Come and work on your own projects, or talk to other people about your shared interests. We made a map of topics people at the organisational meeting last week work on:
There will be lightning talks at about 4.30pm on using Python in teaching and automated grading. If you know of projects or ideas about that, let us know at the meeting.
Update: What Happened?
17 people present with optional affiliations, interests, like:
- ipython, vim, vision science
- python, vim, biostatistics
- IPython, pyzmq, plasma physics
- iSchool, open data, IPython
Dav mentioned prose.io as a way to edit the GitHub pages (i.e., for this site). He's also using pytest in his course: e.g., https://github.com/dlab-berkeley/python-fundamentals/blob/master/challenges/01-Intro/test_A_print_stuff.py
He picked pytest because its error messages are pedagogically clearer than the alternatives; it might be great to emphasize the importance of testing.
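The course's actual test files live at the link above. As a hypothetical illustration of the style, a challenge might ask students to write a function and ship with a plain-assert test that pytest discovers by its `test_` prefix. When a bare `assert` fails, pytest reports the values being compared, which is much of the pedagogical appeal. pytest's built-in `capsys` fixture could capture the output too. contextlib is used here so the file also runs without pytest.

```python
# Hypothetical file: test_greeting.py  (run with: pytest test_greeting.py)
import io
from contextlib import redirect_stdout

def greet(name):
    """The student's solution: print a greeting."""
    print("Hello, %s!" % name)

def test_greet_prints_hello():
    buffer = io.StringIO()
    with redirect_stdout(buffer):  # capture what greet() prints
        greet("World")
    assert buffer.getvalue() == "Hello, World!\n"
```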
Jarrod talked about a tutorial for students to set up a GitHub repository:
- berkeley-stat133.github.io
- https://education.github.com/
- http://apis.io
GRADING WITH PYTHON RESOURCES / LINKS
Harvard cs109 is using IPython notebooks and some kind of grading (http://cs109.org)
- (runipy) https://github.com/paulgb/runipy run IPython notebooks
- ipnbdoctest script: https://gist.github.com/minrk/2620735
- IPython nbconvert's preprocessor and metadata can be used to re-execute the notebooks, and generate HTML reports in a single step
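Since notebooks are just JSON files, even the standard library gets you a long way for grading scripts. Here's a minimal sketch that collects the source of each code cell. It assumes the nbformat version 4 layout: a top-level `cells` list whose entries have `cell_type` and `source`. Notebooks from the IPython 2.x era use version 3, which nests cells under `worksheets` and names code fields differently, so real tools should go through the nbformat package. The example notebook is invented.

```python
import json

def code_cells(notebook_json):
    """Return the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_json)
    if nb.get("nbformat") != 4:
        raise ValueError("sketch only handles nbformat 4")
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # On disk, source may be a list of lines or a single string.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources

example = json.dumps({
    "nbformat": 4, "nbformat_minor": 0, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Problem 1"},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1\n", "print(x)"]},
    ],
})
print(code_cells(example))  # ['x = 1\nprint(x)']
```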
24 January 2014
The Python Workers' Party Inaugural Rally!
Executive Summary
The Python Workers' Party will have its first planning meeting this Friday at noon. We'll figure out our plan for the semester. Please come even if you're just getting started! Afterwards, around 1pm, Dav (and perhaps others) will be available for consulting.
A Little History
Last semester, we had two different kinds of meeting for the Python community on campus, with the following confusing names:
- py4science has been a continuation of the 6-ish year old user group meeting started by the IPython crew & friends.
- After the summer 2013 FUNdamentals class in the D-Lab, students created py4data, where people show up with their own projects, or work on learning projects together. A wide range of expertise is often available for assistance in the room.
We're doing a complete rebrand this semester, announcing the Python Workers' Party!
Note that, much like "py4data," we don't imply that you need to be a "scientist" or even be doing "science" to attend. Digital humanists, open gov types, and multi-media artists are welcome!
The Agenda
- Do we continue with two separate meetings this semester?
- When will our meetings be?
- Are there any particular topics / guests we'd like to see?
Revolutionary attire is encouraged.
11 December 2013
Coastal Ecosystem Simulation at Py4science
Chris Kees and Aron Ahmadia will be presenting some of their work.
Chris is one of the lead developers of Proteus, a Python-based toolkit for the solution of partial differential equations with coastal applications. Aron is a developer of PyClaw, a Python-based toolkit for the solution of wave propagation problems. From their email:
We're mostly interested in meeting the newly-reinvigorated py4science group, and sharing a little bit about what we're working on over here at the US Army Engineer Research and Development Center in terms of how Python is helping us protect our coasts, estuaries, rivers, and levees.
These meetings are informal but fun. We meet from 5-7pm on the 3rd floor of Barrows Hall in the D-Lab Convening Room (largish meeting room). And there usually is pizza.
UPDATE: We saw presentations on PyClaw and a way to install such gnarly code with HashDist/HashStack! Even on Windows (and equally gnarly supercomputing clusters)!
20 November 2013
Testing Packages: pytest and nose
In attendance
11 folks signed in, from:
- Psychology
- Nuclear Engineering
- Vision Science
- Physics
- L&S
- IPython
- and (of course) the D-Lab
Nose
You can find out more about Nose here.
Katy presented a tutorial on testing from Software Carpentry.
Pytest
You can learn more about Pytest here.
Thomas presented a notebook, which was mostly about pytest.
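Both nose and pytest also collect classic unittest.TestCase classes, which makes them a handy common denominator when a project hasn't settled on a runner yet. A minimal sketch, using a hypothetical mean() function as the code under test:

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("mean() of empty sequence")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_simple(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_float(self):
        # Floating point sums are inexact, so compare approximately.
        self.assertAlmostEqual(mean([0.1, 0.2]), 0.15)

    def test_empty_raises(self):
        with self.assertRaises(ValueError):
            mean([])
```

Saved in a file named like test_mean.py, the same class should run under nosetests, py.test, or python -m unittest without changes.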
About this post
This is the first post where we're using jQuery to automatically format links to IPython notebooks (code here).
For now, we automatically add a link to view in nbviewer following any link ending in .ipynb. It would be pretty nifty if we could use JavaScript to look for a notebook server (on localhost:8888?) and offer it a notebook somehow.
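The site does this rewrite in jQuery; the same rule is sketched in Python below just to make the logic explicit. The nbviewer base URL and its /url/ (for http) and /urls/ (for https) routes are assumptions about the nbviewer service, not a copy of the site's code.

```python
from urllib.parse import urlparse

NBVIEWER = "https://nbviewer.jupyter.org"  # assumed base URL of the viewer service


def nbviewer_url(link):
    """Return an nbviewer URL for a link ending in .ipynb, else None."""
    parsed = urlparse(link)
    if not parsed.path.endswith(".ipynb"):
        return None
    # Assumed routing: /url/ for plain http, /urls/ for https.
    route = {"http": "url", "https": "urls"}.get(parsed.scheme)
    if route is None:
        return None
    # Drop the "scheme://" prefix; the route expects host + path.
    rest = link.split("://", 1)[1]
    return f"{NBVIEWER}/{route}/{rest}"
```

Links that aren't notebooks come back as None, so a caller can skip adding the extra "view in nbviewer" link for them.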
18 November 2013
Let's Get Ready to Test!
We'll be having two events this coming week working towards creating well-tested, ready-to-share scientific code.
Py4science testing extravaganza
We couldn't do py4science last week, due to some absences and illnesses. So, we'll pick up this week, Nov 20, 5-7pm, with:
(Feel free to add your name above if you are planning to help with that section.)
Py4data code porting and testing
Then, Friday, Nov 22 at noon, we'll start working on porting some pharmacological MR analysis code to battle-tested python. This project will serve as (we hope!) an exemplary model for folks wanting to create reusable code for science. Should you come even if you're not a neuroscientist? Definitely! It's about the process here - not the specific code.
30 October 2013
Meeting Notes 2013-10-30
Today's topic: data-wrangling with pandas!
Attendance
There were 15 attendees, with people coming from the following departments and organizations:
- D-Lab
- IPython
- Redwood Center
- Department of City and Regional Planning
- Psychology
- School of Information
- Neuroscience
- Lawrence Berkeley National Laboratory
Agile Data Wrangling with Python
IPython notebook presented by Cindee
Next Time: Testing
Next meeting is in two weeks on 11/13/2013
UPDATE: 11/13 (today's) meeting postponed.
Unfortunately, some of our core presenters will be unavailable for this evening. As such, we are postponing tonight's py4science testing extravaganza until next week (11/20).
We'll be covering basic testing strategies and testing frameworks (unittest, py.test, nose, etc.).
25 October 2013
Hacking away at py4data
Continuing with didactic
Dav (that's me) led a brief tour of how easy it is to pull down financial data from one of the major providers using pandas, and using methods right there on the DataFrame to plot the results. But, there are pitfalls!
Find out more in the py4data directory in the master branch of our repo (this site is managed in the gh-pages branch).
Settling into a reasonable pace
We continue to get newcomers, and some folks have still managed to come to every meeting. While we're building expertise as a group, you are more than welcome to drop in. Folks are making progress on everything from managing campus budgets, to automatically classifying power plants. No challenge is turned away!
Planning ahead for some open, battle tested science!
In only 4 weeks (Nov 22), we'll be starting on a group project to make some existing scientific code open, easy-to-use, and well-tested. Tell your colleagues!
Up next
There wasn't much feedback on what people wanted to see for the first 30 minutes of the meeting, so I just chose something I thought would be useful. Please contact me if you're interested in hearing about a particular topic. We can also pull in expertise from other people!
18 October 2013
Packaging and Meeting Notes
We've started a new format, where we'll talk about a (hopefully) useful topic for Python and data science for the first 30-40 minutes. See the end of this post for a poll for this Friday!
Planning for the Future
We agreed that we'll invite Ian Greenhouse from Neuroscience to bring his code to py4data in November. We'll work on developing robust, sharable, well-tested Python code that processes data from pharmacological MRI scans. Even if you're not a neuroscientist or biologist, this will be a great opportunity to develop best practices in software development, and it will include general analysis issues like spectrum analysis and data management.
We are tentatively planning to start this project at the py4data meeting on Nov 22, 2013
Making it Easier to Get Started and Learn
We talked about resources that are missing from python.berkeley.edu.
GOAL: go to python.berkeley.edu when confused about something. It's a work in progress, mostly useful for folks getting started (on Python or a particular topic). Following are some resources we'd like to integrate:
- Udacity
- Learn Py the Hard Way
- Point to GitHub "How To"
- Learnds. ?
- Graphing:
- Vincent (for d3) - apparently hard to use for a beginner
- ggplot - a copy of R's ggplot2 (Can use ggplot2 or lattice via rpy2)
- Lecture notes from high-quality intro Python courses on campus? Terry Regier (Cog Sci / Linguistics) Python notes?
How do we Install Packages?
- EASY, TRY THESE FIRST:
- Canopy (GUI, cmd line)
- Anaconda (make sure to update conda with $ conda update conda)
- For Ubuntu (and other Debian-based systems): NeuroDebian PPA - good even for non-neuroscience
- other PPAs are mostly outdated or TOO up-to-date / broken
- Linux: apt/yum/etc
- Use pip; the -U flag upgrades an existing install:
$ pip install pandas
$ pip install -U pandas
- Download directly from the project page
- download the bundle OUTSIDE your Python dirs/folders. We'll call this pkg-dir:
$ cd pkg-dir
$ python setup.py install
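Whichever route you take, it helps to confirm which version the Python you're actually running can see (for example, after pip install -U). A small sketch using the standard library's importlib.metadata, which needs Python 3.8 or later (so it postdates these notes):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("pandas", "numpy"):
        print(name, installed_version(name) or "not installed")
```

If this reports a different version than pip just installed, you probably have more than one Python (or environment) on your path.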
What do we talk about this Friday?
- Designing experiments in python (eye tracking, etc.)
- Speeding up code (pandas/numpy, cython, array ops)
- start with pandas (over numpy) for social sciences
- Graphing
- Nothing
16 October 2013
Meeting Notes for 2013-10-16
Last Meeting: Editors
Attendance: 9
- Overview of different editors (what's most useful, and resources on how to get started using each particular editor)
- Emacs: Jess
- Sublime: Bill
- Vim: Many
- Gedit: Dav
Jess Hamrick started with Emacs
- suggested the Homebrew version of Cocoa Emacs for Mac OS X
The bad
- high initial learning curve
- plugins can be buggy
- bad package management
- customization written in elisp
The good
- emacswiki.org
- supports many languages (Python, LaTeX, Git, Markdown)
- Terminal mode
- plugin for IPython notebook
- scratch buffer for prototyping
- Jedi for auto completion
- nipy tricked out emacs: http://nipy.sourceforge.net/devel/tools/tricked_out_emacs.html
- pycheckers for integrating
- magit Git interface
Dav Clark: Gedit via VirtualBox
Used VirtualBox + Vagrant to run Ubuntu and gedit on OS X
- gedit and related packages need to map to an outside folder, but this is easy to set up
- preferences
- set up default preferences by choosing Preferences from the File menu
- easily choose relevant modules (e.g., smart spaces)
- great tool for beginners and teaching
- syntax highlighting
Bill Sprague: Sublime
The bad
- not open source; needs a license ($70)
- though!! for the public beta of version 3, a license is not required
The good
- sublime works on all platforms
- faster and less bloated than Vim
- powerful GUI interface
- multiple ways to edit text
- command palette is amazing (it has fuzzy searching for all commands, making it trivial to find the command you want)
- good keyboard shortcuts; classic mode with Vim keyboard shortcuts is good for transitioning
- setup files are JSON, easy to edit
- many add-on packages which are easy to install
- has a project mode for searching within projects
- good for multipurpose cleaning of text files
08 October 2013
Meeting Notes from 2013-10-02
This was the second meeting of py4science!
Attendance
There were 9 attendees, with an experience breakdown of:
- Experienced: 7
- Intermediate: 2
- New: 0
People came from the following departments and organizations:
- Neuroscience
- IPython
- Psychology
- D-Lab
- Bioengineering
- Vision Science
Working Groups
Dav talked about the differences between working groups and the py4science meeting, emphasizing that py4science is more "show and tell" while py4text/py4data is more hands-on.
Newbie Nugget
Cindee presented this week's Newbie Nugget. The topic is if __name__ == "__main__":
IPython notebook for the Newbie Nugget
Min pointed out that you can write files in IPython notebook using the %%file cell magic!
Jess mentioned that if __name__ == "__main__" doesn't work well in Emacs and IPython. Apparently python-mode ignores anything in the 'if' statement.
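A minimal sketch of the idiom, using a hypothetical module (say, wordcount.py): word_count can be imported from other code, while main() runs only when the file is executed directly (python wordcount.py). On import, __name__ is set to the module's name instead of "__main__", so the guarded block is skipped.

```python
def word_count(text):
    """Count whitespace-separated words; safe to import from other modules."""
    return len(text.split())


def main():
    # Runs only when this file is executed as a script, not when it's imported.
    print(word_count("the quick brown fox"))


if __name__ == "__main__":
    main()
```

Keeping the script behavior inside main() also makes it easy to test: other code can import word_count without triggering any printing.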
Lightning Talks
What's New in IPython
Presented by Min
Awesome stuff: %matplotlib, raw_input, nbconvert, widgets
Notebook for New Features in IPython 1.0
Notebook for Upcoming Features in IPython 2.0
Python 3
Presented by Thomas
Thomas covered a range of topics, including: the print function, iterators, Unicode, function annotations for type checking and command-line argument parsing, yield from, and returning values from generators.
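A few of those Python 3 features in one small, illustrative sketch (the function names are made up for the example, and f-strings need Python 3.6+). Note that annotations are stored on the function but not enforced at runtime; type checkers and other tools read them.

```python
from typing import Iterator


def greet(name: str, times: int = 1) -> str:
    """Annotations are recorded in greet.__annotations__, not enforced."""
    return " ".join([f"hi {name}"] * times)


def inner() -> Iterator[int]:
    yield 1
    yield 2
    return "inner done"  # Python 3 generators may return a value


def outer():
    # 'yield from' re-yields everything inner() yields, then
    # evaluates to inner()'s return value.
    result = yield from inner()
    yield result


print(greet("py4science", times=2))  # print is a function, not a statement
print(list(outer()))                 # [1, 2, 'inner done']
print(len("na\u00efve"), len("na\u00efve".encode("utf-8")))  # str is Unicode text; bytes are separate
```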
Useful tip: python -O will remove assert statements!
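A quick way to see the effect of the -O flag: run the same one-liner in a fresh interpreter with and without it. This sketch uses subprocess.run with capture_output, which needs Python 3.7+.

```python
import subprocess
import sys

# A tiny program whose assert always fails.
code = "assert False, 'checked'\nprint('reached the end')"

normal = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
optimized = subprocess.run([sys.executable, "-O", "-c", code], capture_output=True, text=True)

print(normal.returncode)         # non-zero: the AssertionError was raised
print(optimized.stdout.strip())  # the assert was stripped, so the print runs
```

This is also why asserts are fine for internal sanity checks but shouldn't be the only guard on user input: under -O they silently disappear.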
Update: Thomas' Slides
Next Time: Editors
Next py4science meeting is in two weeks, on 10/16/2013!
- Dav will do the newbie nugget
- Overview of different editors (Each person should take about 10 minutes to describe their workflow, what things are most useful, and resources on how to get started using that particular editor).
- Emacs: Jess
- Sublime: Bill
- Vim: Paul
- TextMate: Min
- Gedit: Dav
This meeting will be good for beginners! Please come join us to learn more.
01 October 2013
A healthy variety at py4science
We'll have our first substantive meeting this week on 10/02/2013!
As per usual, we'll meet from 5-7pm, but unlike previously we'll meet in the large breakout room, 371 Barrows. The door is around the corner from the main D-Lab entrance.
Meetings are still every other week, but may change to every week if there's enough content.
Schedule
- Cindee will do the newbie nugget
- Lightning talks on advanced topics (~20 minutes each)
- Python 3 (Thomas)
- New stuff in IPython (Min)
Next meeting: editor extravaganza!
01 October 2013
Pandas and econometrics at py4data
Recap
- We had eight people show up, mostly grad students (fewer than usual, perhaps due to a competing job fair).
- This week people mostly worked independently on different projects. In particular people are starting to work with their own data.
- Almost everyone is finding it useful to use pandas to read in their data sets and some people are starting to look at doing quantitative economics on their data.
- Py4text is in the process of merging with py4data so that we have more skills and expertise in the room at once.
Upcoming
The organizers of both py4data and py4text are entering a busy period of the semester, and we're realizing we have conflicts with other important events on campus. So:
- Py4text and py4data meetings will merge - we'll just call it py4data in the future
- Py4data is on hold while we try to figure out the best time to accommodate interested members of the py4science community.
So, if you're interested in hacking on some code with a like-minded Python community, please join the py4data working group! (details will be forthcoming here and on the mailing list.) But, there is no meeting this Friday, Oct 4 for py4data. The regular py4science meeting will meet on Wednesday, and will keep meeting on alternating Wednesdays.
As always, we'll coordinate here on python.berkeley.edu/py4science, and on the py4science mailing list!
27 September 2013
Introducing py4text
Some folks have already found out about our py4text working group, meeting Fridays at the D-Lab from 2-4pm. And there's apparently some confusion. So, just to be clear:
All members of the Py4Science community are welcome!
We are currently figuring out the schedule for the working group this semester. Check back soon for an update.
25 September 2013
Introducing py4data
Some folks have already found out about the new py4data working group, meeting Fridays at the D-Lab from noon-2pm. And there's apparently some confusion. So, just to be clear:
All members of the py4science community are welcome!
And what goes on there? Here are some lightly edited words from the current organizer of py4data, Bob Bell:
I hope everyone is well. We had an awesome time last week learning how the python data analysis package pandas can help us productively work with data.
We will be meeting again Friday (9/27/2013), 12-2pm at the D-Lab (Barrows 3rd floor) to finish working through our sections in Chapter 2 and give our group presentations.
If you want to continue with Pandas, we will switch this week from Wes McKinney's book to these IPython notebook based resources (https://bitbucket.org/hrojas/learn-pandas).
Some of us might forge ahead into econometrics and Monte Carlo sampling (http://quant-econ.net/).
As we did last week, towards the end we will discuss some of the specific topics we want to cover in subsequent meetings, based on the features we have explored the past three weeks as well as your own projects.
And, as before, I will be available 15-20 minutes before the working group session to help anyone with accessing this notebook, getting setup with wakari.io, etc.
18 September 2013
Meeting Notes 2013-09-18
This was the first meeting of py4science, and was primarily organizational.
Attendance
12 attendees, with an experience breakdown of:
- Experienced: 6
- New: 3
- Intermediate: 3
People came from a variety of different departments and organizations as well:
- Helen Wills
- Redwood Center
- Neuroscience
- Psychology
- Nuclear engineering
- IPython
- Astronomy
- I School
Meeting topics
Newbie nuggets
- Each meeting will start with approximately 20 minutes for a "newbie nugget", which is a brief overview of some introductory topic.
- Newbie nuggets should be written up in IPython notebooks, so they can be archived and referred to
- Cindee went through two example nuggets on the glob library and list comprehensions
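Those two nuggets combine naturally: glob finds files by wildcard pattern, and a list comprehension turns the resulting paths into something useful. A small sketch; the run*.csv file names are made up, and a temporary directory keeps the example self-contained.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Create a few hypothetical data files to search for.
    for name in ("run1.csv", "run2.csv", "notes.txt"):
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("x\n")

    # glob expands a shell-style wildcard into a list of matching paths.
    # Its order isn't guaranteed, so sort for reproducibility.
    csv_paths = sorted(glob.glob(os.path.join(data_dir, "*.csv")))

    # List comprehensions transform (and optionally filter) each item.
    csv_names = [os.path.basename(p) for p in csv_paths]
    run_numbers = [int(n[3:-4]) for n in csv_names if n.startswith("run")]

print(csv_names)    # ['run1.csv', 'run2.csv']
print(run_numbers)  # [1, 2]
```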
Related groups/events
- Working groups
- Python for data analysis on Fridays 12-2pm, in D-Lab (Bob Bell)
- Text analysis on Fridays 2-4pm in D-Lab
- There will probably be more Python Fundamentals courses
Advanced topics for lightning talks
What do people want to hear about?
- Test-driven development (resources, getting into the habit of test-driven development)
- Using virtual environments
- Parallel computing (theano, and just parallelization in general)
- Pandas workflow
- IPython: new and upcoming features, tips and tricks
- How to work on the bleeding edge of an environment
- Maintaining packages
- pytables
- Writing documentation
- Cython
- Python 3
- Development tools: editors, version control, pylint, etc.
- scikits (especially scikit-learn)
- statsmodels
- RPy
- Javascript, D3
- Pyglet
- Starcluster (for managing amazon ec2 clusters)
- networkx
- Package management (pip, easy_install, anaconda, enthought, wheels, etc…)
- Panda3D
Publicity
How do we get the word out about this meeting to people? What is the best way to have a collaborative sharing environment?
- py4science mailing list (please forward to other lists, too!)
- google calendar that sits on the website, can be imported (this has now been created and can be found here)
- the py4science list name is not on the website; there should be a link that takes you to the page to add yourself. We should also include this in the emails.
- py4science twitter feed?
- we should put the exact room number on the invitation emails/website
Misc
- Open to more people besides Berkeley, e.g. industry
- If we want pizza, we can set up a collection and ask different people to bring food each meeting. D-Lab will reimburse for small amounts of food.
- Information about the old py4science can be found here
Future meetings
- Meetings are every other week, may change to every week if there's enough content
- Next meeting is in two weeks 10/02/2013
- Cindee will do the newbie nugget
- Lightning talks on advanced topics (~20 minutes each)
- Python 3 (Thomas)
- New stuff in IPython (Min)
- The following meeting: editor extravaganza!
16 September 2013
Resurrecting py4science at UC Berkeley
We will be having an organizational meeting this Wednesday, September 18th, 2013 at the D-lab on the 3rd floor of Barrows Hall.
The main purpose of the first meeting is to set up the structure for the rest of the year. The goal of py4science is to bring together people using Python for science to allow for sharing of tips, tricks, and resources. This is not meant to be a training series; instead, we hope to discuss a wide range of ever-changing topics:
For example:
- source control
- scipy or numpy advanced nugget
- virtual env
- integrating testing
- package specific nuggets (e.g., Pandas, NetworkX, IPython)
Example Meeting Agenda
- newbie nuggets: ~20 minutes on a topic of interest to new coders
- lightning talks: (1-4) advanced topics
- working groups (split or not-split into groups to discuss lightning topics in detail)
- random access: bring your code issues and questions, this is the time we help each other
- core dump: meet for beers to continue conversations
This first week we need input from you:
- topics to cover
- what do you think of the meeting layout
- what do you want to see
- how to disseminate information (mailing list, blog, wiki, newsletter)
- engaging the community
- industry night (May?)