Assignment 6¶

In this assignment, you'll analyze a collection of data sets from the San Francisco Open Data Portal and Zillow. The data sets have been stored in the SQLite database sf_data.sqlite, which you can download here. The database contains the following tables:

Table	Description
`crime`	Crime reports dating back to 2010.
`mobile_food_locations`	List of all locations where mobile food vendors sell.
`mobile_food_permits`	List of all mobile food vendor permits. More details here.
`mobile_food_schedule`	Schedules for mobile food vendors.
`noise`	Noise complaints dating back to August 2015.
`parking`	List of all parking lots.
`parks`	List of all parks.
`schools`	List of all schools.
`zillow`	Zillow rent and housing statistics dating back to 1996. More details here.

The mobile_food_ tables are explicitly connected through the locationid and permit columns. The other tables are not connected, but you may be able to connect them using dates, latitude/longitude, or postal codes.

Shapefiles for US postal codes are available here. These may be useful for converting latitude/longitude to postal codes.

Shapefiles for San Francisco Neighborhoods are available here.

import sqlite3 as sql
import sqlalchemy as sqla
import pandas as pd
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['figure.figsize'] = (12, 12)
db = sqla.create_engine('sqlite:///sf_data.sqlite')

cursor = db.execute("SELECT * FROM sqlite_master")
rows = cursor.fetchall()

Exercise 1.1. Which mobile food vendor(s) sells at the most locations?

pd.read_sql("SELECT * FROM sqlite_master", db)

pd.read_sql("SELECT * FROM mobile_food_schedule LIMIT 1 ",db)
#nothing for 0040, but 0070 is off the grid. requesting permit still

pd.read_sql("SELECT * FROM mobile_food_locations LIMIT 1",db)

approved=pd.read_sql("SELECT * FROM mobile_food_permits WHERE Status = 'APPROVED' LIMIT 2",db)
approved

db.execute("CREATE TABLE ls AS SELECT * FROM mobile_food_schedule INNER JOIN mobile_food_locations ON mobile_food_locations.locationid = mobile_food_schedule.locationid ")

#db.execute("CREATE TABLE NewTable AS SELECT * FROM Parts WHERE Weight > 14")
#INNER JOIN SupplierParts ON Parts.PartID = SupplierParts.PartID", db

pd.read_sql("SELECT * FROM ls LIMIT 1",db)

db.execute("CREATE TABLE permitSchedule AS SELECT * FROM mobile_food_permits INNER JOIN ls ON ls.permit = mobile_food_permits.permit")

PS=pd.read_sql("SELECT LocationDescription,Applicant FROM permitSchedule WHERE LocationDescription IS NOT NULL ",db)
PS=PS.drop_duplicates(keep='last')
PS['Applicant'].value_counts()

May Catering                                        52
Anas Goodies Catering                               34
Natan's Catering                                    25
Liang Bai Ping                                      24
Singh Brothers Ice Cream                            21
Sun Rise Catering                                   18
Mang Hang Catering                                  18
John's Catering #5                                  18
Park's Catering                                     18
Two G's Catering                                    16
D & T Catering                                      16
Mini Mobile Food Catering                           15
Steve's Mobile Deli                                 15
Quan Catering                                       14
M M Catering                                        13
Mike's Catering                                     11
F & C Catering                                      11
Bach Catering                                       10
Munch A Bunch                                       10
Roadside Rotisserie Corporation / Country Grill      7
BH & MT LLC                                          7
Linda's Catering                                     6
Off the Grid Services, LLC                           6
Eva's Catering                                       6
Cheese Gone Wild                                     5
Casey's Pizza, LLC                                   5
Covered Dish Productions, LLC. dba. The Rib Whip     5
San Francisco Street Foods, Inc.                     5
Golden Catering                                      5
Lobsta Truck                                         5
                                                    ..
Bob Johnson                                          1
Eli's Hot Dogs                                       1
Breaking Bread Inc.                                  1
La Jefa                                              1
Mr. Nice, LLC                                        1
Your Community Foods                                 1
Peruchi Food Truck,LLC                               1
El Gallo Jiro                                        1
Raspados El Yocateco                                 1
The Creme Brulee Cart                                1
Stanley Roth                                         1
Curbside Coffee                                      1
Wexler, LLC                                          1
Geary Cocina                                         1
Faith Sandwich                                       1
SF Cart Project                                      1
Got Snacks                                           1
Golden Gate Halal Food                               1
Rita's Catering                                      1
Munch India                                          1
Leo's Hot Dogs                                       1
Mob Dog                                              1
Bonito Poke                                          1
Kabob Trolley, LLC                                   1
Sausage Slinger                                      1
Curry Up Now                                         1
Kiss Point                                           1
Missing Link SF                                      1
Mario's Colombian and Mexican Food                   1
Alfaro's Truck                                       1
Name: Applicant, dtype: int64

Which mobile food vendor(s) sells at the most locations?

May Catering is the food vendor that sells at the most locations of 52. The second largest food vendor would be Anas Goodies with 34 locations. In third is Natan's Catering with 25 different locations.

Exercise 1.2. Ask and use the database to analyze 5 questions about San Francisco. For each question, write at least 150 words and support your answer with plots. Make a map for at least 2 of the 5 questions.

You should try to come up with some questions on your own, but these are examples of reasonable questions:

Which parts of the city are the most and least expensive?
Which parts of the city are the most dangerous (and at what times)?
Are noise complaints and mobile food vendors related?
What are the best times and places to find food trucks?
Is there a relationship between housing prices and any of the other tables?

Please make sure to clearly state each of your questions in your submission.

pd.read_sql("SELECT * FROM zillow LIMIT 2",db)

Which parts of the city are the most and least expensive?

pd.read_sql("SELECT RegionName, AVG(MedianSoldPricePerSqft_AllHomes) AS Avg_Sqft_Home FROM zillow GROUP BY RegionName LIMIT 5", db)

avg_rent=pd.read_sql("SELECT RegionName, AVG(MedianSoldPricePerSqft_AllHomes) AS Avg_Sqft_Home FROM zillow GROUP BY RegionName", db)
avg_rent.max()

RegionName       94158.000000
Avg_Sqft_Home     1336.128581
dtype: float64

avg_rent.min()

RegionName       94080.000000
Avg_Sqft_Home      297.008046
dtype: float64

The highest average rent was 1,336 sq ft and the zip code is 94158. And the lowest average rent per square ft was 297 at the zip code of 94080. The barplot below is a distribution of all 26 zip codes and their corresponding average median rent from Zillow.

x = list(avg_rent['RegionName'])
y=list(avg_rent['Avg_Sqft_Home'])
plt.bar(x, y)
plt.ylabel("Avg Sq Ft")
plt.xlabel("Zip Code (941__)")
plt.title("Average Rent by Sq Ft by Zip Code")
plt.show()

What parts of the city are the most dangerous? (And what times?)

pd.read_sql("SELECT * FROM crime LIMIT 2 ",db)

crime_count=pd.read_sql("SELECT Category, COUNT(*) AS Count FROM crime GROUP BY Category", db)

crime_count=crime_count.sort_values(by=["Count"], ascending=[False])
crime_count.head()

crime1=crime_count.drop([20,21])
crime1['Count']

counts=list(crime1.head()['Count'])
counts
labels=list(crime1.head()['Category'])
plt.rcParams["figure.figsize"] = (6,6)
plt.pie(counts,labels=labels,autopct='%1.1F%%',startangle=140)
plt.show()

The pie chart above displays the 5 most common criminal charges in the San Francisco area. The two categories of reports that were removed were 'Other Offenses' and 'Non-Criminal' because they did not provide enough specific information regarding the offense. We see that some form Theft is the most frequent crime among the top 5, at nearly 60% of reports.

What regions have the most amount of noise complaints?

noise_complaint=pd.read_sql("SELECT * FROM noise",db)

from mpl_toolkits.basemap import Basemap

fig,ax=plt.subplots(figsize=(10,20))
noisemap = Basemap(llcrnrlon = -122.553864, llcrnrlat = 37.675669, urcrnrlon=-122.334137,urcrnrlat = 37.816836,resolution = 'h', projection = 'merc')
noisemap.drawmapboundary(fill_color = 'white')
noisemap.drawcoastlines() #maybe dun need
noisemap.readshapefile("geo_export_9c1e2bde-8fa4-45cf-aa3a-7959911db8f7", 'districts')

for lon,lat in zip(noise_complaint.Lon,noise_complaint.Lat):
    x,y=noisemap(lon,lat)
    noisemap.plot(x,y,'ro',markersize=2)
plt.title('Noise Complaint by District')
plt.show()

The red dots indicate all of the locations where there was a noise complaint reported. It appears that the highest density of these reports is in the downtown area of San Francisco. This is expected because this region of San Francisco has the highest density of population and businesses. There are many companies, restaurants, and homes in the financial district so it is realistic that this area would be so loud.

Parks

park=pd.read_sql('SELECT * FROM parks ',db)

fig,ax=plt.subplots(figsize=(10,20))
parkmap = Basemap(llcrnrlon = -122.553864, llcrnrlat = 37.675669, urcrnrlon=-122.334137,urcrnrlat = 37.816836,resolution = 'h', projection = 'merc')
parkmap.drawmapboundary(fill_color = 'white')
parkmap.drawcoastlines() 
parkmap.readshapefile("geo_export_9c1e2bde-8fa4-45cf-aa3a-7959911db8f7", 'districts')

for lon,lat in zip(park.Lon,park.Lat):
    x,y=parkmap(lon,lat)
    parkmap.plot(x,y,'go',markersize=2)
plt.title('Park by District')
plt.show()

This plot shows the location of parks in San Francisco, split by the districts. There does not appear to be any sort of clustering of parks, which makes sense. It is unnecessary to have so many parks all in one close proximity.

How many schools are there?

pd.read_sql('SELECT * FROM schools LIMIT 3',db)

school_type=pd.read_sql("SELECT GeneralType, COUNT(*) AS Count FROM schools GROUP BY GeneralType", db)

count_school=list(school_type['Count'])
count_school
label_school=list(school_type['GeneralType'])
plt.rcParams["figure.figsize"] = (6,6)
plt.pie(count_school,labels=label_school,autopct='%1.1F%%')
plt.title('Type of School')
plt.show()

The last graphic displayed is another pie chart. Because there were only four types of schools, they were all plotted together. It appears that CDC or development center are the most frequent in San Francisco. Another interesting statistic was the high frequency of private schools at 28.5%. For people that live in San Francisco, they know that the public school system is a lottery so many families take education into their own hands by sending their children to private schools.

	permit	Status	Applicant	FacilityType	FoodItems	PriorPermit	Approved	Expiration
0	16MFF-0027	APPROVED	F & C Catering	Truck	Cold Truck: Hot/Cold Sandwiches: Water: Soda: ...	1	2016-03-09 12:00:00	2017-03-15 12:00:00
1	16MFF-0069	APPROVED	Munch A Bunch	Truck	Cold Truck: packaged sandwiches: pitas: breakf...	1	2016-03-15 12:00:00	2017-03-15 12:00:00

	RegionName	Date	ZriPerSqft_AllHomes	MedianSoldPricePerSqft_AllHomes	PriceToRentRatio_AllHomes	Turnover_AllHomes
0	94109	2010-11-01 00:00:00	3.156	675.1913	19.14	6.0771
1	94110	2010-11-01 00:00:00	2.566	599.6785	18.10	5.4490

	RegionName	Avg_Sqft_Home
0	94080	375.629802
1	94102	626.417276
2	94103	604.009929
3	94104	1336.128581
4	94105	798.257824

	IncidntNum	Category	Descript	DayOfWeek	Datetime	PdDistrict	Resolution	Address	Lon	Lat
0	150060275	NON-CRIMINAL	LOST PROPERTY	Monday	2015-01-19 14:00:00	MISSION	NONE	18TH ST / VALENCIA ST	-122.421582	37.761701
1	150098210	ROBBERY	ROBBERY, BODILY FORCE	Sunday	2015-02-01 15:45:00	TENDERLOIN	NONE	300 Block of LEAVENWORTH ST	-122.414406	37.784191

	Category	Count
16	LARCENY/THEFT	243144
21	OTHER OFFENSES	141431
20	NON-CRIMINAL	125702
1	ASSAULT	90029
35	VANDALISM	54467

	type	name	tbl_name	rootpage	sql
0	table	crime	crime	2	CREATE TABLE "crime" (\n"IncidntNum" INTEGER,\...
1	table	noise	noise	35775	CREATE TABLE "noise" (\n"CaseID" INTEGER,\n "...
2	table	parking	parking	35921	CREATE TABLE "parking" (\n"Owner" TEXT,\n "Ad...
3	table	schools	schools	35944	CREATE TABLE "schools" (\n"Name" TEXT,\n "Ent...
4	table	parks	parks	35961	CREATE TABLE "parks" (\n"Name" TEXT,\n "Type"...
5	table	zillow	zillow	35967	CREATE TABLE "zillow" (\n"RegionName" INTEGER,...
6	table	mobile_food_permits	mobile_food_permits	36050	CREATE TABLE "mobile_food_permits" (\n"permit"...
7	table	mobile_food_locations	mobile_food_locations	36060	CREATE TABLE "mobile_food_locations" (\n"locat...
8	table	mobile_food_schedule	mobile_food_schedule	36079	CREATE TABLE "mobile_food_schedule" (\n"locati...
9	table	location	location	36105	CREATE TABLE location(\n locationid INT,\n p...
10	table	ls	ls	36522	CREATE TABLE ls(\n locationid INT,\n permit ...
11	table	permitSchedule	permitSchedule	36210	CREATE TABLE permitSchedule(\n permit TEXT,\n...

	Name	Entity	LowerGrade	UpperGrade	GradeRange	Category	LowerAge	UpperAge	GeneralType	Address	Lat	Lon
0	Alamo Elementary School	SFUSD	0	5	K-5	USD Grades K-5	5	10	PS	250 23RD AVE, San Francisco, CA 94121	37.783005	-122.482300
1	Alvarado Elementary School	SFUSD	0	5	K-5	USD Grades K-5	5	10	PS	625 DOUGLASS ST, San Francisco, CA 94114	37.753681	-122.438194
2	Aptos Middle School	SFUSD	6	8	6-8	USD Grades 6-8	11	13	PS	105 APTOS AVE, San Francisco, CA 94127	37.729672	-122.465782