Sunday, June 23, 2013

Looking at the Data So Far

Good News and Bad News

The crawls have been running for about a week now, and it's time to look at some of the data we've been gathering.

The Good News

The crawls seem to be running smoothly. The scripts are working, and all the output is getting to the right places. Some of the qsub output files showed segfault errors, but this is consistent with what has happened in the past, and they occur seemingly at random.

Additionally, judging from the sites I've looked at, having one file per article seems reasonable for most sites. However...

The Not So Good News

After poking through a bunch of crawled papers, it seems that the terminal branch of most directories has only an index.html file in it. Thinking that wget might have run out of time, I re-ran the crawl on one site without a time limit, and indeed many more pages showed up. I will increase the crawl time a little, but we may have to switch to crawling every other day.
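The "only an index.html" check can be automated rather than done by hand. Below is a minimal sketch, assuming the crawl output is an ordinary directory tree on disk; the function name and the definition of a "terminal branch" (a directory with no subdirectories) are my own assumptions, not part of the existing scripts.

```python
import os


def leaf_directory_report(root):
    """Return (index_only, total_leaves) for the crawl output under root.

    A leaf is a directory with no subdirectories. It counts as index-only
    when index.html is the single file it holds.
    """
    index_only = 0
    total_leaves = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if dirnames:
            continue  # has subdirectories, so not a terminal branch
        total_leaves += 1
        if filenames == ["index.html"]:
            index_only += 1
    return index_only, total_leaves
```

Running this on the same site before and after changing the time limit would give a rough number for how much content the longer crawl actually adds.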

One problem is that, for most newspapers, the actual articles are located at the furthest ends of the directory tree. That means that nearly all the other pages, superfluous to us, must be downloaded before we get any of the content we're after.

Another problem is that I'm still not 100% certain how all the wget flags work. When we started, I copied the command from a source that Ann gave me. I think a better understanding of what the --level flag does will help. Also, using -R to filter out some pages may speed things up.
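To make the flags easier to experiment with, here is a rough sketch of how a crawl command could be assembled and run under a time limit. This is not the command I copied from Ann's source. It only uses wget options I'm sure exist: --recursive, --level (the recursion depth, short form -l), --no-parent, --directory-prefix and --reject (short form -R). The function names, the default depth of 5 and the example reject list are assumptions for illustration.

```python
import subprocess


def build_wget_command(url, out_dir, depth=5, reject=None):
    """Build a recursive wget command as an argument list.

    depth  -> --level: how many links deep wget follows from the start URL.
    reject -> --reject: comma-separated file suffixes or patterns to skip.
    """
    cmd = [
        "wget",
        "--recursive",
        "--level={}".format(depth),
        "--no-parent",  # never climb above the starting directory
        "--directory-prefix={}".format(out_dir),
    ]
    if reject:
        cmd.append("--reject=" + ",".join(reject))
    cmd.append(url)
    return cmd


def run_crawl(url, out_dir, time_limit_s, **kwargs):
    """Run one crawl; return False if it was cut off by the time limit."""
    cmd = build_wget_command(url, out_dir, **kwargs)
    try:
        subprocess.run(cmd, timeout=time_limit_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False
```

For example, build_wget_command("http://example.com/", "out", depth=3, reject=["jpg", "png"]) yields a command that goes three links deep and skips image files. Since the articles sit at the ends of the tree, a depth that is too small would cut them off no matter how long the crawl runs.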

Moving Forward

I'm going to be out of town next week, and then the following week I'll be back in Baltimore. When I get back, I aim to focus my attention on the following goals:

  1. Play with wget flags so that I really understand what they do
  2. Refine crawl times so that wget can run long enough to get the pages that we want
  3. Start adapting the cleaning script I've already written to clean up some of the articles we've crawled

Wednesday, June 19, 2013

Background pt. 2

Gun Violence Research from NAP Report

My summary

This information comes from a report issued by the National Academies Press entitled "Priorities for Research to Reduce the Threat of Firearm-Related Violence", published in 2013. Contributors include the Institute of Medicine (IOM) and the National Research Council (NRC).

After reading the article, here are my initial thoughts about what parameters we should look at:

  1. Characteristics of violence
    1. Homicide, suicide, fatal, non-fatal, accidental
    2. Role of controlled substances
    3. Type of firearm / ammunition used
  2. Location
    1. Rural vs. Urban
    2. Type of location
      1. In a home, park, school, etc.
    3. General geographic information
  3. Victim / Perpetrator information
    1. Age, sex, race
    2. Relationship of victim to perpetrator
    3. History of mental illness and other risk factors
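One way to pin these parameters down is to write them out as a record type that each article would eventually be mapped onto. The sketch below is only an illustration: every class and field name is hypothetical, and every field is optional because most articles will not report most of these details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """Demographics for a victim or perpetrator (item 3.1 in the list)."""
    age: Optional[int] = None
    sex: Optional[str] = None
    race: Optional[str] = None


@dataclass
class IncidentRecord:
    # 1. Characteristics of violence
    incident_type: Optional[str] = None      # "homicide", "suicide", "accidental"
    fatal: Optional[bool] = None
    substances_involved: Optional[bool] = None
    firearm_type: Optional[str] = None
    # 2. Location
    urban_or_rural: Optional[str] = None
    location_type: Optional[str] = None      # "home", "park", "school", ...
    geography: Optional[str] = None          # e.g. city and state
    # 3. Victim / perpetrator information
    victim: Person = field(default_factory=Person)
    perpetrator: Person = field(default_factory=Person)
    relationship: Optional[str] = None       # victim's relationship to perpetrator
    risk_factors: List[str] = field(default_factory=list)
```

A record such as IncidentRecord(incident_type="homicide", fatal=True) leaves everything else unset, which makes it easy to count later how often each parameter can actually be filled in from news text.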

My Notes

***(I include these as a sort of summary of parts of the report I thought would be relevant to our study. Page numbers refer to the page of the PDF document I viewed)***

"Applying Public Health Strategies to Reducing Firearm Violence" (p.29)
            This section describes how strategies can be implemented to prevent violence similar to those taken with tobacco/alcohol and motor vehicles.
            "Such strategies are designed to interrupt the connection between three essential elements: the “agent” (the source of injury [weapon or perpetrator]), the “host” (the injured person), and the “environment” (the conditions under which the injury occurred)" (p.29)
                        1. Agent - The source of injury
                        2. Host - The injured person
                        3. Environment - conditions under which injury occurred
            There are 5 areas where more information about gun violence is needed (p.33):
                        1. characteristics of firearm violence,
                        2. risk and protective factors,
                        3. interventions and strategies,
                        4. gun technology, and
                        5. influence of video games and other media.
            [For the purposes of our investigation, I suggest focusing on (1) and (2), which are discussed below.]

"Impact of Existing Federal Restrictions on Firearm Violence Research" (p.34)
            Information is lacking on:
                        1. Gun Sales, ownership, possession
                        2. Names of gun purchasers
"Policy makers need a wide array of information, including community-level data and data concerning the circumstances of firearm deaths, types of weapons used, victim–offender relationships, role of substance use, and geographic location of injury — none of which is consistently available" (p.35)
                        3. Circumstances of death
                        4. Types of weapons used
                        5. Victim-offender relationships
                        6. Role of substance use
                        7. Geographic information
"Basic information about gun possession, acquisition, and storage is lacking" (p. 36), [however I don't think this is the kind of information we will be able to gather, so I won't write much about it]
"Data about the sources of guns used in crimes are important because the means of acquisition may reveal opportunities for prevention of firearm related violence" (p.36)
            Currently some information is collected by the ATF
                        Only after a gun is used in a crime, though, and does not track changes in ownership - not representative of crimes
Possible source of information: Weapon-Related Injury Surveillance System (WRISS) which some municipalities use

            Basically, not much is known
            To Look Into:
                        1. Types and number of firearms that exist in the US
                                    "In general, there are three characteristics that define individual guns: gun type, firing action, and ammunition" (p.39)
            Types of Firearm Violence:
                        1. Broad level: fatal or non-fatal
                        2. Fatal: homicides, suicides, unintentional
                                    a. Mass-shootings sometimes another category
                        3. Non-fatal: unintentional vs. intentional, threats, defensive use
                                    Though there are cross classifying characteristics, such as age, sex, etc., these categories are useful.

What is known / not known about the following occurrences:
                        Fairly well known:
                                    Urban vs. Rural
                                    Age, Sex, Race
                        Not well known:
                                    Premeditated or Impulsive?
                                    Use of firearm vs. other method
                        Fairly well known:
                                    Victim-Offender relationship (though still important)
                                                Race, Sex, age, etc.
                                    Domestic violence related shootings
                                    Type of gun used
                        In general, more is known about homicides
            Unintentional Fatalities
                        Fairly well known:
                                    Self inflicted?
                                    Self Defense?
                                    Rural vs. urban
            Mass Shootings
                        Not well known:
                                    Characteristics of suicides associated with mass murders
                        Fairly well known:
                                    Intentional vs. unintentional
                                    Self-inflicted vs. other-inflicted
                                    Use in assault (as a threat)

SUMMARY (p45):
Characterize differences in nonfatal and fatal gun use across the United States. Examples of topics that could be examined:
            1. What are the characteristics of non-self-inflicted fatal and nonfatal gun injury?
                        o What attributes of guns, ammunition, gun users, and other circumstances affect whether a gunshot injury will be fatal or nonfatal?
                        o What characteristics differentiate mass shootings that were prevented from those that were carried out?
                        o What role do firearms play in illicit drug markets?
            2. What are the characteristics of self-inflicted fatal and nonfatal gun injury?
                        o What factors (e.g., storage practices, time of acquisition) affect the decision to use a firearm to inflict self-harm?
                        o To what degree can or would prospective suicidal users of firearms substitute other methods of suicide?
            3. What factors drive trends in firearm-related violence within subpopulations?
            4. What factors could bring about a decrease in unintentional firearm-related deaths?

Situational factors associated with firearm violence (p.48)
            1. Presence of drugs / alcohol
            2. Intent: to acquire money, or as an impulse
                        Need to protect personal status/property
                                    "Some social and psychological research suggests that the need to defend social status may increase the likelihood and severity of response to provocation in the presence of an audience" (Griffiths et al., 2011; Papachristos, 2009) (p.48)
            3. Gang involvement
            4. Other situational factors such as excessive heat (Anderson et al., 1995), the presence of community disorder (or “broken windows”)
            5. Specific locations, e.g.: house/apartment, public street, natural area, vehicle, parked car, athletic area, hotels/motels, commercial areas

Study-proposed research questions (p.50)
            Three important research topics were identified by the committee:
                        1) factors associated with youth having access to, possessing, and carrying guns;
                        2) the impact of gun storage techniques on suicide and unintentional injury, and
                        3) “high-risk” geographic/physical locations for firearm violence.
            Youth Gun Violence [probably can't tackle most of these]
                        Examples of topics that could be examined:
                                    o Which individual and/or situational factors influence the illegal acquisition, carrying, and use of guns by juveniles?
                                    o What types of weapons do youths obtain and carry?
                                    o How do youths acquire these weapons, e.g., through legal or illegal means?
                                    o What are key community-level risk and protective factors (such as the role of social norms), and how are these risk and protective factors affected by the social environment and neighborhood/community context?
                                    o What are key differences between urban and rural youth with regard to risk and protective factors for firearm-related violence?
                        o What are the associated probabilities of thwarting a crime versus committing suicide or sustaining an injury while in possession of a firearm?
                        o What factors affect this risk/benefit relationship of gun ownership and storage techniques?
                        o What is the impact of gun storage methods on the incidence of gun violence—unintentional and intentional—involving both youths and adults?
                        o What is the impact of gun storage techniques on rates of suicide and unintentional injury?
                        1. What are the characteristics of high- and low-risk physical locations?
                        2. Are the locations stable or do they change?
                        3. What factors in the physical and social environment characterize neighborhoods or sub-neighborhoods with higher or lower levels of gun violence?
                        4. Which characteristics strengthen the resilience of specific community locations?
                        5. What is the effect of stress and trauma on community violence, especially firearm-related violence?
                        6. What is the effect of concentrated disadvantage on community violence, especially firearm-related violence?

More information is needed on the effectiveness of intervention programs. Is this something we'll be able to consider? (p. 61).
            Possible factors: Childhood education, poverty, substance use

More information is needed about the effectiveness of gun safety technology

Sunday, June 16, 2013

Changing Tack

No more Newspapermapping

The person at got back to us with the database from his website. While I'm a little sad that 5+hrs of my time have been for naught, I'm glad I don't have to spend another 10+ hours pressing ctrl-c, ctrl-v...

I cleaned up the database by first eliminating all the non-English papers, and then adding "state" back in for about 50 entries that were lacking that field. I removed a handful of links whose connections timed out when I tried to visit their pages. As of right now, a crawl is running on the new URLs.
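For the record, the cleanup amounts to three passes, sketched below. This is a reconstruction under assumptions, not the exact procedure I followed (much of it was done by hand): the field names "url", "language" and "state", the state_fixes lookup and both function names are hypothetical. Note that url_responds treats any failure as unreachable, including HTTP error statuses, which is stricter than removing only links that timed out.

```python
import urllib.request


def url_responds(url, timeout_s=10):
    """Return True if the URL can be opened within timeout_s seconds.

    Any failure counts as unreachable here: timeouts, DNS errors,
    HTTP error statuses and malformed URLs.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except (OSError, ValueError):
        return False


def clean_newspaper_entries(entries, state_fixes, is_reachable=url_responds):
    """Drop non-English papers, fill in missing states, drop dead links.

    entries     : list of dicts with hypothetical keys url/language/state
    state_fixes : dict mapping a URL to the state looked up by hand
    """
    cleaned = []
    for entry in entries:
        if entry.get("language") != "English":
            continue
        entry = dict(entry)  # copy so the input list is left untouched
        if not entry.get("state"):
            entry["state"] = state_fixes.get(entry["url"], "")
        if not is_reachable(entry["url"]):
            continue
        cleaned.append(entry)
    return cleaned
```

Passing the reachability check in as a parameter means the filtering logic can be tried out on a few sample rows without touching the network.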

Thursday, June 13, 2013

Newspapermap Update

States Finished

  1. Washington
  2. Oregon
  3. California
  4. Idaho
  5. Utah
  6. Arizona
  7. Nevada
  8. Montana
  9. Wyoming
  10. Colorado
  11. New Mexico
  12. North Dakota
  13. South Dakota
  14. Nebraska
  15. Kansas
  16. Oklahoma
  17. Texas

This last go-round I did 194 in an hour!

Also, I am updating the list on the CLSP nodes, so these pages are being crawled as they are added. We have 698 URLs so far.

Wednesday, June 12, 2013

Newspapermap Update

States Completed

  • Washington
  • Oregon
  • California
  • Idaho
  • Utah
  • Arizona

Total URLs: 324
Total Time spent: 2.5hrs

I find I can only really do this for an hour at a time, or else I start to go kind of nuts.

Saturday, June 8, 2013

Object Oriented Crawls

A quick update

Today I re-wrote the crawling script in an object-oriented fashion, which took me about 7 hours. I told Ann I would do this later, since the top priority is starting the newspaper crawls, but I figured if I could get a working version by the end of today, I'd have killed two birds with one stone. The old versions of the crawling scripts are still in use for the language crawls, but the way they are written would have made incorporating a new list of newspapers an extremely involved process. I am testing my new version on the newspapers I have thus far culled from and so far all is going well. I'm storing the data on a12, as per Carl's suggestion. Note, I have changed my dating convention. Newspaper crawls are labeled:

Time for bed now

Wednesday, June 5, 2013

Newspapermap Update

States completed

  • Oregon
  • Washington
  • Idaho

California is approx. 50% complete. So far, I'm working at about 150 newspapers per hour.

More to come!

Background & Newspapermap - Pt. 1


The fun begins

I started logging information for newspapers in Oregon. At present, I log and classify 25 newspapers in 8 minutes (roughly 188 per hour). The most time-consuming part is copying the link from the balloon that pops up when I click on a newspaper's location. I'm thinking about better ways of doing this, but at present, I should be able to finish most of the west coast in a week or so. The east coast will take longer, as there are more newspapers and they're more densely packed - hence more zooming necessary. I'm still trying to think of an effective way of MTurking this.
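The hourly figure is just the 8-minute sample scaled up: 25 newspapers * 60 / 8 minutes = 187.5, which I rounded to 188. A small sketch of the same arithmetic, useful for estimating how long a region will take (the function names, and the total of 1500 used in the example below, are made up for illustration):

```python
def newspapers_per_hour(count, minutes):
    """Scale a timed sample (count newspapers in `minutes`) to an hourly rate."""
    return count * 60 / minutes


def hours_needed(total, count, minutes):
    """Estimate the hours needed to log `total` newspapers at the sampled rate."""
    return total / newspapers_per_hour(count, minutes)
```

At the sampled rate, hours_needed(1500, 25, 8) comes out to 8.0 hours, although in practice the rate will drop wherever more zooming is needed.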

Statistical Background on Gun Violence

So far

At this stage of the project, I have been focused on ascertaining what information about gun violence was previously collected by the CDC and other government agencies, and how these statistics are gathered.

What the CDC used to collect

I found a report from pre-1997 which focuses on injuries and deaths related to firearms. Additionally, I found a table in a more general report from 2001 which lists causes of death by "mechanism" - "firearms" is one of the categories. This report also illustrates some of the "circumstances" of firearm injuries - e.g. whether they occur at work, or due to "interpersonal violence". The report also includes this handy graphic:

Handy graphic from the CDC's Surveillance for Fatal and Nonfatal Firearm-Related Injuries --- United States, 1993--1998, (Gotsch et al.)

While this report does speak to how firearms are used, it doesn't really say anything about the demographics of the people involved in these types of incidents. Additionally, I have found no information for any year more recent than 1998.

How they get the data

It turns out, the CDC and the Consumer Product Safety Commission team up to gather data from something called NEISS - the "National Electronic Injury Surveillance System", which is a database of information from various hospitals around the country. To give an idea of the sample sizes, the 2001 report above included data from 100 hospitals. NEISS can ostensibly be queried from its website, but when I tried, there was a recurring javascript error.

The Bureau of Justice Statistics also uses data from NEISS to inform its reports. They also run their own program, the Firearm Inquiry Statistics (FIST) Program, which has information from 1994 - 2005, including "[d]ata ... collected directly from state agencies conducting background checks and from local checking agencies and [including] the number of firearm applications made to the agency, firearm applications rejected by the agency, and the reasons for rejection". (Example summary information from the 2005 report: only 1.6% of firearm applications were denied in 2005 - 46% because the requester had a previous felony conviction).

Next Steps

The next steps for me will