OK, just when you think you know XML, along comes an API that uses JSON.
This is always so sad! But luckily some people had compassion and solved our problem:
http://blog.blprnt.com/blog/blprnt/processing-json-the-new-york-times
Following this blog post, we can extract numbers from articles, and there is even a simple way to visualize the data.
Analyzing the JSON String:
When we get the full JSON file, we see parts like this:
{"body" : "Music CITY HALL COFFEE HOUSE The \"Rockin' the Courtroom\" musical series kicks off with Scott E. Moore, singer and guitarist, and Frank Bango, singer. Friday at 7 P.M. Tickets: $5. Next week, Caren Belle and Happy Boy. Hoboken City Hall, 94 Washington Street, Hoboken. (201) 420-2207. CLUB BENE Female impersonators take the stage in \"Boys Will Be" , "date" : "19951231" , "title" : "ON THE TOWNS" , "url" : "http:\/\/www.nytimes.com\/1995\/12\/31\/nyregion\/on-the-towns-050180.html"}
followed by other pieces that also start with "body".
We also see that \" is used for quotes inside the text, and that the main text of the body is separated from the next field (here "date") by a comma.
So we can use some String functions to extract the text from the results like this:
// baseURL and apiKey are assumed to be defined earlier in the sketch
String request = baseURL + "?query=O.J.+Simpson&begin_date=19940101&end_date=19960101&api-key=" + apiKey;
// loadStrings() returns null when the request fails, so check before join()
String[] lines = loadStrings( request );
String result = (lines == null) ? "" : join( lines, "" );
//println( result );
String[] resultList = split( result, "body" );
println( "------------------------------" );
// resultList[0] is whatever comes before the first "body", so start at 1
for (int i = 1; i < resultList.length; i++) {
  // remove the 5 front chars: " : "
  String removeFrontChars = resultList[i].substring( min( 5, resultList[i].length() ) );
  // search for date and stop the string in front of it
  int dateIndex = removeFrontChars.indexOf( "date" );
  String resultWithoutDate = removeFrontChars.substring( 0, max( dateIndex - 2, 0 ) );
  //println( resultList[i] ); // has " : " in front
  //println( removeFrontChars );
  println( resultWithoutDate );
  println( "------------------------------" );
}
Looking at the text, you can see that some results also contain a "byline" field. It can be cut off (if you want to) in the same way as the date. You could also start the splitting at "date" and get the date out.
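To pull out "byline" or "date" with the same kind of String functions, something like the sketch below might work. It is plain Java, so it should also paste into a Processing sketch as an extra tab. FieldExtractor and extractField are names made up for this example, and the code assumes the exact "key" : "value" layout seen in the result above, so treat it as a rough helper and not as a real JSON parser.

```java
// Sketch only: FieldExtractor and extractField are hypothetical names,
// not part of Processing or the NYT API.
class FieldExtractor {

    // Returns the raw text of `key` inside one JSON object string,
    // or null if the key is not there.
    // Assumes the layout "key" : "value" with \" for quotes inside the value.
    static String extractField(String json, String key) {
        String marker = "\"" + key + "\" : \"";
        int start = json.indexOf(marker);
        if (start < 0) {
            return null;
        }
        start += marker.length();
        // Walk forward to the first quote that is not escaped with a backslash.
        int end = start;
        while (end < json.length()) {
            char c = json.charAt(end);
            if (c == '\\') {
                end += 2; // skip the backslash and the escaped character
                continue;
            }
            if (c == '"') {
                break;
            }
            end++;
        }
        return json.substring(start, Math.min(end, json.length()));
    }

    public static void main(String[] args) {
        // Made-up test string in the same layout as the API result.
        String piece = "{\"body\" : \"He said \\\"hello\\\" to the date\" , "
                     + "\"byline\" : \"By TOM FRIEND\" , \"date\" : \"19951231\"}";
        System.out.println(extractField(piece, "body"));
        System.out.println(extractField(piece, "byline"));
        System.out.println(extractField(piece, "date"));
    }
}
```

Because the marker includes the quotes and the colon, the word date inside an article text is not mistaken for the "date" field.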
Here are the first few results:
------------------------------
Music CITY HALL COFFEE HOUSE The \"Rockin' the Courtroom\" musical series kicks off with Scott E. Moore, singer and guitarist, and Frank Bango, singer. Friday at 7 P.M. Tickets: $5. Next week, Caren Belle and Happy Boy. Hoboken City Hall, 94 Washington Street, Hoboken. (201) 420-2207. CLUB BENE Female impersonators take the stage in \"Boys Will Be" ,
------------------------------
They have sportcoats, E-mail and no suntan. Northwestern will not be playing its look-alike in the Rose Bowl; Harvard isn't eligible. The Wildcats are here for the first time in a half-century, thanks to either running back Darnell Autry or their longtime alum, Moses. A year ago this week, one of their leading tacklers was giving Penn State a" , "byline" : "By TOM FRIEND" ,
------------------------------
and so on...
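The results above still contain the JSON escapes (\" for quotes, and \/ in the url field). If you want clean text, a small helper along these lines might do. JsonUnescape and unescape are hypothetical names, and the sketch only handles these two escapes; other escapes such as \n, \\ or \u00e9 are left untouched.

```java
// Sketch only: JsonUnescape and unescape are hypothetical names.
// Handles just the two escapes seen in the results: \" and \/
class JsonUnescape {

    static String unescape(String s) {
        return s.replace("\\\"", "\"")  // \"  becomes  "
                .replace("\\/", "/");   // \/  becomes  /
    }

    public static void main(String[] args) {
        System.out.println(unescape("The \\\"Rockin' the Courtroom\\\" musical series"));
        System.out.println(unescape("http:\\/\\/www.nytimes.com\\/1995\\/12\\/31"));
    }
}
```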
Remark: often when you run the sketch, the request is not returned. loadStrings() then gives back null, and passing that null straight to join() throws a NullPointerException. This happens even if you use try/catch structures. Just try a few times, the API must be very busy!
Roxane Torre has made this pdf with very useful info and examples:
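Instead of restarting the sketch by hand, the "try a few times" advice could be put in a small loop. The sketch below is plain Java with made-up names (RetryLoader, Loader, loadWithRetry). The Loader interface is only a stand-in for a call like loadStrings( request ), which returns null when the request fails. There is no pause between attempts here; for a busy API you would probably want one.

```java
// Sketch only: RetryLoader, Loader and loadWithRetry are hypothetical names.
class RetryLoader {

    // Stand-in for a call such as loadStrings(request),
    // which returns null when the request fails.
    interface Loader {
        String[] load();
    }

    // Calls the loader up to maxAttempts times and returns the joined lines,
    // or null if every attempt returned null.
    static String loadWithRetry(Loader loader, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String[] lines = loader.load();
            if (lines != null) {
                return String.join("", lines);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake loader that fails twice before answering, to mimic a busy API.
        final int[] calls = {0};
        Loader flaky = () -> (++calls[0] < 3) ? null : new String[] {"{\"body\" : ", "\"text\"}"};
        System.out.println(loadWithRetry(flaky, 5));
        System.out.println("attempts: " + calls[0]);
    }
}
```

In a sketch, the same for loop could call loadStrings( request ) directly and stop as soon as the result is not null.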
http://crosslabminor.files.wordpress.com/2011/04/dag32.pdf