Managing a Big Bunch of Information & Knowledge

About: sample article

     First off, there is a distinction between information & data: data is more homogeneous & more readily exploited by spreadsheets & databases. Information, in the sense I am using it, is more like knowledge about some topic in the news. The knowledge I am interested in should contain the who, what, when, where, why & how of events in the news. The most valuable information is complete: every one of those question words has its true answer in the source article, along with the source of that answer. Ideally, each source would in turn have exposed the answers to those same question words, & so on down the line for the publishers, reporters & protagonists of an event under news scrutiny.
     Now, the sample article mentioned has approximately 2825 words in it. Treated as data, the article's word usage counts occupy a spreadsheet with 1156 lines & 3 columns: the word, its frequency (times the word was used) & its % of total usage. I don't have a sentence counter, but the number of periods should be the approximate count, which is 151. If we used mentography, that would mean at least 151 SVO (subject-verb-object) constructs. That amounts to 3X as many nodes, or about 453 nodes, before any are connected to each other. The subjects Obama, Acorn, Chicago & Stern ended up amongst the highest word counts, as follows:
word       count     %

Stern         14  0.41
Chicago       15  0.44
radical       15  0.44
Acorn’s       30  0.88
Obama’s       36  1.06
Obama         45  1.32
Acorn         58  1.70
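
A minimal sketch of how the spreadsheet columns above (word, frequency, % of total usage) and the period-based node estimate could be produced. The function names `word_profile` and `estimate_nodes` are hypothetical, and lowercasing the text before counting is an assumption; the original spreadsheet may have counted differently. Possessives such as "Obama’s" are kept as separate words, as in the table.

```python
import re
from collections import Counter


def word_profile(text):
    """Return (rows, total): rows are (word, count, percent), lowest count first."""
    # Assumption: lowercase everything; keep possessives ("obama’s") as their own word.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    total = len(words)
    if total == 0:
        return [], 0
    counts = Counter(words)
    rows = [(w, c, round(100.0 * c / total, 2)) for w, c in counts.items()]
    rows.sort(key=lambda r: (r[1], r[0]))  # ascending by count, then alphabetically
    return rows, total


def estimate_nodes(text):
    """Approximate sentences by counting periods; assume one SVO triple (3 nodes) each."""
    sentences = text.count(".")
    return sentences, 3 * sentences


if __name__ == "__main__":
    sample = "Acorn met Obama. Obama’s plan helped Acorn. Acorn grew."
    rows, total = word_profile(sample)
    for word, count, pct in rows:
        print(f"{word:<10}{count:>6}{pct:>8.2f}")
    print("total words:", total)
    print("sentences, nodes:", estimate_nodes(sample))
```

Counting periods overcounts wherever abbreviations or decimals appear, so the node figure is only a rough lower-effort estimate, in the same spirit as the 151 x 3 = 453 arithmetic above.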
     Beyond all that, we could do a little more & tease out the sentences which contain those words & try to derive their semantic content. The problem I find is that this is a lot to deal with & to display in concentrated form as discovered information that serves us better than the original article.
     Hence I have conceived the idea of an Event Log or Event Database. If I were to model my own research about a topic, just reading articles or books, it would be the building of a database in my mind: one with extraordinary powers to input information, extraordinary powers of retrieval based on associations & the ability to hold conflicting accounts until they are resolved by future investigation.
     There are about 220 unique words in this item so far.
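
One possible shape for the Event Log idea, as a sketch only. The class names `Account` and `Event`, their methods, and the placeholder values are all hypothetical; the article does not specify a design. Each answer to a question word is stored together with its source, and conflicting answers are kept side by side rather than overwritten.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# The six question words named in the article.
QUESTIONS = ("who", "what", "when", "where", "why", "how")


@dataclass(frozen=True)
class Account:
    """One answer to one question word, together with where it came from."""
    question: str
    answer: str
    source: str


@dataclass
class Event:
    """A logged event holding every account of it, including conflicting ones."""
    label: str
    accounts: list = field(default_factory=list)

    def add(self, question, answer, source):
        if question not in QUESTIONS:
            raise ValueError(f"unknown question word: {question}")
        self.accounts.append(Account(question, answer, source))

    def missing(self):
        """Question words that no source has answered yet."""
        answered = {a.question for a in self.accounts}
        return [q for q in QUESTIONS if q not in answered]

    def conflicts(self):
        """Question words with more than one distinct answer on record."""
        by_question = defaultdict(set)
        for a in self.accounts:
            by_question[a.question].add(a.answer)
        return {q: sorted(v) for q, v in by_question.items() if len(v) > 1}


if __name__ == "__main__":
    event = Event("example rally")  # placeholder data, not taken from any article
    event.add("who", "Group A", "article 1")
    event.add("who", "Group B", "article 2")
    event.add("where", "Chicago", "article 1")
    print("unanswered:", event.missing())
    print("conflicting:", event.conflicts())
```

Retrieval by association is not covered here; this only shows completeness checking against the six questions and holding conflicting accounts until something resolves them.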

Tags

  1. item 10488
  2. item 9536
  3. item 10577

Comments


Seth says
There is a field of linguistic study called latent semantic analysis which relies on counting word occurrences in articles.  Each article has its own profile, and similar articles have demonstrably similar profiles.  I think these techniques are used primarily for library science retrievals of the form: find me other articles like this one.  But I don't know of any tools (outside of the simple word counts you used for the sample) that are available on the net to be deployed in a practical sense today.  Deeper semantic (retrievals?) that would get you to the where-why-what nodes are even scarcer.  Perhaps Cycorp has something in the labs available for the big bucks.

What I think we really need in UnhackTheBrain is something that would distinguish between articles which intend to report the who, what, and where of the news from pieces like this one and that one which were written with the intent to spin facts in some particular partisan manner.  Perhaps you could use latent semantic analysis but focus on propaganda words like "radical", "neocon", "liberal" etc ... the more of those kinds of words, the more likely it is a partisan spin article.  I had thought that the spin spotter tool bar would help us there, but I couldn't get it to work.  Did you ever get it to work?
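
The loaded-word idea above could be sketched as a simple rate: the share of an article's words that come from a list of partisan terms. This is plain word counting, not latent semantic analysis. The name `spin_score` is hypothetical, and the word list holds only the three examples from this comment; a usable list would need to be much longer.

```python
import re

# Only the three examples given in the comment; assumed incomplete.
LOADED_WORDS = frozenset({"radical", "neocon", "liberal"})


def spin_score(text, loaded=LOADED_WORDS):
    """Return the percentage of words in `text` that appear in `loaded`."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in loaded)
    return 100.0 * hits / len(words)


if __name__ == "__main__":
    print(spin_score("The radical liberal plan"))
    print(spin_score("The committee met on Tuesday"))
```

A higher score only says the vocabulary is loaded; a straight news report that quotes partisans would also score high, so any threshold would have to be tuned against articles already judged by hand.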

Mark de LA says
9536 (currently private) describes an event database with a mindmap.

Mark de LA says
Just for frustration's sake, & to spur more interest in the overall project, consider the new Senate bailout legislation proposed here in PDF format.  It is 451 pages in that format: 70,077 words, of which 6,574 are unique.
And a lot of stuff contrary to:
 
   (It is rumored to have earmarks & giveaways to sweeten it up so that it gets a passing vote in the Senate.)


Mark de LA says
In reply to seth (2008-10-02 07:43:00, 10577):
Not yet on the spin spotter; I couldn't find an update for the Firefox plugin either.
I was surprised how much I could get just from counting.  My intent, however, went deeper than that. I was looking for a measure of the complexity of the article's contents. I'm looking for a better way of displaying the real knowledge in an article, measured as answers to the question words. Once I saw that the complexity was beyond what a visual graph can show without zoomability, I redefined the interface in the last paragraphs of the item. I don't want to pursue a statistical approach unless I need one to understand the natural language of an article.

Mark de LA says
This graphic for building an ontology suggests to me some ideas of what an interface for building a mentograph might look like. The missing piece is where the article is & how it gets to a place where such an interface can be applied. In the pie, the 6 questions & maybe a couple of action slices might be used.


Mark de LA says
An interesting mentograph of sorts:

courtesy of this article.

See Also

  1. Thought Event Log &/OR Database with 7 viewings related by tag "item 10577".
  2. Thought Of Ego Trips & the Last Refuge - (Adolfz Result) with 7 viewings related by tag "item 10488".
  3. Thought Electioneering vs Truth & Substance with 0 viewings related by tag "item 10488".
  4. Thought Research on Thought Processes with 0 viewings related by tag "item 10488".