This blog is about the peculiar way in which software sometimes gets developed. I hope that many readers will recognize the relevance of data science in the examples taken from my own projects. I propose that development is the product of creativity more than accreditation. Creativity is something complicated that interacts with a person over his or her life circumstances. Many people know how to write . . . sentences and paragraphs. However, the ability to write well does not necessarily make an individual a good novelist or creative writer. Taking into account the relevance of creativity as a force in the process of development, it is necessary to recognize the importance of a person's life whether or not its events are specifically related to any particular undertaking. I will be detailing the meandering way that I personally tend to develop software. I don't follow a straightforward path characterized by planning, schedules, and deliverables. Development for me tends to be sporadic, meshed in the details of life, and closely tied to my interests. So it is rather non-business-like.
I am not a professional programmer. I am a bit of a peculiarity. I program therapeutically as a form of relaxation. I discovered that programming helps to take me away from stress. Yet I would almost never describe programming as "enjoyable." In fact, I hardly receive any enjoyment from it. It is a powerful diversion that often provides relief from more mundane matters. I later found that it also helps me to maintain my concentration. These days, driven by some desire to handle large amounts of data in an unconventional manner, I periodically make use of my coding activities to help me produce worthwhile applications. For instance, when I was considering an academic career, I didn't want to rely on platform-specific commercial software to keep track of my notes and references. Neither did I wish to rely on my recollection. So I created a search engine to store and access files in exactly the manner that I required: a header file containing citations in various styles along with my observations; quick-click access to documents that satisfy the search parameters; and portability so I can access the data on different machines and operating systems for the rest of my life. It is the perpetuity that I needed most. I wrote programs to access the data in Java and Visual Basic. Under the right conditions, I am both productive and relaxed while programming; and although I don't have any exact formula to explain its development, these are usually the same conditions that give rise to greater levels of creativity.
Algorithms Initially to Study Capitalization
I mentioned in previous blogs that I created a visualization environment called "Storm" to study all sorts of dynamic data. I actually use the term "storm" to describe a family of algorithms originally developed to evaluate trading activity; this was not for trading purposes exactly but more to facilitate a type of accounting. I was searching for different ways to characterize "capitalization" - that is to say, the build-up of perceived and actual value - in order to summarize the impacts of flows. When a financial company holds assets for many people, I find at least two major operations: 1) holding the assets in trust physically or by name; and 2) keeping track of ownership interests. (I apologize for not knowing the industry jargon.) So a securities company might hold 100,000 shares of a particular equity; but then it has to keep track of the clients having interest in that equity, for example 1,500 clients for the 100,000 shares. These distinct duties can be distributed to different companies. In the case of operations maintaining a record of entitlements or interests, much of the work is "pure data." The company has no clients, and it has no assets of its own: it just keeps track of who owns what and maintains supporting transactional records. A problem occurs when the basis is other than cash: for instance, the basis of ownership might be mutual fund units or shares. The cash-flows then become detached from the unit-flows. One day I decided that it might be worthwhile to maintain all sorts of parallel valuations reflecting a variety of interests - one type being abstract perceptions of worth or capitalization.
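To make the separation of unit-flows from cash-flows concrete, here is a minimal sketch of a record-keeper that tracks ownership in units while carrying parallel valuations alongside. It is only an illustration under stated assumptions: the class name `UnitLedger`, its methods, the client labels, and all figures are hypothetical and are not taken from the original back-office program.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UnitLedger:
    """Hypothetical record-keeper: it holds no assets, only who owns what."""

    # Units held per client (the basis of ownership).
    units: dict = field(default_factory=lambda: defaultdict(float))
    # Net cash paid in per client, kept separately from the unit balance.
    cash_basis: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, client, unit_change, price):
        """Store one transaction as both a unit-flow and a cash-flow."""
        self.units[client] += unit_change
        self.cash_basis[client] += unit_change * price

    def total_units(self):
        """Total units the ledger must reconcile against the assets held in trust."""
        return sum(self.units.values())

    def valuations(self, client, market_price):
        """Return parallel views of the same holding."""
        held = self.units[client]
        return {
            "units": held,
            "cash_basis": self.cash_basis[client],
            "market_value": held * market_price,
        }


# Illustrative figures only.
ledger = UnitLedger()
ledger.record("client_a", 100, 10.0)   # buys 100 units at 10.00
ledger.record("client_b", 50, 12.0)    # buys 50 units at 12.00
ledger.record("client_a", -40, 15.0)   # sells 40 units at 15.00
print(ledger.total_units())
print(ledger.valuations("client_a", market_price=15.0))
```

In this sketch the same holding has three readings at once (units, net cash paid in, and value at the current price), which is the sense in which several valuations can be maintained in parallel. A perceived-worth or capitalization line would be one more entry of that kind, but how to compute it is not specified here.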
I wrote a trading simulation in order to give body to my conceptualization. While examining segments of trading data against lines representing capitalization, I discovered some interesting behavioural peculiarities. (I never actually used the term "algorithm" until I had to explain these different lines - all formed using the storm family of algorithms. So it is interesting how I was trying to develop a program for back-office operations, which then became part of a trading simulation.) A good lesson here is how ideas for software can emerge during development itself rather than planning. Perhaps this is already obvious. Let me express things a bit differently. I almost never create an actual plan to guide development. A lot of companies spend a great deal of time planning. I largely skip this process! Some might ask, how does one know what the outcome will be? In fact, I never exactly know this. So I definitely find myself a bit surprised sometimes by the outcomes. We need to put things into perspective. When a person deals with more data than he or she can immediately understand, the overhead involved in planning to deal with an unknown is both ironic and wasteful. Programming for me is part of the discovery process. I use it for research not just for production.
I want to explain, for those who don't program, that it occurs at the fingertips. People can spend all sorts of time strategizing, building their cases, arguing to take development in a particular direction. But in the end, a person has to rely on his or her fingertips. What if the software doesn't do the job? Change it. That's what the "soft" part means. What if the developer is just wasting time by producing something that nobody wants? Well, my understanding is that people debating matters are often working out what they want. If they know what they want, pray tell and get the process rolling. If they don't know what they want, then of course they should discuss matters to their satisfaction. For the record, when I do my own developments, I figure out what I want during development, which is never-ending. What I want tends to change along with the capabilities of the software. If I knew exactly what I wanted and the software did precisely this, there would hardly be a need to do research or make improvements. In short, the process of development for me is, I'm not sure what the best word might be, both happenstance and agile. I don't spend much time sorting out details for software that doesn't exist, to perform processes yet to be done, to control data that I barely understand. However, once the software takes form, it often becomes a "living" part of my life, changing to accommodate my relationship with the data. I now want to delve deeper into the story of Storm and explain its incarnations over the years. I hope readers enjoy its history, which leads up to an application called "Earthshield"; this will all be explained.
Average, Reluctant, and Eccentric
I posted some structural details about the storm family of algorithms in an earlier blog. Since a structural explanation would be too much of a digression here, I will simply detail some of the general behaviours: E for "eccentric" (later called "paranoid") represents an algorithm that exhibits sudden jerking movement in response to trading activity; R for "resistant" (later called "reluctant") shows exceptional insensitivity to trading; and finally, A for "average" (later called "reactionary") lies at some point between the extremes of E and R. E and R are mathematically rigid while A can be set as a fractal - i.e. A is not a mathematical average but the result of a process bounded by two extremes. I present the three lines below for a particular publicly-traded stock. Notice how E leaps ahead of R and A. E routinely presses against the boundaries. E provides early warning, but it is also prone to false positives and negatives in relation to the broader trend. E provides the most plays but at the greatest level of risk. On the other hand, A tends to touch but not press against the boundaries except under exceptional circumstances; so some might use A for market-timing although I certainly don't make any recommendations. R almost never touches the boundaries except in extreme cases. R offers the highest level of safety but the least amount of opportunity. See how it is all so mathemagical strictly from an algorithmic-behavioural standpoint.
The illustration above shows only three lines. Yet I said that A can be set as a fractal thereby offering many more lines if the user prefers. If a large number of lines are set side-by-side in a lattice, the trading activity results in a 3-dimensional plume stripped of its amplitudes to reflect kinetic persistence, which in the past I have described as "sentiment." (I have described paranoia, reluctance, and reactivity as simulated aspects of investor sentiment. I don't want to complicate matters, but I call an amplitude-free plume a displacement plume; this is algorithmically extracted from a pricing-free plume called a relational or differential plume.) In the next set of images, the left-side pattern shows trading prices for a particular equity in the high-tech sector. This equity experienced a terrible day during the sampling period as indicated by the cliff. The corresponding displacement plume is shown to the right. I decided to flip the displacement plume horizontally and add some reference lines to show that it is indeed derived from the same trading data: the only difference is that the plume indicates when the technicals are "bottoming out" and "peaking" by distributing the kinetics over the entire visualization field. It's not a magic trick. It's also not sophisticated math. It's due to a different and perhaps more complicated perspective on the meaning of the data. I want to point out however that plume sentiment and price are quite disassociated: the ride between boundaries might involve huge changes in price or little. So a 3-dimensional displacement plume might benefit from an extra dimension: one comes to mind although, as I will explain shortly, I'm not developing this technology anymore.
Here are some general observations about displacement plumes: 1) they are designed to fit within the boundaries although the algorithms do not know the upper and lower price limits in advance; 2) the plumes are visually in opposition to the market but not negatively correlated due to lack of amplitude; 3) for the same reason, opposition is achieved whether the trading pattern is extremely sharp or gentle; and 4) since the algorithm doesn't know the future price, an E-type plume is often forced against the boundaries, thereby providing a means of auto-adjustment if desired. One way to regard the behaviour of A, R, and E is to interpret the lattices as people with different levels of sensitivity to trading activity. Pricing data tends to receive much public attention; but plumes as characterizations of the underlying phenomena should probably be studied more closely. (My blog in a couple of weeks will describe a "Universal Data Model" providing a basis to examine different types of phenomena against the invoked metrics. Please consider reading it.)
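The idea of lines with different levels of sensitivity can be illustrated with a rough stand-in. The sketch below is not the storm family of algorithms, whose structure is not given in this blog; it swaps in ordinary exponential smoothing purely to show how one price series can produce a jumpy line, a sluggish line, and something in between. The function names, the sensitivity values, and the price series are all hypothetical. Unlike a displacement plume, these lines keep their amplitude.

```python
def smoothed_line(prices, sensitivity):
    """Exponential smoothing: each step moves `sensitivity` of the way to the new price.

    A sensitivity near 1.0 reacts almost immediately (loosely like E);
    a sensitivity near 0.0 barely reacts at all (loosely like R).
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    line = [float(prices[0])]
    for price in prices[1:]:
        line.append(line[-1] + sensitivity * (price - line[-1]))
    return line


def sensitivity_lattice(prices, levels):
    """Build one line per sensitivity level; many levels side by side form a lattice."""
    return {level: smoothed_line(prices, level) for level in levels}


# Made-up prices with a sudden cliff, like the "terrible day" described above.
prices = [10, 10, 10, 6, 6, 6]
lattice = sensitivity_lattice(prices, levels=[0.9, 0.5, 0.1])

for level, line in lattice.items():
    print(level, [round(value, 2) for value in line])
```

On the step where the cliff occurs, the most sensitive line drops furthest and the least sensitive line hardly moves, which is the behavioural contrast between E and R in miniature. Passing a longer list of levels gives more lines, in the spirit of the lattice described above.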
911 Suspicious by Its Market Timing
This blog is about how life circumstances affect development; it is not about trading. Nonetheless, since I have crossed paths with the market so often, many concepts associated with trading continue to have a strong influence on my projects. I had a great interest in derivatives at one time. I passed a number of industry-standard courses to become a stock broker. I thought about continuing my education and maybe entering the field of portfolio management. I was drawn to the whole idea of trading as a way to manage risk. (If I really had to describe my state-of-mind, I would say that I was "finding myself.") As the preceding plume images show, there is often a great deal of blurriness in the patterns - conceptually simulating differences in sentiment. But at some point during my studies, I noticed the "stars coming to alignment"; there seemed to be a significant amount of clarity. I apologize for not recalling exact details except a particular date - 911. The events of 911 are suspicious by their relationship to a technical peculiarity in the markets. The next set of images is for a market index or composite. (Notice the more wave-like displacement plume generated by a composite compared to an individual stock.)
The displacement plume should not have known about the terrorist attacks in advance; yet 911 occurred quite near boundary-contact. There are a few ways to think of the 911 "point" shown above. Before reaching it, a person might rationalize the situation as follows: "The market seems to be heading to the boundary. I should anticipate it hitting the boundary and take this opportunity to clear my position." Another perspective that is ironically quite the opposite is as follows: "The market will be hitting the boundary. I should take a position since the market seems unlikely to push beyond it." So a bet can be justified either way depending on the perspective and exact timing. I recall the one horrific thing about 911 from a trading standpoint that could not be predicted. My derivatives textbook failed to mention anything on the subject: it involved the closure of stock markets. I had failed to take it into account. Since a person can neither clear nor assume a position, closure throws timing out the window. It seemed like the house could invent and enforce rules capriciously. I took this as an omen never to purchase derivatives again. 911 actually caused me to steer away from an investment career. However, I was left with this unusual algorithmic imaging environment.
Searching for the Mysteries of the Universe
I think that the ambivalence a nation has to its creative people is related to mass consumerism. There is an expectation of products going to market, and then others can pick and choose from the items they see on the shelf. However, that is a production model that has little to do with the development of creative solutions. I know there has been a strong parallel between production and innovation where it almost seems like production triggers innovation. Certainly from my standpoint, production never has to follow development. There is much to support the assertion that I am simply obsessed with my prototypes and with data. Doesn't it seem more reasonable for innovation to extend from obsession? I think that a certain percentage of the population is passionate about things that make them borderline pathological. But there are pervasive social normatives that suppress or at least fail to support those natural traits and instincts. On one hand, we want people in society to reflect that society and therefore share some commonalities. Conversely over the course of normalization, we might deprive society of the forces that drive change. I read it in posts and blogs all the time. Having a lot of data is pointless, the usual argument goes. That's actually a normative construct rather than an educated opinion; it is meant to prevent the transformation of our society. In any event, I think I'm perilously close to an entirely different kind of blog. So let's get back to the narrative about me with this amazing imaging environment but nowhere to use it - in a sad world that doesn't care.
Like Leeuwenhoek in his days collecting samples for his microscope, I started gathering different sorts of data to examine using my new imaging environment. Fortunately, the world was really starting to embrace the Internet. I found myself with access to all sorts of interesting data: tidal levels, storm speeds, and earthquakes. I was amazed to find that it often didn't matter what kind of data I fed into Storm. I would usually get pretty interesting plumes. Above is an image I believe from tidal levels I recall from a military database. I was using an unusually large lattice just to see what would happen. As indicated, I found an algorithmic anomaly visually resembling a squirrel. Readers can judge for themselves. I placed an actual squirrel image on the left for comparison. One wonders what other curious animals await discovery. I was actually wondering at one point whether the formation of liposome membranes and cell lattices might be explained mathematically - like the unpacking of a compressed file. The image looks rather biological. I generated the next image from raw data that I found posted on the Internet: respiration (left); heart rhythm (middle); and blood pressure (right). Apart from facilitating a new form of visualization, I was also interested in using plumes for triggering purposes - i.e. early-warning signals. I considered two different strategies at the time to interpret heart rhythm problems: 1) erosion from baseline patterns; and 2) planting kinetic traps at strategic parts of the visualization field. Since I am not a medical professional, I found it unlikely that my contribution would be taken seriously regardless of the level of success or how much time I spent on the research. So I didn't put too much effort into the development of pattern-matching or trapping. Due to issues of financial instability at the time, I was not in a position to simply pursue my personal interests regardless of the long-term benefits to society. 
Rarely have I had all of these things together: time to pursue development; freedom to choose my direction and pace; and resources to keep me going. In particular, lack of stability has almost always delayed or terminated projects.
The Transition - Earthquakes
The time that I started studying earthquakes represents a transitional period for me: it forced me to come to terms with the idea of "social obligation." I was back in school learning a skilled trade and also recovering from some health problems. (I had been pinned under a minivan. Perhaps I was dealing with certain psychological issues related to having a car over me.) In those days - and I believe the situation continues to be true today - the whole idea of predicting earthquakes evoked scepticism and ridicule. It's a bit like the whole argument against big data: it's impossible to extract value just by collecting more; therefore stop collecting more, so goes the argument. Extracting value is indeed unrelated to collecting more data. The whole idea of acquiring supplies is to support production. The question of how to facilitate production is quite separate from the issue of supply levels. In any event, just to get the details straight, I was 1) incapacitated; 2) unemployed; 3) studying earthquakes in my spare time; and 4) being told that there's no point trying to make predictions. Meanwhile during my vocational studies, there was a massive earthquake that killed an enormous number of people. This forced me to detach myself from the world in order to make sense of my role in it. I came to accept that sometimes a person's role is just plain limited. However, I tried to create reasonable opportunities for social intersection by doing what comes naturally - following my obsession with data. Since I was expecting to enter a career servicing furnaces, I decided that I would eventually be short on opportunities to study earthquakes. So I tried to satiate my interests as quickly as possible. Earthquake data is noisy especially it seems near populated areas. I "assume" (not being a geologist) that rippling is more coherent in an open countryside if the ground exhibits geological homogeneity. I experimented with ways to obtain wave patterns as shown on the image below. 
I should explain that I was doing this research "on the fly" - not really keeping good documentation (having planned to service furnaces for a living).
"WesternQuakesAFI" on the image above is from Canada. "AFI" means Ambient Flux Index. Notice the seismic readings at the right side of the image, which I hope will show that the plume is actually adapting to the activity. I call it a relational plume because it isn't based on absolutes. I interpret the data almost like stock prices where there is no ceiling; the plume continuously re-sensitizes itself, focusing more on propensity than magnitude. The only thing that can cause this particular plume to lose its grip is absence of change. The AFI must seem rather impressive to make sense of chaotic ground signals. (I'm not exactly certain if I can recall how I came up with it now so many years later.) Try to determine the logic that I rather expertly used to determine the points to click on the interface to show relative lows and highs. I make use of a particular strategy. The technique doesn't work on induced readings such as heartbeats. A displacement plume is premised on the predictability of propensity once amplitudes are removed: this bifurcates magnitude from behavioural tendency, which are so habitually conflated in prediction. I might elaborate on another blog at some point.
Emergence of Earthshield
After successfully studying earthquake data ("success" being defined by my ability to study the data rather than predict quakes), I developed something called an "Earthshield." A plume is generated using data from a single data-logger or monitor. So it occurred to me to systematically go through all of the data-loggers in a particular area to produce a map: through this mapping process, one might be able to see "risk progression." An Earthshield is a map of risk. I understand the reluctance of organizations and seismologists to predict earthquakes. Prediction is certainly a good circus trick. Although people go to see the circus, their focus on the performance is unproductive. The question is when ambulance and other emergency services would be most incapacitated - not exactly when they will have to deal with an earthquake. If there were an ice storm today, would these deployable assets be in a position to deal with the challenge? We're sort of focusing on the natural disaster as if we have the ability to control it. Does creating the software and collecting data mean that we know how to predict earthquakes? It means that we will have the means to develop our understanding. Below I show an Earthshield over the State of California. The GUI reveals the data settings in this case: between 2004-11-01 and 2004-11-25. It's really interesting. I apologize for not remembering the colour scheme. I'm uncertain whether blue or red indicates greater quake risk.
I was impressed by the Earthshield - and rather surprised by my own handiwork. So I tried to introduce it in a setting to encourage worthwhile social outcomes. I made an appointment to see a university professor from geography. Obviously not being a seismologist or geologist, I was rather out of my element in regards to earthquakes. So I said that I wanted to study potholes. I hoped to use city data to produce an Earthshield to show risk of potholes. At the same time, others in the academic community would eventually gain access to the technology for their own purposes - perchance even to predict earthquakes. The professor was surprisingly cooperative and responsive to my interests. However, the city that I hoped to use as a source of data seemed . . . maybe "apathetic" is the word. I didn't receive any response to my enquiries. I try to "create reasonable opportunities for social intersection by doing what comes naturally to me." It wasn't lack of data per se that thwarted me but signs of indifference and apathy. The city wasn't just a potential source of data but also the main beneficiary for my research. My whole undergraduate experience had been about a 4-year commitment that the market was quick to devalue after I graduated. Maybe I didn't want to spend another 6 to 8 years of my life developing something nobody wants.
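The mapping step, going through every data-logger in an area and placing its result on a map, can be sketched roughly as follows. This is an assumption-laden illustration rather than the Earthshield itself: the per-logger "score" stands in for whatever value is read off each logger's plume (not specified here), and the function name, grid-cell approach, coordinates, and scores are all made up.

```python
import math
from collections import defaultdict


def risk_grid(readings, cell_size_deg=1.0):
    """Average hypothetical per-logger scores within each latitude/longitude cell.

    `readings` is an iterable of (latitude, longitude, score) tuples,
    one per data-logger. Returns {(lat_cell, lon_cell): mean_score}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, score in readings:
        # Floor division places each logger in a fixed-size grid cell.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        sums[cell] += score
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up loggers and scores; two of them fall in the same one-degree cell.
readings = [
    (34.05, -118.25, 0.8),
    (34.60, -118.90, 0.4),
    (37.77, -122.42, 0.2),
]
for cell, score in sorted(risk_grid(readings).items()):
    print(cell, round(score, 2))
```

Recomputing such a grid for successive date ranges and comparing the results is one plausible way to look for the "risk progression" mentioned above. Colouring the cells is then a display choice.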
Problems Defining Creativity
I started off the blog describing a type of accounting application. However, here near the end, I'm discussing risk of potholes. This is not the sort of progression that one would find in a "business environment" or indeed any other type of professional setting. I believe that business would tend to question not only the program itself - its relative value to operations - but also its nature, how it came about, and the qualifications of the developer. The contextual focus of organizations limits not just aspects of development but also developers. That which exists does so only within the framework permitted by the setting or environment; and what is true of software I believe is equally so in relation to developers. If a work of art is produced by a painter or sculptor, the work often speaks for itself. Neither it nor the artist is defined by the environment. Development is a creative process where the work should stand on its own. But this is not the case at all, and neither is a developer free to simply follow his or her passions. Creativity in data science is essentially awarded rather than recognized. "Based on your qualifications, you seem to be an exceptional individual." This line of reasoning can both inflate and deflate the scientist. Unless every data scientist has a computer science background that can be confirmed on paper, it might be easy at some point to dismiss non-conforming individuals. Entire industries have emerged to help people gain the appearance of being qualified. But as I have pointed out in this blog, what appears on paper does not necessarily reflect the complex realities of individuals.
A business undertaking might persist if the expectation of return is quite strong; conversely, it might not commence in the absence of such expectations. A labour of "creativity" can start or stop with the developer whose interests shift and sway over the course of his or her life. I think in general in relation to creative initiatives, organizations can't treat developers as they would products that can be bought off the shelf and returned. I hope my story of the Earthshield demonstrates the potentially awkward and meandering paths towards development. Certainly when countries direct public resources to prop up struggling companies and economies continue to be mired, one might question whether business normatives remain relevant. For once the world is hostile to both development and developers, forcing square pegs to fit into round holes, quite possibly at some point organizations will start to get only round pegs. In my blogs I often discuss issues from the perspective of organizations. Here, I chose to consider development as a "lived experience" - i.e. something highly personal in nature. In terms of the Earthshield, development halted many years ago: so using it to study potholes, retail demand, logistics, theft, vandalism, congestion, equities, or anything else is all in the past. Maybe round pegs don't develop Earthshields.