# Finding Dependencies in Tabular Data

I have a file with tabular data and I want to find out if the range of possible values in one column is affected by the value of another column. Here is how I did it in Python 2.4.

This is related to relational database normalization and the code below could be modified to specifically find functional dependencies for normalization. My goal was a little more general, however. I wanted to know any time the range of possible values for one column given another column's value is a proper subset of the possible values for the column when no other column values are fixed.

I have two data structures. One is a tuple of sets of possible values for each column. The second is a set of the rows (each a tuple).

```possible_values = (set(), set(), set(), set(), set())
rows = set()
```

(Note, I've hard-coded the length of the possible_values tuple to 5 to match my current data.)

Next I load in the data — it's in a whitespace-delimited file, one row per line:

```for line in file("data.txt"):
row = tuple(line.strip().split())
for col_i in range(len(row)):
possible_values[col_i].add(row[col_i])
rows.add(row)
```

Now here's my function for finding whether one column's value restricts the possible values of another.

```def find_dependencies(col_i, col_j):
for i_value in possible_values[col_i]:
j_values = set()
for row in rows:
if row[col_i] == i_value:
j_values.add(row[col_j])
if j_values < possible_values[col_j]:
yield i_value, j_values
```

It goes through each possible value in the i column and finds out if fixing that reduces the possible values in the j column.

Notice that it makes use of < as a set operator for proper subset. It's a generator too. It will yield any value in the i column that restricts the values in the j column, along with what the restriction is.

What if you want to check the dependency, not between just two columns but two groups of columns?

The solution turned out to involve a couple of cool modifications which I'll save for a followup post.

The original post was in the category: python but I'm still in the process of migrating categories over.