GitExtractor API Documentation

class tidyextractors.tidygit.GitExtractor(source, auto_extract=True, *args, **kwargs)

The GitExtractor class extracts data from local git repositories. It provides methods that output the data in two tidy formats, changes and commits, as well as in a raw, untidy format.

Parameters:
  • source (str) – The path to a local git repository
  • auto_extract (bool) – Defaults to True. If True, data is extracted automatically. Otherwise, extraction must be initiated through the internal interface.
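
As a rough illustration of typical use, the sketch below wraps construction and the two tidy outputs in a small helper. The helper name load_git_tables and its repo_path argument are hypothetical and not part of tidyextractors; only GitExtractor, commits() and changes() come from this page.

```python
def load_git_tables(repo_path):
    """Return (commits, changes) DataFrames for a local git repository."""
    # Imported inside the function so this sketch can be defined even
    # where tidyextractors is not installed.
    from tidyextractors.tidygit import GitExtractor

    # auto_extract defaults to True, so data is extracted on construction.
    extractor = GitExtractor(repo_path)
    return extractor.commits(), extractor.changes()
```

A call would look like commits, changes = load_git_tables("/path/to/repo"), where the path is a placeholder for a real local repository.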

changes()

Returns a table of git log data, with “changes” as rows/observations.

Note

drop_collections is not available for this method, since there are no meaningful collections to keep.

Returns: pandas.DataFrame
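
To make the idea of “changes” as rows concrete, here is a minimal pandas-only sketch. It does not use tidyextractors, and the column names (hexsha, files, file) are assumptions chosen for illustration, not the library's actual schema. It only shows how one row per commit can be reshaped into one row per changed file.

```python
import pandas as pd

# Hypothetical commit-level data: one row per commit, with the files it touched.
commit_rows = pd.DataFrame({
    "hexsha": ["a1", "b2"],
    "files": [["README.md", "setup.py"], ["setup.py"]],
})

# Reshape to one row per (commit, file) pair, i.e. one row per change.
change_rows = (
    commit_rows.explode("files")
    .rename(columns={"files": "file"})
    .reset_index(drop=True)
)

print(len(commit_rows), len(change_rows))
```

In this toy data, two commits expand into three change rows because the first commit touches two files.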

commits(drop_collections=True)

Returns a table of git log data, with “commits” as rows/observations.

Parameters: drop_collections (bool) – Defaults to True. Indicates whether columns with lists/dicts/sets will be dropped.
Returns: pandas.DataFrame
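
The effect of dropping collection columns can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's implementation: the helper drop_collection_columns and the column names are hypothetical.

```python
import pandas as pd

def drop_collection_columns(df):
    """Return a copy of df without columns that hold list/dict/set values."""
    def holds_collection(column):
        # True if any cell in the column is a list, dict, or set.
        return column.map(lambda v: isinstance(v, (list, dict, set))).any()

    keep = [name for name in df.columns if not holds_collection(df[name])]
    return df[keep].copy()

# Hypothetical commit table with one collection-valued column.
frame = pd.DataFrame({
    "hexsha": ["a1", "b2"],
    "author": ["Ada", "Bob"],
    "changes": [{"README.md": 3}, {"setup.py": 1}],
})

tidy = drop_collection_columns(frame)
print(list(tidy.columns))
```

The original frame is left untouched, so the collection column remains available if it is needed later.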

raw(drop_collections=False)

Produces the extractor object’s data as it is stored internally.

Parameters: drop_collections (bool) – Defaults to False. Indicates whether columns with lists/dicts/sets will be dropped.
Returns: pandas.DataFrame